View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0018472 | CentOS-8 | cachefilesd | public | 2022-06-24 12:27 | 2022-06-24 14:00 |
Reporter | dahorak | Assigned To | |||
Priority | high | Severity | crash | Reproducibility | sometimes |
Status | new | Resolution | open | ||
Summary | 0018472: Reading a particular file from an NFS share with cachefilesd enabled leads to a kernel crash and reboot of the machine | ||||
Description | We have temporary Jenkins agents created as VMs in PSI OpenStack, with an NFS share used to store data related to the Jenkins jobs (logs, results, test configuration files and configuration dumps). A few days ago, I re-created the base image used for creating the VMs, and since then we have been observing random, unexplained reboots of some of the VMs. The problem occurs when I try to read a particular file from the NFS share: the VM immediately freezes and gets rebooted. The only log I was able to gather was from `dmesg -w`:

```
[ 2577.095000] CacheFiles:
[ 2577.096181] CacheFiles: Assertion failed
[ 2577.097276] ------------[ cut here ]------------
[ 2577.098449] kernel BUG at fs/cachefiles/rdwr.c:719!
[ 2577.099779] invalid opcode: 0000 [#1] SMP NOPTI
```

If I stop or even just restart the `cachefilesd` service, the problem does not occur. I'm not sure whether this is a bug in `cachefilesd` or directly in the kernel (or elsewhere). I also have a kdump report, but I'm not sure where to upload it (it has
Steps To Reproduce | I'm not sure how to reproduce it from scratch. However, I have a running VM (and I'm able to create a new one and mount the same NFS share) where it is easily reproducible just by calling `cat jslave` (where `jslave` is the problematic file on the NFS share). If I copy the `jslave` file (on a different machine) to a new file `jslave-test` in the same directory, the `cat jslave-test` command passes without any issue. Reading most of the other files on that NFS share also seems to work correctly.

The `jslave` file contains just an IP address without a trailing newline character. On a machine with a slightly older kernel (and probably some other packages), it looks like this:

```
$ ls -lZ jslave*
-rwxr-x--x. 1 jenkins jenkins system_u:object_r:nfs_t:s0 11 Jun 22 13:12 jslave
-rwxr-x--x. 1 jenkins jenkins system_u:object_r:nfs_t:s0 11 Jun 24 07:27 jslave-test
$ sha256sum jslave*
8fc9252abfc6effa64218e7d3e67985b4a536145dec98f66d8d7ff3f8cdfdc06  jslave
8fc9252abfc6effa64218e7d3e67985b4a536145dec98f66d8d7ff3f8cdfdc06  jslave-test
$ cat jslave
10.0.191.67$
```

(The `$` character at the end of the last line is the prompt, because the file does not end with a newline character.)

When I run any command that tries to access the `jslave` file on the broken machine, the machine immediately freezes.
Additional Information | Package versions on the "broken" machine:

```
cachefilesd-0.10.10-4.el8.x86_64
glibc-2.28-203.el8.x86_64
glibc-all-langpacks-2.28-203.el8.x86_64
glibc-common-2.28-203.el8.x86_64
glibc-devel-2.28-203.el8.x86_64
glibc-gconv-extra-2.28-203.el8.x86_64
glibc-headers-2.28-203.el8.x86_64
kernel-4.18.0-358.el8.x86_64
kernel-4.18.0-394.el8.x86_64
kernel-core-4.18.0-358.el8.x86_64
kernel-core-4.18.0-394.el8.x86_64
kernel-headers-4.18.0-394.el8.x86_64
kernel-modules-4.18.0-358.el8.x86_64
kernel-modules-4.18.0-394.el8.x86_64
kernel-tools-4.18.0-394.el8.x86_64
kernel-tools-libs-4.18.0-394.el8.x86_64
```

On a machine created from the older image, with the following versions, it works correctly:

```
cachefilesd-0.10.10-4.el8.x86_64
glibc-2.28-200.el8.x86_64
glibc-all-langpacks-2.28-200.el8.x86_64
glibc-common-2.28-200.el8.x86_64
glibc-devel-2.28-200.el8.x86_64
glibc-gconv-extra-2.28-200.el8.x86_64
glibc-headers-2.28-200.el8.x86_64
kernel-4.18.0-358.el8.x86_64
kernel-4.18.0-383.el8.x86_64
kernel-core-4.18.0-358.el8.x86_64
kernel-core-4.18.0-383.el8.x86_64
kernel-headers-4.18.0-383.el8.x86_64
kernel-modules-4.18.0-358.el8.x86_64
kernel-modules-4.18.0-383.el8.x86_64
kernel-tools-4.18.0-383.el8.x86_64
kernel-tools-libs-4.18.0-383.el8.x86_64
```
Tags | No tags attached. | ||||
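The `jslave` / `jslave-test` comparison from Steps To Reproduce can be sketched as a small POSIX shell helper: it checks with `cmp -s` whether two files are byte-for-byte identical, and with `tail -c 1` whether the first file ends with a newline. The function name `check_copy` is hypothetical and not part of the original report. Because it reads the file, it should only be run on a machine where reading `jslave` is known to be safe, not on the affected one.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original report.
# check_copy ORIGINAL COPY
#   Prints "identical" or "different" depending on whether the two files
#   match byte for byte, then reports whether ORIGINAL ends with a newline.
# WARNING: this reads ORIGINAL, so do not run it against the problematic
# file on an affected machine.
# Note: POSIX sh has no "local", so orig/copy are ordinary shell variables.
check_copy() {
    orig="$1"
    copy="$2"

    # cmp -s prints nothing and exits with status 0 only for identical files
    if cmp -s "$orig" "$copy"; then
        echo "identical"
    else
        echo "different"
    fi

    # An empty file has no last byte to inspect
    if [ ! -s "$orig" ]; then
        echo "empty file"
    # tail -c 1 prints the last byte; $(...) strips a trailing newline,
    # so a non-empty result means the last byte is not "\n"
    elif [ -n "$(tail -c 1 "$orig")" ]; then
        echo "no trailing newline"
    else
        echo "ends with newline"
    fi
}

# Example usage, run from the directory on the NFS share:
# check_copy jslave jslave-test
```

Given the identical checksums and the 11-byte content `10.0.191.67` shown in the report, running `check_copy jslave jslave-test` on the working machine would print `identical` followed by `no trailing newline`.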