View Issue Details

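| Field | Value |
|---|---|
| ID | 0018472 |
| Project | CentOS-8 |
| Category | cachefilesd |
| View Status | public |
| Last Update | 2022-06-24 14:00 |
| Reporter | dahorak |
| Assigned To | |
| Priority | high |
| Severity | crash |
| Reproducibility | sometimes |
| Status | new |
| Resolution | open |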
Summary: 0018472: Reading a particular file from an NFS share with cachefilesd enabled leads to a kernel crash and reboot of the machine

Description: We have temporary Jenkins agents created as VMs in PSI OpenStack, with an NFS share used to store data related to the Jenkins jobs (logs, results, test configuration files and configuration dumps). A few days ago, I re-created the base image used for creating the VMs, and since then we have been observing random, unexplained reboots of some of the VMs.

The problem occurs when I try to read a particular file from the NFS share: the VM immediately freezes and gets rebooted. The only log I was able to gather was from `dmesg -w`:

```
[ 2577.095000] CacheFiles:
[ 2577.096181] CacheFiles: Assertion failed
[ 2577.097276] ------------[ cut here ]------------
[ 2577.098449] kernel BUG at fs/cachefiles/rdwr.c:719!
[ 2577.099779] invalid opcode: 0000 [#1] SMP NOPTI
```

If I stop, or even just restart, the `cachefilesd` service, the problem does not occur.

I'm not sure whether it is a bug in `cachefilesd` or directly in the kernel (or elsewhere).

I also have a kdump report, but I'm not sure where to upload it (it has

Steps To Reproduce: I'm not sure how to reproduce it from scratch, but I have a running VM (and I'm able to create a new one and mount the same NFS share) where it is easily reproducible, just by running `cat jslave` (where `jslave` is the problematic file on the NFS share).

If I copy the `jslave` file (on a different machine) to a new file `jslave-test` in the same directory, the `cat jslave-test` command succeeds without any issue.
Reading most of the other files from that NFS share also seems to work correctly.

The content of the `jslave` file is just an IP address without a trailing newline character.

On a machine with a slightly older kernel (and probably some other older packages), it looks like this:

```
$ ls -lZ jslave*
-rwxr-x--x. 1 jenkins jenkins system_u:object_r:nfs_t:s0 11 Jun 22 13:12 jslave
-rwxr-x--x. 1 jenkins jenkins system_u:object_r:nfs_t:s0 11 Jun 24 07:27 jslave-test

$ sha256sum jslave*
8fc9252abfc6effa64218e7d3e67985b4a536145dec98f66d8d7ff3f8cdfdc06 jslave
8fc9252abfc6effa64218e7d3e67985b4a536145dec98f66d8d7ff3f8cdfdc06 jslave-test

$ cat jslave
10.0.191.67$
```
(The `$` character at the end of the last line is the shell prompt, because the file has no trailing newline.)

When I run any command that tries to access the `jslave` file on the broken machine, the machine immediately freezes.

Additional Information: Package versions on the "broken" machine:
cachefilesd-0.10.10-4.el8.x86_64
glibc-2.28-203.el8.x86_64
glibc-all-langpacks-2.28-203.el8.x86_64
glibc-common-2.28-203.el8.x86_64
glibc-devel-2.28-203.el8.x86_64
glibc-gconv-extra-2.28-203.el8.x86_64
glibc-headers-2.28-203.el8.x86_64
kernel-4.18.0-358.el8.x86_64
kernel-4.18.0-394.el8.x86_64
kernel-core-4.18.0-358.el8.x86_64
kernel-core-4.18.0-394.el8.x86_64
kernel-headers-4.18.0-394.el8.x86_64
kernel-modules-4.18.0-358.el8.x86_64
kernel-modules-4.18.0-394.el8.x86_64
kernel-tools-4.18.0-394.el8.x86_64
kernel-tools-libs-4.18.0-394.el8.x86_64

On a machine created from the older image with the following versions, it works correctly:
cachefilesd-0.10.10-4.el8.x86_64
glibc-2.28-200.el8.x86_64
glibc-all-langpacks-2.28-200.el8.x86_64
glibc-common-2.28-200.el8.x86_64
glibc-devel-2.28-200.el8.x86_64
glibc-gconv-extra-2.28-200.el8.x86_64
glibc-headers-2.28-200.el8.x86_64
kernel-4.18.0-358.el8.x86_64
kernel-4.18.0-383.el8.x86_64
kernel-core-4.18.0-358.el8.x86_64
kernel-core-4.18.0-383.el8.x86_64
kernel-headers-4.18.0-383.el8.x86_64
kernel-modules-4.18.0-358.el8.x86_64
kernel-modules-4.18.0-383.el8.x86_64
kernel-tools-4.18.0-383.el8.x86_64
kernel-tools-libs-4.18.0-383.el8.x86_64
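To narrow down which update might be responsible, the two package lists above can be diffed mechanically. Below is a minimal Python sketch; `diff_packages` is a hypothetical helper written for illustration (not part of any tool), and the lists are pasted verbatim from this report. Run on these lists, it shows that `cachefilesd-0.10.10-4.el8.x86_64` is the same on both machines, and that the differences are the glibc builds (`-200` vs `-203`) and the kernel builds (`-383` vs `-394`).

```python
"""Hypothetical triage helper: diff the two package lists from this report."""

# Packages on the "broken" machine (newer image), copied from the report.
BROKEN = """
cachefilesd-0.10.10-4.el8.x86_64
glibc-2.28-203.el8.x86_64
glibc-all-langpacks-2.28-203.el8.x86_64
glibc-common-2.28-203.el8.x86_64
glibc-devel-2.28-203.el8.x86_64
glibc-gconv-extra-2.28-203.el8.x86_64
glibc-headers-2.28-203.el8.x86_64
kernel-4.18.0-358.el8.x86_64
kernel-4.18.0-394.el8.x86_64
kernel-core-4.18.0-358.el8.x86_64
kernel-core-4.18.0-394.el8.x86_64
kernel-headers-4.18.0-394.el8.x86_64
kernel-modules-4.18.0-358.el8.x86_64
kernel-modules-4.18.0-394.el8.x86_64
kernel-tools-4.18.0-394.el8.x86_64
kernel-tools-libs-4.18.0-394.el8.x86_64
""".split()

# Packages on the working machine (older image), copied from the report.
WORKING = """
cachefilesd-0.10.10-4.el8.x86_64
glibc-2.28-200.el8.x86_64
glibc-all-langpacks-2.28-200.el8.x86_64
glibc-common-2.28-200.el8.x86_64
glibc-devel-2.28-200.el8.x86_64
glibc-gconv-extra-2.28-200.el8.x86_64
glibc-headers-2.28-200.el8.x86_64
kernel-4.18.0-358.el8.x86_64
kernel-4.18.0-383.el8.x86_64
kernel-core-4.18.0-358.el8.x86_64
kernel-core-4.18.0-383.el8.x86_64
kernel-headers-4.18.0-383.el8.x86_64
kernel-modules-4.18.0-358.el8.x86_64
kernel-modules-4.18.0-383.el8.x86_64
kernel-tools-4.18.0-383.el8.x86_64
kernel-tools-libs-4.18.0-383.el8.x86_64
""".split()


def diff_packages(broken, working):
    """Return (only_on_broken, only_on_working) as sorted lists of package strings."""
    broken_set, working_set = set(broken), set(working)
    return sorted(broken_set - working_set), sorted(working_set - broken_set)


if __name__ == "__main__":
    only_broken, only_working = diff_packages(BROKEN, WORKING)
    print("Only on the broken machine:")
    for pkg in only_broken:
        print("  " + pkg)
    print("Only on the working machine:")
    for pkg in only_working:
        print("  " + pkg)
```

Comparing whole package strings (rather than parsing name/version) keeps the sketch simple and also handles the multiple installed kernel versions without special cases.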
Tags: No tags attached.

Activities

dahorak

2022-06-24 13:00

reporter   ~0038950

I forgot to include the relevant line from the fstab file in the initial description:
```
nfs-server:/ocsci-jenkins /mnt/ocsci-jenkins nfs sec=sys,vers=4,fsc,minorversion=1 0 0
```
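The `fsc` option in this entry is what makes the NFS client use FS-Cache, whose CacheFiles backend is managed by `cachefilesd`. To check which other Jenkins agents mount the share this way, a minimal Python sketch like the one below could be run against their fstab lines. `parse_fstab_line` and `uses_fscache` are hypothetical helpers written for illustration; they only do a simple whitespace split and do not decode escaped characters (such as `\040` for a space) in fstab fields.

```python
"""Hypothetical helper: flag fstab entries that mount NFS with FS-Cache (`fsc`)."""


def parse_fstab_line(line):
    """Split one fstab entry into fields; return None for blank or comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"not a valid fstab entry: {line!r}")
    spec, mountpoint, fstype, opts = fields[:4]
    # Mount options are comma-separated; "key=value" options keep their value,
    # bare flags such as "fsc" are stored as True.
    options = {}
    for opt in opts.split(","):
        key, sep, value = opt.partition("=")
        options[key] = value if sep else True
    return {
        "spec": spec,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options,
        "freq": int(fields[4]) if len(fields) > 4 else 0,
        "passno": int(fields[5]) if len(fields) > 5 else 0,
    }


def uses_fscache(entry):
    """True if the entry is an NFS mount that requests FS-Cache via `fsc`."""
    return entry["fstype"].startswith("nfs") and "fsc" in entry["options"]


if __name__ == "__main__":
    # The fstab line from the note above.
    line = "nfs-server:/ocsci-jenkins /mnt/ocsci-jenkins nfs sec=sys,vers=4,fsc,minorversion=1 0 0"
    entry = parse_fstab_line(line)
    print(entry["mountpoint"], "uses FS-Cache:", uses_fscache(entry))
```

Matching on the `nfs` prefix of the filesystem type also catches entries written as `nfs4`.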

Issue History

| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2022-06-24 12:27 | dahorak | New Issue | |
| 2022-06-24 13:00 | dahorak | Note Added: 0038950 | |