View Issue Details

ID: 0018248
Project: CentOS-8
Category: kernel
View Status: public
Last Update: 2021-07-20 02:35
Reporter: jasonborden
Assigned To:
Priority: normal
Severity: minor
Reproducibility: sometimes
Status: new
Resolution: open
Product Version: 8.4.2105
Summary: 0018248: Ceph kernel client append write bug
Description: Data corruption occurs at random in files opened for appending with the kernel CephFS client. I believe this is similar to bug #0015953, but on a newer kernel.
Steps To Reproduce:
$ for y in {1..1000} ; do echo "$(date -Isec) [testing] Running test number $y" >> test1.log ; sleep 1 ; done

The corruption occurs at random, but more often when the file has also been read from. It appears to fall on 4 KiB (4096-byte) boundaries:

$ hexdump -C test1.log
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000fd0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 32 30 |..............20|
00000fe0 32 31 2d 30 37 2d 31 39 54 31 33 3a 33 37 3a 34 |21-07-19T13:37:4|
00000ff0 34 2d 30 36 3a 30 30 20 5b 74 65 73 74 69 6e 67 |4-06:00 [testing|
00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001d50 00 00 00 00 00 00 00 00 32 30 32 31 2d 30 37 2d |........2021-07-|
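
The NUL-filled regions visible in the hexdump above can also be listed mechanically, which makes it easier to check whether they line up with 4096-byte boundaries. The following is a minimal sketch, not part of the original report: `nul_runs` is a hypothetical helper name, and it assumes GNU `od` (for the `-w1` option) as shipped on CentOS.

```shell
# Sketch: list each run of NUL bytes in a file as "start length start%4096".
# nul_runs is a hypothetical helper; it assumes GNU od for the -w1 option.
nul_runs() {
    # od prints one byte per line as an unsigned decimal, so the awk record
    # number minus one is the byte offset within the file.
    od -An -v -tu1 -w1 < "$1" | awk '
        { off = NR - 1 }
        $1 == 0 { if (!len) start = off; len++; next }      # inside a NUL run
        len     { print start, len, start % 4096; len = 0 } # run just ended
        END     { if (len) print start, len, start % 4096 } # run reaches EOF
    '
}
```

Run as `nul_runs test1.log`. Each output line gives the start offset of a NUL run, its length in bytes, and the start offset modulo 4096; a third column of 0 means the run begins exactly on a 4 KiB boundary. A log written only by the `echo` loop should contain no NUL bytes at all, so any output indicates corruption.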
Additional Information: I haven't seen this issue with the CephFS FUSE client, only with the CephFS kernel client. I have seen it with both the 4.18.0-305.3.1 and 4.18.0-305.7.1 kernels. My Ceph cluster has been at versions 15.2.12 and 16.2.5 while I observed the issue. I have attached a sample corrupt log file.
Tags: No tags attached.

Activities

jasonborden

2021-07-19 20:40

reporter  

test1.log (59,893 bytes)
toracat

2021-07-20 00:55

manager   ~0038543

There is a ceph bug known to affect RHEL/CentOS 8.4:

https://tracker.ceph.com/issues/51112

A patch is available. Perhaps it can be applied to the plus kernel for testing.
jasonborden

2021-07-20 01:23

reporter   ~0038544

I think the bug at https://tracker.ceph.com/issues/51112 is different from this one.

I don't have a discontinuously numbered osdmap as described in the other bug report.
toracat

2021-07-20 02:35

manager   ~0038545

OK, thanks for checking.

Issue History

Date Modified Username Field Change
2021-07-19 20:40 jasonborden New Issue
2021-07-19 20:40 jasonborden File Added: test1.log
2021-07-20 00:55 toracat Note Added: 0038543
2021-07-20 01:23 jasonborden Note Added: 0038544
2021-07-20 02:35 toracat Note Added: 0038545