View Issue Details

ID:              0014280
Project:         CentOS-7
Category:        kernel
View Status:     public
Last Update:     2017-12-21 18:39
Status:          new
Resolution:      open
Platform:        x86_64
OS:              CentOS
OS Version:      7
Product Version: 7.4.1708
Target Version:
Fixed in Version:
Summary: 0014280: XFS fsync() performance anomaly

Description:
We're having an odd performance problem with XFS in CentOS 7.

I did some benchmarking of fdatasync()/sec performance on some database servers I manage and found an anomaly I thought worth mentioning here, in case anyone has insight. The table below shows 4 KiB block write iops with fsync()/fdatasync() called either only once at the end of 100 MiB of writes (Nosync column) or after every 4 KiB block (last three columns):

Machine   Nosync    Raw      XFS     EXT4
=======   =======   ======   =====   =====
A          57,143   16,505   6,789   3,224
B         131,959   16,172     857   2,940
C         146,286   21,658   8,846   4,708

The XFS figure for machine B seems to be a significant anomaly.

The Nosync column is the baseline: average 4 KiB block write iops to a raw device, with a single fsync() after the full 100 MiB of writes.

The Raw column is O_DIRECT 4 KiB block write iops to a raw device, with fdatasync() after every block.

The XFS and EXT4 columns are O_DIRECT 4 KiB block write iops to a freshly truncated file on the respective filesystem, again with fdatasync() after every block.
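The per-block fdatasync() loop described above can be sketched roughly as follows. This is not the attached fsync_test.c; the function name, the O_DIRECT fallback, and the fixed block size are illustrative assumptions, and the real program also times the loop to compute iops:

```c
/* Minimal sketch of the per-block fdatasync benchmark: for each 4 KiB
   block, write() followed by fdatasync(). Illustrative, not the
   attached fsync_test.c. */
#define _GNU_SOURCE          /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns the number of blocks successfully written and synced,
   or -1 on setup failure. */
int write_synced(const char *path, int nblocks) {
    const size_t blk = 4096;
    void *buf;
    if (posix_memalign(&buf, blk, blk) != 0)   /* O_DIRECT needs alignment */
        return -1;
    memset(buf, 'x', blk);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0)   /* some filesystems (e.g. tmpfs) reject O_DIRECT */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { free(buf); return -1; }

    int done = 0;
    for (; done < nblocks; done++) {
        if (write(fd, buf, blk) != (ssize_t)blk || fdatasync(fd) != 0)
            break;
    }
    close(fd);
    free(buf);
    return done;
}
```

Timing this loop over 25,600 blocks (100 MiB) and dividing by elapsed seconds gives the per-filesystem iops figures in the table.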

All three machines have nonvolatile RAID controller cache, so an fdatasync() completes once the 4 KiB block (plus the file-size metadata update, in the XFS/EXT4 cases, since the file starts at zero length) has been transferred to the controller.

Machine A is CentOS 6, without using dmcrypt on the partition in question. Machines B and C are CentOS 7 with luks/dmcrypt in the chain (controller -> dmcrypt -> LVM -> filesystem). Machine B is NUMA, otherwise all three are SMP with 8+ cores each.

Restricting the test to pure data writes through XFS/EXT4 (by not truncating the file first, and thus avoiding journaled file-size changes) brings fdatasync() iops on both filesystems very close to the raw-partition fdatasync() iops.
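The data-only variant can be sketched by sizing the file up front and then overwriting it in place, so each fdatasync() has only data to flush. This is a hedged illustration; the function name is hypothetical, and posix_fallocate() stands in for whatever preallocation the real test used:

```c
/* Sketch: set the file size once up front, then overwrite in place,
   so per-block fdatasync() calls need not journal size changes.
   Illustrative only. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 on success, -1 on any failure. */
int overwrite_in_place(const char *path, size_t total_bytes) {
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) return -1;
    /* Reserve blocks and set the size in one journaled operation. */
    if (posix_fallocate(fd, 0, (off_t)total_bytes) != 0) {
        close(fd);
        return -1;
    }
    char buf[4096];
    memset(buf, 'x', sizeof buf);
    for (size_t off = 0; off < total_bytes; off += sizeof buf) {
        if (pwrite(fd, buf, sizeof buf, (off_t)off) != (ssize_t)sizeof buf ||
            fdatasync(fd) != 0) {
            close(fd);
            return -1;
        }
    }
    close(fd);
    return 0;
}
```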

Disabling all but one core on Machine B improved the XFS performance figure by about 50%, but that's still nowhere near the anticipated performance.

All tests were single-threaded (in user mode), though of course more threads are involved at the kernel level esp. with XFS.
Steps To Reproduce:
See the attached C program source, fsync_test.c (built and run on CentOS 7).

$ fsync_test {file or device to be hosed} {size in MiB}

To measure time:
$ time fsync_test {file or device to be hosed} {size in MiB}

The Nosync column was generated with "dd" using bs=4096 count=25600 conv=fsync.
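A hedged reconstruction of that dd invocation follows; the actual target was a raw device, which is site-specific, so a regular file stands in here:

```shell
# 25600 x 4096-byte blocks = 100 MiB, with a single fsync at the end
# (conv=fsync). On the real machines, of= pointed at the raw device.
dd if=/dev/zero of=nosync_test.dat bs=4096 count=25600 conv=fsync
```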
Additional Information:
CentOS 7 kernel: Linux 3.10.0-693.11.1.el7.x86_64

Also tried mainline Linux 4.14.6: same issue, and we additionally hit a lockup in xfsaild during one test with two cores enabled (one core on each NUMA node).




Attached: fsync_test.c (1,358 bytes), uploaded 2017-12-15 22:46

Note ~0030812 (reporter, 2017-12-21 18:39)

We found the underlying issue. On this particular server, the RAID controller accurately reported its RAID stripe geometry to XFS, and XFS used that stripe alignment to make automatic I/O-size decisions that hurt bursty database update performance.

Creating the XFS filesystem with "mkfs.xfs -d noalign" restored the required database performance for our workload.
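A sketch of that fix, with a hypothetical device path standing in for the actual dm-crypt/LVM target:

```shell
# -d noalign tells mkfs.xfs to ignore the stripe geometry reported by
# the RAID controller when laying out data. Device path is hypothetical.
mkfs.xfs -f -d noalign /dev/mapper/vg0-dblv

# Verify: sunit and swidth in the data section should report 0.
xfs_info /dev/mapper/vg0-dblv
```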

Issue History

Date Modified      Username   Field           Change
2017-12-15 22:46   gregb      New Issue
2017-12-15 22:46   gregb      File Added:     fsync_test.c
2017-12-15 22:46   gregb      Tag Attached:   xfs
2017-12-21 18:39   gregb      Note Added:     0030812