View Issue Details

ID: 0018138
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2021-04-12 09:46
Reporter: libbkmz
Assigned To:
Priority: high
Severity: major
Reproducibility: always
Status: new
Resolution: open
Product Version: 7.9.2009
Summary: 0018138: /proc/diskstats provide wrong numbers for md devices
Description:
Hello.

Since 4 February I have been receiving a lot of alerts about 100% utilization of /dev/md[0-9] devices on my systems. The alerts come from iostat, which is backed by /proc/diskstats. We are using CentOS 7 with the latest kernel version, so 4 February very likely corresponds to "3.10.0-1160.15.2.el7.x86_64 #1 SMP Wed Feb 3 15:06:38 UTC 2021". Downgrading to 11.1 fixes the problem; I can't see this issue on that version.

Here is sample output from a real server:

$ > iostat -xy 1 1
Linux 3.10.0-1160.21.1.el7.x86_64 04/02/2021 _x86_64_ (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.92    0.00    1.44    0.03    0.00   95.61

Device:  rrqm/s  wrqm/s   r/s     w/s  rkB/s    wkB/s  avgrq-sz    avgqu-sz  await  r_await  w_await  svctm   %util
sda        0.00   33.00  0.00   48.00   0.00   568.00     23.67        0.01   0.12     0.00     0.12   0.12    0.60
sdb        0.00    0.00  0.00   51.00   0.00   448.00     17.57        0.00   0.08     0.00     0.08   0.08    0.40
md2        0.00    0.00  0.00  225.00   0.00  1016.00      9.03  1970189.56   0.00     0.00     0.00   4.45  100.10
md1        0.00    0.00  0.00    0.00   0.00     0.00      0.00      304.30   0.00     0.00     0.00   0.00  100.10
md0        0.00    0.00  0.00    0.00   0.00     0.00      0.00      367.00   0.00     0.00     0.00   0.00  100.00

As you can see, the avgqu-sz values are abnormally high and %util is 100% or even higher, which just doesn't make sense. In the changelog of the 15.2 kernel version I found this:

 Mon Dec 14 13:00:00 2020 Augusto Caringi <acaringi@redhat.com> [3.10.0-1160.13.1.el7]
- [s390] zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl (Philipp Rudo) [1896826]
- [block] block/diskstats: more accurate approximation of io_ticks for slow disks (Ming Lei) [1859364]
- [block] block: delete part_round_stats and switch to less precise counting (Ming Lei) [1859364]
- [md] dm: simplify start of block stats accounting for bio-based (Ming Lei) [1859364]
- [block] block/rsxx: use generic io stats accounting functions to simplify io stat accounting (Ming Lei) [1859364]
- [block] drbd: use generic io stats accounting functions to simplify io stat accounting (Ming Lei) [1859364]
- [md] md: use generic io stats accounting functions to simplify io stat accounting (Ming Lei) [1859364]

These changes may be related to this bug. I also tried a clean install of CentOS 7, and the bug is very easy to reproduce there.
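For context, here is a rough sketch of how iostat-style %util and avgqu-sz figures can be derived from two /proc/diskstats samples. It assumes the standard field layout (field 3 is the device name, field 13 is milliseconds spent doing I/O, field 14 is the weighted milliseconds spent doing I/O). The function name `diskstats_delta` is made up for this illustration, and this is not the actual iostat implementation. If the kernel accounts those two counters incorrectly for md devices, figures like the ones above would be the expected result.

```shell
# diskstats_delta FIRST_LINE SECOND_LINE INTERVAL_MS
# Hypothetical helper: takes two /proc/diskstats lines for the same device,
# sampled INTERVAL_MS milliseconds apart, and prints approximate
# %util and average queue size for that interval.
diskstats_delta() {
    printf '%s\n%s\n' "$1" "$2" | awk -v itv="$3" '
        NR == 1 { ticks = $13; wticks = $14 }      # counters from the first sample
        NR == 2 {
            util  = ($13 - ticks) * 100 / itv      # busy time as a share of the interval
            queue = ($14 - wticks) / itv           # weighted busy time per ms of interval
            printf "%s util=%.2f%% avgqu-sz=%.2f\n", $3, util, queue
        }'
}

# Possible live usage (md0 and the 1-second interval are only examples):
#   a=$(grep -w md0 /proc/diskstats); sleep 1; b=$(grep -w md0 /proc/diskstats)
#   diskstats_delta "$a" "$b" 1000
```

Running this against an idle md device on an affected kernel and on an unaffected one should show whether the raw counters themselves are advancing without any I/O.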
Steps To Reproduce:
Install a fresh CentOS 7
Upgrade to kernel version 3.10.0-1160.21.1
dd if=/dev/zero of=./FILE0 bs=100M count=1
dd if=/dev/zero of=./FILE1 bs=100M count=2
losetup -f ./FILE0
losetup -f ./FILE1
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/loop0 /dev/loop1
Now you can observe 100% util in the `iostat -xy 1 1` output
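
To check for the symptom without reading the table by eye, something like the following filter could be used. It is only a sketch: the function name `flag_idle_busy` and the default threshold of 99 are assumptions for this example, and it expects a header row starting with `Device:` (or `Device`) that contains `r/s`, `w/s` and `%util` columns, as in the iostat output above. It prints devices that report no reads or writes per second yet show %util at or above the threshold.

```shell
# flag_idle_busy [THRESHOLD]
# Hypothetical helper: reads `iostat -x` style output on stdin and prints
# "device %util" for every device with r/s + w/s == 0 and %util >= THRESHOLD.
flag_idle_busy() {
    awk -v limit="${1:-99}" '
        $1 == "Device:" || $1 == "Device" {
            # Header row: remember the position of each column by name.
            for (i = 1; i <= NF; i++) col[$i] = i
            ncols = NF
            next
        }
        # Data rows have the same number of fields as the header row.
        ncols && NF == ncols && ("r/s" in col) && ("w/s" in col) && ("%util" in col) {
            if ($(col["r/s"]) + $(col["w/s"]) == 0 && $(col["%util"]) + 0 >= limit + 0)
                print $1, $(col["%util"])
        }'
}

# Possible usage after creating the array:
#   iostat -xy 1 1 | flag_idle_busy 99
```

This only catches devices that are completely idle. A device such as md2 in the description, which has real writes and still shows 100% util, would need a separate check.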
Tags: No tags attached.

Activities

TrevorH

2021-04-06 13:18

manager   ~0038362

Spammer replied, please ignore "advice" and do not click the URL
pschiffe

2021-04-06 14:54

reporter   ~0038363

I'm also seeing this bug on two of my systems, without any significant load:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.37    0.00    2.98    0.13    0.00   79.52

Device:  rrqm/s  wrqm/s   r/s     w/s  rkB/s    wkB/s  avgrq-sz    avgqu-sz  await  r_await  w_await   svctm   %util
sda        0.00    0.00  0.00   32.00   0.00   107.50      6.72        0.00   0.09     0.00     0.09    0.09    0.30
sdb        0.00    0.00  0.00   32.00   0.00   107.50      6.72        0.01   0.22     0.00     0.22    0.19    0.60
sdc        0.00    0.00  0.00   28.00   0.00   527.00     37.64        0.17   5.89     0.00     5.89    5.39   15.10
sdd        0.00    0.00  0.00   28.00   0.00   527.00     37.64        0.16   5.86     0.00     5.86    5.25   14.70
sde        0.00    0.00  0.00   12.00   0.00    31.50      5.25        0.13  10.75     0.00    10.75   10.75   12.90
sdf        0.00    0.00  0.00   12.00   0.00    31.50      5.25        0.11   9.33     0.00     9.33    9.33   11.20
md127      0.00    0.00  0.00    0.00   0.00     0.00      0.00     4267.00   0.00     0.00     0.00    0.00  100.00
md126      0.00    0.00  0.00   28.00   0.00   527.00     37.64   229281.65   0.00     0.00     0.00   35.71  100.00
dm-0       0.00    0.00  0.00   28.00   0.00   534.00     38.14        0.19   6.82     0.00     6.82    6.21   17.40
md125      0.00    0.00  0.00   24.00   0.00    96.00      8.00   986233.03   0.00     0.00     0.00   41.67  100.00
md124      0.00    0.00  0.00    6.00   0.00    14.50      4.83   103425.38   0.00     0.00     0.00  166.67  100.00
md123      0.00    0.00  0.00    0.00   0.00     0.00      0.00      379.00   0.00     0.00     0.00    0.00  100.00
md122      0.00    0.00  0.00    0.00   0.00     0.00      0.00      642.00   0.00     0.00     0.00    0.00  100.00
pschiffe

2021-04-12 09:46

reporter   ~0038371

I was also able to reproduce the bug on RHEL 7.9, so I've reported the issue upstream as well: https://bugzilla.redhat.com/show_bug.cgi?id=1948494

Issue History

Date Modified Username Field Change
2021-04-02 17:13 libbkmz New Issue
2021-04-06 13:18 TrevorH Note Added: 0038362
2021-04-06 14:54 pschiffe Note Added: 0038363
2021-04-12 09:46 pschiffe Note Added: 0038371