View Issue Details

ID:                0009284
Project:           CentOS-7
Category:          kernel
View Status:       public
Last Update:       2017-05-09 23:29
Reporter:          avleenvig
Priority:          high
Severity:          major
Reproducibility:   always
Status:            assigned
Resolution:        open
Platform:
OS:
OS Version:        7
Product Version:   7.1-1503
Target Version:
Fixed in Version:
Summary:           0009284: NFS hang in nfs_pageio_doio+0x50/0x50 [nfs]

Description: We have a CentOS 7 NFS server mounted by 60 hosts.
All servers have 10G network uplinks.
When servers kick off their nightly backups (Elasticsearch, writing files directly to the NFS mount), network throughput jumps to about 4Gbit/sec on the NFS server.

At some point, the clients will stall trying to write to the server.
They end up with an open file handle on the NFS mount, stuck in uninterruptible sleep on disk I/O.
We can reproduce this quite reliably. It doesn't happen on all 60 clients, but it hits at least 4-5 of them each time, and a reboot is required to clear the issue.

Stack trace as seen in dmesg on the clients:
[21361.356090] INFO: task java:29178 blocked for more than 120 seconds.
[21361.357289] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[21361.358459] java D ffff88103fd13680 0 29178 1 0x00000080
[21361.359621] ffff880e4e8e3ab8 0000000000000086 ffff88202640c440 ffff880e4e8e3fd8
[21361.360765] ffff880e4e8e3fd8 ffff880e4e8e3fd8 ffff88202640c440 ffff88103fd13f48
[21361.361899] ffff88107ff90cd0 0000000000000002 ffffffffa06505a0 ffff880e4e8e3b30
[21361.363033] Call Trace:
[21361.364194] [<ffffffffa06505a0>] ? nfs_pageio_doio+0x50/0x50 [nfs]
[21361.365337] [<ffffffff816099ad>] io_schedule+0x9d/0x130
[21361.366431] [<ffffffffa06505ae>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
[21361.367527] [<ffffffff81607780>] __wait_on_bit+0x60/0x90
[21361.368604] [<ffffffffa06505a0>] ? nfs_pageio_doio+0x50/0x50 [nfs]
[21361.369690] [<ffffffff81607837>] out_of_line_wait_on_bit+0x87/0xb0
[21361.370845] [<ffffffff81098270>] ? autoremove_wake_function+0x40/0x40
[21361.371937] [<ffffffffa0651ad3>] nfs_wait_on_request+0x33/0x40 [nfs]
[21361.373022] [<ffffffffa0656351>] nfs_updatepage+0x121/0x8a0 [nfs]
[21361.374098] [<ffffffffa0646c11>] nfs_write_end+0x121/0x350 [nfs]
[21361.375164] [<ffffffff81156584>] generic_file_buffered_write+0x184/0x290
[21361.376247] [<ffffffff81158685>] __generic_file_aio_write+0x1d5/0x3e0
[21361.377325] [<ffffffff811588ed>] generic_file_aio_write+0x5d/0xc0
[21361.378393] [<ffffffffa0645d1b>] nfs_file_write+0xbb/0x1d0 [nfs]
[21361.379452] [<ffffffff811c5ebd>] do_sync_write+0x8d/0xd0
[21361.380491] [<ffffffff811c665d>] vfs_write+0xbd/0x1e0
[21361.381558] [<ffffffff811c70a8>] SyS_write+0x58/0xb0
[21361.382601] [<ffffffff81614209>] system_call_fastpath+0x16/0x1b

We've tried NFS v3 and v4, soft and hard mount options. The mounts are all using TCP.
Mount options on one of the hosts currently impacted:
rw,relatime,vers=4.0,rsize=8192,wsize=8192,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.100.16.39,local_lock=none
Steps To Reproduce: Mount an NFS server, write to it from a few dozen clients concurrently for a couple of hours, and then wait.
Additional Information: We can reproduce this quite reliably.
Another gentleman has also seen this and gisted his stack trace:
https://gist.github.com/Millnert/ecc10d8cc79c81b55d7f

In speaking with him, he hasn't found a fix for this yet either.
Reverting to CentOS 6.5 resolves the problem.
Tags: No tags attached.


Notes

~0024435

tru (administrator)

Thanks for the report.
Is the disc subsystem on the NFS server able to cope with the incoming network data?

~0024437

avleenvig (reporter)

Hi Tru! Thanks for the reply.

Yes, the disk subsystem definitely can.
The system has 36 x 4TB disks in RAID 6, with a battery-backed write cache enabled.
In testing, we've seen the server sustain over 1 GB/sec in writes.

When we hit this problem, there is little I/O wait, and we're only doing ~4 Gbit/sec of network traffic at most - about 400-500 MB/sec of writes to disk.

~0026310

jlohrey (reporter)

I have encountered exactly the same issue with 7.2.1511.

The stack is exactly the same:

[ 6357.417419] INFO: task java:7305 blocked for more than 120 seconds.
[ 6357.417424] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6357.417427] java D ffff8808fe147b20 0 7305 2725 0x00000080
[ 6357.417433] ffff8808fe1479c0 0000000000000086 ffff881fd01ad080 ffff8808fe147fd8
[ 6357.417443] ffff8808fe147fd8 ffff8808fe147fd8 ffff881fd01ad080 ffff883fff2f4780
[ 6357.417450] 0000000000000000 7fffffffffffffff ffffffffa05f3bb0 ffff8808fe147b20
[ 6357.417459] Call Trace:
[ 6357.417483] [<ffffffffa05f3bb0>] ? nfs_pgio_header_free+0x20/0x20 [nfs]
[ 6357.417492] [<ffffffff8163a9e9>] schedule+0x29/0x70
[ 6357.417497] [<ffffffff816386d9>] schedule_timeout+0x209/0x2d0
[ 6357.417507] [<ffffffff8101cd15>] ? native_sched_clock+0x35/0x80
[ 6357.417512] [<ffffffff8101c829>] ? read_tsc+0x9/0x10
[ 6357.417526] [<ffffffffa05f3bb0>] ? nfs_pgio_header_free+0x20/0x20 [nfs]
[ 6357.417532] [<ffffffff8163a01e>] io_schedule_timeout+0xae/0x130
[ 6357.417537] [<ffffffff8163a0b8>] io_schedule+0x18/0x20
[ 6357.417549] [<ffffffffa05f3bbe>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
[ 6357.417554] [<ffffffff81638860>] __wait_on_bit+0x60/0x90
[ 6357.417565] [<ffffffffa05f3bb0>] ? nfs_pgio_header_free+0x20/0x20 [nfs]
[ 6357.417570] [<ffffffff81638917>] out_of_line_wait_on_bit+0x87/0xb0
[ 6357.417575] [<ffffffff810a6b60>] ? wake_atomic_t_function+0x40/0x40
[ 6357.417589] [<ffffffffa05f5583>] nfs_wait_on_request+0x33/0x40 [nfs]
[ 6357.417603] [<ffffffffa05fa151>] nfs_updatepage+0x121/0x8a0 [nfs]
[ 6357.417615] [<ffffffffa05e9fa1>] nfs_write_end+0x121/0x350 [nfs]
[ 6357.417622] [<ffffffff81300159>] ? copy_user_enhanced_fast_string+0x9/0x20
[ 6357.417629] [<ffffffff81169ec4>] generic_file_buffered_write+0x184/0x290
[ 6357.417636] [<ffffffff815a120b>] ? inet_recvmsg+0x7b/0xa0
[ 6357.417641] [<ffffffff815a120b>] ? inet_recvmsg+0x7b/0xa0
[ 6357.417647] [<ffffffff8116b385>] __generic_file_aio_write+0x1d5/0x3e0
[ 6357.417653] [<ffffffff810bff6b>] ? should_numa_migrate_memory+0x5b/0x150
[ 6357.417658] [<ffffffff8116b5ed>] generic_file_aio_write+0x5d/0xc0
[ 6357.417669] [<ffffffffa05e903b>] nfs_file_write+0xbb/0x1d0 [nfs]
[ 6357.417675] [<ffffffff811dde0d>] do_sync_write+0x8d/0xd0
[ 6357.417680] [<ffffffff811de62d>] vfs_write+0xbd/0x1e0
[ 6357.417685] [<ffffffff811df282>] SyS_pwrite64+0x92/0xc0
[ 6357.417692] [<ffffffff81645a49>] system_call_fastpath+0x16/0x1b
[ 6477.325456] INFO: task java:7305 blocked for more than 120 seconds.

The client thread/application is stuck (indefinitely) on a pwrite system call.

This happens so regularly that it is unworkable. It is broken.

Again, TCP mounts, 10GE, NFSv3.

We'll try reverting to 6.5 -- although we are also looking at writing our own NFS client to bypass the kernel completely.

~0027182

sergei_dubarev (reporter)

JFYI, there is a discussion of a similar issue at http://marc.info?l=linux-nfs&m=146824199819382&w=4. The solution is to discard zero-length requests in nfs_updatepage().
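
For illustration only, the shape of that change (a minimal sketch of the idea described in that thread, not the verbatim upstream patch) is an early bail-out at the top of nfs_updatepage() in fs/nfs/write.c, before any page request is created or coalesced:

/*
 * Minimal sketch, not the actual upstream diff: return early from
 * nfs_updatepage() when the write covers zero bytes, so no zero-length
 * nfs_page request is ever created or coalesced for the page (the hangs
 * above are in nfs_wait_on_request(), called from nfs_updatepage()).
 */
int nfs_updatepage(struct file *file, struct page *page,
                   unsigned int offset, unsigned int count)
{
        int status = 0;

        /* Guard from the proposed fix: a zero-byte update has nothing
         * to write, so skip request creation entirely. */
        if (!count)
                goto out;

        /* ... existing request setup and coalescing logic unchanged ... */
out:
        return status;
}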

~0028086

fozboz (reporter)

kernel: INFO: task pigz:19887 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: pigz D ffff880388527b30 0 19887 19884 0x00000080
kernel: ffff8803885279d0 0000000000000086 ffff8802a2721700 ffff880388527fd8
kernel: ffff880388527fd8 ffff880388527fd8 ffff8802a2721700 ffff880295b567c0
kernel: 0000000000000000 7fffffffffffffff ffffffffa0744bc0 ffff880388527b30
kernel: Call Trace:
kernel: [<ffffffffa0744bc0>] ? nfs_pgio_header_free+0x20/0x20 [nfs]
kernel: [<ffffffff8163bb39>] schedule+0x29/0x70
kernel: [<ffffffff81639829>] schedule_timeout+0x209/0x2d0
kernel: [<ffffffff8101cd45>] ? native_sched_clock+0x35/0x80
kernel: [<ffffffff8101c859>] ? read_tsc+0x9/0x10
kernel: [<ffffffffa0744bc0>] ? nfs_pgio_header_free+0x20/0x20 [nfs]
kernel: [<ffffffff8163b16e>] io_schedule_timeout+0xae/0x130
kernel: [<ffffffff8163b208>] io_schedule+0x18/0x20
kernel: [<ffffffffa0744bce>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
kernel: [<ffffffff816399b0>] __wait_on_bit+0x60/0x90
kernel: [<ffffffffa0744bc0>] ? nfs_pgio_header_free+0x20/0x20 [nfs]
kernel: [<ffffffff81639a67>] out_of_line_wait_on_bit+0x87/0xb0
kernel: [<ffffffff810a6c00>] ? wake_atomic_t_function+0x40/0x40
kernel: [<ffffffffa0746593>] nfs_wait_on_request+0x33/0x40 [nfs]
kernel: [<ffffffffa074b161>] nfs_updatepage+0x121/0x8a0 [nfs]
kernel: [<ffffffffa073afa1>] nfs_write_end+0x121/0x350 [nfs]
kernel: [<ffffffff8116a224>] generic_file_buffered_write+0x184/0x290
kernel: [<ffffffff8116b6e5>] __generic_file_aio_write+0x1d5/0x3e0
kernel: [<ffffffff810b8796>] ? try_to_wake_up+0x1b6/0x300
kernel: [<ffffffff8116b94d>] generic_file_aio_write+0x5d/0xc0
kernel: [<ffffffffa073a03b>] nfs_file_write+0xbb/0x1d0 [nfs]
kernel: [<ffffffff811de54d>] do_sync_write+0x8d/0xd0
kernel: [<ffffffff811ded6d>] vfs_write+0xbd/0x1e0
kernel: [<ffffffff811df80f>] SyS_write+0x7f/0xe0
kernel: [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b

CentOS 7, kernel 3.10.0-327

This has been sitting here unassigned for over a year. Can someone move this along, please? It seems a patch has already been provided. This is causing us serious issues copying backup tarballs over NFS.

~0028087

toracat (manager)

Here's the patch:

$ git describe 149a4fddd0a72d526abbeac0c8deaab03559836a
v4.7-rc5-271-g149a4fd

Author: Benjamin Coddington <bcodding@redhat.com>
Date: Mon Jul 18 10:41:57 2016 -0400

The patch was added to the upstream (kernel.org) kernel in July this year. The CentOS kernel will get the fix only when it appears in the upstream (RHEL) kernel.

In the meantime, the centosplus kernel (kernel-plus) can accommodate the patch if it applies cleanly.

~0028092

fozboz (reporter)

Thank you.

It applied cleanly and I'm now running a patched centosplus kernel. (I'm not sure why I couldn't have patched the upstream kernel, though; the only differences between the upstream and plus kernels seem to be some extra modules.)

I'll update next week to see if the problem has gone away.

~0028097

avleenvig (reporter)

I'm running the new kernel now, and unfortunately the problem seems to have just shifted slightly; we still get hangs. Now they're in nfs_initiate_pgio:

Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: INFO: task java:10449 blocked for more than 120 seconds.
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: java D ffffffff816895f0 0 10449 1 0x00000080
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: ffff8820966a7be0 0000000000000086 ffff88011011de20 ffff8820966a7fd8
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: ffff8820966a7fd8 ffff8820966a7fd8 ffff88011011de20 ffff883ffe2d6c40
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: 0000000000000000 7fffffffffffffff ffff88407ffca3e8 ffffffff816895f0
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: Call Trace:
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff816895f0>] ? bit_wait+0x50/0x50
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff8168b579>] schedule+0x29/0x70
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffffa04ed124>] ? nfs_initiate_pgio+0xd4/0x160 [nfs]
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff816895f0>] ? bit_wait+0x50/0x50
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff8168ab1e>] io_schedule_timeout+0xae/0x130
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff8168abb8>] io_schedule+0x18/0x20
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff81689601>] bit_wait_io+0x11/0x50
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff81689125>] __wait_on_bit+0x65/0x90
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff8117fb61>] wait_on_page_bit+0x81/0xa0
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff810b16c0>] ? wake_bit_function+0x40/0x40
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff8117fc91>] filemap_fdatawait_range+0x111/0x1b0
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff81181f96>] filemap_write_and_wait_range+0x56/0x90
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffffa04e1df6>] nfs_file_fsync+0x86/0x110 [nfs]
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff8122fbbb>] vfs_fsync+0x2b/0x40
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffffa04e2256>] nfs_file_flush+0x46/0x60 [nfs]
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff811fb8d4>] filp_close+0x34/0x80
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff8121cc38>] __close_fd+0x78/0xa0
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff811fd473>] SyS_close+0x23/0x50
Nov 18 20:15:18 logdbarchive12.ny2.etsy.com kernel: [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b

~0028098

toracat (manager)

Can you test elrepo's kernel-ml?

~0028292

fozboz (reporter)

I still get the nfs_pgio_header_free hang after the patch.

Testing kernel-ml now.

~0028578

fozboz (reporter)

This is apparently now fixed in kernel-3.10.0-514.el7, which is being distributed in CentOS 7.3, although it includes the same patch that I applied, which didn't resolve the issue for me. https://access.redhat.com/solutions/2245341

Unfortunately the problem hasn't occurred recently on patched 3.10.0-327 or 4.9.0-1, so I haven't proved anything.

Now running 3.10.0-327, 3.10.0-514 and 4.9.0-1!

~0028580

avleenvig (reporter)

Sorry it's taken me a while to circle back to this.

514 definitely did not fix the issue for us. We're testing 4.8, 4.9 and 4.10 right now, all of which seem to be fine - so the fix for us is to use kernel-ml for now.

~0029257

jagtapdeepak (reporter)

Hey Guys,
 
We are also running into the same issue. We have guest VMs created on an NFS (v3) datastore, and after running I/O for a while the guest VMs freeze and the NFS server shows the same stack trace. Any suggestions on which kernel version should have this fix? We are currently on 3.10.0-327.

Thanks,
Deepak

Issue History
Date Modified Username Field Change
2015-08-19 02:18 avleenvig New Issue
2015-09-24 09:05 tru Note Added: 0024435
2015-09-24 13:24 avleenvig Note Added: 0024437
2016-04-17 11:07 jlohrey Note Added: 0026310
2016-08-02 20:16 sergei_dubarev Note Added: 0027182
2016-12-06 16:59 fozboz Note Added: 0028086
2016-12-06 18:10 toracat Note Added: 0028087
2016-12-06 18:12 toracat Status new => assigned
2016-12-07 15:29 fozboz Note Added: 0028092
2016-12-07 19:05 avleenvig Note Added: 0028097
2016-12-07 19:24 toracat Note Added: 0028098
2017-01-06 11:39 fozboz Note Added: 0028292
2017-02-15 12:59 fozboz Note Added: 0028578
2017-02-15 15:15 avleenvig Note Added: 0028580
2017-05-09 23:29 jagtapdeepak Note Added: 0029257