View Issue Details

ID: 0013714
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2018-12-19 06:14
Reporter: skilledno1
Priority: normal
Severity: major
Reproducibility: sometimes
Status: new
Resolution: open
Platform: x86_64
OS: CentOS Linux
OS Version: release 7.3.1611
Product Version: 7.3.1611
Target Version:
Fixed in Version:
Summary: 0013714: ps aux hangs with call_rwsem_down_write_failed when running docker container with resource limit
Description: Hi all. I run a Java application in a container with a resource limit. OOM-kill events occur every 2 or 3 minutes, and when I then run ps aux, the command hangs. The problem is easy to reproduce when running Docker instances under Kubernetes. khugepaged had already been disabled.
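
To put a number on "every 2 or 3 minutes", the gaps between oom-killer events can be measured from syslog lines. The following is a minimal sketch, assuming the line format shown in step 4 of this report; the function names and the default year (syslog timestamps carry no year) are assumptions made for illustration.

```python
import re
from datetime import datetime

# Matches syslog lines such as:
# "Aug 21 10:43:29 k8s-dbg-master-1 kernel: java invoked oom-killer: ..."
OOM_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (\S+) invoked oom-killer"
)

def oom_events(lines, year=2017):
    """Return a list of (timestamp, command) for each oom-killer invocation.

    The year is an assumption: syslog timestamps do not include one.
    """
    events = []
    for line in lines:
        m = OOM_RE.match(line)
        if m:
            when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((when, m.group(2)))
    return events

def intervals_seconds(events):
    """Return the gaps, in seconds, between consecutive events."""
    times = [when for when, _ in events]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Feeding it the lines of /var/log/messages (for example via `open("/var/log/messages")`) gives the spacing between events. Month names are parsed with `%b`, so this assumes an English locale for the log.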

strace output from the hanging ps aux:
```
read(6, "Name:\tpause\nState:\tS (sleeping)\n"..., 2048) = 1065
close(6) = 0
open("/proc/43136/cmdline", O_RDONLY) = 6
read(6, "/pause\0", 131072) = 7
read(6, "", 131065) = 0
close(6) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=388, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=388, ...}) = 0
write(1, "root 43136 0.0 0.0 1020 "..., 73root 43136 0.0 0.0 1020 4 ? Ss 8月18 0:00 /pause
) = 73
stat("/proc/43140", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/44860/stat", O_RDONLY) = 6
read(6, "44860 (exe) D 1 43140 43140 0 -1"..., 2048) = 319
close(6) = 0
open("/proc/44860/status", O_RDONLY) = 6
read(6, "Name:\texe\nState:\tD (disk sleep)\n"..., 2048) = 1060
close(6) = 0
open("/proc/44860/cmdline", O_RDONLY) = 6
read(6,
```
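
In the trace above, the reads of /proc/44860/stat and /proc/44860/status return, and only the read of cmdline blocks. Based on that observation, a monitoring script could list tasks in state D by reading only the stat file. This is a minimal sketch under that assumption (that stat stays readable while cmdline does not); the function names are hypothetical.

```python
import os

def parse_stat(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, right = line.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_text), comm, state

def d_state_tasks(proc_root="/proc"):
    """List (pid, comm) for tasks in uninterruptible sleep (state D)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the process exited or the line could not be parsed
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `d_state_tasks()` on the affected host would be expected to include PID 44860 (exe), without touching the cmdline file that ps blocks on.
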
process stack info:
```
[<ffffffff81301813>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff81187a1c>] vm_mmap_pgoff+0x8c/0xe0
[<ffffffff8119cb86>] SyS_mmap_pgoff+0x116/0x270
[<ffffffff81019712>] SyS_mmap+0x22/0x30
[<ffffffff81645909>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
```
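
Stack dumps in this format are easier to compare across tasks once each frame is split into fields. The sketch below parses one frame line into address, symbol, offset, size, and module; a leading "?" marks a frame the kernel unwinder could not verify. The `Frame` type and `parse_frame` name are hypothetical helpers, not part of any tool mentioned in this report.

```python
import re
from collections import namedtuple

Frame = namedtuple("Frame", "address symbol offset size module reliable")

# Example lines:
#   [<ffffffff81301813>] call_rwsem_down_write_failed+0x13/0x20
#   [17983.299074] [<ffffffffa05ee236>] ? xfs_setfilesize+0x56/0x130 [xfs]
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
    r"(?:\s+\[(\w+)\])?"
)

def parse_frame(line):
    """Return a Frame for a stack-trace line, or None if it is not a frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    address, unreliable, symbol, offset, size, module = m.groups()
    return Frame(
        address=int(address, 16),
        symbol=symbol,
        offset=int(offset, 16),
        size=int(size, 16),
        module=module,               # None for core kernel symbols
        reliable=unreliable is None  # False when the frame is marked with "?"
    )
```

Lines without a `symbol+0xoffset/0xsize` part, such as the terminating `0xffffffffffffffff` entry, yield `None`.
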

Many hung-task call traces followed in the kernel log:
```
[17983.298787] INFO: task kworker/13:0:172 blocked for more than 120 seconds.
[17983.298864] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17983.298931] kworker/13:0 D ffff880321a01a20 0 172 2 0x00000000
[17983.298986] Workqueue: xfs-data/dm-22 xfs_end_io [xfs]
[17983.298990] ffff880819be7c98 0000000000000046 ffff880819bd2280 ffff880819be7fd8
[17983.298999] ffff880819be7fd8 ffff880819be7fd8 ffff880819bd2280 ffff880819bd2280
[17983.299005] ffff880321a01a10 ffff880321a01a18 ffffffff00000000 ffff880321a01a20
[17983.299012] Call Trace:
[17983.299027] [<ffffffff8163a909>] schedule+0x29/0x70
[17983.299035] [<ffffffff8163c0c5>] rwsem_down_write_failed+0x115/0x220
[17983.299044] [<ffffffff810c3401>] ? enqueue_entity+0x181/0x890
[17983.299074] [<ffffffffa05ee236>] ? xfs_setfilesize+0x56/0x130 [xfs]
[17983.299083] [<ffffffff81301813>] call_rwsem_down_write_failed+0x13/0x20
[17983.299089] [<ffffffff81639b5d>] ? down_write+0x2d/0x30
[17983.299122] [<ffffffffa06069a1>] xfs_ilock+0xc1/0x120 [xfs]
[17983.299150] [<ffffffffa05ee236>] xfs_setfilesize+0x56/0x130 [xfs]
[17983.299177] [<ffffffffa05ef182>] xfs_end_io+0x62/0xc0 [xfs]
[17983.299183] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[17983.299188] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[17983.299193] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[17983.299199] [<ffffffff810a5aef>] kthread+0xcf/0xe0
[17983.299206] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[17983.299214] [<ffffffff81645858>] ret_from_fork+0x58/0x90
[17983.299220] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[17983.299374] INFO: task kworker/13:2:9980 blocked for more than 120 seconds.
[17983.299433] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17983.299499] kworker/13:2 D ffff880819bd2a84 0 9980 2 0x00000080
[17983.299529] Workqueue: xfs-data/dm-22 xfs_end_io [xfs]
[17983.299531] ffff880805197c98 0000000000000046 ffff8807f98dd080 ffff880805197fd8
[17983.299538] ffff880805197fd8 ffff880805197fd8 ffff8807f98dd080 ffff8807f98dd080
[17983.299544] ffff880321a01a10 ffff880321a01a18 ffffffff00000000 ffff880321a01a20
[17983.299550] Call Trace:
[17983.299556] [<ffffffff8163a909>] schedule+0x29/0x70
[17983.299562] [<ffffffff8163c0c5>] rwsem_down_write_failed+0x115/0x220
[17983.299567] [<ffffffff810a6b00>] ? autoremove_wake_function+0x20/0x40
[17983.299593] [<ffffffffa05ee236>] ? xfs_setfilesize+0x56/0x130 [xfs]
[17983.299600] [<ffffffff81301813>] call_rwsem_down_write_failed+0x13/0x20
[17983.299605] [<ffffffff81639b5d>] ? down_write+0x2d/0x30
[17983.299636] [<ffffffffa06069a1>] xfs_ilock+0xc1/0x120 [xfs]
[17983.299662] [<ffffffffa05ee236>] xfs_setfilesize+0x56/0x130 [xfs]
[17983.299687] [<ffffffffa05ef182>] xfs_end_io+0x62/0xc0 [xfs]
[17983.299712] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[17983.299717] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[17983.299722] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[17983.299728] [<ffffffff810a5aef>] kthread+0xcf/0xe0
[17983.299734] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[17983.299740] [<ffffffff81645858>] ret_from_fork+0x58/0x90
[17983.299745] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[17983.299915] INFO: task java:43364 blocked for more than 120 seconds.
[17983.299970] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17983.300036] java D ffff8803dfc8f4c0 0 43364 43299 0x00000080
[17983.300041] ffff8803dfc8f360 0000000000000086 ffff8800babbe780 ffff8803dfc8ffd8
[17983.300047] ffff8803dfc8ffd8 ffff8803dfc8ffd8 ffff8800babbe780 ffff88041fa34780
[17983.300056] 0000000000000000 7fffffffffffffff ffffffff811688b0 ffff8803dfc8f4c0
[17983.300062] Call Trace:
[17983.300069] [<ffffffff811688b0>] ? wait_on_page_read+0x60/0x60
[17983.300075] [<ffffffff8163a909>] schedule+0x29/0x70
[17983.300080] [<ffffffff816385f9>] schedule_timeout+0x209/0x2d0
[17983.300089] [<ffffffff812c74a7>] ? queue_unplugged+0x37/0xa0
[17983.300100] [<ffffffff8101c829>] ? read_tsc+0x9/0x10
[17983.300105] [<ffffffff811688b0>] ? wait_on_page_read+0x60/0x60
[17983.300110] [<ffffffff81639f3e>] io_schedule_timeout+0xae/0x130
[17983.300115] [<ffffffff81639fd8>] io_schedule+0x18/0x20
[17983.300119] [<ffffffff811688be>] sleep_on_page+0xe/0x20
[17983.300124] [<ffffffff81638780>] __wait_on_bit+0x60/0x90
[17983.300130] [<ffffffff81168646>] wait_on_page_bit+0x86/0xb0
[17983.300135] [<ffffffff810a6b60>] ? wake_atomic_t_function+0x40/0x40
[17983.300154] [<ffffffffa0337e0e>] ? dm_any_congested+0x4e/0x60 [dm_mod]
[17983.300163] [<ffffffff8117d472>] shrink_page_list+0x6c2/0xad0
[17983.300171] [<ffffffff8117df3a>] shrink_inactive_list+0x1ea/0x560
[17983.300176] [<ffffffff8117ea05>] shrink_lruvec+0x375/0x760
[17983.300182] [<ffffffff8117ee66>] shrink_zone+0x76/0x1a0
[17983.300187] [<ffffffff8117f370>] do_try_to_free_pages+0xf0/0x4e0
[17983.300197] [<ffffffffa0336aca>] ? __map_bio+0x3a/0x100 [dm_mod]
[17983.300204] [<ffffffff8117f9aa>] try_to_free_mem_cgroup_pages+0xca/0x160
[17983.300212] [<ffffffff811d207e>] mem_cgroup_reclaim+0x4e/0xe0
[17983.300217] [<ffffffff811d253c>] __mem_cgroup_try_charge+0x42c/0x650
[17983.300226] [<ffffffff81216ac8>] ? __bio_add_page+0x1f8/0x2a0
[17983.300231] [<ffffffff811d2e89>] mem_cgroup_charge_common+0x59/0xc0
[17983.300236] [<ffffffff811d505a>] mem_cgroup_cache_charge+0x8a/0xb0
[17983.300242] [<ffffffff811694e2>] __add_to_page_cache_locked+0x52/0x260
[17983.300249] [<ffffffff81169747>] add_to_page_cache_lru+0x37/0xb0
[17983.300254] [<ffffffff8121fae5>] mpage_readpages+0xb5/0x160
[17983.300281] [<ffffffffa05ef910>] ? __xfs_get_blocks+0x4b0/0x4b0 [xfs]
[17983.300308] [<ffffffffa05ef910>] ? __xfs_get_blocks+0x4b0/0x4b0 [xfs]
[17983.300334] [<ffffffffa05eea5d>] xfs_vm_readpages+0x1d/0x20 [xfs]
[17983.300344] [<ffffffff81175cdc>] __do_page_cache_readahead+0x1cc/0x250
[17983.300350] [<ffffffff81176321>] ra_submit+0x21/0x30
[17983.300355] [<ffffffff8116b7ed>] filemap_fault+0x11d/0x430
[17983.300385] [<ffffffffa05fa1df>] xfs_filemap_fault+0x4f/0xa0 [xfs]
[17983.300395] [<ffffffff81192b2e>] __do_fault+0x7e/0x510
[17983.300402] [<ffffffff81197088>] handle_mm_fault+0x5b8/0xf50
[17983.300411] [<ffffffff810e5092>] ? do_futex+0x122/0x5b0
[17983.300417] [<ffffffff81640e22>] __do_page_fault+0x152/0x420
[17983.300422] [<ffffffff81641113>] do_page_fault+0x23/0x80
[17983.300428] [<ffffffff8163d408>] page_fault+0x28/0x30
[17983.300435] INFO: task java:43383 blocked for more than 120 seconds.
[17983.300490] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17983.300555] java D ffff880324b4f4c0 0 43383 43299 0x00000080
[17983.300561] ffff880324b4f360 0000000000000086 ffff880326372e00 ffff880324b4ffd8
[17983.300567] ffff880324b4ffd8 ffff880324b4ffd8 ffff880326372e00 ffff88041fa34780
[17983.300573] 0000000000000000 7fffffffffffffff ffffffff811688b0 ffff880324b4f4c0
[17983.300579] Call Trace:
[17983.300584] [<ffffffff811688b0>] ? wait_on_page_read+0x60/0x60
[17983.300590] [<ffffffff8163a909>] schedule+0x29/0x70
[17983.300596] [<ffffffff816385f9>] schedule_timeout+0x209/0x2d0
[17983.300602] [<ffffffff812c74a7>] ? queue_unplugged+0x37/0xa0
[17983.300608] [<ffffffff8101c829>] ? read_tsc+0x9/0x10
[17983.300612] [<ffffffff811688b0>] ? wait_on_page_read+0x60/0x60
[17983.300617] [<ffffffff81639f3e>] io_schedule_timeout+0xae/0x130
[17983.300622] [<ffffffff81639fd8>] io_schedule+0x18/0x20
[17983.300626] [<ffffffff811688be>] sleep_on_page+0xe/0x20
[17983.300633] [<ffffffff81638780>] __wait_on_bit+0x60/0x90
[17983.300637] [<ffffffff81168646>] wait_on_page_bit+0x86/0xb0
[17983.300642] [<ffffffff810a6b60>] ? wake_atomic_t_function+0x40/0x40
[17983.300653] [<ffffffffa0337e0e>] ? dm_any_congested+0x4e/0x60 [dm_mod]
[17983.300658] [<ffffffff8117d472>] shrink_page_list+0x6c2/0xad0
[17983.300664] [<ffffffff8117c843>] ? isolate_lru_pages.isra.43+0xd3/0x190
[17983.300671] [<ffffffff8117df3a>] shrink_inactive_list+0x1ea/0x560
[17983.300676] [<ffffffff8117ea05>] shrink_lruvec+0x375/0x760
[17983.300681] [<ffffffff8109b426>] ? __queue_work+0x136/0x320
[17983.300685] [<ffffffff8109b426>] ? __queue_work+0x136/0x320
[17983.300702] [<ffffffff8117ee66>] shrink_zone+0x76/0x1a0
[17983.300708] [<ffffffff8117f370>] do_try_to_free_pages+0xf0/0x4e0
[17983.300717] [<ffffffffa0336aca>] ? __map_bio+0x3a/0x100 [dm_mod]
[17983.300722] [<ffffffff8117f9aa>] try_to_free_mem_cgroup_pages+0xca/0x160
[17983.300728] [<ffffffff811d207e>] mem_cgroup_reclaim+0x4e/0xe0
[17983.300741] [<ffffffff811d253c>] __mem_cgroup_try_charge+0x42c/0x650
[17983.300747] [<ffffffff81216ac8>] ? __bio_add_page+0x1f8/0x2a0
[17983.300752] [<ffffffff811d2e89>] mem_cgroup_charge_common+0x59/0xc0
[17983.300757] [<ffffffff811d505a>] mem_cgroup_cache_charge+0x8a/0xb0
[17983.300761] [<ffffffff811694e2>] __add_to_page_cache_locked+0x52/0x260
[17983.300766] [<ffffffff81169747>] add_to_page_cache_lru+0x37/0xb0
[17983.300770] [<ffffffff8121fae5>] mpage_readpages+0xb5/0x160
[17983.300798] [<ffffffffa05ef910>] ? __xfs_get_blocks+0x4b0/0x4b0 [xfs]
[17983.300823] [<ffffffffa05ef910>] ? __xfs_get_blocks+0x4b0/0x4b0 [xfs]
[17983.300849] [<ffffffffa05eea5d>] xfs_vm_readpages+0x1d/0x20 [xfs]
[17983.300855] [<ffffffff81175cdc>] __do_page_cache_readahead+0x1cc/0x250
[17983.300864] [<ffffffff81176321>] ra_submit+0x21/0x30
[17983.300868] [<ffffffff8116b7ed>] filemap_fault+0x11d/0x430
[17983.300897] [<ffffffffa05fa1df>] xfs_filemap_fault+0x4f/0xa0 [xfs]
[17983.300904] [<ffffffff81192b2e>] __do_fault+0x7e/0x510
[17983.300910] [<ffffffff81197088>] handle_mm_fault+0x5b8/0xf50
[17983.300916] [<ffffffff81640e22>] __do_page_fault+0x152/0x420
[17983.300921] [<ffffffff81641113>] do_page_fault+0x23/0x80
[17983.300927] [<ffffffff8163d408>] page_fault+0x28/0x30
[17983.300932] INFO: task java:43385 blocked for more than 120 seconds.
[17983.300988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17983.301053] java D ffff8803efb23840 0 43385 43299 0x00000080
[17983.301058] ffff8803efb236e0 0000000000000086 ffff880326374500 ffff8803efb23fd8
[17983.301066] ffff8803efb23fd8 ffff8803efb23fd8 ffff880326374500 ffff88041fa34780
[17983.301072] 0000000000000000 7fffffffffffffff ffffffff811688b0 ffff8803efb23840
[17983.301078] Call Trace:
[17983.301083] [<ffffffff811688b0>] ? wait_on_page_read+0x60/0x60
[17983.301088] [<ffffffff8163a909>] schedule+0x29/0x70
[17983.301093] [<ffffffff816385f9>] schedule_timeout+0x209/0x2d0
[17983.301099] [<ffffffff812c74a7>] ? queue_unplugged+0x37/0xa0
[17983.301106] [<ffffffff8101c829>] ? read_tsc+0x9/0x10
[17983.301110] [<ffffffff811688b0>] ? wait_on_page_read+0x60/0x60
[17983.301115] [<ffffffff81639f3e>] io_schedule_timeout+0xae/0x130
[17983.301120] [<ffffffff81639fd8>] io_schedule+0x18/0x20
[17983.301124] [<ffffffff811688be>] sleep_on_page+0xe/0x20
[17983.301129] [<ffffffff81638780>] __wait_on_bit+0x60/0x90
[17983.301134] [<ffffffff81168646>] wait_on_page_bit+0x86/0xb0
[17983.301140] [<ffffffff810a6b60>] ? wake_atomic_t_function+0x40/0x40
[17983.301151] [<ffffffffa0337e0e>] ? dm_any_congested+0x4e/0x60 [dm_mod]
[17983.301156] [<ffffffff8117d472>] shrink_page_list+0x6c2/0xad0
[17983.301162] [<ffffffff8117c843>] ? isolate_lru_pages.isra.43+0xd3/0x190
[17983.301167] [<ffffffff8117df3a>] shrink_inactive_list+0x1ea/0x560
[17983.301173] [<ffffffff8117ea05>] shrink_lruvec+0x375/0x760
[17983.301180] [<ffffffff8117ee66>] shrink_zone+0x76/0x1a0
[17983.301186] [<ffffffff8117f370>] do_try_to_free_pages+0xf0/0x4e0
[17983.301191] [<ffffffff8117f9aa>] try_to_free_mem_cgroup_pages+0xca/0x160
[17983.301196] [<ffffffff811d207e>] mem_cgroup_reclaim+0x4e/0xe0
[17983.301201] [<ffffffff811d253c>] __mem_cgroup_try_charge+0x42c/0x650
[17983.301207] [<ffffffff811d3dab>] __mem_cgroup_try_charge_swapin+0x9b/0xd0
[17983.301213] [<ffffffff81168b6e>] ? __find_get_page+0x1e/0xa0
[17983.301218] [<ffffffff811d4cb7>] mem_cgroup_try_charge_swapin+0x57/0x70
[17983.301223] [<ffffffff811972fd>] handle_mm_fault+0x82d/0xf50
[17983.301228] [<ffffffff81640e22>] __do_page_fault+0x152/0x420
[17983.301233] [<ffffffff81641113>] do_page_fault+0x23/0x80
[17983.301239] [<ffffffff8163d408>] page_fault+0x28/0x30
[17983.301244] INFO: task java:43387 blocked for more than 120 seconds.
[17983.301298] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17983.301363] java D ffff8803efb6f4c0 0 43387 43299 0x00000080
[17983.301367] ffff8803efb6f360 0000000000000086 ffff880326375c00 ffff8803efb6ffd8
[17983.301375] ffff8803efb6ffd8 ffff8803efb6ffd8 ffff880326375c00 ffff88041fa54780
[17983.301381] 0000000000000000 7fffffffffffffff ffffffff811688b0 ffff8803efb6f4c0
[17983.301387] Call Trace:
[17983.301392] [<ffffffff811688b0>] ? wait_on_page_read+0x60/0x60
[17983.301397] [<ffffffff8163a909>] schedule+0x29/0x70
[17983.301402] [<ffffffff816385f9>] schedule_timeout+0x209/0x2d0
[17983.301408] [<ffffffff81065e58>] ? native_flush_tlb_others+0xb8/0xc0
[17983.301415] [<ffffffff8101c829>] ? read_tsc+0x9/0x10
[17983.301419] [<ffffffff811688b0>] ? wait_on_page_read+0x60/0x60
[17983.301424] [<ffffffff81639f3e>] io_schedule_timeout+0xae/0x130
[17983.301429] [<ffffffff81639fd8>] io_schedule+0x18/0x20
[17983.301433] [<ffffffff811688be>] sleep_on_page+0xe/0x20
[17983.301438] [<ffffffff81638780>] __wait_on_bit+0x60/0x90
[17983.301443] [<ffffffff81168646>] wait_on_page_bit+0x86/0xb0
[17983.301447] [<ffffffff810a6b60>] ? wake_atomic_t_function+0x40/0x40
[17983.301458] [<ffffffffa0337e0e>] ? dm_any_congested+0x4e/0x60 [dm_mod]
[17983.301462] [<ffffffff8117d472>] shrink_page_list+0x6c2/0xad0
[17983.301469] [<ffffffff8117df3a>] shrink_inactive_list+0x1ea/0x560
[17983.301474] [<ffffffff8117ea05>] shrink_lruvec+0x375/0x760
[17983.301481] [<ffffffff8109b426>] ? __queue_work+0x136/0x320
[17983.301485] [<ffffffff8109b426>] ? __queue_work+0x136/0x320
[17983.301491] [<ffffffff8117ee66>] shrink_zone+0x76/0x1a0
[17983.301496] [<ffffffff8117f370>] do_try_to_free_pages+0xf0/0x4e0
[17983.301501] [<ffffffff8116be05>] ? mempool_alloc_slab+0x15/0x20
[17983.301527] [<ffffffffa05cccc8>] ? xfs_bmbt_get_all+0x18/0x20 [xfs]
[17983.301534] [<ffffffff8117f9aa>] try_to_free_mem_cgroup_pages+0xca/0x160
[17983.301539] [<ffffffff811d207e>] mem_cgroup_reclaim+0x4e/0xe0
[17983.301543] [<ffffffff811d253c>] __mem_cgroup_try_charge+0x42c/0x650
[17983.301549] [<ffffffff81216ac8>] ? __bio_add_page+0x1f8/0x2a0
[17983.301554] [<ffffffff811d2e89>] mem_cgroup_charge_common+0x59/0xc0
[17983.301559] [<ffffffff811d505a>] mem_cgroup_cache_charge+0x8a/0xb0
[17983.301565] [<ffffffff811694e2>] __add_to_page_cache_locked+0x52/0x260
[17983.301569] [<ffffffff81169747>] add_to_page_cache_lru+0x37/0xb0
[17983.301574] [<ffffffff8121fae5>] mpage_readpages+0xb5/0x160
[17983.301600] [<ffffffffa05ef910>] ? __xfs_get_blocks+0x4b0/0x4b0 [xfs]
[17983.301627] [<ffffffffa05ef910>] ? __xfs_get_blocks+0x4b0/0x4b0 [xfs]
[17983.301653] [<ffffffffa05eea5d>] xfs_vm_readpages+0x1d/0x20 [xfs]
[17983.301661] [<ffffffff81175cdc>] __do_page_cache_readahead+0x1cc/0x250
[17983.301667] [<ffffffff81176321>] ra_submit+0x21/0x30
[17983.301671] [<ffffffff8116b7ed>] filemap_fault+0x11d/0x430
[17983.301700] [<ffffffffa05fa1df>] xfs_filemap_fault+0x4f/0xa0 [xfs]
[17983.301716] [<ffffffff81192b2e>] __do_fault+0x7e/0x510
[17983.301724] [<ffffffff81197088>] handle_mm_fault+0x5b8/0xf50
[17983.301729] [<ffffffff81640e22>] __do_page_fault+0x152/0x420
[17983.301734] [<ffffffff81641113>] do_page_fault+0x23/0x80
[17983.301740] [<ffffffff8163d408>] page_fault+0x28/0x30
[17983.301748] INFO: task java:43800 blocked for more than 120 seconds.
[17983.301803] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17983.301868] java D ffff8803258f73a0 0 43800 43299 0x00000080
[17983.301873] ffff8803258f7240 0000000000000086 ffff8803e01cd080 ffff8803258f7fd8
[17983.301881] ffff8803258f7fd8 ffff8803258f7fd8 ffff8803e01cd080 ffff88082f6b4780
[17983.301887] 0000000000000000 7fffffffffffffff ffffffff811688b0 ffff8803258f73a0
[17983.301893] Call Trace:
[17983.301898] [<ffffffff811688b0>] ? wait_on_page_read+0x60/0x60
[17983.301903] [<ffffffff8163a909>] schedule+0x29/0x70
[17983.301908] [<ffffffff816385f9>] schedule_timeout+0x209/0x2d0
[17983.301915] [<ffffffff8104671c>] ? native_send_call_func_single_ipi+0x3c/0x40
[17983.301922] [<ffffffff8101c829>] ? read_tsc+0x9/0x10
[17983.301926] [<ffffffff811688b0>] ? wait_on_page_read+0x60/0x60
[17983.301931] [<ffffffff81639f3e>] io_schedule_timeout+0xae/0x130
[17983.301936] [<ffffffff81639fd8>] io_schedule+0x18/0x20
[17983.301940] [<ffffffff811688be>] sleep_on_page+0xe/0x20
[17983.301945] [<ffffffff81638780>] __wait_on_bit+0x60/0x90
[17983.301950] [<ffffffff81168646>] wait_on_page_bit+0x86/0xb0
[17983.301956] [<ffffffff810a6b60>] ? wake_atomic_t_function+0x40/0x40
[17983.301967] [<ffffffffa0337e0e>] ? dm_any_congested+0x4e/0x60 [dm_mod]
[17983.301972] [<ffffffff8117d472>] shrink_page_list+0x6c2/0xad0
[17983.301978] [<ffffffff8117c843>] ? isolate_lru_pages.isra.43+0xd3/0x190
[17983.301983] [<ffffffff8117df3a>] shrink_inactive_list+0x1ea/0x560
[17983.301989] [<ffffffff8117ea05>] shrink_lruvec+0x375/0x760
[17983.301996] [<ffffffff8117ee66>] shrink_zone+0x76/0x1a0
[17983.302001] [<ffffffff8117f370>] do_try_to_free_pages+0xf0/0x4e0
[17983.302007] [<ffffffff8117f9aa>] try_to_free_mem_cgroup_pages+0xca/0x160
[17983.302012] [<ffffffff811d207e>] mem_cgroup_reclaim+0x4e/0xe0
[17983.302016] [<ffffffff811d253c>] __mem_cgroup_try_charge+0x42c/0x650
[17983.302023] [<ffffffff811d4a93>] __memcg_kmem_newpage_charge+0x123/0x190
[17983.302030] [<ffffffff81172fc5>] __alloc_pages_nodemask+0x265/0xb90
[17983.302066] [<ffffffffa0614127>] ? kmem_zone_alloc+0x77/0x100 [xfs]
[17983.302074] [<ffffffff811b43f9>] alloc_pages_current+0xa9/0x170
[17983.302081] [<ffffffff811be9fc>] new_slab+0x2ec/0x300
[17983.302089] [<ffffffff81632161>] __slab_alloc+0x315/0x48f
[17983.302120] [<ffffffffa0614127>] ? kmem_zone_alloc+0x77/0x100 [xfs]
[17983.302144] [<ffffffffa05c295c>] ? xfs_bmap_search_extents+0x5c/0xc0 [xfs]
[17983.302149] [<ffffffff811c0fc3>] kmem_cache_alloc+0x193/0x1d0
[17983.302179] [<ffffffffa0614127>] ? kmem_zone_alloc+0x77/0x100 [xfs]
[17983.302209] [<ffffffffa0614127>] kmem_zone_alloc+0x77/0x100 [xfs]
[17983.302243] [<ffffffffa061aeaa>] xfs_efi_init+0x2a/0x90 [xfs]
[17983.302278] [<ffffffffa0623f78>] xfs_trans_get_efi+0x18/0x30 [xfs]
[17983.302308] [<ffffffffa05f2800>] xfs_bmap_finish+0x70/0x1b0 [xfs]
[17983.302340] [<ffffffffa0608f8d>] xfs_itruncate_extents+0x17d/0x2b0 [xfs]
[17983.302369] [<ffffffffa05f370e>] xfs_free_eofblocks+0x1ee/0x270 [xfs]
[17983.302400] [<ffffffffa060927e>] xfs_release+0x9e/0x170 [xfs]
[17983.302431] [<ffffffffa05fa0d5>] xfs_file_release+0x15/0x20 [xfs]
[17983.302440] [<ffffffff811e0329>] __fput+0xe9/0x270
[17983.302445] [<ffffffff811e05ee>] ____fput+0xe/0x10
[17983.302451] [<ffffffff810a22d7>] task_work_run+0xa7/0xe0
[17983.302461] [<ffffffff81014b12>] do_notify_resume+0x92/0xb0
[17983.302468] [<ffffffff81645bbd>] int_signal+0x12/0x17
[17983.302473] INFO: task kworker/13:1:43828 blocked for more than 120 seconds.
[17983.302534] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17983.302599] kworker/13:1 D ffff880321a01a20 0 43828 2 0x00000080
[17983.302632] Workqueue: xfs-data/dm-22 xfs_end_io [xfs]
[17983.302636] ffff8807234cbc98 0000000000000046 ffff88076c296780 ffff8807234cbfd8
[17983.302642] ffff8807234cbfd8 ffff8807234cbfd8 ffff88076c296780 ffff88076c296780
[17983.302648] ffff880321a01a10 ffff880321a01a18 ffffffff00000000 ffff880321a01a20
[17983.302655] Call Trace:
[17983.302661] [<ffffffff8163a909>] schedule+0x29/0x70
[17983.302666] [<ffffffff8163c0c5>] rwsem_down_write_failed+0x115/0x220
[17983.302672] [<ffffffff810c3401>] ? enqueue_entity+0x181/0x890
[17983.302700] [<ffffffffa05ee236>] ? xfs_setfilesize+0x56/0x130 [xfs]
[17983.302726] [<ffffffff81301813>] call_rwsem_down_write_failed+0x13/0x20
[17983.302732] [<ffffffff81639b5d>] ? down_write+0x2d/0x30
[17983.302762] [<ffffffffa06069a1>] xfs_ilock+0xc1/0x120 [xfs]
[17983.302789] [<ffffffffa05ee236>] xfs_setfilesize+0x56/0x130 [xfs]
[17983.302816] [<ffffffffa05ef182>] xfs_end_io+0x62/0xc0 [xfs]
[17983.302822] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[17983.302827] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[17983.302832] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[17983.302837] [<ffffffff810a5aef>] kthread+0xcf/0xe0
[17983.302844] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[17983.302849] [<ffffffff81645858>] ret_from_fork+0x58/0x90
[17983.302857] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[18103.185522] INFO: task kworker/13:0:172 blocked for more than 120 seconds.
[18103.185592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18103.185659] kworker/13:0 D ffff880321a01a20 0 172 2 0x00000000
[18103.185717] Workqueue: xfs-data/dm-22 xfs_end_io [xfs]
[18103.185721] ffff880819be7c98 0000000000000046 ffff880819bd2280 ffff880819be7fd8
[18103.185729] ffff880819be7fd8 ffff880819be7fd8 ffff880819bd2280 ffff880819bd2280
[18103.185735] ffff880321a01a10 ffff880321a01a18 ffffffff00000000 ffff880321a01a20
[18103.185742] Call Trace:
[18103.185757] [<ffffffff8163a909>] schedule+0x29/0x70
[18103.185765] [<ffffffff8163c0c5>] rwsem_down_write_failed+0x115/0x220
[18103.185774] [<ffffffff810c3401>] ? enqueue_entity+0x181/0x890
[18103.185803] [<ffffffffa05ee236>] ? xfs_setfilesize+0x56/0x130 [xfs]
[18103.185812] [<ffffffff81301813>] call_rwsem_down_write_failed+0x13/0x20
[18103.185818] [<ffffffff81639b5d>] ? down_write+0x2d/0x30
[18103.185852] [<ffffffffa06069a1>] xfs_ilock+0xc1/0x120 [xfs]
[18103.185880] [<ffffffffa05ee236>] xfs_setfilesize+0x56/0x130 [xfs]
[18103.185906] [<ffffffffa05ef182>] xfs_end_io+0x62/0xc0 [xfs]
[18103.185913] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[18103.185918] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[18103.185922] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[18103.185929] [<ffffffff810a5aef>] kthread+0xcf/0xe0
[18103.185935] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[18103.185944] [<ffffffff81645858>] ret_from_fork+0x58/0x90
[18103.185950] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[18103.186106] INFO: task kworker/13:2:9980 blocked for more than 120 seconds.
[18103.186166] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18103.186238] kworker/13:2 D ffff880819bd2a84 0 9980 2 0x00000080
[18103.186268] Workqueue: xfs-data/dm-22 xfs_end_io [xfs]
[18103.186270] ffff880805197c98 0000000000000046 ffff8807f98dd080 ffff880805197fd8
[18103.186276] ffff880805197fd8 ffff880805197fd8 ffff8807f98dd080 ffff8807f98dd080
[18103.186282] ffff880321a01a10 ffff880321a01a18 ffffffff00000000 ffff880321a01a20
[18103.186301] Call Trace:
[18103.186307] [<ffffffff8163a909>] schedule+0x29/0x70
[18103.186312] [<ffffffff8163c0c5>] rwsem_down_write_failed+0x115/0x220
[18103.186317] [<ffffffff810a6b00>] ? autoremove_wake_function+0x20/0x40
[18103.186341] [<ffffffffa05ee236>] ? xfs_setfilesize+0x56/0x130 [xfs]
[18103.186360] [<ffffffff81301813>] call_rwsem_down_write_failed+0x13/0x20
[18103.186366] [<ffffffff81639b5d>] ? down_write+0x2d/0x30
[18103.186397] [<ffffffffa06069a1>] xfs_ilock+0xc1/0x120 [xfs]
[18103.186422] [<ffffffffa05ee236>] xfs_setfilesize+0x56/0x130 [xfs]
[18103.186450] [<ffffffffa05ef182>] xfs_end_io+0x62/0xc0 [xfs]
[18103.186486] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[18103.186490] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[18103.186494] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[18103.186498] [<ffffffff810a5aef>] kthread+0xcf/0xe0
[18103.186503] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[18103.186507] [<ffffffff81645858>] ret_from_fork+0x58/0x90
[18103.186512] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
```

Steps To Reproduce:
1. Install CentOS 7.3
CentOS Linux release 7.3.1611 (Core)
2. Install Docker and device-mapper (direct-lvm)
Client:
 Version: 1.12.6
 API version: 1.24
 Go version: go1.6.4
 Git commit: 78d1802
 Built: Tue Jan 10 20:20:01 2017
 OS/Arch: linux/amd64

Server:
 Version: 1.12.6
 API version: 1.24
 Go version: go1.6.4
 Git commit: 78d1802
 Built: Tue Jan 10 20:20:01 2017
 OS/Arch: linux/amd64

dmsetup version
Library version: 1.02.135-RHEL7 (2016-11-16)
Driver version: 4.34.0

3. Install Kubernetes
kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.1", GitCommit:"1dc5c66f5dd61da08412a74221ecc79208c2165b", GitTreeState:"clean", BuildDate:"2017-07-14T02:00:46Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.0", GitCommit:"d3ada0119e776222f11ec7945e6d860061339aad", GitTreeState:"clean", BuildDate:"2017-06-29T22:55:19Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}


4. Create a pod with test.yml

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: ReplicationController
metadata:
  generation: 1
  labels:
    name: azmtest2
    type: app
  name: azmtest2
spec:
  replicas: 1
  selector:
    name: azmtest2
    type: app
  template:
    metadata:
      labels:
        name: azmtest2
        type: app
      name: azmtest2
    spec:
      containers:
      - image: wurstmeister/zookeeper
        imagePullPolicy: Always
        name: azmtest2
        ports:
        - containerPort: 2181
          name: port-2181
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 32Mi
        volumeMounts:
        - mountPath: /opt/log
          name: path-test-azmtest2--log
        - mountPath: /opt/data
          name: path-test-azmtest2--data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /www/logs/test/azmtest2
        name: path-test-azmtest2--log
      - hostPath:
          path: /www/datas/test/azmtest2
        name: path-test-azmtest2--data
status:
  availableReplicas: 1
  fullyLabeledReplicas: 1
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1

Once the pod is running, grep oom-kill /var/log/messages shows many oom-killer messages such as:

Aug 21 10:43:29 k8s-dbg-master-1 kernel: java invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=999
Aug 21 10:47:53 k8s-dbg-master-1 kernel: java invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=999
.......


5. After a few hours, many "blocked for more than 120 seconds" messages appear. At that point, check whether oom-kill events are still occurring at their usual interval. If they have stopped, run ps aux; it will probably hang, which means the problem has been reproduced.
Tags: centos 7, dm_mod, file system, hang
abrt_hash:
URL:

Activities

There are no notes attached to this issue.

Issue History

Date Modified Username Field Change
2017-08-25 03:23 skilledno1 New Issue
2017-08-25 03:23 skilledno1 Tag Attached: centos 7
2017-08-25 03:23 skilledno1 Tag Attached: dm_mod
2017-08-25 03:23 skilledno1 Tag Attached: file system
2017-08-25 03:23 skilledno1 Tag Attached: hang
2018-12-19 06:13 Benlong Zhang Tag Attached: Related
2018-12-19 06:14 Benlong Zhang Tag Detached: Related