View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0018072 | CentOS-7 | kernel | public | 2021-02-17 20:50 | 2021-02-22 19:51 |
Reporter | abhay2101 | Assigned To | |||
Priority | high | Severity | crash | Reproducibility | random |
Status | new | Resolution | open | ||
Platform | X86_64 | OS | CentOS | OS Version | 7.9.2009 |
Product Version | 7.9.2009 | ||||
Summary | 0018072: System hung with deadlock and recovers only with reset | ||||
Description | With software RAID, under slightly higher load and with little free memory available, we see a deadlock and the system hangs. It can be recovered only with a reset; neither ssh nor console IB login works. We were able to get more logs with the kernel hung task panic flag set.
[2356260.105363] INFO: task kswapd0:309 blocked for more than 300 seconds.
[2356260.105366] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2356260.105368] kswapd0 D ffffa06bbf91acc0 0 309 2 0x00000000
[2356260.105371] Call Trace:
[2356260.105409] [<ffffffffc082c569>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
[2356260.105414] [<ffffffffb9f86dc9>] schedule+0x29/0x70
[2356260.105417] [<ffffffffb9f848b1>] schedule_timeout+0x221/0x2d0
[2356260.105429] [<ffffffffc0829a1b>] ? __xfs_iunpin_wait+0x9b/0x150 [xfs]
[2356260.105441] [<ffffffffc082c569>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
[2356260.105443] [<ffffffffb9f8649d>] io_schedule_timeout+0xad/0x130
[2356260.105448] [<ffffffffb98c6a56>] ? prepare_to_wait+0x56/0x90
[2356260.105450] [<ffffffffb9f86538>] io_schedule+0x18/0x20
[2356260.105461] [<ffffffffc0829a51>] __xfs_iunpin_wait+0xd1/0x150 [xfs]
[2356260.105464] [<ffffffffb98c7020>] ? wake_bit_function+0x40/0x40
[2356260.105474] [<ffffffffc082c569>] xfs_iunpin_wait+0x19/0x20 [xfs]
[2356260.105485] [<ffffffffc08205c3>] xfs_reclaim_inode+0x143/0x360 [xfs]
[2356260.105496] [<ffffffffc0820a47>] xfs_reclaim_inodes_ag+0x267/0x390 [xfs]
[2356260.105508] [<ffffffffc0821af3>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
[2356260.105519] [<ffffffffc08320a5>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
[2356260.105522] [<ffffffffb9a50889>] prune_super+0xf9/0x1a0
[2356260.105527] [<ffffffffb99d1a55>] shrink_slab+0x175/0x340
[2356260.105530] [<ffffffffb9a42aa1>] ? vmpressure+0x21/0x90
[2356260.105532] [<ffffffffb99d5788>] balance_pgdat+0x3a8/0x5e0
[2356260.105534] [<ffffffffb99d5b33>] kswapd+0x173/0x430
[2356260.105537] [<ffffffffb98c6f60>] ? wake_up_atomic_t+0x30/0x30
[2356260.105539] [<ffffffffb99d59c0>] ? balance_pgdat+0x5e0/0x5e0
[2356260.105541] [<ffffffffb98c5e71>] kthread+0xd1/0xe0
[2356260.105543] [<ffffffffb98c5da0>] ? insert_kthread_work+0x40/0x40
[2356260.105546] [<ffffffffb9f93df7>] ret_from_fork_nospec_begin+0x21/0x21
[2356260.105549] [<ffffffffb98c5da0>] ? insert_kthread_work+0x40/0x40
[2356260.105558] sending NMI to all CPUs:
[2356260.110484] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffb9f89154
[2356260.110485] NMI backtrace for cpu 1
[2356260.110487] CPU: 1 PID: 20449 Comm: ServiceSchedule Kdump: loaded Tainted: G W ------------ 3.10.0-1160.11.1.el7.x86_64 #1
[2356260.110488] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019
[2356260.110489] task: ffffa05ca0b91080 ti: ffffa0766be9c000 task.ti: ffffa0766be9c000
[2356260.110490] RIP: 0033:[<00007fcf84d3520c>] [<00007fcf84d3520c>] 0x7fcf84d3520c
[2356260.110491] RSP: 002b:00007fcaddca7428 EFLAGS: 00000202
[2356260.110492] RAX: 00007fceb40085c0 RBX: 00000017be050420 RCX: 000000172a2e73c8
[2356260.110493] RDX: 0000001012fb4858 RSI: 000000172a2e73b0 RDI: 000000172a2e69a8
[2356260.110494] RBP: 000000172a2e73c8 R08: 0000000000000004 R09: 000000000054ca18
[2356260.110495] R10: 00000000e545ce76 R11: 00000017be54c2f0 R12: 0000001000000000
[2356260.110496] R13: 00000000e545ce4a R14: 00000000e545ce4f R15: 00007fcacc0009e0
[2356260.110497] FS: 00007fcaddca8700(0000) GS:ffffa06bbf640000(0000) knlGS:0000000000000000
[2356260.110498] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2356260.110499] CR2: 00000013c2a6a000 CR3: 0000001eb8746000 CR4: 00000000003607e0
[2356260.110500] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2356260.110501] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 | ||||
Additional Information | We see a similar issue described at https://access.redhat.com/solutions/4089281, but it says that disabling software RAID is the only option. Is there any other workaround for this? | ||||
Tags | crash, kernel | ||||
abrt_hash | |||||
URL | |||||
It looks like this will be resolved by https://lore.kernel.org/linux-xfs/20191031234618.15403-1-david@fromorbit.com/. Is there any plan to backport it to CentOS 7? | |
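
For anyone comparing similar hangs, here is a minimal Python sketch for pulling the blocked task and its call chain out of a hung-task message like the one in the description. It is not part of the original report: the function name `parse_hung_task` and both regular expressions are assumptions based only on the log lines quoted above, so other kernel versions may need adjustments. Frames printed with a leading `?` are skipped, since the kernel uses that marker for addresses found on the stack that it could not verify as part of the call chain.

```python
import re

# Assumed layout, taken from the log quoted in this report:
#   INFO: task kswapd0:309 blocked for more than 300 seconds.
HUNG_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)

# Assumed frame layout:
#   [<ffffffffc0829a51>] __xfs_iunpin_wait+0xd1/0x150 [xfs]
# A "?" before the symbol marks an unreliable frame; "[module]" is optional.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z0-9_.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[([A-Za-z_]\w*)\])?"
)


def parse_hung_task(log):
    """Return (task, pid, seconds, frames) for the first hung-task
    message in `log`, or None if there is none.

    `frames` is a list of (symbol, module_or_None) tuples in the order
    they appear, with "?" (unreliable) frames left out.
    """
    hung = HUNG_RE.search(log)
    if hung is None:
        return None
    frames = []
    # Only look at frames that follow the hung-task message.
    for frame in FRAME_RE.finditer(log, hung.end()):
        if frame.group(1) is None:  # no "?" marker, so keep it
            frames.append((frame.group(2), frame.group(3)))
    return hung.group(1), int(hung.group(2)), int(hung.group(3)), frames
```

Run against the trace in the description, the frames tagged with the `xfs` module show that kswapd0 is blocked in `__xfs_iunpin_wait`, reached through `xfs_reclaim_inode` during slab shrinking.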