View Issue Details

ID: 0018072
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2021-02-22 19:51
Reporter: abhay2101
Assigned To:
Priority: high
Severity: crash
Reproducibility: random
Status: new
Resolution: open
Platform: X86_64
OS: CentOS
OS Version: 7.9.2009
Product Version: 7.9.2009
Summary: 0018072: System hung with deadlock and recovers only with reset
Description: With software RAID, under somewhat higher load and with little free memory available, the system deadlocks and hangs. It can be recovered only with a reset; neither SSH nor console IB login works.
We were able to capture more logs by setting the kernel hung task panic flag.
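
For anyone trying to reproduce this, it may help to record the hung-task settings that were active when the log was captured. The following is a minimal sketch, assuming the "hung task panic flag" refers to the kernel.hung_task_panic sysctl and that the 300-second threshold in the log comes from kernel.hung_task_timeout_secs. The function name and the `base` parameter are hypothetical and used only for illustration.

```python
from pathlib import Path

# Sysctl names assumed to be relevant to this report; the files exist only
# on kernels built with hung-task detection.
HUNG_TASK_KEYS = ("hung_task_timeout_secs", "hung_task_panic")


def read_hung_task_settings(base="/proc/sys/kernel"):
    """Return {name: int or None} for each hung-task sysctl under `base`."""
    settings = {}
    for key in HUNG_TASK_KEYS:
        path = Path(base) / key
        try:
            settings[key] = int(path.read_text().strip())
        except (OSError, ValueError):
            # File missing, unreadable, or not a plain integer.
            settings[key] = None
    return settings


if __name__ == "__main__":
    for name, value in read_hung_task_settings().items():
        print(f"kernel.{name} = {value}")
```

Run on the affected host, this prints the two values so they can be attached to the report; a value of None means the file could not be read.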

[2356260.105363] INFO: task kswapd0:309 blocked for more than 300 seconds.
[2356260.105366] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2356260.105368] kswapd0 D ffffa06bbf91acc0 0 309 2 0x00000000
[2356260.105371] Call Trace:
[2356260.105409] [<ffffffffc082c569>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
[2356260.105414] [<ffffffffb9f86dc9>] schedule+0x29/0x70
[2356260.105417] [<ffffffffb9f848b1>] schedule_timeout+0x221/0x2d0
[2356260.105429] [<ffffffffc0829a1b>] ? __xfs_iunpin_wait+0x9b/0x150 [xfs]
[2356260.105441] [<ffffffffc082c569>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
[2356260.105443] [<ffffffffb9f8649d>] io_schedule_timeout+0xad/0x130
[2356260.105448] [<ffffffffb98c6a56>] ? prepare_to_wait+0x56/0x90
[2356260.105450] [<ffffffffb9f86538>] io_schedule+0x18/0x20
[2356260.105461] [<ffffffffc0829a51>] __xfs_iunpin_wait+0xd1/0x150 [xfs]
[2356260.105464] [<ffffffffb98c7020>] ? wake_bit_function+0x40/0x40
[2356260.105474] [<ffffffffc082c569>] xfs_iunpin_wait+0x19/0x20 [xfs]
[2356260.105485] [<ffffffffc08205c3>] xfs_reclaim_inode+0x143/0x360 [xfs]
[2356260.105496] [<ffffffffc0820a47>] xfs_reclaim_inodes_ag+0x267/0x390 [xfs]
[2356260.105508] [<ffffffffc0821af3>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
[2356260.105519] [<ffffffffc08320a5>] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
[2356260.105522] [<ffffffffb9a50889>] prune_super+0xf9/0x1a0
[2356260.105527] [<ffffffffb99d1a55>] shrink_slab+0x175/0x340
[2356260.105530] [<ffffffffb9a42aa1>] ? vmpressure+0x21/0x90
[2356260.105532] [<ffffffffb99d5788>] balance_pgdat+0x3a8/0x5e0
[2356260.105534] [<ffffffffb99d5b33>] kswapd+0x173/0x430
[2356260.105537] [<ffffffffb98c6f60>] ? wake_up_atomic_t+0x30/0x30
[2356260.105539] [<ffffffffb99d59c0>] ? balance_pgdat+0x5e0/0x5e0
[2356260.105541] [<ffffffffb98c5e71>] kthread+0xd1/0xe0
[2356260.105543] [<ffffffffb98c5da0>] ? insert_kthread_work+0x40/0x40
[2356260.105546] [<ffffffffb9f93df7>] ret_from_fork_nospec_begin+0x21/0x21
[2356260.105549] [<ffffffffb98c5da0>] ? insert_kthread_work+0x40/0x40
[2356260.105558] sending NMI to all CPUs:
[2356260.110484] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffb9f89154
[2356260.110485] NMI backtrace for cpu 1
[2356260.110487] CPU: 1 PID: 20449 Comm: ServiceSchedule Kdump: loaded Tainted: G W ------------ 3.10.0-1160.11.1.el7.x86_64 #1
[2356260.110488] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/21/2019
[2356260.110489] task: ffffa05ca0b91080 ti: ffffa0766be9c000 task.ti: ffffa0766be9c000
[2356260.110490] RIP: 0033:[<00007fcf84d3520c>] [<00007fcf84d3520c>] 0x7fcf84d3520c
[2356260.110491] RSP: 002b:00007fcaddca7428 EFLAGS: 00000202
[2356260.110492] RAX: 00007fceb40085c0 RBX: 00000017be050420 RCX: 000000172a2e73c8
[2356260.110493] RDX: 0000001012fb4858 RSI: 000000172a2e73b0 RDI: 000000172a2e69a8
[2356260.110494] RBP: 000000172a2e73c8 R08: 0000000000000004 R09: 000000000054ca18
[2356260.110495] R10: 00000000e545ce76 R11: 00000017be54c2f0 R12: 0000001000000000
[2356260.110496] R13: 00000000e545ce4a R14: 00000000e545ce4f R15: 00007fcacc0009e0
[2356260.110497] FS: 00007fcaddca8700(0000) GS:ffffa06bbf640000(0000) knlGS:0000000000000000
[2356260.110498] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2356260.110499] CR2: 00000013c2a6a000 CR3: 0000001eb8746000 CR4: 00000000003607e0
[2356260.110500] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2356260.110501] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
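
To make the blocked path in the kswapd0 trace easier to read, the frames marked with "?" (addresses found on the stack that the unwinder does not consider reliable) can be filtered out. The following is a rough sketch, not a general dmesg parser: the regular expression is written against the line format seen in this log only, and the names `FRAME_RE` and `reliable_frames` are hypothetical.

```python
import re

# Matches call-trace lines such as:
# [2356260.105485] [<ffffffffc08205c3>] xfs_reclaim_inode+0x143/0x360 [xfs]
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(log_text):
    """Return (symbol, module) pairs for frames not marked with '?'."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("unreliable"):
            frames.append((match.group("symbol"), match.group("module")))
    return frames


if __name__ == "__main__":
    sample = """\
[2356260.105464] [<ffffffffb98c7020>] ? wake_bit_function+0x40/0x40
[2356260.105474] [<ffffffffc082c569>] xfs_iunpin_wait+0x19/0x20 [xfs]
[2356260.105485] [<ffffffffc08205c3>] xfs_reclaim_inode+0x143/0x360 [xfs]
[2356260.105522] [<ffffffffb9a50889>] prune_super+0xf9/0x1a0
"""
    for symbol, module in reliable_frames(sample):
        print(symbol, f"[{module}]" if module else "")
```

Applied to the full trace above, the remaining frames read from kswapd through shrink_slab and prune_super into the XFS inode reclaim functions, ending in xfs_iunpin_wait.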
Additional Information: A similar issue is described at https://access.redhat.com/solutions/4089281, but it says that disabling software RAID is the only option. Is there any other workaround for this?
Tags: crash, kernel
abrt_hash:
URL:

Activities

abhay2101

2021-02-22 19:51

reporter   ~0038262

It looks like this would be resolved by https://lore.kernel.org/linux-xfs/20191031234618.15403-1-david@fromorbit.com/. Is there any plan to backport it to CentOS 7?

Issue History

Date Modified Username Field Change
2021-02-17 20:50 abhay2101 New Issue
2021-02-17 20:50 abhay2101 Tag Attached: crash
2021-02-17 20:50 abhay2101 Tag Attached: kernel
2021-02-22 19:51 abhay2101 Note Added: 0038262