View Issue Details
Field | Value
---|---
ID | 0013843
Project | CentOS-7
Category | kernel
View Status | public
Date Submitted | 2017-09-18 14:19
Last Update | 2018-09-15 17:32
Reporter | thompsop
Assigned To |
Priority | high
Severity | minor
Reproducibility | always
Status | resolved
Resolution | fixed
Platform | x86_64
OS Version | 7.4.1708
Product Version | 7.3.1611
Summary | 0013843: xfsaild blocks after certain time
Description | Since updating to vmlinuz-3.10.0-693.2.2.el7.x86_64, xfsaild becomes blocked after a period of time and processes wanting to access the file system begin to grind to a halt. Most often this hits chatty I/O applications such as auditd first. A reboot solves the problem until the next time. The file system is in a mirrored volume group, and neither the volume group nor the individual disks report any problems. Reverting to vmlinuz-3.10.0-514.26.2.el7.x86_64 seems to make the problem go away. |
Steps To Reproduce | Run vmlinuz-3.10.0-693.2.2.el7.x86_64 and wait a variable amount of time, usually less than 12 hours. | ||||
Additional Information | Sep 15 02:23:36 lol1093 kernel: INFO: task touch:9955 blocked for more than 120 seconds. Sep 15 02:23:36 lol1093 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 15 02:23:36 lol1093 kernel: touch D ffff8801fbffcf10 0 9955 9951 0x00000080 Sep 15 02:23:36 lol1093 kernel: ffff88017306f6f0 0000000000000082 ffff8801fbffcf10 ffff88017306ffd8 Sep 15 02:23:36 lol1093 kernel: ffff88017306ffd8 ffff88017306ffd8 ffff8801fbffcf10 ffff880210e307b0 Sep 15 02:23:36 lol1093 kernel: 7fffffffffffffff 0000000000000002 0000000000000000 ffff8801fbffcf10 Sep 15 02:23:36 lol1093 kernel: Call Trace: Sep 15 02:23:36 lol1093 kernel: [<ffffffff816a94e9>] schedule+0x29/0x70 Sep 15 02:23:36 lol1093 kernel: [<ffffffff816a6ff9>] schedule_timeout+0x239/0x2c0 Sep 15 02:23:36 lol1093 kernel: [<ffffffff816a8887>] __down_common+0xaa/0x104 Sep 15 02:23:36 lol1093 kernel: [<ffffffff810d1223>] ? find_busiest_group+0x143/0x980 Sep 15 02:23:36 lol1093 kernel: [<ffffffffc0444410>] ? _xfs_buf_find+0x170/0x330 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffff816a88fe>] __down+0x1d/0x1f Sep 15 02:23:36 lol1093 kernel: [<ffffffff810b6691>] down+0x41/0x50 Sep 15 02:23:36 lol1093 kernel: [<ffffffffc044420c>] xfs_buf_lock+0x3c/0xd0 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffffc0444410>] _xfs_buf_find+0x170/0x330 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffffc04445fa>] xfs_buf_get_map+0x2a/0x240 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffffc04451a0>] xfs_buf_read_map+0x30/0x160 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffffc0474ec1>] xfs_trans_read_buf_map+0x211/0x400 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffffc0434bad>] xfs_read_agi+0x9d/0x110 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffffc0434c54>] xfs_ialloc_read_agi+0x34/0xd0 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffffc04356a8>] xfs_dialloc+0xe8/0x280 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffffc0457581>] xfs_ialloc+0x71/0x530 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffffc0464b94>] ? xlog_grant_head_check+0x54/0x100 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffffc0457ab3>] xfs_dir_ialloc+0x73/0x1f0 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffff816a87b2>] ? down_write+0x12/0x3d Sep 15 02:23:36 lol1093 kernel: [<ffffffffc045837e>] xfs_create+0x43e/0x6c0 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffffc045522b>] xfs_vn_mknod+0xbb/0x240 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffffc04553e3>] xfs_vn_create+0x13/0x20 [xfs] Sep 15 02:23:36 lol1093 kernel: [<ffffffff8120d60d>] vfs_create+0xcd/0x130 Sep 15 02:23:36 lol1093 kernel: [<ffffffff8121079a>] do_last+0x10ea/0x12c0 Sep 15 02:23:36 lol1093 kernel: [<ffffffff81210a32>] path_openat+0xc2/0x490 Sep 15 02:23:36 lol1093 kernel: [<ffffffff8118295b>] ? unlock_page+0x2b/0x30 Sep 15 02:23:36 lol1093 kernel: [<ffffffff811ad6a6>] ? do_read_fault.isra.44+0xe6/0x130 Sep 15 02:23:36 lol1093 kernel: [<ffffffff81212fcb>] do_filp_open+0x4b/0xb0 Sep 15 02:23:36 lol1093 kernel: [<ffffffff8111f757>] ? __audit_getname+0x97/0xb0 Sep 15 02:23:36 lol1093 kernel: [<ffffffff8122022a>] ? __alloc_fd+0x8a/0x130 Sep 15 02:23:36 lol1093 kernel: [<ffffffff811ffc13>] do_sys_open+0xf3/0x1f0 Sep 15 02:23:36 lol1093 kernel: [<ffffffff816b0325>] ? do_page_fault+0x35/0x90 Sep 15 02:23:36 lol1093 kernel: [<ffffffff811ffd2e>] SyS_open+0x1e/0x20 Sep 15 02:23:36 lol1093 kernel: [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b | ||||
Tags | 7.4, file system, upstream, xfs | ||||
abrt_hash | |||||
URL | |||||
Here's a similar, recent, but not identical problem https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1681914 |
|
Problem persists on 3.10.0-693.5.2.el7.x86_64; after reverting to 3.10.0-514.26.2.el7.x86_64 the machine will run fine for weeks. |
I'd like to add that I'm experiencing this issue on a number (10+) of machines. My experiments also show that the instability started with kernel 3.10.0-693.2.2.el7.x86_64 and persists with 3.10.0-693.5.2.el7.x86_64; regressing to kernel 3.10.0-514.26.2.el7.x86_64 brings stability. We run software RAID 1, and usually the 10G /tmp partition is the one to lock up. The issue only occurs with SSD drives; our servers with traditional drives never encounter this problem. We use systemd's weekly timer to trim the disks, and we have also tried trimming daily in case that was relevant. Anecdotally it feels like these trims have reduced the frequency of xfsaild lock-ups, but they are still occurring. The only fix we've found so far is to regress the kernel, which isn't okay in the long term. |
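For reference, the weekly trim mentioned above is just a systemd timer driving fstrim; a minimal sketch with hypothetical unit names (fstrim-all.*) in case anyone wants to compare setups — change OnCalendar to daily to mimic our second experiment:

```sh
# Minimal sketch of a weekly trim timer (unit names are made up; adjust to taste)
cat > /etc/systemd/system/fstrim-all.service <<'EOF'
[Unit]
Description=Discard unused blocks on all mounted filesystems

[Service]
Type=oneshot
ExecStart=/usr/sbin/fstrim -av
EOF

cat > /etc/systemd/system/fstrim-all.timer <<'EOF'
[Unit]
Description=Run fstrim-all weekly

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target
EOF

systemctl daemon-reload
systemctl enable fstrim-all.timer
systemctl start fstrim-all.timer
```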
|
I have found running the current debug kernel masks the problem. 3.10.0-693.5.2.el7.x86_64.debug is stable for me, for instance. However if the machine accidentally boots back to the normal kernel the problem returns. |
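For anyone wanting to try the same workaround, a rough sketch of installing the debug variant and booting it by default; this assumes a BIOS-style grub2 layout (on EFI the grub.cfg path differs):

```sh
# The debug build ships as a separate package in the CentOS base/updates repos
yum install -y kernel-debug

# List grub2 menu entries and note the index of the *.debug kernel
awk -F\' '/^menuentry /{print i++ " : " $2}' /etc/grub2.cfg

# Make that entry the default and regenerate the config, then reboot into it
grub2-set-default <index-of-debug-entry>
grub2-mkconfig -o /boot/grub2/grub.cfg
```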
|
I'm not sure if this is related, but we just had this happen on one of our machines today - a task became blocked by xfsaild, and logins to the machine became blocked at that point too; reboot was required to fix. This may be an upstream XFS issue -- we were trying a plain vanilla 4.14.6 kernel at the time because of a different XFS issue (the plain vanilla kernel did not fix the other issue, but this lockup happened). |
|
We are seeing the exact same behaviour on multiple hosts running this kernel, with processes getting blocked by xfsaild. |
I was able to reproduce this (or a similar) issue on demand with 4.14.6 through the following procedure:

0. Be running vanilla 4.14.6.
1. Have a mounted and active XFS filesystem.
2. Go to /sys/devices/system/cpu and disable all but cpu0 by writing "0" to cpuXX/online for each cpu other than cpu0.
3. Re-enable one other cpu by writing "1" to cpuXX/online for that cpu. The selection of which cpu to re-enable *might* depend on hyperthreading and/or NUMA details, but I have not confirmed that yet.
4. Observe system lockup apparently due to xfsaild.
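Scripted, that recipe looks roughly like the sketch below; the XFS mount point /mnt/xfstest is just an example, and this is expected to wedge the box if the bug triggers:

```sh
#!/bin/bash
# Rough automation of the offline/online reproducer described above.
cd /sys/devices/system/cpu || exit 1

# Step 2: take every CPU except cpu0 offline
for c in cpu[1-9]*; do
    echo 0 > "$c/online"
done

# Step 3: bring exactly one other CPU back online (cpu1 chosen arbitrarily;
# hyperthread/NUMA siblings may behave differently)
echo 1 > cpu1/online

# Step 4: generate XFS metadata traffic and watch for hung-task messages
for i in $(seq 1 1000); do touch /mnt/xfstest/file.$i; done
dmesg | grep "blocked for more than" | tail
```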
|
I am witnessing the same failure after updating from vmlinuz-3.10.0-514.16.1.el7.x86_64 to vmlinuz-3.10.0-693.11.6.el7.x86_64 We are also running on a mdadm-raid1: [284165.322308] INFO: task xfsaild/md127:525 blocked for more than 120 seconds. [284165.322667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [284165.323070] xfsaild/md127 D ffff881039386eb0 0 525 2 0x00000000 [284165.323081] Call Trace: [284165.323100] [<ffffffff8109ae1e>] ? try_to_del_timer_sync+0x5e/0x90 [284165.323115] [<ffffffff816ab6d9>] schedule+0x29/0x70 [284165.323307] [<ffffffffc0385d36>] _xfs_log_force+0x1c6/0x2c0 [xfs] [284165.323339] [<ffffffff810c6440>] ? wake_up_state+0x20/0x20 [284165.323503] [<ffffffffc0391bbc>] ? xfsaild+0x16c/0x6f0 [xfs] [284165.323685] [<ffffffffc0385e5c>] xfs_log_force+0x2c/0x70 [xfs] [284165.323868] [<ffffffffc0391a50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] [284165.324048] [<ffffffffc0391bbc>] xfsaild+0x16c/0x6f0 [xfs] [284165.324235] [<ffffffffc0391a50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] [284165.324264] [<ffffffff810b252f>] kthread+0xcf/0xe0 [284165.324278] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284165.324294] [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [284165.324310] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284285.326264] INFO: task xfsaild/md127:525 blocked for more than 120 seconds. [284285.326594] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [284285.326912] xfsaild/md127 D ffff881039386eb0 0 525 2 0x00000000 [284285.326923] Call Trace: [284285.326942] [<ffffffff8109ae1e>] ? try_to_del_timer_sync+0x5e/0x90 [284285.326956] [<ffffffff816ab6d9>] schedule+0x29/0x70 [284285.327139] [<ffffffffc0385d36>] _xfs_log_force+0x1c6/0x2c0 [xfs] [284285.327169] [<ffffffff810c6440>] ? wake_up_state+0x20/0x20 [284285.327322] [<ffffffffc0391bbc>] ? xfsaild+0x16c/0x6f0 [xfs] [284285.327479] [<ffffffffc0385e5c>] xfs_log_force+0x2c/0x70 [xfs] [284285.327652] [<ffffffffc0391a50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] [284285.327820] [<ffffffffc0391bbc>] xfsaild+0x16c/0x6f0 [xfs] [284285.327981] [<ffffffffc0391a50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] [284285.327999] [<ffffffff810b252f>] kthread+0xcf/0xe0 [284285.328013] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284285.328027] [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [284285.328044] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284405.330013] INFO: task xfsaild/md127:525 blocked for more than 120 seconds. [284405.330297] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [284405.330669] xfsaild/md127 D ffff881039386eb0 0 525 2 0x00000000 [284405.330679] Call Trace: [284405.330697] [<ffffffff8109ae1e>] ? try_to_del_timer_sync+0x5e/0x90 [284405.330710] [<ffffffff816ab6d9>] schedule+0x29/0x70 [284405.330927] [<ffffffffc0385d36>] _xfs_log_force+0x1c6/0x2c0 [xfs] [284405.330941] [<ffffffff810c6440>] ? wake_up_state+0x20/0x20 [284405.331106] [<ffffffffc0391bbc>] ? xfsaild+0x16c/0x6f0 [xfs] [284405.331846] [<ffffffffc0385e5c>] xfs_log_force+0x2c/0x70 [xfs] [284405.332104] [<ffffffffc0391a50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] [284405.332412] [<ffffffffc0391bbc>] xfsaild+0x16c/0x6f0 [xfs] [284405.332642] [<ffffffffc0391a50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] [284405.332742] [<ffffffff810b252f>] kthread+0xcf/0xe0 [284405.332755] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284405.332770] [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [284405.332782] [<ffffffff810b2460>] ? 
insert_kthread_work+0x40/0x40 [284405.332896] INFO: task kworker/12:2:18723 blocked for more than 120 seconds. [284405.333359] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [284405.333880] kworker/12:2 D ffff881029beaf70 0 18723 2 0x00000000 [284405.334039] Workqueue: xfs-cil/md127 xlog_cil_push_work [xfs] [284405.334127] Call Trace: [284405.334143] [<ffffffff816ab6d9>] schedule+0x29/0x70 [284405.334302] [<ffffffffc0384ffd>] xlog_state_get_iclog_space+0x10d/0x320 [xfs] [284405.334462] [<ffffffff810c6440>] ? wake_up_state+0x20/0x20 [284405.334623] [<ffffffffc0385509>] xlog_write+0x1a9/0x720 [xfs] [284405.335075] [<ffffffffc0382477>] ? kmem_zone_alloc+0x97/0x130 [xfs] [284405.335479] [<ffffffffc0387218>] xlog_cil_push+0x2a8/0x430 [xfs] [284405.335655] [<ffffffffc03873b5>] xlog_cil_push_work+0x15/0x20 [xfs] [284405.335668] [<ffffffff810aa3ba>] process_one_work+0x17a/0x440 [284405.335682] [<ffffffff810ab1d8>] worker_thread+0x278/0x3c0 [284405.335695] [<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0 [284405.335707] [<ffffffff810b252f>] kthread+0xcf/0xe0 [284405.335720] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284405.335732] [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [284405.335744] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284405.335755] INFO: task kworker/12:1:20525 blocked for more than 120 seconds. [284405.336031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [284405.336324] kworker/12:1 D ffff881029beb774 0 20525 2 0x00000000 [284405.336487] Workqueue: xfs-sync/md127 xfs_log_worker [xfs] [284405.336493] Call Trace: [284405.336511] [<ffffffff8150c3a4>] ? md_make_request+0x104/0x290 [284405.336524] [<ffffffff816ab6d9>] schedule+0x29/0x70 [284405.336538] [<ffffffff816a90e9>] schedule_timeout+0x239/0x2c0 [284405.336568] [<ffffffffc000d6d0>] ? ahci_qc_prep+0x150/0x1c0 [libahci] [284405.336644] [<ffffffffc00ad957>] ? ata_qc_issue+0x177/0x3c0 [libata] [284405.336661] [<ffffffff8109a5cb>] ? __internal_add_timer+0xab/0x130 [284405.336673] [<ffffffff816aba8d>] wait_for_completion+0xfd/0x140 [284405.336686] [<ffffffff810c6440>] ? wake_up_state+0x20/0x20 [284405.336702] [<ffffffff810ab41d>] flush_work+0xfd/0x190 [284405.336714] [<ffffffff810a7990>] ? move_linked_works+0x90/0x90 [284405.336883] [<ffffffffc0387b3a>] xlog_cil_force_lsn+0x8a/0x210 [xfs] [284405.336901] [<ffffffff811a3206>] ? next_online_pgdat+0x26/0x60 [284405.337069] [<ffffffffc0385bf5>] _xfs_log_force+0x85/0x2c0 [xfs] [284405.337092] [<ffffffff810298b1>] ? __switch_to+0x431/0x500 [284405.337255] [<ffffffffc0385ec4>] ? xfs_log_worker+0x24/0x50 [xfs] [284405.337426] [<ffffffffc0385e5c>] xfs_log_force+0x2c/0x70 [xfs] [284405.337587] [<ffffffffc0385ec4>] xfs_log_worker+0x24/0x50 [xfs] [284405.337604] [<ffffffff810aa3ba>] process_one_work+0x17a/0x440 [284405.337617] [<ffffffff810ab086>] worker_thread+0x126/0x3c0 [284405.337631] [<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0 [284405.337645] [<ffffffff810b252f>] kthread+0xcf/0xe0 [284405.337659] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284405.337675] [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [284405.337687] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284525.338947] INFO: task xfsaild/md127:525 blocked for more than 120 seconds. [284525.339306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [284525.339709] xfsaild/md127 D ffff881039386eb0 0 525 2 0x00000000 [284525.339720] Call Trace: [284525.339739] [<ffffffff8109ae1e>] ? 
try_to_del_timer_sync+0x5e/0x90 [284525.339753] [<ffffffff816ab6d9>] schedule+0x29/0x70 [284525.339961] [<ffffffffc0385d36>] _xfs_log_force+0x1c6/0x2c0 [xfs] [284525.339993] [<ffffffff810c6440>] ? wake_up_state+0x20/0x20 [284525.340159] [<ffffffffc0391bbc>] ? xfsaild+0x16c/0x6f0 [xfs] [284525.340332] [<ffffffffc0385e5c>] xfs_log_force+0x2c/0x70 [xfs] [284525.340508] [<ffffffffc0391a50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] [284525.340689] [<ffffffffc0391bbc>] xfsaild+0x16c/0x6f0 [xfs] [284525.340858] [<ffffffffc0391a50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] [284525.340886] [<ffffffff810b252f>] kthread+0xcf/0xe0 [284525.340904] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284525.340920] [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [284525.340937] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284525.340984] INFO: task kworker/12:2:18723 blocked for more than 120 seconds. [284525.341276] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [284525.341594] kworker/12:2 D ffff881029beaf70 0 18723 2 0x00000000 [284525.341770] Workqueue: xfs-cil/md127 xlog_cil_push_work [xfs] [284525.341776] Call Trace: [284525.341791] [<ffffffff816ab6d9>] schedule+0x29/0x70 [284525.341959] [<ffffffffc0384ffd>] xlog_state_get_iclog_space+0x10d/0x320 [xfs] [284525.341988] [<ffffffff810c6440>] ? wake_up_state+0x20/0x20 [284525.342163] [<ffffffffc0385509>] xlog_write+0x1a9/0x720 [xfs] [284525.342350] [<ffffffffc0382477>] ? kmem_zone_alloc+0x97/0x130 [xfs] [284525.342537] [<ffffffffc0387218>] xlog_cil_push+0x2a8/0x430 [xfs] [284525.342726] [<ffffffffc03873b5>] xlog_cil_push_work+0x15/0x20 [xfs] [284525.342744] [<ffffffff810aa3ba>] process_one_work+0x17a/0x440 [284525.342759] [<ffffffff810ab1d8>] worker_thread+0x278/0x3c0 [284525.342776] [<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0 [284525.342789] [<ffffffff810b252f>] kthread+0xcf/0xe0 [284525.342805] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284525.342818] [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [284525.342832] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284525.342861] INFO: task kworker/12:1:20525 blocked for more than 120 seconds. [284525.343223] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [284525.343546] kworker/12:1 D ffff881029beb774 0 20525 2 0x00000000 [284525.343717] Workqueue: xfs-sync/md127 xfs_log_worker [xfs] [284525.343722] Call Trace: [284525.343738] [<ffffffff8150c3a4>] ? md_make_request+0x104/0x290 [284525.343751] [<ffffffff816ab6d9>] schedule+0x29/0x70 [284525.343763] [<ffffffff816a90e9>] schedule_timeout+0x239/0x2c0 [284525.343795] [<ffffffffc000d6d0>] ? ahci_qc_prep+0x150/0x1c0 [libahci] [284525.343874] [<ffffffffc00ad957>] ? ata_qc_issue+0x177/0x3c0 [libata] [284525.343901] [<ffffffff8109a5cb>] ? __internal_add_timer+0xab/0x130 [284525.343918] [<ffffffff816aba8d>] wait_for_completion+0xfd/0x140 [284525.343931] [<ffffffff810c6440>] ? wake_up_state+0x20/0x20 [284525.343946] [<ffffffff810ab41d>] flush_work+0xfd/0x190 [284525.343962] [<ffffffff810a7990>] ? move_linked_works+0x90/0x90 [284525.344131] [<ffffffffc0387b3a>] xlog_cil_force_lsn+0x8a/0x210 [xfs] [284525.344150] [<ffffffff811a3206>] ? next_online_pgdat+0x26/0x60 [284525.344320] [<ffffffffc0385bf5>] _xfs_log_force+0x85/0x2c0 [xfs] [284525.344335] [<ffffffff810298b1>] ? __switch_to+0x431/0x500 [284525.344505] [<ffffffffc0385ec4>] ? 
xfs_log_worker+0x24/0x50 [xfs] [284525.344694] [<ffffffffc0385e5c>] xfs_log_force+0x2c/0x70 [xfs] [284525.344879] [<ffffffffc0385ec4>] xfs_log_worker+0x24/0x50 [xfs] [284525.344902] [<ffffffff810aa3ba>] process_one_work+0x17a/0x440 [284525.344920] [<ffffffff810ab086>] worker_thread+0x126/0x3c0 [284525.344935] [<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0 [284525.344951] [<ffffffff810b252f>] kthread+0xcf/0xe0 [284525.344966] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284525.344979] [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [284525.344996] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284645.346805] INFO: task xfsaild/md127:525 blocked for more than 120 seconds. [284645.347170] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [284645.347569] xfsaild/md127 D ffff881039386eb0 0 525 2 0x00000000 [284645.347581] Call Trace: [284645.347599] [<ffffffff8109ae1e>] ? try_to_del_timer_sync+0x5e/0x90 [284645.347613] [<ffffffff816ab6d9>] schedule+0x29/0x70 [284645.347821] [<ffffffffc0385d36>] _xfs_log_force+0x1c6/0x2c0 [xfs] [284645.347852] [<ffffffff810c6440>] ? wake_up_state+0x20/0x20 [284645.348018] [<ffffffffc0391bbc>] ? xfsaild+0x16c/0x6f0 [xfs] [284645.348189] [<ffffffffc0385e5c>] xfs_log_force+0x2c/0x70 [xfs] [284645.348370] [<ffffffffc0391a50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] [284645.348550] [<ffffffffc0391bbc>] xfsaild+0x16c/0x6f0 [xfs] [284645.348734] [<ffffffffc0391a50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] [284645.348761] [<ffffffff810b252f>] kthread+0xcf/0xe0 [284645.348778] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284645.348794] [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [284645.348807] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284645.348855] INFO: task kworker/12:2:18723 blocked for more than 120 seconds. [284645.349145] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [284645.349541] kworker/12:2 D ffff881029beaf70 0 18723 2 0x00000000 [284645.349711] Workqueue: xfs-cil/md127 xlog_cil_push_work [xfs] [284645.349731] Call Trace: [284645.349751] [<ffffffff816ab6d9>] schedule+0x29/0x70 [284645.349923] [<ffffffffc0384ffd>] xlog_state_get_iclog_space+0x10d/0x320 [xfs] [284645.349940] [<ffffffff810c6440>] ? wake_up_state+0x20/0x20 [284645.350118] [<ffffffffc0385509>] xlog_write+0x1a9/0x720 [xfs] [284645.350294] [<ffffffffc0382477>] ? kmem_zone_alloc+0x97/0x130 [xfs] [284645.350482] [<ffffffffc0387218>] xlog_cil_push+0x2a8/0x430 [xfs] [284645.350654] [<ffffffffc03873b5>] xlog_cil_push_work+0x15/0x20 [xfs] [284645.350673] [<ffffffff810aa3ba>] process_one_work+0x17a/0x440 [284645.350687] [<ffffffff810ab1d8>] worker_thread+0x278/0x3c0 [284645.350703] [<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0 [284645.350730] [<ffffffff810b252f>] kthread+0xcf/0xe0 [284645.350744] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [284645.350761] [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [284645.350774] [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 |
|
created https://bugzilla.redhat.com/show_bug.cgi?id=1539394 |
|
Please note that there are possible fixes for this in the latest CentOS 7 kernel. Running rpm -q --changelog kernel-3.10.0-693.17.1.el7.x86_64 shows several possible xfs fixes that might be the solution to this. Please test and report back. |
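For example, to pull out the likely-relevant changelog entries (the grep pattern is just a suggestion):

```sh
rpm -q --changelog kernel-3.10.0-693.17.1.el7.x86_64 | grep -iE 'xfs|md/|raid1' | head -n 20
```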
Copying in a comment from the redhat bug. I believe he is specifically referencing gbeeley's steps to reproduce and also joerg's call trace. --- Comment #2 from Brian Foster <bfoster@redhat.com> --- The xfsaild traces in the bug referenced above show us waiting on a log force. That tends to imply we're waiting on I/O. The stacks also show md appears to be involved. Since you have a reproducer, could you rule out md from the test if that hasn't been done already? FWIW, I've been running an fsstress/cpu [on|off]line loop against a dummy raid1+XFS device on 4.15.0-rc9 for a few minutes and haven't seen any issues so far. Is 4.14.6 definitely required or has anything newer than that simply not been checked for the problem? |
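To rule md in or out along the lines Brian suggests, a throwaway raid1+XFS target can be built from loop devices; a sketch, where the device names, sizes and paths are illustrative and fsstress comes from xfstests:

```sh
# Build a dummy raid1 from two loop-backed images and put XFS on it
truncate -s 1G /var/tmp/d0.img /var/tmp/d1.img
l0=$(losetup -f --show /var/tmp/d0.img)
l1=$(losetup -f --show /var/tmp/d1.img)
mdadm --create /dev/md99 --run --level=1 --raid-devices=2 "$l0" "$l1"
mkfs.xfs -f /dev/md99
mkdir -p /mnt/md99 && mount /dev/md99 /mnt/md99

# Hammer it with fsstress (built as part of xfstests) while toggling a CPU off/online
fsstress -d /mnt/md99 -n 100000 -p 8 &
while true; do
    echo 0 > /sys/devices/system/cpu/cpu1/online; sleep 5
    echo 1 > /sys/devices/system/cpu/cpu1/online; sleep 5
done
```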
|
I am still on 3.10.0-693.11.6.el7, but we restored our system (kvm hypervisor) with ext4 on the same MD-raid1 and are now running without issues. I have a twin system (same hardware, ...) in the same network still running MD-raid1 with xfs as /, currently without witnessing any problem. |
|
I seem to have this problem with a CentOS 7.4 VM running on a Fedora 27 KVM host. I've switched to the 3.10.0-514.26.2.el7.x86_64 kernel on the VM and that DID NOT resolve this; it only seems to have delayed it a few days at best. The VM itself sits on an xfs partition on an NVMe (Samsung 960 PRO m.2), so it's plenty fast enough to keep up with anything. Previously the VM was running on an ext4 partition on an SSD in another Fedora 27 system without issues. Besides that, the previous system was Intel and the new one is AMD. Another CentOS based VM I migrated, which is basically idle all day, does NOT have this issue. Has anyone seen where the host impacts this in any way? I'm trying to figure out why xfsaild would fail in this manner within the VM on my new host and not my old one, as the VM has not been altered in any way. |
|
[94443.457717] [<ffffffff81235765>] do_fsync+0x65/0xa0 [94443.457718] [<ffffffff81235a40>] SyS_fsync+0x10/0x20 [94443.457720] [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b [94563.461909] INFO: task xfsaild/dm-4:502 blocked for more than 120 seconds. [94563.463173] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [94563.464414] xfsaild/dm-4 D ffff88020f610fd0 0 502 2 0x00000000 [94563.464416] Call Trace: [94563.464422] [<ffffffff8109affe>] ? try_to_del_timer_sync+0x5e/0x90 [94563.464425] [<ffffffff816ab8a9>] schedule+0x29/0x70 [94563.464460] [<ffffffffc0464d46>] _xfs_log_force+0x1c6/0x2c0 [xfs] [94563.464464] [<ffffffff810c6620>] ? wake_up_state+0x20/0x20 [94563.464480] [<ffffffffc0470bcc>] ? xfsaild+0x16c/0x6f0 [xfs] [94563.464495] [<ffffffffc0464e6c>] xfs_log_force+0x2c/0x70 [xfs] [94563.464509] [<ffffffffc0470a60>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] [94563.464522] [<ffffffffc0470bcc>] xfsaild+0x16c/0x6f0 [xfs] [94563.464535] [<ffffffffc0470a60>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] [94563.464538] [<ffffffff810b270f>] kthread+0xcf/0xe0 [94563.464540] [<ffffffff810b2640>] ? insert_kthread_work+0x40/0x40 [94563.464543] [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [94563.464544] [<ffffffff810b2640>] ? insert_kthread_work+0x40/0x40 [94563.464550] INFO: task auditd:1002 blocked for more than 120 seconds. [94563.466275] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [94563.468135] auditd D ffff8802123e6eb0 0 1002 1 0x00000000 [94563.468138] Call Trace: [94563.468161] [<ffffffffc0461487>] ? kmem_zone_alloc+0x97/0x130 [xfs] [94563.468165] [<ffffffff816ab8a9>] schedule+0x29/0x70 [94563.468167] [<ffffffff816ad195>] rwsem_down_write_failed+0x225/0x3a0 [94563.468189] [<ffffffffc0461bc4>] ? xlog_grant_head_check+0x54/0x100 [xfs] [94563.468210] [<ffffffffc0450c9d>] ? xfs_vn_update_time+0xcd/0x150 [xfs] [94563.468213] [<ffffffff81333fd7>] call_rwsem_down_write_failed+0x17/0x30 [94563.468216] [<ffffffff816aaa8d>] down_write+0x2d/0x3d [94563.468236] [<ffffffffc04531e1>] xfs_ilock+0xc1/0x120 [xfs] [94563.468254] [<ffffffffc0450c9d>] xfs_vn_update_time+0xcd/0x150 [xfs] [94563.468258] [<ffffffff8121efa5>] update_time+0x25/0xd0 [94563.468261] [<ffffffff8121f0f0>] file_update_time+0xa0/0xf0 [94563.468280] [<ffffffffc04473b5>] xfs_file_aio_write_checks+0x185/0x1f0 [xfs] [94563.468298] [<ffffffffc0447a5a>] xfs_file_buffered_aio_write+0xca/0x2c0 [xfs] [94563.468316] [<ffffffffc0447ddd>] xfs_file_aio_write+0x18d/0x1b0 [xfs] [94563.468319] [<ffffffff8120222d>] do_sync_write+0x8d/0xd0 [94563.468323] [<ffffffff81202ced>] vfs_write+0xbd/0x1e0 [94563.468325] [<ffffffff81203aff>] SyS_write+0x7f/0xe0 [94563.468328] [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b [94563.468330] INFO: task auditd:1003 blocked for more than 120 seconds. [94563.470184] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [94563.472067] auditd D ffff8802135f0000 0 1003 1 0x00000000 [94563.472086] Call Trace: [94563.472089] [<ffffffff816ab8a9>] schedule+0x29/0x70 |
|
After one day of uptime on 3.10.0-693.17.1.el7.x86_64 the problem recurred. No md. I was unable to reproduce using the CPU offline recipe, unless it took effect some hours after the experiment. |
From the redhat bug (requesting a sosreport, blocked task output, and a kdump): Could you please describe the storage (devices, I/O schedulers, etc.), filesystem (xfs_info) and workload involved? Note that much of the former would be included in a sosreport. Also, could you provide the complete blocked task output while in this state ('echo w > /proc/sysrq-trigger')? FWIW, a kdump wouldn't hurt as well if kdump is configured. |
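In concrete terms, the items being asked for can be gathered roughly like this; sysrq has to be enabled, and the mount point below is just a placeholder:

```sh
# Storage / scheduler / filesystem description
lsblk -o NAME,SIZE,TYPE,ROTA,SCHED,MOUNTPOINT
xfs_info /path/to/affected/mount
cat /sys/block/*/queue/scheduler

# Blocked-task dump while the hang is in progress (lands in dmesg / /var/log/messages)
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg > /var/tmp/blocked-tasks.txt

# Full system report for the bug
sosreport --batch
```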
|
Bam, bug closed: --- Comment #13 from Brian Foster <bfoster@redhat.com> --- All signs point to this as a duplicate of bug 1493585. Feel free to reopen if a hang variant is reproduced that does not involve md. *** This bug has been marked as a duplicate of bug 1493585 *** |
|
Which is yet another invisible RHEL bug :-( Does anyone have the ability to read that and add email addresses to the cc list? |
|
I asked for exactly that - no reply to date | |
> Feel free to reopen if a hang variant is reproduced that does not involve md. My issue does not involve md. It also happens on SSD's. |
|
"Bug 1493585 is in the verified state and has already been fixed in MD in kernel-3.10.0-820.el7." |
|
Had this one left from my January hang up:

    # ps aux | grep D
    USER       PID %CPU %MEM    VSZ   RSS TTY   STAT  START TIME COMMAND
    root       124  0.0  0.0      0     0 ?     DN    Jan09 0:02 [khugepaged]
    root       538  0.0  0.0      0     0 ?     D     Jan09 2:11 [xfsaild/md127]
    root       723  0.0  0.0 128428  1492 ?     Ds    Jan13 0:00 /bin/systemctl status snmpd.service
    root       842  0.0  0.0  55452  1024 ?     D<sl  Jan09 0:08 /sbin/auditd
    root       936  0.0  0.0 201248  1112 ?     Ssl   Jan09 0:00 /usr/sbin/gssproxy -D
    root      1358  0.0  0.0 150220  3412 ?     Dsl   Jan09 3:57 /usr/bin/monit -I
    root      1375  0.0  0.0 105996  3728 ?     Ss    Jan09 0:05 /usr/sbin/sshd -D
    root      1852  0.0  0.0 178116  2508 ?     S     Jan16 0:00 /usr/sbin/CROND -n
    root      1864  0.0  0.0  89616  3912 ?     S     Jan16 0:00 /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t -f root
    root      1879  0.0  0.0  89584  3908 ?     D     Jan16 0:00 /usr/sbin/postdrop -r
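A slightly tighter way to list only the uninterruptible tasks, since a plain `grep D` also matches other columns as seen above; just a convenience:

```sh
# Show only tasks in D (uninterruptible sleep) state, with their kernel wait channel
ps -eo state,pid,wchan:32,cmd | awk '$1 ~ /^D/'
```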
|
Regarding diito's items, I would suggest appending the sosreport, blocked task output and kdump to this ticket, and once they are here I will reopen the redhat ticket. Otherwise you could jump through the hoops, get a redhat bugzilla login and append those items there. Or wait until -820. I was not completely certain I was correct when I told redhat that MD was involved in my hangs. I am not explicitly using the MD raid tools, but in the blocked task report mdX_raid1 showed up with a surprisingly high wait-time for a module I did not believe I was using. If the -820 kernel did not fix me, I was going to reopen the ticket. |
|
I'm experiencing the same problems, my entire array got stuck and i had to reset the server. I was on 3.10.0-693.11.6.el7.x86_64 and now upgraded to 3.10.0-693.17.1.el7.x86_64 but thompsop said it recurred for him so i don't doubt its going to happen again for me. Using an array of SSD's with mdadm Jan 31 06:36:01 vita kernel: INFO: task xfsaild/md0:1190 blocked for more than 120 seconds. Jan 31 06:36:01 vita kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 31 06:36:01 vita kernel: xfsaild/md0 D ffff88203d901fa0 0 1190 2 0x00000000 Jan 31 06:36:01 vita kernel: Call Trace: Jan 31 06:36:01 vita kernel: [<ffffffff816ab6d9>] schedule+0x29/0x70 Jan 31 06:36:01 vita kernel: [<ffffffffc02c7d36>] _xfs_log_force+0x1c6/0x2c0 [xfs] Jan 31 06:36:01 vita kernel: [<ffffffff810c6440>] ? wake_up_state+0x20/0x20 Jan 31 06:36:01 vita kernel: [<ffffffffc02d3bbc>] ? xfsaild+0x16c/0x6f0 [xfs] Jan 31 06:36:01 vita kernel: [<ffffffffc02c7e5c>] xfs_log_force+0x2c/0x70 [xfs] Jan 31 06:36:01 vita kernel: [<ffffffffc02d3a50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] Jan 31 06:36:01 vita kernel: [<ffffffffc02d3bbc>] xfsaild+0x16c/0x6f0 [xfs] Jan 31 06:36:01 vita kernel: [<ffffffffc02d3a50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs] Jan 31 06:36:01 vita kernel: [<ffffffff810b252f>] kthread+0xcf/0xe0 Jan 31 06:36:01 vita kernel: [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 Jan 31 06:36:01 vita kernel: [<ffffffff816b8798>] ret_from_fork+0x58/0x90 Jan 31 06:36:01 vita kernel: [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 |
|
Sorry for the double post, but is there an ETA for the release of 3.10.0-820.el7, or somewhere I can grab it? |
I'm seeing this on 3.10.0-514.26.2 as well, like @dlito. The hypervisor is FC27 and the VM has an XFS system partition on a qemu image that is sitting on an NVMe SSD. |
(I should add that I also tried kernel -830 and still experienced the same issue.) | |
@thompsop We have a similar issue too. We don't have a reproducer yet. I am trying -830 too, but there is no mention of 1493585 in the changelog. Any other pointers, as the bugzilla is private?

    $ rpm -qp --changelog kernel-3.10.0-830.el7.x86_64.rpm | grep 1493585
|
Have not tried the -830 kernel yet myself, still using the latest debug kernel. I'd imagine they would need this from the -830 version (sosreport, blocked task output, kdump): Redhat: Could you please describe the storage (devices, I/O schedulers, etc.), filesystem (xfs_info) and workload involved? Note that much of the former would be included in a sosreport. Redhat: Also, could you provide the complete blocked task output while in this state ('echo w > /proc/sysrq-trigger')? FWIW, a kdump wouldn't hurt as well if kdump is configured. |
|
@thompsop Thanks for your reply. Could you request to Redhat that their bugzilla thread be made public? Please let them know there are other users who are experiencing this (link them to here). The system is completely hung when in this state, so only a hard reset brings it back to life. In other words, I can't run 'echo w > /proc/sysrq-trigger' as you suggest when it is hung... I changed the hypervisor kernel to the latest in the 4.16 rc series, which seems to have delayed the "hanging" until a few days of uptime. @diito Are you still experiencing this? Can you provide some more information about the system? Hypervisor, hypervisor kernel, QEMU image format, etc... Thanks. |
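One thing that sometimes still works when SSH is dead is the magic SysRq key from the console, or injected by the hypervisor; it has to be enabled ahead of time. A sketch, where the guest name in the virsh line is made up:

```sh
# Enable magic SysRq in advance so "w" can be sent even when logins hang
echo "kernel.sysrq = 1" > /etc/sysctl.d/99-sysrq.conf
sysctl -p /etc/sysctl.d/99-sysrq.conf

# On a physical console: Alt+SysRq+w dumps blocked tasks to the kernel log.
# From a libvirt/KVM host, for a guest named "centos7-vm" (illustrative):
virsh send-key centos7-vm KEY_LEFTALT KEY_SYSRQ KEY_W
```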
|
@justin2net I mentioned to redhat that a public bug would be a great benefit. They promptly answered that the bug was likely logged originally by a big, widely recognized company with a fancy paid support contract that does not want possibly proprietary information in the non-confidential parts of the bug exposed to the whole world. That said, I am waiting for the CentOS build of the supposedly fixed kernel to drop so I can test it and, if I find it wanting, report those facts back to the redhat bug; I am not certain what the delay is on a CentOS 7 -830 kernel. Since 2017 I have been running the special debug version of whatever the current kernel is, which you can choose to install manually and which has extra code compiled into the image that is supposed to help identify problems like this deadlock. In this case the debug code seems to mask the problem, so my machine essentially never crashes as long as the debug kernel is running, which is an acceptable workaround for my purposes. If I go back to the normal kernel I will crash within hours. I'm not clear if anyone other than me has tried this, but for me it has taken the urgency off the resolution of this bug. |
|
@thompsop Thanks for the info! I have installed 3.10.0-693.17.1.el7.x86_64.debug. Let's hope it masks the problem like in your case. Looking at this, https://access.redhat.com/solutions/44983 , I don't know which option would mask it. I'm actually on RHEL 7 (self-support dev license) and -830 is available there, so I have tried it, but with no resolution of the hanging... |
|
@justin2net Can you get a reproducer? Do you manage to trigger it? I am trying to get a reproducer but no luck yet. It can take several days to show up. |
|
I updated notes in the Redhat bug about -830 not helping and also seeming to miss the relevant fix. @justin2net If you have a Redhat dev license you should be able to get a login to bugzilla to have more access to some of the details of the case there. It's a different login than the access.redhat.com site apparently, but it should still be accessible to you. |
|
Most kernel-related bugs get marked 'private' automatically. Only those in the CC list can access them. If you are the submitter, you should be able to add email addresses to the list for those who want to have access. | |
@alphacc No, I have not been able to come up with a recipe that reproduces the bug. It almost invariably occurs after a long period of idle, though. |
|
'echo w > /proc/sysrq-trigger' output from 3.10.0-693.21.1.el7.x86_64; vmcore from same. |
|
transcribed from other bug --- Comment #19 from Eric Sandeen <esandeen@redhat.com> --- -830 is not missing the relevant fix. The patchset fixed several bugs and not all were listed in the changelog, but -830 does indeed have the md patchset. The report in the centos bug which says "830 didn't fix it for me" does not contain nearly enough information to know whether it is the same underlying problem; that reporter provided no dmesg, traces, logs etc to know what they hit. When you have a chance to try the -830 kernel for yourself, with md in the mix, feel free to re-open this bug if it still fails for you. |
|
I have a support contract, so I opened a Red Hat case with all the details. I'll update this bug with info I can share when I have updates from Red Hat. As thompsop said, the above-mentioned patches are part of -830. We are still trying to get a reproducer. |
|
I fumbled around for a while on the redhat site and was not able to locate an -830 kernel accessible to me, although I do have the free developer login to redhat.com. If someone can point me to where I can get the bits and pieces for a redhat x86_64 -830 kernel (.rpms), and it is as easy as rpm-installing it on my centos7 system, I have no objection to trying to run that kernel on my machine. |
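If the rpms do turn up, installing a one-off kernel alongside the current one is straightforward; a sketch, where the filename and the grub menuentry title are placeholders for whatever the -830 packages are actually called:

```sh
# Install (not upgrade) so the current kernel remains available as a fallback
rpm -ivh kernel-3.10.0-830.el7.x86_64.rpm

# Regenerate the grub config, then boot the new entry once without making it permanent
grub2-mkconfig -o /boot/grub2/grub.cfg
grub2-reboot 'CentOS Linux (3.10.0-830.el7.x86_64) 7 (Core)'
reboot
```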
@thompsop Did you mean to attach the output of 'w > sysrq' and the vmcore in your previous message? I've been running 3.10.0-693.17.1.el7.x86_64.debug for the past 8 days without any hanging... yet. Thanks for the recommendation. @alphacc Please keep us up to date with the RH Bugzilla you opened. If you find a reproducer, I can try it on my system with a variety of kernel versions. Thanks. |
|
"Bug 1493585 is in the verified state and has already been fixed in MD in kernel-3.10.0-820.el7." This bug is not in md, I have this problem over VDO device. |
|
This issue exists for me on a non-mdraid SSD volume. I've not found any of the CentOS kernel versions, new or old, helpful in mitigating it to date; they all crash after a few days at most. If I install the kernel-ml kernel things are far more stable, and I've gone as long as a month without issue, but it still eventually crashes with the same error. This only seems to affect my CentOS 7 VMs running on a Fedora 27 host. |
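For anyone wanting to try the same thing, kernel-ml comes from the ELRepo repository; roughly as below, though the release-rpm version/URL changes over time, so check elrepo.org for the current one:

```sh
# Add the ELRepo repository and install the mainline kernel build
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum install -y https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install -y kernel-ml

# Boot the newest installed kernel by default
grub2-set-default 0
grub2-mkconfig -o /boot/grub2/grub.cfg
```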
Just got a brand new Supermicro server with Xeon Silver 4114 CPUs; it still has the exact same issue, now confirmed over 3 different generations of servers. All of my servers run SSDs in mdadm arrays. For my latest server I went to the CentOS vault, downloaded 3.10.0-514.26.2.el7.x86_64, and that seems to fix it. Attached logs from my latest server. Haven't tested the debug kernel yet; have you crashed at all since using it, @justin2net? yuuno.txt (26,373 bytes)
Apr 25 01:01:47 yuuno kernel: INFO: task kworker/u898:2:16685 blocked for more than 120 seconds. Apr 25 01:01:47 yuuno kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 25 01:01:47 yuuno kernel: kworker/u898:2 D ffff88203a9f0000 0 16685 2 0x00000080 Apr 25 01:01:47 yuuno kernel: Workqueue: writeback bdi_writeback_workfn (flush-9:0) Apr 25 01:01:47 yuuno kernel: Call Trace: Apr 25 01:01:47 yuuno kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70 Apr 25 01:01:47 yuuno kernel: [<ffffffff8151df05>] bitmap_startwrite+0x1f5/0x210 Apr 25 01:01:47 yuuno kernel: [<ffffffff810b3690>] ? wake_up_atomic_t+0x30/0x30 Apr 25 01:01:47 yuuno kernel: [<ffffffffc05482c1>] add_stripe_bio+0x3f1/0x770 [raid456] Apr 25 01:01:47 yuuno kernel: [<ffffffffc054c701>] raid5_make_request+0x211/0xce0 [raid456] Apr 25 01:01:47 yuuno kernel: [<ffffffff810b3690>] ? wake_up_atomic_t+0x30/0x30 Apr 25 01:01:47 yuuno kernel: [<ffffffffc0249487>] ? kmem_zone_alloc+0x97/0x130 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffff8150c644>] md_make_request+0x104/0x290 Apr 25 01:01:47 yuuno kernel: [<ffffffff812fb015>] generic_make_request+0x105/0x310 Apr 25 01:01:47 yuuno kernel: [<ffffffff812fb290>] submit_bio+0x70/0x150 Apr 25 01:01:47 yuuno kernel: [<ffffffffc0222661>] xfs_submit_ioend.isra.16+0x61/0xe0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc0222821>] xfs_vm_writepages+0xd1/0xe0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffff81190efe>] do_writepages+0x1e/0x40 Apr 25 01:01:47 yuuno kernel: [<ffffffff8122fa80>] __writeback_single_inode+0x40/0x220 Apr 25 01:01:47 yuuno kernel: [<ffffffff812306c4>] writeback_sb_inodes+0x1c4/0x490 Apr 25 01:01:47 yuuno kernel: [<ffffffff81230a2f>] __writeback_inodes_wb+0x9f/0xd0 Apr 25 01:01:47 yuuno kernel: [<ffffffff81231263>] wb_writeback+0x263/0x2f0 Apr 25 01:01:47 yuuno kernel: [<ffffffff8123170b>] bdi_writeback_workfn+0x2cb/0x460 Apr 25 01:01:47 yuuno kernel: [<ffffffff810aa59a>] process_one_work+0x17a/0x440 Apr 25 01:01:47 yuuno kernel: [<ffffffff810ab266>] worker_thread+0x126/0x3c0 Apr 25 01:01:47 yuuno kernel: [<ffffffff810ab140>] ? manage_workers.isra.24+0x2a0/0x2a0 Apr 25 01:01:47 yuuno kernel: [<ffffffff810b270f>] kthread+0xcf/0xe0 Apr 25 01:01:47 yuuno kernel: [<ffffffff810b2640>] ? insert_kthread_work+0x40/0x40 Apr 25 01:01:47 yuuno kernel: [<ffffffff816b8798>] ret_from_fork+0x58/0x90 Apr 25 01:01:47 yuuno kernel: [<ffffffff810b2640>] ? insert_kthread_work+0x40/0x40 Apr 25 01:01:47 yuuno kernel: INFO: task rsync:16810 blocked for more than 120 seconds. Apr 25 01:01:47 yuuno kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 25 01:01:47 yuuno kernel: rsync D ffff881fd135dee0 0 16810 16807 0x00000080 Apr 25 01:01:47 yuuno kernel: Call Trace: Apr 25 01:01:47 yuuno kernel: [<ffffffffc0249487>] ? kmem_zone_alloc+0x97/0x130 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70 Apr 25 01:01:47 yuuno kernel: [<ffffffff816a92b9>] schedule_timeout+0x239/0x2c0 Apr 25 01:01:47 yuuno kernel: [<ffffffffc02599f8>] ? _xfs_trans_bjoin+0x48/0x60 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc021c4a7>] ? xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffff816aab47>] __down_common+0xaa/0x104 Apr 25 01:01:47 yuuno kernel: [<ffffffffc02294a0>] ? 
_xfs_buf_find+0x170/0x330 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffff816aabbe>] __down+0x1d/0x1f Apr 25 01:01:47 yuuno kernel: [<ffffffff810b8411>] down+0x41/0x50 Apr 25 01:01:47 yuuno kernel: [<ffffffffc022929c>] xfs_buf_lock+0x3c/0xd0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc02294a0>] _xfs_buf_find+0x170/0x330 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc022968a>] xfs_buf_get_map+0x2a/0x240 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc022a230>] xfs_buf_read_map+0x30/0x160 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc0259ef1>] xfs_trans_read_buf_map+0x211/0x400 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc0208b74>] xfs_da_read_buf+0xd4/0x100 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc020e486>] xfs_dir3_data_read+0x26/0x60 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc02130e4>] xfs_dir2_node_addname_int+0x454/0x820 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc020a096>] ? xfs_da3_node_lookup_int+0x2c6/0x2f0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc0214774>] xfs_dir2_node_addname+0xd4/0x160 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc020bf55>] xfs_dir_createname+0x1d5/0x1f0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc023d4fd>] xfs_create+0x58d/0x6c0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc023a25b>] xfs_vn_mknod+0xbb/0x240 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc023a413>] xfs_vn_create+0x13/0x20 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffff8120f6cd>] vfs_create+0xcd/0x130 Apr 25 01:01:47 yuuno kernel: [<ffffffff8121285a>] do_last+0x10ea/0x12c0 Apr 25 01:01:47 yuuno kernel: [<ffffffff81212af2>] path_openat+0xc2/0x490 Apr 25 01:01:47 yuuno kernel: [<ffffffff8121508b>] do_filp_open+0x4b/0xb0 Apr 25 01:01:47 yuuno kernel: [<ffffffff8122233a>] ? __alloc_fd+0x8a/0x130 Apr 25 01:01:47 yuuno kernel: [<ffffffff81201bc3>] do_sys_open+0xf3/0x1f0 Apr 25 01:01:47 yuuno kernel: [<ffffffff81201cde>] SyS_open+0x1e/0x20 Apr 25 01:01:47 yuuno kernel: [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b Apr 25 01:01:47 yuuno kernel: INFO: task rsync:16937 blocked for more than 120 seconds. Apr 25 01:01:47 yuuno kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 25 01:01:47 yuuno kernel: rsync D ffff88103a201fa0 0 16937 16934 0x00000080 Apr 25 01:01:47 yuuno kernel: Call Trace: Apr 25 01:01:47 yuuno kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70 Apr 25 01:01:47 yuuno kernel: [<ffffffff816a92b9>] schedule_timeout+0x239/0x2c0 Apr 25 01:01:47 yuuno kernel: [<ffffffffc021c4a7>] ? xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffff816aab47>] __down_common+0xaa/0x104 Apr 25 01:01:47 yuuno kernel: [<ffffffffc02294a0>] ? 
_xfs_buf_find+0x170/0x330 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffff816aabbe>] __down+0x1d/0x1f Apr 25 01:01:47 yuuno kernel: [<ffffffff810b8411>] down+0x41/0x50 Apr 25 01:01:47 yuuno kernel: [<ffffffffc022929c>] xfs_buf_lock+0x3c/0xd0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc02294a0>] _xfs_buf_find+0x170/0x330 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc022968a>] xfs_buf_get_map+0x2a/0x240 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc022a230>] xfs_buf_read_map+0x30/0x160 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc0259ef1>] xfs_trans_read_buf_map+0x211/0x400 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc0208b74>] xfs_da_read_buf+0xd4/0x100 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc0208bc3>] xfs_da3_node_read+0x23/0xd0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc0209e3e>] xfs_da3_node_lookup_int+0x6e/0x2f0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc02146ea>] xfs_dir2_node_addname+0x4a/0x160 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc020bf55>] xfs_dir_createname+0x1d5/0x1f0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffffc023eff7>] xfs_rename+0x6f7/0x900 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffff8120e448>] ? inode_permission+0x18/0x50 Apr 25 01:01:47 yuuno kernel: [<ffffffff81210311>] ? link_path_walk+0x81/0x8b0 Apr 25 01:01:47 yuuno kernel: [<ffffffffc0238ea0>] xfs_vn_rename+0xb0/0xf0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffff81213890>] vfs_rename+0x5a0/0x930 Apr 25 01:01:47 yuuno kernel: [<ffffffffc0238df0>] ? xfs_vn_follow_link+0xa0/0xa0 [xfs] Apr 25 01:01:47 yuuno kernel: [<ffffffff81214b93>] SYSC_renameat2+0x503/0x5a0 Apr 25 01:01:47 yuuno kernel: [<ffffffff81205b91>] ? __sb_end_write+0x31/0x60 Apr 25 01:01:47 yuuno kernel: [<ffffffff812230df>] ? mnt_drop_write+0x1f/0x30 Apr 25 01:01:47 yuuno kernel: [<ffffffff812249f4>] ? mntput+0x24/0x40 Apr 25 01:01:47 yuuno kernel: [<ffffffff8120d57e>] ? path_put+0x1e/0x30 Apr 25 01:01:47 yuuno kernel: [<ffffffff81215a7e>] SyS_renameat2+0xe/0x10 Apr 25 01:01:47 yuuno kernel: [<ffffffff81215abe>] SyS_rename+0x1e/0x20 Apr 25 01:01:47 yuuno kernel: [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b Apr 25 01:03:47 yuuno kernel: INFO: task kworker/u898:2:16685 blocked for more than 120 seconds. Apr 25 01:03:47 yuuno kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 25 01:03:47 yuuno kernel: kworker/u898:2 D ffff88203a9f0000 0 16685 2 0x00000080 Apr 25 01:03:47 yuuno kernel: Workqueue: writeback bdi_writeback_workfn (flush-9:0) Apr 25 01:03:47 yuuno kernel: Call Trace: Apr 25 01:03:47 yuuno kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70 Apr 25 01:03:47 yuuno kernel: [<ffffffff8151df05>] bitmap_startwrite+0x1f5/0x210 Apr 25 01:03:47 yuuno kernel: [<ffffffff810b3690>] ? wake_up_atomic_t+0x30/0x30 Apr 25 01:03:47 yuuno kernel: [<ffffffffc05482c1>] add_stripe_bio+0x3f1/0x770 [raid456] Apr 25 01:03:47 yuuno kernel: [<ffffffffc054c701>] raid5_make_request+0x211/0xce0 [raid456] Apr 25 01:03:47 yuuno kernel: [<ffffffff810b3690>] ? wake_up_atomic_t+0x30/0x30 Apr 25 01:03:47 yuuno kernel: [<ffffffffc0249487>] ? 
kmem_zone_alloc+0x97/0x130 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffff8150c644>] md_make_request+0x104/0x290 Apr 25 01:03:47 yuuno kernel: [<ffffffff812fb015>] generic_make_request+0x105/0x310 Apr 25 01:03:47 yuuno kernel: [<ffffffff812fb290>] submit_bio+0x70/0x150 Apr 25 01:03:47 yuuno kernel: [<ffffffffc0222661>] xfs_submit_ioend.isra.16+0x61/0xe0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc0222821>] xfs_vm_writepages+0xd1/0xe0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffff81190efe>] do_writepages+0x1e/0x40 Apr 25 01:03:47 yuuno kernel: [<ffffffff8122fa80>] __writeback_single_inode+0x40/0x220 Apr 25 01:03:47 yuuno kernel: [<ffffffff812306c4>] writeback_sb_inodes+0x1c4/0x490 Apr 25 01:03:47 yuuno kernel: [<ffffffff81230a2f>] __writeback_inodes_wb+0x9f/0xd0 Apr 25 01:03:47 yuuno kernel: [<ffffffff81231263>] wb_writeback+0x263/0x2f0 Apr 25 01:03:47 yuuno kernel: [<ffffffff8123170b>] bdi_writeback_workfn+0x2cb/0x460 Apr 25 01:03:47 yuuno kernel: [<ffffffff810aa59a>] process_one_work+0x17a/0x440 Apr 25 01:03:47 yuuno kernel: [<ffffffff810ab266>] worker_thread+0x126/0x3c0 Apr 25 01:03:47 yuuno kernel: [<ffffffff810ab140>] ? manage_workers.isra.24+0x2a0/0x2a0 Apr 25 01:03:47 yuuno kernel: [<ffffffff810b270f>] kthread+0xcf/0xe0 Apr 25 01:03:47 yuuno kernel: [<ffffffff810b2640>] ? insert_kthread_work+0x40/0x40 Apr 25 01:03:47 yuuno kernel: [<ffffffff816b8798>] ret_from_fork+0x58/0x90 Apr 25 01:03:47 yuuno kernel: [<ffffffff810b2640>] ? insert_kthread_work+0x40/0x40 Apr 25 01:03:47 yuuno kernel: INFO: task rsync:16810 blocked for more than 120 seconds. Apr 25 01:03:47 yuuno kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 25 01:03:47 yuuno kernel: rsync D ffff881fd135dee0 0 16810 16807 0x00000080 Apr 25 01:03:47 yuuno kernel: Call Trace: Apr 25 01:03:47 yuuno kernel: [<ffffffffc0249487>] ? kmem_zone_alloc+0x97/0x130 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70 Apr 25 01:03:47 yuuno kernel: [<ffffffff816a92b9>] schedule_timeout+0x239/0x2c0 Apr 25 01:03:47 yuuno kernel: [<ffffffffc02599f8>] ? _xfs_trans_bjoin+0x48/0x60 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc021c4a7>] ? xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffff816aab47>] __down_common+0xaa/0x104 Apr 25 01:03:47 yuuno kernel: [<ffffffffc02294a0>] ? _xfs_buf_find+0x170/0x330 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffff816aabbe>] __down+0x1d/0x1f Apr 25 01:03:47 yuuno kernel: [<ffffffff810b8411>] down+0x41/0x50 Apr 25 01:03:47 yuuno kernel: [<ffffffffc022929c>] xfs_buf_lock+0x3c/0xd0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc02294a0>] _xfs_buf_find+0x170/0x330 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc022968a>] xfs_buf_get_map+0x2a/0x240 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc022a230>] xfs_buf_read_map+0x30/0x160 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc0259ef1>] xfs_trans_read_buf_map+0x211/0x400 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc0208b74>] xfs_da_read_buf+0xd4/0x100 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc020e486>] xfs_dir3_data_read+0x26/0x60 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc02130e4>] xfs_dir2_node_addname_int+0x454/0x820 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc020a096>] ? 
xfs_da3_node_lookup_int+0x2c6/0x2f0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc0214774>] xfs_dir2_node_addname+0xd4/0x160 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc020bf55>] xfs_dir_createname+0x1d5/0x1f0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc023d4fd>] xfs_create+0x58d/0x6c0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc023a25b>] xfs_vn_mknod+0xbb/0x240 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc023a413>] xfs_vn_create+0x13/0x20 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffff8120f6cd>] vfs_create+0xcd/0x130 Apr 25 01:03:47 yuuno kernel: [<ffffffff8121285a>] do_last+0x10ea/0x12c0 Apr 25 01:03:47 yuuno kernel: [<ffffffff81212af2>] path_openat+0xc2/0x490 Apr 25 01:03:47 yuuno kernel: [<ffffffff8121508b>] do_filp_open+0x4b/0xb0 Apr 25 01:03:47 yuuno kernel: [<ffffffff8122233a>] ? __alloc_fd+0x8a/0x130 Apr 25 01:03:47 yuuno kernel: [<ffffffff81201bc3>] do_sys_open+0xf3/0x1f0 Apr 25 01:03:47 yuuno kernel: [<ffffffff81201cde>] SyS_open+0x1e/0x20 Apr 25 01:03:47 yuuno kernel: [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b Apr 25 01:03:47 yuuno kernel: INFO: task rsync:16937 blocked for more than 120 seconds. Apr 25 01:03:47 yuuno kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 25 01:03:47 yuuno kernel: rsync D ffff88103a201fa0 0 16937 16934 0x00000080 Apr 25 01:03:47 yuuno kernel: Call Trace: Apr 25 01:03:47 yuuno kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70 Apr 25 01:03:47 yuuno kernel: [<ffffffff816a92b9>] schedule_timeout+0x239/0x2c0 Apr 25 01:03:47 yuuno kernel: [<ffffffffc021c4a7>] ? xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffff816aab47>] __down_common+0xaa/0x104 Apr 25 01:03:47 yuuno kernel: [<ffffffffc02294a0>] ? _xfs_buf_find+0x170/0x330 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffff816aabbe>] __down+0x1d/0x1f Apr 25 01:03:47 yuuno kernel: [<ffffffff810b8411>] down+0x41/0x50 Apr 25 01:03:47 yuuno kernel: [<ffffffffc022929c>] xfs_buf_lock+0x3c/0xd0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc02294a0>] _xfs_buf_find+0x170/0x330 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc022968a>] xfs_buf_get_map+0x2a/0x240 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc022a230>] xfs_buf_read_map+0x30/0x160 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc0259ef1>] xfs_trans_read_buf_map+0x211/0x400 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc0208b74>] xfs_da_read_buf+0xd4/0x100 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc0208bc3>] xfs_da3_node_read+0x23/0xd0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc0209e3e>] xfs_da3_node_lookup_int+0x6e/0x2f0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc02146ea>] xfs_dir2_node_addname+0x4a/0x160 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc020bf55>] xfs_dir_createname+0x1d5/0x1f0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffffc023eff7>] xfs_rename+0x6f7/0x900 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffff8120e448>] ? inode_permission+0x18/0x50 Apr 25 01:03:47 yuuno kernel: [<ffffffff81210311>] ? link_path_walk+0x81/0x8b0 Apr 25 01:03:47 yuuno kernel: [<ffffffffc0238ea0>] xfs_vn_rename+0xb0/0xf0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffff81213890>] vfs_rename+0x5a0/0x930 Apr 25 01:03:47 yuuno kernel: [<ffffffffc0238df0>] ? xfs_vn_follow_link+0xa0/0xa0 [xfs] Apr 25 01:03:47 yuuno kernel: [<ffffffff81214b93>] SYSC_renameat2+0x503/0x5a0 Apr 25 01:03:47 yuuno kernel: [<ffffffff81205b91>] ? __sb_end_write+0x31/0x60 Apr 25 01:03:47 yuuno kernel: [<ffffffff812230df>] ? mnt_drop_write+0x1f/0x30 Apr 25 01:03:47 yuuno kernel: [<ffffffff812249f4>] ? 
mntput+0x24/0x40
Apr 25 01:03:47 yuuno kernel: [<ffffffff8120d57e>] ? path_put+0x1e/0x30
Apr 25 01:03:47 yuuno kernel: [<ffffffff81215a7e>] SyS_renameat2+0xe/0x10
Apr 25 01:03:47 yuuno kernel: [<ffffffff81215abe>] SyS_rename+0x1e/0x20
Apr 25 01:03:47 yuuno kernel: [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
Apr 25 01:03:47 yuuno kernel: INFO: task xfs_db:17136 blocked for more than 120 seconds.
Apr 25 01:03:47 yuuno kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 25 01:03:47 yuuno kernel: xfs_db D ffff88101aa65ee0 0 17136 17135 0x00000080
Apr 25 01:03:47 yuuno kernel: Call Trace:
Apr 25 01:03:47 yuuno kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70
Apr 25 01:03:47 yuuno kernel: [<ffffffff816a92b9>] schedule_timeout+0x239/0x2c0
Apr 25 01:03:47 yuuno kernel: [<ffffffff810a847c>] ? __queue_work+0x13c/0x3c0
Apr 25 01:03:47 yuuno kernel: [<ffffffff810a87ca>] ? __queue_delayed_work+0xaa/0x1a0
Apr 25 01:03:47 yuuno kernel: [<ffffffff810a8b31>] ? try_to_grab_pending+0xb1/0x160
Apr 25 01:03:47 yuuno kernel: [<ffffffff816abc5d>] wait_for_completion+0xfd/0x140
Apr 25 01:03:47 yuuno kernel: [<ffffffff810c6620>] ? wake_up_state+0x20/0x20
Apr 25 01:03:47 yuuno kernel: [<ffffffff8122fdf0>] writeback_inodes_sb_nr+0x90/0xd0
Apr 25 01:03:47 yuuno kernel: [<ffffffff8122fe55>] writeback_inodes_sb+0x25/0x30
Apr 25 01:03:47 yuuno kernel: [<ffffffff81235805>] sync_filesystem+0x35/0xb0
Apr 25 01:03:47 yuuno kernel: [<ffffffff812407f4>] fsync_bdev+0x24/0x60
Apr 25 01:03:47 yuuno kernel: [<ffffffff8130aa58>] blkdev_ioctl+0x6a8/0x980
Apr 25 01:03:47 yuuno kernel: [<ffffffff81187357>] ? mempool_free_slab+0x17/0x20
Apr 25 01:03:47 yuuno kernel: [<ffffffff8123fc91>] block_ioctl+0x41/0x50
Apr 25 01:03:47 yuuno kernel: [<ffffffff8121730d>] do_vfs_ioctl+0x33d/0x540
Apr 25 01:03:47 yuuno kernel: [<ffffffff812fe72a>] ? blkdev_issue_flush+0xda/0x110
Apr 25 01:03:47 yuuno kernel: [<ffffffff812175b1>] SyS_ioctl+0xa1/0xc0
Apr 25 01:03:47 yuuno kernel: [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
Apr 25 01:05:47 yuuno kernel: INFO: task kworker/u898:2:16685 blocked for more than 120 seconds.
Apr 25 01:05:47 yuuno kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 25 01:05:47 yuuno kernel: kworker/u898:2 D ffff88203a9f0000 0 16685 2 0x00000080
Apr 25 01:05:47 yuuno kernel: Workqueue: writeback bdi_writeback_workfn (flush-9:0)
Apr 25 01:05:47 yuuno kernel: Call Trace:
Apr 25 01:05:47 yuuno kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70
Apr 25 01:05:47 yuuno kernel: [<ffffffff8151df05>] bitmap_startwrite+0x1f5/0x210
Apr 25 01:05:47 yuuno kernel: [<ffffffff810b3690>] ? wake_up_atomic_t+0x30/0x30
Apr 25 01:05:47 yuuno kernel: [<ffffffffc05482c1>] add_stripe_bio+0x3f1/0x770 [raid456]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc054c701>] raid5_make_request+0x211/0xce0 [raid456]
Apr 25 01:05:47 yuuno kernel: [<ffffffff810b3690>] ? wake_up_atomic_t+0x30/0x30
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0249487>] ? kmem_zone_alloc+0x97/0x130 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffff8150c644>] md_make_request+0x104/0x290
Apr 25 01:05:47 yuuno kernel: [<ffffffff812fb015>] generic_make_request+0x105/0x310
Apr 25 01:05:47 yuuno kernel: [<ffffffff812fb290>] submit_bio+0x70/0x150
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0222661>] xfs_submit_ioend.isra.16+0x61/0xe0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0222821>] xfs_vm_writepages+0xd1/0xe0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffff81190efe>] do_writepages+0x1e/0x40
Apr 25 01:05:47 yuuno kernel: [<ffffffff8122fa80>] __writeback_single_inode+0x40/0x220
Apr 25 01:05:47 yuuno kernel: [<ffffffff812306c4>] writeback_sb_inodes+0x1c4/0x490
Apr 25 01:05:47 yuuno kernel: [<ffffffff81230a2f>] __writeback_inodes_wb+0x9f/0xd0
Apr 25 01:05:47 yuuno kernel: [<ffffffff81231263>] wb_writeback+0x263/0x2f0
Apr 25 01:05:47 yuuno kernel: [<ffffffff8123170b>] bdi_writeback_workfn+0x2cb/0x460
Apr 25 01:05:47 yuuno kernel: [<ffffffff810aa59a>] process_one_work+0x17a/0x440
Apr 25 01:05:47 yuuno kernel: [<ffffffff810ab266>] worker_thread+0x126/0x3c0
Apr 25 01:05:47 yuuno kernel: [<ffffffff810ab140>] ? manage_workers.isra.24+0x2a0/0x2a0
Apr 25 01:05:47 yuuno kernel: [<ffffffff810b270f>] kthread+0xcf/0xe0
Apr 25 01:05:47 yuuno kernel: [<ffffffff810b2640>] ? insert_kthread_work+0x40/0x40
Apr 25 01:05:47 yuuno kernel: [<ffffffff816b8798>] ret_from_fork+0x58/0x90
Apr 25 01:05:47 yuuno kernel: [<ffffffff810b2640>] ? insert_kthread_work+0x40/0x40
Apr 25 01:05:47 yuuno kernel: INFO: task rsync:16810 blocked for more than 120 seconds.
Apr 25 01:05:47 yuuno kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 25 01:05:47 yuuno kernel: rsync D ffff881fd135dee0 0 16810 16807 0x00000080
Apr 25 01:05:47 yuuno kernel: Call Trace:
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0249487>] ? kmem_zone_alloc+0x97/0x130 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70
Apr 25 01:05:47 yuuno kernel: [<ffffffff816a92b9>] schedule_timeout+0x239/0x2c0
Apr 25 01:05:47 yuuno kernel: [<ffffffffc02599f8>] ? _xfs_trans_bjoin+0x48/0x60 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc021c4a7>] ? xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffff816aab47>] __down_common+0xaa/0x104
Apr 25 01:05:47 yuuno kernel: [<ffffffffc02294a0>] ? _xfs_buf_find+0x170/0x330 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffff816aabbe>] __down+0x1d/0x1f
Apr 25 01:05:47 yuuno kernel: [<ffffffff810b8411>] down+0x41/0x50
Apr 25 01:05:47 yuuno kernel: [<ffffffffc022929c>] xfs_buf_lock+0x3c/0xd0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc02294a0>] _xfs_buf_find+0x170/0x330 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc022968a>] xfs_buf_get_map+0x2a/0x240 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc022a230>] xfs_buf_read_map+0x30/0x160 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0259ef1>] xfs_trans_read_buf_map+0x211/0x400 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0208b74>] xfs_da_read_buf+0xd4/0x100 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc020e486>] xfs_dir3_data_read+0x26/0x60 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc02130e4>] xfs_dir2_node_addname_int+0x454/0x820 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc020a096>] ? xfs_da3_node_lookup_int+0x2c6/0x2f0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0214774>] xfs_dir2_node_addname+0xd4/0x160 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc020bf55>] xfs_dir_createname+0x1d5/0x1f0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc023d4fd>] xfs_create+0x58d/0x6c0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc023a25b>] xfs_vn_mknod+0xbb/0x240 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc023a413>] xfs_vn_create+0x13/0x20 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffff8120f6cd>] vfs_create+0xcd/0x130
Apr 25 01:05:47 yuuno kernel: [<ffffffff8121285a>] do_last+0x10ea/0x12c0
Apr 25 01:05:47 yuuno kernel: [<ffffffff81212af2>] path_openat+0xc2/0x490
Apr 25 01:05:47 yuuno kernel: [<ffffffff8121508b>] do_filp_open+0x4b/0xb0
Apr 25 01:05:47 yuuno kernel: [<ffffffff8122233a>] ? __alloc_fd+0x8a/0x130
Apr 25 01:05:47 yuuno kernel: [<ffffffff81201bc3>] do_sys_open+0xf3/0x1f0
Apr 25 01:05:47 yuuno kernel: [<ffffffff81201cde>] SyS_open+0x1e/0x20
Apr 25 01:05:47 yuuno kernel: [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
Apr 25 01:05:47 yuuno kernel: INFO: task rsync:16937 blocked for more than 120 seconds.
Apr 25 01:05:47 yuuno kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 25 01:05:47 yuuno kernel: rsync D ffff88103a201fa0 0 16937 16934 0x00000080
Apr 25 01:05:47 yuuno kernel: Call Trace:
Apr 25 01:05:47 yuuno kernel: [<ffffffff816ab8a9>] schedule+0x29/0x70
Apr 25 01:05:47 yuuno kernel: [<ffffffff816a92b9>] schedule_timeout+0x239/0x2c0
Apr 25 01:05:47 yuuno kernel: [<ffffffffc021c4a7>] ? xfs_iext_bno_to_ext+0xa7/0x1a0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffff816aab47>] __down_common+0xaa/0x104
Apr 25 01:05:47 yuuno kernel: [<ffffffffc02294a0>] ? _xfs_buf_find+0x170/0x330 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffff816aabbe>] __down+0x1d/0x1f
Apr 25 01:05:47 yuuno kernel: [<ffffffff810b8411>] down+0x41/0x50
Apr 25 01:05:47 yuuno kernel: [<ffffffffc022929c>] xfs_buf_lock+0x3c/0xd0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc02294a0>] _xfs_buf_find+0x170/0x330 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc022968a>] xfs_buf_get_map+0x2a/0x240 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc022a230>] xfs_buf_read_map+0x30/0x160 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0259ef1>] xfs_trans_read_buf_map+0x211/0x400 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0208b74>] xfs_da_read_buf+0xd4/0x100 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0208bc3>] xfs_da3_node_read+0x23/0xd0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0209e3e>] xfs_da3_node_lookup_int+0x6e/0x2f0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc02146ea>] xfs_dir2_node_addname+0x4a/0x160 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc020bf55>] xfs_dir_createname+0x1d5/0x1f0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffffc023eff7>] xfs_rename+0x6f7/0x900 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffff8120e448>] ? inode_permission+0x18/0x50
Apr 25 01:05:47 yuuno kernel: [<ffffffff81210311>] ? link_path_walk+0x81/0x8b0
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0238ea0>] xfs_vn_rename+0xb0/0xf0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffff81213890>] vfs_rename+0x5a0/0x930
Apr 25 01:05:47 yuuno kernel: [<ffffffffc0238df0>] ? xfs_vn_follow_link+0xa0/0xa0 [xfs]
Apr 25 01:05:47 yuuno kernel: [<ffffffff81214b93>] SYSC_renameat2+0x503/0x5a0
Apr 25 01:05:47 yuuno kernel: [<ffffffff81205b91>] ? __sb_end_write+0x31/0x60
Apr 25 01:05:47 yuuno kernel: [<ffffffff812230df>] ? mnt_drop_write+0x1f/0x30
Apr 25 01:05:47 yuuno kernel: [<ffffffff812249f4>] ? mntput+0x24/0x40
Apr 25 01:05:47 yuuno kernel: [<ffffffff8120d57e>] ? path_put+0x1e/0x30
Apr 25 01:05:47 yuuno kernel: [<ffffffff81215a7e>] SyS_renameat2+0xe/0x10
Apr 25 01:05:47 yuuno kernel: [<ffffffff81215abe>] SyS_rename+0x1e/0x20
Apr 25 01:05:47 yuuno kernel: [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b |
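For anyone else gathering data for this report: the messages above come from the hung-task watchdog at its default 120-second threshold, and a fuller picture of all blocked tasks can be captured when the hang recurs. A rough sketch, assuming root access and that the magic SysRq interface is permitted on the box (the output path is just an example):
# Current watchdog threshold (120 seconds by default on these kernels)
cat /proc/sys/kernel/hung_task_timeout_secs
# Enable the magic SysRq interface if it is restricted
echo 1 > /proc/sys/kernel/sysrq
# Dump every task currently stuck in uninterruptible (D) state to the kernel log,
# then save the log for attaching to the report
echo w > /proc/sysrq-trigger
dmesg > /tmp/blocked-tasks.txt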
|
Hello. I ran into the same problem on a similar server. I have two servers: one works correctly, the other hangs after a few days. On the server that hangs, the file system looks like this:
[~]# xfs_info /
meta-data=/dev/md125             isize=256    agcount=4, agsize=1310720 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=5242880, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
The server that does not hang reports the following:
[~]# xfs_info /
meta-data=/dev/md125             isize=512    agcount=4, agsize=1310720 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=5242880, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Maybe the problem lies in isize or crc? Can someone check this on their own systems? |
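If it helps with the comparison, the fields in question can be pulled straight out of xfs_info; a rough sketch, run as root on each box (the mount point and device are just examples):
# Print only the format flags suspected above
xfs_info / | grep -oE '(isize|crc|ftype)=[0-9]+'
# crc and ftype are fixed at mkfs time; changing them means recreating the
# filesystem from backup, e.g. (DESTRUCTIVE, shown only as an illustration):
# mkfs.xfs -m crc=1 -n ftype=1 /dev/md125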
|
1) 3.10.0-862.el7.x86_64 has now been released for CentOS.
2) xfs_info /
meta-data=/dev/mapper/centos-root isize=256    agcount=4, agsize=3276800 blks
         =                        sectsz=512   attr=2, projid32bit=1
         =                        crc=0        finobt=0 spinodes=0
data     =                        bsize=4096   blocks=13107200, imaxpct=25
         =                        sunit=0      swidth=0 blks
naming   =version 2               bsize=4096   ascii-ci=0 ftype=0
log      =internal                bsize=4096   blocks=6400, version=2
         =                        sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                    extsz=4096   blocks=0, rtextents=0 |
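For anyone who wants to verify the new kernel on a non-production box first, a minimal sketch, assuming the stock CentOS repositories are enabled (adjust to your environment; the old kernel stays available as a fallback boot entry):
# Check what is running and what is installed
uname -r
rpm -q kernel
# Pull the updated kernel and boot into it
yum update kernel
reboot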
|
Thanks for the update, thompsop; let us know if the new kernel fixes the problem for you. I'll leave my production servers on the old kernel until it's verified to be resolved. I currently don't have any development servers on hand that I can test it with. |
No further issues on 3.10.0-862.3.2.el7.x86_64. |
Late update, but I also stopped experiencing any issues on 3.10.0-862.3.2.el7.x86_64. I think this bug report can be closed now. |
Closing as 'resolved' as per the reports. | |
Date Modified | Username | Field | Change |
---|---|---|---|
2017-09-18 14:19 | thompsop | New Issue | |
2017-10-18 11:34 | thompsop | Note Added: 0030400 | |
2017-10-25 21:44 | thompsop | Note Added: 0030468 | |
2017-12-03 10:21 | Neil Mukerji | Note Added: 0030693 | |
2017-12-03 18:29 | thompsop | Note Added: 0030694 | |
2017-12-15 21:49 | gregb | Note Added: 0030761 | |
2018-01-05 14:14 | smuku@salesforce.com | Note Added: 0030855 | |
2018-01-05 15:13 | gbeeley | Note Added: 0030856 | |
2018-01-05 16:08 | thompsop | Tag Attached: xfs | |
2018-01-05 16:09 | thompsop | Tag Attached: upstream | |
2018-01-05 16:10 | thompsop | Tag Attached: 7.4 | |
2018-01-05 16:10 | thompsop | Tag Attached: file system | |
2018-01-22 12:11 | joerg | Note Added: 0030981 | |
2018-01-28 16:05 | thompsop | Note Added: 0031076 | |
2018-01-28 17:00 | TrevorH | Note Added: 0031077 | |
2018-01-29 14:54 | thompsop | Note Added: 0031084 | |
2018-01-29 15:20 | joerg | Note Added: 0031087 | |
2018-01-29 18:08 | diito | Note Added: 0031091 | |
2018-01-30 02:43 | thompsop | Note Added: 0031100 | |
2018-01-30 02:44 | thompsop | Note Added: 0031101 | |
2018-01-30 13:54 | thompsop | Note Added: 0031109 | |
2018-01-30 15:42 | thompsop | Note Added: 0031110 | |
2018-01-30 16:15 | TrevorH | Note Added: 0031111 | |
2018-01-30 16:36 | thompsop | Note Added: 0031113 | |
2018-01-30 18:06 | diito | Note Added: 0031119 | |
2018-01-30 19:55 | thompsop | Note Added: 0031121 | |
2018-01-30 19:55 | joerg | Note Added: 0031122 | |
2018-01-30 20:07 | thompsop | Note Added: 0031123 | |
2018-01-31 12:05 | slayerduck | Note Added: 0031129 | |
2018-01-31 12:26 | slayerduck | Note Added: 0031130 | |
2018-02-23 02:44 | justin2net | Note Added: 0031294 | |
2018-02-23 02:45 | justin2net | Note Added: 0031295 | |
2018-02-23 10:06 | alphacc | Note Added: 0031299 | |
2018-02-27 12:49 | thompsop | Note Added: 0031334 | |
2018-03-06 00:38 | justin2net | Note Added: 0031360 | |
2018-03-06 02:00 | thompsop | Note Added: 0031361 | |
2018-03-06 06:21 | justin2net | Note Added: 0031363 | |
2018-03-06 10:29 | alphacc | Note Added: 0031365 | |
2018-03-06 17:45 | thompsop | Note Added: 0031370 | |
2018-03-06 17:52 | toracat | Note Added: 0031371 | |
2018-03-08 06:37 | justin2net | Note Added: 0031378 | |
2018-03-11 00:18 | thompsop | Note Added: 0031407 | |
2018-03-11 14:14 | thompsop | Note Added: 0031408 | |
2018-03-12 09:56 | alphacc | Note Added: 0031410 | |
2018-03-12 16:37 | thompsop | Note Added: 0031417 | |
2018-03-14 21:57 | justin2net | Note Added: 0031443 | |
2018-04-04 04:51 | sles | Note Added: 0031557 | |
2018-04-04 21:37 | diito | Note Added: 0031563 | |
2018-04-25 12:23 | slayerduck | File Added: yuuno.txt | |
2018-04-25 12:23 | slayerduck | Note Added: 0031677 | |
2018-05-05 20:04 | Yoh | Note Added: 0031720 | |
2018-05-10 00:52 | thompsop | Note Added: 0031738 | |
2018-05-10 15:06 | slayerduck | Note Added: 0031751 | |
2018-05-29 18:39 | thompsop | Note Added: 0031945 | |
2018-09-15 13:26 | slayerduck | Note Added: 0032724 | |
2018-09-15 17:32 | toracat | Status | new => resolved |
2018-09-15 17:32 | toracat | Resolution | open => fixed |
2018-09-15 17:32 | toracat | Note Added: 0032726 |