View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0017623 | CentOS-7 | kernel | public | 2020-07-27 13:46 | 2020-07-27 13:50 |
Reporter | petrilak.m | Assigned To | |||
Priority | urgent | Severity | block | Reproducibility | always |
Status | new | Resolution | open | ||
Platform | x86-64 | OS | CentOS | OS Version | 7.8.2003 |
Product Version | 7.8-2003 | ||||
Summary | 0017623: xfs filesystem hung up | ||||
Description | I run a script that processes about 23 thousand files. The program converts each file to another format via a temporary file located in the data directory. After each conversion the temporary file is deleted with the rm command. After some time, at some iteration, the rm process hangs in state D+. If the temporary file is created in /tmp (ramdisk), there is no problem. The regular disk is an xfs filesystem on a disk array connected via Fibre Channel. The problem also affects operations other than deletion: a data-processing program hangs as well. The D+ state persists for a very long time (>24 hours). iotop shows all data transfers at zero. I have a physical computer with only one virtual machine (oVirt virtualization). The data disk is attached to the virtual machine as a "direct LUN". I ran xfs_repair with some arguments to repair the filesystem, but it did not help. I think there is a problem in the xfs filesystem driver. If you need any logs or dumps, please let me know.
iostat output:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 22.00 0.00 0.00 0.00 0.00 100.00
ps aufx:
fischer+ 4541 0.0 0.0 125976 2304 pts/1 Ss Jul24 0:00 | \_ bash
fischer+ 4617 0.0 0.0 129824 2136 pts/1 S+ Jul24 0:00 | | \_ /usr/bin/perl ./pl/run_domains.pl
fischer+ 4618 0.0 0.0 113292 1204 pts/1 S+ Jul24 0:00 | | \_ sh -c /mnt/data/DisALEXI/CZECH/code/Programs/preprocess_inputs.pl >& /mnt/data/DisALEXI/CZECH/domains/DryPan_2001-2020/messages/preprocess_inputs.ou
fischer+ 4620 0.0 0.0 129960 2360 pts/1 S+ Jul24 0:00 | | \_ /usr/bin/perl /mnt/data/DisALEXI/CZECH/code/Programs/preprocess_inputs.pl
fischer+ 77528 0.0 0.0 113292 1204 pts/1 S+ Jul26 0:00 | | \_ sh -c /mnt/data/DisALEXI/CZECH/code/SysPrograms/bin/modlai_daily_smooth.csh MCD15A3H 2000210 2020180 >& /mnt/data/DisALEXI/CZECH/domains/Dry
fischer+ 77529 0.0 0.0 137404 3064 pts/1 S+ Jul26 0:16 | | \_ /bin/csh -f /mnt/data/DisALEXI/CZECH/code/SysPrograms/bin/modlai_daily_smooth.csh MCD15A3H 2000210 2020180
fischer+ 114454 6.0 0.0 64660 58072 pts/1 D+ Jul26 138:44 | | \_ /mnt/data/DisALEXI/CZECH/code/SysPrograms/bin/modlai_daily_smooth.exe lai_input.2000210-2020180.txt
Below is the dmesg log after the program got stuck in D+ state:
[133922.264189] INFO: task kworker/u480:2:67137 blocked for more than 120 seconds.
[133922.265249] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[133922.266034] kworker/u480:2 D ffff8b1201583150 0 67137 2 0x00000080
[133922.266770] Workqueue: writeback bdi_writeback_workfn (flush-253:2)
[133922.267526] Call Trace:
[133922.268275] [<ffffffffb1ce0628>] ? __enqueue_entity+0x78/0x80
[133922.269039] [<ffffffffb2385da9>] schedule+0x29/0x70
[133922.269780] [<ffffffffb23838b1>] schedule_timeout+0x221/0x2d0
[133922.270534] [<ffffffffb1ccc5dd>] ? down_trylock+0x2d/0x40
[133922.271308] [<ffffffffc086821f>] ? xfs_buf_trylock+0x1f/0xc0 [xfs]
[133922.272054] [<ffffffffb23851b7>] __down_common+0xaa/0x104
[133922.272791] [<ffffffffc0868500>] ? _xfs_buf_find.isra.9+0x170/0x330 [xfs]
[133922.273551] [<ffffffffb238522e>] __down+0x1d/0x1f
[133922.274297] [<ffffffffb1ccc631>] down+0x41/0x50
[133922.275053] [<ffffffffc08682fc>] xfs_buf_lock+0x3c/0xd0 [xfs]
[133922.275784] [<ffffffffc0868500>] _xfs_buf_find.isra.9+0x170/0x330 [xfs]
[133922.276551] [<ffffffffc0868755>] xfs_buf_get_map+0x35/0x250 [xfs]
[133922.277323] [<ffffffffc0868f20>] xfs_buf_read_map+0x30/0x160 [xfs]
[133922.278089] [<ffffffffc0899d49>] xfs_trans_read_buf_map+0xe9/0x2c0 [xfs]
[133922.278821] [<ffffffffc083e934>] xfs_btree_read_buf_block.constprop.33+0xa4/0xe0 [xfs]
[133922.279589] [<ffffffffc0842a45>] xfs_btree_lookup_get_block+0x95/0x1a0 [xfs]
[133922.280364] [<ffffffffc0842cff>] xfs_btree_lookup+0xdf/0x420 [xfs]
[133922.281130] [<ffffffffc083418f>] xfs_bmbt_lookup_eq+0x2f/0x40 [xfs]
[133922.281863] [<ffffffffc08371e4>] xfs_bmap_add_extent_delay_real+0x864/0x11e0 [xfs]
[133922.282628] [<ffffffffc0839624>] ? xfs_bmap_btalloc+0x2d4/0x7e0 [xfs]
[133922.283385] [<ffffffffb1e28852>] ? kmem_cache_alloc+0x1c2/0x1f0
[133922.284137] [<ffffffffc083b7c8>] xfs_bmapi_write+0x7f8/0xb70 [xfs]
[133922.284854] [<ffffffffc0888d67>] ? kmem_zone_alloc+0x97/0x130 [xfs]
[133922.285593] [<ffffffffc0877cf2>] xfs_iomap_write_allocate+0x182/0x380 [xfs]
[133922.286333] [<ffffffffc0861446>] xfs_map_blocks+0x1a6/0x220 [xfs]
[133922.287071] [<ffffffffc0862484>] xfs_do_writepage+0x174/0x550 [xfs]
[133922.287766] [<ffffffffb1dca1bc>] write_cache_pages+0x21c/0x470
[133922.288501] [<ffffffffc0862310>] ? xfs_vm_writepages+0xa0/0xa0 [xfs]
[133922.289232] [<ffffffffb1f546a0>] ? submit_bio+0x70/0x150
[133922.290707] [<ffffffffc08622db>] xfs_vm_writepages+0x6b/0xa0 [xfs]
[133922.291879] [<ffffffffb1dcb211>] do_writepages+0x21/0x50
[133922.292870] [<ffffffffb1e7cc30>] __writeback_single_inode+0x40/0x260
[133922.293816] [<ffffffffb1cc7745>] ? wake_up_bit+0x25/0x30
[133922.294726] [<ffffffffb1e7d7c4>] writeback_sb_inodes+0x1c4/0x430
[133922.295624] [<ffffffffb1e7dacf>] __writeback_inodes_wb+0x9f/0xd0
[133922.296642] [<ffffffffb1e7dfb3>] wb_writeback+0x263/0x2f0
[133922.297562] [<ffffffffb1e7eaac>] bdi_writeback_workfn+0x1cc/0x460
[133922.298441] [<ffffffffb1cbe6bf>] process_one_work+0x17f/0x440
[133922.299294] [<ffffffffb1cbf7d6>] worker_thread+0x126/0x3c0
[133922.300148] [<ffffffffb1cbf6b0>] ? manage_workers.isra.26+0x2a0/0x2a0
[133922.300967] [<ffffffffb1cc6691>] kthread+0xd1/0xe0
[133922.301757] [<ffffffffb1cc65c0>] ? insert_kthread_work+0x40/0x40
[133922.302615] [<ffffffffb2392d24>] ret_from_fork_nospec_begin+0xe/0x21
[133922.303442] [<ffffffffb1cc65c0>] ? insert_kthread_work+0x40/0x40 | ||||
Steps To Reproduce | Occurs on every run of my computation program. I have tried running it 10 times and it always hung. | ||||
Tags | xfs | ||||
abrt_hash | |||||
URL | |||||
What is your current kernel version as reported by `uname -a` | |
Linux disalexi.local.czechglobe.cz 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | |
Date Modified | Username | Field | Change |
---|---|---|---|
2020-07-27 13:46 | petrilak.m | New Issue | |
2020-07-27 13:46 | petrilak.m | Tag Attached: xfs | |
2020-07-27 13:48 | TrevorH | Note Added: 0037419 | |
2020-07-27 13:50 | petrilak.m | Note Added: 0037420 |
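
The description reports processes stuck in state D+ (uninterruptible sleep). As a minimal sketch for collecting that information without reading `ps` output by eye, the Python below scans `/proc/<pid>/stat` and lists every process whose state field is `D`. The function names `parse_stat` and `blocked_tasks` are hypothetical helpers introduced here for illustration; they are not part of the report or of any CentOS tool.

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat record into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')' instead of splitting on spaces.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx])
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited or record unreadable while scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Running the script a few times while the hang is in progress would show whether the same PIDs stay blocked. On kernels that expose it, reading `/proc/<pid>/stack` as root for each listed PID usually gives the kernel stack of the blocked task, which may be useful to attach alongside the dmesg trace.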