View Issue Details

ID:               0015715
Project:          CentOS-7
Category:         kernel
View Status:      public
Last Update:      2019-01-16 21:02
Reporter:         wangyumu
Priority:         normal
Severity:         major
Reproducibility:  sometimes
Status:           new
Resolution:       open
Product Version:  7.4.1708
Target Version:
Fixed in Version:
Summary: 0015715: process hangs in shrink_inactive_list when reaching the memory limit of a cgroup
Description: We found that some filebeat processes stop working, with this stack:
[<ffffffff810f5126>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f5deb>] futex_wait+0x17b/0x280
[<ffffffff810f7b26>] do_futex+0x106/0x5a0
[<ffffffff810f8040>] SyS_futex+0x80/0x180
[<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

If you kill the process, it turns into a <defunct> (zombie) state; a quick way to confirm this is sketched below.
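
For anyone checking the same symptom, the kernel-side stack and the task state can be read directly from procfs; this is a minimal sketch, and <pid> is a placeholder, not a PID from this report:

# kernel stack of the stuck thread (needs root)
cat /proc/<pid>/stack
# task state: "S" = sleeping (e.g. futex wait), "D" = uninterruptible, "Z" = defunct/zombie
grep '^State:' /proc/<pid>/status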

In dmesg, there is the following info:
Jan 15 14:30:57 kernel: INFO: task filebeat:34257 blocked for more than 120 seconds.
Jan 15 14:30:57 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 15 14:30:57 kernel: filebeat D ffffffff816a75f0 0 34257 30603 0x00000080
Jan 15 14:30:57 kernel: ffff882b012aafe0 0000000000000086 ffff883b93e4eeb0 ffff882b012abfd8
Jan 15 14:30:57 kernel: ffff882b012abfd8 ffff882b012abfd8 ffff883b93e4eeb0 ffff881fbfc16cc0
Jan 15 14:30:57 kernel: 0000000000000000 7fffffffffffffff ffff88203ff958e8 ffffffff816a75f0
Jan 15 14:30:57 kernel: Call Trace:
Jan 15 14:30:57 kernel: [<ffffffff816a75f0>] ? bit_wait+0x50/0x50
Jan 15 14:30:57 kernel: [<ffffffff816a94c9>] schedule+0x29/0x70
Jan 15 14:30:57 kernel: [<ffffffff816a6fd9>] schedule_timeout+0x239/0x2c0
Jan 15 14:30:57 kernel: [<ffffffff811de581>] ? __slab_free+0x81/0x2f0
Jan 15 14:30:57 kernel: [<ffffffff81062efe>] ? kvm_clock_get_cycles+0x1e/0x20
Jan 15 14:30:57 kernel: [<ffffffff810e939c>] ? ktime_get_ts64+0x4c/0xf0
Jan 15 14:30:57 kernel: [<ffffffff816a75f0>] ? bit_wait+0x50/0x50
Jan 15 14:30:57 kernel: [<ffffffff816a8b4d>] io_schedule_timeout+0xad/0x130
Jan 15 14:30:57 kernel: [<ffffffff816a8be8>] io_schedule+0x18/0x20
Jan 15 14:30:57 kernel: [<ffffffff816a7601>] bit_wait_io+0x11/0x50
Jan 15 14:30:57 kernel: [<ffffffff816a7125>] __wait_on_bit+0x65/0x90
Jan 15 14:30:57 kernel: [<ffffffff812361f2>] ? free_buffer_head+0x22/0x50
Jan 15 14:30:57 kernel: [<ffffffff81181cb1>] wait_on_page_bit+0x81/0xa0
Jan 15 14:30:57 kernel: [<ffffffff810b19d0>] ? wake_bit_function+0x40/0x40
Jan 15 14:30:57 kernel: [<ffffffff811965c2>] shrink_page_list+0x842/0xad0
Jan 15 14:30:57 kernel: [<ffffffff81196e7a>] shrink_inactive_list+0x1ca/0x5d0
Jan 15 14:30:57 kernel: [<ffffffff81197975>] shrink_lruvec+0x385/0x730
Jan 15 14:30:57 kernel: [<ffffffffc0280400>] ? kmem_zone_alloc+0x60/0x130 [xfs]
Jan 15 14:30:57 kernel: [<ffffffff81197d96>] shrink_zone+0x76/0x1a0
Jan 15 14:30:57 kernel: [<ffffffff811982a0>] do_try_to_free_pages+0xf0/0x4e0
Jan 15 14:30:57 kernel: [<ffffffff8133d9dd>] ? list_del+0xd/0x30
Jan 15 14:30:57 kernel: [<ffffffff811988ea>] try_to_free_mem_cgroup_pages+0xda/0x170
Jan 15 14:30:57 kernel: [<ffffffff811f386e>] mem_cgroup_reclaim+0x4e/0x120
Jan 15 14:30:57 kernel: [<ffffffff811f3e2c>] __mem_cgroup_try_charge+0x4ec/0x670
Jan 15 14:30:57 kernel: [<ffffffff811f6126>] __memcg_kmem_newpage_charge+0xf6/0x180
Jan 15 14:30:57 kernel: [<ffffffff8118cbad>] __alloc_pages_nodemask+0x22d/0x420
Jan 15 14:30:57 kernel: [<ffffffff811d1108>] ? alloc_pages_current+0x98/0x110
Jan 15 14:30:57 kernel: [<ffffffff811d1108>] alloc_pages_current+0x98/0x110
Jan 15 14:30:57 kernel: [<ffffffff811dbe7c>] new_slab+0x2fc/0x310
Jan 15 14:30:57 kernel: [<ffffffff811dd70c>] ___slab_alloc+0x3ac/0x4f0
Jan 15 14:30:57 kernel: [<ffffffffc0280437>] ? kmem_zone_alloc+0x97/0x130 [xfs]
Jan 15 14:30:57 kernel: [<ffffffff811df6a3>] ? kmem_cache_alloc+0x193/0x1e0
Jan 15 14:30:57 kernel: [<ffffffff81185455>] ? mempool_alloc_slab+0x15/0x20
Jan 15 14:30:57 kernel: [<ffffffffc0280437>] ? kmem_zone_alloc+0x97/0x130 [xfs]
Jan 15 14:30:57 kernel: [<ffffffff816a10ce>] __slab_alloc+0x40/0x5c
Jan 15 14:30:57 kernel: [<ffffffff811df6a3>] kmem_cache_alloc+0x193/0x1e0
Jan 15 14:30:57 kernel: [<ffffffffc0280437>] ? kmem_zone_alloc+0x97/0x130 [xfs]
Jan 15 14:30:57 kernel: [<ffffffffc0280437>] kmem_zone_alloc+0x97/0x130 [xfs]
Jan 15 14:30:57 kernel: [<ffffffffc027f99d>] xfs_trans_alloc+0x6d/0x140 [xfs]
Jan 15 14:30:57 kernel: [<ffffffffc02593de>] xfs_setfilesize_trans_alloc.isra.14+0x3e/0x90 [xfs]
Jan 15 14:30:57 kernel: [<ffffffffc0259638>] xfs_submit_ioend.isra.16+0xb8/0xe0 [xfs]
Jan 15 14:30:57 kernel: [<ffffffffc0259e44>] xfs_do_writepage+0x4c4/0x530 [xfs]
Jan 15 14:30:57 kernel: [<ffffffff8118dc91>] write_cache_pages+0x251/0x4d0
Jan 15 14:30:57 kernel: [<ffffffffc0259980>] ? xfs_aops_discard_page+0x150/0x150 [xfs]
Jan 15 14:30:57 kernel: [<ffffffffc025978d>] xfs_vm_writepages+0xbd/0xe0 [xfs]
Jan 15 14:30:57 kernel: [<ffffffff8118effe>] do_writepages+0x1e/0x40
Jan 15 14:30:57 kernel: [<ffffffff81183c55>] __filemap_fdatawrite_range+0x65/0x80
Jan 15 14:30:57 kernel: [<ffffffff81183da1>] filemap_write_and_wait_range+0x41/0x90
Jan 15 14:30:57 kernel: [<ffffffffc0264b16>] xfs_file_fsync+0x66/0x1e0 [xfs]
Jan 15 14:30:57 kernel: [<ffffffff8123331d>] generic_write_sync+0x4d/0x60
Jan 15 14:30:57 kernel: [<ffffffffc0266d55>] xfs_file_aio_write+0x155/0x1b0 [xfs]
Jan 15 14:30:57 kernel: [<ffffffff8120026d>] do_sync_write+0x8d/0xd0
Jan 15 14:30:57 kernel: [<ffffffff81200d2d>] vfs_write+0xbd/0x1e0
Jan 15 14:30:57 kernel: [<ffffffff81201b3f>] SyS_write+0x7f/0xe0
Jan 15 14:30:57 kernel: [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b

filebeat runs inside a Docker container with a 2G memory limit, and from our monitoring system we found that, when this happened, the memory usage of the container was around 1.95G (a cgroup check sketch follows).
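
On CentOS 7 (cgroup v1) the limit and live usage of the container's memcg can be cross-checked directly; this is a sketch assuming the default docker cgroup layout under /sys/fs/cgroup/memory, and <container-id> is a placeholder:

cd /sys/fs/cgroup/memory/docker/<container-id>
cat memory.limit_in_bytes    # 2147483648 for a 2G limit
cat memory.usage_in_bytes    # charged memory, including page cache
grep -E '^(cache|rss|inactive_file|active_file)' memory.stat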

filebeat is under heavy load, with 5000 log files open for collection, and uses a lot of filesystem cache. If you execute sync, it also enters D state, with this stack (a non-blocking way to observe the backlog is sketched after the trace):

[<ffffffff81181cb1>] wait_on_page_bit+0x81/0xa0
[<ffffffff81181de1>] __filemap_fdatawait_range+0x111/0x190
[<ffffffff81184bd7>] filemap_fdatawait_keep_errors+0x27/0x30
[<ffffffff8122dd1d>] sync_inodes_sb+0x16d/0x1f0
[<ffffffff81233349>] sync_inodes_one_sb+0x19/0x20
[<ffffffff81204ea1>] iterate_supers+0xc1/0x120
[<ffffffff81233614>] sys_sync+0x44/0xb0
[<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
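
Because sync itself blocks, a less intrusive way to observe the write-back backlog and the stuck tasks is the following sketch (placeholders only; none of this output is from the report):

# dirty/writeback backlog without triggering a sync
grep -E '^(Dirty|Writeback):' /proc/meminfo
# tasks in uninterruptible sleep and the kernel function they wait in
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
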
Steps To Reproduce: This occurred on five nodes in our cluster (more than 20 nodes in total), but we cannot reproduce it on demand; an approximate stress sketch follows.
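
The sketch below only approximates the workload described above (a memory-limited container doing heavy buffered writes followed by sync); it is not a confirmed reproducer, and the /data paths are placeholders:

docker run --rm --memory=2g --memory-swap=2g \
    -v /data/stress:/data centos:7 bash -c '
  # many writers filling page cache inside the 2G memcg
  for i in $(seq 1 64); do
    dd if=/dev/zero of=/data/f$i bs=1M count=256 conv=fsync &
  done
  wait
  sync'
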
Tags: cgroup, defunct, hang, sync
abrt_hash:
URL:

Activities

wangyumu   2019-01-16 21:02   (reporter)   ~0033609

OS:
Linux tb-ops01 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

sysctl:
vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 50
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 50
vm.dirty_writeback_centisecs = 360000
vm.drop_caches = 1
vm.extfrag_threshold = 500
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.max_map_count = 655360
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 4096000
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 4096
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 1
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.stat_interval = 1
vm.swappiness = 0
vm.user_reserve_kbytes = 131072
vm.vfs_cache_pressure = 100
vm.zone_reclaim_mode = 0

slabtop:
 Active / Total Objects (% used) : 46752646 / 59346525 (78.8%)
 Active / Total Slabs (% used) : 1356757 / 1356757 (100.0%)
 Active / Total Caches (% used) : 99 / 129 (76.7%)
 Active / Total Size (% used) : 15687900.45K / 18818904.98K (83.4%)
 Minimum / Average / Maximum Object : 0.01K / 0.32K / 8.00K

  OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
17319666 16053457 92% 0.19K 412377 42 3299016K dentry
14000427 13995290 99% 0.64K 285723 49 9143136K proc_inode_cache
7833345 6638647 84% 0.10K 200855 39 803420K buffer_head
2655680 1201683 45% 0.06K 41495 64 165980K kmalloc-64
2582912 311329 12% 0.03K 20179 128 80716K kmalloc-32
1936816 993571 51% 0.57K 69295 28 1108720K radix_tree_node
1687663 751879 44% 0.16K 33095 51 264760K sigqueue
1551780 1551780 100% 0.11K 43105 36 172420K kernfs_node_cache
1343206 615376 45% 0.94K 39589 34 1266848K xfs_inode
966420 211045 21% 0.19K 23010 42 184080K kmalloc-192
858176 280943 32% 0.25K 26818 32 214544K kmalloc-256
855204 329172 38% 0.09K 20362 42 81448K kmalloc-96
833520 74554 8% 0.38K 19848 42 317568K mnt_cache
609984 110922 18% 0.50K 19062 32 304992K kmalloc-512
554260 549404 99% 0.21K 14980 37 119840K vm_area_struct
490467 483529 98% 0.08K 9617 51 38468K anon_vma
486944 231524 47% 0.12K 15217 32 60868K kmalloc-128
423680 422682 99% 0.02K 1655 256 6620K kmalloc-16
349544 147647 42% 1.00K 10925 32 349600K kmalloc-1024
295625 161844 54% 0.58K 5375 55 172000K inode_cache
179200 179200 100% 0.01K 350 512 1400K kmalloc-8
136850 136850 100% 0.02K 805 170 3220K scsi_data_buffer
131327 130745 99% 0.05K 1799 73 7196K fanotify_event_info
 76992 76921 99% 0.06K 1203 64 4812K kmem_cache_node
 74001 73793 99% 0.62K 1451 51 46432K sock_inode_cache
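
(The listing above looks like slabtop's one-shot batch mode, e.g. `slabtop -o`, which prints a single snapshot and exits; that is our assumption about how it was collected.)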

Issue History

Date Modified Username Field Change
2019-01-16 20:56 wangyumu New Issue
2019-01-16 20:56 wangyumu Tag Attached: hang
2019-01-16 20:56 wangyumu Tag Attached: cgroup
2019-01-16 20:56 wangyumu Tag Attached: defunct
2019-01-16 20:56 wangyumu Tag Attached: sync
2019-01-16 21:02 wangyumu Note Added: 0033609