View Issue Details

ID: 0016265
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2019-07-15 02:58
Reporter: ericstarmars
Priority: normal
Severity: crash
Reproducibility: unable to reproduce
Status: new
Resolution: open
Product Version: 7.5.1804
Target Version:
Fixed in Version:
Summary: 0016265: kernel panics in fsnotify_connector_destroy_workfn/kmem_cache_free
Description: The kernel version is 3.10.0-862.el7.x86_64.

We run k8s pods on this host, with kubelet version 1.13.7. Occasionally we hit a kernel panic like the following:

[97056.631232] Workqueue: events_unbound fsnotify_connector_destroy_workfn
[97056.632214] task: ffff95a43f6c8fd0 ti: ffff95a34e244000 task.ti: ffff95a34e244000
[97056.632827] RIP: 0010:[<ffffffff9d1f6663>] [<ffffffff9d1f6663>] kmem_cache_free+0x143/0x200
[97056.633493] RSP: 0018:ffff95a34e247de0 EFLAGS: 00010286
[97056.634485] RAX: ffffe3f4e8000000 RBX: ffff95a400000000 RCX: 0000000000000000
[97056.635527] RDX: ffffe3f4e8000000 RSI: ffff95a400000000 RDI: 0000000000000000
[97056.636575] RBP: ffff95a34e247df8 R08: ffffffff9dd420a0 R09: 000188d6db4a30a0
[97056.637234] R10: 000188d6db4a30a0 R11: 0000000000000005 R12: ffff959b75981800
[97056.638034] R13: ffff959b6f515400 R14: ffff959b7fd13e00 R15: 0000000000000200
[97056.639436] FS: 0000000000000000(0000) GS:ffff95a43fc80000(0000) knlGS:0000000000000000
[97056.640414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[97056.641488] CR2: 00000000000000b8 CR3: 000000095074c000 CR4: 00000000003607e0
[97056.642633] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[97056.643666] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[97056.644538] Call Trace:
[97056.645264] [<ffffffff9d2624db>] fsnotify_connector_destroy_workfn+0x6b/0x80
[97056.646013] [<ffffffff9d0b2dff>] process_one_work+0x17f/0x440
[97056.646900] [<ffffffff9d0b3ac6>] worker_thread+0x126/0x3c0
[97056.647963] [<ffffffff9d0b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[97056.648993] [<ffffffff9d0bae31>] kthread+0xd1/0xe0
[97056.650027] [<ffffffff9d0bad60>] ? insert_kthread_work+0x40/0x40
[97056.651071] [<ffffffff9d71f637>] ret_from_fork_nospec_begin+0x21/0x21
[97056.652111] [<ffffffff9d0bad60>] ? insert_kthread_work+0x40/0x40
[97056.652719] Code: 48 c1 e8 0c 48 c1 e0 06 48 03 05 e9 ad a4 00 48 8b 10 80 e6 80 0f 85 b4 00 00 00 48 89 c2 48 8b 7a 30 49 39 fc 0f 84 e0 fe ff ff <48> 8b 87 b8 00 00 00 48 85 c0 74 0a 4c 3b 60 20 0f 84 cd fe ff
[97056.654656] RIP [<ffffffff9d1f6663>] kmem_cache_free+0x143/0x200
[97056.655325] RSP <ffff95a34e247de0>
[97056.656012] CR2: 00000000000000b8
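
The `symbol+0xOFFSET/0xSIZE` notation in the trace above can be decoded mechanically. The following is a minimal sketch, not part of the original report: the regular expression and the helper name `decode_frame` are assumptions made for illustration. It recovers the start address of the function from the RIP line and shows that `+0x143` is the same location the crash utility prints in decimal as `kmem_cache_free+323`.

```python
import re

# Matches frames such as "[<ffffffff9d1f6663>] kmem_cache_free+0x143/0x200".
# The optional "? " covers frames the kernel marks as unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def decode_frame(line):
    """Return symbol, address, offset, size and function start for one frame."""
    m = FRAME_RE.search(line)
    if m is None:
        raise ValueError("no symbol+offset/size frame found in line")
    addr = int(m.group("addr"), 16)
    off = int(m.group("off"), 16)
    return {
        "symbol": m.group("sym"),
        "address": addr,
        "offset": off,                 # bytes into the function
        "size": int(m.group("size"), 16),
        "start": addr - off,           # address of the first instruction
    }


rip_line = "[97056.654656] RIP [<ffffffff9d1f6663>] kmem_cache_free+0x143/0x200"
frame = decode_frame(rip_line)
print(frame["symbol"], frame["offset"], hex(frame["start"]))
# prints: kmem_cache_free 323 0xffffffff9d1f6520
```

The computed start address is consistent with the disassembly in the pre-analysis, where `kmem_cache_free+35` sits at `0xffffffff9d1f6543`.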


I noticed that a similar issue was reported against kernel 4.14, but I have not confirmed whether it is the same issue (http://lists-archives.com/linux-kernel/29098940-kernel-4-14-x-crash-around-fsnotify_mark_connector.html)

Does anyone have an idea of how this occurs?
Additional Information: Pre-analysis:

See also:

crash> bt
PID: 31201 TASK: ffff95a43f6c8fd0 CPU: 2 COMMAND: "kworker/u16:0"
 #0 [ffff95a34e247a78] machine_kexec at ffffffff9d060b2a
 #1 [ffff95a34e247ad8] __crash_kexec at ffffffff9d113402
 #2 [ffff95a34e247ba8] crash_kexec at ffffffff9d1134f0
 #3 [ffff95a34e247bc0] oops_end at ffffffff9d717778
 #4 [ffff95a34e247be8] no_context at ffffffff9d706f98
 #5 [ffff95a34e247c38] __bad_area_nosemaphore at ffffffff9d70702f
 #6 [ffff95a34e247c88] bad_area_nosemaphore at ffffffff9d7071a0
 #7 [ffff95a34e247c98] __do_page_fault at ffffffff9d71a730
 #8 [ffff95a34e247d00] do_page_fault at ffffffff9d71a925
 #9 [ffff95a34e247d30] page_fault at ffffffff9d716768
    [exception RIP: kmem_cache_free+323]
    RIP: ffffffff9d1f6663 RSP: ffff95a34e247de0 RFLAGS: 00010286
    RAX: ffffe3f4e8000000 RBX: ffff95a400000000 RCX: 0000000000000000
    RDX: ffffe3f4e8000000 RSI: ffff95a400000000 RDI: 0000000000000000
    RBP: ffff95a34e247df8 R8: ffffffff9dd420a0 R9: 000188d6db4a30a0
    R10: 000188d6db4a30a0 R11: 0000000000000005 R12: ffff959b75981800
    R13: ffff959b6f515400 R14: ffff959b7fd13e00 R15: 0000000000000200
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff95a34e247e00] fsnotify_connector_destroy_workfn at ffffffff9d2624db
#11 [ffff95a34e247e20] process_one_work at ffffffff9d0b2dff
#12 [ffff95a34e247e68] worker_thread at ffffffff9d0b3ac6
#13 [ffff95a34e247ec8] kthread at ffffffff9d0bae31

crash> dis kmem_cache_free
...
0xffffffff9d1f664a <kmem_cache_free+298>: and $0x80,%dh
0xffffffff9d1f664d <kmem_cache_free+301>: jne 0xffffffff9d1f6707 <kmem_cache_free+487>
0xffffffff9d1f6653 <kmem_cache_free+307>: mov %rax,%rdx
0xffffffff9d1f6656 <kmem_cache_free+310>: mov 0x30(%rdx),%rdi
0xffffffff9d1f665a <kmem_cache_free+314>: cmp %rdi,%r12
0xffffffff9d1f665d <kmem_cache_free+317>: je 0xffffffff9d1f6543 <kmem_cache_free+35>
0xffffffff9d1f6663 <kmem_cache_free+323>: mov 0xb8(%rdi),%rax <<< rdi is 0
0xffffffff9d1f666a <kmem_cache_free+330>: test %rax,%rax
0xffffffff9d1f666d <kmem_cache_free+333>: je 0xffffffff9d1f6679 <kmem_cache_free+345>
0xffffffff9d1f666f <kmem_cache_free+335>: cmp 0x20(%rax),%r12
0xffffffff9d1f6673 <kmem_cache_free+339>: je 0xffffffff9d1f6546 <kmem_cache_free+38>
0xffffffff9d1f6679 <kmem_cache_free+345>: mov 0x60(%rdi),%rcx
0xffffffff9d1f667d <kmem_cache_free+349>: mov 0x60(%r12),%rdx
0xffffffff9d1f6682 <kmem_cache_free+354>: xor %eax,%eax
0xffffffff9d1f6684 <kmem_cache_free+356>: mov $0xffffffff9d830060,%rsi
...

It seems that a NULL pointer is dereferenced during the call to:

static inline bool slab_equal_or_root(struct kmem_cache *s,
                    struct kmem_cache *p)
{
    return (p == s) ||
        (s->memcg_params && (p == s->memcg_params->root_cache));
}

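>>> Here, s is NULL (rdi is 0, and the faulting address in CR2, 0xb8, matches the 0xb8 displacement in the faulting instruction)

This indicates that a NULL pointer was dereferenced while the object was being freed.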
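
To make the reasoning above concrete, here is a small Python model of it. This is a sketch under stated assumptions, not kernel code: the classes `KmemCache` and `MemcgParams` are hypothetical stand-ins, and treating 0xb8 as the offset of `memcg_params` is inferred only from matching the disassembly against the C source quoted above. It shows that the fault address equals base plus displacement, and that with `s` set to NULL the `p == s` comparison passes harmlessly while the `s->memcg_params` load is the first access that faults.

```python
# Displacement taken from "mov 0xb8(%rdi),%rax" in the disassembly above.
# Assumption: this is the load of s->memcg_params.
MEMCG_PARAMS_OFFSET = 0xB8


def effective_address(base, displacement):
    """Address accessed by 'mov disp(%reg)' on x86_64 (wraps at 64 bits)."""
    return (base + displacement) & 0xFFFFFFFFFFFFFFFF


class MemcgParams:
    """Hypothetical stand-in for the memcg parameters of a cache."""
    def __init__(self, root_cache):
        self.root_cache = root_cache


class KmemCache:
    """Hypothetical stand-in for struct kmem_cache."""
    def __init__(self, name, memcg_params=None):
        self.name = name
        self.memcg_params = memcg_params


def slab_equal_or_root(s, p):
    # Same short-circuit order as the C helper quoted above.
    return (p is s) or (
        s.memcg_params is not None and p is s.memcg_params.root_cache
    )


def null_s_faults(p):
    """True if passing s = None (NULL) fails on the s.memcg_params access."""
    try:
        slab_equal_or_root(None, p)
    except AttributeError:  # Python analogue of the NULL dereference
        return True
    return False


rdi = 0x0  # value of RDI in the register dump
print(hex(effective_address(rdi, MEMCG_PARAMS_OFFSET)))  # prints: 0xb8
print(null_s_faults(KmemCache("some_cache")))            # prints: True
```

The model says nothing about why `s` ended up NULL; it only reproduces which access trips once it is.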

Any idea on how to fix the issue?
Tags: No tags attached.
abrt_hash:
URL:

Activities

TrevorH

2019-07-12 10:44

manager   ~0034817

Only the current version is supported so you need to run `yum update` and retest. Also since CentOS is a rebuild of RHEL with all RHEL bugs included, you need to report the issue on bugzilla.redhat.com and if/when RH fix it in RHEL and release the fix, then CentOS will pick it up and rebuild it.

Please also see `rpm -q --changelog kernel-3.10.0-957.21.3.el7.x86_64 | less`. Something that looks hopeful was fixed in kernel 3.10.0-896:

- [linux] fsnotify: Fix fsnotify_mark_connector race (Miklos Szeredi) [1569921]
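
As a rough way to check whether a given kernel already carries that changelog entry, the release number can be compared against 896. This is a minimal sketch, assuming the usual `3.10.0-<release>.el7.<arch>` naming; the helper names `parse_kernel` and `has_fix` are made up for illustration, and it is not confirmed that this fix addresses the panic reported here.

```python
def parse_kernel(version):
    """Split '3.10.0-957.21.3.el7.x86_64' into ((3, 10, 0), (957, 21, 3))."""
    base, _, rest = version.partition("-")
    base_tuple = tuple(int(part) for part in base.split("."))
    release = []
    for part in rest.split("."):
        if not part.isdigit():  # stop at 'el7', 'x86_64', ...
            break
        release.append(int(part))
    return base_tuple, tuple(release)


def has_fix(version, fixed_release=(896,)):
    """True if a 3.10.0 kernel's release is at or above the fixed build."""
    base, release = parse_kernel(version)
    return base == (3, 10, 0) and release >= fixed_release


print(has_fix("3.10.0-862.el7.x86_64"))        # prints: False
print(has_fix("3.10.0-957.21.3.el7.x86_64"))   # prints: True
```

The string to test can be taken from `uname -r` on the affected host.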

ericstarmars

2019-07-15 02:58

reporter   ~0034828

Thanks, Trevor, will have a try.

Issue History

Date Modified Username Field Change
2019-07-12 06:39 ericstarmars New Issue
2019-07-12 10:44 TrevorH Note Added: 0034817
2019-07-15 02:58 ericstarmars Note Added: 0034828