2017-12-14 02:20 UTC

ID: 0012012
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2017-03-07 22:14
Reporter: Nikolay
Priority: normal
Severity: crash
Reproducibility: random
Status: resolved
Resolution: fixed
Product Version: 7.2.1511
Target Version:
Fixed in Version:
Summary: 0012012: 3.10.0-327.36.1.el7.x86_64 panic
Description: After updating from 3.10.0-327.22.2.el7.x86_64 to 3.10.0-327.36.1.el7.x86_64, the kernel panics:

vmcore-dmesg.txt:
=
[41486.135546] general protection fault: 0000 [#1] SMP
[41486.135651] Modules linked in: binfmt_misc ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ext4 mbcache jbd2 coretemp sg ipmi_ssif iTCO_wdt iTCO_vendor_support kvm ipmi_devintf lpc_ich mfd_core i7300_edac edac_core dcdbas ipmi_si pcspkr shpchp ipmi_msghandler ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common radeon i2c_algo_bit drm_kms_helper ttm drm serio_raw i2c_core bnx2 megaraid_sas
[41486.136007] CPU: 23 PID: 12155 Comm: crond Not tainted 3.10.0-327.36.1.el7.x86_64 #1
[41486.136007] Hardware name: Dell Inc. PowerEdge R900/0X947H, BIOS 1.2.0 11/11/2010
[41486.136007] task: ffff88084e560b80 ti: ffff8806976f8000 task.ti: ffff8806976f8000
[41486.136007] RIP: 0010:[<ffffffff8155604b>] [<ffffffff8155604b>] netlink_compare+0xb/0x30
[41486.136007] RSP: 0018:ffff8806976fbbc8 EFLAGS: 00010246
[41486.136007] RAX: 0000000000000000 RBX: 6e69666e6f636e75 RCX: 00000000f50a8114
[41486.136007] RDX: 0000000000002f7b RSI: ffff8806976fbc18 RDI: 6e69666e6f6369ed
[41486.136007] RBP: ffff8806976fbc00 R08: ffff8806976fbc14 R09: 0000000000000000
[41486.136007] R10: 00000000000000ff R11: 0000000000000000 R12: ffff8808534da678
[41486.136007] R13: ffff8806976fbc18 R14: ffffffff81556040 R15: ffff880835b0be00
[41486.136007] FS: 00007f9e6df3d800(0000) GS:ffff88085f3c0000(0000) knlGS:0000000000000000
[41486.136007] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41486.136007] CR2: 00007ffdee7e5000 CR3: 000000080e0ab000 CR4: 00000000000007e0
[41486.136007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[41486.136007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[41486.136007] Stack:
[41486.136007] ffff8806976fbc00 ffffffff813080a0 0000000000002f7b ffffffff81a26d80
[41486.136007] ffff8808534da678 00000000ffffefff ffff88084e660800 ffff8806976fbc58
[41486.136007] ffffffff8155782e 00002f7b976fbc70 ffffffff81a26d80 fbfffff900002f7b
[41486.136007] Call Trace:
[41486.136007] [<ffffffff813080a0>] ? rhashtable_lookup_compare+0x50/0x90
[41486.136007] [<ffffffff8155782e>] netlink_autobind.isra.37+0xae/0x100
[41486.136007] [<ffffffff8155a2aa>] netlink_sendmsg+0x22a/0x770
[41486.136007] [<ffffffff81288a75>] ? sock_has_perm+0x75/0x90
[41486.136007] [<ffffffff815112a0>] sock_sendmsg+0xb0/0xf0
[41486.136007] [<ffffffff81511811>] SYSC_sendto+0x121/0x1c0
[41486.136007] [<ffffffff81641f1d>] ? __do_page_fault+0x16d/0x450
[41486.136007] [<ffffffff81642223>] ? do_page_fault+0x23/0x80
[41486.136007] [<ffffffff8151229e>] SyS_sendto+0xe/0x10
[41486.136007] [<ffffffff81646a09>] system_call_fastpath+0x16/0x1b
[41486.136007] Code: 8b 77 08 39 77 14 8d 4e 01 41 0f 44 c9 41 39 c8 89 4f 08 74 09 48 8b 08 83 3c 11 04 74 e2 5d c3 66 66 66 66 90 55 31 c0 8b 56 08 <39> 97 00 03 00 00 48 89 e5 74 0a 5d c3 0f 1f 84 00 00 00 00 00
[41486.136007] RIP [<ffffffff8155604b>] netlink_compare+0xb/0x30
[41486.136007] RSP <ffff8806976fbbc8>
=
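
A quick sanity check of the faulting registers in the trace above. On x86-64, a general protection fault is commonly raised when code dereferences a non-canonical address, and neither RBX nor RDI looks like a kernel pointer here. The sketch below assumes 48-bit virtual addressing; the helper names are illustrative and not part of the report.

```python
import struct

def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF

def as_ascii(value: int) -> str:
    """Render a 64-bit register value as its 8 little-endian bytes of text."""
    raw = struct.pack("<Q", value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

# Values copied from the register dump above.
RBX = 0x6E69666E6F636E75
RDI = 0x6E69666E6F6369ED
R12 = 0xFFFF8808534DA678  # an ordinary kernel pointer, for comparison

print(is_canonical(RDI), is_canonical(R12))  # False True
print(as_ascii(RBX))                          # unconfin
```

RBX decodes to the text "unconfin", so the pointer that netlink_compare was about to follow appears to have held string data rather than an object address. Where that string came from cannot be determined from this dump alone.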
Tags: "restarting system"

Notes

~0027724

henel321 (reporter)

I can only second this. We have the exact same issue.

Across 1000 servers this happens about 3-6 times per day, which means each server reboots roughly once every 6 to 11 months.

Our vmcore-dmesg.txt is exactly the same as the one posted above.
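
A rough check of the per-server interval implied by that fleet-wide rate. The sketch assumes the crashes are spread evenly over all 1000 machines, which the note does not state; the function name is illustrative.

```python
def days_between_crashes(servers: int, crashes_per_day: float) -> float:
    """Average days between crashes on any one server, assuming an even spread."""
    return servers / crashes_per_day

for rate in (3, 6):
    days = days_between_crashes(1000, rate)
    # 30.44 is the average length of a calendar month in days.
    print(f"{rate} crashes/day -> one crash per server every {days:.0f} days "
          f"(~{days / 30.44:.1f} months)")
```

At 6 crashes per day the interval is about 167 days, in line with the "every 6 months" figure; at 3 per day it is closer to 333 days.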

~0027731

toracat (manager)

Apparently this is a known issue. Looks like the fix will be in a future 7.2 kernel.

https://access.redhat.com/solutions/2647381

~0027909

techsicodia (reporter)

Hello,

We also have this issue on 1000 servers. Is there a temporary workaround?

Could you please post the full content of this page, since we don't have a Red Hat account: https://access.redhat.com/solutions/2647381

Thank you.

Best regards

~0028394

plug (reporter)

Hi all, any news on this issue? We seem to see the same problem on:
Arch : x86_64
Version : 3.10.0
Release : 327.36.3.el7

~0028395

tigalch (manager)

The kernel you mention in this bug report is outdated. Please update to the current C7 kernel (3.10.0-514.2.2 at the time of writing) and report back.

~0028396

plug (reporter)

Hi tigalch,
Thanks for the quick reply. We have been seeing this issue since release 327.36.1.el7. We thought the step to 327.36.3.el7 had fixed it, as we didn't see any reboots of our servers for about a month, but tragedy struck again last week.

We did not find anything about this issue in the release notes:
https://access.redhat.com/articles/2780461
We sadly cannot just reboot our servers without making a lot of people mad, especially if we are not sure whether this issue is fixed.

Has anyone found out how to reproduce this issue? Then we could at least do a proper test in our lab instead of potentially waiting a few months to see whether it is fixed.

~0028397

Nikolay (reporter)

Hi, guys.
Everything looks fine (no more reboots) with

=
$ uname -sr
Linux 3.10.0-514.2.2.el7.x86_64

$ cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)

$ uptime
 14:35:11 up 36 days, 2:22, 2 users, load average: 0,43, 0,42, 0,38
=

~0028407

plug (reporter)

Hi Nikolay,
how often did the reboots occur before your kernel update?

reboot system boot 3.10.0-327.36.3. Thu Jan 12 10:37 - 16:32 (6+05:54)
reboot system boot 3.10.0-327.36.3. Sat Nov 12 07:02 - 16:32 (67+09:30)

As you can see, we had 61 days between the last two reboots with kernel 3.10.0-327.36.3.
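
The 61-day figure can be reproduced from the two boot times in the `last` output. That output carries no year, so the sketch below assumes 2016 and 2017, inferred from the dates in the issue history. The reading time of 16:32 on Jan 18 is also an assumption, taken from the end column and the note's posting date.

```python
from datetime import datetime, timedelta

# Boot times from the `last` output; years are an assumption (see above).
boot_nov = datetime(2016, 11, 12, 7, 2)
boot_jan = datetime(2017, 1, 12, 10, 37)

gap = boot_jan - boot_nov
print(gap.days)  # 61

# The "(67+09:30)" column is the elapsed time since the November boot,
# measured at the moment `last` was run (assumed 2017-01-18 16:32).
read_at = datetime(2017, 1, 18, 16, 32)
print(read_at - boot_nov == timedelta(days=67, hours=9, minutes=30))  # True
```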

~0028415

Nikolay (reporter)

Hi, plug,

Reboots occurred once or twice a week.

~0028473

plug (reporter)

Hi guys,

We will be updating some servers tomorrow to 3.10.0-514.6.1.el7.x86_64.
If I don't post in this thread again, that means we have not seen the issue since.

~0028785

njlc_wy (reporter)

I also hit this issue on 3.10.0-327.36.3.el7.x86_64:
crash> bt -l
PID: 40149 TASK: ffff8805b3b92e00 CPU: 4 COMMAND: "sudo"
 #0 [ffff8805630536f0] machine_kexec at ffffffff81051e9b
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/arch/x86/kernel/machine_kexec_64.c: 319
 #1 [ffff880563053750] crash_kexec at ffffffff810f27e2
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/kernel/kexec.c: 1486
 #2 [ffff880563053820] oops_end at ffffffff8163f448
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/arch/x86/kernel/dumpstack.c: 225
 #3 [ffff880563053848] no_context at ffffffff8162f561
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/arch/x86/mm/fault.c: 696
 #4 [ffff880563053898] __bad_area_nosemaphore at ffffffff8162f5f7
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/arch/x86/mm/fault.c: 775
 #5 [ffff8805630538e0] bad_area at ffffffff8162f91b
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/arch/x86/mm/fault.c: 804
 #6 [ffff880563053908] __do_page_fault at ffffffff81642235
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/arch/x86/mm/fault.c: 1159
 #7 [ffff880563053968] do_page_fault at ffffffff81642363
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/arch/x86/mm/fault.c: 1230
 #8 [ffff880563053990] page_fault at ffffffff8163e648
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/arch/x86/kernel/entry_64.S: 1309
    [exception RIP: netlink_compare+11]
    RIP: ffffffff815560bb RSP: ffff880563053a40 RFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000003000000030 RCX: 00000000e6eb1557
    RDX: 0000000000009cd5 RSI: ffff880563053a90 RDI: 0000002ffffffba8
    RBP: ffff880563053a78 R8: ffff880563053a8c R9: 0000000000000300
    R10: ffff88082f003600 R11: 0000000000000246 R12: ffff8808294ba000
    R13: ffff880563053a90 R14: ffffffff815560b0 R15: ffff88034dbb2a00
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/netlink/af_netlink.c: 1026
 #9 [ffff880563053a48] rhashtable_lookup_compare at ffffffff813080d0
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/lib/rhashtable.c: 470
#10 [ffff880563053a80] netlink_lookup at ffffffff815569ee
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/netlink/af_netlink.c: 1049
#11 [ffff880563053ab0] netlink_getsockbyportid at ffffffff81557d8f
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/netlink/af_netlink.c: 1561
#12 [ffff880563053ac8] netlink_alloc_skb at ffffffff81557dff
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/netlink/af_netlink.c: 1798
#13 [ffff880563053b00] netlink_dump at ffffffff81558083
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/netlink/af_netlink.c: 2695
#14 [ffff880563053b30] __netlink_dump_start at ffffffff81558a6b
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/netlink/af_netlink.c: 2799
#15 [ffff880563053b68] rtnetlink_rcv_msg at ffffffff8153a4a0
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/core/rtnetlink.c: 70
#16 [ffff880563053bd8] netlink_rcv_skb at ffffffff8155aa19
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/netlink/af_netlink.c: 2879
#17 [ffff880563053c00] rtnetlink_rcv at ffffffff8153a338
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/core/rtnetlink.c: 82
#18 [ffff880563053c18] netlink_unicast at ffffffff8155a02d
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/netlink/af_netlink.c: 1743
#19 [ffff880563053c60] netlink_sendmsg at ffffffff8155a420
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/netlink/af_netlink.c: 2367
#20 [ffff880563053cf8] sock_sendmsg at ffffffff815112d0
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/socket.c: 632
#21 [ffff880563053e58] SYSC_sendto at ffffffff81511841
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/socket.c: 1786
#22 [ffff880563053f70] sys_sendto at ffffffff815122ce
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/net/socket.c: 1751
#23 [ffff880563053f80] system_call_fastpath at ffffffff81646b49
    /usr/src/debug/kernel-3.10.0-327.36.3.el7/linux-3.10.0-327.36.3.el7.x86_64/arch/x86/kernel/entry_64.S: 444
    RIP: 00007fcbb96dbdb3 RSP: 00007fff18af7e40 RFLAGS: 00010202
    RAX: 000000000000002c RBX: ffffffff81646b49 RCX: 00007fcbbb79c660
    RDX: 0000000000000014 RSI: 00007fff18af8f90 RDI: 0000000000000003
    RBP: 00007fff18af9000 R8: 00007fff18af8f70 R9: 000000000000000c
    R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff815122ce
    R13: ffff880563053f78 R14: 0000000000008001 R15: 00007fff18af7f40
    ORIG_RAX: 000000000000002c CS: 0033 SS: 002b
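
One detail the two dumps share: in both, RDI (the first argument passed to netlink_compare) sits exactly the same distance below RBX. The sketch below only does the subtraction on the register values quoted in this ticket; the dictionary labels are illustrative.

```python
# (RBX, RDI) pairs copied from the two register dumps in this ticket.
dumps = {
    "327.36.1 / crond": (0x6E69666E6F636E75, 0x6E69666E6F6369ED),
    "327.36.3 / sudo":  (0x0000003000000030, 0x0000002FFFFFFBA8),
}

offsets = {name: rbx - rdi for name, (rbx, rdi) in dumps.items()}
for name, off in offsets.items():
    print(f"{name}: RBX - RDI = {off:#x}")  # 0x488 in both cases
```

A constant 0x488 is consistent with the caller subtracting a fixed offset from a list-node pointer before invoking the compare callback. If so, RBX would be the bad node pointer in both crashes, but that reading is an inference from the numbers, not something the dumps state.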

~0028786

njlc_wy (reporter)

While debugging the kernel dump, I found that the return value of "jhash", which is called at <rhashtable_lookup_compare+43>, is wrong.

crash> disas rhashtable_lookup_compare
Dump of assembler code for function rhashtable_lookup_compare:
   0xffffffff81308080 <+0>: push %rbp
   0xffffffff81308081 <+1>: mov %rsp,%rbp
   0xffffffff81308084 <+4>: push %r15
   0xffffffff81308086 <+6>: push %r14
   0xffffffff81308088 <+8>: mov %rdx,%r14
   0xffffffff8130808b <+11>: push %r13
   0xffffffff8130808d <+13>: mov %rcx,%r13
   0xffffffff81308090 <+16>: push %r12
   0xffffffff81308092 <+18>: mov %rdi,%r12
   0xffffffff81308095 <+21>: mov %rsi,%rdi
   0xffffffff81308098 <+24>: push %rbx
   0xffffffff81308099 <+25>: mov (%r12),%rbx
   0xffffffff8130809d <+29>: mov (%r12),%r15
   0xffffffff813080a1 <+33>: mov 0x38(%r12),%edx
   0xffffffff813080a6 <+38>: mov 0x20(%r12),%esi
   0xffffffff813080ab <+43>: callq *0x50(%r12) ; <jhash>
   0xffffffff813080b0 <+48>: mov (%r15),%rdx
   0xffffffff813080b3 <+51>: sub $0x1,%edx
   0xffffffff813080b6 <+54>: and %edx,%eax
   0xffffffff813080b8 <+56>: mov 0x8(%rbx,%rax,8),%rbx
   0xffffffff813080bd <+61>: test %rbx,%rbx
   0xffffffff813080c0 <+64>: je 0xffffffff813080dc <rhashtable_lookup_compare+92>
   0xffffffff813080c2 <+66>: mov %rbx,%rdi
   0xffffffff813080c5 <+69>: sub 0x30(%r12),%rdi
   0xffffffff813080ca <+74>: mov %r13,%rsi
   0xffffffff813080cd <+77>: callq *%r14
   0xffffffff813080d0 <+80>: test %al,%al
   0xffffffff813080d2 <+82>: jne 0xffffffff813080f0 <rhashtable_lookup_compare+112>
   0xffffffff813080d4 <+84>: mov (%rbx),%rbx
   0xffffffff813080d7 <+87>: test %rbx,%rbx
   0xffffffff813080da <+90>: jne 0xffffffff813080c2 <rhashtable_lookup_compare+66>
   0xffffffff813080dc <+92>: pop %rbx
   0xffffffff813080dd <+93>: pop %r12
   0xffffffff813080df <+95>: pop %r13
   0xffffffff813080e1 <+97>: pop %r14
   0xffffffff813080e3 <+99>: pop %r15
   0xffffffff813080e5 <+101>: xor %eax,%eax
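
For readers following the disassembly, the instructions at +48 to +56 reduce the hash to a bucket index by masking it with the table size minus one, then load the bucket head. A minimal Python sketch of that step, assuming a power-of-two table size; the function name and sample values are hypothetical and not taken from the dump.

```python
def bucket_index(hash_value: int, table_size: int) -> int:
    """Mirror of `sub $0x1,%edx; and %edx,%eax`: hash & (size - 1)."""
    if table_size <= 0 or table_size & (table_size - 1):
        raise ValueError("table_size must be a power of two")
    return (hash_value & 0xFFFFFFFF) & (table_size - 1)

# Hypothetical values, for illustration only.
print(bucket_index(0xDEADBEEF, 64))  # 47
```

With this masking, any 32-bit hash yields an index inside the table as long as the size field and the bucket array belong together, so the lookup then depends on the contents of the selected bucket chain.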

~0028789

toracat (manager)

The issue reported in this ticket has been resolved in the 7.3 kernel (3.10.0-514.el7).

Please update the kernel to the latest if you are running an earlier kernel. If you see the problem with the -514 kernel, open a new ticket.
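
A small sketch for checking whether a given kernel release string is at or above the fixed 3.10.0-514.el7 baseline. It compares only the leading numeric fields and is a simplification, not RPM's own version comparison; the function name is illustrative.

```python
def release_key(release: str) -> tuple:
    """Turn '3.10.0-327.36.1.el7.x86_64' into ((3, 10, 0), (327, 36, 1))."""
    version, _, rest = release.partition("-")
    ver = tuple(int(p) for p in version.split("."))
    rel = []
    for part in rest.split("."):
        if not part.isdigit():
            break  # stop at non-numeric fields such as 'el7' or 'x86_64'
        rel.append(int(part))
    return ver, tuple(rel)

FIXED = release_key("3.10.0-514.el7")

for running in ("3.10.0-327.36.1.el7.x86_64",
                "3.10.0-327.36.3.el7.x86_64",
                "3.10.0-514.2.2.el7.x86_64"):
    status = "at or above 514" if release_key(running) >= FIXED else "older than 514"
    print(running, "->", status)
```

The string to test on a live system would normally come from `uname -r`.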

Issue History
Date Modified     Username     Field                              Change
2016-10-12 09:37  Nikolay      New Issue
2016-10-12 09:37  Nikolay      Tag Attached: "restarting system"
2016-10-17 07:14  henel321     Note Added: 0027724
2016-10-17 18:06  toracat      Note Added: 0027731
2016-11-17 12:08  techsicodia  Note Added: 0027909
2017-01-18 10:37  plug         Note Added: 0028394
2017-01-18 10:41  tigalch      Note Added: 0028395
2017-01-18 11:08  plug         Note Added: 0028396
2017-01-18 11:43  Nikolay      Note Added: 0028397
2017-01-18 16:43  plug         Note Added: 0028407
2017-01-19 06:26  Nikolay      Note Added: 0028415
2017-01-25 17:00  plug         Note Added: 0028473
2017-03-07 13:07  njlc_wy      Note Added: 0028785
2017-03-07 13:13  njlc_wy      Note Added: 0028786
2017-03-07 22:14  toracat      Note Added: 0028789
2017-03-07 22:14  toracat      Status                             new => resolved
2017-03-07 22:14  toracat      Resolution                         open => fixed