
ID: 0013241
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2018-01-24 23:23
Reporter: Gerry Lo
Priority: urgent
Severity: crash
Reproducibility: random
Status: resolved
Resolution: fixed
Platform: KVM
OS: CentOS
OS Version: 7.3
Product Version:
Target Version:
Fixed in Version:
Summary: 0013241: Kernel panics: Bad RIP value and unable to handle kernel NULL pointer dereference, suspect issue at net/core/neighbour.c
Description: We encountered three kernel panics on our systems and performed crash analysis on each of them (details below). Based on the crash analysis and a diff of the 3.10 and 4.4 kernel sources, we suspect that the change https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/net/core/neighbour.c?h=linux-4.4.y&id=2c51a97f76d20ebf1f50fef908b986cb051fdff9
may fix the kernel panics we saw. This change, which has not yet been merged into the 3.10 tree, states that it fixes https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/net/core/neighbour.c?h=linux-3.10.y&id=a263b3093641fb1ec377582c90986a7fd0625184
which is in the 3.10 tree.

This may also be related to the following netdev discussion:
http://www.spinics.net/lists/netdev/msg323883.html
http://www.spinics.net/lists/netdev/msg323914.html

--------------------------------------------------------------
Case 1:
[1811188.407507] CPU: 32 PID: 65767 Comm: Worker Tainted: P OE ------------ T 3.10.0-514.2.2.el7.x86_64 #1
[1811188.408921] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Ubuntu-1.8.2-1ubuntu1~cloud0 04/01/2014
[1811188.410685] task: ffff881c80f71f60 ti: ffff881c5e608000 task.ti: ffff881c5e608000
[1811188.412060] RIP: 0010:[<0000081e00000000>] [<0000081e00000000>] 0x81dffffffff
[1811188.414662] RSP: 0000:ffff881c8c803e08 EFLAGS: 00010206
[1811188.415462] RAX: ffff881c5e60bfd8 RBX: ffff8801764cc000 RCX: ffff8800355bf458
[1811188.416492] RDX: 0000000000000000 RSI: 0000081e00000000 RDI: 0000000000000000
[1811188.424118] RBP: ffff881c8c803e38 R08: 0000000000019b00 R09: 0000000000000000
[1811188.425880] R10: ffff881c8c819b00 R11: ffffea007191d900 R12: ffff8800355bf458
[1811188.427633] R13: 0000000000000100 R14: 0000081e00000000 R15: 0000000000000000
[1811188.429410] FS: 00007fdc2e310700(0000) GS:ffff881c8c800000(0000) knlGS:0000000000000000
[1811188.431589] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1811188.433269] CR2: 0000081e00000000 CR3: 00000039453e0000 CR4: 00000000003406e0
[1811188.435032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1811188.436699] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1811188.438387] Stack:
[1811188.439529] ffffffff81095a96 ffff8801764cc000 00000000000000ea 0000081e00000000
[1811188.441291] 0000000000000000 ffff8800355bf458 ffff881c8c803eb0 ffffffff81098787
[1811188.443064] ffff8801764cdc28 ffff8801764cd828 ffff8801764cd428 ffff8801764cd028
[1811188.444833] Call Trace:
[1811188.446269] <IRQ>
[1811188.446531]
[1811188.447746] [<ffffffff81095a96>] ? call_timer_fn+0x36/0x110
[1811188.449335] [<ffffffff81098787>] run_timer_softirq+0x237/0x340
[1811188.451178] [<ffffffff8108f21f>] __do_softirq+0xef/0x280
[1811188.453040] [<ffffffff8169825c>] call_softirq+0x1c/0x30
[1811188.454763] [<ffffffff8102d365>] do_softirq+0x65/0xa0
[1811188.456428] [<ffffffff8108f5b5>] irq_exit+0x115/0x120
[1811188.458073] [<ffffffff81698df8>] do_IRQ+0x58/0xf0
[1811188.460074] [<ffffffff8168df6d>] common_interrupt+0x6d/0x6d
[1811188.462321] <EOI>
[1811188.462577]
[1811188.464051] [<ffffffff816968eb>] ? sysret_audit+0x17/0x21
[1811188.466215] Code: Bad RIP value.
[1811188.467866] RIP [<0000081e00000000>] 0x81dffffffff
[1811188.469583] RSP <ffff881c8c803e08>
[1811188.471171] CR2: 0000081e00000000

backtrace:
crash> bt
PID: 65767 TASK: ffff881c80f71f60 CPU: 32 COMMAND: "Worker"
 #0 [ffff881c8c803a80] machine_kexec at ffffffff81059c8b
 #1 [ffff881c8c803ae0] __crash_kexec at ffffffff811052e2
 #2 [ffff881c8c803bb0] crash_kexec at ffffffff811053d0
 #3 [ffff881c8c803bc8] oops_end at ffffffff8168f088
 #4 [ffff881c8c803bf0] no_context at ffffffff8167ecb3
 #5 [ffff881c8c803c40] __bad_area_nosemaphore at ffffffff8167ed49
 #6 [ffff881c8c803c88] bad_area_nosemaphore at ffffffff8167eeb3
 #7 [ffff881c8c803c98] __do_page_fault at ffffffff81691e1e
 #8 [ffff881c8c803cf8] trace_do_page_fault at ffffffff81692076
 #9 [ffff881c8c803d38] do_async_page_fault at ffffffff8169171b
#10 [ffff881c8c803d50] async_page_fault at ffffffff8168e2b8
#11 [ffff881c8c803e40] run_timer_softirq at ffffffff81098787
#12 [ffff881c8c803eb8] __do_softirq at ffffffff8108f21f
#13 [ffff881c8c803f28] call_softirq at ffffffff8169825c
#14 [ffff881c8c803f40] do_softirq at ffffffff8102d365
#15 [ffff881c8c803f60] irq_exit at ffffffff8108f5b5
#16 [ffff881c8c803f78] do_IRQ at ffffffff81698df8
--- <IRQ stack> ---
#17 [ffff881c5e60bf58] ret_from_intr at ffffffff8168df6d
    RIP: 00007fddf0dad80b RSP: 00007fdc2e30fd60 RFLAGS: 00000202
    RAX: 00007fddb544da37 RBX: ffffffff816968eb RCX: 0000000001729c30
    RDX: 00007fddb544d980 RSI: 00007fddb544de67 RDI: 0000000001729b80
    RBP: 00007fdc2e30fd90 R8: 0000000001729b71 R9: 00007fddb544d987
    R10: 00000000ffffffff R11: 0000000000000012 R12: 00000000025207d0
    R13: 00007fdc2e310700 R14: 00007fdc79cafcf8 R15: 00007fdc2e30fe50
    ORIG_RAX: ffffffffffffff5e CS: 0033 SS: 002b

Timers on CPU #32:
crash> tvec_bases | grep 32
  [32]: ffff881c8c80f888
crash> rd -64 ffff881c8c80f888
ffff881c8c80f888: ffff8801764cc000 ..Lv....
crash> tvec_base ffff8801764cc000
struct tvec_base {
  lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 0x37ca37ca,
            tickets = {
              head = 0x37ca,
              tail = 0x37ca
            }
          }
        }
      }
    }
  },
  running_timer = 0xffff8800355bf458,
  timer_jiffies = 0x16bf02deb,
  next_timer = 0x16bf02dcf,
  active_timers = 0x58b,
...
  all_timers = 0x58c
}

Jiffies:
crash> p/x jiffies
$6 = 0x16bf02e3a

By looking at the run_timer_softirq() source, the raw page-fault stack and the registers, we can confirm that the running timer was indeed 0xffff8800355bf458 (the same value appears in RCX and R12 of the oops and on the stack).
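To double-check which registers in the oops carry a given value, a small Python helper can scan the register dump. This is an illustrative sketch only, not a crash command; `parse_regs` and `regs_holding` are hypothetical names, and the register lines are copied from the Case 1 oops above.

```python
import re

# Register lines copied from the Case 1 oops above.
OOPS_REGS = """\
RAX: ffff881c5e60bfd8 RBX: ffff8801764cc000 RCX: ffff8800355bf458
RDX: 0000000000000000 RSI: 0000081e00000000 RDI: 0000000000000000
RBP: ffff881c8c803e38 R08: 0000000000019b00 R09: 0000000000000000
R10: ffff881c8c819b00 R11: ffffea007191d900 R12: ffff8800355bf458
R13: 0000000000000100 R14: 0000081e00000000 R15: 0000000000000000
"""

# "NAME: 16 hex digits", where NAME is RAX..R15.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}):\s*([0-9a-f]{16})")

def parse_regs(text):
    """Return {register name: integer value} for every register in the dump."""
    return {name: int(val, 16) for name, val in REG_RE.findall(text)}

def regs_holding(regs, value):
    """Sorted names of the registers whose value equals `value`."""
    return sorted(name for name, v in regs.items() if v == value)

regs = parse_regs(OOPS_REGS)
print(regs_holding(regs, 0xffff8800355bf458))  # ['R12', 'RCX']
print(regs_holding(regs, 0x81e00000000))       # ['R14', 'RSI']
```

In this dump the running timer sits in RCX and R12, the bad function pointer in RSI and R14, and the tvec_base pointer (ffff8801764cc000) in RBX.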

Next:
crash> timer_list -x 0xffff8800355bf458
struct timer_list {
  entry = {
    next = 0x0,
    prev = 0xdead000000000200 <-- this is CONFIG_ILLEGAL_POINTER_VALUE+0x200, very similar to LIST_POISON2
  },
  expires = 0x3e3800003e38,
  base = 0x0,
  function = 0x81e00000000,
  data = 0x0,
  slack = 0x0,
  start_pid = 0x0,
  start_site = 0x0,
  start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
}

So the data in this timer_list struct is clearly corrupt (->function is 0x81e00000000, the same bad address that appears in RIP and CR2), and the struct has probably been freed (->prev is very close to LIST_POISON2). It is unclear what caused this.
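The `prev` value can be split into the poison base and an offset. A minimal sketch, assuming the x86_64 value of CONFIG_ILLEGAL_POINTER_VALUE (0xdead000000000000), which include/linux/poison.h uses as POISON_POINTER_DELTA. Upstream LIST_POISON2 was `0x00200200 + POISON_POINTER_DELTA` in older kernels and `0x200 + POISON_POINTER_DELTA` in newer ones, so which definition this RHEL 7 kernel carries is an assumption to check against its own poison.h. `describe_poison` is a hypothetical helper.

```python
# Assumption: x86_64 CONFIG_ILLEGAL_POINTER_VALUE, used as POISON_POINTER_DELTA.
POISON_POINTER_DELTA = 0xdead000000000000

# The two LIST_POISON2 definitions seen in upstream kernels over time.
LIST_POISON2_VARIANTS = {
    "older (0x00200200 + delta)": 0x00200200 + POISON_POINTER_DELTA,
    "newer (0x200 + delta)": 0x200 + POISON_POINTER_DELTA,
}

def describe_poison(value):
    """Return (offset above POISON_POINTER_DELTA, matching LIST_POISON2 variants)."""
    offset = value - POISON_POINTER_DELTA
    matches = [name for name, poison in LIST_POISON2_VARIANTS.items() if poison == value]
    return offset, matches

offset, matches = describe_poison(0xdead000000000200)
print(hex(offset), matches)  # 0x200 ['newer (0x200 + delta)']
```

So the observed value is exactly POISON_POINTER_DELTA + 0x200; whether that is this kernel's LIST_POISON2 depends on which definition it was built with.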

Other CPUs don't do anything interesting:

    most are in default_idle() -> native_safe_halt() or userspace
    one in sys_munmap() -> native_flush_tlb_others() -> ...
    one in sys_wait4() -> wait_consider_task()
    one in switch_to()
    one in sysret_audit()
    one in sys_futex() -> try_to_wake_up() -> ...
    one in default_idle() -> reschedule_interrupt() -> ...

The crash "timer" command doesn't show anything suspicious either.

Let's examine the address of the problematic timer_list struct:
crash> kmem 0xffff8800355bf458
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff88019fc03600 kmalloc-512 512 143214 145504 4547 16k
  SLAB MEMORY NODE TOTAL ALLOCATED FREE
  ffffea0000d56f00 ffff8800355bc000 0 32 13 19
  FREE / [ALLOCATED]
   ffff8800355bf400

      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0000d56fc0 355bf000 0 0 0 1fffff00008000 tail

It's in the kmalloc-512 slab, and the object is not allocated (if it were, kmem would print it in square brackets, as "[ffff8800355bf400]").
So call_timer_fn() was indeed working on a timer in freed memory.
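The allocated/free convention described above can be encoded in a small parser for the object line that crash's kmem prints under "FREE / [ALLOCATED]". A sketch only: `kmem_object_state` is a hypothetical helper, and treating objects shown with a "(cpu N cache)" suffix as free follows this report's reading of the Case 3 output rather than crash documentation.

```python
import re

# Object line under "FREE / [ALLOCATED]" in crash's kmem output, e.g.
# "   ffff8800355bf400", "  [ffff8800355bf400]" or
# "   ffff881675642400 (cpu 28 cache)".
OBJ_RE = re.compile(r"^\s*(\[?)([0-9a-f]+)\]?")

def kmem_object_state(line):
    """Return (object address, allocated?) for one kmem object line.

    Bracketed addresses are allocated; unbracketed ones, including objects
    sitting in a per-CPU cache, are treated as free.
    """
    m = OBJ_RE.match(line)
    if m is None:
        raise ValueError(f"not a kmem object line: {line!r}")
    return int(m.group(2), 16), m.group(1) == "["

addr, allocated = kmem_object_state("   ffff8800355bf400")
print(hex(addr), "allocated" if allocated else "free")  # 0xffff8800355bf400 free
```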

kmalloc-512 is surprising, because struct timer_list is only 80 bytes:

crash> struct -o timer_list | grep SIZE
SIZE: 80
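To see why 80 bytes is out of place in kmalloc-512, here is a sketch that maps a request size to the smallest general kmalloc cache. It assumes the usual x86_64 SLUB general caches (kmalloc-8 through kmalloc-8192, including the 96- and 192-byte caches) and ignores debug padding; `kmalloc_cache` is a hypothetical helper.

```python
# Assumption: the usual x86_64 SLUB general caches, kmalloc-8 ... kmalloc-8192
# (the 96- and 192-byte caches included); debug padding is ignored.
KMALLOC_SIZES = (8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192)

def kmalloc_cache(size):
    """Name of the smallest general kmalloc cache that can hold `size` bytes."""
    for cache_size in KMALLOC_SIZES:
        if size <= cache_size:
            return f"kmalloc-{cache_size}"
    raise ValueError(f"{size} bytes exceeds the general kmalloc caches")

print(kmalloc_cache(80))   # kmalloc-96
print(kmalloc_cache(257))  # kmalloc-512
```

A standalone 80-byte timer_list would therefore come from kmalloc-96; only objects of 257 to 512 bytes land in kmalloc-512.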

It is probably embedded in a larger struct.
Looking at the "timer" command output:

crash> timer | grep -E 'BASE|TIMER_LIST|ffff8800355'
...
TVEC_BASES[32]: ffff8801764cc000
  EXPIRES TIMER_LIST FUNCTION
6105887488 ffff8800355bec58 ffffffff8157e9e0 <neigh_timer_handler>
6105894528 ffff8800355be858 ffffffff8157e9e0 <neigh_timer_handler>
6105903616 ffff8800355bd058 ffffffff8157e9e0 <neigh_timer_handler>
6105904896 ffff8800355bfc58 ffffffff8157e9e0 <neigh_timer_handler>
6105909504 ffff8800355bf858 ffffffff8157e9e0 <neigh_timer_handler>
6105912448 ffff8800355bdc58 ffffffff8157e9e0 <neigh_timer_handler>
...
TVEC_BASES[43]: ffff881d7683c000
  EXPIRES TIMER_LIST FUNCTION
6105873776 ffff88003552a058 ffffffff8157e9e0 <neigh_timer_handler>
...
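The neigh_timer_handler timers above also hint at where our address came from. Using the Case 1 kmem output (a 16k kmalloc-512 slab starting at 0xffff8800355bc000, holding 32 objects of 512 bytes), this sketch checks that each listed timer falls inside that slab at the same in-object offset, 0x58, as the crashing timer. It assumes objects are packed back to back from the slab start with no debug padding; the helper names are hypothetical.

```python
import re

# neigh_timer_handler entries for TVEC_BASES[32], copied from the "timer" output above.
TIMER_LINES = """\
6105887488 ffff8800355bec58 ffffffff8157e9e0 <neigh_timer_handler>
6105894528 ffff8800355be858 ffffffff8157e9e0 <neigh_timer_handler>
6105903616 ffff8800355bd058 ffffffff8157e9e0 <neigh_timer_handler>
6105904896 ffff8800355bfc58 ffffffff8157e9e0 <neigh_timer_handler>
6105909504 ffff8800355bf858 ffffffff8157e9e0 <neigh_timer_handler>
6105912448 ffff8800355bdc58 ffffffff8157e9e0 <neigh_timer_handler>
"""

SLAB_START = 0xffff8800355bc000  # "MEMORY" column of the Case 1 kmem output
SLAB_SIZE = 16 * 1024            # SSIZE 16k
OBJ_SIZE = 512                   # kmalloc-512

# EXPIRES, TIMER_LIST, FUNCTION, <symbol>
TIMER_RE = re.compile(r"^\s*\d+\s+([0-9a-f]+)\s+[0-9a-f]+\s+<(\w+)>", re.M)

def timer_addrs(text, func):
    """TIMER_LIST addresses of entries whose FUNCTION symbol is <func>."""
    return [int(addr, 16) for addr, sym in TIMER_RE.findall(text) if sym == func]

def slab_placement(addr):
    """(object index, offset inside the object), or None if addr is outside the slab."""
    rel = addr - SLAB_START
    if not 0 <= rel < SLAB_SIZE:
        return None
    return divmod(rel, OBJ_SIZE)

# The listed timers plus the crashing one (0xffff8800355bf458).
for addr in timer_addrs(TIMER_LINES, "neigh_timer_handler") + [0xffff8800355bf458]:
    index, offset = slab_placement(addr)
    print(f"{addr:#x}: object {index}, offset {offset:#x}")
```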

It's likely that our address was also related to neigh_timer_handler(). Let's examine "struct neighbour":
crash> struct -o neighbour | grep SIZE
SIZE: 376
crash> struct -xo neighbour
struct neighbour {
    [0x0] struct neighbour *next;
    [0x8] struct neigh_table *tbl;
   [0x10] struct neigh_parms *parms;
   [0x18] unsigned long confirmed;
   [0x20] unsigned long updated;
   [0x28] rwlock_t lock;
   [0x30] atomic_t refcnt;
   [0x38] struct sk_buff_head arp_queue;
   [0x50] unsigned int arp_queue_len_bytes;
   [0x58] struct timer_list timer;
   [0xa8] unsigned long used;
   [0xb0] atomic_t probes;
   [0xb4] __u8 flags;
   [0xb5] __u8 nud_state;
   [0xb6] __u8 type;
   [0xb7] __u8 dead;
   [0xb8] seqlock_t ha_lock;
   [0xc0] unsigned char ha[32];
   [0xe0] struct hh_cache hh;
  [0x150] int (*output)(struct neighbour *, struct sk_buff *);
  [0x158] const struct neigh_ops *ops;
  [0x160] struct callback_head rcu;
  [0x170] struct net_device *dev;
  [0x178] u8 primary_key[];
}
SIZE: 0x178

It does embed a timer_list, and its size (0x178 = 376 bytes, plus the variable-length primary_key) falls into the kmalloc-512 slab. The timer_list offset is 0x58; looking back at the "kmem" output, the base address of the enclosing object was 0xffff8800355bf400, and our address, 0xffff8800355bf458, is exactly at offset 0x58 from that base.
So I'm fairly certain this was a "struct neighbour" that contained the problematic timer_list and was freed. The problem is therefore probably in neighbour.c (or related) code. Looking at the source, "struct neighbour" is the entry type of the kernel's neighbour tables, which include the ARP cache.
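The offset arithmetic above is the kernel's container_of() pattern. A sketch that reads the member offset straight from the `struct -xo neighbour` output and subtracts it; `member_offset` and `container_of` here are hypothetical Python helpers, not crash commands, and the layout string is an excerpt of the output above.

```python
import re

# Excerpt of the "struct -xo neighbour" output above.
NEIGHBOUR_LAYOUT = """\
   [0x50] unsigned int arp_queue_len_bytes;
   [0x58] struct timer_list timer;
   [0xa8] unsigned long used;
"""

def member_offset(layout, member):
    """Offset of `member` as printed by crash's 'struct -xo' (one member per line)."""
    m = re.search(r"\[(0x[0-9a-f]+)\][^;\n]*\b" + re.escape(member) + r";", layout)
    if m is None:
        raise KeyError(member)
    return int(m.group(1), 16)

def container_of(member_addr, layout, member):
    """Address of the enclosing struct, mirroring the kernel's container_of()."""
    return member_addr - member_offset(layout, member)

print(hex(container_of(0xffff8800355bf458, NEIGHBOUR_LAYOUT, "timer")))  # 0xffff8800355bf400
```

Applied to the Case 3 timer (0xffff881675642458), the same subtraction gives 0xffff881675642400, the free kmalloc-512 object reported by kmem there.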

Unfortunately, examining the ARP cache in the dump shows no trace of our problematic address (ffff8800355bf400).

--------------------------------------------------------------
Case 2:
[2718456.604211] CPU: 37 PID: 51709 Comm: Worker Tainted: P OE ------------ T 3.10.0-514.2.2.el7.x86_64 #1
[2718456.606379] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Ubuntu-1.8.2-1ubuntu1~cloud0 04/01/2014
[2718456.608768] task: ffff883262eede20 ti: ffff88048834c000 task.ti: ffff88048834c000
[2718456.610433] RIP: 0010:[<0000081e00000000>] [<0000081e00000000>] 0x81dffffffff
[2718456.612097] RSP: 0000:ffff88394b8c3e28 EFLAGS: 00010202
[2718456.613504] RAX: ffff88048834ffd8 RBX: ffff881d77374000 RCX: ffff881c5ceffa58
[2718456.615100] RDX: 0000000000000000 RSI: 0000081e00000000 RDI: 0000000000000000
[2718456.617300] RBP: ffff88394b8c3e58 R08: 0000000000004694 R09: 0000000000000000
[2718456.619004] R10: 0000000000000000 R11: ffff88394b8c3da0 R12: ffff881c5ceffa58
[2718456.621266] R13: 0000000000000100 R14: 0000081e00000000 R15: 0000000000000000
[2718456.624398] FS: 00007f02ab6d3700(0000) GS:ffff88394b8c0000(0000) knlGS:0000000000000000
[2718456.627644] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2718456.630671] CR2: 0000081e00000000 CR3: 0000003925ac1000 CR4: 00000000003406e0
[2718456.633862] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2718456.636587] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2718456.638604] Stack:
[2718456.639951] ffffffff81095a96 ffff881d77374000 00000000000000d6 0000081e00000000
[2718456.641832] 0000000000000000 ffff881c5ceffa58 ffff88394b8c3ed0 ffffffff81098787
[2718456.643752] ffff881d77375c28 ffff881d77375828 ffff881d77375428 ffff881d77375028
[2718456.645846] Call Trace:
[2718456.647883] <IRQ>
[2718456.648126]
[2718456.649556] [<ffffffff81095a96>] ? call_timer_fn+0x36/0x110
[2718456.651064] [<ffffffff81098787>] run_timer_softirq+0x237/0x340
[2718456.652762] [<ffffffff8108f21f>] __do_softirq+0xef/0x280
[2718456.654431] [<ffffffff8169825c>] call_softirq+0x1c/0x30
[2718456.656134] [<ffffffff8102d365>] do_softirq+0x65/0xa0
[2718456.657745] [<ffffffff8108f5b5>] irq_exit+0x115/0x120
[2718456.659883] [<ffffffff81698ed5>] smp_apic_timer_interrupt+0x45/0x60
[2718456.662031] [<ffffffff8169741d>] apic_timer_interrupt+0x6d/0x80
[2718456.663724] <EOI>
[2718456.663950] Code:
[2718456.665265] Bad RIP value.
[2718456.666487] RIP [<0000081e00000000>] 0x81dffffffff
[2718456.668086] RSP <ffff88394b8c3e28>
[2718456.669580] CR2: 0000081e00000000

backtrace:
crash> bt
PID: 51709 TASK: ffff883262eede20 CPU: 37 COMMAND: "Worker"
 #0 [ffff88394b8c3aa0] machine_kexec at ffffffff81059c8b
 #1 [ffff88394b8c3b00] __crash_kexec at ffffffff811052e2
 #2 [ffff88394b8c3bd0] crash_kexec at ffffffff811053d0
 #3 [ffff88394b8c3be8] oops_end at ffffffff8168f088
 #4 [ffff88394b8c3c10] no_context at ffffffff8167ecb3
 #5 [ffff88394b8c3c60] __bad_area_nosemaphore at ffffffff8167ed49
 #6 [ffff88394b8c3ca8] bad_area_nosemaphore at ffffffff8167eeb3
 #7 [ffff88394b8c3cb8] __do_page_fault at ffffffff81691e1e
 #8 [ffff88394b8c3d18] trace_do_page_fault at ffffffff81692076
 #9 [ffff88394b8c3d58] do_async_page_fault at ffffffff8169171b
#10 [ffff88394b8c3d70] async_page_fault at ffffffff8168e2b8
#11 [ffff88394b8c3e60] run_timer_softirq at ffffffff81098787
#12 [ffff88394b8c3ed8] __do_softirq at ffffffff8108f21f
#13 [ffff88394b8c3f48] call_softirq at ffffffff8169825c
#14 [ffff88394b8c3f60] do_softirq at ffffffff8102d365
#15 [ffff88394b8c3f80] irq_exit at ffffffff8108f5b5
#16 [ffff88394b8c3f98] smp_apic_timer_interrupt at ffffffff81698ed5
#17 [ffff88394b8c3fb0] apic_timer_interrupt at ffffffff8169741d
--- <IRQ stack> ---
#18 [ffff88048834ff58] apic_timer_interrupt at ffffffff8169741d
    RIP: 00007f0384af541c RSP: 00007f02ab6d2aa8 RFLAGS: 00000206
    RAX: 00000000351c3bf0 RBX: 000000000000fe2e RCX: 0000000000000820
    RDX: 00000000000001a0 RSI: 00000000351c3090 RDI: 0000000033e82500
    RBP: 0000000000000002 R8: 0000000034c73530 R9: 0000000034405c80
    R10: 0000000000030101 R11: 0000000009c47874 R12: 0000000000000100
    R13: 0000000033e102c0 R14: ffffffff8168c8d1 R15: ffff88048834ff70
    ORIG_RAX: ffffffffffffff10 CS: 0033 SS: 002b

crash> tvec_bases | grep 37
  [37]: ffff88394b8cf888
crash> rd -64 ffff88394b8cf888
ffff88394b8cf888: ffff881d77374000 .@7w....
crash> tvec_base.running_timer,timer_jiffies,next_timer,active_timers,all_timers -x ffff881d77374000
  running_timer = 0xffff881c5ceffa58
  timer_jiffies = 0x1a20410d7
  next_timer = 0x1a20410d6
  active_timers = 0x16
  all_timers = 0x17
crash> p/x jiffies
$1 = 0x1a2041138
crash> timer_list -x 0xffff881c5ceffa58
struct timer_list {
  entry = {
    next = 0x0,
    prev = 0xdead000000000200
  },
  expires = 0x388800003890,
  base = 0x0,
  function = 0x81e00000000,
  data = 0x0,
  slack = 0x0,
  start_pid = 0x0,
  start_site = 0x0,
  start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
}

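Same thing as in case 1: the running timer (also held in RCX and R12) has next = 0x0, prev = 0xdead000000000200 and function = 0x81e00000000.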
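Because cases 1 and 2 show the same corruption, the two dumps can be compared field by field. A sketch: the dictionaries are transcribed from the two `timer_list -x` outputs (start_* fields omitted), and SIGNATURE is simply the set of fields that matched in both, not an established fingerprint.

```python
# Field values transcribed from the "timer_list -x" output of Case 1 and Case 2.
CASE1 = {"next": 0x0, "prev": 0xdead000000000200, "expires": 0x3e3800003e38,
         "base": 0x0, "function": 0x81e00000000, "data": 0x0}
CASE2 = {"next": 0x0, "prev": 0xdead000000000200, "expires": 0x388800003890,
         "base": 0x0, "function": 0x81e00000000, "data": 0x0}

# Hypothetical signature: the pointer-like fields that matched in both dumps.
SIGNATURE = {"next": 0x0, "prev": 0xdead000000000200,
             "base": 0x0, "function": 0x81e00000000}

def matches_signature(timer, signature=SIGNATURE):
    """True if every field named in `signature` has the same value in `timer`."""
    return all(timer.get(k) == v for k, v in signature.items())

def differing_fields(a, b):
    """Sorted names of fields whose values differ between two dumps."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))

print(differing_fields(CASE1, CASE2))                      # ['expires']
print(matches_signature(CASE1), matches_signature(CASE2))  # True True
```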

--------------------------------------------------------------
Case 3:
[2696447.831143] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[2696447.845140] IP: [<ffffffff810991a8>] get_next_timer_interrupt+0x1b8/0x260

Backtrace:
crash> bt
PID: 0 TASK: ffff880176f10000 CPU: 32 COMMAND: "swapper/32"
 #0 [ffff881c8c803b00] machine_kexec at ffffffff81059c8b
 #1 [ffff881c8c803b60] __crash_kexec at ffffffff811052e2
 #2 [ffff881c8c803c30] crash_kexec at ffffffff811053d0
 #3 [ffff881c8c803c48] oops_end at ffffffff8168f088
 #4 [ffff881c8c803c70] no_context at ffffffff8167ecb3
 #5 [ffff881c8c803cc0] __bad_area_nosemaphore at ffffffff8167ed49
 #6 [ffff881c8c803d08] bad_area_nosemaphore at ffffffff8167eeb3
 #7 [ffff881c8c803d18] __do_page_fault at ffffffff81691e1e
 #8 [ffff881c8c803d78] trace_do_page_fault at ffffffff81692076
 #9 [ffff881c8c803db8] do_async_page_fault at ffffffff8169171b
#10 [ffff881c8c803dd0] async_page_fault at ffffffff8168e2b8
    [exception RIP: get_next_timer_interrupt+440]
    RIP: ffffffff810991a8 RSP: ffff881c8c803e80 RFLAGS: 00010017
    RAX: 0000000000000000 RBX: 0009946980df7140 RCX: 000001e6000001e6
    RDX: 00000001a0b41200 RSI: ffff8801764dd148 RDI: 0000000001a0b412
    RBP: ffff881c8c803ed0 R8: 0000000000000001 R9: 0000000000000012
    R10: 0000000000000012 R11: ffff8801764dd028 R12: 00000001a0b411fe
    R13: ffff8801764dc000 R14: ffff881c8c803e88 R15: ffff881c8c803ea0
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#11 [ffff881c8c803ed8] tick_nohz_stop_sched_tick at ffffffff810f3428
#12 [ffff881c8c803f30] __tick_nohz_idle_enter at ffffffff810f35ce
#13 [ffff881c8c803f60] tick_nohz_irq_exit at ffffffff810f3bd8
#14 [ffff881c8c803f80] irq_exit at ffffffff8108f52b
#15 [ffff881c8c803f98] smp_apic_timer_interrupt at ffffffff81698ed5
#16 [ffff881c8c803fb0] apic_timer_interrupt at ffffffff8169741d
--- <IRQ stack> ---
#17 [ffff880176f0fde8] apic_timer_interrupt at ffffffff8169741d
    [exception RIP: native_safe_halt+6]
    RIP: ffffffff81060fe6 RSP: ffff880176f0fe98 RFLAGS: 00000286
    RAX: 00000000ffffffed RBX: 0009946980d02f00 RCX: 0100000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
    RBP: ffff880176f0fe98 R8: 0000000000000000 R9: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000293 R12: d81a1d57622a0b7b
    R13: ffff880176f0fe48 R14: ffff880176f0fe18 R15: ffffffff810508a7
    ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#18 [ffff880176f0fea0] default_idle at ffffffff8103483f
#19 [ffff880176f0fec0] arch_cpu_idle at ffffffff81035186
#20 [ffff880176f0fed0] cpu_startup_entry at ffffffff810e7dc5
#21 [ffff880176f0ff28] start_secondary at ffffffff8104f12a

Offending instruction:
crash> dis ffffffff810991a8
0xffffffff810991a8 <get_next_timer_interrupt+440>: testb $0x1,0x18(%rax)

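- RAX is indeed NULL, so the instruction reads address 0x18, which matches the faulting address in the BUG message ("NULL pointer dereference at 0000000000000018").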
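One plausible reading of how RAX becomes NULL: get_next_timer_interrupt() walks a timer wheel bucket and tests the low (deferrable) bit of each timer's ->base, and a timer whose entry.next was zeroed sends the walk to container_of(NULL) = NULL. The sketch models that walk. It assumes the 3.10 x86_64 timer_list layout implied by the dumps above (entry at 0x0, expires at 0x10, base at 0x18) and that the Case 3 timer has the same next = 0x0 corruption as cases 1 and 2 (its timer_list dump is not shown); the addresses and helper name are hypothetical.

```python
# Assumed 3.10 x86_64 struct timer_list layout, matching the field order of
# the timer_list dumps above: entry (struct list_head, 16 bytes) at 0x0,
# expires at 0x10, base at 0x18.
ENTRY_OFF = 0x00
BASE_OFF = 0x18
NULL_PAGE_END = 0x1000  # loads below this address are NULL-page faults

def first_faulting_load(head, next_of):
    """Model list_for_each_entry(nte, head, entry) touching nte->base.

    next_of maps each list_head address to its ->next pointer. Returns the
    first ->base load address that falls in the NULL page, or None if the
    walk returns to head cleanly.
    """
    pos = next_of[head]
    while pos != head:
        nte = pos - ENTRY_OFF  # container_of(pos, struct timer_list, entry)
        load = nte + BASE_OFF  # testb $0x1,0x18(%rax) with RAX = nte
        if load < NULL_PAGE_END:
            return load
        pos = next_of[pos]
    return None

# Hypothetical addresses: a list head, one healthy timer, and one timer whose
# entry.next was zeroed like the corrupt timers in cases 1 and 2.
HEAD, GOOD, BAD = 0x100000, 0x200000, 0x300000
corrupt_list = {HEAD: GOOD, GOOD: BAD, BAD: 0x0}
healthy_list = {HEAD: GOOD, GOOD: HEAD}

print(hex(first_faulting_load(HEAD, corrupt_list)))  # 0x18
print(first_faulting_load(HEAD, healthy_list))       # None
```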

Timers on CPU #32:
crash> tvec_bases | grep 32
  [32]: ffff881c8c80f888
crash> rd -64 ffff881c8c80f888
ffff881c8c80f888: ffff8801764dc000 ..Mv....
crash> tvec_base -x ffff8801764dc000 | less

crash> tvec_base.running_timer,timer_jiffies,next_timer,active_timers,all_timers -x ffff8801764dc000
  running_timer = 0x0
  timer_jiffies = 0x1a0b411ff
  next_timer = 0x1a0b411fe
  active_timers = 0x4c2
  all_timers = 0x4c2
crash> p/x jiffies
$1 = 0x1a0b41280

- tvec_base looks OK.
crash> timer | sed -n '/BASES.32/,/BASE/p'
TVEC_BASES[32]: ffff8801764dc000
   JIFFIES
   6991123072
   EXPIRES TIMER_LIST FUNCTION
   6991122944 ffff881911357790 ffffffffa03e9150 <death_by_timeout>
   ... 1214 more entries that all look correct ...
   7423122939 ffff8832ae6ab290 ffffffffa03e9150 <death_by_timeout>
2087354106342 ffff881675642458 81e00000000
TVEC_BASES[33]: ffff88017651c000

- There is a timer_list whose function is 0x81e00000000, the same bad value as in cases 1 and 2 (the "2087354106342 ffff881675642458 81e00000000" line).
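Spotting an entry like this among more than a thousand timer lines can be scripted. A sketch that flags entries whose FUNCTION column has no resolved symbol and lies below 0xffffffff80000000, the start of the x86_64 kernel text mapping (module addresses such as ffffffffa03e9150 are above it). The threshold and the helper name `suspicious_timers` are assumptions for illustration; the sample text is an excerpt of the output above with the middle entries left out.

```python
import re

# Excerpt of the "timer" output for TVEC_BASES[32] above (middle entries left out).
TIMER_OUTPUT = """\
TVEC_BASES[32]: ffff8801764dc000
   JIFFIES
   6991123072
   EXPIRES TIMER_LIST FUNCTION
   6991122944 ffff881911357790 ffffffffa03e9150 <death_by_timeout>
   7423122939 ffff8832ae6ab290 ffffffffa03e9150 <death_by_timeout>
2087354106342 ffff881675642458 81e00000000
TVEC_BASES[33]: ffff88017651c000
"""

KERNEL_TEXT_START = 0xffffffff80000000  # assumed x86_64 kernel text mapping start

# EXPIRES, TIMER_LIST, FUNCTION and an optional <symbol>.
LINE_RE = re.compile(r"^\s*(\d+)\s+([0-9a-f]+)\s+([0-9a-f]+)(?:\s+<(\w+)>)?\s*$")

def suspicious_timers(text):
    """(timer_list address, function) for entries with an unresolved,
    non-kernel-text function pointer."""
    found = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # headers, JIFFIES value, TVEC_BASES lines
        func = int(m.group(3), 16)
        if m.group(4) is None and func < KERNEL_TEXT_START:
            found.append((int(m.group(2), 16), func))
    return found

for timer, func in suspicious_timers(TIMER_OUTPUT):
    print(f"timer_list {timer:#x}: function {func:#x}")
```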

Address info:
crash> kmem ffff881675642458
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff88019fc03600 kmalloc-512 512 143576 145568 4549 16k
  SLAB MEMORY NODE TOTAL ALLOCATED FREE
  ffffea0059d59000 ffff881675640000 0 32 21 11
  FREE / [ALLOCATED]
   ffff881675642400 (cpu 28 cache)

      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0059d59080 1675642000 0 0 0 2fffff00008000 tail

Again, this is kmalloc-512 memory, the timer_list sits at offset 0x58 within the object (ffff881675642400), and the object is free.

So the latest crash looks like another manifestation of the same problem.
There is no trace of this address in the "net -a" output, and the problematic memory area already contains garbage, as in the previous cases.

Steps To Reproduce: Unknown. Occurrence seems random.
Tags: kernel panic
abrt_hash:
URL:

Activities

toracat (manager), 2017-06-14 17:22, ~0029481

We will try to add the patch to the next update to our kernel-plus package.

To get it added to the distro kernel, you need to file a bug report upstream at http://bugzilla.redhat.com.

toracat (manager), 2017-06-15 05:42, ~0029489

A set of the kernel-plus package with the patch applied is now available for testing:

https://people.centos.org/toracat/kernel/7/plus/bug13241/

toracat (manager), 2017-06-22 15:38, ~0029552

kernel-plus-3.10.0-514.21.2.el7 is out. It has the patch from this bug report.

Gerry Lo (reporter), 2017-06-29 06:41, ~0029577

Thanks! We will test the package.

toracat (manager), 2018-01-24 23:23, ~0031022

Closing as 'resolved'. Feel free to reopen if the issue persists.

Issue History

Date Modified     Username  Field         Change
2017-05-11 10:21  Gerry Lo  New Issue
2017-05-11 10:21  Gerry Lo  Tag Attached  kernel panic
2017-06-14 17:22  toracat   Status        new => assigned
2017-06-14 17:22  toracat   Note Added    0029481
2017-06-15 05:42  toracat   Note Added    0029489
2017-06-22 15:38  toracat   Note Added    0029552
2017-06-29 06:41  Gerry Lo  Note Added    0029577
2018-01-24 23:23  toracat   Status        assigned => resolved
2018-01-24 23:23  toracat   Resolution    open => fixed
2018-01-24 23:23  toracat   Note Added    0031022