View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0017554 | CentOS-7 | kernel | public | 2020-07-01 03:55 | 2021-08-22 16:04 |
Reporter | chudihuang | Assigned To | |||
Priority | high | Severity | block | Reproducibility | random |
Status | new | Resolution | open | ||
Platform | linux | OS | 3.10.0-957.21.3.el7.x86_64 | OS Version | centos7.2 |
Product Version | 7.2.1511 | ||||
Summary | 0017554: System panics with "kernel BUG at mm/mmap.c:741!" | ||||
Description | CentOS 7.2 with kernel 3.10.0-957.21.3.el7.x86_64. The system panics with the string "kernel BUG at mm/mmap.c:741!".

Loading the vmcore in crash presents details similar to the following:

```
      KERNEL: /usr/lib/debug/lib/modules/3.10.0-957.21.3.el7.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 16
        DATE: Sun Jun 28 15:13:13 2020
      UPTIME: 192 days, 21:53:43
LOAD AVERAGE: 0.21, 0.54, 0.63
       TASKS: 1140
     RELEASE: 3.10.0-957.21.3.el7.x86_64
     VERSION: #1 SMP Tue Jun 18 16:35:19 UTC 2019
     MACHINE: x86_64  (2494 Mhz)
      MEMORY: 64 GB
       PANIC: "kernel BUG at mm/mmap.c:741!"

crash> log | grep BUG
[16667623.237709] kernel BUG at mm/mmap.c:741!
crash>
```

Referencing mm/mmap.c:741 in the source, the panic is due to a BUG() in __insert_vm_struct(), which is called from __vma_adjust():

```c
int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
	unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
	struct vm_area_struct *expand)
{
	...
	__insert_vm_struct(mm, insert);
	...
}

734 static void __insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
735 {
736 	struct vm_area_struct *prev;
737 	struct rb_node **rb_link, *rb_parent;
738
739 	if (find_vma_links(mm, vma->vm_start, vma->vm_end,
740 			   &prev, &rb_link, &rb_parent))
741 		BUG();
742 	__vma_link(mm, vma, prev, rb_link, rb_parent);
743 	mm->map_count++;
744 }
```

The BUG() on line 741 fires when find_vma_links() on line 739 returns non-zero, which happens when the VMA being inserted overlaps an existing VMA. The overlap check inside find_vma_links() causes it to return -ENOMEM:

```c
591 static int find_vma_links(struct mm_struct *mm, unsigned long addr,
592 		unsigned long end, struct vm_area_struct **pprev,
593 		struct rb_node ***rb_link, struct rb_node **rb_parent)
594 {
595 	struct rb_node **__rb_link, *__rb_parent, *rb_prev;
596
597 	__rb_link = &mm->mm_rb.rb_node;
598 	rb_prev = __rb_parent = NULL;
599
600 	while (*__rb_link) {
601 		struct vm_area_struct *vma_tmp;
602
603 		__rb_parent = *__rb_link;
604 		vma_tmp = rb_entry(__rb_parent, struct vm_area_struct, vm_rb);
605
606 		if (vma_tmp->vm_end > addr) {
607 			/* Fail if an existing vma overlaps the area */
608 			if (vma_tmp->vm_start < end)
609 				return -ENOMEM;
610 			__rb_link = &__rb_parent->rb_left;
611 		} else {
612 			rb_prev = __rb_parent;
613 			__rb_link = &__rb_parent->rb_right;
614 		}
615 	}
616
617 	*pprev = NULL;
618 	if (rb_prev)
619 		*pprev = rb_entry(rb_prev, struct vm_area_struct, vm_rb);
620 	*rb_link = __rb_link;
621 	*rb_parent = __rb_parent;
622 	return 0;
623 }
```

Here the kernel walks the rb-tree of VMAs in the mm_struct to find the place where the new entry should be linked. While walking, it finds an existing VMA that overlaps the requested range and returns -ENOMEM.
To investigate, review the backtrace of the faulting process and the state of the respective mm_struct:

```
crash> bt
PID: 4183   TASK: ffff965e65cf0000  CPU: 7   COMMAND: "java"
 #0 [ffff965e76bf3a60] machine_kexec at ffffffff81863934
 #1 [ffff965e76bf3ac0] __crash_kexec at ffffffff8191d162
 #2 [ffff965e76bf3b90] crash_kexec at ffffffff8191d250
 #3 [ffff965e76bf3ba8] oops_end at ffffffff81f6d778
 #4 [ffff965e76bf3bd0] die at ffffffff8182f95b
 #5 [ffff965e76bf3c00] do_trap at ffffffff81f6cec0
 #6 [ffff965e76bf3c50] do_invalid_op at ffffffff8182c2a4
 #7 [ffff965e76bf3d00] invalid_op at ffffffff81f7912e
    [exception RIP: __vma_adjust+0x5e8]
    RIP: ffffffff819f0618  RSP: ffff965e76bf3db0  RFLAGS: 00010206
    RAX: ffff966e1cf1a458  RBX: ffff965f4ecb4df8  RCX: ffff966e3a623e80
    RDX: 00007ff55c000000  RSI: 00007ff560000000  RDI: ffff965f4ecb4d80
    RBP: ffff965e76bf3e48   R8: ffff966e3a623e80   R9: 0000000000000000
    R10: 0000000000004022  R11: 0000000000000000  R12: ffff966e1cf1af30
    R13: ffff965f4ecb4d80  R14: 0000000000000000  R15: ffff966e3a623e88
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff965e76bf3e50] __split_vma at ffffffff819f1918
 #9 [ffff965e76bf3e90] do_munmap at ffffffff819f1aca
#10 [ffff965e76bf3ee0] vm_munmap at ffffffff819f1e55
#11 [ffff965e76bf3f30] sys_munmap at ffffffff819f30e2
#12 [ffff965e76bf3f50] system_call_fastpath at ffffffff81f75ddb
    RIP: 00007ff59da85e57  RSP: 00007ff584ebbfb0  RFLAGS: 00000202
    RAX: 000000000000000b  RBX: 00007ff558000000  RCX: ffffffffffffffff
    RDX: 0000000000000000  RSI: 0000000004000000  RDI: 00007ff55c000000
    RBP: 0000000000021000   R8: ffffffffffffffff   R9: 0000000000000000
    R10: 0000000000004022  R11: 0000000000000206  R12: 0000000000000000
    R13: 00007ff584ebbbd0  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 000000000000000b  CS: 0033  SS: 002b
crash>
```

And the corresponding disassembly at the panic location:

```
crash> dis -r __vma_adjust+0x5e8 | tail -5
0xffffffff819f0605 <__vma_adjust+0x5d5>:        jne    0xffffffff819f0235 <__vma_adjust+0x205>
0xffffffff819f060b <__vma_adjust+0x5db>:        mov    %rax,0x10(%rcx)
0xffffffff819f060f <__vma_adjust+0x5df>:        jmpq   0xffffffff819f0235 <__vma_adjust+0x205>
0xffffffff819f0614 <__vma_adjust+0x5e4>:        nopl   0x0(%rax)
0xffffffff819f0618 <__vma_adjust+0x5e8>:        ud2
crash>
```

We jumped to the ud2 that caused the panic. Where did we jump from?

```
crash> dis __vma_adjust | grep 0x5e8
0xffffffff819f03c4 <__vma_adjust+0x394>:        ja     0xffffffff819f0618 <__vma_adjust+0x5e8>
0xffffffff819f0618 <__vma_adjust+0x5e8>:        ud2
crash>
```

And the assembly before the ja:

```
crash> dis __vma_adjust | grep -B 5 __vma_adjust+0x394
0xffffffff819f03b2 <__vma_adjust+0x382>:        xor    %r8d,%r8d
0xffffffff819f03b5 <__vma_adjust+0x385>:        lea    0x8(%rax),%r15
0xffffffff819f03b9 <__vma_adjust+0x389>:        jmp    0xffffffff819f03d1 <__vma_adjust+0x3a1>
0xffffffff819f03bb <__vma_adjust+0x38b>:        nopl   0x0(%rax,%rax,1)
0xffffffff819f03c0 <__vma_adjust+0x390>:        cmp    -0x20(%rax),%rsi
0xffffffff819f03c4 <__vma_adjust+0x394>:        ja     0xffffffff819f0618 <__vma_adjust+0x5e8>
crash>
```

Here %rax is likely the rb_node *__rb_parent, so rb_node - 0x20 is likely the vm_area_struct, since rb_entry() does a container_of(__rb_parent, struct vm_area_struct, vm_rb):

```
crash> vm_area_struct.vm_rb -xo
struct vm_area_struct {
  [0x20] struct rb_node vm_rb;
}
crash>
crash> vm_area_struct.vm_start -xo
struct vm_area_struct {
  [0x0] unsigned long vm_start;
}
```

So what are the values? Review the registers to perform the appropriate math and extract vma_tmp->vm_start and end:

```
crash> bt | grep -e RAX -e RSI
    RAX: ffff966e1cf1a458  RBX: ffff965f4ecb4df8  RCX: ffff966e3a623e80
    RDX: 00007ff55c000000  RSI: 00007ff560000000  RDI: ffff965f4ecb4d80
```

The addresses are derived as follows: -0x20(%rax) holds vma_tmp->vm_start, which is 0x00007ff550000000, and %rsi is end, which is 0x00007ff560000000:

```
crash> px (0xffff966e1cf1a458-0x20)
$1 = 0xffff966e1cf1a438
crash>
crash> rd 0xffff966e1cf1a438
ffff966e1cf1a438:  00007ff550000000                    ...P....
```
And here, compare the values to see whether vma_tmp->vm_start < end:

```
crash> px (0x7ff550000000<0x7ff560000000)
$3 = 0x1
crash>
```

Indeed vma_tmp->vm_start < end, confirming the overlap that triggered the BUG(). |
Steps To Reproduce | Did not find a way to reproduce this issue. |
Additional Information | Searching the Red Hat portal, I found a similar issue reported against Red Hat Enterprise Linux 7. It reports that the issue was resolved with kernel-3.10.0-957.el7 via RHSA-2018:3083: https://access.redhat.com/solutions/3392791 However, the issue happened on my server with kernel 3.10.0-957.21.3.el7.x86_64. The vmcore file can be downloaded via: wget http://129.226.115.161/vmcore-5025117.tar.gz |
Tags | No tags attached. | ||||
abrt_hash | |||||
URL | the dump file: wget http://129.226.115.161/vmcore-5025117.tar.gz | ||||
CentOS 7.2 has been deprecated since December 2016, and the specific kernel you attempted to use has been unsupported since summer 2017. Since we do not and cannot support anything but the latest versions (currently CentOS 7.8 / kernel 3.10.0-1127.13.1.el7), please update your system and, if you can still reproduce the issue, let us know. Otherwise I will close this bug. Note that if you insist on using older versions of the OS, you are encouraged to purchase an EUS subscription from Red Hat. But even they do not support something as old as you are attempting to use.
Date Modified | Username | Field | Change |
---|---|---|---|
2020-07-01 03:55 | chudihuang | New Issue | |
2020-07-01 04:26 | ManuelWolfshant | Note Added: 0037282 | |