View Issue Details

ID: 0017554
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2020-07-01 04:26
Reporter: chudihuang
Priority: high
Severity: block
Reproducibility: random
Status: new
Resolution: open
Platform: linux
OS: 3.10.0-957.21.3.el7.x86_64
OS Version: centos7.2
Product Version: 7.2.1511
Target Version:
Fixed in Version:

Summary: 0017554: System panics with "kernel BUG at mm/mmap.c:741!"

Description: CentOS 7.2 with kernel 3.10.0-957.21.3.el7.x86_64.

System panics with a panic string "kernel BUG at mm/mmap.c:741!"


Loading the vmcore in crash presents details similar to the following:

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-957.21.3.el7.x86_64/vmlinux
    DUMPFILE: vmcore [PARTIAL DUMP]
        CPUS: 16
        DATE: Sun Jun 28 15:13:13 2020
      UPTIME: 192 days, 21:53:43
LOAD AVERAGE: 0.21, 0.54, 0.63
       TASKS: 1140
     RELEASE: 3.10.0-957.21.3.el7.x86_64
     VERSION: #1 SMP Tue Jun 18 16:35:19 UTC 2019
     MACHINE: x86_64 (2494 Mhz)
      MEMORY: 64 GB
       PANIC: "kernel BUG at mm/mmap.c:741!"

crash> log | grep BUG
[16667623.237709] kernel BUG at mm/mmap.c:741!
crash>

Referencing mm/mmap.c:741 in the source code, the panic is due to the BUG() at line 741 in __insert_vm_struct(), which is called from __vma_adjust():

int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
        unsigned long end, pgoff_t pgoff, struct vm_area_struct *insert,
        struct vm_area_struct *expand)
{
      ...
      __insert_vm_struct(mm, insert);
      ...
}

 734 static void __insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
 735 {
 736         struct vm_area_struct *prev;
 737         struct rb_node **rb_link, *rb_parent;
 738
 739         if (find_vma_links(mm, vma->vm_start, vma->vm_end,
 740                            &prev, &rb_link, &rb_parent))
 741                 BUG();
 742         __vma_link(mm, vma, prev, rb_link, rb_parent);
 743         mm->map_count++;
 744 }


This BUG() fires when find_vma_links(), called at line 739 above, returns nonzero. That happens when an existing VMA overlaps the requested range, in which case find_vma_links() returns -ENOMEM:
 
 591 static int find_vma_links(struct mm_struct *mm, unsigned long addr,
 592                           unsigned long end, struct vm_area_struct **pprev,
 593                           struct rb_node ***rb_link, struct rb_node **rb_parent)
 594 {
 595         struct rb_node **__rb_link, *__rb_parent, *rb_prev;
 596
 597         __rb_link = &mm->mm_rb.rb_node;
 598         rb_prev = __rb_parent = NULL;
 599
 600         while (*__rb_link) {
 601                 struct vm_area_struct *vma_tmp;
 602
 603                 __rb_parent = *__rb_link;
 604                 vma_tmp = rb_entry(__rb_parent, struct vm_area_struct, vm_rb);
 605
 606                 if (vma_tmp->vm_end > addr) {
 607                         /* Fail if an existing vma overlaps the area */
 608                         if (vma_tmp->vm_start < end)
 609                                 return -ENOMEM;
 610                         __rb_link = &__rb_parent->rb_left;
 611                 } else {
 612                         rb_prev = __rb_parent;
 613                         __rb_link = &__rb_parent->rb_right;
 614                 }
 615         }
 616
 617         *pprev = NULL;
 618         if (rb_prev)
 619                 *pprev = rb_entry(rb_prev, struct vm_area_struct, vm_rb);
 620         *rb_link = __rb_link;
 621         *rb_parent = __rb_parent;
 622         return 0;
 623 }


Above, the kernel walks the rb-tree of VMAs in the mm_struct looking for a place to insert the new entry.
While walking, it finds an existing VMA that overlaps the requested range and returns -ENOMEM.
To investigate, review the backtrace of the faulting process and the state of its mm_struct:

crash> bt
PID: 4183 TASK: ffff965e65cf0000 CPU: 7 COMMAND: "java"
 #0 [ffff965e76bf3a60] machine_kexec at ffffffff81863934
 #1 [ffff965e76bf3ac0] __crash_kexec at ffffffff8191d162
 #2 [ffff965e76bf3b90] crash_kexec at ffffffff8191d250
 #3 [ffff965e76bf3ba8] oops_end at ffffffff81f6d778
 #4 [ffff965e76bf3bd0] die at ffffffff8182f95b
 #5 [ffff965e76bf3c00] do_trap at ffffffff81f6cec0
 #6 [ffff965e76bf3c50] do_invalid_op at ffffffff8182c2a4
 #7 [ffff965e76bf3d00] invalid_op at ffffffff81f7912e
    [exception RIP: __vma_adjust+0x5e8]
    RIP: ffffffff819f0618 RSP: ffff965e76bf3db0 RFLAGS: 00010206
    RAX: ffff966e1cf1a458 RBX: ffff965f4ecb4df8 RCX: ffff966e3a623e80
    RDX: 00007ff55c000000 RSI: 00007ff560000000 RDI: ffff965f4ecb4d80
    RBP: ffff965e76bf3e48 R8: ffff966e3a623e80 R9: 0000000000000000
    R10: 0000000000004022 R11: 0000000000000000 R12: ffff966e1cf1af30
    R13: ffff965f4ecb4d80 R14: 0000000000000000 R15: ffff966e3a623e88
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #8 [ffff965e76bf3e50] __split_vma at ffffffff819f1918
 #9 [ffff965e76bf3e90] do_munmap at ffffffff819f1aca
#10 [ffff965e76bf3ee0] vm_munmap at ffffffff819f1e55
#11 [ffff965e76bf3f30] sys_munmap at ffffffff819f30e2
#12 [ffff965e76bf3f50] system_call_fastpath at ffffffff81f75ddb
    RIP: 00007ff59da85e57 RSP: 00007ff584ebbfb0 RFLAGS: 00000202
    RAX: 000000000000000b RBX: 00007ff558000000 RCX: ffffffffffffffff
    RDX: 0000000000000000 RSI: 0000000004000000 RDI: 00007ff55c000000
    RBP: 0000000000021000 R8: ffffffffffffffff R9: 0000000000000000
    R10: 0000000000004022 R11: 0000000000000206 R12: 0000000000000000
    R13: 00007ff584ebbbd0 R14: 0000000000000000 R15: 0000000000000000
    ORIG_RAX: 000000000000000b CS: 0033 SS: 002b
crash>

And the corresponding disassembly for the panic location:

crash> dis -r __vma_adjust+0x5e8 | tail -5
0xffffffff819f0605 <__vma_adjust+0x5d5>: jne 0xffffffff819f0235 <__vma_adjust+0x205>
0xffffffff819f060b <__vma_adjust+0x5db>: mov %rax,0x10(%rcx)
0xffffffff819f060f <__vma_adjust+0x5df>: jmpq 0xffffffff819f0235 <__vma_adjust+0x205>
0xffffffff819f0614 <__vma_adjust+0x5e4>: nopl 0x0(%rax)
0xffffffff819f0618 <__vma_adjust+0x5e8>: ud2
crash>

Execution jumped to the ud2 instruction that caused the panic. Where did it jump from?

crash> dis __vma_adjust | grep 0x5e8
0xffffffff819f03c4 <__vma_adjust+0x394>: ja 0xffffffff819f0618 <__vma_adjust+0x5e8>
0xffffffff819f0618 <__vma_adjust+0x5e8>: ud2
crash>

And the assembly before the ja:

crash> dis __vma_adjust | grep -B 5 __vma_adjust+0x394
0xffffffff819f03b2 <__vma_adjust+0x382>: xor %r8d,%r8d
0xffffffff819f03b5 <__vma_adjust+0x385>: lea 0x8(%rax),%r15
0xffffffff819f03b9 <__vma_adjust+0x389>: jmp 0xffffffff819f03d1 <__vma_adjust+0x3a1>
0xffffffff819f03bb <__vma_adjust+0x38b>: nopl 0x0(%rax,%rax,1)
0xffffffff819f03c0 <__vma_adjust+0x390>: cmp -0x20(%rax),%rsi
0xffffffff819f03c4 <__vma_adjust+0x394>: ja 0xffffffff819f0618 <__vma_adjust+0x5e8>
crash>


Above, %rax is likely the rb_node *__rb_parent, so rax - 0x20 is likely the address of the enclosing vm_area_struct, since
rb_entry() does a container_of(__rb_parent, struct vm_area_struct, vm_rb):

crash> vm_area_struct.vm_rb -xo
struct vm_area_struct {
  [0x20] struct rb_node vm_rb;
}
crash>

crash> vm_area_struct.vm_start -xo
struct vm_area_struct {
   [0x0] unsigned long vm_start;
}

So what are the values? Review the registers to perform the appropriate math and extract vma_tmp->vm_start and end:

crash> bt | grep -e RAX -e RSI
    RAX: ffff966e1cf1a458 RBX: ffff965f4ecb4df8 RCX: ffff966e3a623e80
    RDX: 00007ff55c000000 RSI: 00007ff560000000 RDI: ffff965f4ecb4d80

Above, the values are derived as follows:

0x00007ff550000000 is the value at -0x20(%rax), i.e. vma_tmp->vm_start
0x00007ff560000000 is %rsi, which is end


crash> px (0xffff966e1cf1a458-0x20)
$1 = 0xffff966e1cf1a438
crash>

crash> rd 0xffff966e1cf1a438
ffff966e1cf1a438: 00007ff550000000 ...P....
crash>

And here, the values are compared to see if vma_tmp->vm_start < end:

crash> px (0x7ff550000000<0x7ff560000000)
$3 = 0x1
crash>


Indeed, vma_tmp->vm_start < end: the overlap check at line 608 fired, find_vma_links() returned -ENOMEM, and __insert_vm_struct() hit BUG().
Steps To Reproduce: Did not find a way to reproduce this issue.
Additional Information: Searched the Red Hat portal and found a similar issue reported on Red Hat Enterprise Linux 7; it was reported as resolved with kernel-3.10.0-957.el7 via RHSA-2018:3083:

https://access.redhat.com/solutions/3392791

However, the issue happened on my server with kernel 3.10.0-957.21.3.el7.x86_64.

The vmcore file can be downloaded with: wget http://129.226.115.161/vmcore-5025117.tar.gz


Tags: No tags attached.
abrt_hash:
URL: the dump file: wget http://129.226.115.161/vmcore-5025117.tar.gz

Activities

ManuelWolfshant

2020-07-01 04:26

manager   ~0037282

CentOS 7.2 has been deprecated since December 2016. The specific kernel you attempted to use has not been supported since summer 2017.
Since we do not and cannot support anything but the latest versions (currently CentOS 7.8 / kernel 3.10.0-1127.13.1.el7), please update your system and, if you can still reproduce the issue, let us know. Otherwise I will close this bug.

Note that if you insist on using older versions of the OS, you are encouraged to purchase an EUS subscription from Red Hat. But even they do not support something as old as you are attempting to use.

Issue History

Date Modified Username Field Change
2020-07-01 03:55 chudihuang New Issue
2020-07-01 04:26 ManuelWolfshant Note Added: 0037282