View Issue Details

ID: 0015859
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2019-02-22 13:37
Reporter: Ph0enix
Priority: normal
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Platform: x86_64
OS: CentOS
OS Version: 7
Product Version: 7.6.1810
Target Version:
Fixed in Version:
Summary: 0015859: System crashes at "kernel BUG at mm/usercopy.c:72!"
Description: A little bit of background: there is a script that collects monitoring data by connecting to multiple hosts via telnet every minute. The script had been working fine for a few years; then, after the latest update, the system started to crash at least once every 20 minutes.

So far the investigation has produced the following results:

- The kernel was updated from 3.10.0-862.14.4.el7.x86_64 to 3.10.0-957.5.1.el7.x86_64 and that's when the problem started.

- `dmesg` contains the following:

    [614918.494483] usercopy: kernel memory exposure attempt detected from ffff9f503d7aa005 (kmalloc-4096) (8187 bytes)
    [614918.505563] ------------[ cut here ]------------
    [614918.511080] kernel BUG at mm/usercopy.c:72!

- The part about "kernel BUG at mm/usercopy.c:72!" is consistent across all crashes.

- Looking further with the `crash` utility reveals the following:

    KERNEL: /usr/lib/debug/lib/modules/3.10.0-957.5.1.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2019-02-20-15:12:16/vmcore [PARTIAL DUMP]
    CPUS: 56
    DATE: Wed Feb 20 15:11:58 2019
    UPTIME: 00:18:52
    LOAD AVERAGE: 1.41, 1.84, 1.86
    TASKS: 2052
    NODENAME: XXX
    RELEASE: 3.10.0-957.5.1.el7.x86_64
    VERSION: #1 SMP Fri Feb 1 14:54:57 UTC 2019
    MACHINE: x86_64 (2594 Mhz)
    MEMORY: 127.9 GB
    PANIC: "kernel BUG at mm/usercopy.c:72!"
    PID: 27982
    COMMAND: "telnet"
    TASK: ffff8e59f4e44100 [THREAD_INFO: ffff8e594e630000]
    CPU: 29
    STATE: TASK_RUNNING (PANIC)

- Backtrace:

     #1 [ffff8e594e633a20] __crash_kexec at ffffffff87d1cf32
     #2 [ffff8e594e633af0] crash_kexec at ffffffff87d1d020
     #3 [ffff8e594e633b08] oops_end at ffffffff8836c758
     #4 [ffff8e594e633b30] die at ffffffff87c2f95b
     #5 [ffff8e594e633b60] do_trap at ffffffff8836bea0
     #6 [ffff8e594e633bb0] do_invalid_op at ffffffff87c2c2a4
     #7 [ffff8e594e633c60] invalid_op at ffffffff8837812e
        [exception RIP: __check_object_size+135]
        RIP: ffffffff87e3e4a7 RSP: ffff8e594e633d18 RFLAGS: 00010246
        RAX: 0000000000000063 RBX: ffff8e5abd0b9005 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: ffff8e4abf9d3898 RDI: ffff8e4abf9d3898
        RBP: ffff8e594e633d38 R8: 0000000000000000 R9: ffff8e4ab97c6f00
        R10: 0000000000000777 R11: 0000000000000001 R12: 0000000000001ffb
        R13: 0000000000000001 R14: ffff8e5abd0bb000 R15: ffff8e5abae30800
        ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    ...

- The content of RIP is also the same across crashes.

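- The value CS: 0010 suggests that the crash happens while the CPU is in kernel mode, not in userspace: the two lowest bits of the code segment selector hold the privilege level, and here they are 0 (ring 0), whereas userspace code runs with privilege level 3.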

- To sum up, it looks like a kernel-to-userspace memory copy performed during the execution of telnet is rejected by the usercopy check, which crashes the system. Further analysis of the crash dumps may shed more light on the matter, but that has not been done yet.
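
The numbers in the `dmesg` line and in the register dump can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes that kmalloc-4096 objects are aligned to their size, and that RBX, R12 and R14 hold the copy's start address, length and end address. That register interpretation is a guess based on the values lining up, not something confirmed from the kernel source. The helper names are made up for this example.

```python
# Values copied from the dmesg line and the crash register dump above.
KMALLOC_OBJECT_SIZE = 4096        # from "(kmalloc-4096)"
DMESG_PTR = 0xffff9f503d7aa005    # address in the usercopy message
DMESG_LEN = 8187                  # "(8187 bytes)"

RBX = 0xffff8e5abd0b9005          # ASSUMED: start address of the copy
R12 = 0x0000000000001ffb          # ASSUMED: length of the copy
R14 = 0xffff8e5abd0bb000          # ASSUMED: end address of the copy
CS = 0x0010                       # code segment selector at the time of the trap


def offset_in_object(ptr, object_size=KMALLOC_OBJECT_SIZE):
    """Offset of ptr inside its slab object, ASSUMING size-aligned objects."""
    return ptr % object_size


def overrun(ptr, length, object_size=KMALLOC_OBJECT_SIZE):
    """Bytes by which [ptr, ptr + length) extends past the end of the object."""
    return max(0, offset_in_object(ptr, object_size) + length - object_size)


def privilege_level(selector):
    """Privilege level stored in the two lowest bits of an x86 segment selector."""
    return selector & 0b11


if __name__ == "__main__":
    print("offset in object:", offset_in_object(DMESG_PTR))
    print("bytes past object end:", overrun(DMESG_PTR, DMESG_LEN))
    print("R12 equals dmesg length:", R12 == DMESG_LEN)
    print("RBX + R12 == R14:", RBX + R12 == R14)
    print("privilege level from CS:", privilege_level(CS))
```

Under those assumptions, the copy starts 5 bytes into a 4096-byte object and asks for 8187 bytes, so it would read 4096 bytes past the end of that object. The register dump shows the same offset and length (R12 is 0x1ffb, which is 8187), and RBX + R12 equals R14 exactly.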

I was able to find a somewhat similar issue: "centos 7.6 kernel panic caused by osd", https://www.spinics.net/lists/ceph-users/msg50304.html
Steps To Reproduce: On my system the issue can be reproduced by enabling certain scripts that use telnet. The crash may happen after just one or two minutes, but on average it takes about 20 minutes, which means that telnet was invoked about 2000 times during that period. Each session lasts just a few seconds.
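
For anyone trying to reproduce this elsewhere, the load pattern described above (roughly 2000 sessions in 20 minutes, i.e. about 100 short telnet sessions per one-minute round) could be approximated with something like the sketch below. It is not the original monitoring script: the host names are placeholders, the sessions exchange no data, and whether this is enough to trigger the crash is untested.

```python
import subprocess
import time

# Hypothetical host list; the report does not name the monitored hosts.
HOSTS = ["host1.example", "host2.example"]


def telnet_command(host, port=23):
    """Build the argv for one telnet session (plain `telnet host port` form)."""
    return ["telnet", host, str(port)]


def run_round(hosts, runner=subprocess.run, timeout=5):
    """Open one short telnet session per host and return how many were started.

    `runner` is injectable so the loop can be exercised without a telnet binary.
    """
    started = 0
    for host in hosts:
        try:
            runner(
                telnet_command(host),
                stdin=subprocess.DEVNULL,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            pass  # the session was killed after `timeout` seconds; it still counts
        started += 1
    return started


def stress(hosts, rounds, interval=60.0, runner=subprocess.run, sleep=time.sleep):
    """Run `rounds` rounds, pausing `interval` seconds between them."""
    total = 0
    for i in range(rounds):
        total += run_round(hosts, runner=runner)
        if i + 1 < rounds:
            sleep(interval)
    return total
```

Calling `stress(HOSTS, rounds=20)` with a host list of about 100 entries would roughly match the session count quoted above. The sessions here run one after another, so a round with many hosts may take longer than the interval.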
Tags: centos7, memory
abrt_hash:
URL:

Activities

There are no notes attached to this issue.

Issue History

Date Modified Username Field Change
2019-02-22 13:37 Ph0enix New Issue
2019-02-22 13:37 Ph0enix Tag Attached: centos7
2019-02-22 13:37 Ph0enix Tag Attached: memory