View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0015859||CentOS-7||kernel||public||2019-02-22 13:37||2019-02-22 13:37|
|Target Version||Fixed in Version|
|Summary||0015859: System crashes at "kernel BUG at mm/usercopy.c:72!"|
|Description||A little bit of background: There is a script that collects monitoring data by connecting to multiple hosts via telnet every minute. The script had been working fine for a few years; then, after the latest update, the system suddenly started to crash at least once every 20 minutes.|
So far the investigation has produced the following results:
- The kernel was updated from 3.10.0-862.14.4.el7.x86_64 to 3.10.0-957.5.1.el7.x86_64 and that's when the problem started.
- `dmesg` contains the following:
[614918.494483] usercopy: kernel memory exposure attempt detected from ffff9f503d7aa005 (kmalloc-4096) (8187 bytes)
[614918.505563] ------------[ cut here ]------------
[614918.511080] kernel BUG at mm/usercopy.c:72!
- The part about "kernel BUG at mm/usercopy.c:72!" is consistent across all crashes.
- Looking further with the `crash` utility reveals the following:
DUMPFILE: /var/crash/127.0.0.1-2019-02-20-15:12:16/vmcore [PARTIAL DUMP]
DATE: Wed Feb 20 15:11:58 2019
LOAD AVERAGE: 1.41, 1.84, 1.86
VERSION: #1 SMP Fri Feb 1 14:54:57 UTC 2019
MACHINE: x86_64 (2594 Mhz)
MEMORY: 127.9 GB
PANIC: "kernel BUG at mm/usercopy.c:72!"
TASK: ffff8e59f4e44100 [THREAD_INFO: ffff8e594e630000]
STATE: TASK_RUNNING (PANIC)
#1 [ffff8e594e633a20] __crash_kexec at ffffffff87d1cf32
#2 [ffff8e594e633af0] crash_kexec at ffffffff87d1d020
#3 [ffff8e594e633b08] oops_end at ffffffff8836c758
#4 [ffff8e594e633b30] die at ffffffff87c2f95b
#5 [ffff8e594e633b60] do_trap at ffffffff8836bea0
#6 [ffff8e594e633bb0] do_invalid_op at ffffffff87c2c2a4
#7 [ffff8e594e633c60] invalid_op at ffffffff8837812e
[exception RIP: __check_object_size+135]
RIP: ffffffff87e3e4a7 RSP: ffff8e594e633d18 RFLAGS: 00010246
RAX: 0000000000000063 RBX: ffff8e5abd0b9005 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8e4abf9d3898 RDI: ffff8e4abf9d3898
RBP: ffff8e594e633d38 R8: 0000000000000000 R9: ffff8e4ab97c6f00
R10: 0000000000000777 R11: 0000000000000001 R12: 0000000000001ffb
R13: 0000000000000001 R14: ffff8e5abd0bb000 R15: ffff8e5abae30800
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
- The content of RIP is also the same across crashes.
- `CS: 0010` indicates that the crash happened while the system was in kernel mode, not in userspace: the low two bits of the code segment selector hold the privilege level, and here they are 0 (ring 0).
- To sum up, it looks like some memory operation during the execution of telnet does not work as expected and causes the system to crash. Maybe further analysis of the crash dumps can shed more light on the matter, but that has not been done.
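The numbers quoted in the list above can be cross-checked with a little arithmetic. The following is a minimal sketch, not a diagnosis: it assumes that objects in the kmalloc-4096 cache are aligned to 4096 bytes, and the function names are made up for illustration. Reading R12 as the copy length and RBX as the start of the copied range is also only a guess based on the values matching; it has not been confirmed by disassembly.

```python
# Numbers copied from the dmesg line and the crash register dump in this report.
SLAB_OBJECT_SIZE = 4096            # from the cache name "kmalloc-4096"
DMESG_PTR = 0xFFFF9F503D7AA005     # "detected from ffff9f503d7aa005"
DMESG_LEN = 8187                   # "(8187 bytes)"
RBX = 0xFFFF8E5ABD0B9005
R12 = 0x1FFB
R14 = 0xFFFF8E5ABD0BB000
CS = 0x0010


def offset_in_object(ptr, object_size=SLAB_OBJECT_SIZE):
    """Offset of ptr inside its slab object.

    Assumption: objects in this cache are aligned to object_size.
    """
    return ptr % object_size


def overrun(ptr, length, object_size=SLAB_OBJECT_SIZE):
    """Bytes by which a copy of `length` bytes from ptr runs past the object."""
    room = object_size - offset_in_object(ptr, object_size)
    return max(0, length - room)


def privilege_level(selector):
    """Low two bits of an x86 segment selector (0 = kernel, 3 = user)."""
    return selector & 0x3


print("offset in object:", offset_in_object(DMESG_PTR))
print("bytes past object end:", overrun(DMESG_PTR, DMESG_LEN))
print("R12 equals reported length:", R12 == DMESG_LEN)
print("RBX + R12 equals R14:", RBX + R12 == R14)
print("CS privilege level:", privilege_level(CS))
```

Under these assumptions the reported pointer sits 5 bytes into a 4096-byte object, so an 8187-byte copy would run 4096 bytes past its end. R12 holds the same value as the reported length, RBX + R12 equals R14, and the CS selector decodes to privilege level 0. The dmesg excerpt and the register dump carry different addresses, so they may come from different crashes, but both show the same offset of 5.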
I was able to find a somewhat similar issue: "centos 7.6 kernel panic caused by osd", https://www.spinics.net/lists/ceph-users/msg50304.html
|Steps To Reproduce||On my system the issue can be reproduced by enabling certain scripts that utilize telnet. The crash may happen in just one or two minutes, but on average it takes about 20 minutes, which means that telnet was invoked about 2000 times during that time. Each session lasts just a few seconds.|
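About 2000 invocations in 20 minutes works out to roughly 100 short sessions per minute. The collecting scripts themselves are not shown in this report, so the following is only a hypothetical load-generator sketch for anyone trying to reproduce the pattern: `run_round`, `hosts`, and `command` are illustrative names, and the command used in the example is a harmless placeholder rather than the real telnet invocation.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_round(hosts, command, timeout=10.0, workers=10):
    """Run `command + [host]` once for every host and return the exit codes.

    A session that exceeds `timeout` seconds is killed and reported as None.
    """
    def one_session(host):
        try:
            result = subprocess.run(
                command + [host],
                stdin=subprocess.DEVNULL,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            return result.returncode
        except subprocess.TimeoutExpired:
            return None

    # Sessions run concurrently, up to `workers` at a time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one_session, hosts))


if __name__ == "__main__":
    # Placeholder command: a child Python process that exits at once.
    # Replace it with the real collector invocation to exercise telnet.
    placeholder = [sys.executable, "-c", "pass"]
    hosts = ["host-%d" % i for i in range(5)]  # hypothetical host names
    print(run_round(hosts, placeholder))
```

Calling `run_round` once a minute with about 100 hosts would approximate the load described above. Whether that is enough to trigger the crash on another system is not known.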