View Issue Details

ID: 0015578
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2019-04-04 14:41
Reporter: dennisxrow
Priority: normal
Severity: crash
Reproducibility: random
Status: new
Resolution: open
OS: CentOS Linux release 7.6.1810
Summary: 0015578: SLUB: Unable to allocate memory on node -1 - exception RIP: __kmalloc_track_caller+148
Description:
Hello,
Every so often, usually every 3 to 6 days, one of our servers panics and reboots with the panic documented below.

Any help or guidance is highly appreciated.
Steps To Reproduce:
The problem occurs randomly every 3 to 6 days.
Additional Information:
# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)

# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

# uname -r
3.10.0-957.1.3.el7.x86_64

# cat vmcore-dmesg.txt
[68133.800542] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[68133.800546] cache: kmalloc-64(86:1256ccd6d2291e6830b86fe5dfdf155ab870005e4bdd9c950a09eea1681259a6), object size: 64, buffer size: 64, default order: 0, min order: 0
[68133.800548] node 0: slabs: 4641, objs: 297024, free: 0
[68133.949979] general protection fault: 0000 [#1] SMP
[68133.950016] Modules linked in: veth vxlan ip6_udp_tunnel udp_tunnel xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_addrtype ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment xt_mark iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter xt_conntrack nf_nat nf_conntrack_netlink nf_conntrack overlay(T) ip_set_hash_ip ip_set nfnetlink rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver tcp_diag nfs inet_diag lockd grace fscache sunrpc ppdev sb_edac iosf_mbi kvm_intel kvm vmw_balloon irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr sg parport_pc parport vmw_vmci i2c_piix4 br_netfilter bridge stp llc ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif
[68133.950324] crct10dif_generic ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel serio_raw floppy vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm nfit drm vmxnet3 libnvdimm ata_piix libata vmw_pvscsi drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod
[68133.950585] CPU: 6 PID: 28953 Comm: httpd Kdump: loaded Tainted: G ------------ T 3.10.0-957.1.3.el7.x86_64 #1
[68133.950669] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[68133.950772] task: ffff9861f7bfe180 ti: ffff9861abb7c000 task.ti: ffff9861abb7c000
[68133.950854] RIP: 0010:[<ffffffffa241f204>] [<ffffffffa241f204>] __kmalloc_track_caller+0x94/0x240
[68133.950906] RSP: 0018:ffff9861abb7f9a8 EFLAGS: 00010286
[68133.950931] RAX: 0000000000000000 RBX: ffffffffa2c8e379 RCX: 0000000000022c33
[68133.950953] RDX: 0000000000022c32 RSI: 0000000000000050 RDI: ffff985e7fc03e00
[68133.950974] RBP: ffff9861abb7f9e0 R08: 0000426d4000bf40 R09: ffff985e7fc03e00
[68133.951003] R10: ffff9862ff7a6960 R11: ffffe6718c279180 R12: 0000000000000050
[68133.951024] R13: ffff00745f637073 R14: 0000000000000008 R15: ffff985da28fc300
[68133.951045] FS: 00007fafb8ffe880(0000) GS:ffff9862ffd80000(0000) knlGS:0000000000000000
[68133.951069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[68133.951087] CR2: 00007fafb9007000 CR3: 00000004d7be8000 CR4: 00000000001607e0
[68133.951148] Call Trace:
[68133.951173] [<ffffffffa2504901>] ? selinux_inode_init_security+0x151/0x290
[68133.951206] [<ffffffffa23d5cd1>] kstrdup+0x31/0x60
[68133.951254] [<ffffffffa2504901>] selinux_inode_init_security+0x151/0x290
[68133.952003] [<ffffffffc0624990>] ? xfs_init_security+0x20/0x20 [xfs]
[68133.952651] [<ffffffffa24f84e1>] security_inode_init_security+0x71/0x130
[68133.953307] [<ffffffffc0624988>] xfs_init_security+0x18/0x20 [xfs]
[68133.953960] [<ffffffffc0625d12>] xfs_generic_create+0xf2/0x2b0 [xfs]
[68133.954639] [<ffffffffc0625f04>] xfs_vn_mknod+0x14/0x20 [xfs]
[68133.955406] [<ffffffffc0625f43>] xfs_vn_create+0x13/0x20 [xfs]
[68133.956144] [<ffffffffa244e5b3>] vfs_create+0xd3/0x140
[68133.956852] [<ffffffffc0a37863>] ovl_create_real+0xb3/0x240 [overlay]
[68133.957483] [<ffffffffc0a38ff8>] ovl_create_or_link+0x1d8/0x350 [overlay]
[68133.958054] [<ffffffffc0a3922a>] ovl_create_object+0xba/0xf0 [overlay]
[68133.958611] [<ffffffffc0a39313>] ovl_create+0x23/0x30 [overlay]
[68133.959151] [<ffffffffa244e5b3>] vfs_create+0xd3/0x140
[68133.959674] [<ffffffffa245068d>] do_last+0x10cd/0x12a0
[68133.960187] [<ffffffffa250233c>] ? selinux_file_alloc_security+0x3c/0x60
[68133.960690] [<ffffffffa2452667>] path_openat+0xd7/0x640
[68133.961179] [<ffffffffc0a36374>] ? ovl_getattr+0x74/0x200 [overlay]
[68133.961659] [<ffffffffa243e2aa>] ? __check_object_size+0x1ca/0x250
[68133.962129] [<ffffffffa245406d>] do_filp_open+0x4d/0xb0
[68133.962596] [<ffffffffa24616f7>] ? __alloc_fd+0x47/0x170
[68133.963040] [<ffffffffa2440197>] do_sys_open+0x137/0x240
[68133.963478] [<ffffffffa24402be>] SyS_open+0x1e/0x20
[68133.963896] [<ffffffffa2974ddb>] system_call_fastpath+0x22/0x27
[68133.964303] Code: 1f bf 5d 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 29 01 00 00 48 85 c0 0f 84 20 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49> 8b 5c 05 00 4c 89 e8 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49
[68133.965605] RIP [<ffffffffa241f204>] __kmalloc_track_caller+0x94/0x240
[68133.966035] RSP <ffff9861abb7f9a8>
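The kmalloc-64 statistics on the "node 0" line can be cross-checked with a little arithmetic. This is a sketch assuming 4 KiB pages (the x86_64 default) and taking the order-0 slab size and 64-byte buffer size from the log above; `slab_capacity` is a hypothetical helper. It shows that 4641 slabs of 64 objects each account for exactly the 297024 objects reported, all of them in use ("free: 0").

```shell
#!/bin/sh
# Cross-check the kmalloc-64 numbers from vmcore-dmesg.txt.
# Assumption: 4 KiB pages, as on default x86_64 kernels.
PAGE_SIZE=4096

# slab_capacity ORDER BUFFER_SIZE -> number of objects that fit in one slab
# (a slab of order N spans 2^N pages).
slab_capacity() {
  echo $(( (PAGE_SIZE << $1) / $2 ))
}

per_slab=$(slab_capacity 0 64)   # "default order: 0", "buffer size: 64"
slabs=4641                       # "slabs: 4641"
total=$(( slabs * per_slab ))
echo "objects per slab: ${per_slab}"
echo "total objects:    ${total}"
```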

# crash /usr/lib/debug/lib/modules/3.10.0-957.1.3.el7.x86_64/vmlinux vmcore

crash 7.2.3-8.el7

GNU gdb (GDB) 7.6
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [530MB]: patching 85619 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-957.1.3.el7.x86_64/vmlinux
    DUMPFILE: vmcore [PARTIAL DUMP]
        CPUS: 8
        DATE: Thu Dec 13 13:15:13 2018
      UPTIME: 18:55:33
LOAD AVERAGE: 0.90, 0.89, 0.89
       TASKS: 1911
    NODENAME: <stripped out>
     RELEASE: 3.10.0-957.1.3.el7.x86_64
     VERSION: #1 SMP Thu Nov 29 14:49:43 UTC 2018
     MACHINE: x86_64 (2593 Mhz)
      MEMORY: 23 GB
       PANIC: "general protection fault: 0000 [#1] SMP "
         PID: 28953
     COMMAND: "httpd"
        TASK: ffff9861f7bfe180 [THREAD_INFO: ffff9861abb7c000]
         CPU: 6
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 28953 TASK: ffff9861f7bfe180 CPU: 6 COMMAND: "httpd"
 #0 [ffff9861abb7f718] machine_kexec at ffffffffa2263674
 #1 [ffff9861abb7f778] __crash_kexec at ffffffffa231cef2
 #2 [ffff9861abb7f848] crash_kexec at ffffffffa231cfe0
 #3 [ffff9861abb7f860] oops_end at ffffffffa296c758
 #4 [ffff9861abb7f888] die at ffffffffa222f95b
 #5 [ffff9861abb7f8b8] do_general_protection at ffffffffa296c052
 #6 [ffff9861abb7f8f0] general_protection at ffffffffa296b6f8
    [exception RIP: __kmalloc_track_caller+148]
    RIP: ffffffffa241f204 RSP: ffff9861abb7f9a8 RFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffffffffa2c8e379 RCX: 0000000000022c33
    RDX: 0000000000022c32 RSI: 0000000000000050 RDI: ffff985e7fc03e00
    RBP: ffff9861abb7f9e0 R8: 0000426d4000bf40 R9: ffff985e7fc03e00
    R10: ffff9862ff7a6960 R11: ffffe6718c279180 R12: 0000000000000050
    R13: ffff00745f637073 R14: 0000000000000008 R15: ffff985da28fc300
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #7 [ffff9861abb7f9e8] kstrdup at ffffffffa23d5cd1
 #8 [ffff9861abb7fa10] selinux_inode_init_security at ffffffffa2504901
 #9 [ffff9861abb7fa68] security_inode_init_security at ffffffffa24f84e1
#10 [ffff9861abb7faf0] xfs_init_security at ffffffffc0624988 [xfs]
#11 [ffff9861abb7fb00] xfs_generic_create at ffffffffc0625d12 [xfs]
#12 [ffff9861abb7fb60] xfs_vn_mknod at ffffffffc0625f04 [xfs]
#13 [ffff9861abb7fb70] xfs_vn_create at ffffffffc0625f43 [xfs]
#14 [ffff9861abb7fb80] vfs_create at ffffffffa244e5b3
#15 [ffff9861abb7fbb8] ovl_create_real at ffffffffc0a37863 [overlay]
#16 [ffff9861abb7fbf0] ovl_create_or_link at ffffffffc0a38ff8 [overlay]
#17 [ffff9861abb7fc40] ovl_create_object at ffffffffc0a3922a [overlay]
#18 [ffff9861abb7fc88] ovl_create at ffffffffc0a39313 [overlay]
#19 [ffff9861abb7fc98] vfs_create at ffffffffa244e5b3
#20 [ffff9861abb7fcd0] do_last at ffffffffa245068d
#21 [ffff9861abb7fd70] path_openat at ffffffffa2452667
#22 [ffff9861abb7fe08] do_filp_open at ffffffffa245406d
#23 [ffff9861abb7fee0] do_sys_open at ffffffffa2440197
#24 [ffff9861abb7ff40] sys_open at ffffffffa24402be
#25 [ffff9861abb7ff50] system_call_fastpath at ffffffffa2974ddb
    RIP: 00007fafb7ae1f80 RSP: 00007ffecc77d548 RFLAGS: 00000246
    RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffffffffffffffff
    RDX: 00000000000001b6 RSI: 0000000000000241 RDI: 00007ffecc77c500
    RBP: 00007faf778b0db8 R8: 00007ffecc77c560 R9: 7777772f7261762f
    R10: 682f73657469732f R11: 0000000000000246 R12: 00007ffecc77c500
    R13: 0000000000000000 R14: 00007faf753b9e58 R15: 00007faf755b9bc0
    ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b
crash> kmem -i
                 PAGES TOTAL PERCENTAGE
    TOTAL MEM 5872611 22.4 GB ----
         FREE 1399714 5.3 GB 23% of TOTAL MEM
         USED 4472897 17.1 GB 76% of TOTAL MEM
       SHARED 1653716 6.3 GB 28% of TOTAL MEM
      BUFFERS 526 2.1 MB 0% of TOTAL MEM
       CACHED 3016735 11.5 GB 51% of TOTAL MEM
         SLAB 583648 2.2 GB 9% of TOTAL MEM

   TOTAL HUGE 0 0 ----
    HUGE FREE 0 0 0% of TOTAL HUGE

   TOTAL SWAP 0 0 ----
    SWAP USED 0 0 0% of TOTAL SWAP
    SWAP FREE 0 0 0% of TOTAL SWAP

 COMMIT LIMIT 2936305 11.2 GB ----
    COMMITTED 2716588 10.4 GB 92% of TOTAL LIMIT
crash>

The server mainly runs Docker and Kubernetes components; the httpd process above runs inside a Docker container.
Tags: crash, docker, httpd, kernel

Activities

SiriusP2324 (reporter), 2018-12-14 23:24, note ~0033345

Quick research shows that the solution may be to increase /proc/sys/vm/min_free_kbytes, which sets the minimum amount of memory the kernel keeps free; atomic allocations are allowed to dip into part of this reserve (be cautious when changing it). Atomic allocations are memory requests that must be satisfied without sleeping, for example from interrupt context. The slab allocator manages caches of objects of various sizes to provide fast allocations (https://lwn.net/Articles/229984/).

In this case, the slab allocator could not obtain memory for an atomic allocation, so it logged the error "SLUB: Unable to allocate memory on node -1".
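The "gfp=0x20" value in the first dmesg line can be decoded to check the atomic-allocation reading. This sketch assumes the allocation-flag bits of 3.10-era kernels (__GFP_WAIT = 0x10, __GFP_HIGH = 0x20, __GFP_IO = 0x40, __GFP_FS = 0x80, with GFP_ATOMIC defined as plain __GFP_HIGH); later kernels renumbered these bits, so the table only applies to this kernel generation. `decode_gfp` is a hypothetical helper.

```shell
#!/bin/sh
# Decode a gfp mask as printed by SLUB ("gfp=0x20").
# Assumption: bit values from 3.10-era <linux/gfp.h>; newer kernels differ.
decode_gfp() {
  mask=$(( $1 ))
  flags=""
  # __GFP_WAIT absent means the allocator may not sleep (atomic context).
  [ $(( mask & 0x10 )) -ne 0 ] && flags="$flags __GFP_WAIT"
  [ $(( mask & 0x20 )) -ne 0 ] && flags="$flags __GFP_HIGH"
  [ $(( mask & 0x40 )) -ne 0 ] && flags="$flags __GFP_IO"
  [ $(( mask & 0x80 )) -ne 0 ] && flags="$flags __GFP_FS"
  echo "${flags# }"   # drop the leading separator space
}

decode_gfp 0x20   # prints: __GFP_HIGH
```

Only __GFP_HIGH is set and __GFP_WAIT is not, which on this kernel is exactly GFP_ATOMIC.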

Increasing the reserve, which is set by /proc/sys/vm/min_free_kbytes, should therefore resolve the issue. Given that this happened on a Kubernetes/Docker platform, it's not too surprising. Again, be very cautious when changing this value; setting it too high or too low can have drastic implications for your system.
Below is a reference guide from Red Hat:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-tunables
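A minimal sketch of inspecting the reserve and preparing the standard `sysctl` commands to raise it. The doubling factor and the drop-in file name /etc/sysctl.d/90-min-free-kbytes.conf are hypothetical choices for illustration, not values from this report or from Red Hat; as the note warns, size the reserve for your own workload. The script only prints the commands; run them as root to apply.

```shell
#!/bin/sh
# Inspect vm.min_free_kbytes and print commands to raise it.
# Assumption: doubling is an illustrative starting point, not a recommendation.

# propose_min_free_kbytes CURRENT_KB -> candidate value in kB (hypothetical helper)
propose_min_free_kbytes() {
  echo $(( $1 * 2 ))
}

f=/proc/sys/vm/min_free_kbytes
if [ -r "$f" ]; then
  current=$(cat "$f")
  proposed=$(propose_min_free_kbytes "$current")
  echo "current vm.min_free_kbytes: ${current}"
  # Command to apply the value immediately (as root); it is lost at reboot.
  echo "sysctl -w vm.min_free_kbytes=${proposed}"
  # Command to persist it via a sysctl.d drop-in (hypothetical file name).
  echo "echo 'vm.min_free_kbytes = ${proposed}' > /etc/sysctl.d/90-min-free-kbytes.conf"
fi
```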

dennisxrow (reporter), 2018-12-19 09:02, note ~0033387

Hi,
Thank you very much for your answer.
We will try increasing /proc/sys/vm/min_free_kbytes and let you know the results.

Thanks!

stefanogristina@hotmail.com (reporter), 2019-04-04 14:41, note ~0034161

Hi All,

The issue is related to the kernel memory (kmem) accounting feature in Red Hat's forked Linux kernel, which CentOS 7 also ships. The feature is incomplete and prone to bugs that cause kernel deadlocks or slow kernel memory leaks. These kmem bugs are triggered when any container runtime (e.g., Docker) activates kmem accounting; this is well explained in https://support.mesosphere.com/s/article/Critical-Issue-KMEM-MSPH-2018-0006.

Docker enables kmem accounting in versions up to and including 18.09.0. In 18.09.1 this accounting is disabled, which avoids triggering the crash on Red Hat kernels. See the Docker release notes: https://docs.docker.com/engine/release-notes/.

On Red Hat 7.x, use Docker version 18.09.1 or later.

Stefano
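To check whether a host already runs Docker 18.09.1 or later, the comparison can be scripted. This is a sketch assuming GNU `sort -V` (version sort) and that `docker version --format '{{.Server.Version}}'` reports the daemon version; `version_ge` is a hypothetical helper. Version strings with suffixes such as "-ce" or "-rc" may not compare the way you expect.

```shell
#!/bin/sh
# Check whether the installed Docker engine is >= 18.09.1.
# Assumption: GNU coreutils `sort -V` is available (it is on CentOS 7).

# version_ge A B -> succeeds if version A >= version B (hypothetical helper)
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=18.09.1
if command -v docker >/dev/null 2>&1; then
  installed=$(docker version --format '{{.Server.Version}}' 2>/dev/null)
  if [ -z "$installed" ]; then
    echo "could not query the Docker daemon version"
  elif version_ge "$installed" "$required"; then
    echo "Docker $installed >= $required"
  else
    echo "Docker $installed < $required: consider upgrading"
  fi
fi
```

Sorting the two strings and checking which comes first avoids hand-parsing the dotted components.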

Issue History

Date Modified     Username                     Field         Change
2018-12-13 15:40  dennisxrow                   New Issue
2018-12-13 15:40  dennisxrow                   Tag Attached  crash
2018-12-13 15:40  dennisxrow                   Tag Attached  docker
2018-12-13 15:40  dennisxrow                   Tag Attached  httpd
2018-12-13 15:40  dennisxrow                   Tag Attached  kernel
2018-12-14 23:24  SiriusP2324                  Note Added    0033345
2018-12-19 09:02  dennisxrow                   Note Added    0033387
2019-04-04 14:41  stefanogristina@hotmail.com  Note Added    0034161