View Issue Details

ID: 0017031        Project: CentOS-7        Category: kernel        View Status: public        Last Update: 2020-02-10 14:45
Reporter: nobody_caq8
Priority: normal        Severity: crash        Reproducibility: unable to reproduce
Status: new        Resolution: open
Platform: x86-64        OS: 3.10.0-957.el7.x86_64        OS Version: 3.10.0-957.el7.x
Product Version: 7.6.1810
Target Version:        Fixed in Version:
Summary: 0017031: cgroup_rmdir results in a crash because of a wrong ref count

Description:
[17528853.361205] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[17528853.362884] IP: [<ffffffff9e65deaa>] drop_nlink+0xa/0x50
[17528853.364358] PGD 0
[17528853.365789] Oops: 0000 [#1] SMP
[17528853.367196] Modules linked in: fuse tcp_diag udp_diag inet_diag 8021q garp mrp nvidia_uvm(POE) veth ip6table_nat ip6_tables openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 xt_set ip_set_bitmap_port ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set dummy xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat overlay(T) nf_conntrack_ipv4 nf_defrag_ipv4 ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack br_netfilter bridge stp llc bonding sunrpc dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support xfs libcrc32c skx_edac coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd
[17528853.376009] pcspkr sg joydev mei_me i2c_i801 mei lpc_ich ipmi_si acpi_power_meter ip_tables ext4 mbcache jbd2 nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) sd_mod crc_t10dif crct10dif_generic ast i2c_algo_bit drm_kms_helper crct10dif_pclmul crct10dif_common crc32c_intel ttm syscopyarea sysfillrect sysimgblt ixgbe fb_sys_fops drm ahci libahci megaraid_sas mdio ptp pps_core libata dca drm_panel_orientation_quirks wmi ipmi_devintf ipmi_msghandler nfit libnvdimm
[17528853.382015] CPU: 29 PID: 396548 Comm: nvidia-containe Kdump: loaded Tainted: P OE ------------ T 3.10.0-957.el7.x86_64 #1
[17528853.384996] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 01/05/2019
[17528853.386496] task: ffff8fc225aee180 ti: ffff8fbf06cec000 task.ti: ffff8fbf06cec000
[17528853.387985] RIP: 0010:[<ffffffff9e65deaa>] [<ffffffff9e65deaa>] drop_nlink+0xa/0x50
[17528853.389474] RSP: 0018:ffff8fbf06cefda0 EFLAGS: 00010202
[17528853.390936] RAX: 0000000000000001 RBX: ffff8fd5b42cf500 RCX: 0000002600000000
[17528853.392398] RDX: 00000000ffffffd9 RSI: ffff8fd5b42cf500 RDI: 0000000000000000
[17528853.393835] RBP: ffff8fbf06cefda8 R08: 000000000001f0c0 R09: ffffffff9e658bae
[17528853.395265] R10: ffff8fd17fa5f0c0 R11: fffff46641c1bb80 R12: ffff8ff11d065a30
[17528853.396681] R13: ffff8fd5b42cf500 R14: 0000000000000000 R15: ffff8fc069b58600
[17528853.398080] FS: 00007f7a83f9d740(0000) GS:ffff8fd17fa40000(0000) knlGS:0000000000000000
[17528853.399472] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17528853.400851] CR2: 0000000000000048 CR3: 0000002d61ff2000 CR4: 00000000007607e0
[17528853.402227] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17528853.403587] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[17528853.404914] PKRU: 55555554
[17528853.406224] Call Trace:
[17528853.407512] [<ffffffff9e66b1ec>] simple_rmdir+0x2c/0x50
[17528853.408792] [<ffffffff9e525888>] cgroup_destroy_locked+0x238/0x370
[17528853.410057] [<ffffffff9e5259e2>] cgroup_rmdir+0x22/0x40
[17528853.411303] [<ffffffff9e64dc2c>] vfs_rmdir+0xdc/0x150
[17528853.412528] [<ffffffff9e6530d1>] do_rmdir+0x1f1/0x220
[17528853.413732] [<ffffffff9e647034>] ? SYSC_newfstatat+0x24/0x60
[17528853.414916] [<ffffffff9e654335>] SyS_unlinkat+0x25/0x40
[17528853.416087] [<ffffffff9eb74ddb>] system_call_fastpath+0x22/0x27
[17528853.417239] Code: 00 c7 47 48 00 00 00 00 48 8b 47 28 f0 48 ff 80 e0 03 00 00 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 <8b> 47 48 48 89 fb 85 c0 74 19 83 e8 01 85 c0 89 43 48 75 0c 48
[17528853.419656] RIP [<ffffffff9e65deaa>] drop_nlink+0xa/0x50
[17528853.420811] RSP <ffff8fbf06cefda0>
[17528853.421940] CR2: 0000000000000048

Here is my basic analysis:

crash> dis -l drop_nlink+10
/usr/src/debug/kernel-3.10.0-957.el7/linux-3.10.0-957.el7.x86_64/fs/inode.c: 284
0xffffffff9e65deaa <drop_nlink+10>: mov 0x48(%rdi),%eax

This shows that the inode was NULL when drop_nlink() was called.
The inode is dentry->d_inode, and I can get the dentry from the registers:
crash> dentry.d_name,d_inode,d_lockref ffff8fd5b42cf500
  d_name = {
    {
      {
        hash = 135773531,
        len = 64
      },
      hash_len = 275013680475
    },
    name = 0xffff8fa3fb67b060 "0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c"
  }
  d_inode = 0x0
  d_lockref = {
    {
      lock_count = 4294967296,
      {
        lock = {
          {
            rlock = {
              raw_lock = {
                val = {
                  counter = 0
                }
              }
            }
          }
        },
        count = 1          <-- the wrong refcount
      }
    }
  }

The call chain is:
cgroup_destroy_locked --> cgroup_d_remove_dir --> remove_dir --> simple_rmdir --> drop_nlink (oops)
Before remove_dir executes simple_rmdir, it executes d_delete first,
and d_delete sets dentry->d_inode to NULL when dentry->d_lockref.count == 1:
void d_delete(struct dentry * dentry)
{
    struct inode *inode;
    int isdir = 0;
    /*
     * Are we the only user?
     */
again:
    spin_lock(&dentry->d_lock);
    inode = dentry->d_inode;
    isdir = S_ISDIR(inode->i_mode);
    if (dentry->d_lockref.count == 1) { /* if we take this branch */
        if (!spin_trylock(&inode->i_lock)) {
            spin_unlock(&dentry->d_lock);
            cpu_relax();
            goto again;
        }
        dentry->d_flags &= ~DCACHE_CANT_MOUNT;
        dentry_unlink_inode(dentry); /* dentry->d_inode becomes NULL */
        fsnotify_nameremove(dentry, isdir);
        return;
    }

    if (!d_unhashed(dentry))
        __d_drop(dentry);

    spin_unlock(&dentry->d_lock);

    fsnotify_nameremove(dentry, isdir);
}

We know that in the normal situation dentry->d_lockref.count should be at least two at this point: a dentry obtained via lookup_hash() or a fresh allocation already holds a count of 1, and cgroup_destroy_locked() takes an extra reference with dget():
static int cgroup_destroy_locked(struct cgroup *cgrp)
{
.....
    dget(d); /* takes a reference: dentry->d_lockref.count becomes at least 2 */
    cgroup_d_remove_dir(d);
    dput(d);
.....
}

So I suspect dentry->d_lockref.count was corrupted, possibly because of the OOM condition or some other reason. Any help would be appreciated, thanks all.
Steps To Reproduce: occurred one time
Additional Information:
Before the crash, some OOM info was printed:
[17528853.189372] python invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=-998
[17528853.189377] python cpuset=0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c mems_allowed=0-1
[17528853.189381] CPU: 27 PID: 396391 Comm: python Kdump: loaded Tainted: P OE ------------ T 3.10.0-957.el7.x86_64 #1
[17528853.189383] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 01/05/2019
[17528853.189385] Call Trace:
[17528853.189396] [<ffffffff9eb61dc1>] dump_stack+0x19/0x1b
[17528853.189399] [<ffffffff9eb5c7ea>] dump_header+0x90/0x229
[17528853.189405] [<ffffffff9e5b9dc6>] ? find_lock_task_mm+0x56/0xc0
[17528853.189412] [<ffffffff9e631118>] ? try_get_mem_cgroup_from_mm+0x28/0x60
[17528853.189414] [<ffffffff9e5ba274>] oom_kill_process+0x254/0x3d0
[17528853.189417] [<ffffffff9e634f16>] mem_cgroup_oom_synchronize+0x546/0x570
[17528853.189419] [<ffffffff9e634390>] ? mem_cgroup_charge_common+0xc0/0xc0
[17528853.189422] [<ffffffff9e5bab04>] pagefault_out_of_memory+0x14/0x90
[17528853.189424] [<ffffffff9eb5acf2>] mm_fault_error+0x6a/0x157
[17528853.189429] [<ffffffff9eb6f7a8>] __do_page_fault+0x3c8/0x500
[17528853.189431] [<ffffffff9eb6f915>] do_page_fault+0x35/0x90
[17528853.189434] [<ffffffff9eb6b758>] page_fault+0x28/0x30
[17528853.189438] Task in /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df/b4168f4f943fd2df856ed1cd132a4441d4cf4cb429b620800b2dddb9e6873409 killed as a result of limit of /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df
[17528853.189441] memory: usage 5242880kB, limit 5242880kB, failcnt 23453
[17528853.189442] memory+swap: usage 5242880kB, limit 9007199254740988kB, failcnt 0
[17528853.189444] kmem: usage 5180712kB, limit 9007199254740988kB, failcnt 0
[17528853.189445] Memory cgroup stats for /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[17528853.189458] Memory cgroup stats for /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df/b4168f4f943fd2df856ed1cd132a4441d4cf4cb429b620800b2dddb9e6873409: cache:0KB rss:48KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:48KB inactive_file:0KB active_file:0KB unevictable:0KB
[17528853.189470] Memory cgroup stats for /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df/0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c: cache:0KB rss:62120KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:62068KB inactive_file:0KB active_file:0KB unevictable:0KB
[17528853.189482] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[17528853.189702] [391376] 0 391376 253 1 4 0 -998 pause
[17528853.189716] [396268] 0 396268 1090 90 8 0 -998 dumb-init
[17528853.189731] [396385] 0 396385 5358 408 16 0 -998 run.sh
[17528853.189733] [396391] 0 396391 583729 28149 212 0 -998 python
[17528853.189740] Memory cgroup out of memory: Kill process 391376 (pause) score 0 or sacrifice child
[17528853.190878] Killed process 391376 (pause) total-vm:1012kB, anon-rss:4kB, file-rss:0kB, shmem-rss:0kB
[17528853.207058] python invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=-998
[17528853.207062] python cpuset=0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c mems_allowed=0-1
[17528853.207066] CPU: 27 PID: 396391 Comm: python Kdump: loaded Tainted: P OE ------------ T 3.10.0-957.el7.x86_64 #1
[17528853.207067] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 01/05/2019
[17528853.207069] Call Trace:
[17528853.207081] [<ffffffff9eb61dc1>] dump_stack+0x19/0x1b
[17528853.207086] [<ffffffff9eb5c7ea>] dump_header+0x90/0x229
[17528853.207091] [<ffffffff9e5b9dc6>] ? find_lock_task_mm+0x56/0xc0
[17528853.207097] [<ffffffff9e631118>] ? try_get_mem_cgroup_from_mm+0x28/0x60
[17528853.207100] [<ffffffff9e5ba274>] oom_kill_process+0x254/0x3d0
[17528853.207102] [<ffffffff9e634f16>] mem_cgroup_oom_synchronize+0x546/0x570
[17528853.207105] [<ffffffff9e634390>] ? mem_cgroup_charge_common+0xc0/0xc0
[17528853.207107] [<ffffffff9e5bab04>] pagefault_out_of_memory+0x14/0x90
[17528853.207109] [<ffffffff9eb5acf2>] mm_fault_error+0x6a/0x157
[17528853.207114] [<ffffffff9eb6f7a8>] __do_page_fault+0x3c8/0x500
[17528853.207116] [<ffffffff9eb6f915>] do_page_fault+0x35/0x90
[17528853.207119] [<ffffffff9eb6b758>] page_fault+0x28/0x30
[17528853.207122] Task in /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df/0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c killed as a result of limit of /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df
[17528853.207125] memory: usage 5242880kB, limit 5242880kB, failcnt 23746
[17528853.207127] memory+swap: usage 5242880kB, limit 9007199254740988kB, failcnt 0
[17528853.207128] kmem: usage 5180712kB, limit 9007199254740988kB, failcnt 0
[17528853.207130] Memory cgroup stats for /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[17528853.207147] Memory cgroup stats for /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df/b4168f4f943fd2df856ed1cd132a4441d4cf4cb429b620800b2dddb9e6873409: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[17528853.207162] Memory cgroup stats for /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df/0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c: cache:0KB rss:62168KB rss_huge:0KB <-- extra memory consumed mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:62156KB inactive_file:0KB active_file:0KB unevictable:0KB
[17528853.207173] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[17528853.207328] [396268] 0 396268 1090 90 8 0 -998 dumb-init
[17528853.207342] [396385] 0 396385 5358 408 16 0 -998 run.sh
[17528853.207345] [396391] 0 396391 583729 28212 212 0 -998 python
[17528853.207352] Memory cgroup out of memory: Kill process 396467 (python) score 0 or sacrifice child
[17528853.208557] Killed process 396391 (python) total-vm:2334916kB, anon-rss:61720kB, file-rss:51128kB, shmem-rss:0kB
[17528853.217780] ___slab_alloc: 24 callbacks suppressed
[17528853.217783] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
[17528853.217785] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
[17528853.217787] node 0: slabs: 1, objs: 49, free: 0
[17528853.217789] node 1: slabs: 0, objs: 0, free: 0
[17528853.251423] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
[17528853.251426] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
[17528853.251428] node 0: slabs: 7, objs: 121, free: 12
[17528853.251430] node 1: slabs: 2, objs: 24, free: 0
[17528853.252295] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
[17528853.252298] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
[17528853.252300] node 0: slabs: 7, objs: 121, free: 12
[17528853.252302] node 1: slabs: 2, objs: 24, free: 0
[17528853.252304] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
[17528853.252307] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
[17528853.252309] node 0: slabs: 7, objs: 121, free: 12
[17528853.252310] node 1: slabs: 2, objs: 24, free: 0
[17528853.252600] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
[17528853.252603] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
[17528853.252605] node 0: slabs: 7, objs: 121, free: 12
[17528853.252608] node 1: slabs: 2, objs: 24, free: 0
[17528853.252669] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
[17528853.252672] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
[17528853.252676] node 0: slabs: 7, objs: 121, free: 12
[17528853.252680] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
[17528853.252682] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
[17528853.252684] node 0: slabs: 7, objs: 121, free: 12
[17528853.252686] node 1: slabs: 2, objs: 24, free: 0
[17528853.252688] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
[17528853.252689] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
[17528853.252694] node 0: slabs: 7, objs: 121, free: 12
[17528853.252697] node 1: slabs: 2, objs: 24, free: 0
[17528853.252728] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
[17528853.252731] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
[17528853.252733] node 0: slabs: 7, objs: 121, free: 12
[17528853.252735] node 1: slabs: 2, objs: 24, free: 0
[17528853.253048] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
[17528853.253051] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
[17528853.253053] node 0: slabs: 7, objs: 121, free: 12
[17528853.253055] node 1: slabs: 2, objs: 24, free: 0
[17528853.343423] device vvport87 left promiscuous mode
Tags: centos 7, cgroup crash panic

Activities

TrevorH (manager)   2020-02-10 14:40   ~0036247

yum update

Only the current version is supported and that is the original 7.6 kernel from about a year ago. Many bugs have been fixed since then.
nobody_caq8 (reporter)   2020-02-10 14:45   ~0036248

The OOM info before the crash:
[17528853.252731] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
[17528853.252733] node 0: slabs: 7, objs: 121, free: 12
[17528853.252735] node 1: slabs: 2, objs: 24, free: 0
[17528853.253048] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
[17528853.253051] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
[17528853.253053] node 0: slabs: 7, objs: 121, free: 12
[17528853.253055] node 1: slabs: 2, objs: 24, free: 0

From the cache name, we can see that the directory being rmdir'ed by the task belongs to the same cgroup that hit the OOM.

I checked the git log for refcount-overflow fixes, and found that none matches my bug.

Before the rmdir it was just a normal cgroup directory, and its sub-entries are just:
crash> dentry.d_parent,d_name.name ffff8fd33f9b2780
  d_parent = 0xffff8fd5b42cf500
  d_name.name = 0xffff8fd33f9b27b8 "memory.soft_limit_in_bytes"
crash> dentry.d_parent,d_name.name ffff8fd49f1486c0
  d_parent = 0xffff8fd5b42cf500
  d_name.name = 0xffff8fd49f1486f8 "memory.kmem.slabinfo"

There are just about 31 sub-files, like a normal cgroup directory.

Issue History

Date Modified Username Field Change
2020-02-10 14:36 nobody_caq8 New Issue
2020-02-10 14:36 nobody_caq8 Tag Attached: centos 7
2020-02-10 14:36 nobody_caq8 Tag Attached: cgroup crash panic
2020-02-10 14:40 TrevorH Note Added: 0036247
2020-02-10 14:45 nobody_caq8 Note Added: 0036248