View Issue Details

IDProjectCategoryView StatusLast Update
0015093CentOS-7kernelpublic2018-07-28 16:00
Reporterkira 
PrioritynormalSeverityminorReproducibilityN/A
Status assignedResolutionopen 
Product Version7.5.1804 
Target VersionFixed in Version 
Summary0015093: access dangling pointer in __kmem_cache_destroy_memcg_children
DescriptionThis issue is related to https://bugs.centos.org/view.php?id=15079. I can't comment on the old issue. So I file a new one.

@toracat Thanks for your informations. I found the commit in https://github.com/torvalds/linux/commit/b8529907ba35d625fa4b85d3e4dc8021be97c1f3

I have read the source code of linux-3.10.0-862.9.1.el7 which is from CentOS Linux release 7.5.1804 (Core).
I find two main changes about #15079:
```mm/slab_common.c
void kmem_cache_destroy(struct kmem_cache *s)
{
    if (unlikely(!s))
        return;

    get_online_cpus();
    mutex_lock(&slab_mutex);

    s->refcount--;
    if (s->refcount)
        goto out_unlock;

    if (kmem_cache_destroy_memcg_children(s) != 0)
        goto out_unlock;

    list_del(&s->list);
    memcg_unregister_cache(s);

    if (__kmem_cache_shutdown(s) != 0) {
        list_add(&s->list, &slab_caches);
        memcg_register_cache(s);
        printk(KERN_ERR "kmem_cache_destroy %s: "
               "Slab cache still has objects\n", s->name);
        dump_stack();
        goto out_unlock;
    }
    mutex_unlock(&slab_mutex);
    if (s->flags & SLAB_DESTROY_BY_RCU)
        rcu_barrier();

    memcg_free_cache_params(s);
    kfree(s->name);
    kmem_cache_free(kmem_cache, s);
    goto out_put_cpus;

out_unlock:
    mutex_unlock(&slab_mutex);
out_put_cpus:
    put_online_cpus();
}

static int kmem_cache_destroy_memcg_children(struct kmem_cache *s)
{
    int rc;

    if (!s->memcg_params ||
        !s->memcg_params->is_root_cache)
        return 0;

    mutex_unlock(&slab_mutex);
    rc = __kmem_cache_destroy_memcg_children(s);
    mutex_lock(&slab_mutex);

    return rc;
}
```

and

```mm/memcontrol.c
int __kmem_cache_destroy_memcg_children(struct kmem_cache *s)
{
    struct kmem_cache *c;
    int i, failed = 0;

    /*
     * If the cache is being destroyed, we trust that there is no one else
     * requesting objects from it. Even if there are, the sanity checks in
     * kmem_cache_destroy should caught this ill-case.
     *
     * Still, we don't want anyone else freeing memcg_caches under our
     * noses, which can happen if a new memcg comes to life. As usual,
     * we'll take the memcg_limit_mutex to protect ourselves against this.
     */
    mutex_lock(&memcg_limit_mutex);
    for (i = 0; i < memcg_limited_groups_array_size; i++) {
        c = s->memcg_params->memcg_caches[i];
        if (!c)
            continue;

        /*
         * We will now manually delete the caches, so to avoid races
         * we need to cancel all pending destruction workers and
         * proceed with destruction ourselves.
         *
         * kmem_cache_destroy() will call kmem_cache_shrink internally,
         * and that could spawn the workers again: it is likely that
         * the cache still have active pages until this very moment.
         * This would lead us back to mem_cgroup_destroy_cache.
         *
         * But that will not execute at all if the "dead" flag is not
         * set, so flip it down to guarantee we are in control.
         */
        c->memcg_params->dead = false;
        cancel_work_sync(&c->memcg_params->destroy);
        kmem_cache_destroy(c);

        if (cache_from_memcg(s, i))
            failed++;
    }
    mutex_unlock(&memcg_limit_mutex);
    return failed;
}
```

We can't reproduce the issue of dead lock, because:
```
     s->refcount--;
    if (s->refcount)
        goto out_unlock;
```
At this moment, s->refcount comes to -1. So it jumps to out_unlock.


My main concern is that the s is dangling pointer. All accesses against it are illegal. Do I have any misunderstandings?
Tags3.10.0-862.9.1.el7
abrt_hash
URL

Activities

toracat

toracat

2018-07-25 18:48

manager   ~0032350

Just assigned this to @pgreco who has much deeper insight in C. :-)
pgreco

pgreco

2018-07-25 19:58

developer   ~0032352

Last edited: 2018-07-28 10:26

View 2 revisions

If I understand the code correctly -1 is a valid value for refcount (initial value, but it should never reach that value by decrementing).
Without digging deeper into the code, I see a "potential" memory leak (but not invalid access) in case kmem_cache_destroy_memcg_children(s) fails, leaving s->refcount in 0 which could cause it never to try to release again (but s and its members are still valid).
if you wanted to attempt to fix this (in case I'm right), I would add s->refcount++; before the second "goto out_unlock;", leaving s ready for a next attempt to release (in case it is done).
Please note that I may be saying something really stupid for not understanding how this code interacts with the rest of the kernel ;)

kira

kira

2018-07-26 02:51

reporter   ~0032359

@pgreco

the flow of this issue is:

1. In kmem_cache_destroy_memcg_children(), `s` is a root cache and `c`(s->memcg_params->memcg_caches[i]) is a child cache which belongs to a memcg.
2. When a memcg is destroying or any other situations causes memcg to shrink caches, kmem_cache_destroy_work_func() (c->memcg_params->destroy) is called and running by a kernel worker
3. At the same time, occasionally, kmem_cache_destroy_memcg_children() is waiting `cancel_work_sync(&c->memcg_params->destroy)` to finish.

kmem_cache_destroy_work_func() simply calls kmem_cache_destroy() to destroy `c`. kmem_cache_destroy() decrements refcount (to 0) and calls memcg_unregister_cache() to set s->memcg_params->memcg_caches[i] to NULL.

This means kmem_cache_destroy_memcg_children() should get a NULL `c` after cancel_work_sync() because `c` is freed. But it does not handle this case and just pass the dangling `c` to kmem_cache_destroy().

I think `c` should not be accessed after it being freed.
pgreco

pgreco

2018-07-26 11:10

developer   ~0032361

@kira, iiuc memcg_unregister_cache does exactly what you are not seeing.
memcontrol.c, line 3293, it goes to the root of the memcache you want freed, and nulls the array index that matches.
That, together with the null check in line 3485 cover both cases, setting the null and checking the null.
Again, I'm not exactly familiar with the code (but getting there) ;)
kira

kira

2018-07-27 06:35

reporter   ~0032370

@pgreco


Plaese refer to https://bugs.centos.org/view.php?id=15079#c32304

There is a problematic log:
```
Jul 19 12:13:14 c321v70 kernel: kdbg: parent ffff8801f8b04b00 cache ffff88021aa03a00 index 775 is root existing (null)
```

I draw a picture in the attachments to explain what I want to say.

toracat

toracat

2018-07-27 14:19

manager   ~0032376

@kira

Would you be able to look at the latest mainline kernel from kernel.org? I wonder if you see the same bug there or if this is unique to the RHEL kernel. In case you'd like to test-run the binary, kernel-ml is available from ELRepo.

http://elrepo.org/linux/kernel/el7/ (current mainline kernel)
http://elrepo.org/people/ajb/devel/kernel-ml/el7/ ( -rc kernels, right now 4.18.0-0.rc4.el7)
pgreco

pgreco

2018-07-28 10:40

developer   ~0032378

@kira, looking at the log you say, and your drawing, I see exactly de opposite of what you are saying. You are about to destroy a pointer (c) which has already had its reference removed from the memcg_caches.
A problematic log would be something like this
Jul 19 12:13:14 c321v70 kernel: kdbg: parent ffff8801f8b04b00 cache ffff88021aa03a00 index 775 is root existing (ffff88021aa03a00)
because it would mean that someone could be able to access ffff88021aa03a00 even after it is freed by kmem_cache_destroy(c);
kira

kira

2018-07-28 15:03

reporter   ~0032382

@toracat
I have not read latest kernel. So I'm not sure if there is a same bug. I'll check it in next few days.

@pgreco
There are two kernel workers related to this issue:
1. The first worker destroys a root cache (s) via kmem_cache_destroy.
2. The second worker destroy a child cache (c) which belongs to (s).
I means the two workers have a race condition. They may free a pointer twice.

I think it's not correct. Or I missed something?
pgreco

pgreco

2018-07-28 15:22

developer   ~0032383

@kira, I think that what you are missing is the mutex (memcg_limit_mutex), which makes sure that no 2 threads can run __kmem_cache_destroy_memcg_children at the same time
kira

kira

2018-07-28 16:00

reporter   ~0032384

@pgreco

I know it.
The difference between root cache (s) and child cache (c) is that s->memcg_params->is_root_cache == true and c->memcg_params->is_root_cache == false. When a cache is destroyed by kmem_cache_destroy, memcg_limit_mutex will be locked only if is_root_cache is true.

The problem is that __kmem_cache_destroy_memcg_children waits another worker to finish and destroys c again.

Issue History

Date Modified Username Field Change
2018-07-24 03:29 kira New Issue
2018-07-24 03:29 kira Tag Attached: 3.10.0-862.9.1.el7
2018-07-25 18:48 toracat Status new => assigned
2018-07-25 18:48 toracat Note Added: 0032350
2018-07-25 19:58 pgreco Note Added: 0032352
2018-07-26 02:51 kira Note Added: 0032359
2018-07-26 11:10 pgreco Note Added: 0032361
2018-07-27 06:35 kira File Added: Screenshot from 2018-07-27 14-29-47.png
2018-07-27 06:35 kira Note Added: 0032370
2018-07-27 14:19 toracat Note Added: 0032376
2018-07-28 10:26 pgreco Note Edited: 0032352 View Revisions
2018-07-28 10:40 pgreco Note Added: 0032378
2018-07-28 15:03 kira Note Added: 0032382
2018-07-28 15:22 pgreco Note Added: 0032383
2018-07-28 16:00 kira Note Added: 0032384