View Issue Details

ID: 0010242     Project: CentOS-7     Category: kernel     View Status: public     Last Update: 2018-11-01 06:40
Reporter: kdion_mz
Priority: high     Severity: crash     Reproducibility: random
Status: resolved     Resolution: fixed
Product Version:
Target Version:     Fixed in Version:
Summary: 0010242: RIP: down_read_trylock+9
Description: Cluster running HP Vertica experiences frequent (~monthly) kernel panics due to a null pointer dereference in down_read_trylock (+0x9/0x30).
Steps To Reproduce: Unknown steps to reproduce. Machines are HP DL360g9, with Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz processors and 512GB memory. The active process during the crash has mostly been vertica, except once when kswapd caused the crash. Kernel version 3.10.0-229.4.2.el7.x86_64.
Additional Information: Backtraces, dmesg, etc., from three systems are included in the attached crash.tar.gz file.
Tags: kernel panic
abrt_hash:
URL:

Activities

kdion_mz   2016-01-26 04:01   reporter

crash.tar.gz (114,506 bytes)

sms1123   2016-01-27 00:18   reporter   ~0025527

I have about 8 of these dumps to review, and I've been trying to work out why the system has been crashing. So far I've only been able to see the after-effects of whatever the issue is, rather than what causes it.

crash64> bt
PID: 300 TASK: ffff883f25e26660 CPU: 26 COMMAND: "kswapd0"
 #0 [ffff883f242eb810] machine_kexec at ffffffff8104c6a1
 #1 [ffff883f242eb868] crash_kexec at ffffffff810e2252
 #2 [ffff883f242eb938] oops_end at ffffffff8160d548
 #3 [ffff883f242eb960] no_context at ffffffff815fdf52
 #4 [ffff883f242eb9b0] __bad_area_nosemaphore at ffffffff815fdfe8
 #5 [ffff883f242eb9f8] bad_area_nosemaphore at ffffffff815fe152
 #6 [ffff883f242eba08] __do_page_fault at ffffffff816103ae
 #7 [ffff883f242ebb08] do_page_fault at ffffffff816105ca
 #8 [ffff883f242ebb30] page_fault at ffffffff8160c7c8
    [exception RIP: down_read_trylock+9]
    RIP: ffffffff8109c389 RSP: ffff883f242ebbe0 RFLAGS: 00010202
    RAX: 0000000000000000 RBX: ffff880b32303680 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000008
    RBP: ffff883f242ebbe0 R8: ffffea0028238520 R9: ffff887eb6d31320
    R10: 000000000005f55d R11: ffffea01037d0600 R12: ffff880b32303681
    R13: ffffea0028238500 R14: 0000000000000008 R15: ffffea0028238500
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #9 [ffff883f242ebbe8] page_lock_anon_vma_read at ffffffff8118e245
#10 [ffff883f242ebc18] page_referenced at ffffffff8118e4c7
#11 [ffff883f242ebc90] shrink_active_list at ffffffff8116b1cc
#12 [ffff883f242ebd48] balance_pgdat at ffffffff8116cb68
#13 [ffff883f242ebe20] kswapd at ffffffff8116d0f3
#14 [ffff883f242ebec8] kthread at ffffffff8109739f
#15 [ffff883f242ebf50] ret_from_fork at ffffffff81614d3c

We can tell that the value in rdi is garbage, since we shouldn't be trying to dereference the value 0x8 in rdi:

crash64> dis -l down_read_trylock
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/rwsem.c: 32
0xffffffff8109c380 <down_read_trylock>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8109c385 <down_read_trylock+5>: push %rbp
0xffffffff8109c386 <down_read_trylock+6>: mov %rsp,%rbp
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/rwsem.h: 83
0xffffffff8109c389 <down_read_trylock+9>: mov (%rdi),%rax <<<<<<<<<<<<<<<<

This is the call to down_read_trylock:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 455
0xffffffff8118e22b <page_lock_anon_vma_read+59>: test %eax,%eax
0xffffffff8118e22d <page_lock_anon_vma_read+61>: js 0xffffffff8118e213 <page_lock_anon_vma_read+35>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 459
0xffffffff8118e22f <page_lock_anon_vma_read+63>: mov -0x1(%r12),%r14
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 458
0xffffffff8118e234 <page_lock_anon_vma_read+68>: lea -0x1(%r12),%rbx
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 460
0xffffffff8118e239 <page_lock_anon_vma_read+73>: add $0x8,%r14
0xffffffff8118e23d <page_lock_anon_vma_read+77>: mov %r14,%rdi
0xffffffff8118e240 <page_lock_anon_vma_read+80>: callq 0xffffffff8109c380 <down_read_trylock>
0xffffffff8118e245 <page_lock_anon_vma_read+85>: test %eax,%eax
0xffffffff8118e247 <page_lock_anon_vma_read+87>: je 0xffffffff8118e260 <page_lock_anon_vma_read+112>

0445 struct anon_vma *page_lock_anon_vma_read(struct page *page)
0446 {
0447 struct anon_vma *anon_vma = NULL;
0448 struct anon_vma *root_anon_vma;
0449 unsigned long anon_mapping;
0450
0451 rcu_read_lock();
0452 anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
0453 if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
0454 goto out;
0455 if (!page_mapped(page))
0456 goto out;
0457
0458 anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
0459 root_anon_vma = ACCESS_ONCE(anon_vma->root);
0460 if (down_read_trylock(&root_anon_vma->rwsem)) {

We crashed calling down_read_trylock from line 460. So the anon_vma is r12-1 (see page_lock_anon_vma_read+63).

That's not in the dump (if it was really the address of a struct anon_vma I believe it should be in the dump):

crash64> anon_vma ffff880b32303680
struct anon_vma struct: page excluded: kernel virtual address: ffff880b32303680 type: "gdb_readmem_callback"
Cannot access memory at address 0xffff880b32303680
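
As a cross-check, the faulting rdi value of 0x8 is exactly what you get if the reused memory behind the bogus anon_vma reads back a NULL root pointer. A minimal user-space sketch of the arithmetic, assuming the field layout seen in the struct anon_vma dumps later in this note (root at offset 0x0, rwsem at offset 0x8); the struct definitions here are illustrative stand-ins, not the kernel's:

#include <stddef.h>
#include <stdio.h>

/* Illustrative layouts only -- field order taken from the crash output,
 * not from the kernel headers. */
struct rw_semaphore_sketch { long count; /* ... */ };
struct anon_vma_sketch {
        struct anon_vma_sketch *root;        /* offset 0x0 */
        struct rw_semaphore_sketch rwsem;    /* offset 0x8 */
};

#define PAGE_MAPPING_ANON 1UL

int main(void)
{
        /* page->mapping from the dump, with the anon flag bit set (r12) */
        unsigned long anon_mapping = 0xffff880b32303681UL;

        /* rmap.c:458 -- strip the flag bit to get the anon_vma (rbx, r12-1) */
        unsigned long anon_vma = anon_mapping - PAGE_MAPPING_ANON;
        printf("anon_vma      = %#lx\n", anon_vma);    /* ffff880b32303680 */

        /* rmap.c:459-460 -- if the reused memory at anon_vma->root reads back
         * zero, &root_anon_vma->rwsem becomes NULL + offsetof(rwsem) = 0x8,
         * which is the rdi/r14 value in the exception frame and the address
         * down_read_trylock+9 faults on. */
        printf("faulting addr = %#zx\n",
               offsetof(struct anon_vma_sketch, rwsem));
        return 0;
}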

The arg to page_lock_anon_vma_read is saved into r13 and not modified by this function (or in down_read_trylock):

crash64> dis -l page_lock_anon_vma_read
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 446
0xffffffff8118e1f0 <page_lock_anon_vma_read>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8118e1f5 <page_lock_anon_vma_read+5>: push %rbp
0xffffffff8118e1f6 <page_lock_anon_vma_read+6>: mov %rsp,%rbp
0xffffffff8118e1f9 <page_lock_anon_vma_read+9>: push %r14
0xffffffff8118e1fb <page_lock_anon_vma_read+11>: push %r13
0xffffffff8118e1fd <page_lock_anon_vma_read+13>: mov %rdi,%r13 <<<<<<<<
0xffffffff8118e200 <page_lock_anon_vma_read+16>: push %r12
0xffffffff8118e202 <page_lock_anon_vma_read+18>: push %rbx

crash64> page ffffea0028238500
struct page {
  flags = 13510794587668552,
  mapping = 0xffff880b32303681,
  {
    {
      index = 34075086877,
      freelist = 0x7ef08901d,
      pfmemalloc = 29,
      pmd_huge_pte = 0x7ef08901d
    },
    {
      counters = 8589934592,
      {
        {
          _mapcount = {
            counter = 0
          },
          {
            inuse = 0,
            objects = 0,
            frozen = 0
          },
          units = 0
        },
        _count = {
          counter = 2
        }
      }
    }
  },
  {
    lru = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
    {
      next = 0xdead000000100100,
      pages = 2097664,
      pobjects = -559087616
    },
    list = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
    slab_page = 0xdead000000100100
  },
  {
    private = 0,
    ptl = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    slab_cache = 0x0,
    first_page = 0x0
  }
}

Another point of verification that this is the struct page we are after is that its mapping pointer is the struct anon_vma that triggered the panic:

crash64> kmem ffffea0028238500
      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0028238500 a08e14000 ffff880b32303681 7ef08901d 2 2fffff00080048 uptodate,active,swapbacked

We need to work down the stack to see how we ended up with a bad pointer. So where did we start in kswapd:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 3270
0xffffffff8116d0c1 <kswapd+321>: callq 0xffffffff81096ff0 <kthread_should_stop>
0xffffffff8116d0c6 <kswapd+326>: test %al,%al
0xffffffff8116d0c8 <kswapd+328>: jne 0xffffffff8116d368 <kswapd+1000>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 3278
0xffffffff8116d0ce <kswapd+334>: mov 0x26460(%rbx),%eax
0xffffffff8116d0d4 <kswapd+340>: mov -0x78(%rbp),%r13d
0xffffffff8116d0d8 <kswapd+344>: mov %eax,-0x70(%rbp)
0xffffffff8116d0db <kswapd+347>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 3280
0xffffffff8116d0e0 <kswapd+352>: lea -0x64(%rbp),%rdx
0xffffffff8116d0e4 <kswapd+356>: mov %r13d,%esi
0xffffffff8116d0e7 <kswapd+359>: mov %rbx,%rdi
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 3279
0xffffffff8116d0ea <kswapd+362>: mov %r12d,-0x64(%rbp)
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 3280
0xffffffff8116d0ee <kswapd+366>: callq 0xffffffff8116c9a0 <balance_pgdat> <<<<<<<<<<
0xffffffff8116d0f3 <kswapd+371>: mov %eax,%r13d
0xffffffff8116d0f6 <kswapd+374>: mov -0x64(%rbp),%r15d

That's where we called balance_pgdat; it's at the end of the loop in kswapd:

3273 /*
3274 * We can speed up thawing tasks if we don't call balance_pgdat
3275 * after returning from the refrigerator
3276 */
3277 if (!ret) {
3278 trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
3279 balanced_classzone_idx = classzone_idx;
3280 balanced_order = balance_pgdat(pgdat, order,
3281 &balanced_classzone_idx);
3282 }

balance_pgdat looks like:

2936 static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
2937 int *classzone_idx)

Our args are:

RDI pg_data_t *pgdat rbx
RSI int order r13d
RDX int *classzone_idx address of -0x64(%rbp)

What does balance_pgdat save:

crash64> dis -l balance_pgdat
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2938
0xffffffff8116c9a0 <balance_pgdat>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8116c9a5 <balance_pgdat+5>: push %rbp
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2943
0xffffffff8116c9a6 <balance_pgdat+6>: mov $0x9,%ecx
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2938
0xffffffff8116c9ab <balance_pgdat+11>: mov %rsp,%rbp
0xffffffff8116c9ae <balance_pgdat+14>: push %r15
0xffffffff8116c9b0 <balance_pgdat+16>: mov %rdi,%r15
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2943
0xffffffff8116c9b3 <balance_pgdat+19>: lea -0x78(%rbp),%rdi
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2938
0xffffffff8116c9b7 <balance_pgdat+23>: push %r14
0xffffffff8116c9b9 <balance_pgdat+25>: push %r13
0xffffffff8116c9bb <balance_pgdat+27>: push %r12
0xffffffff8116c9bd <balance_pgdat+29>: push %rbx

The contents of the stack are:

#12 [ffff883f242ebd48] balance_pgdat at ffffffff8116cb68
    ffff883f242ebd50: ffff883f25e26660 ffff883f242ebe5c
    ffff883f242ebd60: 000000000005fe54 ffff883f25e26660
    ffff883f242ebd70: 0000000003bf6607 ffff883f00000002
    ffff883f242ebd80: 000000000100f080 0000000000000000
    ffff883f242ebd90: 00000000000000d0 0000000000000000
    ffff883f242ebda0: 0000000000006869 0000000000000000
    ffff883f242ebdb0: 000000000005f55d 0000000000000000
    ffff883f242ebdc0: 00000001000000d0 0000000100000001
    ffff883f242ebdd0: 0000000800000000 0000000000000000
    ffff883f242ebde0: 0000000000000000 0000000017731661
    ffff883f242ebdf0: rbx ffff88407ffd9000 r12 0000000000000002
    ffff883f242ebe00: r13 0000000000000000 r14 ffff883f242ebe80
    ffff883f242ebe10: r15 0000000000000002 rbp ffff883f242ebec0
    ffff883f242ebe20: rip ffffffff8116d0f3
#13 [ffff883f242ebe20] kswapd at ffffffff8116d0f3
    ffff883f242ebe28: ffff88407ffff508 ffff883f25e26660
    ffff883f242ebe38: ffff883f25e26660 ffff883f25e26660
    ffff883f242ebe48: 0000000000000000 0000000000000000
    ffff883f242ebe58: 0000000281609cc5 0000000000000000
    ffff883f242ebe68: 0000000000000000 ffff883f25e26660
    ffff883f242ebe78: ffffffff81098350 ffff883f242ebe80
    ffff883f242ebe88: ffff883f242ebe80 0000000017731661
    ffff883f242ebe98: ffff883f2722fd70 ffff88407ffd9000
    ffff883f242ebea8: ffffffff8116cf80 0000000000000000
    ffff883f242ebeb8: 0000000000000000 ffff883f242ebf48
    ffff883f242ebec8: ffffffff8109739f
#14 [ffff883f242ebec8] kthread at ffffffff8109739f

crash64> p/x 0xffff883f242ebec0-0x64
$1 = 0xffff883f242ebe5c
crash64> x/wx $1
0xffff883f242ebe5c: 0x00000002

RDI pg_data_t *pgdat rbx ffff88407ffd9000
RSI int order r13d 0
RDX int *classzone_idx address of -0x64(%rbp) 0xffff883f242ebe5c
                    dereferenced is 2.

The full dump of the pg_data_t structure was omitted from here as it was too long (but can be made available on request).

Now find out where we called shrink_active_list:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2720
0xffffffff8116cb52 <balance_pgdat+434>: lea -0x78(%rbp),%rdx
0xffffffff8116cb56 <balance_pgdat+438>: mov $0x1,%ecx
0xffffffff8116cb5b <balance_pgdat+443>: mov %r14,%rsi
0xffffffff8116cb5e <balance_pgdat+446>: mov $0x20,%edi
0xffffffff8116cb63 <balance_pgdat+451>: callq 0xffffffff8116aff0 <shrink_active_list>
0xffffffff8116cb68 <balance_pgdat+456>: jmp 0xffffffff8116cb20 <balance_pgdat+384>
0xffffffff8116cb6a <balance_pgdat+458>: nopw 0x0(%rax,%rax,1)

That call is in age_active_anon:

2708 static void age_active_anon(struct zone *zone, struct scan_control *sc)
2709 {
2710 struct mem_cgroup *memcg;
2711
2712 if (!total_swap_pages)
2713 return;
2714
2715 memcg = mem_cgroup_iter(NULL, NULL, NULL);
2716 do {
2717 struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
2718
2719 if (inactive_anon_is_low(lruvec))
2720 shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
2721 sc, LRU_ACTIVE_ANON);
2722
2723 memcg = mem_cgroup_iter(NULL, memcg, NULL);
2724 } while (memcg);
2725 }

That's right near the top of balance_pgdat. If we take a slightly larger chunk of assembler we can see the argument marshalling for the calls to mem_cgroup_zone_lruvec and inactive_anon_is_low:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2717
0xffffffff8116cb38 <balance_pgdat+408>: mov %r13,%rsi
0xffffffff8116cb3b <balance_pgdat+411>: mov %rbx,%rdi
0xffffffff8116cb3e <balance_pgdat+414>: callq 0xffffffff811bbcc0 <mem_cgroup_zone_lruvec>

That means that our struct zone * is in rbx and the struct mem_cgroup * is in r13.

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2719
0xffffffff8116cb43 <balance_pgdat+419>: mov %rax,%rdi
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2717
0xffffffff8116cb46 <balance_pgdat+422>: mov %rax,%r14

Since rax (the struct lruvec *) is the return value from mem_cgroup_zone_lruvec, it gets saved into r14 and primed into rdi for the call to inactive_anon_is_low:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2719
0xffffffff8116cb49 <balance_pgdat+425>: callq 0xffffffff81168310 <inactive_anon_is_low>
0xffffffff8116cb4e <balance_pgdat+430>: test %eax,%eax
0xffffffff8116cb50 <balance_pgdat+432>: je 0xffffffff8116cb20 <balance_pgdat+384>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2720
0xffffffff8116cb52 <balance_pgdat+434>: lea -0x78(%rbp),%rdx
0xffffffff8116cb56 <balance_pgdat+438>: mov $0x1,%ecx
0xffffffff8116cb5b <balance_pgdat+443>: mov %r14,%rsi
0xffffffff8116cb5e <balance_pgdat+446>: mov $0x20,%edi
0xffffffff8116cb63 <balance_pgdat+451>: callq 0xffffffff8116aff0 <shrink_active_list>
0xffffffff8116cb68 <balance_pgdat+456>: jmp 0xffffffff8116cb20 <balance_pgdat+384>
0xffffffff8116cb6a <balance_pgdat+458>: nopw 0x0(%rax,%rax,1)

We don't change those values before the call to shrink_active_list so we should be able to look at the stack and get them:

crash64> dis -l shrink_active_list
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 1598
0xffffffff8116aff0 <shrink_active_list>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8116aff5 <shrink_active_list+5>: push %rbp
0xffffffff8116aff6 <shrink_active_list+6>: mov %ecx,%eax
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/include/linux/mmzone.h: 182
0xffffffff8116aff8 <shrink_active_list+8>: sub $0x2,%eax
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 1598
0xffffffff8116affb <shrink_active_list+11>: mov %rsp,%rbp
0xffffffff8116affe <shrink_active_list+14>: push %r15
0xffffffff8116b000 <shrink_active_list+16>: push %r14
0xffffffff8116b002 <shrink_active_list+18>: mov %rdx,%r14
0xffffffff8116b005 <shrink_active_list+21>: push %r13
0xffffffff8116b007 <shrink_active_list+23>: push %r12
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 1602
0xffffffff8116b009 <shrink_active_list+25>: lea -0x60(%rbp),%r12
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 1598
0xffffffff8116b00d <shrink_active_list+29>: push %rbx
0xffffffff8116b00e <shrink_active_list+30>: mov %rdi,%rbx

#11 [ffff883f242ebc90] shrink_active_list at ffffffff8116b1cc
    ffff883f242ebc98: ffff883f00000001 000000168115c498
    ffff883f242ebca8: 0000000000000000 ffff88407ffda540
    ffff883f242ebcb8: 00000001ffffffe0 ffff883f7f44a410
    ffff883f242ebcc8: ffff88407ffda000 0000000000000020
    ffff883f242ebcd8: 0000000000000000 ffffea00c49cdd20
    ffff883f242ebce8: ffffea00f3066460 ffff883f242ebcf0
    ffff883f242ebcf8: ffff883f242ebcf0 ffffea00c7a897a0
    ffff883f242ebd08: ffffea00ab036660 0000000017731661
    ffff883f242ebd18: rbx ffff88407ffda000 r12 0000000000000002
    ffff883f242ebd28: r13 ffffc90030186000 r14 ffff883f7f44a410
    ffff883f242ebd38: r15 ffff88407ffd9000 rbp ffff883f242ebe18
    ffff883f242ebd48: rip ffffffff8116cb68
#12 [ffff883f242ebd48] balance_pgdat at ffffffff8116cb68

r13 struct mem_cgroup * ffffc90030186000
rbx struct zone * ffff88407ffda000 (pg_data_t ffff88407ffd9000)
r14 struct lruvec * ffff883f7f44a410

Our zone is the normal zone on node 0:

crash64> zone ffff88407ffda000|grep node
  node = 0,

crash64> mem_cgroup ffffc90030186000
struct mem_cgroup {
  css = {
    cgroup = 0xffff887f23f1e030,
    refcnt = {
      counter = 1
    },
    flags = 3,
    id = 0xffff887f7ec01340,
    dput_work = {
      data = {
        counter = 68719476704
      },
      entry = {
        next = 0xffffc90030186028,
        prev = 0xffffc90030186028
      },
      func = 0xffffffff810e7490 <css_dput_fn>
    }
  },
  res = {
    usage = 0,
    max_usage = 0,
    limit = 9223372036854775807,
    soft_limit = 9223372036854775807,
    failcnt = 0,
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 3971148978,
              tickets = {
                head = 60594,
                tail = 60594
              }
            }
          }
        }
      }
    },
    parent = 0x0
  },
  vmpressure = {
    scanned = 0,
    reclaimed = 0,
    sr_lock = {
      count = {
        counter = 1
      },
      wait_lock = {
        {
          rlock = {
            raw_lock = {
              {
                head_tail = 42205828,
                tickets = {
                  head = 644,
                  tail = 644
                }
              }
            }
          }
        }
      },
      wait_list = {
        next = 0xffffc90030186090,
        prev = 0xffffc90030186090
      },
      owner = 0x0,
      osq = 0x0
    },
    events = {
      next = 0xffffc900301860b0,
      prev = 0xffffc900301860b0
    },
    events_lock = {
      count = {
        counter = 1
      },
      wait_lock = {
        {
          rlock = {
            raw_lock = {
              {
                head_tail = 0,
                tickets = {
                  head = 0,
                  tail = 0
                }
              }
            }
          }
        }
      },
      wait_list = {
        next = 0xffffc900301860c8,
        prev = 0xffffc900301860c8
      },
      owner = 0x0,
      osq = 0x0
    },
    work = {
      data = {
        counter = 2368
      },
      entry = {
        next = 0xffffc900301860f0,
        prev = 0xffffc900301860f0
      },
      func = 0xffffffff811c05d0 <vmpressure_work_fn>
    }
  },
  {
    memsw = {
      usage = 0,
      max_usage = 0,
      limit = 9223372036854775807,
      soft_limit = 9223372036854775807,
      failcnt = 0,
      lock = {
        {
          rlock = {
            raw_lock = {
              {
                head_tail = 0,
                tickets = {
                  head = 0,
                  tail = 0
                }
              }
            }
          }
        }
      },
      parent = 0x0
    },
    rcu_freeing = {
      next = 0x0,
      func = 0x0
    },
    work_freeing = {
      data = {
        counter = 0
      },
      entry = {
        next = 0x0,
        prev = 0x7fffffffffffffff
      },
      func = 0x7fffffffffffffff
    }
  },
  kmem = {
    usage = 0,
    max_usage = 0,
    limit = 9223372036854775807,
    soft_limit = 9223372036854775807,
    failcnt = 0,
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    parent = 0x0
  },
  use_hierarchy = true,
  kmem_account_flags = 0,
  oom_lock = false,
  under_oom = {
    counter = 0
  },
  refcnt = {
    counter = 42002
  },
  swappiness = 0,
  oom_kill_disable = 0,
  memsw_is_minimum = false,
  thresholds_lock = {
    count = {
      counter = 1
    },
    wait_lock = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    wait_list = {
      next = 0xffffc900301861a8,
      prev = 0xffffc900301861a8
    },
    owner = 0x0,
    osq = 0x0
  },
  thresholds = {
    primary = 0x0,
    spare = 0x0
  },
  memsw_thresholds = {
    primary = 0x0,
    spare = 0x0
  },
  oom_notify = {
    next = 0xffffc900301861e8,
    prev = 0xffffc900301861e8
  },
  move_charge_at_immigrate = 0,
  moving_account = {
    counter = 0
  },
  move_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 0,
            tickets = {
              head = 0,
              tail = 0
            }
          }
        }
      }
    }
  },
  stat = 0x16da8,
  nocpu_base = {
    count = {0, 0, 0, 0, 0},
    events = {0, 0, 0, 0},
    nr_page_events = 0,
    targets = {0, 0, 0}
  },
  pcp_counter_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 0,
            tickets = {
              head = 0,
              tail = 0
            }
          }
        }
      }
    }
  },
  dead_count = {
    counter = 0
  },
  tcp_mem = {
    cg_proto = {
      enter_memory_pressure = 0x0,
      memory_allocated = 0x0,
      sockets_allocated = 0x0,
      memory_pressure = 0x0,
      sysctl_mem = 0x0,
      flags = 0,
      memcg = 0x0
    },
    tcp_memory_allocated = {
      usage = 0,
      max_usage = 0,
      limit = 0,
      soft_limit = 0,
      failcnt = 0,
      lock = {
        {
          rlock = {
            raw_lock = {
              {
                head_tail = 0,
                tickets = {
                  head = 0,
                  tail = 0
                }
              }
            }
          }
        }
      },
      parent = 0x0
    },
    tcp_sockets_allocated = {
      lock = {
        raw_lock = {
          {
            head_tail = 0,
            tickets = {
              head = 0,
              tail = 0
            }
          }
        }
      },
      count = 0,
      list = {
        next = 0x0,
        prev = 0x0
      },
      counters = 0x0
    },
    tcp_prot_mem = {0, 0, 0},
    tcp_memory_pressure = 0
  },
  memcg_slab_caches = {
    next = 0x0,
    prev = 0x0
  },
  slab_caches_mutex = {
    count = {
      counter = 0
    },
    wait_lock = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    wait_list = {
      next = 0x0,
      prev = 0x0
    },
    owner = 0x0,
    osq = 0x0
  },
  kmemcg_id = 0,
  last_scanned_node = 1024,
  scan_nodes = {
    bits = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
  },
  numainfo_events = {
    counter = 950433369
  },
  numainfo_updating = {
    counter = 0
  },
  info = {
    nodeinfo = 0xffffc90030186400
  }
}

Our cgroup is the root cgroup:

crash64> cgroup 0xffff887f23f1e030
struct cgroup {
  flags = 0,
  count = {
    counter = 48
  },
  id = 0,
  sibling = {
    next = 0xffff887f23f1e040,
    prev = 0xffff887f23f1e040
  },
  children = {
    next = 0xffff887f23f1e050,
    prev = 0xffff887f23f1e050
  },
  files = {
    next = 0xffff887f23ed6980,
    prev = 0xffff887f23ed8180
  },
  parent = 0x0,
  dentry = 0xffff887f26af06c0,
  name = 0xffffffff8194bcf0 <root_cgroup_name>,
  subsys = {0x0, 0x0, 0x0, 0xffffc90030186000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
  root = 0xffff887f23f1e000,
  css_sets = {
    next = 0xffff8849229b7380,
    prev = 0xffff887f23ed6940
  },
  allcg_node = {
    next = 0xffff887f23f1e1d8,
    prev = 0xffff887f23f1e1d8
  },
  cft_q_node = {
    next = 0x0,
    prev = 0x0
  },
  release_list = {
    next = 0xffff887f23f1e118,
    prev = 0xffff887f23f1e118
  },
  pidlists = {
    next = 0xffff887f23f1e128,
    prev = 0xffff887f23f1e128
  },
  pidlist_mutex = {
    count = {
      counter = 1
    },
    wait_lock = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    wait_list = {
      next = 0xffff887f23f1e140,
      prev = 0xffff887f23f1e140
    },
    owner = 0x0,
    osq = 0x0
  },
  callback_head = {
    next = 0x0,
    func = 0x0
  },
  free_work = {
    data = {
      counter = 68719476704
    },
    entry = {
      next = 0xffff887f23f1e178,
      prev = 0xffff887f23f1e178
    },
    func = 0xffffffff810e92d0 <cgroup_free_fn>
  },
  event_list = {
    next = 0xffff887f23f1e190,
    prev = 0xffff887f23f1e190
  },
  event_list_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 0,
            tickets = {
              head = 0,
              tail = 0
            }
          }
        }
      }
    }
  },
  xattrs = {
    head = {
      next = 0xffff887f23f1e1a8,
      prev = 0xffff887f23f1e1a8
    },
    lock = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    }
  }
}

Anyway, how did we call shrink_active_list:

2720 shrink_active_list(SWAP_CLUSTER_MAX, lruvec,
2721 sc, LRU_ACTIVE_ANON);

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2720
0xffffffff8116cb52 <balance_pgdat+434>: lea -0x78(%rbp),%rdx
0xffffffff8116cb56 <balance_pgdat+438>: mov $0x1,%ecx
0xffffffff8116cb5b <balance_pgdat+443>: mov %r14,%rsi
0xffffffff8116cb5e <balance_pgdat+446>: mov $0x20,%edi
0xffffffff8116cb63 <balance_pgdat+451>: callq 0xffffffff8116aff0 <shrink_active_list>
0xffffffff8116cb68 <balance_pgdat+456>: jmp 0xffffffff8116cb20 <balance_pgdat+384>

For rbp we need to look at how it's saved in the caller's stack frame, since it's from balance_pgdat:

#11 [ffff883f242ebc90] shrink_active_list at ffffffff8116b1cc
    ffff883f242ebc98: ffff883f00000001 000000168115c498
    ffff883f242ebca8: 0000000000000000 ffff88407ffda540
    ffff883f242ebcb8: 00000001ffffffe0 ffff883f7f44a410
    ffff883f242ebcc8: ffff88407ffda000 0000000000000020
    ffff883f242ebcd8: 0000000000000000 ffffea00c49cdd20
    ffff883f242ebce8: ffffea00f3066460 ffff883f242ebcf0
    ffff883f242ebcf8: ffff883f242ebcf0 ffffea00c7a897a0
    ffff883f242ebd08: ffffea00ab036660 0000000017731661
    ffff883f242ebd18: ffff88407ffda000 0000000000000002
    ffff883f242ebd28: ffffc90030186000 ffff883f7f44a410
    ffff883f242ebd38: ffff88407ffd9000 rbp ffff883f242ebe18
    ffff883f242ebd48: rip ffffffff8116cb68
#12 [ffff883f242ebd48] balance_pgdat at ffffffff8116cb68
    ffff883f242ebd50: ffff883f25e26660 ffff883f242ebe5c
    ffff883f242ebd60: 000000000005fe54 ffff883f25e26660
    ffff883f242ebd70: 0000000003bf6607 ffff883f00000002
    ffff883f242ebd80: 000000000100f080 0000000000000000
    ffff883f242ebd90: 00000000000000d0 0000000000000000
    ffff883f242ebda0: 0000000000006869 0000000000000000
    ffff883f242ebdb0: 000000000005f55d 0000000000000000
    ffff883f242ebdc0: 00000001000000d0 0000000100000001
    ffff883f242ebdd0: 0000000800000000 0000000000000000
    ffff883f242ebde0: 0000000000000000 0000000017731661
    ffff883f242ebdf0: ffff88407ffd9000 0000000000000002
    ffff883f242ebe00: 0000000000000000 ffff883f242ebe80
    ffff883f242ebe10: 0000000000000002 ffff883f242ebec0
    ffff883f242ebe20: ffffffff8116d0f3
#13 [ffff883f242ebe20] kswapd at ffffffff8116d0f3

We need the rbp value saved by the call to shrink_active_list from balance_pgdat:

crash64> p/x 0xffff883f242ebe18-0x78
$2 = 0xffff883f242ebda0

edi SWAP_CLUSTER_MAX (32)
rsi (struct lruvec *) ffff883f7f44a410
rdx rbp-0x78 0xffff883f242ebda0
ecx 1

crash64> scan_control 0xffff883f242ebda0
struct scan_control {
  nr_scanned = 26729,
  nr_reclaimed = 0,
  nr_to_reclaim = 390493,
  hibernation_mode = 0,
  gfp_mask = 208,
  may_writepage = 1,
  may_unmap = 1,
  may_swap = 1,
  order = 0,
  priority = 8,
  target_mem_cgroup = 0x0,
  nodemask = 0x0
}

#define SWAP_CLUSTER_MAX 32UL

edi SWAP_CLUSTER_MAX
rsi (struct lruvec *) ffff883f7f44a410
rdx rbp-0x78 0xffff883f242ebda0
ecx 1

In shrink_active_list we are at line 1651:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 1651
0xffffffff8116b1ba <shrink_active_list+458>: mov 0x38(%r14),%rdx
0xffffffff8116b1be <shrink_active_list+462>: lea -0x68(%rbp),%rcx
0xffffffff8116b1c2 <shrink_active_list+466>: xor %esi,%esi
0xffffffff8116b1c4 <shrink_active_list+468>: mov %r15,%rdi
0xffffffff8116b1c7 <shrink_active_list+471>: callq 0xffffffff8118e300 <page_referenced>
0xffffffff8116b1cc <shrink_active_list+476>: test %eax,%eax
0xffffffff8116b1ce <shrink_active_list+478>: je 0xffffffff8116b168 <shrink_active_list+376>

1594 static void shrink_active_list(unsigned long nr_to_scan,
1595 struct lruvec *lruvec,
1596 struct scan_control *sc,
1597 enum lru_list lru)
1598 {
1599 unsigned long nr_taken;
1600 unsigned long nr_scanned;
1601 unsigned long vm_flags;
1602 LIST_HEAD(l_hold); /* The pages which were snipped off */
1603 LIST_HEAD(l_active);
1604 LIST_HEAD(l_inactive);
1605 struct page *page;
1606 struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
1607 unsigned long nr_rotated = 0;
1608 isolate_mode_t isolate_mode = 0;
1609 int file = is_file_lru(lru);
1610 struct zone *zone = lruvec_zone(lruvec);
1611
1612 lru_add_drain();
1613
1614 if (!sc->may_unmap)
1615 isolate_mode |= ISOLATE_UNMAPPED;
1616 if (!sc->may_writepage)
1617 isolate_mode |= ISOLATE_CLEAN;
1618
1619 spin_lock_irq(&zone->lru_lock);
1620
1621 nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold,
1622 &nr_scanned, sc, isolate_mode, lru);
1623 if (global_reclaim(sc))
1624 zone->pages_scanned += nr_scanned;
1625
1626 reclaim_stat->recent_scanned[file] += nr_taken;
1627
1628 __count_zone_vm_events(PGREFILL, zone, nr_scanned);
1629 __mod_zone_page_state(zone, NR_LRU_BASE + lru, -nr_taken);
1630 __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
1631 spin_unlock_irq(&zone->lru_lock);
1632
1633 while (!list_empty(&l_hold)) {
1634 cond_resched();
1635 page = lru_to_page(&l_hold);
1636 list_del(&page->lru);
1637
1638 if (unlikely(!page_evictable(page))) {
1639 putback_lru_page(page);
1640 continue;
1641 }
1642
1643 if (unlikely(buffer_heads_over_limit)) {
1644 if (page_has_private(page) && trylock_page(page)) {
1645 if (page_has_private(page))
1646 try_to_release_page(page, 0);
1647 unlock_page(page);
1648 }
1649 }
1650
1651 if (page_referenced(page, 0, sc->target_mem_cgroup,
1652 &vm_flags)) {

So like the others we made a call to page_referenced.

1653 nr_rotated += hpage_nr_pages(page);
1654 /*
1655 * Identify referenced, file-backed active pages and
1656 * give them one more trip around the active list. So
1657 * that executable code get better chances to stay in
1658 * memory under moderate memory pressure. Anon pages
1659 * are not likely to be evicted by use-once streaming
1660 * IO, plus JVM can create lots of anon VM_EXEC pages,
1661 * so we ignore them here.
1662 */
1663 if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
1664 list_add(&page->lru, &l_active);
1665 continue;
1666 }
1667 }
1668
1669 ClearPageActive(page); /* we are de-activating */
1670 list_add(&page->lru, &l_inactive);
1671 }

So we need to work out what l_hold is and see what is still on that list. We've worked that out previously (for a different dump): our l_hold list is in r12:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 1633
0xffffffff8116b217 <shrink_active_list+551>: cmp %r12,-0x60(%rbp)
0xffffffff8116b21b <shrink_active_list+555>: jne 0xffffffff8116b187 <shrink_active_list+407>
0xffffffff8116b221 <shrink_active_list+561>: nopl 0x0(%rax)

1633 while (!list_empty(&l_hold)) {

l_hold is inited with the following macro:

0021 #define LIST_HEAD(name) \
0022 struct list_head name = LIST_HEAD_INIT(name)

So it's just a struct list_head, and list_empty compares head to head->next:

0186 static inline int list_empty(const struct list_head *head)
0187 {
0188 return head->next == head;
0189 }

So at that point r12 contains the head address and *(rbp-0x60) is the next pointer.
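
For reference, here is a minimal user-space sketch of those on-stack list semantics (hypothetical re-implementations of the macros quoted above, not the kernel headers); it shows why comparing the head address in r12 against the next pointer at -0x60(%rbp) is all list_empty() amounts to:

#include <stdio.h>

/* Hypothetical re-implementation of the list primitives quoted above. */
struct list_head { struct list_head *next, *prev; };

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

static int list_empty(const struct list_head *head)
{
        return head->next == head;   /* empty <=> the head points back at itself */
}

int main(void)
{
        LIST_HEAD(l_hold);                                      /* next == prev == &l_hold */
        printf("fresh list empty: %d\n", list_empty(&l_hold));  /* 1 */

        /* In the dump (see the list_head output below) the head lives at
         * ffff883f242ebce0 but its next pointer is ffffea00c49cdd20, so
         * list_empty() returns 0 and the while loop keeps taking pages. */
        struct list_head not_empty = {
                (struct list_head *)0xffffea00c49cdd20UL, &not_empty
        };
        printf("l_hold empty: %d\n", list_empty(&not_empty));   /* 0 */
        return 0;
}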

#10 [ffff883f242ebc18] page_referenced at ffffffff8118e4c7
    ffff883f242ebc20: ffff887f186ba540 00000007f40cc8a9
    ffff883f242ebc30: ffff883f242ebcd8 0000000100000000
    ffff883f242ebc40: 00000007f40c987e 00007f40c987e000
    ffff883f242ebc50: 000000007f44a410 0000000017731661
    ffff883f242ebc60: rbx ffffea0028238520 r12 ffff883f242ebce0
    ffff883f242ebc70: r13 0000000000000002 r14 ffff883f242ebda0
    ffff883f242ebc80: r15 ffffea0028238500 rbp ffff883f242ebd40
    ffff883f242ebc90: rip ffffffff8116b1cc
#11 [ffff883f242ebc90] shrink_active_list at ffffffff8116b1cc

Look at what is saved by page_referenced:

crash64> dis -l page_referenced
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 849
0xffffffff8118e300 <page_referenced>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8118e305 <page_referenced+5>: push %rbp
0xffffffff8118e306 <page_referenced+6>: mov %rcx,%rax
0xffffffff8118e309 <page_referenced+9>: mov %rsp,%rbp
0xffffffff8118e30c <page_referenced+12>: push %r15
0xffffffff8118e30e <page_referenced+14>: push %r14
0xffffffff8118e310 <page_referenced+16>: push %r13
0xffffffff8118e312 <page_referenced+18>: push %r12
0xffffffff8118e314 <page_referenced+20>: push %rbx
0xffffffff8118e315 <page_referenced+21>: mov %rdi,%rbx
0xffffffff8118e318 <page_referenced+24>: sub $0x40,%rsp
0xffffffff8118e31c <page_referenced+28>: mov %rcx,-0x58(%rbp)

crash64> list_head ffff883f242ebce0
struct list_head {
  next = 0xffffea00c49cdd20,
  prev = 0xffffea00f3066460
}

and r12 is ffff883f242ebce0 so the list is not empty:

crash64> list 0xffffea00c49cdd20
ffffea00c49cdd20
ffffea00c4a00c60
ffffea00c4a446a0
ffffea00c4a48ca0
ffffea00c4a912a0
ffffea00c4b24260
ffffea00c4be8fe0
ffffea00c4cb4860
ffffea00c4e0d9a0
ffffea00c4e0da60
ffffea00c5172020
ffffea00c56dbc20
ffffea00c575eb20
ffffea00c5856260
ffffea00c59da2a0
ffffea00c5a8f4e0
ffffea00c5ce6f20
ffffea00c6211ae0
ffffea00c8ddff20
ffffea00d2690ba0
ffffea00db855160
ffffea00dc1ca060
ffffea00f3066460
ffff883f242ebce0

0099 #define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))

crash64> struct -ox page
struct page {
   [0x0] unsigned long flags;
   [0x8] struct address_space *mapping;
         struct {
             union {
  [0x10] unsigned long index;
  [0x10] void *freelist;
  [0x10] bool pfmemalloc;
  [0x10] pgtable_t pmd_huge_pte;
             };
             union {
  [0x18] unsigned long counters;
                 struct {
                     union {
  [0x18] atomic_t _mapcount;
                         struct {
  [0x18] unsigned int inuse : 16;
  [0x18] unsigned int objects : 15;
  [0x18] unsigned int frozen : 1;
                         };
  [0x18] int units;
                     };
  [0x1c] atomic_t _count;
                 };
             };
         };
         union {
  [0x20] struct list_head lru;
             struct {
  [0x20] struct page *next;
  [0x28] int pages;
  [0x2c] int pobjects;
             };
  [0x20] struct list_head list;
  [0x20] struct slab *slab_page;
         };
         union {
  [0x30] unsigned long private;
  [0x30] spinlock_t ptl;
  [0x30] struct kmem_cache *slab_cache;
  [0x30] struct page *first_page;
         };
}

By this point we've removed the struct page from the list (note that the list_head lru is 0x20 bytes into the struct page, as shown in the layout above). Let's look at what else is in the lru list to see if that seems sensible.
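
A minimal sketch of how lru_to_page()/list_entry() turn a list pointer back into the struct page (container_of done by hand; the struct below is a cut-down, hypothetical stand-in padded so lru lands at offset 0x20, matching the struct -ox output above):

#include <stddef.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

/* Cut-down stand-in for struct page: just enough fields (plus padding) to
 * put lru at offset 0x20, as in the 'struct -ox page' output above. */
struct page_sketch {
        unsigned long flags;         /* 0x00 */
        void *mapping;               /* 0x08 */
        unsigned long pad[2];        /* 0x10, 0x18 */
        struct list_head lru;        /* 0x20 */
};

#define list_entry(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

int main(void)
{
        /* First entry on l_hold in this dump. */
        struct list_head *entry = (struct list_head *)0xffffea00c49cdd20UL;

        /* list_entry() subtracts the 0x20 offset of lru, which is how the
         * list pointer ffffea00c49cdd20 becomes the struct page at
         * ffffea00c49cdd00 examined next. */
        struct page_sketch *page = list_entry(entry, struct page_sketch, lru);
        printf("page = %p\n", (void *)page);
        return 0;
}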

The next entry in the list looks valid:

crash64> struct page ffffea00c49cdd00
struct page {
  flags = 13510794587668568,
  mapping = 0xffff883b56540001,
  {
    {
      index = 447,
      freelist = 0x1bf,
      pfmemalloc = 191,
      pmd_huge_pte = 0x1bf
    },
    {
      counters = 12884901889,
      {
        {
          _mapcount = {
            counter = 1
          },
          {
            inuse = 1,
            objects = 0,
            frozen = 0
          },
          units = 1
        },
        _count = {
          counter = 3
        }
      }
    }
  },
  {
    lru = {
      next = 0xffffea00c4a00c60,
      prev = 0xffff883f242ebce0
    },
    {
      next = 0xffffea00c4a00c60,
      pages = 607042784,
      pobjects = -30657
    },
    list = {
      next = 0xffffea00c4a00c60,
      prev = 0xffff883f242ebce0
    },
    slab_page = 0xffffea00c4a00c60
  },
  {
    private = 0,
    ptl = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    slab_cache = 0x0,
    first_page = 0x0
  }
}
crash64> struct anon_vma 0xffff883b56540000
struct anon_vma {
  root = 0xffff883b56540000,
  rwsem = {
    count = 0,
    wait_lock = {
      raw_lock = {
        {
          head_tail = 917518,
          tickets = {
            head = 14,
            tail = 14
          }
        }
      }
    },
    wait_list = {
      next = 0xffff883b56540018,
      prev = 0xffff883b56540018
    }
  },
  refcount = {
    counter = 3
  },
  rb_root = {
    rb_node = 0xffff8843edd00b20
  }
}
crash64> kmem 0xffff883b56540000
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff883f7f46e000 anon_vma 56 21581 28288 442 4k
  SLAB MEMORY NODE TOTAL ALLOCATED FREE
  ffffea00ed595000 ffff883b56540000 0 64 58 6
  FREE / [ALLOCATED]
  [ffff883b56540000]

      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea00ed595000 3b56540000 0 ffff883b56540b40 1 2fffff00000080 slab

It's got a mapping pointer that is an anon_vma and is from the anon_vma pool. Compare that to the mapping pointer we failed on, 0xffff880b32303681; after removing the 1 (the anon-vma flag bit) we get this:

crash64> kmem 0xffff880b32303680
      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea002cc8c0c0 b32303000 ffff883ea4d0f941 7f57b2f53 1 2fffff00080068 uptodate,lru,active,swapbacked

That looks like a page in use by something completely different (it's definitely not in the slab). I believe that something else has unmapped the anon_vma before we got to process the page from the LRU list, and the memory has been reused (by something else reclaiming the memory and then using it for something different).

There's only one other CPU doing anything at this time; it's also kswapd and it's also trying to shrink a zone (but the inactive list, not the active list):

PID: 301 TASK: ffff883f25e271c0 CPU: 37 COMMAND: "kswapd1"
 #0 [ffff887f7f445e70] crash_nmi_callback at ffffffff810406c2
 #1 [ffff887f7f445e80] nmi_handle at ffffffff8160d6d9
 #2 [ffff887f7f445ec8] do_nmi at ffffffff8160d7f0
 #3 [ffff887f7f445ef0] end_repeat_nmi at ffffffff8160cb31
    [exception RIP: shrink_page_list+1406]
    RIP: ffffffff8116a06e RSP: ffff883f242ef9f8 RFLAGS: 00000246
    RAX: ffff883f242efb80 RBX: ffff883f242efda0 RCX: 0000000000000000
    RDX: ffffea01f5584a00 RSI: ffffea01f0b040a0 RDI: 0000000000000000
    RBP: ffff883f242efb20 R8: ffff887f7f456a78 R9: ffff88807fbc0ba0
    R10: 0000000000000010 R11: 0000000000000006 R12: ffffea01f5584a20
    R13: ffff883f242efba8 R14: ffffea01f5584a00 R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
 #4 [ffff883f242ef9f8] shrink_page_list at ffffffff8116a06e
 #5 [ffff883f242efb28] shrink_inactive_list at ffffffff8116ac7a
 #6 [ffff883f242efbf0] shrink_lruvec at ffffffff8116b73d
 #7 [ffff883f242efcf0] shrink_zone at ffffffff8116bb76
 #8 [ffff883f242efd48] balance_pgdat at ffffffff8116ce2c
 #9 [ffff883f242efe20] kswapd at ffffffff8116d0f3
#10 [ffff883f242efec8] kthread at ffffffff8109739f
#11 [ffff883f242eff50] ret_from_fork at ffffffff81614d3c

Let's start with the args to balance_pgdat. The arguments marshalled by kswapd are:

0xffffffff8116d0e0 <kswapd+352>: lea -0x64(%rbp),%rdx
0xffffffff8116d0e4 <kswapd+356>: mov %r13d,%esi
0xffffffff8116d0e7 <kswapd+359>: mov %rbx,%rdi
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 3279

And it saves the following registers to the stack:

crash64> dis -l balance_pgdat
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2938
0xffffffff8116c9a0 <balance_pgdat>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8116c9a5 <balance_pgdat+5>: push %rbp
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2943
0xffffffff8116c9a6 <balance_pgdat+6>: mov $0x9,%ecx
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2938
0xffffffff8116c9ab <balance_pgdat+11>: mov %rsp,%rbp
0xffffffff8116c9ae <balance_pgdat+14>: push %r15
0xffffffff8116c9b0 <balance_pgdat+16>: mov %rdi,%r15
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2943
0xffffffff8116c9b3 <balance_pgdat+19>: lea -0x78(%rbp),%rdi
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 2938
0xffffffff8116c9b7 <balance_pgdat+23>: push %r14
0xffffffff8116c9b9 <balance_pgdat+25>: push %r13
0xffffffff8116c9bb <balance_pgdat+27>: push %r12
0xffffffff8116c9bd <balance_pgdat+29>: push %rbx
0xffffffff8116c9be <balance_pgdat+30>: sub $0xa0,%rsp

2936 static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
2937 int *classzone_idx)

Our args are:

RDI pg_data_t *pgdat rbx
RSI int order r13d
RDX int *classzone_idx address of -0x64(%rbp)

 #8 [ffff883f242efd48] balance_pgdat at ffffffff8116ce2c
    ffff883f242efd50: ffff883f25e271c0 ffff883f242efe5c
    ffff883f242efd60: 0000000000000000 ffff883f25e271c0
    ffff883f242efd70: 0000000003d029aa ffff883f242efe60
    ffff883f242efd80: 000000000100f080 0000000000000000
    ffff883f242efd90: 00000000000000d0 0000000000000000
    ffff883f242efda0: 00000000000038c0 0000000000000000
    ffff883f242efdb0: 0000000000060192 0000000000000000
    ffff883f242efdc0: 00000001000000d0 0000000100000001
    ffff883f242efdd0: 0000000800000000 0000000000000000
    ffff883f242efde0: 0000000000000000 000000004cdf9afd
    ffff883f242efdf0: rbx ffff88807ffd5000 r12 0000000000000002
    ffff883f242efe00: r13 0000000000000000 r14 ffff883f242efe80
    ffff883f242efe10: r15 0000000000000002 rbp ffff883f242efec0
    ffff883f242efe20: rip ffffffff8116d0f3
 #9 [ffff883f242efe20] kswapd at ffffffff8116d0f3

crash64> p/x 0xffff883f242efec0-0x64
$1 = 0xffff883f242efe5c
crash64> x/wx $1
0xffff883f242efe5c: 0x00000002

That gives us:

RDI pg_data_t *pgdat ffff88807ffd5000
RSI int order 0
RDX int *classzone_idx address of -0x64(%rbp) 0xffff883f242efe5c
                    dereferenced gives us 2 (ZONE_NORMAL)

This is for node 1:

crash64> pg_data_t ffff88807ffd5000|grep node_id
  node_id = 1,

The other thread (kswapd0) is looking at memory on the other node:

crash64> pg_data_t ffff88407ffd9000|grep node_id
  node_id = 0,

The two of them shouldn't be interfering with each other.

Is there anyone out there aware of any issue that would explain why we suddenly have a bad mapping pointer to a struct anon_vma? As far as I can tell, the code handling the freeing of anon_vmas hasn't changed much upstream compared to what is in this kernel.

sms1123   2016-01-27 00:28   reporter   ~0025528

Other interesting things about the bug:

1) It happens with or without THPs enabled
2) It's not just a null pointer dereference (as mentioned by the reporter of the bug). I've seen the bug show up as a GPF as well. The key thing is that the struct page has a mapping pointer with the low bit set to say it's an anon_vma, but the memory it points to is not allocated from the anon_vma kmem cache. The crash is somewhat random because it depends on what is on the page that the mapping pointer refers to.
3) The same issue has been seen upstream in RHEL here:

https://bugzilla.redhat.com/show_bug.cgi?id=1091830

(Look towards the end for the final panic after the OOM.) It wasn't related to a customer call (internal usage), so it doesn't appear to have been investigated in any depth.

It appears to be some kind of race in page_lock_anon_vma_read with something else.

sms1123   2016-01-27 00:44   reporter   ~0025529

A different dump (still the same kernel version the bug is reported with) but this time with THP enabled:

crash64> bt
PID: 15951 TASK: ffff887ece808b60 CPU: 30 COMMAND: "vertica"
 #0 [ffff887eda68b4a0] machine_kexec at ffffffff8104c6a1
 #1 [ffff887eda68b4f8] crash_kexec at ffffffff810e2252
 #2 [ffff887eda68b5c8] oops_end at ffffffff8160d548
 #3 [ffff887eda68b5f0] no_context at ffffffff815fdf52
 #4 [ffff887eda68b640] __bad_area_nosemaphore at ffffffff815fdfe8
 #5 [ffff887eda68b688] bad_area_nosemaphore at ffffffff815fe152
 #6 [ffff887eda68b698] __do_page_fault at ffffffff816103ae
 #7 [ffff887eda68b798] do_page_fault at ffffffff816105ca
 #8 [ffff887eda68b7c0] page_fault at ffffffff8160c7c8
    [exception RIP: down_read_trylock+9]
    RIP: ffffffff8109c389 RSP: ffff887eda68b870 RFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff8821bd6906c0 RCX: ffff8821bd6906c0
    RDX: 0000000000000001 RSI: 0000000000000301 RDI: fffffffffffffe08
    RBP: ffff887eda68b870 R8: 00000000fffffe7f R9: ffff8821bd6906c0
    R10: ffff88807ffd6000 R11: 0000000000000017 R12: ffff8821bd6906c1
    R13: ffffea0153e09ec0 R14: fffffffffffffe08 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
 #9 [ffff887eda68b878] page_lock_anon_vma_read at ffffffff8118e245
#10 [ffff887eda68b8a8] try_to_unmap_anon at ffffffff8118e671
#11 [ffff887eda68b8f8] try_to_unmap at ffffffff8118e7bd
#12 [ffff887eda68b910] migrate_pages at ffffffff811b1e2b
#13 [ffff887eda68b9b0] compact_zone at ffffffff8117aff9
#14 [ffff887eda68ba00] compact_zone_order at ffffffff8117b1fc
#15 [ffff887eda68baa8] try_to_compact_pages at ffffffff8117b5b1
#16 [ffff887eda68bb08] __alloc_pages_direct_compact at ffffffff81600286
#17 [ffff887eda68bb68] __alloc_pages_nodemask at ffffffff81160b98
#18 [ffff887eda68bca0] alloc_pages_vma at ffffffff811a2a2a
#19 [ffff887eda68bd08] do_huge_pmd_wp_page at ffffffff811b77d8
#20 [ffff887eda68bd98] handle_mm_fault at ffffffff81182b64
#21 [ffff887eda68be28] __do_page_fault at ffffffff816101c6
#22 [ffff887eda68bf28] do_page_fault at ffffffff816105ca
#23 [ffff887eda68bf50] page_fault at ffffffff8160c7c8
    RIP: 0000000000c97926 RSP: 00007f5421151fb0 RFLAGS: 00010246
    RAX: 000000000029087a RBX: 000000000000086a RCX: 0000000000000000
    RDX: 00007ec2f89caba4 RSI: 0000000000003f8a RDI: 40000a514420ef4a
    RBP: 00007f5421152140 R8: 00007ec8eb2da260 R9: 00000000003fffff
    R10: 000000000000fbe8 R11: 0000000000003f90 R12: 00007f3d9fe318a0
    R13: 00007eb8c81fd010 R14: 00000000000007f1 R15: 00007f5421154aa0
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

Let's see what is going on in the caller:

0438 /*
0439 * Similar to page_get_anon_vma() except it locks the anon_vma.
0440 *
0441 * Its a little more complex as it tries to keep the fast path to a single
0442 * atomic op -- the trylock. If we fail the trylock, we fall back to getting a
0443 * reference like with page_get_anon_vma() and then block on the mutex.
0444 */
0445 struct anon_vma *page_lock_anon_vma_read(struct page *page)
0446 {
0447 struct anon_vma *anon_vma = NULL;
0448 struct anon_vma *root_anon_vma;
0449 unsigned long anon_mapping;
0450
0451 rcu_read_lock();
0452 anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
0453 if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
0454 goto out;
0455 if (!page_mapped(page))
0456 goto out;
0457
0458 anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
0459 root_anon_vma = ACCESS_ONCE(anon_vma->root);
0460 if (down_read_trylock(&root_anon_vma->rwsem)) {
0461 /*
0462 * If the page is still mapped, then this anon_vma is still
0463 * its anon_vma, and holding the mutex ensures that it will
0464 * not go away, see anon_vma_free().
0465 */
0466 if (!page_mapped(page)) {
0467 up_read(&root_anon_vma->rwsem);
0468 anon_vma = NULL;
0469 }
0470 goto out;
0471 }
0472
0473 /* trylock failed, we got to sleep */
0474 if (!atomic_inc_not_zero(&anon_vma->refcount)) {
0475 anon_vma = NULL;
0476 goto out;
0477 }
0478
0479 if (!page_mapped(page)) {
0480 put_anon_vma(anon_vma);
0481 anon_vma = NULL;
0482 goto out;
0483 }
0484
0485 /* we pinned the anon_vma, its safe to sleep */
0486 rcu_read_unlock();
0487 anon_vma_lock_read(anon_vma);
0488
0489 if (atomic_dec_and_test(&anon_vma->refcount)) {
0490 /*
0491 * Oops, we held the last refcount, release the lock
0492 * and bail -- can't simply use put_anon_vma() because
0493 * we'll deadlock on the anon_vma_lock_write() recursion.
0494 */
0495 anon_vma_unlock_read(anon_vma);
0496 __put_anon_vma(anon_vma);
0497 anon_vma = NULL;
0498 }
0499
0500 return anon_vma;
0501
0502 out:
0503 rcu_read_unlock();
0504 return anon_vma;
0505 }

In the disassembled output of page_lock_anon_vma_read (not shown) rdi is moved to r13 and not modified.

So let's look at down_read_trylock and see whether r13 gets saved to the stack or not:

crash64> dis -l down_read_trylock
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/rwsem.c: 32
0xffffffff8109c380 <down_read_trylock>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8109c385 <down_read_trylock+5>: push %rbp
0xffffffff8109c386 <down_read_trylock+6>: mov %rsp,%rbp
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/rwsem.h: 83
0xffffffff8109c389 <down_read_trylock+9>: mov (%rdi),%rax
0xffffffff8109c38c <down_read_trylock+12>: mov %rax,%rdx
0xffffffff8109c38f <down_read_trylock+15>: add $0x1,%rdx
0xffffffff8109c393 <down_read_trylock+19>: jle 0xffffffff8109c39c <down_read_trylock+28>
0xffffffff8109c395 <down_read_trylock+21>: lock cmpxchg %rdx,(%rdi)
0xffffffff8109c39a <down_read_trylock+26>: jne 0xffffffff8109c38c <down_read_trylock+12>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/rwsem.h: 96
0xffffffff8109c39c <down_read_trylock+28>: not %rax
0xffffffff8109c39f <down_read_trylock+31>: shr $0x3f,%rax
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/rwsem.c: 38
0xffffffff8109c3a3 <down_read_trylock+35>: pop %rbp
0xffffffff8109c3a4 <down_read_trylock+36>: retq
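
As an aside, a C rendering of that fast path (reconstructed from the disassembly above, purely illustrative rather than the kernel source) shows why the very first instruction dereferences the rwsem pointer, and hence why a bogus anon_vma pointer faults straight away at down_read_trylock+9:

#include <stdbool.h>

struct rw_semaphore_sketch { long count; };

/* Sketch of the fast path disassembled above: returns true if a reader
 * slot was taken, false if the count was negative (writer active). */
static bool down_read_trylock_sketch(struct rw_semaphore_sketch *sem)
{
        long old = sem->count;          /* mov (%rdi),%rax -- the +9 that faults */

        while (old + 1 > 0) {           /* jle: give up once the count is negative */
                /* lock cmpxchg %rdx,(%rdi): atomically bump the reader count */
                if (__sync_bool_compare_and_swap(&sem->count, old, old + 1))
                        return true;    /* not; shr $0x3f on the non-negative old */
                old = sem->count;       /* somebody raced us; retry with the new value */
        }
        return false;
}

int main(void)
{
        struct rw_semaphore_sketch sem = { 0 };          /* unlocked */
        return down_read_trylock_sketch(&sem) ? 0 : 1;   /* takes the read lock */
}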

down_read_trylock doesn't touch r13, so we can get the struct page from the register context in the stack trace. This is our struct page:

crash64> struct page ffffea0153e09ec0
struct page {
  flags = 31525193097150537,
  mapping = 0xffff8821bd6906c1,
  {
    {
      index = 34125827102,
      freelist = 0x7f20ecc1e,
      pfmemalloc = 30,
      pmd_huge_pte = 0x7f20ecc1e
    },
    {
      counters = 8589934592,
      {
        {
          _mapcount = {
            counter = 0
          },
          {
            inuse = 0,
            objects = 0,
            frozen = 0
          },
          units = 0
        },
        _count = {
          counter = 2
        }
      }
    }
  },
  {
    lru = {
      next = 0xffffea0153e09ea0,
      prev = 0xffff887eda68ba20
    },
    {
      next = 0xffffea0153e09ea0,
      pages = -630670816,
      pobjects = -30594
    },
    list = {
      next = 0xffffea0153e09ea0,
      prev = 0xffff887eda68ba20
    },
    slab_page = 0xffffea0153e09ea0
  },
  {
    private = 0,
    ptl = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    slab_cache = 0x0,
    first_page = 0x0
  }
}
crash64> kmem ffffea0153e09ec0
      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0153e09ec0 54f827b000 ffff8821bd6906c1 7f20ecc1e 2 6fffff00080049 locked,uptodate,active,swapbacked

Mixing the data and the source:

0438 /*
0439 * Similar to page_get_anon_vma() except it locks the anon_vma.
0440 *
0441 * Its a little more complex as it tries to keep the fast path to a single
0442 * atomic op -- the trylock. If we fail the trylock, we fall back to getting a
0443 * reference like with page_get_anon_vma() and then block on the mutex.
0444 */
0445 struct anon_vma *page_lock_anon_vma_read(struct page *page)
0446 {
0447 struct anon_vma *anon_vma = NULL;
0448 struct anon_vma *root_anon_vma;
0449 unsigned long anon_mapping;
0450
0451 rcu_read_lock();
0452 anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);

anon_mapping is 0xffff8821bd6906c1

0453 if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
0454 goto out;

This is true so we don't go to out (PAGE_MAPPING_ANON is 1):

crash64> p 0xffff8821bd6906c1&(1|2)
$2 = 1

0455 if (!page_mapped(page))
0456 goto out;

This is a test for the count in _mapcount to be >=0 and it is.

0457
0458 anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);

anon_vma is 0xffff8821bd6906c0

Unfortunately the address is not present in the dump:

crash64> struct anon_vma 0xffff8821bd6906c0
struct anon_vma struct: page excluded: kernel virtual address: ffff8821bd6906c0 type: "gdb_readmem_callback"
Cannot access memory at address 0xffff8821bd6906c0

It is the correct address that has been loaded, though, as we can see from this instruction:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 458
0xffffffff8118e234 <page_lock_anon_vma_read+68>: lea -0x1(%r12),%rbx

That means rbx should hold the value of anon_vma, and it does, since down_read_trylock doesn't modify rbx. There's still nothing to explain why ->mapping is set to the value it is, though.

Looking at the first dump analysis, it would appear that we have a bad pointer. If it was something allocated by the kernel (and not yet freed) we would expect kmem to have said it was from the anon_vma cache. If it had been freed, and everything else on that page had also been freed, the page could have been reclaimed; that could explain why it had been reused, though I've no idea if that is actually what is happening.
sms1123

2016-01-27 00:52

reporter   ~0025530

A third dump - similar again, but this time we get a GPF (similar to the upstream bug referred to in a previous note):

crash64> bt
PID: 16939 TASK: ffff8843570b5b00 CPU: 3 COMMAND: "vertica"
 #0 [ffff8849f6f1f610] machine_kexec at ffffffff8104c6a1
 #1 [ffff8849f6f1f668] crash_kexec at ffffffff810e2252
 #2 [ffff8849f6f1f738] oops_end at ffffffff8160d548
 #3 [ffff8849f6f1f760] die at ffffffff810173eb
 #4 [ffff8849f6f1f790] do_general_protection at ffffffff8160ce4e
 #5 [ffff8849f6f1f7c0] general_protection at ffffffff8160c768
    [exception RIP: down_read_trylock+9]
    RIP: ffffffff8109c389 RSP: ffff8849f6f1f870 RFLAGS: 00010206
    RAX: 0000000000000000 RBX: ffff8843a01c0ac0 RCX: ffff8843a01c0ac0
    RDX: 0000000000000001 RSI: 0000000000000301 RDI: 353338353931633f
    RBP: ffff8849f6f1f870 R8: 0000000033356461 R9: ffff8843a01c0ac0
    R10: ffff88807ffd6000 R11: 0000000000000017 R12: ffff8843a01c0ac1
    R13: ffffea0105a09680 R14: 353338353931633f R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
 #6 [ffff8849f6f1f878] page_lock_anon_vma_read at ffffffff8118e245
 #7 [ffff8849f6f1f8a8] try_to_unmap_anon at ffffffff8118e671
 #8 [ffff8849f6f1f8f8] try_to_unmap at ffffffff8118e7bd
 #9 [ffff8849f6f1f910] migrate_pages at ffffffff811b1e2b
#10 [ffff8849f6f1f9b0] compact_zone at ffffffff8117aff9
#11 [ffff8849f6f1fa00] compact_zone_order at ffffffff8117b1fc
#12 [ffff8849f6f1faa8] try_to_compact_pages at ffffffff8117b5b1
#13 [ffff8849f6f1fb08] __alloc_pages_direct_compact at ffffffff81600286
#14 [ffff8849f6f1fb68] __alloc_pages_nodemask at ffffffff81160b98
#15 [ffff8849f6f1fca0] alloc_pages_vma at ffffffff811a2a2a
#16 [ffff8849f6f1fd08] do_huge_pmd_wp_page at ffffffff811b77d8
#17 [ffff8849f6f1fd98] handle_mm_fault at ffffffff81182b64
#18 [ffff8849f6f1fe28] __do_page_fault at ffffffff816101c6
#19 [ffff8849f6f1ff28] do_page_fault at ffffffff816105ca
#20 [ffff8849f6f1ff50] page_fault at ffffffff8160c7c8
    RIP: 0000000000c97926 RSP: 00007fa8aa6503a0 RFLAGS: 00010246
    RAX: 00000000002c48f3 RBX: 00000000000086bd RCX: 0000000000000000
    RDX: 00007f77207547a6 RSI: 0000000000003f8a RDI: 40000685fd6c52e4
    RBP: 00007fa8aa650530 R8: 00007f7eec166ba8 R9: 00000000003fffff
    R10: 00000000000bad6c R11: 0000000000003f90 R12: 00007fcd4b5f2630
    R13: 00007f6c213fe010 R14: 00000000000007f1 R15: 00007fa8aa652e90
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

We can bypass a lot of what we had to do the first time around and just jump directly to this:

0438 /*
0439 * Similar to page_get_anon_vma() except it locks the anon_vma.
0440 *
0441 * Its a little more complex as it tries to keep the fast path to a single
0442 * atomic op -- the trylock. If we fail the trylock, we fall back to getting a
0443 * reference like with page_get_anon_vma() and then block on the mutex.
0444 */
0445 struct anon_vma *page_lock_anon_vma_read(struct page *page)
0446 {

The struct page is in r13 and that's ffffea0105a09680

0447 struct anon_vma *anon_vma = NULL;
0448 struct anon_vma *root_anon_vma;
0449 unsigned long anon_mapping;
0450
0451 rcu_read_lock();
0452 anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);

anon_mapping is 0xffff8843a01c0ac1

0453 if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
0454 goto out;

The result equals PAGE_MAPPING_ANON (1), so the condition is false and we don't go to out:

crash64> p 0xffff8843a01c0ac1&(1|2)
$2 = 1

0455 if (!page_mapped(page))
0456 goto out;

This is a test that the count in _mapcount is >= 0, and it is.

0457
0458 anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);

anon_vma is 0xffff8843a01c0ac0

Again the address is not present in the dump:

crash64> struct anon_vma 0xffff8843a01c0ac0
struct anon_vma struct: page excluded: kernel virtual address: ffff8843a01c0ac0 type: "gdb_readmem_callback"
Cannot access memory at address 0xffff8843a01c0ac0

We can see from this instruction, though, that the correct address was loaded:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 458
0xffffffff8118e234 <page_lock_anon_vma_read+68>: lea -0x1(%r12),%rbx

rbx should hold the value of anon_vma, and it does, since down_read_trylock doesn't modify rbx. There's still nothing to explain why ->mapping is set to the value it is, though.

0459 root_anon_vma = ACCESS_ONCE(anon_vma->root);
0460 if (down_read_trylock(&root_anon_vma->rwsem)) {
0461 /*
0462 * If the page is still mapped, then this anon_vma is still
0463 * its anon_vma, and holding the mutex ensures that it will
0464 * not go away, see anon_vma_free().
0465 */
0466 if (!page_mapped(page)) {
0467 up_read(&root_anon_vma->rwsem);
0468 anon_vma = NULL;
0469 }
0470 goto out;
0471 }
0472
0473 /* trylock failed, we got to sleep */
0474 if (!atomic_inc_not_zero(&anon_vma->refcount)) {
0475 anon_vma = NULL;
0476 goto out;
0477 }
0478
0479 if (!page_mapped(page)) {
0480 put_anon_vma(anon_vma);
0481 anon_vma = NULL;
0482 goto out;
0483 }
0484
0485 /* we pinned the anon_vma, its safe to sleep */
0486 rcu_read_unlock();
0487 anon_vma_lock_read(anon_vma);
0488
0489 if (atomic_dec_and_test(&anon_vma->refcount)) {
0490 /*
0491 * Oops, we held the last refcount, release the lock
0492 * and bail -- can't simply use put_anon_vma() because
0493 * we'll deadlock on the anon_vma_lock_write() recursion.
0494 */
0495 anon_vma_unlock_read(anon_vma);
0496 __put_anon_vma(anon_vma);
0497 anon_vma = NULL;
0498 }
0499
0500 return anon_vma;
0501
0502 out:
0503 rcu_read_unlock();
0504 return anon_vma;
0505 }

When we called down_read_trylock, the value of rdi this time was 353338353931633f. Compare that with the last dump, where rdi was fffffffffffffe08 - an invalid address, but one that is still in the kernel virtual address space, so it counted as an "unable to handle kernel paging request" type of error. This time it's obvious that the value in rdi is not an address the kernel should be using at all: it's actually ASCII text ("538591c?"), so the mapping member of the struct page is pointing at something it shouldn't be pointing to.
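
Decoding the rdi value byte by byte makes that obvious (plain ASCII lookup):

0x35 '5'  0x33 '3'  0x38 '8'  0x35 '5'  0x39 '9'  0x31 '1'  0x63 'c'  0x3f '?'

Read from the low byte upwards as it would sit in memory, that's "?c195835" - either way it's text, not a pointer.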
sms1123

2016-01-27 01:07

reporter   ~0025531

Next dump (THP enabled in this one):

crash64> bt
PID: 15199 TASK: ffff88572574a220 CPU: 1 COMMAND: "vertica"
 #0 [ffff88725b71f400] machine_kexec at ffffffff8104c6a1
 #1 [ffff88725b71f458] crash_kexec at ffffffff810e2252
 #2 [ffff88725b71f528] oops_end at ffffffff8160d548
 #3 [ffff88725b71f550] no_context at ffffffff815fdf52
 #4 [ffff88725b71f5a0] __bad_area_nosemaphore at ffffffff815fdfe8
 #5 [ffff88725b71f5e8] bad_area_nosemaphore at ffffffff815fe152
 #6 [ffff88725b71f5f8] __do_page_fault at ffffffff816103ae
 #7 [ffff88725b71f6f8] do_page_fault at ffffffff816105ca
 #8 [ffff88725b71f720] page_fault at ffffffff8160c7c8
    [exception RIP: down_read_trylock+9]
    RIP: ffffffff8109c389 RSP: ffff88725b71f7d0 RFLAGS: 00010202
    RAX: 0000000000000000 RBX: ffff881e84f50ec0 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000008
    RBP: ffff88725b71f7d0 R8: ffffea0191f314e0 R9: ffff883f24ca9098
    R10: ffffea00fc038800 R11: ffffffff812d4e39 R12: ffff881e84f50ec1
    R13: ffffea0191f314c0 R14: 0000000000000008 R15: ffffea0191f314c0
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
 #9 [ffff88725b71f7d8] page_lock_anon_vma_read at ffffffff8118e245
#10 [ffff88725b71f808] page_referenced at ffffffff8118e4c7
#11 [ffff88725b71f880] shrink_active_list at ffffffff8116b1cc
#12 [ffff88725b71f938] shrink_lruvec at ffffffff8116b889
#13 [ffff88725b71fa38] shrink_zone at ffffffff8116bb76
#14 [ffff88725b71fa90] do_try_to_free_pages at ffffffff8116c080
#15 [ffff88725b71fb08] try_to_free_pages at ffffffff8116c56c
#16 [ffff88725b71fba0] __alloc_pages_nodemask at ffffffff81160c0d
#17 [ffff88725b71fcd8] alloc_pages_vma at ffffffff811a2a2a
#18 [ffff88725b71fd40] do_huge_pmd_anonymous_page at ffffffff811b6deb
#19 [ffff88725b71fd98] handle_mm_fault at ffffffff81182794
#20 [ffff88725b71fe28] __do_page_fault at ffffffff816101c6
#21 [ffff88725b71ff28] do_page_fault at ffffffff816105ca
#22 [ffff88725b71ff50] page_fault at ffffffff8160c7c8
    RIP: 000000000229a610 RSP: 00007ee38d57e030 RFLAGS: 00010206
    RAX: 00007eb93481c330 RBX: 0000000000010000 RCX: 00007eb93481c330
    RDX: 0000000000000005 RSI: 0000000011fbda48 RDI: 0000000000000137
    RBP: 00007ee38d57e050 R8: 0000000000000000 R9: 000000000045f2d8
    R10: 0000000000000000 R11: 00007eb93481c330 R12: 00007eb93481c330
    R13: 000000000000002a R14: 00007ef11cda5e40 R15: 00007ef7876325e0
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

The issue is definitely the same as the previous dumps:

crash64> struct page ffffea0191f314c0
struct page {
  flags = 31525193097150536,
  mapping = 0xffff881e84f50ec1,
  {
    {
      index = 34162383414,
      freelist = 0x7f43c9a36,
      pfmemalloc = 54,
      pmd_huge_pte = 0x7f43c9a36
    },
    {
      counters = 8589934592,
      {
        {
          _mapcount = {
            counter = 0
          },
          {
            inuse = 0,
            objects = 0,
            frozen = 0
          },
          units = 0
        },
        _count = {
          counter = 2
        }
      }
    }
  },
  {
    lru = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
    {
      next = 0xdead000000100100,
      pages = 2097664,
      pobjects = -559087616
    },
    list = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
    slab_page = 0xdead000000100100
  },
  {
    private = 0,
    ptl = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    slab_cache = 0x0,
    first_page = 0x0
  }
}

And in the full dump, the memory the mapping pointer points to appears to be all zeros:

crash64> rd -64 0xffff881e84f50ec0 16
ffff881e84f50ec0: 0000000000000000 0000000000000000 ................
ffff881e84f50ed0: 0000000000000000 0000000000000000 ................
ffff881e84f50ee0: 0000000000000000 0000000000000000 ................
ffff881e84f50ef0: 0000000000000000 0000000000000000 ................
ffff881e84f50f00: 0000000000000000 0000000000000000 ................
ffff881e84f50f10: 0000000000000000 0000000000000000 ................
ffff881e84f50f20: 0000000000000000 0000000000000000 ................
ffff881e84f50f30: 0000000000000000 0000000000000000 ................

Let's try and see how we actually referenced this struct page.

The only argument into page_lock_anon_vma_read is a struct page, so we need to look at page_referenced to see how it got that struct page.

0845 int page_referenced(struct page *page,
0846 int is_locked,
0847 struct mem_cgroup *memcg,
0848 unsigned long *vm_flags)
0849 {
0850 int referenced = 0;
0851 int we_locked = 0;
0852
0853 *vm_flags = 0;
0854 if (page_mapped(page) && page_rmapping(page)) {
0855 if (!is_locked && (!PageAnon(page) || PageKsm(page))) {
0856 we_locked = trylock_page(page);
0857 if (!we_locked) {
0858 referenced++;
0859 goto out;
0860 }
0861 }
0862 if (unlikely(PageKsm(page)))
0863 referenced += page_referenced_ksm(page, memcg,
0864 vm_flags);
0865 else if (PageAnon(page))
0866 referenced += page_referenced_anon(page, memcg,
0867 vm_flags);
0868 else if (page->mapping)
0869 referenced += page_referenced_file(page, memcg,
0870 vm_flags);
0871 if (we_locked)
0872 unlock_page(page);
0873
0874 if (page_test_and_clear_young(page_to_pfn(page)))
0875 referenced++;
0876 }
0877 out:
0878 return referenced;
0879 }

The call into page_lock_anon_vma_read comes from page_referenced_anon:

0734 static int page_referenced_anon(struct page *page,
0735 struct mem_cgroup *memcg,
0736 unsigned long *vm_flags)
0737 {
0738 unsigned int mapcount;
0739 struct anon_vma *anon_vma;
0740 pgoff_t pgoff;
0741 struct anon_vma_chain *avc;
0742 int referenced = 0;
0743
0744 anon_vma = page_lock_anon_vma_read(page);
0745 if (!anon_vma)
0746 return referenced;
0747
0748 mapcount = page_mapcount(page);
0749 pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
0750 anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
0751 struct vm_area_struct *vma = avc->vma;
0752 unsigned long address = vma_address(page, vma);
0753 /*
0754 * If we are reclaiming on behalf of a cgroup, skip
0755 * counting on behalf of references from different
0756 * cgroups
0757 */
0758 if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
0759 continue;
0760 referenced += page_referenced_one(page, vma, address,
0761 &mapcount, vm_flags);
0762 if (!mapcount)
0763 break;
0764 }
0765
0766 page_unlock_anon_vma_read(anon_vma);
0767 return referenced;
0768 }

The struct page comes from the calling function. We do need to check one thing on the page though (PG_locked in the flags of the struct page):

0359 static inline int trylock_page(struct page *page)
0360 {
0361 return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
0362 }

The flags are at this point:

crash64> p/x 31525193097150536
$2 = 0x6fffff00080048
crash64> p/t 31525193097150536
$4 = 1101111111111111111111100000000000010000000000001001000
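
Picking that value apart against the pageflags bit positions in this kernel (bit numbers quoted from memory, so treat this as a reference sketch; the high 0x6fffff00... part is the section/node/zone encoding, not flag bits):

0x6fffff00080048 & (1 << 0)  == 0    PG_locked is clear
0x6fffff00080048 & (1 << 3)  != 0    PG_uptodate set
0x6fffff00080048 & (1 << 6)  != 0    PG_active set
0x6fffff00080048 & (1 << 19) != 0    PG_swapbacked set

So uptodate, active and swapbacked are set but PG_locked is not, which is what we are trying to explain below.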

Assuming that someone hasn't destroyed the struct page recently, we need to determine the value of is_locked to explain why PG_locked wasn't set on the page. Unfortunately esi (the 32-bit version of rsi) is tested and discarded rather than saved, so only the caller knows what the value was. Let's check shrink_active_list then:

1594 static void shrink_active_list(unsigned long nr_to_scan,
1595 struct lruvec *lruvec,
1596 struct scan_control *sc,
1597 enum lru_list lru)
1598 {
1599 unsigned long nr_taken;
1600 unsigned long nr_scanned;
1601 unsigned long vm_flags;
1602 LIST_HEAD(l_hold); /* The pages which were snipped off */
1603 LIST_HEAD(l_active);
1604 LIST_HEAD(l_inactive);
1605 struct page *page;
1606 struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
1607 unsigned long nr_rotated = 0;
1608 isolate_mode_t isolate_mode = 0;
1609 int file = is_file_lru(lru);
1610 struct zone *zone = lruvec_zone(lruvec);
1611
1612 lru_add_drain();
1613
1614 if (!sc->may_unmap)
1615 isolate_mode |= ISOLATE_UNMAPPED;
1616 if (!sc->may_writepage)
1617 isolate_mode |= ISOLATE_CLEAN;
1618
1619 spin_lock_irq(&zone->lru_lock);
1620
1621 nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold,
1622 &nr_scanned, sc, isolate_mode, lru);
1623 if (global_reclaim(sc))
1624 zone->pages_scanned += nr_scanned;
1625
1626 reclaim_stat->recent_scanned[file] += nr_taken;
1627
1628 __count_zone_vm_events(PGREFILL, zone, nr_scanned);
1629 __mod_zone_page_state(zone, NR_LRU_BASE + lru, -nr_taken);
1630 __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
1631 spin_unlock_irq(&zone->lru_lock);
1632
1633 while (!list_empty(&l_hold)) {
1634 cond_resched();
1635 page = lru_to_page(&l_hold);
1636 list_del(&page->lru);
1637
1638 if (unlikely(!page_evictable(page))) {
1639 putback_lru_page(page);
1640 continue;
1641 }
1642
1643 if (unlikely(buffer_heads_over_limit)) {
1644 if (page_has_private(page) && trylock_page(page)) {
1645 if (page_has_private(page))
1646 try_to_release_page(page, 0);
1647 unlock_page(page);
1648 }
1649 }
1650
1651 if (page_referenced(page, 0, sc->target_mem_cgroup,
1652 &vm_flags)) {
1653 nr_rotated += hpage_nr_pages(page);
1654 /*
1655 * Identify referenced, file-backed active pages and
1656 * give them one more trip around the active list. So
1657 * that executable code get better chances to stay in
1658 * memory under moderate memory pressure. Anon pages
1659 * are not likely to be evicted by use-once streaming
1660 * IO, plus JVM can create lots of anon VM_EXEC pages,
1661 * so we ignore them here.
1662 */
1663 if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
1664 list_add(&page->lru, &l_active);
1665 continue;
1666 }
1667 }
1668
1669 ClearPageActive(page); /* we are de-activating */
1670 list_add(&page->lru, &l_inactive);
1671 }
1672
1673 /*
1674 * Move pages back to the lru list.
1675 */
1676 spin_lock_irq(&zone->lru_lock);
1677 /*
1678 * Count referenced pages from currently used mappings as rotated,
1679 * even though only some of them are actually re-activated. This
1680 * helps balance scan pressure between file and anonymous pages in
1681 * get_scan_ratio.
1682 */
1683 reclaim_stat->recent_rotated[file] += nr_rotated;
1684
1685 move_active_pages_to_lru(lruvec, &l_active, &l_hold, lru);
1686 move_active_pages_to_lru(lruvec, &l_inactive, &l_hold, lru - LRU_ACTIVE);
1687 __mod_zone_page_state(zone, NR_ISOLATED_ANON + file, -nr_taken);
1688 spin_unlock_irq(&zone->lru_lock);
1689
1690 free_hot_cold_page_list(&l_hold, true);
1691 }

Let's work out where we were when we called page_referenced. We are here:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 1651
0xffffffff8116b1ba <shrink_active_list+458>: mov 0x38(%r14),%rdx
0xffffffff8116b1be <shrink_active_list+462>: lea -0x68(%rbp),%rcx
0xffffffff8116b1c2 <shrink_active_list+466>: xor %esi,%esi
0xffffffff8116b1c4 <shrink_active_list+468>: mov %r15,%rdi
0xffffffff8116b1c7 <shrink_active_list+471>: callq 0xffffffff8118e300 <page_referenced>
0xffffffff8116b1cc <shrink_active_list+476>: test %eax,%eax
0xffffffff8116b1ce <shrink_active_list+478>: je 0xffffffff8116b168 <shrink_active_list+376>

1651 if (page_referenced(page, 0, sc->target_mem_cgroup, <<<<<<<<<
1652 &vm_flags)) { <<<<<<<<<
1653 nr_rotated += hpage_nr_pages(page);
1654 /*
1655 * Identify referenced, file-backed active pages and
1656 * give them one more trip around the active list. So
1657 * that executable code get better chances to stay in
1658 * memory under moderate memory pressure. Anon pages
1659 * are not likely to be evicted by use-once streaming
1660 * IO, plus JVM can create lots of anon VM_EXEC pages,
1661 * so we ignore them here.
1662 */
1663 if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
1664 list_add(&page->lru, &l_active);
1665 continue;
1666 }
1667 }

So in page_referenced we save the following registers:

crash64> dis -l page_referenced
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 849
0xffffffff8118e300 <page_referenced>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8118e305 <page_referenced+5>: push %rbp
0xffffffff8118e306 <page_referenced+6>: mov %rcx,%rax
0xffffffff8118e309 <page_referenced+9>: mov %rsp,%rbp
0xffffffff8118e30c <page_referenced+12>: push %r15
0xffffffff8118e30e <page_referenced+14>: push %r14
0xffffffff8118e310 <page_referenced+16>: push %r13
0xffffffff8118e312 <page_referenced+18>: push %r12
0xffffffff8118e314 <page_referenced+20>: push %rbx
0xffffffff8118e315 <page_referenced+21>: mov %rdi,%rbx
0xffffffff8118e318 <page_referenced+24>: sub $0x40,%rsp
0xffffffff8118e31c <page_referenced+28>: mov %rcx,-0x58(%rbp)

And the stack frame looks like:

#10 [ffff88725b71f808] page_referenced at ffffffff8118e4c7
    ffff88725b71f810: ffff883f1f3fb880 0000000000002688
    ffff88725b71f820: ffff88725b71f8c8 0000000100000000
    ffff88725b71f830: 0000000000001adc 0000000001adc000
    ffff88725b71f840: 0000000000000016 00000000d48be03d
    ffff88725b71f850: rbx ffffea0191f314e0 r12 ffff88725b71f8d0
    ffff88725b71f860: r13 0000000000000010 r14 ffff88725b71fb20
    ffff88725b71f870: r15 ffffea0191f314c0 rbp ffff88725b71f930
    ffff88725b71f880: rip ffffffff8116b1cc
#11 [ffff88725b71f880] shrink_active_list at ffffffff8116b1cc

The 4 args passed into page_referenced are in these registers:

RDI struct page *page rdi is r15 is 0xffffea0191f314c0
RSI int is_locked 0 (the call site passes a literal 0 - see the xor %esi,%esi above)
RDX struct mem_cgroup *memcg rdx is *(r14+0x38) 0x0000000000000000
RCX unsigned long *vm_flags rcx is rbp-0x68 0xffff88725b71f8c8 (dereferenced is 0)

crash64> p/x 0xffff88725b71f930-0x68
$5 = 0xffff88725b71f8c8
crash64> x/gx 0xffff88725b71f8c8
0xffff88725b71f8c8: 0x0000000000000000
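
For anyone reconstructing these themselves: the mapping of arguments to registers is just the x86-64 SysV calling convention,

arg1 -> rdi, arg2 -> rsi, arg3 -> rdx, arg4 -> rcx, arg5 -> r8, arg6 -> r9

and the values are recovered from the callee-saved registers (rbx, r12-r15) pushed in the page_referenced prologue shown above, since the argument registers themselves are not preserved across the call.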

The struct page saved here matches the one we are using later up the stack. That means the page must have been on the LRU list and we took it by calling isolate_lru_pages, to sort out where, if anywhere, it should be moved to (inactive, active, etc). It also explains why most of the list pointers are poisoned: it has been removed from all lists. It's not currently on the l_hold list it was added to either; it was removed from that temporarily while it is being handled. At this point I still can't explain why it has a bad mapping pointer.

In shrink_active_list this is the condition test in the while loop:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 1633
0xffffffff8116b217 <shrink_active_list+551>: cmp %r12,-0x60(%rbp)
0xffffffff8116b21b <shrink_active_list+555>: jne 0xffffffff8116b187 <shrink_active_list+407>
0xffffffff8116b221 <shrink_active_list+561>: nopl 0x0(%rax)

1633 while (!list_empty(&l_hold)) {

l_hold is inited with the following macro:

0021 #define LIST_HEAD(name) \
0022 struct list_head name = LIST_HEAD_INIT(name)

So it's just a struct list_head, and list_empty compares head with head->next:

0186 static inline int list_empty(const struct list_head *head)
0187 {
0188 return head->next == head;
0189 }

So at that point r12 contains the head address and *(rbp-0x60) is the next pointer.

crash64> list_head 0xffff88725b71f8e0
struct list_head {
  next = 0xffff88725b71f8e0,
  prev = 0xffff88725b71f8e0
}

So this must have been the last entry in the list_head l_hold. It is possible that something else may have done something to the page while we had it on the l_hold local list.

Looking to see what other memory may have the value 0xffff881e84f50ec1 in it:

crash64> search -k 0xffff881e84f50ec1
ffff88725b71f318: ffff881e84f50ec1
ffff88725b71f478: ffff881e84f50ec1
ffff88725b71f6d0: ffff881e84f50ec1
ffff88725b71f740: ffff881e84f50ec1
ffff88800f5314c8: ffff881e84f50ec1
ffffea0191f314c8: ffff881e84f50ec1

The first 4 are all on the same page of memory; it's the kernel stack of pid 15199:

crash64> kmem ffff88725b71f318
    PID: 15199
COMMAND: "vertica"
   TASK: ffff88572574a220 [THREAD_INFO: ffff88725b71c000]
    CPU: 1
  STATE: TASK_RUNNING (PANIC)

      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea01c96dc7c0 725b71f000 0 0 0 6fffff00000000

The other two are on the same physical page, and it's the struct page itself:

crash64> kmem ffff88800f5314c8
      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea02003d4c40 800f531000 0 0 2 6fffff00000c00 reserved,private

crash64> kmem ffffea0191f314c8
      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea02003d4c40 800f531000 0 0 2 6fffff00000c00 reserved,private

The others are on the struct page.

Also try the following search (the address without the anon flag):

crash64> search -k 0xffff881e84f50ec0
ffff88589581a160: ffff881e84f50ec0
ffff88589581a238: ffff881e84f50ec0
ffff88589581a4c0: ffff881e84f50ec0
ffff88589581aaa8: ffff881e84f50ec0
ffff88725b71f328: ffff881e84f50ec0
ffff88725b71f488: ffff881e84f50ec0
ffff88725b71f750: ffff881e84f50ec0

The first 4 references to the struct address_space are on this other vertica thread:

crash64> kmem ffff88589581a160
    PID: 9223
COMMAND: "vertica"
   TASK: ffff887dcc68ad80 [THREAD_INFO: ffff885895818000]
    CPU: 3
  STATE: TASK_INTERRUPTIBLE

      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0162560680 589581a000 0 0 0 6fffff00000000

and the rest are on the stack of the panicking thread. It's currently doing something else:

crash64> bt 9223
PID: 9223 TASK: ffff887dcc68ad80 CPU: 3 COMMAND: "vertica"
 #0 [ffff88589581b9d0] __schedule at ffffffff81609c7d
 #1 [ffff88589581ba38] schedule at ffffffff8160a1d9
 #2 [ffff88589581ba48] schedule_hrtimeout_range_clock at ffffffff816094fc
 #3 [ffff88589581bae0] schedule_hrtimeout_range at ffffffff81609553
 #4 [ffff88589581baf0] poll_schedule_timeout at ffffffff811daf90
 #5 [ffff88589581bb20] do_sys_poll at ffffffff811dc51d
 #6 [ffff88589581bf40] sys_poll at ffffffff811dc6d4
 #7 [ffff88589581bf80] system_call_fastpath at ffffffff81614de9
    RIP: 00007f12afd73b7d RSP: 00007f110dff54b8 RFLAGS: 00000202
    RAX: 0000000000000007 RBX: ffffffff81614de9 RCX: 0000000000000006
    RDX: 00000000000003e8 RSI: 0000000000000001 RDI: 00007f110dff59d0
    RBP: 00007f110dffaea0 R8: 00007f110dff58e8 R9: 0000000000002407
    R10: 00007f12b053d6b0 R11: 0000000000000293 R12: 0000000000000001
    R13: 00007f1170014360 R14: 0000000000000004 R15: 000000002a0eb038
    ORIG_RAX: 0000000000000007 CS: 0033 SS: 002b

Let's see if we can work out what it was doing earlier:

PID: 9223 TASK: ffff887dcc68ad80 CPU: 3 COMMAND: "vertica"
ffff885895818000: ffff887dcc68ad80 default_exec_domain
ffff885895818010: 0000000000000080 0000000000000003
ffff885895818020: 00007ffffffff000 do_no_restart_syscall
...
ffff885895818060: 0000000000000000 0000000057ac6e9d
...

ffff88589581a000: ffff88589581a6c0 00007f2bd35c0000
ffff88589581a010: ffff88589581ba28 0000000000000000
ffff88589581a020: ffff887f1999f971 ffff88589581ba48
ffff88589581a030: ffff88589581b538 0000000000000000
ffff88589581a040: ffff88097ab592c0 0000000000000120
ffff88589581a050: 0000000000100070 0000000000000000
ffff88589581a060: 0000000000000000 0000000000000000
ffff88589581a070: 0000000000000000 ffff88589581a078
ffff88589581a080: ffff88589581a078 ffff883b631b9a00
ffff88589581a090: 0000000000000000 00000007f2bd35bf
...
ffff88589581a0d0: 0000000000000000 ffff88589581b7a0
ffff88589581a0e0: 00007f4393057000 ffff88589581aa20
ffff88589581a0f0: 0000000000000000 ffff883ecc7c6e78
ffff88589581a100: ffff88589581b898 ffff8841cf875a48
ffff88589581a110: 0000000000000000 ffff88097ab592c0
ffff88589581a120: 0000000000000120 0000000000100070
ffff88589581a130: 0000000000000000 0000000000000000
ffff88589581a140: 0000000000000000 0000000000000000
ffff88589581a150: ffff88589581a150 ffff88589581a150
ffff88589581a160: ffff881e84f50ec0 0000000000000000
                   ^^^^^^^^^^^^^^^^
ffff88589581a170: 00000007f4393056 0000000000000000
...
ffff88589581a1b0: ffff88589581a438 00007f4392856000
ffff88589581a1c0: ffff88589581bcb0 0000000000000000
ffff88589581a1d0: ffff8841cf875a49 ffff88589581bcd0
ffff88589581a1e0: ffff8841cf874608 0000000000000000
ffff88589581a1f0: ffff88097ab592c0 0000000000000120
ffff88589581a200: 0000000000100070 0000000000000000
ffff88589581a210: 0000000000000000 0000000000000000
ffff88589581a220: 0000000000000000 ffff88589581a228
ffff88589581a230: ffff88589581a228 ffff881e84f50ec0
                                    ^^^^^^^^^^^^^^^^
ffff88589581a240: 0000000000000000 00000007f4392855
...
ffff88589581a280: 0000000000000000 ffff88589581a870
ffff88589581a290: 00007f4265ffc000 0000000000000000
ffff88589581a2a0: ffff88589581a870 ffff88472a905bf9
ffff88589581a2b0: 0000000000000000 ffff88472a905610
ffff88589581a2c0: 0000000000000000 ffff88097ab592c0
ffff88589581a2d0: 8000000000000025 0000000000100073
ffff88589581a2e0: 0000000000000000 0000000000000000
ffff88589581a2f0: 0000000000000000 0000000000000000
ffff88589581a300: ffff88589581a300 ffff88589581a300
ffff88589581a310: ffff887f2464c0c0 0000000000000000
ffff88589581a320: 00000007f42657fc 0000000000000000
...
ffff88589581a360: ffff88589581b518 00007f45941c8000
ffff88589581a370: 0000000000000000 ffff887a0ae53a28
ffff88589581a380: ffff884a864abb21 ffff8836216e1460
ffff88589581a390: ffff88037154a020 000000000600c000
ffff88589581a3a0: ffff88097ab592c0 8000000000000025
ffff88589581a3b0: 0000000000100073 0000000000000000
ffff88589581a3c0: 0000000000000000 0000000000000000
ffff88589581a3d0: 0000000000000000 ffff88589581a3d8
ffff88589581a3e0: ffff88589581a3d8 ffff887f2507ec80
ffff88589581a3f0: 0000000000000000 00000007f45939c8
...
ffff88589581a430: 0000000000000000 ffff88589581b878
ffff88589581a440: 00007f4394058000 0000000000000000
ffff88589581a450: ffff88589581b878 ffff88589581a1d1
ffff88589581a460: 0000000000000000 ffff88589581bcd0
ffff88589581a470: 0000000000000000 ffff88097ab592c0
ffff88589581a480: 8000000000000025 0000000000100073
ffff88589581a490: 0000000000000000 0000000000000000
ffff88589581a4a0: 0000000000000000 0000000000000000
ffff88589581a4b0: ffff88589581a4b0 ffff88589581a4b0
ffff88589581a4c0: ffff881e84f50ec0 0000000000000000
                   ^^^^^^^^^^^^^^^^
ffff88589581a4d0: 00000007f4393858 0000000000000000
...
ffff88589581a510: ffff88589581b368 00007f4245ffc000
ffff88589581a520: 0000000000000000 ffff8854849001b0
ffff88589581a530: ffff883a365ec021 0000000000000000
ffff88589581a540: ffff88589581b610 0000000000000000
ffff88589581a550: ffff88097ab592c0 8000000000000025
ffff88589581a560: 0000000000100073 0000000000000000
ffff88589581a570: 0000000000000000 0000000000000000
ffff88589581a580: 0000000000000000 ffff88589581a588
ffff88589581a590: ffff88589581a588 ffff887a0c496480
ffff88589581a5a0: 0000000000000000 00000007f42457fc
...
ffff88589581a5e0: 0000000000000000 ffff88589581b5f0
ffff88589581a5f0: 00007f2bd3dc1000 ffff88589581a6c0
ffff88589581a600: 0000000000000000 ffff88589581a021
ffff88589581a610: ffff88589581a6e0 ffff88589581ba48
ffff88589581a620: 0000000000000000 ffff88097ab592c0
ffff88589581a630: 0000000000000120 0000000000100070
ffff88589581a640: 0000000000000000 0000000000000000
ffff88589581a650: 0000000000000000 0000000000000000
ffff88589581a660: ffff88589581a660 ffff88589581a660
ffff88589581a670: ffff883b631b9a00 0000000000000000
ffff88589581a680: 00000007f2bd3dc0 0000000000000000
...
ffff88589581a6c0: ffff88589581a5e8 00007f2bd45c1000
ffff88589581a6d0: 0000000000000000 ffff88589581a5e8
ffff88589581a6e0: ffff88589581a021 0000000000000000
ffff88589581a6f0: ffff88589581ba48 0000000000000000
ffff88589581a700: ffff88097ab592c0 8000000000000025
ffff88589581a710: 0000000000100073 0000000000000000
ffff88589581a720: 0000000000000000 0000000000000000
ffff88589581a730: 0000000000000000 ffff88589581a738
ffff88589581a740: ffff88589581a738 ffff883b631b9a00
ffff88589581a750: 0000000000000000 00000007f2bd3dc1
...
ffff88589581a790: 0000000000000000 ffff88589581be60
ffff88589581a7a0: 00007f45949ca000 ffff883527b3d290
ffff88589581a7b0: 0000000000000000 ffff883527b3d971
ffff88589581a7c0: ffff883527b3d2b0 ffff883527b3d388
ffff88589581a7d0: 0000000000000000 ffff88097ab592c0
ffff88589581a7e0: 0000000000000120 0000000000100070
ffff88589581a7f0: 0000000000000000 0000000000000000
ffff88589581a800: 0000000000000000 0000000000000000
ffff88589581a810: ffff88589581a810 ffff88589581a810
ffff88589581a820: ffff887f2507ec80 0000000000000000
ffff88589581a830: 00000007f45949c9 0000000000000000
...
ffff88589581a870: ffff88589581a510 00007f42657fc000
ffff88589581a880: ffff88589581a288 0000000000000000
ffff88589581a890: ffff88171e4d1b21 ffff88589581a2a8
ffff88589581a8a0: ffff88472a905bf8 00000000007f9000
ffff88589581a8b0: ffff88097ab592c0 0000000000000120
ffff88589581a8c0: 0000000000100070 0000000000000000
ffff88589581a8d0: 0000000000000000 0000000000000000
ffff88589581a8e0: 0000000000000000 ffff88589581a8e8
ffff88589581a8f0: ffff88589581a8e8 ffff887f2464c0c0
ffff88589581a900: 0000000000000000 00000007f42657fb
...
ffff88589581a940: 0000000000000000 ffff88589581b1b8
ffff88589581a950: 00007f41457fb000 0000000000000000
ffff88589581a960: ffff88589581b1b8 ffff88589581af51
ffff88589581a970: 0000000000000000 ffff880a3346de80
ffff88589581a980: 0000000000000000 ffff88097ab592c0
ffff88589581a990: 8000000000000025 0000000000100073
ffff88589581a9a0: 0000000000000000 0000000000000000
ffff88589581a9b0: 0000000000000000 0000000000000000
ffff88589581a9c0: ffff88589581a9c0 ffff88589581a9c0
ffff88589581a9d0: ffff887f224fea00 0000000000000000
ffff88589581a9e0: 00000007f4144ffb 0000000000000000
...
ffff88589581aa20: ffff88589581a0d8 00007f4393857000
ffff88589581aa30: 0000000000000000 ffff88589581a0d8
ffff88589581aa40: ffff8841cf875a49 ffff88589581b898
ffff88589581aa50: ffff88589581a1d0 0000000000000000
ffff88589581aa60: ffff88097ab592c0 8000000000000025
ffff88589581aa70: 0000000000100073 0000000000000000
ffff88589581aa80: 0000000000000000 0000000000000000
ffff88589581aa90: 0000000000000000 ffff88589581aa98
ffff88589581aaa0: ffff88589581aa98 ffff881e84f50ec0
ffff88589581aab0: 0000000000000000 00000007f4393057
...
ffff88589581aaf0: 0000000000000000 ffff88589581b290
ffff88589581ab00: 00007fd249cff000 ffff887f1eddbe60
ffff88589581ab10: ffff887f1eddad80 ffff887f1eddbe80
ffff88589581ab20: 0000000000000000 0000000000000000
ffff88589581ab30: 0000000000000000 ffff887f209b5780
ffff88589581ab40: 0000000000000025 0000000000000875
ffff88589581ab50: ffff885c22e3dc30 0000000000000000
ffff88589581ab60: 0000000000000000 0000000000000020
ffff88589581ab70: ffff88589581ab70 ffff88589581ab70
ffff88589581ab80: 0000000000000000 xfs_file_vm_ops
ffff88589581ab90: 0000000000000000 ffff887e08c43900
...
ffff88589581abd0: ffff88589581b6c8 00007f0f40000000
ffff88589581abe0: ffff882878408f30 ffff88589581b6c8
ffff88589581abf0: ffff88589581b6e9 0000000000000000
ffff88589581ac00: 0000000000000000 0000000000000000
ffff88589581ac10: ffff88097ab592c0 0000000000000120
ffff88589581ac20: 0000000000200070 0000000000000000
ffff88589581ac30: 0000000000000000 0000000000000000
ffff88589581ac40: 0000000000000000 ffff88589581ac48
ffff88589581ac50: ffff88589581ac48 0000000000000000
ffff88589581ac60: 0000000000000000 00000007f0f3fff5
...
ffff88589581aca0: 0000000000000000 ffff88589581a360
ffff88589581acb0: 00007f43f34b2000 ffff883e9e001368
ffff88589581acc0: 0000000000000000 ffff883e26e72459
ffff88589581acd0: ffff88415b94c890 ffff885a20c422a8
ffff88589581ace0: 000000000808d000 ffff88097ab592c0
ffff88589581acf0: 0000000000000120 0000000000100070
ffff88589581ad00: 0000000000000000 0000000000000000
ffff88589581ad10: 0000000000000000 0000000000000000
ffff88589581ad20: ffff88589581ad20 ffff88589581ad20
ffff88589581ad30: ffff883ee868dd80 0000000000000000
ffff88589581ad40: 00000007f43f34b1 0000000000000000
...
ffff88589581ad80: 0000000000000000 0000000000406000
ffff88589581ad90: ffff88589581b290 0000000000000000
ffff88589581ada0: ffff88589581b2b1 0000000000000000
ffff88589581adb0: 0000000000000000 0000000000400000
ffff88589581adc0: ffff887f209b5780 0000000000000025
ffff88589581add0: 0000000000000875 ffff88589581b2e8
ffff88589581ade0: 0000000000000000 0000000000000000
ffff88589581adf0: 0000000000000005 ffff88589581adf8
ffff88589581ae00: ffff88589581adf8 0000000000000000
ffff88589581ae10: xfs_file_vm_ops 0000000000000000
ffff88589581ae20: ffff883ed604fc00 0000000000000000
ffff88589581ae30: 0000000000000000 0000000000000000
ffff88589581ae40: 0000000000000000 0000000000000000
ffff88589581ae50: 0000000000000000 ffff88589581bcb0
ffff88589581ae60: 00007f4226fff000 ffff88589581b008
ffff88589581ae70: 0000000000000000 ffff883a365ec458
ffff88589581ae80: ffff88589581b460 ffff887abff177c0
ffff88589581ae90: 0000000000801000 ffff88097ab592c0
ffff88589581aea0: 0000000000000120 0000000000100070
ffff88589581aeb0: 0000000000000000 0000000000000000
ffff88589581aec0: 0000000000000000 0000000000000000
ffff88589581aed0: ffff88589581aed0 ffff88589581aed0
ffff88589581aee0: ffff887f224fe800 0000000000000000
ffff88589581aef0: 00000007f4226ffe 0000000000000000
...
ffff88589581af30: ffff88589581a948 00007f4147800000
ffff88589581af40: ffff88589581be60 0000000000000000
ffff88589581af50: ffff881d5af47611 ffff881d5af46608
ffff88589581af60: ffff880a3346de80 00000000037ff000
ffff88589581af70: ffff88097ab592c0 0000000000000120
ffff88589581af80: 0000000000100070 0000000000000000
ffff88589581af90: 0000000000000000 0000000000000000
ffff88589581afa0: 0000000000000000 ffff88589581afa8
ffff88589581afb0: ffff88589581afa8 ffff887f224fea00
ffff88589581afc0: 0000000000000000 00000007f41477ff
...
ffff88589581b000: 0000000000000000 ffff88589581ae58
ffff88589581b010: 00007f42277ff000 0000000000000000
ffff88589581b020: ffff88589581ae58 ffff887abff177c1
ffff88589581b030: ffff88589581b460 ffff884a866a4bf0
ffff88589581b040: 0000000000000000 ffff88097ab592c0
ffff88589581b050: 8000000000000025 0000000000100073
ffff88589581b060: 0000000000000000 0000000000000000
ffff88589581b070: 0000000000000000 0000000000000000
ffff88589581b080: ffff88589581b080 ffff88589581b080
ffff88589581b090: ffff887f224fe800 0000000000000000
ffff88589581b0a0: 00000007f4226fff 0000000000000000
...
ffff88589581b0e0: ffff88589581aca8 00007f2bd2dbe000
ffff88589581b0f0: 0000000000000000 ffff88780ef74438
ffff88589581b100: ffff8858887891d9 0000000000000000
ffff88589581b110: ffff88418331b2b0 0000000000000000
ffff88589581b120: ffff88097ab592c0 8000000000000025
ffff88589581b130: 0000000000100073 0000000000000000
ffff88589581b140: 0000000000000000 0000000000000000
ffff88589581b150: 0000000000000000 ffff88589581b158
ffff88589581b160: ffff88589581b158 ffff883b631b9a00
ffff88589581b170: 0000000000000000 00000007f2bd25be
...
ffff88589581b1b0: 0000000000000000 ffff88589581b950
ffff88589581b1c0: 00007f4144ffb000 ffff88589581a948
ffff88589581b1d0: 0000000000000000 ffff88589581af51
ffff88589581b1e0: ffff88589581a968 ffff880a3346de80
ffff88589581b1f0: 0000000000ffa000 ffff88097ab592c0
ffff88589581b200: 0000000000000120 0000000000100070
ffff88589581b210: 0000000000000000 0000000000000000
ffff88589581b220: 0000000000000000 0000000000000000
ffff88589581b230: ffff88589581b230 ffff88589581b230
ffff88589581b240: ffff887f224fea00 0000000000000000
ffff88589581b250: 00000007f4144ffa 0000000000000000
...
ffff88589581b290: ffff88589581ad80 0000000000606000
ffff88589581b2a0: ffff887f1eddaca8 ffff88589581ad80
ffff88589581b2b0: ffff887f1eddba48 ffff887f1eddb538
ffff88589581b2c0: ffff88589581ada0 00007fd248c1f000
ffff88589581b2d0: ffff887f209b5780 8000000000000025
ffff88589581b2e0: 0000000000100871 0000000000000001
ffff88589581b2f0: ffff887f1eddad00 ffff88589581b338
ffff88589581b300: update_curr+204 0007d495034e1136
ffff88589581b310: ffff88589581b338 account_entity_dequeue+174
ffff88589581b320: ffff887dcc68ade8 ffff887f7f493700
ffff88589581b330: 0000000000000001 ffff88589581b388
ffff88589581b340: dequeue_entity+262 0000000000000000
ffff88589581b350: 0000000000000000 0000000000000000
ffff88589581b360: ffff887dcc68ade8 ffff887f7f493700
ffff88589581b370: ffff887f7f493680 0000000000000001
ffff88589581b380: 0000000000000001 ffff88589581b3d8
ffff88589581b390: dequeue_task_fair+1054 ffff88589581b3a8
ffff88589581b3a0: native_sched_clock+19 ffff88589581b3b8
ffff88589581b3b0: sched_clock+9 ffff88589581b3e0
ffff88589581b3c0: sched_clock_cpu+181 ffff883f7f893680
ffff88589581b3d0: ffff883f7f893680 ffff883f26fbb584
ffff88589581b3e0: ffff88589581b400 update_rq_clock+26
ffff88589581b3f0: ffff883f7f893680 ffff883f7f893680
ffff88589581b400: check_preempt_curr+117 ffff883f26fbad80
ffff88589581b410: ffff88589581b450 update_curr+204
ffff88589581b420: 000928ebbe6b1079 ffff887dcc68ade8
ffff88589581b430: ffff883f7f893700 0000000000000002
ffff88589581b440: ffff88589581b450 ffff88589581b490
ffff88589581b450: __switch_to+377 ffff887dcc68b418
ffff88589581b460: 0000000000000001 ffff882884bf3200
ffff88589581b470: ffff887f7f1d3680 ffff882884bf3200
ffff88589581b480: ffff882884bf3200 ffff887f7f1d3680
ffff88589581b490: ffff882884bf3200 0000000000000011
ffff88589581b4a0: ffff88589581b500 __schedule+709
ffff88589581b4b0: ffff88589581bfd8 0000000000013680
ffff88589581b4c0: ffff88589581bfd8 0000000000013680
ffff88589581b4d0: ffff887dcc68ad80 ffff885895818000
ffff88589581b4e0: ffff88589581b650 0000000000000002
ffff88589581b4f0: ffff887dcc68ad80 ffff8850419dc440
ffff88589581b500: ffff88589581b518 __cond_resched+38
ffff88589581b510: ffff88589581b648 ffff88589581b528
ffff88589581b520: _cond_resched+58 ffff88589581b588
ffff88589581b530: wait_for_completion+55 000000002a0eb038
ffff88589581b540: 0000000000000002 irq_cpu_stop_queue_work
ffff88589581b550: 0000000000000002 ffff88589581b588
ffff88589581b560: 000000002a0eb038 0000000000000011
ffff88589581b570: ffff88589581b638 0000000000000002
ffff88589581b580: ffff887dcc68ad80 ffff88589581b688
ffff88589581b590: stop_two_cpus+378 0000000200000011
ffff88589581b5a0: ffff88589581b5b0 ffff88589581b5e0
ffff88589581b5b0: ffff88589581b5b0 ffff88589581b5b0
ffff88589581b5c0: multi_cpu_stop ffff88589581b610
ffff88589581b5d0: ffff88589581b638 ffff88589581b620
ffff88589581b5e0: ffff88589581b5e0 ffff88589581b5e0
ffff88589581b5f0: multi_cpu_stop ffff88589581b610
ffff88589581b600: ffff88589581b638 ffff883f7f893680
ffff88589581b610: migrate_swap_stop ffff88589581b698
ffff88589581b620: 0000000000000002 cpu_bit_bitmap+11520
ffff88589581b630: 0000000200000005 0000000100000000
ffff88589581b640: 0000000000000000 0000000000000000
ffff88589581b650: 0000000000040004 ffff88589581b658
ffff88589581b660: ffff88589581b658 000000002a0eb038
ffff88589581b670: ffff887dcc68ad80 0000000000000002
ffff88589581b680: 0000000000000011 ffff88589581b6e0
ffff88589581b690: migrate_swap+183 ffff887dcc68ad80
ffff88589581b6a0: ffff8850419dc440 0000001100000002
ffff88589581b6b0: 000000002a0eb038 ffff887dcc68ad80
ffff88589581b6c0: ffff88589581b708 00000000000001b2
ffff88589581b6d0: 00000000000001b2 ffff88589581b708
ffff88589581b6e0: update_group_power+310 ffff88589581b708
ffff88589581b6f0: cpumask_next_and+53 0000000000000001
ffff88589581b700: ffff88589581b8f0 ffff88589581b880
ffff88589581b710: find_busiest_group+276 ffff883f26650800
ffff88589581b720: 0000000000000027 00000000010000d7
ffff88589581b730: 0000000000013680 0000000000013680
ffff88589581b740: ffff883f26650c18 0000000000000000
ffff88589581b750: 00000000000000b2 0000000000000800
ffff88589581b760: 0000000000000800 0000000000000400
ffff88589581b770: 0000000000002e03 0000000100000002
ffff88589581b780: 0000001400000012 0000000100000001
ffff88589581b790: 0000000200000002 ffff883f26650c00
ffff88589581b7a0: ffff883f26650800 0000000000000f4e
ffff88589581b7b0: 0000000000005c07 00000000000000aa
ffff88589581b7c0: 00000000000000b2 0000000000000800
ffff88589581b7d0: 0000000000000800 00000000000000aa
ffff88589581b7e0: 0000000000002e03 0000000100000002
ffff88589581b7f0: 0000001400000012 0000000100000001
ffff88589581b800: 0000000200000002 00000000000000a2
ffff88589581b810: 000000000000074e 000000000000074e
ffff88589581b820: 00000000000001d3 0000000000002e04
ffff88589581b830: 0000000c00000004 0000001400000011
ffff88589581b840: 0000000100000001 0000000200000002
ffff88589581b850: 000000002a0eb038 ffff883f26634c00
ffff88589581b860: ffff88589581b880 cpumask_next_and+53
ffff88589581b870: 0000000000000000 0000000000000400
ffff88589581b880: ffff88589581b978 load_balance+590
ffff88589581b890: 0000000000013680 0000000200000018
ffff88589581b8a0: 0000000000000000 ffff883f7f414100
ffff88589581b8b0: ffff883f26634c00 ffff88589581b8f8
ffff88589581b8c0: update_curr+204 000928f487eb6a4c
ffff88589581b8d0: ffff88589581b8f8 account_entity_dequeue+174
ffff88589581b8e0: ffff887dcc68ade8 ffff883f7f8d3700
ffff88589581b8f0: 0000000000000001 ffff88589581b948
ffff88589581b900: dequeue_entity+262 ffff883f7fb93680
ffff88589581b910: 0000000000000000 0000000200000000
ffff88589581b920: ffff887dcc68ade8 ffff883f7f8d3700
ffff88589581b930: ffff883f7f8d3680 0000000000000001
ffff88589581b940: 0000000000000001 ffff88589581b998
ffff88589581b950: dequeue_task_fair+1054 sched_clock_cpu+181
ffff88589581b960: ffff883f7f8d3680 ffff883f7f8d3680
ffff88589581b970: ffff883f7f8d3680 ffff883f7f8d3680
ffff88589581b980: 0000000000000003 0000000000000000
ffff88589581b990: 000000039581bba4 ffff88589581b9c0
ffff88589581b9a0: ffff883f7f8d3680 ffff887dcc68b360
ffff88589581b9b0: ffff883f7f8d3680 000000002a0eb038
ffff88589581b9c0: ffff887dcc68ad80 ffff883f7f8d3680

Current top of stack

ffff88589581b9d0: ffff88589581ba30 0000000000000086
ffff88589581b9e0: ffff88589581bfd8 0000000000013680
ffff88589581b9f0: ffff88589581bfd8 0000000000013680
ffff88589581ba00: ffff887dcc68ad80 ffff88589581bb88
ffff88589581ba10: 00000000000f423f 0000000000000000
ffff88589581ba20: 0000000000000000 ffff88589581bba4
ffff88589581ba30: ffff88589581ba40 schedule+41
ffff88589581ba40: ffff88589581bad8 schedule_hrtimeout_range_clock+300
ffff88589581ba50: 0000000000000001 ffff883f7f8cdbe0
ffff88589581ba60: ffff887f250ab908 000d6202b77cd22b
ffff88589581ba70: 000d6202b76d8fec hrtimer_wakeup
ffff88589581ba80: ffff883f7f8cd7e0 0000000000000001
ffff88589581ba90: 0000000000002407 schedule_hrtimeout_range_clock+172
ffff88589581baa0: 0061636974726576 0000000000000030
ffff88589581bab0: ffff887dcc68ad80 000000002a0eb038
ffff88589581bac0: ffff88589581bc90 ffff887dcc68ad80
ffff88589581bad0: 00000000fffffffc ffff88589581bae8
ffff88589581bae0: schedule_hrtimeout_range+19 ffff88589581bb18
ffff88589581baf0: poll_schedule_timeout+96 0000000000000000
ffff88589581bb00: 0000000225c17d03 0000000000000000
ffff88589581bb10: 0000000000000000 ffff88589581bf38
ffff88589581bb20: do_sys_poll+1229 00007f110dff59d0
ffff88589581bb30: ffff88589581bfd8 ffff887dcc68ad80
ffff88589581bb40: 0100000000000000 ffff88589581bf48
ffff88589581bb50: 00000000000f423f 0000000000000000
ffff88589581bb60: ffff88589581bb88 0000000000000000
ffff88589581bb70: 0000000100000000 ffff887a00000000
ffff88589581bb80: 000000019581bb98 000d6202b76d8fec
ffff88589581bb90: 0000000000000000 0000000b00000001
ffff88589581bba0: ffffffff0000003b 00000000fffffff5
ffff88589581bbb0: 0000000000000000 0000000000000018
ffff88589581bbc0: 0000000000000002 ffff88407ffda000
ffff88589581bbd0: ffffea00fb17bf00 0000000200000000
ffff88589581bbe0: ffff88589581bc08 zone_statistics+137
ffff88589581bbf0: 0000000000000000 00000000000037b4
ffff88589581bc00: ffffea00fb17bf00 ffff88589581bd10
ffff88589581bc10: 0000000000000246 00000000000002ea
ffff88589581bc20: ffff88407ffdb008 0000000000000002
ffff88589581bc30: 0000001400000007 0000000000000000
ffff88589581bc40: 000000080000000a 000000002a0eb038
ffff88589581bc50: ffff88407ffdb000 ffff8841519ba2b0
ffff88589581bc60: 0000000000000000 000000027f493680
ffff88589581bc70: 0000000000000000 0000000000000002
ffff88589581bc80: ffff88407ffdb008 0000014100000000
ffff88589581bc90: 0000000000000000 000000000000003b
ffff88589581bca0: 0000000000000000 ffff887dcc68ad80
ffff88589581bcb0: 0000000000000000 ffffffff00000001
ffff88589581bcc0: ffff887a0f85f200 000000000000003b
ffff88589581bcd0: 0000000000000000 ffff88589581bc90
ffff88589581bce0: pollwake ffff885c9e44b008
ffff88589581bcf0: ffff885c9e44b008 ffff885c9e44b000
ffff88589581bd00: 0000000000000000 0000000000000000
ffff88589581bd10: ffff88589581be48 __alloc_pages_nodemask+409
ffff88589581bd20: ffff88407ffda000 0000000000000000
ffff88589581bd30: 0000002800000004 0000000000000004
ffff88589581bd40: ffff8879ea190b60 0000000000000001
ffff88589581bd50: ffff8879ea191364 ffff88589581bd68
ffff88589581bd60: native_sched_clock+19 ffff88589581bd78
ffff88589581bd70: sched_clock+9 ffff88589581bda0
ffff88589581bd80: ffff88589581bdb0 sched_slice+94
ffff88589581bd90: ffff887f7f2d3700 ffff887f7f2d3700
ffff88589581bda0: 0000000000000000 ffff883e9fb27228
ffff88589581bdb0: ffff88589581bdc0 __enqueue_entity+120
ffff88589581bdc0: ffff88589581be08 enqueue_entity+567
ffff88589581bdd0: 0000001f00000003 cpumask_next_and+53
ffff88589581bde0: ffff883e9fb27228 ffff887f7f2d3700
ffff88589581bdf0: ffff887f7f2d3680 ffff883e9fb279c4
ffff88589581be00: 0000000000000246 ffff88589581be58
ffff88589581be10: enqueue_task_fair+1026 sched_clock_cpu+181
ffff88589581be20: ffff887f7f2d3680 ffff883e9fb271c0
ffff88589581be30: ffff887f7f2d3680 ffff883e9fb271c0
ffff88589581be40: ffff887f7f2d3680 ffff883e9fb279c4
ffff88589581be50: 0000000000000246 ffff88589581be80
ffff88589581be60: enqueue_task+44 00000000ffffeffd
ffff88589581be70: ffff883e9fb271c0 0000000000013680
ffff88589581be80: 0000000000000246 0000000000013680
ffff88589581be90: ffff88589581bec8 wake_up_new_task+260
ffff88589581bea0: 00007fffffffeffd ffff883e9fb271c0
ffff88589581beb0: 0000000000000000 0000000000000000
ffff88589581bec0: 0000000000006572 ffff88589581bf30
ffff88589581bed0: ffff88589581bee0 read_tsc+9
ffff88589581bee0: ffff88589581bf10 ktime_get_ts+76
ffff88589581bef0: ffff88589581bf48 0000000000000000
ffff88589581bf00: 00007f110dff59d0 000000002a0eb038
ffff88589581bf10: 00000000000003e8 0000000000000001
ffff88589581bf20: 00007f110dff59d0 00000000ffffffff
ffff88589581bf30: 8080808080808081 ffff88589581bf78
ffff88589581bf40: sys_poll+116 0000000000397a9a
ffff88589581bf50: 000000001e0c0bec 000000002a0eb038
ffff88589581bf60: 0000000000000004 00007f1170014360
ffff88589581bf70: 0000000000000001 00007f110dffaea0
ffff88589581bf80: system_call_fastpath+22 0000000000000293
ffff88589581bf90: 00007f12b053d6b0 0000000000002407
ffff88589581bfa0: 00007f110dff58e8 0000000000000007
ffff88589581bfb0: 0000000000000006 00000000000003e8
ffff88589581bfc0: 0000000000000001 00007f110dff59d0
ffff88589581bfd0: 0000000000000007 00007f12afd73b7d
ffff88589581bfe0: 0000000000000033 0000000000000202
ffff88589581bff0: 00007f110dff54b8 000000000000002b

That's a dead end - the addresses that were interesting on the stack have no valid-looking stack frames anywhere near them.

Let's look at what is happening in the other threads active in mm code:

First CPU 0:

PID: 15187 TASK: ffff884217d40b60 CPU: 0 COMMAND: "vertica"
 #0 [ffff883f7f805e70] crash_nmi_callback at ffffffff810406c2
 #1 [ffff883f7f805e80] nmi_handle at ffffffff8160d6d9
 #2 [ffff883f7f805ec8] do_nmi at ffffffff8160d7f0
 #3 [ffff883f7f805ef0] end_repeat_nmi at ffffffff8160cb31
    [exception RIP: _raw_spin_lock+55]
    RIP: ffffffff8160c0e7 RSP: ffff884a58f57780 RFLAGS: 00000206
    RAX: 0000000000005928 RBX: ffffea01e5343300 RCX: 0000000000007502
    RDX: 0000000000007504 RSI: 0000000000007504 RDI: ffffea00fc5ad030
    RBP: ffff884a58f57780 R8: 00000000fc5ad000 R9: ffff8863838edff8
    R10: 00000000000000a8 R11: 0000000000000006 R12: ffff884a58f577c8
    R13: ffff883f16b40100 R14: 0000000000000000 R15: ffffea00fc5ad030
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
 #4 [ffff884a58f57780] _raw_spin_lock at ffffffff8160c0e7
 #5 [ffff884a58f57788] __page_check_address at ffffffff8118c89c
 #6 [ffff884a58f577c0] page_referenced_one at ffffffff8118cc5d
 #7 [ffff884a58f57808] page_referenced at ffffffff8118e57b
 #8 [ffff884a58f57880] shrink_active_list at ffffffff8116b1cc
 #9 [ffff884a58f57938] shrink_lruvec at ffffffff8116b889
#10 [ffff884a58f57a38] shrink_zone at ffffffff8116bb76
#11 [ffff884a58f57a90] do_try_to_free_pages at ffffffff8116c080
#12 [ffff884a58f57b08] try_to_free_pages at ffffffff8116c56c
#13 [ffff884a58f57ba0] __alloc_pages_nodemask at ffffffff81160c0d
#14 [ffff884a58f57cd8] alloc_pages_vma at ffffffff811a2a2a
#15 [ffff884a58f57d40] do_huge_pmd_anonymous_page at ffffffff811b6deb
#16 [ffff884a58f57d98] handle_mm_fault at ffffffff81182794
#17 [ffff884a58f57e28] __do_page_fault at ffffffff816101c6
#18 [ffff884a58f57f28] do_page_fault at ffffffff816105ca
#19 [ffff884a58f57f50] page_fault at ffffffff8160c7c8
    RIP: 000000000229a610 RSP: 00007ee38cabbd90 RFLAGS: 00010202
    RAX: 00007f01ad60e0d0 RBX: 0000000000010000 RCX: 00007f01ad60e0d0
    RDX: 0000000000000005 RSI: 0000000010df9640 RDI: 0000000000000067
    RBP: 00007ee38cabbdb0 R8: 0000000000000000 R9: 000000000009ec70
    R10: 0000000000000000 R11: 00007f01ad60e0d0 R12: 00007f01ad60e0d0
    R13: 000000000000000f R14: 00007ee987fef058 R15: 00007edbccc1b020
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

Let's see if we can get the struct page passed into page_referenced.

We can reuse the disassembly from above to do that:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 1651
0xffffffff8116b1ba <shrink_active_list+458>: mov 0x38(%r14),%rdx
0xffffffff8116b1be <shrink_active_list+462>: lea -0x68(%rbp),%rcx
0xffffffff8116b1c2 <shrink_active_list+466>: xor %esi,%esi
0xffffffff8116b1c4 <shrink_active_list+468>: mov %r15,%rdi
0xffffffff8116b1c7 <shrink_active_list+471>: callq 0xffffffff8118e300 <page_referenced>
0xffffffff8116b1cc <shrink_active_list+476>: test %eax,%eax
0xffffffff8116b1ce <shrink_active_list+478>: je 0xffffffff8116b168 <shrink_active_list+376>

crash64> dis -l page_referenced
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 849
0xffffffff8118e300 <page_referenced>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8118e305 <page_referenced+5>: push %rbp
0xffffffff8118e306 <page_referenced+6>: mov %rcx,%rax
0xffffffff8118e309 <page_referenced+9>: mov %rsp,%rbp
0xffffffff8118e30c <page_referenced+12>: push %r15
0xffffffff8118e30e <page_referenced+14>: push %r14
0xffffffff8118e310 <page_referenced+16>: push %r13
0xffffffff8118e312 <page_referenced+18>: push %r12
0xffffffff8118e314 <page_referenced+20>: push %rbx
0xffffffff8118e315 <page_referenced+21>: mov %rdi,%rbx
0xffffffff8118e318 <page_referenced+24>: sub $0x40,%rsp
0xffffffff8118e31c <page_referenced+28>: mov %rcx,-0x58(%rbp)

 #7 [ffff884a58f57808] page_referenced at ffffffff8118e57b
    ffff884a58f57810: ffff883ecc996480 00000007eecffe20
    ffff884a58f57820: ffff884a58f578c8 0000000000000000
    ffff884a58f57830: 00000007eecfc000 00007eecfc000000
    ffff884a58f57840: 0000000126dd8410 000000007c394f9b
    ffff884a58f57850: rbx ffffea01e5343320 r12 ffff884a58f578d0
    ffff884a58f57860: r13 0000000000000004 r14 ffff884a58f57b20
    ffff884a58f57870: r15 ffffea01e5343300 rbp ffff884a58f57930
    ffff884a58f57880: rip ffffffff8116b1cc
 #8 [ffff884a58f57880] shrink_active_list at ffffffff8116b1cc

RDI struct page *page rdi is r15 is 0xffffea01e5343300
RSI int is_locked 0 (again, the call site passes a literal 0 - see the xor %esi,%esi above)
RDX struct mem_cgroup *memcg rdx is *(r14+0x38) 0x0000000000000000
RCX unsigned long *vm_flags rcx is rbp-0x68 0xffff884a58f578c8 (dereferenced is 0)

That's this page:

crash64> struct page 0xffffea01e5343300
struct page {
  flags = 27029071007842392,
  mapping = 0xffff883ecc996481,
  {
    {
      index = 34071379488,
      freelist = 0x7eecffe20,
      pfmemalloc = 32,
      pmd_huge_pte = 0x7eecffe20
    },
    {
      counters = 8589934592,
      {
        {
          _mapcount = {
            counter = 0
          },
          {
            inuse = 0,
            objects = 0,
            frozen = 0
          },
          units = 0
        },
        _count = {
          counter = 2
        }
      }
    }
  },
  {
    lru = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
    {
      next = 0xdead000000100100,
      pages = 2097664,
      pobjects = -559087616
    },
    list = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
    slab_page = 0xdead000000100100
  },
  {
    private = 0,
    ptl = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    slab_cache = 0x0,
    first_page = 0x0
  }
}

Because the mapping has the 1 bit set, the mapping pointer is a struct anon_vma (and it looks valid as far as I can tell):

crash64> anon_vma 0xffff883ecc996480
struct anon_vma {
  root = 0xffff883ecc996480,
  rwsem = {
    count = 2,
    wait_lock = {
      raw_lock = {
        {
          head_tail = 0,
          tickets = {
            head = 0,
            tail = 0
          }
        }
      }
    },
    wait_list = {
      next = 0xffff883ecc996498,
      prev = 0xffff883ecc996498
    }
  },
  refcount = {
    counter = 1
  },
  rb_root = {
    rb_node = 0xffff883eac290f20
  }
}

Let's look at the top of the stack where we are spinning:

crash64> dis -l _raw_spin_lock
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/spinlock.c: 136
0xffffffff8160c0b0 <_raw_spin_lock>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8160c0b5 <_raw_spin_lock+5>: push %rbp
0xffffffff8160c0b6 <_raw_spin_lock+6>: mov %rsp,%rbp
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/spinlock.h: 87
0xffffffff8160c0b9 <_raw_spin_lock+9>: mov $0x20000,%eax
0xffffffff8160c0be <_raw_spin_lock+14>: lock xadd %eax,(%rdi)
0xffffffff8160c0c2 <_raw_spin_lock+18>: mov %eax,%edx
0xffffffff8160c0c4 <_raw_spin_lock+20>: shr $0x10,%edx
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/spinlock.h: 88
0xffffffff8160c0c7 <_raw_spin_lock+23>: cmp %ax,%dx
0xffffffff8160c0ca <_raw_spin_lock+26>: jne 0xffffffff8160c0ce <_raw_spin_lock+30>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/spinlock.c: 138
0xffffffff8160c0cc <_raw_spin_lock+28>: pop %rbp
0xffffffff8160c0cd <_raw_spin_lock+29>: retq
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/spinlock.h: 91
0xffffffff8160c0ce <_raw_spin_lock+30>: and $0xfffffffe,%edx
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/paravirt.h: 718
0xffffffff8160c0d1 <_raw_spin_lock+33>: movzwl %dx,%esi
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/spinlock.c: 136
0xffffffff8160c0d4 <_raw_spin_lock+36>: mov $0x8000,%eax
0xffffffff8160c0d9 <_raw_spin_lock+41>: jmp 0xffffffff8160c0e7 <_raw_spin_lock+55>
0xffffffff8160c0db <_raw_spin_lock+43>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/processor.h: 685
0xffffffff8160c0e0 <_raw_spin_lock+48>: pause
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/spinlock.h: 99
0xffffffff8160c0e2 <_raw_spin_lock+50>: sub $0x1,%eax
0xffffffff8160c0e5 <_raw_spin_lock+53>: je 0xffffffff8160c0f1 <_raw_spin_lock+65>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/spinlock.h: 96
0xffffffff8160c0e7 <_raw_spin_lock+55>: movzwl (%rdi),%ecx
0xffffffff8160c0ea <_raw_spin_lock+58>: cmp %cx,%dx
0xffffffff8160c0ed <_raw_spin_lock+61>: jne 0xffffffff8160c0e0 <_raw_spin_lock+48>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/spinlock.c: 138
0xffffffff8160c0ef <_raw_spin_lock+63>: pop %rbp
0xffffffff8160c0f0 <_raw_spin_lock+64>: retq
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/paravirt.h: 718
0xffffffff8160c0f1 <_raw_spin_lock+65>: nopl 0x0(%rax)
0xffffffff8160c0f8 <_raw_spin_lock+72>: jmp 0xffffffff8160c0d4 <_raw_spin_lock+36>

This is the register context at the time:

    [exception RIP: _raw_spin_lock+55]
    RIP: ffffffff8160c0e7 RSP: ffff884a58f57780 RFLAGS: 00000206
    RAX: 0000000000005928 RBX: ffffea01e5343300 RCX: 0000000000007502
    RDX: 0000000000007504 RSI: 0000000000007504 RDI: ffffea00fc5ad030
    RBP: ffff884a58f57780 R8: 00000000fc5ad000 R9: ffff8863838edff8
    R10: 00000000000000a8 R11: 0000000000000006 R12: ffff884a58f577c8
    R13: ffff883f16b40100 R14: 0000000000000000 R15: ffffea00fc5ad030
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

The spinlock is this one:

crash64> spinlock ffffea00fc5ad030
struct spinlock {
  {
    rlock = {
      raw_lock = {
        {
          head_tail = 1963357442,
          tickets = {
            head = 29954,
            tail = 29958
          }
        }
      }
    }
  }
}

The current ticket (head) is 0x7502 and our ticket is 0x7504, but in memory the tail is now 29958 (0x7506). There's only one problem with that: there doesn't appear to be anyone else spinning on this spinlock, so the tail shouldn't be that far ahead of the head.
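
As a reminder of how the ticket lock in that disassembly works, here's a simplified sketch (not the real arch_spin_lock, just the idea; the lock xadd above adds 0x20000 to the combined head_tail word, i.e. it bumps the tail by 2 and hands the old tail back as our ticket):

struct ticket_lock { unsigned short head, tail; };  /* same layout as the dumps */

static void ticket_lock(struct ticket_lock *lock)
{
        /* atomically take a ticket: grab the current tail and advance it by 2 */
        unsigned short me = __sync_fetch_and_add(&lock->tail, 2);  /* the lock xadd */

        /* spin until the unlocker advances head to our ticket
         * (the movzwl/cmp/pause loop in the disassembly) */
        while (*(volatile unsigned short *)&lock->head != me)
                __builtin_ia32_pause();
}

Since head is only advanced by the unlocker, tail - head is the number of tickets outstanding (the current holder plus any waiters), moving in steps of 2 here.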

Let's see if we can work out where the spinlock came from:

0595 pte_t *__page_check_address(struct page *page, struct mm_struct *mm,
0596 unsigned long address, spinlock_t **ptlp, int sync)
0597 {
0598 pmd_t *pmd;
0599 pte_t *pte;
0600 spinlock_t *ptl;
0601
0602 if (unlikely(PageHuge(page))) {
0603 pte = huge_pte_offset(mm, address);
0604 ptl = huge_pte_lockptr(page_hstate(page), mm, pte);
0605 goto check;
0606 }
0607
0608 pmd = mm_find_pmd(mm, address);
0609 if (!pmd)
0610 return NULL;
0611
0612 if (pmd_trans_huge(*pmd))
0613 return NULL;
0614
0615 pte = pte_offset_map(pmd, address);
0616 /* Make a quick check before getting the lock */
0617 if (!sync && !pte_present(*pte)) {
0618 pte_unmap(pte);
0619 return NULL;
0620 }
0621
0622 ptl = pte_lockptr(mm, pmd);
0623 check:
0624 spin_lock(ptl);
0625 if (pte_present(*pte) && page_to_pfn(page) == pte_pfn(*pte)) {
0626 *ptlp = ptl;
0627 return pte;
0628 }
0629 pte_unmap_unlock(pte, ptl);
0630 return NULL;
0631 }

We need to disassemble to try and work out where the arguments went:

RDI struct page *page ffffea01e5343300 (rdi was moved to rbx, which may have been
                                        clobbered since, but it matches the struct page *
                                        from the earlier stack frames)
RSI struct mm_struct *mm ffffea01e5343320
RDX unsigned long address
RCX spinlock_t **ptlp

The struct mm_struct pointer in r15 is overwritten by the instruction at 0xffffffff8118c88f:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/include/linux/mm.h: 1396
0xffffffff8118c883 <__page_check_address+179>: shl $0x12,%r8
0xffffffff8118c887 <__page_check_address+183>: shr $0x1e,%r8
0xffffffff8118c88b <__page_check_address+187>: shl $0x6,%r8
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 622
0xffffffff8118c88f <__page_check_address+191>: lea 0x30(%r8,%rax,1),%r15
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/include/linux/spinlock.h: 293
0xffffffff8118c894 <__page_check_address+196>: mov %r15,%rdi
0xffffffff8118c897 <__page_check_address+199>: callq 0xffffffff8160c0b0 <_raw_spin_lock>
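
Those shifts plus the lea are computing the split-PTE-lock address: with split page-table locks the spinlock protecting a page-table page's PTEs lives inside that page-table page's own struct page (the ptl member of the last union, at offset 0x30 here). Roughly, a paraphrase of pte_lockptr() from include/linux/mm.h:

static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
{
        /* pmd value -> pfn -> struct page in vmemmap (the shl/shr/shl above),
         * then the ptl member at offset 0x30 (the lea 0x30(%r8,%rax,1)) */
        return &pmd_page(*pmd)->ptl;
}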

See if we can get it from the caller when it primes rsi:

crash64> dis -l page_referenced_one
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 666
0xffffffff8118cc10 <page_referenced_one>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8118cc15 <page_referenced_one+5>: push %rbp
0xffffffff8118cc16 <page_referenced_one+6>: mov %rsp,%rbp
0xffffffff8118cc19 <page_referenced_one+9>: push %r15
0xffffffff8118cc1b <page_referenced_one+11>: push %r14
0xffffffff8118cc1d <page_referenced_one+13>: mov %r8,%r14
0xffffffff8118cc20 <page_referenced_one+16>: push %r13
0xffffffff8118cc22 <page_referenced_one+18>: mov %rcx,%r13
0xffffffff8118cc25 <page_referenced_one+21>: push %r12
0xffffffff8118cc27 <page_referenced_one+23>: mov %rdx,%r12
0xffffffff8118cc2a <page_referenced_one+26>: push %rbx
0xffffffff8118cc2b <page_referenced_one+27>: mov %rsi,%rbx
0xffffffff8118cc2e <page_referenced_one+30>: sub $0x10,%rsp
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 667
0xffffffff8118cc32 <page_referenced_one+34>: mov 0x40(%rsi),%rsi
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 666
0xffffffff8118cc36 <page_referenced_one+38>: mov %gs:0x28,%rax
0xffffffff8118cc3f <page_referenced_one+47>: mov %rax,-0x30(%rbp)
0xffffffff8118cc43 <page_referenced_one+51>: xor %eax,%eax
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/bitops.h: 319
0xffffffff8118cc45 <page_referenced_one+53>: mov (%rdi),%rax
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 671
0xffffffff8118cc48 <page_referenced_one+56>: test $0x40,%ah
0xffffffff8118cc4b <page_referenced_one+59>: jne 0xffffffff8118cd1a <page_referenced_one+266>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/include/linux/rmap.h: 207
0xffffffff8118cc51 <page_referenced_one+65>: lea -0x38(%rbp),%rcx
0xffffffff8118cc55 <page_referenced_one+69>: xor %r8d,%r8d
0xffffffff8118cc58 <page_referenced_one+72>: callq 0xffffffff8118c7d0 <__page_check_address>

That's the call to this:

0201 static inline pte_t *page_check_address(struct page *page, struct mm_struct *mm,
0202 unsigned long address,
0203 spinlock_t **ptlp, int sync)
0204 {
0205 pte_t *ptep;
0206
0207 __cond_lock(*ptlp, ptep = __page_check_address(page, mm, address,
0208 ptlp, sync));
0209 return ptep;
0210 }

At the point we called __page_check_address, the rsi passed to page_referenced_one was sitting in rbx, so based on the prologue of __page_check_address:

crash64> dis -l __page_check_address
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 597
0xffffffff8118c7d0 <__page_check_address>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8118c7d5 <__page_check_address+5>: push %rbp
0xffffffff8118c7d6 <__page_check_address+6>: mov %rsp,%rbp
0xffffffff8118c7d9 <__page_check_address+9>: push %r15
0xffffffff8118c7db <__page_check_address+11>: mov %rsi,%r15
0xffffffff8118c7de <__page_check_address+14>: push %r14
0xffffffff8118c7e0 <__page_check_address+16>: mov %r8d,%r14d
0xffffffff8118c7e3 <__page_check_address+19>: push %r13
0xffffffff8118c7e5 <__page_check_address+21>: mov %rdx,%r13
0xffffffff8118c7e8 <__page_check_address+24>: push %r12
0xffffffff8118c7ea <__page_check_address+26>: mov %rcx,%r12
0xffffffff8118c7ed <__page_check_address+29>: push %rbx
0xffffffff8118c7ee <__page_check_address+30>: mov %rdi,%rbx

 #5 [ffff884a58f57788] __page_check_address at ffffffff8118c89c
    ffff884a58f57790: rbx ffff883ef96cfd88 r12 00007eecffe20000
    ffff884a58f577a0: r13 ffff884a58f57844 r14 ffff884a58f578c8
    ffff884a58f577b0: r15 0000000003e20000 rbp ffff884a58f57800
    ffff884a58f577c0: rip ffffffff8118cc5d
 #6 [ffff884a58f577c0] page_referenced_one at ffffffff8118cc5d

rbx has the value ffff883ef96cfd88, which gives us the following struct mm_struct (it definitely doesn't look correct - I'm not sure whether I've done something wrong in working it out):

crash64> struct mm_struct ffff883ef96cfd88
struct mm_struct {
  mmap = 0x7eecfc000000,
  mm_rb = {
    rb_node = 0x7eecffffb000
  },
  mmap_cache = 0xffff8858b3b35b00,
  get_unmapped_area = 0xffff8858b3b34798,
  unmap_area = 0xffff883ef96ce459,
  mmap_base = 18446612513286216480,
  mmap_legacy_base = 18446612513286211512,
  task_size = 0,
  cached_hole_size = 18446612306340033024,
  free_area_cache = 9223372036854775845,
  highest_vm_end = 2097267,
  pgd = 0x0,
  mm_users = {
    counter = 0
  },
  mm_count = {
    counter = 0
  },
  nr_ptes = {
    counter = 0
  },
  map_count = 0,
  page_table_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 0,
            tickets = {
              head = 0,
              tail = 0
            }
          }
        }
      }
    }
  },
  mmap_sem = {
    count = -131672218988784,
    wait_lock = {
      raw_lock = {
        {
          head_tail = 2888371984,
          tickets = {
            head = 3856,
            tail = 44073
          }
        }
      }
    },
    wait_list = {
      next = 0xffff883ecc996480,
      prev = 0x0
    }
  },
  mmlist = {
    next = 0x7eecfc000,
    prev = 0x0
  },
  hiwater_rss = 0,
  hiwater_vm = 0,
  total_vm = 0,
  locked_vm = 0,
  pinned_vm = 0,
  shared_vm = 0,
  exec_vm = 139484431601664,
  stack_vm = 139484431646720,
  def_flags = 18446612357284053856,
  start_code = 18446612462113425088,
  end_code = 18446612357284053889,
  start_data = 0,
  end_data = 0,
  start_brk = 0,
  brk = 18446612306340033024,
  start_stack = 288,
  arg_start = 2097264,
  arg_end = 0,
  env_start = 0,
  env_end = 0,
  saved_auxv = {0, 18446612402786860760, 18446612402786860760, 0, 0, 34053816309, 0, 0, 0, 0, 0, 0, 0, 18446612678379249408, 18446744071588520736, 18446612378695092864, 18446612181454733456, 18446744071579492128, 4063714926659634235, 7881694551112431713, 7594883584547516261, 3761967375333811570, 3833800651028182321, 7234578026164795188, 7075267604586770489, 7018701050544863587, 7598242686775027053, 7956004992772305775, 3833750091583158885, 3832673449749328181, 7291388494099866416, 7089338032643520098, 7881694551112824369, 8028073573705215845, 7308895142333738355, 3834028070305427322, 3473735894483624241, 7377801329096667491, 0, 0, 0, 0, 0, 0},
  rss_stat = {
    count = {{
        counter = 0
      }, {
        counter = 0
      }, {
        counter = 0
      }}
  },
  binfmt = 0x0,
  cpu_vm_mask_var = 0x0,
  context = {
    ldt = 0x0,
    size = 0,
    ia32_compat = 0,
    lock = {
      count = {
        counter = 0
      },
      wait_lock = {
        {
          rlock = {
            raw_lock = {
              {
                head_tail = 0,
                tickets = {
                  head = 0,
                  tail = 0
                }
              }
            }
          }
        }
      },
      wait_list = {
        next = 0x0,
        prev = 0x0
      },
      owner = 0x0,
      osq = 0x0
    },
    vdso = 0x0
  },
  flags = 0,
  core_state = 0x0,
  ioctx_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 0,
            tickets = {
              head = 0,
              tail = 0
            }
          }
        }
      }
    }
  },
  ioctx_list = {
    first = 0x0
  },
  owner = 0x0,
  exe_file = 0x0,
  mmu_notifier_mm = 0x0,
  cpumask_allocation = {
    bits = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
  },
  numa_next_scan = 0,
  numa_scan_offset = 0,
  numa_scan_seq = 0,
  tlb_flush_pending = false,
  uprobes_state = {
    xol_area = 0x0
  },
  rh_reserved1 = 0,
  rh_reserved2 = 0,
  rh_reserved3 = 0,
  rh_reserved4 = 0,
  rh_reserved5 = 0,
  rh_reserved6 = 0,
  rh_reserved7 = 0,
  rh_reserved8 = 0
}

But it's a vm_area_struct, not a struct mm_struct (I'm not sure why we appear to be using the address of a struct mm_struct when it's really a vm_area_struct; looking back at page_referenced_one+34, rsi is reloaded from 0x40(%rsi), which is vma->vm_mm, just before the call, so the value we recovered from rbx is most likely the original vma pointer rather than the mm):

crash64> kmem ffff883ef96cfd88
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff883f7f46e700 vm_area_struct 216 47026 47471 1283 8k
  SLAB MEMORY NODE TOTAL ALLOCATED FREE
  ffffea00fbe5b380 ffff883ef96ce000 0 37 37 0
  FREE / [ALLOCATED]
  [ffff883ef96cfd88]

      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea00fbe5b3c0 3ef96cf000 0 0 0 2fffff00008000 tail

crash64> vm_area_struct ffff883ef96cfd88
struct vm_area_struct {
  vm_start = 139556305240064,
  vm_end = 139556372328448,
  vm_next = 0xffff8858b3b35b00,
  vm_prev = 0xffff8858b3b34798,
  vm_rb = {
    __rb_parent_color = 18446612402786853977,
    rb_right = 0xffff8858b3b35b20,
    rb_left = 0xffff8858b3b347b8
  },
  rb_subtree_gap = 0,
  vm_mm = 0xffff882884bf3200,
  vm_page_prot = {
    pgprot = 9223372036854775845
  },
  vm_flags = 2097267,
  shared = {
    linear = {
      rb = {
        __rb_parent_color = 0,
        rb_right = 0x0,
        rb_left = 0x0
      },
      rb_subtree_last = 0
    },
    nonlinear = {
      next = 0x0,
      prev = 0x0
    }
  },
  anon_vma_chain = {
    next = 0xffff883eac290f10,
    prev = 0xffff883eac290f10
  },
  anon_vma = 0xffff883ecc996480,
  vm_ops = 0x0,
  vm_pgoff = 34071363584,
  vm_file = 0x0,
  vm_private_data = 0x0,
  vm_policy = 0x0,
  rh_reserved1 = 0,
  rh_reserved2 = 0,
  rh_reserved3 = 0,
  rh_reserved4 = 0
}

That also shows the struct anon_vma that we saw earlier at address 0xffff883ecc996480.

It's too hard to work out, but the spinlock is going to be part of this struct page, and the page doesn't appear to be valid - the _mapcount counter is -1, so it should be free, yet somehow we ended up attempting to lock the spinlock in the union at the bottom. This thread was never going to return from the spinlock unless something came along, allocated the page, and then set the spinlock value to something that allowed it to complete (after which it would then destroy the value in the union when it unlocked the spinlock):

crash64> struct page ffffea00fc5ad000
struct page {
  flags = 13510794587144192,
  mapping = 0x0,
  {
    {
      index = 0,
      freelist = 0x0,
      pfmemalloc = false,
      pmd_huge_pte = 0x0
    },
    {
      counters = 8589934591,
      {
        {
          _mapcount = {
            counter = -1
          },
          {
            inuse = 65535,
            objects = 32767,
            frozen = 1
          },
          units = -1
        },
        _count = {
          counter = 1
        }
      }
    }
  },
  {
    lru = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
    {
      next = 0xdead000000100100,
      pages = 2097664,
      pobjects = -559087616
    },
    list = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
    slab_page = 0xdead000000100100
  },
  {
    private = 1963357442,
    ptl = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 1963357442,
              tickets = {
                head = 29954,
                tail = 29958
              }
            }
          }
        }
      }
    },
    slab_cache = 0x75067502,
    first_page = 0x75067502
  }
}

It looks as though we are hitting some kind of kernel issue in the mm code (again, exactly what I have no idea at this point).

Now let's see what this task is doing:

PID: 15207 TASK: ffff883efaa2ad80 CPU: 2 COMMAND: "vertica"
 #0 [ffff883f7f885e70] crash_nmi_callback at ffffffff810406c2
 #1 [ffff883f7f885e80] nmi_handle at ffffffff8160d6d9
 #2 [ffff883f7f885ec8] do_nmi at ffffffff8160d7f0
 #3 [ffff883f7f885ef0] end_repeat_nmi at ffffffff8160cb31
    [exception RIP: smp_call_function_many+518]
    RIP: ffffffff810d6e36 RSP: ffff8807809476e8 RFLAGS: 00000202
    RAX: 0000000000000001 RBX: 0000000000000028 RCX: ffff883f7f8570c8
    RDX: 0000000000000001 RSI: 0000000000000028 RDI: 0000000000000000
    RBP: ffff880780947720 R8: ffff883f26fa9000 R9: ffff883f7f8964a0
    R10: ffffea00092ba800 R11: ffffffff812d4e39 R12: 0000000000014140
    R13: ffffffff8105faf0 R14: ffff880780947730 R15: ffff883f7f894180
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
 #4 [ffff8807809476e8] smp_call_function_many at ffffffff810d6e36
 #5 [ffff880780947728] native_flush_tlb_others at ffffffff8105fcb8
 #6 [ffff880780947778] flush_tlb_page at ffffffff8105fef5
 #7 [ffff880780947798] ptep_clear_flush_young at ffffffff8105e9cd
 #8 [ffff8807809477c0] page_referenced_one at ffffffff8118cc76
 #9 [ffff880780947808] page_referenced at ffffffff8118e57b
#10 [ffff880780947880] shrink_active_list at ffffffff8116b1cc
#11 [ffff880780947938] shrink_lruvec at ffffffff8116b889
#12 [ffff880780947a38] shrink_zone at ffffffff8116bb76
#13 [ffff880780947a90] do_try_to_free_pages at ffffffff8116c080
#14 [ffff880780947b08] try_to_free_pages at ffffffff8116c56c
#15 [ffff880780947ba0] __alloc_pages_nodemask at ffffffff81160c0d
#16 [ffff880780947cd8] alloc_pages_vma at ffffffff811a2a2a
#17 [ffff880780947d40] do_huge_pmd_anonymous_page at ffffffff811b6deb
#18 [ffff880780947d98] handle_mm_fault at ffffffff81182794
#19 [ffff880780947e28] __do_page_fault at ffffffff816101c6
#20 [ffff880780947f28] do_page_fault at ffffffff816105ca
#21 [ffff880780947f50] page_fault at ffffffff8160c7c8
    RIP: 00000000011e70dc RSP: 00007ee38dcaa370 RFLAGS: 00010246
    RAX: 00007ea949501a90 RBX: 0000000000000008 RCX: 00007ee96bc00000
    RDX: 00007ea949501ac0 RSI: 0001c207952ca000 RDI: 000000000000004c
    RBP: 00007ee38dcaa3f0 R8: 0000000000000008 R9: 0000000000000002
    R10: 0000000000018fa3 R11: 00007eb9c8a709b8 R12: 00007eb9c8a6fca0
    R13: 00007f01b14c8652 R14: 0000000000000040 R15: 00007eb9c8a70a88
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

Let's work out where we were when we called page_referenced. We are here:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 1651
0xffffffff8116b1ba <shrink_active_list+458>: mov 0x38(%r14),%rdx
0xffffffff8116b1be <shrink_active_list+462>: lea -0x68(%rbp),%rcx
0xffffffff8116b1c2 <shrink_active_list+466>: xor %esi,%esi
0xffffffff8116b1c4 <shrink_active_list+468>: mov %r15,%rdi
0xffffffff8116b1c7 <shrink_active_list+471>: callq 0xffffffff8118e300 <page_referenced>
0xffffffff8116b1cc <shrink_active_list+476>: test %eax,%eax
0xffffffff8116b1ce <shrink_active_list+478>: je 0xffffffff8116b168 <shrink_active_list+376>

1651 if (page_referenced(page, 0, sc->target_mem_cgroup, <<<<<<<<<
1652 &vm_flags)) { <<<<<<<<<
1653 nr_rotated += hpage_nr_pages(page);
1654 /*
1655 * Identify referenced, file-backed active pages and
1656 * give them one more trip around the active list. So
1657 * that executable code get better chances to stay in
1658 * memory under moderate memory pressure. Anon pages
1659 * are not likely to be evicted by use-once streaming
1660 * IO, plus JVM can create lots of anon VM_EXEC pages,
1661 * so we ignore them here.
1662 */
1663 if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
1664 list_add(&page->lru, &l_active);
1665 continue;
1666 }
1667 }

So in page_referenced the following registers are saved:

crash64> dis -l page_referenced
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 849
0xffffffff8118e300 <page_referenced>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8118e305 <page_referenced+5>: push %rbp
0xffffffff8118e306 <page_referenced+6>: mov %rcx,%rax
0xffffffff8118e309 <page_referenced+9>: mov %rsp,%rbp
0xffffffff8118e30c <page_referenced+12>: push %r15
0xffffffff8118e30e <page_referenced+14>: push %r14
0xffffffff8118e310 <page_referenced+16>: push %r13
0xffffffff8118e312 <page_referenced+18>: push %r12
0xffffffff8118e314 <page_referenced+20>: push %rbx
0xffffffff8118e315 <page_referenced+21>: mov %rdi,%rbx
0xffffffff8118e318 <page_referenced+24>: sub $0x40,%rsp
0xffffffff8118e31c <page_referenced+28>: mov %rcx,-0x58(%rbp)

And the stack frame looks like:

 #9 [ffff880780947808] page_referenced at ffffffff8118e57b
    ffff880780947810: ffff886e2da628c0 00000007f04ffe98
    ffff880780947820: ffff8807809478c8 0000000000000000
    ffff880780947830: 00000007f04fc000 00007f04fc000000
    ffff880780947840: 0000000126dd8410 000000001fedd942
    ffff880780947850: rbx ffffea01af4392a0 r12 ffff8807809478d0
    ffff880780947860: r13 0000000000000000 r14 ffff880780947b20
    ffff880780947870: r15 ffffea01af439280 rbp ffff880780947930
    ffff880780947880: rip ffffffff8116b1cc
#10 [ffff880780947880] shrink_active_list at ffffffff8116b1cc

The 4 args passed into page_referenced are in these registers:

RDI struct page *page rdi is r15 is 0xffffea01af439280
RSI int is_locked not known at this point
RDX struct mem_cgroup *memcg rdx is *(r14+0x38) 0x0000000000000000
RCX unsigned long *vm_flags rcx is rbp-0x68 0xffff8807809478c8 (dereferenced is 0)

crash64> struct page 0xffffea01af439280
struct page {
  flags = 27033606493306968,
  mapping = 0xffff886e2da628c1,
  {
    {
      index = 34096545432,
      freelist = 0x7f04ffe98,
      pfmemalloc = 152,
      pmd_huge_pte = 0x7f04ffe98
    },
    {
      counters = 8589934592,
      {
        {
          _mapcount = {
            counter = 0
          },
          {
            inuse = 0,
            objects = 0,
            frozen = 0
          },
          units = 0
        },
        _count = {
          counter = 2
        }
      }
    }
  },
  {
    lru = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
    {
      next = 0xdead000000100100,
      pages = 2097664,
      pobjects = -559087616
    },
    list = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
    slab_page = 0xdead000000100100
  },
  {
    private = 0,
    ptl = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    slab_cache = 0x0,
    first_page = 0x0
  }
}

crash64> struct anon_vma 0xffff886e2da628c0
struct anon_vma {
  root = 0xffff886e2da628c0,
  rwsem = {
    count = 1,
    wait_lock = {
      raw_lock = {
        {
          head_tail = 0,
          tickets = {
            head = 0,
            tail = 0
          }
        }
      }
    },
    wait_list = {
      next = 0xffff886e2da628d8,
      prev = 0xffff886e2da628d8
    }
  },
  refcount = {
    counter = 1
  },
  rb_root = {
    rb_node = 0xffff887b55d997a0
  }
}

That's definitely an anon_vma since it's allocated out of that slab:

crash64> kmem 0xffff886e2da628c0
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff883f7f46e000 anon_vma 56 20317 25664 401 4k
  SLAB MEMORY NODE TOTAL ALLOCATED FREE
  ffffea01b8b69880 ffff886e2da62000 1 64 12 52
  FREE / [ALLOCATED]
  [ffff886e2da628c0]

      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea01b8b69880 6e2da62000 0 ffff886e2da62400 1 6fffff00000080 slab

This CPU seems to be going fine, and it's likely that it's waiting on the thread that is trying to acquire the spinlock, since it is in smp_call_function_many, which sends IPIs to the other CPUs. We're at +518 in that function:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/smp.c: 109
0xffffffff810d6e32 <smp_call_function_many+514>: testb $0x1,0x20(%rcx)
0xffffffff810d6e36 <smp_call_function_many+518>: jne 0xffffffff810d6e30 <smp_call_function_many+512>
0xffffffff810d6e38 <smp_call_function_many+520>: movslq 0x94bf15(%rip),%rsi # 0xffffffff81a22d54 <nr_cpu_ids>
0xffffffff810d6e3f <smp_call_function_many+527>: jmp 0xffffffff810d6df8 <smp_call_function_many+456>

That's in this (inlined) function:

0100 /*
0101 * csd_lock/csd_unlock used to serialize access to per-cpu csd resources
0102 *
0103 * For non-synchronous ipi calls the csd can still be in use by the
0104 * previous function call. For multi-cpu calls its even more interesting
0105 * as we'll have to ensure no other cpu is observing our csd.
0106 */
0107 static void csd_lock_wait(struct call_single_data *csd)
0108 {
0109 while (csd->flags & CSD_FLAG_LOCK)
0110 cpu_relax();
0111 }

which is after the IPIs have been sent:

0432 /* Send a message to all CPUs in the map */
0433 arch_send_call_function_ipi_mask(cfd->cpumask);
0434
0435 if (wait) {
0436 for_each_cpu(cpu, cfd->cpumask) {
0437 struct call_single_data *csd;
0438
0439 csd = per_cpu_ptr(cfd->csd, cpu);
0440 csd_lock_wait(csd);
0441 }
0442 }

So we're waiting for some threads to indicate that they've done the task that we wanted done (a TLB flush). I didn't try to extract the CPU mask to confirm exactly which CPUs we are waiting for.

Going back to the thread that brought down the system.

crash64> bt
PID: 15199 TASK: ffff88572574a220 CPU: 1 COMMAND: "vertica"
 #0 [ffff88725b71f400] machine_kexec at ffffffff8104c6a1
 #1 [ffff88725b71f458] crash_kexec at ffffffff810e2252
 #2 [ffff88725b71f528] oops_end at ffffffff8160d548
 #3 [ffff88725b71f550] no_context at ffffffff815fdf52
 #4 [ffff88725b71f5a0] __bad_area_nosemaphore at ffffffff815fdfe8
 #5 [ffff88725b71f5e8] bad_area_nosemaphore at ffffffff815fe152
 #6 [ffff88725b71f5f8] __do_page_fault at ffffffff816103ae
 #7 [ffff88725b71f6f8] do_page_fault at ffffffff816105ca
 #8 [ffff88725b71f720] page_fault at ffffffff8160c7c8
    [exception RIP: down_read_trylock+9]
    RIP: ffffffff8109c389 RSP: ffff88725b71f7d0 RFLAGS: 00010202
    RAX: 0000000000000000 RBX: ffff881e84f50ec0 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000008
    RBP: ffff88725b71f7d0 R8: ffffea0191f314e0 R9: ffff883f24ca9098
    R10: ffffea00fc038800 R11: ffffffff812d4e39 R12: ffff881e84f50ec1
    R13: ffffea0191f314c0 R14: 0000000000000008 R15: ffffea0191f314c0
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
 #9 [ffff88725b71f7d8] page_lock_anon_vma_read at ffffffff8118e245
#10 [ffff88725b71f808] page_referenced at ffffffff8118e4c7
#11 [ffff88725b71f880] shrink_active_list at ffffffff8116b1cc
#12 [ffff88725b71f938] shrink_lruvec at ffffffff8116b889
#13 [ffff88725b71fa38] shrink_zone at ffffffff8116bb76
#14 [ffff88725b71fa90] do_try_to_free_pages at ffffffff8116c080
#15 [ffff88725b71fb08] try_to_free_pages at ffffffff8116c56c
#16 [ffff88725b71fba0] __alloc_pages_nodemask at ffffffff81160c0d
#17 [ffff88725b71fcd8] alloc_pages_vma at ffffffff811a2a2a
#18 [ffff88725b71fd40] do_huge_pmd_anonymous_page at ffffffff811b6deb
#19 [ffff88725b71fd98] handle_mm_fault at ffffffff81182794
#20 [ffff88725b71fe28] __do_page_fault at ffffffff816101c6
#21 [ffff88725b71ff28] do_page_fault at ffffffff816105ca
#22 [ffff88725b71ff50] page_fault at ffffffff8160c7c8
    RIP: 000000000229a610 RSP: 00007ee38d57e030 RFLAGS: 00010206
    RAX: 00007eb93481c330 RBX: 0000000000010000 RCX: 00007eb93481c330
    RDX: 0000000000000005 RSI: 0000000011fbda48 RDI: 0000000000000137
    RBP: 00007ee38d57e050 R8: 0000000000000000 R9: 000000000045f2d8
    R10: 0000000000000000 R11: 00007eb93481c330 R12: 00007eb93481c330
    R13: 000000000000002a R14: 00007ef11cda5e40 R15: 00007ef7876325e0
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

Let's see if we can get the args to do_huge_pmd_anonymous_page. The prototype is:

0782 int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
0783 unsigned long address, pmd_t *pmd,
0784 unsigned int flags)
0785 {

The args being primed are:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/memory.c: 3727
0xffffffff81182780 <handle_mm_fault+368>: mov %r14d,%r8d
0xffffffff81182783 <handle_mm_fault+371>: mov %r12,%rcx
0xffffffff81182786 <handle_mm_fault+374>: mov %rbx,%rdx
0xffffffff81182789 <handle_mm_fault+377>: mov %r13,%rsi
0xffffffff8118278c <handle_mm_fault+380>: mov %r15,%rdi
0xffffffff8118278f <handle_mm_fault+383>: callq 0xffffffff811b6ce0 <do_huge_pmd_anonymous_page>

Now, does do_huge_pmd_anonymous_page save those registers? Yes:

crash64> dis -l do_huge_pmd_anonymous_page
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/huge_memory.c: 785
0xffffffff811b6ce0 <do_huge_pmd_anonymous_page>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff811b6ce5 <do_huge_pmd_anonymous_page+5>: push %rbp
0xffffffff811b6ce6 <do_huge_pmd_anonymous_page+6>: mov %rsp,%rbp
0xffffffff811b6ce9 <do_huge_pmd_anonymous_page+9>: push %r15
0xffffffff811b6ceb <do_huge_pmd_anonymous_page+11>: mov %rsi,%r15
0xffffffff811b6cee <do_huge_pmd_anonymous_page+14>: push %r14
0xffffffff811b6cf0 <do_huge_pmd_anonymous_page+16>: push %r13
0xffffffff811b6cf2 <do_huge_pmd_anonymous_page+18>: push %r12
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/huge_memory.c: 790
0xffffffff811b6cf4 <do_huge_pmd_anonymous_page+20>: mov $0x800,%r12d
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/huge_memory.c: 785
0xffffffff811b6cfa <do_huge_pmd_anonymous_page+26>: push %rbx
0xffffffff811b6cfb <do_huge_pmd_anonymous_page+27>: mov %rdx,%rbx

#18 [ffff88725b71fd40] do_huge_pmd_anonymous_page at ffffffff811b6deb
    ffff88725b71fd48: ffffffff810a67d9 ffff8839d7dc4fa0
    ffff88725b71fd58: ffff883f7f853680 ffff883900000029
    ffff88725b71fd68: rbx 00007eb93481c330 r12 ffff8879f66b6d20
    ffff88725b71fd78: r13 ffff883eb8f0caf8 r14 0000000000000029
    ffff88725b71fd88: r15 ffff882884bf3200 rbp ffff88725b71fe20
    ffff88725b71fd98: rip ffffffff81182794
#19 [ffff88725b71fd98] handle_mm_fault at ffffffff81182794

The args should be:

RDI struct mm_struct * r15 0xffff882884bf3200
RSI struct vm_area_struct * r13 0xffff883eb8f0caf8
RDX unsigned long rbx 0x00007eb93481c330
RCX pmd_t * r12 0xffff8879f66b6d20
R8 unsigned int r14d 0x00000029

crash64> mm_struct 0xffff882884bf3200
struct mm_struct {
  mmap = 0xffff887f220cbd88,
  mm_rb = {
    rb_node = 0xffff887abff7ce78
  },
  mmap_cache = 0xffff8854da23d440,
  get_unmapped_area = 0xffffffff81018810 <arch_get_unmapped_area_topdown>,
  unmap_area = 0xffffffff811884e0 <arch_unmap_area_topdown>,
  mmap_base = 139718252187648,
  mmap_legacy_base = 47495286927360,
  task_size = 140737488351232,
  cached_hole_size = 18446744073709551615,
  free_area_cache = 139718252187648,
  highest_vm_end = 140737338667008,
  pgd = 0xffff883ed6993000,
  mm_users = {
    counter = 15271
  },
  mm_count = {
    counter = 33
  },
  nr_ptes = {
    counter = 132134
  },
  map_count = 35262,
  page_table_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 4190173632,
            tickets = {
              head = 63936,
              tail = 63936
            }
          }
        }
      }
    }
  },
  mmap_sem = {
    count = -4294967292,
    wait_lock = {
      raw_lock = {
        {
          head_tail = 4046254380,
          tickets = {
            head = 61740,
            tail = 61740
          }
        }
      }
    },
    wait_list = {
      next = 0xffff88632afe3df0,
      prev = 0xffff887c5740fd70
    }
  },
  mmlist = {
    next = 0xffff882884bf3298,
    prev = 0xffff882884bf3298
  },
  hiwater_rss = 71725456,
  hiwater_vm = 137067799,
  total_vm = 111876172,
  locked_vm = 0,
  pinned_vm = 0,
  shared_vm = 67330,
  exec_vm = 21088,
  stack_vm = 38,
  def_flags = 0,
  start_code = 4194304,
  end_code = 75474696,
  start_data = 75478816,
  end_data = 116921720,
  start_brk = 142704640,
  brk = 262139904,
  start_stack = 140737337695808,
  arg_start = 140737337702185,
  arg_end = 140737337702346,
  env_start = 140737337702346,
  env_end = 140737337704415,
  saved_auxv = {33, 140737338658816, 16, 3219913727, 6, 4096, 17, 100, 3, 4194368, 4, 56, 5, 8, 7, 139718249943040, 8, 0, 9, 5079160, 11, 10019, 12, 10019, 13, 540, 14, 540, 23, 0, 25, 140737337696457, 31, 140737337704415, 15, 140737337696473, 0, 0, 0, 0, 0, 0, 0, 0},
  rss_stat = {
    count = {{
        counter = 10223
      }, {
        counter = 53065662
      }, {
        counter = 0
      }}
  },
  binfmt = 0xffffffff81995fe0 <elf_format>,
  cpu_vm_mask_var = 0xffff882884bf3548,
  context = {
    ldt = 0x0,
    size = 0,
    ia32_compat = 0,
    lock = {
      count = {
        counter = 1
      },
      wait_lock = {
        {
          rlock = {
            raw_lock = {
              {
                head_tail = 0,
                tickets = {
                  head = 0,
                  tail = 0
                }
              }
            }
          }
        }
      },
      wait_list = {
        next = 0xffff882884bf34e8,
        prev = 0xffff882884bf34e8
      },
      owner = 0x0,
      osq = 0x0
    },
    vdso = 0x7ffff713d000
  },
  flags = 131277,
  core_state = 0x0,
  ioctx_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 0,
            tickets = {
              head = 0,
              tail = 0
            }
          }
        }
      }
    }
  },
  ioctx_list = {
    first = 0x0
  },
  owner = 0xffff883ebaa24fa0,
  exe_file = 0xffff88623b53d800,
  mmu_notifier_mm = 0x0,
  cpumask_allocation = {
    bits = {519, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
  },
  numa_next_scan = 8061605031,
  numa_scan_offset = 139492130291712,
  numa_scan_seq = 5678,
  tlb_flush_pending = false,
  uprobes_state = {
    xol_area = 0x0
  },
  rh_reserved1 = 0,
  rh_reserved2 = 0,
  rh_reserved3 = 0,
  rh_reserved4 = 0,
  rh_reserved5 = 0,
  rh_reserved6 = 0,
  rh_reserved7 = 0,
  rh_reserved8 = 0
}

crash64> vm_area_struct ffff883eb8f0caf8
struct vm_area_struct {
  vm_start = 139333908545536,
  vm_end = 139333925326848,
  vm_next = 0xffff8839718125e8,
  vm_prev = 0xffff883970a62798,
  vm_rb = {
    __rb_parent_color = 18446612379017291704,
    rb_right = 0x0,
    rb_left = 0x0
  },
  rb_subtree_gap = 41967616,
  vm_mm = 0xffff882884bf3200,
  vm_page_prot = {
    pgprot = 9223372036854775845
  },
  vm_flags = 1048691,
  shared = {
    linear = {
      rb = {
        __rb_parent_color = 0,
        rb_right = 0x0,
        rb_left = 0x0
      },
      rb_subtree_last = 0
    },
    nonlinear = {
      next = 0x0,
      prev = 0x0
    }
  },
  anon_vma_chain = {
    next = 0xffff887a7e661dd0,
    prev = 0xffff887a7e661dd0
  },
  anon_vma = 0xffff883a45147800,
  vm_ops = 0x0,
  vm_pgoff = 34017067516,
  vm_file = 0x0,
  vm_private_data = 0x0,
  vm_policy = 0x0,
  rh_reserved1 = 0,
  rh_reserved2 = 0,
  rh_reserved3 = 0,
  rh_reserved4 = 0
}

address is 0x00007eb93481c330

0041 #define PMD_SHIFT 21

0064 #define HPAGE_PMD_SHIFT PMD_SHIFT
0065 #define HPAGE_PMD_SIZE ((1UL) << HPAGE_PMD_SHIFT)
0066 #define HPAGE_PMD_MASK (~(HPAGE_PMD_SIZE - 1))

crash64> p 1<<21
$1 = 2097152
crash64> p/x ~((unsigned long)$1-1)
$3 = 0xffffffffffe00000
crash64> p/x 0x00007eb93481c330 & 0xffffffffffe00000
$4 = 0x7eb934800000


crash64> pmd_t 0xffff8879f66b6d20
struct pmd_t {
  pmd = 0
}

The flags being 0x29 translate to:

0183 #define FAULT_FLAG_WRITE 0x01 /* Fault was a write access */
0186 #define FAULT_FLAG_ALLOW_RETRY 0x08 /* Retry fault if blocking */
0188 #define FAULT_FLAG_KILLABLE 0x20 /* The fault task is in SIGKILL killable region */

Let's work through what we can in the code:

0782 int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
0783 unsigned long address, pmd_t *pmd,
0784 unsigned int flags)
0785 {
0786 struct page *page;
0787 unsigned long haddr = address & HPAGE_PMD_MASK;
0788
0789 if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
0790 return VM_FAULT_FALLBACK;

    haddr 0x00007eb934800000
    vm_start 0x00007eb9341fc000
    vm_end 0x00007eb9351fd000

The first test is false (haddr is not less than vma->vm_start), and so is the second one:

crash64> p/x 0x00007eb934800000+2097152
$5 = 0x7eb934a00000

haddr + HPAGE_PMD_SIZE is less than vma->vm_end
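
To double-check those bounds, here's the same test as a standalone C program, plugging in the values taken from the vm_area_struct and registers above (nothing here is from the kernel itself, it just reproduces the arithmetic):

#include <stdio.h>

#define HPAGE_PMD_SHIFT 21
#define HPAGE_PMD_SIZE  (1UL << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK  (~(HPAGE_PMD_SIZE - 1))

int main(void)
{
        /* values copied from the dump */
        unsigned long address  = 0x00007eb93481c330UL;
        unsigned long vm_start = 0x00007eb9341fc000UL;
        unsigned long vm_end   = 0x00007eb9351fd000UL;
        unsigned long haddr    = address & HPAGE_PMD_MASK;

        printf("haddr = %#lx\n", haddr);  /* 0x7eb934800000 */
        /* the test from line 789: non-zero would mean VM_FAULT_FALLBACK */
        printf("fallback = %d\n",
               haddr < vm_start || haddr + HPAGE_PMD_SIZE > vm_end);  /* 0 */
        return 0;
}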

0791 if (unlikely(anon_vma_prepare(vma)))
0792 return VM_FAULT_OOM;
0793 if (unlikely(khugepaged_enter(vma)))
0794 return VM_FAULT_OOM;

FAULT_FLAG_WRITE was set, so we don't go into this next if block:

0795 if (!(flags & FAULT_FLAG_WRITE) &&
0796 transparent_hugepage_use_zero_page()) {
0797 spinlock_t *ptl;
0798 pgtable_t pgtable;
0799 struct page *zero_page;
0800 bool set;
0801 pgtable = pte_alloc_one(mm, haddr);
0802 if (unlikely(!pgtable))
0803 return VM_FAULT_OOM;
0804 zero_page = get_huge_zero_page();
0805 if (unlikely(!zero_page)) {
0806 pte_free(mm, pgtable);
0807 count_vm_event(THP_FAULT_FALLBACK);
0808 return VM_FAULT_FALLBACK;
0809 }
0810 ptl = pmd_lock(mm, pmd);
0811 set = set_huge_zero_page(pgtable, mm, vma, haddr, pmd,
0812 zero_page);
0813 spin_unlock(ptl);
0814 if (!set) {
0815 pte_free(mm, pgtable);
0816 put_huge_zero_page();
0817 }
0818 return 0;
0819 }
0820 page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
0821 vma, haddr, numa_node_id(), 0);
0822 if (unlikely(!page)) {
0823 count_vm_event(THP_FAULT_FALLBACK);
0824 return VM_FAULT_FALLBACK;
0825 }
0826 if (unlikely(mem_cgroup_newpage_charge(page, mm, GFP_KERNEL))) {
0827 put_page(page);
0828 count_vm_event(THP_FAULT_FALLBACK);
0829 return VM_FAULT_FALLBACK;
0830 }
0831 if (unlikely(__do_huge_pmd_anonymous_page(mm, vma, haddr, pmd, page))) {
0832 mem_cgroup_uncharge_page(page);
0833 put_page(page);
0834 count_vm_event(THP_FAULT_FALLBACK);
0835 return VM_FAULT_FALLBACK;
0836 }
0837
0838 count_vm_event(THP_FAULT_ALLOC);
0839 return 0;
0840 }

The pmd_t being 0 is not unusual; the kernel has the config option CONFIG_SPLIT_PTLOCK_CPUS set:

crash64> pmd_t 0xffff8879f66b6d20
struct pmd_t {
  pmd = 0
}

The next call upwards is from the call to alloc_hugepage_vma which is this inline function:

0748 static inline struct page *alloc_hugepage_vma(int defrag,
0749 struct vm_area_struct *vma,
0750 unsigned long haddr, int nd,
0751 gfp_t extra_gfp)
0752 {
0753 return alloc_pages_vma(alloc_hugepage_gfpmask(defrag, extra_gfp),
0754 HPAGE_PMD_ORDER, vma, haddr, nd);
0755 }

Let's gather the arguments that will be passed to alloc_pages_vma. First is this:

0078 #define transparent_hugepage_defrag(__vma) \
0079 ((transparent_hugepage_flags & \
0080 (1<<TRANSPARENT_HUGEPAGE_DEFRAG_FLAG)) || \
0081 (transparent_hugepage_flags & \
0082 (1<<TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG) && \
0083 (__vma)->vm_flags & VM_HUGEPAGE))

crash64> p/t transparent_hugepage_flags
$9 = 110101

0037 enum transparent_hugepage_flag {
0038 TRANSPARENT_HUGEPAGE_FLAG, 1
0039 TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG, 2
0040 TRANSPARENT_HUGEPAGE_DEFRAG_FLAG, 3
0041 TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, 4
0042 TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG, 5
0043 TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG, 6
0044 #ifdef CONFIG_DEBUG_VM
0045 TRANSPARENT_HUGEPAGE_DEBUG_COW_FLAG, 7
0046 #endif
0047 };

So we have the following set:

TRANSPARENT_HUGEPAGE_FLAG
TRANSPARENT_HUGEPAGE_DEFRAG_FLAG
TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG
TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG

The first term is true and the second test (after the ||) is not, so we will do defrag (it will be 1). Because TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG is not set, we do not need to worry about the vm_flags.
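
As a quick sanity check of that decision, here's a small sketch using shortened names (the real macro is the transparent_hugepage_defrag() quoted above; the enum values count from 0):

/* bit positions, counting from 0 as the enum does; 0b110101 = 0x35 */
enum {
        THP_FLAG,                   /* bit 0: set   */
        THP_REQ_MADV_FLAG,          /* bit 1: clear */
        THP_DEFRAG_FLAG,            /* bit 2: set   */
        THP_DEFRAG_REQ_MADV_FLAG,   /* bit 3: clear */
        THP_DEFRAG_KHUGEPAGED_FLAG, /* bit 4: set   */
        THP_USE_ZERO_PAGE_FLAG,     /* bit 5: set   */
};

static int thp_defrag(unsigned long thp_flags, unsigned long vm_flags,
                      unsigned long vm_hugepage)
{
        return (thp_flags & (1UL << THP_DEFRAG_FLAG)) ||
               ((thp_flags & (1UL << THP_DEFRAG_REQ_MADV_FLAG)) &&
                (vm_flags & vm_hugepage));
}

/* thp_flags = 0x35 has bit 2 set, so the first term alone makes this 1 */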

0060 #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)

HPAGE_PMD_SHIFT is 21 and PAGE_SHIFT is 12, so we're going to pass 9 as the order to ask for.

The VMA is vm_area_struct ffff883eb8f0caf8

haddr is 0x00007eb934800000

The node id (nd) is going to be zero for this CPU.

So calling alloc_pages_vma we have:

alloc_hugepage_vma(int defrag (true), struct vm_area_struct *vma = 0xffff883eb8f0caf8
    unsigned long haddr = 0x00007eb934800000, int nd = 0, gfp_t extra_gfp = 0)

That's about as far as I got into this dump (I received 3 additional dumps - the first of those is actually the first note added above). I still have two more to go through.

sms1123

2016-01-27 05:37

reporter   ~0025533

The next one is also a NULL pointer dereference, but it's a little different in that the faulting address is 0x7 instead of 0x8:

[5910857.628081] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007

crash64> bt
PID: 27682 TASK: ffff8806b07f0000 CPU: 37 COMMAND: "vertica"
 #0 [ffff885f34e933b8] machine_kexec at ffffffff8104c6a1
 #1 [ffff885f34e93410] crash_kexec at ffffffff810e2252
 #2 [ffff885f34e934e0] oops_end at ffffffff8160d548
 #3 [ffff885f34e93508] no_context at ffffffff815fdf52
 #4 [ffff885f34e93558] __bad_area_nosemaphore at ffffffff815fdfe8
 #5 [ffff885f34e935a0] bad_area at ffffffff815fe366
 #6 [ffff885f34e935c8] __do_page_fault at ffffffff816104ec
 #7 [ffff885f34e936c8] do_page_fault at ffffffff816105ca
 #8 [ffff885f34e936f0] page_fault at ffffffff8160c7c8
    [exception RIP: down_read_trylock+9]
    RIP: ffffffff8109c389 RSP: ffff885f34e937a0 RFLAGS: 00010213
    RAX: 0000000000000000 RBX: ffff886b86e0adc0 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000007
    RBP: ffff885f34e937a0 R8: ffffea0112dc0c60 R9: ffff88006d2f3068
    R10: 0000000000000088 R11: 0000000000000000 R12: ffff886b86e0adc1
    R13: ffffea0112dc0c40 R14: 0000000000000007 R15: ffffea0112dc0c40
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
 #9 [ffff885f34e937a8] page_lock_anon_vma_read at ffffffff8118e245
#10 [ffff885f34e937d8] page_referenced at ffffffff8118e4c7
#11 [ffff885f34e93850] shrink_active_list at ffffffff8116b1cc
#12 [ffff885f34e93908] shrink_lruvec at ffffffff8116b889
#13 [ffff885f34e93a08] shrink_zone at ffffffff8116bb76
#14 [ffff885f34e93a60] do_try_to_free_pages at ffffffff8116c080
#15 [ffff885f34e93ad8] try_to_free_pages at ffffffff8116c56c
#16 [ffff885f34e93b70] __alloc_pages_nodemask at ffffffff81160c0d
#17 [ffff885f34e93ca8] alloc_pages_vma at ffffffff811a2a2a
#18 [ffff885f34e93d10] do_wp_page at ffffffff811807ba
#19 [ffff885f34e93d98] handle_mm_fault at ffffffff81182b94
#20 [ffff885f34e93e28] __do_page_fault at ffffffff816101c6
#21 [ffff885f34e93f28] do_page_fault at ffffffff816105ca
#22 [ffff885f34e93f50] page_fault at ffffffff8160c7c8
    RIP: 0000000000c99820 RSP: 00007eface91d1d0 RFLAGS: 00010246
    RAX: 000000000188d28e RBX: 000000000000415c RCX: 0000000000000000
    RDX: 00007ede36c1bc70 RSI: 0000000000000048 RDI: 4b5b1bfb4f4e2c5d
    RBP: 00007eface91d460 R8: 00007f062229f050 R9: 0000000003ffffff
    R10: 00000000002d5334 R11: 0000000000000050 R12: 00007ee26b3b8070
    R13: 00007eddbbfff010 R14: 0000000000000009 R15: 00007eface91dac0
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

Let's look at page_lock_anon_vma_read to see how it worked out what rdi was.

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 460
0xffffffff8118e239 <page_lock_anon_vma_read+73>: add $0x8,%r14
0xffffffff8118e23d <page_lock_anon_vma_read+77>: mov %r14,%rdi
0xffffffff8118e240 <page_lock_anon_vma_read+80>: callq 0xffffffff8109c380 <down_read_trylock>
0xffffffff8118e245 <page_lock_anon_vma_read+85>: test %eax,%eax
0xffffffff8118e247 <page_lock_anon_vma_read+87>: je 0xffffffff8118e260 <page_lock_anon_vma_read+112>

OK, it's r14 moved into rdi, which means that before the add $0x8 r14 must have been 0xffffffffffffffff (0xffffffffffffffff + 8 wraps around to 0x7). Looking at a wider stretch of the assembler (also taking into account what we found previously):

0xffffffff8118e1f0 <page_lock_anon_vma_read>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8118e1f5 <page_lock_anon_vma_read+5>: push %rbp
0xffffffff8118e1f6 <page_lock_anon_vma_read+6>: mov %rsp,%rbp
0xffffffff8118e1f9 <page_lock_anon_vma_read+9>: push %r14
0xffffffff8118e1fb <page_lock_anon_vma_read+11>: push %r13
0xffffffff8118e1fd <page_lock_anon_vma_read+13>: mov %rdi,%r13
0xffffffff8118e200 <page_lock_anon_vma_read+16>: push %r12
0xffffffff8118e202 <page_lock_anon_vma_read+18>: push %rbx
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 452
0xffffffff8118e203 <page_lock_anon_vma_read+19>: mov 0x8(%rdi),%r12
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 453
0xffffffff8118e207 <page_lock_anon_vma_read+23>: mov %r12,%rax
0xffffffff8118e20a <page_lock_anon_vma_read+26>: and $0x3,%eax
0xffffffff8118e20d <page_lock_anon_vma_read+29>: cmp $0x1,%rax
0xffffffff8118e211 <page_lock_anon_vma_read+33>: je 0xffffffff8118e228 <page_lock_anon_vma_read+56>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 447
0xffffffff8118e213 <page_lock_anon_vma_read+35>: xor %ebx,%ebx
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 505
0xffffffff8118e215 <page_lock_anon_vma_read+37>: mov %rbx,%rax
0xffffffff8118e218 <page_lock_anon_vma_read+40>: pop %rbx
0xffffffff8118e219 <page_lock_anon_vma_read+41>: pop %r12
0xffffffff8118e21b <page_lock_anon_vma_read+43>: pop %r13
0xffffffff8118e21d <page_lock_anon_vma_read+45>: pop %r14
0xffffffff8118e21f <page_lock_anon_vma_read+47>: pop %rbp
0xffffffff8118e220 <page_lock_anon_vma_read+48>: retq
0xffffffff8118e221 <page_lock_anon_vma_read+49>: nopl 0x0(%rax)
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/atomic.h: 26
0xffffffff8118e228 <page_lock_anon_vma_read+56>: mov 0x18(%rdi),%eax
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 455
0xffffffff8118e22b <page_lock_anon_vma_read+59>: test %eax,%eax
0xffffffff8118e22d <page_lock_anon_vma_read+61>: js 0xffffffff8118e213 <page_lock_anon_vma_read+35>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 459
0xffffffff8118e22f <page_lock_anon_vma_read+63>: mov -0x1(%r12),%r14
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 458
0xffffffff8118e234 <page_lock_anon_vma_read+68>: lea -0x1(%r12),%rbx
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 460
0xffffffff8118e239 <page_lock_anon_vma_read+73>: add $0x8,%r14
0xffffffff8118e23d <page_lock_anon_vma_read+77>: mov %r14,%rdi
0xffffffff8118e240 <page_lock_anon_vma_read+80>: callq 0xffffffff8109c380 <down_read_trylock>
0xffffffff8118e245 <page_lock_anon_vma_read+85>: test %eax,%eax
0xffffffff8118e247 <page_lock_anon_vma_read+87>: je 0xffffffff8118e260 <page_lock_anon_vma_read+112>
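
Mapping that back to the C (a paraphrase of the rmap.c lines the listing points at; the annotations are the instruction offsets above):

anon_mapping = (unsigned long)ACCESS_ONCE(page->mapping);         /* +19: mov 0x8(%rdi),%r12    */
if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)     /* +26/+29: and $3 / cmp $1   */
        goto out;
if (!page_mapped(page))                                           /* +56: _mapcount at 0x18, js */
        goto out;
anon_vma = (struct anon_vma *)(anon_mapping - PAGE_MAPPING_ANON); /* +68: lea -0x1(%r12),%rbx   */
root_anon_vma = ACCESS_ONCE(anon_vma->root);                      /* +63: mov -0x1(%r12),%r14   */
if (down_read_trylock(&root_anon_vma->rwsem)) {                   /* +73/+77/+80: add $8, call  */
        ...

With the anon_vma memory filled with 0xff (see the rd output below), root is 0xffffffffffffffff and &root->rwsem is 8 bytes past that, which wraps to exactly the 0x7 in rdi we deduced above.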

Our struct page is in r13 (it's saved at page_lock_anon_vma_read+13):

crash64> page ffffea0112dc0c40
struct page {
  flags = 31525193097150536,
  mapping = 0xffff886b86e0adc1,
  {
    {
      index = 34307861533,
      freelist = 0x7fce86c1d,
      pfmemalloc = 29,
      pmd_huge_pte = 0x7fce86c1d
    },
    {
      counters = 8589934592,
      {
        {
          _mapcount = {
            counter = 0
          },
          {
            inuse = 0,
            objects = 0,
            frozen = 0
          },
          units = 0
        },
        _count = {
          counter = 2
        }
      }
    }
  },
  {
    lru = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
    {
      next = 0xdead000000100100,
      pages = 2097664,
      pobjects = -559087616
    },
    list = {
      next = 0xdead000000100100,
      prev = 0xdead000000200200
    },
    slab_page = 0xdead000000100100
  },
  {
    private = 0,
    ptl = {
      {
        rlock = {
          raw_lock = {
            {
              head_tail = 0,
              tickets = {
                head = 0,
                tail = 0
              }
            }
          }
        }
      }
    },
    slab_cache = 0x0,
    first_page = 0x0
  }
}

So what is the mapping pointer? It's not from the anon_vma kmem cache:

crash64> kmem 0xffff886b86e0adc0
      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea01ae1b8280 6b86e0a000 0 0 1 6fffff00000000

It's readable:

crash64> struct anon_vma 0xffff886b86e0adc0
struct anon_vma {
  root = 0xffffffffffffffff,
  rwsem = {
    count = -1,
    wait_lock = {
      raw_lock = {
        {
          head_tail = 4294967295,
          tickets = {
            head = 65535,
            tail = 65535
          }
        }
      }
    },
    wait_list = {
      next = 0xffffffffffffffff,
      prev = 0xffffffffffffffff
    }
  },
  refcount = {
    counter = -1
  },
  rb_root = {
    rb_node = 0xffffffffffffffff
  }
}

It's also garbage for a struct anon_vma; it's covered in 0xff's:

crash64> rd -64 0xffff886b86e0adc0 32
ffff886b86e0adc0: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0add0: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0ade0: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0adf0: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0ae00: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0ae10: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0ae20: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0ae30: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0ae40: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0ae50: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0ae60: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0ae70: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0ae80: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0ae90: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0aea0: ffffffffffffffff ffffffffffffffff ................
ffff886b86e0aeb0: ffffffffffffffff ffffffffffffffff ................

Are there other interesting threads running at this time? Yes, there are (and a lot more than in the other dumps). Before we go looking at them, let's see if anyone else has anything on their kernel stack that refers to the struct page or struct anon_vma that's causing the issue.

crash64> search ffffea0112dc0c40
ffff885f34e932b8: ffffea0112dc0c40
ffff885f34e932c8: ffffea0112dc0c40
ffff885f34e93418: ffffea0112dc0c40
ffff885f34e93428: ffffea0112dc0c40
ffff885f34e936a8: ffffea0112dc0c40
ffff885f34e936b8: ffffea0112dc0c40
ffff885f34e936f8: ffffea0112dc0c40
ffff885f34e93708: ffffea0112dc0c40
ffff885f34e937b0: ffffea0112dc0c40
ffff885f34e93840: ffffea0112dc0c40

That's only on the kernel stack of the panicking task. It seems likely that if someone else was using it, any trace of it is gone now. Now for the mapping member of the struct page (the address without the anon bit set); it is only referred to by this thread:

crash64> search 0xffff886b86e0adc0
ffff885f34e932e0: ffff886b86e0adc0
ffff885f34e93440: ffff886b86e0adc0
ffff885f34e93720: ffff886b86e0adc0

For the pointer with anon set in it:

crash64> search 0xffff886b86e0adc1
ffff885f34e932d0: ffff886b86e0adc1
ffff885f34e93430: ffff886b86e0adc1
ffff885f34e936a0: ffff886b86e0adc1
ffff885f34e93710: ffff886b86e0adc1

The addresses above are on our kernel thread's stack; now for these:

ffff887f903c0c48: ffff886b86e0adc1
ffffea0112dc0c48: ffff886b86e0adc1

They both refer to our struct page (so it seems nobody else is currently using the page or its mapping member):

crash64> kmem ffff887f903c0c48
      PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea01fe40f000 7f903c0000 0 0 2 6fffff00000c00 reserved,private

The first other thread is attempting to shrink the slab (again, it's possible that the anon_vma kmem cache could have contained the structure we are interested in, and that something has since cleaned it up and reused it):

PID: 300 TASK: ffff883f25f26660 CPU: 0 COMMAND: "kswapd0"
 #0 [ffff883f7f805e70] crash_nmi_callback at ffffffff810406c2
 #1 [ffff883f7f805e80] nmi_handle at ffffffff8160d6d9
 #2 [ffff883f7f805ec8] do_nmi at ffffffff8160d7f0
 #3 [ffff883f7f805ef0] end_repeat_nmi at ffffffff8160cb31
    [exception RIP: _raw_spin_lock+58]
    RIP: ffffffff8160c0ea RSP: ffff883f23c97bd8 RFLAGS: 00000206
    RAX: 0000000000005041 RBX: ffff887f20412c10 RCX: 0000000000005de2
    RDX: 0000000000005de4 RSI: 0000000000005de4 RDI: ffff887f20412c80
    RBP: ffff883f23c97bd8 R8: ffff884698bea680 R9: 0000000000000040
    R10: ffff8804c779d080 R11: 0000000000000220 R12: ffff883f23c97d90
    R13: 0000000000000001 R14: ffff887f20412800 R15: 00000000009ac004
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
 #4 [ffff883f23c97bd8] _raw_spin_lock at ffffffff8160c0ea
 #5 [ffff883f23c97be0] __ext4_es_shrink at ffffffffa0a50a5b [ext4]
 #6 [ffff883f23c97c58] ext4_es_shrink at ffffffffa0a50d52 [ext4]
 #7 [ffff883f23c97ca0] shrink_slab at ffffffff81169205
 #8 [ffff883f23c97d48] balance_pgdat at ffffffff8116ce51
 #9 [ffff883f23c97e20] kswapd at ffffffff8116d0f3
#10 [ffff883f23c97ec8] kthread at ffffffff8109739f
#11 [ffff883f23c97f50] ret_from_fork at ffffffff81614d3c

In the next thread we're waiting on a spinlock:

PID: 21504 TASK: ffff881b7b5716c0 CPU: 8 COMMAND: "vertica"
 #0 [ffff883f7fa05e70] crash_nmi_callback at ffffffff810406c2
 #1 [ffff883f7fa05e80] nmi_handle at ffffffff8160d6d9
 #2 [ffff883f7fa05ec8] do_nmi at ffffffff8160d7f0
 #3 [ffff883f7fa05ef0] end_repeat_nmi at ffffffff8160cb31
    [exception RIP: _raw_spin_lock+55]
    RIP: ffffffff8160c0e7 RSP: ffff8879e4db74f0 RFLAGS: 00000206
    RAX: 0000000000007e7b RBX: ffffea00008e8680 RCX: 000000000000ec72
    RDX: 000000000000ec74 RSI: 000000000000ec74 RDI: ffffea003eda0a30
    RBP: ffff8879e4db74f0 R8: 000000003eda0a00 R9: ffff884929e88d58
    R10: 000000000000007c R11: ffffea009f8a7880 R12: ffff8879e4db7538
    R13: ffff880fb6828cc0 R14: 0000000000000000 R15: ffffea003eda0a30
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
 #4 [ffff8879e4db74f0] _raw_spin_lock at ffffffff8160c0e7
 #5 [ffff8879e4db74f8] __page_check_address at ffffffff8118c89c
 #6 [ffff8879e4db7530] page_referenced_one at ffffffff8118cc5d
 #7 [ffff8879e4db7578] page_referenced at ffffffff8118e57b
 #8 [ffff8879e4db75f0] shrink_active_list at ffffffff8116b1cc
 #9 [ffff8879e4db76a8] shrink_lruvec at ffffffff8116b889
#10 [ffff8879e4db77a8] shrink_zone at ffffffff8116bb76
#11 [ffff8879e4db7800] do_try_to_free_pages at ffffffff8116c080
#12 [ffff8879e4db7878] try_to_free_pages at ffffffff8116c56c
#13 [ffff8879e4db7910] __alloc_pages_nodemask at ffffffff81160c0d
#14 [ffff8879e4db7a48] alloc_pages_current at ffffffff8119f549
#15 [ffff8879e4db7a90] __get_free_pages at ffffffff8115b59e
#16 [ffff8879e4db7aa0] __pollwait at ffffffff811dacd0
#17 [ffff8879e4db7ad0] tcp_poll at ffffffff81543d39
#18 [ffff8879e4db7ae8] sock_poll at ffffffff814e3e30
#19 [ffff8879e4db7b20] do_sys_poll at ffffffff811dc377
#20 [ffff8879e4db7f40] sys_poll at ffffffff811dc6d4
#21 [ffff8879e4db7f80] system_call_fastpath at ffffffff81614de9
    RIP: 00007f4d1a00abcd RSP: 00007f4d00c99cd8 RFLAGS: 00000206
    RAX: 0000000000000007 RBX: ffffffff81614de9 RCX: 00007f4cf5444550
    RDX: 00000000000017df RSI: 0000000000000102 RDI: 00007f4cf5444550
    RBP: 00007f4d00c99e00 R8: 0000000000000000 R9: 0000000000005400
    R10: 0000000000000008 R11: 0000000000000293 R12: 000000000dbe9178
    R13: 0000000000000102 R14: 000000000dbe9170 R15: 000000005920139c
    ORIG_RAX: 0000000000000007 CS: 0033 SS: 002b

crash64> dis -l _raw_spin_lock
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/spinlock.c: 136
0xffffffff8160c0b0 <_raw_spin_lock>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8160c0b5 <_raw_spin_lock+5>: push %rbp
0xffffffff8160c0b6 <_raw_spin_lock+6>: mov %rsp,%rbp
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/spinlock.h: 87
0xffffffff8160c0b9 <_raw_spin_lock+9>: mov $0x20000,%eax
0xffffffff8160c0be <_raw_spin_lock+14>: lock xadd %eax,(%rdi)
0xffffffff8160c0c2 <_raw_spin_lock+18>: mov %eax,%edx
0xffffffff8160c0c4 <_raw_spin_lock+20>: shr $0x10,%edx
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/spinlock.h: 88
0xffffffff8160c0c7 <_raw_spin_lock+23>: cmp %ax,%dx
0xffffffff8160c0ca <_raw_spin_lock+26>: jne 0xffffffff8160c0ce <_raw_spin_lock+30>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/spinlock.c: 138
0xffffffff8160c0cc <_raw_spin_lock+28>: pop %rbp
0xffffffff8160c0cd <_raw_spin_lock+29>: retq
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/spinlock.h: 91
0xffffffff8160c0ce <_raw_spin_lock+30>: and $0xfffffffe,%edx
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/paravirt.h: 718
0xffffffff8160c0d1 <_raw_spin_lock+33>: movzwl %dx,%esi
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/spinlock.c: 136
0xffffffff8160c0d4 <_raw_spin_lock+36>: mov $0x8000,%eax
0xffffffff8160c0d9 <_raw_spin_lock+41>: jmp 0xffffffff8160c0e7 <_raw_spin_lock+55>
0xffffffff8160c0db <_raw_spin_lock+43>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/processor.h: 685
0xffffffff8160c0e0 <_raw_spin_lock+48>: pause
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/spinlock.h: 99
0xffffffff8160c0e2 <_raw_spin_lock+50>: sub $0x1,%eax
0xffffffff8160c0e5 <_raw_spin_lock+53>: je 0xffffffff8160c0f1 <_raw_spin_lock+65>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/spinlock.h: 96
0xffffffff8160c0e7 <_raw_spin_lock+55>: movzwl (%rdi),%ecx
0xffffffff8160c0ea <_raw_spin_lock+58>: cmp %cx,%dx
0xffffffff8160c0ed <_raw_spin_lock+61>: jne 0xffffffff8160c0e0 <_raw_spin_lock+48>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/spinlock.c: 138
0xffffffff8160c0ef <_raw_spin_lock+63>: pop %rbp
0xffffffff8160c0f0 <_raw_spin_lock+64>: retq
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/paravirt.h: 718
0xffffffff8160c0f1 <_raw_spin_lock+65>: nopl 0x0(%rax)
0xffffffff8160c0f8 <_raw_spin_lock+72>: jmp 0xffffffff8160c0d4 <_raw_spin_lock+36>

We're spinning waiting for the lock's current head (cx, 0xec72) to reach our ticket (dx, 0xec74), so someone else has the spinlock held, and this is the spinlock:

crash64> spinlock ffffea003eda0a30
struct spinlock {
  {
    rlock = {
      raw_lock = {
        {
          head_tail = 3967216754,
          tickets = {
            head = 60530,
            tail = 60534
          }
        }
      }
    }
  }
}
crash64> p/x 60530
$1 = 0xec72
crash64> p/x 60534
$2 = 0xec76
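
For reference, head_tail 3967216754 is 0xec76ec72: the tail (0xec76) is what our own xadd left behind (each ticket bumps it by 2 on this kernel, the 0x20000 constant above), our ticket is dx = 0xec74, and the head is still 0xec72, so we are next in line behind the current holder. A minimal standalone sketch of the ticket-lock protocol _raw_spin_lock implements (a simplified model using an increment of 1, not the RHEL source):

/* Hypothetical, simplified model of a ticket spinlock, just to show what
 * the head/tail values above mean; not the kernel's arch_spin_lock. */
#include <stdatomic.h>
#include <stdint.h>

struct ticket_lock {
        _Atomic uint16_t head;           /* ticket currently being served */
        _Atomic uint16_t tail;           /* next ticket to hand out       */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
        /* Take a ticket: the "lock xadd" in _raw_spin_lock. */
        uint16_t my_ticket = atomic_fetch_add(&l->tail, 1);

        /* Spin until the head catches up: the cmp %cx,%dx / jne loop. */
        while (atomic_load(&l->head) != my_ticket)
                ;                        /* pause / cpu_relax() in the real code */
}

static void ticket_lock_release(struct ticket_lock *l)
{
        atomic_fetch_add(&l->head, 1);   /* serve the next ticket */
}

The head never reaches our ticket in this dump, which is why the task is still sitting at _raw_spin_lock+55.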

The struct page passed into page_referenced:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/vmscan.c: 1651
0xffffffff8116b1ba <shrink_active_list+458>: mov 0x38(%r14),%rdx
0xffffffff8116b1be <shrink_active_list+462>: lea -0x68(%rbp),%rcx
0xffffffff8116b1c2 <shrink_active_list+466>: xor %esi,%esi
0xffffffff8116b1c4 <shrink_active_list+468>: mov %r15,%rdi
0xffffffff8116b1c7 <shrink_active_list+471>: callq 0xffffffff8118e300 <page_referenced>
0xffffffff8116b1cc <shrink_active_list+476>: test %eax,%eax
0xffffffff8116b1ce <shrink_active_list+478>: je 0xffffffff8116b168 <shrink_active_list+376>

So we want r15 which is saved by page_referenced:

crash64> dis -l page_referenced
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 849
0xffffffff8118e300 <page_referenced>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8118e305 <page_referenced+5>: push %rbp
0xffffffff8118e306 <page_referenced+6>: mov %rcx,%rax
0xffffffff8118e309 <page_referenced+9>: mov %rsp,%rbp
0xffffffff8118e30c <page_referenced+12>: push %r15
0xffffffff8118e30e <page_referenced+14>: push %r14
0xffffffff8118e310 <page_referenced+16>: push %r13
0xffffffff8118e312 <page_referenced+18>: push %r12
0xffffffff8118e314 <page_referenced+20>: push %rbx
0xffffffff8118e315 <page_referenced+21>: mov %rdi,%rbx
0xffffffff8118e318 <page_referenced+24>: sub $0x40,%rsp
0xffffffff8118e31c <page_referenced+28>: mov %rcx,-0x58(%rbp)

 #7 [ffff8879e4db7578] page_referenced at ffffffff8118e57b
    ffff8879e4db7580: ffff8879f635ef40 00000007efb75798
    ffff8879e4db7590: ffff8879e4db7638 0000000000000000
    ffff8879e4db75a0: 00000007efb74000 00007efb74000000
    ffff8879e4db75b0: 000000017f44a208 000000005920139c
    ffff8879e4db75c0: ffffea00008e86a0 ffff8879e4db7640
    ffff8879e4db75d0: 0000000000000000 ffff8879e4db7890
    ffff8879e4db75e0: r15 ffffea00008e8680 rbp ffff8879e4db76a0
    ffff8879e4db75f0: rip ffffffff8116b1cc
 #8 [ffff8879e4db75f0] shrink_active_list at ffffffff8116b1cc

is ffffea00008e8680. If we go up through the stack frames until we get to __page_check_address:

0201 static inline pte_t *page_check_address(struct page *page, struct mm_struct *mm,
0202                                         unsigned long address,
0203                                         spinlock_t **ptlp, int sync)
0204 {
0205         pte_t *ptep;
0206
0207         __cond_lock(*ptlp, ptep = __page_check_address(page, mm, address,
0208                                                        ptlp, sync));
0209         return ptep;
0210 }

Let's try to work out some or all of the args to __page_check_address:

0595 pte_t *__page_check_address(struct page *page, struct mm_struct *mm,
0596                             unsigned long address, spinlock_t **ptlp, int sync)

 #5 [ffff8879e4db74f8] __page_check_address at ffffffff8118c89c
    ffff8879e4db7500: ffff887be5b17518 00007efb75798000
    ffff8879e4db7510: ffff8879e4db75b4 ffff8879e4db7638
    ffff8879e4db7520: 0000000001798000 ffff8879e4db7570
    ffff8879e4db7530: ffffffff8118cc5d
 #6 [ffff8879e4db7530] page_referenced_one at ffffffff8118cc5d

This is the call we made, and at this point we want rsi:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/include/linux/rmap.h: 207
0xffffffff8118cc51 <page_referenced_one+65>: lea -0x38(%rbp),%rcx
0xffffffff8118cc55 <page_referenced_one+69>: xor %r8d,%r8d
0xffffffff8118cc58 <page_referenced_one+72>: callq 0xffffffff8118c7d0 <__page_check_address>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 702
0xffffffff8118cc5d <page_referenced_one+77>: test %rax,%rax
0xffffffff8118cc60 <page_referenced_one+80>: je 0xffffffff8118ccd8 <page_referenced_one+200>

rsi isn't set there; however, earlier in page_referenced_one we have:

0663 int page_referenced_one(struct page *page, struct vm_area_struct *vma,
0664                         unsigned long address, unsigned int *mapcount,
0665                         unsigned long *vm_flags)
0666 {
0667         struct mm_struct *mm = vma->vm_mm;

For which we have the following assembler:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 667
0xffffffff8118cc32 <page_referenced_one+34>: mov 0x40(%rsi),%rsi

And at the start of the function the incoming rsi is stashed in rbx:

0xffffffff8118cc15 <page_referenced_one+5>: push %rbp
0xffffffff8118cc16 <page_referenced_one+6>: mov %rsp,%rbp
0xffffffff8118cc19 <page_referenced_one+9>: push %r15
0xffffffff8118cc1b <page_referenced_one+11>: push %r14
0xffffffff8118cc1d <page_referenced_one+13>: mov %r8,%r14
0xffffffff8118cc20 <page_referenced_one+16>: push %r13
0xffffffff8118cc22 <page_referenced_one+18>: mov %rcx,%r13
0xffffffff8118cc25 <page_referenced_one+21>: push %r12
0xffffffff8118cc27 <page_referenced_one+23>: mov %rdx,%r12
0xffffffff8118cc2a <page_referenced_one+26>: push %rbx
0xffffffff8118cc2b <page_referenced_one+27>: mov %rsi,%rbx
0xffffffff8118cc2e <page_referenced_one+30>: sub $0x10,%rsp

and __page_check_address then saves rbx in its prologue:

crash64> dis -l __page_check_address
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 597
0xffffffff8118c7d0 <__page_check_address>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8118c7d5 <__page_check_address+5>: push %rbp
0xffffffff8118c7d6 <__page_check_address+6>: mov %rsp,%rbp
0xffffffff8118c7d9 <__page_check_address+9>: push %r15
0xffffffff8118c7db <__page_check_address+11>: mov %rsi,%r15
0xffffffff8118c7de <__page_check_address+14>: push %r14
0xffffffff8118c7e0 <__page_check_address+16>: mov %r8d,%r14d
0xffffffff8118c7e3 <__page_check_address+19>: push %r13
0xffffffff8118c7e5 <__page_check_address+21>: mov %rdx,%r13
0xffffffff8118c7e8 <__page_check_address+24>: push %r12
0xffffffff8118c7ea <__page_check_address+26>: mov %rcx,%r12
0xffffffff8118c7ed <__page_check_address+29>: push %rbx
0xffffffff8118c7ee <__page_check_address+30>: mov %rdi,%rbx

Which lets us do this:

 #5 [ffff8879e4db74f8] __page_check_address at ffffffff8118c89c
    ffff8879e4db7500: rbx ffff887be5b17518 r12 00007efb75798000
    ffff8879e4db7510: r13 ffff8879e4db75b4 r14 ffff8879e4db7638
    ffff8879e4db7520: r15 0000000001798000 rbp ffff8879e4db7570
    ffff8879e4db7530: rip ffffffff8118cc5d
 #6 [ffff8879e4db7530] page_referenced_one at ffffffff8118cc5d

Which gives us this:

crash64> struct vm_area_struct ffff887be5b17518
struct vm_area_struct {
  vm_start = 139618448048128,
  vm_end = 139618515111936,
  vm_next = 0xffff887be5b171b8,
  vm_prev = 0xffff88090a098bd0,
  vm_rb = {
    __rb_parent_color = 18446612282008797864,
    rb_right = 0xffff887be5b16608,
    rb_left = 0xffff8809d9185028
  },
  rb_subtree_gap = 0,
  vm_mm = 0xffff887f256c3840,
  vm_page_prot = {
    pgprot = 9223372036854775845
  },
  vm_flags = 2097267,
  shared = {
    linear = {
      rb = {
        __rb_parent_color = 0,
        rb_right = 0x0,
        rb_left = 0x0
      },
      rb_subtree_last = 0
    },
    nonlinear = {
      next = 0x0,
      prev = 0x0
    }
  },
  anon_vma_chain = {
    next = 0xffff886840393e50,
    prev = 0xffff886840393e50
  },
  anon_vma = 0xffff8879f635ef40,
  vm_ops = 0x0,
  vm_pgoff = 34086535168,
  vm_file = 0x0,
  vm_private_data = 0x0,
  vm_policy = 0x0,
  rh_reserved1 = 0,
  rh_reserved2 = 0,
  rh_reserved3 = 0,
  rh_reserved4 = 0
}

crash64> struct mm_struct 0xffff887f256c3840
struct mm_struct {
  mmap = 0xffff886975dbfe60,
  mm_rb = {
    rb_node = 0xffff882983bd1a48
  },
  mmap_cache = 0xffff886975dbfe60,
  get_unmapped_area = 0xffffffff81018810 <arch_get_unmapped_area_topdown>,
  unmap_area = 0xffffffff811884e0 <arch_unmap_area_topdown>,
  mmap_base = 139969141391360,
  mmap_legacy_base = 47792979673088,
  task_size = 140737488351232,
  cached_hole_size = 18446744073709551615,
  free_area_cache = 139969141391360,
  highest_vm_end = 140735290540032,
  pgd = 0xffff88761db7e000,
  mm_users = {
    counter = 5346
  },
  mm_count = {
    counter = 22
  },
  nr_ptes = {
    counter = 146817
  },
  map_count = 14658,
  page_table_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 1810656236,
            tickets = {
              head = 27628,
              tail = 27628
            }
          }
        }
      }
    }
  },
  mmap_sem = {
    count = -4294967293,
    wait_lock = {
      raw_lock = {
        {
          head_tail = 1020411090,
          tickets = {
            head = 15570,
            tail = 15570
          }
        }
      }
    },
    wait_list = {
      next = 0xffff887a0613bdf0,
      prev = 0xffff8842fd993df0
    }
  },
...

All that looks valid. Let's look at __page_check_address to see where it gets the spinlock address from.

0xffffffff8118c804 <__page_check_address+52>: callq 0xffffffff8118c720 <mm_find_pmd>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 609
0xffffffff8118c809 <__page_check_address+57>: test %rax,%rax
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 608
0xffffffff8118c80c <__page_check_address+60>: mov %rax,%r9

r9 is from mm_find_pmd.

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/include/asm/paravirt.h: 571
0xffffffff8118c86c <__page_check_address+156>: mov (%r9),%rdi
0xffffffff8118c86f <__page_check_address+159>: mov %rdi,%rax
0xffffffff8118c872 <__page_check_address+162>: nopl 0x0(%rax)
0xffffffff8118c876 <__page_check_address+166>: mov %rax,%r8
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 622
0xffffffff8118c879 <__page_check_address+169>: movabs $0xffffea0000000000,%rax
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/include/linux/mm.h: 1396
0xffffffff8118c883 <__page_check_address+179>: shl $0x12,%r8
0xffffffff8118c887 <__page_check_address+183>: shr $0x1e,%r8
0xffffffff8118c88b <__page_check_address+187>: shl $0x6,%r8
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 622
0xffffffff8118c88f <__page_check_address+191>: lea 0x30(%r8,%rax,1),%r15
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/include/linux/spinlock.h: 293
0xffffffff8118c894 <__page_check_address+196>: mov %r15,%rdi
0xffffffff8118c897 <__page_check_address+199>: callq 0xffffffff8160c0b0 <_raw_spin_lock>
0xffffffff8118c89c <__page_check_address+204>: mov 0x0(%r13),%rdi
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/mm/rmap.c: 625
0xffffffff8118c8a0 <__page_check_address+208>: test $0x101,%edi
0xffffffff8118c8a6 <__page_check_address+214>: je 0xffffffff8118c8e8 <__page_check_address+280>
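
So r15 (and hence rdi) is computed from the pmd value: the shl $0x12 / shr $0x1e pair masks off the flag bits and shifts the physical address in the PMD entry down to a page frame number, the shl $0x6 multiplies by 64 (sizeof(struct page)), and the lea adds the vmemmap base plus 0x30, which is consistent with the split page-table lock embedded in the struct page of the pte page. A rough sketch of that arithmetic, using the constants visible in the disassembly (names made up for illustration):

/* Sketch of the address computation above; constants are the ones from
 * the disassembly, names are hypothetical. */
#include <stdint.h>

#define VMEMMAP_BASE    0xffffea0000000000ULL   /* base of the struct page array */
#define PAGE_STRUCT_SZ  64ULL                   /* shl $0x6                      */
#define PTL_OFFSET      0x30ULL                 /* lea 0x30(...)                 */

static uint64_t ptl_address(uint64_t pmd_val)   /* pmd_val = value read via (%r9) */
{
        /* shl $0x12 / shr $0x1e: drop the flag bits and shift the physical
         * address in the PMD entry down to a page frame number. */
        uint64_t pfn = (pmd_val << 18) >> 30;

        /* &vmemmap[pfn] + 0x30: the spinlock field inside that struct page. */
        return VMEMMAP_BASE + pfn * PAGE_STRUCT_SZ + PTL_OFFSET;
}

The value left in r8 at the time of the dump (000000003eda0a00) is pfn * 64, and adding the vmemmap base plus 0x30 gives exactly rdi = ffffea003eda0a30, so the lock address itself is consistent with the pmd.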

Since _raw_spin_lock only changes a few registers, we may have some remnants left over. We can check that r15 and rdi hold the same value, and they do:

    [exception RIP: _raw_spin_lock+55]
    RIP: ffffffff8160c0e7 RSP: ffff8879e4db74f0 RFLAGS: 00000206
    RAX: 0000000000007e7b RBX: ffffea00008e8680 RCX: 000000000000ec72
    RDX: 000000000000ec74 RSI: 000000000000ec74 RDI: ffffea003eda0a30
    RBP: ffff8879e4db74f0 R8: 000000003eda0a00 R9: ffff884929e88d58
    R10: 000000000000007c R11: ffffea009f8a7880 R12: ffff8879e4db7538
    R13: ffff880fb6828cc0 R14: 0000000000000000 R15: ffffea003eda0a30
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

The easier way, though, is to see who else might have the address of the spinlock on their stack, and this task does, in the stack frame for page_referenced_one:

PID: 34697 TASK: ffff887a066c2220 CPU: 26 COMMAND: "vertica"
 #0 [ffff883f7fc05e70] crash_nmi_callback at ffffffff810406c2
 #1 [ffff883f7fc05e80] nmi_handle at ffffffff8160d6d9
 #2 [ffff883f7fc05ec8] do_nmi at ffffffff8160d7f0
 #3 [ffff883f7fc05ef0] end_repeat_nmi at ffffffff8160cb31
    [exception RIP: smp_call_function_many+518]
    RIP: ffffffff810d6e36 RSP: ffff880acf2ff740 RFLAGS: 00000202
    RAX: 0000000000000025 RBX: 0000000000000028 RCX: ffff887f7f457928
    RDX: 0000000000000025 RSI: 0000000000000028 RDI: 0000000000000000
    RBP: ffff880acf2ff778 R8: ffff883f26b83000 R9: ffff883f7fc164a0
    R10: ffffea000c5ad400 R11: ffffffff812d4e39 R12: 0000000000014140
    R13: ffffffff8105faf0 R14: ffff880acf2ff788 R15: ffff883f7fc14180
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
--- <NMI exception stack> ---
 #4 [ffff880acf2ff740] smp_call_function_many at ffffffff810d6e36
 #5 [ffff880acf2ff780] native_flush_tlb_others at ffffffff8105fcb8
 #6 [ffff880acf2ff7d0] flush_tlb_page at ffffffff8105fef5
 #7 [ffff880acf2ff7f0] ptep_clear_flush_young at ffffffff8105e9cd
 #8 [ffff880acf2ff818] page_referenced_one at ffffffff8118cc76
 #9 [ffff880acf2ff860] page_referenced at ffffffff8118e57b
#10 [ffff880acf2ff8d8] shrink_active_list at ffffffff8116b1cc
#11 [ffff880acf2ff990] shrink_lruvec at ffffffff8116b889
#12 [ffff880acf2ffa90] shrink_zone at ffffffff8116bb76
#13 [ffff880acf2ffae8] do_try_to_free_pages at ffffffff8116c080
#14 [ffff880acf2ffb60] try_to_free_pages at ffffffff8116c56c
#15 [ffff880acf2ffbf8] __alloc_pages_nodemask at ffffffff81160c0d
#16 [ffff880acf2ffd30] alloc_pages_vma at ffffffff811a2a2a
#17 [ffff880acf2ffd98] handle_mm_fault at ffffffff81183007
#18 [ffff880acf2ffe28] __do_page_fault at ffffffff816101c6
#19 [ffff880acf2fff28] do_page_fault at ffffffff816105ca
#20 [ffff880acf2fff50] page_fault at ffffffff8160c7c8
    RIP: 000000000129044b RSP: 00007f3fc23706d0 RFLAGS: 00010206
    RAX: 00007f3fc2fcc630 RBX: 0000000000000008 RCX: 0000000000000013
    RDX: 0001b6cae9de0000 RSI: 0000000000000041 RDI: 00000000000001a7
    RBP: 00007f3fc2370780 R8: 000000000000002d R9: 0001b6b6cc06a000
    R10: 0000000000000068 R11: 00007f3fc2fcc5c0 R12: 00007f3fc2f4c598
    R13: 0000000000000008 R14: 0000000000000008 R15: 00007f3834156048
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

It's waiting for the other CPUs to acknowledge that they've completed the function call. native_flush_tlb_others looks like this:

0124 void native_flush_tlb_others(const struct cpumask *cpumask,
0125                              struct mm_struct *mm, unsigned long start,
0126                              unsigned long end)
0127 {
0128         struct flush_tlb_info info;
0129         info.flush_mm = mm;
0130         info.flush_start = start;
0131         info.flush_end = end;
0132
0133         if (is_uv_system()) {
0134                 unsigned int cpu;
0135
0136                 cpu = smp_processor_id();
0137                 cpumask = uv_flush_tlb_others(cpumask, mm, start, end, cpu);
0138                 if (cpumask)
0139                         smp_call_function_many(cpumask, flush_tlb_func,
0140                                                &info, 1);
0141                 return;
0142         }
0143         smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
0144 }

crash64> dis -l ptep_clear_flush_young
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/mm/pgtable.c: 403
0xffffffff8105e9a0 <ptep_clear_flush_young>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8105e9a5 <ptep_clear_flush_young+5>: push %rbp
0xffffffff8105e9a6 <ptep_clear_flush_young+6>: mov %rsp,%rbp
0xffffffff8105e9a9 <ptep_clear_flush_young+9>: push %r12
0xffffffff8105e9ab <ptep_clear_flush_young+11>: mov %rsi,%r12
0xffffffff8105e9ae <ptep_clear_flush_young+14>: push %rbx
0xffffffff8105e9af <ptep_clear_flush_young+15>: mov %rdi,%rbx
0xffffffff8105e9b2 <ptep_clear_flush_young+18>: sub $0x8,%rsp
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/mm/pgtable.c: 406
0xffffffff8105e9b6 <ptep_clear_flush_young+22>: callq 0xffffffff8105e940 <ptep_test_and_clear_young>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/mm/pgtable.c: 407
0xffffffff8105e9bb <ptep_clear_flush_young+27>: test %eax,%eax
0xffffffff8105e9bd <ptep_clear_flush_young+29>: je 0xffffffff8105e9d0 <ptep_clear_flush_young+48>
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/mm/pgtable.c: 408
0xffffffff8105e9bf <ptep_clear_flush_young+31>: mov %r12,%rsi
0xffffffff8105e9c2 <ptep_clear_flush_young+34>: mov %rbx,%rdi
0xffffffff8105e9c5 <ptep_clear_flush_young+37>: mov %eax,-0x14(%rbp)
0xffffffff8105e9c8 <ptep_clear_flush_young+40>: callq 0xffffffff8105fea0 <flush_tlb_page>
0xffffffff8105e9cd <ptep_clear_flush_young+45>: mov -0x14(%rbp),%eax
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/mm/pgtable.c: 411
0xffffffff8105e9d0 <ptep_clear_flush_young+48>: add $0x8,%rsp
0xffffffff8105e9d4 <ptep_clear_flush_young+52>: pop %rbx
0xffffffff8105e9d5 <ptep_clear_flush_young+53>: pop %r12
0xffffffff8105e9d7 <ptep_clear_flush_young+55>: pop %rbp
0xffffffff8105e9d8 <ptep_clear_flush_young+56>: retq

and flush_tlb_page looks like this:

0212 void flush_tlb_page(struct vm_area_struct *vma, unsigned long start)
0213 {
0214         struct mm_struct *mm = vma->vm_mm;
0215
0216         preempt_disable();
0217
0218         if (current->active_mm == mm) {
0219                 if (current->mm)
0220                         __flush_tlb_one(start);
0221                 else
0222                         leave_mm(smp_processor_id());
0223         }
0224
0225         if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
0226                 flush_tlb_others(mm_cpumask(mm), mm, start, 0UL);
0227
0228         preempt_enable();
0229 }

Because of this earlier in ptep_clear_flush_young:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/mm/pgtable.c: 408
0xffffffff8105e9bf <ptep_clear_flush_young+31>: mov %r12,%rsi
0xffffffff8105e9c2 <ptep_clear_flush_young+34>: mov %rbx,%rdi

And since flush_tlb_page saves those registers on entry, we can get the args directly below:

crash64> dis -l flush_tlb_page
/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/arch/x86/mm/tlb.c: 213
0xffffffff8105fea0 <flush_tlb_page>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff8105fea5 <flush_tlb_page+5>: push %rbp
0xffffffff8105fea6 <flush_tlb_page+6>: mov %rsp,%rbp
0xffffffff8105fea9 <flush_tlb_page+9>: push %r12
0xffffffff8105feab <flush_tlb_page+11>: mov %rsi,%r12
0xffffffff8105feae <flush_tlb_page+14>: push %rbx

 #6 [ffff880acf2ff7d0] flush_tlb_page at ffffffff8105fef5
    ffff880acf2ff7d8: rbx ffff887be5b17518 r12 00007efb7576a000
    ffff880acf2ff7e8: rbp ffff880acf2ff810 rip ffffffff8105e9cd
 #7 [ffff880acf2ff7f0] ptep_clear_flush_young at ffffffff8105e9cd

struct vm_area_struct *vma ffff887be5b17518
unsigned long start 00007efb7576a000

struct vm_area_struct {
  vm_start = 139618448048128,
  vm_end = 139618515111936,
  vm_next = 0xffff887be5b171b8,
  vm_prev = 0xffff88090a098bd0,
  vm_rb = {
    __rb_parent_color = 18446612282008797864,
    rb_right = 0xffff887be5b16608,
    rb_left = 0xffff8809d9185028
  },
  rb_subtree_gap = 0,
  vm_mm = 0xffff887f256c3840,
  vm_page_prot = {
    pgprot = 9223372036854775845
  },
  vm_flags = 2097267,
  shared = {
    linear = {
      rb = {
        __rb_parent_color = 0,
        rb_right = 0x0,
        rb_left = 0x0
      },
      rb_subtree_last = 0
    },
    nonlinear = {
      next = 0x0,
      prev = 0x0
    }
  },
  anon_vma_chain = {
    next = 0xffff886840393e50,
    prev = 0xffff886840393e50
  },
  anon_vma = 0xffff8879f635ef40,
  vm_ops = 0x0,
  vm_pgoff = 34086535168,
  vm_file = 0x0,
  vm_private_data = 0x0,
  vm_policy = 0x0,
  rh_reserved1 = 0,
  rh_reserved2 = 0,
  rh_reserved3 = 0,
  rh_reserved4 = 0
}

struct mm_struct {
  mmap = 0xffff886975dbfe60,
  mm_rb = {
    rb_node = 0xffff882983bd1a48
  },
  mmap_cache = 0xffff886975dbfe60,
  get_unmapped_area = 0xffffffff81018810 <arch_get_unmapped_area_topdown>,
  unmap_area = 0xffffffff811884e0 <arch_unmap_area_topdown>,
  mmap_base = 139969141391360,
  mmap_legacy_base = 47792979673088,
  task_size = 140737488351232,
  cached_hole_size = 18446744073709551615,
  free_area_cache = 139969141391360,
  highest_vm_end = 140735290540032,
  pgd = 0xffff88761db7e000,
  mm_users = {
    counter = 5346
  },
  mm_count = {
    counter = 22
  },
  nr_ptes = {
    counter = 146817
  },
  map_count = 14658,
  page_table_lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 1810656236,
            tickets = {
              head = 27628,
              tail = 27628
            }
          }
        }
      }
    }
  },
  mmap_sem = {
    count = -4294967293,
    wait_lock = {
      raw_lock = {
        {
          head_tail = 1020411090,
          tickets = {
            head = 15570,
            tail = 15570
          }
        }
      }
    },
    wait_list = {
      next = 0xffff887a0613bdf0,
      prev = 0xffff8842fd993df0
    }
  },
...

The cpu_vm_mask_var from the mm_struct is:

cpu_vm_mask_var = 0xffff887f256c3b88,

crash64> cpumask_t 0xffff887f256c3b88
struct cpumask_t {
  bits = {710884661040, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
}

crash64> p/t 710884661040
$4 = 1010010110000100000001110001011100110000
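
Bit n of that mask corresponds to CPU n, so the set bits are the CPUs this mm has run on, i.e. the set the flush IPI is targeted at (minus the sending CPU). A throwaway helper (hypothetical, not part of crash) to list them:

/* Hypothetical one-off helper: print the CPU numbers set in the first
 * word of the cpu_vm_mask_var dump above. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint64_t mask = 710884661040ULL;        /* bits[0] from the cpumask dump */

        for (int cpu = 0; cpu < 64; cpu++)
                if (mask & (1ULL << cpu))
                        printf("cpu %d\n", cpu);
        return 0;
}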

We got interrupted here:

/usr/src/debug/kernel-3.10.0-229.4.2.el7/linux-3.10.0-229.4.2.el7.x86_64/kernel/smp.c: 109
0xffffffff810d6e32 <smp_call_function_many+514>: testb $0x1,0x20(%rcx)
0xffffffff810d6e36 <smp_call_function_many+518>: jne 0xffffffff810d6e30 <smp_call_function_many+512>

Which is the CSD_FLAG_LOCK check in csd_lock_wait (the testb is reading the flags field at offset 0x20 of the call_single_data held in rcx, ffff887f7f457928):

0100 /*
0101  * csd_lock/csd_unlock used to serialize access to per-cpu csd resources
0102  *
0103  * For non-synchronous ipi calls the csd can still be in use by the
0104  * previous function call. For multi-cpu calls its even more interesting
0105  * as we'll have to ensure no other cpu is observing our csd.
0106  */
0107 static void csd_lock_wait(struct call_single_data *csd)
0108 {
0109         while (csd->flags & CSD_FLAG_LOCK)
0110                 cpu_relax();
0111 }
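
csd_lock_wait is used both before reusing a csd (to make sure its previous use has completed) and, for a synchronous call (wait=1, as in the flush_tlb_func call above), after sending the IPI; either way the loop only exits once the target CPU has run the queued function and cleared CSD_FLAG_LOCK. A simplified model of that handshake (not the kernel implementation):

/* Simplified, hypothetical model of the csd handshake described above. */
#include <stdatomic.h>

#define CSD_FLAG_LOCK 0x01u

struct csd {
        _Atomic unsigned int flags;
        void (*func)(void *info);
        void *info;
};

/* Target CPU, from its IPI handler: run the function, then unlock. */
static void csd_run(struct csd *csd)
{
        csd->func(csd->info);
        atomic_fetch_and(&csd->flags, ~CSD_FLAG_LOCK);  /* csd_unlock() */
}

/* Sender: after queueing the csd and sending the IPI, wait for the ack. */
static void csd_lock_wait(struct csd *csd)
{
        while (atomic_load(&csd->flags) & CSD_FLAG_LOCK)
                ;                                       /* cpu_relax() */
}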

crash64> struct call_single_data ffff887f7f457928
struct call_single_data {
  {
    llist = {
      next = 0xffff887f7f457798
    },
    list = {
      next = 0xffff887f7f457798,
      prev = 0x0
    }
  },
  func = 0xffffffff8105faf0 <flush_tlb_func>,
  info = 0xffff880acf2ff788,
  flags = 1
}

The flags field is still 1, i.e. CSD_FLAG_LOCK is still set. We're waiting for someone else's smp call to complete on that CPU (I haven't looked at which CPU that is), but this is the only other task doing smp calls to other CPUs:

PID: 27681 TASK: ffff8841e87f6660 CPU: 16 COMMAND: "vertica"
 #0 [ffff887f7f185e70] crash_nmi_callback at ffffffff810406c2
 #1 [ffff887f7f185e80] nmi_handle at ffffffff8160d6d9
 #2 [ffff887f7f185ec8] do_nmi at ffffffff8160d7f0
 #3 [ffff887f7f185ef0] end_repeat_nmi at ffffffff8160cb31
    [exception RIP: smp_call_function_many+514]
    RIP: ffffffff810d6e32 RSP: ffff887f2509b6b8 RFLAGS: 00000202
    RAX: 0000000000000025 RBX: 0000000000000028 RCX: ffff887f7f457798
    RDX: 0000000000000025 RSI: 0000000000000028 RDI: 0000000000000000
    RBP: ffff887f2509b6f0 R8: ffff887f2642d000 R9: ffff887f7f1964a0
    R10: ffffea01e7be3200 R11: ffffffff812d4e39 R12: 0000000000014140
    R13: ffffffff8105faf0 R14: ffff887f2509b700 R15: ffff887f7f194180
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
--- <NMI exception stack> ---
 #4 [ffff887f2509b6b8] smp_call_function_many at ffffffff810d6e32
 #5 [ffff887f2509b6f8] native_flush_tlb_others at ffffffff8105fcb8
 #6 [ffff887f2509b748] flush_tlb_page at ffffffff8105fef5
 #7 [ffff887f2509b768] ptep_clear_flush_young at ffffffff8105e9cd
 #8 [ffff887f2509b790] page_referenced_one at ffffffff8118cc76
 #9 [ffff887f2509b7d8] page_referenced at ffffffff8118e57b
#10 [ffff887f2509b850] shrink_active_list at ffffffff8116b1cc
#11 [ffff887f2509b908] shrink_lruvec at ffffffff8116b889
#12 [ffff887f2509ba08] shrink_zone at ffffffff8116bb76
#13 [ffff887f2509ba60] do_try_to_free_pages at ffffffff8116c080
#14 [ffff887f2509bad8] try_to_free_pages at ffffffff8116c56c
#15 [ffff887f2509bb70] __alloc_pages_nodemask at ffffffff81160c0d
#16 [ffff887f2509bca8] alloc_pages_vma at ffffffff811a2a2a
#17 [ffff887f2509bd10] do_wp_page at ffffffff811807ba
#18 [ffff887f2509bd98] handle_mm_fault at ffffffff81182b94
#19 [ffff887f2509be28] __do_page_fault at ffffffff816101c6
#20 [ffff887f2509bf28] do_page_fault at ffffffff816105ca
#21 [ffff887f2509bf50] page_fault at ffffffff8160c7c8
    RIP: 0000000000c99820 RSP: 00007f0b590ba4d0 RFLAGS: 00010246
    RAX: 0000000002cde49d RBX: 0000000000005506 RCX: 0000000000000000
    RDX: 00007eec48576120 RSI: 0000000000000048 RDI: 5c9d51e401999d9d
    RBP: 00007f0b590ba760 R8: 00007ef2f0224920 R9: 0000000003ffffff
    R10: 000000000033a11d R11: 0000000000000050 R12: 00007f42b13d3da0
    R13: 00007eeb67fff010 R14: 0000000000000009 R15: 00007f0b590badc0
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b

The other threads are trying to shrink the slab (one is doing it and the other two are waiting on spinlocks).

PID: 22733 TASK: ffff887e7ca44440 CPU: 9 COMMAND: "vertica"
 #0 [ffff883f7fa45e70] crash_nmi_callback at ffffffff810406c2
 #1 [ffff883f7fa45e80] nmi_handle at ffffffff8160d6d9
 #2 [ffff883f7fa45ec8] do_nmi at ffffffff8160d7f0
 #3 [ffff883f7fa45ef0] end_repeat_nmi at ffffffff8160cb31
    [exception RIP: _raw_spin_lock+58]
    RIP: ffffffff8160c0ea RSP: ffff88678f473760 RFLAGS: 00000202
    RAX: 000000000000382a RBX: ffff887f20412c10 RCX: 0000000000005de2
    RDX: 0000000000005dee RSI: 0000000000005dee RDI: ffff887f20412c80
    RBP: ffff88678f473760 R8: 0000000000038916 R9: 0000000000000040
    R10: 0000000000000100 R11: 0000000000000220 R12: ffff88678f473950
    R13: 0000000000000001 R14: ffff887f20412800 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
 #4 [ffff88678f473760] _raw_spin_lock at ffffffff8160c0ea
 #5 [ffff88678f473768] __ext4_es_shrink at ffffffffa0a50a5b [ext4]
 #6 [ffff88678f4737e0] ext4_es_shrink at ffffffffa0a50d52 [ext4]
 #7 [ffff88678f473828] shrink_slab at ffffffff81169205
 #8 [ffff88678f4738d0] do_try_to_free_pages at ffffffff8116c352
 #9 [ffff88678f473948] try_to_free_pages at ffffffff8116c56c
#10 [ffff88678f4739e0] __alloc_pages_nodemask at ffffffff81160c0d
#11 [ffff88678f473b18] alloc_pages_current at ffffffff8119f549
#12 [ffff88678f473b60] sk_page_frag_refill at ffffffff814e7e30
#13 [ffff88678f473b90] tcp_sendmsg at ffffffff81547df3
#14 [ffff88678f473c58] inet_sendmsg at ffffffff81570964
#15 [ffff88678f473c88] sock_aio_write at ffffffff814e32c7
#16 [ffff88678f473d58] do_sync_readv_writev at ffffffff811c65c9
#17 [ffff88678f473e30] do_readv_writev at ffffffff811c7abe
#18 [ffff88678f473f28] vfs_writev at ffffffff811c7ce5
#19 [ffff88678f473f38] sys_writev at ffffffff811c7e3c
#20 [ffff88678f473f80] system_call_fastpath at ffffffff81614de9
    RIP: 00007f4d1a00c3d0 RSP: 00007f4248270910 RFLAGS: 00000206
    RAX: 0000000000000014 RBX: ffffffff81614de9 RCX: 0000000000000004
    RDX: 0000000000000002 RSI: 00007f37edfa8210 RDI: 000000000000059a
    RBP: 00007f42482707d0 R8: 0000000000000000 R9: 00000000000058cd
    R10: 0000000000003133 R11: 0000000000000293 R12: 00007f4cf4244ee0
    R13: 00007f4cf4a2da78 R14: 00007f4cf4a2da70 R15: 00007f4cf4244ee0
    ORIG_RAX: 0000000000000014 CS: 0033 SS: 002b

PID: 301 TASK: ffff883f25f271c0 CPU: 15 COMMAND: "kswapd1"
 #0 [ffff887f7f145e70] crash_nmi_callback at ffffffff810406c2
 #1 [ffff887f7f145e80] nmi_handle at ffffffff8160d6d9
 #2 [ffff887f7f145ec8] do_nmi at ffffffff8160d7f0
 #3 [ffff887f7f145ef0] end_repeat_nmi at ffffffff8160cb31
    [exception RIP: ext4_inode_touch_time_cmp+12]
    RIP: ffffffffa0a506dc RSP: ffff883f23c9ba98 RFLAGS: 00000286
    RAX: 0000000000080000 RBX: ffff886998df40b0 RCX: 000000022afa0b96
    RDX: ffff886998df40b0 RSI: ffff881a8ac09bd8 RDI: 0000000000000000
    RBP: ffff883f23c9bae0 R8: ffff883f23c9bc10 R9: 0000000100660064
    R10: ffffffffa0a508d9 R11: ffffea00c2c99680 R12: ffff881a8ac09bd8
    R13: ffff885e9fc7b880 R14: 0000000000000000 R15: ffffffffa0a506d0
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
 #4 [ffff883f23c9ba98] ext4_inode_touch_time_cmp at ffffffffa0a506dc [ext4]
 #5 [ffff883f23c9bae8] list_sort at ffffffff812e732b
 #6 [ffff883f23c9bbe0] __ext4_es_shrink at ffffffffa0a50bce [ext4]
 #7 [ffff883f23c9bc58] ext4_es_shrink at ffffffffa0a50d52 [ext4]
 #8 [ffff883f23c9bca0] shrink_slab at ffffffff81169205
 #9 [ffff883f23c9bd48] balance_pgdat at ffffffff8116ce51
#10 [ffff883f23c9be20] kswapd at ffffffff8116d0f3
#11 [ffff883f23c9bec8] kthread at ffffffff8109739f
#12 [ffff883f23c9bf50] ret_from_fork at ffffffff81614d3c

PID: 34586 TASK: ffff88213f8e0b60 CPU: 18 COMMAND: "vertica"
 #0 [ffff887f7f205e70] crash_nmi_callback at ffffffff810406c2
 #1 [ffff887f7f205e80] nmi_handle at ffffffff8160d6d9
 #2 [ffff887f7f205ec8] do_nmi at ffffffff8160d7f0
 #3 [ffff887f7f205ef0] end_repeat_nmi at ffffffff8160cb31
    [exception RIP: _raw_spin_lock+55]
    RIP: ffffffff8160c0e7 RSP: ffff886654337908 RFLAGS: 00000206
    RAX: 00000000000001de RBX: ffff8868ff035290 RCX: 0000000000005de2
    RDX: 0000000000005dec RSI: 0000000000005dec RDI: ffff887f20412c80
    RBP: ffff886654337908 R8: 0000000000016400 R9: ffff887f7f216400
    R10: ffffea0103a7b3c0 R11: ffffffffa0a3d6c0 R12: ffff8868ff035528
    R13: ffff887f20412800 R14: ffff887f20412c80 R15: ffff886654337a30
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
 #4 [ffff886654337908] _raw_spin_lock at ffffffff8160c0e7
 #5 [ffff886654337910] ext4_es_lru_add at ffffffffa0a51ad7 [ext4]
 #6 [ffff886654337940] ext4_ext_map_blocks at ffffffffa0a3d6d8 [ext4]
 #7 [ffff886654337a10] ext4_da_get_block_prep at ffffffffa0a0e9fb [ext4]
 #8 [ffff886654337aa8] __block_write_begin at ffffffff811fbfa7
 #9 [ffff886654337b68] ext4_da_write_begin at ffffffffa0a1471c [ext4]
#10 [ffff886654337be8] generic_file_buffered_write at ffffffff811568ae
#11 [ffff886654337cb0] __generic_file_aio_write at ffffffff81158a15
#12 [ffff886654337d28] generic_file_aio_write at ffffffff81158c7d
#13 [ffff886654337d68] ext4_file_write at ffffffffa0a09b75 [ext4]
#14 [ffff886654337e20] do_sync_write at ffffffff811c650d
#15 [ffff886654337ef8] vfs_write at ffffffff811c6cad
#16 [ffff886654337f38] sys_write at ffffffff811c76f8
#17 [ffff886654337f80] system_call_fastpath at ffffffff81614de9
    RIP: 00007f4d1a006a4d RSP: 00007f3a293df710 RFLAGS: 00000246
    RAX: 0000000000000001 RBX: ffffffff81614de9 RCX: 0000000000000000
    RDX: 0000000000100000 RSI: 00007eee0ade9b30 RDI: 000000000000096c
    RBP: 00007f3a293df610 R8: 0000000000000000 R9: 000000000000871a
    R10: 0000000000000010 R11: 0000000000000293 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000100000 R15: 0000000000100000
    ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b

None of that explains what happened to the failing task.
bwknotts

bwknotts

2016-11-11 00:29

reporter   ~0027876

Whatever happened to this issue? I'm seeing it show up again. The last update on this thread was from nearly 9 months ago. It shows as "open" but there have been no more recent updates.

Thanks in advance.
sms1123

sms1123

2016-11-11 00:42

reporter   ~0027877

Nothing will happen unless someone investigates further and fixes it in CentOS, or it gets fixed upstream (in RHEL). I tried to find the cause but couldn't; I posted what I did find so that anyone else investigating could see it. There is a Red Hat bug open here (for RHEL 7):

https://bugzilla.redhat.com/show_bug.cgi?id=1305620

If you have seen the problem with RHEL installed (not CentOS) and have a RHEL support contract, I'd recommend opening a case with your RHEL support provider and getting the case attached to the Red Hat Bugzilla, so they are aware of the customers impacted and you get updates about the BZ.
sms1123

sms1123

2018-11-01 06:24

reporter   ~0033030

Fixed in RHEL, see https://bugzilla.redhat.com/show_bug.cgi?id=1305620. Not sure how to close the bug.
toracat

toracat

2018-11-01 06:40

manager   ~0033031

Thanks for the note. Closing as 'resolved'.

Issue History

Date Modified Username Field Change
2016-01-26 04:01 kdion_mz New Issue
2016-01-26 04:01 kdion_mz File Added: crash.tar.gz
2016-01-27 00:18 sms1123 Note Added: 0025527
2016-01-27 00:28 sms1123 Note Added: 0025528
2016-01-27 00:44 sms1123 Note Added: 0025529
2016-01-27 00:52 sms1123 Note Added: 0025530
2016-01-27 01:07 sms1123 Note Added: 0025531
2016-01-27 01:09 sms1123 Tag Attached: kernel panic
2016-01-27 05:37 sms1123 Note Added: 0025533
2016-11-11 00:29 bwknotts Note Added: 0027876
2016-11-11 00:42 sms1123 Note Added: 0027877
2018-11-01 06:24 sms1123 Note Added: 0033030
2018-11-01 06:40 toracat Status new => resolved
2018-11-01 06:40 toracat Resolution open => fixed
2018-11-01 06:40 toracat Note Added: 0033031