2017-06-25 22:27 UTC

ID: 0012908
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2017-05-14 02:47
Reporter: jekader
Priority: normal
Severity: crash
Reproducibility: sometimes
Status: new
Resolution: open
Product Version: 7.3.1611
Target Version:
Fixed in Version:
Summary: 0012908: system crash with KSM enabled
Description: We experienced a system crash on an oVirt hypervisor after enabling KSM on it yesterday.

[2918189.212169] BUG: unable to handle kernel paging request at 0000000000a183eb
[2918189.220162] IP: [<ffffffff811d5103>] remove_node_from_stable_tree+0x73/0x120
[2918189.228248] PGD 0
[2918189.230698] Oops: 0000 [#1] SMP
[2918189.234514] Modules linked in: vhost_net vhost macvtap macvlan ebt_arp ebtable_nat tun kvm_intel kvm ebtable_filter ebtables ip6table_filter ip6_tables scsi_transport_iscsi dm_service_time dm_multipath xt_physdev br_netfilter ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack iptable_filter ip_tables dm_mod 8021q garp mrp bonding ip_set nfnetlink bridge stp llc intel_powerclamp coretemp irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_devintf iTCO_wdt dcdbas iTCO_vendor_support acpi_power_meter mei_me ipmi_ssif mei sg acpi_pad pcspkr lpc_ich wmi sb_edac edac_core ipmi_si ipmi_msghandler shpchp xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper crct10dif_pclmul syscopyarea
[2918189.315070] crct10dif_common sysfillrect crc32c_intel sysimgblt fb_sys_fops ttm ahci drm libahci libata tg3 i2c_core ptp megaraid_sas pps_core fjes [last unloaded: kvm]
[2918189.330695] CPU: 15 PID: 188 Comm: ksmd Not tainted 3.10.0-514.6.1.el7.x86_64 #1
[2918189.339146] Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.1.3 11/20/2013
[2918189.347699] task: ffff881ffde2edd0 ti: ffff880ffd914000 task.ti: ffff880ffd914000
[2918189.356250] RIP: 0010:[<ffffffff811d5103>] [<ffffffff811d5103>] remove_node_from_stable_tree+0x73/0x120
[2918189.367052] RSP: 0018:ffff880ffd917d70 EFLAGS: 00010206
[2918189.373177] RAX: 0000000067e52fc1 RBX: 0000000000a183bb RCX: 0000000000000001
[2918189.381340] RDX: 0000000000a183eb RSI: 0000000000000000 RDI: ffff880ac02d2688
[2918189.389501] RBP: ffff880ffd917d80 R08: 0000000000000069 R09: 00000000fffffffc
[2918189.397663] R10: 000000000000003c R11: 0000000000000002 R12: ffff880ac02d2688
[2918189.405825] R13: 0000000000000005 R14: ffff880ac02d268b R15: ffff880ac02d2688
[2918189.413988] FS: 0000000000000000(0000) GS:ffff881fff1c0000(0000) knlGS:0000000000000000
[2918189.423217] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2918189.429828] CR2: 0000000000a183eb CR3: 00000000019ba000 CR4: 00000000001427e0
[2918189.437992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2918189.446156] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[2918189.454316] Stack:
[2918189.456756] ffffea0000000140 ffffea0000000000 ffff880ffd917dc0 ffffffff811d54f8
[2918189.465252] 00ffea00338ba530 ffff8812ee1ffed0 ffff880ac02d2688 ffff880ac02d2688
[2918189.473751] 0000000000000000 00000000fffffffc ffff880ffd917e60 ffffffff811d6c4b
[2918189.482249] Call Trace:
[2918189.485185] [<ffffffff811d54f8>] get_ksm_page+0x98/0x120
[2918189.491416] [<ffffffff811d6c4b>] ksm_do_scan+0x49b/0x11e0
[2918189.497745] [<ffffffff811d7a1f>] ksm_scan_thread+0x8f/0x240
[2918189.504275] [<ffffffff810b1720>] ? wake_up_atomic_t+0x30/0x30
[2918189.510996] [<ffffffff811d7990>] ? ksm_do_scan+0x11e0/0x11e0
[2918189.517617] [<ffffffff810b064f>] kthread+0xcf/0xe0
[2918189.523268] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
[2918189.530763] [<ffffffff81696818>] ret_from_fork+0x58/0x90
[2918189.536994] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
[2918189.544476] Code: 0f 94 c0 84 c0 74 05 e8 0c 6d fe ff 48 81 63 18 00 f0 ff ff e8 ff 6a 4b 00 48 8b 5b 30 48 85 db 74 1e 41 8b 44 24 38 48 83 eb 30 <48> 83 7b 30 00 75 b6 48 83 2d 26 b5 d7 00 01 eb b4 0f 1f 40 00
[2918189.566392] RIP [<ffffffff811d5103>] remove_node_from_stable_tree+0x73/0x120
[2918189.574575] RSP <ffff880ffd917d70>
[2918189.578662] CR2: 0000000000a183eb
Steps To Reproduce: In our case the setup is as follows:
 Dell PowerEdge R620
 kernel-3.10.0-514.6.1.el7.x86_64
 vdsm-4.19.4-1.el7.centos.x86_64

15 VMs are running with 8GB RAM each, using up to 120GB of the 128GB of RAM that the system has. This worked fine until yesterday, when we decided to optimize memory usage (all VMs run the same OS) and enabled KSM.

For now we've disabled KSM on the oVirt side to make the environment more stable.
Additional Information: This is very similar to a crash we witnessed on ppc64le earlier: https://bugs.centos.org/view.php?id=12590
Tags: No tags attached.
Attached Files
  • 0001-ksm-fix-use-after-free-with-merge_across_nodes-0.patch (6,880 bytes) 2017-05-12 17:54
    From 9e67c63f980cc1cdd2f56e356ebc8af8af69fb3a Mon Sep 17 00:00:00 2001
    From: Andrea Arcangeli <aarcange@redhat.com>
    Date: Fri, 12 May 2017 19:12:42 +0200
    Subject: [PATCH 1/1] ksm: fix use after free with merge_across_nodes = 0
    
    If merge_across_nodes was manually set to 0 (not the default value) by
    the admin or a tuned profile on NUMA systems triggering cross-NODE
    page migrations, a stable_node use after free could materialize.
    
    If the chain is collapsed stable_node would point to the old chain
    that was already freed. stable_node_dup would be the stable_node dup
    now converted to a regular stable_node and indexed in the rbtree in
    replacement of the freed stable_node chain (not anymore a dup).
    
    This special case where the chain is collapsed in the NUMA replacement
    path, is now detected by setting stable_node to NULL by the
    chain_prune callee if it decides to collapse the chain. This tells the
    NUMA replacement code that even if stable_node and stable_node_dup are
    different, this is not a chain if stable_node is NULL, as the
    stable_node_dup was converted to a regular stable_node and the chain
    was collapsed.
    
    It is generally safer for the callee to force the caller stable_node
    to NULL the moment it become stale so any other mistake like this
    would result in an instant Oops easier to debug than an use after free.
    
    Otherwise the replace logic would act like if stable_node was a valid
    chain, when in fact it was freed. Notably
    stable_node_chain_add_dup(page_node, stable_node) would run on a
    stale stable_node.
    
    Andrey Ryabinin found the source of the use after free in
    chain_prune().
    
    Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Reported-by: Evgheni Dereveanchin <ederevea@redhat.com>
    Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
    ---
     mm/ksm.c | 52 ++++++++++++++++++++++++++++++++++++++++++----------
     1 file changed, 42 insertions(+), 10 deletions(-)
    
    diff --git a/mm/ksm.c b/mm/ksm.c
    index 44a1b2a..07fb893 100644
    --- a/mm/ksm.c
    +++ b/mm/ksm.c
    @@ -338,6 +338,7 @@ static inline void stable_node_chain_add_dup(struct stable_node *dup,
     
     static inline void __stable_node_dup_del(struct stable_node *dup)
     {
    +	VM_BUG_ON(!is_stable_node_dup(dup));
     	hlist_del(&dup->hlist_dup);
     	ksm_stable_node_dups--;
     }
    @@ -1315,12 +1316,12 @@ bool is_page_sharing_candidate(struct stable_node *stable_node)
     	return __is_page_sharing_candidate(stable_node, 0);
     }
     
    -static struct stable_node *stable_node_dup(struct stable_node *stable_node,
    +static struct stable_node *stable_node_dup(struct stable_node **_stable_node,
     					   struct page **tree_page,
     					   struct rb_root *root,
     					   bool prune_stale_stable_nodes)
     {
    -	struct stable_node *dup, *found = NULL;
    +	struct stable_node *dup, *found = NULL, *stable_node = *_stable_node;
     	struct hlist_node *hlist_safe;
     	struct page *_tree_page;
     	int nr = 0;
    @@ -1393,6 +1394,15 @@ static struct stable_node *stable_node_dup(struct stable_node *stable_node,
     			free_stable_node(stable_node);
     			ksm_stable_node_chains--;
     			ksm_stable_node_dups--;
    +			/*
    +			 * NOTE: the caller depends on the
    +			 * *_stable_node to become NULL if the chain
    +			 * was collapsed. Enforce that if anything
    +			 * uses a stale (freed) stable_node chain a
    +			 * visible crash will materialize (instead of
    +			 * an use after free).
    +			 */
    +			*_stable_node = stable_node = NULL;
     		} else if (__is_page_sharing_candidate(found, 1)) {
     			/*
     			 * Refile our candidate at the head
    @@ -1422,11 +1432,12 @@ static struct stable_node *stable_node_dup_any(struct stable_node *stable_node,
     			   typeof(*stable_node), hlist_dup);
     }
     
    -static struct stable_node *__stable_node_chain(struct stable_node *stable_node,
    +static struct stable_node *__stable_node_chain(struct stable_node **_stable_node,
     					       struct page **tree_page,
     					       struct rb_root *root,
     					       bool prune_stale_stable_nodes)
     {
    +	struct stable_node *stable_node = *_stable_node;
     	if (!is_stable_node_chain(stable_node)) {
     		if (is_page_sharing_candidate(stable_node)) {
     			*tree_page = get_ksm_page(stable_node, false);
    @@ -1434,11 +1445,11 @@ static struct stable_node *__stable_node_chain(struct stable_node *stable_node,
     		}
     		return NULL;
     	}
    -	return stable_node_dup(stable_node, tree_page, root,
    +	return stable_node_dup(_stable_node, tree_page, root,
     			       prune_stale_stable_nodes);
     }
     
    -static __always_inline struct stable_node *chain_prune(struct stable_node *s_n,
    +static __always_inline struct stable_node *chain_prune(struct stable_node **s_n,
     						       struct page **t_p,
     						       struct rb_root *root)
     {
    @@ -1449,7 +1460,7 @@ static __always_inline struct stable_node *chain(struct stable_node *s_n,
     						 struct page **t_p,
     						 struct rb_root *root)
     {
    -	return __stable_node_chain(s_n, t_p, root, false);
    +	return __stable_node_chain(&s_n, t_p, root, false);
     }
     
     /*
    @@ -1490,7 +1501,15 @@ static struct page *stable_tree_search(struct page *page)
     		cond_resched();
     		stable_node = rb_entry(*new, struct stable_node, node);
     		stable_node_any = NULL;
    -		stable_node_dup = chain_prune(stable_node, &tree_page, root);
    +		stable_node_dup = chain_prune(&stable_node, &tree_page, root);
    +		/*
    +		 * NOTE: stable_node may have been freed by
    +		 * chain_prune() if the returned stable_node_dup is
    +		 * not NULL. stable_node_dup may have been inserted in
    +		 * the rbtree instead as a regular stable_node (in
    +		 * order to collapse the stable_node chain if a single
    +		 * stable_node dup was found in it).
    +		 */
     		if (!stable_node_dup) {
     			/*
     			 * Either all stable_node dups were full in
    @@ -1605,20 +1624,33 @@ static struct page *stable_tree_search(struct page *page)
     		return NULL;
     
     replace:
    -	if (stable_node_dup == stable_node) {
    +	/*
    +	 * If stable_node was a chain and chain_prune collapsed it,
    +	 * stable_node will be NULL here. In that case the
    +	 * stable_node_dup is the regular stable_node that has
    +	 * replaced the chain. If stable_node is not NULL and equal to
    +	 * stable_node_dup there was no chain and stable_node_dup is
    +	 * the regular stable_node in the stable rbtree. Otherwise
    +	 * stable_node is the chain and stable_node_dup is the dup to
    +	 * replace.
    +	 */
    +	if (!stable_node || stable_node_dup == stable_node) {
    +		VM_BUG_ON(is_stable_node_chain(stable_node_dup));
    +		VM_BUG_ON(is_stable_node_dup(stable_node_dup));
     		/* there is no chain */
     		if (page_node) {
     			VM_BUG_ON(page_node->head != &migrate_nodes);
     			list_del(&page_node->list);
     			DO_NUMA(page_node->nid = nid);
    -			rb_replace_node(&stable_node->node, &page_node->node,
    +			rb_replace_node(&stable_node_dup->node,
    +					&page_node->node,
     					root);
     			if (is_page_sharing_candidate(page_node))
     				get_page(page);
     			else
     				page = NULL;
     		} else {
    -			rb_erase(&stable_node->node, root);
    +			rb_erase(&stable_node_dup->node, root);
     			page = NULL;
     		}
     	} else {
    
  • 0001-ksm-fix-use-after-free-with-merge_across_nodes-0-2.patch (8,125 bytes) 2017-05-12 19:47
    From a2e9e515d0b168aac6523c5fcb9ef9a29113af63 Mon Sep 17 00:00:00 2001
    From: Andrea Arcangeli <aarcange@redhat.com>
    Date: Fri, 12 May 2017 19:12:42 +0200
    Subject: [PATCH 1/1] ksm: fix use after free with merge_across_nodes = 0
    
    If merge_across_nodes was manually set to 0 (not the default value) by
    the admin or a tuned profile on NUMA systems triggering cross-NODE
    page migrations, a stable_node use after free could materialize.
    
    If the chain is collapsed stable_node would point to the old chain
    that was already freed. stable_node_dup would be the stable_node dup
    now converted to a regular stable_node and indexed in the rbtree in
    replacement of the freed stable_node chain (not anymore a dup).
    
    This special case where the chain is collapsed in the NUMA replacement
    path, is now detected by setting stable_node to NULL by the
    chain_prune callee if it decides to collapse the chain. This tells the
    NUMA replacement code that even if stable_node and stable_node_dup are
    different, this is not a chain if stable_node is NULL, as the
    stable_node_dup was converted to a regular stable_node and the chain
    was collapsed.
    
    It is generally safer for the callee to force the caller stable_node
    to NULL the moment it become stale so any other mistake like this
    would result in an instant Oops easier to debug than an use after free.
    
    Otherwise the replace logic would act like if stable_node was a valid
    chain, when in fact it was freed. Notably
    stable_node_chain_add_dup(page_node, stable_node) would run on a
    stale stable_node.
    
    Andrey Ryabinin found the source of the use after free in
    chain_prune().
    
    Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Reported-by: Evgheni Dereveanchin <ederevea@redhat.com>
    Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
    ---
     mm/ksm.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
     1 file changed, 55 insertions(+), 11 deletions(-)
    
    diff --git a/mm/ksm.c b/mm/ksm.c
    index 44a1b2a..b53fd58 100644
    --- a/mm/ksm.c
    +++ b/mm/ksm.c
    @@ -338,6 +338,7 @@ static inline void stable_node_chain_add_dup(struct stable_node *dup,
     
     static inline void __stable_node_dup_del(struct stable_node *dup)
     {
    +	VM_BUG_ON(!is_stable_node_dup(dup));
     	hlist_del(&dup->hlist_dup);
     	ksm_stable_node_dups--;
     }
    @@ -1315,12 +1316,12 @@ bool is_page_sharing_candidate(struct stable_node *stable_node)
     	return __is_page_sharing_candidate(stable_node, 0);
     }
     
    -static struct stable_node *stable_node_dup(struct stable_node *stable_node,
    +static struct stable_node *stable_node_dup(struct stable_node **_stable_node,
     					   struct page **tree_page,
     					   struct rb_root *root,
     					   bool prune_stale_stable_nodes)
     {
    -	struct stable_node *dup, *found = NULL;
    +	struct stable_node *dup, *found = NULL, *stable_node = *_stable_node;
     	struct hlist_node *hlist_safe;
     	struct page *_tree_page;
     	int nr = 0;
    @@ -1393,6 +1394,15 @@ static struct stable_node *stable_node_dup(struct stable_node *stable_node,
     			free_stable_node(stable_node);
     			ksm_stable_node_chains--;
     			ksm_stable_node_dups--;
    +			/*
    +			 * NOTE: the caller depends on the
    +			 * *_stable_node to become NULL if the chain
    +			 * was collapsed. Enforce that if anything
    +			 * uses a stale (freed) stable_node chain a
    +			 * visible crash will materialize (instead of
    +			 * an use after free).
    +			 */
    +			*_stable_node = stable_node = NULL;
     		} else if (__is_page_sharing_candidate(found, 1)) {
     			/*
     			 * Refile our candidate at the head
    @@ -1422,11 +1432,12 @@ static struct stable_node *stable_node_dup_any(struct stable_node *stable_node,
     			   typeof(*stable_node), hlist_dup);
     }
     
    -static struct stable_node *__stable_node_chain(struct stable_node *stable_node,
    +static struct stable_node *__stable_node_chain(struct stable_node **_stable_node,
     					       struct page **tree_page,
     					       struct rb_root *root,
     					       bool prune_stale_stable_nodes)
     {
    +	struct stable_node *stable_node = *_stable_node;
     	if (!is_stable_node_chain(stable_node)) {
     		if (is_page_sharing_candidate(stable_node)) {
     			*tree_page = get_ksm_page(stable_node, false);
    @@ -1434,11 +1445,11 @@ static struct stable_node *__stable_node_chain(struct stable_node *stable_node,
     		}
     		return NULL;
     	}
    -	return stable_node_dup(stable_node, tree_page, root,
    +	return stable_node_dup(_stable_node, tree_page, root,
     			       prune_stale_stable_nodes);
     }
     
    -static __always_inline struct stable_node *chain_prune(struct stable_node *s_n,
    +static __always_inline struct stable_node *chain_prune(struct stable_node **s_n,
     						       struct page **t_p,
     						       struct rb_root *root)
     {
    @@ -1449,7 +1460,7 @@ static __always_inline struct stable_node *chain(struct stable_node *s_n,
     						 struct page **t_p,
     						 struct rb_root *root)
     {
    -	return __stable_node_chain(s_n, t_p, root, false);
    +	return __stable_node_chain(&s_n, t_p, root, false);
     }
     
     /*
    @@ -1490,7 +1501,15 @@ static struct page *stable_tree_search(struct page *page)
     		cond_resched();
     		stable_node = rb_entry(*new, struct stable_node, node);
     		stable_node_any = NULL;
    -		stable_node_dup = chain_prune(stable_node, &tree_page, root);
    +		stable_node_dup = chain_prune(&stable_node, &tree_page, root);
    +		/*
    +		 * NOTE: stable_node may have been freed by
    +		 * chain_prune() if the returned stable_node_dup is
    +		 * not NULL. stable_node_dup may have been inserted in
    +		 * the rbtree instead as a regular stable_node (in
    +		 * order to collapse the stable_node chain if a single
    +		 * stable_node dup was found in it).
    +		 */
     		if (!stable_node_dup) {
     			/*
     			 * Either all stable_node dups were full in
    @@ -1605,20 +1624,33 @@ static struct page *stable_tree_search(struct page *page)
     		return NULL;
     
     replace:
    -	if (stable_node_dup == stable_node) {
    +	/*
    +	 * If stable_node was a chain and chain_prune collapsed it,
    +	 * stable_node will be NULL here. In that case the
    +	 * stable_node_dup is the regular stable_node that has
    +	 * replaced the chain. If stable_node is not NULL and equal to
    +	 * stable_node_dup there was no chain and stable_node_dup is
    +	 * the regular stable_node in the stable rbtree. Otherwise
    +	 * stable_node is the chain and stable_node_dup is the dup to
    +	 * replace.
    +	 */
    +	if (!stable_node || stable_node_dup == stable_node) {
    +		VM_BUG_ON(is_stable_node_chain(stable_node_dup));
    +		VM_BUG_ON(is_stable_node_dup(stable_node_dup));
     		/* there is no chain */
     		if (page_node) {
     			VM_BUG_ON(page_node->head != &migrate_nodes);
     			list_del(&page_node->list);
     			DO_NUMA(page_node->nid = nid);
    -			rb_replace_node(&stable_node->node, &page_node->node,
    +			rb_replace_node(&stable_node_dup->node,
    +					&page_node->node,
     					root);
     			if (is_page_sharing_candidate(page_node))
     				get_page(page);
     			else
     				page = NULL;
     		} else {
    -			rb_erase(&stable_node->node, root);
    +			rb_erase(&stable_node_dup->node, root);
     			page = NULL;
     		}
     	} else {
    @@ -1645,7 +1677,17 @@ static struct page *stable_tree_search(struct page *page)
     	/* stable_node_dup could be null if it reached the limit */
     	if (!stable_node_dup)
     		stable_node_dup = stable_node_any;
    -	if (stable_node_dup == stable_node) {
    +	/*
    +	 * If stable_node was a chain and chain_prune collapsed it,
    +	 * stable_node will be NULL here. In that case the
    +	 * stable_node_dup is the regular stable_node that has
    +	 * replaced the chain. If stable_node is not NULL and equal to
    +	 * stable_node_dup there was no chain and stable_node_dup is
    +	 * the regular stable_node in the stable rbtree.
    +	 */
    +	if (!stable_node || stable_node_dup == stable_node) {
    +		VM_BUG_ON(is_stable_node_chain(stable_node_dup));
    +		VM_BUG_ON(is_stable_node_dup(stable_node_dup));
     		/* chain is missing so create it */
     		stable_node = alloc_stable_node_chain(stable_node_dup,
     						      root);
    @@ -1658,6 +1700,8 @@ static struct page *stable_tree_search(struct page *page)
     	 * of the current nid for this page
     	 * content.
     	 */
    +	VM_BUG_ON(!is_stable_node_chain(stable_node));
    +	VM_BUG_ON(!is_stable_node_dup(stable_node_dup));
     	VM_BUG_ON(page_node->head != &migrate_nodes);
     	list_del(&page_node->list);
     	DO_NUMA(page_node->nid = nid);
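
The central idea of the patch above, passing stable_node by reference so that the callee can set the caller's pointer to NULL once the chain has been freed, can be illustrated outside the kernel. The following is a user-space analogy only, not the kernel code: `struct node`, `prune()` and `no_chain()` are hypothetical names, and the rbtree, locking and page handling are left out.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct stable_node. */
struct node {
    int is_chain;          /* 1 for a chain head, 0 for a regular node or dup */
    struct node *dups[4];  /* used only by chain heads */
    int nr_dups;
};

/*
 * Analogue of chain_prune(): takes the caller's pointer by reference.
 * If the chain holds a single dup, the chain head is freed (collapsed)
 * and *pnode is set to NULL, so any later use of the stale pointer
 * faults immediately instead of touching freed memory.
 */
static struct node *prune(struct node **pnode)
{
    struct node *n = *pnode;
    struct node *found;

    if (!n->is_chain)
        return n;               /* regular node: nothing to prune */
    if (n->nr_dups == 0)
        return NULL;
    found = n->dups[0];
    if (n->nr_dups == 1) {
        free(n);                /* collapse: the lone dup replaces the head */
        *pnode = NULL;          /* the caller's pointer must not stay stale */
    }
    return found;
}

/*
 * Analogue of the patched test at "replace:": there is no chain either
 * because it was collapsed (node == NULL) or because there never was one
 * (dup == node).
 */
static int no_chain(const struct node *node, const struct node *dup)
{
    return !node || dup == node;
}
```

With the old by-value signature, the caller would still hold the freed chain head after a collapse, and a plain `dup == node` comparison would wrongly conclude that a valid chain exists.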
    


Notes

~0028751

post-factum (reporter)

Brief vmcore analysis:

       PANIC: "BUG: unable to handle kernel paging request at 0000000000a183eb"

crash> bt
PID: 188 TASK: ffff881ffde2edd0 CPU: 15 COMMAND: "ksmd"
 #0 [ffff880ffd917a18] machine_kexec at ffffffff81059c7b
 #1 [ffff880ffd917a78] __crash_kexec at ffffffff811052d2
 #2 [ffff880ffd917b48] crash_kexec at ffffffff811053c0
 #3 [ffff880ffd917b60] oops_end at ffffffff8168f188
 #4 [ffff880ffd917b88] no_context at ffffffff8167ed93
 #5 [ffff880ffd917bd8] __bad_area_nosemaphore at ffffffff8167ee29
 #6 [ffff880ffd917c20] bad_area_nosemaphore at ffffffff8167ef93
 #7 [ffff880ffd917c30] __do_page_fault at ffffffff81691f1e
 #8 [ffff880ffd917c90] do_page_fault at ffffffff816920c5
 #9 [ffff880ffd917cc0] page_fault at ffffffff8168e388
    [exception RIP: remove_node_from_stable_tree+0x73]
    RIP: ffffffff811d5103 RSP: ffff880ffd917d70 RFLAGS: 00010206
    RAX: 0000000067e52fc1 RBX: 0000000000a183bb RCX: 0000000000000001
    RDX: 0000000000a183eb RSI: 0000000000000000 RDI: ffff880ac02d2688
    RBP: ffff880ffd917d80 R8: 0000000000000069 R9: 00000000fffffffc
    R10: 000000000000003c R11: 0000000000000002 R12: ffff880ac02d2688
    R13: 0000000000000005 R14: ffff880ac02d268b R15: ffff880ac02d2688
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff880ffd917d88] get_ksm_page at ffffffff811d54f8
#11 [ffff880ffd917dc8] ksm_do_scan at ffffffff811d6c4b
#12 [ffff880ffd917e68] ksm_scan_thread at ffffffff811d7a1f
#13 [ffff880ffd917ec8] kthread at ffffffff810b064f
#14 [ffff880ffd917f50] ret_from_fork at ffffffff81696818

crash> dis -lr remove_node_from_stable_tree+0x73 | tail -n 20
/usr/src/debug/kernel-3.10.0-514.6.1.el7/linux-3.10.0-514.6.1.el7.x86_64/arch/x86/include/asm/atomic.h: 124
0xffffffff811d50d4 <remove_node_from_stable_tree+0x44>: lock decl 0x28(%rdi)
0xffffffff811d50d8 <remove_node_from_stable_tree+0x48>: sete %al
/usr/src/debug/kernel-3.10.0-514.6.1.el7/linux-3.10.0-514.6.1.el7.x86_64/include/linux/rmap.h: 108
0xffffffff811d50db <remove_node_from_stable_tree+0x4b>: test %al,%al
0xffffffff811d50dd <remove_node_from_stable_tree+0x4d>: je 0xffffffff811d50e4 <remove_node_from_stable_tree+0x54>
/usr/src/debug/kernel-3.10.0-514.6.1.el7/linux-3.10.0-514.6.1.el7.x86_64/include/linux/rmap.h: 109
0xffffffff811d50df <remove_node_from_stable_tree+0x4f>: callq 0xffffffff811bbdf0 <__put_anon_vma>
/usr/src/debug/kernel-3.10.0-514.6.1.el7/linux-3.10.0-514.6.1.el7.x86_64/mm/ksm.c: 621
0xffffffff811d50e4 <remove_node_from_stable_tree+0x54>: andq $0xfffffffffffff000,0x18(%rbx)
/usr/src/debug/kernel-3.10.0-514.6.1.el7/linux-3.10.0-514.6.1.el7.x86_64/mm/ksm.c: 622
0xffffffff811d50ec <remove_node_from_stable_tree+0x5c>: callq 0xffffffff8168bbf0 <_cond_resched>
/usr/src/debug/kernel-3.10.0-514.6.1.el7/linux-3.10.0-514.6.1.el7.x86_64/mm/ksm.c: 613
0xffffffff811d50f1 <remove_node_from_stable_tree+0x61>: mov 0x30(%rbx),%rbx
0xffffffff811d50f5 <remove_node_from_stable_tree+0x65>: test %rbx,%rbx
0xffffffff811d50f8 <remove_node_from_stable_tree+0x68>: je 0xffffffff811d5118 <remove_node_from_stable_tree+0x88>
0xffffffff811d50fa <remove_node_from_stable_tree+0x6a>: mov 0x38(%r12),%eax
0xffffffff811d50ff <remove_node_from_stable_tree+0x6f>: sub $0x30,%rbx
/usr/src/debug/kernel-3.10.0-514.6.1.el7/linux-3.10.0-514.6.1.el7.x86_64/mm/ksm.c: 614
0xffffffff811d5103 <remove_node_from_stable_tree+0x73>: cmpq $0x0,0x30(%rbx)

The crash occurs on the dereference at 0x30(%rbx): RBX is 0000000000a183bb, so the faulting address is 0000000000a183eb, which matches CR2.

 606 static void remove_node_from_stable_tree(struct stable_node *stable_node)
 607 {
 608 struct rmap_item *rmap_item;
 609
 610 /* check it's not STABLE_NODE_CHAIN or negative */
 611 BUG_ON(stable_node->rmap_hlist_len < 0);
 612
 613 hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
 614 if (rmap_item->hlist.next)

rmap_item pointer is invalid.

RBX is set from RDX:

/usr/src/debug/kernel-3.10.0-514.6.1.el7/linux-3.10.0-514.6.1.el7.x86_64/mm/ksm.c: 613
0xffffffff811d50aa <remove_node_from_stable_tree+0x1a>: mov 0x28(%rdi),%rdx
0xffffffff811d50ae <remove_node_from_stable_tree+0x1e>: test %rdx,%rdx
0xffffffff811d50b1 <remove_node_from_stable_tree+0x21>: lea -0x30(%rdx),%rbx
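
The pointer arithmetic seen in the disassembly can be reproduced in user space. The sketch below is an illustration only: the mock structure pads the `hlist` member to offset 0x30 because that is what `lea -0x30(%rdx),%rbx` and `cmpq $0x0,0x30(%rbx)` imply. The real layout of `struct rmap_item` is not shown in this report, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock of the kernel's hlist_node: next pointer first, then pprev. */
struct mock_hlist_node {
    struct mock_hlist_node *next;
    struct mock_hlist_node **pprev;
};

/*
 * Hypothetical stand-in for struct rmap_item. Only the position of
 * 'hlist' matters; the padding is an assumption chosen so that
 * offsetof(struct mock_rmap_item, hlist) == 0x30.
 */
struct mock_rmap_item {
    unsigned char pad[0x30];
    struct mock_hlist_node hlist;
};

/*
 * What hlist_for_each_entry() does to turn a list pointer into an
 * entry pointer: subtract the member offset ("lea -0x30(%rdx),%rbx").
 */
static uintptr_t entry_from_hlist(uintptr_t hlist_ptr)
{
    return hlist_ptr - offsetof(struct mock_rmap_item, hlist);
}

/* Address read by rmap_item->hlist.next ("cmpq $0x0,0x30(%rbx)"). */
static uintptr_t next_field_address(uintptr_t entry)
{
    return entry + offsetof(struct mock_rmap_item, hlist)
                 + offsetof(struct mock_hlist_node, next);
}
```

With the register values from the oops, `entry_from_hlist(0xa183eb)` yields 0xa183bb (RBX) and `next_field_address(0xa183bb)` yields 0xa183eb (CR2), so the bogus value was already present in the list head that RDX was loaded from.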

RDX is loaded from 0x28(%rdi), where RDI (the 1st function argument) is the struct stable_node *. Checking it (it is an RB tree node):

crash> tree ffff880ac02d2688
ffff8812ee1ffec8
dead000000000100
tree: invalid kernel virtual address: dead000000000110 type: "rb_node rb_left"

Tree is corrupted.

Note: THP is enabled.

~0028878

jekader (reporter)

The same system crashed again with something KSM-related on the newest kernel, even though KSM was disabled via the oVirt UI:

[21213.043816] ------------[ cut here ]------------
[21213.048972] kernel BUG at mm/ksm.c:611!
[21213.053250] invalid opcode: 0000 [#1] SMP
[21213.057836] Modules linked in: vhost_net vhost macvtap macvlan ebt_arp ebtable_nat tun ebtable_filter ebtables ip6table_filter ip6_tables scsi_transport_iscsi xt_physdev br_netfilter ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack iptable_filter intel_powerclamp coretemp kvm_intel kvm irqbypass dm_service_time crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_devintf acpi_power_meter sg acpi_pad iTCO_wdt iTCO_vendor_support wmi dcdbas mei_me pcspkr sb_edac ipmi_si ipmi_msghandler lpc_ich mei shpchp edac_core dm_multipath dm_mod nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul
[21213.137165] sysimgblt crct10dif_common fb_sys_fops crc32c_intel ttm ahci drm libahci libata i2c_core tg3 megaraid_sas ptp pps_core fjes 8021q garp mrp bridge stp llc bonding
[21213.153283] CPU: 10 PID: 186 Comm: ksmd Not tainted 3.10.0-514.10.2.el7.x86_64 #1
[21213.161632] Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.1.3 11/20/2013
[21213.169980] task: ffff88203fa2edd0 ti: ffff880ffde30000 task.ti: ffff880ffde30000
[21213.178327] RIP: 0010:[<ffffffff811d525c>] [<ffffffff811d525c>] remove_node_from_stable_tree+0x11c/0x120
[21213.189009] RSP: 0018:ffff880ffde33d20 EFLAGS: 00010282
[21213.194933] RAX: 0000000081a3b0f8 RBX: ffffea0000000180 RCX: 0000000000000001
[21213.202893] RDX: 0000000081a3b0f8 RSI: 0000000000000000 RDI: ffff8809c5710d08
[21213.210852] RBP: ffff880ffde33d30 R08: 0000000000000065 R09: 00000000ffffffff
[21213.218811] R10: ffff880687fd0000 R11: 0000000000000000 R12: ffff8809c5710d08
[21213.226770] R13: 0000000000000006 R14: ffff8809c5710d0b R15: ffff8809c5710d08
[21213.234729] FS: 0000000000000000(0000) GS:ffff880fff940000(0000) knlGS:0000000000000000
[21213.243755] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21213.250166] CR2: 00007fc6f9d48270 CR3: 00000000019ba000 CR4: 00000000001427e0
[21213.258125] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21213.266085] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[21213.274045] Stack:
[21213.276285] ffffea0000000180 ffffea0000000000 ffff880ffde33d70 ffffffff811d55a8
[21213.284573] 00ff8801697a0000 ffff8809c5710d08 ffff8809c5710d08 ffff880ffde33e28
[21213.292862] ffff881c606de0c0 ffff881fdabc2180 ffff880ffde33dc0 ffffffff811d5f2f
[21213.301150] Call Trace:
[21213.303880] [<ffffffff811d55a8>] get_ksm_page+0x98/0x120
[21213.309903] [<ffffffff811d5f2f>] __stable_node_chain+0x3f/0x250
[21213.316603] [<ffffffff811d6cce>] ksm_do_scan+0x46e/0x11e0
[21213.322721] [<ffffffff811d7acf>] ksm_scan_thread+0x8f/0x240
[21213.329037] [<ffffffff810b17d0>] ? wake_up_atomic_t+0x30/0x30
[21213.335543] [<ffffffff811d7a40>] ? ksm_do_scan+0x11e0/0x11e0
[21213.341956] [<ffffffff810b06ff>] kthread+0xcf/0xe0
[21213.347398] [<ffffffff810b0630>] ? kthread_create_on_node+0x140/0x140
[21213.354685] [<ffffffff81696a58>] ret_from_fork+0x58/0x90
[21213.360708] [<ffffffff810b0630>] ? kthread_create_on_node+0x140/0x140
[21213.367990] Code: 83 2d d8 b3 d7 00 01 49 89 44 24 08 66 b8 00 02 49 89 44 24 10 eb ac 0f 1f 84 00 00 00 00 00 49 8d 7c 24 18 e8 06 e0 15 00 eb 98 <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 4c 8b 67
[21213.389642] RIP [<ffffffff811d525c>] remove_node_from_stable_tree+0x11c/0x120
[21213.397709] RSP <ffff880ffde33d20>

~0028879

jekader (reporter)

KSM and THP state on the system:

# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
# cat /sys/kernel/mm/ksm/run
0
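
The bracketed word in the transparent_hugepage/enabled output marks the active mode. As an illustration only (the function name `selected_option` is hypothetical), a small C helper could extract it from a line read from that file:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed token from a line such as "[always] madvise never"
 * into out (NUL-terminated). Returns 0 on success, -1 if no complete
 * "[...]" token is present or if it does not fit in out.
 */
static int selected_option(const char *line, char *out, size_t outlen)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    size_t len;

    if (!close)
        return -1;
    len = (size_t)(close - open - 1);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

For the output shown above, the extracted mode is "always".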

~0028880

jekader (reporter)

I tried to troubleshoot the crash on this new kernel, but no debuginfo has been published for it, so I logged a separate issue about that: https://bugs.centos.org/view.php?id=12984

~0029069

daxkelson (reporter)

We are also running into this bug.

3.10.0-514.10.2.el7.x86_64

# journalctl -r -b -1 | head -n 2
-- Logs begin at Sun 2017-03-05 16:54:54 MST...
Apr 11 12:36:14 virthost.example.com kernel: kernel BUG at mm/ksm.c:611!

~0029270

aarcange (reporter)

Andrey Ryabinin identified a use-after-free in the replace path that is activated by merge_across_nodes = 0, which is consistent with this bug.

Note that this bug cannot trigger with merge_across_nodes = 1 (the default). For it to trigger, a tuned profile or the admin must have manually set merge_across_nodes to 0 in /sys/kernel/mm/ksm.
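
Whether a host runs with this non-default setting can be checked by reading the sysfs file. The following is a minimal sketch, not a supported tool: the helper names are hypothetical, and the only path it assumes is merge_across_nodes under the KSM directory named above (the file is present only on kernels built with NUMA support).

```c
#include <stdio.h>

/*
 * Read a single integer from a sysfs-style file.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 */
static int read_int_file(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: returns 1 if merge_across_nodes under ksm_dir
 * reads 0 (the setting needed for this bug), 0 if it reads any other
 * value, and -1 if the file cannot be read.
 */
static int ksm_merge_across_nodes_disabled(const char *ksm_dir)
{
    char path[512];
    long v;

    snprintf(path, sizeof(path), "%s/merge_across_nodes", ksm_dir);
    if (read_int_file(path, &v) != 0)
        return -1;
    return v == 0;
}
```

Calling `ksm_merge_across_nodes_disabled("/sys/kernel/mm/ksm")` on the affected hypervisor would show whether the precondition described here is met. The directory is a parameter so that the helper can also be exercised against a test directory.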

Thanks.

~0029272

aarcange (reporter)

Patch updated to also cover the chain_append: branch, which likewise only activates with merge_across_nodes set to 0, and posted to linux-mm for testing.

https://marc.info/?l=linux-mm&m=149461789322190&w=2
https://marc.info/?l=linux-mm&m=149461789622191&w=2

Issue History
Date Modified Username Field Change
2017-03-01 13:27 jekader New Issue
2017-03-03 12:58 post-factum Note Added: 0028751
2017-03-17 08:59 jekader Note Added: 0028878
2017-03-17 09:08 jekader Note Added: 0028879
2017-03-17 09:48 jekader Note Added: 0028880
2017-04-13 21:52 daxkelson Note Added: 0029069
2017-05-12 17:54 aarcange File Added: 0001-ksm-fix-use-after-free-with-merge_across_nodes-0.patch
2017-05-12 17:54 aarcange Note Added: 0029270
2017-05-12 19:47 aarcange File Added: 0001-ksm-fix-use-after-free-with-merge_across_nodes-0-2.patch
2017-05-12 19:47 aarcange Note Added: 0029272