View Issue Details

ID: 0009538
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2015-12-21 18:43
Reporter: RDGarner
Priority: high
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Product Version: 7.1-1503
Target Version:
Fixed in Version:
Summary: 0009538: CentOS 7 kernel panics under 2012 R2 Hyper-V live migration
Description: I believe I have identified an issue with the current C7 kernel when running under 2012 R2 Hyper-V in a Hyper-V cluster. Every time the VM is live-migrated between nodes, the kernel panics with a NULL pointer dereference; please see the log provided below.

Interestingly, this issue is resolved if the Microsoft-provided LIS4 release and its kmod-microsoft-hyper-v-4.0.11-20150728.x86_64.rpm package are installed, suggesting the issue resides in the C7-shipped hv_ modules.

I have a number of partial kdumps available and test machines available to assist with debugging this issue further.
Steps To Reproduce:
1. Install a fresh C7 VM under a Hyper-V 2012 R2 cluster.
2. Run yum update to bring it current, then reboot.
3. Live-migrate the VM between cluster nodes. The VM will panic each time.
Additional Information:
<snip>
[ 1013.811099] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 1013.811879] IP: [<ffffffff815f247e>] klist_put+0xe/0xa0
[ 1013.811879] PGD f792c067 PUD f75a6067 PMD 0
[ 1013.811879] Oops: 0000 [#1] SMP
[ 1013.811879] Modules linked in: nf_log_ipv4 nf_log_common nf_nat_ftp xt_REDIRECT xt_conntrack iptable_mangle nf_conntrack_ftp ipt_REJECT xt_LOG xt_limit iptable_filter xt_multiport iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables vfat fat crct10dif_pclmul crc32_pclmul crc32c_intel serio_raw ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper hyperv_fb ablk_helper cryptd hyperv_keyboard pcspkr hv_utils loop xfs libcrc32c sd_mod crc_t10dif crct10dif_common sr_mod cdrom hv_netvsc hv_storvsc hid_hyperv hv_vmbus dm_mirror dm_region_hash dm_log dm_mod
[ 1013.815507] CPU: 1 PID: 170 Comm: kworker/u128:1 Tainted: G W -------------- 3.10.0-229.14.1.el7.x86_64 #1
[ 1013.815507] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
[ 1013.815507] Workqueue: events_unbound async_run_entry_fn
[ 1013.815507] task: ffff880003672d80 ti: ffff8800036a0000 task.ti: ffff8800036a0000
[ 1013.815507] RIP: 0010:[<ffffffff815f247e>] [<ffffffff815f247e>] klist_put+0xe/0xa0
[ 1013.815507] RSP: 0018:ffff8800036a3d38 EFLAGS: 00010286
[ 1013.815507] RAX: 0000000000000000 RBX: ffff8801008d5000 RCX: 000000018024001a
[ 1013.815507] RDX: 000000018024001b RSI: 0000000000000001 RDI: 0000000000000028
[ 1013.815507] RBP: ffff8800036a3d58 R08: ffff88003f4cf9a0 R09: 000000018024001a
[ 1013.815507] R10: ffffffff8123dd28 R11: ffffea0000fd33c0 R12: ffff8801008d5148
[ 1013.815507] R13: ffff88004dc7e540 R14: 0000000000000001 R15: 0000000000001000
[ 1013.815507] FS: 0000000000000000(0000) GS:ffff880102e20000(0000) knlGS:0000000000000000
[ 1013.815507] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1013.815507] CR2: 0000000000000028 CR3: 00000000f7456000 CR4: 00000000001406e0
[ 1013.815507] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1013.815507] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1013.815507] Stack:
[ 1013.815507] ffff8801008d5000 ffff8801008d5148 ffff88004dc7e540 ffff88003f5c2028
[ 1013.815507] ffff8800036a3d68 ffffffff815f251e ffff8800036a3d98 ffffffff813ce848
[ 1013.815507] ffff8801008d5000 ffff8801008d5148 ffff88004dc7e540 ffff8801008d4860
[ 1013.815507] Call Trace:
[ 1013.815507] [<ffffffff815f251e>] klist_del+0xe/0x10
[ 1013.815507] [<ffffffff813ce848>] device_del+0x58/0x1f0
[ 1013.815507] [<ffffffff813feb85>] __scsi_remove_device+0xc5/0xd0
[ 1013.815507] [<ffffffff813fcee7>] do_scan_async+0x87/0x150
[ 1013.815507] [<ffffffff8109e849>] async_run_entry_fn+0x39/0x120
[ 1013.815507] [<ffffffff8108f0cb>] process_one_work+0x17b/0x470
[ 1013.815507] [<ffffffff8108fe9b>] worker_thread+0x11b/0x400
[ 1013.815507] [<ffffffff8108fd80>] ? rescuer_thread+0x400/0x400
[ 1013.815507] [<ffffffff8109727f>] kthread+0xcf/0xe0
[ 1013.815507] [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
[ 1013.815507] [<ffffffff816142d8>] ret_from_fork+0x58/0x90
[ 1013.815507] [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
[ 1013.815507] Code: c1 42 18 83 c0 01 83 f8 01 7e 01 c3 55 48 89 e5 e8 cf 50 01 00 5d c3 66 0f 1f 44 00 00 55 48 89 e5 41 56 41 89 f6 41 55 41 54 53 <4c> 8b 27 48 89 fb 49 83 e4 fe 4c 89 e7 4d 8b 6c 24 20 e8 5b 92
[ 1013.815507] RIP [<ffffffff815f247e>] klist_put+0xe/0xa0
[ 1013.815507] RSP <ffff8800036a3d38>
[ 1013.815507] CR2: 0000000000000028
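
For triage across several captures, the key fields of an oops like the one above can be pulled out mechanically. The following is a minimal sketch, not part of the original report: the function name `summarize_oops` and the trimmed `OOPS_SAMPLE` are illustrative assumptions, and the regular expressions only target the 3.10-era line layout shown in this log. It extracts the faulting address, the symbol at the instruction pointer, and the reliable call-trace frames (entries marked `?` are skipped).

```python
import re

# Trimmed excerpt of the log above, kept short for illustration.
OOPS_SAMPLE = """\
[ 1013.811099] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 1013.811879] IP: [<ffffffff815f247e>] klist_put+0xe/0xa0
[ 1013.815507] Call Trace:
[ 1013.815507] [<ffffffff815f251e>] klist_del+0xe/0x10
[ 1013.815507] [<ffffffff813ce848>] device_del+0x58/0x1f0
[ 1013.815507] [<ffffffff813feb85>] __scsi_remove_device+0xc5/0xd0
[ 1013.815507] [<ffffffff813fcee7>] do_scan_async+0x87/0x150
[ 1013.815507] [<ffffffff8108fd80>] ? rescuer_thread+0x400/0x400
[ 1013.815507] Code: c1 42 18
"""

FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
# Matches "[<addr>] symbol+0xoff/0xsize", with an optional "? " marker.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(text):
    """Return the fault address, IP symbol, and reliable trace frames."""
    fault_addr, ip_symbol, trace = None, None, []
    in_trace = False
    for line in text.splitlines():
        fault = FAULT_RE.search(line)
        if fault:
            fault_addr = int(fault.group(1), 16)
            continue
        frame = FRAME_RE.search(line)
        if "] IP:" in line and frame:
            ip_symbol = frame.group(2)
        elif "Call Trace:" in line:
            in_trace = True
        elif in_trace:
            if frame is None:
                in_trace = False          # first non-frame line ends the trace
            elif not frame.group(1):
                trace.append(frame.group(2))  # skip unreliable "?" frames
    return {"fault_addr": fault_addr, "ip": ip_symbol, "trace": trace}


summary = summarize_oops(OOPS_SAMPLE)
print(hex(summary["fault_addr"]), summary["ip"], " <- ".join(summary["trace"]))
```

In the full log the faulting address (CR2) matches RDI, which on x86-64 carries the first function argument, so the value passed into klist_put was itself the small offset 0x28 rather than a valid pointer.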
Tags: No tags attached.

Relationships

related to 0009477 | resolved | Issue Tracker | Kernel panic on SCSI device hot add/removal and VM suspend on Hyper-V

Activities

RDGarner

2015-09-30 13:27

reporter   ~0024489

Further testing suggests that this is a regression in 3.10.0-229.7.2 onward:

Installed a fresh VM with vanilla 7.1 from release media (3.10.0-229.el7.x86_64), live migrated five times to test:
1: stable
2: stable
3: stable
4: stable
5: stable

Installed 3.10.0-229.1.2. Rebooted, live migrated five times to test:
1: stable
2: stable
3: stable
4: stable
5: stable

Installed 3.10.0-229.4.2. Rebooted, live migrated five times to test:
1: stable
2: stable
3: stable
4: stable
5: stable

Installed 3.10.0-229.7.2. Rebooted, live migrated five times to test:
1: console reports Buffer I/O error on dm0 and CPU#1 stuck for 22s. Console locks up. Manually reset.
2: panic, NULL pointer dereference at 0000000000000028. Resets automatically.
3: console reports Buffer I/O error on dm0 and CPU#1 stuck for 22s. XFS dismounts FS owing to I/O error on /dev/sda. Console locks up. Manually reset.
4: panic, NULL pointer dereference at 0000000000000028. Resets automatically.
5: panic, NULL pointer dereference at 0000000000000028. Resets automatically.

Installed 3.10.0-229.11.1. Rebooted, live migrated five times to test:
1: panic, NULL pointer dereference at 0000000000000028. Resets automatically.
2: panic, NULL pointer dereference at 0000000000000028. Resets automatically.
3: XFS metadata I/O error xfs_trans_read_buf_map error 5 numblks 1, soft lockup - CPU3 stuck for 23s. Console locks up. Manually reset.
4: XFS metadata I/O error xfs_trans_read_buf_map error 5 numblks 1, soft lockup - CPU0 stuck for 23s. Console locks up. Manually reset.
5: panic, NULL pointer dereference at 0000000000000028. Resets automatically.

3.10.0-229.7.2 was released under RHSA-2015-1137 (https://rhn.redhat.com/errata/RHSA-2015-1137.html). None of the listed security fixes appear to be related, and the CentOS changelog for the same release lists only "Debranding changes". I suspect this regression exists upstream, as I can also replicate it with the latest OpenVZ 7 kernel, which is based on 3.10.0-229.7.2.
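
The test matrix above can be reduced to a "last good / first bad" pair programmatically. This is a rough sketch under stated assumptions: the names `RESULTS`, `release_key`, and `find_regression` are made up for illustration, and the outcomes are simplified to "stable", "panic", or "lockup" from the notes above. The one real subtlety it shows is that release strings must be compared numerically, since a plain string sort would place 229.11.1 before 229.4.2.

```python
import re

# Outcomes of the five live migrations per kernel, simplified from the
# results reported above. Deliberately listed out of order.
RESULTS = {
    "3.10.0-229.11.1": ["panic", "panic", "lockup", "lockup", "panic"],
    "3.10.0-229": ["stable"] * 5,
    "3.10.0-229.7.2": ["lockup", "panic", "lockup", "panic", "panic"],
    "3.10.0-229.1.2": ["stable"] * 5,
    "3.10.0-229.4.2": ["stable"] * 5,
}


def release_key(release):
    """Turn '3.10.0-229.7.2' into (3, 10, 0, 229, 7, 2) for numeric ordering."""
    return tuple(int(part) for part in re.split(r"[.-]", release))


def find_regression(results):
    """Return (last_good, first_bad) when walking releases oldest to newest."""
    last_good, first_bad = None, None
    for release in sorted(results, key=release_key):
        if all(outcome == "stable" for outcome in results[release]):
            if first_bad is None:
                last_good = release
        elif first_bad is None:
            first_bad = release
    return last_good, first_bad


last_good, first_bad = find_regression(RESULTS)
print("last good:", last_good)
print("first bad:", first_bad)
```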
tigalch

2015-09-30 13:31

manager   ~0024490

Thanks for your investigation. Please post these findings upstream at Red Hat's Bugzilla (which can be done without a service contract). Could you then also cross-reference the Bugzilla entries? Once it gets fixed in the upstream kernel, CentOS will inherit the fix.
Thanks in advance.
RDGarner

2015-09-30 13:44

reporter   ~0024491

Thanks tigalch, this has been raised:

https://bugzilla.redhat.com/show_bug.cgi?id=1267591
tigalch

2015-09-30 13:58

manager   ~0024494

Thanks. As kernel bugs are - as usual - marked private by RH, could you please report back on changes from time to time?
RDGarner

2015-09-30 14:29

reporter   ~0024496

My bug (RHBZ1267591) has been closed as a duplicate of the pre-existing RHBZ1242390.
tigalch

2015-09-30 14:51

manager   ~0024497

Can you access 1242390? If so, please still post some progress, if you find the time.
RDGarner

2015-09-30 15:18

reporter   ~0024498

Sadly not, but I have had confirmation that 1242390 has also been resolved and rolled into 7.2. The dev I have been discussing this with has said he expects a 7.1 point release at some point, but there is no guarantee if or when this will happen.
tigalch

2015-09-30 15:19

manager   ~0024499

7.2 is currently in beta phase (only for RH customers). So maybe this fix will not take too long to be released.
RDGarner

2015-11-04 12:26

reporter   ~0024778

For anyone interested, I have just tested with kernel 3.10.0-229.20.1, and this problem remains unfixed upstream.

I will repeat testing and update this bug should any further pre-7.2 errata kernels be released.
toracat

2015-11-04 18:41

manager   ~0024781

@RDGarner

I presume 3.10.0-229.20.1 is the last one before 7.2. Could you confirm that the upstream BZ dealing with this issue is 1242390?
bhavikb

2015-11-17 16:04

reporter   ~0024874

Hi,

Confirming the issue with kernel: 3.10.0-229.14.1.el7.x86_64
CentOS Linux release 7.1.1503 (Core)

Hyper-V servers running 2012-R2, with Failover Clustering.
RDGarner

2015-12-09 10:46

reporter   ~0025014

I have had direct contact from TUV and confirmation that this is fixed in the release 7.2 kernel, and also in 3.10.0-229.24.1 for 7.1. I can't find errata for the latter, but I guess it must be shipping soon.
tigalch

2015-12-09 10:48

manager   ~0025015

The kernel for 7.2 (1511) is already in the CR repo; you could give that a try. As for 229.24.1, that now sounds like a z-stream release, which CentOS does not reproduce.
tigalch

2015-12-17 09:57

manager   ~0025110

How is your issue since the release of 7 (1511) and the new kernel?
jrcresawn

2015-12-21 18:43

reporter   ~0025144

The 7.2.1511 release of CentOS has resolved issue #9538 for me.

Issue History

Date Modified     Username   Field                Change
2015-09-30 09:42  RDGarner   New Issue
2015-09-30 13:27  RDGarner   Note Added: 0024489
2015-09-30 13:31  tigalch    Note Added: 0024490
2015-09-30 13:44  RDGarner   Note Added: 0024491
2015-09-30 13:58  tigalch    Note Added: 0024494
2015-09-30 14:29  RDGarner   Note Added: 0024496
2015-09-30 14:51  tigalch    Note Added: 0024497
2015-09-30 15:18  RDGarner   Note Added: 0024498
2015-09-30 15:19  tigalch    Note Added: 0024499
2015-11-04 12:26  RDGarner   Note Added: 0024778
2015-11-04 18:41  toracat    Note Added: 0024781
2015-11-17 16:04  bhavikb    Note Added: 0024874
2015-11-18 16:58  toracat    Relationship added   related to 0009477
2015-12-09 10:46  RDGarner   Note Added: 0025014
2015-12-09 10:48  tigalch    Note Added: 0025015
2015-12-17 09:57  tigalch    Note Added: 0025110
2015-12-21 18:43  jrcresawn  Note Added: 0025144