0017287 | CentOS-7 | kernel | public | 2020-04-24 23:41
Product Version: 7.7-1908
Summary: 0017287: Kernel OOPS in kvm_inject_apic_timer_irqs
Description: In a nested virtualization setup, the Level 1 host (Level 0 being the hypervisor) that runs the nested VMs crashes. The Level 1 kernel version is 3.10.0-1062.18.1.el7.x86_64. The Level 2 guest VMs run Arch Linux on the latest kernel and are started with default libvirtd settings. I cannot reproduce this crash on demand: it happens randomly, usually once every couple of weeks. The same setup worked fine on Intel hardware (Xeon-D); the crashes started only after migrating to AMD EPYC CPUs.

I've captured the vmcore and vmcore-dmesg using kdump (the kernel is tainted solely because of the ZFS modules). Please let me know if you need more information.

[1614204.264174] BUG: unable to handle kernel paging request at ffffffffbc43a830
[1614204.264414] IP: [<ffffffffbc43a830>] 0xffffffffbc43a830
[1614204.264414] PGD 20de14067 PUD 20de15063 PMD 0
[1614204.264414] Oops: 0010 [#1] SMP
[1614204.264414] Modules linked in: tcp_diag inet_diag tun zfs(POE) zunicode(POE) zlua(POE) zcommon(POE) znvpair(POE) zavl(POE) fuse icp(POE) spl(OE) dm_crypt drbg ansi_cprng dm_mod ebtable_filter ebtables devlink ip6table_filter ip6_tables xt_conntrack iptable_filter bridge xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 stp xt_multiport llc iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c sunrpc kvm_amd kvm ppdev vfat irqbypass fat crc32_pclmul ghash_clmulni_intel aesni_intel parport_pc parport lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev i2c_piix4 ip_tables ext4 mbcache jbd2 virtio_net ata_generic virtio_scsi virtio_blk pata_acpi ata_piix libata crct10dif_pclmul crct10dif_common crc32c_intel virtio_pci serio_raw virtio_ring virtio floppy
[1614204.264414] CPU: 5 PID: 3237 Comm: qemu-kvm Kdump: loaded Tainted: P OE ------------ 3.10.0-1062.18.1.el7.x86_64 #1
[1614204.264414] Hardware name: XXX
[1614204.264414] task: ffff8914540e41c0 ti: ffff89144de84000 task.ti: ffff89144de84000
[1614204.264414] RIP: 0010:[<ffffffffbc43a830>] [<ffffffffbc43a830>] 0xffffffffbc43a830
[1614204.264414] RSP: 0018:ffff89144de87d60 EFLAGS: 00010246
[1614204.264414] RAX: 0000000000000000 RBX: ffff8913d4d7b400 RCX: 0000000000000001
[1614204.264414] RDX: 00000000000000ec RSI: 0000000000000000 RDI: 0000000000000000
[1614204.264414] RBP: ffff89144de87d70 R08: 0000000000000000 R09: 0000000000000000
[1614204.264414] R10: 0000000000000001 R11: 0000000000000005 R12: ffff89144de87fd8
[1614204.264414] R13: ffff8914540e41c0 R14: ffff891389190048 R15: 0000000000000001
[1614204.264414] FS: 00007fe86c79d700(0000) GS:ffff8914bfd40000(0000) knlGS:0000000000000000
[1614204.264414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1614204.264414] CR2: ffffffffbc43a830 CR3: 000000014117a000 CR4: 00000000003407e0
[1614204.264414] Call Trace:
[1614204.264414] [<ffffffffc060798e>] ? __apic_accept_irq+0xe/0x3b0 [kvm]
[1614204.264414] [<ffffffffc0608ec5>] ? kvm_apic_local_deliver+0x65/0x70 [kvm]
[1614204.264414] [<ffffffffc06090f8>] kvm_inject_apic_timer_irqs+0x28/0x70 [kvm]
[1614204.264414] [<ffffffffc0605a03>] kvm_inject_pending_timer_irqs+0x13/0x30 [kvm]
[1614204.264414] [<ffffffffc05e7a98>] kvm_arch_vcpu_ioctl_run+0x348/0x480 [kvm]
[1614204.264414] [<ffffffffc05c8f71>] kvm_vcpu_ioctl+0x2c1/0x6e0 [kvm]
[1614204.264414] [<ffffffff87460940>] ? __pollwait+0xf0/0xf0
[1614204.264414] [<ffffffff8745fde0>] do_vfs_ioctl+0x3a0/0x5a0
[1614204.264414] [<ffffffff8744ab38>] ? vfs_write+0x168/0x1f0
[1614204.264414] [<ffffffff87460081>] SyS_ioctl+0xa1/0xc0
[1614204.264414] [<ffffffff8744b8d4>] ? SyS_write+0xa4/0xf0
[1614204.264414] [<ffffffff8798dede>] system_call_fastpath+0x25/0x2a
[1614204.264414] Code: Bad RIP value.
[1614204.264414] RIP [<ffffffffbc43a830>] 0xffffffffbc43a830
[1614204.264414] RSP <ffff89144de87d60>
[1614204.264414] CR2: ffffffffbc43a830
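For reference when reading the log above: the value after `Oops:` is the page-fault error code printed in hex, so `0010` is 0x10. The Python sketch below decodes it, assuming the standard x86 page-fault error-code bit layout described in the Intel SDM (bits 0 to 4: P, W/R, U/S, RSVD, I/D). For 0x10 only the I/D bit is set, meaning a supervisor-mode instruction fetch from a not-present page. Together with RIP being equal to CR2 and the `PMD 0` page-walk entry, this is consistent with the CPU jumping to an unmapped kernel address, rather than faulting on a data access. The helper name `decode_pf_error` is hypothetical, for illustration only.

```python
# Assumed x86 page-fault error-code bit layout (Intel SDM Vol. 3A).
# Each entry: (bit name, meaning when set, meaning when clear or None if
# a clear bit carries no useful information).
PF_BITS = {
    0: ("P", "protection violation", "page not present"),
    1: ("W/R", "write access", "read access"),
    2: ("U/S", "user-mode access", "supervisor-mode access"),
    3: ("RSVD", "reserved bit set in a paging entry", None),
    4: ("I/D", "instruction fetch", None),
}


def decode_pf_error(code: int) -> list[str]:
    """Return a readable description for each relevant error-code bit."""
    parts = []
    for bit, (name, if_set, if_clear) in sorted(PF_BITS.items()):
        if code & (1 << bit):
            parts.append(f"{name}: {if_set}")
        elif if_clear is not None:
            parts.append(f"{name}: {if_clear}")
    return parts


if __name__ == "__main__":
    # Values copied from the oops above.
    oops_code = int("0010", 16)
    rip = 0xFFFFFFFFBC43A830
    cr2 = 0xFFFFFFFFBC43A830
    for line in decode_pf_error(oops_code):
        print(line)
    # When CR2 (faulting address) equals RIP, the fault was on fetching
    # the instruction itself.
    print("faulting address equals RIP:", rip == cr2)
```

Running it prints `P: page not present`, `W/R: read access`, `U/S: supervisor-mode access`, `I/D: instruction fetch`, and `faulting address equals RIP: True`. This only interprets the log; it does not identify which code path produced the bad jump target.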
