View Issue Details

ID: 0013887
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2018-06-10 16:06
Reporter: rvichery
Priority: normal
Severity: crash
Reproducibility: random
Status: new
Resolution: open
Platform:
OS: CentOS
OS Version: 7.2.1511
Product Version: 7.2.1511
Target Version:
Fixed in Version:
Summary: 0013887: CentOS hypervisor hangs or crashes randomly
Description: I have an 8-node cluster of CentOS 7 KVM hypervisors, and since January I have been seeing random issues (crashes or freezes). Most of the time the server hangs and the only solution is to reboot it; after some time (more than a few days) it happens again. I don't always have a crash dump, because most of the time the system is completely unresponsive, but when a server hangs I can almost always collect some logs from our central syslog server. Some messages below are out of order, because each timestamp records when the central syslog server received the message, not when the kernel emitted it. Here is the syslog output from the last two freezes:

==============
SERVER fre109
==============

2017-09-20T20:30:10.115Z fre109 kernel ------------[ cut here ]------------
2017-09-20T20:30:10.115Z fre109 kernel WARNING: at net/sched/sch_generic.c:297 dev_watchdog+0x276/0x280()
2017-09-20T20:30:10.116Z fre109 kernel ffff881ffa6c4f40 0000000000000040 0000000000000000 ffff881fff803de0
2017-09-20T20:30:10.116Z fre109 kernel [<ffffffff81596466>] dev_watchdog+0x276/0x280
2017-09-20T20:30:10.116Z fre109 kernel [<ffffffff810859dc>] warn_slowpath_fmt+0x5c/0x80
2017-09-20T20:30:10.116Z fre109 kernel [<ffffffff81085940>] warn_slowpath_common+0x70/0xb0
2017-09-20T20:30:10.116Z fre109 kernel ffff881fff803d88 00000000e00bd743 ffff881fff803d40 ffffffff816862ac
2017-09-20T20:30:10.116Z fre109 kernel [<ffffffff81095a96>] call_timer_fn+0x36/0x110
2017-09-20T20:30:10.116Z fre109 kernel [<ffffffff815961f0>] ? dev_graft_qdisc+0x80/0x80
2017-09-20T20:30:10.116Z fre109 kernel CPU: 0 PID: 26235 Comm: qemu-kvm Not tainted 3.10.0-514.6.1.el7.x86_64 #1
2017-09-20T20:30:10.116Z fre109 kernel ffff881fff803d78 ffffffff81085940 0000000000000011 ffff881ffa6a0000
2017-09-20T20:30:10.116Z fre109 kernel "Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 2.0 12/17/2015"
2017-09-20T20:30:10.116Z fre109 kernel [<ffffffff81098787>] run_timer_softirq+0x237/0x340
2017-09-20T20:30:10.116Z fre109 kernel nfsd bridge stp llc auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic ast i2c_algo_bit crct10dif_pclmul drm_kms_helper crct10dif_common crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe ahci libahci mdio i2c_core libata ptp pps_core dca fjes dm_mirror dm_region_hash dm_log dm_mod
2017-09-20T20:30:10.116Z fre109 kernel NETDEV WATCHDOG: enp3s0f1 (ixgbe): transmit queue 17 timed out
2017-09-20T20:30:10.116Z fre109 kernel Call Trace:
2017-09-20T20:30:10.116Z fre109 kernel <IRQ> [<ffffffff816862ac>] dump_stack+0x19/0x1b
2017-09-20T20:30:10.116Z fre109 kernel [<ffffffff815961f0>] ? dev_graft_qdisc+0x80/0x80
2017-09-20T20:30:10.116Z fre109 kernel Modules linked in: isofs vport_vxlan vxlan ip6_udp_tunnel udp_tunnel vhost_net vhost macvtap macvlan tun ebtable_filter ebtables ip6table_filter ip6_tables rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache 8021q garp mrp sch_htb sch_ingress veth openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 xfs libcrc32c sr_mod cdrom bonding iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mxm_wmi pcspkr uas sb_edac usb_storage edac_core sg mei_me mei shpchp i2c_i801 lpc_ich ipmi_devintf ipmi_si wmi ipmi_msghandler acpi_power_meter acpi_pad br_netfilter
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff8169835c>] call_softirq+0x1c/0x30
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff8169751d>] apic_timer_interrupt+0x6d/0x80
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffffa2047d37>] loaded_vmcs_clear+0x27/0x30 [kvm_intel]
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff810ce46c>] ? dequeue_entity+0x11c/0x5d0
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffffa05dc3b9>] kvm_sched_in+0x39/0x40 [kvm]
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff8168b370>] __schedule+0x3b0/0x990
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffffa05dcfab>] kvm_vcpu_block+0x8b/0x2c0 [kvm]
2017-09-20T20:30:10.117Z fre109 kernel ---[ end trace dbb81aff1c6e8357 ]---
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffffa05e0a31>] kvm_vcpu_ioctl+0x2b1/0x640 [kvm]
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff81212025>] do_vfs_ioctl+0x2d5/0x4b0
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff8108f5b5>] irq_exit+0x115/0x120
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff81698fd5>] smp_apic_timer_interrupt+0x45/0x60
2017-09-20T20:30:10.117Z fre109 kernel <EOI> [<ffffffff810f93aa>] ? generic_exec_single+0xfa/0x1a0
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffffa06c749b>] ? vhost_work_queue+0x3b/0x70 [vhost]
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffffa204cf22>] vmx_vcpu_load+0x82/0x2d0 [kvm_intel]
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff8102d365>] do_softirq+0x65/0xa0
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff812aebae>] ? file_has_perm+0xae/0xc0
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff8108f21f>] __do_softirq+0xef/0x280
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffffa204af30>] ? copy_shadow_to_vmcs12+0x110/0x110 [kvm_intel]
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff810f94af>] smp_call_function_single+0x5f/0xa0
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffffa204af30>] ? copy_shadow_to_vmcs12+0x110/0x110 [kvm_intel]
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff810cdef1>] ? update_curr+0x71/0x190
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff8168b979>] schedule+0x29/0x70
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffffa06c7ee1>] ? vhost_poll_wakeup+0x21/0x30 [vhost]
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffffa05f5a87>] kvm_arch_vcpu_load+0x37/0x230 [kvm]
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff810bfa21>] finish_task_switch+0x81/0x180
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff810b1720>] ? wake_up_atomic_t+0x30/0x30
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffffa05fbaf7>] kvm_arch_vcpu_ioctl_run+0x187/0x450 [kvm]
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff816968c9>] system_call_fastpath+0x16/0x1b
2017-09-20T20:30:10.117Z fre109 kernel [<ffffffff812122a1>] SyS_ioctl+0xa1/0xc0
2017-09-20T20:30:10.119Z fre109 kernel ixgbe 0000:03:00.1 enp3s0f1: initiating reset due to tx timeout
2017-09-20T20:30:10.119Z fre109 kernel ixgbe 0000:03:00.1 enp3s0f1: Reset adapter
2017-09-20T20:30:15.123Z fre109 kernel ixgbe 0000:03:00.1 enp3s0f1: initiating reset due to tx timeout
2017-09-20T20:30:20.115Z fre109 kernel ixgbe 0000:03:00.1 enp3s0f1: initiating reset due to tx timeout
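The fre109 excerpt ends with the ixgbe driver resetting enp3s0f1 roughly every five seconds after the NETDEV WATCHDOG fired on transmit queue 17. To look for the same pattern across the whole cluster in a larger syslog export, a small filter can count these events per host and interface. This is a minimal Python sketch, assuming the `<receive timestamp> <host> kernel <message>` layout seen above; the function name `summarize_tx_timeouts` and the regular expressions are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Assumed central-syslog line layout, as seen in this report:
# "<ISO-8601 receive time> <host> kernel <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+) (?P<host>\S+) kernel (?P<msg>.*)$")

# Message shapes copied from the logs in this report.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>\S+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)
RESET_RE = re.compile(r"ixgbe \S+ (?P<iface>\S+): initiating reset due to tx timeout")


def summarize_tx_timeouts(lines):
    """Count watchdog warnings and tx-timeout resets per (host, interface)."""
    watchdogs = Counter()
    resets = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not kernel syslog entries
        host, msg = m.group("host"), m.group("msg")
        w = WATCHDOG_RE.search(msg)
        if w:
            watchdogs[(host, w.group("iface"))] += 1
        r = RESET_RE.search(msg)
        if r:
            resets[(host, r.group("iface"))] += 1
    return watchdogs, resets


sample = [
    "2017-09-20T20:30:10.116Z fre109 kernel NETDEV WATCHDOG: enp3s0f1 (ixgbe): transmit queue 17 timed out",
    "2017-09-20T20:30:10.119Z fre109 kernel ixgbe 0000:03:00.1 enp3s0f1: initiating reset due to tx timeout",
    "2017-09-20T20:30:10.119Z fre109 kernel ixgbe 0000:03:00.1 enp3s0f1: Reset adapter",
    "2017-09-20T20:30:15.123Z fre109 kernel ixgbe 0000:03:00.1 enp3s0f1: initiating reset due to tx timeout",
    "2017-09-20T20:30:20.115Z fre109 kernel ixgbe 0000:03:00.1 enp3s0f1: initiating reset due to tx timeout",
]
watchdogs, resets = summarize_tx_timeouts(sample)
print(dict(resets))  # {('fre109', 'enp3s0f1'): 3}
```

A host that keeps accumulating resets on the same interface after the watchdog fires would be a candidate for the hang pattern described above.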

==============
SERVER fre111
==============

2017-09-20T20:22:30.720Z fre111 kernel net_ratelimit: 43 callbacks suppressed
2017-09-20T20:25:48.700Z fre111 kernel WARNING: at net/sched/sch_generic.c:297 dev_watchdog+0x276/0x280()
2017-09-20T20:25:48.700Z fre111 kernel Call Trace:
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff810859dc>] warn_slowpath_fmt+0x5c/0x80
2017-09-20T20:25:48.700Z fre111 kernel ffff881fff803d88 00000000ab7ad7fc ffff881fff803d40 ffffffff816862ac
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff81098787>] run_timer_softirq+0x237/0x340
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff816974b0>] ? uv_bau_message_intr1+0x80/0x80
2017-09-20T20:25:48.701Z fre111 kernel [<ffffffff810f7bc2>] ? do_futex+0x122/0x5b0
2017-09-20T20:25:48.701Z fre111 kernel [<ffffffff812aebae>] ? file_has_perm+0xae/0xc0
2017-09-20T20:25:48.701Z fre111 kernel [<ffffffff812122a1>] SyS_ioctl+0xa1/0xc0
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff8102d365>] do_softirq+0x65/0xa0
2017-09-20T20:25:48.700Z fre111 kernel ffff881ffaafcf40 0000000000000040 0000000000000000 ffff881fff803de0
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff81596466>] dev_watchdog+0x276/0x280
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff815961f0>] ? dev_graft_qdisc+0x80/0x80
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff8108f21f>] __do_softirq+0xef/0x280
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff81698fd5>] smp_apic_timer_interrupt+0x45/0x60
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff8169835c>] call_softirq+0x1c/0x30
2017-09-20T20:25:48.701Z fre111 kernel [<ffffffff81212025>] do_vfs_ioctl+0x2d5/0x4b0
2017-09-20T20:25:48.701Z fre111 kernel ixgbe 0000:03:00.0 enp3s0f0: initiating reset due to tx timeout
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff8108f5b5>] irq_exit+0x115/0x120
2017-09-20T20:25:48.700Z fre111 kernel ------------[ cut here ]------------
2017-09-20T20:25:48.700Z fre111 kernel ffff881fff803d78 ffffffff81085940 000000000000000f ffff881ffab00000
2017-09-20T20:25:48.700Z fre111 kernel <IRQ> [<ffffffff816862ac>] dump_stack+0x19/0x1b
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff81085940>] warn_slowpath_common+0x70/0xb0
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff815961f0>] ? dev_graft_qdisc+0x80/0x80
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff81095a96>] call_timer_fn+0x36/0x110
2017-09-20T20:25:48.700Z fre111 kernel <EOI> [<ffffffffa1d31390>] ? vmx_invpcid_supported+0x20/0x20 [kvm_intel]
2017-09-20T20:25:48.701Z fre111 kernel [<ffffffffa1d3165c>] ? vmx_handle_external_intr+0x6c/0x70 [kvm_intel]
2017-09-20T20:25:48.701Z fre111 kernel [<ffffffffa066ea3d>] kvm_arch_vcpu_ioctl_run+0xcd/0x450 [kvm]
2017-09-20T20:17:44.963Z fre111 kernel net_ratelimit: 267 callbacks suppressed
2017-09-20T20:25:48.700Z fre111 kernel CPU: 0 PID: 38931 Comm: qemu-kvm Not tainted 3.10.0-514.6.1.el7.x86_64 #1
2017-09-20T20:25:48.700Z fre111 kernel "Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 2.0 12/17/2015"
2017-09-20T20:25:48.700Z fre111 kernel [<ffffffff8169751d>] apic_timer_interrupt+0x6d/0x80
2017-09-20T20:25:48.701Z fre111 kernel [<ffffffffa068e255>] ? kvm_apic_local_deliver+0x65/0x70 [kvm]
2017-09-20T20:25:48.701Z fre111 kernel [<ffffffffa06668d7>] vcpu_enter_guest+0x337/0x1100 [kvm]
2017-09-20T20:25:48.701Z fre111 kernel [<ffffffffa0653a31>] kvm_vcpu_ioctl+0x2b1/0x640 [kvm]
2017-09-20T20:25:48.701Z fre111 kernel [<ffffffff816968c9>] system_call_fastpath+0x16/0x1b
2017-09-20T20:25:48.701Z fre111 kernel ---[ end trace f8c4d7e9dc7168b7 ]---
2017-09-20T20:17:15.608Z fre111 kernel net_ratelimit: 820 callbacks suppressed
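Because the central syslog stamps each message with its receive time, lines from different hosts and from before the trace (such as the net_ratelimit messages above) can appear interleaved. A minimal Python sketch, assuming the same `<receive timestamp> <host> kernel <message>` layout, groups lines by host and orders them by receive time. It cannot restore the kernel's emission order for lines that share an identical receive timestamp (most of each trace); since Python's `sorted()` is stable, those lines simply keep their input order. `regroup` is a hypothetical helper name.

```python
from datetime import datetime

# Receive timestamps in this report look like "2017-09-20T20:25:48.700Z" (UTC).
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def regroup(lines):
    """Group lines by host, then order by receive time; ties keep input order."""
    parsed = []
    for line in lines:
        ts, host, _ = line.split(" ", 2)
        parsed.append((host, datetime.strptime(ts, TS_FORMAT), line))
    # sorted() is stable: lines with the same host and timestamp stay in input order
    return [line for _, _, line in sorted(parsed, key=lambda p: (p[0], p[1]))]


sample = [
    "2017-09-20T20:22:30.720Z fre111 kernel net_ratelimit: 43 callbacks suppressed",
    "2017-09-20T20:25:48.700Z fre111 kernel WARNING: at net/sched/sch_generic.c:297 dev_watchdog+0x276/0x280()",
    "2017-09-20T20:17:44.963Z fre111 kernel net_ratelimit: 267 callbacks suppressed",
    "2017-09-20T20:17:15.608Z fre111 kernel net_ratelimit: 820 callbacks suppressed",
]
for line in regroup(sample):
    print(line)
```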
Tags: No tags attached.
abrt_hash:
URL:

Activities

toracat (manager) ~0032051
2018-06-10 16:06

Does this still happen with the current kernel (3.10.0-862.3.2.el7)?
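To check whether a trace came from an older kernel than the one suggested in the note, the release string in the `CPU: ... Not tainted` header line can be compared numerically. This Python sketch assumes el7 releases of the form `3.10.0-<build>.el7[.arch]`; it is a simplification, not a full implementation of RPM's version-comparison rules, and `release_key` is a hypothetical helper.

```python
import re

# Matches an el7 kernel release such as "3.10.0-514.6.1.el7.x86_64",
# e.g. inside "CPU: 0 PID: 26235 Comm: qemu-kvm Not tainted 3.10.0-514.6.1.el7.x86_64 #1"
RELEASE_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-([\d.]+)\.el7")


def release_key(text):
    """Turn '3.10.0-514.6.1.el7[.arch]' into a tuple that compares numerically."""
    m = RELEASE_RE.search(text)
    if m is None:
        raise ValueError(f"unrecognized el7 kernel release: {text!r}")
    upstream = tuple(int(g) for g in m.group(1, 2, 3))
    build = tuple(int(p) for p in m.group(4).split("."))
    return upstream + build


trace_line = "CPU: 0 PID: 26235 Comm: qemu-kvm Not tainted 3.10.0-514.6.1.el7.x86_64 #1"
running = release_key(trace_line)
suggested = release_key("3.10.0-862.3.2.el7")
print(running < suggested)  # True: the trace came from an older kernel
```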

Issue History

Date Modified     Username  Field                Change
2017-09-21 20:51  rvichery  New Issue
2018-06-10 16:06  toracat   Note Added: 0032051