2017-08-16 15:06 UTC

ID: 0010965
Project: CentOS-6
Category: qemu-kvm
View Status: public
Last Update: 2017-08-14 13:36
Reporter: thefretrunner
Priority: high
Severity: crash
Reproducibility: random
Status: new
Resolution: open
Product Version: 6.7
Target Version:
Fixed in Version:
Summary: 0010965: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1
Description: We see intermittent panics on two KVM hosts (hypervisors).
We keep patching the systems but still experience the same problem.

Any ideas?

Crash dump analysis shows:

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-573.26.1.el6.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2016-05-19-07:43:17/vmcore [PARTIAL DUMP]
        CPUS: 8
        DATE: Thu May 19 07:42:43 2016
      UPTIME: 8 days, 17:08:37
LOAD AVERAGE: 1.57, 0.63, 0.42
       TASKS: 344
    NODENAME: olb-master2
     RELEASE: 2.6.32-573.26.1.el6.x86_64
     VERSION: #1 SMP Wed May 4 00:57:44 UTC 2016
     MACHINE: x86_64 (2927 Mhz)
      MEMORY: 16 GB
       PANIC: "Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 1"
         PID: 8315
     COMMAND: "qemu-kvm"
        TASK: ffff880437590ab0 [THREAD_INFO: ffff88042e70c000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)
Tags: No tags attached.

Notes

~0026779

thefretrunner (reporter)

Some more information:


crash> bt
PID: 3143 TASK: ffff8804317d0ab0 CPU: 4 COMMAND: "qemu-kvm"
 #0 [ffff880038086b20] machine_kexec at ffffffff8103d1fb
 #1 [ffff880038086b80] crash_kexec at ffffffff810cc882
 #2 [ffff880038086c50] panic at ffffffff81538b7e
 #3 [ffff880038086cd0] watchdog_overflow_callback at ffffffff810ed83d
 #4 [ffff880038086cf0] __perf_event_overflow at ffffffff811243b7
 #5 [ffff880038086d70] perf_event_overflow at ffffffff81124a04
 #6 [ffff880038086d80] intel_pmu_handle_irq at ffffffff81024a52
 #7 [ffff880038086e90] perf_event_nmi_handler at ffffffff8153df09
 #8 [ffff880038086ea0] notifier_call_chain at ffffffff8153f9c5
 #9 [ffff880038086ee0] atomic_notifier_call_chain at ffffffff8153fa2a
#10 [ffff880038086ef0] notify_die at ffffffff810a7bfe
#11 [ffff880038086f20] do_nmi at ffffffff8153d683
#12 [ffff880038086f50] nmi at ffffffff8153cf43
    [exception RIP: vmx_vcpu_run+1870]
    RIP: ffffffffa01fa4de RSP: ffff880434343c28 RFLAGS: 00000046
    RAX: 0000000000000200 RBX: 0000000000000000 RCX: ffff880434348200
    RDX: 0000000000004402 RSI: 0000000000000000 RDI: ffff880434348200
    RBP: ffff880434343c88 R8: ffff88002830f700 R9: 7fffffffffffffff
    R10: 0000000000000002 R11: 0000000000000000 R12: 0000000080000202
    R13: 0000000000000001 R14: ffff88002830d180 R15: 0000000000000282
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#13 [ffff880434343c28] vmx_vcpu_run at ffffffffa01fa4de [kvm_intel]
#14 [ffff880434343c90] kvm_arch_vcpu_ioctl_run at ffffffffa01a09ca [kvm]
#15 [ffff880434343dc0] kvm_vcpu_ioctl at ffffffffa0187034 [kvm]
#16 [ffff880434343e60] vfs_ioctl at ffffffff811a7af2
#17 [ffff880434343ea0] do_vfs_ioctl at ffffffff811a7fba
#18 [ffff880434343f30] sys_ioctl at ffffffff811a8211
#19 [ffff880434343f80] system_call_fastpath at ffffffff8100b0d2
    RIP: 00007f4655b25907 RSP: 00007f464f1a4a68 RFLAGS: 00000246
    RAX: 0000000000000010 RBX: ffffffff8100b0d2 RCX: ffffffffffffffff
    RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000d
    RBP: 00007f4658f34000 R8: 00007f46595329c0 R9: 0000000000000c47
    R10: 0000000000000002 R11: 0000000000000246 R12: 00007f465ab765e0
    R13: 00007f465ab761a0 R14: 0000000000000000 R15: 00007f465aba8010
    ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
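
For anyone who wants to repeat this kind of analysis on their own dump, here is a minimal sketch. It assumes the crash utility and the kernel-debuginfo package matching the crashed kernel are installed. `debug_vmlinux` is a hypothetical helper name; it only rebuilds the path layout shown on the KERNEL: line of the summary above.

```shell
#!/bin/sh
# Hypothetical helper: build the debuginfo vmlinux path for a kernel release,
# following the layout shown on the KERNEL: line of the crash summary.
# With no argument it uses the release of the running kernel.
debug_vmlinux() {
    release="${1:-$(uname -r)}"
    echo "/usr/lib/debug/lib/modules/$release/vmlinux"
}

# Usage, not run automatically here (replace <dir> with your own dump directory):
#   crash "$(debug_vmlinux 2.6.32-573.26.1.el6.x86_64)" /var/crash/<dir>/vmcore
#   crash> bt        # backtrace of the panicking task
#   crash> bt -a     # backtraces of the active task on every CPU
#   crash> log       # kernel message buffer saved in the dump
```

In a hard lockup, `bt -a` is probably the more useful view, since it shows what the other CPUs were doing at the time of the panic.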

~0027779

SDGathman (reporter)

Also getting this on a Dell T310 since upgrading from CentOS-5 Xen to CentOS-6 KVM

~0027781

SDGathman (reporter)

kernel-2.6.32-642.6.1.el6.x86_64
CentOS 6.8

Looks like the same issue. I'll work on getting kdump configured, but it could take a while.
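
For reference, a rough sketch of the kdump checks on CentOS 6, assuming the stock kexec-tools package and its kdump init script. `has_crashkernel` is a hypothetical helper name; it only checks whether memory was reserved for the capture kernel on the boot command line.

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel command line reserves memory
# for the kdump capture kernel. With no argument it reads /proc/cmdline.
has_crashkernel() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" crashkernel="*) echo yes ;;
        *) echo no ;;
    esac
}

# Typical setup steps (as root), not run automatically here:
#   yum install kexec-tools
#   # add crashkernel=auto (or an explicit size) to the kernel line in
#   # /boot/grub/grub.conf, then reboot
#   chkconfig kdump on
#   service kdump start
#   cat /sys/kernel/kexec_crash_loaded    # 1 once the capture kernel is loaded
```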

~0027787

SDGathman (reporter)

I have two Dell T310 servers. One gets this panic, the other doesn't.

Stable server:

BIOS Information
        Vendor: Dell Inc.
        Version: 1.8.2
        Release Date: 08/17/2011

Base Board Information
        Manufacturer: Dell Inc.
        Product Name: 02P9X9
        Version: A05
        Serial Number: ..CN1374024D0030.

Panicky server:

BIOS Information
        Vendor: Dell Inc.
        Version: 1.4.1
        Release Date: 07/19/2010

Base Board Information
        Manufacturer: Dell Inc.
        Product Name: 02P9X9
        Version: A00
        Serial Number: ..CN1374009701JF.

I suppose a BIOS update might be worth a try.

~0028554

at0m1sk (reporter)

We have been having the same issue for years, on older kernels and newer ones alike.

DMI: Dell Inc. PowerEdge R210/05KX61, BIOS 1.3.4 05/24/2010

Watchdog detected hard LOCKUP on cpu 4
Modules linked in: ebtable_nat ebtables ipmi_devintf bridge stp llc ipt_REJECT xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun kvm_intel kvm power_meter acpi_ipmi ipmi_si ipmi_msghandler microcode iTCO_wdt iTCO_vendor_support dcdbas sg lpc_ich mfd_core bnx2 igb dca i2c_algo_bit i2c_core ptp pps_core ext4 jbd2 mbcache sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Pid: 2100, comm: qemu-kvm Not tainted 2.6.32-642.6.2.el6.x86_64 #1
Call Trace:
 <NMI> [<ffffffff810f2a21>] ? watchdog_overflow_callback+0xf1/0x110
 [<ffffffff8112b807>] ? __perf_event_overflow+0xa7/0x240
 [<ffffffff8101dee6>] ? x86_perf_event_set_period+0xf6/0x180
 [<ffffffff8112be64>] ? perf_event_overflow+0x14/0x20
 [<ffffffff810252bc>] ? intel_pmu_handle_irq+0x21c/0x480
 [<ffffffff8154d659>] ? perf_event_nmi_handler+0x39/0xb0
 [<ffffffff8154f155>] ? notifier_call_chain+0x55/0x80
 [<ffffffff8154f1ba>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff810acc8e>] ? notify_die+0x2e/0x30
 [<ffffffff8154cdd3>] ? do_nmi+0x1c3/0x350
 [<ffffffff8154c693>] ? nmi+0x83/0x90
 [<ffffffff8100a01e>] ? cpu_idle+0xee/0x110
 [<ffffffffa026949e>] ? vmx_vcpu_run+0x74e/0xb20 [kvm_intel]
 <<EOE>> [<ffffffffa020f8fa>] ? kvm_arch_vcpu_ioctl_run+0x40a/0x1060 [kvm]
 [<ffffffffa01f6034>] ? kvm_vcpu_ioctl+0x434/0x580 [kvm]
 [<ffffffff811b1946>] ? pollwake+0x56/0x60
 [<ffffffff8106c500>] ? default_wake_function+0x0/0x20
 [<ffffffff8105eb69>] ? __wake_up_common+0x59/0x90
 [<ffffffff811af562>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff811afa2a>] ? do_vfs_ioctl+0x3aa/0x580
 [<ffffffff811afc81>] ? sys_ioctl+0x81/0xa0
 [<ffffffff810ee25e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
Kernel panic - not syncing: Hard LOCKUP
Pid: 2100, comm: qemu-kvm Not tainted 2.6.32-642.6.2.el6.x86_64 #1
Call Trace:
 <NMI> [<ffffffff815482b1>] ? panic+0xa7/0x179
 [<ffffffff81011105>] ? show_trace+0x15/0x20
 [<ffffffff810f2a40>] ? watchdog_timer_fn+0x0/0x230
 [<ffffffff8112b807>] ? __perf_event_overflow+0xa7/0x240
 [<ffffffff8101dee6>] ? x86_perf_event_set_period+0xf6/0x180
 [<ffffffff8112be64>] ? perf_event_overflow+0x14/0x20
 [<ffffffff810252bc>] ? intel_pmu_handle_irq+0x21c/0x480
 [<ffffffff8154d659>] ? perf_event_nmi_handler+0x39/0xb0
 [<ffffffff8154f155>] ? notifier_call_chain+0x55/0x80
 [<ffffffff8154f1ba>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff810acc8e>] ? notify_die+0x2e/0x30
 [<ffffffff8154cdd3>] ? do_nmi+0x1c3/0x350
 [<ffffffff8154c693>] ? nmi+0x83/0x90
 [<ffffffff8100a01e>] ? cpu_idle+0xee/0x110
 [<ffffffffa026949e>] ? vmx_vcpu_run+0x74e/0xb20 [kvm_intel]
 <<EOE>> [<ffffffffa020f8fa>] ? kvm_arch_vcpu_ioctl_run+0x40a/0x1060 [kvm]
 [<ffffffffa01f6034>] ? kvm_vcpu_ioctl+0x434/0x580 [kvm]
 [<ffffffff811b1946>] ? pollwake+0x56/0x60
 [<ffffffff8106c500>] ? default_wake_function+0x0/0x20
 [<ffffffff8105eb69>] ? __wake_up_common+0x59/0x90
 [<ffffffff811af562>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff811afa2a>] ? do_vfs_ioctl+0x3aa/0x580
 [<ffffffff811afc81>] ? sys_ioctl+0x81/0xa0
 [<ffffffff810ee25e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b

~0028555

thefretrunner (reporter)

We actually managed to resolve this, or at least the systems have been running without any issues for more than 7 months now. We disabled the nmi_watchdog on both the hypervisor and the guests.

echo 0 > /proc/sys/kernel/nmi_watchdog

I found a report on redhat.com where they suggested this solution. Try it out and see if it helps!
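
A minimal sketch for checking the current setting and keeping it across reboots. `watchdog_state` is a hypothetical helper name, and the sysctl.conf line assumes the `kernel.nmi_watchdog` key that corresponds to the /proc path above. Keep in mind that this turns the hard lockup detector off rather than fixing whatever causes the lockup.

```shell
#!/bin/sh
# Hypothetical helper: report whether the NMI watchdog is on, reading the
# same file the workaround writes to. The path can be overridden for testing.
watchdog_state() {
    file="${1:-/proc/sys/kernel/nmi_watchdog}"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 0
    fi
    if [ "$(cat "$file")" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Usage (as root), not run automatically here:
#   echo 0 > /proc/sys/kernel/nmi_watchdog               # effective now, lost on reboot
#   echo 'kernel.nmi_watchdog = 0' >> /etc/sysctl.conf   # reapplied at boot
#   watchdog_state
```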

~0028812

at0m1sk (reporter)

Now that nmi_watchdog is disabled the issue might be clearer: the way I read this hang, there might be a disk issue. The system locked up for a while and then recovered, but did not crash:

INFO: task qemu-kvm:17309 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.6.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-kvm D 0000000000000000 0 17309 1 0x00000080
 ffff8802384c3c98 0000000000000082 0000000000000000 ffff8802384c3c5c
 ffff880200000000 ffff88023fc28400 00010418ad7f1728 ffff88002f696ec0
 0000000000000400 000000011106d2eb ffff880238bcd068 ffff8802384c3fd8
Call Trace:
 [<ffffffff8105f91f>] ? mutex_spin_on_owner+0x9f/0xc0
 [<ffffffff8154a376>] __mutex_lock_slowpath+0x96/0x210
 [<ffffffff81549e9b>] mutex_lock+0x2b/0x50
 [<ffffffff81130a91>] generic_file_aio_write+0x71/0x100
 [<ffffffffa008ee08>] ext4_file_write+0x58/0x190 [ext4]
 [<ffffffff8119996a>] do_sync_write+0xfa/0x140
 [<ffffffff81095f6e>] ? send_signal+0x3e/0x90
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81096376>] ? group_send_sig_info+0x56/0x70
 [<ffffffff81247ddb>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8123aa66>] ? security_file_permission+0x16/0x20
 [<ffffffff81199c68>] vfs_write+0xb8/0x1a0
 [<ffffffff8119a2da>] sys_pwrite64+0x7a/0x90
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task qemu-kvm:17476 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.6.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-kvm D 0000000000000000 0 17476 1 0x00000080
 ffff880115c4fc98 0000000000000082 0000000000000000 ffff880115c4fc5c
 ffff880100000000 ffff88023fc28400 00010418ad7815db ffff88002f696ec0
 0000000000000400 000000011106d2eb ffff880236b15ad8 ffff880115c4ffd8
Call Trace:
 [<ffffffff8154a376>] __mutex_lock_slowpath+0x96/0x210
 [<ffffffff81549e9b>] mutex_lock+0x2b/0x50
 [<ffffffff81130a91>] generic_file_aio_write+0x71/0x100
 [<ffffffffa008ee08>] ext4_file_write+0x58/0x190 [ext4]
 [<ffffffff8119996a>] do_sync_write+0xfa/0x140
 [<ffffffff81095f6e>] ? send_signal+0x3e/0x90
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81096376>] ? group_send_sig_info+0x56/0x70
 [<ffffffff81247ddb>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8123aa66>] ? security_file_permission+0x16/0x20
 [<ffffffff81199c68>] vfs_write+0xb8/0x1a0
 [<ffffffff8119a2da>] sys_pwrite64+0x7a/0x90
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task qemu-kvm:17642 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.6.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-kvm D 0000000000000005 0 17642 1 0x00000080
 ffff880115d0b918 0000000000000082 ffff880115d0b878 ffffffff81278b04
 ffff88023c3aa860 ffff88023c5fc300 ffff880115d0b8e8 ffffffffa0004f4f
 ffff880115d0b8d8 ffff880115d0b8d8 ffff880226dfe5f8 ffff880115d0bfd8
Call Trace:
 [<ffffffff81278b04>] ? blk_unplug+0x34/0x70
 [<ffffffffa0004f4f>] ? dm_table_unplug_all+0x5f/0x100 [dm_mod]
 [<ffffffff815491c3>] io_schedule+0x73/0xc0
 [<ffffffff811da9fd>] __blockdev_direct_IO_newtrunc+0xb7d/0x1270
 [<ffffffff810a6727>] ? bit_waitqueue+0x17/0xd0
 [<ffffffffa00961f0>] ? ext4_get_block_dio_write+0x0/0xd0 [ext4]
 [<ffffffff811db167>] __blockdev_direct_IO+0x77/0xe0
 [<ffffffffa00961f0>] ? ext4_get_block_dio_write+0x0/0xd0 [ext4]
 [<ffffffffa0092ce0>] ? ext4_end_io_dio+0x0/0xa0 [ext4]
 [<ffffffffa006b9da>] ? jbd2_journal_stop+0x17a/0x2c0 [jbd2]
 [<ffffffffa00953c9>] ext4_direct_IO+0x119/0x260 [ext4]
 [<ffffffffa00961f0>] ? ext4_get_block_dio_write+0x0/0xd0 [ext4]
 [<ffffffffa0092ce0>] ? ext4_end_io_dio+0x0/0xa0 [ext4]
 [<ffffffffa0094f5f>] ? ext4_dirty_inode+0x4f/0x60 [ext4]
 [<ffffffff8112f012>] generic_file_direct_write+0xc2/0x190
 [<ffffffff81130931>] __generic_file_aio_write+0x3a1/0x490
 [<ffffffff81130aa8>] generic_file_aio_write+0x88/0x100
 [<ffffffffa008ee08>] ext4_file_write+0x58/0x190 [ext4]
 [<ffffffff8119996a>] do_sync_write+0xfa/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81247ddb>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8123aa66>] ? security_file_permission+0x16/0x20
 [<ffffffff81199c68>] vfs_write+0xb8/0x1a0
 [<ffffffff8119a2da>] sys_pwrite64+0x7a/0x90
 [<ffffffff8100b386>] ? int_check_syscall_exit_work+0x34/0x3d
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task qemu-kvm:17643 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.6.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-kvm D 0000000000000006 0 17643 1 0x00000080
 ffff880226f83c98 0000000000000082 0000000000000000 ffff880226f83c5c
 ffff880200000000 ffff88023fc29000 00010418b84a8521 ffff88002f696ec0
 00000000000003f6 000000011106d351 ffff880237193ad8 ffff880226f83fd8
Call Trace:
 [<ffffffff8154a376>] __mutex_lock_slowpath+0x96/0x210
 [<ffffffff81549e9b>] mutex_lock+0x2b/0x50
 [<ffffffff81130a91>] generic_file_aio_write+0x71/0x100
 [<ffffffffa008ee08>] ext4_file_write+0x58/0x190 [ext4]
 [<ffffffff8119996a>] do_sync_write+0xfa/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81247ddb>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8123aa66>] ? security_file_permission+0x16/0x20
 [<ffffffff81199c68>] vfs_write+0xb8/0x1a0
 [<ffffffff8119a2da>] sys_pwrite64+0x7a/0x90
 [<ffffffff8100b386>] ? int_check_syscall_exit_work+0x34/0x3d
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task libvirtd:1902 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.6.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
libvirtd D 0000000000000001 0 1902 1 0x00000080
 ffff88023a6279f8 0000000000000082 ffff88023c3aa860 ffff88023c5fc300
 ffff88023a6279b8 ffffffffa0004f4f ffff88023a6279c8 ffff88023a627998
 ffff88023a6279b0 00000000381deab0 ffff8802381df068 ffff88023a627fd8
Call Trace:
 [<ffffffffa0004f4f>] ? dm_table_unplug_all+0x5f/0x100 [dm_mod]
 [<ffffffff8112e3f0>] ? sync_page+0x0/0x50
 [<ffffffff8112e3f0>] ? sync_page+0x0/0x50
 [<ffffffff815491c3>] io_schedule+0x73/0xc0
 [<ffffffff8112e42d>] sync_page+0x3d/0x50
 [<ffffffff81549caf>] __wait_on_bit+0x5f/0x90
 [<ffffffff8112e663>] wait_on_page_bit+0x73/0x80
 [<ffffffff810a6920>] ? wake_bit_function+0x0/0x50
 [<ffffffff811447e2>] ? pagevec_lookup+0x22/0x30
 [<ffffffff81146770>] truncate_inode_pages_range+0x320/0x500
 [<ffffffff81158582>] ? unmap_mapping_range+0x72/0x140
 [<ffffffff811469e5>] truncate_inode_pages+0x15/0x20
 [<ffffffff81146a3f>] truncate_pagecache+0x4f/0x70
 [<ffffffff81146aa5>] truncate_setsize+0x45/0x60
 [<ffffffff81146afe>] vmtruncate+0x3e/0x70
 [<ffffffff811b8db0>] inode_setattr+0x30/0x60
 [<ffffffffa0098fbc>] ext4_setattr+0x10c/0x330 [ext4]
 [<ffffffff811b9198>] notify_change+0x168/0x340
 [<ffffffff81197d44>] do_truncate+0x64/0xa0
 [<ffffffff811ad041>] do_filp_open+0x861/0xd20
 [<ffffffff8119f6a4>] ? cp_new_stat+0xe4/0x100
 [<ffffffff812a885a>] ? strncpy_from_user+0x4a/0x90
 [<ffffffff811ba072>] ? alloc_fd+0x92/0x160
 [<ffffffff811969f7>] do_sys_open+0x67/0x130
 [<ffffffff81196b00>] sys_open+0x20/0x30
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task monit:1919 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.6.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
monit D 0000000000000001 0 1919 1 0x00000080
 ffff88023a71f9f8 0000000000000086 0000000000000000 ffff88023a71f9bc
 ffff880200000000 ffff88023fc28600 000104251964dd47 ffff88002f696ec0
 00000000000002ff 000000011107a2f9 ffff880236269068 ffff88023a71ffd8
Call Trace:
 [<ffffffff8112e3f0>] ? sync_page+0x0/0x50
 [<ffffffff8112e3f0>] ? sync_page+0x0/0x50
 [<ffffffff815491c3>] io_schedule+0x73/0xc0
 [<ffffffff8112e42d>] sync_page+0x3d/0x50
 [<ffffffff81549caf>] __wait_on_bit+0x5f/0x90
 [<ffffffff8112e663>] wait_on_page_bit+0x73/0x80
 [<ffffffff810a6920>] ? wake_bit_function+0x0/0x50
 [<ffffffff811447e2>] ? pagevec_lookup+0x22/0x30
 [<ffffffff81146770>] truncate_inode_pages_range+0x320/0x500
 [<ffffffff81158582>] ? unmap_mapping_range+0x72/0x140
 [<ffffffff811469e5>] truncate_inode_pages+0x15/0x20
 [<ffffffff81146a3f>] truncate_pagecache+0x4f/0x70
 [<ffffffff81146aa5>] truncate_setsize+0x45/0x60
 [<ffffffff81146afe>] vmtruncate+0x3e/0x70
 [<ffffffff811b8db0>] inode_setattr+0x30/0x60
 [<ffffffffa0098fbc>] ext4_setattr+0x10c/0x330 [ext4]
 [<ffffffff811b9198>] notify_change+0x168/0x340
 [<ffffffff81197d44>] do_truncate+0x64/0xa0
 [<ffffffff811ad041>] do_filp_open+0x861/0xd20
 [<ffffffff8119f6a4>] ? cp_new_stat+0xe4/0x100
 [<ffffffff812a885a>] ? strncpy_from_user+0x4a/0x90
 [<ffffffff811ba072>] ? alloc_fd+0x92/0x160
 [<ffffffff811969f7>] do_sys_open+0x67/0x130
 [<ffffffff81196b00>] sys_open+0x20/0x30
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task qemu-kvm:17309 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.6.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-kvm D 0000000000000000 0 17309 1 0x00000080
 ffff8802384c3c98 0000000000000082 0000000000000000 ffff8802384c3c5c
 ffff880200000000 ffff88023fc28400 00010418ad7f1728 ffff88002f696ec0
 0000000000000400 000000011106d2eb ffff880238bcd068 ffff8802384c3fd8
Call Trace:
 [<ffffffff8105f91f>] ? mutex_spin_on_owner+0x9f/0xc0
 [<ffffffff8154a376>] __mutex_lock_slowpath+0x96/0x210
 [<ffffffff81549e9b>] mutex_lock+0x2b/0x50
 [<ffffffff81130a91>] generic_file_aio_write+0x71/0x100
 [<ffffffffa008ee08>] ext4_file_write+0x58/0x190 [ext4]
 [<ffffffff8119996a>] do_sync_write+0xfa/0x140
 [<ffffffff81095f6e>] ? send_signal+0x3e/0x90
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81096376>] ? group_send_sig_info+0x56/0x70
 [<ffffffff81247ddb>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8123aa66>] ? security_file_permission+0x16/0x20
 [<ffffffff81199c68>] vfs_write+0xb8/0x1a0
 [<ffffffff8119a2da>] sys_pwrite64+0x7a/0x90
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task qemu-kvm:17476 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.6.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-kvm D 0000000000000000 0 17476 1 0x00000080
 ffff880115c4fc98 0000000000000082 0000000000000000 ffff880115c4fc5c
 ffff880100000000 ffff88023fc28400 00010418ad7815db ffff88002f696ec0
 0000000000000400 000000011106d2eb ffff880236b15ad8 ffff880115c4ffd8
Call Trace:
 [<ffffffff8154a376>] __mutex_lock_slowpath+0x96/0x210
 [<ffffffff81549e9b>] mutex_lock+0x2b/0x50
 [<ffffffff81130a91>] generic_file_aio_write+0x71/0x100
 [<ffffffffa008ee08>] ext4_file_write+0x58/0x190 [ext4]
 [<ffffffff8119996a>] do_sync_write+0xfa/0x140
 [<ffffffff81095f6e>] ? send_signal+0x3e/0x90
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81096376>] ? group_send_sig_info+0x56/0x70
 [<ffffffff81247ddb>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8123aa66>] ? security_file_permission+0x16/0x20
 [<ffffffff81199c68>] vfs_write+0xb8/0x1a0
 [<ffffffff8119a2da>] sys_pwrite64+0x7a/0x90
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task qemu-kvm:17642 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.6.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-kvm D 0000000000000005 0 17642 1 0x00000080
 ffff880115d0b918 0000000000000082 ffff880115d0b878 ffffffff81278b04
 ffff88023c3aa860 ffff88023c5fc300 ffff880115d0b8e8 ffffffffa0004f4f
 ffff880115d0b8d8 ffff880115d0b8d8 ffff880226dfe5f8 ffff880115d0bfd8
Call Trace:
 [<ffffffff81278b04>] ? blk_unplug+0x34/0x70
 [<ffffffffa0004f4f>] ? dm_table_unplug_all+0x5f/0x100 [dm_mod]
 [<ffffffff815491c3>] io_schedule+0x73/0xc0
 [<ffffffff811da9fd>] __blockdev_direct_IO_newtrunc+0xb7d/0x1270
 [<ffffffff810a6727>] ? bit_waitqueue+0x17/0xd0
 [<ffffffffa00961f0>] ? ext4_get_block_dio_write+0x0/0xd0 [ext4]
 [<ffffffff811db167>] __blockdev_direct_IO+0x77/0xe0
 [<ffffffffa00961f0>] ? ext4_get_block_dio_write+0x0/0xd0 [ext4]
 [<ffffffffa0092ce0>] ? ext4_end_io_dio+0x0/0xa0 [ext4]
 [<ffffffffa006b9da>] ? jbd2_journal_stop+0x17a/0x2c0 [jbd2]
 [<ffffffffa00953c9>] ext4_direct_IO+0x119/0x260 [ext4]
 [<ffffffffa00961f0>] ? ext4_get_block_dio_write+0x0/0xd0 [ext4]
 [<ffffffffa0092ce0>] ? ext4_end_io_dio+0x0/0xa0 [ext4]
 [<ffffffffa0094f5f>] ? ext4_dirty_inode+0x4f/0x60 [ext4]
 [<ffffffff8112f012>] generic_file_direct_write+0xc2/0x190
 [<ffffffff81130931>] __generic_file_aio_write+0x3a1/0x490
 [<ffffffff81130aa8>] generic_file_aio_write+0x88/0x100
 [<ffffffffa008ee08>] ext4_file_write+0x58/0x190 [ext4]
 [<ffffffff8119996a>] do_sync_write+0xfa/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81247ddb>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8123aa66>] ? security_file_permission+0x16/0x20
 [<ffffffff81199c68>] vfs_write+0xb8/0x1a0
 [<ffffffff8119a2da>] sys_pwrite64+0x7a/0x90
 [<ffffffff8100b386>] ? int_check_syscall_exit_work+0x34/0x3d
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
INFO: task qemu-kvm:17643 blocked for more than 120 seconds.
      Not tainted 2.6.32-642.6.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-kvm D 0000000000000006 0 17643 1 0x00000080
 ffff880226f83c98 0000000000000082 0000000000000000 ffff880226f83c5c
 ffff880200000000 ffff88023fc29000 00010418b84a8521 ffff88002f696ec0
 00000000000003f6 000000011106d351 ffff880237193ad8 ffff880226f83fd8
Call Trace:
 [<ffffffff8154a376>] __mutex_lock_slowpath+0x96/0x210
 [<ffffffff81549e9b>] mutex_lock+0x2b/0x50
 [<ffffffff81130a91>] generic_file_aio_write+0x71/0x100
 [<ffffffffa008ee08>] ext4_file_write+0x58/0x190 [ext4]
 [<ffffffff8119996a>] do_sync_write+0xfa/0x140
 [<ffffffff810a68a0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81247ddb>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8123aa66>] ? security_file_permission+0x16/0x20
 [<ffffffff81199c68>] vfs_write+0xb8/0x1a0
 [<ffffffff8119a2da>] sys_pwrite64+0x7a/0x90
 [<ffffffff8100b386>] ? int_check_syscall_exit_work+0x34/0x3d
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
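
To compare logs like this between hosts, a small sketch that counts hung-task reports per task name. `hung_task_summary` is a hypothetical helper name; it assumes the "INFO: task NAME:PID blocked for more than N seconds" format shown above and reads log text on standard input.

```shell
#!/bin/sh
# Hypothetical helper: count "blocked for more than N seconds" reports per
# task name, most frequent first. Reads kernel log text on stdin.
hung_task_summary() {
    sed -n 's/.*INFO: task \([^:]*\):[0-9]* blocked for more than .*/\1/p' |
        sort | uniq -c | sort -rn | awk '{print $2 "=" $1}'
}

# Usage, not run automatically here:
#   dmesg | hung_task_summary
#   hung_task_summary < /var/log/messages
```

In the log above, most of the qemu-kvm writers are waiting on a mutex in generic_file_aio_write while one of them sits in io_schedule during a direct write, and libvirtd and monit are waiting in sync_page. That pattern fits slow or stalled storage, but it does not prove it.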

~0028813

thefretrunner (reporter)

Interesting, I will check our logs to see if I can find any similar patterns.

~0029395

at0m1sk (reporter)

@thefretrunner: have you found anything similar? (Not so much in terms of the backtrace, but hung tasks.)

~0029796

ahongloumeng (reporter)

I have the same problem.

Server:
   Dell R730xd
System:
   CentOS 6.8
Kernel:
   Linux HOST31 2.6.32-642.el6.x86_64 #1 SMP Tue May 10 17:27:01 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Libvirt version:
   3.4.0
Qemu version:
   2.4.1


The error information is:
<0>Watchdog detected hard LOCKUP on cpu 28
<4>Modules linked in: iptable_mangle ipt_MASQUERADE xt_conntrack iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_NOTRACK iptable_raw ip_tables fuse xt_CHECKSUM ipt_REJECT act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb ebtable_nat ebt_arp ebt_ip ebtable_filter ebtables ip_queue bridge stp llc autofs4 bonding ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 nbd(U) vhost_net macvtap macvlan tun kvm_intel kvm ipmi_devintf microcode power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support dcdbas joydev bnxt_en sb_edac edac_core lpc_ich mfd_core shpchp igb dca i2c_algo_bit i2c_core ptp pps_core sg ext4 jbd2 mbcache sd_mod crc_t10dif ahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: xt_NOTRACK]
<4>Pid: 32466, comm: qemu-system-x86 Not tainted 2.6.32-642.el6.x86_64 #1
<4>Call Trace:
<4> <NMI> [<ffffffff810f2c41>] ? watchdog_overflow_callback+0xf1/0x110
<4> [<ffffffff8112ba27>] ? __perf_event_overflow+0xa7/0x240
<4> [<ffffffff8101dee6>] ? x86_perf_event_set_period+0xf6/0x180
<4> [<ffffffff8112c084>] ? perf_event_overflow+0x14/0x20
<4> [<ffffffff810252bc>] ? intel_pmu_handle_irq+0x21c/0x480
<4> [<ffffffff8154c189>] ? perf_event_nmi_handler+0x39/0xb0
<4> [<ffffffff8154dc85>] ? notifier_call_chain+0x55/0x80
<4> [<ffffffff8154dcea>] ? atomic_notifier_call_chain+0x1a/0x20
<4> [<ffffffff810aceae>] ? notify_die+0x2e/0x30
<4> [<ffffffff8154b903>] ? do_nmi+0x1c3/0x350
<4> [<ffffffff8154b1c3>] ? nmi+0x83/0x90
<4> [<ffffffff812a6f2f>] ? __write_lock_failed+0xf/0x20
<4> <<EOE>> [<ffffffff8154a9ae>] ? _write_lock_irq+0x1e/0x20
<4> [<ffffffff8107ae63>] ? copy_process+0xb93/0x1520
<4> [<ffffffff8107b946>] ? do_fork+0x96/0x4c0
<4> [<ffffffff811b225a>] ? sys_ppoll+0x7a/0x180
<4> [<ffffffff81009598>] ? sys_clone+0x28/0x30
<4> [<ffffffff8100b3f3>] ? stub_clone+0x13/0x20
<4> [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
<0>Kernel panic - not syncing: Hard LOCKUP
<4>Pid: 32466, comm: qemu-system-x86 Not tainted 2.6.32-642.el6.x86_64 #1
<4>Call Trace:
<4> <NMI> [<ffffffff81546de1>] ? panic+0xa7/0x179
<4> [<ffffffff81011105>] ? show_trace+0x15/0x20
<4> [<ffffffff810f2c60>] ? watchdog_timer_fn+0x0/0x230
<4> [<ffffffff8112ba27>] ? __perf_event_overflow+0xa7/0x240
<4> [<ffffffff8101dee6>] ? x86_perf_event_set_period+0xf6/0x180
<4> [<ffffffff8112c084>] ? perf_event_overflow+0x14/0x20
<4> [<ffffffff810252bc>] ? intel_pmu_handle_irq+0x21c/0x480
<4> [<ffffffff8154c189>] ? perf_event_nmi_handler+0x39/0xb0
<4> [<ffffffff8154dc85>] ? notifier_call_chain+0x55/0x80
<4> [<ffffffff8154dcea>] ? atomic_notifier_call_chain+0x1a/0x20
<4> [<ffffffff810aceae>] ? notify_die+0x2e/0x30
<4> [<ffffffff8154b903>] ? do_nmi+0x1c3/0x350
<4> [<ffffffff8154b1c3>] ? nmi+0x83/0x90
<4> [<ffffffff812a6f2f>] ? __write_lock_failed+0xf/0x20
<4> <<EOE>> [<ffffffff8154a9ae>] ? _write_lock_irq+0x1e/0x20
<4> [<ffffffff8107ae63>] ? copy_process+0xb93/0x1520
<4> [<ffffffff8107b946>] ? do_fork+0x96/0x4c0
<4> [<ffffffff811b225a>] ? sys_ppoll+0x7a/0x180
<4> [<ffffffff81009598>] ? sys_clone+0x28/0x30
<4> [<ffffffff8100b3f3>] ? stub_clone+0x13/0x20
<4> [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b

~0029797

ahongloumeng (reporter)

@thefretrunner: If /proc/sys/kernel/nmi_watchdog is set to 0, the lockup is no longer logged at all. Setting /proc/sys/kernel/hardlockup_panic and /proc/sys/kernel/softlockup_panic to 0 instead should be a good idea, I think.
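
A minimal sketch for checking which of these settings the running kernel actually exposes before changing anything. `show_lockup_settings` is a hypothetical helper name, and whether all three files exist on a given kernel build is an assumption, which is why missing ones are reported rather than treated as errors.

```shell
#!/bin/sh
# Hypothetical helper: print each lockup-related setting found under a
# sysctl directory, and mark the ones this kernel does not expose.
show_lockup_settings() {
    dir="${1:-/proc/sys/kernel}"
    for name in nmi_watchdog hardlockup_panic softlockup_panic; do
        if [ -r "$dir/$name" ]; then
            echo "$name=$(cat "$dir/$name")"
        else
            echo "$name=absent"
        fi
    done
}

# To avoid the panic while keeping the detector running (as root),
# not run automatically here:
#   echo 0 > /proc/sys/kernel/hardlockup_panic
#   echo 0 > /proc/sys/kernel/softlockup_panic
```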

~0029811

at0m1sk (reporter)

@ahongloumeng: what is the BIOS version of your server (usually obtainable via dmesg), and what is the latest BIOS on Dell's website for your server?
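
A small sketch for pulling the BIOS version out of the kernel log, assuming the log contains a "DMI:" line like the one quoted in note 0028554. `bios_from_dmi` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: extract the BIOS version and release date from the
# first kernel "DMI:" line found on stdin, printed as "VERSION (DATE)".
bios_from_dmi() {
    sed -n 's|.*DMI:.*, BIOS \([^ ]*\) \([0-9/]*\).*|\1 (\2)|p' | head -n 1
}

# Usage, not run automatically here:
#   dmesg | bios_from_dmi
#   dmidecode -t bios      # fuller output, needs root
```

If the log has no such line, the function prints nothing, and dmidecode is the fallback.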

~0029843

ahongloumeng (reporter)

1. My server's BIOS information
    Vendor: Dell Inc.
    Version: 1.3.6
    Release Date: 06/03/2015
    Address: 0xF0000
    Runtime Size: 64 kB
    ROM Size: 16384 kB

2. Latest BIOS version on Dell's website
        Dell Server BIOS R630/R730/R730XD Version 2.4.3

~0029864

at0m1sk (reporter)

@ahongloumeng: what kind of CPU is in the server? In the BIOS change list there are lots of microcode updates which affect the power states of the CPU. I have not yet been able to test this theory, but I strongly suspect that a microcode upgrade via the BIOS might fix this issue.
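
A minimal sketch for reading the CPU model from /proc/cpuinfo. `cpuinfo_field` is a hypothetical helper name. The "model name" field is standard; a "microcode" field may or may not be present depending on the kernel, so treat that part as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first matching field in a
# cpuinfo-style file ("name<whitespace>: value"). Prints nothing if absent.
cpuinfo_field() {
    field="$1"
    file="${2:-/proc/cpuinfo}"
    sed -n "s/^$field[[:space:]]*: //p" "$file" | head -n 1
}

# Usage, not run automatically here:
#   cpuinfo_field 'model name'
#   cpuinfo_field microcode     # empty if this kernel does not report it
```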

Issue History
Date Modified Username Field Change
2016-06-03 08:40 thefretrunner New Issue
2016-06-03 10:44 thefretrunner Note Added: 0026779
2016-10-23 19:09 SDGathman File Added: PA231009.JPG
2016-10-23 19:11 SDGathman Note Added: 0027779
2016-10-24 01:34 SDGathman Note Added: 0027781
2016-10-24 22:41 SDGathman Note Added: 0027787
2017-02-10 17:30 at0m1sk Note Added: 0028554
2017-02-10 17:58 thefretrunner Note Added: 0028555
2017-03-09 17:32 at0m1sk Note Added: 0028812
2017-03-09 17:58 thefretrunner Note Added: 0028813
2017-06-02 18:12 at0m1sk Note Added: 0029395
2017-08-05 02:28 ahongloumeng Note Added: 0029796
2017-08-05 03:03 ahongloumeng Note Added: 0029797
2017-08-08 14:17 at0m1sk Note Added: 0029811
2017-08-10 07:00 ahongloumeng Note Added: 0029843
2017-08-14 13:36 at0m1sk Note Added: 0029864