2017-12-12 10:08 UTC

View Issue Details
ID: 0011488
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2017-03-23 23:42
Reporter: phunter
Priority: high
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Platform: SuperMicro SYS-7048A-T
OS: CentOS
OS Version: 7.2.1511
Product Version: 7.2.1511
Target Version:
Fixed in Version:
Summary: 0011488: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 28
Description: The OS crashes regularly, reporting 'Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 28'. Sometimes within minutes, sometimes after 4-6 hours, but it always crashes. Could C612 chipset support be lacking?
Steps To Reproduce: Install CentOS 7.2.1511 on a system using LGA2011-V4 processors and the C612 chipset.
Additional Information:
[ 3283.805845] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 28
[ 3283.805912] CPU: 28 PID: 0 Comm: swapper/28 Not tainted 3.10.0-327.28.3.el7.x86_64 #1
[ 3283.805965] Hardware name: Supermicro SYS-7048A-T/X10DAI, BIOS 2.0 02/02/2016
[ 3283.806014] ffffffff81868d00 34e140c889c2ec29 ffff883fff505af0 ffffffff81636453
[ 3283.806077] ffff883fff505b70 ffffffff8162fce7 0000000000000010 ffff883fff505b80
[ 3283.806137] ffff883fff505b20 34e140c889c2ec29 0000000000000000 000000000000001c
[ 3283.806202] Call Trace:
[ 3283.806221] <NMI> [<ffffffff81636453>] dump_stack+0x19/0x1b
[ 3283.806285] [<ffffffff8162fce7>] panic+0xd8/0x1e7
[ 3283.806328] [<ffffffff8111b8f0>] ? restart_watchdog_hrtimer+0x50/0x50
[ 3283.806373] [<ffffffff8111b9b2>] watchdog_overflow_callback+0xc2/0xd0
[ 3283.806428] [<ffffffff8115f201>] __perf_event_overflow+0xa1/0x250
[ 3283.806471] [<ffffffff8115fcd4>] perf_event_overflow+0x14/0x20
[ 3283.806521] [<ffffffff810325d8>] intel_pmu_handle_irq+0x1e8/0x470
[ 3283.806573] [<ffffffff8163fe8b>] perf_event_nmi_handler+0x2b/0x50
[ 3283.806617] [<ffffffff8163f5d9>] nmi_handle.isra.0+0x69/0xb0
[ 3283.806658] [<ffffffff8163f789>] do_nmi+0x169/0x340
[ 3283.806694] [<ffffffff8163ea13>] end_repeat_nmi+0x1e/0x2e
[ 3283.806744] [<ffffffff8163deb7>] ? _raw_spin_lock_irqsave+0x47/0x60
[ 3283.806789] [<ffffffff8163deb7>] ? _raw_spin_lock_irqsave+0x47/0x60
[ 3283.806834] [<ffffffff8163deb7>] ? _raw_spin_lock_irqsave+0x47/0x60
[ 3283.806877] <<EOE>> <IRQ> [<ffffffffa0406a6f>] nvkm_fantog_update+0x4f/0x120 [nouveau]
[ 3283.807040] [<ffffffffa0406b95>] nvkm_fantog_set+0x35/0x40 [nouveau]
[ 3283.807110] [<ffffffffa040603f>] nvkm_fan_update+0xef/0x1d0 [nouveau]
[ 3283.807183] [<ffffffffa0406179>] nvkm_therm_fan_set+0x19/0x20 [nouveau]
[ 3283.807252] [<ffffffffa04058ad>] nvkm_therm_update+0xad/0x300 [nouveau]
[ 3283.807322] [<ffffffffa0405b1a>] nvkm_therm_alarm+0x1a/0x20 [nouveau]
[ 3283.807393] [<ffffffffa040934b>] nv04_timer_alarm_trigger+0x12b/0x180 [nouveau]
[ 3283.807466] [<ffffffffa04093fd>] nv04_timer_alarm+0x5d/0xb0 [nouveau]
[ 3283.807534] [<ffffffffa0406b3e>] nvkm_fantog_update+0x11e/0x120 [nouveau]
[ 3283.807601] [<ffffffffa0406b5a>] nvkm_fantog_alarm+0x1a/0x20 [nouveau]
[ 3283.807669] [<ffffffffa040934b>] nv04_timer_alarm_trigger+0x12b/0x180 [nouveau]
[ 3283.807741] [<ffffffffa04094bb>] nv04_timer_intr+0x6b/0x90 [nouveau]
[ 3283.807812] [<ffffffffa03ff815>] nvkm_mc_intr+0x105/0x160 [nouveau]
[ 3283.807858] [<ffffffff8111c75e>] handle_irq_event_percpu+0x3e/0x1e0
[ 3283.807902] [<ffffffff8111c93d>] handle_irq_event+0x3d/0x60
[ 3283.807945] [<ffffffff8111f5d7>] handle_edge_irq+0x77/0x130
[ 3283.807994] [<ffffffff81016ecf>] handle_irq+0xbf/0x150
[ 3283.808041] [<ffffffff810e159a>] ? tick_check_idle+0x8a/0xd0
[ 3283.808089] [<ffffffff8164256a>] ? atomic_notifier_call_chain+0x1a/0x20
[ 3283.808144] [<ffffffff81648fef>] do_IRQ+0x4f/0xf0
[ 3283.808181] [<ffffffff8163e32d>] common_interrupt+0x6d/0x6d
[ 3283.808218] <EOI> [<ffffffff814d4f92>] ? cpuidle_enter_state+0x52/0xc0
[ 3283.808276] [<ffffffff814d50d9>] cpuidle_idle_call+0xd9/0x210
[ 3283.809539] [<ffffffff8101e4ee>] arch_cpu_idle+0xe/0x30
[ 3283.810789] [<ffffffff810d6485>] cpu_startup_entry+0x245/0x290
[ 3283.812061] [<ffffffff8104768a>] start_secondary+0x1ba/0x230
Tags: No tags attached.
abrt_hash:
URL:
Attached Files:

Notes

~0027543

phunter (reporter)

Contacted SuperMicro (MB Vendor). They imply there may be an issue with NMI Watchdog.
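
For context, the detector that fires in this panic is the kernel's NMI (hard-lockup) watchdog, and its state can be read from /proc/sys/kernel/nmi_watchdog. A small sketch for inspecting it follows; `nmi_watchdog_state` is a hypothetical helper name, not an existing tool. Turning the watchdog off would likely only hide the report rather than remove whatever is holding the CPU.

```shell
#!/bin/sh
# Sketch only: nmi_watchdog_state is a hypothetical helper name.
# Reports the NMI watchdog state from a procfs-style file.
nmi_watchdog_state() {
    # The file holds 1 when the NMI watchdog is enabled and 0 when it is disabled.
    case "$(cat "$1")" in
        1) echo enabled ;;
        0) echo disabled ;;
        *) echo unknown ;;
    esac
}

# Typical use (not run here):
#   nmi_watchdog_state /proc/sys/kernel/nmi_watchdog
```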

~0027745

cenph (reporter)

Hi phunter, have you fixed this issue yet? I have the same kernel panic as you. Please contact me if possible at cenph@caicloud.io. Thanks!

~0027947

dpthakar (reporter)

I am getting the same kernel panic on my i7 machine:
[ 601.079473] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 11
[ 601.079513] CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.0-327.36.3.el7.x86_64 #1
[ 601.079548] Hardware name: Gigabyte Technology Co., Ltd. Default string/X99P-SLI-CF, BIOS F23 07/22/2016
[ 601.079589] ffffffff81868cf8 b6605b7532d71c87 ffff881fbf6c5af0 ffffffff81636431
[ 601.079630] ffff881fbf6c5b70 ffffffff8162fcc0 0000000000000010 ffff881fbf6c5b80
[ 601.079670] ffff881fbf6c5b20 b6605b7532d71c87 0000000000000000 000000000000000b
[ 601.079710] Call Trace:
[ 601.079722] <NMI> [<ffffffff81636431>] dump_stack+0x19/0x1b
[ 601.079759] [<ffffffff8162fcc0>] panic+0xd8/0x1e7
[ 601.079784] [<ffffffff8111b920>] ? restart_watchdog_hrtimer+0x50/0x50
[ 601.079814] [<ffffffff8111b9e2>] watchdog_overflow_callback+0xc2/0xd0
[ 601.079845] [<ffffffff8115f1d1>] __perf_event_overflow+0xa1/0x250
[ 601.079873] [<ffffffff8115fca4>] perf_event_overflow+0x14/0x20
[ 601.079901] [<ffffffff810325e8>] intel_pmu_handle_irq+0x1e8/0x470
[ 601.079930] [<ffffffff8163fe8b>] perf_event_nmi_handler+0x2b/0x50
[ 601.079958] [<ffffffff8163f5d9>] nmi_handle.isra.0+0x69/0xb0
[ 601.079984] [<ffffffff8163f6f0>] do_nmi+0xd0/0x340
[ 601.080007] [<ffffffff8163ea13>] end_repeat_nmi+0x1e/0x2e
[ 601.080034] [<ffffffff8163de8d>] ? _raw_spin_lock_irqsave+0x3d/0x60
[ 601.080064] [<ffffffff8163de8d>] ? _raw_spin_lock_irqsave+0x3d/0x60
[ 601.080093] [<ffffffff8163de8d>] ? _raw_spin_lock_irqsave+0x3d/0x60
[ 601.080121] <<EOE>> <IRQ> [<ffffffffa0218a5f>] nvkm_fantog_update+0x4f/0x120 [nouveau]
[ 601.080206] [<ffffffffa0218b85>] nvkm_fantog_set+0x35/0x40 [nouveau]
[ 601.080247] [<ffffffffa021802f>] nvkm_fan_update+0xef/0x1d0 [nouveau]
[ 601.080287] [<ffffffffa0218169>] nvkm_therm_fan_set+0x19/0x20 [nouveau]
[ 601.080328] [<ffffffffa021789d>] nvkm_therm_update+0xad/0x300 [nouveau]
[ 601.080368] [<ffffffffa0217b0a>] nvkm_therm_alarm+0x1a/0x20 [nouveau]
[ 601.080409] [<ffffffffa021b33b>] nv04_timer_alarm_trigger+0x12b/0x180 [nouveau]
[ 601.080453] [<ffffffffa021b3ed>] nv04_timer_alarm+0x5d/0xb0 [nouveau]
[ 601.080493] [<ffffffffa0218b2e>] nvkm_fantog_update+0x11e/0x120 [nouveau]
[ 601.080534] [<ffffffffa0218b4a>] nvkm_fantog_alarm+0x1a/0x20 [nouveau]
[ 601.080573] [<ffffffffa021b33b>] nv04_timer_alarm_trigger+0x12b/0x180 [nouveau]
[ 601.080616] [<ffffffffa021b4ab>] nv04_timer_intr+0x6b/0x90 [nouveau]
[ 601.080657] [<ffffffffa0211805>] nvkm_mc_intr+0x105/0x160 [nouveau]
[ 601.080686] [<ffffffff8111c78e>] handle_irq_event_percpu+0x3e/0x1e0
[ 601.080714] [<ffffffff8111c96d>] handle_irq_event+0x3d/0x60
[ 601.080740] [<ffffffff8111f607>] handle_edge_irq+0x77/0x130
[ 601.080768] [<ffffffff81016ecf>] handle_irq+0xbf/0x150
[ 601.081552] [<ffffffff810e15ba>] ? tick_check_idle+0x8a/0xd0
[ 601.082340] [<ffffffff8164257a>] ? atomic_notifier_call_chain+0x1a/0x20
[ 601.083132] [<ffffffff81648fef>] do_IRQ+0x4f/0xf0
[ 601.083917] [<ffffffff8163e32d>] common_interrupt+0x6d/0x6d
[ 601.084688] <EOI> [<ffffffff814d4af2>] ? cpuidle_enter_state+0x52/0xc0
[ 601.085463] [<ffffffff814d4c39>] cpuidle_idle_call+0xd9/0x210
[ 601.086247] [<ffffffff8101e4ee>] arch_cpu_idle+0xe/0x30
[ 601.087016] [<ffffffff810d64a5>] cpu_startup_entry+0x245/0x290
[ 601.087775] [<ffffffff8104768a>] start_secondary+0x1ba/0x230

~0028043

phunter (reporter)

Found this was due to the nouveau driver with an NVIDIA graphics adapter.
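
Both traces in this report point the same way: every frame between <IRQ> and the _raw_spin_lock_irqsave entries is tagged [nouveau], and nvkm_fantog_update appears twice in the same interrupt path. A quick way to see which modules a saved trace implicates is to tally those tags. This is a minimal sketch, assuming the panic output was saved to a text file; `count_module_frames` and `panic.txt` are hypothetical names.

```shell
#!/bin/sh
# Sketch only: count_module_frames is a hypothetical helper name.
# Frames from loadable modules end in "[module]"; core kernel frames carry no tag.
count_module_frames() {
    # Pull the trailing [module] tag from each line, then count occurrences per module.
    sed -n 's/.*\[\([A-Za-z0-9_]*\)\][[:space:]]*$/\1/p' "$1" | sort | uniq -c | sort -rn
}

# Typical use (not run here):
#   count_module_frames panic.txt
```

A trace where one out-of-tree or graphics module accounts for all tagged frames is a reasonable hint to try running without that module.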

~0028044

dpthakar (reporter)

Last week I tried installing NVIDIA-Linux-x86_64-375.20.run,
which disabled nouveau, but that worked for only 2-3 days. Then the reboots started again.

cat /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0


cat /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0

~0028045

phunter (reporter)

We had to disable it from GRUB in /etc/default/grub. Added 'rd.driver.blacklist=nouveau nomodeset' to end of GRUB_CMDLINE_LINUX.

We also created a blacklist.conf in /etc/modprobe.d, and added 'blacklist nouveau' to it.

That seemed to fix it. Up 100% solid for 46 days.
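
The two changes described in this note can be sketched as a small script. This is not the reporter's actual command sequence: the helper names `add_kernel_args` and `write_blacklist` are hypothetical, and the sketch assumes GRUB_CMDLINE_LINUX is a single double-quoted line and that GNU sed is available. The file name blacklist.conf follows the note.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any tool.

# Append kernel arguments to GRUB_CMDLINE_LINUX in the given grub defaults file.
# Assumes the variable sits on one line of the form GRUB_CMDLINE_LINUX="...".
add_kernel_args() {
    grub_file=$1
    extra_args=$2
    # Skip if the arguments are already present, so reruns do not duplicate them.
    if grep '^GRUB_CMDLINE_LINUX=' "$grub_file" | grep -qF -- "$extra_args"; then
        return 0
    fi
    sed -i "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"|GRUB_CMDLINE_LINUX=\"\1 ${extra_args}\"|" "$grub_file"
}

# Add 'blacklist nouveau' to a modprobe.d file unless that line is already there.
write_blacklist() {
    conf_file=$1
    grep -qxF 'blacklist nouveau' "$conf_file" 2>/dev/null ||
        printf 'blacklist nouveau\n' >> "$conf_file"
}

# Typical use as root (not run here), with the paths named in the note:
#   add_kernel_args /etc/default/grub 'rd.driver.blacklist=nouveau nomodeset'
#   write_blacklist /etc/modprobe.d/blacklist.conf
```

Taking the file paths as parameters keeps the helpers easy to try against a copy of the files before touching the live configuration.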

~0028046

dpthakar (reporter)

@phunter, thanks for the reply.
I have made the GRUB changes and run grub2-mkconfig.
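
Edits to /etc/default/grub only take effect once the generated config is rewritten and the machine is rebooted. A hedged sketch of the follow-up checks is below. The helper names are hypothetical, and the `grub2-mkconfig -o` target in the comment is the usual path for BIOS-booted CentOS 7, which is an assumption about this system (UEFI installs use a different location).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Succeeds if the given kernel command line carries both arguments from the fix.
cmdline_blacklists_nouveau() {
    case " $1 " in
        *" rd.driver.blacklist=nouveau "*) ;;
        *) return 1 ;;
    esac
    case " $1 " in
        *" nomodeset "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if nouveau appears in a /proc/modules-style listing.
nouveau_loaded() {
    grep -q '^nouveau ' "$1"
}

# Typical use as root (not run here):
#   grub2-mkconfig -o /boot/grub2/grub.cfg    # assumed BIOS path
#   reboot
#   cmdline_blacklists_nouveau "$(cat /proc/cmdline)" && echo "arguments present"
#   nouveau_loaded /proc/modules && echo "nouveau is still loaded"
```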

~0028926

RedChops (reporter)

Same stack trace for me too, using a Xeon E3-1230, kernel 3.10.0-514.10.2.el7.x86_64 and a GeForce GT 710. Blacklisting the nouveau driver worked for me as well. Interestingly enough, I had this crash with the 4.10 kernel from ELRepo as well. It must be strictly nouveau.

Issue History
Date Modified     Username  Field                 Change
2016-09-20 19:46  phunter   New Issue
2016-09-20 21:33  phunter   Note Added: 0027543
2016-10-19 02:23  cenph     Note Added: 0027745
2016-11-21 09:40  dpthakar  Note Added: 0027947
2016-11-30 04:04  phunter   Note Added: 0028043
2016-11-30 04:11  dpthakar  Note Added: 0028044
2016-11-30 04:18  phunter   Note Added: 0028045
2016-11-30 05:06  dpthakar  Note Added: 0028046
2017-03-23 23:42  RedChops  Note Added: 0028926