View Issue Details

ID: 0005618
Project: CentOS-6
Category: kernel
View Status: public
Last Update: 2013-01-20 14:08
Reporter: clar2242
Priority: normal
Severity: block
Reproducibility: always
Status: new
Resolution: open
Product Version: 6.2
Target Version:
Fixed in Version:
Summary: 0005618: NMI received for unknown reason
Description: We have just purchased a number of Dell C2100 servers running CentOS 6.2 (kernel 2.6.32-220.7.1.el6.x86_64). After approximately 5 minutes I get the following error on the console:

Uhhuh. NMI received for unknown reason 2d on CPU 0.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue.

The 4-port Gigabit Ethernet adapter then goes offline:
igb 0000:06:00.0: eth0 reset adapter
igb 0000:07:00.1: eth3 reset adapter
igb 0000:06:00.1: eth1 reset adapter
igb 0000:07:00.0: eth2 reset adapter

Output from dmesg:
Uhhuh. NMI received for unknown reason 2d on CPU 0.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Hardware name: PowerEdge C2100
NETDEV WATCHDOG: eth3 (igb): transmit queue 0 timed out
Modules linked in: ipmi_si mpt2sas scsi_transport_sas raid_class mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu 8021q garp stp llc bonding ipv6 dm_mod ses enclosure sg igb dca dcdbas serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif megaraid_sas pata_acpi ata_generic ata_piix [last unloaded: ipmi_si]
Pid: 0, comm: swapper Not tainted 2.6.32-220.7.1.el6.x86_64 #1
Call Trace:
 <IRQ> [<ffffffff81069a17>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff81069b06>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff8144a60d>] ? dev_watchdog+0x26d/0x280
 [<ffffffff8107cff4>] ? mod_timer+0x144/0x220
 [<ffffffff8144a3a0>] ? dev_watchdog+0x0/0x280
 [<ffffffff8107c7f7>] ? run_timer_softirq+0x197/0x340
 [<ffffffff810a0b20>] ? tick_sched_timer+0x0/0xc0
 [<ffffffff8102af2d>] ? lapic_next_event+0x1d/0x30
 [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
 [<ffffffff81095610>] ? hrtimer_interrupt+0x140/0x250
 [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
 [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
 [<ffffffff81071de5>] ? irq_exit+0x85/0x90
 [<ffffffff814f4eb0>] ? smp_apic_timer_interrupt+0x70/0x9b
 [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
 <EOI> [<ffffffff812c4b0e>] ? intel_idle+0xde/0x170
 [<ffffffff812c4af1>] ? intel_idle+0xc1/0x170
 [<ffffffff813fa027>] ? cpuidle_idle_call+0xa7/0x140
 [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
 [<ffffffff814d420a>] ? rest_init+0x7a/0x80
 [<ffffffff81c1ff76>] ? start_kernel+0x424/0x430
 [<ffffffff81c1f33a>] ? x86_64_start_reservations+0x125/0x129
 [<ffffffff81c1f438>] ? x86_64_start_kernel+0xfa/0x109
---[ end trace 120c4b9c89ff5465 ]---
igb 0000:07:00.1: eth3: Reset adapter
bonding: bond0: link status definitely down for interface eth3, disabling it
igb 0000:06:00.0: eth0: Reset adapter
bonding: bond0: link status definitely down for interface eth0, disabling it
igb 0000:06:00.1: eth1: Reset adapter
bonding: bond0: link status definitely down for interface eth1, disabling it
igb 0000:07:00.0: eth2: Reset adapter
bonding: bond0: link status definitely down for interface eth2, disabling it

I also have CentOS 5.7 installed on other C2100s, and those don't experience this issue.
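For reference, the "reason 2d" in the NMI message is the status byte the x86 NMI handler reads from I/O port 0x61. Assuming the bit layout checked in 2.6.32-era arch/x86/kernel/traps.c (bit 7, 0x80, flags a PCI SERR/memory parity error; bit 6, 0x40, flags an I/O channel check), the minimal Python sketch below decodes that byte. 0x2d has neither bit set, so the kernel falls through to "unknown reason" once no other NMI handler claims the interrupt. The helper name decode_nmi_reason is hypothetical and not a kernel interface; RHEL 6 backports may also change the exact handling.

```python
from typing import List

# Bit masks for the NMI reason byte (read from I/O port 0x61), assumed from
# the checks in 2.6.32-era arch/x86/kernel/traps.c.
NMI_REASON_SERR = 0x80   # bit 7: PCI system error / memory parity error
NMI_REASON_IOCHK = 0x40  # bit 6: I/O channel check error


def decode_nmi_reason(reason: int) -> List[str]:
    """Hypothetical helper: list the known NMI sources flagged in `reason`."""
    flags = []
    if reason & NMI_REASON_SERR:
        flags.append("SERR")
    if reason & NMI_REASON_IOCHK:
        flags.append("IOCHK")
    # Neither bit set: if no other NMI handler claims the interrupt, the
    # kernel prints "NMI received for unknown reason".
    return flags or ["unknown"]


if __name__ == "__main__":
    print(decode_nmi_reason(0x2d))  # ['unknown'] since 0x2d is 0b00101101
```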
Tags: No tags attached.

Activities

clar2242

2012-03-26 11:59

reporter   ~0014733

Note to self: check RHN before raising cases with CentOS... ;)

https://access.redhat.com/knowledge/solutions/43168 is the issue.

alexandre.fontelle@sungard.com

2012-08-16 16:04

reporter   ~0015670

I see the same behavior on two HP DL360 G6 servers running kernel 2.6.32-220.4.1.el6.x86_64:

Aug 13 10:07:32 linwdpx11 kernel: ------------[ cut here ]------------
Aug 13 10:07:32 linwdpx11 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Aug 13 10:07:32 linwdpx11 kernel: Hardware name: ProLiant DL360 G6
Aug 13 10:07:32 linwdpx11 kernel: NETDEV WATCHDOG: eth0 (bnx2): transmit queue 7 timed out
Aug 13 10:07:32 linwdpx11 kernel: Modules linked in: mptctl mptbase ipmi_devintf pcc_cpufreq bonding ipv6 dm_mirror dm_region_hash dm_log power_meter hpilo ipmi_si ipmi_msghandler hpwdt bnx2 serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: microcode]
Aug 13 10:07:32 linwdpx11 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.4.1.el6.x86_64 #1
Aug 13 10:07:32 linwdpx11 kernel: Call Trace:
Aug 13 10:07:32 linwdpx11 kernel: <IRQ> [<ffffffff81069a17>] ? warn_slowpath_common+0x87/0xc0
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81069b06>] ? warn_slowpath_fmt+0x46/0x50
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8144a4fd>] ? dev_watchdog+0x26d/0x280
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8139d4d0>] ? rh_timer_func+0x0/0x10
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8139cc90>] ? usb_hcd_poll_rh_status+0x140/0x180
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8107bbe5>] ? internal_add_timer+0xb5/0x110
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8144a290>] ? dev_watchdog+0x0/0x280
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8107c7f7>] ? run_timer_softirq+0x197/0x340
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff810a0a10>] ? tick_sched_timer+0x0/0xc0
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8102ad2d>] ? lapic_next_event+0x1d/0x30
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81095610>] ? hrtimer_interrupt+0x140/0x250
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81071de5>] ? irq_exit+0x85/0x90
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff814f4d70>] ? smp_apic_timer_interrupt+0x70/0x9b
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
Aug 13 10:07:32 linwdpx11 kernel: <EOI> [<ffffffff812c49de>] ? intel_idle+0xde/0x170
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff812c49c1>] ? intel_idle+0xc1/0x170
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff813f9ef7>] ? cpuidle_idle_call+0xa7/0x140
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff814d40ca>] ? rest_init+0x7a/0x80
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81c1ff76>] ? start_kernel+0x424/0x430
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81c1f33a>] ? x86_64_start_reservations+0x125/0x129
Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81c1f438>] ? x86_64_start_kernel+0xfa/0x109
Aug 13 10:07:32 linwdpx11 kernel: ---[ end trace 3c89311d74b8289e ]---
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: intr_sem[0] PCI_CMD[00100446]
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: HC_STATS_INTERRUPT_STATUS[017f0080]
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: PBA[00000000]
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: <--- start MCP states dump --->
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: MCP_STATE_P0[0007610e] MCP_STATE_P1[0007610e]
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: MCP mode[0000b880] state[80000000] evt_mask[00000500]
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: pc[0800b6b4] pc[0800b3dc] instr[03623824]
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: shmem states:
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: drv_mb[01030003] fw_mb[00000003] link_status[0000006f] drv_pulse_mb[00000bbb]
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: dev_info_signature[44564903] reset_type[01005254] condition[0007610e]
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: 000003cc: 44444444 44444444 44444444 00000a3c
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: 000003dc: 0ffeffff 0000ffff ffffffff ffffffff
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: 000003ec: 00000000 00000000 00000000 00000002
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: 0x3fc[0000ffff]
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: <--- end MCP states dump --->
Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: NIC Copper Link is Down
Aug 13 10:07:32 linwdpx11 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Aug 13 10:07:32 linwdpx11 kernel: bonding: bond0: making interface eth1 the new active one.
Aug 13 10:07:35 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex
Aug 13 10:07:36 linwdpx11 kernel: bonding: bond0: link status up for interface eth0, enabling it in 10000 ms.
Aug 13 10:07:46 linwdpx11 kernel: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.

The same servers running kernel 2.6.18-274.17.1.el5 had no issues.

Issue History

Date Modified      Username                         Field        Change
2012-03-26 08:31   clar2242                         New Issue
2012-03-26 11:59   clar2242                         Note Added   0014733
2012-08-16 16:04   alexandre.fontelle@sungard.com   Note Added   0015670