View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0005618 | CentOS-6 | kernel | public | 2012-03-26 08:31 | 2013-01-20 14:08 |
Reporter | clar2242 | Assigned To | |||
Priority | normal | Severity | block | Reproducibility | always |
Status | new | Resolution | open | ||
Product Version | 6.2 | ||||
Summary | 0005618: NMI received for unknown reason | ||||
Description | We have just purchased a number of Dell C2100 servers running CentOS 6.2 (2.6.32-220.7.1.el6.x86_64 kernel). After approx. 5 minutes I get the following error on the console: Uhhuh. NMI received for unknown reason 2d on CPU 0. Do you have a strange power saving mode enabled? Dazed and confused, but trying to continue. The 4-port gigabit Ethernet adapter then goes offline: igb 0000:06:00.0: eth0 reset adapter igb 0000:07:00.1: eth3 reset adapter igb 0000:06:00.1: eth1 reset adapter igb 0000:07:00.0: eth2 reset adapter Output from dmesg: Uhhuh. NMI received for unknown reason 2d on CPU 0. Do you have a strange power saving mode enabled? Dazed and confused, but trying to continue ------------[ cut here ]------------ WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted) Hardware name: PowerEdge C2100 NETDEV WATCHDOG: eth3 (igb): transmit queue 0 timed out Modules linked in: ipmi_si mpt2sas scsi_transport_sas raid_class mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu 8021q garp stp llc bonding ipv6 dm_mod ses enclosure sg igb dca dcdbas serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif megaraid_sas pata_acpi ata_generic ata_piix [last unloaded: ipmi_si] Pid: 0, comm: swapper Not tainted 2.6.32-220.7.1.el6.x86_64 #1 Call Trace: <IRQ> [<ffffffff81069a17>] ? warn_slowpath_common+0x87/0xc0 [<ffffffff81069b06>] ? warn_slowpath_fmt+0x46/0x50 [<ffffffff8144a60d>] ? dev_watchdog+0x26d/0x280 [<ffffffff8107cff4>] ? mod_timer+0x144/0x220 [<ffffffff8144a3a0>] ? dev_watchdog+0x0/0x280 [<ffffffff8107c7f7>] ? run_timer_softirq+0x197/0x340 [<ffffffff810a0b20>] ? tick_sched_timer+0x0/0xc0 [<ffffffff8102af2d>] ? lapic_next_event+0x1d/0x30 [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0 [<ffffffff81095610>] ? hrtimer_interrupt+0x140/0x250 [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30 [<ffffffff8100de85>] ? do_softirq+0x65/0xa0 [<ffffffff81071de5>] ? irq_exit+0x85/0x90 [<ffffffff814f4eb0>] ? smp_apic_timer_interrupt+0x70/0x9b [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20 <EOI> [<ffffffff812c4b0e>] ? intel_idle+0xde/0x170 [<ffffffff812c4af1>] ? intel_idle+0xc1/0x170 [<ffffffff813fa027>] ? cpuidle_idle_call+0xa7/0x140 [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110 [<ffffffff814d420a>] ? rest_init+0x7a/0x80 [<ffffffff81c1ff76>] ? start_kernel+0x424/0x430 [<ffffffff81c1f33a>] ? x86_64_start_reservations+0x125/0x129 [<ffffffff81c1f438>] ? x86_64_start_kernel+0xfa/0x109 ---[ end trace 120c4b9c89ff5465 ]--- igb 0000:07:00.1: eth3: Reset adapter bonding: bond0: link status definitely down for interface eth3, disabling it igb 0000:06:00.0: eth0: Reset adapter bonding: bond0: link status definitely down for interface eth0, disabling it igb 0000:06:00.1: eth1: Reset adapter bonding: bond0: link status definitely down for interface eth1, disabling it igb 0000:07:00.0: eth2: Reset adapter bonding: bond0: link status definitely down for interface eth2, disabling it I also have CentOS 5.7 installed on C2100s, and those don't experience this issue. | ||||
Tags | No tags attached. | ||||
Note to self: check RHN before raising cases with CentOS... ;) The issue is covered by https://access.redhat.com/knowledge/solutions/43168. |
|
I get the same behavior on two HP DL360 G6 servers with kernel 2.6.32-220.4.1.el6.x86_64: Aug 13 10:07:32 linwdpx11 kernel: ------------[ cut here ]------------ Aug 13 10:07:32 linwdpx11 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted) Aug 13 10:07:32 linwdpx11 kernel: Hardware name: ProLiant DL360 G6 Aug 13 10:07:32 linwdpx11 kernel: NETDEV WATCHDOG: eth0 (bnx2): transmit queue 7 timed out Aug 13 10:07:32 linwdpx11 kernel: Modules linked in: mptctl mptbase ipmi_devintf pcc_cpufreq bonding ipv6 dm_mirror dm_region_hash dm_log power_meter hpilo ipmi_si ipmi_msghandler hpwdt bnx2 serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: microcode] Aug 13 10:07:32 linwdpx11 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.4.1.el6.x86_64 #1 Aug 13 10:07:32 linwdpx11 kernel: Call Trace: Aug 13 10:07:32 linwdpx11 kernel: <IRQ> [<ffffffff81069a17>] ? warn_slowpath_common+0x87/0xc0 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81069b06>] ? warn_slowpath_fmt+0x46/0x50 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8144a4fd>] ? dev_watchdog+0x26d/0x280 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8139d4d0>] ? rh_timer_func+0x0/0x10 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8139cc90>] ? usb_hcd_poll_rh_status+0x140/0x180 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8107bbe5>] ? internal_add_timer+0xb5/0x110 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8144a290>] ? dev_watchdog+0x0/0x280 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8107c7f7>] ? run_timer_softirq+0x197/0x340 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff810a0a10>] ? tick_sched_timer+0x0/0xc0 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8102ad2d>] ? lapic_next_event+0x1d/0x30 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81095610>] ? hrtimer_interrupt+0x140/0x250 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81071de5>] ? irq_exit+0x85/0x90 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff814f4d70>] ? smp_apic_timer_interrupt+0x70/0x9b Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20 Aug 13 10:07:32 linwdpx11 kernel: <EOI> [<ffffffff812c49de>] ? intel_idle+0xde/0x170 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff812c49c1>] ? intel_idle+0xc1/0x170 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff813f9ef7>] ? cpuidle_idle_call+0xa7/0x140 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff814d40ca>] ? rest_init+0x7a/0x80 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81c1ff76>] ? start_kernel+0x424/0x430 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81c1f33a>] ? x86_64_start_reservations+0x125/0x129 Aug 13 10:07:32 linwdpx11 kernel: [<ffffffff81c1f438>] ? x86_64_start_kernel+0xfa/0x109 Aug 13 10:07:32 linwdpx11 kernel: ---[ end trace 3c89311d74b8289e ]--- Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: intr_sem[0] PCI_CMD[00100446] Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088] Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000] Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088] Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: HC_STATS_INTERRUPT_STATUS[017f0080] Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: PBA[00000000] Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: <--- start MCP states dump ---> Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: MCP_STATE_P0[0007610e] MCP_STATE_P1[0007610e] Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: MCP mode[0000b880] state[80000000] evt_mask[00000500] Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: pc[0800b6b4] pc[0800b3dc] instr[03623824] Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: shmem states: Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: drv_mb[01030003] fw_mb[00000003] link_status[0000006f] drv_pulse_mb[00000bbb] Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: dev_info_signature[44564903] reset_type[01005254] condition[0007610e] Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: 000003cc: 44444444 44444444 44444444 00000a3c Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: 000003dc: 0ffeffff 0000ffff ffffffff ffffffff Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: 000003ec: 00000000 00000000 00000000 00000002 Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: DEBUG: 0x3fc[0000ffff] Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: <--- end MCP states dump ---> Aug 13 10:07:32 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: NIC Copper Link is Down Aug 13 10:07:32 linwdpx11 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it Aug 13 10:07:32 linwdpx11 kernel: bonding: bond0: making interface eth1 the new active one. Aug 13 10:07:35 linwdpx11 kernel: bnx2 0000:02:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex Aug 13 10:07:36 linwdpx11 kernel: bonding: bond0: link status up for interface eth0, enabling it in 10000 ms. Aug 13 10:07:46 linwdpx11 kernel: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex. The same servers running 2.6.18-274.17.1.el5 had no issues. |
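For anyone comparing logs across several affected machines, the two messages quoted in this report (the unknown-reason NMI and the NETDEV WATCHDOG transmit timeout) can be pulled out of saved dmesg or syslog text with a short script. This is only a rough sketch: the function name `summarize_kernel_log` and the regular expressions are illustrative assumptions based on the message formats pasted above, not part of any CentOS or kernel tooling.

```python
import re

# Assumed formats, taken from the log excerpts quoted in this report.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)
NMI_RE = re.compile(
    r"NMI received for unknown reason (?P<reason>[0-9a-fA-F]+) "
    r"on CPU (?P<cpu>\d+)"
)


def summarize_kernel_log(text):
    """Return (watchdog_events, nmi_events) found in kernel log text.

    watchdog_events: list of (interface, driver, queue) tuples.
    nmi_events: list of (reason_code, cpu) tuples.
    Hypothetical helper; it scans the whole text, so it also works on
    logs whose line breaks were lost when pasting.
    """
    watchdogs = [
        (m.group("iface"), m.group("driver"), int(m.group("queue")))
        for m in WATCHDOG_RE.finditer(text)
    ]
    nmis = [
        (m.group("reason"), int(m.group("cpu")))
        for m in NMI_RE.finditer(text)
    ]
    return watchdogs, nmis


if __name__ == "__main__":
    sample = (
        "Uhhuh. NMI received for unknown reason 2d on CPU 0. "
        "NETDEV WATCHDOG: eth3 (igb): transmit queue 0 timed out"
    )
    print(summarize_kernel_log(sample))
```

Feeding it the output of `dmesg` or a copy of `/var/log/messages` gives a quick list of which interfaces and drivers hit the timeout, which makes it easier to see that both the igb and bnx2 reports here share the same symptom.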
|
Date Modified | Username | Field | Change |
---|---|---|---|
2012-03-26 08:31 | clar2242 | New Issue | |
2012-03-26 11:59 | clar2242 | Note Added: 0014733 | |
2012-08-16 16:04 | alexandre.fontelle@sungard.com | Note Added: 0015670 |