View Issue Details

ID: 0006011
Project: CentOS-6
Category: kernel
View Status: public
Last Update: 2013-06-05 17:10
Reporter: einzelhaft
Assigned To:
Priority: high
Severity: major
Reproducibility: random
Status: resolved
Resolution: fixed
Product Version: 6.3
Summary: 0006011: NETDEV WATCHDOG: (bnx2): transmit queue 0 timed out (resulting in loss of connectivity)
Description: Intermittently (approximately every 7-9 days), the bnx2 interface connected to our iSCSI network stops working. The interface can be recovered via an ifdown/ifup cycle. I've changed to a different physical interface, and the problem persists. See kernel logs in "Additional Information." I'm attaching the oops from the last occurrence, as the one from today didn't seem to make it to disk.
Steps To Reproduce: Seemingly intermittent (though apparently periodic); cannot identify a specific trigger.
Additional Information:
Oct 9 07:55:14 fatboy kernel: ------------[ cut here ]------------
Oct 9 07:55:14 fatboy kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Oct 9 07:55:14 fatboy kernel: Hardware name: PowerEdge 2950
Oct 9 07:55:14 fatboy kernel: NETDEV WATCHDOG: p1p1 (bnx2): transmit queue 0 timed out
Oct 9 07:55:14 fatboy kernel: Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 ext4 jbd2 sunrpc iptable_filter ip_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bnx2 dcdbas microcode serio_raw iTCO_wdt iTCO_vendor_support i5000_edac edac_core i5k_amb sg shpchp ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif usb_storage pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Oct 9 07:55:14 fatboy kernel: Pid: 0, comm: swapper Not tainted 2.6.32-279.el6.x86_64 #1
Oct 9 07:55:14 fatboy kernel: Call Trace:
Oct 9 07:55:14 fatboy kernel: <IRQ> [<ffffffff8106b747>] ? warn_slowpath_common+0x87/0xc0
Oct 9 07:55:14 fatboy kernel: [<ffffffff8106b836>] ? warn_slowpath_fmt+0x46/0x50
Oct 9 07:55:14 fatboy kernel: [<ffffffff814595fd>] ? dev_watchdog+0x26d/0x280
Oct 9 07:55:14 fatboy kernel: [<ffffffff8108ca0d>] ? insert_work+0x6d/0xb0
Oct 9 07:55:14 fatboy kernel: [<ffffffff81459390>] ? dev_watchdog+0x0/0x280
Oct 9 07:55:14 fatboy kernel: [<ffffffff8107e897>] ? run_timer_softirq+0x197/0x340
Oct 9 07:55:14 fatboy kernel: [<ffffffff810a21c0>] ? tick_sched_timer+0x0/0xc0
Oct 9 07:55:14 fatboy kernel: [<ffffffff8102b40d>] ? lapic_next_event+0x1d/0x30
Oct 9 07:55:14 fatboy kernel: [<ffffffff81073ec1>] ? __do_softirq+0xc1/0x1e0
Oct 9 07:55:14 fatboy kernel: [<ffffffff81096c50>] ? hrtimer_interrupt+0x140/0x250
Oct 9 07:55:14 fatboy kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Oct 9 07:55:14 fatboy kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Oct 9 07:55:14 fatboy kernel: [<ffffffff81073ca5>] ? irq_exit+0x85/0x90
Oct 9 07:55:14 fatboy kernel: [<ffffffff81505be0>] ? smp_apic_timer_interrupt+0x70/0x9b
Oct 9 07:55:14 fatboy kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
Oct 9 07:55:14 fatboy kernel: <EOI> [<ffffffff81014877>] ? mwait_idle+0x77/0xd0
Oct 9 07:55:14 fatboy kernel: [<ffffffff8150338a>] ? atomic_notifier_call_chain+0x1a/0x20
Oct 9 07:55:14 fatboy kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
Oct 9 07:55:14 fatboy kernel: [<ffffffff814e433a>] ? rest_init+0x7a/0x80
Oct 9 07:55:14 fatboy kernel: [<ffffffff81c21f7b>] ? start_kernel+0x424/0x430
Oct 9 07:55:14 fatboy kernel: [<ffffffff81c2133a>] ? x86_64_start_reservations+0x125/0x129
Oct 9 07:55:14 fatboy kernel: [<ffffffff81c21438>] ? x86_64_start_kernel+0xfa/0x109
Oct 9 07:55:14 fatboy kernel: ---[ end trace 887de374f1e560e4 ]---
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: intr_sem[0] PCI_CMD[02b8055e]
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: PCI_PM[1d002000] PCI_MISC_CFG[81020088]
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: RPM_MGMT_PKT_CTRL[00000000]
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: HC_STATS_INTERRUPT_STATUS[00000000]
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: <--- start MCP states dump --->
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: MCP_STATE_P0[00000106] MCP_STATE_P1[58f43906]
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: MCP mode[0000b800] state[80000000] evt_mask[00000500]
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: pc[08004d88] pc[080061a0] instr[00401021]
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: shmem states:
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: drv_mb[01030003] fw_mb[00000003] link_status[0000006f] drv_pulse_mb[000011d9]
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: dev_info_signature[44564903] reset_type[01005254] condition[00000106]
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: 000003cc: 44444444 44444444 44444444 00000a00
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: 000003dc: 0004ffff 00000000 00000000 00000000
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: 000003ec: 00000000 00000000 00000000 003727d0
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: DEBUG: 0x3fc[0000ffff]
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: <--- end MCP states dump --->
Oct 9 07:55:14 fatboy kernel: bnx2 0000:0b:00.0: p1p1: NIC Copper Link is Down
Oct 9 07:55:15 fatboy abrt-dump-oops: Reported 1 kernel oopses to Abrt
Oct 9 07:55:15 fatboy abrtd: Directory 'oops-2012-10-09-07:55:15-2257-0' creation detected
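
Since the problem recurs only every several days, it may help to scan saved logs for the watchdog message rather than read them by hand. The following is a minimal sketch, not part of the original report: the function name `find_tx_timeouts` is hypothetical, and the pattern is based only on the message format shown in the excerpt above.

```python
import re

# Matches the watchdog message as it appears in the excerpt above, e.g.
# "NETDEV WATCHDOG: p1p1 (bnx2): transmit queue 0 timed out".
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(log_text):
    """Return (interface, driver, queue) for each watchdog timeout line."""
    events = []
    for line in log_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match:
            events.append((match["iface"], match["driver"], int(match["queue"])))
    return events
```

Passing the contents of a log file to `find_tx_timeouts` yields one tuple per timeout, which makes it easier to see which interface and transmit queue were involved in each occurrence.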
Tags: No tags attached.

Relationships

related to 0006249 (resolved, toracat): WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)

Activities

einzelhaft

2012-10-09 17:00

reporter   ~0015906

Well, ok. I *try* to upload my oops, but I get a 401 error...
einzelhaft

2012-10-09 18:21

reporter   ~0015907

(Ow. Major apologies for duplicates. When I tried submitting with the tarball attached, I got a 401 error that said "Hit back and try again." So I did. :-|)
einzelhaft

2012-12-12 18:08

reporter   ~0016136

Issue occurred again on 12/3/2012.
ComplexMind

2013-03-21 10:45

reporter   ~0016792

I'm seeing this on CentOS 6.4 (kernel 2.6.32-358.2.1.el6.x86_64) on two servers following an update to CentOS 6.4. Both servers previously ran 2.6.32-220.17.1.el6.x86_64 without issues for several months, and had been stable for about two years before that.

The two servers hit the problem with different NICs: bnx2 and netxen_nic, respectively. Both are HP ProLiant DL380 G7 machines with the latest firmware revision.

=======
Mar 17 16:14:04 srv8 kernel: do_IRQ: 14.157 No irq handler for vector (irq -1)
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: <--- start FTQ dump --->
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: RV2P_PFTQ_CTL 00010000
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: RV2P_TFTQ_CTL 00020000
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: RV2P_MFTQ_CTL 00004000
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: TBDR_FTQ_CTL 00004002
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: TDMA_FTQ_CTL 00010002
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: TXP_FTQ_CTL 00010002
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: TXP_FTQ_CTL 00010002
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: TPAT_FTQ_CTL 00010002
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: RXP_CFTQ_CTL 00008000
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: RXP_FTQ_CTL 00100000
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: COM_COMXQ_FTQ_CTL 00010000
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: COM_COMTQ_FTQ_CTL 00020000
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: COM_COMQ_FTQ_CTL 00010000
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: CP_CPQ_FTQ_CTL 00004000
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: CPU states:
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 045000 mode b84c state 80001000 evt_mask 500 pc 8001284 pc 8001294 instr 38640001
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 085000 mode b84c state 80001000 evt_mask 500 pc 8000a5c pc 8000a4c instr 10400016
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c10 pc 8004c10 instr 10e00088
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 105000 mode b8cc state 80000000 evt_mask 500 pc 8000a8c pc 8000a98 instr 10620021
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 145000 mode b880 state 80004000 evt_mask 500 pc 800adac pc 8000c68 instr 30420001
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 185000 mode b8cc state 80004000 evt_mask 500 pc 8000c6c pc 8000c58 instr 1092823
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: <--- end FTQ dump --->
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: <--- start TBDC dump --->
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: TBDC free cnt: 32
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: LINE CID BIDX CMD VALIDS
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 00 001300 8458 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 01 001200 5020 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 02 000800 3b38 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 03 001200 4c48 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 04 001200 3080 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 05 001080 71d8 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 06 000800 d090 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 07 001100 edd0 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 08 001100 edd8 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 09 001200 0078 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 0a 001200 0080 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 0b 001200 0090 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 0c 001200 0098 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 0d 001200 00a8 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 0e 001200 00b0 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 0f 001200 00b8 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 10 001200 00c0 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 11 001200 00c8 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 12 001200 00d0 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 13 001200 00d8 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 14 001200 00e0 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 15 001200 00e8 00 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 16 0dcb80 cfe8 ef [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 17 0bfd80 c7f8 ff [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 18 0f9f80 59b8 ad [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 19 1ddf80 ffe8 f7 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 1a 1fff80 f768 b7 [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 1b 1e7f80 d5d8 ff [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 1c 1ff700 bdf8 ea [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 1d 1efb80 eff8 5f [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 1e 0bde80 3ef8 db [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: 1f 1ffe80 f378 9f [0]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: <--- end TBDC dump --->
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: intr_sem[0] PCI_CMD[00100446]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: HC_STATS_INTERRUPT_STATUS[01bf0040]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: PBA[00000000]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: <--- start MCP states dump --->
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: MCP mode[0000b880] state[80004000] evt_mask[00000500]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: pc[0800d83c] pc[08003d50] instr[00051340]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: shmem states:
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: drv_mb[01030009] fw_mb[00000009] link_status[0000006f] drv_pulse_mb[00006660]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: dev_info_signature[44564903] reset_type[01005254] condition[0003610e]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 000001c0: 01005254 42530085 0003610e 00000000
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 000003cc: 44444444 44444444 44444444 00000a3c
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 000003dc: 0ffeffff 0000ffff ffffffff ffffffff
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 000003ec: 00000000 00000000 00000000 00000002
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 0x3fc[0000ffff]
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: <--- end MCP states dump --->
Mar 17 16:14:13 srv8 kernel: bnx2 0000:03:00.0: eth0: NIC Copper Link is Down
Mar 17 16:14:13 srv8 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Mar 17 16:14:13 srv8 kernel: bonding: bond0: now running without any active interface !
Mar 17 16:14:17 srv8 kernel: bnx2 0000:03:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex
Mar 17 16:14:17 srv8 kernel: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.
Mar 17 16:14:17 srv8 kernel: bonding: bond0: making interface eth0 the new active one.
Mar 17 16:14:17 srv8 kernel: bonding: bond0: first active interface up!
=======

=======
Mar 19 12:10:47 srv9 kernel: do_IRQ: 5.207 No irq handler for vector (irq -1)
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: <--- start FTQ dump --->
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: RV2P_PFTQ_CTL 00010000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: RV2P_TFTQ_CTL 00020000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: RV2P_MFTQ_CTL 00004000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: TBDR_FTQ_CTL 00004002
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: TDMA_FTQ_CTL 00010002
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: TXP_FTQ_CTL 00010000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: TXP_FTQ_CTL 00010000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: TPAT_FTQ_CTL 00010000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: RXP_CFTQ_CTL 00008000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: RXP_FTQ_CTL 00100000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: COM_COMXQ_FTQ_CTL 00010000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: COM_COMTQ_FTQ_CTL 00020000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: COM_COMQ_FTQ_CTL 00010000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: CP_CPQ_FTQ_CTL 00004002
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: CPU states:
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 045000 mode b84c state 80001000 evt_mask 500 pc 8001284 pc 8001288 instr 8e030000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 085000 mode b84c state 80001000 evt_mask 500 pc 8000a4c pc 8000a5c instr 38420001
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c1c pc 8004c10 instr 32050003
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 105000 mode b8cc state 80008000 evt_mask 500 pc 8000b28 pc 8000a98 instr 8c530000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 145000 mode b880 state 80000000 evt_mask 500 pc 800d1b4 pc 8001c04 instr 31a80
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 185000 mode b8cc state 80000000 evt_mask 500 pc 8000c58 pc 8000c58 instr 8ce800e8
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: <--- end FTQ dump --->
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: <--- start TBDC dump --->
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: TBDC free cnt: 32
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: LINE CID BIDX CMD VALIDS
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 00 000800 0f10 00 [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 01 001180 7208 00 [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 02 001300 72c0 00 [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 03 001080 bb80 00 [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 04 001280 9e28 00 [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 05 000800 7d30 00 [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 06 001180 1920 00 [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 07 001080 ff10 00 [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 08 1eee00 7df8 ff [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 09 1fed80 fbf8 7b [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 0a 1a6a80 9dd0 f2 [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 0b 1fb180 eff0 f8 [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 0c 17db80 eed8 7f [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 0d 0f5380 9be8 7b [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 0e 1fbb00 fff0 7f [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 0f 1a7b00 feb8 7e [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 10 156380 6ff8 d8 [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 11 0faf80 67f8 fe [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 12 05e780 fda8 8b [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 13 15f780 afb8 df [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 14 1f6700 3f80 fb [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 15 1bff80 7fe0 f9 [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 16 11fe80 f3b0 ef [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 17 0e7980 fff8 9a [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 18 1fff80 7fd0 fe [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 19 1fbe80 37f0 bb [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 1a 1b0880 77a8 de [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 1b 0eef00 bfe0 ab [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 1c 1d5c80 57b8 bf [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 1d 17e600 ff38 4f [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 1e 1bef80 b7f8 df [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: 1f 13f480 bff8 7f [0]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: <--- end TBDC dump --->
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: intr_sem[0] PCI_CMD[00100446]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: HC_STATS_INTERRUPT_STATUS[01fd0002]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: PBA[00000000]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: <--- start MCP states dump --->
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: MCP mode[0000b880] state[80000000] evt_mask[00000500]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: pc[08001d60] pc[0800d7e4] instr[8f640428]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: shmem states:
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: drv_mb[01030003] fw_mb[00000003] link_status[0000006f] drv_pulse_mb[00004b9b]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: dev_info_signature[44564903] reset_type[01005254] condition[0003610e]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 000001c0: 01005254 42530083 0003610e 00000000
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 000003cc: 44444444 44444444 44444444 00000a3c
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 000003dc: 0ffeffff 0000ffff ffffffff ffffffff
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 000003ec: 00000000 00000000 00000000 00000002
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 0x3fc[0000ffff]
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: <--- end MCP states dump --->
Mar 19 12:10:53 srv9 kernel: bnx2 0000:03:00.0: eth0: NIC Copper Link is Down
Mar 19 12:10:53 srv9 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Mar 19 12:10:53 srv9 kernel: bonding: bond0: making interface eth5 the new active one.
Mar 19 12:10:57 srv9 kernel: bnx2 0000:03:00.0: eth0: NIC Copper Link is Up, 1000 Mbps full duplex
Mar 19 12:10:57 srv9 kernel: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.
=======
ComplexMind

2013-03-21 11:41

reporter   ~0016793

Oops - it's not clear from my paste above that it's the same error. Here is an earlier crash with the transmit queue timed out message:

Mar 17 09:46:44 srv8 kernel: do_IRQ: 9.73 No irq handler for vector (irq -1)
Mar 17 09:46:50 srv8 kernel: ------------[ cut here ]------------
Mar 17 09:46:50 srv8 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Mar 17 09:46:50 srv8 kernel: Hardware name: ProLiant DL380 G7
Mar 17 09:46:50 srv8 kernel: NETDEV WATCHDOG: eth0 (bnx2): transmit queue 2 timed out
Mar 17 09:46:50 srv8 kernel: Modules linked in: dlm configfs sunrpc cpufreq_ondemand freq_table pcc_cpufreq bonding 8021q garp stp llc ipv6 power_meter ses enclosure igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support hpwdt hpilo bnx2 i7core_edac edac_core shpchp ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Mar 17 09:46:50 srv8 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-358.2.1.el6.x86_64 #1
Mar 17 09:46:50 srv8 kernel: Call Trace:
Mar 17 09:46:50 srv8 kernel: <IRQ> [<ffffffff8106e2e7>] ? warn_slowpath_common+0x87/0xc0
Mar 17 09:46:50 srv8 kernel: [<ffffffff8106e3d6>] ? warn_slowpath_fmt+0x46/0x50
Mar 17 09:46:50 srv8 kernel: [<ffffffff81467a9d>] ? dev_watchdog+0x26d/0x280
Mar 17 09:46:50 srv8 kernel: [<ffffffff813b5760>] ? rh_timer_func+0x0/0x10
Mar 17 09:46:50 srv8 kernel: [<ffffffff813b4f40>] ? usb_hcd_poll_rh_status+0x140/0x180
Mar 17 09:46:50 srv8 kernel: [<ffffffff81467830>] ? dev_watchdog+0x0/0x280
Mar 17 09:46:50 srv8 kernel: [<ffffffff81081837>] ? run_timer_softirq+0x197/0x340
Mar 17 09:46:50 srv8 kernel: [<ffffffff810a7ff0>] ? tick_sched_timer+0x0/0xc0
Mar 17 09:46:50 srv8 kernel: [<ffffffff8102e94d>] ? lapic_next_event+0x1d/0x30
Mar 17 09:46:50 srv8 kernel: [<ffffffff81076fb1>] ? __do_softirq+0xc1/0x1e0
Mar 17 09:46:50 srv8 kernel: [<ffffffff8109b77b>] ? hrtimer_interrupt+0x14b/0x260
Mar 17 09:46:50 srv8 kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
Mar 17 09:46:50 srv8 kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
Mar 17 09:46:50 srv8 kernel: [<ffffffff81076d95>] ? irq_exit+0x85/0x90
Mar 17 09:46:50 srv8 kernel: [<ffffffff81517000>] ? smp_apic_timer_interrupt+0x70/0x9b
Mar 17 09:46:50 srv8 kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
Mar 17 09:46:50 srv8 kernel: <EOI> [<ffffffff812d397e>] ? intel_idle+0xde/0x170
Mar 17 09:46:50 srv8 kernel: [<ffffffff812d3961>] ? intel_idle+0xc1/0x170
Mar 17 09:46:50 srv8 kernel: [<ffffffff81415117>] ? cpuidle_idle_call+0xa7/0x140
Mar 17 09:46:50 srv8 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Mar 17 09:46:50 srv8 kernel: [<ffffffff814f31aa>] ? rest_init+0x7a/0x80
Mar 17 09:46:50 srv8 kernel: [<ffffffff81c27f7b>] ? start_kernel+0x424/0x430
Mar 17 09:46:50 srv8 kernel: [<ffffffff81c2733a>] ? x86_64_start_reservations+0x125/0x129
Mar 17 09:46:50 srv8 kernel: [<ffffffff81c27438>] ? x86_64_start_kernel+0xfa/0x109
Mar 17 09:46:50 srv8 kernel: ---[ end trace e8421d5d72b8987b ]---
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: <--- start FTQ dump --->
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: RV2P_PFTQ_CTL 00010000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: RV2P_TFTQ_CTL 00020000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: RV2P_MFTQ_CTL 00004000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: TBDR_FTQ_CTL 00004002
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: TDMA_FTQ_CTL 00010000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: TXP_FTQ_CTL 00010000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: TXP_FTQ_CTL 00010000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: TPAT_FTQ_CTL 00010000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: RXP_CFTQ_CTL 00008000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: RXP_FTQ_CTL 00100000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: COM_COMXQ_FTQ_CTL 00010000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: COM_COMTQ_FTQ_CTL 00020000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: COM_COMQ_FTQ_CTL 00010000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: CP_CPQ_FTQ_CTL 00004000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: CPU states:
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 045000 mode b84c state 80001000 evt_mask 500 pc 8001294 pc 800128c instr 8e260000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 085000 mode b84c state 80009000 evt_mask 500 pc 8000a4c pc 8000a5c instr 1440fffc
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c14 pc 8004c1c instr 32050003
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 105000 mode b8cc state 80000000 evt_mask 500 pc 8000a98 pc 8000a9c instr 10620021
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 145000 mode b880 state 80000000 evt_mask 500 pc 8000100 pc 800d194 instr 9063f1dc
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 185000 mode b8cc state 80004000 evt_mask 500 pc 8000c50 pc 800092c instr 3bf90001
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: <--- end FTQ dump --->
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: <--- start TBDC dump --->
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: TBDC free cnt: 32
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: LINE CID BIDX CMD VALIDS
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 00 001280 0e40 00 [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 01 001200 ce38 00 [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 02 001000 bb78 00 [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 03 000800 e7a8 00 [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 04 001300 45c0 00 [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 05 001280 8bb8 00 [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 06 001100 4350 00 [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 07 0ee680 ffd0 3a [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 08 1d6c80 cbc8 6f [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 09 1f5f80 fbf8 5e [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 0a 17fc80 77f8 ef [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 0b 1fa180 df38 dd [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 0c 0bef80 f9f0 bf [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 0d 1e6f00 fed8 ff [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 0e 13dc80 bff8 bf [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 0f 1f7c80 7fa0 f7 [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 10 0ff780 3e90 ff [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 11 1dfd80 dff0 de [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 12 1eed80 3bf8 be [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 13 15ef00 fce8 fe [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 14 053f80 df78 5b [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 15 0f7f80 fdc8 da [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 16 0dcb80 cfe8 ef [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 17 0bfd80 c7f8 ff [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 18 0f9f80 59b8 ad [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 19 1ddf80 ffe8 f7 [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 1a 1fff80 f768 b7 [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 1b 1e7f80 d5d8 ff [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 1c 1ff700 bdf8 ea [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 1d 1efb80 eff8 5f [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 1e 0bde80 3ef8 db [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: 1f 1ffe80 f378 9f [0]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: <--- end TBDC dump --->
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: intr_sem[0] PCI_CMD[00100446]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: HC_STATS_INTERRUPT_STATUS[01fb0004]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: PBA[00000000]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: <--- start MCP states dump --->
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: MCP mode[0000b880] state[80004000] evt_mask[00000500]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: pc[0800d1c8] pc[0800afe0] instr[0362d826]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: shmem states:
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: drv_mb[01030003] fw_mb[00000003] link_status[0000006f] drv_pulse_mb[00000b95]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: dev_info_signature[44564903] reset_type[01005254] condition[0003610e]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 000001c0: 01005254 42530088 0003610e 00000000
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 000003cc: 44444444 44444444 44444444 00000a3c
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 000003dc: 0ffeffff 0000ffff ffffffff ffffffff
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 000003ec: 00000000 00000000 00000000 00000002
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: DEBUG: 0x3fc[0000ffff]
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: <--- end MCP states dump --->
Mar 17 09:46:50 srv8 kernel: bnx2 0000:03:00.0: eth0: NIC Copper Link is Down
Mar 17 09:46:50 srv8 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Mar 17 09:46:50 srv8 kernel: bonding: bond0: now running without any active interface !
Mar 17 09:46:51 srv8 abrtd: Directory 'oops-2013-03-17-09:46:51-4320-1' creation detected
toracat

2013-04-24 16:07

manager   ~0017293

There is a good chance that this bug was fixed in the latest kernel, 2.6.32-358.6.1.el6. Could those affected by this issue please update the kernel and report back?
ComplexMind

2013-04-24 16:28

reporter   ~0017295

Thanks for the update. We will schedule maintenance for an update of this machine and report back.

Out of interest, I looked through the changelog for the RPM and didn't see an obvious candidate. Did you have a change in mind or is the suggestion speculative?

Thanks again!
toracat

2013-04-24 16:37

manager   ~0017298

Cannot say for sure but the issue seems to be related to:

https://bugzilla.redhat.com/show_bug.cgi?id=887006

and

https://access.redhat.com/site/solutions/110053

There, it was resolved in kernel 2.6.32-358.6.1.el6.
ComplexMind

2013-04-24 17:02

reporter   ~0017299

Excellent, thanks. Yes, I can see that in the changelog:

- [x86] irq: add quirk for broken interrupt remapping on 55XX chipsets (Neil Horman) [911267 887006]

And of course that affects me:

[root@srv8 ~]# lspci | grep 55
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 13)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13)
00:02.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 2 (rev 13)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13)
00:04.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 4 (rev 13)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 13)
00:06.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 6 (rev 13)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13)
00:08.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 8 (rev 13)
00:09.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 13)
00:0a.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 10 (rev 13)
00:0d.4 Host bridge: Intel Corporation 7500/5520/5500/X58 Physical Layer Port 0 (rev 13)
00:0d.5 Host bridge: Intel Corporation 7500/5520/5500 Physical Layer Port 1 (rev 13)
00:14.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers (rev 13)
00:14.1 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 13)
00:14.2 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 13)

Fingers crossed, will report back!

M
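
For anyone else wanting to check their own hardware, here is a rough sketch of a narrower filter than `grep 55`. The function name `list_55xx_hubs` is made up for illustration, and the pattern only matches lines worded like the lspci output above. It does not check the revision or confirm that the erratum actually applies.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci` output on stdin and prints only the
# lines that mention an Intel 5500/5520/X58 I/O Hub.
# Assumption: device descriptions are worded as in the listing above.
list_55xx_hubs() {
    grep -E 'Intel Corporation .*(5500|5520|X58) I/O Hub'
}

# Usage on a live system:
#   lspci | list_55xx_hubs
```

If this prints nothing, the machine may still be affected under a differently worded description, so treat an empty result as inconclusive.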
ComplexMind

2013-05-21 17:28

reporter   ~0017484

I'm ready to report back on this issue... Following kernel and BIOS updates, I now get this error in messages:

[Hardware Error]: This system BIOS has enabled interrupt remapping on a chipset that contains an errata making that feature unstable.  Please reboot with nointremap added to the kernel command line and contact your BIOS vendor for an update

Clearly the bugfix is to emit a warning on affected systems. We have a window scheduled to reboot the nodes with the 'nointremap' option.

I will update in the next few days once we have done this...

M
leifh

2013-06-05 10:51

reporter   ~0017526

We run a DL380 G7 that showed the above symptoms. Rebooted yesterday to the latest kernel (2.6.32-358.6.2.el6.x86_64) with the parameter intremap=off (nointremap is deprecated). Since then, no network problems and no more of the annoying "No irq handler for vector (irq -1)" messages in dmesg.

We also had the IRQ message on one of five DL380 G6 machines. They are also running the latest kernel with the parameter intremap=off. No more messages since the reboots on Monday.
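
In case it helps others applying the same workaround, below is a minimal sketch for verifying that a kernel command line actually carries the parameter. The function name `has_intremap_off` is hypothetical. The commented usage lines assume a stock EL6 system with `grubby` installed and are not run by the snippet itself.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the kernel command line passed
# as $1 contains intremap=off or the deprecated nointremap.
has_intremap_off() {
    case " $1 " in
        *" intremap=off "*|*" nointremap "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system (not executed here):
#   has_intremap_off "$(cat /proc/cmdline)" && echo "interrupt remapping disabled"
#   dmesg | grep -c 'No irq handler for vector'
# Adding the parameter to all installed kernels (assumes grubby is present):
#   grubby --update-kernel=ALL --args="intremap=off"
```

The check only inspects the command line string. Whether the symptoms stay away still has to be confirmed from the logs over time.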
ComplexMind

2013-06-05 11:07

reporter   ~0017527

Can confirm all affected systems have been rebooted with the parameter intremap=off, and no further problems in 12 days...
toracat

2013-06-05 16:23

manager   ~0017530

Although the OP (einzelhaft) has not responded, I'm going to close this as 'resolved' based on the reports by others. Feel free to reopen or start a new one if there is still a problem that has to be addressed.

Issue History

Date Modified Username Field Change
2012-10-09 17:00 einzelhaft New Issue
2012-10-09 17:00 einzelhaft Note Added: 0015906
2012-10-09 18:21 einzelhaft Note Added: 0015907
2012-12-12 18:08 einzelhaft Note Added: 0016136
2013-03-21 10:45 ComplexMind Note Added: 0016792
2013-03-21 11:41 ComplexMind Note Added: 0016793
2013-04-24 16:07 toracat Note Added: 0017293
2013-04-24 16:28 ComplexMind Note Added: 0017295
2013-04-24 16:30 toracat Status new => feedback
2013-04-24 16:30 toracat Relationship added related to 0006249
2013-04-24 16:37 toracat Note Added: 0017298
2013-04-24 17:02 ComplexMind Note Added: 0017299
2013-05-21 17:28 ComplexMind Note Added: 0017484
2013-06-05 10:51 leifh Note Added: 0017526
2013-06-05 11:07 ComplexMind Note Added: 0017527
2013-06-05 16:23 toracat Note Added: 0017530
2013-06-05 16:24 toracat Status feedback => resolved
2013-06-05 16:24 toracat Resolution open => fixed