View Issue Details

ID: 0006814
Project: CentOS-6
Category: kernel
View Status: public
Last Update: 2013-12-04 18:41
Reporter: kstange
Priority: high
Severity: major
Reproducibility: always
Status: closed
Resolution: duplicate
Product Version: 6.4
Target Version:
Fixed in Version:
Summary: 0006814: Major Failure of Intel 82574L under kernel-2.6.32-431.el6
Description: Since updating a system with a SuperMicro X8SIE-F board with dual Intel 82574L NICs to kernel-2.6.32-431.el6 and rebooting, I have been having consistent NIC failures where the NIC shuts down permanently until a soft reboot is performed.

Once the problem manifests, the following log messages appear:

Dec 3 13:03:09 updater kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Dec 3 13:03:09 updater kernel: Modules linked in: hcpdriver(P)(U) ip6table_filter ip6_tables iptable_filter ip_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf iTCO_wdt iTCO_vendor_support microcode serio_raw i2c_i801 i2c_core sg lpc_ich mfd_core e1000e ptp pps_core ext4 jbd2 mbcache raid1 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: hcpdriver]
Dec 3 13:03:09 updater kernel: e1000e 0000:03:00.0: eth0: Reset adapter unexpectedly
Dec 3 13:03:10 updater kernel: e1000e 0000:03:00.0: eth0: Timesync Tx Control register not set as expected

After this it will not be possible to bring the NIC back up because of this Timesync register being incorrectly set.
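For anyone checking their own logs against this report, the three messages above can be matched mechanically. The following is only a minimal sketch: the regular expressions are derived from the log excerpt in this report, and the function name `find_e1000e_failures` is hypothetical, not part of any existing tool.

```python
import re

# Patterns derived from the three kernel messages quoted in this report.
# Each one captures the interface name (e.g. "eth0") in group 1.
SIGNATURES = (
    re.compile(r"NETDEV WATCHDOG: (\S+) \(e1000e\): transmit queue \d+ timed out"),
    re.compile(r"e1000e \S+ (\S+): Reset adapter unexpectedly"),
    re.compile(r"e1000e \S+ (\S+): Timesync Tx Control register not set as expected"),
)


def find_e1000e_failures(lines):
    """Return (interface, line) pairs for lines matching a known signature."""
    hits = []
    for line in lines:
        for pattern in SIGNATURES:
            match = pattern.search(line)
            if match:
                hits.append((match.group(1), line.rstrip("\n")))
                break  # one hit per line is enough
    return hits
```

It could be run over an opened syslog file, for example `find_e1000e_failures(open("/var/log/messages"))`, assuming that is where kernel messages are logged on the system in question.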

I determined that a current workaround is to set the kernel option pcie_aspm=off at boot. Another valid workaround is to revert to the 2.6.32-358.23.2.el6 kernel.
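To confirm that the pcie_aspm=off workaround is actually active on a running system, the kernel command line can be inspected. This is a rough sketch, assuming a Linux system that exposes the boot parameters in /proc/cmdline; the helper names `aspm_disabled` and `read_cmdline` are hypothetical.

```python
def aspm_disabled(cmdline):
    """Return True if the kernel command line contains pcie_aspm=off."""
    # Split on whitespace so that only a whole parameter matches.
    return "pcie_aspm=off" in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the boot parameters of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On an affected machine, `aspm_disabled(read_cmdline())` should return True once the option has been added to the boot loader entry and the system has been rebooted.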

This problem has been affecting the install images for over-the-network installs onto these motherboards as well as previously operating Linux installs. Booting the installer with pcie_aspm=off allows us to complete the installation.

I have not yet located any other reports of these specific messages from any CentOS or RHEL kernel user, but I have found one reference to these messages here:

http://www.mail-archive.com/pve-user@pve.proxmox.com/msg01714.html

This post references a RH bugzilla entry about a Fedora kernel but I believe the user is incorrect about the issue being the same. The RH bug does not reference the same issue as the kernel messages are different.

My impression is that this is a new and unreported issue in the 6.5 kernel. Should I report this directly into the RHEL bugzilla?
Tags: No tags attached.

Relationships

duplicate of | 0006810 | closed | toracat | LAN Driver Crash after update and reboot

Activities

kstange

2013-12-04 05:45

reporter   ~0018547

I just noticed that bug 0006810 exists about a similar issue. It refers initially to a Broadcom driver. I am not certain whether these are the same issue or coincidental problems with different drivers that both happened to receive updates in this kernel.

I am not sure I accept that my report is a duplicate of 0006810, but please mark this as such if you feel strongly that they are the same issue.

I am not getting 2 hours out of my server in these situations; I lose my NIC almost immediately once the server pushes traffic.
toracat

2013-12-04 17:04

manager   ~0018555

I'd personally prefer putting all bugs together that display the same (or very similar) symptoms. There could be multiple different causes even if the errors look similar. But proposing potential fixes etc will be easier if all are in the same thread.

This bug report can remain separate if you so wish. Please monitor bug #6810 for any progress.
kstange

2013-12-04 18:37

reporter   ~0018560

You can close this as a duplicate, I guess, since the troubleshooting effort for the e1000e problem is happening in #0006810.
toracat

2013-12-04 18:41

manager   ~0018561

Closing as a dupe of #6810.

Issue History

Date Modified Username Field Change
2013-12-04 05:35 kstange New Issue
2013-12-04 05:45 kstange Note Added: 0018547
2013-12-04 17:04 toracat Note Added: 0018555
2013-12-04 18:37 kstange Note Added: 0018560
2013-12-04 18:41 toracat Relationship added duplicate of 0006810
2013-12-04 18:41 toracat Note Added: 0018561
2013-12-04 18:41 toracat Status new => closed
2013-12-04 18:41 toracat Resolution open => duplicate