View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0006814||CentOS-6||kernel||public||2013-12-04 05:35||2013-12-04 18:41|
|Summary||0006814: Major Failure of Intel 82574L under kernel-2.6.32-431.el6|
|Description||Since updating a system with a SuperMicro X8SIE-F board with dual Intel 82574L NICs to kernel-2.6.32-431.el6 and rebooting, I have been having consistent NIC failures where the NIC shuts down permanently until a soft reboot is performed.|
Once the problem manifests, the following log messages appear:
Dec 3 13:03:09 updater kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Dec 3 13:03:09 updater kernel: Modules linked in: hcpdriver(P)(U) ip6table_filter ip6_tables iptable_filter ip_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf iTCO_wdt iTCO_vendor_support microcode serio_raw i2c_i801 i2c_core sg lpc_ich mfd_core e1000e ptp pps_core ext4 jbd2 mbcache raid1 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: hcpdriver]
Dec 3 13:03:09 updater kernel: e1000e 0000:03:00.0: eth0: Reset adapter unexpectedly
Dec 3 13:03:10 updater kernel: e1000e 0000:03:00.0: eth0: Timesync Tx Control register not set as expected
After this, the NIC cannot be brought back up because the Timesync Tx Control register is left incorrectly set.
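To check other machines for the same failure, the three log messages above can be matched mechanically. The following is only a minimal sketch: the function name `find_e1000e_failures` and the signature labels are made up for illustration, and the patterns are taken from the log lines quoted in this report, so they may not match the wording on other kernel versions.

```python
import re

# Patterns copied from the kernel messages quoted in this report.
# The labels (tx_timeout, reset, timesync) are illustrative names only.
SIGNATURES = {
    "tx_timeout": re.compile(
        r"NETDEV WATCHDOG: (\S+) \(e1000e\): transmit queue \d+ timed out"
    ),
    "reset": re.compile(
        r"e1000e \S+: (\S+): Reset adapter unexpectedly"
    ),
    "timesync": re.compile(
        r"e1000e \S+: (\S+): Timesync Tx Control register not set as expected"
    ),
}


def find_e1000e_failures(lines):
    """Map each interface name to the set of failure signatures seen for it."""
    seen = {}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                # group(1) is the interface name, e.g. eth0
                seen.setdefault(match.group(1), set()).add(name)
    return seen
```

On a CentOS 6 system the input would typically be the lines of `/var/log/messages` (assuming the default syslog configuration). An interface showing all three signatures matches the failure described here.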
I determined that a current workaround is to set the kernel option pcie_aspm=off at boot. Another valid workaround is to revert to the 2.6.32-318.104.22.168l6 kernel.
This problem has been affecting the install images for over-the-network installs onto these motherboards as well as previously operating Linux installs. Booting the installer with pcie_aspm=off allows us to complete the installation.
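When applying the workaround across several machines, it helps to confirm that the option actually reached the running kernel. A minimal sketch, assuming the standard Linux `/proc/cmdline` interface. The helper names `aspm_disabled` and `read_cmdline` are hypothetical:

```python
def aspm_disabled(cmdline):
    """Return True if the kernel command line contains pcie_aspm=off."""
    # Split on whitespace so that only a whole parameter matches.
    return "pcie_aspm=off" in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()
```

Calling `aspm_disabled(read_cmdline())` after a reboot shows whether the workaround is in effect. It does not show whether the NIC is healthy.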
I have not yet located any other reports of these specific messages from any CentOS or RHEL kernel user, but I have found one reference to these messages here:
This post references a RH bugzilla entry about a Fedora kernel, but I believe the user is incorrect about the issue being the same. The RH bug does not describe the same issue, as its kernel messages are different.
My impression is that this is a new and unreported issue in the 6.5 kernel. Should I report this directly into the RHEL bugzilla?
|Tags||No tags attached.|
I just noticed that bug 0006810 exists about a similar issue. It refers initially to a Broadcom driver. I am not certain whether these are the same issue or coincidental problems with different drivers that both happened to receive updates in this kernel.
I am not sure I accept that my report is a duplicate of 0006810, but please mark this as such if you feel strongly they are the same issue.
I am not getting 2 hours out of my server in these situations; I lose my NIC almost as soon as the server pushes traffic.
I'd personally prefer putting all bugs together that display the same (or very similar) symptoms. There could be multiple different causes even if the errors look similar. But proposing potential fixes etc will be easier if all are in the same thread.
This bug report can remain separate if you so wish. Please monitor bug #6810 for any progress.
|You can close this as a duplicate, I guess, since the troubleshooting effort for the e1000e problem is happening in #0006810.|
|Closing as a dupe of #6810.|
|2013-12-04 05:35||kstange||New Issue|
|2013-12-04 05:45||kstange||Note Added: 0018547|
|2013-12-04 17:04||toracat||Note Added: 0018555|
|2013-12-04 18:37||kstange||Note Added: 0018560|
|2013-12-04 18:41||toracat||Relationship added||duplicate of 0006810|
|2013-12-04 18:41||toracat||Note Added: 0018561|
|2013-12-04 18:41||toracat||Status||new => closed|
|2013-12-04 18:41||toracat||Resolution||open => duplicate|