View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0015195||CentOS-7||kernel||public||2018-08-20 09:11||2020-02-13 17:19|
|Platform||Supermicro||OS||CentOs 7||OS Version||7.4.1708|
|Target Version||Fixed in Version|
|Summary||0015195: ixgbe driver 5.1.0-k-rh7.5 causes network instability and the machine eventually loses connectivity|
|Description||Several servers with Intel(R) 82599 10 Gigabit Dual Port network cards experience network instability that eventually causes them to lose connectivity. The machines use ixgbe driver version 5.1.0-k-rh7.5, as provided by the CentOS 7 kernel 3.10.0-862. The OS does not report any particular error in /var/log/messages.|
After updating the driver to version 5.3.7, the issue appears to be solved. However, the driver has to be recompiled for each new kernel version.
Would it be possible to ship upcoming kernel releases with ixgbe driver version 5.3.7?
|Steps To Reproduce||* Install CentOS 7 with a kernel from the 3.10.0-862 family on a machine with an Intel(R) 82599 10 Gigabit Dual Port network card|
* Let the machine work normally, doing transfers over the network
* The machine eventually loses connectivity after a few hours
|Tags||centos 7, drivers, kernel, network|
CentOS only rebuilds the kernel SRPM provided by Red Hat for RHEL. You would need to get Red Hat to update it in the RHEL SRPM by reporting it on bugzilla.redhat.com
I'd also say this is not a universal problem. I have Intel x710 cards in use 24 hours a day and haven't seen any problems. Are you using the latest firmware for your cards? Current kernel version is 3.10.0-862.11.6.el7
Thank you very much.
We are using the latest kernel version, 3.10.0-862.11.6.el7.x86_64, and we try to keep the firmware up to date.
We will proceed to report the issue to bugzilla.redhat.com.
cacosta, did you ever find any resolution to this?
I am having the same problem in RHEL, and it all started with the 5.1.0-k-rh7.5 driver update. When using the upstream Intel drivers I don't experience the issue at all. I have been working with Red Hat support on this issue for months.
Can I ask, were/are you using bonded NICs on the CentOS system? If so, are you using teamd or traditional bonding? If teamd, what runner are you using? Can I also ask whether you are using the standard 1500 MTU or jumbo frames?
Did you ever open a Red Hat bugzilla?
"When using the upstream Intel drivers I don't experience the issue at all." <== Which version is this?
I looked at the Intel site. The latest version is 5.5.5. ELRepo has the kmod-ixgbe package with this version of the Intel driver. It is being released to the elrepo-testing repository. It will show up in https://elrepo.org/linux/testing/el7/x86_64/RPMS/ soon. Please test if you are able.
@cacosta, can you also give it a try?
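For anyone testing, here is a minimal sketch for confirming which ixgbe driver version an interface is actually running after installing a package. It assumes the usual `key: value` layout of `ethtool -i` output; the helper name `driver_version_from_ethtool` is made up for illustration, and `eno2` is just the interface name that appears later in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -i <iface>` output on stdin and print
# "<driver> <version>". Assumes lines of the form "key: value".
driver_version_from_ethtool() {
    awk -F': ' '
        $1 == "driver"  { d = $2 }   # e.g. ixgbe
        $1 == "version" { v = $2 }   # exact match, so "firmware-version" is skipped
        END { print d, v }'
}

# Example on a live system (interface name is an assumption):
#   ethtool -i eno2 | driver_version_from_ethtool
```

If the printed version is still the in-kernel one (for example 5.1.0-k-rh7.5), the newly installed module has probably not been loaded yet.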
The version I compiled and tested was 5.5.3, but I'm sure 5.5.5 will be fine too.
I checked elrepo a while back to see if a kmod-ixgbe was available for el7 and didn't find one, so I thought perhaps it was not being offered anymore. I was considering making a request. :)
By the way, my colleagues and I, at MIT, are big fans of elrepo. We could not survive without it. Thanks for all your hard work!
Thanks for the accolade. :)
We opened a bugzilla and they confirmed to us that the ixgbe driver is scheduled to be updated to the latest upstream code for
We are using traditional bondings and jumbo frames, yes.
The package installed successfully and no issues yet with 5.5.5. Thanks again!
Appreciate the info. I suspect the problem might be triggered by bonding/teaming. I have another system using the same NIC, but only a single link configuration, and it's not having problems with the Red Hat drivers.
Thanks for the confirmation. I will promote the kmod-ixgbe package to the main repository.
Toracat's package has worked perfectly to resolve this issue for us. Nevertheless, I continue to work with Red Hat to help them identify and fix the problem with the driver provided in the RHEL kernel.
Red Hat has provided me with a RHEL 7.7 pre-beta kernel that contains updates to the ixgbe driver code (5.1.0-k-rh7.7). However, the network problems were still present. I was forced to roll back to the RHEL 7.6 kernel and kmod-ixgbe package.
Red Hat's engineering team further states that the RHEL 7.7 ixgbe driver has received all backports from the driver in the upstream kernel, with the exception of ones that would not be compatible with the RHEL kernel version. Which changes from Intel's official driver code have or have not made it into the upstream kernel is outside of their control. So, it is unclear at this point where and how the bug was introduced.
I have provided Red Hat with extensive hardware and software information about our system and network infrastructure as they attempt to reproduce the issue. @cacosta I would really be interested in learning more about your setup, if you're willing to share, to see if we can identify areas we have in common. For example:
What SuperMicro motherboard model?
Are you using onboard Intel 82599 10Gb ports, or, PCI-E card? What card model?
Brand and model of network switch? Stacked (redundant) switches?
Switchport configuration? LACP active?
Can you provide any detail about the function of your system? Software it runs?
I would be happy to discuss in private, over email or phone, if you would rather. Thanks!
Here is our information:
* What SuperMicro motherboard model?
We see this problem in different Supermicro models with different motherboards: X9DR3-F and X10DSC+
* Are you using onboard Intel 82599 10Gb ports, or, PCI-E card? What card model?
Intel card used is Intel 82599ES:
Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
* Brand and model of network switch? Stacked (redundant) switches?
Some of these servers are connected to two stacked Dell S4048ON switches and others to a Cisco Nexus7009.
* Switchport configuration? LACP active?
Switchport access vlan and LACP active
* Can you provide any detail about the function of your system? Software it runs?
Storage servers using a data management application called dCache (https://www.dcache.org/). The problem has been detected with both read and write access to data.
Feel free to ask me for any further information by email at firstname.lastname@example.org
Thank you very much
@tgagnon and @cacosta
Yes, please do update here with any info/progress.
Red Hat has identified this as an issue affecting multiple 10Gb ethernet adapters beginning with RHEL 7.5. It has to do with kernel memory allocation.
"TCP packets were getting dropped by sk_filter_trim_cap() due to returning -ENOMEM. This is due to memory fragmentation causing allocations to fail."
The workaround is to increase kernel tunable vm.min_free_kbytes to 1 GiB or 5% of total system memory, whichever is larger.
echo "vm.min_free_kbytes = 1048576" >> /etc/sysctl.conf
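The fixed value above is 1 GiB expressed in kB. As a rough sketch of the "1 GiB or 5% of total memory, whichever is larger" rule, the following computes the suggested value from `MemTotal` in `/proc/meminfo`. The function name `suggest_min_free_kbytes` is hypothetical, and the script only prints the line; it does not modify any configuration.

```shell
#!/bin/sh
# Hypothetical helper: given total memory in kB, print the larger of
# 1 GiB (1048576 kB) or 5% of total memory, in kB.
suggest_min_free_kbytes() {
    total_kb=$1
    one_gib_kb=1048576
    five_pct_kb=$(( total_kb * 5 / 100 ))   # integer arithmetic, rounds down
    if [ "$five_pct_kb" -gt "$one_gib_kb" ]; then
        echo "$five_pct_kb"
    else
        echo "$one_gib_kb"
    fi
}

# On a live system: print the sysctl line without applying it.
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "vm.min_free_kbytes = $(suggest_min_free_kbytes "$mem_kb")"
fi
```

With this rule the 5% branch only wins on machines with more than 20 GiB of RAM. Note that a line appended to /etc/sysctl.conf does not take effect until the next boot or until `sysctl -p` is run.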
I have been using the Red Hat-provided drivers for 20 days now with the workaround and the problem has not reoccurred. Red Hat plans to increase the default vm.min_free_kbytes in a future patch.
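For others who want to see whether their systems show the kind of memory fragmentation described in Red Hat's explanation, one possible check is to look at how many higher-order free blocks remain in `/proc/buddyinfo`. This is only a sketch: the helper name `high_order_free_blocks` and the choice of order 4 as a threshold are assumptions for illustration, not something Red Hat specified.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/buddyinfo-style lines on stdin and, per zone,
# sum the number of free blocks of order >= $1.
# Assumed layout: "Node 0, zone Normal c0 c1 ... c10" (counts for orders 0..10).
high_order_free_blocks() {
    awk -v min="$1" '
        $1 == "Node" && $3 == "zone" {
            node = $2
            sub(/,/, "", node)            # "0," -> "0"
            total = 0
            # Field 5 holds the order-0 count, field 6 order 1, and so on.
            for (i = 5 + min; i <= NF; i++) total += $i
            printf "node %s zone %s: %d free blocks of order >= %d\n", node, $4, total, min
        }'
}

# On a live system:
if [ -r /proc/buddyinfo ]; then
    high_order_free_blocks 4 < /proc/buddyinfo
fi
```

Counts that stay near zero for the higher orders while plenty of low-order pages are free would be consistent with fragmentation, though this alone does not prove that packets are being dropped for that reason.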
Thanks for everyone's help with this.
We are seeing this issue on Dell R640 servers. The interface is being used for reading TAP traffic off a network. Is there an update on a solution to this issue?
$ ethtool -i eno2
firmware-version: 0x80000925, 18.3.6
$ sudo lspci -v | grep -A 18 18:00.1
18:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)
Subsystem: Dell Ethernet 10G 4P X550/I350 rNDC
Flags: bus master, fast devsel, latency 0, IRQ 62, NUMA node 0
Memory at ab000000 (64-bit, prefetchable) [size=4M]
Memory at ab800000 (64-bit, prefetchable) [size=16K]
Expansion ROM at abf80000 [disabled] [size=512K]
Capabilities:  Power Management version 3
Capabilities:  MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities:  MSI-X: Enable+ Count=64 Masked-
Capabilities: [a0] Express Endpoint, MSI 00
Capabilities: [e0] Vital Product Data
Capabilities:  Advanced Error Reporting
Capabilities:  Device Serial Number 96-b1-ff-ff-ff-6c-00-00
Capabilities:  Alternative Routing-ID Interpretation (ARI)
Capabilities: [1a0] Transaction Processing Hints
Capabilities: [1b0] Access Control Services
Kernel driver in use: ixgbe
Kernel modules: ixgbe
$ ip a show eno2
5: eno2: <BROADCAST,MULTICAST,PROMISC,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP group default qlen 1000
link/ether 24:6e:96:b1:ff:6d brd ff:ff:ff:ff:ff:ff
$ uname -a
Linux localhost.localdomain 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
|2018-08-20 09:11||cacosta||New Issue|
|2018-08-20 09:11||cacosta||Tag Attached: centos 7|
|2018-08-20 09:11||cacosta||Tag Attached: drivers|
|2018-08-20 09:11||cacosta||Tag Attached: kernel|
|2018-08-20 09:11||cacosta||Tag Attached: network|
|2018-08-20 09:14||TrevorH||Note Added: 0032545|
|2018-08-20 09:37||cacosta||Note Added: 0032546|
|2019-03-28 20:04||tgagnon||Note Added: 0034125|
|2019-03-28 23:52||toracat||Note Added: 0034126|
|2019-03-28 23:54||toracat||Status||new => assigned|
|2019-03-29 00:47||tgagnon||Note Added: 0034127|
|2019-03-29 05:42||toracat||Note Added: 0034128|
|2019-03-29 09:20||cacosta||Note Added: 0034130|
|2019-03-30 22:44||tgagnon||Note Added: 0034133|
|2019-03-31 08:27||toracat||Note Added: 0034135|
|2019-04-25 01:49||tgagnon||Note Added: 0034251|
|2019-04-25 12:45||cacosta||Note Added: 0034256|
|2019-04-25 15:01||toracat||Note Added: 0034260|
|2019-06-26 14:28||tgagnon||Note Added: 0034725|
|2020-02-13 17:19||jgress||Note Added: 0036289|