View Issue Details

ID: 0015195
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2019-06-26 14:28
Reporter: cacosta
Priority: normal
Severity: minor
Reproducibility: always
Status: assigned
Resolution: open
Platform: Supermicro
OS: CentOS 7
OS Version: 7.4.1708
Product Version: 7.4.1708
Target Version:
Fixed in Version:
Summary: 0015195: ixgbe driver 5.1.0-k-rh7.5 causes network instability and the machine eventually loses connectivity
Description: Several servers with Intel(R) 82599 10 Gigabit Dual Port network cards experience network instability that eventually causes them to lose connectivity. The machines use ixgbe driver version 5.1.0-k-rh7.5, provided by the CentOS 7 kernel 3.10.0-862. The OS does not report any specific error in the messages log.

Updating the driver to version 5.3.7 seems to solve the issue, but the driver then has to be recompiled for each new kernel version.

Is it possible to provide the upcoming kernel releases with ixgbe driver version 5.3.7?
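
To check which ixgbe version a machine is actually running, the driver version can be read from `ethtool -i` output. The following is a minimal sketch; the helper name `driver_version` and the interface name `eth0` are illustrative assumptions and do not come from the report.

```shell
# Extract the "version:" field from `ethtool -i`-style output read on stdin.
# driver_version is a hypothetical helper name used only for this sketch.
driver_version() {
    # Split on ": " so that only the exact "version" key matches,
    # not "firmware-version" or "expansion-rom-version".
    awk -F': ' '$1 == "version" { print $2; exit }'
}

# Usage on a live system (eth0 is an assumed interface name):
#   ethtool -i eth0 | driver_version
# The version of the module installed on disk can be compared with:
#   modinfo -F version ixgbe
```

If the two values differ after a kernel update, the running kernel is likely still using the in-tree driver rather than a separately built one.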
Steps To Reproduce:
* Install CentOS 7 with a kernel from the 3.10.0-862 family on a machine with an Intel(R) 82599 10 Gigabit Dual Port network card
* Let the machine work normally, doing transfers over the network
* The machine eventually loses connectivity after a few hours
Tags: centos 7, drivers, kernel, network
abrt_hash
URL

Activities

TrevorH

2018-08-20 09:14

manager   ~0032545

CentOS only rebuilds the kernel SRPM provided by Red Hat for RHEL. You would need to get Red Hat to update it in the RHEL SRPM by reporting it on bugzilla.redhat.com

I'd also say this is not a universal problem. I have Intel x710 cards in use 24 hours a day and haven't seen any problems. Are you using the latest firmware for your cards? Current kernel version is 3.10.0-862.11.6.el7
cacosta

2018-08-20 09:37

reporter   ~0032546

Thank you very much.

We are using the latest kernel version, 3.10.0-862.11.6.el7.x86_64, and we try to keep the firmware updated.

We will proceed to report the issue to bugzilla.redhat.com.
tgagnon

2019-03-28 20:04

reporter   ~0034125

cacosta, did you ever find any resolution to this?

I am having the same problem in RHEL, and it all started with the 5.1.0-k-rh7.5 driver update. When using the upstream Intel drivers I don't experience the issue at all. I have been working with Red Hat support on this issue for months.

Can I ask, were/are you using bonded nics on the CentOS system? If so, are you using teamd or traditional bonding? If teamd, what runner are you using? Can I also ask are you using standard 1500 MTU, or, jumbo frames?

Did you ever open a Red Hat bugzilla?
toracat

2019-03-28 23:52

manager   ~0034126

@tgagnon

"When using the upstream Intel drivers I don't experience the issue at all." <== Which version is this?

I looked at the Intel site. The latest version is 5.5.5. ELRepo has the kmod-ixgbe package with this version of the Intel driver. It is being released to the elrepo-testing repository. It will show up in https://elrepo.org/linux/testing/el7/x86_64/RPMS/ soon. Please test if you are able.

kmod-ixgbe-5.5.5-1.el7_6.elrepo.x86_64.rpm

@cacosta, can you also give it a try?
tgagnon

2019-03-29 00:47

reporter   ~0034127

@toracat

The version I compiled and tested was 5.5.3, but I'm sure 5.5.5 will be fine too.

I checked elrepo a while back to see if a kmod-ixgbe was available for el7 and didn't find one, so I thought perhaps it was not being offered anymore. I was considering making a request. :)

By the way, my colleagues and I, at MIT, are big fans of elrepo. We could not survive without it. Thanks for all your hard work!
toracat

2019-03-29 05:42

manager   ~0034128

@tgagnon

Thanks for the accolade. :)
cacosta

2019-03-29 09:20

reporter   ~0034130

Hello all,

We opened a Bugzilla report, and Red Hat confirmed to us that the ixgbe driver is scheduled to be updated to the latest upstream code for RHEL-7.7.

We are using traditional bonding and jumbo frames, yes.
tgagnon

2019-03-30 22:44

reporter   ~0034133

@toracat

The package installed successfully and no issues yet with 5.5.5. Thanks again!

@cacosta

Appreciate the info. I suspect the problem might be triggered by bonding/teaming. I have another system using the same NIC, but only a single link configuration, and it's not having problems with the Red Hat drivers.
toracat

2019-03-31 08:27

manager   ~0034135

@tgagnon

Thanks for the confirmation. I will promote the kmod-ixgbe package to the main repository.
tgagnon

2019-04-25 01:49

reporter   ~0034251

Update:

Toracat's package has worked perfectly to resolve this issue for us. Nevertheless, I continue to work with Red Hat to help them identify and fix the problem with the driver provided in the RHEL kernel.

Red Hat has provided me with a RHEL 7.7 pre-beta kernel that contains updates to the ixgbe driver code (5.1.0-k-rh7.7). However, the network problems were still present. I was forced to roll back to the RHEL 7.6 kernel and kmod-ixgbe package.

Red Hat's engineering team further states that the RHEL 7.7 ixgbe driver has received all backports from the driver in the upstream kernel, with the exception of ones that would not be compatible with the RHEL kernel version. Which changes from Intel's official driver code may or may not have made it into the upstream kernel is outside of their control. So, it is unclear at this point where and how the bug was introduced.

I have provided Red Hat with extensive hardware and software information about our system and network infrastructure as they attempt to reproduce the issue. @cacosta I would really be interested in learning more about your setup, if you're willing to share, to see if we can identify areas we have in common. For example:

What SuperMicro motherboard model?
Are you using onboard Intel 82599 10Gb ports, or, PCI-E card? What card model?
Brand and model of network switch? Stacked (redundant) switches?
Switchport configuration? LACP active?
Can you provide any detail about the function of your system? Software it runs?

I would be happy to discuss in private, over email or phone, if you would rather. Thanks!
cacosta

2019-04-25 12:45

reporter   ~0034256

Hi @tgagnon,

Here is our information:

* What SuperMicro motherboard model?
 We see this problem in different Supermicro models with different motherboards: X9DR3-F and X10DSC+

* Are you using onboard Intel 82599 10Gb ports, or, PCI-E card? What card model?
 The Intel card used is the Intel 82599ES:
Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

* Brand and model of network switch? Stacked (redundant) switches?
 Some of these servers are connected to two stacked Dell S4048ON switches and others to a Cisco Nexus7009.

* Switchport configuration? LACP active?
 Switchport access vlan and LACP active

* Can you provide any detail about the function of your system? Software it runs?
 Storage servers using a data management application called dCache (https://www.dcache.org/). The problem has been detected with both read and write access to data.

Feel free to ask me for any further information by email at cacosta@pic.es

Thank you very much

Cheers,

Carles
toracat

2019-04-25 15:01

manager   ~0034260

@tgagnon and @cacosta

Yes, please do update here with any info/progress.
tgagnon

2019-06-26 14:28

reporter   ~0034725

Red Hat has identified this as an issue affecting multiple 10Gb ethernet adapters beginning with RHEL 7.5. It has to do with kernel memory allocation.

"TCP packets were getting dropped by sk_filter_trim_cap() due to returning -ENOMEM. This is due to memory fragmentation causing allocations to fail."

The workaround is to increase kernel tunable vm.min_free_kbytes to 1 GiB or 5% of total system memory, whichever is larger.

echo "vm.min_free_kbytes = 1048576" >> /etc/sysctl.conf
sysctl -p
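
The value 1048576 above is 1 GiB expressed in kB. On machines where 5% of total memory exceeds 1 GiB, the "whichever is larger" rule gives a different number. The following is a minimal sketch of that calculation; the function name `min_free_kbytes_for` is a hypothetical helper, and the usage lines assume MemTotal is read from /proc/meminfo, which reports it in kB.

```shell
# Print the larger of 1 GiB (1048576 kB) and 5% of the given total memory.
# Argument: total system memory in kB. The helper name is illustrative only.
min_free_kbytes_for() {
    total_kb="$1"
    floor_kb=1048576                      # 1 GiB in kB
    five_pct=$(( total_kb * 5 / 100 ))    # integer arithmetic, rounds down
    if [ "$five_pct" -gt "$floor_kb" ]; then
        echo "$five_pct"
    else
        echo "$floor_kb"
    fi
}

# Usage on a live system (requires root for the sysctl call):
#   total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   sysctl -w vm.min_free_kbytes="$(min_free_kbytes_for "$total_kb")"
```

Note that `sysctl -w` changes only the running kernel; the /etc/sysctl.conf entry is still what makes the setting persist across reboots.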

I have been using the Red Hat-provided drivers for 20 days now with the workaround and the problem has not reoccurred. Red Hat plans to increase the default vm.min_free_kbytes in a future patch.
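
Because the drops are attributed to memory fragmentation, one way to watch for it is to look at how many higher-order free blocks each zone still has in /proc/buddyinfo. The following is a rough sketch, not a Red Hat-provided diagnostic; the helper name `high_order_free` and the choice of order 4 as a threshold are assumptions made for illustration.

```shell
# Sum the free blocks of order >= min_order for each zone, reading
# /proc/buddyinfo-formatted lines on stdin. Each line looks like:
#   Node 0, zone   Normal   <order0> <order1> ... <order10>
# high_order_free is a hypothetical helper name used only for this sketch.
high_order_free() {
    awk -v min="$1" '
        $1 == "Node" {
            node = $2
            sub(/,$/, "", node)           # drop the trailing comma after the node id
            total = 0
            # Field 5 holds the order-0 count, so order k is field 5 + k.
            for (i = 5 + min; i <= NF; i++) total += $i
            printf "node %s zone %s: %d\n", node, $4, total
        }'
}

# Usage on a live system:
#   high_order_free 4 < /proc/buddyinfo
```

Counts that stay near zero for the Normal zone under load would be consistent with the fragmentation described above, though they do not prove it on their own.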

Thanks for everyone's help with this.

Issue History

Date Modified Username Field Change
2018-08-20 09:11 cacosta New Issue
2018-08-20 09:11 cacosta Tag Attached: centos 7
2018-08-20 09:11 cacosta Tag Attached: drivers
2018-08-20 09:11 cacosta Tag Attached: kernel
2018-08-20 09:11 cacosta Tag Attached: network
2018-08-20 09:14 TrevorH Note Added: 0032545
2018-08-20 09:37 cacosta Note Added: 0032546
2019-03-28 20:04 tgagnon Note Added: 0034125
2019-03-28 23:52 toracat Note Added: 0034126
2019-03-28 23:54 toracat Status new => assigned
2019-03-29 00:47 tgagnon Note Added: 0034127
2019-03-29 05:42 toracat Note Added: 0034128
2019-03-29 09:20 cacosta Note Added: 0034130
2019-03-30 22:44 tgagnon Note Added: 0034133
2019-03-31 08:27 toracat Note Added: 0034135
2019-04-25 01:49 tgagnon Note Added: 0034251
2019-04-25 12:45 cacosta Note Added: 0034256
2019-04-25 15:01 toracat Note Added: 0034260
2019-06-26 14:28 tgagnon Note Added: 0034725