View Issue Details

ID: 0017566
Project: CentOS-7
Category: -OTHER
View Status: public
Last Update: 2020-07-07 17:03
Reporter: zvioloni@synamedia.com
Priority: normal
Severity: major
Reproducibility: unable to reproduce
Status: new
Resolution: open
Product Version: 7.7-1908
Target Version:
Fixed in Version:
Summary: 0017566: ixgbe driver - network interface hangs with "Detected Tx Unit Hang" message in the log
Description: Even after upgrading the kernel to 3.10.0-1062.12.1.el7.x86_64, which should no longer be susceptible to Tx hangs, we are still seeing this issue. The issue is seen at one site on multiple servers, all with the ixgbe driver and Intel 10Gb X550T hardware. We are trying to reproduce the issue at a different site with the same combination of hardware and software, but have not succeeded so far.
Steps To Reproduce: N/A
Tags: ixgbe
abrt_hash:
URL:

Activities

ManuelWolfshant

2020-07-06 17:48

manager   ~0037311

Last edited: 2020-07-06 17:49

Please update your systems to CentOS 7.8; the kernel you mentioned is from 7.7, which is no longer supported by CentOS. If you can reproduce the issue with kernel-3.10.0-1127.13.1.el7.x86_64, then please file a bug at bugzilla.redhat.com, as CentOS merely builds from the RHEL sources as published by Red Hat. Once Red Hat fixes the issue, the changes will automatically appear in CentOS as well.
For easier tracking, please cross-link the bug opened here with the one at bugzilla.redhat.com.

toracat

2020-07-06 21:40

manager   ~0037315

Looks like the problem was fixed by this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=377228accbbb8b9738f615d791aa803f41c067e0

It was added to kernel 3.10.0-1099.el7, meaning it is in the CentOS 7.8 kernel. So, as pointed out by @ManuelWolfshant, please update your kernel to the latest version, which should resolve the issue.
zvioloni@synamedia.com

2020-07-07 12:20

reporter   ~0037323

toracat: I already checked the resources at Red Hat, and that patch was part of 3.10.0-1062.12.1.el7.x86_64, which is CentOS 7.7. We tried it and it didn't work. If you take a look at the driver in 7.8, you will see that it has not changed:
$ cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)
driver=ixgbe driverversion=5.1.0-k-rh7.7
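
The driver line above is a run of `key=value` tokens. The source does not say which command printed it, so the sketch below only assumes that shape. `parse_config_line` is a hypothetical helper name used for illustration, showing one way to pull the driver name and version out of such a line when comparing several servers:

```python
def parse_config_line(line: str) -> dict:
    """Split a whitespace-separated run of key=value tokens into a dict."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that contain no '='
            fields[key] = value
    return fields


# The line reported in the note above.
info = parse_config_line("driver=ixgbe driverversion=5.1.0-k-rh7.7")
print(info["driver"], info["driverversion"])
```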
zvioloni@synamedia.com

2020-07-07 12:29

reporter   ~0037324

kernel-3.10.0-1062.12.1.el7/Patches/8961-netdrv-ixgbe-Prevent-u8-wrapping-of-ITR-value-to-som.patch
ManuelWolfshant

2020-07-07 13:52

manager   ~0037326

I was probably not clear enough in my first reply, so let me explain more clearly: at any point in time, CentOS supports exclusively the latest minor release of any major release, regardless of which patches you found and consider relevant to your problem, especially as we do not support "cherry picking" updates.
At this moment there are exactly 2 ways forward:
- You keep attempting to use a kernel from 7.7. In this case you will not receive any form of support from CentOS, but you are strongly encouraged to purchase an EUS subscription from Red Hat, the only ones who can and do provide support for older releases.
- You follow our advice and update to CentOS 7.8 (and I mean update the whole system, not just the kernel!). In this case there are also 2 variants:
a) your problem is fixed, in which case everybody is happy
b) you can still reproduce the issue, in which case you will need to file a bug at bugzilla.redhat.com and wait for them to implement a fix.
zvioloni@synamedia.com

2020-07-07 13:57

reporter   ~0037327

ManuelWolfshant: You were perfectly clear and I did what you suggested. I just wanted to correct what toracat stated, because the information was incorrect. This is my last post here, and if you want you can remove the whole thing, since the resolution has to come from Red Hat. Have a great day.
ManuelWolfshant

2020-07-07 14:02

manager   ~0037328

Last edited: 2020-07-07 14:02

If you reproduced the issue using the latest kernel and filed a bug at bugzilla.redhat.com, please cross-link this bug with the Red Hat one. That way it gets more exposure, and other people facing the same problem may benefit from your experience, especially as kernel bugs at bugzilla.redhat.com are automatically marked private, so no one but the reporter and Red Hat can see them.

toracat

2020-07-07 17:03

manager   ~0037329

@zvioloni@synamedia.com

I am pasting the following section from the spec file of the current kernel, 3.10.0-1127.13.1.el7:

* Wed Sep 25 2019 Jan Stancek <jstancek@redhat.com> [3.10.0-1099.el7]
- [char] tpm: tpm_try_transmit() refactor error flow (Jerry Snitselaar) [1731225]
- [powerpc] powerpc/pseries: correctly track irq state in default idle (Steve Best) [1751970]
- [md] raid5 improve too many read errors msg by adding limits (Nigel Croxon) [1700665]
- [netdrv] ixgbe: Prevent u8 wrapping of ITR value to something less than 10us (Ken Cox) [1750856] <===== ixgbe fix
- [kernel] sched: Skip double execution of pick_next_task_fair() (Phil Auld) [1750819]

As I wrote earlier, according to the changelog, the referenced patch was applied to kernel-3.10.0-1099.el7.
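
One way to check which build a changelog entry belongs to is to scan the changelog text for the matching line and report the build named in the header above it. The following is a minimal sketch, assuming the changelog is available as plain text in the format pasted above (on an installed system it could come from `rpm -q --changelog kernel`). `find_builds` is a hypothetical helper name, not part of any tool, and the sample text is an excerpt of the entries quoted in this note:

```python
import re

CHANGELOG = """\
* Wed Sep 25 2019 Jan Stancek <jstancek@redhat.com> [3.10.0-1099.el7]
- [char] tpm: tpm_try_transmit() refactor error flow (Jerry Snitselaar) [1731225]
- [netdrv] ixgbe: Prevent u8 wrapping of ITR value to something less than 10us (Ken Cox) [1750856]
- [kernel] sched: Skip double execution of pick_next_task_fair() (Phil Auld) [1750819]
"""

# A header line starts with "* " and ends with the build in square brackets.
HEADER = re.compile(r"^\* .*\[(?P<build>[^\]]+)\]\s*$")


def find_builds(changelog: str, needle: str) -> list:
    """Return (build, entry) pairs for changelog entries containing `needle`."""
    hits = []
    build = None
    for line in changelog.splitlines():
        match = HEADER.match(line)
        if match:
            build = match.group("build")  # remember the current build
        elif line.startswith("- ") and needle in line:
            hits.append((build, line[2:]))
    return hits


for build, entry in find_builds(CHANGELOG, "ixgbe"):
    print(build, "->", entry)
```

The same scan can be run over the changelog of another build, such as kernel-3.10.0-1062.12.1.el7, to see whether that build lists the entry too.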

Issue History

Date Modified Username Field Change
2020-07-06 17:36 zvioloni@synamedia.com New Issue
2020-07-06 17:36 zvioloni@synamedia.com Tag Attached: ixgbe
2020-07-06 17:48 ManuelWolfshant Note Added: 0037311
2020-07-06 17:49 ManuelWolfshant Note Edited: 0037311
2020-07-06 21:40 toracat Note Added: 0037315
2020-07-07 12:20 zvioloni@synamedia.com Note Added: 0037323
2020-07-07 12:29 zvioloni@synamedia.com Note Added: 0037324
2020-07-07 13:52 ManuelWolfshant Note Added: 0037326
2020-07-07 13:57 zvioloni@synamedia.com Note Added: 0037327
2020-07-07 14:02 ManuelWolfshant Note Added: 0037328
2020-07-07 14:02 ManuelWolfshant Note Edited: 0037328
2020-07-07 17:03 toracat Note Added: 0037329