View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0017888 | CentOS-7 | kernel | public | 2020-11-25 20:13 | 2021-05-12 17:16 |
Reporter | anish | Assigned To | |||
Priority | normal | Severity | block | Reproducibility | always |
Status | new | Resolution | open | ||
Platform | PowerEdge R640 | OS | CentOS | OS Version | 7.9 |
Product Version | 7.9.2009 | ||||
Summary | 0017888: Mellanox MT27710 [ConnectX-4 Lx] NICs unusable after upgrade to 3.10.0-1160.6.1.el7 | ||||
Description | No traffic is detected on the NIC. It works consistently after rolling back to 3.10.0-1160.2.1.el7. The link seems to go down briefly and then come back up. All other dmesg output matches exactly between the two kernels. The NIC uses the mlx5_core driver. | ||||
Additional Information | I did notice these changes in 1160.3.1: - [netdrv] net/mlx5e: Modify uplink state on interface up/down (Alaa Hleihel) [1733181] - [netdrv] net/mlx5: E-Switch, Disable esw manager vport correctly (Alaa Hleihel) [1733181] - [netdrv] net/mlx5: E-Switch, Properly refer to host PF vport as other vport (Alaa Hleihel) [1733181] | ||||
Tags | No tags attached. | ||||
abrt_hash | |||||
URL | |||||
CentOS is a rebuild of the sources used to create RHEL and aims to reproduce RHEL bug for bug and feature for feature. Please file a ticket against the kernel package at bugzilla.redhat.com and let them know about the regression. If/when RH fixes it and releases a patched version, CentOS will pick it up automatically. For easier tracking, please crosslink this bug with the one opened at bugzilla.redhat.com. |
|
@manuelwolfshant I'm not sure how to crosslink, but the Red Hat Bugzilla ID is 1902516. | |
Addendum: This seems to affect only Mellanox cards with firmware older than 14.20. That version and anything newer works fine. | |
We are seeing exactly the same issue with Mellanox Technologies MT27800 Family [ConnectX-5] cards (Mellanox ConnectX-5 Dual Port 25 GbE SFP OCP3.0 Network Adapter), combined with Dell firmware 16.28.4512 (DEL0000000016) or earlier. Dell does not have a later GA firmware.

Links are up and layer 2 traffic is received (ARP requests etc.). In tcpdump on the affected host I can see the kernel sending layer 2 responses (ARP replies etc.), but captures on the other end show that those responses do not appear to leave the source server's card. Packet captures on the destination server do show LLDP packets from this host, but these leave the card directly via the firmware and not the kernel, which confirms that the card itself is functioning.

Workarounds:
Rolling the kernel back to 3.10.0-1160.2.2.el7.x86_64 works without issue.
Setting the firmware to always keep the links up across power cycles also seems to mitigate the issue: "KEEP_ETH_LINK_UP_P1=TRUE" / "KEEP_ETH_LINK_UP_P2=TRUE" (applied via mlxconfig).

Debug:
[host.mellanox:/root]# tcpdump -i em3 -e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em3, link-type EN10MB (Ethernet), capture size 262144 bytes
08:22:08.293748 04:3f:72:ac:cc:ef (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Request who-has 192.168.100.1 tell 192.168.100.2, length 46
08:22:08.293756 04:3f:72:ac:d3:07 (oui Unknown) > 04:3f:72:ac:cc:ef (oui Unknown), ethertype ARP (0x0806), length 42: Reply 192.168.100.1 is-at 04:3f:72:ac:d3:07 (oui Unknown), length 28
08:22:08.819888 04:3f:72:ac:cc:ef (oui Unknown) > 01:80:c2:00:00:0e (oui Unknown), ethertype LLDP (0x88cc), length 136: LLDP, length 122
08:22:09.295727 04:3f:72:ac:cc:ef (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Request who-has 192.168.100.1 tell 192.168.100.2, length 46
08:22:09.295734 04:3f:72:ac:d3:07 (oui Unknown) > 04:3f:72:ac:cc:ef (oui Unknown), ethertype ARP (0x0806), length 42: Reply 192.168.100.1 is-at 04:3f:72:ac:d3:07 (oui Unknown), length 28
08:22:10.297727 04:3f:72:ac:cc:ef (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Request who-has 192.168.100.1 tell 192.168.100.2, length 46
08:22:10.297731 04:3f:72:ac:d3:07 (oui Unknown) > 04:3f:72:ac:cc:ef (oui Unknown), ethertype ARP (0x0806), length 42: Reply 192.168.100.1 is-at 04:3f:72:ac:d3:07 (oui Unknown), length 28
08:22:12.294749 04:3f:72:ac:cc:ef (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Request who-has 192.168.100.1 tell 192.168.100.2, length 46
08:22:12.294759 04:3f:72:ac:d3:07 (oui Unknown) > 04:3f:72:ac:cc:ef (oui Unknown), ethertype ARP (0x0806), length 42: Reply 192.168.100.1 is-at 04:3f:72:ac:d3:07 (oui Unknown), length 28
08:22:13.295736 04:3f:72:ac:cc:ef (oui Unknown) > Broadcast, ethertype ARP (0x0806), length 60: Request who-has 192.168.100.1 tell 192.168.100.2, length 46
08:22:13.295740 04:3f:72:ac:d3:07 (oui Unknown) > 04:3f:72:ac:cc:ef (oui Unknown), ethertype ARP (0x0806), length 42: Reply 192.168.100.1 is-at 04:3f:72:ac:d3:07 (oui Unknown), length 28 |
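For anyone wanting to try the firmware workaround from the note above, the following is a rough sketch of how the two KEEP_ETH_LINK_UP parameters might be set with mlxconfig. The helper name `keep_link_up`, the `APPLY` switch, and the PCI address `0000:3b:00.0` are all placeholders invented for illustration; substitute the device identifier that applies to your card. By default the helper only prints the command, so it is safe to run on a host without the Mellanox tools installed.

```shell
# Print (default) or run the mlxconfig command that sets both
# keep-link-up parameters named in the note above.
# $1: device identifier passed to "mlxconfig -d" (placeholder below).
# Set APPLY=1 in the environment to execute instead of printing.
keep_link_up() {
    dev="$1"
    set -- mlxconfig -d "$dev" set KEEP_ETH_LINK_UP_P1=TRUE KEEP_ETH_LINK_UP_P2=TRUE
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"          # really invoke mlxconfig
    else
        echo "$*"     # dry run: show what would be executed
    fi
}

# Dry run with a hypothetical PCI address; prints the command only.
keep_link_up 0000:3b:00.0
```

As far as I know, mlxconfig asks for confirmation and the new values only take effect after a reboot or firmware reset, so this is not something to expect an immediate result from.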
|
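Since the notes above tie the problem to the firmware level, here is a small sketch for checking an interface's firmware against the 14.20 threshold reported for the ConnectX-4 Lx. The function names `fw_older_than` and `fw_from_ethtool` are made up for this example, and the comparison assumes GNU `sort -V` is available. Note that the threshold comes from a single reporter's observation; the ConnectX-5 report above involves a 16.x firmware series, so the same cutoff should not be assumed to apply there.

```shell
# Succeed (exit 0) if version $1 is strictly older than version $2.
# Uses GNU "sort -V" (version sort) to order dotted version strings.
fw_older_than() {
    current="$1"
    minimum="$2"
    if [ "$current" = "$minimum" ]; then
        return 1
    fi
    oldest="$(printf '%s\n%s\n' "$current" "$minimum" | sort -V | head -n 1)"
    [ "$oldest" = "$current" ]
}

# Extract the numeric firmware version from "ethtool -i" output on stdin.
# Example input line: "firmware-version: 16.28.4512 (DEL0000000016)"
fw_from_ethtool() {
    awk '/^firmware-version:/ { print $2; exit }'
}

# Possible usage on a live host (em3 is the interface from the capture above):
#   fw="$(ethtool -i em3 | fw_from_ethtool)"
#   if fw_older_than "$fw" 14.20; then
#       echo "em3: firmware $fw is older than 14.20"
#   fi
```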
Date Modified | Username | Field | Change |
---|---|---|---|
2020-11-25 20:13 | anish | New Issue | |
2020-11-25 21:53 | ManuelWolfshant | Note Added: 0037996 | |
2020-11-29 21:56 | anish | Note Added: 0038010 | |
2020-11-30 20:35 | anish | Note Added: 0038014 | |
2021-05-12 17:16 | aletchet | Note Added: 0038439 |