View Issue Details

ID: 0014951 | Project: CentOS-7 | Category: kernel | View Status: public | Last Update: 2018-08-02 18:52
Reporter: jand
Priority: normal | Severity: major | Reproducibility: always
Status: new | Resolution: open
Product Version: 7.5.1804
Target Version: (none) | Fixed in Version: (none)
Summary: 0014951: mlx4_core fails to enable ConnectX2 card on 3.10.0-862
Description: The mlx4_core module (mlx4_core: Mellanox ConnectX core driver v4.0-0) fails to enable a Mellanox ConnectX2 card on the following kernels:
kernel.x86_64 3.10.0-862.el7
kernel.x86_64 3.10.0-862.2.3.el7
kernel.x86_64 3.10.0-862.3.2.el7

Tested this on multiple machines (all Dell R210 IIs).

On a different system (same chipset as the Dell R210 IIs) running kernel 3.10.0-693.21.1.el7.x86_64 (CentOS Linux release 7.4.1708), the card works fine with mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014). The mlx4_core module can also enable the card on Fedora 27 (kernel 4.16.14-200.fc27, mlx4_core: Mellanox ConnectX core driver v4.0-0).

I tried updating the firmware on one of the cards to 2.9.1200; it did not help.
Steps To Reproduce: Update to kernel 3.10.0-862.el7 with a ConnectX2 card installed in the system.
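To check whether a given machine is running one of the kernels listed above, the kernel release string (as printed by `uname -r`) can be matched against the 3.10.0-862 el7 series. A minimal sketch, assuming every build in that series is affected (only three were actually tested here); the function name `is_affected_kernel` is hypothetical:

```python
import platform
import re

# Reported-affected series: 3.10.0-862[.N[.M]].el7, optionally with the arch suffix.
# Assumption: all builds in this series behave like the three tested ones.
_AFFECTED = re.compile(r"^3\.10\.0-862(\.\d+)*\.el7(\.x86_64)?$")


def is_affected_kernel(release: str) -> bool:
    """Return True if `release` (e.g. from `uname -r`) is a 3.10.0-862 el7 kernel."""
    return _AFFECTED.match(release) is not None


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r` on Linux
    status = "in the reported series" if is_affected_kernel(running) else "not in the reported series"
    print(running, status)
```

For example, `3.10.0-862.3.2.el7.x86_64` matches, while the known-good `3.10.0-693.21.1.el7.x86_64` does not.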
Additional Information: dmesg on CentOS Linux release 7.5.1804 (3.10.0-862.3.2.el7):
[ 2.089763] mlx4_core: Mellanox ConnectX core driver v4.0-0
[ 2.089777] mlx4_core: Initializing 0000:01:00.0
[ 2.158290] mlx4_core 0000:01:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
[ 63.090415] mlx4_core 0000:01:00.0: command 0x4 timed out (go bit not cleared)
[ 63.090419] mlx4_core 0000:01:00.0: device is going to be reset
[ 64.091812] mlx4_core 0000:01:00.0: device was reset successfully
[ 64.091820] mlx4_core 0000:01:00.0: QUERY_FW command failed, aborting
[ 64.091824] mlx4_core 0000:01:00.0: Failed to init fw, aborting.
[ 65.092591] mlx4_core: probe of 0000:01:00.0 failed with error -5
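The failing log follows a recognizable sequence: firmware command 0x4 times out, the device is reset, QUERY_FW fails, and the probe returns -5 (-EIO). A sketch for pulling that sequence out of saved dmesg output, assuming the message texts are exactly as they appear in the log above; `summarize_mlx4_failures` is a hypothetical helper, not part of any existing tool:

```python
import re
from collections import defaultdict
from typing import Dict, List

# Message fragments copied from the failing 3.10.0-862 log.
_EVENTS = {
    "cmd_timeout": re.compile(r"command (0x[0-9a-f]+) timed out"),
    "query_fw_failed": re.compile(r"QUERY_FW command failed"),
    "probe_failed": re.compile(r"probe of \S+ failed with error (-\d+)"),
}
# PCI address in domain:bus:device.function form, e.g. 0000:01:00.0
_ADDR = re.compile(r"\b([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\b")


def summarize_mlx4_failures(dmesg_text: str) -> Dict[str, List[str]]:
    """Group notable mlx4_core failure messages by PCI address, in log order."""
    summary: Dict[str, List[str]] = defaultdict(list)
    for line in dmesg_text.splitlines():
        if "mlx4_core" not in line:
            continue
        addr = _ADDR.search(line)
        if addr is None:
            continue
        for name, pattern in _EVENTS.items():
            m = pattern.search(line)
            if m:
                # Keep the captured detail (command opcode or errno) when the pattern has one.
                detail = m.group(1) if m.groups() else ""
                summary[addr.group(1)].append(f"{name} {detail}".strip())
    return dict(summary)
```

Fed the log above, this yields `{"0000:01:00.0": ["cmd_timeout 0x4", "query_fw_failed", "probe_failed -5"]}`; the working logs below produce an empty result.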

dmesg on CentOS Linux release 7.4.1708 (3.10.0-693.21.1.el7; this is a dual-port card):
[ 1.060906] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[ 1.061100] mlx4_core: Initializing 0000:01:00.0
[ 7.977767] mlx4_core 0000:01:00.0: PCIe BW is different than device's capability
[ 7.978529] mlx4_core 0000:01:00.0: PCIe link speed is 5.0GT/s, device supports 8.0GT/s
[ 7.979259] mlx4_core 0000:01:00.0: PCIe link width is x8, device supports x8
[ 7.980654] mlx4_core 0000:01:00.0: irq 34 for MSI/MSI-X
[ 7.980660] mlx4_core 0000:01:00.0: irq 35 for MSI/MSI-X
[ 7.980665] mlx4_core 0000:01:00.0: irq 36 for MSI/MSI-X
[ 7.980672] mlx4_core 0000:01:00.0: irq 37 for MSI/MSI-X
[ 7.980677] mlx4_core 0000:01:00.0: irq 38 for MSI/MSI-X
[ 7.980681] mlx4_core 0000:01:00.0: irq 39 for MSI/MSI-X
[ 7.980686] mlx4_core 0000:01:00.0: irq 40 for MSI/MSI-X
[ 7.980691] mlx4_core 0000:01:00.0: irq 41 for MSI/MSI-X
[ 7.980696] mlx4_core 0000:01:00.0: irq 42 for MSI/MSI-X
[ 8.019266] mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.2-1 (Feb 2014)
[ 8.020171] mlx4_en 0000:01:00.0: Activating port:1
[ 8.023585] mlx4_en: 0000:01:00.0: Port 1: Using 32 TX rings
[ 8.024349] mlx4_en: 0000:01:00.0: Port 1: Using 4 RX rings
[ 8.025317] mlx4_en: 0000:01:00.0: Port 1: Initializing port
[ 8.027202] mlx4_en 0000:01:00.0: registered PHC clock
[ 8.028242] mlx4_en 0000:01:00.0: Activating port:2
[ 8.029416] mlx4_en: 0000:01:00.0: Port 2: Using 32 TX rings
[ 8.030186] mlx4_en: 0000:01:00.0: Port 2: Using 4 RX rings
[ 8.031827] mlx4_en: 0000:01:00.0: Port 2: Initializing port
[ 9.720404] mlx4_en: enp1s0: Link Up
[ 11.956710] mlx4_en: enp1s0d1: Link Up
[ 14.329092] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)
[ 14.329857] <mlx4_ib> mlx4_ib_add: counter index 2 for port 1 allocated 1
[ 14.329858] <mlx4_ib> mlx4_ib_add: counter index 3 for port 2 allocated 1

dmesg on Fedora 27 (4.16.14-200.fc27):
[ 2.874830] mlx4_core: Mellanox ConnectX core driver v4.0-0
[ 2.874841] mlx4_core: Initializing 0000:01:00.0
[ 4.713006] mlx4_core 0000:01:00.0: Old device ETS support detected
[ 4.713007] mlx4_core 0000:01:00.0: Consider upgrading device FW.
[ 5.335566] mlx4_core 0000:01:00.0: PCIe BW is different than device's capability
[ 5.335568] mlx4_core 0000:01:00.0: PCIe link speed is 5.0GT/s, device supports 8.0GT/s
[ 5.335569] mlx4_core 0000:01:00.0: PCIe link width is x8, device supports x8
[ 5.377733] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[ 5.377856] mlx4_en 0000:01:00.0: Activating port:1
[ 5.380047] mlx4_en: 0000:01:00.0: Port 1: Using 8 TX rings
[ 5.380049] mlx4_en: 0000:01:00.0: Port 1: Using 8 RX rings
[ 5.380195] mlx4_en: 0000:01:00.0: Port 1: Initializing port
[ 5.388646] mlx4_core 0000:01:00.0 enp1s0: renamed from eth0
[ 5.400647] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.0-0
[ 5.400963] <mlx4_ib> mlx4_ib_add: counter index 1 for port 1 allocated 1
[ 5.555101] mlx4_en: enp1s0: Steering Mode 1
[ 7.743286] mlx4_en: enp1s0: Link Up
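Both working logs print the same PCIe warning: the link trained at 5.0 GT/s x8 while the device reports 8.0 GT/s capability. Since the card comes up normally in both cases, the warning does not by itself block initialization. For a rough sense of what it means, the per-direction data rate after line encoding can be computed as below. This is a back-of-the-envelope sketch (the helper `pcie_link_gbps` is hypothetical) that ignores packet and protocol overhead, so real throughput is lower:

```python
# Line-encoding efficiency per PCIe signalling rate in GT/s:
# 2.5 and 5.0 GT/s (Gen1/Gen2) use 8b/10b, 8.0 GT/s (Gen3) uses 128b/130b.
_ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}


def pcie_link_gbps(rate_gt_s: float, width: int) -> float:
    """Approximate per-direction data rate in Gb/s after line encoding only."""
    return rate_gt_s * _ENCODING[rate_gt_s] * width


if __name__ == "__main__":
    negotiated = pcie_link_gbps(5.0, 8)
    capable = pcie_link_gbps(8.0, 8)
    print(f"{negotiated:.1f} Gb/s negotiated vs {capable:.1f} Gb/s capable")
```

With the values from the logs this prints `32.0 Gb/s negotiated vs 63.0 Gb/s capable`.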
Tags: 3.10.0-862, ConnectX2

Activities

TrevorH

2018-06-14 13:51

manager   ~0032084

Did you see https://bugs.centos.org/view.php?id=14419 and the workarounds for that bug?
jand

2018-06-14 15:13

reporter   ~0032089

Yes, but it does not look like the same issue. I tried the workaround just in case, but it did not solve the problem.
TrevorH

2018-06-14 15:32

manager   ~0032090

The upstream Release Notes might have something relevant: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.5_release_notes/new_drivers lists

 Mellanox firmware flash lib (mlxfw.ko.xz):

The next page also lists

 The Mellanox ConnectX HCA low-level driver (mlx4_core.ko.xz) has been updated to version 4.0-0.

It's worth checking that the new driver isn't blacklisted. Otherwise, I suspect this starts to look like an upstream bug, so search bugzilla.redhat.com (for example with a Google query restricted by site:bugzilla.redhat.com) and see whether anything turns up. If you don't find anything, you might also want to raise a new Bugzilla entry.
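Checking whether the driver is blacklisted amounts to looking for `blacklist` or `install` directives naming the mlx4 modules in the modprobe configuration. A minimal sketch that scans one directory, assuming `/etc/modprobe.d` is where an override would live (modprobe also reads `/lib/modprobe.d` and `/run/modprobe.d`, and `modprobe --showconfig` prints the merged result); `find_blacklist_entries` is a hypothetical helper:

```python
import glob
import os
from typing import List, Tuple

MODULES = ("mlx4_core", "mlx4_en", "mlx4_ib")


def find_blacklist_entries(conf_dir: str = "/etc/modprobe.d") -> List[Tuple[str, int, str]]:
    """Return (file, line number, line) for blacklist/install directives naming mlx4 modules."""
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, raw in enumerate(fh, start=1):
                line = raw.strip()
                if not line or line.startswith("#"):  # skip blank and comment lines
                    continue
                words = line.split()
                # "blacklist <mod>" ignores the module's aliases (no autoload);
                # "install <mod> <cmd>" replaces loading with a command (e.g. /bin/false).
                if len(words) >= 2 and words[0] in ("blacklist", "install") and words[1] in MODULES:
                    hits.append((path, lineno, line))
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_blacklist_entries():
        print(f"{path}:{lineno}: {line}")
```

An empty result only rules out this directory; the initramfs may carry its own copy of the configuration.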
jand

2018-06-15 07:47

reporter   ~0032092

Thanks for the advice! Will keep looking.

Issue History

Date Modified     Username  Field          Change
2018-06-14 13:44  jand      New Issue
2018-06-14 13:44  jand      Tag Attached   3.10.0-862 ConnectX2
2018-06-14 13:51  TrevorH   Note Added     0032084
2018-06-14 13:57  jand      Tag Attached   3.10.0-862
2018-06-14 13:57  jand      Tag Attached   ConnectX2
2018-06-14 13:57  jand      Tag Detached   3.10.0-862 ConnectX2
2018-06-14 15:13  jand      Note Added     0032089
2018-06-14 15:32  TrevorH   Note Added     0032090
2018-06-15 07:47  jand      Note Added     0032092