View Issue Details

ID: 0016726
Project: CentOS-8
Category: kernel
View Status: public
Last Update: 2020-01-20 17:30
Reporter: Zoppa13
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 8.0.1905
Target Version:
Fixed in Version:
Summary: 0016726: mlx4_core driver does not support ConnectX-2 cards
Description: The mlx4_core kernel module supports Mellanox ConnectX-2 cards as long as it is compiled with CONFIG_MLX4_CORE_GEN2 enabled. As per https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx4/Kconfig?h=v5.4-rc7 this option is enabled by default in recent kernels.

CentOS 8 has, for some reason, disabled this setting, so ConnectX-2 cards no longer work.

Tags: InfiniBand

Relationships

related to 0016850 (new): Mellanox Connect-X MT25407A0-FCC-QI not recognised

Activities

Zoppa13

2019-11-13 15:21

reporter   ~0035686

Additional info.

Support for these cards is missing in the plus kernel too.

# grep MLX4_CORE config-4.18.0-80.11.2.el8_0.centos.plus.x86_64
CONFIG_MLX4_CORE=m
# CONFIG_MLX4_CORE_GEN2 is not set
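
The same check can be wrapped in a small helper for any kernel config file. This is only a sketch: `config_state` is a made-up name, and the path in the usage comment assumes the config is installed under /boot in the usual way.

```shell
#!/bin/sh
# config_state FILE SYMBOL
# Prints the value of a Kconfig symbol (for example "y" or "m"), or "unset"
# when the symbol is commented out ("# CONFIG_FOO is not set") or missing.
# config_state is a hypothetical helper, not part of any package.
config_state() {
    cfg_file=$1
    cfg_symbol=$2
    # Only lines of the form SYMBOL=value count; take the last one if repeated.
    cfg_value=$(sed -n "s/^${cfg_symbol}=\(.*\)\$/\1/p" "$cfg_file" | tail -n 1)
    if [ -n "$cfg_value" ]; then
        printf '%s\n' "$cfg_value"
    else
        printf 'unset\n'
    fi
}

# Example (assumes the running kernel's config lives under /boot):
# config_state "/boot/config-$(uname -r)" CONFIG_MLX4_CORE_GEN2
```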

toracat

2019-11-27 07:59

manager   ~0035764

Will make the requested change in the next release of the plus kernel. However, any change in the distro kernel must come from upstream (RHEL).

mjharvey

2019-12-19 20:46

reporter   ~0035862

Is there any upstream documentation about the dropping of support for CX2? I don't see anything in the RHEL 8 release notes.

toracat

2019-12-19 21:41

manager   ~0035863

It's entirely possible that the upstream documentation does not list all removed hardware. In RHEL-8, they have removed support for what they consider "old" hardware. The gen2 devices may be among them.

toracat

2019-12-19 21:47

manager   ~0035864

FYI: the upcoming kernel-plus package for CentOS 8.1 has CONFIG_MLX4_CORE_GEN2=y.

Xenorites

2020-01-19 18:28

reporter   ~0036054

Currently have a machine in my lab with a ConnectX-2 EN card in it, and am having trouble getting it to be recognized by CentOS 8. I just installed the kernel-plus package (4.18.0-147.3.1.el8_1.centos.plus) and the interfaces are still not listed in /proc/net/dev.

I may be missing something, or doing something wrong, but ConnectX-2 NICs do not appear to be working at this time.

toracat

2020-01-19 18:47

manager   ~0036055

Can you show us the device ID pairing [xxxx:yyyy] as shown by "lspci -nn" ?

Xenorites

2020-01-19 19:22

reporter   ~0036056

lspci -nn shows:

Ethernet controller [0200]: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] [15b3:6750] (rev a0)
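
For anyone scripting this, the [vendor:device] pair can be pulled out of `lspci -nn` output with a short filter. This is a sketch only; `pci_id_from_lspci` is a made-up name, and it simply takes the last bracketed xxxx:yyyy group on each line.

```shell
#!/bin/sh
# pci_id_from_lspci
# Reads "lspci -nn" lines on stdin and prints the [vendor:device] pair of
# each line, without the brackets. Lines with no such pair print nothing.
# pci_id_from_lspci is a hypothetical helper name.
pci_id_from_lspci() {
    # The class code (e.g. [0200]) has no colon, so it is never matched.
    sed -n 's/.*\[\([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)\].*/\1/p'
}

# Example:
# lspci -nn | grep -i mellanox | pci_id_from_lspci
```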

toracat

2020-01-19 19:27

manager   ~0036057

With the ID pair [15b3:6750], all I can say is that it is supported by the mlx4_core module. I reconfirmed CONFIG_MLX4_CORE_GEN2=y is set in 4.18.0-147.3.1.el8_1.centos.plus. Did the card work in CentOS-7?

Xenorites

2020-01-19 21:33

reporter   ~0036058

Just double checked: this card does work without issue in CentOS 7.

mjharvey

2020-01-19 21:44

reporter   ~0036059

Can confirm that enabling CONFIG_MLX4_CORE_GEN2 is all that's needed to get a MT25408A0-FCC-QI Cx-2 Infiniband adaptor working (15b3:673c). More might be required for the ethernet variants, but I don't see any obvious CONFIG options beyond the already set CONFIG_MLX4_EN.

If you want to try building the kmod yourself, here are the steps:

yum install audit-libs-devel binutils-devel elfutils-devel java-devel kabi-dw libcap-devel libcap-ng-devel llvm-toolset newt-devel pciutils-devel perl-devel python3-devel python3-docutils xmlto perl-ExtUtils-Embed
rpm -ivh http://vault.centos.org/8.1.1911/BaseOS/Source/SPackages/kernel-4.18.0-147.3.1.el8_1.src.rpm
cd ~/rpmbuild/SPECS/
rpmbuild -bp --target=$(uname -m) kernel.spec
cd ../BUILD/kernel-4.18.0-147.3.1.el8_1/linux-4.18.0-147.3.1.el8.x86_64/
cp configs/kernel-4.18.0-x86_64.config .config
echo "CONFIG_MLX4_CORE_GEN2=y" >> .config
make -j 12 modules
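
The `echo ... >> .config` step above leaves the old "# CONFIG_MLX4_CORE_GEN2 is not set" line in place next to the new one. If you'd rather end up with a single clean entry, something like the following could be used instead. It is only a sketch, and `enable_config_option` is a made-up helper name.

```shell
#!/bin/sh
# enable_config_option FILE SYMBOL
# Rewrites FILE so that SYMBOL appears exactly once, as SYMBOL=y.
# Any earlier "SYMBOL=..." or "# SYMBOL is not set" line is dropped first.
# enable_config_option is a hypothetical helper, not a kernel build tool.
enable_config_option() {
    opt_file=$1
    opt_symbol=$2
    opt_tmp=$(mktemp) || return 1
    # grep -v exits non-zero when nothing is left, which is not an error here.
    grep -v -e "^${opt_symbol}=" -e "^# ${opt_symbol} is not set\$" \
        "$opt_file" > "$opt_tmp" || true
    printf '%s=y\n' "$opt_symbol" >> "$opt_tmp"
    # Write back in place so the original file keeps its permissions.
    cat "$opt_tmp" > "$opt_file"
    rm -f "$opt_tmp"
}

# Example, run from the top of the kernel source tree:
# enable_config_option .config CONFIG_MLX4_CORE_GEN2
```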


Also, you can use the Mellanox OFED distribution, which retains support for the older cards:

https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed

toracat

2020-01-19 21:46

manager   ~0036060

Last edited: 2020-01-19 21:48

@Xenorites

With the plus kernel running, can you also double-check lsmod shows mlx4_core ?

And

$ modinfo mlx4_core | grep -i 15b3 | grep 6750
alias: pci:v000015B3d00006750sv*sd*bc*sc*i*
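
The alias string above is just the vendor and device IDs, upper-cased and zero-padded to eight hex digits. A small helper can build the prefix to grep for from any ID pair. Sketch only; `pci_modalias_prefix` is a made-up name and it assumes a four-digit vendor and a four-digit device ID.

```shell
#!/bin/sh
# pci_modalias_prefix VENDOR:DEVICE
# Turns an lspci-style ID pair such as 15b3:6750 into the leading part of
# the matching module alias, e.g. pci:v000015B3d00006750.
# pci_modalias_prefix is a hypothetical helper name.
pci_modalias_prefix() {
    id_vendor=$(printf '%s' "${1%%:*}" | tr 'a-f' 'A-F')
    id_device=$(printf '%s' "${1##*:}" | tr 'a-f' 'A-F')
    printf 'pci:v0000%sd0000%s\n' "$id_vendor" "$id_device"
}

# Example:
# modinfo mlx4_core | grep -F "$(pci_modalias_prefix 15b3:6750)"
```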

toracat

2020-01-19 21:52

manager   ~0036061

@mjharvey

Thanks for letting us know that enabling CONFIG_MLX4_CORE_GEN2 works.

As for building a kmod, it's best to ask ELRepo to provide a kmod package. It survives kernel updates.

Xenorites

2020-01-19 21:59

reporter   ~0036062

The output of those commands is below; it looks like both the mlx4_ib and mlx4_en modules are loaded. I tried unloading all the mlx4 modules and loading only the mlx4_en module, but that still loaded the ib module, and the adapter was still not functioning. Based on some reading, I also attempted to load the mlx4_core module via `modprobe mlx4_core port_type_array=2` to put it into ethernet mode, but as far as I could tell that didn't change anything.

[root@newhost ~]# lsmod | grep mlx
mlx4_ib 212992 0
mlx4_en 135168 0
mlx4_core 356352 2 mlx4_ib,mlx4_en
ib_uverbs 131072 2 mlx4_ib,rdma_ucm
ib_core 299008 13 rdma_cm,ib_ipoib,rpcrdma,mlx4_ib,ib_srpt,ib_srp,iw_cm,ib_iser,ib_umad,ib_isert,rdma_ucm,ib_uverbs,ib_cm
[root@newhost ~]# modinfo mlx4_core | grep -i 15b3 | grep 6750
alias: pci:v000015B3d00006750sv*sd*bc*sc*i*

mjharvey

2020-01-19 22:02

reporter   ~0036063

Can you clarify what you mean when you say it's not working? Is the driver seeing the card at all? What's in the dmesg output? You should see something similar to:

 dmesg | grep mlx
[ 9.876075] mlx4_core: Mellanox ConnectX core driver v4.0-0
[ 9.876207] mlx4_core: Initializing 0000:02:00.0
[ 12.169468] mlx4_core 0000:02:00.0: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
[ 12.319731] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[ 12.320024] mlx4_en 0000:02:00.0: UDP RSS is not supported on this device
[ 12.350412] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.0-0
[ 12.351334] <mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0

Xenorites

2020-01-19 22:18

reporter   ~0036064

I knew I was forgetting some basic diagnostic part:

[root@shinzou ~]# dmesg | grep mlx
[ 2.481913] mlx4_core: Mellanox ConnectX core driver v4.0-0
[ 2.481926] mlx4_core: Initializing 0000:01:00.0
[ 2.481953] mlx4_core 0000:01:00.0: enabling device (0000 -> 0002)
[ 2.482492] mlx4_core 0000:01:00.0: Multiple PFs not yet supported - Skipping PF
[ 2.484172] mlx4_core: probe of 0000:01:00.0 failed with error -22
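
(Error -22 is -EINVAL.) For anyone checking several machines, the failed probes can be picked out of the log with a small filter. Sketch only; `failed_probes` is a made-up name, and the pattern assumes the message format shown in the log above.

```shell
#!/bin/sh
# failed_probes
# Reads dmesg output on stdin and prints "PCI-address error-code" for every
# "probe of ... failed with error ..." line.
# failed_probes is a hypothetical helper name.
failed_probes() {
    sed -n 's/.*probe of \([0-9a-fA-F:.]*\) failed with error \(-[0-9][0-9]*\).*/\1 \2/p'
}

# Example:
# dmesg | grep mlx4_core | failed_probes
```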

This has, however, led me to the discovery that they are not even ConnectX-2, but first gen cards. I'm leaning towards ditching them, now, haha.

mjharvey

2020-01-19 22:28

reporter   ~0036065

OK, that's a different problem from the original one, where the module did not recognise the card at all.

Could it be that you have IO virtualisation (SR-IOV) enabled? (intel_iommu=on on the kernel command line). Reading the source code suggests that isn't supported on these old cards. Might be useful to post the whole dmesg here, if you're unsure.
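
A quick way to check for that option without eyeballing the boot line is sketched below. `has_cmdline_option` is a made-up helper; it only looks for an exact, whitespace-separated match and says nothing about whether SR-IOV is actually in use.

```shell
#!/bin/sh
# has_cmdline_option CMDLINE OPTION
# Succeeds (exit 0) if OPTION appears as a whole word in CMDLINE.
# has_cmdline_option is a hypothetical helper name.
has_cmdline_option() {
    # Padding with spaces makes first and last words match like any other.
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# has_cmdline_option "$(cat /proc/cmdline)" intel_iommu=on && echo "intel_iommu=on is set"
```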

Xenorites

2020-01-19 22:35

reporter   ~0036066

I do specifically have iommu on, and am passing through another NIC to a VM running on this machine. That makes a lot of sense, as to what the problem is - and the resolution is obviously new hardware. My apologies for causing confusion!

mjharvey

2020-01-19 22:36

reporter   ~0036067

Also, rmmod mlx4_core and then run

modprobe mlx4_core debug_level=2

and paste the relevant output from dmesg.

Xenorites

2020-01-19 22:40

reporter   ~0036068

For reference, the output from dmesg stayed the same with debug enabled in this scenario.

toracat

2020-01-20 17:30

manager   ~0036076

The submitted issue was resolved by the plus kernel. If there is any problem, feel free to submit a new ticket.

Issue History

Date Modified Username Field Change
2019-11-13 14:58 Zoppa13 New Issue
2019-11-13 14:58 Zoppa13 Tag Attached: InfiniBand
2019-11-13 15:21 Zoppa13 Note Added: 0035686
2019-11-27 07:59 toracat Note Added: 0035764
2019-11-27 08:00 toracat Status new => assigned
2019-12-19 20:46 mjharvey Note Added: 0035862
2019-12-19 21:41 toracat Note Added: 0035863
2019-12-19 21:47 toracat Note Added: 0035864
2019-12-19 21:49 toracat Relationship added related to 0016850
2020-01-19 18:28 Xenorites Note Added: 0036054
2020-01-19 18:47 toracat Note Added: 0036055
2020-01-19 19:22 Xenorites Note Added: 0036056
2020-01-19 19:27 toracat Note Added: 0036057
2020-01-19 21:33 Xenorites Note Added: 0036058
2020-01-19 21:44 mjharvey Note Added: 0036059
2020-01-19 21:46 toracat Note Added: 0036060
2020-01-19 21:48 toracat Note Edited: 0036060
2020-01-19 21:52 toracat Note Added: 0036061
2020-01-19 21:59 Xenorites Note Added: 0036062
2020-01-19 22:02 mjharvey Note Added: 0036063
2020-01-19 22:18 Xenorites Note Added: 0036064
2020-01-19 22:28 mjharvey Note Added: 0036065
2020-01-19 22:35 Xenorites Note Added: 0036066
2020-01-19 22:36 mjharvey Note Added: 0036067
2020-01-19 22:40 Xenorites Note Added: 0036068
2020-01-20 17:30 toracat Status assigned => resolved
2020-01-20 17:30 toracat Resolution open => fixed
2020-01-20 17:30 toracat Note Added: 0036076