View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0016726||CentOS-8||kernel||public||2019-11-13 14:58||2020-01-20 17:30|
|Summary||0016726: mlx4_core driver does not support ConnectX-2 cards|
|Description||The above kernel module supports Mellanox ConnectX-2 cards by default as long as it is compiled with switch CONFIG_MLX4_CORE_GEN2. As per https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx4/Kconfig?h=v5.4-rc7 this is the default setting for recent kernels.|
CentOS 8 for some reason disabled this setting and ConnectX-2 cards do not work any longer.
Support for these cards is missing in the plus kernel too.
# grep MLX4_CORE config-4.18.0-80.11.2.el8_0.centos.plus.x86_64
# CONFIG_MLX4_CORE_GEN2 is not set
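The same check can be repeated against any kernel config file with a small helper. This is only a sketch: `config_option_state` is a hypothetical name, and the `/boot/config-$(uname -r)` path in the usage note assumes the usual CentOS layout.

```shell
# Print the state of a kernel config option: "y", "m", or "not set".
# config_option_state is a hypothetical helper, not part of any package.
config_option_state() {
    file=$1
    opt=$2
    # Kconfig writes enabled options as OPTION=y or OPTION=m;
    # disabled ones appear only as a "# OPTION is not set" comment.
    value=$(sed -n "s/^${opt}=\([ym]\)\$/\1/p" "$file" | head -n 1)
    if [ -n "$value" ]; then
        echo "$value"
    else
        echo "not set"
    fi
}
```

Example: `config_option_state /boot/config-$(uname -r) CONFIG_MLX4_CORE_GEN2`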
|Will make the requested change in the next release of the plus kernel. However, any change in the distro kernel must come from upstream (RHEL).|
|Is there any upstream documentation regarding the dropping of support for CX2? I don't see anything in the RHEL 8 release notes.|
|It's entirely possible that the upstream documentation does not list all removed hardware. In RHEL-8, they have removed support for what they consider "old" hardware. The gen2 device may be one of them.|
|FYI: the upcoming kernel-plus package for CentOS 8.1 has CONFIG_MLX4_CORE_GEN2=y .|
Currently have a machine in my lab with a ConnectX-2 EN card in it, and am having trouble getting it to be recognized by CentOS 8. I just installed the kernel-plus package (4.18.0-147.3.1.el8_1.centos.plus) and the interfaces are still not listed in /proc/net/dev.
I may be missing something, or doing something wrong, but ConnectX-2 NICs do not appear to be working at this time.
|Can you show us the device ID pairing [xxxx:yyyy] as shown by "lspci -nn"?|
lspci -nn shows:
Ethernet controller : Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] [15b3:6750] (rev a0)
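For anyone scripting this step, the ID pair can be pulled out of the `lspci -nn` output as sketched below. `pci_ids` is a hypothetical helper name, and the sketch assumes a grep that supports `-o` (GNU grep does).

```shell
# Read `lspci -nn` output on stdin and print each [vendor:device] pair,
# one per line, without the brackets. pci_ids is a hypothetical helper.
pci_ids() {
    # The ID pair is two 4-digit hex numbers separated by a colon;
    # other bracketed fields (class code, marketing name) do not match.
    grep -Eo '\[[0-9a-fA-F]{4}:[0-9a-fA-F]{4}\]' | tr -d '[]'
}
```

Example: `lspci -nn | grep -i mellanox | pci_ids`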
|With the ID pair [15b3:6750], all I can say is that it is supported by the mlx4_core module. I reconfirmed that CONFIG_MLX4_CORE_GEN2=y is set in 4.18.0-147.3.1.el8_1.centos.plus. Did the card work in CentOS-7?|
|Just double checked: this card does work without issue in CentOS 7.|
Can confirm that enabling CONFIG_MLX4_CORE_GEN2 is all that's needed to get an MT25408A0-FCC-QI Cx-2 InfiniBand adapter working (15b3:673c). More might be required for the Ethernet variants, but I don't see any obvious CONFIG options beyond the already set CONFIG_MLX4_EN.
If you want to try building the kmod yourself, here are the steps:
yum install audit-libs-devel binutils-devel elfutils-devel java-devel kabi-dw libcap-devel libcap-ng-devel llvm-toolset newt-devel pciutils-devel perl-devel python3-devel python3-docutils xmlto perl-ExtUtils-Embed
rpm -ivh http://vault.centos.org/8.1.1911/BaseOS/Source/SPackages/kernel-4.18.0-147.3.1.el8_1.src.rpm
rpmbuild -bp --target=$(uname -m) kernel.spec
cp configs/kernel-4.18.0-x86_64.config .config
echo "CONFIG_MLX4_CORE_GEN2=y" >> .config
make -j 12 modules
Also, you can use the Mellanox OFED distribution, which retains support for the older cards.
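One note on the `echo ... >> .config` step in the build instructions above: appending leaves the earlier `# CONFIG_MLX4_CORE_GEN2 is not set` line in the file. A tidier approach, sketched here with a hypothetical helper named `set_config_option`, is to drop any earlier setting first. The kernel source tree also ships a `scripts/config` utility for this kind of edit.

```shell
# Set OPTION=VALUE in a kernel .config, replacing any earlier setting.
# set_config_option is a hypothetical helper written for this example.
set_config_option() {
    file=$1
    opt=$2
    val=$3
    tmp=$(mktemp) || return 1
    # Drop both "OPTION=..." and "# OPTION is not set" lines, then append.
    sed -e "/^${opt}=/d" -e "/^# ${opt} is not set\$/d" "$file" > "$tmp"
    echo "${opt}=${val}" >> "$tmp"
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Example: `set_config_option .config CONFIG_MLX4_CORE_GEN2 y`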
With the plus kernel running, can you also double-check that lsmod shows mlx4_core?
$ modinfo mlx4_core | grep -i 15b3 | grep 6750
Thanks for letting us know that enabling CONFIG_MLX4_CORE_GEN2 works.
As for building a kmod, it's best to ask ELRepo to provide a kmod package. It survives kernel updates.
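The `modinfo mlx4_core | grep` check above can be made a little stricter by matching the full alias fragment instead of two loose substrings. This sketch assumes the usual PCI modalias layout (`pci:v` plus 8 upper-case hex digits, then `d` plus 8 more); both function names are hypothetical.

```shell
# Convert "15b3:6750" into the fragment used in PCI module aliases,
# e.g. "v000015B3d00006750". pci_alias_fragment is a hypothetical helper.
pci_alias_fragment() {
    vendor=$(echo "${1%%:*}" | tr 'a-f' 'A-F')
    device=$(echo "${1##*:}" | tr 'a-f' 'A-F')
    # Assumes 4-digit IDs, padded to 8 digits as in the modalias format.
    printf 'v0000%sd0000%s\n' "$vendor" "$device"
}

# Succeed if the modinfo output on stdin lists an alias for the given ID.
module_claims_id() {
    grep -q "pci:$(pci_alias_fragment "$1")"
}
```

Example: `modinfo mlx4_core | module_claims_id 15b3:6750 && echo "ID is listed"`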
The output of those commands is below; it looks like both the mlx4_ib and mlx4_en modules are loaded. I tried unloading all the mlx4 modules and loading only the mlx4_en module, but that still loaded the ib module, and the adapter was still not functioning. Based on some reading, I also attempted to load the mlx4_core module via `modprobe mlx4_core port_type_array=2` to put it into Ethernet mode, but as far as I could observe that had no effect.
[root@newhost ~]# lsmod | grep mlx
mlx4_ib 212992 0
mlx4_en 135168 0
mlx4_core 356352 2 mlx4_ib,mlx4_en
ib_uverbs 131072 2 mlx4_ib,rdma_ucm
ib_core 299008 13 rdma_cm,ib_ipoib,rpcrdma,mlx4_ib,ib_srpt,ib_srp,iw_cm,ib_iser,ib_umad,ib_isert,rdma_ucm,ib_uverbs,ib_cm
[root@newhost ~]# modinfo mlx4_core | grep -i 15b3 | grep 6750
Can you clarify what you mean when you say it's not working? Is the driver seeing the card at all? What's in the dmesg output? You should see something similar to:
dmesg | grep mlx
[ 9.876075] mlx4_core: Mellanox ConnectX core driver v4.0-0
[ 9.876207] mlx4_core: Initializing 0000:02:00.0
[ 12.169468] mlx4_core 0000:02:00.0: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
[ 12.319731] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[ 12.320024] mlx4_en 0000:02:00.0: UDP RSS is not supported on this device
[ 12.350412] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.0-0
[ 12.351334] <mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0
I knew I was forgetting some basic diagnostic part:
[root@shinzou ~]# dmesg | grep mlx
[ 2.481913] mlx4_core: Mellanox ConnectX core driver v4.0-0
[ 2.481926] mlx4_core: Initializing 0000:01:00.0
[ 2.481953] mlx4_core 0000:01:00.0: enabling device (0000 -> 0002)
[ 2.482492] mlx4_core 0000:01:00.0: Multiple PFs not yet supported - Skipping PF
[ 2.484172] mlx4_core: probe of 0000:01:00.0 failed with error -22
This has, however, led me to the discovery that they are not even ConnectX-2, but first-gen cards. I'm leaning towards ditching them now, haha.
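For reference, the failing device and error code can be picked out of a dmesg log like the one above with a short filter. `probe_failures` is a hypothetical helper; error -22 corresponds to EINVAL on Linux.

```shell
# Read dmesg output on stdin and print "<pci-address> <errno>" for every
# mlx4_core probe failure. probe_failures is a hypothetical helper.
probe_failures() {
    sed -n 's/.*mlx4_core: probe of \([0-9a-fA-F:.]*\) failed with error \(-[0-9]*\).*/\1 \2/p'
}
```

Example: `dmesg | probe_failures`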
OK, that's a different problem from the original one, where the module did not recognise the card at all.
Could it be that you have IO virtualisation (SR-IOV) enabled? (intel_iommu=on on the kernel command line). Reading the source code suggests that isn't supported on these old cards. Might be useful to post the whole dmesg here, if you're unsure.
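A quick way to see whether that parameter was passed at boot is sketched below. `cmdline_has` is a hypothetical helper; it only looks for the exact parameter on the command line and says nothing about whether the IOMMU is actually active.

```shell
# Succeed if the given kernel command line contains the exact parameter.
# cmdline_has is a hypothetical helper; pass it the text of /proc/cmdline.
cmdline_has() {
    # Unquoted $1 is split on whitespace into individual parameters.
    for word in $1; do
        if [ "$word" = "$2" ]; then
            return 0
        fi
    done
    return 1
}
```

Example: `cmdline_has "$(cat /proc/cmdline)" intel_iommu=on && echo "intel_iommu=on is set"`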
|I do specifically have iommu on, and am passing through another NIC to a VM running on this machine. That makes a lot of sense, as to what the problem is - and the resolution is obviously new hardware. My apologies for causing confusion!|
Also, rmmod mlx4_core and do
modprobe mlx4_core debug_level=2
and paste the relevant output from dmesg.
|For reference, the output from dmesg stayed the same with debug enabled in this scenario.|
|The submitted issue was resolved by the plus kernel. If there is any problem, feel free to submit a new ticket.|
|2019-11-13 14:58||Zoppa13||New Issue|
|2019-11-13 14:58||Zoppa13||Tag Attached: InfiniBand|
|2019-11-13 15:21||Zoppa13||Note Added: 0035686|
|2019-11-27 07:59||toracat||Note Added: 0035764|
|2019-11-27 08:00||toracat||Status||new => assigned|
|2019-12-19 20:46||mjharvey||Note Added: 0035862|
|2019-12-19 21:41||toracat||Note Added: 0035863|
|2019-12-19 21:47||toracat||Note Added: 0035864|
|2019-12-19 21:49||toracat||Relationship added||related to 0016850|
|2020-01-19 18:28||Xenorites||Note Added: 0036054|
|2020-01-19 18:47||toracat||Note Added: 0036055|
|2020-01-19 19:22||Xenorites||Note Added: 0036056|
|2020-01-19 19:27||toracat||Note Added: 0036057|
|2020-01-19 21:33||Xenorites||Note Added: 0036058|
|2020-01-19 21:44||mjharvey||Note Added: 0036059|
|2020-01-19 21:46||toracat||Note Added: 0036060|
|2020-01-19 21:48||toracat||Note Edited: 0036060|
|2020-01-19 21:52||toracat||Note Added: 0036061|
|2020-01-19 21:59||Xenorites||Note Added: 0036062|
|2020-01-19 22:02||mjharvey||Note Added: 0036063|
|2020-01-19 22:18||Xenorites||Note Added: 0036064|
|2020-01-19 22:28||mjharvey||Note Added: 0036065|
|2020-01-19 22:35||Xenorites||Note Added: 0036066|
|2020-01-19 22:36||mjharvey||Note Added: 0036067|
|2020-01-19 22:40||Xenorites||Note Added: 0036068|
|2020-01-20 17:30||toracat||Status||assigned => resolved|
|2020-01-20 17:30||toracat||Resolution||open => fixed|
|2020-01-20 17:30||toracat||Note Added: 0036076|