View Issue Details

ID: 0014419
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2019-03-06 09:56
Reporter: justinclift
Priority: high
Severity: major
Reproducibility: always
Status: closed
Resolution: not fixable
Product Version: 7.4.1708
Target Version:
Fixed in Version:
Summary: 0014419: Mellanox ConnectX cards refuse to work in Ethernet mode with kernel-3.10.0-693.11.6 onwards
Description:
Rebooted my CentOS 7 x64 (1708) desktop this evening after a yum update,
and the Mellanox ConnectX-2 card in it (set to run in ethernet mode)
refused to come up correctly. That was with the latest kernel (installed
this evening) of kernel-3.10.0-693.17.1.

Instead, it came up in Infiniband mode, with "ip addr" complaining about
a potential bad address.

Had a feeling it might be kernel related (something unforeseen from recent
Meltdown/Spectre patches maybe?), so tried the previous kernels to see if
that's the cause. Short answer: Yep. ;)

My desktop has these kernels installed at the moment:

  * kernel-3.10.0-693.17.1.el7.x86_64 | ConnectX-2 card not working
  * kernel-3.10.0-693.11.6.el7.x86_64 | ConnectX-2 card not working
  * kernel-3.10.0-693.11.1.el7.x86_64 | ConnectX-2 card works
  * kernel-3.10.0-693.el7.x86_64 | ConnectX-2 card works

So, kernel-3.10.0-693.11.6 and onwards are "busted" from this point of view. After reverting to either of the two older kernels, the card comes up fine, working as 10GbE as expected.
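
The good/bad split in the list above can be turned into a quick check. This is only a sketch: `kernel_affected` is a hypothetical helper name, the threshold comes purely from the four kernels listed in this report, and it assumes el7-style release strings with numeric fields.

```shell
# Hypothetical helper: succeeds (exit 0) if the given kernel release is
# 3.10.0-693.11.6 or later, i.e. in the range reported as "not working" above.
# Assumes release strings like "3.10.0-693.11.6.el7.x86_64".
kernel_affected() {
    rel=${1%%.el*}          # drop the ".el7.x86_64" style suffix
    old_ifs=$IFS
    IFS='.-'
    set -- $rel             # split into numeric fields: 3 10 0 693 11 6
    IFS=$old_ifs
    # Compare field by field against 3.10.0-693.11.6; missing fields count as 0.
    for want in 3 10 0 693 11 6; do
        have=${1:-0}
        if [ $# -gt 0 ]; then shift; fi
        if [ "$have" -gt "$want" ]; then return 0; fi
        if [ "$have" -lt "$want" ]; then return 1; fi
    done
    return 0                # exactly equal to the threshold
}
```

For the running kernel that would be something like `kernel_affected "$(uname -r)" && echo "kernel is in the affected range"`.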

For reference, this is the address of the card in my desktop:

  $ lspci | grep Mellanox
  06:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

And this is the entry for it in /etc/rdma/mlx4.conf which tells it to operate
in ethernet mode (it's a single port card):

  $ tail -2 /etc/rdma/mlx4.conf
  #
  0000:06:00.0 eth
Steps To Reproduce:
1. With a Mellanox card (probably any of the ConnectX series), set it to operate in Ethernet mode via /etc/rdma/mlx4.conf.

2. Boot into kernel-3.10.0-693.11.1 (known good) and confirm the card is recognised by the system with an ethernet address given.

3. Reboot into kernel-3.10.0-693.11.6 or later. The card will now not be recognised as ethernet, instead coming up as Infiniband.
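
For steps 2 and 3, the port type can probably also be read straight from sysfs instead of being inferred from "ip addr". A minimal sketch follows, assuming the mlx4 driver exposes a per-port attribute named mlx4_portN under the PCI device directory (that attribute name is my assumption based on the upstream driver, not something shown in this report). `mlx4_port_type` is a hypothetical helper name, and the third argument only exists so the function can be pointed at a fake tree.

```shell
# Print the port type the mlx4 driver reports for a PCI device.
# Usage: mlx4_port_type PCI_ADDR [PORT] [SYSFS_ROOT]
# Assumption: the attribute is <sysfs>/bus/pci/devices/<addr>/mlx4_port<N>
# and contains a value such as "ib" or "eth".
mlx4_port_type() {
    addr=$1
    port=${2:-1}
    root=${3:-/sys}
    attr="$root/bus/pci/devices/$addr/mlx4_port$port"
    if [ ! -r "$attr" ]; then
        # Driver not loaded, wrong address, or attribute not present.
        echo "unknown"
        return 1
    fi
    cat "$attr"
}
```

On the desktop described above that would be `mlx4_port_type 0000:06:00.0`, run once under the known good kernel and once under the newer one.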
Tags: No tags attached.
abrt_hash:
URL:

Activities

arrfab (administrator), 2018-01-29 21:38, note ~0031095

Upstream bug : https://bugzilla.redhat.com/show_bug.cgi?id=1539875

tru (administrator), 2018-01-30 09:04, note ~0031106

keep us posted (private bug entry upstream) :)

justinclift (reporter), 2018-01-30 23:40, note ~0031124

It's being looked into by RH kernel staff.

Someone else opened a RH BZ about this too (also private bug entry now), however for them the issue was apparently resolved by updating the BIOS on their motherboard/system.

No such luck for my system though. The firmware on the Mellanox card (2.9.1000) is the latest on the Mellanox website, and updating the motherboard BIOS (it was out of date) didn't help.

So... still "in progress". ;)

justinclift (reporter), 2018-02-06 10:51, note ~0031159

Found a workaround for now. Setting the port type manually in /etc/modprobe.d/mlx4.conf, then regenerating the initramfs image so the setting is read at boot, works.

eg:

  # echo options mlx4_core port_type_array=2 >> /etc/modprobe.d/mlx4.conf
  # dracut --force
  # grub2-mkconfig -o /boot/grub2/grub.cfg

The "port_type_array=2" option is the important bit. `modinfo mlx4_core` says
this about port_type_array:

  port_type_array:Array of port types: HW_DEFAULT (0) is default 1 for IB, 2 for Ethernet (array of int)

So for my single port card, the single 2 value means "set it to ethernet mode".
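
For cards with more than one port, the array takes one value per port, comma separated, as is usual for kernel module array parameters. A small sketch that builds the options line from port type names, using the 1 = IB / 2 = Ethernet mapping from the modinfo text above (`port_type_option` is a hypothetical helper name, not part of any package):

```shell
# Build an "options mlx4_core port_type_array=..." line from port type names.
# Usage: port_type_option eth        (single port card)
#        port_type_option eth eth    (dual port card, both Ethernet)
port_type_option() {
    list=
    for t in "$@"; do
        case $t in
            ib)  n=1 ;;
            eth) n=2 ;;
            *)   echo "unknown port type: $t" >&2; return 1 ;;
        esac
        list=${list:+$list,}$n    # join values with commas
    done
    [ -n "$list" ] || return 1    # at least one port is required
    echo "options mlx4_core port_type_array=$list"
}
```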

With the grub2-mkconfig command, I'm not sure if it was needed. But it didn't seem to hurt things. :)

After this, the card is coming up fine in ethernet mode and working as per normal. :)
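
One caveat with the echo line above: because it appends with ">>", running it a second time leaves a duplicate entry in the file. A rough sketch of an idempotent variant (`ensure_line` is a hypothetical helper name):

```shell
# Append LINE to FILE only if an identical line is not already present.
# Usage: ensure_line FILE LINE
ensure_line() {
    file=$1
    line=$2
    # -x: match whole lines only, -F: treat the line as a fixed string
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}
```

Used as `ensure_line /etc/modprobe.d/mlx4.conf "options mlx4_core port_type_array=2"`, followed by `dracut --force` as before.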

justinclift (reporter), 2018-02-18 12:49, note ~0031247

On one of my other systems, the above workaround didn't err... work.

Instead, I had to directly modify /usr/lib/modprobe.d/libmlx4.conf, changing it from:

  install mlx4_core /sbin/modprobe --ignore-install mlx4_core $CMDLINE_OPTS && (if [ -f /usr/libexec/mlx4-setup.sh -a -f /etc/rdma/mlx4.conf ]; then /usr/libexec/mlx4-setup.sh < /etc/rdma/mlx4.conf; fi; /sbin/modprobe mlx4_en; if /sbin/modinfo mlx4_ib > /dev/null 2>&1; then /sbin/modprobe mlx4_ib; fi)

to:

  install mlx4_core /sbin/modprobe --ignore-install mlx4_core port_type_array=2 $CMDLINE_OPTS && (if [ -f /usr/libexec/mlx4-setup.sh -a -f /etc/rdma/mlx4.conf ]; then /usr/libexec/mlx4-setup.sh < /etc/rdma/mlx4.conf; fi; /sbin/modprobe mlx4_en; if /sbin/modinfo mlx4_ib > /dev/null 2>&1; then /sbin/modprobe mlx4_ib; fi)

i.e. forcibly add the "port_type_array=2" option to the mlx4_core module loading line.

With that in place, and after running dracut and grub2-mkconfig to regenerate things, the card comes up correctly:

  # vi /usr/lib/modprobe.d/libmlx4.conf
  # dracut --force
  # grub2-mkconfig -o /boot/grub2/grub.cfg

The libmlx4.conf file already contains a variable called CMDLINE_OPTS, in what looks like the right place to specify options. Haven't found where that variable is set or read from though. The comment at the top of the file says it should be settable in /etc/rdma/mlx4.conf, but reading through the process used... it doesn't seem to actually be used.

The same CMDLINE_OPTS variable is used by all of the /usr/lib/modprobe.d/*conf files in the rdma-core package too, which also makes this a bit less clear.

Anyway, the above workaround gets the job done. It may need to be re-applied whenever the rdma-core package is updated though.
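
Since the edit may need repeating after package updates, it could be scripted rather than done by hand in vi. A rough sketch, assuming the install line still has the same shape as shown above (`patch_libmlx4` is a hypothetical helper name, and it is written to be safe to run more than once):

```shell
# Insert "port_type_array=2" into the mlx4_core install line, unless the
# line already carries a port_type_array option.
# Usage: patch_libmlx4 [CONF_FILE]
patch_libmlx4() {
    conf=${1:-/usr/lib/modprobe.d/libmlx4.conf}
    tmp=$(mktemp) || return 1
    # Only touch the "install mlx4_core" line, and only when it has no
    # port_type_array option yet. "&" re-inserts the matched text.
    sed '/^install mlx4_core /{
/port_type_array=/!s/--ignore-install mlx4_core /&port_type_array=2 /
}' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

The initramfs still has to be regenerated afterwards with `dracut --force`, same as in the manual steps.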

Aleksey (reporter), 2018-05-29 09:57, note ~0031938

Latest CentOS 7.5
Linux hv01 3.10.0-862.3.2.el7.x86_64 #1 SMP Mon May 21 23:36:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Mellanox ConnectX 2
0000:01:00.0 Ethernet controller [0200]: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] [15b3:6750] (rev b0)

The same bug. Above workarounds didn't work...

justinclift (reporter), 2018-05-29 10:16, note ~0031939

If you unload the Mellanox kernel modules, then manually modprobe mlx4_core with an appropriate port type... does the adapter then show up correctly? eg:

  modprobe mlx4_core port_type_array=2

And just to check, what was the behaviour with previous kernels or is this the first one you've tried it with?

Asking that because your adapter says it's a "ConnectX EN", which means it's ethernet only. I thought that could **only** show up as an ethernet adapter and nothing else. I don't think I've ever owned one though, so am not sure. :)
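
Spelled out, the unload and reload test above might look like the sketch below. The module names are the ones from the libmlx4.conf install line quoted earlier; `reload_mlx4` and the RUN variable are hypothetical. By default it only prints the commands, so nothing gets unloaded by accident.

```shell
# Unload the mlx4 modules, then load mlx4_core with a fixed port type.
# Dry run by default (commands are printed via echo). To run for real,
# as root, call it with an empty RUN:  RUN= reload_mlx4 2
reload_mlx4() {
    run=${RUN-echo}
    $run modprobe -r mlx4_ib mlx4_en mlx4_core &&
    $run modprobe mlx4_core port_type_array="${1:-2}"
}
```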

Aleksey (reporter), 2018-05-31 07:18, note ~0031971

My apologies. The card is working properly under CentOS 7.5. The problem was in switch configuration.

justinclift (reporter), 2018-05-31 12:12, note ~0031974

No worries Aleksey. :)

justinclift (reporter), 2019-03-06 09:46, note ~0033947

This bug has been closed upstream (no real resolution), so can probably be closed here too.

jrd (manager), 2019-03-06 09:56, note ~0033948

Closed per reporter's request.

Issue History

Date Modified Username Field Change
2018-01-27 00:54 justinclift New Issue
2018-01-29 21:38 arrfab Note Added: 0031095
2018-01-30 09:04 tru Note Added: 0031106
2018-01-30 23:40 justinclift Note Added: 0031124
2018-02-06 10:51 justinclift Note Added: 0031159
2018-02-18 12:49 justinclift Note Added: 0031247
2018-05-29 09:57 Aleksey Note Added: 0031938
2018-05-29 10:16 justinclift Note Added: 0031939
2018-05-31 07:18 Aleksey Note Added: 0031971
2018-05-31 12:12 justinclift Note Added: 0031974
2019-03-06 09:46 justinclift Note Added: 0033947
2019-03-06 09:56 jrd Status new => closed
2019-03-06 09:56 jrd Resolution open => not fixable
2019-03-06 09:56 jrd Note Added: 0033948