View Issue Details

ID: 0014293
Project: CentOS-7
Category: anaconda
View Status: public
Last Update: 2018-05-30 11:49
Reporter: Zugschlus
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open
Platform: Dell Poweredge R630 Revision I
OS: CentOS
OS Version: 7
Product Version:
Target Version:
Fixed in Version:
Summary: 0014293: Does not generate ifcfg-bond0 (depending on CPU version???)
Description:

Hi,

this is a very weird issue. In my lab, I have two Dell PowerEdge R630 Revision I servers. Both machines are supposed to be identical. One of them has an E5-2620 v3 CPU, and one has an E5-2620 v4 CPU. Both machines are on the same network segment, and I have verified that all firewall rules apply to both IP addresses. I am not aware of other differences, but do not rule out that there are some.

Both machines do have a bonding network interface with LACP.

I am booting both boxes from the network with:

bond=bond0:em1,em2:mode=802.3ad,miimon=100,mode=4,lacp_rate=0 initrd=boot/CentOS-7-x86_64-initrd.img inst.repo=http://<REPOIP>/<PATH>/centos-el7-x86_64-os ip=<IP>::<GATEWAY>:<MASK>:<FQDN>:bond0:none nameserver=<DNSIP> console=ttyS0,115200n8 inst.text BOOT_IMAGE=boot/CentOS-7-x86_64-vmlinuz

Immediately after the installer tmux comes up, I change to the shell. Everything from now on is done on this shell with the installer waiting with its very first question on tmux window 1.

On the box with the newer (v4) CPU, everything is fine: the network works, and /etc/resolv.conf is correctly filled with the nameserver entry.

On the box with the older (v3) CPU,
* /etc/resolv.conf is empty, preventing an installation from proceeding
* /etc/sysconfig/network-scripts/ifcfg-bond0 does not exist and
* /etc/sysconfig/network-scripts/ifcfg-em{1,2} are as if no bonding was to be configured. The IP address that is supposed to be on bond0 is on em1 instead.

Exchanging the IP addresses doesn't help; the erroneous behavior stays with the hardware.

Changing the interface name from bond0 to something else like schlorz0 or keks0 makes things work even on the v3 CPU machine. bond1 and bnd1 cause the faulty behavior as well.

I am totally at a loss about how to proceed here. I have double and triple checked any differences in configuration and input, and will do so again tomorrow. I will also submit installer logs tomorrow, but I need to go through them first.

Tags: No tags attached.
abrt_hash:
URL:

Activities

TrevorH

2017-12-20 16:20

manager   ~0030795

Is this 7.4.1708? There is a bug in the initial 7.4 kernel that means bonds do not work if either updelay= or downdelay= is given a non-zero value. This is easily bypassed by using zero.
Zugschlus

2017-12-21 09:10

reporter   ~0030803

Changing the boot command line to

append bond=bond0:em1,em2:mode=802.3ad,miimon=100,mode=4,lacp_rate=0,updelay=0,downdelay=0 initrd=boot/CentOS-7-x86_64-initrd.img inst.repo=http://<REPOIP>/<REPOPATH>/centos-el7-x86_64-os ip=<IP>::<GATEWAY>:255.255.255.128:<FQDN>:bond0:none nameserver=<DNSIP> console=ttyS0,115200n8 inst.text

doesn't change the behavior.
Zugschlus

2017-12-21 09:11

reporter   ~0030804

How do I find out the exact versions of the kernel/initrd/installer I am using? The syslog has only "CentOS 7" all over the place.
Zugschlus

2017-12-21 11:56

reporter   ~0030807

I played around with the bonding interfaces a bit and found out that all interface names lexically less than or equal to "ec0" cause the faulty behavior (sample size ~ 10), and all interface names lexically greater than or equal to "ed0" work.
Zugschlus

2017-12-21 14:58

reporter   ~0030811

After looking at the issue a bit longer, I found out the hard way that my serial console is losing entire lines when catting longer data.

This issue should be renamed to

"does not create /etc/resolv.conf if interface name is lexically less than or equal to ec0 (hardware dependent)"

All configuration files, including the ifcfg-* files, are generated correctly and contain correct data. But the installer does not generate a resolv.conf when the interface name of the bond is lexically less than or equal to "ec0", and the lines

INFO rhel-import-state:'./etc/resolv.conf' -> '/etc/resolv.conf'
DEBUG NetworkManager:<debug> [1513913516.1230] dns-mgr: update-dns: updating resolv.conf

are missing from the syslog.

When I use the installer on different hardware, or use a bonding device name lexically greater than or equal to "ed0", everything is fine.

I am attaching dmesg and syslog and a collection of other data including the ifcfg-* files, once for bond=ec0 (which shows the error) and once for bond=ed0 (which does not show the error, for reference).

syslog.ec0 (445,937 bytes)
Zugschlus

2017-12-21 14:58

reporter  

rundata.ec0 (5,598 bytes)
Zugschlus

2017-12-21 14:58

reporter  

dmesg.ec0 (113,202 bytes)
Zugschlus

2017-12-21 14:59

reporter  

syslog.ed0 (447,182 bytes)
Zugschlus

2017-12-21 15:08

reporter  

rundata.ed0 (5,666 bytes)
Zugschlus

2017-12-21 15:08

reporter  

dmesg.ed0 (112,975 bytes)
kabe

2017-12-27 06:50

reporter   ~0030826

I tried to recreate this but obviously didn't succeed.

Does saying
ip=<IP>::<GATEWAY>:255.255.255.128:<FQDN>:ec0:none:<DNSIP>
(and omitting nameserver=) change things?
kabe

2017-12-27 07:53

reporter   ~0030827

Also try
rd.debug rd.live.debug=1
kernel options. This runs the dracut initscripts under 'sh -x', so every command is traced.
If the resolv.conf is properly written out, the trace of
/usr/lib/dracut/modules.d/40network/ifup.sh will show up in journalctl as

+ for s in '"$dns1"' '"$dns2"' '$(getargs nameserver)'
+ '[' -n <DNSIP> ']'
+ echo nameserver <DNSIP>

echo is actually "echo nameserver $s >> /tmp/net.$netif.resolv.conf"
which will eventually be /etc/resolv.conf .
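
The traced loop can be approximated as follows. This is only a minimal sketch, not the dracut source: the function name write_resolv_fragment and the outdir argument are made up for illustration (the real script writes below /tmp and takes its values from $dns1, $dns2 and getargs nameserver).

```shell
#!/bin/sh
# Sketch only: mimics how the traced loop appends one "nameserver" line per
# non-empty DNS value to net.<netif>.resolv.conf.
# write_resolv_fragment and outdir are hypothetical names.
write_resolv_fragment() {
    outdir=$1
    netif=$2
    shift 2
    for s in "$@"; do
        # Same guard as in the trace: skip empty values.
        [ -n "$s" ] || continue
        echo "nameserver $s" >> "$outdir/net.$netif.resolv.conf"
    done
}
```

Called as write_resolv_fragment /tmp bond0 "$dns1" "$dns2", it would leave the fragment in /tmp/net.bond0.resolv.conf, the per-interface file named above.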
Zugschlus

2017-12-27 10:17

reporter   ~0030828

Thanks for this productive input. It'll probably be next year when I can actually try that.
Zugschlus

2018-01-02 15:21

reporter   ~0030842

The following has been accomplished with the command line

append bond=bond0:em1,em2:mode=802.3ad,miimon=100,mode=4,lacp_rate=0,updelay=0,downdelay=0 initrd=boot/CentOS-7-x86_64-initrd.img inst.repo=http://<REPOIP>/<REPOPATH>/centos-el7-x86_64-os ip=<IP>::<GATEWAY>:255.255.255.128:<FQDN>:bond0:none:10.62.0.18 rd.net.timeout.carrier=60 rd.net.timeout.ifup=60 rd.debug rd.live.debug=1

Behavior on the affected box is the same, /etc/resolv.conf is empty, and /mnt/sysimage/etc/resolv.conf is empty as well.

There is no /usr/lib/dracut/modules.d/40network/ifup.sh

The relevant parts of the journalctl output (filtered for 'nameserver') are:

05:47:39,500 INFO dracut-initqueue://usr/sbin/ifup@356(): getargs nameserver
05:47:40,236 INFO dracut-initqueue:/usr/sbin/ifup@393(): for s in '"$dns1"' '"$dns2"' '$(getargs nameserver)'
05:47:40,237 INFO dracut-initqueue:/usr/sbin/ifup@395(): echo nameserver <DNSIP>
05:47:40,237 INFO dracut-initqueue:/usr/sbin/ifup@393(): for s in '"$dns1"' '"$dns2"' '$(getargs nameserver)'
05:47:44,540 INFO dracut-pre-pivot:////lib/dracut/hooks/pre-pivot/85-write-ifcfg.sh@114(source): getargs nameserver
05:47:44,551 INFO dracut-cmdline:///lib/dracut/hooks/cmdline/28-parse-anaconda-net.sh@6(source): check_depr_arg dns nameserver=%s

So this looks like it comes through just fine. What bothers me is that find / -name ifup doesn't find the actual script in the installer, so I cannot take a look at the script itself. find . -name '*ifup*' finds only some of the scripts in /etc/sysconfig/network-scripts.

I am attaching the entire syslog as syslog.20180102.

Greetings
Marc

syslog.20180102 (939,994 bytes)
Zugschlus

2018-01-02 15:29

reporter   ~0030843

When I grep the entire installer for the DNSIP, it shows up in:

[anaconda root@host ~]# grep -lr DNSIP /[a-mrt-z]* /sbin 2>/dev/null
/run/log/journal/2147dd0c062d4ac5a717bf1404d668a6/system.journal
/run/initramfs/net.bond0.resolv.conf
/run/initramfs/net.bond0.override
/tmp/syslog
/tmp/pre-anaconda-logs/kernel_ring_buffer.log
/var/log/dmesg
[anaconda root@host ~]#

And /run/initramfs/net.bond0.resolv.conf is, in fact, correct. Now we "only" need to find out why it makes it neither to /etc/resolv.conf in the installer nor to the installed system.
kabe

2018-01-09 04:30

reporter   ~0030885

The dracut initscripts live in the initramfs and disappear after switch-root,
so you need to specify "rd.break" (dracut.cmdline(7)) on the kernel command line to
examine the scripts and files during initramfs boot.

The resolv.conf comes from /run/initramfs/state/etc/resolv.conf, and
is copied to the real root filesystem by rhel-import-state.service.


Are you by chance using a CentOS 7.3 install disc for the problematic machine?
/usr/lib/systemd/rhel-import-state of CentOS 7.3 is short and emits

INFO rhel-import-state:'./etc/resolv.conf' -> '/etc/resolv.conf'

line after "Starting Import network configuration from initramfs..."
but the CentOS 7.4 one is long and does NOT emit the above line.

The problem seems to lie in the dracut-network package, which is updated often,
so using a 7.4 disc may change the behavior.
Zugschlus

2018-01-12 09:31

reporter   ~0030917

Both machines boot via PXE from the same kernel/initrd configuration; the kickstart template is generated in Foreman and does not differ in any place other than host name/IP address.

When I swap the MAC addresses in the Foreman host configuration, the error stays with the hardware.
Zugschlus

2018-01-24 15:07

reporter   ~0031014

The new attachment "nm.txt" shows a very interesting excerpt from installer logs, both originating on machines with an "Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz" CPU. One of them creates a working resolv.conf, the other doesn't. So we can rule out the CPU as the cause of the issue, which makes things a lot easier.

The nm.txt is interesting because one can see that the working system creates the NetworkManager "connections" 0, 1, 2 for em1, em2, and bond0 in order before the connection 3 is generated with the correct DNS IP in place.

The non-working system has em2 and bond0 swapped, so that em1 is connection 0, bond0 connection 1, em2 connection 2. In that case, connection 3 is generated for bond0 with an empty DNS list.

I bet that the generation of connection 2 for em2 overwrites the internal list of DNS addresses with an empty one, causing the generation of connection 3 to build the DNS list from wrong input.

nm.txt (8,130 bytes)
this is the working reference system:
connection 'new connection' (0x7f24000087c0/NMIfcfgConnection): [/org/freedesktop/NetworkManager/Settings/0]
connection                [ 0x7f241516d2b0 ]
connection.id             = 'em1'
connection.interface-name = 'em1'
connection.master         = 'bond0'
connection.permissions    = []
connection.secondaries    = []
connection.slave-type     = 'bond'
connection.type           = '802-3-ethernet'
connection.uuid           = 'b2b143af-7603-4299-ab10-e2bd811e189e'
802-3-ethernet            [ 0x7f2415146bc0 ]
802-3-ethernet.mac-address-blacklist = []
802-3-ethernet.s390-options = ((GHashTable*) 0x7f24151a6580)
tings-connection[0x7f2400008910,615f7186-d5bc-4c94-8493-57f1f171c538]: failed to read connection timestamp: Key file does not have group 'timestamps'
connection 'new connection' (0x7f2400008910/NMIfcfgConnection): [/org/freedesktop/NetworkManager/Settings/1]
connection                [ 0x7f24151a70c0 ]
connection.id             = 'em2'
connection.interface-name = 'em2'
connection.master         = 'bond0'
connection.permissions    = []
connection.secondaries    = []
connection.slave-type     = 'bond'
connection.type           = '802-3-ethernet'
connection.uuid           = '615f7186-d5bc-4c94-8493-57f1f171c538'
802-3-ethernet            [ 0x7f2415146b20 ]
802-3-ethernet.mac-address-blacklist = []
802-3-ethernet.s390-options = ((GHashTable*) 0x7f24151ab9e0)
tings-connection[0x7f2400008670,491d29ae-7a62-4d64-9d72-e9c9563f6199]: failed to read connection timestamp: Key file does not have group 'timestamps'
connection 'new connection' (0x7f2400008670/NMIfcfgConnection): [/org/freedesktop/NetworkManager/Settings/2]
connection                [ 0x7f241516d410 ]
connection.id             = 'bond0'
connection.interface-name = 'bond0'
connection.permissions    = []
connection.secondaries    = []
connection.type           = 'bond'
connection.uuid           = '491d29ae-7a62-4d64-9d72-e9c9563f6199'
802-3-ethernet            [ 0x7f24151469e0 ]
802-3-ethernet.mac-address-blacklist = []
802-3-ethernet.s390-options = ((GHashTable*) 0x7f24151a61e0)
bond                      [ 0x7f241518ba60 ]
bond.options              = ((GHashTable*) 0x7f24151a6400)
ipv4                      [ 0x7f241516d570 ]
ipv4.addresses            = ((GPtrArray*) 0x7f241519a900)
ipv4.dns                  = [\"<dnsip>"]
ipv4.dns-search           = []
ipv4.gateway              = '<gatewayip>'
ipv4.method               = 'manual'
ipv4.routes               = ((GPtrArray*) 0x7f241519a8c0)
ipv6                      [ 0x7f241516d4c0 ]
ipv6.addr-gen-mode        = 0
ipv6.addresses            = ((GPtrArray*) 0x7f241517aca0)
ipv6.dns                  = []
ipv6.dns-search           = []
ipv6.method               = 'auto'
ipv6.routes               = ((GPtrArray*) 0x7f241519a840)

and later:
connection 'new connection' (0x7f2400008a60/NMIfcfgConnection): [/org/freedesktop/NetworkManager/Settings/3]
connection                [ 0x7f24151a7430 ]
connection.autoconnect    = FALSE
connection.id             = 'bond0'
connection.interface-name = 'bond0'
connection.permissions    = []
connection.secondaries    = []
connection.timestamp      = 1516791010
connection.type           = 'bond'
connection.uuid           = '6a894059-63cd-4be0-8b9b-acdf3516b14d'
bond                      [ 0x7f24151f2de0 ]
bond.options              = ((GHashTable*) 0x7f24151f6400)
ipv4                      [ 0x7f241516d360 ]
ipv4.addresses            = ((GPtrArray*) 0x7f24151f54a0)
ipv4.dns                  = [\"10.62.0.18\"]
ipv4.dns-priority         = 100
ipv4.dns-search           = []
ipv4.gateway              = '10.62.246.129'
ipv4.method               = 'manual'
ipv4.route-metric         = 0
ipv4.routes               = ((GPtrArray*) 0x7f24151f5400)
ipv6                      [ 0x7f24151a74e0 ]
ipv6.addresses            = ((GPtrArray*) 0x7f24151f5380)
ipv6.dns                  = []
ipv6.dns-priority         = 100
ipv6.dns-search           = []
ipv6.method               = 'link-local'
ipv6.routes               = ((GPtrArray*) 0x7f24151f5340)

this is the same log from the non-working system:
connection 'new connection' (0x7f6bdc0087c0/NMIfcfgConnection): [/org/freedesktop/NetworkManager/Settings/0]
connection                [ 0x7f6bf0a772b0 ]
connection.id             = 'em1'
connection.interface-name = 'em1'
connection.master         = 'bond0'
connection.permissions    = []
connection.secondaries    = []
connection.slave-type     = 'bond'
connection.type           = '802-3-ethernet'
connection.uuid           = '15793a3f-8c5f-4136-aadf-2208f33d2199'
802-3-ethernet            [ 0x7f6bf0a50d10 ]
802-3-ethernet.mac-address-blacklist = []
802-3-ethernet.s390-options = ((GHashTable*) 0x7f6bf0ab02a0)
tings-connection[0x7f6bdc008670,2c963f9d-7d12-4d7c-a51c-6124ab8f035b]: failed to read connection timestamp: Key file does not have group 'timestamps'
connection 'new connection' (0x7f6bdc008670/NMIfcfgConnection): [/org/freedesktop/NetworkManager/Settings/1]
connection                [ 0x7f6bf0a77410 ]
connection.id             = 'bond0'
connection.interface-name = 'bond0'
connection.permissions    = []
connection.secondaries    = []
connection.type           = 'bond'
connection.uuid           = '2c963f9d-7d12-4d7c-a51c-6124ab8f035b'
802-3-ethernet            [ 0x7f6bf0a50b30 ]
802-3-ethernet.mac-address-blacklist = []
802-3-ethernet.s390-options = ((GHashTable*) 0x7f6bf0ab0360)
bond                      [ 0x7f6bf0a95a60 ]
bond.options              = ((GHashTable*) 0x7f6bf0ab0300)
ipv4                      [ 0x7f6bf0a77570 ]
ipv4.addresses            = ((GPtrArray*) 0x7f6bf0aa4b40)
ipv4.dns                  = [\"<dnsip>"]
ipv4.dns-search           = []
ipv4.gateway              = '<gatewayip>'
ipv4.method               = 'manual'
ipv4.routes               = ((GPtrArray*) 0x7f6bf0a84fa0)
ipv6                      [ 0x7f6bf0a774c0 ]
ipv6.addr-gen-mode        = 0
ipv6.addresses            = ((GPtrArray*) 0x7f6bf0a84fa0)
ipv6.dns                  = []
ipv6.dns-search           = []
ipv6.method               = 'auto'
ipv6.routes               = ((GPtrArray*) 0x7f6bf0aa4b40)
tings-connection[0x7f6bdc008910,c2197244-e1ee-4e91-a5b5-6894ddcc448f]: failed to read connection timestamp: Key file does not have group 'timestamps'
connection 'new connection' (0x7f6bdc008910/NMIfcfgConnection): [/org/freedesktop/NetworkManager/Settings/2]
connection                [ 0x7f6bf0ab10c0 ]
connection.id             = 'em2'
connection.interface-name = 'em2'
connection.master         = 'bond0'
connection.permissions    = []
connection.secondaries    = []
connection.slave-type     = 'bond'
connection.type           = '802-3-ethernet'
connection.uuid           = 'c2197244-e1ee-4e91-a5b5-6894ddcc448f'
802-3-ethernet            [ 0x7f6bf0a50c70 ]
802-3-ethernet.mac-address-blacklist = []
802-3-ethernet.s390-options = ((GHashTable*) 0x7f6bf0ab4360)

and later:
connection 'new connection' (0x7f6bdc008a60/NMIfcfgConnection): [/org/freedesktop/NetworkManager/Settings/3]
connection                [ 0x7f6bf0ab1430 ]
connection.autoconnect    = FALSE
connection.id             = 'bond0'
connection.interface-name = 'bond0'
connection.permissions    = []
connection.secondaries    = []
connection.timestamp      = 1516842264
connection.type           = 'bond'
connection.uuid           = '2c619644-d004-473f-a19a-1f5938f394f9'
bond                      [ 0x7f6bf0afc9e0 ]
bond.options              = ((GHashTable*) 0x7f6bf0affc60)
ipv4                      [ 0x7f6bf0a77360 ]
ipv4.addresses            = ((GPtrArray*) 0x7f6bf0afeea0)
ipv4.dns                  = []
ipv4.dns-priority         = 100
ipv4.dns-search           = []
ipv4.gateway              = '10.62.254.129'
ipv4.method               = 'manual'
ipv4.route-metric         = 0
ipv4.routes               = ((GPtrArray*) 0x7f6bf0afeda0)
ipv6                      [ 0x7f6bf0ab14e0 ]
ipv6.addresses            = ((GPtrArray*) 0x7f6bf0afee00)
ipv6.dns                  = []
ipv6.dns-priority         = 100
ipv6.dns-search           = []
ipv6.method               = 'link-local'
ipv6.routes               = ((GPtrArray*) 0x7f6bf0afed40)

Zugschlus

2018-05-24 12:22

reporter   ~0031909

Let me come back to this old issue. We now have a second machine (out of several hundred supposedly identical boxes that work flawlessly) showing this behavior. I can confirm that the kernel being booted is "Linux kernel x86 boot executable bzImage, version 3.10.0-862.el7.x86_64 (builder@kbuilder.dev.centos.org) #1 SMP , RO-rootFS, swap_dev 0x5, Normal VGA", and etc/initrd-release in the initrd being booted contains

NAME="dracut"
VERSION="7 (Core) dracut-033-535.el7"
ID=dracut
VERSION_ID=033-535.el7
PRETTY_NAME="CentOS Linux 7 (Core) dracut-033-535.el7 (Initramfs)"
ANSI_COLOR="0;34"
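
Since the file shown above consists of shell-style assignments, the dracut version can be read from an extracted copy of etc/initrd-release with a few lines of shell. This is only a sketch; the function name initrd_dracut_version is made up, and it assumes the file has already been unpacked from the initrd somewhere on disk.

```shell
#!/bin/sh
# Sketch only: print the dracut version recorded in an extracted
# etc/initrd-release file. initrd_dracut_version is a hypothetical name.
# Pass a path that contains a slash, so that "." does not search $PATH.
initrd_dracut_version() {
    # The file holds KEY="value" assignments, so it can be sourced; the
    # subshell keeps those variables out of the caller's environment.
    (
        . "$1" || exit 1
        printf '%s\n' "$VERSION_ID"
    )
}
```

Note that this only identifies the dracut build recorded in the image, not the versions of the other packages used to build it.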

The kernel is, to my understanding, the one that comes with CentOS 7.5, right?
Zugschlus

2018-05-25 12:47

reporter   ~0031916

I think that I have now more or less understood how the installer works. Please correct me if I got things wrong:

* the first thing that comes up is a standard initramfs created and controlled by dracut. This is the part of the system that actually parses the ip= and nameserver= options from the command line, builds its own /etc/resolv.conf[1] and initializes the network.

* the dracut initramfs then constructs a new system in /sysroot. After this has finished, rd.break drops me to a shell[2].

* after exiting from the rd.break shell or if rd.break was not given in the first place, it pivot_roots into /sysroot and the installer begins operation.

On the systems in question, the /etc/resolv.conf[1] does contain a meaningful name server. However, the /sysroot/etc/resolv.conf is empty and remains empty in the pivot_root, so that the actual installer runs nameserverless.

I am sorry that I don't have "working" hardware available that I can freely reinstall, so I cannot cross-check what things look like on a "working" system.

If I can give helpful debug info from the broken system, please tell me what you need and I'll see what I can deliver.
kabe

2018-05-25 14:44

reporter   ~0031917

I think you are on the right track.
>> The kernel is, to my understanding, the one that comes with CentOS 7.5, right?
Right, but the kernel isn't the big deal;
the versions of the dracut, dracut-network and initscripts packages used in
the initramfs are what matter. From your info you seem to be using the 7.5 versions of them.

As I mentioned in https://bugs.centos.org/view.php?id=14293#c30885 ,
new root filesystem's /etc/resolv.conf comes from /run/initramfs/state/etc/resolv.conf .
IIRC the file is copied AFTER switch-root, by /usr/lib/systemd/rhel-import-state .

When dropping into rd.break shell,
does /run/initramfs/state/etc/resolv.conf contain a correct nameserver line?
Does filling in the file with correct info, and then continuing the boot by typing "exit" in the rdshell,
temporarily fix the problem?
Zugschlus

2018-05-28 09:32

reporter   ~0031930

Hi Kabe, how do I find out the versions of dracut, dracut-network and the initscripts used to build the installer initrd.img? Afaik, there is no package manager inside the initrd, is there?

On the affected machine, there is /run/initramfs/state/etc/, but no /run/initramfs/state/etc/resolv.conf. Creating that file with a marker comment and correct information about the nameserver makes the install complete, and the file including the marker comment is copied into /etc/resolv.conf in both the installer and the final system.
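
For reference, the manual workaround amounts to something like the following in the rd.break shell. This is a sketch under assumptions: write_state_resolv and its optional second argument are made-up names, and the marker text is arbitrary; only the path /run/initramfs/state/etc/resolv.conf comes from this issue.

```shell
#!/bin/sh
# Sketch only: recreate the missing state file by hand.
# write_state_resolv and its optional second argument are hypothetical;
# /run/initramfs/state is the directory named in this issue.
write_state_resolv() {
    dns_ip=$1
    state_root=${2:-/run/initramfs/state}
    mkdir -p "$state_root/etc" || return 1
    {
        # Marker comment, so the file can be recognised after it was copied.
        echo "# written by hand from the rd.break shell"
        echo "nameserver $dns_ip"
    } > "$state_root/etc/resolv.conf"
}
```

After calling write_state_resolv <DNSIP> and leaving the shell with "exit", the boot continues and the file is picked up as described.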

Does this help to narrow down which code to look at?

Greetings
Marc
kabe

2018-05-29 11:58

reporter   ~0031940

>> how do I find out the versions of dracut
You can't. There's no RPM database in the initrd, so you just assume that
the initrd is spun with the packages from the distribution (os/x86_64/Packages/).


What's the MAC address of the problematic machine?

Since /usr/lib/dracut/modules.d/45ifcfg/write-ifcfg.sh does

for netup in /tmp/net.*.did-setup ; do
  netif=${netup%%.did-setup}
  ...
done
...
{
  ...
  cp /tmp/net.$netif.resolv.conf /run/initramfs/state/etc/resolv.conf
  ...
}

and /tmp/net.ma:ca:dd:re:ss:nn.did-setup and /tmp/net.bond0.did-setup both exist,
if the MAC address of the machine sorts after "bond0",
$netif could end up as "ma:ca:dd:re:ss:nn" instead of "bond0", and resolv.conf could be empty.

What happens if you set the bond name to "ffbond0"?
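
The ordering effect can be tried outside the initramfs with a small sketch. Everything here is illustrative: last_netif is a made-up helper, the MAC address ec:f4:bb:00:00:01 used below is invented (only the "ec" prefix is taken from this issue), and byte-wise glob ordering (C locale) is an assumption about the initramfs shell.

```shell
#!/bin/sh
# Sketch only: reproduce the ordering effect outside the initramfs.
# last_netif is a hypothetical helper; it prints the value $netif is left
# with after a loop shaped like the one quoted above.
LC_ALL=C
export LC_ALL   # assumption: byte-wise glob ordering, as in a C/POSIX locale

last_netif() {
    dir=$(mktemp -d) || return 1
    # Create one net.<name>.did-setup marker per argument.
    for name in "$@"; do
        : > "$dir/net.$name.did-setup"
    done
    netif=
    # The glob expands in sorted order, so the last match wins.
    for netup in "$dir"/net.*.did-setup; do
        [ -f "$netup" ] || continue
        netif=${netup%%.did-setup}
        netif=${netif##*/net.}
    done
    rm -rf "$dir"
    printf '%s\n' "$netif"
}
```

Called as last_netif bond0 ec:f4:bb:00:00:01 or last_netif ec0 ec:f4:bb:00:00:01, it prints the MAC address; called as last_netif ed0 ec:f4:bb:00:00:01, it prints ed0. That would be consistent with the ec0/ed0 boundary reported earlier in this issue.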
Zugschlus

2018-05-30 10:45

reporter   ~0031955

I already found out in December that naming the bond interface something lexically "after" "ec" makes things work. And, indeed, the two machines in question both have MAC addresses beginning with "ec".
However, we do have a two-digit number of machines with such MAC addresses and bonding configuration, and only two that show the cited behavior. This is what still troubles me a bit. Unfortunately, I can't take those boxes out of service just to try it, and I don't know when they've been installed and what OS version was in use.

So, the hypothesis would be that the bug shows iff:
* the MAC address is > c0
* the interface is named bond.*, making the MAC address lexically "bigger" than the interface name
* a reasonably new OS version is used (we first saw the issue in November 2017; that was CentOS 7.3, IIRC)
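
The second condition can be checked for a given host without booting it. The following is only a sketch: sorts_before_mac is a made-up helper, and it assumes the initramfs shell orders the /tmp/net.*.did-setup names byte-wise (C locale).

```shell
#!/bin/sh
# Sketch only: does the bond's marker file sort before the MAC's marker file?
# sorts_before_mac is a hypothetical helper; byte-wise (C locale) ordering
# is an assumption about how the initramfs shell orders glob results.
sorts_before_mac() {
    bond=$1
    mac=$2
    # Compare the full file names, as the glob in the script would see them.
    first=$(printf 'net.%s.did-setup\n' "$bond" "$mac" | LC_ALL=C sort | head -n 1)
    [ "$first" = "net.$bond.did-setup" ] && [ "$bond" != "$mac" ]
}
```

The function returns success when the bond name sorts first, which under this hypothesis is the case in which the MAC-named entry is processed last.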

What would a probable fix look like? Exclude files matching /tmp/net.$MAC.did-setup from the loop? Can I try that without having to rebuild the entire initrd?

Greetings
Marc
kabe

2018-05-30 11:07

reporter   ~0031957

>> Can I try that without having to rebuild the entire initrd?
Nope, you have to
- rebuild the dracut-network.rpm package, and
- rebuild the install image by using lorax.
Not an easy ride, and largely undocumented.
(Does CentOS release team use Pungi?)

Since this seems to be a bug in dracut-network, you could raise an issue in the RHEL Bugzilla.
If naming the bond "zbond0" instead of "bond0" works, that will be a workaround we could use for now.
kabe

2018-05-30 11:31

reporter   ~0031958

The logic to skip /tmp/net.$MAC.did-setup seems to be already there,
but the $netif variable is left holding a stale value when an entry is skipped.
The fix may look something like this (untested):

diff -c /usr/lib/dracut/modules.d/45ifcfg/write-ifcfg.sh /tmp/write-ifcfg.sh
*** /usr/lib/dracut/modules.d/45ifcfg/write-ifcfg.sh 2017-08-05 23:59:11.000000000 +0900
--- /tmp/write-ifcfg.sh 2018-05-30 20:24:18.440359039 +0900
***************
*** 114,123 ****
  for netup in /tmp/net.*.did-setup ; do
      [ -f $netup ] || continue
  
! netif=${netup%%.did-setup}
! netif=${netif##*/net.}
! strstr "$netif" ":*:*:*:*:" && continue
! [ -e /tmp/ifcfg/ifcfg-$netif ] && continue
      unset bridge
      unset bond
      unset team
--- 114,124 ----
  for netup in /tmp/net.*.did-setup ; do
      [ -f $netup ] || continue
  
! netif_tmp=${netup%%.did-setup}
! netif_tmp=${netif_tmp##*/net.}
! strstr "$netif_tmp" ":*:*:*:*:" && continue
! [ -e /tmp/ifcfg/ifcfg-$netif_tmp ] && continue
! netif=$netif_tmp
      unset bridge
      unset bond
      unset team
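
The idea behind the patch can be exercised outside the initramfs with a small sketch. It is not the dracut code: last_netif_fixed is a made-up helper, the case pattern stands in for the script's strstr check, the ifcfg-file check is left out, and the MAC address ec:f4:bb:00:00:01 used below is invented. Both assignments operate on netif_tmp here.

```shell
#!/bin/sh
# Sketch only: the loop shape with the temporary-variable idea applied.
# last_netif_fixed is a hypothetical helper; the case pattern stands in for
# the script's strstr MAC check.
LC_ALL=C
export LC_ALL   # assumption: byte-wise glob ordering, as in a C/POSIX locale

last_netif_fixed() {
    dir=$(mktemp -d) || return 1
    # Create one net.<name>.did-setup marker per argument.
    for name in "$@"; do
        : > "$dir/net.$name.did-setup"
    done
    netif=
    for netup in "$dir"/net.*.did-setup; do
        [ -f "$netup" ] || continue
        netif_tmp=${netup%%.did-setup}
        netif_tmp=${netif_tmp##*/net.}
        # Skip MAC-named entries before $netif is touched.
        case $netif_tmp in
            *:*:*:*:*:*) continue ;;
        esac
        netif=$netif_tmp
    done
    rm -rf "$dir"
    printf '%s\n' "$netif"
}
```

With this shape, last_netif_fixed bond0 ec:f4:bb:00:00:01 prints bond0 even though the MAC-named entry sorts last.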
Zugschlus

2018-05-30 11:49

reporter   ~0031959

We could also build without bonding in the first place and establish the bond later.

I opened https://github.com/dracutdevs/dracut/issues/410 upstream. Thanks for your help, that was _really_ insightful.

Issue History

Date Modified Username Field Change
2017-12-20 16:14 Zugschlus New Issue
2017-12-20 16:20 TrevorH Note Added: 0030795
2017-12-21 09:10 Zugschlus Note Added: 0030803
2017-12-21 09:11 Zugschlus Note Added: 0030804
2017-12-21 11:56 Zugschlus Note Added: 0030807
2017-12-21 14:58 Zugschlus File Added: syslog.ec0
2017-12-21 14:58 Zugschlus Note Added: 0030811
2017-12-21 14:58 Zugschlus File Added: rundata.ec0
2017-12-21 14:58 Zugschlus File Added: dmesg.ec0
2017-12-21 14:59 Zugschlus File Added: syslog.ed0
2017-12-21 15:08 Zugschlus File Added: rundata.ed0
2017-12-21 15:08 Zugschlus File Added: dmesg.ed0
2017-12-27 06:50 kabe Note Added: 0030826
2017-12-27 07:53 kabe Note Added: 0030827
2017-12-27 10:17 Zugschlus Note Added: 0030828
2018-01-02 15:21 Zugschlus File Added: syslog.20180102
2018-01-02 15:21 Zugschlus Note Added: 0030842
2018-01-02 15:29 Zugschlus Note Added: 0030843
2018-01-09 04:30 kabe Note Added: 0030885
2018-01-12 09:31 Zugschlus Note Added: 0030917
2018-01-24 15:07 Zugschlus File Added: nm.txt
2018-01-24 15:07 Zugschlus Note Added: 0031014
2018-05-24 12:22 Zugschlus Note Added: 0031909
2018-05-25 12:47 Zugschlus Note Added: 0031916
2018-05-25 14:44 kabe Note Added: 0031917
2018-05-28 09:32 Zugschlus Note Added: 0031930
2018-05-29 11:58 kabe Note Added: 0031940
2018-05-30 10:45 Zugschlus Note Added: 0031955
2018-05-30 11:07 kabe Note Added: 0031957
2018-05-30 11:31 kabe Note Added: 0031958
2018-05-30 11:49 Zugschlus Note Added: 0031959