2018-01-23 17:25 UTC

View Issue Details
ID: 0014293
Project: CentOS-7
Category: anaconda
View Status: public
Last Update: 2018-01-12 09:31
Reporter: Zugschlus
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open
Platform: Dell Poweredge R630 Revision I
OS: CentOS
OS Version: 7
Product Version:
Target Version:
Fixed in Version:
Summary: 0014293: Does not generate ifcfg-bond0 (depending on CPU version???)
Description:

Hi,

this is a very weird issue. In my lab, I have two Dell PowerEdge R630 Revision I servers. Both machines are supposed to be identical. One of them has an E5-2620 v3 CPU, and the other has an E5-2620 v4 CPU. Both machines are on the same network segment, and I have verified that all firewall rules apply to both IP addresses. I am not aware of other differences, but I cannot rule out that there are some.

Both machines do have a bonding network interface with LACP.

I am booting both boxes from the network with:

bond=bond0:em1,em2:mode=802.3ad,miimon=100,mode=4,lacp_rate=0 initrd=boot/CentOS-7-x86_64-initrd.img inst.repo=http://<REPOIP>/<PATH>/centos-el7-x86_64-os ip=<IP>::<GATEWAY>:<MASK>:<FQDN>:bond0:none nameserver=<DNSIP> console=ttyS0,115200n8 inst.text BOOT_IMAGE=boot/CentOS-7-x86_64-vmlinuz
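
For reference, the bond= value above has the shape bondname:slaves:options described in dracut.cmdline(7). The following is a minimal Python sketch, not part of the installer, that splits such a value and reports option keys given more than once. The function names parse_bond_arg and duplicated_options are made up for this illustration. It shows that mode appears twice in the line above (802.3ad and 4 name the same bonding mode).

```python
def parse_bond_arg(value):
    """Split 'bondname:slave1,slave2:opt=val,opt=val' into its parts."""
    # Pad with empty strings so a bare 'bond0' still yields three fields.
    name, slaves, options = (value.split(":", 2) + ["", ""])[:3]
    parsed_options = []
    for item in filter(None, options.split(",")):
        key, _, val = item.partition("=")
        parsed_options.append((key, val))
    return {
        "name": name,
        "slaves": [s for s in slaves.split(",") if s],
        "options": parsed_options,
    }


def duplicated_options(parsed):
    """Return option keys that appear more than once, in order of appearance."""
    seen, dupes = set(), []
    for key, _ in parsed["options"]:
        if key in seen and key not in dupes:
            dupes.append(key)
        seen.add(key)
    return dupes


bond = parse_bond_arg("bond0:em1,em2:mode=802.3ad,miimon=100,mode=4,lacp_rate=0")
print(bond["name"], bond["slaves"], duplicated_options(bond))
```

Whether the duplicated mode option has any bearing on the problem described here is not established; the sketch only makes the structure of the argument visible.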

Immediately after the installer tmux comes up, I change to the shell. Everything from now on is done on this shell with the installer waiting with its very first question on tmux window 1.

On the box with the newer (v4) CPU, everything is fine. The network works, and /etc/resolv.conf correctly contains the nameserver entry.

On the box with the older (v3) CPU,
* /etc/resolv.conf is empty, preventing an installation from proceeding
* /etc/sysconfig/network-scripts/ifcfg-bond0 does not exist and
* /etc/sysconfig/network-scripts/ifcfg-em{1,2} are as if no bonding was to be configured. The IP address that is supposed to be on bond0 is on em1 instead.

Exchanging the IP addresses doesn't help; the erroneous behavior stays with the hardware.

Changing the interface name from bond0 to something else like schlorz0 or keks0 makes things work even on the v3 CPU machine. bond1 and bnd1 cause the faulty behavior as well.

I am totally at a loss about how to proceed here. I have double and triple checked any differences in configuration and input, and will do so again tomorrow. I will also submit installer logs tomorrow, but I need to go through them first.

Tags: No tags attached.
abrt_hash:
URL:
Attached Files:

Notes

~0030795

TrevorH (developer)

Is this 7.4.1708? There is a bug in the initial 7.4 kernel that means bonds do not work if either updelay= or downdelay= is specified with a non-zero value. It is easily worked around by setting both to zero.

~0030803

Zugschlus (reporter)

Changing the boot command line to

append bond=bond0:em1,em2:mode=802.3ad,miimon=100,mode=4,lacp_rate=0,updelay=0,downdelay=0 initrd=boot/CentOS-7-x86_64-initrd.img inst.repo=http://<REPOIP>/<REPOPATH>/centos-el7-x86_64-os ip=<IP>::<GATEWAY>:255.255.255.128:<FQDN>:bond0:none nameserver=<DNSIP> console=ttyS0,115200n8 inst.text

doesn't change the behavior.

~0030804

Zugschlus (reporter)

How do I find out the exact versions of the kernel/initrd/installer I am using? The syslog has only "CentOS 7" all over the place.

~0030807

Zugschlus (reporter)

I played around with the bonding interfaces a bit and found out that all interface names lexically less than or equal to "ec0" caused the faulty behavior (sample size ~ 10), and all interface names lexically greater than or equal to "ed0" work.
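
The observation can be checked against the interface names mentioned in this report with a short Python sketch. It assumes plain byte-wise string ordering, which is what Python uses for ASCII names. Whether the installer compares names this way anywhere is not known, and the helper below_boundary is made up for this illustration.

```python
# Names reported as failing and as working in this issue.
failing = ["bond0", "bond1", "bnd1", "ec0"]
working = ["schlorz0", "keks0", "ed0"]


def below_boundary(name, boundary="ed0"):
    """True if name sorts strictly before the boundary name."""
    return name < boundary


ordered = sorted(failing + working)
print(ordered)
print(all(below_boundary(n) for n in failing))      # failing names sort before ed0
print(not any(below_boundary(n) for n in working))  # working names do not
```

All reported names are consistent with a boundary between "ec0" and "ed0" under this ordering, which supports the observation but does not explain it.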

~0030811

Zugschlus (reporter)

After looking at the issue a bit longer, I found out the hard way that my serial console is losing entire lines when catting longer data.

This issue should be renamed to

"does not create /etc/resolv.conf if interface name is lexically less than or equal to ec0 (hardware dependent)"

All configuration files, including the ifcfg-* files, are generated correctly and contain correct data. But the installer does not generate a resolv.conf when the interface name of the bond is lexically less than or equal to "ec0", and the lines

INFO rhel-import-state:'./etc/resolv.conf' -> '/etc/resolv.conf'
DEBUG NetworkManager:<debug> [1513913516.1230] dns-mgr: update-dns: updating resolv.conf

are missing from the syslog.

When I use the installer on different hardware, or use a bonding device name lexically greater than or equal to "ed0", everything is fine.

I am attaching dmesg and syslog and a collection of other data including the ifcfg-* files, once for bond=ec0 (which shows the error) and once for bond=ed0 (which does not show the error, for reference).

~0030826

kabe (reporter)

I tried to recreate this but obviously didn't succeed.

Does saying
ip=<IP>::<GATEWAY>:255.255.255.128:<FQDN>:ec0:none:<DNSIP>
(and omitting nameserver=) change things?

~0030827

kabe (reporter)

Also try the
rd.debug rd.live.debug=1
kernel options. These enable 'sh -x' tracing of the dracut initscripts.
If the resolv.conf is properly written out, the trace of
/usr/lib/dracut/modules.d/40network/ifup.sh will show up in journalctl as

+ for s in '"$dns1"' '"$dns2"' '$(getargs nameserver)'
+ '[' -n <DNSIP> ']'
+ echo nameserver <DNSIP>

The echo is actually "echo nameserver $s >> /tmp/net.$netif.resolv.conf",
which will eventually become /etc/resolv.conf.
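
The loop in that trace can be modelled in a few lines of Python. This is only a sketch of the logic visible in the trace above, not the dracut script itself. The function name resolv_conf_lines and the address 192.0.2.53 (a documentation address standing in for <DNSIP>) are made up for this illustration.

```python
def resolv_conf_lines(dns1, dns2, cmdline_nameservers):
    """Model of the traced loop: one 'nameserver' line per non-empty entry.

    dns1 and dns2 stand for the DNS fields of the ip= argument, and
    cmdline_nameservers for the values returned by 'getargs nameserver'.
    """
    lines = []
    for s in [dns1, dns2, *cmdline_nameservers]:
        if s:  # corresponds to the '[' -n ... ']' test in the trace
            lines.append(f"nameserver {s}")
    return lines


# DNS server given only via nameserver= on the command line.
print(resolv_conf_lines("", "", ["192.0.2.53"]))
# DNS server given only as the last field of ip=.
print(resolv_conf_lines("192.0.2.53", "", []))
```

Both ways of passing the DNS server should produce the same single line if the script behaves as the trace suggests.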

~0030828

Zugschlus (reporter)

Thanks for this productive input. It'll probably be next year when I can actually try that.

~0030842

Zugschlus (reporter)

The following has been accomplished with the command line

append bond=bond0:em1,em2:mode=802.3ad,miimon=100,mode=4,lacp_rate=0,updelay=0,downdelay=0 initrd=boot/CentOS-7-x86_64-initrd.img inst.repo=http://<REPOIP>/<REPOPATH>/centos-el7-x86_64-os ip=<IP>::<GATEWAY>:255.255.255.128:<FQDN>:bond0:none:10.62.0.18 rd.net.timeout.carrier=60 rd.net.timeout.ifup=60 rd.debug rd.live.debug=1

Behavior on the affected box is the same: /etc/resolv.conf is empty, and /mnt/sysimage/etc/resolv.conf is empty as well.

There is no /usr/lib/dracut/modules.d/40network/ifup.sh

The relevant parts (filtered for 'nameserver') of the journalctl output are:

05:47:39,500 INFO dracut-initqueue://usr/sbin/ifup@356(): getargs nameserver
05:47:40,236 INFO dracut-initqueue:/usr/sbin/ifup@393(): for s in '"$dns1"' '"$dns2"' '$(getargs nameserver)'
05:47:40,237 INFO dracut-initqueue:/usr/sbin/ifup@395(): echo nameserver <DNSIP>
05:47:40,237 INFO dracut-initqueue:/usr/sbin/ifup@393(): for s in '"$dns1"' '"$dns2"' '$(getargs nameserver)'
05:47:44,540 INFO dracut-pre-pivot:////lib/dracut/hooks/pre-pivot/85-write-ifcfg.sh@114(source): getargs nameserver
05:47:44,551 INFO dracut-cmdline:///lib/dracut/hooks/cmdline/28-parse-anaconda-net.sh@6(source): check_depr_arg dns nameserver=%s

So this looks like it comes through just fine. What bothers me is that find / -name ifup doesn't find the actual script in the installer, so that I cannot take a look at the script itself. find . -name '*ifup*' finds only some of the scripts in /etc/sysconfig/network-scripts.

I am attaching the entire syslog as syslog.20180102.

Greetings
Marc

~0030843

Zugschlus (reporter)

When I grep the entire installer for the DNSIP, it shows up in:

[anaconda root@host ~]# grep -lr DNSIP /[a-mrt-z]* /sbin 2>/dev/null
/run/log/journal/2147dd0c062d4ac5a717bf1404d668a6/system.journal
/run/initramfs/net.bond0.resolv.conf
/run/initramfs/net.bond0.override
/tmp/syslog
/tmp/pre-anaconda-logs/kernel_ring_buffer.log
/var/log/dmesg
[anaconda root@host ~]#

And /run/initramfs/net.bond0.resolv.conf is indeed correct. Now we "only" need to find out why it makes it neither to /etc/resolv.conf in the installer nor to the installed system.

~0030885

kabe (reporter)

The dracut initscripts are in the initramfs and disappear after switch-root,
so you need to specify "rd.break" (dracut.cmdline(7)) on the kernel command line to
examine the scripts and files during initramfs boot.

The resolv.conf comes from /run/initramfs/state/etc/resolv.conf and is
copied to the real root filesystem by rhel-import-state.service.
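
To see at which stage the file gets lost, a small Python sketch like the following could be run from the installer shell. It is an illustration only: the function resolv_conf_status is made up, the interface name bond0 in the first path is taken from the earlier note, and the demonstration runs on a throwaway directory with invented contents rather than on real data.

```python
import os
import tempfile

# Paths relative to the filesystem root, in the order the data is assumed
# to travel: dracut output, initramfs state, final file.
CANDIDATES = [
    "run/initramfs/net.bond0.resolv.conf",
    "run/initramfs/state/etc/resolv.conf",
    "etc/resolv.conf",
]


def resolv_conf_status(root="/"):
    """Report for each candidate whether it is missing, empty or has content."""
    status = {}
    for rel in CANDIDATES:
        path = os.path.join(root, rel)
        if not os.path.isfile(path):
            status[rel] = "missing"
        elif os.path.getsize(path) == 0:
            status[rel] = "empty"
        else:
            status[rel] = "has content"
    return status


# Demonstration on a temporary tree (made-up fixture, not taken from the logs).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "run/initramfs"))
    os.makedirs(os.path.join(root, "etc"))
    with open(os.path.join(root, "run/initramfs/net.bond0.resolv.conf"), "w") as f:
        f.write("nameserver 192.0.2.53\n")
    open(os.path.join(root, "etc/resolv.conf"), "w").close()
    demo = resolv_conf_status(root)

for path, state in demo.items():
    print(f"/{path}: {state}")
```

On the real system, calling resolv_conf_status() with the default root would show whether the state file is already missing or empty before rhel-import-state.service runs.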


Are you by chance using a CentOS 7.3 install disc for the problematic machine?
/usr/lib/systemd/rhel-import-state of CentOS 7.3 is short and emits the

INFO rhel-import-state:'./etc/resolv.conf' -> '/etc/resolv.conf'

line after "Starting Import network configuration from initramfs...",
but the CentOS 7.4 one is long and does NOT emit the above line.

The problem seems to lie in the dracut-network package, which is updated often,
so using a 7.4 disc may change the behavior.

~0030917

Zugschlus (reporter)

Both machines boot via PXE from the same kernel/initrd configuration. The kickstart template is generated in foreman and does not differ anywhere except in host name and IP address.

When I change the MAC addresses in the foreman host configuration, the error stays with the hardware.

Issue History
Date Modified     Username   Field                        Change
2017-12-20 16:14  Zugschlus  New Issue
2017-12-20 16:20  TrevorH    Note Added: 0030795
2017-12-21 09:10  Zugschlus  Note Added: 0030803
2017-12-21 09:11  Zugschlus  Note Added: 0030804
2017-12-21 11:56  Zugschlus  Note Added: 0030807
2017-12-21 14:58  Zugschlus  File Added: syslog.ec0
2017-12-21 14:58  Zugschlus  Note Added: 0030811
2017-12-21 14:58  Zugschlus  File Added: rundata.ec0
2017-12-21 14:58  Zugschlus  File Added: dmesg.ec0
2017-12-21 14:59  Zugschlus  File Added: syslog.ed0
2017-12-21 15:08  Zugschlus  File Added: rundata.ed0
2017-12-21 15:08  Zugschlus  File Added: dmesg.ed0
2017-12-27 06:50  kabe       Note Added: 0030826
2017-12-27 07:53  kabe       Note Added: 0030827
2017-12-27 10:17  Zugschlus  Note Added: 0030828
2018-01-02 15:21  Zugschlus  File Added: syslog.20180102
2018-01-02 15:21  Zugschlus  Note Added: 0030842
2018-01-02 15:29  Zugschlus  Note Added: 0030843
2018-01-09 04:30  kabe       Note Added: 0030885
2018-01-12 09:31  Zugschlus  Note Added: 0030917