View Issue Details

ID: 0013107
Project: CentOS-7
Category: mdadm
View Status: public
Last Update: 2018-12-17 14:18
Reporter: Vuojolahti
Priority: normal
Severity: major
Reproducibility: always
Status: new
Resolution: open
Platform: x86_64
OS: CentOS
OS Version: 7
Product Version: 7.3.1611
Target Version:
Fixed in Version:
Summary: 0013107: CentOS 7 with encrypted RAID 1 partitions won't boot up in degraded mode.
Description: As the summary says, the machine won't boot up if one of the disks is removed. To my understanding the system should still boot up in degraded mode, when one of the disks fails or is removed. It looks like md devices just won't start.

Here's a screenshot from a failed boot after I removed the second disk: https://i.imgur.com/k7jjxTj.png

This might be related or even the same bug I saw in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=859691

Installed package versions that might be interesting:
cryptsetup.x86_64 1.7.2-1.el7
dracut.x86_64 033-463.el7
initscripts.x86_64 9.49.37-1.el7
mdadm.x86_64 3.4-14.el7_3.1
Steps To Reproduce:
1. Begin installing a new CentOS virtual machine with 2 hard disks from CentOS-7-x86_64-Minimal-1611.iso
2. Manually create a RAID 1 partition for /boot and encrypted RAID 1 partitions for / and swap. See the details from the attached anaconda-ks.cfg.
3. Finish the installation, update all packages (yum update) and verify that the machine can be rebooted normally again.
4. Shut down the machine, remove the second hard disk, try to boot again, and notice how the boot fails.
Tags: No tags attached.

Activities

Vuojolahti

2017-04-10 20:58

reporter  

anaconda-ks.cfg (1,773 bytes)
kabe

2017-04-11 06:56

reporter   ~0029045

I could reproduce this.

Activation of the systemd-cryptsetup device is triggered from udev
via the dev-disk-by\x2duuid-<UUID>.device systemd unit.
Because the degraded md array is not started, that device never appears,
so systemd gives up after the default
  JobTimeoutUSec=1min 30s
and drops to the emergency shell.


Fortunately, RHEL dracut has a built-in fail-safe mechanism in
/usr/lib/dracut/hooks/initqueue/timeout/50-mdraid_start.sh
which forces INACTIVE md devices to run with "mdadm -R" after a timeout.
But this timeout defaults to rd.retry=180 seconds, so the hook only fires
after systemd's 90-second job timeout has already expired, which is too late.

The workaround is to add the "rd.retry=20" kernel option at boot
(adjust it if you're using devices that need longer to probe).
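
To keep the option across reboots instead of typing it at the boot menu each time, one possibility is to append it to GRUB_CMDLINE_LINUX in /etc/default/grub and regenerate the GRUB configuration. The following is only a sketch: the helper name add_kernel_arg is made up for illustration, it assumes the stock single-line GRUB_CMDLINE_LINUX="..." format and GNU sed, and the grub.cfg path in the usage comment assumes a BIOS install.

```shell
# Hypothetical helper (not part of CentOS): append one kernel argument to the
# GRUB_CMDLINE_LINUX="..." line of a grub defaults file.
# $1: file to edit (normally /etc/default/grub), $2: argument to append.
add_kernel_arg() {
    file=$1
    arg=$2
    # Do nothing if the argument is already present on that line.
    if grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qFw -- "$arg"; then
        return 0
    fi
    # Insert " <arg>" just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX=\".*\)\"|\1 $arg\"|" "$file"
}

# Usage (as root), followed by regenerating the config:
#   add_kernel_arg /etc/default/grub rd.retry=20
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

Nothing here is specific to rd.retry; the same edit applies to any kernel argument, and the value should still be tuned to how long the slowest device takes to appear.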

After about 20*(2/3)≈13 seconds, the fail-safe logic kicks in,
you get the prompt for the LUKS device passphrase, and
the boot can continue in degraded RAID mode.
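
The 2/3 factor can be checked with shell integer arithmetic. This is a sketch of the calculation as described in this note, not a copy of dracut's code; the function name timeout_hook_delay is made up for illustration.

```shell
# Hypothetical helper: seconds until the initqueue timeout hooks start,
# assuming they fire after 2/3 of rd.retry as described in this note.
# $1: rd.retry value in seconds (defaults to 180).
timeout_hook_delay() {
    rdretry=${1:-180}
    echo $(( rdretry * 2 / 3 ))
}

timeout_hook_delay 20    # prints 13 (integer division of 40 by 3)
timeout_hook_delay 180   # prints 120
```

With the default of 180 the hooks would start after 120 seconds, which is later than systemd's 90-second job timeout; with rd.retry=20 they start well before it.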
Vuojolahti

2017-04-11 18:28

reporter   ~0029054

Is there a reason why the default is set to 180 seconds in /usr/lib/dracut/modules.d/98systemd/dracut-initqueue.sh?

30 seconds should be the default according to the manual (man dracut.cmdline) and that's what's used in /usr/lib/dracut/modules.d/99base/init.sh.
kabe

2017-04-12 00:32

reporter   ~0029056

Dunno. The documentation is stale, so this could be reported as a bug to upstream dracut.

The commit
https://git.kernel.org/pub/scm/boot/dracut/dracut.git/commit/?id=dbfaae0e34507d2d1f3c186ffe26af3e8028b9f8
changes the timeout from 30 to 180, but doesn't say anything in particular
about the change.
Maybe they wanted to set it longer than systemd's default timeout of 90 seconds,
or there was some disk subsystem which needed longer than 30 seconds to come up.
kabe

2017-04-12 01:03

reporter   ~0029057

I found that in newer dracut in git, modules.d/99base/init.sh has also changed:

-RDRETRY=${RDRETRY:-30}
+RDRETRY=${RDRETRY:-180}

https://git.kernel.org/pub/scm/boot/dracut/dracut.git/commit/?id=517d27a75f678d4c295cbb07687453950b55df5a

    99base: Increase initqueue timeout in non systemd case
    In case of systemd is used the timeout already is set to 180s, compare
    with file: modules.d/98systemd/dracut-initqueue.sh

    Do the same if systemd is not used, e.g. in kdump case.

So it seems that the timeout was also lengthened for the non-systemd case.
Still, the documentation (dracut.cmdline.7.asc) is stale.
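
For readers unfamiliar with the ${RDRETRY:-180} form in the diff: it is standard shell parameter expansion that substitutes 180 only when RDRETRY is unset or empty, which is why a value supplied through rd.retry takes precedence. A small illustration follows; the function name effective_rdretry is hypothetical, and the snippet only demonstrates the expansion, not how dracut reads the kernel command line.

```shell
# Hypothetical helper: show which retry value results from a given input.
# $1: value supplied by the user (may be empty).
effective_rdretry() {
    RDRETRY=$1
    # Fall back to 180 only if RDRETRY is unset or empty.
    RDRETRY=${RDRETRY:-180}
    echo "$RDRETRY"
}

effective_rdretry ""     # prints 180
effective_rdretry 20     # prints 20
```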
bibianthony

2018-12-17 14:18

reporter   ~0033368

Same problem here with a degraded mdadm partition (without an encrypted partition).

In emergency mode, we need to run:

mdadm --run /dev/md0
mdadm --run /dev/md1
mdadm --run /dev/md2

to boot.
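
If the array names differ from machine to machine, the same manual step can be sketched as a loop over whatever /proc/mdstat reports as inactive. This is an assumption-laden sketch rather than a tested recovery procedure: the function name inactive_md_arrays is made up, and it assumes the usual "mdN : inactive ..." line format of /proc/mdstat. It uses only shell built-ins so that it does not depend on tools that may be missing from an emergency shell.

```shell
# Hypothetical helper: print the names of md arrays listed as inactive.
# $1: path to an mdstat-format file (defaults to /proc/mdstat).
inactive_md_arrays() {
    while read -r name sep state _rest; do
        # Array lines look like: "md1 : inactive sda2[0](S)"
        if [ "$sep" = ":" ] && [ "$state" = "inactive" ]; then
            echo "$name"
        fi
    done < "${1:-/proc/mdstat}"
}

# Usage (as root, in the emergency shell):
#   for md in $(inactive_md_arrays); do mdadm --run "/dev/$md"; done
```

Arrays that are already active are skipped, so only the ones that failed to start are passed to mdadm --run.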

Issue History

Date Modified Username Field Change
2017-04-10 20:58 Vuojolahti New Issue
2017-04-10 20:58 Vuojolahti File Added: anaconda-ks.cfg
2017-04-11 06:56 kabe Note Added: 0029045
2017-04-11 18:28 Vuojolahti Note Added: 0029054
2017-04-12 00:32 kabe Note Added: 0029056
2017-04-12 01:03 kabe Note Added: 0029057
2018-12-17 14:18 bibianthony Note Added: 0033368