View Issue Details

ID: 0005400
Project: CentOS-6
Category: dracut
View Status: public
Last Update: 2012-09-20 11:24
Reporter: ebroch
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: Linux
OS: CentOS
OS Version: 6
Product Version: 6.2
Target Version:
Fixed in Version:
Summary: 0005400: Software RAID1 boot failure (Kernel Panic) on failed disk.
Description: After an install of CentOS 6.0/6.1 on a RAID1 mirror as outlined at http://wiki.centos.org/HowTos/SoftwareRAIDonCentOS5, the OS will boot after either device (hd0, hd1) fails. But after upgrading to CentOS 6.2 the OS will NOT boot and instead yields the following error:

Kernel panic - not syncing: Attempted to kill init!
panic occurred, switching back to text console
Steps To Reproduce: Install CentOS 6.0 or 6.1 according to the specifications outlined at http://wiki.centos.org/HowTos/SoftwareRAIDonCentOS5, then update the system (or install CentOS 6.2 directly). Turn off the system, unplug a drive to simulate a disk failure, and boot the system.

If one installs 6.0/6.1, a boot after a simulated disk failure will succeed. After the update to version 6.2 it will fail.
Additional Information: When installing CentOS 6, both disks are system disks and the boot loader is installed on /dev/sda.
Tags: No tags attached.

Activities

ebroch

2012-01-08 21:43

reporter   ~0014124

Here's a link to this discussion in the CentOS Forums:
https://www.centos.org/modules/newbb/viewtopic.php?topic_id=34988&viewmode=flat&order=ASC&start=0
toracat

2012-01-09 17:50

manager   ~0014128

It seems related to the upstream bug:

https://bugzilla.redhat.com/show_bug.cgi?id=735124

However, that bug has been fixed in the 6.2 GA kernel. Is this some sort of regression?
toracat

2012-01-09 18:05

manager   ~0014129

gerald_clark posted this in the forum thread:

I removed the current dracut and dracut-kernel, and
re-installed dracut-004-32.el6.noarch and dracut-kernel-004-32.el6.noarch
from the 6.0 DVD.
Then I removed the current kernel and did a yum update kernel.
2.6.32-200.2.1 installed, and I can boot with a disconnected sda.

It looks like a dracut problem.
ebroch

2012-01-09 18:10

reporter   ~0014130

toracat,

The bug at:

https://bugzilla.redhat.com/show_bug.cgi?id=735124

has the following screen output:

<blockquote>
Setting hostname taft-01: [ OK ]
Setting up Logical Volume Management: async_tx: api initialized (async)
xor: automatically using best checksumming function: generic_sse
   generic_sse: 4204.000 MB/sec
   xor: using function: generic_sse (4204.000 MB/sec)
   raid6: int64x1 1222 MB/s
   raid6: int64x2 1746 MB/s
   raid6: int64x4 1789 MB/s
   raid6: int64x8 1492 MB/s
   raid6: sse2x1 2039 MB/s
   raid6: sse2x2 3019 MB/s
   raid6: sse2x4 2890 MB/s
   raid6: using algorithm sse2x2 (3019 MB/s)
   md: raid6 personality registered for level 6
   md: raid5 personality registered for level 5
   md: raid4 personality registered for level 4
   md: raid1 personality registered for level 1
   bio: create slab <bio-1> at 1
   md/raid1:mdX: not clean -- starting background reconstruction
   md/raid1:mdX: active with 4 out of 4 mirrors
   created bitmap (1 pages) for device mdX
   TECH PREVIEW: dm-raid (a device-mapper/MD bridge) may not be fully
supported.
   Please review provided documentation for limitations.
   mdX: bitmap initialized from disk: read 1/1 pages, set 196 of 200 bits
   BUG: unable to handle kernel
   md: resync of RAID array mdX
   md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
   md: using maximum available idle IO bandwidth (but not more than 200000
KB/sec) for resync.
   md: using 128k window, over a total of 102400k.
   paging request at 0000000100000028
   IP: [<ffffffff813e42de>] md_wakeup_thread+0xe/0x30
   PGD 215664067 PUD 0
   Oops: 0002 [#1] SMP
   last sysfs file: /sys/module/raid1/initstate
CPU 3
Modules linked in: dm_raid(T) raid1 raid456 async_raid6_recov async_pq raid6_pq
async_xor xor async_memcpy async_tx e1000 microcode dcdb]

Pid: 1051, comm: lvm Tainted: G ---------------- T
2.6.32-192.el6.x86_64 #1 Dell Computer Corporation PowerEdge 2850/0T7971
RIP: 0010:[<ffffffff813e42de>] [<ffffffff813e42de>] md_wakeup_thread+0xe/0x30
RSP: 0018:ffff8802156dbc98 EFLAGS: 00010082
RAX: ffff8802169e1200 RBX: ffff880216bc3200 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 9efacf44457ee738
RBP: ffff8802156dbc98 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: ffff880216bc3420
R13: 0000000000000292 R14: ffff880216bc3328 R15: 0000000000000000
FS: 00007fb06e60c7a0(0000) GS:ffff8800282c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd9e7bdda0 CR3: 0000000218d4a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process lvm (pid: 1051, threadinfo ffff8802156da000, task ffff8802156b8b40)
Stack:
 ffff8802156dbce8 ffffffffa0396937 ffff8802156dbcd8 ffffffffa0002760
<0> ffffe8ffffc03540 ffff880216888408 ffff88021525ad00 ffff88021525ad28
<0> ffff8802156dbd18 0000000000000000 ffff8802156dbcf8 ffffffffa03b7dd5
Call Trace:
 [<ffffffffa0396937>] md_raid5_unplug_device+0x67/0x100 [raid456]
 [<ffffffffa0002760>] ? dm_unplug_all+0x50/0x70 [dm_mod]
 [<ffffffffa03b7dd5>] raid_unplug+0x15/0x20 [dm_raid]
 [<ffffffffa00041fe>] dm_table_unplug_all+0x8e/0x100 [dm_mod]
 [<ffffffff811af50f>] ? thaw_bdev+0x5f/0x130
 [<ffffffffa0002703>] dm_resume+0xe3/0xf0 [dm_mod]
 [<ffffffffa000894c>] dev_suspend+0x1bc/0x250 [dm_mod]
 [<ffffffffa00093b4>] ctl_ioctl+0x1b4/0x270 [dm_mod]
 [<ffffffffa0008790>] ? dev_suspend+0x0/0x250 [dm_mod]
 [<ffffffffa0009483>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
 [<ffffffff81188ed2>] vfs_ioctl+0x22/0xa0
 [<ffffffff81189074>] do_vfs_ioctl+0x84/0x580
 [<ffffffff811895f1>] sys_ioctl+0x81/0xa0
 [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
Code: 24 2c 02 00 00 01 00 00 00 66 ff 03 66 66 90 48 8b 1c 24 4c 8b 64 24 08
c9 c3 0f 1f 00 55 48 89 e5 0f 1f 44 00 00 48 85 ff 74 1a <
RIP [<ffffffff813e42de>] md_wakeup_thread+0xe/0x30
 RSP <ffff8802156dbc98>
---[ end trace c981a51a52f7c4ab ]---
Kernel panic - not syncing: Fatal exception
Pid: 1051, comm: lvm Tainted: G D ---------------- T
2.6.32-192.el6.x86_64 #1
Call Trace:
 [<ffffffff814eb56e>] ? panic+0x78/0x143
 [<ffffffff814ef704>] ? oops_end+0xe4/0x100
 [<ffffffff8100f22b>] ? die+0x5b/0x90
 [<ffffffff814ef272>] ? do_general_protection+0x152/0x160
 [<ffffffff814eea45>] ? general_protection+0x25/0x30
 [<ffffffff813e42de>] ? md_wakeup_thread+0xe/0x30
 [<ffffffffa0396937>] ? md_raid5_unplug_device+0x67/0x100 [raid456]
 [<ffffffffa0002760>] ? dm_unplug_all+0x50/0x70 [dm_mod]
 [<ffffffffa03b7dd5>] ? raid_unplug+0x15/0x20 [dm_raid]
 [<ffffffffa00041fe>] ? dm_table_unplug_all+0x8e/0x100 [dm_mod]
 [<ffffffff811af50f>] ? thaw_bdev+0x5f/0x130
 [<ffffffffa0002703>] ? dm_resume+0xe3/0xf0 [dm_mod]
 [<ffffffffa000894c>] ? dev_suspend+0x1bc/0x250 [dm_mod]
 [<ffffffffa00093b4>] ? ctl_ioctl+0x1b4/0x270 [dm_mod]
 [<ffffffffa0008790>] ? dev_suspend+0x0/0x250 [dm_mod]
 [<ffffffffa0009483>] ? dm_ctl_ioctl+0x13/0x20 [dm_mod]
 [<ffffffff81188ed2>] ? vfs_ioctl+0x22/0xa0
 [<ffffffff81189074>] ? do_vfs_ioctl+0x84/0x580
 [<ffffffff811895f1>] ? sys_ioctl+0x81/0xa0
 [<ffffffff8100b0b2>] ? system_call_fastpath+0x16/0x1b
panic occurred, switching back to text console


Version-Release number of selected component (if applicable):
2.6.32-192.el6.x86_64

lvm2-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-libs-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
lvm2-cluster-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
udev-147-2.37.el6 BUILT: Wed Aug 10 07:48:15 CDT 2011
device-mapper-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-libs-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
device-mapper-event-libs-1.02.66-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
cmirror-2.02.87-1.el6 BUILT: Fri Aug 12 06:11:57 CDT 2011
</blockquote>


The error I'm getting does not even get to the point of setting the hostname. The only output I'm getting is the following:

Kernel panic - not syncing: Attempted to kill init!
panic occurred, switching back to text console

with no other indication of what's going on, unlike the bug fixed at Red Hat.
ebroch

2012-01-09 18:17

reporter   ~0014131

Gerald,

To be clear, did you mean kernel '2.6.32-220.2.1' rather than '2.6.32-200.2.1?'

Eric
toracat

2012-01-09 21:38

manager   ~0014133

This problem has been seen/reported by at least 3 people. From what gerald_clark observed, it looks as if dracut is at fault.

@ebroch, could you report this bug upstream at http://bugzilla.redhat.com ? As pschaff wrote in the forum thread, you can then refer to this CentOS bug from there.
tru

2012-01-10 09:10

administrator   ~0014135

1) reproduced on a fresh 6.2 install with the raid1.cfg kickstart file on a virtualbox machine
2) after install, removing either sda/sdb disk will cause a panic on boot
3) with both disks, the machine will boot properly
4) serial console output on failed/panic boot:

Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-220.2.1.el6.x86_64 (mockbuild@c6-x8664-build.centos.org) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) ) #1 SMP Fri Dec 23 02:21:33 CST 2011
Command line: ro root=UUID=d1106748-b37b-4f6b-987d-bcf379bf038c rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_MD_UUID=bf1fc631:2dd7542f:b67d08b3:7addb9db SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM console=ttyS0,57600n
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
...
sd 1:0:0:0: [sda] 4194304 512-byte logical blocks: (2.14 GB/2.00 GiB)
sd 1:0:0:0: [sda] Write Protect is off
sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1
sd 1:0:0:0: [sda] Attached SCSI disk
md: bind<sda1>
dracut Warning: No root device "block:/dev/disk/by-uuid/d1106748-b37b-4f6b-987d-bcf379bf038c" found





dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.


dracut Warning: Signal caught!

dracut Warning: Boot has failed. To debug this issue add "rdshell" to the kernel command line.
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.32-220.2.1.el6.x86_64 #1
Call Trace:
 [<ffffffff814ec3b9>] ? panic+0x78/0x143
 [<ffffffff8106ecf2>] ? do_exit+0x852/0x860
 [<ffffffff81177de5>] ? fput+0x25/0x30
 [<ffffffff8106ed58>] ? do_group_exit+0x58/0xd0
 [<ffffffff8106ede7>] ? sys_exit_group+0x17/0x20
 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
tru

2012-01-10 09:10

administrator  

raid1.cfg (869 bytes)
tru

2012-01-10 09:14

administrator   ~0014136

with both disks the console output is:
...
sd 0:0:0:0: [sda] 4194304 512-byte logical blocks: (2.14 GB/2.00 GiB)
sd 1:0:0:0: [sdb] 4194304 512-byte logical blocks: (2.14 GB/2.00 GiB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 0:0:0:0: [sda] Write Protect is off
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1
sd 1:0:0:0: [sdb] Attached SCSI disk
 sda: sda1
sd 0:0:0:0: [sda] Attached SCSI disk
md: bind<sdb1>
md: bind<sda1>
md: raid1 personality registered for level 1
bio: create slab <bio-1> at 1
md/raid1:md0: active with 2 out of 2 mirrors
created bitmap (1 pages) for device md0
md0: bitmap initialized from disk: read 1/1 pages, set 0 of 16 bits
md0: detected capacity change from 0 to 1073729536
 md0: unknown partition table
EXT4-fs (md0): mounted filesystem with ordered data mode. Opts:
dracut: Mounted root filesystem /dev/md0
dracut: Loading SELinux policy
type=1404 audit(1326190414.331:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
type=1403 audit(1326190414.858:3): policy loaded auid=4294967295 ses=4294967295
dracut:
dracut: Switching root
readahead: starting
                Welcome to CentOS
Starting udev: udev: starting version 147
...
tru

2012-01-10 10:09

administrator   ~0014137

failing the 1st raid1 disk with rdshell:

ata1: SATA link down (SStatus 0 SControl 300)
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-6: VBOX HARDDISK, 1.0, max UDMA/133
ata2.00: 4194304 sectors, multi 128: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
scsi 1:0:0:0: Direct-Access ATA VBOX HARDDISK 1.0 PQ: 0 ANSI: 5
...
sd 1:0:0:0: [sda] 4194304 512-byte logical blocks: (2.14 GB/2.00 GiB)
sd 1:0:0:0: [sda] Write Protect is off
sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1
sd 1:0:0:0: [sda] Attached SCSI disk
md: bind<sda1>
dracut Warning: No root device "block:/dev/disk/by-uuid/d1106748-b37b-4f6b-987d-bcf379bf038c" found


Dropping to debug shell.

sh: can't access tty; job control turned off
dracut:/# cat /proc/mdstat
Personalities :
md0 : inactive sda1[1](S)
      1048564 blocks super 1.0
       
unused devices: <none>
dracut:/# ls -l /dev/disk/by-id/*
lrwxrwxrwx 1 0 root 9 Jan 10 10:55 /dev/disk/by-id/ata-VBOX_HARDDISK_VB6e1c9b51-a9a6a01e -> ../../sda
lrwxrwxrwx 1 0 root 10 Jan 10 10:55 /dev/disk/by-id/ata-VBOX_HARDDISK_VB6e1c9b51-a9a6a01e-part1 -> ../../sda1
lrwxrwxrwx 1 0 root 9 Jan 10 10:55 /dev/disk/by-id/scsi-SATA_VBOX_HARDDISK_VB6e1c9b51-a9a6a01e -> ../../sda
lrwxrwxrwx 1 0 root 10 Jan 10 10:55 /dev/disk/by-id/scsi-SATA_VBOX_HARDDISK_VB6e1c9b51-a9a6a01e-part1 -> ../../sda1
dracut:/# ls -l /dev/disk
total 0
drwxr-xr-x 2 0 root 120 Jan 10 10:55 by-id
drwxr-xr-x 2 0 root 80 Jan 10 10:55 by-path
dracut:/# mdadm --run /dev/md0
md: raid1 personality registered for level 1
bio: create slab <bio-1> at 1
md/raid1:md0: active with 1 out of 2 mirrors
created bitmap (1 pages) for device md0
md0: bitmap initialized from disk: read 1/1 pages, set 0 of 16 bits
md0: detected capacity change from 0 to 1073729536
mdadm: started /dev/md0
dracut: unknown partition table
/# cat /proc/partitions
major minor #blocks name

   8 0 2097152 sda
   8 1 1048576 sda1
   9 0 1048564 md0
dracut:/# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[1]
      1048564 blocks super 1.0 [2/1] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
dracut:/# mkdir /a
dracut:/# mount -r /dev/md0 /a
EXT4-fs (md0): mounted filesystem with ordered data mode. Opts:
dracut:/# mount
/proc on /proc type proc (rw)
/sys on /sys type sysfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md0 on /a type ext4 (ro)
dracut:/# ls /a
bin dev home lib64 media opt root selinux sys usr
boot etc lib lost+found mnt proc sbin srv tmp var

-> bottom line (afaik): the raid array is detected, but not activated and thus not mounted :(
tru

2012-01-10 10:12

administrator   ~0014138

once the raid array is activated, one can exit the rdshell and the boot sequence will proceed properly:

dracut:/# EXT4-fs (md0): mounted filesystem with ordered data mode. Opts:
dracut: Mounted root filesystem /dev/md0
dracut: Loading SELinux policy
type=1404 audit(1326193786.018:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
type=1403 audit(1326193786.532:3): policy loaded auid=4294967295 ses=4294967295
dracut:
dracut: Switching root
readahead: starting
                Welcome to CentOS
Starting udev: udev: starting version 147
...
tru

2012-01-10 10:39

administrator   ~0014139

http://www.kernel.org/pub/linux/utils/boot/dracut/dracut.html debugging:
Dropping to debug shell.

sh: can't access tty; job control turned off
dracut:/# dmsetup ls --tree
No devices found
dracut:/# blkid -p
The low-level probing mode requires a device
dracut:/# blkid -p -o udev
The low-level probing mode requires a device
dracut:/# dmesg|grep dracut
dracut: dracut-004-256.el6
dracut: rd_NO_LUKS: removing cryptoluks activation
dracut: rd_NO_LVM: removing LVM activation
dracut: Starting plymouth daemon
dracut: rd_NO_DM: removing DM RAID activation
dracut Warning: No root device "block:/dev/disk/by-uuid/d1106748-b37b-4f6b-987d-bcf379bf038c" found
dracut:/# parted /dev/sda -s p
sh: parted: not found

dracut:/# cat /proc/partitions
major minor #blocks name

   8 0 2097152 sda
   8 1 1048576 sda1
dracut:/# cat /proc/mdstat
Personalities :
md0 : inactive sda1[0](S)
      1048564 blocks super 1.0
       
unused devices: <none>
dracut:/# mdadm --run /dev/md0
md: raid1 personality registered for level 1
bio: create slab <bio-1> at 1
md/raid1:md0: active with 1 out of 2 mirrors
created bitmap (1 pages) for device md0
md0: bitmap initialized from disk: read 1/1 pages, set 0 of 16 bits
md0: detected capacity change from 0 to 1073729536
mdadm: started /dev/md0
md0: unknown partition table
dracut:/# exit
EXT4-fs (md0): mounted filesystem with ordered data mode. Opts:
dracut: Mounted root filesystem /dev/md0
dracut: Loading SELinux policy
type=1404 audit(1326195398.697:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
type=1403 audit(1326195399.210:3): policy loaded auid=4294967295 ses=4294967295
dracut:
dracut: Switching root
readahead: starting
                Welcome to CentOS
Starting udev: udev: starting version 147
...
Starting atd: [ OK ]

CentOS release 6.2 (Final)
Kernel 2.6.32-220.2.1.el6.x86_64 on an x86_64

The strangest thing (for me) is that rebooting the machine after "fixing" in the rdshell does not panic any longer!
tru

2012-01-10 11:05

administrator   ~0014140

reported upstream at https://bugzilla.redhat.com/show_bug.cgi?id=772926
tru

2012-01-10 11:46

administrator   ~0014141

regression from 6.1 (dracut-004-53.el6.noarch)
toracat

2012-01-10 16:04

manager   ~0014143

Thanks, Tru, for such extensive debugging and for reporting upstream.

Changing category to 'dracut'.
ebroch

2012-01-10 21:56

reporter   ~0014151

"Thanks, Tru, for such extensive debugging and for reporting upstream."

Ditto!!!
Kradllit

2012-01-14 08:22

reporter   ~0014183

I can't get CentOS 6.2 working on encrypted software raid1 LVM either: when I unplug a drive after turning off the system, to simulate a disk failure, and then try to boot, the system doesn't boot.
brak44

2012-01-15 03:18

reporter   ~0014187

We are experiencing this same problem here: kernel panic (not syncing) after removing one drive of a raid 1 mirror. The kernel is 2.6.32-220.2.1.el6.x86_64.
To recover from the kernel panic after you remove a drive, you can boot the CentOS 6.1 Live CD, open the degraded raid arrays, inspect them, and reboot. This is enough to signal to mdadm that the installed disc is degraded.
It boots after that. This is not a workaround, but it may get someone out of trouble if a disc really did fail.
blauwkaai

2012-01-18 10:10

reporter   ~0014245

Same issue here. Kernel panic after removing 1 disk from a raid 1 config.
Kernel: 2.6.32-220.2.1.el6.i686
Questions (to brak44):
1. do you have to use Live Centos CD 6.1 or would 6.2 be OK too?
2. what exactly do you mean by 'opening the degraded raid arrays, inspect them and reboot'? Would this then allow me to add a new disc to the config?
tru

2012-01-18 11:23

administrator   ~0014246

proposed workaround without additional media:

1) add "rdshell" (without the quotes) to the boot command line
2) once at the dracut shell:
dracut:/# mdadm --run /dev/md0
(or whatever your raid device is named; cat /proc/mdstat shows the detected but not-activated raid device)
3) exit from the dracut shell (ctrl-d)
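The three steps above can be collected into one dry-run sketch; it only prints the commands to type at the dracut prompt. /dev/md0 is an assumption here, so check /proc/mdstat for the real device name on your system.

```shell
# Dry-run sketch of the rdshell workaround: prints the commands to type at
# the dracut debug shell. MD (/dev/md0) is an assumption; check /proc/mdstat.
MD=/dev/md0
echo "cat /proc/mdstat"   # the array shows up as inactive, e.g. "md0 : inactive sda1[1](S)"
echo "mdadm --run $MD"    # force-start the degraded array
echo "exit"               # leave the dracut shell; the boot then proceeds
```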
brak44

2012-01-18 18:26

reporter   ~0014266

>Questions (to brak44):
>1. do you have to use Live Centos CD 6.1 or would 6.2 be OK too?
I have not tested 6.2. Provided it does not have the 2.6.32-220.2.1.el6.i686 kernel, I suppose it would be OK.
>2. what exactly do you mean by 'opening the degraded raid arrays, inspect them >and reboot'? Would this then allow me to add a new disc to the config?
Simply opening or accessing the array in nautilus (the file manager). This runs the array.
Tru's solution is easier:
Reboot the system, press Esc, and when you are at the selection prompt, highlight the Linux line and press 'e'. You may have only 2 seconds to do this, so be quick. This takes you to another screen where you should select the entry that begins with 'kernel' and press 'e' again. Then follow Tru's instructions above.
ebroch

2012-01-19 03:26

reporter   ~0014269

I went through the following procedure and had some issues...

1) Installed CentOS 6.1 (kernel version 2.6.32-131.0.15.el6.x86_64) and upgraded to CentOS 6.2 (kernel version 2.6.32-220.2.1.el6.x86_64).
2) Unplugged one drive to simulate failure
3) Booted to latest kernel with 'rdshell' option
4) At the 'dracut' prompt typed 'mdadm --run /dev/md0' and got the error 'mdadm: error opening /dev/md0: No such file or directory.'
5) Entered ctrl-d and the machine booted to the login prompt. I'm not sure why.
6) Logged in and checked that md0, md1, and md2 were degraded (they were) and that sdb1, sdb2, and sdb3 were removed (they were).
7) Shut down the machine
8) Plugged the drive back in and booted to the login prompt.
9) Ran the command 'mdadm --add /dev/md0 /dev/sdb1' and it failed to re-add the drive. The same thing happened with '/dev/sdb2'; '/dev/sdb3' re-added fine.
10) I could not add sdb1 or sdb2 to their respective raids until I booted to the earlier CentOS 6.1 kernel.

If someone installs only CentOS 6.2, there might be a problem re-adding the raid1 devices. I'm not sure why I could not add the devices to the array; it simply would not work.

Eric
brak44

2012-01-19 07:42

reporter   ~0014270

From a discussion of this problem on the Whirlpool forum:
Actually, looking at the Dracut changelog, I can see exactly what the problem is, and when it was fixed. There is extra logic in dracut-012 to force degraded arrays to run after a certain timeout (rather than simply waiting for the missing component and giving up when it doesn't appear).

Red Hat aren't going to simply drop dracut-012 into their repository. But if you're nice to them (money helps) they might backport the fix to RHEL's dracut-004.
ebroch

2012-01-21 18:49

reporter   ~0014288

From my previous post (2012-01-19 03:26), and after another installation, I think there is another issue here that may involve not dracut but the most recent kernel (2.6.32-220.2.1.el6): not being able to re-add devices to a raid array after a failure. I installed CentOS 6.1 (kernel version 2.6.32-131.0.15.el6.x86_64) according to the following link:
http://wiki.centos.org/HowTos/SoftwareRAIDonCentOS5.

I purposely failed a drive before any updates (and re-added it later) and everything worked as it should.

In order to bypass the dracut bug I updated this machine to the most recent kernel (2.6.32-220.2.1.el6) with the following command:
'yum update --exclude=dracut --exclude=dracut-network --exclude=dracut-kernel'

After purposely failing a drive, the machine boots fine, but when adding devices back into the raid array with the command 'mdadm /dev/md0 --add /dev/sdb1' I get the following error:
'mdadm: /dev/sdb1 reports being an active member for /dev/md0, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdb1 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdb1" first.'

I believe there is an issue with the newest kernel as well and not simply with dracut.
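For anyone hitting the same --add failure, the route suggested by the mdadm error message itself can be sketched as a dry run (it only prints the commands). The device names are taken from the post above and are assumptions; note that --zero-superblock destroys the raid metadata on that partition, so the member then resyncs from scratch.

```shell
# Dry-run sketch of the recovery path the mdadm error message suggests:
# wipe the stale raid superblock on sdb1, then add it back as a new member.
# WARNING: --zero-superblock destroys the raid metadata on that partition.
# DEV and MD are assumptions from the post; this only prints the commands.
DEV=/dev/sdb1
MD=/dev/md0
echo "mdadm --zero-superblock $DEV"
echo "mdadm $MD --add $DEV"
echo "cat /proc/mdstat"   # then watch the resync progress
```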
Bramje

2012-01-22 02:41

reporter   ~0014292

Same issue here. New installation with CentOS 6.2 and software raid will cause a Kernel Panic on reboot when I remove one disc from the system.

I guess the best solution is to install CentOS 6.1 and prevent it from updating?
Bramje

2012-01-22 03:52

reporter   ~0014293

It looks like installing CentOS 6.1 isn't the solution either. There's no kernel panic, but it can't boot when a disc is removed; the server just reboots itself.
ebroch

2012-01-22 06:57

reporter   ~0014294

Bramje,

Follow the instructions exactly as they are laid out in the following web page:

http://wiki.centos.org/HowTos/SoftwareRAIDonCentOS5

Read them CAREFULLY! Look closely at steps 1 and 2.

In CentOS 6 (not CentOS 5) there is a screen during setup where you're given the opportunity to make your devices data or system devices. Both your raid devices should be system devices.

Following these instructions worked for me several times. When I didn't follow them, I got the same results you did: the machine would cycle without completely booting.

Eric
ebroch

2012-01-22 07:03

reporter   ~0014295

After getting CentOS 6.1 installed add the following line to /etc/yum.conf:

exclude=dracut* kernel*
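As a minimal sketch, that exclude line can be appended idempotently. The snippet below operates on a scratch stand-in file so it is safe to try anywhere; on a real system CONF would point at /etc/yum.conf and you would run it as root.

```shell
# Append the exclude line to a yum.conf unless one is already present.
# CONF is a scratch stand-in here; on a real system use CONF=/etc/yum.conf.
CONF=$(mktemp)
printf '[main]\nkeepcache=0\n' > "$CONF"   # stand-in for an existing yum.conf
grep -q '^exclude=' "$CONF" || echo 'exclude=dracut* kernel*' >> "$CONF"
grep '^exclude=' "$CONF"                   # show the line that yum will honor
rm -f "$CONF"
```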
Bramje

2012-01-22 13:27

reporter   ~0014297

@ebroch : lifesaver, Looks like I've missed step 1.

The problem is that I'll need to install xen-kernel soon, and then I will need to include those kernel packages again. But I guess it would be better to install the old xen-kernel first?

Hope that someone will fix this bug asap.
jioi

2012-01-25 13:48

reporter   ~0014323

Roughly when will this issue be fixed? I want to run the server with raid1, and I hope to update it to the newest version.
tfaruq

2012-01-28 07:36

reporter   ~0014355

Today I downloaded CentOS-6.2-x86_64-LiveDVD.iso and got the above problem; the same thing happens with 6.1. Then I tried CentOS-6.2-x86_64-bin-DVD1.iso and, surprisingly, nothing happened. Why? Do you have any idea? Thank you.
jioi

2012-01-28 07:48

reporter   ~0014356

Can you check the kernel versions of your CentOS-6.2-x86_64-LiveDVD.iso and CentOS-6.2-x86_64-bin-DVD1.iso installations?
ALex_hha

2012-01-29 13:03

reporter   ~0014362

> I guess the best solution is to install CentOS 6.1 and prevent it from updating?
The solution is, after each kernel update, to build the initramfs manually with mkinitrd from dracut-004-32.el6.noarch (which ships with CentOS 6.0).

# cat /etc/redhat-release
CentOS release 6.2 (Final)

# uname -r
2.6.32-220.4.1.el6.x86_64

# mount | grep md0
/dev/md0 on / type ext4 (rw)

So after I have installed 2.6.32-220.4.1.el6.x86_64 I just execute the following commands

# cd /boot
# mv initramfs-2.6.32-220.4.1.el6.x86_64.img initramfs-2.6.32-220.4.1.el6.x86_64.img.orig
# mkinitrd /boot/initramfs-2.6.32-220.4.1.el6.x86_64.img 2.6.32-220.4.1.el6.x86_64
ebroch

2012-02-02 16:49

reporter   ~0014391

I installed CentOS 6.1 with software raid1 (http://wiki.centos.org/HowTos/SoftwareRAIDonCentOS5) and updated to the latest kernel (2.6.32-220.4.1.el6.x86_64) on a Dell PowerEdge T110 II. After the update and synchronization, I failed and removed (not at the same time) each of the devices that make up /dev/md0 (/boot), to simulate a single disk failure, with the commands
'mdadm --manage /dev/md0 --fail /dev/sda1' and 'mdadm --manage /dev/md0 --remove /dev/sda1', and rebooted the machine. At the command line I used 'mdadm --manage /dev/md0 --re-add /dev/sda1' to re-add the failed device. Everything worked as it should, even after doing the same with '/dev/sdb1'.

It is interesting that I could never get this to work on a Dell PowerEdge T105, and I wonder whether the new kernel and dracut are geared toward certain hardware.
brak44

2012-02-16 22:11

reporter   ~0014488

Yesterday, on an Asus T300-E5 server, I loaded CentOS 6.2 64-bit with kernel 2.6.32-220.4.1.el6.x86_64 from the DVD, then removed one drive from the raid 1 array. The kernel panic problem still occurs. So I loaded Scientific Linux 6.1 Live and did a yum update to kernel 2.6.32-220.4.1, with no problem when removing a drive. The Scientific Linux dracut is at version dracut-004-53.el6, whilst the CentOS 6.2 one is dracut-004-256.el6.
brak44

2012-02-20 23:23

reporter   ~0014511

On Feb 20th, Harald from Red Hat posted dracut 256 version 2:
http://people.redhat.com/harald/downloads/dracut/dracut-004-256.el6_2.1/
I've tested this on an Asus T300-E5 server loaded with CentOS 6.2 64-bit and kernel 2.6.32-220.4.2.el6.x86_64:
yum remove dracut dracut-kernel
rpm -i dracut-004-256.el6_2.1.noarch.rpm dracut-kernel-004-256.el6_2.1.noarch.rpm
(do the mkinitrd as per above)
It is now booting with no problem.
toracat

2012-02-24 05:29

manager   ~0014541

Looks like the fix is in today's update for dracut (004-256.el6_2.1).
tru

2012-02-24 16:38

administrator   ~0014546

The latest dracut fixes the issue.
Don't forget to rebuild your initramfs once you have installed the latest dracut version!
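A dry-run sketch of that rebuild step, assuming the stock CentOS 6 /boot layout (it only prints the commands; run them as root, and substitute KVER=$(uname -r) for the running kernel). Keeping a .orig copy mirrors the mkinitrd approach used earlier in this thread.

```shell
# Dry-run sketch: print the initramfs rebuild commands for a given kernel.
# KVER and the /boot layout are assumptions (stock CentOS 6); run the
# printed commands as root on a real system.
KVER=2.6.32-220.2.1.el6.x86_64
echo "mv /boot/initramfs-$KVER.img /boot/initramfs-$KVER.img.orig"
echo "dracut -f /boot/initramfs-$KVER.img $KVER"
```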

Issue History

Date Modified Username Field Change
2012-01-08 21:41 ebroch New Issue
2012-01-08 21:43 ebroch Note Added: 0014124
2012-01-09 17:50 toracat Note Added: 0014128
2012-01-09 18:05 toracat Note Added: 0014129
2012-01-09 18:05 toracat Status new => acknowledged
2012-01-09 18:10 ebroch Note Added: 0014130
2012-01-09 18:17 ebroch Note Added: 0014131
2012-01-09 21:38 toracat Note Added: 0014133
2012-01-10 09:10 tru Note Added: 0014135
2012-01-10 09:10 tru File Added: raid1.cfg
2012-01-10 09:14 tru Note Added: 0014136
2012-01-10 10:09 tru Note Added: 0014137
2012-01-10 10:12 tru Note Added: 0014138
2012-01-10 10:39 tru Note Added: 0014139
2012-01-10 11:05 tru Note Added: 0014140
2012-01-10 11:46 tru Note Added: 0014141
2012-01-10 16:04 toracat Note Added: 0014143
2012-01-10 16:04 toracat Category kernel => dracut
2012-01-10 16:04 toracat Status acknowledged => confirmed
2012-01-10 21:56 ebroch Note Added: 0014151
2012-01-14 08:22 Kradllit Note Added: 0014183
2012-01-15 03:18 brak44 Note Added: 0014187
2012-01-18 10:10 blauwkaai Note Added: 0014245
2012-01-18 11:23 tru Note Added: 0014246
2012-01-18 18:26 brak44 Note Added: 0014266
2012-01-19 03:26 ebroch Note Added: 0014269
2012-01-19 07:42 brak44 Note Added: 0014270
2012-01-21 18:49 ebroch Note Added: 0014288
2012-01-22 02:41 Bramje Note Added: 0014292
2012-01-22 03:52 Bramje Note Added: 0014293
2012-01-22 06:57 ebroch Note Added: 0014294
2012-01-22 07:03 ebroch Note Added: 0014295
2012-01-22 13:27 Bramje Note Added: 0014297
2012-01-25 13:48 jioi Note Added: 0014323
2012-01-28 07:36 tfaruq Note Added: 0014355
2012-01-28 07:48 jioi Note Added: 0014356
2012-01-29 13:03 ALex_hha Note Added: 0014362
2012-02-02 16:49 ebroch Note Added: 0014391
2012-02-16 22:11 brak44 Note Added: 0014488
2012-02-20 23:23 brak44 Note Added: 0014511
2012-02-24 05:29 toracat Note Added: 0014541
2012-02-24 16:38 tru Note Added: 0014546
2012-02-24 16:38 tru Status confirmed => resolved
2012-02-24 16:38 tru Resolution open => fixed