View Issue Details

ID: 0017598
Project: CentOS-8
Category: kernel
View Status: public
Last Update: 2020-11-17 11:21
Reporter: ps7776
Priority: normal
Severity: major
Reproducibility: always
Status: new
Resolution: open
Product Version: 8.0.1905
Target Version:
Fixed in Version:
Summary: 0017598: Any kernel beyond 8.0 rescue fails to boot
Description:
Hardware: SuperMicro MBD-H11SSL with AMD Epyc 7502 32-core CPU

Originally installed CentOS 8.0.1905 on this system. / and /boot are on real GPT partitions; data disks are a combination of LVM and software RAID-5.

Regular yum upgrades to the running system since February were all OK, but a recent power outage led to trying to boot from the 8.2 kernel ( 193.6.3 ), which failed. After a lot of trial and error I determined that the only kernel that would boot properly was the original ( 147 ) rescue kernel and initramfs. For any other combination I get a black screen immediately after the "Probing EDD ...." message and the hardware eventually reboots. Removing quiet and rhgb gives no extra output on the screen; rebuilding the initramfs with either -a rescue or -H makes no difference; adding rd.shell and/or rd.break to get a dracut shell doesn't do anything; adding init=/bin/bash doesn't do anything either; setting edd=on or edd=off makes no difference. Removing and reinstalling kernels makes no difference.

So - basically the boot fails at a very, very early stage and I'm not sure how to get more debug output. Any suggestions for other kernel options to try? Like how to select the most basic video mode? What is the dumbest VGA mode one can select? Any early boot debug parameters? Any other modules that I should try to load into the initramfs?

This is obviously some weird combination of software and hardware ( BIOS? ) that causes booting to fail. Highly unexpected.

The BIOS is set to legacy ( i.e. non-UEFI only ) but I don't think there is anything wrong with it, or with grub2 for that matter. I can do an "ls" at the grub prompt for all the partitions on all disks and they appear to be correct. Using (hd0,gpt... ) instead of labels or /dev/sda... makes no difference.


Booting from an 8.2 install USB fails too by the way.




peter
Steps To Reproduce: Boot anything beyond the 8.0 rescue kernel.

Tags: No tags attached.

Activities

ps7776

2020-07-18 00:32

reporter   ~0037377

Could be a video driver problem with the built-in ASPEED card and the Linux ast driver, according to this link: https://www.supermicro.com/support/faqs/faq.cfm?faq=31035
MissBlue

2020-10-01 23:52

reporter   ~0037776

Same issue with a Supermicro Model 1114S-WTRT paired with an AMD EPYC 7282.
CentOS 8.1 installs and runs fine, but the second it updates it will no longer boot: it instantly black-screens and performs a hardware reboot. Only rescue mode works.
Doing a clean install with the latest 8.2 Minimal CentOS yields the same results, but it also breaks rescue mode.
MissBlue

2020-10-02 14:39

reporter   ~0037778

A quick update: it does not appear to be the kernel itself (at least in my case).
I can install Kernel 5.8 ML just fine on CentOS 7 and CentOS 8.1 and it runs as expected.
The second you perform a 'yum/dnf' update, the system bricks, even on the 5.8 ML kernel, and only rescue mode will work. So it appears to be a package included in the update that causes this behavior.
ps7776

2020-10-02 15:44

reporter   ~0037779

Did the dnf/yum update nuke /boot/grub2/grub.cfg by any chance? In particular, check the "search" string in the ### BEGIN /etc/grub.d/10_linux ### section, where it tries to guess the root device. At one point mine ended up pointing to the wrong drive. Not sure why.
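
One way to eyeball that "search" string is to pull every search line out of grub.cfg and compare the last argument with what blkid reports for the real /boot partition. This is only a rough sketch: the helper name and the sample text are made up for illustration, and it assumes the filesystem UUID is the last argument on each search line.

```python
import re

# Matches lines such as:
#   search --no-floppy --fs-uuid --set=root <uuid>
SEARCH_RE = re.compile(r"^\s*search\s+(.*)$")


def find_search_targets(cfg_text):
    """Return the last argument of every 'search' command in cfg_text.

    Hypothetical helper; assumes the UUID is the final argument.
    """
    targets = []
    for line in cfg_text.splitlines():
        match = SEARCH_RE.match(line)
        if match:
            args = match.group(1).split()
            if args:
                targets.append(args[-1])
    return targets


# Made-up sample, not taken from the affected system.
SAMPLE = """\
### BEGIN /etc/grub.d/10_linux ###
insmod part_gpt
search --no-floppy --fs-uuid --set=root 1111-2222
### END /etc/grub.d/10_linux ###
"""

print(find_search_targets(SAMPLE))
```

On a real system the text would come from /boot/grub2/grub.cfg, read while booted from the rescue kernel.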

By the way - did you try booting with rd.debug rd.shell vga=0 single and without "rhgb" and "quiet"? If it is due to a missing driver, you can load it at boot time from a separate USB stick with the "dd" option. I had to do that on an old box since the SATA chipset on it is no longer supported in the default kernel ( it still is in the plus kernel ).
MissBlue

2020-10-02 16:24

reporter   ~0037780

I'll have to check that and get back to you (currently not home). I know that changing parameters, such as removing "quiet" and "rhgb", didn't work, because the second it attempts to boot anything that isn't rescue mode it just hardware reboots.
Not a single line or message; it just instantly hardware reboots, even with those parameters.
ps7776

2020-10-02 17:16

reporter   ~0037781

Have you tried going to the command line in grub2 to find where it is trying to boot from? Something like (hd0,gpt1)/vm followed by a <tab>? A bit of trial and error should tell you where grub2 thinks /boot is. Then compare with what is in grub.cfg. Or try completing the "linux" and "initrd" entries by hand with rd.shell rd.debug rd.break, without quiet and rhgb, and with the partition where you found /boot. It should boot into a dracut shell with lots of logging enabled. You can add nomodeset and vga=0 as well to make it believe it has a dumb basic video card. This would bypass the grub.cfg file completely and only require a good copy of vmlinuz and initramfs.
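
To avoid typos when retyping the options at the grub prompt, the edit to the kernel command line can be worked out beforehand. A minimal sketch, assuming the options named above; the function name and the example line are hypothetical:

```python
def debug_cmdline(cmdline,
                  drop=("quiet", "rhgb"),
                  add=("rd.shell", "rd.debug", "rd.break",
                       "nomodeset", "vga=0")):
    """Remove the quieting options and append the debug ones once each."""
    words = [w for w in cmdline.split() if w not in drop]
    for opt in add:
        if opt not in words:
            words.append(opt)
    return " ".join(words)


# Hypothetical command line, not copied from the affected system.
original = "ro root=UUID=1111-2222 rhgb quiet"
print(debug_cmdline(original))
```

The result is what would go after the kernel path on the "linux" line.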
kybur

2020-11-13 20:34

reporter   ~0037898

I'm having the same issue. Has anyone contacted Supermicro support?
ps7776

2020-11-13 22:56

reporter   ~0037899

Yes - I did contact them. The word is that if you boot in legacy BIOS mode there is a 2TB disk limit: both "/boot" and "/" have to be within the first 2TB ( my "/" is not, but I'm in the process of making it so ). With "/boot" within the first 2TB I can reliably boot a centosplus 8.2 kernel to the point where it switches from initramfs to "/". Use kernel options rd.shell rd.debug rd.break single and get rid of rhgb and quiet to get maximum output. Add vga=0 and edd=off for good measure.
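
Whether a partition sits entirely within that limit is simple arithmetic on its start sector and length. A rough sketch, assuming "2TB" means 2 TiB (2**32 sectors of 512 bytes, the usual 32-bit LBA ceiling); the partition numbers below are invented for illustration:

```python
# Assumption: the "2TB" limit is 2 TiB, i.e. 2**32 sectors of 512 bytes.
LIMIT_BYTES = 2 * 1024**4


def ends_within_limit(start_sector, num_sectors,
                      sector_size=512, limit=LIMIT_BYTES):
    """True if the partition's last byte lies at or below the limit."""
    end_byte = (start_sector + num_sectors) * sector_size
    return end_byte <= limit


# Hypothetical layout: (start sector, length in sectors).
partitions = {
    "/boot": (2048, 2097152),        # 1 GiB starting at 1 MiB
    "/": (5000000000, 104857600),    # 50 GiB starting past 2 TiB
}

for mount, (start, length) in partitions.items():
    print(mount, ends_within_limit(start, length))
```

On Linux the real numbers can be read from /sys/block/<disk>/<partition>/start and size, both given in 512-byte units.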

CentOS 8.2 installs without a hitch on a < 1TB disk ( minimal install for simplicity ); yum update works as expected and all kernels boot.

I'm not sure whether some of the grub2 code is still used at root switch time and can't handle more than 2 TB, or whether something is inherited from the BIOS and it thinks the root filesystem is truncated, or what.

 I'm making an image of the drive and will try to rearrange the partitions with GParted.
MissBlue

2020-11-17 11:21

reporter   ~0037912

Sorry for the late reply!

I have attempted to do as suggested but none of the attempts were successful.
Legacy does work as stated, but the NVMe does not function, which is unfortunate.

I installed Ubuntu with the 5.8 ML kernel and KVM, and that works. Installing any CentOS release on KVM and updating it also works. Even CentOS 8.2 worked out of the box on KVM.

Issue History

Date Modified Username Field Change
2020-07-17 18:32 ps7776 New Issue
2020-07-18 00:32 ps7776 Note Added: 0037377
2020-10-01 23:52 MissBlue Note Added: 0037776
2020-10-02 14:39 MissBlue Note Added: 0037778
2020-10-02 15:44 ps7776 Note Added: 0037779
2020-10-02 16:24 MissBlue Note Added: 0037780
2020-10-02 17:16 ps7776 Note Added: 0037781
2020-11-13 20:34 kybur Note Added: 0037898
2020-11-13 22:56 ps7776 Note Added: 0037899
2020-11-17 11:21 MissBlue Note Added: 0037912