View Issue Details

ID: 0008295
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2018-02-07 07:19
Reporter: mattwillsher
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: not fixable
Platform: CentOS
OS: 7
OS Version: 7.0
Product Version: 7.0-1406
Target Version:
Fixed in Version:
Summary: 0008295: Installation via PXE & inst.root= on gen2 (UEFI) Hyper-V container can't find initrd
Description: See also https://bugzilla.redhat.com/show_bug.cgi?id=1201739
Content duplicated below


Description of problem:

When doing a network boot via iPXE in a gen2 Hyper-V VM, vmlinuz can't find the initrd. The same setup works under Fedora 21. There is a backported patch at http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=819ab9941c98f18b0f8c7ffb815e4f07186d2a5f which should resolve this problem.

Version-Release number of selected component (if applicable):

RHEL 7, 7.1

How reproducible:

Actual results:

Kernel panic

Expected results:

Root gets mounted, installation continues.

Additional info:

See attached file for screen shot of output
Steps To Reproduce:
Compile ipxe.efi from source:
yum install binutils-devel gcc zlib-devel binutils perl make git
git clone git://git.ipxe.org/ipxe.git
cd ipxe/src
make bin-x86_64-efi/ipxe.efi

Setup DNSMasq:
dhcp-range=192.168.99.20,192.168.99.200
dhcp-vendorclass=set:pxe,vendor:PXEClient
dhcp-userclass=set:ipxe,iPXE
dhcp-option=tag:pxe,option:tftp-server,"192.168.99.10"
dhcp-option=option:router,192.168.99.1
dhcp-boot=tag:efi,ipxe.efi
dhcp-boot=tag:!efi,undionly.kpxe
dhcp-boot=tag:ipxe,ipxe.cfg
dhcp-match=set:efi,option:client-arch,00:07
dhcp-authoritative
enable-tftp
tftp-root=/var/lib/tftpboot

Put images/pxeboot/vmlinuz and initrd.img in /var/lib/tftpboot with the ipxe.efi file, and the following in a file called ipxe.cfg:


#!ipxe
dhcp
kernel vmlinuz ro ip=dhcp inst.repo=http://192.168.99.10/centos/7/os/x86_64 initrd=initrd.img
initrd initrd.img
boot

The inst.repo should point to a local ISO extract or a mirror.
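
Because the EFI boot stub locates the initrd through the initrd= kernel argument, that argument has to name a file that iPXE actually loaded with its initrd command. A minimal sketch of a sanity check for a script like the one above follows. The function name check_initrd_argument is made up for illustration, and the parser is deliberately simplified: it assumes one command per line, ignores options such as --name, and assumes iPXE names each image after the last path component of its URI.

```python
def check_initrd_argument(script):
    """Return a list of problems found in an iPXE script's kernel/initrd lines."""
    kernel_args = []
    loaded = []
    for line in script.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines, comments and the #!ipxe shebang
        if words[0] == "kernel":
            # words[1] is the kernel image; the rest is the kernel command line
            kernel_args = words[2:]
        elif words[0] == "initrd" and len(words) > 1:
            # Assumption: the image is named after the file part of the URI
            loaded.append(words[1].rsplit("/", 1)[-1])

    wanted = [a.split("=", 1)[1] for a in kernel_args if a.startswith("initrd=")]
    problems = []
    if not wanted:
        problems.append("kernel line has no initrd= argument")
    for name in wanted:
        if name not in loaded:
            problems.append("initrd=%s does not match any 'initrd' command" % name)
    return problems
```

An empty list means the two lines agree. This only rules out a naming mismatch in the script; it says nothing about whether the kernel itself can load the initrd.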


* Create a gen2 Hyper-V VM with 1GB RAM, network boot, defaults for everything else.
* Disable Secure Boot in EFI firmware config.
* Start VM.
Additional Information: It would be useful to get this patch into the centosplus (kernel-plus) kernel.
Tags: No tags attached.
abrt_hash:
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1201739

Activities

mattwillsher

2015-03-13 13:15

reporter  

eficrash.PNG (67,355 bytes)
toracat

2015-03-13 14:41

manager   ~0022512

Ack. Will try to get the patch into the 7.1 kernel-plus.
mattwillsher

2015-03-13 16:59

reporter   ~0022517

Great.

I applied the patch to the CentOS 7 kernel source RPM (latest from the 7.0 series), and the resulting kernel booted past the initrd mount.
toracat

2015-03-13 17:23

manager  

8295.patch (5,090 bytes)
centosplus patch bug #8295

x86/efi: Include a .bss section within the PE/COFF headers

commit c7fb93ec51d462ec3540a729ba446663c26a0505 upstream. 

The PE/COFF headers currently describe only the initialised-data 
portions of the image, and result in no space being allocated for the 
uninitialised-data portions. Consequently, the EFI boot stub will end 
up overwriting unexpected areas of memory, with unpredictable results. 

Fix by including a .bss section in the PE/COFF headers (functionally 
equivalent to the init_size field in the bzImage header). 

Signed-off-by: Michael Brown <mbrown@fensystems.co.uk> 
Cc: Thomas Bächler <thomas@archlinux.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org> 
Signed-off-by: Matt Fleming <matt.fleming@intel.com> 
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 

Applied-by: Akemi Yagi <toracat@centos.org>

--- a/arch/x86/boot/header.S	2015-01-29 15:15:53.000000000 -0800
+++ b/arch/x86/boot/header.S	2015-03-13 08:22:34.337949523 -0700
@@ -91,10 +91,9 @@ bs_die:
 
 	.section ".bsdata", "a"
 bugger_off_msg:
-	.ascii	"Direct floppy boot is not supported. "
-	.ascii	"Use a boot loader program instead.\r\n"
+	.ascii	"Use a boot loader.\r\n"
 	.ascii	"\n"
-	.ascii	"Remove disk and press any key to reboot ...\r\n"
+	.ascii	"Remove disk and press any key to reboot...\r\n"
 	.byte	0
 
 #ifdef CONFIG_EFI_STUB
@@ -108,7 +107,7 @@ coff_header:
 #else
 	.word	0x8664				# x86-64
 #endif
-	.word	3				# nr_sections
+	.word	4				# nr_sections
 	.long	0 				# TimeDateStamp
 	.long	0				# PointerToSymbolTable
 	.long	1				# NumberOfSymbols
@@ -250,6 +249,25 @@ section_table:
 	.word	0				# NumberOfLineNumbers
 	.long	0x60500020			# Characteristics (section flags)
 
+	#
+	# The offset & size fields are filled in by build.c.
+	#
+	.ascii	".bss"
+	.byte	0
+	.byte	0
+	.byte	0
+	.byte	0
+	.long	0
+	.long	0x0
+	.long	0				# Size of initialized data
+						# on disk
+	.long	0x0
+	.long	0				# PointerToRelocations
+	.long	0				# PointerToLineNumbers
+	.word	0				# NumberOfRelocations
+	.word	0				# NumberOfLineNumbers
+	.long	0xc8000080			# Characteristics (section flags)
+
 #endif /* CONFIG_EFI_STUB */
 
 	# Kernel attributes; used by setup.  This is part 1 of the

--- a/arch/x86/boot/tools/build.c	2015-01-29 15:15:53.000000000 -0800
+++ b/arch/x86/boot/tools/build.c	2015-03-13 08:14:48.944900590 -0700
@@ -141,7 +141,7 @@ static void usage(void)
 
 #ifdef CONFIG_EFI_STUB
 
-static void update_pecoff_section_header(char *section_name, u32 offset, u32 size)
+static void update_pecoff_section_header_fields(char *section_name, u32 vma, u32 size, u32 datasz, u32 offset)
 {
 	unsigned int pe_header;
 	unsigned short num_sections;
@@ -162,10 +162,10 @@ static void update_pecoff_section_header
 			put_unaligned_le32(size, section + 0x8);
 
 			/* section header vma field */
-			put_unaligned_le32(offset, section + 0xc);
+			put_unaligned_le32(vma, section + 0xc);
 
 			/* section header 'size of initialised data' field */
-			put_unaligned_le32(size, section + 0x10);
+			put_unaligned_le32(datasz, section + 0x10);
 
 			/* section header 'file offset' field */
 			put_unaligned_le32(offset, section + 0x14);
@@ -177,6 +177,11 @@ static void update_pecoff_section_header
 	}
 }
 
+static void update_pecoff_section_header(char *section_name, u32 offset, u32 size)
+{
+	update_pecoff_section_header_fields(section_name, offset, size, size, offset);
+}
+
 static void update_pecoff_setup_and_reloc(unsigned int size)
 {
 	u32 setup_offset = 0x200;
@@ -201,9 +206,6 @@ static void update_pecoff_text(unsigned
 
 	pe_header = get_unaligned_le32(&buf[0x3c]);
 
-	/* Size of image */
-	put_unaligned_le32(file_sz, &buf[pe_header + 0x50]);
-
 	/*
 	 * Size of code: Subtract the size of the first sector (512 bytes)
 	 * which includes the header.
@@ -218,6 +220,22 @@ static void update_pecoff_text(unsigned
 	update_pecoff_section_header(".text", text_start, text_sz);
 }
 
+static void update_pecoff_bss(unsigned int file_sz, unsigned int init_sz)
+{
+	unsigned int pe_header;
+	unsigned int bss_sz = init_sz - file_sz;
+
+	pe_header = get_unaligned_le32(&buf[0x3c]);
+
+	/* Size of uninitialized data */
+	put_unaligned_le32(bss_sz, &buf[pe_header + 0x24]);
+
+	/* Size of image */
+	put_unaligned_le32(init_sz, &buf[pe_header + 0x50]);
+
+	update_pecoff_section_header_fields(".bss", file_sz, bss_sz, 0, 0);
+}
+
 #endif /* CONFIG_EFI_STUB */
 
 
@@ -268,6 +286,9 @@ int main(int argc, char ** argv)
 	int fd;
 	void *kernel;
 	u32 crc = 0xffffffffUL;
+#ifdef CONFIG_EFI_STUB
+	unsigned int init_sz;
+#endif
 
 	/* Defaults for old kernel */
 #ifdef CONFIG_X86_32
@@ -338,7 +359,9 @@ int main(int argc, char ** argv)
 	put_unaligned_le32(sys_size, &buf[0x1f4]);
 
 #ifdef CONFIG_EFI_STUB
-	update_pecoff_text(setup_sectors * 512, sz + i + ((sys_size * 16) - sz));
+	update_pecoff_text(setup_sectors * 512, i + (sys_size * 16));
+	init_sz = get_unaligned_le32(&buf[0x260]);
+	update_pecoff_bss(i + (sys_size * 16), init_sz);
 
 #ifdef CONFIG_X86_64 /* Yes, this is really how we defined it :( */
 	efi_stub_entry -= 0x200;
toracat

2015-04-07 17:11

manager   ~0022695

The patch was added to the C7 GA kernel-plus (3.10.0-229.el7.centos.plus).
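
To check whether a particular vmlinuz was built with the change in 8295.patch, one option is to look for a .bss entry in its PE/COFF section table. The sketch below is not part of the patch, and the function names pe_section_names and has_bss_section are made up for illustration. It relies only on the standard PE layout: the PE header offset is stored at 0x3c (build.c reads it the same way), the section count is at offset 6 of the PE header, and each section table entry is 40 bytes with an 8-byte name.

```python
import struct


def pe_section_names(image):
    """Return the PE/COFF section names of an EFI-stub bzImage given as bytes."""
    if image[:2] != b"MZ":
        raise ValueError("no MZ signature; kernel may lack CONFIG_EFI_STUB")
    # Offset 0x3c holds the file offset of the PE header.
    (pe_header,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_header:pe_header + 4] != b"PE\0\0":
        raise ValueError("no PE signature at offset 0x%x" % pe_header)
    # COFF file header: NumberOfSections at +6, SizeOfOptionalHeader at +20.
    (num_sections,) = struct.unpack_from("<H", image, pe_header + 6)
    (opt_header_size,) = struct.unpack_from("<H", image, pe_header + 20)
    # The section table starts right after the optional header.
    table = pe_header + 24 + opt_header_size
    names = []
    for i in range(num_sections):
        entry = table + 40 * i
        raw_name = image[entry:entry + 8]
        names.append(raw_name.rstrip(b"\0").decode("ascii", "replace"))
    return names


def has_bss_section(image):
    """True if the image's section table contains a .bss entry."""
    return ".bss" in pe_section_names(image)
```

For example, has_bss_section(open("vmlinuz", "rb").read()) reports whether the section is present. A missing .bss suggests the kernel was built without this patch; its presence does not by itself guarantee that the boot will succeed.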
fbacchella

2016-03-23 10:23

reporter   ~0026103

I'm using iPXE 173c0 (rebuilt a few days ago).

I'm PXE-booting kernel 3.10.0-327.10.1.el7.centos.plus.x86_64.

The kernel command line is:

    vmlinuz ks=http://XXX/Linux/KickStart/ks7.cfg os_version=7 initrd=initrd.img ip=eth0:dhcp inst.sshd inst.gpt net.ifnames=0 hostname=XXXX uuid=XXXX serial=XXXX platform=efi product=ProLiant%20BL460c%20Gen9 install_type=automated console=ttyS0,115200n8

I'm still getting the same problem:
[ 13.017291] EFI Variables Facility v0.08 2004-May-17
[ 13.039574] hidraw: raw HID events driver (C) Jiri Kosina
[ 13.041356] usbcore: registered new interface driver usbhid
[ 13.043063] usbhid: USB HID core driver
[ 13.044386] drop_monitor: Initializing network drop monitor service
[ 13.046415] TCP: cubic registered
[ 13.047452] Initializing XFRM netlink socket
[ 13.048867] NET: Registered protocol family 10
[ 13.050723] NET: Registered protocol family 17
[ 13.052708] Loading compiled-in X.509 certificates
[ 13.054308] Loaded X.509 cert 'CentOS Linux kpatch signing key: ea0413152cde1d98ebdca3fe6f0230904c9ef717'
[ 13.057242] Loaded X.509 cert 'CentOS Linux Driver update signing key: 7f421ee0ab69461574bb358861dbe77762a4201b'
[ 13.061266] Loaded X.509 cert 'CentOS Linux kernel signing key: c2fef3822e5b97d3835f09ee6a5fb90bee0ec6de'
[ 13.064337] registered taskstats version 1
[ 13.066072] Key type trusted registere045] Key type encrypted registered
[ 13.468719] IMA: No TPM chip found, activating TPM-bypass!
[ 13.471565] rtc_cmos 00:02: setting system clock to 2016-03-23 10:06:36 UTC (1458727596)
[ 13.474480] md: Waiting for all devices to be available before autodetect
[ 13.506390] md: If you don't use raid, use raid=noautodetect
[ 13.532079] md: Autodetecting RAID arrays.
[ 13.550442] md: Scanned 0 and added 0 devices.
[ 13.570440] md: autorun ...
[ 13.582908] md: ... autorun DONE.
[ 13.597974] List of all partitions:
[ 13.613558] No filesystem could mount root, tried:
[ 13.635378] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 13.672401] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 3.10.0-327.10.1.el7.centos.plus.x86_64 #1
[ 13.711555] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/24/2015
[ 13.741893] ffffffff81868f78 000000002fb5f7a8 ffff881029517d60 ffffffff8164a071
[ 13.775127] ffff881029517de0 ffffffff816438ec ffffffff00000010 ffff881029517df0
[ 13.811023] ffff881029517d90 000000002fb5f7a8 000000002fb5f7a8 ffff881029517e00
[ 13.846585] Call Trace:
[ 13.858576] [<ffffffff8164a071>] dump_stack+0x19/0x1b
[ 13.884242] [<ffffffff816438ec>] panic+0xd8/0x1e7
[ 13.905811] [<ffffffff81aab5fa>] mount_block_root+0x2a1/0x2b0
[ 14.030895] [<ffffffff81aab65c>] mount_root+0x53/0x56
[ 14.053956] [<ffffffff81aab79b>] prepare_namespace+0x13c/0x174
[ 14.080993] [<ffffffff81aab268>] kernel_init_freeable+0x1f0/0x217
[ 14.109238] [<ffffffff81aaa9db>] ? initcall_blacklist+0xb0/0xb0
[ 14.137429] [<ffffffff81639da0>] ? rest_init+0x80/0x80
[ 14.161574] [<ffffffff81639dae>] kernel_init+0xe/0xf0
[ 14.184878] [<ffffffff8165a758>] ret_from_fork+0x58/0x90
[ 14.209024] [<ffffffff81639da0>] ? rest_init+0x80/0x80

The iPXE maintainers say that kernel commit c7fb93ec51d462ec3540a729ba446663c26a0505 (or 819ab9941c98f18b0f8c7ffb815e4f07186d2a5f in the linux-3.10.y stable branch) should fix this.
abelletti

2016-05-03 20:32

reporter   ~0026415

I've done some work in this area while provisioning HP DL380 Gen9 boxes in their native (UEFI) mode. We're using iPXE to load CentOS 7 and get a kickstart going.

What I've found is that the CentosPlus kernel referenced in https://bugs.centos.org/view.php?id=8295#c22695 doesn't work reliably for me. Uncertain why this is, but it fails in the same way as the unmodified kernel at least some of the time.

On the other hand, replacing the kernel and modules with a build provided in the ELRepo project (4.5 for my testing) works 100% of the time on the hardware described above. I've documented this process at http://forum.ipxe.org/showthread.php?tid=7813&pid=12480#pid12480 which involves extracting the default CentOS 7 initrd.img, replacing the kernel modules with the ones from ELRepo 4.5, and rebuilding the archive. Then boot with the ELRepo 4.5 kernel and everything works beautifully.

The central point seems to be that the centosplus kernel may not be quite there. Anything I can do to provide additional data or test proposed fixes, please let me know!
toracat

2016-05-04 15:41

manager   ~0026422

@abelletti

Thanks for the note. So, what you've found is that the patch added to the plus kernel is not complete (works partially) but kernel-3.16 provides a complete fix?

If we are able to identify the patch(es) that fixed it, we could apply them to improve the plus kernel. This may not be an easy task. A kernel bisect might do it, but it involves a good amount of work.
abelletti

2016-05-04 20:17

reporter   ~0026425

Hi @toracat. That's halfway true. The plus kernel doesn't seem to work reliably. But given limited time, I never tried building a 3.16 kernel. Instead I went for the pre-built kernels available in ELRepo (http://elrepo.org/linux/kernel/el7/x86_64/RPMS/). My specific solution was to use vmlinuz-4.5.0-1.el7.elrepo.x86_64 and the associated modules.

Having gotten it all working though, I'd be happy to test with any other kernel version that would be useful. Just let me know!
toracat

2017-12-16 17:32

manager   ~0030765

To those who are affected:

How about the current distro kernel 3.10.0-693.xxx? Still the same issue?
toracat

2018-02-07 07:19

manager   ~0031175

Closing due to inactivity.

Issue History

Date Modified Username Field Change
2015-03-13 13:15 mattwillsher New Issue
2015-03-13 13:15 mattwillsher File Added: eficrash.PNG
2015-03-13 14:41 toracat Note Added: 0022512
2015-03-13 14:41 toracat Status new => assigned
2015-03-13 16:59 mattwillsher Note Added: 0022517
2015-03-13 17:23 toracat File Added: 8295.patch
2015-04-07 17:11 toracat Note Added: 0022695
2016-03-23 10:23 fbacchella Note Added: 0026103
2016-05-03 20:32 abelletti Note Added: 0026415
2016-05-04 15:41 toracat Note Added: 0026422
2016-05-04 20:17 abelletti Note Added: 0026425
2017-12-16 17:32 toracat Note Added: 0030765
2017-12-16 17:33 toracat Status assigned => feedback
2018-02-07 07:19 toracat Status feedback => closed
2018-02-07 07:19 toracat Resolution open => not fixable
2018-02-07 07:19 toracat Note Added: 0031175