2017-11-20 02:16 UTC

ID: 0009860
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2016-02-18 17:37
Reporter: okmikel
Priority: high
Severity: crash
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 7.2.1511
Target Version:
Fixed in Version:
Summary: 0009860: 3.10.0-327.el7 crashes on boot
Description: 3.10.0-327.el7 crashes on every boot.

There is a Red Hat Bugzilla report for this issue (Red Hat Bugzilla – Bug 1285235, https://bugzilla.redhat.com/show_bug.cgi?id=1285235). The resolution noted there is to update to the newest kernel in the "z-stream release".

Please make this fix available in CentOS.
Steps To Reproduce: Reboot an affected AMD system.
Additional Information: Another bug report is also mentioned in the Red Hat Bugzilla, but I cannot access it because of insufficient rights.

(Red Hat Bugzilla – Bug 1265283, https://bugzilla.redhat.com/show_bug.cgi?id=1265283)
Tags: No tags attached.

Relationships
has duplicate 0010176 resolved toracat kernel crashes on boot
has duplicate 0010215 closed Issue Tracker System & live CD won't boot latest kernel

-Notes

~0024987

tigalch (manager)

Last edited: 2015-12-07 08:16


z-Stream releases are not reproduced by CentOS - you need to buy those. However you can wait for the ISOs for 7.2 (1511) to be released and see if this solves your problem.

~0024988

wolfy (developer)

Note that the kernel from CentOS 7.2.1511 is already available via the CR repository

~0024989

okmikel (reporter)

Yes, it is already available, and this is the kernel that crashes at boot without the kernel option "initcall_blacklist=clocksource_done_booting".

kernel-3.10.0-327.el7.x86_64 is the CentOS 7.2.1511 kernel, and there is no newer kernel in the RHEL 7 git at git.centos.org either.

~0024990

wolfy (developer)

Last edited: 2015-12-07 09:16


The newer kernels that will be made available, once Red Hat releases them, will also be based on 3.10.0-327.
The only way to have the problem fixed without the additional kernel parameter is to persuade Red Hat, via bugzilla.redhat.com, to include the fix in the main (not z-stream) kernel. CentOS builds its packages from the sources published by Red Hat, so there is nothing we can do until then (unless a modified kernel can be pushed via the centosplus repo).

~0025022

okmikel (reporter)

New kernel kernel-3.10.0-327.3.1.el7.x86_64 does not solve the problem.

~0025052

johan.kroeckel (reporter)

Second this: "kernel-3.10.0-327.3.1.el7.x86_64 does not solve the problem".

~0025057

arrfab (administrator)

Current status is that there is no kernel that fixes this right now.
One can test other kernels (provided through AltArch or a SIG, such as the newer kernel for Xen from the Virt-SIG) and see if that solves the issue.

Two ways to work around the issue with kernel-3.10.0-327*:

- for installed system :
  - boot with the initcall_blacklist=clocksource_done_booting kernel parameter added (or reboot on previous kernel)
  - once booted, add the same parameter at the end of the GRUB_CMDLINE_LINUX="..." line in the file /etc/default/grub
  - as root, run "grub2-mkconfig -o /etc/grub2.conf"

- for a system you want to install
  - start the kernel/boot media with the initcall_blacklist=clocksource_done_booting kernel parameter added
  - when you reboot, add the solution above (for installed system) if not already applied to the default grub config
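
The steps for an installed system can be sketched as a small shell helper. This is only a sketch under assumptions: the function name add_kernel_param is made up for illustration, and it assumes GNU sed and a single-line, double-quoted GRUB_CMDLINE_LINUX entry. A later note in this thread gives the output path as /etc/grub2.cfg rather than /etc/grub2.conf.

```shell
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX
# in a grub defaults file, unless the line already contains it.
# Assumes GNU sed (-i) and a double-quoted, single-line value.
add_kernel_param() {
    file=$1
    param=$2
    # Skip the edit if the parameter is already on the line
    if grep -q "^GRUB_CMDLINE_LINUX=.*${param}" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote
    sed -i "s|^\(GRUB_CMDLINE_LINUX=\".*\)\"\$|\1 ${param}\"|" "$file"
}

# Typical use, as root (not run here):
#   add_kernel_param /etc/default/grub initcall_blacklist=clocksource_done_booting
#   grub2-mkconfig -o /etc/grub2.cfg
```

Keeping the edit in a function that takes the file as an argument makes it easy to try on a copy of /etc/default/grub before touching the real one.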

~0025058

rusxakep (reporter)

@arrfab, it is grub2-mkconfig -o /etc/grub2.cfg in a default installation, not grub2.conf

~0025061

toracat (manager)

For more details on grub2, please see:

https://wiki.centos.org/HowTos/Grub2

:-)

~0025085

madko (reporter)

Same problem here on ProLiant N40L/N54L (AMD Neo).

Adding initcall_blacklist=clocksource_done_booting to GRUB_CMDLINE_LINUX in /etc/default/grub and then running grub2-mkconfig -o /etc/grub2.cfg fixes this bug. Thank you for sharing this solution.

~0025204

timmerov (reporter)

second this: "Adding initcall_blacklist=clocksource_done_booting to GRUB_CMDLINE_LINUX in /etc/default/grub and then grub2-mkconfig -o /etc/grub2.cfg" works around the issue.
thanks!

now to remember to restore the line when a new kernel is pushed... ;->

~0025221

whitroth (reporter)

We had this happen on three servers, all Dell R420s with Xeons. Other servers (Supermicros and another Dell or two) did not show the issue.

On two of the three, I applied the workaround. This worked for some hours, then the systems went into distress, and I had to power cycle them, rebooting to the previous kernel. NOTE: when the boot to the 327 kernel failed, it dropped into rdosshell (sp?). When I power cycled to boot to the previous kernel, *that* failed the same way, and only booting to the second previous kernel allowed the system to come up. After the systems with the workaround failed, I *could* reboot to the most previous kernel, suggesting that the failure left something that broke the previous kernel boot.

I finally removed the 327 kernel, and we seem to be ok. I have uploaded an abrt-crash report, in hopes that might help debugging the issue.

One final note: when I first applied the workaround and ran grub2-mkconfig on the two systems with large RAID appliances, it gave errors from os-probe, which reported "unsupported sector size 4096". From Googling, I was under the impression that this was fixed in 2010. Has it crept back in?

~0025417

johan.kroeckel (reporter)

Just as a sidenote: still a problem in 3.10.0-327.4.4.el7.x86_64.

~0025501

nroskam (reporter)

I can also confirm that adding "initcall_blacklist=clocksource_done_booting" string to the grub profile and re-making the grub2.cfg has finally resolved my booting issues with the latest kernels.

The last kernel that worked for me without this boot parameter was 3.10.0-229.14.1 on my HP N54L microserver. Now I run the latest 3.10.0-327.4.4 without problem.

~0025506

jistone (reporter)

On my HP N40L, that workaround does let it boot, but I don't really trust this. For instance, I found that reading sysfs current_clocksource crashed the system, even as an unprivileged user!

  $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource

The crash is easy to understand -- clocksource_done_booting() is supposed to set pointer curr_clocksource, and sysfs_show_current_clocksources() dereferences it.
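
Given that, one cautious approach is to check whether the workaround is active before touching that sysfs file. A minimal sketch, assuming the parameter shows up in /proc/cmdline when it was passed at boot; the function names are hypothetical and not part of any tool mentioned in this thread:

```shell
# Hypothetical helper: succeed if the given parameter appears as a
# whole word in a kernel command line file such as /proc/cmdline.
has_kernel_param() {
    cmdline_file=$1
    param=$2
    tr ' ' '\n' < "$cmdline_file" | grep -qxF "$param"
}

# Hypothetical wrapper: refuse to read current_clocksource while the
# initcall_blacklist workaround is active.
safe_show_clocksource() {
    cmdline_file=${1:-/proc/cmdline}
    if has_kernel_param "$cmdline_file" initcall_blacklist=clocksource_done_booting; then
        echo "skipped: workaround active, reading current_clocksource may crash" >&2
        return 1
    fi
    cat /sys/devices/system/clocksource/clocksource0/current_clocksource
}
```

The command line file is passed as an argument so that the check can be tried against a saved copy instead of the live system.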

~0025534

tru (administrator)

https://bugzilla.redhat.com/show_bug.cgi?id=1285235 is listed as a clone of the original Red Hat Bugzilla entry

~0025544

whitroth (reporter)

I've just updated a CentOS 7 server to the latest kernel, vmlinuz-3.10.0-327.4.5.el7.x86_64, and the server fails to boot. It has failed on every 327 kernel.

Server: Dell R420, 2 Xeons, 124G RAM.

From the rdsosreport.txt, the relevant portion is:
[ 3.317974] <servername> systemd[1]: Starting File System Check on /dev/disk/by-label/\x2f...
[ 3.320089] <servername> systemd-fsck[590]: Failed to detect device /dev/disk/by-label//
[ 3.320567] <servername> systemd[1]: systemd-fsck-root.service: main process exited, code=exited, status=1/FAILURE
[ 3.320972] <servername> systemd[1]: Failed to start File System Check on /dev/disk/by-label/\x2f.
[ 3.321423] <servername> systemd[1]: Dependency failed for /sysroot.
[ 3.321872] <servername> systemd[1]: Dependency failed for Initrd Root File System.
[ 3.322335] <servername> systemd[1]: Dependency failed for Reload Configuration from the Real Root.
[ 3.322802] <servername> systemd[1]: Job initrd-parse-etc.service/start failed with result 'dependency'.
[ 3.323266] <servername> systemd[1]: Triggering OnFailure= dependencies of initrd-parse-etc.service.
[ 3.323697] <servername> systemd[1]: Job initrd-root-fs.target/start failed with result 'dependency'.
[ 3.324161] <servername> systemd[1]: Triggering OnFailure= dependencies of initrd-root-fs.target.
[ 3.324586] <servername> systemd[1]: Job sysroot.mount/start failed with result 'dependency'.
[ 3.324998] <servername> systemd[1]: Unit systemd-fsck-root.service entered failed state.
[ 3.325430] <servername> systemd[1]: systemd-fsck-root.service failed.
[ 3.326752] <servername> systemd[1]: Stopped dracut pre-pivot and cleanup hooo

And it stops, and drops me into the rdshell. Note that I can mkdir /mnt and mount /dev/sda1, and /boot is there; I can also mount /dev/sda3, and root is there just fine.

        mark

~0025553

nix_rules (reporter)

WORKAROUND DOES NOT WORK FOR ME.

"Adding initcall_blacklist=clocksource_done_booting to GRUB_CMDLINE_LINUX in /etc/default/grub and then grub2-mkconfig -o /etc/grub2.cfg" DID NOT fix it for me.

I have a Gigabyte brand GA-880GMA-UD2H motherboard with AMD Phenom(tm) II X4 965 Processor. All kernels newer than 3.10.0-229.20.1.el7.x86_64 fail to boot this machine even after adding the parameter above and reconfiguring grub. It hangs at "x86_64_start_kernel+0x152/0x175" every time on the boot screen with all the kernels newer than 3.10.0-229.20.1.el7.x86_64.

Is anybody besides me still having this issue? I hope someone is still working on correcting it.

~0025600

evilissimo (reporter)

besides the crash when doing

 $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource

my system's ability to keep the system time disappeared, and ntpd failed to keep the system in sync. So initcall_blacklist=clocksource_done_booting is a pretty bad idea unless you are only using it to update the system. The best option is to revert to the last known good kernel version, the one that didn't crash on startup.

~0025612

danieln74 (reporter)

Hosted server (HP ProLiant DL360e Gen8, BIOS P73 12/20/2013, CentOS 7: 3.10.0-327.4.5.el7) has exactly the same issue. The server was delivered with 'initcall_blacklist=clocksource_done_booting', and I am having the same issues as described above (server unstable, clock ultra fast, NTP failing, crash on cat).

-> Has anyone tried using ELRepo's latest 3.18.4-stable, or should I rather downgrade to 229? Thank you!

~0025620

octothorpe (reporter)

My system is also suffering from this problem.

Hardware: ASUS M4A785-M motherboard, 4 GB ECC RAM, Athlon II X4 630 CPU; using built-in video, plus an Intel SASUC8I HBA cross-flashed with LSI 1068E-R IT firmware, and an SiI 3132-based 2-port SATA card in addition to the on-board SATA ports. The system has ZFS installed, but the main system runs off MD-RAID; only the storage pools are ZFS.

I initially thought it was something related to the HBA, except, of course, it works fine on 3.10.0-229.20.1.el7.x86_64. I was all set to temporarily rip out the add-in cards until I found this thread.

I have found that the ELRepo mainline kernel (4.4.1-1) will boot just fine (including the ZFS modules), and it's currently running (but I'll have to see how things shake out over the next few days). I rebuilt the ELRepo kernel from the source RPM just for kicks (and personalization).

~0025624

xenium (reporter)

Another affected system:

AMD Athlon(tm) II X3 450 Processor
GA-770T-USB3 motherboard

I have rolled back to a 3.10.0-229.x kernel successfully.

~0025630

octothorpe (reporter)

Heads up: One thing I've discovered with the 4.4.1-1 kernel from ELRepo is that it won't work with KVM. Not a big issue for a file server with 4 GB RAM, but not so good for a VM host.

~0025636

toracat (manager)

@octothorpe

Could you elaborate a bit? When you say "won't work with KVM", what exactly does not work? A module issue or some setup problem or ...

~0025637

octothorpe (reporter)

With the ELRepo 4.4.1-1 kernel, attempting to run virt-manager will give me the message:

Unable to connect to libvirt

authentication failed: no agent is available to authenticate

Note that virt-manager worked fine on a 3.10.0-229.x kernel.

~0025638

toracat (manager)

Last edited: 2016-02-06 22:50


@octothorpe

Thanks for the note. Discussing this any further here is way OT, so I will stop here.

[EDIT] Adding this note for posterity: virt-manager works without any issue with kernel-ml (el6 and el7) as well as kernel-lt (el6). Confirmed by two independent users.

~0025641

danieln74 (reporter)

quick update from my side: DL360e-8 (2x Xeon E5) working fine with ELRepo 4.4.1-1 and 'initcall_blacklist=clocksource_done_booting' removed. clocksources properly initialized, NTP sync up & ok, LAMP stack performing as expected!

~0025645

octothorpe (reporter)

Some more data points: a Gigabyte GA-F2A88XM-D3H with an A10-7850K APU, and an ASUS M5A97 R1.02 with an FX-6100 CPU both run fine on 3.10.0-327.

The A10 system (which normally runs Windows 7) was set up to boot CentOS via Etherboot/PXE/iSCSI, and a kernel update to '327 worked flawlessly (whew).

The FX system (which normally runs Xubuntu 14.04 LTS) booted the 1511 install USB without a problem.

~0025675

octothorpe (reporter)

It looks like this problem affects certain specific CPUs, or perhaps CPU/chipset combos. I have a second ASUS M4A785-M built up differently, which I discovered can successfully boot the 1511 install USB.

This one has an Nvidia graphics card instead of an HBA in the x16 slot, but, more importantly, it has a Phenom II 550 Black Edition for the CPU. It is also running with two additional cores unlocked in the BIOS (and is stable in its normal operation with Xubuntu).

~0025750

okmikel (reporter)

kernel-3.10.0-327.10.1.el7.x86_64 fixes the bug for me.

Booting as expected and this also works:

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
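
To decide whether a machine already runs a kernel at or above this release, the release string can be compared against 3.10.0-327.10.1. A rough sketch, assuming GNU sort with -V (version sort) is available; kernel_at_least is a hypothetical helper, and the sed expression that strips the .el7... suffix is an assumption about how CentOS release strings are laid out:

```shell
# Hypothetical helper: succeed if the kernel release in $1 is at least
# the version in $2. The distribution suffix (".el7.x86_64" and similar)
# is stripped first so that only the numeric parts are compared.
kernel_at_least() {
    running=$(printf '%s\n' "$1" | sed 's/\.el[0-9].*$//')
    minimum=$2
    # With version sort, the lower of the two versions comes out first
    lowest=$(printf '%s\n%s\n' "$running" "$minimum" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}

# Typical use (not run here):
#   kernel_at_least "$(uname -r)" 3.10.0-327.10.1 && echo "fixed kernel is running"
```

Once this check succeeds after a reboot, the initcall_blacklist parameter can be taken out of /etc/default/grub again and the grub config regenerated.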

~0025751

toracat (manager)

@okmikel

Thank you for reporting that the latest kernel update fixes the bug. I can confirm this in the changelog:

- [kernel] tick: broadcast: Prevent livelock from event handler (Prarit Bhargava) [1284043 1265283]
- [kernel] clockevents: Serialize calls to clockevents_update_freq() in the core (Prarit Bhargava) [1284043 1265283]

(Note the BZ number 1265283)

~0025752

xenium (reporter)

Also confirmed here. The AMD Athlon(tm) II X3 450 Processor / GA-770T-USB3 motherboard system previously affected by this bug is now running 3.10.0-327.10.1.el7.x86_64 successfully, and current_clocksource == tsc.

~0025757

octothorpe (reporter)

I was glad to see "kernel" among last night's batch of updates.

327.10.1 is working for me, too. *yay!*

Nuking 327.4.5 from orbit (just to be sure).

~0025758

nroskam (reporter)

And I can concur as well that the latest is running fine: 3.10.0-327.10.1.el7.x86_64 x86_64 GNU/Linux on AMD Turion(tm) II Neo N54L Dual-Core Processor (HP Proliant N54L)

~0025771

toracat (manager)

Thanks all for reporting back with the confirmation. Now closing this as 'resolved'.

-Issue History
Date Modified Username Field Change
2015-12-07 07:29 okmikel New Issue
2015-12-07 07:35 tigalch Note Added: 0024987
2015-12-07 08:16 wolfy Note Edited: 0024987
2015-12-07 08:18 wolfy Note Added: 0024988
2015-12-07 08:27 okmikel Note Added: 0024989
2015-12-07 09:09 toracat Status new => acknowledged
2015-12-07 09:16 wolfy Note Added: 0024990
2015-12-07 09:16 wolfy Note Edited: 0024990
2015-12-10 06:52 okmikel Note Added: 0025022
2015-12-14 21:15 johan.kroeckel Note Added: 0025052
2015-12-15 10:02 arrfab Note Added: 0025057
2015-12-15 10:11 rusxakep Note Added: 0025058
2015-12-15 12:39 toracat Note Added: 0025061
2015-12-16 07:47 madko Note Added: 0025085
2015-12-28 01:19 timmerov Note Added: 0025204
2015-12-30 22:18 whitroth File Added: [abrt] full crash report.asc
2015-12-30 22:24 whitroth Note Added: 0025221
2016-01-18 21:38 tru Relationship added has duplicate 0010176
2016-01-19 17:06 johan.kroeckel Note Added: 0025417
2016-01-22 18:10 toracat Relationship added has duplicate 0010215
2016-01-25 02:56 nroskam Note Added: 0025501
2016-01-25 17:56 jistone Note Added: 0025506
2016-01-27 12:32 tru Note Added: 0025534
2016-01-27 20:37 whitroth Note Added: 0025544
2016-01-29 16:44 nix_rules Note Added: 0025553
2016-02-04 14:24 evilissimo Note Added: 0025600
2016-02-04 18:42 danieln74 Note Added: 0025612
2016-02-04 23:42 octothorpe Note Added: 0025620
2016-02-05 06:19 xenium Note Added: 0025624
2016-02-05 12:30 octothorpe Note Added: 0025630
2016-02-05 16:24 toracat Note Added: 0025636
2016-02-05 16:41 octothorpe Note Added: 0025637
2016-02-05 16:52 toracat Note Added: 0025638
2016-02-05 18:21 danieln74 Note Added: 0025641
2016-02-06 14:03 octothorpe Note Added: 0025645
2016-02-06 22:43 toracat Note Edited: 0025638
2016-02-06 22:50 toracat Note Edited: 0025638
2016-02-09 22:58 octothorpe Note Added: 0025675
2016-02-17 06:38 okmikel Note Added: 0025750
2016-02-17 07:14 toracat Note Added: 0025751
2016-02-17 08:04 xenium Note Added: 0025752
2016-02-17 12:10 octothorpe Note Added: 0025757
2016-02-17 12:21 nroskam Note Added: 0025758
2016-02-18 17:37 toracat Note Added: 0025771
2016-02-18 17:37 toracat Status acknowledged => resolved
2016-02-18 17:37 toracat Resolution open => fixed