CentOS Bug Tracker - CentOS-7
View Issue Details
ID: 0009860
Project: CentOS-7
Category: kernel
View Status: public
Date Submitted: 2015-12-07 07:29
Last Update: 2016-02-18 17:37
Reporter: okmikel
Priority: high
Severity: crash
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 7.2.1511
Summary: 0009860: 3.10.0-327.el7 crashes on boot
Description: 3.10.0-327.el7 crashes on every boot.

There is a Red Hat Bugzilla report for this issue (Bug 1285235, https://bugzilla.redhat.com/show_bug.cgi?id=1285235). The resolution noted there is to update to the newest kernel in the "z-stream release".

Please make this fix available in CentOS.
Steps To Reproduce: Reboot an affected AMD system.
Additional Information: Another bug report is also mentioned in the Red Hat Bugzilla, but I cannot access it because of insufficient rights.

(Red Hat Bugzilla – Bug 1265283, https://bugzilla.redhat.com/show_bug.cgi?id=1265283)
Tags: No tags attached.
has duplicate 0010176 (resolved, toracat): kernel crashes on boot
has duplicate 0010215 (closed, Issue Tracker): System & live CD won't boot latest kernel
Attached Files: [abrt] full crash report.asc (95,422) 2015-12-30 22:18
https://bugs.centos.org/file_download.php?file_id=10542&type=bug

Notes
(0024987)
tigalch   
2015-12-07 07:35   
(Last edited: 2015-12-07 08:16)
z-Stream releases are not reproduced by CentOS - you need to buy those. However, you can wait for the 7.2 (1511) ISOs to be released and see whether that solves your problem.

(0024988)
wolfy   
2015-12-07 08:18   
Note that the kernel from CentOS 7.2.1511 is already available via the CR repository
(0024989)
okmikel   
2015-12-07 08:27   
Yes, it is already available, and this is the kernel that crashes at boot unless the kernel option "initcall_blacklist=clocksource_done_booting" is given.

kernel-3.10.0-327.el7.x86_64 is the CentOS 7.2.1511 kernel, and there is no newer kernel in the RHEL 7 git at git.centos.org either.
(0024990)
wolfy   
2015-12-07 09:16   
The newer kernels that will be made available - once Red Hat releases them - will also be based on 3.10.0-327.
The only option to have the problem fixed without using the additional kernel parameter is to persuade RH, via bugzilla.redhat.com, to include the fix in the main (not z-Stream) kernel. CentOS builds its packages from the sources published by RH, so there is nothing we can do until that time (unless a modified kernel can be pushed via the centosplus repo).

(0025022)
okmikel   
2015-12-10 06:52   
New kernel kernel-3.10.0-327.3.1.el7.x86_64 does not solve the problem.
(0025052)
johan.kroeckel   
2015-12-14 21:15   
Second this: "kernel-3.10.0-327.3.1.el7.x86_64 does not solve the problem".
(0025057)
arrfab   
2015-12-15 10:02   
Current status is that there is no kernel that fixes this right now.
One can test other kernels (provided through AltArch/SIG, such as the newer one for Xen coming from the Virt-SIG) and see whether that solves the issue.

Two ways to work around the issue with kernel-3.10.0-327*:

- for an installed system:
  - boot with the initcall_blacklist=clocksource_done_booting kernel parameter added (or reboot into the previous kernel)
  - once booted, add the same parameter at the end of the GRUB_CMDLINE_LINUX=" .." line in the file /etc/default/grub
  - as root, run "grub2-mkconfig -o /etc/grub2.conf"

- for a system you want to install
  - start the kernel/boot media with the initcall_blacklist=clocksource_done_booting kernel parameter added
  - when you reboot, add the solution above (for installed system) if not already applied to the default grub config
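
The installed-system steps above could be scripted roughly as follows. This is a minimal sketch rather than anything posted in this thread: add_kernel_param is a hypothetical helper name, and the output path in the usage comment follows the /etc/grub2.cfg correction given in the next note. The helper edits whichever file it is handed, so it can be tried on a copy of /etc/default/grub first.

```shell
#!/bin/sh
# Sketch only: add_kernel_param is a hypothetical helper, not a CentOS tool.
# It appends one kernel parameter to the GRUB_CMDLINE_LINUX="..." line of
# the given file, unless that line already contains the parameter.
add_kernel_param() {
    param=$1
    file=$2

    # Nothing to do if the parameter is already on the line.
    if grep -q "^GRUB_CMDLINE_LINUX=.*${param}" "$file"; then
        return 0
    fi

    tmp=$(mktemp) || return 1
    # Insert the parameter just before the closing double quote.
    sed "s|^\(GRUB_CMDLINE_LINUX=\".*\)\"[[:space:]]*\$|\1 ${param}\"|" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Intended use, as root (left commented out so that sourcing this file
# changes nothing):
#   add_kernel_param initcall_blacklist=clocksource_done_booting /etc/default/grub
#   grub2-mkconfig -o /etc/grub2.cfg
```

The helper is idempotent, so running it twice leaves a single copy of the parameter on the line.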
(0025058)
rusxakep   
2015-12-15 10:11   
@arrfab, in a default installation it is grub2-mkconfig -o /etc/grub2.cfg, not grub2.conf
(0025061)
toracat   
2015-12-15 12:39   
For more details on grub2, please see:

https://wiki.centos.org/HowTos/Grub2

:-)
(0025085)
madko   
2015-12-16 07:47   
Same problem here on ProLiant N40L/N54L (AMD Neo).

Adding initcall_blacklist=clocksource_done_booting to GRUB_CMDLINE_LINUX in /etc/default/grub and then running grub2-mkconfig -o /etc/grub2.cfg fixes this bug. Thank you for sharing this solution.
(0025204)
timmerov   
2015-12-28 01:19   
second this: "Adding initcall_blacklist=clocksource_done_booting to GRUB_CMDLINE_LINUX in /etc/default/grub and then grub2-mkconfig -o /etc/grub2.cfg" works around the issue.
thanks!

now to remember to restore the line when a new kernel is pushed... ;->
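
Restoring the line later could look something like the sketch below. It is an illustration only: remove_kernel_param is a hypothetical helper name, and it assumes the parameter was added to the GRUB_CMDLINE_LINUX line exactly as described in the notes above.

```shell
#!/bin/sh
# Sketch only: remove_kernel_param is a hypothetical helper.
# It deletes one parameter (and the whitespace in front of it) from the
# GRUB_CMDLINE_LINUX line of the given file; other lines are left alone.
# Note: it matches the text literally, so a parameter that is a prefix of
# another parameter on the same line would need a stricter pattern.
remove_kernel_param() {
    param=$1
    file=$2
    tmp=$(mktemp) || return 1
    sed "/^GRUB_CMDLINE_LINUX=/s|[[:space:]]*${param}||" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Intended use, as root, once a fixed kernel is installed:
#   remove_kernel_param initcall_blacklist=clocksource_done_booting /etc/default/grub
#   grub2-mkconfig -o /etc/grub2.cfg
```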
(0025221)
whitroth   
2015-12-30 22:24   
We had this happen on three servers, all Dell R420s with Xeons. Other servers (Supermicros and another Dell or two) did not show the issue.

On two of the three, I applied the workaround. This worked for some hours, then the systems went into distress, and I had to power cycle them, rebooting to the previous kernel. NOTE: when the boot to the 327 kernel failed, it dropped into rdosshell (sp?). When I power cycled to boot to the previous kernel, *that* failed the same way, and only booting to the second previous kernel allowed the system to come up. After the systems with the workaround failed, I *could* reboot to the most previous kernel, suggesting that the failure left something that broke the previous kernel boot.

I finally removed the 327 kernel, and we seem to be ok. I have uploaded an abrt-crash report, in hopes that might help debugging the issue.

One final note: when I first applied the workaround and ran grub2-mkconfig on the two systems with large RAID appliances, it gave errors from os-prober, which reported "unsupported sector size 4096". Googling, I was under the impression that was fixed in 2010 - has it crept back in?
(0025417)
johan.kroeckel   
2016-01-19 17:06   
Just as a sidenote: still a problem in 3.10.0-327.4.4.el7.x86_64.
(0025501)
nroskam   
2016-01-25 02:56   
I can also confirm that adding "initcall_blacklist=clocksource_done_booting" string to the grub profile and re-making the grub2.cfg has finally resolved my booting issues with the latest kernels.

The last kernel that worked for me without this boot parameter was 3.10.0-229.14.1 on my HP N54L microserver. Now I run the latest 3.10.0-327.4.4 without problem.
(0025506)
jistone   
2016-01-25 17:56   
On my HP N40L, that workaround does let it boot, but I don't really trust this. For instance, I found that reading sysfs current_clocksource crashed the system, even as an unprivileged user!

  $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource

The crash is easy to understand -- clocksource_done_booting() is supposed to set pointer curr_clocksource, and sysfs_show_current_clocksources() dereferences it.
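
Given that, a script that wants to report the clocksource on a possibly affected machine could first check whether the workaround is active. The following is a rough sketch under stated assumptions: cmdline_has_blacklist and safe_show_clocksource are hypothetical helper names, and the check only recognizes the parameter when it appears exactly as written in this thread (not as part of a longer comma-separated initcall_blacklist list).

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical.

# Succeeds if the given kernel command line contains the workaround
# parameter as a separate, exactly matching word.
cmdline_has_blacklist() {
    case " $1 " in
        *" initcall_blacklist=clocksource_done_booting "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the current clocksource, but refuses to touch the sysfs file when
# the workaround is active, because that read is what crashes the system.
# The optional argument is the file to read the command line from.
safe_show_clocksource() {
    cmdline=$(cat "${1:-/proc/cmdline}") || return 1
    if cmdline_has_blacklist "$cmdline"; then
        echo "workaround active: not reading current_clocksource" >&2
        return 1
    fi
    cat /sys/devices/system/clocksource/clocksource0/current_clocksource
}
```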
(0025534)
tru   
2016-01-27 12:32   
https://bugzilla.redhat.com/show_bug.cgi?id=1285235 is listed as a clone of the original Red Hat Bugzilla entry
(0025544)
whitroth   
2016-01-27 20:37   
I've just updated a CentOS 7 server to the latest kernel, vmlinuz-3.10.0-327.4.5.el7.x86_64, and the server fails to boot. It has failed on every 327 kernel.

Server: Dell R420, 2 Xeons, 124G RAM.

From the rdsosreport.txt, the relevant portion is:
[ 3.317974] <servername> systemd[1]: Starting File System Check on /dev/disk/by-label/\x2f...
[ 3.320089] <servername> systemd-fsck[590]: Failed to detect device /dev/disk/by-label//
[ 3.320567] <servername> systemd[1]: systemd-fsck-root.service: main process exited, code=exited, status=1/FAILURE
[ 3.320972] <servername> systemd[1]: Failed to start File System Check on /dev/disk/by-label/\x2f.
[ 3.321423] <servername> systemd[1]: Dependency failed for /sysroot.
[ 3.321872] <servername> systemd[1]: Dependency failed for Initrd Root File System.
[ 3.322335] <servername> systemd[1]: Dependency failed for Reload Configuration from the Real Root.
[ 3.322802] <servername> systemd[1]: Job initrd-parse-etc.service/start failed with result 'dependency'.
[ 3.323266] <servername> systemd[1]: Triggering OnFailure= dependencies of initrd-parse-etc.service.
[ 3.323697] <servername> systemd[1]: Job initrd-root-fs.target/start failed with result 'dependency'.
[ 3.324161] <servername> systemd[1]: Triggering OnFailure= dependencies of initrd-root-fs.target.
[ 3.324586] <servername> systemd[1]: Job sysroot.mount/start failed with result 'dependency'.
[ 3.324998] <servername> systemd[1]: Unit systemd-fsck-root.service entered failed state.
[ 3.325430] <servername> systemd[1]: systemd-fsck-root.service failed.
[ 3.326752] <servername> systemd[1]: Stopped dracut pre-pivot and cleanup hooo

And it stops, and drops me into the rdshell. Note that I can mkdir /mnt and mount /dev/sda1, and /boot is there; I can also mount /dev/sda3, and root is there just fine.

        mark
(0025553)
nix_rules   
2016-01-29 16:44   
WORKAROUND DOES NOT WORK FOR ME.

"Adding initcall_blacklist=clocksource_done_booting to GRUB_CMDLINE_LINUX in /etc/default/grub and then grub2-mkconfig -o /etc/grub2.cfg" DID NOT fix it for me.

I have a Gigabyte brand GA-880GMA-UD2H motherboard with AMD Phenom(tm) II X4 965 Processor. All kernels newer than 3.10.0-229.20.1.el7.x86_64 fail to boot this machine even after adding the parameter above and reconfiguring grub. It hangs at "x86_64_start_kernel+0x152/0x175" every time on the boot screen with all the kernels newer than 3.10.0-229.20.1.el7.x86_64.

Is anybody besides me still having this issue? I hope someone is still working on correcting it.
(0025600)
evilissimo   
2016-02-04 14:24   
Besides the crash when doing

 $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource

my system's ability to keep the system time disappeared, and ntpd failed to keep the system in sync. So initcall_blacklist=clocksource_done_booting is a pretty bad idea for anything beyond getting through an update. The best option is to revert to the last known good kernel version, the one that did not crash on startup.
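
One way to pick that kernel is sketched below, as an illustration rather than a tested procedure. last_non_327_kernel is a hypothetical helper name. It assumes its input is in the format printed by "rpm -q kernel --last" (package name first, most recently installed package first), and that the most recently installed kernel outside the 3.10.0-327 series is the last known good one, which should be double-checked by hand.

```shell
#!/bin/sh
# Sketch only: last_non_327_kernel is a hypothetical helper.
# Reads "rpm -q kernel --last" style lines on stdin and prints the
# version-release.arch of the first kernel that is not a 3.10.0-327 build.
last_non_327_kernel() {
    awk '$1 !~ /^kernel-3\.10\.0-327(\.|$)/ {
             sub(/^kernel-/, "", $1)   # strip the package name prefix
             print $1
             exit
         }'
}

# Intended use, as root (commented out so that sourcing changes nothing):
#   good=$(rpm -q kernel --last | last_non_327_kernel)
#   grubby --set-default="/boot/vmlinuz-$good"
```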
(0025612)
danieln74   
2016-02-04 18:42   
Hosted server (HP ProLiant DL360e Gen8, BIOS P73 12/20/2013, CentOS 7: 3.10.0-327.4.5.el7) has exactly the same issue. The server was delivered with 'initcall_blacklist=clocksource_done_booting', and I am having the same issues as described above (server unstable, clock ultra fast, NTP failing, crash on cat).

-> Has anyone tried using ELRepo's latest 3.18.4-stable, or should I rather downgrade to 229? Thank you!
(0025620)
octothorpe   
2016-02-04 23:42   
My system is also suffering from this problem.

Hardware: ASUS M4A785-M motherboard, 4 GB ECC RAM, Athlon II X4 630 CPU; using built-in video, plus an Intel SASUC8I HBA cross-flashed with LSI 1068E-R IT firmware, and an SiI 3132-based 2-port SATA card in addition to the on-board SATA ports. The system has ZFS installed, but the main system runs off MD-RAID; only the storage pools are ZFS.

I initially thought it was something related to the HBA, except, of course, it works fine on 3.10.0-229.20.1.el7.x86_64. I was all set to temporarily rip out the add-in cards until I found this thread.

I have found that the ELRepo mainline kernel (4.4.1-1) will boot just fine (including the ZFS modules), and it's currently running (but I'll have to see how things shake out over the next few days). I rebuilt the ELRepo kernel from the source RPM just for kicks (and personalization).
(0025624)
xenium   
2016-02-05 06:19   
Another affected system:

AMD Athlon(tm) II X3 450 Processor
GA-770T-USB3 motherboard

I have rolled back to a 3.10.0-229.x kernel successfully.
(0025630)
octothorpe   
2016-02-05 12:30   
Heads up: One thing I've discovered with the 4.4.1-1 kernel from ELRepo is that it won't work with KVM. Not a big issue for a file server with 4 GB RAM, but not so good for a VM host.
(0025636)
toracat   
2016-02-05 16:24   
@octothorpe

Could you elaborate a bit? When you say "won't work with KVM", what exactly does not work? A module issue or some setup problem or ...
(0025637)
octothorpe   
2016-02-05 16:41   
With the ELRepo 4.4.1-1 kernel, attempting to run virt-manager will give me the message:

Unable to connect to libvirt

authentication failed: no agent is available to authenticate

Note that virt-manager worked fine on a 3.10.0-229.x kernel.
(0025638)
toracat   
2016-02-05 16:52   
(Last edited: 2016-02-06 22:50)
@octothorpe

Thanks for the note. Discussing this any further here is way OT, so I will stop here.

[EDIT] Adding this note for posterity: virt-manager works without any issue with kernel-ml (el6 and el7) as well as kernel-lt (el6). Confirmed by two independent users.

(0025641)
danieln74   
2016-02-05 18:21   
quick update from my side: DL360e-8 (2x Xeon E5) working fine with ELRepo 4.4.1-1 and 'initcall_blacklist=clocksource_done_booting' removed. clocksources properly initialized, NTP sync up & ok, LAMP stack performing as expected!
(0025645)
octothorpe   
2016-02-06 14:03   
Some more data points: a Gigabyte GA-F2A88XM-D3H with an A10-7850K APU, and an ASUS M5A97 R1.02 with an FX-6100 CPU both run fine on 3.10.0-327.

The A10 system (which normally runs Windows 7) was set up to boot CentOS via Etherboot/PXE/iSCSI, and a kernel update to '327 worked flawlessly (whew).

The FX system (which normally runs Xubuntu 14.04 LTS) booted the 1511 install USB without a problem.
(0025675)
octothorpe   
2016-02-09 22:58   
It looks like this problem affects certain specific CPUs, or perhaps CPU/chipset combos. I have a second ASUS M4A785-M built up differently, which I discovered can successfully boot the 1511 install USB.

This one has an Nvidia graphics card instead of an HBA in the x16 slot, but, more importantly, it has a Phenom II 550 Black Edition for the CPU. It is also running with two additional cores unlocked in the BIOS (and is stable in its normal operation with Xubuntu).
(0025750)
okmikel   
2016-02-17 06:38   
kernel-3.10.0-327.10.1.el7.x86_64 fixes the bug for me.

Booting as expected and this also works:

# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
(0025751)
toracat   
2016-02-17 07:14   
@okmikel

Thank you for reporting that the latest kernel update fixes the bug. I can confirm this in the changelog:

- [kernel] tick: broadcast: Prevent livelock from event handler (Prarit Bhargava) [1284043 1265283]
- [kernel] clockevents: Serialize calls to clockevents_update_freq() in the core (Prarit Bhargava) [1284043 1265283]

(Note the BZ number 1265283)
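
For checking a fleet of machines, the version boundary reported here can be tested with a few lines of shell. This is a sketch under assumptions: kernel_has_fix is a hypothetical helper name, it takes 3.10.0-327.10.1 as the first fixed build because that is what this thread reports, and it assumes later 3.10.0-327.x updates keep the fix. Kernels outside the 3.10.0-327 series are reported as out of scope rather than judged.

```shell
#!/bin/sh
# Sketch only: kernel_has_fix is a hypothetical helper.
# Argument: a kernel release string as printed by "uname -r".
# Returns 0 for 3.10.0-327.N... with N >= 10, 1 for older 327 builds,
# and 2 for kernels outside the 3.10.0-327 series.
kernel_has_fix() {
    rel=$1
    case $rel in
        3.10.0-327.*) ;;
        *) return 2 ;;
    esac
    rest=${rel#3.10.0-327.}    # e.g. "el7.x86_64" or "10.1.el7.x86_64"
    first=${rest%%.*}          # e.g. "el7" or "10"
    case $first in
        ''|*[!0-9]*) return 1 ;;   # no numeric update field: base 327 build
    esac
    [ "$first" -ge 10 ]
}

# Intended use:
#   kernel_has_fix "$(uname -r)" && echo "running a kernel with the fix"
```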
(0025752)
xenium   
2016-02-17 08:04   
Also confirmed here. The AMD Athlon(tm) II X3 450 Processor / GA-770T-USB3 motherboard system previously affected by this bug is now running 3.10.0-327.10.1.el7.x86_64 successfully, and current_clocksource == tsc.
(0025757)
octothorpe   
2016-02-17 12:10   
I was glad to see "kernel" among last night's batch of updates.

327.10.1 is working for me, too. *yay!*

Nuking 327.4.5 from orbit (just to be sure).
(0025758)
nroskam   
2016-02-17 12:21   
And I can concur as well that the latest is running fine: 3.10.0-327.10.1.el7.x86_64 x86_64 GNU/Linux on AMD Turion(tm) II Neo N54L Dual-Core Processor (HP Proliant N54L)
(0025771)
toracat   
2016-02-18 17:37   
Thanks all for reporting back with the confirmation. Now closing this as 'resolved'.

Issue History
2015-12-07 07:29  okmikel  New Issue
2015-12-07 07:35  tigalch  Note Added: 0024987
2015-12-07 08:16  wolfy  Note Edited: 0024987
2015-12-07 08:18  wolfy  Note Added: 0024988
2015-12-07 08:27  okmikel  Note Added: 0024989
2015-12-07 09:09  toracat  Status: new => acknowledged
2015-12-07 09:16  wolfy  Note Added: 0024990
2015-12-07 09:16  wolfy  Note Edited: 0024990
2015-12-10 06:52  okmikel  Note Added: 0025022
2015-12-14 21:15  johan.kroeckel  Note Added: 0025052
2015-12-15 10:02  arrfab  Note Added: 0025057
2015-12-15 10:11  rusxakep  Note Added: 0025058
2015-12-15 12:39  toracat  Note Added: 0025061
2015-12-16 07:47  madko  Note Added: 0025085
2015-12-28 01:19  timmerov  Note Added: 0025204
2015-12-30 22:18  whitroth  File Added: [abrt] full crash report.asc
2015-12-30 22:24  whitroth  Note Added: 0025221
2016-01-18 21:38  tru  Relationship added: has duplicate 0010176
2016-01-19 17:06  johan.kroeckel  Note Added: 0025417
2016-01-22 18:10  toracat  Relationship added: has duplicate 0010215
2016-01-25 02:56  nroskam  Note Added: 0025501
2016-01-25 17:56  jistone  Note Added: 0025506
2016-01-27 12:32  tru  Note Added: 0025534
2016-01-27 20:37  whitroth  Note Added: 0025544
2016-01-29 16:44  nix_rules  Note Added: 0025553
2016-02-04 14:24  evilissimo  Note Added: 0025600
2016-02-04 18:42  danieln74  Note Added: 0025612
2016-02-04 23:42  octothorpe  Note Added: 0025620
2016-02-05 06:19  xenium  Note Added: 0025624
2016-02-05 12:30  octothorpe  Note Added: 0025630
2016-02-05 16:24  toracat  Note Added: 0025636
2016-02-05 16:41  octothorpe  Note Added: 0025637
2016-02-05 16:52  toracat  Note Added: 0025638
2016-02-05 18:21  danieln74  Note Added: 0025641
2016-02-06 14:03  octothorpe  Note Added: 0025645
2016-02-06 22:43  toracat  Note Edited: 0025638
2016-02-06 22:50  toracat  Note Edited: 0025638
2016-02-09 22:58  octothorpe  Note Added: 0025675
2016-02-17 06:38  okmikel  Note Added: 0025750
2016-02-17 07:14  toracat  Note Added: 0025751
2016-02-17 08:04  xenium  Note Added: 0025752
2016-02-17 12:10  octothorpe  Note Added: 0025757
2016-02-17 12:21  nroskam  Note Added: 0025758
2016-02-18 17:37  toracat  Note Added: 0025771
2016-02-18 17:37  toracat  Status: acknowledged => resolved
2016-02-18 17:37  toracat  Resolution: open => fixed