View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0008778 | CentOS-7 | kernel | public | 2015-05-26 08:12 | 2016-02-05 17:21 |
Reporter | ds | ||||
Priority | normal | Severity | major | Reproducibility | random |
Status | resolved | Resolution | fixed | ||
Platform | OS | Centos | OS Version | 7.1 | |
Product Version | 7.1-1503 | ||||
Target Version | Fixed in Version | ||||
Summary | 0008778: ksoftirqd eats 100% of one CPU core | ||||
Description | After upgrading from 7.0 to 7.1 and rebooting, some servers show 100% utilization of one CPU core. The affected core varies. Currently used kernel version - 3.10.0-123.13.2.el7.x86_64. After another reboot, utilization normalizes. In /proc/softirqs we see high numbers in the TASKLET row (this time CPU0 is affected):
<code>
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15 CPU16 CPU17 CPU18 CPU19 CPU20 CPU21 CPU22 CPU23
HI: 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
TIMER: 1107132436 437853395 401879369 255575804 224796951 181858935 268323788 273190651 219198109 216656776 192918835 175444270 62879974 145140995 148009105 184203050 171729217 288564473 103607097 105443964 105668327 105909463 166110551 307810043
NET_TX: 664487 274 244 2 3 4 131 133 11 7 6 4 14 13 6 8 1 282 8 11 10 5 221524 282
NET_RX: 2081909715 2184369844 1906325178 625317 1256370 1281370 2251063633 1300483703 789588 2064216 1091868 1884699 71527678 63141522 26213971 27325773 818750 2320018825 34032038 68445738 46307583 30332431 1375382484 2288936504
BLOCK: 2301 8619109 8918756 11174469 11777096 11634625 10068 8816 34319192 17162 13579 10469 432698 1682619 1772206 1837406 1984121 1628977 19001 18352 22519 24993 15984 15972
BLOCK_IOPOLL: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
TASKLET: 3768868926 12911 13480 16958 19763 21429 75876 68637 80919 86746 91065 95086 2122 9319 10588 11062 11314 7759 26298 25200 32633 31105 22046 13937
SCHED: 9957457 180696098 91530983 45153313 26556382 21272687 17306516 14251898 12432926 11947719 12085179 12003983 6424630 18347702 18709713 16374081 16724016 20337281 10636922 10597676 10166121 10165541 11938146 12602825
HRTIMER: 3 886001 909122 943931 976256 1177210 846957 876456 1110783 1104845 1047876 1181872 198164 754680 784805 975778 887300 705269 607823 601466 755923 817634 545135 534384
RCU: 171950870 112589795 111936795 83746924 79124467 75529773 85902331 85484757 76952953 76335397 73365838 70583640 27133126 57815477 59426103 64219971 63294698 90874919 45590808 45566614 45311086 45994607 57129417 83089034
<code> | ||||
Steps To Reproduce | none | ||||
Tags | No tags attached. | ||||
abrt_hash | |||||
URL | |||||
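The symptom in the description (one CPU whose TASKLET counter is orders of magnitude above all the others) can be checked mechanically. The following is a minimal Python sketch and not part of the original report: the function names `parse_softirqs` and `dominant_cpu` and the 100x threshold are illustrative assumptions. The counters in /proc/softirqs are cumulative since boot, so a large value only shows that the imbalance existed at some point, not that it is happening right now.

```python
import statistics


def parse_softirqs(text):
    """Parse the contents of /proc/softirqs.

    Returns (cpu_names, table), where table maps a row name such as
    "TASKLET" to a list of per-CPU counters in column order.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()
    table = {}
    for ln in lines[1:]:
        name, _, rest = ln.partition(":")
        table[name.strip()] = [int(x) for x in rest.split()]
    return cpus, table


def dominant_cpu(counts, ratio=100.0):
    """Return the index of the busiest CPU if its counter is at least
    `ratio` times the median of the remaining CPUs, otherwise None.

    The ratio of 100 is an arbitrary illustrative threshold.
    """
    if len(counts) < 2:
        return None
    top = max(range(len(counts)), key=counts.__getitem__)
    others = counts[:top] + counts[top + 1:]
    baseline = max(statistics.median(others), 1)  # avoid a zero baseline
    return top if counts[top] >= ratio * baseline else None


if __name__ == "__main__":
    try:
        with open("/proc/softirqs") as fh:
            cpus, table = parse_softirqs(fh.read())
    except OSError:
        print("/proc/softirqs is not available on this system")
    else:
        for row, counts in table.items():
            idx = dominant_cpu(counts)
            if idx is not None:
                print(f"{row}: {cpus[idx]} dominates with {counts[idx]}")
```

Run against the numbers in the description, the TASKLET row would be flagged for CPU0. To see whether the problem is ongoing, one would sample the file twice and compare the differences rather than the absolute values.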
Two corrections: the kernel version currently in use is 3.10.0-229.1.2.el7.x86_64, and so far only CPU0 is affected (network queues are distributed between different cores). |
|
Probably related to this known issue: https://access.redhat.com/solutions/1409393 |
|
See also: https://access.redhat.com/articles/879293 |
|
@toracat, unfortunately, the Red Hat KB is accessible only to subscribers. Nevertheless, it seems that this issue has been fixed by the following commit: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/dma/ioat?id=da87ca4d4ca101f177fffd84f1f0a5e4c0343557 (more info: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1291113 and https://lwn.net/Articles/588457/ ). I've hit the same issue and am going to try a recompiled CentOS 7 kernel with the above-mentioned patch included. |
|
@post-factum, Thanks for the links. Please do let us know of the result of patching. |
|
Now that the referenced patch is in the mainline stable kernel from kernel.org, one can test-install kernel-ml [1] from ELRepo and see whether the problem still occurs there. Can someone test this? [1] http://elrepo.org/tiki/kernel-ml |
|
A good piece of info from RH: "Yes, the patch above is the expected solution. It's also scheduled to be included on RHEL 7 kernel." If the patch is not in the next el7 kernel update, I will attempt to include it in the centosplus kernel. |
|
@toracat, the problem is that the issue takes a while to show up. The first occurrence happened to us after 3 months of uptime, the second after just 8 days. We've scheduled our production server for kernel maintenance on Sunday night. |
|
kernel-3.10.0-229.11.1.el7 is out. The plus kernel now has the patch quoted in this bug report. | |
Sadly, 3.10.0-229.11.1.el7 does not have this patch. Anyway, I've recompiled 229.7 with it, and our server has been running for 10 days with no problems. I should still keep monitoring it, though. |
|
@post-factum, Right, the distro kernel still does not have this patch. Good to know the patch seems to be working on your system (so far). If you switch to kernel-plus-3.10.0-229.11.1.el7.centos.plus, it should work like the one you compiled. |
|
OK, the server survived ~25 days of uptime with the self-compiled kernel. I had to reinstall it due to a disk replacement and have now switched to kernel-plus. | |
The distro kernel 3.10.0-229.14.1.el7 has the patch. It is also confirmed that the RHEL 7.2 beta kernel (-306) has it. | |
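Based only on the version numbers mentioned in the notes above, a rough check of whether a given el7 kernel release string is expected to carry the fix could look like the sketch below. It is an illustration, not an official tool: the helper names `release_tuple` and `likely_has_fix` are hypothetical, the thresholds are taken from this thread (distro kernel 3.10.0-229.14.1.el7, plus kernel 3.10.0-229.11.1.el7.centos.plus), and self-compiled or third-party kernels cannot be judged this way.

```python
import platform
import re

# Thresholds taken from the notes in this report (assumptions, not an
# authoritative list of fixed builds).
FIXED_DISTRO = (3, 10, 0, 229, 14, 1)  # 3.10.0-229.14.1.el7
FIXED_PLUS = (3, 10, 0, 229, 11, 1)    # 3.10.0-229.11.1.el7.centos.plus


def release_tuple(release):
    """Turn '3.10.0-229.14.1.el7.x86_64' into (3, 10, 0, 229, 14, 1)."""
    head = release.split(".el", 1)[0]
    return tuple(int(p) for p in re.split(r"[.-]", head) if p.isdigit())


def likely_has_fix(release):
    """Guess whether an el7 3.10.0 kernel release includes the patch.

    Only meaningful for CentOS/RHEL 7 release strings; locally patched
    kernels are not detected.
    """
    threshold = FIXED_PLUS if ".centos.plus" in release else FIXED_DISTRO
    return release_tuple(release) >= threshold


if __name__ == "__main__":
    running = platform.release()
    print(running, "->", "fix expected" if likely_has_fix(running) else "fix not expected")
```

For example, `likely_has_fix("3.10.0-229.1.2.el7.x86_64")` is false for the kernel from the first note, while the 229.14.1 distro kernel and the 229.11.1 plus kernel both pass the check.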
Date Modified | Username | Field | Change |
---|---|---|---|
2015-05-26 08:12 | ds | New Issue | |
2015-05-28 10:13 | ds | Note Added: 0023231 | |
2015-05-28 15:33 | toracat | Note Added: 0023233 | |
2015-05-28 15:39 | toracat | Note Added: 0023234 | |
2015-05-28 15:39 | toracat | Status | new => acknowledged |
2015-07-24 12:26 | post-factum | Note Added: 0023706 | |
2015-07-24 16:29 | toracat | Note Added: 0023709 | |
2015-07-24 16:42 | toracat | Note Added: 0023710 | |
2015-07-24 16:42 | toracat | Note Edited: 0023709 | View Revisions |
2015-07-24 17:28 | toracat | Note Added: 0023712 | |
2015-07-25 10:17 | post-factum | Note Added: 0023717 | |
2015-08-06 15:50 | toracat | Note Added: 0023798 | |
2015-08-06 16:27 | post-factum | Note Added: 0023799 | |
2015-08-06 16:44 | toracat | Note Added: 0023800 | |
2015-08-19 09:54 | post-factum | Note Added: 0023949 | |
2015-10-21 18:55 | toracat | Note Added: 0024676 | |
2015-10-21 18:55 | toracat | Status | acknowledged => resolved |
2015-10-21 18:55 | toracat | Resolution | open => fixed |