View Issue Details

ID: 0008778
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2016-02-05 17:21
Reporter: ds
Priority: normal
Severity: major
Reproducibility: random
Status: resolved
Resolution: fixed
Platform:
OS: Centos
OS Version: 7.1
Product Version: 7.1-1503
Target Version:
Fixed in Version:
Summary: 0008778: ksoftirqd eats 100% of one CPU core
Description: After upgrading from 7.0 to 7.1 and rebooting, some servers show 100% utilization of one of the CPU cores. The affected core number varies.
Currently used kernel version - 3.10.0-123.13.2.el7.x86_64.
After another reboot, utilization normalizes.

In /proc/softirqs we see high numbers in the TASKLET row (this time CPU0 is affected):
<code>
                    CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15 CPU16 CPU17 CPU18 CPU19 CPU20 CPU21 CPU22 CPU23
          HI: 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
       TIMER: 1107132436 437853395 401879369 255575804 224796951 181858935 268323788 273190651 219198109 216656776 192918835 175444270 62879974 145140995 148009105 184203050 171729217 288564473 103607097 105443964 105668327 105909463 166110551 307810043
      NET_TX: 664487 274 244 2 3 4 131 133 11 7 6 4 14 13 6 8 1 282 8 11 10 5 221524 282
      NET_RX: 2081909715 2184369844 1906325178 625317 1256370 1281370 2251063633 1300483703 789588 2064216 1091868 1884699 71527678 63141522 26213971 27325773 818750 2320018825 34032038 68445738 46307583 30332431 1375382484 2288936504
       BLOCK: 2301 8619109 8918756 11174469 11777096 11634625 10068 8816 34319192 17162 13579 10469 432698 1682619 1772206 1837406 1984121 1628977 19001 18352 22519 24993 15984 15972
BLOCK_IOPOLL: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     TASKLET: 3768868926 12911 13480 16958 19763 21429 75876 68637 80919 86746 91065 95086 2122 9319 10588 11062 11314 7759 26298 25200 32633 31105 22046 13937
       SCHED: 9957457 180696098 91530983 45153313 26556382 21272687 17306516 14251898 12432926 11947719 12085179 12003983 6424630 18347702 18709713 16374081 16724016 20337281 10636922 10597676 10166121 10165541 11938146 12602825
     HRTIMER: 3 886001 909122 943931 976256 1177210 846957 876456 1110783 1104845 1047876 1181872 198164 754680 784805 975778 887300 705269 607823 601466 755923 817634 545135 534384
         RCU: 171950870 112589795 111936795 83746924 79124467 75529773 85902331 85484757 76952953 76335397 73365838 70583640 27133126 57815477 59426103 64219971 63294698 90874919 45590808 45566614 45311086 45994607 57129417 83089034
</code>
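The check described above (one core with a TASKLET count far above the others) can be sketched in Python. This is only an illustration, not something posted in the report: the helper names `parse_softirqs` and `hot_cpus` and the threshold of 100 times the median are assumptions chosen for the example.

```python
def parse_softirqs(text):
    """Parse /proc/softirqs-style text into {row_name: [count per CPU]}."""
    rows = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # the header line with the CPU names has no colon
        rows[name.strip()] = [int(field) for field in rest.split()]
    return rows


def hot_cpus(counts, factor=100):
    """Return indices of CPUs whose count exceeds `factor` times the median.

    `factor` is an arbitrary illustrative threshold, not a kernel-defined value.
    """
    if not counts:
        return []
    ordered = sorted(counts)
    median = ordered[len(ordered) // 2]
    # Use at least 1 as the baseline so that all-zero rows are not flagged.
    return [cpu for cpu, n in enumerate(counts) if n > factor * max(median, 1)]
```

On an affected machine one could read the file with `open("/proc/softirqs").read()`, pass the text to `parse_softirqs`, and call `hot_cpus` on the `"TASKLET"` row. Since the counters are cumulative since boot, comparing two samples taken a few seconds apart would show the current rate more reliably than a single snapshot.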
Steps To Reproduce: none
Tags: No tags attached.
abrt_hash:
URL:

Activities

ds

2015-05-28 10:13

reporter   ~0023231

Two corrections:
Currently used kernel version - 3.10.0-229.1.2.el7.x86_64.
Core number - for now only CPU0 is affected (network queues are distributed between different cores).
toracat

2015-05-28 15:33

manager   ~0023233

Probably related to this known issue:

https://access.redhat.com/solutions/1409393
toracat

2015-05-28 15:39

manager   ~0023234

See also:

https://access.redhat.com/articles/879293
post-factum

2015-07-24 12:26

reporter   ~0023706

@toracat, unfortunately, Red Hat KB is accessible only by subscribers.

Nevertheless, it seems that this issue has been fixed by the following commit:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/dma/ioat?id=da87ca4d4ca101f177fffd84f1f0a5e4c0343557

More info:

* https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1291113
* https://lwn.net/Articles/588457/

I've hit the same issue and am going to try a recompiled CentOS 7 kernel with the above-mentioned patch included.
toracat

2015-07-24 16:29

manager   ~0023709

Last edited: 2015-07-24 16:42

@post-factum,

Thanks for the links. Please do let us know of the result of patching.

toracat

2015-07-24 16:42

manager   ~0023710

Now that the referenced patch is in the mainline stable kernel from kernel.org, one can test-install kernel-ml [1] from ELRepo and see whether the problem still occurs there. Can someone test this?

[1] http://elrepo.org/tiki/kernel-ml
toracat

2015-07-24 17:28

manager   ~0023712

A good piece of info from RH:

"Yes, the patch above is the expected solution. It's also scheduled to be included on RHEL 7 kernel."

If the patch is not in the next el7 kernel update, I will attempt to include it in the centosplus kernel.
post-factum

2015-07-25 10:17

reporter   ~0023717

@toracat, the problem is that the issue takes some time to show up. The first occurrence happened to us after 3 months of uptime, the second one after just 8 days.

We've scheduled our production server for kernel maintenance on Sunday night.
toracat

2015-08-06 15:50

manager   ~0023798

kernel-3.10.0-229.11.1.el7 is out. The plus kernel now has the patch quoted in this bug report.
post-factum

2015-08-06 16:27

reporter   ~0023799

Sadly, 3.10.0-229.11.1.el7 does not have this patch.

Anyway, I've recompiled 229.7 with it, and our server has been running for 10 days with no problems. We should keep monitoring, though.
toracat

2015-08-06 16:44

manager   ~0023800

@post-factum,

Right, the distro kernel still does not have this patch. Good to know the patch seems to be working on your system (so far). If you switch to kernel-plus-3.10.0-229.11.1.el7.centos.plus, it should work like the one you compiled.
post-factum

2015-08-19 09:54

reporter   ~0023949

OK, the server survived ~25 days of uptime with the self-compiled kernel. I had to reinstall it due to a disk replacement, and have now switched to kernel-plus.
toracat

2015-10-21 18:55

manager   ~0024676

The distro kernel 3.10.0-229.14.1.el7 has the patch. It is also confirmed that the RHEL 7.2 beta kernel (-306) has it.
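For anyone checking whether a machine is already on a fixed distro kernel, a rough Python sketch follows. It assumes that comparing the leading numeric fields of the release string is good enough, which is a simplification and not RPM's real version comparison. The names `release_key` and `distro_kernel_has_patch` are made up for this example. Note that it will report False for kernel-plus-3.10.0-229.11.1.el7.centos.plus, which carries the patch according to the notes above, so it only applies to the stock distro kernel.

```python
import re

# First distro kernel reported in this thread to contain the patch.
FIXED_DISTRO_RELEASE = "3.10.0-229.14.1.el7"


def release_key(release):
    """Return the leading numeric fields of a kernel release as a tuple of ints."""
    match = re.match(r"[\d.\-]+", release)
    if match is None:
        raise ValueError("unexpected kernel release string: %r" % release)
    return tuple(int(part) for part in re.split(r"[.\-]", match.group(0)) if part)


def distro_kernel_has_patch(release):
    """Simplified check: is `release` at or after the fixed distro kernel?"""
    return release_key(release) >= release_key(FIXED_DISTRO_RELEASE)
```

The running kernel's release string (the same value `uname -r` prints) is available from `platform.release()` in the Python standard library, so `distro_kernel_has_patch(platform.release())` would be the typical call.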

Issue History

Date Modified Username Field Change
2015-05-26 08:12 ds New Issue
2015-05-28 10:13 ds Note Added: 0023231
2015-05-28 15:33 toracat Note Added: 0023233
2015-05-28 15:39 toracat Note Added: 0023234
2015-05-28 15:39 toracat Status new => acknowledged
2015-07-24 12:26 post-factum Note Added: 0023706
2015-07-24 16:29 toracat Note Added: 0023709
2015-07-24 16:42 toracat Note Added: 0023710
2015-07-24 16:42 toracat Note Edited: 0023709
2015-07-24 17:28 toracat Note Added: 0023712
2015-07-25 10:17 post-factum Note Added: 0023717
2015-08-06 15:50 toracat Note Added: 0023798
2015-08-06 16:27 post-factum Note Added: 0023799
2015-08-06 16:44 toracat Note Added: 0023800
2015-08-19 09:54 post-factum Note Added: 0023949
2015-10-21 18:55 toracat Note Added: 0024676
2015-10-21 18:55 toracat Status acknowledged => resolved
2015-10-21 18:55 toracat Resolution open => fixed