View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0005716||CentOS-6||kernel||public||2012-05-09 01:00||2013-09-24 22:56|
|Platform||Intel X7560 - 32 cores total||OS||Centos||OS Version||6.2|
|Fixed in Version||6.4|
|Summary||0005716: Random 'stalls' occurring often, up to several minutes - broken transparent hugepage support|
|Description||Processes stall for extended periods. Tasks were often reported as 'blocked for more than 120 seconds', and for a number of reasons I initially suspected the LSI RAID driver.|
Other symptoms included:
+ Occasional High CPU for kswapd - even with no swap used on the system.
+ Occasional High CPU for khugepaged.
+ Frequent high system CPU for no apparent reason.
After following several threads I found and doing my own diagnosis, I traced the problem to transparent hugepage support (aka memory defrag).
Disabling this support stopped all occurrences of the problem, and the systems have been stable for a week now. It is worth noting that Red Hat 6 appears to have this functionality disabled by default, whereas CentOS has it enabled.
echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
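The two commands above can be wrapped in a small sketch that prints the current settings before changing them. The function names `thp_show` and `thp_disable_defrag` are made up for illustration. The default directory is the RHEL/CentOS 6 path from this report, writing to it needs root, and values written this way do not persist across a reboot.

```shell
#!/bin/sh
# Default location of the THP knobs on RHEL/CentOS 6 kernels (from this report).
THP_DIR_DEFAULT=/sys/kernel/mm/redhat_transparent_hugepage

# Print the current value of each THP knob mentioned in this report.
# $1 (optional): directory holding the knobs.
thp_show() {
    dir="${1:-$THP_DIR_DEFAULT}"
    for f in enabled defrag khugepaged/defrag; do
        if [ -r "$dir/$f" ]; then
            echo "$f: $(cat "$dir/$f")"
        fi
    done
}

# Turn off THP defragmentation only; hugepages themselves stay enabled.
# $1 (optional): directory holding the knobs.
thp_disable_defrag() {
    dir="${1:-$THP_DIR_DEFAULT}"
    echo no > "$dir/khugepaged/defrag"
    echo never > "$dir/defrag"
}
```

Running `thp_show` before and after `thp_disable_defrag` (as root) should show the selected value change on the `defrag` and `khugepaged/defrag` lines.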
|Steps To Reproduce||Enable hugepage support and load up a lot of high memory usage processes. The systems with the worst symptoms had large numbers of cores and memory:|
Cores: 32 (4 * Intel X7560)
|Tags||No tags attached.|
I've had this problem as well. With CentOS 6.2 and 6.3 on a Dell T7500 with 12 physical cores / 24 CPUs and 96GB RAM, and a large Java heap (30GB), I saw these symptoms:
1. Small programs taking a minute or more to start up
2. Very slow response times for user input
3. dstat showing 30k context switches per second very often
4. Sometimes hundreds of thousands of context switches per second
5. Sometimes hundreds of thousands of interrupts per second
6. /proc/interrupts showing ~200 million TLB shootdowns per CPU after a week
After I disabled hyperthreading to reduce the number of CPUs to 12, symptoms were slightly reduced.
After I turned off THP defrag with the two commands above, I didn't see any of these symptoms, and now see a steady number of context switches (~4k/sec) and interrupts (~1.5k/sec). Programs always start up quickly. No insane TLB shootdown counts.
AnonHugePages utilization still seems to be very good (covering ~2/3 of my RSS usage) without the defragger.
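For anyone wanting to check the same counters, here is a minimal sketch that totals the TLB shootdown counts across CPUs and reads AnonHugePages. The function names are hypothetical; it assumes the usual Linux layout, where `/proc/interrupts` has a `TLB:` row with one count per CPU and `/proc/meminfo` has an `AnonHugePages:` row in kB.

```shell
#!/bin/sh
# Sum the per-CPU TLB shootdown counters.
# $1 (optional): file to read, defaulting to /proc/interrupts.
tlb_shootdowns() {
    awk '$1 == "TLB:" {
        total = 0
        # Fields 2..NF are per-CPU counts followed by a text label;
        # only purely numeric fields are added up.
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[0-9]+$/) total += $i
        printf "%.0f\n", total
    }' "${1:-/proc/interrupts}"
}

# Print the amount of anonymous memory backed by huge pages, in kB.
# $1 (optional): file to read, defaulting to /proc/meminfo.
anon_hugepages_kb() {
    awk '$1 == "AnonHugePages:" { print $2 }' "${1:-/proc/meminfo}"
}
```

Calling `tlb_shootdowns` twice a few seconds apart and dividing the difference by the interval gives a rough shootdown rate to compare before and after changing the THP settings.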
This post also reports positive results from turning off THP defrag:
|that is, "1.5k/sec"; "2/3 of my RSS"|
Encountered the same issue after upgrading a CentOS 5.4 box to 6.2 (and eventually 6.3). With many database-like processes running, I/O to our SSDs appeared to take a long time, coinciding with very high system CPU% (sometimes 50-100%). Occasionally, khugepaged would show up towards the top of "top". Various filesystems and schedulers were tested.
During these periods, a 256MB malloc+memset could be seen taking 45 seconds, or even minutes. "ps aux" could take upwards of 30 seconds, with strace revealing that read()s from /proc could take 9 or more seconds!!
A simple test showed how often system performance was affected - run ps every second, and look for runtimes >= 1 second:
amrit@ronnie:~$ while [ true ]; do echo `date` :: `(time ps aux) 2>&1 | grep ^real`; sleep 1; done | grep -v 0m0
Thu Oct 11 13:36:09 PDT 2012 :: real 0m1.004s
Thu Oct 11 13:36:21 PDT 2012 :: real 0m1.870s
Thu Oct 11 13:36:27 PDT 2012 :: real 0m1.256s
Thu Oct 11 13:36:34 PDT 2012 :: real 0m1.139s
Thu Oct 11 13:36:36 PDT 2012 :: real 0m1.134s
Thu Oct 11 13:36:42 PDT 2012 :: real 0m1.416s
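The same check can be sketched as a reusable function that reports a command only when it runs slowly. This is an illustrative variant, not the loop used above: `report_if_slow` is a made-up name, the threshold is given in milliseconds, and the `%N` format assumes GNU date.

```shell
#!/bin/sh
# Run a command once and print a line only if it took at least
# $1 milliseconds. Remaining arguments are the command to time.
# Assumes GNU date, whose %N format gives nanoseconds.
report_if_slow() {
    threshold_ms="$1"
    shift
    start=$(date +%s%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s%N)
    elapsed_ms=$(( (end - start) / 1000000 ))
    if [ "$elapsed_ms" -ge "$threshold_ms" ]; then
        echo "$(date) :: $* took ${elapsed_ms} ms"
    fi
}
```

Something like `while true; do report_if_slow 1000 ps aux; sleep 1; done` would then log only the runs of one second or longer.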
After performing "echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled", all issues were resolved. System CPU% is typically under 10%, "ps aux" has not taken longer than 1 second even once, and TLB shootdowns are down to about 1/100th of the previous rate.
A few details about the box: 12 cores, 24GB w/ no swap, 4x 15k RAID10 + 1.2TB FusionIO ioDrive2 + 2x 960GB OCZ Revodrive x2 RAID0, many concurrent users (5-15), many CPU & disk intensive processes coming and going, and typically ~1000 processes running.
> It is worth noting the Redhat 6 appears to have this functionality
> disabled by default - where Centos has it enabled.
The CentOS kernel should be no different. In RHEL-6, the default seems to be:
/sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag -> yes
/sys/kernel/mm/redhat_transparent_hugepage/defrag -> always
|+1. I was seeing similar issues on my Varnish cache servers. Each has 90GB RAM and 12 cores. I disabled khugepaged mid-issue to work around the problem, and the system is much more responsive now.|
I had similar behaviour on my system (12 HT cores and 48 GB RAM with CentOS 6.0) with many high-memory processes (in-memory databases etc.) waiting (not using any CPU). I noticed that in such a state, even small processes (with small resident memory requirements) will take a lot of system CPU. On doing an strace, I found that for my particular process, each call to munmap with MAP_STACK for a 10 MB stack space and madvise was taking close to a few seconds. I came across this thread and disabled THP as above, and the problem vanished without having to restart any processes.
However, after a few more days of uptime, the problem resurfaced, and my system is now showing the same symptoms, but only on a few (and different) processes. I am not sure whether it is a related problem. I have not run an strace this time. When the problem shows itself, a snapshot of 'perf top' looks like this for a few minutes:
15.20% [kernel] [k] mem_cgroup_del_lru_list
5.26% [kernel] [k] shrink_inactive_list
4.25% [kernel] [k] mem_cgroup_add_lru_list
3.01% [kernel] [k] __isolate_lru_page
2.17% [kernel] [k] isolate_pages_global
2.15% [kernel] [k] page_waitqueue
1.80% [kernel] [k] __wake_up_bit
1.80% [kernel] [k] shrink_page_list.clone.0
1.79% [kernel] [k] unlock_page
1.77% [kernel] [k] __list_add
1.54% [kernel] [k] page_evictable
1.44% [kernel] [k] lookup_page_cgroup
1.31% [kernel] [k] list_del
1.26% [kernel] [k] release_pages
0.84% [kernel] [k] __mod_zone_page_state
0.70% [kernel] [k] mem_cgroup_del_lru
I suppose the problem reported by the submitter has been fixed with CentOS/RHEL 6.4.
@siddhartha, are you running kernel 2.6.32-358.el6 or later?
|Closing as 'resolved in EL6.4'. If you find it otherwise, feel free to reopen or submit a new report.|
|2012-05-09 01:00||jason.ball||New Issue|
|2012-08-19 21:07||ivank||Note Added: 0015686|
|2012-08-19 21:09||ivank||Note Added: 0015687|
|2012-10-11 22:26||amrit||Note Added: 0015916|
|2012-10-12 01:02||toracat||Note Added: 0015917|
|2012-10-15 09:22||moylo||Note Added: 0015935|
|2013-06-10 18:15||siddhartha||Note Added: 0017544|
|2013-09-12 18:07||toracat||Note Added: 0017986|
|2013-09-12 18:08||toracat||Status||new => feedback|
|2013-09-24 22:43||toracat||Note Added: 0018056|
|2013-09-24 22:43||toracat||Status||feedback => resolved|
|2013-09-24 22:43||toracat||Resolution||open => fixed|
|2013-09-24 22:43||toracat||Fixed in Version||=> 6.4|