View Issue Details

ID: 0016728
Project: CentOS-8
Category: kernel
View Status: public
Last Update: 2019-11-13 19:01
Status: new
Resolution: open
Product Version: 8.0.1905
Target Version:
Fixed in Version:
Summary: 0016728: Transparent Huge Pages set to [always] is sub-optimal for many applications
Description:
Transparent Huge Pages (THP) provides real benefit to certain applications by potentially reducing TLB misses and improving performance. For other applications, it can bloat memory usage and cause performance regressions. According to the kernel documentation, the default is to enable THP only for applications that explicitly ask for it via MADV_HUGEPAGE:

> "madvise" will enter direct reclaim like "always" but only for regions
> that are have used madvise(MADV_HUGEPAGE). This is the default behaviour.

RHEL, CentOS, and CoreOS (but not Fedora) all appear to override this behavior and set THP to [always]. This unfortunately causes issues with a large variety of software including, but not limited to:

Go runtime:

More recently, we've also seen memory usage bloat in Ceph (using tcmalloc) when THP is set to [always], potentially resulting in OOM when running inside containers. There are various ways to work around this at the application level, including MADV_NOHUGEPAGE or a prctl flag. Requiring these workarounds to disable THP for a given application is counter-intuitive for several reasons:

1) It deviates from the default kernel behavior without a strong justification as to why.

2) It puts the onus on developers to explicitly stop the kernel from engaging in sub-optimal behavior.

3) It's incredibly confusing to have a system-wide default that claims to "always" enable a setting that many applications may or may not silently disable through workarounds.
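
As a rough illustration of the application-level workarounds mentioned above, the following Python sketch opts a single mapping out of THP with MADV_NOHUGEPAGE and, separately, disables THP for the whole process through prctl. The helper names are hypothetical, the value 41 for PR_SET_THP_DISABLE is taken from linux/prctl.h, and the code assumes Linux with Python 3.8 or later. It is a sketch of the idea, not what Ceph or the Go runtime actually does.

```python
import ctypes
import mmap

# From linux/prctl.h; not exposed by the Python standard library.
PR_SET_THP_DISABLE = 41


def advise_no_hugepage(buf: mmap.mmap) -> bool:
    """Ask the kernel not to back this mapping with huge pages.

    Returns True if the advice was accepted, False if the platform or
    kernel does not support it.
    """
    advice = getattr(mmap, "MADV_NOHUGEPAGE", None)
    if advice is None or not hasattr(buf, "madvise"):
        return False
    try:
        buf.madvise(advice)
    except OSError:
        # For example EINVAL on kernels built without THP support.
        return False
    return True


def disable_thp_for_process() -> bool:
    """Disable THP for the calling process via prctl(PR_SET_THP_DISABLE)."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        rc = libc.prctl(
            ctypes.c_int(PR_SET_THP_DISABLE),
            ctypes.c_ulong(1),  # non-zero: set the "THP disabled" flag
            ctypes.c_ulong(0),
            ctypes.c_ulong(0),
            ctypes.c_ulong(0),
        )
    except (OSError, AttributeError):
        return False
    return rc == 0


if __name__ == "__main__":
    region = mmap.mmap(-1, 4 * 1024 * 1024)  # anonymous 4 MiB mapping
    print("per-mapping opt-out accepted:", advise_no_hugepage(region))
    region.close()
```

Either call has to be added to every affected application, which is the burden described in points 2 and 3.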

Finally, when another prominent distribution was faced with a similar choice, they ran stream and malloc tests showing improvement at various allocation sizes when THP was disabled. Ultimately that led them to switch back to the kernel default (i.e. madvise) with no apparent performance regressions:
Steps To Reproduce:
This is a well-known issue that can be reproduced with a variety of software. Steps to reproduce in Ceph are listed below.

Steps to Reproduce:
1. Install a single-OSD Ceph cluster.
2. Run a background write workload using hsbench or fio sufficient to fill the ceph-osd caches.
3. Compare memory usage of the OSD process when THP is set to [always] vs [madvise].
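
One way to carry out step 3 is to record the active THP mode alongside the OSD's resident and huge-page-backed memory, then repeat under the other mode. The sketch below assumes the standard Linux locations /sys/kernel/mm/transparent_hugepage/enabled and /proc/<pid>/smaps. The function names are hypothetical, and finding the ceph-osd PID is left to the reader.

```python
import re
import sys
from pathlib import Path

# Standard sysfs location of the system-wide THP setting on Linux.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed choice from e.g. 'always [madvise] never'."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active mode found in {text!r}")
    return match.group(1)


def smaps_field_kb(text: str, field: str) -> int:
    """Sum one kB-valued field (e.g. 'Rss') over all mappings in smaps text."""
    total = 0
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            total += int(rest.split()[0])
    return total


def snapshot(pid: int) -> dict:
    """Collect the THP mode and memory figures for one process."""
    smaps = Path(f"/proc/{pid}/smaps").read_text()
    return {
        "thp_mode": active_thp_mode(THP_ENABLED.read_text()),
        "rss_kb": smaps_field_kb(smaps, "Rss"),
        "anon_huge_kb": smaps_field_kb(smaps, "AnonHugePages"),
    }


if __name__ == "__main__":
    # Usage: python thp_snapshot.py <pid of ceph-osd>
    print(snapshot(int(sys.argv[1])))
```

Running this once with THP at [always] and once at [madvise], under the same workload, gives two snapshots to compare. Reading another user's /proc/<pid>/smaps typically requires root.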

Additional Information
Tags: No tags attached.

Note ~0035687, reporter, 2019-11-13 19:01

While the kernel documentation claims that madvise is the default, the actual code in mm/Kconfig shows that "always" is the default choice, so I retract the statement about differing from the kernel. See:

Issue History

Date Modified     Username  Field                Change
2019-11-13 18:06  mnelson   New Issue
2019-11-13 19:01  mnelson   Note Added: 0035687