View Issue Details

ID: 0002189
Project: CentOS-5
Category: kernel
View Status: public
Last Update: 2014-03-05 20:31
Reporter: larstr
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 5.0
Target Version:
Fixed in Version:
Summary: 0002189: CentOS is not getting optimal performance in a virtualized environment and on slow cpus
Description: In 2.4 kernels the system timer was normally clocked at 100 Hz, while in 2.6 the default system timer is set to 1000 Hz (some other distros are not following these "rules", and USER_HZ is still 100). 1000 Hz is definitely a good thing for desktop computers requiring fast interactive responses, but there are environments where this causes bad side effects.

With kernels compiled for SMP, the system timer will request twice as many interrupts when running on a single cpu and 2.5 times as many when running on a dual cpu system.

One might argue that the smp kernels have better threading than unicpu kernels, but there are other negative effects involved in using these kernels on unicpu systems.

Some cpus can't keep up with this interrupt rate. The 2.6 kernel has code to detect this, but it can't always correct for lost ticks, and an interrupt rate this high also affects performance negatively. The negative effects of this have been experienced in virtual environments using VMware products (ESX, Server, Workstation, Fusion, Ace & Player), but it is also a potential problem on physical systems running on slow cpus such as the Geode, even though the clock issues aren't as bad on physical systems because detecting lost ticks is more predictable on a physical system than on a virtual one.

In a virtual environment, a key indicator that these systems are not properly set up is an idle guest system (as indicated by tools inside the guest) while the host reports that this guest is using a lot of cpu (typically 20-30%). Another indicator is that the clock inside the guest is not keeping up with time. On newer cpus these effects are not as visible as on old cpus (for example a Pentium 3 500-1000MHz), but even on newer cpus you will not be able to scale as well due to these issues, resulting in fewer guest systems per server host.

It would be a great benefit if a 100Hz unicpu kernel was made available in one of the CentOS repositories. There is already a 100 Hz kernel repository available (http://vmware.xaox.net/centos/), but it only contains SMP kernels.
Additional Information: Related documentation:
http://www.vmware.com/pdf/vsmp_best_practices.pdf
http://www.vmware.com/pdf/vmware_timekeeping.pdf
http://www.vmware.com/support/kb/enduser/std_adp.php?p_faqid=1420
http://kb.vmware.com/kb/j1730
http://www.vmware.com/pdf/vi_performance_tuning.pdf
http://www.vmware.com/community/thread.jspa?threadID=88879&tstart=0
http://www.vmware.com/community/message.jspa?messageID=540949
Tags: No tags attached.

Relationships

duplicate of 0001680 (closed, JohnnyHughes): CentOS-4 Provide kernel with low interrupt timer for use in VMware
has duplicate 0002320 (closed, kbsingh@karan.org): CentOS-5 Follow-up to Bug#2189 - Getting optimal performance in a virtualized environment

Activities

smooge

2007-07-05 16:54

reporter   ~0005524

I think we were looking at doing this for the CentOS Plus repository. The odd issue is that some systems work better at 250Mhz (which I think the 2.6.18 set is set to), some work better at 100Mhz, and some at 1000.
larstr

2007-07-09 06:15

reporter   ~0005539

Yes, 2.6.18 is indeed default set to 250Hz (not MHz ;)), and since it's an smp kernel it requests 750 clock interrupts per second.
toracat

2007-07-09 16:50

manager   ~0005543

If I'm not mistaken, 2.6.18 defaults to 1000Hz.

# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000

I understand this is Linus Torvalds' choice.
larstr

2007-07-12 11:44

reporter   ~0005556

The official 2.6.18 linux kernel defaults to 1000 Hz, but the CentOS kernel seems to default to 250.
/usr/src/kernels/2.6.18-8.el5-i686/kernel/Kconfig.hz:

choice
        prompt "Timer frequency"
        default HZ_250

Having the kernel at 1000Hz is good for physical systems, and especially desktop systems. In a virtual system this has negative side effects. We have also seen similar effects in computers with slow cpus such as Soekris with Geode cpu (133MHz "486").
toracat

2007-07-12 12:07

manager   ~0005557

Well, the Kconfig.hz file is referred to when configuring .config with make menuconfig (or make xconfig). In fact, if you go to the bottom of that file, it has these lines:

config HZ
        int
        default 100 if HZ_100
        default 250 if HZ_250
        default 1000 if HZ_1000

In the CentOS kernel (and mainline kernel), "CONFIG_HZ_1000=y" is defined, therefore the timer frequency becomes 1000.
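
One way to settle which timer frequency a given kernel was actually built with is to read the generated config rather than Kconfig.hz. The following is a minimal Python sketch, assuming the config text looks like the excerpt quoted earlier in this thread; the function name `configured_hz` is made up for illustration.

```python
import re

def configured_hz(config_text):
    """Return the CONFIG_HZ value from kernel .config text, or None if absent."""
    for line in config_text.splitlines():
        # Lines such as "# CONFIG_HZ_100 is not set" are comments and never match.
        match = re.match(r"^CONFIG_HZ=(\d+)\s*$", line.strip())
        if match:
            return int(match.group(1))
    return None

# Same four lines as quoted in the note above.
sample = """\
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
"""

print(configured_hz(sample))  # 1000
```

On an installed system the text would typically come from a file such as /boot/config-<kernel release>; that location is an assumption here and may differ between setups.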

Hope this clears things up a bit.

Akemi

2007-07-13 23:56

 

centos-cpuload.png (95,657 bytes)

2007-07-13 23:57

 

debian-cpuload.png (64,032 bytes)
larstr

2007-07-14 00:00

reporter   ~0005567

Ok, I believe you Akemi, even though smooge got me confuzed and investigating the 250Hz option for a second there. ;)

Having Hz=1000 and an SMP enabled kernel is still a problem in a virtualized environment. Having installed a minimal CentOS with the default kernel in ESX3 on a system with 8x Xeon MP 1900MHz cpus, we see the following: inside the VM the load is near 0, while outside of the VM we can see that it's using a lot of resources. If we compare it to another virtual machine running debian and a 2.4 kernel, we can see how much they differ in the load generated. The debian VM is a LAMP setup in prod, but it is currently idling most of the time. I've now uploaded a cpu graph for both of these so we can get a clearer view of how CentOS is behaving.

Lars
toracat

2007-07-14 01:07

manager   ~0005568

Is there any chance you can rebuild the kernel yourself with the desired options? If not, are you willing to run performance tests if the kernel is provided?

Akemi
larstr

2007-07-14 01:38

reporter   ~0005569

If a kernel was provided I would of course test its performance.

I could probably build such a kernel myself too (even though that is not something I've done too many times). I believe such a kernel would benefit the popularity of CentOS in virtualized environments.

I filed this bug report after asking about this in #centos and I was told that this question was brought up from time to time and it could be a good idea to provide such a kernel in the repository.

Lars
toracat

2007-07-14 01:51

manager   ~0005570

Which arch (i686 or x86_64) would you like to test? I am myself thinking of doing a test.

I totally agree with you. If the optimized kernels are available from CentOS, that would benefit a number of users. However, this would depend on the time and resources the CentOS team can afford. Positive test results from multiple users might help them determine if the whole thing is worth the effort.

Akemi
larstr

2007-07-14 07:35

reporter   ~0005571

I would like to test i686 first. Thanks.
toracat

2007-07-14 07:39

manager   ~0005572

I currently have x86_64 only. Will build i686 soon.

Akemi
toracat

2007-07-14 20:04

manager   ~0005573

Last edited: 2007-07-14 20:26

Lars,

I now have both i686 and x86_64 (UP, 100Hz) kernel and kernel-devel rpm's. Send me e-mail (amyagi at gmail dot com) for download instructions.

Akemi

Note added: Just booted the i686 version in my VM. So far so good.

2007-07-14 21:24

 

larstr

2007-07-14 21:29

reporter   ~0005574

Wow, that really did it, Akemi! :)

The cpu load is now reduced from ~15% to 3% and %READY is reduced from ~4% to 0.4%.

The remaining cpu load is probably caused by the default installed services. I've uploaded another graph showing the load before, during and after the kernel change, and you can really see how the load changes.

Lars
toracat

2007-07-14 21:34

manager   ~0005575

That's really good news, Lars. The graph is impressive.

Akemi
toracat

2007-07-15 04:23

manager   ~0005577

Just wanted to add one thing. smooge's note is not really incorrect. I noticed a few minutes ago that the xen kernel is set to 250Hz. It is the standard kernel that uses 1000Hz by default.

Akemi
toracat

2007-07-15 11:51

manager   ~0005578

Lars

Since you have been testing with these kernels, could you comment on which change contributed to the better performance, and to what extent? In other words, changing 1000 -> 100Hz versus SMP -> UP. How does each change contribute?

Akemi
larstr

2007-07-16 12:42

reporter   ~0005581

Akemi,
It seems that UP vs SMP is the thing most affecting things here. Also, on newer cpus, this isn't as noticeable as on old cpus. The 1900 Xeon I've been using is 4 years old.

The results are however a bit surprising as they show quite similar results for both SMP kernels. The kernel builds do however differ slightly and I don't know if that is affecting anything:

kernel-2.6.18-8.1.8.UP.100Hz.el5.i686 <- Akemi kernel
up 100Hz
cpu 2.24%
ready 0.27%

kernel-2.6.18-8.1.4.el5.centos.plus.VMware.i686 <- xaox kernel
smp 100Hz
cpu 14.3%
ready 5%

kernel-2.6.18-8.1.8.el5.i686 <- default kernel
smp 1000Hz
cpu 13.6%
ready 3.15%

Lars
toracat

2007-07-16 14:19

manager   ~0005583

Last edited: 2007-07-16 14:20

Lars,

Very interesting result. But as you pointed out, this could be an apples-to-oranges comparison. If you are willing to take it further for completeness, I'd have no problem rebuilding the other two kernels (Hz change only and SMP change only). What do you think?

Akemi

larstr

2007-07-16 19:34

reporter   ~0005587

Yes, I agree. Some further testing would help us understand more of these findings, and I'm willing to continue this testing.

Lars
toracat

2007-07-16 19:53

manager   ~0005588

Great. I said, "I'd have no problem rebuilding..." However, I am having a problem with the build process right now. But I will fix it one way or another.

Akemi
toracat

2007-07-18 09:37

manager   ~0005601

Lars,

I now have the other 2 variants:

kernel-2.6.18-8.1.8.SMP.100HZ.el5.i686.rpm
kernel-2.6.18-8.1.8.UP.1000HZ.el5.i686.rpm

They are in the same place as before.

Akemi
larstr

2007-07-18 11:35

reporter   ~0005602

Last edited: 2007-07-20 18:31

Just to be sure I reinstalled CentOS and tried all these kernels on a freshly installed CentOS to get as accurate results as possible.

I have to admit, the results were much more like I had expected. :-)

kernel-2.6.18-8.1.8.el5.i686 <- Default
smp 1000Hz
cpu 13.06%
ready 4.32%

kernel-2.6.18-8.1.8.UP.100Hz.el5.i686 <- Akemi kernel
up 100Hz
cpu 1.93%
ready 0.23%

kernel-2.6.18-8.1.8.UP.1000Hz.el5.i686 <- Akemi kernel
up 1000Hz
cpu 8.88%
ready 3.03%

kernel-2.6.18-8.1.8.SMP.100Hz.el5.i686 <- Akemi kernel
smp 100Hz
cpu 2.35%
ready 0.36%

kernel-2.6.18-8.1.4.el5.centos.plus.VMware.i686 <- xaox kernel
smp 100Hz
cpu 14.16%
ready 4.80%
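
To put the four deliberately built variants side by side, the relative change against the default kernel can be computed directly. This is a small sketch that uses only the figures reported in this note; the dictionary labels and the `reduction` helper are names chosen here for illustration.

```python
# Idle-guest figures reported above, as (cpu %, ready %).
results = {
    "smp 1000Hz (default)": (13.06, 4.32),
    "up 100Hz": (1.93, 0.23),
    "up 1000Hz": (8.88, 3.03),
    "smp 100Hz": (2.35, 0.36),
}

def reduction(baseline, value):
    """Percentage reduction of value relative to baseline."""
    return (baseline - value) / baseline * 100.0

base_cpu, base_ready = results["smp 1000Hz (default)"]
for name, (cpu, ready) in results.items():
    print(f"{name:22s} cpu reduced by {reduction(base_cpu, cpu):.1f}%, "
          f"ready reduced by {reduction(base_ready, ready):.1f}%")
```

On these numbers, dropping to 100Hz while staying SMP cuts the host-side cpu figure by roughly 82%, while switching to UP at 1000Hz cuts it by roughly 32%.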

toracat

2007-07-18 12:14

manager   ~0005603

Lars, you are so fast. All the tests were done while I was asleep :)

Good results indeed. I suspect that xaox' kernel is not really 100Hz. Did not read the details of his vmware post, but the way it was done (by editing some .h file) would not make the intended change in the freq in the config. At any rate, I am glad we have a full set of data.

Akemi
xaox

2007-07-18 13:45

reporter   ~0005608

I was just made aware of this bug report.

I have checked and my kernels were built with HZ=1000, not HZ=100. I have a build problem I need to work out.
toracat

2007-07-19 17:07

manager   ~0005618

xaox,

What build problem are you having? Could you describe it in more detail?

Akemi
xaox

2007-07-19 17:53

reporter   ~0005619

toracat,

The build problem as it turns out is that I'm an idiot. At some point my updated kernel configuration files were overwritten with the originals and I didn't notice.

I'm rebuilding now with the fixed config files.
toracat

2007-07-19 18:12

manager   ~0005620

xaox,

No worry. That happened to me once, too. :D

Akemi
xaox

2007-07-20 19:06

reporter   ~0005625

I now have a new build of the latest plus kernel with HZ=100.
toracat

2007-07-27 18:40

manager   ~0005750

Last edited: 2007-08-01 19:14

The 100Hz kernels referred to in this report (2.6.18-8.1.8 for CentOS 4 and 5) are now available from:

http://people.centos.org/~hughesjr/vmware-kernels/

(thanks to hughesjr)

Akemi

Phil Schaffner

2007-08-02 20:49

reporter   ~0005801

A couple of minor points on the hughesjr vmware kernel repo. There is repodata present, so one might expect to use yum for installation; however, ...

1. If the lowest level x86 directories were named i386 instead of i686 then $basearch would work in a yum repo definition for either arch.

2. yum does not see these kernels as an upgrade. Perhaps they could have names with a higher lexical order, e.g. kernel-smp-2.6.9-55.0.2.vm.c4.100HZ.i686.rpm (consistent with plus naming) rather than kernel-smp-2.6.9-55.0.2.EL.100HZ.i686.rpm.

/etc/yum.repos.d/hughesjr.repo
[vm-kernels]
name=CentOS-$releasever - VMware Kernels
baseurl=http://people.centos.org/~hughesjr/vmware-kernels/$releasever/$basearch/
gpgcheck=1
# ???
enabled=0
protect=1
priority=1

Phil
JohnnyHughes

2007-08-02 21:07

administrator   ~0005802

Last edited: 2007-08-02 21:09

Actually, it is not a bad idea to use:

<kernel-version>.vm.c[4,5].100HZ.$arch.rpm

Since vm is > plus (c4plus) and > EL (c4) and > el (c5 and c5plus) ... then that COULD BE an upgrade to everything.

However, if you don't exclude=kernel* then you could upgrade to other regular versions.

I'm not going to recompile the kernels now .. but in the future we will name them .vm. something.
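
The ordering argument above (vm sorts after plus, EL and el) can be checked with a simplified comparison in the spirit of rpm's version comparison. This is only a sketch and not rpm's actual algorithm: it splits a release string into runs of digits and letters, compares numeric runs as integers and alphabetic runs as plain strings, and ignores epochs and other special cases. The function names are hypothetical; the two release strings come from the package names mentioned in the previous note.

```python
import re

def _segments(release):
    # Split into runs of digits or letters; separators such as '.' are dropped.
    return re.findall(r"\d+|[A-Za-z]+", release)

def compare_release(a, b):
    """Return 1 if a sorts newer than b, -1 if older, 0 if equal (simplified)."""
    for x, y in zip(_segments(a), _segments(b)):
        if x.isdigit() and y.isdigit():
            x, y = int(x), int(y)
        elif x.isdigit() != y.isdigit():
            # Mixed case: treat the numeric segment as the newer one.
            return 1 if x.isdigit() else -1
        if x != y:
            return 1 if x > y else -1
    la, lb = len(_segments(a)), len(_segments(b))
    return (la > lb) - (la < lb)

# Proposed ".vm." release versus the existing ".EL." release.
print(compare_release("55.0.2.vm.c4.100HZ", "55.0.2.EL.100HZ"))  # 1
```

Because the comparison reaches "vm" versus "EL" before anything else differs, the ".vm." package sorts as newer, which matches the reasoning in the note.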

kbsingh@karan.org

2007-09-08 19:36

administrator   ~0005968

reopening
kbsingh@karan.org

2007-09-11 17:01

administrator   ~0005976

The right way of doing this, allowing users to opt in when they want, would be to name the kernel rpms kernel-vm-<version> rather than kernel-<version>.vm; otherwise it only causes thrashing in yum and confuses the users.
larstr

2007-09-23 11:19

reporter   ~0006046

I've also tried booting different kernels with different parameters. These numbers differ slightly from the initial ones as I've used a new freshly installed OS for the latest tests:

1000Hz SMP cpu 13.06% ready 4.32% (default)
1000Hz UP cpu 8.88% ready 3.03%

100Hz SMP cpu 2.35% ready 0.36%
100Hz UP cpu 1.93% ready 0.23%

1000Hz SMP "nosmp noapic nolapic" cpu 4.21% ready 2.35%
1000Hz UP "nosmp noapic nolapic" cpu 3.97% ready 2.45%

100Hz SMP "nosmp noapic nolapic" cpu 0.895% ready 0.254%
100Hz UP "nosmp noapic nolapic" cpu 0.788% ready 0.156%

100Hz SMP cpu 1.38% ready 0.521%
100Hz SMP noapic cpu 1.37% ready 0.294%
100Hz UP cpu 1.0% ready 0.310%
segedunum

2007-10-10 20:31

reporter   ~0006113

Does anybody have any idea when a corresponding kernel might start appearing in CentOS Plus?
toracat

2007-10-10 20:49

manager   ~0006114

The 100Hz kernels are available for CentOS-4 and -5 and can be found in testing:

http://dev.centos.org/centos/4/testing/
http://dev.centos.org/centos/5/testing/

Look for kernel-vm-xxx

Akemi
jase99

2007-11-05 14:09

reporter   ~0006240

Here's some feedback. Host = 2 x 2.8GHz CPU x86_64. Two VMs, one is 2 CPU x86_64, the other is 2 CPU i686. When both vm's are idle, host CPU hovers around 50%. Using 2.6.9-55.0.9 kernel in host and guests. With the 100HZ kernels (i686 and x86_64) deployed in the guests, host cpu now hovers around 8% when vm's are idle. I also deployed the devel packages for the vm kernels so that vmware tools works. No bad side effects. Thank you for making these packages available.
toracat

2007-11-05 18:07

manager   ~0006246

Those who are interested in this subject may also be interested in vmware pre-built images for CentOS. See bug #1722 for details.
segedunum

2007-11-07 09:43

reporter   ~0006249

Last edited: 2007-11-07 11:10

I can heartily concur with others that this has made a very big difference. I have a dual Opteron (2 GHz) system with CentOS x86_64 as the host, and my CentOS guests went from around 5% CPU usage to around 1.2%. These are 32-bit guests, by the way. On another system I have (AMD Duron 1.2 GHz, 32-bit OpenSuse), the CPU usage has gone down from around 10% - 15% to around 0.7% at idle, which is consistent with the other OpenSuse and Windows guests I have. These are just unscientific ps and top readings, but the changes are significant.

The idle CPU usage still seems to be slightly higher than other Linux and Windows guests though, especially on the x86_64 system, although this could be down to the host being 64-bit, it could be due to variances in guest kernel versions where certain timer numbers work better, or something else. More experimentation is needed. I also have my own customised guests where I'm using LVM, so this might make a difference.

My guests are all UP, so using the kernel parameters "nosmp noapic nolapic" also seems to have a positive effect. Note: don't put 'nosmp' in by itself otherwise you'll get a nice kernel panic!

I'm slightly surprised that the upstream vendor doesn't seem to have anything open regarding this. I would imagine this could be a pretty big problem.

toracat

2007-11-09 00:21

manager   ~0006254

The new kernel for RHEL 5.1 (2.6.18-53) has a new kernel option called "tick divider". It will let you reduce the system clock rate to 100, 250, etc Hz while allowing you to boot the kernel with 1000Hz. Below is a note from the patch file that adds this feature. How this performs compared to kernel compiled with 100Hz remains to be seen.

Akemi

=================================================================
From: Alan Cox <alan@redhat.com>
Subject: [RHEL5]: Tick Divider (Bugzilla #215403]
Date: Wed, 18 Apr 2007 16:39:15 -0400
Bugzilla: 215403
Message-Id: <20070418203915.GA23344@devserv.devel.redhat.com>
Changelog: [x86] Tick Divider


The following patch implements a tick divider feature that allows you to
boot the kernel with HZ at 1000 but the real timer tick rate lower (thus
not breaking all the modules and kABI).

The selection is done at boot to minimize risk and the patch has been reworked
so that you can do an informal attempt at a proof that it doesn't cause
regression for the non dividing case.

The patch interleaved with notes follows, and below that the actual patch
proper.

Xen kernels remain at 250HZ because
a) Xen guests have a 'tickless mode'
b) Xen itself has issues with multiple differing guest HZ rates

Not queued for upstream as the upstream path is Ingo's tickless kernel, which
is not viable as a RHEL5 tweak
==================================================================

2007-11-11 14:16

 

2007Nov10.png (26,478 bytes)
toracat

2007-11-11 14:28

manager   ~0006264

I have built the 5.1 kernel (2.6.18-53) and collected a preliminary result using vmktree developed by the original reporter, Lars (see the attached graph, 2007Nov10.png). The large peaks are when the system was booted. The cpu levels between boots are marked with the kernel and the option used. kernel-vm is the CentOS version of 100Hz kernel (in which 100Hz is compiled into the kernel). It looks as if the tick_divider=10 (or =4) option had no effect on the idle %cpu. This was not an expected result.

However, I do not have a way to verify that the tick_divider option was indeed honored when I added it to the kernel line.

2007-11-15 23:16

 

C5_i386_Nov15.jpg (54,963 bytes)

2007-11-15 23:19

 

C5_x86_64Nov15.jpg (56,001 bytes)
toracat

2007-11-15 23:38

manager   ~0006316

I have repeated the test with a new host machine that has freshly installed CentOS 5.0 x86_64 (packages all up-to-date). I installed CentOS-5 (i386 and x86_64) as vmware guests and updated their kernel to 2.6.18-53. The attached graphs are the output of vmktree as before.

C5_i386_Nov15.jpg -- The result with the 32-bit guest was the same as the previous test. The tick_divider=10 option (to make HZ=100) did not have a discernible effect on %idle cpu whereas kernel-vm (100HZ compiled in the kernel) lowered it.

C5_x86_64Nov15.jpg -- The behavior of the 64-bit guest was curious. No apparent difference was seen regardless of the kernels/options used. They all look similar to the output of the kernel-vm 32-bit. ???

Akemi
toracat

2007-11-18 15:58

manager   ~0006338

All my tests so far have been done with the new 5.1 kernel installed on 5.0. Now that updated rpm's for 5.1 are available from the QA repo, I updated the system to 5.1 and re-ran the test. The result was different. In short, on an i386 system, it all looked like the output of the x86_64 machine (see the graph C5_x86_64Nov15.jpg). When the update files for the x86_64 arch are complete in the QA repo, I will upgrade my 64-bit test systems to 5.1 and do more testing.

Akemi
smccl

2007-12-06 16:44

reporter   ~0006496

Last edited: 2007-12-06 21:00

Just installed CentOS 5.1 on a single CPU VM running on VMware ESX. The tick_divider setting doesn't seem to be making any difference in the number of timer interrupts though. I set tick_divider=10 which should reduce the number of timer interrupts to 100. I wrote a nasty little script that queries /proc/interrupts every 1 second and still see an increase of about 1000 interrupts each second. Also, when watching the reporting capabilities of the ESX hypervisor I see no reduction in CPU utilization on the idle VM. Just a side note: when I append "nosmp noapic nolapic" as kernel parameters I do see a very nice reduction in CPU utilization. So the combination of the parameters that do work and the newly added tick_divider would really benefit us.

The server is:

uname -srvmpio
Linux 2.6.18-53.1.4.el5 #1 SMP Fri Nov 30 00:45:16 EST 2007 i686 i686 i386 GNU/Linux

The pertinent data from grub.conf

title CentOS (2.6.18-53.1.4.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-53.1.4.el5 ro root=/dev/rootvg/rootfs rhgb quiet clocksource=pit tick_divider=10
        initrd /initrd-2.6.18-53.1.4.el5.img
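
The polling script mentioned in this note was not posted. A rough Python equivalent might look like the sketch below. It assumes the timer interrupt shows up in /proc/interrupts as a row whose last field is `timer`, which may differ between kernels and architectures; the function names and the sample text are made up for illustration.

```python
import time

def timer_count(interrupts_text):
    """Sum the per-CPU counts on the row of /proc/interrupts labelled 'timer'."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[-1] == "timer":
            # fields[0] is the IRQ number ("0:"); the per-CPU counts follow it.
            return sum(int(f) for f in fields[1:] if f.isdigit())
    raise ValueError("no timer row found")

def poll(path="/proc/interrupts", interval=1.0, samples=5):
    """Print the approximate number of timer interrupts seen per interval."""
    with open(path) as fh:
        previous = timer_count(fh.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as fh:
            current = timer_count(fh.read())
        print(current - previous)
        previous = current

# Made-up sample in the usual /proc/interrupts layout, for a dry run.
sample = """\
           CPU0
  0:    1234567    IO-APIC-edge  timer
  1:          9    IO-APIC-edge  i8042
"""
print(timer_count(sample))  # 1234567
```

Calling `poll()` inside the guest prints one delta per second; going by the figures in this thread, a value near 1000 would indicate the tick rate has not been reduced, and a value near 100 that it has.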

smccl

2007-12-06 16:46

reporter   ~0006497

Sorry, in the previous post I manually typed the grub.conf lines and tick_divider should equal 10. In my grub.conf "tick_divider=10". Sorry about the typo.
JohnnyHughes

2007-12-06 21:17

administrator   ~0006499

upstream says this works on i686 and not on x86_64 (the tick_divider option that is)

I have had the same experience as you ... that it does not make any difference in VMWare
smccl

2007-12-06 21:26

reporter   ~0006500

Have you tested it on bare i686 hardware?
toracat

2007-12-07 17:58

manager   ~0006505

A question for centos devs (maybe tru?):

Are you planning to build kernel-vm for 5.1? It would be good to have it for comparing the "real" 100Hz kernel and the tick_divider-tweaked version. If this is not being planned, I could build it but would rather not do it myself this time.

Akemi
smccl

2007-12-07 18:24

reporter   ~0006506

I would be willing to test the 100Hz kernel on vmware and compare that to the tick_divider setting.
tru

2007-12-07 21:16

administrator   ~0006508

the kernel-vm will be available shortly from the buildsystem, meanwhile I have built them inside my chrooted centos-5 tree. I will put them on dev.centos.org/~tru/kernel-vm asap.
tru

2007-12-07 23:40

administrator   ~0006510

http://dev.centos.org/~tru/kernel-vm/RPMS/ now contains the chrooted builds.

the i386 kernel boots fine ;)
Linux version 2.6.18-53.1.4.el5vm (centos@blackwilson.bis.pasteur.fr) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Thu Dec 6 11:09:34 EST 2007
toracat

2007-12-07 23:46

manager   ~0006511

Tru,

Thank you for providing these kernels. I have already downloaded both arches. Will run some tests over the weekend.

Akemi
toracat

2007-12-08 09:32

manager   ~0006513

Just a quick note to tell you that with the latest version of kernel-vm, I was able to repeat my earlier observation that tick_divider had no visible effect. This was done with both i386 and x86_64. The graphs will follow shortly.

Thanks, Tru, for building the -vm kernels in a timely manner.

Akemi

2007-12-08 18:14

 

53.1.4i386.png (30,921 bytes)

2007-12-08 18:15

 

53.1.4x86_64.png (30,880 bytes)
toracat

2007-12-08 18:19

manager   ~0006514

Uploaded the graphs showing the test results with the kernel 2.6.18-53.1.4 with no option, tick_divider=10, and kernel-vm (100Hz) for i386 (53.1.4i386.png) and x86_64 (53.1.4x86_64.png). Only the kernel-vm effectively lowered the %cpu.

Akemi
toracat

2007-12-09 16:50

manager   ~0006523

A question for Lars:

We talked quite a while ago about possible inaccurate measurements that are based on "per time". I recall you indicated that your vmktree is not affected by that issue. So, do you think the graphs I collected are all real and therefore imply that the tick_divider option is not working the same way as the "real" 100Hz kernels?

I am pasting the quote from the Linux Journal for others to see.

==== begin quote ====
As Vassili Karpov has discovered to his dismay, CPU stats are not
accurately reported in /proc/stat on the PC architecture. On that
architecture, CPU usage is examined only during the timer interrupt,
so regular programs can seem to use much more or much less of the CPU,
just because they happen to be either very active or idle at those
particular intervals. This also explains why users might see a
difference in CPU usage when switching their kernel from running at
100Hz to 1,000Hz. In fact, the usage is unchanged, while only the
accounting is different. Programs like top, which get their CPU stats
from /proc/stat, will suffer from this kind of discrepancy. Vassili
and his friends wasted quite a bit of time trying to optimize some
code they were working on, until they discovered that they were
optimizing toward an inaccurate and ever-changing goal.
==== end quote ====
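
The effect described in the quote can be illustrated with a toy model. The sketch below does not reproduce the kernel's accounting in detail. It assumes a hypothetical task that is busy for the first 2 ms of every 10 ms, and a sampler that charges a whole tick to whatever is running at the moment the timer fires.

```python
def sampled_usage(hz, busy_us=2000, period_us=10000, duration_us=1000000):
    """Fraction of timer ticks that land while the toy task is running."""
    tick_us = 1000000 // hz
    ticks = range(0, duration_us, tick_us)
    # The task runs during [0, busy_us) of every period.
    busy_ticks = sum(1 for t in ticks if t % period_us < busy_us)
    return busy_ticks / len(ticks)

true_usage = 2000 / 10000
print(true_usage)           # 0.2
print(sampled_usage(100))   # 1.0
print(sampled_usage(1000))  # 0.2
```

With the tick aligned to the task's wake-up, the 100Hz sampler reports the task as always running although the real usage is 20%; a task that only ran between ticks would be reported as idle. The workload is identical in both cases and only the accounting differs.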

Akemi

2007-12-10 00:22

 

larstr

2007-12-10 00:29

reporter   ~0006525

For VMware Server running on linux, vmktree uses the same methods as top/ps to read the cpu load so it definitely has the same bug as these other tools.

When vmktree is used to get stats from VMware ESX Server it will however read these values out of the kernel of the ESX (vmkernel). This kernel is not based on linux and it is unknown whether the vmkernel is also having this same bug or not.

I have however done the same tests as you and the results are very similar. I now used the minimal centos5 vmware image provided in bug id 1722.

Lars
toracat

2007-12-10 00:50

manager   ~0006526

Lars,

Thanks for doing the test. Your result is very reassuring. Mine was done on vmware server but the similarity is striking.

Akemi
smccl

2007-12-10 16:06

reporter   ~0006532

I've loaded and monitored the i686 VM kernel on an ESX virtual machine and immediately saw a very large reduction in idle time CPU utilization. This was seen using the built-in reporting utilities of the ESX hypervisor.

I then reverted to the same kernel version from the CentOS updates repository and used the tick_divider kernel parameter without any apparent effect. The difference in utilization is actually quite striking.

I ran my same ugly script that polls /proc/interrupts every one second and saw that the difference was between 100-102 timer interrupts a second with the VM kernel provided by tru. To me it still seems like the tick_divider argument isn't cutting back on the number of timer interrupts.
toracat

2007-12-10 16:32

manager   ~0006533

It is "good" that all our results lead to the same conclusion -- seemingly no effect by the use of the tick_divider option. This makes us wonder what it does or is supposed to do. Another question is how we want to give feedback to upstream. Should one of us file a bug report at Bugzilla?

Akemi
toracat

2007-12-10 16:39

manager   ~0006534

Forgot to mention (because it is so obvious) that CentOS users benefit from the kernel-vm offered by CentOS until the upstream kernel comes up with a version that actually works as intended. So, tru, your efforts are really appreciated!

Akemi
toracat

2007-12-23 21:50

manager   ~0006603

A test kernel was made available upstream that contains a "Patch to fix some of the tick divider problems" (see https://bugzilla.redhat.com/show_bug.cgi?id=315471 )
I tested this kernel (2.6.18-58.el5), but the result was the same.

Akemi
clalance

2008-01-02 18:07

reporter   ~0006629

Hello,
     I did some of the work on the tick divider patch in the RedHat kernels. Would it be possible for someone to try out the latest RedHat errata kernel available (at this point, it would be 2.6.18-53.1.4.el5)

And use the correct kernel command-line option for the divider:

divider=10

and see if that gives some better results?

Thanks,
Chris Lalancette
toracat

2008-01-02 19:23

manager   ~0006630

Chris,

I am running a test as I type using the kernel 2.6.18-53.1.4.el5 and the option "divider=10" as suggested. Unlike earlier tests, it now looks like the result is similar to the CentOS kernel-vm.

My question is: Was the option supposed to be "divider=" and NOT "tick_divider=" from the very beginning? Have we all been wasting our time? Or is this a recent change? The Release Notes for RHEL 5.1 clearly state:

" The tick_divider=<value> option is a sysfs parameter that allows you to adjust the system clock rate while maintaining the same visible HZ timing value to user space applications.

Using the tick_divider= option allows you to reduce CPU overhead and increase efficiency at the cost of lowering the accuracy of timing operations and profiling."

Akemi
clalance

2008-01-02 19:36

reporter   ~0006631

/me goes to look at the Release Notes....sigh.

The answer to your question is that yes, it has been "divider=" all along. It looks like there was a typo somewhere along the way with the release notes. I'll try to get that rectified online here. Note that the kernel released with 5.1 (-53) had some bugs in the divider that are further fixed in -53.1.2, so you would need at least 53.1.2 to get it really working.
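
Since the option name was the source of the confusion, a quick check of the kernel command line can catch it. This is a minimal sketch that assumes, per the note above, that `divider=` is the recognised option and that `tick_divider=` has no effect. It also assumes the effective tick rate is simply HZ divided by the divider. The function name is hypothetical.

```python
def check_divider(cmdline, hz=1000):
    """Return (effective_hz, warnings) for a kernel command line string."""
    warnings = []
    divider = 1
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "tick_divider":
            warnings.append("tick_divider= is not the option name; use divider=")
        elif key == "divider" and value.isdigit() and int(value) > 0:
            divider = int(value)
    return hz // divider, warnings

# Command lines in the style of those posted in this thread.
print(check_divider("ro root=/dev/rootvg/rootfs rhgb quiet clocksource=pit tick_divider=10"))
print(check_divider("ro root=/dev/rootvg/rootfs rhgb quiet divider=10"))
```

On a running system the string to check would be the contents of /proc/cmdline.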

As far as whether you are wasting your time, I can't say. I was pointed here by Jarod Wilson, and I was trying to accomplish two things by posting here:

1) Trying to dissuade CentOS from building a separate kernel, if possible, given the functionality already in 5.1.
2) Make sure that there aren't additional outstanding bugs in the current divider patch that would affect both RH and CentOS.

Chris Lalancette
toracat

2008-01-02 19:45

manager   ~0006632

Chris,

Thanks for posting here to let us know of the correct option. I have another question. Do you know if the x86_64 kernel now works as well?

Akemi
clalance

2008-01-02 21:21

reporter   ~0006633

Akemi,
     With the latest errata kernel (-53.1.4), all of the issues I know about on both i686 and x86_64 are fixed. Of course, if you run into any additional problems, please let me know so that I can try to track it down.

Chris Lalancette
toracat

2008-01-02 21:39

manager   ~0006634

Thanks Chris,

I just talked with some people on the #vmware IRC. They wonder how many RH systems have been running with the incorrect option. Someone is already sending a note to his customer. But at least it was good that the error was noticed here.

Akemi

smccl

2008-01-02 22:54

reporter   ~0006636

When using divider=10 and manually overriding the clocksource as a kernel parameter (to avoid in the future clock drift), the vm hangs on startup. After LVM initialization the following error is displayed:

BUG: soft lockup detected on CPU#0!

Ultimately, the vm never seems to come up. According to http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1420
we shouldn't need to worry about time keeping algorithms which attempt to catch up too aggressively in the 2.6.18 kernel. If I don't manually override the clocksource then the clock drifts into the future.

Are there any recommendations on how to manually override the clocksource with an algorithm that doesn't attempt to "catch up" while still using the divider kernel parameter?

Relevant info:

cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm jiffies tsc pit

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

uname -r
2.6.18-53.1.4.el5

cat /proc/cmdline
ro root=/dev/rootvg/rootfs rhgb quiet divider=10 clocksource=pit
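
The same information can be gathered in one step. This is a minimal sketch, assuming the sysfs and /proc paths shown above exist on the guest; `read_words`, `parse_cmdline` and `timekeeping_report` are illustrative names only, and missing files are reported as empty rather than treated as errors.

```python
from pathlib import Path

# Paths taken from the commands above; they may not exist on every kernel.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")
CMDLINE_PATH = Path("/proc/cmdline")


def read_words(path):
    """Return the whitespace-separated words in a file, or [] if it cannot be read."""
    try:
        return path.read_text().split()
    except OSError:
        return []


def parse_cmdline(cmdline):
    """Map kernel parameters to their values; bare flags such as 'ro' map to ''.

    If a key is repeated (e.g. console=), the last value wins.
    """
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def timekeeping_report(sys_dir=CLOCKSOURCE_DIR, cmdline_path=CMDLINE_PATH):
    """Collect the clocksource state and the timekeeping-related boot options."""
    params = parse_cmdline(" ".join(read_words(cmdline_path)))
    current = read_words(sys_dir / "current_clocksource")
    return {
        "available": read_words(sys_dir / "available_clocksource"),
        "current": current[0] if current else None,
        "divider": params.get("divider"),
        "requested_clocksource": params.get("clocksource"),
    }


if __name__ == "__main__":
    print(timekeeping_report())
```

Comparing "requested_clocksource" with "current" shows whether the clocksource= override from the boot line actually took effect.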

toracat

2008-01-03 00:16

manager   ~0006640

Last edited: 2008-01-03 12:17

"When using divider=10 and manually overriding the clocksource as a kernel parameter (to avoid in the future clock drift), the vm hangs on startup."

I was able to reproduce this with my VM (2.6.18-53.1.4.el5 i686) and divider=10 clocksource=pit. It hangs upon boot.

However, with the CentOS kernel-vm (2.6.18-53.1.4.el5vm) and clocksource=pit, the same VM booted normally.

Kernel command line: ro root=LABEL=/ rhgb quiet clocksource=pit
ACPI: (supports S0 S1<6>Time: pit clocksource has been installed.

Akemi

edit by hughesjr:

relevant upstream bug:

https://bugzilla.redhat.com/show_bug.cgi?id=315471

2008-01-03 00:51

 


toracat

2008-01-03 00:52

manager   ~0006641

Last edited: 2008-01-03 12:20

I have uploaded the test result from Note 6630 (divider10_i686_Jan022007.png)

http://bugs.centos.org/file_download.php?file_id=413&type=bug

Akemi


JohnnyHughes

2008-01-03 11:47

administrator   ~0006643

the "divider=" option still has some issues (I think) based on this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=315471

so we will maintain the kernel-vm until resolved.

JohnnyHughes

2008-01-03 12:07

administrator   ~0006644

OK ... I have tested the latest i686 kernel from here:

http://people.redhat.com/dzickus/el5/

Which is kernel-2.6.18-62.el5.i686.rpm and "divider=10 clocksource=pit" hangs the boot.

As a side note ... if the clock GAINS time (runs too fast), you should be able to fix it with this:

http://kb.vmware.com/kb/1591

(by setting the correct host.cpukHz), and VMware Tools should adjust a clock that is too slow.

Also see this blog entry concerning host.cpukHz:

http://blog.autoedification.com/2006/11/vmware-guest-clock-runs-fast.html

clalance

2008-01-03 14:03

reporter   ~0006645

jhughes: bug 315471 is resolved in both the 5.2 development kernel and in the 53.1.4 errata kernel. However, it seems like we still have a problem with the "clocksource=pit divider=10".

smccl or toracat: I don't actually have VMware to test with, so it would be great if one of you could run a few tests. First, are you running i686 or x86_64 VMs? I'm suspecting i686 since clocksource=pit doesn't make a huge difference in x86_64 VMware, but I just want to confirm. In terms of tests, I am interested in:

1) Try booting -53.1.4 with "divider=10" only. Does that work?
2) Try booting -62 with "divider=10" only. Does that work?
3) Try booting -53.1.4 with "divider=10 clocksource=pit". Does that work (probably not, based on earlier comments)?
4) Try booting -62 with "divider=10 clocksource=pit". Does that work?

For 3) and 4), if they both don't work, it would be great if you could get an "Alt-Sysrq-t" output from both of them and add them to this bug.
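
The four boot tests above are the combinations of two kernels and two option sets. A small sketch that enumerates them in the same order, so results can be recorded against each one; the full kernel version strings are taken from elsewhere in this thread, and `boot_test_plan` is just an illustrative name.

```python
from itertools import product

KERNELS = ["2.6.18-53.1.4.el5", "2.6.18-62.el5"]
OPTION_SETS = ["divider=10", "divider=10 clocksource=pit"]


def boot_test_plan(kernels=KERNELS, option_sets=OPTION_SETS):
    """Return (kernel, options) pairs, with the option set varying slowest."""
    return [(kernel, options) for options, kernel in product(option_sets, kernels)]


for number, (kernel, options) in enumerate(boot_test_plan(), start=1):
    print(f'{number}) boot {kernel} with "{options}"')
```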

Thanks,
Chris Lalancette

2008-01-03 15:38

 

bootup-vi.txt (13,801 bytes)
Linux version 2.6.18-53.1.4.el5 (mockbuild@builder6.centos.org) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Fri Nov 30 00:45:16 EST 2007
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000ca000 - 00000000000cc000 (reserved)
 BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001fef0000 (usable)
 BIOS-e820: 000000001fef0000 - 000000001feff000 (ACPI data)
 BIOS-e820: 000000001feff000 - 000000001ff00000 (ACPI NVS)
 BIOS-e820: 000000001ff00000 - 0000000020000000 (usable)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
512MB LOWMEM available.
found SMP MP-table at 000f6cd0
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
disabling kdump
Using x86 segment limits to approximate NX protection
DMI present.
Using APIC driver default
ACPI: PM-Timer IO Port: 0x1008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:4 APIC version 17
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 30000000 (gap: 20000000:dec00000)
Detected 2667.970 MHz processor.
Built 1 zonelists.  Total pages: 131072
Kernel command line: ro root=/dev/rootvg/rootfs rhgb divider=10 clocksource=pit console=tty0 console=ttyS0,9600n8
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c0743000 soft=c0723000
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 512028k/524288k available (2080k kernel code, 11584k reserved, 869k data, 220k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5351.41 BogoMIPS (lpj=2675707)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 14k freed
ACPI: Core revision 20060707
CPU0: Intel(R) Xeon(TM) CPU 2.66GHz stepping 08
Total of 1 processors activated (5351.41 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
Brought up 1 CPUs
checking if image is initramfs... it is
Freeing initrd memory: 3032k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at 0xfd9a0, last bus=1
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI quirk: region 1000-103f claimed by PIIX4 ACPI
PCI quirk: region 1040-104f claimed by PIIX4 SMB
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *9 10 11 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 10 *11 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 11 14 15) *0, disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 12 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
PCI: Bridge: 0000:00:01.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: disabled.
NET: Registered protocol family 2
IP route cache hash table entries: 16384 (order: 4, 65536 bytes)
TCP established hash table entries: 65536 (order: 7, 524288 bytes)
TCP bind hash table entries: 32768 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 65536 bind 32768)
TCP reno registered
Simple Boot Flag at 0x36 set to 0x80
apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac)
apm: overridden by ACPI.
audit: initializing netlink socket (disabled)
audit(1199372595.155:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Initializing Cryptographic API
ksign: Installing public key data
Loading keyring
- Added public key EE0941287449EA77
- User ID: CentOS (Kernel Module GPG key)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
Limiting direct PCI/PCI transfers.
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
ACPI: Processor [CPU0] (supports 8 throttling states)
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x1
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x2
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
ACPI: Getting cpuindex for acpiid 0x3
Real Time Clock Driver v1.12ac
Non-volatile memory driver v1.2
Linux agpgart interface v0.101 (c) Dave Jones
agpgart: Detected an Intel 440BX Chipset.
agpgart: AGP aperture is 256M @ 0x0
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:09: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:0a: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller at PCI slot 0000:00:07.1
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0x1050-0x1057, BIOS settings: hda:DMA, hdb:pio
hda: VMware Virtual IDE CDROM Drive, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
PNP: PS/2 Controller [PNP0303:KBC,PNP0f13:MOUS] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
TCP bic registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
Using IPI No-Shortcut mode
ACPI: (supports S0 S1 S4 S5)
Freeing unused kernel memory: 220k freed
Time: pit clocksource has been installed.
Write protecting the kernel read-only data: 388k
Red Hat nash version 5.1.19.6 starting
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Setting up hotplug.
input: AT Translated Set 2 keyboard as /class/input/input0
Creating block device nodes.
Loading uhci-hcd.ko module
USB Universal Host Controller Interface driver v3.0
Loading ohci-hcd.ko module
Loading ehci-hcd.ko module
Loading jbd.ko module
Loading ext3.ko module
Loading scsi_mod.ko module
SCSI subsystem initialized
Loading sd_mod.ko module
Loading scsi_transport_spi.ko module
Loading mptbase.ko module
Fusion MPT base driver 3.04.02-1vmw
Copyright (c) 1999-2005 LSI Logic Corporation
Loading mptscsih.ko module
Loading mptspi.ko module
Fusion MPT SPI Host driver 3.04.02-1vmw
ACPI: PCI Interrupt 0000:00:10.0[A] -> GSI 17 (level, low) -> IRQ 169
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
input: ImPS/2 Generic Wheel Mouse as /class/input/input1
scsi0 : ioc0: LSI53C1030, FwRev=00000000h, Ports=1, MaxQ=128, IRQ=169
  Vendor: VMware    Model: Virtual disk      Rev: 1.0 
  Type:   Direct-Access                      ANSI SCSI revision: 02
 target0:0:0: Beginning Domain Validation
 target0:0:0: Domain Validation skipping write tests
 target0:0:0: Ending Domain Validation
 target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
SCSI device sda: 33554432 512-byte hdwr sectors (17180 MB)
sda: test WP failed, assume Write Enabled
sda: cache data unavailable
sda: assuming drive cache: write through
SCSI device sda: 33554432 512-byte hdwr sectors (17180 MB)
sda: test WP failed, assume Write Enabled
sda: cache data unavailable
sda: assuming drive cache: write through
 sda: sda1 sda2
sd 0:0:0:0: Attached scsi disk sda
Loading libata.ko module
Loading ata_piix.ko module
Loading dm-mod.ko module
device-mapper: ioctl: 4.11.0-ioctl (2006-09-14) initialised: dm-devel@redhat.com
Loading dm-mirror.ko module
Loading dm-zero.ko module
Loading dm-snapshot.ko module
Waiting for driver initialization.
Scanning and configuring dmraid supported devices
Scanning logical volumes
BUG: soft lockup detected on CPU#0!
 [<c044d1ec>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c042a8b6>] __do_softirq+0x51/0xbb
 [<c0407461>] do_softirq+0x52/0x9d
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c0458c89>] get_page_from_freelist+0x295/0x310
 [<c04e5211>] copy_to_user+0x31/0x48
 [<c0458d5b>] __alloc_pages+0x57/0x282
 [<c04650ed>] anon_vma_prepare+0x11/0xa5
 [<c045fdb2>] __handle_mm_fault+0x3dd/0x87b
 [<c0477bdb>] sys_stat64+0x1e/0x23
 [<c06068fb>] do_page_fault+0x20a/0x4b8
 [<c06066f1>] do_page_fault+0x0/0x4b8
 [<c0405a71>] error_code+0x39/0x40
 =======================
BUG: soft lockup detected on CPU#0!
 [<c044d1ec>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c040ae5b>] verify_tsc_freq+0x0/0xf5
 [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c040ae5b>] verify_tsc_freq+0x0/0xf5
 [<c042dd72>] run_timer_softirq+0x14c/0x151
 [<c042a8bf>] __do_softirq+0x5a/0xbb
 [<c0407461>] do_softirq+0x52/0x9d
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c0458c89>] get_page_from_freelist+0x295/0x310
 [<c04e5211>] copy_to_user+0x31/0x48
 [<c0458d5b>] __alloc_pages+0x57/0x282
 [<c04650ed>] anon_vma_prepare+0x11/0xa5
 [<c045fdb2>] __handle_mm_fault+0x3dd/0x87b
 [<c0477bdb>] sys_stat64+0x1e/0x23
 [<c06068fb>] do_page_fault+0x20a/0x4b8
 [<c06066f1>] do_page_fault+0x0/0x4b8
 [<c0405a71>] error_code+0x39/0x40
 =======================
BUG: soft lockup detected on CPU#0!
 [<c044d1ec>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c040ae5b>] verify_tsc_freq+0x0/0xf5
 [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c040ae5b>] verify_tsc_freq+0x0/0xf5
 [<c042dd72>] run_timer_softirq+0x14c/0x151
 [<c042a8bf>] __do_softirq+0x5a/0xbb
 [<c0407461>] do_softirq+0x52/0x9d
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c0458c89>] get_page_from_freelist+0x295/0x310
 [<c04e5211>] copy_to_user+0x31/0x48
 [<c0458d5b>] __alloc_pages+0x57/0x282
 [<c04650ed>] anon_vma_prepare+0x11/0xa5
 [<c045fdb2>] __handle_mm_fault+0x3dd/0x87b
 [<c0477bdb>] sys_stat64+0x1e/0x23
 [<c06068fb>] do_page_fault+0x20a/0x4b8
 [<c06066f1>] do_page_fault+0x0/0x4b8
 [<c0405a71>] error_code+0x39/0x40
 =======================
BUG: soft lockup detected on CPU#0!
 [<c044d1ec>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c042a8b6>] __do_softirq+0x51/0xbb
 [<c0407461>] do_softirq+0x52/0x9d
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c046c20b>] drain_freelist+0x61/0x6a
 [<c046d4df>] cache_reap+0x9d/0x100
 [<c04332dc>] run_workqueue+0x78/0xb5
 [<c046d442>] cache_reap+0x0/0x100
 [<c0433b90>] worker_thread+0xd9/0x10d
 [<c04202b1>] default_wake_function+0x0/0xc
 [<c0433ab7>] worker_thread+0x0/0x10d
 [<c0435f65>] kthread+0xc0/0xeb
 [<c0435ea5>] kthread+0x0/0xeb
 [<c0405c3b>] kernel_thread_helper+0x7/0x10
 =======================
BUG: soft lockup detected on CPU#0!
 [<c044d1ec>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c044d40f>] handle_IRQ_event+0x17/0x49
 [<c044d4d4>] __do_IRQ+0x93/0xe8
 [<c04073f4>] do_IRQ+0x93/0xae
 [<c040592e>] common_interrupt+0x1a/0x20
 [<c042a8b6>] __do_softirq+0x51/0xbb
 [<c0407461>] do_softirq+0x52/0x9d
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c04772d0>] chrdev_show+0x18/0x4b
 [<c04a0d87>] devinfo_show+0x28/0x4d
 [<c048af58>] seq_read+0xe7/0x273
 [<c048ae71>] seq_read+0x0/0x273
 [<c0470365>] vfs_read+0x9f/0x141
 [<c04707b3>] sys_read+0x3c/0x63
 [<c0404eff>] syscall_call+0x7/0xb
 =======================
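
To compare the soft lockup reports in a capture like the one above, the traces can be split and tallied. This is a rough sketch that assumes the log format shown here ("BUG: soft lockup detected" followed by "[<address>] symbol+0xoffset/0xsize" lines); `split_lockup_traces` and `summarize` are hypothetical helper names, and SAMPLE is an abbreviated excerpt of two of the traces, not the full log.

```python
import re
from collections import Counter

LOCKUP_RE = re.compile(r"BUG: soft lockup detected on CPU#(\d+)!")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def split_lockup_traces(log_text):
    """Return one list of function names per soft lockup report, in log order."""
    traces = []
    current = None
    for line in log_text.splitlines():
        if LOCKUP_RE.search(line):
            current = []          # a new report starts here
            traces.append(current)
            continue
        if current is None:
            continue              # not inside a report
        frame = FRAME_RE.search(line)
        if frame:
            current.append(frame.group(1))
        else:
            current = None        # separator or unrelated line ends the report
    return traces


def summarize(log_text):
    """Return (number of reports, Counter of how many reports contain each function)."""
    traces = split_lockup_traces(log_text)
    counts = Counter(name for trace in traces for name in set(trace))
    return len(traces), counts


# Abbreviated excerpt from the attached log, for illustration only.
SAMPLE = """\
BUG: soft lockup detected on CPU#0!
 [<c044d1ec>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c040ae5b>] verify_tsc_freq+0x0/0xf5
 [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
 =======================
BUG: soft lockup detected on CPU#0!
 [<c044d1ec>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
 =======================
"""

total, counts = summarize(SAMPLE)
print(f"{total} soft lockup reports")
for name, seen in counts.most_common():
    print(f"{name}: in {seen} of {total} reports")
```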

smccl

2008-01-03 15:53

reporter   ~0006646

I am testing on the i686 arch and will only be able to use the errata kernel mentioned (2.6.18-53.1.4.el5). I have confirmed that using just divider=10 works with the expected results, and using just clocksource=pit works with the expected results.

I added a virtual serial port to the virtual machine and appended the bootup sequence, with call traces, to the bootup-vi.txt file. The "Alt-Sysrq-t" key sequence seemed ineffective even after enabling the functionality in /etc/sysctl.conf, but every other keystroke and key combination was ineffective as well during system startup when both clocksource and divider are set.

It may be worth noting that the virtual machine uses 100% of its available CPU (UP) the entire time. Eventually I just give up and perform a hard shutdown of the VM. I tried booting with clocksource=pit and divider=2, and eventually the system came up, but very slowly. Once it was up, remote ssh sessions had very poor response times and keystrokes were delayed (the system just seemed busy). Once logged in, I noticed that commands like date were reporting very erratic results. From one invocation of the command to the next (maybe 3 seconds apart), the clock would gain several hours or even a whole day.

Sorry, I can't continue testing too much unless it's on my own time. Maybe some this evening.

toracat

2008-01-03 16:20

manager   ~0006647

Chris,

With regard to the -62 kernel, booting with "divider=10" only works. With both "divider=10" and "clocksource=pit", it attempts to continue the boot process, but as smccl said, CPU usage shoots to 100% and the whole thing is practically "dead".

Akemi

clalance

2008-01-04 22:05

reporter   ~0006650

FYI, regarding the "clocksource=pit divider=10" bug: I've opened a Red Hat Bugzilla entry for it here:

https://bugzilla.redhat.com/show_bug.cgi?id=427588

I have a good idea of what the problem is, I just need to come up with an acceptable solution.

Chris Lalancette

2008-01-05 22:13

 

c51-i386-divider.png (36,709 bytes)

arrfab

2008-01-05 22:17

administrator   ~0006652

Just to add a note/comment to the (already long) list: I've tested CentOS 5.1 i386 with kernel 2.6.18-53.1.4.el5 and the divider=10 option.
I've attached the result (c51-i386-divider.png) in the 'Attached files' on this page.
It seems to work as expected. I only get a couple of extra lines at boot time, but the machine boots and everything seems OK afterwards (from dmesg):

CPU0: AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ stepping 03
Total of 1 processors activated (5639.48 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ... failed.
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ...<6>spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
<snip>
 works.
Brought up 1 CPUs

arrfab

2008-01-05 22:23

administrator   ~0006653

added a link to the uploaded file : http://bugs.centos.org/file_download.php?file_id=417&type=bug

mmclean

2008-01-11 07:12

reporter   ~0006683

With regard to the problem using both clocksource=pit and divider=10 at the same time, I was wondering if anyone has tried any of the other clocks? Specifically the acpi_pm clock, which manages to keep very good time on CentOS running on an ESX server.

Although the pit option doesn't drift into the future, we were still having issues with clock drift and overcorrection by the vmware-tools, so VMware support suggested using the pmtmr (acpi_pm in 2.6 kernels) clock instead, which has worked.

However, we still need the 100Hz option, otherwise the ESX server can't cope. So this new kernel parameter should do the trick, but I currently have no way of testing the effect of the divider option along with the acpi_pm clock.

delimiter

2008-01-11 19:36

reporter   ~0006689

Reporting success using divider=10 clocksource=acpi_pm
Using kernel 2.6.18-53.1.4.el5.centos.plus in a VM running on VMware ESX 3.0.2.
At rest, CPU usage drops from ~50MHz to ~12MHz. Will have to see whether clock drift is acceptable.

delimiter

2008-01-11 19:39

reporter   ~0006690

Er, that last note ate my tildes.
Usage dropped from about 50MHz (2.4%) to about 12MHz (0.64%)

smccl

2008-01-11 20:03

reporter   ~0006691

delimiter, are you using the time sync functionality of vmware-tools in combination with acpi_pm?

delimiter

2008-01-16 16:04

reporter   ~0006719

The clocksource=acpi_pm kernel option I specified earlier was misinformed. According to VMware's timekeeping whitepaper "pmtmr" and not acpi_pm would be the proper option... however since this is the default I have removed clocksource altogether.
smccl: yes, we are using tools.syncTime = "TRUE"
Again, this is on VMware ESX 3.0.2

mmclean

2008-01-17 06:27

reporter   ~0006722

delimiter, the VMware white paper is based on the old 2.4 kernel, and pmtmr does not exist in the new 2.6 kernel. Instead, acpi_pm has replaced it, and acpi_pm is not the default clocksource in the 2.6 kernel, which is why we specify it manually.

toracat

2008-01-24 16:35

manager   ~0006748

Chris,

I see that you have corrected the Release Notes. However, there is one more place in the x86_64 version of the Notes that still says "tick_divider" as in:

Using the tick_divider command-line argument ...

Akemi

mleonhardt

2008-03-28 13:14

reporter   ~0007071

Hi there,
We use the xaox repository to run our servers virtualized in VMware ESX 3.5. The precompiled kernels (thanks xaox!!) work very well. Does anyone know whether there is a repository with VMI support enabled as well?
The current Ubuntu kernels already have VMI enabled by default. It would be nice if this were the standard in CentOS as well.

Kind regards, and thanks for your help.
Matthias Leonhardt

JohnnyHughes

2008-03-31 07:51

administrator   ~0007075

CONFIG_VMI is not in any of the Kernels that are currently in RHEL-5 ... even the test kernels.

This was not added to the kernel tree until 2.6.21 and I do not see any patches anywhere that roll it back into any 2.6.18 kernels.

I do not think that CentOS will be creating a kernel that is outside the 2.6.18 tree to turn this on.

Here is a reference:
http://kerneltrap.org/node/14848

wizard113

2008-07-03 15:47

reporter   ~0007559

Using the 2.6.18-92.1.6 x86_64 kernel-vm, I am trying to determine why the only available clocksource is 'jiffies'. It seems that the arch/x86_64/kernel/time directory does not contain the same set of clocksources that the i386 directory does.

I see a (possible) patch for this at http://sr71.net/~jstultz/tod/broken-out/ but I am curious whether there was a reason why these clocksources (hpet, tsc) are not included in the CentOS x86_64 kernels.

The reason I ask is that I cannot get the x86_64 VMs to keep proper time using jiffies, and while I could go back to i386, I'd really rather stay with the x86_64 kernel.

garrettsmith

2008-09-25 17:50

reporter   ~0008041

http://kb.vmware.com/kb/1006427 lists the timekeeping best practices for a number of distributions.

toracat

2008-09-26 20:17

manager   ~0008048

Promising patches that would improve timekeeping were made available for RHEL-5 by VMware:

https://bugzilla.redhat.com/show_bug.cgi?id=463573

However, they will not appear until RHEL *5.4* :-(

tru

2008-10-10 12:28

administrator   ~0008110

Unless I made a mistake the first time, the recommended values have changed.

Before Sept 19th, the recommended values in the VMware KB were:
for RHEL-4 32 bits: "divider=10 clock=pit"
for RHEL-5 32 bits: "divider=10 clocksource=acpi_pm"
for RHEL-4 64 bits: "divider=10 clock=pit"
for RHEL-5 64 bits: "notsc divider=10"

As of Oct 10th, they are now (KB ID: 1006427, Last Modified Date: 09-22-2008):
for RHEL-4 32 bits: **CHANGED** "clock=pmtmr divider=10"
for RHEL-5 32 bits: (unchanged) "divider=10 clocksource=acpi_pm"
for RHEL-4 64 bits: **CHANGED** "notsc divider=10"
for RHEL-5 64 bits: (unchanged) "notsc divider=10"
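
The current values can be kept as a small lookup when preparing guests. This is only a sketch: the table copies the Oct 10th values listed above, `kernel_line` is a hypothetical helper, and the base command line is a placeholder that must be replaced with the guest's real one.

```python
# Recommended options as listed above (VMware KB 1006427, as read on Oct 10th 2008).
RECOMMENDED = {
    ("RHEL-4", 32): "clock=pmtmr divider=10",
    ("RHEL-5", 32): "divider=10 clocksource=acpi_pm",
    ("RHEL-4", 64): "notsc divider=10",
    ("RHEL-5", 64): "notsc divider=10",
}


def kernel_line(release, bits, base="ro root=LABEL=/"):
    """Append the recommended timekeeping options to a base kernel command line.

    'base' is a placeholder default; pass the guest's actual boot parameters.
    """
    try:
        extra = RECOMMENDED[(release, bits)]
    except KeyError:
        raise ValueError(f"no recommendation recorded for {release} {bits}-bit") from None
    return f"{base} {extra}"


print(kernel_line("RHEL-5", 32))
```

The KB has already changed once, so the table should be checked against the current KB text before use.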

The new VMware guests will reflect the changes in the next release.

tru

2008-10-29 14:56

administrator   ~0008207

http://kb.vmware.com/kb/1007020 links to RHSA-2008:0519 (kernel-2.6.18-92.1.6.el5.src.rpm)

Evolution

2014-03-05 20:31

administrator   ~0019471

closed due to inactivity. Please re-open if the problem exists with new versions.

Issue History

Date Modified Username Field Change
2007-07-04 21:29 larstr New Issue
2007-07-04 21:29 larstr Status new => assigned
2007-07-05 16:54 smooge Note Added: 0005524
2007-07-09 06:15 larstr Note Added: 0005539
2007-07-09 16:50 toracat Note Added: 0005543
2007-07-12 11:44 larstr Note Added: 0005556
2007-07-12 12:07 toracat Note Added: 0005557
2007-07-13 23:56 larstr File Added: centos-cpuload.png
2007-07-13 23:57 larstr File Added: debian-cpuload.png
2007-07-14 00:00 larstr Note Added: 0005567
2007-07-14 01:07 toracat Note Added: 0005568
2007-07-14 01:38 larstr Note Added: 0005569
2007-07-14 01:51 toracat Note Added: 0005570
2007-07-14 07:35 larstr Note Added: 0005571
2007-07-14 07:39 toracat Note Added: 0005572
2007-07-14 20:04 toracat Note Added: 0005573
2007-07-14 20:26 toracat Note Edited: 0005573
2007-07-14 21:24 larstr File Added: centos-cpuload-up-100hz.png
2007-07-14 21:29 larstr Note Added: 0005574
2007-07-14 21:34 toracat Note Added: 0005575
2007-07-15 04:23 toracat Note Added: 0005577
2007-07-15 11:51 toracat Note Added: 0005578
2007-07-16 12:42 larstr Note Added: 0005581
2007-07-16 14:19 toracat Note Added: 0005583
2007-07-16 14:20 toracat Note Edited: 0005583
2007-07-16 19:34 larstr Note Added: 0005587
2007-07-16 19:53 toracat Note Added: 0005588
2007-07-18 09:37 toracat Note Added: 0005601
2007-07-18 11:35 larstr Note Added: 0005602
2007-07-18 12:14 toracat Note Added: 0005603
2007-07-18 13:45 xaox Note Added: 0005608
2007-07-19 17:07 toracat Note Added: 0005618
2007-07-19 17:53 xaox Note Added: 0005619
2007-07-19 18:12 toracat Note Added: 0005620
2007-07-20 18:31 larstr Note Edited: 0005602
2007-07-20 19:06 xaox Note Added: 0005625
2007-07-27 18:40 toracat Note Added: 0005750
2007-08-01 19:14 toracat Note Edited: 0005750
2007-08-02 20:49 Phil Schaffner Note Added: 0005801
2007-08-02 21:07 JohnnyHughes Note Added: 0005802
2007-08-02 21:08 JohnnyHughes Status assigned => resolved
2007-08-02 21:08 JohnnyHughes Resolution open => fixed
2007-08-02 21:09 JohnnyHughes Note Edited: 0005802
2007-09-08 19:34 kbsingh@karan.org Relationship added has duplicate 0002320
2007-09-08 19:36 kbsingh@karan.org Note Added: 0005968
2007-09-08 19:36 kbsingh@karan.org Status resolved => assigned
2007-09-11 17:01 kbsingh@karan.org Note Added: 0005976
2007-09-23 11:19 larstr Note Added: 0006046
2007-10-10 20:31 segedunum Note Added: 0006113
2007-10-10 20:49 toracat Note Added: 0006114
2007-11-05 14:09 jase99 Note Added: 0006240
2007-11-05 14:09 danieldk Relationship added duplicate of 0001680
2007-11-05 18:07 toracat Note Added: 0006246
2007-11-07 09:43 segedunum Note Added: 0006249
2007-11-07 10:23 segedunum Note Edited: 0006249
2007-11-07 11:10 segedunum Note Edited: 0006249
2007-11-09 00:21 toracat Note Added: 0006254
2007-11-11 14:16 toracat File Added: 2007Nov10.png
2007-11-11 14:28 toracat Note Added: 0006264
2007-11-15 23:16 toracat File Added: C5_i386_Nov15.jpg
2007-11-15 23:19 toracat File Added: C5_x86_64Nov15.jpg
2007-11-15 23:38 toracat Note Added: 0006316
2007-11-18 15:58 toracat Note Added: 0006338
2007-12-06 16:44 smccl Note Added: 0006496
2007-12-06 16:46 smccl Note Added: 0006497
2007-12-06 21:00 JohnnyHughes Note Edited: 0006496
2007-12-06 21:17 JohnnyHughes Note Added: 0006499
2007-12-06 21:26 smccl Note Added: 0006500
2007-12-07 17:58 toracat Note Added: 0006505
2007-12-07 18:24 smccl Note Added: 0006506
2007-12-07 21:16 tru Note Added: 0006508
2007-12-07 23:40 tru Note Added: 0006510
2007-12-07 23:46 toracat Note Added: 0006511
2007-12-08 09:32 toracat Note Added: 0006513
2007-12-08 18:14 toracat File Added: 53.1.4i386.png
2007-12-08 18:15 toracat File Added: 53.1.4x86_64.png
2007-12-08 18:19 toracat Note Added: 0006514
2007-12-09 16:50 toracat Note Added: 0006523
2007-12-10 00:22 larstr File Added: 2.6.18-53.i686-esx-xeon-15-2-8.png
2007-12-10 00:29 larstr Note Added: 0006525
2007-12-10 00:50 toracat Note Added: 0006526
2007-12-10 16:06 smccl Note Added: 0006532
2007-12-10 16:32 toracat Note Added: 0006533
2007-12-10 16:39 toracat Note Added: 0006534
2007-12-23 21:50 toracat Note Added: 0006603
2008-01-02 18:07 clalance Note Added: 0006629
2008-01-02 19:23 toracat Note Added: 0006630
2008-01-02 19:36 clalance Note Added: 0006631
2008-01-02 19:45 toracat Note Added: 0006632
2008-01-02 21:21 clalance Note Added: 0006633
2008-01-02 21:39 toracat Note Added: 0006634
2008-01-02 22:54 smccl Note Added: 0006636
2008-01-03 00:16 toracat Note Added: 0006640
2008-01-03 00:51 toracat File Added: divider10_i686_Jan022007.png
2008-01-03 00:52 toracat Note Added: 0006641
2008-01-03 11:47 JohnnyHughes Note Added: 0006643
2008-01-03 12:07 JohnnyHughes Note Added: 0006644
2008-01-03 12:17 JohnnyHughes Note Edited: 0006640
2008-01-03 12:20 JohnnyHughes Note Edited: 0006641
2008-01-03 14:03 clalance Note Added: 0006645
2008-01-03 15:38 smccl File Added: bootup-vi.txt
2008-01-03 15:53 smccl Note Added: 0006646
2008-01-03 16:20 toracat Note Added: 0006647
2008-01-04 22:05 clalance Note Added: 0006650
2008-01-05 22:13 arrfab File Added: c51-i386-divider.png
2008-01-05 22:17 arrfab Note Added: 0006652
2008-01-05 22:23 arrfab Note Added: 0006653
2008-01-11 07:12 mmclean Note Added: 0006683
2008-01-11 19:36 delimiter Note Added: 0006689
2008-01-11 19:39 delimiter Note Added: 0006690
2008-01-11 20:03 smccl Note Added: 0006691
2008-01-16 16:04 delimiter Note Added: 0006719
2008-01-17 06:27 mmclean Note Added: 0006722
2008-01-24 16:35 toracat Note Added: 0006748
2008-03-28 13:14 mleonhardt Note Added: 0007071
2008-03-31 07:51 JohnnyHughes Note Added: 0007075
2008-07-03 15:47 wizard113 Note Added: 0007559
2008-09-25 17:50 garrettsmith Note Added: 0008041
2008-09-26 20:17 toracat Note Added: 0008048
2008-10-10 12:28 tru Note Added: 0008110
2008-10-29 14:56 tru Note Added: 0008207
2014-03-05 20:31 Evolution Note Added: 0019471
2014-03-05 20:31 Evolution Status assigned => closed