2017-06-28 15:37 UTC

ID: 0002189
Project: CentOS-5
Category: kernel
View Status: public
Last Update: 2014-03-05 20:31
Reporter: larstr
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 5.0
Target Version:
Fixed in Version:
Summary: 0002189: CentOS is not getting optimal performance in a virtualized environment and on slow cpus
Description: In 2.4 kernels the system timer was normally clocked at 100 Hz, while in 2.6 the default system timer is set to 1000 Hz (some other distros do not follow these "rules", and USER_HZ is still 100). 1000 Hz is definitely a good thing for desktop computers requiring fast interactive responses, but there are environments where it causes bad side effects.

With kernels compiled for SMP, the system timer requests twice as many interrupts when running on a single cpu, and 2.5 times as many when running on a dual cpu system.

One might argue that SMP kernels have better threading than unicpu kernels, but using these kernels on unicpu systems has other negative effects.

Some cpus can't keep up with this interrupt rate. The 2.6 kernel has code to detect this, but it can't always correct for lost ticks, and an interrupt rate this high also hurts performance. These negative effects have been observed in virtual environments using VMware products (ESX, Server, Workstation, Fusion, ACE and Player), but they are also a potential problem on physical systems with slow cpus such as the Geode, even though the clock issues aren't as bad on physical systems, because detecting lost ticks is more predictable on a physical system than on a virtual one.

In a virtual environment, a key indicator that these systems are not properly set up is an idle guest system (as indicated by tools inside the guest) while the host reports that this guest is using a lot of cpu (typically 20-30%). Another indicator is that the clock inside the guest does not keep up with real time. On newer cpus these effects are less visible than on old cpus (for example a Pentium 3 at 500-1000MHz), but even on newer cpus you will not be able to scale as well because of these issues, resulting in fewer guest systems per server host.

It would be a great benefit if a 100 Hz unicpu kernel were made available in one of the CentOS repositories. There is already a 100 Hz kernel repository available (http://vmware.xaox.net/centos/), but it only contains SMP kernels.
Additional Information: Related documentation:
http://www.vmware.com/pdf/vsmp_best_practices.pdf
http://www.vmware.com/pdf/vmware_timekeeping.pdf
http://www.vmware.com/support/kb/enduser/std_adp.php?p_faqid=1420
http://kb.vmware.com/kb/j1730
http://www.vmware.com/pdf/vi_performance_tuning.pdf
http://www.vmware.com/community/thread.jspa?threadID=88879&tstart=0
http://www.vmware.com/community/message.jspa?messageID=540949
Tags: No tags attached.
Attached Files
  • centos-cpuload.png (95,657 bytes) 2007-07-13 23:56
  • debian-cpuload.png (64,032 bytes) 2007-07-13 23:57
  • centos-cpuload-up-100hz.png (79,737 bytes) 2007-07-14 21:24
  • 2007Nov10.png (26,478 bytes) 2007-11-11 14:16
  • C5_i386_Nov15.jpg (54,963 bytes) 2007-11-15 23:16
  • C5_x86_64Nov15.jpg (56,001 bytes) 2007-11-15 23:19
  • 53.1.4i386.png (30,921 bytes) 2007-12-08 18:14
  • 53.1.4x86_64.png (30,880 bytes) 2007-12-08 18:15
  • 2.6.18-53.i686-esx-xeon-15-2-8.png (57,973 bytes) 2007-12-10 00:22
  • divider10_i686_Jan022007.png (28,267 bytes) 2008-01-03 00:51
  • bootup-vi.txt (13,801 bytes) 2008-01-03 15:38
    Linux version 2.6.18-53.1.4.el5 (mockbuild@builder6.centos.org) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Fri Nov 30 00:45:16 EST 2007
    BIOS-provided physical RAM map:
     BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
     BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
     BIOS-e820: 00000000000ca000 - 00000000000cc000 (reserved)
     BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
     BIOS-e820: 0000000000100000 - 000000001fef0000 (usable)
     BIOS-e820: 000000001fef0000 - 000000001feff000 (ACPI data)
     BIOS-e820: 000000001feff000 - 000000001ff00000 (ACPI NVS)
     BIOS-e820: 000000001ff00000 - 0000000020000000 (usable)
     BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
     BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
     BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)
    0MB HIGHMEM available.
    512MB LOWMEM available.
    found SMP MP-table at 000f6cd0
    Memory for crash kernel (0x0 to 0x0) notwithin permissible range
    disabling kdump
    Using x86 segment limits to approximate NX protection
    DMI present.
    Using APIC driver default
    ACPI: PM-Timer IO Port: 0x1008
    ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
    Processor #0 15:4 APIC version 17
    ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
    ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
    IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
    ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
    Enabling APIC mode:  Flat.  Using 1 I/O APICs
    Using ACPI (MADT) for SMP configuration information
    Allocating PCI resources starting at 30000000 (gap: 20000000:dec00000)
    Detected 2667.970 MHz processor.
    Built 1 zonelists.  Total pages: 131072
    Kernel command line: ro root=/dev/rootvg/rootfs rhgb divider=10 clocksource=pit console=tty0 console=ttyS0,9600n8
    Enabling fast FPU save and restore... done.
    Enabling unmasked SIMD FPU exception support... done.
    Initializing CPU#0
    CPU 0 irqstacks, hard=c0743000 soft=c0723000
    PID hash table entries: 4096 (order: 12, 16384 bytes)
    Console: colour VGA+ 80x25
    Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
    Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
    Memory: 512028k/524288k available (2080k kernel code, 11584k reserved, 869k data, 220k init, 0k highmem)
    Checking if this processor honours the WP bit even in supervisor mode... Ok.
    Calibrating delay using timer specific routine.. 5351.41 BogoMIPS (lpj=2675707)
    Security Framework v1.0.0 initialized
    SELinux:  Initializing.
    selinux_register_security:  Registering secondary module capability
    Capability LSM initialized as secondary
    Mount-cache hash table entries: 512
    CPU: Trace cache: 12K uops, L1 D cache: 16K
    CPU: L2 cache: 1024K
    Intel machine check architecture supported.
    Intel machine check reporting enabled on CPU#0.
    Checking 'hlt' instruction... OK.
    SMP alternatives: switching to UP code
    Freeing SMP alternatives: 14k freed
    ACPI: Core revision 20060707
    CPU0: Intel(R) Xeon(TM) CPU 2.66GHz stepping 08
    Total of 1 processors activated (5351.41 BogoMIPS).
    ENABLING IO-APIC IRQs
    ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
    Brought up 1 CPUs
    checking if image is initramfs... it is
    Freeing initrd memory: 3032k freed
    NET: Registered protocol family 16
    ACPI: bus type pci registered
    PCI: PCI BIOS revision 2.10 entry at 0xfd9a0, last bus=1
    PCI: Using configuration type 1
    Setting up standard PCI resources
    ACPI: Interpreter enabled
    ACPI: Using IOAPIC for interrupt routing
    ACPI: PCI Root Bridge [PCI0] (0000:00)
    PCI quirk: region 1000-103f claimed by PIIX4 ACPI
    PCI quirk: region 1040-104f claimed by PIIX4 SMB
    ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 11 14 15) *0, disabled.
    ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *9 10 11 14 15)
    ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 10 *11 14 15)
    ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 11 14 15) *0, disabled.
    Linux Plug and Play Support v0.97 (c) Adam Belay
    pnp: PnP ACPI init
    pnp: PnP ACPI: found 12 devices
    usbcore: registered new driver usbfs
    usbcore: registered new driver hub
    PCI: Using ACPI for IRQ routing
    PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
    NetLabel: Initializing
    NetLabel:  domain hash size = 128
    NetLabel:  protocols = UNLABELED CIPSOv4
    NetLabel:  unlabeled traffic allowed by default
    PCI: Bridge: 0000:00:01.0
      IO window: disabled.
      MEM window: disabled.
      PREFETCH window: disabled.
    NET: Registered protocol family 2
    IP route cache hash table entries: 16384 (order: 4, 65536 bytes)
    TCP established hash table entries: 65536 (order: 7, 524288 bytes)
    TCP bind hash table entries: 32768 (order: 6, 262144 bytes)
    TCP: Hash tables configured (established 65536 bind 32768)
    TCP reno registered
    Simple Boot Flag at 0x36 set to 0x80
    apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac)
    apm: overridden by ACPI.
    audit: initializing netlink socket (disabled)
    audit(1199372595.155:1): initialized
    Total HugeTLB memory allocated, 0
    VFS: Disk quotas dquot_6.5.1
    Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
    Initializing Cryptographic API
    ksign: Installing public key data
    Loading keyring
    - Added public key EE0941287449EA77
    - User ID: CentOS (Kernel Module GPG key)
    io scheduler noop registered
    io scheduler anticipatory registered
    io scheduler deadline registered
    io scheduler cfq registered (default)
    Limiting direct PCI/PCI transfers.
    pci_hotplug: PCI Hot Plug PCI Core version: 0.5
    ACPI: Processor [CPU0] (supports 8 throttling states)
    ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
    ACPI: Getting cpuindex for acpiid 0x1
    ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
    ACPI: Getting cpuindex for acpiid 0x2
    ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not present [20060707]
    ACPI: Getting cpuindex for acpiid 0x3
    Real Time Clock Driver v1.12ac
    Non-volatile memory driver v1.2
    Linux agpgart interface v0.101 (c) Dave Jones
    agpgart: Detected an Intel 440BX Chipset.
    agpgart: AGP aperture is 256M @ 0x0
    Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
    serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
    serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
    00:09: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
    00:0a: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
    RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize
    Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
    ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
    PIIX4: IDE controller at PCI slot 0000:00:07.1
    PIIX4: chipset revision 1
    PIIX4: not 100% native mode: will probe irqs later
        ide0: BM-DMA at 0x1050-0x1057, BIOS settings: hda:DMA, hdb:pio
    hda: VMware Virtual IDE CDROM Drive, ATAPI CD/DVD-ROM drive
    ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
    ide-floppy driver 0.99.newide
    usbcore: registered new driver hiddev
    usbcore: registered new driver usbhid
    drivers/usb/input/hid-core.c: v2.6:USB HID core driver
    PNP: PS/2 Controller [PNP0303:KBC,PNP0f13:MOUS] at 0x60,0x64 irq 1,12
    serio: i8042 KBD port at 0x60,0x64 irq 1
    serio: i8042 AUX port at 0x60,0x64 irq 12
    mice: PS/2 mouse device common for all mice
    md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
    md: bitmap version 4.39
    TCP bic registered
    Initializing IPsec netlink socket
    NET: Registered protocol family 1
    NET: Registered protocol family 17
    Using IPI No-Shortcut mode
    ACPI: (supports S0 S1 S4 S5)
    Freeing unused kernel memory: 220k freed
    Time: pit clocksource has been installed.
    Write protecting the kernel read-only data: 388k
    Red Hat nash version 5.1.19.6 starting
    Mounting proc filesystem
    Mounting sysfs filesystem
    Creating /dev
    Creating initial device nodes
    Setting up hotplug.
    input: AT Translated Set 2 keyboard as /class/input/input0
    Creating block device nodes.
    Loading uhci-hcd.ko module
    USB Universal Host Controller Interface driver v3.0
    Loading ohci-hcd.ko module
    Loading ehci-hcd.ko module
    Loading jbd.ko module
    Loading ext3.ko module
    Loading scsi_mod.ko module
    SCSI subsystem initialized
    Loading sd_mod.ko module
    Loading scsi_transport_spi.ko module
    Loading mptbase.ko module
    Fusion MPT base driver 3.04.02-1vmw
    Copyright (c) 1999-2005 LSI Logic Corporation
    Loading mptscsih.ko module
    Loading mptspi.ko module
    Fusion MPT SPI Host driver 3.04.02-1vmw
    ACPI: PCI Interrupt 0000:00:10.0[A] -> GSI 17 (level, low) -> IRQ 169
    mptbase: Initiating ioc0 bringup
    ioc0: 53C1030: Capabilities={Initiator}
    input: ImPS/2 Generic Wheel Mouse as /class/input/input1
    scsi0 : ioc0: LSI53C1030, FwRev=00000000h, Ports=1, MaxQ=128, IRQ=169
      Vendor: VMware    Model: Virtual disk      Rev: 1.0 
      Type:   Direct-Access                      ANSI SCSI revision: 02
     target0:0:0: Beginning Domain Validation
     target0:0:0: Domain Validation skipping write tests
     target0:0:0: Ending Domain Validation
     target0:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
    SCSI device sda: 33554432 512-byte hdwr sectors (17180 MB)
    sda: test WP failed, assume Write Enabled
    sda: cache data unavailable
    sda: assuming drive cache: write through
    SCSI device sda: 33554432 512-byte hdwr sectors (17180 MB)
    sda: test WP failed, assume Write Enabled
    sda: cache data unavailable
    sda: assuming drive cache: write through
     sda: sda1 sda2
    sd 0:0:0:0: Attached scsi disk sda
    Loading libata.ko module
    Loading ata_piix.ko module
    Loading dm-mod.ko module
    device-mapper: ioctl: 4.11.0-ioctl (2006-09-14) initialised: dm-devel@redhat.com
    Loading dm-mirror.ko module
    Loading dm-zero.ko module
    Loading dm-snapshot.ko module
    Waiting for driver initialization.
    Scanning and configuring dmraid supported devices
    Scanning logical volumes
    BUG: soft lockup detected on CPU#0!
     [<c044d1ec>] softlockup_tick+0x96/0xa4
     [<c042ddb0>] update_process_times+0x39/0x5c
     [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
     [<c04059bf>] apic_timer_interrupt+0x1f/0x24
     [<c042a8b6>] __do_softirq+0x51/0xbb
     [<c0407461>] do_softirq+0x52/0x9d
     [<c04059bf>] apic_timer_interrupt+0x1f/0x24
     [<c0458c89>] get_page_from_freelist+0x295/0x310
     [<c04e5211>] copy_to_user+0x31/0x48
     [<c0458d5b>] __alloc_pages+0x57/0x282
     [<c04650ed>] anon_vma_prepare+0x11/0xa5
     [<c045fdb2>] __handle_mm_fault+0x3dd/0x87b
     [<c0477bdb>] sys_stat64+0x1e/0x23
     [<c06068fb>] do_page_fault+0x20a/0x4b8
     [<c06066f1>] do_page_fault+0x0/0x4b8
     [<c0405a71>] error_code+0x39/0x40
     =======================
    BUG: soft lockup detected on CPU#0!
     [<c044d1ec>] softlockup_tick+0x96/0xa4
     [<c042ddb0>] update_process_times+0x39/0x5c
     [<c040ae5b>] verify_tsc_freq+0x0/0xf5
     [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
     [<c04059bf>] apic_timer_interrupt+0x1f/0x24
     [<c040ae5b>] verify_tsc_freq+0x0/0xf5
     [<c042dd72>] run_timer_softirq+0x14c/0x151
     [<c042a8bf>] __do_softirq+0x5a/0xbb
     [<c0407461>] do_softirq+0x52/0x9d
     [<c04059bf>] apic_timer_interrupt+0x1f/0x24
     [<c0458c89>] get_page_from_freelist+0x295/0x310
     [<c04e5211>] copy_to_user+0x31/0x48
     [<c0458d5b>] __alloc_pages+0x57/0x282
     [<c04650ed>] anon_vma_prepare+0x11/0xa5
     [<c045fdb2>] __handle_mm_fault+0x3dd/0x87b
     [<c0477bdb>] sys_stat64+0x1e/0x23
     [<c06068fb>] do_page_fault+0x20a/0x4b8
     [<c06066f1>] do_page_fault+0x0/0x4b8
     [<c0405a71>] error_code+0x39/0x40
     =======================
    BUG: soft lockup detected on CPU#0!
     [<c044d1ec>] softlockup_tick+0x96/0xa4
     [<c042ddb0>] update_process_times+0x39/0x5c
     [<c040ae5b>] verify_tsc_freq+0x0/0xf5
     [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
     [<c04059bf>] apic_timer_interrupt+0x1f/0x24
     [<c040ae5b>] verify_tsc_freq+0x0/0xf5
     [<c042dd72>] run_timer_softirq+0x14c/0x151
     [<c042a8bf>] __do_softirq+0x5a/0xbb
     [<c0407461>] do_softirq+0x52/0x9d
     [<c04059bf>] apic_timer_interrupt+0x1f/0x24
     [<c0458c89>] get_page_from_freelist+0x295/0x310
     [<c04e5211>] copy_to_user+0x31/0x48
     [<c0458d5b>] __alloc_pages+0x57/0x282
     [<c04650ed>] anon_vma_prepare+0x11/0xa5
     [<c045fdb2>] __handle_mm_fault+0x3dd/0x87b
     [<c0477bdb>] sys_stat64+0x1e/0x23
     [<c06068fb>] do_page_fault+0x20a/0x4b8
     [<c06066f1>] do_page_fault+0x0/0x4b8
     [<c0405a71>] error_code+0x39/0x40
     =======================
    BUG: soft lockup detected on CPU#0!
     [<c044d1ec>] softlockup_tick+0x96/0xa4
     [<c042ddb0>] update_process_times+0x39/0x5c
     [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
     [<c04059bf>] apic_timer_interrupt+0x1f/0x24
     [<c042a8b6>] __do_softirq+0x51/0xbb
     [<c0407461>] do_softirq+0x52/0x9d
     [<c04059bf>] apic_timer_interrupt+0x1f/0x24
     [<c046c20b>] drain_freelist+0x61/0x6a
     [<c046d4df>] cache_reap+0x9d/0x100
     [<c04332dc>] run_workqueue+0x78/0xb5
     [<c046d442>] cache_reap+0x0/0x100
     [<c0433b90>] worker_thread+0xd9/0x10d
     [<c04202b1>] default_wake_function+0x0/0xc
     [<c0433ab7>] worker_thread+0x0/0x10d
     [<c0435f65>] kthread+0xc0/0xeb
     [<c0435ea5>] kthread+0x0/0xeb
     [<c0405c3b>] kernel_thread_helper+0x7/0x10
     =======================
    BUG: soft lockup detected on CPU#0!
     [<c044d1ec>] softlockup_tick+0x96/0xa4
     [<c042ddb0>] update_process_times+0x39/0x5c
     [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
     [<c04059bf>] apic_timer_interrupt+0x1f/0x24
     [<c044d40f>] handle_IRQ_event+0x17/0x49
     [<c044d4d4>] __do_IRQ+0x93/0xe8
     [<c04073f4>] do_IRQ+0x93/0xae
     [<c040592e>] common_interrupt+0x1a/0x20
     [<c042a8b6>] __do_softirq+0x51/0xbb
     [<c0407461>] do_softirq+0x52/0x9d
     [<c04059bf>] apic_timer_interrupt+0x1f/0x24
     [<c04772d0>] chrdev_show+0x18/0x4b
     [<c04a0d87>] devinfo_show+0x28/0x4d
     [<c048af58>] seq_read+0xe7/0x273
     [<c048ae71>] seq_read+0x0/0x273
     [<c0470365>] vfs_read+0x9f/0x141
     [<c04707b3>] sys_read+0x3c/0x63
     [<c0404eff>] syscall_call+0x7/0xb
     =======================
    
  • c51-i386-divider.png (36,709 bytes) 2008-01-05 22:13

Relationships
duplicate of 0001680 (closed, JohnnyHughes), CentOS-4: Provide kernel with low interrupt timer for use in VMware
has duplicate 0002320 (closed, kbsingh@karan.org), CentOS-5: Follow-up to Bug#2189 - Getting optimal performance in a virtualized environment

Notes

~0005524

smooge (reporter)

I think we were looking at doing this for the CentOS Plus repository. The odd issue is that some systems work better at 250Mhz (which I think 2.6.18 is set to), some work better at 100Mhz, and some at 1000.

~0005539

larstr (reporter)

Yes, 2.6.18 is indeed set to 250Hz by default (not MHz ;)), and since it's an SMP kernel it requests 750 clock interrupts per second.
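
One way to check the interrupt rate a running system actually sees is to sample /proc/interrupts twice and divide the difference by the interval. The following is only a minimal sketch: the function names are made up for illustration, and it assumes an x86 system where the row labelled "0" is the PIT timer and "LOC" is the local APIC timer.

```python
import time


def parse_interrupts(text):
    """Return {label: total count across CPUs} for a /proc/interrupts dump."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        counts = []
        for token in rest.split()[:ncpu]:
            if not token.isdigit():
                break  # reached the textual description of the IRQ
            counts.append(int(token))
        if counts:
            totals[label.strip()] = sum(counts)
    return totals


def timer_rate(before, after, seconds, labels=("0", "LOC")):
    """Interrupts per second for the assumed timer rows between two snapshots."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in labels
        if name in before and name in after
    }


def sample(seconds=5.0, path="/proc/interrupts"):
    """Take two snapshots `seconds` apart and return the timer rates."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return timer_rate(before, after, seconds)
```

Calling sample() inside an idle guest gives the per-second rates to compare against the configured HZ value.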

~0005543

toracat (manager)

If I'm not mistaken, 2.6.18 defaults to 1000Hz.

# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000

I understand this is Linus Torvalds' choice.

~0005556

larstr (reporter)

The official 2.6.18 linux kernel defaults to 1000 Hz, but the CentOS kernel seems to default to 250.
/usr/src/kernels/2.6.18-8.el5-i686/kernel/Kconfig.hz:

choice
        prompt "Timer frequency"
        default HZ_250

Having the kernel at 1000Hz is good for physical systems, and especially desktop systems. In a virtual system this has negative side effects. We have also seen similar effects in computers with slow cpus such as Soekris with Geode cpu (133MHz "486").

~0005557

toracat (manager)

Well, the Kconfig.hz file is referred to when configuring .config with make menuconfig (or make xconfig). In fact, if you go to the bottom of that file, it has these lines:

config HZ
        int
        default 100 if HZ_100
        default 250 if HZ_250
        default 1000 if HZ_1000

In the CentOS kernel (and mainline kernel), "CONFIG_HZ_1000=y" is defined, therefore the timer frequency becomes 1000.

Hope this clears things up a bit.

Akemi
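
To see which value a given build ended up with, the resulting config can be inspected directly rather than Kconfig.hz. A small sketch, assuming the config text comes from somewhere like /boot/config-<kernel release> (that path and the function names are assumptions for illustration, not something stated in this report):

```python
import re


def config_hz(config_text):
    """Return the integer CONFIG_HZ value from kernel .config text, or None."""
    match = re.search(r"^CONFIG_HZ=(\d+)\s*$", config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def selected_hz_choices(config_text):
    """Return the CONFIG_HZ_<n> choice symbols that are set to 'y'."""
    return re.findall(r"^(CONFIG_HZ_\d+)=y\s*$", config_text, re.MULTILINE)


# The lines quoted earlier in these notes for the standard kernel.
example = """\
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
"""

print(config_hz(example), selected_hz_choices(example))
```

Lines of the form "# ... is not set" are comments and are deliberately ignored, so only the choice that was actually enabled is reported.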

~0005567

larstr (reporter)

Ok, I believe you Akemi, even though smooge got me confuzed and investigating the 250Hz option for a second there. ;)

Having Hz=1000 and an SMP-enabled kernel is still a problem in a virtualized environment. Having installed a minimal CentOS with the default kernel in ESX3 on a system with 8x Xeon MP 1900MHz cpus, we see the following: inside the VM the load is near 0, while outside the VM we can see that it's using a lot of resources. If we compare it to another virtual machine running Debian and a 2.4 kernel, we can see how much they differ in the load generated. The Debian VM is a LAMP setup in production, but currently idle most of the time. I've now uploaded a cpu graph for both so we can get a clearer view of how CentOS is behaving.

Lars

~0005568

toracat (manager)

Is there any chance you can rebuild the kernel yourself with the desired options? If not, are you willing to run performance tests if the kernel is provided?

Akemi

~0005569

larstr (reporter)

If a kernel was provided I would of course test its performance.

I could probably build such a kernel myself too (even though that is not something I've done too many times). I believe such a kernel would benefit the popularity of CentOS in virtualized environments.

I filed this bug report after asking about this in #centos and I was told that this question was brought up from time to time and it could be a good idea to provide such a kernel in the repository.

Lars

~0005570

toracat (manager)

Which arch (i686 or x86_64) would you like to test? I am myself thinking of doing a test.

I totally agree with you. If the optimized kernels are available from CentOS, that would benefit a number of users. However, this would depend on the time and resources the CentOS team can afford. Positive test results from multiple users might help them determine if the whole thing is worth the effort.

Akemi

~0005571

larstr (reporter)

I would like to test i686 first. Thanks.

~0005572

toracat (manager)

I currently have x86_64 only. Will build i686 soon.

Akemi

~0005573

toracat (manager)

Last edited: 2007-07-14 20:26

Lars,

I now have both i686 and x86_64 (UP, 100Hz) kernel and kernel-devel rpm's. Send me e-mail (amyagi at gmail dot com) for download instructions.

Akemi

Note added: Just booted the i686 version in my VM. So far so good.

~0005574

larstr (reporter)

Wow, that really did it, Akemi! :)

The cpu load is now reduced from ~15% to 3% and %READY is reduced from ~4% to 0.4%.

The remaining cpu load is probably caused by the default installed services. I've uploaded another graph showing the load before, during and after the kernel change, and you can really see how the load changes.

Lars

~0005575

toracat (manager)

That's really good news, Lars. The graph is impressive.

Akemi

~0005577

toracat (manager)

Just wanted to add one thing. smooge's note is not really incorrect. I noticed a few minutes ago that the xen kernel is set to 250Hz. It is the standard kernel that uses 1000Hz by default.

Akemi

~0005578

toracat (manager)

Lars

Since you have been testing with these kernels, could you comment on which change contributed to the better performance, and to what extent? In other words, changing 1000 -> 100Hz versus SMP -> UP. How much does each change contribute?

Akemi

~0005581

larstr (reporter)

Akemi,
It seems that UP vs SMP is what affects things most here. Also, on newer cpus this isn't as noticeable as on old cpus. The 1900 Xeon I've been using is 4 years old.

The results are however a bit surprising as they show quite similar results for both SMP kernels. The kernel builds do however differ slightly and I don't know if that is affecting anything:

kernel-2.6.18-8.1.8.UP.100Hz.el5.i686 <- Akemi kernel
up 100Hz
cpu 2.24%
ready 0.27%

kernel-2.6.18-8.1.4.el5.centos.plus.VMware.i686 <- xaox kernel
smp 100Hz
cpu 14.3%
ready 5%

kernel-2.6.18-8.1.8.el5.i686 <- default kernel
smp 1000Hz
cpu 13.6%
ready 3.15%

Lars

~0005583

toracat (manager)

Last edited: 2007-07-16 14:20

Lars,

Very interesting result. But as you pointed out, this could be an apples-to-oranges comparison. If you are willing to take it further for completeness, I'd have no problem rebuilding the other two kernels (Hz change only and SMP change only). What do you think?

Akemi

~0005587

larstr (reporter)

Yes, I agree. Some further testing would help us understand more of these findings, and I'm willing to continue this testing.

Lars

~0005588

toracat (manager)

Great. I said, "I'd have no problem rebuilding..." However, I am having a problem with the build process right now. But I will fix it one way or another.

Akemi

~0005601

toracat (manager)

Lars,

I now have the other 2 variants:

kernel-2.6.18-8.1.8.SMP.100HZ.el5.i686.rpm
kernel-2.6.18-8.1.8.UP.1000HZ.el5.i686.rpm

They are in the same place as before.

Akemi

~0005602

larstr (reporter)

Last edited: 2007-07-20 18:31

Just to be sure, I reinstalled CentOS and tried all these kernels on the fresh install to get results that are as accurate as possible.

I have to admit, the results were much more like I had expected. :-)

kernel-2.6.18-8.1.8.el5.i686 <- Default
smp 1000Hz
cpu 13.06%
ready 4.32%

kernel-2.6.18-8.1.8.UP.100Hz.el5.i686 <- Akemi kernel
up 100Hz
cpu 1.93%
ready 0.23%

kernel-2.6.18-8.1.8.UP.1000Hz.el5.i686 <- Akemi kernel
up 1000Hz
cpu 8.88%
ready 3.03%

kernel-2.6.18-8.1.8.SMP.100Hz.el5.i686 <- Akemi kernel
smp 100Hz
cpu 2.35%
ready 0.36%

kernel-2.6.18-8.1.4.el5.centos.plus.VMware.i686 <- xaox kernel
smp 100Hz
cpu 14.16%
ready 4.80%
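
To put the four Akemi/default builds side by side, the relative reduction in idle cpu load against the default kernel can be worked out from the figures above. This is just arithmetic on the reported numbers; the xaox kernel is left out, and the dictionary layout and function name are made up for illustration.

```python
# Idle-guest figures reported in this note, in percent.
results = {
    ("SMP", 1000): {"cpu": 13.06, "ready": 4.32},  # default kernel
    ("UP", 1000): {"cpu": 8.88, "ready": 3.03},
    ("SMP", 100): {"cpu": 2.35, "ready": 0.36},
    ("UP", 100): {"cpu": 1.93, "ready": 0.23},
}


def reduction(baseline, value):
    """Percentage reduction of value relative to baseline."""
    return 100.0 * (baseline - value) / baseline


base = results[("SMP", 1000)]["cpu"]
for (kind, hz), figures in results.items():
    print(f"{kind:>3} {hz:>4}Hz  cpu {figures['cpu']:5.2f}%  "
          f"reduction vs default {reduction(base, figures['cpu']):5.1f}%")
```

On these numbers, going from 1000Hz to 100Hz on the SMP kernel removes about 82% of the idle cpu load, going from SMP to UP at 1000Hz removes about 32%, and doing both removes about 85%.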

~0005603

toracat (manager)

Lars, you are so fast. All the tests were done while I was asleep :)

Good results indeed. I suspect that xaox' kernel is not really 100Hz. Did not read the details of his vmware post, but the way it was done (by editing some .h file) would not make the intended change in the freq in the config. At any rate, I am glad we have a full set of data.

Akemi

~0005608

xaox (reporter)

I was just made aware of this bug report.

I have checked and my kernels were built with HZ=1000, not HZ=100. I have a build problem I need to work out.

~0005618

toracat (manager)

xaox,

What build problem are you having? Could you describe it in more detail?

Akemi

~0005619

xaox (reporter)

toracat,

The build problem as it turns out is that I'm an idiot. At some point my updated kernel configuration files were overwritten with the originals and I didn't notice.

I'm rebuilding now with the fixed config files.

~0005620

toracat (manager)

xaox,

No worry. That happened to me once, too. :D

Akemi

~0005625

xaox (reporter)

I now have a new build of the latest plus kernel with HZ=100.

~0005750

toracat (manager)

Last edited: 2007-08-01 19:14

The 100Hz kernels referred to in this report (2.6.18-8.1.8 for CentOS 4 and 5) are now available from:

http://people.centos.org/~hughesjr/vmware-kernels/

(thanks to hughesjr)

Akemi

~0005801

Phil Schaffner (reporter)

A couple of minor points on the hughesjr vmware kernel repo. There is repodata present, so one might expect to use yum for installation; however, ...

1. If the lowest level x86 directories were named i386 instead of i686 then $basearch would work in a yum repo definition for either arch.

2. yum does not see these kernels as an upgrade. Perhaps they could have names with a higher lexical order, e.g. kernel-smp-2.6.9-55.0.2.vm.c4.100HZ.i686.rpm (consistent with plus naming) rather than kernel-smp-2.6.9-55.0.2.EL.100HZ.i686.rpm.

/etc/yum.repos.d/hughesjr.repo
[vm-kernels]
name=CentOS-$releasever - VMware Kernels
baseurl=http://people.centos.org/~hughesjr/vmware-kernels/$releasever/$basearch/
gpgcheck=1
# ???
enabled=0
protect=1
priority=1

Phil

~0005802

JohnnyHughes (administrator)

Last edited: 2007-08-02 21:09

Actually, it's not a bad idea to use:

<kernel-version>.vm.c[4,5].100HZ.$arch.rpm

Since vm is > plus (c4plus) and > EL (c4) and > el (c5 and c5plus) ... then that COULD BE an upgrade to everything.

However, if you don't exclude=kernel* then you could upgrade to other regular versions.

I'm not going to recompile the kernels now .. but in the future we will name them .vm. something.
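
The ordering argument above can be illustrated with a simplified segment-wise comparison in the style rpm uses for version and release strings. This is only a sketch, not rpm's actual code: it ignores epochs and special characters such as "~", the function names are hypothetical, and the release strings in the example are modelled on the package names in the previous note.

```python
import re


def split_segments(version):
    """Split into runs of digits or runs of ASCII letters; separators are dropped."""
    return re.findall(r"\d+|[A-Za-z]+", version)


def compare_versions(a, b):
    """Return 1 if a sorts newer than b, -1 if older, 0 if equal (simplified)."""
    seg_a, seg_b = split_segments(a), split_segments(b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() and y.isdigit():
            x_key, y_key = int(x), int(y)
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        else:
            x_key, y_key = x, y  # byte-wise string order: uppercase sorts first
        if x_key != y_key:
            return 1 if x_key > y_key else -1
    if len(seg_a) == len(seg_b):
        return 0
    # All shared segments are equal: the string with segments left over wins.
    return 1 if len(seg_a) > len(seg_b) else -1


for other in ("55.0.2.EL.100HZ", "55.0.2.plus.c4", "55.0.2.el5"):
    print(other, compare_versions("55.0.2.vm.c4.100HZ", other))
```

Because "vm" compares greater than "plus", "EL" and "el" in plain string order, a .vm. release with the same leading numbers sorts as newer than all three.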

~0005968

kbsingh@karan.org (administrator)

reopening

~0005976

kbsingh@karan.org (administrator)

The right way of doing this, allowing users to opt in when they want, would be to name the kernel rpms kernel-vm-<version> rather than kernel-<version>.vm; otherwise it only causes thrashing in yum and confuses the users.

~0006046

larstr (reporter)

I've also tried booting different kernels with different parameters. These numbers differ slightly from the initial ones as I've used a freshly installed OS for the latest tests:

1000Hz SMP cpu 13.06% ready 4.32% (default)
1000Hz UP cpu 8.88% ready 3.03%

100Hz SMP cpu 2.35% ready 0.36%
100Hz UP cpu 1.93% ready 0.23%

1000Hz SMP "nosmp noapic nolapic" cpu 4.21% ready 2.35%
1000Hz UP "nosmp noapic nolapic" cpu 3.97% ready 2.45%

100Hz SMP "nosmp noapic nolapic" cpu 0.895% ready 0.254%
100Hz UP "nosmp noapic nolapic" cpu 0.788% ready 0.156%

100Hz SMP cpu 1.38% ready 0.521%
100Hz SMP noapic cpu 1.37% ready 0.294%
100Hz UP cpu 1.0% ready 0.310%

~0006113

segedunum (reporter)

Does anybody have any idea when a corresponding kernel might start appearing in CentOS Plus?

~0006114

toracat (manager)

The 100Hz kernels are available for CentOS-4 and -5 and can be found in testing:

http://dev.centos.org/centos/4/testing/
http://dev.centos.org/centos/5/testing/

Look for kernel-vm-xxx

Akemi

~0006240

jase99 (reporter)

Here's some feedback. Host = 2 x 2.8GHz CPU x86_64. Two VMs, one is 2 CPU x86_64, the other is 2 CPU i686. When both vm's are idle, host CPU hovers around 50%. Using 2.6.9-55.0.9 kernel in host and guests. With the 100HZ kernels (i686 and x86_64) deployed in the guests, host cpu now hovers around 8% when vm's are idle. I also deployed the devel packages for the vm kernels so that vmware tools works. No bad side effects. Thank you for making these packages available.

~0006246

toracat (manager)

Those who are interested in this subject may also be interested in vmware pre-built images for CentOS. See bug #1722 for details.

~0006249

segedunum (reporter)

Last edited: 2007-11-07 11:10

I can heartily concur with others that this has made a very big difference. I have a dual Opteron (2 GHz) system with CentOS x86_64 as the host, and my CentOS guests went from consuming around 5% usage to around 1.2%. These are 32-bit guests by the way. On another AMD Duron 1.2 GHz, OpenSuse (32-bit) system I have the CPU usage has gone down from around 10% - 15% down to around 0.7% at idle, which is consistent with the other OpenSuse and Windows guests I have. These are just unscientific ps and top readings, but the changes are significant.

The idle CPU usage still seems to be slightly higher than other Linux and Windows guests though, especially on the x86_64 system. This could be down to the host being 64-bit, to variances in guest kernel versions where certain timer numbers work better, or to something else. More experimentation is needed. I also have my own customised guests where I'm using LVM, so this might make a difference.

My guests are all UP, so using the kernel parameters "nosmp noapic nolapic" also seems to have a positive effect. Note: don't put 'nosmp' in by itself otherwise you'll get a nice kernel panic!

I'm slightly surprised that the upstream vendor doesn't seem to have anything open regarding this. I would imagine this could be a pretty big problem.

~0006254

toracat (manager)

The new kernel for RHEL 5.1 (2.6.18-53) has a new kernel option called "tick divider". It will let you reduce the system clock rate to 100, 250, etc Hz while allowing you to boot the kernel with 1000Hz. Below is a note from the patch file that adds this feature. How this performs compared to kernel compiled with 100Hz remains to be seen.

Akemi

=================================================================
From: Alan Cox <alan@redhat.com>
Subject: [RHEL5]: Tick Divider (Bugzilla #215403]
Date: Wed, 18 Apr 2007 16:39:15 -0400
Bugzilla: 215403
Message-Id: <20070418203915.GA23344@devserv.devel.redhat.com>
Changelog: [x86] Tick Divider


The following patch implements a tick divider feature that allows you to
boot the kernel with HZ at 1000 but the real timer tick rate lower (thus
not breaking all the modules and kABI).

The selection is done at boot to minimize risk and the patch has been reworked
so that you can do an informal attempt at a proof that it doesn't cause
regression for the non dividing case.

The patch interleaved with notes follows, and below that the actual patch
proper.

Xen kernels remain at 250HZ because
a) Xen guests have a 'tickless mode'
b) Xen itself has issues with multiple differing guest HZ rates

Not queued for upstream as the upstream path is Ingo's tickless kernel, which
is not viable as a RHEL5 tweak
==================================================================
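
As a rough illustration of the arithmetic described in the patch note (a sketch, not taken from the patch itself): assuming the real interrupt rate is simply the compiled-in HZ divided by the value given at boot, a 1000Hz kernel with a divider of 10 should tick at 100Hz, and with a divider of 4 at 250Hz. The function name is made up for illustration.

```python
def effective_tick_rate(hz, divider):
    """Timer interrupts per second for a kernel built with HZ=hz and booted
    with the given divider (assumption: real rate = HZ / divider)."""
    if divider < 1:
        raise ValueError("divider must be >= 1")
    return hz / divider


for divider in (1, 4, 10):
    rate = effective_tick_rate(1000, divider)
    print(f"HZ=1000 divider={divider}: {rate:.0f} interrupts/s")
```

Userspace and modules would still see HZ=1000 in this scheme; only the hardware interrupt rate is meant to drop.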

~0006264

toracat (manager)

I have built the 5.1 kernel (2.6.18-53) and collected a preliminary result using vmktree developed by the original reporter, Lars (see the attached graph, 2007Nov10.png). The large peaks are when the system was booted. The cpu levels between boots are marked with the kernel and the option used. kernel-vm is the CentOS version of 100Hz kernel (in which 100Hz is compiled into the kernel). It looks as if the tick_divider=10 (or =4) option had no effect on the idle %cpu. This was not an expected result.

However, I do not have a way to verify that the tick_divider option was indeed honored when I added it to the kernel line.

~0006316

toracat (manager)

I have repeated the test with a new host machine that has freshly installed CentOS 5.0 x86_64 (packages all up-to-date). I installed CentOS-5 (i386 and x86_64) as vmware guests and updated their kernel to 2.6.18-53. The attached graphs are the output of vmktree as before.

C5_i386_Nov15.jpg -- The result with the 32-bit guest was the same as the previous test. The tick_divider=10 option (to make HZ=100) did not have a discernible effect on %idle cpu whereas kernel-vm (100HZ compiled in the kernel) lowered it.

C5_x86_64Nov15.jpg -- The behavior of the 64-bit guest was curious. No apparent difference was seen regardless of the kernels/options used. They all look similar to the output of the kernel-vm 32-bit. ???

Akemi

~0006338

toracat (manager)

All my tests so far have been done with the new 5.1 kernel installed on 5.0. Now that updated rpm's for 5.1 are available from the QA repo, I updated the system to 5.1 and re-ran the test. The result was different. In short, on an i386 system, it all looked like the output of the x86_64 machine (see the graph C5_x86_64Nov15.jpg). When the update files for the x86_64 arch are complete in the QA repo, I will upgrade my 64-bit test systems to 5.1 and do more testing.

Akemi

~0006496

smccl (reporter)

Last edited: 2007-12-06 21:00

Just installed CentOS 5.1 on a single CPU VM running on VMware ESX. The tick_divider setting doesn't seem to be making any difference in the number of timer interrupts though. I set tick_divider=10, which should reduce the number of timer interrupts to 100 per second. I wrote a nasty little script that queries /proc/interrupts every second, and I still see an increase of about 1000 interrupts each second. Also, when watching the reporting capabilities of the ESX hypervisor I see no reduction in CPU utilization on the idle VM. Just a side note: when I append "nosmp noapic nolapic" as kernel parameters I do see a very nice reduction in CPU utilization, however. So the combination of the parameters that do work and the newly added tick_divider would really benefit us.

The server is:

uname -srvmpio
Linux 2.6.18-53.1.4.el5 #1 SMP Fri Nov 30 00:45:16 EST 2007 i686 i686 i386 GNU/Linux

The pertinent data from grub.conf

title CentOS (2.6.18-53.1.4.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-53.1.4.el5 ro root=/dev/rootvg/rootfs rhgb quiet clocksource=pit tick_divider=10
        initrd /initrd-2.6.18-53.1.4.el5.img
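
The script itself was not posted. A minimal sketch of that kind of check might look like the following, assuming the timer interrupt is the IRQ 0 line of /proc/interrupts (as on PIT-driven i386 kernels of this era) and that the first row of the file lists the CPU columns. The function names are hypothetical.

```python
import time


def timer_interrupt_count(interrupts_text):
    """Sum the per-CPU counters on the IRQ 0 (timer) line of /proc/interrupts."""
    lines = interrupts_text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "0:":
            return sum(int(value) for value in fields[1:1 + ncpus])
    raise ValueError("no IRQ 0 line found")


def timer_rate(path="/proc/interrupts", interval=1.0):
    """Return timer interrupts per second measured over `interval` seconds."""
    with open(path) as f:
        before = timer_interrupt_count(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = timer_interrupt_count(f.read())
    return (after - before) / interval
```

Calling timer_rate() inside the guest should report roughly 1000 on a stock 1000Hz kernel and roughly 100 if the tick rate really has been divided by 10.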

~0006497

smccl (reporter)

Sorry, in the previous post I manually typed the grub.conf lines and tick_divider should equal 10. In my grub.conf "tick_divider=10". Sorry about the typo.

~0006499

JohnnyHughes (administrator)

upstream says this works on i686 and not on x86_64 (the tick_divider option that is)

I have had the same experience as you ... that it does not make any difference in VMWare

~0006500

smccl (reporter)

Have you tested it on bare i686 hardware?

~0006505

toracat (manager)

A question for centos devs (maybe tru?):

Are you planning to build kernel-vm for 5.1? It would be good to have it for comparing the "real" 100Hz kernel and the tick_divider-tweaked version. If this is not being planned, I could build it but would rather not do it myself this time.

Akemi

~0006506

smccl (reporter)

I would be willing to test the 100Hz kernel on vmware and compare that to the tick_divider setting.

~0006508

tru (administrator)

the kernel-vm will be available shortly from the buildsystem, meanwhile I have built them inside my chrooted centos-5 tree. I will put them on dev.centos.org/~tru/kernel-vm asap.

~0006510

tru (administrator)

http://dev.centos.org/~tru/kernel-vm/RPMS/ now contains the chrooted builds.

the i386 kernel boots fine ;)
Linux version 2.6.18-53.1.4.el5vm (centos@blackwilson.bis.pasteur.fr) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Thu Dec 6 11:09:34 EST 2007

~0006511

toracat (manager)

Tru,

Thank you for providing these kernels. I have already downloaded both arches. Will run some tests over the weekend.

Akemi

~0006513

toracat (manager)

Just a quick note to tell you that with the latest version of kernel-vm, I was able to repeat my earlier observation that tick_divider had no visible effect. This was done with both i386 and x86_64. The graphs will follow shortly.

Thanks, Tru, for building the -vm kernels in a timely manner.

Akemi

~0006514

toracat (manager)

Uploaded the graphs showing the test results with the kernel 2.6.18-53.1.4 with no option, tick_divider=10, and kernel-vm (100Hz) for i386 (53.1.4i386.png) and x86_64 (53.1.4x86_64.png). Only the kernel-vm effectively lowered the %cpu.

Akemi

~0006523

toracat (manager)

A question for Lars:

We talked quite a while ago about possible inaccurate measurements that are based on "per time". I recall you indicated that your vmktree is not affected by that issue. So, do you think the graphs I collected are all real and therefore imply that the tick_divider option is not working the same way as the "real" 100Hz kernels?

I am pasting the quote from the Linux Journal for others to see.

==== begin quote ====
As Vassili Karpov has discovered to his dismay, CPU stats are not
accurately reported in /proc/stat on the PC architecture. On that
architecture, CPU usage is examined only during the timer interrupt,
so regular programs can seem to use much more or much less of the CPU,
just because they happen to be either very active or idle at those
particular intervals. This also explains why users might see a
difference in CPU usage when switching their kernel from running at
100Hz to 1,000Hz. In fact, the usage is unchanged, while only the
accounting is different. Programs like top, which get their CPU stats
from /proc/stat, will suffer from this kind of discrepancy. Vassili
and his friends wasted quite a bit of time trying to optimize some
code they were working on, until they discovered that they were
optimizing toward an inaccurate and ever-changing goal.
==== end quote ====
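
To make the effect in the quote concrete, here is a toy simulation (not how the kernel actually implements accounting): a task that runs 2 ms out of every 10 ms is charged a full tick whenever a timer interrupt happens to land while it is running. The function name and all numbers are illustrative assumptions.

```python
def cpu_usage(hz, task_period_us, busy_us, offset_us, seconds=1):
    """Return (actual, sampled) CPU fractions for a periodic task.

    The task's cycle is shifted by offset_us relative to the timer, and the
    task runs for the first busy_us of every task_period_us. 'sampled'
    mimics tick-based accounting: a tick is charged to the task only if the
    task is running at the instant the tick fires.
    """
    tick_period_us = 1_000_000 // hz
    ticks = hz * seconds
    hits = 0
    for k in range(ticks):
        t = k * tick_period_us
        # Position of this tick inside the task's cycle.
        if (t - offset_us) % task_period_us < busy_us:
            hits += 1
    return busy_us / task_period_us, hits / ticks


for hz in (100, 1000):
    for offset_us in (0, 1000):
        actual, sampled = cpu_usage(hz, 10_000, 2_000, offset_us)
        print(f"{hz:>4}Hz offset={offset_us:>4}us "
              f"actual={actual:.0%} sampled={sampled:.0%}")
```

With these numbers the 100Hz run reports either 100% or 0% depending on the offset, while the 1000Hz run reports 20% in both cases, even though the task always uses 20% of the CPU.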

Akemi

~0006525

larstr (reporter)

For VMware Server running on linux, vmktree uses the same methods as top/ps to read the cpu load so it definitely has the same bug as these other tools.

When vmktree is used to get stats from VMware ESX Server it will however read these values out of the kernel of the ESX (vmkernel). This kernel is not based on linux and it is unknown whether the vmkernel is also having this same bug or not.

I have however done the same tests as you and the results are very similar. I now used the minimal centos5 vmware image provided in bug id 1722.

Lars

~0006526

toracat (manager)

Lars,

Thanks for doing the test. Your result is very reassuring. Mine was done on vmware server but the similarity is striking.

Akemi

~0006532

smccl (reporter)

I've loaded and monitored the i686 VM kernel on an ESX virtual machine and immediately saw a very large reduction in idle time CPU utilization. This was seen using the built-in reporting utilities of the ESX hypervisor.

I then reverted to the same kernel version from the CentOS updates repository and used the tick_divider kernel parameter, without any apparent effect. The difference in utilization is actually quite striking.

I ran my same ugly script that polls /proc/interrupts every second and saw that the per-second increase was 100-102 timer interrupts with the VM kernel provided by tru. To me it still seems like the tick_divider argument isn't cutting back on the number of timer interrupts.

~0006533

toracat (manager)

It is "good" that all our results lead to the same conclusion -- seemingly no effect by the use of the tick_divider option. This makes us wonder what it does or is supposed to do. Another question is how we want to give feedback to upstream. Should one of us file a bug report at Bugzilla?

Akemi

~0006534

toracat (manager)

Forgot to mention (because it is so obvious) that CentOS users benefit from the kernel-vm offered by CentOS until the upstream kernel comes up with a version that actually works as intended. So, tru, your efforts are really appreciated!

Akemi

~0006603

toracat (manager)

A test kernel was made available upstream that contains a "Patch to fix some of the tick divider problems" (see https://bugzilla.redhat.com/show_bug.cgi?id=315471 )
I tested this kernel (2.6.18-58.el5), but the result was the same.

Akemi

~0006629

clalance (reporter)

Hello,
     I did some of the work on the tick divider patch in the RedHat kernels. Would it be possible for someone to try out the latest RedHat errata kernel available (at this point, it would be 2.6.18-53.1.4.el5)

And use the correct kernel command-line option for the divider:

divider=10

and see if that gives some better results?

Thanks,
Chris Lalancette

~0006630

toracat (manager)

Chris,

I am running a test as I type using the kernel 2.6.18-53.1.4.el5 and the option "divider=10" as suggested. Unlike earlier tests, it now looks like the result is similar to the CentOS kernel-vm.

My question is: Was the option supposed to be "divider=" and NOT "tick_divider=" from the very beginning? And have we all been wasting our time? Or is this a recent change? The Release Notes for RHEL 5.1 clearly state:

" The tick_divider=<value> option is a sysfs parameter that allows you to adjust the system clock rate while maintaining the same visible HZ timing value to user space applications.

Using the tick_divider= option allows you to reduce CPU overhead and increase efficiency at the cost of lowering the accuracy of timing operations and profiling."

Akemi

~0006631

clalance (reporter)

/me goes to look at the Release Notes....sigh.

The answer to your question is that yes, it has been "divider=" all along. It looks like there was a typo somewhere along the way with the release notes. I'll try to get that rectified online here. Note that the kernel released with 5.1 (-53) had some bugs in the divider that are further fixed in -53.1.2, so you would need at least 53.1.2 to get it really working.

As far as whether you are wasting your time, I can't say. I was pointed here by Jarod Wilson, and I was trying to accomplish two things by posting here:

1) Trying to dissuade CentOS from building a separate kernel, if possible, given the functionality already in 5.1.
2) Make sure that there aren't additional outstanding bugs in the current divider patch that would affect both RH and CentOS.

Chris Lalancette

~0006632

toracat (manager)

Chris,

Thanks for posting here to let us know of the correct option. I have another question. Do you know if the x86_64 kernel now works as well?

Akemi

~0006633

clalance (reporter)

Akemi,
     With the latest errata kernel (-53.1.4), all of the issues I know about on both i686 and x86_64 are fixed. Of course, if you run into any additional problems, please let me know so that I can try to track it down.

Chris Lalancette

~0006634

toracat (manager)

Thanks Chris,

I just talked with some people on the #vmware IRC. They wonder how many RH systems have been running with the incorrect option. Someone is already sending a note to his customer. But at least it was good that the error was noticed here.

Akemi

~0006636

smccl (reporter)

When using divider=10 and manually overriding the clocksource as a kernel parameter (to avoid in the future clock drift), the vm hangs on startup. After LVM initialization the following error is displayed:

BUG: soft lockup detected on CPU#0!

Ultimately, the vm never seems to come up. According to http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1420
we shouldn't need to worry about time keeping algorithms which attempt to catch up too aggressively in the 2.6.18 kernel. If I don't manually override the clocksource then the clock drifts into the future.

Are there any recommendations on how to manually override the clocksource with an algorithm that doesn't attempt to "catch up" while still using the divider kernel parameter?

Relevant info:

cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm jiffies tsc pit

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

uname -r
2.6.18-53.1.4.el5

cat /proc/cmdline
ro root=/dev/rootvg/rootfs rhgb quiet divider=10 clocksource=pit
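
The same information can be cross-checked programmatically. The following is only a sketch with hypothetical helper names: it takes the contents of /proc/cmdline, available_clocksource and current_clocksource as strings and reports whether the clocksource requested on the command line is offered by the kernel and whether it is the one actually in use.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def clocksource_report(cmdline, available, current):
    """Compare the requested clocksource with what sysfs reports."""
    params = parse_cmdline(cmdline)
    requested = params.get("clocksource")
    return {
        "divider": params.get("divider"),
        "requested": requested,
        "requested_is_available":
            requested in available.split() if requested else None,
        "requested_is_active":
            requested == current.strip() if requested else None,
    }


# Values copied from the output posted above.
report = clocksource_report(
    "ro root=/dev/rootvg/rootfs rhgb quiet divider=10 clocksource=pit",
    "acpi_pm jiffies tsc pit",
    "tsc",
)
for key, value in report.items():
    print(f"{key}: {value}")
```

On a live guest the three strings would be read from /proc/cmdline and the two files under /sys/devices/system/clocksource/clocksource0/.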

~0006640

toracat (manager)

Last edited: 2008-01-03 12:17

"When using divider=10 and manually overriding the clocksource as a kernel parameter (to avoid in the future clock drift), the vm hangs on startup."

I was able to reproduce this with my VM (2.6.18-53.1.4.el5 i686) and divider=10 clocksource=pit. It hangs upon boot.

However, with the CentOS kernel-vm (2.6.18-53.1.4.el5vm) and clocksource=pit, the same VM booted normally.

Kernel command line: ro root=LABEL=/ rhgb quiet clocksource=pit
ACPI: (supports S0 S1<6>Time: pit clocksource has been installed.

Akemi

edit by hughesjr:

relevant upstream bug:

https://bugzilla.redhat.com/show_bug.cgi?id=315471

~0006641

toracat (manager)

Last edited: 2008-01-03 12:20

I have uploaded the test result from Note 6630 (divider10_i686_Jan022007.png)

http://bugs.centos.org/file_download.php?file_id=413&type=bug

Akemi

~0006643

JohnnyHughes (administrator)

the "divider=" option still has some issues (I think) based on this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=315471

so we will maintain the kernel-vm until resolved.

~0006644

JohnnyHughes (administrator)

OK ... I have tested the latest i686 kernel from here:

http://people.redhat.com/dzickus/el5/

Which is kernel-2.6.18-62.el5.i686.rpm and "divider=10 clocksource=pit" hangs the boot.

As a side note ... if the clock GAINS time (runs too fast) you should be able to fix it with this:

http://kb.vmware.com/kb/1591

(by setting the correct host.cpukHz) and vmware tools should adjust a clock that is too slow.

Also see this blog entry concerning host.cpukHz:

http://blog.autoedification.com/2006/11/vmware-guest-clock-runs-fast.html

~0006645

clalance (reporter)

jhughes: bug 315471 is resolved in both the 5.2 development kernel and in the 53.1.4 errata kernel. However, it seems like we still have a problem with the "clocksource=pit divider=10".

smccl or toracat: I don't actually have VMware to test with, so it would be great if one of you could run a few tests. First, are you running i686 or x86_64 VMs? I'm suspecting i686 since clocksource=pit doesn't make a huge difference in x86_64 VMware, but I just want to confirm. In terms of tests, I am interested in:

1) Try booting -53.1.4 with "divider=10" only. Does that work?
2) Try booting -62 with "divider=10" only. Does that work?
3) Try booting -53.1.4 with "divider=10 clocksource=pit". Does that work (probably not, based on earlier comments)?
4) Try booting -62 with "divider=10 clocksource=pit". Does that work?

For 3) and 4), if they both don't work, it would be great if you could get an "Alt-Sysrq-t" output from both of them and add them to this bug.

Thanks,
Chris Lalancette

~0006646

smccl (reporter)

I am testing on the i686 arch and will only be able to use the errata kernel mentioned 2.6.18-53.1.4.el5. I have confirmed that using just divider=10 works with the expected results and using just clocksource=pit works with the expected results.

I added a virtual serial port to the virtual machine and appended the bootup sequence with call traces to the bootup-vi.txt file. The "Alt-Sysrq-t" key sequence seemed ineffective even after enabling the functionality in /etc/sysctl.conf, but then every other keystroke and combination was ineffective as well during system startup when both clocksource and divider are set.

It may be worth noting that the virtual machine uses 100% of its available cpu (UP) the entire time. Eventually I just gave up and performed a hard shutdown of the vm. I tried booting up with clocksource=pit and divider=2 and eventually the system came up, but very slowly. Once up, remote ssh sessions had a very poor response time and keystrokes were delayed (the system just seemed busy). Once in, I noticed that commands like date were reporting very erratic results. From one iteration of the command to the next (maybe 3 seconds) the clock would gain several hours or even a whole day.

Sorry, I can't continue testing too much unless it's on my own time. Maybe some this evening.

~0006647

toracat (manager)

Chris,

With regard to the -62 kernel, the "divider=10" only works. With both "divider=10" and "clocksource=pit", it attempts to continue the boot process, but as smccl said, cpu shoots to 100% and the whole thing is practically "dead".

Akemi

~0006650

clalance (reporter)

FYI regarding the "clocksource=pit divider=10" bug: I've opened a Red Hat Bugzilla about it here:

https://bugzilla.redhat.com/show_bug.cgi?id=427588

I have a good idea of what the problem is, I just need to come up with an acceptable solution.

Chris Lalancette

~0006652

arrfab (administrator)

Just to add a note/comment to the (already long) list : i've tested a centos 5.1 i386 with the kernel 2.6.18-53.1.4.el5 / divider=10 option.
I've attached the result (c51-i386-divider.png) in the 'Attached files' on this page.
It seems to work as expected. I only get a couple of lines at boot time, but the machine boots and everything seems ok afterwards (from dmesg):

CPU0: AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ stepping 03
Total of 1 processors activated (5639.48 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ... failed.
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ...<6>spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
spurious APIC interrupt on CPU#0, should never happen.
<snip>
 works.
Brought up 1 CPUs

~0006653

arrfab (administrator)

added a link to the uploaded file : http://bugs.centos.org/file_download.php?file_id=417&type=bug

~0006683

mmclean (reporter)

With regard to the problem using both clocksource=pit and divider=10 at the same time, I was wondering if anyone has tried any of the other clocks? Specifically the acpi_pm clock, which manages to keep very good time on CentOS running on an ESX server.

Although the pit option doesn't drift into the future, we were still having issues with clock drift and over correction by the vmware-tools, so VMware support suggested using the pmtmr (acpi_pm in 2.6 kernels) clock instead, which has worked.

However, we still need the 100Hz option, otherwise the ESX server can't cope. So this new kernel parameter should do the trick, but I currently have no way of testing the effect of the divider option along with the acpi_pm clock.

~0006689

delimiter (reporter)

Reporting success using divider=10 clocksource=acpi_pm
Using kernel 2.6.18-53.1.4.el5.centos.plus in a VM running on VMware ESX 3.0.2.
At rest CPU usage drops from ~50MHz to ~12MHz. Will have to see whether clock drift is acceptable.

~0006690

delimiter (reporter)

Er, that last note ate my tildes.
Usage dropped from about 50MHz (2.4%) to about 12MHz (0.64%)

~0006691

smccl (reporter)

delimiter, are you using the time sync functionality of vmware-tools in combination with acpi_pm?

~0006719

delimiter (reporter)

The clocksource=acpi_pm kernel option I specified earlier was misinformed. According to VMware's timekeeping whitepaper "pmtmr" and not acpi_pm would be the proper option... however since this is the default I have removed clocksource altogether.
smccl: yes we are using tools.syncTime = "TRUE"
Again, this is on VMware ESX 3.0.2

~0006722

mmclean (reporter)

delimiter, the VMware white paper is based on the old 2.4 kernel and pmtmr does not exist in the new 2.6 kernel. Instead, acpi_pm has replaced it and it is not the default clocksource in the 2.6 kernel, which is why we specify it manually.

~0006748

toracat (manager)

Chris,

I see that you have corrected the Release Notes. However, there is one more place in the x86_64 version of the Notes that still says "tick_divider" as in:

Using the tick_divider command-line argument ...

Akemi

~0007071

mleonhardt (reporter)

hi there,
we use the xaox repository to run our servers virtualized in VMware ESX 3.5. The precompiled kernels (thanks xaox!!) work very well. Does anyone know whether there is a repository with VMI support also enabled?
The current Ubuntu kernels already have VMI enabled by default. It would be nice if this were the standard in CentOS also.

kind regards and thanks for your help.
Matthias Leonhardt

~0007075

JohnnyHughes (administrator)

CONFIG_VMI is not in any of the Kernels that are currently in RHEL-5 ... even the test kernels.

This was not added to the kernel tree until 2.6.21 and I do not see any patches anywhere that roll it back into any 2.6.18 kernels.

I do not think that CentOS will be creating a kernel that is outside the 2.6.18 tree to turn this on.

Here is a reference:
http://kerneltrap.org/node/14848

~0007559

wizard113 (reporter)

Using the 2.6.18-92.1.6 x86_64 kernel-vm, I am trying to determine why the only available clocksource is 'jiffies'. It seems that the arch/x86_64/kernel/time directory does not contain the same set of clocksources that the i386 directory does.

I see a (possible) patch for this, at http://sr71.net/~jstultz/tod/broken-out/ - but I am curious if there was a reason why these clocksources (hpet, tsc) are not included in the CentOS x86_64 kernels?

The reason I ask, is that I cannot get the x86_64 VMs to keep proper time using jiffies, and while I could go back to i386, I'd really rather stay with the x86_64 kernel.

~0008041

garrettsmith (reporter)

http://kb.vmware.com/kb/1006427 lists the timekeeping best practices for a number of distributions.

~0008048

toracat (manager)

Promising patches that would improve timekeeping were made available for RHEL-5 by vmware:

https://bugzilla.redhat.com/show_bug.cgi?id=463573

However, they will not appear until RHEL *5.4* :-(

~0008110

tru (administrator)

Unless I made a mistake the first time, the recommended values have changed.

before sept 19th the recommended values with the vmware KB
for RHEL-4 32 bits: "divider=10 clock=pit"
for RHEL-5 32 bits: "divider=10 clocksource=acpi_pm"
for RHEL-4 64 bits: "divider=10 clock=pit"
for RHEL-5 64 bits: "notsc divider=10"

as of oct 10th it's now (Last Modified Date: 09-22-2008, ID: 1006427):
for RHEL-4 32 bits: **CHANGED** "clock=pmtmr divider=10"
for RHEL-5 32 bits: (unchanged) "divider=10 clocksource=acpi_pm"
for RHEL-4 64 bits: **CHANGED** "notsc divider=10"
for RHEL-5 64 bits: (unchanged) "notsc divider=10"

the new vmware guests will reflect the changes on the next release.
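
For anyone scripting guest setup, the current values listed above can be kept in a small lookup table. This is just a sketch: the table mirrors the "as of oct 10th" list in this note, while the helper name and the idea of appending to a grub kernel line are illustrative assumptions.

```python
# Recommended kernel parameters as listed in this note (VMware KB 1006427).
RECOMMENDED = {
    ("RHEL-4", 32): "clock=pmtmr divider=10",
    ("RHEL-5", 32): "divider=10 clocksource=acpi_pm",
    ("RHEL-4", 64): "notsc divider=10",
    ("RHEL-5", 64): "notsc divider=10",
}


def with_recommended(kernel_line, release, bits):
    """Append any recommended parameters missing from a grub kernel line.

    Only exact tokens are compared, so a conflicting existing setting such
    as clocksource=pit is left in place and must be removed by hand.
    """
    tokens = kernel_line.split()
    for param in RECOMMENDED[(release, bits)].split():
        if param not in tokens:
            tokens.append(param)
    return " ".join(tokens)


print(with_recommended(
    "kernel /vmlinuz-2.6.18-53.1.4.el5 ro root=/dev/rootvg/rootfs rhgb quiet",
    "RHEL-5", 32))
```

The table will need updating whenever the KB article changes again, as it already has once.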

~0008207

tru (administrator)

http://kb.vmware.com/kb/1007020 links to RHSA-2008:0519 (kernel-2.6.18-92.1.6.el5.src.rpm)

~0019471

Evolution (administrator)

closed due to inactivity. Please re-open if the problem exists with new versions.

Issue History
Date Modified Username Field Change
2007-07-04 21:29 larstr New Issue
2007-07-04 21:29 larstr Status new => assigned
2007-07-05 16:54 smooge Note Added: 0005524
2007-07-09 06:15 larstr Note Added: 0005539
2007-07-09 16:50 toracat Note Added: 0005543
2007-07-12 11:44 larstr Note Added: 0005556
2007-07-12 12:07 toracat Note Added: 0005557
2007-07-13 23:56 larstr File Added: centos-cpuload.png
2007-07-13 23:57 larstr File Added: debian-cpuload.png
2007-07-14 00:00 larstr Note Added: 0005567
2007-07-14 01:07 toracat Note Added: 0005568
2007-07-14 01:38 larstr Note Added: 0005569
2007-07-14 01:51 toracat Note Added: 0005570
2007-07-14 07:35 larstr Note Added: 0005571
2007-07-14 07:39 toracat Note Added: 0005572
2007-07-14 20:04 toracat Note Added: 0005573
2007-07-14 20:26 toracat Note Edited: 0005573
2007-07-14 21:24 larstr File Added: centos-cpuload-up-100hz.png
2007-07-14 21:29 larstr Note Added: 0005574
2007-07-14 21:34 toracat Note Added: 0005575
2007-07-15 04:23 toracat Note Added: 0005577
2007-07-15 11:51 toracat Note Added: 0005578
2007-07-16 12:42 larstr Note Added: 0005581
2007-07-16 14:19 toracat Note Added: 0005583
2007-07-16 14:20 toracat Note Edited: 0005583
2007-07-16 19:34 larstr Note Added: 0005587
2007-07-16 19:53 toracat Note Added: 0005588
2007-07-18 09:37 toracat Note Added: 0005601
2007-07-18 11:35 larstr Note Added: 0005602
2007-07-18 12:14 toracat Note Added: 0005603
2007-07-18 13:45 xaox Note Added: 0005608
2007-07-19 17:07 toracat Note Added: 0005618
2007-07-19 17:53 xaox Note Added: 0005619
2007-07-19 18:12 toracat Note Added: 0005620
2007-07-20 18:31 larstr Note Edited: 0005602
2007-07-20 19:06 xaox Note Added: 0005625
2007-07-27 18:40 toracat Note Added: 0005750
2007-08-01 19:14 toracat Note Edited: 0005750
2007-08-02 20:49 Phil Schaffner Note Added: 0005801
2007-08-02 21:07 JohnnyHughes Note Added: 0005802
2007-08-02 21:08 JohnnyHughes Status assigned => resolved
2007-08-02 21:08 JohnnyHughes Resolution open => fixed
2007-08-02 21:09 JohnnyHughes Note Edited: 0005802
2007-09-08 19:34 kbsingh@karan.org Relationship added has duplicate 0002320
2007-09-08 19:36 kbsingh@karan.org Note Added: 0005968
2007-09-08 19:36 kbsingh@karan.org Status resolved => assigned
2007-09-11 17:01 kbsingh@karan.org Note Added: 0005976
2007-09-23 11:19 larstr Note Added: 0006046
2007-10-10 20:31 segedunum Note Added: 0006113
2007-10-10 20:49 toracat Note Added: 0006114
2007-11-05 14:09 jase99 Note Added: 0006240
2007-11-05 14:09 danieldk Relationship added duplicate of 0001680
2007-11-05 18:07 toracat Note Added: 0006246
2007-11-07 09:43 segedunum Note Added: 0006249
2007-11-07 10:23 segedunum Note Edited: 0006249
2007-11-07 11:10 segedunum Note Edited: 0006249
2007-11-09 00:21 toracat Note Added: 0006254
2007-11-11 14:16 toracat File Added: 2007Nov10.png
2007-11-11 14:28 toracat Note Added: 0006264
2007-11-15 23:16 toracat File Added: C5_i386_Nov15.jpg
2007-11-15 23:19 toracat File Added: C5_x86_64Nov15.jpg
2007-11-15 23:38 toracat Note Added: 0006316
2007-11-18 15:58 toracat Note Added: 0006338
2007-12-06 16:44 smccl Note Added: 0006496
2007-12-06 16:46 smccl Note Added: 0006497
2007-12-06 21:00 JohnnyHughes Note Edited: 0006496
2007-12-06 21:17 JohnnyHughes Note Added: 0006499
2007-12-06 21:26 smccl Note Added: 0006500
2007-12-07 17:58 toracat Note Added: 0006505
2007-12-07 18:24 smccl Note Added: 0006506
2007-12-07 21:16 tru Note Added: 0006508
2007-12-07 23:40 tru Note Added: 0006510
2007-12-07 23:46 toracat Note Added: 0006511
2007-12-08 09:32 toracat Note Added: 0006513
2007-12-08 18:14 toracat File Added: 53.1.4i386.png
2007-12-08 18:15 toracat File Added: 53.1.4x86_64.png
2007-12-08 18:19 toracat Note Added: 0006514
2007-12-09 16:50 toracat Note Added: 0006523
2007-12-10 00:22 larstr File Added: 2.6.18-53.i686-esx-xeon-15-2-8.png
2007-12-10 00:29 larstr Note Added: 0006525
2007-12-10 00:50 toracat Note Added: 0006526
2007-12-10 16:06 smccl Note Added: 0006532
2007-12-10 16:32 toracat Note Added: 0006533
2007-12-10 16:39 toracat Note Added: 0006534
2007-12-23 21:50 toracat Note Added: 0006603
2008-01-02 18:07 clalance Note Added: 0006629
2008-01-02 19:23 toracat Note Added: 0006630
2008-01-02 19:36 clalance Note Added: 0006631
2008-01-02 19:45 toracat Note Added: 0006632
2008-01-02 21:21 clalance Note Added: 0006633
2008-01-02 21:39 toracat Note Added: 0006634
2008-01-02 22:54 smccl Note Added: 0006636
2008-01-03 00:16 toracat Note Added: 0006640
2008-01-03 00:51 toracat File Added: divider10_i686_Jan022007.png
2008-01-03 00:52 toracat Note Added: 0006641
2008-01-03 11:47 JohnnyHughes Note Added: 0006643
2008-01-03 12:07 JohnnyHughes Note Added: 0006644
2008-01-03 12:17 JohnnyHughes Note Edited: 0006640
2008-01-03 12:20 JohnnyHughes Note Edited: 0006641
2008-01-03 14:03 clalance Note Added: 0006645
2008-01-03 15:38 smccl File Added: bootup-vi.txt
2008-01-03 15:53 smccl Note Added: 0006646
2008-01-03 16:20 toracat Note Added: 0006647
2008-01-04 22:05 clalance Note Added: 0006650
2008-01-05 22:13 arrfab File Added: c51-i386-divider.png
2008-01-05 22:17 arrfab Note Added: 0006652
2008-01-05 22:23 arrfab Note Added: 0006653
2008-01-11 07:12 mmclean Note Added: 0006683
2008-01-11 19:36 delimiter Note Added: 0006689
2008-01-11 19:39 delimiter Note Added: 0006690
2008-01-11 20:03 smccl Note Added: 0006691
2008-01-16 16:04 delimiter Note Added: 0006719
2008-01-17 06:27 mmclean Note Added: 0006722
2008-01-24 16:35 toracat Note Added: 0006748
2008-03-28 13:14 mleonhardt Note Added: 0007071
2008-03-31 07:51 JohnnyHughes Note Added: 0007075
2008-07-03 15:47 wizard113 Note Added: 0007559
2008-09-25 17:50 garrettsmith Note Added: 0008041
2008-09-26 20:17 toracat Note Added: 0008048
2008-10-10 12:28 tru Note Added: 0008110
2008-10-29 14:56 tru Note Added: 0008207
2014-03-05 20:31 Evolution Note Added: 0019471
2014-03-05 20:31 Evolution Status assigned => closed