View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001174 | CentOS-4 | Other | public | 2006-01-19 16:40 | 2013-03-23 12:58 |
Reporter | smolderinggenius | Assigned To | |||
Priority | normal | Severity | major | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
Product Version | 4.2 - x86_64 | ||||
Summary | 0001174: Extremely high load average and 100% CPU utilization | ||||
Description | Have there been reported issues with high load average during rather trivial tasks? I am running an AMD64 3200+ with 1 GB of RAM and a 160 GB SATA hard drive. The system runs apache, mysql and sendmail. Under normal conditions the load average stays below 1, but if I start updatedb, tar, webalizer, or any other application that increases disk activity, the load average gradually climbs (it has gone as high as 300+) until one of the applications is killed off. Meanwhile the server becomes nearly unusable. During these episodes, top reports CPU idle at 0% and wa climbing towards 100%. | ||||
Additional Information | The server has 2 GB of swap space. I have seen this behavior with an AMD64 3400+ as well. | ||||
Tags | No tags attached. | ||||
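The symptom in the description (idle at 0% while `wa` climbs) can also be measured without `top`, by comparing two samples of the aggregate `cpu` line in `/proc/stat`. The following is a minimal Python sketch and not part of the original report; the function names and the sample counter values are hypothetical and chosen only for illustration.

```python
def parse_cpu_line(line):
    """Return the time counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Keep user, nice, system, idle, iowait, irq, softirq and (if present) steal.
    # Guest counters on newer kernels are already included in user time,
    # so they are dropped to avoid counting them twice.
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


# Hypothetical samples taken a few seconds apart (illustrative values only).
sample_1 = "cpu 1000 0 500 8000 500 0 0"
sample_2 = "cpu 1100 0 550 8000 1350 0 0"
print(round(iowait_percent(sample_1, sample_2), 1))
```

On a live system the two samples would come from reading the first line of `/proc/stat` twice, a few seconds apart, while the disk-heavy job is running.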
I have never seen this ... but here are some things to try:
1. Is this running the smp kernel? If so, try running the standard kernel and see whether the problem recurs.
2. Also try booting with noapic and noacpi added to the kernel line in /boot/grub/grub.conf and see if it still happens (like this):
kernel ..whatever is there.. noapic noacpi
If turning ACPI/APIC off fixes it, look into getting the latest firmware for the machine to see if that fixes the issue. |
|
kernel version: 2.6.9-22.0.1.EL #1 Thu Oct 27 14:29:45 CDT 2005 x86_64 x86_64 x86_64 GNU/Linux | |
I solved similar problems by downgrading to 2.6.9-11.EL (from the Vault). It worked on both machines that had this problem. Also try tweaking swap:
vm.min_free_kbytes = 949
vm.swappiness = 20 |
|
I recently had an issue where localhost was not the first name on the /etc/hosts loopback line. X was running at 100% CPU because of a name lookup problem. Moving all other names for 127.0.0.1 to the end of the line, after localhost, solved it. The line should look like this:
127.0.0.1 localhost localhost.localdomain othername othername2.whatever.org
The error had something to do with the fact that the first name listed on this line is the name that several programs (including mysql and others) use as the machine name by default. |
|
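The ordering rule described in the note above can be checked mechanically. This is a minimal Python sketch, assuming the usual whitespace-separated hosts format with `#` comments; the function names are hypothetical, and the `bad` example line is invented for contrast.

```python
def loopback_names(hosts_text):
    """Return the names on the first 127.0.0.1 line of a hosts file, in order."""
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        if fields and fields[0] == "127.0.0.1":
            return fields[1:]
    return []


def localhost_is_first(hosts_text):
    """True when 'localhost' is the first name on the loopback line."""
    names = loopback_names(hosts_text)
    return bool(names) and names[0] == "localhost"


# The first line is the layout recommended in the note; the second is a
# made-up example of the problematic ordering.
good = "127.0.0.1 localhost localhost.localdomain othername othername2.whatever.org\n"
bad = "127.0.0.1 othername localhost localhost.localdomain\n"
print(localhost_is_first(good), localhost_is_first(bad))
```

To check a real machine, pass the contents of `/etc/hosts` to `localhost_is_first`.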
Changing the host file (127.0.0.1 localhost line) seems to have fixed the load problems on our servers with 2.6.9-22.XXX-kernels. | |
Does the server need to be restarted after making the changes? | |
No need to restart. | |
I am still getting high load. When I run updatedb, top reports as follows:
top - 12:36:51 up 68 days, 5:26, 3 users, load average: 8.52, 3.76, 1.73
Tasks: 132 total, 2 running, 130 sleeping, 0 stopped, 0 zombie
Cpu(s): 11.6% us, 5.3% sy, 0.0% ni, 0.0% id, 81.4% wa, 0.0% hi, 1.7% si
Mem: 1025168k total, 1009988k used, 15180k free, 107980k buffers
Swap: 2048248k total, 121224k used, 1927024k free, 185588k cached
There is currently no other active load on the server. The load average continues to climb while updatedb is running, and the I/O wait is very high as well. |
|
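For logging this over time, the `Cpu(s):` summary line quoted above can be split into its fields. A minimal Python sketch follows; the function name is hypothetical, and the pattern assumes the older `value% label` layout shown in this report (newer versions of top print the line differently).

```python
import re


def parse_top_cpu(line):
    """Map each label on a top 'Cpu(s):' summary line to its percentage."""
    _, _, rest = line.partition(":")
    return {label: float(value)
            for value, label in re.findall(r"([\d.]+)%\s*(\w+)", rest)}


# The summary line exactly as quoted in the note above.
line = ("Cpu(s): 11.6% us, 5.3% sy, 0.0% ni, 0.0% id, "
        "81.4% wa, 0.0% hi, 1.7% si")
cpu = parse_top_cpu(line)
print(cpu["wa"], cpu["id"])
```

Here `wa` is the share of time the CPU sat idle waiting for outstanding disk I/O, which is the figure that dominates in this report.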
This test was on the production server, and the load average exceeded 25, and I had to kill updatedb before it completed. | |
I'm running into this as well, specifically on CentOS 4.2. With both installed kernels (-22 and -34) the I/O wait time climbs so high that the system becomes too bogged down to do anything. It is usually triggered by amavisd-new (during dictionary attacks) or apache. The previous server, running other distros with the exact same configuration/software, did not have this problem. The previous system was only a 1.5 AMD; this one is an AMD 2600+. Both use Asus A7N8X motherboards (nForce chipsets). During high wait times the application causing the high I/O goes into "D" state according to top. I can (and do plan to) publish the documents describing the method used to build the server. I DO believe it is an issue with the kernel, as I am seeing quite a few people resolve it by upgrading or downgrading the kernel. (added 4/15/06) Downgrading the kernel to 2.6.9-11 did resolve the problem, so it's obviously a kernel issue in 2.6.9-22 through 2.6.9-34. |
|
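The "D" state mentioned in the note above (uninterruptible sleep, usually waiting on I/O) is also exposed per process in `/proc/<pid>/stat`. The following minimal Python sketch extracts it from the text of that file; the function name and the sample line are hypothetical, and the sample is truncated to the first few fields.

```python
def process_state(stat_line):
    """Return (command, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the last ')' instead of whitespace.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    command = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return command, state


# Made-up, truncated sample line for illustration.
sample = "4321 (amavisd (master)) D 1 4321 4321 0 -1 4194560"
print(process_state(sample))
```

Applying this to every numeric directory under `/proc` and keeping the entries whose state is `D` gives the list of processes currently blocked, which is roughly what the `b` column of vmstat counts.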
The problem is definitely related to swap. After the kernel downgrade the system affected has not touched the swap even once while using the appropriate amount of RAM unlike previously. | |
I'm seeing this problem too, or something very similar. Weirdly, it only occurs on systems with > 756MB RAM. We had 4 mail servers running CentOS 4.3 with 2.6.9-34.ELsmp, all running near-enough identical software and OS but on varied hardware. All systems are i686.
mailserv1 - IBM Netfinity 4500R - Mem: 514524
mailserv2 - IBM xSeries 340 - Mem: 774504
mailserv3 - IBM xSeries 342 - Mem: 1034716
mailserv4 - Dell Poweredge 2650 - Mem: 1034676
mailserv1 and mailserv2 behave perfectly running 2.6.9-34.ELsmp; mailserv3 and mailserv4 swap excessively and don't use anything like the available memory. The attached Cacti and Munin memory usage graphs for mailserv3 show the system running 2.6.9-34.ELsmp and 2.6.9-22.ELsmp up to halfway through Week 18, when we switched to 2.6.9-5 and things started behaving again.
http://bugs.centos.org/file_download.php?file_id=98&type=bug
http://bugs.centos.org/file_download.php?file_id=99&type=bug
I'll try uniprocessor kernels and noapic/noacpi when I have a chance, but to me the different hardware would rule out firmware. Will. |
|
I have this exact same problem on CentOS 4.3 64bit, running on:
AMD Athlon64 3000+
2048 MB DDR-RAM
2x 160 GB Hard Drive (RAID 1)
With the exact same hardware but on CentOS 4.2 64bit there is no problem whatsoever. After a server restart the load starts rising: within a few hours it reaches 80, and in 12 hours it reaches 150 and the server starts slowing down and dying. Here is a screenshot: http://img90.imageshack.us/img90/8382/cpuhigh1te.jpg
Another interesting fact which may be useful in diagnosing this problem: at the time I took the above screenshot the server was running pretty well. Pages (which use mysql+php) were loading very fast with no problem. On my other servers, if the load ever reached 75, the server would start crawling. However, when this load rises to 150-200, that's when the server slows down, sendmail stops working, etc.
On my other server, which has the exact same hardware, CentOS 4.3 32bit is installed (although it's a 64bit Athlon CPU), and everything works fine there (sometimes during the day the CPU user usage goes up to 98%, but the load stays below 3 and the server works fine).
If you look at the screenshot, it looks as though apache is eating all of the CPU; however, we have tried different versions of apache (including the latest 2.2). There has been NO difference, thus I do not think this bug has anything to do with apache. We have tried compiling different kernel versions, yet the same problem persists. We have tried compiling different versions of PHP4: no difference. (We haven't tried a different version of mysql4, but I highly doubt there is anything between the current mysql and CentOS 4.3.)
The server starts, then the CPU user percentage slowly keeps going up; once it hits 95-98% the load starts rising. The weird thing is that the same hardware (same software, same configs) with the same amount of hits on CentOS 4.2 64bit works perfectly with a 0.50-1.50 load. I even tried the "localhost" suggestion; it didn't help. Thanks in advance for your help. |
|
There is a test kernel that has modified default elevator settings in our test repo for i386 (i586 and i686) and x86_64. Try that kernel (I would download the devel and the kernel manually ... use kernel-smp, kernel-hugemem, kernel-largesmp as necessary) and install them for testing. 2.6.9-34.19.EL is the version. http://dev.centos.org/centos/4/testing/i386/RPMS/ or http://dev.centos.org/centos/4/testing/x86_64/RPMS/ ------------------------------------------------ This is a test kernel, and we won't adjust the official kernel until another one is released from upstream. |
|
Hi jhughes, thanks for the reply. Unfortunately the suggestions did not help. Here is a screenshot of 2 servers (left is 4.3 with problems; right is 4.2 with no problems): http://img310.imageshack.us/img310/244/servershot6mj.jpg
Both servers have the same CPU/memory. Here is the CPU info:
[root@10 ~]# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 4
model name : AMD Athlon(tm) 64 Processor 3000+
stepping : 10
cpu MHz : 1999.800
cache size : 512 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips : 3923.96
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
Currently on the problem server:
Linux //removedhost// 2.6.9-34.19.EL #1 Thu Apr 20 08:11:36 CDT 2006 x86_64 x86_64 x86_64 GNU/Linux
Currently on the good server:
Linux //removedhost// 2.6.9-22.EL #1 Sat Oct 8 21:08:40 BST 2005 x86_64 x86_64 x86_64 GNU/Linux
What other information would you like me to get, or what tests should I run, to find out why I am having problems with CentOS 4.3 (64 bit)? Thank you. |
|
Also, when the load goes up (let's say to 100-150), I found that I don't need to restart the system to get the load average back to normal. Simply restarting httpd brings the load average down (then of course it continues climbing to 150 throughout the day). So it seems to me there is a problem between Apache and 4.3 64bit. I hope we can find a fix. |
|
The motherboard failed in our RHEL3 server and the new board (Asus A8N5X with AMD Athlon 64 3000+) did not have proper driver support (we got Serial ATA working by manually adding the sata_nv driver). We upgraded the system to CentOS 4.3 and, although everything works, the load average skyrockets because the system aggressively swaps memory to disk. The system has 1 GB of RAM but memory utilisation never exceeds roughly 300 MB. Information from 'top':
Mem: 1035792k total, 322092k used, 713700k free, 2100k buffers
Swap: 1004052k total, 275040k used, 729012k free, 54724k cached
The attached 'Memory Utilisation.pdf' shows the problem quite clearly: one can almost see the exact moment we upgraded the kernel on CentOS 3 to 2.6.9-34.EL (from CentOS 4) and when the system was completely re-installed from scratch using CentOS 4. RedHat have a document called 'Understanding Virtual Memory In Red Hat Enterprise Linux 4', which is available here, but I simply don't understand what I should be changing: http://people.redhat.com/nhorman/papers/rhel4_vm.pdf |
|
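The pattern in the `top` figures above (a large amount of swap in use while most of the RAM is free) can be told apart from ordinary memory pressure with a simple check. This is a minimal Python sketch using the `Mem:` and `Swap:` lines quoted in this thread; the function names and the 25% `free_fraction` threshold are hypothetical choices for illustration, not a value taken from the report.

```python
import re


def kb_values(line, labels):
    """Extract '<number>k <label>' values (in kB) for the given labels."""
    found = {label: int(value)
             for value, label in re.findall(r"(\d+)k (\w+)", line)}
    return [found[label] for label in labels]


def swap_despite_free_ram(mem_line, swap_line, free_fraction=0.25):
    """True when swap is in use although a large share of RAM is free."""
    mem_total, mem_free = kb_values(mem_line, ["total", "free"])
    (swap_used,) = kb_values(swap_line, ["used"])
    return swap_used > 0 and mem_free / mem_total >= free_fraction


# Figures from the note above: swap is used while about 69% of RAM is free.
over_swapping = swap_despite_free_ram(
    "Mem: 1035792k total, 322092k used, 713700k free, 2100k buffers",
    "Swap: 1004052k total, 275040k used, 729012k free, 54724k cached")

# Figures from the earlier updatedb report: RAM is nearly full there.
memory_full = swap_despite_free_ram(
    "Mem: 1025168k total, 1009988k used, 15180k free, 107980k buffers",
    "Swap: 2048248k total, 121224k used, 1927024k free, 185588k cached")

print(over_swapping, memory_full)
```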
RedHat kindly provided a solution to this problem in the form of a beta kernel scheduled for RHEL4 U4.
RedHat Bugzilla ID: 188141
Bugzilla Summary: Kernel appears too conservative in memory use
Working Kernel: 2.6.9-37.EL (http://people.redhat.com/~jbaron/rhel4/) |
|
I solved similar problems by downgrading to 2.6.9-11.EL. What has happened with the kernel? See the graphs:
http://bugs.centos.org/file_download.php?file_id=107&type=bug
http://bugs.centos.org/file_download.php?file_id=106&type=bug
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 3
cpu MHz : 2800.615
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid
bogomips : 5521.40 |
|
I'm currently also running RedHat's beta kernel (2.6.9-37.EL) on five webservers, and memory usage now looks like it did a long time ago (kernel 2.6.9-11 and before). Tweaking hosts, DNS caching and MySQL allowed our servers to work bearably (only short load and iowait spikes), but memory usage was still really strange. It would be interesting to know what in the kernel caused this and why it took so long for RedHat to notice and fix it (even though the official fix will not be available until U4). |
|
None of the suggestions here have helped me. I have to restart httpd every 5 hours to keep the servers running. I have tried different versions of apache, php, mysql and the kernel with no luck. |
|
OK, there is a much improved kernel (2.6.9-39.EL) in http://dev.centos.org/centos/4/testing/. Anyone having this issue, give that a try for your arch and see if it works any better. |
|
This has been driving me crazy for about a week now. I was running 2.6.9-22smp and started seeing this issue. I am running an Intel motherboard in a 1U server with 2GB RAM, a 3Ware 8006-2LP hardware RAID controller and two 7200rpm 300GB SATA drives.
//root> info c0
Unit UnitType Status %Cmpl Stripe Size(GB) Cache AVerify IgnECC
------------------------------------------------------------------------------
u0 RAID-1 OK - - 279.46 ON - -
Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 OK u0 279.46 GB 586072368 5NF1H8W5
p1 OK u0 279.46 GB 586072368 5NF1J97E
The 3Ware CLI and smartd both report that the drives and the array are fine, but whenever even light to moderate constant writing to disk occurs, the load average rises very rapidly, blocking logins and other processes until the writing is virtually done. I am now running 2.6.9-39.0.2.ELsmp after having tried 2.6.9-42.0.2.ELsmp. 2.6.9-39.0.2.ELsmp seems to do a bit better but it is hard to tell. I have tried the regular (non-smp) 2.6.9-42 kernel and that didn't appear to be any better. If anything, it was a little worse. I have not tried noapic and noacpi - what effect are they supposed to have? |
|
Here is some vmstat output showing some other symptoms of the behavior that I am seeing - not 100% sure whether this is the same behavior other folks are seeing or not:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 528 17440 24224 777820 0 0 0 368 1086 554 5 1 47 49
0 1 528 17696 24224 777820 0 0 0 0 1074 278 2 0 48 49
0 1 528 17696 24236 777808 0 0 0 200 1108 778 3 1 48 48
0 2 528 16320 24244 778320 0 0 30 210 1152 909 21 2 43 34
2 2 528 17784 24268 774656 0 0 166 1332 1203 591 45 5 8 42
0 4 528 23552 24272 771012 0 0 4 2 1145 488 39 5 0 57
0 4 528 23528 24272 771012 0 0 0 0 1198 715 6 1 0 93
0 4 528 23664 24272 771012 0 0 0 0 1130 480 2 1 0 97
1 4 528 23600 24272 771012 0 0 0 0 1125 407 1 1 0 98
0 4 528 23776 24272 771012 0 0 0 0 1102 266 0 0 0 100
0 4 528 23712 24272 771012 0 0 0 0 1089 288 1 1 0 99
0 5 528 23656 24272 771012 0 0 10 0 1054 298 2 1 0 98
0 6 528 22888 24272 771012 0 0 32 0 1050 285 2 0 0 98
...
0 7 528 23416 24280 771004 0 0 4 0 1063 329 8 3 0 90
1 8 528 23480 24280 771004 0 0 6 0 1093 395 6 1 0 93
0 8 528 23480 24280 771004 0 0 0 0 1202 480 3 1 0 96
0 8 528 23632 24284 771260 0 0 0 0 1061 272 3 0 0 97
0 8 528 23632 24284 771260 0 0 0 0 1074 255 1 0 0 99
0 8 528 23744 24284 771260 0 0 0 0 1044 200 0 0 0 100
0 8 528 23744 24284 771260 0 0 0 0 1029 188 0 0 0 100
0 8 528 23312 24284 771260 0 0 0 0 1047 258 2 1 0 98
0 8 528 23440 24284 771260 0 0 0 0 1046 272 1 0 0 99
0 8 528 23568 24284 771260 0 0 0 0 1105 369 1 1 0 98
0 7 528 23632 24288 771256 0 0 0 0 1066 224 1 0 0 98
0 5 528 23776 24288 771256 0 0 14 0 1112 460 7 1 0 92
0 5 528 23520 24288 771256 0 0 0 0 1030 227 1 0 0 98
0 5 528 23472 24288 771256 0 0 0 0 1045 239 1 0 0 99
0 5 528 22688 24288 771256 0 0 0 0 1077 386 2 1 0 97
1 6 528 22704 24292 771252 0 0 2 0 1140 453 1 13 0 86
2 6 528 22128 24292 771252 0 0 0 0 1015 270 1 20 0 79
2 6 528 21568 24292 771252 0 0 0 0 1002 264 1 1 0 99
3 6 528 20864 24296 771248 0 0 0 0 1002 178 0 0 0 100
3 6 528 20864 24296 771248 0 0 0 0 1002 262 0 1 0 99
3 6 528 20864 24296 771248 0 0 0 0 1002 183 0 1 0 99
4 6 528 20224 24296 771248 0 0 0 0 1002 180 0 0 0 100
4 6 528 20224 24296 771248 0 0 0 0 1003 179 0 1 0 100
4 6 528 20224 24296 771248 0 0 0 0 1003 183 1 1 0 99
4 6 528 20224 24296 771248 0 0 0 0 1002 179 0 0 0 100
4 6 528 20288 24300 771504 0 0 0 0 1002 226 0 1 0 99
7 4 528 18152 24300 771504 0 0 0 0 1101 359 19 3 0 78
{more heavy i/o wait}
1 1 528 16112 24312 769672 0 0 14 102 1044 228 51 1 0 48
1 1 528 16112 24316 769668 0 0 14 2 1052 262 51 1 2 47
1 1 528 17328 24316 768108 0 0 34 174 1108 412 63 1 20 16
1 2 528 16688 24320 768104 0 0 32 0 1067 253 51 1 14 35
1 1 528 16704 24344 768080 0 0 8 522 1032 192 50 0 7 44
6 1 528 16448 24372 768052 0 0 6 286 1056 415 21 1 47 30
1 1 528 22632 24384 761540 0 0 0 70 1130 508 75 2 15 8
1 1 528 22312 24408 762036 0 0 0 758 1119 425 53 1 3 43
1 1 528 20128 24412 762292 0 0 22 566 1175 563 56 2 1 41
1 1 528 19728 24416 762808 0 0 0 318 1109 351 54 1 36 10
1 1 528 23056 24280 758524 0 0 0 4 1143 425 72 3 25 0
1 2 528 22608 24292 758772 0 0 2 912 1100 407 52 1 30 17
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 3 528 22240 24292 759292 0 0 0 0 1050 210 50 0 0 50
1 6 528 21088 24292 759552 0 0 24 0 1123 511 58 2 0 40
1 4 528 20440 24300 759804 0 0 0 4 1103 311 51 1 0 48
1 4 528 19992 24300 760324 0 0 0 0 1046 237 53 1 0 47
0 4 528 19656 24300 760844 0 0 0 4 1079 309 20 1 0 80
0 1 528 20104 24312 760832 0 0 0 556 1064 224 0 0 6 93
0 1 528 20288 24316 760828 0 0 2 1654 1119 258 14 1 47 38
0 1 528 20352 24324 760820 0 0 0 402 1210 331 3 1 49 47
0 1 528 22216 24436 756808 0 0 32 1304 1323 1422 23 5 54 18
0 1 528 20680 24464 756780 0 0 20 352 1162 764 10 2 56 32
0 2 528 20184 24472 756772 0 0 30 704 1191 490 9 1 31 59
0 2 528 20120 24480 757024 0 0 82 8 1170 361 5 0 0 94
0 0 528 22424 24524 756980 0 0 0 542 1144 488 2 1 79 19
0 0 528 21976 24524 756980 0 0 0 0 1098 414 3 1 97 0
0 0 528 21976 24532 756972 0 0 0 238 1114 379 2 0 97 0
0 0 528 22296 24532 757232 0 0 0 0 1096 449 4 1 95 0 |
|
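Output like the vmstat trace above is easier to compare across kernels when reduced to a couple of numbers, such as the peak count of blocked processes (`b`) and the mean I/O wait (`wa`). A minimal Python sketch follows, assuming the 16-column layout shown in the trace; the function names are hypothetical, and the sample uses three rows copied from the trace.

```python
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def parse_vmstat(text):
    """Return one dict per data row, skipping headers and annotations."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def summarize(rows):
    """Peak number of blocked processes and mean iowait percentage."""
    return {"max_blocked": max(row["b"] for row in rows),
            "mean_wa": sum(row["wa"] for row in rows) / len(rows)}


# Header plus three rows taken from the trace above.
sample = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 4 528 23776 24272 771012 0 0 0 0 1102 266 0 0 0 100
0 8 528 23744 24284 771260 0 0 0 0 1044 200 0 0 0 100
0 0 528 22296 24532 757232 0 0 0 0 1096 449 4 1 95 0
"""
summary = summarize(parse_vmstat(sample))
print(summary["max_blocked"], round(summary["mean_wa"], 1))
```

Note that in the trace the `bi` and `bo` columns stay at or near zero while `b` and `wa` are high, so the processes are blocked even though very little data is actually moving.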
Running kernel 2.6.9-42.0.2 on several 32bit AMD machines and on one 64bit Dual Core AMD machine, all doing Apache/PHP/mysql and everything is working as expected. No load and iowait "spikes" and memory usage is normal. So the latest update seems to have (finally) fixed the issues. |
|
I second that. I installed the 2.6.9-42.0.2 kernel and it has stopped the high I/O wait times on all the affected servers. I do now have to deal with noapic on some of the equipment in question, but that's a small thing and of no great concern. | |
Apparently fixed with 2.6.9-42.0.2; also, CentOS 4 is EOL. | |
Date Modified | Username | Field | Change |
---|---|---|---|
2006-01-19 16:40 | smolderinggenius | New Issue | |
2006-01-19 16:40 | smolderinggenius | Status | new => assigned |
2006-01-19 16:50 | JohnnyHughes | Note Added: 0003059 | |
2006-01-19 16:55 | smolderinggenius | Note Added: 0003060 | |
2006-01-23 12:40 | Vermut | Note Added: 0003071 | |
2006-01-23 12:56 | JohnnyHughes | Note Added: 0003072 | |
2006-01-25 09:12 | jplahti | Note Added: 0003092 | |
2006-02-06 16:18 | smolderinggenius | Note Added: 0003138 | |
2006-02-07 07:50 | jplahti | Note Added: 0003145 | |
2006-02-16 18:39 | smolderinggenius | Note Added: 0003198 | |
2006-02-16 18:43 | smolderinggenius | Note Added: 0003199 | |
2006-04-12 03:47 | scronline | Note Added: 0003384 | |
2006-04-15 17:28 | scronline | Note Edited: 0003384 | |
2006-04-16 04:33 | scronline | Note Added: 0003399 | |
2006-05-18 13:19 | wmcdonald | Note Added: 0003491 | |
2006-05-18 13:20 | wmcdonald | File Added: mailserv3memory.png | |
2006-05-18 13:21 | wmcdonald | Note Edited: 0003491 | |
2006-05-18 13:22 | wmcdonald | Note Edited: 0003491 | |
2006-05-18 13:32 | wmcdonald | File Added: munin-mailserv3.png | |
2006-05-18 13:32 | wmcdonald | Note Edited: 0003491 | |
2006-05-18 13:40 | wmcdonald | Note Edited: 0003491 | |
2006-05-18 15:10 | wmcdonald | Note Edited: 0003491 | |
2006-05-22 08:08 | cpuproblem | Note Added: 0003492 | |
2006-05-22 08:11 | cpuproblem | Note Edited: 0003492 | |
2006-05-22 08:17 | cpuproblem | Note Edited: 0003492 | |
2006-05-22 10:16 | JohnnyHughes | Note Added: 0003493 | |
2006-05-22 10:18 | JohnnyHughes | Note Edited: 0003493 | |
2006-05-22 20:43 | cpuproblem | Note Added: 0003494 | |
2006-05-23 18:15 | cpuproblem | Note Added: 0003495 | |
2006-05-31 17:21 | bbs2web | Note Added: 0003529 | |
2006-05-31 17:21 | bbs2web | File Added: Memory Utilisation.pdf | |
2006-05-31 20:36 | bbs2web | Note Added: 0003531 | |
2006-06-05 10:06 | sapinho | File Added: load-0.9.1_week.png | |
2006-06-05 10:06 | sapinho | File Added: mem-0.13.5_week.png | |
2006-06-05 10:09 | sapinho | Note Added: 0003537 | |
2006-06-05 10:12 | sapinho | Note Edited: 0003537 | |
2006-06-06 07:19 | jplahti | Note Added: 0003542 | |
2006-06-10 22:31 | cpuproblem | Note Added: 0003558 | |
2006-06-15 12:42 | JohnnyHughes | Note Added: 0003567 | |
2006-07-31 17:25 | hamav8tor | Note Added: 0003732 | |
2006-07-31 18:47 | hamav8tor | Note Added: 0003733 | |
2006-08-29 08:34 | jplahti | Note Added: 0003864 | |
2006-08-30 20:57 | scronline | Note Added: 0003868 | |
2013-03-23 12:58 | tigalch | Note Added: 0016823 | |
2013-03-23 12:58 | tigalch | Status | assigned => resolved |
2013-03-23 12:58 | tigalch | Resolution | open => fixed |