View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0003373||CentOS-4||nscd||public||2009-01-31 20:33||2013-03-23 21:47|
|Target Version||Fixed in Version|
|Summary||0003373: nscd uses 100% cpu and stops responding|
|Description||I have a 48 node engineering compute cluster running CentOS 4.7. I've noticed nscd processes using up 100% CPU on several machines. 'service nscd restart' is not able to stop the process, so I have to manually kill the offending nscd.|
I've enabled debug and the log file and restarted nscd, but don't see any error messages. I've disabled nscd for now, as this issue causes significant performance issues on the cluster.
|Tags||No tags attached.|
I also have this issue with CentOS 4.7 on a number of machines.
nscd will only respond to a 'kill -9'.
lsof shows that nscd has /var/run/nscd/socket opened twice - machines running nscd 'normally' have /var/run/nscd/socket opened once.
We recently started using CentOS 4.7 on all our clusters, approximately 300 machines. We have experienced the same problem: nscd processes use up 100% CPU on several machines. 'service nscd restart' is not able to stop the process, and nscd will only respond to a 'kill -9'. We are currently restarting nscd in a daily cron job as a workaround.
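A watchdog run from cron could check whether nscd is actually spinning before killing it, rather than restarting it blindly. The following is a minimal Python sketch of that check, not the cron job described above. It assumes the Linux `/proc/<pid>/stat` layout, in which utime and stime are the 14th and 15th fields; the function names and the 95% threshold are hypothetical.

```python
import os

def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before tokenising the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])

def cpu_percent(before, after, interval_s, hz=None):
    """CPU usage, as a percentage of one core, between two stat samples."""
    if hz is None:
        hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    return 100.0 * (cpu_ticks(after) - cpu_ticks(before)) / (hz * interval_s)

def is_spinning(before, after, interval_s, hz=None, threshold=95.0):
    """True if the process used at least `threshold` percent CPU (assumed cutoff)."""
    return cpu_percent(before, after, interval_s, hz) >= threshold
```

To use it, read `/proc/<pid>/stat` for the nscd process twice, a few seconds apart, and pass both lines to `is_spinning`. Since the affected processes ignore a normal stop, the follow-up would be a SIGKILL and a fresh start of the service.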
We have also noticed that on the machines where the nscd processes are using up 100% CPU, 'lsof' shows two open /var/run/nscd/socket. But on the machines with a normal nscd 'lsof' shows one open /var/run/nscd/socket.
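The double-open symptom could be checked across many machines with a small script. The sketch below counts how many lines of `lsof` output show nscd holding the socket path. It assumes the usual whitespace-separated lsof layout, with the command name in the first column and the path appearing as its own field; the function name is hypothetical.

```python
def count_socket_fds(lsof_output,
                     socket_path="/var/run/nscd/socket",
                     command="nscd"):
    """Count lsof lines where `command` has `socket_path` open."""
    count = 0
    for line in lsof_output.splitlines():
        fields = line.split()
        # First column is the command name; the socket path shows up as
        # one of the later whitespace-separated fields.
        if len(fields) >= 2 and fields[0] == command and socket_path in fields:
            count += 1
    return count
```

Feeding it the output of `lsof -c nscd` and flagging any host where the count is greater than one would match the observation above: one entry on healthy machines, two on the ones stuck at 100% CPU.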
|Initially we thought it was Red Hat Bugzilla – Bug 428837 - leaking file descriptors. We tried a rebuilt nscd with the patch from that bug applied. It didn't solve our problems.|
I can't find anything in the Red Hat Bugzilla that matches this problem - other Bugzilla entries about nscd using 100% CPU seem to be related to issues with LDAP - which we are not using.
Is it worth opening a Red Hat Bug about this?
Very interesting that you experience it even though you are not using LDAP.
I found the following information in Debian's bug report on the leaking file handles (Bug report #401758):
the do_drop_connection portion of this patch which is not technically
required to fix the leak -- it fixes another bug: libnss-ldap is totally
broken in multithreaded programs (such as nscd) because you can't do
"close(10); dup2(14,10);" and guarantee another thread didn't re-open fd
10 in the meanwhile. the patch as included fixes this problem but only
when non-ssl connections are in use... in the case ssl connections are in
use it's just totally broken and can't be fixed. yay. (however thanks to
fixing the do_get_our_socket code the drop code is rarely called in the
So it can't be that problem since you experience it and you have no LDAP connection.
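For anyone unfamiliar with the race described in the quoted Debian note, the Python sketch below illustrates why a separate `close()` followed by `dup2()` is unsafe once other threads are involved. It is only a single-threaded illustration of descriptor reuse, not the libnss-ldap patch, and the function names are hypothetical.

```python
import os

def fd_reuse_demo():
    """Show that a closed descriptor number is handed out again at once."""
    first = os.open(os.devnull, os.O_RDONLY)
    os.close(first)
    # open() returns the lowest free descriptor, so the number that was
    # just released comes straight back. In a multithreaded program,
    # another thread could grab it in exactly this window.
    second = os.open(os.devnull, os.O_RDONLY)
    os.close(second)
    return first, second

def replace_fd(src, dst):
    """Point dst at src's open file in one call, with no separate close()."""
    # dup2() closes dst (if open) and reuses its number as a single step,
    # so there is no gap in which dst is free for another thread to take.
    os.dup2(src, dst)
```

In a single-threaded process `fd_reuse_demo()` returns the same number twice, which is the window the note is describing: between `close(10)` and `dup2(14,10)`, descriptor 10 is free for any other thread to claim.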
We should definitely report it to Red Hat; we have found the same problem on some of our Red Hat servers as well. Will you report it, or should I?
|I don't have any machines running RHEL 4.7, so it would be 'difficult' for me to log it as a RHEL 4.7 issue. However, if you've seen it on RHEL 4.7 boxes, then it is probably best if you log it, if that is OK?|
|I've reported it in Red Hat Bugzilla – Bug 492581.|
Three more Bugzilla reports have appeared on the same subject:
496201 includes a possible explanation and a suggested patch to fix the issue
|I've been running nscd with the patch at <https://bugzilla.redhat.com/attachment.cgi?id=339968> on all my 4.7 systems for a week now and have not seen any of them running at 100% CPU.|
It appears that this is actually a kernel bug. The glibc/nscd patch just 'papers over' this.
Bugzilla #496201 (and now #501800) has been bumped up to high/urgent priority, but I'm not sure whether the fix will make it into a 4.8 kernel update ...
|Patch will be in kernel 89.0.1.EL|
Fix will be in the errata kernel 89.0.3.EL - see:
|upstream marked this as solved.|
|2009-01-31 20:33||jweage||New Issue|
|2009-03-16 11:00||james-p||Note Added: 0008921|
|2009-03-26 12:29||Malinfro||Note Added: 0008946|
|2009-03-26 14:46||Malinfro||Note Added: 0008947|
|2009-03-26 15:22||james-p||Note Added: 0008948|
|2009-03-27 09:47||Malinfro||Note Added: 0008954|
|2009-03-27 10:26||james-p||Note Added: 0008955|
|2009-03-27 14:53||Malinfro||Note Added: 0008966|
|2009-05-01 08:30||james-p||Note Added: 0009285|
|2009-05-12 14:38||james-p||Note Added: 0009348|
|2009-05-20 19:10||james-p||Note Added: 0009377|
|2009-05-21 19:00||james-p||Note Added: 0009379|
|2009-06-30 11:42||james-p||Note Added: 0009536|
|2013-03-23 21:47||tigalch||Note Added: 0016976|
|2013-03-23 21:47||tigalch||Status||new => resolved|
|2013-03-23 21:47||tigalch||Resolution||open => fixed|