View Issue Details

ID: 0006672 | Project: CentOS-6 | Category: nfs-utils | View Status: public | Last Update: 2013-09-20 16:19
Reporter: xxwassyxx
Priority: high | Severity: major | Reproducibility: always
Status: resolved | Resolution: fixed
Platform: x86_64 | OS: CentOS | OS Version: 6.2
Product Version: 6.2
Target Version: (none) | Fixed in Version: (none)
Summary: 0006672: NFS directory with a large number of files gets a "Too many levels of symbolic links" error
Description: Basically, we have an NFS directory exported from server A and mounted on server B. The directory currently holds 28,047 files of various sizes, mostly 12-14 KB each. If we run, for example, "ls | wc -l", we get the following error, and wc counts only 15914 entries:

    ls: reading directory .: Too many levels of symbolic links
    15914
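A rough Python equivalent of "ls | wc -l" that keeps whatever entries were read before a failure and names the errno, which makes a partial listing easier to capture for later comparison. This is a sketch, not a tool used in this thread: on Linux, iterating `os.scandir()` reads the directory through libc `readdir()` (the getdents family of system calls), so an ELOOP partway through should surface as an `OSError` after some names have already been collected, much like the 15914 count above. The path you pass in is up to you.

```python
import errno
import os
from typing import List, Optional, Tuple


def list_directory(path: str) -> Tuple[List[str], Optional[str]]:
    """Read every entry of `path`, roughly like `ls | wc -l` would.

    Returns (names read so far, errno name), where the errno name is
    something like "ELOOP" if the listing stopped early, or None on success.
    """
    names: List[str] = []
    try:
        with os.scandir(path) as it:
            for entry in it:
                names.append(entry.name)
    except OSError as exc:
        # Entries collected before the failing readdir call are kept,
        # mirroring how ls still printed a partial count.
        return names, errno.errorcode.get(exc.errno, "UNKNOWN")
    return names, None
```

Run against the affected mount (for example `list_directory("/mnt")`), one would expect a partial list of names together with `"ELOOP"`; on a healthy directory the second value is `None`.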
Now this is where things get very exciting. If we create a file called "ls.out" (and yes, the name matters; the contents do not), everything works. Inode numbers are irrelevant, as we have tried several different tests. When we reproduced the problem in other directories, we also tried other file names, in case we had somehow made the name "ls.out" special, but only that name works.
Steps To Reproduce: This appears to be something on server B that is causing the issue. I don't know exactly what, and I have ruled out the kernel version, as updating server B from 2.6.32-220 to 2.6.32-358 made no difference. However, we can consistently reproduce the issue in different directories on server B and get the same error. Copying all the files directly to server B and sharing them to server A instead did not produce any errors. We have also eliminated symbolic links as the cause by putting all the files in a root-level folder and mounting it on /mnt, and we still get the same error even though there are no symbolic links.
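To try reproducing this against another server, a directory of a similar shape can be generated on the exported filesystem and then listed through an NFS mount. A minimal sketch under stated assumptions: the count (28,047) and size (about 13 KB) come from the description, the `.bhd` timestamp naming imitates the file named in the dmesg output, and the start time, the 15-minute spacing between timestamps, and the target directory are all hypothetical. Since the report suggests that file names matter, reusing the real names would be a closer test if they are available.

```python
import os
from datetime import datetime, timedelta


def populate(directory: str, count: int = 28047, size: int = 13 * 1024,
             start: datetime = datetime(2005, 7, 14, 15, 0, 0),
             step: timedelta = timedelta(minutes=15)) -> int:
    """Create `count` files of `size` bytes named YYYYMMDD_HHMMSS.bhd.

    The defaults mirror the report (28,047 files of roughly 13 KB); the
    start time and 15-minute step are hypothetical. Returns the file count.
    """
    os.makedirs(directory, exist_ok=True)
    payload = bytes(size)  # contents reportedly do not matter, only names
    for i in range(count):
        name = (start + i * step).strftime("%Y%m%d_%H%M%S") + ".bhd"
        with open(os.path.join(directory, name), "wb") as fh:
            fh.write(payload)
    return count
```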
Additional Information: I ran strace and noticed this error:

    getdents(3, 0x17da1c8, 32768) = -1 ELOOP (Too many levels of symbolic links)
Some research showed that it is definitely complaining about a symbolic link somehow. But even after removing all symbolic links, the results were the same. I did notice a couple of other errors in dmesg:
    NFS: directory /nfs_symlink_bug contains a readdir loop.Please contact your server vendor. The file: 20050714_150000.bhd has duplicate cookie 1220256249
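The dmesg message comes from the Linux NFS client and means the server handed back the same readdir cookie (an opaque position marker in the directory, not necessarily an inode number) for two different entries. As far as I understand the client code, it then fails the directory read with ELOOP, which would explain why the strace output mentions symbolic links even though none are involved. One plausible but unconfirmed link to the file-name sensitivity: on ext3/ext4 exports with hashed directory indexes, the cookie is typically derived from a hash of the file name, so a particular name can collide with another. The sketch below is a simplified model, not the kernel's implementation: `find_duplicate_cookies` scans a hypothetical list of (name, cookie) pairs for reuse, and `collision_probability` applies the standard birthday bound to estimate the chance of at least one duplicate among n random cookie values. The 32-bit default width is an assumption.

```python
import math
from typing import Dict, Iterable, List, Tuple


def find_duplicate_cookies(
    entries: Iterable[Tuple[str, int]]
) -> List[Tuple[str, str, int]]:
    """Return (first_name, later_name, cookie) for every reused cookie.

    `entries` is a hypothetical stream of (file name, readdir cookie)
    pairs in the order a server returned them.
    """
    seen: Dict[int, str] = {}
    dups: List[Tuple[str, str, int]] = []
    for name, cookie in entries:
        if cookie in seen:
            dups.append((seen[cookie], name, cookie))
        else:
            seen[cookie] = name
    return dups


def collision_probability(n: int, bits: int = 32) -> float:
    """Birthday bound: chance that n uniformly random `bits`-bit values
    contain at least one duplicate, 1 - exp(-n(n-1) / 2^(bits+1)).

    The bit width is an assumption about the effective cookie space.
    """
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))
```

For the 28,047 files in this directory, `collision_probability(28047)` is roughly 0.09, and roughly 0.17 with `bits=31`, so a duplicate somewhere in a directory of this size would not be surprising if the cookies really were name hashes.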
Now, what's really crazy: just like with the ls.out file, if I delete this file, everything works. But if I touch a file with the same name, so it is empty with no contents, I get the same error again. Is the cookie number an inode number? I have failed to find any clear-cut definition of it. Some other things I have looked at: was there a pattern to the missing files? As you can see, about 12,000 files are missing (wc counted 15,914 of 28,047), and writing the listings to files and comparing them with sdiff did not reveal any pattern. I looked at file sizes, locations, names, etc., and there did not appear to be any pattern at all. nfs-utils is the same version on both machines as well, and I have tried this with both NFS v3 and v4.
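The sdiff comparison described above can also be done as a set difference, which ignores ordering (entries read through an NFS mount need not come back in the same order as on the server's local disk). A minimal sketch, assuming two listing files with one name per line, for example one produced with `ls` on the server and one through the client mount; the listing file names in the example are hypothetical.

```python
from typing import List, Set


def read_names(listing_path: str) -> Set[str]:
    """Load one file name per line, skipping blank lines."""
    # surrogateescape keeps undecodable bytes in names instead of failing
    with open(listing_path, encoding="utf-8", errors="surrogateescape") as fh:
        return {line.rstrip("\n") for line in fh if line.strip()}


def missing_names(expected_path: str, actual_path: str) -> List[str]:
    """Names in the expected listing but absent from the actual one, sorted."""
    return sorted(read_names(expected_path) - read_names(actual_path))
```

For example, `missing_names("server.txt", "client.txt")` lists the files that the client-side listing skipped; sorting the result makes it easier to scan for patterns in names or timestamps.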
Tags: No tags attached.

Activities

xxwassyxx

2013-09-18 14:47

reporter   ~0018015

Grrr, I'm not sure how to edit this, but I realise the platform should be x86_64.
toracat

2013-09-18 15:56

manager   ~0018017

Did you try updating _both_ the server and the client to kernel 2.6.32-358? Does the problem persist?
xxwassyxx

2013-09-18 17:19

reporter   ~0018022

No, we only updated the client, since it was the only one having the problem. I can update the server too and report back with what I see.
toracat

2013-09-18 17:42

manager   ~0018024

Yes, please update the server as well. I hope, and believe, it will fix the issue.
xxwassyxx

2013-09-18 20:37

reporter   ~0018026

I have had to push the maintenance window to tomorrow, as jobs are running on the server. I will update you again once it is completed.
xxwassyxx

2013-09-19 22:07

reporter   ~0018029

This can be considered resolved :) It appears the kernel upgrade needed to be applied on both the server and the client in order to work. I am still curious, however, why in the world a file called ls.out would make it work.
toracat

2013-09-19 22:51

manager   ~0018030

Glad to hear the problem is resolved. Regarding the ls.out file, let's leave it as the 8th wonder of the world. :)
toracat

2013-09-20 16:19

manager   ~0018038

Closing as 'resolved' as the issue was fixed with EL6.4 kernels.
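Since the problem only went away once both the server and the client ran the EL6.4 kernel (2.6.32-358), one way to audit other machines is to compare each host's `uname -r` output against that release. A minimal sketch, assuming EL6-style release strings of the form `2.6.32-NNN[.N...]` followed by a suffix such as `.el6.x86_64`; the host names and release values in the example are hypothetical, and collecting the strings from each host is left out.

```python
import re
from typing import Dict, Tuple

MINIMUM = "2.6.32-358"  # EL6.4 kernel mentioned in this thread

# Leading x.y.z, then the numeric build after '-' (e.g. 358 or 358.6.2).
_RELEASE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+(?:\.\d+)*)")


def release_key(release: str) -> Tuple[int, ...]:
    """Turn '2.6.32-358.el6.x86_64' into (2, 6, 32, 358) for comparison.

    Non-numeric suffixes such as '.el6.x86_64' are ignored.
    """
    m = _RELEASE.match(release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    base = tuple(int(g) for g in m.group(1, 2, 3))
    build = tuple(int(p) for p in m.group(4).split("."))
    return base + build


def hosts_below(releases: Dict[str, str], minimum: str = MINIMUM) -> Dict[str, str]:
    """Return the hosts whose kernel release is older than `minimum`."""
    floor = release_key(minimum)
    return {host: rel for host, rel in releases.items() if release_key(rel) < floor}
```

For example, `hosts_below({"server-a": "2.6.32-220.el6.x86_64", "server-b": "2.6.32-358.el6.x86_64"})` flags only `server-a`. Comparing integer tuples rather than raw strings avoids treating "2.6.32-99" as newer than "2.6.32-358".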

Issue History

Date Modified Username Field Change
2013-09-18 14:44 xxwassyxx New Issue
2013-09-18 14:47 xxwassyxx Note Added: 0018015
2013-09-18 15:27 tigalch Platform HP DL380 Server => x86_64
2013-09-18 15:56 toracat Note Added: 0018017
2013-09-18 17:19 xxwassyxx Note Added: 0018022
2013-09-18 17:42 toracat Note Added: 0018024
2013-09-18 20:37 xxwassyxx Note Added: 0018026
2013-09-19 22:07 xxwassyxx Note Added: 0018029
2013-09-19 22:51 toracat Note Added: 0018030
2013-09-20 16:19 toracat Note Added: 0018038
2013-09-20 16:19 toracat Status new => resolved
2013-09-20 16:19 toracat Resolution open => fixed