View Issue Details

ID: 0005496
Project: CentOS-6
Category: kernel
View Status: public
Last Update: 2013-04-16 20:09
Reporter: TheVision
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 6.2
Target Version:
Fixed in Version: 6.3
Summary: 0005496: readdir() fails to return all entries of a NFS directory
Description: While testing file system performance of an NFS-mounted directory with bonnie++, it exited with a "Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty" error.

It appears that readdir() on a CentOS 6.2 NFS client (both x86 and x64, connecting to either a 6.2 or 5.7 NFS server) fails to return all entries in an NFS directory. This differs from how the NFS client on CentOS 5.7 or Oracle 6.2 behaves.

In the examples below, /home/greg/mnt/tmp is an NFS mounted directory. I created 20,480 0-length files by hand. nfsrm is included below.

CentOS 5.7 (expected behavior):
Linux 4eg5fre 2.6.18-274.17.1.el5 #1 SMP Tue Jan 10 17:25:58 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[greg@4eg5fre ~]$ ls /home/greg/mnt/tmp | wc -l
20480
[greg@4eg5fre ~]$ ./nfsrm /home/greg/mnt/tmp
[greg@4eg5fre ~]$ ls /home/greg/mnt/tmp | wc -l
0

Oracle 6.2 (expected behavior):
Linux 8gpxkf1 2.6.32-300.3.1.el6uek.x86_64 #1 SMP Fri Dec 9 18:57:35 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[greg@8gpxkf1 ~]$ ls /home/greg/mnt/tmp | wc -l
20480
[greg@8gpxkf1 ~]$ ./nfsrm /home/greg/mnt/tmp
[greg@8gpxkf1 ~]$ ls /home/greg/mnt/tmp | wc -l
0

CentOS 6.2 client (unexpected behavior):
Linux 9yf6091 2.6.32-220.4.1.el6.i686 #1 SMP Mon Jan 23 22:37:12 GMT 2012 i686 i686 i386 GNU/Linux
[greg@9yf6091 ~]$ ls /home/greg/mnt/tmp | wc -l
20480
[greg@9yf6091 ~]$ ./nfsrm /home/greg/mnt/tmp
[greg@9yf6091 ~]$ ls /home/greg/mnt/tmp | wc -l
19316

readdir() works as expected on 6.2 when using a local filesystem.
Steps To Reproduce:
1. Create a large number (1,024 wasn't enough -- I used 20,480) of 0-length files in an NFS-mounted directory.
2. Loop through the files with readdir(), unlink()ing them along the way. This can be done with the program included below.

#!/usr/bin/perl

use warnings;
use strict;

my $nfsdir;
my $file;

if ($#ARGV != 0) {
        die "usage: $0 nfsdirectory\n";
}
$nfsdir = $ARGV[0];
chdir($nfsdir) or die "fatal: chdir($nfsdir) failed: $!\n";
opendir(DIR, ".") or die "fatal: opendir(.) failed: $!\n";
while (defined($file = readdir DIR)) {
        next if $file =~ /^\.\.?$/;
        unlink "$nfsdir/$file" or warn "warn: unlink($nfsdir/$file) failed: $!\n";
}
closedir(DIR) or die "fatal: closedir() failed: $!\n";
chdir("..") or die "fatal: chdir(..) failed: $!\n";
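
Step 1 and the before/after entry counts can be scripted. The following is only a sketch: `make_empty_files` and `count_entries` are hypothetical helper names, not part of nfsrm, and the files are simply named 1..COUNT.

```shell
# make_empty_files DIR COUNT
# Create COUNT zero-length files named 1..COUNT inside DIR.
make_empty_files() {
    dir=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        : > "$dir/$i"        # create (or truncate to) a 0-length file
        i=$((i + 1))
    done
}

# count_entries DIR
# Print the number of entries in DIR, excluding "." and "..".
# Assumes no file name contains a newline.
count_entries() {
    ls -A "$1" | wc -l | tr -d ' '
}
```

For example, `make_empty_files /home/greg/mnt/tmp 20480` followed by `count_entries /home/greg/mnt/tmp` before and after running nfsrm gives the two numbers compared in the transcripts above.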
Additional Information: Calling rewinddir() after unlink() can work around the problem.
Tags: No tags attached.

Relationships

related to 0006241 resolved toracat CentOS-5 See: 0005496: readdir() fails to return all entries of a NFS directory
related to 0006213 assigned kbsingh@karan.org CentOS-5 NFS mounts don't show all directory entries

Activities

toracat

2012-02-07 21:48

manager   ~0014424

There is a forum post that is apparently related:

"centos 6.2 cant see all files on nfs mount"
https://www.centos.org/modules/newbb/viewtopic.php?topic_id=35486&forum=55
tru

2012-02-07 23:00

administrator   ~0014425

Hi,
CentOS is not cloning the Oracle Linux version, though it's nice to see that their kernel is behaving properly. Does the issue also happen on the RHEL kernel?

I can't reproduce it on my setup: server c5 x86_64 2.6.18-274.12.1.el5
client c6 2.6.32-220.4.1.el6.x86_64

[tru@sillage tmp]$ sudo lvcreate --name bugs5496 --size 100M sillage
  Logical volume "bugs5496" created
[tru@sillage tmp]$ sudo mkfs.xfs /dev/sillage/bugs5496
meta-data=/dev/sillage/bugs5496  isize=256    agcount=6, agsize=4096 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=24576, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=1200, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
[tru@sillage tmp]$ sudo mount /dev/sillage/bugs5496 /test/
[tru@sillage tmp]$ tail -1 /etc/exports
/test ogotai.bis.pasteur.fr(rw,async)
[tru@sillage test]$ pwd
/test
[tru@sillage test]$ df .
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/sillage-bugs5496 xfs 92M 4.3M 88M 5% /test
[tru@sillage test]$ for i in `seq 1 20480`;do touch $i; done
[tru@sillage test]$ find . -type f | wc -w
20480


[tru@ogotai ~]$ sudo mkdir /nfs/test
[tru@ogotai ~]$ sudo mount sillage:/test /nfs/test
[tru@ogotai ~]$ df -P /nfs/test/
Filesystem Type 1024-blocks Used Available Capacity Mounted on
sillage:/test nfs 93504 10080 83424 11% /nfs/test
[tru@ogotai ~]$ perl /tmp/nfsrm.pl /nfs/test/
[tru@ogotai ~]$ echo $?
0
TheVision

2012-02-07 23:23

reporter   ~0014426

"CentOS is not cloning the Oracle Linux version, though it's nice to see that their kernel is behaving properly. Does the issue also happen on the RHEL kernel?"

I don't have access to RHEL 6.2; the closest I have is Oracle (which is RHEL + Oracle's fixes).

...
[tru@ogotai ~]$ perl /tmp/nfsrm.pl /nfs/test/
[tru@ogotai ~]$ echo $?
0
...

Can you do a "ls /nfs/test | wc -l"? nfsrm.pl doesn't report an error when it fails to read all of the files -- according to readdir(), it's read all of them.
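
Since nfsrm.pl exits 0 even when entries are left behind, the failure only shows up if the directory is checked afterwards. A minimal shell sketch of such a check follows; `check_emptied` is a hypothetical helper name, not something nfsrm.pl provides.

```shell
# check_emptied DIR
# Return 0 if DIR has no entries left (ignoring "." and ".."),
# otherwise report the leftover count on stderr and return 1.
# Assumes no file name contains a newline.
check_emptied() {
    left=$(ls -A "$1" | wc -l | tr -d ' ')
    if [ "$left" -ne 0 ]; then
        echo "error: $left entries still present in $1" >&2
        return 1
    fi
    return 0
}
```

It could be chained after the script, for example `perl /tmp/nfsrm.pl /nfs/test/ && check_emptied /nfs/test/`, so that `echo $?` reflects whether the directory was really emptied.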
tru

2012-02-07 23:37

administrator   ~0014427

server side:
[tru@sillage test]$ for i in `seq 1 20480`;do touch $i; done
[tru@sillage test]$ find . -type f| wc -w
20480

client side:
[tru@ogotai ~]$ find /nfs/test -type f | wc -w
20480
[tru@ogotai ~]$ perl /tmp/nfsrm.pl /nfs/test/
[tru@ogotai ~]$ find /nfs/test -type f | wc -w
19117

back to server side:
[tru@sillage test]$ find . -type f| wc -w
19117

-> perl readdir() change of behaviour between 5.7 and 6.2, no idea what oracle provides
TheVision

2012-02-08 00:21

reporter   ~0014428

"perl readdir() change of behaviour between 5.7 and 6.2, no idea what oracle provides"

It's not a perl change of behavior, as readdir() works as expected on a local directory with the same number of files (As noted above, I tried nfsrm.pl on a local directory to confirm this). I've cobbled together a C program that exhibits the same problem; I just felt it was easier to share the perl script.

This problem was originally detected with bonnie++ (invoked like this: "bonnie++ -f -d <nfsdirectory>"), which failed while doing the sequential delete of many files. bonnie++ does an opendir(), and while in a readdir() loop, unlink()s each file. It then tries to remove the directory -- which fails, as files still exist in said directory.
Phil Schaffner

2012-02-08 14:27

reporter   ~0014430

Client side on RHEL6u2:
[pschaffn@wx1 ~]$ ls /share/pschaffn/TST/ | wc -w
20480
[pschaffn@wx1 ~]$ ./nfsrm.pl /share/pschaffn/TST
[pschaffn@wx1 ~]$ ls /share/pschaffn/TST/ | wc -w
19219
[pschaffn@wx1 ~]$ find /share/pschaffn/TST/ -type f| wc -w
19219
[pschaffn@wx1 ~]$ uname -rmi
2.6.32-220.4.1.el6.x86_64 x86_64 x86_64
[pschaffn@wx1 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)

On the server side all files are removed.
TheVision

2012-02-08 15:18

reporter   ~0014431

It seems that rename()'s interaction with readdir() will also trigger this bug. If the unlink() is commented out and replaced with rename():

...
#unlink "$nfsdir/$file" or warn "warn: unlink($nfsdir/$file) failed: $!\n";
rename("$nfsdir/$file", "$nfsdir/$file.1") or warn "warn: rename($nfsdir/$file) failed: $!\n";
...

[greg@localhost ~]$ tmp/nfsrm /home/greg/mnt
# 20480 files should have been renamed to <name>.1:
[greg@localhost ~]$ ls /home/greg/mnt/*.1 | wc -l
1175


If the unlink() / rename() is instead replaced with a print, we see all of the files:

...
#unlink "$nfsdir/$file" or warn "warn: unlink($nfsdir/$file) failed: $!\n";
#rename("$nfsdir/$file", "$nfsdir/$file.1") or warn "warn: rename($nfsdir/$file) failed: $!\n";
print "$file\n";
...

[greg@localhost ~]$ tmp/nfsrm /home/greg/mnt | wc -l
20480
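
The rename test can be tallied in one step by comparing the number of entries ending in ".1" with the total. This is only a sketch; `count_suffix_1` is a hypothetical helper name.

```shell
# count_suffix_1 DIR
# Print "RENAMED/TOTAL": how many entries in DIR end in ".1",
# and how many entries there are in all (ignoring "." and "..").
# Assumes no file name contains a newline.
count_suffix_1() {
    dir=$1
    total=$(ls -A "$dir" | wc -l | tr -d ' ')
    # grep -c exits non-zero when nothing matches but still prints 0
    renamed=$(ls -A "$dir" | grep -c '\.1$' || true)
    echo "$renamed/$total"
}
```

On a fully renamed directory the two numbers would be equal. Note that a name such as "5.1.1" also ends in ".1" and would be counted as renamed.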
simpfeld

2012-02-10 19:36

reporter   ~0014453

I'm seeing this bug on pure RHEL 6.2 (with bonnie++); the only report of it I found was this CentOS bug.

I have therefore opened a bug report with RH.

https://bugzilla.redhat.com/show_bug.cgi?id=789452
torel@dolphingeo.com

2012-05-08 07:15

reporter   ~0015046

cc.
kbsingh@karan.org

2012-08-16 10:38

administrator   ~0015668

Is this still an issue with the 6.3 kernels?
simpfeld

2012-08-16 11:04

reporter   ~0015669

I believe this is resolved (certainly for bonnie++) in 6.3.
toracat

2012-11-18 17:01

manager   ~0016049

Apparently resolved. Closing.

Issue History

Date Modified Username Field Change
2012-02-07 19:01 TheVision New Issue
2012-02-07 21:48 toracat Note Added: 0014424
2012-02-07 23:00 tru Note Added: 0014425
2012-02-07 23:01 tru Status new => feedback
2012-02-07 23:23 TheVision Note Added: 0014426
2012-02-07 23:23 TheVision Status feedback => assigned
2012-02-07 23:37 tru Note Added: 0014427
2012-02-08 00:21 TheVision Note Added: 0014428
2012-02-08 14:27 Phil Schaffner Note Added: 0014430
2012-02-08 15:18 TheVision Note Added: 0014431
2012-02-10 19:36 simpfeld Note Added: 0014453
2012-05-08 07:15 torel@dolphingeo.com Note Added: 0015046
2012-08-16 10:38 kbsingh@karan.org Note Added: 0015668
2012-08-16 10:38 kbsingh@karan.org Status assigned => feedback
2012-08-16 11:04 simpfeld Note Added: 0015669
2012-11-18 17:01 toracat Note Added: 0016049
2012-11-18 17:01 toracat Status feedback => resolved
2012-11-18 17:01 toracat Resolution open => fixed
2012-11-18 17:01 toracat Fixed in Version => 6.3
2013-02-05 18:22 tru Relationship added related to 0006241
2013-02-09 22:08 toracat Relationship added related to 0006246
2013-02-09 22:13 toracat Relationship deleted related to 0006246
2013-04-16 20:09 toracat Relationship added related to 0006213