CentOS Bug Tracker

View Issue Details
ID: 0005496
Project: CentOS-6
Category: kernel
View Status: public
Date Submitted: 2012-02-07 19:01
Last Update: 2013-04-16 20:09
Reporter: TheVision
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Platform:
OS:
OS Version:
Product Version: 6.2
Target Version:
Fixed in Version: 6.3
Summary: 0005496: readdir() fails to return all entries of a NFS directory
Description: While I was testing file system performance of an NFS-mounted directory with bonnie++, it exited with the error "Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty".

It appears that readdir() on a CentOS 6.2 NFS client (both x86 and x64, connecting to either a 6.2 or 5.7 NFS server) can fail to return all entries in an NFS directory when files are unlinked while the directory is being read. This differs from how the NFS client on CentOS 5.7 or Oracle 6.2 behaves.

In the examples below, /home/greg/mnt/tmp is an NFS mounted directory. I created 20,480 0-length files by hand. nfsrm is included below.

CentOS 5.7 (expected behavior):
Linux 4eg5fre 2.6.18-274.17.1.el5 #1 SMP Tue Jan 10 17:25:58 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
[greg@4eg5fre ~]$ ls /home/greg/mnt/tmp | wc -l
20480
[greg@4eg5fre ~]$ ./nfsrm /home/greg/mnt/tmp
[greg@4eg5fre ~]$ ls /home/greg/mnt/tmp | wc -l
0

Oracle 6.2 (expected behavior):
Linux 8gpxkf1 2.6.32-300.3.1.el6uek.x86_64 #1 SMP Fri Dec 9 18:57:35 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[greg@8gpxkf1 ~]$ ls /home/greg/mnt/tmp | wc -l
20480
[greg@8gpxkf1 ~]$ ./nfsrm /home/greg/mnt/tmp
[greg@8gpxkf1 ~]$ ls /home/greg/mnt/tmp | wc -l
0

CentOS 6.2 client (unexpected behavior):
Linux 9yf6091 2.6.32-220.4.1.el6.i686 #1 SMP Mon Jan 23 22:37:12 GMT 2012 i686 i686 i386 GNU/Linux
[greg@9yf6091 ~]$ ls /home/greg/mnt/tmp | wc -l
20480
[greg@9yf6091 ~]$ ./nfsrm /home/greg/mnt/tmp
[greg@9yf6091 ~]$ ls /home/greg/mnt/tmp | wc -l
19316

readdir() works as expected on 6.2 when using a local filesystem.
Steps To Reproduce:
1. Create a large number (1,024 wasn't enough -- I used 20,480) of 0-length files in an NFS mounted directory.
2. Loop through the files with readdir(), unlink()ing them along the way. This can be done with the program included below.

#!/usr/bin/perl

use warnings;
use strict;

my $nfsdir;
my $file;

if ($#ARGV != 0) {
        die "usage: $0 nfsdirectory\n";
}
# NOTE: unlink() below uses "$nfsdir/$file" after the chdir(), so pass an absolute path.
$nfsdir = $ARGV[0];
chdir($nfsdir) or die "fatal: chdir($nfsdir) failed: $!\n";
opendir(DIR, ".") or die "fatal: opendir(.) failed: $!\n";
# Remove each entry while the directory is still being read.
while (defined($file = readdir DIR)) {
        next if $file =~ /^\.\.?$/;    # skip "." and ".."
        unlink "$nfsdir/$file" or warn "warn: unlink($nfsdir/$file) failed: $!\n";
}
closedir(DIR) or die "fatal: closedir() failed: $!\n";
chdir("..") or die "fatal: chdir(..) failed: $!\n";
Additional Information: Calling rewinddir() after unlink() can work around the problem.
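
The rewinddir() workaround can be sketched roughly as follows. This is a minimal illustration, not the reporter's code: the sub name remove_with_rewind is hypothetical, and it assumes the directory contains only plain files that the caller is allowed to delete.

```perl
use warnings;
use strict;

# Remove every entry in $dir, calling rewinddir() after each successful
# unlink() so the next readdir() starts again from a fresh listing.
# Returns the number of files removed. (Hypothetical helper.)
sub remove_with_rewind {
    my ($dir) = @_;
    my $removed = 0;

    opendir(my $dh, $dir) or die "fatal: opendir($dir) failed: $!\n";
    while (defined(my $file = readdir $dh)) {
        next if $file =~ /^\.\.?$/;    # skip "." and ".."
        if (unlink "$dir/$file") {
            $removed++;
            rewinddir $dh;             # directory changed: restart the listing
        } else {
            warn "warn: unlink($dir/$file) failed: $!\n";
        }
    }
    closedir($dh) or die "fatal: closedir() failed: $!\n";
    return $removed;
}
```

Rewinding after every unlink() makes the loop re-read the directory each time, which may be slow on a directory with tens of thousands of entries.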
Tags: No tags attached.
Attached Files

Relationships
related to 0006241 resolved toracat CentOS-5 See: 0005496: readdir() fails to return all entries of a NFS directory
related to 0006213 assigned kbsingh@karan.org CentOS-5 NFS mounts don't show all directory entries

Notes
(0014424)
toracat (developer)
2012-02-07 21:48

There is a forum post that is apparently related:

"centos 6.2 cant see all files on nfs mount"
https://www.centos.org/modules/newbb/viewtopic.php?topic_id=35486&forum=55
(0014425)
tru (administrator)
2012-02-07 23:00

hi,
CentOS is not cloning the Oracle Linux version, though it's nice to see that their kernel is behaving properly. Does the issue also happen on the RHEL kernel?

I can't reproduce it on my setup: server c5 x86_64 2.6.18-274.12.1.el5
client c6 2.6.32-220.4.1.el6.x86_64

[tru@sillage tmp]$ sudo lvcreate --name bugs5496 --size 100M sillage
  Logical volume "bugs5496" created
[tru@sillage tmp]$ sudo mkfs.xfs /dev/sillage/bugs5496
meta-data=/dev/sillage/bugs5496  isize=256    agcount=6, agsize=4096 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=24576, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=1200, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
[tru@sillage tmp]$ sudo mount /dev/sillage/bugs5496 /test/
[tru@sillage tmp]$ tail -1 /etc/exports
/test ogotai.bis.pasteur.fr(rw,async)
[tru@sillage test]$ pwd
/test
[tru@sillage test]$ df .
Filesystem                   Type  Size  Used Avail Use% Mounted on
/dev/mapper/sillage-bugs5496 xfs    92M  4.3M   88M   5% /test
[tru@sillage test]$ for i in `seq 1 20480`;do touch $i; done
[tru@sillage test]$ find . -type f | wc -w
20480


[tru@ogotai ~]$ sudo mkdir /nfs/test
[tru@ogotai ~]$ sudo mount sillage:/test /nfs/test
[tru@ogotai ~]$ df -P /nfs/test/
Filesystem    Type 1024-blocks  Used Available Capacity Mounted on
sillage:/test nfs        93504 10080     83424      11% /nfs/test
[tru@ogotai ~]$ perl /tmp/nfsrm.pl /nfs/test/
[tru@ogotai ~]$ echo $?
0
(0014426)
TheVision (reporter)
2012-02-07 23:23

"CentOS is not cloning the Oracle Linux version, though it's nice to see that their kernel is behaving properly. Does the issue also happen on the RHEL kernel?"

I don't have access to RHEL 6.2; the closest I have is Oracle (which is RHEL + Oracle's fixes).

...
[tru@ogotai ~]$ perl /tmp/nfsrm.pl /nfs/test/
[tru@ogotai ~]$ echo $?
0
...

Can you do a "ls /nfs/test | wc -l"? nfsrm.pl doesn't report an error when it fails to read all of the files -- according to readdir(), it's read all of them.
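
Because the script exits with status 0 even when entries are left behind, a check with a fresh directory listing after the loop can make the failure visible. The sketch below is only an illustration: the sub name remaining_entries is hypothetical and is not part of nfsrm.pl.

```perl
use warnings;
use strict;

# Return the names still present in $dir, excluding "." and "..".
# In scalar context the return value is the number of such names.
# (Hypothetical helper, not part of the original script.)
sub remaining_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "fatal: opendir($dir) failed: $!\n";
    my @names = grep { !/^\.\.?$/ } readdir $dh;
    closedir($dh) or die "fatal: closedir() failed: $!\n";
    return @names;
}
```

In nfsrm.pl this could be called after closedir(), for example with `my @left = remaining_entries($nfsdir);` followed by a die() when @left is not empty, so that `echo $?` reports a non-zero status.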
(0014427)
tru (administrator)
2012-02-07 23:37

server side:
[tru@sillage test]$ for i in `seq 1 20480`;do touch $i; done
[tru@sillage test]$ find . -type f| wc -w
20480

client side:
[tru@ogotai ~]$ find /nfs/test -type f | wc -w
20480
[tru@ogotai ~]$ perl /tmp/nfsrm.pl /nfs/test/
[tru@ogotai ~]$ find /nfs/test -type f | wc -w
19117

back to server side:
[tru@sillage test]$ find . -type f| wc -w
19117

-> perl readdir() change of behaviour between 5.7 and 6.2, no idea what oracle provides
(0014428)
TheVision (reporter)
2012-02-08 00:21

"perl readdir() change of behaviour between 5.7 and 6.2, no idea what oracle provides"

It's not a Perl change of behavior, as readdir() works as expected on a local directory with the same number of files (as noted above, I tried nfsrm.pl on a local directory to confirm this). I've cobbled together a C program that exhibits the same problem; I just felt it was easier to share the Perl script.

This problem was originally detected with bonnie++ (invoked like this: "bonnie++ -f -d <nfsdirectory>"), which failed while doing the sequential delete of many files. bonnie++ does an opendir(), and while in a readdir() loop, unlink()s each file. It then tries to remove the directory -- which fails, as files still exist in said directory.
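
The final step described above, where removing the directory fails because files are still present, can be illustrated with a short sketch. It is an assumption-laden example rather than what bonnie++ actually does: the sub name rmdir_or_count_leftovers is hypothetical, and it treats both ENOTEMPTY and EEXIST as "directory not empty" since POSIX allows either.

```perl
use warnings;
use strict;
use Errno qw(ENOTEMPTY EEXIST);

# Try to remove $dir. Returns 0 if rmdir() succeeded, otherwise the number
# of entries still inside when rmdir() failed because the directory was not
# empty. Any other rmdir() failure is fatal. (Hypothetical helper.)
sub rmdir_or_count_leftovers {
    my ($dir) = @_;
    return 0 if rmdir $dir;

    my $err = $! + 0;    # numeric errno from the failed rmdir()
    die "fatal: rmdir($dir) failed: $!\n"
        unless $err == ENOTEMPTY || $err == EEXIST;

    opendir(my $dh, $dir) or die "fatal: opendir($dir) failed: $!\n";
    my $count = grep { !/^\.\.?$/ } readdir $dh;    # count of leftover names
    closedir($dh) or die "fatal: closedir() failed: $!\n";
    return $count;
}
```

A non-zero return value after an unlink() loop would point to entries that the readdir() loop never returned.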
(0014430)
Phil Schaffner (reporter)
2012-02-08 14:27

Client side on RHEL6u2:
[pschaffn@wx1 ~]$ ls /share/pschaffn/TST/ | wc -w
20480
[pschaffn@wx1 ~]$ ./nfsrm.pl /share/pschaffn/TST
[pschaffn@wx1 ~]$ ls /share/pschaffn/TST/ | wc -w
19219
[pschaffn@wx1 ~]$ find /share/pschaffn/TST/ -type f| wc -w
19219
[pschaffn@wx1 ~]$ uname -rmi
2.6.32-220.4.1.el6.x86_64 x86_64 x86_64
[pschaffn@wx1 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)

On the server side all files are removed.
(0014431)
TheVision (reporter)
2012-02-08 15:18

It seems that rename()'s interaction with readdir() will also trigger this bug. If the unlink() is commented out and replaced with rename():

...
#unlink "$nfsdir/$file" or warn "warn: unlink($nfsdir/$file) failed: $!\n";
rename("$nfsdir/$file", "$nfsdir/$file.1") or warn "warn: rename($nfsdir/$file) failed: $!\n";
...

[greg@localhost ~]$ tmp/nfsrm /home/greg/mnt
# 20480 files should have been renamed to <name>.1:
[greg@localhost ~]$ ls /home/greg/mnt/*.1 | wc -l
1175


If the unlink() / rename() is instead replaced with a print, we see all of the files:

...
#unlink "$nfsdir/$file" or warn "warn: unlink($nfsdir/$file) failed: $!\n";
#rename("$nfsdir/$file", "$nfsdir/$file.1") or warn "warn: rename($nfsdir/$file) failed: $!\n";
print "$file\n";
...

[greg@localhost ~]$ tmp/nfsrm /home/greg/mnt | wc -l
20480
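
The comparison made by hand above (a read-only pass sees every file, a modifying pass does not) can be sketched as one helper that reports which names the modifying loop never saw. This is an illustrative sketch only: the sub name skipped_entries and the $action callback are hypothetical, and the snapshot is assumed to be complete because nothing is modified while it is taken.

```perl
use warnings;
use strict;

# Take a read-only snapshot of $dir, then run $action on each entry while
# reading the directory a second time. Returns the sorted names from the
# snapshot that the second loop never returned. (Hypothetical helper.)
sub skipped_entries {
    my ($dir, $action) = @_;

    # Pass 1: list everything without touching the directory.
    opendir(my $snap, $dir) or die "fatal: opendir($dir) failed: $!\n";
    my %pending = map { $_ => 1 } grep { !/^\.\.?$/ } readdir $snap;
    closedir($snap) or die "fatal: closedir() failed: $!\n";

    # Pass 2: modify entries while the directory is being read.
    opendir(my $dh, $dir) or die "fatal: opendir($dir) failed: $!\n";
    while (defined(my $file = readdir $dh)) {
        next if $file =~ /^\.\.?$/;
        # Ignore names that were not in the snapshot (e.g. rename targets).
        next unless delete $pending{$file};
        $action->("$dir/$file");
    }
    closedir($dh) or die "fatal: closedir() failed: $!\n";

    my @skipped = sort keys %pending;
    return @skipped;
}
```

For the rename() variant the callback could be `sub { rename($_[0], "$_[0].1") or warn "warn: rename($_[0]) failed: $!\n" }`; a non-empty result would list the files that were never renamed.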
(0014453)
simpfeld (reporter)
2012-02-10 19:36

I'm seeing this bug on pure RHEL 6.2 (with bonnie++); the only report of it I found was this CentOS bug.

I have therefore opened a bug report with RH.

https://bugzilla.redhat.com/show_bug.cgi?id=789452
(0015046)
torel@dolphingeo.com (reporter)
2012-05-08 07:15

cc.
(0015668)
kbsingh@karan.org (administrator)
2012-08-16 10:38

Is this still an issue with the 6.3 kernels?
(0015669)
simpfeld (reporter)
2012-08-16 11:04

I believe this is resolved (certainly for bonnie++) in 6.3.
(0016049)
toracat (developer)
2012-11-18 17:01

Apparently resolved. Closing.

Issue History
Date Modified Username Field Change
2012-02-07 19:01 TheVision New Issue
2012-02-07 21:48 toracat Note Added: 0014424
2012-02-07 23:00 tru Note Added: 0014425
2012-02-07 23:01 tru Status new => feedback
2012-02-07 23:23 TheVision Note Added: 0014426
2012-02-07 23:23 TheVision Status feedback => assigned
2012-02-07 23:37 tru Note Added: 0014427
2012-02-08 00:21 TheVision Note Added: 0014428
2012-02-08 14:27 Phil Schaffner Note Added: 0014430
2012-02-08 15:18 TheVision Note Added: 0014431
2012-02-10 19:36 simpfeld Note Added: 0014453
2012-05-08 07:15 torel@dolphingeo.com Note Added: 0015046
2012-08-16 10:38 kbsingh@karan.org Note Added: 0015668
2012-08-16 10:38 kbsingh@karan.org Status assigned => feedback
2012-08-16 11:04 simpfeld Note Added: 0015669
2012-11-18 17:01 toracat Note Added: 0016049
2012-11-18 17:01 toracat Status feedback => resolved
2012-11-18 17:01 toracat Resolution open => fixed
2012-11-18 17:01 toracat Fixed in Version => 6.3
2013-02-05 18:22 tru Relationship added related to 0006241
2013-02-09 22:08 toracat Relationship added related to 0006246
2013-02-09 22:13 toracat Relationship deleted related to 0006246
2013-04-16 20:09 toracat Relationship added related to 0006213

