| View Issue Details |
| ID | Project | Category | View Status | Date Submitted | Last Update | |||
| 0005496 | CentOS-6 | kernel | public | 2012-02-07 19:01 | 2013-04-16 20:09 | |||
| Reporter | TheVision | |||||||
| Priority | normal | Severity | major | Reproducibility | always | |||
| Status | resolved | Resolution | fixed | |||||
| Platform | OS | OS Version | ||||||
| Product Version | 6.2 | |||||||
| Target Version | Fixed in Version | 6.3 | ||||||
| Summary | 0005496: readdir() fails to return all entries of a NFS directory | |||||||
| Description |
While testing file system performance of an NFS-mounted directory with bonnie++, the run exited with the error "Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty".

It appears that readdir() on a CentOS 6.2 NFS client (both x86 and x64, connecting to either a 6.2 or a 5.7 NFS server) fails to return all entries in an NFS directory. This differs from how the NFS client on CentOS 5.7 or Oracle 6.2 behaves.

In the examples below, /home/greg/mnt/tmp is an NFS-mounted directory in which I created 20,480 0-length files by hand. nfsrm is included below (see Steps To Reproduce).

CentOS 5.7 (expected behavior):

    Linux 4eg5fre 2.6.18-274.17.1.el5 #1 SMP Tue Jan 10 17:25:58 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
    [greg@4eg5fre ~]$ ls /home/greg/mnt/tmp | wc -l
    20480
    [greg@4eg5fre ~]$ ./nfsrm /home/greg/mnt/tmp
    [greg@4eg5fre ~]$ ls /home/greg/mnt/tmp | wc -l
    0

Oracle 6.2 (expected behavior):

    Linux 8gpxkf1 2.6.32-300.3.1.el6uek.x86_64 #1 SMP Fri Dec 9 18:57:35 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
    [greg@8gpxkf1 ~]$ ls /home/greg/mnt/tmp | wc -l
    20480
    [greg@8gpxkf1 ~]$ ./nfsrm /home/greg/mnt/tmp
    [greg@8gpxkf1 ~]$ ls /home/greg/mnt/tmp | wc -l
    0

CentOS 6.2 client (unexpected behavior):

    Linux 9yf6091 2.6.32-220.4.1.el6.i686 #1 SMP Mon Jan 23 22:37:12 GMT 2012 i686 i686 i386 GNU/Linux
    [greg@9yf6091 ~]$ ls /home/greg/mnt/tmp | wc -l
    20480
    [greg@9yf6091 ~]$ ./nfsrm /home/greg/mnt/tmp
    [greg@9yf6091 ~]$ ls /home/greg/mnt/tmp | wc -l
    19316

readdir() works as expected on 6.2 when using a local filesystem.
| Steps To Reproduce |
1. Create a large number of 0-length files in an NFS-mounted directory (1,024 wasn't enough; I used 20,480).
2. Loop through the files with readdir(), unlink()ing them along the way.

This can be done with the program included below.

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $nfsdir;
    my $file;

    if ($#ARGV != 0) {
        die "usage: $0 nfsdirectory\n";
    }

    # Pass an absolute path: $nfsdir is reused in unlink() after the chdir().
    $nfsdir = $ARGV[0];

    chdir($nfsdir) or die "fatal: chdir($nfsdir) failed: $!\n";
    opendir(DIR, ".") or die "fatal: opendir(.) failed: $!\n";

    # Unlink each entry while the directory handle is still being read.
    while (defined($file = readdir DIR)) {
        next if $file =~ /^\.\.?$/;    # skip "." and ".."
        unlink "$nfsdir/$file" or warn "warn: unlink($nfsdir/$file) failed: $!\n";
    }

    closedir(DIR) or die "fatal: closedir() failed: $!\n";
    chdir("..") or die "fatal: chdir(..) failed: $!\n";
| Additional Information | Calling rewinddir() after unlink() can work around the problem. | |||||||
| Tags | No tags attached. | |||||||
| Attached Files | ||||||||
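The rewinddir() workaround noted under Additional Information restarts the directory scan after entries have been removed. A rough Python analogue, offered only as a sketch, reads the complete listing before unlinking anything and then rescans until nothing is left. Python's os module has no rewinddir(), so re-listing the directory stands in for it here. The function name `remove_all_files` and the `max_passes` limit are hypothetical, and this has not been tested against the affected kernels.

```python
import os
import tempfile


def remove_all_files(directory, max_passes=100):
    """Unlink every regular file in `directory`; return the number of passes used.

    Each pass takes a full snapshot of the listing (os.listdir) and only then
    unlinks, so the directory is never modified while it is being read.
    The scan repeats until a pass finds no files to remove.
    """
    for passes in range(1, max_passes + 1):
        names = os.listdir(directory)  # complete listing, read up front
        files = [n for n in names
                 if os.path.isfile(os.path.join(directory, n))]
        if not files:
            return passes
        for name in files:
            try:
                os.unlink(os.path.join(directory, name))
            except FileNotFoundError:
                pass  # already gone, e.g. removed by another client
    raise RuntimeError(
        f"{directory} still has files after {max_passes} passes")


if __name__ == "__main__":
    # Local demonstration only; point it at an NFS mount to try it there.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(2048):
            open(os.path.join(tmp, str(i)), "w").close()
        print("passes:", remove_all_files(tmp))
        print("left:", len(os.listdir(tmp)))
```

The final pass exists only to confirm the directory is empty, which is the check that the original nfsrm loop lacks.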
Notes |
(0014424) toracat (developer) 2012-02-07 21:48 |
There is a forum post that is apparently related: "centos 6.2 cant see all files on nfs mount" https://www.centos.org/modules/newbb/viewtopic.php?topic_id=35486&forum=55 |
|
(0014425) tru (administrator) 2012-02-07 23:00 |
Hi, CentOS is not cloning the Oracle Linux version, though it's nice to see that their kernel is behaving properly. Does the issue also happen on the RHEL kernel?

I can't reproduce it on my setup:
server: c5 x86_64 2.6.18-274.12.1.el5
client: c6 2.6.32-220.4.1.el6.x86_64

    [tru@sillage tmp]$ sudo lvcreate --name bugs5496 --size 100M sillage
      Logical volume "bugs5496" created
    [tru@sillage tmp]$ sudo mkfs.xfs /dev/sillage/bugs5496
    meta-data=/dev/sillage/bugs5496  isize=256    agcount=6, agsize=4096 blks
             =                       sectsz=512   attr=0
    data     =                       bsize=4096   blocks=24576, imaxpct=25
             =                       sunit=0      swidth=0 blks, unwritten=1
    naming   =version 2              bsize=4096
    log      =internal log           bsize=4096   blocks=1200, version=1
             =                       sectsz=512   sunit=0 blks, lazy-count=0
    realtime =none                   extsz=4096   blocks=0, rtextents=0
    [tru@sillage tmp]$ sudo mount /dev/sillage/bugs5496 /test/
    [tru@sillage tmp]$ tail -1 /etc/exports
    /test ogotai.bis.pasteur.fr(rw,async)
    [tru@sillage test]$ pwd
    /test
    [tru@sillage test]$ df .
    Filesystem                   Type  Size  Used Avail Use% Mounted on
    /dev/mapper/sillage-bugs5496 xfs    92M  4.3M   88M   5% /test
    [tru@sillage test]$ for i in `seq 1 20480`;do touch $i; done
    [tru@sillage test]$ find . -type f | wc -w
    20480

    [tru@ogotai ~]$ sudo mkdir /nfs/test
    [tru@ogotai ~]$ sudo mount sillage:/test /nfs/test
    [tru@ogotai ~]$ df -P /nfs/test/
    Filesystem    Type 1024-blocks  Used Available Capacity Mounted on
    sillage:/test nfs        93504 10080     83424      11% /nfs/test
    [tru@ogotai ~]$ perl /tmp/nfsrm.pl /nfs/test/
    [tru@ogotai ~]$ echo $?
    0
|
(0014426) TheVision (reporter) 2012-02-07 23:23 |
"CentOS is not cloning Oracle linux version, though it's nice to see that their kernel is behaving properly. Does the issue also happens on RHEL kernel?"

I don't have access to RHEL 6.2; the closest I have is Oracle (which is RHEL + Oracle's fixes).

    ...
    [tru@ogotai ~]$ perl /tmp/nfsrm.pl /nfs/test/
    [tru@ogotai ~]$ echo $?
    0
    ...

Can you do a "ls /nfs/test | wc -l"? nfsrm.pl doesn't report an error when it fails to read all of the files; according to readdir(), it has read all of them.
|
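Because nfsrm.pl exits with status 0 even when entries are skipped, a reproducer is easier to judge if it counts what is left afterwards. The sketch below is a hypothetical Python counterpart to the Perl loop (the name `unlink_while_scanning` is made up for illustration): it unlinks files during a single scan, then rescans and reports how many files remain.

```python
import os
import tempfile


def unlink_while_scanning(directory):
    """Unlink files during a single directory scan, as nfsrm does.

    Returns (removed, remaining). A non-zero `remaining` means the scan
    ended before every entry had been returned.
    """
    removed = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                os.unlink(entry.path)  # modify the directory mid-scan
                removed += 1
    # Fresh scan afterwards: anything still here was never returned above.
    remaining = sum(1 for e in os.scandir(directory)
                    if e.is_file(follow_symlinks=False))
    return removed, remaining


if __name__ == "__main__":
    # Local demonstration only; pass an NFS-mounted path to test a client.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(2048):
            open(os.path.join(tmp, str(i)), "w").close()
        removed, remaining = unlink_while_scanning(tmp)
        print(f"removed={removed} remaining={remaining}")
```

A caller can turn a non-zero `remaining` into a non-zero exit status, so that `echo $?` reflects the failure.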
(0014427) tru (administrator) 2012-02-07 23:37 |
Server side:

    [tru@sillage test]$ for i in `seq 1 20480`;do touch $i; done
    [tru@sillage test]$ find . -type f| wc -w
    20480

Client side:

    [tru@ogotai ~]$ find /nfs/test -type f | wc -w
    20480
    [tru@ogotai ~]$ perl /tmp/nfsrm.pl /nfs/test/
    [tru@ogotai ~]$ find /nfs/test -type f | wc -w
    19117

Back to server side:

    [tru@sillage test]$ find . -type f| wc -w
    19117

-> A change in perl readdir() behaviour between 5.7 and 6.2; no idea what Oracle provides.
|
(0014428) TheVision (reporter) 2012-02-08 00:21 |
"perl readdir() change of behaviour between 5.7 and 6.2, no idea what oracle provides" It's not a perl change of behavior, as readdir() works as expected on a local directory with the same number of files (As noted above, I tried nfsrm.pl on a local directory to confirm this). I've cobbled together a C program that exhibits the same problem; I just felt it was easier to share the perl script. This problem was originally detected with bonnie++ (invoked like this: "bonnie++ -f -d <nfsdirectory>"), which failed while doing the sequential delete of many files. bonnie++ does an opendir(), and while in a readdir() loop, unlink()s each file. It then tries to remove the directory -- which fails, as files still exist in said directory. |
|
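The bonnie++ failure described in the note above comes down to rmdir() being called on a directory that still holds files the readdir() loop never returned. A minimal Python sketch of that last step follows; the helper name `try_rmdir` and the directory name `bonnie` are hypothetical, and the incomplete scan is simulated by leaving one file behind on purpose.

```python
import errno
import os
import tempfile


def try_rmdir(directory):
    """Call rmdir(); return None on success, else the symbolic errno name."""
    try:
        os.rmdir(directory)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "bonnie")
        os.mkdir(target)
        for i in range(3):
            open(os.path.join(target, str(i)), "w").close()
        # Simulate an incomplete readdir() pass: only two of three files go.
        os.unlink(os.path.join(target, "0"))
        os.unlink(os.path.join(target, "1"))
        print("rmdir with a leftover file:", try_rmdir(target))
        os.unlink(os.path.join(target, "2"))
        print("rmdir once empty:", try_rmdir(target))
```

On Linux the first call normally reports ENOTEMPTY, which corresponds to the "Directory not empty" message in the bonnie++ output; POSIX also permits EEXIST.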
(0014430) Phil Schaffner (qa_team) 2012-02-08 14:27 |
Client side on RHEL6u2:

    [pschaffn@wx1 ~]$ ls /share/pschaffn/TST/ | wc -w
    20480
    [pschaffn@wx1 ~]$ ./nfsrm.pl /share/pschaffn/TST
    [pschaffn@wx1 ~]$ ls /share/pschaffn/TST/ | wc -w
    19219
    [pschaffn@wx1 ~]$ find /share/pschaffn/TST/ -type f| wc -w
    19219
    [pschaffn@wx1 ~]$ uname -rmi
    2.6.32-220.4.1.el6.x86_64 x86_64 x86_64
    [pschaffn@wx1 ~]$ cat /etc/redhat-release
    Red Hat Enterprise Linux Server release 6.2 (Santiago)

On the server side all files are removed.
|
(0014431) TheVision (reporter) 2012-02-08 15:18 |
It seems that rename()'s interaction with readdir() will also trigger this bug. If the unlink() is commented out and replaced with rename():

    ...
    #unlink "$nfsdir/$file" or warn "warn: unlink($nfsdir/$file) failed: $!\n";
    rename("$nfsdir/$file", "$nfsdir/$file.1") or warn "warn: rename($nfsdir/$file) failed: $!\n";
    ...

    [greg@localhost ~]$ tmp/nfsrm /home/greg/mnt
    # 20480 files should have been renamed to <name>.1:
    [greg@localhost ~]$ ls /home/greg/mnt/*.1 | wc -l
    1175

If the unlink() / rename() is instead replaced with a print, we see all of the files:

    ...
    #unlink "$nfsdir/$file" or warn "warn: unlink($nfsdir/$file) failed: $!\n";
    #rename("$nfsdir/$file", "$nfsdir/$file.1") or warn "warn: rename($nfsdir/$file) failed: $!\n";
    print "$file\n";
    ...

    [greg@localhost ~]$ tmp/nfsrm /home/greg/mnt | wc -l
    20480
|
(0014453) simpfeld (reporter) 2012-02-10 19:36 |
I'm seeing this bug on pure RHEL 6.2 (with bonnie++); the only report of it I found was this CentOS bug. I have therefore opened a bug report with RH: https://bugzilla.redhat.com/show_bug.cgi?id=789452 |
|
(0015046) torel@dolphingeo.com (reporter) 2012-05-08 07:15 |
cc. |
|
(0015668) kbsingh@karan.org (administrator) 2012-08-16 10:38 |
Is this still an issue with the 6.3 kernels? |
|
(0015669) simpfeld (reporter) 2012-08-16 11:04 |
I believe this is resolved (certainly for bonnie++) in 6.3 |
|
(0016049) toracat (developer) 2012-11-18 17:01 |
Apparently resolved. Closing. |
Issue History |
|||
| Date Modified | Username | Field | Change |
| 2012-02-07 19:01 | TheVision | New Issue | |
| 2012-02-07 21:48 | toracat | Note Added: 0014424 | |
| 2012-02-07 23:00 | tru | Note Added: 0014425 | |
| 2012-02-07 23:01 | tru | Status | new => feedback |
| 2012-02-07 23:23 | TheVision | Note Added: 0014426 | |
| 2012-02-07 23:23 | TheVision | Status | feedback => assigned |
| 2012-02-07 23:37 | tru | Note Added: 0014427 | |
| 2012-02-08 00:21 | TheVision | Note Added: 0014428 | |
| 2012-02-08 14:27 | Phil Schaffner | Note Added: 0014430 | |
| 2012-02-08 15:18 | TheVision | Note Added: 0014431 | |
| 2012-02-10 19:36 | simpfeld | Note Added: 0014453 | |
| 2012-05-08 07:15 | torel@dolphingeo.com | Note Added: 0015046 | |
| 2012-08-16 10:38 | kbsingh@karan.org | Note Added: 0015668 | |
| 2012-08-16 10:38 | kbsingh@karan.org | Status | assigned => feedback |
| 2012-08-16 11:04 | simpfeld | Note Added: 0015669 | |
| 2012-11-18 17:01 | toracat | Note Added: 0016049 | |
| 2012-11-18 17:01 | toracat | Status | feedback => resolved |
| 2012-11-18 17:01 | toracat | Resolution | open => fixed |
| 2012-11-18 17:01 | toracat | Fixed in Version | => 6.3 |
| 2013-02-05 18:22 | tru | Relationship added | related to 0006241 |
| 2013-02-09 22:08 | toracat | Relationship added | related to 0006246 |
| 2013-02-09 22:13 | toracat | Relationship deleted | related to 0006246 |
| 2013-04-16 20:09 | toracat | Relationship added | related to 0006213 |