View Issue Details

ID: 0006992
Project: CentOS-6
Category: filesystem
View Status: public
Last Update: 2016-05-10 22:21
Reporter: casalemedia_RDG
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open
Platform: CentOS
OS: Linux
OS Version: 2.6.32-358.el6
Product Version: 6.4
Summary: 0006992: TMPFS creates files with inode 0, rendering parent directory unremovable
Description: We use TMPFS for storing transient scratch data, writing out thousands of files every minute to timestamp-named directories; as such, we have to clear stale data with a cleanup process that simply removes old, and therefore no longer relevant, files and directories.

Recently, we encountered an issue where one of these temporary directories could not be removed by our 'rm -rf' cleanup job; even as root, we were unable to move/delete it:

# rm -rf 1381276560
rm: cannot remove directory `1381276560': Directory not empty

This was very confusing since, upon inspection, it appeared to contain no files:

# ls -la 1381276560
total 0
drwxrw-r-- 2 user user 60 Oct 8 19:58 .
drwxrw-r-- 5 user user 100 Dec 4 18:10 ..

We also confirmed that there were no open file handles pointing to this directory using 'lsof +D'.

Over the coming weeks, this occurred on a number of our servers -- which demonstrated that this was not just a random occurrence. On a hunch, we looked at the inode numbers of these unremovable directories, all of which were very close to 2**32:

4294966751 drwxrw-r-- 2 user user 60 Oct 8 19:58 1381276560
4294957948 drwxrw-r-- 2 user user 60 Oct 9 22:03 1381370460
4294952539 drwxrw-r-- 2 user user 60 Oct 23 12:35 1382545980
4294951887 drwxrw-r-- 2 user user 60 Nov 11 16:46 1384206240
4294947758 drwxrw-r-- 2 user user 60 Nov 13 14:10 1384369680
4294948806 drwxrw-r-- 2 user user 60 Nov 20 18:33 1384990260
4294962748 drwxrw-r-- 2 user user 60 Dec 30 20:04 1388451720

This led us to postulate that a file with inode number 0 had been created within each of these directories. Inode number 0 is a reserved value that represents "deleted file not yet removed from disk" -- which could explain why rm, ls et al. don't list it.

According to http://stackoverflow.com/questions/4411701/how-are-inode-numbers-generated-in-linux-tmpfs, "the bulk of the tmpfs code is in mm/shmem.c, but it delegates almost everything to the generic filesystem code in fs/inode.c." The field "i_ino" of the inode struct is handled by new_inode(), which simply performs 'inode->i_ino = ++last_ino;' on a 32-bit unsigned integer that can overflow. On other filesystems, this value is typically overwritten by an unused inode number, but TMPFS does not appear to have any special handling for this.
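
The wraparound described above can be shown in isolation. What follows is a minimal user-space sketch, not kernel code: `last_ino` is a local stand-in for the kernel's counter, and `next_ino_wrapping` is a hypothetical helper name used only for illustration. It assumes the counter is an unsigned 32-bit value, as stated above.

```c
#include <stdint.h>

/*
 * User-space model of the counter described above (not kernel code).
 * next_ino_wrapping() is a hypothetical name; it mimics
 * 'inode->i_ino = ++last_ino;' on an unsigned 32-bit counter.
 */
static uint32_t next_ino_wrapping(uint32_t *last_ino)
{
    /* Unsigned arithmetic wraps modulo 2**32, so 4294967295 + 1 == 0. */
    *last_ino += 1;
    return *last_ino;
}
```

Under this model, a counter sitting at 4294967295 hands out 0 on the next call and 1 on the call after that, which would fit the unremovable directories all having inode numbers just below 2**32.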

This, however, did suggest that the problem lay in the directory listing (which is needed to determine the file name), rather than in the underlying file itself -- so inspecting the filesystem's dentries (directory entries) for this directory using low-level system calls [ e.g. getdents(2) ] should reveal a complete list of inodes and filenames.

We modified https://raw.github.com/aidenbell/getdents/master/src/getdents.c, which was originally designed as a faster alternative to ls, so that it would only list files with inode number 0:

- if( d->d_ino != 0 && d_type == DT_REG ) {
- printf("%s\n", (char *)d->d_name );
+ if( d->d_ino == 0 && d_type == DT_REG ) {
+ printf("Inode number %ld: %s\n", d->d_ino, (char *)d->d_name );
 
And much to our horror/delight, the mystery filename that neither ls nor rm could locate appeared out of thin air:

# gcc getdents.c -o getdents
# getdents 1381276560
Inode number 0: 71A800181400

This file was completely intact (i.e. contained the correct contents and typical file size for a file in this directory), and could be trivially deleted by name:

# cat 71A800181400 | wc -c
776

# rm 71A800181400
rm: remove regular file `71A800181400'? y

At which point removing its parent directory was no longer an issue (directory block size was restored, etc.), and our problem went away.

It's possible that it's remained unknown because the following things need to occur in order to get this unlikely situation to re-occur:

1) have a server with sufficient uptime to generate ~4.3G files on a device without a reboot; and
2) have the file that would be allocated inode 0 for that device created on the TMPFS partition; and
3) trigger a process which deletes these TMPFS files without knowledge of their name; and finally
4) try to delete the parent directory

Nonetheless, we consider this a bug in TMPFS -- there's no reason to hand out a reserved inode number when starting again at 1 would be just fine, and thereby never encounter this issue.
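
The suggestion of starting again at 1 can be sketched as follows. This is an illustrative user-space model only, not the kernel's implementation and not a proposed patch; `next_ino_skip_zero` is a hypothetical name, and the 32-bit counter is the same assumption as in the description above.

```c
#include <stdint.h>

/*
 * Hypothetical allocator sketch (not the kernel's implementation):
 * a 32-bit counter that never returns the reserved value 0.
 */
static uint32_t next_ino_skip_zero(uint32_t *last_ino)
{
    *last_ino += 1;
    if (*last_ino == 0)   /* wrapped past 2**32 - 1 */
        *last_ino = 1;    /* start again at 1 instead of handing out 0 */
    return *last_ino;
}
```

Note that this sketch only avoids the reserved value; numbers handed out after the wrap may still coincide with those of files created before it.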
Steps To Reproduce: To prove that we could make this happen again, we made a small script (inodeOverflow.pl, attached) that simply made directories with lots of files inside with a predictable, monotonic numbering scheme (and file contents equal to the filename), such that we could "see" when a file was missing -- we did not fully investigate any non-brute force methods to 'alter' a tmpfs inode number (i.e. using something like debugfs). After running for a few days to get the inode count up to ~4.3G files, this is what we saw:

# ls -i 26445
4294967291 64040
4294967292 64041
4294967293 64042
4294967294 64043
4294967295 64044
1 64046
2 64047
3 64048
4 64049
5 64050

As you can tell, the file named '64045' is conspicuously absent, and based on an overflowing d_ino, would be expected to have inode number zero (i.e. when the 32-bit counter hits 2**32). A wildcard match was not any help either:

# ls 6404* -l
-rw-r--r-- 1 user user 4 Jan 11 22:55 6404
-rw-r--r-- 1 user user 5 Jan 11 22:55 64040
-rw-r--r-- 1 user user 5 Jan 11 22:55 64041
-rw-r--r-- 1 user user 5 Jan 11 22:55 64042
-rw-r--r-- 1 user user 5 Jan 11 22:55 64043
-rw-r--r-- 1 user user 5 Jan 11 22:55 64044
-rw-r--r-- 1 user user 5 Jan 11 22:55 64046
-rw-r--r-- 1 user user 5 Jan 11 22:55 64047
-rw-r--r-- 1 user user 5 Jan 11 22:55 64048
-rw-r--r-- 1 user user 5 Jan 11 22:55 64049

However, since we know which filename is missing, we can ask the filesystem about it -- and it's not really sure either:

# ls -i 64045
? 64045
# ls 64045
64045
# cat 64045
64045

But stat output confirms our suspicions:

# stat 64045
  File: `64045'
  Size: 5 Blocks: 8 IO Block: 4096 regular file
Device: 15h/21d Inode: 0 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2014-01-13 10:39:42.272135907 -0500
Modify: 2014-01-11 22:55:36.418285866 -0500
Change: 2014-01-11 22:55:36.418285866 -0500
Additional Information: This was further supported by the fact that an empty TMPFS directory has a size of 40, while our recalcitrant directories have a size of 60, consistent with a single file inside:

# mkdir empty; ls -la empty
total 0
drwxr-xr-x 2 user user 40 Dec 5 09:29 .
drwxr-xr-x 5 user user 100 Dec 5 09:29 ..

# mkdir onefile; touch onefile/test.txt; ls -la onefile
total 4
drwxr-xr-x 2 user user 60 Dec 5 09:37 .
drwxr-xr-x 5 user user 100 Dec 5 09:37 ..
-rw-r--r-- 1 user user 0 Dec 5 09:37 test.txt
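
The size check above can be turned into a small helper. This is a sketch under a stated assumption: the 20-bytes-per-entry figure is inferred only from the two listings above (40 for an empty directory, 60 for one file) and may not hold on other kernels. `ASSUMED_DIRENT_SIZE` and `tmpfs_entries_from_size` are hypothetical names.

```c
#include <sys/types.h>

/* Assumption: inferred from 40 (empty) vs 60 (one file) in the listings above. */
#define ASSUMED_DIRENT_SIZE 20

/*
 * Given a tmpfs directory's reported size (st_size from stat(2)),
 * return the number of entries other than "." and "..", or -1 if the
 * size does not fit the assumed model.
 */
static long tmpfs_entries_from_size(off_t dir_size)
{
    if (dir_size < 2 * ASSUMED_DIRENT_SIZE ||
        dir_size % ASSUMED_DIRENT_SIZE != 0)
        return -1;
    return (long)(dir_size / ASSUMED_DIRENT_SIZE) - 2;
}
```

Under this assumption, a directory that lists as empty but reports a size of 60 would contain exactly one entry that the listing did not show.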

strace of 'rm -rf' for the 'onefile' directory shows the attempt to unlink the test file:

openat(AT_FDCWD, "onefile", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY) = 3
getdents(3, /* 3 entries */, 32768) = 72
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
unlinkat(4, "test.txt", 0) = 0
unlinkat(AT_FDCWD, "test.txt", AT_REMOVEDIR) = 0

Whereas a strace of the unremovable directory revealed no such attempt, consistent with a file that rm cannot list:

openat(AT_FDCWD, "1381276560", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY) = 3
getdents(3, /* 3 entries */, 32768) = 80
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
unlinkat(AT_FDCWD, "1381276560", AT_REMOVEDIR) = -1 ENOTEMPTY (Directory not empty)
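
One way to read this trace is that getdents returns three raw entries, but the code consuming them ignores any entry whose inode number is 0. That is an assumption suggested by this report rather than something verified here. The sketch below models only that filtering step; `visible_entries` is a hypothetical helper name.

```c
#include <stddef.h>

/*
 * Model of a directory reader that ignores entries whose inode number
 * is 0 (an assumption, not verified against any particular libc).
 * Returns how many of the n raw entries such a reader would report.
 */
static size_t visible_entries(const unsigned long *d_ino, size_t n)
{
    size_t visible = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (d_ino[i] != 0)   /* inode 0 is treated as "no entry" */
            visible++;
    }
    return visible;
}
```

With three raw entries of which one has inode 0, such a reader would report only "." and "..", so rm would skip straight to removing the directory and receive ENOTEMPTY.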

As an aside, this problem also arose on 32-bit CentOS, and Ubuntu as well:

CentOS release 5.9 (Final) - Linux 2.6.18-348.3.1.el5PAE #1 SMP Mon Mar 11 20:30:57 EDT 2013 i686 i686 i386 GNU/Linux

Ubuntu 13.04 (raring) - Linux 7010 3.8.0-35-generic #50-Ubuntu SMP Tue Dec 3 01:24:59 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Tags: No tags attached.

Activities

casalemedia_RDG

2014-02-14 17:24

reporter  

inodeOverflow.pl (978 bytes)
casalemedia_RDG

2014-02-14 17:25

reporter  

getdents.c (1,486 bytes)   
/*
 * https://raw.github.com/aidenbell/getdents/master/src/getdents.c
 * List directories using getdents() because ls, find and Python libraries
 * use readdir() which is slower (but uses getdents() underneath).
 *
 * Compile with 
 * ]$ gcc  getdents.c -o getdents
 */
#define _GNU_SOURCE
#include <dirent.h>     /* Defines DT_* constants */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>

#define handle_error(msg) \
       do { perror(msg); exit(EXIT_FAILURE); } while (0)

struct linux_dirent {
   long           d_ino;
   off_t          d_off;
   unsigned short d_reclen;
   char           d_name[];
};

#define BUF_SIZE (1024*1024*5)

int
main(int argc, char *argv[])
{
   int fd, nread;
   char buf[BUF_SIZE];
   struct linux_dirent *d;
   int bpos;
   char d_type;

   fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
   if (fd == -1)
       handle_error("open");

   for ( ; ; ) {
       nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
       if (nread == -1)
           handle_error("getdents");

       if (nread == 0)
           break;

       for (bpos = 0; bpos < nread;) {
           d = (struct linux_dirent *) (buf + bpos);
           d_type = *(buf + bpos + d->d_reclen - 1);
           if( d->d_ino == 0 && d_type == DT_REG ) {
               printf("Inode number %ld: %s\n", d->d_ino, (char *)d->d_name );
           }
           bpos += d->d_reclen;
       }
   }

   exit(EXIT_SUCCESS);
}

tigalch

2014-02-14 17:51

manager   ~0019288

Just for clarification: do you get this issue on 6.5 (kernel 2.6.32-431.5.1)? And do you get it on C5.10 (kernel 2.6.18-371.4.1)? Thanks.
casalemedia_RDG

2014-02-14 19:54

reporter   ~0019289

I'll have to double-check to see if there were any servers running those specific versions; if not, I can try and fire up some virtuals to check.

FYI, CentOS 5.8 running 2.6.18-308.1.1.el5PAE kernel is also affected.
casalemedia_RDG

2014-02-18 17:05

reporter   ~0019300

Tested again on: C6.5, Linux 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12 00:41:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Same issue re: inode number zero deletion (empty directory, stat for filename shows inode 0, etc). However, the error message from 'rm' is slightly different, see the abbreviated strace below:

--

openat(AT_FDCWD, "test", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY) = 3
getdents(3, /* 3 entries */, 32768) = 80
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
unlinkat(AT_FDCWD, "test", AT_REMOVEDIR) = -1 EBUSY (Device or resource busy)
write(2, "rm: ", 4rm: ) = 4
write(2, "cannot remove `test'", 20cannot remove `test') = 20
write(2, ": Device or resource busy", 25: Device or resource busy) = 25
write(2, "\n", 1) = 1

--

Looks like it output a different message after noticing that the directory isn't empty, but otherwise it behaves the same as on C6.4 and earlier platforms.

Trying to check on C5.10 as well, but I doubt the output would be any different -- @tigalch : where do we go from here now that the most recent CentOS 6 codebase is affected?
tigalch

2014-02-18 17:15

manager   ~0019301

I would suggest filing a bug against https://bugzilla.redhat.com, as it has to be fixed there. Once it gets fixed there, CentOS will inherit the fix. No, you don't need a valid support contract to file bugs.
casalemedia_RDG

2014-02-19 03:27

reporter   ~0019304

Bug filed: https://bugzilla.redhat.com/show_bug.cgi?id=1066751
casalemedia_RDG

2014-02-26 18:49

reporter   ~0019382

Update -- tested on C5.10 as well:

[root@centos5-10 inode0Test]# rm -rf test
rm: cannot remove directory `test': Device or resource busy
[root@centos5-10 inode0Test]# cd test
[root@centos5-10 test]# ls -i 3181525979
0 3181525979
[root@centos5-10 test]# stat 3181525979
  File: `3181525979'
  Size: 10 Blocks: 8 IO Block: 4096 regular file
Device: 18h/24d Inode: 0 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2014-02-26 12:55:48.071879127 -0500
Modify: 2014-02-26 12:55:48.071879127 -0500
Change: 2014-02-26 12:55:48.071879127 -0500

Only difference is ls -i will actually show the inode number as 0 instead of ?.
toracat

2014-02-26 21:09

manager   ~0019383

With the way things are "progressing" in the upstream BZ, it is probably a good thing to test this with the mainline kernel to see if this is reproducible there. Fortunately it is fairly easy to install the mainline kernel in CentOS. Just use kernel-ml [1] from ELRepo. Can someone help with this testing?

For CentOS-5, there is kernel-lt [2].

[1] http://elrepo.org/tiki/kernel-ml
[2] http://elrepo.org/tiki/kernel-lt
casalemedia_RDG

2014-02-27 15:16

reporter   ~0019391

@toracat -- I'll try and test mainline kernel as well for C5 and C6.
casalemedia_RDG

2014-03-03 17:01

reporter   ~0019399

Recreated on CentOS 6.5, kernel 3.13.5-1.el6.elrepo.x86_64

[root@centos6-5 inode0Test]# rm -rf test
rm: cannot remove `test': Device or resource busy
[root@centos6-5 inode0Test]# cd test
[root@centos6-5 test]# ls -i 3946307038
? 3946307038
[root@centos6-5 test]# stat 3946307038
  File: `3946307038'
  Size: 10 Blocks: 8 IO Block: 4096 regular file
Device: 14h/20d Inode: 0 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2014-03-01 11:00:11.561272342 -0500
Modify: 2014-03-01 11:00:11.561272342 -0500
Change: 2014-03-01 11:00:11.561272342 -0500
toracat

2016-05-10 22:21

manager   ~0026483

According to the upstream BZ 1066751, the issue has been fixed in the RHEL6.8 GA kernel released today (kernel-2.6.32-642.el6).

Issue History

Date Modified Username Field Change
2014-02-14 17:24 casalemedia_RDG New Issue
2014-02-14 17:24 casalemedia_RDG File Added: inodeOverflow.pl
2014-02-14 17:25 casalemedia_RDG File Added: getdents.c
2014-02-14 17:51 tigalch Note Added: 0019288
2014-02-14 19:54 casalemedia_RDG Note Added: 0019289
2014-02-18 17:05 casalemedia_RDG Note Added: 0019300
2014-02-18 17:15 tigalch Note Added: 0019301
2014-02-19 03:27 casalemedia_RDG Note Added: 0019304
2014-02-26 18:49 casalemedia_RDG Note Added: 0019382
2014-02-26 21:09 toracat Note Added: 0019383
2014-02-27 15:16 casalemedia_RDG Note Added: 0019391
2014-03-03 17:01 casalemedia_RDG Note Added: 0019399
2016-05-10 22:21 toracat Note Added: 0026483