View Issue Details

ID: 0002448
Project: CentOS-5
Category: kernel
View Status: public
Last Update: 2007-12-02 19:24
Reporter: arrfab
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 5.1
Target Version:
Fixed in Version: 5.1
Summary: 0002448: autofs failing on first access to a nfs server
Description: When yum points to an updates repository held on a central NFS server and automounted over NFS, yum fails with the message 'File doesn't exist'.
The problem seems kernel-related, and I found a matching bug upstream:
https://bugzilla.redhat.com/show_bug.cgi?id=377661
Tags: No tags attached.

Activities

toracat

2007-11-17 17:48

manager   ~0006336

The -56.el5 kernel from http://people.redhat.com/jlayton/ fixed the problem. Tested on both i686 and x86_64.

Akemi

2007-11-20 10:01

 

linux-2.6-autofs4-fix-race-between-mount-and-expire-2.patch (5,171 bytes)
From: Ian Kent <ikent@redhat.com>
Subject: Re: [RHEL 5.1 PATCH 1/2] autofs4 - patch correction - fix race between mount and expire
Date: Tue, 02 Oct 2007 01:52:58 +0800
Bugzilla: 354621
Message-Id: <1191261178.23256.9.camel@raven.themaw.net>
Changelog: [autofs4] fix race between mount and expire


On Wed, 2007-04-18 at 16:04 +0800, Ian Kent wrote:
> Hi all,
> 
> Investigation of bug 174821 led to the discovery of a race between
> mount and expire. This issue is also present in RHEL5 so I created the
> tracking bug 236875 as a clone of 174821. The attached patch is included
> in the 2.6.21 rc series at present.
> 
> Explanation.
> 
> What happens is that during an expire the situation can arise that a
> directory is removed and another lookup is done before the expire issues
> a completion status to the kernel module. In this case, since the
> lookup gets a new dentry, it doesn't know that there is an expire in
> progress and when it posts its mount request, matches the existing
> expire request and waits for its completion.  ENOENT is then returned to
> user space from lookup (as the dentry passed in remains negative)
> without having performed the mount request.
> 
> The solution is to keep track of dentrys in this unhashed state and
> reuse them, if possible, in order to preserve the flags.

During QA of the above bug, 236875, a couple of problems were
uncovered. Somehow, while posted to the bug, two patches didn't make it
into the kernel. A failure due to this was discovered during scheduled
regression testing of autofs and I've since verified these missing
patches resolve the problem.

This patch is the first of the two patches.

Quoting from the bug:

Due to a problem uncovered during QA of this patch for a RHEL-4 Z-Stream
update I've had to revisit this issue.

There are a couple of patches now that depend on this patch and there is
a risk of some confusion regarding the various patches. To try and avoid
this we should be able to use the same patches everywhere so we need
to sync the source of the various kernels with upstream.

This patch wasn't needed for this originally but is now needed by the
fix for the problem identified above during QA and for other bugs that
depend on these patches (for example see bug #253231).

Ian

---
--- linux-2.6.18.noarch/fs/autofs4/root.c.lookup-check-unhashed	2007-08-22 18:37:11.000000000 +0800
+++ linux-2.6.18.noarch/fs/autofs4/root.c	2007-08-22 18:42:40.000000000 +0800
@@ -655,14 +655,29 @@ static struct dentry *autofs4_lookup(str
 
 	/*
 	 * If this dentry is unhashed, then we shouldn't honour this
-	 * lookup even if the dentry is positive.  Returning ENOENT here
-	 * doesn't do the right thing for all system calls, but it should
-	 * be OK for the operations we permit from an autofs.
+	 * lookup.  Returning ENOENT here doesn't do the right thing
+	 * for all system calls, but it should be OK for the operations
+	 * we permit from an autofs.
 	 */
 	if (dentry->d_inode && d_unhashed(dentry)) {
+		/*
+		 * A user space application can (and has done in the past)
+		 * remove and re-create this directory during the callback.
+		 * This can leave us with an unhashed dentry, but a
+		 * successful mount!  So we need to perform another
+		 * cached lookup in case the dentry now exists.
+		 */
+		struct dentry *parent = dentry->d_parent;
+		struct dentry *new = d_lookup(parent, &dentry->d_name);
+		if (new != NULL)
+			dentry = new;
+		else
+			dentry = ERR_PTR(-ENOENT);
+
 		if (unhashed)
 			dput(unhashed);
-		return ERR_PTR(-ENOENT);
+
+		return dentry;
 	}
 
 	if (unhashed)



This patch is the second of the two patches.

Quoting from the bug:

This patch fixes a failure reported during QA testing for a Z-Stream
release for RHEL 4.

It is in fact a hunk from another autofs4 patch that resolves a deadlock
during directory creation under load (see bug #253231 for info). The
deadlock patch delays hashing of dentrys at directory creation until the
actual create operation and so dentrys remain unhashed for a relatively
long time so the code in this patch was needed there. With the
expire/mount race fix here, dentrys are unhashed for a relatively brief
time so the code in this patch was not identified as needed during
development. However, if there are many processes concurrently accessing
directories it's possible there will be two or more waiters in the
queue. Only one of the waiters will have the dentry required to
complete the lookup and the others need to perform a d_lookup to get the
correct dentry.

This patch allows these processes to perform the needed d_lookup.

Ian

---
--- linux-2.6.18.noarch/fs/autofs4/root.c.lookup-expire-race-fix-4	2007-08-27 19:29:13.000000000 +0800
+++ linux-2.6.18.noarch/fs/autofs4/root.c	2007-08-27 19:31:13.000000000 +0800
@@ -659,7 +659,7 @@ static struct dentry *autofs4_lookup(str
 	 * for all system calls, but it should be OK for the operations
 	 * we permit from an autofs.
 	 */
-	if (dentry->d_inode && d_unhashed(dentry)) {
+	if (!oz_mode && d_unhashed(dentry)) {
 		/*
 		 * A user space application can (and has done in the past)
 		 * remove and re-create this directory during the callback.

toracat

2007-11-20 10:13

manager   ~0006344

The -56.el5 kernel has a problem with nfs (see the bugzilla in the original report) and Jeff Layton is currently working on it. However, his revised version will *not* include the autofs patch (as per his e-mail).

According to this bugzilla upstream:

https://bugzilla.redhat.com/show_bug.cgi?id=371341

the autofs problem is a known issue and will be fixed in 5.1.x. A patch is provided in that BZ and is now attached here.

I confirm that the patch solves the problem.

Akemi
toracat

2007-11-20 11:03

manager   ~0006345

The patch was tested on both i686 and x86_64. autofs worked as expected, with no nfs crash.

Akemi
toracat

2007-11-24 17:43

manager   ~0006380

Another workaround is to set

DEFAULT_BROWSE_MODE="yes"

in /etc/sysconfig/autofs. This only works if your auto.home map explicitly
lists every entry, i.e., it does NOT use wildcards like
  * server:/export/home/&

(taken from the bugzilla referred to in 6344; confirmed to work -Akemi)
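
Since the workaround depends on the map having no wildcard key, a quick check of the map file may help before relying on it. This is a minimal sketch: the function name and the sample map contents are hypothetical, and it assumes a plain file map with one entry per line (continuation lines and included maps are not handled).

```python
def wildcard_keys(map_text):
    """Return the line numbers of wildcard ('*') entries in an autofs map."""
    hits = []
    for lineno, line in enumerate(map_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        # The first whitespace-separated field is the map key.
        if stripped.split()[0] == "*":
            hits.append(lineno)
    return hits


# Hypothetical auto.home contents, for illustration only.
example = """\
# sample map
alice  server:/export/home/alice
*      server:/export/home/&
"""
print(wildcard_keys(example))
```

An empty list suggests the map lists every entry explicitly, which is the case the workaround is reported to cover.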
toracat

2007-11-29 17:50

manager   ~0006423

A kernel update (2.6.18-53.1.4.el5) is out today which presumably fixes the autofs issue. Not confirmed (yet).

Akemi
toracat

2007-11-29 20:38

manager   ~0006424

Confirmed that kernel 2.6.18-53.1.4.el5 has the autofs patch and the problem reported here has been fixed. This was tested with x86_64 (as of Nov 29, 20:00 UTC).

Akemi
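
To check whether a machine already runs a kernel at least as new as 2.6.18-53.1.4.el5, the release string can be compared numerically. The helper names below are hypothetical, and the comparison is only a heuristic for stock el5 kernels: a custom test build such as -56.el5 compares as newer regardless of which patches it actually carries.

```python
import re

# 2.6.18-53.1.4.el5, the release confirmed in this report.
FIXED = (2, 6, 18, 53, 1, 4)


def release_tuple(release):
    """Turn '2.6.18-53.1.4.el5' into (2, 6, 18, 53, 1, 4)."""
    # Keep only the leading numeric part; drop suffixes such as '.el5'.
    match = re.match(r"\d+(?:[.-]\d+)*", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in re.split(r"[.-]", match.group(0)))


def at_least_fixed_release(release):
    """True if the release sorts at or after 2.6.18-53.1.4.el5."""
    return release_tuple(release) >= FIXED
```

On a live system the string could come from `platform.release()` in the standard library, which reports the same value as `uname -r`.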
toracat

2007-11-29 21:50

manager   ~0006425

Tested with the i686 kernel and confirmed the problem is gone.

Akemi
toracat

2007-12-01 19:52

manager   ~0006434

Well ... I am the only person adding notes to this report. I have tested kernel 2.6.18-53.1.4.el5 from 3 different sources (my own, CentOS, and SciLinux). All worked fine. Also, others are confirming the fix in the upstream bugzilla.

Arrfab, as the original reporter, would you agree that this bug report can be marked "Resolved"?

Akemi
arrfab

2007-12-02 18:50

administrator   ~0006437

OK, I confirm that it's solved by using 2.6.18-53.1.4.el5.
Can an admin mark the bug as resolved/fixed and close it?

Issue History

Date Modified Username Field Change
2007-11-17 15:01 arrfab New Issue
2007-11-17 15:01 arrfab Status new => assigned
2007-11-17 17:48 toracat Note Added: 0006336
2007-11-20 10:01 toracat File Added: linux-2.6-autofs4-fix-race-between-mount-and-expire-2.patch
2007-11-20 10:13 toracat Note Added: 0006344
2007-11-20 11:03 toracat Note Added: 0006345
2007-11-20 19:01 toracat Status assigned => acknowledged
2007-11-24 17:43 toracat Note Added: 0006380
2007-11-29 17:50 toracat Note Added: 0006423
2007-11-29 20:38 toracat Note Added: 0006424
2007-11-29 21:50 toracat Note Added: 0006425
2007-12-01 19:52 toracat Note Added: 0006434
2007-12-02 18:50 arrfab Note Added: 0006437
2007-12-02 19:24 kbsingh@karan.org Status acknowledged => resolved
2007-12-02 19:24 kbsingh@karan.org Fixed in Version => 5.1
2007-12-02 19:24 kbsingh@karan.org Resolution open => fixed