View Issue Details

ID: 0005955
Project: CentOS-6
Category: kernel
View Status: public
Last Update: 2013-02-27 01:15
Reporter: crazed
Priority: normal
Severity: crash
Reproducibility: random
Status: resolved
Resolution: fixed
Product Version: 6.3
Target Version:
Fixed in Version: 6.4
Summary: 0005955: fsync tid wraparound bug
Description: As described here:
http://lkml.indiana.edu/hypermail/linux/kernel/1106.0/00148.html

I've hit this bug a few times on production database machines. The only way I've found to recover once the bug affects a file is to reformat the filesystem.
Steps To Reproduce: Set up a high-traffic MySQL machine and wait. The bug only shows up after some time and is hard to reproduce.
Tags: No tags attached.

Activities

crazed

2012-09-14 18:02

reporter   ~0015774

Red Hat has fixed this in the 6.4 kernel. The relevant Red Hat bug:
https://bugzilla.redhat.com/show_bug.cgi?id=735768
toracat

2012-09-14 20:07

manager  

jbd.patch (2,567 bytes)
commit d9b01934d56a96d9f4ae2d6204d4ea78a36f5f36
Author: Ted Ts'o <tytso@mit.edu>
Date:   Sat Apr 30 13:17:11 2011 -0400

    jbd: fix fsync() tid wraparound bug
    
    If an application program does not make any changes to the indirect
    blocks or extent tree, i_datasync_tid will not get updated.  If there
    are enough commits (i.e., 2**31) such that tid_geq()'s calculations
    wrap, and there isn't a currently active transaction at the time of
    the fdatasync() call, this can end up triggering a BUG_ON in
    fs/jbd/commit.c:
    
    	J_ASSERT(journal->j_running_transaction != NULL);
    
    It's pretty rare that this can happen, since it requires the use of
    fdatasync() plus *very* frequent and excessive use of fsync().  But
    with the right workload, it can.
    
    We fix this by replacing the use of tid_geq() with an equality test,
    since there's only one valid transaction id that is valid for us to
    start: namely, the currently running transaction (if it exists).
    
    CC: stable@kernel.org
    Reported-by: Martin_Zielinski@McAfee.com
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Signed-off-by: Jan Kara <jack@suse.cz>

diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c
index b3713af..e2d4285 100644
--- a/fs/jbd/journal.c
+++ b/fs/jbd/journal.c
@@ -437,9 +437,12 @@ int __log_space_left(journal_t *journal)
 int __log_start_commit(journal_t *journal, tid_t target)
 {
 	/*
-	 * Are we already doing a recent enough commit?
+	 * The only transaction we can possibly wait upon is the
+	 * currently running transaction (if it exists).  Otherwise,
+	 * the target tid must be an old one.
 	 */
-	if (!tid_geq(journal->j_commit_request, target)) {
+	if (journal->j_running_transaction &&
+	    journal->j_running_transaction->t_tid == target) {
 		/*
 		 * We want a new commit: OK, mark the request and wakeup the
 		 * commit thread.  We do _not_ do the commit ourselves.
@@ -451,7 +454,14 @@ int __log_start_commit(journal_t *journal, tid_t target)
 			  journal->j_commit_sequence);
 		wake_up(&journal->j_wait_commit);
 		return 1;
-	}
+	} else if (!tid_geq(journal->j_commit_request, target))
+		/* This should never happen, but if it does, preserve
+		   the evidence before kjournald goes into a loop and
+		   increments j_commit_sequence beyond all recognition. */
+		WARN_ONCE(1, "jbd: bad log_start_commit: %u %u %u %u\n",
+		    journal->j_commit_request, journal->j_commit_sequence,
+		    target, journal->j_running_transaction ?
+		    journal->j_running_transaction->t_tid : 0);
 	return 0;
 }
 
toracat

2012-09-14 20:08

manager   ~0015776

I have uploaded the patch copied from LKML. Maybe this is a good candidate for the centosplus kernel?
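
For anyone following the commit message in jbd.patch, here is a minimal user-space sketch of why the old tid_geq() test misfires once a stale tid falls 2**31 or more commits behind, and why the equality test does not. The tid_geq() below is modelled on the jbd helper of the same name. struct toy_journal, old_wants_commit(), new_wants_commit() and tid_wrap_demo() are hypothetical names made up for illustration and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tid_t;   /* jbd transaction ids are 32-bit and wrap around */

/* Wraparound-aware "x >= y", modelled on the jbd helper of the same name.
 * The unsigned difference is reinterpreted as signed (two's complement
 * conversion assumed), so the answer is only meaningful while x and y are
 * less than 2**31 apart. */
static bool tid_geq(tid_t x, tid_t y)
{
    int32_t difference = (int32_t)(x - y);
    return difference >= 0;
}

/* Hypothetical, heavily simplified stand-in for the journal state. */
struct toy_journal {
    tid_t commit_request;   /* newest tid a commit was requested for */
    bool  has_running;      /* is there a running transaction? */
    tid_t running_tid;      /* its tid, valid only if has_running */
};

/* Pre-patch test: "is target newer than anything requested so far?" */
static bool old_wants_commit(const struct toy_journal *j, tid_t target)
{
    return !tid_geq(j->commit_request, target);
}

/* Post-patch test: only the running transaction may be started. */
static bool new_wants_commit(const struct toy_journal *j, tid_t target)
{
    return j->has_running && j->running_tid == target;
}

static void tid_wrap_demo(void)
{
    tid_t stale = 100;  /* tid recorded long ago and never updated since */

    /* A few commits later the stale tid is correctly seen as old. */
    struct toy_journal j = { .commit_request = 105, .has_running = false };
    assert(!old_wants_commit(&j, stale));
    assert(!new_wants_commit(&j, stale));

    /* 2**31 + 5 commits later the signed difference goes negative, so the
     * old test mistakes the stale tid for one that is still in the future. */
    j.commit_request = stale + 0x80000000u + 5u;
    assert(old_wants_commit(&j, stale));    /* bogus commit request */
    assert(!new_wants_commit(&j, stale));   /* stale tid is ignored */

    /* The running transaction itself is still accepted by the new test. */
    j.has_running = true;
    j.running_tid = j.commit_request + 1u;
    assert(new_wants_commit(&j, j.running_tid));
}
```

Calling tid_wrap_demo() from a main() with assertions enabled should return without tripping any of the checks. The bogus request in the middle case corresponds to the situation the commit message describes: a commit is requested for a tid that has no running transaction behind it.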
toracat

2012-09-14 20:25

manager  

c6.patch (1,060 bytes)
--- a/fs/jbd/journal.c	2012-08-14 05:46:36.000000000 -0700
+++ b/fs/jbd/journal.c	2012-09-14 13:21:49.028260864 -0700
@@ -439,7 +439,9 @@
 int __log_start_commit(journal_t *journal, tid_t target)
 {
 	/*
-	 * Are we already doing a recent enough commit?
+	 * The only transaction we can possibly wait upon is the
+	 * currently running transaction (if it exists).  Otherwise,
+	 * the target tid must be an old one.
 	 */
 	if (!tid_geq(journal->j_commit_request, target)) {
 		/*
@@ -453,7 +455,14 @@
 			  journal->j_commit_sequence);
 		wake_up(&journal->j_wait_commit);
 		return 1;
-	}
+	} else if (!tid_geq(journal->j_commit_request, target))
+		/* This should never happen, but if it does, preserve
+		   the evidence before kjournald goes into a loop and
+		   increments j_commit_sequence beyond all recognition. */
+		WARN_ONCE(1, "jbd: bad log_start_commit: %u %u %u %u\n",
+		   journal->j_commit_request, journal->j_commit_sequence,
+		   target, journal->j_running_transaction ?
+		   journal->j_running_transaction->t_tid : 0);
 	return 0;
 }
 
toracat

2012-09-14 20:25

manager   ~0015777

I have uploaded a patch file (c6.patch) adjusted for the C6 kernel.
toracat

2012-09-15 17:31

manager   ~0015782

I have uploaded the centosplus kernel with the patch applied (kernel-2.6.32-279.5.2.bug5955.el6.centos.plus) to:

http://people.centos.org/toracat/kernel/6/plus/bug5955/

for testing purposes. Please note that the packages are not signed.
toracat

2012-12-26 04:07

manager   ~0016182

Closing as the patch is in the centosplus kernel. Please feel free to reopen if the patch does not fix the issue.
toracat

2013-02-27 01:15

manager   ~0016547

The patch is in the 6.4 kernel (2.6.32-358.el6).

Issue History

Date Modified Username Field Change
2012-09-14 18:00 crazed New Issue
2012-09-14 18:02 crazed Note Added: 0015774
2012-09-14 20:07 toracat File Added: jbd.patch
2012-09-14 20:08 toracat Note Added: 0015776
2012-09-14 20:25 toracat File Added: c6.patch
2012-09-14 20:25 toracat Note Added: 0015777
2012-09-15 17:31 toracat Note Added: 0015782
2012-12-26 04:07 toracat Note Added: 0016182
2012-12-26 04:07 toracat Status new => resolved
2012-12-26 04:07 toracat Resolution open => fixed
2013-02-27 01:15 toracat Note Added: 0016547
2013-02-27 01:15 toracat Status resolved => feedback
2013-02-27 01:15 toracat Resolution fixed => reopened
2013-02-27 01:15 toracat Status feedback => resolved
2013-02-27 01:15 toracat Resolution reopened => fixed
2013-02-27 01:15 toracat Fixed in Version => 6.4