View Issue Details

ID: 0008371    Project: CentOS-6    Category: kernel    View Status: public    Last Update: 2016-11-19 16:11
Reporter: rpwagner
Priority: immediate    Severity: major    Reproducibility: always
Status: resolved    Resolution: fixed
Platform: Intel Xeon E5-2680v3    OS: Linux    OS Version: CentOS 6
Product Version:
Target Version:    Fixed in Version:
Summary: 0008371: futex waiter counter causing hangs in 2.6.32-504.8.1.el6.x86_64 on Haswell
Description

As described in this LKML thread [1], we are seeing pthreads hang on futexes. This has occurred with several applications, and I verified it with the same chess simulation algorithm that was described in the thread. The chess application and the other failing ones were also tested using the ELRepo 3.10-lt and 3.19-ml kernel packages, and neither kernel reproduced the behavior.

In the kernel v3 code, this was resolved by reverting commit 11d4616. It is not clear to me if there is a similar solution for the 2.6.32 code.

[1] https://lkml.org/lkml/2014/10/3/399
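
For context, the following is a minimal sketch of the futex wait/wake pattern whose kernel-side waiter accounting is at issue (my own illustration, not taken from the LKML thread or from any of the failing applications). If the kernel's wake path concludes that no waiters are pending while another task is just about to block, that task can sleep in FUTEX_WAIT indefinitely, which matches the hangs we are seeing.

/*
 * Illustration only (my own sketch, not the failing application): two
 * threads synchronizing on a raw futex word.  The waiter blocks in
 * FUTEX_WAIT; the waker flips the word and calls FUTEX_WAKE.  The bug in
 * this report corresponds to the kernel-side waiter accounting telling a
 * waker that no one is waiting, so the wake is skipped and the waiter
 * sleeps forever.
 *
 * Build: gcc -O2 -pthread futex-demo.c -o futex-demo
 */
#include <linux/futex.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static int flag;  /* futex word: 0 = not ready, 1 = ready */

static long futex_wait(int *addr, int expected)
{
	/* Sleep only while *addr == expected. */
	return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static long futex_wake(int *addr, int nwake)
{
	/* Wake up to nwake tasks blocked on addr. */
	return syscall(SYS_futex, addr, FUTEX_WAKE, nwake, NULL, NULL, 0);
}

static void *waiter(void *arg)
{
	(void)arg;
	while (__sync_val_compare_and_swap(&flag, 1, 1) == 0)
		futex_wait(&flag, 0);  /* hangs here if the wake is lost */
	printf("waiter: woke up\n");
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, waiter, NULL);
	sleep(1);                           /* let the waiter block */
	__sync_lock_test_and_set(&flag, 1); /* publish the state change */
	futex_wake(&flag, 1);               /* wake the waiter */
	pthread_join(t, NULL);
	return 0;
}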
Steps To Reproduce

On a dual-socket Intel Xeon E5-2680v3 or comparable Haswell system, clone Stockfish [1] and check out commit 58bb23d, or another commit from around the time of the original posting about this issue. Build using "make build ARCH=x86-64-modern", and run using:

#!/bin/sh

stockfish <<%EOF
uci
setoption name Threads value 24
setoption name Hash value 1024
position fen rnbq1rk1/pppnbppp/4p3/3pP1B1/3P3P/2N5/PPP2PP1/R2QKBNR w KQ - 0 7
go wtime 7200000 winc 30000 btime 7200000 binc 30000
%EOF

Note that "Threads value" is being set to the number of physical cores on our system. I find that this hangs within 10 seconds. Attach with gdb, and use "info threads" to see the thread that is hung in __lll_lock_wait. Releasing the thread from gdb will cause the application to continue.

[1] https://github.com/official-stockfish/Stockfish
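
As a Stockfish-independent check, the following is a rough stress-test sketch of my own (hypothetical code, not one of the failing applications, and not guaranteed to reproduce the hang as reliably as Stockfish). It keeps 24 threads contending on a single pthread mutex, so pthread_mutex_lock() and pthread_mutex_unlock() constantly go through FUTEX_WAIT and FUTEX_WAKE; if a wakeup is lost, attaching with gdb and running "info threads" shows a thread parked in __lll_lock_wait, the same symptom as above.

/*
 * Hypothetical stress test (my own sketch, not the Stockfish code):
 * NTHREADS threads repeatedly take and drop one pthread mutex.  Under
 * contention pthread_mutex_lock() parks threads with FUTEX_WAIT and
 * pthread_mutex_unlock() releases them with FUTEX_WAKE, so a lost wakeup
 * in the kernel can leave a thread stuck in __lll_lock_wait.
 *
 * Build: gcc -O2 -pthread mutex-stress.c -o mutex-stress
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 24  /* match the number of physical cores */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long long counter;

static void *worker(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&lock);   /* contended path: FUTEX_WAIT */
		counter++;
		pthread_mutex_unlock(&lock); /* wake path: FUTEX_WAKE */
	}
	return NULL;
}

int main(void)
{
	pthread_t tids[NTHREADS];
	int i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tids[i], NULL, worker, NULL);

	/* Heartbeat: keeps the process alive for inspection with gdb. */
	for (;;) {
		sleep(5);
		pthread_mutex_lock(&lock);
		printf("counter = %llu\n", counter);
		pthread_mutex_unlock(&lock);
	}
	return 0;
}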
Tags: No tags attached.

Activities

rpwagner

2015-04-02 06:22

reporter

stockfish_asm.png (190,922 bytes)

toracat

2015-04-02 17:13

manager   ~0022649

Last edited: 2015-04-02 17:26

This is something that needs to be fixed upstream (RH). Would you please file a report at http://bugzilla.redhat.com ? Of course, it would be great if you could confirm that reverting commit 11d4616 actually works with the current CentOS/RHEL kernel. I can try to build and offer a kernel with that change if you are able to run a test.

rpwagner

2015-04-02 17:22

reporter   ~0022650

Thanks. I'll submit the upstream ticket. I did try building an updated kernel with a patched futex.c, but I made a mistake somewhere in the process and the test machines did not boot. Once I resolve that, I can comment on whether or not the patch resolves the issue.

rpwagner

2015-04-02 17:54

reporter   ~0022651

Bug report submitted to RHEL as bug 1208633 [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1208633

toracat

2015-04-04 12:18

manager   ~0022654

@rpwagner

As you have probably seen, the bugs.centos.org site had to be moved to a new place due to a network outage, and some recent content was lost. Your last post (with a patch) was among the lost content. Could you please repost it?

rpwagner

2015-04-04 16:57

reporter  

futexwait-fail.patch (4,400 bytes)
diff -rupN a/kernel/futex.c b/kernel/futex.c
--- a/kernel/futex.c	2015-04-02 21:19:23.000000000 -0700
+++ b/kernel/futex.c	2015-04-02 21:19:32.000000000 -0700
@@ -127,7 +127,6 @@ struct futex_q {
  * waiting on a futex.
  */
 struct futex_hash_bucket {
-	atomic_t waiters;
 	spinlock_t lock;
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
@@ -147,37 +146,22 @@ static inline void futex_get_mm(union fu
 	smp_mb__after_atomic_inc();
 }
 
-/*
- * Reflects a new waiter being added to the waitqueue.
- */
-static inline void hb_waiters_inc(struct futex_hash_bucket *hb)
+static inline bool hb_waiters_pending(struct futex_hash_bucket *hb)
 {
 #ifdef CONFIG_SMP
-	atomic_inc(&hb->waiters);
 	/*
-	 * Full barrier (A), see the ordering comment above.
-	 */
-	smp_mb__after_atomic_inc();
-#endif
-}
-
-/*
- * Reflects a waiter being removed from the waitqueue by wakeup
- * paths.
- */
-static inline void hb_waiters_dec(struct futex_hash_bucket *hb)
-{
-#ifdef CONFIG_SMP
-	atomic_dec(&hb->waiters);
-#endif
-}
+	 * Tasks trying to enter the critical region are most likely
+	 * potential waiters that will be added to the plist. Ensure
+	 * that wakers won't miss to-be-slept tasks in the window between
+	 * the wait call and the actual plist_add.
+	 */
+	if (spin_is_locked(&hb->lock))
+		return true;
+	smp_rmb(); /* Make sure we check the lock state first */
 
-static inline int hb_waiters_pending(struct futex_hash_bucket *hb)
-{
-#ifdef CONFIG_SMP
-	return atomic_read(&hb->waiters);
+	return !plist_head_empty(&hb->chain);
 #else
-	return 1;
+	return true;
 #endif
 }
 
@@ -971,7 +955,6 @@ static void __unqueue_futex(struct futex
 
 	hb = container_of(q->lock_ptr, struct futex_hash_bucket, lock);
 	plist_del(&q->list, &q->list.plist);
-	hb_waiters_dec(hb);
 }
 
 /*
@@ -1269,9 +1252,7 @@ void requeue_futex(struct futex_q *q, st
 	 */
 	if (likely(&hb1->chain != &hb2->chain)) {
 		plist_del(&q->list, &hb1->chain);
-		hb_waiters_dec(hb1);
 		plist_add(&q->list, &hb2->chain);
-		hb_waiters_inc(hb2);
 		q->lock_ptr = &hb2->lock;
 #ifdef CONFIG_DEBUG_PI_LIST
 		q->list.plist.lock = &hb2->lock;
@@ -1466,7 +1447,6 @@ retry:
 	hb2 = hash_futex(&key2);
 
 retry_private:
-	hb_waiters_inc(hb2);
 	double_lock_hb(hb1, hb2);
 
 	if (likely(cmpval != NULL)) {
@@ -1476,7 +1456,6 @@ retry_private:
 
 		if (unlikely(ret)) {
 			double_unlock_hb(hb1, hb2);
-			hb_waiters_dec(hb2);
 
 			ret = get_user(curval, uaddr1);
 			if (ret)
@@ -1526,7 +1505,6 @@ retry_private:
 			break;
 		case -EFAULT:
 			double_unlock_hb(hb1, hb2);
-			hb_waiters_dec(hb2);
 			put_futex_key(fshared, &key2);
 			put_futex_key(fshared, &key1);
 			ret = fault_in_user_writeable(uaddr2);
@@ -1536,7 +1514,6 @@ retry_private:
 		case -EAGAIN:
 			/* The owner was exiting, try again. */
 			double_unlock_hb(hb1, hb2);
-			hb_waiters_dec(hb2);
 			put_futex_key(fshared, &key2);
 			put_futex_key(fshared, &key1);
 			cond_resched();
@@ -1609,7 +1586,6 @@ retry_private:
 
 out_unlock:
 	double_unlock_hb(hb1, hb2);
-	hb_waiters_dec(hb2);
 
 	/*
 	 * drop_futex_key_refs() must be called outside the spinlocks. During
@@ -1637,16 +1613,6 @@ static inline struct futex_hash_bucket *
 
 	hb = hash_futex(&q->key);
 
-	/*
-	 * Increment the counter before taking the lock so that
-	 * a potential waker won't miss a to-be-slept task that is
-	 * waiting for the spinlock. This is safe as all queue_lock()
-	 * users end up calling queue_me(). Similarly, for housekeeping,
-	 * decrement the counter at queue_unlock() when some error has
-	 * occurred and we don't end up adding the task to the list.
-	 */
-	hb_waiters_inc(hb);
-
 	q->lock_ptr = &hb->lock;
 
 	spin_lock(&hb->lock); /* implies MB (A) */
@@ -1657,7 +1623,6 @@ static inline void
 queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb)
 {
 	spin_unlock(&hb->lock);
-	hb_waiters_dec(hb);
 }
 
 /**
@@ -2419,7 +2384,6 @@ int handle_early_requeue_pi_wakeup(struc
 		 * Unqueue the futex_q and determine which it was.
 		 */
 		plist_del(&q->list, &q->list.plist);
-		hb_waiters_dec(hb);
 
 		/* Handle spurious wakeups gracefully */
 		ret = -EWOULDBLOCK;
@@ -2963,7 +2927,6 @@ static int __init futex_init(void)
 		futex_cmpxchg_enabled = 1;
 
 	for (i = 0; i < futex_hashsize; i++) {
-		atomic_set(&futex_queues[i].waiters, 0);
 		plist_head_init(&futex_queues[i].chain, &futex_queues[i].lock);
 		spin_lock_init(&futex_queues[i].lock);
 	}

rpwagner

2015-04-04 17:04

reporter   ~0022655

Thanks, @toracat.

Attached is the patch, derived from one posted to the LKML thread I referenced [1]. I did manage to compile and boot a kernel with it applied (by disabling CONFIG_MODULE_SIG). In the end it may only serve as a reference, because once I was able to run the patched kernel, the test still triggered the hang almost immediately. There is likely something wrong with the patch, though, since I based it on the kernel v3 patch.

[1] https://lkml.org/lkml/2014/10/8/406

rpwagner

2015-04-04 17:05

reporter   ~0022656

Something I posted on the Red Hat ticket:

"As an aside, I was contacted by two colleagues today, both at separate organizations (one commercial, one academic) who have seen this issue with a C++ and Java-based application. Both of these groups are using dual socket Haswell servers, although I can't tell how Haswell is significant, other than core count and performance, as neither of these applications are using AVX, and hle is disabled (as it should be)."

toracat

2015-04-04 17:20

manager   ~0022657

Thanks for the updated report. It sounds as if we will have to depend on the upstream (RH) kernel developers. Hopefully they can manage to fix the code in the current RHEL kernel.

kirbyzhou

2015-04-08 05:15

reporter   ~0022703

Is this bug the same as my bug https://bugzilla.redhat.com/show_bug.cgi?id=1207137 ?

kirbyzhou

2015-04-08 05:17

reporter  

kernellocktest.cpp (6,015 bytes)

rpwagner

2015-04-08 05:18

reporter   ~0022704

@kirbyzhou, I'm afraid I can't see your bug report to Red Hat. You'll have to add me (rpwagner@sdsc.edu) to the CC for me to look at it.

kirbyzhou

2015-04-08 05:26

reporter   ~0022705

@rpwagner, I have added you. Please let me see 1208633 too.

kirbyzhou

2015-04-08 05:26

reporter   ~0022706

@rpwagner, kirbyzhou@sogou-inc.com

arvids

2015-04-13 17:52

reporter   ~0022769

Isn't this issue the same as https://access.redhat.com/solutions/1350963 ? It says: "A fix is available and will be shipped as a Red Hat Enterprise Linux 6.6.z Errata."

toracat

2015-06-10 16:06

manager   ~0023367

Last edited: 2015-06-11 08:13

The just-released kernel-2.6.32-504.16.2.el6 should have the fix (https://access.redhat.com/solutions/1386323).

Issue History

Date Modified Username Field Change
2015-04-02 06:22 rpwagner New Issue
2015-04-02 06:22 rpwagner File Added: stockfish_asm.png
2015-04-02 12:00 toracat Description Updated
2015-04-02 17:13 toracat Note Added: 0022649
2015-04-02 17:22 rpwagner Note Added: 0022650
2015-04-02 17:26 toracat Note Edited: 0022649
2015-04-02 17:27 toracat Status new => assigned
2015-04-02 17:54 rpwagner Note Added: 0022651
2015-04-04 12:18 toracat Note Added: 0022654
2015-04-04 16:57 rpwagner File Added: futexwait-fail.patch
2015-04-04 17:04 rpwagner Note Added: 0022655
2015-04-04 17:05 rpwagner Note Added: 0022656
2015-04-04 17:20 toracat Note Added: 0022657
2015-04-08 05:15 kirbyzhou Note Added: 0022703
2015-04-08 05:17 kirbyzhou File Added: kernellocktest.cpp
2015-04-08 05:18 rpwagner Note Added: 0022704
2015-04-08 05:26 kirbyzhou Note Added: 0022705
2015-04-08 05:26 kirbyzhou Note Added: 0022706
2015-04-13 17:52 arvids Note Added: 0022769
2015-06-10 16:06 toracat Note Added: 0023367
2015-06-11 08:13 toracat Note Edited: 0023367
2016-11-19 16:11 toracat Status assigned => resolved
2016-11-19 16:11 toracat Resolution open => fixed