linux-toradex.git/kernel/futex.c, branch v3.14.57

futex: Ensure get_futex_key_refs() always implies a barrier

2014-10-30T16:38:24+00:00

commit 76835b0ebf8a7fe85beb03c75121419a7dec52f0 upstream.

Commit b0c29f79ecea (futexes: Avoid taking the hb->lock if there's
nothing to wake up) changes the futex code to avoid taking a lock when
there are no waiters. This code has been subsequently fixed in commit
11d4616bd07f (futex: revert back to the explicit waiter counting code).
Both the original commit and the fix-up rely on get_futex_key_refs() to
always imply a barrier.

However, for private futexes, none of the cases in the switch statement
of get_futex_key_refs() would be hit and the function completes without
a memory barrier as required before checking the "waiters" in
futex_wake() -> hb_waiters_pending(). The consequence is a race with a
thread waiting on a futex on another CPU, allowing the waker thread to
read "waiters == 0" while the waiter thread to have read "futex_val ==
locked" (in kernel).

Without this fix, the problem (user space deadlocks) can be seen with
Android bionic's mutex implementation on an arm64 multi-cluster system.

Signed-off-by: Catalin Marinas 
Reported-by: Matteo Franchin 
Fixes: b0c29f79ecea (futexes: Avoid taking the hb->lock if there's nothing to wake up)
Acked-by: Davidlohr Bueso 
Tested-by: Mike Galbraith 
Cc: Darren Hart 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Paul E. McKenney 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

futex: Unlock hb->lock in futex_wait_requeue_pi() error path

2014-10-05T21:52:19+00:00

commit 13c42c2f43b19aab3195f2d357db00d1e885eaa8 upstream.

futex_wait_requeue_pi() calls futex_wait_setup(). If
futex_wait_setup() succeeds it returns with hb->lock held and
preemption disabled. Now the sanity check after this does:

        if (match_futex(&q.key, &key2)) {
	   	ret = -EINVAL;
		goto out_put_keys;
	}

which releases the keys but does not release hb->lock.

So we happily return to user space with hb->lock held and therefor
preemption disabled.

Unlock hb->lock before taking the exit route.

Reported-by: Dave "Trinity" Jones 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Darren Hart 
Reviewed-by: Davidlohr Bueso 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanos
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

futex: Make lookup_pi_state more robust

2014-06-07T17:28:29+00:00

commit 54a217887a7b658e2650c3feff22756ab80c7339 upstream.

The current implementation of lookup_pi_state has ambigous handling of
the TID value 0 in the user space futex.  We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.

The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) are found for the user space
address.  This can lead to state leakage and worse under some
circumstances.

Handle the cases explicit:

       Waiter | pi_state | pi->owner | uTID      | uODIED | ?

  [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
  [2]  NULL   | ---      | ---       | >0        | 0/1    | Valid

  [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid

  [4]  Found  | Found    | NULL      | 0         | 1      | Valid
  [5]  Found  | Found    | NULL      | >0        | 1      | Invalid

  [6]  Found  | Found    | task      | 0         | 1      | Valid

  [7]  Found  | Found    | NULL      | Any       | 0      | Invalid

  [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
  [9]  Found  | Found    | task      | 0         | 0      | Invalid
  [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid

 [1] Indicates that the kernel can acquire the futex atomically. We
     came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.

 [2] Valid, if TID does not belong to a kernel thread. If no matching
     thread is found then it indicates that the owner TID has died.

 [3] Invalid. The waiter is queued on a non PI futex

 [4] Valid state after exit_robust_list(), which sets the user space
     value to FUTEX_WAITERS | FUTEX_OWNER_DIED.

 [5] The user space value got manipulated between exit_robust_list()
     and exit_pi_state_list()

 [6] Valid state after exit_pi_state_list() which sets the new owner in
     the pi_state but cannot access the user space value.

 [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.

 [8] Owner and user space value match

 [9] There is no transient state which sets the user space TID to 0
     except exit_robust_list(), but this is indicated by the
     FUTEX_OWNER_DIED bit. See [4]

[10] There is no transient state which leaves owner and user space
     TID out of sync.

Signed-off-by: Thomas Gleixner 
Cc: Kees Cook 
Cc: Will Drewry 
Cc: Darren Hart 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

futex: Always cleanup owner tid in unlock_pi

2014-06-07T17:28:29+00:00

commit 13fbca4c6ecd96ec1a1cfa2e4f2ce191fe928a5e upstream.

If the owner died bit is set at futex_unlock_pi, we currently do not
cleanup the user space futex.  So the owner TID of the current owner
(the unlocker) persists.  That's observable inconsistant state,
especially when the ownership of the pi state got transferred.

Clean it up unconditionally.

Signed-off-by: Thomas Gleixner 
Cc: Kees Cook 
Cc: Will Drewry 
Cc: Darren Hart 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

futex: Validate atomic acquisition in futex_lock_pi_atomic()

2014-06-07T17:28:29+00:00

commit b3eaa9fc5cd0a4d74b18f6b8dc617aeaf1873270 upstream.

We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated to the real owner.

Verify whether the futex has waiters associated with kernel state.  If
it has, return -EINVAL.  The state is corrupted already, so no point in
cleaning it up.  Subsequent calls will fail as well.  Not our problem.

[ tglx: Use futex_top_waiter() and explain why we do not need to try
  	restoring the already corrupted user space state. ]

Signed-off-by: Darren Hart 
Cc: Kees Cook 
Cc: Will Drewry 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)

2014-06-07T17:28:29+00:00

commit e9c243a5a6de0be8e584c604d353412584b592f8 upstream.

If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call.  If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.

This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")

[ tglx: Compare the resulting keys as well, as uaddrs might be
  	different depending on the mapping ]

Fixes CVE-2014-3153.

Reported-by: Pinkie Pie
Signed-off-by: Will Drewry 
Signed-off-by: Kees Cook 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Darren Hart 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

futex: Prevent attaching to kernel threads

2014-06-07T17:28:07+00:00

commit f0d71b3dcb8332f7971b5f2363632573e6d9486a upstream.

We happily allow userspace to declare a random kernel thread to be the
owner of a user space PI futex.

Found while analysing the fallout of Dave Jones syscall fuzzer.

We also should validate the thread group for private futexes and find
some fast way to validate whether the "alleged" owner has RW access on
the file which backs the SHM, but that's a separate issue.

Signed-off-by: Thomas Gleixner 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Darren Hart 
Cc: Davidlohr Bueso 
Cc: Steven Rostedt 
Cc: Clark Williams 
Cc: Paul McKenney 
Cc: Lai Jiangshan 
Cc: Roland McGrath 
Cc: Carlos ODonell 
Cc: Jakub Jelinek 
Cc: Michael Kerrisk 
Cc: Sebastian Andrzej Siewior 
Link: http://lkml.kernel.org/r/20140512201701.194824402@linutronix.de
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

futex: Add another early deadlock detection check

2014-06-07T17:28:07+00:00

commit 866293ee54227584ffcb4a42f69c1f365974ba7f upstream.

Dave Jones trinity syscall fuzzer exposed an issue in the deadlock
detection code of rtmutex:
  http://lkml.kernel.org/r/20140429151655.GA14277@redhat.com

That underlying issue has been fixed with a patch to the rtmutex code,
but the futex code must not call into rtmutex in that case because
    - it can detect that issue early
    - it avoids a different and more complex fixup for backing out

If the user space variable got manipulated to 0x80000000 which means
no lock holder, but the waiters bit set and an active pi_state in the
kernel is found we can figure out the recursive locking issue by
looking at the pi_state owner. If that is the current task, then we
can safely return -EDEADLK.

The check should have been added in commit 59fa62451 (futex: Handle
futex_pi OWNER_DIED take over correctly) already, but I did not see
the above issue caused by user space manipulation back then.

Signed-off-by: Thomas Gleixner 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Darren Hart 
Cc: Davidlohr Bueso 
Cc: Steven Rostedt 
Cc: Clark Williams 
Cc: Paul McKenney 
Cc: Lai Jiangshan 
Cc: Roland McGrath 
Cc: Carlos ODonell 
Cc: Jakub Jelinek 
Cc: Michael Kerrisk 
Cc: Sebastian Andrzej Siewior 
Link: http://lkml.kernel.org/r/20140512201701.097349971@linutronix.de
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test

2014-04-14T13:50:05+00:00

commit 03b8c7b623c80af264c4c8d6111e5c6289933666 upstream.

If an architecture has futex_atomic_cmpxchg_inatomic() implemented and there
is no runtime check necessary, allow to skip the test within futex_init().

This allows to get rid of some code which would always give the same result,
and also allows the compiler to optimize a couple of if statements away.

Signed-off-by: Heiko Carstens 
Cc: Finn Thain 
Cc: Geert Uytterhoeven 
Link: http://lkml.kernel.org/r/20140302120947.GA3641@osiris
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

futex: avoid race between requeue and wake

2014-04-14T13:50:03+00:00

commit 69cd9eba38867a493a043bb13eb9b33cad5f1a9a upstream.

Jan Stancek reported:
 "pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
  occasionally fails, because some threads fail to wake up.

  Testcase creates 5 threads, which are all waiting on same condition.
  Main thread then calls pthread_cond_broadcast() without holding mutex,
  which calls:

      futex(uaddr1, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, uaddr2, ..)

  This immediately wakes up single thread A, which unlocks mutex and
  tries to wake up another thread:

      futex(uaddr2, FUTEX_WAKE_PRIVATE, 1)

  If thread A manages to call futex_wake() before any waiters are
  requeued for uaddr2, no other thread is woken up"

The ordering constraints for the hash bucket waiter counting are that
the waiter counts have to be incremented _before_ getting the spinlock
(because the spinlock acts as part of the memory barrier), but the
"requeue" operation didn't honor those rules, and nobody had even
thought about that case.

This fairly simple patch just increments the waiter count for the target
hash bucket (hb2) when requeing a futex before taking the locks.  It
then decrements them again after releasing the lock - the code that
actually moves the futex(es) between hash buckets will do the additional
required waiter count housekeeping.

Reported-and-tested-by: Jan Stancek 
Acked-by: Davidlohr Bueso 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman