linux-toradex.git/kernel, branch v3.2.60

futex: Make lookup_pi_state more robust

2014-06-09T12:29:17+00:00

commit 54a217887a7b658e2650c3feff22756ab80c7339 upstream.

The current implementation of lookup_pi_state has ambigous handling of
the TID value 0 in the user space futex.  We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.

The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) are found for the user space
address.  This can lead to state leakage and worse under some
circumstances.

Handle the cases explicit:

       Waiter | pi_state | pi->owner | uTID      | uODIED | ?

  [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
  [2]  NULL   | ---      | ---       | >0        | 0/1    | Valid

  [3]  Found  | NULL     | --        | Any       | 0/1    | Invalid

  [4]  Found  | Found    | NULL      | 0         | 1      | Valid
  [5]  Found  | Found    | NULL      | >0        | 1      | Invalid

  [6]  Found  | Found    | task      | 0         | 1      | Valid

  [7]  Found  | Found    | NULL      | Any       | 0      | Invalid

  [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
  [9]  Found  | Found    | task      | 0         | 0      | Invalid
  [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid

 [1] Indicates that the kernel can acquire the futex atomically. We
     came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.

 [2] Valid, if TID does not belong to a kernel thread. If no matching
     thread is found then it indicates that the owner TID has died.

 [3] Invalid. The waiter is queued on a non PI futex

 [4] Valid state after exit_robust_list(), which sets the user space
     value to FUTEX_WAITERS | FUTEX_OWNER_DIED.

 [5] The user space value got manipulated between exit_robust_list()
     and exit_pi_state_list()

 [6] Valid state after exit_pi_state_list() which sets the new owner in
     the pi_state but cannot access the user space value.

 [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.

 [8] Owner and user space value match

 [9] There is no transient state which sets the user space TID to 0
     except exit_robust_list(), but this is indicated by the
     FUTEX_OWNER_DIED bit. See [4]

[10] There is no transient state which leaves owner and user space
     TID out of sync.

Signed-off-by: Thomas Gleixner 
Cc: Kees Cook 
Cc: Will Drewry 
Cc: Darren Hart 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

futex: Always cleanup owner tid in unlock_pi

2014-06-09T12:29:16+00:00

commit 13fbca4c6ecd96ec1a1cfa2e4f2ce191fe928a5e upstream.

If the owner died bit is set at futex_unlock_pi, we currently do not
cleanup the user space futex.  So the owner TID of the current owner
(the unlocker) persists.  That's observable inconsistant state,
especially when the ownership of the pi state got transferred.

Clean it up unconditionally.

Signed-off-by: Thomas Gleixner 
Cc: Kees Cook 
Cc: Will Drewry 
Cc: Darren Hart 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

futex: Validate atomic acquisition in futex_lock_pi_atomic()

2014-06-09T12:29:16+00:00

commit b3eaa9fc5cd0a4d74b18f6b8dc617aeaf1873270 upstream.

We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated to the real owner.

Verify whether the futex has waiters associated with kernel state.  If
it has, return -EINVAL.  The state is corrupted already, so no point in
cleaning it up.  Subsequent calls will fail as well.  Not our problem.

[ tglx: Use futex_top_waiter() and explain why we do not need to try
  	restoring the already corrupted user space state. ]

Signed-off-by: Darren Hart 
Cc: Kees Cook 
Cc: Will Drewry 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)

2014-06-09T12:29:16+00:00

commit e9c243a5a6de0be8e584c604d353412584b592f8 upstream.

If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call.  If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.

This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")

[ tglx: Compare the resulting keys as well, as uaddrs might be
  	different depending on the mapping ]

Fixes CVE-2014-3153.

Reported-by: Pinkie Pie
Signed-off-by: Will Drewry 
Signed-off-by: Kees Cook 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Darren Hart 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings

futex: Prevent attaching to kernel threads

2014-06-09T12:29:16+00:00

commit f0d71b3dcb8332f7971b5f2363632573e6d9486a upstream.

We happily allow userspace to declare a random kernel thread to be the
owner of a user space PI futex.

Found while analysing the fallout of Dave Jones syscall fuzzer.

We also should validate the thread group for private futexes and find
some fast way to validate whether the "alleged" owner has RW access on
the file which backs the SHM, but that's a separate issue.

Signed-off-by: Thomas Gleixner 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Darren Hart 
Cc: Davidlohr Bueso 
Cc: Steven Rostedt 
Cc: Clark Williams 
Cc: Paul McKenney 
Cc: Lai Jiangshan 
Cc: Roland McGrath 
Cc: Carlos ODonell 
Cc: Jakub Jelinek 
Cc: Michael Kerrisk 
Cc: Sebastian Andrzej Siewior 
Link: http://lkml.kernel.org/r/20140512201701.194824402@linutronix.de
Signed-off-by: Thomas Gleixner 
Signed-off-by: Ben Hutchings

futex: Add another early deadlock detection check

2014-06-09T12:29:16+00:00

commit 866293ee54227584ffcb4a42f69c1f365974ba7f upstream.

Dave Jones trinity syscall fuzzer exposed an issue in the deadlock
detection code of rtmutex:
  http://lkml.kernel.org/r/20140429151655.GA14277@redhat.com

That underlying issue has been fixed with a patch to the rtmutex code,
but the futex code must not call into rtmutex in that case because
    - it can detect that issue early
    - it avoids a different and more complex fixup for backing out

If the user space variable got manipulated to 0x80000000 which means
no lock holder, but the waiters bit set and an active pi_state in the
kernel is found we can figure out the recursive locking issue by
looking at the pi_state owner. If that is the current task, then we
can safely return -EDEADLK.

The check should have been added in commit 59fa62451 (futex: Handle
futex_pi OWNER_DIED take over correctly) already, but I did not see
the above issue caused by user space manipulation back then.

Signed-off-by: Thomas Gleixner 
Cc: Dave Jones 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Darren Hart 
Cc: Davidlohr Bueso 
Cc: Steven Rostedt 
Cc: Clark Williams 
Cc: Paul McKenney 
Cc: Lai Jiangshan 
Cc: Roland McGrath 
Cc: Carlos ODonell 
Cc: Jakub Jelinek 
Cc: Michael Kerrisk 
Cc: Sebastian Andrzej Siewior 
Link: http://lkml.kernel.org/r/20140512201701.097349971@linutronix.de
Signed-off-by: Thomas Gleixner 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

perf: Prevent false warning in perf_swevent_add

2014-06-09T12:29:13+00:00

commit 39af6b1678afa5880dda7e375cf3f9d395087f6d upstream.

The perf cpu offline callback takes down all cpu context
events and releases swhash->swevent_hlist.

This could race with task context software event being just
scheduled on this cpu via perf_swevent_add while cpu hotplug
code already cleaned up event's data.

The race happens in the gap between the cpu notifier code
and the cpu being actually taken down. Note that only cpu
ctx events are terminated in the perf cpu hotplug code.

It's easily reproduced with:
  $ perf record -e faults perf bench sched pipe

while putting one of the cpus offline:
  # echo 0 > /sys/devices/system/cpu/cpu1/online

Console emits following warning:
  WARNING: CPU: 1 PID: 2845 at kernel/events/core.c:5672 perf_swevent_add+0x18d/0x1a0()
  Modules linked in:
  CPU: 1 PID: 2845 Comm: sched-pipe Tainted: G        W    3.14.0+ #256
  Hardware name: Intel Corporation Montevina platform/To be filled by O.E.M., BIOS AMVACRB1.86C.0066.B00.0805070703 05/07/2008
   0000000000000009 ffff880077233ab8 ffffffff81665a23 0000000000200005
   0000000000000000 ffff880077233af8 ffffffff8104732c 0000000000000046
   ffff88007467c800 0000000000000002 ffff88007a9cf2a0 0000000000000001
  Call Trace:
   [] dump_stack+0x4f/0x7c
   [] warn_slowpath_common+0x8c/0xc0
   [] warn_slowpath_null+0x1a/0x20
   [] perf_swevent_add+0x18d/0x1a0
   [] event_sched_in.isra.75+0x9e/0x1f0
   [] group_sched_in+0x6a/0x1f0
   [] ? sched_clock_local+0x25/0xa0
   [] ctx_sched_in+0x1f6/0x450
   [] perf_event_sched_in+0x6b/0xa0
   [] perf_event_context_sched_in+0x7b/0xc0
   [] __perf_event_task_sched_in+0x43e/0x460
   [] ? put_lock_stats.isra.18+0xe/0x30
   [] finish_task_switch+0xb8/0x100
   [] __schedule+0x30e/0xad0
   [] ? pipe_read+0x3e2/0x560
   [] ? preempt_schedule_irq+0x3e/0x70
   [] ? preempt_schedule_irq+0x3e/0x70
   [] preempt_schedule_irq+0x44/0x70
   [] retint_kernel+0x20/0x30
   [] ? lockdep_sys_exit+0x1a/0x90
   [] lockdep_sys_exit_thunk+0x35/0x67
   [] ? sysret_check+0x5/0x56

Fixing this by tracking the cpu hotplug state and displaying
the WARN only if current cpu is initialized properly.

Cc: Corey Ashford 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Paul Mackerras 
Cc: Arnaldo Carvalho de Melo 
Reported-by: Fengguang Wu 
Signed-off-by: Jiri Olsa 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1396861448-10097-1-git-send-email-jolsa@redhat.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Ben Hutchings

perf: Limit perf_event_attr::sample_period to 63 bits

2014-06-09T12:29:13+00:00

commit 0819b2e30ccb93edf04876237b6205eef84ec8d2 upstream.

Vince reported that using a large sample_period (one with bit 63 set)
results in wreckage since while the sample_period is fundamentally
unsigned (negative periods don't make sense) the way we implement
things very much rely on signed logic.

So limit sample_period to 63 bits to avoid tripping over this.

Reported-by: Vince Weaver 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/n/tip-p25fhunibl4y3qi0zuqmyf4b@git.kernel.org
Signed-off-by: Thomas Gleixner 
Signed-off-by: Ben Hutchings

hrtimer: Set expiry time before switch_hrtimer_base()

2014-06-09T12:29:10+00:00

commit 84ea7fe37908254c3bd90910921f6e1045c1747a upstream.

switch_hrtimer_base() calls hrtimer_check_target() which ensures that
we do not migrate a timer to a remote cpu if the timer expires before
the current programmed expiry time on that remote cpu.

But __hrtimer_start_range_ns() calls switch_hrtimer_base() before the
new expiry time is set. So the sanity check in hrtimer_check_target()
is operating on stale or even uninitialized data.

Update expiry time before calling switch_hrtimer_base().

[ tglx: Rewrote changelog once again ]

Signed-off-by: Viresh Kumar 
Cc: linaro-kernel@lists.linaro.org
Cc: linaro-networking@linaro.org
Cc: fweisbec@gmail.com
Cc: arvind.chauhan@arm.com
Link: http://lkml.kernel.org/r/81999e148745fc51bbcd0615823fbab9b2e87e23.1399882253.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner 
Signed-off-by: Ben Hutchings

sched: Use CPUPRI_NR_PRIORITIES instead of MAX_RT_PRIO in cpupri check

2014-06-09T12:29:10+00:00

commit 6227cb00cc120f9a43ce8313bb0475ddabcb7d01 upstream.

The check at the beginning of cpupri_find() makes sure that the task_pri
variable does not exceed the cp->pri_to_cpu array length. But that length
is CPUPRI_NR_PRIORITIES not MAX_RT_PRIO, where it will miss the last two
priorities in that array.

As task_pri is computed from convert_prio() which should never be bigger
than CPUPRI_NR_PRIORITIES, if the check should cause a panic if it is
hit.

Reported-by: Mike Galbraith 
Signed-off-by: Steven Rostedt 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1397015410.5212.13.camel@marge.simpson.net
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings