linux-toradex.git/kernel/locking/mutex.c, branch v4.10-rc5

locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted

2016-11-22T11:48:10+00:00

An over-committed guest with more vCPUs than pCPUs has a heavy overload
in the two spin_on_owner. This blames on the lock holder preemption
issue.

Break out of the loop if the vCPU is preempted: if vcpu_is_preempted(cpu)
is true.

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
20.68%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner
 8.45%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 4.12%  sched-messaging  [kernel.vmlinux]  [k] system_call
 3.01%  sched-messaging  [kernel.vmlinux]  [k] system_call_common
 2.83%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 2.64%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 2.00%  sched-messaging  [kernel.vmlinux]  [k] osq_lock

after patch:
 9.99%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 5.28%  sched-messaging  [unknown]         [H] 0xc0000000000768e0
 4.27%  sched-messaging  [kernel.vmlinux]  [k] __copy_tofrom_user_power7
 3.77%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 3.24%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.02%  sched-messaging  [kernel.vmlinux]  [k] system_call
 2.69%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task

Tested-by: Juergen Gross 
Signed-off-by: Pan Xinhui 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Christian Borntraeger 
Acked-by: Paolo Bonzini 
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: benh@kernel.crashing.org
Cc: boqun.feng@gmail.com
Cc: bsingharora@gmail.com
Cc: dave@stgolabs.net
Cc: kernellwp@gmail.com
Cc: konrad.wilk@oracle.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: paulmck@linux.vnet.ibm.com
Cc: paulus@samba.org
Cc: rkrcmar@redhat.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.deacon@arm.com
Cc: xen-devel-request@lists.xenproject.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1478077718-37424-4-git-send-email-xinhui.pan@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar

sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q

2016-11-21T09:29:01+00:00

Currently the wake_q data structure is defined by the WAKE_Q() macro.
This macro, however, looks like a function doing something as "wake" is
a verb. Even checkpatch.pl was confused as it reported warnings like

  WARNING: Missing a blank line after declarations
  #548: FILE: kernel/futex.c:3665:
  +	int ret;
  +	WAKE_Q(wake_q);

This patch renames the WAKE_Q() macro to DEFINE_WAKE_Q() which clarifies
what the macro is doing and eliminates the checkpatch.pl warnings.

Signed-off-by: Waiman Long 
Acked-by: Davidlohr Bueso 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1479401198-1765-1-git-send-email-longman@redhat.com
[ Resolved conflict and added missing rename. ]
Signed-off-by: Ingo Molnar

locking/core: Remove cpu_relax_lowlatency() users

2016-11-16T09:15:10+00:00

With the s390 special case of a yielding cpu_relax() implementation gone,
we can now remove all users of cpu_relax_lowlatency() and replace them
with cpu_relax().

Signed-off-by: Christian Borntraeger 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Catalin Marinas 
Cc: Heiko Carstens 
Cc: Linus Torvalds 
Cc: Martin Schwidefsky 
Cc: Nicholas Piggin 
Cc: Noam Camus 
Cc: Peter Zijlstra 
Cc: Russell King 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-5-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar

locking/mutex: Enable optimistic spinning of woken waiter

2016-10-25T09:31:54+00:00

This patch makes the waiter that sets the HANDOFF flag start spinning
instead of sleeping until the handoff is complete or the owner
sleeps. Otherwise, the handoff will cause the optimistic spinners to
abort spinning as the handed-off owner may not be running.

Tested-by: Jason Low 
Signed-off-by: Waiman Long 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andrew Morton 
Cc: Davidlohr Bueso 
Cc: Ding Tianhong 
Cc: Imre Deak 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tim Chen 
Cc: Will Deacon 
Link: http://lkml.kernel.org/r/1472254509-27508-2-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar

locking/mutex: Simplify some ww_mutex code in __mutex_lock_common()

2016-10-25T09:31:53+00:00

This patch removes some of the redundant ww_mutex code in
__mutex_lock_common().

Tested-by: Jason Low 
Signed-off-by: Waiman Long 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andrew Morton 
Cc: Davidlohr Bueso 
Cc: Ding Tianhong 
Cc: Imre Deak 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tim Chen 
Cc: Will Deacon 
Link: http://lkml.kernel.org/r/1472254509-27508-1-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar

locking/mutex: Restructure wait loop

2016-10-25T09:31:53+00:00

Doesn't really matter yet, but pull the HANDOFF and trylock out from
under the wait_lock.

The intention is to add an optimistic spin loop here, which requires
we do not hold the wait_lock, so shuffle code around in preparation.

Also clarify the purpose of taking the wait_lock in the wait loop, its
tempting to want to avoid it altogether, but the cancellation cases
need to to avoid losing wakeups.

Suggested-by: Waiman Long 
Tested-by: Jason Low 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

locking/mutex: Add lock handoff to avoid starvation

2016-10-25T09:31:52+00:00

Implement lock handoff to avoid lock starvation.

Lock starvation is possible because mutex_lock() allows lock stealing,
where a running (or optimistic spinning) task beats the woken waiter
to the acquire.

Lock stealing is an important performance optimization because waiting
for a waiter to wake up and get runtime can take a significant time,
during which everyboy would stall on the lock.

The down-side is of course that it allows for starvation.

This patch has the waiter requesting a handoff if it fails to acquire
the lock upon waking. This re-introduces some of the wait time,
because once we do a handoff we have to wait for the waiter to wake up
again.

A future patch will add a round of optimistic spinning to attempt to
alleviate this penalty, but if that turns out to not be enough, we can
add a counter and only request handoff after multiple failed wakeups.

There are a few tricky implementation details:

 - accepting a handoff must only be done in the wait-loop. Since the
   handoff condition is owner == current, it can easily cause
   recursive locking trouble.

 - accepting the handoff must be careful to provide the ACQUIRE
   semantics.

 - having the HANDOFF bit set on unlock requires care, we must not
   clear the owner.

 - we must be careful to not leave HANDOFF set after we've acquired
   the lock. The tricky scenario is setting the HANDOFF bit on an
   unlocked mutex.

Tested-by: Jason Low 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Waiman Long 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

locking/mutex: Rework mutex::owner

2016-10-25T09:31:50+00:00

The current mutex implementation has an atomic lock word and a
non-atomic owner field.

This disparity leads to a number of issues with the current mutex code
as it means that we can have a locked mutex without an explicit owner
(because the owner field has not been set, or already cleared).

This leads to a number of weird corner cases, esp. between the
optimistic spinning and debug code. Where the optimistic spinning
code needs the owner field updated inside the lock region, the debug
code is more relaxed because the whole lock is serialized by the
wait_lock.

Also, the spinning code itself has a few corner cases where we need to
deal with a held lock without an owner field.

Furthermore, it becomes even more of a problem when trying to fix
starvation cases in the current code. We end up stacking special case
on special case.

To solve this rework the basic mutex implementation to be a single
atomic word that contains the owner and uses the low bits for extra
state.

This matches how PI futexes and rt_mutex already work. By having the
owner an integral part of the lock state a lot of the problems
dissapear and we get a better option to deal with starvation cases,
direct owner handoff.

Changing the basic mutex does however invalidate all the arch specific
mutex code; this patch leaves that unused in-place, a later patch will
remove that.

Tested-by: Jason Low 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Will Deacon 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

locking: avoid passing around 'thread_info' in mutex debugging code

2016-06-23T19:11:17+00:00

None of the code actually wants a thread_info, it all wants a
task_struct, and it's just converting back and forth between the two
("ti->task" to get the task_struct from the thread_info, and
"task_thread_info(task)" to go the other way).

No semantic change.

Acked-by: Peter Zijlstra 
Signed-off-by: Linus Torvalds

locking/ww_mutex: Report recursive ww_mutex locking early

2016-06-03T06:37:26+00:00

Recursive locking for ww_mutexes was originally conceived as an
exception. However, it is heavily used by the DRM atomic modesetting
code. Currently, the recursive deadlock is checked after we have queued
up for a busy-spin and as we never release the lock, we spin until
kicked, whereupon the deadlock is discovered and reported.

A simple solution for the now common problem is to move the recursive
deadlock discovery to the first action when taking the ww_mutex.

Suggested-by: Maarten Lankhorst 
Signed-off-by: Chris Wilson 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Maarten Lankhorst 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1464293297-19777-1-git-send-email-chris@chris-wilson.co.uk
Signed-off-by: Ingo Molnar