<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/arch/s390/include/asm/spinlock.h, branch v4.4.97</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>s390/spinlock: remove unneeded serializations at unlock</title>
<updated>2015-10-14T12:32:25+00:00</updated>
<author>
<name>Christian Borntraeger</name>
<email>borntraeger@de.ibm.com</email>
</author>
<published>2015-10-09T10:34:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fdbbe8e7914765aef82c696dcefc97fe462c3925'/>
<id>fdbbe8e7914765aef82c696dcefc97fe462c3925</id>
<content type='text'>
The kernel locks have acquire/release semantics. No operation done
after the lock can be "moved" before the lock, and no operation before
the unlock can be moved after the unlock. But it is perfectly fine
for memory accesses that happen code-wise after the unlock to be
performed within the critical section.
On s390x, reads are in-order with other reads (PoP section
"Storage-Operand Fetch References") and writes are in-order with
other writes (PoP section "Storage-Operand Store References"). Writes
are also in-order with reads to the same memory location (PoP section
"Storage-Operand Store References"). To other CPUs (and the channel
subsystem), reads additionally appear to be performed prior to reads or
writes that happen after them in the conceptual sequence (PoP section
"Relation between Operand Accesses").
So at least as observed by other CPUs and the channel subsystem, reads
inside the critical sections will not happen after unlock (and writes
are in-order anyway). That's exactly what we need for "RELEASE
operations" (memory-barriers.txt): "It guarantees that all memory
operations before the RELEASE operation will appear to happen before the
RELEASE operation with respect to the other components of the system."
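
As an illustrative aside (not part of the original patch): in C11 terms,
the RELEASE guarantee quoted above is what a release store provides. A
minimal sketch, assuming a plain atomic lock word:

#include &lt;stdatomic.h&gt;

/* Hypothetical illustration: memory_order_release gives exactly the
 * RELEASE guarantee cited above; the ordering rules described in this
 * message allow s390 to implement it with a plain store, without any
 * serializing instruction. */
static inline void unlock_release(atomic_uint *lock)
{
        atomic_store_explicit(lock, 0, memory_order_release);
}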

Suggested-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Reviewed-By: Sascha Silbe &lt;silbe@linux.vnet.ibm.com&gt;
[cross-reading and a lot of improvements to the patch description]
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/cmpxchg: use compiler builtins</title>
<updated>2014-11-03T12:29:47+00:00</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2014-10-29T11:50:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f318a1229bd8d377282ddb37158812073701a22b'/>
<id>f318a1229bd8d377282ddb37158812073701a22b</id>
<content type='text'>
The kernel build for s390 fails with gcc compilers of version 3.x,
so set the minimum required gcc version to 4.3.

As the atomic builtins are available with all gcc 4.x compilers,
use the __sync_val_compare_and_swap and __sync_bool_compare_and_swap
functions to replace the complex macro and inline assembler magic
in include/asm/cmpxchg.h. The compiler can just do it and generates
better code with the builtins.

While we are at it, use __sync_bool_compare_and_swap for the
_raw_compare_and_swap function in the spinlock code as well.
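
As a simplified sketch (illustrative only, not the literal kernel code;
the function names are hypothetical), the replacement boils down to:

/* Both builtins are type-generic compiler intrinsics: the first returns
 * the old value, the second a success flag. */
static inline unsigned int cmpxchg_sketch(unsigned int *ptr,
                                          unsigned int old,
                                          unsigned int new)
{
        return __sync_val_compare_and_swap(ptr, old, new);
}

static inline int raw_cas_sketch(unsigned int *lock,
                                 unsigned int old, unsigned int new)
{
        return __sync_bool_compare_and_swap(lock, old, new);
}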

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/rwlock: use the interlocked-access facility 1 instructions</title>
<updated>2014-09-25T08:52:13+00:00</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2014-09-22T14:34:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bbae71bf9c2fe90dc5642d4cddbbc1994861fd92'/>
<id>bbae71bf9c2fe90dc5642d4cddbbc1994861fd92</id>
<content type='text'>
Make use of the load-and-add, load-and-or and load-and-and instructions
to atomically update the read-write lock without a compare-and-swap loop.
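
A hedged sketch of the reader path (illustrative only; it assumes the
sign bit of the lock word marks a write lock, and the names are made up):

/* One interlocked add (load-and-add) bumps the reader count and returns
 * the old value, replacing a load + compare-and-swap loop. */
static inline int read_trylock_sketch(int *lock)
{
        int old = __sync_fetch_and_add(lock, 1);
        if (old >= 0)
                return 1;               /* read lock acquired */
        __sync_fetch_and_sub(lock, 1);  /* write-locked: back out */
        return 0;
}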

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/rwlock: remove interrupt-enabling rwlock variant.</title>
<updated>2014-09-25T08:52:10+00:00</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2014-09-22T12:45:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2684e73a861fe7b2ab763f442207025a1d9bb6a6'/>
<id>2684e73a861fe7b2ab763f442207025a1d9bb6a6</id>
<content type='text'>
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/rwlock: use directed yield for write-locked rwlocks</title>
<updated>2014-09-25T08:52:05+00:00</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2014-09-19T12:29:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d59b93da5e572703e1a7311c13dd3472a4e56e30'/>
<id>d59b93da5e572703e1a7311c13dd3472a4e56e30</id>
<content type='text'>
Add an owner field to the arch_rwlock_t to be able to pass the timeslice
of a virtual CPU to the lock owner via diagnose 0x9c in case the rwlock
is write-locked. The undirected yield that was done when acquiring the
rwlock for writing while it is read-locked is removed.
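
The shape of the change as a hedged sketch (the comments are assumptions
for illustration, not taken from the patch):

typedef struct {
        unsigned int lock;   /* reader count / write-lock bit */
        unsigned int owner;  /* CPU of the write-lock holder, 0 if none */
} arch_rwlock_sketch_t;

A waiter that finds the lock write-locked can then issue diagnose 0x9c
for the CPU recorded in owner instead of yielding blindly.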

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/spinlock: optimize spin_unlock code</title>
<updated>2014-09-09T06:53:30+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2014-09-08T06:20:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=442302820356977237e32a76a211e7942255003a'/>
<id>442302820356977237e32a76a211e7942255003a</id>
<content type='text'>
Use a memory barrier + store sequence instead of a load + compare-and-swap
sequence to unlock a spinlock and an rwlock.
For the spinlock case this saves us two memory reads and an unneeded CPU
serialization after the compare-and-swap instruction has stored the new
value.
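
A simplified sketch of the new unlock sequence (the actual kernel code
is inline assembly; this is illustrative only):

static inline void spin_unlock_sketch(unsigned int *lock)
{
        __sync_synchronize();  /* memory barrier: order prior accesses */
        *lock = 0;             /* plain store releases the lock */
}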

The kernel size (performance_defconfig) gets reduced by ~14k.

Average execution time of a tight inlined spin_unlock loop drops from
5.8ns to 0.7ns on a zEC12 machine.

An artificial stress test case, in which several counters protected by a
single spinlock are incremented only while holding that spinlock, shows
~30% improvement on a 4-CPU machine.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/spinlock,rwlock: always do a load-and-test first</title>
<updated>2014-05-20T06:58:53+00:00</updated>
<author>
<name>Martin Schwidefsky</name>
<email>schwidefsky@de.ibm.com</email>
</author>
<published>2014-05-15T09:00:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bae8f567344a7cb6a23ca6e13096ba785c69eb42'/>
<id>bae8f567344a7cb6a23ca6e13096ba785c69eb42</id>
<content type='text'>
In case a lock is contended it is better to do a load-and-test first
before trying to get the lock with compare-and-swap. This helps to avoid
unnecessary cache invalidations of the cacheline for the lock if the
CPU has to wait for the lock. For an uncontended lock, doing the
compare-and-swap directly is a bit better: if the CPU does not have the
cacheline in its cache yet, the compare-and-swap will get it read-write
immediately, while a load-and-test would get it read-only first.

Always do the load-and-test first, as the avoided cacheline invalidations
in the contended case outweigh the potential read-only to read-write
cacheline upgrade in the uncontended case.
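
A hedged sketch of the resulting lock loop (names and structure are
illustrative, not the literal kernel code):

static inline void spin_lock_sketch(unsigned int *lock, unsigned int id)
{
        for (;;) {
                /* load-and-test first: spinning on a plain load keeps
                 * the cacheline shared instead of bouncing it around */
                if (*(volatile unsigned int *)lock != 0)
                        continue;
                /* lock word reads free: now try the compare-and-swap */
                if (__sync_bool_compare_and_swap(lock, 0, id))
                        return;
        }
}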

Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/spinlock: optimize spinlock code sequence</title>
<updated>2014-05-20T06:58:42+00:00</updated>
<author>
<name>Philipp Hachtmann</name>
<email>phacht@linux.vnet.ibm.com</email>
</author>
<published>2014-04-07T16:25:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6c8cd5bbda7e6be166cf2e2dd4be5890193e17ac'/>
<id>6c8cd5bbda7e6be166cf2e2dd4be5890193e17ac</id>
<content type='text'>
Use a lowcore constant to improve the code generated for spinlocks.

[ Martin Schwidefsky: patch breakdown and code beautification ]

Signed-off-by: Philipp Hachtmann &lt;phacht@linux.vnet.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390/spinlock: cleanup spinlock code</title>
<updated>2014-05-20T06:58:41+00:00</updated>
<author>
<name>Philipp Hachtmann</name>
<email>phacht@linux.vnet.ibm.com</email>
</author>
<published>2014-04-07T16:25:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5b3f683e694a835f5dfdab06102be1a50604c3b7'/>
<id>5b3f683e694a835f5dfdab06102be1a50604c3b7</id>
<content type='text'>
Improve the spinlock code in several aspects:
 - Have _raw_compare_and_swap return true if the operation has been
   successful instead of returning the old value.
 - Remove the "volatile" from arch_spinlock_t and arch_rwlock_t
 - Rename 'owner_cpu' to 'lock'
 - Add helper functions arch_spin_trylock_once / arch_spin_tryrelease_once
   (the trylock variant is sketched below)
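
A hedged sketch of the trylock helper (simplified; the parameters and
the name are illustrative):

/* One inline trylock attempt, relying on the success flag now returned
 * by _raw_compare_and_swap (first point above). */
static inline int arch_spin_trylock_once_sketch(unsigned int *lock,
                                                unsigned int lockval)
{
        return __sync_bool_compare_and_swap(lock, 0, lockval);
}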

[ Martin Schwidefsky: patch breakdown and code beautification ]

Signed-off-by: Philipp Hachtmann &lt;phacht@linux.vnet.ibm.com&gt;
Signed-off-by: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
</content>
</entry>
<entry>
<title>s390: enable ARCH_USE_CMPXCHG_LOCKREF</title>
<updated>2013-09-28T10:46:29+00:00</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2013-09-05T11:26:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=efc1d23b3d9af8cbff9f26677d67fb9c1b9cb792'/>
<id>efc1d23b3d9af8cbff9f26677d67fb9c1b9cb792</id>
<content type='text'>
Enable ARCH_USE_CMPXCHG_LOCKREF since it shows performance improvements
of up to 50% with Linus' simple stat() test case on a 30 CPU system.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
</content>
</entry>
</feed>
