linux-toradex.git/arch/powerpc/lib, branch v4.1-rc2

powerpc: Replace mem_init_done with slab_is_available()

2015-04-10T10:02:48+00:00

We have a powerpc specific global called mem_init_done which is "set on
boot once kmalloc can be called".

But that's not *quite* true. We set it at the bottom of mem_init(), and
rely on the fact that mm_init() calls kmem_cache_init() immediately
after that, and nothing is running in parallel.

So replace it with the generic and 100% correct slab_is_available().

Signed-off-by: Michael Ellerman

Merge branch 'next-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc into test

2015-03-26T09:04:28+00:00

Merge miscellaneous bits from benh. Fix a minor conflict with
OpalMessageType changing names to opal_msg_type.

cpufreq/ppc: Add missing #include

2015-03-25T05:53:28+00:00

If CONFIG_SMP=n,  does not include , causing:

drivers/cpufreq/ppc-corenet-cpufreq.c: In function 'corenet_cpufreq_cpu_init':
drivers/cpufreq/ppc-corenet-cpufreq.c:173:3: error: implicit declaration of function 'get_hard_smp_processor_id' [-Werror=implicit-funcuresh E. Warrier" 
X-Patchwork-Id: 443703
Message-Id: <54EE5989.7010800@linux.vnet.ibm.com>
To: linuxppc-dev@ozlabs.org
Date: Wed, 25 Feb 2015 17:23:53 -0600

Export __spin_yield so that the arch_spin_unlock() function can
be invoked from a module. This will be required for modules where
we want to take a lock that is also is acquired in hypervisor
real mode. Because we want to avoid running any lockdep code
(which may not be safe in real mode), this lock needs to be
an arch_spinlock_t instead of a normal spinlock.

Signed-off-by: Suresh Warrier 
Acked-by: Paul Mackerras 
Signed-off-by: Benjamin Herrenschmidt

powerpc: Remove duplicate cacheable_memcpy/memzero functions

2015-03-17T00:25:50+00:00

These functions are only used from one place each.  If the cacheable_*
versions really are more efficient, then those changes should be
migrated into the common code instead.

NOTE: The old routines are just flat buggy on kernels that support
      hardware with different cacheline sizes.

Signed-off-by: Kyle Moffett 
Signed-off-by: Benjamin Herrenschmidt

powerpc: Delete unnecessary checks before kfree()

2015-03-16T07:50:14+00:00

The kfree() function tests whether its argument is NULL and then returns
immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
Signed-off-by: Michael Ellerman

powerpc: Change vsrX register defines to vsX to match gcc and glibc

2015-03-16T07:32:11+00:00

As our various loops (copy, string, crypto etc) get more complicated,
we want to share implementations between userspace (eg glibc) and
the kernel. We also want to write userspace test harnesses to put
in tools/testing/selftest.

One gratuitous difference between userspace and the kernel is the
VSX register definitions - the kernel uses vsrX whereas gcc uses
vsX.

Change the kernel to match userspace.

Signed-off-by: Anton Blanchard 
Signed-off-by: Michael Ellerman

powerpc: Change vrX register defines to vX to match gcc and glibc

2015-03-16T07:32:11+00:00

As our various loops (copy, string, crypto etc) get more complicated,
we want to share implementations between userspace (eg glibc) and
the kernel. We also want to write userspace test harnesses to put
in tools/testing/selftest.

One gratuitous difference between userspace and the kernel is the
VMX register definitions - the kernel uses vrX whereas both gcc and
glibc use vX.

Change the kernel to match userspace.

Signed-off-by: Anton Blanchard 
Signed-off-by: Michael Ellerman

powerpc/lib: Makefile, use obj64-y to consolidate 64-bit rules

2015-01-28T04:00:24+00:00

Signed-off-by: Michael Ellerman

powerpc/lib: Makefile, consolidate obj-y sections

2015-01-28T04:00:24+00:00

Signed-off-by: Michael Ellerman

powerpc: Add 64bit optimised memcmp

2015-01-23T03:02:55+00:00

I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s

modified:	 1.70 s

Just over 17x faster.

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rdlicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.

Signed-off-by: Anton Blanchard 
Signed-off-by: Michael Ellerman