linux-toradex.git/arch/powerpc, branch v3.12.12

powerpc: Make sure "cache" directory is removed when offlining cpu

2014-02-06T19:22:22+00:00

commit 91b973f90c1220d71923e7efe1e61f5329806380 upstream.

The code in remove_cache_dir() is supposed to remove the "cache"
subdirectory from the sysfs directory for a CPU when that CPU is
being offlined.  It tries to do this by calling kobject_put() on
the kobject for the subdirectory.  However, the subdirectory only
gets removed once the last reference goes away, and the reference
being put here may well not be the last reference.  That means
that the "cache" subdirectory may still exist when the offlining
operation has finished.  If the same CPU subsequently gets onlined,
the code tries to add a new "cache" subdirectory.  If the old
subdirectory has not yet been removed, we get a WARN_ON in the
sysfs code, with stack trace, and an error message printed on the
console.  Further, we ultimately end up with an online cpu with no
"cache" subdirectory.

This fixes it by doing an explicit kobject_del() at the point where
we want the subdirectory to go away.  kobject_del() removes the sysfs
directory even though the object still exists in memory.  The object
will get freed at some point in the future.  A subsequent onlining
operation can create a new sysfs directory, even if the old object
still exists in memory, without causing any problems.
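
The difference between dropping a reference and deleting the directory can be
modelled in userspace. This is only a toy sketch: struct toy_kobj, cache_slot
and the toy_* helpers are hypothetical stand-ins invented for illustration,
not the kernel's kobject API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a kobject: a refcount plus "is it visible in sysfs". */
struct toy_kobj {
	int refcount;
	bool in_sysfs;
};

/* One-slot stand-in for the per-CPU sysfs directory entry named "cache". */
static struct toy_kobj *cache_slot;

/* Models adding the directory: fails if the old entry is still visible. */
static int toy_add(struct toy_kobj *k)
{
	if (cache_slot != NULL)
		return -1;	/* the real sysfs code would warn here */
	k->refcount = 1;
	k->in_sysfs = true;
	cache_slot = k;
	return 0;
}

/* Models kobject_del(): unlink from sysfs now, whatever the refcount is. */
static void toy_del(struct toy_kobj *k)
{
	if (k->in_sysfs) {
		k->in_sysfs = false;
		cache_slot = NULL;
	}
}

/* Models kobject_put(): drop one reference, unlink only on the last one. */
static void toy_put(struct toy_kobj *k)
{
	if (--k->refcount == 0)
		toy_del(k);
}
```

In this model, a toy_put() that is not the last reference leaves the slot
occupied, so a following toy_add() fails; calling toy_del() first frees the
slot even though the old object is still referenced.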

Signed-off-by: Paul Mackerras 
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Greg Kroah-Hartman

powerpc: Fix the setup of CPU-to-Node mappings during CPU online

2014-02-06T19:22:22+00:00

commit d4edc5b6c480a0917e61d93d55531d7efa6230be upstream.

On POWER platforms, the hypervisor can notify the guest kernel about dynamic
changes in the cpu-numa associativity (VPHN topology update). Hence the
cpu-to-node mappings that we got from the firmware during boot may no longer
be valid after such updates. This is handled using the arch_update_cpu_topology()
hook in the scheduler, and the sched-domains are rebuilt according to the new
mappings.

But unfortunately, at the moment, CPU hotplug ignores these updated mappings
and instead queries the firmware for the cpu-to-numa relationships and uses
them during CPU online. So the kernel can end up assigning wrong NUMA nodes
to CPUs during subsequent CPU hotplug online operations (after booting).

Further, a particularly problematic scenario can result from this bug:
On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8)
threads per core. The switch to Single-Threaded (ST) mode is performed by
offlining all except the first CPU thread in each core. Switching back to
SMT mode involves onlining those other threads back, in each core.

Now consider this scenario:

1. During boot, the kernel gets the cpu-to-node mappings from the firmware
   and assigns the CPUs to NUMA nodes appropriately, during CPU online.

2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and
   communicates this update to the kernel. The kernel in turn updates its
   cpu-to-node associations and rebuilds its sched domains. Everything is
   fine so far.

3. Now, the user switches the machine from SMT to ST mode (say, by running
   ppc64_cpu --smt=1). This involves offlining all except 1 thread in each
   core.

4. The user then tries to switch back from ST to SMT mode (say, by running
   ppc64_cpu --smt=4), and this involves onlining those threads back. Since
   CPU hotplug ignores the new mappings, it queries the firmware and tries to
   associate the newly onlined sibling threads to the old NUMA nodes. This
   results in sibling threads within the same core getting associated with
   different NUMA nodes, which is incorrect.

   The scheduler's build-sched-domains code gets thoroughly confused with this
   and enters an infinite loop and causes soft-lockups, as explained in detail
   in commit 3be7db6ab (powerpc: VPHN topology change updates all siblings).

So to fix this, use the numa_cpu_lookup_table to remember the updated
cpu-to-node mappings, and use them during CPU hotplug online operations.
Further, we also need to ensure that all threads in a core are assigned to a
common NUMA node, irrespective of whether all those threads were online during
the topology update. To achieve this, we take care not to use cpu_sibling_mask()
since it is not hotplug invariant. Instead, we use cpu_first_sibling_thread()
and set up the mappings manually using the 'threads_per_core' value for that
particular platform. This helps us ensure that we don't hit this bug with any
combination of CPU hotplug and SMT mode switching.
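
The idea of remembering the updated mapping for every thread of a core can be
sketched in userspace as follows. The table size, the thread count and all
function names here are assumptions made for illustration; the sketch also
assumes the number of threads per core is a power of two.

```c
#include <assert.h>

#define TOY_NR_CPUS		16	/* hypothetical table size */
#define TOY_THREADS_PER_CORE	4	/* stands in for threads_per_core */

/* Stand-in for numa_cpu_lookup_table: remembered node, or -1 if unset. */
static int toy_cpu_to_node[TOY_NR_CPUS];

static void toy_table_init(void)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_cpu_to_node[cpu] = -1;
}

/* First thread of the core containing cpu; independent of online state. */
static int toy_first_thread_of_core(int cpu)
{
	return cpu & ~(TOY_THREADS_PER_CORE - 1);
}

/* Record a topology update for every thread of the core, online or not. */
static void toy_update_core_node(int cpu, int node)
{
	int first, i;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	first = toy_first_thread_of_core(cpu);
	for (i = 0; i < TOY_THREADS_PER_CORE; i++)
		toy_cpu_to_node[first + i] = node;
}

/* On online, prefer the remembered node over the value from firmware. */
static int toy_node_for_onlining_cpu(int cpu, int firmware_node)
{
	int node = toy_cpu_to_node[cpu];

	return node >= 0 ? node : firmware_node;
}
```

Because the loop walks the core by index rather than by an online mask, a
thread that was offline during the update still picks up the new node when it
is onlined later.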

Signed-off-by: Srivatsa S. Bhat 
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Greg Kroah-Hartman

KVM: PPC: e500: Fix bad address type in deliver_tlb_miss()

2014-02-06T19:22:21+00:00

commit 70713fe315ed14cd1bb07d1a7f33e973d136ae3d upstream.

Use gva_t instead of unsigned int for eaddr in deliver_tlb_miss().
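
Why the parameter type matters can be shown with a small userspace sketch. It
assumes a configuration where the guest virtual address type is wider than 32
bits; toy_gva_t and both functions are hypothetical stand-ins, not the KVM
code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a guest virtual address type of 64 bits. */
typedef uint64_t toy_gva_t;

/* Buggy shape: the address is held in a 32-bit variable on the way. */
static toy_gva_t toy_deliver_narrow(toy_gva_t fault_addr)
{
	uint32_t eaddr = (uint32_t)fault_addr;	/* upper 32 bits are lost */

	return eaddr;
}

/* Fixed shape: the address keeps its full width end to end. */
static toy_gva_t toy_deliver_wide(toy_gva_t fault_addr)
{
	toy_gva_t eaddr = fault_addr;

	return eaddr;
}
```

An address such as 0x123456000 comes back as 0x23456000 from the narrow
variant and unchanged from the wide one.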

Signed-off-by: Mihai Caraman 
Signed-off-by: Alexander Graf 
Signed-off-by: Greg Kroah-Hartman

KVM: PPC: Book3S HV: use xics_wake_cpu only when defined

2014-02-06T19:22:21+00:00

commit 48eaef0518a565d3852e301c860e1af6a6db5a84 upstream.

Signed-off-by: Andreas Schwab 
Signed-off-by: Alexander Graf 
Signed-off-by: Greg Kroah-Hartman

bpf: do not use reciprocal divide

2014-02-06T19:22:20+00:00

[ Upstream commit aee636c4809fa54848ff07a899b326eb1f9987a2 ]

Jakub Zawadzki first noticed that some divisions done with reciprocal_divide
were not correct (off by one in some cases):
http://www.wireshark.org/~darkjames/reciprocal-buggy.c

He could also show this with BPF:
http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c

The reciprocal divide in the Linux kernel is not generic enough.
Let's remove its use in BPF, as it is not worth the pain with
current CPUs.
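
The off-by-one can be reproduced in userspace. The sketch below assumes the
helper precomputes a rounded-up 32-bit reciprocal and then divides with a
multiply and a shift; the toy_* names are local stand-ins and not the
kernel's API.

```c
#include <assert.h>
#include <stdint.h>

/* Precompute ceil(2^32 / k), truncated to 32 bits. */
static uint32_t toy_reciprocal_value(uint32_t k)
{
	uint64_t val;

	assert(k != 0);
	val = (UINT64_C(1) << 32) + (k - 1);
	return (uint32_t)(val / k);
}

/* Approximate a / k as (a * r) >> 32, where r is the reciprocal of k. */
static uint32_t toy_reciprocal_divide(uint32_t a, uint32_t r)
{
	return (uint32_t)(((uint64_t)a * r) >> 32);
}
```

Under these assumptions, k = 3 gives a reciprocal of 1431655766, and dividing
0x80000000 by it yields 715827883, while plain integer division gives
715827882. For k = 1 the reciprocal truncates to 0, so every quotient comes
out as 0.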

Signed-off-by: Eric Dumazet 
Reported-by: Jakub Zawadzki 
Cc: Mircea Gherzan 
Cc: Daniel Borkmann 
Cc: Hannes Frederic Sowa 
Cc: Matt Evans 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: David S. Miller 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

powerpc: Align p_end

2014-01-09T20:25:12+00:00

commit 286e4f90a72c0b0621dde0294af6ed4b0baddabb upstream.

p_end is an 8 byte value embedded in the text section. This means it
is only 4 byte aligned when it should be 8 byte aligned. Fix this
by adding an explicit alignment.
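
The arithmetic behind the alignment requirement can be sketched as below. The
helper names and the example offsets are hypothetical; the actual fix is an
alignment directive in the assembly source, which this snippet does not
reproduce.

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero if offset is a multiple of align (align is a power of two). */
static int toy_is_aligned(uint64_t offset, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (offset & (align - 1)) == 0;
}

/* Round offset up to the next multiple of align (a power of two). */
static uint64_t toy_align_up(uint64_t offset, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (offset + align - 1) & ~(align - 1);
}
```

An offset such as 0x104 satisfies the 4-byte alignment that instructions need
but not the 8-byte alignment an 8-byte value needs; rounding up moves it to
0x108.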

This fixes an issue where POWER7 little endian builds with
CONFIG_RELOCATABLE=y fail to boot.

Signed-off-by: Anton Blanchard 
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Greg Kroah-Hartman

powerpc: Fix bad stack check in exception entry

2014-01-09T20:25:12+00:00

commit 90ff5d688e61f49f23545ffab6228bd7e87e6dc7 upstream.

In EXCEPTION_PROLOG_COMMON() we check to see if the stack pointer (r1)
is valid when coming from the kernel.  If it's not valid, we die but
with a nice oops message.

Currently we allocate a stack frame (subtract INT_FRAME_SIZE) before we
check to see if the stack pointer is negative.  Unfortunately, this
won't detect a bad stack where r1 is less than INT_FRAME_SIZE.

This patch fixes the check to compare the modified r1 with
-INT_FRAME_SIZE.  With this, bad kernel stack pointers (including NULL
pointers) are correctly detected again.
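
The two checks can be compared in a userspace sketch. It assumes 64-bit kernel
addresses have the top bit set, so that they look negative when treated as
signed; TOY_INT_FRAME_SIZE is a hypothetical value and the function names are
made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_INT_FRAME_SIZE	UINT64_C(0x200)	/* hypothetical frame size */

/* Old check: allocate the frame, then accept anything that is negative. */
static bool toy_old_check_accepts(uint64_t r1)
{
	int64_t new_r1 = (int64_t)(r1 - TOY_INT_FRAME_SIZE);

	return new_r1 < 0;
}

/* New check: compare the modified r1 with -INT_FRAME_SIZE instead of 0. */
static bool toy_new_check_accepts(uint64_t r1)
{
	int64_t new_r1 = (int64_t)(r1 - TOY_INT_FRAME_SIZE);

	return new_r1 < -(int64_t)TOY_INT_FRAME_SIZE;
}
```

With r1 = 0 the subtraction wraps to a negative value, so the old check
accepts a NULL stack pointer while the new one rejects it; a kernel-style
address is accepted by both.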

Kudos to Paulus for finding this.

Signed-off-by: Michael Neuling 
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Greg Kroah-Hartman

powerpc: kvm: fix rare but potential deadlock scene

2014-01-09T20:25:07+00:00

commit 91648ec09c1ef69c4d840ab6dab391bfb452d554 upstream.

kvmppc_hv_find_lock_hpte() is called from both virtmode and realmode,
so it can trigger a deadlock.

Consider the following scenario:

There are two physical CPUs, cpuM and cpuN, and two VM instances, A and B,
each with a group of vcpus.

If, on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK) and is then switched
out, and on cpuN vcpu_A_2 tries to lock X in realmode, then cpuN will be
stuck in realmode for a long time.

Things get even worse if the following happens:
  On cpuM, bitlock X is held; on cpuN, bitlock Y is held.
  vcpu_B_2 tries to lock Y on cpuM in realmode.
  vcpu_A_2 tries to lock X on cpuN in realmode.

Oops! A deadlock happens.
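
The way a spinner gets stuck behind a switched-out holder can be modelled with
a bit lock built on C11 atomics. This is a toy sketch only: the bit position,
the bounded spin and the toy_* names are assumptions for illustration and do
not reproduce the kernel's locking code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_HVLOCK_BIT	(UINT64_C(1) << 6)	/* hypothetical bit position */

/* Try once to take the lock bit; true if this caller set it. */
static bool toy_try_lock(_Atomic uint64_t *hpte_v)
{
	uint64_t old = atomic_fetch_or(hpte_v, TOY_HVLOCK_BIT);

	return (old & TOY_HVLOCK_BIT) == 0;
}

/* Release the lock bit. */
static void toy_unlock(_Atomic uint64_t *hpte_v)
{
	atomic_fetch_and(hpte_v, ~TOY_HVLOCK_BIT);
}

/* Spin a bounded number of times; a real realmode spinner has no bound. */
static bool toy_lock_bounded(_Atomic uint64_t *hpte_v, int max_spins)
{
	int i;

	for (i = 0; i < max_spins; i++)
		if (toy_try_lock(hpte_v))
			return true;
	return false;
}
```

If the holders of X and Y are both switched out, spinners on each lock never
succeed however long they spin; each spinner occupies the CPU its lock holder
needs in order to run again and release the lock.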

Signed-off-by: Liu Ping Fan 
Reviewed-by: Paul Mackerras 
Signed-off-by: Alexander Graf 
Signed-off-by: Greg Kroah-Hartman

powerpc: Fix PTE page address mismatch in pgtable ctor/dtor

2013-12-20T15:48:56+00:00

commit cf77ee54362a245f9a01f240adce03a06c05eb68 upstream.

In pte_alloc_one(), pgtable_page_ctor() is passed an address that has
not been converted by page_address() to the newly allocated PTE page.

When the PTE is freed, __pte_free_tlb() calls pgtable_page_dtor()
with an address to the PTE page that has been converted by page_address().
The mismatch in the PTE's page address causes pgtable_page_dtor() to access
invalid memory, so resources for that PTE (such as the page lock) are not
properly cleaned up.

On PPC32, only SMP kernels are affected.

On PPC64, only SMP kernels with 4K page size are affected.

This bug was introduced by commit d614bb041209fd7cb5e4b35e11a7b2f6ee8f62b8
"powerpc: Move the pte free routines from common header".

On a preempt-rt kernel, a spinlock is dynamically allocated for each
PTE in pgtable_page_ctor().  When the PTE is freed, calling
pgtable_page_dtor() with a mismatched page address causes a memory leak,
as the pointer to the PTE's spinlock is bogus.

On mainline, there aren't any immediately obvious symptoms, but the
problem still exists here.

Fixes: d614bb041209fd7c "powerpc: Move the pte free routines from common header"
Cc: Paul Mackerras 
Cc: Aneesh Kumar K.V 
Cc: Benjamin Herrenschmidt 
Signed-off-by: Hong H. Pham 
Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Greg Kroah-Hartman

powerpc/signals: Improved mark VSX not saved with small contexts fix

2013-12-04T19:05:54+00:00

commit ec67ad82814bee92251fd963bf01c7a173856555 upstream.

In a recent patch:
  commit c13f20ac48328b05cd3b8c19e31ed6c132b44b42
  Author: Michael Neuling 
  powerpc/signals: Mark VSX not saved with small contexts

We fixed an issue but an improved solution was later discussed after the patch
was merged.

Firstly, the original patch doesn't handle the 64bit signals case, which could
also hit this issue (but has never been reported).

Secondly, the original patch isn't clear what MSR VSX should be set to.  The
new approach below always clears the MSR VSX bit (to indicate no VSX is in the
context) and sets it only in the specific case where VSX is available (ie. when
VSX has been used and the signal context passed has space to provide the
state).
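
The rule described above can be written as a small pure function. This is a
sketch under assumptions: the TOY_MSR_VSX bit position, the function name and
its two boolean inputs are illustrative and not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MSR_VSX	(UINT64_C(1) << 23)	/* assumed MSR VSX bit position */

/*
 * Compute the MSR value to store in the signal context: always clear the
 * VSX bit first, then set it only when VSX has been used and the context
 * has room for the VSX state.
 */
static uint64_t toy_signal_msr(uint64_t msr, bool used_vsx, bool ctx_has_room)
{
	msr &= ~TOY_MSR_VSX;
	if (used_vsx && ctx_has_room)
		msr |= TOY_MSR_VSX;
	return msr;
}
```

Clearing first means a stale VSX bit in the incoming MSR can never leak into a
context that carries no VSX state.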

This reverts the original patch and replaces it with the improved solution.  It
also adds a 64 bit version.

Signed-off-by: Michael Neuling 
Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Greg Kroah-Hartman