linux-toradex.git/arch/powerpc/kernel, branch v4.4.89

powerpc: Fix DAR reporting when alignment handler faults

2017-09-27T09:00:14+00:00

commit f9effe925039cf54489b5c04e0d40073bb3a123d upstream.

Anton noticed that if we fault part way through emulating an unaligned
instruction, we don't update the DAR to reflect that.

The DAR value is eventually reported back to userspace as the address
in the SEGV signal, and if userspace is using that value to demand
fault then it can be confused by us not setting the value correctly.

This patch is ugly as hell, but is intended to be the minimal fix and
back ports easily.

Signed-off-by: Michael Ellerman 
Reviewed-by: Paul Mackerras 
Signed-off-by: Greg Kroah-Hartman

Revert "powerpc/numa: Fix percpu allocations to be NUMA aware"

2017-08-07T02:19:40+00:00

This reverts commit 8c92870bdbf20b5fa5150a2c8bf53ab498516b24 which is
commit ba4a648f12f4cd0a8003dd229b6ca8a53348ee4b upstream.

Michal Hocko writes:

JFYI. We have encountered a regression after applying this patch on a
large ppc machine. While the patch is the right thing to do it doesn't
work well with the current vmalloc area size on ppc and large machines
where NUMA nodes are very far from each other. Just for the reference
the boot fails on such a machine with bunch of warning preceeding it.
See http://lkml.kernel.org/r/20170724134240.GL25221@dhcp22.suse.cz

It seems the right thing to do is to enlarge the vmalloc space on ppc
but this is not the case in the upstream kernel yet AFAIK. It is also
questionable whether that is a stable material but I will decision on
you here.

We have reverted this patch from our 4.4 based kernel.

Newer kernels do not have enlarged vmalloc space yet AFAIK so they won't
work properly eiter. This bug is quite rare though because you need a
specific HW configuration to trigger the issue - namely NUMA nodes have
to be far away from each other in the physical memory space.

Cc: Michal Hocko 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Signed-off-by: Greg Kroah-Hartman

powerpc/eeh: Enable IO path on permanent error

2017-07-05T12:37:18+00:00

[ Upstream commit 387bbc974f6adf91aa635090f73434ed10edd915 ]

We give up recovery on permanent error, simply shutdown the affected
devices and remove them. If the devices can't be put into quiet state,
they spew more traffic that is likely to cause another unexpected EEH
error. This was observed on "p8dtu2u" machine:

   0002:00:00.0 PCI bridge: IBM Device 03dc
   0002:01:00.0 Ethernet controller: Intel Corporation \
                Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
   0002:01:00.1 Ethernet controller: Intel Corporation \
                Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
   0002:01:00.2 Ethernet controller: Intel Corporation \
                Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
   0002:01:00.3 Ethernet controller: Intel Corporation \
                Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)

On P8 PowerNV platform, the IO path is frozen when shutdowning the
devices, meaning the memory registers are inaccessible. It is why
the devices can't be put into quiet state before removing them.
This fixes the issue by enabling IO path prior to putting the devices
into quiet state.

Reported-by: Pridhiviraj Paidipeddi 
Signed-off-by: Gavin Shan 
Acked-by: Russell Currey 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

powerpc/kprobes: Pause function_graph tracing during jprobes handling

2017-06-29T10:48:51+00:00

commit a9f8553e935f26cb5447f67e280946b0923cd2dc upstream.

This fixes a crash when function_graph and jprobes are used together.
This is essentially commit 237d28db036e ("ftrace/jprobes/x86: Fix
conflict between jprobes and function graph tracing"), but for powerpc.

Jprobes breaks function_graph tracing since the jprobe hook needs to use
jprobe_return(), which never returns back to the hook, but instead to
the original jprobe'd function. The solution is to momentarily pause
function_graph tracing before invoking the jprobe hook and re-enable it
when returning back to the original jprobe'd function.

Fixes: 6794c78243bf ("powerpc64: port of the function graph tracer")
Signed-off-by: Naveen N. Rao 
Acked-by: Masami Hiramatsu 
Acked-by: Steven Rostedt (VMware) 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc/numa: Fix percpu allocations to be NUMA aware

2017-06-14T11:16:25+00:00

commit ba4a648f12f4cd0a8003dd229b6ca8a53348ee4b upstream.

In commit 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
switched to the generic implementation of cpu_to_node(), which uses a percpu
variable to hold the NUMA node for each CPU.

Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
of our percpu areas, leading to a chicken and egg problem. In practice what
happens is when we are setting up the percpu areas, cpu_to_node() reports that
all CPUs are on node 0, so we allocate all percpu areas on node 0.

This is visible in the dmesg output, as all pcpu allocs being in group 0:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
  pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
  pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47

To fix it we need an early_cpu_to_node() which can run prior to percpu being
setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
in. With the patch dmesg output shows two groups, 0 and 1:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
  pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
  pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47

We can also check the data_offset in the paca of various CPUs, with the fix we
see:

  CPU 0:  data_offset = 0x0ffe8b0000
  CPU 24: data_offset = 0x1ffe5b0000

And we can see from dmesg that CPU 24 has an allocation on node 1:

  node   0: [mem 0x0000000000000000-0x0000000fffffffff]
  node   1: [mem 0x0000001000000000-0x0000001fffffffff]

Fixes: 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
Signed-off-by: Michael Ellerman 
Reviewed-by: Nicholas Piggin 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc/eeh: Avoid use after free in eeh_handle_special_event()

2017-06-14T11:16:25+00:00

commit daeba2956f32f91f3493788ff6ee02fb1b2f02fa upstream.

eeh_handle_special_event() is called when an EEH event is detected but
can't be narrowed down to a specific PE.  This function looks through
every PE to find one in an erroneous state, then calls the regular event
handler eeh_handle_normal_event() once it knows which PE has an error.

However, if eeh_handle_normal_event() found that the PE cannot possibly
be recovered, it will free it, rendering the passed PE stale.
This leads to a use after free in eeh_handle_special_event() as it attempts to
clear the "recovering" state on the PE after eeh_handle_normal_event() returns.

Thus, make sure the PE is valid when attempting to clear state in
eeh_handle_special_event().

Fixes: 8a6b1bc70dbb ("powerpc/eeh: EEH core to handle special event")
Reported-by: Alexey Kardashevskiy 
Signed-off-by: Russell Currey 
Reviewed-by: Gavin Shan 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc/64e: Fix hang when debugging programs with relocated kernel

2017-05-25T12:30:15+00:00

commit fd615f69a18a9d4aa5ef02a1dc83f319f75da8e7 upstream.

Debug interrupts can be taken during interrupt entry, since interrupt
entry does not automatically turn them off.  The kernel will check
whether the faulting instruction is between [interrupt_base_book3e,
__end_interrupts], and if so clear MSR[DE] and return.

However, when the kernel is built with CONFIG_RELOCATABLE, it can't use
LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e) and
LOAD_REG_IMMEDIATE(r15,__end_interrupts), as they ignore relocation.
Thus, if the kernel is actually running at a different address than it
was built at, the address comparison will fail, and the exception entry
code will hang at kernel_dbg_exc.

r2(toc) is also not usable here, as r2 still holds data from the
interrupted context, so LOAD_REG_ADDR() doesn't work either.  So we use
the *name@got* to get the EV of two labels directly.

Test programs test.c shows as follows:
int main(int argc, char *argv[])
{
	if (access("/proc/sys/kernel/perf_event_paranoid", F_OK) == -1)
		printf("Kernel doesn't have perf_event support\n");
}

Steps to reproduce the bug, for example:
 1) ./gdb ./test
 2) (gdb) b access
 3) (gdb) r
 4) (gdb) s

Signed-off-by: Liu Hailong 
Signed-off-by: Jiang Xuexin 
Reviewed-by: Jiang Biao 
Reviewed-by: Liu Song 
Reviewed-by: Huang Jian 
[scottwood: cleaned up commit message, and specified bad behavior
 as a hang rather than an oops to correspond to mainline kernel behavior]
Fixes: 1cb6e0649248 ("powerpc/book3e: support CONFIG_RELOCATABLE")
Signed-off-by: Scott Wood 
Signed-off-by: Greg Kroah-Hartman

powerpc/book3s/mce: Move add_taint() later in virtual mode

2017-05-25T12:30:14+00:00

commit d93b0ac01a9ce276ec39644be47001873d3d183c upstream.

machine_check_early() gets called in real mode. The very first time when
add_taint() is called, it prints a warning which ends up calling opal
call (that uses OPAL_CALL wrapper) for writing it to console. If we get a
very first machine check while we are in opal we are doomed. OPAL_CALL
overwrites the PACASAVEDMSR in r13 and in this case when we are done with
MCE handling the original opal call will use this new MSR on it's way
back to opal_return. This usually leads to unexpected behaviour or the
kernel to panic. Instead move the add_taint() call later in the virtual
mode where it is safe to call.

This is broken with current FW level. We got lucky so far for not getting
very first MCE hit while in OPAL. But easily reproducible on Mambo.

Fixes: 27ea2c420cad ("powerpc: Set the correct kernel taint on machine check errors.")
Signed-off-by: Mahesh Salgaonkar 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction

2017-04-27T07:09:33+00:00

commit 9e1ba4f27f018742a1aa95d11e35106feba08ec1 upstream.

If we set a kprobe on a 'stdu' instruction on powerpc64, we see a kernel
OOPS:

  Bad kernel stack pointer cd93c840 at c000000000009868
  Oops: Bad kernel stack pointer, sig: 6 [#1]
  ...
  GPR00: c000001fcd93cb30 00000000cd93c840 c0000000015c5e00 00000000cd93c840
  ...
  NIP [c000000000009868] resume_kernel+0x2c/0x58
  LR [c000000000006208] program_check_common+0x108/0x180

On a 64-bit system when the user probes on a 'stdu' instruction, the kernel does
not emulate actual store in emulate_step() because it may corrupt the exception
frame. So the kernel does the actual store operation in exception return code
i.e. resume_kernel().

resume_kernel() loads the saved stack pointer from memory using lwz, which only
loads the low 32-bits of the address, causing the kernel crash.

Fix this by loading the 64-bit value instead.

Fixes: be96f63375a1 ("powerpc: Split out instruction analysis part of emulate_step()")
Signed-off-by: Ravi Bangoria 
Reviewed-by: Naveen N. Rao 
Reviewed-by: Ananth N Mavinakayanahalli 
[mpe: Change log massage, add stable tag]
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc: Disable HFSCR[TM] if TM is not supported

2017-04-21T07:30:06+00:00

commit 7ed23e1bae8bf7e37fd555066550a00b95a3a98b upstream.

On Power8 & Power9 the early CPU inititialisation in __init_HFSCR()
turns on HFSCR[TM] (Hypervisor Facility Status and Control Register
[Transactional Memory]), but that doesn't take into account that TM
might be disabled by CPU features, or disabled by the kernel being built
with CONFIG_PPC_TRANSACTIONAL_MEM=n.

So later in boot, when we have setup the CPU features, clear HSCR[TM] if
the TM CPU feature has been disabled. We use CPU_FTR_TM_COMP to account
for the CONFIG_PPC_TRANSACTIONAL_MEM=n case.

Without this a KVM guest might try use TM, even if told not to, and
cause an oops in the host kernel. Typically the oops is seen in
__kvmppc_vcore_entry() and may or may not be fatal to the host, but is
always bad news.

In practice all shipping CPU revisions do support TM, and all host
kernels we are aware of build with TM support enabled, so no one should
actually be able to hit this in the wild.

Fixes: 2a3563b023e5 ("powerpc: Setup in HFSCR for POWER8")
Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: Benjamin Herrenschmidt 
Tested-by: Sam Bobroff 
[mpe: Rewrite change log with input from Sam, add Fixes/stable]
Signed-off-by: Michael Ellerman 
[sb: Backported to linux-4.4.y: adjusted context]
Signed-off-by: Sam Bobroff 
Signed-off-by: Greg Kroah-Hartman