summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)Author
2018-10-26Merge tag 'powerpc-4.20-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - A large series to rewrite our SLB miss handling, replacing a lot of fairly complicated asm with much fewer lines of C. - Following on from that, we now maintain a cache of SLB entries for each process and preload them on context switch. Leading to a 27% speedup for our context switch benchmark on Power9. - Improvements to our handling of SLB multi-hit errors. We now print more debug information when they occur, and try to continue running by flushing the SLB and reloading, rather than treating them as fatal. - Enable THP migration on 64-bit Book3S machines (eg. Power7/8/9). - Add support for physical memory up to 2PB in the linear mapping on 64-bit Book3S. We only support up to 512TB as regular system memory, otherwise the percpu allocator runs out of vmalloc space. - Add stack protector support for 32 and 64-bit, with a per-task canary. - Add support for PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP. - Support recognising "big cores" on Power9, where two SMT4 cores are presented to us as a single SMT8 core. - A large series to cleanup some of our ioremap handling and PTE flags. - Add a driver for the PAPR SCM (storage class memory) interface, allowing guests to operate on SCM devices (acked by Dan). - Changes to our ftrace code to handle very large kernels, where we need to use a trampoline to get to ftrace_caller(). And many other smaller enhancements and cleanups. Thanks to: Alan Modra, Alistair Popple, Aneesh Kumar K.V, Anton Blanchard, Aravinda Prasad, Bartlomiej Zolnierkiewicz, Benjamin Herrenschmidt, Breno Leitao, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Dan Carpenter, Daniel Axtens, Finn Thain, Gautham R. Shenoy, Gustavo Romero, Haren Myneni, Hari Bathini, Jia Hongtao, Joel Stanley, John Allen, Laurent Dufour, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Hairgrove, Masahiro Yamada, Michael Bringmann, Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Paul Mackerras, Petr Vorel, Rashmica Gupta, Reza Arbab, Rob Herring, Sam Bobroff, Samuel Mendoza-Jonas, Scott Wood, Stan Johnson, Stephen Rothwell, Stewart Smith, Suraj Jitindar Singh, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, YueHaibing, zhong jiang" * tag 'powerpc-4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (221 commits) Revert "selftests/powerpc: Fix out-of-tree build errors" powerpc/msi: Fix compile error on mpc83xx powerpc: Fix stack protector crashes on CPU hotplug powerpc/traps: restore recoverability of machine_check interrupts powerpc/64/module: REL32 relocation range check powerpc/64s/radix: Fix radix__flush_tlb_collapsed_pmd double flushing pmd selftests/powerpc: Add a test of wild bctr powerpc/mm: Fix page table dump to work on Radix powerpc/mm/radix: Display if mappings are exec or not powerpc/mm/radix: Simplify split mapping logic powerpc/mm/radix: Remove the retry in the split mapping logic powerpc/mm/radix: Fix small page at boundary when splitting powerpc/mm/radix: Fix overuse of small pages in splitting logic powerpc/mm/radix: Fix off-by-one in split mapping logic powerpc/ftrace: Handle large kernel configs powerpc/mm: Fix WARN_ON with THP NUMA migration selftests/powerpc: Fix out-of-tree build errors powerpc/time: no steal_time when CONFIG_PPC_SPLPAR is not selected powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64 powerpc/time: isolate scaled cputime accounting in dedicated functions. ...
2018-10-26Merge tag 'dma-mapping-4.20-1' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull more dma-mapping updates from Christoph Hellwig: - various swiotlb cleanups - do not dip into the ѕwiotlb pool for dma coherent allocations - add support for not cache coherent DMA to swiotlb - switch ARM64 to use the generic swiotlb_dma_ops * tag 'dma-mapping-4.20-1' of git://git.infradead.org/users/hch/dma-mapping: arm64: use the generic swiotlb_dma_ops swiotlb: add support for non-coherent DMA swiotlb: don't dip into swiotlb pool for coherent allocations swiotlb: refactor swiotlb_map_page swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrs swiotlb: merge swiotlb_unmap_page and unmap_single swiotlb: remove the overflow buffer swiotlb: do not panic on mapping failures swiotlb: mark is_swiotlb_buffer static swiotlb: remove a pointless comment
2018-10-25Merge tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Radim Krčmář: "ARM: - Improved guest IPA space support (32 to 52 bits) - RAS event delivery for 32bit - PMU fixes - Guest entry hardening - Various cleanups - Port of dirty_log_test selftest PPC: - Nested HV KVM support for radix guests on POWER9. The performance is much better than with PR KVM. Migration and arbitrary level of nesting is supported. - Disable nested HV-KVM on early POWER9 chips that need a particular hardware bug workaround - One VM per core mode to prevent potential data leaks - PCI pass-through optimization - merge ppc-kvm topic branch and kvm-ppc-fixes to get a better base s390: - Initial version of AP crypto virtualization via vfio-mdev - Improvement for vfio-ap - Set the host program identifier - Optimize page table locking x86: - Enable nested virtualization by default - Implement Hyper-V IPI hypercalls - Improve #PF and #DB handling - Allow guests to use Enlightened VMCS - Add migration selftests for VMCS and Enlightened VMCS - Allow coalesced PIO accesses - Add an option to perform nested VMCS host state consistency check through hardware - Automatic tuning of lapic_timer_advance_ns - Many fixes, minor improvements, and cleanups" * tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned Revert "kvm: x86: optimize dr6 restore" KVM: PPC: Optimize clearing TCEs for sparse tables x86/kvm/nVMX: tweak shadow fields selftests/kvm: add missing executables to .gitignore KVM: arm64: Safety check PSTATE when entering guest and handle IL KVM: PPC: Book3S HV: Don't use streamlined entry path on early POWER9 chips arm/arm64: KVM: Enable 32 bits kvm vcpu events support arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension() KVM: arm64: Fix caching of host MDCR_EL2 value KVM: VMX: enable nested virtualization by default KVM/x86: Use 32bit xor to clear registers in svm.c kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD kvm: vmx: Defer setting of DR6 until #DB delivery kvm: x86: Defer setting of CR2 until #PF delivery kvm: x86: Add payload operands to kvm_multiple_exception kvm: x86: Add exception payload fields to kvm_vcpu_events kvm: x86: Add has_payload and payload to kvm_queued_exception KVM: Documentation: Fix omission in struct kvm_vcpu_events KVM: selftests: add Enlightened VMCS test ...
2018-10-25Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping updates from Thomas Gleixner: "The timers and timekeeping departement provides: - Another large y2038 update with further preparations for providing the y2038 safe timespecs closer to the syscalls. - An overhaul of the SHCMT clocksource driver - SPDX license identifier updates - Small cleanups and fixes all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) tick/sched : Remove redundant cpu_online() check clocksource/drivers/dw_apb: Add reset control clocksource: Remove obsolete CLOCKSOURCE_OF_DECLARE clocksource/drivers: Unify the names to timer-* format clocksource/drivers/sh_cmt: Add R-Car gen3 support dt-bindings: timer: renesas: cmt: document R-Car gen3 support clocksource/drivers/sh_cmt: Properly line-wrap sh_cmt_of_table[] initializer clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines clocksource/drivers/sh_cmt: Fixup for 64-bit machines clocksource/drivers/sh_tmu: Convert to SPDX identifiers clocksource/drivers/sh_mtu2: Convert to SPDX identifiers clocksource/drivers/sh_cmt: Convert to SPDX identifiers clocksource/drivers/renesas-ostm: Convert to SPDX identifiers clocksource: Convert to using %pOFn instead of device_node.name tick/broadcast: Remove redundant check RISC-V: Request newstat syscalls y2038: signal: Change rt_sigtimedwait to use __kernel_timespec y2038: socket: Change recvmmsg to use __kernel_timespec y2038: sched: Change sched_rr_get_interval to use __kernel_timespec y2038: utimes: Rework #ifdef guards for compat syscalls ...
2018-10-24Merge branch 'next-general' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "In this patchset, there are a couple of minor updates, as well as some reworking of the LSM initialization code from Kees Cook (these prepare the way for ordered stackable LSMs, but are a valuable cleanup on their own)" * 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: LSM: Don't ignore initialization failures LSM: Provide init debugging infrastructure LSM: Record LSM name in struct lsm_info LSM: Convert security_initcall() into DEFINE_LSM() vmlinux.lds.h: Move LSM_TABLE into INIT_DATA LSM: Convert from initcall to struct lsm_info LSM: Remove initcall tracing LSM: Rename .security_initcall section to .lsm_info vmlinux.lds.h: Avoid copy/paste of security_init section LSM: Correctly announce start of LSM initialization security: fix LSM description location keys: Fix the use of the C++ keyword "private" in uapi/linux/keyctl.h seccomp: remove unnecessary unlikely() security: tomoyo: Fix obsolete function security/capabilities: remove check for -EINVAL
2018-10-24Merge branch 'siginfo-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo updates from Eric Biederman: "I have been slowly sorting out siginfo and this is the culmination of that work. The primary result is in several ways the signal infrastructure has been made less error prone. The code has been updated so that manually specifying SEND_SIG_FORCED is never necessary. The conversion to the new siginfo sending functions is now complete, which makes it difficult to send a signal without filling in the proper siginfo fields. At the tail end of the patchset comes the optimization of decreasing the size of struct siginfo in the kernel from 128 bytes to about 48 bytes on 64bit. The fundamental observation that enables this is by definition none of the known ways to use struct siginfo uses the extra bytes. This comes at the cost of a small user space observable difference. For the rare case of siginfo being injected into the kernel only what can be copied into kernel_siginfo is delivered to the destination, the rest of the bytes are set to 0. For cases where the signal and the si_code are known this is safe, because we know those bytes are not used. For cases where the signal and si_code combination is unknown the bits that won't fit into struct kernel_siginfo are tested to verify they are zero, and the send fails if they are not. I made an extensive search through userspace code and I could not find anything that would break because of the above change. If it turns out I did break something it will take just the revert of a single change to restore kernel_siginfo to the same size as userspace siginfo. Testing did reveal dependencies on preferring the signo passed to sigqueueinfo over si->signo, so bit the bullet and added the complexity necessary to handle that case. Testing also revealed bad things can happen if a negative signal number is passed into the system calls. Something no sane application will do but something a malicious program or a fuzzer might do. So I have fixed the code that performs the bounds checks to ensure negative signal numbers are handled" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits) signal: Guard against negative signal numbers in copy_siginfo_from_user32 signal: Guard against negative signal numbers in copy_siginfo_from_user signal: In sigqueueinfo prefer sig not si_signo signal: Use a smaller struct siginfo in the kernel signal: Distinguish between kernel_siginfo and siginfo signal: Introduce copy_siginfo_from_user and use it's return value signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE signal: Fail sigqueueinfo if si_signo != sig signal/sparc: Move EMT_TAGOVF into the generic siginfo.h signal/unicore32: Use force_sig_fault where appropriate signal/unicore32: Generate siginfo in ucs32_notify_die signal/unicore32: Use send_sig_fault where appropriate signal/arc: Use force_sig_fault where appropriate signal/arc: Push siginfo generation into unhandled_exception signal/ia64: Use force_sig_fault where appropriate signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn signal/ia64: Use the generic force_sigsegv in setup_frame signal/arm/kvm: Use send_sig_mceerr signal/arm: Use send_sig_fault where appropriate signal/arm: Use force_sig_fault where appropriate ...
2018-10-21powerpc: Fix stack protector crashes on CPU hotplugMichael Ellerman
Recently in commit 7241d26e8175 ("powerpc/64: properly initialise the stackprotector canary on SMP.") we fixed a crash with stack protector on SMP by initialising the stack canary in cpu_idle_thread_init(). But this can also causes crashes, when a CPU comes back online after being offline: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pnv_smp_cpu_kill_self+0x2a0/0x2b0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-rc3-gcc-7.3.1-00168-g4ffe713b7587 #94 Call Trace: dump_stack+0xb0/0xf4 (unreliable) panic+0x144/0x328 __stack_chk_fail+0x2c/0x30 pnv_smp_cpu_kill_self+0x2a0/0x2b0 cpu_die+0x48/0x70 arch_cpu_idle_dead+0x20/0x40 do_idle+0x274/0x390 cpu_startup_entry+0x38/0x50 start_secondary+0x5e4/0x600 start_secondary_prolog+0x10/0x14 Looking at the stack we see that the canary value in the stack frame doesn't match the canary in the task/paca. That is because we have reinitialised the task/paca value, but then the CPU coming online has returned into a function using the old canary value. That causes the comparison to fail. Instead we can call boot_init_stack_canary() from start_secondary() which never returns. This is essentially what the generic code does in cpu_startup_entry() under #ifdef X86, we should make that non-x86 specific in a future patch. Fixes: 7241d26e8175 ("powerpc/64: properly initialise the stackprotector canary on SMP.") Reported-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
2018-10-20powerpc/traps: restore recoverability of machine_check interruptsChristophe Leroy
commit b96672dd840f ("powerpc: Machine check interrupt is a non- maskable interrupt") added a call to nmi_enter() at the beginning of machine check restart exception handler. Due to that, in_interrupt() always returns true regardless of the state before entering the exception, and die() panics even when the system was not already in interrupt. This patch calls nmi_exit() before calling die() in order to restore the interrupt state we had before calling nmi_enter() Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20powerpc/64/module: REL32 relocation range checkNicholas Piggin
The recent module relocation overflow crash demonstrated that we have no range checking on REL32 relative relocations. This patch implements a basic check, the same kernel that previously oopsed and rebooted now continues with some of these errors when loading the module: module_64: x_tables: REL32 527703503449812 out of range! Possibly other relocations (ADDR32, REL16, TOC16, etc.) should also have overflow checks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20powerpc/ftrace: Handle large kernel configsNaveen N. Rao
Currently, we expect to be able to reach ftrace_caller() from all ftrace-enabled functions through a single relative branch. With large kernel configs, we see functions outside of 32MB of ftrace_caller() causing ftrace_init() to bail. In such configurations, gcc/ld emits two types of trampolines for mcount(): 1. A long_branch, which has a single branch to mcount() for functions that are one hop away from mcount(): c0000000019e8544 <00031b56.long_branch._mcount>: c0000000019e8544: 4a 69 3f ac b c00000000007c4f0 <._mcount> 2. A plt_branch, for functions that are farther away from mcount(): c0000000051f33f8 <0008ba04.plt_branch._mcount>: c0000000051f33f8: 3d 82 ff a4 addis r12,r2,-92 c0000000051f33fc: e9 8c 04 20 ld r12,1056(r12) c0000000051f3400: 7d 89 03 a6 mtctr r12 c0000000051f3404: 4e 80 04 20 bctr We can reuse those trampolines for ftrace if we can have those trampolines go to ftrace_caller() instead. However, with ABIv2, we cannot depend on r2 being valid. As such, we use only the long_branch trampolines by patching those to instead branch to ftrace_caller or ftrace_regs_caller. In addition, we add additional trampolines around .text and .init.text to catch locations that are covered by the plt branches. This allows ftrace to work with most large kernel configurations. For now, we always patch the trampolines to go to ftrace_regs_caller, which is slightly inefficient. This can be optimized further at a later point. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20powerpc/time: no steal_time when CONFIG_PPC_SPLPAR is not selectedChristophe Leroy
If CONFIG_PPC_SPLPAR is not selected, steal_time will always be NUL, so accounting it is pointless Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64Christophe Leroy
scaled cputime is only meaningfull when the processor has SPURR and/or PURR, which means only on PPC64. Removing it on PPC32 significantly reduces the size of vtime_account_system() and vtime_account_idle() on an 8xx: Before: 00000000 l F .text 000000a8 vtime_delta 00000280 g F .text 0000010c vtime_account_system 0000038c g F .text 00000048 vtime_account_idle After: (vtime_delta gets inlined inside the two functions) 000001d8 g F .text 000000a0 vtime_account_system 00000278 g F .text 00000038 vtime_account_idle In terms of performance, we also get approximatly 7% improvement on task switch. The following small benchmark app is run with perf stat: void *thread(void *arg) { int i; for (i = 0; i < atoi((char*)arg); i++) pthread_yield(); } int main(int argc, char **argv) { pthread_t th1, th2; pthread_create(&th1, NULL, thread, argv[1]); pthread_create(&th2, NULL, thread, argv[1]); pthread_join(th1, NULL); pthread_join(th2, NULL); return 0; } Before the patch: Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs): 8228.476465 task-clock (msec) # 0.954 CPUs utilized ( +- 0.23% ) 200004 context-switches # 0.024 M/sec ( +- 0.00% ) After the patch: Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs): 7649.070444 task-clock (msec) # 0.955 CPUs utilized ( +- 0.27% ) 200004 context-switches # 0.026 M/sec ( +- 0.00% ) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20powerpc/time: isolate scaled cputime accounting in dedicated functions.Christophe Leroy
scaled cputime is only meaningfull when the processor has SPURR and/or PURR, which means only on PPC64. In preparation of the following patch that will remove CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC32, this patch moves all scaled cputing accounting logic into dedicated functions. This patch doesn't change any functionality. It's only code reorganisation. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20powerpc/kgdb: add kgdb_arch_set/remove_breakpoint()Christophe Leroy
Generic implementation fails to remove breakpoints after init when CONFIG_STRICT_KERNEL_RWX is selected: [ 13.251285] KGDB: BP remove failed: c001c338 [ 13.259587] kgdbts: ERROR PUT: end of test buffer on 'do_fork_test' line 8 expected OK got $E14#aa [ 13.268969] KGDB: re-enter exception: ALL breakpoints killed [ 13.275099] CPU: 0 PID: 1 Comm: init Not tainted 4.18.0-g82bbb913ffd8 #860 [ 13.282836] Call Trace: [ 13.285313] [c60e1ba0] [c0080ef0] kgdb_handle_exception+0x6f4/0x720 (unreliable) [ 13.292618] [c60e1c30] [c000e97c] kgdb_handle_breakpoint+0x3c/0x98 [ 13.298709] [c60e1c40] [c000af54] program_check_exception+0x104/0x700 [ 13.305083] [c60e1c60] [c000e45c] ret_from_except_full+0x0/0x4 [ 13.310845] [c60e1d20] [c02a22ac] run_simple_test+0x2b4/0x2d4 [ 13.316532] [c60e1d30] [c0081698] put_packet+0xb8/0x158 [ 13.321694] [c60e1d60] [c00820b4] gdb_serial_stub+0x230/0xc4c [ 13.327374] [c60e1dc0] [c0080af8] kgdb_handle_exception+0x2fc/0x720 [ 13.333573] [c60e1e50] [c000e928] kgdb_singlestep+0xb4/0xcc [ 13.339068] [c60e1e70] [c000ae1c] single_step_exception+0x90/0xac [ 13.345100] [c60e1e80] [c000e45c] ret_from_except_full+0x0/0x4 [ 13.350865] [c60e1f40] [c000e11c] ret_from_syscall+0x0/0x38 [ 13.356346] Kernel panic - not syncing: Recursive entry to debugger This patch creates powerpc specific version of kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint() using patch_instruction() Fixes: 1e0fc9d1eb2b ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20powerpc64/module elfv1: Set opd addresses after module relocationNaveen N. Rao
module_frob_arch_sections() is called before the module is moved to its final location. The function descriptor section addresses we are setting here are thus invalid. Fix this by processing opd section during module_finalize() Fixes: 5633e85b2c313 ("powerpc64: Add .opd based function descriptor dereference") Cc: stable@vger.kernel.org # v4.16 Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19swiotlb: remove the overflow bufferChristoph Hellwig
Like all other dma mapping drivers just return an error code instead of an actual memory buffer. The reason for the overflow buffer was that at the time swiotlb was invented there was no way to check for dma mapping errors, but this has long been fixed. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-10-19powerpc/time: Fix clockevent_decrementer initalisation for PR KVMMichael Ellerman
In the recent commit 8b78fdb045de ("powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer") we changed the way we initialise the decrementer clockevent(s). We no longer initialise the mult & shift values of decrementer_clockevent itself. This has the effect of breaking PR KVM, because it uses those values in kvmppc_emulate_dec(). The symptom is guest kernels spin forever mid-way through boot. For now fix it by assigning back to decrementer_clockevent the mult and shift values. Fixes: 8b78fdb045de ("powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc: Add -Werror at arch/powerpc levelMichael Ellerman
Back when I added -Werror in commit ba55bd74360e ("powerpc: Add configurable -Werror for arch/powerpc") I did it by adding it to most of the arch Makefiles. At the time we excluded math-emu, because apparently it didn't build cleanly. But that seems to have been fixed somewhere in the interim. So move the -Werror addition to the top-level of the arch, this saves us from repeating it in every Makefile and means we won't forget to add it to any new sub-dirs. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc/traps: remove redundant in_interrupt panic in die()Christophe Leroy
do_exit() already includes a test to panic() is in_interrupt() This patch removes powerpc one which is redundant. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc/prom_init: Generate "phandle" instead of "linux, phandle"Benjamin Herrenschmidt
When creating the boot-time FDT from an actual Open Firmware live tree, let's generate "phandle" properties for the phandles instead of the old deprecated "linux,phandle". Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Unsplit warning printf()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc: Check prom_init for disallowed sectionsBenjamin Herrenschmidt
prom_init.c must not modify the kernel image outside of the .bss.prominit section. Thus make sure that prom_init.o doesn't have anything in any of these: .data .bss .init.data Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc/prom_init: Move __prombss to it's own section and store it in .bssBenjamin Herrenschmidt
This makes __prombss its own section, and for now store it in .bss. This will give us the ability later to store it elsewhere and/or free it after boot (it's about 8KB). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc/prom_init: Move a few remaining statics to appropriate sectionsBenjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc/prom_init: Move const structures to __initconstBenjamin Herrenschmidt
As they are no longer used past the end of prom_init Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc/prom_init: Move ibm_arch_vec to __prombssBenjamin Herrenschmidt
Make the existing initialized definition constant and copy it to a __prombss copy Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc/prom_init: Move prom_radix_disable to __prombssBenjamin Herrenschmidt
Initialize it dynamically instead of statically Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc/prom_init: Remove support for OPAL v2Benjamin Herrenschmidt
We removed support for running under any OPAL version earlier than v3 in 2015 (they never saw the light of day anyway), but we kept some leftovers of this support in prom_init.c, so let's take it out. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc/prom_init: Replace __initdata with __prombss when applicableBenjamin Herrenschmidt
This replaces all occurrences of __initdata for uninitialized data with a new __prombss Currently __promdata is defined to be __initdata but we'll eventually change that. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc/pseries: PAPR persistent memory supportOliver O'Halloran
This patch implements support for discovering storage class memory devices at boot and for handling hotplug of new regions via RTAS hotplug events. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Fix CONFIG_MEMORY_HOTPLUG=n build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19powerpc/traps: fix machine check handlers to use pr_cont()Christophe Leroy
When printing the machine check cause, the cause appears on the following line due to bad use of printk without \n: [ 33.663993] Machine check in kernel mode. [ 33.664011] Caused by (from SRR1=9032): [ 33.664036] Data access error at address c90c8000 This patch fixes it by using pr_cont() for the second part: [ 133.258131] Machine check in kernel mode. [ 133.258146] Caused by (from SRR1=9032): Data access error at address c90c8000 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/64s/hash: Simplify slb_flush_and_rebolt()Nicholas Piggin
slb_flush_and_rebolt() is misleading, it is called in virtual mode, so it can not possibly change the stack, so it should not be touching the shadow area. And since vmalloc is no longer bolted, it should not change any bolted mappings at all. Change the name to slb_flush_and_restore_bolted(), and have it just load the kernel stack from what's currently in the shadow SLB area. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/64s/hash: Add a SLB preload cacheNicholas Piggin
When switching processes, currently all user SLBEs are cleared, and a few (exec_base, pc, and stack) are preloaded. In trivial testing with small apps, this tends to miss the heap and low 256MB segments, and it will also miss commonly accessed segments on large memory workloads. Add a simple round-robin preload cache that just inserts the last SLB miss into the head of the cache and preloads those at context switch time. Every 256 context switches, the oldest entry is removed from the cache to shrink the cache and require fewer slbmte if they are unused. Much more could go into this, including into the SLB entry reclaim side to track some LRU information etc, which would require a study of large memory workloads. But this is a simple thing we can do now that is an obvious win for common workloads. With the full series, process switching speed on the context_switch benchmark on POWER9/hash (with kernel speculation security masures disabled) increases from 140K/s to 178K/s (27%). POWER8 does not change much (within 1%), it's unclear why it does not see a big gain like POWER9. Booting to busybox init with 256MB segments has SLB misses go down from 945 to 69, and with 1T segments 900 to 21. These could almost all be eliminated by preloading a bit more carefully with ELF binary loading. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/64s/hash: Provide arch_setup_exec() hooks for hash slice setupNicholas Piggin
This will be used by the SLB code in the next patch, but for now this sets the slb_addr_limit to the correct size for 32-bit tasks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/64s/hash: Add SLB allocation status bitmapsNicholas Piggin
Add 32-entry bitmaps to track the allocation status of the first 32 SLB entries, and whether they are user or kernel entries. These are used to allocate free SLB entries first, before resorting to the round robin allocator. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/64s/hash: Convert SLB miss handlers to CNicholas Piggin
This patch moves SLB miss handlers completely to C, using the standard exception handler macros to set up the stack and branch to C. This can be done because the segment containing the kernel stack is always bolted, so accessing it with relocation on will not cause an SLB exception. Arbitrary kernel memory must not be accessed when handling kernel space SLB misses, so care should be taken there. However user SLB misses can access any kernel memory, which can be used to move some fields out of the paca (in later patches). User SLB misses could quite easily reconcile IRQs and set up a first class kernel environment and exit via ret_from_except, however that doesn't seem to be necessary at the moment, so we only do that if a bad fault is encountered. [ Credit to Aneesh for bug fixes, error checks, and improvements to bad address handling, etc ] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Disallow tracing for all of slb.c for now.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/64: Interrupts save PPR on stack rather than thread_structNicholas Piggin
PPR is the odd register out when it comes to interrupt handling, it is saved in current->thread.ppr while all others are saved on the stack. The difficulty with this is that accessing thread.ppr can cause a SLB fault, but the SLB fault handler implementation in C change had assumed the normal exception entry handlers would not cause an SLB fault. Fix this by allocating room in the interrupt stack to save PPR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/ptrace: Don't use sizeof(struct pt_regs) in ptrace codeMichael Ellerman
Now that we've split the user & kernel versions of pt_regs we need to be more careful in the ptrace code. For now we've ensured the location of the fields in both structs is the same, so most of the ptrace code doesn't need updating. But there are a few places where we use sizeof(pt_regs), and these will be wrong as soon as we increase the size of the kernel structure. So flip them all to use sizeof(user_pt_regs). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc: Split user/kernel definitions of struct pt_regsMichael Ellerman
We use a shared definition for struct pt_regs in uapi/asm/ptrace.h. That means the layout of the structure is ABI, ie. we can't change it. That would be fine if it was only used to describe the user-visible register state of a process, but it's also the struct we use in the kernel to describe the registers saved in an interrupt frame. We'd like more flexibility in the content (and possibly layout) of the kernel version of the struct, but currently that's not possible. So split the definition into a user-visible definition which remains unchanged, and a kernel internal one. At the moment they're still identical, and we check that at build time. That's because we have code (in ptrace etc.) that assumes that they are the same. We will fix that code in future patches, and then we can break the strict symmetry between the two structs. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/prom_init: Make "default_colors" constBenjamin Herrenschmidt
It's never modified. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/prom_init: Make "fake_elf" constBenjamin Herrenschmidt
It is never modified Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/prom_init: Make of_workarounds staticBenjamin Herrenschmidt
It's not used anywhere else. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/8xx: change name of a few page flags to avoid confusionChristophe Leroy
_PAGE_PRIVILEGED corresponds to the SH bit which doesn't protect against user access but only disables ASID verification on kernel accesses. User access is controlled with _PMD_USER flag. Name it _PAGE_SH instead of _PAGE_PRIVILEGED _PAGE_HUGE corresponds to the SPS bit which doesn't really tells that's it is a huge page but only that it is not a 4k page. Name it _PAGE_SPS instead of _PAGE_HUGE Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc: handover page flags with a pgprot_t parameterChristophe Leroy
In order to avoid multiple conversions, handover directly a pgprot_t to map_kernel_page() as already done for radix. Do the same for __ioremap_caller() and __ioremap_at(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/mm: properly set PAGE_KERNEL flags in ioremap()Christophe Leroy
Set PAGE_KERNEL directly in the caller and do not rely on a hack adding PAGE_KERNEL flags when _PAGE_PRESENT is not set. As already done for PPC64, use pgprot_cache() helpers instead of _PAGE_XXX flags in PPC32 ioremap() derived functions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc: don't use ioremap_prot() nor __ioremap() unless really needed.Christophe Leroy
In many places, ioremap_prot() and __ioremap() can be replaced with higher level functions like ioremap(), ioremap_coherent(), ioremap_cache(), ioremap_wc() ... Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13powerpc/rtas: Fix a potential race between CPU-Offline & MigrationGautham R. Shenoy
Live Partition Migrations require all the present CPUs to execute the H_JOIN call, and hence rtas_ibm_suspend_me() onlines any offline CPUs before initiating the migration for this purpose. The commit 85a88cabad57 ("powerpc/pseries: Disable CPU hotplug across migrations") disables any CPU-hotplug operations once all the offline CPUs are brought online to prevent any further state change. Once the CPU-Hotplug operation is disabled, the code assumes that all the CPUs are online. However, there is a minor window in rtas_ibm_suspend_me() between onlining the offline CPUs and disabling CPU-Hotplug when a concurrent CPU-offline operations initiated by the userspace can succeed thereby nullifying the the aformentioned assumption. In this unlikely case these offlined CPUs will not call H_JOIN, resulting in a system hang. Fix this by verifying that all the present CPUs are actually online after CPU-Hotplug has been disabled, failing which we restore the state of the offline CPUs in rtas_ibm_suspend_me() and return an -EBUSY. Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13powerpc/cacheinfo: Report the correct shared_cpu_map on big-coresGautham R. Shenoy
Currently on POWER9 SMT8 cores systems, in sysfs, we report the shared_cache_map for L1 caches (both data and instruction) to be the cpu-ids of the threads in SMT8 cores. This is incorrect since on POWER9 SMT8 cores there are two groups of threads, each of which shares its own L1 cache. This patch addresses this by reporting the shared_cpu_map correctly in sysfs for L1 caches. Before the patch /sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_map : 000000ff /sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_map : 000000ff /sys/devices/system/cpu/cpu1/cache/index0/shared_cpu_map : 000000ff /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map : 000000ff After the patch /sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_map : 00000055 /sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_map : 00000055 /sys/devices/system/cpu/cpu1/cache/index0/shared_cpu_map : 000000aa /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map : 000000aa Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13powerpc: Use cpu_smallcore_sibling_mask at SMT level on bigcoresGautham R. Shenoy
POWER9 SMT8 cores consist of two groups of threads, where threads in each group shares L1-cache. The scheduler is not aware of this distinction as the current sched-domain hierarchy has all the threads of the core defined at the SMT domain. SMT [Thread siblings of the SMT8 core] DIE [CPUs in the same die] NUMA [All the CPUs in the system] Due to this, we can observe run-to-run variance when we run a multi-threaded benchmark bound to a single core based on how the scheduler spreads the software threads across the two groups in the core. We fix this in this patch by defining each group of threads which share L1-cache to be the SMT level. The group of threads in the SMT8 core is defined to be the CACHE level. The sched-domain hierarchy after this patch will be : SMT [Thread siblings in the core that share L1 cache] CACHE [Thread siblings that are in the SMT8 core] DIE [CPUs in the same die] NUMA [All the CPUs in the system] Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13powerpc: Detect the presence of big-cores via "ibm, thread-groups"Gautham R. Shenoy
On IBM POWER9, the device tree exposes a property array identifed by "ibm,thread-groups" which will indicate which groups of threads share a particular set of resources. As of today we only have one form of grouping identifying the group of threads in the core that share the L1 cache, translation cache and instruction data flow. This patch adds helper functions to parse the contents of "ibm,thread-groups" and populate a per-cpu variable to cache information about siblings of each CPU that share the L1, traslation cache and instruction data-flow. It also defines a new global variable named "has_big_cores" which indicates if the cores on this configuration have multiple groups of threads that share L1 cache. For each online CPU, it maintains a cpu_smallcore_mask, which indicates the online siblings which share the L1-cache with it. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13powerpc: Use SWITCH_FRAME_SIZE for prom and rtas entryJoel Stanley
Commit 6c1719942e19 ("powerpc/of: Remove useless register save/restore when calling OF back") removed the saving of srr0 and srr1 when calling into OpenFirmware. Commit e31aa453bbc4 ("powerpc: Use LOAD_REG_IMMEDIATE only for constants on 64-bit") did the same for rtas. This means we don't need to save the extra stack space and can use the common SWITCH_FRAME_SIZE. There were already no users of _SRR0 and _SRR1 so we can remove them too. Link: https://github.com/linuxppc/linux/issues/83 Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>