summaryrefslogtreecommitdiff
path: root/arch/powerpc/include/asm
AgeCommit message (Collapse)Author
2019-10-11powerpc/mm: Fixup tlbie vs mtpidr/mtlpidr ordering issue on POWER9Aneesh Kumar K.V
commit 047e6575aec71d75b765c22111820c4776cd1c43 upstream. On POWER9, under some circumstances, a broadcast TLB invalidation will fail to invalidate the ERAT cache on some threads when there are parallel mtpidr/mtlpidr happening on other threads of the same core. This can cause stores to continue to go to a page after it's unmapped. The workaround is to force an ERAT flush using PID=0 or LPID=0 tlbie flush. This additional TLB flush will cause the ERAT cache invalidation. Since we are using PID=0 or LPID=0, we don't get filtered out by the TLB snoop filtering logic. We need to still follow this up with another tlbie to take care of store vs tlbie ordering issue explained in commit: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9"). The presence of ERAT cache implies we can still get new stores and they may miss store queue marking flush. Cc: stable@vger.kernel.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-3-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flagAneesh Kumar K.V
commit 09ce98cacd51fcd0fa0af2f79d1e1d3192f4cbb0 upstream. Rename the #define to indicate this is related to store vs tlbie ordering issue. In the next patch, we will be adding another feature flag that is used to handles ERAT flush vs tlbie ordering issue. Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown racePaul Mackerras
commit da15c03b047dca891d37b9f4ef9ca14d84a6484f upstream. Testing has revealed the existence of a race condition where a XIVE interrupt being shut down can be in one of the XIVE interrupt queues (of which there are up to 8 per CPU, one for each priority) at the point where free_irq() is called. If this happens, can return an interrupt number which has been shut down. This can lead to various symptoms: - irq_to_desc(irq) can be NULL. In this case, no end-of-interrupt function gets called, resulting in the CPU's elevated interrupt priority (numerically lowered CPPR) never gets reset. That then means that the CPU stops processing interrupts, causing device timeouts and other errors in various device drivers. - The irq descriptor or related data structures can be in the process of being freed as the interrupt code is using them. This typically leads to crashes due to bad pointer dereferences. This race is basically what commit 62e0468650c3 ("genirq: Add optional hardware synchronization for shutdown", 2019-06-28) is intended to fix, given a get_irqchip_state() method for the interrupt controller being used. It works by polling the interrupt controller when an interrupt is being freed until the controller says it is not pending. With XIVE, the PQ bits of the interrupt source indicate the state of the interrupt source, and in particular the P bit goes from 0 to 1 at the point where the hardware writes an entry into the interrupt queue that this interrupt is directed towards. Normally, the code will then process the interrupt and do an end-of-interrupt (EOI) operation which will reset PQ to 00 (assuming another interrupt hasn't been generated in the meantime). However, there are situations where the code resets P even though a queue entry exists (for example, by setting PQ to 01, which disables the interrupt source), and also situations where the code leaves P at 1 after removing the queue entry (for example, this is done for escalation interrupts so they cannot fire again until they are explicitly re-enabled). The code already has a 'saved_p' flag for the interrupt source which indicates that a queue entry exists, although it isn't maintained consistently. This patch adds a 'stale_p' flag to indicate that P has been left at 1 after processing a queue entry, and adds code to set and clear saved_p and stale_p as necessary to maintain a consistent indication of whether a queue entry may or may not exist. With this, we can implement xive_get_irqchip_state() by looking at stale_p, saved_p and the ESB PQ bits for the interrupt. There is some additional code to handle escalation interrupts properly; because they are enabled and disabled in KVM assembly code, which does not have access to the xive_irq_data struct for the escalation interrupt. Hence, stale_p may be incorrect when the escalation interrupt is freed in kvmppc_xive_{,native_}cleanup_vcpu(). Fortunately, we can fix it up by looking at vcpu->arch.xive_esc_on, with some careful attention to barriers in order to ensure the correct result if xive_esc_irq() races with kvmppc_xive_cleanup_vcpu(). Finally, this adds code to make noise on the console (pr_crit and WARN_ON(1)) if we find an interrupt queue entry for an interrupt which does not have a descriptor. While this won't catch the race reliably, if it does get triggered it will be an indication that the race is occurring and needs to be debugged. Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190813100648.GE9567@blackberry Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-11KVM: PPC: Book3S: Enable XIVE native capability only if OPAL has required ↵Paul Mackerras
functions commit 2ad7a27deaf6d78545d97ab80874584f6990360e upstream. There are some POWER9 machines where the OPAL firmware does not support the OPAL_XIVE_GET_QUEUE_STATE and OPAL_XIVE_SET_QUEUE_STATE calls. The impact of this is that a guest using XIVE natively will not be able to be migrated successfully. On the source side, the get_attr operation on the KVM native device for the KVM_DEV_XIVE_GRP_EQ_CONFIG attribute will fail; on the destination side, the set_attr operation for the same attribute will fail. This adds tests for the existence of the OPAL get/set queue state functions, and if they are not supported, the XIVE-native KVM device is not created and the KVM_CAP_PPC_IRQ_XIVE capability returns false. Userspace can then either provide a software emulation of XIVE, or else tell the guest that it does not have a XIVE controller available to it. Cc: stable@vger.kernel.org # v5.2+ Fixes: 3fab2d10588e ("KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode") Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07powerpc/futex: Fix warning: 'oldval' may be used uninitialized in this functionChristophe Leroy
[ Upstream commit 38a0d0cdb46d3f91534e5b9839ec2d67be14c59d ] We see warnings such as: kernel/futex.c: In function 'do_futex': kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized] return oldval == cmparg; ^ kernel/futex.c:1651:6: note: 'oldval' was declared here int oldval, ret; ^ This is because arch_futex_atomic_op_inuser() only sets *oval if ret is 0 and GCC doesn't see that it will only use it when ret is 0. Anyway, the non-zero ret path is an error path that won't suffer from setting *oval, and as *oval is a local var in futex_atomic_op_inuser() it will have no impact. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: reword change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/86b72f0c134367b214910b27b9a6dd3321af93bb.1565774657.git.christophe.leroy@c-s.fr Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-01powerpc/xive: Fix bogus error code returned by OPALGreg Kurz
commit 6ccb4ac2bf8a35c694ead92f8ac5530a16e8f2c8 upstream. There's a bug in skiboot that causes the OPAL_XIVE_ALLOCATE_IRQ call to return the 32-bit value 0xffffffff when OPAL has run out of IRQs. Unfortunatelty, OPAL return values are signed 64-bit entities and errors are supposed to be negative. If that happens, the linux code confusingly treats 0xffffffff as a valid IRQ number and panics at some point. A fix was recently merged in skiboot: e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()") but we need a workaround anyway to support older skiboots already in the field. Internally convert 0xffffffff to OPAL_RESOURCE which is the usual error returned upon resource exhaustion. Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821713818.1985334.14123187368108582810.stgit@bahia.lan Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-31Revert "powerpc: slightly improve cache helpers"Michael Ellerman
This reverts commit 6c5875843b87c3adea2beade9d1b8b3d4523900a. It triggers a probable compiler bug on clang which leads to crashes. With GCC it allows the compiler to use a more efficient register allocation but current GCC versions never do that at any of the current call sites, so there's no benefit. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-29powerpc: Wire up clone3 syscallMichael Ellerman
Wire up the new clone3 syscall added in commit 7f192e3cd316 ("fork: add clone3"). This requires a ppc_clone3 wrapper, in order to save the non-volatile GPRs before calling into the generic syscall code. Otherwise we hit the BUG_ON in CHECK_FULL_REGS in copy_thread(). Lightly tested using Christian's test code on a Power8 LE VM. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Christian Brauner <christian@brauner.io> Link: https://lore.kernel.org/r/20190724140259.23554-1-mpe@ellerman.id.au
2019-07-24Merge tag 'powerpc-5.3-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "An assortment of non-regression fixes that have accumulated since the start of the merge window. - A fix for a user triggerable oops on machines where transactional memory is disabled, eg. Power9 bare metal, Power8 with TM disabled on the command line, or all Power7 or earlier machines. - Three fixes for handling of PMU and power saving registers when running nested KVM on Power9. - Two fixes for bugs found while stress testing the XIVE interrupt controller code, also on Power9. - A fix to allow guests to boot under Qemu/KVM on Power9 using the the Hash MMU with >= 1TB of memory. - Two fixes for bugs in the recent DMA cleanup, one of which could lead to checkstops. - And finally three fixes for the PAPR SCM nvdimm driver. Thanks to: Alexey Kardashevskiy, Andrea Arcangeli, Cédric Le Goater, Christoph Hellwig, David Gibson, Gautham R. Shenoy, Michael Neuling, Oliver O'Halloran, Satheesh Rajendran, Shawn Anastasio, Suraj Jitindar Singh, Vaibhav Jain" * tag 'powerpc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL powerpc/pseries: Update SCM hcall op-codes in hvcall.h powerpc/tm: Fix oops on sigreturn on systems without TM powerpc/dma: Fix invalid DMA mmap behavior KVM: PPC: Book3S HV: XIVE: fix rollback when kvmppc_xive_create fails powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask() powerpc: fix off by one in max_zone_pfn initialization for ZONE_DMA KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries powerpc/pmu: Set pmcregs_in_use in paca when running as LPAR KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting powerpc/mm: Limit rma_size to 1TB when running without HV mode
2019-07-22powerpc/pseries: Update SCM hcall op-codes in hvcall.hVaibhav Jain
Update the hvcalls.h to include op-codes for new hcalls introduce to manage SCM memory. Also update existing hcall definitions to reflect current papr specification for SCM. The removed hcall op-codes H_SCM_MEM_QUERY, H_SCM_BLOCK_CLEAR were transient proposals and there support was never implemented by Power-VM nor they were used anywhere in Linux kernel. Hence we don't expect anyone to be impacted by this change. Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190629160610.23402-2-vaibhav@linux.ibm.com
2019-07-16mm: introduce ARCH_HAS_PTE_DEVMAPRobin Murphy
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined with the long-out-of-date comment can lead to the impression than an architecture may just enable it (since __add_pages() now "comprehends device memory" for itself) and expect things to work. In practice, however, ZONE_DEVICE users have little chance of functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper dependency so the real situation is clearer. Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16powerpc: define syscall_get_error()Dmitry V. Levin
syscall_get_error() is required to be implemented on this architecture in addition to already implemented syscall_get_nr(), syscall_get_arguments(), syscall_get_return_value(), and syscall_get_arch() functions in order to extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request. Link: http://lkml.kernel.org/r/20190510152824.GE28558@altlinux.org Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Elvira Khabirova <lineprinter@altlinux.org> Cc: Eugene Syromyatnikov <esyr@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greentime Hu <greentime@andestech.com> Cc: Helge Deller <deller@gmx.de> [parisc] Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: James Hogan <jhogan@kernel.org> Cc: kbuild test robot <lkp@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vincent Chen <deanbo422@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-15powerpc/pmu: Set pmcregs_in_use in paca when running as LPARSuraj Jitindar Singh
The ability to run nested guests under KVM means that a guest can also act as a hypervisor for it's own nested guest. Currently ppc_set_pmu_inuse() assumes that either FW_FEATURE_LPAR is set, indicating a guest environment, and so sets the pmcregs_in_use flag in the lppaca, or that it isn't set, indicating a hypervisor environment, and so sets the pmcregs_in_use flag in the paca. The pmcregs_in_use flag in the lppaca is used to communicate this information to a hypervisor and so must be set in a guest environment. The pmcregs_in_use flag in the paca is used by KVM code to determine whether the host state of the performance monitoring unit (PMU) must be saved and restored when running a guest. Thus when a guest also acts as a hypervisor it must set this bit in both places since it needs to ensure both that the real hypervisor saves it's PMU registers when it runs (requires pmcregs_in_use flag in lppaca), and that it saves it's own PMU registers when running a nested guest (requires pmcregs_in_use flag in paca). Modify ppc_set_pmu_inuse() so that the pmcregs_in_use bit is set in both the lppaca and the paca when a guest (LPAR) is running with the capability of running it's own guests (CONFIG_KVM_BOOK3S_HV_POSSIBLE). Fixes: 95a6432ce903 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190703012022.15644-2-sjitindarsingh@gmail.com
2019-07-13Merge tag 'powerpc-5.3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Removal of the NPU DMA code, used by the out-of-tree Nvidia driver, as well as some other functions only used by drivers that haven't (yet?) made it upstream. - A fix for a bug in our handling of hardware watchpoints (eg. perf record -e mem: ...) which could lead to register corruption and kernel crashes. - Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for vmalloc when using the Radix MMU. - A large but incremental rewrite of our exception handling code to use gas macros rather than multiple levels of nested CPP macros. And the usual small fixes, cleanups and improvements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christian Lamparter, Christophe Leroy, Christophe Lombard, Christoph Hellwig, Daniel Axtens, Denis Efremov, Enrico Weigelt, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg Kurz, Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi Bangoria, Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher Boessenkool, Shaokun Zhang, Shawn Anastasio, Stewart Smith, Suraj Jitindar Singh, Thiago Jung Bauermann, YueHaibing" * tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (163 commits) powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state. powerpc/eeh: Handle hugepages in ioremap space ocxl: Update for AFU descriptor template version 1.1 powerpc/boot: pass CONFIG options in a simpler and more robust way powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h powerpc/irq: Don't WARN continuously in arch_local_irq_restore() powerpc/module64: Use symbolic instructions names. powerpc/module32: Use symbolic instructions names. powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.h powerpc/module64: Fix comment in R_PPC64_ENTRY handling powerpc/boot: Add lzo support for uImage powerpc/boot: Add lzma support for uImage powerpc/boot: don't force gzipped uImage powerpc/8xx: Add microcode patch to move SMC parameter RAM. powerpc/8xx: Use IO accessors in microcode programming. powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.c powerpc/8xx: refactor programming of microcode CPM params. powerpc/8xx: refactor printing of microcode patch name. powerpc/8xx: Refactor microcode write powerpc/8xx: refactor writing of CPM microcode arrays ...
2019-07-12Merge tag 'asm-generic-5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "The asm-generic changes for 5.3 consist of a cleanup series to remove ptrace.h from Christoph Hellwig, who explains: 'asm-generic/ptrace.h is a little weird in that it doesn't actually implement any functionality, but it provided multiple layers of macros that just implement trivial inline functions. We implement those directly in the few architectures and be off with a much simpler design.' at https://lore.kernel.org/lkml/20190624054728.30966-1-hch@lst.de/" * tag 'asm-generic-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: remove ptrace.h x86: don't use asm-generic/ptrace.h sh: don't use asm-generic/ptrace.h powerpc: don't use asm-generic/ptrace.h arm64: don't use asm-generic/ptrace.h
2019-07-12mm/nvdimm: add is_ioremap_addr and use that to check ioremap addressAneesh Kumar K.V
Architectures like powerpc use different address range to map ioremap and vmalloc range. The memunmap() check used by the nvdimm layer was wrongly using is_vmalloc_addr() to check for ioremap range which fails for ppc64. This result in ppc64 not freeing the ioremap mapping. The side effect of this is an unbind failure during module unload with papr_scm nvdimm driver Link: http://lkml.kernel.org/r/20190701134038.14165-1-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-08Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle are: - rwsem scalability improvements, phase #2, by Waiman Long, which are rather impressive: "On a 2-socket 40-core 80-thread Skylake system with 40 reader and writer locking threads, the min/mean/max locking operations done in a 5-second testing window before the patchset were: 40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810 40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255 After the patchset, they became: 40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741 40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098" There's a lot of changes to the locking implementation that makes it similar to qrwlock, including owner handoff for more fair locking. Another microbenchmark shows how across the spectrum the improvements are: "With a locking microbenchmark running on 5.1 based kernel, the total locking rates (in kops/s) on a 2-socket Skylake system with equal numbers of readers and writers (mixed) before and after this patchset were: # of Threads Before Patch After Patch ------------ ------------ ----------- 2 2,618 4,193 4 1,202 3,726 8 802 3,622 16 729 3,359 32 319 2,826 64 102 2,744" The changes are extensive and the patch-set has been through several iterations addressing various locking workloads. There might be more regressions, but unless they are pathological I believe we want to use this new implementation as the baseline going forward. - jump-label optimizations by Daniel Bristot de Oliveira: the primary motivation was to remove IPI disturbance of isolated RT-workload CPUs, which resulted in the implementation of batched jump-label updates. Beyond the improvement of the real-time characteristics kernel, in one test this patchset improved static key update overhead from 57 msecs to just 1.4 msecs - which is a nice speedup as well. - atomic64_t cross-arch type cleanups by Mark Rutland: over the last ~10 years of atomic64_t existence the various types used by the APIs only had to be self-consistent within each architecture - which means they became wildly inconsistent across architectures. Mark puts and end to this by reworking all the atomic64 implementations to use 's64' as the base type for atomic64_t, and to ensure that this type is consistently used for parameters and return values in the API, avoiding further problems in this area. - A large set of small improvements to lockdep by Yuyang Du: type cleanups, output cleanups, function return type and othr cleanups all around the place. - A set of percpu ops cleanups and fixes by Peter Zijlstra. - Misc other changes - please see the Git log for more details" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits) locking/lockdep: increase size of counters for lockdep statistics locking/atomics: Use sed(1) instead of non-standard head(1) option locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING x86/jump_label: Make tp_vec_nr static x86/percpu: Optimize raw_cpu_xchg() x86/percpu, sched/fair: Avoid local_clock() x86/percpu, x86/irq: Relax {set,get}_irq_regs() x86/percpu: Relax smp_processor_id() x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}() locking/rwsem: Guard against making count negative locking/rwsem: Adaptive disabling of reader optimistic spinning locking/rwsem: Enable time-based spinning on reader-owned rwsem locking/rwsem: Make rwsem->owner an atomic_long_t locking/rwsem: Enable readers spinning on writer locking/rwsem: Clarify usage of owner's nonspinaable bit locking/rwsem: Wake up almost all readers in wait queue locking/rwsem: More optimal RT task handling of null owner locking/rwsem: Always release wait_lock before waking up tasks locking/rwsem: Implement lock handoff to prevent lock starvation locking/rwsem: Make rwsem_spin_on_owner() return owner state ...
2019-07-08Merge tag 's390-5.3-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Improve stop_machine wait logic: replace cpu_relax_yield call in generic stop_machine function with a weak stop_machine_yield function. This is overridden on s390, which yields the current cpu to the neighbouring cpu after a couple of retries, instead of blindly giving up the cpu to the hipervisor. This significantly improves stop_machine performance on s390 in overcommitted scenarios. This includes common code changes which have been Acked by Peter Zijlstra and Thomas Gleixner. - Improve jump label transformation speed: transform jump labels without using stop_machine. - Refactoring of the vfio-ccw cp handling, simplifying the code and avoiding unneeded allocating/copying. - Various vfio-ccw fixes (ccw translation, state machine). - Add support for vfio-ap queue interrupt control in the guest. This includes s390 kvm changes which have been Acked by Christian Borntraeger. - Add protected virtualization support for virtio-ccw. - Enforce both CONFIG_SMP and CONFIG_HOTPLUG_CPU, which allows to remove some code which most likely isn't working at all, besides that s390 didn't even compile for !CONFIG_SMP. - Support for special flagged EP11 CPRBs for zcrypt. - Handle PCI devices with no support for new MIO instructions. - Avoid KASAN false positives in reworked stack unwinder. - Couple of fixes for the QDIO layer. - Convert s390 specific documentation to ReST format. - Let s390 crypto modules return -ENODEV instead of -EOPNOTSUPP if hardware is missing. This way our modules behave like most other modules and which is also what systemd's systemd-modules-load.service expects. - Replace defconfig with performance_defconfig, so there is one config file less to maintain. - Remove the SCLP call home device driver, which was never useful. - Cleanups all over the place. * tag 's390-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (83 commits) docs: s390: s390dbf: typos and formatting, update crash command docs: s390: unify and update s390dbf kdocs at debug.c docs: s390: restore important non-kdoc parts of s390dbf.rst vfio-ccw: Fix the conversion of Format-0 CCWs to Format-1 s390/pci: correctly handle MIO opt-out s390/pci: deal with devices that have no support for MIO instructions s390: ap: kvm: Enable PQAP/AQIC facility for the guest s390: ap: implement PAPQ AQIC interception in kernel vfio: ap: register IOMMU VFIO notifier s390: ap: kvm: add PQAP interception for AQIC s390/unwind: cleanup unused READ_ONCE_TASK_STACK s390/kasan: avoid false positives during stack unwind s390/qdio: don't touch the dsci in tiqdio_add_input_queues() s390/qdio: (re-)initialize tiqdio list entries s390/dasd: Fix a precision vs width bug in dasd_feature_list() s390/cio: introduce driver_override on the css bus vfio-ccw: make convert_ccw0_to_ccw1 static vfio-ccw: Remove copy_ccw_from_iova() vfio-ccw: Factor out the ccw0-to-ccw1 transition vfio-ccw: Copy CCW data outside length calculation ...
2019-07-06powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.hChristophe Leroy
PPC_HA() PPC_HI() and PPC_LO() macros are nice macros. Move them from module64.c to ppc-opcode.h in order to use them in other places. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Clean up formatting in new code, drop duplicates in ftrace.c] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/64: reuse PPC32 static inline flush_dcache_range()Christophe Leroy
This patch drops the assembly PPC64 version of flush_dcache_range() and re-uses the PPC32 static inline version. With GCC 8.1, the following code is generated: void flush_test(unsigned long start, unsigned long stop) { flush_dcache_range(start, stop); } 0000000000000130 <.flush_test>: 130: 3d 22 00 00 addis r9,r2,0 132: R_PPC64_TOC16_HA .data+0x8 134: 81 09 00 00 lwz r8,0(r9) 136: R_PPC64_TOC16_LO .data+0x8 138: 3d 22 00 00 addis r9,r2,0 13a: R_PPC64_TOC16_HA .data+0xc 13c: 80 e9 00 00 lwz r7,0(r9) 13e: R_PPC64_TOC16_LO .data+0xc 140: 7d 48 00 d0 neg r10,r8 144: 7d 43 18 38 and r3,r10,r3 148: 7c 00 04 ac hwsync 14c: 4c 00 01 2c isync 150: 39 28 ff ff addi r9,r8,-1 154: 7c 89 22 14 add r4,r9,r4 158: 7c 83 20 50 subf r4,r3,r4 15c: 7c 89 3c 37 srd. r9,r4,r7 160: 41 82 00 1c beq 17c <.flush_test+0x4c> 164: 7d 29 03 a6 mtctr r9 168: 60 00 00 00 nop 16c: 60 00 00 00 nop 170: 7c 00 18 ac dcbf 0,r3 174: 7c 63 42 14 add r3,r3,r8 178: 42 00 ff f8 bdnz 170 <.flush_test+0x40> 17c: 7c 00 04 ac hwsync 180: 4c 00 01 2c isync 184: 4e 80 00 20 blr 188: 60 00 00 00 nop 18c: 60 00 00 00 nop Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/32: define helpers to get L1 cache sizes.Christophe Leroy
This patch defines C helpers to retrieve the size of cache blocks and uses them in the cacheflush functions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/64: flush_inval_dcache_range() becomes flush_dcache_range()Christophe Leroy
On most arches having function flush_dcache_range(), including PPC32, this function does a writeback and invalidation of the cache bloc. On PPC64, flush_dcache_range() only does a writeback while flush_inval_dcache_range() does the invalidation in addition. In addition it looks like within arch/powerpc/, there are no PPC64 platforms using flush_dcache_range() This patch drops the existing 64 bits version of flush_dcache_range() and renames flush_inval_dcache_range() into flush_dcache_range(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc: slightly improve cache helpersChristophe Leroy
Cache instructions (dcbz, dcbi, dcbf and dcbst) take two registers that are summed to obtain the target address. Using 'Z' constraint and '%y0' argument gives GCC the opportunity to use both registers instead of only one with the second being forced to 0. Suggested-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/book3s: Use config independent helpers for page table walkAneesh Kumar K.V
Even when we have HugeTLB and THP disabled, kernel linear map can still be mapped with hugepages. This is only an issue with radix translation because hash MMU doesn't map kernel linear range in linux page table and other kernel map areas are not mapped using hugepage. Add config independent helpers and put WARN_ON() when we don't expect things to be mapped via hugepages. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05powerpc/mm: Remove unused variable declarationAneesh Kumar K.V
Since commit 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range") __kernel_virt_size is not used anymore. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-04powerpc/pseries: Protect against hogging the cpu while setting up the statsNaveen N. Rao
When enabling or disabling the vcpu dispatch statistics, we do a lot of work including allocating/deallocating memory across all possible cpus for the DTL buffer. In order to guard against hogging the cpu for too long, track the time we're taking and yield the processor if necessary. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-04powerpc/pseries: Provide vcpu dispatch statisticsNaveen N. Rao
For Shared Processor LPARs, the POWER Hypervisor maintains a relatively static mapping of the LPAR processors (vcpus) to physical processor chips (representing the "home" node) and tries to always dispatch vcpus on their associated physical processor chip. However, under certain scenarios, vcpus may be dispatched on a different processor chip (away from its home node). The actual physical processor number on which a certain vcpu is dispatched is available to the guest in the 'processor_id' field of each DTL entry. The guest can discover the home node of each vcpu through the H_HOME_NODE_ASSOCIATIVITY(flags=1) hcall. The guest can also discover the associativity of physical processors, as represented in the DTL entry, through the H_HOME_NODE_ASSOCIATIVITY(flags=2) hcall. These can then be compared to determine if the vcpu was dispatched on its home node or not. If the vcpu was not dispatched on the home node, it is possible to determine if the vcpu was dispatched in a different chip, socket or drawer. Introduce a procfs file /proc/powerpc/vcpudispatch_stats that can be used to obtain these statistics. Writing '1' to this file enables collecting the statistics, while writing '0' disables the statistics. The statistics themselves are available by reading the procfs file. By default, the DTLB log for each vcpu is processed 50 times a second so as not to miss any entries. This processing frequency can be changed through /proc/powerpc/vcpudispatch_stats_freq. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-04powerpc/pseries: Move mm/book3s64/vphn.c under platforms/pseries/Naveen N. Rao
hcall_vphn() is specific to pseries and will be used in a subsequent patch. So, move it to a more appropriate place under arch/powerpc/platforms/pseries. Also merge vphn.h into lppaca.h and update vphn selftest to use the new files. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-04powerpc/pseries: Introduce rwlock to gatekeep DTLB usageNaveen N. Rao
Since we would be introducing a new user of the DTL buffer in a subsequent patch, we need a way to gatekeep use of the DTL buffer. The current debugfs interface for DTL allows registering and opening cpu-specific DTL buffers. Cpu specific files are exposed under debugfs 'powerpc/dtl/' node, and changing 'dtl_event_mask' in the same directory enables controlling the event mask used when registering DTL buffer for a particular cpu. Subsequently, we will be introducing a user of the DTL buffers that registers access to the DTL buffers across all cpus with the same event mask. To ensure these two users do not step on each other, we introduce a rwlock to gatekeep DTL buffer access. This fits the requirement of the current debugfs interface wanting to allow multiple independent cpu-specific users (read lock), and the subsequent user wanting exclusive access (write lock). Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-04powerpc/pseries: Factor out DTL buffer allocation and registration routinesNaveen N. Rao
Introduce new helpers for DTL buffer allocation and registration and have the existing code use those. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Don't split error messages across lines, for grepability] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-04powerpc/pseries: Use macros for referring to the DTL enable maskNaveen N. Rao
Introduce macros to encode the DTL enable mask fields and use those instead of hardcoding numbers. Acked-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc: Use the correct style for SPDX License IdentifierNishad Kamdar
This patch corrects the SPDX License Identifier style in the powerpc Hardware Architecture related files. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc: Add barrier_nospec to raw_copy_in_user()Suraj Jitindar Singh
Commit ddf35cf3764b ("powerpc: Use barrier_nospec in copy_from_user()") Added barrier_nospec before loading from user-controlled pointers. The intention was to order the load from the potentially user-controlled pointer vs a previous branch based on an access_ok() check or similar. In order to achieve the same result, add a barrier_nospec to the raw_copy_in_user() function before loading from such a user-controlled pointer. Fixes: ddf35cf3764b ("powerpc: Use barrier_nospec in copy_from_user()") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc: remove device_to_mask()Christoph Hellwig
Use the dma_get_mask() helper from dma-mapping.h instead, as they are functionally identical. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc: Fix compile issue with force DAWRMichael Neuling
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail at linking with: arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable' This was caused by commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option"). This moves a bunch of code around to fix this. It moves a lot of the DAWR code in a new file and creates a new CONFIG_PPC_DAWR to enable compiling it. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Neuling <mikey@neuling.org> [mpe: Minor formatting in set_dawr()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc/64s/radix: keep kernel ERAT over local process/guest invalidatesNicholas Piggin
ISA v3.0 radix modes provide SLBIA variants which can invalidate ERAT for effPID!=0 or for effLPID!=0, which allows user and guest invalidations to retain kernel/host ERAT entries. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-03powerpc/64s: Rename PPC_INVALIDATE_ERAT to PPC_ISA_3_0_INVALIDATE_ERATNicholas Piggin
This makes it clear to the caller that it can only be used on POWER9 and later CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use "ISA_3_0" rather than "ARCH_300"] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: remove bad stack branchNicholas Piggin
The bad stack test in interrupt handlers has a few problems. For performance it is taken in the common case, which is a fetch bubble and a waste of i-cache. For code development and maintainence, it requires yet another stack frame setup routine, and that constrains all exception handlers to follow the same register save pattern which inhibits future optimisation. Remove the test/branch and replace it with a trap. Teach the program check handler to use the emergency stack for this case. This does not result in quite so nice a message, however the SRR0 and SRR1 of the crashed interrupt can be seen in r11 and r12, as is the original r1 (adjusted by INT_FRAME_SIZE). These are the most important parts to debugging the issue. The original r9-12 and cr0 is lost, which is the main downside. kernel BUG at linux/arch/powerpc/kernel/exceptions-64s.S:847! Oops: Exception in kernel mode, sig: 5 [#1] BE SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted NIP: c000000000009108 LR: c000000000cadbcc CTR: c0000000000090f0 REGS: c0000000fffcbd70 TRAP: 0700 Not tainted MSR: 9000000000021032 <SF,HV,ME,IR,DR,RI> CR: 28222448 XER: 20040000 CFAR: c000000000009100 IRQMASK: 0 GPR00: 000000000000003d fffffffffffffd00 c0000000018cfb00 c0000000f02b3166 GPR04: fffffffffffffffd 0000000000000007 fffffffffffffffb 0000000000000030 GPR08: 0000000000000037 0000000028222448 0000000000000000 c000000000ca8de0 GPR12: 9000000002009032 c000000001ae0000 c000000000010a00 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: c0000000f00322c0 c000000000f85200 0000000000000004 ffffffffffffffff GPR24: fffffffffffffffe 0000000000000000 0000000000000000 000000000000000a GPR28: 0000000000000000 0000000000000000 c0000000f02b391c c0000000f02b3167 NIP [c000000000009108] decrementer_common+0x18/0x160 LR [c000000000cadbcc] .vsnprintf+0x3ec/0x4f0 Call Trace: Instruction dump: 996d098a 994d098b 38610070 480246ed 48005518 60000000 38200000 718a4000 7c2a0b78 3821fd00 41c20008 e82d0970 <0981fd00> f92101a0 f9610170 f9810178 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: move paca save area offsets into exception-64s.SNicholas Piggin
No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: move head-64.h code to exception-64s.S where it is usedNicholas Piggin
No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: move exception-64s.h code to exception-64s.S where it ↵Nicholas Piggin
is used No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: move KVM related code togetherNicholas Piggin
No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: remove STD_EXCEPTION_COMMON variantsNicholas Piggin
These are only called in one place each. No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: move EXCEPTION_PROLOG_2* to a more logical placeNicholas Piggin
No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: unwind exception-64s.h macrosNicholas Piggin
Many of these macros just specify 1-4 lines which are only called a few times each at most, and often just once. Remove this indirection. No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: Move EXCEPTION_COMMON additions into callersNicholas Piggin
More cases of code insertion via macros that does not add a great deal. All the additions have to be specified in the macro arguments, so they can just as well go after the macro. No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: Move EXCEPTION_COMMON handler and return branches ↵Nicholas Piggin
into callers The aim is to reduce the amount of indirection it takes to get through the exception handler macros, particularly where it provides little code sharing. No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: Make EXCEPTION_PROLOG_0 a gas macro for consistency ↵Nicholas Piggin
with others No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: KVM handler can set the HSRR trap bitNicholas Piggin
Move the KVM trap HSRR bit into the KVM handler, which can be conditionally applied when hsrr parameter is set. No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: merge KVM handler and skip variantsNicholas Piggin
Conditionally expand the skip case if it is specified. No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>