summaryrefslogtreecommitdiff
path: root/arch/powerpc/platforms/pseries/lpar.c
AgeCommit message (Collapse)Author
2016-10-28powerpc/pseries: Fix stack corruption in htpe codeLaurent Dufour
commit 05af40e885955065aee8bb7425058eb3e1adca08 upstream. This commit fixes a stack corruption in the pseries specific code dealing with the huge pages. In __pSeries_lpar_hugepage_invalidate() the buffer used to pass arguments to the hypervisor is not large enough. This leads to a stack corruption where a previously saved register could be corrupted leading to unexpected result in the caller, like the following panic: Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA pSeries Modules linked in: virtio_balloon ip_tables x_tables autofs4 virtio_blk 8139too virtio_pci virtio_ring 8139cp virtio CPU: 11 PID: 1916 Comm: mmstress Not tainted 4.8.0 #76 task: c000000005394880 task.stack: c000000005570000 NIP: c00000000027bf6c LR: c00000000027bf64 CTR: 0000000000000000 REGS: c000000005573820 TRAP: 0300 Not tainted (4.8.0) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 84822884 XER: 20000000 CFAR: c00000000010a924 DAR: 420000000014e5e0 DSISR: 40000000 SOFTE: 1 GPR00: c00000000027bf64 c000000005573aa0 c000000000e02800 c000000004447964 GPR04: c00000000404de18 c000000004d38810 00000000042100f5 00000000f5002104 GPR08: e0000000f5002104 0000000000000001 042100f5000000e0 00000000042100f5 GPR12: 0000000000002200 c00000000fe02c00 c00000000404de18 0000000000000000 GPR16: c1ffffffffffe7ff 00003fff62000000 420000000014e5e0 00003fff63000000 GPR20: 0008000000000000 c0000000f7014800 0405e600000000e0 0000000000010000 GPR24: c000000004d38810 c000000004447c10 c00000000404de18 c000000004447964 GPR28: c000000005573b10 c000000004d38810 00003fff62000000 420000000014e5e0 NIP [c00000000027bf6c] zap_huge_pmd+0x4c/0x470 LR [c00000000027bf64] zap_huge_pmd+0x44/0x470 Call Trace: [c000000005573aa0] [c00000000027bf64] zap_huge_pmd+0x44/0x470 (unreliable) [c000000005573af0] [c00000000022bbd8] unmap_page_range+0xcf8/0xed0 [c000000005573c30] [c00000000022c2d4] unmap_vmas+0x84/0x120 [c000000005573c80] [c000000000235448] unmap_region+0xd8/0x1b0 [c000000005573d80] [c0000000002378f0] do_munmap+0x2d0/0x4c0 [c000000005573df0] [c000000000237be4] SyS_munmap+0x64/0xb0 [c000000005573e30] [c000000000009560] system_call+0x38/0x108 Instruction dump: fbe1fff8 fb81ffe0 7c7f1b78 7ca32b78 7cbd2b78 f8010010 7c9a2378 f821ffb1 7cde3378 4bfffea9 7c7b1b79 41820298 <e87f0000> 48000130 7fa5eb78 7fc4f378 Most of the time, the bug is surfacing in a caller up in the stack from __pSeries_lpar_hugepage_invalidate() which is quite confusing. This bug is pending since v3.11 but was hidden if a caller of the caller of __pSeries_lpar_hugepage_invalidate() has pushed the corruped register (r18 in this case) in the stack and is not using it until restoring it. GCC 6.2.0 seems to raise it more frequently. This commit also change the definition of the parameter buffer in pSeries_lpar_flush_hash_range() to rely on the global define PLPAR_HCALL9_BUFSIZE (no functional change here). Fixes: 1a5272866f87 ("powerpc: Optimize hugepage invalidate") Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-09powerpc, jump_label: Include linux/jump_label.h to get HAVE_JUMP_LABEL defineAnton Blanchard
Commit 1bc9e47aa8e4 ("powerpc/jump_label: Use HAVE_JUMP_LABEL") converted uses of CONFIG_JUMP_LABEL to HAVE_JUMP_LABEL in some assembly files. HAVE_JUMP_LABEL is defined in linux/jump_label.h, so we need to include this or we always get the non jump label fallback code. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: catalin.marinas@arm.com Cc: davem@davemloft.net Cc: heiko.carstens@de.ibm.com Cc: jbaron@akamai.com Cc: linux@arm.linux.org.uk Cc: linuxppc-dev@lists.ozlabs.org Cc: liuj97@gmail.com Cc: mgorman@suse.de Cc: mmarek@suse.cz Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rostedt@goodmis.org Cc: schwidefsky@de.ibm.com Cc: will.deacon@arm.com Fixes: 1bc9e47aa8e4 ("powerpc/jump_label: Use HAVE_JUMP_LABEL") Link: http://lkml.kernel.org/r/1428551492-21977-3-git-send-email-anton@samba.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-29powerpc/kdump: Ignore failure in enabling big endian exception during crashHari Bathini
In LE kernel, we currently have a hack for kexec that resets the exception endian before starting a new kernel as the kernel that is loaded could be a big endian or a little endian kernel. In kdump case, resetting exception endian fails when one or more cpus is disabled. But we can ignore the failure and still go ahead, as in most cases crashkernel will be of same endianess as primary kernel and reseting endianess is not even needed in those cases. This patch adds a new inline function to say if this is kdump path. This function is used at places where such a check is needed. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> [mpe: Rename to kdump_in_progress(), use bool, and edit comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-05powerpc/mm: don't do tlbie for updatepp request with NO HPTE faultAneesh Kumar K.V
upatepp can get called for a nohpte fault when we find from the linux page table that the translation was hashed before. In that case we are sure that there is no existing translation, hence we could avoid doing tlbie. We could possibly race with a parallel fault filling the TLB. But that should be ok because updatepp is only ever relaxing permissions. We also look at linux pte permission bits when filling hash pte permission bits. We also hold the linux pte busy bits while inserting/updating a hashpte entry, hence a paralle update of linux pte is not possible. On the other hand mprotect involves ptep_modify_prot_start which cause a hpte invalidate and not updatepp. Performance number: We use randbox_access_bench written by Anton. Kernel with THP disabled and smaller hash page table size. 86.60% random_access_b [kernel.kallsyms] [k] .native_hpte_updatepp 2.10% random_access_b random_access_bench [.] doit 1.99% random_access_b [kernel.kallsyms] [k] .do_raw_spin_lock 1.85% random_access_b [kernel.kallsyms] [k] .native_hpte_insert 1.26% random_access_b [kernel.kallsyms] [k] .native_flush_hash_range 1.18% random_access_b [kernel.kallsyms] [k] .__delay 0.69% random_access_b [kernel.kallsyms] [k] .native_hpte_remove 0.37% random_access_b [kernel.kallsyms] [k] .clear_user_page 0.34% random_access_b [kernel.kallsyms] [k] .__hash_page_64K 0.32% random_access_b [kernel.kallsyms] [k] fast_exception_return 0.30% random_access_b [kernel.kallsyms] [k] .hash_page_mm With Fix: 27.54% random_access_b random_access_bench [.] doit 22.90% random_access_b [kernel.kallsyms] [k] .native_hpte_insert 5.76% random_access_b [kernel.kallsyms] [k] .native_hpte_remove 5.20% random_access_b [kernel.kallsyms] [k] fast_exception_return 5.12% random_access_b [kernel.kallsyms] [k] .__hash_page_64K 4.80% random_access_b [kernel.kallsyms] [k] .hash_page_mm 3.31% random_access_b [kernel.kallsyms] [k] data_access_common 1.84% random_access_b [kernel.kallsyms] [k] .trace_hardirqs_on_caller Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-02powerpc/mm/thp: Use tlbiel if possibleAneesh Kumar K.V
If we know that user address space has never executed on other cpus we could use tlbiel. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-11-05Merge branch 'topic/get-cpu-var' into nextMichael Ellerman
2014-11-03powerpc: Replace __get_cpu_var usesChristoph Lameter
This still has not been merged and now powerpc is the only arch that does not have this change. Sorry about missing linuxppc-dev before. V2->V2 - Fix up to work against 3.18-rc1 __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> Signed-off-by: Christoph Lameter <cl@linux.com> [mpe: Fix build errors caused by set/or_softirq_pending(), and rework assignment in __set_breakpoint() to use memcpy().] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-30powerpc/fadump: Fix endianess issues in firmware assisted dump handlingHari Bathini
Firmware-assisted dump (fadump) kernel code is not endian safe. The below patch fixes this issue. Tested this patch with upstream kernel. Below output shows crash tool successfully opening LE fadump vmcore. # crash vmlinux vmcore GNU gdb (GDB) 7.6 This GDB was configured as "powerpc64le-unknown-linux-gnu"... KERNEL: vmlinux DUMPFILE: vmcore CPUS: 16 DATE: Wed Dec 31 19:00:00 1969 UPTIME: 00:03:28 LOAD AVERAGE: 0.46, 0.86, 0.41 TASKS: 268 NODENAME: linux-dhr2 RELEASE: 3.17.0-rc5-7-default VERSION: #6 SMP Tue Sep 30 01:06:34 EDT 2014 MACHINE: ppc64le (4116 Mhz) MEMORY: 40 GB PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details) PID: 6223 COMMAND: "bash" TASK: c0000009661b2500 [THREAD_INFO: c000000967ac0000] CPU: 2 STATE: TASK_RUNNING (PANIC) Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> [mpe: Make the comment in pSeries_lpar_hptab_clear() clearer] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25powerpc/jump_label: use HAVE_JUMP_LABEL?Zhouyi Zhou
CONFIG_JUMP_LABEL doesn't ensure HAVE_JUMP_LABEL, if it is not the case use maintainers's own mutex to guard the modification of global values. Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25powerpc: Remove stale function prototypesAnton Blanchard
There were a number of prototypes for functions that no longer exist. Remove them. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-08-13powerpc/thp: Don't recompute vsid and ssize in loop on invalidateAneesh Kumar K.V
The segment identifier and segment size will remain the same in the loop, So we can compute it outside. We also change the hugepage_invalidate interface so that we can use it the later patch CC: <stable@vger.kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11powerpc/pseries: Use jump labels for hcall tracepointsAnton Blanchard
hcall tracepoints add quite a few instructions to our hcall path: plpar_hcall: mr r2,r2 mfcr r0 stw r0,8(r1) b 164 <---- start ld r12,0(r2) std r12,32(r1) cmpdi r12,0 beq 164 <---- end ... We have an unconditional branch that gets noped out during boot and a load/compare/branch. We also store the tracepoint value to the stack for the hcall_exit path to use. By using jump labels we can simplify this to just a single nop that gets replaced with a branch when the tracepoint is enabled: plpar_hcall: mr r2,r2 mfcr r0 stw r0,8(r1) nop <---- ... If jump labels are not enabled, we fall back to the old method. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-09powerpc/mm: Use HPTE constants when updating hpte bitsAneesh Kumar K.V
Even though we have same value for linux PTE bits and hash PTE pits use the hash pte bits wen updating hash pte Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-09powerpc: Make slb_shadow a localJeremy Kerr
The only external user of slb_shadow is the pseries lpar code, and it can access through the paca array instead. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-21pseries: Add H_SET_MODE to change exception endiannessAnton Blanchard
On little endian builds call H_SET_MODE so exceptions have the correct endianness. We need to reset the endian during kexec so do that in the MMU hashtable clear callback. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-27powerpc/pseries: Add a warning in the case of cross-cpu VPA registrationMichael Ellerman
The spec says it "may be problematic" if CPU x registers the VPA of CPU y. Add a warning in case we ever do that. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-27pseries: Move plpar_wrapper.h to powerpc common include/asm location.Deepthi Dharwar
As a part of pseries_idle backend driver cleanup to make the code common to both pseries and powernv platforms, it is necessary to move the backend-driver code to drivers/cpuidle. As a pre-requisite for that, it is essential to move plpar_wrapper.h to include/asm. Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-14powerpc: Fix little endian lppaca, slb_shadow and dtl_entryAnton Blanchard
The lppaca, slb_shadow and dtl_entry hypervisor structures are big endian, so we have to byte swap them in little endian builds. LE KVM hosts will also need to be fixed but for now add an #error to remind us. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-14powerpc: Fix a number of sparse warningsAnton Blanchard
Address some of the trivial sparse warnings in arch/powerpc. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24powerpc/pseries: Fix a typo in pSeries_lpar_hpte_insert()Denis Kirjanov
Commit 801eb73f45371accc78ca9d6d22d647eeb722c11 introduced a bug while checking PTE flags. We have to drop the _PAGE_COHERENT flag when __PAGE_NO_CACHE is set and the cache update policy is not write-through (i.e. _PAGE_WRITETHRU is not set) Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> CC: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01powerpc/pseries: Inform the hypervisor we are using EBB regsMichael Ellerman
On LPAR systems we need to inform the hypervisor that we are using the EBB registers. We do this by setting a bit in the Virtual Processor Area (VPA) - formerly known as the lppaca. For now we do this always, ie. we do not dynamically enable/disable. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21powerpc: Optimize hugepage invalidateAneesh Kumar K.V
Hugepage invalidate involves invalidating multiple hpte entries. Optimize the operation using H_BULK_REMOVE on lpar platforms. On native, reduce the number of tlb flush. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21powerpc/mm: handle hugepage size correctly when invalidating hpte entriesAneesh Kumar K.V
If a hash bucket gets full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". This implies that when we are invalidating or updating pte, even if HPTE entry is not valid we should do a tlb invalidate. With hugepages, we need to pass the correct actual page size value for tlb invalidation. This change update the patch 0608d692463598c1d6e826d9dd7283381b4f246c "powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle transparent hugepages correctly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Decode the pte-lp-encoding bits correctly.Aneesh Kumar K.V
We look at both the segment base page size and actual page size and store the pte-lp-encodings in an array per base page size. We also update all relevant functions to take actual page size argument so that we can use the correct PTE LP encoding in HPTE. This should also get the basic Multiple Page Size per Segment (MPSS) support. This is needed to enable THP on ppc64. [Fixed PR KVM build --BenH] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Use signed formatting when printing errorAneesh Kumar K.V
PAPR defines these errors as negative values. So print them accordingly for easy debugging. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-08powerpc: pSeries_lpar_hpte_remove fails from Adjunct partition being ↵Michael Wolf
performed before the ANDCOND test Some versions of pHyp will perform the adjunct partition test before the ANDCOND test. The result of this is that H_RESOURCE can be returned and cause the BUG_ON condition to occur. The HPTE is not removed. So add a check for H_RESOURCE, it is ok if this HPTE is not removed as pSeries_lpar_hpte_remove is looking for an HPTE to remove and not a specific HPTE to remove. So it is ok to just move on to the next slot and try again. Cc: stable@vger.kernel.org Signed-off-by: Michael Wolf <mjw@linux.vnet.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2012-09-17powerpc/mm: Convert virtual address to vpnAneesh Kumar K.V
This patch convert different functions to take virtual page number instead of virtual address. Virtual page number is virtual address shifted right by VPN_SHIFT (12) bits. This enable us to have an address range of upto 76 bits. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05powerpc: Remove all includes of <asm/abs_addr.h>Michael Ellerman
It's empty now, apart from other includes. Fixup a few files that were getting things via this header. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-21powerpc: Remove FW_FEATURE ISERIES from arch codeStephen Rothwell
This is no longer selectable, so just remove all the dependent code. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-11powerpc: Fix RCU idle and hcall tracingAnton Blanchard
Tracepoints should not be called inside an rcu_idle_enter/rcu_idle_exit region. Since pSeries calls H_CEDE in the idle loop, we were violating this rule. commit a7b152d5342c (powerpc: Tell RCU about idle after hcall tracing) tried to work around it by delaying the rcu_idle_enter until after we called the hcall tracepoint, but there are a number of issues with it. The hcall tracepoint trampoline code is called conditionally when the tracepoint is enabled. If the tracepoint is not enabled we never call rcu_idle_enter. The idle_uses_rcu check was also done at compile time which breaks multiplatform builds. The simple fix is to avoid tracing H_CEDE and rely on other tracepoints and the hypervisor dispatch trace log to work out if we called H_CEDE. This fixes a hang during boot on pSeries. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-06Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (185 commits) powerpc: fix compile error with 85xx/p1010rdb.c powerpc: fix compile error with 85xx/p1023_rds.c powerpc/fsl: add MSI support for the Freescale hypervisor arch/powerpc/sysdev/fsl_rmu.c: introduce missing kfree powerpc/fsl: Add support for Integrated Flash Controller powerpc/fsl: update compatiable on fsl 16550 uart nodes powerpc/85xx: fix PCI and localbus properties in p1022ds.dts powerpc/85xx: re-enable ePAPR byte channel driver in corenet32_smp_defconfig powerpc/fsl: Update defconfigs to enable some standard FSL HW features powerpc: Add TBI PHY node to first MDIO bus sbc834x: put full compat string in board match check powerpc/fsl-pci: Allow 64-bit PCIe devices to DMA to any memory address powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exit offb: Fix setting of the pseudo-palette for >8bpp offb: Add palette hack for qemu "standard vga" framebuffer offb: Fix bug in calculating requested vram size powerpc/boot: Change the WARN to INFO for boot wrapper overlap message powerpc/44x: Fix build error on currituck platform powerpc/boot: Change the load address for the wrapper to fit the kernel powerpc/44x: Enable CRASH_DUMP for 440x ... Fix up a trivial conflict in arch/powerpc/include/asm/cputime.h due to the additional sparse-checking code for cputime_t.
2012-01-03powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exitLi Zhong
Unpaired calling of probe_hcall_entry and probe_hcall_exit might happen as following, which could cause incorrect preempt count. __trace_hcall_entry => trace_hcall_entry -> probe_hcall_entry => get_cpu_var => preempt_disable __trace_hcall_exit => trace_hcall_exit -> probe_hcall_exit => put_cpu_var => preempt_enable where: A => B and A -> B means A calls B, but => means A will call B through function name, and B will definitely be called. -> means A will call B through function pointer, so B might not be called if the function pointer is not set. So error happens when only one of probe_hcall_entry and probe_hcall_exit get called during a hcall. This patch tries to move the preempt count operations from probe_hcall_entry and probe_hcall_exit to its callers. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> CC: stable@kernel.org [v2.6.32+] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-11powerpc: Tell RCU about idle after hcall tracingPaul E. McKenney
The PowerPC pSeries platform (CONFIG_PPC_PSERIES=y) enables hypervisor-call tracing for CONFIG_TRACEPOINTS=y kernels. One of the hypervisor calls that is traced is the H_CEDE call in the idle loop that tells the hypervisor that this OS instance no longer needs the current CPU. However, tracing uses RCU, so this combination of kernel configuration variables needs to avoid telling RCU about the current CPU's idleness until after the H_CEDE-entry tracing completes on the one hand, and must tell RCU that the the current CPU is no longer idle before the H_CEDE-exit tracing starts. In all other cases, it suffices to inform RCU of CPU idleness upon idle-loop entry and exit. This commit makes the required adjustments. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-10-31powerpc: add export.h to files making use of EXPORT_SYMBOLPaul Gortmaker
With module.h being implicitly everywhere via device.h, the absence of explicitly including something for EXPORT_SYMBOL went unnoticed. Since we are heading to fix things up and clean module.h from the device.h file, we need to explicitly include these files now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-08-05powerpc/pseries: Cleanup VPA registration and deregistration errorsAnton Blanchard
Make the VPA, SLB shadow and DTL registration and deregistration functions print consistent messages on error. I needed the firmware error code while chasing a kexec bug but we weren't printing it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-05powerpc: pseries: Fix kexec on machines with more than 4TB of RAMAnton Blanchard
On a box with 8TB of RAM the MMU hashtable is 64GB in size. That means we have 4G PTEs. pSeries_lpar_hptab_clear was using a signed int to store the index which will overflow at 2G. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-29powerpc/pseries: Re-implement HVSI as part of hvc_vioBenjamin Herrenschmidt
On pseries machines, consoles are provided by the hypervisor using a low level get_chars/put_chars type interface. However, this is really just a transport to the service processor which implements them either as "raw" console (networked consoles, HMC, ...) or as "hvsi" serial ports. The later is a simple packet protocol on top of the raw character interface that is supposed to convey additional "serial port" style semantics. In practice however, all it does is provide a way to read the CD line and set/clear our DTR line, that's it. We currently implement the "raw" protocol as an hvc console backend (/dev/hvcN) and the "hvsi" protocol using a separate tty driver (/dev/hvsi0). However this is quite impractical. The arbitrary difference between the two type of devices has been a major source of user (and distro) confusion. Additionally, there's an additional mini -hvsi implementation in the pseries platform code for our low level debug console and early boot kernel messages, which means code duplication, though that low level variant is impractical as it's incapable of doing the initial protocol negociation to establish the link to the FSP. This essentially replaces the dedicated hvsi driver and the platform udbg code completely by extending the existing hvc_vio backend used in "raw" mode so that: - It now supports HVSI as well - We add support for hvc backend providing tiocm{get,set} - It also provides a udbg interface for early debug and boot console This is overall less code, though this will only be obvious once we remove the old "hvsi" driver, which is still available for now. When the old driver is enabled, the new code still kicks in for the low level udbg console, replacing the old mini implementation in the platform code, it just doesn't provide the higher level "hvc" interface. In addition to producing generally simler code, this has several benefits over our current situation: - The user/distro only has to deal with /dev/hvcN for the hypervisor console, avoiding all sort of confusion that has plagued us in the past - The tty, kernel and low level debug console all use the same code base which supports the full protocol establishment process, thus the console is now available much earlier than it used to be with the old HVSI driver. The kernel console works much earlier and udbg is available much earlier too. Hackers can enable a hard coded very-early debug console as well that works with HVSI (previously that was only supported for the "raw" mode). I've tried to keep the same semantics as hvsi relative to how I react to things like CD changes, with some subtle differences though: - I clear DTR on close if HUPCL is set - Current hvsi triggers a hangup if it detects a up->down transition on CD (you can still open a console with CD down). My new implementation triggers a hangup if the link to the FSP is severed, and severs it upon detecting a up->down transition on CD. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-29powerpc/udbg: Register udbg console genericallyBenjamin Herrenschmidt
When CONFIG_PPC_EARLY_DEBUG is set, call register_early_udbg_console() early from generic code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04powerpc/pseries: Add page coalescing supportBrian King
Adds support for page coalescing, which is a feature on IBM Power servers which allows for coalescing identical pages between logical partitions. Hint text pages as coalesce candidates, since they are the most likely pages to be able to be coalesced between partitions. This patch also exports some page coalescing statistics available from firmware via lparcfg. [BenH: Moved a couple of things around to fix compile problems] Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-27powerpc: Free up some CPU feature bits by moving out MMU-related featuresMatt Evans
Some of the 64bit PPC CPU features are MMU-related, so this patch moves them to MMU_FTR_ bits. All cpu_has_feature()-style tests are moved to mmu_has_feature(), and seven feature bits are freed as a result. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-02-07powerpc: Fix hcall tracepoint recursionAnton Blanchard
Spinlocks on shared processor partitions use H_YIELD to notify the hypervisor we are waiting on another virtual CPU. Unfortunately this means the hcall tracepoints can recurse. The patch below adds a percpu depth and checks it on both the entry and exit hcall tracepoints. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: stable@kernel.org
2010-11-29powerpc/pseries: Add kernel parameter to disable batched hcallsWill Schmidt
This introduces a pair of kernel parameters that can be used to disable the MULTITCE and BULK_REMOVE h-calls. By default, those hcalls are enabled, active, and good for throughput and performance. The ability to disable them will be useful for some of the PREEMPT_RT related investigation and work occurring on Power. Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com> cc: Olof Johansson <olof@lixom.net> cc: Anton Blanchard <anton@samba.org> cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-09-02powerpc: Account time using timebase rather than PURRPaul Mackerras
Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the PURR register for measuring the user and system time used by processes, as well as other related times such as hardirq and softirq times. This turns out to be quite confusing for users because it means that a program will often be measured as taking less time when run on a multi-threaded processor (SMT2 or SMT4 mode) than it does when run on a single-threaded processor (ST mode), even though the program takes longer to finish. The discrepancy is accounted for as stolen time, which is also confusing, particularly when there are no other partitions running. This changes the accounting to use the timebase instead, meaning that the reported user and system times are the actual number of real-time seconds that the program was executing on the processor thread, regardless of which SMT mode the processor is in. Thus a program will generally show greater user and system times when run on a multi-threaded processor than on a single-threaded processor. On pSeries systems on POWER5 or later processors, we measure the stolen time (time when this partition wasn't running) using the hypervisor dispatch trace log. We check for new entries in the log on every entry from user mode and on every transition from kernel process context to soft or hard IRQ context (i.e. when account_system_vtime() gets called). So that we can correctly distinguish time stolen from user time and time stolen from system time, without having to check the log on every exit to user mode, we store separate timestamps for exit to user mode and entry from user mode. On systems that have a SPURR (POWER6 and POWER7), we read the SPURR in account_system_vtime() (as before), and then apportion the SPURR ticks since the last time we read it between scaled user time and scaled system time according to the relative proportions of user time and system time over the same interval. This avoids having to read the SPURR on every kernel entry and exit. On systems that have PURR but not SPURR (i.e., POWER5), we do the same using the PURR rather than the SPURR. This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl for now since it conflicts with the use of the dispatch trace log by the time accounting code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-09-02powerpc: Abstract indexing of lppaca structsPaul Mackerras
Currently we have the lppaca structs as a simple array of NR_CPUS entries, taking up space in the data section of the kernel image. In future we would like to allocate them dynamically, so this abstracts out the accesses to the array, making it easier to change how we locate the lppaca for a given cpu in future. Specifically, lppaca[cpu] changes to lppaca_of(cpu). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-05-21powerpc/kexec: Speedup kexec hash PTE tear downMichael Neuling
Currently for kexec the PTE tear down on 1TB segment systems normally requires 3 hcalls for each PTE removal. On a machine with 32GB of memory it can take around a minute to remove all the PTEs. This optimises the path so that we only remove PTEs that are valid. It also uses the read 4 PTEs at once HCALL. For the common case where a PTEs is invalid in a 1TB segment, this turns the 3 HCALLs per PTE down to 1 HCALL per 4 PTEs. This gives an > 10x speedup in kexec times on PHYP, taking a 32GB machine from around 1 minute down to a few seconds. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-28powerpc: tracing: Give hypervisor call tracepoints access to argumentsAnton Blanchard
While most users of the hcall tracepoints will only want the opcode and return code, some will want all the arguments. To avoid the complexity of using varargs we pass a pointer to the register save area, which contains all the arguments. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-10-28powerpc: tracing: Add hypervisor call tracepointsAnton Blanchard
Add hcall_entry and hcall_exit tracepoints. This replaces the inline assembly HCALL_STATS code and converts it to use the new tracepoints. To keep the disabled case as quick as possible, we embed a status word in the TOC so we can get at it with a single load. By doing so we keep the overhead at a minimum. Time taken for a null hcall: No tracepoint code: 135.79 cycles Disabled tracepoints: 137.95 cycles For reference, before this patch enabling HCALL_STATS resulted in a null hcall of 201.44 cycles! Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-07-08powerpc/pseries: Use pr_devel() in pseries LPAR HPTE routinesMichael Ellerman
pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. In particular, pSeries_lpar_hpte_insert() goes from 185 instructions to 77 instructions as a result of this patch. Luckily that code isn't called very often ... With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 7284 1552 296 9132 23ac platforms/pseries/lpar.o size after: text data bss dec hex filename 5806 1096 296 7198 1c1e platforms/pseries/lpar.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-21powerpc/pseries: CMO unused page hintingRobert Jennings
Adds support for the "unused" page hint which can be used in shared memory partitions to flag pages not in use, which will then be stolen before active pages by the hypervisor when memory needs to be moved to LPARs in need of additional memory. Failure to mark pages as 'unused' makes the LPAR slower to give up unused memory to other partitions. This adds the kernel parameter 'cmo_free_hint' to disable this functionality. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-15powerpc: Remove unnecessary condition when sanity-checking WIMG bitsDave Kleikamp
It is okay for both _PAGE_GUARDED and _PAGE_COHERENT (G and M) to be set in the same pte. In fact, even if that were not the case, there doesn't seem to be any place where G is set without also setting I (_PAGE_NO_CACHE), so the test for I is sufficient as a condition to clear _PAGE_COHERENT when filling the hash table. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>