summaryrefslogtreecommitdiff
path: root/arch/powerpc/kvm/book3s_hv_rmhandlers.S
AgeCommit message (Collapse)Author
2014-09-22KVM: PPC: Book3S HV: Add register name when loading tocMichael Neuling
Add 'r' to register name r2 in kvmppc_hv_enter. Also update comment at the top of kvmppc_hv_enter to indicate that R2/TOC is non-volatile. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-08-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull second round of KVM changes from Paolo Bonzini: "Here are the PPC and ARM changes for KVM, which I separated because they had small conflicts (respectively within KVM documentation, and with 3.16-rc changes). Since they were all within the subsystem, I took care of them. Stephen Rothwell reported some snags in PPC builds, but they are all fixed now; the latest linux-next report was clean. New features for ARM include: - KVM VGIC v2 emulation on GICv3 hardware - Big-Endian support for arm/arm64 (guest and host) - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list) And for PPC: - Book3S: Good number of LE host fixes, enable HV on LE - Book3S HV: Add in-guest debug support This release drops support for KVM on the PPC440. As a result, the PPC merge removes more lines than it adds. :) I also included an x86 change, since Davidlohr tied it to an independent bug report and the reporter quickly provided a Tested-by; there was no reason to wait for -rc2" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (122 commits) KVM: Move more code under CONFIG_HAVE_KVM_IRQFD KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use KVM: nVMX: Fix nested vmexit ack intr before load vmcs01 KVM: PPC: Enable IRQFD support for the XICS interrupt controller KVM: Give IRQFD its own separate enabling Kconfig option KVM: Move irq notifier implementation into eventfd.c KVM: Move all accesses to kvm::irq_routing into irqchip.c KVM: irqchip: Provide and use accessors for irq routing table KVM: Don't keep reference to irq routing table in irqfd struct KVM: PPC: drop duplicate tracepoint arm64: KVM: fix 64bit CP15 VM access for 32bit guests KVM: arm64: GICv3: mandate page-aligned GICV region arm64: KVM: GICv3: move system register access to msr_s/mrs_s KVM: PPC: PR: Handle FSCR feature deselects KVM: PPC: HV: Remove generic instruction emulation KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr KVM: PPC: Remove DCR handling KVM: PPC: Expose helper functions for data/inst faults KVM: PPC: Separate loadstore emulation from priv emulation KVM: PPC: Handle magic page in kvmppc_ld/st ...
2014-08-07Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc new goodies for 3.17. The short story: The biggest bit is Michael removing all of pre-POWER4 processor support from the 64-bit kernel. POWER3 and rs64. This gets rid of a ton of old cruft that has been bitrotting in a long while. It was broken for quite a few versions already and nobody noticed. Nobody uses those machines anymore. While at it, he cleaned up a bunch of old dusty cabinets, getting rid of a skeletton or two. Then, we have some base VFIO support for KVM, which allows assigning of PCI devices to KVM guests, support for large 64-bit BARs on "powernv" platforms, support for HMI (Hardware Management Interrupts) on those same platforms, some sparse-vmemmap improvements (for memory hotplug), There is the usual batch of Freescale embedded updates (summary in the merge commit) and fixes here or there, I think that's it for the highlights" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (102 commits) powerpc/eeh: Export eeh_iommu_group_to_pe() powerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API powerpc: Reduce scariness of interrupt frames in stack traces powerpc: start loop at section start of start in vmemmap_populated() powerpc: implement vmemmap_free() powerpc: implement vmemmap_remove_mapping() for BOOK3S powerpc: implement vmemmap_list_free() powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE powerpc/book3s: Fix endianess issue for HMI handling on napping cpus. powerpc/book3s: handle HMIs for cpus in nap mode. powerpc/powernv: Invoke opal call to handle hmi. powerpc/book3s: Add basic infrastructure to handle HMI in Linux. powerpc/iommu: Fix comments with it_page_shift powerpc/powernv: Handle compound PE in config accessors powerpc/powernv: Handle compound PE for EEH powerpc/powernv: Handle compound PE powerpc/powernv: Split ioda_eeh_get_state() powerpc/powernv: Allow to freeze PE powerpc/powernv: Enable M64 aperatus for PHB3 powerpc/eeh: Aux PE data for error log ...
2014-08-05powerpc/book3s: Add basic infrastructure to handle HMI in Linux.Mahesh Salgaonkar
Handle Hypervisor Maintenance Interrupt (HMI) in Linux. This patch implements basic infrastructure to handle HMI in Linux host. The design is to invoke opal handle hmi in real mode for recovery and set irq_pending when we hit HMI. During check_irq_replay pull opal hmi event and print hmi info on console. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-28KVM: PPC: Book3S HV: Fix ABIv2 on LEAlexander Graf
For code that doesn't live in modules we can just branch to the real function names, giving us compatibility with ABIv1 and ABIv2. Do this for the compiled-in code of HV KVM. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28KVM: PPC: Book3S HV: Access XICS in BEAlexander Graf
On the exit path from the guest we check what type of interrupt we received if we received one. This means we're doing hardware access to the XICS interrupt controller. However, when running on a little endian system, this access is byte reversed. So let's make sure to swizzle the bytes back again and virtually make XICS accesses big endian. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BEAlexander Graf
Some data structures are always stored in big endian. Among those are the LPPACA fields as well as the shadow slb. These structures might be shared with a hypervisor. So whenever we access those fields, make sure we do so in big endian byte order. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabledPaul Mackerras
This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL capability is used to enable or disable in-kernel handling of an hcall, that the hcall is actually implemented by the kernel. If not an EINVAL error is returned. This also checks the default-enabled list of hcalls and prints a warning if any hcall there is not actually implemented. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handlingPaul Mackerras
This provides a way for userspace controls which sPAPR hcalls get handled in the kernel. Each hcall can be individually enabled or disabled for in-kernel handling, except for H_RTAS. The exception for H_RTAS is because userspace can already control whether individual RTAS functions are handled in-kernel or not via the KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for H_RTAS is out of the normal sequence of hcall numbers. Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM. The args field of the struct kvm_enable_cap specifies the hcall number in args[0] and the enable/disable flag in args[1]; 0 means disable in-kernel handling (so that the hcall will always cause an exit to userspace) and 1 means enable. Enabling or disabling in-kernel handling of an hcall is effective across the whole VM. The ability for KVM_ENABLE_CAP to be used on a VM file descriptor on PowerPC is new, added by this commit. The KVM_CAP_ENABLE_CAP_VM capability advertises that this ability exists. When a VM is created, an initial set of hcalls are enabled for in-kernel handling. The set that is enabled is the set that have an in-kernel implementation at this point. Any new hcall implementations from this point onwards should not be added to the default set without a good reason. No distinction is made between real-mode and virtual-mode hcall implementations; the one setting controls them both. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()Anton Blanchard
Both kvmppc_hv_entry_trampoline and kvmppc_entry_trampoline are assembly functions that are exported to modules and also require a valid r2. As such we need to use _GLOBAL_TOC so we provide a global entry point that establishes the TOC (r2). Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-28KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issueAnton Blanchard
To establish addressability quickly, ABIv2 requires the target address of the function being called to be in r12. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-08Merge tag 'signed-for-3.16' of git://github.com/agraf/linux-2.6 into kvm-masterPaolo Bonzini
Patch queue for 3.16 - 2014-07-08 A few bug fixes to make 3.16 work well with KVM on PowerPC: - Fix ppc32 module builds - Fix Little Endian hosts - Fix Book3S HV HPTE lookup with huge pages in guest - Fix BookE lock leak
2014-07-07KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()Anton Blanchard
Both kvmppc_hv_entry_trampoline and kvmppc_entry_trampoline are assembly functions that are exported to modules and also require a valid r2. As such we need to use _GLOBAL_TOC so we provide a global entry point that establishes the TOC (r2). Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-11powerpc/book3s: Fix guest MC delivery mechanism to avoid soft lockups in guest.Mahesh Salgaonkar
Currently we forward MCEs to guest which have been recovered by guest. And for unhandled errors we do not deliver the MCE to guest. It looks like with no support of FWNMI in qemu, guest just panics whenever we deliver the recovered MCEs to guest. Also, the existig code used to return to host for unhandled errors which was casuing guest to hang with soft lockups inside guest and makes it difficult to recover guest instance. This patch now forwards all fatal MCEs to guest causing guest to crash/panic. And, for recovered errors we just go back to normal functioning of guest instead of returning to host. This fixes soft lockup issues in guest. This patch also fixes an issue where guest MCE events were not logged to host console. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-10Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "Here is the bulk of the powerpc changes for this merge window. It got a bit delayed in part because I wasn't paying attention, and in part because I discovered I had a core PCI change without a PCI maintainer ack in it. Bjorn eventually agreed it was ok to merge it though we'll probably improve it later and I didn't want to rebase to add his ack. There is going to be a bit more next week, essentially fixes that I still want to sort through and test. The biggest item this time is the support to build the ppc64 LE kernel with our new v2 ABI. We previously supported v2 userspace but the kernel itself was a tougher nut to crack. This is now sorted mostly thanks to Anton and Rusty. We also have a fairly big series from Cedric that add support for 64-bit LE zImage boot wrapper. This was made harder by the fact that traditionally our zImage wrapper was always 32-bit, but our new LE toolchains don't really support 32-bit anymore (it's somewhat there but not really "supported") so we didn't want to rely on it. This meant more churn that just endian fixes. This brings some more LE bits as well, such as the ability to run in LE mode without a hypervisor (ie. under OPAL firmware) by doing the right OPAL call to reinitialize the CPU to take HV interrupts in the right mode and the usual pile of endian fixes. There's another series from Gavin adding EEH improvements (one day we *will* have a release with less than 20 EEH patches, I promise!). Another highlight is the support for the "Split core" functionality on P8 by Michael. This allows a P8 core to be split into "sub cores" of 4 threads which allows the subcores to run different guests under KVM (the HW still doesn't support a partition per thread). And then the usual misc bits and fixes ..." [ Further delayed by gmail deciding that BenH is a dirty spammer. Google knows. ] * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (155 commits) powerpc/powernv: Add missing include to LPC code selftests/powerpc: Test the THP bug we fixed in the previous commit powerpc/mm: Check paca psize is up to date for huge mappings powerpc/powernv: Pass buffer size to OPAL validate flash call powerpc/pseries: hcall functions are exported to modules, need _GLOBAL_TOC() powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC() powerpc/powernv: Set memory_block_size_bytes to 256MB powerpc: Allow ppc_md platform hook to override memory_block_size_bytes powerpc/powernv: Fix endian issues in memory error handling code powerpc/eeh: Skip eeh sysfs when eeh is disabled powerpc: 64bit sendfile is capped at 2GB powerpc/powernv: Provide debugfs access to the LPC bus via OPAL powerpc/serial: Use saner flags when creating legacy ports powerpc: Add cpu family documentation powerpc/xmon: Fix up xmon format strings powerpc/powernv: Add calls to support little endian host powerpc: Document sysfs DSCR interface powerpc: Fix regression of per-CPU DSCR setting powerpc: Split __SYSFS_SPRSETUP macro arch: powerpc/fadump: Cleaning up inconsistent NULL checks ...
2014-06-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into nextLinus Torvalds
Pull KVM updates from Paolo Bonzini: "At over 200 commits, covering almost all supported architectures, this was a pretty active cycle for KVM. Changes include: - a lot of s390 changes: optimizations, support for migration, GDB support and more - ARM changes are pretty small: support for the PSCI 0.2 hypercall interface on both the guest and the host (the latter acked by Catalin) - initial POWER8 and little-endian host support - support for running u-boot on embedded POWER targets - pretty large changes to MIPS too, completing the userspace interface and improving the handling of virtualized timer hardware - for x86, a larger set of changes is scheduled for 3.17. Still, we have a few emulator bugfixes and support for running nested fully-virtualized Xen guests (para-virtualized Xen guests have always worked). And some optimizations too. The only missing architecture here is ia64. It's not a coincidence that support for KVM on ia64 is scheduled for removal in 3.17" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits) KVM: add missing cleanup_srcu_struct KVM: PPC: Book3S PR: Rework SLB switching code KVM: PPC: Book3S PR: Use SLB entry 0 KVM: PPC: Book3S HV: Fix machine check delivery to guest KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs KVM: PPC: Book3S HV: Make sure we don't miss dirty pages KVM: PPC: Book3S HV: Fix dirty map for hugepages KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates() KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number KVM: PPC: Book3S: Add ONE_REG register names that were missed KVM: PPC: Add CAP to indicate hcall fixes KVM: PPC: MPIC: Reset IRQ source private members KVM: PPC: Graciously fail broken LE hypercalls PPC: ePAPR: Fix hypercall on LE guest KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler KVM: PPC: BOOK3S: Always use the saved DAR value PPC: KVM: Make NX bit available with magic page KVM: PPC: Disable NX for old magic page using guests KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest ...
2014-05-30KVM: PPC: Book3S HV: Fix machine check delivery to guestPaul Mackerras
The code that delivered a machine check to the guest after handling it in real mode failed to load up r11 before calling kvmppc_msr_interrupt, which needs the old MSR value in r11 so it can see the transactional state there. This adds the missing load. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugsPaul Mackerras
This adds workarounds for two hardware bugs in the POWER8 performance monitor unit (PMU), both related to interrupt generation. The effect of these bugs is that PMU interrupts can get lost, leading to tools such as perf reporting fewer counts and samples than they should. The first bug relates to the PMAO (perf. mon. alert occurred) bit in MMCR0; setting it should cause an interrupt, but doesn't. The other bug relates to the PMAE (perf. mon. alert enable) bit in MMCR0. Setting PMAE when a counter is negative and counter negative conditions are enabled to cause alerts should cause an alert, but doesn't. The workaround for the first bug is to create conditions where a counter will overflow, whenever we are about to restore a MMCR0 value that has PMAO set (and PMAO_SYNC clear). The workaround for the second bug is to freeze all counters using MMCR2 before reading MMCR0. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-28powerpc: Fix regression of per-CPU DSCR settingSam bobroff
Since commit "efcac65 powerpc: Per process DSCR + some fixes (try#4)" it is no longer possible to set the DSCR on a per-CPU basis. The old behaviour was to minipulate the DSCR SPR directly but this is no longer sufficient: the value is quickly overwritten by context switching. This patch stores the per-CPU DSCR value in a kernel variable rather than directly in the SPR and it is used whenever a process has not set the DSCR itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged. Writes to the old global default (/sys/devices/system/cpu/dscr_default) now set all of the per-CPU values and reads return the last written value. The new per-CPU default is added to the paca_struct and is used everywhere outside of sysfs.c instead of the old global default. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-05-13Merge tag 'signed-for-3.15' of git://github.com/agraf/linux-2.6 into kvm-masterPaolo Bonzini
Patch queue for 3.15 - 2014-05-12 This request includes a few bug fixes that really shouldn't wait for the next release. It fixes KVM on 32bit PowerPC when built as module. It also fixes the PV KVM acceleration when NX gets honored by the host. Furthermore we fix transactional memory support and numa support on HV KVM.
2014-05-05Merge remote-tracking branch 'anton/abiv2' into nextBenjamin Herrenschmidt
This series adds support for building the powerpc 64-bit LE kernel using the new ABI v2. We already supported running ABI v2 userspace programs but this adds support for building the kernel itself using the new ABI.
2014-04-28KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exitPaul Mackerras
Testing by Michael Neuling revealed that commit e4e38121507a ("KVM: PPC: Book3S HV: Add transactional memory support") is missing the code that saves away the checkpointed state of the guest when switching to the host. This adds that code, which was in earlier versions of the patch but went missing somehow. Reported-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-04-28ppc/kvm: Clear the runlatch bit of a vcpu before nappingPreeti U Murthy
When the guest cedes the vcpu or the vcpu has no guest to run it naps. Clear the runlatch bit of the vcpu before napping to indicate an idle cpu. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28ppc/kvm: Set the runlatch bit of a CPU just before starting guestPreeti U Murthy
The secondary threads in the core are kept offline before launching guests in kvm on powerpc: "371fefd6f2dc4666:KVM: PPC: Allow book3s_hv guests to use SMT processor modes." Hence their runlatch bits are cleared. When the secondary threads are called in to start a guest, their runlatch bits need to be set to indicate that they are busy. The primary thread has its runlatch bit set though, but there is no harm in setting this bit once again. Hence set the runlatch bit for all threads before they start guest. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-23powerpc: Create DOTSYM to wrap dot symbol usageAnton Blanchard
There are a few places we have to use dot symbols with the current ABI - the syscall table and the kvm hcall table. Wrap both of these with a new macro called DOTSYM so it will be easy to transition away from dot symbols in a future ABI. Signed-off-by: Anton Blanchard <anton@samba.org>
2014-04-23powerpc: No need to use dot symbols when branching to a functionAnton Blanchard
binutils is smart enough to know that a branch to a function descriptor is actually a branch to the functions text address. Alan tells me that binutils has been doing this for 9 years. Signed-off-by: Anton Blanchard <anton@samba.org>
2014-04-02Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "PPC and ARM do not have much going on this time. Most of the cool stuff, instead, is in s390 and (after a few releases) x86. ARM has some caching fixes and PPC has transactional memory support in guests. MIPS has some fixes, with more probably coming in 3.16 as QEMU will soon get support for MIPS KVM. For x86 there are optimizations for debug registers, which trigger on some Windows games, and other important fixes for Windows guests. We now expose to the guest Broadwell instruction set extensions and also Intel MPX. There's also a fix/workaround for OS X guests, nested virtualization features (preemption timer), and a couple kvmclock refinements. For s390, the main news is asynchronous page faults, together with improvements to IRQs (floating irqs and adapter irqs) that speed up virtio devices" * tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits) KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8 KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode KVM: PPC: Book3S HV: Return ENODEV error rather than EIO KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state KVM: PPC: Book3S HV: Add transactional memory support KVM: Specify byte order for KVM_EXIT_MMIO KVM: vmx: fix MPX detection KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write KVM: s390: clear local interrupts at cpu initial reset KVM: s390: Fix possible memory leak in SIGP functions KVM: s390: fix calculation of idle_mask array size KVM: s390: randomize sca address KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP KVM: Bump KVM_MAX_IRQ_ROUTES for s390 KVM: s390: irq routing for adapter interrupts. KVM: s390: adapter interrupt sources ...
2014-04-02Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull main powerpc updates from Ben Herrenschmidt: "This time around, the powerpc merges are going to be a little bit more complicated than usual. This is the main pull request with most of the work for this merge window. I will describe it a bit more further down. There is some additional cpuidle driver work, however I haven't included it in this tree as it depends on some work in tip/timer-core which Thomas accidentally forgot to put in a topic branch. Since I didn't want to carry all of that tip timer stuff in powerpc -next, I setup a separate branch on top of Thomas tree with just that cpuidle driver in it, and Stephen has been carrying that in next separately for a while now. I'll send a separate pull request for it. Additionally, two new pieces in this tree add users for a sysfs API that Tejun and Greg have been deprecating in drivers-core-next. Thankfully Greg reverted the patch that removes the old API so this merge can happen cleanly, but once merged, I will send a patch adjusting our new code to the new API so that Greg can send you the removal patch. Now as for the content of this branch, we have a lot of perf work for power8 new counters including support for our new "nest" counters (also called 24x7) under pHyp (not natively yet). We have new functionality when running under the OPAL firmware (non-virtualized or KVM host), such as access to the firmware error logs and service processor dumps, system parameters and sensors, along with a hwmon driver for the latter. There's also a bunch of bug fixes accross the board, some LE fixes, and a nice set of selftests for validating our various types of copy loops. On the Freescale side, we see mostly new chip/board revisions, some clock updates, better support for machine checks and debug exceptions, etc..." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (70 commits) powerpc/book3s: Fix CFAR clobbering issue in machine check handler. powerpc/compat: 32-bit little endian machine name is ppcle, not ppc powerpc/le: Big endian arguments for ppc_rtas() powerpc: Use default set of netfilter modules (CONFIG_NETFILTER_ADVANCED=n) powerpc/defconfigs: Enable THP in pseries defconfig powerpc/mm: Make sure a local_irq_disable prevent a parallel THP split powerpc: Rate-limit users spamming kernel log buffer powerpc/perf: Fix handling of L3 events with bank == 1 powerpc/perf/hv_{gpci, 24x7}: Add documentation of device attributes powerpc/perf: Add kconfig option for hypervisor provided counters powerpc/perf: Add support for the hv 24x7 interface powerpc/perf: Add support for the hv gpci (get performance counter info) interface powerpc/perf: Add macros for defining event fields & formats powerpc/perf: Add a shared interface to get gpci version and capabilities powerpc/perf: Add 24x7 interface headers powerpc/perf: Add hv_gpci interface header powerpc: Add hvcalls for 24x7 and gpci (Get Performance Counter Info) sysfs: create bin_attributes under the requested group powerpc/perf: Enable BHRB access for EBB events powerpc/perf: Add BHRB constraint and IFM MMCRA handling for EBB ...
2014-03-29KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8Paul Mackerras
Currently we save the host PMU configuration, counter values, etc., when entering a guest, and restore it on return from the guest. (We have to do this because the guest has control of the PMU while it is executing.) However, we missed saving/restoring the SIAR and SDAR registers, as well as the registers which are new on POWER8, namely SIER and MMCR2. This adds code to save the values of these registers when entering the guest and restore them on exit. This also works around the bug in POWER8 where setting PMAE with a counter already negative doesn't generate an interrupt. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-29KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offsetPaul Mackerras
Commit c7699822bc21 ("KVM: PPC: Book3S HV: Make physical thread 0 do the MMU switching") reordered the guest entry/exit code so that most of the guest register save/restore code happened in guest MMU context. A side effect of that is that the timebase still contains the guest timebase value at the point where we compute and use vcpu->arch.dec_expires, and therefore that is now a guest timebase value rather than a host timebase value. That in turn means that the timeouts computed in kvmppc_set_timer() are wrong if the timebase offset for the guest is non-zero. The consequence of that is things such as "sleep 1" in a guest after migration may sleep for much longer than they should. This fixes the problem by converting between guest and host timebase values as necessary, by adding or subtracting the timebase offset. This also fixes an incorrect comment. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-29KVM: PPC: Book3S HV: Add transactional memory supportMichael Neuling
This adds saving of the transactional memory (TM) checkpointed state on guest entry and exit. We only do this if we see that the guest has an active transaction. It also adds emulation of the TM state changes when delivering IRQs into the guest. According to the architecture, if we are transactional when an IRQ occurs, the TM state is changed to suspended, otherwise it's left unchanged. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-26KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCELaurent Dufour
This introduces the H_GET_TCE hypervisor call, which is basically the reverse of H_PUT_TCE, as defined in the Power Architecture Platform Requirements (PAPR). The hcall H_GET_TCE is required by the kdump kernel, which uses it to retrieve TCEs set up by the previous (panicked) kernel. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
2014-03-19powerpc/booke64: Use SPRG7 for VDSOScott Wood
Previously SPRG3 was marked for use by both VDSO and critical interrupts (though critical interrupts were not fully implemented). In commit 8b64a9dfb091f1eca8b7e58da82f1e7d1d5fe0ad ("powerpc/booke64: Use SPRG0/3 scratch for bolted TLB miss & crit int"), Mihai Caraman made an attempt to resolve this conflict by restoring the VDSO value early in the critical interrupt, but this has some issues: - It's incompatible with EXCEPTION_COMMON which restores r13 from the by-then-overwritten scratch (this cost me some debugging time). - It forces critical exceptions to be a special case handled differently from even machine check and debug level exceptions. - It didn't occur to me that it was possible to make this work at all (by doing a final "ld r13, PACA_EXCRIT+EX_R13(r13)") until after I made (most of) this patch. :-) It might be worth investigating using a load rather than SPRG on return from all exceptions (except TLB misses where the scratch never leaves the SPRG) -- it could save a few cycles. Until then, let's stick with SPRG for all exceptions. Since we cannot use SPRG4-7 for scratch without corrupting the state of a KVM guest, move VDSO to SPRG7 on book3e. Since neither SPRG4-7 nor critical interrupts exist on book3s, SPRG3 is still used for VDSO there. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Mihai Caraman <mihai.caraman@freescale.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> Cc: kvm-ppc@vger.kernel.org
2014-03-13KVM: PPC: Book3S HV: Fix register usage when loading/saving VRSAVEPaul Mackerras
Commit 595e4f7e697e ("KVM: PPC: Book3S HV: Use load/store_fp_state functions in HV guest entry/exit") changed the register usage in kvmppc_save_fp() and kvmppc_load_fp() but omitted changing the instructions that load and save VRSAVE. The result is that the VRSAVE value was loaded from a constant address, and saved to a location past the end of the vcpu struct, causing host kernel memory corruption and various kinds of host kernel crashes. This fixes the problem by using register r31, which contains the vcpu pointer, instead of r3 and r4. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-13KVM: PPC: Book3S HV: Remove bogus duplicate codePaul Mackerras
Commit 7b490411c37f ("KVM: PPC: Book3S HV: Add new state for transactional memory") incorrectly added some duplicate code to the guest exit path because I didn't manage to clean up after a rebase correctly. This removes the extraneous material. The presence of this extraneous code causes host crashes whenever a guest is run. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-29Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queuePaolo Bonzini
Conflicts: arch/powerpc/kvm/book3s_hv_rmhandlers.S arch/powerpc/kvm/booke.c
2014-01-27KVM: PPC: Book3S HV: Add new state for transactional memoryMichael Neuling
Add new state for transactional memory (TM) to kvm_vcpu_arch. Also add asm-offset bits that are going to be required. This also moves the existing TFHAR, TFIAR and TEXASR SPRs into a CONFIG_PPC_TRANSACTIONAL_MEM section. This requires some code changes to ensure we still compile with CONFIG_PPC_TRANSACTIONAL_MEM=N. Much of the added the added #ifdefs are removed in a later patch when the bulk of the TM code is added. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix merge conflict] Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27KVM: PPC: Book3S HV: Basic little-endian guest supportAnton Blanchard
We create a guest MSR from scratch when delivering exceptions in a few places. Instead of extracting LPCR[ILE] and inserting it into MSR_LE each time, we simply create a new variable intr_msr which contains the entire MSR to use. For a little-endian guest, userspace needs to set the ILE (interrupt little-endian) bit in the LPCR for each vcpu (or at least one vcpu in each virtual core). [paulus@samba.org - removed H_SET_MODE implementation from original version of the patch, and made kvmppc_set_lpcr update vcpu->arch.intr_msr.] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27KVM: PPC: Book3S HV: Add support for DABRX register on POWER7Paul Mackerras
The DABRX (DABR extension) register on POWER7 processors provides finer control over which accesses cause a data breakpoint interrupt. It contains 3 bits which indicate whether to enable accesses in user, kernel and hypervisor modes respectively to cause data breakpoint interrupts, plus one bit that enables both real mode and virtual mode accesses to cause interrupts. Currently, KVM sets DABRX to allow both kernel and user accesses to cause interrupts while in the guest. This adds support for the guest to specify other values for DABRX. PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR and DABRX with one call. This adds a real-mode implementation of H_SET_XDABR, which shares most of its code with the existing H_SET_DABR implementation. To support this, we add a per-vcpu field to store the DABRX value plus code to get and set it via the ONE_REG interface. For Linux guests to use this new hcall, userspace needs to add "hcall-xdabr" to the set of strings in the /chosen/hypertas-functions property in the device tree. If userspace does this and then migrates the guest to a host where the kernel doesn't include this patch, then userspace will need to implement H_SET_XDABR by writing the specified DABR value to the DABR using the ONE_REG interface. In that case, the old kernel will set DABRX to DABRX_USER | DABRX_KERNEL. That should still work correctly, at least for Linux guests, since Linux guests cope with getting data breakpoint interrupts in modes that weren't requested by just ignoring the interrupt, and Linux guests never set DABRX_BTI. The other thing this does is to make H_SET_DABR and H_SET_XDABR work on POWER8, which has the DAWR and DAWRX instead of DABR/X. Guests that know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but guests running in POWER7 compatibility mode will still use H_SET_[X]DABR. For them, this adds the logic to convert DABR/X values into DAWR/X values on POWER8. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbellsPaul Mackerras
POWER8 has support for hypervisor doorbell interrupts. Though the kernel doesn't use them for IPIs on the powernv platform yet, it probably will in future, so this makes KVM cope gracefully if a hypervisor doorbell interrupt arrives while in a guest. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27KVM: PPC: Book3S HV: Handle guest using doorbells for IPIsPaul Mackerras
* SRR1 wake reason field for system reset interrupt on wakeup from nap is now a 4-bit field on P8, compared to 3 bits on P7. * Set PECEDP in LPCR when napping because of H_CEDE so guest doorbells will wake us up. * Waking up from nap because of a guest doorbell interrupt is not a reason to exit the guest. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from napPaul Mackerras
Currently in book3s_hv_rmhandlers.S we have three places where we have woken up from nap mode and we check the reason field in SRR1 to see what event woke us up. This consolidates them into a new function, kvmppc_check_wake_reason. It looks at the wake reason field in SRR1, and if it indicates that an external interrupt caused the wakeup, calls kvmppc_read_intr to check what sort of interrupt it was. This also consolidates the two places where we synthesize an external interrupt (0x500 vector) for the guest. Now, if the guest exit code finds that there was an external interrupt which has been handled (i.e. it was an IPI indicating that there is now an interrupt pending for the guest), it jumps to deliver_guest_interrupt, which is in the last part of the guest entry code, where we synthesize guest external and decrementer interrupts. That code has been streamlined a little and now clears LPCR[MER] when appropriate as well as setting it. The extra clearing of any pending IPI on a secondary, offline CPU thread before going back to nap mode has been removed. It is no longer necessary now that we have code to read and acknowledge IPIs in the guest exit path. This fixes a minor bug in the H_CEDE real-mode handling - previously, if we found that other threads were already exiting the guest when we were about to go to nap mode, we would branch to the cede wakeup path and end up looking in SRR1 for a wakeup reason. Now we branch to a point after we have checked the wakeup reason. This also fixes a minor bug in kvmppc_read_intr - previously it could return 0xff rather than 1, in the case where we find that a host IPI is pending after we have cleared the IPI. Now it returns 1. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8Paul Mackerras
POWER8 has 512 sets in the TLB, compared to 128 for POWER7, so we need to do more tlbiel instructions when flushing the TLB on POWER8. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27KVM: PPC: Book3S HV: Context-switch new POWER8 SPRsMichael Neuling
This adds fields to the struct kvm_vcpu_arch to store the new guest-accessible SPRs on POWER8, adds code to the get/set_one_reg functions to allow userspace to access this state, and adds code to the guest entry and exit to context-switch these SPRs between host and guest. Note that DPDES (Directed Privileged Doorbell Exception State) is shared between threads on a core; hence we store it in struct kvmppc_vcore and have the master thread save and restore it. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbersPaul Mackerras
On a threaded processor such as POWER7, we group VCPUs into virtual cores and arrange that the VCPUs in a virtual core run on the same physical core. Currently we don't enforce any correspondence between virtual thread numbers within a virtual core and physical thread numbers. Physical threads are allocated starting at 0 on a first-come first-served basis to runnable virtual threads (VCPUs). POWER8 implements a new "msgsndp" instruction which guest kernels can use to interrupt other threads in the same core or sub-core. Since the instruction takes the destination physical thread ID as a parameter, it becomes necessary to align the physical thread IDs with the virtual thread IDs, that is, to make sure virtual thread N within a virtual core always runs on physical thread N. This means that it's possible that thread 0, which is where we call __kvmppc_vcore_entry, may end up running some other vcpu than the one whose task called kvmppc_run_core(), or it may end up running no vcpu at all, if for example thread 0 of the virtual core is currently executing in userspace. However, we do need thread 0 to be responsible for switching the MMU -- a previous version of this patch that had other threads switching the MMU was found to be responsible for occasional memory corruption and machine check interrupts in the guest on POWER7 machines. To accommodate this, we no longer pass the vcpu pointer to __kvmppc_vcore_entry, but instead let the assembly code load it from the PACA. Since the assembly code will need to know the kvm pointer and the thread ID for threads which don't have a vcpu, we move the thread ID into the PACA and we add a kvm pointer to the virtual core structure. In the case where thread 0 has no vcpu to run, it still calls into kvmppc_hv_entry in order to do the MMU switch, and then naps until either its vcpu is ready to run in the guest, or some other thread needs to exit the guest. In the latter case, thread 0 jumps to the code that switches the MMU back to the host. This control flow means that now we switch the MMU before loading any guest vcpu state. Similarly, on guest exit we now save all the guest vcpu state before switching the MMU back to the host. This has required substantial code movement, making the diff rather large. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-27KVM: PPC: Book3S HV: Don't set DABR on POWER8Michael Neuling
POWER8 doesn't have the DABR and DABRX registers; instead it has new DAWR/DAWRX registers, which will be handled in a later patch. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09KVM: PPC: Book3S HV: Use load/store_fp_state functions in HV guest entry/exitPaul Mackerras
This modifies kvmppc_load_fp and kvmppc_save_fp to use the generic FP/VSX and VMX load/store functions instead of open-coding the FP/VSX/VMX load/store instructions. Since kvmppc_load/save_fp don't follow C calling conventions, we make them private symbols within book3s_hv_rmhandlers.S. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structuresPaul Mackerras
This uses struct thread_fp_state and struct thread_vr_state to store the floating-point, VMX/Altivec and VSX state, rather than flat arrays. This makes transferring the state to/from the thread_struct simpler and allows us to unify the get/set_one_reg implementations for the VSX registers. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-18powerpc: book3s: kvm: Don't abuse host r2 in exit pathAneesh Kumar K.V
We don't use PACATOC for PR. Avoid updating HOST_R2 with PR KVM mode when both HV and PR are enabled in the kernel. Without this we get the below crash (qemu) Unable to handle kernel paging request for data at address 0xffffffffffff8310 Faulting instruction address: 0xc00000000001d5a4 cpu 0x2: Vector: 300 (Data Access) at [c0000001dc53aef0] pc: c00000000001d5a4: .vtime_delta.isra.1+0x34/0x1d0 lr: c00000000001d760: .vtime_account_system+0x20/0x60 sp: c0000001dc53b170 msr: 8000000000009032 dar: ffffffffffff8310 dsisr: 40000000 current = 0xc0000001d76c62d0 paca = 0xc00000000fef1100 softe: 0 irq_happened: 0x01 pid = 4472, comm = qemu-system-ppc enter ? for help [c0000001dc53b200] c00000000001d760 .vtime_account_system+0x20/0x60 [c0000001dc53b290] c00000000008d050 .kvmppc_handle_exit_pr+0x60/0xa50 [c0000001dc53b340] c00000000008f51c kvm_start_lightweight+0xb4/0xc4 [c0000001dc53b510] c00000000008cdf0 .kvmppc_vcpu_run_pr+0x150/0x2e0 [c0000001dc53b9e0] c00000000008341c .kvmppc_vcpu_run+0x2c/0x40 [c0000001dc53ba50] c000000000080af4 .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [c0000001dc53bae0] c00000000007b4c8 .kvm_vcpu_ioctl+0x478/0x730 [c0000001dc53bca0] c0000000002140cc .do_vfs_ioctl+0x4ac/0x770 [c0000001dc53bd80] c0000000002143e8 .SyS_ioctl+0x58/0xb0 [c0000001dc53be30] c000000000009e58 syscall_exit+0x0/0x98 Signed-off-by: Alexander Graf <agraf@suse.de>
2013-11-21powerpc: kvm: optimize "sc 1" as fast returnLiu Ping Fan
In some scene, e.g openstack CI, PR guest can trigger "sc 1" frequently, this patch optimizes the path by directly delivering BOOK3S_INTERRUPT_SYSCALL to HV guest, so powernv can return to HV guest without heavy exit, i.e, no need to swap TLB, HTAB,.. etc Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>