summaryrefslogtreecommitdiff
path: root/arch/s390
AgeCommit message (Collapse)Author
2016-05-18s390/mm: fix asce_bits handling with dynamic pagetable levelsGerald Schaefer
commit 723cacbd9dc79582e562c123a0bacf8bfc69e72a upstream. There is a race with multi-threaded applications between context switch and pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and mm->context.asce_bits, w/o holding any locks. A concurrent mmap with a pagetable upgrade on another thread in crst_table_upgrade() could already have set new asce_bits, but not yet the new mm->pgd. This would result in a corrupt user_asce in switch_mm(), and eventually in a kernel panic from a translation exception. Fix this by storing the complete asce instead of just the asce_bits, which can then be read atomically from switch_mm(), so that it either sees the old value or the new value, but no mixture. Both cases are OK. Having the old value would result in a page fault on access to the higher level memory, but the fault handler would see the new mm->pgd, if it was a valid access after the mmap on the other thread has completed. So as worst-case scenario we would have a page fault loop for the racing thread until the next time slice. Also remove dead code and simplify the upgrade/downgrade path, there are no upgrades from 2 levels, and only downgrades from 3 levels for compat tasks. There are also no concurrent upgrades, because the mmap_sem is held with down_write() in do_mmap, so the flush and table checks during upgrade can be removed. Reported-by: Michael Munday <munday@ca.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04s390/pci: add extra padding to function measurement blockSebastian Ott
commit 9d89d9e61d361f3adb75e1aebe4bb367faf16cfa upstream. Newer machines might use a different (larger) format for function measurement blocks. To ensure that we comply with the alignment requirement on these machines and prevent memory corruption (when firmware writes more data than we expect) add 16 padding bytes at the end of the fmb. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12s390/pci: enforce fmb page boundary ruleSebastian Ott
commit 80c544ded25ac14d7cc3e555abb8ed2c2da99b84 upstream. The function measurement block must not cross a page boundary. Ensure that by raising the alignment requirement to the smallest power of 2 larger than the size of the fmb. Fixes: d0b088531 ("s390/pci: performance statistics and debug infrastructure") Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12s390/cpumf: add missing lpp magic initializationHeiko Carstens
commit 8f100bb1ff27873dd71f636da670e503b9ade3c6 upstream. Add the missing lpp magic initialization for cpu 0. Without this all samples on cpu 0 do not have the most significant bit set in the program parameter field, which we use to distinguish between guest and host samples if the pid is also 0. We did initialize the lpp magic in the absolute zero lowcore but forgot that when switching to the allocated lowcore on cpu 0 only. Reported-by: Shu Juan Zhang <zhshuj@cn.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: e22cf8ca6f75 ("s390/cpumf: rework program parameter setting to detect guest samples") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12s390: fix floating pointer register corruption (again)Martin Schwidefsky
commit e370e4769463a65dcf8806fa26d2874e0542ac41 upstream. There is a tricky interaction between the machine check handler and the critical sections of load_fpu_regs and save_fpu_regs functions. If the machine check interrupts one of the two functions the critical section cleanup will complete the function before the machine check handler s390_do_machine_check is called. Trouble is that the machine check handler needs to validate the floating point registers *before* and not *after* the completion of load_fpu_regs/save_fpu_regs. The simplest solution is to rewind the PSW to the start of the load_fpu_regs/save_fpu_regs and retry the function after the return from the machine check handler. Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12s390/cpumf: Fix lpp detectionChristian Borntraeger
commit 7a76aa95f6f6682db5629449d763251d1c9f8c4e upstream. we have to check bit 40 of the facility list before issuing LPP and not bit 48. Otherwise a guest running on a system with "The decimal-floating-point zoned-conversion facility" and without the "The set-program-parameters facility" might crash on an lpp instruction. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: e22cf8ca6f75 ("s390/cpumf: rework program parameter setting to detect guest samples") Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-16s390/mm: four page table levels vs. forkMartin Schwidefsky
commit 3446c13b268af86391d06611327006b059b8bab1 upstream. The fork of a process with four page table levels is broken since git commit 6252d702c5311ce9 "[S390] dynamic page tables." All new mm contexts are created with three page table levels and an asce limit of 4TB. If the parent has four levels dup_mmap will add vmas to the new context which are outside of the asce limit. The subsequent call to copy_page_range will walk the three level page table structure of the new process with non-zero pgd and pud indexes. This leads to memory clobbers as the pgd_index *and* the pud_index is added to the mm->pgd pointer without a pgd_deref in between. The init_new_context() function is selecting the number of page table levels for a new context. The function is used by mm_init() which in turn is called by dup_mm() and mm_alloc(). These two are used by fork() and exec(). The init_new_context() function can distinguish the two cases by looking at mm->context.asce_limit, for fork() the mm struct has been copied and the number of page table levels may not change. For exec() the mm_alloc() function set the new mm structure to zero, in this case a three-level page table is created as the temporary stack space is located at STACK_TOP_MAX = 4TB. This fixes CVE-2016-2143. Reported-by: Marcin Kościelnicki <koriakin@0x04.net> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-16KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUSDavid Hildenbrand
commit 9522b37f5a8c7bfabe46eecadf2e130f1103f337 upstream. With MACHINE_HAS_VX, we convert the floating point registers from the vector registeres when storing the status. For other VCPUs, these are stored to vcpu->run->s.regs.vrs, but we are using current->thread.fpu.vxrs, which resolves to the currently loaded VCPU. So kvm_s390_store_status_unloaded() currently writes the wrong floating point registers (converted from the vector registers) when called from another VCPU on a z13. This is only the case for old user space not handling SIGP STORE STATUS and SIGP STOP AND STORE STATUS, but relying on the kernel implementation. All other calls come from the loaded VCPU via kvm_s390_store_status(). Fixes: 9abc2a08a7d6 (KVM: s390: fix memory overwrites when vx is disabled) Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03s390/fpu: signals vs. floating point control registerMartin Schwidefsky
commit 1b17cb796f5d40ffa239c6926385abd83a77a49b upstream. git commit 904818e2f229f3d94ec95f6932a6358c81e73d78 "s390/kernel: introduce fpu-internal.h with fpu helper functions" introduced the fpregs_store / fp_regs_load helper. These function fail to save and restore the floating pointer control registers. The effect is that the FPC is not correctly handled on signal delivery and signal return. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03s390/compat: correct restore of high gprs on signal returnMartin Schwidefsky
commit 342300cc9cd3428bc6bfe5809bfcc1b9a0f06702 upstream. git commit 8070361799ae1e3f4ef347bd10f0a508ac10acfb "s390: add support for vector extension" broke 31-bit compat processes in regard to signal handling. The restore_sigregs_ext32() function is used to restore the additional elements from the user space signal frame. Among the additional elements are the upper registers halves for 64-bit register support for 31-bit processes. The copy_from_user that is used to retrieve the high-gprs array from the user stack uses an incorrect length, 8 bytes instead of 64 bytes. This causes incorrect upper register halves to get loaded. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03s390: fix normalization bug in exception table sortingArd Biesheuvel
commit bcb7825a77f41c7dd91da6f7ac10b928156a322e upstream. The normalization pass in the sorting routine of the relative exception table serves two purposes: - it ensures that the address fields of the exception table entries are fully ordered, so that no ambiguities arise between entries with identical instruction offsets (i.e., when two instructions that are exactly 8 bytes apart each have an exception table entry associated with them) - it ensures that the offsets of both the instruction and the fixup fields of each entry are relative to their final location after sorting. Commit eb608fb366de ("s390/exceptions: switch to relative exception table entries") ported the relative exception table format from x86, but modified the sorting routine to only normalize the instruction offset field and not the fixup offset field. The result is that the fixup offset of each entry will be relative to the original location of the entry before sorting, likely leading to crashes when those entries are dereferenced. Fixes: eb608fb366de ("s390/exceptions: switch to relative exception table entries") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03KVM: s390: fix memory overwrites when vx is disabledDavid Hildenbrand
commit 9abc2a08a7d665b02bdde974fd6c44aae86e923e upstream. The kernel now always uses vector registers when available, however KVM has special logic if support is really enabled for a guest. If support is disabled, guest_fpregs.fregs will only contain memory for the fpu. The kernel, however, will store vector registers into that area, resulting in crazy memory overwrites. Simply extending that area is not enough, because the format of the registers also changes. We would have to do additional conversions, making the code even more complex. Therefore let's directly use one place for the vector/fpu registers + fpc (in kvm_run). We just have to convert the data properly when accessing it. This makes current code much easier. Please note that vector/fpu registers are now always stored to vcpu->run->s.regs.vrs. Although this data is visible to QEMU and used for migration, we only guarantee valid values to user space when KVM_SYNC_VRS is set. As that is only the case when we have vector register support, we are on the safe side. Fixes: b5510d9b68c3 ("s390/fpu: always enable the vector facility if it is available") Cc: stable@vger.kernel.org # v4.4 d9a3a09af54d s390/kvm: remove dependency on struct save_area definition Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [adopt to d9a3a09af54d] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03s390/kvm: remove dependency on struct save_area definitionMartin Schwidefsky
commit d9a3a09af54d01ab8b0c320580f4f95328d4a7ac upstream. Replace the offsets based on the struct area_area with the offset constants from asm-offsets.c based on the struct _lowcore. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03KVM: s390: fix guest fprs memory leakDavid Hildenbrand
commit 9c7ebb613bffea2feef4ec562ba1dbcaa810942b upstream. fprs is never freed, therefore resulting in a memory leak if kvm_vcpu_init() fails or the vcpu is destroyed. Fixes: 9977e886cbbc ("s390/kernel: lazy restore fpu registers") Reported-by: Eric Farman <farman@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-18s390/dis: Fix handling of format specifiersMichael Holzheu
The print_insn() function returns strings like "lghi %r1,0". To escape the '%' character in sprintf() a second '%' is used. For example "lghi %%r1,0" is converted into "lghi %r1,0". After print_insn() the output string is passed to printk(). Because format specifiers like "%r" or "%f" are ignored by printk() this works by chance most of the time. But for instructions with control registers like "lctl %c6,%c6,780" this fails because printk() interprets "%c" as character format specifier. Fix this problem and escape the '%' characters twice. For example "lctl %%%%c6,%%%%c6,780" is then converted by sprintf() into "lctl %%c6,%%c6,780" and by printk() into "lctl %c6,%c6,780". Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-24Merge tag 'kvm-arm-for-v4.4-rc3' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/ARM Fixes for v4.4-rc3. Includes some timer fixes, properly unmapping PTEs, an errata fix, and two tweaks to the EL2 panic code.
2015-11-19KVM: s390: fix wrong lookup of VCPUs by array indexDavid Hildenbrand
For now, VCPUs were always created sequentially with incrementing VCPU ids. Therefore, the index in the VCPUs array matched the id. As sequential creation might change with cpu hotplug, let's use the correct lookup function to find a VCPU by id, not array index. Let's also use kvm_lookup_vcpu() for validation of the sending VCPU on external call injection. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: stable@vger.kernel.org # db27a7a KVM: Provide function for VCPU lookup by id
2015-11-19KVM: s390: avoid memory overwrites on emergency signal injectionDavid Hildenbrand
Commit 383d0b050106 ("KVM: s390: handle pending local interrupts via bitmap") introduced a possible memory overwrite from user space. User space could pass an invalid emergency signal code (sending VCPU) and therefore exceed the bitmap. Let's take care of this case and check that the id is in the valid range. Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # v3.19+ db27a7a KVM: Provide function for VCPU lookup by id Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-19KVM: s390: fix pfmf intercept handlerHeiko Carstens
The pfmf intercept handler should check if the EDAT 1 facility is installed in the guest, not if it is installed in the host. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-19KVM: s390: enable SIMD only when no VCPUs were createdDavid Hildenbrand
We should never allow to enable/disable any facilities for the guest when other VCPUs were already created. kvm_arch_vcpu_(load|put) relies on SIMD not changing during runtime. If somebody would create and run VCPUs and then decides to enable SIMD, undefined behaviour could be possible (e.g. vector save area not being set up). Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: stable@vger.kernel.org # 4.1+
2015-11-16s390: remove SALIPL loaderHeiko Carstens
There is no known user, therefore remove the code. Acked-by: Rob Van Der Heij <robvdheij@nl.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16s390: wire up mlock2 system callHeiko Carstens
Passes mlock2-tests test case in 64 bit and compat mode. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16s390: remove g5 elf platform supportHeiko Carstens
Remove dead code, since this could only happen on a 31 bit machine where the kernel wouldn't IPL. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16s390: avoid cache aliasing under z/VM and KVMMartin Schwidefsky
commit 1f6b83e5e4d3 ("s390: avoid z13 cache aliasing") checks for the machine type to optimize address space randomization and zero page allocation to avoid cache aliases. This check might fail under a hypervisor with migration support. z/VMs "Single System Image and Live Guest Relocation" facility will "fake" the machine type of the oldest system in the group. For example in a group of zEC12 and Z13 the guest appears to run on a zEC12 (architecture fencing within the relocation domain) Remove the machine type detection and always use cache aliasing rules that are known to work for all machines. These are the z13 aliasing rules. Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-12s390/sclp: _sclp_wait_int(): retain full PSW maskSascha Silbe
There's no reason to clear all PSW mask bits other than the addressing mode bits. Just use the previous PSW mask as-is. Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-11s390: add support for ipl devices in subchannel sets > 0Sebastian Ott
Allow to ipl from CCW based devices residing in any subchannel set. Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-11s390/ipl: fix out of bounds access in scpdata_writeSebastian Ott
The input buffer in reipl_fcp_scpdata_write is accessed out of bounds when an offset is specified. The problem is that the offset refers to the data we should write to and not to the buffer we read from. So instead of memcpy(scp_data, buf + off, count); we could just do memcpy(scp_data + off, buf, count); However we not only modify the data but also store its length. For this to work we'd need to remember a state per open FH. Since that's not possible with sysfs callbacks let's just fail when an offset is specified. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09s390/pci_dma: improve debugging of errors during dma mapSebastian Ott
Improve debugging to find out what went wrong during a failed dma map/unmap operation. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09s390/pci_dma: handle dma table failuresSebastian Ott
We use lazy allocation for translation table entries but don't handle allocation (and other) failures during translation table updates. Handle these failures and undo translation table updates when it's meaningful. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09s390/pci_dma: unify label of invalid translation table entriesSebastian Ott
Newly allocated translation table entries are flagged as invalid and protected. If an existing translation table entry is invalidated, the protection flag is left unchanged. If a page (with invalid and protection flag set) is accessed it's undefined which type of exception we'll receive. Make sure to always set the invalid flag only. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09s390/syscalls: remove system call number calculationHeiko Carstens
Explicitly write the system call number for each define instead of calculating it. This makes it easier to parse the file when generating system call tables for various tools and libraries. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09s390/diag: add a s390 prefix to the diagnose trace pointMartin Schwidefsky
Documentation/trace/tracepoints.txt states that the naming scheme for tracepoints is "subsys_event" to avoid collisions. Rename the 'diagnose' tracepoint to 's390_diagnose'. Reported-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09s390/head: fix error message on unsupported hardwareSascha Silbe
startup calls the C function _sclp_print_early() if the machine we're running on is not supported by the kernel. sclp.c is getting built with -m64, so _sclp_print_early() expects the zSeries ELF ABI to be used. We previously called _sclp_print_early() using the S/390 ELF ABI, with a stack frame size of 96 bytes and while being in 31-bit address mode. This caused _sclp_wait_int() (called indirectly from _sclp_print_early()) to jump to an undefined address. While _sclp_wait_int() contained some code to deal with being called in 31-bit addressing mode, it didn't quite work. While fixing this is possible, the code would still only work by chance and could break any time. Ensure compliance with the zSeries ELF ABI by switching to 64-bit addressing mode early and using a minimum stack frame size of 160 bytes. Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.4. s390: A bunch of fixes and optimizations for interrupt and time handling. PPC: Mostly bug fixes. ARM: No big features, but many small fixes and prerequisites including: - a number of fixes for the arch-timer - introducing proper level-triggered semantics for the arch-timers - a series of patches to synchronously halt a guest (prerequisite for IRQ forwarding) - some tracepoint improvements - a tweak for the EL2 panic handlers - some more VGIC cleanups getting rid of redundant state x86: Quite a few changes: - support for VT-d posted interrupts (i.e. PCI devices can inject interrupts directly into vCPUs). This introduces a new component (in virt/lib/) that connects VFIO and KVM together. The same infrastructure will be used for ARM interrupt forwarding as well. - more Hyper-V features, though the main one Hyper-V synthetic interrupt controller will have to wait for 4.5. These will let KVM expose Hyper-V devices. - nested virtualization now supports VPID (same as PCID but for vCPUs) which makes it quite a bit faster - for future hardware that supports NVDIMM, there is support for clflushopt, clwb, pcommit - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in userspace, which reduces the attack surface of the hypervisor - obligatory smattering of SMM fixes - on the guest side, stable scheduler clock support was rewritten to not require help from the hypervisor" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits) KVM: VMX: Fix commit which broke PML KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0() KVM: x86: allow RSM from 64-bit mode KVM: VMX: fix SMEP and SMAP without EPT KVM: x86: move kvm_set_irq_inatomic to legacy device assignment KVM: device assignment: remove pointless #ifdefs KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic KVM: x86: zero apic_arb_prio on reset drivers/hv: share Hyper-V SynIC constants with userspace KVM: x86: handle SMBASE as physical address in RSM KVM: x86: add read_phys to x86_emulate_ops KVM: x86: removing unused variable KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr() KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings KVM: arm/arm64: Optimize away redundant LR tracking KVM: s390: use simple switch statement as multiplexer KVM: s390: drop useless newline in debugging data KVM: s390: SCA must not cross page boundaries KVM: arm: Do not indent the arguments of DECLARE_BITMAP ...
2015-11-05Merge tag 'iommu-updates-v4.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "This time including: - A new IOMMU driver for s390 pci devices - Common dma-ops support based on iommu-api for ARM64. The plan is to use this as a basis for ARM32 and hopefully other architectures as well in the future. - MSI support for ARM-SMMUv3 - Cleanups and dead code removal in the AMD IOMMU driver - Better RMRR handling for the Intel VT-d driver - Various other cleanups and small fixes" * tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits) iommu/vt-d: Fix return value check of parse_ioapics_under_ir() iommu/vt-d: Propagate error-value from ir_parse_ioapic_hpet_scope() iommu/vt-d: Adjust the return value of the parse_ioapics_under_ir iommu: Move default domain allocation to iommu_group_get_for_dev() iommu: Remove is_pci_dev() fall-back from iommu_group_get_for_dev iommu/arm-smmu: Switch to device_group call-back iommu/fsl: Convert to device_group call-back iommu: Add device_group call-back to x86 iommu drivers iommu: Add generic_device_group() function iommu: Export and rename iommu_group_get_for_pci_dev() iommu: Revive device_group iommu-ops call-back iommu/amd: Remove find_last_devid_on_pci() iommu/amd: Remove first/last_device handling iommu/amd: Initialize amd_iommu_last_bdf for DEV_ALL iommu/amd: Cleanup buffer allocation iommu/amd: Remove cmd_buf_size and evt_buf_size from struct amd_iommu iommu/amd: Align DTE flag definitions iommu/amd: Remove old alias handling code iommu/amd: Set alias DTE in do_attach/do_detach iommu/amd: WARN when __[attach|detach]_device are called with irqs enabled ...
2015-11-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "There is only one new feature in this pull for the 4.4 merge window, most of it is small enhancements, cleanup and bug fixes: - Add the s390 backend for the software dirty bit tracking. This adds two new pgtable functions pte_clear_soft_dirty and pmd_clear_soft_dirty which is why there is a hit to arch/x86/include/asm/pgtable.h in this pull request. - A series of cleanup patches for the AP bus, this includes the removal of the support for two outdated crypto cards (PCICC and PCICA). - The irq handling / signaling on buffer full in the runtime instrumentation code is dropped. - Some micro optimizations: remove unnecessary memory barriers for a couple of functions: [smb_]rmb, [smb_]wmb, atomics, bitops, and for spin_unlock. Use the builtin bswap if available and make test_and_set_bit_lock more cache friendly. - Statistics and a tracepoint for the diagnose calls to the hypervisor. - The CPU measurement facility support to sample KVM guests is improved. - The vector instructions are now always enabled for user space processes if the hardware has the vector facility. This simplifies the FPU handling code. The fpu-internal.h header is split into fpu internals, api and types just like x86. - Cleanup and improvements for the common I/O layer. - Rework udelay to solve a problem with kprobe. udelay has busy loop semantics but still uses an idle processor state for the wait" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (66 commits) s390: remove runtime instrumentation interrupts s390/cio: de-duplicate subchannel validation s390/css: unneeded initialization in for_each_subchannel s390/Kconfig: use builtin bswap s390/dasd: fix disconnected device with valid path mask s390/dasd: fix invalid PAV assignment after suspend/resume s390/dasd: fix double free in dasd_eckd_read_conf s390/kernel: fix ptrace peek/poke for floating point registers s390/cio: move ccw_device_stlck functions s390/cio: move ccw_device_call_handler s390/topology: reduce per_cpu() invocations s390/nmi: reduce size of percpu variable s390/nmi: fix terminology s390/nmi: remove casts s390/nmi: remove pointless error strings s390: don't store registers on disabled wait anymore s390: get rid of __set_psw_mask() s390/fpu: split fpu-internal.h into fpu internals, api, and type headers s390/dasd: fix list_del corruption after lcu changes s390/spinlock: remove unneeded serializations at unlock ...
2015-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: Changes of note: 1) Allow to schedule ICMP packets in IPVS, from Alex Gartrell. 2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from David Ahern. 3) Allow the user to ask for the statistics to be filtered out of ipv4/ipv6 address netlink dumps. From Sowmini Varadhan. 4) More work to pass the network namespace context around deep into various packet path APIs, starting with the netfilter hooks. From Eric W Biederman. 5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas Richter. 6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng. 7) Support Very High Throughput in wireless MESH code, from Bob Copeland. 8) Allow setting the ageing_time in switchdev/rocker. From Scott Feldman. 9) Properly autoload L2TP type modules, from Stephen Hemminger. 10) Fix and enable offload features by default in 8139cp driver, from David Woodhouse. 11) Support both ipv4 and ipv6 sockets in a single vxlan device, from Jiri Benc. 12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning Opstad. 13) Fix IPSEC flowcache overflows on large systems, from Steffen Klassert. 14) Convert bridging to track VLANs using rhashtable entries rather than a bitmap. From Nikolay Aleksandrov. 15) Make TCP listener handling completely lockless, this is a major accomplishment. Incoming request sockets now live in the established hash table just like any other socket too. From Eric Dumazet. 15) Provide more bridging attributes to netlink, from Nikolay Aleksandrov. 16) Use hash based algorithm for ipv4 multipath routing, this was very long overdue. From Peter Nørlund. 17) Several y2038 cures, mostly avoiding timespec. From Arnd Bergmann. 18) Allow non-root execution of EBPF programs, from Alexei Starovoitov. 19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet. This influences the port binding selection logic used by SO_REUSEPORT. 20) Add ipv6 support to VRF, from David Ahern. 21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko. 22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen. 23) Implement RACK loss recovery in TCP, from Yuchung Cheng. 24) Support multipath routes in MPLS, from Roopa Prabhu. 25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric Dumazet. 26) Add new QED Qlogic river, from Yuval Mintz, Manish Chopra, and Sudarsana Kalluru. 27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic Sowa. 28) Support ipv6 geneve tunnels, from John W Linville. 29) Add flood control support to switchdev layer, from Ido Schimmel. 30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from Hannes Frederic Sowa. 31) Support persistent maps and progs in bpf, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits) sh_eth: use DMA barriers switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion net: sched: kill dead code in sch_choke.c irda: Delete an unnecessary check before the function call "irlmp_unregister_service" net: dsa: mv88e6xxx: include DSA ports in VLANs net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports net/core: fix for_each_netdev_feature vlan: Invoke driver vlan hooks only if device is present arcnet/com20020: add LEDS_CLASS dependency bpf, verifier: annotate verbose printer with __printf dp83640: Only wait for timestamps for packets with timestamping enabled. ptp: Change ptp_class to a proper bitmask dp83640: Prune rx timestamp list before reading from it dp83640: Delay scheduled work. dp83640: Include hash in timestamp/packet matching ipv6: fix tunnel error handling net/mlx5e: Fix LSO vlan insertion net/mlx5e: Re-eanble client vlan TX acceleration net/mlx5e: Return error in case mlx5e_set_features() fails net/mlx5e: Don't allow more than max supported channels ...
2015-11-04Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Add support for cipher output IVs in testmgr - Add missing crypto_ahash_blocksize helper - Mark authenc and des ciphers as not allowed under FIPS. Algorithms: - Add CRC support to 842 compression - Add keywrap algorithm - A number of changes to the akcipher interface: + Separate functions for setting public/private keys. + Use SG lists. Drivers: - Add Intel SHA Extension optimised SHA1 and SHA256 - Use dma_map_sg instead of custom functions in crypto drivers - Add support for STM32 RNG - Add support for ST RNG - Add Device Tree support to exynos RNG driver - Add support for mxs-dcp crypto device on MX6SL - Add xts(aes) support to caam - Add ctr(aes) and xts(aes) support to qat - A large set of fixes from Russell King for the marvell/cesa driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (115 commits) crypto: asymmetric_keys - Fix unaligned access in x509_get_sig_params() crypto: akcipher - Don't #include crypto/public_key.h as the contents aren't used hwrng: exynos - Add Device Tree support hwrng: exynos - Fix missing configuration after suspend to RAM hwrng: exynos - Add timeout for waiting on init done dt-bindings: rng: Describe Exynos4 PRNG bindings crypto: marvell/cesa - use __le32 for hardware descriptors crypto: marvell/cesa - fix missing cpu_to_le32() in mv_cesa_dma_add_op() crypto: marvell/cesa - use memcpy_fromio()/memcpy_toio() crypto: marvell/cesa - use gfp_t for gfp flags crypto: marvell/cesa - use dma_addr_t for cur_dma crypto: marvell/cesa - use readl_relaxed()/writel_relaxed() crypto: caam - fix indentation of close braces crypto: caam - only export the state we really need to export crypto: caam - fix non-block aligned hash calculation crypto: caam - avoid needlessly saving and restoring caam_hash_ctx crypto: caam - print errno code when hash registration fails crypto: marvell/cesa - fix memory leak crypto: marvell/cesa - fix first-fragment handling in mv_cesa_ahash_dma_last_req() crypto: marvell/cesa - rearrange handling for sw padded hashes ...
2015-11-04Merge tag 'kvm-arm-for-4.4' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Changes for v4.4-rc1 Includes a number of fixes for the arch-timer, introducing proper level-triggered semantics for the arch-timers, a series of patches to synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups getting rid of redundant state, and finally a stylistic change that gets rid of some ctags warnings. Conflicts: arch/x86/include/asm/kvm_host.h
2015-11-03s390: remove runtime instrumentation interruptsMartin Schwidefsky
The external interrupts for runtime instrumentation buffer-full and runtime instrumentation halted are unused and have no current user. Remove the support and ignore the second parameter of the s390_runtime_instr system call from now on. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-03s390/Kconfig: use builtin bswapChristian Borntraeger
Depending on the gcc version we can use builtin_bswap instead of architecture functions. Doing so is better than the inline assembly version of load reverse for two reasons: - the sequence of load reversed, apply constant mask, save reversed can be optimized to load, apply reversed mask, save - builtins are slightly better to optimize e.g. gcc instruction scheduler cannot optimize grouping on inline assemblies. To enable set we have to ARCH_USE_BUILTIN_BSWAP. bloat-o-meter results: add/remove: 1/1 grow/shrink: 75/533 up/down: 1711/-9394 (-7683) Suggested-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-03s390/kernel: fix ptrace peek/poke for floating point registersMartin Schwidefsky
git commit 155e839a814834a3b4b31e729f4716e59d3d2dd4 "s390/kernel: dynamically allocate FP register save area" introduced a regression in regard to ptrace. If the vector register extension is not present or unused the ptrace peek of a floating pointer register return incorrect data and the ptrace poke to a floating pointer register overwrites the task structure starting at task->thread.fpu.fprs. Cc: stable@kernel.org # v4.3 Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-02Merge branches 'x86/vt-d', 'arm/omap', 'arm/smmu', 's390', 'core' and ↵Joerg Roedel
'x86/amd' into next Conflicts: drivers/iommu/amd_iommu_types.h
2015-10-29KVM: s390: use simple switch statement as multiplexerChristian Borntraeger
We currently do some magic shifting (by exploiting that exit codes are always a multiple of 4) and a table lookup to jump into the exit handlers. This causes some calculations and checks, just to do an potentially expensive function call. Changing that to a switch statement gives the compiler the chance to inline and dynamically decide between jump tables or inline compare and branches. In addition it makes the code more readable. bloat-o-meter gives me a small reduction in code size: add/remove: 0/7 grow/shrink: 1/1 up/down: 986/-1334 (-348) function old new delta kvm_handle_sie_intercept 72 1058 +986 handle_prog 704 696 -8 handle_noop 54 - -54 handle_partial_execution 60 - -60 intercept_funcs 120 - -120 handle_instruction 198 - -198 handle_validity 210 - -210 handle_stop 316 - -316 handle_external_interrupt 368 - -368 Right now my gcc does conditional branches instead of jump tables. The inlining seems to give us enough cycles as some micro-benchmarking shows minimal improvements, but still in noise. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-29KVM: s390: drop useless newline in debugging dataChristian Borntraeger
the s390 debug feature does not need newlines. In fact it will result in empty lines. Get rid of 4 leftovers. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-29KVM: s390: SCA must not cross page boundariesDavid Hildenbrand
We seemed to have missed a few corner cases in commit f6c137ff00a4 ("KVM: s390: randomize sca address"). The SCA has a maximum size of 2112 bytes. By setting the sca_offset to some unlucky numbers, we exceed the page. 0x7c0 (1984) -> Fits exactly 0x7d0 (2000) -> 16 bytes out 0x7e0 (2016) -> 32 bytes out 0x7f0 (2032) -> 48 bytes out One VCPU entry is 32 bytes long. For the last two cases, we actually write data to the other page. 1. The address of the VCPU. 2. Injection/delivery/clearing of SIGP externall calls via SIGP IF. Especially the 2. happens regularly. So this could produce two problems: 1. The guest losing/getting external calls. 2. Random memory overwrites in the host. So this problem happens on every 127 + 128 created VM with 64 VCPUs. Cc: stable@vger.kernel.org # v3.15+ Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-27s390/topology: reduce per_cpu() invocationsHeiko Carstens
Each per_cpu() invocation generates extra code. Since there are a lot of similiar calls in the topology code we can avoid a lot of them. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27s390/nmi: reduce size of percpu variableHeiko Carstens
Change the flag fields within struct mcck_struct to simple bit fields to reduce the size of the structure which is used as percpu variable. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27s390/nmi: fix terminologyHeiko Carstens
According to the architecture registers are validated and not revalidated. So change comments and functions names to match. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27s390/nmi: remove castsHeiko Carstens
Remove all the casts to and from the machine check interruption code. This patch changes struct mci to a union, which contains an anonymous structure with the already known bits and in addition an unsigned long field, which contains the raw machine check interruption code. This allows to simply assign and decoce the interruption code value without the need for all those casts we had all the time. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>