summaryrefslogtreecommitdiff
path: root/virt
AgeCommit message (Collapse)Author
2018-05-30kvm: Map PFN-type memory regions as writable (if possible)KarimAllah Ahmed
[ Upstream commit a340b3e229b24a56f1c7f5826b15a3af0f4b13e5 ] For EPT-violations that are triggered by a read, the pages are also mapped with write permissions (if their memory region is also writable). That would avoid getting yet another fault on the same page when a write occurs. This optimization only happens when you have a "struct page" backing the memory region. So also enable it for memory regions that do not have a "struct page". Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-22KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lockAndre Przywara
commit bf308242ab98b5d1648c3663e753556bef9bec01 upstream. kvm_read_guest() will eventually look up in kvm_memslots(), which requires either to hold the kvm->slots_lock or to be inside a kvm->srcu critical section. In contrast to x86 and s390 we don't take the SRCU lock on every guest exit, so we have to do it individually for each kvm_read_guest() call. Provide a wrapper which does that and use that everywhere. Note that ending the SRCU critical section before returning from the kvm_read_guest() wrapper is safe, because the data has been *copied*, so we don't need to rely on valid references to the memslot anymore. Cc: Stable <stable@vger.kernel.org> # 4.8+ Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11KVM: mmu: Fix overlap between public and private memslotsWanpeng Li
commit b28676bb8ae4569cced423dc2a88f7cb319d5379 upstream. Reported by syzkaller: pte_list_remove: ffff9714eb1f8078 0->BUG ------------[ cut here ]------------ kernel BUG at arch/x86/kvm/mmu.c:1157! invalid opcode: 0000 [#1] SMP RIP: 0010:pte_list_remove+0x11b/0x120 [kvm] Call Trace: drop_spte+0x83/0xb0 [kvm] mmu_page_zap_pte+0xcc/0xe0 [kvm] kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm] kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm] kvm_arch_flush_shadow_all+0xe/0x10 [kvm] kvm_mmu_notifier_release+0x6c/0xa0 [kvm] ? kvm_mmu_notifier_release+0x5/0xa0 [kvm] __mmu_notifier_release+0x79/0x110 ? __mmu_notifier_release+0x5/0x110 exit_mmap+0x15a/0x170 ? do_exit+0x281/0xcb0 mmput+0x66/0x160 do_exit+0x2c9/0xcb0 ? __context_tracking_exit.part.5+0x4a/0x150 do_group_exit+0x50/0xd0 SyS_exit_group+0x14/0x20 do_syscall_64+0x73/0x1f0 entry_SYSCALL64_slow_path+0x25/0x25 The reason is that when creates new memslot, there is no guarantee for new memslot not overlap with private memslots. This can be triggered by the following program: #include <fcntl.h> #include <pthread.h> #include <setjmp.h> #include <signal.h> #include <stddef.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <sys/syscall.h> #include <sys/types.h> #include <unistd.h> #include <linux/kvm.h> long r[16]; int main() { void *p = valloc(0x4000); r[2] = open("/dev/kvm", 0); r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul); uint64_t addr = 0xf000; ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr); r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul); ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul); ioctl(r[6], KVM_RUN, 0); ioctl(r[6], KVM_RUN, 0); struct kvm_userspace_memory_region mr = { .slot = 0, .flags = KVM_MEM_LOG_DIRTY_PAGES, .guest_phys_addr = 0xf000, .memory_size = 0x4000, .userspace_addr = (uintptr_t) p }; ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr); return 0; } This patch fixes the bug by not adding a new memslot even if it overlaps with private memslots. Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
2017-12-25kvm, mm: account kvm related kmem slabs to kmemcgShakeel Butt
[ Upstream commit 46bea48ac241fe0b413805952dda74dd0c09ba8b ] The kvm slabs can consume a significant amount of system memory and indeed in our production environment we have observed that a lot of machines are spending significant amount of memory that can not be left as system memory overhead. Also the allocations from these slabs can be triggered directly by user space applications which has access to kvm and thus a buggy application can leak such memory. So, these caches should be accounted to kmemcg. Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25KVM: pci-assign: do not map smm memory slot pages in vt-d page tablesHerongguang (Stephen)
[ Upstream commit 0292e169b2d9c8377a168778f0b16eadb1f578fd ] or VM memory are not put thus leaked in kvm_iommu_unmap_memslots() when destroy VM. This is consistent with current vfio implementation. Signed-off-by: herongguang <herongguang.he@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16KVM: arm/arm64: vgic-its: Preserve the revious read from the pending tableMarc Zyngier
commit 64afe6e9eb4841f35317da4393de21a047a883b3 upstream. The current pending table parsing code assumes that we keep the previous read of the pending bits, but keep that variable in the current block, making sure it is discarded on each loop. We end-up using whatever is on the stack. Who knows, it might just be the right thing... Fixes: 33d3bc9556a7d ("KVM: arm64: vgic-its: Read initial LPI pending table") Cc: stable@vger.kernel.org # 4.8 Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14KVM: arm/arm64: VGIC: Fix command handling while ITS being disabledAndre Przywara
[ Upstream commit a5e1e6ca94a8cec51571fd62e3eaec269717969c ] The ITS spec says that ITS commands are only processed when the ITS is enabled (section 8.19.4, Enabled, bit[0]). Our emulation was not taking this into account. Fix this by checking the enabled state before handling CWRITER writes. On the other hand that means that CWRITER could advance while the ITS is disabled, and enabling it would need those commands to be processed. Fix this case as well by refactoring actual command processing and calling this from both the GITS_CWRITER and GITS_CTLR handlers. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14KVM: arm/arm64: vgic-its: Check result of allocation before useMarc Zyngier
commit 686f294f2f1ae40705283dd413ca1e4c14f20f93 upstream. We miss a test against NULL after allocation. Fixes: 6d03a68f8054 ("KVM: arm64: vgic-its: Turn device_id validation into generic ID validation") Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14KVM: arm/arm64: vgic-irqfd: Fix MSI entry allocationMarc Zyngier
commit 150009e2c70cc3c6e97f00e7595055765d32fb85 upstream. Using the size of the structure we're allocating is a good idea and avoids any surprise... In this case, we're happilly confusing kvm_kernel_irq_routing_entry and kvm_irq_routing_entry... Fixes: 95b110ab9a09 ("KVM: arm/arm64: Enable irqchip routing") Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversionChristoffer Dall
commit fc396e066318c0a02208c1d3f0b62950a7714999 upstream. We are incorrectly rearranging 32-bit words inside a 64-bit typed value for big endian systems, which would result in never marking a virtual interrupt as inactive on big endian systems (assuming 32 or fewer LRs on the hardware). Fix this by not doing any word order manipulation for the typed values. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-14KVM: x86: fix APIC page invalidationRadim Krčmář
commit b1394e745b9453dcb5b0671c205b770e87dedb87 upstream. Implementation of the unpinned APIC page didn't update the VMCS address cache when invalidation was done through range mmu notifiers. This became a problem when the page notifier was removed. Re-introduce the arch-specific helper and call it from ...range_start. Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Fixes: 38b9917350cb ("kvm: vmx: Implement set_apic_access_page_addr") Fixes: 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic v2") Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: Wanpeng Li <wanpeng.li@hotmail.com> Tested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-09KVM: arm/arm64: Fix occasional warning from the timer work functionChristoffer Dall
[ Upstream commit 63e41226afc3f7a044b70325566fa86ac3142538 ] When a VCPU blocks (WFI) and has programmed the vtimer, we program a soft timer to expire in the future to wake up the vcpu thread when appropriate. Because such as wake up involves a vcpu kick, and the timer expire function can get called from interrupt context, and the kick may sleep, we have to schedule the kick in the work function. The work function currently has a warning that gets raised if it turns out that the timer shouldn't fire when it's run, which was added because the idea was that in that case the work should never have been cancelled. However, it turns out that this whole thing is racy and we can get spurious warnings. The problem is that we clear the armed flag in the work function, which may run in parallel with the kvm_timer_unschedule->timer_disarm() call. This results in a possible situation where the timer_disarm() call does not call cancel_work_sync(), which effectively synchronizes the completion of the work function with running the VCPU. As a result, the VCPU thread proceeds before the work function completees, causing changes to the timer state such that kvm_timer_should_fire(vcpu) returns false in the work function. All we do in the work function is to kick the VCPU, and an occasional rare extra kick never harmed anyone. Since the race above is extremely rare, we don't bother checking if the race happens but simply remove the check and the clearing of the armed flag from the work function. Reported-by: Matthias Brugger <mbrugger@suse.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27vfio: New external user group/file matchAlex Williamson
commit 5d6dee80a1e94cc284d03e06d930e60e8d3ecf7d upstream. At the point where the kvm-vfio pseudo device wants to release its vfio group reference, we can't always acquire a new reference to make that happen. The group can be in a state where we wouldn't allow a new reference to be added. This new helper function allows a caller to match a file to a group to facilitate this. Given a file and group, report if they match. Thus the caller needs to already have a group reference to match to the file. This allows the deletion of a group without acquiring a new reference. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14KVM: arm/arm64: vgic-v2: Do not use Active+Pending state for a HW interruptMarc Zyngier
commit ddf42d068f8802de122bb7efdfcb3179336053f1 upstream. When an interrupt is injected with the HW bit set (indicating that deactivation should be propagated to the physical distributor), special care must be taken so that we never mark the corresponding LR with the Active+Pending state (as the pending state is kept in the physycal distributor). Cc: stable@vger.kernel.org Fixes: 140b086dd197 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend") Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14KVM: arm/arm64: vgic-v3: Do not use Active+Pending state for a HW interruptMarc Zyngier
commit 3d6e77ad1489650afa20da92bb589c8778baa8da upstream. When an interrupt is injected with the HW bit set (indicating that deactivation should be propagated to the physical distributor), special care must be taken so that we never mark the corresponding LR with the Active+Pending state (as the pending state is kept in the physycal distributor). Fixes: 59529f69f504 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend") Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08KVM: kvm_io_bus_unregister_dev() should never failDavid Hildenbrand
commit 90db10434b163e46da413d34db8d0e77404cc645 upstream. No caller currently checks the return value of kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on freeing their device. A stale reference will remain in the io_bus, getting at least used again, when the iobus gets teared down on kvm_destroy_vm() - leading to use after free errors. There is nothing the callers could do, except retrying over and over again. So let's simply remove the bus altogether, print an error and make sure no one can access this broken bus again (returning -ENOMEM on any attempt to access it). Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU") Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08KVM: x86: clear bus pointer when destroyedPeter Xu
commit df630b8c1e851b5e265dc2ca9c87222e342c093b upstream. When releasing the bus, let's clear the bus pointers to mark it out. If any further device unregister happens on this bus, we know that we're done if we found the bus being released already. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-18KVM: arm/arm64: Let vcpu thread modify its own active stateJintack Lim
commit 370a0ec1819990f8e2a93df7cc9c0146980ed45f upstream. Currently, if a vcpu thread tries to change the active state of an interrupt which is already on the same vcpu's AP list, it will loop forever. Since the VGIC mmio handler is called after a vcpu has already synced back the LR state to the struct vgic_irq, we can just let it proceed safely. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12KVM: arm/arm64: vgic: Stop injecting the MSI occurrence twiceShanker Donthineni
commit 0bdbf3b071986ba80731203683cf623d5c0cacb1 upstream. The IRQFD framework calls the architecture dependent function twice if the corresponding GSI type is edge triggered. For ARM, the function kvm_set_msi() is getting called twice whenever the IRQFD receives the event signal. The rest of the code path is trying to inject the MSI without any validation checks. No need to call the function vgic_its_inject_msi() second time to avoid an unnecessary overhead in IRQ queue logic. It also avoids the possibility of VM seeing the MSI twice. Simple fix, return -1 if the argument 'level' value is zero. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26KVM: arm/arm64: vgic: Fix deadlock on error handlingMarc Zyngier
commit 1193e6aeecb36c74c48c7cd0f641acbbed9ddeef upstream. Dmitry Vyukov reported that the syzkaller fuzzer triggered a deadlock in the vgic setup code when an error was detected, as the cleanup code tries to take a lock that is already held by the setup code. The fix is to avoid retaking the lock when cleaning up, by telling the cleanup function that we already hold it. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19KVM: eventfd: fix NULL deref irqbypass consumerWanpeng Li
commit 4f3dbdf47e150016aacd734e663347fcaa768303 upstream. Reported syzkaller: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] PGD 0 Oops: 0002 [#1] SMP CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1 Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm] task: ffff9bbe0dfbb900 task.stack: ffffb61802014000 RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] Call Trace: irqfd_shutdown+0x66/0xa0 [kvm] process_one_work+0x16b/0x480 worker_thread+0x4b/0x500 kthread+0x101/0x140 ? process_one_work+0x480/0x480 ? kthread_create_on_node+0x60/0x60 ret_from_fork+0x25/0x30 RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20 CR2: 0000000000000008 The syzkaller folks reported a NULL pointer dereference that due to unregister an consumer which fails registration before. The syzkaller creates two VMs w/ an equal eventfd occasionally. So the second VM fails to register an irqbypass consumer. It will make irqfd as inactive and queue an workqueue work to shutdown irqfd and unregister the irqbypass consumer when eventfd is closed. However, the second consumer has been initialized though it fails registration. So the token(same as the first VM's) is taken to unregister the consumer through the workqueue, the consumer of the first VM is found and unregistered, then NULL deref incurred in the path of deleting consumer from the consumers list. This patch fixes it by making irq_bypass_register/unregister_consumer() looks for the consumer entry based on consumer pointer itself instead of token matching. Reported-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-01KVM: use after free in kvm_ioctl_create_device()Dan Carpenter
We should move the ops->destroy(dev) after the list_del(&dev->vm_node) so that we don't use "dev" after freeing it. Fixes: a28ebea2adc4 ("KVM: Protect device ops->create and list_add with kvm->lock") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-01Merge tag 'kvm-arm-for-4.9-rc7' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for v4.9-rc7 - Do not call kvm_notify_acked for PPIs
2016-11-24KVM: arm/arm64: vgic: Don't notify EOI for non-SPIsMarc Zyngier
When we inject a level triggerered interrupt (and unless it is backed by the physical distributor - timer style), we request a maintenance interrupt. Part of the processing for that interrupt is to feed to the rest of KVM (and to the eventfd subsystem) the information that the interrupt has been EOIed. But that notification only makes sense for SPIs, and not PPIs (such as the PMU interrupt). Skip over the notification if the interrupt is not an SPI. Cc: stable@vger.kernel.org # 4.7+ Fixes: 140b086dd197 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend") Fixes: 59529f69f504 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend") Reported-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-19KVM: async_pf: avoid recursive flushing of work itemsPaolo Bonzini
This was reported by syzkaller: [ INFO: possible recursive locking detected ] 4.9.0-rc4+ #49 Not tainted --------------------------------------------- kworker/2:1/5658 is trying to acquire lock: ([ 1644.769018] (&work->work) [< inline >] list_empty include/linux/compiler.h:243 [<ffffffff8128dd60>] flush_work+0x0/0x660 kernel/workqueue.c:1511 but task is already holding lock: ([ 1644.769018] (&work->work) [<ffffffff812916ab>] process_one_work+0x94b/0x1900 kernel/workqueue.c:2093 stack backtrace: CPU: 2 PID: 5658 Comm: kworker/2:1 Not tainted 4.9.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events async_pf_execute ffff8800676ff630 ffffffff81c2e46b ffffffff8485b930 ffff88006b1fc480 0000000000000000 ffffffff8485b930 ffff8800676ff7e0 ffffffff81339b27 ffff8800676ff7e8 0000000000000046 ffff88006b1fcce8 ffff88006b1fccf0 Call Trace: ... [<ffffffff8128ddf3>] flush_work+0x93/0x660 kernel/workqueue.c:2846 [<ffffffff812954ea>] __cancel_work_timer+0x17a/0x410 kernel/workqueue.c:2916 [<ffffffff81295797>] cancel_work_sync+0x17/0x20 kernel/workqueue.c:2951 [<ffffffff81073037>] kvm_clear_async_pf_completion_queue+0xd7/0x400 virt/kvm/async_pf.c:126 [< inline >] kvm_free_vcpus arch/x86/kvm/x86.c:7841 [<ffffffff810b728d>] kvm_arch_destroy_vm+0x23d/0x620 arch/x86/kvm/x86.c:7946 [< inline >] kvm_destroy_vm virt/kvm/kvm_main.c:731 [<ffffffff8105914e>] kvm_put_kvm+0x40e/0x790 virt/kvm/kvm_main.c:752 [<ffffffff81072b3d>] async_pf_execute+0x23d/0x4f0 virt/kvm/async_pf.c:111 [<ffffffff8129175c>] process_one_work+0x9fc/0x1900 kernel/workqueue.c:2096 [<ffffffff8129274f>] worker_thread+0xef/0x1480 kernel/workqueue.c:2230 [<ffffffff812a5a94>] kthread+0x244/0x2d0 kernel/kthread.c:209 [<ffffffff831f102a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 The reason is that kvm_put_kvm is causing the destruction of the VM, but the page fault is still on the ->queue list. The ->queue list is owned by the VCPU, not by the work items, so we cannot just add list_del to the work item. Instead, use work->vcpu to note async page faults that have been resolved and will be processed through the done list. There is no need to flush those. Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19Merge tag 'kvm-arm-for-4.9-rc6' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for v4.9-rc6 - Fix handling of the 32bit cycle counter - Fix cycle counter filtering
2016-11-18KVM: arm64: Fix the issues when guest PMCCFILTR is configuredWei Huang
KVM calls kvm_pmu_set_counter_event_type() when PMCCFILTR is configured. But this function can't deals with PMCCFILTR correctly because the evtCount bits of PMCCFILTR, which is reserved 0, conflits with the SW_INCR event type of other PMXEVTYPER<n> registers. To fix it, when eventsel == 0, this function shouldn't return immediately; instead it needs to check further if select_idx is ARMV8_PMU_CYCLE_IDX. Another issue is that KVM shouldn't copy the eventsel bits of PMCCFILTER blindly to attr.config. Instead it ought to convert the request to the "cpu cycle" event type (i.e. 0x11). To support this patch and to prevent duplicated definitions, a limited set of ARMv8 perf event types were relocated from perf_event.c to asm/perf_event.h. Cc: stable@vger.kernel.org # 4.6+ Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-11Merge tag 'kvm-arm-for-v4.9-rc4' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM updates for v4.9-rc4 - Kick the vcpu when a pending interrupt becomes pending again - Prevent access to invalid interrupt registers - Invalid TLBs when two vcpus from the same VM share a CPU
2016-11-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "One NULL pointer dereference, and two fixes for regressions introduced during the merge window. The rest are fixes for MIPS, s390 and nested VMX" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: Check memopp before dereference (CVE-2016-8630) kvm: nVMX: VMCLEAR an active shadow VMCS after last use KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK KVM: x86: fix wbinvd_dirty_mask use-after-free kvm/x86: Show WRMSR data is in hex kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types KVM: document lock orders KVM: fix OOPS on flush_work KVM: s390: Fix STHYI buffer alignment for diag224 KVM: MIPS: Precalculate MMIO load resume PC KVM: MIPS: Make ERET handle ERL before EXL KVM: MIPS: Fix lazy user ASID regenerate for SMP
2016-11-04KVM: arm/arm64: vgic: Kick VCPUs when queueing already pending IRQsShih-Wei Li
In cases like IPI, we could be queueing an interrupt for a VCPU that is already running and is not about to exit, because the VCPU has entered the VM with the interrupt pending and would not trap on EOI'ing that interrupt. This could result to delays in interrupt deliveries or even loss of interrupts. To guarantee prompt interrupt injection, here we have to try to kick the VCPU. Signed-off-by: Shih-Wei Li <shihwei@cs.columbia.edu> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-04KVM: arm/arm64: vgic: Prevent access to invalid SPIsAndre Przywara
In our VGIC implementation we limit the number of SPIs to a number that the userland application told us. Accordingly we limit the allocation of memory for virtual IRQs to that number. However in our MMIO dispatcher we didn't check if we ever access an IRQ beyond that limit, leading to out-of-bound accesses. Add a test against the number of allocated SPIs in check_region(). Adjust the VGIC_ADDR_TO_INT macro to avoid an actual division, which is not implemented on ARM(32). [maz: cleaned-up original patch] Cc: stable@vger.kernel.org Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-10-26KVM: fix OOPS on flush_workPaolo Bonzini
The conversion done by commit 3706feacd007 ("KVM: Remove deprecated create_singlethread_workqueue") is broken. It flushes a single work item &irqfd->shutdown instead of all of them, and even worse if there is no irqfd on the list then you get a NULL pointer dereference. Revert the virt/kvm/eventfd.c part of that patch; to avoid the deprecated function, just allocate our own workqueue---it does not even have to be unbound---with alloc_workqueue. Fixes: 3706feacd007 Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-24mm: unexport __get_user_pages()Lorenzo Stoakes
This patch unexports the low-level __get_user_pages() function. Recent refactoring of the get_user_pages* functions allow flags to be passed through get_user_pages() which eliminates the need for access to this function from its one user, kvm. We can see that the two calls to get_user_pages() which replace __get_user_pages() in kvm_main.c are equivalent by examining their call stacks: get_user_page_nowait(): get_user_pages(start, 1, flags, page, NULL) __get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL, false, flags | FOLL_TOUCH) __get_user_pages(current, current->mm, start, 1, flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL) check_user_page_hwpoison(): get_user_pages(addr, 1, flags, NULL, NULL) __get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL, false, flags | FOLL_TOUCH) __get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL, NULL, NULL) Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-18mm: remove write/force parameters from __get_user_pages_unlocked()Lorenzo Stoakes
This removes the redundant 'write' and 'force' parameters from __get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-29Merge tag 'kvm-arm-for-v4.9' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into next KVM/ARM Changes for v4.9 - Various cleanups and removal of redundant code - Two important fixes for not using an in-kernel irqchip - A bit of optimizations - Handle SError exceptions and present them to guests if appropriate - Proxying of GICV access at EL2 if guest mappings are unsafe - GICv3 on AArch32 on ARMv8 - Preparations for GICv3 save/restore, including ABI docs
2016-09-27KVM: arm/arm64: vgic: Don't flush/sync without a working vgicChristoffer Dall
If the vgic hasn't been created and initialized, we shouldn't attempt to look at its data structures or flush/sync anything to the GIC hardware. This fixes an issue reported by Alexander Graf when using a userspace irqchip. Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework") Cc: stable@vger.kernel.org Reported-by: Alexander Graf <agraf@suse.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-27KVM: arm64: Require in-kernel irqchip for PMU supportChristoffer Dall
If userspace creates a PMU for the VCPU, but doesn't create an in-kernel irqchip, then we end up in a nasty path where we try to take an uninitialized spinlock, which can lead to all sorts of breakages. Luckily, QEMU always creates the VGIC before the PMU, so we can establish this as ABI and check for the VGIC in the PMU init stage. This can be relaxed at a later time if we want to support PMU with a userspace irqchip. Cc: stable@vger.kernel.org Cc: Shannon Zhao <shannon.zhao@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22ARM: KVM: Support vgic-v3Vladimir Murzin
This patch allows to build and use vgic-v3 in 32-bit mode. Unfortunately, it can not be split in several steps without extra stubs to keep patches independent and bisectable. For instance, virt/kvm/arm/vgic/vgic-v3.c uses function from vgic-v3-sr.c, handling access to GICv3 cpu interface from the guest requires vgic_v3.vgic_sre to be already defined. It is how support has been done: * handle SGI requests from the guest * report configured SRE on access to GICv3 cpu interface from the guest * required vgic-v3 macros are provided via uapi.h * static keys are used to select GIC backend * to make vgic-v3 build KVM_ARM_VGIC_V3 guard is removed along with the static inlines Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22KVM: arm: vgic: Support 64-bit data manipulation on 32-bit host systemsVladimir Murzin
We have couple of 64-bit registers defined in GICv3 architecture, so unsigned long accesses to these registers will only access a single 32-bit part of that regitser. On the other hand these registers can't be accessed as 64-bit with a single instruction like ldrd/strd or ldmia/stmia if we run a 32-bit host because KVM does not support access to MMIO space done by these instructions. It means that a 32-bit guest accesses these registers in 32-bit chunks, so the only thing we need to do is to ensure that extract_bytes() always takes 64-bit data. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22KVM: arm: vgic: Fix compiler warnings when built for 32-bitVladimir Murzin
Well, this patch is looking ahead of time, but we'll get following compiler warnings as soon as we introduce vgic-v3 to 32-bit world CC arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.o arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_mmio_read_v3r_typer': arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:184:35: warning: left shift count >= width of type [-Wshift-count-overflow] value = (mpidr & GENMASK(23, 0)) << 32; ^ In file included from ./include/linux/kernel.h:10:0, from ./include/asm-generic/bug.h:13, from ./arch/arm/include/asm/bug.h:59, from ./include/linux/bug.h:4, from ./include/linux/io.h:23, from ./arch/arm/include/asm/arch_gicv3.h:23, from ./include/linux/irqchip/arm-gic-v3.h:411, from arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:14: arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_v3_dispatch_sgi': ./include/linux/bitops.h:6:24: warning: left shift count >= width of type [-Wshift-count-overflow] #define BIT(nr) (1UL << (nr)) ^ arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:614:20: note: in expansion of macro 'BIT' broadcast = reg & BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT); ^ Let's fix them now. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22KVM: arm64: vgic-its: Introduce config option to guard ITS specific codeVladimir Murzin
By now ITS code guarded with KVM_ARM_VGIC_V3 config option which was introduced to hide everything specific to vgic-v3 from 32-bit world. We are going to support vgic-v3 in 32-bit world and KVM_ARM_VGIC_V3 will gone, but we don't have support for ITS there yet and we need to continue keeping ITS away. Introduce the new config option to prevent ITS code being build in 32-bit mode when support for vgic-v3 is done. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22arm64: KVM: Move vgic-v3 save/restore to virt/kvm/arm/hypVladimir Murzin
So we can reuse the code under arch/arm Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-22arm64: KVM: Use static keys for selecting the GIC backendVladimir Murzin
Currently GIC backend is selected via alternative framework and this is fine. We are going to introduce vgic-v3 to 32-bit world and there we don't have patching framework in hand, so we can either check support for GICv3 every time we need to choose which backend to use or try to optimise it by using static keys. The later looks quite promising because we can share logic involved in selecting GIC backend between architectures if both uses static keys. This patch moves arm64 from alternative to static keys framework for selecting GIC backend. For that we embed static key into vgic_global and enable the key during vgic initialisation based on what has already been exposed by the host GIC driver. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-16kvm: create per-vcpu dirs in debugfsLuiz Capitulino
This commit adds the ability for archs to export per-vcpu information via a new per-vcpu dir in the VM's debugfs directory. If kvm_arch_has_vcpu_debugfs() returns true, then KVM will create a vcpu dir for each vCPU in the VM's debugfs directory. Then kvm_arch_create_vcpu_debugfs() is responsible for populating each vcpu directory with arch specific entries. The per-vcpu path in debugfs will look like: /sys/kernel/debug/kvm/29162-10/vcpu0 /sys/kernel/debug/kvm/29162-10/vcpu1 This is all arch specific for now because the only user of this interface (x86) wants to export x86-specific per-vcpu information to user-space. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-16kvm: kvm_destroy_vm_debugfs(): check debugfs_stat_data pointerLuiz Capitulino
This make it possible to call kvm_destroy_vm_debugfs() from kvm_create_vm_debugfs() in error conditions. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-13Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD Paul Mackerras writes: The highlights are: * Reduced latency for interrupts from PCI pass-through devices, from Suresh Warrier and me. * Halt-polling implementation from Suraj Jitindar Singh. * 64-bit VCPU statistics, also from Suraj. * Various other minor fixes and improvements.
2016-09-08KVM: ARM: cleanup kvm_timer_hyp_initPaolo Bonzini
Remove two unnecessary labels now that kvm_timer_hyp_init is not creating its own workqueue anymore. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08arm64: KVM: Inject a vSerror if detecting a bad GICV access at EL2Marc Zyngier
If, when proxying a GICV access at EL2, we detect that the guest is doing something silly, report an EL1 SError instead ofgnoring the access. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08arm64: KVM: vgic-v2: Enable GICV access from HYP if access from guest is unsafeMarc Zyngier
So far, we've been disabling KVM on systems where the GICV region couldn't be safely given to a guest. Now that we're able to handle this access safely by emulating it in HYP, we can enable this feature when we detect an unsafe configuration. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2016-09-08arm64: KVM: vgic-v2: Add GICV access from HYPMarc Zyngier
Now that we have the necessary infrastructure to handle MMIO accesses in HYP, perform the GICV access on behalf of the guest. This requires checking that the access is strictly 32bit, properly aligned, and falls within the expected range. When all condition are satisfied, we perform the access and tell the rest of the HYP code that the instruction has been correctly emulated. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>