Age | Commit message (Collapse) | Author |
|
commit 3ea3aa8cf67d3bbe00a19b6a4013d19efa7d0f41 upstream.
CPUID's OSXSAVE is a mirror of CR4.OSXSAVE bit. We need to update the CPUID
after migration.
KVM-Stable-Tag.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 24d1b15f72abe3465e871d11cfc9dc34d1aab8b2 upstream.
To support xsave properly for the guest the SVM module need
software support for it. As long as this is not present do
not report the xsave as supported feature in cpuid.
As a side-effect this patch moves the bit() helper function
into the x86.h file so that it can be used in svm.c too.
KVM-Stable-Tag.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c8770e7ba63bb5dd8fe5f9d251275a8fa717fb78 upstream.
We now use load_gs_index() to load gs safely; unfortunately this also
changes MSR_KERNEL_GS_BASE, which we managed separately. This resulted
in confusion and breakage running 32-bit host userspace on a 64-bit kernel.
Fix by
- saving guest MSR_KERNEL_GS_BASE before we we reload the host's gs
- doing the host save/load unconditionally, instead of only when in guest
long mode
Things can be cleaned up further, but this is the minmal fix for now.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0a77fe4c188e25917799f2356d4aa5e6d80c39a2 upstream.
If fs or gs refer to the ldt, they must be reloaded after the ldt. Reorder
the code to that effect.
Userspace code that uses the ldt with kvm is nonexistent, so this doesn't fix
a user-visible bug.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 97e69aa62f8b5d338d6cff49be09e37cc1262838 upstream.
Structures kvm_vcpu_events, kvm_debugregs, kvm_pit_state2 and
kvm_clock_data are copied to userland with some padding and reserved
fields unitialized. It leads to leaking of contents of kernel stack
memory. We have to initialize them to zero.
In patch v1 Jan Kiszka suggested to fill reserved fields with zeros
instead of memset'ting the whole struct. It makes sense as these
fields are explicitly marked as padding. No more fields need zeroing.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit edde99ce05290e50ce0b3495d209e54e6349ab47 upstream.
I have observed the following bug trigger:
1. userspace calls GET_DIRTY_LOG
2. kvm_mmu_slot_remove_write_access is called and makes a page ro
3. page fault happens and makes the page writeable
fault is logged in the bitmap appropriately
4. kvm_vm_ioctl_get_dirty_log swaps slot pointers
a lot of time passes
5. guest writes into the page
6. userspace calls GET_DIRTY_LOG
At point (5), bitmap is clean and page is writeable,
thus, guest modification of memory is not logged
and GET_DIRTY_LOG returns an empty bitmap.
The rule is that all pages are either dirty in the current bitmap,
or write-protected, which is violated here.
It seems that just moving kvm_mmu_slot_remove_write_access down
to after the slot pointer swap should fix this bug.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit cda0008299a06f0d7218c6037c3c02d7a865e954 upstream.
This patch implements restoring of the correct rip, rsp, and
rax after the svm emulation in KVM injected a selective_cr0
write intercept into the guest hypervisor. The problem was
that the vmexit is emulated in the instruction emulation
which later commits the registers right after the write-cr0
instruction. So the l1 guest will continue to run with the
l2 rip, rsp and rax resulting in unpredictable behavior.
This patch is not the final word, it is just an easy patch
to fix the issue. The real fix will be done when the
instruction emulator is made aware of nested virtualization.
Until this is done this patch fixes the issue and provides
an easy way to fix this in -stable too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 4c62a2dc92518c5adf434df8e5c2283c6762672a upstream.
This patch fixes a bug in KVM where it _always_ reports the
support of the SVM feature to userspace. But KVM only
supports SVM on AMD hardware and only when it is enabled in
the kernel module. This patch fixes the wrong reporting.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 7ef8aa72ab176e0288f363d1247079732c5d5792 upstream.
The AMD SSE5 feature set as-it has been replaced by some extensions
to the AVX instruction set. Thus the bit formerly advertised as SSE5
is re-used for one of these extensions (XOP).
Although this changes the /proc/cpuinfo output, it is not user visible, as
there are no CPUs (yet) having this feature.
To avoid confusion this should be added to the stable series, too.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
LKML-Reference: <1283778860-26843-2-git-send-email-andre.przywara@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
kvm reloads the host's fs and gs blindly, however the underlying segment
descriptors may be invalid due to the user modifying the ldt after loading
them.
Fix by using the safe accessors (loadsegment() and load_gs_index()) instead
of home grown unsafe versions.
This is CVE-2010-3698.
KVM-Stable-Tag.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
The VMCB is reset whenever we receive a startup IPI, so Linux is setting
TSC back to zero happens very late in the boot process and destabilizing
the TSC. Instead, just set TSC to zero once at VCPU creation time.
Why the separate patch? So git-bisect is your friend.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
On reset, VMCB TSC should be set to zero. Instead, code was setting
tsc_offset to zero, which passes through the underlying TSC.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Top of kvm_kpic_state structure should have the same memory layout as
kvm_pic_state since it is copied by memcpy.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
If there are no vcpus, found will be NULL. Check before doing anything with
it.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
operand::val and operand::orig_val are 32-bit on i386, whereas cmpxchg8b
operands are 64-bit.
Fix by adding val64 and orig_val64 union members to struct operand, and
using them where needed.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PIT: free irq source id in handling error path
KVM: destroy workqueue on kvm_create_pit() failures
KVM: fix poison overwritten caused by using wrong xstate size
|
|
Free irq source id if create pit workqueue fail
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
kernel needs to destroy workqueue if kvm_create_pit() fails, otherwise
after pit is freed, the workqueue is leaked.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
fpu.state is allocated from task_xstate_cachep, the size of task_xstate_cachep
is xstate_size. xstate_size is set from cpuid instruction, which is often
smaller than sizeof(struct xsave_struct). kvm is using sizeof(struct xsave_struct)
to fill in/out fpu.state.xsave, as what we allocated for fpu.state is
xstate_size, kernel will write out of memory and caused poison/redzone/padding
overwritten warnings.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Sheng Yang <sheng@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
KVM ended up having to put a pretty ugly wrapper around set_64bit()
in order to get the type right. Now set_64bit() takes the expected
u64 type, and this wrapper can be cleaned up.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Avi Kivity <avi@redhat.com>
LKML-Reference: <4C5C4E7A.8040603@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
um, x86: Cast to (u64 *) inside set_64bit()
x86-32, asm: Directly access per-cpu GDT
x86-64, asm: Directly access per-cpu IST
x86, asm: Merge cmpxchg_486_u64() and cmpxchg8b_emu()
x86, asm: Move cmpxchg emulation code to arch/x86/lib
x86, asm: Clean up and simplify <asm/cmpxchg.h>
x86, asm: Clean up and simplify set_64bit()
x86: Add memory modify constraints to xchg() and cmpxchg()
x86-64: Simplify loading initial_gs
x86: Use symbolic MSR names
x86: Remove redundant K6 MSRs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix keeping track of AMD C1E
x86, cpu: Package Level Thermal Control, Power Limit Notification definitions
x86, cpu: Export AMD errata definitions
x86, cpu: Use AMD errata checking framework for erratum 383
x86, cpu: Clean up AMD erratum 400 workaround
x86, cpu: AMD errata checking framework
x86, cpu: Split addon_cpuid_features.c
x86, cpu: Clean up formatting in cpufeature.h, remove override
x86, cpu: Enumerate xsaveopt
x86, cpu: Add xsaveopt cpufeature
x86, cpu: Make init_scattered_cpuid_features() consider cpuid subleaves
x86, cpu: Support the features flags in new CPUID leaf 7
x86, cpu: Add CPU flags for F16C and RDRND
x86: Look for IA32_ENERGY_PERF_BIAS support
x86, AMD: Extend support to future families
x86, cacheinfo: Carve out L3 cache slot accessors
x86, xsave: Cleanup return codes in check_for_xstate()
|
|
vmx does not restore GDT.LIMIT to the host value, instead it sets it to 64KB.
This means host userspace can learn a few bits of host memory.
Fix by reloading GDTR when we load other host state.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Sometimes, atomically set spte is not needed, this patch call __xchg_spte()
more smartly
Note: if the old mapping's access bit is already set, we no need atomic operation
since the access bit is not lost
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Introduce set_spte_track_bits() to cleanup current code
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
If the old mapping is not present, the spte.a is not lost, so no need
atomic operation to set it
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
In sync-page path, if spte.writable is changed, it will lose page dirty
tracking, for example:
assume spte.writable = 0 in a unsync-page, when it's synced, it map spte
to writable(that is spte.writable = 1), later guest write spte.gfn, it means
spte.gfn is dirty, then guest changed this mapping to read-only, after it's
synced, spte.writable = 0
So, when host release the spte, it detect spte.writable = 0 and not mark page
dirty
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
In current code, if ept is enabled(shadow_accessed_mask = 0), the page
accessed tracking is lost.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
In the speculative path, we should check guest pte's reserved bits just as
the real processor does
Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
The index wasn't calculated correctly (off by one) for huge spte so KVM guest
was unstable with transparent hugepages.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
If the destination is a memory operand and the memory cannot
map to a valid page, the xchg instruction emulation and locked
instruction will not work on io regions and stuck in endless
loop. We should emulate exchange as write to fix it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
If pit delivers interrupt while pic is masking it OS will never do EOI
and ack notifier will not be called so when pit will be unmasked no pit
interrupts will be delivered any more. Calling mask notifiers solves this
issue.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
With tdp enabled we should get into emulator only when emulating io, so
reexecution will always bring us back into emulator.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Mark inc (0xfe/0 0xff/0) and dec (0xfe/1 0xff/1) as lock prefix capable.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
'level' and 'sptep' are aliases for 'interator.level' and 'iterator.sptep', no
need for them.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Currently, when we fetch an spte, we only verify that gptes match those that
the walker saw if we build new shadow pages for them.
However, this misses the following race:
vcpu1 vcpu2
walk
change gpte
walk
instantiate sp
fetch existing sp
Fix by validating every gpte, regardless of whether it is used for building
a new sp or not.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Partition the function into three sections:
- fetching indirect shadow pages (host_level > guest_level)
- fetching direct shadow pages (page_level < host_level <= guest_level)
- the final spte (page_level == host_level)
Instead of the current spaghetti.
A slight change from the original code is that we call validate_direct_spte()
more often: previously we called it only for gw->level, now we also call it for
lower levels. The change should have no effect.
[xiao: fix regression caused by validate_direct_spte() called too late]
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Move the code to check whether a gpte has changed since we fetched it into
a helper.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Add a helper to verify that a direct shadow page is valid wrt the required
access permissions; drop the page if it is not valid.
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
To clarify spte fetching code, move large spte handling into a helper.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
To avoid split accesses to 64 bit sptes on i386, use __set_spte() to link
shadow pages together.
(not technically required since shadow pages are __GFP_KERNEL, so upper 32
bits are always clear)
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
To simplify the process of fetching an spte, add a helper that links
a shadow page to an spte.
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Userspace needs to reset and save/restore these MSRs.
The MCE banks are not exposed since their number varies from vcpu to vcpu.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Fix:
general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
......
Call Trace:
[<ffffffffa0159bd1>] ? kvm_set_irq+0xdd/0x24b [kvm]
[<ffffffff8106ea8b>] ? trace_hardirqs_off_caller+0x1f/0x10e
[<ffffffff813ad17f>] ? sub_preempt_count+0xe/0xb6
[<ffffffff8106d273>] ? put_lock_stats+0xe/0x27
...
RIP [<ffffffffa0159c72>] kvm_set_irq+0x17e/0x24b [kvm]
This bug is triggered when guest is shutdown, is because we freed
irq_routing before pit thread stopped
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
When shadow pages are in use sometimes KVM try to emulate an instruction
when it accesses a shadowed page. If emulation fails KVM un-shadows the
page and reenter guest to allow vcpu to execute the instruction. If page
is not in shadow page hash KVM assumes that this was attempt to do MMIO
and reports emulation failure to userspace since there is no way to fix
the situation. This logic has a race though. If two vcpus tries to write
to the same shadowed page simultaneously both will enter emulator, but
only one of them will find the page in shadow page hash since the one who
founds it also removes it from there, so another cpu will report failure
to userspace and will abort the guest.
Fix this by checking (in addition to checking shadowed page hash) that
page that caused the emulation belongs to valid memory slot. If it is
then reenter the guest to allow vcpu to reexecute the instruction.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Currently if guest access address that belongs to memory slot but is not
backed up by page or page is read only KVM treats it like MMIO access.
Remove that capability. It was never part of the interface and should
not be relied upon.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Stanse found that there is an omitted unlock in kvm_create_pit in one fail
path. Add proper unlock there.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Gleb Natapov <gleb@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gregory Haskins <ghaskins@novell.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Real hardware disregards permission errors when computing page fault error
code bit 0 (page present). Do the same.
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Bit 4 of the page fault error code is set only if EFER.NX is set.
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
This patch change to use DstAcc for decoding 'mov AL, moffs'
and introduced SrcAcc for decoding 'mov moffs, AL'.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|