path: root/arch/x86/kvm/mmu.c
Age  Commit message  Author
2010-08-02  KVM: MMU: fix page dirty tracking lost while sync page  Xiao Guangrong
In the sync-page path, if spte.writable changes, page dirty tracking is lost. For example: assume spte.writable = 0 in an unsync page; when it is synced, the spte is mapped writable (spte.writable = 1). Later the guest writes to spte.gfn, so spte.gfn is dirty. The guest then changes this mapping to read-only, and after the next sync spte.writable = 0. So when the host releases the spte, it sees spte.writable = 0 and does not mark the page dirty. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02  KVM: MMU: fix broken page accessed tracking with ept enabled  Xiao Guangrong
In the current code, if EPT is enabled (shadow_accessed_mask = 0), page accessed tracking is lost. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02  KVM: MMU: add missing reserved bits check in speculative path  Xiao Guangrong
In the speculative path, we should check the guest pte's reserved bits, just as a real processor does. Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02  KVM: MMU: fix mmu notifier invalidate handler for huge spte  Andrea Arcangeli
The index wasn't calculated correctly (off by one) for huge sptes, so KVM guests were unstable with transparent hugepages. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02  KVM: MMU: Add validate_direct_spte() helper  Avi Kivity
Add a helper to verify that a direct shadow page is valid wrt the required access permissions; drop the page if it is not valid. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02  KVM: MMU: Add drop_large_spte() helper  Avi Kivity
To clarify spte fetching code, move large spte handling into a helper. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02  KVM: MMU: Use __set_spte to link shadow pages  Avi Kivity
To avoid split accesses to 64 bit sptes on i386, use __set_spte() to link shadow pages together. (not technically required since shadow pages are __GFP_KERNEL, so upper 32 bits are always clear) Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02  KVM: MMU: Add link_shadow_page() helper  Avi Kivity
To simplify the process of fetching an spte, add a helper that links a shadow page to an spte. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02  KVM: Return EFAULT from kvm ioctl when guest accesses bad area  Gleb Natapov
Currently, if the guest accesses an address that belongs to a memory slot but is not backed by a page, or the page is read-only, KVM treats it like an MMIO access. Remove that capability. It was never part of the interface and should not be relied upon. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02  KVM: MMU: Don't drop accessed bit while updating an spte  Avi Kivity
__set_spte() will happily replace an spte with the accessed bit set with one that has the accessed bit clear. Add a helper update_spte() which checks for this condition and updates the page flag if needed. Signed-off-by: Avi Kivity <avi@redhat.com>
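The check described above can be sketched in user-space C. This is only an illustration of the idea, not the kernel helper: the bit position `SPTE_ACCESSED`, the `page_state` struct, and the function name `update_spte_sketch` are all assumptions made for this example, and C11 atomics stand in for the kernel's own primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, chosen only for this sketch. */
#define SPTE_ACCESSED (1ULL << 5)

/* Hypothetical stand-in for the host's per-page "accessed" flag. */
struct page_state {
	bool accessed;
};

/*
 * Replace *sptep with new_spte. If the new value would clear an accessed
 * bit that was set in the old value, record the access on the page first
 * so the information is not lost.
 */
static void update_spte_sketch(_Atomic uint64_t *sptep, uint64_t new_spte,
			       struct page_state *page)
{
	if (new_spte & SPTE_ACCESSED) {
		/* The new value keeps the bit, so a plain store loses nothing. */
		atomic_store(sptep, new_spte);
		return;
	}

	/* Fetch the old value and install the new one in a single step. */
	uint64_t old = atomic_exchange(sptep, new_spte);

	if (old & SPTE_ACCESSED)
		page->accessed = true;
}
```

The exchange is what makes the check reliable: the old value is read in the same operation that overwrites it, so an accessed bit set just before the update is still observed.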
2010-08-02  KVM: MMU: Atomically check for accessed bit when dropping an spte  Avi Kivity
Currently, in the window between the check for the accessed bit and actually dropping the spte, a vcpu can access the page through the spte and set the bit, which will be ignored by the mmu. Fix by using an exchange operation to atomically fetch the spte and drop it. Signed-off-by: Avi Kivity <avi@redhat.com>
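To make the race window concrete, here is a minimal user-space sketch contrasting a load-then-store drop with an exchange-based drop. The bit positions, the `SPTE_NONPRESENT` encoding, and both function names are assumptions for illustration; the kernel uses its own spte layout and exchange primitive.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed encodings, chosen only for this sketch. */
#define SPTE_ACCESSED   (1ULL << 5)
#define SPTE_DIRTY      (1ULL << 6)
#define SPTE_NONPRESENT 0ULL

/*
 * Racy variant: another agent may set SPTE_ACCESSED between the load and
 * the store, and that update would never be seen by the caller.
 */
static uint64_t drop_spte_racy(_Atomic uint64_t *sptep)
{
	uint64_t old = atomic_load(sptep);
	/* window: a concurrent access could set the bit right here */
	atomic_store(sptep, SPTE_NONPRESENT);
	return old;
}

/*
 * Atomic variant: the old value is fetched by the same operation that
 * clears the entry, so every bit set before the drop is returned.
 */
static uint64_t drop_spte_atomic(_Atomic uint64_t *sptep)
{
	return atomic_exchange(sptep, SPTE_NONPRESENT);
}
```

A caller would inspect the returned value for `SPTE_ACCESSED` and `SPTE_DIRTY` and propagate them to the host page before forgetting the entry.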
2010-08-02  KVM: MMU: Move accessed/dirty bit checks from rmap_remove() to drop_spte()  Avi Kivity
Since we need to make the check atomic, move it to the place that will set the new spte. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02  KVM: MMU: Introduce drop_spte()  Avi Kivity
When we call rmap_remove(), we (almost) always immediately follow it by an __set_spte() to a nonpresent pte. Since we need to perform the two operations atomically, to avoid losing the dirty and accessed bits, introduce a helper drop_spte() and convert all call sites. The operation is still nonatomic at this point. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-02  KVM: VMX: fix tlb flush with invalid root  Xiao Guangrong
Commit 341d9b535b6c simplified the reload logic on guest entry; it avoids an unnecessary sync-root if KVM_REQ_MMU_RELOAD and KVM_REQ_MMU_SYNC are both set. However, it causes an issue: when we handle 'KVM_REQ_TLB_FLUSH', the root may be invalid. This was triggered during my test: Kernel BUG at ffffffffa00212b8 [verbose debug info unavailable] ...... Fix by returning directly if the root is not ready. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: Remove unnecessary divide operations  Joerg Roedel
This patch converts unnecessary divide and modulo operations in the KVM large page related code into logical operations. This allows converting gfn_t to u64 without breaking 32-bit builds. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
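As a rough illustration of the conversion: when the divisor is a power of two, division becomes a right shift and modulo becomes a mask, so a 64-bit gfn never needs a 64-bit divide (which on 32-bit targets typically compiles to a library helper call). The macro and function names below are hypothetical, and the 9-bits-per-level layout is an assumption of this sketch.

```c
#include <stdint.h>

typedef uint64_t gfn_t;

/*
 * Assumption for this sketch: a large page at `level` spans
 * 2^(9 * (level - 1)) small pages (level 1 = one small page).
 */
#define HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)
#define PAGES_PER_HPAGE(level) (1ULL << HPAGE_GFN_SHIFT(level))

/* Equivalent to gfn / PAGES_PER_HPAGE(level), written as a shift. */
static inline gfn_t hpage_index(gfn_t gfn, int level)
{
	return gfn >> HPAGE_GFN_SHIFT(level);
}

/* Equivalent to gfn % PAGES_PER_HPAGE(level), written as a mask. */
static inline gfn_t hpage_offset(gfn_t gfn, int level)
{
	return gfn & (PAGES_PER_HPAGE(level) - 1);
}
```

Both helpers rely on the page count being a power of two; they would give wrong results for any other divisor.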
2010-08-01  KVM: MMU: fix writable sync sp mapping  Xiao Guangrong
When we sync many unsync sps at one time (in mmu_sync_children()), we may map an spte writable, which is dangerous if the gfn mapped by one unsync sp is another unsync sp's gfn. For example: SP1.pte[0] = P, SP2.gfn's pfn = P [SP1.pte[0] = SP2.gfn's pfn]. First, we write-protect SP1 and SP2, but SP1 and SP2 are still unsync sps. Then we sync SP1 first; it detects that SP1.pte[0].gfn has only one unsync sp, namely SP2, so it maps it writable. But we plan to sync SP2 soon, and at this point SP2->unsync is not reliable, since we later sync SP2 while SP2->gfn is already writable. So the final result is: SP2 is a sync page but SP2.gfn is writable. This bug corrupts the guest's page table. Fix by marking the mapping read-only if the mapped gfn has shadow pages. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: Add mini-API for vcpu->requests  Avi Kivity
Makes it a little more readable and hackable. Signed-off-by: Avi Kivity <avi@redhat.com>
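A request mini-API of this kind usually boils down to two operations on a bitmask: post a request, and test-and-clear it. The following user-space sketch shows that shape with C11 atomics; the struct, the request numbers, and the function names are illustrative assumptions, not the kernel's definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request numbers for this sketch. */
enum { REQ_TLB_FLUSH = 0, REQ_MMU_RELOAD = 1 };

/* Minimal stand-in for a vcpu: just the pending-request bitmask. */
struct vcpu_sketch {
	atomic_ulong requests;
};

/* Post a request; safe to call from any thread. */
static inline void make_request(int req, struct vcpu_sketch *vcpu)
{
	atomic_fetch_or(&vcpu->requests, 1UL << req);
}

/*
 * Consume a request: returns true exactly once per posted request,
 * because the bit is cleared by the same atomic operation that reads it.
 */
static inline bool check_request(int req, struct vcpu_sketch *vcpu)
{
	unsigned long bit = 1UL << req;

	return atomic_fetch_and(&vcpu->requests, ~bit) & bit;
}
```

Wrapping the bit operations this way keeps call sites short and gives a single place to add tracing or ordering guarantees later.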
2010-08-01  KVM: Remove memory alias support  Avi Kivity
As advertised in feature-removal-schedule.txt. Equivalent support is provided by overlapping memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: don't walk every parent pages while mark unsync  Xiao Guangrong
While marking the parent's unsync_child_bitmap, if the parent is already unsynced, there is no need to walk its parents; this avoids some unnecessary work. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: MMU: clear unsync_child_bitmap completely  Xiao Guangrong
In the current code, some pages' unsync_child_bitmap is not cleared completely in mmu_sync_children(). For example, if two PDPEs share one PDT, one PDPE's unsync_child_bitmap is not cleared. Currently this does no harm beyond a little overhead, but it is preparatory work for a later patch. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: MMU: cleanup for __mmu_unsync_walk()  Xiao Guangrong
Decrease sp->unsync_children after clearing the unsync_child_bitmap bit. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: MMU: don't mark pte notrap if it's just sync transient  Xiao Guangrong
If the sp is only synced transiently, don't mark its pte notrap. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: MMU: avoid double write protected in sync page path  Xiao Guangrong
The sync page is already write-protected in mmu_sync_children(); don't write-protect it again. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: Fix mov cr3 #GP at wrong instruction  Avi Kivity
On Intel, we call skip_emulated_instruction() even if we injected a #GP, resulting in the #GP pointing at the wrong address. Fix by injecting the exception and skipping the instruction at the same place, so we can do just one or the other. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: MMU: delay local tlb flush  Xiao Guangrong
Delay the local tlb flush until entering guest mode. This reduces vpid flush frequency and reduces remote tlb flush IPIs (if the KVM_REQ_TLB_FLUSH bit is already set, no IPI is sent). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: use wrapper function to flush local tlb  Xiao Guangrong
Use kvm_mmu_flush_tlb() function instead of calling kvm_x86_ops->tlb_flush(vcpu) directly. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: remove unnecessary remote tlb flush  Xiao Guangrong
This remote tlb flush is not necessary, since we have synced while the sp is zapped. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: reduce remote tlb flush in kvm_mmu_pte_write()  Xiao Guangrong
Collect remote tlb flushes in the kvm_mmu_pte_write() path. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: traverse sp hlish safely  Xiao Guangrong
Now we can safely traverse the sp hlist. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: gather remote tlb flush which occurs during page zapped  Xiao Guangrong
Use kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page() instead of kvm_mmu_zap_page(), which reduces remote tlb flush IPIs. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: don't get free page number in the loop  Xiao Guangrong
In a later patch, we will change the way sps are zapped, as follows: kvm_mmu_prepare_zap_page A; kvm_mmu_prepare_zap_page B; kvm_mmu_prepare_zap_page C; ....; kvm_mmu_commit_zap_page [zapping multiple sps only needs one call to kvm_mmu_commit_zap_page]. In __kvm_mmu_free_some_pages(), the free page count is read from 'vcpu->kvm->arch.n_free_mmu_pages' inside the loop, which prevents us from applying kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page(), since kvm_mmu_prepare_zap_page() does not free the sp. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: split the operations of kvm_mmu_zap_page()  Xiao Guangrong
Use kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page() to split kvm_mmu_zap_page(), so that we can: - traverse the hlist safely - easily gather the remote tlb flushes that occur while pages are zapped. These features are used in later patches. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
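The prepare/commit split can be illustrated with a small user-space model: "prepare" only unlinks a page and queues it, while "commit" pays for one flush and then frees the whole batch. All types, fields, and function names here are hypothetical stand-ins; counters replace the real flush and free operations.

```c
#include <stddef.h>

/* Hypothetical shadow page: only what the sketch needs. */
struct sp_sketch {
	struct sp_sketch *next_invalid; /* link on the pending-free list */
	int zapped;
};

/* Hypothetical MMU state with counters in place of real side effects. */
struct mmu_sketch {
	struct sp_sketch *invalid_list;
	int remote_flushes;
	int freed;
};

/* Phase 1: mark the page dead and queue it. No flush, no free. */
static void prepare_zap(struct mmu_sketch *mmu, struct sp_sketch *sp)
{
	sp->zapped = 1;
	sp->next_invalid = mmu->invalid_list;
	mmu->invalid_list = sp;
}

/* Phase 2: one flush covers every queued page, then free them all. */
static void commit_zap(struct mmu_sketch *mmu)
{
	if (!mmu->invalid_list)
		return;

	mmu->remote_flushes++; /* single flush for the whole batch */

	while (mmu->invalid_list) {
		struct sp_sketch *sp = mmu->invalid_list;

		mmu->invalid_list = sp->next_invalid;
		sp->next_invalid = NULL;
		mmu->freed++; /* stands in for actually freeing the page */
	}
}
```

Because nothing is freed during the prepare phase, a caller can keep iterating over a list that contains the prepared pages, and the cost of the flush is paid once per batch instead of once per page.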
2010-08-01  KVM: MMU: introduce some macros to cleanup hlist traverseing  Xiao Guangrong
Introduce for_each_gfn_sp() and for_each_gfn_indirect_valid_sp() to clean up hlist traversal. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
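A common way to build such filtering iterators in C is a loop macro followed by `if (skip) {} else`, so the statement after the macro becomes the loop body for matching entries only. The sketch below applies that pattern to a plain singly linked list; the node type and macro names are made up for this example and do not reproduce the kernel macros or their hlist plumbing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: a shadow page reduced to the fields being filtered. */
struct sp_node {
	uint64_t gfn;
	int direct;
	int invalid;
	struct sp_node *next;
};

/* Visit every node whose gfn matches `want`. */
#define for_each_gfn_node(sp, head, want)                          \
	for ((sp) = (head); (sp); (sp) = (sp)->next)                   \
		if ((sp)->gfn != (want)) {} else

/* Visit matching nodes that are also indirect and still valid. */
#define for_each_gfn_indirect_valid_node(sp, head, want)           \
	for ((sp) = (head); (sp); (sp) = (sp)->next)                   \
		if ((sp)->gfn != (want) || (sp)->direct || (sp)->invalid) {} else

/* Example caller: the filter lives in the macro, not in the loop body. */
static int count_indirect_valid(struct sp_node *head, uint64_t gfn)
{
	struct sp_node *sp;
	int n = 0;

	for_each_gfn_indirect_valid_node(sp, head, gfn)
		n++;
	return n;
}
```

The empty `{}` branch swallows non-matching entries, and the trailing `else` binds to the caller's statement, which avoids the dangling-else surprises a bare `if` inside a macro would invite.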
2010-08-01  KVM: MMU: skip invalid sp when unprotect page  Xiao Guangrong
In kvm_mmu_unprotect_page(), invalid sps can be skipped. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: Don't calculate quadrant if tdp_enabled  Gui Jianfeng
There's no need to calculate quadrant if tdp is enabled. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: MMU: Allow spte.w=1 for gpte.w=0 and cr0.wp=0 only in shadow mode  Avi Kivity
When tdp is enabled, the guest's cr0.wp shouldn't have any effect on spte permissions. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: don't check PT_WRITABLE_MASK directly  Gui Jianfeng
Since we have is_writable_pte(), make use of it. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: Calculate correct base gfn for direct non-DIR level  Lai Jiangshan
Documentation/kvm/mmu.txt says: "gfn: Either the guest page table containing the translations shadowed by this page, or the base page frame for linear translations. See role.direct." But in __direct_map(), the base gfn calculation is incorrect: it does not calculate correctly when level=3 or 4. Fix by using PT64_LVL_ADDR_MASK(), which accounts for all levels correctly. Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
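The level-dependent masking can be sketched as follows. This is an approximation under stated assumptions: 4 KiB pages, 9 index bits per paging level, and hypothetical names (`lvl_addr_mask`, `base_gfn`); unlike the real PT64_LVL_ADDR_MASK() it does not also clip the high, non-address bits.

```c
#include <stdint.h>

/* Assumed geometry for this sketch: 4 KiB pages, 9 index bits per level. */
#define PAGE_SHIFT_SKETCH 12
#define LEVEL_BITS_SKETCH 9

/* Mask that clears every address bit below the region one entry at `level` maps. */
static inline uint64_t lvl_addr_mask(int level)
{
	return ~((1ULL << (PAGE_SHIFT_SKETCH + (level - 1) * LEVEL_BITS_SKETCH)) - 1);
}

/* First guest frame number of the level-aligned region containing `addr`. */
static inline uint64_t base_gfn(uint64_t addr, int level)
{
	return (addr & lvl_addr_mask(level)) >> PAGE_SHIFT_SKETCH;
}
```

With a fixed directory-level mask, levels 3 and 4 would be aligned only to 2 MiB; deriving the mask from the level aligns them to 1 GiB and 512 GiB respectively, which is the distinction the commit message points at.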
2010-08-01  KVM: MMU: Don't allocate gfns page for direct mmu pages  Lai Jiangshan
When sp->role.direct is set, sp->gfns does not contain any essential information: leaf sptes reachable from this sp map a contiguous guest physical memory range (a linear range). So sp->gfns[i] (if it were set) equals sp->gfn + i (at PT_PAGE_TABLE_LEVEL). Since this is not essential information, we can calculate it when needed. That means we don't need sp->gfns when sp->role.direct=1, so we save one page for every such kvm_mmu_page. Note: Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn() or kvm_mmu_page_set_gfn(). It is only exposed in FNAME(sync_page). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
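An accessor along these lines might look like the sketch below: indirect pages read the stored array, direct pages compute the value from the base gfn. The struct layout and the name `page_get_gfn` are assumptions for illustration, and scaling the index by 9 bits per level above the leaf level is this sketch's generalization of the `sp->gfn + i` rule stated for PT_PAGE_TABLE_LEVEL.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical shadow page descriptor, reduced to what the accessor uses. */
struct mmu_page_sketch {
	gfn_t gfn;   /* base gfn of the range this page maps */
	int direct;  /* non-zero: linear mapping, gfns[] not allocated */
	int level;   /* 1 = leaf page table level */
	gfn_t *gfns; /* per-entry gfns; NULL when direct */
};

/* Return the gfn mapped by entry `index` of `sp`. */
static gfn_t page_get_gfn(const struct mmu_page_sketch *sp, int index)
{
	if (!sp->direct)
		return sp->gfns[index];

	/* Linear range: entry i starts i * 512^(level - 1) frames past the base. */
	return sp->gfn + ((gfn_t)index << ((sp->level - 1) * 9));
}
```

Routing every read through one accessor is what makes it safe to leave `gfns` unallocated for direct pages: no caller can dereference the missing array by accident.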
2010-08-01  KVM: MMU: allow more page become unsync at getting sp time  Xiao Guangrong
Allow more pages to become unsync at the time an sp is obtained: if we need to create a new shadow page for a gfn but it is not allowed to be unsync (level > 1), we should sync all of the gfn's unsync pages. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: allow more page become unsync at gfn mapping time  Xiao Guangrong
In the current code, a shadow page can become unsync only if it is the only shadow page for a gfn. This rule is too strict; in fact, we can let all last-level mapping pages (i.e., pte pages) become unsync, and sync them at invlpg or tlb flush time. This patch allows more pages to become unsync at gfn mapping time. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: Update Red Hat copyrights  Avi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: don't write-protect if have new mapping to unsync page  Xiao Guangrong
Two cases may occur in kvm_mmu_get_page(): - In the first case, the target sp is already in the cache. If the sp is unsync, we only need to update it to ensure this mapping is valid, without marking it sync or write-protecting sp->gfn, since this does not break the unsync rule (one shadow page for a gfn). - In the other case, the target sp does not exist and we need to create a new sp for the gfn, i.e., the gfn may have another shadow page. To keep the unsync rule, we should sync (mark sync and write-protect) the gfn's unsync shadow page. After enabling multiple unsync shadows, we sync those shadow pages only when the new sp is not allowed to become unsync (again per the unsync rule; the new rule is: allow all pte pages to become unsync). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: split kvm_sync_page() function  Xiao Guangrong
Split kvm_sync_page() into kvm_sync_page() and kvm_sync_page_transient() to clarify the code, addressing Avi's suggestion. kvm_sync_page_transient() only updates the shadow page; it does not mark it sync and does not write-protect sp->gfn. It will be used by a later patch. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01  KVM: MMU: remove rmap before clear spte  Xiao Guangrong
Remove the rmap before clearing the spte; otherwise it will trigger BUG_ON() in some functions such as rmap_write_protect(). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: MMU: use proper cache object freeing function  Xiao Guangrong
Use kmem_cache_free to free objects allocated by kmem_cache_alloc. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: x86: Clean up duplicate assignment  Sheng Yang
mmu.free() already sets root_hpa to INVALID_PAGE, so there is no need to do it again in destroy_kvm_mmu(). kvm_x86_ops->set_cr4() and set_efer() already assign cr4/efer to vcpu->arch.cr4/efer, so there is no need to do it again later. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: pass correct parameter to kvm_mmu_free_some_pages  Marcelo Tosatti
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: MMU: Fix free memory accounting race in mmu_alloc_roots()  Avi Kivity
We drop the mmu lock between freeing memory and allocating the roots; this allows some other vcpu to sneak in and allocate memory. While the race is benign (resulting only in temporary overallocation, not oom) it is simple and easy to fix by moving the freeing close to the allocation. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-01  KVM: inject #UD if instruction emulation fails and exit to userspace  Gleb Natapov
Do not kill the VM when instruction emulation fails. Inject #UD and report the failure to userspace instead. Userspace may choose to reenter the guest if the vcpu is in userspace (cpl == 3), in which case the guest OS will kill the offending process and continue running. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>