linux-toradex.git - Linux kernel for Apalis and Colibri modules

Age	Commit message (Collapse)	Author
2010-05-19	x86, paravirt: Add a global synchronization point for pvclock	Glauber Costa
	In recent stress tests, it was found that pvclock-based systems could seriously warp in smp systems. Using ingo's time-warp-test.c, I could trigger a scenario as bad as 1.5mi warps a minute in some systems. (to be fair, it wasn't that bad in most of them). Investigating further, I found out that such warps were caused by the very offset-based calculation pvclock is based on. This happens even on some machines that report constant_tsc in its tsc flags, specially on multi-socket ones. Two reads of the same kernel timestamp at approx the same time, will likely have tsc timestamped in different occasions too. This means the delta we calculate is unpredictable at best, and can probably be smaller in a cpu that is legitimately reading clock in a forward ocasion. Some adjustments on the host could make this window less likely to happen, but still, it pretty much poses as an intrinsic problem of the mechanism. A while ago, I though about using a shared variable anyway, to hold clock last state, but gave up due to the high contention locking was likely to introduce, possibly rendering the thing useless on big machines. I argue, however, that locking is not necessary. We do a read-and-return sequence in pvclock, and between read and return, the global value can have changed. However, it can only have changed by means of an addition of a positive value. So if we detected that our clock timestamp is less than the current global, we know that we need to return a higher one, even though it is not exactly the one we compared to. OTOH, if we detect we're greater than the current time source, we atomically replace the value with our new readings. This do causes contention on big boxes (but big here means BIG), but it seems like a good trade off, since it provide us with a time source guaranteed to be stable wrt time warps. After this patch is applied, I don't see a single warp in time during 5 days of execution, in any of the machines I saw them before. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> CC: Jeremy Fitzhardinge <jeremy@goop.org> CC: Avi Kivity <avi@redhat.com> CC: Marcelo Tosatti <mtosatti@redhat.com> CC: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19	x86, paravirt: Enable pvclock flags in vcpu_time_info structure	Glauber Costa
	This patch removes one padding byte and transform it into a flags field. New versions of guests using pvclock will query these flags upon each read. Flags, however, will only be interpreted when the guest decides to. It uses the pvclock_valid_flags function to signal that a specific set of flags should be taken into consideration. Which flags are valid are usually devised via HV negotiation. Signed-off-by: Glauber Costa <glommer@redhat.com> CC: Jeremy Fitzhardinge <jeremy@goop.org> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19	KVM: x86: Inject #GP with the right rip on efer writes	Roedel, Joerg
	This patch fixes a bug in the KVM efer-msr write path. If a guest writes to a reserved efer bit the set_efer function injects the #GP directly. The architecture dependent wrmsr function does not see this, assumes success and advances the rip. This results in a #GP in the guest with the wrong rip. This patch fixes this by reporting efer write errors back to the architectural wrmsr function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19	KVM: SVM: Don't allow nested guest to VMMCALL into host	Joerg Roedel
	This patch disables the possibility for a l2-guest to do a VMMCALL directly into the host. This would happen if the l1-hypervisor doesn't intercept VMMCALL and the l2-guest executes this instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19	KVM: x86: Fix exception reinjection forced to true	Joerg Roedel
	The patch merged recently which allowed to mark an exception as reinjected has a bug as it always marks the exception as reinjected. This breaks nested-svm shadow-on-shadow implementation. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19	KVM: Fix wallclock version writing race	Avi Kivity
	Wallclock writing uses an unprotected global variable to hold the version; this can cause one guest to interfere with another if both write their wallclock at the same time. Acked-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19	KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots	Avi Kivity
	On svm, kvm_read_pdptr() may require reading guest memory, which can sleep. Push the spinlock into mmu_alloc_roots(), and only take it after we've read the pdptr. Tested-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19	KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)	Shane Wang
	Per document, for feature control MSR: Bit 1 enables VMXON in SMX operation. If the bit is clear, execution of VMXON in SMX operation causes a general-protection exception. Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution of VMXON outside SMX operation causes a general-protection exception. This patch is to enable this kind of check with SMX for VMXON in KVM. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19	KVM: x86: properly update ready_for_interrupt_injection	Marcelo Tosatti
	The recent changes to emulate string instructions without entering guest mode exposed a bug where pending interrupts are not properly reflected in ready_for_interrupt_injection. The result is that userspace overwrites a previously queued interrupt, when irqchip's are emulated in userspace. Fix by always updating state before returning to userspace. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19	KVM: VMX: Atomically switch efer if EPT && !EFER.NX	Avi Kivity
	When EPT is enabled, we cannot emulate EFER.NX=0 through the shadow page tables. This causes accesses through ptes with bit 63 set to succeed instead of failing a reserved bit check. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19	KVM: VMX: Add facility to atomically switch MSRs on guest entry/exit	Avi Kivity
	Some guest msr values cannot be used on the host (for example. EFER.NX=0), so we need to switch them atomically during guest entry or exit. Add a facility to program the vmx msr autoload registers accordingly. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19	KVM: VMX: Add definitions for guest and host EFER autoswitch vmcs entries	Avi Kivity
	Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19	KVM: VMX: Add definition for msr autoload entry	Avi Kivity
	Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19	KVM: Let vcpu structure alignment be determined at runtime	Avi Kivity
	vmx and svm vcpus have different contents and therefore may have different alignmment requirements. Let each specify its required alignment. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19	KVM: MMU: cleanup invlpg code	Xiao Guangrong
	Using is_last_spte() to cleanup invlpg code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19	KVM: MMU: move unsync/sync tracpoints to proper place	Xiao Guangrong
	Move unsync/sync tracepoints to the proper place, it's good for us to obtain unsync page live time Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19	KVM: MMU: convert mmu tracepoints	Xiao Guangrong
	Convert mmu tracepoints by using DECLARE_EVENT_CLASS Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19	KVM: MMU: fix for calculating gpa in invlpg code	Xiao Guangrong
	If the guest is 32-bit, we should use 'quadrant' to adjust gpa offset Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19	KVM: powerpc: use of kzalloc/kfree requires including slab.h	Stephen Rothwell
	Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19	KVM: Fix mmu shrinker error	Gui Jianfeng
	kvm_mmu_remove_one_alloc_mmu_page() assumes kvm_mmu_zap_page() only reclaims only one sp, but that's not the case. This will cause mmu shrinker returns a wrong number. This patch fix the counting error. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19	KVM: MMU: fix hashing for TDP and non-paging modes	Eric Northup
	For TDP mode, avoid creating multiple page table roots for the single guest-to-host physical address map by fixing the inputs used for the shadow page table hash in mmu_alloc_roots(). Signed-off-by: Eric Northup <digitaleric@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17	KVM: MMU: fix sp->unsync type error in trace event definition	Gui Jianfeng
	sp->unsync is bool now, so update trace event declaration. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17	KVM: SVM: Handle MCE intercepts always on host level	Joerg Roedel
	This patch prevents MCE intercepts from being propagated into the L1 guest if they happened in an L2 guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: x86: Allow marking an exception as reinjected	Joerg Roedel
	This patch adds logic to kvm/x86 which allows to mark an injected exception as reinjected. This allows to remove an ugly hack from svm_complete_interrupts that prevented exceptions from being reinjected at all in the nested case. The hack was necessary because an reinjected exception into the nested guest could cause a nested vmexit emulation. But reinjected exceptions must not intercept. The downside of the hack is that a exception that in injected could get lost. This patch fixes the problem and puts the code for it into generic x86 files because. Nested-VMX will likely have the same problem and could reuse the code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: SVM: Report emulated SVM features to userspace	Joerg Roedel
	This patch implements the reporting of the emulated SVM features to userspace instead of the real hardware capabilities. Every real hardware capability needs emulation in nested svm so the old behavior was broken. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: x86: Add callback to let modules decide over some supported cpuid bits	Joerg Roedel
	This patch adds the get_supported_cpuid callback to kvm_x86_ops. It will be used in do_cpuid_ent to delegate the decission about some supported cpuid bits to the architecture modules. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: SVM: Propagate nested entry failure into guest hypervisor	Joerg Roedel
	This patch implements propagation of a failes guest vmrun back into the guest instead of killing the whole guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: SVM: Sync cr0 and cr3 to kvm state before nested handling	Joerg Roedel
	This patch syncs cr0 and cr3 from the vmcb to the kvm state before nested intercept handling is done. This allows to simplify the vmexit path. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: SVM: Make sure rip is synced to vmcb before nested vmexit	Joerg Roedel
	This patch fixes a bug where a nested guest always went over the same instruction because the rip was not advanced on a nested vmexit. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: SVM: Fix nested nmi handling	Joerg Roedel
	The patch introducing nested nmi handling had a bug. The check does not belong to enable_nmi_window but must be in nmi_allowed. This patch fixes this. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	Merge remote branch 'tip/perf/core'	Avi Kivity
	Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: VMX: free vpid when fail to create vcpu	Lai Jiangshan
	Fix bug of the exception path, free allocated vpid when fail to create vcpu. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Enable native paired singles	Alexander Graf
	When we're on a paired single capable host, we can just always enable paired singles and expose them to the guest directly. This approach breaks when multiple VMs run and access PS concurrently, but this should suffice until we get a proper framework for it in Linux. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Find HTAB ourselves	Alexander Graf
	For KVM we need to find the location of the HTAB. We can either rely on internal data structures of the kernel or ask the hardware. Ben issued complaints about the internal data structure method, so let's switch it to our own inquiry of the HTAB. Now we're fully independend :-). CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Fix Book3S_64 Host MMU debug output	Alexander Graf
	We have some debug output in Book3S_64. Some of that was invalid though, partially not even compiling because it accessed incorrect variables. So let's fix that up, making debugging more fun again. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Set VSID_PR also for Book3S_64	Alexander Graf
	Book3S_64 didn't set VSID_PR when we're in PR=1. This lead to pretty bad behavior when searching for the shadow segment, as part of the code relied on VSID_PR being set. This patch fixes booting Book3S_64 guests. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Be more informative on BUG	Alexander Graf
	We have a condition in the ppc64 host mmu code that should never occur. Unfortunately, it just did happen to me and I was rather puzzled on why, because BUG_ON doesn't tell me anything useful. So let's add some more debug output in case this goes wrong. Also change BUG to WARN, since I don't want to reboot every time I mess something up. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Make Alignment interrupts work again	Alexander Graf
	In the process of merging Book3S_32 and 64 I somehow ended up having the alignment interrupt handler take last_inst, but the fetching code not fetching it. So we ended up with stale last_inst values. Let's just enable last_inst fetching for alignment interrupts too. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Improve split mode	Alexander Graf
	When in split mode, instruction relocation and data relocation are not equal. So far we implemented this mode by reserving a special pseudo-VSID for the two cases and flushing all PTEs when going into split mode, which is slow. Unfortunately 32bit Linux and Mac OS X use split mode extensively. So to not slow down things too much, I came up with a different idea: Mark the split mode with a bit in the VSID and then treat it like any other segment. This means we can just flush the shadow segment cache, but keep the PTEs intact. I verified that this works with ppc32 Linux and Mac OS X 10.4 guests and does speed them up. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Make Performance Counters work	Alexander Graf
	When we get a performance counter interrupt we need to route it on to the Linux handler after we got out of the guest context. We also need to tell our handling code that this particular interrupt doesn't need treatment. So let's add those two bits in, making perf work while having a KVM guest running. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Convert u64 -> ulong	Alexander Graf
	There are some pieces in the code that I overlooked that still use u64s instead of longs. This slows down 32 bit hosts unnecessarily, so let's just move them to ulong. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Enable Book3S_32 KVM building	Alexander Graf
	Now that we have all the bits and pieces in place, let's enable building of the Book3S_32 target. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Add KVM intercept handlers	Alexander Graf
	When an interrupt occurs we don't know yet if we're in guest context or in host context. When in guest context, KVM needs to handle it. So let's pull the same trick we did on Book3S_64: Just add a macro to determine if we're in guest context or not and if so jump on to KVM code. CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Check max IRQ prio	Alexander Graf
	We have a define on what the highest bit of IRQ priorities is. So we can just as well use it in the bit checking code and avoid invalid IRQ values to be triggered. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	PPC: Export SWITCH_FRAME_SIZE	Alexander Graf
	We need the SWITCH_FRAME_SIZE define on Book3S_32 now too. So let's export it unconditionally. CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Export MMU variables	Alexander Graf
	Our shadow MMU code needs to know where the HTAB is located and how big it is. So we need some variables from the kernel exported to module space if KVM is built as a module. CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Add Book3S compatibility code	Alexander Graf
	Some code we had so far required defines and had code that was completely Book3S_64 specific. Since we now opened book3s.c to Book3S_32 too, we need to take care of these pieces. So let's add some minor code where it makes sense to not go the Book3S_64 code paths and add compat defines on others. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Emulate segment fault	Alexander Graf
	Book3S_32 doesn't know about segment faults. It only knows about page faults. So in order to know that we didn't map a segment, we need to fake segment faults. We do this by setting invalid segment registers to an invalid VSID and then check for that VSID on normal page faults. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Add SVCPU to Book3S_32	Alexander Graf
	We need to keep the pointer to the shadow vcpu somewhere accessible from within really early interrupt code. The best fit I found was the thread struct, as that resides in an SPRG. So let's put a pointer to the shadow vcpu in the thread struct and add an asm-offset so we can find it. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17	KVM: PPC: Remove fetch fail code	Alexander Graf
	When instruction fetch failed, the inline function hook automatically detects that and starts the internal guest memory load function. So whenever we access kvmppc_get_last_inst(), we're sure the result is sane. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>