summaryrefslogtreecommitdiff
path: root/arch/x86/virt/hw.c
AgeCommit message (Collapse)Author
2026-04-09x86/virt: Treat SVM as unsupported when running as an SEV+ guestSean Christopherson
When running as an SEV+ guest, treat SVM as unsupported even if CPUID (and other reporting, e.g. MSRs) enumerate support for SVM, as KVM doesn't support nested virtualization within an SEV VM (KVM would need to explicitly share all VMCBs and other assets with the untrusted host), let alone running nested VMs within SEV-ES+ guests (e.g. emulating VMLOAD, VMSAVE, and VMRUN all require access to guest register state). And outside of KVM, there is no in-tree user of SVM enabling. Arguably, the hypervisor/VMM (e.g. QEMU) should clear SVM from guest CPUID for SEV VMs, especially for SEV-ES+, but super duper technically, it's feasible to run nested VMs in SEV+ guests (with many caveats). More importantly, Linux-as-a-guest has played nice with SVM being advertised to SEV+ guests for a long time. Treating SVM as unsupported fixes a regression where a clean shutdown of an SEV-ES+ guest degrades into an abrupt termination. Due to a gnarly virtualization hole in SEV-ES (the architecture), where EFER must NOT be intercepted by the hypervisor (because the untrusted hypervisor can't set e.g. EFER.LME on behalf o the guest), the _host's_ EFER.SVME is visible to the guest. Because EFER.SVME must be always '1' while in guest mode, Linux-the-guest sees EFER.SVME=1 even when _its_ EFER.SVME is '0', thinks it has enabled virtualization, and ultimately can cause x86_svm_emergency_disable_virtualization_cpu() to execute STGI to ensure GIF is enabled. Executing STGI _should_ be fine, except Linux is a also wee bit paranoid when running as an SEV-ES guest. Because L0 sees EFER.SVME=0 for the guest, a well-behaved L0 hypervisor will intercept STGI (to inject #UD), and thus generate a #VC on the STGI. Which, again, should be fine. Unfortunately, vc_check_opcode_bytes() fails to account for STGI and other SVM instructions, throws a fatal error, and triggers a termination request. In a perfect world, the #VC handler would be more forgiving of unknown intercepts, especially when the #VC happened on an instruction with exception fixup. For now, just fix the immediate regression. Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem") Reported-by: Srikanth Aithal <sraithal@amd.com> Closes: https://lore.kernel.org/all/c820e242-9f3a-4210-b414-19d11b022404@amd.com Link: https://patch.msgid.link/20260409191341.1932853-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04x86/virt: Add refcounting of VMX/SVM usage to support multiple in-kernel usersSean Christopherson
Implement a per-CPU refcounting scheme so that "users" of hardware virtualization, e.g. KVM and the future TDX code, can co-exist without pulling the rug out from under each other. E.g. if KVM were to disable VMX on module unload or when the last KVM VM was destroyed, SEAMCALLs from the TDX subsystem would #UD and panic the kernel. Disable preemption in the get/put APIs to ensure virtualization is fully enabled/disabled before returning to the caller. E.g. if the task were preempted after a 0=>1 transition, the new task would see a 1=>2 and thus return without enabling virtualization. Explicitly disable preemption instead of requiring the caller to do so, because the need to disable preemption is an artifact of the implementation. E.g. from KVM's perspective there is no _need_ to disable preemption as KVM guarantees the pCPU on which it is running is stable (but preemption is enabled). Opportunistically abstract away SVM vs. VMX in the public APIs by using X86_FEATURE_{SVM,VMX} to communicate what technology the caller wants to enable and use. Cc: Xu Yilun <yilun.xu@linux.intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystemSean Christopherson
Move the majority of the code related to disabling hardware virtualization in emergency from KVM into the virt subsystem so that virt can take full ownership of the state of SVM/VMX. This will allow refcounting usage of SVM/VMX so that KVM and the TDX subsystem can enable VMX without stomping on each other. To route the emergency callback to the "right" vendor code, add to avoid mixing vendor and generic code, implement a x86_virt_ops structure to track the emergency callback, along with the SVM vs. VMX (vs. "none") feature that is active. To avoid having to choose between SVM and VMX, simply refuse to enable either if both are somehow supported. No known CPU supports both SVM and VMX, and it's comically unlikely such a CPU will ever exist. Leave KVM's clearing of loaded VMCSes and MSR_VM_HSAVE_PA in KVM, via a callback explicitly scoped to KVM. Loading VMCSes and saving/restoring host state are firmly tied to running VMs, and thus are (a) KVM's responsibility and (b) operations that are still exclusively reserved for KVM (as far as in-tree code is concerned). I.e. the contract being established is that non-KVM subsystems can utilize virtualization, but for all intents and purposes cannot act as full-blown hypervisors. Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04KVM: SVM: Move core EFER.SVME enablement to kernelSean Christopherson
Move the innermost EFER.SVME logic out of KVM and into to core x86 to land the SVM support alongside VMX support. This will allow providing a more unified API from the kernel to KVM, and will allow moving the bulk of the emergency disabling insanity out of KVM without having a weird split between kernel and KVM for SVM vs. VMX. No functional change intended. Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04KVM: VMX: Move core VMXON enablement to kernelSean Christopherson
Move the innermost VMXON+VMXOFF logic out of KVM and into to core x86 so that TDX can (eventually) force VMXON without having to rely on KVM being loaded, e.g. to do SEAMCALLs during initialization. Opportunistically update the comment regarding emergency disabling via NMI to clarify that virt_rebooting will be set by _another_ emergency callback, i.e. that virt_rebooting doesn't need to be set before VMCLEAR, only before _this_ invocation does VMXOFF. Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04x86/virt: Force-clear X86_FEATURE_VMX if configuring root VMCS failsSean Christopherson
If allocating and configuring a root VMCS fails, clear X86_FEATURE_VMX in all CPUs so that KVM doesn't need to manually check root_vmcs. As added bonuses, clearing VMX will reflect that VMX is unusable in /proc/cpuinfo, and will avoid a futile auto-probe of kvm-intel.ko. WARN if allocating a root VMCS page fails, e.g. to help users figure out why VMX is broken in the unlikely scenario something goes sideways during boot (and because the allocation should succeed unless there's a kernel bug). Tweak KVM's error message to suggest checking kernel logs if VMX is unsupported (in addition to checking BIOS). Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04KVM: VMX: Unconditionally allocate root VMCSes during boot CPU bringupSean Christopherson
Allocate the root VMCS (misleading called "vmxarea" and "kvm_area" in KVM) for each possible CPU during early boot CPU bringup, before early TDX initialization, so that TDX can eventually do VMXON on-demand (to make SEAMCALLs) without needing to load kvm-intel.ko. Allocate the pages early on, e.g. instead of trying to do so on-demand, to avoid having to juggle allocation failures at runtime. Opportunistically rename the per-CPU pointers to better reflect the role of the VMCS. Use Intel's "root VMCS" terminology, e.g. from various VMCS patents[1][2] and older SDMs, not the more opaque "VMXON region" used in recent versions of the SDM. While it's possible the VMCS passed to VMXON no longer serves as _the_ root VMCS on modern CPUs, it is still in effect a "root mode VMCS", as described in the patents. Link: https://patentimages.storage.googleapis.com/c7/e4/32/d7a7def5580667/WO2013101191A1.pdf [1] Link: https://patentimages.storage.googleapis.com/13/f6/8d/1361fab8c33373/US20080163205A1.pdf [2] Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04KVM: x86: Move "kvm_rebooting" to kernel as "virt_rebooting"Sean Christopherson
Move "kvm_rebooting" to the kernel, exported for KVM, as one of many steps towards extracting the innermost VMXON and EFER.SVME management logic out of KVM and into to core x86. For lack of a better name, call the new file "hw.c", to yield "virt hardware" when combined with its parent directory. No functional change intended. Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>