path: root/arch/arm64/kvm/hyp
Age | Commit message | Author
2026-03-05 | KVM: arm64: pkvm: Fallback to level-3 mapping on host stage-2 fault | Marc Zyngier
If, for any odd reason, we cannot converge to a mapping size that is completely contained in a memblock region, we fail to install a S2 mapping and go back to the faulting instruction. Rinse, repeat. This happens when faulting in regions that are smaller than a page or that do not have PAGE_SIZE-aligned boundaries (as witnessed on an O6 board that refuses to boot in protected mode). In this situation, fall back to using a PAGE_SIZE mapping anyway -- it isn't like we can go any lower. Fixes: e728e705802fe ("KVM: arm64: Adjust range correctly during host stage-2 faults") Link: https://lore.kernel.org/r/86wlzr77cn.wl-maz@kernel.org Cc: stable@vger.kernel.org Cc: Quentin Perret <qperret@google.com> Reviewed-by: Quentin Perret <qperret@google.com> Link: https://patch.msgid.link/20260305132751.2928138-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
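The fallback described above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct region`, `level_size` and `pick_mapping_size()` are hypothetical names, a 4KiB granule is assumed, and the real hypervisor code is structured differently.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Hypothetical stand-in for a memblock region covering [start, end). */
struct region {
	uint64_t start;
	uint64_t end;
};

/* Block sizes for an assumed 4KiB granule: level 1, level 2, level 3. */
static const uint64_t level_size[] = {
	UINT64_C(1) << 30,
	UINT64_C(1) << 21,
	PAGE_SIZE,
};

/*
 * Pick the largest naturally aligned block around addr that is fully
 * contained in reg. If not even a page fits (region smaller than a page,
 * or boundaries not page-aligned), fall back to a single page rather
 * than failing and re-taking the same fault forever.
 */
static uint64_t pick_mapping_size(const struct region *reg, uint64_t addr,
				  uint64_t *base)
{
	for (unsigned int i = 0; i < 3; i++) {
		uint64_t start = addr & ~(level_size[i] - 1);
		uint64_t end = start + level_size[i];

		if (start >= reg->start && end <= reg->end) {
			*base = start;
			return level_size[i];
		}
	}

	/* Nothing converged: use a level-3 (page) mapping anyway. */
	*base = addr & ~(PAGE_SIZE - 1);
	return PAGE_SIZE;
}
```

For a region that starts and ends inside a single page, the loop finds no contained block and the function still returns a page-sized mapping, which is the behaviour the commit message asks for.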
2026-03-01 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds
Pull kvm fixes from Paolo Bonzini:
 "Arm:

   - Make sure we don't leak any S1POE state from guest to guest when the feature is supported on the HW, but not enabled on the host

   - Propagate the ID registers from the host into non-protected VMs managed by pKVM, ensuring that the guest sees the intended feature set

   - Drop double kern_hyp_va() from unpin_host_sve_state(), which could bite us if we were to change kern_hyp_va() to not being idempotent

   - Don't leak stage-2 mappings in protected mode

   - Correctly align the faulting address when dealing with single page stage-2 mappings for PAGE_SIZE > 4kB

   - Fix detection of virtualisation-capable GICv5 IRS, due to the maintainer being obviously fat fingered... [his words, not mine]

   - Remove duplication of code retrieving the ASID for the purpose of S1 PT handling

   - Fix slightly abusive const-ification in vgic_set_kvm_info()

  Generic:

   - Remove internal Kconfigs that are now set on all architectures

   - Remove per-architecture code to enable KVM_CAP_SYNC_MMU, all architectures finally enable it in Linux 7.0"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: always define KVM_CAP_SYNC_MMU
  KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER
  KVM: arm64: Deduplicate ASID retrieval code
  irqchip/gic-v5: Fix inversion of IRS_IDR0.virt flag
  KVM: arm64: Revert accidental drop of kvm_uninit_stage2_mmu() for non-NV VMs
  KVM: arm64: Fix protected mode handling of pages larger than 4kB
  KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type
  KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state()
  KVM: arm64: Fix ID register initialization for non-protected pKVM guests
  KVM: arm64: Optimise away S1POE handling when not supported by host
  KVM: arm64: Hide S1POE from guests when not supported by the host
2026-02-25 | arm64: tlb: Optimize ARM64_WORKAROUND_REPEAT_TLBI | Mark Rutland
The ARM64_WORKAROUND_REPEAT_TLBI workaround is used to mitigate several errata where broadcast TLBI;DSB sequences don't provide all the architecturally required synchronization. The workaround performs more work than necessary, and can have significant overhead. This patch optimizes the workaround, as explained below.

The workaround was originally added for Qualcomm Falkor erratum 1009 in commit:

  d9ff80f83ecb ("arm64: Work around Falkor erratum 1009")

As noted in the message for that commit, the workaround is applied even in cases where it is not strictly necessary.

The workaround was later reused without changes for:

* Arm Cortex-A76 erratum #1286807
  SDEN v33: https://developer.arm.com/documentation/SDEN-885749/33-0/

* Arm Cortex-A55 erratum #2441007
  SDEN v16: https://developer.arm.com/documentation/SDEN-859338/1600/

* Arm Cortex-A510 erratum #2441009
  SDEN v19: https://developer.arm.com/documentation/SDEN-1873351/1900/

The important details to note are as follows:

1. All relevant errata only affect the ordering and/or completion of memory accesses which have been translated by an invalidated TLB entry. The actual invalidation of TLB entries is unaffected.

2. The existing workaround is applied to both broadcast and local TLB invalidation, whereas for all relevant errata it is only necessary to apply a workaround for broadcast invalidation.

3. The existing workaround replaces every TLBI with a TLBI;DSB;TLBI sequence, whereas for all relevant errata it is only necessary to execute a single additional TLBI;DSB sequence after any number of TLBIs are completed by a DSB.

   For example, for a sequence of batched TLBIs:

     TLBI <op1>[, <arg1>]
     TLBI <op2>[, <arg2>]
     TLBI <op3>[, <arg3>]
     DSB ISH

   ... the existing workaround will expand this to:

     TLBI <op1>[, <arg1>]
     DSB ISH                  // additional
     TLBI <op1>[, <arg1>]     // additional
     TLBI <op2>[, <arg2>]
     DSB ISH                  // additional
     TLBI <op2>[, <arg2>]     // additional
     TLBI <op3>[, <arg3>]
     DSB ISH                  // additional
     TLBI <op3>[, <arg3>]     // additional
     DSB ISH

   ... whereas it is sufficient to have:

     TLBI <op1>[, <arg1>]
     TLBI <op2>[, <arg2>]
     TLBI <op3>[, <arg3>]
     DSB ISH
     TLBI <opX>[, <argX>]     // additional
     DSB ISH                  // additional

   Using a single additional TLBI and DSB at the end of the sequence can have significantly lower overhead as each DSB which completes a TLBI must synchronize with other PEs in the system, with potential performance effects both locally and system-wide.

4. The existing workaround repeats each specific TLBI operation, whereas for all relevant errata it is sufficient for the additional TLBI to use *any* operation which will be broadcast, regardless of which translation regime or stage of translation the operation applies to.

   For example, for a single TLBI:

     TLBI ALLE2IS
     DSB ISH

   ... the existing workaround will expand this to:

     TLBI ALLE2IS
     DSB ISH
     TLBI ALLE2IS             // additional
     DSB ISH                  // additional

   ... whereas it is sufficient to have:

     TLBI ALLE2IS
     DSB ISH
     TLBI VALE1IS, XZR        // additional
     DSB ISH                  // additional

   As the additional TLBI doesn't have to match a specific earlier TLBI, the additional TLBI can be implemented in separate code, with no memory of the earlier TLBIs. The additional TLBI can also use a cheaper TLBI operation.

5. The existing workaround is applied to both Stage-1 and Stage-2 TLB invalidation, whereas for all relevant errata it is only necessary to apply a workaround for Stage-1 invalidation.

   Architecturally, TLBI operations which invalidate only Stage-2 information (e.g. IPAS2E1IS) are not required to invalidate TLB entries which combine information from Stage-1 and Stage-2 translation table entries, and consequently may not complete memory accesses translated by those combined entries. In these cases, completion of memory accesses is only guaranteed after subsequent invalidation of Stage-1 information (e.g. VMALLE1IS).

Taking the above points into account, this patch reworks the workaround logic to reduce overhead:

* New __tlbi_sync_s1ish() and __tlbi_sync_s1ish_hyp() functions are added and used in place of any dsb(ish) which is used to complete broadcast Stage-1 TLB maintenance. When the ARM64_WORKAROUND_REPEAT_TLBI workaround is enabled, these helpers will execute an additional TLBI;DSB sequence.

  For consistency, it might make sense to add __tlbi_sync_*() helpers for local and stage 2 maintenance. For now I've left those with open-coded dsb() to keep the diff small.

* The duplication of TLBIs in __TLBI_0() and __TLBI_1() is removed. This is no longer needed as the necessary synchronization will happen in __tlbi_sync_s1ish() or __tlbi_sync_s1ish_hyp().

* The additional TLBI operation is chosen to have minimal impact:

  - __tlbi_sync_s1ish() uses "TLBI VALE1IS, XZR". This is only used at EL1 or at EL2 with {E2H,TGE}=={1,1}, where it will target an unused entry for the reserved ASID in the kernel's own translation regime, and have no adverse effect.

  - __tlbi_sync_s1ish_hyp() uses "TLBI VALE2IS, XZR". This is only used in hyp code, where it will target an unused entry in the hyp code's TTBR0 mapping, and should have no adverse effect.

* As __TLBI_0() and __TLBI_1() no longer replace each TLBI with a TLBI;DSB;TLBI sequence, batching TLBIs is worthwhile, and there's no need for arch_tlbbatch_should_defer() to consider ARM64_WORKAROUND_REPEAT_TLBI.

When building defconfig with GCC 15.1.0, compared to v6.19-rc1, this patch saves ~1KiB of text, makes the vmlinux ~42KiB smaller, and makes the resulting Image 64KiB smaller:

| [mark@lakrids:~/src/linux]% size vmlinux-*
|     text     data    bss      dec     hex filename
| 21179831 19660919 708216 41548966 279fca6 vmlinux-after
| 21181075 19660903 708216 41550194 27a0172 vmlinux-before

| [mark@lakrids:~/src/linux]% ls -l vmlinux-*
| -rwxr-xr-x 1 mark mark 157771472 Feb  4 12:05 vmlinux-after
| -rwxr-xr-x 1 mark mark 157815432 Feb  4 12:05 vmlinux-before

| [mark@lakrids:~/src/linux]% ls -l Image-*
| -rw-r--r-- 1 mark mark 41007616 Feb  4 12:05 Image-after
| -rw-r--r-- 1 mark mark 41073152 Feb  4 12:05 Image-before

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
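To make the overhead argument from point 3 concrete, the instruction counts of the two schemes can be modelled in plain C. This is a counting model only, not kernel code: `struct seq_cost`, `cost_old()` and `cost_new()` are hypothetical names, and no TLBI or DSB is actually executed.

```c
#include <stddef.h>

/* Number of TLBI and DSB instructions issued for one batched sequence. */
struct seq_cost {
	size_t tlbi;
	size_t dsb;
};

/*
 * Old scheme: with the workaround enabled, every TLBI is expanded to
 * TLBI;DSB;TLBI, and the sequence is still completed by one final DSB.
 */
static struct seq_cost cost_old(size_t n_tlbi, int workaround)
{
	struct seq_cost c = { n_tlbi, 1 };

	if (workaround) {
		c.tlbi += n_tlbi;
		c.dsb += n_tlbi;
	}
	return c;
}

/*
 * New scheme: the batch is issued unchanged, and a single additional
 * TLBI;DSB pair follows the completing DSB when the workaround is on.
 */
static struct seq_cost cost_new(size_t n_tlbi, int workaround)
{
	struct seq_cost c = { n_tlbi, 1 };

	if (workaround) {
		c.tlbi += 1;
		c.dsb += 1;
	}
	return c;
}
```

For the three-TLBI batch shown in the message, the model gives 6 TLBIs and 4 DSBs under the old expansion against 4 TLBIs and 2 DSBs under the new one, and the gap grows linearly with the batch size.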
2026-02-13 | KVM: arm64: Remove redundant kern_hyp_va() in unpin_host_sve_state() | Fuad Tabba
The `sve_state` pointer in `hyp_vcpu->vcpu.arch` is initialized as a hypervisor virtual address during vCPU initialization in `pkvm_vcpu_init_sve()`. `unpin_host_sve_state()` calls `kern_hyp_va()` on this address. Since `kern_hyp_va()` is idempotent, it's not a bug. However, it is unnecessary and potentially confusing. Remove the redundant conversion. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260213143815.1732675-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-13 | KVM: arm64: Fix ID register initialization for non-protected pKVM guests | Fuad Tabba
In protected mode, the hypervisor maintains a separate instance of the `kvm` structure for each VM. For non-protected VMs, this structure is initialized from the host's `kvm` state. Currently, `pkvm_init_features_from_host()` copies the `KVM_ARCH_FLAG_ID_REGS_INITIALIZED` flag from the host without the underlying `id_regs` data being initialized. This results in the hypervisor seeing the flag as set while the ID registers remain zeroed. Consequently, `kvm_has_feat()` checks at EL2 fail (return 0) for non-protected VMs. This breaks logic that relies on feature detection, such as `ctxt_has_tcrx()` for TCR2_EL1 support. As a result, certain system registers (e.g., TCR2_EL1, PIR_EL1, POR_EL1) are not saved/restored during the world switch, which could lead to state corruption. Fix this by explicitly copying the ID registers from the host `kvm` to the hypervisor `kvm` for non-protected VMs during initialization, since we trust the host with its non-protected guests' features. Also ensure `KVM_ARCH_FLAG_ID_REGS_INITIALIZED` is cleared initially in `pkvm_init_features_from_host` so that `vm_copy_id_regs` can properly initialize them and set the flag once done. Fixes: 41d6028e28bd ("KVM: arm64: Convert the SVE guest vcpu flag to a vm flag") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260213143815.1732675-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
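A small model of the bug and the fix described above, assuming a much simplified `struct vm`. All names here (`FLAG_ID_REGS_INITIALIZED`, `init_features_buggy()`, `init_features_fixed()`, `has_feat()`) are hypothetical stand-ins for the kernel's structures and helpers, and the 4-bit field layout is only illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_ID_REGS			4
#define FLAG_ID_REGS_INITIALIZED	(1u << 0)	/* hypothetical encoding */

/* Simplified stand-in for the per-VM state kept by host and hypervisor. */
struct vm {
	unsigned int flags;
	uint64_t id_regs[NR_ID_REGS];
};

/* Buggy: the flag is inherited, but the ID registers stay zeroed. */
static void init_features_buggy(struct vm *hyp, const struct vm *host)
{
	hyp->flags = host->flags;
}

/* Fixed: clear the flag, copy the registers, then set the flag. */
static void init_features_fixed(struct vm *hyp, const struct vm *host)
{
	hyp->flags = host->flags & ~FLAG_ID_REGS_INITIALIZED;
	memcpy(hyp->id_regs, host->id_regs, sizeof(hyp->id_regs));
	hyp->flags |= FLAG_ID_REGS_INITIALIZED;
}

/* Feature check on a 4-bit ID register field, loosely like kvm_has_feat(). */
static bool has_feat(const struct vm *vm, unsigned int reg, unsigned int shift)
{
	return ((vm->id_regs[reg] >> shift) & 0xf) != 0;
}
```

With the buggy variant the flag claims the registers are valid while every feature check reads zero, which matches the failure mode the commit describes for `ctxt_has_tcrx()`.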
2026-02-05 | Merge branch kvm-arm64/misc-6.20 into kvmarm-master/next | Marc Zyngier
* kvm-arm64/misc-6.20:
  : .
  : Misc KVM/arm64 changes for 6.20
  :
  : - Trivial FPSIMD cleanups
  :
  : - Calculate hyp VA size only once, avoiding potential mapping issues when
  :   VA bits is smaller than expected
  :
  : - Silence sparse warning for the HYP stack base
  :
  : - Fix error checking when handling FFA_VERSION
  :
  : - Add missing trap configuration for DBGWCR15_EL1
  :
  : - Don't try to deal with nested S2 when NV isn't enabled for a guest
  :
  : - Various spelling fixes
  : .
  KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported
  KVM: arm64: Fix various comments
  KVM: arm64: nv: Add trap config for DBGWCR<15>_EL1
  KVM: arm64: Fix error checking for FFA_VERSION
  KVM: arm64: Fix missing <asm/stackpage/nvhe.h> include
  KVM: arm64: Calculate hyp VA size only once
  KVM: arm64: Remove ISB after writing FPEXC32_EL2
  KVM: arm64: Shuffle KVM_HOST_DATA_FLAG_* indices
  KVM: arm64: Fix comment in fpsimd_lazy_switch_to_host()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 | Merge branch kvm-arm64/gicv5-prologue into kvmarm-master/next | Marc Zyngier
* kvm-arm64/gicv5-prologue:
  : .
  : Prologue to GICv5 support, courtesy of Sascha Bischoff.
  :
  : This is preliminary work that sets the scene for the full-blown
  : support.
  : .
  irqchip/gic-v5: Check if impl is virt capable
  KVM: arm64: gic: Set vgic_model before initing private IRQs
  arm64/sysreg: Drop ICH_HFGRTR_EL2.ICC_HAPR_EL1 and make RES1
  KVM: arm64: gic-v3: Switch vGIC-v3 to use generated ICH_VMCR_EL2

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 | Merge branch kvm-arm64/fwb-for-all into kvmarm-master/next | Marc Zyngier
* kvm-arm64/fwb-for-all:
  : .
  : Allow pKVM's host stage-2 mappings to use the Force Write Back version
  : of the memory attributes by using the "pass-through" encoding.
  :
  : This avoids having two separate encodings for S2 on a given platform.
  : .
  KVM: arm64: Simplify PAGE_S2_MEMATTR
  KVM: arm64: Kill KVM_PGTABLE_S2_NOFWB
  KVM: arm64: Switch pKVM host S2 over to KVM_PGTABLE_S2_AS_S1
  KVM: arm64: Add KVM_PGTABLE_S2_AS_S1 flag
  arm64: Add MT_S2{,_FWB}_AS_S1 encodings

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05 | Merge branch kvm-arm64/pkvm-no-mte into kvmarm-master/next | Marc Zyngier
* kvm-arm64/pkvm-no-mte:
  : .
  : pKVM updates preventing the host from using MTE-related system
  : registers when the feature is disabled from the kernel
  : command-line (arm64.nomte), courtesy of Fuad Tabba.
  :
  : From the cover letter:
  :
  : "If MTE is supported by the hardware (and is enabled at EL3), it remains
  : available to lower exception levels by default. Disabling it in the host
  : kernel (e.g., via 'arm64.nomte') only stops the kernel from advertising
  : the feature; it does not physically disable MTE in the hardware.
  :
  : The ability to disable MTE in the host kernel is used by some systems,
  : such as Android, so that the physical memory otherwise used as tag
  : storage can be used for other things (i.e. treated just like the rest of
  : memory). In this scenario, a malicious host could still access tags in
  : pages donated to a guest using MTE instructions (e.g., STG and LDG),
  : bypassing the kernel's configuration."
  : .
  KVM: arm64: Use kvm_has_mte() in pKVM trap initialization
  KVM: arm64: Inject UNDEF when accessing MTE sysregs with MTE disabled
  KVM: arm64: Trap MTE access and discovery when MTE is disabled
  KVM: arm64: Remove dead code resetting HCR_EL2 for pKVM

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-30 | KVM: arm64: gic-v3: Switch vGIC-v3 to use generated ICH_VMCR_EL2 | Sascha Bischoff
The VGIC-v3 code relied on hand-written definitions for the ICH_VMCR_EL2 register. This register, and the associated fields, is now generated as part of the sysreg framework. Move to using the generated definitions instead of the hand-written ones. There are no functional changes as part of this change. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260128175919.3828384-3-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-30 | KVM: arm64: Fix various comments | Zenghui Yu (Huawei)
Use tab instead of whitespaces, as well as 2 minor typo fixes. Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev> Link: https://patch.msgid.link/20260128075208.23024-1-zenghui.yu@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-25 | KVM: arm64: Simplify PAGE_S2_MEMATTR | Marc Zyngier
Restore PAGE_S2_MEMATTR() to its former glory, keeping the use of FWB as an implementation detail. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260123191637.715429-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-25 | KVM: arm64: Kill KVM_PGTABLE_S2_NOFWB | Marc Zyngier
Nobody is using this flag anymore, so remove it. This allows some cleanup by removing stage2_has_fwb(), which can be replaced by a direct check on the capability. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260123191637.715429-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-25 | KVM: arm64: Switch pKVM host S2 over to KVM_PGTABLE_S2_AS_S1 | Marc Zyngier
Since we have the basics to use the S1 memory attributes as the final ones with FWB, flip the host over to that when FWB is present. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260123191637.715429-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-25 | KVM: arm64: Add KVM_PGTABLE_S2_AS_S1 flag | Marc Zyngier
Plumb the MT_S2{,_FWB}_AS_S1 memory types into the KVM_S2_MEMATTR() macro with a new KVM_PGTABLE_S2_AS_S1 flag. Nobody selects it yet. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260123191637.715429-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23 | KVM: arm64: Use kvm_has_mte() in pKVM trap initialization | Fuad Tabba
When initializing HCR traps in protected mode, use kvm_has_mte() to check for MTE support rather than kvm_has_feat(kvm, ID_AA64PFR1_EL1, MTE, IMP).

kvm_has_mte() provides a more comprehensive check:

- kvm_has_feat() only checks if MTE is in the guest's ID register view (i.e., what we advertise to the guest)

- kvm_has_mte() checks both system_supports_mte() AND whether KVM_ARCH_FLAG_MTE_ENABLED is set for this VM instance

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260122112218.531948-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23 | KVM: arm64: Inject UNDEF when accessing MTE sysregs with MTE disabled | Fuad Tabba
When MTE hardware is present but disabled via software (`arm64.nomte` or `CONFIG_ARM64_MTE=n`), the kernel clears `HCR_EL2.ATA` and sets `HCR_EL2.TID5`, to prevent the use of MTE instructions. Additionally, accesses to certain MTE system registers trap to EL2 with exception class ESR_ELx_EC_SYS64. To emulate hardware without MTE (where such accesses would cause an Undefined Instruction exception), inject UNDEF into the host. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260122112218.531948-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23 | KVM: arm64: Remove dead code resetting HCR_EL2 for pKVM | Fuad Tabba
The pKVM lifecycle does not support tearing down the hypervisor and returning to the hyp stub once initialized. The transition to protected mode is one-way. Consequently, the code path in hyp-init.S responsible for resetting EL2 registers (triggered by kexec or hibernation) is unreachable in protected mode. Remove the dead code handling HCR_EL2 reset for ARM64_KVM_PROTECTED_MODE. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260122112218.531948-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23 | Merge branch kvm-arm64/pkvm-features-6.20 into kvmarm-master/next | Marc Zyngier
* kvm-arm64/pkvm-features-6.20:
  : .
  : pKVM guest feature trapping fixes, courtesy of Fuad Tabba.
  : .
  KVM: arm64: Prevent host from managing timer offsets for protected VMs
  KVM: arm64: Check whether a VM IOCTL is allowed in pKVM
  KVM: arm64: Track KVM IOCTLs and their associated KVM caps
  KVM: arm64: Do not allow KVM_CAP_ARM_MTE for any guest in pKVM
  KVM: arm64: Include VM type when checking VM capabilities in pKVM
  KVM: arm64: Introduce helper to calculate fault IPA offset
  KVM: arm64: Fix MTE flag initialization for protected VMs
  KVM: arm64: Fix Trace Buffer trap polarity for protected VMs
  KVM: arm64: Fix Trace Buffer trapping for protected VMs

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23 | Merge branch kvm-arm64/feat_idst into kvmarm-master/next | Marc Zyngier
* kvm-arm64/feat_idst:
  : .
  : Add support for FEAT_IDST, allowing ID registers that are not implemented
  : to be reported as a normal trap rather than as an UNDEF exception.
  : .
  KVM: arm64: selftests: Add a test for FEAT_IDST
  KVM: arm64: pkvm: Report optional ID register traps with a 0x18 syndrome
  KVM: arm64: pkvm: Add a generic synchronous exception injection primitive
  KVM: arm64: Force trap of GMID_EL1 when the guest doesn't have MTE
  KVM: arm64: Handle CSSIDR2_EL1 and SMIDR_EL1 in a generic way
  KVM: arm64: Handle FEAT_IDST for sysregs without specific handlers
  KVM: arm64: Add a generic synchronous exception injection primitive
  KVM: arm64: Add trap routing for GMID_EL1
  arm64: Repaint ID_AA64MMFR2_EL1.IDS description

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23 | Merge branch kvm-arm64/vtcr into kvmarm-master/next | Marc Zyngier
* kvm-arm64/vtcr:
  : .
  : VTCR_EL2 conversion to the configuration-driven RESx framework,
  : fixing a couple of UXN/PXN/XN bugs in the process.
  : .
  KVM: arm64: nv: Return correct RES0 bits for FGT registers
  KVM: arm64: Always populate FGT masks at boot time
  KVM: arm64: Honor UX/PX attributes for EL2 S1 mappings
  KVM: arm64: Convert VTCR_EL2 to config-driven sanitisation
  KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co
  arm64: Convert VTCR_EL2 to sysreg infratructure
  arm64: Convert ID_AA64MMFR0_EL1.TGRAN{4,16,64}_2 to UnsignedEnum
  KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers
  KVM: arm64: Don't blindly set set PSTATE.PAN on guest exit
  KVM: arm64: nv: Respect stage-2 write permssion when setting stage-1 AF
  KVM: arm64: Remove unused vcpu_{clear,set}_wfx_traps()
  KVM: arm64: Remove unused parameter in synchronize_vcpu_pstate()
  KVM: arm64: Remove extra argument for __pvkm_host_{share,unshare}_hyp()
  KVM: arm64: Inject UNDEF for a register trap without accessor
  KVM: arm64: Copy FGT traps to unprotected pKVM VCPU on VCPU load
  KVM: arm64: Fix EL2 S1 XN handling for hVHE setups
  KVM: arm64: gic: Check for vGICv3 when clearing TWI

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-22 | arm64: Unconditionally enable PAN support | Marc Zyngier
FEAT_PAN has been around since ARMv8.1 (over 11 years ago), has no compiler dependency (we have our own accessors), and is a great security benefit. Drop CONFIG_ARM64_PAN, and make the support unconditional. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-16 | KVM: arm64: Fix error checking for FFA_VERSION | Kornel Dulęba
According to section 13.2 of the DEN0077 FF-A specification, when firmware does not support the requested version, it should reply with FFA_RET_NOT_SUPPORTED(-1). Table 13.6 specifies the type of the error code as int32. Currently, the error checking logic compares the unsigned long return value it got from the SMC layer, against a "-1" literal. This fails due to a type mismatch: the literal is extended to 64 bits, whereas the register contains only 32 bits of ones(0x00000000ffffffff). Consequently, hyp_ffa_init misinterprets the "-1" return value as an invalid FF-A version. This prevents pKVM initialization on devices where FF-A is not supported in firmware. Fix this by explicitly casting res.a0 to s32. Signed-off-by: Kornel Dulęba <korneld@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251114-pkvm_init_noffa-v1-1-87a82e87c345@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
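The type mismatch is easy to reproduce outside the kernel. The sketch below uses `uint64_t` in place of the kernel's `unsigned long` and `int32_t` in place of `s32`; the two helper names are hypothetical, and only the comparison logic mirrors what the message describes.

```c
#include <stdbool.h>
#include <stdint.h>

#define FFA_RET_NOT_SUPPORTED	(-1)

/*
 * Buggy check: the -1 literal is converted to a 64-bit value with all
 * bits set, so it never equals a register holding 0x00000000ffffffff.
 */
static bool not_supported_buggy(uint64_t a0)
{
	return a0 == (uint64_t)FFA_RET_NOT_SUPPORTED;
}

/*
 * Fixed check: truncate the register to a signed 32-bit value first, as
 * the FF-A error code is an int32. (The narrowing conversion wraps on
 * the compilers the kernel supports.)
 */
static bool not_supported_fixed(uint64_t a0)
{
	return (int32_t)a0 == FFA_RET_NOT_SUPPORTED;
}
```

Passing `0xffffffff` to both helpers shows the difference: the buggy comparison reports that the call succeeded, while the fixed one recognises the error code.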
2026-01-15 | KVM: arm64: Include VM type when checking VM capabilities in pKVM | Fuad Tabba
Certain features and capabilities are restricted in protected mode. Most of these features are restricted only for protected VMs, but some are restricted for ALL VMs in protected mode. Extend the pKVM capability check to pass the VM (kvm), and use that when determining supported features. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 | KVM: arm64: Introduce helper to calculate fault IPA offset | Fuad Tabba
This 12-bit FAR fault IPA offset mask is hard-coded as 'GENMASK(11, 0)' in several places to reconstruct the full fault IPA. Introduce FAR_TO_FIPA_OFFSET() to calculate this value in a shared header and replace all open-coded instances to improve readability. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
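A possible shape for such a helper is sketched below. Only the macro name and the `GENMASK(11, 0)` mask come from the commit message; the macro body, the user-space `GENMASK_U64()` stand-in and the `fault_ipa()` wrapper are assumptions made for illustration.

```c
#include <stdint.h>

/* User-space stand-in for the kernel's GENMASK(): bits h..l set. */
#define GENMASK_U64(h, l) \
	(((~UINT64_C(0)) << (l)) & ((~UINT64_C(0)) >> (63 - (h))))

/* Low 12 bits of the faulting address: the offset within a 4KiB page. */
#define FAR_TO_FIPA_OFFSET(far)	((far) & GENMASK_U64(11, 0))

/*
 * Rebuild the full fault IPA from a page-aligned IPA (assumed to have
 * been derived from the fault syndrome registers) and the in-page offset
 * taken from the faulting virtual address.
 */
static uint64_t fault_ipa(uint64_t page_ipa, uint64_t far)
{
	return page_ipa | FAR_TO_FIPA_OFFSET(far);
}
```

Having the mask in one named macro means a call site reads as "add the in-page offset" instead of repeating the bit range.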
2026-01-15 | KVM: arm64: Fix MTE flag initialization for protected VMs | Fuad Tabba
The function pkvm_init_features_from_host() initializes guest features, propagating them from the host. The logic to propagate KVM_ARCH_FLAG_MTE_ENABLED (Memory Tagging Extension) has a couple of issues. First, the check was in the common path, before the divergence for protected and non-protected VMs. For non-protected VMs, this was unnecessary, as 'kvm->arch.flags' is completely overwritten by host_arch_flags immediately after, which already contains the MTE flag. For protected VMs, this was setting the flag even if the feature is not allowed. Second, the check was reading 'host_kvm->arch.flags' instead of using the local 'host_arch_flags', which is read once from the host flags. Fix these by moving the MTE flag check inside the protected-VM-only path, checking if the feature is allowed, and changing it to use the correct host_arch_flags local variable. This ensures non-protected VMs get the flag via the bulk copy, and protected VMs get it via an explicit check. Fixes: b7f345fbc32a ("KVM: arm64: Fix FEAT_MTE in pKVM") Reviewed-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
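The corrected control flow can be modelled as a pure function. This is a sketch under assumptions: the flag encodings and `init_arch_flags()` are hypothetical, and the real function updates a `kvm` structure rather than returning a value.

```c
#include <stdbool.h>

#define FLAG_MTE_ENABLED	(1u << 0)	/* hypothetical encoding */
#define FLAG_OTHER		(1u << 1)	/* any other host flag */

/*
 * host_arch_flags is a snapshot read once from the host. Non-protected
 * VMs take it wholesale (bulk copy); protected VMs only get the MTE flag
 * if the host set it and the feature is allowed for them.
 */
static unsigned int init_arch_flags(unsigned int host_arch_flags,
				    bool protected_vm, bool mte_allowed)
{
	unsigned int flags = 0;

	if (!protected_vm)
		return host_arch_flags;

	if (mte_allowed && (host_arch_flags & FLAG_MTE_ENABLED))
		flags |= FLAG_MTE_ENABLED;

	return flags;
}
```

The MTE check now lives only on the protected path, so the bulk copy is the single source of flags for non-protected VMs.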
2026-01-15 | KVM: arm64: Fix Trace Buffer trap polarity for protected VMs | Fuad Tabba
The E2TB bits in MDCR_EL2 control trapping of Trace Buffer system register accesses. These accesses are trapped to EL2 when the bits are clear. The trap initialization logic for protected VMs in pvm_init_traps_mdcr() had the polarity inverted. When a guest did not support the Trace Buffer feature, the code was setting E2TB. This incorrectly disabled the trap, potentially allowing a protected guest to access registers for a feature it was not given. Fix this by inverting the operation. Fixes: f50758260bff ("KVM: arm64: Group setting traps for protected VMs by control register") Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
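The inverted polarity can be illustrated with a few lines of bit manipulation. The field position used here (bits [25:24]) and all function names are assumptions for the sketch, and the trap condition is simplified to "both bits clear", as stated in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the E2TB field within MDCR_EL2. */
#define MDCR_EL2_E2TB_SHIFT	24
#define MDCR_EL2_E2TB_MASK	(UINT64_C(3) << MDCR_EL2_E2TB_SHIFT)

/* Buggy: setting E2TB for a guest without the feature disables the trap. */
static uint64_t init_mdcr_buggy(uint64_t mdcr, bool guest_has_trbe)
{
	if (!guest_has_trbe)
		mdcr |= MDCR_EL2_E2TB_MASK;
	return mdcr;
}

/* Fixed: clearing E2TB makes Trace Buffer register accesses trap to EL2. */
static uint64_t init_mdcr_fixed(uint64_t mdcr, bool guest_has_trbe)
{
	if (!guest_has_trbe)
		mdcr &= ~MDCR_EL2_E2TB_MASK;
	return mdcr;
}

/* Simplified model: accesses trap when the E2TB bits are clear. */
static bool trace_buffer_traps(uint64_t mdcr)
{
	return (mdcr & MDCR_EL2_E2TB_MASK) == 0;
}
```

For a guest without the Trace Buffer feature, only the fixed variant leaves the register in a state where accesses are trapped.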
2026-01-15 | KVM: arm64: Fix Trace Buffer trapping for protected VMs | Fuad Tabba
For protected VMs in pKVM, the hypervisor should trap accesses to trace buffer system registers if Trace Buffer isn't supported by the VM. However, the current code only traps if Trace Buffer External Mode isn't supported. Fix this by checking for FEAT_TRBE (Trace Buffer) rather than FEAT_TRBE_EXT. Fixes: 9d5261269098 ("KVM: arm64: Trap external trace for protected VMs") Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 | KVM: arm64: pkvm: Report optional ID register traps with a 0x18 syndrome | Marc Zyngier
With FEAT_IDST, unimplemented system registers in the feature ID space must be reported using EC=0x18 at the closest handling EL, rather than with an UNDEF. Most of these system registers are always implemented thanks to their dependency on FEAT_AA64, except for a set of (currently) three registers: GMID_EL1 (depending on MTE2), CCSIDR2_EL1 (depending on FEAT_CCIDX), and SMIDR_EL1 (depending on SME). For these three registers, report their trap as EC=0x18 if they end-up trapping into KVM and that FEAT_IDST is implemented in the guest. Otherwise, just make them UNDEF. Link: https://patch.msgid.link/20260108173233.2911955-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
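The decision described above reduces to a small lookup. In this sketch, `struct guest_feats`, `enum idreg` and both functions are hypothetical; the two exception class values (0x00 for an unknown-reason exception, i.e. UNDEF, and 0x18 for a trapped system register access) are the architectural encodings.

```c
#include <stdbool.h>

#define ESR_ELx_EC_UNKNOWN	0x00	/* reported for an UNDEF */
#define ESR_ELx_EC_SYS64	0x18	/* trapped system register access */

/* The three optional feature ID registers named in the commit message. */
enum idreg { REG_GMID_EL1, REG_CCSIDR2_EL1, REG_SMIDR_EL1 };

/* Hypothetical summary of what the guest has been given. */
struct guest_feats {
	bool idst, mte2, ccidx, sme;
};

static bool idreg_implemented(const struct guest_feats *g, enum idreg r)
{
	switch (r) {
	case REG_GMID_EL1:	return g->mte2;
	case REG_CCSIDR2_EL1:	return g->ccidx;
	case REG_SMIDR_EL1:	return g->sme;
	}
	return false;
}

/*
 * Exception class to inject for a trapped access to an unimplemented
 * register, or -1 if the register is implemented and should be handled
 * normally.
 */
static int idreg_trap_ec(const struct guest_feats *g, enum idreg r)
{
	if (idreg_implemented(g, r))
		return -1;

	return g->idst ? ESR_ELx_EC_SYS64 : ESR_ELx_EC_UNKNOWN;
}
```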
2026-01-15 | KVM: arm64: pkvm: Add a generic synchronous exception injection primitive | Marc Zyngier
Similarly to the "classic" KVM code, pKVM doesn't have an "anything goes" synchronous exception injection primitive. Carve one out of the UNDEF injection code. Link: https://patch.msgid.link/20260108173233.2911955-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 | arm64: Repaint ID_AA64MMFR2_EL1.IDS description | Marc Zyngier
ID_AA64MMFR2_EL1.IDS, as described in the sysreg file, is pretty horrible as it directly gives the ESR value. Repaint it using the usual NI/IMP identifiers to describe the absence/presence of FEAT_IDST. Also add the new EL3 routing feature, even if we really don't care about it. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://patch.msgid.link/20260108173233.2911955-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 | KVM: arm64: Honor UX/PX attributes for EL2 S1 mappings | Marc Zyngier
Now that we potentially have two bits to deal with when setting execution permissions, make sure we correctly handle them both when building the page tables and when reading back from them. Reported-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251210173024.561160-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15 | arm64: Convert VTCR_EL2 to sysreg infratructure | Marc Zyngier
Our definition of VTCR_EL2 is both partial (tons of fields are missing) and totally inconsistent (some constants are shifted, some are not). They are also expressed in terms of TCR, which is rather inconvenient. Replace the ad-hoc definitions with the generated version. This results in a bunch of additional changes to make the code work with the unshifted nature of generated enumerations. The register data was extracted from the BSD licenced AARCHMRS (AARCHMRS_OPENSOURCE_A_profile_FAT-2025-09_ASL0). Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251210173024.561160-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-10 | KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers | Will Deacon
Commit ddcadb297ce5 ("KVM: arm64: Ignore EAGAIN for walks outside of a fault") introduced a new walker flag ('KVM_PGTABLE_WALK_HANDLE_FAULT') to KVM's page-table code. When set, the walk logic maintains its previous behaviour of terminating a walk as soon as the visitor callback returns an error. However, when the flag is clear, the walk will continue if the visitor returns -EAGAIN and the error is then suppressed and returned as zero to the caller. Clearing the flag is beneficial when write-protecting a range of IPAs with kvm_pgtable_stage2_wrprotect() but is not useful in any other cases, either because we are operating on a single page (e.g. kvm_pgtable_stage2_mkyoung() or kvm_phys_addr_ioremap()) or because the early termination is desirable (e.g. when mapping pages from a fault in user_mem_abort()). Subsequently, commit e912efed485a ("KVM: arm64: Introduce the EL1 pKVM MMU") hooked up pKVM's hypercall interface to the MMU code at EL1 but failed to propagate any of the walker flags. As a result, page-table walks at EL2 fail to set KVM_PGTABLE_WALK_HANDLE_FAULT even when the early termination semantics are desirable on the fault handling path. Rather than complicate the pKVM hypercall interface, invert the flag so that the whole thing can be simplified and only pass the new flag ('KVM_PGTABLE_WALK_IGNORE_EAGAIN') from the wrprotect code. Cc: Fuad Tabba <tabba@google.com> Cc: Quentin Perret <qperret@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oupton@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Quentin Perret <qperret@google.com> Link: https://msgid.link/20260105154939.11041-2-will@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
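The inverted flag semantics can be sketched with a toy walker over an array of entries. Everything here is hypothetical (`WALK_IGNORE_EAGAIN`, `walk()`, `racy_visitor()`); the sketch only models the "stop on any error unless the caller opted in to ignoring -EAGAIN" behaviour described above.

```c
#include <errno.h>
#include <stddef.h>

#define WALK_IGNORE_EAGAIN	(1u << 0)	/* hypothetical encoding */

typedef int (*visitor_fn)(size_t idx, void *arg);

/*
 * Visit n entries in order. By default the walk stops at the first error
 * from the visitor. With WALK_IGNORE_EAGAIN, an -EAGAIN result is
 * suppressed and the walk carries on with the next entry.
 */
static int walk(size_t n, visitor_fn visit, void *arg, unsigned int flags,
		size_t *visited)
{
	*visited = 0;

	for (size_t i = 0; i < n; i++) {
		int ret = visit(i, arg);

		(*visited)++;
		if (ret == -EAGAIN && (flags & WALK_IGNORE_EAGAIN))
			continue;
		if (ret)
			return ret;
	}
	return 0;
}

/* Example visitor: entry 1 reports a lost race with another walker. */
static int racy_visitor(size_t idx, void *arg)
{
	(void)arg;
	return idx == 1 ? -EAGAIN : 0;
}
```

With the default being early termination, a caller that forgets to pass any flags (as in the pKVM hypercall path) gets the fault-handling semantics, and only the write-protect style caller has to opt in.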
2026-01-09KVM: arm64: Don't blindly set set PSTATE.PAN on guest exitMarc Zyngier
We set PSTATE.PAN to 1 on exiting from a guest if PAN support has been compiled in and it exists on the HW. However, this is not necessarily correct. In an nVHE configuration, there is no notion of PAN at EL2, so setting PSTATE.PAN to anything is pointless. Furthermore, not setting PAN to 0 when CONFIG_ARM64_PAN isn't set means we run with the *guest's* PSTATE.PAN (which might be set to 1), and we will explode on the next userspace access. Yes, the architecture is delightful in that particular corner. Fix the whole thing by always setting PAN to something when running VHE (which implies PAN support), and only ignore it when running nVHE. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20260107124600.2736328-1-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
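One way to read the fix above is as a small decision table. The sketch below models that reading only; ex_pan_on_guest_exit() and the enum are hypothetical, and the real change is made in the hyp exit path rather than in a C helper like this.

```c
#include <stdbool.h>

enum ex_pan_action {
	EX_PAN_LEAVE_ALONE,	/* nVHE: no notion of PAN at EL2 */
	EX_PAN_SET,		/* VHE and the kernel is built to use PAN */
	EX_PAN_CLEAR,		/* VHE but the kernel does not use PAN */
};

/*
 * Decide what to do with PSTATE.PAN on guest exit. On VHE the bit is
 * always written, so the guest's value never leaks into the host.
 */
static enum ex_pan_action ex_pan_on_guest_exit(bool vhe, bool kernel_uses_pan)
{
	if (!vhe)
		return EX_PAN_LEAVE_ALONE;

	return kernel_uses_pan ? EX_PAN_SET : EX_PAN_CLEAR;
}
```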
2026-01-08KVM: arm64: Remove unused parameter in synchronize_vcpu_pstate()Alexandru Elisei
synchronize_vcpu_pstate() doesn't make use of the reference to exit_code, remove the parameter. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Link: https://msgid.link/20251216103053.47224-5-alexandru.elisei@arm.com Signed-off-by: Oliver Upton <oupton@kernel.org>
2026-01-08KVM: arm64: Copy FGT traps to unprotected pKVM VCPU on VCPU loadAlexandru Elisei
Commit fb10ddf35c1c ("KVM: arm64: Compute per-vCPU FGTs at vcpu_load()") introduced per-VCPU FGT traps. For an unprotected pKVM VCPU, the untrusted host FGT configuration is copied in pkvm_vcpu_init_traps(), which is called from __pkvm_init_vcpu(). __pkvm_init_vcpu() is called once per VCPU (when the VCPU is first run) which means that the uninitialized, zero, values for the FGT registers end up being used for the entire lifetime of the VCPU. This causes both unwanted traps (for the inverse polarity trap bits) and the guest being allowed to access registers it shouldn't. Fix it by copying the FGT traps for unprotected pKVM VCPUs when the untrusted host loads the VCPU. Fixes: fb10ddf35c1c ("KVM: arm64: Compute per-vCPU FGTs at vcpu_load()") Acked-by: Will Deacon <will@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251216103053.47224-2-alexandru.elisei@arm.com Signed-off-by: Oliver Upton <oupton@kernel.org>
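The ordering problem described above can be modelled with a minimal sketch. The ex_* structures and functions are hypothetical stand-ins and do not reflect the actual pKVM data layout: the host only computes its FGT value at load time, so a copy made once at init time would capture zero, while a copy made on every load picks up the real value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side state: 'fgt' is only computed at load time. */
struct ex_host_vcpu {
	uint64_t fgt;
};

/* Hypothetical hypervisor-side shadow of the VCPU. */
struct ex_hyp_vcpu {
	const struct ex_host_vcpu *host;
	uint64_t fgt;
	bool is_protected;
};

/* Runs once per VCPU: too early to copy the host's FGT value. */
static void ex_hyp_init_vcpu(struct ex_hyp_vcpu *hyp,
			     const struct ex_host_vcpu *host, bool is_protected)
{
	hyp->host = host;
	hyp->is_protected = is_protected;
	hyp->fgt = 0;
}

/* Runs on every load: unprotected VCPUs take the host's current value. */
static void ex_hyp_load_vcpu(struct ex_hyp_vcpu *hyp)
{
	if (!hyp->is_protected)
		hyp->fgt = hyp->host->fgt;
}
```

Protected VCPUs are modelled here as simply keeping the hypervisor's own value, which is an assumption of the sketch.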
2026-01-07KVM: arm64: Remove ISB after writing FPEXC32_EL2Mark Rutland
The value of FPEXC32_EL2 has no effect on execution in AArch64 state, and consequently there's no need for an ISB after writing to it in the hyp code (which executes in AArch64 state). When performing an exception return to AArch32 state, the exception return will provide the necessary context synchronization event. Remove the redundant ISB. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260106173707.3292074-4-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-07KVM: arm64: Fix comment in fpsimd_lazy_switch_to_host()Mark Rutland
The comment in fpsimd_lazy_switch_to_host() erroneously says guest traps for FPSIMD/SVE/SME are disabled by fpsimd_lazy_switch_to_guest(). In reality, the traps are disabled by __activate_cptr_traps(), and fpsimd_lazy_switch_to_guest() only manipulates the SVE vector length. This was a mistake; I accidentally copy+pasted the wrong function name in commit: 59419f10045b ("KVM: arm64: Eagerly switch ZCR_EL{1,2}") Fix the comment. Fixes: 59419f10045b ("KVM: arm64: Eagerly switch ZCR_EL{1,2}") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260106173707.3292074-2-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-12-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM - Minor fixes to KVM and selftests Loongarch: - Get VM PMU capability from HW GCFG register - Add AVEC basic support - Use 64-bit register definition for EIOINTC - Add KVM timer test cases for tools/selftests RISC/V: - SBI message passing (MPXY) support for KVM guest - Give a new, more specific error subcode for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores s390: - Always allocate ESCA (Extended System Control Area), instead of starting with the basic SCA and converting to ESCA with the addition of the 65th vCPU. 
The price is an increased number of exits (and worse performance) on z10 and earlier processors; ESCA was introduced by z114/z196 in 2010 - VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups x86: - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE caching is disabled, as there can't be any relevant SPTEs to zap - Relocate a misplaced export - Fix an async #PF bug where KVM would clear the completion queue when the guest transitioned in and out of paging mode, e.g. when handling an SMI and then returning to paged mode via RSM - Leave KVM's user-return notifier registered even when disabling virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping the notifier registered is ok; the kernel does not use the MSRs and the callback will run cleanly and restore host MSRs if the CPU manages to return to userspace before the system goes down - Use the checked version of {get,put}_user() - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC timers can result in a hard lockup in the host - Revert the periodic kvmclock sync logic now that KVM doesn't use a clocksource that's subject to NTP corrections - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter behind CONFIG_CPU_MITIGATIONS - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path; the only reason they were handled in the fast path was to paper over a bug in the core #MC code, and that has long since been fixed - Add emulator support for AVX MOV instructions, to play nice with emulated devices whose guest drivers like to access PCI BARs with large multi-byte instructions x86 (AMD): - Fix a few missing "VMCB dirty" bugs - Fix the worst of KVM's lack of EFER.LMSLE emulation - Add AVIC support for addressing 4k vCPUs in x2AVIC mode - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions - Fix a currently-benign bug where KVM would clobber
SPEC_CTRL[63:32] on VMRUN and #VMEXIT - Fix a bug where KVM corrupts the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM x86 (Intel): - Use the root role from kvm_mmu_page to construct EPTPs instead of the current vCPU state, partly as worthwhile cleanup, but mostly to pave the way for tracking per-root TLB flushes, and elide EPT flushes on pCPU migration if the root is clean from a previous flush - Add a few missing nested consistency checks - Rip out support for doing "early" consistency checks via hardware as the functionality hasn't been used in years and is no longer useful in general; replace it with an off-by-default module param to WARN if hardware fails a check that KVM does not perform - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32] on VM-Enter - Misc cleanups - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertently trigger lock contention in the TDX-Module; KVM was either working around these in weird, ugly ways, or was simply oblivious to them (though even Yan's devilish selftests could only break individual VMs, not the host kernel) - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU, if creating said vCPU failed partway through - Fix a few sparse warnings (bad annotation, 0 != NULL) - Use struct_size() to simplify copying TDX capabilities to userspace - Fix a bug where TDX would effectively corrupt user-return MSR values if the TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected Selftests: - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM - Forcefully override ARCH from x86_64 to x86 to play nice with
specifying ARCH=x86_64 on the command line - Extend a bunch of nested VMX to validate nested SVM as well - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to verify KVM can save/restore nested VMX state when L1 is using 5-level paging, but L2 is not - Clean up the guest paging code in anticipation of sharing the core logic for nested EPT and nested NPT guest_memfd: - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references - Enhance KVM selftests to make it easier to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors - Misc cleanups Generic: - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for irqfd cleanup - Fix a goof in the dirty ring documentation - Fix choice of target for directed yield across different calls to kvm_vcpu_on_spin(); the function was always starting from the first vCPU instead of continuing the round-robin search" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits) KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64:
Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions ...
2025-12-02Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "These are the arm64 updates for 6.19. The biggest part is the Arm MPAM driver under drivers/resctrl/. There's a patch touching mm/ to handle spurious faults for huge pmd (similar to the pte version). The corresponding arm64 part allows us to avoid the TLB maintenance if a (huge) page is reused after a write fault. There's EFI refactoring to allow runtime services with preemption enabled and the rest is the usual perf/PMU updates and several cleanups/typos. Summary: Core features: - Basic Arm MPAM (Memory system resource Partitioning And Monitoring) driver under drivers/resctrl/ which makes use of the fs/resctrl/ API Perf and PMU: - Avoid cycle counter on multi-threaded CPUs - Extend CSPMU device probing and add additional filtering support for NVIDIA implementations - Add support for the PMUs on the NoC S3 interconnect - Add additional compatible strings for new Cortex and C1 CPUs - Add support for data source filtering to the SPE driver - Add support for i.MX8QM and "DB" PMU in the imx PMU driver Memory management: - Avoid broadcast TLBI if page reused in write fault - Elide TLB invalidation if the old PTE was not valid - Drop redundant cpu_set_*_tcr_t0sz() macros - Propagate pgtable_alloc() errors outside of __create_pgd_mapping() - Propagate return value from __change_memory_common() ACPI and EFI: - Call EFI runtime services without disabling preemption - Remove unused ACPI function Miscellaneous: - ptrace support to disable streaming on SME-only systems - Improve sysreg generation to include a 'Prefix' descriptor - Replace __ASSEMBLY__ with __ASSEMBLER__ - Align register dumps in the kselftest zt-test - Remove some no longer used macros/functions - Various spelling corrections" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits) arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
arm64/pageattr: Propagate return value from __change_memory_common arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros Documentation/arm64: Fix the typo of register names ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state ...
2025-12-02Merge tag 'kvmarm-6.19' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.19 - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner. - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ. - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU. - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM. - Minor fixes to KVM and selftests
2025-12-01Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/nextOliver Upton
* kvm-arm64/nv-xnx-haf: (22 commits) : Support for FEAT_XNX and FEAT_HAF in nested : : Add support for a couple of MMU-related features that weren't : implemented by KVM's software page table walk: : : - FEAT_XNX: Allows the hypervisor to describe execute permissions : separately for EL0 and EL1 : : - FEAT_HAF: Hardware update of the Access Flag, which in the context of : nested means software walkers must also set the Access Flag. : : The series also adds some basic support for testing KVM's emulation of : the AT instruction, including the implementation detail that AT sets the : Access Flag in KVM. KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2 ... Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01Merge branch 'kvm-arm64/vgic-lr-overflow' into kvmarm/nextOliver Upton
* kvm-arm64/vgic-lr-overflow: (50 commits) : Support for VGIC LR overflows, courtesy of Marc Zyngier : : Address deficiencies in KVM's GIC emulation when a vCPU has more active : IRQs than can be represented in the VGIC list registers. Sort the AP : list to prioritize inactive and pending IRQs, potentially spilling : active IRQs outside of the LRs. : : Handle deactivation of IRQs outside of the LRs for both EOImode=0/1, : which involves special consideration for SPIs being deactivated from a : different vCPU than the one that acked it. KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE KVM: arm64: selftests: vgic_irq: Add timer deactivation test KVM: arm64: selftests: vgic_irq: Add Group-0 enable test KVM: arm64: selftests: vgic_irq: Add asymmetric SPI deaectivation test KVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order KVM: arm64: selftests: vgic_irq: Remove LR-bound limitation KVM: arm64: selftests: vgic_irq: Exclude timer-controlled interrupts KVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt KVM: arm64: selftests: vgic_irq: Fix GUEST_ASSERT_IAR_EMPTY() helper KVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default KVM: arm64: selftests: gic_v3: Add irq group setting helper KVM: arm64: GICv2: Always trap GICV_DIR register KVM: arm64: GICv2: Handle deactivation via GICV_DIR traps KVM: arm64: GICv2: Handle LR overflow when EOImode==0 KVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.En KVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation KVM: arm64: GICv3: Avoid broadcast kick on CPUs lacking TDIR KVM: arm64: GICv3: Handle in-LR deactivation when possible KVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation ... Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()Nathan Chancellor
Clang warns (or errors with CONFIG_WERROR=y / W=e): arch/arm64/kvm/hyp/pgtable.c:757:2: error: label at end of compound statement is a C23 extension [-Werror,-Wc23-extensions] 757 | } | ^ With older versions of clang (15 and older) and GCC (at least the minimum supported, 8.1), this is an unconditional hard error: arch/arm64/kvm/hyp/pgtable.c: In function 'kvm_pgtable_stage2_pte_prot': arch/arm64/kvm/hyp/pgtable.c:756:2: error: label at end of compound statement default: ^~~~~~~ arch/arm64/kvm/hyp/pgtable.c:756:10: error: label at end of compound statement: expected statement default: ^ ; Add a break statement to this default case to clear up the error/warning. Fixes: 2608563b466b ("KVM: arm64: Add support for FEAT_XNX stage-2 permissions") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251125-arm64-kvm-hyp-pgtable-fix-c23-ext-warn-v1-1-98b506ddefbf@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
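The construct behind the diagnostics quoted above can be reproduced in isolation. The function below is a hypothetical stand-in, not the code from pgtable.c: a 'default:' label with nothing after it is a label at the end of a compound statement, which is only valid from C23 onwards, so the label is given a statement to attach to.

```c
enum ex_perm { EX_PERM_NONE, EX_PERM_R, EX_PERM_RW };

/* Hypothetical example mapping a permission enum to permission bits. */
static int ex_perm_bits(enum ex_perm p)
{
	int bits = 0;

	switch (p) {
	case EX_PERM_R:
		bits = 0x1;
		break;
	case EX_PERM_RW:
		bits = 0x3;
		break;
	default:
		/*
		 * Removing this 'break' leaves the label directly before
		 * the closing brace, which pre-C23 compilers reject.
		 */
		break;
	}

	return bits;
}
```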
2025-11-24KVM: arm64: GICv2: Always trap GICV_DIR registerMarc Zyngier
Since we can't decide to trap the DIR register on a per-vcpu basis, always trap the second page of the GIC CPU interface. Yes, this is costly. On the bright side, no sane SW should use EOImode==1 on GICv2... Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-40-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24KVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.EnMarc Zyngier
FEAT_NV2 is pretty terrible for anything that tries to enforce immediate effects, and writing to ICH_HCR_EL2 in the hope to disable a maintenance interrupt is vain. This only hits memory, and the guest hasn't cleared anything -- the MI will fire. For example, running the vgic_irq test under NV results in about 800 maintenance interrupts being actually handled by the L1 guest, when none were expected. As a cheap workaround, read back ICH_MISR_EL2 after writing 0 to ICH_HCR_EL2. This is very cheap on real HW, and causes a trap to the host in NV, giving it the opportunity to retire the pending MI. With this, the above test runs to completion without any MI being actually handled. Yes, this is really poor... Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-37-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulationMarc Zyngier
The current approach to nested GICv3 support is to not do anything while L2 is running, wait for a transition from L2 to L1 to resync LRs, VMCR and HCR, and only then evaluate the state to decide whether to generate a maintenance interrupt. This doesn't provide a good quality of emulation, and it would be far preferable to find out early that we need to perform a switch. Move the LRs/VMCR and HCR resync into vgic_v3_sync_nested(), so that we have most of the state available. As we are turning the vgic off at this stage to avoid a screaming host MI, add a new helper vgic_v3_flush_nested() that switches the vgic on again. The MI can then be directly injected as required. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-35-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24KVM: arm64: GICv3: Handle in-LR deactivation when possibleMarc Zyngier
Even when we have either an LR overflow or SPIs in flight, it is extremely likely that the interrupt being deactivated is still in the LRs, and that going all the way back to the generic trap handling code is a waste of time. Instead, try and deactivate in place when possible, and only if this fails, perform a full exit. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-33-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
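The fast path described above can be sketched as a lookup over the list registers. This is a simplified model: struct ex_lr, EX_NR_LRS and ex_try_deactivate_in_lr() are hypothetical, and real list registers encode their state in a single 64-bit value rather than separate fields.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NR_LRS	4

/* Hypothetical, simplified view of a list register. */
struct ex_lr {
	uint32_t intid;
	bool valid;
	bool active;
};

/*
 * Try to deactivate 'intid' in place. Returns true if it was found in
 * an LR and handled; false tells the caller to fall back to a full
 * exit and the generic trap handling code.
 */
static bool ex_try_deactivate_in_lr(struct ex_lr lrs[EX_NR_LRS],
				    uint32_t intid)
{
	for (size_t i = 0; i < EX_NR_LRS; i++) {
		if (lrs[i].valid && lrs[i].intid == intid) {
			lrs[i].active = false;
			return true;
		}
	}

	return false;
}
```

The point of the split is that the common case stays cheap, while the rare miss still takes the slow but complete path.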
2025-11-24KVM: arm64: GICv3: Handle deactivation via ICV_DIR_EL1 trapsMarc Zyngier
Deactivation via ICV_DIR_EL1 is both relatively straightforward (we have the interrupt that needs deactivation) and really awkward. The main issue is that the interrupt may either be in an LR on another CPU, or outside of any LR. In the former case, we process the deactivation as if it was a write to GICD_CACTIVERn, which is already implemented as a big hammer IPI'ing all vcpus. In the latter case, we just perform a normal deactivation, similar to what we do for EOImode==0. Another annoying aspect is that we need to tell the CPU owning the interrupt that its ap_list needs laundering. We use a brand new vcpu request to that effect. Note that this doesn't address deactivation via the GICV MMIO view, which will be taken care of in a later change. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-29-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>