| Age | Commit message (Collapse) | Author |
|
KVM/riscv changes for 6.20
- Fixes for issues discoverd by KVM API fuzzing in
kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(),
and kvm_riscv_vcpu_aia_imsic_update()
- Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM
- Add riscv vm satp modes in KVM selftests
- Transparent huge page support for G-stage
- Adjust the number of available guest irq files based on
MMIO register sizes in DeviceTree or ACPI
|
|
KVM selftests changes for 6.20
- Add a regression test for TPR<=>CR8 synchronization and IRQ masking.
- Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support,
and extend x86's infrastructure to support EPT and NPT (for L2 guests).
- Extend several nested VMX tests to also cover nested SVM.
- Add a selftest for nested VMLOAD/VMSAVE.
- Rework the nested dirty log test, originally added as a regression test for
PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage
and to hopefully make the test easier to understand and maintain.
|
|
Current vm modes cannot represent riscv guest modes precisely, here add
all 9 combinations of P(56,40,41) x V(57,48,39). Also the default vm
mode is detected on runtime instead of hardcoded one, which might not be
supported on specific machine.
Signed-off-by: Wu Fei <wu.fei9@sanechips.com.cn>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20251105151442.28767-1-wu.fei9@sanechips.com.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Update the nested dirty log test to validate KVM's handling of READ faults
when dirty logging is enabled. Specifically, set the Dirty bit in the
guest PTEs used to map L2 GPAs, so that KVM will create writable SPTEs
when handling L2 read faults. When handling read faults in the shadow MMU,
KVM opportunistically creates a writable SPTE if the mapping can be
writable *and* the gPTE is dirty (or doesn't support the Dirty bit), i.e.
if KVM doesn't need to intercept writes in order to emulate Dirty-bit
updates.
To actually test the L2 READ=>WRITE sequence, e.g. without masking a false
pass by other test activity, route the READ=>WRITE and WRITE=>WRITE
sequences to separate L1 pages, and differentiate between "marked dirty
due to a WRITE access/fault" and "marked dirty due to creating a writable
SPTE for a READ access/fault". The updated sequence exposes the bug fixed
by KVM commit 1f4e5fc83a42 ("KVM: x86: fix nested guest live migration
with PML") when the guest performs a READ=>WRITE sequence with dirty guest
PTEs.
Opportunistically tweak and rename the address macros, and add comments,
to make it more obvious what the test is doing. E.g. NESTED_TEST_MEM1
vs. GUEST_TEST_MEM doesn't make it all that obvious that the test is
creating aliases in both the L2 GPA and GVA address spaces, but only when
L1 is using TDP to run L2.
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20260115172154.709024-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Fix minor documentation errors in `kvm_util.h` and `kvm_util.c`.
- Correct the argument description for `vcpu_args_set` in `kvm_util.h`,
which incorrectly listed `vm` instead of `vcpu`.
- Fix a typo in the comment for `kvm_selftest_arch_init` ("exeucting" ->
"executing").
- Correct the return value description for `vm_vaddr_unused_gap` in
`kvm_util.c` to match the implementation, which returns an address "at
or above" `vaddr_min`, not "at or below".
No functional change intended.
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-6-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
To avoid code duplication, move page_align() to the shared `kvm_util.h`
header file. Rename it to vm_page_align(), to make it clear that the
alignment is done with respect to the guest's base page size.
No functional change intended.
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-5-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The implementation of `page_align()` in `processor.c` calculates
alignment incorrectly for values that are already aligned. Specifically,
`(v + vm->page_size) & ~(vm->page_size - 1)` aligns to the *next* page
boundary even if `v` is already page-aligned, potentially wasting a page
of memory.
Fix the calculation to use standard alignment logic: `(v + vm->page_size
- 1) & ~(vm->page_size - 1)`.
Fixes: 3e06cdf10520 ("KVM: selftests: Add initial support for RISC-V 64-bit")
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-4-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The implementation of `page_align()` in `processor.c` calculates
alignment incorrectly for values that are already aligned. Specifically,
`(v + vm->page_size) & ~(vm->page_size - 1)` aligns to the *next* page
boundary even if `v` is already page-aligned, potentially wasting a page
of memory.
Fix the calculation to use standard alignment logic: `(v + vm->page_size
- 1) & ~(vm->page_size - 1)`.
Fixes: 7a6629ef746d ("kvm: selftests: add virt mem support for aarch64")
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-3-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
KVM selftests map all guest code and data into the lower virtual address
range (0x0000...) managed by TTBR0_EL1. The upper range (0xFFFF...)
managed by TTBR1_EL1 is unused and uninitialized.
If a guest accesses the upper range, the MMU attempts a translation
table walk using uninitialized registers, leading to unpredictable
behavior.
Set `TCR_EL1.EPD1` to disable translation table walks for TTBR1_EL1,
ensuring that any access to the upper range generates an immediate
Translation Fault. Additionally, set `TCR_EL1.TBI1` (Top Byte Ignore) to
ensure that tagged pointers in the upper range also deterministically
trigger a Translation Fault via EPD1.
Define `TCR_EPD1_MASK`, `TCR_EPD1_SHIFT`, and `TCR_TBI1` in
`processor.h` to support this configuration. These are based on their
definitions in `arch/arm64/include/asm/pgtable-hwdef.h`.
Suggested-by: Will Deacon <will@kernel.org>
Reviewed-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260109082218.3236580-2-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Instead of calling memstress_setup_ept_mappings() only in the first
iteration in the loop, move it before the loop.
The call needed to happen within the loop before commit e40e72fec0de
("KVM: selftests: Stop passing VMX metadata to TDP mapping functions"),
as memstress_setup_ept_mappings() used to take in a pointer to vmx_pages
and pass it into tdp_identity_map_1g() (to get the EPT root GPA). This
is no longer the case, as tdp_identity_map_1g() gets the EPT root
through stage2 MMU.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20260113171456.2097312-1-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Shorten the API to get a PTE as the "PTE" acronym is ubiquitous, and the
"page table entry" makes it unnecessarily difficult to quickly understand
what callers are doing.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-21-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add L1 SVM code and generalize the setup code to work for both VMX and
SVM. This allows running 'dirty_log_perf_test -n' on AMD CPUs.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-20-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
According to the APM, NPT walks are treated as user accesses. In
preparation for supporting NPT mappings, set the 'user' bit on NPTs by
adding a mask of bits to always be set on PTEs in kvm_mmu.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Implement nCR3 and NPT initialization functions, similar to the EPT
equivalents, and create common TDP helpers for enablement checking and
initialization. Enable NPT for nested guests by default if the TDP MMU
was initialized, similar to VMX.
Reuse the PTE masks from the main MMU in the NPT MMU, except for the C
and S bits related to confidential VMs.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-17-seanjc@google.com
[sean: apply Yosry's fixup for ncr3_gpa]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
In preparation for generalizing the nested dirty logging test, checking
if either EPT or NPT is enabled will be needed. To avoid needing to gate
the kvm_cpu_has_ept() call by the CPU type, make sure the function
returns false if VMX is not available instead of trying to read VMX-only
MSRs.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Now that the functions are no longer VMX-specific, move them to
processor.c. Do a minor comment tweak replacing 'EPT' with 'TDP'.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rework tdp_map() and friends to use __virt_pg_map() and drop the custom
EPT code in __tdp_pg_map() and tdp_create_pte(). The EPT code and
__virt_pg_map() are practically identical, the main differences are:
- EPT uses the EPT struct overlay instead of the PTE masks.
- EPT always assumes 4-level EPTs.
To reuse __virt_pg_map(), extend the PTE masks to work with EPT's RWX and
X-only capabilities, and provide a tdp_mmu_init() API so that EPT can pass
in the EPT PTE masks along with the root page level (which is currently
hardcoded to '4').
Don't reuse KVM's insane overloading of the USER bit for EPT_R as there's
no reason to multiplex bits in the selftests, e.g. selftests aren't trying
to shadow guest PTEs and thus don't care about funnelling protections into
a common permissions check.
Another benefit of reusing the code is having separate handling for
upper-level PTEs vs 4K PTEs, which avoids some quirks like setting the
large bit on a 4K PTE in the EPTs.
For all intents and purposes, no functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20251230230150.4150236-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The root GPA is now retrieved from the nested MMU, stop passing VMX
metadata. This is in preparation for making these functions work for
NPTs as well.
Opportunistically drop tdp_pg_map() since it's unused.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-12-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
prepare_eptp() currently allocates new EPTs for each vCPU. memstress has
its own hack to share the EPTs between vCPUs. Currently, there is no
reason to have separate EPTs for each vCPU, and the complexity is
significant. The only reason it doesn't matter now is because memstress
is the only user with multiple vCPUs.
Add vm_enable_ept() to allocate EPT page tables for an entire VM, and use
it everywhere to replace prepare_eptp(). Drop 'eptp' and 'eptp_hva' from
'struct vmx_pages' as they serve no purpose (e.g. the EPTP can be built
from the PGD), but keep 'eptp_gpa' so that the MMU structure doesn't need
to be passed in along with vmx_pages. Dynamically allocate the TDP MMU
structure to avoid a cyclical dependency between kvm_util_arch.h and
kvm_util.h.
Remove the workaround in memstress to copy the EPT root between vCPUs
since that's now the default behavior.
Name the MMU tdp_mmu instead of e.g. nested_mmu or nested.mmu to avoid
recreating the same mess that KVM has with respect to "nested" MMUs, e.g.
does nested refer to the stage-2 page tables created by L1, or the stage-1
page tables created by L2?
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20251230230150.4150236-11-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Move the PTE bitmasks into kvm_mmu to parameterize them for virt mapping
functions. Introduce helpers to read/write different PTE bits given a
kvm_mmu.
Drop the 'global' bit definition as it's currently unused, but leave the
'user' bit as it will be used in coming changes. Opportunisitcally
rename 'large' to 'huge' as it's more consistent with the kernel naming.
Leave PHYSICAL_PAGE_MASK alone, it's fixed in all page table formats and
a lot of other macros depend on it. It's tempting to move all the other
macros to be per-struct instead, but it would be too much noise for
little benefit.
Keep c_bit and s_bit in vm->arch as they used before the MMU is
initialized, through __vmcreate() -> vm_userspace_mem_region_add() ->
vm_mem_add() -> vm_arch_has_protected_memory().
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: rename accessors to is_<adjective>_pte()]
Link: https://patch.msgid.link/20251230230150.4150236-10-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
In preparation for generalizing the x86 virt mapping APIs to work with
TDP (stage-2) page tables, plumb "struct kvm_mmu" into all of the helper
functions instead of operating on vm->mmu directly.
Opportunistically swap the order of the check in virt_get_pte() to first
assert that the parent is the PGD, and then check that the PTE is present,
as it makes more sense to check if the parent PTE is the PGD/root (i.e.
not a PTE) before checking that the PTE is PRESENT.
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: rebase on common kvm_mmu structure, rewrite changelog]
Link: https://patch.msgid.link/20251230230150.4150236-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add a "struct kvm_mmu" to track a given MMU instance, e.g. a VM's stage-1
MMU versus a VM's stage-2 MMU, so that x86 can share MMU functionality for
both stage-1 and stage-2 MMUs, without creating the potential for subtle
bugs, e.g. due to consuming on vm->pgtable_levels when operating a stage-2
MMU.
Encapsulate the existing de facto MMU in "struct kvm_vm", e.g instead of
burying the MMU details in "struct kvm_vm_arch", to avoid more #ifdefs in
____vm_create(), and in the hopes that other architectures can utilize the
formalized MMU structure if/when they too support stage-2 page tables.
No functional change intended.
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Stop setting Accessed/Dirty bits when creating EPT entries for L2 so that
the stage-1 and stage-2 (a.k.a. TDP) page table APIs can use common code
without bleeding the EPT hack into the common APIs.
While commit 094444204570 ("selftests: kvm: add test for dirty logging
inside nested guests") is _very_ light on details, the most likely
explanation is that vmx_dirty_log_test was attempting to avoid taking an
EPT Violation on the first _write_ from L2.
static void l2_guest_code(u64 *a, u64 *b)
{
READ_ONCE(*a);
WRITE_ONCE(*a, 1); <===
GUEST_SYNC(true);
...
}
When handling read faults in the shadow MMU, KVM opportunistically creates
a writable SPTE if the mapping can be writable *and* the gPTE is dirty (or
doesn't support the Dirty bit), i.e. if KVM doesn't need to intercept
writes in order to emulate Dirty-bit updates. By setting A/D bits in the
test's EPT entries, the above READ+WRITE will fault only on the read, and
in theory expose the bug fixed by KVM commit 1f4e5fc83a42 ("KVM: x86: fix
nested guest live migration with PML"). If the Dirty bit is NOT set, the
test will get a false pass due; though again, in theory.
However, the test is flawed (and always was, at least in the versions
posted publicly), as KVM (correctly) marks the corresponding L1 GFN as
dirty (in the dirty bitmap) when creating the writable SPTE. I.e. without
a check on the dirty bitmap after the READ_ONCE(), the check after the
first WRITE_ONCE() will get a false pass due to the dirty bitmap/log having
been updated by the read fault, not by PML.
Furthermore, the subsequent behavior in the test's l2_guest_code()
effectively hides the flawed test behavior, as the straight writes to a
new L2 GPA fault also trigger the KVM bug, and so the test will still
detect the failure due to lack of isolation between the two testcases
(Read=>Write vs. Write=>Write).
WRITE_ONCE(*b, 1);
GUEST_SYNC(true);
WRITE_ONCE(*b, 1);
GUEST_SYNC(true);
GUEST_SYNC(false);
Punt on fixing vmx_dirty_log_test for the moment as it will be easier to
properly fix the test once the TDP code uses the common MMU APIs, at which
point it will be trivially easy for the test to retrieve the EPT PTE and
set the Dirty bit as needed.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: rewrite changelog to explain the situation]
Link: https://patch.msgid.link/20251230230150.4150236-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Replace the struct overlay with explicit bitmasks, which is clearer and
less error-prone. See commit f18b4aebe107 ("kvm: selftests: do not use
bitfields larger than 32-bits for PTEs") for an example of why bitfields
are not preferable.
Remove the unused PAGE_SHIFT_4K definition while at it.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rename the functions from nested_* to tdp_* to make their purpose
clearer.
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
On x86, KVM selftests use memslot 0 for all the default regions used by
the test infrastructure. This is an implementation detail.
nested_map_memslot() is currently used to map the default regions by
explicitly passing slot 0, which leaks the library implementation into
the caller.
Rename the function to a very verbose
nested_identity_map_default_memslots() to reflect what it actually does.
Add an assertion that only memslot 0 is being used so that the
implementation does not change from under us.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The function is only used in processor.c, drop the declaration in
processor.h and make it static.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251230230150.4150236-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.19
- Support for userspace handling of synchronous external aborts (SEAs),
allowing the VMM to potentially handle the abort in a non-fatal
manner.
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers in
hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ.
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU.
- Allow page table destruction to reschedule, fixing long need_resched
latencies observed when destroying a large VM.
- Minor fixes to KVM and selftests
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.19
1. Get VM PMU capability from HW GCFG register.
2. Add AVEC basic support.
3. Use 64-bit register definition for EIOINTC.
4. Add KVM timer test cases for tools/selftests.
|
|
* kvm-arm64/nv-xnx-haf: (22 commits)
: Support for FEAT_XNX and FEAT_HAF in nested
:
: Add support for a couple of MMU-related features that weren't
: implemented by KVM's software page table walk:
:
: - FEAT_XNX: Allows the hypervisor to describe execute permissions
: separately for EL0 and EL1
:
: - FEAT_HAF: Hardware update of the Access Flag, which in the context of
: nested means software walkers must also set the Access Flag.
:
: The series also adds some basic support for testing KVM's emulation of
: the AT instruction, including the implementation detail that AT sets the
: Access Flag in KVM.
KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
KVM: arm64: selftests: Add test for AT emulation
KVM: arm64: nv: Expose hardware access flag management to NV guests
KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
KVM: arm64: Implement HW access flag management in stage-1 SW PTW
KVM: arm64: Propagate PTW errors up to AT emulation
KVM: arm64: Add helper for swapping guest descriptor
KVM: arm64: nv: Use pgtable definitions in stage-2 walk
KVM: arm64: Handle endianness in read helper for emulated PTW
KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
KVM: arm64: Call helper for reading descriptors directly
KVM: arm64: nv: Advertise support for FEAT_XNX
KVM: arm64: Teach ptdump about FEAT_XNX permissions
KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
...
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
* kvm-arm64/vgic-lr-overflow: (50 commits)
: Support for VGIC LR overflows, courtesy of Marc Zyngier
:
: Address deficiencies in KVM's GIC emulation when a vCPU has more active
: IRQs than can be represented in the VGIC list registers. Sort the AP
: list to prioritize inactive and pending IRQs, potentially spilling
: active IRQs outside of the LRs.
:
: Handle deactivation of IRQs outside of the LRs for both EOImode=0/1,
: which involves special consideration for SPIs being deactivated from a
: different vCPU than the one that acked it.
KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE
KVM: arm64: selftests: vgic_irq: Add timer deactivation test
KVM: arm64: selftests: vgic_irq: Add Group-0 enable test
KVM: arm64: selftests: vgic_irq: Add asymmetric SPI deaectivation test
KVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order
KVM: arm64: selftests: vgic_irq: Remove LR-bound limitation
KVM: arm64: selftests: vgic_irq: Exclude timer-controlled interrupts
KVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt
KVM: arm64: selftests: vgic_irq: Fix GUEST_ASSERT_IAR_EMPTY() helper
KVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default
KVM: arm64: selftests: gic_v3: Add irq group setting helper
KVM: arm64: GICv2: Always trap GICV_DIR register
KVM: arm64: GICv2: Handle deactivation via GICV_DIR traps
KVM: arm64: GICv2: Handle LR overflow when EOImode==0
KVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.En
KVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive
KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation
KVM: arm64: GICv3: Avoid broadcast kick on CPUs lacking TDIR
KVM: arm64: GICv3: Handle in-LR deactivation when possible
KVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation
...
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
* kvm-arm64/sea-user:
: Userspace handling of SEAs, courtesy of Jiaqi Yan
:
: Add support for processing external aborts in userspace in situations
: where the host has failed to do so, allowing the VMM to potentially
: reinject an external abort into the VM.
Documentation: kvm: new UAPI for handling SEA
KVM: selftests: Test for KVM_EXIT_ARM_SEA
KVM: arm64: VM exit to userspace to handle SEA
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Add a basic test for AT emulation in the EL2&0 and EL1&0 translation
regimes.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-16-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
With time counter test, it is to verify that time count starts from 0
and always grows up then.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Add timer test case based on common arch_timer code, timer interrupt
with one-shot and period mode is tested.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Add interrupt and exception handler register interface. When exception
happens, execute registered exception handler if exists, else report an
error.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Add some basic function interfaces such as CSR register access, local
irq enable or disable APIs.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
When system returns from exception with ertn instruction, PC comes from
LOONGARCH_CSR_ERA, and CSR.CRMD comes LOONGARCH_CSR_PRMD.
Here save CSR register CSR.ERA and CSR.PRMD into stack, and then restore
them from stack. So it can be modified by exception handlers in future.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
KVM selftests changes for 6.19:
- Fix a math goof in mmu_stress_test when running on a single-CPU system/VM.
- Forcefully override ARCH from x86_64 to x86 to play nice with specifying
ARCH=x86_64 on the command line.
- Extend a bunch of nested VMX to validate nested SVM as well.
- Add support for LA57 in the core VM_MODE_xxx macro, and add a test to
verify KVM can save/restore nested VMX state when L1 is using 5-level
paging, but L2 is not.
- Clean up the guest paging code in anticipation of sharing the core logic for
nested EPT and nested NPT.
|
|
KVM guest_memfd changes for 6.19:
- Add NUMA mempolicy support for guest_memfd, and clean up a variety of
rough edges in guest_memfd along the way.
- Define a CLASS to automatically handle get+put when grabbing a guest_memfd
from a memslot to make it harder to leak references.
- Enhance KVM selftests to make it easer to develop and debug selftests like
those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs
often result in hard-to-debug SIGBUS errors.
- Misc cleanups.
|
|
Make sure G0 is disabled at the point of initialising the GIC.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-42-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Being able to set the group of an interrupt is pretty useful.
Add such a helper.
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://msgid.link/20251120172540.2267180-41-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Call paths leading to __virt_pg_map() are currently:
(a) virt_pg_map() -> virt_arch_pg_map() -> __virt_pg_map()
(b) virt_map_level() -> __virt_pg_map()
For (a), calls to virt_pg_map() from kvm_util.c make sure they update
vm->vpages_mapped, but other callers do not. Move the sparsebit_set()
call into virt_pg_map() to make sure all callers are captured.
For (b), call sparsebit_set_num() from virt_map_level().
It's tempting to have a single the call inside __virt_pg_map(), however:
- The call path in (a) is not x86-specific, while (b) is. Moving the
call into __virt_pg_map() would require doing something similar for
other archs implementing virt_pg_map().
- Future changes will reusue __virt_pg_map() for nested PTEs, which should
not update vm->vpages_mapped, i.e. a triple underscore version that does
not update vm->vpages_mapped would need to be provided.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-12-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Use 57-bit addresses with 5-level paging on hardware that supports
LA57. Continue to use 48-bit addresses with 4-level paging on hardware
that doesn't support LA57.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://patch.msgid.link/20251028225827.2269128-4-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Walk the guest page tables via a loop when searching for a PTE,
instead of using unique variables for each level of the page tables.
This simplifies the code and makes it easier to support 5-level paging
in the future.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251028225827.2269128-3-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Walk the guest page tables via a loop when creating new mappings,
instead of using unique variables for each level of the page tables.
This simplifies the code and makes it easier to support 5-level paging
in the future.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251028225827.2269128-2-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
eptp_memslot is unused, remove it. No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Link: https://patch.msgid.link/20251021074736.1324328-10-yosry.ahmed@linux.dev
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
vgic_lpi_stress sends MAPTI and MAPC commands during guest GIC setup to
map interrupt events to ITT entries and collection IDs to
redistributors, respectively.
We have no guarantee that the ITS will finish handling these mapping
commands before the selftest calls KVM_SIGNAL_MSI to inject LPIs to the
guest. If LPIs are injected before ITS mapping completes, the ITS cannot
properly pass the interrupt on to the redistributor.
Fix by adding a SYNC command to the selftests ITS library, then calling
SYNC after ITS mapping to ensure mapping completes before signal_lpi()
writes to GITS_TRANSLATER.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
Link: https://msgid.link/20251119135744.68552-2-mdittgen@amazon.de
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
The selftests GIC library and tests assume that the
GICR_TYPER.Processor_number associated with a given CPU is the same as
the CPU's selftest index.
Since this assumption is not guaranteed by specification, add an assert
in gicv3_cpu_init() that validates this is true.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
Link: https://msgid.link/20251119135744.68552-1-mdittgen@amazon.de
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Test how KVM handles guest SEA when APEI is unable to claim it, and
KVM_CAP_ARM_SEA_TO_USER is enabled.
The behavior is triggered by consuming recoverable memory error (UER)
injected via EINJ. The test asserts two major things:
1. KVM returns to userspace with KVM_EXIT_ARM_SEA exit reason, and
has provided expected fault information, e.g. esr, flags, gva, gpa.
2. Userspace is able to handle KVM_EXIT_ARM_SEA by injecting SEA to
guest and KVM injects expected SEA into the VCPU.
Tested on a data center server running Siryn AmpereOne processor
that has RAS support.
Several things to notice before attempting to run this selftest:
- The test relies on EINJ support in both firmware and kernel to
inject UER. Otherwise the test will be skipped.
- The under-test platform's APEI should be unable to claim the SEA.
Otherwise the test will be skipped.
- Some platform doesn't support notrigger in EINJ, which may cause
APEI and GHES to offline the memory before guest can consume
injected UER, and making test unable to trigger SEA.
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-3-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
|