path: root/arch/riscv/kvm
Age | Commit message | Author
8 days | riscv: kvm: Use endian-specific __lelong for NACL shared memory | Sean Chang
When compiling with sparse enabled (C=2), bitwise type warnings are triggered in the RISC-V KVM implementation. This occurs because the user-space data unboxing macro '__get_user_asm' performs implicit casting on restricted types without forcing the compiler's compliance. Additionally, raw 'unsigned long *' pointers are used to access the SBI NACL shared memory, whereas the RISC-V SBI specification mandates that these structures must follow little-endian byte ordering. Fix these by: 1. Adding a '__force' cast to '__get_user_asm()' to safely suppress implicit cast warnings during user-space data fetching. 2. Introducing the '__lelong' type macro, which dynamically resolves to '__le32' or '__le64' depending on XLEN, and replacing 'unsigned long *' with '__lelong *' to enforce proper compile-time endianness checks. Signed-off-by: Sean Chang <seanwascoding@gmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260608155252.4292-1-seanwascoding@gmail.com Signed-off-by: Anup Patel <anup@brainfault.org>
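The XLEN-dependent type selection described above can be sketched in user-space C roughly as follows. The names `lelong_t` and `lelong_to_cpu()` are hypothetical stand-ins for illustration, not the kernel's `__lelong` definition; the kernel version relies on sparse's bitwise annotations, which plain C cannot express, so this sketch only shows the width selection and the byte-order-independent decode.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an XLEN-dependent little-endian type:
 * a 4-byte or 8-byte container chosen from the width of unsigned long. */
#if ULONG_MAX == 0xffffffffUL
typedef struct { uint8_t b[4]; } lelong_t;
#else
typedef struct { uint8_t b[8]; } lelong_t;
#endif

static_assert(sizeof(lelong_t) == sizeof(unsigned long),
              "container must match the native long width");

/* Decode a little-endian value byte by byte, so the result is the same
 * on little-endian and big-endian hosts. */
static unsigned long lelong_to_cpu(const lelong_t *v)
{
    unsigned long out = 0;

    for (size_t i = 0; i < sizeof(v->b); i++)
        out |= (unsigned long)v->b[i] << (8 * i);
    return out;
}
```

Wrapping the bytes in a struct means a raw `unsigned long *` cannot be passed by accident, which loosely mirrors the compile-time check the commit is after.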
2026-06-07 | RISC-V: KVM: Fix skip of valid pages in kvm_riscv_gstage_unmap_range | Wu Fei
As in kvm_riscv_gstage_wp_range, possibly valid pages must not be skipped when !found_leaf. Unlike the wp case, which may write-protect more than was asked, unmap cannot cover more than the requested range; no block splitting is added for now, and a warning is logged instead. Signed-off-by: Wu Fei <wu.fei9@sanechips.com.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260604230317.30501-3-atwufei@163.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-06-07 | RISC-V: KVM: Fix skip of valid pages in kvm_riscv_gstage_wp_range | Wu Fei
The current gstage range walker unconditionally advances by 'page_size' when a leaf PTE is not found. For example, when the range to write-protect is [0xfffff01fc000, 0xfffff023c000) and page_size is 2MB, if found_leaf is false for 0xfffff01fc000, the walker skips the whole range, even though [0xfffff0200000, 0xfffff023c000) may contain valid entries. Signed-off-by: Wu Fei <wu.fei9@sanechips.com.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260604230317.30501-2-atwufei@163.com Signed-off-by: Anup Patel <anup@brainfault.org>
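The arithmetic behind this example can be checked with a small sketch. Both helpers are hypothetical; stepping to the next block boundary is an assumption about the shape of the fix, not the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Behaviour described as problematic: step by page_size from an address
 * that is not itself aligned to page_size. */
static uint64_t naive_next(uint64_t addr, uint64_t page_size)
{
    return addr + page_size;
}

/* Hypothetical alternative: round down to the enclosing block, then move
 * to the start of the next block, so no part of the range is jumped over. */
static uint64_t next_block_start(uint64_t addr, uint64_t page_size)
{
    /* page_size must be a non-zero power of two */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr & ~(page_size - 1)) + page_size;
}
```

With addr = 0xfffff01fc000 and a 2MB page_size, the naive step lands past the end of the range (0xfffff023c000), while the boundary step lands on 0xfffff0200000, the start of the sub-range the commit message says may still hold valid entries.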
2026-06-07 | KVM: riscv: Check hugetlb block mappings against memslot bounds | Jinyu Tang
RISC-V KVM has used the hugetlb VMA size directly as the G-stage mapping size since stage-2 page table support was added. That is safe only if the block covered by the fault is fully contained in the memslot and the userspace address has the same offset as the GPA within that block. The THP path already checks those constraints before installing a PMD block mapping. The hugetlb path did not, so an unaligned memslot could make KVM install a PMD or PUD sized G-stage block that covers memory outside the slot or maps the wrong host pages. Pass the target mapping size into fault_supports_gstage_huge_mapping(). The same helper can be used for both THP PMD mappings and hugetlb PMD/PUD mappings. Select hugetlb mapping sizes through the same memslot-boundary check, falling back from PUD to PMD to PAGE_SIZE. When a smaller hugetlb mapping size is selected, fault the GFN aligned to that selected size instead of the original VMA size. Also keep hugetlb mappings out of transparent_hugepage_adjust(). Once the hugetlb path has chosen PAGE_SIZE, promoting it again through the THP helper would miss the hugetlb fallback decision. Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming") Signed-off-by: Jinyu Tang <tjytimi@163.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260604142602.3582602-2-tjytimi@163.com Signed-off-by: Anup Patel <anup@brainfault.org>
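The two constraints named above (matching offsets within a block, and the block lying fully inside the memslot) and the PUD to PMD to PAGE_SIZE fallback can be sketched as follows. `struct slot_sketch`, `block_fits_slot()` and `pick_map_size()` are hypothetical names for illustration; the real helper is fault_supports_gstage_huge_mapping(), whose exact signature is not shown in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K 0x1000ULL
#define SZ_2M 0x200000ULL
#define SZ_1G 0x40000000ULL

/* Hypothetical, simplified view of a memslot. */
struct slot_sketch {
    uint64_t gpa_start;   /* guest physical base address */
    uint64_t uaddr_start; /* host userspace base address */
    uint64_t size;        /* length in bytes */
};

static bool block_fits_slot(const struct slot_sketch *s, uint64_t hva,
                            uint64_t map_size)
{
    uint64_t mask = map_size - 1;
    uint64_t block = hva & ~mask;

    assert(map_size != 0 && (map_size & mask) == 0);
    /* GPA and HVA must have the same offset inside a block. */
    if ((s->gpa_start & mask) != (s->uaddr_start & mask))
        return false;
    /* The whole block must lie inside the slot. */
    return block >= s->uaddr_start &&
           block + map_size <= s->uaddr_start + s->size;
}

/* Try the largest size first, then fall back to smaller ones. */
static uint64_t pick_map_size(const struct slot_sketch *s, uint64_t hva,
                              uint64_t vma_size)
{
    static const uint64_t sizes[] = { SZ_1G, SZ_2M };

    for (unsigned int i = 0; i < 2; i++)
        if (sizes[i] <= vma_size && block_fits_slot(s, hva, sizes[i]))
            return sizes[i];
    return SZ_4K;
}
```

The sizes assume 4K base pages with 2MB PMD and 1GB PUD blocks, which is an assumption of the sketch rather than something the commit states.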
2026-06-05 | KVM: RISC-V: SBI FWFT: Fix stale feature exposure after runtime extension changes | Yong-Xuan Wang
Fix a bug where FWFT features could be incorrectly exposed to guests after userspace disables their dependent ISA extensions at runtime. The 'supported' field in kvm_sbi_fwft_config was set once during vCPU initialization based on the initial hardware/extension availability. However, when userspace subsequently disables ISA extensions via the KVM ONE_REG interface, the 'supported' field was not updated. This caused the following issues:
1. FWFT features would remain visible and accessible to guests even after their prerequisite ISA extensions were disabled
2. Guests could configure FWFT features that depend on disabled extensions, leading to undefined behavior
3. The static 'supported' flag and the dynamic supported() callback could disagree about feature availability
The fix introduces a two-layer checking mechanism:
1. Add an optional init() callback to the kvm_sbi_fwft_feature structure for features that require hardware probing during initialization. This separates the one-time hardware detection logic from the runtime availability check.
2. Add runtime checks in all FWFT-related functions that call feature->supported(vcpu) if the callback exists. This ensures feature availability is re-evaluated based on the current ISA extension state.
This approach maintains the cached 'supported' field for initialization-time decisions while ensuring runtime availability is always determined by the current vCPU configuration, not initialization-time snapshots. Fixes: 6b72fd170592 ("RISC-V: KVM: add support for FWFT SBI extension") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260601-kvm-get_reg_list-v2-v5-3-415d08a2813b@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-06-05 | KVM: RISC-V: SBI FWFT: Add optional init() callback for hardware probing | Yong-Xuan Wang
Add an optional init() callback to separate one-time hardware probing from runtime availability checks. For pointer masking, this allows probing supported PMM lengths during initialization while checking ISA extension availability at runtime. Fix try_to_set_pmm() to restore the previous HENVCFG.PMM value after probing, preventing side effects from hardware detection. Add preemption protection to ensure CSR probe sequences complete atomically on the same CPU. Fixes: 6f576fc0aeb9 ("RISC-V: KVM: Add support for SBI_FWFT_POINTER_MASKING_PMLEN") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260601-kvm-get_reg_list-v2-v5-2-415d08a2813b@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-06-05 | KVM: RISC-V: SBI FWFT: Mark vCPU CSRs dirty after setting feature value | Yong-Xuan Wang
Mark the vCPU CSRs as dirty after successfully setting an FWFT feature value. FWFT features may modify CSRs (e.g., pointer masking modifies henvcfg.PMM), and failing to mark them dirty can lead to the guest observing stale CSR state after vCPU scheduling or migration. Fixes: 1323a5cfe52c ("KVM: riscv: Skip CSR restore if VCPU is reloaded on the same core") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260601-kvm-get_reg_list-v2-v5-1-415d08a2813b@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-06-05 | RISC-V: KVM: Enhance the logging check for mmu mapping | Inochi Amaoto
When the dirty ring is enabled, the dirty bitmap is disabled, and the logging check is always false because the RISC-V architecture does not select "NEED_KVM_DIRTY_RING_WITH_BITMAP". Although the dirty log is still recorded, since the write path already tries to add the dirty log entry, the logging check itself is broken and can cause side effects. Enhance the logging check for mmu mapping so it can check both the dirty ring and the dirty bitmap. Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260528113840.2629186-1-inochiama@gmail.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-06-04 | RISC-V: KVM: Fix timer state restore | Qiang Ma
The KVM_REG_RISCV_TIMER_REG(state) one-reg write passes the value written by userspace to kvm_riscv_vcpu_timer_next_event() when re-enabling the timer. That value is the timer state, KVM_RISCV_TIMER_STATE_ON, not the timer compare value. During migration or state restore, userspace restores the compare register separately, which stores the target cycle in t->next_cycles. Re-arming the timer with the state value schedules the next event at cycle 1 instead of the restored compare value, causing the virtual timer to fire too early. Use the restored compare value from t->next_cycles when turning the timer back on. Fixes: 3a9f66cb25e1 ("RISC-V: KVM: Add timer functionality") Signed-off-by: Qiang Ma <maqianga@uniontech.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260526075544.796396-1-maqianga@uniontech.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-06-04 | RISC-V: KVM: Fix NULL pointer dereference in AIA IMSIC functions | Jiakai Xu
Fuzzer reported a NULL pointer dereference in kvm_riscv_vcpu_aia_imsic_put() when a VCPU's imsic_state was NULL while kvm_riscv_aia_initialized() returned true. The global initialized flag is set per-VM in aia_init(), but imsic_state is allocated per-VCPU in kvm_riscv_vcpu_aia_imsic_init(). If a VCPU is created after aia_init() has already run, its imsic_state remains NULL while the global flag is true. When this VCPU is preempted, kvm_sched_out() calls kvm_arch_vcpu_put() -> kvm_riscv_vcpu_aia_put() -> kvm_riscv_vcpu_aia_imsic_put() which dereferences NULL. Add NULL pointer guards to kvm_riscv_vcpu_aia_imsic_put(), consistent with the NULL checks already present in all other functions in the same file. Also add a NULL guard to kvm_riscv_vcpu_aia_imsic_release() and kvm_riscv_vcpu_aia_imsic_has_interrupt() for the same reason. Fixes: 4cec89db80ba ("RISC-V: KVM: Move HGEI[E|P] CSR access to IMSIC virtualization") Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Assisted-by: YuanSheng:DeepSeek-V3.2 Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260526031517.1166025-1-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2026-06-04 | RISC-V: KVM: Document a TOCTOU race in SBI system suspend handler | Jiakai Xu
The SUSP handler checks that all other vCPUs are stopped before entering system suspend, but a concurrent HSM HART_START can start a vCPU after it has already passed the check. This is a known TOCTOU race. We do not fix it because: 1. Triggering it requires a pathological guest. 2. Only guest state is at risk, not host integrity. 3. Userspace can double-check vCPU states before suspend. Add a comment documenting the race and the rationale for not fixing it. Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Assisted-by: YuanSheng:DeepSeek-V3.2 Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260525013642.999187-1-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2026-06-03 | KVM: riscv: Fast-path dirty logging write faults | Jinyu Tang
With dirty logging enabled, guest writes often fault on an existing 4K G-stage leaf that was write-protected only for dirty tracking. The slow path still performs the full fault handling flow and takes mmu_lock for write, even though the page-table shape does not change. x86 handles the analogous case in its fast page fault path by atomically making a writable SPTE writable again when the fault is only a write-protection fault. Add the same style of fast path for RISC-V. If a write fault hits an existing 4K leaf in a writable dirty-log memslot, mark the page dirty and atomically set the PTE writable and dirty under the read side of mmu_lock. The dirty bitmap is updated before the PTE becomes writable again. The PTE D bit is also set so systems that trap on a clear D bit do not fall back to the slow path for a writable but clean PTE. Signed-off-by: Jinyu Tang <tjytimi@163.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260517153427.94889-6-tjytimi@163.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-06-03 | KVM: riscv: Update G-stage PTE permissions atomically | Jinyu Tang
When a fault hits an existing G-stage leaf with the same PFN, KVM only needs to update the PTE permissions. This path will be used by read-side fault handling, so it must not overwrite a concurrent PTE update. Use the cmpxchg helper when relaxing permissions on an existing leaf, following the same concurrency model used by x86 for atomic SPTE permission updates. Retry if another CPU changed the PTE first, and use cpu_relax() while spinning. Signed-off-by: Jinyu Tang <tjytimi@163.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260517153427.94889-5-tjytimi@163.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-06-03 | KVM: riscv: Add a G-stage PTE cmpxchg helper | Jinyu Tang
Permission-only G-stage PTE updates can run in parallel once they are moved to the read side of mmu_lock. Plain set_pte() is not enough for that case because another CPU may update the same PTE first. x86 handles the same class of SPTE races with cmpxchg-based updates in its fast page fault and TDP MMU paths. Add a small RISC-V helper for atomic G-stage PTE updates. The helper reports contention to the caller and flushes the target range only when the PTE value actually changes. Signed-off-by: Jinyu Tang <tjytimi@163.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260517153427.94889-4-tjytimi@163.com Signed-off-by: Anup Patel <anup@brainfault.org>
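A compare-and-exchange update of this kind can be sketched in portable C11. The names `pte_try_update()`, `enum pte_update` and `flush_count` are hypothetical; the kernel helper uses its own cmpxchg primitives and a real ranged TLB flush, which the counter below only stands in for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum pte_update { PTE_UNCHANGED, PTE_UPDATED, PTE_CONTENDED };

/* Stands in for a ranged TLB flush; only counts how often it would run. */
static unsigned int flush_count;

/* Replace old_val with new_val only if the PTE still holds old_val.
 * Contention is reported to the caller, who decides whether to retry. */
static enum pte_update pte_try_update(_Atomic uint64_t *ptep,
                                      uint64_t old_val, uint64_t new_val)
{
    assert(ptep);
    if (old_val == new_val)
        return PTE_UNCHANGED;      /* nothing to write, nothing to flush */
    if (!atomic_compare_exchange_strong(ptep, &old_val, new_val))
        return PTE_CONTENDED;      /* another CPU changed the PTE first */
    flush_count++;                 /* flush only on an actual change */
    return PTE_UPDATED;
}
```

A caller running under the read side of a lock would typically re-read the PTE and retry on PTE_CONTENDED, which is the pattern the follow-up commit describes.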
2026-06-03 | KVM: riscv: Use an rwlock for mmu_lock | Jinyu Tang
RISC-V KVM currently uses a spinlock for mmu_lock. That serializes all G-stage MMU operations, including permission-only updates that do not allocate or free page-table pages. Use KVM's rwlock form of mmu_lock, as x86 and arm64 already do. Keep the existing map, unmap and teardown paths on the write side. This prepares RISC-V for read-side handling of G-stage permission updates. Signed-off-by: Jinyu Tang <tjytimi@163.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260517153427.94889-3-tjytimi@163.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-06-03 | KVM: riscv: Rely on common MMU notifier locking | Jinyu Tang
The common KVM invalidation paths call kvm_unmap_gfn_range() with mmu_lock already held for write. For the standard MMU notifier path, the call chain is: kvm_mmu_notifier_invalidate_range_start() kvm_handle_hva_range() kvm_unmap_gfn_range() kvm_mmu_notifier_invalidate_range_start() leaves range.lockless clear. kvm_handle_hva_range() therefore takes KVM_MMU_LOCK(kvm) before invoking the handler. The guest_memfd path has the same locking contract: __kvm_gmem_invalidate_begin() kvm_mmu_unmap_gfn_range() kvm_unmap_gfn_range() __kvm_gmem_invalidate_begin() explicitly takes KVM_MMU_LOCK(kvm) before calling kvm_mmu_unmap_gfn_range(). So remove the local trylock and make the common locking contract explicit with lockdep_assert_held_write() like x86. Signed-off-by: Jinyu Tang <tjytimi@163.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260517153427.94889-2-tjytimi@163.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-06-01 | RISC-V: KVM: Use flexible array for APLIC IRQ state | Rosen Penev
Store the per-source APLIC IRQ state in the APLIC allocation instead of allocating it separately. This ties the IRQ state lifetime directly to the APLIC state, removes a separate allocation failure path, and lets __counted_by() describe the array bounds. Assisted-by: Codex:GPT-5.5 Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260511032144.361520-1-rosenp@gmail.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-05-31 | RISC-V: KVM: Batch stage-2 TLB flushes | Jinyu Tang
KVM RISC-V triggers a TLB flush for every single stage-2 PTE modification (unmap or write-protect) now. Although KVM coalesces the hardware IPIs, the software overhead of executing the flush work for every page is large, especially during dirty page tracking. Following the approach used in x86 and arm64, this patch optimizes the MMU logic by making the PTE manipulation functions return a boolean indicating if a leaf PTE was actually changed. The outer MMU functions bubble up this flag to batch the remote TLB flushes. Consequently, the flush operation is executed only once per batch. Moving it outside of the `mmu_lock` also reduces lock contention. Tested with tools/testing/selftests/kvm on a 4-vCPU guest (Host environment: QEMU 10.2.1 RISC-V) 1. demand_paging_test (1GB memory) time ./demand_paging_test -b 1G -v 4 - Total execution time reduced from ~2m39s to ~2m31s 2. dirty_log_perf_test (1GB memory) ./dirty_log_perf_test -b 1G -v 4 - "Clear dirty log time" per iteration dropped significantly from ~3.40s to ~0.18s Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Jinyu Tang <tjytimi@163.com> Link: https://lore.kernel.org/r/20260412023822.83341-1-tjytimi@163.com Signed-off-by: Anup Patel <anup@brainfault.org>
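The batching idea (per-PTE functions return whether a leaf changed, and the caller flushes once) can be sketched as below. All names are hypothetical and the "flush" is only a counter; the bit positions follow the RISC-V PTE layout (V is bit 0, W is bit 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_VALID 0x1ULL
#define PTE_WRITE 0x4ULL

/* Counts how many remote flushes the sketch would have issued. */
static unsigned int remote_flushes;

/* Write-protect one PTE; report whether it actually changed. */
static bool wp_one(uint64_t *pte)
{
    if (!(*pte & PTE_VALID) || !(*pte & PTE_WRITE))
        return false;
    *pte &= ~PTE_WRITE;
    return true;
}

/* Walk the range and bubble up a single "needs flush" flag. */
static bool wp_range(uint64_t *ptes, size_t n)
{
    bool flush = false;

    for (size_t i = 0; i < n; i++)
        flush |= wp_one(&ptes[i]);
    return flush;
}

/* Outer function: one flush per batch instead of one per PTE. */
static void wp_range_and_flush(uint64_t *ptes, size_t n)
{
    if (wp_range(ptes, n))
        remote_flushes++;
}
```

A range in which nothing changed issues no flush at all, which is where much of the saving during repeated dirty-log clearing would come from.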
2026-05-26 | RISC-V: KVM: AIA: Make HGEI number management fully per-CPU | Guo Ren (Alibaba DAMO Academy)
Previously, the number of Hypervisor Guest External Interrupt (HGEI) lines was stored in a single global variable `kvm_riscv_aia_nr_hgei` and assumed to be the same for all HARTs. This assumption does not hold on heterogeneous RISC-V SoCs where different cores may expose different HGEIE CSR widths. Introduce `nr_hgei` field into the per-CPU `struct aia_hgei_control` and probe the actual supported HGEI count for the current HART in `kvm_riscv_aia_enable()` using the standard RISC-V CSR probe technique: csr_write(CSR_HGEIE, -1UL); nr = fls_long(csr_read(CSR_HGEIE)); if (nr) nr--; All HGEI allocation, free and disable paths (`kvm_riscv_aia_free_hgei()`, `kvm_riscv_aia_disable()`, etc.) now use the per-CPU value instead of the global one. The global `kvm_riscv_aia_nr_hgei` now represents the minimum number of HGEI lines across HARTs and can be used to check whether HGEI support is available or not. This makes KVM AIA robust on big.LITTLE-style asymmetric platforms. Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260525094945.3721783-3-anup.patel@oss.qualcomm.com Signed-off-by: Anup Patel <anup@brainfault.org>
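The probe sequence quoted above can be exercised against a simulated register. Everything prefixed `sim_` and the `fls_long_sketch()` helper are hypothetical stand-ins for the kernel's csr_write(), csr_read() and fls_long(); the simulation assumes the register simply drops writes to unimplemented bits, with bit 0 never writable.

```c
#include <assert.h>

static unsigned long sim_hgeie;      /* simulated CSR contents */
static unsigned long sim_hgeie_mask; /* bits this "hart" implements */

/* Writes to unimplemented bits are dropped. */
static void sim_csr_write_hgeie(unsigned long v)
{
    sim_hgeie = v & sim_hgeie_mask;
}

static unsigned long sim_csr_read_hgeie(void)
{
    return sim_hgeie;
}

/* 1-based index of the highest set bit, 0 if no bit is set. */
static int fls_long_sketch(unsigned long x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Write all ones, read back, and derive the number of HGEI lines. */
static int probe_nr_hgei(void)
{
    int nr;

    sim_csr_write_hgeie(-1UL);
    nr = fls_long_sketch(sim_csr_read_hgeie());
    if (nr)
        nr--; /* bit 0 does not correspond to an interrupt line */
    return nr;
}
```

Running the probe with different masks models two harts that expose different HGEIE widths, which is the heterogeneous case the commit addresses.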
2026-05-26 | RISC-V: KVM: Fix ebreak self test failure | Mayuresh Chitale
The ebreak self test enables/disables guest debugging as a part of the test. However, the KVM_SET_GUEST_DEBUG ioctl doesn't actually do it. Fix this by calling kvm_riscv_vcpu_config_guest_debug. Fixes: 6ed523e2b612 ("RISC-V: KVM: Factor-out VCPU config into separate sources") Signed-off-by: Mayuresh Chitale <mayuresh.chitale@oss.qualcomm.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260525095930.3924905-1-mayuresh.chitale@oss.qualcomm.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-05-18 | RISC-V: KVM: Fix sign extension for MMIO loads | Jiakai Xu
The kvm_riscv_vcpu_mmio_return() function handles MMIO read results by writing the data back to the guest register. For signed load instructions (LB, LH, LW on RV64), the value needs sign-extension from a smaller integer to unsigned long. The current code uses: (ulong)data << shift >> shift but (ulong) makes the right shift a logical shift (zero-extend) rather than an arithmetic shift (sign-extend), causing incorrect results when the MMIO device returns a negative value. For example, LB reading 0x80 would return 128 instead of -128. Fix this by casting to (long) after the left shift so that the subsequent right shift is arithmetic and correctly propagates the sign bit: (long)((ulong)data << shift) >> shift Additionally, remove the unnecessary shift assignment for LBU (unsigned byte load) since it does not need sign extension. This makes LBU consistent with LHU and LWU which already keep shift = 0. Fixes: b91f0e4cb8a3 ("RISC-V: KVM: Factor-out instruction emulation into separate sources") Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Assisted-by: OpenClaw:DeepSeek-V3.2 Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260514081752.472987-1-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
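The difference between the two expressions can be reproduced in user space. The function names and the `len` parameter are hypothetical; the sketch assumes a compiler such as GCC or Clang, where converting an out-of-range unsigned value to long wraps and right-shifting a negative long is arithmetic (both are implementation-defined in ISO C).

```c
#include <assert.h>

/* Shift that moves the top bit of a len-byte value to the top of a long. */
static unsigned int ext_shift(unsigned int len)
{
    assert(len >= 1 && len <= sizeof(unsigned long));
    return 8 * (unsigned int)(sizeof(unsigned long) - len);
}

/* Old form: both shifts happen on an unsigned value, so the right shift
 * is logical and the upper bits are filled with zeros. */
static long mmio_extend_buggy(unsigned long data, unsigned int len)
{
    unsigned int shift = ext_shift(len);

    return (long)(data << shift >> shift);
}

/* Fixed form: cast to long after the left shift, so the right shift is
 * arithmetic and copies the sign bit into the upper bits. */
static long mmio_extend_fixed(unsigned long data, unsigned int len)
{
    unsigned int shift = ext_shift(len);

    return (long)(data << shift) >> shift;
}
```

For a one-byte load of 0x80, the old form yields 128 and the fixed form yields -128, matching the LB example in the commit message.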
2026-05-18 | RISC-V: KVM: Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler | Jiakai Xu
The SBI v0.1 SEND_IPI handler iterates over the hart mask and calls kvm_get_vcpu_by_id() to find the target vcpu for each set bit. When a guest provides a hart mask containing bits for non-existent vcpu_ids, kvm_get_vcpu_by_id() returns NULL, which is then unconditionally dereferenced by kvm_riscv_vcpu_set_interrupt(), causing a kernel crash. Fix this by adding a NULL check before dereferencing the return value. If the target vcpu is not found, skip it and continue processing the remaining valid harts. Fixes: a046c2d8578c ("RISC-V: KVM: Reorganize SBI code by moving SBI v0.1 to its own file") Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Assisted-by: OpenClaw:DeepSeek-V3.2 Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260517124414.420919-1-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2026-05-18 | riscv: kvm: return SBI_ERR_FAILURE for pmu_event_info() when OOM | Osama Abdelkader
kvm_riscv_vcpu_pmu_event_info() returned -ENOMEM from the SBI extension handler, which caused kvm_riscv_vcpu_sbi_ecall() to abort KVM_RUN and surface the error to userspace instead of completing the ECALL with a negative SBI error in a0. Use SBI_ERR_FAILURE and the normal retdata path, matching other PMU handlers and kvm_sbi_ext_pmu_handler comment. Fixes: e309fd113b9f ("RISC-V: KVM: Implement get event info function") Cc: stable@vger.kernel.org Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260514173642.41448-2-osama.abdelkader@gmail.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-05-18 | riscv: kvm: return SBI_ERR_FAILURE for pmu_snapshot_set_shmem() when OOM | Osama Abdelkader
kvm_riscv_vcpu_pmu_snapshot_set_shmem() returned -ENOMEM from the SBI extension handler, which caused kvm_riscv_vcpu_sbi_ecall() to abort KVM_RUN and surface the error to userspace instead of completing the ECALL with a negative SBI error in a0. Use SBI_ERR_FAILURE and the normal retdata path, matching other PMU handlers and kvm_sbi_ext_pmu_handler comment. Fixes: c2f41ddbcdd7 ("RISC-V: KVM: Implement SBI PMU Snapshot feature") Cc: stable@vger.kernel.org Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260514173642.41448-1-osama.abdelkader@gmail.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-05-18 | RISC-V: KVM: Fix invalid HVA warning in steal-time recording | Jiakai Xu
kvm_riscv_vcpu_record_steal_time() assumes that the steal-time shared memory GPA (vcpu->arch.sta.shmem) is always backed by a valid guest memory slot. However, this assumption is not guaranteed by the KVM userspace ABI. A malicious or buggy userspace can set the STA shared memory GPA via KVM_SET_ONE_REG without establishing a corresponding memory region via KVM_SET_USER_MEMORY_REGION. In such cases, the GPA cannot be translated to a valid HVA and kvm_vcpu_gfn_to_hva() returns an error address. The current implementation incorrectly treats this as a kernel warning using WARN_ON(), which may escalate to a kernel panic when panic_on_warn is enabled. This is not a kernel bug condition but a normal invalid configuration from userspace, and should be handled gracefully. Fix it by removing WARN_ON() and treating invalid HVA as a normal failure case, resetting the STA shared memory state. Fixes: e9f12b5fff8ad0 ("RISC-V: KVM: Implement SBI STA extension") Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Assisted-by: OpenClaw:DeepSeek-V3.2 Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260415075216.2757427-1-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2026-04-24 | Merge tag 'riscv-for-linus-7.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux | Linus Torvalds
Pull RISC-V updates from Paul Walmsley:
"There is one significant change outside arch/riscv in this pull request: the addition of a set of KUnit tests for strlen(), strnlen(), and strrchr().
Otherwise, the most notable changes are to add some RISC-V-specific string function implementations, to remove XIP kernel support, to add hardware error exception handling, and to optimize our runtime unaligned access speed testing.
A few comments on the motivation for removing XIP support. It's been broken in the RISC-V kernel for months. The code is not easy to maintain. Furthermore, for XIP support to truly be useful for RISC-V, we think that compile-time feature switches would need to be added for many of the RISC-V ISA features and microarchitectural properties that are currently implemented with runtime patching. No one has stepped forward to take responsibility for that work, so many of us think it's best to remove it until clear use cases and champions emerge.
Summary:
- Add Kunit correctness testing and microbenchmarks for strlen(), strnlen(), and strrchr()
- Add RISC-V-specific strnlen(), strchr(), strrchr() implementations
- Add hardware error exception handling
- Clean up and optimize our unaligned access probe code
- Enable HAVE_IOREMAP_PROT to be able to use generic_access_phys()
- Remove XIP kernel support
- Warn when addresses outside the vmemmap range are passed to vmemmap_populate()
- Update the ACPI FADT revision check to warn if it's not at least ACPI v6.6, which is when key RISC-V-specific tables were added to the specification
- Increase COMMAND_LINE_SIZE to 2048 to match ARM64, x86, PowerPC, etc.
- Make kaslr_offset() a static inline function, since there's no need for it to show up in the symbol table
- Add KASLR offset and SATP to the VMCOREINFO ELF notes to improve kdump support
- Add Makefile cleanup rule for vdso_cfi copied source files, and add a .gitignore for the build artifacts in that directory
- Remove some redundant ifdefs that check Kconfig macros
- Add missing SPDX license tag to the CFI selftest
- Simplify UTS_MACHINE assignment in the RISC-V Makefile
- Clarify some unclear comments and remove some superfluous comments
- Fix various English typos across the RISC-V codebase"
* tag 'riscv-for-linus-7.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (31 commits)
  riscv: Remove support for XIP kernel
  riscv: Reuse compare_unaligned_access() in check_vector_unaligned_access()
  riscv: Split out compare_unaligned_access()
  riscv: Reuse measure_cycles() in check_vector_unaligned_access()
  riscv: Split out measure_cycles() for reuse
  riscv: Clean up & optimize unaligned scalar access probe
  riscv: lib: add strrchr() implementation
  riscv: lib: add strchr() implementation
  riscv: lib: add strnlen() implementation
  lib/string_kunit: extend benchmarks to strnlen() and chr searches
  lib/string_kunit: add performance benchmark for strlen()
  lib/string_kunit: add correctness test for strrchr()
  lib/string_kunit: add correctness test for strnlen()
  lib/string_kunit: add correctness test for strlen()
  riscv: vdso_cfi: Add .gitignore for build artifacts
  riscv: vdso_cfi: Add clean rule for copied sources
  riscv: enable HAVE_IOREMAP_PROT
  riscv: mm: WARN_ON() for bad addresses in vmemmap_populate()
  riscv: acpi: update FADT revision check to 6.6
  riscv: add hardware error trap handler support
  ...
2026-04-06 | RISC-V: KVM: Fix shift-out-of-bounds in make_xfence_request() | Jiakai Xu
The make_xfence_request() function uses a shift operation to check if a vCPU is in the hart mask: if (!(hmask & (1UL << (vcpu->vcpu_id - hbase)))) However, when the difference between vcpu_id and hbase is >= BITS_PER_LONG, the shift operation causes undefined behavior. This was detected by UBSAN: UBSAN: shift-out-of-bounds in arch/riscv/kvm/tlb.c:343:23 shift exponent 256 is too large for 64-bit type 'long unsigned int' Fix this by adding a bounds check before the shift operation. This bug was found by fuzzing the KVM RISC-V interface. Fixes: 13acfec2dbcc ("RISC-V: KVM: Add remote HFENCE functions based on VCPU requests") Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260403232011.2394966-1-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
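A bounds check of the kind described can be sketched as a standalone predicate. `vcpu_in_hart_mask()` and `BITS_PER_LONG_SKETCH` are hypothetical names, and the exact placement of the check in make_xfence_request() is not shown in the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG_SKETCH (CHAR_BIT * sizeof(unsigned long))

/* Return true only if vcpu_id is selected by (hbase, hmask).
 * The distance from hbase is validated before it is used as a shift
 * count, so the shift can never reach or exceed the width of long. */
static bool vcpu_in_hart_mask(unsigned long hmask, unsigned long hbase,
                              unsigned long vcpu_id)
{
    if (vcpu_id < hbase)
        return false; /* the subtraction below would wrap around */
    if (vcpu_id - hbase >= BITS_PER_LONG_SKETCH)
        return false; /* outside the window the mask can describe */
    return (hmask & (1UL << (vcpu_id - hbase))) != 0;
}
```

With hbase 0 and vcpu_id 256, the case from the UBSAN report, the predicate returns false without evaluating the shift.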
2026-04-04 | riscv: fix various typos in comments and code | Sean Chang
Fix various typos in RISC-V architecture code and comments. The following changes are included:
- arch/riscv/errata/thead/errata.c: "futher" → "further"
- arch/riscv/include/asm/atomic.h: "therefor" → "therefore", "arithmatic" → "arithmetic"
- arch/riscv/include/asm/elf.h: "availiable" → "available", "coorespends" → "corresponds"
- arch/riscv/include/asm/processor.h: "requries" → "is required"
- arch/riscv/include/asm/thread_info.h: "returing" → "returning"
- arch/riscv/kernel/acpi.c: "compliancy" → "compliance"
- arch/riscv/kernel/ftrace.c: "therefor" → "therefore"
- arch/riscv/kernel/head.S: "intruction" → "instruction"
- arch/riscv/kernel/mcount-dyn.S: "localtion" → "location"
- arch/riscv/kernel/module-sections.c: "maxinum" → "maximum"
- arch/riscv/kernel/probes/kprobes.c: "reenabled" → "re-enabled"
- arch/riscv/kernel/probes/uprobes.c: "probbed" → "probed"
- arch/riscv/kernel/soc.c: "extremly" → "extremely"
- arch/riscv/kernel/suspend.c: "incosistent" → "inconsistent"
- arch/riscv/kvm/tlb.c: "cahce" → "cache"
- arch/riscv/kvm/vcpu_pmu.c: "indicies" → "indices"
- arch/riscv/lib/csum.c: "implmentations" → "implementations"
- arch/riscv/lib/memmove.S: "ammount" → "amount"
- arch/riscv/mm/cacheflush.c: "visable" → "visible"
- arch/riscv/mm/physaddr.c: "aginst" → "against"
Signed-off-by: Sean Chang <seanwascoding@gmail.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20260212163325.60389-1-seanwascoding@gmail.com Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04 | RISC-V: KVM: Reuse KVM_CAP_VM_GPA_BITS to select HGATP.MODE | Fangyu Yu
Reuse KVM_CAP_VM_GPA_BITS to advertise and select the effective G-stage GPA width for a VM. KVM_CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) returns the effective GPA bits for a VM, KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) allows userspace to downsize the effective GPA width by selecting a smaller G-stage page table format: - gpa_bits <= 41 selects Sv39x4 (pgd_levels=3) - gpa_bits <= 50 selects Sv48x4 (pgd_levels=4) - gpa_bits <= 59 selects Sv57x4 (pgd_levels=5) Reject the request with -EINVAL for unsupported values and with -EBUSY if vCPUs have been created or any memslot is populated. Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260403153019.9916-4-fangyu.yu@linux.alibaba.com Signed-off-by: Anup Patel <anup@brainfault.org>
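The thresholds listed above map to page-table levels as in the following sketch. The function name and the `max_levels` parameter (the deepest format the VM could otherwise use) are hypothetical, and handling of very small gpa_bits values is not stated in the commit message, so the sketch simply folds them into the Sv39x4 case.

```c
#include <assert.h>
#include <errno.h>

/* Map a requested GPA width to a G-stage pgd_levels value, or -EINVAL.
 * 41, 50 and 59 are the Sv39x4, Sv48x4 and Sv57x4 limits from the list. */
static int gpa_bits_to_pgd_levels(unsigned int gpa_bits, int max_levels)
{
    int levels;

    if (gpa_bits <= 41)
        levels = 3; /* Sv39x4 */
    else if (gpa_bits <= 50)
        levels = 4; /* Sv48x4 */
    else if (gpa_bits <= 59)
        levels = 5; /* Sv57x4 */
    else
        return -EINVAL;

    /* Assumption: a VM can only downsize, never exceed max_levels. */
    return levels <= max_levels ? levels : -EINVAL;
}
```

The -EBUSY case (vCPUs already created or memslots populated) depends on VM state and is left out of this sketch.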
2026-04-04 | RISC-V: KVM: Cache gstage pgd_levels in struct kvm_gstage | Fangyu Yu
Gstage page-table helpers frequently chase gstage->kvm->arch to fetch pgd_levels. This adds noise and repeats the same dereference chain in hot paths. Add pgd_levels to struct kvm_gstage and initialize it from kvm->arch when setting up a gstage instance. Introduce kvm_riscv_gstage_init() to centralize initialization and switch gstage code to use gstage->pgd_levels. Suggested-by: Anup Patel <anup@brainfault.org> Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Link: https://lore.kernel.org/r/20260403153019.9916-3-fangyu.yu@linux.alibaba.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-04-04  RISC-V: KVM: Support runtime configuration for per-VM's HGATP mode  Fangyu Yu
Introduce a per-VM architecture-specific field to support runtime configuration of the G-stage page table format: - kvm->arch.pgd_levels: the number of page table levels for the selected mode. This field replaces the previous global variables kvm_riscv_gstage_mode and kvm_riscv_gstage_pgd_levels, enabling different virtual machines to independently select their G-stage page table format instead of being forced to share the maximum mode detected by the kernel at boot time. Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Link: https://lore.kernel.org/r/20260403153019.9916-2-fangyu.yu@linux.alibaba.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-04-03  RISC-V: KVM: Don't check hstateen0 when updating sstateen0 CSR  Anup Patel
The hstateen0 will be programmed differently for guest HS-mode and guest VS/VU-mode so don't check hstateen0.SSTATEEN0 bit when updating sstateen0 CSR in kvm_riscv_vcpu_swap_in_guest_state() and kvm_riscv_vcpu_swap_in_host_state(). Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com> Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260120080013.2153519-10-anup.patel@oss.qualcomm.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-04-03  RISC-V: KVM: Factor-out VCPU config into separate sources  Anup Patel
The VCPU config deals with hideleg, hedeleg, henvcfg, and hstateenX CSR configuration for each VCPU. Factor-out VCPU config into separate sources so that VCPU config can do things differently for guest HS-mode and guest VS/VU-mode. Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com> Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260120080013.2153519-9-anup.patel@oss.qualcomm.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-04-03  RISC-V: KVM: Add hideleg to struct kvm_vcpu_config  Anup Patel
The hideleg CSR state when VCPU is running in guest VS/VU-mode will be different from when it is running in guest HS-mode. To achieve this, add hideleg to struct kvm_vcpu_config and re-program hideleg CSR upon every kvm_arch_vcpu_load(). Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com> Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260120080013.2153519-8-anup.patel@oss.qualcomm.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-04-03  RISC-V: KVM: Factor-out ISA checks into separate sources  Anup Patel
The KVM ISA extension related checks are not VCPU specific and should be factored out of vcpu_onereg.c into separate sources. Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com> Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260120080013.2153519-6-anup.patel@oss.qualcomm.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-04-03  RISC-V: KVM: Introduce common kvm_riscv_isa_check_host()  Anup Patel
Rename kvm_riscv_vcpu_isa_check_host() to kvm_riscv_isa_check_host() and use it as a common function within KVM RISC-V to check the ISA extensions supported by the host. Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com> Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260120080013.2153519-5-anup.patel@oss.qualcomm.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-04-02  KVM: riscv: Skip CSR restore if VCPU is reloaded on the same core  Jinyu Tang
Currently, kvm_arch_vcpu_load() unconditionally restores guest CSRs, HGATP, and AIA state. However, when a VCPU is loaded back on the same physical CPU, and no other KVM VCPU has run on this CPU since it was last put, the hardware CSRs and AIA registers are still valid. This patch optimizes the vcpu_load path by skipping the expensive CSR and AIA writes if all the following conditions are met:
1. It is being reloaded on the same CPU (vcpu->arch.last_exit_cpu == cpu).
2. The CSRs are not dirty (!vcpu->arch.csr_dirty).
3. No other VCPU used this CPU (vcpu == __this_cpu_read(kvm_former_vcpu)).
To ensure this fast-path doesn't break corner cases:
- Live migration and VCPU reset are naturally safe. KVM initializes last_exit_cpu to -1, which guarantees the fast-path won't trigger.
- The 'csr_dirty' flag tracks runtime userspace interventions. If userspace modifies guest configurations (e.g., hedeleg via KVM_SET_GUEST_DEBUG, or CSRs including AIA via KVM_SET_ONE_REG), the flag is set so that the fast path is bypassed.
With the 'csr_dirty' safeguard proven effective, it is safe to include kvm_riscv_vcpu_aia_load() inside the skip logic now.
Signed-off-by: Jinyu Tang <tjytimi@163.com> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260227121008.442241-1-tjytimi@163.com Signed-off-by: Anup Patel <anup@brainfault.org>
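The three fast-path conditions above combine into a single predicate. The sketch below models that predicate in plain user-space C; struct vcpu_sketch and can_skip_csr_restore() are hypothetical names, and the per-CPU kvm_former_vcpu lookup is replaced by an explicit former_vcpu argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two vcpu->arch fields named above. */
struct vcpu_sketch {
	int last_exit_cpu;	/* -1 until the VCPU has run somewhere */
	bool csr_dirty;		/* set when userspace changed guest CSR state */
};

/*
 * Return true only when all three conditions from the commit message
 * hold: same CPU, CSRs not dirty, and this VCPU was the last one to
 * use the CPU. former_vcpu models the per-CPU kvm_former_vcpu value.
 */
static bool can_skip_csr_restore(const struct vcpu_sketch *vcpu, int cpu,
				 const struct vcpu_sketch *former_vcpu)
{
	assert(vcpu);

	return vcpu->last_exit_cpu == cpu &&
	       !vcpu->csr_dirty &&
	       vcpu == former_vcpu;
}
```

Because last_exit_cpu starts at -1, a freshly created or reset VCPU can never match a real CPU number, so the first load always takes the full restore path.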
2026-03-30  RISC-V: KVM: Split huge pages during fault handling for dirty logging  Wang Yechao
During dirty logging, all huge pages are write-protected. When the guest writes to a write-protected huge page, a page fault is triggered. Before recovering the write permission, the huge page must be split into smaller pages (e.g., 4K). After splitting, the normal mapping process proceeds, allowing write permission to be restored at the smaller page granularity. If dirty logging is disabled because migration failed or was cancelled, only recover the write permission at the 4K level, and skip recovering the huge page mapping at this time to avoid the overhead of freeing page tables. The huge page mapping can be recovered in the ioctl context, similar to x86, in a later patch. Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/202603301612587174XZ6QMCrymBqv30S6BN50@zte.com.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2026-03-30  RISC-V: KVM: Fix lost write protection on huge pages during dirty logging  Wang Yechao
When enabling dirty log in small chunks (e.g., QEMU default chunk size of 256K), the chunk size is always smaller than the page size of huge pages (1G or 2M) used in the gstage page tables. This caused the write protection to be incorrectly skipped for huge PTEs because the condition `(end - addr) >= page_size` was not satisfied. Remove the size check in `kvm_riscv_gstage_wp_range()` to ensure huge PTEs are always write-protected regardless of the chunk size. Additionally, explicitly align the address down to the page size before invoking `kvm_riscv_gstage_op_pte()` to guarantee that the address passed to the operation function is page-aligned. This fixes the issue where dirty pages might not be tracked correctly when using huge pages. Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming") Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/202603301610527120YZ-pAJY6x9SBpSRo1Wg4@zte.com.cn Signed-off-by: Anup Patel <anup@brainfault.org>
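To make the failure concrete, the sketch below evaluates only the size condition quoted in this commit message for a 256K chunk that lies inside a 2M huge page, and shows the align-down step the fix adds. Both function names are hypothetical, and the rest of the range walker is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The size condition quoted in the commit message, in isolation. */
static bool old_size_check_passes(uint64_t addr, uint64_t end,
				  uint64_t page_size)
{
	return (end - addr) >= page_size;
}

/* Align an address down to the start of its (power-of-two) page. */
static uint64_t align_down_to_page(uint64_t addr, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	return addr & ~(page_size - 1);
}
```

For the chunk [0x240000, 0x280000) and a 2M page size, the old check fails because the chunk is only 256K long, so the huge PTE was skipped. Aligning 0x240000 down yields 0x200000, the base of the huge page that actually needs write protection.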
2026-03-30  RISC-V: KVM: Fix integer overflow in kvm_pmu_validate_counter_mask()  Jiakai Xu
When a guest initiates an SBI_EXT_PMU_COUNTER_CFG_MATCH call with ctr_base=0xfffffffffffffffe, ctr_mask=0xeb5f and flags=0x1 (SBI_PMU_CFG_FLAG_SKIP_MATCH), kvm_riscv_vcpu_pmu_ctr_cfg_match() first invokes kvm_pmu_validate_counter_mask() to verify whether ctr_base and ctr_mask are valid, by evaluating:
  !ctr_mask || (ctr_base + __fls(ctr_mask) >= kvm_pmu_num_counters(kvpmu))
With the above inputs, __fls(0xeb5f) equals 15, and adding 15 to 0xfffffffffffffffe causes an integer overflow, wrapping around to 13. Since 13 is less than kvm_pmu_num_counters(), the validation wrongly succeeds. Thereafter, since flags & SBI_PMU_CFG_FLAG_SKIP_MATCH is satisfied, the code evaluates:
  !test_bit(ctr_base + __ffs(ctr_mask), kvpmu->pmc_in_use)
Here __ffs(0xeb5f) equals 0, so test_bit() receives 0xfffffffffffffffe as the bit index and attempts to access the corresponding element of the kvpmu->pmc_in_use, which results in an invalid memory access. This triggers the following Oops:
  Unable to handle kernel paging request at virtual address e3ebffff12abba89
  generic_test_bit include/asm-generic/bitops/generic-non-atomic.h:128
  kvm_riscv_vcpu_pmu_ctr_cfg_match arch/riscv/kvm/vcpu_pmu.c:758
  kvm_sbi_ext_pmu_handler arch/riscv/kvm/vcpu_sbi_pmu.c:49
  kvm_riscv_vcpu_sbi_ecall arch/riscv/kvm/vcpu_sbi.c:608
  kvm_riscv_vcpu_exit arch/riscv/kvm/vcpu_exit.c:240
The root cause is that kvm_pmu_validate_counter_mask() does not account for the case where ctr_base itself is out of range, allowing the subsequent addition to silently overflow and bypass the check. Fix this by explicitly validating ctr_base against kvm_pmu_num_counters() before performing the addition. This bug was found by fuzzing the KVM RISC-V PMU interface.
Fixes: 0cb74b65d2e5e6 ("RISC-V: KVM: Implement perf support without sampling") Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Atish Patra <atish.patra@linux.dev> Link: https://lore.kernel.org/r/20260319035902.924661-1-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
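The wrap-around described in this commit can be reproduced in user space. The sketch below is an illustration under assumptions: it uses uint64_t in place of the kernel's unsigned long, the GCC/Clang builtin __builtin_clzll in place of __fls(), and hypothetical function names. The fixed variant adds the explicit ctr_base range check the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of the highest set bit; stands in for the kernel's __fls(). */
static unsigned int highest_set_bit(uint64_t mask)
{
	assert(mask != 0);	/* undefined for 0, as with __fls() */

	return 63u - (unsigned int)__builtin_clzll(mask);
}

/* Pre-fix check: the addition can wrap and slip under num_counters. */
static bool mask_invalid_old(uint64_t ctr_base, uint64_t ctr_mask,
			     uint64_t num_counters)
{
	return !ctr_mask ||
	       (ctr_base + highest_set_bit(ctr_mask) >= num_counters);
}

/* Post-fix check: reject an out-of-range ctr_base before adding. */
static bool mask_invalid_fixed(uint64_t ctr_base, uint64_t ctr_mask,
			       uint64_t num_counters)
{
	return !ctr_mask ||
	       ctr_base >= num_counters ||
	       (ctr_base + highest_set_bit(ctr_mask) >= num_counters);
}
```

With ctr_base=0xfffffffffffffffe and ctr_mask=0xeb5f, the old check computes 0xfffffffffffffffe + 15, which wraps to 13 and is accepted for any counter count above 13, while the fixed check rejects the request on the ctr_base comparison alone.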
2026-03-30  RISC-V: KVM: Fix double-free of sdata in kvm_pmu_clear_snapshot_area()  Jiakai Xu
In kvm_riscv_vcpu_pmu_snapshot_set_shmem(), when kvm_vcpu_write_guest() fails, kvpmu->sdata is freed but not set to NULL. This leaves a dangling pointer that will be freed again when kvm_pmu_clear_snapshot_area() is called during vcpu teardown, triggering a KASAN double-free report.
First free occurs in kvm_riscv_vcpu_pmu_snapshot_set_shmem():
  kvm_riscv_vcpu_pmu_snapshot_set_shmem arch/riscv/kvm/vcpu_pmu.c:443
  kvm_sbi_ext_pmu_handler arch/riscv/kvm/vcpu_sbi_pmu.c:74
  kvm_riscv_vcpu_sbi_ecall arch/riscv/kvm/vcpu_sbi.c:608
  kvm_riscv_vcpu_exit arch/riscv/kvm/vcpu_exit.c:240
  kvm_arch_vcpu_ioctl_run arch/riscv/kvm/vcpu.c:1008
  kvm_vcpu_ioctl virt/kvm/kvm_main.c:4476
Second free (double-free) occurs in kvm_pmu_clear_snapshot_area():
  kvm_pmu_clear_snapshot_area arch/riscv/kvm/vcpu_pmu.c:403 [inline]
  kvm_riscv_vcpu_pmu_deinit.part arch/riscv/kvm/vcpu_pmu.c:905
  kvm_riscv_vcpu_pmu_deinit arch/riscv/kvm/vcpu_pmu.c:893
  kvm_arch_vcpu_destroy arch/riscv/kvm/vcpu.c:199
  kvm_vcpu_destroy virt/kvm/kvm_main.c:469 [inline]
  kvm_destroy_vcpus virt/kvm/kvm_main.c:489
  kvm_arch_destroy_vm arch/riscv/kvm/vm.c:54
  kvm_destroy_vm virt/kvm/kvm_main.c:1301 [inline]
  kvm_put_kvm virt/kvm/kvm_main.c:1338
  kvm_vm_release virt/kvm/kvm_main.c:1361
Fix it by setting kvpmu->sdata to NULL after kfree() in kvm_riscv_vcpu_pmu_snapshot_set_shmem(), so that the subsequent kfree(NULL) in kvm_pmu_clear_snapshot_area() becomes a safe no-op. This bug was found by fuzzing the KVM RISC-V PMU interface.
Fixes: c2f41ddbcdd756 ("RISC-V: KVM: Implement SBI PMU Snapshot feature") Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260318092956.708246-1-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
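The free-then-clear pattern behind this fix can be shown with the C library allocator, where free(NULL) is a no-op just as kfree(NULL) is. This is a user-space sketch with hypothetical names (struct pmu_sketch, snapshot_set_shmem(), clear_snapshot_area()); the write_ok flag simulates whether the guest write succeeds, and -EFAULT is an assumed error value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the PMU state that owns the snapshot buffer. */
struct pmu_sketch {
	void *sdata;
};

/* Allocate the snapshot buffer; on a failed write, free it and clear it. */
static int snapshot_set_shmem(struct pmu_sketch *pmu, bool write_ok)
{
	assert(pmu);

	pmu->sdata = calloc(1, 64);
	if (!pmu->sdata)
		return -ENOMEM;

	if (!write_ok) {
		free(pmu->sdata);
		pmu->sdata = NULL;	/* the fix: leave no dangling pointer */
		return -EFAULT;		/* assumed error code for this sketch */
	}

	return 0;
}

/* Teardown path: safe to call even after a failed set_shmem. */
static void clear_snapshot_area(struct pmu_sketch *pmu)
{
	free(pmu->sdata);		/* free(NULL) is a no-op */
	pmu->sdata = NULL;
}
```

Without the assignment to NULL in the error branch, the teardown call would pass the already-freed pointer to free() a second time.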
2026-03-30  riscv: kvm: add null pointer check for vector datap  Yufeng Wang
Add WARN_ON check before accessing cntx->vector.datap in kvm_riscv_vcpu_vreg_addr() to detect potential null pointer dereferences early, consistent with the pattern used in kvm_riscv_vcpu_vector_reset(). This helps catch initialization issues where vector context allocation may have failed. Signed-off-by: Yufeng Wang <wangyufeng@kylinos.cn> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260317114759.53165-1-r4o5m6e8o@163.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-03-27  RISC-V: KVM: Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi()  Jiakai Xu
When a guest invokes SBI_EXT_PMU_COUNTER_FW_READ or SBI_EXT_PMU_COUNTER_FW_READ_HI on a firmware counter that has not been configured via SBI_EXT_PMU_COUNTER_CFG_MATCH, the pmc->event_idx remains SBI_PMU_EVENT_IDX_INVALID (0xFFFFFFFF). get_event_code() extracts the lower 16 bits, yielding 0xFFFF (65535), which is then used to index into kvpmu->fw_event[]. Since fw_event is only RISCV_KVM_MAX_FW_CTRS (32) entries, this triggers an array-index-out-of-bounds: UBSAN: array-index-out-of-bounds in arch/riscv/kvm/vcpu_pmu.c:255:37 index 65535 is out of range for type 'kvm_fw_event [32]' Add a check for the known unconfigured case (SBI_PMU_EVENT_IDX_INVALID) and a WARN_ONCE guard for any unexpected out-of-bounds event codes, returning -EINVAL in both cases. Fixes: badc386869e2c ("RISC-V: KVM: Support firmware events") Fixes: 08fb07d6dcf71 ("RISC-V: KVM: Support 64 bit firmware counters on RV32") Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260316014533.2312254-2-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
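The two guards described above can be sketched as follows. This is a simplified user-space model, not the kernel code: the constant and function names are local to the example, the 16-bit mask follows the "lower 16 bits" wording of the commit message, and the WARN_ONCE on the second guard is reduced to a plain error return.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FW_CTRS_SKETCH		32		/* size of the fw_event[] array */
#define EVENT_IDX_INVALID_SKETCH 0xFFFFFFFFu	/* unconfigured counter */

/* Extract the lower 16 bits of the event index. */
static uint32_t event_code_sketch(uint32_t event_idx)
{
	return event_idx & 0xFFFFu;
}

/* Read a firmware counter value, rejecting bad indices up front. */
static int fw_ctr_read_sketch(const uint64_t *fw_values, uint32_t event_idx,
			      uint64_t *out)
{
	uint32_t code;

	assert(fw_values && out);

	/* Guard 1: the known unconfigured case. */
	if (event_idx == EVENT_IDX_INVALID_SKETCH)
		return -EINVAL;

	/* Guard 2: any other code that would index past the array. */
	code = event_code_sketch(event_idx);
	if (code >= FW_CTRS_SKETCH)
		return -EINVAL;

	*out = fw_values[code];
	return 0;
}
```

An unconfigured counter yields code 0xFFFF (65535), far beyond a 32-entry array, which is the out-of-bounds index UBSAN reported.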
2026-03-27  riscv: kvm: fix vector context allocation leak  Osama Abdelkader
When the second kzalloc (host_context.vector.datap) fails in kvm_riscv_vcpu_alloc_vector_context, the first allocation (guest_context.vector.datap) is leaked. Free it before returning. Fixes: 0f4b82579716 ("riscv: KVM: Add vector lazy save/restore support") Cc: stable@vger.kernel.org Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Reviewed-by: Andy Chiu <andybnac@gmail.com> Link: https://lore.kernel.org/r/20260316151612.13305-1-osama.abdelkader@gmail.com Signed-off-by: Anup Patel <anup@brainfault.org>
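A two-step allocation with cleanup on the second failure can be sketched in user space as below. The struct, the function names, and the injectable allocator are all hypothetical; the allocator parameter exists only so the failure path can be exercised, and resetting the pointer to NULL after freeing is an extra precaution in this sketch, not something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the guest and host vector data pointers. */
struct vec_ctx_sketch {
	void *guest_datap;
	void *host_datap;
};

typedef void *(*alloc_fn)(size_t);

/* Test double: succeeds on the first call, fails on the second. */
static int alloc_calls;
static void *alloc_fail_second(size_t size)
{
	return (++alloc_calls == 2) ? NULL : malloc(size);
}

static int alloc_vector_context(struct vec_ctx_sketch *ctx, size_t size,
				alloc_fn alloc)
{
	assert(ctx && alloc);

	ctx->guest_datap = alloc(size);
	if (!ctx->guest_datap)
		return -ENOMEM;

	ctx->host_datap = alloc(size);
	if (!ctx->host_datap) {
		/* Release the first buffer instead of leaking it. */
		free(ctx->guest_datap);
		ctx->guest_datap = NULL;
		return -ENOMEM;
	}

	return 0;
}
```

Passing malloc as the allocator exercises the success path; passing alloc_fail_second exercises the cleanup branch.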
2026-03-27  RISC-V: KVM: fix PMU snapshot_set_shmem on 32-bit hosts  Osama Abdelkader
When saddr_high != 0 on RV32, the goto out was unconditional, causing valid 64-bit addresses to be rejected. Only goto out when the address is invalid (64-bit host with saddr_high != 0). Fixes: c2f41ddbcdd7 ("RISC-V: KVM: Implement SBI PMU Snapshot feature") Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260311231833.13189-1-osama.abdelkader@gmail.com Signed-off-by: Anup Patel <anup@brainfault.org>
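The intended branching can be illustrated with a small model in which the host word size is passed in as a parameter. This is a sketch under assumptions: compose_snapshot_addr() and its xlen argument are hypothetical, and -EINVAL stands in for whatever SBI error the real handler reports.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Combine the low and high halves of the shared memory address.
 * A non-zero high half is meaningful on a 32-bit host and invalid
 * on a 64-bit host, where the low half already holds the full address.
 */
static int compose_snapshot_addr(unsigned int xlen, uint64_t saddr_low,
				 uint64_t saddr_high, uint64_t *saddr)
{
	uint64_t addr = saddr_low;

	assert(xlen == 32 || xlen == 64);
	assert(saddr);

	if (saddr_high != 0) {
		if (xlen != 32)
			return -EINVAL;	/* invalid only on 64-bit hosts */
		addr |= saddr_high << 32;
	}

	*saddr = addr;
	return 0;
}
```

The pre-fix behavior corresponds to returning the error for every non-zero saddr_high, including the valid 32-bit case.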
2026-03-26  RISC-V: KVM: Validate SBI STA shmem alignment in kvm_sbi_ext_sta_set_reg()  Jiakai Xu
The RISC-V SBI Steal-Time Accounting (STA) extension requires the shared memory physical address to be 64-byte aligned, or set to all-ones to explicitly disable steal-time accounting. KVM exposes the SBI STA shared memory configuration to userspace via KVM_SET_ONE_REG. However, the current implementation of kvm_sbi_ext_sta_set_reg() does not validate the alignment of the configured shared memory address. As a result, userspace can install a misaligned shared memory address that violates the SBI specification. Such an invalid configuration may later reach runtime code paths that assume a valid and properly aligned shared memory region. In particular, KVM_RUN can trigger the following WARN_ON in kvm_riscv_vcpu_record_steal_time(): WARNING: arch/riscv/kvm/vcpu_sbi_sta.c:49 at kvm_riscv_vcpu_record_steal_time WARN_ON paths are not expected to be reachable during normal runtime execution, and may result in a kernel panic when panic_on_warn is enabled. Fix this by validating the computed shared memory GPA at the KVM_SET_ONE_REG boundary. A temporary GPA is constructed and checked before committing it to vcpu->arch.sta.shmem. The validation allows either a 64-byte aligned GPA or INVALID_GPA (all-ones), which disables STA as defined by the SBI specification. This prevents invalid userspace state from reaching runtime code paths that assume SBI STA invariants and avoids unexpected WARN_ON behavior. Fixes: f61ce890b1f074 ("RISC-V: KVM: Add support for SBI STA registers") Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260303010859.1763177-2-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
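The validate-then-commit rule described here reduces to a short check. The sketch below is a user-space illustration with hypothetical names; INVALID_GPA_SKETCH is defined locally as the all-ones value, and the construction of the GPA from separate low and high registers is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_GPA_SKETCH	(~(uint64_t)0)	/* all-ones: STA disabled */

/* Accept a 64-byte aligned GPA or the all-ones "disabled" value. */
static bool sta_shmem_gpa_ok(uint64_t gpa)
{
	return gpa == INVALID_GPA_SKETCH || (gpa & 63) == 0;
}

/* Check a candidate GPA first; commit it only if it is acceptable. */
static int sta_set_shmem(uint64_t *shmem, uint64_t new_gpa)
{
	assert(shmem);

	if (!sta_shmem_gpa_ok(new_gpa))
		return -EINVAL;

	*shmem = new_gpa;
	return 0;
}
```

Because the stored value is only updated after the check passes, a rejected request leaves the previously configured address untouched.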
2026-03-11  Merge tag 'kvm-x86-generic-7.0-rc3' of https://github.com/kvm-x86/linux into HEAD  Paolo Bonzini
KVM generic changes for 7.0
- Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end.
- Document that vcpu->mutex is taken outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive.
2026-03-06  RISC-V: KVM: Check host Ssaia extension when creating AIA irqchip  Anup Patel
KVM user-space may create the KVM AIA irqchip before checking VCPU Ssaia extension availability, so KVM AIA irqchip creation must fail when the host does not have the Ssaia extension. Fixes: 89d01306e34d ("RISC-V: KVM: Implement device interface for AIA irqchip") Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260120080013.2153519-4-anup.patel@oss.qualcomm.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-03-06  RISC-V: KVM: Fix error code returned for Ssaia ONE_REG  Anup Patel
Return -ENOENT for Ssaia ONE_REG when Ssaia is not enabled for a VCPU. This will make Ssaia ONE_REG error codes consistent with other ONE_REG interfaces of KVM RISC-V. Fixes: 2a88f38cd58d ("RISC-V: KVM: return ENOENT in *_one_reg() when reg is unknown") Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260120080013.2153519-3-anup.patel@oss.qualcomm.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-03-06  RISC-V: KVM: Fix error code returned for Smstateen ONE_REG  Anup Patel
Return -ENOENT for Smstateen ONE_REG when: 1) Smstateen is not enabled for a VCPU 2) ONE_REG id is out of range This will make Smstateen ONE_REG error codes consistent with other ONE_REG interfaces of KVM RISC-V. Fixes: c04913f2b54e ("RISCV: KVM: Add sstateen0 to ONE_REG") Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260120080013.2153519-2-anup.patel@oss.qualcomm.com Signed-off-by: Anup Patel <anup@brainfault.org>