| Age | Commit message | Author |
|
When compiling with sparse enabled (C=2), bitwise type warnings are
triggered in the RISC-V KVM implementation. This occurs because the
user-space data unboxing macro '__get_user_asm' performs implicit
casting on restricted types without forcing the compiler's compliance.
Additionally, raw 'unsigned long *' pointers are used to access the
SBI NACL shared memory, whereas the RISC-V SBI specification mandates
that these structures must follow little-endian byte ordering.
Fix these by:
1. Adding a '__force' cast to '__get_user_asm()' to safely suppress
implicit cast warnings during user-space data fetching.
2. Introducing the '__lelong' type macro, which dynamically resolves to
'__le32' or '__le64' depending on XLEN, and replacing 'unsigned long *'
with '__lelong *' to enforce proper compile-time endianness checks.
Signed-off-by: Sean Chang <seanwascoding@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260608155252.4292-1-seanwascoding@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
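A user-space sketch of the '__lelong' idea described above, assuming the word width can be derived from UINTPTR_MAX. The kernel keys on XLEN and uses the sparse-annotated '__le32'/'__le64' types instead. The names lelong_t and lelong_to_cpu() are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __lelong: a little-endian word as wide as a pointer. */
#if UINTPTR_MAX == 0xffffffffu
typedef uint32_t lelong_t;
#else
typedef uint64_t lelong_t;
#endif

/* Decode a little-endian word byte by byte, so the result is correct on any host. */
static uintptr_t lelong_to_cpu(const lelong_t *p)
{
    unsigned char b[sizeof(lelong_t)];
    uintptr_t v = 0;

    memcpy(b, p, sizeof(b));
    for (size_t i = 0; i < sizeof(b); i++)
        v |= (uintptr_t)b[i] << (8 * i);
    return v;
}
```

Giving the shared-memory word its own type means every access has to go through an explicit conversion, which is the property the sparse check enforces in the kernel.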
|
|
As in kvm_riscv_gstage_wp_range, possibly valid pages must not be
skipped when !found_leaf. Unlike the wp case, which may write-protect
more than was asked, unmap must not touch anything outside the requested
range. No splitting is added for now; a warning is logged instead.
Signed-off-by: Wu Fei <wu.fei9@sanechips.com.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260604230317.30501-3-atwufei@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The current gstage range walker unconditionally advances by 'page_size'
when a leaf PTE is not found. For example, when the range to wp is
[0xfffff01fc000, 0xfffff023c000) and page_size is 2MB, if the lookup at
0xfffff01fc000 finds no leaf, the walker skips the whole range, even
though there may be valid entries in [0xfffff0200000, 0xfffff023c000).
Signed-off-by: Wu Fei <wu.fei9@sanechips.com.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260604230317.30501-2-atwufei@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>
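The skipped-range problem above can be reproduced with plain address arithmetic. This is a sketch only: it assumes the remedy is to advance to the next page_size-aligned boundary, which the commit message does not spell out, and next_block() and visits() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Start of the next page_size-aligned block after addr (page_size is a power of two). */
static uint64_t next_block(uint64_t addr, uint64_t page_size)
{
    return (addr + page_size) & ~(page_size - 1);
}

/*
 * Count the lookups a walker performs over [start, end) when no leaf is
 * ever found. With align == false it steps by page_size from an unaligned
 * start; with align == true it steps to the next aligned boundary.
 */
static unsigned int visits(uint64_t start, uint64_t end, uint64_t page_size,
                           bool align)
{
    unsigned int n = 0;
    uint64_t addr = start;

    while (addr < end) {
        n++;
        addr = align ? next_block(addr, page_size) : addr + page_size;
    }
    return n;
}
```

With the range from the commit message, the unaligned step jumps from 0xfffff01fc000 straight past the end, while the aligned step also looks at the block starting at 0xfffff0200000.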
|
|
RISC-V KVM has used the hugetlb VMA size directly as the G-stage
mapping size since stage-2 page table support was added. That is safe
only if the block covered by the fault is fully contained in the
memslot and the userspace address has the same offset as the GPA
within that block.
The THP path already checks those constraints before installing a PMD
block mapping. The hugetlb path did not, so an unaligned memslot could
make KVM install a PMD or PUD sized G-stage block that covers memory
outside the slot or maps the wrong host pages.
Pass the target mapping size into fault_supports_gstage_huge_mapping().
The same helper can be used for both THP PMD mappings and hugetlb
PMD/PUD mappings.
Select hugetlb mapping sizes through the same memslot-boundary check,
falling back from PUD to PMD to PAGE_SIZE. When a smaller hugetlb
mapping size is selected, fault the GFN aligned to that selected size
instead of the original VMA size.
Also keep hugetlb mappings out of transparent_hugepage_adjust(). Once
the hugetlb path has chosen PAGE_SIZE, promoting it again through the
THP helper would miss the hugetlb fallback decision.
Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
Signed-off-by: Jinyu Tang <tjytimi@163.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260604142602.3582602-2-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>
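The two constraints named above (same offset of HVA and GPA within the block, and the block fully contained in the memslot) can be sketched in user space as follows. The struct and function names are hypothetical simplifications, not the kernel's fault_supports_gstage_huge_mapping() itself, and the 1G/2M/4K sizes are assumed values for PUD/PMD/PAGE_SIZE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4K (UINT64_C(1) << 12)
#define SZ_2M (UINT64_C(1) << 21)
#define SZ_1G (UINT64_C(1) << 30)

/* Hypothetical, simplified memslot: guest range and its host backing. */
struct slot {
    uint64_t gpa_start;
    uint64_t hva_start;
    uint64_t size;
};

/* True if a map_size block around hva may be installed for this slot. */
static bool block_mapping_ok(const struct slot *s, uint64_t hva, uint64_t map_size)
{
    uint64_t mask = map_size - 1;
    uint64_t block = hva & ~mask;

    /* GPA and HVA must share the same offset within the block. */
    if ((s->gpa_start & mask) != (s->hva_start & mask))
        return false;
    /* The whole block must lie inside the slot's host range. */
    return block >= s->hva_start && block + map_size <= s->hva_start + s->size;
}

/* Fall back from the largest candidate size to the base page size. */
static uint64_t pick_map_size(const struct slot *s, uint64_t hva, uint64_t vma_size)
{
    static const uint64_t sizes[] = { SZ_1G, SZ_2M };

    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
        if (sizes[i] <= vma_size && block_mapping_ok(s, hva, sizes[i]))
            return sizes[i];
    return SZ_4K;
}
```

A slot whose host address is only 2M-aligned falls back from 1G to 2M, and a slot whose GPA and HVA offsets disagree falls all the way back to 4K.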
|
|
changes
Fix a bug where FWFT features could be incorrectly exposed to guests
after userspace disables their dependent ISA extensions at runtime.
The 'supported' field in kvm_sbi_fwft_config was set once during vCPU
initialization based on the initial hardware/extension availability.
However, when userspace subsequently disables ISA extensions via the KVM
ONE_REG interface, the 'supported' field was not updated. This caused
the following issues:
1. FWFT features would remain visible and accessible to guests even
after their prerequisite ISA extensions were disabled
2. Guests could configure FWFT features that depend on disabled
extensions, leading to undefined behavior
3. The static 'supported' flag and the dynamic supported() callback
could disagree about feature availability
The fix introduces a two-layer checking mechanism:
1. Add an optional init() callback to the kvm_sbi_fwft_feature structure
for features that require hardware probing during initialization. This
separates the one-time hardware detection logic from the runtime
availability check.
2. Add runtime checks in all FWFT-related functions that call
feature->supported(vcpu) if the callback exists. This ensures feature
availability is re-evaluated based on the current ISA extension state.
This approach maintains the cached 'supported' field for initialization-
time decisions while ensuring runtime availability is always determined
by the current vCPU configuration, not initialization-time snapshots.
Fixes: 6b72fd170592 ("RISC-V: KVM: add support for FWFT SBI extension")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260601-kvm-get_reg_list-v2-v5-3-415d08a2813b@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>
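A minimal sketch of the two-layer check described above, with deliberately simplified, hypothetical types (the kernel structures are kvm_sbi_fwft_feature and kvm_sbi_fwft_config; their real layout is not reproduced here). The cached flag answers "was this ever available", and the optional callback answers "is it available right now".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vCPU state: one ISA extension that userspace may disable. */
struct vcpu {
    bool isa_ext_enabled;
};

struct fwft_feature {
    /* Optional runtime check; NULL means no runtime dependency. */
    bool (*supported)(const struct vcpu *vcpu);
};

struct fwft_config {
    const struct fwft_feature *feature;
    bool supported; /* cached once at initialization time */
};

static bool ext_enabled(const struct vcpu *vcpu)
{
    return vcpu->isa_ext_enabled;
}

/* Layer 1: cached init-time result. Layer 2: current vCPU configuration. */
static bool fwft_available(const struct vcpu *vcpu, const struct fwft_config *conf)
{
    if (!conf->supported)
        return false;
    if (conf->feature->supported && !conf->feature->supported(vcpu))
        return false;
    return true;
}
```

Calling fwft_available() on every access means a feature disappears as soon as its prerequisite extension is turned off, without having to rewrite the cached flag.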
|
|
Add an optional init() callback to separate one-time hardware probing
from runtime availability checks. For pointer masking, this allows
probing supported PMM lengths during initialization while checking ISA
extension availability at runtime.
Fix try_to_set_pmm() to restore the previous HENVCFG.PMM value after
probing, preventing side effects from hardware detection. Add preemption
protection to ensure CSR probe sequences complete atomically on the same
CPU.
Fixes: 6f576fc0aeb9 ("RISC-V: KVM: Add support for SBI_FWFT_POINTER_MASKING_PMLEN")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260601-kvm-get_reg_list-v2-v5-2-415d08a2813b@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Mark the vCPU CSRs as dirty after successfully setting an FWFT feature
value. FWFT features may modify CSRs (e.g., pointer masking modifies
henvcfg.PMM), and failing to mark them dirty can lead to the guest
observing stale CSR state after vCPU scheduling or migration.
Fixes: 1323a5cfe52c ("KVM: riscv: Skip CSR restore if VCPU is reloaded on the same core")
Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260601-kvm-get_reg_list-v2-v5-1-415d08a2813b@sifive.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
When the dirty ring is enabled, the dirty bitmap is disabled, and the
logging check is always false because the RISC-V architecture does not
select "NEED_KVM_DIRTY_RING_WITH_BITMAP". Although the dirty log is still
recorded, since the write path already tries to add the dirty log entry,
the logging check itself is broken and causes side effects.
Enhance the logging check for MMU mapping so that it checks both the
dirty ring and the dirty bitmap.
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260528113840.2629186-1-inochiama@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The KVM_REG_RISCV_TIMER_REG(state) one-reg write passes the value
written by userspace to kvm_riscv_vcpu_timer_next_event() when
re-enabling the timer.
That value is the timer state, KVM_RISCV_TIMER_STATE_ON, not the
timer compare value. During migration or state restore, userspace
restores the compare register separately, which stores the target
cycle in t->next_cycles. Re-arming the timer with the state value
schedules the next event at cycle 1 instead of the restored compare
value, causing the virtual timer to fire too early.
Use the restored compare value from t->next_cycles when turning the
timer back on.
Fixes: 3a9f66cb25e1 ("RISC-V: KVM: Add timer functionality")
Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260526075544.796396-1-maqianga@uniontech.com
Signed-off-by: Anup Patel <anup@brainfault.org>
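The re-arm bug above comes down to which value is handed to the "next event" routine. A toy model follows, assuming the state values are 0 (off) and 1 (on); the struct and function names are hypothetical stand-ins, not the kernel's.

```c
#include <stdint.h>

enum { TIMER_STATE_OFF = 0, TIMER_STATE_ON = 1 };

/* Hypothetical, simplified virtual timer. */
struct vtimer {
    uint64_t next_cycles; /* compare value restored by userspace */
    uint64_t armed_for;   /* cycle the timer is currently armed for */
    int armed;
};

static void timer_next_event(struct vtimer *t, uint64_t ncycles)
{
    t->armed_for = ncycles;
    t->armed = 1;
}

/* Handle a write to the timer "state" register. */
static void timer_set_state(struct vtimer *t, uint64_t reg_val)
{
    if (reg_val == TIMER_STATE_ON)
        /* Arm with the restored compare value, not with reg_val (which is 1). */
        timer_next_event(t, t->next_cycles);
    else
        t->armed = 0;
}
```

Passing reg_val instead of t->next_cycles would arm the timer for cycle 1, which is the early firing the commit describes.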
|
|
Fuzzer reported a NULL pointer dereference in
kvm_riscv_vcpu_aia_imsic_put() when a VCPU's imsic_state was NULL while
kvm_riscv_aia_initialized() returned true.
The global initialized flag is set per-VM in aia_init(), but imsic_state
is allocated per-VCPU in kvm_riscv_vcpu_aia_imsic_init(). If a VCPU is
created after aia_init() has already run, its imsic_state remains NULL
while the global flag is true. When this VCPU is preempted, kvm_sched_out()
calls kvm_arch_vcpu_put() -> kvm_riscv_vcpu_aia_put() ->
kvm_riscv_vcpu_aia_imsic_put() which dereferences NULL.
Add NULL pointer guards to kvm_riscv_vcpu_aia_imsic_put(), consistent with
the NULL checks already present in all other functions in the same file.
Also add a NULL guard to kvm_riscv_vcpu_aia_imsic_release() and
kvm_riscv_vcpu_aia_imsic_has_interrupt() for the same reason.
Fixes: 4cec89db80ba ("RISC-V: KVM: Move HGEI[E|P] CSR access to IMSIC virtualization")
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Assisted-by: YuanSheng:DeepSeek-V3.2
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260526031517.1166025-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The SUSP handler checks that all other vCPUs are stopped before
entering system suspend, but a concurrent HSM HART_START can start
a vCPU after it has already passed the check.
This is a known TOCTOU race. We do not fix it because:
1. Triggering it requires a pathological guest.
2. Only guest state is at risk, not host integrity.
3. Userspace can double-check vCPU states before suspend.
Add a comment documenting the race and the rationale for not fixing it.
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Assisted-by: YuanSheng:DeepSeek-V3.2
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260525013642.999187-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
With dirty logging enabled, guest writes often fault on an existing 4K
G-stage leaf that was write-protected only for dirty tracking. The slow
path still performs the full fault handling flow and takes mmu_lock for
write, even though the page-table shape does not change.
x86 handles the analogous case in its fast page fault path by atomically
making a writable SPTE writable again when the fault is only a
write-protection fault. Add the same style of fast path for RISC-V. If a
write fault hits an existing 4K leaf in a writable dirty-log memslot,
mark the page dirty and atomically set the PTE writable and dirty under
the read side of mmu_lock.
The dirty bitmap is updated before the PTE becomes writable again. The
PTE D bit is also set so systems that trap on a clear D bit do not fall
back to the slow path for a writable but clean PTE.
Signed-off-by: Jinyu Tang <tjytimi@163.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260517153427.94889-6-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
When a fault hits an existing G-stage leaf with the same PFN, KVM only
needs to update the PTE permissions. This path will be used by read-side
fault handling, so it must not overwrite a concurrent PTE update.
Use the cmpxchg helper when relaxing permissions on an existing leaf,
following the same concurrency model used by x86 for atomic SPTE
permission updates. Retry if another CPU changed the PTE first, and use
cpu_relax() while spinning.
Signed-off-by: Jinyu Tang <tjytimi@163.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260517153427.94889-5-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>
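The compare-and-exchange retry pattern described above can be sketched with C11 atomics. This is a user-space illustration under assumptions: the function names are hypothetical, the bit positions follow the RISC-V PTE layout (W is bit 2, D is bit 7), and the kernel's cpu_relax() call in the spin loop is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_W (UINT64_C(1) << 2)
#define PTE_D (UINT64_C(1) << 7)

/* One attempt: fails if another updater changed the PTE since 'old' was read. */
static bool pte_try_update(_Atomic uint64_t *ptep, uint64_t old, uint64_t new_val)
{
    return atomic_compare_exchange_strong(ptep, &old, new_val);
}

/* Add permission bits, retrying until the update lands on the latest value. */
static uint64_t pte_relax(_Atomic uint64_t *ptep, uint64_t set_bits)
{
    uint64_t old, new_val;

    do {
        old = atomic_load(ptep);
        new_val = old | set_bits;
        /* Nothing to do if the bits are already set. */
    } while (new_val != old && !pte_try_update(ptep, old, new_val));

    return new_val;
}
```

Because each attempt is conditional on the value that was read, a concurrent update is never overwritten; the loser of the race simply re-reads and tries again.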
|
|
Permission-only G-stage PTE updates can run in parallel once they are
moved to the read side of mmu_lock. Plain set_pte() is not enough for
that case because another CPU may update the same PTE first.
x86 handles the same class of SPTE races with cmpxchg-based updates in
its fast page fault and TDP MMU paths. Add a small RISC-V helper for
atomic G-stage PTE updates. The helper reports contention to the caller
and flushes the target range only when the PTE value actually changes.
Signed-off-by: Jinyu Tang <tjytimi@163.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260517153427.94889-4-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
RISC-V KVM currently uses a spinlock for mmu_lock. That serializes all
G-stage MMU operations, including permission-only updates that do not
allocate or free page-table pages.
Use KVM's rwlock form of mmu_lock, as x86 and arm64 already do. Keep the
existing map, unmap and teardown paths on the write side. This prepares
RISC-V for read-side handling of G-stage permission updates.
Signed-off-by: Jinyu Tang <tjytimi@163.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260517153427.94889-3-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The common KVM invalidation paths call kvm_unmap_gfn_range() with
mmu_lock already held for write.
For the standard MMU notifier path, the call chain is:
  kvm_mmu_notifier_invalidate_range_start()
    kvm_handle_hva_range()
      kvm_unmap_gfn_range()
kvm_mmu_notifier_invalidate_range_start() leaves range.lockless clear.
kvm_handle_hva_range() therefore takes KVM_MMU_LOCK(kvm) before invoking
the handler.
The guest_memfd path has the same locking contract:
  __kvm_gmem_invalidate_begin()
    kvm_mmu_unmap_gfn_range()
      kvm_unmap_gfn_range()
__kvm_gmem_invalidate_begin() explicitly takes KVM_MMU_LOCK(kvm) before
calling kvm_mmu_unmap_gfn_range().
So remove the local trylock and make the common locking contract explicit
with lockdep_assert_held_write() like x86.
Signed-off-by: Jinyu Tang <tjytimi@163.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260517153427.94889-2-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Store the per-source APLIC IRQ state in the APLIC allocation instead
of allocating it separately.
This ties the IRQ state lifetime directly to the APLIC state, removes a
separate allocation failure path, and lets __counted_by() describe the
array bounds.
Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260511032144.361520-1-rosenp@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
KVM RISC-V currently triggers a TLB flush for every single stage-2 PTE
modification (unmap or write-protect). Although KVM coalesces the
hardware IPIs, the software overhead of executing the flush work
for every page is large, especially during dirty page tracking.
Following the approach used in x86 and arm64, this patch optimizes
the MMU logic by making the PTE manipulation functions return a boolean
indicating if a leaf PTE was actually changed. The outer MMU functions
bubble up this flag to batch the remote TLB flushes.
Consequently, the flush operation is executed only once per batch.
Moving it outside of the `mmu_lock` also reduces lock contention.
Tested with tools/testing/selftests/kvm on a 4-vCPU guest (Host
environment: QEMU 10.2.1 RISC-V)
1. demand_paging_test (1GB memory)
time ./demand_paging_test -b 1G -v 4
- Total execution time reduced from ~2m39s to ~2m31s
2. dirty_log_perf_test (1GB memory)
./dirty_log_perf_test -b 1G -v 4
- "Clear dirty log time" per iteration dropped significantly from
~3.40s to ~0.18s
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Jinyu Tang <tjytimi@163.com>
Link: https://lore.kernel.org/r/20260412023822.83341-1-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>
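The batching idea above (per-PTE helpers report whether anything changed, and the caller flushes once) can be sketched as follows. Everything here is a hypothetical user-space model: flush_count stands in for remote TLB flush requests and the PTE array for a page-table range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_W (UINT64_C(1) << 2)

static unsigned int flush_count; /* stands in for remote TLB flush requests */

static void remote_tlb_flush(void)
{
    flush_count++;
}

/* Write-protect one PTE and report whether it actually changed. */
static bool wp_pte(uint64_t *pte)
{
    if (!(*pte & PTE_W))
        return false;
    *pte &= ~PTE_W;
    return true;
}

/* Write-protect a range; flush at most once, and only if something changed. */
static void wp_range(uint64_t *ptes, size_t n)
{
    bool flush = false;

    for (size_t i = 0; i < n; i++)
        flush |= wp_pte(&ptes[i]);
    if (flush)
        remote_tlb_flush();
}
```

A second pass over an already write-protected range changes nothing and therefore issues no flush at all.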
|
|
Previously, the number of Hypervisor Guest External Interrupt (HGEI)
lines was stored in a single global variable `kvm_riscv_aia_nr_hgei`
and assumed to be the same for all HARTs. This assumption does not
hold on heterogeneous RISC-V SoCs where different cores may expose
different HGEIE CSR widths.
Introduce `nr_hgei` field into the per-CPU `struct aia_hgei_control`
and probe the actual supported HGEI count for the current HART in
`kvm_riscv_aia_enable()` using the standard RISC-V CSR probe technique:
    csr_write(CSR_HGEIE, -1UL);
    nr = fls_long(csr_read(CSR_HGEIE));
    if (nr)
        nr--;
All HGEI allocation, free and disable paths (`kvm_riscv_aia_free_hgei()`,
`kvm_riscv_aia_disable()`, etc.) now use the per-CPU value instead of
the global one.
The global `kvm_riscv_aia_nr_hgei` now represents the minimum number
of HGEI lines across HARTs and can be used to check whether HGEI
support is available or not.
This makes KVM AIA robust on big.LITTLE-style asymmetric platforms.
Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>
Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260525094945.3721783-3-anup.patel@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
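The probe sequence quoted above can be tried against a simulated CSR. In this sketch the CSR accessors and the writable-bit mask are hypothetical stand-ins for the hardware, and fls_sketch() mimics the contract of fls_long() (1-based index of the highest set bit, 0 for no bits). Bit 0 is modelled as read-only zero, which is why the result is decremented.

```c
#include <stdint.h>

/* Simulated per-hart HGEIE: only bits in the mask are writable. */
static uint64_t hgeie_writable_mask;
static uint64_t hgeie;

static void csr_write_hgeie(uint64_t v)
{
    hgeie = v & hgeie_writable_mask;
}

static uint64_t csr_read_hgeie(void)
{
    return hgeie;
}

/* 1-based position of the highest set bit, or 0 if no bit is set. */
static unsigned int fls_sketch(uint64_t v)
{
    unsigned int n = 0;

    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Write all ones, read back, and derive the number of HGEI lines. */
static unsigned int probe_nr_hgei(void)
{
    unsigned int nr;

    csr_write_hgeie(~UINT64_C(0));
    nr = fls_sketch(csr_read_hgeie());
    if (nr)
        nr--;
    return nr;
}
```

A hart that implements bits 1 to 6 reads back 0x7e and reports 6 lines, while a hart with no writable bits reports 0.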
|
|
The ebreak selftest enables and disables guest debugging as part of the
test. However, the KVM_SET_GUEST_DEBUG ioctl doesn't actually apply the
change. Fix it by calling kvm_riscv_vcpu_config_guest_debug().
Fixes: 6ed523e2b612 ("RISC-V: KVM: Factor-out VCPU config into separate sources")
Signed-off-by: Mayuresh Chitale <mayuresh.chitale@oss.qualcomm.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260525095930.3924905-1-mayuresh.chitale@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The kvm_riscv_vcpu_mmio_return() function handles MMIO read results
by writing the data back to the guest register. For signed load
instructions (LB, LH, LW on RV64), the value needs sign-extension
from a smaller integer to unsigned long.
The current code uses:
(ulong)data << shift >> shift
but (ulong) makes the right shift a logical shift (zero-extend)
rather than an arithmetic shift (sign-extend), causing incorrect
results when the MMIO device returns a negative value. For example,
LB reading 0x80 would return 128 instead of -128.
Fix this by casting to (long) after the left shift so that the
subsequent right shift is arithmetic and correctly propagates
the sign bit:
(long)((ulong)data << shift) >> shift
Additionally, remove the unnecessary shift assignment for LBU
(unsigned byte load) since it does not need sign extension.
This makes LBU consistent with LHU and LWU which already keep
shift = 0.
Fixes: b91f0e4cb8a3 ("RISC-V: KVM: Factor-out instruction emulation into separate sources")
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Assisted-by: OpenClaw:DeepSeek-V3.2
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260514081752.472987-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
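The two shift expressions quoted above can be compared directly in user space. This sketch assumes the usual behaviour of mainstream compilers, where right-shifting a negative signed value is an arithmetic shift (the C standard leaves it implementation-defined). The helper names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Old expression: both shifts operate on an unsigned value, so it zero-extends. */
static long extend_buggy(unsigned long data, int shift)
{
    return (long)(data << shift >> shift);
}

/* Fixed expression: cast to long after the left shift so the right shift is arithmetic. */
static long extend_fixed(unsigned long data, int shift)
{
    return (long)(data << shift) >> shift;
}

/* Shift that moves a len-byte value into the top bits of a long. */
static int shift_for(size_t len)
{
    return (int)(CHAR_BIT * (sizeof(long) - len));
}
```

For a one-byte load of 0x80, the old expression yields 128 and the fixed one yields -128, matching the LB example in the commit message.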
|
|
The SBI v0.1 SEND_IPI handler iterates over the hart mask and calls
kvm_get_vcpu_by_id() to find the target vcpu for each set bit. When a
guest provides a hart mask containing bits for non-existent vcpu_ids,
kvm_get_vcpu_by_id() returns NULL, which is then unconditionally
dereferenced by kvm_riscv_vcpu_set_interrupt(), causing a kernel crash.
Fix this by adding a NULL check before dereferencing the return value.
If the target vcpu is not found, skip it and continue processing the
remaining valid harts.
Fixes: a046c2d8578c ("RISC-V: KVM: Reorganize SBI code by moving SBI v0.1 to its own file")
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Assisted-by: OpenClaw:DeepSeek-V3.2
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260517124414.420919-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
kvm_riscv_vcpu_pmu_event_info() returned -ENOMEM from the
SBI extension handler, which caused kvm_riscv_vcpu_sbi_ecall()
to abort KVM_RUN and surface the error to userspace instead of
completing the ECALL with a negative SBI error in a0.
Use SBI_ERR_FAILURE and the normal retdata path, matching the other PMU
handlers and the kvm_sbi_ext_pmu_handler comment.
Fixes: e309fd113b9f ("RISC-V: KVM: Implement get event info function")
Cc: stable@vger.kernel.org
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260514173642.41448-2-osama.abdelkader@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
kvm_riscv_vcpu_pmu_snapshot_set_shmem() returned -ENOMEM from the
SBI extension handler, which caused kvm_riscv_vcpu_sbi_ecall() to
abort KVM_RUN and surface the error to userspace instead of
completing the ECALL with a negative SBI error in a0.
Use SBI_ERR_FAILURE and the normal retdata path, matching the other PMU
handlers and the kvm_sbi_ext_pmu_handler comment.
Fixes: c2f41ddbcdd7 ("RISC-V: KVM: Implement SBI PMU Snapshot feature")
Cc: stable@vger.kernel.org
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260514173642.41448-1-osama.abdelkader@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
kvm_riscv_vcpu_record_steal_time() assumes that the steal-time shared
memory GPA (vcpu->arch.sta.shmem) is always backed by a valid guest
memory slot. However, this assumption is not guaranteed by the KVM
userspace ABI.
A malicious or buggy userspace can set the STA shared memory GPA via
KVM_SET_ONE_REG without establishing a corresponding memory region via
KVM_SET_USER_MEMORY_REGION. In such cases, the GPA cannot be translated
to a valid HVA and kvm_vcpu_gfn_to_hva() returns an error address.
The current implementation incorrectly treats this as a kernel warning
using WARN_ON(), which may escalate to a kernel panic when panic_on_warn
is enabled.
This is not a kernel bug condition but a normal invalid configuration
from userspace, and should be handled gracefully.
Fix it by removing WARN_ON() and treating invalid HVA as a normal
failure case, resetting the STA shared memory state.
Fixes: e9f12b5fff8ad0 ("RISC-V: KVM: Implement SBI STA extension")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Assisted-by: OpenClaw:DeepSeek-V3.2
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260415075216.2757427-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
"There is one significant change outside arch/riscv in this pull
request: the addition of a set of KUnit tests for strlen(), strnlen(),
and strrchr().
Otherwise, the most notable changes are to add some RISC-V-specific
string function implementations, to remove XIP kernel support, to add
hardware error exception handling, and to optimize our runtime
unaligned access speed testing.
A few comments on the motivation for removing XIP support. It's been
broken in the RISC-V kernel for months. The code is not easy to
maintain. Furthermore, for XIP support to truly be useful for RISC-V,
we think that compile-time feature switches would need to be added for
many of the RISC-V ISA features and microarchitectural properties that
are currently implemented with runtime patching. No one has stepped
forward to take responsibility for that work, so many of us think it's
best to remove it until clear use cases and champions emerge.
Summary:
- Add Kunit correctness testing and microbenchmarks for strlen(),
strnlen(), and strrchr()
- Add RISC-V-specific strnlen(), strchr(), strrchr() implementations
- Add hardware error exception handling
- Clean up and optimize our unaligned access probe code
- Enable HAVE_IOREMAP_PROT to be able to use generic_access_phys()
- Remove XIP kernel support
- Warn when addresses outside the vmemmap range are passed to
vmemmap_populate()
- Update the ACPI FADT revision check to warn if it's not at least
ACPI v6.6, which is when key RISC-V-specific tables were added to
the specification
- Increase COMMAND_LINE_SIZE to 2048 to match ARM64, x86, PowerPC,
etc.
- Make kaslr_offset() a static inline function, since there's no need
for it to show up in the symbol table
- Add KASLR offset and SATP to the VMCOREINFO ELF notes to improve
kdump support
- Add Makefile cleanup rule for vdso_cfi copied source files, and add
a .gitignore for the build artifacts in that directory
- Remove some redundant ifdefs that check Kconfig macros
- Add missing SPDX license tag to the CFI selftest
- Simplify UTS_MACHINE assignment in the RISC-V Makefile
- Clarify some unclear comments and remove some superfluous comments
- Fix various English typos across the RISC-V codebase"
* tag 'riscv-for-linus-7.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (31 commits)
riscv: Remove support for XIP kernel
riscv: Reuse compare_unaligned_access() in check_vector_unaligned_access()
riscv: Split out compare_unaligned_access()
riscv: Reuse measure_cycles() in check_vector_unaligned_access()
riscv: Split out measure_cycles() for reuse
riscv: Clean up & optimize unaligned scalar access probe
riscv: lib: add strrchr() implementation
riscv: lib: add strchr() implementation
riscv: lib: add strnlen() implementation
lib/string_kunit: extend benchmarks to strnlen() and chr searches
lib/string_kunit: add performance benchmark for strlen()
lib/string_kunit: add correctness test for strrchr()
lib/string_kunit: add correctness test for strnlen()
lib/string_kunit: add correctness test for strlen()
riscv: vdso_cfi: Add .gitignore for build artifacts
riscv: vdso_cfi: Add clean rule for copied sources
riscv: enable HAVE_IOREMAP_PROT
riscv: mm: WARN_ON() for bad addresses in vmemmap_populate()
riscv: acpi: update FADT revision check to 6.6
riscv: add hardware error trap handler support
...
|
|
The make_xfence_request() function uses a shift operation to check if a
vCPU is in the hart mask:
if (!(hmask & (1UL << (vcpu->vcpu_id - hbase))))
However, when the difference between vcpu_id and hbase
is >= BITS_PER_LONG, the shift operation causes undefined behavior.
This was detected by UBSAN:
UBSAN: shift-out-of-bounds in arch/riscv/kvm/tlb.c:343:23
shift exponent 256 is too large for 64-bit type 'long unsigned int'
Fix this by adding a bounds check before the shift operation.
This bug was found by fuzzing the KVM RISC-V interface.
Fixes: 13acfec2dbcc ("RISC-V: KVM: Add remote HFENCE functions based on VCPU requests")
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260403232011.2394966-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
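A bounds-checked version of the hart-mask test above might look like the sketch below. It is an illustration under assumptions: vcpu_in_hart_mask() is a hypothetical helper, and any special-case handling of hbase in the real function is left out.

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* True if vcpu_id is selected by (hbase, hmask); never shifts out of range. */
static bool vcpu_in_hart_mask(unsigned long vcpu_id, unsigned long hbase,
                              unsigned long hmask)
{
    unsigned long bit;

    if (vcpu_id < hbase)
        return false;
    bit = vcpu_id - hbase;
    /* Shifting by >= the type width is undefined behavior, so reject it first. */
    if (bit >= BITS_PER_LONG)
        return false;
    return (hmask & (1UL << bit)) != 0;
}
```

With vcpu_id - hbase equal to 256, as in the UBSAN report, the function returns false before the shift is ever evaluated.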
|
|
Fix various typos in RISC-V architecture code and comments.
The following changes are included:
- arch/riscv/errata/thead/errata.c: "futher" → "further"
- arch/riscv/include/asm/atomic.h: "therefor" → "therefore", "arithmatic" → "arithmetic"
- arch/riscv/include/asm/elf.h: "availiable" → "available", "coorespends" → "corresponds"
- arch/riscv/include/asm/processor.h: "requries" → "is required"
- arch/riscv/include/asm/thread_info.h: "returing" → "returning"
- arch/riscv/kernel/acpi.c: "compliancy" → "compliance"
- arch/riscv/kernel/ftrace.c: "therefor" → "therefore"
- arch/riscv/kernel/head.S: "intruction" → "instruction"
- arch/riscv/kernel/mcount-dyn.S: "localtion" → "location"
- arch/riscv/kernel/module-sections.c: "maxinum" → "maximum"
- arch/riscv/kernel/probes/kprobes.c: "reenabled" → "re-enabled"
- arch/riscv/kernel/probes/uprobes.c: "probbed" → "probed"
- arch/riscv/kernel/soc.c: "extremly" → "extremely"
- arch/riscv/kernel/suspend.c: "incosistent" → "inconsistent"
- arch/riscv/kvm/tlb.c: "cahce" → "cache"
- arch/riscv/kvm/vcpu_pmu.c: "indicies" → "indices"
- arch/riscv/lib/csum.c: "implmentations" → "implementations"
- arch/riscv/lib/memmove.S: "ammount" → "amount"
- arch/riscv/mm/cacheflush.c: "visable" → "visible"
- arch/riscv/mm/physaddr.c: "aginst" → "against"
Signed-off-by: Sean Chang <seanwascoding@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/20260212163325.60389-1-seanwascoding@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Reuse KVM_CAP_VM_GPA_BITS to advertise and select the effective
G-stage GPA width for a VM.
KVM_CHECK_EXTENSION(KVM_CAP_VM_GPA_BITS) returns the effective GPA
bits for a VM, KVM_ENABLE_CAP(KVM_CAP_VM_GPA_BITS) allows userspace
to downsize the effective GPA width by selecting a smaller G-stage
page table format:
- gpa_bits <= 41 selects Sv39x4 (pgd_levels=3)
- gpa_bits <= 50 selects Sv48x4 (pgd_levels=4)
- gpa_bits <= 59 selects Sv57x4 (pgd_levels=5)
Reject the request with -EINVAL for unsupported values and with -EBUSY
if vCPUs have been created or any memslot is populated.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260403153019.9916-4-fangyu.yu@linux.alibaba.com
Signed-off-by: Anup Patel <anup@brainfault.org>
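The gpa_bits thresholds listed above map naturally onto a small selection function. This is a sketch: gpa_bits_to_pgd_levels() and its host_max_levels parameter are hypothetical, and rejecting a format larger than the host supports is an assumption based on the "downsize" wording. The -EBUSY conditions are not modelled.

```c
#include <errno.h>

/*
 * Map a requested GPA width to the number of G-stage page-table levels:
 * 3 for Sv39x4, 4 for Sv48x4, 5 for Sv57x4. Returns -EINVAL for widths
 * that no format covers or that need more levels than the host offers.
 */
static int gpa_bits_to_pgd_levels(unsigned int gpa_bits, int host_max_levels)
{
    int levels;

    if (gpa_bits <= 41)
        levels = 3;
    else if (gpa_bits <= 50)
        levels = 4;
    else if (gpa_bits <= 59)
        levels = 5;
    else
        return -EINVAL;

    return levels <= host_max_levels ? levels : -EINVAL;
}
```

For example, a request for 42 bits no longer fits Sv39x4 and selects four levels, while 60 bits is rejected outright.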
|
|
Gstage page-table helpers frequently chase gstage->kvm->arch to
fetch pgd_levels. This adds noise and repeats the same dereference
chain in hot paths.
Add pgd_levels to struct kvm_gstage and initialize it from kvm->arch
when setting up a gstage instance. Introduce kvm_riscv_gstage_init()
to centralize initialization and switch gstage code to use
gstage->pgd_levels.
Suggested-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Link: https://lore.kernel.org/r/20260403153019.9916-3-fangyu.yu@linux.alibaba.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Introduce one per-VM architecture-specific field to support runtime
configuration of the G-stage page table format:
- kvm->arch.pgd_levels: the corresponding number of page table levels
for the selected mode.
This field replaces the previous global variables
kvm_riscv_gstage_mode and kvm_riscv_gstage_pgd_levels, enabling different
virtual machines to independently select their G-stage page table format
instead of being forced to share the maximum mode detected by the kernel
at boot time.
Signed-off-by: Fangyu Yu <fangyu.yu@linux.alibaba.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Link: https://lore.kernel.org/r/20260403153019.9916-2-fangyu.yu@linux.alibaba.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The hstateen0 will be programmed differently for guest HS-mode
and guest VS/VU-mode so don't check hstateen0.SSTATEEN0 bit when
updating sstateen0 CSR in kvm_riscv_vcpu_swap_in_guest_state()
and kvm_riscv_vcpu_swap_in_host_state().
Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com>
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260120080013.2153519-10-anup.patel@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The VCPU config deals with hideleg, hedeleg, henvcfg, and hstateenX
CSR configuration for each VCPU. Factor-out VCPU config into separate
sources so that VCPU config can do things differently for guest HS-mode
and guest VS/VU-mode.
Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com>
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260120080013.2153519-9-anup.patel@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The hideleg CSR state when VCPU is running in guest VS/VU-mode will
be different from when it is running in guest HS-mode. To achieve
this, add hideleg to struct kvm_vcpu_config and re-program hideleg
CSR upon every kvm_arch_vcpu_load().
Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com>
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260120080013.2153519-8-anup.patel@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The KVM ISA extension related checks are not VCPU specific and
should be factored out of vcpu_onereg.c into separate sources.
Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com>
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260120080013.2153519-6-anup.patel@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Rename kvm_riscv_vcpu_isa_check_host() to kvm_riscv_isa_check_host()
and use it as a common function within KVM RISC-V to check ISA extensions
supported by the host.
Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com>
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260120080013.2153519-5-anup.patel@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Currently, kvm_arch_vcpu_load() unconditionally restores guest CSRs,
HGATP, and AIA state. However, when a VCPU is loaded back on the same
physical CPU, and no other KVM VCPU has run on this CPU since it was
last put, the hardware CSRs and AIA registers are still valid.
This patch optimizes the vcpu_load path by skipping the expensive CSR
and AIA writes if all the following conditions are met:
1. It is being reloaded on the same CPU (vcpu->arch.last_exit_cpu == cpu).
2. The CSRs are not dirty (!vcpu->arch.csr_dirty).
3. No other VCPU used this CPU (vcpu == __this_cpu_read(kvm_former_vcpu)).
To ensure this fast-path doesn't break corner cases:
- Live migration and VCPU reset are naturally safe. KVM initializes
last_exit_cpu to -1, which guarantees the fast-path won't trigger.
- The 'csr_dirty' flag tracks runtime userspace interventions. If
userspace modifies guest configurations (e.g., hedeleg via
KVM_SET_GUEST_DEBUG, or CSRs including AIA via KVM_SET_ONE_REG),
the flag is set to skip the fast path.
With the 'csr_dirty' safeguard proven effective, it is safe to
include kvm_riscv_vcpu_aia_load() inside the skip logic now.
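The three fast-path conditions listed above can be sketched as a single
predicate. This is a minimal illustration, not the kernel code: struct
vcpu_sketch and can_skip_csr_restore() are hypothetical names, and the
former-VCPU pointer is passed in as a parameter instead of being read from a
per-CPU variable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-VCPU state named above. */
struct vcpu_sketch {
	int last_exit_cpu;	/* -1 until the VCPU has exited on some CPU */
	bool csr_dirty;		/* set when userspace changes guest CSR state */
};

/*
 * Returns true when the CSR and AIA restore could be skipped:
 * same CPU, no pending userspace change, and no other VCPU ran in between.
 */
bool can_skip_csr_restore(const struct vcpu_sketch *vcpu, int cpu,
			  const struct vcpu_sketch *former_vcpu)
{
	return vcpu->last_exit_cpu == cpu &&
	       !vcpu->csr_dirty &&
	       vcpu == former_vcpu;
}
```

Because last_exit_cpu starts at -1, a freshly created or reset VCPU can never
match a real CPU number and so always takes the slow path.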
Signed-off-by: Jinyu Tang <tjytimi@163.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Reviewed-by: Radim Krčmář <radim.krcmar@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260227121008.442241-1-tjytimi@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
During dirty logging, all huge pages are write-protected. When the guest
writes to a write-protected huge page, a page fault is triggered. Before
recovering the write permission, the huge page must be split into smaller
pages (e.g., 4K). After splitting, the normal mapping process proceeds,
allowing write permission to be restored at the smaller page granularity.
If dirty logging is disabled because migration failed or was cancelled,
only recover the write permission at the 4K level, and skip recovering the
huge page mapping at this time to avoid the overhead of freeing page tables.
The huge page mapping can be recovered in the ioctl context, similar to x86,
in a later patch.
Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/202603301612587174XZ6QMCrymBqv30S6BN50@zte.com.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
When enabling dirty log in small chunks (e.g., QEMU default chunk
size of 256K), the chunk size is always smaller than the page size
of huge pages (1G or 2M) used in the gstage page tables. This caused
the write protection to be incorrectly skipped for huge PTEs because
the condition `(end - addr) >= page_size` was not satisfied.
Remove the size check in `kvm_riscv_gstage_wp_range()` to ensure huge
PTEs are always write-protected regardless of the chunk size. Additionally,
explicitly align the address down to the page size before invoking
`kvm_riscv_gstage_op_pte()` to guarantee that the address passed to the
operation function is page-aligned.
This fixes the issue where dirty pages might not be tracked correctly
when using huge pages.
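To see why a 256K chunk never satisfied the removed condition, the sketch
below evaluates the size check and the align-down step on plain integers.
The helper names are hypothetical and the check is reduced to the single
comparison quoted above; page_size is assumed to be a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round addr down to a power-of-two page_size boundary. */
uint64_t align_down_sketch(uint64_t addr, uint64_t page_size)
{
	return addr & ~(page_size - 1);
}

/* The removed comparison: true only if the range covers a whole page. */
bool old_size_check_sketch(uint64_t addr, uint64_t end, uint64_t page_size)
{
	return (end - addr) >= page_size;
}
```

For a chunk [0x40240000, 0x40280000) inside a 2M mapping, the old comparison
is false, while aligning down yields 0x40200000, the base of the huge page
that must be write-protected.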
Fixes: 9d05c1fee837 ("RISC-V: KVM: Implement stage2 page table programming")
Signed-off-by: Wang Yechao <wang.yechao255@zte.com.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/202603301610527120YZ-pAJY6x9SBpSRo1Wg4@zte.com.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
When a guest initiates an SBI_EXT_PMU_COUNTER_CFG_MATCH call with
ctr_base=0xfffffffffffffffe, ctr_mask=0xeb5f and flags=0x1
(SBI_PMU_CFG_FLAG_SKIP_MATCH), kvm_riscv_vcpu_pmu_ctr_cfg_match()
first invokes kvm_pmu_validate_counter_mask() to verify whether
ctr_base and ctr_mask are valid, by evaluating:
!ctr_mask || (ctr_base + __fls(ctr_mask) >= kvm_pmu_num_counters(kvpmu))
With the above inputs, __fls(0xeb5f) equals 15, and adding 15 to
0xfffffffffffffffe causes an integer overflow, wrapping around to 13.
Since 13 is less than kvm_pmu_num_counters(), the validation wrongly
succeeds.
Thereafter, since flags & SBI_PMU_CFG_FLAG_SKIP_MATCH is satisfied,
the code evaluates:
!test_bit(ctr_base + __ffs(ctr_mask), kvpmu->pmc_in_use)
Here __ffs(0xeb5f) equals 0, so test_bit() receives 0xfffffffffffffffe
as the bit index and attempts to access the corresponding element of the
kvpmu->pmc_in_use bitmap, which results in an invalid memory access. This
triggers the following Oops:
Unable to handle kernel paging request at virtual address e3ebffff12abba89
generic_test_bit include/asm-generic/bitops/generic-non-atomic.h:128
kvm_riscv_vcpu_pmu_ctr_cfg_match arch/riscv/kvm/vcpu_pmu.c:758
kvm_sbi_ext_pmu_handler arch/riscv/kvm/vcpu_sbi_pmu.c:49
kvm_riscv_vcpu_sbi_ecall arch/riscv/kvm/vcpu_sbi.c:608
kvm_riscv_vcpu_exit arch/riscv/kvm/vcpu_exit.c:240
The root cause is that kvm_pmu_validate_counter_mask() does not account
for the case where ctr_base itself is out of range, allowing the
subsequent addition to silently overflow and bypass the check.
Fix this by explicitly validating ctr_base against kvm_pmu_num_counters()
before performing the addition.
This bug was found by fuzzing the KVM RISC-V PMU interface.
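The wraparound can be reproduced outside the kernel. The sketch below is an
illustration only: fls_sketch() stands in for __fls(), uint64_t stands in for
a 64-bit unsigned long, the counter count of 32 used in the usage note is an
assumed value, and the exact form of the upstream check may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Index of the most significant set bit; stand-in for __fls(). */
unsigned int fls_sketch(uint64_t mask)
{
	unsigned int bit = 0;

	while (mask >>= 1)
		bit++;
	return bit;
}

/* Check as quoted above: true means the (base, mask) pair is rejected. */
bool rejects_before_fix(uint64_t ctr_base, uint64_t ctr_mask,
			uint64_t num_ctrs)
{
	return !ctr_mask || (ctr_base + fls_sketch(ctr_mask) >= num_ctrs);
}

/* Same check with ctr_base validated before the addition. */
bool rejects_after_fix(uint64_t ctr_base, uint64_t ctr_mask,
		       uint64_t num_ctrs)
{
	return !ctr_mask || ctr_base >= num_ctrs ||
	       (ctr_base + fls_sketch(ctr_mask) >= num_ctrs);
}
```

With ctr_base=0xfffffffffffffffe, ctr_mask=0xeb5f and 32 counters, the first
function accepts the pair because the sum wraps to 13, while the second
rejects it on the ctr_base comparison alone.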
Fixes: 0cb74b65d2e5e6 ("RISC-V: KVM: Implement perf support without sampling")
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Atish Patra <atish.patra@linux.dev>
Link: https://lore.kernel.org/r/20260319035902.924661-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
In kvm_riscv_vcpu_pmu_snapshot_set_shmem(), when kvm_vcpu_write_guest()
fails, kvpmu->sdata is freed but not set to NULL. This leaves a dangling
pointer that will be freed again when kvm_pmu_clear_snapshot_area() is
called during vcpu teardown, triggering a KASAN double-free report.
First free occurs in kvm_riscv_vcpu_pmu_snapshot_set_shmem():
kvm_riscv_vcpu_pmu_snapshot_set_shmem arch/riscv/kvm/vcpu_pmu.c:443
kvm_sbi_ext_pmu_handler arch/riscv/kvm/vcpu_sbi_pmu.c:74
kvm_riscv_vcpu_sbi_ecall arch/riscv/kvm/vcpu_sbi.c:608
kvm_riscv_vcpu_exit arch/riscv/kvm/vcpu_exit.c:240
kvm_arch_vcpu_ioctl_run arch/riscv/kvm/vcpu.c:1008
kvm_vcpu_ioctl virt/kvm/kvm_main.c:4476
Second free (double-free) occurs in kvm_pmu_clear_snapshot_area():
kvm_pmu_clear_snapshot_area arch/riscv/kvm/vcpu_pmu.c:403 [inline]
kvm_riscv_vcpu_pmu_deinit.part arch/riscv/kvm/vcpu_pmu.c:905
kvm_riscv_vcpu_pmu_deinit arch/riscv/kvm/vcpu_pmu.c:893
kvm_arch_vcpu_destroy arch/riscv/kvm/vcpu.c:199
kvm_vcpu_destroy virt/kvm/kvm_main.c:469 [inline]
kvm_destroy_vcpus virt/kvm/kvm_main.c:489
kvm_arch_destroy_vm arch/riscv/kvm/vm.c:54
kvm_destroy_vm virt/kvm/kvm_main.c:1301 [inline]
kvm_put_kvm virt/kvm/kvm_main.c:1338
kvm_vm_release virt/kvm/kvm_main.c:1361
Fix it by setting kvpmu->sdata to NULL after kfree() in
kvm_riscv_vcpu_pmu_snapshot_set_shmem(), so that the subsequent
kfree(NULL) in kvm_pmu_clear_snapshot_area() becomes a safe no-op.
This bug was found by fuzzing the KVM RISC-V PMU interface.
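The free-then-clear pattern can be shown in userspace C, where free(NULL) is
a no-op just as kfree(NULL) is in the kernel. The struct and function names
below are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the PMU state that owns the snapshot buffer. */
struct pmu_sketch {
	void *sdata;
};

/* Error path: release the buffer and clear the pointer in one place. */
void snapshot_setup_failed_sketch(struct pmu_sketch *pmu)
{
	free(pmu->sdata);
	pmu->sdata = NULL;	/* without this, teardown would free again */
}

/* Teardown path: safe to call even after the error path has run. */
void clear_snapshot_area_sketch(struct pmu_sketch *pmu)
{
	free(pmu->sdata);	/* free(NULL) does nothing */
	pmu->sdata = NULL;
}
```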
Fixes: c2f41ddbcdd756 ("RISC-V: KVM: Implement SBI PMU Snapshot feature")
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260318092956.708246-1-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Add WARN_ON check before accessing cntx->vector.datap in
kvm_riscv_vcpu_vreg_addr() to detect potential null pointer
dereferences early, consistent with the pattern used in
kvm_riscv_vcpu_vector_reset().
This helps catch initialization issues where vector context
allocation may have failed.
Signed-off-by: Yufeng Wang <wangyufeng@kylinos.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20260317114759.53165-1-r4o5m6e8o@163.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
When a guest invokes SBI_EXT_PMU_COUNTER_FW_READ or
SBI_EXT_PMU_COUNTER_FW_READ_HI on a firmware counter that has not been
configured via SBI_EXT_PMU_COUNTER_CFG_MATCH, the pmc->event_idx remains
SBI_PMU_EVENT_IDX_INVALID (0xFFFFFFFF). get_event_code() extracts the
lower 16 bits, yielding 0xFFFF (65535), which is then used to index into
kvpmu->fw_event[]. Since fw_event has only RISCV_KVM_MAX_FW_CTRS (32)
entries, this triggers an array-index-out-of-bounds:
UBSAN: array-index-out-of-bounds in arch/riscv/kvm/vcpu_pmu.c:255:37
index 65535 is out of range for type 'kvm_fw_event [32]'
Add a check for the known unconfigured case (SBI_PMU_EVENT_IDX_INVALID)
and a WARN_ONCE guard for any unexpected out-of-bounds event codes,
returning -EINVAL in both cases.
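The two guards can be sketched as follows. This is an assumption-laden
illustration: the constants mirror the values quoted above, fw_event_code()
is a hypothetical helper, and the WARN_ONCE on the second branch is reduced
to a comment.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_FW_CTRS_SKETCH		32
#define EVENT_IDX_INVALID_SKETCH	0xFFFFFFFFu

/*
 * Extract the firmware event code (lower 16 bits) from event_idx.
 * Returns 0 and stores the code, or -EINVAL if it cannot index the array.
 */
int fw_event_code(uint32_t event_idx, uint32_t *code_out)
{
	uint32_t code;

	/* Known case: the counter was never configured. */
	if (event_idx == EVENT_IDX_INVALID_SKETCH)
		return -EINVAL;

	code = event_idx & 0xFFFF;

	/* Unexpected case: the kernel would also WARN_ONCE here. */
	if (code >= MAX_FW_CTRS_SKETCH)
		return -EINVAL;

	*code_out = code;
	return 0;
}
```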
Fixes: badc386869e2c ("RISC-V: KVM: Support firmware events")
Fixes: 08fb07d6dcf71 ("RISC-V: KVM: Support 64 bit firmware counters on RV32")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260316014533.2312254-2-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
When the second kzalloc (host_context.vector.datap) fails in
kvm_riscv_vcpu_alloc_vector_context, the first allocation
(guest_context.vector.datap) is leaked. Free it before returning.
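A userspace sketch of the unwind-on-failure pattern follows. The names are
hypothetical, and the allocator is passed as a callback only so that the
second allocation can be made to fail in a test; the kernel function calls
kzalloc directly.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two vector data pointers. */
struct vec_ctx_sketch {
	void *guest_datap;
	void *host_datap;
};

typedef void *(*alloc_fn)(size_t size);

static int alloc_calls;

/* Test allocator: succeeds on every call except the second one. */
void *second_call_fails(size_t size)
{
	return ++alloc_calls == 2 ? NULL : calloc(1, size);
}

int alloc_vector_context_sketch(struct vec_ctx_sketch *ctx, size_t size,
				alloc_fn alloc)
{
	ctx->guest_datap = alloc(size);
	if (!ctx->guest_datap)
		return -ENOMEM;

	ctx->host_datap = alloc(size);
	if (!ctx->host_datap) {
		/* Release the first allocation before returning. */
		free(ctx->guest_datap);
		ctx->guest_datap = NULL;
		return -ENOMEM;
	}

	return 0;
}
```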
Fixes: 0f4b82579716 ("riscv: KVM: Add vector lazy save/restore support")
Cc: stable@vger.kernel.org
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20260316151612.13305-1-osama.abdelkader@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
When saddr_high != 0 on RV32, the goto out was unconditional, causing
valid 64-bit addresses to be rejected. Only goto out when the address
is invalid (64-bit host with saddr_high != 0).
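The intended control flow can be sketched like this. It is a simplified
illustration under stated assumptions: snapshot_gpa_sketch() is a
hypothetical helper, the host width is passed as a flag instead of a config
option, and -EINVAL stands in for the SBI error code the kernel returns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Combine the low and high halves of the snapshot address.
 * A non-zero high half is valid only on a 32-bit host.
 */
int snapshot_gpa_sketch(bool host_is_32bit, uint64_t saddr_low,
			uint64_t saddr_high, uint64_t *gpa_out)
{
	uint64_t gpa = saddr_low;

	if (saddr_high != 0) {
		/* A 64-bit host carries the full address in saddr_low. */
		if (!host_is_32bit)
			return -EINVAL;
		gpa |= saddr_high << 32;
	}

	*gpa_out = gpa;
	return 0;
}
```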
Fixes: c2f41ddbcdd7 ("RISC-V: KVM: Implement SBI PMU Snapshot feature")
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260311231833.13189-1-osama.abdelkader@gmail.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The RISC-V SBI Steal-Time Accounting (STA) extension requires the shared
memory physical address to be 64-byte aligned, or set to all-ones to
explicitly disable steal-time accounting.
KVM exposes the SBI STA shared memory configuration to userspace via
KVM_SET_ONE_REG. However, the current implementation of
kvm_sbi_ext_sta_set_reg() does not validate the alignment of the configured
shared memory address. As a result, userspace can install a misaligned
shared memory address that violates the SBI specification.
Such an invalid configuration may later reach runtime code paths that
assume a valid and properly aligned shared memory region. In particular,
KVM_RUN can trigger the following WARN_ON in
kvm_riscv_vcpu_record_steal_time():
WARNING: arch/riscv/kvm/vcpu_sbi_sta.c:49 at
kvm_riscv_vcpu_record_steal_time
WARN_ON paths are not expected to be reachable during normal runtime
execution, and may result in a kernel panic when panic_on_warn is enabled.
Fix this by validating the computed shared memory GPA at the
KVM_SET_ONE_REG boundary. A temporary GPA is constructed and checked
before committing it to vcpu->arch.sta.shmem. The validation allows
either a 64-byte aligned GPA or INVALID_GPA (all-ones), which disables
STA as defined by the SBI specification.
This prevents invalid userspace state from reaching runtime code paths
that assume SBI STA invariants and avoids unexpected WARN_ON behavior.
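The acceptance rule described above reduces to a small predicate. This is a
sketch only: sta_shmem_gpa_ok() and INVALID_GPA_SKETCH are hypothetical
names, and the GPA is modeled as a 64-bit integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* All-ones value that disables steal-time accounting. */
#define INVALID_GPA_SKETCH	(~(uint64_t)0)

/* Accept either the disable value or a 64-byte aligned address. */
bool sta_shmem_gpa_ok(uint64_t gpa)
{
	return gpa == INVALID_GPA_SKETCH || (gpa & 63) == 0;
}
```

Checking the fully assembled GPA rather than each half matters because the
all-ones disable value is itself not 64-byte aligned.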
Fixes: f61ce890b1f074 ("RISC-V: KVM: Add support for SBI STA registers")
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260303010859.1763177-2-xujiakai2025@iscas.ac.cn
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
HEAD
KVM generic changes for 7.0
- Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
unnecessary and confusing, triggered compiler warnings due to
-Wflex-array-member-not-at-end.
- Document that vcpu->mutex is taken outside of kvm->slots_lock and
kvm->slots_arch_lock, which is intentional and desirable despite being
rather unintuitive.
|
|
KVM user space may create the KVM AIA irqchip before checking
VCPU Ssaia extension availability, so KVM AIA irqchip creation must fail
when the host does not have the Ssaia extension.
Fixes: 89d01306e34d ("RISC-V: KVM: Implement device interface for AIA irqchip")
Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260120080013.2153519-4-anup.patel@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Return -ENOENT for Ssaia ONE_REG when Ssaia is not enabled
for a VCPU.
This will make Ssaia ONE_REG error codes consistent with
other ONE_REG interfaces of KVM RISC-V.
Fixes: 2a88f38cd58d ("RISC-V: KVM: return ENOENT in *_one_reg() when reg is unknown")
Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260120080013.2153519-3-anup.patel@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Return -ENOENT for Smstateen ONE_REG when:
1) Smstateen is not enabled for a VCPU
2) ONE_REG id is out of range
This will make Smstateen ONE_REG error codes consistent
with other ONE_REG interfaces of KVM RISC-V.
Fixes: c04913f2b54e ("RISCV: KVM: Add sstateen0 to ONE_REG")
Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260120080013.2153519-2-anup.patel@oss.qualcomm.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|