summaryrefslogtreecommitdiff
path: root/arch/loongarch
AgeCommit message (Collapse)Author
11 daysLoongArch: KVM: Fix base address calculation in kvm_eiointc_regs_access()Bibo Mao
In function kvm_eiointc_regs_access(), the register base address is caculated from array base address plus offset, the offset is absolute value from the base address. The data type of array base address is u64, it should be converted into the "void *" type and then plus the offset. Cc: <stable@vger.kernel.org> Fixes: d3e43a1f34ac ("LoongArch: KVM: Use 64-bit register definition for EIOINTC"). Reported-by: Aurelien Jarno <aurel32@debian.org> Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1131431 Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
11 daysLoongArch: KVM: Handle the case that EIOINTC's coremap is emptyHuacai Chen
EIOINTC's coremap in eiointc_update_sw_coremap() can be empty, currently we get a cpuid with -1 in this case, but we actually need 0 because it's similar as the case that cpuid >= 4. This fix an out-of-bounds access to kvm_arch::phyid_map::phys_map[]. Cc: <stable@vger.kernel.org> Fixes: 3956a52bc05bd81 ("LoongArch: KVM: Add EIOINTC read and write functions") Reported-by: Aurelien Jarno <aurel32@debian.org> Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1131431 Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
11 daysLoongArch: KVM: Make kvm_get_vcpu_by_cpuid() more robustHuacai Chen
kvm_get_vcpu_by_cpuid() takes a cpuid parameter whose type is int, so cpuid can be negative. Let kvm_get_vcpu_by_cpuid() return NULL for this case so as to make it more robust. This fix an out-of-bounds access to kvm_arch::phyid_map::phys_map[]. Cc: <stable@vger.kernel.org> Fixes: 73516e9da512adc ("LoongArch: KVM: Add vcpu mapping from physical cpuid") Reported-by: Aurelien Jarno <aurel32@debian.org> Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1131431 Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
11 daysLoongArch: vDSO: Emit GNU_EH_FRAME correctlyXi Ruoyao
With -fno-asynchronous-unwind-tables and --no-eh-frame-hdr (the default of the linker), the GNU_EH_FRAME segment (specified by vdso.lds.S) is empty. This is not valid, as the current DWARF specification mandates the first byte of the EH frame to be the version number 1. It causes some unwinders to complain, for example the ClickHouse query profiler spams the log with messages: clickhouse-server[365854]: libunwind: unsupported .eh_frame_hdr version: 127 at 7ffffffb0000 Here "127" is just the byte located at the p_vaddr (0, i.e. the beginning of the vDSO) of the empty GNU_EH_FRAME segment. Cross- checking with /proc/365854/maps has also proven 7ffffffb0000 is the start of vDSO in the process VM image. In LoongArch the -fno-asynchronous-unwind-tables option seems just a MIPS legacy, and MIPS only uses this option to satisfy the MIPS-specific "genvdso" program, per the commit cfd75c2db17e ("MIPS: VDSO: Explicitly use -fno-asynchronous-unwind-tables"). IIRC it indicates some inherent limitation of the MIPS ELF ABI and has nothing to do with LoongArch. So we can simply flip it over to -fasynchronous-unwind-tables and pass --eh-frame-hdr for linking the vDSO, allowing the profilers to unwind the stack for statistics even if the sample point is taken when the PC is in the vDSO. However simply adjusting the options above would exploit an issue: when the libgcc unwinder saw the invalid GNU_EH_FRAME segment, it silently falled back to a machine-specific routine to match the code pattern of rt_sigreturn() and extract the registers saved in the sigframe if the code pattern is matched. As unwinding from signal handlers is vital for libgcc to support pthread cancellation etc., the fall-back routine had been silently keeping the LoongArch Linux systems functioning since Linux 5.19. But when we start to emit GNU_EH_FRAME with the correct format, fall-back routine will no longer be used and libgcc will fail to unwind the sigframe, and unwinding from signal handlers will no longer work, causing dozens of glibc test failures. To make it possible to unwind from signal handlers again, it's necessary to code the unwind info in __vdso_rt_sigreturn via .cfi_* directives. The offsets in the .cfi_* directives depend on the layout of struct sigframe, notably the offset of sigcontext in the sigframe. To use the offset in the assembly file, factor out struct sigframe into a header to allow asm-offsets.c to output the offset for assembly. To work around a long-term issue in the libgcc unwinder (the pc is unconditionally substracted by 1: doing so is technically incorrect for a signal frame), a nop instruction is included with the two real instructions in __vdso_rt_sigreturn in the same FDE PC range. The same hack has been used on x86 for a long time. Cc: stable@vger.kernel.org Fixes: c6b99bed6b8f ("LoongArch: Add VDSO and VSYSCALL support") Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
11 daysLoongArch: Workaround LS2K/LS7A GPU DMA hang bugHuacai Chen
1. Hardware limitation: GPU, DC and VPU are typically PCI device 06.0, 06.1 and 06.2. They share some hardware resources, so when configure the PCI 06.0 device BAR1, DMA memory access cannot be performed through this BAR, otherwise it will cause hardware abnormalities. 2. In typical scenarios of reboot or S3/S4, DC access to memory through BAR is not prohibited, resulting in GPU DMA hangs. 3. Workaround method: When configuring the 06.0 device BAR1, turn off the memory access of DC, GPU and VPU (via DC's CRTC registers). Cc: stable@vger.kernel.org Signed-off-by: Qianhai Wu <wuqianhai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
11 daysLoongArch: Fix missing NULL checks for kstrdup()Li Jun
1. Replace "of_find_node_by_path("/")" with "of_root" to avoid multiple calls to "of_node_put()". 2. Fix a potential kernel oops during early boot when memory allocation fails while parsing CPU model from device tree. Cc: stable@vger.kernel.org Signed-off-by: Li Jun <lijun01@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-03-16LoongArch: KVM: Fix typo issue in kvm_vm_init_features()Bibo Mao
Most of VM feature detections are integer OR operations, and integer assignment operation will clear previous integer OR operation. So here change all integer assignment operations to integer OR operations. Fixes: 82db90bf461b ("LoongArch: KVM: Move feature detection in kvm_vm_init_features()") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-03-16LoongArch: BPF: Make arch_protect_bpf_trampoline() return 0Tiezhu Yang
Occasionally there exist "text_copy_cb: operation failed" when executing the bpf selftests, the reason is copy_to_kernel_nofault() failed and the ecode of ESTAT register is 0x4 (PME: Page Modification Exception) due to the pte is not writeable. The root cause is that there is another place to set the pte entry as readonly which is in the generic weak version of arch_protect_bpf_trampoline(). There are two ways to fix this race condition issue: the direct way is to modify the generic weak arch_protect_bpf_trampoline() to add a mutex lock for set_memory_rox(), but the other simple and proper way is to just make arch_protect_bpf_trampoline() return 0 in the arch-specific code because LoongArch has already use the BPF prog pack allocator for trampoline. Here are the trimmed kernel log messages: copy_to_kernel_nofault: memory access failed, ecode 0x4 copy_to_kernel_nofault: the caller is text_copy_cb+0x50/0xa0 text_copy_cb: operation failed ------------[ cut here ]------------ bpf_prog_pack bug: missing bpf_arch_text_invalidate? WARNING: kernel/bpf/core.c:1008 at bpf_prog_pack_free+0x200/0x228 ... Call Trace: [<9000000000248914>] show_stack+0x64/0x188 [<9000000000241308>] dump_stack_lvl+0x6c/0x9c [<90000000002705bc>] __warn+0x9c/0x200 [<9000000001c428c0>] __report_bug+0xa8/0x1c0 [<9000000001c42b5c>] report_bug+0x64/0x120 [<9000000001c7dcd0>] do_bp+0x270/0x3c0 [<9000000000246f40>] handle_bp+0x120/0x1c0 [<900000000047b030>] bpf_prog_pack_free+0x200/0x228 [<900000000047b2ec>] bpf_jit_binary_pack_free+0x24/0x60 [<900000000026989c>] bpf_jit_free+0x54/0xb0 [<900000000029e10c>] process_one_work+0x184/0x610 [<900000000029ef8c>] worker_thread+0x24c/0x388 [<90000000002a902c>] kthread+0x13c/0x170 [<9000000001c7dfe8>] ret_from_kernel_thread+0x28/0x1c0 [<9000000000246624>] ret_from_kernel_thread_asm+0xc/0x88 ---[ end trace 0000000000000000 ]--- Here is a simple shell script to reproduce: #!/bin/bash for ((i=1; i<=1000; i++)) do echo "Under testing $i ..." dmesg -c > /dev/null ./test_progs -t fentry_attach_stress > /dev/null dmesg -t | grep "text_copy_cb: operation failed" if [ $? -eq 0 ]; then break fi done Cc: stable@vger.kernel.org Fixes: 4ab17e762b34 ("LoongArch: BPF: Use BPF prog pack allocator") Acked-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-03-16LoongArch: No need to flush icache if text copy failedTiezhu Yang
If copy_to_kernel_nofault() failed, no need to flush icache and just return immediately. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-03-16LoongArch: Check return values for set_memory_{rw,rox}Tiezhu Yang
set_memory_rw() and set_memory_rox() may fail, so we should check the return values and return immediately in larch_insn_text_copy(). Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-03-16LoongArch: Give more information if kmem access failedTiezhu Yang
If memory access such as copy_{from, to}_kernel_nofault() failed, its users do not know what happened, so it is very useful to print the exception code for such cases. Furthermore, it is better to print the caller function to know where is the entry. Here are the low level call chains: copy_from_kernel_nofault() copy_from_kernel_nofault_loop() __get_kernel_nofault() copy_to_kernel_nofault() copy_to_kernel_nofault_loop() __put_kernel_nofault() Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-03-16LoongArch: Fix calling smp_processor_id() in preemptible codeXi Ruoyao
Fix the warning: BUG: using smp_processor_id() in preemptible [00000000] code: systemd/1 caller is larch_insn_text_copy+0x40/0xf0 Simply changing it to raw_smp_processor_id() is not enough: if preempt and CPU hotplug happens after raw_smp_processor_id() but before calling stop_machine(), the CPU where raw_smp_processor_id() has run may become offline when stop_machine() and no CPU will run copy_to_kernel_nofault() in text_copy_cb(). Thus guard the larch_insn_text_copy() calls with cpus_read_lock() and change stop_machine() to stop_machine_cpuslocked() to prevent this. I've considered moving the locks inside larch_insn_text_copy() but doing so seems not an easy hack. In bpf_arch_text_poke() obviously the memcpy() call must be guarded by text_mutex, so we have to leave the acquire of text_mutex out of larch_insn_text_copy(). But in the entire kernel the acquire of mutexes is always after cpus_read_lock(), so we cannot put cpus_read_lock() into larch_insn_text_copy() while leaving the text_mutex acquire out (or we risk a deadlock due to inconsistent lock acquire order). So let's fix the bug first and leave the posssible refactor as future work. Fixes: 9fbd18cf4c69 ("LoongArch: BPF: Add dynamic code modification support") Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-03-16LoongArch: Only use SC.Q when supported by the assemblerThomas Weißschuh
The 128-bit atomic cmpxchg implementation uses the SC.Q instruction. Older versions of GNU AS do not support that instruction, erroring out: ERROR:root:{standard input}: Assembler messages: {standard input}:4831: Error: no match insn: sc.q $t0,$t1,$r14 {standard input}:6407: Error: no match insn: sc.q $t0,$t1,$r23 {standard input}:10856: Error: no match insn: sc.q $t0,$t1,$r14 make[4]: *** [../scripts/Makefile.build:289: mm/slub.o] Error 1 (Binutils 2.41) So test support for SC.Q in Kconfig and disable the atomics if the instruction is not available. Fixes: f0e4b1b6e295 ("LoongArch: Add 128-bit atomic cmpxchg support") Closes: https://lore.kernel.org/lkml/20260216082834-edc51c46-7b7a-4295-8ea5-4d9a3ca2224f@linutronix.de/ Reviewed-by: Xi Ruoyao <xry111@xry111.site> Acked-by: Hengqi Chen <hengqi.chen@gmail.com> Tested-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-03-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Quite a large pull request, partly due to skipping last week and therefore having material from ~all submaintainers in this one. About a fourth of it is a new selftest, and a couple more changes are large in number of files touched (fixing a -Wflex-array-member-not-at-end compiler warning) or lines changed (reformatting of a table in the API documentation, thanks rST). But who am I kidding---it's a lot of commits and there are a lot of bugs being fixed here, some of them on the nastier side like the RISC-V ones. ARM: - Correctly handle deactivation of interrupts that were activated from LRs. Since EOIcount only denotes deactivation of interrupts that are not present in an LR, start EOIcount deactivation walk *after* the last irq that made it into an LR - Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM is already enabled -- not only thhis isn't possible (pKVM will reject the call), but it is also useless: this can only happen for a CPU that has already booted once, and the capability will not change - Fix a couple of low-severity bugs in our S2 fault handling path, affecting the recently introduced LS64 handling and the even more esoteric handling of hwpoison in a nested context - Address yet another syzkaller finding in the vgic initialisation, where we would end-up destroying an uninitialised vgic with nasty consequences - Address an annoying case of pKVM failing to boot when some of the memblock regions that the host is faulting in are not page-aligned - Inject some sanity in the NV stage-2 walker by checking the limits against the advertised PA size, and correctly report the resulting faults PPC: - Fix a PPC e500 build error due to a long-standing wart that was exposed by the recent conversion to kmalloc_obj(); rip out all the ugliness that led to the wart RISC-V: - Prevent speculative out-of-bounds access using array_index_nospec() in APLIC interrupt handling, ONE_REG regiser access, AIA CSR access, float register access, and PMU counter access - Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(), kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr() - Fix potential null pointer dereference in kvm_riscv_vcpu_aia_rmw_topei() - Fix off-by-one array access in SBI PMU - Skip THP support check during dirty logging - Fix error code returned for Smstateen and Ssaia ONE_REG interface - Check host Ssaia extension when creating AIA irqchip x86: - Fix cases where CPUID mitigation features were incorrectly marked as available whenever the kernel used scattered feature words for them - Validate _all_ GVAs, rather than just the first GVA, when processing a range of GVAs for Hyper-V's TLB flush hypercalls - Fix a brown paper bug in add_atomic_switch_msr() - Use hlist_for_each_entry_srcu() when traversing mask_notifier_list, to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu - Ensure AVIC VMCB fields are initialized if the VM has an in-kernel local APIC (and AVIC is enabled at the module level) - Update CR8 write interception when AVIC is (de)activated, to fix a bug where the guest can run in perpetuity with the CR8 intercept enabled - Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by default) an unintentional tightening of userspace ABI in 6.17, and provides some amount of backwards compatibility with hypervisors who want to freeze PMCs on VM-Entry - Validate the VMCS/VMCB on return to a nested guest from SMM, because either userspace or the guest could stash invalid values in memory and trigger the processor's consistency checks Generic: - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive Selftests: - Increase the maximum number of NUMA nodes in the guest_memfd selftest to 64 (from 8)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8 Documentation: kvm: fix formatting of the quirks table KVM: x86: clarify leave_smm() return value selftests: kvm: add a test that VMX validates controls on RSM selftests: kvm: extract common functionality out of smm_test.c KVM: SVM: check validity of VMCB controls when returning from SMM KVM: VMX: check validity of VMCS controls when returning from SMM KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers() KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr() KVM: x86: hyper-v: Validate all GVAs during PV TLB flush KVM: x86: synthesize CPUID bits only if CPU capability is set KVM: PPC: e500: Rip out "struct tlbe_ref" KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type KVM: selftests: Increase 'maxnode' for guest_memfd tests KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2() ...
2026-03-11Merge tag 'kvm-x86-generic-7.0-rc3' of https://github.com/kvm-x86/linux into ↵Paolo Bonzini
HEAD KVM generic changes for 7.0 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end. - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive.
2026-03-06Merge tag 'kbuild-fixes-7.0-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild fixes from Nathan Chancellor: - Split out .modinfo section from ELF_DETAILS macro, as that macro may be used in other areas that expect to discard .modinfo, breaking certain image layouts - Adjust genksyms parser to handle optional attributes in certain declarations, necessary after commit 07919126ecfc ("netfilter: annotate NAT helper hook pointers with __rcu") - Include resolve_btfids in external module build created by scripts/package/install-extmod-build when it may be run on external modules - Avoid removing objtool binary with 'make clean', as it is required for external module builds * tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: kbuild: Leave objtool binary around with 'make clean' kbuild: install-extmod-build: Package resolve_btfids if necessary genksyms: Fix parsing a declarator with a preceding attribute kbuild: Split .modinfo out from ELF_DETAILS
2026-02-28KVM: always define KVM_CAP_SYNC_MMUPaolo Bonzini
KVM_CAP_SYNC_MMU is provided by KVM's MMU notifiers, which are now always available. Move the definition from individual architectures to common code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-02-28KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIERPaolo Bonzini
All architectures now use MMU notifier for KVM page table management. Remove the Kconfig symbol and the code that is used when it is disabled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-02-26kbuild: Split .modinfo out from ELF_DETAILSNathan Chancellor
Commit 3e86e4d74c04 ("kbuild: keep .modinfo section in vmlinux.unstripped") added .modinfo to ELF_DETAILS while removing it from COMMON_DISCARDS, as it was needed in vmlinux.unstripped and ELF_DETAILS was present in all architecture specific vmlinux linker scripts. While this shuffle is fine for vmlinux, ELF_DETAILS and COMMON_DISCARDS may be used by other linker scripts, such as the s390 and x86 compressed boot images, which may not expect to have a .modinfo section. In certain circumstances, this could result in a bootloader failing to load the compressed kernel [1]. Commit ddc6cbef3ef1 ("s390/boot/vmlinux.lds.S: Ensure bzImage ends with SecureBoot trailer") recently addressed this for the s390 bzImage but the same bug remains for arm, parisc, and x86. The presence of .modinfo in the x86 bzImage was the root cause of the issue worked around with commit d50f21091358 ("kbuild: align modinfo section for Secureboot Authenticode EDK2 compat"). misc.c in arch/x86/boot/compressed includes lib/decompress_unzstd.c, which in turn includes lib/xxhash.c and its MODULE_LICENSE / MODULE_DESCRIPTION macros due to the STATIC definition. Split .modinfo out from ELF_DETAILS into its own macro and handle it in all vmlinux linker scripts. Discard .modinfo in the places where it was previously being discarded from being in COMMON_DISCARDS, as it has never been necessary in those uses. Cc: stable@vger.kernel.org Fixes: 3e86e4d74c04 ("kbuild: keep .modinfo section in vmlinux.unstripped") Reported-by: Ed W <lists@wildgooses.com> Closes: https://lore.kernel.org/587f25e0-a80e-46a5-9f01-87cb40cfa377@wildgooses.com/ [1] Tested-by: Ed W <lists@wildgooses.com> # x86_64 Link: https://patch.msgid.link/20260225-separate-modinfo-from-elf-details-v1-1-387ced6baf4b@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_flex' family to use the new default GFP_KERNEL argumentLinus Torvalds
This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the *alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs*'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-14Merge tag 'loongarch-7.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch updates from Huacai Chen: - Select HAVE_CMPXCHG_{LOCAL,DOUBLE} - Add 128-bit atomic cmpxchg support - Add HOTPLUG_SMT implementation - Wire up memfd_secret system call - Fix boot errors and unwind errors for KASAN - Use BPF prog pack allocator and add BPF arena support - Update dts files to add nand controllers - Some bug fixes and other small changes * tag 'loongarch-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: dts: loongson-2k1000: Add nand controller support LoongArch: dts: loongson-2k0500: Add nand controller support LoongArch: BPF: Implement bpf_addr_space_cast instruction LoongArch: BPF: Implement PROBE_MEM32 pseudo instructions LoongArch: BPF: Use BPF prog pack allocator LoongArch: Use IS_ERR_PCPU() macro for KGDB LoongArch: Rework KASAN initialization for PTW-enabled systems LoongArch: Disable instrumentation for setup_ptwalker() LoongArch: Remove some extern variables in source files LoongArch: Guard percpu handler under !CONFIG_PREEMPT_RT LoongArch: Handle percpu handler address for ORC unwinder LoongArch: Use %px to print unmodified unwinding address LoongArch: Prefer top-down allocation after arch_mem_init() LoongArch: Add HOTPLUG_SMT implementation LoongArch: Make cpumask_of_node() robust against NUMA_NO_NODE LoongArch: Wire up memfd_secret system call LoongArch: Replace seq_printf() with seq_puts() for simple strings LoongArch: Add 128-bit atomic cmpxchg support LoongArch: Add detection for SC.Q support LoongArch: Select HAVE_CMPXCHG_LOCAL in Kconfig
2026-02-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "Loongarch: - Add more CPUCFG mask bits - Improve feature detection - Add lazy load support for FPU and binary translation (LBT) register state - Fix return value for memory reads from and writes to in-kernel devices - Add support for detecting preemption from within a guest - Add KVM steal time test case to tools/selftests ARM: - Add support for FEAT_IDST, allowing ID registers that are not implemented to be reported as a normal trap rather than as an UNDEF exception - Add sanitisation of the VTCR_EL2 register, fixing a number of UXN/PXN/XN bugs in the process - Full handling of RESx bits, instead of only RES0, and resulting in SCTLR_EL2 being added to the list of sanitised registers - More pKVM fixes for features that are not supposed to be exposed to guests - Make sure that MTE being disabled on the pKVM host doesn't give it the ability to attack the hypervisor - Allow pKVM's host stage-2 mappings to use the Force Write Back version of the memory attributes by using the "pass-through' encoding - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the guest - Preliminary work for guest GICv5 support - A bunch of debugfs fixes, removing pointless custom iterators stored in guest data structures - A small set of FPSIMD cleanups - Selftest fixes addressing the incorrect alignment of page allocation - Other assorted low-impact fixes and spelling fixes RISC-V: - Fixes for issues discoverd by KVM API fuzzing in kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and kvm_riscv_vcpu_aia_imsic_update() - Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM - Transparent huge page support for hypervisor page tables - Adjust the number of available guest irq files based on MMIO register sizes found in the device tree or the ACPI tables - Add RISC-V specific paging modes to KVM selftests - Detect paging mode at runtime for selftests s390: - Performance improvement for vSIE (aka nested virtualization) - Completely new memory management. s390 was a special snowflake that enlisted help from the architecture's page table management to build hypervisor page tables, in particular enabling sharing the last level of page tables. This however was a lot of code (~3K lines) in order to support KVM, and also blocked several features. The biggest advantages is that the page size of userspace is completely independent of the page size used by the guest: userspace can mix normal pages, THPs and hugetlbfs as it sees fit, and in fact transparent hugepages were not possible before. It's also now possible to have nested guests and guests with huge pages running on the same host - Maintainership change for s390 vfio-pci - Small quality of life improvement for protected guests x86: - Add support for giving the guest full ownership of PMU hardware (contexted switched around the fastpath run loop) and allowing direct access to data MSRs and PMCs (restricted by the vPMU model). KVM still intercepts access to control registers, e.g. to enforce event filtering and to prevent the guest from profiling sensitive host state. This is more accurate, since it has no risk of contention and thus dropped events, and also has significantly less overhead. For more information, see the commit message for merge commit bf2c3138ae36 ("Merge tag 'kvm-x86-pmu-6.20' ...") - Disallow changing the virtual CPU model if L2 is active, for all the same reasons KVM disallows change the model after the first KVM_RUN - Fix a bug where KVM would incorrectly reject host accesses to PV MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled, even if those were advertised as supported to userspace, - Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs, where KVM would attempt to read CR3 configuring an async #PF entry - Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL. Only a few exports that are intended for external usage, and those are allowed explicitly - When checking nested events after a vCPU is unblocked, ignore -EBUSY instead of WARNing. Userspace can sometimes put the vCPU into what should be an impossible state, and spurious exit to userspace on -EBUSY does not really do anything to solve the issue - Also throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU is in Wait-For-SIPI, which also resulted in playing whack-a-mole with syzkaller stuffing architecturally impossible states into KVM - Add support for new Intel instructions that don't require anything beyond enumerating feature flags to userspace - Grab SRCU when reading PDPTRs in KVM_GET_SREGS2 - Add WARNs to guard against modifying KVM's CPU caps outside of the intended setup flow, as nested VMX in particular is sensitive to unexpected changes in KVM's golden configuration - Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts when the suppression feature is enabled by the guest (currently limited to split IRQCHIP, i.e. userspace I/O APIC). Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an option as some userspaces have come to rely on KVM's buggy behavior (KVM advertises Supress EOI Broadcast irrespective of whether or not userspace I/O APIC supports Directed EOIs) - Clean up KVM's handling of marking mapped vCPU pages dirty - Drop a pile of *ancient* sanity checks hidden behind in KVM's unused ASSERT() macro, most of which could be trivially triggered by the guest and/or user, and all of which were useless - Fold "struct dest_map" into its sole user, "struct rtc_status", to make it more obvious what the weird parameter is used for, and to allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n - Bury all of ioapic.h, i8254.h and related ioctls (including KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y - Add a regression test for recent APICv update fixes - Handle "hardware APIC ISR", a.k.a. SVI, updates in kvm_apic_update_apicv() to consolidate the updates, and to co-locate SVI updates with the updates for KVM's own cache of ISR information - Drop a dead function declaration - Minor cleanups x86 (Intel): - Rework KVM's handling of VMCS updates while L2 is active to temporarily switch to vmcs01 instead of deferring the update until the next nested VM-Exit. The deferred updates approach directly contributed to several bugs, was proving to be a maintenance burden due to the difficulty in auditing the correctness of deferred updates, and was polluting "struct nested_vmx" with a growing pile of booleans - Fix an SGX bug where KVM would incorrectly try to handle EPCM page faults, and instead always reflect them into the guest. Since KVM doesn't shadow EPCM entries, EPCM violations cannot be due to KVM interference and can't be resolved by KVM - Fix a bug where KVM would register its posted interrupt wakeup handler even if loading kvm-intel.ko ultimately failed - Disallow access to vmcb12 fields that aren't fully supported, mostly to avoid weirdness and complexity for FRED and other features, where KVM wants enable VMCS shadowing for fields that conditionally exist - Print out the "bad" offsets and values if kvm-intel.ko refuses to load (or refuses to online a CPU) due to a VMCS config mismatch x86 (AMD): - Drop a user-triggerable WARN on nested_svm_load_cr3() failure - Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS relies on an upcoming, publicly announced change in the APM to reduce the set of conditions where hardware (i.e. KVM) *must* flush the RAP - Ignore nSVM intercepts for instructions that are not supported according to L1's virtual CPU model - Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath for EPT Misconfig - Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM, and allow userspace to restore nested state with GIF=0 - Treat exit_code as an unsigned 64-bit value through all of KVM - Add support for fetching SNP certificates from userspace - Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD or VMSAVE on behalf of L2 - Misc fixes and cleanups x86 selftests: - Add a regression test for TPR<=>CR8 synchronization and IRQ masking - Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support, and extend x86's infrastructure to support EPT and NPT (for L2 guests) - Extend several nested VMX tests to also cover nested SVM - Add a selftest for nested VMLOAD/VMSAVE - Rework the nested dirty log test, originally added as a regression test for PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage and to hopefully make the test easier to understand and maintain guest_memfd: - Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage handling. SEV/SNP was the only user of the tracking and it can do it via the RMP - Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to avoid non-trivial complexity for something that no known VMM seems to be doing and to avoid an API special case for in-place conversion, which simply can't support unaligned sources - When populating guest_memfd memory, GUP the source page in common code and pass the refcounted page to the vendor callback, instead of letting vendor code do the heavy lifting. Doing so avoids a looming deadlock bug with in-place due an AB-BA conflict betwee mmap_lock and guest_memfd's filemap invalidate lock Generic: - Fix a bug where KVM would ignore the vCPU's selected address space when creating a vCPU-specific mapping of guest memory. Actually this bug could not be hit even on x86, the only architecture with multiple address spaces, but it's a bug nevertheless" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits) KVM: s390: Increase permitted SE header size to 1 MiB MAINTAINERS: Replace backup for s390 vfio-pci KVM: s390: vsie: Fix race in acquire_gmap_shadow() KVM: s390: vsie: Fix race in walk_guest_tables() KVM: s390: Use guest address to mark guest page dirty irqchip/riscv-imsic: Adjust the number of available guest irq files RISC-V: KVM: Transparent huge page support RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test RISC-V: KVM: Allow Zalasr extensions for Guest/VM KVM: riscv: selftests: Add riscv vm satp modes KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr() RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr() RISC-V: KVM: Remove unnecessary 'ret' assignment KVM: s390: Add explicit padding to struct kvm_s390_keyop KVM: LoongArch: selftests: Add steal time test case LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side LoongArch: KVM: Add paravirt preempt feature in hypervisor side ...
2026-02-12Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao) - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar) - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder) - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek) - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli) - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport) - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain) - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav) - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov) - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov) - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin) * tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits) watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency procfs: fix missing RCU protection when reading real_parent in do_task_stat() watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() kcsan, compiler_types: avoid duplicate type issues in BPF Type Format kho: fix doc for kho_restore_pages() tests/liveupdate: add in-kernel liveupdate test liveupdate: luo_flb: introduce File-Lifecycle-Bound global state liveupdate: luo_file: Use private list list: add kunit test for private list primitives list: add primitives for private list manipulations delayacct: fix uapi timespec64 definition panic: add panic_force_cpu= parameter to redirect panic to a specific CPU netclassid: use thread_group_leader(p) in update_classid_task() RDMA/umem: don't abuse current->group_leader drm/pan*: don't abuse current->group_leader drm/amd: kill the outdated "Only the pthreads threading model is supported" checks drm/amdgpu: don't abuse current->group_leader android/binder: use same_thread_group(proc->tsk, current) in binder_mmap() android/binder: don't abuse current->group_leader kho: skip memoryless NUMA nodes when reserving scratch areas ...
2026-02-12Merge tag 'mm-stable-2026-02-11-19-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "powerpc/64s: do not re-activate batched TLB flush" makes arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev) It adds a generic enter/leave layer and switches architectures to use it. Various hacks were removed in the process. - "zram: introduce compressed data writeback" implements data compression for zram writeback (Richard Chang and Sergey Senozhatsky) - "mm: folio_zero_user: clear page ranges" adds clearing of contiguous page ranges for hugepages. Large improvements during demand faulting are demonstrated (David Hildenbrand) - "memcg cleanups" tidies up some memcg code (Chen Ridong) - "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos stats" improves DAMOS stat's provided information, deterministic control, and readability (SeongJae Park) - "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few issues in the hugetlb cgroup charging selftests (Li Wang) - "Fix va_high_addr_switch.sh test failure - again" addresses several issues in the va_high_addr_switch test (Chunyu Hu) - "mm/damon/tests/core-kunit: extend existing test scenarios" improves the KUnit test coverage for DAMON (Shu Anzai) - "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to transiently return -EAGAIN (Shivank Garg) - "arch, mm: consolidate hugetlb early reservation" reworks and consolidates a pile of straggly code related to reservation of hugetlb memory from bootmem and creation of CMA areas for hugetlb (Mike Rapoport) - "mm: clean up anon_vma implementation" cleans up the anon_vma implementation in various ways (Lorenzo Stoakes) - "tweaks for __alloc_pages_slowpath()" does a little streamlining of the page allocator's slowpath code (Vlastimil Babka) - "memcg: separate private and public ID namespaces" cleans up the memcg ID code and prevents the internal-only private IDs from being exposed to userspace (Shakeel Butt) - "mm: hugetlb: allocate frozen gigantic folio" cleans up the allocation of frozen folios and avoids some atomic refcount operations (Kefeng Wang) - "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement of memory betewwn the active and inactive LRUs and adds auto-tuning of the ratio-based quotas and of monitoring intervals (SeongJae Park) - "Support page table check on PowerPC" makes CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan) - "nodemask: align nodes_and{,not} with underlying bitmap ops" makes nodes_and() and nodes_andnot() propagate the return values from the underlying bit operations, enabling some cleanup in calling code (Yury Norov) - "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up some DAMON internal interfaces (SeongJae Park) - "mm/khugepaged: cleanups and scan limit fix" does some cleanup work in khupaged and fixes a scan limit accounting issue (Shivank Garg) - "mm: balloon infrastructure cleanups" goes to town on the balloon infrastructure and its page migration function. Mainly cleanups, also some locking simplification (David Hildenbrand) - "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds additional tracepoints to the page reclaim code (Jiayuan Chen) - "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is part of Marco's kernel-wide migration from the legacy workqueue APIs over to the preferred unbound workqueues (Marco Crivellari) - "Various mm kselftests improvements/fixes" provides various unrelated improvements/fixes for the mm kselftests (Kevin Brodsky) - "mm: accelerate gigantic folio allocation" greatly speeds up gigantic folio allocation, mainly by avoiding unnecessary work in pfn_range_valid_contig() (Kefeng Wang) - "selftests/damon: improve leak detection and wss estimation reliability" improves the reliability of two of the DAMON selftests (SeongJae Park) - "mm/damon: cleanup kdamond, damon_call(), damos filter and DAMON_MIN_REGION" does some cleanup work in the core DAMON code (SeongJae Park) - "Docs/mm/damon: update intro, modules, maintainer profile, and misc" performs maintenance work on the DAMON documentation (SeongJae Park) - "mm: add and use vma_assert_stabilised() helper" refactors and cleans up the core VMA code. The main aim here is to be able to use the mmap write lock's lockdep state to perform various assertions regarding the locking which the VMA code requires (Lorenzo Stoakes) - "mm, swap: swap table phase II: unify swapin use" removes some old swap code (swap cache bypassing and swap synchronization) which wasn't working very well. Various other cleanups and simplifications were made. The end result is a 20% speedup in one benchmark (Kairui Song) - "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM available on 64-bit alpha, loongarch, mips, parisc, and um. Various cleanups were performed along the way (Qi Zheng) * tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits) mm/memory: handle non-split locks correctly in zap_empty_pte_table() mm: move pte table reclaim code to memory.c mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config um: mm: enable MMU_GATHER_RCU_TABLE_FREE parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE mips: mm: enable MMU_GATHER_RCU_TABLE_FREE LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles zsmalloc: make common caches global mm: add SPDX id lines to some mm source files mm/zswap: use %pe to print error pointers mm/vmscan: use %pe to print error pointers mm/readahead: fix typo in comment mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file() mm: refactor vma_map_pages to use vm_insert_pages mm/damon: unify address range representation with damon_addr_range mm/cma: replace snprintf with strscpy in cma_new_area ...
2026-02-11Merge tag 'kvm-x86-pmu-6.20' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM mediated PMU support for 6.20 Add support for mediated PMUs, where KVM gives the guest full ownership of PMU hardware (contexted switched around the fastpath run loop) and allows direct access to data MSRs and PMCs (restricted by the vPMU model), but intercepts access to control registers, e.g. to enforce event filtering and to prevent the guest from profiling sensitive host state. To keep overall complexity reasonable, mediated PMU usage is all or nothing for a given instance of KVM (controlled via module param). The Mediated PMU is disabled default, partly to maintain backwards compatilibity for existing setup, partly because there are tradeoffs when running with a mediated PMU that may be non-starters for some use cases, e.g. the host loses the ability to profile guests with mediated PMUs, the fastpath run loop is also a blind spot, entry/exit transitions are more expensive, etc. Versus the emulated PMU, where KVM is "just another perf user", the mediated PMU delivers more accurate profiling and monitoring (no risk of contention and thus dropped events), with significantly less overhead (fewer exits and faster emulation/programming of event selectors) E.g. when running Specint-2017 on a single-socket Sapphire Rapids with 56 cores and no-SMT, and using perf from within the guest: Perf command: a. basic-sampling: perf record -F 1000 -e 6-instructions -a --overwrite b. multiplex-sampling: perf record -F 1000 -e 10-instructions -a --overwrite Guest performance overhead: --------------------------------------------------------------------------- | Test case | emulated vPMU | all passthrough | passthrough with | | | | | event filters | --------------------------------------------------------------------------- | basic-sampling | 33.62% | 4.24% | 6.21% | --------------------------------------------------------------------------- | multiplex-sampling | 79.32% | 7.34% | 10.45% | ---------------------------------------------------------------------------
2026-02-10Merge tag 'x86_paravirt_for_v7.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt updates from Borislav Petkov: - A nice cleanup to the paravirt code containing a unification of the paravirt clock interface, taming the include hell by splitting the pv_ops structure and removing of a bunch of obsolete code (Juergen Gross) * tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/paravirt: Use XOR r32,r32 to clear register in pv_vcpu_is_preempted() x86/paravirt: Remove trailing semicolons from alternative asm templates x86/pvlocks: Move paravirt spinlock functions into own header x86/paravirt: Specify pv_ops array in paravirt macros x86/paravirt: Allow pv-calls outside paravirt.h objtool: Allow multiple pv_ops arrays x86/xen: Drop xen_mmu_ops x86/xen: Drop xen_cpu_ops x86/xen: Drop xen_irq_ops x86/paravirt: Move pv_native_*() prototypes to paravirt.c x86/paravirt: Introduce new paravirt-base.h header x86/paravirt: Move paravirt_sched_clock() related code into tsc.c x86/paravirt: Use common code for paravirt_steal_clock() riscv/paravirt: Use common code for paravirt_steal_clock() loongarch/paravirt: Use common code for paravirt_steal_clock() arm64/paravirt: Use common code for paravirt_steal_clock() arm/paravirt: Use common code for paravirt_steal_clock() sched: Move clock related paravirt code to kernel/sched paravirt: Remove asm/paravirt_api_clock.h x86/paravirt: Move thunk macros to paravirt_types.h ...
2026-02-10Merge tag 'timers-vdso-2026-02-09' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull VDSO updates from Thomas Gleixner: - Provide the missing 64-bit variant of clock_getres() This allows the extension of CONFIG_COMPAT_32BIT_TIME to the vDSO and finally the removal of 32-bit time types from the kernel and UAPI. - Remove the useless and broken getcpu_cache from the VDSO The intention was to provide a trivial way to retrieve the CPU number from the VDSO, but as the VDSO data is per process there is no way to make it work. - Switch get/put_unaligned() from packed struct to memcpy() The packed struct violates strict aliasing rules which requires to pass -fno-strict-aliasing to the compiler. As this are scalar values __builtin_memcpy() turns them into simple loads and stores - Use __typeof_unqual__() for __unqual_scalar_typeof() The get/put_unaligned() changes triggered a new sparse warning when __beNN types are used with get/put_unaligned() as sparse builds add a special 'bitwise' attribute to them which prevents sparse to evaluate the Generic in __unqual_scalar_typeof(). Newer sparse versions support __typeof_unqual__() which avoids the problem, but requires a recent sparse install. So this adds a sanity check to sparse builds, which validates that sparse is available and capable of handling it. - Force inline __cvdso_clock_getres_common() Compilers sometimes un-inline agressively, which results in function call overhead and problems with automatic stack variable initialization. Interestingly enough the force inlining results in smaller code than the un-inlined variant produced by GCC when optimizing for size. * tag 'timers-vdso-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: vdso/gettimeofday: Force inlining of __cvdso_clock_getres_common() x86/percpu: Make CONFIG_USE_X86_SEG_SUPPORT work with sparse compiler: Use __typeof_unqual__() for __unqual_scalar_typeof() powerpc/vdso: Provide clock_getres_time64() tools headers: Remove unneeded ignoring of warnings in unaligned.h tools headers: Update the linux/unaligned.h copy with the kernel sources vdso: Switch get/put_unaligned() from packed struct to memcpy() parisc: Inline a type punning version of get_unaligned_le32() vdso: Remove struct getcpu_cache MIPS: vdso: Provide getres_time64() for 32-bit ABIs arm64: vdso32: Provide clock_getres_time64() ARM: VDSO: Provide clock_getres_time64() ARM: VDSO: Patch out __vdso_clock_getres() if unavailable x86/vdso: Provide clock_getres_time64() for x86-32 selftests: vDSO: vdso_test_abi: Add test for clock_getres_time64() selftests: vDSO: vdso_test_abi: Use UAPI system call numbers selftests: vDSO: vdso_config: Add configurations for clock_getres_time64() vdso: Add prototype for __vdso_clock_getres_time64()
2026-02-10LoongArch: dts: loongson-2k1000: Add nand controller supportBinbin Zhou
The module is supported, enable it. Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: dts: loongson-2k0500: Add nand controller supportBinbin Zhou
The module is supported, enable it. Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: BPF: Implement bpf_addr_space_cast instructionHengqi Chen
LLVM generates bpf_addr_space_cast instruction while translating pointers between native (zero) address space and __attribute__((address_space(N))). The addr_space=0 is reserved as bpf_arena address space. rY = addr_space_cast(rX, 0, 1) is processed by the verifier and converted to normal 32-bit move: wX = wY rY = addr_space_cast(rX, 1, 0) has to be converted by JIT. With this, the following test cases passed: $ ./test_progs -a arena_htab,arena_list,arena_strsearch,verifier_arena,verifier_arena_large #4/1 arena_htab/arena_htab_llvm:OK #4/2 arena_htab/arena_htab_asm:OK #4 arena_htab:OK #5/1 arena_list/arena_list_1:OK #5/2 arena_list/arena_list_1000:OK #5 arena_list:OK #7/1 arena_strsearch/arena_strsearch:OK #7 arena_strsearch:OK #507/1 verifier_arena/basic_alloc1:OK #507/2 verifier_arena/basic_alloc2:OK #507/3 verifier_arena/basic_alloc3:OK #507/4 verifier_arena/basic_reserve1:OK #507/5 verifier_arena/basic_reserve2:OK #507/6 verifier_arena/reserve_twice:OK #507/7 verifier_arena/reserve_invalid_region:OK #507/8 verifier_arena/iter_maps1:OK #507/9 verifier_arena/iter_maps2:OK #507/10 verifier_arena/iter_maps3:OK #507 verifier_arena:OK #508/1 verifier_arena_large/big_alloc1:OK #508/2 verifier_arena_large/access_reserved:OK #508/3 verifier_arena_large/request_partially_reserved:OK #508/4 verifier_arena_large/free_reserved:OK #508/5 verifier_arena_large/big_alloc2:OK #508 verifier_arena_large:OK Summary: 5/20 PASSED, 0 SKIPPED, 0 FAILED Acked-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Vincent Li <vincent.mc.li@gmail.com> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: BPF: Implement PROBE_MEM32 pseudo instructionsHengqi Chen
Add support for `{LDX,STX,ST} | PROBE_MEM32 | {B,H,W,DW}` instructions. They are similar to PROBE_MEM instructions with the following differences: * PROBE_MEM32 supports store. * PROBE_MEM32 relies on the verifier to clear upper 32-bit of the src/dst register * PROBE_MEM32 adds 64-bit kern_vm_start address (which is stored in S6 in the prologue). Due to bpf_arena constructions such S6 + reg + off16 access is guaranteed to be within arena virtual range, so no address check at run-time. * S6 is a free callee-saved register, so it is used to store arena_vm_start * PROBE_MEM32 allows ST and STX. If they fault the store is a nop. When LDX faults the destination register is zeroed. To support these on LoongArch, we employ the t2/t3 registers to store the intermediate results of reg_arena + src/dst reg and use the t2/t3 registers as the new src/dst reg. This allows us to reuse most of the existing code. Acked-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Vincent Li <vincent.mc.li@gmail.com> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: BPF: Use BPF prog pack allocatorHengqi Chen
Use bpf_jit_binary_pack_alloc() for BPF JIT binaries. The BPF prog pack allocator creates a pair of RW and RX buffers. The BPF JIT writes the program into the RW buffer. When the JIT is done, the program is copied to the final RX buffer with bpf_jit_binary_pack_finalize(). Acked-by: Tiezhu Yang <yangtiezhu@loongson.cn> Tested-by: Vincent Li <vincent.mc.li@gmail.com> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Use IS_ERR_PCPU() macro for KGDBCarlos López
In commit a759e37fb467 ("err.h: add ERR_PTR_PCPU(), PTR_ERR_PCPU() and IS_ERR_PCPU() macros"), specialized macros were added to check an error within a __percpu pointer, so use them instead of manually casting with __force, like all other users of register_wide_hw_breakpoint(). Signed-off-by: Carlos López <clopez@suse.de> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Rework KASAN initialization for PTW-enabled systemsTiezhu Yang
kasan_init_generic() indicates that kasan is fully initialized, so it should be put at end of kasan_init(). Otherwise bringing up the primary CPU failed when CONFIG_KASAN is set on PTW-enabled systems, here are the call chains: kernel_entry() start_kernel() setup_arch() kasan_init() kasan_init_generic() The reason is PTW-enabled systems have speculative accesses which means memory accesses to the shadow memory after kasan_init() may be executed by hardware before. However, accessing shadow memory is safe only after kasan fully initialized because kasan_init() uses a temporary PGD table until we have populated all levels of shadow page tables and writen the PGD register. Moving kasan_init_generic() later can defer the occasion of kasan_enabled(), so as to avoid speculative accesses on shadow pages. After moving kasan_init_generic() to the end, kasan_init() can no longer call kasan_mem_to_shadow() for shadow address conversion because it will always return kasan_early_shadow_page. On the other hand, we should keep the current logic of kasan_mem_to_shadow() for both the early and final stage because there may be instrumentation before kasan_init(). To solve this, we factor out a new mem_to_shadow() function from current kasan_mem_to_shadow() for the shadow address conversion in kasan_init(). Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Disable instrumentation for setup_ptwalker()Tiezhu Yang
According to Documentation/dev-tools/kasan.rst, software KASAN modes use compiler instrumentation to insert validity checks. Such instrumentation might be incompatible with some parts of the kernel, and therefore needs to be disabled, just use the attribute __no_sanitize_address to disable instrumentation for the low level function setup_ptwalker(). Otherwise bringing up the secondary CPUs failed when CONFIG_KASAN is set (especially when PTW is enabled), here are the call chains: smpboot_entry() start_secondary() cpu_probe() per_cpu_trap_init() tlb_init() setup_tlb_handler() setup_ptwalker() The reason is the PGD registers are configured in setup_ptwalker(), but KASAN instrumentation may cause TLB exceptions before that. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Remove some extern variables in source filesTiezhu Yang
There are declarations of the variable "eentry", "pcpu_handlers[]" and "exception_handlers[]" in asm/setup.h, the source files already include this header file directly or indirectly, so no need to declare them in the source files, just remove the code. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Guard percpu handler under !CONFIG_PREEMPT_RTTiezhu Yang
After commit 88fd2b70120d ("LoongArch: Fix sleeping in atomic context for PREEMPT_RT"), it should guard percpu handler under !CONFIG_PREEMPT_RT to avoid redundant operations. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Handle percpu handler address for ORC unwinderTiezhu Yang
After commit 4cd641a79e69 ("LoongArch: Remove unnecessary checks for ORC unwinder"), the system can not boot normally under some configs (such as enable KASAN), there are many error messages "cannot find unwind pc". The kernel boots normally with the defconfig, so no problem found out at the first time. Here is one way to reproduce: cd linux make mrproper defconfig -j"$(nproc)" scripts/config -e KASAN make olddefconfig all -j"$(nproc)" sudo make modules_install sudo make install sudo reboot The address that can not unwind is not a valid kernel address which is between "pcpu_handlers[cpu]" and "pcpu_handlers[cpu] + vec_sz" due to the code of eentry was copied to the new area of pcpu_handlers[cpu] in setup_tlb_handler(), handle this special case to get the valid address to unwind normally. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Use %px to print unmodified unwinding addressTiezhu Yang
Currently, use %p to prevent leaking information about the kernel memory layout when printing the PC address, but the kernel log messages are not useful to debug problem if bt_address() returns 0. Given that the type of "pc" variable is unsigned long, it should use %px to print the unmodified unwinding address. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Prefer top-down allocation after arch_mem_init()Huacai Chen
Currently we use bottom-up allocation after sparse_init(), the reason is sparse_init() need a lot of memory, and bottom-up allocation may exhaust precious low memory (below 4GB). On the other hand, SWIOTLB and CMA need low memories for DMA32, so swiotlb_init() and dma_contiguous_reserve() need bottom-up allocation. Since swiotlb_init() and dma_contiguous_reserve() are both called in arch_mem_init(), we no longer need bottom-up allocation after that. So we set the allocation policy to top-down at the end of arch_mem_init(), in order to avoid later memory allocations (such as KASAN) exhaust low memory. This solve at least two problems: 1. Some buggy BIOSes use 0xfd000000~0xfe000000 for secondary CPUs, but didn't reserve this range, which causes smpboot failures. 2. Some DMA32 devices, such as Loongson-DRM and OHCI, cannot work with KASAN enabled. Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Add HOTPLUG_SMT implementationHuacai Chen
For benchmarking or debugging purpose, we usually want to control SMT via boot parameter and sysfs knobs. So add HOTPLUG_SMT implementation. 1. Boot parameters: nosmt: Disable SMT, can be enabled via sysfs knobs. nosmt=force: Disable SMT, cannot be enabled via sysfs knobs. 2. Runtime sysfs controls: Write "on", "off", "forceoff" or the number of SMT threads (1, 2, ...) to /sys/devices/system/cpu/smt/control. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Make cpumask_of_node() robust against NUMA_NO_NODEJohn Garry
The arch definition of cpumask_of_node() cannot handle NUMA_NO_NODE - which is a valid index - so add a check for this. Cc: stable@vger.kernel.org Signed-off-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Wire up memfd_secret system callLain "Fearyncess" Yang
LoongArch supports ARCH_HAS_SET_DIRECT_MAP, therefore wire up the memfd_secret system call, which just depends on it. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Signed-off-by: Lain "Fearyncess" Yang <fearyncess@aosc.io> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Replace seq_printf() with seq_puts() for simple stringsGeorge Guo
Fix warnings like: "Prefer seq_puts to seq_printf" by checkpatch.pl. Replace seq_printf() calls with seq_puts() in show_cpuinfo() when outputting simple constant strings without format specifiers. This improves performance slightly as seq_puts() avoids parsing the format string. Signed-off-by: George Guo <guodongtai@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Add 128-bit atomic cmpxchg supportGeorge Guo
Implement 128-bit atomic compare-and-exchange using LoongArch's LL.D/SC.Q instructions. At the same time, this fix the BPF scheduler test failures (scx_central and scx_qmap) caused by kmalloc_nolock_noprof() returning NULL, due to missing 128-bit atomics. The NULL returns lead to -ENOMEM errors during scheduler initialization, causing test cases to fail. Verified by testing with the scx_qmap scheduler (located in tools/sched_ext/). Building with `make` and running ./tools/sched_ext/build/bin/scx_qmap. Link: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=5fb750e8a9ae Acked-by: Hengqi Chen <hengqi.chen@gmail.com> Tested-by: Hengqi Chen <hengqi.chen@gmail.com> Co-developed-by:: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: George Guo <guodongtai@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Add detection for SC.Q supportGeorge Guo
Check the CPUCFG2_SCQ bit to determine if the current CPU supports the SC.Q instruction. Reviewed-by: Hengqi Chen <hengqi.chen@gmail.com> Tested-by: Hengqi Chen <hengqi.chen@gmail.com> Co-developed-by: Yangyang Lian <lianyangyang@kylinos.cn> Signed-off-by: Yangyang Lian <lianyangyang@kylinos.cn> Signed-off-by: George Guo <guodongtai@kylinos.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-02-10LoongArch: Select HAVE_CMPXCHG_LOCAL in KconfigHuacai Chen
LoongArch has already implemented cmpxchg_local(), this_cpu_cmpxchg() and similar functions, so select HAVE_CMPXCHG_LOCAL in Kconfig to avoid incurring the overhead of local_irq_save()/local_irq_restore() for page state helpers in mm/vmstat.c. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>