path: root/arch
2026-03-06x86/mm/tlb: Make enter_lazy_tlb() always inline on x86Xie Yuanbin
enter_lazy_tlb() on x86 is short enough, and it is called during context switching, which is a hot code path. Make enter_lazy_tlb() always inline on x86 to optimize performance. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Xie Yuanbin <qq570070308@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20260216164950.147617-2-qq570070308@gmail.com
2026-03-05x86/microcode: Add platform mask to Intel microcode "old" listDave Hansen
Intel sometimes has CPUs with identical family/model/stepping but which need different microcode. These CPUs are differentiated with the platform ID. The Intel "microcode-20250512" release was used to generate the existing contents of intel-ucode-defs.h. Use that same release and add the platform mask to the definitions. This makes the list a few entries longer because some CPUs that previously shared a definition now need two or more. For example, for the ancient Pentium III there are two CPUs that differ only in their platform and have two different microcode versions (note: .driver_data is the microcode version):

{ ..., .model = 0x05, .steppings = 0x0001, .platform_mask = 0x01, .driver_data = 0x40 },
{ ..., .model = 0x05, .steppings = 0x0001, .platform_mask = 0x08, .driver_data = 0x45 },

Another example is the state-of-the-art Granite Rapids:

{ ..., .model = 0xad, .steppings = 0x0002, .platform_mask = 0x20, .driver_data = 0xa0000d1 },
{ ..., .model = 0xad, .steppings = 0x0002, .platform_mask = 0x95, .driver_data = 0x10003a2 },

As you can see, this differentiation with the platform ID has been necessary for a long time and is still relevant today. Without the platform matching, the old microcode table is incomplete. For instance, it might lead someone with a Pentium III, platform 0x0, and microcode 0x40 to think that they should have microcode 0x45, which is really only for platform 0x4 (.platform_mask==0x08). In practice, this meant that folks with fully updated microcode were seeing "Vulnerable" in the "old_microcode" file.

1. https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files

Closes: https://lore.kernel.org/all/38660F8F-499E-48CD-B58B-4822228A5941@nutanix.com/ Fixes: 4e2c719782a8 ("x86/cpu: Help users notice when running old Intel microcode") Reported-by: Jon Kohler <jon@nutanix.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/all/3ECBB974-C6F0-47A7-94B6-3646347F1CC2@nutanix.com/ Link: https://patch.msgid.link/20260304181024.76E3F038@davehans-spike.ostc.intel.com
2026-03-05x86/cpu: Add platform ID to CPU matching structureDave Hansen
The existing x86_match_cpu() infrastructure can be used to match a bunch of attributes of a CPU: vendor, family, model, steppings and CPU features. But, there's one more attribute that's missing and unable to be matched against: the platform ID, enumerated on Intel CPUs in MSR_IA32_PLATFORM_ID. It is a little more obscure and is only queried during microcode loading. This is because Intel sometimes has CPUs with identical family/model/stepping but which need different microcode. These CPUs are differentiated with the platform ID. Add a field in 'struct x86_cpu_id' for the platform ID. Similar to the stepping field, make the new field a mask of platform IDs. Some examples: 0x01: matches only platform ID 0x0 0x02: matches only platform ID 0x1 0x03: matches platform IDs 0x0 or 0x1 0x80: matches only platform ID 0x7 0xff: matches all 8 possible platform IDs Since the mask is only a byte wide, it nestles in next to another u8 and does not even increase the size of 'struct x86_cpu_id'. Reserve the all 0's value as the wildcard (X86_PLATFORM_ANY). This avoids forcing changes to existing 'struct x86_cpu_id' users. They can just continue to fill the field with 0's and their matching will work exactly as before. Note: If someone is ever looking for space in 'struct x86_cpu_id', this new field could probably get stuck over in ->driver_data for the one user that there is. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://patch.msgid.link/20260304181022.058DF07C@davehans-spike.ostc.intel.com
2026-03-05x86/cpu: Add platform ID to CPU info structureDave Hansen
The end goal here is to be able to do x86_match_cpu() and match on a specific platform ID. While it would be possible to stash this ID off somewhere or read it dynamically, that approach would not be consistent with the other fields which can be matched. Read the platform ID and store it in cpuinfo_x86. There are lots of sites where this new field could be set. Place it near the place c->microcode is established since the platform ID is so closely intertwined with microcode updates. Note: This should not grow the size of 'struct cpuinfo_x86' in practice since the u8 fits next to another u8 in the structure. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Link: https://patch.msgid.link/20260304181020.8D518228@davehans-spike.ostc.intel.com
2026-03-05x86/microcode: Refactor platform ID enumeration into a helperDave Hansen
Today, the only code that cares about the platform ID is the microcode update code itself. To facilitate storing the platform ID in a more generic place and using it outside of the microcode update itself, put the enumeration into a helper function. Mirror intel_get_microcode_revision()'s naming and location. But, moving away from intel_collect_cpu_info() means that the model and family information in CPUID is not readily available. Just call CPUID again. Note that the microcode header is a mask of supported platform IDs. Only stick the ID part in the helper. Leave the 1<<id part in the microcode handling. Also note that the PII is weird. It does not really have a platform ID because it doesn't even have the MSR. Just consider it to be platform ID 0. Instead of saying >=PII, say <=PII. The PII is the real oddball here being the only CPU with Linux microcode updates but no platform ID. It's worth calling it out by name. This does subtly change the sig->pf for the PII though from 0x0 to 0x1. Make up for that by ignoring sig->pf when the microcode update platform mask is 0x0. [ dhansen: reflow comment for bpetkov ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Link: https://patch.msgid.link/20260304181018.EB6404F8@davehans-spike.ostc.intel.com
2026-03-05KVM: arm64: Fix page leak in user_mem_abort() on atomic faultFuad Tabba
When a guest performs an atomic/exclusive operation on memory lacking the required attributes, user_mem_abort() injects a data abort and returns early. However, it fails to release the reference to the host page acquired via __kvm_faultin_pfn(). A malicious guest could repeatedly trigger this fault, leaking host page references and eventually causing host memory exhaustion (OOM). Fix this by consolidating the early error returns to a new out_put_page label that correctly calls kvm_release_page_unused(). Fixes: 2937aeec9dc5 ("KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory") Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com> Link: https://patch.msgid.link/20260304162222.836152-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-05KVM: arm64: nv: Inject a SEA if failed to read the descriptorZenghui Yu (Huawei)
Failure to read the descriptor (because it is outside of a memslot) should result in a SEA being injected in the guest. Suggested-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/86ms1m9lp3.wl-maz@kernel.org Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev> Link: https://patch.msgid.link/20260225173515.20490-4-zenghui.yu@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-05KVM: arm64: nv: Report addrsz fault at level 0 with a bad VTTBR.BADDRZenghui Yu (Huawei)
As per R_BFHQH, " When an Address size fault is generated, the reported fault code indicates one of the following: If the fault was generated due to the TTBR_ELx used in the translation having nonzero address bits above the OA size, then a fault at level 0. " Fix the reported Address size fault level as being 0 if the base address is wrongly programmed by L1. Fixes: 61e30b9eef7f ("KVM: arm64: nv: Implement nested Stage-2 page table walk logic") Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev> Link: https://patch.msgid.link/20260225173515.20490-3-zenghui.yu@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-05KVM: arm64: nv: Check S2 limits based on implemented PA sizeZenghui Yu (Huawei)
check_base_s2_limits() checks the validity of SL0 and inputsize against ia_size (inputsize again!) but the pseudocode from DDI0487 G.a AArch64.TranslationTableWalk() says that we should check against the implemented PA size. We would otherwise fail to walk S2 with a valid configuration. E.g., granule size = 4KB, inputsize = 40 bits, initial lookup level = 0 (no concatenation) on a system with a 48-bit PA range supported is allowed by the architecture. Fix it by obtaining the PA size via kvm_get_pa_bits(). Note that kvm_get_pa_bits() returns the fixed limit now and should eventually reflect the per-VM PARange (one day!). Given that the configured PARange should not be greater than kvm_ipa_limit, this at least fixes the problem described above. While at it, inject a level 0 translation fault into the guest if check_base_s2_limits() fails, as per the pseudocode. Fixes: 61e30b9eef7f ("KVM: arm64: nv: Implement nested Stage-2 page table walk logic") Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev> Link: https://patch.msgid.link/20260225173515.20490-2-zenghui.yu@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-05x86/reboot: Execute the kernel restart handler upon machine restartMartin Schiller
SoC devices like the Intel / MaxLinear Lightning Mountain must be reset by the Reset Control Unit (RCU) instead of using "normal" x86 mechanisms like ACPI, BIOS, KBD, etc. Therefore, the RCU driver (reset-intel-gw) registers a restart handler which triggers the global reset signal. Unfortunately, this is of no use as long as the restart chain is not processed during reboot on x86 systems. That's why do_kernel_restart() must be called when a reboot is performed. This has long been common practice for other architectures. [ bp: Massage commit message. ] Signed-off-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20260225-x86_do_kernel_restart-v2-1-81396cf3d44c@dev.tdt.de
2026-03-05KVM: arm64: pkvm: Fallback to level-3 mapping on host stage-2 faultMarc Zyngier
If, for any odd reason, we cannot converge to a mapping size that is completely contained in a memblock region, we fail to install a S2 mapping and go back to the faulting instruction. Rinse, repeat. This happens when faulting in regions that are smaller than a page or that do not have PAGE_SIZE-aligned boundaries (as witnessed on an O6 board that refuses to boot in protected mode). In this situation, fall back to using a PAGE_SIZE mapping anyway -- it isn't like we can go any lower. Fixes: e728e705802fe ("KVM: arm64: Adjust range correctly during host stage-2 faults") Link: https://lore.kernel.org/r/86wlzr77cn.wl-maz@kernel.org Cc: stable@vger.kernel.org Cc: Quentin Perret <qperret@google.com> Reviewed-by: Quentin Perret <qperret@google.com> Link: https://patch.msgid.link/20260305132751.2928138-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-05KVM: arm64: Eagerly init vgic dist/redist on vgic creationMarc Zyngier
If vgic_allocate_private_irqs_locked() fails for any odd reason, we exit kvm_vgic_create() early, leaving dist->rd_regions uninitialised. kvm_vgic_dist_destroy() then comes along and walks into the weeds trying to free the RDs. Got to love this stuff. Solve it by moving all the static initialisation early, and make sure that if we fail halfway, we're in a reasonable shape to perform the rest of the teardown. While at it, reset the vgic model on failure, just in case... Reported-by: syzbot+f6a46b038fc243ac0175@syzkaller.appspotmail.com Tested-by: syzbot+f6a46b038fc243ac0175@syzkaller.appspotmail.com Fixes: b3aa9283c0c50 ("KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic()") Link: https://lore.kernel.org/r/69a2d58c.050a0220.3a55be.003b.GAE@google.com Link: https://patch.msgid.link/20260228164559.936268-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
2026-03-04Merge tag 'riscv-soc-fixes-for-v7.0-rc1' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes RISC-V soc fixes for v7.0-rc1 drivers: Fix leaks in probe/init function teardown code in three drivers. microchip: Fix a warning introduced by a recent binding change, that made resets required on Polarfire SoC's CAN IP. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> * tag 'riscv-soc-fixes-for-v7.0-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: cache: ax45mp: Fix device node reference leak in ax45mp_cache_init() cache: starfive: fix device node leak in starlink_cache_init() riscv: dts: microchip: add can resets to mpfs soc: microchip: mpfs: Fix memory leak in mpfs_sys_controller_probe() Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2026-03-04arm64: dts: qcom: monaco: Fix UART10 pinconfLoic Poulain
UART10 RTS and TX pins were incorrectly mapped to gpio84 and gpio85. Correct them to gpio85 (RTS) and gpio86 (TX) to match the hardware I/O mapping. Fixes: 467284a3097f ("arm64: dts: qcom: qcs8300: Add QUPv3 configuration") Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260202155611.1568-1-loic.poulain@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04x86/mm/pat: Convert split_large_page() to use ptdescsVishal Moola (Oracle)
Use the ptdesc APIs for all page table allocation and free sites to allow their separate allocation from struct page in the future. Update split_large_page() to allocate a ptdesc instead of allocating a page for use as a page table. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260303194828.1406905-5-vishal.moola@gmail.com
2026-03-04x86/mm/pat: Convert populate_pgd() to use page table apisVishal Moola (Oracle)
Use the ptdesc APIs for all page table allocation and free sites to allow their separate allocation from struct page in the future. Convert the remaining get_zeroed_page() calls to the generic page table APIs, as they already use ptdescs. Pass through init_mm since these are kernel page tables, as both functions require it to identify kernel page tables. Because the generic implementations do not use the second argument, pass a placeholder to avoid reimplementing them or risking breakage on other architectures. It is not obvious whether these pages are freed. Regardless, convert the remaining free paths as needed, noting that the only other possible free paths have already been converted and that a frozen page table test kernel has not reported any issues. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260303194828.1406905-4-vishal.moola@gmail.com
2026-03-04x86/mm/pat: Convert pmd code to use page table apisVishal Moola (Oracle)
Use the ptdesc APIs for all page table allocation and free sites to allow their separate allocation from struct page in the future. Convert the PMD allocation and free sites to use the generic page table APIs, as they already use ptdescs. Pass through init_mm since these are kernel page tables, as pmd_alloc_one() requires it to identify kernel page tables. Because the generic implementation does not use the second argument, pass a placeholder to avoid reimplementing it or risking breakage on other architectures. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260303194828.1406905-3-vishal.moola@gmail.com
2026-03-04x86/mm/pat: Convert pte code to use page table apisVishal Moola (Oracle)
Use the ptdesc APIs for all page table allocation and free sites to allow their separate allocation from struct page in the future. Convert the PTE allocation and free sites to use the generic page table APIs, as they already use ptdescs. Pass through init_mm since these are kernel page tables; otherwise, pte_alloc_one_kernel() becomes a no-op. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260303194828.1406905-2-vishal.moola@gmail.com
2026-03-04arm64: make runtime const not usable by modulesJisheng Zhang
Similar as commit 284922f4c563 ("x86: uaccess: don't use runtime-const rewriting in modules") does, make arm64's runtime const not usable by modules too, to "make sure this doesn't get forgotten the next time somebody wants to do runtime constant optimizations". The reason is well explained in the above commit: "The runtime-const infrastructure was never designed to handle the modular case, because the constant fixup is only done at boot time for core kernel code." Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-04x86/entry/vdso32: Work around libgcc unwinder bugH. Peter Anvin
The unwinder code in libgcc has a long standing bug which causes it to fail to pick up the signal frame CFI flag. This is a generic bug across all platforms. It affects the __kernel_sigreturn and __kernel_rt_sigreturn vdso entry points on i386. The x86-64 kernel doesn't provide a sigreturn stub, and so there is no kernel-provided code that is affected on x86-64. libgcc does have a legacy fallback path which happens to work as long as the bytes immediately before each of the sigreturn functions fall outside any function. This patch adds a nop before the ALIGN to each of the sigreturn stubs to ensure that this is, indeed, the case. The rest of the patch is just a comment which documents the invariants that need to be maintained for this legacy path to work correctly. This is a manifest bug: in the current vdso, __kernel_vsyscall is a multiple of 16 bytes long and thus __kernel_sigreturn does not have any padding in front of it. Closes: https://lore.kernel.org/lkml/f3412cc3e8f66d1853cc9d572c0f2fab076872b1.camel@xry111.site Fixes: 884961618ee5 ("x86/entry/vdso32: Remove open-coded DWARF in sigreturn.S") Reported-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124050 Link: https://patch.msgid.link/20260227010308.310342-1-hpa@zytor.com
2026-03-04x86/resctrl: Fix SNC detectionTony Luck
Now that the x86 topology code has a sensible nodes-per-package measure, one that does not depend on the online status of CPUs, use it to divinate the SNC mode. Note that when Cluster on Die (CoD) is configured on older systems this will also show multiple NUMA nodes per package. Intel Resource Director Technology is incompatible with CoD. Print a warning and do not apply the MSR_RMID_SNC_CONFIG fixup. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Link: https://patch.msgid.link/aaCxbbgjL6OZ6VMd@agluck-desk3 Link: https://patch.msgid.link/20260303110100.367976706@infradead.org
2026-03-04x86/topo: Fix SNC topology messPeter Zijlstra
Per 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode"), the original crazy SNC-3 SLIT table was:

node distances:
node   0   1   2   3   4   5
  0:  10  15  17  21  28  26
  1:  15  10  15  23  26  23
  2:  17  15  10  26  23  21
  3:  21  28  26  10  15  17
  4:  23  26  23  15  10  15
  5:  26  23  21  17  15  10

And per: https://lore.kernel.org/lkml/20250825075642.GQ3245006@noisy.programming.kicks-ass.net/ the suggestion was to average the off-trace clusters to restore sanity. However, 4d6dd05d07d0 implements this under various assumptions:

- anything GNR/CWF with numa_in_package;
- there will never be more than 2 packages;
- the off-trace cluster will have distance >20

And then HPE shows up with a machine that matches the Vendor-Family-Model checks but looks like this. Here's an 8 socket (2 chassis) HPE system with SNC enabled:

node   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  0:  10 12 16 16 16 16 18 18 40 40 40 40 40 40 40 40
  1:  12 10 16 16 16 16 18 18 40 40 40 40 40 40 40 40
  2:  16 16 10 12 18 18 16 16 40 40 40 40 40 40 40 40
  3:  16 16 12 10 18 18 16 16 40 40 40 40 40 40 40 40
  4:  16 16 18 18 10 12 16 16 40 40 40 40 40 40 40 40
  5:  16 16 18 18 12 10 16 16 40 40 40 40 40 40 40 40
  6:  18 18 16 16 16 16 10 12 40 40 40 40 40 40 40 40
  7:  18 18 16 16 16 16 12 10 40 40 40 40 40 40 40 40
  8:  40 40 40 40 40 40 40 40 10 12 16 16 16 16 18 18
  9:  40 40 40 40 40 40 40 40 12 10 16 16 16 16 18 18
 10:  40 40 40 40 40 40 40 40 16 16 10 12 18 18 16 16
 11:  40 40 40 40 40 40 40 40 16 16 12 10 18 18 16 16
 12:  40 40 40 40 40 40 40 40 16 16 18 18 10 12 16 16
 13:  40 40 40 40 40 40 40 40 16 16 18 18 12 10 16 16
 14:  40 40 40 40 40 40 40 40 18 18 16 16 16 16 10 12
 15:  40 40 40 40 40 40 40 40 18 18 16 16 16 16 12 10

10 = Same chassis and socket
12 = Same chassis and socket (SNC)
16 = Same chassis and adjacent socket
18 = Same chassis and non-adjacent socket
40 = Different chassis

Turns out, the 'max 2 packages' thing is only relevant to the SNC-3 parts; the smaller parts do 8 sockets (like usual). The above SLIT table is sane, but violates the previous assumptions and trips a WARN. Now that the topology code has a sensible measure of nodes-per-package, we can use that to divinate the SNC mode at hand, and only fix up SNC-3 topologies. There is a 'healthy' amount of paranoia code validating the assumptions on the SLIT table, a simple pr_err(FW_BUG) print on failure and a fallback to using the regular table. Let's see how long this lasts :-) Fixes: 4d6dd05d07d0 ("sched/topology: Fix sched domain build error for GNR, CWF in SNC-3 mode") Reported-by: Kyle Meyer <kyle.meyer@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Kyle Meyer <kyle.meyer@hpe.com> Link: https://patch.msgid.link/20260303110100.238361290@infradead.org
2026-03-04x86/topo: Replace x86_has_numa_in_packagePeter Zijlstra
.. with the brand spanking new topology_num_nodes_per_package(). Having the topology setup determine this value during MADT/SRAT parsing before SMP bringup avoids having to detect this situation when building the SMP topology masks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Kyle Meyer <kyle.meyer@hpe.com> Link: https://patch.msgid.link/20260303110100.123701837@infradead.org
2026-03-04x86/topo: Add topology_num_nodes_per_package()Peter Zijlstra
Use the MADT and SRAT table data to compute __num_nodes_per_package. Specifically, SRAT has already been parsed in x86_numa_init(), which is called before acpi_boot_init() which parses MADT. So both are available in topology_init_possible_cpus(). This number is useful to divinate the various Intel CoD/SNC and AMD NPS modes, since the platforms are failing to provide this otherwise. Doing it this way is independent of the number of online CPUs and other such shenanigans. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Kyle Meyer <kyle.meyer@hpe.com> Link: https://patch.msgid.link/20260303110100.004091624@infradead.org
2026-03-04x86/numa: Store extra copy of numa_nodes_parsedPeter Zijlstra
The topology setup code needs to know the total number of physical nodes enumerated in SRAT; however NUMA_EMU can cause the existing numa_nodes_parsed bitmap to be fictitious. Therefore, keep a copy of the bitmap specifically to retain the physical node count. Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Kyle Meyer <kyle.meyer@hpe.com> Link: https://patch.msgid.link/20260303110059.889884023@infradead.org
2026-03-04arm64: mm: Add PTE_DIRTY back to PAGE_KERNEL* to fix kexec/hibernationCatalin Marinas
Commit 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()") changed pte_mkwrite_novma() to only clear PTE_RDONLY when PTE_DIRTY is set. This was to allow writable-clean PTEs for swap pages that haven't actually been written. However, this broke kexec and hibernation for some platforms. Both go through trans_pgd_create_copy() -> _copy_pte(), which calls pte_mkwrite_novma() to make the temporary linear-map copy fully writable. With the updated pte_mkwrite_novma(), read-only kernel pages (without PTE_DIRTY) remain read-only in the temporary mapping. While such behaviour is fine for user pages where hardware DBM or trapping will make them writeable, subsequent in-kernel writes by the kexec relocation code will fault. Add PTE_DIRTY back to all _PAGE_KERNEL* protection definitions. This was the case prior to 5.4, commit aa57157be69f ("arm64: Ensure VM_WRITE|VM_SHARED ptes are clean by default"). With the kernel linear-map PTEs always having PTE_DIRTY set, pte_mkwrite_novma() correctly clears PTE_RDONLY. Fixes: 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: stable@vger.kernel.org Reported-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com> Link: https://lore.kernel.org/r/20251204062722.3367201-1-jianpeng.chang.cn@windriver.com Cc: Will Deacon <will@kernel.org> Cc: Huang, Ying <ying.huang@linux.alibaba.com> Cc: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-04arm64: Silence sparse warnings caused by the type casting in (cmp)xchgCatalin Marinas
The arm64 xchg/cmpxchg() wrappers cast the arguments to (unsigned long) prior to invoking the static inline functions implementing the operation. Some restrictive type annotations (e.g. __bitwise) lead to sparse warnings like below: sparse warnings: (new ones prefixed by >>) fs/crypto/bio.c:67:17: sparse: sparse: cast from restricted blk_status_t >> fs/crypto/bio.c:67:17: sparse: sparse: cast to restricted blk_status_t Force the casting in the arm64 xchg/cmpxchg() wrappers to silence sparse. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202602230947.uNRsPyBn-lkp@intel.com/ Link: https://lore.kernel.org/r/202602230947.uNRsPyBn-lkp@intel.com/ Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Will Deacon <will@kernel.org>
2026-03-04x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file pathsJan Stancek
CONFIG_EFI_SBAT_FILE can be a relative path. When compiling using a different output directory (O=) the build currently fails because it can't find the filename set in CONFIG_EFI_SBAT_FILE: arch/x86/boot/compressed/sbat.S: Assembler messages: arch/x86/boot/compressed/sbat.S:6: Error: file not found: kernel.sbat Add $(srctree) as include dir for sbat.o. [ bp: Massage commit message. ] Fixes: 61b57d35396a ("x86/efi: Implement support for embedding SBAT data for x86") Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <stable@kernel.org> Link: https://patch.msgid.link/f4eda155b0cef91d4d316b4e92f5771cb0aa7187.1772047658.git.jstancek@redhat.com
2026-03-04powerpc/crash: adjust the elfcorehdr sizeSourabh Jain
With crash hotplug support enabled, additional memory is allocated to the elfcorehdr kexec segment to accommodate resources added during memory hotplug events. However, the kdump FDT is not updated with the same size, which can result in elfcorehdr corruption in the kdump kernel. Update elf_headers_sz (the kimage member representing the size of the elfcorehdr kexec segment) to reflect the total memory allocated for the elfcorehdr segment instead of the elfcorehdr buffer size at the time of kdump load. This allows of_kexec_alloc_and_setup_fdt() to reserve the full elfcorehdr memory in the kdump FDT and prevents elfcorehdr corruption. Fixes: 849599b702ef8 ("powerpc/crash: add crash memory hotplug support") Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260227171801.2238847-1-sourabhjain@linux.ibm.com
2026-03-04powerpc/kexec/core: use big-endian types for crash variablesSourabh Jain
Use explicit word-sized big-endian types for kexec and crash related variables. This makes the endianness unambiguous and avoids type mismatches that trigger sparse warnings. The change addresses sparse warnings like below (seen on both 32-bit and 64-bit builds): CHECK ../arch/powerpc/kexec/core.c sparse: expected unsigned int static [addressable] [toplevel] [usertype] crashk_base sparse: got restricted __be32 [usertype] sparse: warning: incorrect type in assignment (different base types) sparse: expected unsigned int static [addressable] [toplevel] [usertype] crashk_size sparse: got restricted __be32 [usertype] sparse: warning: incorrect type in assignment (different base types) sparse: expected unsigned long long static [addressable] [toplevel] mem_limit sparse: got restricted __be32 [usertype] sparse: warning: incorrect type in assignment (different base types) sparse: expected unsigned int static [addressable] [toplevel] [usertype] kernel_end sparse: got restricted __be32 [usertype] No functional change intended. Fixes: ea961a828fe7 ("powerpc: Fix endian issues in kexec and crash dump code") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512221405.VHPKPjnp-lkp@intel.com/ Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251224151257.28672-1-sourabhjain@linux.ibm.com
2026-03-04powerpc/prom_init: Fixup missing #size-cells on PowerMac media-bay nodesRob Herring (Arm)
Similar to other PowerMac mac-io devices, the media-bay node is missing the "#size-cells" property. Depends-on: commit 045b14ca5c36 ("of: WARN on deprecated #address-cells/#size-cells handling") Reported-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251029174047.1620073-1-robh@kernel.org
2026-03-04powerpc: dts: fsl: Drop unused .dtsi filesRob Herring (Arm)
These files are not included by anything and therefore don't get built or tested. There's also no upstream driver for the interlaken-lac stuff. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260128140222.1627203-1-robh@kernel.org
2026-03-04powerpc/uaccess: Fix inline assembly for clang build on PPC32Christophe Leroy (CS GROUP)
Test robot reports the following error with clang-16.0.6: In file included from kernel/rseq.c:75: include/linux/rseq_entry.h:141:3: error: invalid operand for instruction unsafe_get_user(offset, &ucs->post_commit_offset, efault); ^ include/linux/uaccess.h:608:2: note: expanded from macro 'unsafe_get_user' arch_unsafe_get_user(x, ptr, local_label); \ ^ arch/powerpc/include/asm/uaccess.h:518:2: note: expanded from macro 'arch_unsafe_get_user' __get_user_size_goto(__gu_val, __gu_addr, sizeof(*(p)), e); \ ^ arch/powerpc/include/asm/uaccess.h:284:2: note: expanded from macro '__get_user_size_goto' __get_user_size_allowed(x, ptr, size, __gus_retval); \ ^ arch/powerpc/include/asm/uaccess.h:275:10: note: expanded from macro '__get_user_size_allowed' case 8: __get_user_asm2(x, (u64 __user *)ptr, retval); break; \ ^ arch/powerpc/include/asm/uaccess.h:258:4: note: expanded from macro '__get_user_asm2' " li %1+1,0\n" \ ^ <inline asm>:7:5: note: instantiated into assembly here li 31+1,0 ^ 1 error generated. On PPC32, a pair of registers is used for 64-bit variables. Usually the lower register in the pair holds the high part and the higher register holds the low part. GCC uses r3/r4 ... r11/r12 ... r14/r15 ... r30/r31. In older kernel code, inline assembly used %1 and %1+1 to represent 64-bit values. However, here it looks like clang uses r31 as the high part, although r32 doesn't exist, hence the error. Although %1+1 should work, most places now use %L1 instead of %1+1, so let's do the same here.
With that change, the build doesn't fail anymore and a disassembly shows clang uses r17/r18 and r31/r14 pair when GCC would have used r16/r17 and r30/r31: Disassembly of section .fixup: 00000000 <.fixup>: 0: 38 a0 ff f2 li r5,-14 4: 3a 20 00 00 li r17,0 8: 3a 40 00 00 li r18,0 c: 48 00 00 00 b c <.fixup+0xc> c: R_PPC_REL24 .text+0xbc 10: 38 a0 ff f2 li r5,-14 14: 3b e0 00 00 li r31,0 18: 39 c0 00 00 li r14,0 1c: 48 00 00 00 b 1c <.fixup+0x1c> 1c: R_PPC_REL24 .text+0x144 Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202602021825.otcItxGi-lkp@intel.com/ Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()") Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Acked-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/8ca3a657a650e497a96bfe7acde2f637dadab344.1770103646.git.chleroy@kernel.org
2026-03-04powerpc/e500: Always use 64 bits PTEChristophe Leroy
Today there are two PTE formats for e500: the 64-bit format, used on 64-bit kernels, on 32-bit kernels with 64-bit physical addresses, and on 32-bit kernels with huge page support; and the 32-bit format, used in all other cases. Maintaining two PTE formats means unnecessary maintenance burden because every change needs to be implemented and tested for both formats. Remove the 32-bit PTE format. The memory usage increase due to larger PTEs is minimal (approx. 0.1% of memory). This also means that huge pages are now supported with 32-bit physical addresses as well. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/04a658209ea78dcc0f3dbde6b2c29cf1939adfe9.1767721208.git.chleroy@kernel.org
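The quoted ~0.1% figure can be sanity-checked with my own back-of-the-envelope arithmetic (assuming 4 KiB pages and the PTE growing from 4 to 8 bytes; one PTE maps one page):

```c
#include <assert.h>

/* Fraction of mapped memory consumed by the PTE size increase:
 * (new_pte - old_pte) extra bytes per page_size bytes mapped. */
static double pte_growth_overhead(double old_pte, double new_pte,
                                  double page_size)
{
    return (new_pte - old_pte) / page_size;
}
```

4 extra bytes per 4096-byte page works out to roughly 0.098%, matching the "approx. 0.1%" claim.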
2026-03-03KVM/TDX: Rename KVM_SUPPORTED_TD_ATTRS to KVM_SUPPORTED_TDX_TD_ATTRSXiaoyao Li
Rename KVM_SUPPORTED_TD_ATTRS to KVM_SUPPORTED_TDX_TD_ATTRS to include "TDX" in the name, making it clear that it pertains to TDX. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Kiryl Shutsemau <kas@kernel.org> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260303030335.766779-5-xiaoyao.li@intel.com
2026-03-03x86/tdx: Rename TDX_ATTR_* to TDX_TD_ATTR_*Xiaoyao Li
The macros TDX_ATTR_* and DEF_TDX_ATTR_* are related to TD attributes, which are TD-scope attributes. Naming them as TDX_ATTR_* can be somewhat confusing and might mislead people into thinking they are TDX global things. Rename TDX_ATTR_* to TDX_TD_ATTR_* to explicitly clarify they are TD-scope things. Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Kiryl Shutsemau <kas@kernel.org> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260303030335.766779-4-xiaoyao.li@intel.com
2026-03-03KVM/TDX: Remove redundant definitions of TDX_TD_ATTR_*Xiaoyao Li
There are definitions of TD attributes bits inside asm/shared/tdx.h as TDX_ATTR_*. Remove KVM's definitions and use the ones in asm/shared/tdx.h Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260303030335.766779-3-xiaoyao.li@intel.com
2026-03-03x86/tdx: Fix the typo in TDX_ATTR_MIGRTABLEXiaoyao Li
The TD scoped TDCS attributes are defined by bit positions. In the guest side of the TDX code, the 'tdx_attributes' string array holds pretty print names for these attributes, which are generated via macros and defines. Today these pretty print names are only used to print the attribute names to dmesg. Unfortunately there is a typo in the define for the migratable bit. Change the defines TDX_ATTR_MIGRTABLE* to TDX_ATTR_MIGRATABLE*. Update the sole user, the tdx_attributes array, to use the fixed name. Since these defines control the string printed to dmesg, the change is user visible. But the risk of breakage is almost zero since it is not exposed in any interface expected to be consumed programmatically. Fixes: 564ea84c8c14 ("x86/tdx: Dump attributes and TD_CTLS on boot") Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260303030335.766779-2-xiaoyao.li@intel.com
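A common pattern for keeping flag defines and their pretty-print names in sync -- so a spelling fix like MIGRTABLE → MIGRATABLE touches exactly one place -- is an X-macro table. This is a generic sketch with illustrative bit positions, not the kernel's actual DEF_TDX_ATTR_* macros:

```c
#include <string.h>
#include <assert.h>

/* Hypothetical X-macro table: each attribute is listed once; both the
 * bit masks and the printable names are generated from the same list. */
#define TD_ATTR_LIST(X)      \
    X(DEBUG, 0)              \
    X(SEPT_VE_DISABLE, 28)   \
    X(MIGRATABLE, 29)

enum {
#define DEFINE_BIT(name, bit) TD_ATTR_##name = 1UL << (bit),
    TD_ATTR_LIST(DEFINE_BIT)
#undef DEFINE_BIT
};

/* Names indexed by bit position, as a dmesg pretty-printer would use. */
static const char *const td_attr_names[] = {
#define DEFINE_NAME(name, bit) [bit] = #name,
    TD_ATTR_LIST(DEFINE_NAME)
#undef DEFINE_NAME
};
```

With this shape, the define and the string printed to dmesg cannot drift apart, since both come from the one `TD_ATTR_LIST` entry.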
2026-03-03x86/cpu: Remove LASS restriction on EFISohil Mehta
The initial LASS enabling has been deferred to much later during boot, and EFI runtime services now run with LASS temporarily disabled. This removes LASS from the path of all EFI services. Remove the LASS restriction on EFI config, as the two can now coexist. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Link: https://patch.msgid.link/20260120234730.2215498-4-sohil.mehta@intel.com
2026-03-03x86/efi: Disable LASS while executing runtime servicesSohil Mehta
Ideally, EFI runtime services should switch to kernel virtual addresses after SetVirtualAddressMap(). However, firmware implementations are known to be buggy in this regard and continue to access physical addresses. The kernel maintains a 1:1 mapping of all runtime services code and data regions to avoid breaking such firmware. LASS enforcement relies on bit 63 of the virtual address, which would block such accesses to the lower half. Unfortunately, not doing anything could lead to #GP faults when users update to a kernel with LASS enabled. One option is to use a STAC/CLAC pair to temporarily disable LASS data enforcement. However, there is no guarantee that the stray accesses would only touch data and not perform instruction fetches. Also, relying on the AC bit would depend on the runtime calls preserving RFLAGS, which is highly unlikely in practice. Instead, use the big hammer and switch off the entire LASS mechanism temporarily by clearing CR4.LASS. Runtime services are called in the context of efi_mm, which has explicitly unmapped any memory EFI isn't allowed to touch (including userspace). So, do this right after switching to efi_mm to avoid any security impact. Some runtime services can be invoked during boot when LASS isn't active. Use a global variable (similar to efi_mm) to save and restore the correct CR4.LASS state. The runtime calls are serialized with the efi_runtime_lock, so no concurrency issues are expected. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Link: https://patch.msgid.link/20260120234730.2215498-3-sohil.mehta@intel.com
2026-03-03x86/cpu: Defer LASS enabling until userspace comes upSohil Mehta
LASS blocks any kernel access to the lower half of the virtual address space. Unfortunately, some EFI accesses happen during boot with bit 63 cleared, which causes a #GP fault when LASS is enabled. Notably, the SetVirtualAddressMap() call can only happen in EFI physical mode. Also, EFI_BOOT_SERVICES_CODE/_DATA could be accessed even after ExitBootServices(). The boot services memory is truly freed during efi_free_boot_services() after SVAM has completed. To prevent EFI from tripping LASS, at a minimum, LASS enabling must be deferred until EFI has completely finished entering virtual mode (including freeing boot services memory). Moving setup_lass() to arch_cpu_finalize_init() would do the trick, but that would make the implementation very fragile. Something else might come in the future that would need the LASS enabling to be moved again. In general, security features such as LASS provide limited value before userspace is up. They aren't necessary during early boot while only trusted ring0 code is executing. Introduce a generic late initcall to defer activating some CPU features until userspace is enabled. For now, only move the LASS CR4 programming to this initcall. As APs are already up by the time late initcalls run, some extra steps are needed to enable LASS on all CPUs. Use a CPU hotplug callback instead of on_each_cpu() or smp_call_function(). This ensures that LASS is enabled on every CPU that is currently online as well as any future CPUs that come online later. Note, even though hotplug callbacks run with preemption enabled, cr4_set_bits() would disable interrupts while updating CR4. Keep the existing logic in place to clear the LASS feature bits early. setup_clear_cpu_cap() must be called before boot_cpu_data is finalized and alternatives are patched. Eventually, the entire setup_lass() logic can go away once the restrictions based on vsyscall emulation and EFI are removed. 
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Link: https://patch.msgid.link/20260120234730.2215498-2-sohil.mehta@intel.com
2026-03-03bpf, arm64: Use ORR-based MOV for general-purpose registersPuranjay Mohan
The A64_MOV macro unconditionally uses ADD Rd, Rn, #0 to implement register moves. While functionally correct, this is not the canonical encoding when both operands are general-purpose registers. On AArch64, MOV has two aliases depending on the operand registers: - MOV <Xd|SP>, <Xn|SP> → ADD <Xd|SP>, <Xn|SP>, #0 - MOV <Xd>, <Xn> → ORR <Xd>, XZR, <Xn> The ADD form is required when the stack pointer is involved (as ORR does not accept SP), while the ORR form is the preferred encoding for general-purpose registers. The ORR encoding is also measurably faster on modern microarchitectures. A microbenchmark [1] comparing dependent chains of MOV (ORR) vs ADD #0 on an ARM Neoverse-V2 (72-core, 3.4 GHz) shows: === mov (ORR Xd, XZR, Xn) === run1 cycles/op=0.749859456 run2 cycles/op=0.749991250 run3 cycles/op=0.749601847 avg cycles/op=0.749817518 === add0 (ADD Xd, Xn, #0) === run1 cycles/op=1.004777689 run2 cycles/op=1.004558266 run3 cycles/op=1.004806559 avg cycles/op=1.004714171 The ORR form completes in ~0.75 cycles/op vs ~1.00 cycles/op for ADD #0, a ~25% improvement. This is likely because the CPU's register renaming hardware can eliminate ORR-based moves, while ADD #0 must go through the ALU pipeline. Update A64_MOV to select the appropriate encoding at JIT time: use ADD when either register is A64_SP, and ORR (via aarch64_insn_gen_move_reg()) otherwise. Update verifier_private_stack selftests to expect "mov x7, x0" instead of "add x7, x0, #0x0" in the JITed instruction checks, matching the new ORR-based encoding. [1] https://github.com/puranjaymohan/scripts/blob/main/arm64/bench/run_mov_vs_add0.sh Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20260225134339.2723288-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
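The two encodings can be checked by hand. Below is a minimal encoder sketch (helper names are my own; the bit layouts follow the A64 "ORR shifted register" and "ADD immediate" formats, with XZR = register 31):

```c
#include <stdint.h>
#include <assert.h>

/* MOV Xd, Xn as ORR Xd, XZR, Xn -- the preferred alias for GPRs.
 * Layout: sf=1 | opc=01 | 01010 | shift=00 | N=0 | Rm | imm6=0 | Rn | Rd */
static uint32_t a64_mov_orr(unsigned rd, unsigned rn)
{
    return 0xAA000000u | (rn << 16) | (31u << 5) | rd;
}

/* MOV Xd, Xn as ADD Xd, Xn, #0 -- required when SP is involved.
 * Layout: sf=1 | op=0 | S=0 | 100010 | sh=0 | imm12=0 | Rn | Rd */
static uint32_t a64_mov_add(unsigned rd, unsigned rn)
{
    return 0x91000000u | (rn << 5) | rd;
}
```

For example, "mov x7, x0" assembles to 0xAA0003E7 in the ORR form but 0x91000007 in the ADD form; only the former is eligible for move elimination in the rename stage.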
2026-03-03bpf,s390: add fsession support for trampolinesMenglong Dong
Implement BPF_TRACE_FSESSION support for s390. The logic here is similar to what we did in x86_64. In order to simplify the logic, we factor out the function invoke_bpf() for fentry and fexit. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260224092208.1395085-3-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03bpf,s390: introduce emit_store_stack_imm64() for trampolineMenglong Dong
Introduce a helper to store a 64-bit immediate on the trampoline stack with the help of a register. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20260224092208.1395085-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03s390: Introduce bpf_get_lowcore() kfuncIlya Leoshkevich
Implementing BPF version of preempt_count() requires accessing lowcore from BPF. Since lowcore can be relocated, open-coding (struct lowcore *)0 does not work, so add a kfunc. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20260217160813.100855-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-03sparc/PCI: Initialize msi_addr_mask for OF-created PCI devicesNilay Shroff
Recent changes replaced the use of no_64bit_msi with msi_addr_mask, which is now expected to be initialized to DMA_BIT_MASK(64) during PCI device setup. On SPARC systems, this initialization was inadvertently missed for devices instantiated from device tree nodes, leaving msi_addr_mask unset for OF-created pci_dev instances. As a result, MSI address validation fails during probe, causing affected devices to fail initialization. Initialize pdev->msi_addr_mask to DMA_BIT_MASK(64) in of_create_pci_dev() so that MSI address validation succeeds and PCI device probing works as expected. Fixes: 386ced19e9a3 ("PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address mask") Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Han Gao <gaohan@iscas.ac.cn> # SPARC Enterprise T5220 Tested-by: Nathaniel Roach <nroach44@nroach44.id.au> # SPARC T5-2 Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn> Link: https://patch.msgid.link/20260220070239.1693303-3-nilay@linux.ibm.com
2026-03-03powerpc/pci: Initialize msi_addr_mask for OF-created PCI devicesNilay Shroff
Recent changes replaced the use of no_64bit_msi with msi_addr_mask. As a result, msi_addr_mask is now expected to be initialized to DMA_BIT_MASK(64) when a pci_dev is set up. However, this initialization was missed on powerpc due to differences in the device initialization path compared to other architectures such as x86. Due to this, the PCI device probe method now fails on powerpc systems. On powerpc systems, struct pci_dev instances are created from device tree nodes via of_create_pci_dev(). Because msi_addr_mask was not initialized there, it remained zero. Later, during MSI setup, msi_verify_entries() validates the programmed MSI address against pdev->msi_addr_mask. Since the mask was not set correctly, the validation fails, causing PCI driver probe failures for devices on powerpc systems. Initialize pdev->msi_addr_mask to DMA_BIT_MASK(64) in of_create_pci_dev() so that MSI address validation succeeds and device probe works as expected. Fixes: 386ced19e9a3 ("PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address mask") Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Tested-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Vivian Wang <wangruikang@iscas.ac.cn> Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260220070239.1693303-2-nilay@linux.ibm.com
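The mask logic the fix restores can be sketched in plain C. DMA_BIT_MASK follows its usual kernel definition; msi_addr_ok() is a hypothetical stand-in for the check msi_verify_entries() performs:

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Usual DMA_BIT_MASK shape: the n==64 case is special-cased to avoid
 * the undefined behaviour of shifting a 64-bit value by 64. */
#define DMA_BIT_MASK(n) ((n) == 64 ? ~0ULL : ((1ULL << (n)) - 1))

/* An MSI address is acceptable only if it fits within the device's
 * address mask; an uninitialized (zero) mask rejects every address. */
static bool msi_addr_ok(uint64_t msi_addr, uint64_t msi_addr_mask)
{
    return (msi_addr & ~msi_addr_mask) == 0;
}
```

With msi_addr_mask left at zero, every programmed MSI address fails this check, which is why OF-created devices failed to probe until the mask was initialized to DMA_BIT_MASK(64).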
2026-03-03s390/stackleak: Fix __stackleak_poison() inline assembly constraintHeiko Carstens
The __stackleak_poison() inline assembly comes with a "count" operand where the "d" constraint is used. "count" is used with the exrl instruction and "d" means that the compiler may allocate any register from 0 to 15. If the compiler would allocate register 0 then the exrl instruction would not OR the value of "count" into the executed instruction - resulting in a stackframe which is only partially poisoned. Use the correct "a" constraint, which excludes register 0 from register allocation. Fixes: 2a405f6bb3a5 ("s390/stackleak: provide fast __stackleak_poison() implementation") Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-4-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2026-03-03s390/xor: Improve inline assembly constraintsHeiko Carstens
The inline assembly constraint for the "bytes" operand is "d" for all xor() inline assemblies. "d" means that any register from 0 to 15 can be used. If the compiler would use register 0 then the exrl instruction would not OR the value of "bytes" into the executed instruction - resulting in an incorrect result. However, all the xor() inline assemblies make hard-coded use of register 0, and it is correctly listed in the clobber list, so that this cannot happen. Given that this is quite subtle, use the better "a" constraint, which excludes register 0 from register allocation in any case. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-3-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
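Why register 0 is special here can be shown with a simplified userspace model of EXECUTE's length patching (a conceptual sketch of the relevant behaviour, not the full z/Architecture semantics):

```c
#include <stdint.h>
#include <assert.h>

/* Simplified model of s390 EXECUTE (exrl): the low 8 bits of register
 * R1 are OR'ed into the length byte of the target instruction --
 * unless the R1 field is 0, in which case no modification happens.
 * So if the compiler picked %r0 for the length operand, the executed
 * XC would keep a length field of 0. */
static uint8_t exrl_effective_len_byte(unsigned r1_field,
                                       uint64_t gpr_value,
                                       uint8_t insn_len_byte)
{
    if (r1_field == 0)
        return insn_len_byte;               /* register 0: no OR performed */
    return insn_len_byte | (uint8_t)gpr_value; /* low 8 bits OR'ed in */
}
```

The "a" constraint sidesteps this by never letting the compiler allocate register 0 for the operand.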
2026-03-03s390/xor: Fix xor_xc_2() inline assembly constraintsHeiko Carstens
The inline assembly constraints for xor_xc_2() are incorrect. "bytes", "p1", and "p2" are input operands, while all three of them are modified within the inline assembly. Given that the function consists only of this inline assembly it seems unlikely that this may cause any problems, however fix this in any case. Fixes: 2cfc5f9ce7f5 ("s390/xor: optimized xor routing using the XC instruction") Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-2-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>