summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu
AgeCommit message (Collapse)Author
13 daysx86/sgx: Remove unmatched quote in __sgx_encl_extend function commentThorsten Blum
There is no opening quote. Remove the unmatched closing quote. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251210125628.544916-1-thorsten.blum@linux.dev
2025-12-09Merge tag 'hyperv-next-signed-20251207' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Enhancements to Linux as the root partition for Microsoft Hypervisor: - Support a new mode called L1VH, which allows Linux to drive the hypervisor running the Azure Host directly - Support for MSHV crash dump collection - Allow Linux's memory management subsystem to better manage guest memory regions - Fix issues that prevented a clean shutdown of the whole system on bare metal and nested configurations - ARM64 support for the MSHV driver - Various other bug fixes and cleanups - Add support for Confidential VMBus for Linux guest on Hyper-V - Secure AVIC support for Linux guests on Hyper-V - Add the mshv_vtl driver to allow Linux to run as the secure kernel in a higher virtual trust level for Hyper-V * tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (58 commits) mshv: Cleanly shutdown root partition with MSHV mshv: Use reboot notifier to configure sleep state mshv: Add definitions for MSHV sleep state configuration mshv: Add support for movable memory regions mshv: Add refcount and locking to mem regions mshv: Fix huge page handling in memory region traversal mshv: Move region management to mshv_regions.c mshv: Centralize guest memory region destruction mshv: Refactor and rename memory region handling functions mshv: adjust interrupt control structure for ARM64 Drivers: hv: use kmalloc_array() instead of kmalloc() mshv: Add ioctl for self targeted passthrough hvcalls Drivers: hv: Introduce mshv_vtl driver Drivers: hv: Export some symbols for mshv_vtl static_call: allow using STATIC_CALL_TRAMP_STR() from assembly mshv: Extend create partition ioctl to support cpu features mshv: Allow mappings that overlap in uaddr mshv: Fix create memory region overlap check mshv: add WQ_PERCPU to alloc_workqueue users Drivers: hv: Use kmalloc_array() instead of kmalloc() ...
2025-12-06Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko) fixes a build issue and does some cleanup in ib/sys_info.c - "Implement mul_u64_u64_div_u64_roundup()" (David Laight) enhances the 64-bit math code on behalf of a PWM driver and beefs up the test module for these library functions - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich) makes BPF symbol names, sizes, and line numbers available to the GDB debugger - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang) adds a sysctl which can be used to cause additional info dumping when the hung-task and lockup detectors fire - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu) adds a general base64 encoder/decoder to lib/ and migrates several users away from their private implementations - "rbree: inline rb_first() and rb_last()" (Eric Dumazet) makes TCP a little faster - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin) reworks the KEXEC Handover interfaces in preparation for Live Update Orchestrator (LUO), and possibly for other future clients - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin) increases the flexibility of KEXEC Handover. Also preparation for LUO - "Live Update Orchestrator" (Pasha Tatashin) is a major new feature targeted at cloud environments. Quoting the cover letter: This series introduces the Live Update Orchestrator, a kernel subsystem designed to facilitate live kernel updates using a kexec-based reboot. This capability is critical for cloud environments, allowing hypervisors to be updated with minimal downtime for running virtual machines. LUO achieves this by preserving the state of selected resources, such as memory, devices and their dependencies, across the kernel transition. As a key feature, this series includes support for preserving memfd file descriptors, which allows critical in-memory data, such as guest RAM or any other large memory region, to be maintained in RAM across the kexec reboot. Mike Rappaport merits a mention here, for his extensive review and testing work. - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain) moves the kexec and kdump sysfs entries from /sys/kernel/ to /sys/kernel/kexec/ and adds back-compatibility symlinks which can hopefully be removed one day - "kho: fixes for vmalloc restoration" (Mike Rapoport) fixes a BUG which was being hit during KHO restoration of vmalloc() regions * tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits) calibrate: update header inclusion Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()" vmcoreinfo: track and log recoverable hardware errors kho: fix restoring of contiguous ranges of order-0 pages kho: kho_restore_vmalloc: fix initialization of pages array MAINTAINERS: TPM DEVICE DRIVER: update the W-tag init: replace simple_strtoul with kstrtoul to improve lpj_setup KHO: fix boot failure due to kmemleak access to non-PRESENT pages Documentation/ABI: new kexec and kdump sysfs interface Documentation/ABI: mark old kexec sysfs deprecated kexec: move sysfs entries to /sys/kernel/kexec test_kho: always print restore status kho: free chunks using free_page() instead of kfree() selftests/liveupdate: add kexec test for multiple and empty sessions selftests/liveupdate: add simple kexec-based selftest for LUO selftests/liveupdate: add userspace API selftests docs: add documentation for memfd preservation via LUO mm: memfd_luo: allow preserving memfd liveupdate: luo_file: add private argument to store runtime state mm: shmem: export some functions to internal.h ...
2025-12-05Merge tag 'soc-drivers-6.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "This is the first half of the driver changes: - A treewide interface change to the "syscore" operations for power management, as a preparation for future Tegra specific changes - Reset controller updates with added drivers for LAN969x, eic770 and RZ/G3S SoCs - Protection of system controller registers on Renesas and Google SoCs, to prevent trivially triggering a system crash from e.g. debugfs access - soc_device identification updates on Nvidia, Exynos and Mediatek - debugfs support in the ST STM32 firewall driver - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI - Cleanups for memory controller support on Nvidia and Renesas" * tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (114 commits) memory: tegra186-emc: Fix missing put_bpmp Documentation: reset: Remove reset_controller_add_lookup() reset: fix BIT macro reference reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe reset: th1520: Support reset controllers in more subsystems reset: th1520: Prepare for supporting multiple controllers dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets reset: remove legacy reset lookup code clk: davinci: psc: drop unused reset lookup reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support reset: eswin: Add eic7700 reset driver dt-bindings: reset: eswin: Documentation for eic7700 SoC reset: sparx5: add LAN969x support dt-bindings: reset: microchip: Add LAN969x support soc: rockchip: grf: Add select correct PWM implementation on RK3368 soc/tegra: pmc: Add USB wake events for Tegra234 amba: tegra-ahb: Fix device leak on SMMU enable ...
2025-12-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM - Minor fixes to KVM and selftests Loongarch: - Get VM PMU capability from HW GCFG register - Add AVEC basic support - Use 64-bit register definition for EIOINTC - Add KVM timer test cases for tools/selftests RISC/V: - SBI message passing (MPXY) support for KVM guest - Give a new, more specific error subcode for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores s390: - Always allocate ESCA (Extended System Control Area), instead of starting with the basic SCA and converting to ESCA with the addition of the 65th vCPU. The price is increased number of exits (and worse performance) on z10 and earlier processor; ESCA was introduced by z114/z196 in 2010 - VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups x86: - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE caching is disabled, as there can't be any relevant SPTEs to zap - Relocate a misplaced export - Fix an async #PF bug where KVM would clear the completion queue when the guest transitioned in and out of paging mode, e.g. when handling an SMI and then returning to paged mode via RSM - Leave KVM's user-return notifier registered even when disabling virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping the notifier registered is ok; the kernel does not use the MSRs and the callback will run cleanly and restore host MSRs if the CPU manages to return to userspace before the system goes down - Use the checked version of {get,put}_user() - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC timers can result in a hard lockup in the host - Revert the periodic kvmclock sync logic now that KVM doesn't use a clocksource that's subject to NTP corrections - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter behind CONFIG_CPU_MITIGATIONS - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path; the only reason they were handled in the fast path was to paper of a bug in the core #MC code, and that has long since been fixed - Add emulator support for AVX MOV instructions, to play nice with emulated devices whose guest drivers like to access PCI BARs with large multi-byte instructions x86 (AMD): - Fix a few missing "VMCB dirty" bugs - Fix the worst of KVM's lack of EFER.LMSLE emulation - Add AVIC support for addressing 4k vCPUs in x2AVIC mode - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on VMRUN and #VMEXIT - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM x86 (Intel): - Use the root role from kvm_mmu_page to construct EPTPs instead of the current vCPU state, partly as worthwhile cleanup, but mostly to pave the way for tracking per-root TLB flushes, and elide EPT flushes on pCPU migration if the root is clean from a previous flush - Add a few missing nested consistency checks - Rip out support for doing "early" consistency checks via hardware as the functionality hasn't been used in years and is no longer useful in general; replace it with an off-by-default module param to WARN if hardware fails a check that KVM does not perform - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32] on VM-Enter - Misc cleanups - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertantly trigger lock contention in the TDX-Module; KVM was either working around these in weird, ugly ways, or was simply oblivious to them (though even Yan's devilish selftests could only break individual VMs, not the host kernel) - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU, if creating said vCPU failed partway through - Fix a few sparse warnings (bad annotation, 0 != NULL) - Use struct_size() to simplify copying TDX capabilities to userspace - Fix a bug where TDX would effectively corrupt user-return MSR values if the TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected Selftests: - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM - Forcefully override ARCH from x86_64 to x86 to play nice with specifying ARCH=x86_64 on the command line - Extend a bunch of nested VMX to validate nested SVM as well - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to verify KVM can save/restore nested VMX state when L1 is using 5-level paging, but L2 is not - Clean up the guest paging code in anticipation of sharing the core logic for nested EPT and nested NPT guest_memfd: - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references - Enhance KVM selftests to make it easer to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors - Misc cleanups Generic: - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for irqfd cleanup - Fix a goof in the dirty ring documentation - Fix choice of target for directed yield across different calls to kvm_vcpu_on_spin(); the function was always starting from the first vCPU instead of continuing the round-robin search" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits) KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions ...
2025-12-05mshv: Cleanly shutdown root partition with MSHVPraveen K Paladugu
Root partitions running on MSHV currently attempt ACPI power-off, which MSHV intercepts and triggers a Machine Check Exception (MCE), leading to a kernel panic. Root partitions panic with a trace similar to: [ 81.306348] reboot: Power down [ 81.314709] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b2000000c0060001 [ 81.314711] mce: [Hardware Error]: TSC 3b8cb60a66 PPIN 11d98332458e4ea9 [ 81.314713] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1759339405 SOCKET 0 APIC 0 microcode ffffffff [ 81.314715] mce: [Hardware Error]: Run the above through 'mcelog --ascii' [ 81.314716] mce: [Hardware Error]: Machine check: Processor context corrupt [ 81.314717] Kernel panic - not syncing: Fatal machine check To avoid this, configure the sleep state in the hypervisor and invoke the HVCALL_ENTER_SLEEP_STATE hypercall as the final step in the shutdown sequence. This ensures a clean and safe shutdown of the root partition. Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com> Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com> Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-12-05Merge tag 'mm-stable-2025-12-03-21-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki) Rework the vmalloc() code to support non-blocking allocations (GFP_ATOIC, GFP_NOWAIT) "ksm: fix exec/fork inheritance" (xu xin) Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec "mm/zswap: misc cleanup of code and documentations" (SeongJae Park) Some light maintenance work on the zswap code "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira) Enhance the /sys/kernel/debug/page_owner debug feature by adding unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn) Minor alterations to the page allocator's per-cpu-pages feature "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra) Address a scalability issue in userfaultfd's UFFDIO_MOVE operation "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov) "drivers/base/node: fold node register and unregister functions" (Donet Tom) Clean up the NUMA node handling code a little "mm: some optimizations for prot numa" (Kefeng Wang) Cleanups and small optimizations to the NUMA allocation hinting code "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn) Address long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang) Remove some now-unnecessary work from page reclaim "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park) Enhance the DAMOS auto-tuning feature "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan) Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes) Enhance the new(ish) file_operations.mmap_prepare() method and port additional callsites from the old ->mmap() over to ->mmap_prepare() "Fix stale IOTLB entries for kernel address space" (Lu Baolu) Fix a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang) Clean up and optimize the folio splitting code "mm, swap: misc cleanup and bugfix" (Kairui Song) Some cleanups and a minor fix in the swap discard code "mm/damon: misc documentation fixups" (SeongJae Park) "mm/damon: support pin-point targets removal" (SeongJae Park) Permit userspace to remove a specific monitoring target in the middle of the current targets list "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo) A couple of cleanups related to mm header file inclusion "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He) improve the selection of swap devices for NUMA machines "mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista) Change the memory block labels from macros to enums so they will appear in kernel debug info "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes) Address an inefficiency when KSM unmerges an address range "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park) Fix leaks and unhandled malloc() failures in DAMON userspace unit tests "some cleanups for pageout()" (Baolin Wang) Clean up a couple of minor things in the page scanner's writeback-for-eviction code "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu) Move hugetlb's sysfs/sysctl handling code into a new file "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes) Make the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes) Reduce mmap lock contention for callers performing VMA guard region operations "vma_start_write_killable" (Matthew Wilcox) Start work on permitting applications to be killed when they are waiting on a read_lock on the VMA lock "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park) Add additional userspace testing of DAMON's "commit" feature "mm/damon: misc cleanups" (SeongJae Park) "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes) Address the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another "mm: support device-private THP" (Balbir Singh) Introduce support for Transparent Huge Page (THP) migration in zone device-private memory "Optimize folio split in memory failure" (Zi Yan) "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang) Some more cleanups in the folio splitting code "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes) Clean up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t "reparent the THP split queue" (Muchun Song) Reparent the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources "unify PMD scan results and remove redundant cleanup" (Wei Yang) A little cleanup in the hugepage collapse code "zram: introduce writeback bio batching" (Sergey Senozhatsky) Improve zram writeback efficiency by introducing batched bio writeback support "memcg: cleanup the memcg stats interfaces" (Shakeel Butt) Clean up our handling of the interrupt safety of some memcg stats "make vmalloc gfp flags usage more apparent" (Vishal Moola) Clean up vmalloc's handling of incoming GFP flags "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang) Teach soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension "mm: swap: small fixes and comment cleanups" (Youngjun Park) Fix a small bug and clean up some of the swap code "initial work on making VMA flags a bitmap" (Lorenzo Stoakes) Start work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park) Address a possible bug in the swap discard code and clean things up a little [ This merge also reverts commit ebb9aeb980e5 ("vfio/nvgrace-gpu: register device memory for poison handling") because it looks broken to me, I've asked for clarification - Linus ] * tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm: fix vma_start_write_killable() signal handling mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate mm/swapfile: fix list iteration when next node is removed during discard fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling mm/kfence: add reboot notifier to disable KFENCE on shutdown memcg: remove inc/dec_lruvec_kmem_state helpers selftests/mm/uffd: initialize char variable to Null mm: fix DEBUG_RODATA_TEST indentation in Kconfig mm: introduce VMA flags bitmap type tools/testing/vma: eliminate dependency on vma->__vm_flags mm: simplify and rename mm flags function for clarity mm: declare VMA flags by bit zram: fix a spelling mistake mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity mm/vmscan: skip increasing kswapd_failures when reclaim was boosted pagemap: update BUDDY flag documentation mm: swap: remove scan_swap_map_slots() references from comments mm: swap: change swap_alloc_slow() to void mm, swap: remove redundant comment for read_swap_cache_async mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational ...
2025-12-02Merge tag 'x86_cpu_for_6.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU feature updates from Dave Hansen: "The biggest thing of note here is Linear Address Space Separation (LASS). It represents the first time I can think of that the upper=>kernel/lower=>user address space convention is actually recognized by the hardware on x86. It ensures that userspace can not even get the hardware to _start_ page walks for the kernel address space. This, of course, is a really nice generic side channel defense. This is really only a down payment on LASS support. There are still some details to work out in its interaction with EFI calls and vsyscall emulation. For now, LASS is disabled if either of those features is compiled in (which is almost always the case). There's also one straggler commit in here which converts an under-utilized AMD CPU feature leaf into a generic Linux-defined leaf so more feature can be packed in there. Summary: - Enable Linear Address Space Separation (LASS) - Change X86_FEATURE leaf 17 from an AMD leaf to Linux-defined" * tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Enable LASS during CPU initialization selftests/x86: Update the negative vsyscall tests to expect a #GP x86/traps: Communicate a LASS violation in #GP message x86/kexec: Disable LASS during relocate kernel x86/alternatives: Disable LASS when patching kernel code x86/asm: Introduce inline memcpy and memset x86/cpu: Add an LASS dependency on SMAP x86/cpufeatures: Enumerate the LASS feature bits x86/cpufeatures: Make X86_FEATURE leaf 17 Linux-specific
2025-12-02Merge tag 'x86_misc_for_6.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Dave Hansen: "The most significant are some changes to ensure that symbols exported for KVM are used only by KVM modules themselves, along with some related cleanups. In true x86/misc fashion, the other patch is completely unrelated and just enhances an existing pr_warn() to make it clear to users how they have tainted their kernel when something is mucking with MSRs. Summary: - Make MSR-induced taint easier for users to track down - Restrict KVM-specific exports to KVM itself" * tag 'x86_misc_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possible x86/mm: Drop unnecessary export of "ptdump_walk_pgd_level_debugfs" x86/mtrr: Drop unnecessary export of "mtrr_state" x86/bugs: Drop unnecessary export of "x86_spec_ctrl_base" x86/msr: Add CPU_OUT_OF_SPEC taint name to "unrecognized" pr_warn(msg)
2025-12-02Merge tag 'x86_sgx_for_6.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SGX updates from Dave HansenL "The main content here is adding support for the new EUPDATESVN SGX ISA. Before this, folks who updated microcode had to reboot before enclaves could attest to the new microcode. The new functionality lets them do this without a reboot. The rest are some nice, but relatively mundane comment and kernel-doc fixups. Summary: - Allow security version (SVN) updates so enclaves can attest to new microcode - Fix kernel docs typos" * tag 'x86_sgx_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Fix a typo in the kernel-doc comment for enum sgx_attribute x86/sgx: Remove superfluous asterisk from copyright comment in asm/sgx.h x86/sgx: Document structs and enums with '@', not '%' x86/sgx: Add kernel-doc descriptions for params passed to vDSO user handler x86/sgx: Add a missing colon in kernel-doc markup for "struct sgx_enclave_run" x86/sgx: Enable automatic SVN updates for SGX enclaves x86/sgx: Implement ENCLS[EUPDATESVN] x86/sgx: Define error codes for use by ENCLS[EUPDATESVN] x86/cpufeatures: Add X86_FEATURE_SGX_EUPDATESVN feature flag x86/sgx: Introduce functions to count the sgx_(vepc_)open()
2025-12-02Merge tag 'x86_bugs_for_v6.19_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU mitigation updates from Borislav Petkov: - Convert the tsx= cmdline parsing to use early_param() - Cleanup forward declarations gunk in bugs.c * tag 'x86_bugs_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Get rid of the forward declarations x86/tsx: Get the tsx= command line parameter with early_param() x86/tsx: Make tsx_ctrl_state static
2025-12-02Merge tag 'x86_cleanups_for_v6.19_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - The mandatory pile of cleanups the cat drags in every merge window * tag 'x86_cleanups_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Clean up whitespace in a20.c x86/mm: Delete disabled debug code x86/{boot,mtrr}: Remove unused function declarations x86/percpu: Use BIT_WORD() and BIT_MASK() macros x86/cpufeatures: Correct LKGS feature flag description x86/idtentry: Add missing '*' to kernel-doc lines
2025-12-02Merge tag 'x86_cache_for_v6.19_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Add support for AMD's Smart Data Cache Injection feature which allows for direct insertion of data from I/O devices into the L3 cache, thus bypassing DRAM and saving its bandwidth; the resctrl side of the feature allows the size of the L3 used for data injection to be controlled - Add Intel Clearwater Forest to the list of CPUs which support Sub-NUMA clustering - Other fixes and cleanups * tag 'x86_cache_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/resctrl: Update bit_usage to reflect io_alloc fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks fs/resctrl: Modify struct rdt_parse_data to pass mode and CLOSID fs/resctrl: Introduce interface to display io_alloc CBMs fs/resctrl: Add user interface to enable/disable io_alloc feature fs/resctrl: Introduce interface to display "io_alloc" support x86,fs/resctrl: Implement "io_alloc" enable/disable handlers x86,fs/resctrl: Detect io_alloc feature x86/resctrl: Add SDCIAE feature in the command line options x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation Enforcement fs/resctrl: Consider sparse masks when initializing new group's allocation x86/resctrl: Support Sub-NUMA Cluster (SNC) mode on Clearwater Forest
2025-12-02Merge tag 'x86_microcode_for_v6.19_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading updates from Borislav Petkov: - Add microcode staging support on Intel: it moves the sole microcode blobs loading to a non-critical path so that microcode loading latencies are kept at minimum. The actual "directing" the hardware to load microcode is the only step which is done on the critical path. This scheme is also opportunistic as in: on a failure, the machinery falls back to normal loading - Add the capability to the AMD side of the loader to select one of two per-family/model/stepping patches: one is pre-Entrysign and the other is post-Entrysign; with the goal to take care of machines which haven't updated their BIOS yet - something they should absolutely do as this is the only proper Entrysign fix - Other small cleanups and fixlets * tag 'x86_microcode_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Mark early_parse_cmdline() as __init x86/microcode/AMD: Select which microcode patch to load x86/microcode/intel: Enable staging when available x86/microcode/intel: Support mailbox transfer x86/microcode/intel: Implement staging handler x86/microcode/intel: Define staging state struct x86/microcode/intel: Establish staging control logic x86/microcode: Introduce staging step to reduce late-loading time x86/cpu/topology: Make primary thread mask available with SMP=n
2025-12-02Merge tag 'ras_core_for_v6.19_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Borislav Petkov: - The second part of the AMD MCA interrupts rework after the last-minute show-stopper from the last merge window was sorted out. After this, the AMD MCA deferred errors, thresholding and corrected errors interrupt handlers use common MCA code and are tightly integrated into the core MCA code, thereby getting rid of considerable duplication. All culminating into allowing CMCI error thresholding storms to be detected at AMD too, using the common infrastructure - Add support for two new MCA bank bits on AMD Zen6 which denote whether the error address logged is a system physical address, which obviates the need for it to be translated before further error recovery can be done * tag 'ras_core_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Handle AMD threshold interrupt storms x86/mce: Do not clear bank's poll bit in mce_poll_banks on AMD SMCA systems x86/mce: Add support for physical address valid bit x86/mce: Save and use APEI corrected threshold limit x86/mce/amd: Define threshold restart function for banks x86/mce/amd: Remove redundant reset_block() x86/mce/amd: Support SMCA Corrected Error Interrupt x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems x86/mce: Unify AMD DFR handler with MCA Polling x86/mce: Unify AMD THR handler with MCA Polling
2025-11-27vmcoreinfo: track and log recoverable hardware errorsBreno Leitao
Introduce a generic infrastructure for tracking recoverable hardware errors (HW errors that are visible to the OS but does not cause a panic) and record them for vmcore consumption. This aids post-mortem crash analysis tools by preserving a count and timestamp for the last occurrence of such errors. On the other side, correctable errors, which the OS typically remains unaware of because the underlying hardware handles them transparently, are less relevant for crash dump and therefore are NOT tracked in this infrastructure. Add centralized logging for sources of recoverable hardware errors based on the subsystem it has been notified. hwerror_data is write-only at kernel runtime, and it is meant to be read from vmcore using tools like crash/drgn. For example, this is how it looks like when opening the crashdump from drgn. >>> prog['hwerror_data'] (struct hwerror_info[1]){ { .count = (int)844, .timestamp = (time64_t)1752852018, }, ... This helps fleet operators quickly triage whether a crash may be influenced by hardware recoverable errors (which executes a uncommon code path in the kernel), especially when recoverable errors occurred shortly before a panic, such as the bug fixed by commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool") This is not intended to replace full hardware diagnostics but provides a fast way to correlate hardware events with kernel panics quickly. Rare machine check exceptions—like those indicated by mce_flags.p5 or mce_flags.winchip—are not accounted for in this method, as they fall outside the intended usage scope for this feature's user base. [leitao@debian.org: add hw-recoverable-errors to toctree] Link: https://lkml.kernel.org/r/20251127-vmcoreinfo_fix-v1-1-26f5b1c43da9@debian.org Link: https://lkml.kernel.org/r/20251010-vmcore_hw_error-v5-1-636ede3efe44@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Tony Luck <tony.luck@intel.com> Suggested-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> [APEI] Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Bob Moore <robert.moore@intel.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com> Cc: Len Brown <lenb@kernel.org> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Omar Sandoval <osandov@osandov.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-26Merge tag 'kvm-x86-svm-6.19' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM SVM changes for 6.19: - Fix a few missing "VMCB dirty" bugs. - Fix the worst of KVM's lack of EFER.LMSLE emulation. - Add AVIC support for addressing 4k vCPUs in x2AVIC mode. - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions. - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on VMRUN and #VMEXIT. - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3. - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM.
2025-11-26Merge tag 'kvm-x86-misc-6.19' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 misc changes for 6.19: - Fix an async #PF bug where KVM would clear the completion queue when the guest transitioned in and out of paging mode, e.g. when handling an SMI and then returning to paged mode via RSM. - Fix a bug where TDX would effectively corrupt user-return MSR values if the TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected. - Leave the user-return notifier used to restore MSRs registered when disabling virtualization, and instead pin kvm.ko. Restoring host MSRs via IPI callback is either pointless (clean reboot) or dangerous (forced reboot) since KVM has no idea what code it's interrupting. - Use the checked version of {get,put}_user(), as Linus wants to kill them off, and they're measurably faster on modern CPUs due to the unchecked versions containing an LFENCE. - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC timers can result in a hard lockup in the host. - Revert the periodic kvmclock sync logic now that KVM doesn't use a clocksource that's subject to NPT corrections. - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter behind CONFIG_CPU_MITIGATIONS. - Context switch XCR0, XSS, and PKRU outside of the entry/exit fastpath as the only reason they were handled in the faspath was to paper of a bug in the core #MC code that has long since been fixed. - Add emulator support for AVX MOV instructions to play nice with emulated devices whose PCI BARs guest drivers like to access with large multi-byte instructions.
2025-11-22x86/{boot,mtrr}: Remove unused function declarationsYue Haibing
Commits 28be1b454c2b ("x86/boot: Remove unused copy_*_gs() functions") 34d2819f2078 ("x86, mtrr: Remove unused mtrr/state.c") removed the functions but left the prototypes. Remove them. [ bp: Merge into a single patch. ] Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251120121037.1479334-1-yuehaibing@huawei.com
2025-11-21x86,fs/resctrl: Implement "io_alloc" enable/disable handlersBabu Moger
"io_alloc" is the generic name of the new resctrl feature that enables system software to configure the portion of cache allocated for I/O traffic. On AMD systems, "io_alloc" resctrl feature is backed by AMD's L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE). Introduce the architecture-specific functions that resctrl fs should call to enable, disable, or check status of the "io_alloc" feature. Change SDCIAE state by setting (to enable) or clearing (to disable) bit 1 of MSR_IA32_L3_QOS_EXT_CFG on all logical processors within the cache domain. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/9e9070100c320eab5368e088a3642443dee95ed7.1762995456.git.babu.moger@amd.com
2025-11-21x86,fs/resctrl: Detect io_alloc featureBabu Moger
AMD's SDCIAE (SDCI Allocation Enforcement) PQE feature enables system software to control the portions of L3 cache used for direct insertion of data from I/O devices into the L3 cache. Introduce a generic resctrl cache resource property "io_alloc_capable" as the first part of the new "io_alloc" resctrl feature that will support AMD's SDCIAE. Any architecture can set a cache resource as "io_alloc_capable" if a portion of the cache can be allocated for I/O traffic. Set the "io_alloc_capable" property for the L3 cache resource on x86 (AMD) systems that support SDCIAE. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/df85a9a6081674fd3ef6b4170920485512ce2ded.1762995456.git.babu.moger@amd.com
2025-11-21x86/resctrl: Add SDCIAE feature in the command line optionsBabu Moger
Add a kernel command-line parameter to enable or disable the exposure of the L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE) hardware feature to resctrl. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/c623edf7cb369ba9da966de47d9f1b666778a40e.1762995456.git.babu.moger@amd.com
2025-11-21x86/cpufeatures: Add support for L3 Smart Data Cache Injection Allocation ↵Babu Moger
Enforcement Smart Data Cache Injection (SDCI) is a mechanism that enables direct insertion of data from I/O devices into the L3 cache. By directly caching data from I/O devices rather than first storing the I/O data in DRAM, SDCI reduces demands on DRAM bandwidth and reduces latency to the processor consuming the I/O data. The SDCIAE (SDCI Allocation Enforcement) PQE feature allows system software to control the portion of the L3 cache used for SDCI. When enabled, SDCIAE forces all SDCI lines to be placed into the L3 cache partitions identified by the highest-supported L3_MASK_n register, where n is the maximum supported CLOSID. Add CPUID feature bit that can be used to configure SDCIAE. The SDCIAE feature details are documented in: AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.4.7 L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE). available at https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/83ca10d981c48e86df2c3ad9658bb3ba3544c763.1762995456.git.babu.moger@amd.com
2025-11-21x86/mce: Handle AMD threshold interrupt stormsSmita Koralahalli
Extend the logic of handling CMCI storms to AMD threshold interrupts. Rely on the similar approach as of Intel's CMCI to mitigate storms per CPU and per bank. But, unlike CMCI, do not set thresholds and reduce interrupt rate on a storm. Rather, disable the interrupt on the corresponding CPU and bank. Re-enable back the interrupts if enough consecutive polls of the bank show no corrected errors (30, as programmed by Intel). Turning off the threshold interrupts would be a better solution on AMD systems as other error severities will still be handled even if the threshold interrupts are disabled. [ Tony: Small tweak because mce_handle_storm() isn't a pointer now ] [ Yazen: Rebase and simplify ] [ Avadhut: Remove check to not clear bank's bit in mce_poll_banks and fix checkpatch warnings. ] Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251121190542.2447913-3-avadhut.naik@amd.com
2025-11-21x86/mce: Do not clear bank's poll bit in mce_poll_banks on AMD SMCA systemsAvadhut Naik
Currently, when a CMCI storm detected on a Machine Check bank, subsides, the bank's corresponding bit in the mce_poll_banks per-CPU variable is cleared unconditionally by cmci_storm_end(). On AMD SMCA systems, this essentially disables polling on that particular bank on that CPU. Consequently, any subsequent correctable errors or storms will not be logged. Since AMD SMCA systems allow banks to be managed by both polling and interrupts, the polling banks bitmap for a CPU, i.e., mce_poll_banks, should not be modified when a storm subsides. Fixes: 7eae17c4add5 ("x86/mce: Add per-bank CMCI storm mitigation") Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251121190542.2447913-2-avadhut.naik@amd.com
2025-11-21x86/mce: Add support for physical address valid bitAvadhut Naik
Starting with Zen6, AMD's Scalable MCA systems will incorporate two new bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a valid System Physical Address (SPA) is present in MCA_ADDR. PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or if it indicates validity of SPA in MCA_ADDR. PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid SPA or if it is implementation specific. Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251118191731.181269-1-avadhut.naik@amd.com
2025-11-21x86/mce: Save and use APEI corrected threshold limitYazen Ghannam
The MCA threshold limit generally is not something that needs to change during runtime. It is common for a system administrator to decide on a policy for their managed systems. If MCA thresholding is OS-managed, then the threshold limit must be set at every boot. However, many systems allow the user to set a value in their BIOS. And this is reported through an APEI HEST entry even if thresholding is not in FW-First mode. Use this value, if available, to set the OS-managed threshold limit. Users can still override it through sysfs if desired for testing or debug. APEI is parsed after MCE is initialized. So reset the thresholding blocks later to pick up the threshold limit. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-18x86/bugs: Use an x86 feature to track the MMIO Stale Data mitigationSean Christopherson
Convert the MMIO Stale Data mitigation tracking from a static branch into an x86 feature flag so that it can be used via ALTERNATIVE_2 in KVM. No functional change intended. Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Brendan Jackman <jackmanb@google.com> Link: https://patch.msgid.link/20251113233746.1703361-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18x86/bugs: Use VM_CLEAR_CPU_BUFFERS in VMX as wellPawan Gupta
TSA mitigation: d8010d4ba43e ("x86/bugs: Add a Transient Scheduler Attacks mitigation") introduced VM_CLEAR_CPU_BUFFERS for guests on AMD CPUs. Currently on Intel CLEAR_CPU_BUFFERS is being used for guests which has a much broader scope (kernel->user also). Make mitigations on Intel consistent with TSA. This would help handling the guest-only mitigations better in future. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> [sean: make CLEAR_CPU_BUF_VM mutually exclusive with the MMIO mitigation] Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Brendan Jackman <jackmanb@google.com> Link: https://patch.msgid.link/20251113233746.1703361-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-18x86/cpu: Enable LASS during CPU initializationSohil Mehta
Linear Address Space Separation (LASS) mitigates a class of side-channel attacks that rely on speculative access across the user/kernel boundary. Enable LASS along with similar security features if the platform supports it. While at it, remove the comment above the SMAP/SMEP/UMIP/LASS setup instead of updating it, as the whole sequence is quite self-explanatory. Some EFI runtime and boot services may rely on 1:1 mappings in the lower half during early boot and even after SetVirtualAddressMap(). To avoid tripping LASS, the initial CR4 programming would need to be delayed until EFI has completely finished entering virtual mode (including efi_free_boot_services()). Also, LASS would need to be temporarily disabled while switching to efi_mm to avoid potential faults on stray runtime accesses. Similarly, legacy vsyscall page accesses are flagged by LASS resulting in a #GP (instead of a #PF). Without LASS, the #PF handler emulates the accesses and returns the appropriate values. Equivalent emulation support is required in the #GP handler with LASS enabled. In case of vsyscall XONLY (execute only) mode, the faulting address is readily available in the RIP which would make it easier to reuse the #PF emulation logic. For now, keep it simple and disable LASS if either of those are compiled in. Though not ideal, this makes it easier to start testing LASS support in some environments. In future, LASS support can easily be expanded to support EFI and legacy vsyscalls. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251118182911.2983253-9-sohil.mehta%40intel.com
2025-11-18x86/cpu: Add an LASS dependency on SMAPSohil Mehta
With LASS enabled, any kernel data access to userspace typically results in a #GP, or a #SS in some stack-related cases. When the kernel needs to access user memory, it can suspend LASS enforcement by toggling the RFLAGS.AC bit. Most of these cases are already covered by the stac()/clac() pairs used to avoid SMAP violations. Even though LASS could potentially be enabled independently, it would be very painful without SMAP and the related stac()/clac() calls. There is no reason to support such a configuration because all future hardware with LASS is expected to have SMAP as well. Also, the STAC/CLAC instructions are architected to: #UD - If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0. So, make LASS depend on SMAP to conveniently reuse the existing AC bit toggling already in place. Note: Additional STAC/CLAC would still be needed for accesses such as text poking which are not flagged by SMAP. This is because such mappings are in the lower half but do not have the _PAGE_USER bit set which SMAP uses for enforcement. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251118182911.2983253-3-sohil.mehta%40intel.com
2025-11-16mm: consistently use current->mm in mm_get_unmapped_area()Ryan Roberts
mm_get_unmapped_area() is a wrapper around arch_get_unmapped_area() / arch_get_unmapped_area_topdown(), both of which search current->mm for some free space. Neither take an mm_struct - they implicitly operate on current->mm. But the wrapper takes an mm_struct and uses it to decide whether to search bottom up or top down. All callers pass in current->mm for this, so everything is working consistently. But it feels like an accident waiting to happen; eventually someone will call that function with a different mm, expecting to find free space in it, but what gets returned is free space in the current mm. So let's simplify by removing the parameter and have the wrapper use current->mm to decide which end to start at. Now everything is consistent and self-documenting. Link: https://lkml.kernel.org/r/20251003155306.2147572-1-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-15Merge tag 'x86-urgent-2025-11-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Update the list of AMD microcode minimum Entrysign revisions - Add additional fixed AMD RDSEED microcode revisions - Update the language transliteration for Kiryl Shutsemau's name in the MAINTAINERS entry * tag 'x86-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Add Zen5 model 0x44, stepping 0x1 minrev x86/CPU/AMD: Add additional fixed RDSEED microcode revisions MAINTAINERS: Update name spelling
2025-11-15x86/hyperv: Rename guest crash shutdown functionMukesh Rathor
Rename hv_machine_crash_shutdown to more appropriate hv_guest_crash_shutdown and make it applicable to guests only. This in preparation for the subsequent hypervisor root crash support patches. Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15x86: mshyperv: Remove duplicate asm/msr.h headerJiapeng Chong
./arch/x86/kernel/cpu/mshyperv.c: asm/msr.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=26164 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15arch/x86: mshyperv: Trap on access for some synthetic MSRsRoman Kisel
hv_set_non_nested_msr() has special handling for SINT MSRs when a paravisor is present. In addition to updating the MSR on the host, the mirror MSR in the paravisor is updated, including with the proxy bit. But with Confidential VMBus, the proxy bit must not be used, so add a special case to skip it. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15arch: hyperv: Get/set SynIC synth.registers via paravisorRoman Kisel
The existing Hyper-V wrappers for getting and setting MSRs are hv_get/set_msr(). Via hv_get/set_non_nested_msr(), they detect when running in a CoCo VM with a paravisor, and use the TDX or SNP guest-host communication protocol to bypass the paravisor and go directly to the host hypervisor for SynIC MSRs. The "set" function also implements the required special handling for the SINT MSRs. Provide functions that allow manipulating the SynIC registers through the paravisor. Move vmbus_signal_eom() to a more appropriate location (which also avoids breaking KVM). Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15arch/x86: mshyperv: Discover Confidential VMBus availabilityRoman Kisel
Confidential VMBus requires enabling paravisor SynIC, and the x86_64 guest has to inspect the Virtualization Stack (VS) CPUID leaf to see if Confidential VMBus is available. If it is, the guest shall enable the paravisor SynIC. Read the relevant data from the VS CPUID leaf. Refactor the code to avoid repeating CPUID and add flags to the struct ms_hyperv_info. For ARM64, the flag for Confidential VMBus is not set which provides the desired behaviour for now as it is not available on ARM64 just yet. Once ARM64 CCA guests are supported, this flag will be set unconditionally when running such a guest. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15x86/hyperv: Don't use auto-eoi when Secure AVIC is availableTianyu Lan
Hyper-V doesn't support auto-eoi with Secure AVIC. So set the HV_DEPRECATING_AEOI_RECOMMENDED flag to force writing the EOI register after handling an interrupt. Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-14x86/bugs: Get rid of the forward declarationsBorislav Petkov (AMD)
Get rid of the forward declarations of the mitigation functions by moving their single caller below them. No functional changes. Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20251105200447.GBaQut3w4dLilZrX-z@fat_crate.local
2025-11-14x86/microcode/AMD: Add Zen5 model 0x44, stepping 0x1 minrevBorislav Petkov (AMD)
Add the minimum Entrysign revision for that model+stepping to the list of minimum revisions. Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/e94dd76b-4911-482f-8500-5c848a3df026@citrix.com
2025-11-14x86/CPU/AMD: Add additional fixed RDSEED microcode revisionsMario Limonciello
Microcode that resolves the RDSEED failure (SB-7055 [1]) has been released for additional Zen5 models to linux-firmware [2]. Update the zen5_rdseed_microcode array to cover these new models. Fixes: 607b9fb2ce24 ("x86/CPU/AMD: Add RDSEED fix for Zen5") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7055.html [1] Link: https://gitlab.com/kernel-firmware/linux-firmware/-/commit/6167e5566900cf236f7a69704e8f4c441bc7212a [2] Link: https://patch.msgid.link/20251113223608.1495655-1-mario.limonciello@amd.com
2025-11-14syscore: Pass context data to callbacksThierry Reding
Several drivers can benefit from registering per-instance data along with the syscore operations. To achieve this, move the modifiable fields out of the syscore_ops structure and into a separate struct syscore that can be registered with the framework. Add a void * driver data field for drivers to store contextual data that will be passed to the syscore ops. Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-12x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possibleSean Christopherson
Extend KVM's export macro framework to provide EXPORT_SYMBOL_FOR_KVM(), and use the helper macro to export symbols for KVM throughout x86 if and only if KVM will build one or more modules, and only for those modules. To avoid unnecessary exports when CONFIG_KVM=m but kvm.ko will not be built (because no vendor modules are selected), let arch code #define EXPORT_SYMBOL_FOR_KVM to suppress/override the exports. Note, the set of symbols to restrict to KVM was generated by manual search and audit; any "misses" are due to human error, not some grand plan. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251112173944.1380633-5-seanjc%40google.com
2025-11-12x86/mtrr: Drop unnecessary export of "mtrr_state"Sean Christopherson
Don't export "mtrr_state" as usage is limited to arch/x86/kernel/cpu/mtrr (and nothing outside of that directory even includes the local mtrr.h). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251112173944.1380633-3-seanjc%40google.com
2025-11-12x86/bugs: Drop unnecessary export of "x86_spec_ctrl_base"Sean Christopherson
Don't export x86_spec_ctrl_base as it's used only in bugs.c and process.c, neither of which can be built into a module. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251112173944.1380633-2-seanjc%40google.com
2025-11-08Merge tag 'x86-urgent-2025-11-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix AMD PCI root device caching regression that triggers on certain firmware variants - Fix the zen5_rdseed_microcode[] array to be NULL-terminated - Add more AMD models to microcode signature checking * tag 'x86-urgent-2025-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Add more known models to entry sign checking x86/CPU/AMD: Add missing terminator for zen5_rdseed_microcode x86/amd_node: Fix AMD root device caching
2025-11-07x86/microcode/AMD: Add more known models to entry sign checkingMario Limonciello (AMD)
Two Zen5 systems are missing from need_sha_check(). Add them. Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches") Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://patch.msgid.link/20251106182904.4143757-1-superm1@kernel.org
2025-11-05x86/mce/amd: Define threshold restart function for banksYazen Ghannam
Prepare for CMCI storm support by moving the common bank/block iterator code to a helper function. Include a parameter to switch the interrupt enable. This will be used by the CMCI storm handling function. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com
2025-11-05x86/mce/amd: Remove redundant reset_block()Yazen Ghannam
Many of the checks in reset_block() are done again in the block reset function. So drop the redundant checks. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20251104-wip-mca-updates-v8-0-66c8eacf67b9@amd.com