summaryrefslogtreecommitdiff
path: root/arch/x86/mm
AgeCommit message (Collapse)Author
13 daysx86/efi: Restore IRQ state in EFI page fault handlerArd Biesheuvel
The kernel's softirq API does not permit re-enabling softirqs while IRQs are disabled. The reason for this is that local_bh_enable() will not only re-enable delivery of softirqs over the back of IRQs, it will also handle any pending softirqs immediately, regardless of whether IRQs are enabled at that point. For this reason, commit d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs") disables softirqs only when IRQs are enabled, as it is not permitted otherwise, but also unnecessary, given that asynchronous softirq delivery never happens to begin with while IRQs are disabled. However, this does mean that entering a kernel mode FPU section with IRQs enabled and leaving it with IRQs disabled leads to problems, as identified by Sashiko [0]: the EFI page fault handler is called from page_fault_oops() with IRQs disabled, and thus ends the kernel mode FPU section with IRQs disabled as well, regardless of whether IRQs were enabled when it was started. This may result in schedule() being called with a non-zero preempt_count, causing a BUG(). So take care to re-enable IRQs when handling any EFI page faults if they were taken with IRQs enabled. [0] https://sashiko.dev/#/patchset/20260430074107.27051-1-ivan.hu%40canonical.com Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ivan Hu <ivan.hu@canonical.com> Cc: x86@kernel.org Cc: <stable@vger.kernel.org> Fixes: d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs") Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2026-04-24x86/mm: Revert INVLPGB optimization for set_memory codeDave Hansen
tl;dr: Revert an INVLPGB optimization that did not properly handle discontiguous virtual addresses. Full story: I got a report from some graphics (i915) folks that bisected a regression in their test suite to 86e6815b316e ("x86/mm: Change cpa_flush() to call flush_kernel_range() directly"). There was a bit of flip-flopping on the exact bisect, but the code here does seem wrong to me. The i915 folks were calling set_pages_array_wc(), so using the CPA_PAGES_ARRAY mode. Basically, the 'struct cpa_data' can wrap up all kinds of page table changes. Some of these are virtually contiguous, but some are very much not which is one reason why there are ->vaddr and ->pages arrays. 86e6815b316e made the mistake of assuming that the virtual addresses in the cpa_data are always contiguous. It got things right when neither CPA_ARRAY/CPA_PAGES_ARRAY is used, but theoretically wrong when either of those is used. In the i915 case, it probably failed to flush some WB TLB entries and install WC ones, leaving some data in the caches and not flushing it out to where the device could see it. That eventually caused graphics problems. Revert the INVLPGB optimization. It can be reintroduced later, but it will need to be a bit careful about the array modes. Fixes: 86e6815b316ec ("x86/mm: Change cpa_flush() to call flush_kernel_range()") Reported-by: Cui, Ling <ling.cui@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260421151909.6B3281C6@davehans-spike.ostc.intel.com
2026-04-21x86/cpu: Disable FRED when PTI is forced onDave Hansen
FRED and PTI were never intended to work together. No FRED hardware is vulnerable to Meltdown and all of it should have LASS anyway. Nevertheless, if you boot a system with pti=on and fred=on, the kernel tries to do what is asked of it and dies a horrible death on the first attempt to run userspace (since it never switches to the user page tables). Disable FRED when PTI is forced on, and print a warning about it. A quick brain dump about what a FRED+PTI implementation would look like is below. I'm not sure it would make any sense to do it, but never say never. All I know is that it's way too complicated to be worth it today. <brain dump> The SWITCH_TO_USER/KERNEL_CR3 bits are simple to fix (or at least we have the assembly tools to do it already), as is sticking the FRED entry text in .entry.text (it's not in there today). The nasty part is the stacks. Today, the CPU pops into the kernel on MSR_IA32_FRED_RSP0 which is normal old kernel memory and not mapped to userspace. The hardware pushes gunk on to MSR_IA32_FRED_RSP0, which is currently the task stacks. MSR_IA32_FRED_RSP0 would need to point elsewhere, probably cpu_entry_stack(). Then, start playing games with stacks on entry/exit, including copying gunk to and from the task stack. While I'd *like* to have PTI everywhere, I'm not sure it's worth mucking up the FRED code with PTI kludges. If a user wants fast entry/exit, they use FRED. If you want PTI (and sekuritay), you certainly don't care about fast entry and FRED isn't going to help you *all* that much, so you can just stay with the IDT. Plus, FRED hardware should have LASS which gives you a similar security profile to PTI without the CR3 munging. </brain dump> Reported-by: Gayatri Kammela <Gayatri.Kammela@amd.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Cc:stable@vger.kernel.org Link: https://patch.msgid.link/20260421163136.E7C6788A@davehans-spike.ostc.intel.com
2026-04-15Merge tag 'mm-stable-2026-04-13-21-45' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
2026-04-14Merge tag 'x86_mm_for_7.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Dave Hansen: - Convert x86 code to use generic "pagetable" APIs and ptdescs This aligns some the set_memory*() code better with the new page table APIs, especially using ptdescs as opposed to 'struct page' directly. * tag 'x86_mm_for_7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm/pat: Convert split_large_page() to use ptdescs x86/mm/pat: Convert populate_pgd() to use page table apis x86/mm/pat: Convert pmd code to use page table apis x86/mm/pat: Convert pte code to use page table apis
2026-04-14Merge tag 'x86_cpu_for_7.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Dave Hansen: - Complete LASS enabling: deal with vsyscall and EFI The existing Linear Address Space Separation (LASS) support punted on support for common EFI and vsyscall configs. Complete the implementation by supporting EFI and vsyscall=xonly. - Clean up CPUID usage in newer Intel "avs" audio driver and update the x86-cpuid-db file * tag 'x86_cpu_for_7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/x86/kcpuid: Update bitfields to x86-cpuid-db v3.0 ASoC: Intel: avs: Include CPUID header at file scope ASoC: Intel: avs: Check maximum valid CPUID leaf x86/cpu: Remove LASS restriction on vsyscall emulation x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE x86/vsyscall: Restore vsyscall=xonly mode under LASS x86/traps: Consolidate user fixups in the #GP handler x86/vsyscall: Reorganize the page fault emulation code x86/cpu: Remove LASS restriction on EFI x86/efi: Disable LASS while executing runtime services x86/cpu: Defer LASS enabling until userspace comes up
2026-04-14Merge tag 'sched-core-2026-04-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Fair scheduling updates: - Skip SCHED_IDLE rq for SCHED_IDLE tasks (Christian Loehle) - Remove superfluous rcu_read_lock() in the wakeup path (K Prateek Nayak) - Simplify the entry condition for update_idle_cpu_scan() (K Prateek Nayak) - Simplify SIS_UTIL handling in select_idle_cpu() (K Prateek Nayak) - Avoid overflow in enqueue_entity() (K Prateek Nayak) - Update overutilized detection (Vincent Guittot) - Prevent negative lag increase during delayed dequeue (Vincent Guittot) - Clear buddies for preempt_short (Vincent Guittot) - Implement more complex proportional newidle balance (Peter Zijlstra) - Increase weight bits for avg_vruntime (Peter Zijlstra) - Use full weight to __calc_delta() (Peter Zijlstra) RT and DL scheduling updates: - Fix incorrect schedstats for rt and dl thread (Dengjun Su) - Skip group schedulable check with rt_group_sched=0 (Michal Koutný) - Move group schedulability check to sched_rt_global_validate() (Michal Koutný) - Add reporting of runtime left & abs deadline to sched_getattr() for DEADLINE tasks (Tommaso Cucinotta) Scheduling topology updates by K Prateek Nayak: - Compute sd_weight considering cpuset partitions - Extract "imb_numa_nr" calculation into a separate helper - Allocate per-CPU sched_domain_shared in s_data - Switch to assigning "sd->shared" from s_data - Remove sched_domain_shared allocation with sd_data Energy-aware scheduling updates: - Filter false overloaded_group case for EAS (Vincent Guittot) - PM: EM: Switch to rcu_dereference_all() in wakeup path (Dietmar Eggemann) Infrastructure updates: - Replace use of system_unbound_wq with system_dfl_wq (Marco Crivellari) Proxy scheduling updates by John Stultz: - Make class_schedulers avoid pushing current, and get rid of proxy_tag_curr() - Minimise repeated sched_proxy_exec() checking - Fix potentially missing balancing with Proxy Exec - Fix and improve task::blocked_on et al handling - Add assert_balance_callbacks_empty() helper - Add logic to zap balancing callbacks if we pick again - Move attach_one_task() and attach_task() helpers to sched.h - Handle blocked-waiter migration (and return migration) - Add K Prateek Nayak to scheduler reviewers for proxy execution Misc cleanups and fixes by John Stultz, Joseph Salisbury, Peter Zijlstra, K Prateek Nayak, Michal Koutný, Randy Dunlap, Shrikanth Hegde, Vincent Guittot, Zhan Xusheng, Xie Yuanbin and Vincent Guittot" * tag 'sched-core-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) sched/eevdf: Clear buddies for preempt_short sched/rt: Cleanup global RT bandwidth functions sched/rt: Move group schedulability check to sched_rt_global_validate() sched/rt: Skip group schedulable check with rt_group_sched=0 sched/fair: Avoid overflow in enqueue_entity() sched: Use u64 for bandwidth ratio calculations sched/fair: Prevent negative lag increase during delayed dequeue sched/fair: Use sched_energy_enabled() sched: Handle blocked-waiter migration (and return migration) sched: Move attach_one_task and attach_task helpers to sched.h sched: Add logic to zap balance callbacks if we pick again sched: Add assert_balance_callbacks_empty helper sched/locking: Add special p->blocked_on==PROXY_WAKING value for proxy return-migration sched: Fix modifying donor->blocked on without proper locking locking: Add task::blocked_lock to serialize blocked_on state sched: Fix potentially missing balancing with Proxy Exec sched: Minimise repeated sched_proxy_exec() checking sched: Make class_schedulers avoid pushing current, and get rid of proxy_tag_curr() MAINTAINERS: Add K Prateek Nayak to scheduler reviewers sched/core: Get this cpu once in ttwu_queue_cond() ...
2026-04-05mm: change to return bool for pudp_test_and_clear_young()Baolin Wang
The pudp_test_and_clear_young() is used to clear the young flag, returning whether the young flag was set for this PUD entry. Change the return type to bool to make the intention clearer. Link: https://lkml.kernel.org/r/2c56fe52c1bf9404145274d7e91d4a65060f6c7c.1774075004.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: change to return bool for pmdp_clear_flush_young()Baolin Wang
The pmdp_clear_flush_young() is used to clear the young flag and flush the TLB, returning whether the young flag was set for this PMD entry. Change the return type to bool to make the intention clearer. Link: https://lkml.kernel.org/r/a668b9a974c0d675e7a41f6973bcbe3336e8b373.1774075004.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: change to return bool for pmdp_test_and_clear_young()Baolin Wang
Callers use pmdp_test_and_clear_young() to clear the young flag and check whether it was set for this PMD entry. Change the return type to bool to make the intention clearer. Link: https://lkml.kernel.org/r/f1d31307a13365d3d0fed5809727dcc2dd59631b.1774075004.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: change to return bool for ptep_clear_flush_young()/clear_flush_young_ptes()Baolin Wang
The ptep_clear_flush_young() and clear_flush_young_ptes() are used to clear the young flag and flush the TLB, returning whether the young flag was set. Change the return type to bool to make the intention clearer. Link: https://lkml.kernel.org/r/24af5144b96103631594501f77d4525f2475c1be.1774075004.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm: change to return bool for ptep_test_and_clear_young()Baolin Wang
Patch series "change young flag check functions to return bool", v2. This is a cleanup patchset to change all young flag check functions to return bool, as discussed with David in the previous thread[1]. Since callers only care about whether the young flag was set, returning bool makes the intention clearer. No functional changes intended. This patch (of 6): Callers use ptep_test_and_clear_young() to clear the young flag and check whether it was set. Change the return type to bool to make the intention clearer. Link: https://lkml.kernel.org/r/cover.1774075004.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/57e70efa9703d43959aa645246ea3cbdba14fa17.1774075004.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05mm/pkeys: remove unused tsk parameter from arch_set_user_pkey_access()Seongsu Park
The tsk parameter in arch_set_user_pkey_access() is never used in the function implementations across all architectures (arm64, powerpc, x86). Link: https://lkml.kernel.org/r/20260219063506.545148-1-sgsu.park@samsung.com Signed-off-by: Seongsu Park <sgsu.park@samsung.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-30x86/kexec: Disable KCOV instrumentation after load_segments()Aleksandr Nogikh
The load_segments() function changes segment registers, invalidating GS base (which KCOV relies on for per-cpu data). When CONFIG_KCOV is enabled, any subsequent instrumented C code call (e.g. native_gdt_invalidate()) begins crashing the kernel in an endless loop. To reproduce the problem, it's sufficient to do kexec on a KCOV-instrumented kernel: $ kexec -l /boot/otherKernel $ kexec -e The real-world context for this problem is enabling crash dump collection in syzkaller. For this, the tool loads a panic kernel before fuzzing and then calls makedumpfile after the panic. This workflow requires both CONFIG_KEXEC and CONFIG_KCOV to be enabled simultaneously. Adding safeguards directly to the KCOV fast-path (__sanitizer_cov_trace_pc()) is also undesirable as it would introduce an extra performance overhead. Disabling instrumentation for the individual functions would be too fragile, so disable KCOV instrumentation for the entire machine_kexec_64.c and physaddr.c. If coverage-guided fuzzing ever needs these components in the future, other approaches should be considered. The problem is not relevant for 32 bit kernels as CONFIG_KCOV is not supported there. [ bp: Space out comment for better readability. ] Fixes: 0d345996e4cb ("x86/kernel: increase kcov coverage under arch/x86/kernel folder") Signed-off-by: Aleksandr Nogikh <nogikh@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260325154825.551191-1-nogikh@google.com
2026-03-19x86/vsyscall: Reorganize the page fault emulation codeSohil Mehta
With LASS, vsyscall page accesses will cause a #GP instead of a #PF. Separate out the core vsyscall emulation code from the #PF specific handling in preparation for the upcoming #GP emulation. No functional change intended. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Link: https://patch.msgid.link/20260309181029.398498-2-sohil.mehta@intel.com
2026-03-17Merge tag 'v7.0-rc4' into sched/core, to pick up scheduler fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2026-03-06x86/mm/tlb: Make enter_lazy_tlb() always inline on x86Xie Yuanbin
enter_lazy_tlb() on x86 is short enough, and is called in context switching, which is the hot code path. Make enter_lazy_tlb() always inline on x86 to optimize performance. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Xie Yuanbin <qq570070308@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20260216164950.147617-2-qq570070308@gmail.com
2026-03-04x86/mm/pat: Convert split_large_page() to use ptdescsVishal Moola (Oracle)
Use the ptdesc APIs for all page table allocation and free sites to allow their separate allocation from struct page in the future. Update split_large_page() to allocate a ptdesc instead of allocating a page for use as a page table. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260303194828.1406905-5-vishal.moola@gmail.com
2026-03-04x86/mm/pat: Convert populate_pgd() to use page table apisVishal Moola (Oracle)
Use the ptdesc APIs for all page table allocation and free sites to allow their separate allocation from struct page in the future. Convert the remaining get_zeroed_page() calls to the generic page table APIs, as they already use ptdescs. Pass through init_mm since these are kernel page tables, as both functions require it to identify kernel page tables. Because the generic implementations do not use the second argument, pass a placeholder to avoid reimplementing them or risking breakage on other architectures. It is not obvious whether these pages are freed. Regardless, convert the remaining free paths as needed, noting that the only other possible free paths have already been converted and that a frozen page table test kernel has not reported any issues. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260303194828.1406905-4-vishal.moola@gmail.com
2026-03-04x86/mm/pat: Convert pmd code to use page table apisVishal Moola (Oracle)
Use the ptdesc APIs for all page table allocation and free sites to allow their separate allocation from struct page in the future. Convert the PMD allocation and free sites to use the generic page table APIs, as they already use ptdescs. Pass through init_mm since these are kernel page tables, as pmd_alloc_one() requires it to identify kernel page tables. Because the generic implementation does not use the second argument, pass a placeholder to avoid reimplementing it or risking breakage on other architectures. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260303194828.1406905-3-vishal.moola@gmail.com
2026-03-04x86/mm/pat: Convert pte code to use page table apisVishal Moola (Oracle)
Use the ptdesc APIs for all page table allocation and free sites to allow their separate allocation from struct page in the future. Convert the PTE allocation and free sites to use the generic page table APIs, as they already use ptdescs. Pass through init_mm since these are kernel page tables; otherwise, pte_alloc_one_kernel() becomes a no-op. Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260303194828.1406905-2-vishal.moola@gmail.com
2026-03-04x86/numa: Store extra copy of numa_nodes_parsedPeter Zijlstra
The topology setup code needs to know the total number of physical nodes enumerated in SRAT; however NUMA_EMU can cause the existing numa_nodes_parsed bitmap to be fictitious. Therefore, keep a copy of the bitmap specifically to retain the physical node count. Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Tested-by: Kyle Meyer <kyle.meyer@hpe.com> Link: https://patch.msgid.link/20260303110059.889884023@infradead.org
2026-02-23x86/bug: Handle __WARN_printf() trap in early_fixup_exception()Hou Wenlong
The commit 5b472b6e5bd9 ("x86_64/bug: Implement __WARN_printf()") implemented __WARN_printf(), which changed the mechanism to use UD1 instead of UD2. However, it only handles the trap in the runtime IDT handler, while the early booting IDT handler lacks this handling. As a result, the usage of WARN() before the runtime IDT setup can lead to kernel crashes. Since KMSAN is enabled after the runtime IDT setup, it is safe to use handle_bug() directly in early_fixup_exception() to address this issue. Fixes: 5b472b6e5bd9 ("x86_64/bug: Implement __WARN_printf()") Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/c4fb3645f60d3a78629d9870e8fcc8535281c24f.1768016713.git.houwenlong.hwl@antgroup.com
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-12Merge tag 'mm-stable-2026-02-11-19-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "powerpc/64s: do not re-activate batched TLB flush" makes arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev) It adds a generic enter/leave layer and switches architectures to use it. Various hacks were removed in the process. - "zram: introduce compressed data writeback" implements data compression for zram writeback (Richard Chang and Sergey Senozhatsky) - "mm: folio_zero_user: clear page ranges" adds clearing of contiguous page ranges for hugepages. Large improvements during demand faulting are demonstrated (David Hildenbrand) - "memcg cleanups" tidies up some memcg code (Chen Ridong) - "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos stats" improves DAMOS stat's provided information, deterministic control, and readability (SeongJae Park) - "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few issues in the hugetlb cgroup charging selftests (Li Wang) - "Fix va_high_addr_switch.sh test failure - again" addresses several issues in the va_high_addr_switch test (Chunyu Hu) - "mm/damon/tests/core-kunit: extend existing test scenarios" improves the KUnit test coverage for DAMON (Shu Anzai) - "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to transiently return -EAGAIN (Shivank Garg) - "arch, mm: consolidate hugetlb early reservation" reworks and consolidates a pile of straggly code related to reservation of hugetlb memory from bootmem and creation of CMA areas for hugetlb (Mike Rapoport) - "mm: clean up anon_vma implementation" cleans up the anon_vma implementation in various ways (Lorenzo Stoakes) - "tweaks for __alloc_pages_slowpath()" does a little streamlining of the page allocator's slowpath code (Vlastimil Babka) - "memcg: separate private and public ID namespaces" cleans up the memcg ID code and prevents the internal-only private IDs from being exposed to userspace (Shakeel Butt) - "mm: hugetlb: allocate frozen gigantic folio" cleans up the allocation of frozen folios and avoids some atomic refcount operations (Kefeng Wang) - "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement of memory betewwn the active and inactive LRUs and adds auto-tuning of the ratio-based quotas and of monitoring intervals (SeongJae Park) - "Support page table check on PowerPC" makes CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan) - "nodemask: align nodes_and{,not} with underlying bitmap ops" makes nodes_and() and nodes_andnot() propagate the return values from the underlying bit operations, enabling some cleanup in calling code (Yury Norov) - "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up some DAMON internal interfaces (SeongJae Park) - "mm/khugepaged: cleanups and scan limit fix" does some cleanup work in khupaged and fixes a scan limit accounting issue (Shivank Garg) - "mm: balloon infrastructure cleanups" goes to town on the balloon infrastructure and its page migration function. Mainly cleanups, also some locking simplification (David Hildenbrand) - "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds additional tracepoints to the page reclaim code (Jiayuan Chen) - "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is part of Marco's kernel-wide migration from the legacy workqueue APIs over to the preferred unbound workqueues (Marco Crivellari) - "Various mm kselftests improvements/fixes" provides various unrelated improvements/fixes for the mm kselftests (Kevin Brodsky) - "mm: accelerate gigantic folio allocation" greatly speeds up gigantic folio allocation, mainly by avoiding unnecessary work in pfn_range_valid_contig() (Kefeng Wang) - "selftests/damon: improve leak detection and wss estimation reliability" improves the reliability of two of the DAMON selftests (SeongJae Park) - "mm/damon: cleanup kdamond, damon_call(), damos filter and DAMON_MIN_REGION" does some cleanup work in the core DAMON code (SeongJae Park) - "Docs/mm/damon: update intro, modules, maintainer profile, and misc" performs maintenance work on the DAMON documentation (SeongJae Park) - "mm: add and use vma_assert_stabilised() helper" refactors and cleans up the core VMA code. The main aim here is to be able to use the mmap write lock's lockdep state to perform various assertions regarding the locking which the VMA code requires (Lorenzo Stoakes) - "mm, swap: swap table phase II: unify swapin use" removes some old swap code (swap cache bypassing and swap synchronization) which wasn't working very well. Various other cleanups and simplifications were made. The end result is a 20% speedup in one benchmark (Kairui Song) - "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM available on 64-bit alpha, loongarch, mips, parisc, and um. Various cleanups were performed along the way (Qi Zheng) * tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits) mm/memory: handle non-split locks correctly in zap_empty_pte_table() mm: move pte table reclaim code to memory.c mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config um: mm: enable MMU_GATHER_RCU_TABLE_FREE parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE mips: mm: enable MMU_GATHER_RCU_TABLE_FREE LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles zsmalloc: make common caches global mm: add SPDX id lines to some mm source files mm/zswap: use %pe to print error pointers mm/vmscan: use %pe to print error pointers mm/readahead: fix typo in comment mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file() mm: refactor vma_map_pages to use vm_insert_pages mm/damon: unify address range representation with damon_addr_range mm/cma: replace snprintf with strscpy in cma_new_area ...
2026-02-10Merge tag 'x86_paravirt_for_v7.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt updates from Borislav Petkov: - A nice cleanup to the paravirt code containing a unification of the paravirt clock interface, taming the include hell by splitting the pv_ops structure and removing of a bunch of obsolete code (Juergen Gross) * tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/paravirt: Use XOR r32,r32 to clear register in pv_vcpu_is_preempted() x86/paravirt: Remove trailing semicolons from alternative asm templates x86/pvlocks: Move paravirt spinlock functions into own header x86/paravirt: Specify pv_ops array in paravirt macros x86/paravirt: Allow pv-calls outside paravirt.h objtool: Allow multiple pv_ops arrays x86/xen: Drop xen_mmu_ops x86/xen: Drop xen_cpu_ops x86/xen: Drop xen_irq_ops x86/paravirt: Move pv_native_*() prototypes to paravirt.c x86/paravirt: Introduce new paravirt-base.h header x86/paravirt: Move paravirt_sched_clock() related code into tsc.c x86/paravirt: Use common code for paravirt_steal_clock() riscv/paravirt: Use common code for paravirt_steal_clock() loongarch/paravirt: Use common code for paravirt_steal_clock() arm64/paravirt: Use common code for paravirt_steal_clock() arm/paravirt: Use common code for paravirt_steal_clock() sched: Move clock related paravirt code to kernel/sched paravirt: Remove asm/paravirt_api_clock.h x86/paravirt: Move thunk macros to paravirt_types.h ...
2026-02-10Merge tag 'x86_cleanups_for_v7.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - The usual set of cleanups and simplifications all over the tree * tag 'x86_cleanups_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/segment: Use MOVL when reading segment registers selftests/x86: Clean up sysret_rip coding style x86/mm: Hide mm_free_global_asid() definition under CONFIG_BROADCAST_TLB_FLUSH x86/crash: Use set_memory_p() instead of __set_memory_prot() x86/CPU/AMD: Simplify the spectral chicken fix x86/platform/olpc: Replace strcpy() with strscpy() in xo15_sci_add() x86/split_lock: Remove dead string when split_lock_detect=fatal
2026-01-26mm, arch: consolidate hugetlb CMA reservationMike Rapoport (Microsoft)
Every architecture that supports hugetlb_cma command line parameter reserves CMA areas for hugetlb during setup_arch(). This obfuscates the ordering of hugetlb CMA initialization with respect to the rest initialization of the core MM. Introduce arch_hugetlb_cma_order() callback to allow architectures report the desired order-per-bit of CMA areas and provide a week implementation of arch_hugetlb_cma_order() for architectures that don't support hugetlb with CMA. Use this callback in hugetlb_cma_reserve() instead if passing the order as parameter and call hugetlb_cma_reserve() from mm_core_init_early() rather than have it spread over architecture specific code. Link: https://lkml.kernel.org/r/20260111082105.290734-28-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alex Shi <alexs@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26arch, mm: consolidate initialization of SPARSE memory modelMike Rapoport (Microsoft)
Every architecture calls sparse_init() during setup_arch() although the data structures created by sparse_init() are not used until the initialization of the core MM. Beside the code duplication, calling sparse_init() from architecture specific code causes ordering differences of vmemmap and HVO initialization on different architectures. Move the call to sparse_init() from architecture specific code to free_area_init() to ensure that vmemmap and HVO initialization order is always the same. Link: https://lkml.kernel.org/r/20260111082105.290734-25-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alex Shi <alexs@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26arch, mm: consolidate initialization of nodes, zones and memory mapMike Rapoport (Microsoft)
To initialize node, zone and memory map data structures every architecture calls free_area_init() during setup_arch() and passes it an array of zone limits. Beside code duplication it creates "interesting" ordering cases between allocation and initialization of hugetlb and the memory map. Some architectures allocate hugetlb pages very early in setup_arch() in certain cases, some only create hugetlb CMA areas in setup_arch() and sometimes hugetlb allocations happen mm_core_init(). With arch_zone_limits_init() helper available now on all architectures it is no longer necessary to call free_area_init() from architecture setup code. Rather core MM initialization can call arch_zone_limits_init() in a single place. This allows to unify ordering of hugetlb vs memory map allocation and initialization. Remove the call to free_area_init() from architecture specific code and place it in a new mm_core_init_early() function that is called immediately after setup_arch(). After this refactoring it is possible to consolidate hugetlb allocations and eliminate differences in ordering of hugetlb and memory map initialization among different architectures. As the first step of this consolidation move hugetlb_bootmem_alloc() to mm_core_early_init(). Link: https://lkml.kernel.org/r/20260111082105.290734-24-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alex Shi <alexs@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26x86: introduce arch_zone_limits_init()Mike Rapoport (Microsoft)
Move calculations of zone limits to a dedicated arch_zone_limits_init() function. Later MM core will use this function as an architecture specific callback during nodes and zones initialization and thus there won't be a need to call free_area_init() from every architecture. Link: https://lkml.kernel.org/r/20260111082105.290734-22-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alex Shi <alexs@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Klara Modin <klarasmodin@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-22x86: make page fault handling disable interrupts properlyCedric Xing
There's a big comment in the x86 do_page_fault() about our interrupt disabling code: * User address page fault handling might have reenabled * interrupts. Fixing up all potential exit points of * do_user_addr_fault() and its leaf functions is just not * doable w/o creating an unholy mess or turning the code * upside down. but it turns out that comment is subtly wrong, and the code as a result is also wrong. Because it's certainly true that we may have re-enabled interrupts when handling user page faults. And it's most certainly true that we don't want to bother fixing up all the cases. But what isn't true is that it's limited to user address page faults. The confusion stems from the fact that we have logic here that depends on the address range of the access, but other code then depends on the _context_ the access was done in. The two are not related, even though both of them are about user-vs-kernel. In other words, both user and kernel addresses can cause interrupts to have been enabled (eg when __bad_area_nosemaphore() gets called for user accesses to kernel addresses). As a result we should make sure to disable interrupts again regardless of the address range before returning to the low-level fault handling code. The __bad_area_nosemaphore() code actually did disable interrupts again after enabling them, just not consistently. Ironically, as noted in the original comment, fixing up all the cases is just not worth it, when the simple solution is to just do it unconditionally in one single place. So remove the incomplete case that unsuccessfully tried to do what the comment said was "not doable" in commit ca4c6a9858c2 ("x86/traps: Make interrupt enable/disable symmetric in C code"), and just make it do the simple and straightforward thing. Signed-off-by: Cedric Xing <cedric.xing@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Fixes: ca4c6a9858c2 ("x86/traps: Make interrupt enable/disable symmetric in C code") Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-19Merge tag 'v6.19-rc6' into tip-x86-cleanupsBorislav Petkov (AMD)
Pick up upstream work and d9b40d7262a2 ("selftests/x86: Add selftests include path for kselftest.h after centralization") especially which is a build fix needed for a selftests cleanup coming ontop of this. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2026-01-16x86/mm: Hide mm_free_global_asid() definition under CONFIG_BROADCAST_TLB_FLUSHHou Wenlong
When CONFIG_BROADCAST_TLB_FLUSH is not enabled, mm_free_global_asid() remains a globally visible symbol and generates a useless function call to it in destroy_context(). Therefore, hide the mm_free_global_asid() definition under CONFIG_BROADCAST_TLB_FLUSH and provide a static inline empty version when it is not enabled to remove the function call. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Rik van Riel <riel@surriel.com> Link: https://patch.msgid.link/b262a8ec8076fb26bb692aaf113848b1e6f40e40.1768448079.git.houwenlong.hwl@antgroup.com
2026-01-16Merge tag 'cxl-fixes-6.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull Compute Express Link (CXL) fixes from Dave Jiang: - Recognize all ZONE_DEVICE users as physaddr consumers - Fix format string for extended_linear_cache_size_show() - Fix target list setup for multiple decoders sharing the same downstream port - Restore HBIW check before derefernce platform data - Fix potential infinite loop in __cxl_dpa_reserve() - Check for invalid addresses returned from translation functions on error * tag 'cxl-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl: Check for invalid addresses returned from translation functions on errors cxl/hdm: Fix potential infinite loop in __cxl_dpa_reserve() cxl/acpi: Restore HBIW check before dereferencing platform_data cxl/port: Fix target list setup for multiple decoders sharing the same dport cxl/region: fix format string for resource_size_t x86/kaslr: Recognize all ZONE_DEVICE users as physaddr consumers
2026-01-13x86/crash: Use set_memory_p() instead of __set_memory_prot()Coiby Xu
set_memory_p() is available to use outside of its compilation unit since: 030ad7af9437 ("x86/mm: Regularize set_memory_p() parameters and make non-static"). There is no use for __set_memory_prot() anymore so drop it too. [ bp: Massage commit message. ] Signed-off-by: Coiby Xu <coxu@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20260106095100.656292-1-coxu@redhat.com
2026-01-12x86/paravirt: Remove not needed includes of paravirt.hJuergen Gross
In some places asm/paravirt.h is included without really being needed. Remove the related #include statements. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-2-jgross@suse.com
2026-01-11treewide: Update email addressThomas Gleixner
In a vain attempt to consolidate the email zoo switch everything to the kernel.org account. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-05x86/kaslr: Recognize all ZONE_DEVICE users as physaddr consumersDan Williams
Commit 7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems") is too narrow. The effect being mitigated in that commit is caused by ZONE_DEVICE which PCI_P2PDMA has a dependency. ZONE_DEVICE, in general, lets any physical address be added to the direct-map. I.e. not only ACPI hotplug ranges, CXL Memory Windows, or EFI Specific Purpose Memory, but also any PCI MMIO range for the DEVICE_PRIVATE and PCI_P2PDMA cases. Update the mitigation, limit KASLR entropy, to apply in all ZONE_DEVICE=y cases. Distro kernels typically have PCI_P2PDMA=y, so the practical exposure of this problem is limited to the PCI_P2PDMA=n case. A potential path to recover entropy would be to walk ACPI and determine the limits for hotplug and PCI MMIO before kernel_randomize_memory(). On smaller systems that could yield some KASLR address bits. This needs additional investigation to determine if some limited ACPI table scanning can happen this early without an open coded solution like arch/x86/boot/compressed/acpi.c needs to deploy. Cc: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Fixes: 7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems") Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Tested-by: Yasunori Goto <y-goto@fujitsu.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://patch.msgid.link/692e08b2516d4_261c1100a3@dwillia2-mobl4.notmuch Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-12-05Merge tag 'soc-drivers-6.19-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull more SoC driver updates from Arnd Bergmann: "These updates came a little late, or were based on a later 6.18-rc tag than the others: - A new driver for cache management on cxl devices with memory shared in a coherent cluster. This is part of the drivers/cache/ tree, but unlike the other drivers that back the dma-mapping interfaces, this one is needed only during CPU hotplug. - A shared branch for reset controllers using swnode infrastructure - Added support for new SoC variants in the Amlogic soc_device identification - Minor updates in Freescale, Microchip, Samsung, and Apple SoC drivers" * tag 'soc-drivers-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (24 commits) soc: samsung: exynos-pmu: fix device leak on regmap lookup soc: samsung: exynos-pmu: Fix structure initialization soc: fsl: qbman: use kmalloc_array() instead of kmalloc() soc: fsl: qbman: add WQ_PERCPU to alloc_workqueue users MAINTAINERS: Update email address for Christophe Leroy MAINTAINERS: refer to intended file in STANDALONE CACHE CONTROLLER DRIVERS cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent cache: Make top level Kconfig menu a boolean dependent on RISCV MAINTAINERS: Add Jonathan Cameron to drivers/cache and add lib/cache_maint.c + header arm64: Select GENERIC_CPU_CACHE_MAINTENANCE lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION soc: amlogic: meson-gx-socinfo: add new SoCs id dt-bindings: arm: amlogic: meson-gx-ao-secure: support more SoCs memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion() memregion: Drop unused IORES_DESC_* parameter from cpu_cache_invalidate_memregion() dt-bindings: cache: sifive,ccache0: add a pic64gx compatible MAINTAINERS: rename Microchip RISC-V entry MAINTAINERS: add new soc drivers to Microchip RISC-V entry soc: microchip: add mfd drivers for two syscon regions on PolarFire SoC dt-bindings: soc: microchip: document the simple-mfd syscon on PolarFire SoC ...
2025-12-05Merge tag 'mm-stable-2025-12-03-21-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki) Rework the vmalloc() code to support non-blocking allocations (GFP_ATOIC, GFP_NOWAIT) "ksm: fix exec/fork inheritance" (xu xin) Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec "mm/zswap: misc cleanup of code and documentations" (SeongJae Park) Some light maintenance work on the zswap code "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira) Enhance the /sys/kernel/debug/page_owner debug feature by adding unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn) Minor alterations to the page allocator's per-cpu-pages feature "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra) Address a scalability issue in userfaultfd's UFFDIO_MOVE operation "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov) "drivers/base/node: fold node register and unregister functions" (Donet Tom) Clean up the NUMA node handling code a little "mm: some optimizations for prot numa" (Kefeng Wang) Cleanups and small optimizations to the NUMA allocation hinting code "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn) Address long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang) Remove some now-unnecessary work from page reclaim "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park) Enhance the DAMOS auto-tuning feature "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan) Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes) Enhance the new(ish) file_operations.mmap_prepare() method and port additional callsites from the old ->mmap() over to ->mmap_prepare() "Fix stale IOTLB entries for kernel address space" (Lu Baolu) Fix a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang) Clean up and optimize the folio splitting code "mm, swap: misc cleanup and bugfix" (Kairui Song) Some cleanups and a minor fix in the swap discard code "mm/damon: misc documentation fixups" (SeongJae Park) "mm/damon: support pin-point targets removal" (SeongJae Park) Permit userspace to remove a specific monitoring target in the middle of the current targets list "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo) A couple of cleanups related to mm header file inclusion "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He) improve the selection of swap devices for NUMA machines "mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista) Change the memory block labels from macros to enums so they will appear in kernel debug info "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes) Address an inefficiency when KSM unmerges an address range "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park) Fix leaks and unhandled malloc() failures in DAMON userspace unit tests "some cleanups for pageout()" (Baolin Wang) Clean up a couple of minor things in the page scanner's writeback-for-eviction code "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu) Move hugetlb's sysfs/sysctl handling code into a new file "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes) Make the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes) Reduce mmap lock contention for callers performing VMA guard region operations "vma_start_write_killable" (Matthew Wilcox) Start work on permitting applications to be killed when they are waiting on a read_lock on the VMA lock "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park) Add additional userspace testing of DAMON's "commit" feature "mm/damon: misc cleanups" (SeongJae Park) "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes) Address the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another "mm: support device-private THP" (Balbir Singh) Introduce support for Transparent Huge Page (THP) migration in zone device-private memory "Optimize folio split in memory failure" (Zi Yan) "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang) Some more cleanups in the folio splitting code "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes) Clean up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t "reparent the THP split queue" (Muchun Song) Reparent the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources "unify PMD scan results and remove redundant cleanup" (Wei Yang) A little cleanup in the hugepage collapse code "zram: introduce writeback bio batching" (Sergey Senozhatsky) Improve zram writeback efficiency by introducing batched bio writeback support "memcg: cleanup the memcg stats interfaces" (Shakeel Butt) Clean up our handling of the interrupt safety of some memcg stats "make vmalloc gfp flags usage more apparent" (Vishal Moola) Clean up vmalloc's handling of incoming GFP flags "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang) Teach soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension "mm: swap: small fixes and comment cleanups" (Youngjun Park) Fix a small bug and clean up some of the swap code "initial work on making VMA flags a bitmap" (Lorenzo Stoakes) Start work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park) Address a possible bug in the swap discard code and clean things up a little [ This merge also reverts commit ebb9aeb980e5 ("vfio/nvgrace-gpu: register device memory for poison handling") because it looks broken to me, I've asked for clarification - Linus ] * tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm: fix vma_start_write_killable() signal handling mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate mm/swapfile: fix list iteration when next node is removed during discard fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling mm/kfence: add reboot notifier to disable KFENCE on shutdown memcg: remove inc/dec_lruvec_kmem_state helpers selftests/mm/uffd: initialize char variable to Null mm: fix DEBUG_RODATA_TEST indentation in Kconfig mm: introduce VMA flags bitmap type tools/testing/vma: eliminate dependency on vma->__vm_flags mm: simplify and rename mm flags function for clarity mm: declare VMA flags by bit zram: fix a spelling mistake mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity mm/vmscan: skip increasing kswapd_failures when reclaim was boosted pagemap: update BUDDY flag documentation mm: swap: remove scan_swap_map_slots() references from comments mm: swap: change swap_alloc_slow() to void mm, swap: remove redundant comment for read_swap_cache_async mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational ...
2025-12-02Merge tag 'x86_misc_for_6.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Dave Hansen: "The most significant are some changes to ensure that symbols exported for KVM are used only by KVM modules themselves, along with some related cleanups. In true x86/misc fashion, the other patch is completely unrelated and just enhances an existing pr_warn() to make it clear to users how they have tainted their kernel when something is mucking with MSRs. Summary: - Make MSR-induced taint easier for users to track down - Restrict KVM-specific exports to KVM itself" * tag 'x86_misc_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possible x86/mm: Drop unnecessary export of "ptdump_walk_pgd_level_debugfs" x86/mtrr: Drop unnecessary export of "mtrr_state" x86/bugs: Drop unnecessary export of "x86_spec_ctrl_base" x86/msr: Add CPU_OUT_OF_SPEC taint name to "unrecognized" pr_warn(msg)
2025-12-02Merge tag 'x86_mm_for_v6.19_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Borislav Petkov: - Use the proper accessors when reading CR3 as part of the page level transitions (5-level to 4-level, the use case being kexec) so that only the physical address in CR3 is picked up and not flags which are above the physical mask shift - Clean up and unify __phys_addr_symbol() definitions * tag 'x86_mm_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/libstub: Fix page table access in 5-level to 4-level paging transition x86/boot: Fix page table access in 5-level to 4-level paging transition x86/mm: Unify __phys_addr_symbol()
2025-12-02Merge tag 'x86_cleanups_for_v6.19_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - The mandatory pile of cleanups the cat drags in every merge window * tag 'x86_cleanups_for_v6.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Clean up whitespace in a20.c x86/mm: Delete disabled debug code x86/{boot,mtrr}: Remove unused function declarations x86/percpu: Use BIT_WORD() and BIT_MASK() macros x86/cpufeatures: Correct LKGS feature flag description x86/idtentry: Add missing '*' to kernel-doc lines
2025-11-27Merge tag 'cache-for-v6.19' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/drivers-late standalone cache drivers for v6.19 ccache: Add a compatible for the pic64gx SoC. No driver change needed, as it falls back to the PolarFire SoC. hisi hha/generic cpu cache maintenance: Add support for a non-architectural mechanism for invalidating memory regions, needed for some cxl implementations on arm64 (and probably elsewhere in the future). The HiSilicon Hydra Home Agent is the first driver to provide this support. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> * tag 'cache-for-v6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: MAINTAINERS: refer to intended file in STANDALONE CACHE CONTROLLER DRIVERS cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent cache: Make top level Kconfig menu a boolean dependent on RISCV MAINTAINERS: Add Jonathan Cameron to drivers/cache and add lib/cache_maint.c + header arm64: Select GENERIC_CPU_CACHE_MAINTENANCE lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion() memregion: Drop unused IORES_DESC_* parameter from cpu_cache_invalidate_memregion() dt-bindings: cache: sifive,ccache0: add a pic64gx compatible Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-27x86/mm: Delete disabled debug codeBrendan Jackman
This code doesn't run. Since 2008: 4f9c11dd49fb ("x86, 64-bit: adjust mapping of physical pagetables to work with Xen") the kernel has gained more flexible logging and tracing capabilities; presumably if anyone wanted to take advantage of this log message they would have got rid of the "if (0)" so they could use these capabilities. Since they haven't, just delete it. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251003-x86-init-cleanup-v1-1-f2b7994c2ad6@google.com
2025-11-17memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion()Yicong Yang
Extend cpu_cache_invalidate_memregion() to support invalidating a particular range of memory by introducing start and length parameters. Control of types of invalidation is left for when use cases turn up. For now everything is Clean and Invalidate. Where the range is unknown, use the provided cpu_cache_invalidate_all() helper to act as documentation of intent in a fashion that is clearer than passing (0, -1) to cpu_cache_invalidate_memregion(). Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-17memregion: Drop unused IORES_DESC_* parameter from ↵Jonathan Cameron
cpu_cache_invalidate_memregion() The res_desc parameter was originally introduced for documentation purposes and with the idea that with HDM-DB CXL invalidation could be triggered from the device. That has not come to pass and the continued existence of the option is confusing when we add a range in the following patch which might not be a strict subset of the res_desc. So avoid that confusion by dropping the parameter. Link: https://lore.kernel.org/linux-mm/686eedb25ed02_24471002e@dwillia2-xfh.jf.intel.com.notmuch/ Reviewed-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2025-11-16x86/mm: use pagetable_free()Lu Baolu
The kernel's memory management subsystem provides a dedicated interface, pagetable_free(), for freeing page table pages. Updates two call sites to use pagetable_free() instead of the lower-level __free_page() or free_pages(). This improves code consistency and clarity, and ensures the correct freeing mechanism is used. Link: https://lkml.kernel.org/r/20251022082635.2462433-7-baolu.lu@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yi Lai <yi1.lai@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>