diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2026-04-15 12:59:16 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2026-04-15 12:59:16 -0700 |
| commit | 334fbe734e687404f346eba7d5d96ed2b44d35ab (patch) | |
| tree | 65d5c8f4de18335209b2529146e6b06960a48b43 /mm/vmalloc.c | |
| parent | 5bdb4078e1efba9650c03753616866192d680718 (diff) | |
| parent | 3bac01168982ec3e3bf87efdc1807c7933590a85 (diff) | |
Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "maple_tree: Replace big node with maple copy" (Liam Howlett)
Mainly prepararatory work for ongoing development but it does reduce
stack usage and is an improvement.
- "mm, swap: swap table phase III: remove swap_map" (Kairui Song)
Offers memory savings by removing the static swap_map. It also yields
some CPU savings and implements several cleanups.
- "mm: memfd_luo: preserve file seals" (Pratyush Yadav)
File seal preservation to LUO's memfd code
- "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan
Chen)
Additional userspace stats reportng to zswap
- "arch, mm: consolidate empty_zero_page" (Mike Rapoport)
Some cleanups for our handling of ZERO_PAGE() and zero_pfn
- "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu
Han)
A robustness improvement and some cleanups in the kmemleak code
- "Improve khugepaged scan logic" (Vernon Yang)
Improve khugepaged scan logic and reduce CPU consumption by
prioritizing scanning tasks that access memory frequently
- "Make KHO Stateless" (Jason Miu)
Simplify Kexec Handover by transitioning KHO from an xarray-based
metadata tracking system with serialization to a radix tree data
structure that can be passed directly to the next kernel
- "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
Ballasi and Steven Rostedt)
Enhance vmscan's tracepointing
- "mm: arch/shstk: Common shadow stack mapping helper and
VM_NOHUGEPAGE" (Catalin Marinas)
Cleanup for the shadow stack code: remove per-arch code in favour of
a generic implementation
- "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)
Fix a WARN() which can be emitted the KHO restores a vmalloc area
- "mm: Remove stray references to pagevec" (Tal Zussman)
Several cleanups, mainly udpating references to "struct pagevec",
which became folio_batch three years ago
- "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
Shutsemau)
Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail
pages encode their relationship to the head page
- "mm/damon/core: improve DAMOS quota efficiency for core layer
filters" (SeongJae Park)
Improve two problematic behaviors of DAMOS that makes it less
efficient when core layer filters are used
- "mm/damon: strictly respect min_nr_regions" (SeongJae Park)
Improve DAMON usability by extending the treatment of the
min_nr_regions user-settable parameter
- "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)
The proper fix for a previously hotfixed SMP=n issue. Code
simplifications and cleanups ensued
- "mm: cleanups around unmapping / zapping" (David Hildenbrand)
A bunch of cleanups around unmapping and zapping. Mostly
simplifications, code movements, documentation and renaming of
zapping functions
- "support batched checking of the young flag for MGLRU" (Baolin Wang)
Batched checking of the young flag for MGLRU. It's part cleanups; one
benchmark shows large performance benefits for arm64
- "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)
memcg cleanup and robustness improvements
- "Allow order zero pages in page reporting" (Yuvraj Sakshith)
Enhance free page reporting - it is presently and undesirably order-0
pages when reporting free memory.
- "mm: vma flag tweaks" (Lorenzo Stoakes)
Cleanup work following from the recent conversion of the VMA flags to
a bitmap
- "mm/damon: add optional debugging-purpose sanity checks" (SeongJae
Park)
Add some more developer-facing debug checks into DAMON core
- "mm/damon: test and document power-of-2 min_region_sz requirement"
(SeongJae Park)
An additional DAMON kunit test and makes some adjustments to the
addr_unit parameter handling
- "mm/damon/core: make passed_sample_intervals comparisons
overflow-safe" (SeongJae Park)
Fix a hard-to-hit time overflow issue in DAMON core
- "mm/damon: improve/fixup/update ratio calculation, test and
documentation" (SeongJae Park)
A batch of misc/minor improvements and fixups for DAMON
- "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
Hildenbrand)
Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code
movement was required.
- "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)
A somewhat random mix of fixups, recompression cleanups and
improvements in the zram code
- "mm/damon: support multiple goal-based quota tuning algorithms"
(SeongJae Park)
Extend DAMOS quotas goal auto-tuning to support multiple tuning
algorithms that users can select
- "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)
Fix the khugpaged sysfs handling so we no longer spam the logs with
reams of junk when starting/stopping khugepaged
- "mm: improve map count checks" (Lorenzo Stoakes)
Provide some cleanups and slight fixes in the mremap, mmap and vma
code
- "mm/damon: support addr_unit on default monitoring targets for
modules" (SeongJae Park)
Extend the use of DAMON core's addr_unit tunable
- "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)
Cleanups to khugepaged and is a base for Nico's planned khugepaged
mTHP support
- "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)
Code movement and cleanups in the memhotplug and sparsemem code
- "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
CONFIG_MIGRATION" (David Hildenbrand)
Rationalize some memhotplug Kconfig support
- "change young flag check functions to return bool" (Baolin Wang)
Cleanups to change all young flag check functions to return bool
- "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
Law and SeongJae Park)
Fix a few potential DAMON bugs
- "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
Stoakes)
Convert a lot of the existing use of the legacy vm_flags_t data type
to the new vma_flags_t type which replaces it. Mainly in the vma
code.
- "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)
Expand the mmap_prepare functionality, which is intended to replace
the deprecated f_op->mmap hook which has been the source of bugs and
security issues for some time. Cleanups, documentation, extension of
mmap_prepare into filesystem drivers
- "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)
Simplify and clean up zap_huge_pmd(). Additional cleanups around
vm_normal_folio_pmd() and the softleaf functionality are performed.
* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm: fix deferred split queue races during migration
mm/khugepaged: fix issue with tracking lock
mm/huge_memory: add and use has_deposited_pgtable()
mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
mm/huge_memory: separate out the folio part of zap_huge_pmd()
mm/huge_memory: use mm instead of tlb->mm
mm/huge_memory: remove unnecessary sanity checks
mm/huge_memory: deduplicate zap deposited table call
mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
mm/huge_memory: add a common exit path to zap_huge_pmd()
mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
mm/huge: avoid big else branch in zap_huge_pmd()
mm/huge_memory: simplify vma_is_specal_huge()
mm: on remap assert that input range within the proposed VMA
mm: add mmap_action_map_kernel_pages[_full]()
uio: replace deprecated mmap hook with mmap_prepare in uio_info
drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
mm: allow handling of stacked mmap_prepare hooks in more drivers
...
Diffstat (limited to 'mm/vmalloc.c')
| -rw-r--r-- | mm/vmalloc.c | 58 |
1 files changed, 29 insertions, 29 deletions
diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 61caa55a4402..b31b208f6ecb 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1068,14 +1068,8 @@ static BLOCKING_NOTIFIER_HEAD(vmap_notify_list); static void drain_vmap_area_work(struct work_struct *work); static DECLARE_WORK(drain_vmap_work, drain_vmap_area_work); -static __cacheline_aligned_in_smp atomic_long_t nr_vmalloc_pages; static __cacheline_aligned_in_smp atomic_long_t vmap_lazy_nr; -unsigned long vmalloc_nr_pages(void) -{ - return atomic_long_read(&nr_vmalloc_pages); -} - static struct vmap_area *__find_vmap_area(unsigned long addr, struct rb_root *root) { struct rb_node *n = root->rb_node; @@ -3189,7 +3183,7 @@ void __init vm_area_register_early(struct vm_struct *vm, size_t align) kasan_populate_early_vm_area_shadow(vm->addr, vm->size); } -static void clear_vm_uninitialized_flag(struct vm_struct *vm) +void clear_vm_uninitialized_flag(struct vm_struct *vm) { /* * Before removing VM_UNINITIALIZED, @@ -3465,9 +3459,6 @@ void vfree(const void *addr) if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS)) vm_reset_perms(vm); - /* All pages of vm should be charged to same memcg, so use first one. */ - if (vm->nr_pages && !(vm->flags & VM_MAP_PUT_PAGES)) - mod_memcg_page_state(vm->pages[0], MEMCG_VMALLOC, -vm->nr_pages); for (i = 0; i < vm->nr_pages; i++) { struct page *page = vm->pages[i]; @@ -3476,11 +3467,11 @@ void vfree(const void *addr) * High-order allocs for huge vmallocs are split, so * can be freed as an array of order-0 allocations */ + if (!(vm->flags & VM_MAP_PUT_PAGES)) + mod_lruvec_page_state(page, NR_VMALLOC, -1); __free_page(page); cond_resched(); } - if (!(vm->flags & VM_MAP_PUT_PAGES)) - atomic_long_sub(vm->nr_pages, &nr_vmalloc_pages); kvfree(vm->pages); kfree(vm); } @@ -3668,6 +3659,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid, continue; } + mod_lruvec_page_state(page, NR_VMALLOC, 1 << large_order); + split_page(page, large_order); for (i = 0; i < (1U << large_order); i++) pages[nr_allocated + i] = page + i; @@ -3688,6 +3681,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid, if (!order) { while (nr_allocated < nr_pages) { unsigned int nr, nr_pages_request; + int i; /* * A maximum allowed request is hard-coded and is 100 @@ -3711,6 +3705,9 @@ vm_area_alloc_pages(gfp_t gfp, int nid, nr_pages_request, pages + nr_allocated); + for (i = nr_allocated; i < nr_allocated + nr; i++) + mod_lruvec_page_state(pages[i], NR_VMALLOC, 1); + nr_allocated += nr; /* @@ -3735,6 +3732,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid, if (unlikely(!page)) break; + mod_lruvec_page_state(page, NR_VMALLOC, 1 << order); + /* * High-order allocations must be able to be treated as * independent small pages by callers (as they can with @@ -3798,6 +3797,8 @@ static void defer_vm_area_cleanup(struct vm_struct *area) * non-blocking (no __GFP_DIRECT_RECLAIM) - memalloc_noreclaim_save() * GFP_NOFS - memalloc_nofs_save() * GFP_NOIO - memalloc_noio_save() + * __GFP_RETRY_MAYFAIL, __GFP_NORETRY - memalloc_noreclaim_save() + * to prevent OOMs * * Returns a flag cookie to pair with restore. */ @@ -3806,7 +3807,8 @@ memalloc_apply_gfp_scope(gfp_t gfp_mask) { unsigned int flags = 0; - if (!gfpflags_allow_blocking(gfp_mask)) + if (!gfpflags_allow_blocking(gfp_mask) || + (gfp_mask & (__GFP_RETRY_MAYFAIL | __GFP_NORETRY))) flags = memalloc_noreclaim_save(); else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO) flags = memalloc_nofs_save(); @@ -3877,12 +3879,6 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, vmalloc_gfp_adjust(gfp_mask, page_order), node, page_order, nr_small_pages, area->pages); - atomic_long_add(area->nr_pages, &nr_vmalloc_pages); - /* All pages of vm should be charged to same memcg, so use first one. */ - if (gfp_mask & __GFP_ACCOUNT && area->nr_pages) - mod_memcg_page_state(area->pages[0], MEMCG_VMALLOC, - area->nr_pages); - /* * If not enough pages were obtained to accomplish an * allocation request, free them via vfree() if any. @@ -3901,7 +3897,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, if (!fatal_signal_pending(current) && page_order == 0) warn_alloc(gfp_mask, NULL, "vmalloc error: size %lu, failed to allocate pages", - area->nr_pages * PAGE_SIZE); + nr_small_pages * PAGE_SIZE); goto fail; } @@ -3940,7 +3936,8 @@ fail: * GFP_KERNEL_ACCOUNT. Xfs uses __GFP_NOLOCKDEP. */ #define GFP_VMALLOC_SUPPORTED (GFP_KERNEL | GFP_ATOMIC | GFP_NOWAIT |\ - __GFP_NOFAIL | __GFP_ZERO | __GFP_NORETRY |\ + __GFP_NOFAIL | __GFP_ZERO |\ + __GFP_NORETRY | __GFP_RETRY_MAYFAIL |\ GFP_NOFS | GFP_NOIO | GFP_KERNEL_ACCOUNT |\ GFP_USER | __GFP_NOLOCKDEP) @@ -3971,12 +3968,15 @@ static gfp_t vmalloc_fix_flags(gfp_t flags) * virtual range with protection @prot. * * Supported GFP classes: %GFP_KERNEL, %GFP_ATOMIC, %GFP_NOWAIT, - * %GFP_NOFS and %GFP_NOIO. Zone modifiers are not supported. + * %__GFP_RETRY_MAYFAIL, %__GFP_NORETRY, %GFP_NOFS and %GFP_NOIO. + * Zone modifiers are not supported. * Please note %GFP_ATOMIC and %GFP_NOWAIT are supported only * by __vmalloc(). * - * Retry modifiers: only %__GFP_NOFAIL is supported; %__GFP_NORETRY - * and %__GFP_RETRY_MAYFAIL are not supported. + * Retry modifiers: only %__GFP_NOFAIL is fully supported; + * %__GFP_NORETRY and %__GFP_RETRY_MAYFAIL are supported with limitation, + * i.e. page tables are allocated with NOWAIT semantic so they might fail + * under moderate memory pressure. * * %__GFP_NOWARN can be used to suppress failure messages. * @@ -4575,20 +4575,20 @@ finished: * @count: number of bytes to be read. * * This function checks that addr is a valid vmalloc'ed area, and - * copy data from that area to a given buffer. If the given memory range + * copies data from that area to a given iterator. If the given memory range * of [addr...addr+count) includes some valid address, data is copied to - * proper area of @buf. If there are memory holes, they'll be zero-filled. + * proper area of @iter. If there are memory holes, they'll be zero-filled. * IOREMAP area is treated as memory hole and no copy is done. * * If [addr...addr+count) doesn't includes any intersects with alive - * vm_struct area, returns 0. @buf should be kernel's buffer. + * vm_struct area, returns 0. * - * Note: In usual ops, vread() is never necessary because the caller + * Note: In usual ops, vread_iter() is never necessary because the caller * should know vmalloc() area is valid and can use memcpy(). * This is for routines which have to access vmalloc area without * any information, as /proc/kcore. * - * Return: number of bytes for which addr and buf should be increased + * Return: number of bytes for which addr and iter should be advanced * (same number as @count) or %0 if [addr...addr+count) doesn't * include any intersection with valid vmalloc area */ |
