diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-03-30 13:45:28 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-03-30 13:45:28 -0700 |
| commit | aa918db707fba507e85217961643281ee8dfb2ed (patch) | |
| tree | 86d529825cc85a1d309f33efba97016dd64c8529 /include/linux/bpf.h | |
| parent | 494e7fe591bf834d57c6607cdc26ab8873708aa7 (diff) | |
| parent | f90b474a35744b5d43009e4fab232e74a3024cae (diff) | |
Merge tag 'bpf_try_alloc_pages' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf try_alloc_pages() support from Alexei Starovoitov:
"The pull includes work from Sebastian, Vlastimil and myself with a lot
of help from Michal and Shakeel.
This is a first step towards making kmalloc reentrant to get rid of
slab wrappers: bpf_mem_alloc, kretprobe's objpool, etc. These patches
make page allocator safe from any context.
Vlastimil kicked off this effort at LSFMM 2024:
https://lwn.net/Articles/974138/
and we continued at LSFMM 2025:
https://lore.kernel.org/all/CAADnVQKfkGxudNUkcPJgwe3nTZ=xohnRshx9kLZBTmR_E1DFEg@mail.gmail.com/
Why:
SLAB wrappers bind memory to a particular subsystem making it
unavailable to the rest of the kernel. Some BPF maps in production
consume Gbytes of preallocated memory. Top 5 in Meta: 1.5G, 1.2G,
1.1G, 300M, 200M. Once we have kmalloc that works in any context BPF
map preallocation won't be necessary.
How:
Synchronous kmalloc/page alloc stack has multiple stages going from
fast to slow: cmpxchg16 -> slab_alloc -> new_slab -> alloc_pages ->
rmqueue_pcplist -> __rmqueue, where rmqueue_pcplist was already
relying on trylock.
This set changes rmqueue_bulk/rmqueue_buddy to attempt a trylock and
return ENOMEM if alloc_flags & ALLOC_TRYLOCK. It then wraps this
functionality into try_alloc_pages() helper. We make sure that the
logic is sane in PREEMPT_RT.
End result: try_alloc_pages()/free_pages_nolock() are safe to call
from any context.
try_kmalloc() for any context with similar trylock approach will
follow. It will use try_alloc_pages() when slab needs a new page.
Though such try_kmalloc/page_alloc() is an opportunistic allocator,
this design ensures that the probability of successful allocation of
small objects (up to one page in size) is high.
Even before we have try_kmalloc(), we already use try_alloc_pages() in
BPF arena implementation and it's going to be used more extensively in
BPF"
* tag 'bpf_try_alloc_pages' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
mm: Fix the flipped condition in gfpflags_allow_spinning()
bpf: Use try_alloc_pages() to allocate pages for bpf needs.
mm, bpf: Use memcg in try_alloc_pages().
memcg: Use trylock to access memcg stock_lock.
mm, bpf: Introduce free_pages_nolock()
mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
locking/local_lock: Introduce localtry_lock_t
Diffstat (limited to 'include/linux/bpf.h')
| -rw-r--r-- | include/linux/bpf.h | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index d67490dc3a2b..3f0cc89c0622 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2385,7 +2385,7 @@ int generic_map_delete_batch(struct bpf_map *map, struct bpf_map *bpf_map_get_curr_or_next(u32 *id); struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id); -int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid, +int bpf_map_alloc_pages(const struct bpf_map *map, int nid, unsigned long nr_pages, struct page **page_array); #ifdef CONFIG_MEMCG void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags, |
