diff options
| author | Vlastimil Babka <vbabka@suse.cz> | 2025-09-26 14:51:17 +0200 |
|---|---|---|
| committer | Vlastimil Babka <vbabka@suse.cz> | 2025-09-29 09:46:17 +0200 |
| commit | b9120619246d733a27e5e93c29e86f2e0401cfc5 (patch) | |
| tree | 9c13b377eb5e64ae2f7bf8a43422f65255b3dda1 /tools/include/linux | |
| parent | f7381b9116407ba2a429977c80ff8df953ea9354 (diff) | |
| parent | 719a42e563bb087758500e43e67a57b27f303c4c (diff) | |
Merge series "SLUB percpu sheaves"
This series adds an opt-in percpu array-based caching layer to SLUB.
It has evolved to a state where kmem caches with sheaves are compatible
with all SLUB features (slub_debug, SLUB_TINY, NUMA locality
considerations). The plan is therefore that it will be later enabled for
all kmem caches and replace the complicated cpu (partial) slabs code.
Note the name "sheaf" was invented by Matthew Wilcox so we don't call
the arrays magazines like the original Bonwick paper. The per-NUMA-node
cache of sheaves is thus called "barn".
This caching may seem similar to the arrays we had in SLAB, but there
are some important differences:
- deals differently with NUMA locality of freed objects, thus there are
no per-node "shared" arrays (with possible lock contention) and no
"alien" arrays that would need periodical flushing
- instead, freeing remote objects (which is rare) bypasses the sheaves
- percpu sheaves thus contain only local objects (modulo rare races
and local node exhaustion)
- NUMA restricted allocations and strict_numa mode is still honoured
- improves kfree_rcu() handling by reusing whole sheaves
- there is an API for obtaining a preallocated sheaf that can be used
for guaranteed and efficient allocations in a restricted context, when
the upper bound for needed objects is known but rarely reached
- opt-in, not used for every cache (for now)
The motivation comes mainly from the ongoing work related to VMA locking
scalability and the related maple tree operations. This is why VMA and
maple nodes caches are sheaf-enabled in the patchset.
A sheaf-enabled cache has the following expected advantages:
- Cheaper fast paths. For allocations, instead of local double cmpxchg,
thanks to local_trylock() it becomes a preempt_disable() and no atomic
operations. Same for freeing, which is otherwise a local double cmpxchg
only for short term allocations (so the same slab is still active on the
same cpu when freeing the object) and a more costly locked double
cmpxchg otherwise.
- kfree_rcu() batching and recycling. kfree_rcu() will put objects to a
separate percpu sheaf and only submit the whole sheaf to call_rcu()
when full. After the grace period, the sheaf can be used for
allocations, which is more efficient than freeing and reallocating
individual slab objects (even with the batching done by kfree_rcu()
implementation itself). In case only some cpus are allowed to handle rcu
callbacks, the sheaf can still be made available to other cpus on the
same node via the shared barn. The maple_node cache uses kfree_rcu() and
thus can benefit from this.
Note: this path is currently limited to !PREEMPT_RT
- Preallocation support. A prefilled sheaf can be privately borrowed to
perform a short term operation that is not allowed to block in the
middle and may need to allocate some objects. If an upper bound (worst
case) for the number of allocations is known, but only much fewer
allocations actually needed on average, borrowing and returning a sheaf
is much more efficient then a bulk allocation for the worst case
followed by a bulk free of the many unused objects. Maple tree write
operations should benefit from this.
- Compatibility with slub_debug. When slub_debug is enabled for a cache,
we simply don't create the percpu sheaves so that the debugging hooks
(at the node partial list slowpaths) are reached as before. The same
thing is done for CONFIG_SLUB_TINY. Sheaf preallocation still works by
reusing the (ineffective) paths for requests exceeding the cache's
sheaf_capacity. This is in line with the existing approach where
debugging bypasses the fast paths and SLUB_TINY preferes memory
savings over performance.
The above is adapted from the cover letter [1], which contains also
in-kernel microbenchmark results showing the lower overhead of sheaves.
Results from Suren Baghdasaryan [2] using a mmap/munmap microbenchmark
also show improvements.
Results from Sudarsan Mahendran [3] using will-it-scale show both
benefits and regressions, probably due to overall noisiness of those
tests.
Link: https://lore.kernel.org/all/20250910-slub-percpu-caches-v8-0-ca3099d8352c@suse.cz/ [1]
Link: https://lore.kernel.org/all/CAJuCfpEQ%3DRUgcAvRzE5jRrhhFpkm8E2PpBK9e9GhK26ZaJQt%3DQ@mail.gmail.com/ [2]
Link: https://lore.kernel.org/all/20250913000935.1021068-1-sudarsanm@google.com/ [3]
Diffstat (limited to 'tools/include/linux')
| -rw-r--r-- | tools/include/linux/slab.h | 165 |
1 files changed, 161 insertions, 4 deletions
diff --git a/tools/include/linux/slab.h b/tools/include/linux/slab.h index c87051e2b26f..94937a699402 100644 --- a/tools/include/linux/slab.h +++ b/tools/include/linux/slab.h @@ -4,11 +4,31 @@ #include <linux/types.h> #include <linux/gfp.h> +#include <pthread.h> -#define SLAB_PANIC 2 #define SLAB_RECLAIM_ACCOUNT 0x00020000UL /* Objects are reclaimable */ #define kzalloc_node(size, flags, node) kmalloc(size, flags) +enum _slab_flag_bits { + _SLAB_KMALLOC, + _SLAB_HWCACHE_ALIGN, + _SLAB_PANIC, + _SLAB_TYPESAFE_BY_RCU, + _SLAB_ACCOUNT, + _SLAB_FLAGS_LAST_BIT +}; + +#define __SLAB_FLAG_BIT(nr) ((unsigned int __force)(1U << (nr))) +#define __SLAB_FLAG_UNUSED ((unsigned int __force)(0U)) + +#define SLAB_HWCACHE_ALIGN __SLAB_FLAG_BIT(_SLAB_HWCACHE_ALIGN) +#define SLAB_PANIC __SLAB_FLAG_BIT(_SLAB_PANIC) +#define SLAB_TYPESAFE_BY_RCU __SLAB_FLAG_BIT(_SLAB_TYPESAFE_BY_RCU) +#ifdef CONFIG_MEMCG +# define SLAB_ACCOUNT __SLAB_FLAG_BIT(_SLAB_ACCOUNT) +#else +# define SLAB_ACCOUNT __SLAB_FLAG_UNUSED +#endif void *kmalloc(size_t size, gfp_t gfp); void kfree(void *p); @@ -23,6 +43,98 @@ enum slab_state { FULL }; +struct kmem_cache { + pthread_mutex_t lock; + unsigned int size; + unsigned int align; + unsigned int sheaf_capacity; + int nr_objs; + void *objs; + void (*ctor)(void *); + bool non_kernel_enabled; + unsigned int non_kernel; + unsigned long nr_allocated; + unsigned long nr_tallocated; + bool exec_callback; + void (*callback)(void *); + void *private; +}; + +struct kmem_cache_args { + /** + * @align: The required alignment for the objects. + * + * %0 means no specific alignment is requested. + */ + unsigned int align; + /** + * @sheaf_capacity: The maximum size of the sheaf. + */ + unsigned int sheaf_capacity; + /** + * @useroffset: Usercopy region offset. + * + * %0 is a valid offset, when @usersize is non-%0 + */ + unsigned int useroffset; + /** + * @usersize: Usercopy region size. + * + * %0 means no usercopy region is specified. + */ + unsigned int usersize; + /** + * @freeptr_offset: Custom offset for the free pointer + * in &SLAB_TYPESAFE_BY_RCU caches + * + * By default &SLAB_TYPESAFE_BY_RCU caches place the free pointer + * outside of the object. This might cause the object to grow in size. + * Cache creators that have a reason to avoid this can specify a custom + * free pointer offset in their struct where the free pointer will be + * placed. + * + * Note that placing the free pointer inside the object requires the + * caller to ensure that no fields are invalidated that are required to + * guard against object recycling (See &SLAB_TYPESAFE_BY_RCU for + * details). + * + * Using %0 as a value for @freeptr_offset is valid. If @freeptr_offset + * is specified, %use_freeptr_offset must be set %true. + * + * Note that @ctor currently isn't supported with custom free pointers + * as a @ctor requires an external free pointer. + */ + unsigned int freeptr_offset; + /** + * @use_freeptr_offset: Whether a @freeptr_offset is used. + */ + bool use_freeptr_offset; + /** + * @ctor: A constructor for the objects. + * + * The constructor is invoked for each object in a newly allocated slab + * page. It is the cache user's responsibility to free object in the + * same state as after calling the constructor, or deal appropriately + * with any differences between a freshly constructed and a reallocated + * object. + * + * %NULL means no constructor. + */ + void (*ctor)(void *); +}; + +struct slab_sheaf { + union { + struct list_head barn_list; + /* only used for prefilled sheafs */ + unsigned int capacity; + }; + struct kmem_cache *cache; + unsigned int size; + int node; /* only used for rcu_sheaf */ + void *objects[]; +}; + static inline void *kzalloc(size_t size, gfp_t gfp) { return kmalloc(size, gfp | __GFP_ZERO); @@ -37,12 +149,57 @@ static inline void *kmem_cache_alloc(struct kmem_cache *cachep, int flags) } void kmem_cache_free(struct kmem_cache *cachep, void *objp); -struct kmem_cache *kmem_cache_create(const char *name, unsigned int size, - unsigned int align, unsigned int flags, - void (*ctor)(void *)); + +struct kmem_cache * +__kmem_cache_create_args(const char *name, unsigned int size, + struct kmem_cache_args *args, unsigned int flags); + +/* If NULL is passed for @args, use this variant with default arguments. */ +static inline struct kmem_cache * +__kmem_cache_default_args(const char *name, unsigned int size, + struct kmem_cache_args *args, unsigned int flags) +{ + struct kmem_cache_args kmem_default_args = {}; + + return __kmem_cache_create_args(name, size, &kmem_default_args, flags); +} + +static inline struct kmem_cache * +__kmem_cache_create(const char *name, unsigned int size, unsigned int align, + unsigned int flags, void (*ctor)(void *)) +{ + struct kmem_cache_args kmem_args = { + .align = align, + .ctor = ctor, + }; + + return __kmem_cache_create_args(name, size, &kmem_args, flags); +} + +#define kmem_cache_create(__name, __object_size, __args, ...) \ + _Generic((__args), \ + struct kmem_cache_args *: __kmem_cache_create_args, \ + void *: __kmem_cache_default_args, \ + default: __kmem_cache_create)(__name, __object_size, __args, __VA_ARGS__) void kmem_cache_free_bulk(struct kmem_cache *cachep, size_t size, void **list); int kmem_cache_alloc_bulk(struct kmem_cache *cachep, gfp_t gfp, size_t size, void **list); +struct slab_sheaf * +kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size); + +void * +kmem_cache_alloc_from_sheaf(struct kmem_cache *s, gfp_t gfp, + struct slab_sheaf *sheaf); + +void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp, + struct slab_sheaf *sheaf); +int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp, + struct slab_sheaf **sheafp, unsigned int size); + +static inline unsigned int kmem_cache_sheaf_size(struct slab_sheaf *sheaf) +{ + return sheaf->size; +} #endif /* _TOOLS_SLAB_H */ |
