| author | Roman Gushchin <roman.gushchin@linux.dev> | 2025-12-22 20:41:54 -0800 |
|---|---|---|
| committer | Alexei Starovoitov <ast@kernel.org> | 2025-12-22 22:20:22 -0800 |
| commit | 99430ab8b804c26b8a0dec93fcbfe75469f3edc7 (patch) | |
| tree | 969eab171627f9484456af52c2fdb6419ab17a4f /include | |
| parent | 5c7db3239c9fbe3c62cb0d89b64959ea23af2de9 (diff) | |
mm: introduce BPF kfuncs to access memcg statistics and events
Introduce BPF kfuncs to conveniently access memcg data:
- bpf_mem_cgroup_vm_events(),
- bpf_mem_cgroup_memory_events(),
- bpf_mem_cgroup_usage(),
- bpf_mem_cgroup_page_state(),
- bpf_mem_cgroup_flush_stats().
These functions are useful for implementing BPF OOM policies, but
can also be used to speed up access to memcg data. Reading the same
data through cgroupfs is much more expensive, roughly 5x, mostly
because the data has to be converted to text and back.
JP Kobryn:
An experiment was set up to compare the performance of a program that
uses the traditional method of reading memory.stat vs a program using
the new kfuncs. The control program opens up the root memory.stat file
and for 1M iterations reads, converts the string values to numeric data,
then seeks back to the beginning. The experimental program sets up the
requisite libbpf objects and for 1M iterations invokes a bpf program
which uses the kfuncs to fetch all available stats for node_stat_item,
memcg_stat_item, and vm_event_item types.
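The control side described above can be sketched roughly as follows. This is not the benchmark program itself, which the commit message does not include; `parse_memory_stat()` and `poll_memory_stat()` are hypothetical helper names, and the parsing assumes the flat "name value" line format of memory.stat.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * One pass over a memory.stat-style stream ("name value" per line).
 * Returns the number of entries parsed; if key is found, its value is
 * stored in *out. Hypothetical helper, not taken from the benchmark.
 */
static int parse_memory_stat(FILE *f, const char *key, unsigned long long *out)
{
	char line[256], name[128];
	unsigned long long val;
	int parsed = 0;

	assert(f != NULL);
	while (fgets(line, sizeof(line), f)) {
		/* text -> number conversion: the cost the kfuncs avoid */
		if (sscanf(line, "%127s %llu", name, &val) != 2)
			continue;	/* skip lines that do not match */
		if (key && out && strcmp(name, key) == 0)
			*out = val;
		parsed++;
	}
	return parsed;
}

/* Repeat the read / convert / seek-back cycle for a number of iterations. */
static long poll_memory_stat(FILE *f, long iterations)
{
	unsigned long long anon = 0;
	long total = 0;

	for (long i = 0; i < iterations; i++) {
		total += parse_memory_stat(f, "anon", &anon);
		rewind(f);	/* seek back to the beginning */
	}
	return total;
}
```

A caller would open the root memory.stat file (commonly `/sys/fs/cgroup/memory.stat` on a cgroup v2 system, which is an assumption about the mount point) with `fopen()` and pass the stream to `poll_memory_stat()`. The `fgets()`/`sscanf()` pair corresponds to the libc symbols that dominate the control profile.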
The results showed a significant perf benefit on the experimental side:
total elapsed (real) time was about 93% lower than on the control side.
Kernel-mode (sys) time was reduced by about 80%, while user-mode time
was reduced by over 99%.
control: elapsed time
real 0m38.318s
user 0m25.131s
sys 0m13.070s
experiment: elapsed time
real 0m2.789s
user 0m0.187s
sys 0m2.512s
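The percentages quoted in the results can be re-derived from the two sets of timings. The helpers below are a small sketch for checking that arithmetic; `reduction_pct()` and `speedup()` are illustrative names, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage of time saved when going from `before` to `after` seconds. */
static double reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/* How many times faster `after` is compared with `before`. */
static double speedup(double before, double after)
{
	assert(after > 0.0);
	return before / after;
}

/* Print the reductions for the real/user/sys timings listed above. */
static void print_summary(void)
{
	printf("real: %.1f%%\n", reduction_pct(38.318, 2.789));
	printf("user: %.1f%%\n", reduction_pct(25.131, 0.187));
	printf("sys:  %.1f%%\n", reduction_pct(13.070, 2.512));
	printf("sys speedup: %.1fx\n", speedup(13.070, 2.512));
}
```

With these inputs the reductions come out at roughly 92.7% (real), 99.3% (user) and 80.8% (sys). The kernel-mode ratio is about 5.2x, which is in line with the "roughly 5x" figure in the commit description.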
control: perf data
33.43% a.out libc.so.6 [.] __vfscanf_internal
6.88% a.out [kernel.kallsyms] [k] vsnprintf
6.33% a.out libc.so.6 [.] _IO_fgets
5.51% a.out [kernel.kallsyms] [k] format_decode
4.31% a.out libc.so.6 [.] __GI_____strtoull_l_internal
3.78% a.out [kernel.kallsyms] [k] string
3.53% a.out [kernel.kallsyms] [k] number
2.71% a.out libc.so.6 [.] _IO_sputbackc
2.41% a.out [kernel.kallsyms] [k] strlen
1.98% a.out a.out [.] main
1.70% a.out libc.so.6 [.] _IO_getline_info
1.51% a.out libc.so.6 [.] __isoc99_sscanf
1.47% a.out [kernel.kallsyms] [k] memory_stat_format
1.47% a.out [kernel.kallsyms] [k] memcpy_orig
1.41% a.out [kernel.kallsyms] [k] seq_buf_printf
experiment: perf data
10.55% memcgstat bpf_prog_..._query [k] bpf_prog_16aab2f19fa982a7_query
6.90% memcgstat [kernel.kallsyms] [k] memcg_page_state_output
3.55% memcgstat [kernel.kallsyms] [k] _raw_spin_lock
3.12% memcgstat [kernel.kallsyms] [k] memcg_events
2.87% memcgstat [kernel.kallsyms] [k] __memcg_slab_post_alloc_hook
2.73% memcgstat [kernel.kallsyms] [k] kmem_cache_free
2.70% memcgstat [kernel.kallsyms] [k] entry_SYSRETQ_unsafe_stack
2.25% memcgstat [kernel.kallsyms] [k] __memcg_slab_free_hook
2.06% memcgstat [kernel.kallsyms] [k] get_page_from_freelist
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Co-developed-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Link: https://lore.kernel.org/r/20251223044156.208250-5-roman.gushchin@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Diffstat (limited to 'include')
| -rw-r--r-- | include/linux/memcontrol.h | 14 |
1 files changed, 14 insertions, 0 deletions
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7bef427d5a82..6a5d65487b70 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -949,8 +949,12 @@ static inline void mod_memcg_page_state(struct page *page,
 	rcu_read_unlock();
 }
 
+unsigned long memcg_events(struct mem_cgroup *memcg, int event);
+unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
 unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx);
 unsigned long memcg_page_state_output(struct mem_cgroup *memcg, int item);
+bool memcg_stat_item_valid(int idx);
+bool memcg_vm_event_item_valid(enum vm_event_item idx);
 unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx);
 unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 				      enum node_stat_item idx);
@@ -1379,6 +1383,16 @@ static inline unsigned long memcg_page_state_output(struct mem_cgroup *memcg, in
 	return 0;
 }
 
+static inline bool memcg_stat_item_valid(int idx)
+{
+	return false;
+}
+
+static inline bool memcg_vm_event_item_valid(enum vm_event_item idx)
+{
+	return false;
+}
+
 static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
 					      enum node_stat_item idx)
 {
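The second hunk adds `static inline` versions of the two `*_valid()` helpers that always return false. They appear to be the fallback definitions for builds without memcg support, an assumption based on the neighbouring `return 0;` stub rather than anything shown in this diff. The general pattern is sketched below as a standalone user-space illustration. `HAVE_FEATURE`, `stat_item_valid()`, `page_state()` and `read_stat()` are all hypothetical names, not kernel symbols.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * HAVE_FEATURE is a hypothetical stand-in for a build-time option.
 * When it is defined, real implementations are expected elsewhere;
 * when it is not, inline stubs keep callers compiling unchanged.
 */
#ifdef HAVE_FEATURE
bool stat_item_valid(int idx);
unsigned long page_state(int idx);
#else
static inline bool stat_item_valid(int idx)
{
	(void)idx;
	return false;	/* no item is valid when the feature is absent */
}

static inline unsigned long page_state(int idx)
{
	(void)idx;
	return 0;
}
#endif

/*
 * Caller written once for both configurations: validate the index
 * first, and only then read the counter. Returns false and leaves
 * *out untouched if the index is rejected.
 */
static inline bool read_stat(int idx, unsigned long *out)
{
	if (!stat_item_valid(idx))
		return false;
	*out = page_state(idx);
	return true;
}
```

Built without `HAVE_FEATURE`, `read_stat()` rejects every index, so a caller needs no `#ifdef` of its own.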
