<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/bpf/memalloc.c, branch v7.0-rc6</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>bpf: Register dtor for freeing special fields</title>
<updated>2026-02-27T23:39:00+00:00</updated>
<author>
<name>Kumar Kartikeya Dwivedi</name>
<email>memxor@gmail.com</email>
</author>
<published>2026-02-27T22:48:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1df97a7453eec80c1912c2d0360290a3970a7671'/>
<id>1df97a7453eec80c1912c2d0360290a3970a7671</id>
<content type='text'>
There is a race window where BPF hash map elements can leak special
fields if the program with access to the map value recreates these
special fields between the check_and_free_fields done on the map value
and its eventual return to the memory allocator.

Several ways were explored prior to this patch, most notably [0] tried
to use a poison value to reject attempts to recreate special fields for
map values that have been logically deleted but still accessible to BPF
programs (either while sitting in the free list or when reused). While
this approach works well for task work, timers, wq, etc., it is harder
to apply the idea to kptrs, which have a similar race and failure mode.

Instead, we change bpf_mem_alloc to allow registering destructor for
allocated elements, such that when they are returned to the allocator,
any special fields created while they were accessible to programs in the
mean time will be freed. If these values get reused, we do not free the
fields again before handing the element back. The special fields thus
may remain initialized while the map value sits in a free list.

When bpf_mem_alloc is retired in the future, a similar concept can be
introduced to kmalloc_nolock-backed kmem_cache, paired with the existing
idea of a constructor.

Note that the destructor registration happens in map_check_btf, after
the BTF record is populated and (at that point) avaiable for inspection
and duplication. Duplication is necessary since the freeing of embedded
bpf_mem_alloc can be decoupled from actual map lifetime due to logic
introduced to reduce the cost of rcu_barrier()s in mem alloc free path in
9f2c6e96c65e ("bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc.").

As such, once all callbacks are done, we must also free the duplicated
record. To remove dependency on the bpf_map itself, also stash the key
size of the map to obtain value from htab_elem long after the map is
gone.

  [0]: https://lore.kernel.org/bpf/20260216131341.1285427-1-mykyta.yatsenko5@gmail.com

Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr")
Fixes: 1bfbc267ec91 ("bpf: Enable bpf_timer and bpf_wq in any context")
Reported-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20260227224806.646888-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a race window where BPF hash map elements can leak special
fields if the program with access to the map value recreates these
special fields between the check_and_free_fields done on the map value
and its eventual return to the memory allocator.

Several ways were explored prior to this patch, most notably [0] tried
to use a poison value to reject attempts to recreate special fields for
map values that have been logically deleted but still accessible to BPF
programs (either while sitting in the free list or when reused). While
this approach works well for task work, timers, wq, etc., it is harder
to apply the idea to kptrs, which have a similar race and failure mode.

Instead, we change bpf_mem_alloc to allow registering destructor for
allocated elements, such that when they are returned to the allocator,
any special fields created while they were accessible to programs in the
mean time will be freed. If these values get reused, we do not free the
fields again before handing the element back. The special fields thus
may remain initialized while the map value sits in a free list.

When bpf_mem_alloc is retired in the future, a similar concept can be
introduced to kmalloc_nolock-backed kmem_cache, paired with the existing
idea of a constructor.

Note that the destructor registration happens in map_check_btf, after
the BTF record is populated and (at that point) avaiable for inspection
and duplication. Duplication is necessary since the freeing of embedded
bpf_mem_alloc can be decoupled from actual map lifetime due to logic
introduced to reduce the cost of rcu_barrier()s in mem alloc free path in
9f2c6e96c65e ("bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc.").

As such, once all callbacks are done, we must also free the duplicated
record. To remove dependency on the bpf_map itself, also stash the key
size of the map to obtain value from htab_elem long after the map is
gone.

  [0]: https://lore.kernel.org/bpf/20260216131341.1285427-1-mykyta.yatsenko5@gmail.com

Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr")
Fixes: 1bfbc267ec91 ("bpf: Enable bpf_timer and bpf_wq in any context")
Reported-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20260227224806.646888-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: replace use of system_unbound_wq with system_dfl_wq</title>
<updated>2025-09-08T17:04:37+00:00</updated>
<author>
<name>Marco Crivellari</name>
<email>marco.crivellari@suse.com</email>
</author>
<published>2025-09-05T08:53:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0409819a002165e9471113598323eb72bba17a53'/>
<id>0409819a002165e9471113598323eb72bba17a53</id>
<content type='text'>
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.

This lack of consistentcy cannot be addressed without refactoring the API.

system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.

Adding system_dfl_wq to encourage its use when unbound work should be used.

queue_work() / queue_delayed_work() / mod_delayed_work() will now use the
new unbound wq: whether the user still use the old wq a warn will be
printed along with a wq redirect to the new one.

The old system_unbound_wq will be kept for a few release cycles.

Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Link: https://lore.kernel.org/r/20250905085309.94596-3-marco.crivellari@suse.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.

This lack of consistentcy cannot be addressed without refactoring the API.

system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.

Adding system_dfl_wq to encourage its use when unbound work should be used.

queue_work() / queue_delayed_work() / mod_delayed_work() will now use the
new unbound wq: whether the user still use the old wq a warn will be
printed along with a wq redirect to the new one.

The old system_unbound_wq will be kept for a few release cycles.

Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Link: https://lore.kernel.org/r/20250905085309.94596-3-marco.crivellari@suse.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf</title>
<updated>2024-11-13T20:52:51+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2024-11-13T20:51:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=871438170326dc28125cb823d19c1d5c5304474d'/>
<id>871438170326dc28125cb823d19c1d5c5304474d</id>
<content type='text'>
Cross-merge bpf fixes after downstream PR.

In particular to bring the fix in
commit aa30eb3260b2 ("bpf: Force checkpoint when jmp history is too long").
The follow up verifier work depends on it.
And the fix in
commit 6801cf7890f2 ("selftests/bpf: Use -4095 as the bad address for bits iterator").
It's fixing instability of BPF CI on s390 arch.

No conflicts.

Adjacent changes in:
Auto-merging arch/Kconfig
Auto-merging kernel/bpf/helpers.c
Auto-merging kernel/bpf/memalloc.c
Auto-merging kernel/bpf/verifier.c
Auto-merging mm/slab_common.c

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Cross-merge bpf fixes after downstream PR.

In particular to bring the fix in
commit aa30eb3260b2 ("bpf: Force checkpoint when jmp history is too long").
The follow up verifier work depends on it.
And the fix in
commit 6801cf7890f2 ("selftests/bpf: Use -4095 as the bad address for bits iterator").
It's fixing instability of BPF CI on s390 arch.

No conflicts.

Adjacent changes in:
Auto-merging arch/Kconfig
Auto-merging kernel/bpf/helpers.c
Auto-merging kernel/bpf/memalloc.c
Auto-merging kernel/bpf/verifier.c
Auto-merging mm/slab_common.c

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Add bpf_mem_alloc_check_size() helper</title>
<updated>2024-10-30T19:13:46+00:00</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2024-10-30T10:05:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=62a898b07b83f6f407003d8a70f0827a5af08a59'/>
<id>62a898b07b83f6f407003d8a70f0827a5af08a59</id>
<content type='text'>
Introduce bpf_mem_alloc_check_size() to check whether the allocation
size exceeds the limitation for the kmalloc-equivalent allocator. The
upper limit for percpu allocation is LLIST_NODE_SZ bytes larger than
non-percpu allocation, so a percpu argument is added to the helper.

The helper will be used in the following patch to check whether the size
parameter passed to bpf_mem_alloc() is too big.

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20241030100516.3633640-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce bpf_mem_alloc_check_size() to check whether the allocation
size exceeds the limitation for the kmalloc-equivalent allocator. The
upper limit for percpu allocation is LLIST_NODE_SZ bytes larger than
non-percpu allocation, so a percpu argument is added to the helper.

The helper will be used in the following patch to check whether the size
parameter passed to bpf_mem_alloc() is too big.

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20241030100516.3633640-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Call kfree(obj) only once in free_one()</title>
<updated>2024-10-04T00:47:35+00:00</updated>
<author>
<name>Markus Elfring</name>
<email>elfring@users.sourceforge.net</email>
</author>
<published>2024-09-26T11:30:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=40f34d6f12e292875b8027ec66038cabb5a317f6'/>
<id>40f34d6f12e292875b8027ec66038cabb5a317f6</id>
<content type='text'>
A kfree() call is always used at the end of this function implementation.
Thus specify such a function call only once instead of duplicating it
in a previous if branch.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring &lt;elfring@users.sourceforge.net&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Link: https://lore.kernel.org/bpf/08987123-668c-40f3-a8ee-c3038d94f069@web.de
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A kfree() call is always used at the end of this function implementation.
Thus specify such a function call only once instead of duplicating it
in a previous if branch.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring &lt;elfring@users.sourceforge.net&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Link: https://lore.kernel.org/bpf/08987123-668c-40f3-a8ee-c3038d94f069@web.de
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Fix percpu address space issues</title>
<updated>2024-08-22T15:01:50+00:00</updated>
<author>
<name>Uros Bizjak</name>
<email>ubizjak@gmail.com</email>
</author>
<published>2024-08-11T16:13:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6d641ca50d7ec7d5e4e889c3f8ea22afebc2a403'/>
<id>6d641ca50d7ec7d5e4e889c3f8ea22afebc2a403</id>
<content type='text'>
In arraymap.c:

In bpf_array_map_seq_start() and bpf_array_map_seq_next()
cast return values from the __percpu address space to
the generic address space via uintptr_t [1].

Correct the declaration of pptr pointer in __bpf_array_map_seq_show()
to void __percpu * and cast the value from the generic address
space to the __percpu address space via uintptr_t [1].

In hashtab.c:

Assign the return value from bpf_mem_cache_alloc() to void pointer
and cast the value to void __percpu ** (void pointer to percpu void
pointer) before dereferencing.

In memalloc.c:

Explicitly declare __percpu variables.

Cast obj to void __percpu **.

In helpers.c:

Cast ptr in BPF_CALL_1 and BPF_CALL_2 from generic address space
to __percpu address space via const uintptr_t [1].

Found by GCC's named address space checks.

There were no changes in the resulting object files.

[1] https://sparse.docs.kernel.org/en/latest/annotations.html#address-space-name

Signed-off-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Cc: Martin KaFai Lau &lt;martin.lau@linux.dev&gt;
Cc: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Cc: KP Singh &lt;kpsingh@kernel.org&gt;
Cc: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Cc: Hao Luo &lt;haoluo@google.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Link: https://lore.kernel.org/r/20240811161414.56744-1-ubizjak@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In arraymap.c:

In bpf_array_map_seq_start() and bpf_array_map_seq_next()
cast return values from the __percpu address space to
the generic address space via uintptr_t [1].

Correct the declaration of pptr pointer in __bpf_array_map_seq_show()
to void __percpu * and cast the value from the generic address
space to the __percpu address space via uintptr_t [1].

In hashtab.c:

Assign the return value from bpf_mem_cache_alloc() to void pointer
and cast the value to void __percpu ** (void pointer to percpu void
pointer) before dereferencing.

In memalloc.c:

Explicitly declare __percpu variables.

Cast obj to void __percpu **.

In helpers.c:

Cast ptr in BPF_CALL_1 and BPF_CALL_2 from generic address space
to __percpu address space via const uintptr_t [1].

Found by GCC's named address space checks.

There were no changes in the resulting object files.

[1] https://sparse.docs.kernel.org/en/latest/annotations.html#address-space-name

Signed-off-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Cc: Martin KaFai Lau &lt;martin.lau@linux.dev&gt;
Cc: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Cc: John Fastabend &lt;john.fastabend@gmail.com&gt;
Cc: KP Singh &lt;kpsingh@kernel.org&gt;
Cc: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Cc: Hao Luo &lt;haoluo@google.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Link: https://lore.kernel.org/r/20240811161414.56744-1-ubizjak@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: remove CONFIG_MEMCG_KMEM</title>
<updated>2024-07-10T19:14:54+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2024-07-01T15:31:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3a3b7fec3974f954600844e41d773c00857ef48a'/>
<id>3a3b7fec3974f954600844e41d773c00857ef48a</id>
<content type='text'>
CONFIG_MEMCG_KMEM used to be a user-visible option for whether slab
tracking is enabled.  It has been default-enabled and equivalent to
CONFIG_MEMCG for almost a decade.  We've only grown more kernel memory
accounting sites since, and there is no imaginable cgroup usecase going
forward that wants to track user pages but not the multitude of
user-drivable kernel allocations.

Link: https://lkml.kernel.org/r/20240701153148.452230-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
CONFIG_MEMCG_KMEM used to be a user-visible option for whether slab
tracking is enabled.  It has been default-enabled and equivalent to
CONFIG_MEMCG for almost a decade.  We've only grown more kernel memory
accounting sites since, and there is no imaginable cgroup usecase going
forward that wants to track user pages but not the multitude of
user-drivable kernel allocations.

Link: https://lkml.kernel.org/r/20240701153148.452230-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: memcg: add NULL check to obj_cgroup_put()</title>
<updated>2024-04-26T03:55:43+00:00</updated>
<author>
<name>Yosry Ahmed</name>
<email>yosryahmed@google.com</email>
</author>
<published>2024-03-16T01:58:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=91b71e78b8e478e1a9af36b15ff1d38810572866'/>
<id>91b71e78b8e478e1a9af36b15ff1d38810572866</id>
<content type='text'>
9 out of 16 callers perform a NULL check before calling obj_cgroup_put(). 
Move the NULL check in the function, similar to mem_cgroup_put().  The
unlikely() NULL check in current_objcg_update() was left alone to avoid
dropping the unlikey() annotation as this a fast path.

Link: https://lkml.kernel.org/r/20240316015803.2777252-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
9 out of 16 callers perform a NULL check before calling obj_cgroup_put(). 
Move the NULL check in the function, similar to mem_cgroup_put().  The
unlikely() NULL check in current_objcg_update() was left alone to avoid
dropping the unlikey() annotation as this a fast path.

Link: https://lkml.kernel.org/r/20240316015803.2777252-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Remove unnecessary cpu == 0 check in memalloc</title>
<updated>2024-01-04T18:18:14+00:00</updated>
<author>
<name>Yonghong Song</name>
<email>yonghong.song@linux.dev</email>
</author>
<published>2024-01-04T16:57:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9ddf872b47e3ac8f27dbfc4a4737a976c7588de6'/>
<id>9ddf872b47e3ac8f27dbfc4a4737a976c7588de6</id>
<content type='text'>
After merging the patch set [1] to reduce memory usage
for bpf_global_percpu_ma, Alexei found a redundant check (cpu == 0)
in function bpf_mem_alloc_percpu_unit_init() ([2]).
Indeed, the check is unnecessary since c-&gt;unit_size will
be all NULL or all non-NULL for all cpus before
for_each_possible_cpu() loop.
Removing the check makes code less confusing.

  [1] https://lore.kernel.org/all/20231222031729.1287957-1-yonghong.song@linux.dev/
  [2] https://lore.kernel.org/all/20231222031745.1289082-1-yonghong.song@linux.dev/

Signed-off-by: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Link: https://lore.kernel.org/r/20240104165744.702239-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After merging the patch set [1] to reduce memory usage
for bpf_global_percpu_ma, Alexei found a redundant check (cpu == 0)
in function bpf_mem_alloc_percpu_unit_init() ([2]).
Indeed, the check is unnecessary since c-&gt;unit_size will
be all NULL or all non-NULL for all cpus before
for_each_possible_cpu() loop.
Removing the check makes code less confusing.

  [1] https://lore.kernel.org/all/20231222031729.1287957-1-yonghong.song@linux.dev/
  [2] https://lore.kernel.org/all/20231222031745.1289082-1-yonghong.song@linux.dev/

Signed-off-by: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Link: https://lore.kernel.org/r/20240104165744.702239-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Use smaller low/high marks for percpu allocation</title>
<updated>2024-01-04T05:08:25+00:00</updated>
<author>
<name>Yonghong Song</name>
<email>yonghong.song@linux.dev</email>
</author>
<published>2023-12-22T03:17:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0e2ba9f96f9b82893ba19170ae48d46003f8ef44'/>
<id>0e2ba9f96f9b82893ba19170ae48d46003f8ef44</id>
<content type='text'>
Currently, refill low/high marks are set with the assumption
of normal non-percpu memory allocation. For example, for
an allocation size 256, for non-percpu memory allocation,
low mark is 32 and high mark is 96, resulting in the
batch allocation of 48 elements and the allocated memory
will be 48 * 256 = 12KB for this particular cpu.
Assuming an 128-cpu system, the total memory consumption
across all cpus will be 12K * 128 = 1.5MB memory.

This might be okay for non-percpu allocation, but may not be
good for percpu allocation, which will consume 1.5MB * 128 = 192MB
memory in the worst case if every cpu has a chance of memory
allocation.

In practice, percpu allocation is very rare compared to
non-percpu allocation. So let us have smaller low/high marks
which can avoid unnecessary memory consumption.

Signed-off-by: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Acked-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20231222031755.1289671-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, refill low/high marks are set with the assumption
of normal non-percpu memory allocation. For example, for
an allocation size 256, for non-percpu memory allocation,
low mark is 32 and high mark is 96, resulting in the
batch allocation of 48 elements and the allocated memory
will be 48 * 256 = 12KB for this particular cpu.
Assuming an 128-cpu system, the total memory consumption
across all cpus will be 12K * 128 = 1.5MB memory.

This might be okay for non-percpu allocation, but may not be
good for percpu allocation, which will consume 1.5MB * 128 = 192MB
memory in the worst case if every cpu has a chance of memory
allocation.

In practice, percpu allocation is very rare compared to
non-percpu allocation. So let us have smaller low/high marks
which can avoid unnecessary memory consumption.

Signed-off-by: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Acked-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20231222031755.1289671-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
