| author | Linus Torvalds <torvalds@linux-foundation.org> | 2026-02-10 11:26:21 -0800 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2026-02-10 11:26:21 -0800 |
| commit | f17b474e36647c23801ef8fdaf2255ab66dd2973 (patch) | |
| tree | 7fbaa4d93d71d72eb1cf8f61201eb42881daaeb0 /Documentation | |
| parent | a7423e6ea2f8f6f453de79213c26f7a36c86d9a2 (diff) | |
| parent | db975debcb8c4cd367a78811bc1ba84c83f854bd (diff) | |
Merge tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Support associating BPF program with struct_ops (Amery Hung)
 - Switch BPF local storage to rqspinlock and remove recursion detection
   counters which were causing false positives (Amery Hung)
 - Fix live registers marking for indirect jumps (Anton Protopopov)
 - Introduce execution context detection BPF helpers (Changwoo Min)
 - Improve verifier precision for 32bit sign extension pattern
   (Cupertino Miranda)
 - Optimize BTF type lookup by sorting vmlinux BTF and doing binary
   search (Donglin Peng)
 - Allow states pruning for misc/invalid slots in iterator loops (Eduard
   Zingerman)
 - In preparation for ASAN support in BPF arenas teach libbpf to move
   global BPF variables to the end of the region and enable arena kfuncs
   while holding locks (Emil Tsalapatis)
 - Introduce support for implicit arguments in kfuncs and migrate a
   number of them to new API. This is a prerequisite for cgroup
   sub-schedulers in sched-ext (Ihor Solodrai)
 - Fix incorrect copied_seq calculation in sockmap (Jiayuan Chen)
 - Fix ORC stack unwind from kprobe_multi (Jiri Olsa)
 - Speed up fentry attach by using single ftrace direct ops in BPF
   trampolines (Jiri Olsa)
 - Require frozen map for calculating map hash (KP Singh)
 - Fix lock entry creation in TAS fallback in rqspinlock (Kumar
   Kartikeya Dwivedi)
 - Allow user space to select cpu in lookup/update operations on per-cpu
   array and hash maps (Leon Hwang)
 - Make kfuncs return trusted pointers by default (Matt Bobrowski)
 - Introduce "fsession" support where single BPF program is executed
   upon entry and exit from traced kernel function (Menglong Dong)
 - Allow bpf_timer and bpf_wq use in all programs types (Mykyta
   Yatsenko, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Alexei
   Starovoitov)
 - Make KF_TRUSTED_ARGS the default for all kfuncs and clean up their
   definition across the tree (Puranjay Mohan)
 - Allow BPF arena calls from non-sleepable context (Puranjay Mohan)
 - Improve register id comparison logic in the verifier and extend
   linked registers with negative offsets (Puranjay Mohan)
 - In preparation for BPF-OOM introduce kfuncs to access memcg events
   (Roman Gushchin)
 - Use CFI compatible destructor kfunc type (Sami Tolvanen)
 - Add bitwise tracking for BPF_END in the verifier (Tianci Cao)
 - Add range tracking for BPF_DIV and BPF_MOD in the verifier (Yazhou
   Tang)
 - Make BPF selftests work with 64k page size (Yonghong Song)

* tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (268 commits)
  selftests/bpf: Fix outdated test on storage->smap
  selftests/bpf: Choose another percpu variable in bpf for btf_dump test
  selftests/bpf: Remove test_task_storage_map_stress_lookup
  selftests/bpf: Update task_local_storage/task_storage_nodeadlock test
  selftests/bpf: Update task_local_storage/recursion test
  selftests/bpf: Update sk_storage_omem_uncharge test
  bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy}
  bpf: Support lockless unlink when freeing map or local storage
  bpf: Prepare for bpf_selem_unlink_nofail()
  bpf: Remove unused percpu counter from bpf_local_storage_map_free
  bpf: Remove cgroup local storage percpu counter
  bpf: Remove task local storage percpu counter
  bpf: Change local_storage->lock and b->lock to rqspinlock
  bpf: Convert bpf_selem_unlink to failable
  bpf: Convert bpf_selem_link_map to failable
  bpf: Convert bpf_selem_unlink_map to failable
  bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage
  selftests/xsk: fix number of Tx frags in invalid packet
  selftests/xsk: properly handle batch ending in the middle of a packet
  bpf: Prevent reentrance into call_rcu_tasks_trace()
  ...

Diffstat (limited to 'Documentation')
| -rw-r--r-- | Documentation/bpf/bpf_prog_run.rst | 3 |
| -rw-r--r-- | Documentation/bpf/kfuncs.rst | 260 |
| -rw-r--r-- | Documentation/process/changes.rst | 4 |
| -rw-r--r-- | Documentation/scheduler/sched-ext.rst | 1 |
4 files changed, 145 insertions, 123 deletions
diff --git a/Documentation/bpf/bpf_prog_run.rst b/Documentation/bpf/bpf_prog_run.rst
index 4868c909df5c..81ef768c75a3 100644
--- a/Documentation/bpf/bpf_prog_run.rst
+++ b/Documentation/bpf/bpf_prog_run.rst
@@ -34,11 +34,12 @@ following types:
 - ``BPF_PROG_TYPE_LWT_IN``
 - ``BPF_PROG_TYPE_LWT_OUT``
 - ``BPF_PROG_TYPE_LWT_XMIT``
-- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
 - ``BPF_PROG_TYPE_FLOW_DISSECTOR``
 - ``BPF_PROG_TYPE_STRUCT_OPS``
 - ``BPF_PROG_TYPE_RAW_TRACEPOINT``
 - ``BPF_PROG_TYPE_SYSCALL``
+- ``BPF_PROG_TYPE_TRACING``
+- ``BPF_PROG_TYPE_NETFILTER``
 
 When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
 object and (for program types operating on network packets) a buffer containing
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index e38941370b90..75e6c078e0e7 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -50,7 +50,70 @@ A wrapper kfunc is often needed when we need to annotate parameters of the
 kfunc. Otherwise one may directly make the kfunc visible to the BPF program by
 registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.
 
-2.2 Annotating kfunc parameters
+2.2 kfunc Parameters
+--------------------
+
+All kfuncs now require trusted arguments by default. This means that all
+pointer arguments must be valid, and all pointers to BTF objects must be
+passed in their unmodified form (at a zero offset, and without having been
+obtained from walking another pointer, with exceptions described below).
+
+There are two types of pointers to kernel objects which are considered "trusted":
+
+1. Pointers which are passed as tracepoint or struct_ops callback arguments.
+2. Pointers which were returned from a KF_ACQUIRE kfunc.
+
+Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
+kfuncs, and may have a non-zero offset.
+
+The definition of "valid" pointers is subject to change at any time, and has
+absolutely no ABI stability guarantees.
+
+As mentioned above, a nested pointer obtained from walking a trusted pointer is
+no longer trusted, with one exception. If a struct type has a field that is
+guaranteed to be valid (trusted or rcu, as in KF_RCU description below) as long
+as its parent pointer is valid, the following macros can be used to express
+that to the verifier:
+
+* ``BTF_TYPE_SAFE_TRUSTED``
+* ``BTF_TYPE_SAFE_RCU``
+* ``BTF_TYPE_SAFE_RCU_OR_NULL``
+
+For example,
+
+.. code-block:: c
+
+	BTF_TYPE_SAFE_TRUSTED(struct socket) {
+		struct sock *sk;
+	};
+
+or
+
+.. code-block:: c
+
+	BTF_TYPE_SAFE_RCU(struct task_struct) {
+		const cpumask_t *cpus_ptr;
+		struct css_set __rcu *cgroups;
+		struct task_struct __rcu *real_parent;
+		struct task_struct *group_leader;
+	};
+
+In other words, you must:
+
+1. Wrap the valid pointer type in a ``BTF_TYPE_SAFE_*`` macro.
+
+2. Specify the type and name of the valid nested field. This field must match
+   the field in the original type definition exactly.
+
+A new type declared by a ``BTF_TYPE_SAFE_*`` macro also needs to be emitted so
+that it appears in BTF. For example, ``BTF_TYPE_SAFE_TRUSTED(struct socket)``
+is emitted in the ``type_is_trusted()`` function as follows:
+
+.. code-block:: c
+
+	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct socket));
+
+2.3 Annotating kfunc parameters
 -------------------------------
 
 Similar to BPF helpers, there is sometime need for additional context required
@@ -58,7 +121,7 @@ by the verifier to make the usage of kernel functions safer and more useful.
 Hence, we can annotate a parameter by suffixing the name of the argument of the
 kfunc with a __tag, where tag may be one of the supported annotations.
 
-2.2.1 __sz Annotation
+2.3.1 __sz Annotation
 ---------------------
 
 This annotation is used to indicate a memory and size pair in the argument list.
@@ -74,7 +137,7 @@ argument as its size. By default, without __sz annotation, the size of the type
 of the pointer is used. Without __sz annotation, a kfunc cannot accept a void
 pointer.
 
-2.2.2 __k Annotation
+2.3.2 __k Annotation
 --------------------
 
 This annotation is only understood for scalar arguments, where it indicates that
@@ -98,7 +161,7 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
 size parameter, and the value of the constant matters for program safety, __k
 suffix should be used.
 
-2.2.3 __uninit Annotation
+2.3.3 __uninit Annotation
 -------------------------
 
 This annotation is used to indicate that the argument will be treated as
@@ -115,27 +178,36 @@ Here, the dynptr will be treated as an uninitialized dynptr. Without this
 annotation, the verifier will reject the program if the dynptr passed in is
 not initialized.
 
-2.2.4 __opt Annotation
--------------------------
+2.3.4 __nullable Annotation
+---------------------------
 
-This annotation is used to indicate that the buffer associated with an __sz or __szk
-argument may be null. If the function is passed a nullptr in place of the buffer,
-the verifier will not check that length is appropriate for the buffer. The kfunc is
-responsible for checking if this buffer is null before using it.
+This annotation is used to indicate that the pointer argument may be NULL.
+The verifier will allow passing NULL for such arguments.
 
 An example is given below::
 
-	__bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__opt, u32 buffer__szk)
+	__bpf_kfunc void bpf_task_release(struct task_struct *task__nullable)
 	{
 	...
 	}
 
-Here, the buffer may be null. If buffer is not null, it at least of size buffer_szk.
-Either way, the returned buffer is either NULL, or of size buffer_szk. Without this
-annotation, the verifier will reject the program if a null pointer is passed in with
-a nonzero size.
+Here, the task pointer may be NULL. The kfunc is responsible for checking if
+the pointer is NULL before dereferencing it.
+
+The __nullable annotation can be combined with other annotations. For example,
+when used with __sz or __szk annotations for memory and size pairs, the
+verifier will skip size validation when a NULL pointer is passed, but will
+still process the size argument to extract constant size information when
+needed::
+
+	__bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__nullable,
+					   u32 buffer__szk)
+
+Here, the buffer may be NULL. If the buffer is not NULL, it must be at least
+buffer__szk bytes in size. The kfunc is responsible for checking if the buffer
+is NULL before using it.
 
-2.2.5 __str Annotation
+2.3.5 __str Annotation
 ----------------------------
 
 This annotation is used to indicate that the argument is a constant string.
@@ -160,26 +232,9 @@ Or::
 		...
 	}
 
-2.2.6 __prog Annotation
----------------------------
-This annotation is used to indicate that the argument needs to be fixed up to
-the bpf_prog_aux of the caller BPF program. Any value passed into this argument
-is ignored, and rewritten by the verifier.
-
-An example is given below::
-
-	__bpf_kfunc int bpf_wq_set_callback_impl(struct bpf_wq *wq,
-						 int (callback_fn)(void *map, int *key, void *value),
-						 unsigned int flags,
-						 void *aux__prog)
-	 {
-		struct bpf_prog_aux *aux = aux__prog;
-		...
-	 }
-
 .. _BPF_kfunc_nodef:
 
-2.3 Using an existing kernel function
+2.4 Using an existing kernel function
 -------------------------------------
 
 When an existing function in the kernel is fit for consumption by BPF programs,
@@ -187,7 +242,7 @@ it can be directly registered with the BPF subsystem. However, care must still
 be taken to review the context in which it will be invoked by the BPF program
 and whether it is safe to do so.
 
-2.4 Annotating kfuncs
+2.5 Annotating kfuncs
 ---------------------
 
 In addition to kfuncs' arguments, verifier may need more information about the
@@ -216,7 +271,7 @@ protected. An example is given below::
 	...
 	}
 
-2.4.1 KF_ACQUIRE flag
+2.5.1 KF_ACQUIRE flag
 ---------------------
 
 The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
@@ -226,7 +281,7 @@ referenced kptr (by invoking bpf_kptr_xchg). If not, the verifier fails the
 loading of the BPF program until no lingering references remain in all possible
 explored states of the program.
 
-2.4.2 KF_RET_NULL flag
+2.5.2 KF_RET_NULL flag
 ----------------------
 
 The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
@@ -235,87 +290,21 @@ returned from the kfunc before making use of it (dereferencing or passing to
 another helper). This flag is often used in pairing with KF_ACQUIRE flag, but
 both are orthogonal to each other.
 
-2.4.3 KF_RELEASE flag
+2.5.3 KF_RELEASE flag
 ---------------------
 
 The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
 passed in to it. There can be only one referenced pointer that can be passed
 in. All copies of the pointer being released are invalidated as a result of
-invoking kfunc with this flag. KF_RELEASE kfuncs automatically receive the
-protection afforded by the KF_TRUSTED_ARGS flag described below.
-
-2.4.4 KF_TRUSTED_ARGS flag
---------------------------
+invoking kfunc with this flag.
 
-The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
-indicates that the all pointer arguments are valid, and that all pointers to
-BTF objects have been passed in their unmodified form (that is, at a zero
-offset, and without having been obtained from walking another pointer, with one
-exception described below).
-
-There are two types of pointers to kernel objects which are considered "valid":
-
-1. Pointers which are passed as tracepoint or struct_ops callback arguments.
-2. Pointers which were returned from a KF_ACQUIRE kfunc.
-
-Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
-KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
-
-The definition of "valid" pointers is subject to change at any time, and has
-absolutely no ABI stability guarantees.
-
-As mentioned above, a nested pointer obtained from walking a trusted pointer is
-no longer trusted, with one exception. If a struct type has a field that is
-guaranteed to be valid (trusted or rcu, as in KF_RCU description below) as long
-as its parent pointer is valid, the following macros can be used to express
-that to the verifier:
-
-* ``BTF_TYPE_SAFE_TRUSTED``
-* ``BTF_TYPE_SAFE_RCU``
-* ``BTF_TYPE_SAFE_RCU_OR_NULL``
-
-For example,
-
-.. code-block:: c
-
-	BTF_TYPE_SAFE_TRUSTED(struct socket) {
-		struct sock *sk;
-	};
-
-or
-
-.. code-block:: c
-
-	BTF_TYPE_SAFE_RCU(struct task_struct) {
-		const cpumask_t *cpus_ptr;
-		struct css_set __rcu *cgroups;
-		struct task_struct __rcu *real_parent;
-		struct task_struct *group_leader;
-	};
-
-In other words, you must:
-
-1. Wrap the valid pointer type in a ``BTF_TYPE_SAFE_*`` macro.
-
-2. Specify the type and name of the valid nested field. This field must match
-   the field in the original type definition exactly.
-
-A new type declared by a ``BTF_TYPE_SAFE_*`` macro also needs to be emitted so
-that it appears in BTF. For example, ``BTF_TYPE_SAFE_TRUSTED(struct socket)``
-is emitted in the ``type_is_trusted()`` function as follows:
-
-.. code-block:: c
-
-	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct socket));
-
-
-2.4.5 KF_SLEEPABLE flag
+2.5.4 KF_SLEEPABLE flag
 -----------------------
 
 The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
 be called by sleepable BPF programs (BPF_F_SLEEPABLE).
 
-2.4.6 KF_DESTRUCTIVE flag
+2.5.5 KF_DESTRUCTIVE flag
 --------------------------
 
 The KF_DESTRUCTIVE flag is used to indicate functions calling which is
@@ -324,18 +313,19 @@ rebooting or panicking. Due to this additional restrictions apply to these
 calls. At the moment they only require CAP_SYS_BOOT capability, but more can be
 added later.
 
-2.4.7 KF_RCU flag
+2.5.6 KF_RCU flag
 -----------------
 
-The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
-KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
-that the objects are valid and there is no use-after-free. The pointers are not
-NULL, but the object's refcount could have reached zero. The kfuncs need to
-consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE
-pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely
-also be KF_RET_NULL.
+The KF_RCU flag allows kfuncs to opt out of the default trusted args
+requirement and accept RCU pointers with weaker guarantees. The kfuncs marked
+with KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier
+guarantees that the objects are valid and there is no use-after-free. The
+pointers are not NULL, but the object's refcount could have reached zero. The
+kfuncs need to consider doing refcnt != 0 check, especially when returning a
+KF_ACQUIRE pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should
+very likely also be KF_RET_NULL.
 
-2.4.8 KF_RCU_PROTECTED flag
+2.5.7 KF_RCU_PROTECTED flag
 ---------------------------
 
 The KF_RCU_PROTECTED flag is used to indicate that the kfunc must be invoked in
@@ -354,7 +344,7 @@ RCU protection but do not take RCU protected arguments.
 
 .. _KF_deprecated_flag:
 
-2.4.9 KF_DEPRECATED flag
+2.5.8 KF_DEPRECATED flag
 ------------------------
 
 The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
@@ -374,7 +364,39 @@ encouraged to make their use-cases known as early as possible, and participate
 in upstream discussions regarding whether to keep, change, deprecate, or remove
 those kfuncs if and when such discussions occur.
 
-2.5 Registering the kfuncs
+2.5.9 KF_IMPLICIT_ARGS flag
+------------------------------------
+
+The KF_IMPLICIT_ARGS flag is used to indicate that the BPF signature
+of the kfunc is different from it's kernel signature, and the values
+for implicit arguments are provided at load time by the verifier.
+
+Only arguments of specific types are implicit.
+Currently only ``struct bpf_prog_aux *`` type is supported.
+
+A kfunc with KF_IMPLICIT_ARGS flag therefore has two types in BTF: one
+function matching the kernel declaration (with _impl suffix in the
+name by convention), and another matching the intended BPF API.
+
+Verifier only allows calls to the non-_impl version of a kfunc, that
+uses a signature without the implicit arguments.
+
+Example declaration:
+
+.. code-block:: c
+
+	__bpf_kfunc int bpf_task_work_schedule_signal(struct task_struct *task, struct bpf_task_work *tw,
+						      void *map__map, bpf_task_work_callback_t callback,
+						      struct bpf_prog_aux *aux) { ... }
+
+Example usage in BPF program:
+
+.. code-block:: c
+
+	/* note that the last argument is omitted */
+	bpf_task_work_schedule_signal(task, &work->tw, &arrmap, task_work_callback);
+
+2.6 Registering the kfuncs
 --------------------------
 
 Once the kfunc is prepared for use, the final step to making it visible is
@@ -397,7 +419,7 @@ type. An example is shown below::
 	}
 	late_initcall(init_subsystem);
 
-2.6 Specifying no-cast aliases with ___init
+2.7 Specifying no-cast aliases with ___init
 --------------------------------------------
 
 The verifier will always enforce that the BTF type of a pointer passed to a
diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index 0cf97dbab29d..6b373e193548 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -38,7 +38,7 @@ bash                   4.2              bash --version
 binutils               2.30             ld -v
 flex                   2.5.35           flex --version
 bison                  2.0              bison --version
-pahole                 1.16             pahole --version
+pahole                 1.22             pahole --version
 util-linux             2.10o            mount --version
 kmod                   13               depmod -V
 e2fsprogs              1.41.4           e2fsck -V
@@ -143,7 +143,7 @@ pahole
 
 Since Linux 5.2, if CONFIG_DEBUG_INFO_BTF is selected, the build system
 generates BTF (BPF Type Format) from DWARF in vmlinux, a bit later from kernel
-modules as well. This requires pahole v1.16 or later.
+modules as well. This requires pahole v1.22 or later.
 
 It is found in the 'dwarves' or 'pahole' distro packages or from
 https://fedorapeople.org/~acme/dwarves/.
diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index 404fe6126a76..9e2882d937b4 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -43,7 +43,6 @@ options should be enabled to use sched_ext:
     CONFIG_DEBUG_INFO_BTF=y
     CONFIG_BPF_JIT_ALWAYS_ON=y
     CONFIG_BPF_JIT_DEFAULT_ON=y
-    CONFIG_PAHOLE_HAS_SPLIT_BTF=y
     CONFIG_PAHOLE_HAS_BTF_TAG=y
 
 sched_ext is used only when the BPF scheduler is loaded and running.
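
The new ``__nullable`` text in the kfuncs.rst hunk (section 2.3.4) puts the NULL check on the callee: the verifier accepts NULL, and the kfunc must test the pointer before using it. The following is a minimal userspace C sketch of that contract, not kernel code and not part of this commit. The function name ``copy_prefix`` and its arguments are hypothetical; only the ``__nullable`` and ``__szk`` suffix naming is borrowed from the documentation, and no verifier is involved here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in, NOT a real kfunc. The parameter suffixes
 * only mirror the naming convention described in kfuncs.rst; nothing here is
 * checked by the BPF verifier.
 */
static void *copy_prefix(const unsigned char *src, size_t src_len,
			 void *buffer__nullable, size_t buffer__szk)
{
	/* The source is not nullable in this sketch. */
	assert(src != NULL);

	/* The callee, not the caller, is responsible for the NULL check. */
	if (!buffer__nullable)
		return NULL;

	/* Refuse to read past the end of the source data. */
	if (buffer__szk > src_len)
		return NULL;

	memcpy(buffer__nullable, src, buffer__szk);
	return buffer__nullable;
}
```

In this sketch a caller may pass NULL for the buffer and simply gets NULL back, which loosely mirrors the documented rule that size validation is skipped for a NULL buffer while the callee still has to guard every use of it.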
