| Age | Commit message (Collapse) | Author |
|
Add iter_buf_null_fail with two tests and a test runner:
- iter_buf_null_deref: verifier must reject direct dereference of
ctx->key (PTR_TO_BUF | PTR_MAYBE_NULL) without a null check
- iter_buf_null_check_ok: verifier must accept dereference after
an explicit null check
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Qi Tang <tpluszz77@gmail.com>
Link: https://lore.kernel.org/r/20260407145421.4315-1-tpluszz77@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
For tests that carry a __description tag, allow matching on both the
description string and program name for convenience. Before this commit,
the description string must be spelt out to filter the tests.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260407145606.3991770-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Since we have changed how big user defined headroom in umem can be,
change the logic in testapp_stats_rx_dropped() so we pass updated
headroom validation in xdp_umem_reg() and still drop half of frames.
Test works on non-mbuf setup so __xsk_pool_get_rx_frame_size() that is
called on xsk_rcv_check() will not account skb_shared_info size. Taking
the tailroom size into account in test being fixed is needed as
xdp_umem_reg() defaults to respect it.
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-9-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently two different XDP programs share a static variable for
different purposes (picking where to redirect on shared umem test &
whether to drop a packet). This can be a problem when running full test
suite - idx can be written by shared umem test and this value can cause
a false behavior within XDP drop half test.
Introduce a dedicated variable for drop half test so that these two
don't step on each other toes. There is no real need for using
__sync_fetch_and_add here as XSK tests are executed on single CPU.
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-8-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Skip tail adjust tests in xskxceiver for SKB mode as it is not very
friendly for it. multi-buffer case does not work as xdp_rxq_info that is
registered for generic XDP does not report ::frag_size. The non-mbuf
path copies packet via skb_pp_cow_data() which only accounts for
headroom, leaving us with no tailroom and causing underlying XDP prog to
drop packets therefore.
For multi-buffer test on other modes, change the amount of bytes we use
for growth, assume worst-case scenario and take care of headroom and
tailroom.
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-7-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Parametrize current way of getting MAX_SKB_FRAGS value from {sys,proc}fs
so that it can be re-used to get cache line size of system's CPU. All
that just to mimic and compute size of kernel's struct skb_shared_info
which for xsk and test suite interpret as tailroom.
Introduce two variables to ifobject struct that will carry count of skb
frags and tailroom size. Do the reading and computing once, at the
beginning of test suite execution in xskxceiver, but for test_progs such
way is not possible as in this environment each test setups and torns
down ifobject structs.
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-6-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A `gotox rX` instruction accepts only values of type PTR_TO_INSN.
The only way to create such a value is to load it from a map of
type insn_array:
rX = *(rY + offset) # rY was read from an insn_array
...
gotox rX
Add instruction-level and C-level selftests to validate loads
with nonzero offsets.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20260406160141.36943-3-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Ensure we reject programs that access beyond the maximum syscall ctx
size, i.e. U16_MAX either through direct accesses or helpers/kfuncs.
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406194403.1649608-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add coverage for unaligned access with fixed offsets and variable
offsets, and through helpers or kfuncs.
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406194403.1649608-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Ensure that global subprogs and tail calls can only accept an unmodified
PTR_TO_CTX for syscall programs. For all other program types, fixed or
variable offsets on PTR_TO_CTX is rejected when passed into an argument
of any call instruction type, through the unified logic of
check_func_arg_reg_off.
Finally, add a positive example of a case that should succeed with all
our previous changes.
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406194403.1649608-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add various tests to exercise fixed and variable offsets on PTR_TO_CTX
for syscall programs, and cover disallowed cases for other program types
lacking convert_ctx_access callback. Load verifier_ctx with CAP_SYS_ADMIN
so that kfunc related logic can be tested. While at it, convert assembly
tests to C. Unfortunately, ctx_pointer_to_helper_2's unpriv case conflicts
with usage of kfuncs in the file and cannot be run.
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406194403.1649608-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Convert existing tests from ASM to C, in prep for future changes to add
more comprehensive tests.
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406194403.1649608-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Allow accessing PTR_TO_CTX with variable offsets in syscall programs.
Fixed offsets are already enabled for all program types that do not
convert their ctx accesses, since the changes we made in the commit
de6c7d99f898 ("bpf: Relax fixed offset check for PTR_TO_CTX"). Note
that we also lift the restriction on passing syscall context into
helpers, which was not permitted before, and passing modified syscall
context into kfuncs.
The structure of check_mem_access can be mostly shared and preserved,
but we must use check_mem_region_access to correctly verify access with
variable offsets.
The check made in check_helper_mem_access is hardened to only allow
PTR_TO_CTX for syscall programs to be passed in as helper memory. This
was the original intention of the existing code anyway, and it makes
little sense for other program types' context to be utilized as a memory
buffer. In case a convincing example presents itself in the future, this
check can be relaxed further.
We also no longer use the last-byte access to simulate helper memory
access, but instead go through check_mem_region_access. Since this no
longer updates our max_ctx_offset, we must do so manually, to keep track
of the maximum offset at which the program ctx may be accessed.
Take care to ensure that when arg_type is ARG_PTR_TO_CTX, we do not
relax any fixed or variable offset constraints around PTR_TO_CTX even in
syscall programs, and require them to be passed unmodified. There are
several reasons why this is necessary. First, if we pass a modified ctx,
then the global subprog's accesses will not update the max_ctx_offset to
its true maximum offset, and can lead to out of bounds accesses. Second,
tail called program (or extension program replacing global subprog) where
their max_ctx_offset exceeds the program they are being called from can
also cause issues. For the latter, unmodified PTR_TO_CTX is the first
requirement for the fix, the second is ensuring max_ctx_offset >= the
program they are being called from, which has to be a separate change
not made in this commit.
All in all, we can hint using arg_type when we expect ARG_PTR_TO_CTX and
make our relaxation around offsets conditional on it.
Drop coverage of syscall tests from verifier_ctx.c temporarily for
negative cases until they are updated in subsequent commits.
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260406194403.1649608-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
tc_tunnel test is based on a send_and_test_data function which takes a
subtest configuration, and a boolean indicating whether the connection
is supposed to fail or not. This boolean is systematically passed to
true, and is a remnant from the first (not integrated) attempts to
convert tc_tunnel to test_progs: those versions validated for
example that a connection properly fails when only one side of the
connection has tunneling enabled. This specific testing has not been
integrated because it involved large timeouts which increased quite a
lot the test duration, for little added value.
Remove the unused boolean from send_and_test_data to simplify the
generic part of subtests.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Acked-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/20260403-tc_tunnel_cleanup-v1-1-4f1bb113d3ab@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Verify that bpf_map__get_next_key() correctly returns -ENOENT when
called on the last (and only) key in a cgroup_storage map. Before the
fix in the previous patch, this would succeed with bogus key data
instead of failing.
Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Acked-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/20260403132951.43533-3-bestswngs@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a consistency subtest to htab_reuse that detects torn writes
caused by the BPF_F_LOCK lockless update racing with element
reallocation in alloc_htab_elem().
The test uses three thread roles started simultaneously via a pipe:
- locked updaters: BPF_F_LOCK|BPF_EXIST in-place updates
- delete+update workers: delete then BPF_ANY|BPF_F_LOCK insert
- locked readers: BPF_F_LOCK lookup checking value consistency
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260401-bpf_map_torn_writes-v1-2-782d071c55e7@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add two passes before the main verifier pass:
bpf_compute_const_regs() is a forward dataflow analysis that tracks
register values in R0-R9 across the program using fixed-point
iteration in reverse postorder. Each register is tracked with
a six-state lattice:
UNVISITED -> CONST(val) / MAP_PTR(map_index) /
MAP_VALUE(map_index, offset) / SUBPROG(num) -> UNKNOWN
At merge points, if two paths produce the same state and value for
a register, it stays; otherwise it becomes UNKNOWN.
The analysis handles:
- MOV, ADD, SUB, AND with immediate or register operands
- LD_IMM64 for plain constants, map FDs, map values, and subprogs
- LDX from read-only maps: constant-folds the load by reading the
map value directly via bpf_map_direct_read()
Results that fit in 32 bits are stored per-instruction in
insn_aux_data and bitmasks.
bpf_prune_dead_branches() uses the computed constants to evaluate
conditional branches. When both operands of a conditional jump are
known constants, the branch outcome is determined statically and the
instruction is rewritten to an unconditional jump.
The CFG postorder is then recomputed to reflect new control flow.
This eliminates dead edges so that subsequent liveness analysis
doesn't propagate through dead code.
Also add runtime sanity check to validate that precomputed
constants match the verifier's tracked state.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-5-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add few tests for topo sort:
- linear chain: main -> A -> B
- diamond: main -> A, main -> B, A -> C, B -> C
- mixed global/static: main -> global -> static leaf
- shared callee: main -> leaf, main -> global -> leaf
- duplicate calls: main calls same subprog twice
- no calls: single subprog
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-4-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a pass that sorts subprogs in topological order so that iterating
subprog_topo_order[] walks leaf subprogs first, then their callers.
This is computed as a DFS post-order traversal of the CFG.
The pass runs after check_cfg() to ensure the CFG has been validated
before traversing and after postorder has been computed to avoid
walking dead code.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-3-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Instead of checking src/dst range multiple times during
the main verifier pass do them once.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260403024422.87231-2-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Cross-merge BPF and other fixes after downstream PR.
Minor conflict in kernel/bpf/verifier.c
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
With gotox instruction and jumptable now supported,
enable corresponding bpf selftest on powerpc.
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260401152133.42544-5-adubey@linux.ibm.com
|
|
With instruction array now supported, enable corresponding bpf
selftest for powerpc.
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260401152133.42544-3-adubey@linux.ibm.com
|
|
With support of private stack, relevant tests must pass
on powerpc64.
#./test_progs -t struct_ops_private_stack
#434/1 struct_ops_private_stack/private_stack:OK
#434/2 struct_ops_private_stack/private_stack_fail:OK
#434/3 struct_ops_private_stack/private_stack_recur:OK
#434 struct_ops_private_stack:OK
Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Abhishek Dubey <adubey@linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260401103215.104438-2-adubey@linux.ibm.com
|
|
With the changes to the verifier in previous commits, we're not
expecting any invariant violations anymore. We should therefore always
enable BPF_F_TEST_REG_INVARIANTS to fail on invariant violations. Turns
out that's already the case and we've been explicitly setting this flag
in selftests when it wasn't necessary. This commit removes those flags
from selftests, which should hopefully make clearer that it's always
enabled.
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/9afce92510a7d44569dc3af63c9b8c608e69298a.1775142354.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds a selftest for the change in the previous patch. The
selftest is derived from a syzbot reproducer from [1] (among the 22
reproducers on that page, only 4 still reproduced on latest bpf tree,
all being small variants of the same invariant violation).
The test case failure without the previous patch is shown below.
0: R1=ctx() R10=fp0
0: (85) call bpf_get_prandom_u32#7 ; R0=scalar()
1: (bf) r5 = r0 ; R0=scalar(id=1) R5=scalar(id=1)
2: (57) r5 &= -4 ; R5=scalar(smax=0x7ffffffffffffffc,umax=0xfffffffffffffffc,smax32=0x7ffffffc,umax32=0xfffffffc,var_off=(0x0; 0xfffffffffffffffc))
3: (bf) r7 = r0 ; R0=scalar(id=1) R7=scalar(id=1)
4: (57) r7 &= 1 ; R7=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
5: (07) r7 += -43 ; R7=scalar(smin=smin32=-43,smax=smax32=-42,umin=0xffffffffffffffd5,umax=0xffffffffffffffd6,umin32=0xffffffd5,umax32=0xffffffd6,var_off=(0xffffffffffffffd4; 0x3))
6: (5e) if w5 != w7 goto pc+1
verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0xffffffd5, 0xffffffffffffffd4] s64=[0x80000000ffffffd5, 0x7fffffffffffffd4] u32=[0xffffffd5, 0xffffffd4] s32=[0xffffffd5, 0xffffffd4] var_off=(0xffffffd4, 0xffffffff00000000)
R5 and R7 are prepared such that their tnums intersection results in a
known constant but that constant isn't within R7's u32 bounds.
is_branch_taken isn't able to detect this case today, so the verifier
walks the impossible fallthrough branch. After regs_refine_cond_op and
reg_bounds_sync refine R5 on the assumption that the branch is taken,
the impossibility becomes apparent and results in an invariant violation
for R5: umin32 is greater than umax32.
The previous patch fixes this by using regs_refine_cond_op and
reg_bounds_sync in is_branch_taken to detect the impossible branch. The
fallthrough branch is therefore correctly detected as dead code.
Link: https://syzkaller.appspot.com/bug?extid=c950cc277150935cc0b5 [1]
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/b1e22233a3206ead522f02eda27b9c5c991a0de9.1775142354.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
memory leak
If TLD_FREE_DATA_ON_THREAD_EXIT is not enabled in a translation unit
that calls __tld_create_key() first, another translation unit that
enables it will not get the auto cleanup feature as pthread key is only
created once when allocation metadata. Fix it by always try to create
the pthread key when __tld_create_key() is called.
Also improve the documentation:
- Discourage user from using different options in different translation
units
- Specify calling tld_free() before thread exit as undefined behavior
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-6-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
TLD_READ_ONCE() is redundant as the only reference passed to it is
defined as _Atomic. The load is guaranteed to be atomic in C11 standard
(6.2.6.1). Drop the macro.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-5-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Without specifying constructor priority of the hidden constructor
function defined by TLD_DEFINE_KEY, __tld_create_key(..., dyn_data =
false) may run after tld_get_data() called from other constructors.
Threads calling tld_get_data() before __tld_create_key(..., dyn_data
= false) will not allocate enough memory for all TLDs and later result
in OOB access. Therefore, set it to the lowest value available to
users. Note that lower means higher priority and 0-100 is reserved to
the compiler.
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-4-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Simplify data allocation by always using aligned_alloc() and passing
size_pot, size rounded up to the closest power of two to alignment.
Currently, aligned_alloc(page_size, size) is only intended to be used
with memory allocators that can fulfill the request without rounding
size up to page_size to conserve memory. This is enabled by defining
TLD_DATA_USE_ALIGNED_ALLOC. The reason to align to page_size is due to
the limitation of UPTR where only a page can be pinned to the kernel.
Otherwise, malloc(size * 2) is used to allocate memory for data.
However, we don't need to call aligned_alloc(page_size, size) to get
a contiguous memory of size bytes within a page. aligned_alloc(size_pot,
...) will also do the trick. Therefore, just use aligned_alloc(size_pot,
...) universally.
As for the size argument, create a new option,
TLD_DONT_ROUND_UP_DATA_SIZE, to specify not rounding up the size.
This preserves the current TLD_DATA_USE_ALIGNED_ALLOC behavior, allowing
memory allocators with low overhead aligned_alloc() to not waste memory.
To enable this, users need to make sure it is not an undefined behavior
for the memory allocator to have size not being an integral multiple of
alignment.
Compared to the current implementation, !TLD_DATA_USE_ALIGNED_ALLOC
used to always waste size-byte of memory due to malloc(size * 2).
Now the worst case becomes size - 1 and the best case is 0 when the size
is already a power of two.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-3-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, when allocating memory for data, size of tld_data_u->start
is not taken into account. This may cause OOB access. Fixed it by adding
the non-flexible array part of tld_data_u.
Besides, explicitly align tld_data_u->data to 8 bytes in case some
fields are added before data in the future. It could break the
assumption that every data field is 8 byte aligned and
sizeof(tld_data_u) will no longer be equal to
offsetof(struct tld_data_u, data), which we use interchangeably.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Link: https://lore.kernel.org/r/20260331213555.1993883-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, attach_probe covers manual single-kprobe attaches by
func_name, but not the raw-address form that the PMU-based
single-kprobe path can accept.
This commit adds PERF and LINK raw-address coverage. It resolves
SYS_NANOSLEEP_KPROBE_NAME through kallsyms, passes the absolute address
in bpf_kprobe_opts.offset with func_name = NULL, and verifies that
kprobe and kretprobe are still triggered. It also verifies that LEGACY
rejects the same form.
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20260401143116.185049-4-hoyeon.lee@suse.com
|
|
Add verifier precision tracking tests for BPF atomic fetch operations.
Validate that backtrack_insn correctly propagates precision from the
fetch dst_reg to the stack slot for {fetch_add,xchg,cmpxchg} atomics.
For the first two src_reg gets the old memory value, and for the last
one r0. The fetched register is used for pointer arithmetic to trigger
backtracking. Also add coverage for fetch_{or,and,xor} flavors which
exercises the bitwise atomic fetch variants going through the same
insn->imm & BPF_FETCH check but with different imm values.
Add dual-precision regression tests for fetch_add and cmpxchg where
both the fetched value and a reread of the same stack slot are tracked
for precision. After the atomic operation, the stack slot is STACK_MISC,
so the ldx does not set INSN_F_STACK_ACCESS. These tests verify that
stack precision propagates solely through the atomic fetch's load side.
Add map-based tests for fetch_add and cmpxchg which validate that non-
stack atomic fetch completes precision tracking without falling back
to mark_all_scalars_precise. Lastly, add 32-bit variants for {fetch_add,
cmpxchg} on map values to cover the second valid atomic operand size.
# LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_precision
[...]
+ /etc/rcS.d/S50-startup
./test_progs -t verifier_precision
[ 1.697105] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.700220] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
[ 1.777043] tsc: Refined TSC clocksource calibration: 3407.986 MHz
[ 1.777619] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc6d7268, max_idle_ns: 440795260133 ns
[ 1.778658] clocksource: Switched to clocksource tsc
#633/1 verifier_precision/bpf_neg:OK
#633/2 verifier_precision/bpf_end_to_le:OK
#633/3 verifier_precision/bpf_end_to_be:OK
#633/4 verifier_precision/bpf_end_bswap:OK
#633/5 verifier_precision/bpf_load_acquire:OK
#633/6 verifier_precision/bpf_store_release:OK
#633/7 verifier_precision/state_loop_first_last_equal:OK
#633/8 verifier_precision/bpf_cond_op_r10:OK
#633/9 verifier_precision/bpf_cond_op_not_r10:OK
#633/10 verifier_precision/bpf_atomic_fetch_add_precision:OK
#633/11 verifier_precision/bpf_atomic_xchg_precision:OK
#633/12 verifier_precision/bpf_atomic_fetch_or_precision:OK
#633/13 verifier_precision/bpf_atomic_fetch_and_precision:OK
#633/14 verifier_precision/bpf_atomic_fetch_xor_precision:OK
#633/15 verifier_precision/bpf_atomic_cmpxchg_precision:OK
#633/16 verifier_precision/bpf_atomic_fetch_add_dual_precision:OK
#633/17 verifier_precision/bpf_atomic_cmpxchg_dual_precision:OK
#633/18 verifier_precision/bpf_atomic_fetch_add_map_precision:OK
#633/19 verifier_precision/bpf_atomic_cmpxchg_map_precision:OK
#633/20 verifier_precision/bpf_atomic_fetch_add_32bit_precision:OK
#633/21 verifier_precision/bpf_atomic_cmpxchg_32bit_precision:OK
#633/22 verifier_precision/bpf_neg_2:OK
#633/23 verifier_precision/bpf_neg_3:OK
#633/24 verifier_precision/bpf_neg_4:OK
#633/25 verifier_precision/bpf_neg_5:OK
#633 verifier_precision:OK
Summary: 1/25 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260331222020.401848-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a test to verify the issue: kprobe_write_ctx can be abused to modify
struct pt_regs of kernel functions via kprobe_write_ctx=true freplace
progs.
Without the fix, the issue is verified:
kprobe_write_ctx=true freplace prog is allowed to attach to
kprobe_write_ctx=false kprobe prog. Then, the first arg of
bpf_fentry_test1 will be set as 0, and bpf_prog_test_run_opts() gets
-EFAULT instead of 0.
With the fix, the issue is rejected at attach time.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260331145353.87606-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When running veristat across many BPF objects, expected load failures
produce noisy stderr output that obscures actual issues. Gate these
diagnostic messages behind --verbose.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260331172634.57402-2-mykyta.yatsenko5@gmail.com
|
|
Add the testing to access the bpf_ringbuf with the map pointer.
"consumer_pos" and "producer_pos" is accessed in this testing. We reserve
128 bytes in the ringbuf to test the producer_pos, which should be
"128 + BPF_RINGBUF_HDR_SZ".
It will be helpful if we want to evaluate the usage of the ringbuf in bpf
prog with the consumer and producer position.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/bpf/20260331070434.10037-1-dongml2@chinatelecom.cn
|
|
Verify that bpf_skb_adjust_room() clears the routing dst even when
the encap L3 protocol matches the original packet (e.g. IPIP).
The dst selected for the inner packet is not valid for the
encapsulated result; a stale dst could lead to misrouting.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260329180428.2657785-2-kuba@kernel.org
|
|
Add few more alu32 shift tests using div-by-zero on provably dead paths
to check both verifier and JIT xlation resp. runtime correctness.
If the verifier mistracks the result, it rejects due to the div by 0;
if the JIT computes a wrong value, then runtime hits the dead path and
retval changes.
# LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_subreg
[...]
#644/76 verifier_subreg/arsh32_imm1_value:OK
#644/77 verifier_subreg/lsh32_reg0_zero_extend_check:OK
#644/78 verifier_subreg/rsh32_reg0_zero_extend_check:OK
#644/79 verifier_subreg/arsh32_reg0_zero_extend_check:OK
#644/80 verifier_subreg/lsh32_imm31_value:OK
#644/81 verifier_subreg/rsh32_imm31_value:OK
#644/82 verifier_subreg/arsh32_imm31_value:OK
#644/83 verifier_subreg/lsh32_unknown_precise_bounds:OK
#644/84 verifier_subreg/rsh32_unknown_bounds:OK
#644 verifier_subreg:OK
Summary: 1/84 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260327220629.343327-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Update selftests to use the new non-_impl kfuncs marked with
KF_IMPLICIT_ARGS by removing redundant declarations and macros from
bpf_experimental.h (the new kfuncs are present in the vmlinux.h) and
updating relevant callsites.
Fix spin_lock verifier-log matching for lock_id_kptr_preserve by
accepting variable instruction numbers. The calls to kfuncs with
implicit arguments do not have register moves (e.g. r5 = 0)
corresponding to dummy arguments anymore, so the order of instructions
has shifted.
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260327203241.3365046-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The following kfuncs currently accept void *meta__ign argument:
* bpf_obj_new_impl
* bpf_obj_drop_impl
* bpf_percpu_obj_new_impl
* bpf_percpu_obj_drop_impl
* bpf_refcount_acquire_impl
* bpf_list_push_back_impl
* bpf_list_push_front_impl
* bpf_rbtree_add_impl
The __ign suffix is an indicator for the verifier to skip the argument
in check_kfunc_args(). Then, in fixup_kfunc_call() the verifier may
set the value of this argument to struct btf_struct_meta *
kptr_struct_meta from insn_aux_data.
BPF programs must pass a dummy NULL value when calling these kfuncs.
Additionally, the list and rbtree _impl kfuncs also accept an implicit
u64 argument, which doesn't require __ign suffix because it's a
scalar, and BPF programs explicitly pass 0.
Add new kfuncs with KF_IMPLICIT_ARGS [1], that correspond to each
_impl kfunc accepting meta__ign. The existing _impl kfuncs remain
unchanged for backwards compatibility.
To support this, add "btf_struct_meta" to the list of recognized
implicit argument types in resolve_btfids.
Implement is_kfunc_arg_implicit() in the verifier, that determines
implicit args by inspecting both a non-_impl BTF prototype of the
kfunc.
Update the special_kfunc_list in the verifier and relevant checks to
support both the old _impl and the new KF_IMPLICIT_ARGS variants of
btf_struct_meta users.
[1] https://lore.kernel.org/bpf/20260120222638.3976562-1-ihor.solodrai@linux.dev/
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260327203241.3365046-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add selftests to test block device tracking for bpf lsm programs.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20260326-work-bpf-bdev-v2-2-5e3c58963987@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
verify btf__new_empty_opts() adds layouts for all kinds supported,
and after adding kind-related types for an unknown kind, ensure that
parsing uses this info when that kind is encountered rather than
giving up.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260326145444.2076244-9-alan.maguire@oracle.com
|
|
Cross-merge networking fixes after downstream PR (net-7.0-rc6).
No conflicts, or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The verifier log output may contain multiple lines that start with
18: (bf) r0 = r6
teach reg_bounds to look for lines that have ';' in them,
since reg_bounds test is looking for:
18: (bf) r0 = r6 ; R0=... R6=...
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260325012242.45606-1-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
(tcp_congestion_ops)->cwnd_event() is called very often, with
@event oscillating between CA_EVENT_TX_START and other values.
This is not branch prediction friendly.
Provide a new cwnd_event_tx_start pointer dedicated for CA_EVENT_TX_START.
Both BBR and CUBIC benefit from this change, since they only care
about CA_EVENT_TX_START.
No change in kernel size:
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux
add/remove: 4/4 grow/shrink: 3/1 up/down: 564/-568 (-4)
Function old new delta
bbr_cwnd_event_tx_start - 450 +450
cubictcp_cwnd_event_tx_start - 70 +70
__pfx_cubictcp_cwnd_event_tx_start - 16 +16
__pfx_bbr_cwnd_event_tx_start - 16 +16
tcp_unregister_congestion_control 93 99 +6
tcp_update_congestion_control 518 521 +3
tcp_register_congestion_control 422 425 +3
__tcp_transmit_skb 3308 3306 -2
__pfx_cubictcp_cwnd_event 16 - -16
__pfx_bbr_cwnd_event 16 - -16
cubictcp_cwnd_event 80 - -80
bbr_cwnd_event 454 - -454
Total: Before=25240512, After=25240508, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260323234920.1097858-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a test to make sure that variable length stack writes
scrubs STACK_SPILL into STACK_MISC.
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260324215938.81733-2-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
headers
Now that the UAPI headers provide the required definitions, use those.
Some symbols have been renamed, adapt to those.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
|
|
Now that the UAPI headers provide the required definitions, use those.
Some symbols have been renamed, adapt to those.
Also adapt the include path for the custom sign-file rule in the
bpf selftests.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Nicolas Schier <nsc@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
|
|
trampoline_count fills all trampoline attachment slots for a single
target function and verifies that one extra attach fails with -E2BIG.
It currently targets bpf_modify_return_test, which is also used by
other selftests such as modify_return, get_func_ip_test, and
get_func_args_test. When such tests run in parallel, they can contend
for the same per-function trampoline quota and cause unexpected attach
failures. This issue is currently masked by harness serialization.
Move trampoline_count to a dedicated bpf_testmod target and register it
for fmod_ret attachment. Also route the final trigger through
trigger_module_test_read(), so the execution path exercises the same
dedicated target.
This keeps the test semantics unchanged while isolating it from other
selftests, so it no longer needs to run in serial mode. Remove the
TODO comment as well.
Tested:
./test_progs -t trampoline_count -vv
./test_progs -j$(nproc) -t trampoline_count -vv
./test_progs -j$(nproc) -t \
trampoline_count,modify_return,get_func_ip_test,get_func_args_test -vv
20 runs of:
./test_progs -j$(nproc) -t \
trampoline_count,modify_return,get_func_ip_test,get_func_args_test
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260324044949.869801-1-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Previously I added a FIONREAD test for sockmap, but it can occasionally
fail in CI [1].
The test sends 10 bytes in two segments (2 + 8). For UDP, FIONREAD only
reports the length of the first datagram, not the total queued data.
The original code used recv_timeout() expecting all 10 bytes, but under
high system load, the second datagram may not yet be processed by the
protocol stack, so recv would only return the first 2-byte datagram,
causing a size mismatch failure.
Fix this by receiving exactly the expected bytes (matching FIONREAD) in
the first recv. The remaining datagram is then consumed in a second recv
block, which is only reachable for UDP since TCP's expected already
equals sizeof(buf).
Test:
./test_progs -a sockmap_basic
410/1 sockmap_basic/sockmap create_update_free:OK
...
Summary: 1/35 PASSED, 0 SKIPPED, 0 FAILED
[1] https://github.com/kernel-patches/bpf/actions/runs/22919385910/job/66515395423
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>
Fixes: 17e2ce02bf56 ("selftests/bpf: Add tests for FIONREAD and copied_seq")
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Link: https://lore.kernel.org/r/20260312072549.6766-1-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|