diff options
| author | Alexei Starovoitov <ast@kernel.org> | 2026-02-04 13:35:29 -0800 |
|---|---|---|
| committer | Alexei Starovoitov <ast@kernel.org> | 2026-02-04 13:35:29 -0800 |
| commit | 6e951a9b1af5dc063dfe9e17ca433be099527c43 (patch) | |
| tree | 2f89bc63559127e800e556ead2d204d8aad02f76 | |
| parent | b2821311abbd05d3340ad7f09fe89f088572b682 (diff) | |
| parent | 47fcf4dc0a346dd0b873a679c547d6848bd85a37 (diff) | |
Merge branch 'bpf-improve-linked-register-tracking'
Puranjay Mohan says:
====================
bpf: Improve linked register tracking
V3: https://lore.kernel.org/all/20260203222643.994713-1-puranjay@kernel.org/
Changes in v3->v4:
- Add a call to reg_bounds_sync() in sync_linked_regs() to sync bounds after alu op.
- Add a __sink(path[0]); in the C selftest so compiler doesn't say "error: variable 'path' set but
not used"
V2: https://lore.kernel.org/all/20260113152529.3217648-1-puranjay@kernel.org/
Changes in v2->v3:
- Added another selftest showing a real usage pattern
- Rebased on bpf-next/master
v1: https://lore.kernel.org/bpf/20260107203941.1063754-1-puranjay@kernel.org/
Changes in v1->v2:
- Add support for alu32 operations in linked register tracking (Alexei)
- Squash the selftest fix with the first patch (Eduard)
- Add more selftests to detect edge cases
This series extends the BPF verifier's linked register tracking to handle
negative offsets, BPF_SUB operations, and alu32 operations, enabling better bounds propagation for
common arithmetic patterns.
The verifier previously only tracked positive constant deltas between linked
registers using BPF_ADD. This meant patterns using negative offsets or
subtraction couldn't benefit from bounds propagation:
void alu32_negative_offset(void)
{
volatile char path[5];
volatile int offset = bpf_get_prandom_u32();
int off = offset;
if (off >= 5 && off < 10)
path[off - 5] = '.';
}
this gets compiled to:
0000000000000478 <alu32_negative_offset>:
143: call 0x7
144: *(u32 *)(r10 - 0xc) = w0
145: w1 = *(u32 *)(r10 - 0xc)
146: w2 = w1 // w2 and w1 share the same id
147: w2 += -0x5 // verifier knows w1 = w2 + 5
148: if w2 > 0x4 goto +0x5 <L0> // in fall-through: verifier knows w2 ∈ [0,4] => w1 ∈ [5, 9]
149: r2 = r10
150: r2 += -0x5 // r2 = fp - 5
151: r2 += r1 // r2 = fp - 5 + r1 (∈ [5, 9]) => r2 ∈ [fp, fp + 4]
152: w1 = 0x2e
153: *(u8 *)(r2 - 0x5) = w1 // r2 ∈ [fp, fp + 4] => r2 - 5 ∈ [fp - 5, fp - 1]
<L0>:
154: exit
After the changes, the verifier could link 32-bit scalars and also supported -ve offsets for linking:
146: w2 = w1
147: w2 += -0x5
It allowed the verifier to correctly propagate bounds, without the
changes in this patchset, verifier would reject this program with:
invalid unbounded variable-offset write to stack R2
This program has been added as a selftest in the second patch.
Veristat comparison on programs from sched_ext, selftests, and some meta internal programs:
Scx Progs
File Program Verdict (A) Verdict (B) Verdict (DIFF) Insns (A) Insns (B) Insns (DIFF)
----------------- ---------------- ----------- ----------- -------------- --------- --------- -------------
scx_layered.bpf.o layered_runnable success success MATCH 5674 6077 +403 (+7.10%)
FB Progs
File Program Verdict (A) Verdict (B) Verdict (DIFF) Insns (A) Insns (B) Insns (DIFF)
------------ ---------------- ----------- ----------- -------------- --------- --------- -----------------
bpf232.bpf.o layered_dump success success MATCH 1151 1218 +67 (+5.82%)
bpf257.bpf.o layered_runnable success success MATCH 5743 6143 +400 (+6.97%)
bpf252.bpf.o layered_runnable success success MATCH 5677 6075 +398 (+7.01%)
bpf227.bpf.o layered_dump success success MATCH 915 982 +67 (+7.32%)
bpf239.bpf.o layered_runnable success success MATCH 5459 5861 +402 (+7.36%)
bpf246.bpf.o layered_runnable success success MATCH 5562 6008 +446 (+8.02%)
bpf229.bpf.o layered_runnable success success MATCH 2559 3011 +452 (+17.66%)
bpf231.bpf.o layered_runnable success success MATCH 2559 3011 +452 (+17.66%)
bpf234.bpf.o layered_runnable success success MATCH 2549 3001 +452 (+17.73%)
bpf019.bpf.o do_sendmsg success success MATCH 124823 153523 +28700 (+22.99%)
bpf019.bpf.o do_parse success success MATCH 124809 153509 +28700 (+23.00%)
bpf227.bpf.o layered_runnable success success MATCH 1915 2356 +441 (+23.03%)
bpf228.bpf.o layered_runnable success success MATCH 1700 2152 +452 (+26.59%)
bpf232.bpf.o layered_runnable success success MATCH 1499 1951 +452 (+30.15%)
bpf312.bpf.o mount_exit success success MATCH 19253 62883 +43630 (+226.61%)
bpf312.bpf.o umount_exit success success MATCH 19253 62883 +43630 (+226.61%)
bpf311.bpf.o mount_exit success success MATCH 19226 62863 +43637 (+226.97%)
bpf311.bpf.o umount_exit success success MATCH 19226 62863 +43637 (+226.97%)
The above four programs have specific patters that make the verifier explore a lot more states:
for (; depth < MAX_DIR_DEPTH; depth++) {
const unsigned char* name = BPF_CORE_READ(dentry, d_name.name);
if (offset >= MAX_PATH_LEN - MAX_DIR_LEN) {
return depth;
}
int len = bpf_probe_read_kernel_str(&path[offset], MAX_DIR_LEN, name);
offset += len;
if (len == MAX_DIR_LEN) {
if (offset - 2 < MAX_PATH_LEN) { // <---- (a)
path[offset - 2] = '.';
}
if (offset - 3 < MAX_PATH_LEN) { // <---- (b)
path[offset - 3] = '.';
}
if (offset - 4 < MAX_PATH_LEN) { // <---- (c)
path[offset - 4] = '.';
}
}
}
When at some depth == N false branches of conditions (a), (b) and (c) are scheduled for
verification, constraints for offset at depth == N+1 are:
1. offset >= MAX_PATH_LEN + 2
2. offset >= MAX_PATH_LEN + 3
3. offset >= MAX_PATH_LEN + 4 (visited before others)
And after offset += len it becomes:
1. offset >= MAX_PATH_LEN - 4093
2. offset >= MAX_PATH_LEN - 4092
3. offset >= MAX_PATH_LEN - 4091 (visited before others)
Because of the DFS states exploration logic, the states above are visited in order 3, 2, 1; 3 is not
a subset of 2 and 1 is not a subset of 2, so pruning logic does not kick in.
Previously this was not a problem, because range for offset was not propagated through the
statements (a), (b), (c).
As the root cause of this regression is understood, this is not a blocker for this change.
Selftest Progs
File Program Verdict (A) Verdict (B) Verdict (DIFF) Insns (A) Insns (B) Insns (DIFF)
---------------------------------- ------------------------ ----------- ----------- -------------- --------- --------- --------------
linked_list_peek.bpf.o list_peek success success MATCH 152 88 -64 (-42.11%)
verifier_iterating_callbacks.bpf.o cond_break2 success success MATCH 110 88 -22 (-20.00%)
These are the added selftests that failed earlier but are passing now:
verifier_linked_scalars.bpf.o alu32_negative_offset failure success MISMATCH 11 13 +2 (+18.18%)
verifier_linked_scalars.bpf.o scalars_alu32_big_offset failure success MISMATCH 7 10 +3 (+42.86%)
verifier_linked_scalars.bpf.o scalars_neg_alu32_add failure success MISMATCH 7 10 +3 (+42.86%)
verifier_linked_scalars.bpf.o scalars_neg_alu32_sub failure success MISMATCH 7 10 +3 (+42.86%)
verifier_linked_scalars.bpf.o scalars_neg failure success MISMATCH 7 10 +3 (+42.86%)
verifier_linked_scalars.bpf.o scalars_neg_sub failure success MISMATCH 7 10 +3 (+42.86%)
verifier_linked_scalars.bpf.o scalars_sub_neg_imm failure success MISMATCH 7 10 +3 (+42.86%)
iters.bpf.o iter_obfuscate_counter success success MATCH 83 119 +36 (+43.37%)
bpf_cubic.bpf.o bpf_cubic_acked success success MATCH 243 430 +187 (+76.95%)
====================
Link: https://patch.msgid.link/20260204151741.2678118-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| -rw-r--r-- | include/linux/bpf_verifier.h | 6 | ||||
| -rw-r--r-- | kernel/bpf/verifier.c | 50 | ||||
| -rw-r--r-- | tools/testing/selftests/bpf/progs/verifier_bounds.c | 2 | ||||
| -rw-r--r-- | tools/testing/selftests/bpf/progs/verifier_linked_scalars.c | 303 |
4 files changed, 346 insertions, 15 deletions
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 746025df82c8..ef8e45a362d9 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -147,8 +147,12 @@ struct bpf_reg_state { * registers. Example: * r1 = r2; both will have r1->id == r2->id == N * r1 += 10; r1->id == N | BPF_ADD_CONST and r1->off == 10 + * r3 = r2; both will have r3->id == r2->id == N + * w3 += 10; r3->id == N | BPF_ADD_CONST32 and r3->off == 10 */ -#define BPF_ADD_CONST (1U << 31) +#define BPF_ADD_CONST64 (1U << 31) +#define BPF_ADD_CONST32 (1U << 30) +#define BPF_ADD_CONST (BPF_ADD_CONST64 | BPF_ADD_CONST32) u32 id; /* PTR_TO_SOCKET and PTR_TO_TCP_SOCK could be a ptr returned * from a pointer-cast helper, bpf_sk_fullsock() and diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 92e03a5a50f5..edf5342b982f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -16209,6 +16209,13 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env, verbose(env, "verifier internal error: no src_reg\n"); return -EFAULT; } + /* + * For alu32 linked register tracking, we need to check dst_reg's + * umax_value before the ALU operation. After adjust_scalar_min_max_vals(), + * alu32 ops will have zero-extended the result, making umax_value <= U32_MAX. + */ + u64 dst_umax = dst_reg->umax_value; + err = adjust_scalar_min_max_vals(env, insn, dst_reg, *src_reg); if (err) return err; @@ -16218,26 +16225,44 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env, * r1 += 0x1 * if r2 < 1000 goto ... * use r1 in memory access - * So for 64-bit alu remember constant delta between r2 and r1 and - * update r1 after 'if' condition. + * So remember constant delta between r2 and r1 and update r1 after + * 'if' condition. */ if (env->bpf_capable && - BPF_OP(insn->code) == BPF_ADD && !alu32 && - dst_reg->id && is_reg_const(src_reg, false)) { - u64 val = reg_const_value(src_reg, false); + (BPF_OP(insn->code) == BPF_ADD || BPF_OP(insn->code) == BPF_SUB) && + dst_reg->id && is_reg_const(src_reg, alu32)) { + u64 val = reg_const_value(src_reg, alu32); + s32 off; + + if (!alu32 && ((s64)val < S32_MIN || (s64)val > S32_MAX)) + goto clear_id; + + if (alu32 && (dst_umax > U32_MAX)) + goto clear_id; - if ((dst_reg->id & BPF_ADD_CONST) || - /* prevent overflow in sync_linked_regs() later */ - val > (u32)S32_MAX) { + off = (s32)val; + + if (BPF_OP(insn->code) == BPF_SUB) { + /* Negating S32_MIN would overflow */ + if (off == S32_MIN) + goto clear_id; + off = -off; + } + + if (dst_reg->id & BPF_ADD_CONST) { /* * If the register already went through rX += val * we cannot accumulate another val into rx->off. */ +clear_id: dst_reg->off = 0; dst_reg->id = 0; } else { - dst_reg->id |= BPF_ADD_CONST; - dst_reg->off = val; + if (alu32) + dst_reg->id |= BPF_ADD_CONST32; + else + dst_reg->id |= BPF_ADD_CONST64; + dst_reg->off = off; } } else { /* @@ -17334,7 +17359,7 @@ static void sync_linked_regs(struct bpf_verifier_env *env, struct bpf_verifier_s u32 saved_id = reg->id; fake_reg.type = SCALAR_VALUE; - __mark_reg_known(&fake_reg, (s32)reg->off - (s32)known_reg->off); + __mark_reg_known(&fake_reg, (s64)reg->off - (s64)known_reg->off); /* reg = known_reg; reg += delta */ copy_register_state(reg, known_reg); @@ -17349,6 +17374,9 @@ static void sync_linked_regs(struct bpf_verifier_env *env, struct bpf_verifier_s scalar32_min_max_add(reg, &fake_reg); scalar_min_max_add(reg, &fake_reg); reg->var_off = tnum_add(reg->var_off, fake_reg.var_off); + if (known_reg->id & BPF_ADD_CONST32) + zext_32_to_64(reg); + reg_bounds_sync(reg); } if (e->is_reg) mark_reg_scratched(env, e->regno); diff --git a/tools/testing/selftests/bpf/progs/verifier_bounds.c b/tools/testing/selftests/bpf/progs/verifier_bounds.c index 411a18437d7e..560531404bce 100644 --- a/tools/testing/selftests/bpf/progs/verifier_bounds.c +++ b/tools/testing/selftests/bpf/progs/verifier_bounds.c @@ -1477,7 +1477,7 @@ __naked void sub64_full_overflow(void) SEC("socket") __description("64-bit subtraction, partial overflow, result in unbounded reg") __success __log_level(2) -__msg("3: (1f) r3 -= r2 {{.*}} R3=scalar()") +__msg("3: (1f) r3 -= r2 {{.*}} R3=scalar(id=1-1)") __retval(0) __naked void sub64_partial_overflow(void) { diff --git a/tools/testing/selftests/bpf/progs/verifier_linked_scalars.c b/tools/testing/selftests/bpf/progs/verifier_linked_scalars.c index 5f41bbb730a7..2ef346c827c2 100644 --- a/tools/testing/selftests/bpf/progs/verifier_linked_scalars.c +++ b/tools/testing/selftests/bpf/progs/verifier_linked_scalars.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 #include <linux/bpf.h> +#include <limits.h> #include <bpf/bpf_helpers.h> #include "bpf_misc.h" @@ -18,9 +19,9 @@ __naked void scalars(void) r4 = r1; \ w2 += 0x7FFFFFFF; \ w4 += 0; \ - if r2 == 0 goto l1; \ + if r2 == 0 goto l0_%=; \ exit; \ -l1: \ +l0_%=: \ r4 >>= 63; \ r3 = 1; \ r3 -= r4; \ @@ -64,4 +65,302 @@ l0_%=: \ : __clobber_all); } +SEC("socket") +__success +__naked void scalars_neg(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + r0 &= 0xff; \ + r1 = r0; \ + r1 += -4; \ + if r1 s< 0 goto l0_%=; \ + if r0 != 0 goto l0_%=; \ + r0 /= 0; \ +l0_%=: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +/* Same test but using BPF_SUB instead of BPF_ADD with negative immediate */ +SEC("socket") +__success +__naked void scalars_neg_sub(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + r0 &= 0xff; \ + r1 = r0; \ + r1 -= 4; \ + if r1 s< 0 goto l0_%=; \ + if r0 != 0 goto l0_%=; \ + r0 /= 0; \ +l0_%=: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +/* alu32 with negative offset */ +SEC("socket") +__success +__naked void scalars_neg_alu32_add(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + w0 &= 0xff; \ + w1 = w0; \ + w1 += -4; \ + if w1 s< 0 goto l0_%=; \ + if w0 != 0 goto l0_%=; \ + r0 /= 0; \ +l0_%=: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +/* alu32 with negative offset using SUB */ +SEC("socket") +__success +__naked void scalars_neg_alu32_sub(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + w0 &= 0xff; \ + w1 = w0; \ + w1 -= 4; \ + if w1 s< 0 goto l0_%=; \ + if w0 != 0 goto l0_%=; \ + r0 /= 0; \ +l0_%=: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +/* Positive offset: r1 = r0 + 4, then if r1 >= 6, r0 >= 2, so r0 != 0 */ +SEC("socket") +__success +__naked void scalars_pos(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + r0 &= 0xff; \ + r1 = r0; \ + r1 += 4; \ + if r1 < 6 goto l0_%=; \ + if r0 != 0 goto l0_%=; \ + r0 /= 0; \ +l0_%=: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +/* SUB with negative immediate: r1 -= -4 is equivalent to r1 += 4 */ +SEC("socket") +__success +__naked void scalars_sub_neg_imm(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + r0 &= 0xff; \ + r1 = r0; \ + r1 -= -4; \ + if r1 < 6 goto l0_%=; \ + if r0 != 0 goto l0_%=; \ + r0 /= 0; \ +l0_%=: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +/* Double ADD clears the ID (can't accumulate offsets) */ +SEC("socket") +__failure +__msg("div by zero") +__naked void scalars_double_add(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + r0 &= 0xff; \ + r1 = r0; \ + r1 += 2; \ + r1 += 2; \ + if r1 < 6 goto l0_%=; \ + if r0 != 0 goto l0_%=; \ + r0 /= 0; \ +l0_%=: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +/* + * Test that sync_linked_regs() correctly handles large offset differences. + * r1.off = S32_MIN, r2.off = 1, delta = S32_MIN - 1 requires 64-bit math. + */ +SEC("socket") +__success +__naked void scalars_sync_delta_overflow(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + r0 &= 0xff; \ + r1 = r0; \ + r2 = r0; \ + r1 += %[s32_min]; \ + r2 += 1; \ + if r2 s< 100 goto l0_%=; \ + if r1 s< 0 goto l0_%=; \ + r0 /= 0; \ +l0_%=: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32), + [s32_min]"i"(INT_MIN) + : __clobber_all); +} + +/* + * Another large delta case: r1.off = S32_MAX, r2.off = -1. + * delta = S32_MAX - (-1) = S32_MAX + 1 requires 64-bit math. + */ +SEC("socket") +__success +__naked void scalars_sync_delta_overflow_large_range(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + r0 &= 0xff; \ + r1 = r0; \ + r2 = r0; \ + r1 += %[s32_max]; \ + r2 += -1; \ + if r2 s< 0 goto l0_%=; \ + if r1 s>= 0 goto l0_%=; \ + r0 /= 0; \ +l0_%=: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32), + [s32_max]"i"(INT_MAX) + : __clobber_all); +} + +/* + * Test linked scalar tracking with alu32 and large positive offset (0x7FFFFFFF). + * After w1 += 0x7FFFFFFF, w1 wraps to negative for any r0 >= 1. + * If w1 is signed-negative, then r0 >= 1, so r0 != 0. + */ +SEC("socket") +__success +__naked void scalars_alu32_big_offset(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + w0 &= 0xff; \ + w1 = w0; \ + w1 += 0x7FFFFFFF; \ + if w1 s>= 0 goto l0_%=; \ + if w0 != 0 goto l0_%=; \ + r0 /= 0; \ +l0_%=: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +SEC("socket") +__failure +__msg("div by zero") +__naked void scalars_alu32_basic(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + r1 = r0; \ + w1 += 1; \ + if r1 > 10 goto 1f; \ + r0 >>= 32; \ + if r0 == 0 goto 1f; \ + r0 /= 0; \ +1: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +/* + * Test alu32 linked register tracking with wrapping. + * R0 is bounded to [0xffffff00, 0xffffffff] (high 32-bit values) + * w1 += 0x100 causes R1 to wrap to [0, 0xff] + * + * After sync_linked_regs, if bounds are computed correctly: + * R0 should be [0x00000000_ffffff00, 0x00000000_ffffff80] + * R0 >> 32 == 0, so div by zero is unreachable + * + * If bounds are computed incorrectly (64-bit underflow): + * R0 becomes [0xffffffff_ffffff00, 0xffffffff_ffffff80] + * R0 >> 32 == 0xffffffff != 0, so div by zero is reachable + */ +SEC("socket") +__success +__naked void scalars_alu32_wrap(void) +{ + asm volatile (" \ + call %[bpf_get_prandom_u32]; \ + w0 |= 0xffffff00; \ + r1 = r0; \ + w1 += 0x100; \ + if r1 > 0x80 goto l0_%=; \ + r2 = r0; \ + r2 >>= 32; \ + if r2 == 0 goto l0_%=; \ + r0 /= 0; \ +l0_%=: \ + r0 = 0; \ + exit; \ +" : + : __imm(bpf_get_prandom_u32) + : __clobber_all); +} + +SEC("socket") +__success +void alu32_negative_offset(void) +{ + volatile char path[5]; + volatile int offset = bpf_get_prandom_u32(); + int off = offset; + + if (off >= 5 && off < 10) + path[off - 5] = '.'; + + /* So compiler doesn't say: error: variable 'path' set but not used */ + __sink(path[0]); +} + char _license[] SEC("license") = "GPL"; |
