linux-toradex.git/kernel/bpf/net_namespace.c, branch v6.19

bpf: Remove attach_type in bpf_netns_link

2025-07-11T18:01:04+00:00

Use attach_type in bpf_link, and remove it in bpf_netns_link.
And move netns_type field to the end to fill the byte hole.

Signed-off-by: Tao Chen 
Signed-off-by: Andrii Nakryiko 
Reviewed-by: Jakub Sitnicki 
Acked-by: Jiri Olsa 
Acked-by: John Fastabend 
Link: https://lore.kernel.org/bpf/20250710032038.888700-6-chen.dylane@linux.dev

bpf: Add attach_type field to bpf_link

2025-07-11T17:51:55+00:00

Attach_type will be set when a link is created by user. It is better to
record attach_type in bpf_link generically and have it available
universally for all link types. So add the attach_type field in bpf_link
and move the sleepable field to avoid unnecessary gap padding.

Signed-off-by: Tao Chen 
Signed-off-by: Andrii Nakryiko 
Acked-by: Daniel Borkmann 
Acked-by: Jiri Olsa 
Link: https://lore.kernel.org/bpf/20250710032038.888700-2-chen.dylane@linux.dev

net: Add includes masked by netdevice.h including uapi/bpf.h

2021-12-30T04:03:05+00:00

Add missing includes unmasked by the subsequent change.

Mostly network drivers missing an include for XDP_PACKET_HEADROOM.

Signed-off-by: Jakub Kicinski 
Signed-off-by: Alexei Starovoitov 
Link: https://lore.kernel.org/bpf/20211230012742.770642-2-kuba@kernel.org

bpf: Add support for forced LINK_DETACH command

2020-08-02T03:38:28+00:00

Add LINK_DETACH command to force-detach bpf_link without destroying it. It has
the same behavior as auto-detaching of bpf_link due to cgroup dying for
bpf_cgroup_link or net_device being destroyed for bpf_xdp_link. In such case,
bpf_link is still a valid kernel object, but is defuncts and doesn't hold BPF
program attached to corresponding BPF hook. This functionality allows users
with enough access rights to manually force-detach attached bpf_link without
killing respective owner process.

This patch implements LINK_DETACH for cgroup, xdp, and netns links, mostly
re-using existing link release handling code.

Signed-off-by: Andrii Nakryiko 
Signed-off-by: Alexei Starovoitov 
Acked-by: Song Liu 
Acked-by: John Fastabend 
Link: https://lore.kernel.org/bpf/20200731182830.286260-2-andriin@fb.com

bpf, netns: Fix build without CONFIG_INET

2020-07-21T16:12:34+00:00

When CONFIG_NET is set but CONFIG_INET isn't, build fails with:

  ld: kernel/bpf/net_namespace.o: in function `netns_bpf_attach_type_unneed':
  kernel/bpf/net_namespace.c:32: undefined reference to `bpf_sk_lookup_enabled'
  ld: kernel/bpf/net_namespace.o: in function `netns_bpf_attach_type_need':
  kernel/bpf/net_namespace.c:43: undefined reference to `bpf_sk_lookup_enabled'

This is because without CONFIG_INET bpf_sk_lookup_enabled symbol is not
available. Wrap references to bpf_sk_lookup_enabled with preprocessor
conditionals.

Fixes: 1559b4aa1db4 ("inet: Run SK_LOOKUP BPF program on socket lookup")
Reported-by: Randy Dunlap 
Reported-by: Stephen Rothwell 
Signed-off-by: Jakub Sitnicki 
Signed-off-by: Alexei Starovoitov 
Acked-by: Randy Dunlap  # build-tested
Link: https://lore.kernel.org/bpf/20200721100716.720477-1-jakub@cloudflare.com

inet: Run SK_LOOKUP BPF program on socket lookup

2020-07-18T03:18:16+00:00

Run a BPF program before looking up a listening socket on the receive path.
Program selects a listening socket to yield as result of socket lookup by
calling bpf_sk_assign() helper and returning SK_PASS code. Program can
revert its decision by assigning a NULL socket with bpf_sk_assign().

Alternatively, BPF program can also fail the lookup by returning with
SK_DROP, or let the lookup continue as usual with SK_PASS on return, when
no socket has been selected with bpf_sk_assign().

This lets the user match packets with listening sockets freely at the last
possible point on the receive path, where we know that packets are destined
for local delivery after undergoing policing, filtering, and routing.

With BPF code selecting the socket, directing packets destined to an IP
range or to a port range to a single socket becomes possible.

In case multiple programs are attached, they are run in series in the order
in which they were attached. The end result is determined from return codes
of all the programs according to following rules:

 1. If any program returned SK_PASS and selected a valid socket, the socket
    is used as result of socket lookup.
 2. If more than one program returned SK_PASS and selected a socket,
    last selection takes effect.
 3. If any program returned SK_DROP, and no program returned SK_PASS and
    selected a socket, socket lookup fails with -ECONNREFUSED.
 4. If all programs returned SK_PASS and none of them selected a socket,
    socket lookup continues to htable-based lookup.

Suggested-by: Marek Majkowski 
Signed-off-by: Jakub Sitnicki 
Signed-off-by: Alexei Starovoitov 
Link: https://lore.kernel.org/bpf/20200717103536.397595-5-jakub@cloudflare.com

bpf: Introduce SK_LOOKUP program type with a dedicated attach point

2020-07-18T03:18:16+00:00

Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
when looking up a listening socket for a new connection request for
connection oriented protocols, or when looking up an unconnected socket for
a packet for connection-less protocols.

When called, SK_LOOKUP BPF program can select a socket that will receive
the packet. This serves as a mechanism to overcome the limits of what
bind() API allows to express. Two use-cases driving this work are:

 (1) steer packets destined to an IP range, on fixed port to a socket

     192.0.2.0/24, port 80 -> NGINX socket

 (2) steer packets destined to an IP address, on any port to a socket

     198.51.100.1, any port -> L7 proxy socket

In its run-time context program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple. Context can be further extended to include ingress
interface identifier.

To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection. Transport layer then uses the selected
socket as a result of socket lookup.

In its basic form, SK_LOOKUP acts as a filter and hence must return either
SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should
look for a socket to receive the packet, or use the one selected by the
program if available, while SK_DROP informs the transport layer that the
lookup should fail.

This patch only enables the user to attach an SK_LOOKUP program to a
network namespace. Subsequent patches hook it up to run on local delivery
path in ipv4 and ipv6 stacks.

Suggested-by: Marek Majkowski 
Signed-off-by: Jakub Sitnicki 
Signed-off-by: Alexei Starovoitov 
Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com

bpf, netns: Handle multiple link attachments

2020-07-18T03:18:16+00:00

Extend the BPF netns link callbacks to rebuild (grow/shrink) or update the
prog_array at given position when link gets attached/updated/released.

This let's us lift the limit of having just one link attached for the new
attach type introduced by subsequent patch.

No functional changes intended.

Signed-off-by: Jakub Sitnicki 
Signed-off-by: Alexei Starovoitov 
Acked-by: Andrii Nakryiko 
Link: https://lore.kernel.org/bpf/20200717103536.397595-2-jakub@cloudflare.com

bpf, netns: Fix use-after-free in pernet pre_exit callback

2020-06-30T17:53:42+00:00

Iterating over BPF links attached to network namespace in pre_exit hook is
not safe, even if there is just one. Once link gets auto-detached, that is
its back-pointer to net object is set to NULL, the link can be released and
freed without waiting on netns_bpf_mutex, effectively causing the list
element we are operating on to be freed.

This leads to use-after-free when trying to access the next element on the
list, as reported by KASAN. Bug can be triggered by destroying a network
namespace, while also releasing a link attached to this network namespace.

| ==================================================================
| BUG: KASAN: use-after-free in netns_bpf_pernet_pre_exit+0xd9/0x130
| Read of size 8 at addr ffff888119e0d778 by task kworker/u8:2/177
|
| CPU: 3 PID: 177 Comm: kworker/u8:2 Not tainted 5.8.0-rc1-00197-ga0c04c9d1008-dirty #776
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
| Workqueue: netns cleanup_net
| Call Trace:
|  dump_stack+0x9e/0xe0
|  print_address_description.constprop.0+0x3a/0x60
|  ? netns_bpf_pernet_pre_exit+0xd9/0x130
|  kasan_report.cold+0x1f/0x40
|  ? netns_bpf_pernet_pre_exit+0xd9/0x130
|  netns_bpf_pernet_pre_exit+0xd9/0x130
|  cleanup_net+0x30b/0x5b0
|  ? unregister_pernet_device+0x50/0x50
|  ? rcu_read_lock_bh_held+0xb0/0xb0
|  ? _raw_spin_unlock_irq+0x24/0x50
|  process_one_work+0x4d1/0xa10
|  ? lock_release+0x3e0/0x3e0
|  ? pwq_dec_nr_in_flight+0x110/0x110
|  ? rwlock_bug.part.0+0x60/0x60
|  worker_thread+0x7a/0x5c0
|  ? process_one_work+0xa10/0xa10
|  kthread+0x1e3/0x240
|  ? kthread_create_on_node+0xd0/0xd0
|  ret_from_fork+0x1f/0x30
|
| Allocated by task 280:
|  save_stack+0x1b/0x40
|  __kasan_kmalloc.constprop.0+0xc2/0xd0
|  netns_bpf_link_create+0xfe/0x650
|  __do_sys_bpf+0x153a/0x2a50
|  do_syscall_64+0x59/0x300
|  entry_SYSCALL_64_after_hwframe+0x44/0xa9
|
| Freed by task 198:
|  save_stack+0x1b/0x40
|  __kasan_slab_free+0x12f/0x180
|  kfree+0xed/0x350
|  process_one_work+0x4d1/0xa10
|  worker_thread+0x7a/0x5c0
|  kthread+0x1e3/0x240
|  ret_from_fork+0x1f/0x30
|
| The buggy address belongs to the object at ffff888119e0d700
|  which belongs to the cache kmalloc-192 of size 192
| The buggy address is located 120 bytes inside of
|  192-byte region [ffff888119e0d700, ffff888119e0d7c0)
| The buggy address belongs to the page:
| page:ffffea0004678340 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
| flags: 0x2fffe0000000200(slab)
| raw: 02fffe0000000200 ffffea00045ba8c0 0000000600000006 ffff88811a80ea80
| raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
| page dumped because: kasan: bad access detected
|
| Memory state around the buggy address:
|  ffff888119e0d600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
|  ffff888119e0d680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
| >ffff888119e0d700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
|                                                                 ^
|  ffff888119e0d780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
|  ffff888119e0d800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
| ==================================================================

Remove the "fast-path" for releasing a link that got auto-detached by a
dying network namespace to fix it. This way as long as link is on the list
and netns_bpf mutex is held, we have a guarantee that link memory can be
accessed.

An alternative way to fix this issue would be to safely iterate over the
list of links and ensure there is no access to link object after detaching
it. But, at the moment, optimizing synchronization overhead on link release
without a workload in mind seems like an overkill.

Fixes: ab53cad90eb1 ("bpf, netns: Keep a list of attached bpf_link's")
Signed-off-by: Jakub Sitnicki 
Signed-off-by: Alexei Starovoitov 
Acked-by: Yonghong Song 
Link: https://lore.kernel.org/bpf/20200630164541.1329993-1-jakub@cloudflare.com

bpf: flow_dissector: Check value of unused flags to BPF_PROG_DETACH

2020-06-30T17:46:38+00:00

Using BPF_PROG_DETACH on a flow dissector program supports neither
attach_flags nor attach_bpf_fd. Yet no value is enforced for them.

Enforce that attach_flags are zero, and require the current program
to be passed via attach_bpf_fd. This allows us to remove the check
for CAP_SYS_ADMIN, since userspace can now no longer remove
arbitrary flow dissector programs.

Fixes: b27f7bb590ba ("flow_dissector: Move out netns_bpf prog callbacks")
Signed-off-by: Lorenz Bauer 
Signed-off-by: Alexei Starovoitov 
Link: https://lore.kernel.org/bpf/20200629095630.7933-3-lmb@cloudflare.com