<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/bpf/helpers.c, branch v5.7-rc2</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>bpf: Enable bpf cgroup hooks to retrieve cgroup v2 and ancestor id</title>
<updated>2020-03-28T02:40:39+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2020-03-27T15:58:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0f09abd105da6c37713d2b253730a86cb45e127a'/>
<id>0f09abd105da6c37713d2b253730a86cb45e127a</id>
<content type='text'>
Enable the bpf_get_current_cgroup_id() helper for connect(), sendmsg(),
recvmsg() and bind-related hooks in order to retrieve the cgroup v2
context which can then be used as part of the key for BPF map lookups,
for example. Given these hooks operate in process context 'current' is
always valid and pointing to the app that is performing mentioned
syscalls if it's subject to a v2 cgroup. Also with same motivation of
commit 7723628101aa ("bpf: Introduce bpf_skb_ancestor_cgroup_id helper")
enable retrieval of ancestor from current so the cgroup id can be used
for policy lookups which can then forbid connect() / bind(), for example.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/d2a7ef42530ad299e3cbb245e6c12374b72145ef.1585323121.git.daniel@iogearbox.net
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Enable the bpf_get_current_cgroup_id() helper for connect(), sendmsg(),
recvmsg() and bind-related hooks in order to retrieve the cgroup v2
context which can then be used as part of the key for BPF map lookups,
for example. Given these hooks operate in process context 'current' is
always valid and pointing to the app that is performing mentioned
syscalls if it's subject to a v2 cgroup. Also with same motivation of
commit 7723628101aa ("bpf: Introduce bpf_skb_ancestor_cgroup_id helper")
enable retrieval of ancestor from current so the cgroup id can be used
for policy lookups which can then forbid connect() / bind(), for example.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/d2a7ef42530ad299e3cbb245e6c12374b72145ef.1585323121.git.daniel@iogearbox.net
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Added new helper bpf_get_ns_current_pid_tgid</title>
<updated>2020-03-13T00:33:11+00:00</updated>
<author>
<name>Carlos Neira</name>
<email>cneirabustos@gmail.com</email>
</author>
<published>2020-03-04T20:41:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b4490c5c4e023f09b7d27c9a9d3e7ad7d09ea6bf'/>
<id>b4490c5c4e023f09b7d27c9a9d3e7ad7d09ea6bf</id>
<content type='text'>
New bpf helper bpf_get_ns_current_pid_tgid,
This helper will return pid and tgid from current task
which namespace matches dev_t and inode number provided,
this will allows us to instrument a process inside a container.

Signed-off-by: Carlos Neira &lt;cneirabustos@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200304204157.58695-3-cneirabustos@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
New bpf helper bpf_get_ns_current_pid_tgid,
This helper will return pid and tgid from current task
which namespace matches dev_t and inode number provided,
this will allows us to instrument a process inside a container.

Signed-off-by: Carlos Neira &lt;cneirabustos@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200304204157.58695-3-cneirabustos@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Add BPF_FUNC_jiffies64</title>
<updated>2020-01-23T00:30:10+00:00</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2020-01-22T23:36:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5576b991e9c1a11d2cc21c4b94fc75ec27603896'/>
<id>5576b991e9c1a11d2cc21c4b94fc75ec27603896</id>
<content type='text'>
This patch adds a helper to read the 64bit jiffies.  It will be used
in a later patch to implement the bpf_cubic.c.

The helper is inlined for jit_requested and 64 BITS_PER_LONG
as the map_gen_lookup().  Other cases could be considered together
with map_gen_lookup() if needed.

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200122233646.903260-1-kafai@fb.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds a helper to read the 64bit jiffies.  It will be used
in a later patch to implement the bpf_cubic.c.

The helper is inlined for jit_requested and 64 BITS_PER_LONG
as the map_gen_lookup().  Other cases could be considered together
with map_gen_lookup() if needed.

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200122233646.903260-1-kafai@fb.com
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: use cgrp-&gt;kn-&gt;id as the cgroup ID</title>
<updated>2019-11-12T16:18:04+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-11-04T23:54:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=743210386c0354a2f8ef3d697353c7d8477fa81d'/>
<id>743210386c0354a2f8ef3d697353c7d8477fa81d</id>
<content type='text'>
cgroup ID is currently allocated using a dedicated per-hierarchy idr
and used internally and exposed through tracepoints and bpf.  This is
confusing because there are tracepoints and other interfaces which use
the cgroupfs ino as IDs.

The preceding changes made kn-&gt;id exposed as ino as 64bit ino on
supported archs or ino+gen (low 32bits as ino, high gen).  There's no
reason for cgroup to use different IDs.  The kernfs IDs are unique and
userland can easily discover them and map them back to paths using
standard file operations.

This patch replaces cgroup IDs with kernfs IDs.

* cgroup_id() is added and all cgroup ID users are converted to use it.

* kernfs_node creation is moved to earlier during cgroup init so that
  cgroup_id() is available during init.

* While at it, s/cgroup/cgrp/ in psi helpers for consistency.

* Fallback ID value is changed to 1 to be consistent with root cgroup
  ID.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cgroup ID is currently allocated using a dedicated per-hierarchy idr
and used internally and exposed through tracepoints and bpf.  This is
confusing because there are tracepoints and other interfaces which use
the cgroupfs ino as IDs.

The preceding changes made kn-&gt;id exposed as ino as 64bit ino on
supported archs or ino+gen (low 32bits as ino, high gen).  There's no
reason for cgroup to use different IDs.  The kernfs IDs are unique and
userland can easily discover them and map them back to paths using
standard file operations.

This patch replaces cgroup IDs with kernfs IDs.

* cgroup_id() is added and all cgroup ID users are converted to use it.

* kernfs_node creation is moved to earlier during cgroup init so that
  cgroup_id() is available during init.

* While at it, s/cgroup/cgrp/ in psi helpers for consistency.

* Fallback ID value is changed to 1 to be consistent with root cgroup
  ID.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kernfs: convert kernfs_node-&gt;id from union kernfs_node_id to u64</title>
<updated>2019-11-12T16:18:03+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2019-11-04T23:54:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=67c0496e87d193b8356d2af49ab95e8a1b954b3c'/>
<id>67c0496e87d193b8356d2af49ab95e8a1b954b3c</id>
<content type='text'>
kernfs_node-&gt;id is currently a union kernfs_node_id which represents
either a 32bit (ino, gen) pair or u64 value.  I can't see much value
in the usage of the union - all that's needed is a 64bit ID which the
current code is already limited to.  Using a union makes the code
unnecessarily complicated and prevents using 64bit ino without adding
practical benefits.

This patch drops union kernfs_node_id and makes kernfs_node-&gt;id a u64.
ino is stored in the lower 32bits and gen upper.  Accessors -
kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the
ino and gen.  This simplifies ID handling less cumbersome and will
allow using 64bit inos on supported archs.

This patch doesn't make any functional changes.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
kernfs_node-&gt;id is currently a union kernfs_node_id which represents
either a 32bit (ino, gen) pair or u64 value.  I can't see much value
in the usage of the union - all that's needed is a 64bit ID which the
current code is already limited to.  Using a union makes the code
unnecessarily complicated and prevents using 64bit ino without adding
practical benefits.

This patch drops union kernfs_node_id and makes kernfs_node-&gt;id a u64.
ino is stored in the lower 32bits and gen upper.  Accessors -
kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the
ino and gen.  This simplifies ID handling less cumbersome and will
allow using 64bit inos on supported archs.

This patch doesn't make any functional changes.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295</title>
<updated>2019-06-05T15:36:38+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-29T14:18:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5b497af42fab12cadc0e29bcb7052cf9963603f5'/>
<id>5b497af42fab12cadc0e29bcb7052cf9963603f5</id>
<content type='text'>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of version 2 of the gnu general public license as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 64 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Alexios Zavras &lt;alexios.zavras@intel.com&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of version 2 of the gnu general public license as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 64 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Alexios Zavras &lt;alexios.zavras@intel.com&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Introduce bpf_strtol and bpf_strtoul helpers</title>
<updated>2019-04-12T20:54:59+00:00</updated>
<author>
<name>Andrey Ignatov</name>
<email>rdna@fb.com</email>
</author>
<published>2019-03-19T00:55:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d7a4cb9b6705a89937d12c8158a35a3145dc967a'/>
<id>d7a4cb9b6705a89937d12c8158a35a3145dc967a</id>
<content type='text'>
Add bpf_strtol and bpf_strtoul to convert a string to long and unsigned
long correspondingly. It's similar to user space strtol(3) and
strtoul(3) with a few changes to the API:

* instead of NUL-terminated C string the helpers expect buffer and
  buffer length;

* resulting long or unsigned long is returned in a separate
  result-argument;

* return value is used to indicate success or failure, on success number
  of consumed bytes is returned that can be used to identify position to
  read next if the buffer is expected to contain multiple integers;

* instead of *base* argument, *flags* is used that provides base in 5
  LSB, other bits are reserved for future use;

* number of supported bases is limited.

Documentation for the new helpers is provided in bpf.h UAPI.

The helpers are made available to BPF_PROG_TYPE_CGROUP_SYSCTL programs to
be able to convert string input to e.g. "ulongvec" output.

E.g. "net/ipv4/tcp_mem" consists of three ulong integers. They can be
parsed by calling to bpf_strtoul three times.

Implementation notes:

Implementation includes "../../lib/kstrtox.h" to reuse integer parsing
functions. It's done exactly same way as fs/proc/base.c already does.

Unfortunately existing kstrtoX function can't be used directly since
they fail if any invalid character is present right after integer in the
string. Existing simple_strtoX functions can't be used either since
they're obsolete and don't handle overflow properly.

Signed-off-by: Andrey Ignatov &lt;rdna@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add bpf_strtol and bpf_strtoul to convert a string to long and unsigned
long correspondingly. It's similar to user space strtol(3) and
strtoul(3) with a few changes to the API:

* instead of NUL-terminated C string the helpers expect buffer and
  buffer length;

* resulting long or unsigned long is returned in a separate
  result-argument;

* return value is used to indicate success or failure, on success number
  of consumed bytes is returned that can be used to identify position to
  read next if the buffer is expected to contain multiple integers;

* instead of *base* argument, *flags* is used that provides base in 5
  LSB, other bits are reserved for future use;

* number of supported bases is limited.

Documentation for the new helpers is provided in bpf.h UAPI.

The helpers are made available to BPF_PROG_TYPE_CGROUP_SYSCTL programs to
be able to convert string input to e.g. "ulongvec" output.

E.g. "net/ipv4/tcp_mem" consists of three ulong integers. They can be
parsed by calling to bpf_strtoul three times.

Implementation notes:

Implementation includes "../../lib/kstrtox.h" to reuse integer parsing
functions. It's done exactly same way as fs/proc/base.c already does.

Unfortunately existing kstrtoX function can't be used directly since
they fail if any invalid character is present right after integer in the
string. Existing simple_strtoX functions can't be used either since
they're obsolete and don't handle overflow properly.

Signed-off-by: Andrey Ignatov &lt;rdna@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: introduce BPF_F_LOCK flag</title>
<updated>2019-02-01T19:55:39+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-01-31T23:40:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=96049f3afd50fe8db69fa0068cdca822e747b1e4'/>
<id>96049f3afd50fe8db69fa0068cdca822e747b1e4</id>
<content type='text'>
Introduce BPF_F_LOCK flag for map_lookup and map_update syscall commands
and for map_update() helper function.
In all these cases take a lock of existing element (which was provided
in BTF description) before copying (in or out) the rest of map value.

Implementation details that are part of uapi:

Array:
The array map takes the element lock for lookup/update.

Hash:
hash map also takes the lock for lookup/update and tries to avoid the bucket lock.
If old element exists it takes the element lock and updates the element in place.
If element doesn't exist it allocates new one and inserts into hash table
while holding the bucket lock.
In rare case the hashmap has to take both the bucket lock and the element lock
to update old value in place.

Cgroup local storage:
It is similar to array. update in place and lookup are done with lock taken.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce BPF_F_LOCK flag for map_lookup and map_update syscall commands
and for map_update() helper function.
In all these cases take a lock of existing element (which was provided
in BTF description) before copying (in or out) the rest of map value.

Implementation details that are part of uapi:

Array:
The array map takes the element lock for lookup/update.

Hash:
hash map also takes the lock for lookup/update and tries to avoid the bucket lock.
If old element exists it takes the element lock and updates the element in place.
If element doesn't exist it allocates new one and inserts into hash table
while holding the bucket lock.
In rare case the hashmap has to take both the bucket lock and the element lock
to update old value in place.

Cgroup local storage:
It is similar to array. update in place and lookup are done with lock taken.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: introduce bpf_spin_lock</title>
<updated>2019-02-01T19:55:38+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-01-31T23:40:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d83525ca62cf8ebe3271d14c36fb900c294274a2'/>
<id>d83525ca62cf8ebe3271d14c36fb900c294274a2</id>
<content type='text'>
Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
bpf program serialize access to other variables.

Example:
struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&amp;hash_map, &amp;key);
if (val) {
    bpf_spin_lock(&amp;val-&gt;lock);
    val-&gt;cnt++;
    bpf_spin_unlock(&amp;val-&gt;lock);
}

Restrictions and safety checks:
- bpf_spin_lock is only allowed inside HASH and ARRAY maps.
- BTF description of the map is mandatory for safety analysis.
- bpf program can take one bpf_spin_lock at a time, since two or more can
  cause dead locks.
- only one 'struct bpf_spin_lock' is allowed per map element.
  It drastically simplifies implementation yet allows bpf program to use
  any number of bpf_spin_locks.
- when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
- bpf program must bpf_spin_unlock() before return.
- bpf program can access 'struct bpf_spin_lock' only via
  bpf_spin_lock()/bpf_spin_unlock() helpers.
- load/store into 'struct bpf_spin_lock lock;' field is not allowed.
- to use bpf_spin_lock() helper the BTF description of map value must be
  a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
  Nested lock inside another struct is not allowed.
- syscall map_lookup doesn't copy bpf_spin_lock field to user space.
- syscall map_update and program map_update do not update bpf_spin_lock field.
- bpf_spin_lock cannot be on the stack or inside networking packet.
  bpf_spin_lock can only be inside HASH or ARRAY map value.
- bpf_spin_lock is available to root only and to all program types.
- bpf_spin_lock is not allowed in inner maps of map-in-map.
- ld_abs is not allowed inside spin_lock-ed region.
- tracing progs and socket filter progs cannot use bpf_spin_lock due to
  insufficient preemption checks

Implementation details:
- cgroup-bpf class of programs can nest with xdp/tc programs.
  Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
  Other solutions to avoid nested bpf_spin_lock are possible.
  Like making sure that all networking progs run with softirq disabled.
  spin_lock_irqsave is the simplest and doesn't add overhead to the
  programs that don't use it.
- arch_spinlock_t is used when its implemented as queued_spin_lock
- archs can force their own arch_spinlock_t
- on architectures where queued_spin_lock is not available and
  sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
- presence of bpf_spin_lock inside map value could have been indicated via
  extra flag during map_create, but specifying it via BTF is cleaner.
  It provides introspection for map key/value and reduces user mistakes.

Next steps:
- allow bpf_spin_lock in other map types (like cgroup local storage)
- introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
  to request kernel to grab bpf_spin_lock before rewriting the value.
  That will serialize access to map elements.

Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
bpf program serialize access to other variables.

Example:
struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&amp;hash_map, &amp;key);
if (val) {
    bpf_spin_lock(&amp;val-&gt;lock);
    val-&gt;cnt++;
    bpf_spin_unlock(&amp;val-&gt;lock);
}

Restrictions and safety checks:
- bpf_spin_lock is only allowed inside HASH and ARRAY maps.
- BTF description of the map is mandatory for safety analysis.
- bpf program can take one bpf_spin_lock at a time, since two or more can
  cause dead locks.
- only one 'struct bpf_spin_lock' is allowed per map element.
  It drastically simplifies implementation yet allows bpf program to use
  any number of bpf_spin_locks.
- when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
- bpf program must bpf_spin_unlock() before return.
- bpf program can access 'struct bpf_spin_lock' only via
  bpf_spin_lock()/bpf_spin_unlock() helpers.
- load/store into 'struct bpf_spin_lock lock;' field is not allowed.
- to use bpf_spin_lock() helper the BTF description of map value must be
  a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
  Nested lock inside another struct is not allowed.
- syscall map_lookup doesn't copy bpf_spin_lock field to user space.
- syscall map_update and program map_update do not update bpf_spin_lock field.
- bpf_spin_lock cannot be on the stack or inside networking packet.
  bpf_spin_lock can only be inside HASH or ARRAY map value.
- bpf_spin_lock is available to root only and to all program types.
- bpf_spin_lock is not allowed in inner maps of map-in-map.
- ld_abs is not allowed inside spin_lock-ed region.
- tracing progs and socket filter progs cannot use bpf_spin_lock due to
  insufficient preemption checks

Implementation details:
- cgroup-bpf class of programs can nest with xdp/tc programs.
  Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
  Other solutions to avoid nested bpf_spin_lock are possible.
  Like making sure that all networking progs run with softirq disabled.
  spin_lock_irqsave is the simplest and doesn't add overhead to the
  programs that don't use it.
- arch_spinlock_t is used when its implemented as queued_spin_lock
- archs can force their own arch_spinlock_t
- on architectures where queued_spin_lock is not available and
  sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
- presence of bpf_spin_lock inside map value could have been indicated via
  extra flag during map_create, but specifying it via BTF is cleaner.
  It provides introspection for map key/value and reduces user mistakes.

Next steps:
- allow bpf_spin_lock in other map types (like cgroup local storage)
- introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
  to request kernel to grab bpf_spin_lock before rewriting the value.
  That will serialize access to map elements.

Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: fix direct packet write into pop/peek helpers</title>
<updated>2018-10-26T00:02:06+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-10-24T20:05:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=80b0d86a176cab6201719b8dfd806902b0c6e046'/>
<id>80b0d86a176cab6201719b8dfd806902b0c6e046</id>
<content type='text'>
Commit f1a2e44a3aec ("bpf: add queue and stack maps") probably just
copy-pasted .pkt_access for bpf_map_{pop,peek}_elem() helpers, but
this is buggy in this context since it would allow writes into cloned
skbs which is invalid. Therefore, disable .pkt_access for the two.

Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Mauricio Vasquez B &lt;mauricio.vasquez@polito.it&gt;
Acked-by: Mauricio Vasquez B&lt;mauricio.vasquez@polito.it&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit f1a2e44a3aec ("bpf: add queue and stack maps") probably just
copy-pasted .pkt_access for bpf_map_{pop,peek}_elem() helpers, but
this is buggy in this context since it would allow writes into cloned
skbs which is invalid. Therefore, disable .pkt_access for the two.

Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Mauricio Vasquez B &lt;mauricio.vasquez@polito.it&gt;
Acked-by: Mauricio Vasquez B&lt;mauricio.vasquez@polito.it&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
