Age | Commit message (Collapse) | Author |
|
bpf_skb_adjust_room sets the inner_protocol as skb->protocol for packets
encapsulation. But that is not appropriate when pushing Ethernet header.
Add an option to further specify encap L2 type and set the inner_protocol
as ETH_P_TEB.
Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Xuesen Huang <huangxuesen@kuaishou.com>
Signed-off-by: Zhiyong Cheng <chengzhiyong@kuaishou.com>
Signed-off-by: Li Wang <wangli09@kuaishou.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/20210304064046.6232-1-hxseverything@gmail.com
|
|
Fix the following coccicheck warnings:
./tools/testing/selftests/bpf/test_sockmap.c:735:35-37: WARNING !A || A
&& B is equivalent to !A || B.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/1614757930-17197-1-git-send-email-jiapeng.chong@linux.alibaba.com
|
|
Fix the following coccicheck warnings:
./tools/bpf/bpf_dbg.c:1201:55-57: WARNING !A || A && B is equivalent to
!A || B.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/1614756035-111280-1-git-send-email-jiapeng.chong@linux.alibaba.com
|
|
sk_lookup doesn't allow setting data_in for bpf_prog_run. This doesn't
play well with the verifier tests, since they always set a 64 byte
input buffer. Allow not running verifier tests by setting
bpf_test.runs to a negative value and don't run the ctx access case
for sk_lookup. We have dedicated ctx access tests so skipping here
doesn't reduce coverage.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210303101816.36774-6-lmb@cloudflare.com
|
|
Extend a simple prog_run test to check that PROG_TEST_RUN adheres
to the requested repetitions. Convert it to use BPF skeleton.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210303101816.36774-5-lmb@cloudflare.com
|
|
Convert the selftests for sk_lookup narrow context access to use
PROG_TEST_RUN instead of creating actual sockets. This ensures that
ctx is populated correctly when using PROG_TEST_RUN.
Assert concrete values since we now control remote_ip and remote_port.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210303101816.36774-4-lmb@cloudflare.com
|
|
Allow to pass sk_lookup programs to PROG_TEST_RUN. User space
provides the full bpf_sk_lookup struct as context. Since the
context includes a socket pointer that can't be exposed
to user space we define that PROG_TEST_RUN returns the cookie
of the selected socket or zero in place of the socket pointer.
We don't support testing programs that select a reuseport socket,
since this would mean running another (unrelated) BPF program
from the sk_lookup test handler.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210303101816.36774-3-lmb@cloudflare.com
|
|
Synchronize the header after all of the recent changes.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-16-joe@cilium.io
|
|
Add building of the bpf(2) syscall commands documentation as part of the
docs building step in the build. This allows us to pick up on potential
parse errors from the docs generator script as part of selftests.
The generated manual pages here are not intended for distribution, they
are just a fragment that can be integrated into the other static text of
bpf(2) to form the full manual page.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-14-joe@cilium.io
|
|
Previously, the Makefile here was only targeting a single manual page so
it just hardcoded a bunch of individual rules to specifically handle
build, clean, install, uninstall for that particular page.
Upcoming commits will generate manual pages for an additional section,
so this commit prepares the makefile first by converting the existing
targets into an evaluated set of targets based on the manual page name
and section.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-13-joe@cilium.io
|
|
This logic is used for validating the manual pages from selftests, so
move the infra under tools/testing/selftests/bpf/ and rely on selftests
for validation rather than tying it into the bpftool build.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-12-joe@cilium.io
|
|
Abstract out the target parameter so that upcoming commits, more than
just the existing "helpers" target can be called to generate specific
portions of docs from the eBPF UAPI headers.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-10-joe@cilium.io
|
|
Check that floats don't interfere with struct deduplication, that they
are not merged with another kinds and that floats of different sizes are
not merged with each other.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226202256.116518-9-iii@linux.ibm.com
|
|
Test the good variants as well as the potential malformed ones.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226202256.116518-8-iii@linux.ibm.com
|
|
The bit being checked by this test is no longer reserved after
introducing BTF_KIND_FLOAT, so use the next one instead.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-6-iii@linux.ibm.com
|
|
Only dumping support needs to be adjusted, the code structure follows
that of BTF_KIND_INT. Example plain and JSON outputs:
[4] FLOAT 'float' size=4
{"id":4,"kind":"FLOAT","name":"float","size":4}
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-5-iii@linux.ibm.com
|
|
The logic follows that of BTF_KIND_INT most of the time. Sanitization
replaces BTF_KIND_FLOATs with equally-sized empty BTF_KIND_STRUCTs on
older kernels, for example, the following:
[4] FLOAT 'float' size=4
becomes the following:
[4] STRUCT '(anon)' size=4 vlen=0
With dwarves patch [1] and this patch, the older kernels, which were
failing with the floating-point-related errors, will now start working
correctly.
[1] https://github.com/iii-i/dwarves/commit/btf-kind-float-v2
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226202256.116518-4-iii@linux.ibm.com
|
|
Remove trailing space.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-3-iii@linux.ibm.com
|
|
Add a new kind value and expand the kind bitfield.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-2-iii@linux.ibm.com
|
|
The original bcc pull request https://github.com/iovisor/bcc/pull/3270 exposed
a verifier failure with Clang 12/13 while Clang 4 works fine.
Further investigation exposed two issues:
Issue 1: LLVM may generate code which uses less refined value. The issue is
fixed in LLVM patch: https://reviews.llvm.org/D97479
Issue 2: Spills with initial value 0 are marked as precise which makes later
state pruning less effective. This is my rough initial analysis and
further investigation is needed to find how to improve verifier
pruning in such cases.
With the above LLVM patch, for the new loop6.c test, which has smaller loop
bound compared to original test, I got:
$ test_progs -s -n 10/16
...
stack depth 64
processed 390735 insns (limit 1000000) max_states_per_insn 87
total_states 8658 peak_states 964 mark_read 6
#10/16 loop6.o:OK
Use the original loop bound, i.e., commenting out "#define WORKAROUND", I got:
$ test_progs -s -n 10/16
...
BPF program is too large. Processed 1000001 insn
stack depth 64
processed 1000001 insns (limit 1000000) max_states_per_insn 91
total_states 23176 peak_states 5069 mark_read 6
...
#10/16 loop6.o:FAIL
The purpose of this patch is to provide a regression test for the above LLVM fix
and also provide a test case for further analyzing the verifier pruning issue.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Zhenwei Pi <pizhenwei@bytedance.com>
Link: https://lore.kernel.org/bpf/20210226223810.236472-1-yhs@fb.com
|
|
Just like was done for bpftool and selftests in ec23eb705620 ("tools/bpftool:
Allow substituting custom vmlinux.h for the build") and ca4db6389d61
("selftests/bpf: Allow substituting custom vmlinux.h for selftests build"),
allow to provide pre-generated vmlinux.h for runqslower build.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210303004010.653954-1-andrii@kernel.org
|
|
... so callers can correctly detect failure.
Signed-off-by: Ian Denhardt <ian@zenhack.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/b0bea780bc292f29e7b389dd062f20adc2a2d634.1614201868.git.ian@zenhack.net
|
|
Per discussion at [0] this was originally introduced as a warning due
to concerns about breaking existing code, but a hard error probably
makes more sense, especially given that concerns about breakage were
only speculation.
[0] https://lore.kernel.org/bpf/c964892195a6b91d20a67691448567ef528ffa6d.camel@linux.ibm.com/T/#t
Signed-off-by: Ian Denhardt <ian@zenhack.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/a6b6c7516f5d559049d669968e953b4a8d7adea3.1614201868.git.ian@zenhack.net
|
|
A test is added for arraymap and percpu arraymap. The test also
exercises the early return for the helper which does not
traverse all elements.
$ ./test_progs -n 45
#45/1 hash_map:OK
#45/2 array_map:OK
#45 for_each:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204934.3885756-1-yhs@fb.com
|
|
A test case is added for hashmap and percpu hashmap. The test
also exercises nested bpf_for_each_map_elem() calls like
bpf_prog:
bpf_for_each_map_elem(func1)
func1:
bpf_for_each_map_elem(func2)
func2:
$ ./test_progs -n 45
#45/1 hash_map:OK
#45 for_each:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204933.3885657-1-yhs@fb.com
|
|
With later hashmap example, using bpftool xlated output may
look like:
int dump_task(struct bpf_iter__task * ctx):
; struct task_struct *task = ctx->task;
0: (79) r2 = *(u64 *)(r1 +8)
; if (task == (void *)0 || called > 0)
...
19: (18) r2 = subprog[+17]
30: (18) r2 = subprog[+25]
...
36: (95) exit
__u64 check_hash_elem(struct bpf_map * map, __u32 * key, __u64 * val,
struct callback_ctx * data):
; struct bpf_iter__task *ctx = data->ctx;
37: (79) r5 = *(u64 *)(r4 +0)
...
55: (95) exit
__u64 check_percpu_elem(struct bpf_map * map, __u32 * key,
__u64 * val, void * unused):
; check_percpu_elem(struct bpf_map *map, __u32 *key, __u64 *val, void *unused)
56: (bf) r6 = r3
...
83: (18) r2 = subprog[-47]
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204931.3885458-1-yhs@fb.com
|
|
A new relocation RELO_SUBPROG_ADDR is added to capture
subprog addresses loaded with ld_imm64 insns. Such ld_imm64
insns are marked with BPF_PSEUDO_FUNC and will be passed to
kernel. For bpf_for_each_map_elem() case, kernel will
check that the to-be-used subprog address must be a static
function and replace it with proper actual jited func address.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204930.3885367-1-yhs@fb.com
|
|
Move function is_ldimm64() close to the beginning of libbpf.c,
so it can be reused by later code and the next patch as well.
There is no functionality change.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204929.3885295-1-yhs@fb.com
|
|
The bpf_for_each_map_elem() helper is introduced which
iterates all map elements with a callback function. The
helper signature looks like
long bpf_for_each_map_elem(map, callback_fn, callback_ctx, flags)
and for each map element, the callback_fn will be called. For example,
like hashmap, the callback signature may look like
long callback_fn(map, key, val, callback_ctx)
There are two known use cases for this. One is from upstream ([1]) where
a for_each_map_elem helper may help implement a timeout mechanism
in a more generic way. Another is from our internal discussion
for a firewall use case where a map contains all the rules. The packet
data can be compared to all these rules to decide allow or deny
the packet.
For array maps, users can already use a bounded loop to traverse
elements. Using this helper can avoid using bounded loop. For other
type of maps (e.g., hash maps) where bounded loop is hard or
impossible to use, this helper provides a convenient way to
operate on all elements.
For callback_fn, besides map and map element, a callback_ctx,
allocated on caller stack, is also passed to the callback
function. This callback_ctx argument can provide additional
input and allow to write to caller stack for output.
If the callback_fn returns 0, the helper will iterate through next
element if available. If the callback_fn returns 1, the helper
will stop iterating and returns to the bpf program. Other return
values are not used for now.
Currently, this helper is only available with jit. It is possible
to make it work with interpreter with so effort but I leave it
as the future work.
[1]: https://lore.kernel.org/bpf/20210122205415.113822-1-xiyou.wangcong@gmail.com/
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226204925.3884923-1-yhs@fb.com
|
|
Building selftests in a separate directory like this:
make O="$BUILD" -C tools/testing/selftests/bpf
and then running:
cd "$BUILD" && ./test_progs -t btf
causes all the non-flavored btf_dump_test_case_*.c tests to fail,
because these files are not copied to where test_progs expects to find
them.
Fix by not skipping EXT-COPY when the original $(OUTPUT) is not empty
(lib.mk sets it to $(shell pwd) in that case) and using rsync instead
of cp: cp fails because e.g. urandom_read is being copied into itself,
and rsync simply skips such cases. rsync is already used by kselftests
and therefore is not a new dependency.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210224111445.102342-1-iii@linux.ibm.com
|
|
When vmtest.sh ran a command in a VM, it did not record or propagate the
error code of the command. This made the script less "script-able". The
script now saves the error code of the said command in a file in the VM,
copies the file back to the host and (when available) uses this error
code instead of its own.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210225161947.1778590-1-kpsingh@kernel.org
|
|
These two eBPF programs are tied to BPF_SK_SKB_STREAM_PARSER
and BPF_SK_SKB_STREAM_VERDICT, rename them to reflect the fact
they are only used for TCP. And save the name 'skb_verdict' for
general use later.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210223184934.6054-6-xiyou.wangcong@gmail.com
|
|
Commit 34b2021cc616 ("bpf: Add BPF-helper for MTU checking") added an extra
blank line in bpf helper description. This will make bpf_helpers_doc.py stop
building bpf_helper_defs.h immediately after bpf_check_mtu(), which will
affect future added functions.
Fixes: 34b2021cc616 ("bpf: Add BPF-helper for MTU checking")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20210223131457.1378978-1-liuhangbin@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This commit introduces a range of tests to the xsk testsuite
for validating xsk statistics.
A new test type called 'stats' is added. Within it there are
four sub-tests. Each test configures a scenario which should
trigger the given error statistic. The test passes if the statistic
is successfully incremented.
The four statistics for which tests have been created are:
1. rx dropped
Increase the UMEM frame headroom to a value which results in
insufficient space in the rx buffer for both the packet and the headroom.
2. tx invalid
Set the 'len' field of tx descriptors to an invalid value (umem frame
size + 1).
3. rx ring full
Reduce the size of the RX ring to a fraction of the fill ring size.
4. fill queue empty
Do not populate the fill queue and then try to receive pkts.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210223162304.7450-5-ciara.loftus@intel.com
|
|
Prior to this commit individual xsk tests were launched from the
shell script 'test_xsk.sh'. When adding a new test type, two new test
configurations had to be added to this file - one for each of the
supported XDP 'modes' (skb or drv). Should zero copy support be added to
the xsk selftest framework in the future, three new test configurations
would need to be added for each new test type. Each new test type also
typically requires new CLI arguments for the xdpxceiver program.
This commit aims to reduce the overhead of adding new tests, by launching
the test configurations from within the xdpxceiver program itself, using
simple loops. Every test is run every time the C program is executed. Many
of the CLI arguments can be removed as a result.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210223162304.7450-4-ciara.loftus@intel.com
|
|
Launching xdpxceiver with -D enables what was formerly know as 'debug'
mode. Rename this mode to 'dump-pkts' as it better describes the
behavior enabled by the option. New usage:
./xdpxceiver .. -D
or
./xdpxceiver .. --dump-pkts
Also make it possible to pass this flag to the app via the test_xsk.sh
shell script like so:
./test_xsk.sh -D
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210223162304.7450-3-ciara.loftus@intel.com
|
|
Make the xsk tests less verbose by only printing the
essentials. Currently, it is hard to see if the tests passed or not
due to all the printouts. Move the extra printouts to a verbose
option, if further debugging is needed when a problem arises.
To run the xsk tests with verbose output:
./test_xsk.sh -v
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20210223162304.7450-2-ciara.loftus@intel.com
|
|
Replace hashtab with task local storage in runqslower. This improves the
performance of these BPF programs. The following table summarizes average
runtime of these programs, in nanoseconds:
task-local hash-prealloc hash-no-prealloc
handle__sched_wakeup 125 340 3124
handle__sched_wakeup_new 2812 1510 2998
handle__sched_switch 151 208 991
Note that, task local storage gives better performance than hashtab for
handle__sched_wakeup and handle__sched_switch. On the other hand, for
handle__sched_wakeup_new, task local storage is slower than hashtab with
prealloc. This is because handle__sched_wakeup_new accesses the data for
the first time, so it has to allocate the data for task local storage.
Once the initial allocation is done, subsequent accesses, as those in
handle__sched_wakeup, are much faster with task local storage. If we
disable hashtab prealloc, task local storage is much faster for all 3
functions.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210225234319.336131-7-songliubraving@fb.com
|
|
Update the Makefile to prefer using $(O)/vmlinux, $(KBUILD_OUTPUT)/vmlinux
(for selftests) or ../../../vmlinux. These two files should have latest
definitions for vmlinux.h.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210225234319.336131-6-songliubraving@fb.com
|
|
Add a test with recursive bpf_task_storage_[get|delete] from fentry
programs on bpf_local_storage_lookup and bpf_local_storage_update. Without
proper deadlock prevent mechanism, this test would cause deadlock.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210225234319.336131-5-songliubraving@fb.com
|
|
Task local storage is enabled for tracing programs. Add two tests for
task local storage without CONFIG_BPF_LSM.
The first test stores a value in sys_enter and read it back in sys_exit.
The second test checks whether the kernel allows allocating task local
storage in exit_creds() (which it should not).
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210225234319.336131-4-songliubraving@fb.com
|
|
This adds both the CONFIG_DEBUG_INFO_BTF and CONFIG_DEBUG_INFO_BTF_MODULES
kernel compile option to output of the bpftool feature command.
This is relevant for developers that want to account for data structure
definition differences between kernels.
Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210222195846.155483-1-grantseltzer@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core scheduler updates:
- Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
preempt=none/voluntary/full boot options (default: full), to allow
distros to build a PREEMPT kernel but fall back to close to
PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via
a boot time selection.
There's also the /debug/sched_debug switch to do this runtime.
This feature is implemented via runtime patching (a new variant of
static calls).
The scope of the runtime patching can be best reviewed by looking
at the sched_dynamic_update() function in kernel/sched/core.c.
( Note that the dynamic none/voluntary mode isn't 100% identical,
for example preempt-RCU is available in all cases, plus the
preempt count is maintained in all models, which has runtime
overhead even with the code patching. )
The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast
majority of distributions, are supposed to be unaffected.
- Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
was found via rcutorture triggering a hang. The bug is that
rcu_idle_enter() may wake up a NOCB kthread, but this happens after
the last generic need_resched() check. Some cpuidle drivers fix it
by chance but many others don't.
In true 2020 fashion the original bug fix has grown into a 5-patch
scheduler/RCU fix series plus another 16 RCU patches to address the
underlying issue of missed preemption events. These are the initial
fixes that should fix current incarnations of the bug.
- Clean up rbtree usage in the scheduler, by providing & using the
following consistent set of rbtree APIs:
partial-order; less() based:
- rb_add(): add a new entry to the rbtree
- rb_add_cached(): like rb_add(), but for a rb_root_cached
total-order; cmp() based:
- rb_find(): find an entry in an rbtree
- rb_find_add(): find an entry, and add if not found
- rb_find_first(): find the first (leftmost) matching entry
- rb_next_match(): continue from rb_find_first()
- rb_for_each(): iterate a sub-tree using the previous two
- Improve the SMP/NUMA load-balancer: scan for an idle sibling in a
single pass. This is a 4-commit series where each commit improves
one aspect of the idle sibling scan logic.
- Improve the cpufreq cooling driver by getting the effective CPU
utilization metrics from the scheduler
- Improve the fair scheduler's active load-balancing logic by
reducing the number of active LB attempts & lengthen the
load-balancing interval. This improves stress-ng mmapfork
performance.
- Fix CFS's estimated utilization (util_est) calculation bug that can
result in too high utilization values
Misc updates & fixes:
- Fix the HRTICK reprogramming & optimization feature
- Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code
- Reduce dl_add_task_root_domain() overhead
- Fix uprobes refcount bug
- Process pending softirqs in flush_smp_call_function_from_idle()
- Clean up task priority related defines, remove *USER_*PRIO and
USER_PRIO()
- Simplify the sched_init_numa() deduplication sort
- Documentation updates
- Fix EAS bug in update_misfit_status(), which degraded the quality
of energy-balancing
- Smaller cleanups"
* tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
sched,x86: Allow !PREEMPT_DYNAMIC
entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
entry: Explicitly flush pending rcuog wakeup before last rescheduling point
rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
rcu/nocb: Perform deferred wake up before last idle's need_resched() check
rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
sched/features: Distinguish between NORMAL and DEADLINE hrtick
sched/features: Fix hrtick reprogramming
sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
uprobes: (Re)add missing get_uprobe() in __find_uprobe()
smp: Process pending softirqs in flush_smp_call_function_from_idle()
sched: Harden PREEMPT_DYNAMIC
static_call: Allow module use without exposing static_call_key
sched: Add /debug/sched_preempt
preempt/dynamic: Support dynamic preempt with preempt= boot option
preempt/dynamic: Provide irqentry_exit_cond_resched() static call
preempt/dynamic: Provide preempt_schedule[_notrace]() static calls
preempt/dynamic: Provide cond_resched() and might_resched() static calls
preempt: Introduce CONFIG_PREEMPT_DYNAMIC
static_call: Provide DEFINE_STATIC_CALL_RET0()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"Core locking primitives updates:
- Remove mutex_trylock_recursive() from the API - no users left
- Simplify + constify the futex code a bit
Lockdep updates:
- Teach lockdep about local_lock_t
- Add CONFIG_DEBUG_IRQFLAGS=y debug config option to check for
potentially unsafe IRQ mask restoration patterns. (I.e.
calling raw_local_irq_restore() with IRQs enabled.)
- Add wait context self-tests
- Fix graph lock corner case corrupting internal data structures
- Fix noinstr annotations
LKMM updates:
- Simplify the litmus tests
- Documentation fixes
KCSAN updates:
- Re-enable KCSAN instrumentation in lib/random32.c
Misc fixes:
- Don't branch-trace static label APIs
- DocBook fix
- Remove stale leftover empty file"
* tag 'locking-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
checkpatch: Don't check for mutex_trylock_recursive()
locking/mutex: Kill mutex_trylock_recursive()
s390: Use arch_local_irq_{save,restore}() in early boot code
lockdep: Noinstr annotate warn_bogus_irq_restore()
locking/lockdep: Avoid unmatched unlock
locking/rwsem: Remove empty rwsem.h
locking/rtmutex: Add missing kernel-doc markup
futex: Remove unneeded gotos
futex: Change utime parameter to be 'const ... *'
lockdep: report broken irq restoration
jump_label: Do not profile branch annotations
locking: Add Reviewers
locking/selftests: Add local_lock inversion tests
locking/lockdep: Exclude local_lock_t from IRQ inversions
locking/lockdep: Clean up check_redundant() a bit
locking/lockdep: Add a skip() function to __bfs()
locking/lockdep: Mark local_lock_t
locking/selftests: More granular debug_locks_verbose
lockdep/selftest: Add wait context selftests
tools/memory-model: Fix typo in klitmus7 compatibility table
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"These are the latest RCU updates for v5.12:
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide
allocator return addresses to more easily locate bugs. This has a
couple of RCU-related commits, but is mostly MM. Was pulled in with
akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks, which enables
better debugging information and smarter reactions to large numbers
of callbacks.
- The first round of changes to allow CPUs to be runtime switched
from and to callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a
"torture everything" script that runs rcutorture, locktorture,
scftorture, rcuscale, and refscale. Plus does an allmodconfig
build.
- nolibc fixes for the torture tests"
* tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
percpu_ref: Dump mem_dump_obj() info upon reference-count underflow
rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback
mm: Make mem_obj_dump() vmalloc() dumps include start and length
mm: Make mem_dump_obj() handle vmalloc() memory
mm: Make mem_dump_obj() handle NULL and zero-sized pointers
mm: Add mem_dump_obj() to print source of memory block
tools/rcutorture: Fix position of -lgcc in mkinitrd.sh
tools/nolibc: Fix position of -lgcc in the documented example
tools/nolibc: Emit detailed error for missing alternate syscall number definitions
tools/nolibc: Remove incorrect definitions of __ARCH_WANT_*
tools/nolibc: Get timeval, timespec and timezone from linux/time.h
tools/nolibc: Implement poll() based on ppoll()
tools/nolibc: Implement fork() based on clone()
tools/nolibc: Make getpgrp() fall back to getpgid(0)
tools/nolibc: Make dup2() rely on dup3() when available
tools/nolibc: Add the definition for dup()
rcutorture: Add rcutree.use_softirq=0 to RUDE01 and TASKS01
torture: Maintain torture-specific set of CPUs-online books
torture: Clean up after torture-test CPU hotplugging
rcutorture: Make object_debug also double call_rcu() heap object
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to upstream revision
20210105, fix and clean up the handling of device properties, add
support for setting global profile of the platform, clean up device
enumeration, the CPPC library, the APEI support and more, update the
documentation, consolidate the printing of messages in several places
and make assorted janitorial changes.
Specifics:
- Update ACPICA code in the kernel to upstream revision 20201113 with
changes as follows:
* Remove the MTMR (Mid-Timer) table (Al Stone).
* Remove the VRTC table (Al Stone).
* Add type casts for string functions (Bob Moore).
* Update all copyrights to 2021 (Bob Moore).
* Fix exception code class checks (Maximilian Luz).
* Clean up exception code class checks (Maximilian Luz).
* Fix -Wfallthrough (Nick Desaulniers).
- Add support for setting and reading global profile of the platform
along with documentation (Mark Pearson, Hans de Goede, Jiaxun
Yang).
- Fix fwnode properties matching and clean up the code handling
device properties and its documentation (Rafael Wysocki, Andy
Shevchenko).
- Clean up ACPI-based device enumeration code (Rafael Wysocki).
- Clean up the CPPC support library code (Ionela Voinescu).
- Clean up the APEI support code (Yang Li, Yazen Ghannam).
- Update GPIO-related properties documentation (Flavio Suligoi).
- Consolidate and clean up the printing of messages in several places
(Rafael Wysocki).
- Fix error code path in configfs handling code (Qinglang Miao).
- Use DEVICE_ATTR_<RW|RO|WO> macros where applicable (Dwaipayan Ray).
- Replace tests for !ACPI_FAILURE with tests for ACPI_SUCCESS in
multiple places (Bjorn Helgaas)"
* tag 'acpi-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (44 commits)
ACPI: property: Satisfy kernel doc validator (part 2)
ACPI: property: Satisfy kernel doc validator (part 1)
ACPI: property: Make acpi_node_prop_read() static
ACPI: property: Remove dead code
ACPI: property: Fix fwnode string properties matching
ACPI: OSL: Clean up printing messages
ACPI: OSL: Rework acpi_check_resource_conflict()
ACPI: APEI: ERST: remove unneeded semicolon
ACPI: thermal: Clean up printing messages
ACPI: video: Clean up printing messages
ACPI: button: Clean up printing messages
ACPI: battery: Clean up printing messages
ACPI: AC: Clean up printing messages
ACPI: bus: Drop ACPI_BUS_COMPONENT which is not used any more
ACPI: utils: Clean up printing messages
ACPI: scan: Clean up printing messages
ACPI: bus: Clean up printing messages
ACPI: PM: Clean up printing messages
ACPI: power: Clean up printing messages
ACPI: APEI: Add is_generic_error() to identify GHES sources
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add a new power capping facility allowing aggregate power
constraints to be applied to sets of devices in a distributed manner,
add a new CPU ID to the RAPL power capping driver and improve it, drop
a cpufreq driver belonging to a platform that is not supported any
more, drop two redundant cpufreq driver flags, update cpufreq drivers
(intel_pstate, brcmstb-avs, qcom-hw), update the operating performance
points (OPP) framework (code cleanups, new helpers, devfreq-related
modifications), clean up devfreq, extend the PM clock layer, update
the cpupower utility and make assorted janitorial changes.
Specifics:
- Add new power capping facility called DTPM (Dynamic Thermal Power
Management), based on the existing power capping framework, to
allow aggregate power constraints to be applied to sets of devices
in a distributed manner, along with a CPU backend driver based on
the Energy Model (Daniel Lezcano, Dan Carpenter, Colin Ian King).
- Add AlderLake Mobile support to the Intel RAPL power capping driver
and make it use the topology interface when laying out the system
topology (Zhang Rui, Yunfeng Ye).
- Drop the cpufreq tango driver belonging to a platform that is not
supported any more (Arnd Bergmann).
- Drop the redundant CPUFREQ_STICKY and CPUFREQ_PM_NO_WARN cpufreq
driver flags (Viresh Kumar).
- Update cpufreq drivers:
* Fix max CPU frequency discovery in the intel_pstate driver and
make janitorial changes in it (Chen Yu, Rafael Wysocki, Nigel
Christian).
* Fix resource leaks in the brcmstb-avs-cpufreq driver (Christophe
JAILLET).
* Make the tegra20 driver use the resource-managed API (Dmitry
Osipenko).
* Enable boost support in the qcom-hw driver (Shawn Guo).
- Update the operating performance points (OPP) framework:
* Clean up the OPP core (Dmitry Osipenko, Viresh Kumar).
* Extend the OPP API by adding new helpers to it (Dmitry Osipenko,
Viresh Kumar).
* Allow required OPPs to be used for devfreq devices and update
the devfreq governor code accordingly (Saravana Kannan).
* Prepare the framework for introducing new dev_pm_opp_set_opp()
helper (Viresh Kumar).
* Drop dev_pm_opp_set_bw() and update related drivers (Viresh
Kumar).
* Allow lazy linking of required-OPPs (Viresh Kumar).
- Simplify and clean up devfreq somewhat (Lukasz Luba, Yang Li,
Pierre Kuo).
- Update the generic power domains (genpd) framework:
* Use device's next wakeup to determine domain idle state (Lina
Iyer).
* Improve initialization and debug (Dmitry Osipenko).
* Simplify computations (Abaci Team).
- Make janitorial changes in the core code handling system sleep and
PM-runtime (Bhaskar Chowdhury, Bjorn Helgaas, Rikard Falkeborn,
Zqiang).
- Update the MAINTAINERS entry for the exynos cpuidle driver and drop
DEBUG definition from intel_idle (Krzysztof Kozlowski, Tom Rix).
- Extend the PM clock layer to cover clocks that must sleep (Nicolas
Pitre).
- Update the cpupower utility:
* Update cpupower command, add support for AMD family 0x19 and
clean up the code to remove many of the family checks to make
future family updates easier (Nathan Fontenot, Robert Richter).
* Add Makefile dependencies for install targets to allow building
cpupower in parallel rather than serially (Ivan Babrou).
- Make janitorial changes in power management Kconfig (Lukasz Luba)"
* tag 'pm-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
MAINTAINERS: cpuidle: exynos: include header in file pattern
powercap: intel_rapl: Use topology interface in rapl_init_domains()
powercap: intel_rapl: Use topology interface in rapl_add_package()
PM: sleep: Constify static struct attribute_group
PM: Kconfig: remove unneeded "default n" options
PM: EM: update Kconfig description and drop "default n" option
cpufreq: Remove unused flag CPUFREQ_PM_NO_WARN
cpufreq: Remove CPUFREQ_STICKY flag
PM / devfreq: Add required OPPs support to passive governor
PM / devfreq: Cache OPP table reference in devfreq
OPP: Add function to look up required OPP's for a given OPP
PM / devfreq: rk3399_dmc: Remove unneeded semicolon
opp: Replace ENOTSUPP with EOPNOTSUPP
opp: Fix "foo * bar" should be "foo *bar"
opp: Don't ignore clk_get() errors other than -ENOENT
opp: Update bandwidth requirements based on scaling up/down
opp: Allow lazy-linking of required-opps
opp: Remove dev_pm_opp_set_bw()
devfreq: tegra30: Migrate to dev_pm_opp_set_opp()
drm: msm: Migrate to dev_pm_opp_set_opp()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPUID cleanup from Borislav Petkov:
"Assign a dedicated feature word to a CPUID leaf which is widely used"
* tag 'x86_cpu_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpufeatures: Assign dedicated feature word for CPUID_0x8000001F[EAX]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 misc updates from Borislav Petkov:
- Complete the MSR write filtering by applying it to the MSR ioctl
interface too.
- Other misc small fixups.
* tag 'x86_misc_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MSR: Filter MSR writes through X86_IOC_WRMSR_REGS ioctl too
selftests/fpu: Fix debugfs_simple_attr.cocci warning
selftests/x86: Use __builtin_ia32_read/writeeflags
x86/reboot: Add Zotac ZBOX CI327 nano PCI reboot quirk
|
|
When exporting static_call_key; with EXPORT_STATIC_CALL*(), the module
can use static_call_update() to change the function called. This is
not desirable in general.
Not exporting static_call_key however also disallows usage of
static_call(), since objtool needs the key to construct the
static_call_site.
Solve this by allowing objtool to create the static_call_site using
the trampoline address when it builds a module and cannot find the
static_call_key symbol. The module loader will then try and map the
trampole back to a key before it constructs the normal sites list.
Doing this requires a trampoline -> key associsation, so add another
magic section that keeps those.
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210127231837.ifddpn7rhwdaepiu@treble
|