author     Daniel Borkmann <daniel@iogearbox.net>  2018-03-19 21:14:42 +0100
committer  Daniel Borkmann <daniel@iogearbox.net>  2018-03-19 21:14:43 +0100
commit     d48ce3e5ba741428ed38a665a3c6b41e6cd999be
tree       75f6ebcbcc2d2fa443cf8afe27b9870c8c2776fd /Documentation/bpf
parent     318df9f01cffabf120b36daa96dfca273e46cbbf
parent     ae30727fa4beabd8403b016896eb1f714e6c0fd4

Merge branch 'bpf-sockmap-ulp'

John Fastabend says:

====================
This series adds a BPF hook for sendmsg and sendfile by using the ULP
infrastructure and sockmap. A simple pseudocode example would be,

  // load the programs
  bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG,
                &obj, &msg_prog);

  // lookup the sockmap
  bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map");

  // get fd for sockmap
  map_fd_msg = bpf_map__fd(bpf_map_msg);

  // attach program to sockmap
  bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0);

  // add socket 'fd' to the sockmap at location 'i'
  bpf_map_update_elem(map_fd_msg, &i, &fd, BPF_ANY);

After the above snippet any socket attached to the map would run
msg_prog on sendmsg and sendfile system calls.

Three additional helpers are added: bpf_msg_apply_bytes(),
bpf_msg_cork_bytes(), and bpf_msg_pull_data(). With
bpf_msg_apply_bytes() BPF programs can tell the infrastructure how many
bytes the given verdict should apply to. This has two cases. First, if
a BPF program applies a verdict to fewer bytes than are in the current
sendmsg/sendfile msg, the infrastructure applies the verdict to the
first N bytes of the message and then runs the BPF program again with
data pointers recalculated to the N+1 byte. Second, if the BPF program
applies a verdict to more bytes than the current sendmsg or sendfile
system call contains, the infrastructure caches the verdict and applies
it to future sendmsg/sendfile calls until the byte limit is reached.
This avoids the overhead of running BPF programs on large payloads.

The helper bpf_msg_cork_bytes() handles a different case where a BPF
program cannot reach a verdict on a msg until it receives more bytes
AND the program doesn't want to forward the packet until it is known to
be "good". The example case is a user (albeit a dumb one probably)
sending an N byte header in 1B system calls.
The BPF program can call bpf_msg_cork_bytes() with the required byte
limit to reach a verdict and then the program will only be called again
once N bytes are received.

The last helper added in this series is bpf_msg_pull_data(). It is used
to pull data in for modification or reading. Similar to how
skb_pull_data() works, msg_pull_data can be used to access data not in
the initial (data_start, data_end) range. For sendpage() calls this is
needed if any data is accessed because the BPF sendpage hook
initializes the data_start and data_end pointers to zero. We do this
because sendpage data is shared with the user and can be modified
during or after the BPF verdict, possibly invalidating any verdict the
BPF program decides. For sendmsg the data is already copied by the
sendmsg bpf infrastructure so we only copy the data if the user
requests a data range that is not already linearized. This happens if
the user requests larger blocks of data that are not in a single
scatterlist element. The common case seems to be accessing headers
which normally are in the first scatterlist element and already
linearized.

For more examples please review the sample program. There are examples
for all the actions and helpers there.

Patches 1-8 implement the above sockmap/BPF infrastructure. The
remaining patches flesh out some minimal selftests and the sample
sockmap program. The sockmap sample program is the main vehicle for
testing this infrastructure and will be moved into selftests shortly.
The final patch in this series is a simple shell script to run a set of
tests. These are the tests I run after any changes to sockmap. The next
task on the list after this series is to push those into selftests so
we can avoid manually testing.

Couple notes on future items in the pipeline,

  0. move sample sockmap programs into selftests (noted above)
  1. add additional support for tcp flags, most are ignored now.
  2. add a Documentation/bpf/sockmap file with these details
  3. support stacked ULP types to allow this and ktls to cooperate
  4. Ingress flag support, redirect only supports egress here. The
     other redirect helpers support ingress and egress flags.
  5. add optimizations, I cut a few optimizations here in the first
     iteration of the code for later study/implementation

-v3 updates
  : u32 data pointers in msg_md changed to void *
  : page_address NULL check and flag verification in msg_pull_data
  : remove old note in commit msg that is no longer relevant
  : remove enum sk_msg_action, it's not used anywhere
  : fixup test_verifier W -> DW insn to account for data pointers
  : unintentionally dropped a smap_stop_tx() call in sockmap.c

I propagated the ACKs forward because the above changes were small
one/two line changes.

-v2 updates (discussion):

Dave noticed that the sendpage call was previously (in v1) running on
the data directly. This allowed users to potentially modify the data
after or during the BPF program. However, doing a copy automatically
even if the data is not accessed has a measurable performance impact.
So we added another helper modeled after the existing skb_pull_data()
helper to allow users to selectively pull data from the msg. This is
also useful in the sendmsg case when users need to access data outside
the first scatterlist element or across scatterlist boundaries.

While doing this I also unified the sendmsg and sendfile handlers a
bit. Originally the sendfile call was optimized for never touching the
data. I've decided for a first submission to drop this optimization and
we can add it back later. It introduced unnecessary complexity, at
least for a first posting, for a use case I have not entirely fleshed
out yet. When the use case is deployed we can add it back if needed.
Then we can review concrete performance deltas as well on real-world
use-cases/applications.

Lastly, I reorganized the patches a bit. Now all sockmap changes are in
a single patch and each helper gets its own patch.
This, at least IMO, makes it easier to review because sockmap changes
are not spread across the patch series. On the other hand, the
apply_bytes and cork_bytes logic is now only activated later in the
series. But that should be OK.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
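
Illustration (not part of the series): the apply_bytes and cork_bytes
behaviour described in the cover letter can be approximated with a small
user-space model. This is only a sketch under stated assumptions. The
struct and function names (apply_model, apply_send, cork_model,
cork_send) are hypothetical and do not exist in the kernel. The model
assumes the program requests the same apply_bytes value every time it
runs, and that the cork limit counts the total bytes accumulated so far.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct apply_model {
    size_t apply_left;      /* bytes still covered by the cached verdict */
    unsigned int prog_runs; /* times the modeled BPF program was invoked */
};

/*
 * Model one sendmsg/sendfile call of msg_len bytes. Assumption: every
 * time the modeled program runs it asks for the verdict to cover
 * apply_bytes bytes.
 */
static void apply_send(struct apply_model *m, size_t msg_len,
                       size_t apply_bytes)
{
    size_t off = 0;

    assert(m != NULL && apply_bytes > 0);
    while (off < msg_len) {
        size_t chunk = msg_len - off;

        if (m->apply_left == 0) {
            /* No cached verdict left: run the program again. */
            m->prog_runs++;
            m->apply_left = apply_bytes;
        }
        if (chunk > m->apply_left)
            chunk = m->apply_left; /* verdict covers only part of msg */
        off += chunk;
        m->apply_left -= chunk;
    }
}

struct cork_model {
    size_t cork_need;       /* total bytes the program asked to wait for */
    size_t buffered;        /* bytes held back so far */
    unsigned int prog_runs; /* times the modeled BPF program was invoked */
};

/*
 * Model one send of msg_len bytes for a program that needs header_len
 * bytes before it can decide. Returns the number of bytes released on
 * this call, or 0 while the data is still corked.
 */
static size_t cork_send(struct cork_model *c, size_t msg_len,
                        size_t header_len)
{
    size_t out;

    assert(c != NULL);
    c->buffered += msg_len;
    if (c->cork_need && c->buffered < c->cork_need)
        return 0; /* still corked: the program is not invoked */

    c->prog_runs++;
    if (c->buffered < header_len) {
        c->cork_need = header_len; /* program corks until header_len */
        return 0;
    }
    out = c->buffered;
    c->buffered = 0;
    c->cork_need = 0;
    return out;
}
```

In this model a 100 byte send with apply_bytes of 40 invokes the program
three times and leaves 20 bytes of cached verdict for the next call,
while a 4 byte header sent in 1B calls invokes the program only on the
first and fourth call.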
Diffstat (limited to 'Documentation/bpf')
0 files changed, 0 insertions, 0 deletions