<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/skmsg.h, branch v5.10-rc3</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>bpf, sockmap: Allow skipping sk_skb parser program</title>
<updated>2020-10-12T01:09:44+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-10-11T05:09:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ef5659280eb13e8ac31c296f58cfdfa1684ac06b'/>
<id>ef5659280eb13e8ac31c296f58cfdfa1684ac06b</id>
<content type='text'>
Currently, we often run with a nop parser namely one that just does
this, 'return skb-&gt;len'. This happens when either our verdict program
can handle streaming data or it is only looking at socket data such
as IP addresses and other metadata associated with the flow. The second
case is common for a L3/L4 proxy for instance.

So lets allow loading programs without the parser then we can skip
the stream parser logic and avoid having to add a BPF program that
is effectively a nop.

Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/160239297866.8495.13345662302749219672.stgit@john-Precision-5820-Tower
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, we often run with a nop parser namely one that just does
this, 'return skb-&gt;len'. This happens when either our verdict program
can handle streaming data or it is only looking at socket data such
as IP addresses and other metadata associated with the flow. The second
case is common for a L3/L4 proxy for instance.

So lets allow loading programs without the parser then we can skip
the stream parser logic and avoid having to add a BPF program that
is effectively a nop.

Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/160239297866.8495.13345662302749219672.stgit@john-Precision-5820-Tower
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sk_msg: Simplify sk_psock initialization</title>
<updated>2020-08-21T22:16:11+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@cloudflare.com</email>
</author>
<published>2020-08-21T10:29:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7b219da43f94a3b4d5a8aa4cc52b75b34f0301ec'/>
<id>7b219da43f94a3b4d5a8aa4cc52b75b34f0301ec</id>
<content type='text'>
Initializing psock-&gt;sk_proto and other saved callbacks is only
done in sk_psock_update_proto, after sk_psock_init has returned.
The logic for this is difficult to follow, and needlessly complex.

Instead, initialize psock-&gt;sk_proto whenever we allocate a new
psock. Additionally, assert the following invariants:

* The SK has no ULP: ULP does it's own finagling of sk-&gt;sk_prot
* sk_user_data is unused: we need it to store sk_psock

Protect our access to sk_user_data with sk_callback_lock, which
is what other users like reuseport arrays, etc. do.

The result is that an sk_psock is always fully initialized, and
that psock-&gt;sk_proto is always the "original" struct proto.
The latter allows us to use psock-&gt;sk_proto when initializing
IPv6 TCP / UDP callbacks for sockmap.

Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20200821102948.21918-2-lmb@cloudflare.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Initializing psock-&gt;sk_proto and other saved callbacks is only
done in sk_psock_update_proto, after sk_psock_init has returned.
The logic for this is difficult to follow, and needlessly complex.

Instead, initialize psock-&gt;sk_proto whenever we allocate a new
psock. Additionally, assert the following invariants:

* The SK has no ULP: ULP does it's own finagling of sk-&gt;sk_prot
* sk_user_data is unused: we need it to store sk_psock

Protect our access to sk_user_data with sk_callback_lock, which
is what other users like reuseport arrays, etc. do.

The result is that an sk_psock is always fully initialized, and
that psock-&gt;sk_proto is always the "original" struct proto.
The latter allows us to use psock-&gt;sk_proto when initializing
IPv6 TCP / UDP callbacks for sockmap.

Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20200821102948.21918-2-lmb@cloudflare.com
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: sockmap: Require attach_bpf_fd when detaching a program</title>
<updated>2020-06-30T17:46:39+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@cloudflare.com</email>
</author>
<published>2020-06-29T09:56:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bb0de3131f4c60a9bf976681e0fe4d1e55c7a821'/>
<id>bb0de3131f4c60a9bf976681e0fe4d1e55c7a821</id>
<content type='text'>
The sockmap code currently ignores the value of attach_bpf_fd when
detaching a program. This is contrary to the usual behaviour of
checking that attach_bpf_fd represents the currently attached
program.

Ensure that attach_bpf_fd is indeed the currently attached
program. It turns out that all sockmap selftests already do this,
which indicates that this is unlikely to cause breakage.

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The sockmap code currently ignores the value of attach_bpf_fd when
detaching a program. This is contrary to the usual behaviour of
checking that attach_bpf_fd represents the currently attached
program.

Ensure that attach_bpf_fd is indeed the currently attached
program. It turns out that all sockmap selftests already do this,
which indicates that this is unlikely to cause breakage.

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Fix running sk_skb program types with ktls</title>
<updated>2020-06-01T21:48:32+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-05-29T23:06:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e91de6afa81c10e9f855c5695eb9a53168d96b73'/>
<id>e91de6afa81c10e9f855c5695eb9a53168d96b73</id>
<content type='text'>
KTLS uses a stream parser to collect TLS messages and send them to
the upper layer tls receive handler. This ensures the tls receiver
has a full TLS header to parse when it is run. However, when a
socket has BPF_SK_SKB_STREAM_VERDICT program attached before KTLS
is enabled we end up with two stream parsers running on the same
socket.

The result is both try to run on the same socket. First the KTLS
stream parser runs and calls read_sock() which will tcp_read_sock
which in turn calls tcp_rcv_skb(). This dequeues the skb from the
sk_receive_queue. When this is done KTLS code then data_ready()
callback which because we stacked KTLS on top of the bpf stream
verdict program has been replaced with sk_psock_start_strp(). This
will in turn kick the stream parser again and eventually do the
same thing KTLS did above calling into tcp_rcv_skb() and dequeuing
a skb from the sk_receive_queue.

At this point the data stream is broke. Part of the stream was
handled by the KTLS side some other bytes may have been handled
by the BPF side. Generally this results in either missing data
or more likely a "Bad Message" complaint from the kTLS receive
handler as the BPF program steals some bytes meant to be in a
TLS header and/or the TLS header length is no longer correct.

We've already broke the idealized model where we can stack ULPs
in any order with generic callbacks on the TX side to handle this.
So in this patch we do the same thing but for RX side. We add
a sk_psock_strp_enabled() helper so TLS can learn a BPF verdict
program is running and add a tls_sw_has_ctx_rx() helper so BPF
side can learn there is a TLS ULP on the socket.

Then on BPF side we omit calling our stream parser to avoid
breaking the data stream for the KTLS receiver. Then on the
KTLS side we call BPF_SK_SKB_STREAM_VERDICT once the KTLS
receiver is done with the packet but before it posts the
msg to userspace. This gives us symmetry between the TX and
RX halfs and IMO makes it usable again. On the TX side we
process packets in this order BPF -&gt; TLS -&gt; TCP and on
the receive side in the reverse order TCP -&gt; TLS -&gt; BPF.

Discovered while testing OpenSSL 3.0 Alpha2.0 release.

Fixes: d829e9c4112b5 ("tls: convert to generic sk_msg interface")
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/159079361946.5745.605854335665044485.stgit@john-Precision-5820-Tower
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
KTLS uses a stream parser to collect TLS messages and send them to
the upper layer tls receive handler. This ensures the tls receiver
has a full TLS header to parse when it is run. However, when a
socket has BPF_SK_SKB_STREAM_VERDICT program attached before KTLS
is enabled we end up with two stream parsers running on the same
socket.

The result is both try to run on the same socket. First the KTLS
stream parser runs and calls read_sock() which will tcp_read_sock
which in turn calls tcp_rcv_skb(). This dequeues the skb from the
sk_receive_queue. When this is done KTLS code then data_ready()
callback which because we stacked KTLS on top of the bpf stream
verdict program has been replaced with sk_psock_start_strp(). This
will in turn kick the stream parser again and eventually do the
same thing KTLS did above calling into tcp_rcv_skb() and dequeuing
a skb from the sk_receive_queue.

At this point the data stream is broke. Part of the stream was
handled by the KTLS side some other bytes may have been handled
by the BPF side. Generally this results in either missing data
or more likely a "Bad Message" complaint from the kTLS receive
handler as the BPF program steals some bytes meant to be in a
TLS header and/or the TLS header length is no longer correct.

We've already broke the idealized model where we can stack ULPs
in any order with generic callbacks on the TX side to handle this.
So in this patch we do the same thing but for RX side. We add
a sk_psock_strp_enabled() helper so TLS can learn a BPF verdict
program is running and add a tls_sw_has_ctx_rx() helper so BPF
side can learn there is a TLS ULP on the socket.

Then on BPF side we omit calling our stream parser to avoid
breaking the data stream for the KTLS receiver. Then on the
KTLS side we call BPF_SK_SKB_STREAM_VERDICT once the KTLS
receiver is done with the packet but before it posts the
msg to userspace. This gives us symmetry between the TX and
RX halfs and IMO makes it usable again. On the TX side we
process packets in this order BPF -&gt; TLS -&gt; TCP and on
the receive side in the reverse order TCP -&gt; TLS -&gt; BPF.

Discovered while testing OpenSSL 3.0 Alpha2.0 release.

Fixes: d829e9c4112b5 ("tls: convert to generic sk_msg interface")
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/159079361946.5745.605854335665044485.stgit@john-Precision-5820-Tower
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf, sockmap: bpf_tcp_ingress needs to subtract bytes from sg.size</title>
<updated>2020-05-05T22:22:22+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-05-04T17:21:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=81aabbb9fb7b4b1efd073b62f0505d3adad442f3'/>
<id>81aabbb9fb7b4b1efd073b62f0505d3adad442f3</id>
<content type='text'>
In bpf_tcp_ingress we used apply_bytes to subtract bytes from sg.size
which is used to track total bytes in a message. But this is not
correct because apply_bytes is itself modified in the main loop doing
the mem_charge.

Then at the end of this we have sg.size incorrectly set and out of
sync with actual sk values. Then we can get a splat if we try to
cork the data later and again try to redirect the msg to ingress. To
fix instead of trying to track msg.size do the easy thing and include
it as part of the sk_msg_xfer logic so that when the msg is moved the
sg.size is always correct.

To reproduce the below users will need ingress + cork and hit an
error path that will then try to 'free' the skmsg.

[  173.699981] BUG: KASAN: null-ptr-deref in sk_msg_free_elem+0xdd/0x120
[  173.699987] Read of size 8 at addr 0000000000000008 by task test_sockmap/5317

[  173.700000] CPU: 2 PID: 5317 Comm: test_sockmap Tainted: G          I       5.7.0-rc1+ #43
[  173.700005] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[  173.700009] Call Trace:
[  173.700021]  dump_stack+0x8e/0xcb
[  173.700029]  ? sk_msg_free_elem+0xdd/0x120
[  173.700034]  ? sk_msg_free_elem+0xdd/0x120
[  173.700042]  __kasan_report+0x102/0x15f
[  173.700052]  ? sk_msg_free_elem+0xdd/0x120
[  173.700060]  kasan_report+0x32/0x50
[  173.700070]  sk_msg_free_elem+0xdd/0x120
[  173.700080]  __sk_msg_free+0x87/0x150
[  173.700094]  tcp_bpf_send_verdict+0x179/0x4f0
[  173.700109]  tcp_bpf_sendpage+0x3ce/0x5d0

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Link: https://lore.kernel.org/bpf/158861290407.14306.5327773422227552482.stgit@john-Precision-5820-Tower
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In bpf_tcp_ingress we used apply_bytes to subtract bytes from sg.size
which is used to track total bytes in a message. But this is not
correct because apply_bytes is itself modified in the main loop doing
the mem_charge.

Then at the end of this we have sg.size incorrectly set and out of
sync with actual sk values. Then we can get a splat if we try to
cork the data later and again try to redirect the msg to ingress. To
fix instead of trying to track msg.size do the easy thing and include
it as part of the sk_msg_xfer logic so that when the msg is moved the
sg.size is always correct.

To reproduce the below users will need ingress + cork and hit an
error path that will then try to 'free' the skmsg.

[  173.699981] BUG: KASAN: null-ptr-deref in sk_msg_free_elem+0xdd/0x120
[  173.699987] Read of size 8 at addr 0000000000000008 by task test_sockmap/5317

[  173.700000] CPU: 2 PID: 5317 Comm: test_sockmap Tainted: G          I       5.7.0-rc1+ #43
[  173.700005] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[  173.700009] Call Trace:
[  173.700021]  dump_stack+0x8e/0xcb
[  173.700029]  ? sk_msg_free_elem+0xdd/0x120
[  173.700034]  ? sk_msg_free_elem+0xdd/0x120
[  173.700042]  __kasan_report+0x102/0x15f
[  173.700052]  ? sk_msg_free_elem+0xdd/0x120
[  173.700060]  kasan_report+0x32/0x50
[  173.700070]  sk_msg_free_elem+0xdd/0x120
[  173.700080]  __sk_msg_free+0x87/0x150
[  173.700094]  tcp_bpf_send_verdict+0x179/0x4f0
[  173.700109]  tcp_bpf_sendpage+0x3ce/0x5d0

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Link: https://lore.kernel.org/bpf/158861290407.14306.5327773422227552482.stgit@john-Precision-5820-Tower
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: sockmap: Move generic sockmap hooks from BPF TCP</title>
<updated>2020-03-09T21:34:58+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@cloudflare.com</email>
</author>
<published>2020-03-09T11:12:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f747632b608f90217a4e9ebb1deba8a37612aa32'/>
<id>f747632b608f90217a4e9ebb1deba8a37612aa32</id>
<content type='text'>
The init, close and unhash handlers from TCP sockmap are generic,
and can be reused by UDP sockmap. Move the helpers into the sockmap code
base and expose them. This requires tcp_bpf_get_proto and tcp_bpf_clone to
be conditional on BPF_STREAM_PARSER.

The moved functions are unmodified, except that sk_psock_unlink is
renamed to sock_map_unlink to better match its behaviour.

Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20200309111243.6982-6-lmb@cloudflare.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The init, close and unhash handlers from TCP sockmap are generic,
and can be reused by UDP sockmap. Move the helpers into the sockmap code
base and expose them. This requires tcp_bpf_get_proto and tcp_bpf_clone to
be conditional on BPF_STREAM_PARSER.

The moved functions are unmodified, except that sk_psock_unlink is
renamed to sock_map_unlink to better match its behaviour.

Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20200309111243.6982-6-lmb@cloudflare.com
</pre>
</div>
</content>
</entry>
<entry>
<title>skmsg: Update saved hooks only once</title>
<updated>2020-03-09T21:34:58+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@cloudflare.com</email>
</author>
<published>2020-03-09T11:12:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1a2e20132db7bb76dd4f97b8364bd167227dd15f'/>
<id>1a2e20132db7bb76dd4f97b8364bd167227dd15f</id>
<content type='text'>
Only update psock-&gt;saved_* if psock-&gt;sk_proto has not been initialized
yet. This allows us to get rid of tcp_bpf_reinit_sk_prot.

Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20200309111243.6982-3-lmb@cloudflare.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Only update psock-&gt;saved_* if psock-&gt;sk_proto has not been initialized
yet. This allows us to get rid of tcp_bpf_reinit_sk_prot.

Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20200309111243.6982-3-lmb@cloudflare.com
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: sockmap: Only check ULP for TCP sockets</title>
<updated>2020-03-09T21:34:58+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@cloudflare.com</email>
</author>
<published>2020-03-09T11:12:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7b70973d7edb2f005511102d5a2e0116464a46a1'/>
<id>7b70973d7edb2f005511102d5a2e0116464a46a1</id>
<content type='text'>
The sock map code checks that a socket does not have an active upper
layer protocol before inserting it into the map. This requires casting
via inet_csk, which isn't valid for UDP sockets.

Guard checks for ULP by checking inet_sk(sk)-&gt;is_icsk first.

Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20200309111243.6982-2-lmb@cloudflare.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The sock map code checks that a socket does not have an active upper
layer protocol before inserting it into the map. This requires casting
via inet_csk, which isn't valid for UDP sockets.

Guard checks for ULP by checking inet_sk(sk)-&gt;is_icsk first.

Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20200309111243.6982-2-lmb@cloudflare.com
</pre>
</div>
</content>
</entry>
<entry>
<title>net, sk_msg: Annotate lockless access to sk_prot on clone</title>
<updated>2020-02-21T21:29:45+00:00</updated>
<author>
<name>Jakub Sitnicki</name>
<email>jakub@cloudflare.com</email>
</author>
<published>2020-02-18T17:10:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b8e202d1d1d0f182f01062804efb523ea9a9008c'/>
<id>b8e202d1d1d0f182f01062804efb523ea9a9008c</id>
<content type='text'>
sk_msg and ULP frameworks override protocol callbacks pointer in
sk-&gt;sk_prot, while tcp accesses it locklessly when cloning the listening
socket, that is with neither sk_lock nor sk_callback_lock held.

Once we enable use of listening sockets with sockmap (and hence sk_msg),
there will be shared access to sk-&gt;sk_prot if socket is getting cloned
while being inserted/deleted to/from the sockmap from another CPU:

Read side:

tcp_v4_rcv
  sk = __inet_lookup_skb(...)
  tcp_check_req(sk)
    inet_csk(sk)-&gt;icsk_af_ops-&gt;syn_recv_sock
      tcp_v4_syn_recv_sock
        tcp_create_openreq_child
          inet_csk_clone_lock
            sk_clone_lock
              READ_ONCE(sk-&gt;sk_prot)

Write side:

sock_map_ops-&gt;map_update_elem
  sock_map_update_elem
    sock_map_update_common
      sock_map_link_no_progs
        tcp_bpf_init
          tcp_bpf_update_sk_prot
            sk_psock_update_proto
              WRITE_ONCE(sk-&gt;sk_prot, ops)

sock_map_ops-&gt;map_delete_elem
  sock_map_delete_elem
    __sock_map_delete
     sock_map_unref
       sk_psock_put
         sk_psock_drop
           sk_psock_restore_proto
             tcp_update_ulp
               WRITE_ONCE(sk-&gt;sk_prot, proto)

Mark the shared access with READ_ONCE/WRITE_ONCE annotations.

Signed-off-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20200218171023.844439-2-jakub@cloudflare.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
sk_msg and ULP frameworks override protocol callbacks pointer in
sk-&gt;sk_prot, while tcp accesses it locklessly when cloning the listening
socket, that is with neither sk_lock nor sk_callback_lock held.

Once we enable use of listening sockets with sockmap (and hence sk_msg),
there will be shared access to sk-&gt;sk_prot if socket is getting cloned
while being inserted/deleted to/from the sockmap from another CPU:

Read side:

tcp_v4_rcv
  sk = __inet_lookup_skb(...)
  tcp_check_req(sk)
    inet_csk(sk)-&gt;icsk_af_ops-&gt;syn_recv_sock
      tcp_v4_syn_recv_sock
        tcp_create_openreq_child
          inet_csk_clone_lock
            sk_clone_lock
              READ_ONCE(sk-&gt;sk_prot)

Write side:

sock_map_ops-&gt;map_update_elem
  sock_map_update_elem
    sock_map_update_common
      sock_map_link_no_progs
        tcp_bpf_init
          tcp_bpf_update_sk_prot
            sk_psock_update_proto
              WRITE_ONCE(sk-&gt;sk_prot, ops)

sock_map_ops-&gt;map_delete_elem
  sock_map_delete_elem
    __sock_map_delete
     sock_map_unref
       sk_psock_put
         sk_psock_drop
           sk_psock_restore_proto
             tcp_update_ulp
               WRITE_ONCE(sk-&gt;sk_prot, proto)

Mark the shared access with READ_ONCE/WRITE_ONCE annotations.

Signed-off-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20200218171023.844439-2-jakub@cloudflare.com
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf, sk_msg: Don't clear saved sock proto on restore</title>
<updated>2020-02-19T15:54:05+00:00</updated>
<author>
<name>Jakub Sitnicki</name>
<email>jakub@cloudflare.com</email>
</author>
<published>2020-02-17T12:15:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a178b4585865a4c756c41bc5376f63416b7d9271'/>
<id>a178b4585865a4c756c41bc5376f63416b7d9271</id>
<content type='text'>
There is no need to clear psock-&gt;sk_proto when restoring socket protocol
callbacks in sk-&gt;sk_prot. The psock is about to get detached from the sock
and eventually destroyed. At worst we will restore the protocol callbacks
and the write callback twice.

This makes reasoning about psock state easier. Once psock is initialized,
we can count on psock-&gt;sk_proto always being set.

Signed-off-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20200217121530.754315-3-jakub@cloudflare.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is no need to clear psock-&gt;sk_proto when restoring socket protocol
callbacks in sk-&gt;sk_prot. The psock is about to get detached from the sock
and eventually destroyed. At worst we will restore the protocol callbacks
and the write callback twice.

This makes reasoning about psock state easier. Once psock is initialized,
we can count on psock-&gt;sk_proto always being set.

Signed-off-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20200217121530.754315-3-jakub@cloudflare.com
</pre>
</div>
</content>
</entry>
</feed>
