<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/skbuff.h, branch v5.14-rc6</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>page_pool: Allow drivers to hint on SKB recycling</title>
<updated>2021-06-07T21:11:47+00:00</updated>
<author>
<name>Ilias Apalodimas</name>
<email>ilias.apalodimas@linaro.org</email>
</author>
<published>2021-06-07T19:02:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6a5bcd84e886a9a91982e515c539529c28acdcc2'/>
<id>6a5bcd84e886a9a91982e515c539529c28acdcc2</id>
<content type='text'>
Up to now several high speed NICs have custom mechanisms of recycling
the allocated memory they use for their payloads.
Our page_pool API already has recycling capabilities that are always
used when we are running in 'XDP mode'. So let's tweak the API and the
kernel network stack slightly and allow the recycling to happen even
during the standard operation.
The API doesn't take into account 'split page' policies used by those
drivers currently, but can be extended once we have users for that.

The idea is to be able to intercept the packet on skb_release_data().
If it's a buffer coming from our page_pool API, recycle it back to the
pool for further usage or just release the packet entirely.

To achieve that we introduce a bit in struct sk_buff (pp_recycle:1) and
a field in struct page (page-&gt;pp) to store the page_pool pointer.
Storing the information in page-&gt;pp allows us to recycle both SKBs and
their fragments.
We could have skipped the skb bit entirely, since identical information
can be derived from struct page. However, in an effort to affect the free
path as little as possible, reading a single bit in the skb, which is
already in cache, is better than trying to derive identical information
from the page stored data.

The driver or page_pool has to take care of the sync operations on its own
during the buffer recycling since the buffer is, after opting in to the
recycling, never unmapped.

Since the gain on the drivers depends on the architecture, we are not
enabling recycling by default if the page_pool API is used on a driver.
In order to enable recycling, the driver must call skb_mark_for_recycle()
to store the information we need for recycling in page-&gt;pp and
enable the recycling bit, or page_pool_store_mem_info() for a fragment.

Co-developed-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Co-developed-by: Matteo Croce &lt;mcroce@microsoft.com&gt;
Signed-off-by: Matteo Croce &lt;mcroce@microsoft.com&gt;
Signed-off-by: Ilias Apalodimas &lt;ilias.apalodimas@linaro.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Up to now several high speed NICs have custom mechanisms of recycling
the allocated memory they use for their payloads.
Our page_pool API already has recycling capabilities that are always
used when we are running in 'XDP mode'. So let's tweak the API and the
kernel network stack slightly and allow the recycling to happen even
during the standard operation.
The API doesn't take into account 'split page' policies used by those
drivers currently, but can be extended once we have users for that.

The idea is to be able to intercept the packet on skb_release_data().
If it's a buffer coming from our page_pool API, recycle it back to the
pool for further usage or just release the packet entirely.

To achieve that we introduce a bit in struct sk_buff (pp_recycle:1) and
a field in struct page (page-&gt;pp) to store the page_pool pointer.
Storing the information in page-&gt;pp allows us to recycle both SKBs and
their fragments.
We could have skipped the skb bit entirely, since identical information
can be derived from struct page. However, in an effort to affect the free
path as little as possible, reading a single bit in the skb, which is
already in cache, is better than trying to derive identical information
from the page stored data.

The driver or page_pool has to take care of the sync operations on its own
during the buffer recycling since the buffer is, after opting in to the
recycling, never unmapped.

Since the gain on the drivers depends on the architecture, we are not
enabling recycling by default if the page_pool API is used on a driver.
In order to enable recycling, the driver must call skb_mark_for_recycle()
to store the information we need for recycling in page-&gt;pp and
enable the recycling bit, or page_pool_store_mem_info() for a fragment.

Co-developed-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Co-developed-by: Matteo Croce &lt;mcroce@microsoft.com&gt;
Signed-off-by: Matteo Croce &lt;mcroce@microsoft.com&gt;
Signed-off-by: Ilias Apalodimas &lt;ilias.apalodimas@linaro.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>skbuff: add a parameter to __skb_frag_unref</title>
<updated>2021-06-07T21:11:47+00:00</updated>
<author>
<name>Matteo Croce</name>
<email>mcroce@microsoft.com</email>
</author>
<published>2021-06-07T19:02:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c420c98982fa9e749c99e022845d5f323d098b72'/>
<id>c420c98982fa9e749c99e022845d5f323d098b72</id>
<content type='text'>
This is a prerequisite patch; the next one enables recycling of
skbs and fragments. Add an extra argument to __skb_frag_unref() to
handle recycling, and update the current users of the function accordingly.

Signed-off-by: Matteo Croce &lt;mcroce@microsoft.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is a prerequisite patch; the next one enables recycling of
skbs and fragments. Add an extra argument to __skb_frag_unref() to
handle recycling, and update the current users of the function accordingly.

Signed-off-by: Matteo Croce &lt;mcroce@microsoft.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Introduce skb_send_sock() for sock_map</title>
<updated>2021-04-01T17:56:13+00:00</updated>
<author>
<name>Cong Wang</name>
<email>cong.wang@bytedance.com</email>
</author>
<published>2021-03-31T02:32:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0739cd28f2645e814586c7536ba5da9825cb8029'/>
<id>0739cd28f2645e814586c7536ba5da9825cb8029</id>
<content type='text'>
We only have skb_send_sock_locked(), which requires callers
to use lock_sock(). Introduce a variant, skb_send_sock(),
which locks on its own, so callers no longer need to take
the lock. This will save us from adding a -&gt;sendmsg_locked
for each protocol.

To reuse the code, pass function pointers to __skb_send_sock()
and build skb_send_sock() and skb_send_sock_locked() on top.

Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Link: https://lore.kernel.org/bpf/20210331023237.41094-4-xiyou.wangcong@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We only have skb_send_sock_locked(), which requires callers
to use lock_sock(). Introduce a variant, skb_send_sock(),
which locks on its own, so callers no longer need to take
the lock. This will save us from adding a -&gt;sendmsg_locked
for each protocol.

To reuse the code, pass function pointers to __skb_send_sock()
and build skb_send_sock() and skb_send_sock_locked() on top.

Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Link: https://lore.kernel.org/bpf/20210331023237.41094-4-xiyou.wangcong@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2021-03-25T22:31:22+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2021-03-25T22:31:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=efd13b71a3fa31413f8d15342e01d44b60b0a432'/>
<id>efd13b71a3fa31413f8d15342e01d44b60b0a432</id>
<content type='text'>
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct</title>
<updated>2021-03-16T22:22:18+00:00</updated>
<author>
<name>wenxu</name>
<email>wenxu@ucloud.cn</email>
</author>
<published>2021-03-16T08:33:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d29334c15d33a6a92d2043ca88f84cd5ad026c57'/>
<id>d29334c15d33a6a92d2043ca88f84cd5ad026c57</id>
<content type='text'>
When openvswitch conntrack is offloaded with the act_ct action, the first
rule does conntrack in act_ct in the tc subsystem. If the packet then
misses the next rule in tc and falls back to the ovs datapath, the post_ct
flag is not set, which leads to a ct_state_key with the -trk flag.

Fixes: 7baf2429a1a9 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
Signed-off-by: wenxu &lt;wenxu@ucloud.cn&gt;
Reviewed-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When openvswitch conntrack is offloaded with the act_ct action, the first
rule does conntrack in act_ct in the tc subsystem. If the packet then
misses the next rule in tc and falls back to the ovs datapath, the post_ct
flag is not set, which leads to a ct_state_key with the -trk flag.

Fixes: 7baf2429a1a9 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
Signed-off-by: wenxu &lt;wenxu@ucloud.cn&gt;
Reviewed-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>skbuff: micro-optimize {,__}skb_header_pointer()</title>
<updated>2021-03-14T21:48:02+00:00</updated>
<author>
<name>Alexander Lobakin</name>
<email>alobakin@pm.me</email>
</author>
<published>2021-03-14T11:11:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d206121faf8bb2239cd970af0bd32f5203780427'/>
<id>d206121faf8bb2239cd970af0bd32f5203780427</id>
<content type='text'>
{,__}skb_header_pointer() helpers exist mainly for preventing
accesses-beyond-end of the linear data.
In the vast majority of cases, they bail out on the first condition.
All code after that is mostly a fallback.
Mark the most common branch as the 'likely' one to move it in-line.
Also, skb_copy_bits() can return negative values only when the input
arguments are invalid, e.g. offset is greater than skb-&gt;len. It can
be safely marked as 'unlikely' branch, assuming that hotpath code
provides sane input to not fail here.

These two changes bump the throughput with a single Flow Dissector pass
on every packet (e.g. with RPS or a driver that uses eth_get_headlen())
by 20 Mbps per flow/core.

Signed-off-by: Alexander Lobakin &lt;alobakin@pm.me&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
{,__}skb_header_pointer() helpers exist mainly for preventing
accesses-beyond-end of the linear data.
In the vast majority of cases, they bail out on the first condition.
All code after that is mostly a fallback.
Mark the most common branch as the 'likely' one to move it in-line.
Also, skb_copy_bits() can return negative values only when the input
arguments are invalid, e.g. offset is greater than skb-&gt;len. It can
be safely marked as 'unlikely' branch, assuming that hotpath code
provides sane input to not fail here.

These two changes bump the throughput with a single Flow Dissector pass
on every packet (e.g. with RPS or a driver that uses eth_get_headlen())
by 20 Mbps per flow/core.

Signed-off-by: Alexander Lobakin &lt;alobakin@pm.me&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>flow_dissector: constify raw input data argument</title>
<updated>2021-03-14T21:46:32+00:00</updated>
<author>
<name>Alexander Lobakin</name>
<email>alobakin@pm.me</email>
</author>
<published>2021-03-14T11:11:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f96533cded173b3b019001a505a746c3cd8fc323'/>
<id>f96533cded173b3b019001a505a746c3cd8fc323</id>
<content type='text'>
Flow Dissector code never modifies the input buffer, neither the skb nor
the raw data.
Make the 'data' argument const for all of the Flow Dissector's functions.

Signed-off-by: Alexander Lobakin &lt;alobakin@pm.me&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Flow Dissector code never modifies the input buffer, neither the skb nor
the raw data.
Make the 'data' argument const for all of the Flow Dissector's functions.

Signed-off-by: Alexander Lobakin &lt;alobakin@pm.me&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>skbuff: make __skb_header_pointer()'s data argument const</title>
<updated>2021-03-14T21:46:32+00:00</updated>
<author>
<name>Alexander Lobakin</name>
<email>alobakin@pm.me</email>
</author>
<published>2021-03-14T11:11:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e3305138da47f0ae2241e5daa18af276e1e54457'/>
<id>e3305138da47f0ae2241e5daa18af276e1e54457</id>
<content type='text'>
The function never modifies the input buffer, so the 'data' argument
can be marked as const.
This implies one harmless cast-away.

Signed-off-by: Alexander Lobakin &lt;alobakin@pm.me&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The function never modifies the input buffer, so the 'data' argument
can be marked as const.
This implies one harmless cast-away.

Signed-off-by: Alexander Lobakin &lt;alobakin@pm.me&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: plug skb_still_in_host_queue() to TSQ</title>
<updated>2021-03-12T02:35:31+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2021-03-11T20:35:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f4dae54e486d528d4dd98df116e7a522bbf12667'/>
<id>f4dae54e486d528d4dd98df116e7a522bbf12667</id>
<content type='text'>
Jakub and Neil reported an increase of RTO timers whenever
TX completions are delayed a bit more (by increasing
NIC TX coalescing parameters)

The main issue is that the TCP stack has logic preventing a packet
from being retransmitted if the prior clone has not yet been
orphaned or freed.

This logic came with commit 1f3279ae0c13 ("tcp: avoid
retransmits of TCP packets hanging in host queues")

Thankfully, in the case skb_still_in_host_queue() detects
the initial clone is still in flight, it can use TSQ logic
that will eventually retry later, at the moment the clone
is freed or orphaned.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Neil Spring &lt;ntspring@fb.com&gt;
Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Jakub and Neil reported an increase of RTO timers whenever
TX completions are delayed a bit more (by increasing
NIC TX coalescing parameters)

The main issue is that the TCP stack has logic preventing a packet
from being retransmitted if the prior clone has not yet been
orphaned or freed.

This logic came with commit 1f3279ae0c13 ("tcp: avoid
retransmits of TCP packets hanging in host queues")

Thankfully, in the case skb_still_in_host_queue() detects
the initial clone is still in flight, it can use TSQ logic
that will eventually retry later, at the moment the clone
is freed or orphaned.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Neil Spring &lt;ntspring@fb.com&gt;
Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>skmsg: Add function doc for skb-&gt;_sk_redir</title>
<updated>2021-03-04T15:32:09+00:00</updated>
<author>
<name>Cong Wang</name>
<email>cong.wang@bytedance.com</email>
</author>
<published>2021-03-01T18:48:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6ed6e1c761f6c8391af654facbbbf1748ae9f386'/>
<id>6ed6e1c761f6c8391af654facbbbf1748ae9f386</id>
<content type='text'>
This should fix the following warning:

include/linux/skbuff.h:932: warning: Function parameter or member
'_sk_redir' not described in 'sk_buff'

Reported-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Acked-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Link: https://lore.kernel.org/bpf/20210301184805.8174-1-xiyou.wangcong@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This should fix the following warning:

include/linux/skbuff.h:932: warning: Function parameter or member
'_sk_redir' not described in 'sk_buff'

Reported-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Acked-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Link: https://lore.kernel.org/bpf/20210301184805.8174-1-xiyou.wangcong@gmail.com
</pre>
</div>
</content>
</entry>
</feed>
