<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/net/core/dev.c, branch v5.12-rc7</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>net: fix hangup on napi_disable for threaded napi</title>
<updated>2021-04-09T19:50:31+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2021-04-09T15:24:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=27f0ad71699de41bae013c367b95a6b319cc46a9'/>
<id>27f0ad71699de41bae013c367b95a6b319cc46a9</id>
<content type='text'>
napi_disable() is subject to a hangup when the threaded
mode is enabled and the napi is under heavy traffic.

If the relevant napi has been scheduled and napi_disable()
kicks in before the next napi_threaded_wait() completes - so
that the latter quits due to the napi_disable_pending() condition -
the existing code leaves the NAPI_STATE_SCHED bit set, and the
napi_disable() loop waiting for that bit will hang.
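
Below is a minimal, self-contained userspace model of the hang (an
illustrative sketch only; the SCHED/DISABLE bits, the bounded retry
loop and the plain wait are simplified stand-ins, not the kernel code):

  /* compile with: cc -pthread hang-model.c */
  #include &lt;pthread.h&gt;
  #include &lt;stdatomic.h&gt;
  #include &lt;stdio.h&gt;
  #include &lt;unistd.h&gt;

  enum { SCHED = 1, DISABLE = 2 };
  static atomic_uint state = SCHED;       /* napi already scheduled */

  static void *napi_kthread(void *arg)
  {
      (void)arg;
      /* buggy wait: bail out on a pending disable without clearing SCHED */
      if (atomic_load(&amp;state) &amp; DISABLE)
          return NULL;                    /* SCHED stays set behind us */
      atomic_fetch_and(&amp;state, ~SCHED);   /* normal path: poll, then clear */
      return NULL;
  }

  int main(void)
  {
      pthread_t t;
      int tries = 0;

      atomic_fetch_or(&amp;state, DISABLE);   /* disable requested first */
      pthread_create(&amp;t, NULL, napi_kthread, NULL);

      /* napi_disable()-style loop: wait until SCHED can be taken over */
      while (atomic_load(&amp;state) &amp; SCHED) {
          if (++tries &gt; 1000) {
              puts("SCHED never cleared: the disable loop would spin forever");
              break;
          }
          usleep(1000);
      }
      pthread_join(t, NULL);
      return 0;
  }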

This patch addresses the issue by dropping the NAPI_STATE_DISABLE
bit test in napi_thread_wait(). A later napi_threaded_poll()
iteration will take care of clearing the NAPI_STATE_SCHED bit.

This also addresses a related problem reported by Jakub:
before this patch, a napi_disable()/napi_enable() pair killed
the napi thread, effectively disabling the threaded mode.
On the patched kernel, napi_disable() simply stops scheduling
the relevant thread.

v1 -&gt; v2:
  - let the main napi_thread_poll() loop clear the SCHED bit

Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/883923fa22745a9589e8610962b7dc59df09fb1f.1617981844.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: check all name nodes in __dev_alloc_name</title>
<updated>2021-03-18T21:40:53+00:00</updated>
<author>
<name>Jiri Bohac</name>
<email>jbohac@suse.cz</email>
</author>
<published>2021-03-18T03:42:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6c015a2256801597fadcbc11d287774c9c512fa5'/>
<id>6c015a2256801597fadcbc11d287774c9c512fa5</id>
<content type='text'>
__dev_alloc_name(), when supplied with a name containing '%d',
will search for the first available device number to generate a
unique device name.

Since commit ff92741270bf8b6e78aa885f166b68c7a67ab13a ("net:
introduce name_node struct to be used in hashlist") network
devices may have alternate names.  __dev_alloc_name() does not take
these alternate names into account when picking the device number,
possibly generating a name that is already taken and failing with
-ENFILE as a result.

This demonstrates the bug:

    # rmmod dummy 2&gt;/dev/null
    # ip link property add dev lo altname dummy0
    # modprobe dummy numdummies=1
    modprobe: ERROR: could not insert 'dummy': Too many open files in system

Instead of creating a device named dummy1, modprobe fails.

Fix this by checking all the names in the d-&gt;name_node list, not just d-&gt;name.
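
A minimal userspace model of the fixed lookup (an illustrative sketch
only; the structs and the linear scan are simplified stand-ins for the
kernel's netdev/name_node handling):

  #include &lt;stdio.h&gt;
  #include &lt;string.h&gt;

  struct name_node { const char *name; struct name_node *next; };
  struct netdev    { const char *name; struct name_node *altnames; };

  /* as in the reproducer: lo carries the alternate name "dummy0" */
  static struct name_node lo_alt = { "dummy0", NULL };
  static struct netdev devs[] = { { "lo", &amp;lo_alt }, { "eth0", NULL } };

  static int name_in_use(const char *name)
  {
      for (size_t i = 0; i &lt; sizeof(devs) / sizeof(devs[0]); i++) {
          if (!strcmp(devs[i].name, name))
              return 1;
          /* the fix: walk the alternate names too, not just d-&gt;name */
          for (struct name_node *n = devs[i].altnames; n; n = n-&gt;next)
              if (!strcmp(n-&gt;name, name))
                  return 1;
      }
      return 0;
  }

  int main(void)
  {
      char buf[32];

      for (int unit = 0; unit &lt; 100; unit++) {
          snprintf(buf, sizeof(buf), "dummy%d", unit);
          if (!name_in_use(buf)) {
              printf("allocated %s\n", buf);  /* dummy1, since dummy0 is taken */
              return 0;
          }
      }
      return 1;
  }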

Signed-off-by: Jiri Bohac &lt;jbohac@suse.cz&gt;
Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist")
Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: fix race between napi kthread mode and busy poll</title>
<updated>2021-03-17T21:31:17+00:00</updated>
<author>
<name>Wei Wang</name>
<email>weiwan@google.com</email>
</author>
<published>2021-03-16T22:36:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cb038357937ee4f589aab2469ec3896dce90f317'/>
<id>cb038357937ee4f589aab2469ec3896dce90f317</id>
<content type='text'>
Currently, napi_thread_wait() checks the NAPI_STATE_SCHED bit to
determine whether the kthread owns this napi and can call napi-&gt;poll() on
it. However, if socket busy poll is enabled, it is possible that the
busy poll thread grabs this SCHED bit (after the previous napi-&gt;poll()
invokes napi_complete_done() and clears the SCHED bit) and tries to poll
the same napi. napi_disable() could grab the SCHED bit as well.
This patch fixes the race by adding a new bit,
NAPI_STATE_SCHED_THREADED, to napi-&gt;state. This bit gets set in
____napi_schedule() if threaded mode is enabled and gets cleared
in napi_complete_done(), and the kthread only polls the napi if this
bit is set. This distinguishes the kthread's ownership of the napi
from the other scenarios and fixes the race.
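
A minimal single-threaded sketch of the ownership rule described above
(illustrative only, not the kernel code; the bit values and helper
names are made up for the example):

  #include &lt;stdio.h&gt;

  enum { SCHED = 1, SCHED_THREADED = 2 };

  static unsigned int state;
  static int threaded = 1;

  static void schedule_napi(void)
  {
      state |= SCHED;
      if (threaded)
          state |= SCHED_THREADED;    /* hand ownership to the kthread */
  }

  static void complete_napi(void)
  {
      state &amp;= ~(SCHED | SCHED_THREADED);
  }

  static int kthread_may_poll(void)
  {
      /* the kthread polls only when SCHED_THREADED is set, so a busy-poll
       * thread that re-grabbed SCHED on its own cannot race with it */
      return !!(state &amp; SCHED_THREADED);
  }

  int main(void)
  {
      schedule_napi();
      printf("kthread may poll: %d\n", kthread_may_poll());  /* 1 */

      complete_napi();
      state |= SCHED;                 /* busy poll grabs SCHED alone */
      printf("kthread may poll: %d\n", kthread_may_poll());  /* 0 */
      return 0;
  }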

Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
Reported-by: Martin Zaharinov &lt;micron10@gmail.com&gt;
Suggested-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Wei Wang &lt;weiwan@google.com&gt;
Cc: Alexander Duyck &lt;alexanderduyck@fb.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Paolo Abeni &lt;pabeni@redhat.com&gt;
Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>can: dev: Move device back to init netns on owning netns delete</title>
<updated>2021-03-16T07:40:04+00:00</updated>
<author>
<name>Martin Willi</name>
<email>martin@strongswan.org</email>
</author>
<published>2021-03-02T12:24:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3a5ca857079ea022e0b1b17fc154f7ad7dbc150f'/>
<id>3a5ca857079ea022e0b1b17fc154f7ad7dbc150f</id>
<content type='text'>
When a non-initial netns is destroyed, the usual policy is to delete
all virtual network interfaces contained, but move physical interfaces
back to the initial netns. This keeps the physical interface visible
on the system.

CAN devices are somewhat special, as they define rtnl_link_ops even
if they are physical devices. If a CAN interface is moved into a
non-initial netns, destroying that netns lets the interface vanish
instead of moving it back to the initial netns. default_device_exit()
skips CAN interfaces due to having rtnl_link_ops set. Reproducer:

  ip netns add foo
  ip link set can0 netns foo
  ip netns delete foo

WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
Workqueue: netns cleanup_net
[&lt;c010e700&gt;] (unwind_backtrace) from [&lt;c010a1d8&gt;] (show_stack+0x10/0x14)
[&lt;c010a1d8&gt;] (show_stack) from [&lt;c086dc10&gt;] (dump_stack+0x94/0xa8)
[&lt;c086dc10&gt;] (dump_stack) from [&lt;c086b938&gt;] (__warn+0xb8/0x114)
[&lt;c086b938&gt;] (__warn) from [&lt;c086ba10&gt;] (warn_slowpath_fmt+0x7c/0xac)
[&lt;c086ba10&gt;] (warn_slowpath_fmt) from [&lt;c0629f20&gt;] (ops_exit_list+0x38/0x60)
[&lt;c0629f20&gt;] (ops_exit_list) from [&lt;c062a5c4&gt;] (cleanup_net+0x230/0x380)
[&lt;c062a5c4&gt;] (cleanup_net) from [&lt;c0142c20&gt;] (process_one_work+0x1d8/0x438)
[&lt;c0142c20&gt;] (process_one_work) from [&lt;c0142ee4&gt;] (worker_thread+0x64/0x5a8)
[&lt;c0142ee4&gt;] (worker_thread) from [&lt;c0148a98&gt;] (kthread+0x148/0x14c)
[&lt;c0148a98&gt;] (kthread) from [&lt;c0100148&gt;] (ret_from_fork+0x14/0x2c)

To properly restore physical CAN devices to the initial netns on owning
netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
For CAN devices setting this flag, default_device_exit() considers them
non-virtual, applying the usual namespace move.
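
A minimal userspace sketch of the cleanup decision (illustrative only;
the struct layout and the flag name refund_to_init_netns are
placeholders, not the actual kernel field):

  #include &lt;stdio.h&gt;

  struct link_ops { int refund_to_init_netns; };  /* placeholder flag name */

  struct netdev {
      const char      *name;
      struct link_ops *ops;    /* set for virtual devices, and for CAN */
  };

  static void netns_exit_device(const struct netdev *dev)
  {
      /* old rule: anything with link ops is deleted with the netns;
       * new rule: the ops may opt out and get moved back instead */
      if (dev-&gt;ops &amp;&amp; !dev-&gt;ops-&gt;refund_to_init_netns)
          printf("%s: deleted together with the netns\n", dev-&gt;name);
      else
          printf("%s: moved back to the initial netns\n", dev-&gt;name);
  }

  int main(void)
  {
      struct link_ops veth_ops = { 0 };
      struct link_ops can_ops  = { 1 };   /* the CAN driver opts out */
      struct netdev veth = { "veth0", &amp;veth_ops };
      struct netdev can  = { "can0",  &amp;can_ops  };

      netns_exit_device(&amp;veth);
      netns_exit_device(&amp;can);
      return 0;
  }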

The issue was introduced in the commit mentioned below, as at that time
CAN devices did not have a dellink() operation.

Fixes: e008b5fc8dc7 ("net: Simplfy default_device_exit and improve batching.")
Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org
Signed-off-by: Martin Willi &lt;martin@strongswan.org&gt;
Signed-off-by: Marc Kleine-Budde &lt;mkl@pengutronix.de&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next</title>
<updated>2021-02-16T21:14:06+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2021-02-16T21:14:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b8af417e4d93caeefb89bbfbd56ec95dedd8dab5'/>
<id>b8af417e4d93caeefb89bbfbd56ec95dedd8dab5</id>
<content type='text'>
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-02-16

The following pull-request contains BPF updates for your *net-next* tree.

There's a small merge conflict between 7eeba1706eba ("tcp: Add receive timestamp
support for receive zerocopy.") from net-next tree and 9cacf81f8161 ("bpf: Remove
extra lock_sock for TCP_ZEROCOPY_RECEIVE") from bpf-next tree. Resolve as follows:

  [...]
                lock_sock(sk);
                err = tcp_zerocopy_receive(sk, &amp;zc, &amp;tss);
                err = BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sk, level, optname,
                                                          &amp;zc, &amp;len, err);
                release_sock(sk);
  [...]

We've added 116 non-merge commits during the last 27 day(s) which contain
a total of 156 files changed, 5662 insertions(+), 1489 deletions(-).

The main changes are:

1) Add support for pointers to types with known size among global function
   args, to overcome the limit on the max number of allowed args, from Dmitrii Banshchikov.

2) Add bpf_iter for task_vma which can be used to generate information similar
   to /proc/pid/maps, from Song Liu.

3) Enable bpf_{g,s}etsockopt() from all sock_addr related program hooks. Allow
   rewriting bind user ports from BPF side below the ip_unprivileged_port_start
   range, both from Stanislav Fomichev.

4) Prevent recursion on fentry/fexit &amp; sleepable programs and allow map-in-map
   as well as per-cpu maps for the latter, from Alexei Starovoitov.

5) Add selftest script to run BPF CI locally. Also enable BPF ringbuffer
   for sleepable programs, both from KP Singh.

6) Extend verifier to enable variable offset read/write access to the BPF
   program stack, from Andrei Matei.

7) Improve tc &amp; XDP MTU handling and add a new bpf_check_mtu() helper to
   query device MTU from programs, from Jesper Dangaard Brouer.

8) Allow bpf_get_socket_cookie() helper also be called from [sleepable] BPF
   tracing programs, from Florent Revest.

9) Extend x86 JIT to pad JMPs with NOPs for helping image to converge when
   otherwise too many passes are required, from Gary Lin.

10) Verifier fixes on atomics with BPF_FETCH as well as function-by-function
    verification both related to zero-extension handling, from Ilya Leoshkevich.

11) Better kernel build integration of resolve_btfids tool, from Jiri Olsa.

12) Batch of AF_XDP selftest cleanups and small performance improvement
    for libbpf's xsk map redirect for newer kernels, from Björn Töpel.

13) Follow-up BPF doc and verifier improvements around atomics with
    BPF_FETCH, from Brendan Jackman.

14) Permit zero-sized data sections e.g. if ELF .rodata section contains
    read-only data from local variables, from Yonghong Song.

15) veth driver skb bulk-allocation for ndo_xdp_xmit, from Lorenzo Bianconi.
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing</title>
<updated>2021-02-13T22:32:04+00:00</updated>
<author>
<name>Alexander Lobakin</name>
<email>alobakin@pm.me</email>
</author>
<published>2021-02-13T14:13:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9243adfc311a20371c3f4d8eaf0af4b135e6fac3'/>
<id>9243adfc311a20371c3f4d8eaf0af4b135e6fac3</id>
<content type='text'>
napi_frags_finish() and napi_skb_finish() can only be called inside
NAPI Rx context, so we can feed the NAPI cache with skbuff_heads that
got the NAPI_MERGED_FREE verdict instead of freeing them immediately.
Replace __kfree_skb() with __kfree_skb_defer() in napi_skb_finish()
and move napi_skb_free_stolen_head() to skbuff.c, so it can drop skbs
into the NAPI cache.
As many drivers call napi_alloc_skb()/napi_get_frags() on their
receive path, this becomes especially useful.
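
A minimal userspace sketch of the defer-to-cache idea (illustrative
only; malloc/free stand in for skbuff_head handling and the cache size
is arbitrary):

  #include &lt;stdio.h&gt;
  #include &lt;stdlib.h&gt;

  #define CACHE_SIZE 64

  static void *cache[CACHE_SIZE];     /* stand-in for the per-cpu NAPI cache */
  static int   cached;

  static void defer_free(void *skb)
  {
      if (cached == CACHE_SIZE) {
          /* cache full: flush everything in one bulk pass */
          for (int i = 0; i &lt; CACHE_SIZE; i++)
              free(cache[i]);
          cached = 0;
      }
      cache[cached++] = skb;          /* keep the head around for reuse */
  }

  int main(void)
  {
      for (int i = 0; i &lt; 200; i++)
          defer_free(malloc(64));
      printf("%d heads still cached for reuse\n", cached);

      for (int i = 0; i &lt; cached; i++)    /* final cleanup */
          free(cache[i]);
      return 0;
  }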

Signed-off-by: Alexander Lobakin &lt;alobakin@pm.me&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>skbuff: remove __kfree_skb_flush()</title>
<updated>2021-02-13T22:32:03+00:00</updated>
<author>
<name>Alexander Lobakin</name>
<email>alobakin@pm.me</email>
</author>
<published>2021-02-13T14:12:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fec6e49b63989657bc4076dad99fa51d5ece34da'/>
<id>fec6e49b63989657bc4076dad99fa51d5ece34da</id>
<content type='text'>
This function isn't really needed, as the NAPI skb queue gets bulk-freed
anyway when there's no more room, and it may even reduce the efficiency
of bulk operations.
It will be needed even less after the skb cache is reused on the
allocation path, so remove it and thereby lighten network softirqs a bit.

Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Alexander Lobakin &lt;alobakin@pm.me&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: Drop MTU check when doing TC-BPF redirect to ingress</title>
<updated>2021-02-13T00:15:28+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2021-02-09T13:38:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5f7d57280c1982d993d5f4ff0edac310f820f607'/>
<id>5f7d57280c1982d993d5f4ff0edac310f820f607</id>
<content type='text'>
The use-case for dropping the MTU check when TC-BPF redirects to
ingress is described by Eyal Birger in email[0]. In summary, it is the
ability to increase the packet size (e.g. with IPv6 headers for NAT64),
redirect the packet to ingress, and let the normal netstack fragment
the packet as needed.

[0] https://lore.kernel.org/netdev/CAHsH6Gug-hsLGHQ6N0wtixdOa85LDZ3HNRHVd0opR=19Qo4W4Q@mail.gmail.com/

V15:
 - missing static for function declaration

V9:
 - Make net_device "up" (IFF_UP) check explicit in skb_do_redirect

V4:
 - Keep net_device "up" (IFF_UP) check.
 - Adjustment to handle bpf_redirect_peer() helper

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/161287790971.790810.11785274340154740591.stgit@firesoul
</content>
</entry>
<entry>
<title>net: fix dev_ifsioc_locked() race condition</title>
<updated>2021-02-12T02:14:19+00:00</updated>
<author>
<name>Cong Wang</name>
<email>cong.wang@bytedance.com</email>
</author>
<published>2021-02-11T19:34:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3b23a32a63219f51a5298bc55a65ecee866e79d0'/>
<id>3b23a32a63219f51a5298bc55a65ecee866e79d0</id>
<content type='text'>
dev_ifsioc_locked() is called with only the RCU read lock held, so when
there is a parallel writer changing the mac address, it could
get a partially updated mac address, as shown below:

Thread 1			Thread 2
// eth_commit_mac_addr_change()
memcpy(dev-&gt;dev_addr, addr-&gt;sa_data, ETH_ALEN);
				// dev_ifsioc_locked()
				memcpy(ifr-&gt;ifr_hwaddr.sa_data,
					dev-&gt;dev_addr,...);

Close this race condition by guarding the accesses with a RW semaphore,
as netdev_get_name() does. We cannot use a seqlock here, as it does
not allow blocking. The writers already take RTNL anyway, so this does
not affect the slow path. To avoid bothering existing
dev_set_mac_address() callers in drivers, introduce a new wrapper
just for user-facing callers on the ioctl and rtnetlink paths.
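
A minimal userspace sketch of the guarded accesses (illustrative only;
a pthread rwlock stands in for the kernel's RW semaphore):

  /* compile with: cc -pthread mac-model.c */
  #include &lt;pthread.h&gt;
  #include &lt;stdio.h&gt;
  #include &lt;string.h&gt;

  static unsigned char dev_addr[6];
  static pthread_rwlock_t addr_lock = PTHREAD_RWLOCK_INITIALIZER;

  /* writer side (mac address change): exclusive lock around the memcpy */
  static void set_mac(const unsigned char *addr)
  {
      pthread_rwlock_wrlock(&amp;addr_lock);
      memcpy(dev_addr, addr, 6);
      pthread_rwlock_unlock(&amp;addr_lock);
  }

  /* reader side (ioctl path): shared lock, so the copy can never observe
   * a half-updated address even with a concurrent writer */
  static void get_mac(unsigned char *out)
  {
      pthread_rwlock_rdlock(&amp;addr_lock);
      memcpy(out, dev_addr, 6);
      pthread_rwlock_unlock(&amp;addr_lock);
  }

  int main(void)
  {
      const unsigned char a[6] = { 0x02, 0, 0, 0, 0, 0x01 };
      unsigned char out[6];

      set_mac(a);
      get_mac(out);
      printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
             out[0], out[1], out[2], out[3], out[4], out[5]);
      return 0;
  }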

Note, bonding also changes slave mac addresses but that requires
a separate patch due to the complexity of bonding code.

Fixes: 3710becf8a58 ("net: RCU locking for simple ioctl()")
Reported-by: "Gong, Sishuai" &lt;sishuai@purdue.edu&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2021-02-10T21:30:12+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2021-02-10T21:30:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dc9d87581d464e7b7d38853d6904b70b6c920d99'/>
<id>dc9d87581d464e7b7d38853d6904b70b6c920d99</id>
<content type='text'>
</content>
</entry>
</feed>
