<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/net/ipv4, branch v5.17-rc4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path</title>
<updated>2022-02-09T04:49:52+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-02-08T05:34:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5611a00697c8ecc5aad04392bea629e9d6a20463'/>
<id>5611a00697c8ecc5aad04392bea629e9d6a20463</id>
<content type='text'>
ip[6]mr_free_table() can only be called under RTNL lock.

RTNL: assertion failed at net/core/dev.c (10367)
WARNING: CPU: 1 PID: 5890 at net/core/dev.c:10367 unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367
Modules linked in:
CPU: 1 PID: 5890 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller-11627-g422ee58dc0ef #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367
Code: 0f 85 9b ee ff ff e8 69 07 4b fa ba 7f 28 00 00 48 c7 c6 00 90 ae 8a 48 c7 c7 40 90 ae 8a c6 05 6d b1 51 06 01 e8 8c 90 d8 01 &lt;0f&gt; 0b e9 70 ee ff ff e8 3e 07 4b fa 4c 89 e7 e8 86 2a 59 fa e9 ee
RSP: 0018:ffffc900046ff6e0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888050f51d00 RSI: ffffffff815fa008 RDI: fffff520008dfece
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815f3d6e R11: 0000000000000000 R12: 00000000fffffff4
R13: dffffc0000000000 R14: ffffc900046ff750 R15: ffff88807b7dc000
FS:  00007f4ab736e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee0b4f8990 CR3: 000000001e7d2000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 &lt;TASK&gt;
 mroute_clean_tables+0x244/0xb40 net/ipv6/ip6mr.c:1509
 ip6mr_free_table net/ipv6/ip6mr.c:389 [inline]
 ip6mr_rules_init net/ipv6/ip6mr.c:246 [inline]
 ip6mr_net_init net/ipv6/ip6mr.c:1306 [inline]
 ip6mr_net_init+0x3f0/0x4e0 net/ipv6/ip6mr.c:1298
 ops_init+0xaf/0x470 net/core/net_namespace.c:140
 setup_net+0x54f/0xbb0 net/core/net_namespace.c:331
 copy_net_ns+0x318/0x760 net/core/net_namespace.c:475
 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
 copy_namespaces+0x391/0x450 kernel/nsproxy.c:178
 copy_process+0x2e0c/0x7300 kernel/fork.c:2167
 kernel_clone+0xe7/0xab0 kernel/fork.c:2555
 __do_sys_clone+0xc8/0x110 kernel/fork.c:2672
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f4ab89f9059
Code: Unable to access opcode bytes at RIP 0x7f4ab89f902f.
RSP: 002b:00007f4ab736e118 EFLAGS: 00000206 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 00007f4ab8b0bf60 RCX: 00007f4ab89f9059
RDX: 0000000020000280 RSI: 0000000020000270 RDI: 0000000040200000
RBP: 00007f4ab8a5308d R08: 0000000020000300 R09: 0000000020000300
R10: 00000000200002c0 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ffc3977cc1f R14: 00007f4ab736e300 R15: 0000000000022000
 &lt;/TASK&gt;

Fixes: f243e5a7859a ("ipmr,ip6mr: call ip6mr_free_table() on failure path")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Cong Wang &lt;cong.wang@bytedance.com&gt;
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Link: https://lore.kernel.org/r/20220208053451.2885398-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ip[6]mr_free_table() can only be called under RTNL lock.

RTNL: assertion failed at net/core/dev.c (10367)
WARNING: CPU: 1 PID: 5890 at net/core/dev.c:10367 unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367
Modules linked in:
CPU: 1 PID: 5890 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller-11627-g422ee58dc0ef #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367
Code: 0f 85 9b ee ff ff e8 69 07 4b fa ba 7f 28 00 00 48 c7 c6 00 90 ae 8a 48 c7 c7 40 90 ae 8a c6 05 6d b1 51 06 01 e8 8c 90 d8 01 &lt;0f&gt; 0b e9 70 ee ff ff e8 3e 07 4b fa 4c 89 e7 e8 86 2a 59 fa e9 ee
RSP: 0018:ffffc900046ff6e0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888050f51d00 RSI: ffffffff815fa008 RDI: fffff520008dfece
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815f3d6e R11: 0000000000000000 R12: 00000000fffffff4
R13: dffffc0000000000 R14: ffffc900046ff750 R15: ffff88807b7dc000
FS:  00007f4ab736e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee0b4f8990 CR3: 000000001e7d2000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 &lt;TASK&gt;
 mroute_clean_tables+0x244/0xb40 net/ipv6/ip6mr.c:1509
 ip6mr_free_table net/ipv6/ip6mr.c:389 [inline]
 ip6mr_rules_init net/ipv6/ip6mr.c:246 [inline]
 ip6mr_net_init net/ipv6/ip6mr.c:1306 [inline]
 ip6mr_net_init+0x3f0/0x4e0 net/ipv6/ip6mr.c:1298
 ops_init+0xaf/0x470 net/core/net_namespace.c:140
 setup_net+0x54f/0xbb0 net/core/net_namespace.c:331
 copy_net_ns+0x318/0x760 net/core/net_namespace.c:475
 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
 copy_namespaces+0x391/0x450 kernel/nsproxy.c:178
 copy_process+0x2e0c/0x7300 kernel/fork.c:2167
 kernel_clone+0xe7/0xab0 kernel/fork.c:2555
 __do_sys_clone+0xc8/0x110 kernel/fork.c:2672
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f4ab89f9059
Code: Unable to access opcode bytes at RIP 0x7f4ab89f902f.
RSP: 002b:00007f4ab736e118 EFLAGS: 00000206 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 00007f4ab8b0bf60 RCX: 00007f4ab89f9059
RDX: 0000000020000280 RSI: 0000000020000270 RDI: 0000000040200000
RBP: 00007f4ab8a5308d R08: 0000000020000300 R09: 0000000020000300
R10: 00000000200002c0 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ffc3977cc1f R14: 00007f4ab736e300 R15: 0000000000022000
 &lt;/TASK&gt;

Fixes: f243e5a7859a ("ipmr,ip6mr: call ip6mr_free_table() on failure path")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Cong Wang &lt;cong.wang@bytedance.com&gt;
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Link: https://lore.kernel.org/r/20220208053451.2885398-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: take care of mixed splice()/sendmsg(MSG_ZEROCOPY) case</title>
<updated>2022-02-05T04:07:12+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-02-03T22:55:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f8d9d938514f46c4892aff6bfe32f425e84d81cc'/>
<id>f8d9d938514f46c4892aff6bfe32f425e84d81cc</id>
<content type='text'>
syzbot found that mixing sendpage() and sendmsg(MSG_ZEROCOPY)
calls over the same TCP socket would again trigger the
infamous warning in inet_sock_destruct()

	WARN_ON(sk_forward_alloc_get(sk));

While Talal took into account a mix of regular copied data
and MSG_ZEROCOPY one in the same skb, the sendpage() path
has been forgotten.

We want the charging to happen for sendpage(), because
pages could be coming from a pipe. What is missing is the
downgrading of pure zerocopy status to make sure
sk_forward_alloc will stay synced.

Add tcp_downgrade_zcopy_pure() helper so that we can
use it from the two callers.

Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Cc: Talal Ahmad &lt;talalahmad@google.com&gt;
Cc: Arjun Roy &lt;arjunroy@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Link: https://lore.kernel.org/r/20220203225547.665114-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
syzbot found that mixing sendpage() and sendmsg(MSG_ZEROCOPY)
calls over the same TCP socket would again trigger the
infamous warning in inet_sock_destruct()

	WARN_ON(sk_forward_alloc_get(sk));

While Talal took into account a mix of regular copied data
and MSG_ZEROCOPY one in the same skb, the sendpage() path
has been forgotten.

We want the charging to happen for sendpage(), because
pages could be coming from a pipe. What is missing is the
downgrading of pure zerocopy status to make sure
sk_forward_alloc will stay synced.

Add tcp_downgrade_zcopy_pure() helper so that we can
use it from the two callers.

Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Cc: Talal Ahmad &lt;talalahmad@google.com&gt;
Cc: Arjun Roy &lt;arjunroy@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Link: https://lore.kernel.org/r/20220203225547.665114-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()</title>
<updated>2022-02-03T00:22:37+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-02-01T18:46:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b67985be400969578d4d4b17299714c0e5d2c07b'/>
<id>b67985be400969578d4d4b17299714c0e5d2c07b</id>
<content type='text'>
tcp_shift_skb_data() might collapse three packets into a larger one.

P_A, P_B, P_C  -&gt; P_ABC

Historically, it used a single tcp_skb_can_collapse_to(P_A) call,
because it was enough.

In commit 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions"),
this call was replaced by a call to tcp_skb_can_collapse(P_A, P_B)

But the now needed test over P_C has been missed.

This probably broke MPTCP.

Then later, commit 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs")
added an extra condition to tcp_skb_can_collapse(), but the missing call
from tcp_shift_skb_data() is also breaking TCP zerocopy, because P_A and P_C
might have different skb_zcopy_pure() status.

Fixes: 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions")
Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Mat Martineau &lt;mathew.j.martineau@linux.intel.com&gt;
Cc: Talal Ahmad &lt;talalahmad@google.com&gt;
Cc: Arjun Roy &lt;arjunroy@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Link: https://lore.kernel.org/r/20220201184640.756716-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
tcp_shift_skb_data() might collapse three packets into a larger one.

P_A, P_B, P_C  -&gt; P_ABC

Historically, it used a single tcp_skb_can_collapse_to(P_A) call,
because it was enough.

In commit 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions"),
this call was replaced by a call to tcp_skb_can_collapse(P_A, P_B)

But the now needed test over P_C has been missed.

This probably broke MPTCP.

Then later, commit 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs")
added an extra condition to tcp_skb_can_collapse(), but the missing call
from tcp_shift_skb_data() is also breaking TCP zerocopy, because P_A and P_C
might have different skb_zcopy_pure() status.

Fixes: 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions")
Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Mat Martineau &lt;mathew.j.martineau@linux.intel.com&gt;
Cc: Talal Ahmad &lt;talalahmad@google.com&gt;
Cc: Arjun Roy &lt;arjunroy@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Link: https://lore.kernel.org/r/20220201184640.756716-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: fix mem under-charging with zerocopy sendmsg()</title>
<updated>2022-02-02T04:21:40+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-02-01T06:52:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=479f5547239d970d3833f15f54a6481fffdb91ec'/>
<id>479f5547239d970d3833f15f54a6481fffdb91ec</id>
<content type='text'>
We got reports of following warning in inet_sock_destruct()

	WARN_ON(sk_forward_alloc_get(sk));

Whenever we add a non zero-copy fragment to a pure zerocopy skb,
we have to anticipate that whole skb-&gt;truesize will be uncharged
when skb is finally freed.

skb-&gt;data_len is the payload length. But the memory truesize
estimated by __zerocopy_sg_from_iter() is page aligned.

Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Talal Ahmad &lt;talalahmad@google.com&gt;
Cc: Arjun Roy &lt;arjunroy@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Link: https://lore.kernel.org/r/20220201065254.680532-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We got reports of following warning in inet_sock_destruct()

	WARN_ON(sk_forward_alloc_get(sk));

Whenever we add a non zero-copy fragment to a pure zerocopy skb,
we have to anticipate that whole skb-&gt;truesize will be uncharged
when skb is finally freed.

skb-&gt;data_len is the payload length. But the memory truesize
estimated by __zerocopy_sg_from_iter() is page aligned.

Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Talal Ahmad &lt;talalahmad@google.com&gt;
Cc: Arjun Roy &lt;arjunroy@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Link: https://lore.kernel.org/r/20220201065254.680532-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf</title>
<updated>2022-01-28T02:53:02+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-01-28T02:53:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=33d12dc91bc41183003913b888cc492420ae6ef8'/>
<id>33d12dc91bc41183003913b888cc492420ae6ef8</id>
<content type='text'>
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Remove leftovers from flowtable modules, from Geert Uytterhoeven.

2) Missing refcount increment of conntrack template in nft_ct,
   from Florian Westphal.

3) Reduce nft_zone selftest time, also from Florian.

4) Add selftest to cover stateless NAT on fragments, from Florian Westphal.

5) Do not set net_device when for reject packets from the bridge path,
   from Phil Sutter.

6) Cancel register tracking info on nft_byteorder operations.

7) Extend nft_concat_range selftest to cover set reload with no elements,
   from Florian Westphal.

8) Remove useless update of pointer in chain blob builder, reported
   by kbuild test robot.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
  netfilter: nf_tables: remove assignment with no effect in chain blob builder
  selftests: nft_concat_range: add test for reload with no element add/del
  netfilter: nft_byteorder: track register operations
  netfilter: nft_reject_bridge: Fix for missing reply from prerouting
  selftests: netfilter: check stateless nat udp checksum fixup
  selftests: netfilter: reduce zone stress test running time
  netfilter: nft_ct: fix use after free when attaching zone template
  netfilter: Remove flowtable relics
====================

Link: https://lore.kernel.org/r/20220127235235.656931-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Remove leftovers from flowtable modules, from Geert Uytterhoeven.

2) Missing refcount increment of conntrack template in nft_ct,
   from Florian Westphal.

3) Reduce nft_zone selftest time, also from Florian.

4) Add selftest to cover stateless NAT on fragments, from Florian Westphal.

5) Do not set net_device when for reject packets from the bridge path,
   from Phil Sutter.

6) Cancel register tracking info on nft_byteorder operations.

7) Extend nft_concat_range selftest to cover set reload with no elements,
   from Florian Westphal.

8) Remove useless update of pointer in chain blob builder, reported
   by kbuild test robot.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
  netfilter: nf_tables: remove assignment with no effect in chain blob builder
  selftests: nft_concat_range: add test for reload with no element add/del
  netfilter: nft_byteorder: track register operations
  netfilter: nft_reject_bridge: Fix for missing reply from prerouting
  selftests: netfilter: check stateless nat udp checksum fixup
  selftests: netfilter: reduce zone stress test running time
  netfilter: nft_ct: fix use after free when attaching zone template
  netfilter: Remove flowtable relics
====================

Link: https://lore.kernel.org/r/20220127235235.656931-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2022-01-27T18:58:39+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-01-27T18:58:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=23a46422c56144939c091c76cf389aa863ce9c18'/>
<id>23a46422c56144939c091c76cf389aa863ce9c18</id>
<content type='text'>
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter and can.

  Current release - new code bugs:

   - tcp: add a missing sk_defer_free_flush() in tcp_splice_read()

   - tcp: add a stub for sk_defer_free_flush(), fix CONFIG_INET=n

   - nf_tables: set last expression in register tracking area

   - nft_connlimit: fix memleak if nf_ct_netns_get() fails

   - mptcp: fix removing ids bitmap setting

   - bonding: use rcu_dereference_rtnl when getting active slave

   - fix three cases of sleep in atomic context in drivers: lan966x, gve

   - handful of build fixes for esoteric drivers after netdev-&gt;dev_addr
     was made const

  Previous releases - regressions:

   - revert "ipv6: Honor all IPv6 PIO Valid Lifetime values", it broke
     Linux compatibility with USGv6 tests

   - procfs: show net device bound packet types

   - ipv4: fix ip option filtering for locally generated fragments

   - phy: broadcom: hook up soft_reset for BCM54616S

  Previous releases - always broken:

   - ipv4: raw: lock the socket in raw_bind()

   - ipv4: decrease the use of shared IPID generator to decrease the
     chance of attackers guessing the values

   - procfs: fix cross-netns information leakage in /proc/net/ptype

   - ethtool: fix link extended state for big endian

   - bridge: vlan: fix single net device option dumping

   - ping: fix the sk_bound_dev_if match in ping_lookup"

* tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
  net: bridge: vlan: fix memory leak in __allowed_ingress
  net: socket: rename SKB_DROP_REASON_SOCKET_FILTER
  ipv4: remove sparse error in ip_neigh_gw4()
  ipv4: avoid using shared IP generator for connected sockets
  ipv4: tcp: send zero IPID in SYNACK messages
  ipv4: raw: lock the socket in raw_bind()
  MAINTAINERS: add missing IPv4/IPv6 header paths
  MAINTAINERS: add more files to eth PHY
  net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
  net: bridge: vlan: fix single net device option dumping
  net: stmmac: skip only stmmac_ptp_register when resume from suspend
  net: stmmac: configure PTP clock source prior to PTP initialization
  Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
  connector/cn_proc: Use task_is_in_init_pid_ns()
  pid: Introduce helper task_is_in_init_pid_ns()
  gve: Fix GFP flags when allocing pages
  net: lan966x: Fix sleep in atomic context when updating MAC table
  net: lan966x: Fix sleep in atomic context when injecting frames
  ethernet: seeq/ether3: don't write directly to netdev-&gt;dev_addr
  ethernet: 8390/etherh: don't write directly to netdev-&gt;dev_addr
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter and can.

  Current release - new code bugs:

   - tcp: add a missing sk_defer_free_flush() in tcp_splice_read()

   - tcp: add a stub for sk_defer_free_flush(), fix CONFIG_INET=n

   - nf_tables: set last expression in register tracking area

   - nft_connlimit: fix memleak if nf_ct_netns_get() fails

   - mptcp: fix removing ids bitmap setting

   - bonding: use rcu_dereference_rtnl when getting active slave

   - fix three cases of sleep in atomic context in drivers: lan966x, gve

   - handful of build fixes for esoteric drivers after netdev-&gt;dev_addr
     was made const

  Previous releases - regressions:

   - revert "ipv6: Honor all IPv6 PIO Valid Lifetime values", it broke
     Linux compatibility with USGv6 tests

   - procfs: show net device bound packet types

   - ipv4: fix ip option filtering for locally generated fragments

   - phy: broadcom: hook up soft_reset for BCM54616S

  Previous releases - always broken:

   - ipv4: raw: lock the socket in raw_bind()

   - ipv4: decrease the use of shared IPID generator to decrease the
     chance of attackers guessing the values

   - procfs: fix cross-netns information leakage in /proc/net/ptype

   - ethtool: fix link extended state for big endian

   - bridge: vlan: fix single net device option dumping

   - ping: fix the sk_bound_dev_if match in ping_lookup"

* tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
  net: bridge: vlan: fix memory leak in __allowed_ingress
  net: socket: rename SKB_DROP_REASON_SOCKET_FILTER
  ipv4: remove sparse error in ip_neigh_gw4()
  ipv4: avoid using shared IP generator for connected sockets
  ipv4: tcp: send zero IPID in SYNACK messages
  ipv4: raw: lock the socket in raw_bind()
  MAINTAINERS: add missing IPv4/IPv6 header paths
  MAINTAINERS: add more files to eth PHY
  net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
  net: bridge: vlan: fix single net device option dumping
  net: stmmac: skip only stmmac_ptp_register when resume from suspend
  net: stmmac: configure PTP clock source prior to PTP initialization
  Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
  connector/cn_proc: Use task_is_in_init_pid_ns()
  pid: Introduce helper task_is_in_init_pid_ns()
  gve: Fix GFP flags when allocing pages
  net: lan966x: Fix sleep in atomic context when updating MAC table
  net: lan966x: Fix sleep in atomic context when injecting frames
  ethernet: seeq/ether3: don't write directly to netdev-&gt;dev_addr
  ethernet: 8390/etherh: don't write directly to netdev-&gt;dev_addr
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>net: socket: rename SKB_DROP_REASON_SOCKET_FILTER</title>
<updated>2022-01-27T16:45:13+00:00</updated>
<author>
<name>Menglong Dong</name>
<email>imagedong@tencent.com</email>
</author>
<published>2022-01-27T09:13:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=364df53c081d93fcfd6b91085ff2650c7f17b3c7'/>
<id>364df53c081d93fcfd6b91085ff2650c7f17b3c7</id>
<content type='text'>
Rename SKB_DROP_REASON_SOCKET_FILTER, which is used
as the reason of skb drop out of socket filter before
it's part of a released kernel. It will be used for
more protocols than just TCP in future series.

Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://lore.kernel.org/all/20220127091308.91401-2-imagedong@tencent.com/
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rename SKB_DROP_REASON_SOCKET_FILTER, which is used
as the reason of skb drop out of socket filter before
it's part of a released kernel. It will be used for
more protocols than just TCP in future series.

Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://lore.kernel.org/all/20220127091308.91401-2-imagedong@tencent.com/
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: tcp: send zero IPID in SYNACK messages</title>
<updated>2022-01-27T16:37:02+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-01-27T01:10:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=970a5a3ea86da637471d3cd04d513a0755aba4bf'/>
<id>970a5a3ea86da637471d3cd04d513a0755aba4bf</id>
<content type='text'>
In commit 431280eebed9 ("ipv4: tcp: send zero IPID for RST and
ACK sent in SYN-RECV and TIME-WAIT state") we took care of some
ctl packets sent by TCP.

It turns out we need to use a similar strategy for SYNACK packets.

By default, they carry IP_DF and IPID==0, but there are ways
to ask them to use the hashed IP ident generator and thus
be used to build off-path attacks.
(Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)

One of this way is to force (before listener is started)
echo 1 &gt;/proc/sys/net/ipv4/ip_no_pmtu_disc

Another way is using forged ICMP ICMP_FRAG_NEEDED
with a very small MTU (like 68) to force a false return from
ip_dont_fragment()

In this patch, ip_build_and_send_pkt() uses the following
heuristics.

1) Most SYNACK packets are smaller than IPV4_MIN_MTU and therefore
can use IP_DF regardless of the listener or route pmtu setting.

2) In case the SYNACK packet is bigger than IPV4_MIN_MTU,
we use prandom_u32() generator instead of the IPv4 hashed ident one.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Ray Che &lt;xijiache@gmail.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Cc: Geoff Alexander &lt;alexandg@cs.unm.edu&gt;
Cc: Willy Tarreau &lt;w@1wt.eu&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In commit 431280eebed9 ("ipv4: tcp: send zero IPID for RST and
ACK sent in SYN-RECV and TIME-WAIT state") we took care of some
ctl packets sent by TCP.

It turns out we need to use a similar strategy for SYNACK packets.

By default, they carry IP_DF and IPID==0, but there are ways
to ask them to use the hashed IP ident generator and thus
be used to build off-path attacks.
(Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)

One of this way is to force (before listener is started)
echo 1 &gt;/proc/sys/net/ipv4/ip_no_pmtu_disc

Another way is using forged ICMP ICMP_FRAG_NEEDED
with a very small MTU (like 68) to force a false return from
ip_dont_fragment()

In this patch, ip_build_and_send_pkt() uses the following
heuristics.

1) Most SYNACK packets are smaller than IPV4_MIN_MTU and therefore
can use IP_DF regardless of the listener or route pmtu setting.

2) In case the SYNACK packet is bigger than IPV4_MIN_MTU,
we use prandom_u32() generator instead of the IPv4 hashed ident one.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Ray Che &lt;xijiache@gmail.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Cc: Geoff Alexander &lt;alexandg@cs.unm.edu&gt;
Cc: Willy Tarreau &lt;w@1wt.eu&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: raw: lock the socket in raw_bind()</title>
<updated>2022-01-27T14:09:10+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-01-27T00:51:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=153a0d187e767c68733b8e9f46218eb1f41ab902'/>
<id>153a0d187e767c68733b8e9f46218eb1f41ab902</id>
<content type='text'>
For some reason, raw_bind() forgot to lock the socket.

BUG: KCSAN: data-race in __ip4_datagram_connect / raw_bind

write to 0xffff8881170d4308 of 4 bytes by task 5466 on cpu 0:
 raw_bind+0x1b0/0x250 net/ipv4/raw.c:739
 inet_bind+0x56/0xa0 net/ipv4/af_inet.c:443
 __sys_bind+0x14b/0x1b0 net/socket.c:1697
 __do_sys_bind net/socket.c:1708 [inline]
 __se_sys_bind net/socket.c:1706 [inline]
 __x64_sys_bind+0x3d/0x50 net/socket.c:1706
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff8881170d4308 of 4 bytes by task 5468 on cpu 1:
 __ip4_datagram_connect+0xb7/0x7b0 net/ipv4/datagram.c:39
 ip4_datagram_connect+0x2a/0x40 net/ipv4/datagram.c:89
 inet_dgram_connect+0x107/0x190 net/ipv4/af_inet.c:576
 __sys_connect_file net/socket.c:1900 [inline]
 __sys_connect+0x197/0x1b0 net/socket.c:1917
 __do_sys_connect net/socket.c:1927 [inline]
 __se_sys_connect net/socket.c:1924 [inline]
 __x64_sys_connect+0x3d/0x50 net/socket.c:1924
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x00000000 -&gt; 0x0003007f

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 5468 Comm: syz-executor.5 Not tainted 5.17.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For some reason, raw_bind() forgot to lock the socket.

BUG: KCSAN: data-race in __ip4_datagram_connect / raw_bind

write to 0xffff8881170d4308 of 4 bytes by task 5466 on cpu 0:
 raw_bind+0x1b0/0x250 net/ipv4/raw.c:739
 inet_bind+0x56/0xa0 net/ipv4/af_inet.c:443
 __sys_bind+0x14b/0x1b0 net/socket.c:1697
 __do_sys_bind net/socket.c:1708 [inline]
 __se_sys_bind net/socket.c:1706 [inline]
 __x64_sys_bind+0x3d/0x50 net/socket.c:1706
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff8881170d4308 of 4 bytes by task 5468 on cpu 1:
 __ip4_datagram_connect+0xb7/0x7b0 net/ipv4/datagram.c:39
 ip4_datagram_connect+0x2a/0x40 net/ipv4/datagram.c:89
 inet_dgram_connect+0x107/0x190 net/ipv4/af_inet.c:576
 __sys_connect_file net/socket.c:1900 [inline]
 __sys_connect+0x197/0x1b0 net/socket.c:1917
 __do_sys_connect net/socket.c:1927 [inline]
 __se_sys_connect net/socket.c:1924 [inline]
 __x64_sys_connect+0x3d/0x50 net/socket.c:1924
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x00000000 -&gt; 0x0003007f

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 5468 Comm: syz-executor.5 Not tainted 5.17.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: syzbot &lt;syzkaller@googlegroups.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: Remove flowtable relics</title>
<updated>2022-01-26T23:00:20+00:00</updated>
<author>
<name>Geert Uytterhoeven</name>
<email>geert@linux-m68k.org</email>
</author>
<published>2022-01-23T12:57:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7355bfe0e0cc27597d530f78e259a985cb85af40'/>
<id>7355bfe0e0cc27597d530f78e259a985cb85af40</id>
<content type='text'>
NF_FLOW_TABLE_IPV4 and NF_FLOW_TABLE_IPV6 are invisble, selected by
nothing (so they can no longer be enabled), and their last real users
have been removed (nf_flow_table_ipv6.c is empty).

Clean up the leftovers.

Fixes: c42ba4290b2147aa ("netfilter: flowtable: remove ipv4/ipv6 modules")
Signed-off-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
NF_FLOW_TABLE_IPV4 and NF_FLOW_TABLE_IPV6 are invisble, selected by
nothing (so they can no longer be enabled), and their last real users
have been removed (nf_flow_table_ipv6.c is empty).

Clean up the leftovers.

Fixes: c42ba4290b2147aa ("netfilter: flowtable: remove ipv4/ipv6 modules")
Signed-off-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
