Age | Commit message (Collapse) | Author |
|
[ Upstream commit edb9d1bff4bbe19b8ae0e71b1f38732591a9eeb2 ]
When tc actions are loaded as a module and no actions have been installed,
flushing them would result in actions removed from the memory, but modules
reference count not being decremented, so that the modules would not be
unloaded.
Following is example with GACT action:
% sudo modprobe act_gact
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions ls action gact
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 1
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 2
% sudo rmmod act_gact
rmmod: ERROR: Module act_gact is in use
....
After the fix:
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions add action pass index 1
% sudo tc actions add action pass index 2
% sudo tc actions add action pass index 3
% lsmod
Module Size Used by
act_gact 16384 3
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 0
% sudo rmmod act_gact
% lsmod
Module Size Used by
%
Fixes: f97017cdefef ("net-sched: Fix actions flushing")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 72ef9c4125c7b257e3a714d62d778ab46583d6a3 ]
This patch fixes a memory leak, which happens if the connection request
is not fulfilled between parsing the DCCP options and handling the SYN
(because e.g. the backlog is full), because we forgot to free the
list of ack vectors.
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 ]
As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.
We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:
#8 [] page_fault at ffffffff8163e648
[exception RIP: __tcp_ack_snd_check+74]
.
.
#9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8
Of course it may happen with other NIC drivers as well.
It's found the freed dst_entry here:
224 static bool tcp_in_quickack_mode(struct sock *sk)↩
225 {↩
226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩
227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩
228 ↩
229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
231 }↩
But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.
All the vmcores showed 2 significant clues:
- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.
- All vmcores showed a postitive LockDroppedIcmps value, e.g:
LockDroppedIcmps 267
A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:
do_redirect()->__sk_dst_check()-> dst_release().
Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.
To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.
The dccp/IPv6 code is very similar in this respect, so fixing it there too.
As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().
Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a13b2082ece95247779b9995c4e91b4246bed023 ]
Andreas reports kernel oops during rmmod of the br_netfilter module.
Hannes debugged the oops down to a NULL rt6info->rt6i_indev.
Problem is that br_netfilter has the nasty concept of adding a fake
rtable to skb->dst; this happens in a br_netfilter prerouting hook.
A second hook (in bridge LOCAL_IN) is supposed to remove these again
before the skb is handed up the stack.
However, on module unload hooks get unregistered which means an
skb could traverse the prerouting hook that attaches the fake_rtable,
while the 'fake rtable remove' hook gets removed from the hooklist
immediately after.
Fixes: 34666d467cbf1e2e3c7 ("netfilter: bridge: move br_netfilter out of the core")
Reported-by: Andreas Karis <akaris@redhat.com>
Debugged-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 79e49503efe53a8c51d8b695bedc8a346c5e4a87 ]
ip6_fragment, in case skb has a fraglist, checks if the
skb is cloned. If it is, it will move to the 'slow path' and allocates
new skbs for each fragment.
However, right before entering the slowpath loop, it updates the
nexthdr value of the last ipv6 extension header to NEXTHDR_FRAGMENT,
to account for the fragment header that will be inserted in the new
ipv6-fragment skbs.
In case original skb is cloned this munges nexthdr value of another
skb. Avoid this by doing the nexthdr update for each of the new fragment
skbs separately.
This was observed with tcpdump on a bridge device where netfilter ipv6
reassembly is active: tcpdump shows malformed fragment headers as
the l4 header (icmpv6, tcp, etc). is decoded as a fragment header.
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Andreas Karis <akaris@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 67e194007be08d071294456274dd53e0a04fdf90 ]
Commit 27596472473a ("ipv6: fix ECMP route replacement") introduced a
loop that removes all siblings of an ECMP route that is being
replaced. However, this loop doesn't stop when it has replaced
siblings, and keeps removing other routes with a higher metric.
We also end up triggering the WARN_ON after the loop, because after
this nsiblings < 0.
Instead, stop the loop when we have taken care of all routes with the
same metric as the route being replaced.
Reproducer:
===========
#!/bin/sh
ip netns add ns1
ip netns add ns2
ip -net ns1 link set lo up
for x in 0 1 2 ; do
ip link add veth$x netns ns2 type veth peer name eth$x netns ns1
ip -net ns1 link set eth$x up
ip -net ns2 link set veth$x up
done
ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \
nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2
ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256
ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048
echo "before replace, 3 routes"
ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
echo
ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \
nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2
echo "after replace, only 2 routes, metric 2048 is gone"
ip -net ns1 -6 r | grep -v '^fe80\|^ff00'
Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e37791ec1ad785b59022ae211f63a16189bacebf ]
When the mpls_router module is unloaded, mpls routes are deleted but
notifications are not sent to userspace leaving userspace caches
out of sync. Add the call to mpls_notify_route in mpls_net_exit as
routes are freed.
Fixes: 0189197f44160 ("mpls: Basic routing support")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 52491c7607c5527138095edf44c53169dc1ddb82 ]
tcf_connmark_init does not check in its configuration if TCA_CONNMARK_PARMS
is set, resulting in a null pointer dereference when trying to access it.
[501099.043007] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[501099.043039] IP: [<ffffffffc10c60fb>] tcf_connmark_init+0x8b/0x180 [act_connmark]
...
[501099.044334] Call Trace:
[501099.044345] [<ffffffffa47270e8>] ? tcf_action_init_1+0x198/0x1b0
[501099.044363] [<ffffffffa47271b0>] ? tcf_action_init+0xb0/0x120
[501099.044380] [<ffffffffa47250a4>] ? tcf_exts_validate+0xc4/0x110
[501099.044398] [<ffffffffc0f5fa97>] ? u32_set_parms+0xa7/0x270 [cls_u32]
[501099.044417] [<ffffffffc0f60bf0>] ? u32_change+0x680/0x87b [cls_u32]
[501099.044436] [<ffffffffa4725d1d>] ? tc_ctl_tfilter+0x4dd/0x8a0
[501099.044454] [<ffffffffa44a23a1>] ? security_capable+0x41/0x60
[501099.044471] [<ffffffffa470ca01>] ? rtnetlink_rcv_msg+0xe1/0x220
[501099.044490] [<ffffffffa470c920>] ? rtnl_newlink+0x870/0x870
[501099.044507] [<ffffffffa472cc61>] ? netlink_rcv_skb+0xa1/0xc0
[501099.044524] [<ffffffffa47073f4>] ? rtnetlink_rcv+0x24/0x30
[501099.044541] [<ffffffffa472c634>] ? netlink_unicast+0x184/0x230
[501099.044558] [<ffffffffa472c9d8>] ? netlink_sendmsg+0x2f8/0x3b0
[501099.044576] [<ffffffffa46d8880>] ? sock_sendmsg+0x30/0x40
[501099.044592] [<ffffffffa46d8e03>] ? SYSC_sendto+0xd3/0x150
[501099.044608] [<ffffffffa425fda1>] ? __do_page_fault+0x2d1/0x510
[501099.044626] [<ffffffffa47fbd7b>] ? system_call_fast_compare_end+0xc/0x9b
Fixes: 22a5dc0e5e3e ("net: sched: Introduce connmark action")
Signed-off-by: Étienne Noss <etienne.noss@wifirst.fr>
Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 62f8f4d9066c1c6f2474845d1ca7e2891f2ae3fd ]
Dmitry reported crashes in DCCP stack [1]
Problem here is that when I got rid of listener spinlock, I missed the
fact that DCCP stores a complex state in struct dccp_request_sock,
while TCP does not.
Since multiple cpus could access it at the same time, we need to add
protection.
[1]
BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0
net/dccp/feat.c:1541 at addr ffff88003713be68
Read of size 8 by task syz-executor2/8457
CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x292/0x398 lib/dump_stack.c:51
kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
print_address_description mm/kasan/report.c:200 [inline]
kasan_report_error mm/kasan/report.c:289 [inline]
kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311
kasan_report mm/kasan/report.c:332 [inline]
__asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541
dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
NF_HOOK include/linux/netfilter.h:257 [inline]
ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
dst_input include/net/dst.h:507 [inline]
ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
NF_HOOK include/linux/netfilter.h:257 [inline]
ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
__netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
process_backlog+0xe5/0x6c0 net/core/dev.c:4839
napi_poll net/core/dev.c:5202 [inline]
net_rx_action+0xe70/0x1900 net/core/dev.c:5267
__do_softirq+0x2fb/0xb7d kernel/softirq.c:284
do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
</IRQ>
do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
do_softirq kernel/softirq.c:176 [inline]
__local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181
local_bh_enable include/linux/bottom_half.h:31 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline]
ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123
ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162
ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501
inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179
dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141
dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280
dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362
dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
SYSC_sendto+0x660/0x810 net/socket.c:1687
SyS_sendto+0x40/0x50 net/socket.c:1655
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x4458b9
RSP: 002b:00007f8ceb77bb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00000000004458b9
RDX: 0000000000000023 RSI: 0000000020e60000 RDI: 0000000000000017
RBP: 00000000006e1b90 R08: 00000000200f9fe1 R09: 0000000000000020
R10: 0000000000008010 R11: 0000000000000282 R12: 00000000007080a8
R13: 0000000000000000 R14: 00007f8ceb77c9c0 R15: 00007f8ceb77c700
Object at ffff88003713be50, in cache kmalloc-64 size: 64
Allocated:
PID = 8446
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
save_stack+0x43/0xd0 mm/kasan/kasan.c:502
set_track mm/kasan/kasan.c:514 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738
kmalloc include/linux/slab.h:490 [inline]
dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467
dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487
__feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741
dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949
dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012
dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423
dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217
dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377
dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606
dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
sk_backlog_rcv include/net/sock.h:893 [inline]
__sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479
dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742
ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
NF_HOOK include/linux/netfilter.h:257 [inline]
ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
dst_input include/net/dst.h:507 [inline]
ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
NF_HOOK include/linux/netfilter.h:257 [inline]
ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
__netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
process_backlog+0xe5/0x6c0 net/core/dev.c:4839
napi_poll net/core/dev.c:5202 [inline]
net_rx_action+0xe70/0x1900 net/core/dev.c:5267
__do_softirq+0x2fb/0xb7d kernel/softirq.c:284
Freed:
PID = 15
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
save_stack+0x43/0xd0 mm/kasan/kasan.c:502
set_track mm/kasan/kasan.c:514 [inline]
kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
slab_free_hook mm/slub.c:1355 [inline]
slab_free_freelist_hook mm/slub.c:1377 [inline]
slab_free mm/slub.c:2954 [inline]
kfree+0xe8/0x2b0 mm/slub.c:3874
dccp_feat_entry_destructor.part.4+0x48/0x60 net/dccp/feat.c:418
dccp_feat_entry_destructor net/dccp/feat.c:416 [inline]
dccp_feat_list_pop net/dccp/feat.c:541 [inline]
dccp_feat_activate_values+0x57f/0xab0 net/dccp/feat.c:1543
dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
NF_HOOK include/linux/netfilter.h:257 [inline]
ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
dst_input include/net/dst.h:507 [inline]
ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
NF_HOOK include/linux/netfilter.h:257 [inline]
ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
__netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
process_backlog+0xe5/0x6c0 net/core/dev.c:4839
napi_poll net/core/dev.c:5202 [inline]
net_rx_action+0xe70/0x1900 net/core/dev.c:5267
__do_softirq+0x2fb/0xb7d kernel/softirq.c:284
Memory state around the buggy address:
ffff88003713bd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88003713bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88003713be00: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
^
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9ac25fc063751379cb77434fef9f3b088cd3e2f7 ]
TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt
By the time TX completion happens, sk_refcnt might be already 0.
sock_hold()/sock_put() would then corrupt critical state, like
sk_wmem_alloc and lead to leaks or use after free.
Fixes: 62bccb8cdb69 ("net-timestamp: Make the clone operation stand-alone from phy timestamping")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit dd4f10722aeb10f4f582948839f066bebe44e5fb ]
TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt
By the time TX completion happens, sk_refcnt might be already 0.
sock_hold()/sock_put() would then corrupt critical state, like
sk_wmem_alloc.
Fixes: bf7fa551e0ce ("mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 02b2faaf0af1d85585f6d6980e286d53612acfc2 ]
Dmitry Vyukov reported a divide by 0 triggered by syzkaller, exploiting
tcp_disconnect() path that was never really considered and/or used
before syzkaller ;)
I was not able to reproduce the bug, but it seems issues here are the
three possible actions that assumed they would never trigger on a
listener.
1) tcp_write_timer_handler
2) tcp_delack_timer_handler
3) MTU reduction
Only IPv6 MTU reduction was properly testing TCP_CLOSE and TCP_LISTEN
states from tcp_v6_mtu_reduced()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d5afb6f9b6bb2c57bd0c05e76e12489dc0d037d9 ]
The code where sk_clone() came from created a new socket and locked it,
but then, on the error path didn't unlock it.
This problem stayed there for a long while, till b0691c8ee7c2 ("net:
Unlock sock before calling sk_free()") fixed it, but unfortunately the
callers of sk_clone() (now sk_clone_locked()) were not audited and the
one in dccp_create_openreq_child() remained.
Now in the age of the syskaller fuzzer, this was finally uncovered, as
reported by Dmitry:
---- 8< ----
I've got the following report while running syzkaller fuzzer on
86292b33d4b7 ("Merge branch 'akpm' (patches from Andrew)")
[ BUG: held lock freed! ]
4.10.0+ #234 Not tainted
-------------------------
syz-executor6/6898 is freeing memory
ffff88006286cac0-ffff88006286d3b7, with a lock still held there!
(slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>] spin_lock
include/linux/spinlock.h:299 [inline]
(slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>]
sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504
5 locks held by syz-executor6/6898:
#0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff839a34b4>] lock_sock
include/net/sock.h:1460 [inline]
#0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff839a34b4>]
inet_stream_connect+0x44/0xa0 net/ipv4/af_inet.c:681
#1: (rcu_read_lock){......}, at: [<ffffffff83bc1c2a>]
inet6_csk_xmit+0x12a/0x5d0 net/ipv6/inet6_connection_sock.c:126
#2: (rcu_read_lock){......}, at: [<ffffffff8369b424>] __skb_unlink
include/linux/skbuff.h:1767 [inline]
#2: (rcu_read_lock){......}, at: [<ffffffff8369b424>] __skb_dequeue
include/linux/skbuff.h:1783 [inline]
#2: (rcu_read_lock){......}, at: [<ffffffff8369b424>]
process_backlog+0x264/0x730 net/core/dev.c:4835
#3: (rcu_read_lock){......}, at: [<ffffffff83aeb5c0>]
ip6_input_finish+0x0/0x1700 net/ipv6/ip6_input.c:59
#4: (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>] spin_lock
include/linux/spinlock.h:299 [inline]
#4: (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>]
sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504
Fix it just like was done by b0691c8ee7c2 ("net: Unlock sock before calling
sk_free()").
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170301153510.GE15145@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 13baa00ad01bb3a9f893e3a08cbc2d072fc0c15d ]
It is now very clear that silly TCP listeners might play with
enabling/disabling timestamping while new children are added
to their accept queue.
Meaning net_enable_timestamp() can be called from BH context
while current state of the static key is not enabled.
Lets play safe and allow all contexts.
The work queue is scheduled only under the problematic cases,
which are the static key enable/disable transition, to not slow down
critical paths.
This extends and improves what we did in commit 5fa8bbda38c6 ("net: use
a work queue to defer net_disable_timestamp() work")
Fixes: b90e5794c5bd ("net: dont call jump_label_dec from irq context")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 540e2894f7905538740aaf122bd8e0548e1c34a4 ]
KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
uninitialized memory in packet_bind_spkt():
Acked-by: Eric Dumazet <edumazet@google.com>
==================================================================
BUG: KMSAN: use of unitialized memory
CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
0000000000000000 ffff88006b6dfc08 ffffffff82559ae8 ffff88006b6dfb48
ffffffff818a7c91 ffffffff85b9c870 0000000000000092 ffffffff85b9c550
0000000000000000 0000000000000092 00000000ec400911 0000000000000002
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82559ae8>] dump_stack+0x238/0x290 lib/dump_stack.c:51
[<ffffffff818a6626>] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003
[<ffffffff818a783b>] __msan_warning+0x5b/0xb0
mm/kmsan/kmsan_instr.c:424
[< inline >] strlen lib/string.c:484
[<ffffffff8259b58d>] strlcpy+0x9d/0x200 lib/string.c:144
[<ffffffff84b2eca4>] packet_bind_spkt+0x144/0x230
net/packet/af_packet.c:3132
[<ffffffff84242e4d>] SYSC_bind+0x40d/0x5f0 net/socket.c:1370
[<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
[<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f
arch/x86/entry/entry_64.o:?
chained origin: 00000000eba00911
[<ffffffff810bb787>] save_stack_trace+0x27/0x50
arch/x86/kernel/stacktrace.c:67
[< inline >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
[< inline >] kmsan_save_stack mm/kmsan/kmsan.c:334
[<ffffffff818a59f8>] kmsan_internal_chain_origin+0x118/0x1e0
mm/kmsan/kmsan.c:527
[<ffffffff818a7773>] __msan_set_alloca_origin4+0xc3/0x130
mm/kmsan/kmsan_instr.c:380
[<ffffffff84242b69>] SYSC_bind+0x129/0x5f0 net/socket.c:1356
[<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
[<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f
arch/x86/entry/entry_64.o:?
origin description: ----address@SYSC_bind (origin=00000000eb400911)
==================================================================
(the line numbers are relative to 4.8-rc6, but the bug persists
upstream)
, when I run the following program as root:
=====================================
#include <string.h>
#include <sys/socket.h>
#include <netpacket/packet.h>
#include <net/ethernet.h>
int main() {
struct sockaddr addr;
memset(&addr, 0xff, sizeof(addr));
addr.sa_family = AF_PACKET;
int fd = socket(PF_PACKET, SOCK_PACKET, htons(ETH_P_ALL));
bind(fd, &addr, sizeof(addr));
return 0;
}
=====================================
This happens because addr.sa_data copied from the userspace is not
zero-terminated, and copying it with strlcpy() in packet_bind_spkt()
results in calling strlen() on the kernel copy of that non-terminated
buffer.
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 51fb60eb162ab84c5edf2ae9c63cf0b878e5547e ]
l2tp_ip_backlog_recv may not return -1 if the packet gets dropped.
The return value is passed up to ip_local_deliver_finish, which treats
negative values as an IP protocol number for resubmission.
Signed-off-by: Paul Hüber <phueber@kernsp.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6e28099d38c0e50d62c1afc054e37e573adf3d21 ]
Restore the lost masking of TOS in input route code to
allow ip rules to match it properly.
Problem [1] noticed by Shmulik Ladkani <shmulik.ladkani@gmail.com>
[1] http://marc.info/?t=137331755300040&r=1&w=2
Fixes: 89aef8921bfb ("ipv4: Delete routing cache.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7dcdf941cdc96692ab99fd790c8cc68945514851 ]
Align vti6 with vti by returning GRE_KEY flag. This enables iproute2
to display tunnel keys on "ip -6 tunnel show"
Signed-off-by: David Forster <dforster@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d1b4c689d4130bcfd3532680b64db562300716b6 upstream.
mmapped netlink has a number of unresolved issues:
- TX zerocopy support had to be disabled more than a year ago via
commit 4682a0358639b29cf ("netlink: Always copy on mmap TX.")
because the content of the mmapped area can change after netlink
attribute validation but before message processing.
- RX support was implemented mainly to speed up nfqueue dumping packet
payload to userspace. However, since commit ae08ce0021087a5d812d2
("netfilter: nfnetlink_queue: zero copy support") we avoid one copy
with the socket-based interface too (via the skb_zerocopy helper).
The other problem is that skbs attached to mmaped netlink socket
behave different from normal skbs:
- they don't have a shinfo area, so all functions that use skb_shinfo()
(e.g. skb_clone) cannot be used.
- reserving headroom prevents userspace from seeing the content as
it expects message to start at skb->head.
See for instance
commit aa3a022094fa ("netlink: not trim skb for mmaped socket when dump").
- skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we
crash because it needs the sk to check if a tx ring is attached.
Also not obvious, leads to non-intuitive bug fixes such as 7c7bdf359
("netfilter: nfnetlink: use original skbuff when acking batches").
mmaped netlink also didn't play nicely with the skb_zerocopy helper
used by nfqueue and openvswitch. Daniel Borkmann fixed this via
commit 6bb0fef489f6 ("netlink, mmap: fix edge-case leakages in nf queue
zero-copy")' but at the cost of also needing to provide remaining
length to the allocation function.
nfqueue also has problems when used with mmaped rx netlink:
- mmaped netlink doesn't allow use of nfqueue batch verdict messages.
Problem is that in the mmap case, the allocation time also determines
the ordering in which the frame will be seen by userspace (A
allocating before B means that A is located in earlier ring slot,
but this also means that B might get a lower sequence number then A
since seqno is decided later. To fix this we would need to extend the
spinlocked region to also cover the allocation and message setup which
isn't desirable.
- nfqueue can now be configured to queue large (GSO) skbs to userspace.
Queing GSO packets is faster than having to force a software segmentation
in the kernel, so this is a desirable option. However, with a mmap based
ring one has to use 64kb per ring slot element, else mmap has to fall back
to the socket path (NL_MMAP_STATUS_COPY) for all large packets.
To use the mmap interface, userspace not only has to probe for mmap netlink
support, it also has to implement a recv/socket receive path in order to
handle messages that exceed the size of an rx ring element.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Shi Yuejie <shiyuejie@outlook.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a9e9200d8661c1a0be8c39f93deb383dc940de35 upstream.
The issue was found when entering suspend and resume.
It triggers a warning in:
mac80211/key.c: ieee80211_enable_keys()
...
WARN_ON_ONCE(sdata->crypto_tx_tailroom_needed_cnt ||
sdata->crypto_tx_tailroom_pending_dec);
...
It points out sdata->crypto_tx_tailroom_pending_dec isn't cleaned up successfully
in a delayed_work during suspend. Add a flush_delayed_work to fix it.
Signed-off-by: Matt Chen <matt.chen@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e623a9e9dec29ae811d11f83d0074ba254aba374 ]
Commit 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path"),
changed the exit path of recvmmsg to always return the datagrams
variable and modified the error paths to set the variable to the error
code returned by recvmsg if necessary.
However in the case sock_error returned an error, the error code was
then ignored, and recvmmsg returned 0.
Change the error path of recvmmsg to correctly return the error code
of sock_error.
The bug was triggered by using recvmmsg on a CAN interface which was
not up. Linux 4.6 and later return 0 in this case while earlier
releases returned -ENETDOWN.
Fixes: 34b88a68f26a ("net: Fix use after free in the recvmmsg exit path")
Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ca4ef4574f1ee5252e2cd365f8f5d5bafd048f32 ]
The skbs processed by ip_cmsg_recv() are not guaranteed to
be linear e.g. when sending UDP packets over loopback with
MSGMORE.
Using csum_partial() on [potentially] the whole skb len
is dangerous; instead be on the safe side and use skb_checksum().
Thanks to syzkaller team to detect the issue and provide the
reproducer.
v1 -> v2:
- move the variable declaration in a tighter scope
Fixes: ad6f939ab193 ("ip: Add offset parameter to ip_cmsg_recv")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4c03b862b12f980456f9de92db6d508a4999b788 ]
A nested lock depth was added to the hasbin_delete() code but it
doesn't actually work some well and results in tons of lockdep splats.
Fix the code instead to properly drop the lock around the operation
and just keep peeking the head of the hashbin queue.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5edabca9d4cff7f1f2b68f0bac55ef99d9798ba4 ]
In the current DCCP implementation an skb for a DCCP_PKT_REQUEST packet
is forcibly freed via __kfree_skb in dccp_rcv_state_process if
dccp_v6_conn_request successfully returns.
However, if IPV6_RECVPKTINFO is set on a socket, the address of the skb
is saved to ireq->pktopts and the ref count for skb is incremented in
dccp_v6_conn_request, so skb is still in use. Nevertheless, it gets freed
in dccp_rcv_state_process.
Fix by calling consume_skb instead of doing goto discard and therefore
calling __kfree_skb.
Similar fixes for TCP:
fb7e2399ec17f1004c0e0ccfd17439f8759ede01 [TCP]: skb is unexpectedly freed.
0aea76d35c9651d55bbaf746e7914e5f9ae5a25d tcp: SYN packets are now
simply consumed
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2bd624b4611ffee36422782d16e1c944d1351e98 ]
Commit 6664498280cf ("packet: call fanout_release, while UNREGISTERING a
netdev"), unfortunately, introduced the following issues.
1. calling mutex_lock(&fanout_mutex) (fanout_release()) from inside
rcu_read-side critical section. rcu_read_lock disables preemption, most often,
which prohibits calling sleeping functions.
[ ] include/linux/rcupdate.h:560 Illegal context switch in RCU read-side critical section!
[ ]
[ ] rcu_scheduler_active = 1, debug_locks = 0
[ ] 4 locks held by ovs-vswitchd/1969:
[ ] #0: (cb_lock){++++++}, at: [<ffffffff8158a6c9>] genl_rcv+0x19/0x40
[ ] #1: (ovs_mutex){+.+.+.}, at: [<ffffffffa04878ca>] ovs_vport_cmd_del+0x4a/0x100 [openvswitch]
[ ] #2: (rtnl_mutex){+.+.+.}, at: [<ffffffff81564157>] rtnl_lock+0x17/0x20
[ ] #3: (rcu_read_lock){......}, at: [<ffffffff81614165>] packet_notifier+0x5/0x3f0
[ ]
[ ] Call Trace:
[ ] [<ffffffff813770c1>] dump_stack+0x85/0xc4
[ ] [<ffffffff810c9077>] lockdep_rcu_suspicious+0x107/0x110
[ ] [<ffffffff810a2da7>] ___might_sleep+0x57/0x210
[ ] [<ffffffff810a2fd0>] __might_sleep+0x70/0x90
[ ] [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0
[ ] [<ffffffff810de93f>] ? vprintk_default+0x1f/0x30
[ ] [<ffffffff81186e88>] ? printk+0x4d/0x4f
[ ] [<ffffffff816106dd>] fanout_release+0x1d/0xe0
[ ] [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
2. calling mutex_lock(&fanout_mutex) inside spin_lock(&po->bind_lock).
"sleeping function called from invalid context"
[ ] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
[ ] in_atomic(): 1, irqs_disabled(): 0, pid: 1969, name: ovs-vswitchd
[ ] INFO: lockdep is turned off.
[ ] Call Trace:
[ ] [<ffffffff813770c1>] dump_stack+0x85/0xc4
[ ] [<ffffffff810a2f52>] ___might_sleep+0x202/0x210
[ ] [<ffffffff810a2fd0>] __might_sleep+0x70/0x90
[ ] [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0
[ ] [<ffffffff816106dd>] fanout_release+0x1d/0xe0
[ ] [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
3. calling dev_remove_pack(&fanout->prot_hook), from inside
spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack()
-> synchronize_net(), which might sleep.
[ ] BUG: scheduling while atomic: ovs-vswitchd/1969/0x00000002
[ ] INFO: lockdep is turned off.
[ ] Call Trace:
[ ] [<ffffffff813770c1>] dump_stack+0x85/0xc4
[ ] [<ffffffff81186274>] __schedule_bug+0x64/0x73
[ ] [<ffffffff8162b8cb>] __schedule+0x6b/0xd10
[ ] [<ffffffff8162c5db>] schedule+0x6b/0x80
[ ] [<ffffffff81630b1d>] schedule_timeout+0x38d/0x410
[ ] [<ffffffff810ea3fd>] synchronize_sched_expedited+0x53d/0x810
[ ] [<ffffffff810ea6de>] synchronize_rcu_expedited+0xe/0x10
[ ] [<ffffffff8154eab5>] synchronize_net+0x35/0x50
[ ] [<ffffffff8154eae3>] dev_remove_pack+0x13/0x20
[ ] [<ffffffff8161077e>] fanout_release+0xbe/0xe0
[ ] [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
4. fanout_release() races with calls from different CPU.
To fix the above problems, remove the call to fanout_release() under
rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) and
netdev_run_todo will be happy that &dev->ptype_specific list is empty. In order
to achieve this, I moved dev_{add,remove}_pack() out of fanout_{add,release} to
__fanout_{link,unlink}. So, call to {,__}unregister_prot_hook() will make sure
fanout->prot_hook is removed as well.
Fixes: 6664498280cf ("packet: call fanout_release, while UNREGISTERING a netdev")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d199fab63c11998a602205f7ee7ff7c05c97164b ]
Multiple threads can call fanout_add() at the same time.
We need to grab fanout_mutex earlier to avoid races that could
lead to one thread freeing po->rollover that was set by another thread.
Do the same in fanout_release(), for peace of mind, and to help us
finding lockdep issues earlier.
Fixes: dc99f600698d ("packet: Add fanout support.")
Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8b74d439e1697110c5e5c600643e823eb1dd0762 ]
It seems nobody used LLC since linux-3.12.
Fortunately fuzzers like syzkaller still know how to run this code,
otherwise it would be no fun.
Setting skb->sk without skb->destructor leads to all kinds of
bugs, we now prefer to be very strict about it.
Ideally here we would use skb_set_owner() but this helper does not exist yet,
only CAN seems to have a private helper for that.
Fixes: 376c7311bdb6 ("net: add a temporary sanity check in skb_orphan()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 72fb96e7bdbbdd4421b0726992496531060f3636 ]
udp_ioctl(), as its name suggests, is used by UDP protocols,
but is also used by L2TP :(
L2TP should use its own handler, because it really does not
look the same.
SIOCINQ for instance should not assume UDP checksum or headers.
Thanks to Andrey and syzkaller team for providing the report
and a nice reproducer.
While crashes only happen on recent kernels (after commit
7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")), this
probably needs to be backported to older kernels.
Fixes: 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")
Fixes: 85584672012e ("udp: Fix udp_poll() and ioctl()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 73d2c6678e6c3af7e7a42b1e78cd0211782ade32 ]
Andrey reported a kernel crash:
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 3880 Comm: syz-executor1 Not tainted 4.10.0-rc6+ #124
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff880060048040 task.stack: ffff880069be8000
RIP: 0010:ping_v4_push_pending_frames net/ipv4/ping.c:647 [inline]
RIP: 0010:ping_v4_sendmsg+0x1acd/0x23f0 net/ipv4/ping.c:837
RSP: 0018:ffff880069bef8b8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff880069befb90 RCX: 0000000000000000
RDX: 0000000000000018 RSI: ffff880069befa30 RDI: 00000000000000c2
RBP: ffff880069befbb8 R08: 0000000000000008 R09: 0000000000000000
R10: 0000000000000002 R11: 0000000000000000 R12: ffff880069befab0
R13: ffff88006c624a80 R14: ffff880069befa70 R15: 0000000000000000
FS: 00007f6f7c716700(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004a6f28 CR3: 000000003a134000 CR4: 00000000000006e0
Call Trace:
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
SYSC_sendto+0x660/0x810 net/socket.c:1687
SyS_sendto+0x40/0x50 net/socket.c:1655
entry_SYSCALL_64_fastpath+0x1f/0xc2
This is because we miss a check for NULL pointer for skb_peek() when
the queue is empty. Other places already have the same check.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 57031eb794906eea4e1c7b31dc1e2429c0af0c66 ]
Link layer protocols may unconditionally pull headers, as Ethernet
does in eth_type_trans. Ensure that the entire link layer header
always lies in the skb linear segment. tpacket_snd has such a check.
Extend this to packet_snd.
Variable length link layer headers complicate the computation
somewhat. Here skb->len may be smaller than dev->hard_header_len.
Round up the linear length to be at least as long as the smallest of
the two.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 217e6fa24ce28ec87fca8da93c9016cb78028612 ]
The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.
Previously, packet sockets would drop packets smaller than or equal
to dev->hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.
Introduce an explicit dev->min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.
Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d7426c69a1942b2b9b709bf66b944ff09f561484 ]
Dmitry reported a double free in sit_init_net():
kernel BUG at mm/percpu.c:689!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 15692 Comm: syz-executor1 Not tainted 4.10.0-rc6-next-20170206 #1
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
task: ffff8801c9cc27c0 task.stack: ffff88017d1d8000
RIP: 0010:pcpu_free_area+0x68b/0x810 mm/percpu.c:689
RSP: 0018:ffff88017d1df488 EFLAGS: 00010046
RAX: 0000000000010000 RBX: 00000000000007c0 RCX: ffffc90002829000
RDX: 0000000000010000 RSI: ffffffff81940efb RDI: ffff8801db841d94
RBP: ffff88017d1df590 R08: dffffc0000000000 R09: 1ffffffff0bb3bdd
R10: dffffc0000000000 R11: 00000000000135dd R12: ffff8801db841d80
R13: 0000000000038e40 R14: 00000000000007c0 R15: 00000000000007c0
FS: 00007f6ea608f700(0000) GS:ffff8801dbe00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002000aff8 CR3: 00000001c8d44000 CR4: 00000000001426f0
DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
free_percpu+0x212/0x520 mm/percpu.c:1264
ipip6_dev_free+0x43/0x60 net/ipv6/sit.c:1335
sit_init_net+0x3cb/0xa10 net/ipv6/sit.c:1831
ops_init+0x10a/0x530 net/core/net_namespace.c:115
setup_net+0x2ed/0x690 net/core/net_namespace.c:291
copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396
create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
SYSC_unshare kernel/fork.c:2281 [inline]
SyS_unshare+0x64e/0xfc0 kernel/fork.c:2231
entry_SYSCALL_64_fastpath+0x1f/0xc2
This is because when tunnel->dst_cache init fails, we free dev->tstats
once in ipip6_tunnel_init() and twice in sit_init_net(). This looks
redundant but its ndo_uinit() does not seem enough to clean up everything
here. So avoid this by setting dev->tstats to NULL after the first free,
at least for -net.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2dcab598484185dea7ec22219c76dcdd59e3cb90 ]
Alexander Popov reported that an application may trigger a BUG_ON in
sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
waiting on it to queue more data and meanwhile another thread peels off
the association being used by the first thread.
This patch replaces the BUG_ON call with a proper error handling. It
will return -EPIPE to the original sendmsg call, similarly to what would
have been done if the association wasn't found in the first place.
Acked-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ccf7abb93af09ad0868ae9033d1ca8108bdaec82 ]
Splicing from TCP socket is vulnerable when a packet with URG flag is
received and stored into receive queue.
__tcp_splice_read() returns 0, and sk_wait_data() immediately
returns since there is the problematic skb in queue.
This is a nice way to burn cpu (aka infinite loop) and trigger
soft lockups.
Again, this gem was found by syzkaller tool.
Fixes: 9c55e01c0cc8 ("[TCP]: Splice receive support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ebf6c9cb23d7e56eec8575a88071dec97ad5c6e2 ]
Dmitry reported use-after-free in ip6_datagram_recv_specific_ctl()
A similar bug was fixed in commit 8ce48623f0cf ("ipv6: tcp: restore
IP6CB for pktoptions skbs"), but I missed another spot.
tcp_v6_syn_recv_sock() can indeed set np->pktoptions from ireq->pktopts
Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7892032cfe67f4bde6fc2ee967e45a8fbaf33756 ]
Andrey Konovalov reported out of bound accesses in ip6gre_err()
If GRE flags contains GRE_KEY, the following expression
*(((__be32 *)p) + (grehlen / 4) - 1)
accesses data ~40 bytes after the expected point, since
grehlen includes the size of IPv6 headers.
Let's use a "struct gre_base_hdr *greh" pointer to make this
code more readable.
p[1] becomes greh->protocol.
grhlen is the GRE header length.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d71b7896886345c53ef1d84bda2bc758554f5d61 ]
syzkaller found another out of bound access in ip_options_compile(),
or more exactly in cipso_v4_validate()
Fixes: 20e2a8648596 ("cipso: handle CIPSO options correctly when NetLabel is disabled")
Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 34b2cef20f19c87999fff3da4071e66937db9644 ]
Andrey Konovalov got crashes in __ip_options_echo() when a NULL skb->dst
is accessed.
ipv4_pktinfo_prepare() should not drop the dst if (evil) IP options
are present.
We could refine the test to the presence of ts_needtime or srr,
but IP options are not often used, so let's be conservative.
Thanks to syzkaller team for finding this bug.
Fixes: d826eb14ecef ("ipv4: PKTINFO doesnt need dst reference")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5fa8bbda38c668e56b0c6cdecced2eac2fe36dec ]
Dmitry reported a warning [1] showing that we were calling
net_disable_timestamp() -> static_key_slow_dec() from a non
process context.
Grabbing a mutex while holding a spinlock or rcu_read_lock()
is not allowed.
As Cong suggested, we now use a work queue.
It is possible netstamp_clear() exits while netstamp_needed_deferred
is not zero, but it is probably not worth trying to do better than that.
netstamp_needed_deferred atomic tracks the exact number of deferred
decrements.
[1]
[ INFO: suspicious RCU usage. ]
4.10.0-rc5+ #192 Not tainted
-------------------------------
./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
critical section!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0
2 locks held by syz-executor14/23111:
#0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock
include/net/sock.h:1454 [inline]
#0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>]
rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
#1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook
include/linux/netfilter.h:201 [inline]
#1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>]
__ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160
stack backtrace:
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
___might_sleep+0x560/0x650 kernel/sched/core.c:7748
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
__static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
__sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sock_wfree+0xae/0x120 net/core/sock.c:1645
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put include/net/inet_frag.h:133 [inline]
nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook include/linux/netfilter.h:212 [inline]
__ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
vfs_writev+0x87/0xc0 fs/read_write.c:911
do_writev+0x110/0x2c0 fs/read_write.c:944
SYSC_writev fs/read_write.c:1017 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1014
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:752
in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
INFO: lockdep is turned off.
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
__might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
__static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
__sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sock_wfree+0xae/0x120 net/core/sock.c:1645
skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
skb_release_all+0x15/0x60 net/core/skbuff.c:668
__kfree_skb+0x15/0x20 net/core/skbuff.c:684
kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
inet_frag_put include/net/inet_frag.h:133 [inline]
nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
nf_hook include/linux/netfilter.h:212 [inline]
__ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
sock_write_iter+0x326/0x600 net/socket.c:848
do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
vfs_writev+0x87/0xc0 fs/read_write.c:911
do_writev+0x110/0x2c0 fs/read_write.c:944
SYSC_writev fs/read_write.c:1017 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1014
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
Fixes: b90e5794c5bd ("net: dont call jump_label_dec from irq context")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 06425c308b92eaf60767bc71d359f4cbc7a561f8 ]
syszkaller fuzzer was able to trigger a divide by zero, when
TCP window scaling is not enabled.
SO_RCVBUF can be used not only to increase sk_rcvbuf, also
to decrease it below current receive buffers utilization.
If mss is negative or 0, just return a zero TCP window.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 63117f09c768be05a0bf465911297dc76394f686 ]
Casting is a high precedence operation but "off" and "i" are in terms of
bytes so we need to have some parenthesis here.
Fixes: fbfa743a9d2a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit fbfa743a9d2a0ffa24251764f10afc13eb21e739 ]
This function suffers from multiple issues.
First one is that pskb_may_pull() may reallocate skb->head,
so the 'raw' pointer needs either to be reloaded or not used at all.
Second issue is that NEXTHDR_DEST handling does not validate
that the options are present in skb->data, so we might read
garbage or access non existent memory.
With help from Willem de Bruijn.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f1712c73714088a7252d276a57126d56c7d37e64 ]
Zhang Yanmin reported crashes [1] and provided a patch adding a
synchronize_rcu() call in can_rx_unregister()
The main problem seems that the sockets themselves are not RCU
protected.
If CAN uses RCU for delivery, then sockets should be freed only after
one RCU grace period.
Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
ease stable backports with the following fix instead.
[1]
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81495e25>] selinux_socket_sock_rcv_skb+0x65/0x2a0
Call Trace:
<IRQ>
[<ffffffff81485d8c>] security_sock_rcv_skb+0x4c/0x60
[<ffffffff81d55771>] sk_filter+0x41/0x210
[<ffffffff81d12913>] sock_queue_rcv_skb+0x53/0x3a0
[<ffffffff81f0a2b3>] raw_rcv+0x2a3/0x3c0
[<ffffffff81f06eab>] can_rcv_filter+0x12b/0x370
[<ffffffff81f07af9>] can_receive+0xd9/0x120
[<ffffffff81f07beb>] can_rcv+0xab/0x100
[<ffffffff81d362ac>] __netif_receive_skb_core+0xd8c/0x11f0
[<ffffffff81d36734>] __netif_receive_skb+0x24/0xb0
[<ffffffff81d37f67>] process_backlog+0x127/0x280
[<ffffffff81d36f7b>] net_rx_action+0x33b/0x4f0
[<ffffffff810c88d4>] __do_softirq+0x184/0x440
[<ffffffff81f9e86c>] do_softirq_own_stack+0x1c/0x30
<EOI>
[<ffffffff810c76fb>] do_softirq.part.18+0x3b/0x40
[<ffffffff810c8bed>] do_softirq+0x1d/0x20
[<ffffffff81d30085>] netif_rx_ni+0xe5/0x110
[<ffffffff8199cc87>] slcan_receive_buf+0x507/0x520
[<ffffffff8167ef7c>] flush_to_ldisc+0x21c/0x230
[<ffffffff810e3baf>] process_one_work+0x24f/0x670
[<ffffffff810e44ed>] worker_thread+0x9d/0x6f0
[<ffffffff810e4450>] ? rescuer_thread+0x480/0x480
[<ffffffff810ebafc>] kthread+0x12c/0x150
[<ffffffff81f9ccef>] ret_from_fork+0x3f/0x70
Reported-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit da7061c82e4a1bc6a5e134ef362c86261906c860 upstream.
The function ieee80211_ie_split_vendor doesn't return 0 on errors. Instead
it returns any offset < ielen when WLAN_EID_VENDOR_SPECIFIC is found. The
return value in mesh_add_vendor_ies must therefore be checked against
ifmsh->ie_len and not 0. Otherwise all ifmsh->ie starting with
WLAN_EID_VENDOR_SPECIFIC will be rejected.
Fixes: 082ebb0c258d ("mac80211: fix mesh beacon format")
Signed-off-by: Thorsten Horstmann <thorsten@defutech.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fit.fraunhofer.de>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
[sven@narfation.org: Add commit message]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a06393ed03167771246c4c43192d9c264bc48412 upstream.
When removing a bcm tx operation either a hrtimer or a tasklet might run.
As the hrtimer triggers its associated tasklet and vice versa we need to
take care to mutually terminate both handlers.
Reported-by: Michael Josenhans <michael.josenhans@web.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Michael Josenhans <michael.josenhans@web.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 034dd34ff4916ec1f8f74e39ca3efb04eab2f791 upstream.
Olga Kornievskaia says: "I ran into this oops in the nfsd (below)
(4.10-rc3 kernel). To trigger this I had a client (unsuccessfully) try
to mount the server with krb5 where the server doesn't have the
rpcsec_gss_krb5 module built."
The problem is that rsci.cred is copied from a svc_cred structure that
gss_proxy didn't properly initialize. Fix that.
[120408.542387] general protection fault: 0000 [#1] SMP
...
[120408.565724] CPU: 0 PID: 3601 Comm: nfsd Not tainted 4.10.0-rc3+ #16
[120408.567037] Hardware name: VMware, Inc. VMware Virtual =
Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[120408.569225] task: ffff8800776f95c0 task.stack: ffffc90003d58000
[120408.570483] RIP: 0010:gss_mech_put+0xb/0x20 [auth_rpcgss]
...
[120408.584946] ? rsc_free+0x55/0x90 [auth_rpcgss]
[120408.585901] gss_proxy_save_rsc+0xb2/0x2a0 [auth_rpcgss]
[120408.587017] svcauth_gss_proxy_init+0x3cc/0x520 [auth_rpcgss]
[120408.588257] ? __enqueue_entity+0x6c/0x70
[120408.589101] svcauth_gss_accept+0x391/0xb90 [auth_rpcgss]
[120408.590212] ? try_to_wake_up+0x4a/0x360
[120408.591036] ? wake_up_process+0x15/0x20
[120408.592093] ? svc_xprt_do_enqueue+0x12e/0x2d0 [sunrpc]
[120408.593177] svc_authenticate+0xe1/0x100 [sunrpc]
[120408.594168] svc_process_common+0x203/0x710 [sunrpc]
[120408.595220] svc_process+0x105/0x1c0 [sunrpc]
[120408.596278] nfsd+0xe9/0x160 [nfsd]
[120408.597060] kthread+0x101/0x140
[120408.597734] ? nfsd_destroy+0x60/0x60 [nfsd]
[120408.598626] ? kthread_park+0x90/0x90
[120408.599448] ret_from_fork+0x22/0x30
Fixes: 1d658336b05f "SUNRPC: Add RPC based upcall mechanism for RPCGSS auth"
Cc: Simo Sorce <simo@redhat.com>
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Tested-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f154be241d22298d2b63c9b613f619fa1086ea75 ]
Commit 448b4482c671 ("net: dsa: Add lockdep class to tx queues to avoid
lockdep splat") removed the netif_device_detach() call done in
dsa_slave_suspend() which is necessary, and paired with a corresponding
netif_device_attach(), bring it back.
Fixes: 448b4482c671 ("net: dsa: Add lockdep class to tx queues to avoid lockdep splat")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0fb44559ffd67de8517098b81f675fa0210f13f0 ]
Dmitry reported a deadlock scenario:
unix_bind() path:
u->bindlock ==> sb_writer
do_splice() path:
sb_writer ==> pipe->mutex ==> u->bindlock
In the unix_bind() code path, unix_mknod() does not have to
be done with u->bindlock held, since it is a pure fs operation,
so we can just move unix_mknod() out.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b6677449dff674cf5b81429b11d5c7f358852ef9 ]
Any bridge options specified during link creation (e.g. ip link add)
are ignored as br_dev_newlink() does not process them.
Use br_changelink() to do it.
Fixes: 133235161721 ("bridge: implement rtnl_link_ops->changelink")
Signed-off-by: Ivan Vecera <cera@cera.cz>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0dbd7ff3ac5017a46033a9d0a87a8267d69119d9 ]
Found that if we run LTP netstress test with large MSS (65K),
the first attempt from server to send data comparable to this
MSS on fastopen connection will be delayed by the probe timer.
Here is an example:
< S seq 0:0 win 43690 options [mss 65495 wscale 7 tfo cookie] length 32
> S. seq 0:0 ack 1 win 43690 options [mss 65495 wscale 7] length 0
< . ack 1 win 342 length 0
Inside tcp_sendmsg(), tcp_send_mss() returns max MSS in 'mss_now',
as well as in 'size_goal'. This results the segment not queued for
transmition until all the data copied from user buffer. Then, inside
__tcp_push_pending_frames(), it breaks on send window test and
continues with the check probe timer.
Fragmentation occurs in tcp_write_wakeup()...
+0.2 > P. seq 1:43777 ack 1 win 342 length 43776
< . ack 43777, win 1365 length 0
> P. seq 43777:65001 ack 1 win 342 options [...] length 21224
...
This also contradicts with the fact that we should bound to the half
of the window if it is large.
Fix this flaw by correctly initializing max_window. Before that, it
could have large values that affect further calculations of 'size_goal'.
Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|