summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-03-22net/8021q: create device with all possible features in wanted_featuresAndrey Vagin
[ Upstream commit 88997e4208aea117627898e5f6f9801cf3cd42d2 ] wanted_features is a set of features which have to be enabled if a hardware allows that. Currently when a vlan device is created, its wanted_features is set to current features of its base device. The problem is that the base device can get new features and they are not propagated to vlan-s of this device. If we look at bonding devices, they doesn't have this problem and this patch suggests to fix this issue by the same way how it works for bonding devices. We meet this problem, when we try to create a vlan device over a bonding device. When a system are booting, real devices require time to be initialized, so bonding devices created without slaves, then vlan devices are created and only then ethernet devices are added to the bonding device. As a result we have vlan devices with disabled scatter-gather. * create a bonding device $ ip link add bond0 type bond $ ethtool -k bond0 | grep scatter scatter-gather: off tx-scatter-gather: off [requested on] tx-scatter-gather-fraglist: off [requested on] * create a vlan device $ ip link add link bond0 name bond0.10 type vlan id 10 $ ethtool -k bond0.10 | grep scatter scatter-gather: off tx-scatter-gather: off tx-scatter-gather-fraglist: off * Add a slave device to bond0 $ ip link set dev eth0 master bond0 And now we can see that the bond0 device has got the scatter-gather feature, but the bond0.10 hasn't got it. [root@laptop linux-task-diag]# ethtool -k bond0 | grep scatter scatter-gather: on tx-scatter-gather: on tx-scatter-gather-fraglist: on [root@laptop linux-task-diag]# ethtool -k bond0.10 | grep scatter scatter-gather: off tx-scatter-gather: off tx-scatter-gather-fraglist: off With this patch the vlan device will get all new features from the bonding device. Here is a call trace how features which are set in this patch reach dev->wanted_features. register_netdevice vlan_dev_init ... dev->hw_features = NETIF_F_HW_CSUM | NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE | NETIF_F_HIGHDMA | NETIF_F_SCTP_CRC | NETIF_F_ALL_FCOE; dev->features |= dev->hw_features; ... dev->wanted_features = dev->features & dev->hw_features; __netdev_update_features(dev); vlan_dev_fix_features ... Cc: Alexey Kuznetsov <kuznet@virtuozzo.com> Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22netem: apply correct delay when rate throttlingNik Unger
[ Upstream commit 5080f39e8c72e01cf37e8359023e7018e2a4901e ] I recently reported on the netem list that iperf network benchmarks show unexpected results when a bandwidth throttling rate has been configured for netem. Specifically: 1) The measured link bandwidth *increases* when a higher delay is added 2) The measured link bandwidth appears higher than the specified limit 3) The measured link bandwidth for the same very slow settings varies significantly across machines The issue can be reproduced by using tc to configure netem with a 512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a veth pair between network namespaces, and then using iperf (or any other network benchmarking tool) to test throughput. Complete detailed instructions are in the original email chain here: https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html There appear to be two underlying bugs causing these effects: - The first issue causes long delays when the rate is slow and no delay is configured (e.g., "rate 512kbit"). This is because SKBs are not orphaned when no delay is configured, so orphaning does not occur until *after* the rate-induced delay has been applied. For this reason, adding a tiny delay (e.g., "rate 512kbit delay 1us") dramatically increases the measured bandwidth. - The second issue is that rate-induced delays are not correctly applied, allowing SKB delays to occur in parallel. The indended approach is to compute the delay for an SKB and to add this delay to the end of the current queue. However, the code does not detect existing SKBs in the queue due to improperly testing sch->q.qlen, which is nonzero even when packets exist only in the rbtree. Consequently, new SKBs do not wait for the current queue to empty. When packet delays vary significantly (e.g., if packet sizes are different), then this also causes unintended reordering. I modified the code to expect a delay (and orphan the SKB) when a rate is configured. I also added some defensive tests that correctly find the latest scheduled delivery time, even if it is (unexpectedly) for a packet in sch->q. I have tested these changes on the latest kernel (4.11.0-rc1+) and the iperf / ping test results are as expected. Signed-off-by: Nik Unger <njunger@uwaterloo.ca> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: x_tables: pack percpu counter allocationsFlorian Westphal
commit ae0ac0ed6fcf5af3be0f63eb935f483f44a402d2 upstream. instead of allocating each xt_counter individually, allocate 4k chunks and then use these for counter allocation requests. This should speed up rule evaluation by increasing data locality, also speeds up ruleset loading because we reduce calls to the percpu allocator. As Eric points out we can't use PAGE_SIZE, page_allocator would fail on arches with 64k page size. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: x_tables: pass xt_counters struct to counter allocatorFlorian Westphal
commit f28e15bacedd444608e25421c72eb2cf4527c9ca upstream. Keeps some noise away from a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: x_tables: pass xt_counters struct instead of packet counterFlorian Westphal
commit 4d31eef5176df06f218201bc9c0ce40babb41660 upstream. On SMP we overload the packet counter (unsigned long) to contain percpu offset. Hide this from callers and pass xt_counters address instead. Preparation patch to allocate the percpu counters in page-sized batch chunks. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: ipv6: fix use-after-free Write in nf_nat_ipv6_manip_pktFlorian Westphal
commit b078556aecd791b0e5cb3a59f4c3a14273b52121 upstream. l4proto->manip_pkt() can cause reallocation of skb head so pointer to the ipv6 header must be reloaded. Reported-and-tested-by: <syzbot+10005f4292fc9cc89de7@syzkaller.appspotmail.com> Fixes: 58a317f1061c89 ("netfilter: ipv6: add IPv6 NAT support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: bridge: ebt_among: add missing match size checksFlorian Westphal
commit c4585a2823edf4d1326da44d1524ecbfda26bb37 upstream. ebt_among is special, it has a dynamic match size and is exempt from the central size checks. Therefore it must check that the size of the match structure provided from userspace is sane by making sure em->match_size is at least the minimum size of the expected structure. The module has such a check, but its only done after accessing a structure that might be out of bounds. tested with: ebtables -A INPUT ... \ --among-dst fe:fe:fe:fe:fe:fe --among-dst fe:fe:fe:fe:fe:fe --among-src fe:fe:fe:fe:ff:f,fe:fe:fe:fe:fe:fb,fe:fe:fe:fe:fc:fd,fe:fe:fe:fe:fe:fd,fe:fe:fe:fe:fe:fe --among-src fe:fe:fe:fe:ff:f,fe:fe:fe:fe:fe:fa,fe:fe:fe:fe:fe:fd,fe:fe:fe:fe:fe:fe,fe:fe:fe:fe:fe:fe Reported-by: <syzbot+fe0b19af568972814355@syzkaller.appspotmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: ebtables: CONFIG_COMPAT: don't trust userland offsetsFlorian Westphal
commit b71812168571fa55e44cdd0254471331b9c4c4c6 upstream. We need to make sure the offsets are not out of range of the total size. Also check that they are in ascending order. The WARN_ON triggered by syzkaller (it sets panic_on_warn) is changed to also bail out, no point in continuing parsing. Briefly tested with simple ruleset of -A INPUT --limit 1/s' --log plus jump to custom chains using 32bit ebtables binary. Reported-by: <syzbot+845a53d13171abf8bf29@syzkaller.appspotmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: IDLETIMER: be syzkaller friendlyEric Dumazet
commit cfc2c740533368b96e2be5e0a4e8c3cace7d9814 upstream. We had one report from syzkaller [1] First issue is that INIT_WORK() should be done before mod_timer() or we risk timer being fired too soon, even with a 1 second timer. Second issue is that we need to reject too big info->timeout to avoid overflows in msecs_to_jiffies(info->timeout * 1000), or risk looping, if result after overflow is 0. [1] WARNING: CPU: 1 PID: 5129 at kernel/workqueue.c:1444 __queue_work+0xdf4/0x1230 kernel/workqueue.c:1444 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 5129 Comm: syzkaller159866 Not tainted 4.16.0-rc1+ #230 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 panic+0x1e4/0x41c kernel/panic.c:183 __warn+0x1dc/0x200 kernel/panic.c:547 report_bug+0x211/0x2d0 lib/bug.c:184 fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178 fixup_bug arch/x86/kernel/traps.c:247 [inline] do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:988 RIP: 0010:__queue_work+0xdf4/0x1230 kernel/workqueue.c:1444 RSP: 0018:ffff8801db507538 EFLAGS: 00010006 RAX: ffff8801aeb46080 RBX: ffff8801db530200 RCX: ffffffff81481404 RDX: 0000000000000100 RSI: ffffffff86b42640 RDI: 0000000000000082 RBP: ffff8801db507758 R08: 1ffff1003b6a0de5 R09: 000000000000000c R10: ffff8801db5073f0 R11: 0000000000000020 R12: 1ffff1003b6a0eb6 R13: ffff8801b1067ae0 R14: 00000000000001f8 R15: dffffc0000000000 queue_work_on+0x16a/0x1c0 kernel/workqueue.c:1488 queue_work include/linux/workqueue.h:488 [inline] schedule_work include/linux/workqueue.h:546 [inline] idletimer_tg_expired+0x44/0x60 net/netfilter/xt_IDLETIMER.c:116 call_timer_fn+0x228/0x820 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:541 [inline] smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:777 [inline] RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] RIP: 0010:_raw_spin_unlock_irqrestore+0x5e/0xba kernel/locking/spinlock.c:184 RSP: 0018:ffff8801c20173c8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff12 RAX: dffffc0000000000 RBX: 0000000000000282 RCX: 0000000000000006 RDX: 1ffffffff0d592cd RSI: 1ffff10035d68d23 RDI: 0000000000000282 RBP: ffff8801c20173d8 R08: 1ffff10038402e47 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8820e5c8 R13: ffff8801b1067ad8 R14: ffff8801aea7c268 R15: ffff8801aea7c278 __debug_object_init+0x235/0x1040 lib/debugobjects.c:378 debug_object_init+0x17/0x20 lib/debugobjects.c:391 __init_work+0x2b/0x60 kernel/workqueue.c:506 idletimer_tg_create net/netfilter/xt_IDLETIMER.c:152 [inline] idletimer_tg_checkentry+0x691/0xb00 net/netfilter/xt_IDLETIMER.c:213 xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:850 check_target net/ipv6/netfilter/ip6_tables.c:533 [inline] find_check_entry.isra.7+0x935/0xcf0 net/ipv6/netfilter/ip6_tables.c:575 translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:744 do_replace net/ipv6/netfilter/ip6_tables.c:1160 [inline] do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1686 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ipv6_setsockopt+0x10b/0x130 net/ipv6/ipv6_sockglue.c:927 udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2976 SYSC_setsockopt net/socket.c:1850 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1829 do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287 Fixes: 0902b469bd25 ("netfilter: xtables: idletimer target implementation") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: nat: cope with negative port rangePaolo Abeni
commit db57ccf0f2f4624b4c4758379f8165277504fbd7 upstream. syzbot reported a division by 0 bug in the netfilter nat code: divide error: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4168 Comm: syzkaller034710 Not tainted 4.16.0-rc1+ #309 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: 0018:ffff8801b2466778 EFLAGS: 00010246 RAX: 000000000000f153 RBX: ffff8801b2466dd8 RCX: ffff8801b2466c7c RDX: 0000000000000000 RSI: ffff8801b2466c58 RDI: ffff8801db5293ac RBP: ffff8801b24667d8 R08: ffff8801b8ba6dc0 R09: ffffffff88af5900 R10: ffff8801b24666f0 R11: 0000000000000000 R12: 000000002990f153 R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801b2466c7c FS: 00000000017e3880(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000208fdfe4 CR3: 00000001b5340002 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dccp_unique_tuple+0x40/0x50 net/netfilter/nf_nat_proto_dccp.c:30 get_unique_tuple+0xc28/0x1c10 net/netfilter/nf_nat_core.c:362 nf_nat_setup_info+0x1c2/0xe00 net/netfilter/nf_nat_core.c:406 nf_nat_redirect_ipv6+0x306/0x730 net/netfilter/nf_nat_redirect.c:124 redirect_tg6+0x7f/0xb0 net/netfilter/xt_REDIRECT.c:34 ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365 ip6table_nat_do_chain+0x65/0x80 net/ipv6/netfilter/ip6table_nat.c:41 nf_nat_ipv6_fn+0x594/0xa80 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:302 nf_nat_ipv6_local_fn+0x33/0x5d0 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:407 ip6table_nat_local_fn+0x2c/0x40 net/ipv6/netfilter/ip6table_nat.c:69 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483 nf_hook include/linux/netfilter.h:243 [inline] NF_HOOK include/linux/netfilter.h:286 [inline] ip6_xmit+0x10ec/0x2260 net/ipv6/ip6_output.c:277 inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139 dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142 dccp_connect+0x369/0x670 net/dccp/output.c:564 dccp_v6_connect+0xe17/0x1bf0 net/dccp/ipv6.c:946 __inet_stream_connect+0x2d4/0xf00 net/ipv4/af_inet.c:620 inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:684 SYSC_connect+0x213/0x4a0 net/socket.c:1639 SyS_connect+0x24/0x30 net/socket.c:1620 do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x26/0x9b RIP: 0033:0x441c69 RSP: 002b:00007ffe50cc0be8 EFLAGS: 00000217 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000441c69 RDX: 000000000000001c RSI: 00000000208fdfe4 RDI: 0000000000000003 RBP: 00000000006cc018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000538 R11: 0000000000000217 R12: 0000000000403590 R13: 0000000000403620 R14: 0000000000000000 R15: 0000000000000000 Code: 48 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 46 02 00 00 48 8b 45 c8 44 0f b7 20 e8 88 97 04 fd 31 d2 41 0f b7 c4 4c 89 f9 <41> f7 f6 48 c1 e9 03 48 b8 00 00 00 00 00 fc ff df 0f b6 0c 01 RIP: nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: ffff8801b2466778 The problem is that currently we don't have any check on the configured port range. A port range == -1 triggers the bug, while other negative values may require a very long time to complete the following loop. This commit addresses the issue swapping the two ends on negative ranges. The check is performed in nf_nat_l4proto_unique_tuple() since the nft nat loads the port values from nft registers at runtime. v1 -> v2: use the correct 'Fixes' tag v2 -> v3: update commit message, drop unneeded READ_ONCE() Fixes: 5b1158e909ec ("[NETFILTER]: Add NAT support for nf_conntrack") Reported-by: syzbot+8012e198bd037f4871e5@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: x_tables: fix missing timer initialization in xt_LEDPaolo Abeni
commit 10414014bc085aac9f787a5890b33b5605fbcfc4 upstream. syzbot reported that xt_LED may try to use the ledinternal->timer without previously initializing it: ------------[ cut here ]------------ kernel BUG at kernel/time/timer.c:958! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 1826 Comm: kworker/1:2 Not tainted 4.15.0+ #306 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:__mod_timer kernel/time/timer.c:958 [inline] RIP: 0010:mod_timer+0x7d6/0x13c0 kernel/time/timer.c:1102 RSP: 0018:ffff8801d24fe9f8 EFLAGS: 00010293 RAX: ffff8801d25246c0 RBX: ffff8801aec6cb50 RCX: ffffffff816052c6 RDX: 0000000000000000 RSI: 00000000fffbd14b RDI: ffff8801aec6cb68 RBP: ffff8801d24fec98 R08: 0000000000000000 R09: 1ffff1003a49fd6c R10: ffff8801d24feb28 R11: 0000000000000005 R12: dffffc0000000000 R13: ffff8801d24fec70 R14: 00000000fffbd14b R15: ffff8801af608f90 FS: 0000000000000000(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000206d6fd0 CR3: 0000000006a22001 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: led_tg+0x1db/0x2e0 net/netfilter/xt_LED.c:75 ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365 ip6table_raw_hook+0x65/0x80 net/ipv6/netfilter/ip6table_raw.c:42 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483 nf_hook.constprop.27+0x3f6/0x830 include/linux/netfilter.h:243 NF_HOOK include/linux/netfilter.h:286 [inline] ndisc_send_skb+0xa51/0x1370 net/ipv6/ndisc.c:491 ndisc_send_ns+0x38a/0x870 net/ipv6/ndisc.c:633 addrconf_dad_work+0xb9e/0x1320 net/ipv6/addrconf.c:4008 process_one_work+0xbbf/0x1af0 kernel/workqueue.c:2113 worker_thread+0x223/0x1990 kernel/workqueue.c:2247 kthread+0x33c/0x400 kernel/kthread.c:238 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:429 Code: 85 2a 0b 00 00 4d 8b 3c 24 4d 85 ff 75 9f 4c 8b bd 60 fd ff ff e8 bb 57 10 00 65 ff 0d 94 9a a1 7e e9 d9 fc ff ff e8 aa 57 10 00 <0f> 0b e8 a3 57 10 00 e9 14 fb ff ff e8 99 57 10 00 4c 89 bd 70 RIP: __mod_timer kernel/time/timer.c:958 [inline] RSP: ffff8801d24fe9f8 RIP: mod_timer+0x7d6/0x13c0 kernel/time/timer.c:1102 RSP: ffff8801d24fe9f8 ---[ end trace f661ab06f5dd8b3d ]--- The ledinternal struct can be shared between several different xt_LED targets, but the related timer is currently initialized only if the first target requires it. Fix it by unconditionally initializing the timer struct. v1 -> v2: call del_timer_sync() unconditionally, too. Fixes: 268cb38e1802 ("netfilter: x_tables: add LED trigger target") Reported-by: syzbot+10c98dc5725c6c8fc7fb@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-18netfilter: add back stackpointer size checksFlorian Westphal
commit 57ebd808a97d7c5b1e1afb937c2db22beba3c1f8 upstream. The rationale for removing the check is only correct for rulesets generated by ip(6)tables. In iptables, a jump can only occur to a user-defined chain, i.e. because we size the stack based on number of user-defined chains we cannot exceed stack size. However, the underlying binary format has no such restriction, and the validation step only ensures that the jump target is a valid rule start point. IOW, its possible to build a rule blob that has no user-defined chains but does contain a jump. If this happens, no jump stack gets allocated and crash occurs because no jumpstack was allocated. Fixes: 7814b6ec6d0d6 ("netfilter: xtables: don't save/restore jumpstack offset") Reported-by: syzbot+e783f671527912cd9403@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11mpls, nospec: Sanitize array index in mpls_label_ok()Dan Williams
commit 3968523f855050b8195134da951b87c20bd66130 upstream. mpls_label_ok() validates that the 'platform_label' array index from a userspace netlink message payload is valid. Under speculation the mpls_label_ok() result may not resolve in the CPU pipeline until after the index is used to access an array element. Sanitize the index to zero to prevent userspace-controlled arbitrary out-of-bounds speculation, a precursor for a speculative execution side channel vulnerability. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 4.4: - mpls_label_ok() doesn't take an extack parameter - Drop change in mpls_getroute()] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11net: mpls: Pull common label check into helperDavid Ahern
commit b7b386f42f079b25b942c756820e36c6bd09b2ca upstream. mpls_route_add and mpls_route_del have the same checks on the label. Move to a helper. Avoid duplicate extack messages in the next patch. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11sctp: verify size of a new chunk in _sctp_make_chunk()Alexey Kodanev
[ Upstream commit 07f2c7ab6f8d0a7e7c5764c4e6cc9c52951b9d9c ] When SCTP makes INIT or INIT_ACK packet the total chunk length can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when transmitting these packets, e.g. the crash on sending INIT_ACK: [ 597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168 put:120156 head:000000007aa47635 data:00000000d991c2de tail:0x1d640 end:0xfec0 dev:<NULL> ... [ 597.976970] ------------[ cut here ]------------ [ 598.033408] kernel BUG at net/core/skbuff.c:104! [ 600.314841] Call Trace: [ 600.345829] <IRQ> [ 600.371639] ? sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.436934] skb_put+0x16c/0x200 [ 600.477295] sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.540630] ? sctp_packet_config+0x890/0x890 [sctp] [ 600.601781] ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp] [ 600.671356] ? sctp_cmp_addr_exact+0x3f/0x90 [sctp] [ 600.731482] sctp_outq_flush+0x663/0x30d0 [sctp] [ 600.788565] ? sctp_make_init+0xbf0/0xbf0 [sctp] [ 600.845555] ? sctp_check_transmitted+0x18f0/0x18f0 [sctp] [ 600.912945] ? sctp_outq_tail+0x631/0x9d0 [sctp] [ 600.969936] sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp] [ 601.041593] ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp] [ 601.104837] ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp] [ 601.175436] ? sctp_eat_data+0x1710/0x1710 [sctp] [ 601.233575] sctp_do_sm+0x182/0x560 [sctp] [ 601.284328] ? sctp_has_association+0x70/0x70 [sctp] [ 601.345586] ? sctp_rcv+0xef4/0x32f0 [sctp] [ 601.397478] ? sctp6_rcv+0xa/0x20 [sctp] ... Here the chunk size for INIT_ACK packet becomes too big, mostly because of the state cookie (INIT packet has large size with many address parameters), plus additional server parameters. Later this chunk causes the panic in skb_put_data(): skb_packet_transmit() sctp_packet_pack() skb_put_data(nskb, chunk->skb->data, chunk->skb->len); 'nskb' (head skb) was previously allocated with packet->size from u16 'chunk->chunk_hdr->length'. As suggested by Marcelo we should check the chunk's length in _sctp_make_chunk() before trying to allocate skb for it and discard a chunk if its size bigger than SCTP_MAX_CHUNK_LEN. Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leinter@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11sctp: fix dst refcnt leak in sctp_v6_get_dst()Alexey Kodanev
[ Upstream commit 957d761cf91cdbb175ad7d8f5472336a4d54dbf2 ] When going through the bind address list in sctp_v6_get_dst() and the previously found address is better ('matchlen > bmatchlen'), the code continues to the next iteration without releasing currently held destination. Fix it by releasing 'bdst' before continue to the next iteration, and instead of introducing one more '!IS_ERR(bdst)' check for dst_release(), move the already existed one right after ip6_dst_lookup_flow(), i.e. we shouldn't proceed further if we get an error for the route lookup. Fixes: dbc2b5e9a09e ("sctp: fix src address selection if using secondary addresses for ipv6") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11tcp_bbr: better deal with suboptimal GSOEric Dumazet
[ Upstream commit 350c9f484bde93ef229682eedd98cd5f74350f7f ] BBR uses tcp_tso_autosize() in an attempt to probe what would be the burst sizes and to adjust cwnd in bbr_target_cwnd() with following gold formula : /* Allow enough full-sized skbs in flight to utilize end systems. */ cwnd += 3 * bbr->tso_segs_goal; But GSO can be lacking or be constrained to very small units (ip link set dev ... gso_max_segs 2) What we really want is to have enough packets in flight so that both GSO and GRO are efficient. So in the case GSO is off or downgraded, we still want to have the same number of packets in flight as if GSO/TSO was fully operational, so that GRO can hopefully be working efficiently. To fix this issue, we make tcp_tso_autosize() unaware of sk->sk_gso_max_segs Only tcp_tso_segs() has to enforce the gso_max_segs limit. Tested: ethtool -K eth0 tso off gso off tc qd replace dev eth0 root pfifo_fast Before patch: for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done     691  (ss -temoi shows cwnd is stuck around 6 )     667     651     631     517 After patch : # for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done    1733 (ss -temoi shows cwnd is around 386 )    1778    1746    1781    1718 Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11rxrpc: Fix send in rxrpc_send_data_packet()David Howells
[ Upstream commit 93c62c45ed5fad1b87e3a45835b251cd68de9c46 ] All the kernel_sendmsg() calls in rxrpc_send_data_packet() need to send both parts of the iov[] buffer, but one of them does not. Fix it so that it does. Without this, short IPv6 rxrpc DATA packets may be seen that have the rxrpc header included, but no payload. Fixes: 5a924b8951f8 ("rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11tcp: Honor the eor bit in tcp_mtu_probeIlya Lesokhin
[ Upstream commit 808cf9e38cd7923036a99f459ccc8cf2955e47af ] Avoid SKB coalescing if eor bit is set in one of the relevant SKBs. Fixes: c134ecb87817 ("tcp: Make use of MSG_EOR in tcp_sendmsg") Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11sctp: fix dst refcnt leak in sctp_v4_get_dstTommi Rantala
[ Upstream commit 4a31a6b19f9ddf498c81f5c9b089742b7472a6f8 ] Fix dst reference count leak in sctp_v4_get_dst() introduced in commit 410f03831 ("sctp: add routing output fallback"): When walking the address_list, successive ip_route_output_key() calls may return the same rt->dst with the reference incremented on each call. The code would not decrement the dst refcount when the dst pointer was identical from the previous iteration, causing the dst refcnt leak. Testcase: ip netns add TEST ip netns exec TEST ip link set lo up ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add dummy2 type dummy ip link set dev dummy0 netns TEST ip link set dev dummy1 netns TEST ip link set dev dummy2 netns TEST ip netns exec TEST ip addr add 192.168.1.1/24 dev dummy0 ip netns exec TEST ip link set dummy0 up ip netns exec TEST ip addr add 192.168.1.2/24 dev dummy1 ip netns exec TEST ip link set dummy1 up ip netns exec TEST ip addr add 192.168.1.3/24 dev dummy2 ip netns exec TEST ip link set dummy2 up ip netns exec TEST sctp_test -H 192.168.1.2 -P 20002 -h 192.168.1.1 -p 20000 -s -B 192.168.1.3 ip netns del TEST In 4.4 and 4.9 kernels this results to: [ 354.179591] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 364.419674] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 374.663664] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 384.903717] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 395.143724] unregister_netdevice: waiting for lo to become free. Usage count = 1 [ 405.383645] unregister_netdevice: waiting for lo to become free. Usage count = 1 ... Fixes: 410f03831 ("sctp: add routing output fallback") Fixes: 0ca50d12f ("sctp: fix src address selection if using secondary addresses") Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11udplite: fix partial checksum initializationAlexey Kodanev
[ Upstream commit 15f35d49c93f4fa9875235e7bf3e3783d2dd7a1b ] Since UDP-Lite is always using checksum, the following path is triggered when calculating pseudo header for it: udp4_csum_init() or udp6_csum_init() skb_checksum_init_zero_check() __skb_checksum_validate_complete() The problem can appear if skb->len is less than CHECKSUM_BREAK. In this particular case __skb_checksum_validate_complete() also invokes __skb_checksum_complete(skb). If UDP-Lite is using partial checksum that covers only part of a packet, the function will return bad checksum and the packet will be dropped. It can be fixed if we skip skb_checksum_init_zero_check() and only set the required pseudo header checksum for UDP-Lite with partial checksum before udp4_csum_init()/udp6_csum_init() functions return. Fixes: ed70fcfcee95 ("net: Call skb_checksum_init in IPv4") Fixes: e4f45b7f40bd ("net: Call skb_checksum_init in IPv6") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11netlink: ensure to loop over all netns in genlmsg_multicast_allns()Nicolas Dichtel
[ Upstream commit cb9f7a9a5c96a773bbc9c70660dc600cfff82f82 ] Nowadays, nlmsg_multicast() returns only 0 or -ESRCH but this was not the case when commit 134e63756d5f was pushed. However, there was no reason to stop the loop if a netns does not have listeners. Returns -ESRCH only if there was no listeners in all netns. To avoid having the same problem in the future, I didn't take the assumption that nlmsg_multicast() returns only 0 or -ESRCH. Fixes: 134e63756d5f ("genetlink: make netns aware") CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68Sabrina Dubroca
[ Upstream commit c7272c2f1229125f74f22dcdd59de9bbd804f1c8 ] According to RFC 1191 sections 3 and 4, ICMP frag-needed messages indicating an MTU below 68 should be rejected: A host MUST never reduce its estimate of the Path MTU below 68 octets. and (talking about ICMP frag-needed's Next-Hop MTU field): This field will never contain a value less than 68, since every router "must be able to forward a datagram of 68 octets without fragmentation". Furthermore, by letting net.ipv4.route.min_pmtu be set to negative values, we can end up with a very large PMTU when (-1) is cast into u32. Let's also make ip_rt_min_pmtu a u32, since it's only ever compared to unsigned ints. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11net: fix race on decreasing number of TX queuesJakub Kicinski
[ Upstream commit ac5b70198adc25c73fba28de4f78adcee8f6be0b ] netif_set_real_num_tx_queues() can be called when netdev is up. That usually happens when user requests change of number of channels/rings with ethtool -L. The procedure for changing the number of queues involves resetting the qdiscs and setting dev->num_tx_queues to the new value. When the new value is lower than the old one, extra care has to be taken to ensure ordering of accesses to the number of queues vs qdisc reset. Currently the queues are reset before new dev->num_tx_queues is assigned, leaving a window of time where packets can be enqueued onto the queues going down, leading to a likely crash in the drivers, since most drivers don't check if TX skbs are assigned to an active queue. Fixes: e6484930d7c7 ("net: allocate tx queues in register_netdevice") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11ipv6 sit: work around bogus gcc-8 -Wrestrict warningArnd Bergmann
[ Upstream commit ca79bec237f5809a7c3c59bd41cd0880aa889966 ] gcc-8 has a new warning that detects overlapping input and output arguments in memcpy(). It triggers for sit_init_net() calling ipip6_tunnel_clone_6rd(), which is actually correct: net/ipv6/sit.c: In function 'sit_init_net': net/ipv6/sit.c:192:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict] The problem here is that the logic detecting the memcpy() arguments finds them to be the same, but the conditional that tests for the input and output of ipip6_tunnel_clone_6rd() to be identical is not a compile-time constant. We know that netdev_priv(t->dev) is the same as t for a tunnel device, and comparing "dev" directly here lets the compiler figure out as well that 'dev == sitn->fb_tunnel_dev' when called from sit_init_net(), so it no longer warns. This code is old, so Cc stable to make sure that we don't get the warning for older kernels built with new gcc. Cc: Martin Sebor <msebor@gmail.com> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83456 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11fib_semantics: Don't match route with mismatching tclassidStefano Brivio
[ Upstream commit a8c6db1dfd1b1d18359241372bb204054f2c3174 ] In fib_nh_match(), if output interface or gateway are passed in the FIB configuration, we don't have to check next hops of multipath routes to conclude whether we have a match or not. However, we might still have routes with different realms matching the same output interface and gateway configuration, and this needs to cause the match to fail. Otherwise the first route inserted in the FIB will match, regardless of the realms: # ip route add 1.1.1.1 dev eth0 table 1234 realms 1/2 # ip route append 1.1.1.1 dev eth0 table 1234 realms 3/4 # ip route list table 1234 1.1.1.1 dev eth0 scope link realms 1/2 1.1.1.1 dev eth0 scope link realms 3/4 # ip route del 1.1.1.1 dev ens3 table 1234 realms 3/4 # ip route list table 1234 1.1.1.1 dev ens3 scope link realms 3/4 whereas route with realms 3/4 should have been deleted instead. Explicitly check for fc_flow passed in the FIB configuration (this comes from RTA_FLOW extracted by rtm_to_fib_config()) and fail matching if it differs from nh_tclassid. The handling of RTA_FLOW for multipath routes later in fib_nh_match() is still needed, as we can have multiple RTA_FLOW attributes that need to be matched against the tclassid of each next hop. v2: Check that fc_flow is set before discarding the match, so that the user can still select the first matching rule by not specifying any realm, as suggested by David Ahern. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11bridge: check brport attr show in brport_showXin Long
[ Upstream commit 1b12580af1d0677c3c3a19e35bfe5d59b03f737f ] Now br_sysfs_if file flush doesn't have attr show. To read it will cause kernel panic after users chmod u+r this file. Xiong found this issue when running the commands: ip link add br0 type bridge ip link add type veth ip link set veth0 master br0 chmod u+r /sys/devices/virtual/net/veth0/brport/flush timeout 3 cat /sys/devices/virtual/net/veth0/brport/flush kernel crashed with NULL a pointer dereference call trace. This patch is to fix it by return -EINVAL when brport_attr->show is null, just the same as the check for brport_attr->store in brport_store(). Fixes: 9cf637473c85 ("bridge: add sysfs hook to flush forwarding table") Reported-by: Xiong Zhou <xzhou@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11netlink: put module reference if dump start failsJason A. Donenfeld
commit b87b6194be631c94785fe93398651e804ed43e28 upstream. Before, if cb->start() failed, the module reference would never be put, because cb->cb_running is intentionally false at this point. Users are generally annoyed by this because they can no longer unload modules that leak references. Also, it may be possible to tediously wrap a reference counter back to zero, especially since module.c still uses atomic_inc instead of refcount_inc. This patch expands the error path to simply call module_put if cb->start() fails. Fixes: 41c87425a1ac ("netlink: do not set cb_running if dump's start() errs") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03sctp: make use of pre-calculated lenMarcelo Ricardo Leitner
[ Upstream commit c76f97c99ae6d26d14c7f0e50e074382bfbc9f98 ] Some sockopt handling functions were calculating the length of the buffer to be written to userspace and then calculating it again when actually writing the buffer, which could lead to some write not using an up-to-date length. This patch updates such places to just make use of the len variable. Also, replace some sizeof(type) to sizeof(var). Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03mac80211: mesh: drop frames appearing to be from usJohannes Berg
[ Upstream commit 736a80bbfda709fb3631f5f62056f250a38e5804 ] If there are multiple mesh stations with the same MAC address, they will both get confused and start throwing warnings. Obviously in this case nothing can actually work anyway, so just drop frames that look like they're from ourselves early on. Reported-by: Gui Iribarren <gui@altermundi.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03nl80211: Check for the required netlink attribute presenceHao Chen
[ Upstream commit 3ea15452ee85754f70f3b9fa1f23165ef2e77ba7 ] nl80211_nan_add_func() does not check if the required attribute NL80211_NAN_FUNC_FOLLOW_UP_DEST is present when processing NL80211_CMD_ADD_NAN_FUNCTION request. This request can be issued by users with CAP_NET_ADMIN privilege and may result in NULL dereference and a system crash. Add a check for the required attribute presence. Signed-off-by: Hao Chen <flank3rsky@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error pathTommi Rantala
[ Upstream commit 642a8439ddd8423b92f2e71960afe21ee1f66bb6 ] Calling tipc_mon_delete() before the monitor has been created will oops. This can happen in tipc_enable_bearer() error path if tipc_disc_create() fails. [ 48.589074] BUG: unable to handle kernel paging request at 0000000000001008 [ 48.590266] IP: tipc_mon_delete+0xea/0x270 [tipc] [ 48.591223] PGD 1e60c5067 P4D 1e60c5067 PUD 1eb0cf067 PMD 0 [ 48.592230] Oops: 0000 [#1] SMP KASAN [ 48.595610] CPU: 5 PID: 1199 Comm: tipc Tainted: G B 4.15.0-rc4-pc64-dirty #5 [ 48.597176] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 48.598489] RIP: 0010:tipc_mon_delete+0xea/0x270 [tipc] [ 48.599347] RSP: 0018:ffff8801d827f668 EFLAGS: 00010282 [ 48.600705] RAX: ffff8801ee813f00 RBX: 0000000000000204 RCX: 0000000000000000 [ 48.602183] RDX: 1ffffffff1de6a75 RSI: 0000000000000297 RDI: 0000000000000297 [ 48.604373] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff1dd1533 [ 48.605607] R10: ffffffff8eafbb05 R11: fffffbfff1dd1534 R12: 0000000000000050 [ 48.607082] R13: dead000000000200 R14: ffffffff8e73f310 R15: 0000000000001020 [ 48.608228] FS: 00007fc686484800(0000) GS:ffff8801f5540000(0000) knlGS:0000000000000000 [ 48.610189] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.611459] CR2: 0000000000001008 CR3: 00000001dda70002 CR4: 00000000003606e0 [ 48.612759] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 48.613831] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 48.615038] Call Trace: [ 48.615635] tipc_enable_bearer+0x415/0x5e0 [tipc] [ 48.620623] tipc_nl_bearer_enable+0x1ab/0x200 [tipc] [ 48.625118] genl_family_rcv_msg+0x36b/0x570 [ 48.631233] genl_rcv_msg+0x5a/0xa0 [ 48.631867] netlink_rcv_skb+0x1cc/0x220 [ 48.636373] genl_rcv+0x24/0x40 [ 48.637306] netlink_unicast+0x29c/0x350 [ 48.639664] netlink_sendmsg+0x439/0x590 [ 48.642014] SYSC_sendto+0x199/0x250 [ 48.649912] do_syscall_64+0xfd/0x2c0 [ 48.650651] entry_SYSCALL64_slow_path+0x25/0x25 [ 48.651843] RIP: 0033:0x7fc6859848e3 [ 48.652539] RSP: 002b:00007ffd25dff938 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 48.654003] RAX: ffffffffffffffda RBX: 00007ffd25dff990 RCX: 00007fc6859848e3 [ 48.655303] RDX: 0000000000000054 RSI: 00007ffd25dff990 RDI: 0000000000000003 [ 48.656512] RBP: 00007ffd25dff980 R08: 00007fc685c35fc0 R09: 000000000000000c [ 48.657697] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000d13010 [ 48.658840] R13: 00007ffd25e009c0 R14: 0000000000000000 R15: 0000000000000000 [ 48.662972] RIP: tipc_mon_delete+0xea/0x270 [tipc] RSP: ffff8801d827f668 [ 48.664073] CR2: 0000000000001008 [ 48.664576] ---[ end trace e811818d54d5ce88 ]--- Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03tipc: error path leak fixes in tipc_enable_bearer()Tommi Rantala
[ Upstream commit 19142551b2be4a9e13838099fde1351386e5e007 ] Fix memory leak in tipc_enable_bearer() if enable_media() fails, and cleanup with bearer_disable() if tipc_mon_create() fails. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03ip6_tunnel: get the min mtu properly in ip6_tnl_xmitXin Long
[ Upstream commit c9fefa08190fc879fb2e681035d7774e0a8c5170 ] Now it's using IPV6_MIN_MTU as the min mtu in ip6_tnl_xmit, but IPV6_MIN_MTU actually only works when the inner packet is ipv6. With IPV6_MIN_MTU for ipv4 packets, the new pmtu for inner dst couldn't be set less than 1280. It would cause tx_err and the packet to be dropped when the outer dst pmtu is close to 1280. Jianlin found it by running ipv4 traffic with the topo: (client) gre6 <---> eth1 (route) eth2 <---> gre6 (server) After changing eth2 mtu to 1300, the performance became very low, or the connection was even broken. The issue also affects ip4ip6 and ip6ip6 tunnels. So if the inner packet is ipv4, 576 should be considered as the min mtu. Note that for ip4ip6 and ip6ip6 tunnels, the inner packet can only be ipv4 or ipv6, but for gre6 tunnel, it may also be ARP. This patch using 576 as the min mtu for non-ipv6 packet works for all those cases. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03ipv6: icmp6: Allow icmp messages to be looped backBrendan McGrath
[ Upstream commit 588753f1eb18978512b1c9b85fddb457d46f9033 ] One example of when an ICMPv6 packet is required to be looped back is when a host acts as both a Multicast Listener and a Multicast Router. A Multicast Router will listen on address ff02::16 for MLDv2 messages. Currently, MLDv2 messages originating from a Multicast Listener running on the same host as the Multicast Router are not being delivered to the Multicast Router. This is due to dst.input being assigned the default value of dst_discard. This results in the packet being looped back but discarded before being delivered to the Multicast Router. This patch sets dst.input to ip6_input to ensure a looped back packet is delivered to the Multicast Router. Signed-off-by: Brendan McGrath <redmcg@redmandi.dyndns.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-28cfg80211: fix cfg80211_beacon_dupArnd Bergmann
commit bee92d06157fc39d5d7836a061c7d41289a55797 upstream. gcc-8 warns about some obviously incorrect code: net/mac80211/cfg.c: In function 'cfg80211_beacon_dup': net/mac80211/cfg.c:2896:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict] From the context, I conclude that we want to copy from beacon into new_beacon, as we do in the rest of the function. Cc: stable@vger.kernel.org Fixes: 73da7d5bab79 ("mac80211: add channel switch command and beacon callbacks") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-28netfilter: drop outermost socket lock in getsockopt()Paolo Abeni
commit 01ea306f2ac2baff98d472da719193e738759d93 upstream. The Syzbot reported a possible deadlock in the netfilter area caused by rtnl lock, xt lock and socket lock being acquired with a different order on different code paths, leading to the following backtrace: Reviewed-by: Xin Long <lucien.xin@gmail.com> ====================================================== WARNING: possible circular locking dependency detected 4.15.0+ #301 Not tainted ------------------------------------------------------ syzkaller233489/4179 is trying to acquire lock: (rtnl_mutex){+.+.}, at: [<0000000048e996fd>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 but task is already holding lock: (&xt[i].mutex){+.+.}, at: [<00000000328553a2>] xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041 which lock already depends on the new lock. === Since commit 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock only in the required scope"), we already acquire the socket lock in the innermost scope, where needed. In such commit I forgot to remove the outer-most socket lock from the getsockopt() path, this commit addresses the issues dropping it now. v1 -> v2: fix bad subj, added relavant 'fixes' tag Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers") Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev") Fixes: 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock only in the required scope") Reported-by: syzbot+ddde1c7b7ff7442d7f2d@syzkaller.appspotmail.com Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25xfrm: Fix stack-out-of-bounds with misconfigured transport mode policies.Steffen Klassert
[ Upstream commit 732706afe1cc46ef48493b3d2b69c98f36314ae4 ] On policies with a transport mode template, we pass the addresses from the flowi to xfrm_state_find(), assuming that the IP addresses (and address family) don't change during transformation. Unfortunately our policy template validation is not strict enough. It is possible to configure policies with transport mode template where the address family of the template does not match the selectors address family. This lead to stack-out-of-bound reads because we compare arddesses of the wrong family. Fix this by refusing such a configuration, address family can not change on transport mode. We use the assumption that, on transport mode, the first templates address family must match the address family of the policy selector. Subsequent transport mode templates must mach the address family of the previous template. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25net_sched: red: Avoid illegal valuesNogah Frankel
[ Upstream commit 8afa10cbe281b10371fee5a87ab266e48d71a7f9 ] Check the qmin & qmax values doesn't overflow for the given Wlog value. Check that qmin <= qmax. Fixes: a783474591f2 ("[PKT_SCHED]: Generic RED layer") Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25sctp: only update outstanding_bytes for transmitted queue when doing ↵Xin Long
prsctp_prune [ Upstream commit d30fc5126efb0c33b7adf5966d3051db2c3d7721 ] Now outstanding_bytes is only increased when appending chunks into one packet and sending it at 1st time, while decreased when it is about to move into retransmit queue. It means outstanding_bytes value is already decreased for all chunks in retransmit queue. However sctp_prsctp_prune_sent is a common function to check the chunks in both transmitted and retransmit queue, it decrease outstanding_bytes when moving a chunk into abandoned queue from either of them. It could cause outstanding_bytes underflow, as it also decreases it's value for the chunks in retransmit queue. This patch fixes it by only updating outstanding_bytes for transmitted queue when pruning queues for prsctp prio policy, the same fix is also needed in sctp_check_transmitted. Fixes: 8dbdf1f5b09c ("sctp: implement prsctp PRIO policy") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25dn_getsockoptdecnet: move nf_{get/set}sockopt outside sock lockPaolo Abeni
commit dfec091439bb2acf763497cfc58f2bdfc67c56b7 upstream. After commit 3f34cfae1238 ("netfilter: on sockopt() acquire sock lock only in the required scope"), the caller of nf_{get/set}sockopt() must not hold any lock, but, in such changeset, I forgot to cope with DECnet. This commit addresses the issue moving the nf call outside the lock, in the dn_{get,set}sockopt() with the same schema currently used by ipv4 and ipv6. Also moves the unhandled sockopts of the end of the main switch statements, to improve code readability. Reported-by: Petr Vandrovec <petr@vandrovec.name> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=198791#c2 Fixes: 3f34cfae1238 ("netfilter: on sockopt() acquire sock lock only in the required scope") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25net: avoid skb_warn_bad_offload on IS_ERRWillem de Bruijn
commit 8d74e9f88d65af8bb2e095aff506aa6eac755ada upstream. skb_warn_bad_offload warns when packets enter the GSO stack that require skb_checksum_help or vice versa. Do not warn on arbitrary bad packets. Packet sockets can craft many. Syzkaller was able to demonstrate another one with eth_type games. In particular, suppress the warning when segmentation returns an error, which is for reasons other than checksum offload. See also commit 36c92474498a ("net: WARN if skb_checksum_help() is called on skb requiring segmentation") for context on this warning. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25rds: tcp: atomically purge entries from rds_tcp_conn_list during netns deleteSowmini Varadhan
commit f10b4cff98c6977668434fbf5dd58695eeca2897 upstream. The rds_tcp_kill_sock() function parses the rds_tcp_conn_list to find the rds_connection entries marked for deletion as part of the netns deletion under the protection of the rds_tcp_conn_lock. Since the rds_tcp_conn_list tracks rds_tcp_connections (which have a 1:1 mapping with rds_conn_path), multiple tc entries in the rds_tcp_conn_list will map to a single rds_connection, and will be deleted as part of the rds_conn_destroy() operation that is done outside the rds_tcp_conn_lock. The rds_tcp_conn_list traversal done under the protection of rds_tcp_conn_lock should not leave any doomed tc entries in the list after the rds_tcp_conn_lock is released, else another concurrently executiong netns delete (for a differnt netns) thread may trip on these entries. Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25netfilter: xt_RATEEST: acquire xt_rateest_mutex for hash insertCong Wang
commit 7dc68e98757a8eccf8ca7a53a29b896f1eef1f76 upstream. rateest_hash is supposed to be protected by xt_rateest_mutex, and, as suggested by Eric, lookup and insert should be atomic, so we should acquire the xt_rateest_mutex once for both. So introduce a non-locking helper for internal use and keep the locking one for external. Reported-by: <syzbot+5cb189720978275e4c75@syzkaller.appspotmail.com> Fixes: 5859034d7eb8 ("[NETFILTER]: x_tables: add RATEEST target") Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25netfilter: xt_cgroup: initialize info->priv in cgroup_mt_check_v1()Cong Wang
commit ba7cd5d95f25cc6005f687dabdb4e7a6063adda9 upstream. xt_cgroup_info_v1->priv is an internal pointer only used for kernel, we should not trust what user-space provides. Reported-by: <syzbot+4fbcfcc0d2e6592bd641@syzkaller.appspotmail.com> Fixes: c38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match") Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25netfilter: on sockopt() acquire sock lock only in the required scopePaolo Abeni
commit 3f34cfae1238848fd53f25e5c8fd59da57901f4b upstream. Syzbot reported several deadlocks in the netfilter area caused by rtnl lock and socket lock being acquired with a different order on different code paths, leading to backtraces like the following one: ====================================================== WARNING: possible circular locking dependency detected 4.15.0-rc9+ #212 Not tainted ------------------------------------------------------ syzkaller041579/3682 is trying to acquire lock: (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] lock_sock include/net/sock.h:1463 [inline] (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167 but task is already holding lock: (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){+.+.}: __mutex_lock_common kernel/locking/mutex.c:756 [inline] __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607 tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106 xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845 check_target net/ipv6/netfilter/ip6_tables.c:538 [inline] find_check_entry.isra.7+0x935/0xcf0 net/ipv6/netfilter/ip6_tables.c:580 translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749 do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline] do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928 udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 entry_SYSCALL_64_fastpath+0x29/0xa0 -> #0 (sk_lock-AF_INET6){+.+.}: lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914 lock_sock_nested+0xc2/0x110 net/core/sock.c:2780 lock_sock include/net/sock.h:1463 [inline] do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167 ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922 udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 entry_SYSCALL_64_fastpath+0x29/0xa0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(sk_lock-AF_INET6); lock(rtnl_mutex); lock(sk_lock-AF_INET6); *** DEADLOCK *** 1 lock held by syzkaller041579/3682: #0: (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 The problem, as Florian noted, is that nf_setsockopt() is always called with the socket held, even if the lock itself is required only for very tight scopes and only for some operation. This patch addresses the issues moving the lock_sock() call only where really needed, namely in ipv*_getorigdst(), so that nf_setsockopt() does not need anymore to acquire both locks. Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers") Reported-by: syzbot+a4c2dc980ac1af699b36@syzkaller.appspotmail.com Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25netfilter: ipt_CLUSTERIP: fix out-of-bounds accesses in clusterip_tg_check()Dmitry Vyukov
commit 1a38956cce5eabd7b74f94bab70265e4df83165e upstream. Commit 136e92bbec0a switched local_nodes from an array to a bitmask but did not add proper bounds checks. As the result clusterip_config_init_nodelist() can both over-read ipt_clusterip_tgt_info.local_nodes and over-write clusterip_config.local_nodes. Add bounds checks for both. Fixes: 136e92bbec0a ("[NETFILTER] CLUSTERIP: use a bitmap to store node responsibility data") Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25netfilter: x_tables: avoid out-of-bounds reads in xt_request_find_{match|target}Eric Dumazet
commit da17c73b6eb74aad3c3c0654394635675b623b3e upstream. It looks like syzbot found its way into netfilter territory. Issue here is that @name comes from user space and might not be null terminated. Out-of-bound reads happen, KASAN is not happy. v2 added similar fix for xt_request_find_target(), as Florian advised. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25netfilter: x_tables: fix int overflow in xt_alloc_table_info()Dmitry Vyukov
commit 889c604fd0b5f6d3b8694ade229ee44124de1127 upstream. syzkaller triggered OOM kills by passing ipt_replace.size = -1 to IPT_SO_SET_REPLACE. The root cause is that SMP_ALIGN() in xt_alloc_table_info() causes int overflow and the size check passes when it should not. SMP_ALIGN() is no longer needed leftover. Remove SMP_ALIGN() call in xt_alloc_table_info(). Reported-by: syzbot+4396883fa8c4f64e0175@syzkaller.appspotmail.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25sctp: set frag_point in sctp_setsockopt_maxseg correctlyXin Long
commit ecca8f88da5c4260cc2bccfefd2a24976704c366 upstream. Now in sctp_setsockopt_maxseg user_frag or frag_point can be set with val >= 8 and val <= SCTP_MAX_CHUNK_LEN. But both checks are incorrect. val >= 8 means frag_point can even be less than SCTP_DEFAULT_MINSEGMENT. Then in sctp_datamsg_from_user(), when it's value is greater than cookie echo len and trying to bundle with cookie echo chunk, the first_len will overflow. The worse case is when it's value is equal as cookie echo len, first_len becomes 0, it will go into a dead loop for fragment later on. In Hangbin syzkaller testing env, oom was even triggered due to consecutive memory allocation in that loop. Besides, SCTP_MAX_CHUNK_LEN is the max size of the whole chunk, it should deduct the data header for frag_point or user_frag check. This patch does a proper check with SCTP_DEFAULT_MINSEGMENT subtracting the sctphdr and datahdr, SCTP_MAX_CHUNK_LEN subtracting datahdr when setting frag_point via sockopt. It also improves sctp_setsockopt_maxseg codes. Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>