summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-11-15ipv6: correctly add local routes when lo goes upNicolas Dichtel
[ Upstream commit a220445f9f4382c36a53d8ef3e08165fa27f7e2c ] The goal of the patch is to fix this scenario: ip link add dummy1 type dummy ip link set dummy1 up ip link set lo down ; ip link set lo up After that sequence, the local route to the link layer address of dummy1 is not there anymore. When the loopback is set down, all local routes are deleted by addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still exists, because the corresponding idev has a reference on it. After the rcu grace period, dst_rcu_free() is called, and thus ___dst_free(), which will set obsolete to DST_OBSOLETE_DEAD. In this case, init_loopback() is called before dst_rcu_free(), thus obsolete is still sets to something <= 0. So, the function doesn't add the route again. To avoid that race, let's check the rt6 refcnt instead. Fixes: 25fb6ca4ed9c ("net IPv6 : Fix broken IPv6 routing table after loopback down-up") Fixes: a881ae1f625c ("ipv6: don't call addrconf_dst_alloc again when enable lo") Fixes: 33d99113b110 ("ipv6: reallocate addrconf router for ipv6 address when lo device up") Reported-by: Francesco Santoro <francesco.santoro@6wind.com> Reported-by: Samuel Gauthier <samuel.gauthier@6wind.com> CC: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com> CC: Maruthi Thotad <Maruthi.Thotad@ap.sony.com> CC: Sabrina Dubroca <sd@queasysnail.net> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Weilong Chen <chenweilong@huawei.com> CC: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15ip6_tunnel: fix ip6_tnl_lookupVadim Fedorenko
[ Upstream commit 68d00f332e0ba7f60f212be74ede290c9f873bc5 ] The commit ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel endpoints.") introduces support for wildcards in tunnels endpoints, but in some rare circumstances ip6_tnl_lookup selects wrong tunnel interface relying only on source or destination address of the packet and not checking presence of wildcard in tunnels endpoints. Later in ip6_tnl_rcv this packets can be dicarded because of difference in ipproto even if fallback device have proper ipproto configuration. This patch adds checks of wildcard endpoint in tunnel avoiding such behavior Fixes: ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel endpoints.") Signed-off-by: Vadim Fedorenko <junk@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15ipv6: tcp: restore IP6CB for pktoptions skbsEric Dumazet
[ Upstream commit 8ce48623f0cf3d632e32448411feddccb693d351 ] Baozeng Ding reported following KASAN splat : BUG: KASAN: use-after-free in ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 at addr ffff880029c84ec8 Read of size 1 by task poc/25548 Call Trace: [<ffffffff82cf43c9>] dump_stack+0x12e/0x185 /lib/dump_stack.c:15 [< inline >] print_address_description /mm/kasan/report.c:204 [<ffffffff817ced3b>] kasan_report_error+0x48b/0x4b0 /mm/kasan/report.c:283 [< inline >] kasan_report /mm/kasan/report.c:303 [<ffffffff817ced9e>] __asan_report_load1_noabort+0x3e/0x40 /mm/kasan/report.c:321 [<ffffffff85c71da1>] ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 /net/ipv6/datagram.c:687 [<ffffffff85c734c3>] ip6_datagram_recv_ctl+0x33/0x40 [<ffffffff85c0b07c>] do_ipv6_getsockopt.isra.4+0xaec/0x2150 [<ffffffff85c0c7f6>] ipv6_getsockopt+0x116/0x230 [<ffffffff859b5a12>] tcp_getsockopt+0x82/0xd0 /net/ipv4/tcp.c:3035 [<ffffffff855fb385>] sock_common_getsockopt+0x95/0xd0 /net/core/sock.c:2647 [< inline >] SYSC_getsockopt /net/socket.c:1776 [<ffffffff855f8ba2>] SyS_getsockopt+0x142/0x230 /net/socket.c:1758 [<ffffffff8685cdc5>] entry_SYSCALL_64_fastpath+0x23/0xc6 Memory state around the buggy address: ffff880029c84d80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff880029c84e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ffff880029c84e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff880029c84f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff880029c84f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff He also provided a syzkaller reproducer. Issue is that ip6_datagram_recv_specific_ctl() expects to find IP6CB data that was moved at a different place in tcp_v6_rcv() This patch moves tcp_v6_restore_cb() up and calls it from tcp_v6_do_rcv() when np->pktoptions is set. Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15netlink: do not enter direct reclaim from netlink_dump()Eric Dumazet
[ Upstream commit d35c99ff77ecb2eb239731b799386f3b3637a31e ] Since linux-3.15, netlink_dump() can use up to 16384 bytes skb allocations. Due to struct skb_shared_info ~320 bytes overhead, we end up using order-3 (on x86) page allocations, that might trigger direct reclaim and add stress. The intent was really to attempt a large allocation but immediately fallback to a smaller one (order-1 on x86) in case of memory stress. On recent kernels (linux-4.4), we can remove __GFP_DIRECT_RECLAIM to meet the goal. Old kernels would need to remove __GFP_WAIT While we are at it, since we do an order-3 allocation, allow to use all the allocated bytes instead of 16384 to reduce syscalls during large dumps. iproute2 already uses 32KB recvmsg() buffer sizes. Alexei provided an initial patch downsizing to SKB_WITH_OVERHEAD(16384) Fixes: 9063e21fb026 ("netlink: autosize skb lengthes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Alexei Starovoitov <ast@kernel.org> Cc: Greg Thelen <gthelen@google.com> Reviewed-by: Greg Rose <grose@lightfleet.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15packet: call fanout_release, while UNREGISTERING a netdevAnoob Soman
[ Upstream commit 6664498280cf17a59c3e7cf1a931444c02633ed1 ] If a socket has FANOUT sockopt set, a new proto_hook is registered as part of fanout_add(). When processing a NETDEV_UNREGISTER event in af_packet, __fanout_unlink is called for all sockets, but prot_hook which was registered as part of fanout_add is not removed. Call fanout_release, on a NETDEV_UNREGISTER, which removes prot_hook and removes fanout from the fanout_list. This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo() Signed-off-by: Anoob Soman <anoob.soman@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15net: Add netdev all_adj_list refcnt propagation to fix panicAndrew Collins
[ Upstream commit 93409033ae653f1c9a949202fb537ab095b2092f ] This is a respin of a patch to fix a relatively easily reproducible kernel panic related to the all_adj_list handling for netdevs in recent kernels. The following sequence of commands will reproduce the issue: ip link add link eth0 name eth0.100 type vlan id 100 ip link add link eth0 name eth0.200 type vlan id 200 ip link add name testbr type bridge ip link set eth0.100 master testbr ip link set eth0.200 master testbr ip link add link testbr mac0 type macvlan ip link delete dev testbr This creates an upper/lower tree of (excuse the poor ASCII art): /---eth0.100-eth0 mac0-testbr- \---eth0.200-eth0 When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one reference to eth0 is added, so this results in a panic. This change adds reference count propagation so things are handled properly. Matthias Schiffer reported a similar crash in batman-adv: https://github.com/freifunk-gluon/gluon/issues/680 https://www.open-mesh.org/issues/247 which this patch also seems to resolve. Signed-off-by: Andrew Collins <acollins@cradlepoint.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15net/sched: act_vlan: Push skb->data to mac_header prior calling skb_vlan_*() ↵Shmulik Ladkani
functions [ Upstream commit f39acc84aad10710e89835c60d3b6694c43a8dd9 ] Generic skb_vlan_push/skb_vlan_pop functions don't properly handle the case where the input skb data pointer does not point at the mac header: - They're doing push/pop, but fail to properly unwind data back to its original location. For example, in the skb_vlan_push case, any subsequent 'skb_push(skb, skb->mac_len)' calls make the skb->data point 4 bytes BEFORE start of frame, leading to bogus frames that may be transmitted. - They update rcsum per the added/removed 4 bytes tag. Alas if data is originally after the vlan/eth headers, then these bytes were already pulled out of the csum. OTOH calling skb_vlan_push/skb_vlan_pop with skb->data at mac_header present no issues. act_vlan is the only caller to skb_vlan_*() that has skb->data pointing at network header (upon ingress). Other calles (ovs, bpf) already adjust skb->data at mac_header. This patch fixes act_vlan to point to the mac_header prior calling skb_vlan_*() functions, as other callers do. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Pravin Shelar <pshelar@ovn.org> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15net: pktgen: fix pkt_sizePaolo Abeni
[ Upstream commit 63d75463c91a5b5be7c0aca11ceb45ea5a0ae81d ] The commit 879c7220e828 ("net: pktgen: Observe needed_headroom of the device") increased the 'pkt_overhead' field value by LL_RESERVED_SPACE. As a side effect the generated packet size, computed as: /* Eth + IPh + UDPh + mpls */ datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8 - pkt_dev->pkt_overhead; is decreased by the same value. The above changed slightly the behavior of existing pktgen users, and made the procfs interface somewhat inconsistent. Fix it by restoring the previous pkt_overhead value and using LL_RESERVED_SPACE as extralen in skb allocation. Also, change pktgen_alloc_skb() to only partially reserve the headroom to allow the caller to prefetch from ll header start. v1 -> v2: - fixed some typos in the comments Fixes: 879c7220e828 ("net: pktgen: Observe needed_headroom of the device") Suggested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_routeNikolay Aleksandrov
[ Upstream commit 2cf750704bb6d7ed8c7d732e071dd1bc890ea5e8 ] Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid instead of the previous dst_pid which was copied from in_skb's portid. Since the skb is new the portid is 0 at that point so the packets are sent to the kernel and we get scheduling while atomic or a deadlock (depending on where it happens) by trying to acquire rtnl two times. Also since this is RTM_GETROUTE, it can be triggered by a normal user. Here's the sleeping while atomic trace: [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620 [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0 [ 7858.212881] 2 locks held by swapper/0/0: [ 7858.213013] #0: (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350 [ 7858.213422] #1: (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130 [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179 [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 7858.214108] 0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000 [ 7858.214412] ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e [ 7858.214716] 000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f [ 7858.215251] Call Trace: [ 7858.215412] <IRQ> [<ffffffff813a7804>] dump_stack+0x85/0xc1 [ 7858.215662] [<ffffffff810a4a72>] ___might_sleep+0x192/0x250 [ 7858.215868] [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100 [ 7858.216072] [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0 [ 7858.216279] [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460 [ 7858.216487] [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40 [ 7858.216687] [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260 [ 7858.216900] [<ffffffff81573c70>] rtnl_unicast+0x20/0x30 [ 7858.217128] [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0 [ 7858.217351] [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130 [ 7858.217581] [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180 [ 7858.217785] [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180 [ 7858.217990] [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350 [ 7858.218192] [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350 [ 7858.218415] [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180 [ 7858.218656] [<ffffffff810fde10>] run_timer_softirq+0x260/0x640 [ 7858.218865] [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f [ 7858.219068] [<ffffffff816637c8>] __do_softirq+0xe8/0x54f [ 7858.219269] [<ffffffff8107a948>] irq_exit+0xb8/0xc0 [ 7858.219463] [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50 [ 7858.219678] [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0 [ 7858.219897] <EOI> [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10 [ 7858.220165] [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10 [ 7858.220373] [<ffffffff810298e3>] default_idle+0x23/0x190 [ 7858.220574] [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20 [ 7858.220790] [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60 [ 7858.221016] [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0 [ 7858.221257] [<ffffffff8164f995>] rest_init+0x135/0x140 [ 7858.221469] [<ffffffff81f83014>] start_kernel+0x50e/0x51b [ 7858.221670] [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120 [ 7858.221894] [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c [ 7858.222113] [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a Fixes: 2942e9005056 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()Lance Richardson
[ Upstream commit db32e4e49ce2b0e5fcc17803d011a401c0a637f6 ] Similar to commit 3be07244b733 ("ip6_gre: fix flowi6_proto value in xmit path"), set flowi6_proto to IPPROTO_GRE for output route lookup. Up until now, ip6gre_xmit_other() has set flowi6_proto to a bogus value. This affected output route lookup for packets sent on an ip6gretap device in cases where routing was dependent on the value of flowi6_proto. Since the correct proto is already set in the tunnel flowi6 template via commit 252f3f5a1189 ("ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit path."), simply delete the line setting the incorrect flowi6_proto value. Suggested-by: Jiri Benc <jbenc@redhat.com> Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15tcp: fix a compile error in DBGUNDO()Eric Dumazet
[ Upstream commit 019b1c9fe32a2a32c1153e31375f87ec3e591273 ] If DBGUNDO() is enabled (FASTRETRANS_DEBUG > 1), a compile error will happen, since inet6_sk(sk)->daddr became sk->sk_v6_daddr Fixes: efe4208f47f9 ("ipv6: make lookups simpler and faster") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15tcp: fix wrong checksum calculation on MTU probingDouglas Caetano dos Santos
[ Upstream commit 2fe664f1fcf7c4da6891f95708a7a56d3c024354 ] With TCP MTU probing enabled and offload TX checksumming disabled, tcp_mtu_probe() calculated the wrong checksum when a fragment being copied into the probe's SKB had an odd length. This was caused by the direct use of skb_copy_and_csum_bits() to calculate the checksum, as it pads the fragment being copied, if needed. When this fragment was not the last, a subsequent call used the previous checksum without considering this padding. The effect was a stale connection in one way, as even retransmissions wouldn't solve the problem, because the checksum was never recalculated for the full SKB length. Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-15tcp: fix overflow in __tcp_retransmit_skb()Eric Dumazet
[ Upstream commit ffb4d6c8508657824bcef68a36b2a0f9d8c09d10 ] If a TCP socket gets a large write queue, an overflow can happen in a test in __tcp_retransmit_skb() preventing all retransmits. The flow then stalls and resets after timeouts. Tested: sysctl -w net.core.wmem_max=1000000000 netperf -H dest -- -s 1000000000 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-10mac80211: discard multicast and 4-addr A-MSDUsJohannes Berg
commit ea720935cf6686f72def9d322298bf7e9bd53377 upstream. In mac80211, multicast A-MSDUs are accepted in many cases that they shouldn't be accepted in: * drop A-MSDUs with a multicast A1 (RA), as required by the spec in 9.11 (802.11-2012 version) * drop A-MSDUs with a 4-addr header, since the fourth address can't actually be useful for them; unless 4-address frame format is actually requested, even though the fourth address is still not useful in this case, but ignored Accepting the first case, in particular, is very problematic since it allows anyone else with possession of a GTK to send unicast frames encapsulated in a multicast A-MSDU, even when the AP has client isolation enabled. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-31tunnels: Remove encapsulation offloads on decap.Jesse Gross
commit a09a4c8dd1ec7f830e1fb9e59eb72bddc965d168 upstream. If a packet is either locally encapsulated or processed through GRO it is marked with the offloads that it requires. However, when it is decapsulated these tunnel offload indications are not removed. This means that if we receive an encapsulated TCP packet, aggregate it with GRO, decapsulate, and retransmit the resulting frame on a NIC that does not support encapsulation, we won't be able to take advantage of hardware offloads even though it is just a simple TCP packet at this point. This fixes the problem by stripping off encapsulation offload indications when packets are decapsulated. The performance impacts of this bug are significant. In a test where a Geneve encapsulated TCP stream is sent to a hypervisor, GRO'ed, decapsulated, and bridged to a VM performance is improved by 60% (5Gbps->8Gbps) as a result of avoiding unnecessary segmentation at the VM tap interface. Reported-by: Ramu Ramamurthy <sramamur@linux.vnet.ibm.com> Fixes: 68c33163 ("v4 GRE: Add TCP segmentation offload for GRE") Signed-off-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> (backported from commit a09a4c8dd1ec7f830e1fb9e59eb72bddc965d168) [adapt iptunnel_pull_header arguments, avoid 7f290c9] Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-31tunnels: Don't apply GRO to multiple layers of encapsulation.Jesse Gross
commit fac8e0f579695a3ecbc4d3cac369139d7f819971 upstream. When drivers express support for TSO of encapsulated packets, they only mean that they can do it for one layer of encapsulation. Supporting additional levels would mean updating, at a minimum, more IP length fields and they are unaware of this. No encapsulation device expresses support for handling offloaded encapsulated packets, so we won't generate these types of frames in the transmit path. However, GRO doesn't have a check for multiple levels of encapsulation and will attempt to build them. UDP tunnel GRO actually does prevent this situation but it only handles multiple UDP tunnels stacked on top of each other. This generalizes that solution to prevent any kind of tunnel stacking that would cause problems. Fixes: bf5a755f ("net-gre-gro: Add GRE support to the GRO stack") Signed-off-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-31ipip: Properly mark ipip GRO packets as encapsulated.Jesse Gross
commit b8cba75bdf6a48ea4811bbefb11a94a5c7281b68 upstream. ipip encapsulated packets can be merged together by GRO but the result does not have the proper GSO type set or even marked as being encapsulated at all. Later retransmission of these packets will likely fail if the device does not support ipip offloads. This is similar to the issue resolved in IPv6 sit in feec0cb3 ("ipv6: gro: support sit protocol"). Reported-by: Patrick Boutilier <boutilpj@ednet.ns.ca> Fixes: 9667e9bb ("ipip: Add gro callbacks to ipip offload") Tested-by: Patrick Boutilier <boutilpj@ednet.ns.ca> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jesse Gross <jesse@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28sunrpc: fix write space race causing stallsDavid Vrabel
commit d48f9ce73c997573e1b512893fa6eddf353a6f69 upstream. Write space becoming available may race with putting the task to sleep in xprt_wait_for_buffer_space(). The existing mechanism to avoid the race does not work. This (edited) partial trace illustrates the problem: [1] rpc_task_run_action: task:43546@5 ... action=call_transmit [2] xs_write_space <-xs_tcp_write_space [3] xprt_write_space <-xs_write_space [4] rpc_task_sleep: task:43546@5 ... [5] xs_write_space <-xs_tcp_write_space [1] Task 43546 runs but is out of write space. [2] Space becomes available, xs_write_space() clears the SOCKWQ_ASYNC_NOSPACE bit. [3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but this has not yet been queued and the wake up is lost. [4] xs_nospace() is called which calls xprt_wait_for_buffer_space() which queues task 43546. [5] The call to sk->sk_write_space() at the end of xs_nospace() (which is supposed to handle the above race) does not call xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and thus the task is not woken. Fix the race by resetting the SOCKWQ_ASYNC_NOSPACE bit in xs_nospace() so the second call to sk->sk_write_space() calls xprt_write_space(). Suggested-by: Trond Myklebust <trondmy@primarydata.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07batman-adv: remove unused callback from batadv_algo_ops structMarek Lindner
commit d9f179877e50ae2681fe7b0b83e0d9f63b6165ad upstream. Reported-by: Lars Bußmann <ffsoest@kill-you.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> [sven@narfation.org: rewritten commit message to make clear that it is an bugfix to an user reported crash] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07NFS: Don't drop CB requests with invalid principalsChuck Lever
commit a4e187d83d88eeaba6252aac0a2ffe5eaa73a818 upstream. Before commit 778be232a207 ("NFS do not find client in NFSv4 pg_authenticate"), the Linux callback server replied with RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB request. Let's restore that behavior so the server has a chance to do something useful about it, and provide a warning that helps admins correct the problem. Fixes: 778be232a207 ("NFS do not find client in NFSv4 ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07svc: Avoid garbage replies when pc_func() returns rpc_drop_replyChuck Lever
commit 0533b13072f4bf35738290d2cf9e299c7bc6c42a upstream. If an RPC program does not set vs_dispatch and pc_func() returns rpc_drop_reply, the server sends a reply anyway containing a single word containing the value RPC_DROP_REPLY (in network byte-order, of course). This is a nonsense RPC message. Fixes: 9e701c610923 ("svcrpc: simpler request dropping") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07ipvs: fix bind to link-local mcast IPv6 address in backupQuentin Armitage
commit 3777ed688fba82d0bd43f9fc1ebbc6abe788576d upstream. When using HEAD from https://git.kernel.org/cgit/utils/kernel/ipvsadm/ipvsadm.git/, the command: ipvsadm --start-daemon backup --mcast-interface eth0.60 \ --mcast-group ff02::1:81 fails with the error message: Argument list too long whereas both: ipvsadm --start-daemon master --mcast-interface eth0.60 \ --mcast-group ff02::1:81 and: ipvsadm --start-daemon backup --mcast-interface eth0.60 \ --mcast-group 224.0.0.81 are successful. The error message "Argument list too long" isn't helpful. The error occurs because an IPv6 address is given in backup mode. The error is in make_receive_sock() in net/netfilter/ipvs/ip_vs_sync.c, since it fails to set the interface on the address or the socket before calling inet6_bind() (via sock->ops->bind), where the test 'if (!sk->sk_bound_dev_if)' failed. Setting sock->sk->sk_bound_dev_if on the socket before calling inet6_bind() resolves the issue. Fixes: d33288172e72 ("ipvs: add more mcast parameters for the sync daemon") Signed-off-by: Quentin Armitage <quentin@armitage.org.uk> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30nl80211: validate number of probe response CSA countersJohannes Berg
commit ad5987b47e96a0fb6d13fea250e936aed000093c upstream. Due to an apparent copy/paste bug, the number of counters for the beacon configuration were checked twice, instead of checking the number of probe response counters. Fix this to check the number of probe response counters before parsing those. Fixes: 9a774c78e211 ("cfg80211: Support multiple CSA counters") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30bridge: re-introduce 'fix parsing of MLDv2 reports'Davide Caratti
[ Upstream commit 9264251ee2a55bce8fb93826b3f581fb9eb7e2c2 ] commit bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave") seems to have accidentally reverted commit 47cc84ce0c2f ("bridge: fix parsing of MLDv2 reports"). This commit brings back a change to br_ip6_multicast_mld2_report() where parsing of MLDv2 reports stops when the first group is successfully added to the MDB cache. Fixes: bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30tipc: fix NULL pointer dereference in shutdown()Vegard Nossum
[ Upstream commit d2fbdf76b85bcdfe57b8ef2ba09d20e8ada79abd ] tipc_msg_create() can return a NULL skb and if so, we shouldn't try to call tipc_node_xmit_skb() on it. general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 3 PID: 30298 Comm: trinity-c0 Not tainted 4.7.0-rc7+ #19 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff8800baf09980 ti: ffff8800595b8000 task.ti: ffff8800595b8000 RIP: 0010:[<ffffffff830bb46b>] [<ffffffff830bb46b>] tipc_node_xmit_skb+0x6b/0x140 RSP: 0018:ffff8800595bfce8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003023b0e0 RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff83d12580 RBP: ffff8800595bfd78 R08: ffffed000b2b7f32 R09: 0000000000000000 R10: fffffbfff0759725 R11: 0000000000000000 R12: 1ffff1000b2b7f9f R13: ffff8800595bfd58 R14: ffffffff83d12580 R15: dffffc0000000000 FS: 00007fcdde242700(0000) GS:ffff88011af80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fcddde1db10 CR3: 000000006874b000 CR4: 00000000000006e0 DR0: 00007fcdde248000 DR1: 00007fcddd73d000 DR2: 00007fcdde248000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602 Stack: 0000000000000018 0000000000000018 0000000041b58ab3 ffffffff83954208 ffffffff830bb400 ffff8800595bfd30 ffffffff8309d767 0000000000000018 0000000000000018 ffff8800595bfd78 ffffffff8309da1a 00000000810ee611 Call Trace: [<ffffffff830c84a3>] tipc_shutdown+0x553/0x880 [<ffffffff825b4a3b>] SyS_shutdown+0x14b/0x170 [<ffffffff8100334c>] do_syscall_64+0x19c/0x410 [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25 Code: 90 00 b4 0b 83 c7 00 f1 f1 f1 f1 4c 8d 6d e0 c7 40 04 00 00 00 f4 c7 40 08 f3 f3 f3 f3 48 89 d8 48 c1 e8 03 c7 45 b4 00 00 00 00 <80> 3c 30 00 75 78 48 8d 7b 08 49 8d 75 c0 48 b8 00 00 00 00 00 RIP [<ffffffff830bb46b>] tipc_node_xmit_skb+0x6b/0x140 RSP <ffff8800595bfce8> ---[ end trace 57b0484e351e71f1 ]--- I feel like we should maybe return -ENOMEM or -ENOBUFS, but I'm not sure userspace is equipped to handle that. Anyway, this is better than a GPF and looks somewhat consistent with other tipc_msg_create() callers. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30net/irda: handle iriap_register_lsap() allocation failureVegard Nossum
[ Upstream commit 5ba092efc7ddff040777ae7162f1d195f513571b ] If iriap_register_lsap() fails to allocate memory, self->lsap is set to NULL. However, none of the callers handle the failure and irlmp_connect_request() will happily dereference it: iriap_register_lsap: Unable to allocated LSAP! ================================================================================ UBSAN: Undefined behaviour in net/irda/irlmp.c:378:2 member access within null pointer of type 'struct lsap_cb' CPU: 1 PID: 15403 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #81 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 0000000000000000 ffff88010c7e78a8 ffffffff82344f40 0000000041b58ab3 ffffffff84f98000 ffffffff82344e94 ffff88010c7e78d0 ffff88010c7e7880 ffff88010630ad00 ffffffff84a5fae0 ffffffff84d3f5c0 000000000000017a Call Trace: [<ffffffff82344f40>] dump_stack+0xac/0xfc [<ffffffff8242f5a8>] ubsan_epilogue+0xd/0x8a [<ffffffff824302bf>] __ubsan_handle_type_mismatch+0x157/0x411 [<ffffffff83b7bdbc>] irlmp_connect_request+0x7ac/0x970 [<ffffffff83b77cc0>] iriap_connect_request+0xa0/0x160 [<ffffffff83b77f48>] state_s_disconnect+0x88/0xd0 [<ffffffff83b78904>] iriap_do_client_event+0x94/0x120 [<ffffffff83b77710>] iriap_getvaluebyclass_request+0x3e0/0x6d0 [<ffffffff83ba6ebb>] irda_find_lsap_sel+0x1eb/0x630 [<ffffffff83ba90c8>] irda_connect+0x828/0x12d0 [<ffffffff833c0dfb>] SYSC_connect+0x22b/0x340 [<ffffffff833c7e09>] SyS_connect+0x9/0x10 [<ffffffff81007bd3>] do_syscall_64+0x1b3/0x4b0 [<ffffffff845f946a>] entry_SYSCALL64_slow_path+0x25/0x25 ================================================================================ The bug seems to have been around since forever. There's more problems with missing error checks in iriap_init() (and indeed all of irda_init()), but that's a bigger problem that needs very careful review and testing. This patch will fix the most serious bug (as it's easily reached from unprivileged userspace). I have tested my patch with a reproducer. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30vti: flush x-netns xfrm cache when vti interface is removedLance Richardson
[ Upstream commit a5d0dc810abf3d6b241777467ee1d6efb02575fc ] When executing the script included below, the netns delete operation hangs with the following message (repeated at 10 second intervals): kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1 This occurs because a reference to the lo interface in the "secure" netns is still held by a dst entry in the xfrm bundle cache in the init netns. Address this problem by garbage collecting the tunnel netns flow cache when a cross-namespace vti interface receives a NETDEV_DOWN notification. A more detailed description of the problem scenario (referencing commands in the script below): (1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1 The vti_test interface is created in the init namespace. vti_tunnel_init() attaches a struct ip_tunnel to the vti interface's netdev_priv(dev), setting the tunnel net to &init_net. (2) ip link set vti_test netns secure The vti_test interface is moved to the "secure" netns. Note that the associated struct ip_tunnel still has tunnel->net set to &init_net. (3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1 The first packet sent using the vti device causes xfrm_lookup() to be called as follows: dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0); Note that tunnel->net is the init namespace, while skb_dst(skb) references the vti_test interface in the "secure" namespace. The returned dst references an interface in the init namespace. Also note that the first parameter to xfrm_lookup() determines which flow cache is used to store the computed xfrm bundle, so after xfrm_lookup() returns there will be a cached bundle in the init namespace flow cache with a dst referencing a device in the "secure" namespace. (4) ip netns del secure Kernel begins to delete the "secure" namespace. At some point the vti_test interface is deleted, at which point dst_ifdown() changes the dst->dev in the cached xfrm bundle flow from vti_test to lo (still in the "secure" namespace however). Since nothing has happened to cause the init namespace's flow cache to be garbage collected, this dst remains attached to the flow cache, so the kernel loops waiting for the last reference to lo to go away. <Begin script> ip link add br1 type bridge ip link set dev br1 up ip addr add dev br1 1.1.1.1/8 ip netns add secure ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1 ip link set vti_test netns secure ip netns exec secure ip link set vti_test up ip netns exec secure ip link s lo up ip netns exec secure ip addr add dev lo 192.168.100.1/24 ip netns exec secure ip route add 192.168.200.0/24 dev vti_test ip xfrm policy flush ip xfrm state flush ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \ proto esp mode tunnel mark 1 ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \ proto esp mode tunnel mark 1 ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \ mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788 ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \ mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788 ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1 ip netns del secure <End script> Reported-by: Hangbin Liu <haliu@redhat.com> Reported-by: Jan Tluka <jtluka@redhat.com> Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'Linus Torvalds
commit 6e1ce3c3451291142a57c4f3f6f999a29fb5b3bc upstream. Right now we use the 'readlock' both for protecting some of the af_unix IO path and for making the bind be single-threaded. The two are independent, but using the same lock makes for a nasty deadlock due to ordering with regards to filesystem locking. The bind locking would want to nest outside the VSF pathname locking, but the IO locking wants to nest inside some of those same locks. We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix splice-bind deadlock") which moved the readlock inside the vfs locks, but that caused problems with overlayfs that will then call back into filesystem routines that take the lock in the wrong order anyway. Splitting the locks means that we can go back to having the bind lock be the outermost lock, and we don't have any deadlocks with lock ordering. Acked-by: Rainer Weikusat <rweikusat@cyberadapt.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30Revert "af_unix: Fix splice-bind deadlock"Linus Torvalds
commit 38f7bd94a97b542de86a2be9229289717e33a7a4 upstream. This reverts commit c845acb324aa85a39650a14e7696982ceea75dc1. It turns out that it just replaces one deadlock with another one: we can still get the wrong lock ordering with the readlock due to overlayfs calling back into the filesystem layer and still taking the vfs locks after the readlock. The proper solution ends up being to just split the readlock into two pieces: the bind lock (taken *outside* the vfs locks) and the IO lock (taken *inside* the filesystem locks). The two locks are independent anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30bonding: Fix bonding crashMahesh Bandewar
[ Upstream commit 24b27fc4cdf9e10c5e79e5923b6b7c2c5c95096c ] Following few steps will crash kernel - (a) Create bonding master > modprobe bonding miimon=50 (b) Create macvlan bridge on eth2 > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \ type macvlan (c) Now try adding eth2 into the bond > echo +eth2 > /sys/class/net/bond0/bonding/slaves <crash> Bonding does lots of things before checking if the device enslaved is busy or not. In this case when the notifier call-chain sends notifications, the bond_netdev_event() assumes that the rx_handler /rx_handler_data is registered while the bond_enslave() hasn't progressed far enough to register rx_handler for the new slave. This patch adds a rx_handler check that can be performed right at the beginning of the enslave code to avoid getting into this situation. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30net: caif: fix misleading indentationArnd Bergmann
commit 8e0cc8c326d99e41468c96fea9785ab78883a281 upstream. gcc points out code that is not indented the way it is interpreted: net/caif/cfpkt_skbuff.c: In function 'cfpkt_setlen': net/caif/cfpkt_skbuff.c:289:4: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation] return cfpkt_getlen(pkt); ^~~~~~ net/caif/cfpkt_skbuff.c:286:3: note: ...this 'else' clause, but it is not else ^~~~ It is clear from the context that not returning here would be a bug, as we'd end up passing a negative length into a function that takes a u16 length, so it is not missing curly braces here, and I'm assuming that the indentation is the only part that's wrong about it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30tcp: properly scale window in tcp_v[46]_reqsk_send_ack()Eric Dumazet
[ Upstream commit 20a2b49fc538540819a0c552877086548cff8d8d ] When sending an ack in SYN_RECV state, we must scale the offered window if wscale option was negotiated and accepted. Tested: Following packetdrill test demonstrates the issue : 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // Establish a connection. +0 < S 0:0(0) win 20000 <mss 1000,sackOK,wscale 7, nop, TS val 100 ecr 0> +0 > S. 0:0(0) ack 1 win 28960 <mss 1460,sackOK, TS val 100 ecr 100, nop, wscale 7> +0 < . 1:11(10) ack 1 win 156 <nop,nop,TS val 99 ecr 100> // check that window is properly scaled ! +0 > . 1:1(0) ack 1 win 226 <nop,nop,TS val 200 ecr 100> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
2016-09-30tcp: cwnd does not increase in TCP YeAHArtem Germanov
[ Upstream commit db7196a0d0984b933ccf2cd6a60e26abf466e8a3 ] Commit 76174004a0f19785a328f40388e87e982bbf69b9 (tcp: do not slow start when cwnd equals ssthresh ) introduced regression in TCP YeAH. Using 100ms delay 1% loss virtual ethernet link kernel 4.2 shows bandwidth ~500KB/s for single TCP connection and kernel 4.3 and above (including 4.8-rc4) shows bandwidth ~100KB/s. That is caused by stalled cwnd when cwnd equals ssthresh. This patch fixes it by proper increasing cwnd in this case. Signed-off-by: Artem Germanov <agermanov@anchorfree.com> Acked-by: Dmitry Adamushko <d.adamushko@anchorfree.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Holger Hoffstätte <holger@applied-asynchrony.com>
2016-09-30ipv6: release dst in ping_v6_sendmsgDave Jones
[ Upstream commit 03c2778a938aaba0893f6d6cdc29511d91a79848 ] Neither the failure or success paths of ping_v6_sendmsg release the dst it acquires. This leads to a flood of warnings from "net/core/dst.c:288 dst_release" on older kernels that don't have 8bf4ada2e21378816b28205427ee6b0e1ca4c5f1 backported. That patch optimistically hoped this had been fixed post 3.10, but it seems at least one case wasn't, where I've seen this triggered a lot from machines doing unprivileged icmp sockets. Cc: Martin Lau <kafai@fb.com> Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
2016-09-30ipv4: panic in leaf_walk_rcu due to stale node pointerDavid Forster
[ Upstream commit 94d9f1c5906b20053efe375b6d66610bca4b8b64 ] Panic occurs when issuing "cat /proc/net/route" whilst populating FIB with > 1M routes. Use of cached node pointer in fib_route_get_idx is unsafe. BUG: unable to handle kernel paging request at ffffc90001630024 IP: [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0 PGD 11b08d067 PUD 11b08e067 PMD dac4b067 PTE 0 Oops: 0000 [#1] SMP Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscac snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep virti acpi_cpufreq button parport_pc ppdev lp parport autofs4 ext4 crc16 mbcache jbd tio_ring virtio floppy uhci_hcd ehci_hcd usbcore usb_common libata scsi_mod CPU: 1 PID: 785 Comm: cat Not tainted 4.2.0-rc8+ #4 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff8800da1c0bc0 ti: ffff88011a05c000 task.ti: ffff88011a05c000 RIP: 0010:[<ffffffff814cf6a0>] [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0 RSP: 0018:ffff88011a05fda0 EFLAGS: 00010202 RAX: ffff8800d8a40c00 RBX: ffff8800da4af940 RCX: ffff88011a05ff20 RDX: ffffc90001630020 RSI: 0000000001013531 RDI: ffff8800da4af950 RBP: 0000000000000000 R08: ffff8800da1f9a00 R09: 0000000000000000 R10: ffff8800db45b7e4 R11: 0000000000000246 R12: ffff8800da4af950 R13: ffff8800d97a74c0 R14: 0000000000000000 R15: ffff8800d97a7480 FS: 00007fd3970e0700(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffc90001630024 CR3: 000000011a7e4000 CR4: 00000000000006e0 Stack: ffffffff814d00d3 0000000000000000 ffff88011a05ff20 ffff8800da1f9a00 ffffffff811dd8b9 0000000000000800 0000000000020000 00007fd396f35000 ffffffff811f8714 0000000000003431 ffffffff8138dce0 0000000000000f80 Call Trace: [<ffffffff814d00d3>] ? fib_route_seq_start+0x93/0xc0 [<ffffffff811dd8b9>] ? seq_read+0x149/0x380 [<ffffffff811f8714>] ? fsnotify+0x3b4/0x500 [<ffffffff8138dce0>] ? process_echoes+0x70/0x70 [<ffffffff8121cfa7>] ? proc_reg_read+0x47/0x70 [<ffffffff811bb823>] ? __vfs_read+0x23/0xd0 [<ffffffff811bbd42>] ? rw_verify_area+0x52/0xf0 [<ffffffff811bbe61>] ? vfs_read+0x81/0x120 [<ffffffff811bcbc2>] ? SyS_read+0x42/0xa0 [<ffffffff81549ab2>] ? entry_SYSCALL_64_fastpath+0x16/0x75 Code: 48 85 c0 75 d8 f3 c3 31 c0 c3 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 a 04 89 f0 33 02 44 89 c9 48 d3 e8 0f b6 4a 05 49 89 RIP [<ffffffff814cf6a0>] leaf_walk_rcu+0x10/0xe0 RSP <ffff88011a05fda0> CR2: ffffc90001630024 Signed-off-by: Dave Forster <dforster@brocade.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
2016-09-24ipv6: addrconf: fix dev refcont leak when DAD failedWei Yongjun
commit 751eb6b6042a596b0080967c1a529a9fe98dac1d upstream. In general, when DAD detected IPv6 duplicate address, ifp->state will be set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a delayed work, the call tree should be like this: ndisc_recv_ns -> addrconf_dad_failure <- missing ifp put -> addrconf_mod_dad_work -> schedule addrconf_dad_work() -> addrconf_dad_stop() <- missing ifp hold before call it addrconf_dad_failure() called with ifp refcont holding but not put. addrconf_dad_work() call addrconf_dad_stop() without extra holding refcount. This will not cause any issue normally. But the race between addrconf_dad_failure() and addrconf_dad_work() may cause ifp refcount leak and netdevice can not be unregister, dmesg show the following messages: IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected! ... unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Fixes: c15b1ccadb32 ("ipv6: move DAD and addrconf_verify processing to workqueue") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24Revert "wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel"Johannes Berg
commit 4d0bd46a4d55383f7b925e6cf7865a77e0f0e020 upstream. This reverts commit 3d5fdff46c4b2b9534fa2f9fc78e90a48e0ff724. Ben Hutchings pointed out that the commit isn't safe since it assumes that the structure used by the driver is iw_point, when in fact there's no way to know about that. Fortunately, the only driver in the tree that ever runs this code path is the wilc1000 staging driver, so it doesn't really matter. Clearly I should have investigated this better before applying, sorry. Reported-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 3d5fdff46c4b ("wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-24tipc: move linearization of buffers to generic codeJon Paul Maloy
commit c7cad0d6f70cd4ce8644ffe528a4df1cdc2e77f5 upstream. In commit 5cbb28a4bf65c7e4 ("tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function tipc_udp_recv(). The location of the change was selected in order to make the commit easily appliable to 'net' and 'stable'. We now move this linearization to where it should be done, in the functions tipc_named_rcv() and tipc_link_proto_rcv() respectively. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15rds: fix an infoleak in rds_inc_info_copyKangjie Lu
commit 4116def2337991b39919f3b448326e21c40e0dbb upstream. The last field "flags" of object "minfo" is not initialized. Copying this object out may leak kernel stack data. Assign 0 to it to avoid leak. Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Juerg Haefliger <juerg.haefliger@hpe.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15net: Use ns_capable_noaudit() when determining net sysctl permissionsTyler Hicks
commit d6e0d306449bcb5fa3c80e7a3edf11d45abf9ae9 upstream. The capability check should not be audited since it is only being used to determine the inode permissions. A failed check does not indicate a violation of security policy but, when an LSM is enabled, a denial audit message was being generated. The denial audit message caused confusion for some application authors because root-running Go applications always triggered the denial. To prevent this confusion, the capability check in net_ctl_permissions() is switched to the noaudit variant. BugLink: https://launchpad.net/bugs/1465724 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15netfilter: x_tables: check for size overflowFlorian Westphal
[ Upstream commit d157bd761585605b7882935ffb86286919f62ea1 ] Ben Hawkes says: integer overflow in xt_alloc_table_info, which on 32-bit systems can lead to small structure allocation and a copy_from_user based heap corruption. Reported-by: Ben Hawkes <hawkes@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15udp: properly support MSG_PEEK with truncated buffersEric Dumazet
[ Upstream commit 197c949e7798fbf28cfadc69d9ca0c2abbf93191 ] Backport of this upstream commit into stable kernels : 89c22d8c3b27 ("net: Fix skb csum races when peeking") exposed a bug in udp stack vs MSG_PEEK support, when user provides a buffer smaller than skb payload. In this case, skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov); returns -EFAULT. This bug does not happen in upstream kernels since Al Viro did a great job to replace this into : skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg); This variant is safe vs short buffers. For the time being, instead reverting Herbert Xu patch and add back skb->ip_summed invalid changes, simply store the result of udp_lib_checksum_complete() so that we avoid computing the checksum a second time, and avoid the problematic skb_copy_and_csum_datagram_iovec() call. This patch can be applied on recent kernels as it avoids a double checksumming, then backported to stable kernels as a bug fix. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15tipc: fix nl compat regression for link statisticsRichard Alpe
[ Upstream commit 55e77a3e8297581c919b45adcc4d0815b69afa84 ] Fix incorrect use of nla_strlcpy() where the first NLA_HDRLEN bytes of the link name where left out. Making the output of tipc-config -ls look something like: Link statistics: dcast-link 1:data0-1.1.2:data0 1:data0-1.1.3:data0 Also, for the record, the patch that introduce this regression claims "Sending the whole object out can cause a leak". Which isn't very likely as this is a compat layer, where the data we are parsing is generated by us and we know the string to be NULL terminated. But you can of course never be to secure. Fixes: 5d2be1422e02 (tipc: fix an infoleak in tipc_nl_compat_link_dump) Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15tipc: fix an infoleak in tipc_nl_compat_link_dumpKangjie Lu
[ Upstream commit 5d2be1422e02ccd697ccfcd45c85b4a26e6178e2 ] link_info.str is a char array of size 60. Memory after the NULL byte is not initialized. Sending the whole object out can cause a leak. Signed-off-by: Kangjie Lu <kjlu@gatech.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15netfilter: x_tables: check for size overflowFlorian Westphal
[ Upstream commit d157bd761585605b7882935ffb86286919f62ea1 ] Ben Hawkes says: integer overflow in xt_alloc_table_info, which on 32-bit systems can lead to small structure allocation and a copy_from_user based heap corruption. Reported-by: Ben Hawkes <hawkes@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15tipc: fix nullptr crash during subscription cancelParthasarathy Bhuvaragan
[ Upstream commit 4de13d7ed6ffdcbb34317acaa9236f121176f5f8 ] commit 4d5cfcba2f6e ('tipc: fix connection abort during subscription cancel'), removes the check for a valid subscription before calling tipc_nametbl_subscribe(). This will lead to a nullptr exception when we process a subscription cancel request. For a cancel request, a null subscription is passed to tipc_nametbl_subscribe() resulting in exception. In this commit, we call tipc_nametbl_subscribe() only for a valid subscription. Fixes: 4d5cfcba2f6e ('tipc: fix connection abort during subscription cancel') Reported-by: Anders Widell <anders.widell@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07mac80211: fix purging multicast PS buffer queueFelix Fietkau
commit 6b07d9ca9b5363dda959b9582a3fc9c0b89ef3b5 upstream. The code currently assumes that buffered multicast PS frames don't have a pending ACK frame for tx status reporting. However, hostapd sends a broadcast deauth frame on teardown for which tx status is requested. This can lead to the "Have pending ack frames" warning on module reload. Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07SUNRPC: allow for upcalls for same uid but different gss serviceOlga Kornievskaia
commit 9130b8dbc6ac20f2dc5846e1647f5b60eafab6e3 upstream. It's possible to have simultaneous upcalls for the same UIDs but different GSS service. In that case, we need to allow for the upcall to gssd to proceed so that not the same context is used by two different GSS services. Some servers lock the use of context to the GSS service. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-07SUNRPC: Handle EADDRNOTAVAIL on connection failuresTrond Myklebust
commit 1f4c17a03ba7f430d63dba8c8e08ff1e2712581d upstream. If the connect attempt immediately fails with an EADDRNOTAVAIL error, then that means our choice of source port number was bad. This error is expected when we set the SO_REUSEPORT socket option and we have 2 sockets sharing the same source and destination address and port combinations. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Fixes: 402e23b4ed9ed ("SUNRPC: Fix stupid typo in xs_sock_set_reuseport") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20netlabel: add address family checks to netlbl_{sock,req}_delattr()Paul Moore
commit 0e0e36774081534783aa8eeb9f6fbddf98d3c061 upstream. It seems risky to always rely on the caller to ensure the socket's address family is correct before passing it to the NetLabel kAPI, especially since we see at least one LSM which didn't. Add address family checks to the *_delattr() functions to help prevent future problems. Reported-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>