summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-12-09Bluetooth: Fix missing hdev locking for LE scan cleanupJohan Hedberg
commit 8ce783dc5ea3af3a213ac9b4d9d2ccfeeb9c9058 upstream. The hci_conn objects don't have a dedicated lock themselves but rely on the caller to hold the hci_dev lock for most types of access. The hci_conn_timeout() function has so far sent certain HCI commands based on the hci_conn state which has been possible without holding the hci_dev lock. The recent changes to do LE scanning before connect attempts added even more operations to hci_conn and hci_dev from hci_conn_timeout, thereby exposing potential race conditions with the hci_dev and hci_conn states. As an example of such a race, here there's a timeout but an l2cap_sock_connect() call manages to race with the cleanup routine: [Oct21 08:14] l2cap_chan_timeout: chan ee4b12c0 state BT_CONNECT [ +0.000004] l2cap_chan_close: chan ee4b12c0 state BT_CONNECT [ +0.000002] l2cap_chan_del: chan ee4b12c0, conn f3141580, err 111, state BT_CONNECT [ +0.000002] l2cap_sock_teardown_cb: chan ee4b12c0 state BT_CONNECT [ +0.000005] l2cap_chan_put: chan ee4b12c0 orig refcnt 4 [ +0.000010] hci_conn_drop: hcon f53d56e0 orig refcnt 1 [ +0.000013] l2cap_chan_put: chan ee4b12c0 orig refcnt 3 [ +0.000063] hci_conn_timeout: hcon f53d56e0 state BT_CONNECT [ +0.000049] hci_conn_params_del: addr ee:0d:30:09:53:1f (type 1) [ +0.000002] hci_chan_list_flush: hcon f53d56e0 [ +0.000001] hci_chan_del: hci0 hcon f53d56e0 chan f4e7ccc0 [ +0.004528] l2cap_sock_create: sock e708fc00 [ +0.000023] l2cap_chan_create: chan ee4b1770 [ +0.000001] l2cap_chan_hold: chan ee4b1770 orig refcnt 1 [ +0.000002] l2cap_sock_init: sk ee4b3390 [ +0.000029] l2cap_sock_bind: sk ee4b3390 [ +0.000010] l2cap_sock_setsockopt: sk ee4b3390 [ +0.000037] l2cap_sock_connect: sk ee4b3390 [ +0.000002] l2cap_chan_connect: 00:02:72:d9:e5:8b -> ee:0d:30:09:53:1f (type 2) psm 0x00 [ +0.000002] hci_get_route: 00:02:72:d9:e5:8b -> ee:0d:30:09:53:1f [ +0.000001] hci_dev_hold: hci0 orig refcnt 8 [ +0.000003] hci_conn_hold: hcon f53d56e0 orig refcnt 0 Above the l2cap_chan_connect() shouldn't have been able to reach the hci_conn f53d56e0 anymore but since hci_conn_timeout didn't do proper locking that's not the case. The end result is a reference to hci_conn that's not in the conn_hash list, resulting in list corruption when trying to remove it later: [Oct21 08:15] l2cap_chan_timeout: chan ee4b1770 state BT_CONNECT [ +0.000004] l2cap_chan_close: chan ee4b1770 state BT_CONNECT [ +0.000003] l2cap_chan_del: chan ee4b1770, conn f3141580, err 111, state BT_CONNECT [ +0.000001] l2cap_sock_teardown_cb: chan ee4b1770 state BT_CONNECT [ +0.000005] l2cap_chan_put: chan ee4b1770 orig refcnt 4 [ +0.000002] hci_conn_drop: hcon f53d56e0 orig refcnt 1 [ +0.000015] l2cap_chan_put: chan ee4b1770 orig refcnt 3 [ +0.000038] hci_conn_timeout: hcon f53d56e0 state BT_CONNECT [ +0.000003] hci_chan_list_flush: hcon f53d56e0 [ +0.000002] hci_conn_hash_del: hci0 hcon f53d56e0 [ +0.000001] ------------[ cut here ]------------ [ +0.000461] WARNING: CPU: 0 PID: 1782 at lib/list_debug.c:56 __list_del_entry+0x3f/0x71() [ +0.000839] list_del corruption, f53d56e0->prev is LIST_POISON2 (00000200) The necessary fix is unfortunately more complicated than just adding hci_dev_lock/unlock calls to the hci_conn_timeout() call path. Particularly, the hci_conn_del() API, which expects the hci_dev lock to be held, performs a cancel_delayed_work_sync(&hcon->disc_work) which would lead to a deadlock if the hci_conn_timeout() call path tries to acquire the same lock. This patch solves the problem by deferring the cleanup work to a separate work callback. To protect against the hci_dev or hci_conn going away meanwhile temporary references are taken with the help of hci_dev_hold() and hci_conn_get(). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09Bluetooth: Fix removing connection parameters when unpairingJohan Hedberg
commit a6ad2a6b9cc1d9d791aee5462cfb8528f366f1d4 upstream. The commit 89cbb0638e9b7 introduced support for deferred connection parameter removal when unpairing by removing them only once an existing connection gets disconnected. However, it failed to address the scenario when we're *not* connected and do an unpair operation. What makes things worse is that most user space BlueZ versions will first issue a disconnect request and only then unpair, meaning the buggy code will be triggered every time. This effectively causes the kernel to resume scanning and reconnect to a device for which we've removed all keys and GATT database information. This patch fixes the issue by adding the missing call to the hci_conn_params_del() function to a branch which handles the case of no existing connection. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09Bluetooth: hidp: fix device disconnect on idle timeoutDavid Herrmann
commit 660f0fc07d21114549c1862e67e78b1cf0c90c29 upstream. The HIDP specs define an idle-timeout which automatically disconnects a device. This has always been implemented in the HIDP layer and forced a synchronous shutdown of the hidp-scheduler. This works just fine, but lacks a forced disconnect on the underlying l2cap channels. This has been broken since: commit 5205185d461d5902325e457ca80bd421127b7308 Author: David Herrmann <dh.herrmann@gmail.com> Date: Sat Apr 6 20:28:47 2013 +0200 Bluetooth: hidp: remove old session-management The old session-management always forced an l2cap error on the ctrl/intr channels when shutting down. The new session-management skips this, as we don't want to enforce channel policy on the caller. In other words, if user-space removes an HIDP device, the underlying channels (which are *owned* and *referenced* by user-space) are still left active. User-space needs to call shutdown(2) or close(2) to release them. Unfortunately, this does not work with idle-timeouts. There is no way to signal user-space that the HIDP layer has been stopped. The API simply does not support any event-passing except for poll(2). Hence, we restore old behavior and force EUNATCH on the sockets if the HIDP layer is disconnected due to idle-timeouts (behavior of explicit disconnects remains unmodified). User-space can still call getsockopt(..., SO_ERROR, ...) ..to retrieve the EUNATCH error and clear sk_err. Hence, the channels can still be re-used (which nobody does so far, though). Therefore, the API still supports the new behavior, but with this patch it's also compatible to the old implicit channel shutdown. Reported-by: Mark Haun <haunma@keteu.org> Reported-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09NFC: nci: extract pipe value using NCI_HCP_MSG_GET_PIPEChristophe Ricard
commit e65917b6d54f8b47d8293ea96adfa604fd46cf0d upstream. When receiving data in nci_hci_msg_rx_work, extract pipe value using NCI_HCP_MSG_GET_PIPE macro. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09NFC: nci: Fix improper management of HCI return codeChristophe Ricard
commit d8cd37ed2fc871c66b4c79c59f651dc2cdf7091c upstream. When sending HCI data over NCI, HCI return code is part of the NCI data. In order to get correctly the HCI return code, we assume the NCI communication is successful and extract the return code for the nci_hci functions return code. This is done because nci_to_errno does not match hci return code value. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09NFC: nci: Fix incorrect data chaining when sending dataChristophe Ricard
commit 500c4ef02277eaadbfe20537f963b6221f6ac007 upstream. When sending HCI data over NCI, cmd information should be present only on the first packet. Each packet shall be specifically allocated and sent to the NCI layer. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09nl80211: Fix potential memory leak from parse_acl_dataOla Olsson
commit 4baf6bea37247e59f1971e8009d13aeda95edba2 upstream. If parse_acl_data succeeds but the subsequent parsing of smps attributes fails, there will be a memory leak due to early returns. Fix that by moving the ACL parsing later. Fixes: 18998c381b19b ("cfg80211: allow requesting SMPS mode on ap start") Signed-off-by: Ola Olsson <ola.olsson@sonymobile.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09mac80211: fix divide by zero when NOA updateJanusz.Dziedzic@tieto.com
commit 519ee6918b91abdc4bc9720deae17599a109eb40 upstream. In case of one shot NOA the interval can be 0, catch that instead of potentially (depending on the driver) crashing like this: divide error: 0000 [#1] SMP [...] Call Trace: <IRQ> [<ffffffffc08e891c>] ieee80211_extend_absent_time+0x6c/0xb0 [mac80211] [<ffffffffc08e8a17>] ieee80211_update_p2p_noa+0xb7/0xe0 [mac80211] [<ffffffffc069cc30>] ath9k_p2p_ps_timer+0x170/0x190 [ath9k] [<ffffffffc070adf8>] ath_gen_timer_isr+0xc8/0xf0 [ath9k_hw] [<ffffffffc0691156>] ath9k_tasklet+0x296/0x2f0 [ath9k] [<ffffffff8107ad65>] tasklet_action+0xe5/0xf0 [...] Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09mac80211: allow null chandef in tracingArik Nemtsov
commit 254d3dfe445f94a764e399ca12e04365ac9413ed upstream. In TDLS channel-switch operations the chandef can sometimes be NULL. Avoid an oops in the trace code for these cases and just print a chandef full of zeros. Fixes: a7a6bdd0670fe ("mac80211: introduce TDLS channel switch ops") Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09mac80211: fix driver RSSI event calculationsJohannes Berg
commit 8ec6d97871f37e4743678ea4a455bd59580aa0f4 upstream. The ifmgd->ave_beacon_signal value cannot be taken as is for comparisons, it must be divided by since it's represented like that for better accuracy of the EWMA calculations. This would lead to invalid driver RSSI events. Fix the used value. Fixes: 615f7b9bb1f8 ("mac80211: add driver RSSI threshold events") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09mac80211: Fix local deauth while associatingAndrei Otcheretianski
commit a64cba3c5330704a034bd3179270b8d04daf6987 upstream. Local request to deauthenticate wasn't handled while associating, thus the association could continue even when the user space required to disconnect. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09net: fix a race in dst_release()Eric Dumazet
[ Upstream commit d69bbf88c8d0b367cf3e3a052f6daadf630ee566 ] Only cpu seeing dst refcount going to 0 can safely dereference dst->flags. Otherwise an other cpu might already have freed the dst. Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst") Reported-by: Greg Thelen <gthelen@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09packet: race condition in packet_bindFrancesco Ruggeri
[ Upstream commit 30f7ea1c2b5f5fb7462c5ae44fe2e40cb2d6a474 ] There is a race conditions between packet_notifier and packet_bind{_spkt}. It happens if packet_notifier(NETDEV_UNREGISTER) executes between the time packet_bind{_spkt} takes a reference on the new netdevice and the time packet_do_bind sets po->ifindex. In this case the notification can be missed. If this happens during a dev_change_net_namespace this can result in the netdevice to be moved to the new namespace while the packet_sock in the old namespace still holds a reference on it. When the netdevice is later deleted in the new namespace the deletion hangs since the packet_sock is not found in the new namespace' &net->packet.sklist. It can be reproduced with the script below. This patch makes packet_do_bind check again for the presence of the netdevice in the packet_sock's namespace after the synchronize_net in unregister_prot_hook. More in general it also uses the rcu lock for the duration of the bind to stop dev_change_net_namespace/rollback_registered_many from going past the synchronize_net following unlist_netdevice, so that no NETDEV_UNREGISTER notifications can happen on the new netdevice while the bind is executing. In order to do this some code from packet_bind{_spkt} is consolidated into packet_do_dev. import socket, os, time, sys proto=7 realDev='em1' vlanId=400 if len(sys.argv) > 1: vlanId=int(sys.argv[1]) dev='vlan%d' % vlanId os.system('taskset -p 0x10 %d' % os.getpid()) s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto) os.system('ip link add link %s name %s type vlan id %d' % (realDev, dev, vlanId)) os.system('ip netns add dummy') pid=os.fork() if pid == 0: # dev should be moved while packet_do_bind is in synchronize net os.system('taskset -p 0x20000 %d' % os.getpid()) os.system('ip link set %s netns dummy' % dev) os.system('ip netns exec dummy ip link del %s' % dev) s.close() sys.exit(0) time.sleep(.004) try: s.bind(('%s' % dev, proto+1)) except: print 'Could not bind socket' s.close() os.system('ip netns del dummy') sys.exit(0) os.waitpid(pid, 0) s.close() os.system('ip netns del dummy') sys.exit(0) Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09net: Fix prefsrc lookupsDavid Ahern
[ Upstream commit e1b8d903c6c3862160d2d5036806a94786c8fc4e ] A bug report (https://bugzilla.kernel.org/show_bug.cgi?id=107071) noted that the follwoing ip command is failing with v4.3: $ ip route add 10.248.5.0/24 dev bond0.250 table vlan_250 src 10.248.5.154 RTNETLINK answers: Invalid argument 021dd3b8a142d changed the lookup of the given preferred source address to use the table id passed in, but this assumes the local entries are in the given table which is not necessarily true for non-VRF use cases. When validating the preferred source fallback to the local table on failure. Fixes: 021dd3b8a142d ("net: Add routes to the table associated with the device") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09ipv4: disable BH when changing ip local port rangeWANG Cong
[ Upstream commit 4ee3bd4a8c7463cdef0b82ebc33fc94a9170a7e0 ] This fixes the following lockdep warning: [ INFO: inconsistent lock state ] 4.3.0-rc7+ #1197 Not tainted --------------------------------- inconsistent {IN-SOFTIRQ-R} -> {SOFTIRQ-ON-W} usage. sysctl/1019 [HC0[0]:SC0[0]:HE1:SE1] takes: (&(&net->ipv4.ip_local_ports.lock)->seqcount){+.+-..}, at: [<ffffffff81921de7>] ipv4_local_port_range+0xb4/0x12a {IN-SOFTIRQ-R} state was registered at: [<ffffffff810bd682>] __lock_acquire+0x2f6/0xdf0 [<ffffffff810be6d5>] lock_acquire+0x11c/0x1a4 [<ffffffff818e599c>] inet_get_local_port_range+0x4e/0xae [<ffffffff8166e8e3>] udp_flow_src_port.constprop.40+0x23/0x116 [<ffffffff81671cb9>] vxlan_xmit_one+0x219/0xa6a [<ffffffff81672f75>] vxlan_xmit+0xa6b/0xaa5 [<ffffffff817f2deb>] dev_hard_start_xmit+0x2ae/0x465 [<ffffffff817f35ed>] __dev_queue_xmit+0x531/0x633 [<ffffffff817f3702>] dev_queue_xmit_sk+0x13/0x15 [<ffffffff818004a5>] neigh_resolve_output+0x12f/0x14d [<ffffffff81959cfa>] ip6_finish_output2+0x344/0x39f [<ffffffff8195bf58>] ip6_finish_output+0x88/0x8e [<ffffffff8195bfef>] ip6_output+0x91/0xe5 [<ffffffff819792ae>] dst_output_sk+0x47/0x4c [<ffffffff81979392>] NF_HOOK_THRESH.constprop.30+0x38/0x82 [<ffffffff8197981e>] mld_sendpack+0x189/0x266 [<ffffffff8197b28b>] mld_ifc_timer_expire+0x1ef/0x223 [<ffffffff810de581>] call_timer_fn+0xfb/0x28c [<ffffffff810ded1e>] run_timer_softirq+0x1c7/0x1f1 Fixes: b8f1a55639e6 ("udp: Add function to make source port for UDP tunnels") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09ipv6: clean up dev_snmp6 proc entry when we fail to initialize inet6_devSabrina Dubroca
[ Upstream commit 2a189f9e57650e9f310ddf4aad75d66c1233a064 ] In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up the dev_snmp6 entry that we have already registered for this device. Call snmp6_unregister_dev in this case. Fixes: a317a2f19da7d ("ipv6: fail early when creating netdev named all or default") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09sit: fix sit0 percpu double allocationsEric Dumazet
[ Upstream commit 4ece9009774596ee3df0acba65a324b7ea79387c ] sit0 device allocates its percpu storage twice : - One time in ipip6_tunnel_init() - One time in ipip6_fb_tunnel_init() Thus we leak 48 bytes per possible cpu per network namespace dismantle. ipip6_fb_tunnel_init() can be much simpler and does not return an error, and should be called after register_netdev() Note that ipip6_tunnel_clone_6rd() also needs to be called after register_netdev() (calling ipip6_tunnel_init()) Fixes: ebe084aafb7e ("sit: Use ipip6_tunnel_init as the ndo_init function.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() ↵Ani Sinha
in preemptible context. [ Upstream commit 44f49dd8b5a606870a1f21101522a0f9c4414784 ] Fixes the following kernel BUG : BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758 caller is __this_cpu_preempt_check+0x13/0x15 CPU: 0 PID: 2758 Comm: bash Tainted: P O 3.18.19 #2 ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000 0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800 ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8 Call Trace: [<ffffffff81482b2a>] dump_stack+0x52/0x80 [<ffffffff812010ae>] check_preemption_disabled+0xce/0xe1 [<ffffffff812010d4>] __this_cpu_preempt_check+0x13/0x15 [<ffffffff81419d60>] ipmr_queue_xmit+0x647/0x70c [<ffffffff8141a154>] ip_mr_forward+0x32f/0x34e [<ffffffff8141af76>] ip_mroute_setsockopt+0xe03/0x108c [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42 [<ffffffff810e6974>] ? pollwake+0x4d/0x51 [<ffffffff81058ac0>] ? default_wake_function+0x0/0xf [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42 [<ffffffff810613d9>] ? __wake_up_common+0x45/0x77 [<ffffffff81486ea9>] ? _raw_spin_unlock_irqrestore+0x1d/0x32 [<ffffffff810618bc>] ? __wake_up_sync_key+0x4a/0x53 [<ffffffff8139a519>] ? sock_def_readable+0x71/0x75 [<ffffffff813dd226>] do_ip_setsockopt+0x9d/0xb55 [<ffffffff81429818>] ? unix_seqpacket_sendmsg+0x3f/0x41 [<ffffffff813963fe>] ? sock_sendmsg+0x6d/0x86 [<ffffffff813959d4>] ? sockfd_lookup_light+0x12/0x5d [<ffffffff8139650a>] ? SyS_sendto+0xf3/0x11b [<ffffffff810d5738>] ? new_sync_read+0x82/0xaa [<ffffffff813ddd19>] compat_ip_setsockopt+0x3b/0x99 [<ffffffff813fb24a>] compat_raw_setsockopt+0x11/0x32 [<ffffffff81399052>] compat_sock_common_setsockopt+0x18/0x1f [<ffffffff813c4d05>] compat_SyS_setsockopt+0x1a9/0x1cf [<ffffffff813c4149>] compat_SyS_socketcall+0x180/0x1e3 [<ffffffff81488ea1>] cstar_dispatch+0x7/0x1e Signed-off-by: Ani Sinha <ani@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09ipv4: update RTNH_F_LINKDOWN flag on UP eventJulian Anastasov
[ Upstream commit c9b3292eeb52c6834e972eb5b8fe38914771ed12 ] When nexthop is part of multipath route we should clear the LINKDOWN flag when link goes UP or when first address is added. This is needed because we always set LINKDOWN flag when DEAD flag was set but now on UP the nexthop is not dead anymore. Examples when LINKDOWN bit can be forgotten when no NETDEV_CHANGE is delivered: - link goes down (LINKDOWN is set), then link goes UP and device shows carrier OK but LINKDOWN remains set - last address is deleted (LINKDOWN is set), then address is added and device shows carrier OK but LINKDOWN remains set Steps to reproduce: modprobe dummy ifconfig dummy0 192.168.168.1 up here add a multipath route where one nexthop is for dummy0: ip route add 1.2.3.4 nexthop dummy0 nexthop SOME_OTHER_DEVICE ifconfig dummy0 down ifconfig dummy0 up now ip route shows nexthop that is not dead. Now set the sysctl var: echo 1 > /proc/sys/net/ipv4/conf/dummy0/ignore_routes_with_linkdown now ip route will show a dead nexthop because the forgotten RTNH_F_LINKDOWN is propagated as RTNH_F_DEAD. Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09ipv4: fix to not remove local route on link downJulian Anastasov
[ Upstream commit 4f823defdd5b106a5e89745ee8b163c71855de1e ] When fib_netdev_event calls fib_disable_ip on NETDEV_DOWN event we should not delete the local routes if the local address is still present. The confusion comes from the fact that both fib_netdev_event and fib_inetaddr_event use the NETDEV_DOWN constant. Fix it by returning back the variable 'force'. Steps to reproduce: modprobe dummy ifconfig dummy0 192.168.168.1 up ifconfig dummy0 down ip route list table local | grep dummy | grep host local 192.168.168.1 dev dummy0 proto kernel scope host src 192.168.168.1 Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09tipc: linearize arriving NAME_DISTR and LINK_PROTO buffersJon Paul Maloy
[ Upstream commit 5cbb28a4bf65c7e4daa6c25b651fed8eb888c620 ] Testing of the new UDP bearer has revealed that reception of NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE message buffers is not prepared for the case that those may be non-linear. We now linearize all such buffers before they are delivered up to the generic reception layer. In order for the commit to apply cleanly to 'net' and 'stable', we do the change in the function tipc_udp_recv() for now. Later, we will post a commit to 'net-next' moving the linearization to generic code, in tipc_named_rcv() and tipc_link_proto_rcv(). Fixes: commit d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-29ipv6: protect mtu calculation of wrap-around and infinite loop by rounding ↵Hannes Frederic Sowa
issues Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. This is the second version of the patch which doesn't use the overflow_usub function, which got reverted for now. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-29Revert "Merge branch 'ipv6-overflow-arith'"Hannes Frederic Sowa
Linus dislikes these changes. To not hold up the net-merge let's revert it for now and fix the bug like Linus suggested. This reverts commit ec3661b42257d9a06cf0d318175623ac7a660113, reversing changes made to c80dbe04612986fd6104b4a1be21681b113b5ac9. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in ↵Sowmini Varadhan
rds_tcp_data_recv Either of pskb_pull() or pskb_trim() may fail under low memory conditions. If rds_tcp_data_recv() ignores such failures, the application will receive corrupted data because the skb has not been correctly carved to the RDS datagram size. Avoid this by handling pskb_pull/pskb_trim failure in the same manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and retry via the deferred call to rds_send_worker() that gets set up on ENOMEM from rds_tcp_read_sock() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27openvswitch: Fix skb leak using IPv6 defragJoe Stringer
nf_ct_frag6_gather() makes a clone of each skb passed to it, and if the reassembly is successful, expects the caller to free all of the original skbs using nf_ct_frag6_consume_orig(). This call was previously missing, meaning that the original fragments were never freed (with the exception of the last fragment to arrive). Fix this by ensuring that all original fragments except for the last fragment are freed via nf_ct_frag6_consume_orig(). The last fragment will be morphed into the head, so it must not be freed yet. Furthermore, retain the ->next pointer for the head after skb_morph(). Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27ipv6: Export nf_ct_frag6_consume_orig()Joe Stringer
This is needed in openvswitch to fix an skb leak in the next patch. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27openvswitch: Fix double-free on ip_defrag() errorsJoe Stringer
If ip_defrag() returns an error other than -EINPROGRESS, then the skb is freed. When handle_fragments() passes this back up to do_execute_actions(), it will be freed again. Prevent this double free by never freeing the skb in do_execute_actions() for errors returned by ovs_ct_execute. Always free it in ovs_ct_execute() error paths instead. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27fib_trie: leaf_walk_rcu should not compute key if key is less than pn->keyAlexander Duyck
We were computing the child index in cases where the key value we were looking for was actually less than the base key of the tnode. As a result we were getting incorrect index values that would cause us to skip over some children. To fix this I have added a test that will force us to use child index 0 if the key we are looking for is less than the key of the current tnode. Fixes: 8be33e955cb9 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf") Reported-by: Brian Rak <brak@gameservers.com> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-26ipv6: gre: support SIT encapsulationEric Dumazet
gre_gso_segment() chokes if SIT frames were aggregated by GRO engine. Fixes: 61c1db7fae21e ("ipv6: sit: add GSO/TSO support") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23net: sysctl: fix a kmemleak warningLi RongQing
the returned buffer of register_sysctl() is stored into net_header variable, but net_header is not used after, and compiler maybe optimise the variable out, and lead kmemleak reported the below warning comm "swapper/0", pid 1, jiffies 4294937448 (age 267.270s) hex dump (first 32 bytes): 90 38 8b 01 c0 ff ff ff 00 00 00 00 01 00 00 00 .8.............. 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffc00020f134>] create_object+0x10c/0x2a0 [<ffffffc00070ff44>] kmemleak_alloc+0x54/0xa0 [<ffffffc0001fe378>] __kmalloc+0x1f8/0x4f8 [<ffffffc00028e984>] __register_sysctl_table+0x64/0x5a0 [<ffffffc00028eef0>] register_sysctl+0x30/0x40 [<ffffffc00099c304>] net_sysctl_init+0x20/0x58 [<ffffffc000994dd8>] sock_init+0x10/0xb0 [<ffffffc0000842e0>] do_one_initcall+0x90/0x1b8 [<ffffffc000966bac>] kernel_init_freeable+0x218/0x2f0 [<ffffffc00070ed6c>] kernel_init+0x1c/0xe8 [<ffffffc000083bfc>] ret_from_fork+0xc/0x50 [<ffffffffffffffff>] 0xffffffffffffffff <<end check kmemleak>> Before fix, the objdump result on ARM64: 0000000000000000 <net_sysctl_init>: 0: a9be7bfd stp x29, x30, [sp,#-32]! 4: 90000001 adrp x1, 0 <net_sysctl_init> 8: 90000000 adrp x0, 0 <net_sysctl_init> c: 910003fd mov x29, sp 10: 91000021 add x1, x1, #0x0 14: 91000000 add x0, x0, #0x0 18: a90153f3 stp x19, x20, [sp,#16] 1c: 12800174 mov w20, #0xfffffff4 // #-12 20: 94000000 bl 0 <register_sysctl> 24: b4000120 cbz x0, 48 <net_sysctl_init+0x48> 28: 90000013 adrp x19, 0 <net_sysctl_init> 2c: 91000273 add x19, x19, #0x0 30: 9101a260 add x0, x19, #0x68 34: 94000000 bl 0 <register_pernet_subsys> 38: 2a0003f4 mov w20, w0 3c: 35000060 cbnz w0, 48 <net_sysctl_init+0x48> 40: aa1303e0 mov x0, x19 44: 94000000 bl 0 <register_sysctl_root> 48: 2a1403e0 mov w0, w20 4c: a94153f3 ldp x19, x20, [sp,#16] 50: a8c27bfd ldp x29, x30, [sp],#32 54: d65f03c0 ret After: 0000000000000000 <net_sysctl_init>: 0: a9bd7bfd stp x29, x30, [sp,#-48]! 4: 90000000 adrp x0, 0 <net_sysctl_init> 8: 910003fd mov x29, sp c: a90153f3 stp x19, x20, [sp,#16] 10: 90000013 adrp x19, 0 <net_sysctl_init> 14: 91000000 add x0, x0, #0x0 18: 91000273 add x19, x19, #0x0 1c: f90013f5 str x21, [sp,#32] 20: aa1303e1 mov x1, x19 24: 12800175 mov w21, #0xfffffff4 // #-12 28: 94000000 bl 0 <register_sysctl> 2c: f9002260 str x0, [x19,#64] 30: b40001a0 cbz x0, 64 <net_sysctl_init+0x64> 34: 90000014 adrp x20, 0 <net_sysctl_init> 38: 91000294 add x20, x20, #0x0 3c: 9101a280 add x0, x20, #0x68 40: 94000000 bl 0 <register_pernet_subsys> 44: 2a0003f5 mov w21, w0 48: 35000080 cbnz w0, 58 <net_sysctl_init+0x58> 4c: aa1403e0 mov x0, x20 50: 94000000 bl 0 <register_sysctl_root> 54: 14000004 b 64 <net_sysctl_init+0x64> 58: f9402260 ldr x0, [x19,#64] 5c: 94000000 bl 0 <unregister_sysctl_table> 60: f900227f str xzr, [x19,#64] 64: 2a1503e0 mov w0, w21 68: f94013f5 ldr x21, [sp,#32] 6c: a94153f3 ldp x19, x20, [sp,#16] 70: a8c37bfd ldp x29, x30, [sp],#48 74: d65f03c0 ret Add the possible error handle to free the net_header to remove the kmemleak warning Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23af_key: fix two typosLi RongQing
Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23ipv6: protect mtu calculation of wrap-around and infinite loop by rounding ↵Hannes Frederic Sowa
issues Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23tcp: allow dctcp alpha to drop to zeroAndrew Shewmaker
If alpha is strictly reduced by alpha >> dctcp_shift_g and if alpha is less than 1 << dctcp_shift_g, then alpha may never reach zero. For example, given shift_g=4 and alpha=15, alpha >> dctcp_shift_g yields 0 and alpha remains 15. The effect isn't noticeable in this case below cwnd=137, but could gradually drive uncongested flows with leftover alpha down to cwnd=137. A larger dctcp_shift_g would have a greater effect. This change causes alpha=15 to drop to 0 instead of being decrementing by 1 as it would when alpha=16. However, it requires one less conditional to implement since it doesn't have to guard against subtracting 1 from 0U. A decay of 15 is not unreasonable since an equal or greater amount occurs at alpha >= 240. Signed-off-by: Andrew G. Shewmaker <agshew@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-23ipv6: fix the incorrect return value of throw routelucien
The error condition -EAGAIN, which is signaled by throw routes, tells the rules framework to walk on searching for next matches. If the walk ends and we stop walking the rules with the result of a throw route we have to translate the error conditions to -ENETUNREACH. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22openvswitch: Fix egress tunnel info.Pravin B Shelar
While transitioning to netdev based vport we broke OVS feature which allows user to retrieve tunnel packet egress information for lwtunnel devices. Following patch fixes it by introducing ndo operation to get the tunnel egress info. Same ndo operation can be used for lwtunnel devices and compat ovs-tnl-vport devices. So after adding such device operation we can remove similar operation from ovs-vport. Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device"). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22VSOCK: Fix lockdep issue.Jorgen Hansen
The recent fix for the vsock sock_put issue used the wrong initializer for the transport spin_lock causing an issue when running with lockdep checking. Testing: Verified fix on kernel with lockdep enabled. Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2015-10-22 1) Fix IPsec pre-encap fragmentation for GSO packets. From Herbert Xu. 2) Fix some header checks in _decode_session6. We skip the header informations if the data pointer points already behind the header in question for some protocols. This is because we call pskb_may_pull with a negative value converted to unsigened int from pskb_may_pull in this case. Skipping the header informations can lead to incorrect policy lookups. From Mathias Krause. 3) Allow to change the replay threshold and expiry timer of a state without having to set other attributes like replay counter and byte lifetime. Changing these other attributes may break the SA. From Michael Rossberg. 4) Fix pmtu discovery for local generated packets. We may fail dispatch to the inner address family. As a reault, the local error handler is not called and the mtu value is not reported back to userspace. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr setDavid Ahern
741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") adds the RT6_LOOKUP_F_IFACE flag to make device index mismatch fatal if oif is given. Hajime reported that this change breaks the Mobile IPv6 use case that wants to force the message through one interface yet use the source address from another interface. Handle this case by only adding the flag if oif is set and saddr is not set. Fixes: 741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") Cc: Hajime Tazaki <thehajime@gmail.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22VSOCK: sock_put wasn't safe to call in interrupt contextJorgen Hansen
In the vsock vmci_transport driver, sock_put wasn't safe to call in interrupt context, since that may call the vsock destructor which in turn calls several functions that should only be called from process context. This change defers the callling of these functions to a worker thread. All these functions were deallocation of resources related to the transport itself. Furthermore, an unused callback was removed to simplify the cleanup. Multiple customers have been hitting this issue when using VMware tools on vSphere 2015. Also added a version to the vmci transport module (starting from 1.0.2.0-k since up until now it appears that this module was sharing version with vsock that is currently at 1.0.1.0-k). Reviewed-by: Aditya Asarwade <asarwade@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22netlink: fix locking around NETLINK_LIST_MEMBERSHIPSDavid Herrmann
Currently, NETLINK_LIST_MEMBERSHIPS grabs the netlink table while copying the membership state to user-space. However, grabing the netlink table is effectively a write_lock_irq(), and as such we should not be triggering page-faults in the critical section. This can be easily reproduced by the following snippet: int s = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE); void *p = mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0); int r = getsockopt(s, 0x10e, 9, p, (void*)((char*)p + 4092)); This should work just fine, but currently triggers EFAULT and a possible WARN_ON below handle_mm_fault(). Fix this by reducing locking of NETLINK_LIST_MEMBERSHIPS to a read-side lock. The write-lock was overkill in the first place, and the read-lock allows page-faults just fine. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21openvswitch: Serialize nested ct actions if providedJoe Stringer
If userspace provides a ct action with no nested mark or label, then the storage for these fields is zeroed. Later when actions are requested, such zeroed fields are serialized even though userspace didn't originally specify them. Fix the behaviour by ensuring that no action is serialized in this case, and reject actions where userspace attempts to set these fields with mask=0. This should make netlink marshalling consistent across deserialization/reserialization. Reported-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21openvswitch: Mark connections new when not confirmed.Joe Stringer
New, related connections are marked as such as part of ovs_ct_lookup(), but they are not marked as "new" if the commit flag is used. Make this consistent by setting the "new" flag whenever !nf_ct_is_confirmed(ct). Reported-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21openvswitch: Reject ct_state masks for unknown bitsJoe Stringer
Currently, 0-bits are generated in ct_state where the bit position is undefined, and matches are accepted on these bit-positions. If userspace requests to match the 0-value for this bit then it may expect only a subset of traffic to match this value, whereas currently all packets will have this bit set to 0. Fix this by rejecting such masks. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tcp: remove improper preemption check in tcp_xmit_probe_skb()Renato Westphal
Commit e520af48c7e5a introduced the following bug when setting the TCP_REPAIR sockoption: [ 2860.657036] BUG: using __this_cpu_add() in preemptible [00000000] code: daemon/12164 [ 2860.657045] caller is __this_cpu_preempt_check+0x13/0x20 [ 2860.657049] CPU: 1 PID: 12164 Comm: daemon Not tainted 4.2.3 #1 [ 2860.657051] Hardware name: Dell Inc. PowerEdge R210 II/0JP7TR, BIOS 2.0.5 03/13/2012 [ 2860.657054] ffffffff81c7f071 ffff880231e9fdf8 ffffffff8185d765 0000000000000002 [ 2860.657058] 0000000000000001 ffff880231e9fe28 ffffffff8146ed91 ffff880231e9fe18 [ 2860.657062] ffffffff81cd1a5d ffff88023534f200 ffff8800b9811000 ffff880231e9fe38 [ 2860.657065] Call Trace: [ 2860.657072] [<ffffffff8185d765>] dump_stack+0x4f/0x7b [ 2860.657075] [<ffffffff8146ed91>] check_preemption_disabled+0xe1/0xf0 [ 2860.657078] [<ffffffff8146edd3>] __this_cpu_preempt_check+0x13/0x20 [ 2860.657082] [<ffffffff817e0bc7>] tcp_xmit_probe_skb+0xc7/0x100 [ 2860.657085] [<ffffffff817e1e2d>] tcp_send_window_probe+0x2d/0x30 [ 2860.657089] [<ffffffff817d1d8c>] do_tcp_setsockopt.isra.29+0x74c/0x830 [ 2860.657093] [<ffffffff817d1e9c>] tcp_setsockopt+0x2c/0x30 [ 2860.657097] [<ffffffff81767b74>] sock_common_setsockopt+0x14/0x20 [ 2860.657100] [<ffffffff817669e1>] SyS_setsockopt+0x71/0xc0 [ 2860.657104] [<ffffffff81865172>] entry_SYSCALL_64_fastpath+0x16/0x75 Since tcp_xmit_probe_skb() can be called from process context, use NET_INC_STATS() instead of NET_INC_STATS_BH(). Fixes: e520af48c7e5 ("tcp: add TCPWinProbe and TCPKeepAlive SNMP counters") Signed-off-by: Renato Westphal <renatow@taghos.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains four Netfilter fixes for net, they are: 1) Fix Kconfig dependencies of new nf_dup_ipv4 and nf_dup_ipv6. 2) Remove bogus test nh_scope in IPv4 rpfilter match that is breaking --accept-local, from Xin Long. 3) Wait for RCU grace period after dropping the pending packets in the nfqueue, from Florian Westphal. 4) Fix sleeping allocation while holding spin_lock_bh, from Nikolay Borisov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tipc: conditionally expand buffer headroom over udp tunnelJon Paul Maloy
In commit d999297c3dbbe ("tipc: reduce locking scope during packet reception") we altered the packet retransmission function. Since then, when restransmitting packets, we create a clone of the original buffer using __pskb_copy(skb, MIN_H_SIZE), where MIN_H_SIZE is the size of the area we want to have copied, but also the smallest possible TIPC packet size. The value of MIN_H_SIZE is 24. Unfortunately, __pskb_copy() also has the effect that the headroom of the cloned buffer takes the size MIN_H_SIZE. This is too small for carrying the packet over the UDP tunnel bearer, which requires a minimum headroom of 28 bytes. A change to just use pskb_copy() lets the clone inherit the original headroom of 80 bytes, but also assumes that the copied data area is of at least that size, something that is not always the case. So that is not a viable solution. We now fix this by adding a check for sufficient headroom in the transmit function of udp_media.c, and expanding it when necessary. Fixes: commit d999297c3dbbe ("tipc: reduce locking scope during packet reception") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tipc: allow non-linear first fragment bufferJon Paul Maloy
The current code for message reassembly is erroneously assuming that the the first arriving fragment buffer always is linear, and then goes ahead resetting the fragment list of that buffer in anticipation of more arriving fragments. However, if the buffer already happens to be non-linear, we will inadvertently drop the already attached fragment list, and later on trig a BUG() in __pskb_pull_tail(). We see this happen when running fragmented TIPC multicast across UDP, something made possible since commit d0f91938bede ("tipc: add ip/udp media type") We fix this by not resetting the fragment list when the buffer is non- linear, and by initiatlizing our private fragment list tail pointer to the tail of the existing fragment list. Fixes: commit d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21openvswitch: Allocate memory for ovs internal device stats.James Morse
"openvswitch: Remove vport stats" removed the per-vport statistics, in order to use the netdev's statistics fields. "openvswitch: Fix ovs_vport_get_stats()" fixed the export of these stats to user-space, by using the provided netdev_ops to collate them - but ovs internal devices still use an unallocated dev->tstats field to count packets, which are no longer exported by this api. Allocate the dev->tstats field for ovs internal devices, and wire up ndo_get_stats64 with the original implementation of ovs_vport_get_stats(). On its own, "openvswitch: Fix ovs_vport_get_stats()" fixes the OOPs, unmasking a full-on panic on arm64: =============%<============== [<ffffffbffc00ce4c>] internal_dev_recv+0xa8/0x170 [openvswitch] [<ffffffbffc0008b4>] do_output.isra.31+0x60/0x19c [openvswitch] [<ffffffbffc000bf8>] do_execute_actions+0x208/0x11c0 [openvswitch] [<ffffffbffc001c78>] ovs_execute_actions+0xc8/0x238 [openvswitch] [<ffffffbffc003dfc>] ovs_packet_cmd_execute+0x21c/0x288 [openvswitch] [<ffffffc0005e8c5c>] genl_family_rcv_msg+0x1b0/0x310 [<ffffffc0005e8e60>] genl_rcv_msg+0xa4/0xe4 [<ffffffc0005e7ddc>] netlink_rcv_skb+0xb0/0xdc [<ffffffc0005e8a94>] genl_rcv+0x38/0x50 [<ffffffc0005e76c0>] netlink_unicast+0x164/0x210 [<ffffffc0005e7b70>] netlink_sendmsg+0x304/0x368 [<ffffffc0005a21c0>] sock_sendmsg+0x30/0x4c [SNIP] Kernel panic - not syncing: Fatal exception in interrupt =============%<============== Fixes: 8c876639c985 ("openvswitch: Remove vport stats.") Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21net: Really fix vti6 with oif in dst lookupsDavid Ahern
6e28b000825d ("net: Fix vti use case with oif in dst lookups for IPv6") is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them. Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21tipc: extend broadcast link window sizeJon Paul Maloy
The default fix broadcast window size is currently set to 20 packets. This is a very low value, set at a time when we were still testing on 10 Mb/s hubs, and a change to it is long overdue. Commit 7845989cb4b3da1db ("net: tipc: fix stall during bclink wakeup procedure") revealed a problem with this low value. For messages of importance LOW, the backlog queue limit will be calculated to 30 packets, while a single, maximum sized message of 66000 bytes, carried across a 1500 MTU network consists of 46 packets. This leads to the following scenario (among others leading to the same situation): 1: Msg 1 of 46 packets is sent. 20 packets go to the transmit queue, 26 packets to the backlog queue. 2: Msg 2 of 46 packets is attempted sent, but rejected because there is no more space in the backlog queue at this level. The sender is added to the wakeup queue with a "pending packets chain size" number of 46. 3: Some packets in the transmit queue are acked and released. We try to wake up the sender, but the pending size of 46 is bigger than the LOW wakeup limit of 30, so this doesn't happen. 5: Subsequent acks releases all the remaining buffers. Each time we test for the wakeup criteria and find that 46 still is larger than 30, even after both the transmit and the backlog queues are empty. 6: The sender is never woken up and given a chance to send its message. He is stuck. We could now loosen the wakeup criteria (used by link_prepare_wakeup()) to become equal to the send criteria (used by tipc_link_xmit()), i.e., by ignoring the "pending packets chain size" value altogether, or we can just increase the queue limits so that the criteria can be satisfied anyway. There are good reasons (potentially multiple waiting senders) to not opt for the former solution, so we choose the latter one. This commit fixes the problem by giving the broadcast link window a default value of 50 packets. We also introduce a new minimum link window size BCLINK_MIN_WIN of 32, which is enough to always avoid the described situation. Finally, in order to not break any existing users which may set the window explicitly, we enforce that the window is set to the new minimum value in case the user is trying to set it to anything lower. Fixes: 7845989cb4b3da1db ("net: tipc: fix stall during bclink wakeup procedure") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>