summaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)Author
2013-10-13ip6tnl: allow to use rtnl ops on fb tunnelNicolas Dichtel
[ Upstream commit bb8140947a247b9aa15652cc24dc555ebb0b64b0 ] rtnl ops where introduced by c075b13098b3 ("ip6tnl: advertise tunnel param via rtnl"), but I forget to assign rtnl ops to fb tunnels. Now that it is done, we must remove the explicit call to unregister_netdevice_queue(), because the fallback tunnel is added to the queue in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this is valid since commit 0bd8762824e7 ("ip6tnl: add x-netns support")). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13sit: allow to use rtnl ops on fb tunnelNicolas Dichtel
[ Upstream commit 205983c43700ac3a81e7625273a3fa83cd2759b5 ] rtnl ops where introduced by ba3e3f50a0e5 ("sit: advertise tunnel param via rtnl"), but I forget to assign rtnl ops to fb tunnels. Now that it is done, we must remove the explicit call to unregister_netdevice_queue(), because the fallback tunnel is added to the queue in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this is valid since commit 5e6700b3bf98 ("sit: add support of x-netns")). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_putSalam Noureddine
[ Upstream commit 9260d3e1013701aa814d10c8fc6a9f92bd17d643 ] It is possible for the timer handlers to run after the call to ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the handler function in order to do proper cleanup when the refcnt reaches 0. Otherwise, the refcnt can reach zero without the inet6_dev being destroyed and we end up leaking a reference to the net_device and see messages like the following, unregister_netdevice: waiting for eth0 to become free. Usage count = 1 Tested on linux-3.4.43. Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13ipv6: gre: correct calculation of max_headroomHannes Frederic Sowa
[ Upstream commit 3da812d860755925da890e8c713f2d2e2d7b1bae ] gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header, so initialize max_headroom to zero. Otherwise the if (encap_limit >= 0) { max_headroom += 8; mtu -= 8; } increments an uninitialized variable before max_headroom was reset. Found with coverity: 728539 Cc: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13IPv6 NAT: Do not drop DNATed 6to4/6rd packetsCatalin(ux) M. BOIE
[ Upstream commit 7df37ff33dc122f7bd0614d707939fe84322d264 ] When a router is doing DNAT for 6to4/6rd packets the latest anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for 6to4 and 6rd") will drop them because the IPv6 address embedded does not match the IPv4 destination. This patch will allow them to pass by testing if we have an address that matches on 6to4/6rd interface. I have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR. Also, log the dropped packets (with rate limit). Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13ipv6: udp packets following an UFO enqueued packet need also be handled by UFOHannes Frederic Sowa
[ Upstream commit 2811ebac2521ceac84f2bdae402455baa6a7fb47 ] In the following scenario the socket is corked: If the first UDP packet is larger then the mtu we try to append it to the write queue via ip6_ufo_append_data. A following packet, which is smaller than the mtu would be appended to the already queued up gso-skb via plain ip6_append_data. This causes random memory corruptions. In ip6_ufo_append_data we also have to be careful to not queue up the same skb multiple times. So setup the gso frame only when no first skb is available. This also fixes a shortcoming where we add the current packet's length to cork->length but return early because of a packet > mtu with dontfrag set (instead of sutracting it again). Found with trinity. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13ip: generate unique IP identificator if local fragmentation is allowedAnsis Atteka
[ Upstream commit 703133de331a7a7df47f31fb9de51dc6f68a9de8 ] If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13ip6_tunnels: raddr and laddr are inverted in nl msgDing Zhi
[ Upstream commit 0d2ede929f61783aebfb9228e4d32a0546ee4d23 ] IFLA_IPTUN_LOCAL and IFLA_IPTUN_REMOTE were inverted. Introduced by c075b13098b3 (ip6tnl: advertise tunnel param via rtnl). Signed-off-by: Ding Zhi <zhi.ding@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13net: fib: fib6_add: fix potential NULL pointer dereferenceDaniel Borkmann
[ Upstream commit ae7b4e1f213aa659aedf9c6ecad0bf5f0476e1e2 ] When the kernel is compiled with CONFIG_IPV6_SUBTREES, and we return with an error in fn = fib6_add_1(), then error codes are encoded into the return pointer e.g. ERR_PTR(-ENOENT). In such an error case, we write the error code into err and jump to out, hence enter the if(err) condition. Now, if CONFIG_IPV6_SUBTREES is enabled, we check for: if (pn != fn && pn->leaf == rt) ... if (pn != fn && !pn->leaf && !(pn->fn_flags & RTN_RTINFO)) ... Since pn is NULL and fn is f.e. ERR_PTR(-ENOENT), then pn != fn evaluates to true and causes a NULL-pointer dereference on further checks on pn. Fix it, by setting both NULL in error case, so that pn != fn already evaluates to false and no further dereference takes place. This was first correctly implemented in 4a287eba2 ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag"), but the bug got later on introduced by 188c517a0 ("ipv6: return errno pointers consistently for fib6_add_1()"). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Lin Ming <mlin@ss.pku.edu.cn> Cc: Matti Vaittinen <matti.vaittinen@nsn.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Matti Vaittinen <matti.vaittinen@nsn.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-13ipv6/exthdrs: accept tlv which includes only paddingJiri Pirko
[ Upstream commit 8112b1fe071be01a28a774ed55909e6f4b29712d ] In rfc4942 and rfc2460 I cannot find anything which would implicate to drop packets which have only padding in tlv. Current behaviour breaks TAHI Test v6LC.1.2.6. Problem was intruduced in: 9b905fe6843 "ipv6/exthdrs: strict Pad1 and PadN check" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14net: ipv6: tcp: fix potential use after free in tcp_v6_do_rcvDaniel Borkmann
[ Upstream commit 3a1c756590633c0e86df606e5c618c190926a0df ] In tcp_v6_do_rcv() code, when processing pkt options, we soley work on our skb clone opt_skb that we've created earlier before entering tcp_rcv_established() on our way. However, only in condition ... if (np->rxopt.bits.rxtclass) np->rcv_tclass = ipv6_get_dsfield(ipv6_hdr(skb)); ... we work on skb itself. As we extract every other information out of opt_skb in ipv6_pktoptions path, this seems wrong, since skb can already be released by tcp_rcv_established() earlier on. When we try to access it in ipv6_hdr(), we will dereference freed skb. [ Bug added by commit 4c507d2897bd9b ("net: implement IP_RECVTOS for IP_PKTOPTIONS") ] Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14ipv6: fix null pointer dereference in __ip6addrlbl_addHannes Frederic Sowa
[ Upstream commit 639739b5e609a5074839bb22fc061b37baa06269 ] Commit b67bfe0d42cac56c512dd5da4b1b347a23f4b70a ("hlist: drop the node parameter from iterators") changed the behavior of hlist_for_each_entry_safe to leave the p argument NULL. Fix this up by tracking the last argument. Reported-by: Michele Baldessari <michele@acksyn.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Michele Baldessari <michele@acksyn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14ICMPv6: treat dest unreachable codes 5 and 6 as EACCES, not EPROTOJiri Bohac
[ Upstream commit 61e76b178dbe7145e8d6afa84bb4ccea71918994 ] RFC 4443 has defined two additional codes for ICMPv6 type 1 (destination unreachable) messages: 5 - Source address failed ingress/egress policy 6 - Reject route to destination Now they are treated as protocol error and icmpv6_err_convert() converts them to EPROTO. RFC 4443 says: "Codes 5 and 6 are more informative subsets of code 1." Treat codes 5 and 6 as code 1 (EACCES) Btw, connect() returning -EPROTO confuses firefox, so that fallback to other/IPv4 addresses does not work: https://bugzilla.mozilla.org/show_bug.cgi?id=910773 Signed-off-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14ipv6: Don't depend on per socket memory for neighbour discovery messagesThomas Graf
[ Upstream commit 25a6e6b84fba601eff7c28d30da8ad7cfbef0d43 ] Allocating skbs when sending out neighbour discovery messages currently uses sock_alloc_send_skb() based on a per net namespace socket and thus share a socket wmem buffer space. If a netdevice is temporarily unable to transmit due to carrier loss or for other reasons, the queued up ndisc messages will cosnume all of the wmem space and will thus prevent from any more skbs to be allocated even for netdevices that are able to transmit packets. The number of neighbour discovery messages sent is very limited, use of alloc_skb() bypasses the socket wmem buffer size enforcement while the manual call to skb_set_owner_w() maintains the socket reference needed for the IPv6 output path. This patch has orginally been posted by Eric Dumazet in a modified form. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Fabio Estevam <festevam@gmail.com> Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Tested-by: Stephen Warren <swarren@nvidia.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14ipv6: drop packets with multiple fragmentation headersHannes Frederic Sowa
[ Upstream commit f46078cfcd77fa5165bf849f5e568a7ac5fa569c ] It is not allowed for an ipv6 packet to contain multiple fragmentation headers. So discard packets which were already reassembled by fragmentation logic and send back a parameter problem icmp. The updates for RFC 6980 will come in later, I have to do a bit more research here. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14ipv6: remove max_addresses check from ipv6_create_tempaddrHannes Frederic Sowa
[ Upstream commit 4b08a8f1bd8cb4541c93ec170027b4d0782dab52 ] Because of the max_addresses check attackers were able to disable privacy extensions on an interface by creating enough autoconfigured addresses: <http://seclists.org/oss-sec/2012/q4/292> But the check is not actually needed: max_addresses protects the kernel to install too many ipv6 addresses on an interface and guards addrconf_prefix_rcv to install further addresses as soon as this limit is reached. We only generate temporary addresses in direct response of a new address showing up. As soon as we filled up the maximum number of addresses of an interface, we stop installing more addresses and thus also stop generating more temp addresses. Even if the attacker tries to generate a lot of temporary addresses by announcing a prefix and removing it again (lifetime == 0) we won't install more temp addresses, because the temporary addresses do count to the maximum number of addresses, thus we would stop installing new autoconfigured addresses when the limit is reached. This patch fixes CVE-2013-0343 (but other layer-2 attacks are still possible). Thanks to Ding Tianhong to bring this topic up again. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Ding Tianhong <dingtianhong@huawei.com> Cc: George Kargiotakis <kargig@void.gr> Cc: P J P <ppandit@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14ipv6: don't stop backtracking in fib6_lookup_1 if subtree does not matchHannes Frederic Sowa
[ Upstream commit 3e3be275851bc6fc90bfdcd732cd95563acd982b ] In case a subtree did not match we currently stop backtracking and return NULL (root table from fib_lookup). This could yield in invalid routing table lookups when using subtrees. Instead continue to backtrack until a valid subtree or node is found and return this match. Also remove unneeded NULL check. Reported-by: Teco Boot <teco@inf-net.nl> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David Lamparter <equinox@diac24.net> Cc: <boutier@pps.univ-paris-diderot.fr> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-11ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanupHannes Frederic Sowa
[ Upstream commit 905a6f96a1b18e490a75f810d733ced93c39b0e5 ] Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer which leads to a panic (from Srivatsa S. Bhat): BUG: unable to handle kernel paging request at ffff882018552020 IP: [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] PGD 290a067 PUD 207ffe0067 PMD 207ff1d067 PTE 8000002018552060 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter +ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma dca mlx4_core be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 7 Comm: kworker/u33:0 Not tainted 3.11.0-rc1-ea45e-a #4 Hardware name: IBM -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012 Workqueue: netns cleanup_net task: ffff8810393641c0 ti: ffff881039366000 task.ti: ffff881039366000 RIP: 0010:[<ffffffffa0366b02>] [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] RSP: 0018:ffff881039367bd8 EFLAGS: 00010286 RAX: ffff881039367fd8 RBX: ffff882018552000 RCX: dead000000200200 RDX: 0000000000000000 RSI: ffff881039367b68 RDI: ffff881039367b68 RBP: ffff881039367bf8 R08: ffff881039367b68 R09: 2222222222222222 R10: 2222222222222222 R11: 2222222222222222 R12: ffff882015a7a040 R13: ffff882014eb89c0 R14: ffff8820289e2800 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff882018552020 CR3: 0000000001c0b000 CR4: 00000000000407f0 Stack: ffff881039367c18 ffff882014eb89c0 ffff882015e28c00 0000000000000000 ffff881039367c18 ffffffffa034d9d1 ffff8820289e2800 ffff882014eb89c0 ffff881039367c58 ffffffff815bdecb ffffffff815bddf2 ffff882014eb89c0 Call Trace: [<ffffffffa034d9d1>] rawv6_close+0x21/0x40 [ipv6] [<ffffffff815bdecb>] inet_release+0xfb/0x220 [<ffffffff815bddf2>] ? inet_release+0x22/0x220 [<ffffffffa032686f>] inet6_release+0x3f/0x50 [ipv6] [<ffffffff8151c1d9>] sock_release+0x29/0xa0 [<ffffffff81525520>] sk_release_kernel+0x30/0x70 [<ffffffffa034f14b>] icmpv6_sk_exit+0x3b/0x80 [ipv6] [<ffffffff8152fff9>] ops_exit_list+0x39/0x60 [<ffffffff815306fb>] cleanup_net+0xfb/0x1a0 [<ffffffff81075e3a>] process_one_work+0x1da/0x610 [<ffffffff81075dc9>] ? process_one_work+0x169/0x610 [<ffffffff81076390>] worker_thread+0x120/0x3a0 [<ffffffff81076270>] ? process_one_work+0x610/0x610 [<ffffffff8107da2e>] kthread+0xee/0x100 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70 [<ffffffff8162a99c>] ret_from_fork+0x7c/0xb0 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70 Code: 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 4c 8b 67 30 49 89 fd e8 db 3c 1e e1 49 8b 9c 24 90 08 00 00 48 85 db 74 06 <4c> 39 6b 20 74 20 bb f3 ff ff ff e8 8e 3c 1e e1 89 d8 4c 8b 65 RIP [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] RSP <ffff881039367bd8> CR2: ffff882018552020 Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: only static routes qualify for equal cost multipathingHannes Frederic Sowa
[ Upstream commit 307f2fb95e9b96b3577916e73d92e104f8f26494 ] Static routes in this case are non-expiring routes which did not get configured by autoconf or by icmpv6 redirects. To make sure we actually get an ecmp route while searching for the first one in this fib6_node's leafs, also make sure it matches the ecmp route assumptions. v2: a) Removed RTF_EXPIRE check in dst.from chain. The check of RTF_ADDRCONF already ensures that this route, even if added again without RTF_EXPIRES (in case of a RA announcement with infinite timeout), does not cause the rt6i_nsiblings logic to go wrong if a later RA updates the expiration time later. v3: a) Allow RTF_EXPIRES routes to enter the ecmp route set. We have to do so, because an pmtu event could update the RTF_EXPIRES flag and we would not count this route, if another route joins this set. We now filter only for RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC, which are flags that don't get changed after rt6_info construction. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: fix route selection if kernel is not compiled with CONFIG_IPV6_ROUTER_PREFHannes Frederic Sowa
[ Upstream commit afc154e978de1eb11c555bc8bcec1552f75ebc43 ] This is a follow-up patch to 3630d40067a21d4dfbadc6002bb469ce26ac5d52 ("ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available"). Since the removal of rt->n in rt6_info we can end up with a dst == NULL in rt6_check_neigh. In case the kernel is not compiled with CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown NUD state but we must not avoid doing round robin selection on routes with the same target. So introduce and pass down a boolean ``do_rr'' to indicate when we should update rt->rr_ptr. As soon as no route is valid we do backtracking and do a lookup on a higher level in the fib trie. v2: a) Improved rt6_check_neigh logic (no need to create neighbour there) and documented return values. v3: a) Introduce enum rt6_nud_state to get rid of the magic numbers (thanks to David Miller). b) Update and shorten commit message a bit to actualy reflect the source. Reported-by: Pierre Emeriaud <petrus.lt@gmail.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: in case of link failure remove route directly instead of letting it expireHannes Frederic Sowa
[ Upstream commit 1eb4f758286884e7566627164bca4c4a16952a83 ] We could end up expiring a route which is part of an ecmp route set. Doing so would invalidate the rt->rt6i_nsiblings calculations and could provoke the following panic: [ 80.144667] ------------[ cut here ]------------ [ 80.145172] kernel BUG at net/ipv6/ip6_fib.c:733! [ 80.145172] invalid opcode: 0000 [#1] SMP [ 80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables +snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk [ 80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118 [ 80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000 [ 80.145172] RIP: 0010:[<ffffffff815f3b5d>] [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 [ 80.145172] RSP: 0018:ffff880118771798 EFLAGS: 00010202 [ 80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480 [ 80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738 [ 80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001 [ 80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680 [ 80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90 [ 80.145172] FS: 00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000 [ 80.145172] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0 [ 80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 80.145172] Stack: [ 80.145172] 0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280 [ 80.145172] 0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa [ 80.145172] 0000000000000000 0000000000000000 0000000000000000 ffff88011350f680 [ 80.145172] Call Trace: [ 80.145172] [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90 [ 80.145172] [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70 [ 80.145172] [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40 [ 80.145172] [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0 [ 80.145172] [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30 [ 80.145172] [<ffffffff81616077>] fib6_rule_action+0xd7/0x210 [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 [ 80.145172] [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140 [ 80.145172] [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80 [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 [ 80.145172] [<ffffffff815edea3>] ip6_route_output+0x73/0xb0 [ 80.145172] [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0 [ 80.145172] [<ffffffff813007b1>] ? list_del+0x11/0x40 [ 80.145172] [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50 [ 80.145172] [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0 [ 80.145172] [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20 [ 80.145172] [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0 [ 80.145172] [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30 [ 80.145172] [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0 [ 80.145172] [<ffffffff81524a68>] SYSC_sendto+0x128/0x180 [ 80.145172] [<ffffffff8109825c>] ? update_curr+0xec/0x170 [ 80.145172] [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10 [ 80.145172] [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0 [ 80.145172] [<ffffffff8152509e>] SyS_sendto+0xe/0x10 [ 80.145172] [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b [ 80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53 [ 80.145172] RIP [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 [ 80.145172] RSP <ffff880118771798> [ 80.387413] ---[ end trace 02f20b7a8b81ed95 ]--- [ 80.390154] Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: rt6_check_neigh should successfully verify neigh if no NUD information ↵Hannes Frederic Sowa
are available [ Upstream commit 3630d40067a21d4dfbadc6002bb469ce26ac5d52 ] After the removal of rt->n we do not create a neighbour entry at route insertion time (rt6_bind_neighbour is gone). As long as no neighbour is created because of "useful traffic" we skip this routing entry because rt6_check_neigh cannot pick up a valid neighbour (neigh == NULL) and thus returns false. This change was introduced by commit 887c95cc1da53f66a5890fdeab13414613010097 ("ipv6: Complete neighbour entry removal from dst_entry.") To quote RFC4191: "If the host has no information about the router's reachability, then the host assumes the router is reachable." and also: "A host MUST NOT probe a router's reachability in the absence of useful traffic that the host would have sent to the router if it were reachable." So, just assume the router is reachable and let's rt6_probe do the rest. We don't need to create a neighbour on route insertion time. If we don't compile with CONFIG_IPV6_ROUTER_PREF (RFC4191 support) a neighbour is only valid if its nud_state is NUD_VALID. I did not find any references that we should probe the router on route insertion time via the other RFCs. So skip this route in that case. v2: a) use IS_ENABLED instead of #ifdefs (thanks to Sergei Shtylyov) Reported-by: Pierre Emeriaud <petrus.lt@gmail.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_sizeHannes Frederic Sowa
[ Upstream commit 75a493e60ac4bbe2e977e7129d6d8cbb0dd236be ] If the socket had an IPV6_MTU value set, ip6_append_data_mtu lost track of this when appending the second frame on a corked socket. This results in the following splat: [37598.993962] ------------[ cut here ]------------ [37598.994008] kernel BUG at net/core/skbuff.c:2064! [37598.994008] invalid opcode: 0000 [#1] SMP [37598.994008] Modules linked in: tcp_lp uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media vfat fat usb_storage fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat +nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi +scsi_transport_iscsi rfcomm bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant arc4 iwldvm mac80211 snd_hda_intel acpi_cpufreq mperf coretemp snd_hda_codec microcode cdc_wdm cdc_acm [37598.994008] snd_hwdep cdc_ether snd_seq snd_seq_device usbnet mii joydev btusb snd_pcm bluetooth i2c_i801 e1000e lpc_ich mfd_core ptp iwlwifi pps_core snd_page_alloc mei cfg80211 snd_timer thinkpad_acpi snd tpm_tis soundcore rfkill tpm tpm_bios vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc +dm_crypt i915 i2c_algo_bit drm_kms_helper drm i2c_core wmi video [37598.994008] CPU 0 [37598.994008] Pid: 27320, comm: t2 Not tainted 3.9.6-200.fc18.x86_64 #1 LENOVO 27744PG/27744PG [37598.994008] RIP: 0010:[<ffffffff815443a5>] [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330 [37598.994008] RSP: 0018:ffff88003670da18 EFLAGS: 00010202 [37598.994008] RAX: ffff88018105c018 RBX: 0000000000000004 RCX: 00000000000006c0 [37598.994008] RDX: ffff88018105a6c0 RSI: ffff88018105a000 RDI: ffff8801e1b0aa00 [37598.994008] RBP: ffff88003670da78 R08: 0000000000000000 R09: ffff88018105c040 [37598.994008] R10: ffff8801e1b0aa00 R11: 0000000000000000 R12: 000000000000fff8 [37598.994008] R13: 00000000000004fc R14: 00000000ffff0504 R15: 0000000000000000 [37598.994008] FS: 00007f28eea59740(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000 [37598.994008] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [37598.994008] CR2: 0000003d935789e0 CR3: 00000000365cb000 CR4: 00000000000407f0 [37598.994008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [37598.994008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [37598.994008] Process t2 (pid: 27320, threadinfo ffff88003670c000, task ffff88022c162ee0) [37598.994008] Stack: [37598.994008] ffff88022e098a00 ffff88020f973fc0 0000000000000008 00000000000004c8 [37598.994008] ffff88020f973fc0 00000000000004c4 ffff88003670da78 ffff8801e1b0a200 [37598.994008] 0000000000000018 00000000000004c8 ffff88020f973fc0 00000000000004c4 [37598.994008] Call Trace: [37598.994008] [<ffffffff815fc21f>] ip6_append_data+0xccf/0xfe0 [37598.994008] [<ffffffff8158d9f0>] ? ip_copy_metadata+0x1a0/0x1a0 [37598.994008] [<ffffffff81661f66>] ? _raw_spin_lock_bh+0x16/0x40 [37598.994008] [<ffffffff8161548d>] udpv6_sendmsg+0x1ed/0xc10 [37598.994008] [<ffffffff812a2845>] ? sock_has_perm+0x75/0x90 [37598.994008] [<ffffffff815c3693>] inet_sendmsg+0x63/0xb0 [37598.994008] [<ffffffff812a2973>] ? selinux_socket_sendmsg+0x23/0x30 [37598.994008] [<ffffffff8153a450>] sock_sendmsg+0xb0/0xe0 [37598.994008] [<ffffffff810135d1>] ? __switch_to+0x181/0x4a0 [37598.994008] [<ffffffff8153d97d>] sys_sendto+0x12d/0x180 [37598.994008] [<ffffffff810dfb64>] ? __audit_syscall_entry+0x94/0xf0 [37598.994008] [<ffffffff81020ed1>] ? syscall_trace_enter+0x231/0x240 [37598.994008] [<ffffffff8166a7e7>] tracesys+0xdd/0xe2 [37598.994008] Code: fe 07 00 00 48 c7 c7 04 28 a6 81 89 45 a0 4c 89 4d b8 44 89 5d a8 e8 1b ac b1 ff 44 8b 5d a8 4c 8b 4d b8 8b 45 a0 e9 cf fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 48 [37598.994008] RIP [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330 [37598.994008] RSP <ffff88003670da18> [37599.007323] ---[ end trace d69f6a17f8ac8eee ]--- While there, also check if path mtu discovery is activated for this socket. The logic was adapted from ip6_append_data when first writing on the corked socket. This bug was introduced with commit 0c1833797a5a6ec23ea9261d979aa18078720b74 ("ipv6: fix incorrect ipsec fragment"). v2: a) Replace IPV6_PMTU_DISC_DO with IPV6_PMTUDISC_PROBE. b) Don't pass ipv6_pinfo to ip6_append_data_mtu (suggestion by Gao feng, thanks!). c) Change mtu to unsigned int, else we get a warning about non-matching types because of the min()-macro type-check. Acked-by: Gao feng <gaofeng@cn.fujitsu.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET ↵Hannes Frederic Sowa
pending data [ Upstream commit 8822b64a0fa64a5dd1dfcf837c5b0be83f8c05d1 ] We accidentally call down to ip6_push_pending_frames when uncorking pending AF_INET data on a ipv6 socket. This results in the following splat (from Dave Jones): skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:126! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37 task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000 RIP: 0010:[<ffffffff816e759c>] [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282 RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006 RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520 RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800 R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800 FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Stack: ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0 Call Trace: [<ffffffff8159a9aa>] skb_push+0x3a/0x40 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0 [<ffffffff816f5d54>] tracesys+0xdd/0xe2 Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 RIP [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP <ffff8801e6431de8> This patch adds a check if the pending data is of address family AF_INET and directly calls udp_push_ending_frames from udp_v6_push_pending_frames if that is the case. This bug was found by Dave Jones with trinity. (Also move the initialization of fl6 below the AF_INET check, even if not strictly necessary.) Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Dave Jones <davej@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6,mcast: always hold idev->lock before mca_lockAmerigo Wang
[ Upstream commit 8965779d2c0e6ab246c82a405236b1fb2adae6b2, with some bits from commit b7b1bfce0bb68bd8f6e62a28295922785cc63781 ("ipv6: split duplicate address detection and router solicitation timer") to get the __ipv6_get_lladdr() used by this patch. ] dingtianhong reported the following deadlock detected by lockdep: ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.24.05-0.1-default #1 Not tainted ------------------------------------------------------- ksoftirqd/0/3 is trying to acquire lock: (&ndev->lock){+.+...}, at: [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120 but task is already holding lock: (&mc->mca_lock){+.+...}, at: [<ffffffff8149d130>] mld_send_report+0x40/0x150 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&mc->mca_lock){+.+...}: [<ffffffff810a8027>] validate_chain+0x637/0x730 [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500 [<ffffffff810a8734>] lock_acquire+0x114/0x150 [<ffffffff814f691a>] rt_spin_lock+0x4a/0x60 [<ffffffff8149e4bb>] igmp6_group_added+0x3b/0x120 [<ffffffff8149e5d8>] ipv6_mc_up+0x38/0x60 [<ffffffff81480a4d>] ipv6_find_idev+0x3d/0x80 [<ffffffff81483175>] addrconf_notify+0x3d5/0x4b0 [<ffffffff814fae3f>] notifier_call_chain+0x3f/0x80 [<ffffffff81073471>] raw_notifier_call_chain+0x11/0x20 [<ffffffff813d8722>] call_netdevice_notifiers+0x32/0x60 [<ffffffff813d92d4>] __dev_notify_flags+0x34/0x80 [<ffffffff813d9360>] dev_change_flags+0x40/0x70 [<ffffffff813ea627>] do_setlink+0x237/0x8a0 [<ffffffff813ebb6c>] rtnl_newlink+0x3ec/0x600 [<ffffffff813eb4d0>] rtnetlink_rcv_msg+0x160/0x310 [<ffffffff814040b9>] netlink_rcv_skb+0x89/0xb0 [<ffffffff813eb357>] rtnetlink_rcv+0x27/0x40 [<ffffffff81403e20>] netlink_unicast+0x140/0x180 [<ffffffff81404a9e>] netlink_sendmsg+0x33e/0x380 [<ffffffff813c4252>] sock_sendmsg+0x112/0x130 [<ffffffff813c537e>] __sys_sendmsg+0x44e/0x460 [<ffffffff813c5544>] sys_sendmsg+0x44/0x70 [<ffffffff814feab9>] system_call_fastpath+0x16/0x1b -> #0 (&ndev->lock){+.+...}: [<ffffffff810a798e>] check_prev_add+0x3de/0x440 [<ffffffff810a8027>] validate_chain+0x637/0x730 [<ffffffff810a8417>] __lock_acquire+0x2f7/0x500 [<ffffffff810a8734>] lock_acquire+0x114/0x150 [<ffffffff814f6c82>] rt_read_lock+0x42/0x60 [<ffffffff8147f804>] ipv6_get_lladdr+0x74/0x120 [<ffffffff8149b036>] mld_newpack+0xb6/0x160 [<ffffffff8149b18b>] add_grhead+0xab/0xc0 [<ffffffff8149d03b>] add_grec+0x3ab/0x460 [<ffffffff8149d14a>] mld_send_report+0x5a/0x150 [<ffffffff8149f99e>] igmp6_timer_handler+0x4e/0xb0 [<ffffffff8105705a>] call_timer_fn+0xca/0x1d0 [<ffffffff81057b9f>] run_timer_softirq+0x1df/0x2e0 [<ffffffff8104e8c7>] handle_pending_softirqs+0xf7/0x1f0 [<ffffffff8104ea3b>] __do_softirq_common+0x7b/0xf0 [<ffffffff8104f07f>] __thread_do_softirq+0x1af/0x210 [<ffffffff8104f1c1>] run_ksoftirqd+0xe1/0x1f0 [<ffffffff8106c7de>] kthread+0xae/0xc0 [<ffffffff814fff74>] kernel_thread_helper+0x4/0x10 actually we can just hold idev->lock before taking pmc->mca_lock, and avoid taking idev->lock again when iterating idev->addr_list, since the upper callers of mld_newpack() already take read_lock_bh(&idev->lock). Reported-by: dingtianhong <dingtianhong@huawei.com> Cc: dingtianhong <dingtianhong@huawei.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Ding Tianhong <dingtianhong@huawei.com> Tested-by: Chen Weilong <chenweilong@huawei.com> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-28ipv6: only apply anti-spoofing checks to not-pointopoint tunnelsHannes Frederic Sowa
[ Upstream commit 5c29fb12e8fb8a8105ea048cb160fd79a85a52bb ] Because of commit 218774dc341f219bfcf940304a081b121a0e8099 ("ipv6: add anti-spoofing checks for 6to4 and 6rd") the sit driver dropped packets for 2002::/16 destinations and sources even when configured to work as a tunnel with fixed endpoint. We may only apply the 6rd/6to4 anti-spoofing checks if the device is not in pointopoint mode. This was an oversight from me in the above commit, sorry. Thanks to Roman Mamedov for reporting this! Reported-by: Roman Mamedov <rm@romanrm.ru> Cc: David Miller <davem@davemloft.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-26ipv6: ip6_sk_dst_check() must not assume ipv6 dstEric Dumazet
It's possible to use AF_INET6 sockets and to connect to an IPv4 destination. After this, socket dst cache is a pointer to a rtable, not rt6_info. ip6_sk_dst_check() should check the socket dst cache is IPv6, or else various corruptions/crashes can happen. Dave Jones can reproduce immediate crash with trinity -q -l off -n -c sendmsg -c connect With help from Hannes Frederic Sowa Reported-by: Dave Jones <davej@redhat.com> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25ipv6: check return value of ipv6_get_lladdrHannes Frederic Sowa
We should check the return value of ipv6_get_lladdr in inet6_set_iftoken. A possible situation, which could leave ll_addr unassigned is, when the user removed her link-local address but a global scoped address was already set. In this case the interface would still be IF_READY and not dead. In that case the RS source address is some value from the stack. v2: Daniel Borkmann noted a small indent inconstancy; no semantic changes. Cc: Daniel Borkmann <dborkman@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Reviewed-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-25ipv6: Process unicast packet with Router Alert by checking flag in skb.YOSHIFUJI Hideaki / 吉藤英明
Router Alert option is marked in skb. Previously, IP6CB(skb)->ra was set to positive value for such packets. Since commit dd3332bf ("ipv6: Store Router Alert option in IP6CB directly."), IP6SKB_ROUTERALERT is set in IP6CB(skb)->flags, and the value of Router Alert option (in network byte order) is set to IP6CB(skb)->ra for such packets. Multicast forwarding path uses that flag and value, but unicast forwarding path does not use the flag and misuses IP6CB(skb)->ra value. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-24Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains five fixes for Netfilter/IPVS, they are: * A skb leak fix in fragmentation handling in case that helpers are in place, it occurs since the IPV6 NAT infrastructure, from Phil Oester. * Fix SCTP port mangling in ICMP packets for IPVS, from Julian Anastasov. * Fix event delivery in ctnetlink regarding the new connlabel infrastructure, from Florian Westphal. * Fix mangling in the SIP NAT helper, from Balazs Peter Odor. * Fix crash in ipt_ULOG introduced while adding netnamespace support, from Gao Feng. I'll take care of passing several of these patches to -stable once they hit Linus' tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-20netfilter: nf_conntrack_ipv6: Plug sk_buff leak in fragment handlingPhil Oester
In commit 4cdd3408 ("netfilter: nf_conntrack_ipv6: improve fragmentation handling"), an sk_buff leak was introduced when dealing with reassembled packets by grabbing a reference to the original skb instead of the reassembled skb. At this point, the leak only impacted conntracks with an associated helper. In commit 58a317f1 ("netfilter: ipv6: add IPv6 NAT support"), the bug was expanded to include all reassembled packets with unconfirmed conntracks. Fix this by grabbing a reference to the proper reassembled skb. This closes netfilter bugzilla #823. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-19ipv6: don't call addrconf_dst_alloc again when enable loGao feng
If we disable all of the net interfaces, and enable un-lo interface before lo interface, we already allocated the addrconf dst in ipv6_add_addr. So we shouldn't allocate it again when we enable lo interface. Otherwise the message below will be triggered. unregister_netdevice: waiting for sit1 to become free. Usage count = 1 This problem is introduced by commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f "net IPv6 : Fix broken IPv6 routing table after loopback down-up" Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-17ipv6: ndisc: fix ndisc_send_redirect writing to the wrong skbMatthias Schiffer
Since some refactoring in 5f5a011, ndisc_send_redirect called ndisc_fill_redirect_hdr_option on the wrong skb, leading to data corruption or in the worst case a panic when the skb_put failed. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04ipv6: assign rt6_info to inet6_ifaddr in init_loopbackGao feng
Commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f "net IPv6 : Fix broken IPv6 routing table after loopback down-up" forgot to assign rt6_info to the inet6_ifaddr. When disable the net device, the rt6_info which allocated in init_loopback will not be destroied in __ipv6_ifa_notify. This will trigger the waring message below [23527.916091] unregister_netdevice: waiting for tap0 to become free. Usage count = 1 Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31udp6: Fix udp fragmentation for tunnel traffic.Pravin B Shelar
udp6 over GRE tunnel does not work after to GRE tso changes. GRE tso handler passes inner packet but keeps track of outer header start in SKB_GSO_CB(skb)->mac_offset. udp6 fragment need to take care of outer header, which start at the mac_offset, while adding fragment header. This bug is introduced by commit 68c3316311 (GRE: Add TCP segmentation offload for GRE). Reported-by: Dmitry Kravkov <dkravkov@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Tested-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-31snmp6: remove IPSTATS_MIB_CSUMERRORSNicolas Dichtel
This stat is not relevant in IPv6, there is no checksum in IPv6 header. Just leave a comment to explain the hole. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-23netfilter: add nf_ipv6_ops hook to fix xt_addrtype with IPv6Florian Westphal
Quoting https://bugzilla.netfilter.org/show_bug.cgi?id=812: [ ip6tables -m addrtype ] When I tried to use in the nat/PREROUTING it messes up the routing cache even if the rule didn't matched at all. [..] If I remove the --limit-iface-in from the non-working scenario, so just use the -m addrtype --dst-type LOCAL it works! This happens when LOCAL type matching is requested with --limit-iface-in, and the default ipv6 route is via the interface the packet we test arrived on. Because xt_addrtype uses ip6_route_output, the ipv6 routing implementation creates an unwanted cached entry, and the packet won't make it to the real/expected destination. Silently ignoring --limit-iface-in makes the routing work but it breaks rule matching (--dst-type LOCAL with limit-iface-in is supposed to only match if the dst address is configured on the incoming interface; without --limit-iface-in it will match if the address is reachable via lo). The test should call ipv6_chk_addr() instead. However, this would add a link-time dependency on ipv6. There are two possible solutions: 1) Revert the commit that moved ipt_addrtype to xt_addrtype, and put ipv6 specific code into ip6t_addrtype. 2) add new "nf_ipv6_ops" struct to register pointers to ipv6 functions. While the former might seem preferable, Pablo pointed out that there are more xt modules with link-time dependeny issues regarding ipv6, so lets go for 2). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-05-18ipv6: fix possible crashes in ip6_cork_release()Eric Dumazet
commit 0178b695fd6b4 ("ipv6: Copy cork options in ip6_append_data") added some code duplication and bad error recovery, leading to potential crash in ip6_cork_release() as kfree() could be called with garbage. use kzalloc() to make sure this wont happen. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Neal Cardwell <ncardwell@google.com>
2013-05-11xfrm6: release dev before returning errorCong Wang
We forget to call dev_put() on error path in xfrm6_fill_dst(), its caller doesn't handle this. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-11ipv6,gre: do not leak info to user-spaceAmerigo Wang
There is a hole in struct ip6_tnl_parm2, so we have to zero the struct on stack before copying it to user-space. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-11ipv6: do not clear pinet6 fieldEric Dumazet
We have seen multiple NULL dereferences in __inet6_lookup_established() After analysis, I found that inet6_sk() could be NULL while the check for sk_family == AF_INET6 was true. Bug was added in linux-2.6.29 when RCU lookups were introduced in UDP and TCP stacks. Once an IPv6 socket, using SLAB_DESTROY_BY_RCU is inserted in a hash table, we no longer can clear pinet6 field. This patch extends logic used in commit fcbdf09d9652c891 ("net: fix nulls list corruptions in sk_prot_alloc") TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method to make sure we do not clear pinet6 field. At socket clone phase, we do not really care, as cloning the parent (non NULL) pinet6 is not adding a fatal race. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...
2013-05-01proc: Supply a function to remove a proc entry by PDEDavid Howells
Supply a function (proc_remove()) to remove a proc entry (and any subtree rooted there) by proc_dir_entry pointer rather than by name and (optionally) root dir entry pointer. This allows us to eliminate all remaining pde->name accesses outside of procfs. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Grant Likely <grant.likely@linaro.or> cc: linux-acpi@vger.kernel.org cc: openipmi-developer@lists.sourceforge.net cc: devicetree-discuss@lists.ozlabs.org cc: linux-pci@vger.kernel.org cc: netdev@vger.kernel.org cc: netfilter-devel@vger.kernel.org cc: alsa-devel@alsa-project.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-29net: Add MIB counters for checksum errorsEric Dumazet
Add MIB counters for checksum errors in IP layer, and TCP/UDP/ICMP layers, to help diagnose problems. $ nstat -a | grep Csum IcmpInCsumErrors 72 0.0 TcpInCsumErrors 382 0.0 UdpInCsumErrors 463221 0.0 Icmp6InCsumErrors 75 0.0 Udp6InCsumErrors 173442 0.0 IpExtInCsumErrors 10884 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-29ipv6: Kill ipv6 dependency of icmpv6_send().Pravin B Shelar
Following patch adds icmp-registration module for ipv6. It allows ipv6 protocol to register icmp_sender which is used for sending ipv6 icmp msgs. This extra layer allows us to kill ipv6 dependency for sending icmp packets. This patch also fixes ip_tunnel compilation problem when ip_tunnel is statically compiled in kernel but ipv6 is module Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/emulex/benet/be_main.c drivers/net/ethernet/intel/igb/igb_main.c drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c include/net/scm.h net/batman-adv/routing.c net/ipv4/tcp_input.c The e{uid,gid} --> {uid,gid} credentials fix conflicted with the cleanup in net-next to now pass cred structs around. The be2net driver had a bug fix in 'net' that overlapped with the VLAN interface changes by Patrick McHardy in net-next. An IGB conflict existed because in 'net' the build_skb() support was reverted, and in 'net-next' there was a comment style fix within that code. Several batman-adv conflicts were resolved by making sure that all calls to batadv_is_my_mac() are changed to have a new bat_priv first argument. Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO rewrite in 'net-next', mostly overlapping changes. Thanks to Stephen Rothwell and Antonio Quartulli for help with several of these merge resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== The following patchset contains a small batch of Netfilter updates for your net-next tree, they are: * Three patches that provide more accurate error reporting to user-space, instead of -EPERM, in IPv4/IPv6 netfilter re-routing code and NAT, from Patrick McHardy. * Update copyright statements in Netfilter filters of Patrick McHardy, from himself. * Add Kconfig dependency on the raw/mangle tables to the rpfilter, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== If time allows, please consider pulling the following patchset contains two late Netfilter fixes, they are: * Skip broadcast/multicast locally generated traffic in the rpfilter, (closes netfilter bugzilla #814), from Florian Westphal. * Fix missing elements in the listing of ipset bitmap ip,mac set type with timeout support enabled, from Jozsef Kadlecsik. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netfilter: xt_rpfilter: depend on raw or mangle tableFlorian Westphal
rpfilter is only valid in raw/mangle PREROUTING, i.e. RPFILTER=y|m is useless without raw or mangle table support. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-19netfilter: xt_rpfilter: skip locally generated broadcast/multicast, tooFlorian Westphal
Alex Efros reported rpfilter module doesn't match following packets: IN=br.qemu SRC=192.168.2.1 DST=192.168.2.255 [ .. ] (netfilter bugzilla #814). Problem is that network stack arranges for the locally generated broadcasts to appear on the interface they were sent out, so the IFF_LOOPBACK check doesn't trigger. As -m rpfilter is restricted to PREROUTING, we can check for existing rtable instead, it catches locally-generated broad/multicast case, too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>