summaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)Author
2015-04-29ipv6: Don't reduce hop limit for an interfaceD.S. Ljungmark
[ Upstream commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a ] A local route may have a lower hop_limit set than global routes do. RFC 3756, Section 4.2.7, "Parameter Spoofing" > 1. The attacker includes a Current Hop Limit of one or another small > number which the attacker knows will cause legitimate packets to > be dropped before they reach their destination. > As an example, one possible approach to mitigate this threat is to > ignore very small hop limits. The nodes could implement a > configurable minimum hop limit, and ignore attempts to set it below > said limit. Signed-off-by: D.S. Ljungmark <ljungmark@modio.se> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29tcp: prevent fetching dst twice in early demux codeMichal Kubeček
[ Upstream commit d0c294c53a771ae7e84506dfbd8c18c30f078735 ] On s390x, gcc 4.8 compiles this part of tcp_v6_early_demux() struct dst_entry *dst = sk->sk_rx_dst; if (dst) dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie); to code reading sk->sk_rx_dst twice, once for the test and once for the argument of ip6_dst_check() (dst_check() is inline). This allows ip6_dst_check() to be called with null first argument, causing a crash. Protect sk->sk_rx_dst access by ACCESS_ONCE() both in IPv4 and IPv6 TCP early demux code. Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.") Fixes: c7109986db3c ("ipv6: Early TCP socket demux") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-18udp: only allow UFO for packets from SOCK_DGRAM socketsMichal Kubeček
[ Upstream commit acf8dd0a9d0b9e4cdb597c2f74802f79c699e802 ] If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer checksum is to be computed on segmentation. However, in this case, skb->csum_start and skb->csum_offset are never set as raw socket transmit path bypasses udp_send_skb() where they are usually set. As a result, driver may access invalid memory when trying to calculate the checksum and store the result (as observed in virtio_net driver). Moreover, the very idea of modifying the userspace provided UDP header is IMHO against raw socket semantics (I wasn't able to find a document clearly stating this or the opposite, though). And while allowing CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit too intrusive change just to handle a corner case like this. Therefore disallowing UFO for packets from SOCK_DGRAM seems to be the best option. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-18ipv6: fix ipv6_cow_metrics for non DST_HOST caseMartin KaFai Lau
[ Upstream commit 3b4711757d7903ab6fa88a9e7ab8901b8227da60 ] ipv6_cow_metrics() currently assumes only DST_HOST routes require dynamic metrics allocation from inetpeer. The assumption breaks when ndisc discovered router with RTAX_MTU and RTAX_HOPLIMIT metric. Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set() is called after the route is created. This patch creates the metrics array (by calling dst_cow_metrics_generic) in ipv6_cow_metrics(). Test: radvd.conf: interface qemubr0 { AdvLinkMTU 1300; AdvCurHopLimit 30; prefix fd00:face:face:face::/64 { AdvOnLink on; AdvAutonomous on; AdvRouterAddr off; }; }; Before: [root@qemu1 ~]# ip -6 r show | egrep -v unreachable fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec fe80::/64 dev eth0 proto kernel metric 256 default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec After: [root@qemu1 ~]# ip -6 r show | egrep -v unreachable fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec mtu 1300 fe80::/64 dev eth0 proto kernel metric 256 mtu 1300 default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec mtu 1300 hoplimit 30 Fixes: 8e2ec639173f325 (ipv6: don't use inetpeer to store metrics for routes.) Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-26ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos tooHannes Frederic Sowa
[ Upstream commit 6e9e16e6143b725662e47026a1d0f270721cdd24 ] Lubomir Rintel reported that during replacing a route the interface reference counter isn't correctly decremented. To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>: | [root@rhel7-5 lkundrak]# sh -x lal | + ip link add dev0 type dummy | + ip link set dev0 up | + ip link add dev1 type dummy | + ip link set dev1 up | + ip addr add 2001:db8:8086::2/64 dev dev0 | + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20 | + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10 | + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20 | + ip link del dev0 type dummy | Message from syslogd@rhel7-5 at Jan 23 10:54:41 ... | kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 | | Message from syslogd@rhel7-5 at Jan 23 10:54:51 ... | kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2 During replacement of a rt6_info we must walk all parent nodes and check if the to be replaced rt6_info got propagated. If so, replace it with an alive one. Fixes: 4a287eba2de3957 ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag") Reported-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Tested-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-26ipv6: stop sending PTB packets for MTU < 1280Hagen Paul Pfeifer
[ Upstream commit 9d289715eb5c252ae15bd547cb252ca547a3c4f2 ] Reduce the attack vector and stop generating IPv6 Fragment Header for paths with an MTU smaller than the minimum required IPv6 MTU size (1280 byte) - called atomic fragments. See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1] for more information and how this "feature" can be misused. [1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00 Signed-off-by: Fernando Gont <fgont@si6networks.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-26ip: zero sockaddr returned on error queueWillem de Bruijn
[ Upstream commit f812116b174e59a350acc8e4856213a166a91222 ] The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That structure is defined and allocated on the stack as struct { struct sock_extended_err ee; struct sockaddr_in(6) offender; } errhdr; The second part is only initialized for certain SO_EE_ORIGIN values. Always initialize it completely. An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that would return uninitialized bytes. Signed-off-by: Willem de Bruijn <willemb@google.com> ---- Also verified that there is no padding between errhdr.ee and errhdr.offender that could leak additional kernel data. Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16ipv6: gre: fix wrong skb->protocol in WCCPYuri Chislov
[ Upstream commit be6572fdb1bfbe23b2624d477de50af50b02f5d6 ] When using GRE redirection in WCCP, it sets the wrong skb->protocol, that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic. Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Cc: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Yuri Chislov <yuri.chislov@gmail.com> Tested-by: Yuri Chislov <yuri.chislov@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21gre6: Move the setting of dev->iflink into the ndo_init functions.Steffen Klassert
[ Upstream commit f03eb128e3f4276f46442d14f3b8f864f3775821 ] Otherwise it gets overwritten by register_netdev(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.Steffen Klassert
[ Upstream commit 6c6151daaf2d8dc2046d9926539feed5f66bf74e ] ip6_tnl_dev_init() sets the dev->iflink via a call to ip6_tnl_link_config(). After that, register_netdevice() sets dev->iflink = -1. So we loose the iflink configuration for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the ndo_init function. Then ip6_tnl_dev_init() is called after dev->iflink is set to -1 from register_netdevice(). Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15ipv6: reallocate addrconf router for ipv6 address when lo device upGao feng
[ Upstream commit 33d99113b1102c2d2f8603b9ba72d89d915c13f5 ] commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f "net IPv6 : Fix broken IPv6 routing table after loopback down-up" allocates addrconf router for ipv6 address when lo device up. but commit a881ae1f625c599b460cc8f8a7fcb1c438f699ad "ipv6:don't call addrconf_dst_alloc again when enable lo" breaks this behavior. Since the addrconf router is moved to the garbage list when lo device down, we should release this router and rellocate a new one for ipv6 address when lo device up. This patch solves bug 67951 on bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=67951 change from v1: use ip6_rt_put to repleace ip6_del_rt, thanks Hannes! change code style, suggested by Sergei. CC: Sabrina Dubroca <sd@queasysnail.net> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15ip6_gre: fix flowi6_proto value in xmit pathNicolas Dichtel
[ Upstream commit 3be07244b7337760a3269d56b2f4a63e72218648 ] In xmit path, we build a flowi6 which will be used for the output route lookup. We are sending a GRE packet, neither IPv4 nor IPv6 encapsulated packet, thus the protocol should be IPPROTO_GRE. Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Reported-by: Matthieu Ternisien d'Ouville <matthieu.tdo@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()Neal Cardwell
[ Upstream commit 4fab9071950c2021d846e18351e0f46a1cffd67b ] Make sure we use the correct address-family-specific function for handling MTU reductions from within tcp_release_cb(). Previously AF_INET6 sockets were incorrectly always using the IPv6 code path when sometimes they were handling IPv4 traffic and thus had an IPv4 dst. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Willem de Bruijn <willemb@google.com> Fixes: 563d34d057862 ("tcp: dont drop MTU reduction indications") Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15sit: Fix ipip6_tunnel_lookup device matching criteriaShmulik Ladkani
[ Upstream commit bc8fc7b8f825ef17a0fb9e68c18ce94fa66ab337 ] As of 4fddbf5d78 ("sit: strictly restrict incoming traffic to tunnel link device"), when looking up a tunnel, tunnel's underlying interface (t->parms.link) is verified to match incoming traffic's ingress device. However the comparison was incorrectly based on skb->dev->iflink. Instead, dev->ifindex should be used, which correctly represents the interface from which the IP stack hands the ipip6 packets. This allows setting up sit tunnels bound to vlan interfaces (otherwise incoming ipip6 traffic on the vlan interface was dropped due to ipip6_tunnel_lookup match failure). Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14ip: make IP identifiers less predictableEric Dumazet
[ Upstream commit 04ca6973f7c1a0d8537f2d9906a0cf8e69886d75 ] In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and Jedidiah describe ways exploiting linux IP identifier generation to infer whether two machines are exchanging packets. With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we changed IP id generation, but this does not really prevent this side-channel technique. This patch adds a random amount of perturbation so that IP identifiers for a given destination [1] are no longer monotonically increasing after an idle period. Note that prandom_u32_max(1) returns 0, so if generator is used at most once per jiffy, this patch inserts no hole in the ID suite and do not increase collision probability. This is jiffies based, so in the worst case (HZ=1000), the id can rollover after ~65 seconds of idle time, which should be fine. We also change the hash used in __ip_select_ident() to not only hash on daddr, but also saddr and protocol, so that ICMP probes can not be used to infer information for other protocols. For IPv6, adds saddr into the hash as well, but not nexthdr. If I ping the patched target, we can see ID are now hard to predict. 21:57:11.008086 IP (...) A > target: ICMP echo request, seq 1, length 64 21:57:11.010752 IP (... id 2081 ...) target > A: ICMP echo reply, seq 1, length 64 21:57:12.013133 IP (...) A > target: ICMP echo request, seq 2, length 64 21:57:12.015737 IP (... id 3039 ...) target > A: ICMP echo reply, seq 2, length 64 21:57:13.016580 IP (...) A > target: ICMP echo request, seq 3, length 64 21:57:13.019251 IP (... id 3437 ...) target > A: ICMP echo reply, seq 3, length 64 [1] TCP sessions uses a per flow ID generator not changed by this patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu> Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu> Cc: Willy Tarreau <w@1wt.eu> Cc: Hannes Frederic Sowa <hannes@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-14inetpeer: get rid of ip_id_countEric Dumazet
[ Upstream commit 73f156a6e8c1074ac6327e0abd1169e95eb66463 ] Ideally, we would need to generate IP ID using a per destination IP generator. linux kernels used inet_peer cache for this purpose, but this had a huge cost on servers disabling MTU discovery. 1) each inet_peer struct consumes 192 bytes 2) inetpeer cache uses a binary tree of inet_peer structs, with a nominal size of ~66000 elements under load. 3) lookups in this tree are hitting a lot of cache lines, as tree depth is about 20. 4) If server deals with many tcp flows, we have a high probability of not finding the inet_peer, allocating a fresh one, inserting it in the tree with same initial ip_id_count, (cf secure_ip_id()) 5) We garbage collect inet_peer aggressively. IP ID generation do not have to be 'perfect' Goal is trying to avoid duplicates in a short period of time, so that reassembly units have a chance to complete reassembly of fragments belonging to one message before receiving other fragments with a recycled ID. We simply use an array of generators, and a Jenkin hash using the dst IP as a key. ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it belongs (it is only used from this file) secure_ip_id() and secure_ipv6_id() no longer are needed. Rename ip_select_ident_more() to ip_select_ident_segs() to avoid unnecessary decrement/increment of the number of segments. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26ipip, sit: fix ipv4_{update_pmtu,redirect} callsDmitry Popov
[ Upstream commit 2346829e641b804ece9ac9298136b56d9567c278 ] ipv4_{update_pmtu,redirect} were called with tunnel's ifindex (t->dev is a tunnel netdevice). It caused wrong route lookup and failure of pmtu update or redirect. We should use the same ifindex that we use in ip_route_output_* in *tunnel_xmit code. It is t->parms.link . Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: fix inet_getid() and ipv6_select_ident() bugsEric Dumazet
[ Upstream commit 39c36094d78c39e038c1e499b2364e13bce36f54 ] I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery is disabled. Note how GSO/TSO packets do not have monotonically incrementing ID. 06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396) 06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212) 06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972) 06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292) 06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764) It appears I introduced this bug in linux-3.1. inet_getid() must return the old value of peer->ip_id_count, not the new one. Lets revert this part, and remove the prevention of a null identification field in IPv6 Fragment Extension Header, which is dubious and not even done properly. Fixes: 87c48fa3b463 ("ipv6: make fragment identifications less predictable") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: tunnels - enable module autoloadingTom Gundersen
[ Upstream commit f98f89a0104454f35a62d681683c844f6dbf4043 ] Enable the module alias hookup to allow tunnel modules to be autoloaded on demand. This is in line with how most other netdev kinds work, and will allow userspace to create tunnels without having CAP_SYS_MODULE. Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11netfilter: Fix potential use after free in ip6_route_me_harder()Sergey Popovich
commit a8951d5814e1373807a94f79f7ccec7041325470 upstream. Dst is released one line before we access it again with dst->error. Fixes: 58e35d147128 netfilter: ipv6: propagate routing errors from ip6_route_me_harder() Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30ip6_tunnel: fix potential NULL pointer dereferenceSusant Sahani
[ Upstream commit c8965932a2e3b70197ec02c6741c29460279e2a8 ] The function ip6_tnl_validate assumes that the rtnl attribute IFLA_IPTUN_PROTO always be filled . If this attribute is not filled by the userspace application kernel get crashed with NULL pointer dereference. This patch fixes the potential kernel crash when IFLA_IPTUN_PROTO is missing . Signed-off-by: Susant Sahani <susant@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30net: ipv6: send pkttoobig immediately if orig frag size > mtuFlorian Westphal
[ Upstream commit 418a31561d594a2b636c1e2fa94ecd9e1245abb1 ] If conntrack defragments incoming ipv6 frags it stores largest original frag size in ip6cb and sets ->local_df. We must thus first test the largest original frag size vs. mtu, and not vice versa. Without this patch PKTTOOBIG is still generated in ip6_fragment() later in the stack, but 1) IPSTATS_MIB_INTOOBIGERRORS won't increment 2) packet did (needlessly) traverse netfilter postrouting hook. Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30ipv6: fib: fix fib dump restartKumar Sundararajan
[ Upstream commit 1c2658545816088477e91860c3a645053719cb54 ] When the ipv6 fib changes during a table dump, the walk is restarted and the number of nodes dumped are skipped. But the existing code doesn't advance to the next node after a node is skipped. This can cause the dump to loop or produce lots of duplicates when the fib is modified during the dump. This change advances the walk to the next node if the current node is skipped after a restart. Signed-off-by: Kumar Sundararajan <kumar@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30ip6_gre: don't allow to remove the fb_tunnel_devNicolas Dichtel
[ Upstream commit 54d63f787b652755e66eb4dd8892ee6d3f5197fc ] It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but this is unsafe, the module always supposes that this device exists. For example, ip6gre_tunnel_lookup() may use it unconditionally. Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we let ip6gre_destroy_tunnels() do the job). Introduced by commit c12b395a4664 ("gre: Support GRE over IPv6"). CC: Dmitry Kozlov <xeb@mail.ru> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30ipv6: Limit mtu to 65575 bytesEric Dumazet
[ Upstream commit 30f78d8ebf7f514801e71b88a10c948275168518 ] Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-30netfilter: Can't fail and free after table replacementThomas Graf
commit c58dd2dd443c26d856a168db108a0cd11c285bf3 upstream. All xtables variants suffer from the defect that the copy_to_user() to copy the counters to user memory may fail after the table has already been exchanged and thus exposed. Return an error at this point will result in freeing the already exposed table. Any subsequent packet processing will result in a kernel panic. We can't copy the counters before exposing the new tables as we want provide the counter state after the old table has been unhooked. Therefore convert this into a silent error. Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14ipv6: some ipv6 statistic counters failed to disable bhHannes Frederic Sowa
[ Upstream commit 43a43b6040165f7b40b5b489fe61a4cb7f8c4980 ] After commit c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify processing to workqueue") some counters are now updated in process context and thus need to disable bh before doing so, otherwise deadlocks can happen on 32-bit archs. Fabio Estevam noticed this while while mounting a NFS volume on an ARM board. As a compensation for missing this I looked after the other *_STATS_BH and found three other calls which need updating: 1) icmp6_send: ip6_fragment -> icmpv6_send -> icmp6_send (error handling) 2) ip6_push_pending_frames: rawv6_sendmsg -> rawv6_push_pending_frames -> ... (only in case of icmp protocol with raw sockets in error handling) 3) ping6_v6_sendmsg (error handling) Fixes: c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify processing to workqueue") Reported-by: Fabio Estevam <festevam@gmail.com> Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14ip6mr: fix mfc notification flagsNicolas Dichtel
[ Upstream commit f518338b16038beeb73e155e60d0f70beb9379f4 ] Commit 812e44dd1829 ("ip6mr: advertise new mfc entries via rtnl") reuses the function ip6mr_fill_mroute() to notify mfc events. But this function was used only for dump and thus was always setting the flag NLM_F_MULTI, which is wrong in case of a single notification. Libraries like libnl will wait forever for NLMSG_DONE. CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14ipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properlylucien
[ Upstream commit e367c2d03dba4c9bcafad24688fadb79dd95b218 ] In ip6_append_data_mtu(), when the xfrm mode is not tunnel(such as transport),the ipsec header need to be added in the first fragment, so the mtu will decrease to reserve space for it, then the second fragment come, the mtu should be turn back, as the commit 0c1833797a5a6ec23ea9261d979aa18078720b74 said. however, in the commit a493e60ac4bbe2e977e7129d6d8cbb0dd236be, it use *mtu = min(*mtu, ...) to change the mtu, which lead to the new mtu is alway equal with the first fragment's. and cannot turn back. when I test through ping6 -c1 -s5000 $ip (mtu=1280): ...frag (0|1232) ESP(spi=0x00002000,seq=0xb), length 1232 ...frag (1232|1216) ...frag (2448|1216) ...frag (3664|1216) ...frag (4880|164) which should be: ...frag (0|1232) ESP(spi=0x00001000,seq=0x1), length 1232 ...frag (1232|1232) ...frag (2464|1232) ...frag (3696|1232) ...frag (4928|116) so delete the min() when change back the mtu. Signed-off-by: Xin Long <lucien.xin@gmail.com> Fixes: 75a493e60ac4bb ("ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size") Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14ipv6: Avoid unnecessary temporary addresses being generatedHeiner Kallweit
[ Upstream commit ecab67015ef6e3f3635551dcc9971cf363cc1cd5 ] tmp_prefered_lft is an offset to ifp->tstamp, not now. Therefore age needs to be added to the condition. Age calculation in ipv6_create_tempaddr is different from the one in addrconf_verify and doesn't consider ADDRCONF_TIMER_FUZZ_MINUS. This can cause age in ipv6_create_tempaddr to be less than the one in addrconf_verify and therefore unnecessary temporary address to be generated. Use age calculation as in addrconf_modify to avoid this. Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14ipv6: don't set DST_NOCOUNT for remotely added routesSabrina Dubroca
[ Upstream commit c88507fbad8055297c1d1e21e599f46960cbee39 ] DST_NOCOUNT should only be used if an authorized user adds routes locally. In case of routes which are added on behalf of router advertisments this flag must not get used as it allows an unlimited number of routes getting added remotely. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14ipv6: Fix exthdrs offload registration.Anton Nayshtut
[ Upstream commit d2d273ffabd315eecefce21a4391d44b6e156b73 ] Without this fix, ipv6_exthdrs_offload_init doesn't register IPPROTO_DSTOPTS offload, but returns 0 (as the IPPROTO_ROUTING registration actually succeeds). This then causes the ipv6_gso_segment to drop IPv6 packets with IPPROTO_DSTOPTS header. The issue detected and the fix verified by running MS HCK Offload LSO test on top of QEMU Windows guests, as this test sends IPv6 packets with IPPROTO_DSTOPTS. Signed-off-by: Anton Nayshtut <anton@swortex.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23ipv6: ipv6_find_hdr restore prev functionalityHans Schillstrom
[ Upstream commit accfe0e356327da5bd53da8852b93fc22de9b5fc ] The commit 9195bb8e381d81d5a315f911904cdf0cfcc919b8 ("ipv6: improve ipv6_find_hdr() to skip empty routing headers") broke ipv6_find_hdr(). When a target is specified like IPPROTO_ICMPV6 ipv6_find_hdr() returns -ENOENT when it's found, not the header as expected. A part of IPVS is broken and possible also nft_exthdr_eval(). When target is -1 which it is most cases, it works. This patch exits the do while loop if the specific header is found so the nexthdr could be returned as expected. Reported-by: Art -kwaak- van Breemen <ard@telegraafnet.nl> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> CC:Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23ipv6: reuse ip6_frag_id from ip6_ufo_append_dataHannes Frederic Sowa
[ Upstream commit 916e4cf46d0204806c062c8c6c4d1f633852c5b6 ] Currently we generate a new fragmentation id on UFO segmentation. It is pretty hairy to identify the correct net namespace and dst there. Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst available at all. This causes unreliable or very predictable ipv6 fragmentation id generation while segmentation. Luckily we already have pregenerated the ip6_frag_id in ip6_ufo_append_data and can use it here. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-06net: ip, ipv6: handle gso skbs in forwarding pathFlorian Westphal
commit fe6cc55f3a9a053482a76f5a6b2257cee51b4663 upstream. Marcelo Ricardo Leitner reported problems when the forwarding link path has a lower mtu than the incoming one if the inbound interface supports GRO. Given: Host <mtu1500> R1 <mtu1200> R2 Host sends tcp stream which is routed via R1 and R2. R1 performs GRO. In this case, the kernel will fail to send ICMP fragmentation needed messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu checks in forward path. Instead, Linux tries to send out packets exceeding the mtu. When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does not fragment the packets when forwarding, and again tries to send out packets exceeding R1-R2 link mtu. This alters the forwarding dstmtu checks to take the individual gso segment lengths into account. For ipv6, we send out pkt too big error for gso if the individual segments are too big. For ipv4, we either send icmp fragmentation needed, or, if the DF bit is not set, perform software segmentation and let the output path create fragments when the packet is leaving the machine. It is not 100% correct as the error message will contain the headers of the GRO skb instead of the original/segmented one, but it seems to work fine in my (limited) tests. Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid sofware segmentation. However it turns out that skb_segment() assumes skb nr_frags is related to mss size so we would BUG there. I don't want to mess with it considering Herbert and Eric disagree on what the correct behavior should be. Hannes Frederic Sowa notes that when we would shrink gso_size skb_segment would then also need to deal with the case where SKB_MAX_FRAGS would be exceeded. This uses sofware segmentation in the forward path when we hit ipv4 non-DF packets and the outgoing link mtu is too small. Its not perfect, but given the lack of bug reports wrt. GRO fwd being broken this is a rare case anyway. Also its not like this could not be improved later once the dust settles. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06ip6tnl: fix double free of fb_tnl_dev on exitNicolas Dichtel
[ No relevant upstream commit. ] This problem was fixed upstream by commit 1e9f3d6f1c40 ("ip6tnl: fix use after free of fb_tnl_dev"). The upstream patch depends on upstream commit 0bd8762824e7 ("ip6tnl: add x-netns support"), which was not backported into 3.10 branch. First, explain the problem: when the ip6_tunnel module is unloaded, ip6_tunnel_cleanup() is called. rmmod ip6_tunnel => ip6_tunnel_cleanup() => rtnl_link_unregister() => __rtnl_kill_links() => for_each_netdev(net, dev) { if (dev->rtnl_link_ops == ops) ops->dellink(dev, &list_kill); } At this point, the FB device is deleted (and all ip6tnl tunnels). => unregister_pernet_device() => unregister_pernet_operations() => ops_exit_list() => ip6_tnl_exit_net() => ip6_tnl_destroy_tunnels() => t = rtnl_dereference(ip6n->tnls_wc[0]); unregister_netdevice_queue(t->dev, &list); We delete the FB device a second time here! The previous fix removes these lines, which fix this double free. But the patch introduces a memory leak when a netns is destroyed, because the FB device is never deleted. By adding an rtnl ops which delete all ip6tnl device excepting the FB device, we can keep this exlicit removal in ip6_tnl_destroy_tunnels(). CC: Steven Rostedt <rostedt@goodmis.org> CC: Willem de Bruijn <willemb@google.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: Steven Rostedt <srostedt@redhat.com> Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team) Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Tested-by: John Kacur <jkacur@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06Revert "ip6tnl: fix use after free of fb_tnl_dev"Nicolas Dichtel
[ No relevant upstream commit. ] This reverts commit 22c3ec552c29cf4bd4a75566088950fe57d860c4. This patch is not the right fix, it introduces a memory leak when a netns is destroyed (the FB device is never deleted). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: Steven Rostedt <srostedt@redhat.com> Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team) Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Tested-by: John Kacur <jkacur@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06sit: fix double free of fb_tunnel_dev on exitNicolas Dichtel
[ No relevant upstream commit. ] This problem was fixed upstream by commit 9434266f2c64 ("sit: fix use after free of fb_tunnel_dev"). The upstream patch depends on upstream commit 5e6700b3bf98 ("sit: add support of x-netns"), which was not backported into 3.10 branch. First, explain the problem: when the sit module is unloaded, sit_cleanup() is called. rmmod sit => sit_cleanup() => rtnl_link_unregister() => __rtnl_kill_links() => for_each_netdev(net, dev) { if (dev->rtnl_link_ops == ops) ops->dellink(dev, &list_kill); } At this point, the FB device is deleted (and all sit tunnels). => unregister_pernet_device() => unregister_pernet_operations() => ops_exit_list() => sit_exit_net() => sit_destroy_tunnels() In this function, no tunnel is found. => unregister_netdevice_queue(sitn->fb_tunnel_dev, &list); We delete the FB device a second time here! Because we cannot simply remove the second deletion (sit_exit_net() must remove the FB device when a netns is deleted), we add an rtnl ops which delete all sit device excepting the FB device and thus we can keep the explicit deletion in sit_exit_net(). CC: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Willem de Bruijn <willemb@google.com> Reported-by: Steven Rostedt <srostedt@redhat.com> Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team) Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Tested-by: John Kacur <jkacur@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06net: Fix memory leak if TPROXY used with TCP early demuxHolger Eitzenberger
[ Upstream commit a452ce345d63ddf92cd101e4196569f8718ad319 ] I see a memory leak when using a transparent HTTP proxy using TPROXY together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable): unreferenced object 0xffff88008cba4a40 (size 1696): comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s) hex dump (first 32 bytes): 0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00 .. j@..7..2..... 02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9 [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5 [<ffffffff812702cf>] sk_clone_lock+0x14/0x283 [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b [<ffffffff8129a893>] netlink_broadcast+0x14/0x16 [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3 [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0 [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55 [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725 [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154 [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514 [<ffffffff8127aa77>] process_backlog+0xee/0x1c5 [<ffffffff8127c949>] net_rx_action+0xa7/0x200 [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157 But there are many more, resulting in the machine going OOM after some days. From looking at the TPROXY code, and with help from Florian, I see that the memory leak is introduced in tcp_v4_early_demux(): void tcp_v4_early_demux(struct sk_buff *skb) { /* ... */ iph = ip_hdr(skb); th = tcp_hdr(skb); if (th->doff < sizeof(struct tcphdr) / 4) return; sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo, iph->saddr, th->source, iph->daddr, ntohs(th->dest), skb->skb_iif); if (sk) { skb->sk = sk; where the socket is assigned unconditionally to skb->sk, also bumping the refcnt on it. This is problematic, because in our case the skb has already a socket assigned in the TPROXY target. This then results in the leak I see. The very same issue seems to be with IPv6, but haven't tested. Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06net: avoid reference counter overflows on fib_rules in multicast forwardingHannes Frederic Sowa
[ Upstream commit 95f4a45de1a0f172b35451fc52283290adb21f6e ] Bob Falken reported that after 4G packets, multicast forwarding stopped working. This was because of a rule reference counter overflow which freed the rule as soon as the overflow happend. This patch solves this by adding the FIB_LOOKUP_NOREF flag to fib_rules_lookup calls. This is safe even from non-rcu locked sections as in this case the flag only implies not taking a reference to the rule, which we don't need at all. Rules only hold references to the namespace, which are guaranteed to be available during the call of the non-rcu protected function reg_vif_xmit because of the interface reference which itself holds a reference to the net namespace. Fixes: f0ad0860d01e47 ("ipv4: ipmr: support multiple tables") Fixes: d1db275dd3f6e4 ("ipv6: ip6mr: support multiple tables") Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Thomas Graf <tgraf@suug.ch> Cc: Julian Anastasov <ja@ssi.bg> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-15ipv6: always set the new created dst's from in ip6_rt_copyLi RongQing
[ Upstream commit 24f5b855e17df7e355eacd6c4a12cc4d6a6c9ff0 ] ip6_rt_copy only sets dst.from if ort has flag RTF_ADDRCONF and RTF_DEFAULT. but the prefix routes which did get installed by hand locally can have an expiration, and no any flag combination which can ensure a potential from does never expire, so we should always set the new created dst's from. This also fixes the new created dst is always expired since the ort, which is created by RA, maybe has RTF_EXPIRES and RTF_ADDRCONF, but no RTF_DEFAULT. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-15ipv6: fix illegal mac_header comparison on 32bitHannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-15ipv6: don't count addrconf generated routes against gc limitHannes Frederic Sowa
[ Upstream commit a3300ef4bbb1f1e33ff0400e1e6cf7733d988f4f ] Brett Ciphery reported that new ipv6 addresses failed to get installed because the addrconf generated dsts where counted against the dst gc limit. We don't need to count those routes like we currently don't count administratively added routes. Because the max_addresses check enforces a limit on unbounded address generation first in case someone plays with router advertisments, we are still safe here. Reported-by: Brett Ciphery <brett.ciphery@windriver.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-15IPv6: Fixed support for blackhole and prohibit routesKamala R
[ Upstream commit 7150aede5dd241539686e17d9592f5ebd28a2cda ] The behaviour of blackhole and prohibit routes has been corrected by setting the input and output pointers of the dst variable appropriately. For blackhole routes, they are set to dst_discard and to ip6_pkt_discard and ip6_pkt_discard_out respectively for prohibit routes. ipv6: ip6_pkt_prohibit(_out) should not depend on CONFIG_IPV6_MULTIPLE_TABLES We need ip6_pkt_prohibit(_out) available without CONFIG_IPV6_MULTIPLE_TABLES Signed-off-by: Kamala R <kamala@aristanetworks.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-20ip6tnl: fix use after free of fb_tnl_devNicolas Dichtel
The upstream commit bb8140947a24 ("ip6tnl: allow to use rtnl ops on fb tunnel") (backported into linux-3.10.y) left a bug which was fixed upstream by commit 1e9f3d6f1c40 ("ip6tnl: fix use after free of fb_tnl_dev"). The problem is a bit different in linux-3.10.y, because there is no x-netns support (upstream commit 0bd8762824e7 ("ip6tnl: add x-netns support")). When ip6_tunnel.ko is unloaded, FB device is deleted by rtnl_link_unregister() and then we try to delete it again in ip6_tnl_destroy_tunnels(). This patch removes the second deletion. Reported-by: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08ipv6: fix possible seqlock deadlock in ip6_finish_output2Hannes Frederic Sowa
[ Upstream commit 7f88c6b23afbd31545c676dea77ba9593a1a14bf ] IPv6 stats are 64 bits and thus are protected with a seqlock. By not disabling bottom-half we could deadlock here if we don't disable bh and a softirq reentrantly updates the same mib. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08netfilter: push reasm skb through instead of original frag skbsJiri Pirko
[ Upstream commit 6aafeef03b9d9ecf255f3a80ed85ee070260e1ae ] Pushing original fragments through causes several problems. For example for matching, frags may not be matched correctly. Take following example: <example> On HOSTA do: ip6tables -I INPUT -p icmpv6 -j DROP ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT and on HOSTB you do: ping6 HOSTA -s2000 (MTU is 1500) Incoming echo requests will be filtered out on HOSTA. This issue does not occur with smaller packets than MTU (where fragmentation does not happen) </example> As was discussed previously, the only correct solution seems to be to use reassembled skb instead of separete frags. Doing this has positive side effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams dances in ipvs and conntrack can be removed. Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c entirely and use code in net/ipv6/reassembly.c instead. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08ip6_output: fragment outgoing reassembled skb properlyJiri Pirko
[ Upstream commit 9037c3579a277f3a23ba476664629fda8c35f7c4 ] If reassembled packet would fit into outdev MTU, it is not fragmented according the original frag size and it is send as single big packet. The second case is if skb is gso. In that case fragmentation does not happen according to the original frag size. This patch fixes these. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08ipv6: fix leaking uninitialized port number of offender sockaddrHannes Frederic Sowa
[ Upstream commit 1fa4c710b6fe7b0aac9907240291b6fe6aafc3b8 ] Offenders don't have port numbers, so set it to 0. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu ↵Hannes Frederic Sowa
functions [ Upstream commit 85fbaa75037d0b6b786ff18658ddf0b4014ce2a4 ] Commit bceaa90240b6019ed73b49965eac7d167610be69 ("inet: prevent leakage of uninitialized memory to user in recv syscalls") conditionally updated addr_len if the msg_name is written to. The recv_error and rxpmtu functions relied on the recvmsg functions to set up addr_len before. As this does not happen any more we have to pass addr_len to those functions as well and set it to the size of the corresponding sockaddr length. This broke traceroute and such. Fixes: bceaa90240b6 ("inet: prevent leakage of uninitialized memory to user in recv syscalls") Reported-by: Brad Spengler <spender@grsecurity.net> Reported-by: Tom Labanowski Cc: mpb <mpb.mail@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>