summaryrefslogtreecommitdiff
path: root/net/openvswitch
AgeCommit message (Collapse)Author
2019-12-21openvswitch: support asymmetric conntrackAaron Conole
[ Upstream commit 5d50aa83e2c8e91ced2cca77c198b468ca9210f4 ] The openvswitch module shares a common conntrack and NAT infrastructure exposed via netfilter. It's possible that a packet needs both SNAT and DNAT manipulation, due to e.g. tuple collision. Netfilter can support this because it runs through the NAT table twice - once on ingress and again after egress. The openvswitch module doesn't have such capability. Like netfilter hook infrastructure, we should run through NAT twice to keep the symmetry. Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05openvswitch: remove another BUG_ON()Paolo Abeni
[ Upstream commit 8a574f86652a4540a2433946ba826ccb87f398cc ] If we can't build the flow del notification, we can simply delete the flow, no need to crash the kernel. Still keep a WARN_ON to preserve debuggability. Note: the BUG_ON() predates the Fixes tag, but this change can be applied only after the mentioned commit. v1 -> v2: - do not leak an skb on error Fixes: aed067783e50 ("openvswitch: Minimize ovs_flow_cmd_del critical section.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()Paolo Abeni
[ Upstream commit 8ffeb03fbba3b599690b361467bfd2373e8c450f ] All the callers of ovs_flow_cmd_build_info() already deal with error return code correctly, so we can handle the error condition in a more gracefull way. Still dump a warning to preserve debuggability. v1 -> v2: - clarify the commit message - clean the skb and report the error (DaveM) Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-05openvswitch: fix flow command message sizePaolo Abeni
[ Upstream commit 4e81c0b3fa93d07653e2415fa71656b080a112fd ] When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant flow has no UFID, we can exceed the computed size, as ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY attribute. Take the above in account when computing the flow command message size. Fixes: 74ed7ab9264c ("openvswitch: Add support for unique flow IDs.") Reported-by: Qi Jun Ding <qding@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-25net: ovs: fix return type of ndo_start_xmit functionYueHaibing
[ Upstream commit eddf11e18dff0e8671e06ce54e64cfc843303ab9 ] The method ndo_start_xmit() is defined as returning an 'netdev_tx_t', which is a typedef for an enum type, so make sure the implementation in this driver has returns 'netdev_tx_t' value, and change the function return type to netdev_tx_t. Found by coccinelle. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05openvswitch: change type of UPCALL_PID attribute to NLA_UNSPECLi RongQing
[ Upstream commit ea8564c865299815095bebeb4b25bef474218e4c ] userspace openvswitch patch "(dpif-linux: Implement the API functions to allow multiple handler threads read upcall)" changes its type from U32 to UNSPEC, but leave the kernel unchanged and after kernel 6e237d099fac "(netlink: Relax attr validation for fixed length types)", this bug is exposed by the below warning [ 57.215841] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length. Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's") Signed-off-by: Li RongQing <lirongqing@baidu.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04net: openvswitch: fix csum updates for MPLS actionsJohn Hurley
[ Upstream commit 0e3183cd2a64843a95b62f8bd4a83605a4cf0615 ] Skbs may have their checksum value populated by HW. If this is a checksum calculated over the entire packet then the CHECKSUM_COMPLETE field is marked. Changes to the data pointer on the skb throughout the network stack still try to maintain this complete csum value if it is required through functions such as skb_postpush_rcsum. The MPLS actions in Open vSwitch modify a CHECKSUM_COMPLETE value when changes are made to packet data without a push or a pull. This occurs when the ethertype of the MAC header is changed or when MPLS lse fields are modified. The modification is carried out using the csum_partial function to get the csum of a buffer and add it into the larger checksum. The buffer is an inversion of the data to be removed followed by the new data. Because the csum is calculated over 16 bits and these values align with 16 bits, the effect is the removal of the old value from the CHECKSUM_COMPLETE and addition of the new value. However, the csum fed into the function and the outcome of the calculation are also inverted. This would only make sense if it was the new value rather than the old that was inverted in the input buffer. Fix the issue by removing the bit inverts in the csum_partial calculation. The bug was verified and the fix tested by comparing the folded value of the updated CHECKSUM_COMPLETE value with the folded value of a full software checksum calculation (reset skb->csum to 0 and run skb_checksum_complete(skb)). Prior to the fix the outcomes differed but after they produce the same result. Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel") Fixes: bc7cc5999fd3 ("openvswitch: update checksum in {push,pop}_mpls") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-02ipv6: remove dependency of nf_defrag_ipv6 on ipv6 moduleFlorian Westphal
[ Upstream commit 70b095c84326640eeacfd69a411db8fc36e8ab1a ] IPV6=m DEFRAG_IPV6=m CONNTRACK=y yields: net/netfilter/nf_conntrack_proto.o: In function `nf_ct_netns_do_get': net/netfilter/nf_conntrack_proto.c:802: undefined reference to `nf_defrag_ipv6_enable' net/netfilter/nf_conntrack_proto.o:(.rodata+0x640): undefined reference to `nf_conntrack_l4proto_icmpv6' Setting DEFRAG_IPV6=y causes undefined references to ip6_rhash_params ip6_frag_init and ip6_expire_frag_queue so it would be needed to force IPV6=y too. This patch gets rid of the 'followup linker error' by removing the dependency of ipv6.ko symbols from netfilter ipv6 defrag. Shared code is placed into a header, then used from both. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-17openvswitch: fix flow actions reallocationAndrea Righi
[ Upstream commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb ] The flow action buffer can be resized if it's not big enough to contain all the requested flow actions. However, this resize doesn't take into account the new requested size, the buffer is only increased by a factor of 2x. This might be not enough to contain the new data, causing a buffer overflow, for example: [ 42.044472] ============================================================================= [ 42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten [ 42.046415] ----------------------------------------------------------------------------- [ 42.047715] Disabling lock debugging due to kernel taint [ 42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc [ 42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 flags=0x2808101 [ 42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb [ 42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc ........ [ 42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 00 kkkkkkkk....l... [ 42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 f6 l...........x... [ 42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 00 ............... [ 42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [ 42.059061] Redzone 8bf2c4a5: 00 00 00 00 .... [ 42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ Fix by making sure the new buffer is properly resized to contain all the requested data. BugLink: https://bugs.launchpad.net/bugs/1813244 Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-31openvswitch: Avoid OOB read when parsing flow nlattrsRoss Lagerwall
[ Upstream commit 04a4af334b971814eedf4e4a413343ad3287d9a9 ] For nested and variable attributes, the expected length of an attribute is not known and marked by a negative number. This results in an OOB read when the expected length is later used to check if the attribute is all zeros. Fix this by using the actual length of the attribute rather than the expected length. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30openvswitch: Remove padding from packet before L3+ conntrack processingEd Swierk
[ Upstream commit 9382fe71c0058465e942a633869629929102843d ] IPv4 and IPv6 packets may arrive with lower-layer padding that is not included in the L3 length. For example, a short IPv4 packet may have up to 6 bytes of padding following the IP payload when received on an Ethernet device with a minimum packet length of 64 bytes. Higher-layer processing functions in netfilter (e.g. nf_ip_checksum(), and help() in nf_conntrack_ftp) assume skb->len reflects the length of the L3 header and payload, rather than referring back to ip_hdr->tot_len or ipv6_hdr->payload_len, and get confused by lower-layer padding. In the normal IPv4 receive path, ip_rcv() trims the packet to ip_hdr->tot_len before invoking netfilter hooks. In the IPv6 receive path, ip6_rcv() does the same using ipv6_hdr->payload_len. Similarly in the br_netfilter receive path, br_validate_ipv4() and br_validate_ipv6() trim the packet to the L3 length before invoking netfilter hooks. Currently in the OVS conntrack receive path, ovs_ct_execute() pulls the skb to the L3 header but does not trim it to the L3 length before calling nf_conntrack_in(NF_INET_PRE_ROUTING). When nf_conntrack_proto_tcp encounters a packet with lower-layer padding, nf_ip_checksum() fails causing a "nf_ct_tcp: bad TCP checksum" log message. While extra zero bytes don't affect the checksum, the length in the IP pseudoheader does. That length is based on skb->len, and without trimming, it doesn't match the length the sender used when computing the checksum. In ovs_ct_execute(), trim the skb to the L3 length before higher-layer processing. Signed-off-by: Ed Swierk <eswierk@skyportsystems.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-19openvswitch: Don't swap table in nlattr_set() after OVS_ATTR_NESTED is foundStefano Brivio
[ Upstream commit 72f17baf2352ded6a1d3f4bb2d15da8c678cd2cb ] If an OVS_ATTR_NESTED attribute type is found while walking through netlink attributes, we call nlattr_set() recursively passing the length table for the following nested attributes, if different from the current one. However, once we're done with those sub-nested attributes, we should continue walking through attributes using the current table, instead of using the one related to the sub-nested attributes. For example, given this sequence: 1 OVS_KEY_ATTR_PRIORITY 2 OVS_KEY_ATTR_TUNNEL 3 OVS_TUNNEL_KEY_ATTR_ID 4 OVS_TUNNEL_KEY_ATTR_IPV4_SRC 5 OVS_TUNNEL_KEY_ATTR_IPV4_DST 6 OVS_TUNNEL_KEY_ATTR_TTL 7 OVS_TUNNEL_KEY_ATTR_TP_SRC 8 OVS_TUNNEL_KEY_ATTR_TP_DST 9 OVS_KEY_ATTR_IN_PORT 10 OVS_KEY_ATTR_SKB_MARK 11 OVS_KEY_ATTR_MPLS we switch to the 'ovs_tunnel_key_lens' table on attribute #3, and we don't switch back to 'ovs_key_lens' while setting attributes #9 to #11 in the sequence. As OVS_KEY_ATTR_MPLS evaluates to 21, and the array size of 'ovs_tunnel_key_lens' is 15, we also get this kind of KASan splat while accessing the wrong table: [ 7654.586496] ================================================================== [ 7654.594573] BUG: KASAN: global-out-of-bounds in nlattr_set+0x164/0xde9 [openvswitch] [ 7654.603214] Read of size 4 at addr ffffffffc169ecf0 by task handler29/87430 [ 7654.610983] [ 7654.612644] CPU: 21 PID: 87430 Comm: handler29 Kdump: loaded Not tainted 3.10.0-866.el7.test.x86_64 #1 [ 7654.623030] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016 [ 7654.631379] Call Trace: [ 7654.634108] [<ffffffffb65a7c50>] dump_stack+0x19/0x1b [ 7654.639843] [<ffffffffb53ff373>] print_address_description+0x33/0x290 [ 7654.647129] [<ffffffffc169b37b>] ? nlattr_set+0x164/0xde9 [openvswitch] [ 7654.654607] [<ffffffffb53ff812>] kasan_report.part.3+0x242/0x330 [ 7654.661406] [<ffffffffb53ff9b4>] __asan_report_load4_noabort+0x34/0x40 [ 7654.668789] [<ffffffffc169b37b>] nlattr_set+0x164/0xde9 [openvswitch] [ 7654.676076] [<ffffffffc167ef68>] ovs_nla_get_match+0x10c8/0x1900 [openvswitch] [ 7654.684234] [<ffffffffb61e9cc8>] ? genl_rcv+0x28/0x40 [ 7654.689968] [<ffffffffb61e7733>] ? netlink_unicast+0x3f3/0x590 [ 7654.696574] [<ffffffffc167dea0>] ? ovs_nla_put_tunnel_info+0xb0/0xb0 [openvswitch] [ 7654.705122] [<ffffffffb4f41b50>] ? unwind_get_return_address+0xb0/0xb0 [ 7654.712503] [<ffffffffb65d9355>] ? system_call_fastpath+0x1c/0x21 [ 7654.719401] [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370 [ 7654.726298] [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370 [ 7654.733195] [<ffffffffb53fe4b5>] ? kasan_unpoison_shadow+0x35/0x50 [ 7654.740187] [<ffffffffb53fe62a>] ? kasan_kmalloc+0xaa/0xe0 [ 7654.746406] [<ffffffffb53fec32>] ? kasan_slab_alloc+0x12/0x20 [ 7654.752914] [<ffffffffb53fe711>] ? memset+0x31/0x40 [ 7654.758456] [<ffffffffc165bf92>] ovs_flow_cmd_new+0x2b2/0xf00 [openvswitch] [snip] [ 7655.132484] The buggy address belongs to the variable: [ 7655.138226] ovs_tunnel_key_lens+0xf0/0xffffffffffffd400 [openvswitch] [ 7655.145507] [ 7655.147166] Memory state around the buggy address: [ 7655.152514] ffffffffc169eb80: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa [ 7655.160585] ffffffffc169ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 7655.168644] >ffffffffc169ec80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa [ 7655.176701] ^ [ 7655.184372] ffffffffc169ed00: fa fa fa fa 00 00 00 00 fa fa fa fa 00 00 00 05 [ 7655.192431] ffffffffc169ed80: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 [ 7655.200490] ================================================================== Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: 982b52700482 ("openvswitch: Fix mask generation for nested attributes.") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24openvswitch: Delete conntrack entry clashing with an expectation.Jarno Rajahalme
[ Upstream commit cf5d70918877c6a6655dc1e92e2ebb661ce904fd ] Conntrack helpers do not check for a potentially clashing conntrack entry when creating a new expectation. Also, nf_conntrack_in() will check expectations (via init_conntrack()) only if a conntrack entry can not be found. The expectation for a packet which also matches an existing conntrack entry will not be removed by conntrack, and is currently handled inconsistently by OVS, as OVS expects the expectation to be removed when the connection tracking entry matching that expectation is confirmed. It should be noted that normally an IP stack would not allow reuse of a 5-tuple of an old (possibly lingering) connection for a new data connection, so this is somewhat unlikely corner case. However, it is possible that a misbehaving source could cause conntrack entries be created that could then interfere with new related connections. Fix this in the OVS module by deleting the clashing conntrack entry after an expectation has been matched. This causes the following nf_conntrack_in() call also find the expectation and remove it when creating the new conntrack entry, as well as the forthcoming reply direction packets to match the new related connection instead of the old clashing conntrack entry. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Yang Song <yangsong@vmware.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-03openvswitch: fix the incorrect flow action alloc sizezhangliping
[ Upstream commit 67c8d22a73128ff910e2287567132530abcf5b71 ] If we want to add a datapath flow, which has more than 500 vxlan outputs' action, we will get the following error reports: openvswitch: netlink: Flow action size 32832 bytes exceeds max openvswitch: netlink: Flow action size 32832 bytes exceeds max openvswitch: netlink: Actions may not be safe on all matching packets ... ... It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but this is not the root cause. For example, for a vxlan output action, we need about 60 bytes for the nlattr, but after it is converted to the flow action, it only occupies 24 bytes. This means that we can still support more than 1000 vxlan output actions for a single datapath flow under the the current 32k max limitation. So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we shouldn't report EINVAL and keep it move on, as the judgement can be done by the reserve_sfa_size. Signed-off-by: zhangliping <zhangliping02@baidu.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-30openvswitch: fix skb_panic due to the incorrect actions attrlenLiping Zhang
[ Upstream commit 494bea39f3201776cdfddc232705f54a0bd210c4 ] For sw_flow_actions, the actions_len only represents the kernel part's size, and when we dump the actions to the userspace, we will do the convertions, so it's true size may become bigger than the actions_len. But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len to alloc the skbuff, so the user_skb's size may become insufficient and oops will happen like this: skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head: ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:129! [...] Call Trace: <IRQ> [<ffffffff8148be82>] skb_put+0x43/0x44 [<ffffffff8148fabf>] skb_zerocopy+0x6c/0x1f4 [<ffffffffa0290d36>] queue_userspace_packet+0x3a3/0x448 [openvswitch] [<ffffffffa0292023>] ovs_dp_upcall+0x30/0x5c [openvswitch] [<ffffffffa028d435>] output_userspace+0x132/0x158 [openvswitch] [<ffffffffa01e6890>] ? ip6_rcv_finish+0x74/0x77 [ipv6] [<ffffffffa028e277>] do_execute_actions+0xcc1/0xdc8 [openvswitch] [<ffffffffa028e3f2>] ovs_execute_actions+0x74/0x106 [openvswitch] [<ffffffffa0292130>] ovs_dp_process_packet+0xe1/0xfd [openvswitch] [<ffffffffa0292b77>] ? key_extract+0x63c/0x8d5 [openvswitch] [<ffffffffa029848b>] ovs_vport_receive+0xa1/0xc3 [openvswitch] [...] Also we can find that the actions_len is much little than the orig_len: crash> struct sw_flow_actions 0xffff8812f539d000 struct sw_flow_actions { rcu = { next = 0xffff8812f5398800, func = 0xffffe3b00035db32 }, orig_len = 1384, actions_len = 592, actions = 0xffff8812f539d01c } So as a quick fix, use the orig_len instead of the actions_len to alloc the user_skb. Last, this oops happened on our system running a relative old kernel, but the same risk still exists on the mainline, since we use the wrong actions_len from the beginning. Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace") Cc: Neil McKee <neil.mckee@inmon.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11openvswitch: fix potential out of bound access in parse_ctLiping Zhang
[ Upstream commit 69ec932e364b1ba9c3a2085fe96b76c8a3f71e7c ] Before the 'type' is validated, we shouldn't use it to fetch the ovs_ct_attr_lens's minlen and maxlen, else, out of bound access may happen. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30openvswitch: Add missing case OVS_TUNNEL_KEY_ATTR_PADKris Murphy
[ Upstream commit 8f3dbfd79ed9ef9770305a7cc4e13dfd31ad2cd0 ] Added a case for OVS_TUNNEL_KEY_ATTR_PAD to the switch statement in ip_tun_from_nlattr in order to prevent the default case returning an error. Fixes: b46f6ded906e ("libnl: nla_put_be64(): align on a 64-bit area") Signed-off-by: Kris Murphy <kriskend@linux.vnet.ibm.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-30net/openvswitch: Set the ipv6 source tunnel key address attribute correctlyOr Gerlitz
[ Upstream commit 3d20f1f7bd575d147ffa75621fa560eea0aec690 ] When dealing with ipv6 source tunnel key address attribute (OVS_TUNNEL_KEY_ATTR_IPV6_SRC) we are wrongly setting the tunnel dst ip, fix that. Fixes: 6b26ba3a7d95 ('openvswitch: netlink attributes for IPv6 tunneling') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Paul Blakey <paulb@mellanox.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22ipv6: orphan skbs in reassembly unitEric Dumazet
[ Upstream commit 48cac18ecf1de82f76259a54402c3adb7839ad01 ] Andrey reported a use-after-free in IPv6 stack. Issue here is that we free the socket while it still has skb in TX path and in some queues. It happens here because IPv6 reassembly unit messes skb->truesize, breaking skb_set_owner_w() badly. We fixed a similar issue for IPV4 in commit 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()") Acked-by: Joe Stringer <joe@ovn.org> ================================================================== BUG: KASAN: use-after-free in sock_wfree+0x118/0x120 Read of size 8 at addr ffff880062da0060 by task a.out/4140 page:ffffea00018b6800 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x100000000008100(slab|head) raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013 raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000 page dumped because: kasan: bad access detected CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 dump_stack+0x292/0x398 lib/dump_stack.c:51 describe_address mm/kasan/report.c:262 kasan_report_error+0x121/0x560 mm/kasan/report.c:370 kasan_report mm/kasan/report.c:392 __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413 sock_flag ./arch/x86/include/asm/bitops.h:324 sock_wfree+0x118/0x120 net/core/sock.c:1631 skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655 skb_release_all+0x15/0x60 net/core/skbuff.c:668 __kfree_skb+0x15/0x20 net/core/skbuff.c:684 kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 inet_frag_put ./include/net/inet_frag.h:133 nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617 ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 nf_hook_entry_hookfn ./include/linux/netfilter.h:102 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 nf_hook ./include/linux/netfilter.h:212 __ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742 rawv6_push_pending_frames net/ipv6/raw.c:613 rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 sock_sendmsg+0xca/0x110 net/socket.c:645 sock_write_iter+0x326/0x620 net/socket.c:848 new_sync_write fs/read_write.c:499 __vfs_write+0x483/0x760 fs/read_write.c:512 vfs_write+0x187/0x530 fs/read_write.c:560 SYSC_write fs/read_write.c:607 SyS_write+0xfb/0x230 fs/read_write.c:599 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203 RIP: 0033:0x7ff26e6f5b79 RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79 RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003 RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003 The buggy address belongs to the object at ffff880062da0000 which belongs to the cache RAWv6 of size 1504 The buggy address ffff880062da0060 is located 96 bytes inside of 1504-byte region [ffff880062da0000, ffff880062da05e0) Freed by task 4113: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track mm/kasan/kasan.c:514 kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578 slab_free_hook mm/slub.c:1352 slab_free_freelist_hook mm/slub.c:1374 slab_free mm/slub.c:2951 kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973 sk_prot_free net/core/sock.c:1377 __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452 sk_destruct+0x47/0x80 net/core/sock.c:1460 __sk_free+0x57/0x230 net/core/sock.c:1468 sk_free+0x23/0x30 net/core/sock.c:1479 sock_put ./include/net/sock.h:1638 sk_common_release+0x31e/0x4e0 net/core/sock.c:2782 rawv6_close+0x54/0x80 net/ipv6/raw.c:1214 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431 sock_release+0x8d/0x1e0 net/socket.c:599 sock_close+0x16/0x20 net/socket.c:1063 __fput+0x332/0x7f0 fs/file_table.c:208 ____fput+0x15/0x20 fs/file_table.c:244 task_work_run+0x19b/0x270 kernel/task_work.c:116 exit_task_work ./include/linux/task_work.h:21 do_exit+0x186b/0x2800 kernel/exit.c:839 do_group_exit+0x149/0x420 kernel/exit.c:943 SYSC_exit_group kernel/exit.c:954 SyS_exit_group+0x1d/0x20 kernel/exit.c:952 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203 Allocated by task 4115: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track mm/kasan/kasan.c:514 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544 slab_post_alloc_hook mm/slab.h:432 slab_alloc_node mm/slub.c:2708 slab_alloc mm/slub.c:2716 kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721 sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334 sk_alloc+0x105/0x1010 net/core/sock.c:1396 inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183 __sock_create+0x4f6/0x880 net/socket.c:1199 sock_create net/socket.c:1239 SYSC_socket net/socket.c:1269 SyS_socket+0xf9/0x230 net/socket.c:1249 entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203 Memory state around the buggy address: ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-04openvswitch: maintain correct checksum state in conntrack actionsLance Richardson
[ Upstream commit 75f01a4c9cc291ff5cb28ca1216adb163b7a20ee ] When executing conntrack actions on skbuffs with checksum mode CHECKSUM_COMPLETE, the checksum must be updated to account for header pushes and pulls. Otherwise we get "hw csum failure" logs similar to this (ICMP packet received on geneve tunnel via ixgbe NIC): [ 405.740065] genev_sys_6081: hw csum failure [ 405.740106] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G I 4.10.0-rc3+ #1 [ 405.740108] Call Trace: [ 405.740110] <IRQ> [ 405.740113] dump_stack+0x63/0x87 [ 405.740116] netdev_rx_csum_fault+0x3a/0x40 [ 405.740118] __skb_checksum_complete+0xcf/0xe0 [ 405.740120] nf_ip_checksum+0xc8/0xf0 [ 405.740124] icmp_error+0x1de/0x351 [nf_conntrack_ipv4] [ 405.740132] nf_conntrack_in+0xe1/0x550 [nf_conntrack] [ 405.740137] ? find_bucket.isra.2+0x62/0x70 [openvswitch] [ 405.740143] __ovs_ct_lookup+0x95/0x980 [openvswitch] [ 405.740145] ? netif_rx_internal+0x44/0x110 [ 405.740149] ovs_ct_execute+0x147/0x4b0 [openvswitch] [ 405.740153] do_execute_actions+0x22e/0xa70 [openvswitch] [ 405.740157] ovs_execute_actions+0x40/0x120 [openvswitch] [ 405.740161] ovs_dp_process_packet+0x84/0x120 [openvswitch] [ 405.740166] ovs_vport_receive+0x73/0xd0 [openvswitch] [ 405.740168] ? udp_rcv+0x1a/0x20 [ 405.740170] ? ip_local_deliver_finish+0x93/0x1e0 [ 405.740172] ? ip_local_deliver+0x6f/0xe0 [ 405.740174] ? ip_rcv_finish+0x3a0/0x3a0 [ 405.740176] ? ip_rcv_finish+0xdb/0x3a0 [ 405.740177] ? ip_rcv+0x2a7/0x400 [ 405.740180] ? __netif_receive_skb_core+0x970/0xa00 [ 405.740185] netdev_frame_hook+0xd3/0x160 [openvswitch] [ 405.740187] __netif_receive_skb_core+0x1dc/0xa00 [ 405.740194] ? ixgbe_clean_rx_irq+0x46d/0xa20 [ixgbe] [ 405.740197] __netif_receive_skb+0x18/0x60 [ 405.740199] netif_receive_skb_internal+0x40/0xb0 [ 405.740201] napi_gro_receive+0xcd/0x120 [ 405.740204] gro_cell_poll+0x57/0x80 [geneve] [ 405.740206] net_rx_action+0x260/0x3c0 [ 405.740209] __do_softirq+0xc9/0x28c [ 405.740211] irq_exit+0xd9/0xf0 [ 405.740213] do_IRQ+0x51/0xd0 [ 405.740215] common_interrupt+0x93/0x93 Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-30openvswitch: Fix skb leak in IPv6 reassembly.Daniele Di Proietto
If nf_ct_frag6_gather() returns an error other than -EINPROGRESS, it means that we still have a reference to the skb. We should free it before returning from handle_fragments, as stated in the comment above. Fixes: daaa7d647f81 ("netfilter: ipv6: avoid nf_iterate recursion") CC: Florian Westphal <fw@strlen.de> CC: Pravin B Shelar <pshelar@ovn.org> CC: Joe Stringer <joe@ovn.org> Signed-off-by: Daniele Di Proietto <diproiettod@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal devJiri Benc
The internal device does support 802.1AD offloading since 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes"). Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13openvswitch: fix vlan subtraction from packet lengthJiri Benc
When the packet has its vlan tag in skb->vlan_tci, the length of the VLAN header is not counted in skb->len. It doesn't make sense to subtract it. Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-13openvswitch: vlan: remove wrong likely statementJiri Benc
This code is called whenever flow key is being extracted from the packet. The packet may be as likely vlan tagged as not. Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Eric Garver <e@erig.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03openvswitch: use mpls_hdrJiri Benc
skb_mpls_header is equivalent to mpls_hdr now. Use the existing helper instead. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-03openvswitch: mpls: set network header correctly on key extractJiri Benc
After the 48d2ab609b6b ("net: mpls: Fixups for GSO"), MPLS handling in openvswitch was changed to have network header pointing to the start of the MPLS headers and inner_network_header pointing after the MPLS headers. However, key_extract was missed by the mentioned commit, causing incorrect headers to be set when a MPLS packet just enters the bridge or after it is recirculated. Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20openvswitch: avoid resetting flow key while installing new flow.pravin shelar
since commit commit db74a3335e0f6 ("openvswitch: use percpu flow stats") flow alloc resets flow-key. So there is no need to reset the flow-key again if OVS is using newly allocated flow-key. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-20openvswitch: Fix Frame-size larger than 1024 bytes warning.pravin shelar
There is no need to declare separate key on stack, we can just use sw_flow->key to store the key directly. This commit fixes following warning: net/openvswitch/datapath.c: In function ‘ovs_flow_cmd_new’: net/openvswitch/datapath.c:1080:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-18openvswitch: use percpu flow statsThadeu Lima de Souza Cascardo
Instead of using flow stats per NUMA node, use it per CPU. When using megaflows, the stats lock can be a bottleneck in scalability. On a E5-2690 12-core system, usual throughput went from ~4Mpps to ~15Mpps when forwarding between two 40GbE ports with a single flow configured on the datapath. This has been tested on a system with possible CPUs 0-7,16-23. After module removal, there were no corruption on the slab cache. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Cc: pravin shelar <pshelar@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-18openvswitch: fix flow stats accounting when node 0 is not possibleThadeu Lima de Souza Cascardo
On a system with only node 1 as possible, all statistics is going to be accounted on node 0 as it will have a single writer. However, when getting and clearing the statistics, node 0 is not going to be considered, as it's not a possible node. Tested that statistics are not zero on a system with only node 1 possible. Also compile-tested with CONFIG_NUMA off. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-15openvswitch: avoid deferred execution of recirc actionsLance Richardson
The ovs kernel data path currently defers the execution of all recirc actions until stack utilization is at a minimum. This is too limiting for some packet forwarding scenarios due to the small size of the deferred action FIFO (10 entries). For example, broadcast traffic sent out more than 10 ports with recirculation results in packet drops when the deferred action FIFO becomes full, as reported here: http://openvswitch.org/pipermail/dev/2016-March/067672.html Since the current recursion depth is available (it is already tracked by the exec_actions_level pcpu variable), we can use it to determine whether to execute recirculation actions immediately (safe when recursion depth is low) or defer execution until more stack space is available. With this change, the deferred action fifo size becomes a non-issue for currently failing scenarios because it is no longer used when there are three or fewer recursions through ovs_execute_actions(). Suggested-by: Pravin Shelar <pshelar@ovn.org> Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10openvswitch: use alias for genetlink family namesThadeu Lima de Souza Cascardo
When userspace tries to create datapaths and the module is not loaded, it will simply fail. With this patch, the module will be automatically loaded. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributesEric Garver
Add support for 802.1ad including the ability to push and pop double tagged vlans. Add support for 802.1ad to netlink parsing and flow conversion. Uses double nested encap attributes to represent double tagged vlan. Inner TPID encoded along with ctci in nested attributes. This is based on Thomas F Herbert's original v20 patch. I made some small clean ups and bug fixes. Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com> Signed-off-by: Eric Garver <e@erig.me> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04openvswitch: Free tmpl with tmpl_free.Joe Stringer
When an error occurs during conntrack template creation as part of actions validation, we need to free the template. Previously we've been using nf_ct_put() to do this, but nf_ct_tmpl_free() is more appropriate. Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30net: mpls: Fixups for GSODavid Ahern
As reported by Lennert the MPLS GSO code is failing to properly segment large packets. There are a couple of problems: 1. the inner protocol is not set so the gso segment functions for inner protocol layers are not getting run, and 2 MPLS labels for packets that use the "native" (non-OVS) MPLS code are not properly accounted for in mpls_gso_segment. The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment to call the gso segment functions for the higher layer protocols. That means skb_mac_gso_segment is called twice -- once with the network protocol set to MPLS and again with the network protocol set to the inner protocol. This patch sets the inner skb protocol addressing item 1 above and sets the network_header and inner_network_header to mark where the MPLS labels start and end. The MPLS code in OVS is also updated to set the two network markers. >From there the MPLS GSO code uses the difference between the network header and the inner network header to know the size of the MPLS header that was pushed. It then pulls the MPLS header, resets the mac_len and protocol for the inner protocol and then calls skb_mac_gso_segment to segment the skb. Afterward the inner protocol segmentation is done the skb protocol is set to mpls for each segment and the network and mac headers restored. Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-10openvswitch: do not ignore netdev errors when creating tunnel vportsMartynas Pumputis
The creation of a tunnel vport (geneve, gre, vxlan) brings up a corresponding netdev, a multi-step operation which can fail. For example, changing a vxlan vport's netdev state to 'up' binds the vport's socket to a UDP port - if the binding fails (e.g. due to the port being in use), the error is currently ignored giving the appearance that the tunnel vport creation completed successfully. Signed-off-by: Martynas Pumputis <martynas@weave.works> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06OVS: Ignore negative headroom valueIan Wienand
net_device->ndo_set_rx_headroom (introduced in 871b642adebe300be2e50aa5f65a418510f636ec) says "Setting a negtaive value reset the rx headroom to the default value". It seems that the OVS implementation in 3a927bc7cf9d0fbe8f4a8189dd5f8440228f64e7 overlooked this and sets dev->needed_headroom unconditionally. This doesn't have an immediate effect, but can mess up later LL_RESERVED_SPACE calculations, such as done in net/ipv6/mcast.c:mld_newpack. For reference, this issue was found from a skb_panic raised there after the length calculations had given the wrong result. Note the other current users of this interface (drivers/net/tun.c:tun_set_headroom and drivers/net/veth.c:veth_set_rx_headroom) are both checking this correctly thus need no modification. Thanks to Ben for some pointers from the crash dumps! Cc: Benjamin Poirier <bpoirier@suse.com> Cc: Paolo Abeni <pabeni@redhat.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414 Signed-off-by: Ian Wienand <iwienand@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-03openvswitch: Remove incorrect WARN_ONCE().Jarno Rajahalme
ovs_ct_find_existing() issues a warning if an existing conntrack entry classified as IP_CT_NEW is found, with the premise that this should not happen. However, a newly confirmed, non-expected conntrack entry remains IP_CT_NEW as long as no reply direction traffic is seen. This has resulted into somewhat confusing kernel log messages. This patch removes this check and warning. Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.") Suggested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-22netfilter: conntrack: support a fixed size of 128 distinct labelsFlorian Westphal
The conntrack label extension is currently variable-sized, e.g. if only 2 labels are used by iptables rules then the labels->bits[] array will only contain one element. We track size of each label storage area in the 'words' member. But in nftables and openvswitch we always have to ask for worst-case since we don't know what bit will be used at configuration time. As most arches are 64bit we need to allocate 24 bytes in this case: struct nf_conn_labels { u8 words; /* 0 1 */ /* XXX 7 bytes hole, try to pack */ long unsigned bits[2]; /* 8 24 */ Make bits a fixed size and drop the words member, it simplifies the code and only increases memory requirements on x86 when less than 64bit labels are required. We still only allocate the extension if its needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several cases of overlapping changes, except the packet scheduler conflicts which deal with the addition of the free list parameter to qdisc_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29openvswitch: fix conntrack netlink event deliverySamuel Gauthier
Only the first and last netlink message for a particular conntrack are actually sent. The first message is sent through nf_conntrack_confirm when the conntrack is committed. The last one is sent when the conntrack is destroyed on timeout. The other conntrack state change messages are not advertised. When the conntrack subsystem is used from netfilter, nf_conntrack_confirm is called for each packet, from the postrouting hook, which in turn calls nf_ct_deliver_cached_events to send the state change netlink messages. This commit fixes the problem by calling nf_ct_deliver_cached_events in the non-commit case as well. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") CC: Joe Stringer <joestringer@nicira.com> CC: Justin Pettit <jpettit@nicira.com> CC: Andy Zhou <azhou@nicira.com> CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25openvswitch: Only set mark and labels with a commit flag.Jarno Rajahalme
Only set conntrack mark or labels when the commit flag is specified. This makes sure we can not set them before the connection has been persisted, as in that case the mark and labels would be lost in an event of an userspace upcall. OVS userspace already requires the commit flag to accept setting ct_mark and/or ct_labels. Validate for this in the kernel API. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25openvswitch: Set mark and labels before confirming.Jarno Rajahalme
Set conntrack mark and labels right before committing so that the initial conntrack NEW event has the mark and labels. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-22openvswitch: Add packet len info to upcall.William Tu
The commit f2a4d086ed4c ("openvswitch: Add packet truncation support.") introduces packet truncation before sending to userspace upcall receiver. This patch passes up the skb->len before truncation so that the upcall receiver knows the original packet size. Potentially this will be used by sFlow, where OVS translates sFlow config header=N to a sample action, truncating packet to N byte in kernel datapath. Thus, only N bytes instead of full-packet size is copied from kernel to userspace, saving the kernel-to-userspace bandwidth. Signed-off-by: William Tu <u9012063@gmail.com> Cc: Pravin Shelar <pshelar@nicira.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10openvswitch: Add packet truncation support.William Tu
The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to truncate packets. A 'max_len' is added for setting up the maximum packet size, and a 'cutlen' field is to record the number of bytes to trim the packet when the packet is outputting to a port, or when the packet is sent to userspace. Signed-off-by: William Tu <u9012063@gmail.com> Cc: Pravin Shelar <pshelar@nicira.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-02ovs: set name assign type of internal portZhang Shengju
Set name_assign_type of internal port to NET_NAME_USER. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31openvswitch: update checksum in {push,pop}_mplsSimon Horman
In the case of CHECKSUM_COMPLETE the skb checksum should be updated in {push,pop}_mpls() as they the type in the ethernet header. As suggested by Pravin Shelar. Cc: Pravin Shelar <pshelar@nicira.com> Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel") Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The nf_conntrack_core.c fix in 'net' is not relevant in 'net-next' because we no longer have a per-netns conntrack hash. The ip_gre.c conflict as well as the iwlwifi ones were cases of overlapping changes. Conflicts: drivers/net/wireless/intel/iwlwifi/mvm/tx.c net/ipv4/ip_gre.c net/netfilter/nf_conntrack_core.c Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-11openvswitch: Fix cached ct with helper.Joe Stringer
When using conntrack helpers from OVS, a common configuration is to perform a lookup without specifying a helper, then go through a firewalling policy, only to decide to attach a helper afterwards. In this case, the initial lookup will cause a ct entry to be attached to the skb, then the later commit with helper should attach the helper and confirm the connection. However, the helper attachment has been missing. If the user has enabled automatic helper attachment, then this issue will be masked as it will be applied in init_conntrack(). It is also masked if the action is executed from ovs_packet_cmd_execute() as that will construct a fresh skb. This patch fixes the issue by making an explicit call to try to assign the helper if there is a discrepancy between the action's helper and the current skb->nfct. Fixes: cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following large patchset contains Netfilter updates for your net-next tree. My initial intention was to send you this in two goes but when I looked back twice I already had this burden on top of me. Several updates for IPVS from Marco Angaroni: 1) Allow SIP connections originating from real-servers to be load balanced by the SIP persistence engine as is already implemented in the other direction. 2) Release connections immediately for One-packet-scheduling (OPS) in IPVS, instead of making it via timer and rcu callback. 3) Skip deleting conntracks for each one packet in OPS, and don't call nf_conntrack_alter_reply() since no reply is expected. 4) Enable drop on exhaustion for OPS + SIP persistence. Miscelaneous conntrack updates from Florian Westphal, including fix for hash resize: 5) Move conntrack generation counter out of conntrack pernet structure since this is only used by the init_ns to allow hash resizing. 6) Use get_random_once() from packet path to collect hash random seed instead of our compound. 7) Don't disable BH from ____nf_conntrack_find() for statistics, use NF_CT_STAT_INC_ATOMIC() instead. 8) Fix lookup race during conntrack hash resizing. 9) Introduce clash resolution on conntrack insertion for connectionless protocol. Then, Florian's netns rework to get rid of per-netns conntrack table, thus we use one single table for them all. There was consensus on this change during the NFWS 2015 and, on top of that, it has recently been pointed as a source of multiple problems from unpriviledged netns: 11) Use a single conntrack hashtable for all namespaces. Include netns in object comparisons and make it part of the hash calculation. Adapt early_drop() to consider netns. 12) Use single expectation and NAT hashtable for all namespaces. 13) Use a single slab cache for all namespaces for conntrack objects. 14) Skip full table scanning from nf_ct_iterate_cleanup() if the pernet conntrack counter tells us the table is empty (ie. equals zero). Fixes for nf_tables interval set element handling, support to set conntrack connlabels and allow set names up to 32 bytes. 15) Parse element flags from element deletion path and pass it up to the backend set implementation. 16) Allow adjacent intervals in the rbtree set type for dynamic interval updates. 17) Add support to set connlabel from nf_tables, from Florian Westphal. 18) Allow set names up to 32 bytes in nf_tables. Several x_tables fixes and updates: 19) Fix incorrect use of IS_ERR_VALUE() in x_tables, original patch from Andrzej Hajda. And finally, miscelaneous netfilter updates such as: 20) Disable automatic helper assignment by default. Note this proc knob was introduced by a9006892643a ("netfilter: nf_ct_helper: allow to disable automatic helper assignment") 4 years ago to start moving towards explicit conntrack helper configuration via iptables CT target. 21) Get rid of obsolete and inconsistent debugging instrumentation in x_tables. 22) Remove unnecessary check for null after ip6_route_output(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>