summaryrefslogtreecommitdiff
path: root/net/netfilter
AgeCommit message (Collapse)Author
2013-10-01Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter/IPVS fixes for your net tree, they are: * Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from Patrick McHardy. * Fix possible weight overflow in lblc and lblcr schedulers due to 32-bits arithmetics, from Simon Kirby. * Fix possible memory access race in the lblc and lblcr schedulers, introduced when it was converted to use RCU, two patches from Julian Anastasov. * Fix hard dependency on CPU 0 when reading per-cpu stats in the rate estimator, from Julian Anastasov. * Fix race that may lead to object use after release, when invoking ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian Anastasov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30netfilter: synproxy: fix BUG_ON triggered by corrupt TCP packetsPatrick McHardy
TCP packets hitting the SYN proxy through the SYNPROXY target are not validated by TCP conntrack. When th->doff is below 5, an underflow happens when calculating the options length, causing skb_header_pointer() to return NULL and triggering the BUG_ON(). Handle this case gracefully by checking for NULL instead of using BUG_ON(). Reported-by: Martin Topholm <mph@one.com> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-19ip: generate unique IP identificator if local fragmentation is allowedAnsis Atteka
If local fragmentation is allowed, then ip_select_ident() and ip_select_ident_more() need to generate unique IDs to ensure correct defragmentation on the peer. For example, if IPsec (tunnel mode) has to encrypt large skbs that have local_df bit set, then all IP fragments that belonged to different ESP datagrams would have used the same identificator. If one of these IP fragments would get lost or reordered, then peer could possibly stitch together wrong IP fragments that did not belong to the same datagram. This would lead to a packet loss or data corruption. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-18ipvs: stats should not depend on CPU 0Julian Anastasov
When reading percpu stats we need to properly reset the sum when CPU 0 is not present in the possible mask. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-09-18ipvs: do not use dest after ip_vs_dest_put in LBLCRJulian Anastasov
commit c5549571f975ab ("ipvs: convert lblcr scheduler to rcu") allows RCU readers to use dest after calling ip_vs_dest_put(). In the corner case it can race with ip_vs_dest_trash_expire() which can release the dest while it is being returned to the RCU readers as scheduling result. To fix the problem do not allow e->dest to be replaced and defer the ip_vs_dest_put() call by using RCU callback. Now e->dest does not need to be RCU pointer. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-09-18ipvs: do not use dest after ip_vs_dest_put in LBLCJulian Anastasov
commit c2a4ffb70eef39 ("ipvs: convert lblc scheduler to rcu") allows RCU readers to use dest after calling ip_vs_dest_put(). In the corner case it can race with ip_vs_dest_trash_expire() which can release the dest while it is being returned to the RCU readers as scheduling result. To fix the problem do not allow en->dest to be replaced and defer the ip_vs_dest_put() call by using RCU callback. Now en->dest does not need to be RCU pointer. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-09-18ipvs: make the service replacement more robustJulian Anastasov
commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added IP_VS_DEST_STATE_REMOVING flag and RCU callback named ip_vs_dest_wait_readers() to keep dests and services after removal for at least a RCU grace period. But we have the following corner cases: - we can not reuse the same dest if its service is removed while IP_VS_DEST_STATE_REMOVING is still set because another dest removal in the first grace period can not extend this period. It can happen when ipvsadm -C && ipvsadm -R is used. - dest->svc can be replaced but ip_vs_in_stats() and ip_vs_out_stats() have no explicit read memory barriers when accessing dest->svc. It can happen that dest->svc was just freed (replaced) while we use it to update the stats. We solve the problems as follows: - IP_VS_DEST_STATE_REMOVING is removed and we ensure a fixed idle period for the dest (IP_VS_DEST_TRASH_PERIOD). idle_start will remember when for first time after deletion we noticed dest->refcnt=0. Later, the connections can grab a reference while in RCU grace period but if refcnt becomes 0 we can safely free the dest and its svc. - dest->svc becomes RCU pointer. As result, we add explicit RCU locking in ip_vs_in_stats() and ip_vs_out_stats(). - __ip_vs_unbind_svc is renamed to __ip_vs_svc_put(), it now can free the service immediately or after a RCU grace period. dest->svc is not set to NULL anymore. As result, unlinked dests and their services are freed always after IP_VS_DEST_TRASH_PERIOD period, unused services are freed after a RCU grace period. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-09-18ipvs: fix overflow on dest weight multiplySimon Kirby
Schedulers such as lblc and lblcr require the weight to be as high as the maximum number of active connections. In commit b552f7e3a9524abcbcdf ("ipvs: unify the formula to estimate the overhead of processing connections"), the consideration of inactconns and activeconns was cleaned up to always count activeconns as 256 times more important than inactconns. In cases where 3000 or more connections are expected, a weight of 3000 * 256 * 3000 connections overflows the 32-bit signed result used to determine if rescheduling is required. On amd64, this merely changes the multiply and comparison instructions to 64-bit. On x86, a 64-bit result is already present from imull, so only a few more comparison instructions are emitted. Signed-off-by: Simon Kirby <sim@hostway.ca> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-09-17netfilter: nfnetlink_queue: use network skb for sequence adjustmentGao feng
Instead of the netlink skb. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-16netfilter: ipset: Fix serious failure in CIDR trackingOliver Smith
This fixes a serious bug affecting all hash types with a net element - specifically, if a CIDR value is deleted such that none of the same size exist any more, all larger (less-specific) values will then fail to match. Adding back any prefix with a CIDR equal to or more specific than the one deleted will fix it. Steps to reproduce: ipset -N test hash:net ipset -A test 1.1.0.0/16 ipset -A test 2.2.2.0/24 ipset -T test 1.1.1.1 #1.1.1.1 IS in set ipset -D test 2.2.2.0/24 ipset -T test 1.1.1.1 #1.1.1.1 IS NOT in set This is due to the fact that the nets counter was unconditionally decremented prior to the iteration that shifts up the entries. Now, we first check if there is a proceeding entry and if not, decrement it and return. Otherwise, we proceed to iterate and then zero the last element, which, in most cases, will already be zero. Signed-off-by: Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16netfilter: ipset: Validate the set family and not the set type family at ↵Jozsef Kadlecsik
swapping This closes netfilter bugzilla #843, reported by Quentin Armitage. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16netfilter: ipset: Consistent userspace testing with nomatch flagJozsef Kadlecsik
The "nomatch" commandline flag should invert the matching at testing, similarly to the --return-nomatch flag of the "set" match of iptables. Until now it worked with the elements with "nomatch" flag only. From now on it works with elements without the flag too, i.e: # ipset n test hash:net # ipset a test 10.0.0.0/24 nomatch # ipset t test 10.0.0.1 10.0.0.1 is NOT in set test. # ipset t test 10.0.0.1 nomatch 10.0.0.1 is in set test. # ipset a test 192.168.0.0/24 # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is NOT in set test. Before the patch the results were ... # ipset t test 192.168.0.1 192.168.0.1 is in set test. # ipset t test 192.168.0.1 nomatch 192.168.0.1 is in set test. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-16netfilter: ipset: Skip really non-first fragments for IPv6 when getting ↵Jozsef Kadlecsik
port/protocol Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2013-09-05netfilter: Fix build errors with xt_socket.cDavid S. Miller
As reported by Randy Dunlap: ==================== when CONFIG_IPV6=m and CONFIG_NETFILTER_XT_MATCH_SOCKET=y: net/built-in.o: In function `socket_mt6_v1_v2': xt_socket.c:(.text+0x51b55): undefined reference to `udp6_lib_lookup' net/built-in.o: In function `socket_mt_init': xt_socket.c:(.init.text+0x1ef8): undefined reference to `nf_defrag_ipv6_enable' ==================== Like several other modules under net/netfilter/ we have to have a dependency "IPV6 disabled or set compatibly with this module" clause. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04netfilter: xt_TCPMSS: correct return value in tcpmss_mangle_packetPhil Oester
In commit b396966c4 (netfilter: xt_TCPMSS: Fix missing fragmentation handling), I attempted to add safe fragment handling to xt_TCPMSS. However, Andy Padavan of Project N56U correctly points out that returning XT_CONTINUE in this function does not work. The callers (tcpmss_tg[46]) expect to receive a value of 0 in order to return XT_CONTINUE. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-09-04netfilter: synproxy_core: fix warning in __nf_ct_ext_add_length()Patrick McHardy
With CONFIG_NETFILTER_DEBUG we get the following warning during SYNPROXY init: [ 80.558906] WARNING: CPU: 1 PID: 4833 at net/netfilter/nf_conntrack_extend.c:80 __nf_ct_ext_add_length+0x217/0x220 [nf_conntrack]() The reason is that the conntrack template is set to confirmed before adding the extension and it is invalid to add extensions to already confirmed conntracks. Fix by adding the extensions before setting the conntrack to confirmed. Reported-by: Jesper Dangaard Brouer <jesper.brouer@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28netfilter: ctnetlink: fix uninitialized variableFlorian Westphal
net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_nfqueue_attach_expect': 'helper' may be used uninitialized in this function It was only initialized in if CTA_EXPECT_HELP_NAME attribute was present, it must be NULL otherwise. Problem added recently in bd077937 (netfilter: nfnetlink_queue: allow to attach expectations to conntracks). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28netfilter: add SYNPROXY core/targetPatrick McHardy
Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy core with common functions and an address family specific target. The SYNPROXY receives the connection request from the client, responds with a SYN/ACK containing a SYN cookie and announcing a zero window and checks whether the final ACK from the client contains a valid cookie. It then establishes a connection to the original destination and, if successful, sends a window update to the client with the window size announced by the server. Support for timestamps, SACK, window scaling and MSS options can be statically configured as target parameters if the features of the server are known. If timestamps are used, the timestamp value sent back to the client in the SYN/ACK will be different from the real timestamp of the server. In order to now break PAWS, the timestamps are translated in the direction server->client. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28netfilter: nf_conntrack: make sequence number adjustments usuable without NATPatrick McHardy
Split out sequence number adjustments from NAT and move them to the conntrack core to make them usable for SYN proxying. The sequence number adjustment information is moved to a seperate extend. The extend is added to new conntracks when a NAT mapping is set up for a connection using a helper. As a side effect, this saves 24 bytes per connection with NAT in the common case that a connection does not have a helper assigned. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-20Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Conflicts: net/netfilter/nf_conntrack_proto_tcp.c The conflict had to do with overlapping changes dealing with fixing the use of an "s32" to hold the value returned by NAT_OFFSET(). Pablo Neira Ayuso says: ==================== The following batch contains Netfilter/IPVS updates for your net-next tree. More specifically, they are: * Trivial typo fix in xt_addrtype, from Phil Oester. * Remove net_ratelimit in the conntrack logging for consistency with other logging subsystem, from Patrick McHardy. * Remove unneeded includes from the recently added xt_connlabel support, from Florian Westphal. * Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for this, from Florian Westphal. * Remove tproxy core, now that we have socket early demux, from Florian Westphal. * A couple of patches to refactor conntrack event reporting to save a good bunch of lines, from Florian Westphal. * Fix missing locking in NAT sequence adjustment, it did not manifested in any known bug so far, from Patrick McHardy. * Change sequence number adjustment variable to 32 bits, to delay the possible early overflow in long standing connections, also from Patrick. * Comestic cleanups for IPVS, from Dragos Foianu. * Fix possible null dereference in IPVS in the SH scheduler, from Daniel Borkmann. * Allow to attach conntrack expectations via nfqueue. Before this patch, you had to use ctnetlink instead, thus, we save the conntrack lookup. * Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2013-08-13netfilter: nfnetlink_queue: allow to attach expectations to conntracksPablo Neira Ayuso
This patch adds the capability to attach expectations via nfnetlink_queue. This is required by conntrack helpers that trigger expectations based on the first packet seen like the TFTP and the DHCPv6 user-space helpers. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-13netfilter: ctnetlink: refactor ctnetlink_create_expectPablo Neira Ayuso
This patch refactors ctnetlink_create_expect by spliting it in two chunks. As a result, we have a new function ctnetlink_alloc_expect to allocate and to setup the expectation from ctnetlink. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-10netfilter: nf_conntrack: fix tcp_in_window for Fast OpenYuchung Cheng
Currently the conntrack checks if the ending sequence of a packet falls within the observed receive window. However it does so even if it has not observe any packet from the remote yet and uses an uninitialized receive window (td_maxwin). If a connection uses Fast Open to send a SYN-data packet which is dropped afterward in the network. The subsequent SYNs retransmits will all fail this check and be discarded, leading to a connection timeout. This is because the SYN retransmit does not contain data payload so end == initial sequence number (isn) + 1 sender->td_end == isn + syn_data_len receiver->td_maxwin == 0 The fix is to only apply this check after td_maxwin is initialized. Reported-by: Michael Chan <mcfchan@stanford.edu> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-09netfilter: nf_conntrack: don't send destroy events from iteratorFlorian Westphal
Let nf_ct_delete handle delivery of the DESTROY event. Based on earlier patch from Pablo Neira. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-07ipvs: ip_vs_sh: ip_vs_sh_get_port: check skb_header_pointer for NULLDaniel Borkmann
skb_header_pointer could return NULL, so check for it as we do it everywhere else in ipvs code. This fixes a coverity warning. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-08-06ipvs: fixed spacing at for statementsDragos Foianu
found using checkpatch.pl Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-08-05netfilter: nfnetlink_{log,queue}: fix information leaks in netlink messageDan Carpenter
These structs have a "_pad" member. Also the "phw" structs have an 8 byte "hw_addr[]" array but sometimes only the first 6 bytes are initialized. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-05netfilter: tproxy: fix build with IP6_NF_IPTABLES=nFlorian Westphal
after commit 93742cf (netfilter: tproxy: remove nf_tproxy_core.h) CONFIG_IPV6=y CONFIG_IP6_NF_IPTABLES=n gives us: net/netfilter/xt_TPROXY.c: In function 'nf_tproxy_get_sock_v6': net/netfilter/xt_TPROXY.c:178:4: error: implicit declaration of function 'inet6_lookup_listener' Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01netfilter: xt_TCPOPTSTRIP: fix possible off by one accessPablo Neira Ayuso
Fix a possible off by one access since optlen() touches opt[offset+1] unsafely when i == tcp_hdrlen(skb) - 1. This patch replaces tcp_hdrlen() by the local variable tcp_hdrlen that stores the TCP header length, to save some cycles. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-01netfilter: xt_TCPMSS: fix handling of malformed TCP header and optionsPablo Neira Ayuso
Make sure the packet has enough room for the TCP header and that it is not malformed. While at it, store tcph->doff*4 in a variable, as it is used several times. This patch also fixes a possible off by one in case of malformed TCP options. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_nat: use per-conntrack locking for sequence number adjustmentsPatrick McHardy
Get rid of the global lock and use per-conntrack locks for protecting the sequencen number adjustment data. Additionally saves one lock/unlock operation for every TCP packet. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_nat: change sequence number adjustments to 32 bitsPatrick McHardy
Using 16 bits is too small, when many adjustments happen the offsets might overflow and break the connection. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_nat: fix locking in nf_nat_seq_adjust()Patrick McHardy
nf_nat_seq_adjust() needs to grab nf_nat_seqofs_lock to protect against concurrent changes to the sequence adjustment data. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_conntrack: remove duplicate code in ctnetlinkFlorian Westphal
ctnetlink contains copy-paste code from death_by_timeout. In order to avoid changing both places in upcoming event delivery patch, export death_by_timeout functionality and use it in the ctnetlink code. Based on earlier patch from Pablo Neira. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: tproxy: remove nf_tproxy_core.hFlorian Westphal
We've removed nf_tproxy_core.ko, so also remove its header. The lookup helpers are split and then moved to tproxy target/socket match. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: tproxy: remove nf_tproxy_core, keep tw sk assigned to skbFlorian Westphal
The module was "permanent", due to the special tproxy skb->destructor. Nowadays we have tcp early demux and its sock_edemux destructor in networking core which can be used instead. Thanks to early demux changes the input path now also handles "skb->sk is tw socket" correctly, so this no longer needs the special handling introduced with commit d503b30bd648b3cb4e5f50b65d27e389960cc6d9 (netfilter: tproxy: do not assign timewait sockets to skb->sk). Thus: - move assign_sock function to where its needed - don't prevent timewait sockets from being assigned to the skb - remove nf_tproxy_core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_queue: relax NFQA_CT attribute checkFlorian Westphal
Allow modifying attributes of the conntrack associated with a packet without first requesting ct data via CFG_F_CONNTRACK or extra nfnetlink_conntrack socket. Also remove unneded rcu_read_lock; the entire function is already protected by rcu. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: connlabels: remove unneeded includesFlorian Westphal
leftovers from the (never merged) v1 patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: nf_conntrack: constify sk_buff argument to nf_ct_attach()Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-31netfilter: xt_addrtype: fix trivial typoPhil Oester
Fix typo in error message. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-27net/sctp: Refactor SCTP skb checksum computationJoe Stringer
This patch consolidates the SCTP checksum calculation code from various places to a single new function, sctp_compute_cksum(skb, offset). Signed-off-by: Joe Stringer <joe@wand.net.nz> Reviewed-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-15netfilter: xt_socket: fix broken v0 supportEric Dumazet
commit 681f130f39e10 ("netfilter: xt_socket: add XT_SOCKET_NOWILDCARD flag") added a potential NULL dereference if an old iptables package uses v0 of the match. Fix this by removing the test on @info in fast path. IPv6 can remove the test as well, as it uses v1 or v2. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-15netfilter: ctnetlink: fix incorrect NAT expectation dumpingPablo Neira Ayuso
nf_ct_expect_alloc leaves unset the expectation NAT fields. However, ctnetlink_exp_dump_expect expects them to be zeroed in case they are not used, which may not be the case. This results in dumping the NAT tuple of the expectation when it should not. Fix it by zeroing the NAT fields of the expectation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-07-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/freescale/fec_main.c drivers/net/ethernet/renesas/sh_eth.c net/ipv4/gre.c The GRE conflict is between a bug fix (kfree_skb --> kfree_skb_list) and the splitting of the gre.c code into seperate files. The FEC conflict was two sets of changes adding ethtool support code in an "!CONFIG_M5272" CPP protected block. Finally the sh_eth.c conflict was between one commit add bits set in the .eesr_err_check mask whilst another commit removed the .tx_error_check member and assignments. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-30netfilter: nf_queue: add NFQA_SKB_CSUM_NOTVERIFIED info flagFlorian Westphal
The common case is that TCP/IP checksums have already been verified, e.g. by hardware (rx checksum offload), or conntrack. Userspace can use this flag to determine when the checksum has not been validated yet. If the flag is set, this doesn't necessarily mean that the packet has an invalid checksum, e.g. if NIC doesn't support rx checksum. Userspace that sucessfully enabled NFQA_CFG_F_GSO queue feature flag can infer that IP/TCP checksum has already been validated if either the SKB_INFO attribute is not present or the NFQA_SKB_CSUM_NOTVERIFIED flag is unset. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-06-26ipvs: add sync_persist_mode flagJulian Anastasov
Add sync_persist_mode flag to reduce sync traffic by syncing only persistent templates. Signed-off-by: Julian Anastasov <ja@ssi.bg> Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26ipvs: SH fallback and L4 hashingAlexander Frolkin
By default the SH scheduler rejects connections that are hashed onto a realserver of weight 0. This patch adds a flag to make SH choose a different realserver in this case, instead of rejecting the connection. The patch also adds a flag to make SH include the source port (TCP, UDP, SCTP) in the hash as well as the source address. This basically allows for deterministic round-robin load balancing (i.e., where any director in a cluster of directors with identical config will send the same packet the same way). The flags are service flags (IP_VS_SVC_F_SCHED*) so that these options can be set per service. They are set using a new option to ipvsadm. Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-06-26ipvs: drop SCTP connections depending on stateJulian Anastasov
Drop SCTP connections under load (dropentry context) depending on the protocol state, just like for TCP: INIT conns are dropped immediately, established are dropped randomly while connections in progress or shutdown are skipped. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>