summaryrefslogtreecommitdiff
path: root/include/net/ip_vs.h
AgeCommit message (Collapse)Author
2019-11-12ipvs: move old_secure_tcp into struct netns_ipvsEric Dumazet
[ Upstream commit c24b75e0f9239e78105f81c5f03a751641eb07ef ] syzbot reported the following issue : BUG: KCSAN: data-race in update_defense_level / update_defense_level read to 0xffffffff861a6260 of 4 bytes by task 3006 on cpu 1: update_defense_level+0x621/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:177 defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 write to 0xffffffff861a6260 of 4 bytes by task 7333 on cpu 0: update_defense_level+0xa62/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:205 defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 7333 Comm: kworker/0:5 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events defense_work_handler Indeed, old_secure_tcp is currently a static variable, while it needs to be a per netns variable. Fixes: a0840e2e165a ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2016-05-11ipvs: drop first packet to redirect conntrackJulian Anastasov
commit f719e3754ee2f7275437e61a6afd520181fdd43b upstream. Jiri Bohac is reporting for a problem where the attempt to reschedule existing connection to another real server needs proper redirect for the conntrack used by the IPVS connection. For example, when IPVS connection is created to NAT-ed real server we alter the reply direction of conntrack. If we later decide to select different real server we can not alter again the conntrack. And if we expire the old connection, the new connection is left without conntrack. So, the only way to redirect both the IPVS connection and the Netfilter's conntrack is to drop the SYN packet that hits existing connection, to wait for the next jiffie to expire the old connection and its conntrack and to rely on client's retransmission to create new connection as usually. Jiri Bohac provided a fix that drops all SYNs on rescheduling, I extended his patch to do such drops only for connections that use conntrack. Here is the original report from Jiri Bohac: Since commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server is dead"), new connections to dead servers are redistributed immediately to new servers. The old connection is expired using ip_vs_conn_expire_now() which sets the connection timer to expire immediately. However, before the timer callback, ip_vs_conn_expire(), is run to clean the connection's conntrack entry, the new redistributed connection may already be established and its conntrack removed instead. Fix this by dropping the first packet of the new connection instead, like we do when the destination server is not available. The timer will have deleted the old conntrack entry long before the first packet of the new connection is retransmitted. Fixes: dc7b3eb900aa ("ipvs: Fix reuse connection if real server is dead") Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-24ipvs: Remove skb_sknetEric W. Biederman
This function adds no real value and it obscures what the code is doing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Remove skb_netEric W. Biederman
This hack has no more users so remove it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_protocol_net_(init|cleanup)Eric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Remove net argument from ip_vs_tcp_conn_listenEric W. Biederman
The argument is unnecessary and in practice confusing, and has caused the callers to do all manner of silly things. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Wrap sysctl_cache_bypass and remove ifdefs in ip_vs_leaveEric W. Biederman
With sysctl_cache_bypass now a compile time constant the compiler can figue out that it can elimiate all of the code that depends on sysctl_cache_bypass being true. Also remove the duplicate computation of net previously necessitated by #ifdef CONFIG_SYSCTL Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs into .conn_schedule and ip_vs_try_to_scheduleEric W. Biederman
This moves the hack "net_ipvs(skb_net(skb))" up one level where it will be easier to remove. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net into ip_vs_conn_net_init and ip_vs_conn_net_cleanupEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs into conn_out_getEric W. Biederman
Move the hack of relying on "net_ipvs(skb_net(skb))" to derive the ipvs up a layer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs into .conn_in_get and ip_vs_conn_in_get_protoEric W. Biederman
Stop relying on "net_ipvs(skb_net(skb))" to derive the ipvs as skb_net is a hack. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net into init_netns and exit_netnsEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net into ip_vs_app_net_init and ip_vs_app_net_cleanupEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to register_ip_vs_app and unregister_ip_vs_appEric W. Biederman
Also move the tests for net_ipvs being NULL into __ip_vs_ftp_init and __ip_vs_ftp_exit. The only places where they possibly make sense. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to register_ip_vs_app_incEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net into register_app and unregister_appEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_estimator_net_init and ip_vs_estimator_cleanupEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net into ip_vs_control_net_(init|cleanup)Eric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_random_drop_entryEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_start_estimator aned ip_vs_stop_estimatorEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_sync_net_cleanupEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_sync_net_initEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_sync_connEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to stop_sync_threadEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to start_sync_threadEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_proto_data_getEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_service_net_cleanupEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_find_destEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_has_real_serviceEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_service_findEric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Store ipvs not net in struct ip_vs_serviceEric W. Biederman
In practice struct netns_ipvs is as meaningful as struct net and more useful as it holds the ipvs specific data. So store a pointer to struct netns_ipvs. Update the accesses of param->net to access param->ipvs->net instead. In functions where we are searching for an svc and filtering by net filter by ipvs instead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Pass ipvs not net to ip_vs_fill_connEric W. Biederman
ipvs is what is actually desired so change the parameter and the modify the callers to pass struct netns_ipvs. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Store ipvs not net in struct ip_vs_conn_paramEric W. Biederman
In practice struct netns_ipvs is as meaningful as struct net and more useful as it holds the ipvs specific data. So store a pointer to struct netns_ipvs. Update the accesses of param->net to access param->ipvs->net instead. When lookup up struct ip_vs_conn in a hash table replace comparisons of cp->net with comparisons of cp->ipvs which is possible now that ipvs is present in ip_vs_conn_param. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-24ipvs: Store ipvs not net in struct ip_vs_connEric W. Biederman
In practice struct netns_ipvs is as meaningful as struct net and more useful as it holds the ipvs specific data. So store a pointer to struct netns_ipvs. Update the accesses of conn->net to access conn->ipvs->net instead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-17ipvs: add sysctl to ignore tunneled packetsAlex Gartrell
This is a way to avoid nasty routing loops when multiple ipvs instances can forward to eachother. Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-01ipvs: add schedule_icmp sysctlAlex Gartrell
This sysctl will be used to enable the scheduling of icmp packets. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-01ipvs: drop inverse argument to conn_{in,out}_getAlex Gartrell
No longer necessary since the information is included in the ip_vs_iphdr itself. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-01ipvs: Add hdr_flags to iphdrAlex Gartrell
These flags contain information like whether or not the addresses are inverted or from icmp. The first will allow us to drop an inverse param all over the place, and the second will later be useful in scheduling icmp. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-01ipvs: replace ip_vs_fill_ip4hdr with ip_vs_fill_iph_skb_offAlex Gartrell
This removes some duplicated code and makes the ICMPv6 path look more like the ICMP path. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-08-21ipvs: add more mcast parameters for the sync daemonJulian Anastasov
- mcast_group: configure the multicast address, now IPv6 is supported too - mcast_port: configure the multicast port - mcast_ttl: configure the multicast TTL/HOP_LIMIT Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-08-21ipvs: add sync_maxlen parameter for the sync daemonJulian Anastasov
Allow setups with large MTU to send large sync packets by adding sync_maxlen parameter. The default value is now based on MTU but no more than 1500 for compatibility reasons. To avoid problems if MTU changes allow fragmentation by sending packets with DF=0. Problem reported by Dan Carpenter. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-03-12net: Introduce possible_net_tEric W. Biederman
Having to say > #ifdef CONFIG_NET_NS > struct net *net; > #endif in structures is a little bit wordy and a little bit error prone. Instead it is possible to say: > typedef struct { > #ifdef CONFIG_NET_NS > struct net *net; > #endif > } possible_net_t; And then in a header say: > possible_net_t net; Which is cleaner and easier to use and easier to test, as the possible_net_t is always there no matter what the compile options. Further this allows read_pnet and write_pnet to be functions in all cases which is better at catching typos. This change adds possible_net_t, updates the definitions of read_pnet and write_pnet, updates optional struct net * variables that write_pnet uses on to have the type possible_net_t, and finally fixes up the b0rked users of read_pnet and write_pnet. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-25ipvs: allow rescheduling of new connections when port reuse is detectedMarcelo Ricardo Leitner
Currently, when TCP/SCTP port reusing happens, IPVS will find the old entry and use it for the new one, behaving like a forced persistence. But if you consider a cluster with a heavy load of small connections, such reuse will happen often and may lead to a not optimal load balancing and might prevent a new node from getting a fair load. This patch introduces a new sysctl, conn_reuse_mode, that allows controlling how to proceed when port reuse is detected. The default value will allow rescheduling of new connections only if the old entry was in TIME_WAIT state for TCP or CLOSED for SCTP. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-02-09ipvs: use 64-bit rates in statsJulian Anastasov
IPVS stats are limited to 2^(32-10) conns/s and packets/s, 2^(32-5) bytes/s. It is time to use 64 bits: * Change all conn/packet kernel counters to 64-bit and update them in u64_stats_update_{begin,end} section * In kernel use struct ip_vs_kstats instead of the user-space struct ip_vs_stats_user and use new func ip_vs_export_stats_user to export it to sockopt users to preserve compatibility with 32-bit values * Rename cpu counters "ustats" to "cnt" * To netlink users provide additionally 64-bit stats: IPVS_SVC_ATTR_STATS64 and IPVS_DEST_ATTR_STATS64. Old stats remain for old binaries. * We can use ip_vs_copy_stats in ip_vs_stats_percpu_show Thanks to Chris Caputo for providing initial patch for ip_vs_est.c Signed-off-by: Chris Caputo <ccaputo@alt.net> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-10-02ipvs: Clean up comment style in ip_vs.hSimon Horman
* Consistently use the multi-line comment style for networking code: /* This * That * The other thing */ * Use single-line comment style for comments with only one line of text. * In general follow the leading '*' of each line of a comment with a single space and then text. * Add missing line break between functions, remove double line break, align comments to previous lines whenever possible. Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-16ipvs: prevent mixing heterogeneous pools and synchronizationAlex Gartrell
The synchronization protocol is not compatible with heterogeneous pools, so we need to verify that we're not turning both on at the same time. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Supply destination address family to ip_vs_conn_newAlex Gartrell
The assumption that dest af is equal to service af is now unreliable, so we must specify it manually so as not to copy just the first 4 bytes of a v6 address or doing an illegal read of 16 butes on a v6 address. We "lie" in two places: for synchronization (which we will explicitly disallow from happening when we have heterogeneous pools) and for black hole addresses where there's no real dest. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest}Alex Gartrell
We need to remove the assumption that virtual address family is the same as real address family in order to support heterogeneous services (that is, services with v4 vips and v6 backends or the opposite). Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-09-16ipvs: Add destination address family to netlink interfaceAlex Gartrell
This is necessary to support heterogeneous pools. For example, if you have an ipv6 addressed network, you'll want to be able to forward ipv4 traffic into it. This patch enforces that destination address family is the same as service family, as none of the forwarding mechanisms support anything else. For the old setsockopt mechanism, we simply set the dest address family to AF_INET as we do with the service. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2014-04-18arch: Mass conversion of smp_mb__*()Peter Zijlstra
Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>