linux-toradex.git/include/net, branch v3.2.73

ipv6: prevent fib6_run_gc() contention

2015-10-13T02:46:13+00:00

commit 2ac3ac8f86f2fe065d746d9a9abaca867adec577 upstream.

On a high-traffic router with many processors and many IPv6 dst
entries, soft lockup in fib6_run_gc() can occur when number of
entries reaches gc_thresh.

This happens because fib6_run_gc() uses fib6_gc_lock to allow
only one thread to run the garbage collector but ip6_dst_gc()
doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
returns. On a system with many entries, this can take some time
so that in the meantime, other threads pass the tests in
ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
the lock. They then have to run the garbage collector one after
another which blocks them for quite long.

Resolve this by replacing special value ~0UL of expire parameter
to fib6_run_gc() by explicit "force" parameter to choose between
spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
force=false if gc_thresh is reached but not max_size.

Signed-off-by: Michal Kubecek 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings 
Cc: Konstantin Khlebnikov

ipv6: lock socket in ip6_datagram_connect()

2015-10-13T02:46:12+00:00

[ Upstream commit 03645a11a570d52e70631838cb786eb4253eb463 ]

ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.

This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.

Signed-off-by: Eric Dumazet 
Acked-by: Herbert Xu 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

ipvs: kernel oops - do_ip_vs_get_ctl

2015-08-06T23:32:16+00:00

commit 8537de8a7ab6681cc72fb0411ab1ba7fdba62dd0 upstream.

Change order of init so netns init is ready
when register ioctl and netlink.

Ver2
	Whitespace fixes and __init added.

Reported-by: "Ryan O'Hara" 
Signed-off-by: Hans Schillstrom 
Acked-by: Julian Anastasov 
Signed-off-by: Jesper Dangaard Brouer 
Signed-off-by: Simon Horman 
Signed-off-by: Ben Hutchings 
Cc: Andreas Unterkircher

sctp: fix ASCONF list handling

2015-08-06T23:32:15+00:00

commit 2d45a02d0166caf2627fe91897c6ffc3b19514c4 upstream.

->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.

Also, the call to inet_sk_copy_descendant() was backuping
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.

This commit thus fixes the list handling by using ->addr_wq_lock
spinlock to protect the list. A special handling is done upon socket
creation and destruction for that. Error handlig on sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
will be take on sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().

Instead of taking the lock on sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewritting it by
implementing sctp_copy_descendant().

Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.

Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen 
Suggested-by: Neil Horman 
Suggested-by: Hannes Frederic Sowa 
Acked-by: Hannes Frederic Sowa 
Signed-off-by: Marcelo Ricardo Leitner 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2:
 - Adjust filename, context
 - Most per-netns state is global]
Signed-off-by: Ben Hutchings

netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len

2015-05-09T22:16:35+00:00

commit 223b02d923ecd7c84cf9780bb3686f455d279279 upstream.

"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
case it can contain all extensions. Bellow you can find sizes for all
types of extensions. Their sum is definitely bigger than 256.

nf_ct_ext_types[0]->len = 24
nf_ct_ext_types[1]->len = 32
nf_ct_ext_types[2]->len = 24
nf_ct_ext_types[3]->len = 32
nf_ct_ext_types[4]->len = 152
nf_ct_ext_types[5]->len = 2
nf_ct_ext_types[6]->len = 16
nf_ct_ext_types[7]->len = 8

I have seen "len" up to 280 and my host has crashes w/o this patch.

The right way to fix this problem is reducing the size of the ecache
extension (4) and Florian is going to do this, but these changes will
be quite large to be appropriate for a stable tree.

Fixes: 5b423f6a40a0 (netfilter: nf_conntrack: fix racy timer handling with reliable)
Cc: Pablo Neira Ayuso 
Cc: Patrick McHardy 
Cc: Jozsef Kadlecsik 
Cc: "David S. Miller" 
Signed-off-by: Andrey Vagin 
Signed-off-by: Pablo Neira Ayuso 
Signed-off-by: Ben Hutchings

Revert "tcp: Apply device TSO segment limit earlier"

2015-02-20T00:49:33+00:00

This reverts commit 9f871e883277cc22c6217db806376dce52401a31, which
was commit 1485348d2424e1131ea42efc033cbd9366462b01 upstream.

It can cause connections to stall when a PMTU event occurs.  This was
fixed by commit 843925f33fcc ("tcp: Do not apply TSO segment limit to
non-TSO packets") upstream, but that depends on other changes to TSO.

The original issue this fixed was a performance regression for the sfc
driver in extreme cases of TSO (skb with > 100 segments).  This is not
really very important and it seems best to revert it rather than try
to fix it up.

Signed-off-by: Ben Hutchings 
Cc: Herbert Xu 
Cc: netdev@vger.kernel.org
Cc: linux-net-drivers@solarflare.com

fib_trie: Fix /proc/net/fib_trie when CONFIG_IP_MULTIPLE_TABLES is not defined

2015-02-20T00:49:29+00:00

commit a5a519b2710be43fce3cf9ce7bd8de8db3f2a9de upstream.

In recent testing I had disabled CONFIG_IP_MULTIPLE_TABLES and as a result
when I ran "cat /proc/net/fib_trie" the main trie was displayed multiple
times.  I found that the problem line of code was in the function
fib_trie_seq_next.  Specifically the line below caused the indexes to go in
the opposite direction of our traversal:

	h = tb->tb_id & (FIB_TABLE_HASHSZ - 1);

This issue was that the RT tables are defined such that RT_TABLE_LOCAL is ID
255, while it is located at TABLE_LOCAL_INDEX of 0, and RT_TABLE_MAIN is 254
with a TABLE_MAIN_INDEX of 1.  This means that the above line will return 1
for the local table and 0 for main.  The result is that fib_trie_seq_next
will return NULL at the end of the local table, fib_trie_seq_start will
return the start of the main table, and then fib_trie_seq_next will loop on
main forever as h will always return 0.

The fix for this is to reverse the ordering of the two tables.  It has the
advantage of making it so that the tables now print in the same order
regardless of if multiple tables are enabled or not.  In order to make the
definition consistent with the multiple tables case I simply masked the to
RT_TABLE_XXX values by (FIB_TABLE_HASHSZ - 1).  This way the two table
layouts should always stay consistent.

Fixes: 93456b6 ("[IPV4]: Unify access to the routing tables")
Signed-off-by: Alexander Duyck 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

tcp: md5: remove spinlock usage in fast path

2015-01-01T01:27:52+00:00

commit 71cea17ed39fdf1c0634f530ddc6a2c2fc601c2b upstream.

TCP md5 code uses per cpu variables but protects access to them with
a shared spinlock, which is a contention point.

[ tcp_md5sig_pool_lock is locked twice per incoming packet ]

Makes things much simpler, by allocating crypto structures once, first
time a socket needs md5 keys, and not deallocating them as they are
really small.

Next step would be to allow crypto allocations being done in a NUMA
aware way.

Signed-off-by: Eric Dumazet 
Cc: Herbert Xu 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2:
 - Adjust context
 - Conditions for alloc/free are quite different]
Signed-off-by: Ben Hutchings

drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets

2015-01-01T01:27:51+00:00

commit 5188cd44c55db3e92cd9e77a40b5baa7ed4340f7 upstream.

UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway.  Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).

Signed-off-by: Ben Hutchings 
Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller 
[bwh: For 3.2, net/ipv6/output_core.c is a completely new file]

tcp: be more strict before accepting ECN negociation

2014-12-14T16:24:00+00:00

commit bd14b1b2e29bd6812597f896dde06eaf7c6d2f24 upstream.

It appears some networks play bad games with the two bits reserved for
ECN. This can trigger false congestion notifications and very slow
transferts.

Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can
disable TCP ECN negociation if it happens we receive mangled CT bits in
the SYN packet.

Signed-off-by: Eric Dumazet 
Cc: Perry Lorier 
Cc: Matt Mathis 
Cc: Yuchung Cheng 
Cc: Neal Cardwell 
Cc: Wilmer van der Gaast 
Cc: Ankur Jain 
Cc: Tom Herbert 
Cc: Dave Täht 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings 
Cc: Florian Westphal