linux-toradex.git/net/ipv4, branch v3.2.26

tcp: drop SYN+FIN messages

2012-07-25T03:10:55+00:00

commit fdf5af0daf8019cec2396cdef8fb042d80fe71fa upstream.

Denys Fedoryshchenko reported that SYN+FIN attacks were bringing his
linux machines to their limits.

Dont call conn_request() if the TCP flags includes SYN flag

Reported-by: Denys Fedoryshchenko 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

xfrm: take net hdr len into account for esp payload size calculation

2012-06-10T13:42:07+00:00

[ Upstream commit 91657eafb64b4cb53ec3a2fbc4afc3497f735788 ]

Corrects the function that determines the esp payload size. The calculations
done in esp{4,6}_get_mtu() lead to overlength frames in transport mode for
certain mtu values and suboptimal frames for others.

According to what is done, mainly in esp{,6}_output() and tcp_mtu_to_mss(),
net_header_len must be taken into account before doing the alignment
calculation.

Signed-off-by: Benjamin Poirier 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

ipv4: fix the rcu race between free_fib_info and ip_route_output_slow

2012-06-10T13:42:02+00:00

[ Upstream commit e49cc0da7283088c5e03d475ffe2fdcb24a6d5b1 ]

We hit a kernel OOPS.

<3>[23898.789643] BUG: sleeping function called from invalid context at
/data/buildbot/workdir/ics/hardware/intel/linux-2.6/arch/x86/mm/fault.c:1103
<3>[23898.862215] in_atomic(): 0, irqs_disabled(): 0, pid: 10526, name:
Thread-6683
<4>[23898.967805] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.258526] Pid: 10526, comm: Thread-6683 Tainted: G        W
3.0.8-137685-ge7742f9 #1
<4>[23899.357404] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.904225] Call Trace:
<4>[23899.989209]  [] ? pgtable_bad+0x130/0x130
<4>[23900.000416]  [] __might_sleep+0x10a/0x110
<4>[23900.007357]  [] do_page_fault+0xd1/0x3c0
<4>[23900.013764]  [] ? restore_all+0xf/0xf
<4>[23900.024024]  [] ? napi_complete+0x8b/0x690
<4>[23900.029297]  [] ? pgtable_bad+0x130/0x130
<4>[23900.123739]  [] ? pgtable_bad+0x130/0x130
<4>[23900.128955]  [] error_code+0x5f/0x64
<4>[23900.133466]  [] ? pgtable_bad+0x130/0x130
<4>[23900.138450]  [] ? __ip_route_output_key+0x698/0x7c0
<4>[23900.144312]  [] ? __ip_route_output_key+0x38d/0x7c0
<4>[23900.150730]  [] ip_route_output_flow+0x1f/0x60
<4>[23900.156261]  [] ip4_datagram_connect+0x188/0x2b0
<4>[23900.161960]  [] ? _raw_spin_unlock_bh+0x1f/0x30
<4>[23900.167834]  [] inet_dgram_connect+0x36/0x80
<4>[23900.173224]  [] ? _copy_from_user+0x48/0x140
<4>[23900.178817]  [] sys_connect+0x9a/0xd0
<4>[23900.183538]  [] ? alloc_file+0xdc/0x240
<4>[23900.189111]  [] ? sub_preempt_count+0x3d/0x50

Function free_fib_info resets nexthop_nh->nh_dev to NULL before releasing
fi. Other cpu might be accessing fi. Fixing it by delaying the releasing.

With the patch, we ran MTBF testing on Android mobile for 12 hours
and didn't trigger the issue.

Thank Eric for very detailed review/checking the issue.

Signed-off-by: Yanmin Zhang 
Signed-off-by: Kun Jiang 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

ipv4: Do not use dead fib_info entries.

2012-06-10T13:42:02+00:00

[ Upstream commit dccd9ecc374462e5d6a5b8f8110415a86c2213d8 ]

Due to RCU lookups and RCU based release, fib_info objects can
be found during lookup which have fi->fib_dead set.

We must ignore these entries, otherwise we risk dereferencing
the parts of the entry which are being torn down.

Reported-by: Yevgen Pronenko 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

tcp: do_tcp_sendpages() must try to push data out on oom conditions

2012-05-20T21:56:51+00:00

commit bad115cfe5b509043b684d3a007ab54b80090aa1 upstream.

Since recent changes on TCP splicing (starting with commits 2f533844
"tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp:
tcp_sendpages() should call tcp_push() once"), I started seeing
massive stalls when forwarding traffic between two sockets using
splice() when pipe buffers were larger than socket buffers.

Latest changes (net: netdev_alloc_skb() use build_skb()) made the
problem even more apparent.

The reason seems to be that if do_tcp_sendpages() fails on out of memory
condition without being able to send at least one byte, tcp_push() is not
called and the buffers cannot be flushed.

After applying the attached patch, I cannot reproduce the stalls at all
and the data rate it perfectly stable and steady under any condition
which previously caused the problem to be permanent.

The issue seems to have been there since before the kernel migrated to
git, which makes me think that the stalls I occasionally experienced
with tux during stress-tests years ago were probably related to the
same issue.

This issue was first encountered on 3.0.31 and 3.2.17, so please backport
to -stable.

Signed-off-by: Willy Tarreau 
Acked-by: Eric Dumazet 
Signed-off-by: Ben Hutchings

tcp: change tcp_adv_win_scale and tcp_rmem[2]

2012-05-20T21:56:38+00:00

[ Upstream commit b49960a05e32121d29316cfdf653894b88ac9190 ]

tcp_adv_win_scale default value is 2, meaning we expect a good citizen
skb to have skb->len / skb->truesize ratio of 75% (3/4)

In 2.6 kernels we (mis)accounted for typical MSS=1460 frame :
1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
So these skbs were considered as not bloated.

With recent truesize fixes, a typical MSS=1460 frame truesize is now the
more precise :
2048 + 256 = 2304. But 2304 * 3/4 = 1728.
So these skb are not good citizen anymore, because 1460 < 1728

(GRO can escape this problem because it build skbs with a too low
truesize.)

This also means tcp advertises a too optimistic window for a given
allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
especially when application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major latency
source.

We should adjust the len/truesize ratio to 50% instead of 75%

This patch :

1) changes tcp_adv_win_scale default to 1 instead of 2

2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account
better truesize tracking and to allow autotuning tcp receive window to
reach same value than before. Note that same amount of kernel memory is
consumed compared to 2.6 kernels.

Signed-off-by: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Tom Herbert 
Cc: Yuchung Cheng 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

tcp: fix infinite cwnd in tcp_complete_cwr()

2012-05-20T21:56:37+00:00

[ Upstream commit 1cebce36d660c83bd1353e41f3e66abd4686f215 ]

When the cwnd reduction is done, ssthresh may be infinite
if TCP enters CWR via ECN or F-RTO. If cwnd is not undone, i.e.,
undo_marker is set, tcp_complete_cwr() falsely set cwnd to the
infinite ssthresh value. The correct operation is to keep cwnd
intact because it has been updated in ECN or F-RTO.

Signed-off-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

tcp: fix tcp_grow_window() for large incoming frames

2012-05-11T12:14:26+00:00

[ Upstream commit 4d846f02392a710f9604892ac3329e628e60a230 ]

tcp_grow_window() has to grow rcv_ssthresh up to window_clamp, allowing
sender to increase its window.

tcp_grow_window() still assumes a tcp frame is under MSS, but its no
longer true with LRO/GRO.

This patch fixes one of the performance issue we noticed with GRO on.

Signed-off-by: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Tom Herbert 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

tcp: avoid order-1 allocations on wifi and tx path

2012-05-11T12:14:22+00:00

[ This combines upstream commit
  a21d45726acacc963d8baddf74607d9b74e2b723 and the follow-on bug fix
  commit 22b4a4f22da4b39c6f7f679fd35f3d35c91bf851 ]

Marc Merlin reported many order-1 allocations failures in TX path on its
wireless setup, that dont make any sense with MTU=1500 network, and non
SG capable hardware.

After investigation, it turns out TCP uses sk_stream_alloc_skb() and
used as a convention skb_tailroom(skb) to know how many bytes of data
payload could be put in this skb (for non SG capable devices)

Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER +
sizeof(struct skb_shared_info) being above 2048)

Later, mac80211 layer need to add some bytes at the tail of skb
(IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is
available has to call pskb_expand_head() and request order-1
allocations.

This patch changes sk_stream_alloc_skb() so that only
sk->sk_prot->max_header bytes of headroom are reserved, and use a new
skb field, avail_size to hold the data payload limit.

This way, order-0 allocations done by TCP stack can leave more than 2 KB
of tailroom and no more allocation is performed in mac80211 layer (or
any layer needing some tailroom)

avail_size is unioned with mark/dropcount, since mark will be set later
in IP stack for output packets. Therefore, skb size is unchanged.

Reported-by: Marc MERLIN 
Tested-by: Marc MERLIN 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
[bwh: Correct commit hash for follow-on bug fix]
Signed-off-by: Ben Hutchings

tcp: fix tcp_trim_head()

2012-05-11T12:14:22+00:00

[ Upstream commit 4fa48bf3c75069d636fc8830743c929a062e80dc ]

commit f07d960df3 (tcp: avoid frag allocation for small frames)
breaked assumption in tcp stack that skb is either linear (skb->data_len
== 0), or fully fragged (skb->data_len == skb->len)

tcp_trim_head() made this assumption, we must fix it.

Thanks to Vijay for providing a very detailed explanation.

Reported-by: Vijay Subramanian 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings