linux-toradex.git/net/ipv4/tcp_input.c, branch v3.4.88

tcp: do not forget FIN in tcp_shifted_skb()

2013-11-04T12:23:40+00:00

[ Upstream commit 5e8a402f831dbe7ee831340a91439e46f0d38acd ]

Yuchung found following problem :

 There are bugs in the SACK processing code, merging part in
 tcp_shift_skb_data(), that incorrectly resets or ignores the sacked
 skbs FIN flag. When a receiver first SACK the FIN sequence, and later
 throw away ofo queue (e.g., sack-reneging), the sender will stop
 retransmitting the FIN flag, and hangs forever.

Following packetdrill test can be used to reproduce the bug.

$ cat sack-merge-bug.pkt
`sysctl -q net.ipv4.tcp_fack=0`

// Establish a connection and send 10 MSS.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+.000 bind(3, ..., ...) = 0
+.000 listen(3, 1) = 0

+.050 < S 0:0(0) win 32792 
+.000 > S. 0:0(0) ack 1 
+.001 < . 1:1(0) ack 1 win 1024
+.000 accept(3, ..., ...) = 4

+.100 write(4, ..., 12000) = 12000
+.000 shutdown(4, SHUT_WR) = 0
+.000 > . 1:10001(10000) ack 1
+.050 < . 1:1(0) ack 2001 win 257
+.000 > FP. 10001:12001(2000) ack 1
+.050 < . 1:1(0) ack 2001 win 257 
+.050 < . 1:1(0) ack 2001 win 257 
// SACK reneg
+.050 < . 1:1(0) ack 12001 win 257
+0 %{ print "unacked: ",tcpi_unacked }%
+5 %{ print "" }%

First, a typo inverted left/right of one OR operation, then
code forgot to advance end_seq if the merged skb carried FIN.

Bug was added in 2.6.29 by commit 832d11c5cd076ab
("tcp: Try to restore large SKBs while SACK processing")

Signed-off-by: Eric Dumazet 
Signed-off-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Cc: Ilpo Järvinen 
Acked-by: Ilpo Järvinen 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: bug fix in proportional rate reduction.

2013-06-27T18:27:31+00:00

[ Upstream commit 35f079ebbc860dcd1cca70890c9c8d59c1145525 ]

This patch is a fix for a bug triggering newly_acked_sacked < 0
in tcp_ack(.).

The bug is triggered by sacked_out decreasing relative to prior_sacked,
but packets_out remaining the same as pior_packets. This is because the
snapshot of prior_packets is taken after tcp_sacktag_write_queue() while
prior_sacked is captured before tcp_sacktag_write_queue(). The problem
is: tcp_sacktag_write_queue (tcp_match_skb_to_sack() -> tcp_fragment)
adjusts the pcount for packets_out and sacked_out (MSS change or other
reason). As a result, this delta in pcount is reflected in
(prior_sacked - sacked_out) but not in (prior_packets - packets_out).

This patch does the following:
1) initializes prior_packets at the start of tcp_ack() so as to
capture the delta in packets_out created by tcp_fragment.
2) introduces a new "previous_packets_out" variable that snapshots
packets_out right before tcp_clean_rtx_queue, so pkts_acked can be
correctly computed as before.
3) Computes pkts_acked using previous_packets_out, and computes
newly_acked_sacked using prior_packets.

Signed-off-by: Nandita Dukkipati 
Acked-by: Yuchung Cheng 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: call tcp_replace_ts_recent() from tcp_ack()

2013-05-01T16:41:08+00:00

[ Upstream commit 12fb3dd9dc3c64ba7d64cec977cca9b5fb7b1d4e ]

commit bd090dfc634d (tcp: tcp_replace_ts_recent() should not be called
from tcp_validate_incoming()) introduced a TS ecr bug in slow path
processing.

1 A > B P. 1:10001(10000) ack 1 
2 B < A . 1:1(0) ack 1 win 257 
3 A > B . 1:1001(1000) ack 1 win 227 
4 A > B . 1001:2001(1000) ack 1 win 227 

(ecr 200 should be ecr 300 in packets 3 & 4)

Problem is tcp_ack() can trigger send of new packets (retransmits),
reflecting the prior TSval, instead of the TSval contained in the
currently processed incoming packet.

Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the
checks, but before the actions.

Reported-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Cc: Neal Cardwell 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: undo spurious timeout after SACK reneging

2013-04-05T17:04:38+00:00

[ Upstream commit 7ebe183c6d444ef5587d803b64a1f4734b18c564 ]

On SACK reneging the sender immediately retransmits and forces a
timeout but disables Eifel (undo). If the (buggy) receiver does not
drop any packet this can trigger a false slow-start retransmit storm
driven by the ACKs of the original packets. This can be detected with
undo and TCP timestamps.

Signed-off-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: fix double-counted receiver RTT when leaving receiver fast path

2013-03-20T20:05:01+00:00

[ Upstream commit aab2b4bf224ef8358d262f95b568b8ad0cecf0a0 ]

We should not update ts_recent and call tcp_rcv_rtt_measure_ts() both
before and after going to step5. That wastes CPU and double-counts the
receiver-side RTT sample.

Signed-off-by: Neal Cardwell 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: fix for zero packets_in_flight was too broad

2013-02-14T18:49:06+00:00

[ Upstream commit 6731d2095bd4aef18027c72ef845ab1087c3ba63 ]

There are transients during normal FRTO procedure during which
the packets_in_flight can go to zero between write_queue state
updates and firing the resulting segments out. As FRTO processing
occurs during that window the check must be more precise to
not match "spuriously" :-). More specificly, e.g., when
packets_in_flight is zero but FLAG_DATA_ACKED is true the problematic
branch that set cwnd into zero would not be taken and new segments
might be sent out later.

Signed-off-by: Ilpo Järvinen 
Tested-by: Eric Dumazet 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: frto should not set snd_cwnd to 0

2013-02-14T18:49:06+00:00

[ Upstream commit 2e5f421211ff76c17130b4597bc06df4eeead24f ]

Commit 9dc274151a548 (tcp: fix ABC in tcp_slow_start())
uncovered a bug in FRTO code :
tcp_process_frto() is setting snd_cwnd to 0 if the number
of in flight packets is 0.

As Neal pointed out, if no packet is in flight we lost our
chance to disambiguate whether a loss timeout was spurious.

We should assume it was a proper loss.

Reported-by: Pasi Kärkkäinen 
Signed-off-by: Neal Cardwell 
Signed-off-by: Eric Dumazet 
Cc: Ilpo Järvinen 
Cc: Yuchung Cheng 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation

2013-01-11T17:07:15+00:00

[ Upstream commit 354e4aa391ed50a4d827ff6fc11e0667d0859b25 ]

RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation]

  All TCP stacks MAY implement the following mitigation.  TCP stacks
  that implement this mitigation MUST add an additional input check to
  any incoming segment.  The ACK value is considered acceptable only if
  it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
  SND.NXT).  All incoming segments whose ACK value doesn't satisfy the
  above condition MUST be discarded and an ACK sent back.

Move tcp_send_challenge_ack() before tcp_ack() to avoid a forward
declaration.

Signed-off-by: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Yuchung Cheng 
Cc: Jerry Chu 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming()

2013-01-11T17:07:15+00:00

[ Upstream commit bd090dfc634ddd711a5fbd0cadc6e0ab4977bcaf ]

We added support for RFC 5961 in latest kernels but TCP fails
to perform exhaustive check of ACK sequence.

We can update our view of peer tsval from a frame that is
later discarded by tcp_ack()

This makes timestamps enabled sessions vulnerable to injection of
a high tsval : peers start an ACK storm, since the victim
sends a dupack each time it receives an ACK from the other peer.

As tcp_validate_incoming() is called before tcp_ack(), we should
not peform tcp_replace_ts_recent() from it, and let callers do it
at the right time.

Signed-off-by: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Yuchung Cheng 
Cc: Nandita Dukkipati 
Cc: H.K. Jerry Chu 
Cc: Romain Francoise 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: refine SYN handling in tcp_validate_incoming

2013-01-11T17:07:15+00:00

[ Upstream commit e371589917011efe6ff8c7dfb4e9e81934ac5855 ]

Followup of commit 0c24604b68fc (tcp: implement RFC 5961 4.2)

As reported by Vijay Subramanian, we should send a challenge ACK
instead of a dup ack if a SYN flag is set on a packet received out of
window.

This permits the ratelimiting to work as intended, and to increase
correct SNMP counters.

Suggested-by: Vijay Subramanian 
Signed-off-by: Eric Dumazet 
Acked-by: Vijay Subramanian 
Cc: Kiran Kumar Kella 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman