linux-toradex.git/net/ipv4, branch v3.4.30

tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation

2013-01-11T17:07:15+00:00

[ Upstream commit 354e4aa391ed50a4d827ff6fc11e0667d0859b25 ]

RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation]

  All TCP stacks MAY implement the following mitigation.  TCP stacks
  that implement this mitigation MUST add an additional input check to
  any incoming segment.  The ACK value is considered acceptable only if
  it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
  SND.NXT).  All incoming segments whose ACK value doesn't satisfy the
  above condition MUST be discarded and an ACK sent back.

Move tcp_send_challenge_ack() before tcp_ack() to avoid a forward
declaration.

Signed-off-by: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Yuchung Cheng 
Cc: Jerry Chu 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming()

2013-01-11T17:07:15+00:00

[ Upstream commit bd090dfc634ddd711a5fbd0cadc6e0ab4977bcaf ]

We added support for RFC 5961 in latest kernels but TCP fails
to perform exhaustive check of ACK sequence.

We can update our view of peer tsval from a frame that is
later discarded by tcp_ack()

This makes timestamps enabled sessions vulnerable to injection of
a high tsval : peers start an ACK storm, since the victim
sends a dupack each time it receives an ACK from the other peer.

As tcp_validate_incoming() is called before tcp_ack(), we should
not peform tcp_replace_ts_recent() from it, and let callers do it
at the right time.

Signed-off-by: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Yuchung Cheng 
Cc: Nandita Dukkipati 
Cc: H.K. Jerry Chu 
Cc: Romain Francoise 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: refine SYN handling in tcp_validate_incoming

2013-01-11T17:07:15+00:00

[ Upstream commit e371589917011efe6ff8c7dfb4e9e81934ac5855 ]

Followup of commit 0c24604b68fc (tcp: implement RFC 5961 4.2)

As reported by Vijay Subramanian, we should send a challenge ACK
instead of a dup ack if a SYN flag is set on a packet received out of
window.

This permits the ratelimiting to work as intended, and to increase
correct SNMP counters.

Suggested-by: Vijay Subramanian 
Signed-off-by: Eric Dumazet 
Acked-by: Vijay Subramanian 
Cc: Kiran Kumar Kella 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: implement RFC 5961 4.2

2013-01-11T17:07:15+00:00

[ Upstream commit 0c24604b68fc7810d429d6c3657b6f148270e528 ]

Implement the RFC 5691 mitigation against Blind
Reset attack using SYN bit.

Section 4.2 of RFC 5961 advises to send a Challenge ACK and drop
incoming packet, instead of resetting the session.

Add a new SNMP counter to count number of challenge acks sent
in response to SYN packets.
(netstat -s | grep TCPSYNChallenge)

Remove obsolete TCPAbortOnSyn, since we no longer abort a TCP session
because of a SYN flag.

Signed-off-by: Eric Dumazet 
Cc: Kiran Kumar Kella 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: implement RFC 5961 3.2

2013-01-11T17:07:14+00:00

[ Upstream commit 282f23c6ee343126156dd41218b22ece96d747e3 ]

Implement the RFC 5691 mitigation against Blind
Reset attack using RST bit.

Idea is to validate incoming RST sequence,
to match RCV.NXT value, instead of previouly accepted
window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND)

If sequence is in window but not an exact match, send
a "challenge ACK", so that the other part can resend an
RST with the appropriate sequence.

Add a new sysctl, tcp_challenge_ack_limit, to limit
number of challenge ACK sent per second.

Add a new SNMP counter to count number of challenge acks sent.
(netstat -s | grep TCPChallengeACK)

Signed-off-by: Eric Dumazet 
Cc: Kiran Kumar Kella 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock

2013-01-11T17:07:14+00:00

[ Upstream commit e337e24d6624e74a558aa69071e112a65f7b5758 ]

If in either of the above functions inet_csk_route_child_sock() or
__inet_inherit_port() fails, the newsk will not be freed:

unreferenced object 0xffff88022e8a92c0 (size 1592):
  comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
  hex dump (first 32 bytes):
    0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00  ................
    02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [] kmemleak_alloc+0x21/0x3e
    [] kmem_cache_alloc+0xb5/0xc5
    [] sk_prot_alloc.isra.53+0x2b/0xcd
    [] sk_clone_lock+0x16/0x21e
    [] inet_csk_clone_lock+0x10/0x7b
    [] tcp_create_openreq_child+0x21/0x481
    [] tcp_v4_syn_recv_sock+0x3a/0x23b
    [] tcp_check_req+0x29f/0x416
    [] tcp_v4_do_rcv+0x161/0x2bc
    [] tcp_v4_rcv+0x6c9/0x701
    [] ip_local_deliver_finish+0x70/0xc4
    [] ip_local_deliver+0x4e/0x7f
    [] ip_rcv_finish+0x1fc/0x233
    [] ip_rcv+0x217/0x267
    [] __netif_receive_skb+0x49e/0x553
    [] netif_receive_skb+0x50/0x82

This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
a single sock_put() is not enough to free the memory. Additionally, things
like xfrm, memcg, cookie_values,... may have been initialized.
We have to free them properly.

This is fixed by forcing a call to tcp_done(), ending up in
inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
because it ends up doing all the cleanup on xfrm, memcg, cookie_values,
xfrm,...

Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
force it entering inet_csk_destroy_sock. To avoid the warning in
inet_csk_destroy_sock, inet_num has to be set to 0.
As inet_csk_destroy_sock does a dec on orphan_count, we first have to
increase it.

Calling tcp_done() allows us to remove the calls to
tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().

A similar approach is taken for dccp by calling dccp_done().

This is in the kernel since 093d282321 (tproxy: fix hash locking issue
when using port redirection in __inet_inherit_port()), thus since
version >= 2.6.37.

Signed-off-by: Christoph Paasch 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

inet_diag: validate port comparison byte code to prevent unsafe reads

2013-01-11T17:06:29+00:00

[ Upstream commit 5e1f54201cb481f40a04bc47e1bc8c093a189e23 ]

Add logic to verify that a port comparison byte code operation
actually has the second inet_diag_bc_op from which we read the port
for such operations.

Previously the code blindly referenced op[1] without first checking
whether a second inet_diag_bc_op struct could fit there. So a
malicious user could make the kernel read 4 bytes beyond the end of
the bytecode array by claiming to have a whole port comparison byte
code (2 inet_diag_bc_op structs) when in fact the bytecode was not
long enough to hold both.

Signed-off-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()

2013-01-11T17:06:28+00:00

[ Upstream commit f67caec9068cee426ec23cf9005a1dee2ecad187 ]

Add logic to check the address family of the user-supplied conditional
and the address family of the connection entry. We now do not do
prefix matching of addresses from different address families (AF_INET
vs AF_INET6), except for the previously existing support for having an
IPv4 prefix match an IPv4-mapped IPv6 address (which this commit
maintains as-is).

This change is needed for two reasons:

(1) The addresses are different lengths, so comparing a 128-bit IPv6
prefix match condition to a 32-bit IPv4 connection address can cause
us to unwittingly walk off the end of the IPv4 address and read
garbage or oops.

(2) The IPv4 and IPv6 address spaces are semantically distinct, so a
simple bit-wise comparison of the prefixes is not meaningful, and
would lead to bogus results (except for the IPv4-mapped IPv6 case,
which this commit maintains).

Signed-off-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

inet_diag: validate byte code to prevent oops in inet_diag_bc_run()

2013-01-11T17:06:28+00:00

[ Upstream commit 405c005949e47b6e91359159c24753519ded0c67 ]

Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND
operations.

Previously we did not validate the inet_diag_hostcond, address family,
address length, and prefix length. So a malicious user could make the
kernel read beyond the end of the bytecode array by claiming to have a
whole inet_diag_hostcond when the bytecode was not long enough to
contain a whole inet_diag_hostcond of the given address family. Or
they could make the kernel read up to about 27 bytes beyond the end of
a connection address by passing a prefix length that exceeded the
length of addresses of the given family.

Signed-off-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state

2013-01-11T17:06:28+00:00

[ Upstream commit 1c95df85ca49640576de2f0a850925957b547b84 ]

Fix inet_diag to be aware of the fact that AF_INET6 TCP connections
instantiated for IPv4 traffic and in the SYN-RECV state were actually
created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This
means that for such connections inet6_rsk(req) returns a pointer to a
random spot in memory up to roughly 64KB beyond the end of the
request_sock.

With this bug, for a server using AF_INET6 TCP sockets and serving
IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to
inet_diag_fill_req() causing an oops or the export to user space of 16
bytes of kernel memory as a garbage IPv6 address, depending on where
the garbage inet6_rsk(req) pointed.

Signed-off-by: Neal Cardwell 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman