linux-toradex.git/net/ipv4, branch v4.1.12

inet: fix race in reqsk_queue_unlink()

2015-10-27T00:51:50+00:00

[ Upstream commit 2306c704ce280c97a60d1f45333b822b40281dea ]

reqsk_timer_handler() tests if icsk_accept_queue.listen_opt
is NULL at its beginning.

By the time it calls inet_csk_reqsk_queue_drop() and
reqsk_queue_unlink(), the listener might have been closed and
inet_csk_listen_stop() might have called reqsk_queue_yank_acceptq(),
which sets icsk_accept_queue.listen_opt to NULL.

We therefore need to correctly check listen_opt being NULL
after holding syn_wait_lock for proper synchronization.

Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Fixes: b357a364c57c ("inet: fix possible panic in reqsk_queue_unlink()")
Signed-off-by: Eric Dumazet 
Cc: Yuchung Cheng 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman
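
The locking rule described in the reqsk_queue_unlink() race fix above can be
sketched in userspace C. This is only a model under stated assumptions: the
struct and function names are hypothetical, and a pthread mutex stands in for
the kernel's syn_wait_lock. It shows the NULL test on listen_opt being made
only after the lock is held, so a concurrent teardown cannot slip in between
the test and the use.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct listen_opt_model {
    int nr_requests;                      /* pending request sockets */
};

struct accept_queue_model {
    pthread_mutex_t syn_wait_lock;        /* models syn_wait_lock */
    struct listen_opt_model *listen_opt;  /* NULL once the listener is closed */
};

/* Models listener teardown: detach listen_opt while holding the lock. */
static struct listen_opt_model *queue_yank(struct accept_queue_model *q)
{
    pthread_mutex_lock(&q->syn_wait_lock);
    struct listen_opt_model *old = q->listen_opt;
    q->listen_opt = NULL;
    pthread_mutex_unlock(&q->syn_wait_lock);
    return old;
}

/* Models the unlink path: listen_opt is read and tested under the lock. */
static bool queue_unlink_one(struct accept_queue_model *q)
{
    bool found = false;

    pthread_mutex_lock(&q->syn_wait_lock);
    struct listen_opt_model *lopt = q->listen_opt;
    if (lopt != NULL && lopt->nr_requests > 0) {
        lopt->nr_requests--;
        found = true;
    }
    pthread_mutex_unlock(&q->syn_wait_lock);
    return found;
}
```

Once queue_yank() has run, queue_unlink_one() returns false and never
dereferences the detached pointer.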

inet: fix races in reqsk_queue_hash_req()

2015-10-27T00:51:49+00:00

[ Upstream commit 29c6852602e259d2c1882f320b29d5c3fec0de04 ]

Before allowing lockless LISTEN processing, we need to make
sure to arm the SYN_RECV timer before the req socket is visible
in hash tables.

Also, req->rsk_hash should be written before we set rsk_refcnt
to a non zero value.

Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet 
Cc: Ying Cai 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman
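
The ordering required by the reqsk_queue_hash_req() fix above can be modelled
with C11 atomics. This is a sketch, not the kernel code: the struct, the
function names and the refcount value 2 are illustrative assumptions, and a
release store stands in for the kernel's write barrier. The point is that the
timer state and the hash are written first, and the non-zero refcount is
published last.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a request socket; field names mirror the text only. */
struct req_model {
    uint32_t hash;       /* models req->rsk_hash */
    bool timer_armed;    /* models the SYN_RECV timer having been armed */
    atomic_int refcnt;   /* models rsk_refcnt; 0 means "not visible yet" */
};

/* Writer: fill in every field, then publish with a release store. */
static void req_publish(struct req_model *req, uint32_t hash)
{
    req->timer_armed = true;
    req->hash = hash;
    /* Assumed value: one reference for the table, one for the timer. */
    atomic_store_explicit(&req->refcnt, 2, memory_order_release);
}

/* Lockless reader: trust the other fields only after seeing refcnt != 0. */
static bool req_lookup(struct req_model *req, uint32_t *hash_out)
{
    if (atomic_load_explicit(&req->refcnt, memory_order_acquire) == 0)
        return false;
    *hash_out = req->hash;
    return req->timer_armed;
}
```

A reader that observes a non-zero refcount is then guaranteed to see the hash
and the armed timer as well.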

inet: fix potential deadlock in reqsk_queue_unlink()

2015-10-22T21:43:24+00:00

commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af upstream.

When replacing del_timer() with del_timer_sync(), I introduced
a deadlock condition :

reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()

inet_csk_reqsk_queue_drop() can be called from many contexts,
one being the timer handler itself (reqsk_timer_handler()).

In this case, del_timer_sync() loops forever.

Simple fix is to test if timer is pending.

Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Cc: Holger Hoffstätte 
Cc: Andre Tomt 
Cc: Chris Caputo 
Signed-off-by: Greg Kroah-Hartman
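
The "test if timer is pending" idea in the deadlock fix above can be
illustrated with a single-threaded model. Everything here is hypothetical:
timer_model, del_sync_model() and unlink_timer() are not kernel APIs, and the
return value -1 merely marks the case where a synchronous delete would wait
for the handler that is calling it.

```c
#include <stdbool.h>

/* Hypothetical timer state; the kernel keeps this inside struct timer_list. */
struct timer_model {
    bool pending;   /* queued, handler has not started */
    bool running;   /* handler is executing in the caller's own context */
};

/*
 * Model of a synchronous delete.
 * Returns 1 if a pending timer was removed, 0 if the timer was idle,
 * and -1 where the real primitive would wait on its own handler forever.
 */
static int del_sync_model(struct timer_model *t)
{
    if (t->running)
        return -1;
    if (t->pending) {
        t->pending = false;
        return 1;
    }
    return 0;
}

/* Only attempt the synchronous delete when the timer is still pending. */
static int unlink_timer(struct timer_model *t)
{
    if (!t->pending)
        return 0;
    return del_sync_model(t);
}
```

In the handler's own context the timer is no longer pending, so
unlink_timer() returns 0 without entering the synchronous delete.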

tcp: add proper TS val into RST packets

2015-10-03T11:49:17+00:00

[ Upstream commit 675ee231d960af2af3606b4480324e26797eb010 ]

RST packets sent on behalf of TCP connections with TS option (RFC 7323
TCP timestamps) have incorrect TS val (set to 0), but correct TS ecr.

A > B: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,TS val 100 ecr 0], length 0
B > A: Flags [S.], seq 2444755794, ack 1, win 28960, options [mss 1460,nop,nop,TS val 7264344 ecr 100], length 0
A > B: Flags [.], ack 1, win 65535, options [nop,nop,TS val 110 ecr 7264344], length 0

B > A: Flags [R.], seq 1, ack 1, win 28960, options [nop,nop,TS val 0 ecr 110], length 0

We need to call skb_mstamp_get() to get a proper TS val,
derived from skb->skb_mstamp.

Note that RFC 1323 was advocating to not send TS option in RST segment,
but RFC 7323 recommends the opposite :

  Once TSopt has been successfully negotiated, that is both <SYN> and
  <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
  segment for the duration of the connection, and SHOULD be sent in an
  <RST> segment (see Section 5.2 for details)

Note this RFC recommends to send TS val = 0, but we believe it is
premature : We do not know if all TCP stacks are properly
handling the receive side :

   When an <RST> segment is
   received, it MUST NOT be subjected to the PAWS check by verifying an
   acceptable value in SEG.TSval, and information from the Timestamps
   option MUST NOT be used to update connection state information.
   SEG.TSecr MAY be used to provide stricter <RST> acceptance checks.

In 5 years, if/when all TCP stacks are RFC 7323 ready, we might consider
sending TS val = 0, if it buys something.

Fixes: 7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when")
Signed-off-by: Eric Dumazet 
Acked-by: Yuchung Cheng 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman
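
For reference, the "nop,nop,TS val ... ecr ..." option seen in the traces of
the RST timestamp commit above has a fixed 12-byte wire layout. The helper
below is a minimal sketch with hypothetical names (it is not the kernel's
option writer); it encodes two NOPs, kind 8, length 10, then TSval and TSecr
in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    OPT_NOP = 1,          /* TCP option kind: no-operation (padding) */
    OPT_TIMESTAMP = 8,    /* TCP option kind: timestamps */
    OPT_TIMESTAMP_LEN = 10
};

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Write NOP, NOP, kind, len, TSval, TSecr; returns the bytes written (12). */
static size_t build_ts_option(uint8_t out[12], uint32_t tsval, uint32_t tsecr)
{
    out[0] = OPT_NOP;
    out[1] = OPT_NOP;
    out[2] = OPT_TIMESTAMP;
    out[3] = OPT_TIMESTAMP_LEN;
    put_be32(out + 4, tsval);
    put_be32(out + 8, tsecr);
    return 12;
}
```

With a non-zero tsval taken from the sender's clock and tsecr 110, this
yields the shape of option the commit wants on the RST, rather than TS val 0.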

ipv4: off-by-one in continuation handling in /proc/net/route

2015-09-29T17:26:26+00:00

[ Upstream commit 25b97c016b26039982daaa2c11d83979f93b71ab ]

When generating /proc/net/route we emit a header followed by a line for
each route.  When a short read is performed we will restart this process
based on the open file descriptor.  When calculating the start point we
fail to take into account that the 0th entry is the header.  This leads
us to skip the first entry when doing a continuation read.

This can be easily seen with the comparison below:

  while read l; do echo "$l"; done </proc/net/route >A
  cat /proc/net/route >B
  diff -bu A B | grep '^[+-]'

On my example machine I have approximately 10KB of route output.  There we
see the very first non-title element is lost in the while read case,
and an entry around the 8K mark in the cat case:

  +wlan0 00000000 02021EAC 0003 0 0 400 00000000 0 0 0
  -tun1  00C0AC0A 00000000 0001 0 0 950 00C0FFFF 0 0 0

Fix up the off-by-one when reacquiring position on continuation.

Fixes: 8be33e955cb9 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
BugLink: http://bugs.launchpad.net/bugs/1483440
Acked-by: Alexander Duyck 
Signed-off-by: Andy Whitcroft 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman
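
The position arithmetic behind the /proc/net/route off-by-one above can be
sketched as follows. This is a userspace model with hypothetical names, not
the fib_trie code: position 0 is the header line, so position p >= 1 must map
to route index p - 1 when a continuation read re-derives its start point.

```c
#include <stddef.h>

/* Position 0 is the header; position p >= 1 is route index p - 1. */
static const char *route_line_at(const char *header,
                                 const char *const *routes, size_t nroutes,
                                 size_t pos)
{
    if (pos == 0)
        return header;
    if (pos - 1 < nroutes)
        return routes[pos - 1];
    return NULL; /* past the end */
}

/*
 * Read the whole listing in short reads of `chunk` lines, restarting from
 * the saved position each time. Returns the number of lines stored in out.
 */
static size_t read_in_chunks(const char *header,
                             const char *const *routes, size_t nroutes,
                             size_t chunk, const char **out, size_t out_cap)
{
    size_t pos = 0, n = 0;

    if (chunk == 0)
        return 0;
    for (;;) {
        for (size_t emitted = 0; emitted < chunk; emitted++) {
            const char *line = route_line_at(header, routes, nroutes, pos);
            if (line == NULL || n == out_cap)
                return n;
            out[n++] = line;
            pos++;
        }
    }
}
```

Mapping position p to route index p instead would skip one route at every
continuation, which is the symptom described in the commit.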

inet: fix races with reqsk timers

2015-09-29T17:26:26+00:00

[ Upstream commit 2235f2ac75fd2501c251b0b699a9632e80239a6d ]

reqsk_queue_destroy() and reqsk_queue_unlink() should use
del_timer_sync() instead of del_timer() before calling reqsk_put(),
otherwise we could free a req still used by another cpu.

But before doing so, reqsk_queue_destroy() must release the syn_wait_lock
spinlock or risk a deadlock, as reqsk_timer_handler() might
need to take this same spinlock from reqsk_queue_unlink() (called from
inet_csk_reqsk_queue_drop()).

Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

inet: fix possible request socket leak

2015-09-29T17:26:26+00:00

[ Upstream commit 3257d8b12f954c462d29de6201664a846328a522 ]

In commit b357a364c57c9 ("inet: fix possible panic in
reqsk_queue_unlink()"), I missed the fact that tcp_check_req()
can return the listener socket in one case, and that we must
release the request socket refcount or we leak it.

Tested:

 The following packetdrill test template shows the issue:

0     socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0    bind(3, ..., ...) = 0
+0    listen(3, 1) = 0

+0    < S 0:0(0) win 2920 
+0    > S. 0:0(0) ack 1 
+.002 < . 1:1(0) ack 21 win 2920
+0    > R 21:21(0)

Fixes: b357a364c57c9 ("inet: fix possible panic in reqsk_queue_unlink()")
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

udp: fix dst races with multicast early demux

2015-09-29T17:26:25+00:00

[ Upstream commit 10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a ]

Multicast dst are not cached. They carry DST_NOCACHE.

As mentioned in commit f8864972126899 ("ipv4: fix dst race in
sk_dst_get()"), these dst need special care before caching them
into a socket.

Caching them is allowed only if their refcnt was not 0, i.e. we
must use atomic_inc_not_zero().

Also, we must use READ_ONCE() to fetch sk->sk_rx_dst, as mentioned
in commit d0c294c53a771 ("tcp: prevent fetching dst twice in early demux
code")

Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Tested-by: Gregory Hoggarth 
Signed-off-by: Eric Dumazet 
Reported-by: Gregory Hoggarth 
Reported-by: Alex Gartrell 
Cc: Michal Kubeček 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman
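
The "take a reference only if the count is not already zero" rule from the
multicast early demux fix above can be sketched with C11 atomics. The names
dst_model, dst_hold_safe() and cache_rx_dst() are hypothetical; the
compare-and-swap loop is a userspace analogue of atomic_inc_not_zero(), not
the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a refcounted dst entry. */
struct dst_model {
    atomic_int refcnt;
};

/* Increment refcnt unless it is 0 (object already on its way to free). */
static bool dst_hold_safe(struct dst_model *dst)
{
    int old = atomic_load(&dst->refcnt);

    while (old != 0) {
        /* On failure, old is reloaded and the zero test is repeated. */
        if (atomic_compare_exchange_weak(&dst->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Cache the dst in a socket-like slot only when a reference was obtained. */
static bool cache_rx_dst(struct dst_model **slot, struct dst_model *dst)
{
    if (!dst_hold_safe(dst))
        return false;
    *slot = dst;
    return true;
}
```

A plain increment would resurrect an entry whose count had already dropped to
zero; the conditional form refuses to cache it instead.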

fib_trie: Drop unnecessary calls to leaf_pull_suffix

2015-09-29T17:26:24+00:00

[ Upstream commit 1513069edcf8dd86cfd8d5daef482b97d6b93df6 ]

It was reported that update_suffix was taking a long time on systems where
a large number of leaves were attached to a single node.  As it turns out
fib_table_flush was calling update_suffix for each leaf that didn't have all
of the aliases stripped from it.  As a result, on this large node removing
one leaf would result in us calling update_suffix for every other leaf on
the node.

The fix is to just remove the calls to leaf_pull_suffix since they are
redundant as we already have a call in resize that will go through and
update the suffix length for the node before we exit out of
fib_table_flush or fib_table_flush_external.

Reported-by: David Ahern 
Signed-off-by: Alexander Duyck 
Tested-by: David Ahern 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

inet: frags: fix defragmented packet's IP header for af_packet

2015-09-29T17:26:23+00:00

[ Upstream commit 0848f6428ba3a2e42db124d41ac6f548655735bf ]

When ip_frag_queue() computes positions, it assumes that the passed
sk_buff does not contain L2 headers.

However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
functions can be called on outgoing packets that contain L2 headers.

Also, the IPv4 checksum is not corrected after reassembly.

Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts.")
Signed-off-by: Edward Hyunkoo Jee 
Signed-off-by: Eric Dumazet 
Cc: Willem de Bruijn 
Cc: Jerry Chu 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman
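
As an illustration of what correcting the IPv4 checksum after reassembly
involves, here is a minimal userspace sketch of the standard header checksum
(one's complement of the one's-complement sum of the 16-bit header words).
The function names are hypothetical and this is not the kernel's routine; it
assumes an even header length, which holds for IPv4 because IHL counts 32-bit
words.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over an IPv4 header of `len` bytes (even). */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Zero the checksum field (bytes 10-11), recompute it, and store it. */
static void ipv4_refresh_checksum(uint8_t *hdr, size_t len)
{
    hdr[10] = 0;
    hdr[11] = 0;
    uint16_t csum = ipv4_header_checksum(hdr, len);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)csum;
}
```

Any change to header fields during reassembly, such as the total length or
the fragment offset, invalidates the stored value, so a refresh like this has
to follow. A header with a valid checksum sums to zero under
ipv4_header_checksum().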