linux-toradex.git/net/ipv4, branch v3.2.74

ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context.

2015-11-27T12:48:25+00:00

[ Upstream commit 44f49dd8b5a606870a1f21101522a0f9c4414784 ]

Fixes the following kernel BUG :

BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758
caller is __this_cpu_preempt_check+0x13/0x15
CPU: 0 PID: 2758 Comm: bash Tainted: P           O   3.18.19 #2
 ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000
 0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800
 ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8
Call Trace:
[] dump_stack+0x52/0x80
[] check_preemption_disabled+0xce/0xe1
[] __this_cpu_preempt_check+0x13/0x15
[] ipmr_queue_xmit+0x647/0x70c
[] ip_mr_forward+0x32f/0x34e
[] ip_mroute_setsockopt+0xe03/0x108c
[] ? get_parent_ip+0x11/0x42
[] ? pollwake+0x4d/0x51
[] ? default_wake_function+0x0/0xf
[] ? get_parent_ip+0x11/0x42
[] ? __wake_up_common+0x45/0x77
[] ? _raw_spin_unlock_irqrestore+0x1d/0x32
[] ? __wake_up_sync_key+0x4a/0x53
[] ? sock_def_readable+0x71/0x75
[] do_ip_setsockopt+0x9d/0xb55
[] ? unix_seqpacket_sendmsg+0x3f/0x41
[] ? sock_sendmsg+0x6d/0x86
[] ? sockfd_lookup_light+0x12/0x5d
[] ? SyS_sendto+0xf3/0x11b
[] ? new_sync_read+0x82/0xaa
[] compat_ip_setsockopt+0x3b/0x99
[] compat_raw_setsockopt+0x11/0x32
[] compat_sock_common_setsockopt+0x18/0x1f
[] compat_SyS_setsockopt+0x1a9/0x1cf
[] compat_SyS_socketcall+0x180/0x1e3
[] cstar_dispatch+0x7/0x1e

Signed-off-by: Ani Sinha 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2: ipmr doesn't implement IPSTATS_MIB_OUTOCTETS]
Signed-off-by: Ben Hutchings

net: add length argument to skb_copy_and_csum_datagram_iovec

2015-11-17T15:54:45+00:00

Without this length argument, we can read past the end of the iovec in
memcpy_toiovec because we have no way of knowing the total length of the
iovec's buffers.

This is needed for stable kernels where 89c22d8c3b27 ("net: Fix skb
csum races when peeking") has been backported but that don't have the
ioviter conversion, which is almost all the stable trees <= 3.18.

This also fixes a kernel crash for NFS servers when the client uses
 -onfsvers=3,proto=udp to mount the export.

Signed-off-by: Sabrina Dubroca 
Reviewed-by: Hannes Frederic Sowa 
[bwh: Backported to 3.2: adjust context in include/linux/skbuff.h]
Signed-off-by: Ben Hutchings

ipv6: lock socket in ip6_datagram_connect()

2015-10-13T02:46:12+00:00

[ Upstream commit 03645a11a570d52e70631838cb786eb4253eb463 ]

ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.

This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.

Signed-off-by: Eric Dumazet 
Acked-by: Herbert Xu 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

inet: frags: fix defragmented packet's IP header for af_packet

2015-08-12T14:33:22+00:00

commit 0848f6428ba3a2e42db124d41ac6f548655735bf upstream.

When ip_frag_queue() computes positions, it assumes that the passed
sk_buff does not contain L2 headers.

However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
functions can be called on outgoing packets that contain L2 headers.

Also, IPv4 checksum is not corrected after reassembly.

Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts.")
Signed-off-by: Edward Hyunkoo Jee 
Signed-off-by: Eric Dumazet 
Cc: Willem de Bruijn 
Cc: Jerry Chu 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

udp: fix behavior of wrong checksums

2015-08-06T23:32:15+00:00

commit beb39db59d14990e401e235faf66a6b9b31240b0 upstream.

We have two problems in UDP stack related to bogus checksums :

1) We return -EAGAIN to application even if receive queue is not empty.
   This breaks applications using edge trigger epoll()

2) Under UDP flood, we can loop forever without yielding to other
   processes, potentially hanging the host, especially on non SMP.

This patch is an attempt to make things better.

We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.

Signed-off-by: Eric Dumazet 
Cc: Willem de Bruijn 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

tcp: avoid looping in tcp_send_fin()

2015-05-09T22:16:40+00:00

[ Upstream commit 845704a535e9b3c76448f52af1b70e4422ea03fd ]

Presence of an unbound loop in tcp_send_fin() had always been hard
to explain when analyzing crash dumps involving gigantic dying processes
with millions of sockets.

Lets try a different strategy :

In case of memory pressure, try to add the FIN flag to last packet
in write queue, even if packet was already sent. TCP stack will
be able to deliver this FIN after a timeout event. Note that this
FIN being delivered by a retransmit, it also carries a Push flag
given our current implementation.

By checking sk_under_memory_pressure(), we anticipate that cooking
many FIN packets might deplete tcp memory.

In the case we could not allocate a packet, even with __GFP_WAIT
allocation, then not sending a FIN seems quite reasonable if it allows
to get rid of this socket, free memory, and not block the process from
eventually doing other useful work.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2:
 - Drop inapplicable change to sk_forced_wmem_schedule()
 - s/sk_under_memory_pressure(sk)/tcp_memory_pressure/]
Signed-off-by: Ben Hutchings

ip_forward: Drop frames with attached skb->sk

2015-05-09T22:16:40+00:00

[ Upstream commit 2ab957492d13bb819400ac29ae55911d50a82a13 ]

Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets

Forwarded frames should not have a socket attached. Especially
tw sockets will lead to panics later-on in the stack.

This was observed with TPROXY assigning a tw socket and broken
policy routing (misconfigured). As a result frame enters
forwarding path instead of input. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.

v2:
Remove useless comment

Signed-off-by: Sebastian Poehn 
Signed-off-by: David S. Miller 
Signed-off-by: Ben Hutchings

tcp: make connect() mem charging friendly

2015-05-09T22:16:39+00:00

[ Upstream commit 355a901e6cf1b2b763ec85caa2a9f04fbcc4ab4a ]

While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.

We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()

Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)

Instead of open coding memcpy_fromiovecend(), simply use this helper.

This leaves in socket write queue clean fast clone skbs.

This was tested against our fastopen packetdrill tests.

Reported-by: Denys Fedoryshchenko 
Signed-off-by: Eric Dumazet 
Acked-by: Yuchung Cheng 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2:
 - Drop the Fast Open changes
 - Adjust context]
Signed-off-by: Ben Hutchings

net: avoid to hang up on sending due to sysctl configuration overflow.

2015-05-09T22:16:38+00:00

commit cdda88912d62f9603d27433338a18be83ef23ac1 upstream.

    I found if we write a larger than 4GB value to some sysctl
variables, the sending syscall will hang up forever, because these
variables are 32 bits, such large values make them overflow to 0 or
negative.

    This patch try to fix overflow or prevent from zero value setup
of below sysctl variables:

net.core.wmem_default
net.core.rmem_default

net.core.rmem_max
net.core.wmem_max

net.ipv4.udp_rmem_min
net.ipv4.udp_wmem_min

net.ipv4.tcp_wmem
net.ipv4.tcp_rmem

Signed-off-by: Eric Dumazet 
Signed-off-by: Li Yu 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2:
 - Adjust context
 - Delete now-unused 'zero' variable]
Signed-off-by: Ben Hutchings

net: ping: Return EAFNOSUPPORT when appropriate.

2015-05-09T22:16:38+00:00

[ Upstream commit 9145736d4862145684009d6a72a6e61324a9439e ]

1. For an IPv4 ping socket, ping_check_bind_addr does not check
   the family of the socket address that's passed in. Instead,
   make it behave like inet_bind, which enforces either that the
   address family is AF_INET, or that the family is AF_UNSPEC and
   the address is 0.0.0.0.
2. For an IPv6 ping socket, ping_check_bind_addr returns EINVAL
   if the socket family is not AF_INET6. Return EAFNOSUPPORT
   instead, for consistency with inet6_bind.
3. Make ping_v4_sendmsg and ping_v6_sendmsg return EAFNOSUPPORT
   instead of EINVAL if an incorrect socket address structure is
   passed in.
4. Make IPv6 ping sockets be IPv6-only. The code does not support
   IPv4, and it cannot easily be made to support IPv4 because
   the protocol numbers for ICMP and ICMPv6 are different. This
   makes connect(::ffff:192.0.2.1) fail with EAFNOSUPPORT instead
   of making the socket unusable.

Among other things, this fixes an oops that can be triggered by:

    int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
    struct sockaddr_in6 sin6 = {
        .sin6_family = AF_INET6,
        .sin6_addr = in6addr_any,
    };
    bind(s, (struct sockaddr *) &sin6, sizeof(sin6));

Change-Id: If06ca86d9f1e4593c0d6df174caca3487c57a241
Signed-off-by: Lorenzo Colitti 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2:
 - Drop the IPv6 part
 - Adjust context, indentation]
Signed-off-by: Ben Hutchings