linux-toradex.git/net/ipv4, branch v3.0.44

net: ipv4: ipmr_expire_timer causes crash when removing net namespace

2012-10-02T16:47:22+00:00

[ Upstream commit acbb219d5f53821b2d0080d047800410c0420ea1 ]

When tearing down a net namespace, ipv4 mr_table structures are freed
without first deactivating their timers. This can result in a crash in
run_timer_softirq.
This patch mimics the corresponding behaviour in ipv6.
Locking and synchronization seem to be adequate.
We are about to kfree mrt, so existing code should already make sure that
no other references to mrt are pending or can be created by incoming traffic.
The functions invoked here do not cause new references to mrt or other
race conditions to be created.
Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
Both ipmr_expire_process (whose completion we may have to wait for in
del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
or other synchronization when needed, and they both only modify mrt.
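The ordering the fix relies on (deactivate the timer, then free the object that embeds it) can be sketched as a single-threaded userspace model. Everything below is hypothetical: the timer list, timer_add(), timer_del(), run_timers() and struct mr_table_model only stand in for the kernel's timer API and struct mr_table. The model cannot show the part of del_timer_sync() that waits for a handler already running on another CPU.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's timer wheel. */
struct timer {
	struct timer *next;
	void (*fn)(struct timer *t);
};

static struct timer *timer_list;	/* timers waiting to fire */

static void timer_add(struct timer *t)
{
	t->next = timer_list;
	timer_list = t;
}

/* Analogue of del_timer(): unlink t if it is still pending. */
static int timer_del(struct timer *t)
{
	for (struct timer **pp = &timer_list; *pp; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			return 1;
		}
	}
	return 0;
}

/* Analogue of run_timer_softirq(): fire and unlink every pending timer. */
static int run_timers(void)
{
	int fired = 0;

	while (timer_list) {
		struct timer *t = timer_list;

		timer_list = t->next;
		t->fn(t);
		fired++;
	}
	return fired;
}

/* Hypothetical stand-in for struct mr_table and its expiry timer. */
struct mr_table_model {
	struct timer ipmr_expire_timer;
	int expired;
};

static void expire_fn(struct timer *t)
{
	struct mr_table_model *mrt = (struct mr_table_model *)
		((char *)t - offsetof(struct mr_table_model, ipmr_expire_timer));
	mrt->expired++;
}

static struct mr_table_model *mrt_create(void)
{
	struct mr_table_model *mrt = calloc(1, sizeof(*mrt));

	if (mrt) {
		mrt->ipmr_expire_timer.fn = expire_fn;
		timer_add(&mrt->ipmr_expire_timer);
	}
	return mrt;
}

/* The fix: take the timer off the list first, then free. Without the
 * timer_del() call, run_timers() would later call expire_fn() on freed
 * memory, which is the crash described above. */
static void mrt_free(struct mr_table_model *mrt)
{
	timer_del(&mrt->ipmr_expire_timer);
	free(mrt);
}
```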

Tested in Linux 3.4.8.

Signed-off-by: Francesco Ruggeri 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: Apply device TSO segment limit earlier

2012-10-02T16:47:04+00:00

[ Upstream commit 1485348d2424e1131ea42efc033cbd9366462b01 ]

Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to
limit the size of TSO skbs.  This avoids the need to fall back to
software GSO for local TCP senders.
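A minimal sketch of the capping idea, assuming a hypothetical helper tso_size_goal() rather than the kernel's actual code: the payload of one TSO skb is a whole number of MSS-sized segments, and that number is clamped to the segment limit cached on the socket, so the device never receives an skb it would have to hand back to software GSO.

```c
/* Hypothetical helper: largest TSO payload given a size goal, the current
 * MSS and the device segment limit cached as sk_gso_max_segs. */
static unsigned int tso_size_goal(unsigned int size_goal, unsigned int mss,
				  unsigned int sk_gso_max_segs)
{
	unsigned int segs = size_goal / mss;

	if (segs == 0)
		segs = 1;			/* always allow one full segment */
	if (segs > sk_gso_max_segs)
		segs = sk_gso_max_segs;	/* respect the device limit */
	return segs * mss;
}
```

With a 64 KiB goal and a 1448-byte MSS, a device limit of 16 segments caps the skb at 23168 bytes instead of 65160.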

Signed-off-by: Ben Hutchings 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: perform DMA to userspace only if there is a task waiting for it

2012-08-09T15:27:52+00:00

[ Upstream commit 59ea33a68a9083ac98515e4861c00e71efdc49a1 ]

Back in 2006, commit 1a2449a87b ("[I/OAT]: TCP recv offload to I/OAT")
added support for receive offloading to IOAT dma engine if available.

The code in tcp_rcv_established() tries to perform early DMA copy if
applicable. It however does so without checking whether the userspace
task is actually expecting the data in the buffer.

This is not a problem under normal circumstances, but there is a corner
case where this doesn't work: when the MSG_TRUNC flag is passed to
recvmsg().

If the IOAT dma engine is not used, the code properly checks whether
there is a valid ucopy.task and the socket is owned by userspace, but
misses the check in the dmaengine case.

This problem can be trivially observed in practice; for example, 'tbench' is
a good reproducer, as it makes heavy use of MSG_TRUNC. On systems utilizing
IOAT, you will soon find tbench waiting indefinitely in sk_wait_data() for
data that has already been early-copied in tcp_rcv_established() by the DMA
engine.

This patch introduces the same check we are performing in the simple
iovec copy case to the IOAT case as well. It fixes the indefinite
recvmsg(MSG_TRUNC) hangs.
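The shape of the fix can be sketched as one shared precondition used by both copy paths. The struct and function names below are hypothetical simplifications; the condition follows the description above (a valid ucopy.task and a socket owned by userspace), not the exact kernel expression.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified socket state. */
struct sock_model {
	const void *ucopy_task;	/* models tp->ucopy.task; NULL if no reader */
	bool owned_by_user;	/* models sock_owned_by_user(sk) */
	bool dma_available;	/* an I/OAT channel could be used */
};

/* Shared precondition: a task is waiting for the data and owns the socket. */
static bool reader_waiting(const struct sock_model *sk)
{
	return sk->ucopy_task != NULL && sk->owned_by_user;
}

/* Plain iovec early copy: this check already existed. */
static bool may_iovec_early_copy(const struct sock_model *sk)
{
	return reader_waiting(sk);
}

/* DMA early copy: the fix adds the same reader_waiting() check, so data is
 * not consumed behind the back of a recvmsg(MSG_TRUNC) caller. */
static bool may_dma_early_copy(const struct sock_model *sk)
{
	return sk->dma_available && reader_waiting(sk);
}
```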

Signed-off-by: Jiri Kosina 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: Add TCP_USER_TIMEOUT negative value check

2012-08-09T15:27:52+00:00

[ Upstream commit 42493570100b91ef663c4c6f0c0fdab238f9d3c2 ]

TCP_USER_TIMEOUT is a TCP-level socket option that takes an unsigned int, but
patch "tcp: Add TCP_USER_TIMEOUT socket option" (dca43c75) didn't check for
negative values. If a user assigns -1 to it, the option is set successfully
and the socket waits for 4294967295 milliseconds. This patch adds a
negative-value check to avoid this issue.
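A minimal sketch of the added check, using a hypothetical set_user_timeout() in place of the relevant branch of the setsockopt handler: the value arrives from userspace as a signed int, so -1 must be rejected before it is stored in an unsigned field, where it would become 4294967295.

```c
#include <errno.h>

/* Hypothetical stand-in for the TCP_USER_TIMEOUT setsockopt branch:
 * val is the int copied from userspace; *user_timeout models the
 * unsigned value the socket keeps (in milliseconds). */
static int set_user_timeout(int val, unsigned int *user_timeout)
{
	if (val < 0)		/* the check this patch adds */
		return -EINVAL;
	*user_timeout = (unsigned int)val;
	return 0;
}
```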

Signed-off-by: Hangbin Liu 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

cipso: don't follow a NULL pointer when setsockopt() is called

2012-08-09T15:27:52+00:00

[ Upstream commit 89d7ae34cdda4195809a5a987f697a517a2a3177 ]

As reported by Alan Cox, and verified by Lin Ming, when a user
attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL
tag the kernel dies a terrible death when it attempts to follow a NULL
pointer (the skb argument to cipso_v4_validate() is NULL when called via
the setsockopt() syscall).

This patch fixes this by first checking to ensure that the skb is
non-NULL before using it to find the incoming network interface.  In
the unlikely case where the skb is NULL and the user attempts to add
a CIPSO option with the _TAG_LOCAL tag we return an error as this is
not something we want to allow.
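The fix amounts to testing the skb pointer before reaching through it to the incoming device. A minimal sketch with hypothetical, trimmed-down types (not the kernel's struct sk_buff or struct net_device):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down types: only what the check needs. */
struct net_device_model { bool loopback; };
struct skb_model { const struct net_device_model *dev; };

/* Returns true if a _TAG_LOCAL option may be accepted. skb is NULL when the
 * option comes from setsockopt() rather than from a received packet, so it
 * must be checked before skb->dev is dereferenced; in that case the option
 * is rejected. */
static bool local_tag_allowed(const struct skb_model *skb)
{
	return skb != NULL && skb->dev->loopback;
}
```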

A simple reproducer, kindly supplied by Lin Ming, follows. Note that
CIPSO DOI #3 must be configured on the system first, or you will be
caught early in cipso_v4_validate():

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <linux/ip.h>
	#include <linux/in.h>
	#include <string.h>

	struct local_tag {
		char type;
		char length;
		char info[4];
	};

	struct cipso {
		char type;
		char length;
		char doi[4];
		struct local_tag local;
	};

	int main(int argc, char **argv)
	{
		int sockfd;
		struct cipso cipso = {
			.type = IPOPT_CIPSO,
			.length = sizeof(struct cipso),
			.local = {
				.type = 128,
				.length = sizeof(struct local_tag),
			},
		};

		memset(cipso.doi, 0, 4);
		cipso.doi[3] = 3;

		sockfd = socket(AF_INET, SOCK_DGRAM, 0);
		#define SOL_IP 0
		setsockopt(sockfd, SOL_IP, IP_OPTIONS,
			&cipso, sizeof(struct cipso));

		return 0;
	}

CC: Lin Ming 
Reported-by: Alan Cox 
Signed-off-by: Paul Moore 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: drop SYN+FIN messages

2012-07-19T15:58:22+00:00

commit fdf5af0daf8019cec2396cdef8fb042d80fe71fa upstream.

Denys Fedoryshchenko reported that SYN+FIN attacks were bringing his
linux machines to their limits.

Don't call conn_request() if a SYN segment also has the FIN flag set;
discard it instead.
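The listen-state decision can be sketched as a small predicate. The flag names and the function are hypothetical; the bit values match the TCP header flag positions. ACK and RST segments are handled before this point in the real code and are simply rejected here.

```c
#include <stdbool.h>

/* Hypothetical names; values are the TCP header flag bits. */
enum {
	TCPFLAG_FIN = 0x01,
	TCPFLAG_SYN = 0x02,
	TCPFLAG_RST = 0x04,
	TCPFLAG_ACK = 0x10,
};

/* Should a segment arriving on a listening socket reach conn_request()? */
static bool listen_should_conn_request(unsigned int flags)
{
	if (flags & (TCPFLAG_ACK | TCPFLAG_RST))
		return false;	/* handled elsewhere in the real code */
	if (!(flags & TCPFLAG_SYN))
		return false;
	if (flags & TCPFLAG_FIN)
		return false;	/* SYN+FIN: discard, as this commit does */
	return true;
}
```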

Reported-by: Denys Fedoryshchenko 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Cc: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman

xfrm: take net hdr len into account for esp payload size calculation

2012-06-09T15:33:03+00:00

[ Upstream commit 91657eafb64b4cb53ec3a2fbc4afc3497f735788 ]

Correct the function that determines the ESP payload size. The calculations
done in esp{4,6}_get_mtu() lead to overlength frames in transport mode for
certain MTU values and to suboptimal frames for others.

Based on what esp{,6}_output() and tcp_mtu_to_mss() do, net_header_len must
be taken into account before doing the alignment calculation.
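A minimal sketch of the idea, not the kernel's exact code; all names are hypothetical. In transport mode the network header (net_adj, e.g. 20 bytes for IPv4) is not encrypted, so it is subtracted before aligning to the cipher block size and added back afterwards; the final "- 2" reserves the ESP trailer's pad-length and next-header bytes. esp_frame_len() recomputes the resulting frame length so the result can be checked against the MTU. The example values (24-byte ESP header plus IV, 12-byte ICV, 16-byte blocks) are assumptions chosen for illustration.

```c
/* Largest packet length (network header included) whose ESP frame fits in
 * mtu. blksize must be a power of two; mtu is assumed larger than the
 * fixed overhead. */
static unsigned int esp_payload_mtu(unsigned int mtu, unsigned int header_len,
				    unsigned int authsize, unsigned int blksize,
				    unsigned int net_adj)
{
	return ((mtu - header_len - authsize - net_adj) & ~(blksize - 1))
		+ net_adj - 2;
}

/* Frame length for a packet of len bytes (network header included): the
 * encrypted part (data + 2 trailer bytes) is padded up to blksize. */
static unsigned int esp_frame_len(unsigned int len, unsigned int header_len,
				  unsigned int authsize, unsigned int blksize,
				  unsigned int net_adj)
{
	unsigned int enc = len - net_adj + 2;

	enc = (enc + blksize - 1) & ~(blksize - 1);
	return net_adj + header_len + enc + authsize;
}
```

With an MTU of 1500, the sketch yields 1458, which produces a 1496-byte frame; one more byte would push the frame to 1512 bytes.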

Signed-off-by: Benjamin Poirier 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipv4: fix the rcu race between free_fib_info and ip_route_output_slow

2012-06-09T15:33:02+00:00

[ Upstream commit e49cc0da7283088c5e03d475ffe2fdcb24a6d5b1 ]

We hit a kernel OOPS.

<3>[23898.789643] BUG: sleeping function called from invalid context at
/data/buildbot/workdir/ics/hardware/intel/linux-2.6/arch/x86/mm/fault.c:1103
<3>[23898.862215] in_atomic(): 0, irqs_disabled(): 0, pid: 10526, name:
Thread-6683
<4>[23898.967805] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.258526] Pid: 10526, comm: Thread-6683 Tainted: G        W
3.0.8-137685-ge7742f9 #1
<4>[23899.357404] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.904225] Call Trace:
<4>[23899.989209]  [] ? pgtable_bad+0x130/0x130
<4>[23900.000416]  [] __might_sleep+0x10a/0x110
<4>[23900.007357]  [] do_page_fault+0xd1/0x3c0
<4>[23900.013764]  [] ? restore_all+0xf/0xf
<4>[23900.024024]  [] ? napi_complete+0x8b/0x690
<4>[23900.029297]  [] ? pgtable_bad+0x130/0x130
<4>[23900.123739]  [] ? pgtable_bad+0x130/0x130
<4>[23900.128955]  [] error_code+0x5f/0x64
<4>[23900.133466]  [] ? pgtable_bad+0x130/0x130
<4>[23900.138450]  [] ? __ip_route_output_key+0x698/0x7c0
<4>[23900.144312]  [] ? __ip_route_output_key+0x38d/0x7c0
<4>[23900.150730]  [] ip_route_output_flow+0x1f/0x60
<4>[23900.156261]  [] ip4_datagram_connect+0x188/0x2b0
<4>[23900.161960]  [] ? _raw_spin_unlock_bh+0x1f/0x30
<4>[23900.167834]  [] inet_dgram_connect+0x36/0x80
<4>[23900.173224]  [] ? _copy_from_user+0x48/0x140
<4>[23900.178817]  [] sys_connect+0x9a/0xd0
<4>[23900.183538]  [] ? alloc_file+0xdc/0x240
<4>[23900.189111]  [] ? sub_preempt_count+0x3d/0x50

free_fib_info() resets nexthop_nh->nh_dev to NULL before releasing fi,
while another CPU might still be accessing fi. Fix this by delaying the
release until the RCU callback runs.
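The ordering change can be illustrated with a single-threaded userspace model in which a deferred list stands in for call_rcu() and grace_period_end() stands in for the end of the RCU grace period. All names are hypothetical; the model only shows that a reader which found fi before free_fib_info() was called still sees a valid nh_dev until the deferred release runs.

```c
#include <stdlib.h>

/* Hypothetical, simplified types. */
struct dev_model { int refcnt; };
struct fib_info_model {
	struct dev_model *nh_dev;
	struct fib_info_model *next_deferred;
};

static struct fib_info_model *deferred;	/* stands in for pending call_rcu() work */

/* Fixed ordering: leave nh_dev intact here and only queue the object. */
static void free_fib_info_model(struct fib_info_model *fi)
{
	fi->next_deferred = deferred;
	deferred = fi;
}

/* Runs once no reader can still hold the objects: drop the device
 * reference and free the memory, as the RCU callback would. */
static void grace_period_end(void)
{
	while (deferred) {
		struct fib_info_model *fi = deferred;

		deferred = fi->next_deferred;
		if (fi->nh_dev)
			fi->nh_dev->refcnt--;
		free(fi);
	}
}
```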

With the patch, we ran MTBF testing on Android mobile for 12 hours
and didn't trigger the issue.

Thanks to Eric for the very detailed review and for checking the issue.

Signed-off-by: Yanmin Zhang 
Signed-off-by: Kun Jiang 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ipv4: Do not use dead fib_info entries.

2012-06-09T15:33:02+00:00

[ Upstream commit dccd9ecc374462e5d6a5b8f8110415a86c2213d8 ]

Due to RCU lookups and RCU based release, fib_info objects can
be found during lookup which have fi->fib_dead set.

We must ignore these entries, otherwise we risk dereferencing
the parts of the entry which are being torn down.
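The rule is simply to skip dead entries during the lookup walk. A minimal sketch over a hypothetical flat table (the kernel walks trie leaves instead); fib_dead mirrors fi->fib_dead, and the other fields are illustrative stand-ins.

```c
#include <stddef.h>

/* Hypothetical, simplified entry type. */
struct fib_info_entry {
	int fib_dead;	/* set once the entry is being released */
	int prefix;	/* stand-in for the matching key */
	int nh_oif;	/* stand-in for nexthop data being torn down */
};

/* First live entry matching prefix; dead entries are never inspected. */
static const struct fib_info_entry *
lookup_live(const struct fib_info_entry *tbl, size_t n, int prefix)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].fib_dead)
			continue;
		if (tbl[i].prefix == prefix)
			return &tbl[i];
	}
	return NULL;
}
```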

Reported-by: Yevgen Pronenko 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: do_tcp_sendpages() must try to push data out on oom conditions

2012-05-21T16:40:03+00:00

commit bad115cfe5b509043b684d3a007ab54b80090aa1 upstream.

Since recent changes on TCP splicing (starting with commits 2f533844
"tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp:
tcp_sendpages() should call tcp_push() once"), I started seeing
massive stalls when forwarding traffic between two sockets using
splice() when pipe buffers were larger than socket buffers.

Latest changes (net: netdev_alloc_skb() use build_skb()) made the
problem even more apparent.

The reason seems to be that if do_tcp_sendpages() fails on out of memory
condition without being able to send at least one byte, tcp_push() is not
called and the buffers cannot be flushed.
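The failure mode can be modelled in a few lines. The struct and functions are hypothetical; push() stands in for tcp_push(). A call that runs out of memory before copying anything (copied == 0) used to skip the push, so data queued by earlier calls was never sent, never acknowledged, and its memory never freed. Pushing unconditionally before waiting breaks that cycle.

```c
#include <stdbool.h>

/* Hypothetical model of the relevant socket state. */
struct tx_model {
	unsigned int queued;	/* bytes queued but not yet pushed */
	unsigned int sent;	/* bytes handed to the wire */
};

static void push(struct tx_model *tx)	/* stands in for tcp_push() */
{
	tx->sent += tx->queued;
	tx->queued = 0;
}

/* One call that hits an allocation failure before copying any byte.
 * fixed selects the behaviour after this commit. */
static void sendpages_oom(struct tx_model *tx, bool fixed)
{
	unsigned int copied = 0;

	/* wait_for_memory: the old code pushed only if copied != 0 */
	if (fixed || copied)
		push(tx);
	/* ...the real code would now sleep until memory is available... */
}
```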

After applying the attached patch, I cannot reproduce the stalls at all
and the data rate is perfectly stable and steady under any condition
which previously caused the problem to be permanent.

The issue seems to have been there since before the kernel migrated to
git, which makes me think that the stalls I occasionally experienced
with tux during stress-tests years ago were probably related to the
same issue.

This issue was first encountered on 3.0.31 and 3.2.17, so please backport
to -stable.

Signed-off-by: Willy Tarreau 
Acked-by: Eric Dumazet 
Signed-off-by: Greg Kroah-Hartman