linux-toradex.git/net, branch v4.1.13

svcrdma: handle rdma read with a non-zero initial page offset

2015-10-27T00:51:59+00:00

commit c91aed9896946721bb30705ea2904edb3725dd61 upstream.

The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
were not taking into account the initial page_offset when determining
the rdma read length.  This resulted in a read who's starting address
and length exceeded the base/bounds of the frmr.

The server gets an async error from the rdma device and kills the
connection, and the client then reconnects and resends.  This repeats
indefinitely, and the application hangs.

Most work loads don't tickle this bug apparently, but one test hit it
every time: building the linux kernel on a 16 core node with 'make -j
16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.

This bug seems to only be tripped with devices having small fastreg page
list depths.  I didn't see it with mlx4, for instance.

Fixes: 0bf4828983df ('svcrdma: refactor marshalling logic')
Signed-off-by: Steve Wise 
Tested-by: Chuck Lever 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman

net/unix: fix logic about sk_peek_offset

2015-10-27T00:51:52+00:00

[ Upstream commit e9193d60d363e4dff75ff6d43a48f22be26d59c7 ]

Now send with MSG_PEEK can return data from multiple SKBs.

Unfortunately we take into account the peek offset for each skb,
that is wrong. We need to apply the peek offset only once.

In addition, the peek offset should be used only if MSG_PEEK is set.

Cc: "David S. Miller"  (maintainer:NETWORKING
Cc: Eric Dumazet  (commit_signer:1/14=7%)
Cc: Aaron Conole 
Fixes: 9f389e35674f ("af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag")
Signed-off-by: Andrey Vagin 
Tested-by: Aaron Conole 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag

2015-10-27T00:51:52+00:00

[ Upstream commit 9f389e35674f5b086edd70ed524ca0f287259725 ]

AF_UNIX sockets now return multiple skbs from recv() when MSG_PEEK flag
is set.

This is referenced in kernel bugzilla #12323 @
https://bugzilla.kernel.org/show_bug.cgi?id=12323

As described both in the BZ and lkml thread @
http://lkml.org/lkml/2008/1/8/444 calling recv() with MSG_PEEK on an
AF_UNIX socket only reads a single skb, where the desired effect is
to return as much skb data has been queued, until hitting the recv
buffer size (whichever comes first).

The modified MSG_PEEK path will now move to the next skb in the tree
and jump to the again: label, rather than following the natural loop
structure. This requires duplicating some of the loop head actions.

This was tested using the python socketpair python code attached to
the bugzilla issue.

Signed-off-by: Aaron Conole 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

netlink: Trim skb to alloc size to avoid MSG_TRUNC

2015-10-27T00:51:52+00:00

[ Upstream commit db65a3aaf29ecce2e34271d52e8d2336b97bd9fe ]

netlink_dump() allocates skb based on the calculated min_dump_alloc or
a per socket max_recvmsg_len.
min_alloc_size is maximum space required for any single netdev
attributes as calculated by rtnl_calcit().
max_recvmsg_len tracks the user provided buffer to netlink_recvmsg.
It is capped at 16KiB.
The intention is to avoid small allocations and to minimize the number
of calls required to obtain dump information for all net devices.

netlink_dump packs as many small messages as could fit within an skb
that was sized for the largest single netdev information. The actual
space available within an skb is larger than what is requested. It could
be much larger and up to near 2x with align to next power of 2 approach.

Allowing netlink_dump to use all the space available within the
allocated skb increases the buffer size a user has to provide to avoid
truncaion (i.e. MSG_TRUNG flag set).

It was observed that with many VLANs configured on at least one netdev,
a larger buffer of near 64KiB was necessary to avoid "Message truncated"
error in "ip link" or "bridge [-c[ompressvlans]] vlan show" when
min_alloc_size was only little over 32KiB.

This patch trims skb to allocated size in order to allow the user to
avoid truncation with more reasonable buffer size.

Signed-off-by: Ronen Arad 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tipc: move fragment importance field to new header position

2015-10-27T00:51:51+00:00

[ Upstream commit dde4b5ae65de659b9ec64bafdde0430459fcb495 ]

In commit e3eea1eb47a ("tipc: clean up handling of message priorities")
we introduced a field in the packet header for keeping track of the
priority of fragments, since this value is not present in the specified
protocol header. Since the value so far only is used at the transmitting
end of the link, we have not yet officially defined it as part of the
protocol.

Unfortunately, the field we use for keeping this value, bits 13-15 in
in word 5, has turned out to be a poor choice; it is already used by the
broadcast protocol for carrying the 'network id' field of the sending
node. Since packet fragments also need to be transported across the
broadcast protocol, the risk of conflict is obvious, and we see this
happen when we use network identities larger than 2^13-1. This has
escaped our testing because we have so far only been using small network
id values.

We now move this field to bits 0-2 in word 9, a field that is guaranteed
to be unused by all involved protocols.

Fixes: e3eea1eb47a ("tipc: clean up handling of message priorities")
Signed-off-by: Jon Maloy 
Acked-by: Ying Xue 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings

2015-10-27T00:51:51+00:00

[ Upstream commit 077cb37fcf6f00a45f375161200b5ee0cd4e937b ]

It seems that kernel memory can leak into userspace by a
kmalloc, ethtool_get_strings, then copy_to_user sequence.

Avoid this by using kcalloc to zero fill the copied buffer.

Signed-off-by: Joe Perches 
Acked-by: Ben Hutchings 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

act_mirred: clear sender cpu before sending to tx

2015-10-27T00:51:51+00:00

[ Upstream commit d40496a56430eac0d330378816954619899fe303 ]

Similar to commit c29390c6dfee ("xps: must clear sender_cpu before forwarding")
the skb->sender_cpu needs to be cleared when moving from Rx
Tx, otherwise kernel could crash.

Fixes: 2bd82484bb4c ("xps: fix xps for stacked devices")
Cc: Eric Dumazet 
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
Signed-off-by: Cong Wang 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ovs: do not allocate memory from offline numa node

2015-10-27T00:51:50+00:00

[ Upstream commit 598c12d0ba6de9060f04999746eb1e015774044b ]

When openvswitch tries allocate memory from offline numa node 0:
stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0)
It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
[ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h
This patch disables numa affinity in this case.

Signed-off-by: Konstantin Khlebnikov 
Acked-by: Pravin B Shelar 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

bpf: fix panic in SO_GET_FILTER with native ebpf programs

2015-10-27T00:51:50+00:00

[ Upstream commit 93d08b6966cf730ea669d4d98f43627597077153 ]

When sockets have a native eBPF program attached through
setsockopt(sk, SOL_SOCKET, SO_ATTACH_BPF, ...), and then try to
dump these over getsockopt(sk, SOL_SOCKET, SO_GET_FILTER, ...),
the following panic appears:

  [49904.178642] BUG: unable to handle kernel NULL pointer dereference at (null)
  [49904.178762] IP: [] sk_get_filter+0x39/0x90
  [49904.182000] PGD 86fc9067 PUD 531a1067 PMD 0
  [49904.185196] Oops: 0000 [#1] SMP
  [...]
  [49904.224677] Call Trace:
  [49904.226090]  [] sock_getsockopt+0x319/0x740
  [49904.227535]  [] ? sock_has_perm+0x63/0x70
  [49904.228953]  [] ? release_sock+0x108/0x150
  [49904.230380]  [] ? selinux_socket_getsockopt+0x23/0x30
  [49904.231788]  [] SyS_getsockopt+0xa6/0xc0
  [49904.233267]  [] entry_SYSCALL_64_fastpath+0x12/0x71

The underlying issue is the very same as in commit b382c0865600
("sock, diag: fix panic in sock_diag_put_filterinfo"), that is,
native eBPF programs don't store an original program since this
is only needed in cBPF ones.

However, sk_get_filter() wasn't updated to test for this at the
time when eBPF could be attached. Just throw an error to the user
to indicate that eBPF cannot be dumped over this interface.
That way, it can also be known that a program _is_ attached (as
opposed to just return 0), and a different (future) method needs
to be consulted for a dump.

Fixes: 89aa075832b0 ("net: sock: allow eBPF programs to be attached to sockets")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

inet: fix race in reqsk_queue_unlink()

2015-10-27T00:51:50+00:00

[ Upstream commit 2306c704ce280c97a60d1f45333b822b40281dea ]

reqsk_timer_handler() tests if icsk_accept_queue.listen_opt
is NULL at its beginning.

By the time it calls inet_csk_reqsk_queue_drop() and
reqsk_queue_unlink(), listener might have been closed and
inet_csk_listen_stop() had called reqsk_queue_yank_acceptq()
which sets icsk_accept_queue.listen_opt to NULL

We therefore need to correctly check listen_opt being NULL
after holding syn_wait_lock for proper synchronization.

Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Fixes: b357a364c57c ("inet: fix possible panic in reqsk_queue_unlink()")
Signed-off-by: Eric Dumazet 
Cc: Yuchung Cheng 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman