linux-toradex.git/net, branch v4.7

Merge tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-client

2016-07-24T01:00:31+00:00

Pull ceph fix from Ilya Dryomov:
 "A fix for a long-standing bug in the incremental osdmap handling code
  that caused misdirected requests, tagged for stable"

  The tag is signed with a brand new key - Sage is on vacation and I
  didn't anticipate this"

* tag 'ceph-for-4.7-rc8' of git://github.com/ceph/ceph-client:
  libceph: apply new_state before new_up_client on incrementals

libceph: apply new_state before new_up_client on incrementals

2016-07-22T13:17:40+00:00

Currently, osd_weight and osd_state fields are updated in the encoding
order.  This is wrong, because an incremental map may look like e.g.

    new_up_client: { osd=6, addr=... } # set osd_state and addr
    new_state: { osd=6, xorstate=EXISTS } # clear osd_state

Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
on with the new_state update, we flip EXISTS and leave osd6 in a weird
"!EXISTS but UP" state.  A non-existent OSD is considered down by the
mapping code

2087    for (i = 0; i < pg->pg_temp.len; i++) {
2088            if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
2089                    if (ceph_can_shift_osds(pi))
2090                            continue;
2091
2092                    temp->osds[temp->size++] = CRUSH_ITEM_NONE;

and so requests get directed to the second OSD in the set instead of
the first, resulting in OSD-side errors like:

[WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680

and hung rbds on the client:

[  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
[  493.566805] rbd: rbd0:   result -6 xferred 400000
[  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688

The fix is to decouple application from the decoding and:
- apply new_weight first
- apply new_state before new_up_client
- twiddle osd_state flags if marking in
- clear out some of the state if osd is destroyed

Fixes: http://tracker.ceph.com/issues/14901

Cc: stable@vger.kernel.org # 3.15+: 6dd74e44dc1d: libceph: set 'exists' flag for newly up osd
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Ilya Dryomov 
Reviewed-by: Josh Durgin

packet: propagate sock_cmsg_send() error

2016-07-22T05:41:48+00:00

sock_cmsg_send() can return different error codes and not only
-EINVAL, and we should properly propagate them.

Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Signed-off-by: Soheil Hassas Yeganeh 
Cc: Willem de Bruijn 
Signed-off-by: David S. Miller

packet: fix second argument of sock_tx_timestamp()

2016-07-20T04:00:50+00:00

This patch fixes an issue that a syscall (e.g. sendto syscall) cannot
work correctly. Since the sendto syscall doesn't have msg_control buffer,
the sock_tx_timestamp() in packet_snd() cannot work correctly because
the socks.tsflags is set to 0.
So, this patch sets the socks.tsflags to sk->sk_tsflags as default.

Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Reported-by: Kazuya Mizuguchi 
Reported-by: Keita Kobayashi 
Signed-off-by: Yoshihiro Shimoda 
Acked-by: Soheil Hassas Yeganeh 
Acked-by: Willem de Bruijn 
Signed-off-by: David S. Miller

sctp: load transport header after sk_filter

2016-07-19T05:46:52+00:00

Do not cache pointers into the skb linear segment across sk_filter.
The function call can trigger pskb_expand_head.

Signed-off-by: Willem de Bruijn 
Acked-by: Daniel Borkmann 
Acked-by: Marcelo Ricardo Leitner 
Signed-off-by: David S. Miller

net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int

2016-07-19T05:44:31+00:00

In kernel HTB keeps tokens in signed 64-bit in nanoseconds. In netlink
protocol these values are converted into pshed ticks (64ns for now) and
truncated to 32-bit. In struct tc_htb_xstats fields "tokens" and "ctokens"
are declared as unsigned 32-bit but they could be negative thus tool 'tc'
prints them as signed. Big values loose higher bits and/or become negative.

This patch clamps tokens in xstat into range from INT_MIN to INT_MAX.
In this way it's easier to understand what's going on here.

Signed-off-by: Konstantin Khlebnikov 
Signed-off-by: David S. Miller

vlan: use a valid default mtu value for vlan over macsec

2016-07-17T03:15:02+00:00

macsec can't cope with mtu frames which need vlan tag insertion, and
vlan device set the default mtu equal to the underlying dev's one.
By default vlan over macsec devices use invalid mtu, dropping
all the large packets.
This patch adds a netif helper to check if an upper vlan device
needs mtu reduction. The helper is used during vlan devices
initialization to set a valid default and during mtu updating to
forbid invalid, too bit, mtu values.
The helper currently only check if the lower dev is a macsec device,
if we get more users, we need to update only the helper (possibly
reserving an additional IFF bit).

Signed-off-by: Paolo Abeni 
Signed-off-by: David S. Miller

tcp: enable per-socket rate limiting of all 'challenge acks'

2016-07-15T21:18:29+00:00

The per-socket rate limit for 'challenge acks' was introduced in the
context of limiting ack loops:

commit f2b2c582e824 ("tcp: mitigate ACK loops for connections as tcp_sock")

And I think it can be extended to rate limit all 'challenge acks' on a
per-socket basis.

Since we have the global tcp_challenge_ack_limit, this patch allows for
tcp_challenge_ack_limit to be set to a large value and effectively rely on
the per-socket limit, or set tcp_challenge_ack_limit to a lower value and
still prevents a single connections from consuming the entire challenge ack
quota.

It further moves in the direction of eliminating the global limit at some
point, as Eric Dumazet has suggested. This a follow-up to:
Subject: tcp: make challenge acks less predictable

Cc: Eric Dumazet 
Cc: David S. Miller 
Cc: Neal Cardwell 
Cc: Yuchung Cheng 
Cc: Yue Cao 
Signed-off-by: Jason Baron 
Signed-off-by: David S. Miller

dccp: limit sk_filter trim to payload

2016-07-13T18:53:41+00:00

Dccp verifies packet integrity, including length, at initial rcv in
dccp_invalid_packet, later pulls headers in dccp_enqueue_skb.

A call to sk_filter in-between can cause __skb_pull to wrap skb->len.
skb_copy_datagram_msg interprets this as a negative value, so
(correctly) fails with EFAULT. The negative length is reported in
ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.

Introduce an sk_receive_skb variant that caps how small a filter
program can trim packets, and call this in dccp with the header
length. Excessively trimmed packets are now processed normally and
queued for reception as 0B payloads.

Fixes: 7c657876b63c ("[DCCP]: Initial implementation")
Signed-off-by: Willem de Bruijn 
Acked-by: Daniel Borkmann 
Signed-off-by: David S. Miller

rose: limit sk_filter trim to payload

2016-07-13T18:53:40+00:00

Sockets can have a filter program attached that drops or trims
incoming packets based on the filter program return value.

Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
verifies this on arrival in rose_route_frame and unconditionally pulls
the bytes in rose_recvmsg. The filter can trim packets to below this
value in-between, causing pull to fail, leaving the partial header at
the time of skb_copy_datagram_msg.

Place a lower bound on the size to which sk_filter may trim packets
by introducing sk_filter_trim_cap and call this for rose packets.

Signed-off-by: Willem de Bruijn 
Acked-by: Daniel Borkmann 
Signed-off-by: David S. Miller