<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/net/rxrpc/input.c, branch v4.10</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>udp: do fwd memory scheduling on dequeue</title>
<updated>2016-11-07T18:24:41+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2016-11-04T10:28:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7c13f97ffde63cc792c49ec1513f3974f2f05229'/>
<id>7c13f97ffde63cc792c49ec1513f3974f2f05229</id>
<content type='text'>
A new argument is added to __skb_recv_datagram to provide
an explicit skb destructor, invoked under the receive queue
lock.
The UDP protocol uses such argument to perform memory
reclaiming on dequeue, so that the UDP protocol does not
set anymore skb-&gt;desctructor.
Instead explicit memory reclaiming is performed at close() time and
when skbs are removed from the receive queue.
The in kernel UDP protocol users now need to call a
skb_recv_udp() variant instead of skb_recv_datagram() to
properly perform memory accounting on dequeue.

Overall, this allows acquiring only once the receive queue
lock on dequeue.

Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.

nr sinks	vanilla		patched
1		440		560
3		2150		2300
6		3650		3800
9		4450		4600
12		6250		6450

v1 -&gt; v2:
 - do rmem and allocated memory scheduling under the receive lock
 - do bulk scheduling in first_packet_length() and in udp_destruct_sock()
 - avoid the typdef for the dequeue callback

Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A new argument is added to __skb_recv_datagram to provide
an explicit skb destructor, invoked under the receive queue
lock.
The UDP protocol uses such argument to perform memory
reclaiming on dequeue, so that the UDP protocol does not
set anymore skb-&gt;desctructor.
Instead explicit memory reclaiming is performed at close() time and
when skbs are removed from the receive queue.
The in kernel UDP protocol users now need to call a
skb_recv_udp() variant instead of skb_recv_datagram() to
properly perform memory accounting on dequeue.

Overall, this allows acquiring only once the receive queue
lock on dequeue.

Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.

nr sinks	vanilla		patched
1		440		560
3		2150		2300
6		3650		3800
9		4450		4600
12		6250		6450

v1 -&gt; v2:
 - do rmem and allocated memory scheduling under the receive lock
 - do bulk scheduling in first_packet_length() and in udp_destruct_sock()
 - avoid the typdef for the dequeue callback

Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rxrpc: Partially handle OpenAFS's improper termination of calls</title>
<updated>2016-10-06T07:11:49+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2016-10-06T07:11:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b3156274ca01297b861e912175820e78c9ac4d7c'/>
<id>b3156274ca01297b861e912175820e78c9ac4d7c</id>
<content type='text'>
OpenAFS doesn't always correctly terminate client calls that it makes -
this includes calls the OpenAFS servers make to the cache manager service.
It should end the client call with either:

 (1) An ACK that has firstPacket set to one greater than the seq number of
     the reply DATA packet with the LAST_PACKET flag set (thereby
     hard-ACK'ing all packets).  nAcks should be 0 and acks[] should be
     empty (ie. no soft-ACKs).

 (2) An ACKALL packet.

OpenAFS, though, may send an ACK packet with firstPacket set to the last
seq number or less and soft-ACKs listed for all packets up to and including
the last DATA packet.

The transmitter, however, is obliged to keep the call live and the
soft-ACK'd DATA packets around until they're hard-ACK'd as the receiver is
permitted to drop any merely soft-ACK'd packet and request retransmission
by sending an ACK packet with a NACK in it.

Further, OpenAFS will also terminate a client call by beginning the next
client call on the same connection channel.  This implicitly completes the
previous call.

This patch handles implicit ACK of a call on a channel by the reception of
the first packet of the next call on that channel.

If another call doesn't come along to implicitly ACK a call, then we have
to time the call out.  There are some bugs there that will be addressed in
subsequent patches.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
OpenAFS doesn't always correctly terminate client calls that it makes -
this includes calls the OpenAFS servers make to the cache manager service.
It should end the client call with either:

 (1) An ACK that has firstPacket set to one greater than the seq number of
     the reply DATA packet with the LAST_PACKET flag set (thereby
     hard-ACK'ing all packets).  nAcks should be 0 and acks[] should be
     empty (ie. no soft-ACKs).

 (2) An ACKALL packet.

OpenAFS, though, may send an ACK packet with firstPacket set to the last
seq number or less and soft-ACKs listed for all packets up to and including
the last DATA packet.

The transmitter, however, is obliged to keep the call live and the
soft-ACK'd DATA packets around until they're hard-ACK'd as the receiver is
permitted to drop any merely soft-ACK'd packet and request retransmission
by sending an ACK packet with a NACK in it.

Further, OpenAFS will also terminate a client call by beginning the next
client call on the same connection channel.  This implicitly completes the
previous call.

This patch handles implicit ACK of a call on a channel by the reception of
the first packet of the next call on that channel.

If another call doesn't come along to implicitly ACK a call, then we have
to time the call out.  There are some bugs there that will be addressed in
subsequent patches.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs</title>
<updated>2016-10-06T07:11:49+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2016-10-06T07:11:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a5af7e1fc69a46f29b977fd4b570e0ac414c2338'/>
<id>a5af7e1fc69a46f29b977fd4b570e0ac414c2338</id>
<content type='text'>
Separate the output of PING ACKs from the output of other sorts of ACK so
that if we receive a PING ACK and schedule transmission of a PING RESPONSE
ACK, the response doesn't get cancelled by a PING ACK we happen to be
scheduling transmission of at the same time.

If a PING RESPONSE gets lost, the other side might just sit there waiting
for it and refuse to proceed otherwise.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Separate the output of PING ACKs from the output of other sorts of ACK so
that if we receive a PING ACK and schedule transmission of a PING RESPONSE
ACK, the response doesn't get cancelled by a PING ACK we happen to be
scheduling transmission of at the same time.

If a PING RESPONSE gets lost, the other side might just sit there waiting
for it and refuse to proceed otherwise.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rxrpc: Only ping for lost reply in client call</title>
<updated>2016-10-06T07:11:49+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2016-10-06T07:11:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a9f312d98affab387557e2795d4e11ad82a4e4e8'/>
<id>a9f312d98affab387557e2795d4e11ad82a4e4e8</id>
<content type='text'>
When a reply is deemed lost, we send a ping to find out the other end
received all the request data packets we sent.  This should be limited to
client calls and we shouldn't do this on service calls.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a reply is deemed lost, we send a ping to find out the other end
received all the request data packets we sent.  This should be limited to
client calls and we shouldn't do this on service calls.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rxrpc: Keep the call timeouts as ktimes rather than jiffies</title>
<updated>2016-09-30T13:40:11+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2016-09-26T21:12:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=df0adc788ae74e35ab1a79f3db878df7fdc7db55'/>
<id>df0adc788ae74e35ab1a79f3db878df7fdc7db55</id>
<content type='text'>
Keep that call timeouts as ktimes rather than jiffies so that they can be
expressed as functions of RTT.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Keep that call timeouts as ktimes rather than jiffies so that they can be
expressed as functions of RTT.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rxrpc: The offset field in struct rxrpc_skb_priv is unnecessary</title>
<updated>2016-09-30T13:39:28+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2016-09-30T12:26:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=775e5b71db6aca47d49d43d08751f2e8ebad7f60'/>
<id>775e5b71db6aca47d49d43d08751f2e8ebad7f60</id>
<content type='text'>
The offset field in struct rxrpc_skb_priv is unnecessary as the value can
always be calculated.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The offset field in struct rxrpc_skb_priv is unnecessary as the value can
always be calculated.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rxrpc: Reduce ssthresh to peer's receive window</title>
<updated>2016-09-30T13:38:59+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2016-09-30T08:33:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0851115090a3eb9585d6a804a61e47f3d89ac2a8'/>
<id>0851115090a3eb9585d6a804a61e47f3d89ac2a8</id>
<content type='text'>
When we receive an ACK from the peer that tells us what the peer's receive
window (rwind) is, we should reduce ssthresh to rwind if rwind is smaller
than ssthresh.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we receive an ACK from the peer that tells us what the peer's receive
window (rwind) is, we should reduce ssthresh to rwind if rwind is smaller
than ssthresh.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>rxrpc: Switch to Congestion Avoidance mode at cwnd==ssthresh</title>
<updated>2016-09-30T13:38:56+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2016-09-30T08:26:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8782def204e57f6a507ff425e4944df4e010751a'/>
<id>8782def204e57f6a507ff425e4944df4e010751a</id>
<content type='text'>
Switch to Congestion Avoidance mode at cwnd == ssthresh rather than relying
on cwnd getting incremented beyond ssthresh and the window size, the mode
being shifted and then cwnd being corrected.

We need to make sure we switch into CA mode so that we stop marking every
packet for ACK.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Switch to Congestion Avoidance mode at cwnd == ssthresh rather than relying
on cwnd getting incremented beyond ssthresh and the window size, the mode
being shifted and then cwnd being corrected.

We need to make sure we switch into CA mode so that we stop marking every
packet for ACK.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rxrpc: Note serial number being ACK'd in the congestion management trace</title>
<updated>2016-09-29T21:57:47+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2016-09-29T21:37:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ed1e8679d8bc6537077d1f24bc83b396f6062f09'/>
<id>ed1e8679d8bc6537077d1f24bc83b396f6062f09</id>
<content type='text'>
Note the serial number of the packet being ACK'd in the congestion
management trace rather than the serial number of the ACK packet.  Whilst
the serial number of the ACK packet is useful for matching ACK packet in
the output of wireshark, the serial number that the ACK is in response to
is of more use in working out how different trace lines relate.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Note the serial number of the packet being ACK'd in the congestion
management trace rather than the serial number of the ACK packet.  Whilst
the serial number of the ACK packet is useful for matching ACK packet in
the output of wireshark, the serial number that the ACK is in response to
is of more use in working out how different trace lines relate.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rxrpc: Implement slow-start</title>
<updated>2016-09-24T22:49:46+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2016-09-24T17:05:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=57494343cb5d66962bb197878fb1cc576177db31'/>
<id>57494343cb5d66962bb197878fb1cc576177db31</id>
<content type='text'>
Implement RxRPC slow-start, which is similar to RFC 5681 for TCP.  A
tracepoint is added to log the state of the congestion management algorithm
and the decisions it makes.

Notes:

 (1) Since we send fixed-size DATA packets (apart from the final packet in
     each phase), counters and calculations are in terms of packets rather
     than bytes.

 (2) The ACK packet carries the equivalent of TCP SACK.

 (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly
     suited to SACK of a small number of packets.  It seems that, almost
     inevitably, by the time three 'duplicate' ACKs have been seen, we have
     narrowed the loss down to one or two missing packets, and the
     FLIGHT_SIZE calculation ends up as 2.

 (4) In rxrpc_resend(), if there was no data that apparently needed
     retransmission, we transmit a PING ACK to ask the peer to tell us what
     its Rx window state is.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Implement RxRPC slow-start, which is similar to RFC 5681 for TCP.  A
tracepoint is added to log the state of the congestion management algorithm
and the decisions it makes.

Notes:

 (1) Since we send fixed-size DATA packets (apart from the final packet in
     each phase), counters and calculations are in terms of packets rather
     than bytes.

 (2) The ACK packet carries the equivalent of TCP SACK.

 (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly
     suited to SACK of a small number of packets.  It seems that, almost
     inevitably, by the time three 'duplicate' ACKs have been seen, we have
     narrowed the loss down to one or two missing packets, and the
     FLIGHT_SIZE calculation ends up as 2.

 (4) In rxrpc_resend(), if there was no data that apparently needed
     retransmission, we transmit a PING ACK to ask the peer to tell us what
     its Rx window state is.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
