<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/net/sctp, branch v2.6.32.14</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>sctp: on T3_RTX retransmit all the in-flight chunks</title>
<updated>2009-11-29T08:14:02+00:00</updated>
<author>
<name>Andrei Pelinescu-Onciul</name>
<email>andrei@iptel.org</email>
</author>
<published>2009-11-29T08:14:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5fdd4baef6195a1f2960e901c8877e2105f832ca'/>
<id>5fdd4baef6195a1f2960e901c8877e2105f832ca</id>
<content type='text'>
When retransmitting due to T3 timeout, retransmit all the
in-flight chunks for the corresponding  transport/path, including
chunks sent less then 1 rto ago.
This is the correct behaviour according to rfc4960 section 6.3.3
E3 and
"Note: Any DATA chunks that were sent to the address for which the
 T3-rtx timer expired but did not fit in one MTU (rule E3 above)
 should be marked for retransmission and sent as soon as cwnd
 allows (normally, when a SACK arrives). ".

This fixes problems when more then one path is present and the T3
retransmission of the first chunk that timeouts stops the T3 timer
for the initial active path, leaving all the other in-flight
chunks waiting forever or until a new chunk is transmitted on the
same path and timeouts (and this will happen only if the cwnd
allows sending new chunks, but since cwnd was dropped to MTU by
the timeout =&gt; it will wait until the first heartbeat).

Example: 10 packets in flight, sent at 0.1 s intervals on the
primary path. The primary path is down and the first packet
timeouts. The first packet is retransmitted on another path, the
T3 timer for the primary path is stopped and cwnd is set to MTU.
All the other 9 in-flight packets will not be retransmitted
(unless more new packets are sent on the primary path which depend
on cwnd allowing it, and even in this case the 9 packets will be
retransmitted only after a new packet timeouts which even in the
best case would be more then RTO).

This commit reverts d0ce92910bc04e107b2f3f2048f07e94f570035d and
also removes the now unused transport-&gt;last_rto, introduced in
 b6157d8e03e1e780660a328f7183bcbfa4a93a19.

p.s  The problem is not only when multiple paths are there.  It
can happen in a single homed environment.  If the application
stops sending data, it possible to have a hung association.

Signed-off-by: Andrei Pelinescu-Onciul &lt;andrei@iptel.org&gt;
Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When retransmitting due to T3 timeout, retransmit all the
in-flight chunks for the corresponding  transport/path, including
chunks sent less then 1 rto ago.
This is the correct behaviour according to rfc4960 section 6.3.3
E3 and
"Note: Any DATA chunks that were sent to the address for which the
 T3-rtx timer expired but did not fit in one MTU (rule E3 above)
 should be marked for retransmission and sent as soon as cwnd
 allows (normally, when a SACK arrives). ".

This fixes problems when more then one path is present and the T3
retransmission of the first chunk that timeouts stops the T3 timer
for the initial active path, leaving all the other in-flight
chunks waiting forever or until a new chunk is transmitted on the
same path and timeouts (and this will happen only if the cwnd
allows sending new chunks, but since cwnd was dropped to MTU by
the timeout =&gt; it will wait until the first heartbeat).

Example: 10 packets in flight, sent at 0.1 s intervals on the
primary path. The primary path is down and the first packet
timeouts. The first packet is retransmitted on another path, the
T3 timer for the primary path is stopped and cwnd is set to MTU.
All the other 9 in-flight packets will not be retransmitted
(unless more new packets are sent on the primary path which depend
on cwnd allowing it, and even in this case the 9 packets will be
retransmitted only after a new packet timeouts which even in the
best case would be more then RTO).

This commit reverts d0ce92910bc04e107b2f3f2048f07e94f570035d and
also removes the now unused transport-&gt;last_rto, introduced in
 b6157d8e03e1e780660a328f7183bcbfa4a93a19.

p.s  The problem is not only when multiple paths are there.  It
can happen in a single homed environment.  If the application
stops sending data, it possible to have a hung association.

Signed-off-by: Andrei Pelinescu-Onciul &lt;andrei@iptel.org&gt;
Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: Set source addresses on the association before adding transports</title>
<updated>2009-11-14T03:56:50+00:00</updated>
<author>
<name>Vlad Yasevich</name>
<email>vladislav.yasevich@hp.com</email>
</author>
<published>2009-11-10T08:57:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=409b95aff3583c05ac7a9247fa3d8c9aa7f9cae3'/>
<id>409b95aff3583c05ac7a9247fa3d8c9aa7f9cae3</id>
<content type='text'>
Recent commit 8da645e101a8c20c6073efda3c7cc74eec01b87f
	sctp: Get rid of an extra routing lookup when adding a transport
introduced a regression in the connection setup.  The behavior was

different between IPv4 and IPv6.  IPv4 case ended up working because the
route lookup routing returned a NULL route, which triggered another
route lookup later in the output patch that succeeded.  In the IPv6 case,
a valid route was returned for first call, but we could not find a valid
source address at the time since the source addresses were not set on the
association yet.  Thus resulted in a hung connection.

The solution is to set the source addresses on the association prior to
adding peers.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Recent commit 8da645e101a8c20c6073efda3c7cc74eec01b87f
	sctp: Get rid of an extra routing lookup when adding a transport
introduced a regression in the connection setup.  The behavior was

different between IPv4 and IPv6.  IPv4 case ended up working because the
route lookup routing returned a NULL route, which triggered another
route lookup later in the output patch that succeeded.  In the IPv6 case,
a valid route was returned for first call, but we could not find a valid
source address at the time since the source addresses were not set on the
association yet.  Thus resulted in a hung connection.

The solution is to set the source addresses on the association prior to
adding peers.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Make setsockopt() optlen be unsigned.</title>
<updated>2009-09-30T23:12:20+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2009-09-30T23:12:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b7058842c940ad2c08dd829b21e5c92ebe3b8758'/>
<id>b7058842c940ad2c08dd829b21e5c92ebe3b8758</id>
<content type='text'>
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: turn flags in 'struct sctp_association' into bit fields</title>
<updated>2009-09-04T22:21:02+00:00</updated>
<author>
<name>Wei Yongjun</name>
<email>yjwei@cn.fujitsu.com</email>
</author>
<published>2009-09-04T06:33:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9237ccbc0b22b5aa5396463ba0b14dc5efe5b98c'/>
<id>9237ccbc0b22b5aa5396463ba0b14dc5efe5b98c</id>
<content type='text'>
This shrinks the size of struct sctp_association a little.

Signed-off-by: Wei Yongjun &lt;yjwei@cn.fujitsu.com&gt;
Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This shrinks the size of struct sctp_association a little.

Signed-off-by: Wei Yongjun &lt;yjwei@cn.fujitsu.com&gt;
Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: Sysctl configuration for IPv4 Address Scoping</title>
<updated>2009-09-04T22:21:01+00:00</updated>
<author>
<name>Bhaskar Dutta</name>
<email>bhaskie@gmail.com</email>
</author>
<published>2009-09-03T11:55:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=723884339f90a9c420783135168cc1045750eb5d'/>
<id>723884339f90a9c420783135168cc1045750eb5d</id>
<content type='text'>
This patch introduces a new sysctl option to make IPv4 Address Scoping
configurable &lt;draft-stewart-tsvwg-sctp-ipv4-00.txt&gt;.

In networking environments where DNAT rules in iptables prerouting
chains convert destination IP's to link-local/private IP addresses,
SCTP connections fail to establish as the INIT chunk is dropped by the
kernel due to address scope match failure.
For example to support overlapping IP addresses (same IP address with
different vlan id) a Layer-5 application listens on link local IP's,
and there is a DNAT rule that maps the destination IP to a link local
IP. Such applications never get the SCTP INIT if the address-scoping
draft is strictly followed.

This sysctl configuration allows SCTP to function in such
unconventional networking environments.

Sysctl options:
0 - Disable IPv4 address scoping draft altogether
1 - Enable IPv4 address scoping (default, current behavior)
2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
3 - Enable address scoping but allow IPv4 link local address in init/init-ack

Signed-off-by: Bhaskar Dutta &lt;bhaskar.dutta@globallogic.com&gt;
Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch introduces a new sysctl option to make IPv4 Address Scoping
configurable &lt;draft-stewart-tsvwg-sctp-ipv4-00.txt&gt;.

In networking environments where DNAT rules in iptables prerouting
chains convert destination IP's to link-local/private IP addresses,
SCTP connections fail to establish as the INIT chunk is dropped by the
kernel due to address scope match failure.
For example to support overlapping IP addresses (same IP address with
different vlan id) a Layer-5 application listens on link local IP's,
and there is a DNAT rule that maps the destination IP to a link local
IP. Such applications never get the SCTP INIT if the address-scoping
draft is strictly followed.

This sysctl configuration allows SCTP to function in such
unconventional networking environments.

Sysctl options:
0 - Disable IPv4 address scoping draft altogether
1 - Enable IPv4 address scoping (default, current behavior)
2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
3 - Enable address scoping but allow IPv4 link local address in init/init-ack

Signed-off-by: Bhaskar Dutta &lt;bhaskar.dutta@globallogic.com&gt;
Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: Turn flags in 'sctp_packet' into bit fields</title>
<updated>2009-09-04T22:21:01+00:00</updated>
<author>
<name>Vlad Yasevich</name>
<email>vladislav.yasevich@hp.com</email>
</author>
<published>2009-09-04T22:21:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a803c942303e6a4ef0ab6b80114529852cffa058'/>
<id>a803c942303e6a4ef0ab6b80114529852cffa058</id>
<content type='text'>
This shrinks the size of sctp_packet a little.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This shrinks the size of sctp_packet a little.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: Fix SCTP_MAXSEG socket option to comply to spec.</title>
<updated>2009-09-04T22:21:00+00:00</updated>
<author>
<name>Vlad Yasevich</name>
<email>vladislav.yasevich@hp.com</email>
</author>
<published>2009-09-04T22:21:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f68b2e05f326971cd76c65aa91a1a41771dd7485'/>
<id>f68b2e05f326971cd76c65aa91a1a41771dd7485</id>
<content type='text'>
We had a bug that we never stored the user-defined value for
MAXSEG when setting the value on an association.  Thus future
PMTU events ended up re-writing the frag point and increasing
it past user limit.  Additionally, when setting the option on
the socket/endpoint, we effect all current associations, which
is against spec.

Now, we store the user 'maxseg' value along with the computed
'frag_point'.  We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when its
set.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We had a bug that we never stored the user-defined value for
MAXSEG when setting the value on an association.  Thus future
PMTU events ended up re-writing the frag point and increasing
it past user limit.  Additionally, when setting the option on
the socket/endpoint, we effect all current associations, which
is against spec.

Now, we store the user 'maxseg' value along with the computed
'frag_point'.  We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when its
set.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: Don't do NAGLE delay on large writes that were fragmented small</title>
<updated>2009-09-04T22:20:59+00:00</updated>
<author>
<name>Vlad Yasevich</name>
<email>vladislav.yasevich@hp.com</email>
</author>
<published>2009-09-04T22:20:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cb95ea32a457871f72752164de8d94fa20f4703c'/>
<id>cb95ea32a457871f72752164de8d94fa20f4703c</id>
<content type='text'>
SCTP will delay the last part of a large write due to NAGLE, if that
part is smaller then MTU.  Since we are doing large writes, we might
as well send the last portion now instead of waiting untill the next
large write happens.  The small portion will be sent as is regardless,
so it's better to not delay it.

This is a result of much discussions with Wei Yongjun &lt;yjwei@cn.fujitsu.com&gt;
and Doug Graham &lt;dgraham@nortel.com&gt;.  Many thanks go out to them.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
SCTP will delay the last part of a large write due to NAGLE, if that
part is smaller then MTU.  Since we are doing large writes, we might
as well send the last portion now instead of waiting untill the next
large write happens.  The small portion will be sent as is regardless,
so it's better to not delay it.

This is a result of much discussions with Wei Yongjun &lt;yjwei@cn.fujitsu.com&gt;
and Doug Graham &lt;dgraham@nortel.com&gt;.  Many thanks go out to them.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: drop a_rwnd to 0 when receive buffer overflows.</title>
<updated>2009-09-04T22:20:59+00:00</updated>
<author>
<name>Vlad Yasevich</name>
<email>vladislav.yasevich@hp.com</email>
</author>
<published>2009-09-04T22:20:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4d3c46e6833208428d366630aa708f6876e61fc1'/>
<id>4d3c46e6833208428d366630aa708f6876e61fc1</id>
<content type='text'>
SCTP has a problem that when small chunks are used, it is possible
to exhaust the receiver buffer without fully closing receive window.
This happens due to all overhead that we have account for with small
messages.  To fix this, when receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion.  When application starts
reading data and freeing up recevie buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
mtu at a time.  This worked well in testing and under stress produced
rather even recovery.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
SCTP has a problem that when small chunks are used, it is possible
to exhaust the receiver buffer without fully closing receive window.
This happens due to all overhead that we have account for with small
messages.  To fix this, when receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion.  When application starts
reading data and freeing up recevie buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
mtu at a time.  This worked well in testing and under stress produced
rather even recovery.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: Send user messages to the lower layer as one</title>
<updated>2009-09-04T22:20:57+00:00</updated>
<author>
<name>Vlad Yasevich</name>
<email>vladislav.yasevich@hp.com</email>
</author>
<published>2009-08-10T17:51:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9c5c62be2f794c7cee533d856f9f34c3cf21ff1b'/>
<id>9c5c62be2f794c7cee533d856f9f34c3cf21ff1b</id>
<content type='text'>
Currenlty, sctp breaks up user messages into fragments and
sends each fragment to the lower layer by itself.  This means
that for each fragment we go all the way down the stack
and back up.  This also discourages bundling of multiple
fragments when they can fit into a sigle packet (ex: due
to user setting a low fragmentation threashold).

We introduce a new command SCTP_CMD_SND_MSG and hand the
whole message down state machine.  The state machine and
the side-effect parser will cork the queue, add all chunks
from the message to the queue, and then un-cork the queue
thus causing the chunks to get transmitted.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currenlty, sctp breaks up user messages into fragments and
sends each fragment to the lower layer by itself.  This means
that for each fragment we go all the way down the stack
and back up.  This also discourages bundling of multiple
fragments when they can fit into a sigle packet (ex: due
to user setting a low fragmentation threashold).

We introduce a new command SCTP_CMD_SND_MSG and hand the
whole message down state machine.  The state machine and
the side-effect parser will cork the queue, add all chunks
from the message to the queue, and then un-cork the queue
thus causing the chunks to get transmitted.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
