<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/net/ipv4/inetpeer.c, branch v3.0.29</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>net: Compute protocol sequence numbers and fragment IDs using MD5.</title>
<updated>2011-08-16T01:31:35+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-08-04T03:50:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e997d47bff5a467262ef224b4cf8cbba2d3eceea'/>
<id>e997d47bff5a467262ef224b4cf8cbba2d3eceea</id>
<content type='text'>
Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.

MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)

Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation.  So the periodic
regeneration and 8-bit counter have been removed.  We compute and
use a full 32-bit sequence number.

For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.

Reported-by: Dan Kaminsky &lt;dan@doxpara.com&gt;
Tested-by: Willy Tarreau &lt;w@1wt.eu&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.

MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)

Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation.  So the periodic
regeneration and 8-bit counter have been removed.  We compute and
use a full 32-bit sequence number.

For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.

Reported-by: Dan Kaminsky &lt;dan@doxpara.com&gt;
Tested-by: Willy Tarreau &lt;w@1wt.eu&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: fix race in unused_list manipulations</title>
<updated>2011-05-27T17:39:11+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-05-26T17:27:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=686a7e32ca7fdd819eb9606abd3db52b77d1479f'/>
<id>686a7e32ca7fdd819eb9606abd3db52b77d1479f</id>
<content type='text'>
Several crashes in cleanup_once() were reported in recent kernels.

Commit d6cc1d642de9 (inetpeer: various changes) added a race in
unlink_from_unused().

One way to avoid taking unused_peers.lock before doing the list_empty()
test is to catch 0-&gt;1 refcnt transitions, using full barrier atomic
operations variants (atomic_cmpxchg() and atomic_inc_return()) instead
of previous atomic_inc() and atomic_add_unless() variants.

We then call unlink_from_unused() only for the owner of the 0-&gt;1
transition.

Add a new atomic_add_unless_return() static helper

With help from Arun Sharma.

Refs: https://bugzilla.kernel.org/show_bug.cgi?id=32772

Reported-by: Arun Sharma &lt;asharma@fb.com&gt;
Reported-by: Maximilian Engelhardt &lt;maxi@daemonizer.de&gt;
Reported-by: Yann Dupont &lt;Yann.Dupont@univ-nantes.fr&gt;
Reported-by: Denys Fedoryshchenko &lt;denys@visp.net.lb&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Several crashes in cleanup_once() were reported in recent kernels.

Commit d6cc1d642de9 (inetpeer: various changes) added a race in
unlink_from_unused().

One way to avoid taking unused_peers.lock before doing the list_empty()
test is to catch 0-&gt;1 refcnt transitions, using full barrier atomic
operations variants (atomic_cmpxchg() and atomic_inc_return()) instead
of previous atomic_inc() and atomic_add_unless() variants.

We then call unlink_from_unused() only for the owner of the 0-&gt;1
transition.

Add a new atomic_add_unless_return() static helper

With help from Arun Sharma.

Refs: https://bugzilla.kernel.org/show_bug.cgi?id=32772

Reported-by: Arun Sharma &lt;asharma@fb.com&gt;
Reported-by: Maximilian Engelhardt &lt;maxi@daemonizer.de&gt;
Reported-by: Yann Dupont &lt;Yann.Dupont@univ-nantes.fr&gt;
Reported-by: Denys Fedoryshchenko &lt;denys@visp.net.lb&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: reduce stack usage</title>
<updated>2011-04-12T20:58:33+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-04-11T22:39:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=66944e1c5797562cebe2d1857d46dff60bf9a69e'/>
<id>66944e1c5797562cebe2d1857d46dff60bf9a69e</id>
<content type='text'>
On 64bit arches, we use 752 bytes of stack when cleanup_once() is called
from inet_getpeer().

Lets share the avl stack to save ~376 bytes.

Before patch :

# objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl

0x000006c3 unlink_from_pool [inetpeer.o]:		376
0x00000721 unlink_from_pool [inetpeer.o]:		376
0x00000cb1 inet_getpeer [inetpeer.o]:			376
0x00000e6d inet_getpeer [inetpeer.o]:			376
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5320	    432	     21	   5773	   168d	net/ipv4/inetpeer.o

After patch :

objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl
0x00000c11 inet_getpeer [inetpeer.o]:			376
0x00000dcd inet_getpeer [inetpeer.o]:			376
0x00000ab9 peer_check_expire [inetpeer.o]:		328
0x00000b7f peer_check_expire [inetpeer.o]:		328
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5163	    432	     21	   5616	   15f0	net/ipv4/inetpeer.o

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Scot Doyle &lt;lkml@scotdoyle.com&gt;
Cc: Stephen Hemminger &lt;shemminger@vyatta.com&gt;
Cc: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Reviewed-by: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On 64bit arches, we use 752 bytes of stack when cleanup_once() is called
from inet_getpeer().

Lets share the avl stack to save ~376 bytes.

Before patch :

# objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl

0x000006c3 unlink_from_pool [inetpeer.o]:		376
0x00000721 unlink_from_pool [inetpeer.o]:		376
0x00000cb1 inet_getpeer [inetpeer.o]:			376
0x00000e6d inet_getpeer [inetpeer.o]:			376
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5320	    432	     21	   5773	   168d	net/ipv4/inetpeer.o

After patch :

objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl
0x00000c11 inet_getpeer [inetpeer.o]:			376
0x00000dcd inet_getpeer [inetpeer.o]:			376
0x00000ab9 peer_check_expire [inetpeer.o]:		328
0x00000b7f peer_check_expire [inetpeer.o]:		328
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5163	    432	     21	   5616	   15f0	net/ipv4/inetpeer.o

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Scot Doyle &lt;lkml@scotdoyle.com&gt;
Cc: Stephen Hemminger &lt;shemminger@vyatta.com&gt;
Cc: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Reviewed-by: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: should use call_rcu() variant</title>
<updated>2011-03-14T06:22:23+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-03-14T06:22:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4e75db2e8ff2c97762e87f61f54d7cdeaab1a6b0'/>
<id>4e75db2e8ff2c97762e87f61f54d7cdeaab1a6b0</id>
<content type='text'>
After commit 7b46ac4e77f3224a (inetpeer: Don't disable BH for initial
fast RCU lookup.), we should use call_rcu() to wait proper RCU grace
period.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit 7b46ac4e77f3224a (inetpeer: Don't disable BH for initial
fast RCU lookup.), we should use call_rcu() to wait proper RCU grace
period.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: Fix PMTU update.</title>
<updated>2011-03-14T01:37:49+00:00</updated>
<author>
<name>Hiroaki SHIMODA</name>
<email>shimoda.hiroaki@gmail.com</email>
</author>
<published>2011-03-09T20:09:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=46af31800b6916c92fffa529dc3c357008da957d'/>
<id>46af31800b6916c92fffa529dc3c357008da957d</id>
<content type='text'>
On current net-next-2.6, when Linux receives ICMP Type: 3, Code: 4
(Destination unreachable (Fragmentation needed)),

  icmp_unreach
    -&gt; ip_rt_frag_needed
         (peer-&gt;pmtu_expires is set here)
    -&gt; tcp_v4_err
         -&gt; do_pmtu_discovery
              -&gt; ip_rt_update_pmtu
                   (peer-&gt;pmtu_expires is already set,
                    so check_peer_pmtu is skipped.)
                   -&gt; check_peer_pmtu

check_peer_pmtu is skipped and MTU is not updated.

To fix this, let check_peer_pmtu execute unconditionally.
And some minor fixes
1) Avoid potential peer-&gt;pmtu_expires set to be zero.
2) In check_peer_pmtu, argument of time_before is reversed.
3) check_peer_pmtu expects peer-&gt;pmtu_orig is initialized as zero,
   but not initialized.

Signed-off-by: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On current net-next-2.6, when Linux receives ICMP Type: 3, Code: 4
(Destination unreachable (Fragmentation needed)),

  icmp_unreach
    -&gt; ip_rt_frag_needed
         (peer-&gt;pmtu_expires is set here)
    -&gt; tcp_v4_err
         -&gt; do_pmtu_discovery
              -&gt; ip_rt_update_pmtu
                   (peer-&gt;pmtu_expires is already set,
                    so check_peer_pmtu is skipped.)
                   -&gt; check_peer_pmtu

check_peer_pmtu is skipped and MTU is not updated.

To fix this, let check_peer_pmtu execute unconditionally.
And some minor fixes
1) Avoid potential peer-&gt;pmtu_expires set to be zero.
2) In check_peer_pmtu, argument of time_before is reversed.
3) check_peer_pmtu expects peer-&gt;pmtu_orig is initialized as zero,
   but not initialized.

Signed-off-by: Hiroaki SHIMODA &lt;shimoda.hiroaki@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: Don't disable BH for initial fast RCU lookup.</title>
<updated>2011-03-08T22:59:28+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-03-08T22:59:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7b46ac4e77f3224a1befe032c77f1df31d1b42c4'/>
<id>7b46ac4e77f3224a1befe032c77f1df31d1b42c4</id>
<content type='text'>
If modifications on other cpus are ok, then modifications to
the tree during lookup done by the local cpu are ok too.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If modifications on other cpus are ok, then modifications to
the tree during lookup done by the local cpu are ok too.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: seqlock optimization</title>
<updated>2011-03-04T22:33:59+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-03-04T22:33:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=65e8354ec13a45414045084166cb340c0d7ffe8a'/>
<id>65e8354ec13a45414045084166cb340c0d7ffe8a</id>
<content type='text'>
David noticed :

------------------
Eric, I was profiling the non-routing-cache case and something that
stuck out is the case of calling inet_getpeer() with create==0.

If an entry is not found, we have to redo the lookup under a spinlock
to make certain that a concurrent writer rebalancing the tree does
not "hide" an existing entry from us.

This makes the case of a create==0 lookup for a not-present entry
really expensive.  It is on the order of 600 cpu cycles on my
Niagara2.

I added a hack to not do the relookup under the lock when create==0
and it now costs less than 300 cycles.

This is now a pretty common operation with the way we handle COW'd
metrics, so I think it's definitely worth optimizing.
-----------------

One solution is to use a seqlock instead of a spinlock to protect struct
inet_peer_base.

After a failed avl tree lookup, we can easily detect if a writer did
some changes during our lookup. Taking the lock and redo the lookup is
only necessary in this case.

Note: Add one private rcu_deref_locked() macro to place in one spot the
access to spinlock included in seqlock.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
David noticed :

------------------
Eric, I was profiling the non-routing-cache case and something that
stuck out is the case of calling inet_getpeer() with create==0.

If an entry is not found, we have to redo the lookup under a spinlock
to make certain that a concurrent writer rebalancing the tree does
not "hide" an existing entry from us.

This makes the case of a create==0 lookup for a not-present entry
really expensive.  It is on the order of 600 cpu cycles on my
Niagara2.

I added a hack to not do the relookup under the lock when create==0
and it now costs less than 300 cycles.

This is now a pretty common operation with the way we handle COW'd
metrics, so I think it's definitely worth optimizing.
-----------------

One solution is to use a seqlock instead of a spinlock to protect struct
inet_peer_base.

After a failed avl tree lookup, we can easily detect if a writer did
some changes during our lookup. Taking the lock and redo the lookup is
only necessary in this case.

Note: Add one private rcu_deref_locked() macro to place in one spot the
access to spinlock included in seqlock.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: Add redirect and PMTU discovery cached info.</title>
<updated>2011-02-10T21:29:30+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-02-09T23:36:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ddd4aa424b866a08ceba7ddf38e61542c91b93a0'/>
<id>ddd4aa424b866a08ceba7ddf38e61542c91b93a0</id>
<content type='text'>
Validity of the cached PMTU information is indicated by it's
expiration value being non-zero, just as per dst-&gt;expires.

The scheme we will use is that we will remember the pre-ICMP value
held in the metrics or route entry, and then at expiration time
we will restore that value.

In this way PMTU expiration does not kill off the cached route as is
done currently.

Redirect information is permanent, or at least until another redirect
is received.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Validity of the cached PMTU information is indicated by it's
expiration value being non-zero, just as per dst-&gt;expires.

The scheme we will use is that we will remember the pre-ICMP value
held in the metrics or route entry, and then at expiration time
we will restore that value.

In this way PMTU expiration does not kill off the cached route as is
done currently.

Redirect information is permanent, or at least until another redirect
is received.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: Abstract address representation further.</title>
<updated>2011-02-10T21:22:28+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-02-09T22:30:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7a71ed899e77cc822abb863e24a422dcf7e9fa33'/>
<id>7a71ed899e77cc822abb863e24a422dcf7e9fa33</id>
<content type='text'>
Future changes will add caching information, and some of
these new elements will be addresses.

Since the family is implicit via the -&gt;daddr.family member,
replicating the family in ever address we store is entirely
redundant.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Future changes will add caching information, and some of
these new elements will be addresses.

Since the family is implicit via the -&gt;daddr.family member,
replicating the family in ever address we store is entirely
redundant.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: Move ICMP rate limiting state into inet_peer entries.</title>
<updated>2011-02-04T23:59:53+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-02-04T23:55:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=92d8682926342d2b6aa5b2ecc02221e00e1573a0'/>
<id>92d8682926342d2b6aa5b2ecc02221e00e1573a0</id>
<content type='text'>
Like metrics, the ICMP rate limiting bits are cached state about
a destination.  So move it into the inet_peer entries.

If an inet_peer cannot be bound (the reason is memory allocation
failure or similar), the policy is to allow.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Like metrics, the ICMP rate limiting bits are cached state about
a destination.  So move it into the inet_peer entries.

If an inet_peer cannot be bound (the reason is memory allocation
failure or similar), the policy is to allow.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
