<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/net/inet6_hashtables.h, branch v2.6.19.2</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Don't include linux/config.h from anywhere else in include/</title>
<updated>2006-04-26T11:56:16+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw2@infradead.org</email>
</author>
<published>2006-04-26T11:56:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=62c4f0a2d5a188f73a94f2cb8ea0dba3e7cf0a7f'/>
<id>62c4f0a2d5a188f73a94f2cb8ea0dba3e7cf0a7f</id>
<content type='text'>
Signed-off-by: David Woodhouse &lt;dwmw2@infradead.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: David Woodhouse &lt;dwmw2@infradead.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[IPV6]: Deinline few large functions in inet6 code</title>
<updated>2006-04-10T05:48:59+00:00</updated>
<author>
<name>Denis Vlasenko</name>
<email>vda@ilport.com.ua</email>
</author>
<published>2006-04-10T05:48:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b1a7ffcb7a047e99ab02424e651e0492f36095f7'/>
<id>b1a7ffcb7a047e99ab02424e651e0492f36095f7</id>
<content type='text'>
Deinline a few functions which produce 200+ bytes of code.

Size  Uses Wasted Name and definition
===== ==== ====== ================================================
  429    3    818 __inet6_lookup        include/net/inet6_hashtables.h
  404    2    384 __inet6_lookup_established    include/net/inet6_hashtables.h
  206    3    372 __inet6_hash  include/net/inet6_hashtables.h

Signed-off-by: Denis Vlasenko &lt;vda@ilport.com.ua&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Deinline a few functions which produce 200+ bytes of code.

Size  Uses Wasted Name and definition
===== ==== ====== ================================================
  429    3    818 __inet6_lookup        include/net/inet6_hashtables.h
  404    2    384 __inet6_lookup_established    include/net/inet6_hashtables.h
  206    3    372 __inet6_hash  include/net/inet6_hashtables.h

Signed-off-by: Denis Vlasenko &lt;vda@ilport.com.ua&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[IPV6]: Introduce inet6_timewait_sock</title>
<updated>2006-01-03T21:10:47+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@mandriva.com</email>
</author>
<published>2005-12-14T07:23:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0fa1a53e1f055a6c790f40e7728f42a825b29248'/>
<id>0fa1a53e1f055a6c790f40e7728f42a825b29248</id>
<content type='text'>
Out of tcp6_timewait_sock, that now is just an aggregation of
inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
inet_timewait_sock, that is common to the IPv6 transport protocols that use
timewait sockets, like DCCP and TCP.

tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
code to find the IPv6 area in a timewait sock.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Out of tcp6_timewait_sock, that now is just an aggregation of
inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
inet_timewait_sock, that is common to the IPv6 transport protocols that use
timewait sockets, like DCCP and TCP.

tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
code to find the IPv6 area in a timewait sock.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[IPV6]: Generalise __tcp_v6_hash, renaming it to __inet6_hash</title>
<updated>2006-01-03T21:10:33+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@mandriva.com</email>
</author>
<published>2005-12-14T07:15:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=90b19d31695371bd3ed256d4c9e280861cd6ae7e'/>
<id>90b19d31695371bd3ed256d4c9e280861cd6ae7e</id>
<content type='text'>
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[INET]: speedup inet (tcp/dccp) lookups</title>
<updated>2005-10-03T21:13:38+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>dada1@cosmosbay.com</email>
</author>
<published>2005-10-03T21:13:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=81c3d5470ecc70564eb9209946730fe2be93ad06'/>
<id>81c3d5470ecc70564eb9209946730fe2be93ad06</id>
<content type='text'>
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &amp;head-&gt;chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.

As INET_MATCH first comparison uses inet_sk(__sk)-&gt;daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};

Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.

Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))
     ((*((__u64 *)&amp;(inet_sk(__sk)-&gt;daddr)))== (__cookie))   &amp;&amp;  \
     ((*((__u32 *)&amp;(inet_sk(__sk)-&gt;dport))) == (__ports))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))

32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))                 &amp;&amp;  \
     (inet_sk(__sk)-&gt;daddr          == (__saddr))   &amp;&amp;  \
     (inet_sk(__sk)-&gt;rcv_saddr      == (__daddr))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))


- Adds a prefetch(head-&gt;chain.first) in 
__inet_lookup_established()/__tcp_v4_check_established() and 
__inet6_lookup_established()/__tcp_v6_check_established() and 
__dccp_v4_check_established() to bring into cache the first element of the 
list, before the {read|write}_lock(&amp;head-&gt;lock);

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Acked-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &amp;head-&gt;chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.

As INET_MATCH first comparison uses inet_sk(__sk)-&gt;daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};

Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.

Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))
     ((*((__u64 *)&amp;(inet_sk(__sk)-&gt;daddr)))== (__cookie))   &amp;&amp;  \
     ((*((__u32 *)&amp;(inet_sk(__sk)-&gt;dport))) == (__ports))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))

32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)-&gt;sk_hash == (__hash))                 &amp;&amp;  \
     (inet_sk(__sk)-&gt;daddr          == (__saddr))   &amp;&amp;  \
     (inet_sk(__sk)-&gt;rcv_saddr      == (__daddr))   &amp;&amp;  \
     (!((__sk)-&gt;sk_bound_dev_if) || ((__sk)-&gt;sk_bound_dev_if == (__dif))))


- Adds a prefetch(head-&gt;chain.first) in 
__inet_lookup_established()/__tcp_v4_check_established() and 
__inet6_lookup_established()/__tcp_v6_check_established() and 
__dccp_v4_check_established() to bring into cache the first element of the 
list, before the {read|write}_lock(&amp;head-&gt;lock);

Signed-off-by: Eric Dumazet &lt;dada1@cosmosbay.com&gt;
Acked-by: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[INET6_HASHTABLES]: Move inet6_lookup functions to net/ipv6/inet6_hashtables.c</title>
<updated>2005-08-29T22:57:29+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@mandriva.com</email>
</author>
<published>2005-08-12T12:26:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5324a040ccc708998e61ea93e669b81312f0ae11'/>
<id>5324a040ccc708998e61ea93e669b81312f0ae11</id>
<content type='text'>
Doing this we allow tcp_diag to support IPV6 even if tcp_diag is compiled
statically and IPV6 is compiled as a module, removing the previous restriction
while not building any IPV6 code if it is not selected.

Now to work on the tcpdiag_register infrastructure and then to rename the whole
thing to inetdiag, reflecting its by then completely generic nature.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Doing this we allow tcp_diag to support IPV6 even if tcp_diag is compiled
statically and IPV6 is compiled as a module, removing the previous restriction
while not building any IPV6 code if it is not selected.

Now to work on the tcpdiag_register infrastructure and then to rename the whole
thing to inetdiag, reflecting its by then completely generic nature.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[IPV6]: Generalise the tcp_v6_lookup routines</title>
<updated>2005-08-29T22:57:24+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@mandriva.com</email>
</author>
<published>2005-08-12T12:19:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=505cbfc577f3fa778005e2800b869eca25727d5f'/>
<id>505cbfc577f3fa778005e2800b869eca25727d5f</id>
<content type='text'>
In the same way as was done with the v4 counterparts, this will be moved
to inet6_hashtables.c.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the same way as was done with the v4 counterparts, this will be moved
to inet6_hashtables.c.

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@mandriva.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
