<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/skbuff.h, branch v2.6.38.6</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>netfilter: fix compilation when conntrack is disabled but tproxy is enabled</title>
<updated>2011-01-12T19:25:08+00:00</updated>
<author>
<name>KOVACS Krisztian</name>
<email>hidden@balabit.hu</email>
</author>
<published>2011-01-12T19:25:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2fc72c7b84002ffb3c66918e2a7b0ee607d8b5aa'/>
<id>2fc72c7b84002ffb3c66918e2a7b0ee607d8b5aa</id>
<content type='text'>
The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.

This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.

Original report:
http://marc.info/?l=linux-netdev&amp;m=129010118516341&amp;w=2

Reported-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Acked-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: KOVACS Krisztian &lt;hidden@balabit.hu&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.

This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.

Original report:
http://marc.info/?l=linux-netdev&amp;m=129010118516341&amp;w=2

Reported-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Acked-by: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Signed-off-by: KOVACS Krisztian &lt;hidden@balabit.hu&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Introduce skb_checksum_start_offset()</title>
<updated>2010-12-16T22:43:14+00:00</updated>
<author>
<name>Michał Mirosław</name>
<email>mirq-linux@rere.qmqm.pl</email>
</author>
<published>2010-12-14T15:24:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=04fb451eff978ca059399eab83d5594b073caf6f'/>
<id>04fb451eff978ca059399eab83d5594b073caf6f</id>
<content type='text'>
Introduce skb_checksum_start_offset() to replace repetitive calculation.

Signed-off-by: Michał Mirosław &lt;mirq-linux@rere.qmqm.pl&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce skb_checksum_start_offset() to replace repetitive calculation.

Signed-off-by: Michał Mirosław &lt;mirq-linux@rere.qmqm.pl&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bnx2x: Take the distribution range definition out of skb_tx_hash()</title>
<updated>2010-12-16T21:15:53+00:00</updated>
<author>
<name>Vladislav Zolotarov</name>
<email>vladz@broadcom.com</email>
</author>
<published>2010-12-13T06:27:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a3d22a68d752ccc1a01bb0a64dd70b7a98bf9e23'/>
<id>a3d22a68d752ccc1a01bb0a64dd70b7a98bf9e23</id>
<content type='text'>
Move the calcualation of the Tx hash for a given hash range into a separate
function and define the skb_tx_hash(), which calculates a Tx hash for a
[0; dev-&gt;real_num_tx_queues - 1] hash values range, using this
function (__skb_tx_hash()).

Signed-off-by: Vladislav Zolotarov &lt;vladz@broadcom.com&gt;
Signed-off-by: Eilon Greenstein &lt;eilong@broadcom.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move the calcualation of the Tx hash for a given hash range into a separate
function and define the skb_tx_hash(), which calculates a Tx hash for a
[0; dev-&gt;real_num_tx_queues - 1] hash values range, using this
function (__skb_tx_hash()).

Signed-off-by: Vladislav Zolotarov &lt;vladz@broadcom.com&gt;
Signed-off-by: Eilon Greenstein &lt;eilong@broadcom.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xps: Improvements in TX queue selection</title>
<updated>2010-11-24T19:44:19+00:00</updated>
<author>
<name>Tom Herbert</name>
<email>therbert@google.com</email>
</author>
<published>2010-11-21T13:17:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3853b5841c01a3f492fe137afaad9c209e5162c6'/>
<id>3853b5841c01a3f492fe137afaad9c209e5162c6</id>
<content type='text'>
In dev_pick_tx, don't do work in calculating queue
index or setting
the index in the sock unless the device has more than one queue.  This
allows the sock to be set only with a queue index of a multi-queue
device which is desirable if device are stacked like in a tunnel.

We also allow the mapping of a socket to queue to be changed.  To
maintain in order packet transmission a flag (ooo_okay) has been
added to the sk_buff structure.  If a transport layer sets this flag
on a packet, the transmit queue can be changed for the socket.
Presumably, the transport would set this if there was no possbility
of creating OOO packets (for instance, there are no packets in flight
for the socket).  This patch includes the modification in TCP output
for setting this flag.

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In dev_pick_tx, don't do work in calculating queue
index or setting
the index in the sock unless the device has more than one queue.  This
allows the sock to be set only with a queue index of a multi-queue
device which is desirable if device are stacked like in a tunnel.

We also allow the mapping of a socket to queue to be changed.  To
maintain in order packet transmission a flag (ooo_okay) has been
added to the sk_buff structure.  If a transport layer sets this flag
on a packet, the transmit queue can be changed for the socket.
Presumably, the transport would set this if there was no possbility
of creating OOO packets (for instance, there are no packets in flight
for the socket).  This patch includes the modification in TCP output
for setting this flag.

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: avoid RCU for NOCACHE dst</title>
<updated>2010-10-20T10:02:23+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-10-15T05:44:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=27b75c95f10d249574d9c4cb9dab878107faede8'/>
<id>27b75c95f10d249574d9c4cb9dab878107faede8</id>
<content type='text'>
There is no point using RCU for dst we allocate for a very short time
(used once).

Change dst_release() to take DST_NOCACHE into account, but also change
skb_dst_set_noref() to force a refcount increment for such dst.

This is a _huge_ gain, because we dont waste memory to store xx thousand
of dsts. Instead of queueing them to RCU, we can free them instantly.

CPU caches can stay hot, re-using same memory blocks to hold temporary
dsts.

Note : remove unneeded smp_mb__before_atomic_dec(); in dst_release(),
since atomic_dec_return() implies a full memory barrier.

Stress test, 160.000.000 udp frames sent, IP route cache disabled
(DDOS).

Before:

real    0m38.091s
user    0m13.189s
sys     7m53.018s

After:

real	0m29.946s
user	0m12.157s
sys	7m40.605s

For reference, if IP route cache was enabled :

real	0m32.030s
user	0m10.521s
sys	8m15.243s

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is no point using RCU for dst we allocate for a very short time
(used once).

Change dst_release() to take DST_NOCACHE into account, but also change
skb_dst_set_noref() to force a refcount increment for such dst.

This is a _huge_ gain, because we dont waste memory to store xx thousand
of dsts. Instead of queueing them to RCU, we can free them instantly.

CPU caches can stay hot, re-using same memory blocks to hold temporary
dsts.

Note : remove unneeded smp_mb__before_atomic_dec(); in dst_release(),
since atomic_dec_return() implies a full memory barrier.

Stress test, 160.000.000 udp frames sent, IP route cache disabled
(DDOS).

Before:

real    0m38.091s
user    0m13.189s
sys     7m53.018s

After:

real	0m29.946s
user	0m12.157s
sys	7m40.605s

For reference, if IP route cache was enabled :

real	0m32.030s
user	0m10.521s
sys	8m15.243s

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: allocate skbs on local node</title>
<updated>2010-10-16T18:13:19+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-10-11T19:05:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=564824b0c52c34692d804bb6ea214451615b0b50'/>
<id>564824b0c52c34692d804bb6ea214451615b0b50</id>
<content type='text'>
commit b30973f877 (node-aware skb allocation) spread a wrong habit of
allocating net drivers skbs on a given memory node : The one closest to
the NIC hardware. This is wrong because as soon as we try to scale
network stack, we need to use many cpus to handle traffic and hit
slub/slab management on cross-node allocations/frees when these cpus
have to alloc/free skbs bound to a central node.

skb allocated in RX path are ephemeral, they have a very short
lifetime : Extra cost to maintain NUMA affinity is too expensive. What
appeared as a nice idea four years ago is in fact a bad one.

In 2010, NIC hardwares are multiqueue, or we use RPS to spread the load,
and two 10Gb NIC might deliver more than 28 million packets per second,
needing all the available cpus.

Cost of cross-node handling in network and vm stacks outperforms the
small benefit hardware had when doing its DMA transfert in its 'local'
memory node at RX time. Even trying to differentiate the two allocations
done for one skb (the sk_buff on local node, the data part on NIC
hardware node) is not enough to bring good performance.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b30973f877 (node-aware skb allocation) spread a wrong habit of
allocating net drivers skbs on a given memory node : The one closest to
the NIC hardware. This is wrong because as soon as we try to scale
network stack, we need to use many cpus to handle traffic and hit
slub/slab management on cross-node allocations/frees when these cpus
have to alloc/free skbs bound to a central node.

skb allocated in RX path are ephemeral, they have a very short
lifetime : Extra cost to maintain NUMA affinity is too expensive. What
appeared as a nice idea four years ago is in fact a bad one.

In 2010, NIC hardwares are multiqueue, or we use RPS to spread the load,
and two 10Gb NIC might deliver more than 28 million packets per second,
needing all the available cpus.

Cost of cross-node handling in network and vm stacks outperforms the
small benefit hardware had when doing its DMA transfert in its 'local'
memory node at RX time. Even trying to differentiate the two allocations
done for one skb (the sk_buff on local node, the data part on NIC
hardware node) is not enough to bring good performance.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: skb_frag_t can be smaller on small arches</title>
<updated>2010-09-27T01:31:13+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-09-23T05:06:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cb4dfe562cac6fcb544df752e40c1d78000d0712'/>
<id>cb4dfe562cac6fcb544df752e40c1d78000d0712</id>
<content type='text'>
On 32bit arches, if PAGE_SIZE is smaller than 65536, we can use 16bit
offset and size fields. This patch saves 72 bytes per skb on i386, or
128 bytes after rounding.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On 32bit arches, if PAGE_SIZE is smaller than 65536, we can use 16bit
offset and size fields. This patch saves 72 bytes per skb on i386, or
128 bytes after rounding.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: return operator cleanup</title>
<updated>2010-09-23T21:33:39+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-09-22T20:43:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a02cec2155fbea457eca8881870fd2de1a4c4c76'/>
<id>a02cec2155fbea457eca8881870fd2de1a4c4c76</id>
<content type='text'>
Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drivers/net: avoid some skb-&gt;ip_summed initializations</title>
<updated>2010-09-03T02:06:22+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2010-09-02T20:07:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bc8acf2c8c3e43fcc192762a9f964b3e9a17748b'/>
<id>bc8acf2c8c3e43fcc192762a9f964b3e9a17748b</id>
<content type='text'>
fresh skbs have ip_summed set to CHECKSUM_NONE (0)

We can avoid setting again skb-&gt;ip_summed to CHECKSUM_NONE in drivers.

Introduce skb_checksum_none_assert() helper so that we keep this
assertion documented in driver sources.

Change most occurrences of :

skb-&gt;ip_summed = CHECKSUM_NONE;

by :

skb_checksum_none_assert(skb);

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fresh skbs have ip_summed set to CHECKSUM_NONE (0)

We can avoid setting again skb-&gt;ip_summed to CHECKSUM_NONE in drivers.

Introduce skb_checksum_none_assert() helper so that we keep this
assertion documented in driver sources.

Change most occurrences of :

skb-&gt;ip_summed = CHECKSUM_NONE;

by :

skb_checksum_none_assert(skb);

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Rename skb_has_frags to skb_has_frag_list</title>
<updated>2010-08-23T07:13:46+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2010-08-23T07:13:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=21dc330157454046dd7c494961277d76e1c957fe'/>
<id>21dc330157454046dd7c494961277d76e1c957fe</id>
<content type='text'>
SKBs can be "fragmented" in two ways, via a page array (called
skb_shinfo(skb)-&gt;frags[]) and via a list of SKBs (called
skb_shinfo(skb)-&gt;frag_list).

Since skb_has_frags() tests the latter, it's name is confusing
since it sounds more like it's testing the former.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
SKBs can be "fragmented" in two ways, via a page array (called
skb_shinfo(skb)-&gt;frags[]) and via a list of SKBs (called
skb_shinfo(skb)-&gt;frag_list).

Since skb_has_frags() tests the latter, it's name is confusing
since it sounds more like it's testing the former.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
