<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/net/core, branch v2.6.16.42</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>[PKTGEN]: Fix module load/unload races.</title>
<updated>2007-01-04T00:02:58+00:00</updated>
<author>
<name>Robert Olsson</name>
<email>data.slu.se</email>
</author>
<published>2007-01-03T23:57:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e79366b5564af42aa2449042c75630c16edbdb4d'/>
<id>e79366b5564af42aa2449042c75630c16edbdb4d</id>
<content type='text'>
Adrian Bunk:
Backported to 2.6.16.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Adrian Bunk:
Backported to 2.6.16.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>NET_SCHED: Fix fallout from dev-&gt;qdisc RCU change</title>
<updated>2007-01-03T23:38:10+00:00</updated>
<author>
<name>Patrick McHardy</name>
<email>kaber@trash.net</email>
</author>
<published>2007-01-03T23:38:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=83d285a27720a4927ad1ca8e12b035ddcf1b5e38'/>
<id>83d285a27720a4927ad1ca8e12b035ddcf1b5e38</id>
<content type='text'>
The move of qdisc destruction to a rcu callback broke locking in the
entire qdisc layer by invalidating previously valid assumptions about
the context in which changes to the qdisc tree occur.

The two assumptions were:

- since changes only happen in process context, read_lock doesn't need
  bottem half protection. Now invalid since destruction of inner qdiscs,
  classifiers, actions and estimators happens in the RCU callback unless
  they're manually deleted, resulting in dead-locks when read_lock in
  process context is interrupted by write_lock_bh in bottem half context.

- since changes only happen under the RTNL, no additional locking is
  necessary for data not used during packet processing (f.e. u32_list).
  Again, since destruction now happens in the RCU callback, this assumption
  is not valid anymore, causing races while using this data, which can
  result in corruption or use-after-free.

Instead of "fixing" this by disabling bottem halfs everywhere and adding
new locks/refcounting, this patch makes these assumptions valid again by
moving destruction back to process context. Since only the dev-&gt;qdisc
pointer is protected by RCU, but -&gt;enqueue and the qdisc tree are still
protected by dev-&gt;qdisc_lock, destruction of the tree can be performed
immediately and only the final free needs to happen in the rcu callback
to make sure dev_queue_xmit doesn't access already freed memory.

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The move of qdisc destruction to a rcu callback broke locking in the
entire qdisc layer by invalidating previously valid assumptions about
the context in which changes to the qdisc tree occur.

The two assumptions were:

- since changes only happen in process context, read_lock doesn't need
  bottem half protection. Now invalid since destruction of inner qdiscs,
  classifiers, actions and estimators happens in the RCU callback unless
  they're manually deleted, resulting in dead-locks when read_lock in
  process context is interrupted by write_lock_bh in bottem half context.

- since changes only happen under the RTNL, no additional locking is
  necessary for data not used during packet processing (f.e. u32_list).
  Again, since destruction now happens in the RCU callback, this assumption
  is not valid anymore, causing races while using this data, which can
  result in corruption or use-after-free.

Instead of "fixing" this by disabling bottem halfs everywhere and adding
new locks/refcounting, this patch makes these assumptions valid again by
moving destruction back to process context. Since only the dev-&gt;qdisc
pointer is protected by RCU, but -&gt;enqueue and the qdisc tree are still
protected by dev-&gt;qdisc_lock, destruction of the tree can be performed
immediately and only the final free needs to happen in the rcu callback
to make sure dev_queue_xmit doesn't access already freed memory.

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[RTNETLINK]: Fix IFLA_ADDRESS handling.</title>
<updated>2006-11-19T23:21:04+00:00</updated>
<author>
<name>David Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2006-11-19T23:21:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=705cb93548ca3f5e552ea10832203373fb7b7917'/>
<id>705cb93548ca3f5e552ea10832203373fb7b7917</id>
<content type='text'>
The -&gt;set_mac_address handlers expect a pointer to a
sockaddr which contains the MAC address, whereas
IFLA_ADDRESS provides just the MAC address itself.

So whip up a sockaddr to wrap around the netlink
attribute for the -&gt;set_mac_address call.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The -&gt;set_mac_address handlers expect a pointer to a
sockaddr which contains the MAC address, whereas
IFLA_ADDRESS provides just the MAC address itself.

So whip up a sockaddr to wrap around the netlink
attribute for the -&gt;set_mac_address call.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix timer race in dst GC code</title>
<updated>2006-11-17T16:53:07+00:00</updated>
<author>
<name>Dmitry Mishin</name>
<email>dim@openvz.org</email>
</author>
<published>2006-11-17T16:53:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9e9c71470300baacb977ef0f332038691f8dc46a'/>
<id>9e9c71470300baacb977ef0f332038691f8dc46a</id>
<content type='text'>
Replace add_timer() by mod_timer() in dst_run_gc
in order to avoid BUG message.

   CPU1                            CPU2
dst_run_gc()  entered           dst_run_gc() entered
spin_lock(&amp;dst_lock)                   .....
del_timer(&amp;dst_gc_timer)         fail to get lock
   ....                         mod_timer() &lt;--- puts
                                             timer back
                                             to the list
add_timer(&amp;dst_gc_timer) &lt;--- BUG because timer is in list already.

Found during OpenVZ internal testing.

At first we thought that it is OpenVZ specific as we
added dst_run_gc(0) call in dst_dev_event(),
but as Alexey pointed to me it is possible to trigger
this condition in mainstream kernel.

F.e. timer has fired on CPU2, but the handler was preeempted
by an irq before dst_lock is tried.
Meanwhile, someone on CPU1 adds an entry to gc list and
starts the timer.
If CPU2 was preempted long enough, this timer can expire
simultaneously with resuming timer handler on CPU1, arriving
exactly to the situation described.

Signed-off-by: Dmitry Mishin &lt;dim@openvz.org&gt;
Signed-off-by: Kirill Korotaev &lt;dev@openvz.org&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Replace add_timer() by mod_timer() in dst_run_gc
in order to avoid BUG message.

   CPU1                            CPU2
dst_run_gc()  entered           dst_run_gc() entered
spin_lock(&amp;dst_lock)                   .....
del_timer(&amp;dst_gc_timer)         fail to get lock
   ....                         mod_timer() &lt;--- puts
                                             timer back
                                             to the list
add_timer(&amp;dst_gc_timer) &lt;--- BUG because timer is in list already.

Found during OpenVZ internal testing.

At first we thought that it is OpenVZ specific as we
added dst_run_gc(0) call in dst_dev_event(),
but as Alexey pointed to me it is possible to trigger
this condition in mainstream kernel.

F.e. timer has fired on CPU2, but the handler was preeempted
by an irq before dst_lock is tried.
Meanwhile, someone on CPU1 adds an entry to gc list and
starts the timer.
If CPU2 was preempted long enough, this timer can expire
simultaneously with resuming timer handler on CPU1, arriving
exactly to the situation described.

Signed-off-by: Dmitry Mishin &lt;dim@openvz.org&gt;
Signed-off-by: Kirill Korotaev &lt;dev@openvz.org&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Update frag_list in pskb_trim</title>
<updated>2006-11-10T23:15:10+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2006-11-10T23:15:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e1dc7abb2490bfe0b72f5f9ee6cf28001197ab5b'/>
<id>e1dc7abb2490bfe0b72f5f9ee6cf28001197ab5b</id>
<content type='text'>
When pskb_trim has to defer to ___pksb_trim to trim the frag_list part of
the packet, the frag_list is not updated to reflect the trimming.  This
will usually work fine until you hit something that uses the packet length
or tail from the frag_list.

Examples include esp_output and ip_fragment.

Another problem caused by this is that you can end up with a linear packet
with a frag_list attached.

It is possible to get away with this if we audit everything to make sure
that they always consult skb-&gt;len before going down onto frag_list.  In
fact we can do the samething for the paged part as well to avoid copying
the data area of the skb.  For now though, let's do the conservative fix
and update frag_list.

Many thanks to Marco Berizzi for helping me to track down this bug.

This 4-year old bug took 3 months to track down.  Marco was very patient
indeed :)

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When pskb_trim has to defer to ___pksb_trim to trim the frag_list part of
the packet, the frag_list is not updated to reflect the trimming.  This
will usually work fine until you hit something that uses the packet length
or tail from the frag_list.

Examples include esp_output and ip_fragment.

Another problem caused by this is that you can end up with a linear packet
with a frag_list attached.

It is possible to get away with this if we audit everything to make sure
that they always consult skb-&gt;len before going down onto frag_list.  In
fact we can do the samething for the paged part as well to avoid copying
the data area of the skb.  For now though, let's do the conservative fix
and update frag_list.

Many thanks to Marco Berizzi for helping me to track down this bug.

This 4-year old bug took 3 months to track down.  Marco was very patient
indeed :)

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: __alloc_pages() failures reported due to fragmentation</title>
<updated>2006-11-09T10:05:38+00:00</updated>
<author>
<name>Larry Woodman</name>
<email>lwoodman@redhat.com</email>
</author>
<published>2006-11-09T10:05:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4504530e9f7310af00f79807029659c32fda90a1'/>
<id>4504530e9f7310af00f79807029659c32fda90a1</id>
<content type='text'>
We have seen a couple of __alloc_pages() failures due to
fragmentation, there is plenty of free memory but no large order pages
available.  I think the problem is in sock_alloc_send_pskb(), the
gfp_mask includes __GFP_REPEAT but its never used/passed to the page
allocator.  Shouldnt the gfp_mask be passed to alloc_skb() ?

Signed-off-by: Larry Woodman &lt;lwoodman@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have seen a couple of __alloc_pages() failures due to
fragmentation, there is plenty of free memory but no large order pages
available.  I think the problem is in sock_alloc_send_pskb(), the
gfp_mask includes __GFP_REPEAT but its never used/passed to the page
allocator.  Shouldnt the gfp_mask be passed to alloc_skb() ?

Signed-off-by: Larry Woodman &lt;lwoodman@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Set truesize in pskb_copy</title>
<updated>2006-11-09T10:03:56+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2006-11-09T10:03:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5f61e927c7c47e696f7fe18ae3e7ea32a42e1d24'/>
<id>5f61e927c7c47e696f7fe18ae3e7ea32a42e1d24</id>
<content type='text'>
Since pskb_copy tacks on the non-linear bits from the original
skb, it needs to count them in the truesize field of the new skb.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since pskb_copy tacks on the non-linear bits from the original
skb, it needs to count them in the truesize field of the new skb.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[NET]: Add missing UFO initialisations</title>
<updated>2006-11-08T06:47:29+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2006-11-08T06:47:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=41dc00ec4d90588a7362075cfd0f0afae5d29ff4'/>
<id>41dc00ec4d90588a7362075cfd0f0afae5d29ff4</id>
<content type='text'>
This bug was unknowingly fixed the GSO patches (or rather, its effect was
unknown at the time).

Thanks to Marco Berizzi's persistence which is documented in the thread
"ipsec tunnel asymmetrical mtu", we now know that it can have highly
non-obvious symptoms.

What happens is that uninitialised uso_size fields can cause packets to
be incorrectly identified as UFO, which means that it does not get
fragmented even if it's over the MTU.

The fix is simple enough.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This bug was unknowingly fixed the GSO patches (or rather, its effect was
unknown at the time).

Thanks to Marco Berizzi's persistence which is documented in the thread
"ipsec tunnel asymmetrical mtu", we now know that it can have highly
non-obvious symptoms.

What happens is that uninitialised uso_size fields can cause packets to
be incorrectly identified as UFO, which means that it does not get
fragmented even if it's over the MTU.

The fix is simple enough.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PKTGEN]: Make sure skb-&gt;{nh,h} are initialized in fill_packet_ipv6() too.</title>
<updated>2006-09-06T17:35:53+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2006-09-06T17:35:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8b385946d47de12d8531c3e2abf69e5d5bff2720'/>
<id>8b385946d47de12d8531c3e2abf69e5d5bff2720</id>
<content type='text'>
Mirror the bug fix from fill_packet_ipv4()

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Mirror the bug fix from fill_packet_ipv4()

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PKTGEN]: Fix oops when used with balance-tlb bonding</title>
<updated>2006-09-06T17:34:53+00:00</updated>
<author>
<name>Chen-Li Tien</name>
<email>cltien@gmail.com</email>
</author>
<published>2006-09-06T17:34:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=da56aea0bddc155f786594720ab91e36421cc51e'/>
<id>da56aea0bddc155f786594720ab91e36421cc51e</id>
<content type='text'>
Signed-off-by: Chen-Li Tien &lt;cltien@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Chen-Li Tien &lt;cltien@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Adrian Bunk &lt;bunk@stusta.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
