<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/net/core/dev.c, branch v2.6.27.43</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>net: fix packet socket delivery in rx irq handler</title>
<updated>2009-02-06T22:00:36+00:00</updated>
<author>
<name>Patrick McHardy</name>
<email>kaber@trash.net</email>
</author>
<published>2008-11-04T22:49:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7d6c1e66f7bfcebb9d70f91731071f24a22e1a71'/>
<id>7d6c1e66f7bfcebb9d70f91731071f24a22e1a71</id>
<content type='text'>
commit 9b22ea560957de1484e6b3e8538f7eef202e3596 upstream.

The changes to deliver hardware accelerated VLAN packets to packet
sockets (commit bc1d0411) caused a warning for non-NAPI drivers.
The __vlan_hwaccel_rx() function is called directly from the drivers
RX function, for non-NAPI drivers that means its still in RX IRQ
context:

[   27.779463] ------------[ cut here ]------------
[   27.779509] WARNING: at kernel/softirq.c:136 local_bh_enable+0x37/0x81()
...
[   27.782520]  [&lt;c0264755&gt;] netif_nit_deliver+0x5b/0x75
[   27.782590]  [&lt;c02bba83&gt;] __vlan_hwaccel_rx+0x79/0x162
[   27.782664]  [&lt;f8851c1d&gt;] atl1_intr+0x9a9/0xa7c [atl1]
[   27.782738]  [&lt;c0155b17&gt;] handle_IRQ_event+0x23/0x51
[   27.782808]  [&lt;c015692e&gt;] handle_edge_irq+0xc2/0x102
[   27.782878]  [&lt;c0105fd5&gt;] do_IRQ+0x4d/0x64

Split hardware accelerated VLAN reception into two parts to fix this:

- __vlan_hwaccel_rx just stores the VLAN TCI and performs the VLAN
  device lookup, then calls netif_receive_skb()/netif_rx()

- vlan_hwaccel_do_receive(), which is invoked by netif_receive_skb()
  in softirq context, performs the real reception and delivery to
  packet sockets.

Reported-and-tested-by: Ramon Casellas &lt;ramon.casellas@cttc.es&gt;
Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Jesse Brandeburg &lt;jesse.brandeburg@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 9b22ea560957de1484e6b3e8538f7eef202e3596 upstream.

The changes to deliver hardware accelerated VLAN packets to packet
sockets (commit bc1d0411) caused a warning for non-NAPI drivers.
The __vlan_hwaccel_rx() function is called directly from the drivers
RX function, for non-NAPI drivers that means its still in RX IRQ
context:

[   27.779463] ------------[ cut here ]------------
[   27.779509] WARNING: at kernel/softirq.c:136 local_bh_enable+0x37/0x81()
...
[   27.782520]  [&lt;c0264755&gt;] netif_nit_deliver+0x5b/0x75
[   27.782590]  [&lt;c02bba83&gt;] __vlan_hwaccel_rx+0x79/0x162
[   27.782664]  [&lt;f8851c1d&gt;] atl1_intr+0x9a9/0xa7c [atl1]
[   27.782738]  [&lt;c0155b17&gt;] handle_IRQ_event+0x23/0x51
[   27.782808]  [&lt;c015692e&gt;] handle_edge_irq+0xc2/0x102
[   27.782878]  [&lt;c0105fd5&gt;] do_IRQ+0x4d/0x64

Split hardware accelerated VLAN reception into two parts to fix this:

- __vlan_hwaccel_rx just stores the VLAN TCI and performs the VLAN
  device lookup, then calls netif_receive_skb()/netif_rx()

- vlan_hwaccel_do_receive(), which is invoked by netif_receive_skb()
  in softirq context, performs the real reception and delivery to
  packet sockets.

Reported-and-tested-by: Ramon Casellas &lt;ramon.casellas@cttc.es&gt;
Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Jesse Brandeburg &lt;jesse.brandeburg@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>net: eliminate warning from NETIF_F_UFO on bridge</title>
<updated>2008-12-18T17:13:37+00:00</updated>
<author>
<name>Stephen Hemminger</name>
<email>shemminger@vyatta.com</email>
</author>
<published>2008-12-12T18:27:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d79c98afbdd27d20777d9b7558240db66d69ca37'/>
<id>d79c98afbdd27d20777d9b7558240db66d69ca37</id>
<content type='text'>
Based on commit b63365a2d60268a3988285d6c3c6003d7066f93a upstream, but
drastically cut down for 2.6.27.y

The bridge device always causes a warning because when it is first created
it has the no checksum flag set along with all the segmentation/fragmentation
offload bits.  The code in register_netdevice incorrectly checks for only
hardware checksum bit and ignores no checksum bit.

Similar code is already in 2.6.28:
   commit b63365a2d60268a3988285d6c3c6003d7066f93a
   net: Fix disjunct computation of netdev features

Signed-off-by: Stephen Hemminger &lt;shemminger@vyatta.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Based on commit b63365a2d60268a3988285d6c3c6003d7066f93a upstream, but
drastically cut down for 2.6.27.y

The bridge device always causes a warning because when it is first created
it has the no checksum flag set along with all the segmentation/fragmentation
offload bits.  The code in register_netdevice incorrectly checks for only
hardware checksum bit and ignores no checksum bit.

Similar code is already in 2.6.28:
   commit b63365a2d60268a3988285d6c3c6003d7066f93a
   net: Fix disjunct computation of netdev features

Signed-off-by: Stephen Hemminger &lt;shemminger@vyatta.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>net: Fix netdev_run_todo dead-lock</title>
<updated>2008-10-07T22:50:03+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2008-10-07T22:50:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=58ec3b4db9eb5a28e3aec5f407a54e28f7039c19'/>
<id>58ec3b4db9eb5a28e3aec5f407a54e28f7039c19</id>
<content type='text'>
Benjamin Thery tracked down a bug that explains many instances
of the error

unregister_netdevice: waiting for %s to become free. Usage count = %d

It turns out that netdev_run_todo can dead-lock with itself if
a second instance of it is run in a thread that will then free
a reference to the device waited on by the first instance.

The problem is really quite silly.  We were trying to create
parallelism where none was required.  As netdev_run_todo always
follows a RTNL section, and that todo tasks can only be added
with the RTNL held, by definition you should only need to wait
for the very ones that you've added and be done with it.

There is no need for a second mutex or spinlock.

This is exactly what the following patch does.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Benjamin Thery tracked down a bug that explains many instances
of the error

unregister_netdevice: waiting for %s to become free. Usage count = %d

It turns out that netdev_run_todo can dead-lock with itself if
a second instance of it is run in a thread that will then free
a reference to the device waited on by the first instance.

The problem is really quite silly.  We were trying to create
parallelism where none was required.  As netdev_run_todo always
follows a RTNL section, and that todo tasks can only be added
with the RTNL held, by definition you should only need to wait
for the very ones that you've added and be done with it.

There is no need for a second mutex or spinlock.

This is exactly what the following patch does.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: only invoke dev-&gt;change_rx_flags when device is UP</title>
<updated>2008-10-07T22:26:48+00:00</updated>
<author>
<name>Patrick McHardy</name>
<email>kaber@trash.net</email>
</author>
<published>2008-10-07T22:26:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b6c40d68ff6498b7f63ddf97cf0aa818d748dee7'/>
<id>b6c40d68ff6498b7f63ddf97cf0aa818d748dee7</id>
<content type='text'>
Jesper Dangaard Brouer &lt;hawk@comx.dk&gt; reported a bug when setting a VLAN
device down that is in promiscous mode:

When the VLAN device is set down, the promiscous count on the real
device is decremented by one by vlan_dev_stop(). When removing the
promiscous flag from the VLAN device afterwards, the promiscous
count on the real device is decremented a second time by the
vlan_change_rx_flags() callback.

The root cause for this is that the -&gt;change_rx_flags() callback is
invoked while the device is down. The synchronization is meant to mirror
the behaviour of the -&gt;set_rx_mode callbacks, meaning the -&gt;open function
is responsible for doing a full sync on open, the -&gt;close() function is
responsible for doing full cleanup on -&gt;stop() and -&gt;change_rx_flags()
is meant to do incremental changes while the device is UP.

Only invoke -&gt;change_rx_flags() while the device is UP to provide the
intended behaviour.

Tested-by: Jesper Dangaard Brouer &lt;jdb@comx.dk&gt;

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Jesper Dangaard Brouer &lt;hawk@comx.dk&gt; reported a bug when setting a VLAN
device down that is in promiscous mode:

When the VLAN device is set down, the promiscous count on the real
device is decremented by one by vlan_dev_stop(). When removing the
promiscous flag from the VLAN device afterwards, the promiscous
count on the real device is decremented a second time by the
vlan_change_rx_flags() callback.

The root cause for this is that the -&gt;change_rx_flags() callback is
invoked while the device is down. The synchronization is meant to mirror
the behaviour of the -&gt;set_rx_mode callbacks, meaning the -&gt;open function
is responsible for doing a full sync on open, the -&gt;close() function is
responsible for doing full cleanup on -&gt;stop() and -&gt;change_rx_flags()
is meant to do incremental changes while the device is UP.

Only invoke -&gt;change_rx_flags() while the device is UP to provide the
intended behaviour.

Tested-by: Jesper Dangaard Brouer &lt;jdb@comx.dk&gt;

Signed-off-by: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netdev: simple_tx_hash shouldn't hash inside fragments</title>
<updated>2008-09-21T05:05:50+00:00</updated>
<author>
<name>Alexander Duyck</name>
<email>alexander.h.duyck@intel.com</email>
</author>
<published>2008-09-21T05:05:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ad55dcaff0e34269f86975ce2ea0da22e9eb74a1'/>
<id>ad55dcaff0e34269f86975ce2ea0da22e9eb74a1</id>
<content type='text'>
Currently simple_tx_hash is hashing inside of udp fragments.  As a result
packets are getting getting sent to all queues when they shouldn't be.
This causes a serious performance regression which can be seen by sending
UDP frames larger than mtu on multiqueue devices.  This change will make
it so that fragments are hashed only as IP datagrams w/o any protocol
information.

Signed-off-by: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently simple_tx_hash is hashing inside of udp fragments.  As a result
packets are getting getting sent to all queues when they shouldn't be.
This causes a serious performance regression which can be seen by sending
UDP frames larger than mtu on multiqueue devices.  This change will make
it so that fragments are hashed only as IP datagrams w/o any protocol
information.

Signed-off-by: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pkt_sched: Fix qdisc state in net_tx_action()</title>
<updated>2008-09-08T01:41:21+00:00</updated>
<author>
<name>Jarek Poplawski</name>
<email>jarkao2@gmail.com</email>
</author>
<published>2008-09-08T01:41:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e8a83e10d7dfe5d0841062780769b30f65417e15'/>
<id>e8a83e10d7dfe5d0841062780769b30f65417e15</id>
<content type='text'>
net_tx_action() can skip __QDISC_STATE_SCHED bit clearing while qdisc
is neither ran nor rescheduled, which may cause endless loop in
dev_deactivate().

Reported-by: Denys Fedoryshchenko &lt;denys@visp.net.lb&gt;
Tested-by: Denys Fedoryshchenko &lt;denys@visp.net.lb&gt;
Signed-off-by: Jarek Poplawski &lt;jarkao2@gmail.com&gt;
Acked-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
net_tx_action() can skip __QDISC_STATE_SCHED bit clearing while qdisc
is neither ran nor rescheduled, which may cause endless loop in
dev_deactivate().

Reported-by: Denys Fedoryshchenko &lt;denys@visp.net.lb&gt;
Tested-by: Denys Fedoryshchenko &lt;denys@visp.net.lb&gt;
Signed-off-by: Jarek Poplawski &lt;jarkao2@gmail.com&gt;
Acked-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pkt_sched: Prevent livelock in TX queue running.</title>
<updated>2008-08-19T11:00:36+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2008-08-19T11:00:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=195648bbc5ae0848e82f771ecf4cd7497054c212'/>
<id>195648bbc5ae0848e82f771ecf4cd7497054c212</id>
<content type='text'>
If dev_deactivate() is trying to quiesce the queue, it
is theoretically possible for another cpu to livelock
trying to process that queue.  This happens because
dev_deactivate() grabs the queue spinlock as it checks
the queue state, whereas net_tx_action() does a trylock
and reschedules the qdisc if it hits the lock.

This breaks the livelock by adding a check on
__QDISC_STATE_DEACTIVATED to net_tx_action() when
the trylock fails.

Based upon feedback from Herbert Xu and Jarek Poplawski.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If dev_deactivate() is trying to quiesce the queue, it
is theoretically possible for another cpu to livelock
trying to process that queue.  This happens because
dev_deactivate() grabs the queue spinlock as it checks
the queue state, whereas net_tx_action() does a trylock
and reschedules the qdisc if it hits the lock.

This breaks the livelock by adding a check on
__QDISC_STATE_DEACTIVATED to net_tx_action() when
the trylock fails.

Based upon feedback from Herbert Xu and Jarek Poplawski.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pkt_sched: Fix missed RCU unlock in dev_queue_xmit()</title>
<updated>2008-08-18T06:37:16+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2008-08-18T06:37:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=96d203169d1d851ac1468f7d4459a09581be364c'/>
<id>96d203169d1d851ac1468f7d4459a09581be364c</id>
<content type='text'>
Noticed by Jarek Poplawski.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Noticed by Jarek Poplawski.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Change handling of the __QDISC_STATE_SCHED flag in net_tx_action().</title>
<updated>2008-08-18T04:54:43+00:00</updated>
<author>
<name>Jarek Poplawski</name>
<email>jarkao2@gmail.com</email>
</author>
<published>2008-08-18T04:54:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=def82a1db1fdc4f861c77009e2ee86870c3743b0'/>
<id>def82a1db1fdc4f861c77009e2ee86870c3743b0</id>
<content type='text'>
Change handling of the __QDISC_STATE_SCHED flag in net_tx_action() to
enable proper control in dev_deactivate(). Now, if this flag is seen
as unset under root_lock means a qdisc can't be netif_scheduled.

Signed-off-by: Jarek Poplawski &lt;jarkao2@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change handling of the __QDISC_STATE_SCHED flag in net_tx_action() to
enable proper control in dev_deactivate(). Now, if this flag is seen
as unset under root_lock means a qdisc can't be netif_scheduled.

Signed-off-by: Jarek Poplawski &lt;jarkao2@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>pkt_sched: Add 'deactivated' state.</title>
<updated>2008-08-18T04:51:03+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2008-08-18T04:51:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a9312ae89324438b0edc554eb36c3ec6bf927d04'/>
<id>a9312ae89324438b0edc554eb36c3ec6bf927d04</id>
<content type='text'>
This new state lets dev_deactivate() mark a qdisc as having been
deactivated.

dev_queue_xmit() and ing_filter() check for this bit and do not
try to process the qdisc if the bit is set.

dev_deactivate() polls the qdisc after setting the bit, waiting
for both __QDISC_STATE_RUNNING and __QDISC_STATE_SCHED to clear.

This isn't perfect yet, but subsequent changesets will make it so.
This part is just one piece of the puzzle.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This new state lets dev_deactivate() mark a qdisc as having been
deactivated.

dev_queue_xmit() and ing_filter() check for this bit and do not
try to process the qdisc if the bit is set.

dev_deactivate() polls the qdisc after setting the bit, waiting
for both __QDISC_STATE_RUNNING and __QDISC_STATE_SCHED to clear.

This isn't perfect yet, but subsequent changesets will make it so.
This part is just one piece of the puzzle.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
