<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/net/core/dev.c, branch v4.3-rc6</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>netpoll: Close race condition between poll_one_napi and napi_disable</title>
<updated>2015-09-23T21:32:50+00:00</updated>
<author>
<name>Neil Horman</name>
<email>nhorman@redhat.com</email>
</author>
<published>2015-09-23T18:57:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2d8bff12699abc3a9bf886bb0b79f44d94d81496'/>
<id>2d8bff12699abc3a9bf886bb0b79f44d94d81496</id>
<content type='text'>
Drivers might call napi_disable while not holding the napi instance poll_lock.
In those instances, its possible for a race condition to exist between
poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
the following may happen:

CPU0				CPU1
ndo_tx_timeout			napi_poll_dev
 napi_disable			 poll_one_napi
  test_and_set_bit (ret 0)
				  test_bit (ret 1)
   reset adapter		   napi_poll_routine

If the adapter gets a tx timeout without a napi instance scheduled, its possible
for the adapter to think it has exclusive access to the hardware  (as the napi
instance is now scheduled via the napi_disable call), while the netpoll code
thinks there is simply work to do.  The result is parallel hardware access
leading to corrupt data structures in the driver, and a crash.

Additionaly, there is another, more critical race between netpoll and
napi_disable.  The disabled napi state is actually identical to the scheduled
state for a given napi instance.  The implication being that, if a napi instance
is disabled, a netconsole instance would see the napi state of the device as
having been scheduled, and poll it, likely while the driver was dong something
requiring exclusive access.  In the case above, its fairly clear that not having
the rings in a state ready to be polled will cause any number of crashes.

The fix should be pretty easy.  netpoll uses its own bit to indicate that that
the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
We can just gate disabling on that bit as well as the sched bit.  That should
prevent netpoll from conducting a napi poll if we convert its set bit to a
test_and_set_bit operation to provide mutual exclusion

Change notes:
V2)
	Remove a trailing whtiespace
	Resubmit with proper subject prefix

V3)
	Clean up spacing nits

Signed-off-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: jmaxwell@redhat.com
Tested-by: jmaxwell@redhat.com
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Drivers might call napi_disable while not holding the napi instance poll_lock.
In those instances, its possible for a race condition to exist between
poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
the following may happen:

CPU0				CPU1
ndo_tx_timeout			napi_poll_dev
 napi_disable			 poll_one_napi
  test_and_set_bit (ret 0)
				  test_bit (ret 1)
   reset adapter		   napi_poll_routine

If the adapter gets a tx timeout without a napi instance scheduled, its possible
for the adapter to think it has exclusive access to the hardware  (as the napi
instance is now scheduled via the napi_disable call), while the netpoll code
thinks there is simply work to do.  The result is parallel hardware access
leading to corrupt data structures in the driver, and a crash.

Additionaly, there is another, more critical race between netpoll and
napi_disable.  The disabled napi state is actually identical to the scheduled
state for a given napi instance.  The implication being that, if a napi instance
is disabled, a netconsole instance would see the napi state of the device as
having been scheduled, and poll it, likely while the driver was dong something
requiring exclusive access.  In the case above, its fairly clear that not having
the rings in a state ready to be polled will cause any number of crashes.

The fix should be pretty easy.  netpoll uses its own bit to indicate that that
the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
We can just gate disabling on that bit as well as the sched bit.  That should
prevent netpoll from conducting a napi poll if we convert its set bit to a
test_and_set_bit operation to provide mutual exclusion

Change notes:
V2)
	Remove a trailing whtiespace
	Resubmit with proper subject prefix

V3)
	Clean up spacing nits

Signed-off-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: jmaxwell@redhat.com
Tested-by: jmaxwell@redhat.com
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: fix IFF_NO_QUEUE for drivers using alloc_netdev</title>
<updated>2015-08-28T00:14:29+00:00</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2015-08-27T19:21:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f84bb1eac0275283ccd76455e20f926e186ea8c8'/>
<id>f84bb1eac0275283ccd76455e20f926e186ea8c8</id>
<content type='text'>
Printing a warning in alloc_netdev_mqs() if tx_queue_len is zero and
IFF_NO_QUEUE not set is not appropriate since drivers may use one of the
alloc_netdev* macros instead of alloc_etherdev*, thereby not
intentionally leaving tx_queue_len uninitialized. Instead check here if
tx_queue_len is zero and set IFF_NO_QUEUE, so the value of tx_queue_len
can be ignored in net/sched_generic.c.

Fixes: 906470c ("net: warn if drivers set tx_queue_len = 0")
Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Printing a warning in alloc_netdev_mqs() if tx_queue_len is zero and
IFF_NO_QUEUE not set is not appropriate since drivers may use one of the
alloc_netdev* macros instead of alloc_etherdev*, thereby not
intentionally leaving tx_queue_len uninitialized. Instead check here if
tx_queue_len is zero and set IFF_NO_QUEUE, so the value of tx_queue_len
can be ignored in net/sched_generic.c.

Fixes: 906470c ("net: warn if drivers set tx_queue_len = 0")
Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: introduce change upper device notifier change info</title>
<updated>2015-08-27T23:28:34+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@mellanox.com</email>
</author>
<published>2015-08-27T07:31:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0e4ead9d7b3655d76371604abb9b0dcc4e79bb7d'/>
<id>0e4ead9d7b3655d76371604abb9b0dcc4e79bb7d</id>
<content type='text'>
Add info that is passed along with NETDEV_CHANGEUPPER event.

Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add info that is passed along with NETDEV_CHANGEUPPER event.

Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sched: consolidate tc_classify{,_compat}</title>
<updated>2015-08-27T21:18:48+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-08-26T21:00:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3b3ae880266d148bf73a573a766bc9b78c08d805'/>
<id>3b3ae880266d148bf73a573a766bc9b78c08d805</id>
<content type='text'>
For classifiers getting invoked via tc_classify(), we always need an
extra function call into tc_classify_compat(), as both are being
exported as symbols and tc_classify() itself doesn't do much except
handling of reclassifications when tp-&gt;classify() returned with
TC_ACT_RECLASSIFY.

CBQ and ATM are the only qdiscs that directly call into tc_classify_compat(),
all others use tc_classify(). When tc actions are being configured
out in the kernel, tc_classify() effectively does nothing besides
delegating.

We could spare this layer and consolidate both functions. pktgen on
single CPU constantly pushing skbs directly into the netif_receive_skb()
path with a dummy classifier on ingress qdisc attached, improves
slightly from 22.3Mpps to 23.1Mpps.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For classifiers getting invoked via tc_classify(), we always need an
extra function call into tc_classify_compat(), as both are being
exported as symbols and tc_classify() itself doesn't do much except
handling of reclassifications when tp-&gt;classify() returned with
TC_ACT_RECLASSIFY.

CBQ and ATM are the only qdiscs that directly call into tc_classify_compat(),
all others use tc_classify(). When tc actions are being configured
out in the kernel, tc_classify() effectively does nothing besides
delegating.

We could spare this layer and consolidate both functions. pktgen on
single CPU constantly pushing skbs directly into the netif_receive_skb()
path with a dummy classifier on ingress qdisc attached, improves
slightly from 22.3Mpps to 23.1Mpps.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: warn if drivers set tx_queue_len = 0</title>
<updated>2015-08-18T18:55:08+00:00</updated>
<author>
<name>Phil Sutter</name>
<email>phil@nwl.cc</email>
</author>
<published>2015-08-18T08:30:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=906470c19da771e638e7c8e16e16c31995b139cc'/>
<id>906470c19da771e638e7c8e16e16c31995b139cc</id>
<content type='text'>
Due to the introduction of IFF_NO_QUEUE, there is a better way for
drivers to indicate that no qdisc should be attached by default. Though,
the old convention can't be dropped since ignoring that setting would
break drivers still using it. Instead, add a warning so out-of-tree
driver maintainers get a chance to adjust their code before we finally
get rid of any special handling of tx_queue_len == 0.

Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Due to the introduction of IFF_NO_QUEUE, there is a better way for
drivers to indicate that no qdisc should be attached by default. Though,
the old convention can't be dropped since ignoring that setting would
break drivers still using it. Instead, add a warning so out-of-tree
driver maintainers get a chance to adjust their code before we finally
get rid of any special handling of tx_queue_len == 0.

Signed-off-by: Phil Sutter &lt;phil@nwl.cc&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dev: Spelling fix in comments</title>
<updated>2015-07-27T07:54:57+00:00</updated>
<author>
<name>subashab@codeaurora.org</name>
<email>subashab@codeaurora.org</email>
</author>
<published>2015-07-24T03:03:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b469139e81ca8265fb4797c007f8d3338f4191a5'/>
<id>b469139e81ca8265fb4797c007f8d3338f4191a5</id>
<content type='text'>
Fix the following typo
- unchainged -&gt; unchanged

Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix the following typo
- unchainged -&gt; unchanged

Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dst: Metadata destinations</title>
<updated>2015-07-21T17:39:05+00:00</updated>
<author>
<name>Thomas Graf</name>
<email>tgraf@suug.ch</email>
</author>
<published>2015-07-21T08:43:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f38a9eb1f77b296ff07e000823884a0f64d67b2a'/>
<id>f38a9eb1f77b296ff07e000823884a0f64d67b2a</id>
<content type='text'>
Introduces a new dst_metadata which enables to carry per packet metadata
between forwarding and processing elements via the skb-&gt;dst pointer.

The structure is set up to be a union. Thus, each separate type of
metadata requires its own dst instance. If demand arises to carry
multiple types of metadata concurrently, metadata dst entries can be
made stackable.

The metadata dst entry is refcnt'ed as expected for now but a non
reference counted use is possible if the reference is forced before
queueing the skb.

In order to allow allocating dsts with variable length, the existing
dst_alloc() is split into a dst_alloc() and dst_init() function. The
existing dst_init() function to initialize the subsystem is being
renamed to dst_subsys_init() to make it clear what is what.

The check before ip_route_input() is changed to ignore metadata dsts
and drop the dst inside the routing function thus allowing to interpret
metadata in a later commit.

Signed-off-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduces a new dst_metadata which enables to carry per packet metadata
between forwarding and processing elements via the skb-&gt;dst pointer.

The structure is set up to be a union. Thus, each separate type of
metadata requires its own dst instance. If demand arises to carry
multiple types of metadata concurrently, metadata dst entries can be
made stackable.

The metadata dst entry is refcnt'ed as expected for now but a non
reference counted use is possible if the reference is forced before
queueing the skb.

In order to allow allocating dsts with variable length, the existing
dst_alloc() is split into a dst_alloc() and dst_init() function. The
existing dst_init() function to initialize the subsystem is being
renamed to dst_subsys_init() to make it clear what is what.

The check before ip_route_input() is changed to ignore metadata dsts
and drop the dst inside the routing function thus allowing to interpret
metadata in a later commit.

Signed-off-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: don't reforward packets already forwarded by offload device</title>
<updated>2015-07-21T01:32:44+00:00</updated>
<author>
<name>Scott Feldman</name>
<email>sfeldma@gmail.com</email>
</author>
<published>2015-07-19T01:24:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0c4f691ff6791e55ac831666df0b49b1679c56e4'/>
<id>0c4f691ff6791e55ac831666df0b49b1679c56e4</id>
<content type='text'>
Just before queuing skb for xmit on port, check if skb has been marked by
switchdev port driver as already fordwarded by device.  If so, drop skb.  A
non-zero skb-&gt;offload_fwd_mark field is set by the switchdev port
driver/device on ingress to indicate the skb has already been forwarded by
the device to egress ports with matching dev-&gt;skb_mark.  The switchdev port
driver would assign a non-zero dev-&gt;offload_skb_mark for each device port
netdev during registration, for example.

Signed-off-by: Scott Feldman &lt;sfeldma@gmail.com&gt;
Acked-by: Jiri Pirko &lt;jiri@resnulli.us&gt;
Acked-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Acked-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Just before queuing skb for xmit on port, check if skb has been marked by
switchdev port driver as already fordwarded by device.  If so, drop skb.  A
non-zero skb-&gt;offload_fwd_mark field is set by the switchdev port
driver/device on ingress to indicate the skb has already been forwarded by
the device to egress ports with matching dev-&gt;skb_mark.  The switchdev port
driver would assign a non-zero dev-&gt;offload_skb_mark for each device port
netdev during registration, for example.

Signed-off-by: Scott Feldman &lt;sfeldma@gmail.com&gt;
Acked-by: Jiri Pirko &lt;jiri@resnulli.us&gt;
Acked-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Acked-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net core: Add protodown support.</title>
<updated>2015-07-16T04:39:40+00:00</updated>
<author>
<name>Anuradha Karuppiah</name>
<email>anuradhak@cumulusnetworks.com</email>
</author>
<published>2015-07-14T20:43:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d746d707a8b1421a4ba46b497cb5d59e20161645'/>
<id>d746d707a8b1421a4ba46b497cb5d59e20161645</id>
<content type='text'>
This patch introduces the proto_down flag that can be used by user space
applications to notify switch drivers that errors have been detected on the
device.

The switch driver can react to protodown notification by doing a phys down
on the associated switch port.

Signed-off-by: Anuradha Karuppiah &lt;anuradhak@cumulusnetworks.com&gt;
Signed-off-by: Andy Gospodarek &lt;gospo@cumulusnetworks.com&gt;
Signed-off-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: Wilson Kok &lt;wkok@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch introduces the proto_down flag that can be used by user space
applications to notify switch drivers that errors have been detected on the
device.

The switch driver can react to protodown notification by doing a phys down
on the associated switch port.

Signed-off-by: Anuradha Karuppiah &lt;anuradhak@cumulusnetworks.com&gt;
Signed-off-by: Andy Gospodarek &lt;gospo@cumulusnetworks.com&gt;
Signed-off-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: Wilson Kok &lt;wkok@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-07-14T00:28:09+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-07-14T00:28:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=638d3c63811e31b2745f7fdd568b38c8abcffe03'/>
<id>638d3c63811e31b2745f7fdd568b38c8abcffe03</id>
<content type='text'>
Conflicts:
	net/bridge/br_mdb.c

Minor conflict in br_mdb.c, in 'net' we added a memset of the
on-stack 'ip' variable whereas in 'net-next' we assign a new
member 'vid'.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	net/bridge/br_mdb.c

Minor conflict in br_mdb.c, in 'net' we added a memset of the
on-stack 'ip' variable whereas in 'net-next' we assign a new
member 'vid'.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
