<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/rtnetlink.h, branch v4.6-rc6</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>net, sched: add clsact qdisc</title>
<updated>2016-01-11T03:13:15+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-01-07T21:29:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1f211a1b929c804100e138c5d3d656992cfd5622'/>
<id>1f211a1b929c804100e138c5d3d656992cfd5622</id>
<content type='text'>
This work adds a generalization of the ingress qdisc as a qdisc holding
only classifiers. The clsact qdisc works on ingress, but also on egress.
In both cases, it's execution happens without taking the qdisc lock, and
the main difference for the egress part compared to prior version of [1]
is that this can be applied with _any_ underlying real egress qdisc (also
classless ones).

Besides solving the use-case of [1], that is, allowing for more programmability
on assigning skb-&gt;priority for the mqprio case that is supported by most
popular 10G+ NICs, it also opens up a lot more flexibility for other tc
applications. The main work on classification can already be done at clsact
egress time if the use-case allows and state stored for later retrieval
f.e. again in skb-&gt;priority with major/minors (which is checked by most
classful qdiscs before consulting tc_classify()) and/or in other skb fields
like skb-&gt;tc_index for some light-weight post-processing to get to the
eventual classid in case of a classful qdisc. Another use case is that
the clsact egress part allows to have a central egress counterpart to
the ingress classifiers, so that classifiers can easily share state (e.g.
in cls_bpf via eBPF maps) for ingress and egress.

Currently, default setups like mq + pfifo_fast would require for this to
use, for example, prio qdisc instead (to get a tc_classify() run) and to
duplicate the egress classifier for each queue. With clsact, it allows
for leaving the setup as is, it can additionally assign skb-&gt;priority to
put the skb in one of pfifo_fast's bands and it can share state with maps.
Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
w/o the need to perform a skb_dst_force() to hold on to it any longer. In
lwt case, we can also use this facility to setup dst metadata via cls_bpf
(bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
that (case of IFF_NO_QUEUE devices, for example).

The realization can be done without any changes to the scheduler core
framework. All it takes is that we have two a-priori defined minors/child
classes, where we can mux between ingress and egress classifier list
(dev-&gt;ingress_cl_list and dev-&gt;egress_cl_list, latter stored close to
dev-&gt;_tx to avoid extra cacheline miss for moderate loads). The egress
part is a bit similar modelled to handle_ing() and patched to a noop in
case the functionality is not used. Both handlers are now called
sch_handle_ingress() and sch_handle_egress(), code sharing among the two
doesn't seem practical as there are various minor differences in both
paths, so that making them conditional in a single handler would rather
slow things down.

Full compatibility to ingress qdisc is provided as well. Since both
piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
per netdevice, and thus ingress qdisc specific behaviour can be retained
for user space. This means, either a user does 'tc qdisc add dev foo ingress'
and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
alternative, where both, ingress and egress classifier can be configured
as in the below example. ingress qdisc supports attaching classifier to any
minor number whereas clsact has two fixed minors for muxing between the
lists, therefore to not break user space setups, they are better done as
two separate qdiscs.

I decided to extend the sch_ingress module with clsact functionality so
that commonly used code can be reused, the module is being aliased with
sch_clsact so that it can be auto-loaded properly. Alternative would have been
to add a flag when initializing ingress to alter its behaviour plus aliasing
to a different name (as it's more than just ingress). However, the first would
end up, based on the flag, choosing the new/old behaviour by calling different
function implementations to handle each anyway, the latter would require to
register ingress qdisc once again under different alias. So, this really begs
to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
by its own that share callbacks used by both.

Example, adding qdisc:

   # tc qdisc add dev foo clsact
   # tc qdisc show dev foo
   qdisc mq 0: root
   qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
   qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
   qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
   qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
   qdisc clsact ffff: parent ffff:fff1

Adding filters (deleting, etc works analogous by specifying ingress/egress):

   # tc filter add dev foo ingress bpf da obj bar.o sec ingress
   # tc filter add dev foo egress  bpf da obj bar.o sec egress
   # tc filter show dev foo ingress
   filter protocol all pref 49152 bpf
   filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
   # tc filter show dev foo egress
   filter protocol all pref 49152 bpf
   filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action

A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
show an empty list for clsact. Either using the parent names (ingress/egress)
or specifying the full major/minor will then show the related filter lists.

Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.

  [1] http://patchwork.ozlabs.org/patch/512949/

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This work adds a generalization of the ingress qdisc as a qdisc holding
only classifiers. The clsact qdisc works on ingress, but also on egress.
In both cases, it's execution happens without taking the qdisc lock, and
the main difference for the egress part compared to prior version of [1]
is that this can be applied with _any_ underlying real egress qdisc (also
classless ones).

Besides solving the use-case of [1], that is, allowing for more programmability
on assigning skb-&gt;priority for the mqprio case that is supported by most
popular 10G+ NICs, it also opens up a lot more flexibility for other tc
applications. The main work on classification can already be done at clsact
egress time if the use-case allows and state stored for later retrieval
f.e. again in skb-&gt;priority with major/minors (which is checked by most
classful qdiscs before consulting tc_classify()) and/or in other skb fields
like skb-&gt;tc_index for some light-weight post-processing to get to the
eventual classid in case of a classful qdisc. Another use case is that
the clsact egress part allows to have a central egress counterpart to
the ingress classifiers, so that classifiers can easily share state (e.g.
in cls_bpf via eBPF maps) for ingress and egress.

Currently, default setups like mq + pfifo_fast would require for this to
use, for example, prio qdisc instead (to get a tc_classify() run) and to
duplicate the egress classifier for each queue. With clsact, it allows
for leaving the setup as is, it can additionally assign skb-&gt;priority to
put the skb in one of pfifo_fast's bands and it can share state with maps.
Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
w/o the need to perform a skb_dst_force() to hold on to it any longer. In
lwt case, we can also use this facility to setup dst metadata via cls_bpf
(bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
that (case of IFF_NO_QUEUE devices, for example).

The realization can be done without any changes to the scheduler core
framework. All it takes is that we have two a-priori defined minors/child
classes, where we can mux between ingress and egress classifier list
(dev-&gt;ingress_cl_list and dev-&gt;egress_cl_list, latter stored close to
dev-&gt;_tx to avoid extra cacheline miss for moderate loads). The egress
part is a bit similar modelled to handle_ing() and patched to a noop in
case the functionality is not used. Both handlers are now called
sch_handle_ingress() and sch_handle_egress(), code sharing among the two
doesn't seem practical as there are various minor differences in both
paths, so that making them conditional in a single handler would rather
slow things down.

Full compatibility to ingress qdisc is provided as well. Since both
piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
per netdevice, and thus ingress qdisc specific behaviour can be retained
for user space. This means, either a user does 'tc qdisc add dev foo ingress'
and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
alternative, where both, ingress and egress classifier can be configured
as in the below example. ingress qdisc supports attaching classifier to any
minor number whereas clsact has two fixed minors for muxing between the
lists, therefore to not break user space setups, they are better done as
two separate qdiscs.

I decided to extend the sch_ingress module with clsact functionality so
that commonly used code can be reused, the module is being aliased with
sch_clsact so that it can be auto-loaded properly. Alternative would have been
to add a flag when initializing ingress to alter its behaviour plus aliasing
to a different name (as it's more than just ingress). However, the first would
end up, based on the flag, choosing the new/old behaviour by calling different
function implementations to handle each anyway, the latter would require to
register ingress qdisc once again under different alias. So, this really begs
to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
by its own that share callbacks used by both.

Example, adding qdisc:

   # tc qdisc add dev foo clsact
   # tc qdisc show dev foo
   qdisc mq 0: root
   qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
   qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
   qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
   qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
   qdisc clsact ffff: parent ffff:fff1

Adding filters (deleting, etc works analogous by specifying ingress/egress):

   # tc filter add dev foo ingress bpf da obj bar.o sec ingress
   # tc filter add dev foo egress  bpf da obj bar.o sec egress
   # tc filter show dev foo ingress
   filter protocol all pref 49152 bpf
   filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
   # tc filter show dev foo egress
   filter protocol all pref 49152 bpf
   filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action

A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
show an empty list for clsact. Either using the parent names (ingress/egress)
or specifying the full major/minor will then show the related filter lists.

Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.

  [1] http://patchwork.ozlabs.org/patch/512949/

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/core: lockdep_rtnl_is_held can be boolean</title>
<updated>2015-10-09T14:49:06+00:00</updated>
<author>
<name>Yaowei Bai</name>
<email>bywxiaobai@163.com</email>
</author>
<published>2015-10-08T13:29:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0cbf334376d5e82d7a2f5cd234ca4f5d0843f3ea'/>
<id>0cbf334376d5e82d7a2f5cd234ca4f5d0843f3ea</id>
<content type='text'>
This patch makes lockdep_rtnl_is_held return bool due to this
particular function only using either one or zero as its return
value.

In another patch lockdep_is_held is also made return bool.

No functional change.

Signed-off-by: Yaowei Bai &lt;bywxiaobai@163.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch makes lockdep_rtnl_is_held return bool due to this
particular function only using either one or zero as its return
value.

In another patch lockdep_is_held is also made return bool.

No functional change.

Signed-off-by: Yaowei Bai &lt;bywxiaobai@163.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>switchdev; add VLAN support for port's bridge_getlink</title>
<updated>2015-06-23T13:56:18+00:00</updated>
<author>
<name>Scott Feldman</name>
<email>sfeldma@gmail.com</email>
</author>
<published>2015-06-22T07:27:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7d4f8d871ab15bd50a5771382ca2c9355b38d73c'/>
<id>7d4f8d871ab15bd50a5771382ca2c9355b38d73c</id>
<content type='text'>
One more missing piece of the puzzle.  Add vlan dump support to switchdev
port's bridge_getlink.  iproute2 "bridge vlan show" cmd already knows how
to show the vlans installed on the bridge and the device , but (until now)
no one implemented the port vlan part of the netlink PF_BRIDGE:RTM_GETLINK
msg.  Before this patch, "bridge vlan show":

	$ bridge -c vlan show
	port    vlan ids
	sw1p1    30-34			&lt;&lt; bridge side vlans
		 57

	sw1p1				&lt;&lt; device side vlans (missing)

	sw1p2    57

	sw1p2

	sw1p3

	sw1p4

	br0     None

(When the port is bridged, the output repeats the vlan list for the vlans
on the bridge side of the port and the vlans on the device side of the
port.  The listing above show no vlans for the device side even though they
are installed).

After this patch:

	$ bridge -c vlan show
	port    vlan ids
	sw1p1    30-34			&lt;&lt; bridge side vlan
		 57

	sw1p1    30-34			&lt;&lt; device side vlans
		 57
		 3840 PVID

	sw1p2    57

	sw1p2    57
		 3840 PVID

	sw1p3    3842 PVID

	sw1p4    3843 PVID

	br0     None

I re-used ndo_dflt_bridge_getlink to add vlan fill call-back func.
switchdev support adds an obj dump for VLAN objects, using the same
call-back scheme as FDB dump.  Support included for both compressed and
un-compressed vlan dumps.

Signed-off-by: Scott Feldman &lt;sfeldma@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
One more missing piece of the puzzle.  Add vlan dump support to switchdev
port's bridge_getlink.  iproute2 "bridge vlan show" cmd already knows how
to show the vlans installed on the bridge and the device , but (until now)
no one implemented the port vlan part of the netlink PF_BRIDGE:RTM_GETLINK
msg.  Before this patch, "bridge vlan show":

	$ bridge -c vlan show
	port    vlan ids
	sw1p1    30-34			&lt;&lt; bridge side vlans
		 57

	sw1p1				&lt;&lt; device side vlans (missing)

	sw1p2    57

	sw1p2

	sw1p3

	sw1p4

	br0     None

(When the port is bridged, the output repeats the vlan list for the vlans
on the bridge side of the port and the vlans on the device side of the
port.  The listing above show no vlans for the device side even though they
are installed).

After this patch:

	$ bridge -c vlan show
	port    vlan ids
	sw1p1    30-34			&lt;&lt; bridge side vlan
		 57

	sw1p1    30-34			&lt;&lt; device side vlans
		 57
		 3840 PVID

	sw1p2    57

	sw1p2    57
		 3840 PVID

	sw1p3    3842 PVID

	sw1p4    3843 PVID

	br0     None

I re-used ndo_dflt_bridge_getlink to add vlan fill call-back func.
switchdev support adds an obj dump for VLAN objects, using the same
call-back scheme as FDB dump.  Support included for both compressed and
un-compressed vlan dumps.

Signed-off-by: Scott Feldman &lt;sfeldma@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: add CONFIG_NET_INGRESS to enable ingress filtering</title>
<updated>2015-05-14T05:10:05+00:00</updated>
<author>
<name>Pablo Neira</name>
<email>pablo@netfilter.org</email>
</author>
<published>2015-05-13T16:19:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1cf51900f8545b358b5deaacfda348d990f671db'/>
<id>1cf51900f8545b358b5deaacfda348d990f671db</id>
<content type='text'>
This new config switch enables the ingress filtering infrastructure that is
controlled through the ingress_needed static key. This prepares the
introduction of the Netfilter ingress hook that resides under this unique
static key.

Note that CONFIG_SCH_INGRESS automatically selects this, that should be no
problem since this also depends on CONFIG_NET_CLS_ACT.

Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This new config switch enables the ingress filtering infrastructure that is
controlled through the ingress_needed static key. This prepares the
introduction of the Netfilter ingress hook that resides under this unique
static key.

Note that CONFIG_SCH_INGRESS automatically selects this, that should be no
problem since this also depends on CONFIG_NET_CLS_ACT.

Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: kill useless net_*_ingress_queue() definitions when NET_CLS_ACT is unset</title>
<updated>2015-05-13T19:44:28+00:00</updated>
<author>
<name>Pablo Neira</name>
<email>pablo@netfilter.org</email>
</author>
<published>2015-05-12T18:28:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f0b5e8a42f37a880b8467e59dc814f4f21581d3d'/>
<id>f0b5e8a42f37a880b8467e59dc814f4f21581d3d</id>
<content type='text'>
This fixes 4577139b2dabf589 ("net: use jump label patching for ingress qdisc in
__netif_receive_skb_core").

The only client of this is sch_ingress and it depends on NET_CLS_ACT. So
there is no way these definition can be of any help.

Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This fixes 4577139b2dabf589 ("net: use jump label patching for ingress qdisc in
__netif_receive_skb_core").

The only client of this is sch_ingress and it depends on NET_CLS_ACT. So
there is no way these definition can be of any help.

Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bridge/nl: remove wrong use of NLM_F_MULTI</title>
<updated>2015-04-29T18:59:16+00:00</updated>
<author>
<name>Nicolas Dichtel</name>
<email>nicolas.dichtel@6wind.com</email>
</author>
<published>2015-04-28T16:33:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=46c264daaaa569e24f8aba877d0fd8167c42a9a4'/>
<id>46c264daaaa569e24f8aba877d0fd8167c42a9a4</id>
<content type='text'>
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
it is sent only at the end of a dump.

Libraries like libnl will wait forever for NLMSG_DONE.

Fixes: e5a55a898720 ("net: create generic bridge ops")
Fixes: 815cccbf10b2 ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf")
CC: John Fastabend &lt;john.r.fastabend@intel.com&gt;
CC: Sathya Perla &lt;sathya.perla@emulex.com&gt;
CC: Subbu Seetharaman &lt;subbu.seetharaman@emulex.com&gt;
CC: Ajit Khaparde &lt;ajit.khaparde@emulex.com&gt;
CC: Jeff Kirsher &lt;jeffrey.t.kirsher@intel.com&gt;
CC: intel-wired-lan@lists.osuosl.org
CC: Jiri Pirko &lt;jiri@resnulli.us&gt;
CC: Scott Feldman &lt;sfeldma@gmail.com&gt;
CC: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
CC: bridge@lists.linux-foundation.org
Signed-off-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
it is sent only at the end of a dump.

Libraries like libnl will wait forever for NLMSG_DONE.

Fixes: e5a55a898720 ("net: create generic bridge ops")
Fixes: 815cccbf10b2 ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf")
CC: John Fastabend &lt;john.r.fastabend@intel.com&gt;
CC: Sathya Perla &lt;sathya.perla@emulex.com&gt;
CC: Subbu Seetharaman &lt;subbu.seetharaman@emulex.com&gt;
CC: Ajit Khaparde &lt;ajit.khaparde@emulex.com&gt;
CC: Jeff Kirsher &lt;jeffrey.t.kirsher@intel.com&gt;
CC: intel-wired-lan@lists.osuosl.org
CC: Jiri Pirko &lt;jiri@resnulli.us&gt;
CC: Scott Feldman &lt;sfeldma@gmail.com&gt;
CC: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
CC: bridge@lists.linux-foundation.org
Signed-off-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: use jump label patching for ingress qdisc in __netif_receive_skb_core</title>
<updated>2015-04-13T17:34:40+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-04-10T21:07:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4577139b2dabf58973d59d157aae4ddd3bde863a'/>
<id>4577139b2dabf58973d59d157aae4ddd3bde863a</id>
<content type='text'>
Even if we make use of classifier and actions from the egress
path, we're going into handle_ing() executing additional code
on a per-packet cost for ingress qdisc, just to realize that
nothing is attached on ingress.

Instead, this can just be blinded out as a no-op entirely with
the use of a static key. On input fast-path, we already make
use of static keys in various places, e.g. skb time stamping,
in RPS, etc. It makes sense to not waste time when we're assured
that no ingress qdisc is attached anywhere.

Enabling/disabling of that code path is being done via two
helpers, namely net_{inc,dec}_ingress_queue(), that are being
invoked under RTNL mutex when a ingress qdisc is being either
initialized or destructed.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Even if we make use of classifier and actions from the egress
path, we're going into handle_ing() executing additional code
on a per-packet cost for ingress qdisc, just to realize that
nothing is attached on ingress.

Instead, this can just be blinded out as a no-op entirely with
the use of a static key. On input fast-path, we already make
use of static keys in various places, e.g. skb time stamping,
in RPS, etc. It makes sense to not waste time when we're assured
that no ingress qdisc is attached anywhere.

Enabling/disabling of that code path is being done via two
helpers, namely net_{inc,dec}_ingress_queue(), that are being
invoked under RTNL mutex when a ingress qdisc is being either
initialized or destructed.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rtnetlink: delay RTM_DELLINK notification until after ndo_uninit()</title>
<updated>2014-12-09T18:36:57+00:00</updated>
<author>
<name>Mahesh Bandewar</name>
<email>maheshb@google.com</email>
</author>
<published>2014-12-03T21:46:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=395eea6ccf2b253f81b4718ffbcae67d36fe2e69'/>
<id>395eea6ccf2b253f81b4718ffbcae67d36fe2e69</id>
<content type='text'>
The commit 56bfa7ee7c ("unregister_netdevice : move RTM_DELLINK to
until after ndo_uninit") tried to do this ealier but while doing so
it created a problem. Unfortunately the delayed rtmsg_ifinfo() also
delayed call to fill_info(). So this translated into asking driver
to remove private state and then query it's private state. This
could have catastropic consequences.

This change breaks the rtmsg_ifinfo() into two parts - one takes the
precise snapshot of the device by called fill_info() before calling
the ndo_uninit() and the second part sends the notification using
collected snapshot.

It was brought to notice when last link is deleted from an ipvlan device
when it has free-ed the port and the subsequent .fill_info() call is
trying to get the info from the port.

kernel: [  255.139429] ------------[ cut here ]------------
kernel: [  255.139439] WARNING: CPU: 12 PID: 11173 at net/core/rtnetlink.c:2238 rtmsg_ifinfo+0x100/0x110()
kernel: [  255.139493] Modules linked in: ipvlan bonding w1_therm ds2482 wire cdc_acm ehci_pci ehci_hcd i2c_dev i2c_i801 i2c_core msr cpuid bnx2x ptp pps_core mdio libcrc32c
kernel: [  255.139513] CPU: 12 PID: 11173 Comm: ip Not tainted 3.18.0-smp-DEV #167
kernel: [  255.139514] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 1.0.10 05/15/2012
kernel: [  255.139515]  0000000000000009 ffff880851b6b828 ffffffff815d87f4 00000000000000e0
kernel: [  255.139516]  0000000000000000 ffff880851b6b868 ffffffff8109c29c 0000000000000000
kernel: [  255.139518]  00000000ffffffa6 00000000000000d0 ffffffff81aaf580 0000000000000011
kernel: [  255.139520] Call Trace:
kernel: [  255.139527]  [&lt;ffffffff815d87f4&gt;] dump_stack+0x46/0x58
kernel: [  255.139531]  [&lt;ffffffff8109c29c&gt;] warn_slowpath_common+0x8c/0xc0
kernel: [  255.139540]  [&lt;ffffffff8109c2ea&gt;] warn_slowpath_null+0x1a/0x20
kernel: [  255.139544]  [&lt;ffffffff8150d570&gt;] rtmsg_ifinfo+0x100/0x110
kernel: [  255.139547]  [&lt;ffffffff814f78b5&gt;] rollback_registered_many+0x1d5/0x2d0
kernel: [  255.139549]  [&lt;ffffffff814f79cf&gt;] unregister_netdevice_many+0x1f/0xb0
kernel: [  255.139551]  [&lt;ffffffff8150acab&gt;] rtnl_dellink+0xbb/0x110
kernel: [  255.139553]  [&lt;ffffffff8150da90&gt;] rtnetlink_rcv_msg+0xa0/0x240
kernel: [  255.139557]  [&lt;ffffffff81329283&gt;] ? rhashtable_lookup_compare+0x43/0x80
kernel: [  255.139558]  [&lt;ffffffff8150d9f0&gt;] ? __rtnl_unlock+0x20/0x20
kernel: [  255.139562]  [&lt;ffffffff8152cb11&gt;] netlink_rcv_skb+0xb1/0xc0
kernel: [  255.139563]  [&lt;ffffffff8150a495&gt;] rtnetlink_rcv+0x25/0x40
kernel: [  255.139565]  [&lt;ffffffff8152c398&gt;] netlink_unicast+0x178/0x230
kernel: [  255.139567]  [&lt;ffffffff8152c75f&gt;] netlink_sendmsg+0x30f/0x420
kernel: [  255.139571]  [&lt;ffffffff814e0b0c&gt;] sock_sendmsg+0x9c/0xd0
kernel: [  255.139575]  [&lt;ffffffff811d1d7f&gt;] ? rw_copy_check_uvector+0x6f/0x130
kernel: [  255.139577]  [&lt;ffffffff814e11c9&gt;] ? copy_msghdr_from_user+0x139/0x1b0
kernel: [  255.139578]  [&lt;ffffffff814e1774&gt;] ___sys_sendmsg+0x304/0x310
kernel: [  255.139581]  [&lt;ffffffff81198723&gt;] ? handle_mm_fault+0xca3/0xde0
kernel: [  255.139585]  [&lt;ffffffff811ebc4c&gt;] ? destroy_inode+0x3c/0x70
kernel: [  255.139589]  [&lt;ffffffff8108e6ec&gt;] ? __do_page_fault+0x20c/0x500
kernel: [  255.139597]  [&lt;ffffffff811e8336&gt;] ? dput+0xb6/0x190
kernel: [  255.139606]  [&lt;ffffffff811f05f6&gt;] ? mntput+0x26/0x40
kernel: [  255.139611]  [&lt;ffffffff811d2b94&gt;] ? __fput+0x174/0x1e0
kernel: [  255.139613]  [&lt;ffffffff814e2129&gt;] __sys_sendmsg+0x49/0x90
kernel: [  255.139615]  [&lt;ffffffff814e2182&gt;] SyS_sendmsg+0x12/0x20
kernel: [  255.139617]  [&lt;ffffffff815df092&gt;] system_call_fastpath+0x12/0x17
kernel: [  255.139619] ---[ end trace 5e6703e87d984f6b ]---

Signed-off-by: Mahesh Bandewar &lt;maheshb@google.com&gt;
Reported-by: Toshiaki Makita &lt;makita.toshiaki@lab.ntt.co.jp&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The commit 56bfa7ee7c ("unregister_netdevice : move RTM_DELLINK to
until after ndo_uninit") tried to do this ealier but while doing so
it created a problem. Unfortunately the delayed rtmsg_ifinfo() also
delayed call to fill_info(). So this translated into asking driver
to remove private state and then query it's private state. This
could have catastropic consequences.

This change breaks the rtmsg_ifinfo() into two parts - one takes the
precise snapshot of the device by called fill_info() before calling
the ndo_uninit() and the second part sends the notification using
collected snapshot.

It was brought to notice when last link is deleted from an ipvlan device
when it has free-ed the port and the subsequent .fill_info() call is
trying to get the info from the port.

kernel: [  255.139429] ------------[ cut here ]------------
kernel: [  255.139439] WARNING: CPU: 12 PID: 11173 at net/core/rtnetlink.c:2238 rtmsg_ifinfo+0x100/0x110()
kernel: [  255.139493] Modules linked in: ipvlan bonding w1_therm ds2482 wire cdc_acm ehci_pci ehci_hcd i2c_dev i2c_i801 i2c_core msr cpuid bnx2x ptp pps_core mdio libcrc32c
kernel: [  255.139513] CPU: 12 PID: 11173 Comm: ip Not tainted 3.18.0-smp-DEV #167
kernel: [  255.139514] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 1.0.10 05/15/2012
kernel: [  255.139515]  0000000000000009 ffff880851b6b828 ffffffff815d87f4 00000000000000e0
kernel: [  255.139516]  0000000000000000 ffff880851b6b868 ffffffff8109c29c 0000000000000000
kernel: [  255.139518]  00000000ffffffa6 00000000000000d0 ffffffff81aaf580 0000000000000011
kernel: [  255.139520] Call Trace:
kernel: [  255.139527]  [&lt;ffffffff815d87f4&gt;] dump_stack+0x46/0x58
kernel: [  255.139531]  [&lt;ffffffff8109c29c&gt;] warn_slowpath_common+0x8c/0xc0
kernel: [  255.139540]  [&lt;ffffffff8109c2ea&gt;] warn_slowpath_null+0x1a/0x20
kernel: [  255.139544]  [&lt;ffffffff8150d570&gt;] rtmsg_ifinfo+0x100/0x110
kernel: [  255.139547]  [&lt;ffffffff814f78b5&gt;] rollback_registered_many+0x1d5/0x2d0
kernel: [  255.139549]  [&lt;ffffffff814f79cf&gt;] unregister_netdevice_many+0x1f/0xb0
kernel: [  255.139551]  [&lt;ffffffff8150acab&gt;] rtnl_dellink+0xbb/0x110
kernel: [  255.139553]  [&lt;ffffffff8150da90&gt;] rtnetlink_rcv_msg+0xa0/0x240
kernel: [  255.139557]  [&lt;ffffffff81329283&gt;] ? rhashtable_lookup_compare+0x43/0x80
kernel: [  255.139558]  [&lt;ffffffff8150d9f0&gt;] ? __rtnl_unlock+0x20/0x20
kernel: [  255.139562]  [&lt;ffffffff8152cb11&gt;] netlink_rcv_skb+0xb1/0xc0
kernel: [  255.139563]  [&lt;ffffffff8150a495&gt;] rtnetlink_rcv+0x25/0x40
kernel: [  255.139565]  [&lt;ffffffff8152c398&gt;] netlink_unicast+0x178/0x230
kernel: [  255.139567]  [&lt;ffffffff8152c75f&gt;] netlink_sendmsg+0x30f/0x420
kernel: [  255.139571]  [&lt;ffffffff814e0b0c&gt;] sock_sendmsg+0x9c/0xd0
kernel: [  255.139575]  [&lt;ffffffff811d1d7f&gt;] ? rw_copy_check_uvector+0x6f/0x130
kernel: [  255.139577]  [&lt;ffffffff814e11c9&gt;] ? copy_msghdr_from_user+0x139/0x1b0
kernel: [  255.139578]  [&lt;ffffffff814e1774&gt;] ___sys_sendmsg+0x304/0x310
kernel: [  255.139581]  [&lt;ffffffff81198723&gt;] ? handle_mm_fault+0xca3/0xde0
kernel: [  255.139585]  [&lt;ffffffff811ebc4c&gt;] ? destroy_inode+0x3c/0x70
kernel: [  255.139589]  [&lt;ffffffff8108e6ec&gt;] ? __do_page_fault+0x20c/0x500
kernel: [  255.139597]  [&lt;ffffffff811e8336&gt;] ? dput+0xb6/0x190
kernel: [  255.139606]  [&lt;ffffffff811f05f6&gt;] ? mntput+0x26/0x40
kernel: [  255.139611]  [&lt;ffffffff811d2b94&gt;] ? __fput+0x174/0x1e0
kernel: [  255.139613]  [&lt;ffffffff814e2129&gt;] __sys_sendmsg+0x49/0x90
kernel: [  255.139615]  [&lt;ffffffff814e2182&gt;] SyS_sendmsg+0x12/0x20
kernel: [  255.139617]  [&lt;ffffffff815df092&gt;] system_call_fastpath+0x12/0x17
kernel: [  255.139619] ---[ end trace 5e6703e87d984f6b ]---

Signed-off-by: Mahesh Bandewar &lt;maheshb@google.com&gt;
Reported-by: Toshiaki Makita &lt;makita.toshiaki@lab.ntt.co.jp&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bridge: add brport flags to dflt bridge_getlink</title>
<updated>2014-12-03T04:01:24+00:00</updated>
<author>
<name>Scott Feldman</name>
<email>sfeldma@gmail.com</email>
</author>
<published>2014-11-28T13:34:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2c3c031c8f8930861815fa1685d7c5e8ccec047c'/>
<id>2c3c031c8f8930861815fa1685d7c5e8ccec047c</id>
<content type='text'>
To allow brport device to return current brport flags set on port.  Add
returned flags to nested IFLA_PROTINFO netlink msg built in dflt getlink.
With this change, netlink msg returned for bridge_getlink contains the port's
offloaded flag settings (the port's SELF settings).

Signed-off-by: Scott Feldman &lt;sfeldma@gmail.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@resnulli.us&gt;
Acked-by: Andy Gospodarek &lt;gospo@cumulusnetworks.com&gt;
Acked-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To allow brport device to return current brport flags set on port.  Add
returned flags to nested IFLA_PROTINFO netlink msg built in dflt getlink.
With this change, netlink msg returned for bridge_getlink contains the port's
offloaded flag settings (the port's SELF settings).

Signed-off-by: Scott Feldman &lt;sfeldma@gmail.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@resnulli.us&gt;
Acked-by: Andy Gospodarek &lt;gospo@cumulusnetworks.com&gt;
Acked-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: make vid as a parameter for ndo_fdb_add/ndo_fdb_del</title>
<updated>2014-12-03T04:01:18+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@resnulli.us</email>
</author>
<published>2014-11-28T13:34:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f6f6424ba773da6221ecaaa70973eb4dacfa03b2'/>
<id>f6f6424ba773da6221ecaaa70973eb4dacfa03b2</id>
<content type='text'>
Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple
u16 vid to drivers from there.

Signed-off-by: Jiri Pirko &lt;jiri@resnulli.us&gt;
Acked-by: Andy Gospodarek &lt;gospo@cumulusnetworks.com&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Acked-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple
u16 vid to drivers from there.

Signed-off-by: Jiri Pirko &lt;jiri@resnulli.us&gt;
Acked-by: Andy Gospodarek &lt;gospo@cumulusnetworks.com&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Acked-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
