linux-toradex.git/net/core/flow_dissector.c, branch v4.1.10

xps: fix xps for stacked devices

2015-02-04T21:02:54+00:00

A typical qdisc setup is the following :

bond0 : bonding device, using HTB hierarchy
eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc

XPS allows to spread packets on specific tx queues, based on the cpu
doing the send.

Problem is that dequeues from bond0 qdisc can happen on random cpus,
due to the fact that qdisc_run() can dequeue a batch of packets.

CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1
CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0

CPUB -> dequeue packet P1 from bond0
        enqueue packet on eth1/eth2
CPUC -> dequeue packet P2 from bond0
        enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0)

get_xps_queue() then might select wrong queue for P1, since current cpu
might be different than CPUA.

P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping),
if CPUC runs a bit faster (or CPUB spins a bit on qdisc lock)

Effect of this bug is TCP reorders, and more generally not optimal
TX queue placement. (A victim bulk flow can be migrated to the wrong TX
queue for a while)

To fix this, we have to record sender cpu number the first time
dev_queue_xmit() is called for one tx skb.

We can union napi_id (used on receive path) and sender_cpu,
granted we clear sender_cpu in skb_scrub_packet() (credit to Willem for
this union idea)

Signed-off-by: Eric Dumazet 
Cc: Willem de Bruijn 
Cc: Nandita Dukkipati 
Cc: Yuchung Cheng 
Signed-off-by: David S. Miller

flow_dissector: add tipc support

2015-01-27T00:58:09+00:00

The flows are hashed on the sending node address, which allows us
to spread out the TIPC link processing to RPS enabled cores. There
is no point to include the destination address in the hash as that
will always be the same for all inbound links. We have experimented
with a 3-tuple hash over [srcnode, sport, dport], but this showed to
give slightly lower performance because of increased lock contention
when the same link was handled by multiple cores.

Signed-off-by: Ying Xue 
Signed-off-by: Erik Hugne 
Reviewed-by: Jon Maloy 
Signed-off-by: David S. Miller

flow-dissector: Fix alignment issue in __skb_flow_get_ports

2014-10-10T19:33:47+00:00

This patch addresses a kernel unaligned access bug seen on a sparc64 system
with an igb adapter.  Specifically the __skb_flow_get_ports was returning a
be32 pointer which was then having the value directly returned.

In order to prevent this it is actually easier to simply not populate the
ports or address values when an skb is not present.  In this case the
assumption is that the data isn't needed and rather than slow down the
faster aligned accesses by making them have to assume the unaligned path on
architectures that don't support efficent unaligned access it makes more
sense to simply switch off the bits that were copying the source and
destination address/port for the case where we only care about the protocol
types and lengths which are normally 16 bit fields anyway.

Reported-by: David S. Miller 
Signed-off-by: Alexander Duyck 
Signed-off-by: David S. Miller

net: Add function for parsing the header length out of linear ethernet frames

2014-09-06T00:47:02+00:00

This patch updates some of the flow_dissector api so that it can be used to
parse the length of ethernet buffers stored in fragments.  Most of the
changes needed were to __skb_get_poff as it needed to be updated to support
sending a linear buffer instead of a skb.

I have split __skb_get_poff into two functions, the first is skb_get_poff
and it retains the functionality of the original __skb_get_poff.  The other
function is __skb_get_poff which now works much like __skb_flow_dissect in
relation to skb_flow_dissect in that it provides the same functionality but
works with just a data buffer and hlen instead of needing an skb.

Signed-off-by: Alexander Duyck 
Acked-by: Alexei Starovoitov 
Signed-off-by: David S. Miller

net: make skb an optional parameter for__skb_flow_dissect()

2014-08-26T00:21:26+00:00

Fixes: commit 690e36e726d00d2 (net: Allow raw buffers to be passed into the flow dissector)
Cc: David S. Miller 
Signed-off-by: Cong Wang 
Signed-off-by: David S. Miller

net: fix comments for __skb_flow_get_ports()

2014-08-26T00:21:26+00:00

Fixes: commit 690e36e726d00d2 (net: Allow raw buffers to be passed into the flow dissector)
Cc: David S. Miller 
Signed-off-by: Cong Wang 
Signed-off-by: David S. Miller

net: use reciprocal_scale() helper

2014-08-23T19:21:21+00:00

Replace open codings of (((u64)  * ) >> 32) with reciprocal_scale().

Signed-off-by: Daniel Borkmann 
Cc: Hannes Frederic Sowa 
Signed-off-by: David S. Miller

net: Allow raw buffers to be passed into the flow dissector.

2014-08-23T19:13:41+00:00

Drivers, and perhaps other entities we have not yet considered,
sometimes want to know how deep the protocol headers go before
deciding how large of an SKB to allocate and how much of the packet to
place into the linear SKB area.

For example, consider a driver which has a device which DMAs into
pools of pages and then tells the driver where the data went in the
DMA descriptor(s).  The driver can then build an SKB and reference
most of the data via SKB fragments (which are page/offset/length
triplets).

However at least some of the front of the packet should be placed into
the linear SKB area, which comes before the fragments, so that packet
processing can get at the headers efficiently.  The first thing each
protocol layer is going to do is a "pskb_may_pull()" so we might as
well aggregate as much of this as possible while we're building the
SKB in the driver.

Part of supporting this is that we don't have an SKB yet, so we want
to be able to let the flow dissector operate on a raw buffer in order
to compute the offset of the end of the headers.

So now we have a __skb_flow_dissect() which takes an explicit data
pointer and length.

Signed-off-by: David S. Miller

net: Only do flow_dissector hash computation once per packet

2014-07-08T04:14:21+00:00

Add sw_hash flag to skbuff to indicate that skb->hash was computed
from flow_dissector. This flag is checked in skb_get_hash to avoid
repeatedly trying to compute the hash (ie. in the case that no L4 hash
can be computed).

Signed-off-by: Tom Herbert 
Signed-off-by: David S. Miller

flow_dissector: Use IPv6 flow label in flow_dissector

2014-07-08T04:14:21+00:00

This patch implements the receive side to support RFC 6438 which is to
use the flow label as an ECMP hash. If an IPv6 flow label is set
in a packet we can use this as input for computing an L4-hash. There
should be no need to parse any transport headers in this case.

Signed-off-by: Tom Herbert 
Signed-off-by: David S. Miller