<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/net/inet_hashtables.h, branch v4.15-rc2</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>net: ipv4: add second dif to inet socket lookups</title>
<updated>2017-08-07T18:39:21+00:00</updated>
<author>
<name>David Ahern</name>
<email>dsahern@gmail.com</email>
</author>
<published>2017-08-07T15:44:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3fa6f616a7a4d0bdf4d877d530456d8a5c3b109b'/>
<id>3fa6f616a7a4d0bdf4d877d530456d8a5c3b109b</id>
<content type='text'>
Add a second device index, sdif, to inet socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.

TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the
ingress index is obtained from IPCB using inet_sdif and after the cb move
in  tcp_v4_rcv the tcp_v4_sdif helper is used.

Signed-off-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a second device index, sdif, to inet socket lookups. sdif is the
index for ingress devices enslaved to an l3mdev. It allows the lookups
to consider the enslaved device as well as the L3 domain when searching
for a socket.

TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the
ingress index is obtained from IPCB using inet_sdif and after the cb move
in  tcp_v4_rcv the tcp_v4_sdif helper is used.

Signed-off-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: make sk_ehashfn() static</title>
<updated>2017-07-03T10:29:14+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2017-07-03T09:57:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=784c372a8184bdb8ae722c94250c2d57dc327a8e'/>
<id>784c372a8184bdb8ae722c94250c2d57dc327a8e</id>
<content type='text'>
sk_ehashfn() is only used from a single file.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
sk_ehashfn() is only used from a single file.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: convert sock.sk_refcnt from atomic_t to refcount_t</title>
<updated>2017-07-01T14:39:08+00:00</updated>
<author>
<name>Reshetova, Elena</name>
<email>elena.reshetova@intel.com</email>
</author>
<published>2017-06-30T10:08:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=41c6d650f6537e55a1b53438c646fbc3f49176bf'/>
<id>41c6d650f6537e55a1b53438c646fbc3f49176bf</id>
<content type='text'>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

This patch uses refcount_inc_not_zero() instead of
atomic_inc_not_zero_hint() due to absense of a _hint()
version of refcount API. If the hint() version must
be used, we might need to revisit API.

Signed-off-by: Elena Reshetova &lt;elena.reshetova@intel.com&gt;
Signed-off-by: Hans Liljestrand &lt;ishkamiel@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David Windsor &lt;dwindsor@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

This patch uses refcount_inc_not_zero() instead of
atomic_inc_not_zero_hint() due to absense of a _hint()
version of refcount API. If the hint() version must
be used, we might need to revisit API.

Signed-off-by: Elena Reshetova &lt;elena.reshetova@intel.com&gt;
Signed-off-by: Hans Liljestrand &lt;ishkamiel@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David Windsor &lt;dwindsor@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: reset tb-&gt;fastreuseport when adding a reuseport sk</title>
<updated>2017-01-18T18:04:29+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fb.com</email>
</author>
<published>2017-01-17T15:51:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=637bc8bbe6c0a288a596edfdcdd5657c72a848db'/>
<id>637bc8bbe6c0a288a596edfdcdd5657c72a848db</id>
<content type='text'>
If we have non reuseport sockets on a tb we will set tb-&gt;fastreuseport to 0 and
never set it again.  Which means that in the future if we end up adding a bunch
of reuseport sk's to that tb we'll have to do the expensive scan every time.
Instead add the ipv4/ipv6 saddr fields to the bind bucket, as well as the family
so we know what comparison to make, and the ipv6 only setting so we can make
sure to compare with new sockets appropriately.  Once one sk has made it onto
the list we know that there are no potential bind conflicts on the owners list
that match that sk's rcv_addr.  So copy the sk's information into our bind
bucket and set tb-&gt;fastruseport to FASTREUSESOCK_STRICT so we know we have to do
an extra check for subsequent reuseport sockets and skip the expensive bind
conflict check.

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we have non reuseport sockets on a tb we will set tb-&gt;fastreuseport to 0 and
never set it again.  Which means that in the future if we end up adding a bunch
of reuseport sk's to that tb we'll have to do the expensive scan every time.
Instead add the ipv4/ipv6 saddr fields to the bind bucket, as well as the family
so we know what comparison to make, and the ipv6 only setting so we can make
sure to compare with new sockets appropriately.  Once one sk has made it onto
the list we know that there are no potential bind conflicts on the owners list
that match that sk's rcv_addr.  So copy the sk's information into our bind
bucket and set tb-&gt;fastruseport to FASTREUSESOCK_STRICT so we know we have to do
an extra check for subsequent reuseport sockets and skip the expensive bind
conflict check.

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: kill smallest_size and smallest_port</title>
<updated>2017-01-18T18:04:29+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fb.com</email>
</author>
<published>2017-01-17T15:51:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b9470c27607bed1ad3450de789c154f225530112'/>
<id>b9470c27607bed1ad3450de789c154f225530112</id>
<content type='text'>
In inet_csk_get_port we seem to be using smallest_port to figure out where the
best place to look for a SO_REUSEPORT sk that matches with an existing set of
SO_REUSEPORT's.  However if we get to the logic

if (smallest_size != -1) {
	port = smallest_port;
	goto have_port;
}

we will do a useless search, because we would have already done the
inet_csk_bind_conflict for that port and it would have returned 1, otherwise we
would have gone to found_tb and succeeded.  Since this logic makes us do yet
another trip through inet_csk_bind_conflict for a port we know won't work just
delete this code and save us the time.

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In inet_csk_get_port we seem to be using smallest_port to figure out where the
best place to look for a SO_REUSEPORT sk that matches with an existing set of
SO_REUSEPORT's.  However if we get to the logic

if (smallest_size != -1) {
	port = smallest_port;
	goto have_port;
}

we will do a useless search, because we would have already done the
inet_csk_bind_conflict for that port and it would have returned 1, otherwise we
would have gone to found_tb and succeeded.  Since this logic makes us do yet
another trip through inet_csk_bind_conflict for a port we know won't work just
delete this code and save us the time.

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: collapse ipv4/v6 rcv_saddr_equal functions into one</title>
<updated>2017-01-18T18:04:28+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fb.com</email>
</author>
<published>2017-01-17T15:51:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fe38d2a1c8bee0b3a0be40de5b621a28200612e5'/>
<id>fe38d2a1c8bee0b3a0be40de5b621a28200612e5</id>
<content type='text'>
We pass these per-protocol equal functions around in various places, but
we can just have one function that checks the sk-&gt;sk_family and then do
the right comparison function.  I've also changed the ipv4 version to
not cast to inet_sock since it is unneeded.

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We pass these per-protocol equal functions around in various places, but
we can just have one function that checks the sk-&gt;sk_family and then do
the right comparison function.  I've also changed the ipv4 version to
not cast to inet_sock since it is unneeded.

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp/dccp: do not touch listener sk_refcnt under synflood</title>
<updated>2016-04-05T02:11:20+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-04-01T15:52:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3b24d854cb35383c30642116e5992fd619bdc9bc'/>
<id>3b24d854cb35383c30642116e5992fd619bdc9bc</id>
<content type='text'>
When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple
cpus contend on sk-&gt;sk_refcnt and sk-&gt;sk_wmem_alloc changes.

By letting listeners use SOCK_RCU_FREE infrastructure,
we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt

Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets,
only listeners are impacted by this change.

Peak performance under SYNFLOOD is increased by ~33% :

On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps

Most consuming functions are now skb_set_owner_w() and sock_wfree()
contending on sk-&gt;sk_wmem_alloc when cooking SYNACK and freeing them.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple
cpus contend on sk-&gt;sk_refcnt and sk-&gt;sk_wmem_alloc changes.

By letting listeners use SOCK_RCU_FREE infrastructure,
we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt

Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets,
only listeners are impacted by this change.

Peak performance under SYNFLOOD is increased by ~33% :

On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps

Most consuming functions are now skb_set_owner_w() and sock_wfree()
contending on sk-&gt;sk_wmem_alloc when cooking SYNACK and freeing them.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp/dccp: remove BH disable/enable in lookup</title>
<updated>2016-04-05T02:11:19+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-04-01T15:52:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ee3cf32a4a5e6cf5ccc0f0de9865fda3ebc46436'/>
<id>ee3cf32a4a5e6cf5ccc0f0de9865fda3ebc46436</id>
<content type='text'>
Since linux 2.6.29, lookups only use rcu locking.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since linux 2.6.29, lookups only use rcu locking.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>soreuseport: fast reuseport TCP socket selection</title>
<updated>2016-02-11T08:54:15+00:00</updated>
<author>
<name>Craig Gallek</name>
<email>kraig@google.com</email>
</author>
<published>2016-02-10T16:50:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c125e80b88687b25b321795457309eaaee4bf270'/>
<id>c125e80b88687b25b321795457309eaaee4bf270</id>
<content type='text'>
This change extends the fast SO_REUSEPORT socket lookup implemented
for UDP to TCP.  Listener sockets with SO_REUSEPORT and the same
receive address are additionally added to an array for faster
random access.  This means that only a single socket from the group
must be found in the listener list before any socket in the group can
be used to receive a packet.  Previously, every socket in the group
needed to be considered before handing off the incoming packet.

This feature also exposes the ability to use a BPF program when
selecting a socket from a reuseport group.

Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This change extends the fast SO_REUSEPORT socket lookup implemented
for UDP to TCP.  Listener sockets with SO_REUSEPORT and the same
receive address are additionally added to an array for faster
random access.  This means that only a single socket from the group
must be found in the listener list before any socket in the group can
be used to receive a packet.  Previously, every socket in the group
needed to be considered before handing off the incoming packet.

This feature also exposes the ability to use a BPF program when
selecting a socket from a reuseport group.

Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: refactor inet[6]_lookup functions to take skb</title>
<updated>2016-02-11T08:54:14+00:00</updated>
<author>
<name>Craig Gallek</name>
<email>kraig@google.com</email>
</author>
<published>2016-02-10T16:50:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a583636a83ea383fd07517e5a7a2eedbc5d90fb1'/>
<id>a583636a83ea383fd07517e5a7a2eedbc5d90fb1</id>
<content type='text'>
This is a preliminary step to allow fast socket lookup of SO_REUSEPORT
groups.  Doing so with a BPF filter will require access to the
skb in question.  This change plumbs the skb (and offset to payload
data) through the call stack to the listening socket lookup
implementations where it will be used in a following patch.

Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is a preliminary step to allow fast socket lookup of SO_REUSEPORT
groups.  Doing so with a BPF filter will require access to the
skb in question.  This change plumbs the skb (and offset to payload
data) through the call stack to the listening socket lookup
implementations where it will be used in a following patch.

Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
