<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/net/ip.h, branch v3.17-rc4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2014-07-30T20:25:49+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2014-07-30T20:25:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f139c74a8df071217dcd63f3ef06ae7be7071c4d'/>
<id>f139c74a8df071217dcd63f3ef06ae7be7071c4d</id>
<content type='text'>
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: fail early when creating netdev named all or default</title>
<updated>2014-07-29T18:43:50+00:00</updated>
<author>
<name>WANG Cong</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2014-07-25T22:25:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=20e61da7ffcfd84a1b6f797e745608572e5bc218'/>
<id>20e61da7ffcfd84a1b6f797e745608572e5bc218</id>
<content type='text'>
We create a proc dir for each network device, this will cause
conflicts when the devices have name "all" or "default".

Rather than emitting an ugly kernel warning, we could just
fail earlier by checking the device name.

Reported-by: Stephane Chazelas &lt;stephane.chazelas@gmail.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We create a proc dir for each network device, this will cause
conflicts when the devices have name "all" or "default".

Rather than emitting an ugly kernel warning, we could just
fail earlier by checking the device name.

Reported-by: Stephane Chazelas &lt;stephane.chazelas@gmail.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ip: make IP identifiers less predictable</title>
<updated>2014-07-29T01:46:34+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-07-26T06:58:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=04ca6973f7c1a0d8537f2d9906a0cf8e69886d75'/>
<id>04ca6973f7c1a0d8537f2d9906a0cf8e69886d75</id>
<content type='text'>
In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways exploiting linux IP identifier generation to
infer whether two machines are exchanging packets.

With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.

This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.

Note that prandom_u32_max(1) returns 0, so if generator is used at most
once per jiffy, this patch inserts no hole in the ID suite and do not
increase collision probability.

This is jiffies based, so in the worst case (HZ=1000), the id can
rollover after ~65 seconds of idle time, which should be fine.

We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.

For IPv6, adds saddr into the hash as well, but not nexthdr.

If I ping the patched target, we can see ID are now hard to predict.

21:57:11.008086 IP (...)
    A &gt; target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
    target &gt; A: ICMP echo reply, seq 1, length 64

21:57:12.013133 IP (...)
    A &gt; target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
    target &gt; A: ICMP echo reply, seq 2, length 64

21:57:13.016580 IP (...)
    A &gt; target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
    target &gt; A: ICMP echo reply, seq 3, length 64

[1] TCP sessions uses a per flow ID generator not changed by this patch.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Jeffrey Knockel &lt;jeffk@cs.unm.edu&gt;
Reported-by: Jedidiah R. Crandall &lt;crandall@cs.unm.edu&gt;
Cc: Willy Tarreau &lt;w@1wt.eu&gt;
Cc: Hannes Frederic Sowa &lt;hannes@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways exploiting linux IP identifier generation to
infer whether two machines are exchanging packets.

With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.

This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.

Note that prandom_u32_max(1) returns 0, so if generator is used at most
once per jiffy, this patch inserts no hole in the ID suite and do not
increase collision probability.

This is jiffies based, so in the worst case (HZ=1000), the id can
rollover after ~65 seconds of idle time, which should be fine.

We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.

For IPv6, adds saddr into the hash as well, but not nexthdr.

If I ping the patched target, we can see ID are now hard to predict.

21:57:11.008086 IP (...)
    A &gt; target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
    target &gt; A: ICMP echo reply, seq 1, length 64

21:57:12.013133 IP (...)
    A &gt; target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
    target &gt; A: ICMP echo reply, seq 2, length 64

21:57:13.016580 IP (...)
    A &gt; target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
    target &gt; A: ICMP echo reply, seq 3, length 64

[1] TCP sessions uses a per flow ID generator not changed by this patch.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Jeffrey Knockel &lt;jeffk@cs.unm.edu&gt;
Reported-by: Jedidiah R. Crandall &lt;crandall@cs.unm.edu&gt;
Cc: Willy Tarreau &lt;w@1wt.eu&gt;
Cc: Hannes Frederic Sowa &lt;hannes@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: frag: don't account number of fragment queues</title>
<updated>2014-07-28T05:34:36+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2014-07-24T14:50:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=434d305405ab86414f6ea3f261307d443a2c3506'/>
<id>434d305405ab86414f6ea3f261307d443a2c3506</id>
<content type='text'>
The 'nqueues' counter is protected by the lru list lock,
once thats removed this needs to be converted to atomic
counter.  Given this isn't used for anything except for
reporting it to userspace via /proc, just remove it.

We still report the memory currently used by fragment
reassembly queues.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The 'nqueues' counter is protected by the lru list lock,
once thats removed this needs to be converted to atomic
counter.  Given this isn't used for anything except for
reporting it to userspace via /proc, just remove it.

We still report the memory currently used by fragment
reassembly queues.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Save TX flow hash in sock and set in skbuf on xmit</title>
<updated>2014-07-08T04:14:21+00:00</updated>
<author>
<name>Tom Herbert</name>
<email>therbert@google.com</email>
</author>
<published>2014-07-02T04:32:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b73c3d0e4f0e1961e15bec18720e48aabebe2109'/>
<id>b73c3d0e4f0e1961e15bec18720e48aabebe2109</id>
<content type='text'>
For a connected socket we can precompute the flow hash for setting
in skb-&gt;hash on output. This is a performance advantage over
calculating the skb-&gt;hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where thers is no socket (e.g. time-wait, syn-recv, syn-cookies).

This patch adds sk_txhash to the sock structure. inet_set_txhash and
ip6_set_txhash functions are added which are called from points in
TCP and UDP where socket moves to established state.

skb_set_hash_from_sk is a function which sets skb-&gt;hash from the
sock txhash value. This is called in UDP and TCP transmit path when
transmitting within the context of a socket.

Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash called on every TX packet to
create a UDP source port).

Before fix:

  95.02% CPU utilization
  154/256/505 90/95/99% latencies
  1.13042e+06 tps

  Time in functions:
    0.28% skb_flow_dissect
    0.21% __skb_get_hash

After fix:

  94.95% CPU utilization
  156/254/485 90/95/99% latencies
  1.15447e+06

  Neither __skb_get_hash nor skb_flow_dissect appear in perf

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For a connected socket we can precompute the flow hash for setting
in skb-&gt;hash on output. This is a performance advantage over
calculating the skb-&gt;hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where thers is no socket (e.g. time-wait, syn-recv, syn-cookies).

This patch adds sk_txhash to the sock structure. inet_set_txhash and
ip6_set_txhash functions are added which are called from points in
TCP and UDP where socket moves to established state.

skb_set_hash_from_sk is a function which sets skb-&gt;hash from the
sock txhash value. This is called in UDP and TCP transmit path when
transmitting within the context of a socket.

Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash called on every TX packet to
create a UDP source port).

Before fix:

  95.02% CPU utilization
  154/256/505 90/95/99% latencies
  1.13042e+06 tps

  Time in functions:
    0.28% skb_flow_dissect
    0.21% __skb_get_hash

After fix:

  94.95% CPU utilization
  156/254/485 90/95/99% latencies
  1.15447e+06

  Neither __skb_get_hash nor skb_flow_dissect appear in perf

Signed-off-by: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: get rid of ip_id_count</title>
<updated>2014-06-02T18:00:41+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-02T12:26:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=73f156a6e8c1074ac6327e0abd1169e95eb66463'/>
<id>73f156a6e8c1074ac6327e0abd1169e95eb66463</id>
<content type='text'>
Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Fix CONFIG_SYSCTL ifdef test.</title>
<updated>2014-05-15T17:43:14+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2014-05-15T17:43:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fcd77db07dd2b8d35e0db0d1209f2ce1bb05531e'/>
<id>fcd77db07dd2b8d35e0db0d1209f2ce1bb05531e</id>
<content type='text'>
&gt; include/net/ip.h:211:5: warning: "CONFIG_SYSCTL" is not defined [-Wundef]
&gt;  #if CONFIG_SYSCTL
&gt;      ^

Reported-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
&gt; include/net/ip.h:211:5: warning: "CONFIG_SYSCTL" is not defined [-Wundef]
&gt;  #if CONFIG_SYSCTL
&gt;      ^

Reported-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: make ip_local_reserved_ports per netns</title>
<updated>2014-05-14T19:31:45+00:00</updated>
<author>
<name>WANG Cong</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2014-05-12T23:04:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=122ff243f5f104194750ecbc76d5946dd1eec934'/>
<id>122ff243f5f104194750ecbc76d5946dd1eec934</id>
<content type='text'>
ip_local_port_range is already per netns, so should ip_local_reserved_ports
be. And since it is none by default we don't actually need it when we don't
enable CONFIG_SYSCTL.

By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port()

Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ip_local_port_range is already per netns, so should ip_local_reserved_ports
be. And since it is none by default we don't actually need it when we don't
enable CONFIG_SYSCTL.

By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port()

Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: add a sysctl to reflect the fwmark on replies</title>
<updated>2014-05-13T22:35:08+00:00</updated>
<author>
<name>Lorenzo Colitti</name>
<email>lorenzo@google.com</email>
</author>
<published>2014-05-13T17:17:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e110861f86094cd78cc85593b873970092deb43a'/>
<id>e110861f86094cd78cc85593b873970092deb43a</id>
<content type='text'>
Kernel-originated IP packets that have no user socket associated
with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
are emitted with a mark of zero. Add a sysctl to make them have
the same mark as the packet they are replying to.

This allows an administrator that wishes to do so to use
mark-based routing, firewalling, etc. for these replies by
marking the original packets inbound.

Tested using user-mode linux:
 - ICMP/ICMPv6 echo replies and errors.
 - TCP RST packets (IPv4 and IPv6).

Signed-off-by: Lorenzo Colitti &lt;lorenzo@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Kernel-originated IP packets that have no user socket associated
with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
are emitted with a mark of zero. Add a sysctl to make them have
the same mark as the packet they are replying to.

This allows an administrator that wishes to do so to use
mark-based routing, firewalling, etc. for these replies by
marking the original packets inbound.

Tested using user-mode linux:
 - ICMP/ICMPv6 echo replies and errors.
 - TCP RST packets (IPv4 and IPv6).

Signed-off-by: Lorenzo Colitti &lt;lorenzo@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: rename local_df to ignore_df</title>
<updated>2014-05-12T18:03:41+00:00</updated>
<author>
<name>WANG Cong</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2014-05-04T23:39:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=60ff746739bf805a912484643c720b6124826140'/>
<id>60ff746739bf805a912484643c720b6124826140</id>
<content type='text'>
As suggested by several people, rename local_df to ignore_df,
since it means "ignore df bit if it is set".

Cc: Maciej Żenczykowski &lt;maze@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Acked-by: Maciej Żenczykowski &lt;maze@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As suggested by several people, rename local_df to ignore_df,
since it means "ignore df bit if it is set".

Cc: Maciej Żenczykowski &lt;maze@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Acked-by: Maciej Żenczykowski &lt;maze@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
