<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/bpf, branch v4.5-rc6</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>bpf: fix branch offset adjustment on backjumps after patching ctx expansion</title>
<updated>2016-02-10T21:56:47+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-02-10T15:47:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a1b14d27ed0965838350f1377ff97c93ee383492'/>
<id>a1b14d27ed0965838350f1377ff97c93ee383492</id>
<content type='text'>
When ctx access is used, the kernel often needs to expand/rewrite
instructions, so after that patching, branch offsets have to be
adjusted for both forward and backward jumps in the new eBPF program,
but for backward jumps it fails to account the delta. Meaning, for
example, if the expansion happens exactly on the insn that sits at
the jump target, it doesn't fix up the back jump offset.

Analysis on what the check in adjust_branches() is currently doing:

  /* adjust offset of jmps if necessary */
  if (i &lt; pos &amp;&amp; i + insn-&gt;off + 1 &gt; pos)
    insn-&gt;off += delta;
  else if (i &gt; pos &amp;&amp; i + insn-&gt;off + 1 &lt; pos)
    insn-&gt;off -= delta;

First condition (forward jumps):

  Before:                         After:

  insns[0]                        insns[0]
  insns[1] &lt;--- i/insn            insns[1] &lt;--- i/insn
  insns[2] &lt;--- pos               insns[P] &lt;--- pos
  insns[3]                        insns[P]  `------| delta
  insns[4] &lt;--- target_X          insns[P]   `-----|
  insns[5]                        insns[3]
                                  insns[4] &lt;--- target_X
                                  insns[5]

First case is if we cross pos-boundary and the jump instruction was
before pos. This is handeled correctly. I.e. if i == pos, then this
would mean our jump that we currently check was the patchlet itself
that we just injected. Since such patchlets are self-contained and
have no awareness of any insns before or after the patched one, the
delta is correctly not adjusted. Also, for the second condition in
case of i + insn-&gt;off + 1 == pos, means we jump to that newly patched
instruction, so no offset adjustment are needed. That part is correct.

Second condition (backward jumps):

  Before:                         After:

  insns[0]                        insns[0]
  insns[1] &lt;--- target_X          insns[1] &lt;--- target_X
  insns[2] &lt;--- pos &lt;-- target_Y  insns[P] &lt;--- pos &lt;-- target_Y
  insns[3]                        insns[P]  `------| delta
  insns[4] &lt;--- i/insn            insns[P]   `-----|
  insns[5]                        insns[3]
                                  insns[4] &lt;--- i/insn
                                  insns[5]

Second interesting case is where we cross pos-boundary and the jump
instruction was after pos. Backward jump with i == pos would be
impossible and pose a bug somewhere in the patchlet, so the first
condition checking i &gt; pos is okay only by itself. However, i +
insn-&gt;off + 1 &lt; pos does not always work as intended to trigger the
adjustment. It works when jump targets would be far off where the
delta wouldn't matter. But, for example, where the fixed insn-&gt;off
before pointed to pos (target_Y), it now points to pos + delta, so
that additional room needs to be taken into account for the check.
This means that i) both tests here need to be adjusted into pos + delta,
and ii) for the second condition, the test needs to be &lt;= as pos
itself can be a target in the backjump, too.

Fixes: 9bac3d6d548e ("bpf: allow extended BPF programs access skb fields")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When ctx access is used, the kernel often needs to expand/rewrite
instructions, so after that patching, branch offsets have to be
adjusted for both forward and backward jumps in the new eBPF program,
but for backward jumps it fails to account the delta. Meaning, for
example, if the expansion happens exactly on the insn that sits at
the jump target, it doesn't fix up the back jump offset.

Analysis on what the check in adjust_branches() is currently doing:

  /* adjust offset of jmps if necessary */
  if (i &lt; pos &amp;&amp; i + insn-&gt;off + 1 &gt; pos)
    insn-&gt;off += delta;
  else if (i &gt; pos &amp;&amp; i + insn-&gt;off + 1 &lt; pos)
    insn-&gt;off -= delta;

First condition (forward jumps):

  Before:                         After:

  insns[0]                        insns[0]
  insns[1] &lt;--- i/insn            insns[1] &lt;--- i/insn
  insns[2] &lt;--- pos               insns[P] &lt;--- pos
  insns[3]                        insns[P]  `------| delta
  insns[4] &lt;--- target_X          insns[P]   `-----|
  insns[5]                        insns[3]
                                  insns[4] &lt;--- target_X
                                  insns[5]

First case is if we cross pos-boundary and the jump instruction was
before pos. This is handeled correctly. I.e. if i == pos, then this
would mean our jump that we currently check was the patchlet itself
that we just injected. Since such patchlets are self-contained and
have no awareness of any insns before or after the patched one, the
delta is correctly not adjusted. Also, for the second condition in
case of i + insn-&gt;off + 1 == pos, means we jump to that newly patched
instruction, so no offset adjustment are needed. That part is correct.

Second condition (backward jumps):

  Before:                         After:

  insns[0]                        insns[0]
  insns[1] &lt;--- target_X          insns[1] &lt;--- target_X
  insns[2] &lt;--- pos &lt;-- target_Y  insns[P] &lt;--- pos &lt;-- target_Y
  insns[3]                        insns[P]  `------| delta
  insns[4] &lt;--- i/insn            insns[P]   `-----|
  insns[5]                        insns[3]
                                  insns[4] &lt;--- i/insn
                                  insns[5]

Second interesting case is where we cross pos-boundary and the jump
instruction was after pos. Backward jump with i == pos would be
impossible and pose a bug somewhere in the patchlet, so the first
condition checking i &gt; pos is okay only by itself. However, i +
insn-&gt;off + 1 &lt; pos does not always work as intended to trigger the
adjustment. It works when jump targets would be far off where the
delta wouldn't matter. But, for example, where the fixed insn-&gt;off
before pointed to pos (target_Y), it now points to pos + delta, so
that additional room needs to be taken into account for the check.
This means that i) both tests here need to be adjusted into pos + delta,
and ii) for the second condition, the test needs to be &lt;= as pos
itself can be a target in the backjump, too.

Fixes: 9bac3d6d548e ("bpf: allow extended BPF programs access skb fields")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf/bpf: Convert perf_event_array to use struct file</title>
<updated>2016-01-29T07:35:25+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>alexei.starovoitov@gmail.com</email>
</author>
<published>2016-01-26T04:59:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e03e7ee34fdd1c3ef494949a75cb8c61c7265fa9'/>
<id>e03e7ee34fdd1c3ef494949a75cb8c61c7265fa9</id>
<content type='text'>
Robustify refcounting.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@infradead.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Cc: Wang Nan &lt;wangnan0@huawei.com&gt;
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160126045947.GA40151@ast-mbp.thefacebook.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Robustify refcounting.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@infradead.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Cc: Wang Nan &lt;wangnan0@huawei.com&gt;
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160126045947.GA40151@ast-mbp.thefacebook.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: bpf: reject invalid shifts</title>
<updated>2016-01-12T22:06:53+00:00</updated>
<author>
<name>Rabin Vincent</name>
<email>rabin@rab.in</email>
</author>
<published>2016-01-12T19:17:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=229394e8e62a4191d592842cf67e80c62a492937'/>
<id>229394e8e62a4191d592842cf67e80c62a492937</id>
<content type='text'>
On ARM64, a BUG() is triggered in the eBPF JIT if a filter with a
constant shift that can't be encoded in the immediate field of the
UBFM/SBFM instructions is passed to the JIT.  Since these shifts
amounts, which are negative or &gt;= regsize, are invalid, reject them in
the eBPF verifier and the classic BPF filter checker, for all
architectures.

Signed-off-by: Rabin Vincent &lt;rabin@rab.in&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On ARM64, a BUG() is triggered in the eBPF JIT if a filter with a
constant shift that can't be encoded in the immediate field of the
UBFM/SBFM instructions is passed to the JIT.  Since these shifts
amounts, which are negative or &gt;= regsize, are invalid, reject them in
the eBPF verifier and the classic BPF filter checker, for all
architectures.

Signed-off-by: Rabin Vincent &lt;rabin@rab.in&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: hash: use per-bucket spinlock</title>
<updated>2015-12-29T20:13:44+00:00</updated>
<author>
<name>tom.leiming@gmail.com</name>
<email>tom.leiming@gmail.com</email>
</author>
<published>2015-12-29T14:40:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=688ecfe60220516e8b6707c832ec02e92522dd85'/>
<id>688ecfe60220516e8b6707c832ec02e92522dd85</id>
<content type='text'>
Both htab_map_update_elem() and htab_map_delete_elem() can be
called from eBPF program, and they may be in kernel hot path,
so it isn't efficient to use a per-hashtable lock in this two
helpers.

The per-hashtable spinlock is used for protecting bucket's
hlist, and per-bucket lock is just enough. This patch converts
the per-hashtable lock into per-bucket spinlock, so that
contention can be decreased a lot.

Signed-off-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Both htab_map_update_elem() and htab_map_delete_elem() can be
called from eBPF program, and they may be in kernel hot path,
so it isn't efficient to use a per-hashtable lock in this two
helpers.

The per-hashtable spinlock is used for protecting bucket's
hlist, and per-bucket lock is just enough. This patch converts
the per-hashtable lock into per-bucket spinlock, so that
contention can be decreased a lot.

Signed-off-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: hash: move select_bucket() out of htab's spinlock</title>
<updated>2015-12-29T20:13:44+00:00</updated>
<author>
<name>tom.leiming@gmail.com</name>
<email>tom.leiming@gmail.com</email>
</author>
<published>2015-12-29T14:40:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=45d8390c56bd2851097736c1c20ad958880168df'/>
<id>45d8390c56bd2851097736c1c20ad958880168df</id>
<content type='text'>
The spinlock is just used for protecting the per-bucket
hlist, so it isn't needed for selecting bucket.

Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The spinlock is just used for protecting the per-bucket
hlist, so it isn't needed for selecting bucket.

Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: hash: use atomic count</title>
<updated>2015-12-29T20:13:43+00:00</updated>
<author>
<name>tom.leiming@gmail.com</name>
<email>tom.leiming@gmail.com</email>
</author>
<published>2015-12-29T14:40:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6591f1e6662dd595effb52a54e42a6d2d2b03e51'/>
<id>6591f1e6662dd595effb52a54e42a6d2d2b03e51</id>
<content type='text'>
Preparing for removing global per-hashtable lock, so
the counter need to be defined as aotmic_t first.

Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Preparing for removing global per-hashtable lock, so
the counter need to be defined as aotmic_t first.

Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: move clearing of A/X into classic to eBPF migration prologue</title>
<updated>2015-12-18T21:04:51+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-12-17T22:51:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8b614aebecdf2b1f72d51b1527f5a75d218b78e2'/>
<id>8b614aebecdf2b1f72d51b1527f5a75d218b78e2</id>
<content type='text'>
Back in the days where eBPF (or back then "internal BPF" ;-&gt;) was not
exposed to user space, and only the classic BPF programs internally
translated into eBPF programs, we missed the fact that for classic BPF
A and X needed to be cleared. It was fixed back then via 83d5b7ef99c9
("net: filter: initialize A and X registers"), and thus classic BPF
specifics were added to the eBPF interpreter core to work around it.

This added some confusion for JIT developers later on that take the
eBPF interpreter code as an example for deriving their JIT. F.e. in
f75298f5c3fe ("s390/bpf: clear correct BPF accumulator register"), at
least X could leak stack memory. Furthermore, since this is only needed
for classic BPF translations and not for eBPF (verifier takes care
that read access to regs cannot be done uninitialized), more complexity
is added to JITs as they need to determine whether they deal with
migrations or native eBPF where they can just omit clearing A/X in
their prologue and thus reduce image size a bit, see f.e. cde66c2d88da
("s390/bpf: Only clear A and X for converted BPF programs"). In other
cases (x86, arm64), A and X is being cleared in the prologue also for
eBPF case, which is unnecessary.

Lets move this into the BPF migration in bpf_convert_filter() where it
actually belongs as long as the number of eBPF JITs are still few. It
can thus be done generically; allowing us to remove the quirk from
__bpf_prog_run() and to slightly reduce JIT image size in case of eBPF,
while reducing code duplication on this matter in current(/future) eBPF
JITs.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Reviewed-by: Michael Holzheu &lt;holzheu@linux.vnet.ibm.com&gt;
Tested-by: Michael Holzheu &lt;holzheu@linux.vnet.ibm.com&gt;
Cc: Zi Shen Lim &lt;zlim.lnx@gmail.com&gt;
Cc: Yang Shi &lt;yang.shi@linaro.org&gt;
Acked-by: Yang Shi &lt;yang.shi@linaro.org&gt;
Acked-by: Zi Shen Lim &lt;zlim.lnx@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Back in the days where eBPF (or back then "internal BPF" ;-&gt;) was not
exposed to user space, and only the classic BPF programs internally
translated into eBPF programs, we missed the fact that for classic BPF
A and X needed to be cleared. It was fixed back then via 83d5b7ef99c9
("net: filter: initialize A and X registers"), and thus classic BPF
specifics were added to the eBPF interpreter core to work around it.

This added some confusion for JIT developers later on that take the
eBPF interpreter code as an example for deriving their JIT. F.e. in
f75298f5c3fe ("s390/bpf: clear correct BPF accumulator register"), at
least X could leak stack memory. Furthermore, since this is only needed
for classic BPF translations and not for eBPF (verifier takes care
that read access to regs cannot be done uninitialized), more complexity
is added to JITs as they need to determine whether they deal with
migrations or native eBPF where they can just omit clearing A/X in
their prologue and thus reduce image size a bit, see f.e. cde66c2d88da
("s390/bpf: Only clear A and X for converted BPF programs"). In other
cases (x86, arm64), A and X is being cleared in the prologue also for
eBPF case, which is unnecessary.

Lets move this into the BPF migration in bpf_convert_filter() where it
actually belongs as long as the number of eBPF JITs are still few. It
can thus be done generically; allowing us to remove the quirk from
__bpf_prog_run() and to slightly reduce JIT image size in case of eBPF,
while reducing code duplication on this matter in current(/future) eBPF
JITs.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Reviewed-by: Michael Holzheu &lt;holzheu@linux.vnet.ibm.com&gt;
Tested-by: Michael Holzheu &lt;holzheu@linux.vnet.ibm.com&gt;
Cc: Zi Shen Lim &lt;zlim.lnx@gmail.com&gt;
Cc: Yang Shi &lt;yang.shi@linaro.org&gt;
Acked-by: Yang Shi &lt;yang.shi@linaro.org&gt;
Acked-by: Zi Shen Lim &lt;zlim.lnx@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf, inode: allow for rename and link ops</title>
<updated>2015-12-12T23:44:23+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-12-10T21:33:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bb35a6ef7da492e7df1fe8772716ff88c172b4cc'/>
<id>bb35a6ef7da492e7df1fe8772716ff88c172b4cc</id>
<content type='text'>
Add support for renaming and hard links to the fs. Most of this can be
implemented by using simple library operations under the same constraints
that we don't use a reserved name like elsewhere. Linking can be useful
to share/manage things like maps across subsystem users. It works within
the file system boundary, but is not allowed for directories.

Symbolic links are explicitly not implemented here, as it can be better
done already by doing bind mounts inside bpf fs to set up shared directories
f.e. useful when using volumes in docker containers that map a private
working directory into /sys/fs/bpf/ which contains itself a bind mounted
path from the host's /sys/fs/bpf/ mount that is shared among multiple
containers. For single maps instead of whole directory, hard links can
be easily used to do the same.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add support for renaming and hard links to the fs. Most of this can be
implemented by using simple library operations under the same constraints
that we don't use a reserved name like elsewhere. Linking can be useful
to share/manage things like maps across subsystem users. It works within
the file system boundary, but is not allowed for directories.

Symbolic links are explicitly not implemented here, as it can be better
done already by doing bind mounts inside bpf fs to set up shared directories
f.e. useful when using volumes in docker containers that map a private
working directory into /sys/fs/bpf/ which contains itself a bind mounted
path from the host's /sys/fs/bpf/ mount that is shared among multiple
containers. For single maps instead of whole directory, hard links can
be easily used to do the same.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-12-04T02:09:12+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-12-04T02:03:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f188b951f33a0464338f94f928338f84fc0e4392'/>
<id>f188b951f33a0464338f94f928338f84fc0e4392</id>
<content type='text'>
Conflicts:
	drivers/net/ethernet/renesas/ravb_main.c
	kernel/bpf/syscall.c
	net/ipv4/ipmr.c

All three conflicts were cases of overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	drivers/net/ethernet/renesas/ravb_main.c
	kernel/bpf/syscall.c
	net/ipv4/ipmr.c

All three conflicts were cases of overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: fix allocation warnings in bpf maps and integer overflow</title>
<updated>2015-12-03T04:36:00+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2015-11-30T00:59:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=01b3f52157ff5a47d6d8d796f396a4b34a53c61d'/>
<id>01b3f52157ff5a47d6d8d796f396a4b34a53c61d</id>
<content type='text'>
For large map-&gt;value_size the user space can trigger memory allocation warnings like:
WARNING: CPU: 2 PID: 11122 at mm/page_alloc.c:2989
__alloc_pages_nodemask+0x695/0x14e0()
Call Trace:
 [&lt;     inline     &gt;] __dump_stack lib/dump_stack.c:15
 [&lt;ffffffff82743b56&gt;] dump_stack+0x68/0x92 lib/dump_stack.c:50
 [&lt;ffffffff81244ec9&gt;] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
 [&lt;ffffffff812450f9&gt;] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
 [&lt;     inline     &gt;] __alloc_pages_slowpath mm/page_alloc.c:2989
 [&lt;ffffffff81554e95&gt;] __alloc_pages_nodemask+0x695/0x14e0 mm/page_alloc.c:3235
 [&lt;ffffffff816188fe&gt;] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055
 [&lt;     inline     &gt;] alloc_pages include/linux/gfp.h:451
 [&lt;ffffffff81550706&gt;] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414
 [&lt;ffffffff815a1c89&gt;] kmalloc_order+0x19/0x60 mm/slab_common.c:1007
 [&lt;ffffffff815a1cef&gt;] kmalloc_order_trace+0x1f/0xa0 mm/slab_common.c:1018
 [&lt;     inline     &gt;] kmalloc_large include/linux/slab.h:390
 [&lt;ffffffff81627784&gt;] __kmalloc+0x234/0x250 mm/slub.c:3525
 [&lt;     inline     &gt;] kmalloc include/linux/slab.h:463
 [&lt;     inline     &gt;] map_update_elem kernel/bpf/syscall.c:288
 [&lt;     inline     &gt;] SYSC_bpf kernel/bpf/syscall.c:744

To avoid never succeeding kmalloc with order &gt;= MAX_ORDER check that
elem-&gt;value_size and computed elem_size are within limits for both hash and
array type maps.
Also add __GFP_NOWARN to kmalloc(value_size | elem_size) to avoid OOM warnings.
Note kmalloc(key_size) is highly unlikely to trigger OOM, since key_size &lt;= 512,
so keep those kmalloc-s as-is.

Large value_size can cause integer overflows in elem_size and map.pages
formulas, so check for that as well.

Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For large map-&gt;value_size the user space can trigger memory allocation warnings like:
WARNING: CPU: 2 PID: 11122 at mm/page_alloc.c:2989
__alloc_pages_nodemask+0x695/0x14e0()
Call Trace:
 [&lt;     inline     &gt;] __dump_stack lib/dump_stack.c:15
 [&lt;ffffffff82743b56&gt;] dump_stack+0x68/0x92 lib/dump_stack.c:50
 [&lt;ffffffff81244ec9&gt;] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
 [&lt;ffffffff812450f9&gt;] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
 [&lt;     inline     &gt;] __alloc_pages_slowpath mm/page_alloc.c:2989
 [&lt;ffffffff81554e95&gt;] __alloc_pages_nodemask+0x695/0x14e0 mm/page_alloc.c:3235
 [&lt;ffffffff816188fe&gt;] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055
 [&lt;     inline     &gt;] alloc_pages include/linux/gfp.h:451
 [&lt;ffffffff81550706&gt;] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414
 [&lt;ffffffff815a1c89&gt;] kmalloc_order+0x19/0x60 mm/slab_common.c:1007
 [&lt;ffffffff815a1cef&gt;] kmalloc_order_trace+0x1f/0xa0 mm/slab_common.c:1018
 [&lt;     inline     &gt;] kmalloc_large include/linux/slab.h:390
 [&lt;ffffffff81627784&gt;] __kmalloc+0x234/0x250 mm/slub.c:3525
 [&lt;     inline     &gt;] kmalloc include/linux/slab.h:463
 [&lt;     inline     &gt;] map_update_elem kernel/bpf/syscall.c:288
 [&lt;     inline     &gt;] SYSC_bpf kernel/bpf/syscall.c:744

To avoid never succeeding kmalloc with order &gt;= MAX_ORDER check that
elem-&gt;value_size and computed elem_size are within limits for both hash and
array type maps.
Also add __GFP_NOWARN to kmalloc(value_size | elem_size) to avoid OOM warnings.
Note kmalloc(key_size) is highly unlikely to trigger OOM, since key_size &lt;= 512,
so keep those kmalloc-s as-is.

Large value_size can cause integer overflows in elem_size and map.pages
formulas, so check for that as well.

Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
