<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/binfmts.h, branch v4.14-rc7</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>exec: load_script: kill the onstack interp[BINPRM_BUF_SIZE] array</title>
<updated>2017-10-04T00:54:25+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2017-10-03T23:15:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c2315c187fa0d3ab363fdebe22718170b40473e3'/>
<id>c2315c187fa0d3ab363fdebe22718170b40473e3</id>
<content type='text'>
Patch series "exec: binfmt_misc: fix use-after-free, kill
iname[BINPRM_BUF_SIZE]".

It looks like this code was always wrong, then commit 948b701a607f
("binfmt_misc: add persistent opened binary handler for containers")
added more problems.

This patch (of 6):

load_script() can simply use i_name instead, it points into bprm-&gt;buf[]
and nobody can change this memory until we call prepare_binprm().

The only complication is that we need to also change the signature of
bprm_change_interp() but this change looks good too.

While at it, do whitespace/style cleanups.

NOTE: the real motivation for this change is that people want to
increase BINPRM_BUF_SIZE, we need to change load_misc_binary() too but
this looks more complicated because afaics it is very buggy.

Link: http://lkml.kernel.org/r/20170918163446.GA26793@redhat.com
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Travis Gummels &lt;tgummels@redhat.com&gt;
Cc: Ben Woodard &lt;woodard@redhat.com&gt;
Cc: Jim Foraker &lt;foraker1@llnl.gov&gt;
Cc: &lt;tdhooge@llnl.gov&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: James Bottomley &lt;James.Bottomley@HansenPartnership.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "exec: binfmt_misc: fix use-after-free, kill
iname[BINPRM_BUF_SIZE]".

It looks like this code was always wrong, then commit 948b701a607f
("binfmt_misc: add persistent opened binary handler for containers")
added more problems.

This patch (of 6):

load_script() can simply use i_name instead, it points into bprm-&gt;buf[]
and nobody can change this memory until we call prepare_binprm().

The only complication is that we need to also change the signature of
bprm_change_interp() but this change looks good too.

While at it, do whitespace/style cleanups.

NOTE: the real motivation for this change is that people want to
increase BINPRM_BUF_SIZE, we need to change load_misc_binary() too but
this looks more complicated because afaics it is very buggy.

Link: http://lkml.kernel.org/r/20170918163446.GA26793@redhat.com
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Travis Gummels &lt;tgummels@redhat.com&gt;
Cc: Ben Woodard &lt;woodard@redhat.com&gt;
Cc: Jim Foraker &lt;foraker1@llnl.gov&gt;
Cc: &lt;tdhooge@llnl.gov&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: James Bottomley &lt;James.Bottomley@HansenPartnership.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>commoncap: Move cap_elevated calculation into bprm_set_creds</title>
<updated>2017-08-01T19:03:09+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2017-07-18T22:25:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ee67ae7ef6ff499137292ac8a9dfe86096796283'/>
<id>ee67ae7ef6ff499137292ac8a9dfe86096796283</id>
<content type='text'>
Instead of a separate function, open-code the cap_elevated test, which
lets us entirely remove bprm-&gt;cap_effective (to use the local "effective"
variable instead), and more accurately examine euid/egid changes via the
existing local "is_setid".

The following LTP tests were run to validate the changes:

	# ./runltp -f syscalls -s cap
	# ./runltp -f securebits
	# ./runltp -f cap_bounds
	# ./runltp -f filecaps

All kernel selftests for capabilities and exec continue to pass as well.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: James Morris &lt;james.l.morris@oracle.com&gt;
Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
Reviewed-by: Andy Lutomirski &lt;luto@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of a separate function, open-code the cap_elevated test, which
lets us entirely remove bprm-&gt;cap_effective (to use the local "effective"
variable instead), and more accurately examine euid/egid changes via the
existing local "is_setid".

The following LTP tests were run to validate the changes:

	# ./runltp -f syscalls -s cap
	# ./runltp -f securebits
	# ./runltp -f cap_bounds
	# ./runltp -f filecaps

All kernel selftests for capabilities and exec continue to pass as well.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: James Morris &lt;james.l.morris@oracle.com&gt;
Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
Reviewed-by: Andy Lutomirski &lt;luto@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>commoncap: Refactor to remove bprm_secureexec hook</title>
<updated>2017-08-01T19:03:08+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2017-07-18T22:25:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=46d98eb4e1d2bc225f661879e0e157a952107598'/>
<id>46d98eb4e1d2bc225f661879e0e157a952107598</id>
<content type='text'>
The commoncap implementation of the bprm_secureexec hook is the only LSM
that depends on the final call to its bprm_set_creds hook (since it may
be called for multiple files, it ignores bprm-&gt;called_set_creds). As a
result, it cannot safely _clear_ bprm-&gt;secureexec since other LSMs may
have set it.  Instead, remove the bprm_secureexec hook by introducing a
new flag to bprm specific to commoncap: cap_elevated. This is similar to
cap_effective, but that is used for a specific subset of elevated
privileges, and exists solely to track state from bprm_set_creds to
bprm_secureexec. As such, it will be removed in the next patch.

Here, set the new bprm-&gt;cap_elevated flag when setuid/setgid has happened
from bprm_fill_uid() or fscapabilities have been prepared. This temporarily
moves the bprm_secureexec hook to a static inline. The helper will be
removed in the next patch; this makes the step easier to review and bisect,
since this does not introduce any changes to inputs nor outputs to the
"elevated privileges" calculation.

The new flag is merged with the bprm-&gt;secureexec flag in setup_new_exec()
since this marks the end of any further prepare_binprm() calls.

Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Acked-by: James Morris &lt;james.l.morris@oracle.com&gt;
Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The commoncap implementation of the bprm_secureexec hook is the only LSM
that depends on the final call to its bprm_set_creds hook (since it may
be called for multiple files, it ignores bprm-&gt;called_set_creds). As a
result, it cannot safely _clear_ bprm-&gt;secureexec since other LSMs may
have set it.  Instead, remove the bprm_secureexec hook by introducing a
new flag to bprm specific to commoncap: cap_elevated. This is similar to
cap_effective, but that is used for a specific subset of elevated
privileges, and exists solely to track state from bprm_set_creds to
bprm_secureexec. As such, it will be removed in the next patch.

Here, set the new bprm-&gt;cap_elevated flag when setuid/setgid has happened
from bprm_fill_uid() or fscapabilities have been prepared. This temporarily
moves the bprm_secureexec hook to a static inline. The helper will be
removed in the next patch; this makes the step easier to review and bisect,
since this does not introduce any changes to inputs nor outputs to the
"elevated privileges" calculation.

The new flag is merged with the bprm-&gt;secureexec flag in setup_new_exec()
since this marks the end of any further prepare_binprm() calls.

Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Acked-by: James Morris &lt;james.l.morris@oracle.com&gt;
Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>binfmt: Introduce secureexec flag</title>
<updated>2017-08-01T19:03:05+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2017-07-18T22:25:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c425e189ffd7720c881fe9ccd7143cea577f6d03'/>
<id>c425e189ffd7720c881fe9ccd7143cea577f6d03</id>
<content type='text'>
The bprm_secureexec hook can be moved earlier. Right now, it is called
during create_elf_tables(), via load_binary(), via search_binary_handler(),
via exec_binprm(). Nearly all (see exception below) state used by
bprm_secureexec is created during the bprm_set_creds hook, called from
prepare_binprm().

For all LSMs (except commoncaps described next), only the first execution
of bprm_set_creds takes any effect (they all check bprm-&gt;called_set_creds
which prepare_binprm() sets after the first call to the bprm_set_creds
hook).  However, all these LSMs also only do anything with bprm_secureexec
when they detected a secure state during their first run of bprm_set_creds.
Therefore, it is functionally identical to move the detection into
bprm_set_creds, since the results from secureexec here only need to be
based on the first call to the LSM's bprm_set_creds hook.

The single exception is that the commoncaps secureexec hook also examines
euid/uid and egid/gid differences which are controlled by bprm_fill_uid(),
via prepare_binprm(), which can be called multiple times (e.g.
binfmt_script, binfmt_misc), and may clear the euid/egid for the final
load (i.e. the script interpreter). However, while commoncaps specifically
ignores bprm-&gt;cred_prepared, and runs its bprm_set_creds hook each time
prepare_binprm() may get called, it needs to base the secureexec decision
on the final call to bprm_set_creds. As a result, it will need special
handling.

To begin this refactoring, this adds the secureexec flag to the bprm
struct, and calls the secureexec hook during setup_new_exec(). This is
safe since all the cred work is finished (and past the point of no return).
This explicit call will be removed in later patches once the hook has been
removed.

Cc: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: John Johansen &lt;john.johansen@canonical.com&gt;
Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
Reviewed-by: James Morris &lt;james.l.morris@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The bprm_secureexec hook can be moved earlier. Right now, it is called
during create_elf_tables(), via load_binary(), via search_binary_handler(),
via exec_binprm(). Nearly all (see exception below) state used by
bprm_secureexec is created during the bprm_set_creds hook, called from
prepare_binprm().

For all LSMs (except commoncaps described next), only the first execution
of bprm_set_creds takes any effect (they all check bprm-&gt;called_set_creds
which prepare_binprm() sets after the first call to the bprm_set_creds
hook).  However, all these LSMs also only do anything with bprm_secureexec
when they detected a secure state during their first run of bprm_set_creds.
Therefore, it is functionally identical to move the detection into
bprm_set_creds, since the results from secureexec here only need to be
based on the first call to the LSM's bprm_set_creds hook.

The single exception is that the commoncaps secureexec hook also examines
euid/uid and egid/gid differences which are controlled by bprm_fill_uid(),
via prepare_binprm(), which can be called multiple times (e.g.
binfmt_script, binfmt_misc), and may clear the euid/egid for the final
load (i.e. the script interpreter). However, while commoncaps specifically
ignores bprm-&gt;cred_prepared, and runs its bprm_set_creds hook each time
prepare_binprm() may get called, it needs to base the secureexec decision
on the final call to bprm_set_creds. As a result, it will need special
handling.

To begin this refactoring, this adds the secureexec flag to the bprm
struct, and calls the secureexec hook during setup_new_exec(). This is
safe since all the cred work is finished (and past the point of no return).
This explicit call will be removed in later patches once the hook has been
removed.

Cc: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Reviewed-by: John Johansen &lt;john.johansen@canonical.com&gt;
Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
Reviewed-by: James Morris &lt;james.l.morris@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>exec: Rename bprm-&gt;cred_prepared to called_set_creds</title>
<updated>2017-08-01T19:02:48+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2017-07-18T22:25:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ddb4a1442def2a78b91a85b4251fb712ef23662b'/>
<id>ddb4a1442def2a78b91a85b4251fb712ef23662b</id>
<content type='text'>
The cred_prepared bprm flag has a misleading name. It has nothing to do
with the bprm_prepare_cred hook, and actually tracks if bprm_set_creds has
been called. Rename this flag and improve its comment.

Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Cc: Casey Schaufler &lt;casey@schaufler-ca.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: John Johansen &lt;john.johansen@canonical.com&gt;
Acked-by: James Morris &lt;james.l.morris@oracle.com&gt;
Acked-by: Paul Moore &lt;paul@paul-moore.com&gt;
Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The cred_prepared bprm flag has a misleading name. It has nothing to do
with the bprm_prepare_cred hook, and actually tracks if bprm_set_creds has
been called. Rename this flag and improve its comment.

Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Cc: Casey Schaufler &lt;casey@schaufler-ca.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: John Johansen &lt;john.johansen@canonical.com&gt;
Acked-by: James Morris &lt;james.l.morris@oracle.com&gt;
Acked-by: Paul Moore &lt;paul@paul-moore.com&gt;
Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>randstruct: Mark various structs for randomization</title>
<updated>2017-06-30T19:00:51+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2016-10-28T08:22:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3859a271a003aba01e45b85c9d8b355eb7bf25f9'/>
<id>3859a271a003aba01e45b85c9d8b355eb7bf25f9</id>
<content type='text'>
This marks many critical kernel structures for randomization. These are
structures that have been targeted in the past in security exploits, or
contain functions pointers, pointers to function pointer tables, lists,
workqueues, ref-counters, credentials, permissions, or are otherwise
sensitive. This initial list was extracted from Brad Spengler/PaX Team's
code in the last public patch of grsecurity/PaX based on my understanding
of the code. Changes or omissions from the original code are mine and
don't reflect the original grsecurity/PaX code.

Left out of this list is task_struct, which requires special handling
and will be covered in a subsequent patch.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This marks many critical kernel structures for randomization. These are
structures that have been targeted in the past in security exploits, or
contain functions pointers, pointers to function pointer tables, lists,
workqueues, ref-counters, credentials, permissions, or are otherwise
sensitive. This initial list was extracted from Brad Spengler/PaX Team's
code in the last public patch of grsecurity/PaX based on my understanding
of the code. Changes or omissions from the original code are mine and
don't reflect the original grsecurity/PaX code.

Left out of this list is task_struct, which requires special handling
and will be covered in a subsequent patch.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/headers, vfs/execve: Move the do_execve*() prototypes from &lt;linux/sched.h&gt; to &lt;linux/binfmts.h&gt;</title>
<updated>2017-03-03T00:45:23+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-02-05T13:24:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=28851755990c832d7050d5e1e2c91ed9d9e6fe59'/>
<id>28851755990c832d7050d5e1e2c91ed9d9e6fe59</id>
<content type='text'>
These are not scheduler related.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These are not scheduler related.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>elf_fdpic_transfer_args_to_stack(): make it generic</title>
<updated>2016-07-25T06:51:49+00:00</updated>
<author>
<name>Nicolas Pitre</name>
<email>nicolas.pitre@linaro.org</email>
</author>
<published>2016-07-24T15:30:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7e7ec6a934349ef6983f06f7ac0da09cc8a42983'/>
<id>7e7ec6a934349ef6983f06f7ac0da09cc8a42983</id>
<content type='text'>
This copying of arguments and environment is common to both NOMMU
binary formats we support. Let's make the elf_fdpic version available
to the flat format as well.

While at it, improve the code a bit not to copy below the actual
data area.

Signed-off-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Reviewed-by: Greg Ungerer &lt;gerg@linux-m68k.org&gt;
Signed-off-by: Greg Ungerer &lt;gerg@linux-m68k.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This copying of arguments and environment is common to both NOMMU
binary formats we support. Let's make the elf_fdpic version available
to the flat format as well.

While at it, improve the code a bit not to copy below the actual
data area.

Signed-off-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Reviewed-by: Greg Ungerer &lt;gerg@linux-m68k.org&gt;
Signed-off-by: Greg Ungerer &lt;gerg@linux-m68k.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>coredump: fix dumping through pipes</title>
<updated>2016-06-08T02:07:09+00:00</updated>
<author>
<name>Mateusz Guzik</name>
<email>mguzik@redhat.com</email>
</author>
<published>2016-06-05T21:14:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1607f09c226d1378439c411baaaa020042750338'/>
<id>1607f09c226d1378439c411baaaa020042750338</id>
<content type='text'>
The offset in the core file used to be tracked with -&gt;written field of
the coredump_params structure. The field was retired in favour of
file-&gt;f_pos.

However, -&gt;f_pos is not maintained for pipes which leads to breakage.

Restore explicit tracking of the offset in coredump_params. Introduce
-&gt;pos field for this purpose since -&gt;written was already reused.

Fixes: a00839395103 ("get rid of coredump_params-&gt;written").

Reported-by: Zbigniew Jędrzejewski-Szmek &lt;zbyszek@in.waw.pl&gt;
Signed-off-by: Mateusz Guzik &lt;mguzik@redhat.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The offset in the core file used to be tracked with -&gt;written field of
the coredump_params structure. The field was retired in favour of
file-&gt;f_pos.

However, -&gt;f_pos is not maintained for pipes which leads to breakage.

Restore explicit tracking of the offset in coredump_params. Introduce
-&gt;pos field for this purpose since -&gt;written was already reused.

Fixes: a00839395103 ("get rid of coredump_params-&gt;written").

Reported-by: Zbigniew Jędrzejewski-Szmek &lt;zbyszek@in.waw.pl&gt;
Signed-off-by: Mateusz Guzik &lt;mguzik@redhat.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>syscalls: implement execveat() system call</title>
<updated>2014-12-13T20:42:51+00:00</updated>
<author>
<name>David Drysdale</name>
<email>drysdale@google.com</email>
</author>
<published>2014-12-13T00:57:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=51f39a1f0cea1cacf8c787f652f26dfee9611874'/>
<id>51f39a1f0cea1cacf8c787f652f26dfee9611874</id>
<content type='text'>
This patchset adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc filesystem,
at least for executables (rather than scripts).  The current glibc version
of fexecve(3) is implemented via /proc, which causes problems in sandboxed
or otherwise restricted environments.

Given the desire for a /proc-free fexecve() implementation, HPA suggested
(https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
an appropriate generalization.

Also, having a new syscall means that it can take a flags argument without
back-compatibility concerns.  The current implementation just defines the
AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
added in future -- for example, flags for new namespaces (as suggested at
https://lkml.org/lkml/2006/7/11/474).

Related history:
 - https://lkml.org/lkml/2006/12/27/123 is an example of someone
   realizing that fexecve() is likely to fail in a chroot environment.
 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
   documenting the /proc requirement of fexecve(3) in its manpage, to
   "prevent other people from wasting their time".
 - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
   problem where a process that did setuid() could not fexecve()
   because it no longer had access to /proc/self/fd; this has since
   been fixed.

This patch (of 4):

Add a new execveat(2) system call.  execveat() is to execve() as openat()
is to open(): it takes a file descriptor that refers to a directory, and
resolves the filename relative to that.

In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers.  This
replicates the functionality of fexecve(), which is a system call in other
UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/&lt;fd&gt;" (and
so relies on /proc being mounted).

The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/&lt;fd&gt;"
(for an empty filename) or "/dev/fd/&lt;fd&gt;/&lt;filename&gt;", effectively
reflecting how the executable was found.  This does however mean that
execution of a script in a /proc-less environment won't work; also, script
execution via an O_CLOEXEC file descriptor fails (as the file will not be
accessible after exec).

Based on patches by Meredydd Luff.

Signed-off-by: David Drysdale &lt;drysdale@google.com&gt;
Cc: Meredydd Luff &lt;meredydd@senatehouse.org&gt;
Cc: Shuah Khan &lt;shuah.kh@samsung.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Rich Felker &lt;dalias@aerifal.cx&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patchset adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc filesystem,
at least for executables (rather than scripts).  The current glibc version
of fexecve(3) is implemented via /proc, which causes problems in sandboxed
or otherwise restricted environments.

Given the desire for a /proc-free fexecve() implementation, HPA suggested
(https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
an appropriate generalization.

Also, having a new syscall means that it can take a flags argument without
back-compatibility concerns.  The current implementation just defines the
AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
added in future -- for example, flags for new namespaces (as suggested at
https://lkml.org/lkml/2006/7/11/474).

Related history:
 - https://lkml.org/lkml/2006/12/27/123 is an example of someone
   realizing that fexecve() is likely to fail in a chroot environment.
 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
   documenting the /proc requirement of fexecve(3) in its manpage, to
   "prevent other people from wasting their time".
 - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
   problem where a process that did setuid() could not fexecve()
   because it no longer had access to /proc/self/fd; this has since
   been fixed.

This patch (of 4):

Add a new execveat(2) system call.  execveat() is to execve() as openat()
is to open(): it takes a file descriptor that refers to a directory, and
resolves the filename relative to that.

In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers.  This
replicates the functionality of fexecve(), which is a system call in other
UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/&lt;fd&gt;" (and
so relies on /proc being mounted).

The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/&lt;fd&gt;"
(for an empty filename) or "/dev/fd/&lt;fd&gt;/&lt;filename&gt;", effectively
reflecting how the executable was found.  This does however mean that
execution of a script in a /proc-less environment won't work; also, script
execution via an O_CLOEXEC file descriptor fails (as the file will not be
accessible after exec).

Based on patches by Meredydd Luff.

Signed-off-by: David Drysdale &lt;drysdale@google.com&gt;
Cc: Meredydd Luff &lt;meredydd@senatehouse.org&gt;
Cc: Shuah Khan &lt;shuah.kh@samsung.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Rich Felker &lt;dalias@aerifal.cx&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
