linux-toradex.git/fs/exec.c, branch v2.6.33.9

install_special_mapping skips security_file_mmap check.

2011-03-21T19:44:29+00:00

commit 462e635e5b73ba9a4c03913b77138cd57ce4b050 upstream.

The install_special_mapping routine (used, for example, to setup the
vdso) skips the security check before insert_vm_struct, allowing a local
attacker to bypass the mmap_min_addr security restriction by limiting
the available pages for special mappings.

bprm_mm_init() also skips the check, and although I don't think this can
be used to bypass any restrictions, I don't see any reason not to have
the security check.

  $ uname -m
  x86_64
  $ cat /proc/sys/vm/mmap_min_addr
  65536
  $ cat install_special_mapping.s
  section .bss
      resb BSS_SIZE
  section .text
      global _start
      _start:
          mov     eax, __NR_pause
          int     0x80
  $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
  $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
  $ ./install_special_mapping &
  [1] 14303
  $ cat /proc/14303/maps
  0000f000-00010000 r-xp 00000000 00:00 0                                  [vdso]
  00010000-00011000 r-xp 00001000 00:19 2453665                            /home/taviso/install_special_mapping
  00011000-ffffe000 rwxp 00000000 00:00 0                                  [stack]

It's worth noting that Red Hat are shipping with mmap_min_addr set to
4096.

Signed-off-by: Tavis Ormandy 
Acked-by: Kees Cook 
Acked-by: Robert Swiecki 
[ Changed to not drop the error code - akpm ]
Reviewed-by: James Morris 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

execve: make responsive to SIGKILL with large arguments

2011-03-21T19:43:26+00:00

commit 9aea5a65aa7a1af9a4236dfaeb0088f1624f9919 upstream.

An execve with a very large total of argument/environment strings
can take a really long time in the execve system call.  It runs
uninterruptibly to count and copy all the strings.  This change
makes it abort the exec quickly if sent a SIGKILL.

Note that this is the conservative change, to interrupt only for
SIGKILL, by using fatal_signal_pending().  It would be perfectly
correct semantics to let any signal interrupt the string-copying in
execve, i.e. use signal_pending() instead of fatal_signal_pending().
We'll save that change for later, since it could have user-visible
consequences, such as having a timer set too quickly make it so that
an execve can never complete, though it always happened to work before.

Signed-off-by: Roland McGrath 
Reviewed-by: KOSAKI Motohiro 
Cc: Chuck Ebbert 
Signed-off-by: Linus Torvalds

execve: improve interactivity with large arguments

2011-03-21T19:43:26+00:00

commit 7993bc1f4663c0db67bb8f0d98e6678145b387cd upstream.

This adds a preemption point during the copying of the argument and
environment strings for execve, in copy_strings().  There is already
a preemption point in the count() loop, so this doesn't add any new
points in the abstract sense.

When the total argument+environment strings are very large, the time
spent copying them can be much more than a normal user time slice.
So this change improves the interactivity of the rest of the system
when one process is doing an execve with very large arguments.

Signed-off-by: Roland McGrath 
Reviewed-by: KOSAKI Motohiro 
Signed-off-by: Linus Torvalds 
Cc: Chuck Ebbert 
Signed-off-by: Greg Kroah-Hartman

setup_arg_pages: diagnose excessive argument size

2011-03-21T19:43:25+00:00

commit 1b528181b2ffa14721fb28ad1bd539fe1732c583 upstream.

The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not
check the size of the argument/environment area on the stack.
When it is unworkably large, shift_arg_pages() hits its BUG_ON.
This is exploitable with a very large RLIMIT_STACK limit, to
create a crash pretty easily.

Check that the initial stack is not too large to make it possible
to map in any executable.  We're not checking that the actual
executable (or intepreter, for binfmt_elf) will fit.  So those
mappings might clobber part of the initial stack mapping.  But
that is just userland lossage that userland made happen, not a
kernel problem.

Signed-off-by: Roland McGrath 
Reviewed-by: KOSAKI Motohiro 
Signed-off-by: Linus Torvalds 
Cc: Chuck Ebbert 
Signed-off-by: Greg Kroah-Hartman

revert "procfs: provide stack information for threads" and its fixup commits

2010-05-26T21:32:04+00:00

commit 34441427aab4bdb3069a4ffcda69a99357abcb2e upstream.

Originally, commit d899bf7b ("procfs: provide stack information for
threads") attempted to introduce a new feature for showing where the
threadstack was located and how many pages are being utilized by the
stack.

Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
applied to fix the NO_MMU case.

Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
64-bit") was applied to fix a bug in ia32 executables being loaded.

Commit 9ebd4eba7 ("procfs: fix /proc//stat stack pointer for kernel
threads") was applied to fix a bug which had kernel threads printing a
userland stack address.

Commit 1306d603f ('proc: partially revert "procfs: provide stack
information for threads"') was then applied to revert the stack pages
being used to solve a significant performance regression.

This patch nearly undoes the effect of all these patches.

The reason for reverting these is it provides an unusable value in
field 28.  For x86_64, a fork will result in the task->stack_start
value being updated to the current user top of stack and not the stack
start address.  This unpredictability of the stack_start value makes
it worthless.  That includes the intended use of showing how much stack
space a thread has.

Other architectures will get different values.  As an example, ia64
gets 0.  The do_fork() and copy_process() functions appear to treat the
stack_start and stack_size parameters as architecture specific.

I only partially reverted c44972f1 ("procfs: disable per-task stack usage
on NOMMU") .  If I had completely reverted it, I would have had to change
mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
configured.  Since I could not test the builds without significant effort,
I decided to not change mm/Makefile.

I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
information for threads on 64-bit") .  I left the KSTK_ESP() change in
place as that seemed worthwhile.

Signed-off-by: Robin Holt 
Cc: Stefani Seibold 
Cc: KOSAKI Motohiro 
Cc: Michal Simek 
Cc: Ingo Molnar 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

coredump: suppress uid comparison test if core output files are pipes

2010-04-01T23:01:18+00:00

commit 76595f79d76fbe6267a51b3a866a028d150f06d4 upstream.

Modify uid check in do_coredump so as to not apply it in the case of
pipes.

This just got noticed in testing.  The end of do_coredump validates the
uid of the inode for the created file against the uid of the crashing
process to ensure that no one can pre-create a core file with different
ownership and grab the information contained in the core when they
shouldn' tbe able to.  This causes failures when using pipes for a core
dumps if the crashing process is not root, which is the uid of the pipe
when it is created.

The fix is simple.  Since the check for matching uid's isn't relevant for
pipes (a process can't create a pipe that the uermodehelper code will open
anyway), we can just just skip it in the event ispipe is non-zero

Reverts a pipe-affecting change which was accidentally made in

: commit c46f739dd39db3b07ab5deb4e3ec81e1c04a91af
: Author:     Ingo Molnar 
: AuthorDate: Wed Nov 28 13:59:18 2007 +0100
: Commit:     Linus Torvalds 
: CommitDate: Wed Nov 28 10:58:01 2007 -0800
:
:     vfs: coredumping fix

Signed-off-by: Neil Horman 
Cc: Andi Kleen 
Cc: Oleg Nesterov 
Cc: Alan Cox 
Cc: Al Viro 
Cc: Ingo Molnar 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Cc: maximilian attems 
Signed-off-by: Greg Kroah-Hartman

fs/exec.c: fix initial stack reservation

2010-02-23T03:50:34+00:00

803bf5ec259941936262d10ecc84511b76a20921 ("fs/exec.c: restrict initial
stack space expansion to rlimit") attempts to limit the initial stack to
20*PAGE_SIZE.  Unfortunately, in attempting ensure the stack is not
reduced in size, we ended up not changing the stack at all.

This size reduction check is not necessary as the expand_stack call does
this already.

This caused a regression in UML resulting in most guest processes being
killed.

Signed-off-by: Michael Neuling 
Reviewed-by: KOSAKI Motohiro 
Acked-by: WANG Cong 
Cc: Anton Blanchard 
Cc: Oleg Nesterov 
Cc: James Morris 
Cc: Serge Hallyn 
Cc: Benjamin Herrenschmidt 
Cc: Jouni Malinen 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fs/exec.c: restrict initial stack space expansion to rlimit

2010-02-11T21:59:43+00:00

When reserving stack space for a new process, make sure we're not
attempting to expand the stack by more than rlimit allows.

This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba ("mm:
variable length argument support") and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b ("exec: setup_arg_pages() fails
to return errors").

This bug means that when limiting the stack to less the 20*PAGE_SIZE (eg.
80K on 4K pages or 'ulimit -s 79') all processes will be killed before
they start.  This is particularly bad with 64K pages, where a ulimit below
1280K will kill every process.

To test, do:

  'ulimit -s 15; ls'

before and after the patch is applied.  Before it's applied, 'ls' should
be killed.  After the patch is applied, 'ls' should no longer be killed.

A stack limit of 15KB since it's small enough to trigger 20*PAGE_SIZE.
Also 15KB not a multiple of PAGE_SIZE, which is a trickier case to handle
correctly with this code.

4K pages should be fine to test with.

[kosaki.motohiro@jp.fujitsu.com: cleanup]
[akpm@linux-foundation.org: cleanup cleanup]
Signed-off-by: Michael Neuling 
Signed-off-by: KOSAKI Motohiro 
Cc: Americo Wang 
Cc: Anton Blanchard 
Cc: Oleg Nesterov 
Cc: James Morris 
Cc: Ingo Molnar 
Cc: Serge Hallyn 
Cc: Benjamin Herrenschmidt 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Fix 'flush_old_exec()/setup_new_exec()' split

2010-02-02T20:37:44+00:00

Commit 221af7f87b9 ("Split 'flush_old_exec' into two functions") split
the function at the point of no return - ie right where there were no
more error cases to check.  That made sense from a technical standpoint,
but when we then also combined it with the actual personality setting
going in between flush_old_exec() and setup_new_exec(), it needs to be a
bit more careful.

In particular, we need to make sure that we really flush the old
personality bits in the 'flush' stage, rather than later in the 'setup'
stage, since otherwise we might be flushing the _new_ personality state
that we're just setting up.

So this moves the flags and personality flushing (and 'flush_thread()',
which is the arch-specific function that generally resets lazy FP state
etc) of the old process into flush_old_exec(), so that it doesn't affect
any state that execve() is setting up for the new process environment.

This was reported by Michal Simek as breaking his Microblaze qemu
environment.

Reported-and-tested-by: Michal Simek 
Cc: Peter Anvin 
Signed-off-by: Linus Torvalds

Split 'flush_old_exec' into two functions

2010-01-29T16:22:01+00:00

'flush_old_exec()' is the point of no return when doing an execve(), and
it is pretty badly misnamed.  It doesn't just flush the old executable
environment, it also starts up the new one.

Which is very inconvenient for things like setting up the new
personality, because we want the new personality to affect the starting
of the new environment, but at the same time we do _not_ want the new
personality to take effect if flushing the old one fails.

As a result, the x86-64 '32-bit' personality is actually done using this
insane "I'm going to change the ABI, but I haven't done it yet" bit
(TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the
personality, but just the "pending" bit, so that "flush_thread()" can do
the actual personality magic.

This patch in no way changes any of that insanity, but it does split the
'flush_old_exec()' function up into a preparatory part that can fail
(still called flush_old_exec()), and a new part that will actually set
up the new exec environment (setup_new_exec()).  All callers are changed
to trivially comply with the new world order.

Signed-off-by: H. Peter Anvin 
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds