linux-toradex.git/kernel/exit.c, branch tegra-10.11.7

posix-cpu-timers: workaround to suppress the problems with mt exec

2011-01-07T21:58:52+00:00

commit e0a70217107e6f9844628120412cb27bb4cea194 upstream.

posix-cpu-timers.c correctly assumes that the dying process does
posix_cpu_timers_exit_group() and removes all !CPUCLOCK_PERTHREAD
timers from signal->cpu_timers list.

But, it also assumes that timer->it.cpu.task is always the group
leader, and thus the dead ->task means the dead thread group.

This is obviously not true after de_thread() changes the leader.
After that almost every posix_cpu_timer_ method has problems.

It is not simple to fix this bug correctly. First of all, I think
that timer->it.cpu should use struct pid instead of task_struct.
Also, the locking should be reworked completely. In particular,
tasklist_lock should not be used at all. This all needs a lot of
nontrivial and hard-to-test changes.

Change __exit_signal() to do posix_cpu_timers_exit_group() when
the old leader dies during exec. This is not the fix, just the
temporary hack to hide the problem for 2.6.37 and stable. IOW,
this is obviously wrong but this is what we currently have anyway:
cpu timers do not work after mt exec.

In theory this change adds another race. The exiting leader can
detach the timers which were attached to the new leader. However,
the window between de_thread() and release_task() is small, we
can pretend that sys_timer_create() was called before de_thread().

Signed-off-by: Oleg Nesterov 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

do_exit(): make sure that we run with get_fs() == USER_DS

2010-12-09T21:33:14+00:00

commit 33dd94ae1ccbfb7bf0fb6c692bc3d1c4269e6177 upstream.

If a user manages to trigger an oops with fs set to KERNEL_DS, fs is not
otherwise reset before do_exit().  do_exit may later (via mm_release in
fork.c) do a put_user to a user-controlled address, potentially allowing
a user to leverage an oops into a controlled write into kernel memory.

This is only triggerable in the presence of another bug, but this
potentially turns a lot of DoS bugs into privilege escalations, so it's
worth fixing.  I have proof-of-concept code which uses this bug along
with CVE-2010-3849 to write a zero to an arbitrary kernel address, so
I've tested that this is not theoretical.

A more logical place to put this fix might be when we know an oops has
occurred, before we call do_exit(), but that would involve changing
every architecture, in multiple places.

Let's just stick it in do_exit instead.

[akpm@linux-foundation.org: update code comment]
Signed-off-by: Nelson Elhage 
Cc: KOSAKI Motohiro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

Fix unprotected access to task credentials in waitid()

2010-08-18T01:07:43+00:00

Using a program like the following:

	#include 
	#include 
	#include 
	#include 

	int main() {
		id_t id;
		siginfo_t infop;
		pid_t res;

		id = fork();
		if (id == 0) { sleep(1); exit(0); }
		kill(id, SIGSTOP);
		alarm(1);
		waitid(P_PID, id, &infop, WCONTINUED);
		return 0;
	}

to call waitid() on a stopped process results in access to the child task's
credentials without the RCU read lock being held - which may be replaced in the
meantime - eliciting the following warning:

	===================================================
	[ INFO: suspicious rcu_dereference_check() usage. ]
	---------------------------------------------------
	kernel/exit.c:1460 invoked rcu_dereference_check() without protection!

	other info that might help us debug this:

	rcu_scheduler_active = 1, debug_locks = 1
	2 locks held by waitid02/22252:
	 #0:  (tasklist_lock){.?.?..}, at: [] do_wait+0xc5/0x310
	 #1:  (&(&sighand->siglock)->rlock){-.-...}, at: []
	wait_consider_task+0x19a/0xbe0

	stack backtrace:
	Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
	Call Trace:
	 [] lockdep_rcu_dereference+0xa4/0xc0
	 [] wait_consider_task+0xaf1/0xbe0
	 [] do_wait+0xf5/0x310
	 [] sys_waitid+0x86/0x1f0
	 [] ? child_wait_callback+0x0/0x70
	 [] system_call_fastpath+0x16/0x1b

This is fixed by holding the RCU read lock in wait_task_continued() to ensure
that the task's current credentials aren't destroyed between us reading the
cred pointer and us reading the UID from those credentials.

Furthermore, protect wait_task_stopped() in the same way.

We don't need to keep holding the RCU read lock once we've read the UID from
the credentials as holding the RCU read lock doesn't stop the target task from
changing its creds under us - so the credentials may be outdated immediately
after we've read the pointer, lock or no lock.

Signed-off-by: Daniel J Blueman 
Signed-off-by: David Howells 
Acked-by: Paul E. McKenney 
Acked-by: Oleg Nesterov 
Signed-off-by: Linus Torvalds

ptrace: optimize exit_ptrace() for the likely case

2010-08-11T15:59:19+00:00

exit_ptrace() takes tasklist_lock unconditionally.  We need this lock to
avoid the race with ptrace_traceme(), it acts as a barrier.

Change its caller, forget_original_parent(), to call exit_ptrace() under
tasklist_lock.  Change exit_ptrace() to drop and reacquire this lock if
needed.

This allows us to add the fastpath list_empty(ptraced) check.  In the
likely no-tracees case exit_ptrace() just returns and we avoid the lock()
+ unlock() sequence.

"Zhang, Yanmin"  suggested to add this
check, and he reports that this change adds about 11% improvement in some
tests.

Suggested-and-tested-by: "Zhang, Yanmin" 
Signed-off-by: Oleg Nesterov 
Acked-by: Roland McGrath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

proc: turn signal_struct->count into "int nr_threads"

2010-05-27T16:12:47+00:00

No functional changes, just s/atomic_t count/int nr_threads/.

With the recent changes this counter has a single user, get_nr_threads()
And, none of its callers need the really accurate number of threads, not
to mention each caller obviously races with fork/exit.  It is only used to
report this value to the user-space, except first_tid() uses it to avoid
the unnecessary while_each_thread() loop in the unlikely case.

It is a bit sad we need a word in struct signal_struct for this, perhaps
we can change get_nr_threads() to approximate the number of threads using
signal->live and kill ->nr_threads later.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov 
Cc: Alexey Dobriyan 
Cc: "Eric W. Biederman" 
Acked-by: Roland McGrath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

exit: move taskstats_tgid_free() from __exit_signal() to free_signal_struct()

2010-05-27T16:12:46+00:00

Move taskstats_tgid_free() from __exit_signal() to free_signal_struct().

This way signal->stats never points to nowhere and we can read ->stats
lockless.

Signed-off-by: Oleg Nesterov 
Cc: Balbir Singh 
Cc: Roland McGrath 
Cc: Veaceslav Falico 
Cc: Stanislaw Gruszka 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

exit: __exit_signal: use thread_group_leader() consistently

2010-05-27T16:12:46+00:00

Cleanup:

- Add the boolean, group_dead = thread_group_leader(), for clarity.

- Do not test/set sig == NULL to detect the all-dead case, use this
  boolean.

- Pass this boolen to __unhash_process() and use it instead of another
  thread_group_leader() call which needs ->group_leader.

  This can be considered as microoptimization, but hopefully this also
  allows us do do other cleanups later.

Signed-off-by: Oleg Nesterov 
Cc: Balbir Singh 
Cc: Roland McGrath 
Cc: Veaceslav Falico 
Cc: Stanislaw Gruszka 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

signals: kill the awful task_rq_unlock_wait() hack

2010-05-27T16:12:46+00:00

Now that task->signal can't go away we can revert the horrible hack added
by ad474caca3e2a0550b7ce0706527ad5ab389a4d4 ("fix for
account_group_exec_runtime(), make sure ->signal can't be freed under
rq->lock").

And we can do more cleanups sched_stats.h/posix-cpu-timers.c later.

Signed-off-by: Oleg Nesterov 
Cc: Alan Cox 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Acked-by: Roland McGrath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

signals: clear signal->tty when the last thread exits

2010-05-27T16:12:46+00:00

When the last thread exits signal->tty is freed, but the pointer is not
cleared and points to nowhere.

This is OK.  Nobody should use signal->tty lockless, and it is no longer
possible to take ->siglock.  However this looks wrong even if correct, and
the nice OOPS is better than subtle and hard to find bugs.

Change __exit_signal() to clear signal->tty under ->siglock.

Note: __exit_signal() needs more cleanups.  It should not check "sig !=
NULL" to detect the all-dead case and we have the same issues with
signal->stats.

Signed-off-by: Oleg Nesterov 
Cc: Alan Cox 
Cc: Ingo Molnar 
Acked-by: Peter Zijlstra 
Acked-by: Roland McGrath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

signals: make task_struct->signal immutable/refcountable

2010-05-27T16:12:46+00:00

We have a lot of problems with accessing task_struct->signal, it can
"disappear" at any moment.  Even current can't use its ->signal safely
after exit_notify().  ->siglock helps, but it is not convenient, not
always possible, and sometimes it makes sense to use task->signal even
after this task has already dead.

This patch adds the reference counter, sigcnt, into signal_struct.  This
reference is owned by task_struct and it is dropped in
__put_task_struct().  Perhaps it makes sense to export
get/put_signal_struct() later, but currently I don't see the immediate
reason.

Rename __cleanup_signal() to free_signal_struct() and unexport it.  With
the previous changes it does nothing except kmem_cache_free().

Change __exit_signal() to not clear/free ->signal, it will be freed when
the last reference to any thread in the thread group goes away.

Note:
	- when the last thead exits signal->tty can point to nowhere, see
	  the next patch.

	- with or without this patch signal_struct->count should go away,
	  or at least it should be "int nr_threads" for fs/proc. This will
	  be addressed later.

Signed-off-by: Oleg Nesterov 
Cc: Alan Cox 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Acked-by: Roland McGrath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds