From 7c9f8861e6c9c839f913e49b98c3854daca18f27 Mon Sep 17 00:00:00 2001 From: Eric Sandeen Date: Tue, 22 Apr 2008 16:38:23 -0500 Subject: stackprotector: use canary at end of stack to indicate overruns at oops time (Updated with a common max-stack-used checker that knows about the canary, as suggested by Joe Perches) Use a canary at the end of the stack to clearly indicate at oops time whether the stack has ever overflowed. This is a very simple implementation with a couple of drawbacks: 1) a thread may legitimately use exactly up to the last word on the stack -- but the chances of doing this and then oopsing later seem slim 2) it's possible that the stack usage isn't dense enough that the canary location could get skipped over -- but the worst that happens is that we don't flag the overrun -- though this happens fairly often in my testing :( With the code in place, an intentionally-bloated stack oops might do: BUG: unable to handle kernel paging request at ffff8103f84cc680 IP: [] update_curr+0x9a/0xa8 PGD 8063 PUD 0 Thread overran stack or stack corrupted Oops: 0000 [1] SMP CPU 0 ... ... unless the stack overrun is so bad that it corrupts some other thread. Signed-off-by: Eric Sandeen Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner --- kernel/exit.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 8f6185e69b69..fb8de6cbf2c7 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -899,12 +899,9 @@ static void check_stack_usage(void) { static DEFINE_SPINLOCK(low_water_lock); static int lowest_to_date = THREAD_SIZE; - unsigned long *n = end_of_stack(current); unsigned long free; - while (*n == 0) - n++; - free = (unsigned long)n - (unsigned long)end_of_stack(current); + free = stack_not_used(current); if (free >= lowest_to_date) return; -- cgit v1.2.3 From 32bd671d6cbeda60dc73be77fa2b9037d9a9bfa0 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 5 Feb 2009 12:24:15 +0100 Subject: signal: re-add dead task accumulation stats. We're going to split the process wide cpu accounting into two parts: - clocks; which can take all the time they want since they run from user context. - timers; which need constant time tracing but can affort the overhead because they're default off -- and rare. The clock readout will go back to a full sum of the thread group, for this we need to re-add the exit stats that were removed in the initial itimer rework (f06febc9: timers: fix itimer/many thread hang). Furthermore, since that full sum can be rather slow for large thread groups and we have the complete dead task stats, revert the do_notify_parent time computation. Signed-off-by: Peter Zijlstra Reviewed-by: Ingo Molnar Signed-off-by: Ingo Molnar --- kernel/exit.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index f80dec3f1875..efd30ccf3858 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -118,6 +118,8 @@ static void __exit_signal(struct task_struct *tsk) * We won't ever get here for the group leader, since it * will have been the last reference on the signal_struct. */ + sig->utime = cputime_add(sig->utime, task_utime(tsk)); + sig->stime = cputime_add(sig->stime, task_stime(tsk)); sig->gtime = cputime_add(sig->gtime, task_gtime(tsk)); sig->min_flt += tsk->min_flt; sig->maj_flt += tsk->maj_flt; @@ -126,6 +128,7 @@ static void __exit_signal(struct task_struct *tsk) sig->inblock += task_io_get_inblock(tsk); sig->oublock += task_io_get_oublock(tsk); task_io_accounting_add(&sig->ioac, &tsk->ioac); + sig->sum_sched_runtime += tsk->se.sum_exec_runtime; sig = NULL; /* Marker for below. */ } -- cgit v1.2.3 From 3aa551c9b4c40018f0e261a178e3d25478dc04a9 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 23 Mar 2009 18:28:15 +0100 Subject: genirq: add threaded interrupt handler support Add support for threaded interrupt handlers: A device driver can request that its main interrupt handler runs in a thread. To achive this the device driver requests the interrupt with request_threaded_irq() and provides additionally to the handler a thread function. The handler function is called in hard interrupt context and needs to check whether the interrupt originated from the device. If the interrupt originated from the device then the handler can either return IRQ_HANDLED or IRQ_WAKE_THREAD. IRQ_HANDLED is returned when no further action is required. IRQ_WAKE_THREAD causes the genirq code to invoke the threaded (main) handler. When IRQ_WAKE_THREAD is returned handler must have disabled the interrupt on the device level. This is mandatory for shared interrupt handlers, but we need to do it as well for obscure x86 hardware where disabling an interrupt on the IO_APIC level redirects the interrupt to the legacy PIC interrupt lines. Signed-off-by: Thomas Gleixner Reviewed-by: Ingo Molnar --- kernel/exit.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 167e1e3ad7c6..ca0b3488c4a9 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1037,6 +1037,8 @@ NORET_TYPE void do_exit(long code) schedule(); } + exit_irq_thread(); + exit_signals(tsk); /* sets PF_EXITING */ /* * tsk->flags are checked in the futex code to protect against -- cgit v1.2.3 From 3e93cd671813e204c258f1e6c797959920cf7772 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 29 Mar 2009 19:00:13 -0400 Subject: Take fs_struct handling to new file (fs/fs_struct.c) Pure code move; two new helper functions for nfsd and daemonize (unshare_fs_struct() and daemonize_fs_struct() resp.; for now - the same code as used to be in callers). unshare_fs_struct() exported (for nfsd, as copy_fs_struct()/exit_fs() used to be), copy_fs_struct() and exit_fs() don't need exports anymore. Signed-off-by: Al Viro --- kernel/exit.c | 31 +------------------------------ 1 file changed, 1 insertion(+), 30 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 167e1e3ad7c6..ad8375758a79 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -429,7 +429,6 @@ EXPORT_SYMBOL(disallow_signal); void daemonize(const char *name, ...) { va_list args; - struct fs_struct *fs; sigset_t blocked; va_start(args, name); @@ -462,11 +461,7 @@ void daemonize(const char *name, ...) /* Become as one with the init task */ - exit_fs(current); /* current->fs->count--; */ - fs = init_task.fs; - current->fs = fs; - atomic_inc(&fs->count); - + daemonize_fs_struct(); exit_files(current); current->files = init_task.files; atomic_inc(¤t->files->count); @@ -565,30 +560,6 @@ void exit_files(struct task_struct *tsk) } } -void put_fs_struct(struct fs_struct *fs) -{ - /* No need to hold fs->lock if we are killing it */ - if (atomic_dec_and_test(&fs->count)) { - path_put(&fs->root); - path_put(&fs->pwd); - kmem_cache_free(fs_cachep, fs); - } -} - -void exit_fs(struct task_struct *tsk) -{ - struct fs_struct * fs = tsk->fs; - - if (fs) { - task_lock(tsk); - tsk->fs = NULL; - task_unlock(tsk); - put_fs_struct(fs); - } -} - -EXPORT_SYMBOL_GPL(exit_fs); - #ifdef CONFIG_MM_OWNER /* * Task p is exiting and it owned mm, lets find a new owner for it -- cgit v1.2.3 From 5ad4e53bd5406ee214ddc5a41f03f779b8b2d526 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 29 Mar 2009 19:50:06 -0400 Subject: Get rid of indirect include of fs_struct.h Don't pull it in sched.h; very few files actually need it and those can include directly. sched.h itself only needs forward declaration of struct fs_struct; Signed-off-by: Al Viro --- kernel/exit.c | 1 + 1 file changed, 1 insertion(+) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index ad8375758a79..b5d656845c90 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -46,6 +46,7 @@ #include #include #include +#include #include #include -- cgit v1.2.3 From 90bc8d8b1a38f1ab131a2399a202e1889db95de8 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 2 Apr 2009 16:57:58 -0700 Subject: do_wait: fix waiting for the group stop with the dead leader do_wait(WSTOPPED) assumes that p->state must be == TASK_STOPPED, this is not true if the leader is already dead. Check SIGNAL_STOP_STOPPED instead and use signal->group_exit_code. Trivial test-case: void *tfunc(void *arg) { pause(); return NULL; } int main(void) { pthread_t thr; pthread_create(&thr, NULL, tfunc, NULL); pthread_exit(NULL); return 0; } It doesn't react to ^Z (and then to ^C or ^\). The task is stopped, but bash can't see this. The bug is very old, and it was reported multiple times. This patch was sent more than a year ago (http://marc.info/?t=119713920000003) but it was ignored. This change also fixes other oddities (but not all) in this area. For example, before this patch: $ sleep 100 ^Z [1]+ Stopped sleep 100 $ strace -p `pidof sleep` Process 11442 attached - interrupt to quit strace hangs in do_wait(), because ->exit_code was already consumed by bash. After this patch, strace happily proceeds: --- SIGTSTP (Stopped) @ 0 (0) --- restart_syscall(<... resuming interrupted call ...> To me, this looks much more "natural" and correct. Another example. Let's suppose we have the main thread M and sub-thread T, the process is stopped, and its parent did wait(WSTOPPED). Now we can ptrace T but not M. This looks at least strange to me. Imho, do_wait() should not confuse the per-thread ptrace stops with the per-process job control stops. Signed-off-by: Oleg Nesterov Cc: Denys Vlasenko Cc: "Eric W. Biederman" Cc: Jan Kratochvil Cc: Kaz Kylheku Cc: Michael Kerrisk Cc: Roland McGrath Cc: Ulrich Drepper Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 167e1e3ad7c6..0c06b9efae3b 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1417,6 +1417,18 @@ static int wait_task_zombie(struct task_struct *p, int options, return retval; } +static int *task_stopped_code(struct task_struct *p, bool ptrace) +{ + if (ptrace) { + if (task_is_stopped_or_traced(p)) + return &p->exit_code; + } else { + if (p->signal->flags & SIGNAL_STOP_STOPPED) + return &p->signal->group_exit_code; + } + return NULL; +} + /* * Handle sys_wait4 work for one task in state TASK_STOPPED. We hold * read_lock(&tasklist_lock) on entry. If we return zero, we still hold @@ -1427,7 +1439,7 @@ static int wait_task_stopped(int ptrace, struct task_struct *p, int options, struct siginfo __user *infop, int __user *stat_addr, struct rusage __user *ru) { - int retval, exit_code, why; + int retval, exit_code, *p_code, why; uid_t uid = 0; /* unneeded, required by compiler */ pid_t pid; @@ -1437,22 +1449,16 @@ static int wait_task_stopped(int ptrace, struct task_struct *p, exit_code = 0; spin_lock_irq(&p->sighand->siglock); - if (unlikely(!task_is_stopped_or_traced(p))) - goto unlock_sig; - - if (!ptrace && p->signal->group_stop_count > 0) - /* - * A group stop is in progress and this is the group leader. - * We won't report until all threads have stopped. - */ + p_code = task_stopped_code(p, ptrace); + if (unlikely(!p_code)) goto unlock_sig; - exit_code = p->exit_code; + exit_code = *p_code; if (!exit_code) goto unlock_sig; if (!unlikely(options & WNOWAIT)) - p->exit_code = 0; + *p_code = 0; /* don't need the RCU readlock here as we're holding a spinlock */ uid = __task_cred(p)->uid; @@ -1608,7 +1614,7 @@ static int wait_consider_task(struct task_struct *parent, int ptrace, */ *notask_error = 0; - if (task_is_stopped_or_traced(p)) + if (task_stopped_code(p, ptrace)) return wait_task_stopped(ptrace, p, options, infop, stat_addr, ru); -- cgit v1.2.3 From 6d69cb87f05eef3b02370b2f7bae608ad2301a00 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 2 Apr 2009 16:58:12 -0700 Subject: ptrace: simplify ptrace_exit()->ignoring_children() path ignoring_children() takes parent->sighand->siglock and checks k_sigaction[SIGCHLD] atomically. But this buys nothing, we can't get the "really" wrong result even if we race with sigaction(SIGCHLD). If we read the "stale" sa_handler/sa_flags we can pretend it was changed right after the check. Remove spin_lock(->siglock), and kill "int ign" which caches the result of ignoring_children() which becomes rather trivial. Perhaps it makes sense to export this helper, do_notify_parent() can use it too. Signed-off-by: Oleg Nesterov Cc: Jerome Marchand Cc: Roland McGrath Cc: Denys Vlasenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 25 ++++++++----------------- 1 file changed, 8 insertions(+), 17 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 0c06b9efae3b..7a8311422930 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -732,19 +732,15 @@ static void exit_mm(struct task_struct * tsk) } /* - * Return nonzero if @parent's children should reap themselves. - * - * Called with write_lock_irq(&tasklist_lock) held. + * Called with irqs disabled, returns true if childs should reap themselves. */ -static int ignoring_children(struct task_struct *parent) +static int ignoring_children(struct sighand_struct *sigh) { int ret; - struct sighand_struct *psig = parent->sighand; - unsigned long flags; - spin_lock_irqsave(&psig->siglock, flags); - ret = (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN || - (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT)); - spin_unlock_irqrestore(&psig->siglock, flags); + spin_lock(&sigh->siglock); + ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) || + (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT); + spin_unlock(&sigh->siglock); return ret; } @@ -757,7 +753,6 @@ static int ignoring_children(struct task_struct *parent) static void ptrace_exit(struct task_struct *parent, struct list_head *dead) { struct task_struct *p, *n; - int ign = -1; list_for_each_entry_safe(p, n, &parent->ptraced, ptrace_entry) { __ptrace_unlink(p); @@ -779,12 +774,8 @@ static void ptrace_exit(struct task_struct *parent, struct list_head *dead) if (!task_detached(p) && thread_group_empty(p)) { if (!same_thread_group(p->real_parent, parent)) do_notify_parent(p, p->exit_signal); - else { - if (ign < 0) - ign = ignoring_children(parent); - if (ign) - p->exit_signal = -1; - } + else if (ignoring_children(parent->sighand)) + p->exit_signal = -1; } if (task_detached(p)) { -- cgit v1.2.3 From b1b4c6799fb59e710454bfe0ab477cb8523a8667 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 2 Apr 2009 16:58:13 -0700 Subject: ptrace: reintroduce __ptrace_detach() as a callee of ptrace_exit() No functional changes, preparation for the next patch. Move the "should we release this child" logic into the separate handler, __ptrace_detach(). Signed-off-by: Oleg Nesterov Cc: Jerome Marchand Cc: Roland McGrath Cc: Denys Vlasenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 62 +++++++++++++++++++++++++++++++---------------------------- 1 file changed, 33 insertions(+), 29 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 7a8311422930..576eae233b53 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -744,6 +744,38 @@ static int ignoring_children(struct sighand_struct *sigh) return ret; } +/* Returns nonzero if the tracee should be released. */ +int __ptrace_detach(struct task_struct *tracer, struct task_struct *p) +{ + __ptrace_unlink(p); + + if (p->exit_state != EXIT_ZOMBIE) + return 0; + /* + * If it's a zombie, our attachedness prevented normal + * parent notification or self-reaping. Do notification + * now if it would have happened earlier. If it should + * reap itself we return true. + * + * If it's our own child, there is no notification to do. + * But if our normal children self-reap, then this child + * was prevented by ptrace and we must reap it now. + */ + if (!task_detached(p) && thread_group_empty(p)) { + if (!same_thread_group(p->real_parent, tracer)) + do_notify_parent(p, p->exit_signal); + else if (ignoring_children(tracer->sighand)) + p->exit_signal = -1; + } + + if (!task_detached(p)) + return 0; + + /* Mark it as in the process of being reaped. */ + p->exit_state = EXIT_DEAD; + return 1; +} + /* * Detach all tasks we were using ptrace on. * Any that need to be release_task'd are put on the @dead list. @@ -755,36 +787,8 @@ static void ptrace_exit(struct task_struct *parent, struct list_head *dead) struct task_struct *p, *n; list_for_each_entry_safe(p, n, &parent->ptraced, ptrace_entry) { - __ptrace_unlink(p); - - if (p->exit_state != EXIT_ZOMBIE) - continue; - - /* - * If it's a zombie, our attachedness prevented normal - * parent notification or self-reaping. Do notification - * now if it would have happened earlier. If it should - * reap itself, add it to the @dead list. We can't call - * release_task() here because we already hold tasklist_lock. - * - * If it's our own child, there is no notification to do. - * But if our normal children self-reap, then this child - * was prevented by ptrace and we must reap it now. - */ - if (!task_detached(p) && thread_group_empty(p)) { - if (!same_thread_group(p->real_parent, parent)) - do_notify_parent(p, p->exit_signal); - else if (ignoring_children(parent->sighand)) - p->exit_signal = -1; - } - - if (task_detached(p)) { - /* - * Mark it as in the process of being reaped. - */ - p->exit_state = EXIT_DEAD; + if (__ptrace_detach(parent, p)) list_add(&p->ptrace_entry, dead); - } } } -- cgit v1.2.3 From 0a967a044a777e8b9c739120927114ddc0094298 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 2 Apr 2009 16:58:15 -0700 Subject: reparent_thread: don't call kill_orphaned_pgrp() if task_detached() If task_detached(p) == T, then either a) p is not the main thread, we will find the group leader on the ->children list. or b) p is the group leader but its ->exit_state = EXIT_DEAD. This can only happen when the last sub-thread has died, but in that case that thread has already called kill_orphaned_pgrp() from exit_notify(). In both cases kill_orphaned_pgrp() looks bogus. Move the task_detached() check up and simplify the code, this is also right from the "common sense" pov: we should do nothing with the detached childs, except move them to the new parent's ->children list. Signed-off-by: Oleg Nesterov Cc: Roland McGrath Cc: "Eric W. Biederman" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 576eae233b53..405e6877168b 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -818,6 +818,8 @@ static void reparent_thread(struct task_struct *p, struct task_struct *father) list_move_tail(&p->sibling, &p->real_parent->children); + if (task_detached(p)) + return; /* If this is a threaded reparent there is no need to * notify anyone anything has happened. */ @@ -825,15 +827,13 @@ static void reparent_thread(struct task_struct *p, struct task_struct *father) return; /* We don't want people slaying init. */ - if (!task_detached(p)) - p->exit_signal = SIGCHLD; + p->exit_signal = SIGCHLD; /* If we'd notified the old parent about this child's death, * also notify the new parent. */ if (!ptrace_reparented(p) && - p->exit_state == EXIT_ZOMBIE && - !task_detached(p) && thread_group_empty(p)) + p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) do_notify_parent(p, p->exit_signal); kill_orphaned_pgrp(p, father); -- cgit v1.2.3 From b1442b055c154699a6a2c436f3352f71b6beede3 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 2 Apr 2009 16:58:16 -0700 Subject: reparent_thread: fix the "is it traced" check reparent_thread() uses ptrace_reparented() to check whether this thread is ptraced, in that case we should not notify the new parent. But ptrace_reparented() is not exactly correct when the reparented thread is traced by /sbin/init, because forget_original_parent() has already changed ->real_parent. Currently, the only problem is the false notification. But with the next patch the kernel crash in this (yes, pathological) case. Signed-off-by: Oleg Nesterov Cc: Roland McGrath Cc: "Eric W. Biederman" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 405e6877168b..5be0a406faeb 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -832,7 +832,7 @@ static void reparent_thread(struct task_struct *p, struct task_struct *father) /* If we'd notified the old parent about this child's death, * also notify the new parent. */ - if (!ptrace_reparented(p) && + if (!p->ptrace && p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) do_notify_parent(p, p->exit_signal); -- cgit v1.2.3 From 7f5d3652d469cdf9eb2365dfea7ce3fb9e1409cc Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 2 Apr 2009 16:58:17 -0700 Subject: reparent_thread: fix a zombie leak if /sbin/init ignores SIGCHLD If /sbin/init ignores SIGCHLD and we re-parent a zombie, it is leaked. reparent_thread() does do_notify_parent() which sets ->exit_signal = -1 in this case. This means that nobody except us can reap it, the detached task is not visible to do_wait(). Change reparent_thread() to return a boolean (like __pthread_detach) to indicate that the thread is dead and must be released. Also change forget_original_parent() to add the child to ptrace_dead list in this case. The naming becomes insane, the next patch does the cleanup. Signed-off-by: Oleg Nesterov Cc: Roland McGrath Cc: "Eric W. Biederman" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 5be0a406faeb..3e09b7cb3b20 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -810,8 +810,11 @@ static void ptrace_exit_finish(struct task_struct *parent, } } -static void reparent_thread(struct task_struct *p, struct task_struct *father) +/* Returns nonzero if the child should be released. */ +static int reparent_thread(struct task_struct *p, struct task_struct *father) { + int dead; + if (p->pdeath_signal) /* We already hold the tasklist_lock here. */ group_send_sig_info(p->pdeath_signal, SEND_SIG_NOINFO, p); @@ -819,12 +822,12 @@ static void reparent_thread(struct task_struct *p, struct task_struct *father) list_move_tail(&p->sibling, &p->real_parent->children); if (task_detached(p)) - return; + return 0; /* If this is a threaded reparent there is no need to * notify anyone anything has happened. */ if (same_thread_group(p->real_parent, father)) - return; + return 0; /* We don't want people slaying init. */ p->exit_signal = SIGCHLD; @@ -832,11 +835,19 @@ static void reparent_thread(struct task_struct *p, struct task_struct *father) /* If we'd notified the old parent about this child's death, * also notify the new parent. */ + dead = 0; if (!p->ptrace && - p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) + p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) { do_notify_parent(p, p->exit_signal); + if (task_detached(p)) { + p->exit_state = EXIT_DEAD; + dead = 1; + } + } kill_orphaned_pgrp(p, father); + + return dead; } /* @@ -896,7 +907,8 @@ static void forget_original_parent(struct task_struct *father) BUG_ON(p->ptrace); p->parent = p->real_parent; } - reparent_thread(p, father); + if (reparent_thread(p, father)) + list_add(&p->ptrace_entry, &ptrace_dead);; } write_unlock_irq(&tasklist_lock); -- cgit v1.2.3 From 39c626ae47c469abdfd30c6e42eff884931380d6 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 2 Apr 2009 16:58:18 -0700 Subject: forget_original_parent: split out the un-ptrace part By discussion with Roland. - Rename ptrace_exit() to exit_ptrace(), and change it to do all the necessary work with ->ptraced list by its own. - Move this code from exit.c to ptrace.c - Update the comment in ptrace_detach() to explain the rechecking of the child->ptrace. Signed-off-by: Oleg Nesterov Cc: "Eric W. Biederman" Cc: "Metzger, Markus T" Cc: Roland McGrath Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 95 ++++------------------------------------------------------- 1 file changed, 6 insertions(+), 89 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 3e09b7cb3b20..506693dfdd4e 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -61,11 +61,6 @@ DEFINE_TRACE(sched_process_wait); static void exit_mm(struct task_struct * tsk); -static inline int task_detached(struct task_struct *p) -{ - return p->exit_signal == -1; -} - static void __unhash_process(struct task_struct *p) { nr_threads--; @@ -731,85 +726,6 @@ static void exit_mm(struct task_struct * tsk) mmput(mm); } -/* - * Called with irqs disabled, returns true if childs should reap themselves. - */ -static int ignoring_children(struct sighand_struct *sigh) -{ - int ret; - spin_lock(&sigh->siglock); - ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) || - (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT); - spin_unlock(&sigh->siglock); - return ret; -} - -/* Returns nonzero if the tracee should be released. */ -int __ptrace_detach(struct task_struct *tracer, struct task_struct *p) -{ - __ptrace_unlink(p); - - if (p->exit_state != EXIT_ZOMBIE) - return 0; - /* - * If it's a zombie, our attachedness prevented normal - * parent notification or self-reaping. Do notification - * now if it would have happened earlier. If it should - * reap itself we return true. - * - * If it's our own child, there is no notification to do. - * But if our normal children self-reap, then this child - * was prevented by ptrace and we must reap it now. - */ - if (!task_detached(p) && thread_group_empty(p)) { - if (!same_thread_group(p->real_parent, tracer)) - do_notify_parent(p, p->exit_signal); - else if (ignoring_children(tracer->sighand)) - p->exit_signal = -1; - } - - if (!task_detached(p)) - return 0; - - /* Mark it as in the process of being reaped. */ - p->exit_state = EXIT_DEAD; - return 1; -} - -/* - * Detach all tasks we were using ptrace on. - * Any that need to be release_task'd are put on the @dead list. - * - * Called with write_lock(&tasklist_lock) held. - */ -static void ptrace_exit(struct task_struct *parent, struct list_head *dead) -{ - struct task_struct *p, *n; - - list_for_each_entry_safe(p, n, &parent->ptraced, ptrace_entry) { - if (__ptrace_detach(parent, p)) - list_add(&p->ptrace_entry, dead); - } -} - -/* - * Finish up exit-time ptrace cleanup. - * - * Called without locks. - */ -static void ptrace_exit_finish(struct task_struct *parent, - struct list_head *dead) -{ - struct task_struct *p, *n; - - BUG_ON(!list_empty(&parent->ptraced)); - - list_for_each_entry_safe(p, n, dead, ptrace_entry) { - list_del_init(&p->ptrace_entry); - release_task(p); - } -} - /* Returns nonzero if the child should be released. */ static int reparent_thread(struct task_struct *p, struct task_struct *father) { @@ -894,12 +810,10 @@ static void forget_original_parent(struct task_struct *father) struct task_struct *p, *n, *reaper; LIST_HEAD(ptrace_dead); + exit_ptrace(father); + write_lock_irq(&tasklist_lock); reaper = find_new_reaper(father); - /* - * First clean up ptrace if we were using it. - */ - ptrace_exit(father, &ptrace_dead); list_for_each_entry_safe(p, n, &father->children, sibling) { p->real_parent = reaper; @@ -914,7 +828,10 @@ static void forget_original_parent(struct task_struct *father) write_unlock_irq(&tasklist_lock); BUG_ON(!list_empty(&father->children)); - ptrace_exit_finish(father, &ptrace_dead); + list_for_each_entry_safe(p, n, &ptrace_dead, ptrace_entry) { + list_del_init(&p->ptrace_entry); + release_task(p); + } } /* -- cgit v1.2.3 From 5dfc80be73dd0c212d2e6dd8dbf5afa07e680bbe Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 2 Apr 2009 16:58:19 -0700 Subject: forget_original_parent: do not abuse child->ptrace_entry By discussion with Roland. - Use ->sibling instead of ->ptrace_entry to chain the need to be release_task'd childs. Nobody else can use ->sibling, this task is EXIT_DEAD and nobody can find it on its own list. - rename ptrace_dead to dead_childs. - Now that we don't have the "parallel" untrace code, change back reparent_thread() to return void, pass dead_childs as an argument. Actually, I don't understand why do we notify /sbin/init when we reparent a zombie, probably it is better to reap it unconditionally. [akpm@linux-foundation.org: s/childs/children/] Signed-off-by: Oleg Nesterov Cc: "Eric W. Biederman" Cc: "Metzger, Markus T" Cc: Roland McGrath Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 87 ++++++++++++++++++++++++++++------------------------------- 1 file changed, 41 insertions(+), 46 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 506693dfdd4e..029415d9f82e 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -726,46 +726,6 @@ static void exit_mm(struct task_struct * tsk) mmput(mm); } -/* Returns nonzero if the child should be released. */ -static int reparent_thread(struct task_struct *p, struct task_struct *father) -{ - int dead; - - if (p->pdeath_signal) - /* We already hold the tasklist_lock here. */ - group_send_sig_info(p->pdeath_signal, SEND_SIG_NOINFO, p); - - list_move_tail(&p->sibling, &p->real_parent->children); - - if (task_detached(p)) - return 0; - /* If this is a threaded reparent there is no need to - * notify anyone anything has happened. - */ - if (same_thread_group(p->real_parent, father)) - return 0; - - /* We don't want people slaying init. */ - p->exit_signal = SIGCHLD; - - /* If we'd notified the old parent about this child's death, - * also notify the new parent. - */ - dead = 0; - if (!p->ptrace && - p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) { - do_notify_parent(p, p->exit_signal); - if (task_detached(p)) { - p->exit_state = EXIT_DEAD; - dead = 1; - } - } - - kill_orphaned_pgrp(p, father); - - return dead; -} - /* * When we die, we re-parent all our children. * Try to give them to another thread in our thread @@ -805,10 +765,46 @@ static struct task_struct *find_new_reaper(struct task_struct *father) return pid_ns->child_reaper; } +/* +* Any that need to be release_task'd are put on the @dead list. + */ +static void reparent_thread(struct task_struct *father, struct task_struct *p, + struct list_head *dead) +{ + if (p->pdeath_signal) + group_send_sig_info(p->pdeath_signal, SEND_SIG_NOINFO, p); + + list_move_tail(&p->sibling, &p->real_parent->children); + + if (task_detached(p)) + return; + /* + * If this is a threaded reparent there is no need to + * notify anyone anything has happened. + */ + if (same_thread_group(p->real_parent, father)) + return; + + /* We don't want people slaying init. */ + p->exit_signal = SIGCHLD; + + /* If it has exited notify the new parent about this child's death. */ + if (!p->ptrace && + p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) { + do_notify_parent(p, p->exit_signal); + if (task_detached(p)) { + p->exit_state = EXIT_DEAD; + list_move_tail(&p->sibling, dead); + } + } + + kill_orphaned_pgrp(p, father); +} + static void forget_original_parent(struct task_struct *father) { struct task_struct *p, *n, *reaper; - LIST_HEAD(ptrace_dead); + LIST_HEAD(dead_children); exit_ptrace(father); @@ -821,15 +817,14 @@ static void forget_original_parent(struct task_struct *father) BUG_ON(p->ptrace); p->parent = p->real_parent; } - if (reparent_thread(p, father)) - list_add(&p->ptrace_entry, &ptrace_dead);; + reparent_thread(father, p, &dead_children); } - write_unlock_irq(&tasklist_lock); + BUG_ON(!list_empty(&father->children)); - list_for_each_entry_safe(p, n, &ptrace_dead, ptrace_entry) { - list_del_init(&p->ptrace_entry); + list_for_each_entry_safe(p, n, &dead_children, sibling) { + list_del_init(&p->sibling); release_task(p); } } -- cgit v1.2.3 From 2ae448efc87df6d328f5835969076c7f9fce59c3 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 2 Apr 2009 16:58:36 -0700 Subject: pids: improve get_task_pid() to fix the unsafe sys_wait4()->task_pgrp() sys_wait4() does get_pid(task_pgrp(current)), this is not safe. We can add rcu lock/unlock around, but we already have get_task_pid() which can be improved to handle the special pids in more reliable manner. Signed-off-by: Oleg Nesterov Cc: Louis Rilling Cc: "Eric W. Biederman" Cc: Pavel Emelyanov Cc: Sukadev Bhattiprolu Cc: Roland McGrath Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 029415d9f82e..384f09caf2ef 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1737,7 +1737,7 @@ SYSCALL_DEFINE4(wait4, pid_t, upid, int __user *, stat_addr, pid = find_get_pid(-upid); } else if (upid == 0) { type = PIDTYPE_PGID; - pid = get_pid(task_pgrp(current)); + pid = get_task_pid(current, PIDTYPE_PGID); } else /* upid > 0 */ { type = PIDTYPE_PID; pid = find_get_pid(upid); -- cgit v1.2.3 From 1b0f7ffd0ea27cd3a0b9ca04e3df9522048c32a3 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 2 Apr 2009 16:58:39 -0700 Subject: pids: kill signal_struct-> __pgrp/__session and friends We are wasting 2 words in signal_struct without any reason to implement task_pgrp_nr() and task_session_nr(). task_session_nr() has no callers since 2e2ba22ea4fd4bb85f0fa37c521066db6775cbef, we can remove it. task_pgrp_nr() is still (I believe wrongly) used in fs/autofsX and fs/coda. This patch reimplements task_pgrp_nr() via task_pgrp_nr_ns(), and kills __pgrp/__session and the related helpers. The change in drivers/char/tty_io.c is cosmetic, but hopefully makes sense anyway. Signed-off-by: Oleg Nesterov Acked-by: Alan Cox [tty parts] Cc: Cedric Le Goater Cc: Dave Hansen Cc: Eric Biederman Cc: Pavel Emelyanov Cc: Serge Hallyn Cc: Sukadev Bhattiprolu Cc: Roland McGrath Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 384f09caf2ef..3bec141c82f6 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -357,16 +357,12 @@ static void reparent_to_kthreadd(void) void __set_special_pids(struct pid *pid) { struct task_struct *curr = current->group_leader; - pid_t nr = pid_nr(pid); - if (task_session(curr) != pid) { + if (task_session(curr) != pid) change_pid(curr, PIDTYPE_SID, pid); - set_task_session(curr, nr); - } - if (task_pgrp(curr) != pid) { + + if (task_pgrp(curr) != pid) change_pid(curr, PIDTYPE_PGID, pid); - set_task_pgrp(curr, nr); - } } static void set_special_pids(struct pid *pid) -- cgit v1.2.3 From 432870dab85a2f69dc417022646cb9a70acf7f94 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 6 Apr 2009 16:16:02 +0200 Subject: exit_notify: kill the wrong capable(CAP_KILL) check The CAP_KILL check in exit_notify() looks just wrong, kill it. Whatever logic we have to reset ->exit_signal, the malicious user can bypass it if it execs the setuid application before exiting. Signed-off-by: Oleg Nesterov Acked-by: Serge Hallyn Acked-by: Roland McGrath Signed-off-by: Linus Torvalds --- kernel/exit.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 6686ed1e4aa3..32cbf2607cb0 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -837,8 +837,7 @@ static void exit_notify(struct task_struct *tsk, int group_dead) */ if (tsk->exit_signal != SIGCHLD && !task_detached(tsk) && (tsk->parent_exec_id != tsk->real_parent->self_exec_id || - tsk->self_exec_id != tsk->parent_exec_id) && - !capable(CAP_KILL)) + tsk->self_exec_id != tsk->parent_exec_id)) tsk->exit_signal = SIGCHLD; signal = tracehook_notify_death(tsk, &cookie, group_dead); -- cgit v1.2.3