<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/exit.c, branch v3.15-rc6</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process</title>
<updated>2014-04-07T23:36:06+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-04-07T22:38:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7c733eb3eac0e3d091aaf37c183d2175eeebfb2b'/>
<id>7c733eb3eac0e3d091aaf37c183d2175eeebfb2b</id>
<content type='text'>
Even if the main thread is dead the process still can stop/continue.
However, if the leader is ptraced wait_consider_task(ptrace =&gt; false)
always skips wait_task_stopped/wait_task_continued, so WSTOPPED or
WCONTINUED can never work for the natural parent in this case.

Move the "A zombie ptracee is only visible to its ptracer" check into the
"if (!delay_group_leader(p))" block.  -&gt;notask_error is cleared by the
"fall through" code below.

This depends on the previous change, wait_task_stopped/continued must be
avoided if !delay_group_leader() and the tracer is -&gt;real_parent.
Otherwise WSTOPPED|WEXITED could wrongly report "stopped" when the child
is already dead (single-threaded or not).  If it is traced by another task
then the "stopped" state is fine until the debugger detaches and reveals a
zombie state.

Stupid test-case:

	void *tfunc(void *arg)
	{
		sleep(1);	// wait for zombie leader
		raise(SIGSTOP);
		exit(0x13);
		return NULL;
	}

	int run_child(void)
	{
		pthread_t thread;

		if (!fork()) {
			int tracee = getppid();

			assert(ptrace(PTRACE_ATTACH, tracee, 0,0) == 0);
			do
				ptrace(PTRACE_CONT, tracee, 0,0);
			while (wait(NULL) &gt; 0);

			return 0;
		}

		sleep(1);	// wait for PTRACE_ATTACH
		assert(pthread_create(&amp;thread, NULL, tfunc, NULL) == 0);
		pthread_exit(NULL);
	}

	int main(void)
	{
		int child, stat;

		child = fork();
		if (!child)
			return run_child();

		assert(child == waitpid(-1, &amp;stat, WSTOPPED));
		assert(stat == 0x137f);

		kill(child, SIGCONT);

		assert(child == waitpid(-1, &amp;stat, WCONTINUED));
		assert(stat == 0xffff);

		assert(child == waitpid(-1, &amp;stat, 0));
		assert(stat == 0x1300);

		return 0;
	}

Without this patch it hangs in waitpid(WSTOPPED), wait_task_stopped() is
never called.

Note: this doesn't fix all problems with a zombie delay_group_leader(),
WCONTINUED | WEXITED check is not exactly right.  debugger can't assume it
will be notified if another thread reaps the whole thread group.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Cc: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Even if the main thread is dead the process still can stop/continue.
However, if the leader is ptraced wait_consider_task(ptrace =&gt; false)
always skips wait_task_stopped/wait_task_continued, so WSTOPPED or
WCONTINUED can never work for the natural parent in this case.

Move the "A zombie ptracee is only visible to its ptracer" check into the
"if (!delay_group_leader(p))" block.  -&gt;notask_error is cleared by the
"fall through" code below.

This depends on the previous change, wait_task_stopped/continued must be
avoided if !delay_group_leader() and the tracer is -&gt;real_parent.
Otherwise WSTOPPED|WEXITED could wrongly report "stopped" when the child
is already dead (single-threaded or not).  If it is traced by another task
then the "stopped" state is fine until the debugger detaches and reveals a
zombie state.

Stupid test-case:

	void *tfunc(void *arg)
	{
		sleep(1);	// wait for zombie leader
		raise(SIGSTOP);
		exit(0x13);
		return NULL;
	}

	int run_child(void)
	{
		pthread_t thread;

		if (!fork()) {
			int tracee = getppid();

			assert(ptrace(PTRACE_ATTACH, tracee, 0,0) == 0);
			do
				ptrace(PTRACE_CONT, tracee, 0,0);
			while (wait(NULL) &gt; 0);

			return 0;
		}

		sleep(1);	// wait for PTRACE_ATTACH
		assert(pthread_create(&amp;thread, NULL, tfunc, NULL) == 0);
		pthread_exit(NULL);
	}

	int main(void)
	{
		int child, stat;

		child = fork();
		if (!child)
			return run_child();

		assert(child == waitpid(-1, &amp;stat, WSTOPPED));
		assert(stat == 0x137f);

		kill(child, SIGCONT);

		assert(child == waitpid(-1, &amp;stat, WCONTINUED));
		assert(stat == 0xffff);

		assert(child == waitpid(-1, &amp;stat, 0));
		assert(stat == 0x1300);

		return 0;
	}

Without this patch it hangs in waitpid(WSTOPPED), wait_task_stopped() is
never called.

Note: this doesn't fix all problems with a zombie delay_group_leader(),
WCONTINUED | WEXITED check is not exactly right.  debugger can't assume it
will be notified if another thread reaps the whole thread group.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Cc: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>wait: WSTOPPED|WCONTINUED hangs if a zombie child is traced by real_parent</title>
<updated>2014-04-07T23:36:06+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-04-07T22:38:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=377d75dafa07ee0da64223c9169f4e17b26c2b9a'/>
<id>377d75dafa07ee0da64223c9169f4e17b26c2b9a</id>
<content type='text'>
"A zombie is only visible to its ptracer" logic in wait_consider_task()
is very wrong. Trivial test-case:

	#include &lt;unistd.h&gt;
	#include &lt;sys/ptrace.h&gt;
	#include &lt;sys/wait.h&gt;
	#include &lt;assert.h&gt;

	int main(void)
	{
		int child = fork();

		if (!child) {
			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
			return 0x23;
		}

		assert(waitid(P_ALL, child, NULL, WEXITED | WNOWAIT) == 0);
		assert(waitid(P_ALL, 0, NULL, WSTOPPED) == -1);
		return 0;
	}

it hangs in waitpid(WSTOPPED) despite the fact it has a single zombie
child.  This is because wait_consider_task(ptrace =&gt; 0) sees p-&gt;ptrace and
cleares -&gt;notask_error assuming that the debugger should detach and notify
us.

Change wait_consider_task(ptrace =&gt; 0) to pretend that ptrace == T if the
child is traced by us.  This really simplifies the logic and allows us to
do more fixes, see the next changes.  This also hides the unwanted group
stop state automatically, we can remove another ptrace_reparented() check.

Unfortunately, this adds the following behavioural changes:

	1. Before this patch wait(WEXITED | __WNOTHREAD) does not reap
	   a natural child if it is traced by the caller's sub-thread.

	   Hopefully nobody will ever notice this change, and I think
	   that nobody should rely on this behaviour anyway.

	2. SIGNAL_STOP_CONTINUED is no longer hidden from debugger if
	   it is real parent.

	   While this change comes as a side effect, I think it is good
	   by itself. The group continued state can not be consumed by
	   another process in this case, it doesn't depend on ptrace,
	   it doesn't make sense to hide it from real parent.

	   Perhaps we should add the thread_group_leader() check before
	   wait_task_continued()? May be, but this shouldn't depend on
	   ptrace_reparented().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Cc: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
"A zombie is only visible to its ptracer" logic in wait_consider_task()
is very wrong. Trivial test-case:

	#include &lt;unistd.h&gt;
	#include &lt;sys/ptrace.h&gt;
	#include &lt;sys/wait.h&gt;
	#include &lt;assert.h&gt;

	int main(void)
	{
		int child = fork();

		if (!child) {
			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
			return 0x23;
		}

		assert(waitid(P_ALL, child, NULL, WEXITED | WNOWAIT) == 0);
		assert(waitid(P_ALL, 0, NULL, WSTOPPED) == -1);
		return 0;
	}

it hangs in waitpid(WSTOPPED) despite the fact it has a single zombie
child.  This is because wait_consider_task(ptrace =&gt; 0) sees p-&gt;ptrace and
cleares -&gt;notask_error assuming that the debugger should detach and notify
us.

Change wait_consider_task(ptrace =&gt; 0) to pretend that ptrace == T if the
child is traced by us.  This really simplifies the logic and allows us to
do more fixes, see the next changes.  This also hides the unwanted group
stop state automatically, we can remove another ptrace_reparented() check.

Unfortunately, this adds the following behavioural changes:

	1. Before this patch wait(WEXITED | __WNOTHREAD) does not reap
	   a natural child if it is traced by the caller's sub-thread.

	   Hopefully nobody will ever notice this change, and I think
	   that nobody should rely on this behaviour anyway.

	2. SIGNAL_STOP_CONTINUED is no longer hidden from debugger if
	   it is real parent.

	   While this change comes as a side effect, I think it is good
	   by itself. The group continued state can not be consumed by
	   another process in this case, it doesn't depend on ptrace,
	   it doesn't make sense to hide it from real parent.

	   Perhaps we should add the thread_group_leader() check before
	   wait_task_continued()? May be, but this shouldn't depend on
	   ptrace_reparented().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Cc: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>wait: completely ignore the EXIT_DEAD tasks</title>
<updated>2014-04-07T23:36:06+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-04-07T22:38:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b3ab03160dfaf8ab78d476b670de319f4c1a5685'/>
<id>b3ab03160dfaf8ab78d476b670de319f4c1a5685</id>
<content type='text'>
Now that EXIT_DEAD is the terminal state it doesn't make sense to call
eligible_child() or security_task_wait() if the task is really dead.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Tested-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that EXIT_DEAD is the terminal state it doesn't make sense to call
eligible_child() or security_task_wait() if the task is really dead.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Tested-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>wait: use EXIT_TRACE only if thread_group_leader(zombie)</title>
<updated>2014-04-07T23:36:05+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-04-07T22:38:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b436069059fede30ca31d4bf439cc86436ff5b1d'/>
<id>b436069059fede30ca31d4bf439cc86436ff5b1d</id>
<content type='text'>
wait_task_zombie() always uses EXIT_TRACE/ptrace_unlink() if
ptrace_reparented().  This is suboptimal and a bit confusing: we do not
need do_notify_parent(p) if !thread_group_leader(p) and in this case we
also do not need ptrace_unlink(), we can rely on ptrace_release_task().

Change wait_task_zombie() to check thread_group_leader() along with
ptrace_reparented() and simplify the final p-&gt;exit_state transition.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Tested-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
wait_task_zombie() always uses EXIT_TRACE/ptrace_unlink() if
ptrace_reparented().  This is suboptimal and a bit confusing: we do not
need do_notify_parent(p) if !thread_group_leader(p) and in this case we
also do not need ptrace_unlink(), we can rely on ptrace_release_task().

Change wait_task_zombie() to check thread_group_leader() along with
ptrace_reparented() and simplify the final p-&gt;exit_state transition.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Tested-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>wait: introduce EXIT_TRACE to avoid the racy EXIT_DEAD-&gt;EXIT_ZOMBIE transition</title>
<updated>2014-04-07T23:36:05+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-04-07T22:38:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=abd50b39e783e1b6c75c7534c37f1eb2d94a89cd'/>
<id>abd50b39e783e1b6c75c7534c37f1eb2d94a89cd</id>
<content type='text'>
wait_task_zombie() first does EXIT_ZOMBIE-&gt;EXIT_DEAD transition and
drops tasklist_lock.  If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for -&gt;real_parent.

The last transition is racy, this is even documented in 50b8d257486a
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD-&gt;EXIT_ZOMBIE
race".  wait_consider_task() tries to detect this transition and clear
-&gt;notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.

And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset &gt;exit_signal if
EXIT_DEAD, assuming that this task must be reaped by someone else.  So
the tracee can be re-parented with -&gt;exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable.  This was fixed by
the previous commit, but it was the temporary hack.

1. Add the new exit_state, EXIT_TRACE. It means that the task is the
   traced zombie, debugger is going to detach and notify its natural
   parent.

   This new state is actually EXIT_ZOMBIE | EXIT_DEAD. This way we
   can avoid the changes in proc/kgdb code, get_task_state() still
   reports "X (dead)" in this case.

   Note: with or without this change userspace can see Z -&gt; X -&gt; Z
   transition. Not really bad, but probably makes sense to fix.

2. Change wait_task_zombie() to use EXIT_TRACE instead of EXIT_DEAD
   if we need to notify the -&gt;real_parent.

3. Revert the previous hack in reparent_leader(), now that EXIT_DEAD
   is always the final state we can safely ignore such a task.

4. Change wait_consider_task() to check EXIT_TRACE separately and kill
   the racy and no longer needed ptrace_reparented() case.

   If ptrace == T an EXIT_TRACE thread should be simply ignored, the
   owner of this state is going to ptrace_unlink() this task. We can
   pretend that it was already removed from -&gt;ptraced list.

   Otherwise we should skip this thread too but clear -&gt;notask_error,
   we must be the natural parent and debugger is going to untrace and
   notify us. IOW, this doesn't differ from "EXIT_ZOMBIE &amp;&amp; p-&gt;ptrace"
   even if the task was already untraced.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Reported-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Tested-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
wait_task_zombie() first does EXIT_ZOMBIE-&gt;EXIT_DEAD transition and
drops tasklist_lock.  If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for -&gt;real_parent.

The last transition is racy, this is even documented in 50b8d257486a
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD-&gt;EXIT_ZOMBIE
race".  wait_consider_task() tries to detect this transition and clear
-&gt;notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.

And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset &gt;exit_signal if
EXIT_DEAD, assuming that this task must be reaped by someone else.  So
the tracee can be re-parented with -&gt;exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable.  This was fixed by
the previous commit, but it was the temporary hack.

1. Add the new exit_state, EXIT_TRACE. It means that the task is the
   traced zombie, debugger is going to detach and notify its natural
   parent.

   This new state is actually EXIT_ZOMBIE | EXIT_DEAD. This way we
   can avoid the changes in proc/kgdb code, get_task_state() still
   reports "X (dead)" in this case.

   Note: with or without this change userspace can see Z -&gt; X -&gt; Z
   transition. Not really bad, but probably makes sense to fix.

2. Change wait_task_zombie() to use EXIT_TRACE instead of EXIT_DEAD
   if we need to notify the -&gt;real_parent.

3. Revert the previous hack in reparent_leader(), now that EXIT_DEAD
   is always the final state we can safely ignore such a task.

4. Change wait_consider_task() to check EXIT_TRACE separately and kill
   the racy and no longer needed ptrace_reparented() case.

   If ptrace == T an EXIT_TRACE thread should be simply ignored, the
   owner of this state is going to ptrace_unlink() this task. We can
   pretend that it was already removed from -&gt;ptraced list.

   Otherwise we should skip this thread too but clear -&gt;notask_error,
   we must be the natural parent and debugger is going to untrace and
   notify us. IOW, this doesn't differ from "EXIT_ZOMBIE &amp;&amp; p-&gt;ptrace"
   even if the task was already untraced.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Reported-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Tested-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>wait: fix reparent_leader() vs EXIT_DEAD-&gt;EXIT_ZOMBIE race</title>
<updated>2014-04-07T23:36:05+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-04-07T22:38:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dfccbb5e49a621c1b21a62527d61fc4305617aca'/>
<id>dfccbb5e49a621c1b21a62527d61fc4305617aca</id>
<content type='text'>
wait_task_zombie() first does EXIT_ZOMBIE-&gt;EXIT_DEAD transition and
drops tasklist_lock.  If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for -&gt;real_parent.

The last transition is racy, this is even documented in 50b8d257486a
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD-&gt;EXIT_ZOMBIE
race".  wait_consider_task() tries to detect this transition and clear
-&gt;notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.

And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset &gt;exit_signal if
EXIT_DEAD, assuming that this task must be reaped by someone else.  So
the tracee can be re-parented with -&gt;exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable.

Change reparent_leader() to update -&gt;exit_signal even if EXIT_DEAD.
Note: this is the simple temporary hack for -stable, it doesn't try to
solve all problems, it will be reverted by the next changes.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Reported-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Tested-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
wait_task_zombie() first does EXIT_ZOMBIE-&gt;EXIT_DEAD transition and
drops tasklist_lock.  If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for -&gt;real_parent.

The last transition is racy, this is even documented in 50b8d257486a
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD-&gt;EXIT_ZOMBIE
race".  wait_consider_task() tries to detect this transition and clear
-&gt;notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.

And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset &gt;exit_signal if
EXIT_DEAD, assuming that this task must be reaped by someone else.  So
the tracee can be re-parented with -&gt;exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable.

Change reparent_leader() to update -&gt;exit_signal even if EXIT_DEAD.
Note: this is the simple temporary hack for -stable, it doesn't try to
solve all problems, it will be reverted by the next changes.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Jan Kratochvil &lt;jan.kratochvil@redhat.com&gt;
Reported-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Tested-by: Michal Schmidt &lt;mschmidt@redhat.com&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Lennart Poettering &lt;lpoetter@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kernel/exit.c: call proc_exit_connector() after exit_state is set</title>
<updated>2014-04-07T23:36:04+00:00</updated>
<author>
<name>Guillaume Morin</name>
<email>guillaume@morinfr.org</email>
</author>
<published>2014-04-07T22:38:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ef9823939e5acd5d323ff61fbc427ef998dd203e'/>
<id>ef9823939e5acd5d323ff61fbc427ef998dd203e</id>
<content type='text'>
The process events connector delivers a notification when a process
exits.  This is really convenient for a process that spawns and wants to
monitor its children through an epoll-able() interface.

Unfortunately, there is a small window between when the event is
delivered and the child become wait()-able.

This is creates a race if the parent wants to make sure that it knows
about the exit, e.g

pid_t pid = fork();
if (pid &gt; 0) {
	register_interest_for_pid(pid);
	if (waitpid(pid, NULL, WNOHANG) &gt; 0)
	{
	  /* We might have raced with exit() */
	}
	return;
}

/* Child */
execve(...)

register_interest_for_pid() would be telling the the connector socket
reader to pay attention to events related to pid.

Though this is not a bug, I think it would make the connector a bit more
usable if this race was closed by simply moving the call to
proc_exit_connector() from just before exit_notify() to right after.

Oleg said:

: Even with this patch the code above is still "racy" if the child is
: multi-threaded.  Plus it should obviously filter-out subthreads.  And
: afaics there is no way to make it reliable, even if you change the code
: above so that waitpid() is called only after the last thread exits WNOHANG
: still can fail.

Signed-off-by: Guillaume Morin &lt;guillaume@morinfr.org&gt;
Cc: Matt Helsley &lt;matt.helsley@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The process events connector delivers a notification when a process
exits.  This is really convenient for a process that spawns and wants to
monitor its children through an epoll-able() interface.

Unfortunately, there is a small window between when the event is
delivered and the child become wait()-able.

This is creates a race if the parent wants to make sure that it knows
about the exit, e.g

pid_t pid = fork();
if (pid &gt; 0) {
	register_interest_for_pid(pid);
	if (waitpid(pid, NULL, WNOHANG) &gt; 0)
	{
	  /* We might have raced with exit() */
	}
	return;
}

/* Child */
execve(...)

register_interest_for_pid() would be telling the the connector socket
reader to pay attention to events related to pid.

Though this is not a bug, I think it would make the connector a bit more
usable if this race was closed by simply moving the call to
proc_exit_connector() from just before exit_notify() to right after.

Oleg said:

: Even with this patch the code above is still "racy" if the child is
: multi-threaded.  Plus it should obviously filter-out subthreads.  And
: afaics there is no way to make it reliable, even if you change the code
: above so that waitpid() is called only after the last thread exits WNOHANG
: still can fail.

Signed-off-by: Guillaume Morin &lt;guillaume@morinfr.org&gt;
Cc: Matt Helsley &lt;matt.helsley@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>exit: move check_stack_usage() to the end of do_exit()</title>
<updated>2014-04-07T23:36:04+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-04-07T22:38:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4bcb8232cf4eb061b086c10f56b6808adcdb5a93'/>
<id>4bcb8232cf4eb061b086c10f56b6808adcdb5a93</id>
<content type='text'>
It is not clear why check_stack_usage() is called so early and thus it
never checks the stack usage in, say, exit_notify() or
flush_ptrace_hw_breakpoint() or other functions which are only called by
do_exit().

Move the callsite down to the last preempt_disable/schedule.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is not clear why check_stack_usage() is called so early and thus it
never checks the stack usage in, say, exit_notify() or
flush_ptrace_hw_breakpoint() or other functions which are only called by
do_exit().

Move the callsite down to the last preempt_disable/schedule.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>exit: call disassociate_ctty() before exit_task_namespaces()</title>
<updated>2014-04-07T23:36:03+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-04-07T22:38:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c39df5fa37b0623589508c95515b4aa1531c524e'/>
<id>c39df5fa37b0623589508c95515b4aa1531c524e</id>
<content type='text'>
Commit 8aac62706ada ("move exit_task_namespaces() outside of
exit_notify()") breaks pppd and the exiting service crashes the kernel:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
    IP: ppp_register_channel+0x13/0x20 [ppp_generic]
    Call Trace:
      ppp_asynctty_open+0x12b/0x170 [ppp_async]
      tty_ldisc_open.isra.2+0x27/0x60
      tty_ldisc_hangup+0x1e3/0x220
      __tty_hangup+0x2c4/0x440
      disassociate_ctty+0x61/0x270
      do_exit+0x7f2/0xa50

ppp_register_channel() needs -&gt;net_ns and current-&gt;nsproxy == NULL.

Move disassociate_ctty() before exit_task_namespaces(), it doesn't make
sense to delay it after perf_event_exit_task() or cgroup_exit().

This also allows to use task_work_add() inside the (nontrivial) code
paths in disassociate_ctty().

Investigated by Peter Hurley.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Sree Harsha Totakura &lt;sreeharsha@totakura.in&gt;
Cc: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Cc: Sree Harsha Totakura &lt;sreeharsha@totakura.in&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Jeff Dike &lt;jdike@addtoit.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Andrey Vagin &lt;avagin@openvz.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[v3.10+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 8aac62706ada ("move exit_task_namespaces() outside of
exit_notify()") breaks pppd and the exiting service crashes the kernel:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
    IP: ppp_register_channel+0x13/0x20 [ppp_generic]
    Call Trace:
      ppp_asynctty_open+0x12b/0x170 [ppp_async]
      tty_ldisc_open.isra.2+0x27/0x60
      tty_ldisc_hangup+0x1e3/0x220
      __tty_hangup+0x2c4/0x440
      disassociate_ctty+0x61/0x270
      do_exit+0x7f2/0xa50

ppp_register_channel() needs -&gt;net_ns and current-&gt;nsproxy == NULL.

Move disassociate_ctty() before exit_task_namespaces(), it doesn't make
sense to delay it after perf_event_exit_task() or cgroup_exit().

This also allows to use task_work_add() inside the (nontrivial) code
paths in disassociate_ctty().

Investigated by Peter Hurley.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Sree Harsha Totakura &lt;sreeharsha@totakura.in&gt;
Cc: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Cc: Sree Harsha Totakura &lt;sreeharsha@totakura.in&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Jeff Dike &lt;jdike@addtoit.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Andrey Vagin &lt;avagin@openvz.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[v3.10+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: remove useless argument from cgroup_exit()</title>
<updated>2014-03-29T13:15:54+00:00</updated>
<author>
<name>Li Zefan</name>
<email>lizefan@huawei.com</email>
</author>
<published>2014-03-28T07:22:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1ec41830e087cda1f62dda4182c2b62811eb0ffc'/>
<id>1ec41830e087cda1f62dda4182c2b62811eb0ffc</id>
<content type='text'>
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Li Zefan &lt;lizefan@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
