<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel, branch v2.6.30.8</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>kthreads: fix kthread_create() vs kthread_stop() race</title>
<updated>2009-09-09T03:33:58+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2009-08-24T10:45:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=df221c536aa18008724e57acc4170e0ba38fb11c'/>
<id>df221c536aa18008724e57acc4170e0ba38fb11c</id>
<content type='text'>
The bug should be "accidently" fixed by recent changes in 2.6.31,
all kernels &lt;= 2.6.30 need the fix. The problem was never noticed before,
it was found because it causes mysterious failures with GFS mount/umount.

Credits to Robert Peterson. He blaimed kthread.c from the very beginning.
But, despite my promise, I forgot to inspect the old implementation until
he did a lot of testing and reminded me. This led to huge delay in fixing
this bug.

kthread_stop() does put_task_struct(k) before it clears kthread_stop_info.k.
This means another kthread_create() can re-use this task_struct, but the
new kthread can still see kthread_should_stop() == T and exit even without
calling threadfn().

Reported-by: Robert Peterson &lt;rpeterso@redhat.com&gt;
Tested-by: Robert Peterson &lt;rpeterso@redhat.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The bug should be "accidently" fixed by recent changes in 2.6.31,
all kernels &lt;= 2.6.30 need the fix. The problem was never noticed before,
it was found because it causes mysterious failures with GFS mount/umount.

Credits to Robert Peterson. He blaimed kthread.c from the very beginning.
But, despite my promise, I forgot to inspect the old implementation until
he did a lot of testing and reminded me. This led to huge delay in fixing
this bug.

kthread_stop() does put_task_struct(k) before it clears kthread_stop_info.k.
This means another kthread_create() can re-use this task_struct, but the
new kthread can still see kthread_should_stop() == T and exit even without
calling threadfn().

Reported-by: Robert Peterson &lt;rpeterso@redhat.com&gt;
Tested-by: Robert Peterson &lt;rpeterso@redhat.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>do_sigaltstack: avoid copying 'stack_t' as a structure to user space</title>
<updated>2009-09-09T03:33:50+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-08-01T17:34:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=108c7d85e4f7e2c5e01dd554d7a13bb0dbab7768'/>
<id>108c7d85e4f7e2c5e01dd554d7a13bb0dbab7768</id>
<content type='text'>
commit 0083fc2c50e6c5127c2802ad323adf8143ab7856 upstream.

Ulrich Drepper correctly points out that there is generally padding in
the structure on 64-bit hosts, and that copying the structure from
kernel to user space can leak information from the kernel stack in those
padding bytes.

Avoid the whole issue by just copying the three members one by one
instead, which also means that the function also can avoid the need for
a stack frame.  This also happens to match how we copy the new structure
from user space, so it all even makes sense.

[ The obvious solution of adding a memset() generates horrid code, gcc
  does really stupid things. ]

Reported-by: Ulrich Drepper &lt;drepper@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0083fc2c50e6c5127c2802ad323adf8143ab7856 upstream.

Ulrich Drepper correctly points out that there is generally padding in
the structure on 64-bit hosts, and that copying the structure from
kernel to user space can leak information from the kernel stack in those
padding bytes.

Avoid the whole issue by just copying the three members one by one
instead, which also means that the function also can avoid the need for
a stack frame.  This also happens to match how we copy the new structure
from user space, so it all even makes sense.

[ The obvious solution of adding a memset() generates horrid code, gcc
  does really stupid things. ]

Reported-by: Ulrich Drepper &lt;drepper@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>clone(): fix race between copy_process() and de_thread()</title>
<updated>2009-09-09T03:33:24+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2009-08-26T21:29:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9c3239fd2f55924bf333b9cb4d2c28f27388c728'/>
<id>9c3239fd2f55924bf333b9cb4d2c28f27388c728</id>
<content type='text'>
commit 4ab6c08336535f8c8e42cf45d7adeda882eff06e upstream.

Spotted by Hiroshi Shimamoto who also provided the test-case below.

copy_process() uses signal-&gt;count as a reference counter, but it is not.
This test case

	#include &lt;sys/types.h&gt;
	#include &lt;sys/wait.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;stdio.h&gt;
	#include &lt;errno.h&gt;
	#include &lt;pthread.h&gt;

	void *null_thread(void *p)
	{
		for (;;)
			sleep(1);

		return NULL;
	}

	void *exec_thread(void *p)
	{
		execl("/bin/true", "/bin/true", NULL);

		return null_thread(p);
	}

	int main(int argc, char **argv)
	{
		for (;;) {
			pid_t pid;
			int ret, status;

			pid = fork();
			if (pid &lt; 0)
				break;

			if (!pid) {
				pthread_t tid;

				pthread_create(&amp;tid, NULL, exec_thread, NULL);
				for (;;)
					pthread_create(&amp;tid, NULL, null_thread, NULL);
			}

			do {
				ret = waitpid(pid, &amp;status, 0);
			} while (ret == -1 &amp;&amp; errno == EINTR);
		}

		return 0;
	}

quickly creates an unkillable task.

If copy_process(CLONE_THREAD) races with de_thread()
copy_signal()-&gt;atomic(signal-&gt;count) breaks the signal-&gt;notify_count
logic, and the execing thread can hang forever in kernel space.

Change copy_process() to increment count/live only when we know for sure
we can't fail.  In this case the forked thread will take care of its
reference to signal correctly.

If copy_process() fails, check CLONE_THREAD flag.  If it it set - do
nothing, the counters were not changed and current belongs to the same
thread group.  If it is not set, -&gt;signal must be released in any case
(and -&gt;count must be == 1), the forked child is the only thread in the
thread group.

We need more cleanups here, in particular signal-&gt;count should not be used
by de_thread/__exit_signal at all.  This patch only fixes the bug.

Reported-by: Hiroshi Shimamoto &lt;h-shimamoto@ct.jp.nec.com&gt;
Tested-by: Hiroshi Shimamoto &lt;h-shimamoto@ct.jp.nec.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4ab6c08336535f8c8e42cf45d7adeda882eff06e upstream.

Spotted by Hiroshi Shimamoto who also provided the test-case below.

copy_process() uses signal-&gt;count as a reference counter, but it is not.
This test case

	#include &lt;sys/types.h&gt;
	#include &lt;sys/wait.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;stdio.h&gt;
	#include &lt;errno.h&gt;
	#include &lt;pthread.h&gt;

	void *null_thread(void *p)
	{
		for (;;)
			sleep(1);

		return NULL;
	}

	void *exec_thread(void *p)
	{
		execl("/bin/true", "/bin/true", NULL);

		return null_thread(p);
	}

	int main(int argc, char **argv)
	{
		for (;;) {
			pid_t pid;
			int ret, status;

			pid = fork();
			if (pid &lt; 0)
				break;

			if (!pid) {
				pthread_t tid;

				pthread_create(&amp;tid, NULL, exec_thread, NULL);
				for (;;)
					pthread_create(&amp;tid, NULL, null_thread, NULL);
			}

			do {
				ret = waitpid(pid, &amp;status, 0);
			} while (ret == -1 &amp;&amp; errno == EINTR);
		}

		return 0;
	}

quickly creates an unkillable task.

If copy_process(CLONE_THREAD) races with de_thread()
copy_signal()-&gt;atomic(signal-&gt;count) breaks the signal-&gt;notify_count
logic, and the execing thread can hang forever in kernel space.

Change copy_process() to increment count/live only when we know for sure
we can't fail.  In this case the forked thread will take care of its
reference to signal correctly.

If copy_process() fails, check CLONE_THREAD flag.  If it it set - do
nothing, the counters were not changed and current belongs to the same
thread group.  If it is not set, -&gt;signal must be released in any case
(and -&gt;count must be == 1), the forked child is the only thread in the
thread group.

We need more cleanups here, in particular signal-&gt;count should not be used
by de_thread/__exit_signal at all.  This patch only fixes the bug.

Reported-by: Hiroshi Shimamoto &lt;h-shimamoto@ct.jp.nec.com&gt;
Tested-by: Hiroshi Shimamoto &lt;h-shimamoto@ct.jp.nec.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ring-buffer: Fix advance of reader in rb_buffer_peek()</title>
<updated>2009-08-16T21:19:16+00:00</updated>
<author>
<name>Robert Richter</name>
<email>robert.richter@amd.com</email>
</author>
<published>2009-08-12T15:59:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=46dbb6a2b732f337f11a981eb0d994582405c03e'/>
<id>46dbb6a2b732f337f11a981eb0d994582405c03e</id>
<content type='text'>
Backport for 2.6.30-stable of:

 469535a ring-buffer: Fix advance of reader in rb_buffer_peek()

When calling rb_buffer_peek() from ring_buffer_consume() and a
padding event is returned, the function rb_advance_reader() is
called twice. This may lead to missing samples or under high
workloads to the warning below. This patch fixes this. If a padding
event is returned by rb_buffer_peek() it will be consumed by the
calling function now.

Also, I simplified some code in ring_buffer_consume().

------------[ cut here ]------------
WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5()
Hardware name: Anaheim
Modules linked in:
Pid: 29, comm: events/2 Tainted: G        W  2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1
Call Trace:
[&lt;ffffffff8106776f&gt;] ? rb_advance_reader+0x2e/0xc5
[&lt;ffffffff81039ffe&gt;] warn_slowpath_common+0x77/0x8f
[&lt;ffffffff8103a025&gt;] warn_slowpath_null+0xf/0x11
[&lt;ffffffff8106776f&gt;] rb_advance_reader+0x2e/0xc5
[&lt;ffffffff81068bda&gt;] ring_buffer_consume+0xa0/0xd2
[&lt;ffffffff81326933&gt;] op_cpu_buffer_read_entry+0x21/0x9e
[&lt;ffffffff810be3af&gt;] ? __find_get_block+0x4b/0x165
[&lt;ffffffff8132749b&gt;] sync_buffer+0xa5/0x401
[&lt;ffffffff810be3af&gt;] ? __find_get_block+0x4b/0x165
[&lt;ffffffff81326c1b&gt;] ? wq_sync_buffer+0x0/0x78
[&lt;ffffffff81326c76&gt;] wq_sync_buffer+0x5b/0x78
[&lt;ffffffff8104aa30&gt;] worker_thread+0x113/0x1ac
[&lt;ffffffff8104dd95&gt;] ? autoremove_wake_function+0x0/0x38
[&lt;ffffffff8104a91d&gt;] ? worker_thread+0x0/0x1ac
[&lt;ffffffff8104dc9a&gt;] kthread+0x88/0x92
[&lt;ffffffff8100bdba&gt;] child_rip+0xa/0x20
[&lt;ffffffff8104dc12&gt;] ? kthread+0x0/0x92
[&lt;ffffffff8100bdb0&gt;] ? child_rip+0x0/0x20
---[ end trace f561c0a58fcc89bd ]---

Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Robert Richter &lt;robert.richter@amd.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Backport for 2.6.30-stable of:

 469535a ring-buffer: Fix advance of reader in rb_buffer_peek()

When calling rb_buffer_peek() from ring_buffer_consume() and a
padding event is returned, the function rb_advance_reader() is
called twice. This may lead to missing samples or under high
workloads to the warning below. This patch fixes this. If a padding
event is returned by rb_buffer_peek() it will be consumed by the
calling function now.

Also, I simplified some code in ring_buffer_consume().

------------[ cut here ]------------
WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5()
Hardware name: Anaheim
Modules linked in:
Pid: 29, comm: events/2 Tainted: G        W  2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1
Call Trace:
[&lt;ffffffff8106776f&gt;] ? rb_advance_reader+0x2e/0xc5
[&lt;ffffffff81039ffe&gt;] warn_slowpath_common+0x77/0x8f
[&lt;ffffffff8103a025&gt;] warn_slowpath_null+0xf/0x11
[&lt;ffffffff8106776f&gt;] rb_advance_reader+0x2e/0xc5
[&lt;ffffffff81068bda&gt;] ring_buffer_consume+0xa0/0xd2
[&lt;ffffffff81326933&gt;] op_cpu_buffer_read_entry+0x21/0x9e
[&lt;ffffffff810be3af&gt;] ? __find_get_block+0x4b/0x165
[&lt;ffffffff8132749b&gt;] sync_buffer+0xa5/0x401
[&lt;ffffffff810be3af&gt;] ? __find_get_block+0x4b/0x165
[&lt;ffffffff81326c1b&gt;] ? wq_sync_buffer+0x0/0x78
[&lt;ffffffff81326c76&gt;] wq_sync_buffer+0x5b/0x78
[&lt;ffffffff8104aa30&gt;] worker_thread+0x113/0x1ac
[&lt;ffffffff8104dd95&gt;] ? autoremove_wake_function+0x0/0x38
[&lt;ffffffff8104a91d&gt;] ? worker_thread+0x0/0x1ac
[&lt;ffffffff8104dc9a&gt;] kthread+0x88/0x92
[&lt;ffffffff8100bdba&gt;] child_rip+0xa/0x20
[&lt;ffffffff8104dc12&gt;] ? kthread+0x0/0x92
[&lt;ffffffff8100bdb0&gt;] ? child_rip+0x0/0x20
---[ end trace f561c0a58fcc89bd ]---

Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Robert Richter &lt;robert.richter@amd.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ring-buffer: Fix memleak in ring_buffer_free()</title>
<updated>2009-08-16T21:19:07+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-08-07T10:49:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fe1bbf0945bfff8913fe36b54bde71ebc412ce4b'/>
<id>fe1bbf0945bfff8913fe36b54bde71ebc412ce4b</id>
<content type='text'>
commit bd3f02212d6a457267e0c9c02c426151c436d9d4 upstream.

I noticed oprofile memleaked in linux-2.6 current tree,
and tracked this ring-buffer leak.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
LKML-Reference: &lt;4A7C06B9.2090302@gmail.com&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit bd3f02212d6a457267e0c9c02c426151c436d9d4 upstream.

I noticed oprofile memleaked in linux-2.6 current tree,
and tracked this ring-buffer leak.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
LKML-Reference: &lt;4A7C06B9.2090302@gmail.com&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>generic-ipi: fix hotplug_cfd()</title>
<updated>2009-08-16T21:19:00+00:00</updated>
<author>
<name>Xiao Guangrong</name>
<email>xiaoguangrong@cn.fujitsu.com</email>
</author>
<published>2009-08-06T22:07:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=da63841486a8f43acf16c6709d5fdf37675fb50b'/>
<id>da63841486a8f43acf16c6709d5fdf37675fb50b</id>
<content type='text'>
commit 69dd647f969c28d18de77e2153f30d05a1874571 upstream.

Use CONFIG_HOTPLUG_CPU, not CONFIG_CPU_HOTPLUG

When hot-unpluging a cpu, it will leak memory allocated at cpu hotplug,
but only if CPUMASK_OFFSTACK=y, which is default to n.

The bug was introduced by 8969a5ede0f9e17da4b943712429aef2c9bcd82b
("generic-ipi: remove kmalloc()").

Signed-off-by: Xiao Guangrong &lt;xiaoguangrong@cn.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 69dd647f969c28d18de77e2153f30d05a1874571 upstream.

Use CONFIG_HOTPLUG_CPU, not CONFIG_CPU_HOTPLUG

When hot-unpluging a cpu, it will leak memory allocated at cpu hotplug,
but only if CPUMASK_OFFSTACK=y, which is default to n.

The bug was introduced by 8969a5ede0f9e17da4b943712429aef2c9bcd82b
("generic-ipi: remove kmalloc()").

Signed-off-by: Xiao Guangrong &lt;xiaoguangrong@cn.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: Nick Piggin &lt;nickpiggin@yahoo.com.au&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>execve: must clear current-&gt;clear_child_tid</title>
<updated>2009-08-16T21:18:57+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2009-08-06T22:09:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=36bd78649e79b5689d263e51eec98e965c43ca3a'/>
<id>36bd78649e79b5689d263e51eec98e965c43ca3a</id>
<content type='text'>
commit 9c8a8228d0827e0d91d28527209988f672f97d28 upstream.

While looking at Jens Rosenboom bug report
(http://lkml.org/lkml/2009/7/27/35) about strange sys_futex call done from
a dying "ps" program, we found following problem.

clone() syscall has special support for TID of created threads.  This
support includes two features.

One (CLONE_CHILD_SETTID) is to set an integer into user memory with the
TID value.

One (CLONE_CHILD_CLEARTID) is to clear this same integer once the created
thread dies.

The integer location is a user provided pointer, provided at clone()
time.

kernel keeps this pointer value into current-&gt;clear_child_tid.

At execve() time, we should make sure kernel doesnt keep this user
provided pointer, as full user memory is replaced by a new one.

As glibc fork() actually uses clone() syscall with CLONE_CHILD_SETTID and
CLONE_CHILD_CLEARTID set, chances are high that we might corrupt user
memory in forked processes.

Following sequence could happen:

1) bash (or any program) starts a new process, by a fork() call that
   glibc maps to a clone( ...  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
   ...) syscall

2) When new process starts, its current-&gt;clear_child_tid is set to a
   location that has a meaning only in bash (or initial program) context
   (&amp;THREAD_SELF-&gt;tid)

3) This new process does the execve() syscall to start a new program.
   current-&gt;clear_child_tid is left unchanged (a non NULL value)

4) If this new program creates some threads, and initial thread exits,
   kernel will attempt to clear the integer pointed by
   current-&gt;clear_child_tid from mm_release() :

        if (tsk-&gt;clear_child_tid
            &amp;&amp; !(tsk-&gt;flags &amp; PF_SIGNALED)
            &amp;&amp; atomic_read(&amp;mm-&gt;mm_users) &gt; 1) {
                u32 __user * tidptr = tsk-&gt;clear_child_tid;
                tsk-&gt;clear_child_tid = NULL;

                /*
                 * We don't check the error code - if userspace has
                 * not set up a proper pointer then tough luck.
                 */
&lt;&lt; here &gt;&gt;      put_user(0, tidptr);
                sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
        }

5) OR : if new program is not multi-threaded, but spied by /proc/pid
   users (ps command for example), mm_users &gt; 1, and the exiting program
   could corrupt 4 bytes in a persistent memory area (shm or memory mapped
   file)

If current-&gt;clear_child_tid points to a writeable portion of memory of the
new program, kernel happily and silently corrupts 4 bytes of memory, with
unexpected effects.

Fix is straightforward and should not break any sane program.

Reported-by: Jens Rosenboom &lt;jens@mcbone.net&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sonny Rao &lt;sonnyrao@us.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ulrich Drepper &lt;drepper@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 9c8a8228d0827e0d91d28527209988f672f97d28 upstream.

While looking at Jens Rosenboom bug report
(http://lkml.org/lkml/2009/7/27/35) about strange sys_futex call done from
a dying "ps" program, we found following problem.

clone() syscall has special support for TID of created threads.  This
support includes two features.

One (CLONE_CHILD_SETTID) is to set an integer into user memory with the
TID value.

One (CLONE_CHILD_CLEARTID) is to clear this same integer once the created
thread dies.

The integer location is a user provided pointer, provided at clone()
time.

kernel keeps this pointer value into current-&gt;clear_child_tid.

At execve() time, we should make sure kernel doesnt keep this user
provided pointer, as full user memory is replaced by a new one.

As glibc fork() actually uses clone() syscall with CLONE_CHILD_SETTID and
CLONE_CHILD_CLEARTID set, chances are high that we might corrupt user
memory in forked processes.

Following sequence could happen:

1) bash (or any program) starts a new process, by a fork() call that
   glibc maps to a clone( ...  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
   ...) syscall

2) When new process starts, its current-&gt;clear_child_tid is set to a
   location that has a meaning only in bash (or initial program) context
   (&amp;THREAD_SELF-&gt;tid)

3) This new process does the execve() syscall to start a new program.
   current-&gt;clear_child_tid is left unchanged (a non NULL value)

4) If this new program creates some threads, and initial thread exits,
   kernel will attempt to clear the integer pointed by
   current-&gt;clear_child_tid from mm_release() :

        if (tsk-&gt;clear_child_tid
            &amp;&amp; !(tsk-&gt;flags &amp; PF_SIGNALED)
            &amp;&amp; atomic_read(&amp;mm-&gt;mm_users) &gt; 1) {
                u32 __user * tidptr = tsk-&gt;clear_child_tid;
                tsk-&gt;clear_child_tid = NULL;

                /*
                 * We don't check the error code - if userspace has
                 * not set up a proper pointer then tough luck.
                 */
&lt;&lt; here &gt;&gt;      put_user(0, tidptr);
                sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
        }

5) OR : if new program is not multi-threaded, but spied by /proc/pid
   users (ps command for example), mm_users &gt; 1, and the exiting program
   could corrupt 4 bytes in a persistent memory area (shm or memory mapped
   file)

If current-&gt;clear_child_tid points to a writeable portion of memory of the
new program, kernel happily and silently corrupts 4 bytes of memory, with
unexpected effects.

Fix is straightforward and should not break any sane program.

Reported-by: Jens Rosenboom &lt;jens@mcbone.net&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sonny Rao &lt;sonnyrao@us.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ulrich Drepper &lt;drepper@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>posix-timers: Fix oops in clock_nanosleep() with CLOCK_MONOTONIC_RAW</title>
<updated>2009-08-16T21:18:38+00:00</updated>
<author>
<name>Hiroshi Shimamoto</name>
<email>h-shimamoto@ct.jp.nec.com</email>
</author>
<published>2009-08-03T02:48:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f175fe2d75ae60db85e8ca5c78da99789fcfe212'/>
<id>f175fe2d75ae60db85e8ca5c78da99789fcfe212</id>
<content type='text'>
commit 70d715fd0597f18528f389b5ac59102263067744 upstream.

Prevent calling do_nanosleep() with clockid
CLOCK_MONOTONIC_RAW, it may cause oops, such as NULL pointer
dereference.

Signed-off-by: Hiroshi Shimamoto &lt;h-shimamoto@ct.jp.nec.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: John Stultz &lt;johnstul@us.ibm.com&gt;
LKML-Reference: &lt;4A764FF3.50607@ct.jp.nec.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 70d715fd0597f18528f389b5ac59102263067744 upstream.

Prevent calling do_nanosleep() with clockid
CLOCK_MONOTONIC_RAW, it may cause oops, such as NULL pointer
dereference.

Signed-off-by: Hiroshi Shimamoto &lt;h-shimamoto@ct.jp.nec.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: John Stultz &lt;johnstul@us.ibm.com&gt;
LKML-Reference: &lt;4A764FF3.50607@ct.jp.nec.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Fix missing function_graph events when we splice_read from trace_pipe</title>
<updated>2009-08-16T21:18:36+00:00</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2009-07-28T12:17:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5f18da549f0277e1c867169a74b1365d24cc3a9e'/>
<id>5f18da549f0277e1c867169a74b1365d24cc3a9e</id>
<content type='text'>
commit 74e7ff8c50b6b022e6ffaa736b16a4dc161d3eaf upstream.

About a half events are missing when we splice_read
from trace_pipe. They are unexpectedly consumed because we ignore
the TRACE_TYPE_NO_CONSUME return value used by the function graph
tracer when it needs to consume the events by itself to walk on
the ring buffer.

The same problem appears with ftrace_dump()

Example of an output before this patch:

1)               |      ktime_get_real() {
1)   2.846 us    |          read_hpet();
1)   4.558 us    |        }
1)   6.195 us    |      }

After this patch:

0)               |      ktime_get_real() {
0)               |        getnstimeofday() {
0)   1.960 us    |          read_hpet();
0)   3.597 us    |        }
0)   5.196 us    |      }

The fix also applies on 2.6.30

Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
LKML-Reference: &lt;4A6EEC52.90704@cn.fujitsu.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 74e7ff8c50b6b022e6ffaa736b16a4dc161d3eaf upstream.

About a half events are missing when we splice_read
from trace_pipe. They are unexpectedly consumed because we ignore
the TRACE_TYPE_NO_CONSUME return value used by the function graph
tracer when it needs to consume the events by itself to walk on
the ring buffer.

The same problem appears with ftrace_dump()

Example of an output before this patch:

1)               |      ktime_get_real() {
1)   2.846 us    |          read_hpet();
1)   4.558 us    |        }
1)   6.195 us    |      }

After this patch:

0)               |      ktime_get_real() {
0)               |        getnstimeofday() {
0)   1.960 us    |          read_hpet();
0)   3.597 us    |        }
0)   5.196 us    |      }

The fix also applies on 2.6.30

Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
LKML-Reference: &lt;4A6EEC52.90704@cn.fujitsu.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Fix invalid function_graph entry</title>
<updated>2009-08-16T21:18:35+00:00</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2009-07-28T12:11:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=75db2222e404b259f1fd2caaeb825e8489e02a31'/>
<id>75db2222e404b259f1fd2caaeb825e8489e02a31</id>
<content type='text'>
commit 38ceb592fcac9110c6b3c87ea0a27bff68c43486 upstream.

When print_graph_entry() computes a function call entry event, it needs
to also check the next entry to guess if it matches the return event of
the current function entry.
In order to look at this next event, it needs to consume the current
entry before going ahead in the ring buffer.

However, if the current event that gets consumed is the last one in the
ring buffer head page, the ring_buffer may reuse the page for writers.
The consumed entry will then become invalid because of possible
racy overwriting.

Me must then handle this entry by making a copy of it.

The fix also applies on 2.6.30

Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
LKML-Reference: &lt;4A6EEAEC.3050508@cn.fujitsu.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 38ceb592fcac9110c6b3c87ea0a27bff68c43486 upstream.

When print_graph_entry() computes a function call entry event, it needs
to also check the next entry to guess if it matches the return event of
the current function entry.
In order to look at this next event, it needs to consume the current
entry before going ahead in the ring buffer.

However, if the current event that gets consumed is the last one in the
ring buffer head page, the ring_buffer may reuse the page for writers.
The consumed entry will then become invalid because of possible
racy overwriting.

Me must then handle this entry by making a copy of it.

The fix also applies on 2.6.30

Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
LKML-Reference: &lt;4A6EEAEC.3050508@cn.fujitsu.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
</feed>
