linux-toradex.git/kernel, branch v2.6.30.8

kthreads: fix kthread_create() vs kthread_stop() race

2009-09-09T03:33:58+00:00

The bug should be "accidently" fixed by recent changes in 2.6.31,
all kernels <= 2.6.30 need the fix. The problem was never noticed before,
it was found because it causes mysterious failures with GFS mount/umount.

Credits to Robert Peterson. He blaimed kthread.c from the very beginning.
But, despite my promise, I forgot to inspect the old implementation until
he did a lot of testing and reminded me. This led to huge delay in fixing
this bug.

kthread_stop() does put_task_struct(k) before it clears kthread_stop_info.k.
This means another kthread_create() can re-use this task_struct, but the
new kthread can still see kthread_should_stop() == T and exit even without
calling threadfn().

Reported-by: Robert Peterson 
Tested-by: Robert Peterson 
Signed-off-by: Oleg Nesterov 
Acked-by: Rusty Russell 
Signed-off-by: Greg Kroah-Hartman

do_sigaltstack: avoid copying 'stack_t' as a structure to user space

2009-09-09T03:33:50+00:00

commit 0083fc2c50e6c5127c2802ad323adf8143ab7856 upstream.

Ulrich Drepper correctly points out that there is generally padding in
the structure on 64-bit hosts, and that copying the structure from
kernel to user space can leak information from the kernel stack in those
padding bytes.

Avoid the whole issue by just copying the three members one by one
instead, which also means that the function also can avoid the need for
a stack frame.  This also happens to match how we copy the new structure
from user space, so it all even makes sense.

[ The obvious solution of adding a memset() generates horrid code, gcc
  does really stupid things. ]

Reported-by: Ulrich Drepper 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

clone(): fix race between copy_process() and de_thread()

2009-09-09T03:33:24+00:00

commit 4ab6c08336535f8c8e42cf45d7adeda882eff06e upstream.

Spotted by Hiroshi Shimamoto who also provided the test-case below.

copy_process() uses signal->count as a reference counter, but it is not.
This test case

	#include 
	#include 
	#include 
	#include 
	#include 
	#include 

	void *null_thread(void *p)
	{
		for (;;)
			sleep(1);

		return NULL;
	}

	void *exec_thread(void *p)
	{
		execl("/bin/true", "/bin/true", NULL);

		return null_thread(p);
	}

	int main(int argc, char **argv)
	{
		for (;;) {
			pid_t pid;
			int ret, status;

			pid = fork();
			if (pid < 0)
				break;

			if (!pid) {
				pthread_t tid;

				pthread_create(&tid, NULL, exec_thread, NULL);
				for (;;)
					pthread_create(&tid, NULL, null_thread, NULL);
			}

			do {
				ret = waitpid(pid, &status, 0);
			} while (ret == -1 && errno == EINTR);
		}

		return 0;
	}

quickly creates an unkillable task.

If copy_process(CLONE_THREAD) races with de_thread()
copy_signal()->atomic(signal->count) breaks the signal->notify_count
logic, and the execing thread can hang forever in kernel space.

Change copy_process() to increment count/live only when we know for sure
we can't fail.  In this case the forked thread will take care of its
reference to signal correctly.

If copy_process() fails, check CLONE_THREAD flag.  If it it set - do
nothing, the counters were not changed and current belongs to the same
thread group.  If it is not set, ->signal must be released in any case
(and ->count must be == 1), the forked child is the only thread in the
thread group.

We need more cleanups here, in particular signal->count should not be used
by de_thread/__exit_signal at all.  This patch only fixes the bug.

Reported-by: Hiroshi Shimamoto 
Tested-by: Hiroshi Shimamoto 
Signed-off-by: Oleg Nesterov 
Acked-by: Roland McGrath 
Cc: KAMEZAWA Hiroyuki 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

ring-buffer: Fix advance of reader in rb_buffer_peek()

2009-08-16T21:19:16+00:00

Backport for 2.6.30-stable of:

 469535a ring-buffer: Fix advance of reader in rb_buffer_peek()

When calling rb_buffer_peek() from ring_buffer_consume() and a
padding event is returned, the function rb_advance_reader() is
called twice. This may lead to missing samples or under high
workloads to the warning below. This patch fixes this. If a padding
event is returned by rb_buffer_peek() it will be consumed by the
calling function now.

Also, I simplified some code in ring_buffer_consume().

------------[ cut here ]------------
WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5()
Hardware name: Anaheim
Modules linked in:
Pid: 29, comm: events/2 Tainted: G        W  2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1
Call Trace:
[] ? rb_advance_reader+0x2e/0xc5
[] warn_slowpath_common+0x77/0x8f
[] warn_slowpath_null+0xf/0x11
[] rb_advance_reader+0x2e/0xc5
[] ring_buffer_consume+0xa0/0xd2
[] op_cpu_buffer_read_entry+0x21/0x9e
[] ? __find_get_block+0x4b/0x165
[] sync_buffer+0xa5/0x401
[] ? __find_get_block+0x4b/0x165
[] ? wq_sync_buffer+0x0/0x78
[] wq_sync_buffer+0x5b/0x78
[] worker_thread+0x113/0x1ac
[] ? autoremove_wake_function+0x0/0x38
[] ? worker_thread+0x0/0x1ac
[] kthread+0x88/0x92
[] child_rip+0xa/0x20
[] ? kthread+0x0/0x92
[] ? child_rip+0x0/0x20
---[ end trace f561c0a58fcc89bd ]---

Cc: Steven Rostedt 
Signed-off-by: Robert Richter 
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

ring-buffer: Fix memleak in ring_buffer_free()

2009-08-16T21:19:07+00:00

commit bd3f02212d6a457267e0c9c02c426151c436d9d4 upstream.

I noticed oprofile memleaked in linux-2.6 current tree,
and tracked this ring-buffer leak.

Signed-off-by: Eric Dumazet 
LKML-Reference: <4A7C06B9.2090302@gmail.com>
Signed-off-by: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

generic-ipi: fix hotplug_cfd()

2009-08-16T21:19:00+00:00

commit 69dd647f969c28d18de77e2153f30d05a1874571 upstream.

Use CONFIG_HOTPLUG_CPU, not CONFIG_CPU_HOTPLUG

When hot-unpluging a cpu, it will leak memory allocated at cpu hotplug,
but only if CPUMASK_OFFSTACK=y, which is default to n.

The bug was introduced by 8969a5ede0f9e17da4b943712429aef2c9bcd82b
("generic-ipi: remove kmalloc()").

Signed-off-by: Xiao Guangrong 
Cc: Ingo Molnar 
Cc: Jens Axboe 
Cc: Nick Piggin 
Cc: Peter Zijlstra 
Cc: Rusty Russell 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

execve: must clear current->clear_child_tid

2009-08-16T21:18:57+00:00

commit 9c8a8228d0827e0d91d28527209988f672f97d28 upstream.

While looking at Jens Rosenboom bug report
(http://lkml.org/lkml/2009/7/27/35) about strange sys_futex call done from
a dying "ps" program, we found following problem.

clone() syscall has special support for TID of created threads.  This
support includes two features.

One (CLONE_CHILD_SETTID) is to set an integer into user memory with the
TID value.

One (CLONE_CHILD_CLEARTID) is to clear this same integer once the created
thread dies.

The integer location is a user provided pointer, provided at clone()
time.

kernel keeps this pointer value into current->clear_child_tid.

At execve() time, we should make sure kernel doesnt keep this user
provided pointer, as full user memory is replaced by a new one.

As glibc fork() actually uses clone() syscall with CLONE_CHILD_SETTID and
CLONE_CHILD_CLEARTID set, chances are high that we might corrupt user
memory in forked processes.

Following sequence could happen:

1) bash (or any program) starts a new process, by a fork() call that
   glibc maps to a clone( ...  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
   ...) syscall

2) When new process starts, its current->clear_child_tid is set to a
   location that has a meaning only in bash (or initial program) context
   (&THREAD_SELF->tid)

3) This new process does the execve() syscall to start a new program.
   current->clear_child_tid is left unchanged (a non NULL value)

4) If this new program creates some threads, and initial thread exits,
   kernel will attempt to clear the integer pointed by
   current->clear_child_tid from mm_release() :

        if (tsk->clear_child_tid
            && !(tsk->flags & PF_SIGNALED)
            && atomic_read(&mm->mm_users) > 1) {
                u32 __user * tidptr = tsk->clear_child_tid;
                tsk->clear_child_tid = NULL;

                /*
                 * We don't check the error code - if userspace has
                 * not set up a proper pointer then tough luck.
                 */
<< here >>      put_user(0, tidptr);
                sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
        }

5) OR : if new program is not multi-threaded, but spied by /proc/pid
   users (ps command for example), mm_users > 1, and the exiting program
   could corrupt 4 bytes in a persistent memory area (shm or memory mapped
   file)

If current->clear_child_tid points to a writeable portion of memory of the
new program, kernel happily and silently corrupts 4 bytes of memory, with
unexpected effects.

Fix is straightforward and should not break any sane program.

Reported-by: Jens Rosenboom 
Acked-by: Linus Torvalds 
Signed-off-by: Eric Dumazet 
Signed-off-by: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Sonny Rao 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Ulrich Drepper 
Cc: Oleg Nesterov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

posix-timers: Fix oops in clock_nanosleep() with CLOCK_MONOTONIC_RAW

2009-08-16T21:18:38+00:00

commit 70d715fd0597f18528f389b5ac59102263067744 upstream.

Prevent calling do_nanosleep() with clockid
CLOCK_MONOTONIC_RAW, it may cause oops, such as NULL pointer
dereference.

Signed-off-by: Hiroshi Shimamoto 
Cc: Andrew Morton 
Cc: Thomas Gleixner 
Cc: John Stultz 
LKML-Reference: <4A764FF3.50607@ct.jp.nec.com>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

tracing: Fix missing function_graph events when we splice_read from trace_pipe

2009-08-16T21:18:36+00:00

commit 74e7ff8c50b6b022e6ffaa736b16a4dc161d3eaf upstream.

About a half events are missing when we splice_read
from trace_pipe. They are unexpectedly consumed because we ignore
the TRACE_TYPE_NO_CONSUME return value used by the function graph
tracer when it needs to consume the events by itself to walk on
the ring buffer.

The same problem appears with ftrace_dump()

Example of an output before this patch:

1)               |      ktime_get_real() {
1)   2.846 us    |          read_hpet();
1)   4.558 us    |        }
1)   6.195 us    |      }

After this patch:

0)               |      ktime_get_real() {
0)               |        getnstimeofday() {
0)   1.960 us    |          read_hpet();
0)   3.597 us    |        }
0)   5.196 us    |      }

The fix also applies on 2.6.30

Signed-off-by: Lai Jiangshan 
Cc: Steven Rostedt 
LKML-Reference: <4A6EEC52.90704@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Greg Kroah-Hartman

tracing: Fix invalid function_graph entry

2009-08-16T21:18:35+00:00

commit 38ceb592fcac9110c6b3c87ea0a27bff68c43486 upstream.

When print_graph_entry() computes a function call entry event, it needs
to also check the next entry to guess if it matches the return event of
the current function entry.
In order to look at this next event, it needs to consume the current
entry before going ahead in the ring buffer.

However, if the current event that gets consumed is the last one in the
ring buffer head page, the ring_buffer may reuse the page for writers.
The consumed entry will then become invalid because of possible
racy overwriting.

Me must then handle this entry by making a copy of it.

The fix also applies on 2.6.30

Signed-off-by: Lai Jiangshan 
Cc: Steven Rostedt 
LKML-Reference: <4A6EEAEC.3050508@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Greg Kroah-Hartman