linux-toradex.git/kernel, branch v2.6.33.8

call_function_many: add missing ordering

2011-03-21T19:45:53+00:00

commit 45a5791920ae643eafc02e2eedef1a58e341b736 upstream.

Paul McKenney's review pointed out two problems with the barriers in the
2.6.38 update to the smp call function many code.

First, a barrier that would force the func and info members of data to
be visible before their consumption in the interrupt handler was
missing.  This can be solved by adding a smp_wmb between setting the
func and info members and setting setting the cpumask; this will pair
with the existing and required smp_rmb ordering the cpumask read before
the read of refs.  This placement avoids the need a second smp_rmb in
the interrupt handler which would be executed on each of the N cpus
executing the call request.  (I was thinking this barrier was present
but was not).

Second, the previous write to refs (establishing the zero that we the
interrupt handler was testing from all cpus) was performed by a third
party cpu.  This would invoke transitivity which, as a recient or
concurrent addition to memory-barriers.txt now explicitly states, would
require a full smp_mb().

However, we know the cpumask will only be set by one cpu (the data
owner) and any preivous iteration of the mask would have cleared by the
reading cpu.  By redundantly writing refs to 0 on the owning cpu before
the smp_wmb, the write to refs will follow the same path as the writes
that set the cpumask, which in turn allows us to keep the barrier in the
interrupt handler a smp_rmb instead of promoting it to a smp_mb (which
will be be executed by N cpus for each of the possible M elements on the
list).

I moved and expanded the comment about our (ab)use of the rcu list
primitives for the concurrent walk earlier into this function.  I
considered moving the first two paragraphs to the queue list head and
lock, but felt it would have been too disconected from the code.

Cc: Paul McKinney 
Signed-off-by: Milton Miller 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

call_function_many: fix list delete vs add race

2011-03-21T19:45:52+00:00

commit e6cd1e07a185d5f9b0aa75e020df02d3c1c44940 upstream.

Peter pointed out there was nothing preventing the list_del_rcu in
smp_call_function_interrupt from running before the list_add_rcu in
smp_call_function_many.

Fix this by not setting refs until we have gotten the lock for the list.
Take advantage of the wmb in list_add_rcu to save an explicit additional
one.

I tried to force this race with a udelay before the lock & list_add and
by mixing all 64 online cpus with just 3 random cpus in the mask, but
was unsuccessful.  Still, inspection shows a valid race, and the fix is
a extension of the existing protection window in the current code.

Reported-by: Peter Zijlstra 
Signed-off-by: Milton Miller 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

sched: Fix user time incorrectly accounted as system time on 32-bit

2011-03-21T19:45:45+00:00

commit e75e863dd5c7d96b91ebbd241da5328fc38a78cc upstream.

We have 32-bit variable overflow possibility when multiply in
task_times() and thread_group_times() functions. When the
overflow happens then the scaled utime value becomes erroneously
small and the scaled stime becomes i erroneously big.

Reported here:

 https://bugzilla.redhat.com/show_bug.cgi?id=633037
 https://bugzilla.kernel.org/show_bug.cgi?id=16559

Reported-by: Michael Chapman 
Reported-by: Ciriaco Garcia de Celis 
Signed-off-by: Stanislaw Gruszka 
Signed-off-by: Peter Zijlstra 
Cc: Hidetoshi Seto 
LKML-Reference: <20100914143513.GB8415@redhat.com>
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

pid: make setpgid() system call use RCU read-side critical section

2011-03-21T19:45:45+00:00

commit 950eaaca681c44aab87a46225c9e44f902c080aa upstream.

[   23.584719]
[   23.584720] ===================================================
[   23.585059] [ INFO: suspicious rcu_dereference_check() usage. ]
[   23.585176] ---------------------------------------------------
[   23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection!
[   23.585176]
[   23.585176] other info that might help us debug this:
[   23.585176]
[   23.585176]
[   23.585176] rcu_scheduler_active = 1, debug_locks = 1
[   23.585176] 1 lock held by rc.sysinit/728:
[   23.585176]  #0:  (tasklist_lock){.+.+..}, at: [] sys_setpgid+0x5f/0x193
[   23.585176]
[   23.585176] stack backtrace:
[   23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2
[   23.585176] Call Trace:
[   23.585176]  [] lockdep_rcu_dereference+0x99/0xa2
[   23.585176]  [] find_task_by_pid_ns+0x50/0x6a
[   23.585176]  [] find_task_by_vpid+0x1d/0x1f
[   23.585176]  [] sys_setpgid+0x67/0x193
[   23.585176]  [] system_call_fastpath+0x16/0x1b
[   24.959669] type=1400 audit(1282938522.956:4): avc:  denied  { module_request } for  pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas

It turns out that the setpgid() system call fails to enter an RCU
read-side critical section before doing a PID-to-task_struct translation.
This commit therefore does rcu_read_lock() before the translation, and
also does rcu_read_unlock() after the last use of the returned pointer.

Reported-by: Andrew Morton 
Signed-off-by: Paul E. McKenney 
Acked-by: David Howells 
Cc: Jiri Slaby 
Cc: Oleg Nesterov 
Signed-off-by: Greg Kroah-Hartman

hw breakpoints: Fix pid namespace bug

2011-03-21T19:45:42+00:00

commit 068e35eee9ef98eb4cab55181977e24995d273be upstream.

Hardware breakpoints can't be registered within pid namespaces
because tsk->pid is passed rather than the pid in the current
namespace.

(See https://bugzilla.kernel.org/show_bug.cgi?id=17281 )

This is a quick fix demonstrating the problem but is not the
best method of solving the problem since passing pids internally
is not the best way to avoid pid namespace bugs. Subsequent patches
will show a better solution.

Much thanks to Frederic Weisbecker  for doing
the bulk of the work finding this bug.

Reported-by: Robin Green 
Signed-off-by: Matt Helsley 
Signed-off-by: Peter Zijlstra 
Cc: Prasad 
Cc: Arnaldo Carvalho de Melo 
Cc: Steven Rostedt 
Cc: Will Deacon 
Cc: Mahesh Salgaonkar 
LKML-Reference: 
Signed-off-by: Ingo Molnar 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Greg Kroah-Hartman

Fix unprotected access to task credentials in waitid()

2011-03-21T19:45:41+00:00

commit f362b73244fb16ea4ae127ced1467dd8adaa7733 upstream.

Using a program like the following:

	#include 
	#include 
	#include 
	#include 

	int main() {
		id_t id;
		siginfo_t infop;
		pid_t res;

		id = fork();
		if (id == 0) { sleep(1); exit(0); }
		kill(id, SIGSTOP);
		alarm(1);
		waitid(P_PID, id, &infop, WCONTINUED);
		return 0;
	}

to call waitid() on a stopped process results in access to the child task's
credentials without the RCU read lock being held - which may be replaced in the
meantime - eliciting the following warning:

	===================================================
	[ INFO: suspicious rcu_dereference_check() usage. ]
	---------------------------------------------------
	kernel/exit.c:1460 invoked rcu_dereference_check() without protection!

	other info that might help us debug this:

	rcu_scheduler_active = 1, debug_locks = 1
	2 locks held by waitid02/22252:
	 #0:  (tasklist_lock){.?.?..}, at: [] do_wait+0xc5/0x310
	 #1:  (&(&sighand->siglock)->rlock){-.-...}, at: []
	wait_consider_task+0x19a/0xbe0

	stack backtrace:
	Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
	Call Trace:
	 [] lockdep_rcu_dereference+0xa4/0xc0
	 [] wait_consider_task+0xaf1/0xbe0
	 [] do_wait+0xf5/0x310
	 [] sys_waitid+0x86/0x1f0
	 [] ? child_wait_callback+0x0/0x70
	 [] system_call_fastpath+0x16/0x1b

This is fixed by holding the RCU read lock in wait_task_continued() to ensure
that the task's current credentials aren't destroyed between us reading the
cred pointer and us reading the UID from those credentials.

Furthermore, protect wait_task_stopped() in the same way.

We don't need to keep holding the RCU read lock once we've read the UID from
the credentials as holding the RCU read lock doesn't stop the target task from
changing its creds under us - so the credentials may be outdated immediately
after we've read the pointer, lock or no lock.

Signed-off-by: Daniel J Blueman 
Signed-off-by: David Howells 
Acked-by: Paul E. McKenney 
Acked-by: Oleg Nesterov 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

ftrace: Fix memory leak with function graph and cpu hotplug

2011-03-21T19:45:37+00:00

commit 868baf07b1a259f5f3803c1dc2777b6c358f83cf upstream.

When the fuction graph tracer starts, it needs to make a special
stack for each task to save the real return values of the tasks.
All running tasks have this stack created, as well as any new
tasks.

On CPU hot plug, the new idle task will allocate a stack as well
when init_idle() is called. The problem is that cpu hotplug does
not create a new idle_task. Instead it uses the idle task that
existed when the cpu went down.

ftrace_graph_init_task() will add a new ret_stack to the task
that is given to it. Because a clone will make the task
have a stack of its parent it does not check if the task's
ret_stack is already NULL or not. When the CPU hotplug code
starts a CPU up again, it will allocate a new stack even
though one already existed for it.

The solution is to treat the idle_task specially. In fact, the
function_graph code already does, just not at init_idle().
Instead of using the ftrace_graph_init_task() for the idle task,
which that function expects the task to be a clone, have a
separate ftrace_graph_init_idle_task(). Also, we will create a
per_cpu ret_stack that is used by the idle task. When we call
ftrace_graph_init_idle_task() it will check if the idle task's
ret_stack is NULL, if it is, then it will assign it the per_cpu
ret_stack.

Reported-by: Benjamin Herrenschmidt 
Suggested-by: Peter Zijlstra 
Signed-off-by: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

cpuset: add a missing unlock in cpuset_write_resmask()

2011-03-21T19:45:28+00:00

commit b75f38d659e6fc747eda64cb72f3920e29dd44a4 upstream.

Don't forget to release cgroup_mutex if alloc_trial_cpuset() fails.

[akpm@linux-foundation.org: avoid multiple return points]
Signed-off-by: Li Zefan 
Cc: Paul Menage 
Acked-by: David Rientjes 
Cc: Miao Xie 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

clockevents: Prevent oneshot mode when broadcast device is periodic

2011-03-21T19:45:24+00:00

commit 3a142a0672b48a853f00af61f184c7341ac9c99d upstream.

When the per cpu timer is marked CLOCK_EVT_FEAT_C3STOP, then we only
can switch into oneshot mode, when the backup broadcast device
supports oneshot mode as well. Otherwise we would try to switch the
broadcast device into an unsupported mode unconditionally. This went
unnoticed so far as the current available broadcast devices support
oneshot mode. Seth unearthed this problem while debugging and working
around an hpet related BIOS wreckage.

Add the necessary check to tick_is_oneshot_available().

Reported-and-tested-by: Seth Forshee 
Signed-off-by: Thomas Gleixner 
LKML-Reference: 
Signed-off-by: Greg Kroah-Hartman

genirq: Disable the SHIRQ_DEBUG call in request_threaded_irq for now

2011-03-21T19:45:13+00:00

commit 6d83f94db95cfe65d2a6359cccdf61cf087c2598 upstream.

With CONFIG_SHIRQ_DEBUG=y we call a newly installed interrupt handler
in request_threaded_irq().

The original implementation (commit a304e1b8) called the handler
_BEFORE_ it was installed, but that caused problems with handlers
calling disable_irq_nosync(). See commit 377bf1e4.

It's braindead in the first place to call disable_irq_nosync in shared
handlers, but ....

Moving this call after we installed the handler looks innocent, but it
is very subtle broken on SMP.

Interrupt handlers rely on the fact, that the irq core prevents
reentrancy.

Now this debug call violates that promise because we run the handler
w/o the IRQ_INPROGRESS protection - which we cannot apply here because
that would result in a possibly forever masked interrupt line.

A concurrent real hardware interrupt on a different CPU results in
handler reentrancy and can lead to complete wreckage, which was
unfortunately observed in reality and took a fricking long time to
debug.

Leave the code here for now. We want this debug feature, but that's
not easy to fix. We really should get rid of those
disable_irq_nosync() abusers and remove that function completely.

Signed-off-by: Thomas Gleixner 
Cc: Anton Vorontsov 
Cc: David Woodhouse 
Cc: Arjan van de Ven 
Signed-off-by: Greg Kroah-Hartman