<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel, branch v2.6.33.9</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Prevent rt_sigqueueinfo and rt_tgsigqueueinfo from spoofing the signal code</title>
<updated>2011-03-28T14:31:20+00:00</updated>
<author>
<name>Julien Tinnes</name>
<email>jln@google.com</email>
</author>
<published>2011-03-18T22:05:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ba8c721fdba8bfee6f87725cad3bbc28e20f9692'/>
<id>ba8c721fdba8bfee6f87725cad3bbc28e20f9692</id>
<content type='text'>
commit da48524eb20662618854bb3df2db01fc65f3070c upstream.

Userland should be able to trust the pid and uid of the sender of a
signal if the si_code is SI_TKILL.

Unfortunately, the kernel has historically allowed sigqueueinfo() to
send any si_code at all (as long as it was negative - to distinguish it
from kernel-generated signals like SIGILL etc), so it could spoof a
SI_TKILL with incorrect siginfo values.

Happily, it looks like glibc has always set si_code to the appropriate
SI_QUEUE, so there are probably no actual user code that ever uses
anything but the appropriate SI_QUEUE flag.

So just tighten the check for si_code (we used to allow any negative
value), and add a (one-time) warning in case there are binaries out
there that might depend on using other si_code values.

Signed-off-by: Julien Tinnes &lt;jln@google.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit da48524eb20662618854bb3df2db01fc65f3070c upstream.

Userland should be able to trust the pid and uid of the sender of a
signal if the si_code is SI_TKILL.

Unfortunately, the kernel has historically allowed sigqueueinfo() to
send any si_code at all (as long as it was negative - to distinguish it
from kernel-generated signals like SIGILL etc), so it could spoof a
SI_TKILL with incorrect siginfo values.

Happily, it looks like glibc has always set si_code to the appropriate
SI_QUEUE, so there are probably no actual user code that ever uses
anything but the appropriate SI_QUEUE flag.

So just tighten the check for si_code (we used to allow any negative
value), and add a (one-time) warning in case there are binaries out
there that might depend on using other si_code values.

Signed-off-by: Julien Tinnes &lt;jln@google.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>smp_call_function_many: handle concurrent clearing of mask</title>
<updated>2011-03-28T14:31:17+00:00</updated>
<author>
<name>Milton Miller</name>
<email>miltonm@bga.com</email>
</author>
<published>2011-03-15T19:27:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ff4e140f90d39616973a61512a7e245fea967334'/>
<id>ff4e140f90d39616973a61512a7e245fea967334</id>
<content type='text'>
commit 723aae25d5cdb09962901d36d526b44d4be1051c upstream.

Mike Galbraith reported finding a lockup ("perma-spin bug") where the
cpumask passed to smp_call_function_many was cleared by other cpu(s)
while a cpu was preparing its call_data block, resulting in no cpu to
clear the last ref and unlock the block.

Having cpus clear their bit asynchronously could be useful on a mask of
cpus that might have a translation context, or cpus that need a push to
complete an rcu window.

Instead of adding a BUG_ON and requiring yet another cpumask copy, just
detect the race and handle it.

Note: arch_send_call_function_ipi_mask must still handle an empty
cpumask because the data block is globally visible before the that arch
callback is made.  And (obviously) there are no guarantees to which cpus
are notified if the mask is changed during the call; only cpus that were
online and had their mask bit set during the whole call are guaranteed
to be called.

Reported-by: Mike Galbraith &lt;efault@gmx.de&gt;
Reported-by: Jan Beulich &lt;JBeulich@novell.com&gt;
Acked-by: Jan Beulich &lt;jbeulich@novell.com&gt;
Signed-off-by: Milton Miller &lt;miltonm@bga.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 723aae25d5cdb09962901d36d526b44d4be1051c upstream.

Mike Galbraith reported finding a lockup ("perma-spin bug") where the
cpumask passed to smp_call_function_many was cleared by other cpu(s)
while a cpu was preparing its call_data block, resulting in no cpu to
clear the last ref and unlock the block.

Having cpus clear their bit asynchronously could be useful on a mask of
cpus that might have a translation context, or cpus that need a push to
complete an rcu window.

Instead of adding a BUG_ON and requiring yet another cpumask copy, just
detect the race and handle it.

Note: arch_send_call_function_ipi_mask must still handle an empty
cpumask because the data block is globally visible before the that arch
callback is made.  And (obviously) there are no guarantees to which cpus
are notified if the mask is changed during the call; only cpus that were
online and had their mask bit set during the whole call are guaranteed
to be called.

Reported-by: Mike Galbraith &lt;efault@gmx.de&gt;
Reported-by: Jan Beulich &lt;JBeulich@novell.com&gt;
Acked-by: Jan Beulich &lt;jbeulich@novell.com&gt;
Signed-off-by: Milton Miller &lt;miltonm@bga.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>call_function_many: add missing ordering</title>
<updated>2011-03-21T19:45:53+00:00</updated>
<author>
<name>Milton Miller</name>
<email>miltonm@bga.com</email>
</author>
<published>2011-03-15T19:27:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dae148953f37229ca7be1a80d8c9584792d990b2'/>
<id>dae148953f37229ca7be1a80d8c9584792d990b2</id>
<content type='text'>
commit 45a5791920ae643eafc02e2eedef1a58e341b736 upstream.

Paul McKenney's review pointed out two problems with the barriers in the
2.6.38 update to the smp call function many code.

First, a barrier that would force the func and info members of data to
be visible before their consumption in the interrupt handler was
missing.  This can be solved by adding a smp_wmb between setting the
func and info members and setting setting the cpumask; this will pair
with the existing and required smp_rmb ordering the cpumask read before
the read of refs.  This placement avoids the need a second smp_rmb in
the interrupt handler which would be executed on each of the N cpus
executing the call request.  (I was thinking this barrier was present
but was not).

Second, the previous write to refs (establishing the zero that we the
interrupt handler was testing from all cpus) was performed by a third
party cpu.  This would invoke transitivity which, as a recient or
concurrent addition to memory-barriers.txt now explicitly states, would
require a full smp_mb().

However, we know the cpumask will only be set by one cpu (the data
owner) and any preivous iteration of the mask would have cleared by the
reading cpu.  By redundantly writing refs to 0 on the owning cpu before
the smp_wmb, the write to refs will follow the same path as the writes
that set the cpumask, which in turn allows us to keep the barrier in the
interrupt handler a smp_rmb instead of promoting it to a smp_mb (which
will be be executed by N cpus for each of the possible M elements on the
list).

I moved and expanded the comment about our (ab)use of the rcu list
primitives for the concurrent walk earlier into this function.  I
considered moving the first two paragraphs to the queue list head and
lock, but felt it would have been too disconected from the code.

Cc: Paul McKinney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Milton Miller &lt;miltonm@bga.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 45a5791920ae643eafc02e2eedef1a58e341b736 upstream.

Paul McKenney's review pointed out two problems with the barriers in the
2.6.38 update to the smp call function many code.

First, a barrier that would force the func and info members of data to
be visible before their consumption in the interrupt handler was
missing.  This can be solved by adding a smp_wmb between setting the
func and info members and setting setting the cpumask; this will pair
with the existing and required smp_rmb ordering the cpumask read before
the read of refs.  This placement avoids the need a second smp_rmb in
the interrupt handler which would be executed on each of the N cpus
executing the call request.  (I was thinking this barrier was present
but was not).

Second, the previous write to refs (establishing the zero that we the
interrupt handler was testing from all cpus) was performed by a third
party cpu.  This would invoke transitivity which, as a recient or
concurrent addition to memory-barriers.txt now explicitly states, would
require a full smp_mb().

However, we know the cpumask will only be set by one cpu (the data
owner) and any preivous iteration of the mask would have cleared by the
reading cpu.  By redundantly writing refs to 0 on the owning cpu before
the smp_wmb, the write to refs will follow the same path as the writes
that set the cpumask, which in turn allows us to keep the barrier in the
interrupt handler a smp_rmb instead of promoting it to a smp_mb (which
will be be executed by N cpus for each of the possible M elements on the
list).

I moved and expanded the comment about our (ab)use of the rcu list
primitives for the concurrent walk earlier into this function.  I
considered moving the first two paragraphs to the queue list head and
lock, but felt it would have been too disconected from the code.

Cc: Paul McKinney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Milton Miller &lt;miltonm@bga.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>call_function_many: fix list delete vs add race</title>
<updated>2011-03-21T19:45:52+00:00</updated>
<author>
<name>Milton Miller</name>
<email>miltonm@bga.com</email>
</author>
<published>2011-03-15T19:27:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9b1bd836a0acdd1d939417017535d37cfaf0e6d0'/>
<id>9b1bd836a0acdd1d939417017535d37cfaf0e6d0</id>
<content type='text'>
commit e6cd1e07a185d5f9b0aa75e020df02d3c1c44940 upstream.

Peter pointed out there was nothing preventing the list_del_rcu in
smp_call_function_interrupt from running before the list_add_rcu in
smp_call_function_many.

Fix this by not setting refs until we have gotten the lock for the list.
Take advantage of the wmb in list_add_rcu to save an explicit additional
one.

I tried to force this race with a udelay before the lock &amp; list_add and
by mixing all 64 online cpus with just 3 random cpus in the mask, but
was unsuccessful.  Still, inspection shows a valid race, and the fix is
a extension of the existing protection window in the current code.

Reported-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Milton Miller &lt;miltonm@bga.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e6cd1e07a185d5f9b0aa75e020df02d3c1c44940 upstream.

Peter pointed out there was nothing preventing the list_del_rcu in
smp_call_function_interrupt from running before the list_add_rcu in
smp_call_function_many.

Fix this by not setting refs until we have gotten the lock for the list.
Take advantage of the wmb in list_add_rcu to save an explicit additional
one.

I tried to force this race with a udelay before the lock &amp; list_add and
by mixing all 64 online cpus with just 3 random cpus in the mask, but
was unsuccessful.  Still, inspection shows a valid race, and the fix is
a extension of the existing protection window in the current code.

Reported-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Milton Miller &lt;miltonm@bga.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Fix user time incorrectly accounted as system time on 32-bit</title>
<updated>2011-03-21T19:45:45+00:00</updated>
<author>
<name>Stanislaw Gruszka</name>
<email>sgruszka@redhat.com</email>
</author>
<published>2010-09-14T14:35:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3ed36704fb8766909627b59a6564e3d43d7f033e'/>
<id>3ed36704fb8766909627b59a6564e3d43d7f033e</id>
<content type='text'>
commit e75e863dd5c7d96b91ebbd241da5328fc38a78cc upstream.

We have 32-bit variable overflow possibility when multiply in
task_times() and thread_group_times() functions. When the
overflow happens then the scaled utime value becomes erroneously
small and the scaled stime becomes i erroneously big.

Reported here:

 https://bugzilla.redhat.com/show_bug.cgi?id=633037
 https://bugzilla.kernel.org/show_bug.cgi?id=16559

Reported-by: Michael Chapman &lt;redhat-bugzilla@very.puzzling.org&gt;
Reported-by: Ciriaco Garcia de Celis &lt;sysman@etherpilot.com&gt;
Signed-off-by: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Hidetoshi Seto &lt;seto.hidetoshi@jp.fujitsu.com&gt;
LKML-Reference: &lt;20100914143513.GB8415@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e75e863dd5c7d96b91ebbd241da5328fc38a78cc upstream.

We have 32-bit variable overflow possibility when multiply in
task_times() and thread_group_times() functions. When the
overflow happens then the scaled utime value becomes erroneously
small and the scaled stime becomes i erroneously big.

Reported here:

 https://bugzilla.redhat.com/show_bug.cgi?id=633037
 https://bugzilla.kernel.org/show_bug.cgi?id=16559

Reported-by: Michael Chapman &lt;redhat-bugzilla@very.puzzling.org&gt;
Reported-by: Ciriaco Garcia de Celis &lt;sysman@etherpilot.com&gt;
Signed-off-by: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Hidetoshi Seto &lt;seto.hidetoshi@jp.fujitsu.com&gt;
LKML-Reference: &lt;20100914143513.GB8415@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>pid: make setpgid() system call use RCU read-side critical section</title>
<updated>2011-03-21T19:45:45+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2010-09-01T00:00:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=527a95060a52006117803e36ac63911540fe0cf3'/>
<id>527a95060a52006117803e36ac63911540fe0cf3</id>
<content type='text'>
commit 950eaaca681c44aab87a46225c9e44f902c080aa upstream.

[   23.584719]
[   23.584720] ===================================================
[   23.585059] [ INFO: suspicious rcu_dereference_check() usage. ]
[   23.585176] ---------------------------------------------------
[   23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection!
[   23.585176]
[   23.585176] other info that might help us debug this:
[   23.585176]
[   23.585176]
[   23.585176] rcu_scheduler_active = 1, debug_locks = 1
[   23.585176] 1 lock held by rc.sysinit/728:
[   23.585176]  #0:  (tasklist_lock){.+.+..}, at: [&lt;ffffffff8104771f&gt;] sys_setpgid+0x5f/0x193
[   23.585176]
[   23.585176] stack backtrace:
[   23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2
[   23.585176] Call Trace:
[   23.585176]  [&lt;ffffffff8105b436&gt;] lockdep_rcu_dereference+0x99/0xa2
[   23.585176]  [&lt;ffffffff8104c324&gt;] find_task_by_pid_ns+0x50/0x6a
[   23.585176]  [&lt;ffffffff8104c35b&gt;] find_task_by_vpid+0x1d/0x1f
[   23.585176]  [&lt;ffffffff81047727&gt;] sys_setpgid+0x67/0x193
[   23.585176]  [&lt;ffffffff810029eb&gt;] system_call_fastpath+0x16/0x1b
[   24.959669] type=1400 audit(1282938522.956:4): avc:  denied  { module_request } for  pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas

It turns out that the setpgid() system call fails to enter an RCU
read-side critical section before doing a PID-to-task_struct translation.
This commit therefore does rcu_read_lock() before the translation, and
also does rcu_read_unlock() after the last use of the returned pointer.

Reported-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Cc: Jiri Slaby &lt;jslaby@suse.cz&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 950eaaca681c44aab87a46225c9e44f902c080aa upstream.

[   23.584719]
[   23.584720] ===================================================
[   23.585059] [ INFO: suspicious rcu_dereference_check() usage. ]
[   23.585176] ---------------------------------------------------
[   23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection!
[   23.585176]
[   23.585176] other info that might help us debug this:
[   23.585176]
[   23.585176]
[   23.585176] rcu_scheduler_active = 1, debug_locks = 1
[   23.585176] 1 lock held by rc.sysinit/728:
[   23.585176]  #0:  (tasklist_lock){.+.+..}, at: [&lt;ffffffff8104771f&gt;] sys_setpgid+0x5f/0x193
[   23.585176]
[   23.585176] stack backtrace:
[   23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2
[   23.585176] Call Trace:
[   23.585176]  [&lt;ffffffff8105b436&gt;] lockdep_rcu_dereference+0x99/0xa2
[   23.585176]  [&lt;ffffffff8104c324&gt;] find_task_by_pid_ns+0x50/0x6a
[   23.585176]  [&lt;ffffffff8104c35b&gt;] find_task_by_vpid+0x1d/0x1f
[   23.585176]  [&lt;ffffffff81047727&gt;] sys_setpgid+0x67/0x193
[   23.585176]  [&lt;ffffffff810029eb&gt;] system_call_fastpath+0x16/0x1b
[   24.959669] type=1400 audit(1282938522.956:4): avc:  denied  { module_request } for  pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas

It turns out that the setpgid() system call fails to enter an RCU
read-side critical section before doing a PID-to-task_struct translation.
This commit therefore does rcu_read_lock() before the translation, and
also does rcu_read_unlock() after the last use of the returned pointer.

Reported-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Cc: Jiri Slaby &lt;jslaby@suse.cz&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>hw breakpoints: Fix pid namespace bug</title>
<updated>2011-03-21T19:45:42+00:00</updated>
<author>
<name>Matt Helsley</name>
<email>matthltc@us.ibm.com</email>
</author>
<published>2010-09-13T20:01:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6192bed1743d6a1bfc7c6620b3856f4284e31f4b'/>
<id>6192bed1743d6a1bfc7c6620b3856f4284e31f4b</id>
<content type='text'>
commit 068e35eee9ef98eb4cab55181977e24995d273be upstream.

Hardware breakpoints can't be registered within pid namespaces
because tsk-&gt;pid is passed rather than the pid in the current
namespace.

(See https://bugzilla.kernel.org/show_bug.cgi?id=17281 )

This is a quick fix demonstrating the problem but is not the
best method of solving the problem since passing pids internally
is not the best way to avoid pid namespace bugs. Subsequent patches
will show a better solution.

Much thanks to Frederic Weisbecker &lt;fweisbec@gmail.com&gt; for doing
the bulk of the work finding this bug.

Reported-by: Robin Green &lt;greenrd@greenrd.org&gt;
Signed-off-by: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Prasad &lt;prasad@linux.vnet.ibm.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
LKML-Reference: &lt;f63454af09fb1915717251570423eb9ddd338340.1284407762.git.matthltc@us.ibm.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 068e35eee9ef98eb4cab55181977e24995d273be upstream.

Hardware breakpoints can't be registered within pid namespaces
because tsk-&gt;pid is passed rather than the pid in the current
namespace.

(See https://bugzilla.kernel.org/show_bug.cgi?id=17281 )

This is a quick fix demonstrating the problem but is not the
best method of solving the problem since passing pids internally
is not the best way to avoid pid namespace bugs. Subsequent patches
will show a better solution.

Much thanks to Frederic Weisbecker &lt;fweisbec@gmail.com&gt; for doing
the bulk of the work finding this bug.

Reported-by: Robin Green &lt;greenrd@greenrd.org&gt;
Signed-off-by: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Prasad &lt;prasad@linux.vnet.ibm.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Mahesh Salgaonkar &lt;mahesh@linux.vnet.ibm.com&gt;
LKML-Reference: &lt;f63454af09fb1915717251570423eb9ddd338340.1284407762.git.matthltc@us.ibm.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Fix unprotected access to task credentials in waitid()</title>
<updated>2011-03-21T19:45:41+00:00</updated>
<author>
<name>Daniel J Blueman</name>
<email>daniel.blueman@gmail.com</email>
</author>
<published>2010-08-17T22:56:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2d41b2aadaf5d3cb8300fcd418f6ae932902b11b'/>
<id>2d41b2aadaf5d3cb8300fcd418f6ae932902b11b</id>
<content type='text'>
commit f362b73244fb16ea4ae127ced1467dd8adaa7733 upstream.

Using a program like the following:

	#include &lt;stdlib.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;sys/types.h&gt;
	#include &lt;sys/wait.h&gt;

	int main() {
		id_t id;
		siginfo_t infop;
		pid_t res;

		id = fork();
		if (id == 0) { sleep(1); exit(0); }
		kill(id, SIGSTOP);
		alarm(1);
		waitid(P_PID, id, &amp;infop, WCONTINUED);
		return 0;
	}

to call waitid() on a stopped process results in access to the child task's
credentials without the RCU read lock being held - which may be replaced in the
meantime - eliciting the following warning:

	===================================================
	[ INFO: suspicious rcu_dereference_check() usage. ]
	---------------------------------------------------
	kernel/exit.c:1460 invoked rcu_dereference_check() without protection!

	other info that might help us debug this:

	rcu_scheduler_active = 1, debug_locks = 1
	2 locks held by waitid02/22252:
	 #0:  (tasklist_lock){.?.?..}, at: [&lt;ffffffff81061ce5&gt;] do_wait+0xc5/0x310
	 #1:  (&amp;(&amp;sighand-&gt;siglock)-&gt;rlock){-.-...}, at: [&lt;ffffffff810611da&gt;]
	wait_consider_task+0x19a/0xbe0

	stack backtrace:
	Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
	Call Trace:
	 [&lt;ffffffff81095da4&gt;] lockdep_rcu_dereference+0xa4/0xc0
	 [&lt;ffffffff81061b31&gt;] wait_consider_task+0xaf1/0xbe0
	 [&lt;ffffffff81061d15&gt;] do_wait+0xf5/0x310
	 [&lt;ffffffff810620b6&gt;] sys_waitid+0x86/0x1f0
	 [&lt;ffffffff8105fce0&gt;] ? child_wait_callback+0x0/0x70
	 [&lt;ffffffff81003282&gt;] system_call_fastpath+0x16/0x1b

This is fixed by holding the RCU read lock in wait_task_continued() to ensure
that the task's current credentials aren't destroyed between us reading the
cred pointer and us reading the UID from those credentials.

Furthermore, protect wait_task_stopped() in the same way.

We don't need to keep holding the RCU read lock once we've read the UID from
the credentials as holding the RCU read lock doesn't stop the target task from
changing its creds under us - so the credentials may be outdated immediately
after we've read the pointer, lock or no lock.

Signed-off-by: Daniel J Blueman &lt;daniel.blueman@gmail.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f362b73244fb16ea4ae127ced1467dd8adaa7733 upstream.

Using a program like the following:

	#include &lt;stdlib.h&gt;
	#include &lt;unistd.h&gt;
	#include &lt;sys/types.h&gt;
	#include &lt;sys/wait.h&gt;

	int main() {
		id_t id;
		siginfo_t infop;
		pid_t res;

		id = fork();
		if (id == 0) { sleep(1); exit(0); }
		kill(id, SIGSTOP);
		alarm(1);
		waitid(P_PID, id, &amp;infop, WCONTINUED);
		return 0;
	}

to call waitid() on a stopped process results in access to the child task's
credentials without the RCU read lock being held - which may be replaced in the
meantime - eliciting the following warning:

	===================================================
	[ INFO: suspicious rcu_dereference_check() usage. ]
	---------------------------------------------------
	kernel/exit.c:1460 invoked rcu_dereference_check() without protection!

	other info that might help us debug this:

	rcu_scheduler_active = 1, debug_locks = 1
	2 locks held by waitid02/22252:
	 #0:  (tasklist_lock){.?.?..}, at: [&lt;ffffffff81061ce5&gt;] do_wait+0xc5/0x310
	 #1:  (&amp;(&amp;sighand-&gt;siglock)-&gt;rlock){-.-...}, at: [&lt;ffffffff810611da&gt;]
	wait_consider_task+0x19a/0xbe0

	stack backtrace:
	Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
	Call Trace:
	 [&lt;ffffffff81095da4&gt;] lockdep_rcu_dereference+0xa4/0xc0
	 [&lt;ffffffff81061b31&gt;] wait_consider_task+0xaf1/0xbe0
	 [&lt;ffffffff81061d15&gt;] do_wait+0xf5/0x310
	 [&lt;ffffffff810620b6&gt;] sys_waitid+0x86/0x1f0
	 [&lt;ffffffff8105fce0&gt;] ? child_wait_callback+0x0/0x70
	 [&lt;ffffffff81003282&gt;] system_call_fastpath+0x16/0x1b

This is fixed by holding the RCU read lock in wait_task_continued() to ensure
that the task's current credentials aren't destroyed between us reading the
cred pointer and us reading the UID from those credentials.

Furthermore, protect wait_task_stopped() in the same way.

We don't need to keep holding the RCU read lock once we've read the UID from
the credentials as holding the RCU read lock doesn't stop the target task from
changing its creds under us - so the credentials may be outdated immediately
after we've read the pointer, lock or no lock.

Signed-off-by: Daniel J Blueman &lt;daniel.blueman@gmail.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ftrace: Fix memory leak with function graph and cpu hotplug</title>
<updated>2011-03-21T19:45:37+00:00</updated>
<author>
<name>Steven Rostedt</name>
<email>srostedt@redhat.com</email>
</author>
<published>2011-02-11T02:26:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b0e080d125f14ce2cde9a2ab50b2c2d0fb5609df'/>
<id>b0e080d125f14ce2cde9a2ab50b2c2d0fb5609df</id>
<content type='text'>
commit 868baf07b1a259f5f3803c1dc2777b6c358f83cf upstream.

When the fuction graph tracer starts, it needs to make a special
stack for each task to save the real return values of the tasks.
All running tasks have this stack created, as well as any new
tasks.

On CPU hot plug, the new idle task will allocate a stack as well
when init_idle() is called. The problem is that cpu hotplug does
not create a new idle_task. Instead it uses the idle task that
existed when the cpu went down.

ftrace_graph_init_task() will add a new ret_stack to the task
that is given to it. Because a clone will make the task
have a stack of its parent it does not check if the task's
ret_stack is already NULL or not. When the CPU hotplug code
starts a CPU up again, it will allocate a new stack even
though one already existed for it.

The solution is to treat the idle_task specially. In fact, the
function_graph code already does, just not at init_idle().
Instead of using the ftrace_graph_init_task() for the idle task,
which that function expects the task to be a clone, have a
separate ftrace_graph_init_idle_task(). Also, we will create a
per_cpu ret_stack that is used by the idle task. When we call
ftrace_graph_init_idle_task() it will check if the idle task's
ret_stack is NULL, if it is, then it will assign it the per_cpu
ret_stack.

Reported-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Suggested-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 868baf07b1a259f5f3803c1dc2777b6c358f83cf upstream.

When the fuction graph tracer starts, it needs to make a special
stack for each task to save the real return values of the tasks.
All running tasks have this stack created, as well as any new
tasks.

On CPU hot plug, the new idle task will allocate a stack as well
when init_idle() is called. The problem is that cpu hotplug does
not create a new idle_task. Instead it uses the idle task that
existed when the cpu went down.

ftrace_graph_init_task() will add a new ret_stack to the task
that is given to it. Because a clone will make the task
have a stack of its parent it does not check if the task's
ret_stack is already NULL or not. When the CPU hotplug code
starts a CPU up again, it will allocate a new stack even
though one already existed for it.

The solution is to treat the idle_task specially. In fact, the
function_graph code already does, just not at init_idle().
Instead of using the ftrace_graph_init_task() for the idle task,
which that function expects the task to be a clone, have a
separate ftrace_graph_init_idle_task(). Also, we will create a
per_cpu ret_stack that is used by the idle task. When we call
ftrace_graph_init_idle_task() it will check if the idle task's
ret_stack is NULL, if it is, then it will assign it the per_cpu
ret_stack.

Reported-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Suggested-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset: add a missing unlock in cpuset_write_resmask()</title>
<updated>2011-03-21T19:45:28+00:00</updated>
<author>
<name>Li Zefan</name>
<email>lizf@cn.fujitsu.com</email>
</author>
<published>2011-03-05T01:36:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2c103bfa4db025da9ab9428981e9e8afcc765492'/>
<id>2c103bfa4db025da9ab9428981e9e8afcc765492</id>
<content type='text'>
commit b75f38d659e6fc747eda64cb72f3920e29dd44a4 upstream.

Don't forget to release cgroup_mutex if alloc_trial_cpuset() fails.

[akpm@linux-foundation.org: avoid multiple return points]
Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b75f38d659e6fc747eda64cb72f3920e29dd44a4 upstream.

Don't forget to release cgroup_mutex if alloc_trial_cpuset() fails.

[akpm@linux-foundation.org: avoid multiple return points]
Signed-off-by: Li Zefan &lt;lizf@cn.fujitsu.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
</feed>
