<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/sys.c, branch v2.6.33.1</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>kernel/sys.c: fix missing rcu protection for sys_getpriority()</title>
<updated>2010-02-23T03:50:34+00:00</updated>
<author>
<name>Tetsuo Handa</name>
<email>penguin-kernel@I-love.SAKURA.ne.jp</email>
</author>
<published>2010-02-22T20:44:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=701188374b6f1ef9cf7e4dce4a2e69ef4c0012ac'/>
<id>701188374b6f1ef9cf7e4dce4a2e69ef4c0012ac</id>
<content type='text'>
find_task_by_vpid() is not safe without rcu_read_lock().  2.6.33-rc7 got
RCU protection for sys_setpriority() but missed it for sys_getpriority().

Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@us.ibm.com&gt;
Acked-by: Serge Hallyn &lt;serue@us.ibm.com&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
find_task_by_vpid() is not safe without rcu_read_lock().  2.6.33-rc7 got
RCU protection for sys_setpriority() but missed it for sys_getpriority().

Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@us.ibm.com&gt;
Acked-by: Serge Hallyn &lt;serue@us.ibm.com&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2009-12-19T17:47:34+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-12-19T17:47:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=10e5453ffa0d04a2eda3cda3f55b88cb9c04595f'/>
<id>10e5453ffa0d04a2eda3cda3f55b88cb9c04595f</id>
<content type='text'>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sys: Fix missing rcu protection for __task_cred() access
  signals: Fix more rcu assumptions
  signal: Fix racy access to __task_cred in kill_pid_info_as_uid()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sys: Fix missing rcu protection for __task_cred() access
  signals: Fix more rcu assumptions
  signal: Fix racy access to __task_cred in kill_pid_info_as_uid()
</pre>
</div>
</content>
</entry>
<entry>
<title>kernel/sys.c: fix "warning: do-while statement is not a compound statement" noise</title>
<updated>2009-12-15T16:53:26+00:00</updated>
<author>
<name>H Hartley Sweeten</name>
<email>hartleys@visionengravers.com</email>
</author>
<published>2009-12-15T02:00:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dfc6a736d452a8c308190b618b065c2257d370ff'/>
<id>dfc6a736d452a8c308190b618b065c2257d370ff</id>
<content type='text'>
do_each_thread/while_each_thread wrap a block of code that is in this format:

	for (...)
		do
			...
		while

If curly braces do not surround the inner loop the following warning is
generated by sparse:

	warning: do-while statement is not a compound statement

Fix the warning by adding the braces.

Signed-off-by: H Hartley Sweeten &lt;hsweeten@visionengravers.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
do_each_thread/while_each_thread wrap a block of code that is in this format:

	for (...)
		do
			...
		while

If curly braces do not surround the inner loop the following warning is
generated by sparse:

	warning: do-while statement is not a compound statement

Fix the warning by adding the braces.

Signed-off-by: H Hartley Sweeten &lt;hsweeten@visionengravers.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sys: Fix missing rcu protection for __task_cred() access</title>
<updated>2009-12-10T22:04:11+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2009-12-10T00:52:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d4581a239a40319205762b76c01eb6363f277efa'/>
<id>d4581a239a40319205762b76c01eb6363f277efa</id>
<content type='text'>
commit c69e8d9 (CRED: Use RCU to access another task's creds and to
release a task's own creds) added non rcu_read_lock() protected access
to task creds of the target task in set_prio_one().

The comment above the function says:
 * - the caller must hold the RCU read lock

The calling code in sys_setpriority does read_lock(&amp;tasklist_lock) but
not rcu_read_lock(). This works only when CONFIG_TREE_PREEMPT_RCU=n.
With CONFIG_TREE_PREEMPT_RCU=y the rcu_callbacks can run in the tick
interrupt when they see no read side critical section.

There is another instance of __task_cred() in sys_setpriority() itself
which is equally unprotected.

Wrap the whole code section into a rcu read side critical section to
fix this quick and dirty.

Will be revisited in course of the read_lock(&amp;tasklist_lock) -&gt; rcu
crusade.

Oleg noted further:

This also fixes another bug here. find_task_by_vpid() is not safe
without rcu_read_lock(). I do not mean it is not safe to use the
result, just find_pid_ns() by itself is not safe.

Usually tasklist gives enough protection, but if copy_process() fails
it calls free_pid() lockless and does call_rcu(delayed_put_pid().
This means, without rcu lock find_pid_ns() can't scan the hash table
safely.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
LKML-Reference: &lt;20091210004703.029784964@linutronix.de&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c69e8d9 (CRED: Use RCU to access another task's creds and to
release a task's own creds) added non rcu_read_lock() protected access
to task creds of the target task in set_prio_one().

The comment above the function says:
 * - the caller must hold the RCU read lock

The calling code in sys_setpriority does read_lock(&amp;tasklist_lock) but
not rcu_read_lock(). This works only when CONFIG_TREE_PREEMPT_RCU=n.
With CONFIG_TREE_PREEMPT_RCU=y the rcu_callbacks can run in the tick
interrupt when they see no read side critical section.

There is another instance of __task_cred() in sys_setpriority() itself
which is equally unprotected.

Wrap the whole code section into a rcu read side critical section to
fix this quick and dirty.

Will be revisited in course of the read_lock(&amp;tasklist_lock) -&gt; rcu
crusade.

Oleg noted further:

This also fixes another bug here. find_task_by_vpid() is not safe
without rcu_read_lock(). I do not mean it is not safe to use the
result, just find_pid_ns() by itself is not safe.

Usually tasklist gives enough protection, but if copy_process() fails
it calls free_pid() lockless and does call_rcu(delayed_put_pid().
This means, without rcu lock find_pid_ns() can't scan the hash table
safely.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
LKML-Reference: &lt;20091210004703.029784964@linutronix.de&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'bkl-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2009-12-09T16:07:17+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-12-09T16:07:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3b8ecd22447c4266500c0bcf97f035310543e494'/>
<id>3b8ecd22447c4266500c0bcf97f035310543e494</id>
<content type='text'>
* 'bkl-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sys: Remove BKL from sys_reboot
  pm_qos: clean up racy global "name" variable
  pm_qos: remove BKL
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'bkl-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sys: Remove BKL from sys_reboot
  pm_qos: clean up racy global "name" variable
  pm_qos: remove BKL
</pre>
</div>
</content>
</entry>
<entry>
<title>sched, cputime: Introduce thread_group_times()</title>
<updated>2009-12-02T16:32:40+00:00</updated>
<author>
<name>Hidetoshi Seto</name>
<email>seto.hidetoshi@jp.fujitsu.com</email>
</author>
<published>2009-12-02T08:28:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0cf55e1ec08bb5a22e068309e2d8ba1180ab4239'/>
<id>0cf55e1ec08bb5a22e068309e2d8ba1180ab4239</id>
<content type='text'>
This is a real fix for problem of utime/stime values decreasing
described in the thread:

   http://lkml.org/lkml/2009/11/3/522

Now cputime is accounted in the following way:

 - {u,s}time in task_struct are increased every time when the thread
   is interrupted by a tick (timer interrupt).

 - When a thread exits, its {u,s}time are added to signal-&gt;{u,s}time,
   after adjusted by task_times().

 - When all threads in a thread_group exits, accumulated {u,s}time
   (and also c{u,s}time) in signal struct are added to c{u,s}time
   in signal struct of the group's parent.

So {u,s}time in task struct are "raw" tick count, while
{u,s}time and c{u,s}time in signal struct are "adjusted" values.

And accounted values are used by:

 - task_times(), to get cputime of a thread:
   This function returns adjusted values that originates from raw
   {u,s}time and scaled by sum_exec_runtime that accounted by CFS.

 - thread_group_cputime(), to get cputime of a thread group:
   This function returns sum of all {u,s}time of living threads in
   the group, plus {u,s}time in the signal struct that is sum of
   adjusted cputimes of all exited threads belonged to the group.

The problem is the return value of thread_group_cputime(),
because it is mixed sum of "raw" value and "adjusted" value:

  group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)

This misbehavior can break {u,s}time monotonicity.
Assume that if there is a thread that have raw values greater
than adjusted values (e.g. interrupted by 1000Hz ticks 50 times
but only runs 45ms) and if it exits, cputime will decrease (e.g.
-5ms).

To fix this, we could do:

  group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)

But task_times() contains hard divisions, so applying it for
every thread should be avoided.

This patch fixes the above problem in the following way:

 - Modify thread's exit (= __exit_signal()) not to use task_times().
   It means {u,s}time in signal struct accumulates raw values instead
   of adjusted values.  As the result it makes thread_group_cputime()
   to return pure sum of "raw" values.

 - Introduce a new function thread_group_times(*task, *utime, *stime)
   that converts "raw" values of thread_group_cputime() to "adjusted"
   values, in same calculation procedure as task_times().

 - Modify group's exit (= wait_task_zombie()) to use this introduced
   thread_group_times().  It make c{u,s}time in signal struct to
   have adjusted values like before this patch.

 - Replace some thread_group_cputime() by thread_group_times().
   This replacements are only applied where conveys the "adjusted"
   cputime to users, and where already uses task_times() near by it.
   (i.e. sys_times(), getrusage(), and /proc/&lt;PID&gt;/stat.)

This patch have a positive side effect:

 - Before this patch, if a group contains many short-life threads
   (e.g. runs 0.9ms and not interrupted by ticks), the group's
   cputime could be invisible since thread's cputime was accumulated
   after adjusted: imagine adjustment function as adj(ticks, runtime),
     {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
   After this patch it will not happen because the adjustment is
   applied after accumulated.

v2:
 - remove if()s, put new variables into signal_struct.

Signed-off-by: Hidetoshi Seto &lt;seto.hidetoshi@jp.fujitsu.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Spencer Candland &lt;spencer@bluehost.com&gt;
Cc: Americo Wang &lt;xiyou.wangcong@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
LKML-Reference: &lt;4B162517.8040909@jp.fujitsu.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is a real fix for problem of utime/stime values decreasing
described in the thread:

   http://lkml.org/lkml/2009/11/3/522

Now cputime is accounted in the following way:

 - {u,s}time in task_struct are increased every time when the thread
   is interrupted by a tick (timer interrupt).

 - When a thread exits, its {u,s}time are added to signal-&gt;{u,s}time,
   after adjusted by task_times().

 - When all threads in a thread_group exits, accumulated {u,s}time
   (and also c{u,s}time) in signal struct are added to c{u,s}time
   in signal struct of the group's parent.

So {u,s}time in task struct are "raw" tick count, while
{u,s}time and c{u,s}time in signal struct are "adjusted" values.

And accounted values are used by:

 - task_times(), to get cputime of a thread:
   This function returns adjusted values that originates from raw
   {u,s}time and scaled by sum_exec_runtime that accounted by CFS.

 - thread_group_cputime(), to get cputime of a thread group:
   This function returns sum of all {u,s}time of living threads in
   the group, plus {u,s}time in the signal struct that is sum of
   adjusted cputimes of all exited threads belonged to the group.

The problem is the return value of thread_group_cputime(),
because it is mixed sum of "raw" value and "adjusted" value:

  group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)

This misbehavior can break {u,s}time monotonicity.
Assume that if there is a thread that have raw values greater
than adjusted values (e.g. interrupted by 1000Hz ticks 50 times
but only runs 45ms) and if it exits, cputime will decrease (e.g.
-5ms).

To fix this, we could do:

  group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)

But task_times() contains hard divisions, so applying it for
every thread should be avoided.

This patch fixes the above problem in the following way:

 - Modify thread's exit (= __exit_signal()) not to use task_times().
   It means {u,s}time in signal struct accumulates raw values instead
   of adjusted values.  As the result it makes thread_group_cputime()
   to return pure sum of "raw" values.

 - Introduce a new function thread_group_times(*task, *utime, *stime)
   that converts "raw" values of thread_group_cputime() to "adjusted"
   values, in same calculation procedure as task_times().

 - Modify group's exit (= wait_task_zombie()) to use this introduced
   thread_group_times().  It make c{u,s}time in signal struct to
   have adjusted values like before this patch.

 - Replace some thread_group_cputime() by thread_group_times().
   This replacements are only applied where conveys the "adjusted"
   cputime to users, and where already uses task_times() near by it.
   (i.e. sys_times(), getrusage(), and /proc/&lt;PID&gt;/stat.)

This patch have a positive side effect:

 - Before this patch, if a group contains many short-life threads
   (e.g. runs 0.9ms and not interrupted by ticks), the group's
   cputime could be invisible since thread's cputime was accumulated
   after adjusted: imagine adjustment function as adj(ticks, runtime),
     {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
   After this patch it will not happen because the adjustment is
   applied after accumulated.

v2:
 - remove if()s, put new variables into signal_struct.

Signed-off-by: Hidetoshi Seto &lt;seto.hidetoshi@jp.fujitsu.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Spencer Candland &lt;spencer@bluehost.com&gt;
Cc: Americo Wang &lt;xiyou.wangcong@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
LKML-Reference: &lt;4B162517.8040909@jp.fujitsu.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Introduce task_times() to replace task_{u,s}time() pair</title>
<updated>2009-11-26T11:59:19+00:00</updated>
<author>
<name>Hidetoshi Seto</name>
<email>seto.hidetoshi@jp.fujitsu.com</email>
</author>
<published>2009-11-26T05:48:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d180c5bccec02612256fd8076ff3c1fac3429553'/>
<id>d180c5bccec02612256fd8076ff3c1fac3429553</id>
<content type='text'>
Functions task_{u,s}time() are called in pair in almost all
cases.  However task_stime() is implemented to call task_utime()
from its inside, so such paired calls run task_utime() twice.

It means we do heavy divisions (div_u64 + do_div) twice to get
utime and stime which can be obtained at same time by one set
of divisions.

This patch introduces a function task_times(*tsk, *utime,
*stime) to retrieve utime and stime at once in better, optimized
way.

Signed-off-by: Hidetoshi Seto &lt;seto.hidetoshi@jp.fujitsu.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Cc: Spencer Candland &lt;spencer@bluehost.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: Americo Wang &lt;xiyou.wangcong@gmail.com&gt;
LKML-Reference: &lt;4B0E16AE.906@jp.fujitsu.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Functions task_{u,s}time() are called in pair in almost all
cases.  However task_stime() is implemented to call task_utime()
from its inside, so such paired calls run task_utime() twice.

It means we do heavy divisions (div_u64 + do_div) twice to get
utime and stime which can be obtained at same time by one set
of divisions.

This patch introduces a function task_times(*tsk, *utime,
*stime) to retrieve utime and stime at once in better, optimized
way.

Signed-off-by: Hidetoshi Seto &lt;seto.hidetoshi@jp.fujitsu.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Cc: Spencer Candland &lt;spencer@bluehost.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: Americo Wang &lt;xiyou.wangcong@gmail.com&gt;
LKML-Reference: &lt;4B0E16AE.906@jp.fujitsu.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'hwpoison-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6</title>
<updated>2009-10-29T15:20:00+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-10-29T15:20:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3242f9804ba992c867360e2b57efc268b8e4e175'/>
<id>3242f9804ba992c867360e2b57efc268b8e4e175</id>
<content type='text'>
* 'hwpoison-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
  HWPOISON: fix invalid page count in printk output
  HWPOISON: Allow schedule_on_each_cpu() from keventd
  HWPOISON: fix/proc/meminfo alignment
  HWPOISON: fix oops on ksm pages
  HWPOISON: Fix page count leak in hwpoison late kill in do_swap_page
  HWPOISON: return early on non-LRU pages
  HWPOISON: Add brief hwpoison description to Documentation
  HWPOISON: Clean up PR_MCE_KILL interface
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'hwpoison-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
  HWPOISON: fix invalid page count in printk output
  HWPOISON: Allow schedule_on_each_cpu() from keventd
  HWPOISON: fix/proc/meminfo alignment
  HWPOISON: fix oops on ksm pages
  HWPOISON: Fix page count leak in hwpoison late kill in do_swap_page
  HWPOISON: return early on non-LRU pages
  HWPOISON: Add brief hwpoison description to Documentation
  HWPOISON: Clean up PR_MCE_KILL interface
</pre>
</div>
</content>
</entry>
<entry>
<title>connector: fix regression introduced by sid connector</title>
<updated>2009-10-29T14:39:25+00:00</updated>
<author>
<name>Christian Borntraeger</name>
<email>borntraeger@de.ibm.com</email>
</author>
<published>2009-10-26T23:49:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0d0df599f1f11f12d589318bacb59a50fb5c0310'/>
<id>0d0df599f1f11f12d589318bacb59a50fb5c0310</id>
<content type='text'>
Since commit 02b51df1b07b4e9ca823c89284e704cadb323cd1 (proc connector: add
event for process becoming session leader) we have the following warning:

Badness at kernel/softirq.c:143
[...]
Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
[...]
Call Trace:
([&lt;000000013fe04100&gt;] 0x13fe04100)
 [&lt;000000000048a946&gt;] sk_filter+0x9a/0xd0
 [&lt;000000000049d938&gt;] netlink_broadcast+0x2c0/0x53c
 [&lt;00000000003ba9ae&gt;] cn_netlink_send+0x272/0x2b0
 [&lt;00000000003baef0&gt;] proc_sid_connector+0xc4/0xd4
 [&lt;0000000000142604&gt;] __set_special_pids+0x58/0x90
 [&lt;0000000000159938&gt;] sys_setsid+0xb4/0xd8
 [&lt;00000000001187fe&gt;] sysc_noemu+0x10/0x16
 [&lt;00000041616cb266&gt;] 0x41616cb266

The warning is
---&gt;    WARN_ON_ONCE(in_irq() || irqs_disabled());

The network code must not be called with disabled interrupts but
sys_setsid holds the tasklist_lock with spinlock_irq while calling the
connector.

After a discussion we agreed that we can move proc_sid_connector from
__set_special_pids to sys_setsid.

We also agreed that it is sufficient to change the check from
task_session(curr) != pid into err &gt; 0, since if we don't change the
session, this means we were already the leader and return -EPERM.

One last thing:
There is also daemonize(), and some people might want to get a
notification in that case. Since daemonize() is only needed if a user
space does kernel_thread this does not look important (and there seems
to be no consensus if this connector should be called in daemonize). If
we really want this, we can add proc_sid_connector to daemonize() in an
additional patch (Scott?)

Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Scott James Remnant &lt;scott@ubuntu.com&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Evgeniy Polyakov &lt;zbr@ioremap.net&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit 02b51df1b07b4e9ca823c89284e704cadb323cd1 (proc connector: add
event for process becoming session leader) we have the following warning:

Badness at kernel/softirq.c:143
[...]
Krnl PSW : 0404c00180000000 00000000001481d4 (local_bh_enable+0xb0/0xe0)
[...]
Call Trace:
([&lt;000000013fe04100&gt;] 0x13fe04100)
 [&lt;000000000048a946&gt;] sk_filter+0x9a/0xd0
 [&lt;000000000049d938&gt;] netlink_broadcast+0x2c0/0x53c
 [&lt;00000000003ba9ae&gt;] cn_netlink_send+0x272/0x2b0
 [&lt;00000000003baef0&gt;] proc_sid_connector+0xc4/0xd4
 [&lt;0000000000142604&gt;] __set_special_pids+0x58/0x90
 [&lt;0000000000159938&gt;] sys_setsid+0xb4/0xd8
 [&lt;00000000001187fe&gt;] sysc_noemu+0x10/0x16
 [&lt;00000041616cb266&gt;] 0x41616cb266

The warning is
---&gt;    WARN_ON_ONCE(in_irq() || irqs_disabled());

The network code must not be called with disabled interrupts but
sys_setsid holds the tasklist_lock with spinlock_irq while calling the
connector.

After a discussion we agreed that we can move proc_sid_connector from
__set_special_pids to sys_setsid.

We also agreed that it is sufficient to change the check from
task_session(curr) != pid into err &gt; 0, since if we don't change the
session, this means we were already the leader and return -EPERM.

One last thing:
There is also daemonize(), and some people might want to get a
notification in that case. Since daemonize() is only needed if a user
space does kernel_thread this does not look important (and there seems
to be no consensus if this connector should be called in daemonize). If
we really want this, we can add proc_sid_connector to daemonize() in an
additional patch (Scott?)

Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Scott James Remnant &lt;scott@ubuntu.com&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Evgeniy Polyakov &lt;zbr@ioremap.net&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sys: Remove BKL from sys_reboot</title>
<updated>2009-10-14T13:31:10+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2009-10-09T18:31:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6f15fa50087c8317e353145319466afbeb27a75d'/>
<id>6f15fa50087c8317e353145319466afbeb27a75d</id>
<content type='text'>
Serialization of sys_reboot can be done local. The BKL is not
protecting anything else.

LKML-Reference: &lt;20091010153349.405590702@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Serialization of sys_reboot can be done local. The BKL is not
protecting anything else.

LKML-Reference: &lt;20091010153349.405590702@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
