<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/sched/fair.c, branch v3.3-rc4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>sched/nohz: Fix nohz cpu idle load balancing state with cpu hotplug</title>
<updated>2012-01-26T18:38:13+00:00</updated>
<author>
<name>Suresh Siddha</name>
<email>suresh.b.siddha@intel.com</email>
</author>
<published>2012-01-20T02:28:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=71325960d16cd68ea0e22a8da15b2495b0f363f7'/>
<id>71325960d16cd68ea0e22a8da15b2495b0f363f7</id>
<content type='text'>
With the recent nohz scheduler changes, rq's nohz flag
'NOHZ_TICK_STOPPED' and its associated state are not cleared
immediately after the cpu exits idle. They are cleared as part
of the next tick seen on that cpu.

To support cpu offline, we need to clear this state
manually. Fix it by registering a cpu notifier, which clears the
nohz idle load balance state for this rq explicitly during the
CPU_DYING notification.

There won't be any nohz updates for that cpu after the
CPU_DYING notification. But let's be extra paranoid and skip
updating the nohz state in select_nohz_load_balancer() if
the cpu is no longer active.

Reported-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Reviewed-and-tested-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Tested-by: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Signed-off-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1327026538.16150.40.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the recent nohz scheduler changes, rq's nohz flag
'NOHZ_TICK_STOPPED' and its associated state are not cleared
immediately after the cpu exits idle. They are cleared as part
of the next tick seen on that cpu.

To support cpu offline, we need to clear this state
manually. Fix it by registering a cpu notifier, which clears the
nohz idle load balance state for this rq explicitly during the
CPU_DYING notification.

There won't be any nohz updates for that cpu after the
CPU_DYING notification. But let's be extra paranoid and skip
updating the nohz state in select_nohz_load_balancer() if
the cpu is no longer active.

Reported-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Reviewed-and-tested-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Tested-by: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Signed-off-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1327026538.16150.40.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Fix lockup by limiting load-balance retries on lock-break</title>
<updated>2012-01-11T16:15:12+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2012-01-11T12:11:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bced76aeaca03b45e3b4bdb868cada328e497847'/>
<id>bced76aeaca03b45e3b4bdb868cada328e497847</id>
<content type='text'>
Eric and David reported dead machines and traced it to commit
a195f004 ("sched: Fix load-balance lock-breaking"). It turns out
there's still a scenario where we can end up retrying forever.

Since there is no strict forward progress guarantee in the
load-balance iteration, we can get stuck retrying the same
task-set over and over.

Creating a forward progress guarantee with the existing
structure is somewhat non-trivial, so for now simply terminate the
retry loop after a few tries.
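
The idea of capping the retries can be sketched outside the kernel.
This is only a toy model in Python, not the kernel code: the names
MAX_LB_TRIES, balance_with_retry_cap and try_move are hypothetical,
and the cap of 3 is an assumption (the text above only says "a few").

```python
MAX_LB_TRIES = 3  # assumed cap; the commit message only says "a few tries"


def balance_with_retry_cap(try_move, max_tries=MAX_LB_TRIES):
    """Run try_move() until it stops asking for a retry or the cap is hit.

    try_move() is a hypothetical callback that returns True when it had
    to break out early and wants another pass, and False when it is done.
    Returns the number of passes that were made.
    """
    passes = 0
    for _ in range(max_tries):
        passes += 1
        if not try_move():
            break  # forward progress was made, no retry needed
    return passes


def always_wants_retry():
    # Models the reported lockup: the same task-set is retried forever.
    return True


print(balance_with_retry_cap(always_wants_retry))
```

Without the cap, a callback like always_wants_retry() would keep the
loop spinning; with it, the loop gives up after max_tries passes.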

Reported-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Tested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Reported-by: David Ahern &lt;dsahern@gmail.com&gt;
[ logic cleanup as suggested by Eric ]
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Link: http://lkml.kernel.org/r/1326297936.2442.157.camel@twins
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Eric and David reported dead machines and traced it to commit
a195f004 ("sched: Fix load-balance lock-breaking"). It turns out
there's still a scenario where we can end up retrying forever.

Since there is no strict forward progress guarantee in the
load-balance iteration, we can get stuck retrying the same
task-set over and over.

Creating a forward progress guarantee with the existing
structure is somewhat non-trivial, so for now simply terminate the
retry loop after a few tries.

Reported-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Tested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Reported-by: David Ahern &lt;dsahern@gmail.com&gt;
[ logic cleanup as suggested by Eric ]
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Link: http://lkml.kernel.org/r/1326297936.2442.157.camel@twins
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/tracing: Add a new tracepoint for sleeptime</title>
<updated>2011-12-23T16:56:17+00:00</updated>
<author>
<name>Arun Sharma</name>
<email>asharma@fb.com</email>
</author>
<published>2011-12-22T00:15:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1ac9bc6943edf7d181b4b1cc734981350d4f6bae'/>
<id>1ac9bc6943edf7d181b4b1cc734981350d4f6bae</id>
<content type='text'>
If CONFIG_SCHEDSTATS is defined, the kernel maintains
information about how long the task was sleeping or,
in the case of iowait, blocking in the kernel before
getting woken up.

This will be useful for sleep time profiling.

Note: this information is only provided for sched_fair.
Other scheduling classes may choose to provide this in
the future.

Note: the delay includes the time spent on the runqueue
as well.

Signed-off-by: Arun Sharma &lt;asharma@fb.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@infradead.org&gt;
Cc: Andrew Vagin &lt;avagin@openvz.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Link: http://lkml.kernel.org/r/1324512940-32060-2-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If CONFIG_SCHEDSTATS is defined, the kernel maintains
information about how long the task was sleeping or,
in the case of iowait, blocking in the kernel before
getting woken up.

This will be useful for sleep time profiling.

Note: this information is only provided for sched_fair.
Other scheduling classes may choose to provide this in
the future.

Note: the delay includes the time spent on the runqueue
as well.

Signed-off-by: Arun Sharma &lt;asharma@fb.com&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@infradead.org&gt;
Cc: Andrew Vagin &lt;avagin@openvz.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Link: http://lkml.kernel.org/r/1324512940-32060-2-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Fix cgroup movement of waking process</title>
<updated>2011-12-21T09:34:52+00:00</updated>
<author>
<name>Daisuke Nishimura</name>
<email>nishimura@mxp.nes.nec.co.jp</email>
</author>
<published>2011-12-15T05:37:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=62af3783e4fd8ba9e28416e8e91cb3bdd9fb133e'/>
<id>62af3783e4fd8ba9e28416e8e91cb3bdd9fb133e</id>
<content type='text'>
There is a small race between try_to_wake_up() and sched_move_task(),
which is trying to move the process being woken up.

    try_to_wake_up() on CPU0       sched_move_task() on CPU1
--------------------------------+---------------------------------
  raw_spin_lock_irqsave(p-&gt;pi_lock)
  task_waking_fair()
    -&gt;p.se.vruntime -= cfs_rq-&gt;min_vruntime
  ttwu_queue()
    -&gt;send reschedule IPI to CPU1
  raw_spin_unlock_irqsave(p-&gt;pi_lock)
                                   task_rq_lock()
                                     -&gt; trying to acquire both p-&gt;pi_lock and
                                        rq-&gt;lock with IRQ disabled
                                   task_move_group_fair()
                                     -&gt; p.se.vruntime
                                          -= (old)cfs_rq-&gt;min_vruntime
                                          += (new)cfs_rq-&gt;min_vruntime
                                   task_rq_unlock()

                                   (via IPI)
                                   sched_ttwu_pending()
                                     raw_spin_lock(rq-&gt;lock)
                                     ttwu_do_activate()
                                       ...
                                       enqueue_entity()
                                         p.se.vruntime += cfs_rq-&gt;min_vruntime
                                     raw_spin_unlock(rq-&gt;lock)

As a result, vruntime of the process becomes far bigger than min_vruntime,
if (new)cfs_rq-&gt;min_vruntime &gt;&gt; (old)cfs_rq-&gt;min_vruntime.

This patch fixes this problem by just ignoring such a process in
task_move_group_fair(), because the vruntime has already been normalized in
task_waking_fair().
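
The arithmetic of the race can be replayed with a toy model. This is a
sketch in Python, not kernel code: the function name, the flag and all
numbers are made up for illustration, and only the order of the
additions and subtractions follows the diagram above.

```python
def wake_with_group_move(skip_normalized_task):
    """Return the task's vruntime relative to the new queue after wakeup."""
    old_min, new_min = 1_000, 1_000_000  # made-up min_vruntime values
    vruntime = 1_050                     # task sits 50 units past old_min

    # task_waking_fair(): make vruntime relative to the old queue.
    vruntime -= old_min
    normalized = True

    # task_move_group_fair(): re-base from the old group to the new one.
    # The fix corresponds to skipping this step for a normalized task.
    if not (skip_normalized_task and normalized):
        vruntime -= old_min
        vruntime += new_min

    # enqueue_entity() on the new queue: make vruntime absolute again.
    vruntime += new_min

    return vruntime - new_min


print(wake_with_group_move(skip_normalized_task=False))
print(wake_with_group_move(skip_normalized_task=True))
```

With the re-basing applied twice the task ends up 999050 units ahead
of the new min_vruntime; with the move step skipped it keeps its
original lag of 50.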

Signed-off-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: http://lkml.kernel.org/r/20111215143741.df82dd50.nishimura@mxp.nes.nec.co.jp
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a small race between try_to_wake_up() and sched_move_task(),
which is trying to move the process being woken up.

    try_to_wake_up() on CPU0       sched_move_task() on CPU1
--------------------------------+---------------------------------
  raw_spin_lock_irqsave(p-&gt;pi_lock)
  task_waking_fair()
    -&gt;p.se.vruntime -= cfs_rq-&gt;min_vruntime
  ttwu_queue()
    -&gt;send reschedule IPI to CPU1
  raw_spin_unlock_irqsave(p-&gt;pi_lock)
                                   task_rq_lock()
                                     -&gt; trying to acquire both p-&gt;pi_lock and
                                        rq-&gt;lock with IRQ disabled
                                   task_move_group_fair()
                                     -&gt; p.se.vruntime
                                          -= (old)cfs_rq-&gt;min_vruntime
                                          += (new)cfs_rq-&gt;min_vruntime
                                   task_rq_unlock()

                                   (via IPI)
                                   sched_ttwu_pending()
                                     raw_spin_lock(rq-&gt;lock)
                                     ttwu_do_activate()
                                       ...
                                       enqueue_entity()
                                         p.se.vruntime += cfs_rq-&gt;min_vruntime
                                     raw_spin_unlock(rq-&gt;lock)

As a result, vruntime of the process becomes far bigger than min_vruntime,
if (new)cfs_rq-&gt;min_vruntime &gt;&gt; (old)cfs_rq-&gt;min_vruntime.

This patch fixes this problem by just ignoring such a process in
task_move_group_fair(), because the vruntime has already been normalized in
task_waking_fair().

Signed-off-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: http://lkml.kernel.org/r/20111215143741.df82dd50.nishimura@mxp.nes.nec.co.jp
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Fix cgroup movement of newly created process</title>
<updated>2011-12-21T09:34:51+00:00</updated>
<author>
<name>Daisuke Nishimura</name>
<email>nishimura@mxp.nes.nec.co.jp</email>
</author>
<published>2011-12-15T05:36:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7ceff013c43c0f38f0d26c79507889c6791c0ea0'/>
<id>7ceff013c43c0f38f0d26c79507889c6791c0ea0</id>
<content type='text'>
There is a small race between do_fork() and sched_move_task(), which is
trying to move the child.

            do_fork()                 sched_move_task()
--------------------------------+---------------------------------
  copy_process()
    sched_fork()
      task_fork_fair()
        -&gt; vruntime of the child is initialized
           based on that of the parent.
  -&gt; we can see the child in "tasks" file now.
                                    task_rq_lock()
                                    task_move_group_fair()
                                      -&gt; child.se.vruntime
                                           -= (old)cfs_rq-&gt;min_vruntime
                                           += (new)cfs_rq-&gt;min_vruntime
                                    task_rq_unlock()
  wake_up_new_task()
    ...
    enqueue_entity()
      child.se.vruntime += cfs_rq-&gt;min_vruntime

As a result, vruntime of the child becomes far bigger than min_vruntime,
if (new)cfs_rq-&gt;min_vruntime &gt;&gt; (old)cfs_rq-&gt;min_vruntime.

This patch fixes this problem by just ignoring such a process in
task_move_group_fair(), because the vruntime has already been normalized in
task_fork_fair().

Signed-off-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: http://lkml.kernel.org/r/20111215143607.2ee12c5d.nishimura@mxp.nes.nec.co.jp
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a small race between do_fork() and sched_move_task(), which is
trying to move the child.

            do_fork()                 sched_move_task()
--------------------------------+---------------------------------
  copy_process()
    sched_fork()
      task_fork_fair()
        -&gt; vruntime of the child is initialized
           based on that of the parent.
  -&gt; we can see the child in "tasks" file now.
                                    task_rq_lock()
                                    task_move_group_fair()
                                      -&gt; child.se.vruntime
                                           -= (old)cfs_rq-&gt;min_vruntime
                                           += (new)cfs_rq-&gt;min_vruntime
                                    task_rq_unlock()
  wake_up_new_task()
    ...
    enqueue_entity()
      child.se.vruntime += cfs_rq-&gt;min_vruntime

As a result, vruntime of the child becomes far bigger than min_vruntime,
if (new)cfs_rq-&gt;min_vruntime &gt;&gt; (old)cfs_rq-&gt;min_vruntime.

This patch fixes this problem by just ignoring such a process in
task_move_group_fair(), because the vruntime has already been normalized in
task_fork_fair().

Signed-off-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: http://lkml.kernel.org/r/20111215143607.2ee12c5d.nishimura@mxp.nes.nec.co.jp
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Fix cgroup movement of forking process</title>
<updated>2011-12-21T09:34:49+00:00</updated>
<author>
<name>Daisuke Nishimura</name>
<email>nishimura@mxp.nes.nec.co.jp</email>
</author>
<published>2011-12-15T05:36:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4fc420c91f53e0a9f95665c6b14a1983716081e7'/>
<id>4fc420c91f53e0a9f95665c6b14a1983716081e7</id>
<content type='text'>
There is a small race between task_fork_fair() and sched_move_task(),
which is trying to move the parent.

        task_fork_fair()                 sched_move_task()
--------------------------------+---------------------------------
  cfs_rq = task_cfs_rq(current)
    -&gt; cfs_rq is the "old" one.
  curr = cfs_rq-&gt;curr
    -&gt; curr is set to the parent.
                                    task_rq_lock()
                                    dequeue_task()
                                      -&gt;parent.se.vruntime -= (old)cfs_rq-&gt;min_vruntime
                                    enqueue_task()
                                      -&gt;parent.se.vruntime += (new)cfs_rq-&gt;min_vruntime
                                    task_rq_unlock()
  raw_spin_lock_irqsave(rq-&gt;lock)
  se-&gt;vruntime = curr-&gt;vruntime
    -&gt; vruntime of the child is set to that of the parent
       which has already been updated by sched_move_task().
  se-&gt;vruntime -= (old)cfs_rq-&gt;min_vruntime.
  raw_spin_unlock_irqrestore(rq-&gt;lock)

As a result, vruntime of the child becomes far bigger than expected,
if (new)cfs_rq-&gt;min_vruntime &gt;&gt; (old)cfs_rq-&gt;min_vruntime.

This patch fixes this problem by setting "cfs_rq" and "curr" after
holding the rq-&gt;lock.
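
The effect of reading "cfs_rq" before or after taking the lock can be
shown with a toy model. This is a sketch in Python, not kernel code:
the function name, the flag and all numbers are invented for
illustration, and the lock itself is not modelled, only the order in
which the values are read.

```python
def child_relative_vruntime(read_cfs_rq_under_lock):
    """Return the child's normalized vruntime after the racing move."""
    old_min, new_min = 1_000, 1_000_000  # made-up min_vruntime values
    parent_vruntime = 1_050              # parent is 50 units past old_min

    # task_fork_fair() before the fix: cfs_rq is looked up early,
    # so it still refers to the old queue.
    cfs_rq_min = old_min

    # sched_move_task() runs in between and re-bases the parent.
    parent_vruntime -= old_min
    parent_vruntime += new_min

    # After the fix the lookup happens with rq lock held, that is,
    # after the move has completed, so the new queue is seen.
    if read_cfs_rq_under_lock:
        cfs_rq_min = new_min

    # The child copies the parent's vruntime and normalizes it.
    child_vruntime = parent_vruntime
    child_vruntime -= cfs_rq_min
    return child_vruntime


print(child_relative_vruntime(read_cfs_rq_under_lock=False))
print(child_relative_vruntime(read_cfs_rq_under_lock=True))
```

With the stale lookup the child is normalized against the old queue
and ends up at 999050; with the lookup done under the lock it gets the
expected 50.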

Signed-off-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Acked-by: Paul Turner &lt;pjt@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: http://lkml.kernel.org/r/20111215143655.662676b0.nishimura@mxp.nes.nec.co.jp
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a small race between task_fork_fair() and sched_move_task(),
which is trying to move the parent.

        task_fork_fair()                 sched_move_task()
--------------------------------+---------------------------------
  cfs_rq = task_cfs_rq(current)
    -&gt; cfs_rq is the "old" one.
  curr = cfs_rq-&gt;curr
    -&gt; curr is set to the parent.
                                    task_rq_lock()
                                    dequeue_task()
                                      -&gt;parent.se.vruntime -= (old)cfs_rq-&gt;min_vruntime
                                    enqueue_task()
                                      -&gt;parent.se.vruntime += (new)cfs_rq-&gt;min_vruntime
                                    task_rq_unlock()
  raw_spin_lock_irqsave(rq-&gt;lock)
  se-&gt;vruntime = curr-&gt;vruntime
    -&gt; vruntime of the child is set to that of the parent
       which has already been updated by sched_move_task().
  se-&gt;vruntime -= (old)cfs_rq-&gt;min_vruntime.
  raw_spin_unlock_irqrestore(rq-&gt;lock)

As a result, vruntime of the child becomes far bigger than expected,
if (new)cfs_rq-&gt;min_vruntime &gt;&gt; (old)cfs_rq-&gt;min_vruntime.

This patch fixes this problem by setting "cfs_rq" and "curr" after
holding the rq-&gt;lock.

Signed-off-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Acked-by: Paul Turner &lt;pjt@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: http://lkml.kernel.org/r/20111215143655.662676b0.nishimura@mxp.nes.nec.co.jp
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Fix load-balance lock-breaking</title>
<updated>2011-12-21T09:34:47+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-09-22T13:30:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a195f004e9496b4d99f471bb96e0a0c1af080909'/>
<id>a195f004e9496b4d99f471bb96e0a0c1af080909</id>
<content type='text'>
The current lock break relies on contention on the rq locks. That
contention might never come, because we've got IRQs disabled, or it
might be very likely, because on anything with more than 2 cpus a
synchronized load-balance pass will very likely cause contention on
the rq locks.

Also the sched_nr_migrate thing fails when it gets trapped in the loops
of either the cgroup muck in load_balance_fair() or the move_tasks()
load condition.

Instead, use the new lb_flags field to propagate break/abort
conditions for all these loops and create a new loop outside the
IRQ-disabled region for when a break is required.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/n/tip-tsceb6w61q0gakmsccix6xxi@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current lock break relies on contention on the rq locks. That
contention might never come, because we've got IRQs disabled, or it
might be very likely, because on anything with more than 2 cpus a
synchronized load-balance pass will very likely cause contention on
the rq locks.

Also the sched_nr_migrate thing fails when it gets trapped in the loops
of either the cgroup muck in load_balance_fair() or the move_tasks()
load condition.

Instead, use the new lb_flags field to propagate break/abort
conditions for all these loops and create a new loop outside the
IRQ-disabled region for when a break is required.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/n/tip-tsceb6w61q0gakmsccix6xxi@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Replace all_pinned with a generic flags field</title>
<updated>2011-12-21T09:34:45+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-09-22T13:23:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5b54b56be5b540a9cb12682c4d0df5454c098a38'/>
<id>5b54b56be5b540a9cb12682c4d0df5454c098a38</id>
<content type='text'>
Replace the all_pinned argument with a flags field so that we can add
some extra controls throughout that entire call chain.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/n/tip-33kevm71m924ok1gpxd720v3@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Replace the all_pinned argument with a flags field so that we can add
some extra controls throughout that entire call chain.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/n/tip-33kevm71m924ok1gpxd720v3@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Only queue remote wakeups when crossing cache boundaries</title>
<updated>2011-12-21T09:34:44+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-12-07T14:07:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=518cd62341786aa4e3839810832af2fbc0de1ea4'/>
<id>518cd62341786aa4e3839810832af2fbc0de1ea4</id>
<content type='text'>
Mike reported a 13% drop in netperf TCP_RR performance due to the
new remote wakeup code. Suresh too noticed some performance issues
with it.

Reducing the IPIs to only cross cache domains solves the observed
performance issues.

Reported-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Reported-by: Mike Galbraith &lt;efault@gmx.de&gt;
Acked-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Acked-by: Mike Galbraith &lt;efault@gmx.de&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Dave Kleikamp &lt;dave.kleikamp@oracle.com&gt;
Link: http://lkml.kernel.org/r/1323338531.17673.7.camel@twins
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Mike reported a 13% drop in netperf TCP_RR performance due to the
new remote wakeup code. Suresh too noticed some performance issues
with it.

Reducing the IPIs to only cross cache domains solves the observed
performance issues.

Reported-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Reported-by: Mike Galbraith &lt;efault@gmx.de&gt;
Acked-by: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Acked-by: Mike Galbraith &lt;efault@gmx.de&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Chris Mason &lt;chris.mason@oracle.com&gt;
Cc: Dave Kleikamp &lt;dave.kleikamp@oracle.com&gt;
Link: http://lkml.kernel.org/r/1323338531.17673.7.camel@twins
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched, nohz: Fix missing RCU read lock</title>
<updated>2011-12-08T04:45:48+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2011-12-07T13:32:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=067491b7313c41f49607fce782d29344d1472587'/>
<id>067491b7313c41f49607fce782d29344d1472587</id>
<content type='text'>
Yong Zhang reported:

 &gt; [ INFO: suspicious RCU usage. ]
 &gt; kernel/sched/fair.c:5091 suspicious rcu_dereference_check() usage!

This is due to the sched_domain stuff being RCU protected and
commit 0b005cf5 ("sched, nohz: Implement sched group, domain
aware nohz idle load balancing") overlooking this fact.

The sd variable only lives inside the for_each_domain() block,
so we only need to wrap that.

Reported-by: Yong Zhang &lt;yong.zhang0@gmail.com&gt;
Tested-by: Yong Zhang &lt;yong.zhang0@gmail.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Link: http://lkml.kernel.org/r/1323264728.32012.107.camel@twins
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Yong Zhang reported:

 &gt; [ INFO: suspicious RCU usage. ]
 &gt; kernel/sched/fair.c:5091 suspicious rcu_dereference_check() usage!

This is due to the sched_domain stuff being RCU protected and
commit 0b005cf5 ("sched, nohz: Implement sched group, domain
aware nohz idle load balancing") overlooking this fact.

The sd variable only lives inside the for_each_domain() block,
so we only need to wrap that.

Reported-by: Yong Zhang &lt;yong.zhang0@gmail.com&gt;
Tested-by: Yong Zhang &lt;yong.zhang0@gmail.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Link: http://lkml.kernel.org/r/1323264728.32012.107.camel@twins
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
</feed>
