<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/sched/core.c, branch v4.1.15</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>sched/preempt: Fix cond_resched_lock() and cond_resched_softirq()</title>
<updated>2015-10-27T00:51:58+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2015-07-15T09:52:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=98197d3de58a62785be3e421864d6145955f197d'/>
<id>98197d3de58a62785be3e421864d6145955f197d</id>
<content type='text'>
commit fe32d3cd5e8eb0f82e459763374aa80797023403 upstream.

These functions check should_resched() before unlocking spinlock/bh-enable:
preempt_count always non-zero =&gt; should_resched() always returns false.
cond_resched_lock() worked iff spin_needbreak is set.

This patch adds argument "preempt_offset" to should_resched().

preempt_count offset constants for that:

  PREEMPT_DISABLE_OFFSET  - offset after preempt_disable()
  PREEMPT_LOCK_OFFSET     - offset after spin_lock()
  SOFTIRQ_DISABLE_OFFSET  - offset after local_bh_distable()
  SOFTIRQ_LOCK_OFFSET     - offset after spin_lock_bh()

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Alexander Graf &lt;agraf@suse.de&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: bdb438065890 ("sched: Extract the basic add/sub preempt_count modifiers")
Link: http://lkml.kernel.org/r/20150715095204.12246.98268.stgit@buzz
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mike Galbraith &lt;efault@gmx.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit fe32d3cd5e8eb0f82e459763374aa80797023403 upstream.

These functions check should_resched() before unlocking spinlock/bh-enable:
preempt_count always non-zero =&gt; should_resched() always returns false.
cond_resched_lock() worked iff spin_needbreak is set.

This patch adds argument "preempt_offset" to should_resched().

preempt_count offset constants for that:

  PREEMPT_DISABLE_OFFSET  - offset after preempt_disable()
  PREEMPT_LOCK_OFFSET     - offset after spin_lock()
  SOFTIRQ_DISABLE_OFFSET  - offset after local_bh_distable()
  SOFTIRQ_LOCK_OFFSET     - offset after spin_lock_bh()

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Alexander Graf &lt;agraf@suse.de&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: David Vrabel &lt;david.vrabel@citrix.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: bdb438065890 ("sched: Extract the basic add/sub preempt_count modifiers")
Link: http://lkml.kernel.org/r/20150715095204.12246.98268.stgit@buzz
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mike Galbraith &lt;efault@gmx.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/core: Fix TASK_DEAD race in finish_task_switch()</title>
<updated>2015-10-22T21:43:14+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2015-09-29T12:45:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8210b9199f66447513b63c9a5aef988ee60c4319'/>
<id>8210b9199f66447513b63c9a5aef988ee60c4319</id>
<content type='text'>
commit 95913d97914f44db2b81271c2e2ebd4d2ac2df83 upstream.

So the problem this patch is trying to address is as follows:

        CPU0                            CPU1

        context_switch(A, B)
                                        ttwu(A)
                                          LOCK A-&gt;pi_lock
                                          A-&gt;on_cpu == 0
        finish_task_switch(A)
          prev_state = A-&gt;state  &lt;-.
          WMB                      |
          A-&gt;on_cpu = 0;           |
          UNLOCK rq0-&gt;lock         |
                                   |    context_switch(C, A)
                                   `--  A-&gt;state = TASK_DEAD
          prev_state == TASK_DEAD
            put_task_struct(A)
                                        context_switch(A, C)
                                        finish_task_switch(A)
                                          A-&gt;state == TASK_DEAD
                                            put_task_struct(A)

The argument being that the WMB will allow the load of A-&gt;state on CPU0
to cross over and observe CPU1's store of A-&gt;state, which will then
result in a double-drop and use-after-free.

Now the comment states (and this was true once upon a long time ago)
that we need to observe A-&gt;state while holding rq-&gt;lock because that
will order us against the wakeup; however the wakeup will not in fact
acquire (that) rq-&gt;lock; it takes A-&gt;pi_lock these days.

We can obviously fix this by upgrading the WMB to an MB, but that is
expensive, so we'd rather avoid that.

The alternative this patch takes is: smp_store_release(&amp;A-&gt;on_cpu, 0),
which avoids the MB on some archs, but not important ones like ARM.

Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Cc: manfred@colorfullife.com
Cc: will.deacon@arm.com
Fixes: e4a52bcb9a18 ("sched: Remove rq-&gt;lock from the first half of ttwu()")
Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 95913d97914f44db2b81271c2e2ebd4d2ac2df83 upstream.

So the problem this patch is trying to address is as follows:

        CPU0                            CPU1

        context_switch(A, B)
                                        ttwu(A)
                                          LOCK A-&gt;pi_lock
                                          A-&gt;on_cpu == 0
        finish_task_switch(A)
          prev_state = A-&gt;state  &lt;-.
          WMB                      |
          A-&gt;on_cpu = 0;           |
          UNLOCK rq0-&gt;lock         |
                                   |    context_switch(C, A)
                                   `--  A-&gt;state = TASK_DEAD
          prev_state == TASK_DEAD
            put_task_struct(A)
                                        context_switch(A, C)
                                        finish_task_switch(A)
                                          A-&gt;state == TASK_DEAD
                                            put_task_struct(A)

The argument being that the WMB will allow the load of A-&gt;state on CPU0
to cross over and observe CPU1's store of A-&gt;state, which will then
result in a double-drop and use-after-free.

Now the comment states (and this was true once upon a long time ago)
that we need to observe A-&gt;state while holding rq-&gt;lock because that
will order us against the wakeup; however the wakeup will not in fact
acquire (that) rq-&gt;lock; it takes A-&gt;pi_lock these days.

We can obviously fix this by upgrading the WMB to an MB, but that is
expensive, so we'd rather avoid that.

The alternative this patch takes is: smp_store_release(&amp;A-&gt;on_cpu, 0),
which avoids the MB on some archs, but not important ones like ARM.

Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Cc: manfred@colorfullife.com
Cc: will.deacon@arm.com
Fixes: e4a52bcb9a18 ("sched: Remove rq-&gt;lock from the first half of ttwu()")
Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched: access local runqueue directly in single_task_running</title>
<updated>2015-10-22T21:43:12+00:00</updated>
<author>
<name>Dominik Dingel</name>
<email>dingel@linux.vnet.ibm.com</email>
</author>
<published>2015-09-18T09:27:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=feada04ec6c916fa3779d736e75c8453ee3ddc58'/>
<id>feada04ec6c916fa3779d736e75c8453ee3ddc58</id>
<content type='text'>
commit 00cc1633816de8c95f337608a1ea64e228faf771 upstream.

Commit 2ee507c47293 ("sched: Add function single_task_running to let a task
check if it is the only task running on a cpu") referenced the current
runqueue with the smp_processor_id.  When CONFIG_DEBUG_PREEMPT is enabled,
that is only allowed if preemption is disabled or the currrent task is
bound to the local cpu (e.g. kernel worker).

With commit f78195129963 ("kvm: add halt_poll_ns module parameter") KVM
calls single_task_running. If CONFIG_DEBUG_PREEMPT is enabled that
generates a lot of kernel messages.

To avoid adding preemption in that cases, as it would limit the usefulness,
we change single_task_running to access directly the cpu local runqueue.

Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Fixes: 2ee507c472939db4b146d545352b8a7c79ef47f8
Signed-off-by: Dominik Dingel &lt;dingel@linux.vnet.ibm.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 00cc1633816de8c95f337608a1ea64e228faf771 upstream.

Commit 2ee507c47293 ("sched: Add function single_task_running to let a task
check if it is the only task running on a cpu") referenced the current
runqueue with the smp_processor_id.  When CONFIG_DEBUG_PREEMPT is enabled,
that is only allowed if preemption is disabled or the currrent task is
bound to the local cpu (e.g. kernel worker).

With commit f78195129963 ("kvm: add halt_poll_ns module parameter") KVM
calls single_task_running. If CONFIG_DEBUG_PREEMPT is enabled that
generates a lot of kernel messages.

To avoid adding preemption in that cases, as it would limit the usefulness,
we change single_task_running to access directly the cpu local runqueue.

Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Fixes: 2ee507c472939db4b146d545352b8a7c79ef47f8
Signed-off-by: Dominik Dingel &lt;dingel@linux.vnet.ibm.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Fix cpu_active_mask/cpu_online_mask race</title>
<updated>2015-09-21T17:05:29+00:00</updated>
<author>
<name>Jan H. Schönherr</name>
<email>jschoenh@amazon.de</email>
</author>
<published>2015-08-12T19:35:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=668327e49d2aa4782da771c4b1c6c152bf5084a1'/>
<id>668327e49d2aa4782da771c4b1c6c152bf5084a1</id>
<content type='text'>
commit dd9d3843755da95f63dd3a376f62b3e45c011210 upstream.

There is a race condition in SMP bootup code, which may result
in

    WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
    workqueue_cpu_up_callback()
or
    kernel BUG at kernel/smpboot.c:135!

It can be triggered with a bit of luck in Linux guests running
on busy hosts.

	CPU0                        CPUn
	====                        ====

	_cpu_up()
	  __cpu_up()
				    start_secondary()
				      set_cpu_online()
					cpumask_set_cpu(cpu,
						   to_cpumask(cpu_online_bits));
	  cpu_notify(CPU_ONLINE)
	    &lt;do stuff, see below&gt;
					cpumask_set_cpu(cpu,
						   to_cpumask(cpu_active_bits));

During the various CPU_ONLINE callbacks CPUn is online but not
active. Several things can go wrong at that point, depending on
the scheduling of tasks on CPU0.

Variant 1:

  cpu_notify(CPU_ONLINE)
    workqueue_cpu_up_callback()
      rebind_workers()
        set_cpus_allowed_ptr()

  This call fails because it requires an active CPU; rebind_workers()
  ends with a warning:

    WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
    workqueue_cpu_up_callback()

Variant 2:

  cpu_notify(CPU_ONLINE)
    smpboot_thread_call()
      smpboot_unpark_threads()
       ..
        __kthread_unpark()
          __kthread_bind()
          wake_up_state()
           ..
            select_task_rq()
              select_fallback_rq()

  The -&gt;wake_cpu of the unparked thread is not allowed, making a call
  to select_fallback_rq() necessary. Then, select_fallback_rq() cannot
  find an allowed, active CPU and promptly resets the allowed CPUs, so
  that the task in question ends up on CPU0.

  When those unparked tasks are eventually executed, they run
  immediately into a BUG:

    kernel BUG at kernel/smpboot.c:135!

Just changing the order in which the online/active bits are set
(and adding some memory barriers), would solve the two issues
above. However, it would change the order of operations back to
the one before commit 6acbfb96976f ("sched: Fix hotplug vs.
set_cpus_allowed_ptr()"), thus, reintroducing that particular
problem.

Going further back into history, we have at least the following
commits touching this topic:
- commit 2baab4e90495 ("sched: Fix select_fallback_rq() vs cpu_active/cpu_online")
- commit 5fbd036b552f ("sched: Cleanup cpu_active madness")

Together, these give us the following non-working solutions:

  - secondary CPU sets active before online, because active is assumed to
    be a subset of online;

  - secondary CPU sets online before active, because the primary CPU
    assumes that an online CPU is also active;

  - secondary CPU sets online and waits for primary CPU to set active,
    because it might deadlock.

Commit 875ebe940d77 ("powerpc/smp: Wait until secondaries are
active &amp; online") introduces an arch-specific solution to this
arch-independent problem.

Now, go for a more general solution without explicit waiting and
simply set active twice: once on the secondary CPU after online
was set and once on the primary CPU after online was seen.

set_cpus_allowed_ptr()")

Signed-off-by: Jan H. Schönherr &lt;jschoenh@amazon.de&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Anton Blanchard &lt;anton@samba.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Joerg Roedel &lt;jroedel@suse.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Wilson &lt;msw@amazon.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
Link: http://lkml.kernel.org/r/1439408156-18840-1-git-send-email-jschoenh@amazon.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit dd9d3843755da95f63dd3a376f62b3e45c011210 upstream.

There is a race condition in SMP bootup code, which may result
in

    WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
    workqueue_cpu_up_callback()
or
    kernel BUG at kernel/smpboot.c:135!

It can be triggered with a bit of luck in Linux guests running
on busy hosts.

	CPU0                        CPUn
	====                        ====

	_cpu_up()
	  __cpu_up()
				    start_secondary()
				      set_cpu_online()
					cpumask_set_cpu(cpu,
						   to_cpumask(cpu_online_bits));
	  cpu_notify(CPU_ONLINE)
	    &lt;do stuff, see below&gt;
					cpumask_set_cpu(cpu,
						   to_cpumask(cpu_active_bits));

During the various CPU_ONLINE callbacks CPUn is online but not
active. Several things can go wrong at that point, depending on
the scheduling of tasks on CPU0.

Variant 1:

  cpu_notify(CPU_ONLINE)
    workqueue_cpu_up_callback()
      rebind_workers()
        set_cpus_allowed_ptr()

  This call fails because it requires an active CPU; rebind_workers()
  ends with a warning:

    WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
    workqueue_cpu_up_callback()

Variant 2:

  cpu_notify(CPU_ONLINE)
    smpboot_thread_call()
      smpboot_unpark_threads()
       ..
        __kthread_unpark()
          __kthread_bind()
          wake_up_state()
           ..
            select_task_rq()
              select_fallback_rq()

  The -&gt;wake_cpu of the unparked thread is not allowed, making a call
  to select_fallback_rq() necessary. Then, select_fallback_rq() cannot
  find an allowed, active CPU and promptly resets the allowed CPUs, so
  that the task in question ends up on CPU0.

  When those unparked tasks are eventually executed, they run
  immediately into a BUG:

    kernel BUG at kernel/smpboot.c:135!

Just changing the order in which the online/active bits are set
(and adding some memory barriers), would solve the two issues
above. However, it would change the order of operations back to
the one before commit 6acbfb96976f ("sched: Fix hotplug vs.
set_cpus_allowed_ptr()"), thus, reintroducing that particular
problem.

Going further back into history, we have at least the following
commits touching this topic:
- commit 2baab4e90495 ("sched: Fix select_fallback_rq() vs cpu_active/cpu_online")
- commit 5fbd036b552f ("sched: Cleanup cpu_active madness")

Together, these give us the following non-working solutions:

  - secondary CPU sets active before online, because active is assumed to
    be a subset of online;

  - secondary CPU sets online before active, because the primary CPU
    assumes that an online CPU is also active;

  - secondary CPU sets online and waits for primary CPU to set active,
    because it might deadlock.

Commit 875ebe940d77 ("powerpc/smp: Wait until secondaries are
active &amp; online") introduces an arch-specific solution to this
arch-independent problem.

Now, go for a more general solution without explicit waiting and
simply set active twice: once on the secondary CPU after online
was set and once on the primary CPU after online was seen.

set_cpus_allowed_ptr()")

Signed-off-by: Jan H. Schönherr &lt;jschoenh@amazon.de&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Anton Blanchard &lt;anton@samba.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Joerg Roedel &lt;jroedel@suse.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Matt Wilson &lt;msw@amazon.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
Link: http://lkml.kernel.org/r/1439408156-18840-1-git-send-email-jschoenh@amazon.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.dk/linux-block</title>
<updated>2015-05-22T22:15:30+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-05-22T22:15:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1c8df7bd48347a707b437cfd0dad6b08a3b89ab6'/>
<id>1c8df7bd48347a707b437cfd0dad6b08a3b89ab6</id>
<content type='text'>
Pull block fixes from Jens Axboe:
 "Three small fixes that have been picked up the last few weeks.
  Specifically:

   - Fix a memory corruption issue in NVMe with malignant user
     constructed request.  From Christoph.

   - Kill (now) unused blk_queue_bio(), dm was changed to not need this
     anymore.  From Mike Snitzer.

   - Always use blk_schedule_flush_plug() from the io_schedule() path
     when flushing a plug, fixing a !TASK_RUNNING warning with md.  From
     Shaohua"

* 'for-linus' of git://git.kernel.dk/linux-block:
  sched: always use blk_schedule_flush_plug in io_schedule_out
  nvme: fix kernel memory corruption with short INQUIRY buffers
  block: remove export for blk_queue_bio
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull block fixes from Jens Axboe:
 "Three small fixes that have been picked up the last few weeks.
  Specifically:

   - Fix a memory corruption issue in NVMe with malignant user
     constructed request.  From Christoph.

   - Kill (now) unused blk_queue_bio(), dm was changed to not need this
     anymore.  From Mike Snitzer.

   - Always use blk_schedule_flush_plug() from the io_schedule() path
     when flushing a plug, fixing a !TASK_RUNNING warning with md.  From
     Shaohua"

* 'for-linus' of git://git.kernel.dk/linux-block:
  sched: always use blk_schedule_flush_plug in io_schedule_out
  nvme: fix kernel memory corruption with short INQUIRY buffers
  block: remove export for blk_queue_bio
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: always use blk_schedule_flush_plug in io_schedule_out</title>
<updated>2015-05-18T22:06:41+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2015-05-08T17:51:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=10d784eae2b41e25d8fc6a88096cd27286093c84'/>
<id>10d784eae2b41e25d8fc6a88096cd27286093c84</id>
<content type='text'>
block plug callback could sleep, so we introduce a parameter
'from_schedule' and corresponding drivers can use it to destinguish a
schedule plug flush or a plug finish. Unfortunately io_schedule_out
still uses blk_flush_plug(). This causes below output (Note, I added a
might_sleep() in raid1_unplug to make it trigger faster, but the whole
thing doesn't matter if I add might_sleep). In raid1/10, this can cause
deadlock.

This patch makes io_schedule_out always uses blk_schedule_flush_plug.
This should only impact drivers (as far as I know, raid 1/10) which are
sensitive to the 'from_schedule' parameter.

[  370.817949] ------------[ cut here ]------------
[  370.817960] WARNING: CPU: 7 PID: 145 at ../kernel/sched/core.c:7306 __might_sleep+0x7f/0x90()
[  370.817969] do not call blocking ops when !TASK_RUNNING; state=2 set at [&lt;ffffffff81092fcf&gt;] prepare_to_wait+0x2f/0x90
[  370.817971] Modules linked in: raid1
[  370.817976] CPU: 7 PID: 145 Comm: kworker/u16:9 Tainted: G        W       4.0.0+ #361
[  370.817977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
[  370.817983] Workqueue: writeback bdi_writeback_workfn (flush-9:1)
[  370.817985]  ffffffff81cd83be ffff8800ba8cb298 ffffffff819dd7af 0000000000000001
[  370.817988]  ffff8800ba8cb2e8 ffff8800ba8cb2d8 ffffffff81051afc ffff8800ba8cb2c8
[  370.817990]  ffffffffa00061a8 000000000000041e 0000000000000000 ffff8800ba8cba28
[  370.817993] Call Trace:
[  370.817999]  [&lt;ffffffff819dd7af&gt;] dump_stack+0x4f/0x7b
[  370.818002]  [&lt;ffffffff81051afc&gt;] warn_slowpath_common+0x8c/0xd0
[  370.818004]  [&lt;ffffffff81051b86&gt;] warn_slowpath_fmt+0x46/0x50
[  370.818006]  [&lt;ffffffff81092fcf&gt;] ? prepare_to_wait+0x2f/0x90
[  370.818008]  [&lt;ffffffff81092fcf&gt;] ? prepare_to_wait+0x2f/0x90
[  370.818010]  [&lt;ffffffff810776ef&gt;] __might_sleep+0x7f/0x90
[  370.818014]  [&lt;ffffffffa0000c03&gt;] raid1_unplug+0xd3/0x170 [raid1]
[  370.818024]  [&lt;ffffffff81421d9a&gt;] blk_flush_plug_list+0x8a/0x1e0
[  370.818028]  [&lt;ffffffff819e3550&gt;] ? bit_wait+0x50/0x50
[  370.818031]  [&lt;ffffffff819e21b0&gt;] io_schedule_timeout+0x130/0x140
[  370.818033]  [&lt;ffffffff819e3586&gt;] bit_wait_io+0x36/0x50
[  370.818034]  [&lt;ffffffff819e31b5&gt;] __wait_on_bit+0x65/0x90
[  370.818041]  [&lt;ffffffff8125b67c&gt;] ? ext4_read_block_bitmap_nowait+0xbc/0x630
[  370.818043]  [&lt;ffffffff819e3550&gt;] ? bit_wait+0x50/0x50
[  370.818045]  [&lt;ffffffff819e3302&gt;] out_of_line_wait_on_bit+0x72/0x80
[  370.818047]  [&lt;ffffffff810935e0&gt;] ? autoremove_wake_function+0x40/0x40
[  370.818050]  [&lt;ffffffff811de744&gt;] __wait_on_buffer+0x44/0x50
[  370.818053]  [&lt;ffffffff8125ae80&gt;] ext4_wait_block_bitmap+0xe0/0xf0
[  370.818058]  [&lt;ffffffff812975d6&gt;] ext4_mb_init_cache+0x206/0x790
[  370.818062]  [&lt;ffffffff8114bc6c&gt;] ? lru_cache_add+0x1c/0x50
[  370.818064]  [&lt;ffffffff81297c7e&gt;] ext4_mb_init_group+0x11e/0x200
[  370.818066]  [&lt;ffffffff81298231&gt;] ext4_mb_load_buddy+0x341/0x360
[  370.818068]  [&lt;ffffffff8129a1a3&gt;] ext4_mb_find_by_goal+0x93/0x2f0
[  370.818070]  [&lt;ffffffff81295b54&gt;] ? ext4_mb_normalize_request+0x1e4/0x5b0
[  370.818072]  [&lt;ffffffff8129ab67&gt;] ext4_mb_regular_allocator+0x67/0x460
[  370.818074]  [&lt;ffffffff81295b54&gt;] ? ext4_mb_normalize_request+0x1e4/0x5b0
[  370.818076]  [&lt;ffffffff8129ca4b&gt;] ext4_mb_new_blocks+0x4cb/0x620
[  370.818079]  [&lt;ffffffff81290956&gt;] ext4_ext_map_blocks+0x4c6/0x14d0
[  370.818081]  [&lt;ffffffff812a4d4e&gt;] ? ext4_es_lookup_extent+0x4e/0x290
[  370.818085]  [&lt;ffffffff8126399d&gt;] ext4_map_blocks+0x14d/0x4f0
[  370.818088]  [&lt;ffffffff81266fbd&gt;] ext4_writepages+0x76d/0xe50
[  370.818094]  [&lt;ffffffff81149691&gt;] do_writepages+0x21/0x50
[  370.818097]  [&lt;ffffffff811d5c00&gt;] __writeback_single_inode+0x60/0x490
[  370.818099]  [&lt;ffffffff811d630a&gt;] writeback_sb_inodes+0x2da/0x590
[  370.818103]  [&lt;ffffffff811abf4b&gt;] ? trylock_super+0x1b/0x50
[  370.818105]  [&lt;ffffffff811abf4b&gt;] ? trylock_super+0x1b/0x50
[  370.818107]  [&lt;ffffffff811d665f&gt;] __writeback_inodes_wb+0x9f/0xd0
[  370.818109]  [&lt;ffffffff811d69db&gt;] wb_writeback+0x34b/0x3c0
[  370.818111]  [&lt;ffffffff811d70df&gt;] bdi_writeback_workfn+0x23f/0x550
[  370.818116]  [&lt;ffffffff8106bbd8&gt;] process_one_work+0x1c8/0x570
[  370.818117]  [&lt;ffffffff8106bb5b&gt;] ? process_one_work+0x14b/0x570
[  370.818119]  [&lt;ffffffff8106c09b&gt;] worker_thread+0x11b/0x470
[  370.818121]  [&lt;ffffffff8106bf80&gt;] ? process_one_work+0x570/0x570
[  370.818124]  [&lt;ffffffff81071868&gt;] kthread+0xf8/0x110
[  370.818126]  [&lt;ffffffff81071770&gt;] ? kthread_create_on_node+0x210/0x210
[  370.818129]  [&lt;ffffffff819e9322&gt;] ret_from_fork+0x42/0x70
[  370.818131]  [&lt;ffffffff81071770&gt;] ? kthread_create_on_node+0x210/0x210
[  370.818132] ---[ end trace 7b4deb71e68b6605 ]---

V2: don't change -&gt;in_iowait

Cc: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
block plug callback could sleep, so we introduce a parameter
'from_schedule' and corresponding drivers can use it to destinguish a
schedule plug flush or a plug finish. Unfortunately io_schedule_out
still uses blk_flush_plug(). This causes below output (Note, I added a
might_sleep() in raid1_unplug to make it trigger faster, but the whole
thing doesn't matter if I add might_sleep). In raid1/10, this can cause
deadlock.

This patch makes io_schedule_out always uses blk_schedule_flush_plug.
This should only impact drivers (as far as I know, raid 1/10) which are
sensitive to the 'from_schedule' parameter.

[  370.817949] ------------[ cut here ]------------
[  370.817960] WARNING: CPU: 7 PID: 145 at ../kernel/sched/core.c:7306 __might_sleep+0x7f/0x90()
[  370.817969] do not call blocking ops when !TASK_RUNNING; state=2 set at [&lt;ffffffff81092fcf&gt;] prepare_to_wait+0x2f/0x90
[  370.817971] Modules linked in: raid1
[  370.817976] CPU: 7 PID: 145 Comm: kworker/u16:9 Tainted: G        W       4.0.0+ #361
[  370.817977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
[  370.817983] Workqueue: writeback bdi_writeback_workfn (flush-9:1)
[  370.817985]  ffffffff81cd83be ffff8800ba8cb298 ffffffff819dd7af 0000000000000001
[  370.817988]  ffff8800ba8cb2e8 ffff8800ba8cb2d8 ffffffff81051afc ffff8800ba8cb2c8
[  370.817990]  ffffffffa00061a8 000000000000041e 0000000000000000 ffff8800ba8cba28
[  370.817993] Call Trace:
[  370.817999]  [&lt;ffffffff819dd7af&gt;] dump_stack+0x4f/0x7b
[  370.818002]  [&lt;ffffffff81051afc&gt;] warn_slowpath_common+0x8c/0xd0
[  370.818004]  [&lt;ffffffff81051b86&gt;] warn_slowpath_fmt+0x46/0x50
[  370.818006]  [&lt;ffffffff81092fcf&gt;] ? prepare_to_wait+0x2f/0x90
[  370.818008]  [&lt;ffffffff81092fcf&gt;] ? prepare_to_wait+0x2f/0x90
[  370.818010]  [&lt;ffffffff810776ef&gt;] __might_sleep+0x7f/0x90
[  370.818014]  [&lt;ffffffffa0000c03&gt;] raid1_unplug+0xd3/0x170 [raid1]
[  370.818024]  [&lt;ffffffff81421d9a&gt;] blk_flush_plug_list+0x8a/0x1e0
[  370.818028]  [&lt;ffffffff819e3550&gt;] ? bit_wait+0x50/0x50
[  370.818031]  [&lt;ffffffff819e21b0&gt;] io_schedule_timeout+0x130/0x140
[  370.818033]  [&lt;ffffffff819e3586&gt;] bit_wait_io+0x36/0x50
[  370.818034]  [&lt;ffffffff819e31b5&gt;] __wait_on_bit+0x65/0x90
[  370.818041]  [&lt;ffffffff8125b67c&gt;] ? ext4_read_block_bitmap_nowait+0xbc/0x630
[  370.818043]  [&lt;ffffffff819e3550&gt;] ? bit_wait+0x50/0x50
[  370.818045]  [&lt;ffffffff819e3302&gt;] out_of_line_wait_on_bit+0x72/0x80
[  370.818047]  [&lt;ffffffff810935e0&gt;] ? autoremove_wake_function+0x40/0x40
[  370.818050]  [&lt;ffffffff811de744&gt;] __wait_on_buffer+0x44/0x50
[  370.818053]  [&lt;ffffffff8125ae80&gt;] ext4_wait_block_bitmap+0xe0/0xf0
[  370.818058]  [&lt;ffffffff812975d6&gt;] ext4_mb_init_cache+0x206/0x790
[  370.818062]  [&lt;ffffffff8114bc6c&gt;] ? lru_cache_add+0x1c/0x50
[  370.818064]  [&lt;ffffffff81297c7e&gt;] ext4_mb_init_group+0x11e/0x200
[  370.818066]  [&lt;ffffffff81298231&gt;] ext4_mb_load_buddy+0x341/0x360
[  370.818068]  [&lt;ffffffff8129a1a3&gt;] ext4_mb_find_by_goal+0x93/0x2f0
[  370.818070]  [&lt;ffffffff81295b54&gt;] ? ext4_mb_normalize_request+0x1e4/0x5b0
[  370.818072]  [&lt;ffffffff8129ab67&gt;] ext4_mb_regular_allocator+0x67/0x460
[  370.818074]  [&lt;ffffffff81295b54&gt;] ? ext4_mb_normalize_request+0x1e4/0x5b0
[  370.818076]  [&lt;ffffffff8129ca4b&gt;] ext4_mb_new_blocks+0x4cb/0x620
[  370.818079]  [&lt;ffffffff81290956&gt;] ext4_ext_map_blocks+0x4c6/0x14d0
[  370.818081]  [&lt;ffffffff812a4d4e&gt;] ? ext4_es_lookup_extent+0x4e/0x290
[  370.818085]  [&lt;ffffffff8126399d&gt;] ext4_map_blocks+0x14d/0x4f0
[  370.818088]  [&lt;ffffffff81266fbd&gt;] ext4_writepages+0x76d/0xe50
[  370.818094]  [&lt;ffffffff81149691&gt;] do_writepages+0x21/0x50
[  370.818097]  [&lt;ffffffff811d5c00&gt;] __writeback_single_inode+0x60/0x490
[  370.818099]  [&lt;ffffffff811d630a&gt;] writeback_sb_inodes+0x2da/0x590
[  370.818103]  [&lt;ffffffff811abf4b&gt;] ? trylock_super+0x1b/0x50
[  370.818105]  [&lt;ffffffff811abf4b&gt;] ? trylock_super+0x1b/0x50
[  370.818107]  [&lt;ffffffff811d665f&gt;] __writeback_inodes_wb+0x9f/0xd0
[  370.818109]  [&lt;ffffffff811d69db&gt;] wb_writeback+0x34b/0x3c0
[  370.818111]  [&lt;ffffffff811d70df&gt;] bdi_writeback_workfn+0x23f/0x550
[  370.818116]  [&lt;ffffffff8106bbd8&gt;] process_one_work+0x1c8/0x570
[  370.818117]  [&lt;ffffffff8106bb5b&gt;] ? process_one_work+0x14b/0x570
[  370.818119]  [&lt;ffffffff8106c09b&gt;] worker_thread+0x11b/0x470
[  370.818121]  [&lt;ffffffff8106bf80&gt;] ? process_one_work+0x570/0x570
[  370.818124]  [&lt;ffffffff81071868&gt;] kthread+0xf8/0x110
[  370.818126]  [&lt;ffffffff81071770&gt;] ? kthread_create_on_node+0x210/0x210
[  370.818129]  [&lt;ffffffff819e9322&gt;] ret_from_fork+0x42/0x70
[  370.818131]  [&lt;ffffffff81071770&gt;] ? kthread_create_on_node+0x210/0x210
[  370.818132] ---[ end trace 7b4deb71e68b6605 ]---

V2: don't change -&gt;in_iowait

Cc: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Reviewed-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/core: Fix regression in cpuset_cpu_inactive() for suspend</title>
<updated>2015-05-08T09:53:56+00:00</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@osandov.com</email>
</author>
<published>2015-05-04T10:09:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=533445c6e53368569e50ab3fb712230c03d523f3'/>
<id>533445c6e53368569e50ab3fb712230c03d523f3</id>
<content type='text'>
Commit 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in
cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that
caused a regression in setting a CPU inactive during suspend. I ran into
this when a program was failing pthread_setaffinity_np() with EINVAL after
a suspend+wake up.

A simple reproducer:

	$ ./a.out
	sched_setaffinity: Success
	$ systemctl suspend
	$ ./a.out
	sched_setaffinity: Invalid argument

... where ./a.out is:

	#define _GNU_SOURCE
	#include &lt;errno.h&gt;
	#include &lt;sched.h&gt;
	#include &lt;stdio.h&gt;
	#include &lt;stdlib.h&gt;
	#include &lt;string.h&gt;
	#include &lt;unistd.h&gt;

	int main(void)
	{
		long num_cores;
		cpu_set_t cpu_set;
		int ret;

		num_cores = sysconf(_SC_NPROCESSORS_ONLN);
		CPU_ZERO(&amp;cpu_set);
		CPU_SET(num_cores - 1, &amp;cpu_set);
		errno = 0;
		ret = sched_setaffinity(getpid(), sizeof(cpu_set), &amp;cpu_set);
		perror("sched_setaffinity");
		return ret ? EXIT_FAILURE : EXIT_SUCCESS;
	}

The mistake is that suspend is handled in the action ==
CPU_DOWN_PREPARE_FROZEN case of the switch statement in
cpuset_cpu_inactive().

However, the commit in question masked out CPU_TASKS_FROZEN
from the action, making this case dead.

The fix is straightforward.

Signed-off-by: Omar Sandoval &lt;osandov@osandov.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Juri Lelli &lt;juri.lelli@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")
Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in
cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that
caused a regression in setting a CPU inactive during suspend. I ran into
this when a program was failing pthread_setaffinity_np() with EINVAL after
a suspend+wake up.

A simple reproducer:

	$ ./a.out
	sched_setaffinity: Success
	$ systemctl suspend
	$ ./a.out
	sched_setaffinity: Invalid argument

... where ./a.out is:

	#define _GNU_SOURCE
	#include &lt;errno.h&gt;
	#include &lt;sched.h&gt;
	#include &lt;stdio.h&gt;
	#include &lt;stdlib.h&gt;
	#include &lt;string.h&gt;
	#include &lt;unistd.h&gt;

	int main(void)
	{
		long num_cores;
		cpu_set_t cpu_set;
		int ret;

		num_cores = sysconf(_SC_NPROCESSORS_ONLN);
		CPU_ZERO(&amp;cpu_set);
		CPU_SET(num_cores - 1, &amp;cpu_set);
		errno = 0;
		ret = sched_setaffinity(getpid(), sizeof(cpu_set), &amp;cpu_set);
		perror("sched_setaffinity");
		return ret ? EXIT_FAILURE : EXIT_SUCCESS;
	}

The mistake is that suspend is handled in the action ==
CPU_DOWN_PREPARE_FROZEN case of the switch statement in
cpuset_cpu_inactive().

However, the commit in question masked out CPU_TASKS_FROZEN
from the action, making this case dead.

The fix is straightforward.

Signed-off-by: Omar Sandoval &lt;osandov@osandov.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Juri Lelli &lt;juri.lelli@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")
Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Handle priority boosted tasks proper in setscheduler()</title>
<updated>2015-05-08T09:53:55+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2015-05-05T17:49:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0782e63bc6fe7e2d3408d250df11d388b7799c6b'/>
<id>0782e63bc6fe7e2d3408d250df11d388b7799c6b</id>
<content type='text'>
Ronny reported that the following scenario is not handled correctly:

	T1 (prio = 10)
	   lock(rtmutex);

	T2 (prio = 20)
	   lock(rtmutex)
	      boost T1

	T1 (prio = 20)
	   sys_set_scheduler(prio = 30)
	   T1 prio = 30
	   ....
	   sys_set_scheduler(prio = 10)
	   T1 prio = 30

The last step is wrong as T1 should now be back at prio 20.

Commit c365c292d059 ("sched: Consider pi boosting in setscheduler()")
only handles the case where a boosted tasks tries to lower its
priority.

Fix it by taking the new effective priority into account for the
decision whether a change of the priority is required.

Reported-by: Ronny Meeus &lt;ronny.meeus@gmail.com&gt;
Tested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Fixes: c365c292d059 ("sched: Consider pi boosting in setscheduler()")
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Ronny reported that the following scenario is not handled correctly:

	T1 (prio = 10)
	   lock(rtmutex);

	T2 (prio = 20)
	   lock(rtmutex)
	      boost T1

	T1 (prio = 20)
	   sys_set_scheduler(prio = 30)
	   T1 prio = 30
	   ....
	   sys_set_scheduler(prio = 10)
	   T1 prio = 30

The last step is wrong as T1 should now be back at prio 20.

Commit c365c292d059 ("sched: Consider pi boosting in setscheduler()")
only handles the case where a boosted tasks tries to lower its
priority.

Fix it by taking the new effective priority into account for the
decision whether a change of the priority is required.

Reported-by: Ronny Meeus &lt;ronny.meeus@gmail.com&gt;
Tested-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Fixes: c365c292d059 ("sched: Consider pi boosting in setscheduler()")
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86: pvclock: Really remove the sched notifier for cross-cpu migrations</title>
<updated>2015-04-27T13:49:30+00:00</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2015-04-23T11:20:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=73459e2a1ada09a68c02cc5b73f3116fc8194b3d'/>
<id>73459e2a1ada09a68c02cc5b73f3116fc8194b3d</id>
<content type='text'>
This reverts commits 0a4e6be9ca17c54817cf814b4b5aa60478c6df27
and 80f7fdb1c7f0f9266421f823964fd1962681f6ce.

The task migration notifier was originally introduced in order to support
the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
with synchronized TSC.  Hence, on KVM the race condition is only needed
due to a bad implementation on the host side, and even then it's so rare
that it's mostly theoretical.

As far as KVM is concerned it's possible to fix the host, avoiding the
additional complexity in the vDSO and the (re)introduction of the task
migration notifier.

Xen, on the other hand, hasn't yet implemented vsyscall support at
all, so we do not care about its plans for non-synchronized TSC.

Reported-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Suggested-by: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commits 0a4e6be9ca17c54817cf814b4b5aa60478c6df27
and 80f7fdb1c7f0f9266421f823964fd1962681f6ce.

The task migration notifier was originally introduced in order to support
the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
with synchronized TSC.  Hence, on KVM the race condition is only needed
due to a bad implementation on the host side, and even then it's so rare
that it's mostly theoretical.

As far as KVM is concerned it's possible to fix the host, avoiding the
additional complexity in the vDSO and the (re)introduction of the task
migration notifier.

Xen, on the other hand, hasn't yet implemented vsyscall support at
all, so we do not care about its plans for non-synchronized TSC.

Reported-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Suggested-by: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2015-04-14T20:58:48+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-04-14T20:58:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e95e7f627062be5e6ce971ce873e6234c91ffc50'/>
<id>e95e7f627062be5e6ce971ce873e6234c91ffc50</id>
<content type='text'>
Pull NOHZ changes from Ingo Molnar:
 "This tree adds full dynticks support to KVM guests (support the
  disabling of the timer tick on the guest).  The main missing piece was
  the recognition of guest execution as RCU extended quiescent state and
  related changes"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kvm,rcu,nohz: use RCU extended quiescent state when running KVM guest
  context_tracking: Export context_tracking_user_enter/exit
  context_tracking: Run vtime_user_enter/exit only when state == CONTEXT_USER
  context_tracking: Add stub context_tracking_is_enabled
  context_tracking: Generalize context tracking APIs to support user and guest
  context_tracking: Rename context symbols to prepare for transition state
  ppc: Remove unused cpp symbols in kvm headers
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull NOHZ changes from Ingo Molnar:
 "This tree adds full dynticks support to KVM guests (support the
  disabling of the timer tick on the guest).  The main missing piece was
  the recognition of guest execution as RCU extended quiescent state and
  related changes"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kvm,rcu,nohz: use RCU extended quiescent state when running KVM guest
  context_tracking: Export context_tracking_user_enter/exit
  context_tracking: Run vtime_user_enter/exit only when state == CONTEXT_USER
  context_tracking: Add stub context_tracking_is_enabled
  context_tracking: Generalize context tracking APIs to support user and guest
  context_tracking: Rename context symbols to prepare for transition state
  ppc: Remove unused cpp symbols in kvm headers
</pre>
</div>
</content>
</entry>
</feed>
