<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/sched/wait.c, branch v5.1-rc4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>kernel/sched/: remove caller signal_pending branch predictions</title>
<updated>2019-01-04T21:13:48+00:00</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2019-01-03T23:28:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=34ec35ad8f5f4624e8391dbb83afb4c791f027e3'/>
<id>34ec35ad8f5f4624e8391dbb83afb4c791f027e3</id>
<content type='text'>
This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/wait: assert the wait_queue_head lock is held in __wake_up_common</title>
<updated>2018-08-22T17:52:47+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2018-08-22T04:56:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e05a8e4d88d16e088d83ce679ac3343ac66c936b'/>
<id>e05a8e4d88d16e088d83ce679ac3343ac66c936b</id>
<content type='text'>
Better ensure we actually hold the lock using lockdep than just commenting
on it.  Due to the various exported _locked interfaces it is far too easy
to get the locking wrong.

Link: http://lkml.kernel.org/r/20171214152344.6880-4-hch@lst.de
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Better ensure we actually hold the lock using lockdep than just commenting
on it.  Due to the various exported _locked interfaces it is far too easy
to get the locking wrong.

Link: http://lkml.kernel.org/r/20171214152344.6880-4-hch@lst.de
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/Documentation: Update wake_up() &amp; co. memory-barrier guarantees</title>
<updated>2018-07-17T07:30:34+00:00</updated>
<author>
<name>Andrea Parri</name>
<email>andrea.parri@amarulasolutions.com</email>
</author>
<published>2018-07-16T18:06:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7696f9910a9a40b8a952f57d3428515fabd2d889'/>
<id>7696f9910a9a40b8a952f57d3428515fabd2d889</id>
<content type='text'>
Both the implementation and the users' expectation [1] for the various
wakeup primitives have evolved over time, but the documentation has not
kept up with these changes: brings it into 2018.

[1] http://lkml.kernel.org/r/20180424091510.GB4064@hirez.programming.kicks-ass.net

Also applied feedback from Alan Stern.

Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andrea Parri &lt;andrea.parri@amarulasolutions.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Akira Yokosawa &lt;akiyks@gmail.com&gt;
Cc: Alan Stern &lt;stern@rowland.harvard.edu&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Daniel Lustig &lt;dlustig@nvidia.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Jade Alglave &lt;j.alglave@ucl.ac.uk&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Luc Maranget &lt;luc.maranget@inria.fr&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: linux-arch@vger.kernel.org
Cc: parri.andrea@gmail.com
Link: http://lkml.kernel.org/r/20180716180605.16115-12-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Both the implementation and the users' expectation [1] for the various
wakeup primitives have evolved over time, but the documentation has not
kept up with these changes: brings it into 2018.

[1] http://lkml.kernel.org/r/20180424091510.GB4064@hirez.programming.kicks-ass.net

Also applied feedback from Alan Stern.

Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Andrea Parri &lt;andrea.parri@amarulasolutions.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Akira Yokosawa &lt;akiyks@gmail.com&gt;
Cc: Alan Stern &lt;stern@rowland.harvard.edu&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Daniel Lustig &lt;dlustig@nvidia.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Jade Alglave &lt;j.alglave@ucl.ac.uk&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Luc Maranget &lt;luc.maranget@inria.fr&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: linux-arch@vger.kernel.org
Cc: parri.andrea@gmail.com
Link: http://lkml.kernel.org/r/20180716180605.16115-12-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/core: Use smp_mb() in wake_woken_function()</title>
<updated>2018-07-17T07:30:33+00:00</updated>
<author>
<name>Andrea Parri</name>
<email>andrea.parri@amarulasolutions.com</email>
</author>
<published>2018-07-16T18:06:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=76e079fefc8f62bd9b2cd2950814d1ee806e31a5'/>
<id>76e079fefc8f62bd9b2cd2950814d1ee806e31a5</id>
<content type='text'>
wake_woken_function() synchronizes with wait_woken() as follows:

  [wait_woken]                       [wake_woken_function]

  entry-&gt;flags &amp;= ~wq_flag_woken;    condition = true;
  smp_mb();                          smp_wmb();
  if (condition)                     wq_entry-&gt;flags |= wq_flag_woken;
     break;

This commit replaces the above smp_wmb() with an smp_mb() in order to
guarantee that either wait_woken() sees the wait condition being true
or the store to wq_entry-&gt;flags in woken_wake_function() follows the
store in wait_woken() in the coherence order (so that the former can
eventually be observed by wait_woken()).

The commit also fixes a comment associated to set_current_state() in
wait_woken(): the comment pairs the barrier in set_current_state() to
the above smp_wmb(), while the actual pairing involves the barrier in
set_current_state() and the barrier executed by the try_to_wake_up()
in wake_woken_function().

Signed-off-by: Andrea Parri &lt;andrea.parri@amarulasolutions.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: akiyks@gmail.com
Cc: boqun.feng@gmail.com
Cc: dhowells@redhat.com
Cc: j.alglave@ucl.ac.uk
Cc: linux-arch@vger.kernel.org
Cc: luc.maranget@inria.fr
Cc: npiggin@gmail.com
Cc: parri.andrea@gmail.com
Cc: stern@rowland.harvard.edu
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/20180716180605.16115-10-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
wake_woken_function() synchronizes with wait_woken() as follows:

  [wait_woken]                       [wake_woken_function]

  entry-&gt;flags &amp;= ~wq_flag_woken;    condition = true;
  smp_mb();                          smp_wmb();
  if (condition)                     wq_entry-&gt;flags |= wq_flag_woken;
     break;

This commit replaces the above smp_wmb() with an smp_mb() in order to
guarantee that either wait_woken() sees the wait condition being true
or the store to wq_entry-&gt;flags in woken_wake_function() follows the
store in wait_woken() in the coherence order (so that the former can
eventually be observed by wait_woken()).

The commit also fixes a comment associated to set_current_state() in
wait_woken(): the comment pairs the barrier in set_current_state() to
the above smp_wmb(), while the actual pairing involves the barrier in
set_current_state() and the barrier executed by the try_to_wake_up()
in wake_woken_function().

Signed-off-by: Andrea Parri &lt;andrea.parri@amarulasolutions.com&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: akiyks@gmail.com
Cc: boqun.feng@gmail.com
Cc: dhowells@redhat.com
Cc: j.alglave@ucl.ac.uk
Cc: linux-arch@vger.kernel.org
Cc: luc.maranget@inria.fr
Cc: npiggin@gmail.com
Cc: parri.andrea@gmail.com
Cc: stern@rowland.harvard.edu
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/20180716180605.16115-10-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/headers: Simplify and clean up header usage in the scheduler</title>
<updated>2018-03-04T11:39:29+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2018-03-03T11:20:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=325ea10c0809406ce23f038602abbc454f3f761d'/>
<id>325ea10c0809406ce23f038602abbc454f3f761d</id>
<content type='text'>
Do the following cleanups and simplifications:

 - sched/sched.h already includes &lt;asm/paravirt.h&gt;, so no need to
   include it in sched/core.c again.

 - order the &lt;linux/sched/*.h&gt; headers alphabetically

 - add all &lt;linux/sched/*.h&gt; headers to kernel/sched/sched.h

 - remove all unnecessary includes from the .c files that
   are already included in kernel/sched/sched.h.

Finally, make all scheduler .c files use a single common header:

  #include "sched.h"

... which now contains a union of the relied upon headers.

This makes the various .c files easier to read and easier to handle.

Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Do the following cleanups and simplifications:

 - sched/sched.h already includes &lt;asm/paravirt.h&gt;, so no need to
   include it in sched/core.c again.

 - order the &lt;linux/sched/*.h&gt; headers alphabetically

 - add all &lt;linux/sched/*.h&gt; headers to kernel/sched/sched.h

 - remove all unnecessary includes from the .c files that
   are already included in kernel/sched/sched.h.

Finally, make all scheduler .c files use a single common header:

  #include "sched.h"

... which now contains a union of the relied upon headers.

This makes the various .c files easier to read and easier to handle.

Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: Clean up and harmonize the coding style of the scheduler code base</title>
<updated>2018-03-03T14:50:21+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2018-03-03T13:01:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=97fb7a0a8944bd6d2c5634e1e0fa689a5c40bc22'/>
<id>97fb7a0a8944bd6d2c5634e1e0fa689a5c40bc22</id>
<content type='text'>
A good number of small style inconsistencies have accumulated
in the scheduler core, so do a pass over them to harmonize
all these details:

 - fix speling in comments,

 - use curly braces for multi-line statements,

 - remove unnecessary parentheses from integer literals,

 - capitalize consistently,

 - remove stray newlines,

 - add comments where necessary,

 - remove invalid/unnecessary comments,

 - align structure definitions and other data types vertically,

 - add missing newlines for increased readability,

 - fix vertical tabulation where it's misaligned,

 - harmonize preprocessor conditional block labeling
   and vertical alignment,

 - remove line-breaks where they uglify the code,

 - add newline after local variable definitions,

No change in functionality:

  md5:
     1191fa0a890cfa8132156d2959d7e9e2  built-in.o.before.asm
     1191fa0a890cfa8132156d2959d7e9e2  built-in.o.after.asm

Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A good number of small style inconsistencies have accumulated
in the scheduler core, so do a pass over them to harmonize
all these details:

 - fix speling in comments,

 - use curly braces for multi-line statements,

 - remove unnecessary parentheses from integer literals,

 - capitalize consistently,

 - remove stray newlines,

 - add comments where necessary,

 - remove invalid/unnecessary comments,

 - align structure definitions and other data types vertically,

 - add missing newlines for increased readability,

 - fix vertical tabulation where it's misaligned,

 - harmonize preprocessor conditional block labeling
   and vertical alignment,

 - remove line-breaks where they uglify the code,

 - add newline after local variable definitions,

No change in functionality:

  md5:
     1191fa0a890cfa8132156d2959d7e9e2  built-in.o.before.asm
     1191fa0a890cfa8132156d2959d7e9e2  built-in.o.after.asm

Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/wait: Fix add_wait_queue() behavioral change</title>
<updated>2017-12-06T18:30:34+00:00</updated>
<author>
<name>Omar Sandoval</name>
<email>osandov@fb.com</email>
</author>
<published>2017-12-06T07:15:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c6b9d9a33029014446bd9ed84c1688f6d3d4eab9'/>
<id>c6b9d9a33029014446bd9ed84c1688f6d3d4eab9</id>
<content type='text'>
The following cleanup commit:

  50816c48997a ("sched/wait: Standardize internal naming of wait-queue entries")

... unintentionally changed the behavior of add_wait_queue() from
inserting the wait entry at the head of the wait queue to the tail
of the wait queue.

Beyond a negative performance impact this change in behavior
theoretically also breaks wait queues which mix exclusive and
non-exclusive waiters, as non-exclusive waiters will not be
woken up if they are queued behind enough exclusive waiters.

Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: kernel-team@fb.com
Fixes: ("sched/wait: Standardize internal naming of wait-queue entries")
Link: http://lkml.kernel.org/r/a16c8ccffd39bd08fdaa45a5192294c784b803a7.1512544324.git.osandov@fb.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The following cleanup commit:

  50816c48997a ("sched/wait: Standardize internal naming of wait-queue entries")

... unintentionally changed the behavior of add_wait_queue() from
inserting the wait entry at the head of the wait queue to the tail
of the wait queue.

Beyond a negative performance impact this change in behavior
theoretically also breaks wait queues which mix exclusive and
non-exclusive waiters, as non-exclusive waiters will not be
woken up if they are queued behind enough exclusive waiters.

Signed-off-by: Omar Sandoval &lt;osandov@fb.com&gt;
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: kernel-team@fb.com
Fixes: ("sched/wait: Standardize internal naming of wait-queue entries")
Link: http://lkml.kernel.org/r/a16c8ccffd39bd08fdaa45a5192294c784b803a7.1512544324.git.osandov@fb.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/wait: Introduce wakeup boomark in wake_up_page_bit</title>
<updated>2017-09-14T16:56:18+00:00</updated>
<author>
<name>Tim Chen</name>
<email>tim.c.chen@linux.intel.com</email>
</author>
<published>2017-08-25T16:13:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=11a19c7b099f96d00a8dec52bfbb8475e89b6745'/>
<id>11a19c7b099f96d00a8dec52bfbb8475e89b6745</id>
<content type='text'>
Now that we have added breaks in the wait queue scan and allow bookmark
on scan position, we put this logic in the wake_up_page_bit function.

We can have very long page wait list in large system where multiple
pages share the same wait list. We break the wake up walk here to allow
other cpus a chance to access the list, and not to disable the interrupts
when traversing the list for too long.  This reduces the interrupt and
rescheduling latency, and excessive page wait queue lock hold time.

[ v2: Remove bookmark_wake_function ]

Signed-off-by: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that we have added breaks in the wait queue scan and allow bookmark
on scan position, we put this logic in the wake_up_page_bit function.

We can have very long page wait list in large system where multiple
pages share the same wait list. We break the wake up walk here to allow
other cpus a chance to access the list, and not to disable the interrupts
when traversing the list for too long.  This reduces the interrupt and
rescheduling latency, and excessive page wait queue lock hold time.

[ v2: Remove bookmark_wake_function ]

Signed-off-by: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/wait: Break up long wake list walk</title>
<updated>2017-09-14T16:56:17+00:00</updated>
<author>
<name>Tim Chen</name>
<email>tim.c.chen@linux.intel.com</email>
</author>
<published>2017-08-25T16:13:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2554db916586b228ce93e6f74a12fd7fe430a004'/>
<id>2554db916586b228ce93e6f74a12fd7fe430a004</id>
<content type='text'>
We encountered workloads that have very long wake up list on large
systems. A waker takes a long time to traverse the entire wake list and
execute all the wake functions.

We saw page wait list that are up to 3700+ entries long in tests of
large 4 and 8 socket systems. It took 0.8 sec to traverse such list
during wake up. Any other CPU that contends for the list spin lock will
spin for a long time. It is a result of the numa balancing migration of
hot pages that are shared by many threads.

Multiple CPUs waking are queued up behind the lock, and the last one
queued has to wait until all CPUs did all the wakeups.

The page wait list is traversed with interrupt disabled, which caused
various problems. This was the original cause that triggered the NMI
watch dog timer in: https://patchwork.kernel.org/patch/9800303/ . Only
extending the NMI watch dog timer there helped.

This patch bookmarks the waker's scan position in wake list and break
the wake up walk, to allow access to the list before the waker resume
its walk down the rest of the wait list. It lowers the interrupt and
rescheduling latency.

This patch also provides a performance boost when combined with the next
patch to break up page wakeup list walk. We saw 22% improvement in the
will-it-scale file pread2 test on a Xeon Phi system running 256 threads.

[ v2: Merged in Linus' changes to remove the bookmark_wake_function, and
  simply access to flags. ]

Reported-by: Kan Liang &lt;kan.liang@intel.com&gt;
Tested-by: Kan Liang &lt;kan.liang@intel.com&gt;
Signed-off-by: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We encountered workloads that have very long wake up list on large
systems. A waker takes a long time to traverse the entire wake list and
execute all the wake functions.

We saw page wait list that are up to 3700+ entries long in tests of
large 4 and 8 socket systems. It took 0.8 sec to traverse such list
during wake up. Any other CPU that contends for the list spin lock will
spin for a long time. It is a result of the numa balancing migration of
hot pages that are shared by many threads.

Multiple CPUs waking are queued up behind the lock, and the last one
queued has to wait until all CPUs did all the wakeups.

The page wait list is traversed with interrupt disabled, which caused
various problems. This was the original cause that triggered the NMI
watch dog timer in: https://patchwork.kernel.org/patch/9800303/ . Only
extending the NMI watch dog timer there helped.

This patch bookmarks the waker's scan position in wake list and break
the wake up walk, to allow access to the list before the waker resume
its walk down the rest of the wait list. It lowers the interrupt and
rescheduling latency.

This patch also provides a performance boost when combined with the next
patch to break up page wakeup list walk. We saw 22% improvement in the
will-it-scale file pread2 test on a Xeon Phi system running 256 threads.

[ v2: Merged in Linus' changes to remove the bookmark_wake_function, and
  simply access to flags. ]

Reported-by: Kan Liang &lt;kan.liang@intel.com&gt;
Tested-by: Kan Liang &lt;kan.liang@intel.com&gt;
Signed-off-by: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Minor page waitqueue cleanups</title>
<updated>2017-08-27T20:55:12+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-08-27T20:55:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3510ca20ece0150af6b10c77a74ff1b5c198e3e2'/>
<id>3510ca20ece0150af6b10c77a74ff1b5c198e3e2</id>
<content type='text'>
Tim Chen and Kan Liang have been battling a customer load that shows
extremely long page wakeup lists.  The cause seems to be constant NUMA
migration of a hot page that is shared across a lot of threads, but the
actual root cause for the exact behavior has not been found.

Tim has a patch that batches the wait list traversal at wakeup time, so
that we at least don't get long uninterruptible cases where we traverse
and wake up thousands of processes and get nasty latency spikes.  That
is likely 4.14 material, but we're still discussing the page waitqueue
specific parts of it.

In the meantime, I've tried to look at making the page wait queues less
expensive, and failing miserably.  If you have thousands of threads
waiting for the same page, it will be painful.  We'll need to try to
figure out the NUMA balancing issue some day, in addition to avoiding
the excessive spinlock hold times.

That said, having tried to rewrite the page wait queues, I can at least
fix up some of the braindamage in the current situation. In particular:

 (a) we don't want to continue walking the page wait list if the bit
     we're waiting for already got set again (which seems to be one of
     the patterns of the bad load).  That makes no progress and just
     causes pointless cache pollution chasing the pointers.

 (b) we don't want to put the non-locking waiters always on the front of
     the queue, and the locking waiters always on the back.  Not only is
     that unfair, it means that we wake up thousands of reading threads
     that will just end up being blocked by the writer later anyway.

Also add a comment about the layout of 'struct wait_page_key' - there is
an external user of it in the cachefiles code that means that it has to
match the layout of 'struct wait_bit_key' in the two first members.  It
so happens to match, because 'struct page *' and 'unsigned long *' end
up having the same values simply because the page flags are the first
member in struct page.

Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Kan Liang &lt;kan.liang@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Christopher Lameter &lt;cl@linux.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Tim Chen and Kan Liang have been battling a customer load that shows
extremely long page wakeup lists.  The cause seems to be constant NUMA
migration of a hot page that is shared across a lot of threads, but the
actual root cause for the exact behavior has not been found.

Tim has a patch that batches the wait list traversal at wakeup time, so
that we at least don't get long uninterruptible cases where we traverse
and wake up thousands of processes and get nasty latency spikes.  That
is likely 4.14 material, but we're still discussing the page waitqueue
specific parts of it.

In the meantime, I've tried to look at making the page wait queues less
expensive, and failing miserably.  If you have thousands of threads
waiting for the same page, it will be painful.  We'll need to try to
figure out the NUMA balancing issue some day, in addition to avoiding
the excessive spinlock hold times.

That said, having tried to rewrite the page wait queues, I can at least
fix up some of the braindamage in the current situation. In particular:

 (a) we don't want to continue walking the page wait list if the bit
     we're waiting for already got set again (which seems to be one of
     the patterns of the bad load).  That makes no progress and just
     causes pointless cache pollution chasing the pointers.

 (b) we don't want to put the non-locking waiters always on the front of
     the queue, and the locking waiters always on the back.  Not only is
     that unfair, it means that we wake up thousands of reading threads
     that will just end up being blocked by the writer later anyway.

Also add a comment about the layout of 'struct wait_page_key' - there is
an external user of it in the cachefiles code that means that it has to
match the layout of 'struct wait_bit_key' in the two first members.  It
so happens to match, because 'struct page *' and 'unsigned long *' end
up having the same values simply because the page flags are the first
member in struct page.

Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Kan Liang &lt;kan.liang@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Christopher Lameter &lt;cl@linux.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
