<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux, branch v5.9-rc6</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Merge tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-09-20T22:25:33+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-09-20T22:25:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3d491679b880006d2469ad14f73c2debb2a522bd'/>
<id>3d491679b880006d2469ad14f73c2debb2a522bd</id>
<content type='text'>
Pull locking fixes from Borislav Petkov:
 "Two fixes from the locking/urgent pile:

   - Fix lockdep's detection of "USED" &lt;- "IN-NMI" inversions (Peter
     Zijlstra)

   - Make percpu-rwsem operations on the semaphore's -&gt;read_count
     IRQ-safe because it can be used in an IRQ context (Hou Tao)"

* tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count
  locking/lockdep: Fix "USED" &lt;- "IN-NMI" inversions
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull locking fixes from Borislav Petkov:
 "Two fixes from the locking/urgent pile:

   - Fix lockdep's detection of "USED" &lt;- "IN-NMI" inversions (Peter
     Zijlstra)

   - Make percpu-rwsem operations on the semaphore's -&gt;read_count
     IRQ-safe because it can be used in an IRQ context (Hou Tao)"

* tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count
  locking/lockdep: Fix "USED" &lt;- "IN-NMI" inversions
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'libnvdimm-fixes-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm</title>
<updated>2020-09-20T22:01:57+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-09-20T22:01:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4a123dbaf3a6efca717e3a5ac8176fd4897456b2'/>
<id>4a123dbaf3a6efca717e3a5ac8176fd4897456b2</id>
<content type='text'>
Pull libnvdimm fixes from Dan Williams:
 "A handful of fixes to address a string of mistakes in the mechanism
  for device-mapper to determine if its component devices are dax
  capable.

   - Fix an original bug in device-mapper table reference counting when
     interrogating dax capability in the component device. This bug was
     hidden by the following bug.

   - Fix device-mapper to use the proper helper (dax_supported() instead
     of the leaf helper generic_fsdax_supported()) to determine dax
     operation of a stacked block device configuration. The original
     implementation is only valid for one level of dax-capable block
     device stacking. This bug was discovered while fixing the below
     regression.

   - Fix an infinite recursion regression introduced by broken attempts
     to quiet the generic_fsdax_supported() path and make it bail out
     before logging "dax capability not found" errors"

* tag 'libnvdimm-fixes-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Fix stack overflow when mounting fsdax pmem device
  dm: Call proper helper to determine dax support
  dm/dax: Fix table reference counts
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull libnvdimm fixes from Dan Williams:
 "A handful of fixes to address a string of mistakes in the mechanism
  for device-mapper to determine if its component devices are dax
  capable.

   - Fix an original bug in device-mapper table reference counting when
     interrogating dax capability in the component device. This bug was
     hidden by the following bug.

   - Fix device-mapper to use the proper helper (dax_supported() instead
     of the leaf helper generic_fsdax_supported()) to determine dax
     operation of a stacked block device configuration. The original
     implementation is only valid for one level of dax-capable block
     device stacking. This bug was discovered while fixing the below
     regression.

   - Fix an infinite recursion regression introduced by broken attempts
     to quiet the generic_fsdax_supported() path and make it bail out
     before logging "dax capability not found" errors"

* tag 'libnvdimm-fixes-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Fix stack overflow when mounting fsdax pmem device
  dm: Call proper helper to determine dax support
  dm/dax: Fix table reference counts
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'tty-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty</title>
<updated>2020-09-20T17:46:26+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-09-20T17:46:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f44f3f83d895b830a85f62790cedd5605a399ac4'/>
<id>f44f3f83d895b830a85f62790cedd5605a399ac4</id>
<content type='text'>
Pull tty/serial/fbcon fixes from Greg KH:
 "Here are some small tty/serial and one more fbcon fix.

  They include:

   - serial core locking regression fixes

   - new device ids for 8250_pci driver

   - fbcon fix for syzbot found issue

  All have been in linux-next with no reported issues"

* tag 'tty-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  fbcon: Fix user font detection test at fbcon_resize().
  serial: 8250_pci: Add Realtek 816a and 816b
  serial: core: fix console port-lock regression
  serial: core: fix port-lock initialisation
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull tty/serial/fbcon fixes from Greg KH:
 "Here are some small tty/serial and one more fbcon fix.

  They include:

   - serial core locking regression fixes

   - new device ids for 8250_pci driver

   - fbcon fix for syzbot found issue

  All have been in linux-next with no reported issues"

* tag 'tty-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  fbcon: Fix user font detection test at fbcon_resize().
  serial: 8250_pci: Add Realtek 816a and 816b
  serial: core: fix console port-lock regression
  serial: core: fix port-lock initialisation
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: Call proper helper to determine dax support</title>
<updated>2020-09-20T15:55:09+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2020-09-20T15:54:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e2ec5128254518cae320d5dc631b71b94160f663'/>
<id>e2ec5128254518cae320d5dc631b71b94160f663</id>
<content type='text'>
DM was calling generic_fsdax_supported() to determine whether a device
referenced in the DM table supports DAX. However this is a helper for "leaf" device drivers so that
they don't have to duplicate common generic checks. High level code
should call dax_supported() helper which that calls into appropriate
helper for the particular device. This problem manifested itself as
kernel messages:

dm-3: error: dax access failed (-95)

when lvm2-testsuite run in cases where a DM device was stacked on top of
another DM device.

Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: &lt;stable@vger.kernel.org&gt;
Tested-by: Adrian Huang &lt;ahuang12@lenovo.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
DM was calling generic_fsdax_supported() to determine whether a device
referenced in the DM table supports DAX. However this is a helper for "leaf" device drivers so that
they don't have to duplicate common generic checks. High level code
should call dax_supported() helper which that calls into appropriate
helper for the particular device. This problem manifested itself as
kernel messages:

dm-3: error: dax access failed (-95)

when lvm2-testsuite run in cases where a DM device was stacked on top of
another DM device.

Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: &lt;stable@vger.kernel.org&gt;
Tested-by: Adrian Huang &lt;ahuang12@lenovo.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>stackleak: let stack_erasing_sysctl take a kernel pointer buffer</title>
<updated>2020-09-19T20:13:39+00:00</updated>
<author>
<name>Tobias Klauser</name>
<email>tklauser@distanz.ch</email>
</author>
<published>2020-09-19T04:20:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4773ef33fc6e59bad2e5d19e334de2fa79c27b74'/>
<id>4773ef33fc6e59bad2e5d19e334de2fa79c27b74</id>
<content type='text'>
Commit 32927393dc1c ("sysctl: pass kernel pointers to -&gt;proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
signature of stack_erasing_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces)
kernel/stackleak.c:31:50:    expected void *
kernel/stackleak.c:31:50:    got void [noderef] __user *buffer

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to -&gt;proc_handler")
Signed-off-by: Tobias Klauser &lt;tklauser@distanz.ch&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Link: https://lkml.kernel.org/r/20200907093253.13656-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 32927393dc1c ("sysctl: pass kernel pointers to -&gt;proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
signature of stack_erasing_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces)
kernel/stackleak.c:31:50:    expected void *
kernel/stackleak.c:31:50:    got void [noderef] __user *buffer

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to -&gt;proc_handler")
Signed-off-by: Tobias Klauser &lt;tklauser@distanz.ch&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Link: https://lkml.kernel.org/r/20200907093253.13656-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ftrace: let ftrace_enable_sysctl take a kernel pointer buffer</title>
<updated>2020-09-19T20:13:39+00:00</updated>
<author>
<name>Tobias Klauser</name>
<email>tklauser@distanz.ch</email>
</author>
<published>2020-09-19T04:20:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7bb82ac30c3dd4ecf1485685cbe84d2ba10dddf4'/>
<id>7bb82ac30c3dd4ecf1485685cbe84d2ba10dddf4</id>
<content type='text'>
Commit 32927393dc1c ("sysctl: pass kernel pointers to -&gt;proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43:    expected void *
kernel/trace/ftrace.c:7544:43:    got void [noderef] __user *buffer

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to -&gt;proc_handler")
Signed-off-by: Tobias Klauser &lt;tklauser@distanz.ch&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 32927393dc1c ("sysctl: pass kernel pointers to -&gt;proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43:    expected void *
kernel/trace/ftrace.c:7544:43:    got void [noderef] __user *buffer

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to -&gt;proc_handler")
Signed-off-by: Tobias Klauser &lt;tklauser@distanz.ch&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux</title>
<updated>2020-09-18T18:55:43+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-09-18T18:55:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=69828c475d15290553cb5512108424746baf6225'/>
<id>69828c475d15290553cb5512108424746baf6225</id>
<content type='text'>
Pull arm64 fixes from Catalin Marinas:

 - Allow CPUs affected by erratum 1418040 to come online late
   (previously we only fixed the other case - CPUs not affected by the
   erratum coming up late).

 - Fix branch offset in BPF JIT.

 - Defer the stolen time initialisation to the CPU online time from the
   CPU starting time to avoid a (sleep-able) memory allocation in an
   atomic context.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: paravirt: Initialize steal time when cpu is online
  arm64: bpf: Fix branch offset in JIT
  arm64: Allow CPUs unffected by ARM erratum 1418040 to come in late
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull arm64 fixes from Catalin Marinas:

 - Allow CPUs affected by erratum 1418040 to come online late
   (previously we only fixed the other case - CPUs not affected by the
   erratum coming up late).

 - Fix branch offset in BPF JIT.

 - Defer the stolen time initialisation to the CPU online time from the
   CPU starting time to avoid a (sleep-able) memory allocation in an
   atomic context.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: paravirt: Initialize steal time when cpu is online
  arm64: bpf: Fix branch offset in JIT
  arm64: Allow CPUs unffected by ARM erratum 1418040 to come in late
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'pm-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm</title>
<updated>2020-09-18T18:43:21+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-09-18T18:43:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=794a9965eef498f6e4c466167880acb4ab990b63'/>
<id>794a9965eef498f6e4c466167880acb4ab990b63</id>
<content type='text'>
Pull power management updates from Rafael Wysocki:
 "These add a new CPU ID to the RAPL power capping driver and prevent
  the ACPI processor idle driver from triggering RCU-lockdep complaints.

  Specifics:

   - Add support for the Lakefield chip to the RAPL power capping driver
     (Ricardo Neri).

   - Modify the ACPI processor idle driver to prevent it from triggering
     RCU-lockdep complaints which has started to happen after recent
     changes in that area (Peter Zijlstra)"

* tag 'pm-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: Take over RCU-idle for C3-BM idle
  cpuidle: Allow cpuidle drivers to take over RCU-idle
  ACPI: processor: Use CPUIDLE_FLAG_TLB_FLUSHED
  ACPI: processor: Use CPUIDLE_FLAG_TIMER_STOP
  powercap: RAPL: Add support for Lakefield
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull power management updates from Rafael Wysocki:
 "These add a new CPU ID to the RAPL power capping driver and prevent
  the ACPI processor idle driver from triggering RCU-lockdep complaints.

  Specifics:

   - Add support for the Lakefield chip to the RAPL power capping driver
     (Ricardo Neri).

   - Modify the ACPI processor idle driver to prevent it from triggering
     RCU-lockdep complaints which has started to happen after recent
     changes in that area (Peter Zijlstra)"

* tag 'pm-5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: processor: Take over RCU-idle for C3-BM idle
  cpuidle: Allow cpuidle drivers to take over RCU-idle
  ACPI: processor: Use CPUIDLE_FLAG_TLB_FLUSHED
  ACPI: processor: Use CPUIDLE_FLAG_TIMER_STOP
  powercap: RAPL: Add support for Lakefield
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: allow a controlled amount of unfairness in the page lock</title>
<updated>2020-09-17T17:26:41+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-09-13T21:05:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5ef64cc8987a9211d3f3667331ba3411a94ddc79'/>
<id>5ef64cc8987a9211d3f3667331ba3411a94ddc79</id>
<content type='text'>
Commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") made
the page locking entirely fair, in that if a waiter came in while the
lock was held, the lock would be transferred to the lockers strictly in
order.

That was intended to finally get rid of the long-reported watchdog
failures that involved the page lock under extreme load, where a process
could end up waiting essentially forever, as other page lockers stole
the lock from under it.

It also improved some benchmarks, but it ended up causing huge
performance regressions on others, simply because fair lock behavior
doesn't end up giving out the lock as aggressively, causing better
worst-case latency, but potentially much worse average latencies and
throughput.

Instead of reverting that change entirely, this introduces a controlled
amount of unfairness, with a sysctl knob to tune it if somebody needs
to.  But the default value should hopefully be good for any normal load,
allowing a few rounds of lock stealing, but enforcing the strict
ordering before the lock has been stolen too many times.

There is also a hint from Matthieu Baerts that the fair page coloring
may end up exposing an ABBA deadlock that is hidden by the usual
optimistic lock stealing, and while the unfairness doesn't fix the
fundamental issue (and I'm still looking at that), it avoids it in
practice.

The amount of unfairness can be modified by writing a new value to the
'sysctl_page_lock_unfairness' variable (default value of 5, exposed
through /proc/sys/vm/page_lock_unfairness), but that is hopefully
something we'd use mainly for debugging rather than being necessary for
any deep system tuning.

This whole issue has exposed just how critical the page lock can be, and
how contended it gets under certain locks.  And the main contention
doesn't really seem to be anything related to IO (which was the origin
of this lock), but for things like just verifying that the page file
mapping is stable while faulting in the page into a page table.

Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@MichaelLarabel.com/
Link: https://www.phoronix.com/scan.php?page=article&amp;item=linux-50-59&amp;num=1
Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net/
Reported-and-tested-by: Michael Larabel &lt;Michael@michaellarabel.com&gt;
Tested-by: Matthieu Baerts &lt;matthieu.baerts@tessares.net&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic") made
the page locking entirely fair, in that if a waiter came in while the
lock was held, the lock would be transferred to the lockers strictly in
order.

That was intended to finally get rid of the long-reported watchdog
failures that involved the page lock under extreme load, where a process
could end up waiting essentially forever, as other page lockers stole
the lock from under it.

It also improved some benchmarks, but it ended up causing huge
performance regressions on others, simply because fair lock behavior
doesn't end up giving out the lock as aggressively, causing better
worst-case latency, but potentially much worse average latencies and
throughput.

Instead of reverting that change entirely, this introduces a controlled
amount of unfairness, with a sysctl knob to tune it if somebody needs
to.  But the default value should hopefully be good for any normal load,
allowing a few rounds of lock stealing, but enforcing the strict
ordering before the lock has been stolen too many times.

There is also a hint from Matthieu Baerts that the fair page coloring
may end up exposing an ABBA deadlock that is hidden by the usual
optimistic lock stealing, and while the unfairness doesn't fix the
fundamental issue (and I'm still looking at that), it avoids it in
practice.

The amount of unfairness can be modified by writing a new value to the
'sysctl_page_lock_unfairness' variable (default value of 5, exposed
through /proc/sys/vm/page_lock_unfairness), but that is hopefully
something we'd use mainly for debugging rather than being necessary for
any deep system tuning.

This whole issue has exposed just how critical the page lock can be, and
how contended it gets under certain locks.  And the main contention
doesn't really seem to be anything related to IO (which was the origin
of this lock), but for things like just verifying that the page file
mapping is stable while faulting in the page into a page table.

Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@MichaelLarabel.com/
Link: https://www.phoronix.com/scan.php?page=article&amp;item=linux-50-59&amp;num=1
Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net/
Reported-and-tested-by: Michael Larabel &lt;Michael@michaellarabel.com&gt;
Tested-by: Matthieu Baerts &lt;matthieu.baerts@tessares.net&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>arm64: paravirt: Initialize steal time when cpu is online</title>
<updated>2020-09-17T17:12:18+00:00</updated>
<author>
<name>Andrew Jones</name>
<email>drjones@redhat.com</email>
</author>
<published>2020-09-16T15:45:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=75df529bec9110dad43ab30e2d9490242529e8b8'/>
<id>75df529bec9110dad43ab30e2d9490242529e8b8</id>
<content type='text'>
Steal time initialization requires mapping a memory region which
invokes a memory allocation. Doing this at CPU starting time results
in the following trace when CONFIG_DEBUG_ATOMIC_SLEEP is enabled:

BUG: sleeping function called from invalid context at mm/slab.h:498
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-rc5+ #1
Call trace:
 dump_backtrace+0x0/0x208
 show_stack+0x1c/0x28
 dump_stack+0xc4/0x11c
 ___might_sleep+0xf8/0x130
 __might_sleep+0x58/0x90
 slab_pre_alloc_hook.constprop.101+0xd0/0x118
 kmem_cache_alloc_node_trace+0x84/0x270
 __get_vm_area_node+0x88/0x210
 get_vm_area_caller+0x38/0x40
 __ioremap_caller+0x70/0xf8
 ioremap_cache+0x78/0xb0
 memremap+0x9c/0x1a8
 init_stolen_time_cpu+0x54/0xf0
 cpuhp_invoke_callback+0xa8/0x720
 notify_cpu_starting+0xc8/0xd8
 secondary_start_kernel+0x114/0x180
CPU1: Booted secondary processor 0x0000000001 [0x431f0a11]

However we don't need to initialize steal time at CPU starting time.
We can simply wait until CPU online time, just sacrificing a bit of
accuracy by returning zero for steal time until we know better.

While at it, add __init to the functions that are only called by
pv_time_init() which is __init.

Signed-off-by: Andrew Jones &lt;drjones@redhat.com&gt;
Fixes: e0685fa228fd ("arm64: Retrieve stolen time as paravirtualized guest")
Cc: stable@vger.kernel.org
Reviewed-by: Steven Price &lt;steven.price@arm.com&gt;
Link: https://lore.kernel.org/r/20200916154530.40809-1-drjones@redhat.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Steal time initialization requires mapping a memory region which
invokes a memory allocation. Doing this at CPU starting time results
in the following trace when CONFIG_DEBUG_ATOMIC_SLEEP is enabled:

BUG: sleeping function called from invalid context at mm/slab.h:498
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-rc5+ #1
Call trace:
 dump_backtrace+0x0/0x208
 show_stack+0x1c/0x28
 dump_stack+0xc4/0x11c
 ___might_sleep+0xf8/0x130
 __might_sleep+0x58/0x90
 slab_pre_alloc_hook.constprop.101+0xd0/0x118
 kmem_cache_alloc_node_trace+0x84/0x270
 __get_vm_area_node+0x88/0x210
 get_vm_area_caller+0x38/0x40
 __ioremap_caller+0x70/0xf8
 ioremap_cache+0x78/0xb0
 memremap+0x9c/0x1a8
 init_stolen_time_cpu+0x54/0xf0
 cpuhp_invoke_callback+0xa8/0x720
 notify_cpu_starting+0xc8/0xd8
 secondary_start_kernel+0x114/0x180
CPU1: Booted secondary processor 0x0000000001 [0x431f0a11]

However we don't need to initialize steal time at CPU starting time.
We can simply wait until CPU online time, just sacrificing a bit of
accuracy by returning zero for steal time until we know better.

While at it, add __init to the functions that are only called by
pv_time_init() which is __init.

Signed-off-by: Andrew Jones &lt;drjones@redhat.com&gt;
Fixes: e0685fa228fd ("arm64: Retrieve stolen time as paravirtualized guest")
Cc: stable@vger.kernel.org
Reviewed-by: Steven Price &lt;steven.price@arm.com&gt;
Link: https://lore.kernel.org/r/20200916154530.40809-1-drjones@redhat.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
