<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/mm/filemap.c, branch v5.2-rc2</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>treewide: Add SPDX license identifier for missed files</title>
<updated>2019-05-21T08:50:45+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-19T12:08:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=457c89965399115e5cd8bf38f9c597293405703d'/>
<id>457c89965399115e5cd8bf38f9c597293405703d</id>
<content type='text'>
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: delete find_get_entries_tag</title>
<updated>2019-05-14T16:47:51+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2019-05-14T00:23:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a1b8e6abf35b9903807eced67a4c26e440663620'/>
<id>a1b8e6abf35b9903807eced67a4c26e440663620</id>
<content type='text'>
I removed the only user of this and hadn't noticed it was now unused.

Link: http://lkml.kernel.org/r/20190430152929.21813-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Reviewed-by: Ross Zwisler &lt;zwisler@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I removed the only user of this and hadn't noticed it was now unused.

Link: http://lkml.kernel.org/r/20190430152929.21813-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Reviewed-by: Ross Zwisler &lt;zwisler@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/filemap.c: fix minor typo</title>
<updated>2019-05-14T16:47:49+00:00</updated>
<author>
<name>Laurent Dufour</name>
<email>ldufour@linux.ibm.com</email>
</author>
<published>2019-05-14T00:21:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2346a560599a4438d66b17d83f102b2ec59f167c'/>
<id>2346a560599a4438d66b17d83f102b2ec59f167c</id>
<content type='text'>
Link: http://lkml.kernel.org/r/20190304155240.19215-1-ldufour@linux.ibm.com
Signed-off-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Reviewed-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Link: http://lkml.kernel.org/r/20190304155240.19215-1-ldufour@linux.ibm.com
Signed-off-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Reviewed-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/filemap.c: enable error injection at add_to_page_cache()</title>
<updated>2019-05-14T16:47:49+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2019-05-14T00:21:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cfcbfb1382dbac331d8aa92d3a218a16b803b2a9'/>
<id>cfcbfb1382dbac331d8aa92d3a218a16b803b2a9</id>
<content type='text'>
Recently I messed up the error handling in filemap_fault() because of an
unexpected ENOMEM (related to cgroup memory limits) in add_to_page_cache.
Enable error injection at this point so I can add a testcase to xfstests
to verify I don't mess this up again.

[akpm@linux-foundation.org: include linux/error-injection.h]
Link: http://lkml.kernel.org/r/20190403152604.14008-1-josef@toxicpanda.com
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Recently I messed up the error handling in filemap_fault() because of an
unexpected ENOMEM (related to cgroup memory limits) in add_to_page_cache.
Enable error injection at this point so I can add a testcase to xfstests
to verify I don't mess this up again.

[akpm@linux-foundation.org: include linux/error-injection.h]
Link: http://lkml.kernel.org/r/20190403152604.14008-1-josef@toxicpanda.com
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: page cache: store only head pages in i_pages</title>
<updated>2019-05-14T16:47:45+00:00</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2019-05-14T00:16:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5fd4ca2d84b249f0858ce28cf637cf25b61a398f'/>
<id>5fd4ca2d84b249f0858ce28cf637cf25b61a398f</id>
<content type='text'>
Transparent Huge Pages are currently stored in i_pages as pointers to
consecutive subpages.  This patch changes that to storing consecutive
pointers to the head page in preparation for storing huge pages more
efficiently in i_pages.

Large parts of this are "inspired" by Kirill's patch
https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/

[willy@infradead.org: fix swapcache pages]
  Link: http://lkml.kernel.org/r/20190324155441.GF10344@bombadil.infradead.org
[kirill@shutemov.name: hugetlb stores pages in page cache differently]
  Link: http://lkml.kernel.org/r/20190404134553.vuvhgmghlkiw2hgl@kshutemo-mobl1
Link: http://lkml.kernel.org/r/20190307153051.18815-1-willy@infradead.org
Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Kirill Shutemov &lt;kirill@shutemov.name&gt;
Reviewed-and-tested-by: Song Liu &lt;songliubraving@fb.com&gt;
Tested-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Reviewed-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Tested-by: Qian Cai &lt;cai@lca.pw&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Song Liu &lt;liu.song.a23@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Transparent Huge Pages are currently stored in i_pages as pointers to
consecutive subpages.  This patch changes that to storing consecutive
pointers to the head page in preparation for storing huge pages more
efficiently in i_pages.

Large parts of this are "inspired" by Kirill's patch
https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/

[willy@infradead.org: fix swapcache pages]
  Link: http://lkml.kernel.org/r/20190324155441.GF10344@bombadil.infradead.org
[kirill@shutemov.name: hugetlb stores pages in page cache differently]
  Link: http://lkml.kernel.org/r/20190404134553.vuvhgmghlkiw2hgl@kshutemo-mobl1
Link: http://lkml.kernel.org/r/20190307153051.18815-1-willy@infradead.org
Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Kirill Shutemov &lt;kirill@shutemov.name&gt;
Reviewed-and-tested-by: Song Liu &lt;songliubraving@fb.com&gt;
Tested-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Reviewed-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Tested-by: Qian Cai &lt;cai@lca.pw&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Song Liu &lt;liu.song.a23@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>filemap: add a comment about FAULT_FLAG_RETRY_NOWAIT behavior</title>
<updated>2019-03-15T18:26:07+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-03-15T18:26:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8b0f9fa2e02dc95216577c3387b0707c5f60fbaf'/>
<id>8b0f9fa2e02dc95216577c3387b0707c5f60fbaf</id>
<content type='text'>
I thought Josef Bacik's patch to drop the mmap_sem was buggy, because
when looking at the error cases, there was one case where we returned
VM_FAULT_RETRY without actually dropping the mmap_sem.

Josef had to explain to me (using small words) that yes, that's actually
what we're supposed to do, and his patch was correct.  Which not only
convinced me he knew what he was doing and I should stop arguing with
him, but also that I should add a comment to the case I was confused
about.

Patiently-pointed-out-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I thought Josef Bacik's patch to drop the mmap_sem was buggy, because
when looking at the error cases, there was one case where we returned
VM_FAULT_RETRY without actually dropping the mmap_sem.

Josef had to explain to me (using small words) that yes, that's actually
what we're supposed to do, and his patch was correct.  Which not only
convinced me he knew what he was doing and I should stop arguing with
him, but also that I should add a comment to the case I was confused
about.

Patiently-pointed-out-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>filemap: drop the mmap_sem for all blocking operations</title>
<updated>2019-03-15T18:21:25+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2019-03-13T18:44:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6b4c9f4469819a0c1a38a0a4541337e0f9bf6c11'/>
<id>6b4c9f4469819a0c1a38a0a4541337e0f9bf6c11</id>
<content type='text'>
Currently we only drop the mmap_sem if there is contention on the page
lock.  The idea is that we issue readahead and then go to lock the page
while it is under IO and we want to not hold the mmap_sem during the IO.

The problem with this is the assumption that the readahead does anything.
In the case that the box is under extreme memory or IO pressure we may end
up not reading anything at all for readahead, which means we will end up
reading in the page under the mmap_sem.

Even if the readahead does something, it could get throttled because of io
pressure on the system and the process is in a lower priority cgroup.

Holding the mmap_sem while doing IO is problematic because it can cause
system-wide priority inversions.  Consider some large company that does a
lot of web traffic.  This large company has load balancing logic in it's
core web server, cause some engineer thought this was a brilliant plan.
This load balancing logic gets statistics from /proc about the system,
which trip over processes mmap_sem for various reasons.  Now the web
server application is in a protected cgroup, but these other processes may
not be, and if they are being throttled while their mmap_sem is held we'll
stall, and cause this nice death spiral.

Instead rework filemap fault path to drop the mmap sem at any point that
we may do IO or block for an extended period of time.  This includes while
issuing readahead, locking the page, or needing to call -&gt;readpage because
readahead did not occur.  Then once we have a fully uptodate page we can
return with VM_FAULT_RETRY and come back again to find our nicely in-cache
page that was gotten outside of the mmap_sem.

This patch also adds a new helper for locking the page with the mmap_sem
dropped.  This doesn't make sense currently as generally speaking if the
page is already locked it'll have been read in (unless there was an error)
before it was unlocked.  However a forthcoming patchset will change this
with the ability to abort read-ahead bio's if necessary, making it more
likely that we could contend for a page lock and still have a not uptodate
page.  This allows us to deal with this case by grabbing the lock and
issuing the IO without the mmap_sem held, and then returning
VM_FAULT_RETRY to come back around.

[josef@toxicpanda.com: v6]
  Link: http://lkml.kernel.org/r/20181212152757.10017-1-josef@toxicpanda.com
[kirill@shutemov.name: fix race in filemap_fault()]
  Link: http://lkml.kernel.org/r/20181228235106.okk3oastsnpxusxs@kshutemo-mobl1
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/20181211173801.29535-4-josef@toxicpanda.com
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Tested-by: syzbot+b437b5a429d680cf2217@syzkaller.appspotmail.com
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently we only drop the mmap_sem if there is contention on the page
lock.  The idea is that we issue readahead and then go to lock the page
while it is under IO and we want to not hold the mmap_sem during the IO.

The problem with this is the assumption that the readahead does anything.
In the case that the box is under extreme memory or IO pressure we may end
up not reading anything at all for readahead, which means we will end up
reading in the page under the mmap_sem.

Even if the readahead does something, it could get throttled because of io
pressure on the system and the process is in a lower priority cgroup.

Holding the mmap_sem while doing IO is problematic because it can cause
system-wide priority inversions.  Consider some large company that does a
lot of web traffic.  This large company has load balancing logic in it's
core web server, cause some engineer thought this was a brilliant plan.
This load balancing logic gets statistics from /proc about the system,
which trip over processes mmap_sem for various reasons.  Now the web
server application is in a protected cgroup, but these other processes may
not be, and if they are being throttled while their mmap_sem is held we'll
stall, and cause this nice death spiral.

Instead rework filemap fault path to drop the mmap sem at any point that
we may do IO or block for an extended period of time.  This includes while
issuing readahead, locking the page, or needing to call -&gt;readpage because
readahead did not occur.  Then once we have a fully uptodate page we can
return with VM_FAULT_RETRY and come back again to find our nicely in-cache
page that was gotten outside of the mmap_sem.

This patch also adds a new helper for locking the page with the mmap_sem
dropped.  This doesn't make sense currently as generally speaking if the
page is already locked it'll have been read in (unless there was an error)
before it was unlocked.  However a forthcoming patchset will change this
with the ability to abort read-ahead bio's if necessary, making it more
likely that we could contend for a page lock and still have a not uptodate
page.  This allows us to deal with this case by grabbing the lock and
issuing the IO without the mmap_sem held, and then returning
VM_FAULT_RETRY to come back around.

[josef@toxicpanda.com: v6]
  Link: http://lkml.kernel.org/r/20181212152757.10017-1-josef@toxicpanda.com
[kirill@shutemov.name: fix race in filemap_fault()]
  Link: http://lkml.kernel.org/r/20181228235106.okk3oastsnpxusxs@kshutemo-mobl1
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/20181211173801.29535-4-josef@toxicpanda.com
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Tested-by: syzbot+b437b5a429d680cf2217@syzkaller.appspotmail.com
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>filemap: kill page_cache_read usage in filemap_fault</title>
<updated>2019-03-15T18:21:25+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2019-03-13T18:44:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a75d4c33377277b6034dd1e2663bce444f952c14'/>
<id>a75d4c33377277b6034dd1e2663bce444f952c14</id>
<content type='text'>
Patch series "drop the mmap_sem when doing IO in the fault path", v6.

Now that we have proper isolation in place with cgroups2 we have started
going through and fixing the various priority inversions.  Most are all
gone now, but this one is sort of weird since it's not necessarily a
priority inversion that happens within the kernel, but rather because of
something userspace does.

We have giant applications that we want to protect, and parts of these
giant applications do things like watch the system state to determine how
healthy the box is for load balancing and such.  This involves running
'ps' or other such utilities.  These utilities will often walk
/proc/&lt;pid&gt;/whatever, and these files can sometimes need to
down_read(&amp;task-&gt;mmap_sem).  Not usually a big deal, but we noticed when
we are stress testing that sometimes our protected application has latency
spikes trying to get the mmap_sem for tasks that are in lower priority
cgroups.

This is because any down_write() on a semaphore essentially turns it into
a mutex, so even if we currently have it held for reading, any new readers
will not be allowed on to keep from starving the writer.  This is fine,
except a lower priority task could be stuck doing IO because it has been
throttled to the point that its IO is taking much longer than normal.  But
because a higher priority group depends on this completing it is now stuck
behind lower priority work.

In order to avoid this particular priority inversion we want to use the
existing retry mechanism to stop from holding the mmap_sem at all if we
are going to do IO.  This already exists in the read case sort of, but
needed to be extended for more than just grabbing the page lock.  With
io.latency we throttle at submit_bio() time, so the readahead stuff can
block and even page_cache_read can block, so all these paths need to have
the mmap_sem dropped.

The other big thing is -&gt;page_mkwrite.  btrfs is particularly shitty here
because we have to reserve space for the dirty page, which can be a very
expensive operation.  We use the same retry method as the read path, and
simply cache the page and verify the page is still setup properly the next
pass through -&gt;page_mkwrite().

I've tested these patches with xfstests and there are no regressions.

This patch (of 3):

If we do not have a page at filemap_fault time we'll do this weird forced
page_cache_read thing to populate the page, and then drop it again and
loop around and find it.  This makes for 2 ways we can read a page in
filemap_fault, and it's not really needed.  Instead add a FGP_FOR_MMAP
flag so that pagecache_get_page() will return a unlocked page that's in
pagecache.  Then use the normal page locking and readpage logic already in
filemap_fault.  This simplifies the no page in page cache case
significantly.

[akpm@linux-foundation.org: fix comment text]
[josef@toxicpanda.com: don't unlock null page in FGP_FOR_MMAP case]
  Link: http://lkml.kernel.org/r/20190312201742.22935-1-josef@toxicpanda.com
Link: http://lkml.kernel.org/r/20181211173801.29535-2-josef@toxicpanda.com
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "drop the mmap_sem when doing IO in the fault path", v6.

Now that we have proper isolation in place with cgroups2 we have started
going through and fixing the various priority inversions.  Most are all
gone now, but this one is sort of weird since it's not necessarily a
priority inversion that happens within the kernel, but rather because of
something userspace does.

We have giant applications that we want to protect, and parts of these
giant applications do things like watch the system state to determine how
healthy the box is for load balancing and such.  This involves running
'ps' or other such utilities.  These utilities will often walk
/proc/&lt;pid&gt;/whatever, and these files can sometimes need to
down_read(&amp;task-&gt;mmap_sem).  Not usually a big deal, but we noticed when
we are stress testing that sometimes our protected application has latency
spikes trying to get the mmap_sem for tasks that are in lower priority
cgroups.

This is because any down_write() on a semaphore essentially turns it into
a mutex, so even if we currently have it held for reading, any new readers
will not be allowed on to keep from starving the writer.  This is fine,
except a lower priority task could be stuck doing IO because it has been
throttled to the point that its IO is taking much longer than normal.  But
because a higher priority group depends on this completing it is now stuck
behind lower priority work.

In order to avoid this particular priority inversion we want to use the
existing retry mechanism to stop from holding the mmap_sem at all if we
are going to do IO.  This already exists in the read case sort of, but
needed to be extended for more than just grabbing the page lock.  With
io.latency we throttle at submit_bio() time, so the readahead stuff can
block and even page_cache_read can block, so all these paths need to have
the mmap_sem dropped.

The other big thing is -&gt;page_mkwrite.  btrfs is particularly shitty here
because we have to reserve space for the dirty page, which can be a very
expensive operation.  We use the same retry method as the read path, and
simply cache the page and verify the page is still setup properly the next
pass through -&gt;page_mkwrite().

I've tested these patches with xfstests and there are no regressions.

This patch (of 3):

If we do not have a page at filemap_fault time we'll do this weird forced
page_cache_read thing to populate the page, and then drop it again and
loop around and find it.  This makes for 2 ways we can read a page in
filemap_fault, and it's not really needed.  Instead add a FGP_FOR_MMAP
flag so that pagecache_get_page() will return a unlocked page that's in
pagecache.  Then use the normal page locking and readpage logic already in
filemap_fault.  This simplifies the no page in page cache case
significantly.

[akpm@linux-foundation.org: fix comment text]
[josef@toxicpanda.com: don't unlock null page in FGP_FOR_MMAP case]
  Link: http://lkml.kernel.org/r/20190312201742.22935-1-josef@toxicpanda.com
Link: http://lkml.kernel.org/r/20181211173801.29535-2-josef@toxicpanda.com
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>filemap: pass vm_fault to the mmap ra helpers</title>
<updated>2019-03-14T21:36:20+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2019-03-13T18:44:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2a1180f1bd389e9d47693e5eb384b95f482d8d19'/>
<id>2a1180f1bd389e9d47693e5eb384b95f482d8d19</id>
<content type='text'>
All of the arguments to these functions come from the vmf.

Cut down on the amount of arguments passed by simply passing in the vmf
to these two helpers.

Link: http://lkml.kernel.org/r/20181211173801.29535-3-josef@toxicpanda.com
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All of the arguments to these functions come from the vmf.

Cut down on the amount of arguments passed by simply passing in the vmf
to these two helpers.

Link: http://lkml.kernel.org/r/20181211173801.29535-3-josef@toxicpanda.com
Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "Kirill A. Shutemov" &lt;kirill@shutemov.name&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: remove zone_lru_lock() function, access -&gt;lru_lock directly</title>
<updated>2019-03-06T05:07:21+00:00</updated>
<author>
<name>Andrey Ryabinin</name>
<email>aryabinin@virtuozzo.com</email>
</author>
<published>2019-03-05T23:49:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f4b7e272b5c0425915e2115068e0a5a20a3a628e'/>
<id>f4b7e272b5c0425915e2115068e0a5a20a3a628e</id>
<content type='text'>
We have common pattern to access lru_lock from a page pointer:
	zone_lru_lock(page_zone(page))

Which is silly, because it unfolds to this:
	&amp;NODE_DATA(page_to_nid(page))-&gt;node_zones[page_zonenum(page)]-&gt;zone_pgdat-&gt;lru_lock
while we can simply do
	&amp;NODE_DATA(page_to_nid(page))-&gt;lru_lock

Remove zone_lru_lock() function, since it's only complicate things.  Use
'page_pgdat(page)-&gt;lru_lock' pattern instead.

[aryabinin@virtuozzo.com: a slightly better version of __split_huge_page()]
  Link: http://lkml.kernel.org/r/20190301121651.7741-1-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/20190228083329.31892-2-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have common pattern to access lru_lock from a page pointer:
	zone_lru_lock(page_zone(page))

Which is silly, because it unfolds to this:
	&amp;NODE_DATA(page_to_nid(page))-&gt;node_zones[page_zonenum(page)]-&gt;zone_pgdat-&gt;lru_lock
while we can simply do
	&amp;NODE_DATA(page_to_nid(page))-&gt;lru_lock

Remove zone_lru_lock() function, since it's only complicate things.  Use
'page_pgdat(page)-&gt;lru_lock' pattern instead.

[aryabinin@virtuozzo.com: a slightly better version of __split_huge_page()]
  Link: http://lkml.kernel.org/r/20190301121651.7741-1-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/20190228083329.31892-2-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin &lt;aryabinin@virtuozzo.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: William Kucharski &lt;william.kucharski@oracle.com&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
