linux-toradex.git/include/linux/swap.h, branch v5.12-rc7

swap: fix swapfile read/write offset

2021-03-03T00:25:46+00:00

We're not factoring in the start of the file for where to write and
read the swapfile, which leads to very unfortunate side effects of
writing where we should not be...
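
As a rough illustration of the kind of bug described above, here is a
user-space sketch, not the kernel code itself.  All names (sketch_extent,
sector_ignoring_start, sector_with_start) are hypothetical, and the 4 KiB
page size is an assumption.  It contrasts a sector computed from the swap
offset alone with one that also adds the on-disk start of the extent
backing the swapfile:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT   12 /* assumption: 4 KiB pages */
#define SKETCH_SECTOR_SHIFT 9  /* 512-byte sectors */

/* Hypothetical stand-in for one contiguous on-disk run of a swapfile. */
struct sketch_extent {
	uint64_t start_page;  /* first swap page offset covered by the run */
	uint64_t nr_pages;    /* number of pages in the run */
	uint64_t start_block; /* first on-disk block, in page-sized units */
};

/* Buggy shape: treats the swap offset as an absolute device position. */
uint64_t sector_ignoring_start(uint64_t offset)
{
	return offset << (SKETCH_PAGE_SHIFT - SKETCH_SECTOR_SHIFT);
}

/* Fixed shape: adds where the run actually starts on the device. */
uint64_t sector_with_start(const struct sketch_extent *se, uint64_t offset)
{
	assert(offset >= se->start_page &&
	       offset < se->start_page + se->nr_pages);

	uint64_t block = se->start_block + (offset - se->start_page);

	return block << (SKETCH_PAGE_SHIFT - SKETCH_SECTOR_SHIFT);
}
```

With an extent starting at block 1000, page offset 3 maps to sector 8024
rather than sector 24, which would belong to whatever else lives at the
start of the device.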

Fixes: 48d15436fde6 ("mm: remove get_swap_bio")
Signed-off-by: Jens Axboe

mm/vmscan: __isolate_lru_page_prepare() cleanup

2021-02-24T21:38:33+00:00

The function just returns 2 results, so using a 'switch' to deal with its
result is unnecessary.  Also simplify it to a bool func as Vlastimil
suggested.

Also remove the 'goto' by reusing list_move(), and take Matthew Wilcox's
suggestion to update the comments in the function.
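
The shape of this cleanup can be sketched in user-space C.  This is only
an illustration under assumptions: sketch_page and the sketch_* functions
are hypothetical stand-ins, not the real __isolate_lru_page_prepare() or
its callers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page state; the real checks are more involved. */
struct sketch_page {
	bool on_lru;
	bool busy;
};

/* Old shape: only two possible results, 0 or -EBUSY. */
int sketch_prepare_old(const struct sketch_page *page)
{
	return (page->on_lru && !page->busy) ? 0 : -EBUSY;
}

/* Old caller: a switch over a value that is effectively a boolean. */
size_t sketch_scan_old(const struct sketch_page *pages, size_t n)
{
	size_t taken = 0;

	for (size_t i = 0; i < n; i++) {
		switch (sketch_prepare_old(&pages[i])) {
		case 0:
			taken++;
			break;
		case -EBUSY:
			break; /* skip this page */
		default:
			assert(!"unexpected return value");
		}
	}
	return taken;
}

/* New shape: the same decision expressed as a bool. */
bool sketch_prepare_new(const struct sketch_page *page)
{
	return page->on_lru && !page->busy;
}

/* New caller: a plain if replaces the switch. */
size_t sketch_scan_new(const struct sketch_page *pages, size_t n)
{
	size_t taken = 0;

	for (size_t i = 0; i < n; i++)
		if (sketch_prepare_new(&pages[i]))
			taken++;
	return taken;
}
```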

Link: https://lkml.kernel.org/r/728874d7-2d93-4049-68c1-dcc3b2d52ccd@linux.alibaba.com
Signed-off-by: Alex Shi 
Reviewed-by: Andrew Morton 
Acked-by: Vlastimil Babka 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: Yu Zhao 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: memcg: add swapcache stat for memcg v2

2021-02-24T21:38:29+00:00

This patch adds swapcache stat for the cgroup v2.  The swapcache
represents the memory that is accounted against both the memory and the
swap limit of the cgroup.  The main motivation behind exposing the
swapcache stat is for enabling users to gracefully migrate from cgroup
v1's memsw counter to cgroup v2's memory and swap counters.

Cgroup v1's memsw limit allows users to limit the memory+swap usage of a
workload but without control on the exact proportion of memory and swap.
Cgroup v2 provides separate limits for memory and swap which enables more
control on the exact usage of memory and swap individually for the
workload.

With a few subtleties, v1's memsw limit can be replaced with the sum of
v2's memory and swap limits.  However, an alternative for memsw usage is
not yet available in cgroup v2.  Exposing a per-cgroup swapcache stat
enables that alternative.  Adding the memory usage and swap usage and
subtracting the swapcache approximates the memsw usage.  This will help
in the transparent migration of workloads that depend on memsw usage and
limit to v2's memory and swap counters.
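
The approximation described above can be written down as a small helper.
This is a minimal sketch: sketch_memsw_usage is a hypothetical name, all
values are assumed to be in bytes, and how the caller reads the three
counters from the cgroup is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Approximate cgroup v1 memsw usage from cgroup v2 counters.
 * Swapcache is accounted against both memory and swap, so it is
 * subtracted once to avoid counting it twice.
 */
uint64_t sketch_memsw_usage(uint64_t memory_usage, uint64_t swap_usage,
			    uint64_t swapcache)
{
	/* Guard against overflow of the intermediate sum. */
	assert(memory_usage <= UINT64_MAX - swap_usage);

	uint64_t sum = memory_usage + swap_usage;

	/* The counters are sampled separately, so clamp instead of wrapping. */
	return swapcache < sum ? sum - swapcache : 0;
}
```

For example, 1000 bytes of memory usage, 400 bytes of swap usage and 100
bytes of swapcache give an approximate memsw usage of 1300 bytes.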

The reasons these applications are still interested in this approximate
memsw usage are: (1) these applications are not really interested in two
separate memory and swap usage metrics.  A single usage metric is simpler
for them to use and reason about.

(2) The memsw usage metric hides the underlying system's swap setup from
the applications.  Applications with multiple instances running in a
datacenter with heterogeneous systems (some have swap and some don't) will
keep seeing a consistent view of their usage.

[akpm@linux-foundation.org: fix CONFIG_SWAP=n build]

Link: https://lkml.kernel.org/r/20210108155813.2914586-3-shakeelb@google.com
Signed-off-by: Shakeel Butt 
Acked-by: Michal Hocko 
Reviewed-by: Roman Gushchin 
Cc: Johannes Weiner 
Cc: Muchun Song 
Cc: Yang Shi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: remove get_swap_bio

2021-01-27T16:51:49+00:00

Just reuse the block_device and sector from the swap_info structure,
just as used by the SWP_SYNCHRONOUS path.  Also remove the checks for
NULL returns from bio_alloc as that can't happen for sleeping
allocations.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Chaitanya Kulkarni 
Acked-by: Damien Le Moal 
Signed-off-by: Jens Axboe

mm/compaction: do page isolation first in compaction

2020-12-15T22:48:04+00:00

Currently, compaction takes the lru_lock and then does page isolation,
which works fine with pgdat->lru_lock, since any page isolation would
compete for the lru_lock.  If we want to change to a memcg lru_lock, we
have to isolate the page before taking the lru_lock, so that isolation
blocks the page's memcg change, which relies on page isolation too.  Then
we can safely use the per-memcg lru_lock later.

The new page isolation uses the previously introduced TestClearPageLRU() +
pgdat lru locking, which will be changed to the memcg lru lock later.

Hugh Dickins fixed the following bugs in an early version of this patch:

Fix lots of crashes under compaction load: isolate_migratepages_block()
must clean up appropriately when rejecting a page, setting PageLRU again
if it had been cleared; and a put_page() after get_page_unless_zero()
cannot safely be done while holding locked_lruvec - it may turn out to be
the final put_page(), which will take an lruvec lock when PageLRU.

And move __isolate_lru_page_prepare back after get_page_unless_zero to
make trylock_page() safe: trylock_page() is not safe to use at this time:
its setting PG_locked can race with the page being freed or allocated
("Bad page"), and can also erase flags being set by one of those "sole
owners" of a freshly allocated page who use non-atomic __SetPageFlag().

Link: https://lkml.kernel.org/r/1604566549-62481-16-git-send-email-alex.shi@linux.alibaba.com
Suggested-by: Johannes Weiner 
Signed-off-by: Alex Shi 
Acked-by: Hugh Dickins 
Acked-by: Johannes Weiner 
Acked-by: Vlastimil Babka 
Cc: Matthew Wilcox 
Cc: Alexander Duyck 
Cc: Andrea Arcangeli 
Cc: Andrey Ryabinin 
Cc: "Chen, Rong A" 
Cc: Daniel Jordan 
Cc: "Huang, Ying" 
Cc: Jann Horn 
Cc: Joonsoo Kim 
Cc: Kirill A. Shutemov 
Cc: Konstantin Khlebnikov 
Cc: Mel Gorman 
Cc: Michal Hocko 
Cc: Mika Penttilä 
Cc: Minchan Kim 
Cc: Shakeel Butt 
Cc: Tejun Heo 
Cc: Thomas Gleixner 
Cc: Vladimir Davydov 
Cc: Wei Yang 
Cc: Yang Shi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/thp: move lru_add_page_tail() to huge_memory.c

2020-12-15T22:48:03+00:00

Patch series "per memcg lru lock", v21.

This patchset includes 3 parts:

 1) some code cleanup and minor optimization as preparation

 2) use TestClearPageLRU as page isolation's precondition

 3) replace per node lru_lock with per memcg per node lru_lock

Currently there is one lru_lock per node, pgdat->lru_lock, which guards
the lru lists, but the lru lists were moved into memcg a long time ago.
Still using a per-node lru_lock is clearly unscalable: pages in every
memcg have to compete with each other for a single lru_lock.  This
patchset uses a per-lruvec/memcg lru_lock to replace the per-node lru
lock to guard the lru lists, making it scalable for memcgs and giving a
performance gain.

Currently lru_lock still guards both the lru list and the page's lru
bit, and that's ok.  But if we want to use a specific lruvec lock on the
page, we need to pin down the page's lruvec/memcg during locking.  Just
taking the lruvec lock first may be undermined by the page's memcg
charge/migration.  To fix this problem, we can separate out the clearing
of the page's lru bit and use it as the pin-down action that blocks memcg
changes.  That's the reason for the new atomic function
TestClearPageLRU.  So isolating a page now needs both actions:
TestClearPageLRU and holding the lru_lock.
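
The two-step isolation described above can be modelled in user space
with C11 atomics.  This is a sketch under assumptions, not the kernel
implementation: sketch_page, sketch_lruvec and the sketch_* helpers are
hypothetical, the lru bit is a single bit in an atomic word, and the
lruvec lock is a simple spinlock built on atomic_flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_PG_LRU (1u << 0) /* hypothetical bit position */

struct sketch_page {
	atomic_uint flags;
};

struct sketch_lruvec {
	atomic_flag lock;      /* stand-in for the lru_lock */
	unsigned int nr_pages; /* pages on this list, guarded by lock */
};

void sketch_page_init(struct sketch_page *page, bool on_lru)
{
	atomic_init(&page->flags, on_lru ? SKETCH_PG_LRU : 0u);
}

void sketch_lruvec_init(struct sketch_lruvec *lruvec, unsigned int nr_pages)
{
	atomic_flag_clear(&lruvec->lock);
	lruvec->nr_pages = nr_pages;
}

/* Atomically clear the lru bit and report whether it was set before. */
bool sketch_test_clear_lru(struct sketch_page *page)
{
	unsigned int old = atomic_fetch_and(&page->flags, ~SKETCH_PG_LRU);

	return (old & SKETCH_PG_LRU) != 0;
}

/*
 * Step 1: win the lru bit, so no one else can isolate this page.
 * Step 2: only then take the list lock and update the list.
 */
bool sketch_isolate(struct sketch_page *page, struct sketch_lruvec *lruvec)
{
	if (!sketch_test_clear_lru(page))
		return false; /* not on a list, or already isolated */

	while (atomic_flag_test_and_set_explicit(&lruvec->lock,
						 memory_order_acquire))
		; /* spin until the lock is free */
	lruvec->nr_pages--;
	atomic_flag_clear_explicit(&lruvec->lock, memory_order_release);
	return true;
}
```

Only the first caller to clear the bit proceeds to the lock; a second
attempt on the same page fails without touching the list.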

The typical usage of this is isolate_migratepages_block() in
compaction.c: we have to take the lru bit before the lru lock, which
serializes page isolation against memcg page charge/migration, which
changes the page's lruvec and the new lru_lock in it.

The above solution was suggested by Johannes Weiner, and this patchset
is based on his new memcg charge path.  (Hugh Dickins tested and
contributed much code, from compaction fixes to general code polish,
thanks a lot!).

Daniel Jordan's testing shows a 62% improvement on a modified readtwice
case on his 2P * 10 core * 2 HT broadwell box on v18, which is not much
different from this v20.

 https://lore.kernel.org/lkml/20200915165807.kpp7uhiw7l3loofu@ca-dmjordan1.us.oracle.com/

Thanks to Hugh Dickins and Konstantin Khlebnikov, who both brought up
this idea 8 years ago, and to others who gave comments as well: Daniel
Jordan, Mel Gorman, Shakeel Butt, Matthew Wilcox, Alexander Duyck etc.

Thanks for testing support from Intel 0day and Rong Chen, Fengguang Wu,
and Yun Wang.  Hugh Dickins also shared his kbuild-swap case.

This patch (of 19):

lru_add_page_tail() is only used in huge_memory.c; defining it in another
file under a CONFIG_TRANSPARENT_HUGEPAGE macro restriction just looks
weird.

Let's move it to THP.  And make it static as Hugh Dickins suggested.

Link: https://lkml.kernel.org/r/1604566549-62481-1-git-send-email-alex.shi@linux.alibaba.com
Link: https://lkml.kernel.org/r/1604566549-62481-2-git-send-email-alex.shi@linux.alibaba.com
Signed-off-by: Alex Shi 
Reviewed-by: Kirill A. Shutemov 
Acked-by: Hugh Dickins 
Acked-by: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Mel Gorman 
Cc: Tejun Heo 
Cc: Konstantin Khlebnikov 
Cc: Daniel Jordan 
Cc: Shakeel Butt 
Cc: Joonsoo Kim 
Cc: Wei Yang 
Cc: Alexander Duyck 
Cc: "Chen, Rong A" 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Cc: Andrea Arcangeli 
Cc: Andrey Ryabinin 
Cc: "Huang, Ying" 
Cc: Jann Horn 
Cc: Kirill A. Shutemov 
Cc: Michal Hocko 
Cc: Mika Penttilä 
Cc: Minchan Kim 
Cc: Thomas Gleixner 
Cc: Vlastimil Babka 
Cc: Yang Shi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: remove activate_page() from unuse_pte()

2020-10-14T01:38:30+00:00

We don't initially add anon pages to active lruvec after commit
b518154e59aa ("mm/vmscan: protect the workingset on anonymous LRU").
Remove activate_page() from unuse_pte(), which seems to have been missed
by that commit.  And make the function static while we are at it.

Before the commit, we called lru_cache_add_active_or_unevictable() to add
new ksm pages to active lruvec.  Therefore, activate_page() wasn't
necessary for them in the first place.

Signed-off-by: Yu Zhao 
Signed-off-by: Andrew Morton 
Reviewed-by: Yang Shi 
Cc: Alexander Duyck 
Cc: Huang Ying 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Qian Cai 
Cc: Mel Gorman 
Cc: Nicholas Piggin 
Cc: Hugh Dickins 
Cc: Joonsoo Kim 
Link: http://lkml.kernel.org/r/20200818184704.3625199-1-yuzhao@google.com
Signed-off-by: Linus Torvalds

swap: rename SWP_FS to SWAP_FS_OPS to avoid ambiguity

2020-10-14T01:38:29+00:00

SWP_FS is used to make swap_{read,write}page() go through the filesystem,
and it's only used for swap files over NFS for now.  Otherwise it will
directly submit IO to blockdev according to swapfile extents reported by
filesystems in advance.
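
The choice described above amounts to a flag test on the swap device.
The following is only a sketch: the SKETCH_SWP_FS_OPS bit value, the enum
and sketch_pick_path() are hypothetical, not the kernel's definitions.

```c
/* Hypothetical flag bit; the real value lives in the kernel headers. */
#define SKETCH_SWP_FS_OPS (1u << 0)

enum sketch_io_path {
	SKETCH_IO_FS_OPS, /* go through the filesystem, e.g. swap over NFS */
	SKETCH_IO_BLOCK,  /* submit IO directly to the block device */
};

/* Pick the IO path for a swap area from its flags. */
enum sketch_io_path sketch_pick_path(unsigned int swap_flags)
{
	if (swap_flags & SKETCH_SWP_FS_OPS)
		return SKETCH_IO_FS_OPS;
	return SKETCH_IO_BLOCK;
}
```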

As Matthew pointed out [1], SWP_FS naming is somewhat confusing, so let's
rename to SWP_FS_OPS.

[1] https://lore.kernel.org/r/20200820113448.GM17456@casper.infradead.org

Suggested-by: Matthew Wilcox 
Signed-off-by: Gao Xiang 
Signed-off-by: Andrew Morton 
Link: https://lkml.kernel.org/r/20200822113019.11319-1-hsiangkao@redhat.com
Signed-off-by: Linus Torvalds

mm: factor find_get_incore_page out of mincore_page

2020-10-14T01:38:29+00:00

Patch series "Return head pages from find_*_entry", v2.

This patch series started out as part of the THP patch set, but it has
some nice effects along the way and it seems worth splitting it out and
submitting separately.

Currently find_get_entry() and find_lock_entry() return the page
corresponding to the requested index, but the first thing most callers do
is find the head page, which we just threw away.  As part of auditing all
the callers, I found some misuses of the APIs and some plain
inefficiencies that I've fixed.

The diffstat is unflattering, but I added more kernel-doc and a new wrapper.

This patch (of 8):

Provide this functionality from the swap cache.  It's useful for
more than just mincore().

Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: Andrew Morton 
Cc: Hugh Dickins 
Cc: William Kucharski 
Cc: Jani Nikula 
Cc: Alexey Dobriyan 
Cc: Johannes Weiner 
Cc: Chris Wilson 
Cc: Matthew Auld 
Cc: Huang Ying 
Link: https://lkml.kernel.org/r/20200910183318.20139-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20200910183318.20139-2-willy@infradead.org
Signed-off-by: Linus Torvalds

mm: split swap_type_of

2020-09-23T16:43:19+00:00

swap_type_of is used for two entirely different purposes:

 (1) check what swap type a given device/offset corresponds to
 (2) find the first available swap device that can be written to

Mixing both in a single function creates an unreadable mess.  Create two
separate functions instead, and switch both to pass a dev_t instead of
a struct block_device to further simplify the code.
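
The split can be pictured with a user-space model.  This is a sketch
under assumptions: the sketch_swap table, its fields and both function
names are hypothetical, and the matching rules are simplified compared
with the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one active swap area. */
struct sketch_swap {
	uint32_t dev;         /* device number, standing in for dev_t */
	uint64_t first_block; /* start of the swap area on that device */
	bool writable;
};

/* Purpose (1): which swap type does this device/offset correspond to? */
int sketch_swap_type_of(const struct sketch_swap *tbl, size_t n,
			uint32_t dev, uint64_t offset)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].writable && tbl[i].dev == dev &&
		    tbl[i].first_block == offset)
			return (int)i;
	return -ENODEV;
}

/* Purpose (2): find the first swap area that can be written to. */
int sketch_find_first_swap(const struct sketch_swap *tbl, size_t n,
			   uint32_t *dev_out)
{
	assert(dev_out != NULL);

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].writable) {
			*dev_out = tbl[i].dev;
			return (int)i;
		}
	}
	return -ENODEV;
}
```

Each function now has a single job and takes a plain device number,
which is what makes the callers easier to read.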

Signed-off-by: Christoph Hellwig 
Signed-off-by: Jens Axboe