linux-toradex.git/mm/compaction.c, branch v4.9.85

mm, compaction: fix NR_ISOLATED_* stats for pfn based migration

2017-01-12T10:39:32+00:00

commit 6afcf8ef0ca0a69d014f8edb613d94821f0ae700 upstream.

Since commit bda807d44454 ("mm: migrate: support non-lru movable page
migration") isolate_migratepages_block) can isolate !PageLRU pages which
would acct_isolated account as NR_ISOLATED_*.  Accounting these non-lru
pages NR_ISOLATED_{ANON,FILE} doesn't make any sense and it can misguide
heuristics based on those counters such as pgdat_reclaimable_pages resp.
too_many_isolated which would lead to unexpected stalls during the
direct reclaim without any good reason.  Note that
__alloc_contig_migrate_range can isolate a lot of pages at once.

On mobile devices such as 512M ram android Phone, it may use a big zram
swap.  In some cases zram(zsmalloc) uses too many non-lru but
migratedable pages, such as:

      MemTotal: 468148 kB
      Normal free:5620kB
      Free swap:4736kB
      Total swap:409596kB
      ZRAM: 164616kB(zsmalloc non-lru pages)
      active_anon:60700kB
      inactive_anon:60744kB
      active_file:34420kB
      inactive_file:37532kB

Fix this by only accounting lru pages to NR_ISOLATED_* in
isolate_migratepages_block right after they were isolated and we still
know they were on LRU.  Drop acct_isolated because it is called after
the fact and we've lost that information.  Batching per-cpu counter
doesn't make much improvement anyway.  Also make sure that we uncharge
only LRU pages when putting them back on the LRU in
putback_movable_pages resp.  when unmap_and_move migrates the page.

[mhocko@suse.com: replace acct_isolated() with direct counting]
Fixes: bda807d44454 ("mm: migrate: support non-lru movable page migration")
Link: http://lkml.kernel.org/r/20161019080240.9682-1-mhocko@kernel.org
Signed-off-by: Ming Ling 
Signed-off-by: Michal Hocko 
Acked-by: Minchan Kim 
Acked-by: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Joonsoo Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm, compaction: restrict fragindex to costly orders

2016-10-08T01:46:29+00:00

Fragmentation index and the vm.extfrag_threshold sysctl is meant as a
heuristic to prevent excessive compaction for costly orders (i.e.  THP).
It's unlikely to make any difference for non-costly orders, especially
with the default threshold.  But we cannot afford any uncertainty for
the non-costly orders where the only alternative to successful
reclaim/compaction is OOM.  After the recent patches we are guaranteed
maximum effort without heuristics from compaction before deciding OOM,
and fragindex is the last remaining heuristic.  Therefore skip fragindex
altogether for non-costly orders.

Suggested-by: Michal Hocko 
Link: http://lkml.kernel.org/r/20160926162025.21555-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Cc: Mel Gorman 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: ignore fragindex from compaction_zonelist_suitable()

2016-10-08T01:46:29+00:00

The compaction_zonelist_suitable() function tries to determine if
compaction will be able to proceed after sufficient reclaim, i.e.
whether there are enough reclaimable pages to provide enough order-0
freepages for compaction.

This addition of reclaimable pages to the free pages works well for the
order-0 watermark check, but in the fragmentation index check we only
consider truly free pages.  Thus we can get fragindex value close to 0
which indicates failure do to lack of memory, and wrongly decide that
compaction won't be suitable even after reclaim.

Instead of trying to somehow adjust fragindex for reclaimable pages,
let's just skip it from compaction_zonelist_suitable().

Link: http://lkml.kernel.org/r/20160926162025.21555-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Cc: Mel Gorman 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: make full priority ignore pageblock suitability

2016-10-08T01:46:29+00:00

Several people have reported premature OOMs for order-2 allocations
(stack) due to OOM rework in 4.7.  In the scenario (parallel kernel
build and dd writing to two drives) many pageblocks get marked as
Unmovable and compaction free scanner struggles to isolate free pages.
Joonsoo Kim pointed out that the free scanner skips pageblocks that are
not movable to prevent filling them and forcing non-movable allocations
to fallback to other pageblocks.  Such heuristic makes sense to help
prevent long-term fragmentation, but premature OOMs are relatively more
urgent problem.  As a compromise, this patch disables the heuristic only
for the ultimate compaction priority.

Link: http://lkml.kernel.org/r/20160906135258.18335-5-vbabka@suse.cz
Reported-by: Ralf-Peter Rohbeck 
Reported-by: Arkadiusz Miskiewicz 
Reported-by: Olaf Hering 
Suggested-by: Joonsoo Kim 
Signed-off-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Cc: Rik van Riel 
Cc: Tetsuo Handa 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: require only min watermarks for non-costly orders

2016-10-08T01:46:27+00:00

The __compaction_suitable() function checks the low watermark plus a
compact_gap() gap to decide if there's enough free memory to perform
compaction.  Then __isolate_free_page uses low watermark check to decide
if particular free page can be isolated.  In the latter case, using low
watermark is needlessly pessimistic, as the free page isolations are
only temporary.  For __compaction_suitable() the higher watermark makes
sense for high-order allocations where more freepages increase the
chance of success, and we can typically fail with some order-0 fallback
when the system is struggling to reach that watermark.  But for
low-order allocation, forming the page should not be that hard.  So
using low watermark here might just prevent compaction from even trying,
and eventually lead to OOM killer even if we are above min watermarks.

So after this patch, we use min watermark for non-costly orders in
__compaction_suitable(), and for all orders in __isolate_free_page().

[vbabka@suse.cz: clarify __isolate_free_page() comment]
 Link: http://lkml.kernel.org/r/7ae4baec-4eca-e70b-2a69-94bea4fb19fa@suse.cz
Link: http://lkml.kernel.org/r/20160810091226.6709-11-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Tested-by: Lorenzo Stoakes 
Acked-by: Michal Hocko 
Cc: Mel Gorman 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Cc: Rik van Riel 
Signed-off-by: Vlastimil Babka 
Tested-by: Lorenzo Stoakes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: use proper alloc_flags in __compaction_suitable()

2016-10-08T01:46:27+00:00

The __compaction_suitable() function checks the low watermark plus a
compact_gap() gap to decide if there's enough free memory to perform
compaction.  This check uses direct compactor's alloc_flags, but that's
wrong, since these flags are not applicable for freepage isolation.

For example, alloc_flags may indicate access to memory reserves, making
compaction proceed, and then fail watermark check during the isolation.

A similar problem exists for ALLOC_CMA, which may be part of
alloc_flags, but not during freepage isolation.  In this case however it
makes sense to use ALLOC_CMA both in __compaction_suitable() and
__isolate_free_page(), since there's actually nothing preventing the
freepage scanner to isolate from CMA pageblocks, with the assumption
that a page that could be migrated once by compaction can be migrated
also later by CMA allocation.  Thus we should count pages in CMA
pageblocks when considering compaction suitability and when isolating
freepages.

To sum up, this patch should remove some false positives from
__compaction_suitable(), and allow compaction to proceed when free pages
required for compaction reside in the CMA pageblocks.

Link: http://lkml.kernel.org/r/20160810091226.6709-10-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Tested-by: Lorenzo Stoakes 
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: create compact_gap wrapper

2016-10-08T01:46:27+00:00

Compaction uses a watermark gap of (2UL << order) pages at various
places and it's not immediately obvious why.  Abstract it through a
compact_gap() wrapper to create a single place with a thorough
explanation.

[vbabka@suse.cz: clarify the comment of compact_gap()]
 Link: http://lkml.kernel.org/r/7b6aed1f-fdf8-2063-9ff4-bbe4de712d37@suse.cz
Link: http://lkml.kernel.org/r/20160810091226.6709-9-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Tested-by: Lorenzo Stoakes 
Acked-by: Michal Hocko 
Cc: Mel Gorman 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Cc: Rik van Riel 
Signed-off-by: Vlastimil Babka 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: use correct watermark when checking compaction success

2016-10-08T01:46:27+00:00

The __compact_finished() function uses low watermark in a check that has
to pass if the direct compaction is to finish and allocation should
succeed.  This is too pessimistic, as the allocation will typically use
min watermark.  It may happen that during compaction, we drop below the
low watermark (due to parallel activity), but still form the target
high-order page.  By checking against low watermark, we might needlessly
continue compaction.

Similarly, __compaction_suitable() uses low watermark in a check whether
allocation can succeed without compaction.  Again, this is unnecessarily
pessimistic.

After this patch, these check will use direct compactor's alloc_flags to
determine the watermark, which is effectively the min watermark.

Link: http://lkml.kernel.org/r/20160810091226.6709-8-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Tested-by: Lorenzo Stoakes 
Acked-by: Michal Hocko 
Cc: Mel Gorman 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: add the ultimate direct compaction priority

2016-10-08T01:46:27+00:00

During reclaim/compaction loop, it's desirable to get a final answer
from unsuccessful compaction so we can either fail the allocation or
invoke the OOM killer.  However, heuristics such as deferred compaction
or pageblock skip bits can cause compaction to skip parts or whole zones
and lead to premature OOM's, failures or excessive reclaim/compaction
retries.

To remedy this, we introduce a new direct compaction priority called
COMPACT_PRIO_SYNC_FULL, which instructs direct compaction to:

 - ignore deferred compaction status for a zone
 - ignore pageblock skip hints
 - ignore cached scanner positions and scan the whole zone

The new priority should get eventually picked up by
should_compact_retry() and this should improve success rates for costly
allocations using __GFP_REPEAT, such as hugetlbfs allocations, and
reduce some corner-case OOM's for non-costly allocations.

Link: http://lkml.kernel.org/r/20160810091226.6709-6-vbabka@suse.cz
[vbabka@suse.cz: use the MIN_COMPACT_PRIORITY alias]
  Link: http://lkml.kernel.org/r/d443b884-87e7-1c93-8684-3a3a35759fb1@suse.cz
Signed-off-by: Vlastimil Babka 
Tested-by: Lorenzo Stoakes 
Acked-by: Michal Hocko 
Cc: Mel Gorman 
Cc: Joonsoo Kim 
Cc: David Rientjes 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: don't recheck watermarks after COMPACT_SUCCESS

2016-10-08T01:46:27+00:00

Joonsoo has reminded me that in a later patch changing watermark checks
throughout compaction I forgot to update checks in
try_to_compact_pages() and compactd_do_work().  Closer inspection
however shows that they are redundant now in the success case, because
compact_zone() now reliably reports this with COMPACT_SUCCESS.  So
effectively the checks just repeat (a subset) of checks that have just
passed.  So instead of checking watermarks again, just test the return
value.

Note it's also possible that compaction would declare failure e.g.
because its find_suitable_fallback() is more strict than simple
watermark check, and then the watermark check we are removing would then
still succeed.  After this patch this is not possible and it's arguably
better, because for long-term fragmentation avoidance we should rather
try a different zone than allocate with the unsuitable fallback.  If
compaction of all zones fail and the allocation is important enough, it
will retry and succeed anyway.

Also remove the stray "bool success" variable from kcompactd_do_work().

Link: http://lkml.kernel.org/r/20160810091226.6709-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka 
Reported-by: Joonsoo Kim 
Tested-by: Lorenzo Stoakes 
Acked-by: Michal Hocko 
Cc: Mel Gorman 
Cc: David Rientjes 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds