linux-toradex.git/mm/compaction.c, branch v4.4.108

mm, compaction: prevent VM_BUG_ON when terminating freeing scanner

2016-08-10T09:49:25+00:00

commit a46cbf3bc53b6a93fb84a5ffb288c354fa807954 upstream.

It's possible to isolate some freepages in a pageblock and then fail
split_free_page() due to the low watermark check.  In this case, we hit
VM_BUG_ON() because the freeing scanner terminated early without a
contended lock or enough freepages.

This should never have been a VM_BUG_ON() since it's not a fatal
condition.  It should have been a VM_WARN_ON() at best, or even handled
gracefully.

Regardless, we need to terminate anytime the full pageblock scan was not
done.  The logic belongs in isolate_freepages_block(), so handle its
state gracefully by terminating the pageblock loop and making a note to
restart at the same pageblock next time since it was not possible to
complete the scan this time.

[rientjes@google.com: don't rescan pages in a pageblock]
  Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1607111244150.83138@chino.kir.corp.google.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606291436300.145590@chino.kir.corp.google.com
Signed-off-by: David Rientjes 
Reported-by: Minchan Kim 
Tested-by: Minchan Kim 
Cc: Joonsoo Kim 
Cc: Hugh Dickins 
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm, compaction: abort free scanner if split fails

2016-08-10T09:49:25+00:00

commit a4f04f2c6955aff5e2c08dcb40aca247ff4d7370 upstream.

If the memory compaction free scanner cannot successfully split a free
page (only possible due to per-zone low watermark), terminate the free
scanner rather than continuing to scan memory needlessly.  If the
watermark is insufficient for a free page of order <= cc->order, then
terminate the scanner since all future splits will also likely fail.

This prevents the compaction freeing scanner from scanning all memory on
very large zones (very noticeable for zones > 128GB, for instance) when
all splits will likely fail while holding zone->lock.

compaction_alloc() iterating a 128GB zone has been benchmarked to take
over 400ms on some systems whereas any free page isolated and ready to
be split ends up failing in split_free_page() because of the low
watermark check and thus the iteration continues.

The next time compaction occurs, the freeing scanner will likely start
at the end of the zone again since no success was made previously and we
get the same lengthy iteration until the zone is brought above the low
watermark.  All thp page faults can take >400ms in such a state without
this fix.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1606211820350.97086@chino.kir.corp.google.com
Signed-off-by: David Rientjes 
Acked-by: Vlastimil Babka 
Cc: Minchan Kim 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm, cma: prevent nr_isolated_* counters from going negative

2016-05-11T09:21:14+00:00

commit 14af4a5e9b26ad251f81c174e8a43f3e179434a5 upstream.

/proc/sys/vm/stat_refresh warns nr_isolated_anon and nr_isolated_file go
increasingly negative under compaction: which would add delay when
should be none, or no delay when should delay.  The bug in compaction
was due to a recent mmotm patch, but much older instance of the bug was
also noticed in isolate_migratepages_range() which is used for CMA and
gigantic hugepage allocations.

The bug is caused by putback_movable_pages() in an error path
decrementing the isolated counters without them being previously
incremented by acct_isolated().  Fix isolate_migratepages_range() by
removing the error-path putback, thus reaching acct_isolated() with
migratepages still isolated, and leaving putback to caller like most
other places do.

Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
[vbabka@suse.cz: expanded the changelog]
Signed-off-by: Hugh Dickins 
Signed-off-by: Vlastimil Babka 
Acked-by: Joonsoo Kim 
Cc: Michal Hocko 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

mm, compaction: distinguish contended status in tracepoints

2015-11-06T03:34:48+00:00

Compaction returns prematurely with COMPACT_PARTIAL when contended or has
fatal signal pending.  This is ok for the callers, but might be misleading
in the traces, as the usual reason to return COMPACT_PARTIAL is that we
think the allocation should succeed.  After this patch we distinguish the
premature ending condition in the mm_compaction_finished and
mm_compaction_end tracepoints.

The contended status covers the following reasons:
- lock contention or need_resched() detected in async compaction
- fatal signal pending
- too many pages isolated in the zone (only for async compaction)
Further distinguishing the exact reason seems unnecessary for now.

Signed-off-by: Vlastimil Babka 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: David Rientjes 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: export tracepoints status strings to userspace

2015-11-06T03:34:48+00:00

Some compaction tracepoints convert the integer return values to strings
using the compaction_status_string array.  This works for in-kernel
printing, but not userspace trace printing of raw captured trace such as
via trace-cmd report.

This patch converts the private array to appropriate tracepoint macros
that result in proper userspace support.

trace-cmd output before:
transhuge-stres-4235  [000]   453.149280: mm_compaction_finished: node=0
  zone=ffffffff81815d7a order=9 ret=

after:
transhuge-stres-4235  [000]   453.149280: mm_compaction_finished: node=0
  zone=ffffffff81815d7a order=9 ret=partial

Signed-off-by: Vlastimil Babka 
Reviewed-by: Steven Rostedt 
Cc: Joonsoo Kim 
Cc: Ingo Molnar 
Cc: Mel Gorman 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction.c: add an is_via_compact_memory() helper

2015-11-06T03:34:48+00:00

Introduce is_via_compact_memory() helper indicating compacting via
/proc/sys/vm/compact_memory to improve readability.

To catch this situation in __compaction_suitable, use order as parameter
directly instead of using struct compact_control.

This patch has no functional changes.

Signed-off-by: Yaowei Bai 
Cc: Mel Gorman 
Acked-by: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/compaction: correct to flush migrated pages if pageblock skip happens

2015-09-08T22:35:28+00:00

We cache isolate_start_pfn before entering isolate_migratepages().  If
pageblock is skipped in isolate_migratepages() due to whatever reason,
cc->migrate_pfn can be far from isolate_start_pfn hence we flush pages
that were freed.  For example, the following scenario can be possible:

- assume order-9 compaction, pageblock order is 9
- start_isolate_pfn is 0x200
- isolate_migratepages()
  - skip a number of pageblocks
  - start to isolate from pfn 0x600
  - cc->migrate_pfn = 0x620
  - return
- last_migrated_pfn is set to 0x200
- check flushing condition
  - current_block_start is set to 0x600
  - last_migrated_pfn < current_block_start then do useless flush

This wrong flush would not help the performance and success rate so this
patch tries to fix it.  One simple way to know the exact position where
we start to isolate migratable pages is that we cache it in
isolate_migratepages() before entering actual isolation.  This patch
implements that and fixes the problem.

Signed-off-by: Joonsoo Kim 
Acked-by: Vlastimil Babka 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: David Rientjes 
Cc: Minchan Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: skip compound pages by order in free scanner

2015-09-08T22:35:28+00:00

The compaction free scanner is looking for PageBuddy() pages and
skipping all others.  For large compound pages such as THP or hugetlbfs,
we can save a lot of iterations if we skip them at once using their
compound_order().  This is generally unsafe and we can read a bogus
value of order due to a race, but if we are careful, the only danger is
skipping too much.

When tested with stress-highalloc from mmtests on 4GB system with 1GB
hugetlbfs pages, the vmstat compact_free_scanned count decreased by at
least 15%.

Signed-off-by: Vlastimil Babka 
Cc: Minchan Kim 
Cc: Mel Gorman 
Acked-by: Joonsoo Kim 
Acked-by: Michal Nazarewicz 
Cc: Naoya Horiguchi 
Cc: Christoph Lameter 
Cc: Rik van Riel 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: always skip all compound pages by order in migrate scanner

2015-09-08T22:35:28+00:00

The compaction migrate scanner tries to skip THP pages by their order,
to reduce number of iterations for pages it cannot isolate.  The check
is only done if PageLRU() is true, which means it applies to THP pages,
but not e.g.  hugetlbfs pages or any other non-LRU compound pages, which
we have to iterate by base pages.

This limitation comes from the assumption that it's only safe to read
compound_order() when we have the zone's lru_lock and THP cannot be
split under us.  But the only danger (after filtering out order values
that are not below MAX_ORDER, to prevent overflows) is that we skip too
much or too little after reading a bogus compound_order() due to a rare
race.  This is the same reasoning as patch 99c0fd5e51c4 ("mm,
compaction: skip buddy pages by their order in the migrate scanner")
introduced for unsafely reading PageBuddy() order.

After this patch, all pages are tested for PageCompound() and we skip
them by compound_order().  The test is done after the test for
balloon_page_movable() as we don't want to assume if balloon pages (or
other pages with own isolation and migration implementation if a generic
API gets implemented) are compound or not.

When tested with stress-highalloc from mmtests on 4GB system with 1GB
hugetlbfs pages, the vmstat compact_migrate_scanned count decreased by
15%.

[kirill.shutemov@linux.intel.com: change PageTransHuge checks to PageCompound for different series was squashed here]
Signed-off-by: Vlastimil Babka 
Cc: Minchan Kim 
Acked-by: Mel Gorman 
Acked-by: Joonsoo Kim 
Acked-by: Michal Nazarewicz 
Cc: Naoya Horiguchi 
Cc: Christoph Lameter 
Cc: Rik van Riel 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, compaction: encapsulate resetting cached scanner positions

2015-09-08T22:35:28+00:00

Reseting the cached compaction scanner positions is now open-coded in
__reset_isolation_suitable() and compact_finished().  Encapsulate the
functionality in a new function reset_cached_positions().

Signed-off-by: Vlastimil Babka 
Cc: Minchan Kim 
Cc: Mel Gorman 
Cc: Joonsoo Kim 
Acked-by: Michal Nazarewicz 
Cc: Naoya Horiguchi 
Cc: Christoph Lameter 
Cc: Rik van Riel 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds