linux-toradex.git/mm/compaction.c, branch v6.10

mm: handle profiling for fake memory allocations during compaction

2024-06-25T03:52:09+00:00

During compaction isolated free pages are marked allocated so that they
can be split and/or freed.  For that, post_alloc_hook() is used inside
split_map_pages() and release_free_list().  split_map_pages() marks free
pages allocated, splits the pages and then lets
alloc_contig_range_noprof() free those pages.  release_free_list() marks
free pages allocated and immediately frees them.  This usage of
post_alloc_hook() affects memory allocation profiling because these
functions might not be
called from an instrumented allocator, therefore current->alloc_tag is
NULL and when debugging is enabled (CONFIG_MEM_ALLOC_PROFILING_DEBUG=y)
that causes warnings.  To avoid that, wrap such post_alloc_hook() calls
into an instrumented function which acts as an allocator which will be
charged for these fake allocations.  Note that these allocations are very
short lived until they are freed, therefore the associated counters should
usually read 0.
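
A minimal userspace sketch of the idea described above, not the kernel
code.  All names here (struct model_tag, model_post_alloc_hook(),
model_mark_allocated(), and so on) are hypothetical stand-ins: the hook
charges whatever tag is current and counts a warning when there is none,
and the wrapper plays the role of the instrumented function that gets
charged for the fake allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-call-site allocation tag. */
struct model_tag {
	const char *name;
	long bytes;	/* bytes currently charged to this tag */
	long calls;	/* live allocations charged to this tag */
};

/* Models current->alloc_tag: NULL outside an instrumented allocator. */
static struct model_tag *model_current_tag;

/* Counts what would be debug warnings in this model. */
static int model_warnings;

/* Tag charged for the short-lived fake allocations done by compaction. */
static struct model_tag model_compaction_tag = { "compaction_fake_alloc", 0, 0 };

/* Charge an allocation to the current tag; warn when no tag is set. */
static struct model_tag *model_post_alloc_hook(long bytes)
{
	if (!model_current_tag) {
		model_warnings++;
		return NULL;
	}
	model_current_tag->bytes += bytes;
	model_current_tag->calls++;
	return model_current_tag;
}

/* Uncharge an allocation from the tag it was charged to. */
static void model_free_hook(struct model_tag *tag, long bytes)
{
	if (!tag)
		return;
	assert(tag->bytes >= bytes && tag->calls > 0);
	tag->bytes -= bytes;
	tag->calls--;
}

/* Instrumented wrapper: install the tag, run the hook, restore the old tag. */
static struct model_tag *model_mark_allocated(long bytes)
{
	struct model_tag *old = model_current_tag;
	struct model_tag *charged;

	model_current_tag = &model_compaction_tag;
	charged = model_post_alloc_hook(bytes);
	model_current_tag = old;
	return charged;
}
```

In this model, calling model_post_alloc_hook() directly with no current
tag bumps the warning counter, while going through
model_mark_allocated() and then model_free_hook() leaves the tag's
counters back at 0, matching the "should usually read 0" remark.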

Link: https://lkml.kernel.org/r/20240614230504.3849136-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan 
Acked-by: Vlastimil Babka 
Cc: Kees Cook 
Cc: Kent Overstreet 
Cc: Pasha Tatashin 
Cc: Sourav Panda 
Signed-off-by: Andrew Morton

memory: remove the now superfluous sentinel element from ctl_table array

2024-04-26T03:56:32+00:00

This commit comes at the tail end of a greater effort to remove the empty
elements at the end of the ctl_table arrays (sentinels), which will reduce
the overall build-time size of the kernel and the run-time memory bloat by
~64 bytes per sentinel (further information:
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/).

Remove sentinel from all files under mm/ that register a sysctl table.
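
To illustrate what removing a sentinel means, here is a small userspace
sketch.  struct model_ctl_entry is a hypothetical, much smaller stand-in
for struct ctl_table, and the two walkers are illustrative only; the
point is that a table whose size is known (via ARRAY_SIZE) does not need
the trailing empty element.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, simplified stand-in for struct ctl_table. */
struct model_ctl_entry {
	const char *procname;
	int *data;
};

static int model_proactiveness = 20;
static int model_extfrag_threshold = 500;

/* Old style: the array is terminated by an empty sentinel element. */
static const struct model_ctl_entry model_table_sentinel[] = {
	{ "compaction_proactiveness", &model_proactiveness },
	{ "extfrag_threshold", &model_extfrag_threshold },
	{ NULL, NULL }	/* sentinel */
};

/* New style: no sentinel, the size is passed along with the table. */
static const struct model_ctl_entry model_table_sized[] = {
	{ "compaction_proactiveness", &model_proactiveness },
	{ "extfrag_threshold", &model_extfrag_threshold },
};

/* Walk until the sentinel is found. */
static size_t model_count_until_sentinel(const struct model_ctl_entry *t)
{
	size_t n = 0;

	while (t[n].procname)
		n++;
	return n;
}

/* Walk exactly 'size' entries; every entry must be a real one. */
static size_t model_count_sized(const struct model_ctl_entry *t, size_t size)
{
	size_t n = 0;

	for (size_t i = 0; i < size; i++) {
		assert(t[i].procname != NULL);
		n++;
	}
	return n;
}
```

Both walkers see the same two entries, and the sentinel variant is larger
by exactly one element.  In this model that element is only two pointers
wide; the ~64 bytes quoted above refers to the real struct ctl_table.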

Link: https://lkml.kernel.org/r/20240328-jag-sysctl_remset_misc-v1-1-47c1463b3af2@samsung.com
Signed-off-by: Joel Granados 
Reviewed-by: Muchun Song 
Reviewed-by: Miaohe Lin 
Signed-off-by: Andrew Morton

mm: enable page allocation tagging

2024-04-26T03:55:54+00:00

Redefine page allocators to record allocation tags upon their invocation. 
Instrument post_alloc_hook and free_pages_prepare to modify current
allocation tag.

[surenb@google.com: undo _noprof additions in the documentation]
  Link: https://lkml.kernel.org/r/20240326231453.1206227-3-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-19-surenb@google.com
Signed-off-by: Suren Baghdasaryan 
Co-developed-by: Kent Overstreet 
Signed-off-by: Kent Overstreet 
Reviewed-by: Kees Cook 
Tested-by: Kees Cook 
Cc: Alexander Viro 
Cc: Alex Gaynor 
Cc: Alice Ryhl 
Cc: Andreas Hindborg 
Cc: Benno Lossin 
Cc: "Björn Roy Baron" 
Cc: Boqun Feng 
Cc: Christoph Lameter 
Cc: Dennis Zhou 
Cc: Gary Guo 
Cc: Miguel Ojeda 
Cc: Pasha Tatashin 
Cc: Peter Zijlstra 
Cc: Tejun Heo 
Cc: Vlastimil Babka 
Cc: Wedson Almeida Filho 
Signed-off-by: Andrew Morton

Merge branch 'master' into mm-stable

2024-03-18T16:47:52+00:00

mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations

2024-03-05T00:40:32+00:00

Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO.  Such combination
can happen in a suspend/resume context where a GFP_KERNEL allocation can
have __GFP_IO masked out via gfp_allowed_mask.

Quoting Sven:

1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
   with __GFP_RETRY_MAYFAIL set.

2. page alloc's __alloc_pages_slowpath tries to get a page from the
   freelist. This fails because there is nothing free of that costly
   order.

3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
   which bails out because a zone is ready to be compacted; it pretends
   to have made a single page of progress.

4. page alloc tries to compact, but this always bails out early because
   __GFP_IO is not set (it's not passed by the snd allocator, and even
   if it were, we are suspending so the __GFP_IO flag would be cleared
   anyway).

5. page alloc believes reclaim progress was made (because of the
   pretense in item 3) and so it checks whether it should retry
   compaction. The compaction retry logic thinks it should try again,
   because:
    a) reclaim is needed because of the early bail-out in item 4
    b) a zonelist is suitable for compaction

6. goto 2. indefinite stall.

(end quote)

The immediate root cause is that the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to the lack of __GFP_IO is
mistaken for an indication of a lack of order-0 pages, and in step 5
should_compact_retry() evaluates that as a reason to retry, before
incrementing and limiting the number of retries.  There are however other
places that wrongly assume that compaction can happen while we lack
__GFP_IO.

To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to use
it.

Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
  there's at least one attempt to actually reclaim, even if chances are
  small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
  return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
  which is then used to avoid retrying reclaim/compaction for costly
  allocations (step 5) if we can't compact and also to skip the early
  compaction attempt that we do in some cases
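
The shape of the fix can be sketched in userspace as follows.  This is a
simplified model under stated assumptions, not the kernel
implementation: the MODEL_* flag values and the retry limit are made up
for the example, and model_slowpath_iterations() only mimics the stall
described in steps 2 to 6 (compaction skipped, retry counter never
advanced) and the can_compact check that breaks it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model; the kernel's real values differ. */
#define MODEL_GFP_IO		0x1u
#define MODEL_GFP_RETRY_MAYFAIL	0x2u

/* Orders above this are "costly" (PAGE_ALLOC_COSTLY_ORDER is 3). */
#define MODEL_COSTLY_ORDER	3

/* Arbitrary retry limit chosen for this model. */
#define MODEL_RETRY_LIMIT	16

/* Models the helper: compaction is only possible when __GFP_IO is set. */
static bool model_gfp_compaction_allowed(unsigned int gfp_mask)
{
	return (gfp_mask & MODEL_GFP_IO) != 0;
}

/*
 * Count slowpath iterations for an allocation that can never succeed.
 * 'cap' bounds the loop so that the stalled case terminates in the model.
 */
static int model_slowpath_iterations(unsigned int gfp_mask, int order,
				     bool with_fix, int cap)
{
	bool can_compact = model_gfp_compaction_allowed(gfp_mask);
	bool costly = order > MODEL_COSTLY_ORDER;
	int compaction_retries = 0;
	int iters = 0;

	assert(cap > 0);
	while (iters < cap) {
		iters++;
		/* With the fix: do not retry costly requests we cannot compact. */
		if (with_fix && costly && !can_compact)
			break;
		/* Compaction ran, so the retry counter advances and is limited. */
		if (can_compact && ++compaction_retries > MODEL_RETRY_LIMIT)
			break;
		/* Otherwise compaction was skipped and nothing limits the loop. */
	}
	return iters;
}
```

Without the fix, a costly request lacking MODEL_GFP_IO runs until the
cap, which stands in for the indefinite stall; with the fix it gives up
after the first pass.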

Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka 
Reported-by: Sven van Ashbrook 
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
Tested-by: Karthikeyan Ramasubramanian 
Cc: Brian Geffon 
Cc: Curtis Malainey 
Cc: Jaroslav Kysela 
Cc: Mel Gorman 
Cc: Michal Hocko 
Cc: Takashi Iwai 
Cc: 
Signed-off-by: Andrew Morton

mm/compaction: optimize >0 order folio compaction with free page split.

2024-02-24T01:48:33+00:00

During migration in memory compaction, free pages are placed in an array
of page lists based on their order.  But the desired free page order
(i.e., the order of a source page) might not always be present, thus
leading to migration failures and premature compaction termination.  Split
a high-order free page when the source migration page has a lower order to
increase the migration success rate.

Note: merging free pages when a migration fails and a lower order free
page is returned via compaction_free() is possible, but it requires too
much work.  Since the free pages are not buddy pages, it is hard to
identify these free pages using the existing PFN-based page merging
algorithm.
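
The splitting described above can be sketched with per-order counters
instead of real page lists.  This is a userspace model under
assumptions: struct model_free_lists, MODEL_NR_ORDERS and
model_take_free_page() are hypothetical names, and each halving step
simply puts the unused half back on the next lower order.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of orders tracked in this model (an assumption for the sketch). */
#define MODEL_NR_ORDERS 11

/* Per-order count of isolated free pages; stands in for the list array. */
struct model_free_lists {
	unsigned int nr[MODEL_NR_ORDERS];
};

/*
 * Take one free page of 'order'.  If none is available, take the smallest
 * larger free page and split it, returning the unused halves to the lists.
 * Returns false when no free page of 'order' or above exists.
 */
static bool model_take_free_page(struct model_free_lists *fl, int order)
{
	int start;

	assert(order >= 0 && order < MODEL_NR_ORDERS);

	/* Find the lowest order >= 'order' that has a free page. */
	for (start = order; start < MODEL_NR_ORDERS; start++)
		if (fl->nr[start])
			break;
	if (start == MODEL_NR_ORDERS)
		return false;

	fl->nr[start]--;

	/* Halve until the requested order, keeping one half each time. */
	while (start > order) {
		start--;
		fl->nr[start]++;
	}
	return true;
}
```

For example, taking an order-1 page from lists that hold a single order-4
page leaves one free page each at orders 3, 2 and 1.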

Link: https://lkml.kernel.org/r/20240220183220.1451315-5-zi.yan@sent.com
Signed-off-by: Zi Yan 
Reviewed-by: Baolin Wang 
Reviewed-by: Vlastimil Babka 
Tested-by: Baolin Wang 
Tested-by: Yu Zhao 
Cc: Adam Manzanares 
Cc: David Hildenbrand 
Cc: Huang Ying 
Cc: Johannes Weiner 
Cc: Kemeng Shi 
Cc: Kirill A. Shutemov 
Cc: Luis Chamberlain 
Cc: Matthew Wilcox (Oracle) 
Cc: Mel Gorman 
Cc: Ryan Roberts 
Cc: Vishal Moola (Oracle) 
Cc: Vlastimil Babka 
Cc: Yin Fengwei 
Signed-off-by: Andrew Morton

mm/compaction: add support for >0 order folio memory compaction.

2024-02-24T01:48:33+00:00

Before the last commit, memory compaction only migrated order-0 folios and
skipped >0 order folios.  The last commit splits all >0 order folios
during compaction.  This commit migrates >0 order folios during compaction
by keeping isolated free pages at their original size without splitting
them into order-0 pages and using them directly during the migration
process.

What is different from the prior implementation:
1. All isolated free pages are kept in a NR_PAGE_ORDERS array of page
   lists, where each page list stores free pages of the same order.
2. The free pages are neither processed by post_alloc_hook() nor buddy
   pages, although their orders are stored in the first page's private
   field like buddy pages.
3. During migration, at new page allocation time (i.e., in
   compaction_alloc()), free pages are then processed by post_alloc_hook().
   When migration fails and a new page is returned (i.e., in
   compaction_free()), free pages are restored by reversing the
   post_alloc_hook() operations using the newly added
   free_pages_prepare_fpi_none().

Step 3 is done so that a later optimization, splitting and/or merging free
pages during compaction, becomes easier.

Note: without splitting free pages, compaction can end prematurely because
migration will return -ENOMEM even if there are free pages.  This happens
when no order-0 free page exists and compaction_alloc() returns NULL.
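
A rough userspace sketch of the per-order free lists and of the failure
mode in the note.  Everything here is a hypothetical model (struct
model_page, struct model_cc, model_compaction_alloc(), ...): free pages
sit on the list matching their order, the order is recorded in the
page's private field, and an exact-order lookup that finds nothing is
reported as -ENOMEM even though other orders still hold free pages.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Number of orders tracked in this model (an assumption for the sketch). */
#define MODEL_NR_PAGE_ORDERS 11

/* Hypothetical stand-in for struct page. */
struct model_page {
	struct model_page *next;
	unsigned long private;	/* order of the free page, like buddy pages */
};

/* Hypothetical stand-in for the compaction control's free page lists. */
struct model_cc {
	struct model_page *freepages[MODEL_NR_PAGE_ORDERS];
};

/* Put an isolated free page on the list for its order, unsplit. */
static void model_add_free(struct model_cc *cc, struct model_page *page,
			   int order)
{
	assert(order >= 0 && order < MODEL_NR_PAGE_ORDERS);
	page->private = (unsigned long)order;
	page->next = cc->freepages[order];
	cc->freepages[order] = page;
}

/* Hand out a free page of exactly 'order', or NULL when that list is empty. */
static struct model_page *model_compaction_alloc(struct model_cc *cc, int order)
{
	struct model_page *page;

	assert(order >= 0 && order < MODEL_NR_PAGE_ORDERS);
	page = cc->freepages[order];
	if (!page)
		return NULL;
	cc->freepages[order] = page->next;
	page->next = NULL;
	return page;
}

/* Migrating one source page fails with -ENOMEM when no target is found. */
static int model_migrate_one(struct model_cc *cc, int src_order)
{
	return model_compaction_alloc(cc, src_order) ? 0 : -ENOMEM;
}
```

With a single order-2 free page isolated, an order-0 request fails in
this model although memory is free, which is the premature termination
the note refers to.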

Link: https://lkml.kernel.org/r/20240220183220.1451315-4-zi.yan@sent.com
Signed-off-by: Zi Yan 
Reviewed-by: Baolin Wang 
Reviewed-by: Vlastimil Babka 
Tested-by: Baolin Wang 
Tested-by: Yu Zhao 
Cc: Adam Manzanares 
Cc: David Hildenbrand 
Cc: Huang Ying 
Cc: Johannes Weiner 
Cc: Kemeng Shi 
Cc: Kirill A. Shutemov 
Cc: Luis Chamberlain 
Cc: Matthew Wilcox (Oracle) 
Cc: Mel Gorman 
Cc: Ryan Roberts 
Cc: Vishal Moola (Oracle) 
Cc: Vlastimil Babka 
Cc: Yin Fengwei 
Signed-off-by: Andrew Morton

mm/compaction: enable compacting >0 order folios.

2024-02-24T01:48:33+00:00

migrate_pages() supports >0 order folio migration and during compaction,
even if compaction_alloc() cannot provide >0 order free pages,
migrate_pages() can split the source page and try to migrate the base
pages from the split.  This can serve as a baseline and starting point for
adding support for compacting >0 order folios.

Link: https://lkml.kernel.org/r/20240220183220.1451315-3-zi.yan@sent.com
Signed-off-by: Zi Yan 
Suggested-by: Huang Ying 
Reviewed-by: Baolin Wang 
Reviewed-by: Vlastimil Babka 
Tested-by: Baolin Wang 
Tested-by: Yu Zhao 
Cc: Adam Manzanares 
Cc: David Hildenbrand 
Cc: Johannes Weiner 
Cc: Kemeng Shi 
Cc: Kirill A. Shutemov 
Cc: Luis Chamberlain 
Cc: Matthew Wilcox (Oracle) 
Cc: Mel Gorman 
Cc: Ryan Roberts 
Cc: Vishal Moola (Oracle) 
Cc: Vlastimil Babka 
Cc: Yin Fengwei 
Signed-off-by: Andrew Morton

mm: compaction: early termination in compact_nodes()

2024-02-24T01:48:31+00:00

There is no need to keep trying to compact memory if a fatal signal is
pending, so allow the loop in compact_nodes() to terminate earlier.

The existing fatal_signal_pending() check does make compact_zone()
break out of the while loop, but it still enters the next zone/next
nid, and some unnecessary functions (e.g., lru_add_drain()) are called.
There was no observable benefit from the new test; it was just found
by code inspection when refactoring compact_node().
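
The effect of the change can be sketched with a toy loop over nodes.
This is a userspace model, not the kernel code: MODEL_NR_NODES and
model_compact_nodes() are hypothetical, and the "fatal signal" is
simulated by naming the node during whose compaction it arrives.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of NUMA nodes in this model (an assumption for the sketch). */
#define MODEL_NR_NODES 4

/*
 * Walk all nodes and return how many were entered.  A fatal signal becomes
 * pending while node 'signal_at_node' is compacted (-1: never).  With
 * 'early_exit' the walk stops there instead of entering the remaining
 * nodes and doing their per-node setup work.
 */
static int model_compact_nodes(int signal_at_node, bool early_exit)
{
	bool fatal_pending = false;
	int visited = 0;

	assert(signal_at_node < MODEL_NR_NODES);

	for (int nid = 0; nid < MODEL_NR_NODES; nid++) {
		visited++;	/* per-node setup and zone compaction happen here */
		if (nid == signal_at_node)
			fatal_pending = true;
		if (early_exit && fatal_pending)
			break;
	}
	return visited;
}
```

With the signal arriving on node 1, the model enters all four nodes
without the early exit and only two with it.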

Link: https://lkml.kernel.org/r/20240208022508.1771534-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang 
Cc: David Hildenbrand 
Signed-off-by: Andrew Morton

mm: compaction: limit the suitable target page order to be less than cc->order

2024-02-22T23:27:16+00:00

Isolating target free pages whose order is cc->order or higher cannot
improve fragmentation, especially when cc->order is less than
pageblock_order.  For example, suppose pageblock_order is MAX_ORDER
(size is 4M) and cc->order is the 2M THP size; we should not isolate other
2M free pages to be the migration target, as that cannot improve
fragmentation.
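
The check can be sketched as a small predicate.  This is a userspace
model under assumptions, not the kernel function: the names are
hypothetical, the fallback to pageblock_order when cc->order is not
positive is an assumption of the sketch, and the orders used in the
example assume 4 KiB base pages (so 2M is order 9 and 4M is order 10).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Upper bound (exclusive) for the order of a free page that may serve as
 * a migration target.  Assumption of this sketch: when there is no
 * specific requested order (cc_order <= 0), fall back to pageblock_order.
 */
static int model_target_order_limit(int cc_order, int pageblock_order)
{
	assert(pageblock_order > 0);
	return cc_order > 0 ? cc_order : pageblock_order;
}

/* A free page is a suitable target only if its order is below the limit. */
static bool model_suitable_target(int free_order, int cc_order,
				  int pageblock_order)
{
	return free_order < model_target_order_limit(cc_order, pageblock_order);
}
```

For the example in the text (pageblock order 10, cc->order 9), an order-9
free page is rejected as a target while an order-8 free page is still
accepted.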

Moreover, this is also applicable to large folio compaction.

Link: https://lkml.kernel.org/r/afcd9377351c259df7a25a388a4a0d5862b986f4.1705928395.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang 
Acked-by: Mel Gorman 
Cc: Vlastimil Babka 
Cc: Zi Yan 
Signed-off-by: Andrew Morton