linux-toradex.git/mm/memory_hotplug.c, branch v6.7-rc8

mm/memory_hotplug: fix error handling in add_memory_resource()

2023-12-07T00:12:46+00:00

In add_memory_resource(), memory block devices are created after a
successful call to arch_add_memory().  However, creating the memory block
devices can fail.  In that case, arch_remove_memory() is called to
perform the necessary cleanup.

Currently, with or without altmap support, arch_remove_memory() is always
passed a NULL altmap during error handling.  This leads to the struct
pages being freed with free_pages(), even though they might have been
allocated with altmap support via altmap_alloc_block_buf().

Fix the error handling by passing the altmap to arch_remove_memory().  This
ensures the following:
* When altmap is disabled, deallocation of the struct pages array occurs
  via free_pages().
* When altmap is enabled, deallocation occurs via vmem_altmap_free().
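The rule that the free path must mirror the allocation path can be sketched as a small userspace model. Everything below is hypothetical: `struct altmap_model`, `alloc_memmap()` and `free_memmap()` only stand in for vmem_altmap, altmap_alloc_block_buf(), free_pages() and vmem_altmap_free(); they are not the kernel's API. Two counters record which free path was taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct vmem_altmap: counts pages handed out
 * from the hot-added range itself. */
struct altmap_model {
	unsigned long alloc;
};

/* Record which free path was taken. */
static unsigned long buddy_freed;	/* models free_pages() */
static unsigned long altmap_freed;	/* models vmem_altmap_free() */

/* Returns true if the memmap was carved out of the altmap, false if it
 * (notionally) came from the page allocator. */
static bool alloc_memmap(struct altmap_model *altmap, unsigned long nr)
{
	if (altmap) {
		altmap->alloc += nr;
		return true;
	}
	return false;
}

/* The free path must mirror the allocation: passing NULL here for an
 * altmap-backed memmap (the bug being fixed) would send altmap pages down
 * the page allocator path. */
static void free_memmap(struct altmap_model *altmap, unsigned long nr)
{
	if (altmap) {
		assert(altmap->alloc >= nr);
		altmap->alloc -= nr;
		altmap_freed += nr;
	} else {
		buddy_freed += nr;
	}
}
```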

Link: https://lkml.kernel.org/r/20231120145354.308999-3-sumanthk@linux.ibm.com
Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Signed-off-by: Sumanth Korikkar 
Reviewed-by: Gerald Schaefer 
Acked-by: David Hildenbrand 
Cc: Alexander Gordeev 
Cc: Aneesh Kumar K.V 
Cc: Anshuman Khandual 
Cc: Heiko Carstens 
Cc: kernel test robot 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Vasily Gorbik 
Cc: 	[5.15+]
Signed-off-by: Andrew Morton

mm/memory_hotplug: add missing mem_hotplug_lock

2023-12-07T00:12:46+00:00

From Documentation/core-api/memory-hotplug.rst:
When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock
in write mode to serialise memory hotplug (e.g. access to global/zone
variables).

The mhp_(de)init_memmap_on_memory() functions can change zone stats and
struct page content, but they are currently called without holding the
mem_hotplug_lock.

When a memory block is being offlined while kmemleak walks each populated
zone, the following theoretical race could occur:
CPU 0:                                         | CPU 1:
memory_offline()                               |
-> offline_pages()                             |
   -> mem_hotplug_begin()                      |
      ...                                      |
   -> mem_hotplug_done()                       |
                                               | kmemleak_scan()
                                               | -> get_online_mems()
                                               |    ...
-> mhp_deinit_memmap_on_memory()               |
   [not protected by mem_hotplug_begin/done()] |
   Marks memory section as offline,            |   Retrieves zone_start_pfn
   poisons vmemmap struct pages and updates    |   and struct page members.
   the zone related data                       |
                                               |    ...
                                               | -> put_online_mems()

Fix this by ensuring that mem_hotplug_lock is taken before calling
mhp_init_memmap_on_memory(), and that it is also held when
mhp_deinit_memmap_on_memory() is called.

online/offline_pages() are currently only called from
memory_block_online/offline(), so it is safe to move the locking there.
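A minimal single-threaded model of the new lock placement, assuming a boolean flag is an acceptable stand-in for mem_hotplug_lock held in write mode. It only checks that both memmap helpers run inside the locked region opened by the memory_block_online()/memory_block_offline() analogs; all `model_*` names are hypothetical, and the model performs no real locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "mem_hotplug_lock held in write mode".
 * This checks ordering only; it is not a real lock. */
static bool hotplug_lock_held;
static int zone_present_pages;	/* models zone stats touched by the helpers */

static void model_hotplug_begin(void)
{
	assert(!hotplug_lock_held);
	hotplug_lock_held = true;
}

static void model_hotplug_done(void)
{
	assert(hotplug_lock_held);
	hotplug_lock_held = false;
}

/* Both helpers change zone stats, so both require the lock. */
static void model_init_memmap_on_memory(int nr)
{
	assert(hotplug_lock_held);
	zone_present_pages += nr;
}

static void model_deinit_memmap_on_memory(int nr)
{
	assert(hotplug_lock_held);
	zone_present_pages -= nr;
}

/* online/offline_pages() no longer lock; their only caller does. */
static int model_online_pages(void)
{
	assert(hotplug_lock_held);
	return 0;
}

static int model_offline_pages(void)
{
	assert(hotplug_lock_held);
	return 0;
}

/* The lock now covers the memmap init step as well as onlining. */
static int model_memory_block_online(int nr_vmemmap_pages)
{
	int ret;

	model_hotplug_begin();
	model_init_memmap_on_memory(nr_vmemmap_pages);
	ret = model_online_pages();
	model_hotplug_done();
	return ret;
}

/* ...and the memmap deinit step as well as offlining. */
static int model_memory_block_offline(int nr_vmemmap_pages)
{
	int ret;

	model_hotplug_begin();
	ret = model_offline_pages();
	if (!ret)
		model_deinit_memmap_on_memory(nr_vmemmap_pages);
	model_hotplug_done();
	return ret;
}
```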

Link: https://lkml.kernel.org/r/20231120145354.308999-2-sumanthk@linux.ibm.com
Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Signed-off-by: Sumanth Korikkar 
Reviewed-by: Gerald Schaefer 
Acked-by: David Hildenbrand 
Cc: Alexander Gordeev 
Cc: Aneesh Kumar K.V 
Cc: Anshuman Khandual 
Cc: Heiko Carstens 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Vasily Gorbik 
Cc: kernel test robot 
Cc: 	[5.15+]
Signed-off-by: Andrew Morton

mm: memory_hotplug: drop memoryless node from fallback lists

2023-10-25T23:47:14+00:00

In offline_pages(), if a node becomes memoryless, we will clear its
N_MEMORY state by calling node_states_clear_node().  But we do this
after rebuilding the zonelists by calling build_all_zonelists(), which
will cause this memoryless node to still be in the fallback nodes
(node_order[]) of other nodes.

To drop memoryless nodes from fallback nodes in this case, just call
node_states_clear_node() before calling build_all_zonelists().
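Why the order matters can be sketched with a toy model in which the fallback list is rebuilt from the current N_MEMORY state. The three-node array, the single shared fallback list and all `model_*` names are hypothetical simplifications of node_states_clear_node() and build_all_zonelists().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_NODES 3

/* Hypothetical model of the N_MEMORY node state. */
static bool node_has_memory[MODEL_NR_NODES] = { true, true, true };

/* One shared fallback list, rebuilt from the node states. */
static int fallback[MODEL_NR_NODES];
static int nr_fallback;

static void model_build_all_zonelists(void)
{
	nr_fallback = 0;
	for (int nid = 0; nid < MODEL_NR_NODES; nid++)
		if (node_has_memory[nid])
			fallback[nr_fallback++] = nid;
}

static bool in_fallback(int nid)
{
	for (int i = 0; i < nr_fallback; i++)
		if (fallback[i] == nid)
			return true;
	return false;
}

/* Offline the last memory of @nid; @clear_first selects the fixed order. */
static void model_offline_last_memory(int nid, bool clear_first)
{
	if (clear_first) {
		node_has_memory[nid] = false;	/* node_states_clear_node() */
		model_build_all_zonelists();
	} else {
		model_build_all_zonelists();	/* old order: sees stale state */
		node_has_memory[nid] = false;
	}
}
```

With the old order, the memoryless node stays in the rebuilt list; with the state cleared first, it is dropped.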

This way, we no longer try to allocate pages from the memoryless node 0,
which also fixes the panic mentioned in [1].  Even though this problem has
already been solved on x86 by dropping the NODE_MIN_SIZE constraint [2], it
is better to fix it in the core MM as well.

https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ [1]
https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/ [2]

Link: https://lkml.kernel.org/r/9f1dbe7ee1301c7163b2770e32954ff5e3ecf2c4.1697711415.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng 
Acked-by: David Hildenbrand 
Acked-by: Ingo Molnar 
Cc: Aneesh Kumar K.V 
Cc: "Huang, Ying" 
Cc: Johannes Weiner 
Cc: Matthew Wilcox (Oracle) 
Cc: Mel Gorman 
Cc: Michal Hocko 
Cc: Mike Rapoport 
Cc: Oscar Salvador 
Cc: Vlastimil Babka 
Signed-off-by: Andrew Morton

mm/memory_hotplug: use pfn math in place of direct struct page manipulation

2023-10-04T17:32:29+00:00

When dealing with hugetlb pages, manipulating struct page pointers
directly can lead to the wrong struct page, since struct pages are not
guaranteed to be contiguous on SPARSEMEM without VMEMMAP.  Use pfn
calculations to handle this properly.

Without the fix, a wrong number of pages might be skipped.  Since skip
cannot be negative, scan_movable_pages() will end early and might miss a
movable page, returning -ENOENT.  This might cause offline_pages() to fail.
No bug has been reported; the fix comes from code inspection.
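The difference between pointer math and pfn math can be sketched with a toy SPARSEMEM-without-VMEMMAP layout in which each section's memmap lives at an unrelated address. The section size, base addresses and helper names are hypothetical; the sketch only computes how many pages of a folio starting at `head_pfn` remain from `pfn`, once from struct page "addresses" and once from pfns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 8 pfns per section, and each section's memmap is
 * allocated separately at an unrelated address. */
#define MODEL_PAGES_PER_SECTION 8UL
#define MODEL_STRUCT_PAGE_SIZE 64UL

static const uintptr_t section_memmap_base[] = {
	0x10000, 0x90000, 0x30000, 0x70000
};

/* "Address" of the struct page backing @pfn. */
static uintptr_t model_pfn_to_page(unsigned long pfn)
{
	return section_memmap_base[pfn / MODEL_PAGES_PER_SECTION] +
	       (pfn % MODEL_PAGES_PER_SECTION) * MODEL_STRUCT_PAGE_SIZE;
}

/* Old approach: offset into the folio from struct page pointer distance.
 * Only correct if the memmap is virtually contiguous. */
static long skip_by_pointer(unsigned long pfn, unsigned long head_pfn,
			    unsigned long nr)
{
	long off = ((long)model_pfn_to_page(pfn) -
		    (long)model_pfn_to_page(head_pfn)) /
		   (long)MODEL_STRUCT_PAGE_SIZE;

	return (long)nr - off;
}

/* Fixed approach: offset into the folio from pfns. */
static long skip_by_pfn(unsigned long pfn, unsigned long head_pfn,
			unsigned long nr)
{
	return (long)nr - (long)(pfn - head_pfn);
}
```

For a 32-page folio starting at pfn 0, both methods agree within the first section, but for pfn 9 (second section) pfn math yields 23 remaining pages while pointer math yields a negative value.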

Link: https://lkml.kernel.org/r/20230913201248.452081-4-zi.yan@sent.com
Fixes: eeb0efd071d8 ("mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages")
Signed-off-by: Zi Yan 
Reviewed-by: Muchun Song 
Acked-by: David Hildenbrand 
Cc: Matthew Wilcox (Oracle) 
Cc: Mike Kravetz 
Cc: Mike Rapoport (IBM) 
Cc: Thomas Bogendoerfer 
Cc: 
Signed-off-by: Andrew Morton

mm/memory_hotplug: embed vmem_altmap details in memory block

2023-08-21T20:37:49+00:00

With memmap on memory, some architectures need more details about the
altmap, such as base_pfn, end_pfn, etc., to unmap vmemmap memory.  Instead
of computing them again when a memory block is removed, embed the
vmem_altmap details in struct memory_block when the memmap on memory
feature is in use.
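A sketch of the idea, assuming heavily simplified stand-ins for struct vmem_altmap and struct memory_block (the real structures have more fields, and all `model_*` names are hypothetical): the altmap details are copied into the block when it is created and read back, not recomputed, when it is removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical subset of struct vmem_altmap. */
struct altmap_model {
	unsigned long base_pfn;
	unsigned long end_pfn;
	unsigned long free;
	unsigned long alloc;
};

/* Hypothetical subset of struct memory_block. */
struct memory_block_model {
	unsigned long start_pfn;
	unsigned long nr_pages;
	struct altmap_model *altmap;	/* NULL unless memmap on memory */
};

/* At add time, keep a private copy of the altmap details in the block. */
static int model_create_block(struct memory_block_model *mem,
			      unsigned long start_pfn, unsigned long nr_pages,
			      const struct altmap_model *params)
{
	mem->start_pfn = start_pfn;
	mem->nr_pages = nr_pages;
	mem->altmap = NULL;
	if (params) {
		mem->altmap = malloc(sizeof(*mem->altmap));
		if (!mem->altmap)
			return -1;	/* the kernel would return -ENOMEM */
		*mem->altmap = *params;
	}
	return 0;
}

/* At remove time, read the stored range instead of recomputing it. */
static bool model_vmemmap_range(const struct memory_block_model *mem,
				unsigned long *base_pfn, unsigned long *end_pfn)
{
	if (!mem->altmap)
		return false;
	*base_pfn = mem->altmap->base_pfn;
	*end_pfn = mem->altmap->end_pfn;
	return true;
}

static void model_remove_block(struct memory_block_model *mem)
{
	free(mem->altmap);
	mem->altmap = NULL;
}
```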

[yangyingliang@huawei.com: fix error return code in add_memory_resource()]
  Link: https://lkml.kernel.org/r/20230809081552.1351184-1-yangyingliang@huawei.com
Link: https://lkml.kernel.org/r/20230808091501.287660-7-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Signed-off-by: Yang Yingliang 
Acked-by: Michal Hocko 
Acked-by: David Hildenbrand 
Cc: Christophe Leroy 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Oscar Salvador 
Cc: Vishal Verma 
Signed-off-by: Andrew Morton

mm/memory_hotplug: support memmap_on_memory when memmap is not aligned to pageblocks

2023-08-21T20:37:49+00:00

Currently, the memmap_on_memory feature is only supported with memory block
sizes that result in vmemmap pages covering full pageblocks.  This is
because the memory onlining/offlining code requires the applicable ranges to
be pageblock-aligned, for example, to set the migratetypes properly.

This patch lifts that restriction by reserving more pages than required
for the vmemmap space, so that the start address of the usable range is
pageblock aligned with different memory block sizes.  Using this facility
implies that the kernel reserves some pages for every memory block.  This
makes the memmap on memory feature useful with a wider range of memory
block sizes.

For example, with a 64K page size and a 256MiB memory block size, we
require 4 pages to map the vmemmap pages.  To align things correctly, we
end up adding a reserve of 28 pages, i.e., for every 4096 pages, 28 pages
get reserved.
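The numbers in the example can be reproduced as follows. The commit does not state them, so two values are assumptions made here: sizeof(struct page) is 64 bytes, and a pageblock is 2 MiB, i.e. 32 pages of 64K. The `MODEL_*` names are hypothetical.

```c
#include <assert.h>

/* Assumed values for this worked example (not taken from the commit). */
#define MODEL_PAGE_SIZE        (64UL << 10)	/* 64 KiB */
#define MODEL_MEMBLOCK_SIZE    (256UL << 20)	/* 256 MiB */
#define MODEL_STRUCT_PAGE_SIZE 64UL		/* bytes per struct page */
#define MODEL_PAGEBLOCK_PAGES  32UL		/* 2 MiB / 64 KiB */

/* Round @x up to a multiple of @a. */
static unsigned long round_up_ul(unsigned long x, unsigned long a)
{
	return (x + a - 1) / a * a;
}

/* Pages in one memory block: 256 MiB / 64 KiB = 4096. */
static unsigned long memblock_pages(void)
{
	return MODEL_MEMBLOCK_SIZE / MODEL_PAGE_SIZE;
}

/* Pages needed for the block's memmap: 4096 * 64 B = 256 KiB = 4 pages. */
static unsigned long vmemmap_pages(void)
{
	unsigned long bytes = memblock_pages() * MODEL_STRUCT_PAGE_SIZE;

	return round_up_ul(bytes, MODEL_PAGE_SIZE) / MODEL_PAGE_SIZE;
}

/* Extra pages so the first usable pfn is pageblock aligned: 32 - 4 = 28. */
static unsigned long reserve_pages(void)
{
	return round_up_ul(vmemmap_pages(), MODEL_PAGEBLOCK_PAGES) -
	       vmemmap_pages();
}
```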

Link: https://lkml.kernel.org/r/20230808091501.287660-5-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Acked-by: Michal Hocko 
Acked-by: David Hildenbrand 
Cc: Christophe Leroy 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Oscar Salvador 
Cc: Vishal Verma 
Signed-off-by: Andrew Morton

mm/memory_hotplug: allow architecture to override memmap on memory support check

2023-08-21T20:37:48+00:00

Some architectures want different restrictions.  Hence, add an
architecture-specific override.

The PMD_SIZE check is moved there.
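Kernel headers commonly implement such overrides with the `#ifndef` pattern: the generic code provides a default only if the architecture has not already defined its own version. The sketch below assumes the hook is called arch_supports_memmap_on_memory() and takes the vmemmap size; MODEL_PMD_SIZE is an assumed 2 MiB value, and the code is a userspace approximation rather than the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed PMD size for illustration only. */
#define MODEL_PMD_SIZE (2UL << 20)

/*
 * Generic default, compiled only if no architecture header defined the
 * hook first. An architecture would override it with something like:
 *   #define arch_supports_memmap_on_memory arch_supports_memmap_on_memory
 *   static inline bool arch_supports_memmap_on_memory(unsigned long size)
 *   { ...arch-specific rule... }
 */
#ifndef arch_supports_memmap_on_memory
static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
{
	/* Default rule: the vmemmap must span whole PMDs. */
	return vmemmap_size % MODEL_PMD_SIZE == 0;
}
#endif
```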

Link: https://lkml.kernel.org/r/20230808091501.287660-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Acked-by: Michal Hocko 
Acked-by: David Hildenbrand 
Cc: Christophe Leroy 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Oscar Salvador 
Cc: Vishal Verma 
Signed-off-by: Andrew Morton

mm/memory_hotplug: allow memmap on memory hotplug request to fallback

2023-08-21T20:37:48+00:00

If memmap on memory is not supported, fall back to not using it.  This
avoids the need for callers to implement the fallback.
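The fallback can be sketched as treating the memmap-on-memory request as a hint rather than a hard requirement. MHP_MEMMAP_ON_MEMORY is the kernel's flag for this request, but the flag value, the 128 MiB support rule and all `model_*` names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value standing in for MHP_MEMMAP_ON_MEMORY. */
#define MODEL_MHP_MEMMAP_ON_MEMORY 0x1u

struct add_result {
	bool memmap_on_memory;	/* memmap placed inside the hot-added range */
};

/* Stand-in for the real support check; the 128 MiB rule is assumed. */
static bool model_supports_memmap_on_memory(unsigned long size)
{
	return size % (128UL << 20) == 0;
}

/* If the request cannot be honored, silently fall back to a regular
 * memmap allocation instead of failing, so callers need no retry logic. */
static struct add_result model_add_memory(unsigned long size,
					  unsigned int flags)
{
	struct add_result res = { .memmap_on_memory = false };

	if ((flags & MODEL_MHP_MEMMAP_ON_MEMORY) &&
	    model_supports_memmap_on_memory(size))
		res.memmap_on_memory = true;
	return res;
}
```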

Link: https://lkml.kernel.org/r/20230808091501.287660-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Acked-by: Michal Hocko 
Acked-by: David Hildenbrand 
Cc: Christophe Leroy 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Oscar Salvador 
Cc: Vishal Verma 
Signed-off-by: Andrew Morton

mm/memory_hotplug: document the signal_pending() check in offline_pages()

2023-08-18T17:12:19+00:00

Let's update the documentation to state that any signal is sufficient, and
add a comment noting that checking for any signal, not only fatal ones, is
historical baggage: changing it now could break existing user space,
although that is unlikely.

For example, when an app provides a custom SIGALRM handler and triggers
memory offlining, the timeout command would no longer stop memory
offlining, because SIGALRM would no longer be considered a fatal signal.
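The user-space concern can be sketched with a toy classification of a pending signal. This is a hypothetical model: real signal delivery has more cases (for example, SIGKILL cannot be caught), and neither `struct pending_signal_model` nor `offline_aborts()` exists in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one pending signal. */
struct pending_signal_model {
	bool has_handler;	/* user space installed a custom handler */
	bool default_kills;	/* the default action terminates the task */
};

/* Roughly what makes a signal "fatal": it will kill the task because no
 * handler intercepts it. */
static bool is_fatal(struct pending_signal_model s)
{
	return !s.has_handler && s.default_kills;
}

/* Would offlining stop for this signal?
 * fatal_only == false models the current signal_pending() check;
 * fatal_only == true models a hypothetical fatal_signal_pending() check. */
static bool offline_aborts(struct pending_signal_model s, bool fatal_only)
{
	return fatal_only ? is_fatal(s) : true;
}
```

A handled SIGALRM stops offlining under the current check but would not under a fatal-only check, which is the behavior change the comment warns about.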

Note that using signal_pending() instead of fatal_signal_pending() is
an anti-pattern, but slowly deprecating that behavior to eventually
change it in the far future is probably not worth the effort.  If this
ever becomes relevant for user-space, we might want to rethink.

Link: https://lkml.kernel.org/r/20230711174050.603820-1-david@redhat.com
Signed-off-by: David Hildenbrand 
Acked-by: Michal Hocko 
Cc: Oscar Salvador 
Cc: Jonathan Corbet 
Signed-off-by: Andrew Morton

mm: remove unnecessary pagevec includes

2023-06-23T23:59:31+00:00

These files no longer need pagevec.h, mostly due to function declarations
being moved out of it.

Link: https://lkml.kernel.org/r/20230621164557.3510324-14-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: Andrew Morton