linux-toradex.git/include/linux/memory_hotplug.h, branch v6.16-rc6

mm: add build-time option for hotplug memory default online type

2025-01-26T04:22:21+00:00

Memory hotplug presently auto-onlines memory into a zone the kernel deems
appropriate if CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y.

The memhp_default_state boot param enables runtime config, but it's not
possible to do this at build-time.

Remove CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, and replace it with
CONFIG_MHP_DEFAULT_ONLINE_TYPE_* choices that sync with the boot param.

Selections:
  CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE
    => mhp_default_online_type = "offline"
       Memory will not be onlined automatically.

  CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO
    => mhp_default_online_type = "online"
       Memory will be onlined automatically in a zone deemed
       appropriate by the kernel.

  CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_KERNEL
    => mhp_default_online_type = "online_kernel"
       Memory will be onlined automatically.
       The zone may allow kernel data (e.g. ZONE_NORMAL).

  CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE
    => mhp_default_online_type = "online_movable"
       Memory will be onlined automatically.
       The zone will be ZONE_MOVABLE.

Default to CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE to match the existing
default CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n behavior.

Existing users of CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y should use
CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO.
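
The mapping from build-time choice to default online type can be sketched
as follows.  This is a user-space model under stated assumptions, not the
kernel source: the MMOP_* names are modeled on the kernel's online-type
constants, and default_online_type() and online_type_name() are
hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Online types, modeled on the kernel's MMOP_* constants. */
enum mmop_type {
	MMOP_OFFLINE,
	MMOP_ONLINE,
	MMOP_ONLINE_KERNEL,
	MMOP_ONLINE_MOVABLE,
};

/*
 * Hypothetical helper: pick the build-time default from whichever
 * CONFIG_MHP_DEFAULT_ONLINE_TYPE_* symbol is defined.  With none of
 * the online variants defined, the result is offline, matching the
 * default choice described above.
 */
enum mmop_type default_online_type(void)
{
#if defined(CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO)
	return MMOP_ONLINE;
#elif defined(CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_KERNEL)
	return MMOP_ONLINE_KERNEL;
#elif defined(CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE)
	return MMOP_ONLINE_MOVABLE;
#else
	return MMOP_OFFLINE;
#endif
}

/* Hypothetical helper: the string form listed in the selections above. */
const char *online_type_name(enum mmop_type type)
{
	switch (type) {
	case MMOP_ONLINE:
		return "online";
	case MMOP_ONLINE_KERNEL:
		return "online_kernel";
	case MMOP_ONLINE_MOVABLE:
		return "online_movable";
	default:
		return "offline";
	}
}
```

Building the sketch with, for example,
-DCONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE selects the movable
default; building it with no such define leaves the default at offline.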

[gourry@gourry.net: update KConfig comments]
  Link: https://lkml.kernel.org/r/20241226182918.648799-1-gourry@gourry.net
Link: https://lkml.kernel.org/r/20241220210709.300066-1-gourry@gourry.net
Signed-off-by: Gregory Price 
Acked-by: David Hildenbrand 
Cc: Greg Kroah-Hartman 
Cc: Huacai Chen 
Cc: Jonathan Corbet 
Cc: Oscar Salvador 
Cc: "Rafael J. Wysocki" 
Cc: WANG Xuerui 
Signed-off-by: Andrew Morton

mm: drop CONFIG_HAVE_ARCH_NODEDATA_EXTENSION

2024-09-04T04:15:28+00:00

There are no users of HAVE_ARCH_NODEDATA_EXTENSION left, so
arch_alloc_nodedata() and arch_refresh_nodedata() are not needed anymore.

Replace the call to arch_alloc_nodedata() in free_area_init() with a new
helper alloc_offline_node_data(), remove arch_refresh_nodedata() and
clean up the associated ifdefery in include/linux/memory_hotplug.h.

Link: https://lkml.kernel.org/r/20240807064110.1003856-9-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) 
Tested-by: Zi Yan  # for x86_64 and arm64
Acked-by: Dan Williams 
Cc: Alexander Gordeev 
Cc: Andreas Larsson 
Cc: Arnd Bergmann 
Cc: Borislav Petkov 
Cc: Catalin Marinas 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: David Hildenbrand 
Cc: Davidlohr Bueso 
Cc: David S. Miller 
Cc: Greg Kroah-Hartman 
Cc: Heiko Carstens 
Cc: Huacai Chen 
Cc: Ingo Molnar 
Cc: Jiaxun Yang 
Cc: John Paul Adrian Glaubitz 
Cc: Jonathan Cameron 
Cc: Jonathan Corbet 
Cc: Michael Ellerman 
Cc: Palmer Dabbelt 
Cc: Rafael J. Wysocki 
Cc: Rob Herring (Arm) 
Cc: Samuel Holland 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: Vasily Gorbik 
Cc: Will Deacon 
Signed-off-by: Andrew Morton

mm/memory_hotplug: skip adjust_managed_page_count() for PageOffline() pages when offlining

2024-07-04T02:30:18+00:00

We currently have a hack for virtio-mem in place to handle memory
offlining with PageOffline pages for which we already adjusted the managed
page count.

Let's enlighten memory offlining code so we can get rid of that hack, and
document the situation.
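
The accounting idea can be sketched in user space as follows.  This is a
simplified model and not the kernel implementation: struct page_model,
count_managed() and offline_range() are hypothetical names, and the
"offline" flag merely stands in for PageOffline().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model: only the state this sketch needs. */
struct page_model {
	bool offline;	/* stands in for PageOffline() */
};

/*
 * Count the pages in a range that still contribute to the managed page
 * count.  Pages already marked offline had their managed count adjusted
 * by their owner (e.g. a driver such as virtio-mem), so they are skipped.
 */
size_t count_managed(const struct page_model *pages, size_t nr)
{
	size_t managed = 0;

	for (size_t i = 0; i < nr; i++)
		if (!pages[i].offline)
			managed++;
	return managed;
}

/* Offlining subtracts only the still-managed pages from the total. */
long offline_range(long managed_total, const struct page_model *pages,
		   size_t nr)
{
	return managed_total - (long)count_managed(pages, nr);
}
```

With this accounting, the owner of the PageOffline pages no longer has to
undo its own adjustment before the range is offlined.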

Link: https://lkml.kernel.org/r/20240607090939.89524-4-david@redhat.com
Signed-off-by: David Hildenbrand 
Acked-by: Oscar Salvador 
Cc: Alexander Potapenko 
Cc: Dexuan Cui 
Cc: Dmitry Vyukov 
Cc: Eugenio Pérez 
Cc: Haiyang Zhang 
Cc: Jason Wang 
Cc: Juergen Gross 
Cc: "K. Y. Srinivasan" 
Cc: Marco Elver 
Cc: Michael S. Tsirkin 
Cc: Mike Rapoport (IBM) 
Cc: Oleksandr Tyshchenko 
Cc: Stefano Stabellini 
Cc: Wei Liu 
Cc: Xuan Zhuo 
Signed-off-by: Andrew Morton

mm/memory_hotplug: export mhp_supports_memmap_on_memory()

2024-02-22T18:24:40+00:00

In preparation for adding sysfs ABI to toggle memmap_on_memory semantics
for drivers adding memory, export the mhp_supports_memmap_on_memory()
helper. This allows drivers to check if memmap_on_memory support is
available before trying to request it, and display an appropriate
message if it isn't available. As part of this, remove the helper's size
argument: with recent updates to allow memmap_on_memory for larger
ranges, and the internal splitting of altmaps into respective memory
blocks, the size argument is meaningless.
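
A driver-side check might look like the sketch below.  It is a user-space
model: mhp_supports_memmap_on_memory() is stubbed here with a plain
variable, the flag's bit value is illustrative, and
driver_pick_mhp_flags() is a hypothetical helper, not an existing kernel
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int mhp_t;
#define MHP_NONE		((mhp_t)0)
#define MHP_MEMMAP_ON_MEMORY	((mhp_t)1u << 1)  /* bit value is illustrative */

/* Stand-in for platform support; the real answer comes from the kernel. */
bool platform_supports_memmap_on_memory = false;

/* Model of the exported helper; note that it takes no size argument. */
bool mhp_supports_memmap_on_memory(void)
{
	return platform_supports_memmap_on_memory;
}

/* Hypothetical driver helper: decide the flags before adding memory. */
mhp_t driver_pick_mhp_flags(bool want_memmap_on_memory)
{
	if (!want_memmap_on_memory)
		return MHP_NONE;

	if (!mhp_supports_memmap_on_memory()) {
		/* Tell the user why the request is not honored. */
		fprintf(stderr, "memmap_on_memory requested but not supported\n");
		return MHP_NONE;
	}

	return MHP_MEMMAP_ON_MEMORY;
}
```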

[akpm@linux-foundation.org: fix build]
Link: https://lkml.kernel.org/r/20240124-vv-dax_abi-v7-4-20d16cb8d23d@intel.com
Signed-off-by: Vishal Verma 
Acked-by: David Hildenbrand 
Suggested-by: David Hildenbrand 
Cc: Greg Kroah-Hartman 
Cc: Jonathan Cameron 
Cc: Li Zhijian 
Cc: Matthew Wilcox (Oracle) 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Dan Williams 
Cc: Dave Jiang 
Cc: Dave Hansen 
Cc: Huang Ying 
Signed-off-by: Andrew Morton

mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers

2024-02-22T00:00:01+00:00

Patch series "implement "memmap on memory" feature on s390".

This series provides "memmap on memory" support on s390 platform.  "memmap
on memory" allows struct pages array to be allocated from the hotplugged
memory range instead of allocating it from main system memory.

s390 currently preallocates the struct page array for all potentially
possible memory, which ensures memory onlining always succeeds, but at
the cost of significant consumption of available system memory at boot
time.  In certain extreme configurations, this could lead to IPL
failure.

"memmap on memory" ensures the struct page array is populated from the
self-contained hotplugged memory range instead of depleting the available
system memory, which could eliminate IPL failure on the s390 platform.

On other platforms, system might go OOM when the physically hotplugged
memory depletes the available memory before it is onlined.  Hence, "memmap
on memory" feature was introduced as described in commit a08a2ae34613
("mm,memory_hotplug: allocate memmap from the added memory range").

Unlike on other architectures, s390 memory blocks are not physically
accessible until they are online.  To make them physically accessible, two
new memory notifiers, MEM_PREPARE_ONLINE and MEM_FINISH_OFFLINE, are added.
The MEM_PREPARE_ONLINE notifier is used to inform the hypervisor that the
memory should be made physically accessible.  This allows for "memmap on
memory" initialization during the memory hotplug onlining phase, which is
performed before calling the MEM_GOING_ONLINE notifier.

Patch 1 introduces MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers
to prepare the transition of memory to and from a physically accessible
state.  New mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced to ensure
altmap cannot be written when adding memory - before it is set online. 
This enhancement is crucial for implementing the "memmap on memory"
feature for s390 in a subsequent patch.

Patch 2 allocates vmemmap pages from the self-contained memory range for
s390.  It allocates the memory map (struct page array) from the hotplugged
memory range, rather than using system memory, by passing altmap to the
vmemmap functions.

Patch 3 removes unhandled memory notifier types on s390.

Patch 4 implements MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers
on s390.  The MEM_PREPARE_ONLINE memory notifier makes a memory block
physically accessible via the sclp assign command.  The notifier ensures
self-contained memory maps are accessible, thereby enabling "memmap on
memory" on s390.  The MEM_FINISH_OFFLINE memory notifier shifts the memory
block to an inaccessible state via the sclp unassign command.

Patch 5 finally enables MHP_MEMMAP_ON_MEMORY on s390.


This patch (of 5):

Introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers to
prepare the transition of memory to and from a physically accessible
state.  This enhancement is crucial for implementing the "memmap on
memory" feature for s390 in a subsequent patch.

Platforms such as x86 can support physical memory hotplug via ACPI.  When
there is physical memory hotplug, an ACPI event leads to the memory
addition with the following callchain:

acpi_memory_device_add()
  -> acpi_memory_enable_device()
     -> __add_memory()

After this, the hotplugged memory is physically accessible, and altmap
support is prepared, before the "memmap on memory" initialization in
memory_block_online() is called.

On s390, memory hotplug works in a different way.  The available hotplug
memory has to be defined upfront in the hypervisor, but it is made
physically accessible only when the user sets it online via sysfs,
currently in the MEM_GOING_ONLINE notifier.  This is too late, because
"memmap on memory" initialization is performed before calling the
MEM_GOING_ONLINE notifier.

During the memory hotplug addition phase, altmap support is prepared.
During the memory onlining phase, s390 requires the memory to be made
physically accessible first, and only then can the "memmap on memory"
initialization process be initiated.

The memory provider will handle new MEM_PREPARE_ONLINE /
MEM_FINISH_OFFLINE notifications and make the memory accessible.

The mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced and is relevant when
used along with MHP_MEMMAP_ON_MEMORY, because the altmap cannot be written
(e.g., poisoned) when adding memory -- before it is set online.  This
allows for adding memory with an altmap that is not currently made
available by a hypervisor.  When onlining that memory, the hypervisor can
be instructed to make that memory accessible via the new notifiers and the
onlining phase will not require any memory allocations, which is helpful
in low-memory situations.

All architectures ignore unknown memory notifiers.  Therefore, the
introduction of these new notifiers does not result in any functional
modifications across architectures.
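
The ordering described above can be modeled in user space as follows.
This is a sketch under stated assumptions, not kernel code: struct
mem_block, provider_notify(), block_online() and block_offline() are
hypothetical, and the numeric values of the notifier actions are
illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Notifier actions; the numeric values here are illustrative only. */
enum mem_action {
	MEM_PREPARE_ONLINE,
	MEM_GOING_ONLINE,
	MEM_ONLINE,
	MEM_FINISH_OFFLINE,
};

/* Hypothetical model of one memory block backed by a hypervisor. */
struct mem_block {
	bool accessible;	/* has the provider made the range usable? */
	bool memmap_ready;	/* has the memmap inside the range been set up? */
	bool online;
};

/* Model of a memory provider's notifier callback (e.g. on s390). */
int provider_notify(struct mem_block *blk, enum mem_action action)
{
	switch (action) {
	case MEM_PREPARE_ONLINE:
		blk->accessible = true;		/* ask the hypervisor to assign it */
		break;
	case MEM_FINISH_OFFLINE:
		blk->accessible = false;	/* hand the range back */
		break;
	default:
		break;				/* unhandled actions are ignored */
	}
	return 0;
}

/* Onlining: prepare first, then write the memmap, then the usual steps. */
int block_online(struct mem_block *blk)
{
	provider_notify(blk, MEM_PREPARE_ONLINE);
	if (!blk->accessible)
		return -1;		/* the memmap cannot be written yet */

	blk->memmap_ready = true;	/* "memmap on memory" initialization */
	provider_notify(blk, MEM_GOING_ONLINE);
	blk->online = true;
	provider_notify(blk, MEM_ONLINE);
	return 0;
}

/* Offlining mirrors it: the range becomes inaccessible last. */
void block_offline(struct mem_block *blk)
{
	blk->online = false;
	blk->memmap_ready = false;
	provider_notify(blk, MEM_FINISH_OFFLINE);
}
```

A provider that does not handle the two new actions falls into the default
case, which is why adding them changes nothing for other architectures.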

Link: https://lkml.kernel.org/r/20240108132747.3238763-1-sumanthk@linux.ibm.com
Link: https://lkml.kernel.org/r/20240108132747.3238763-2-sumanthk@linux.ibm.com
Signed-off-by: Sumanth Korikkar 
Suggested-by: Gerald Schaefer 
Suggested-by: David Hildenbrand 
Acked-by: David Hildenbrand 
Cc: Alexander Gordeev 
Cc: Aneesh Kumar K.V 
Cc: Anshuman Khandual 
Cc: Heiko Carstens 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Vasily Gorbik 
Signed-off-by: Andrew Morton

mm/memory_hotplug: allow memmap on memory hotplug request to fallback

2023-08-21T20:37:48+00:00

If not supported, fall back to not using memmap on memory. This avoids
the need for callers to do the fallback.
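
A minimal user-space sketch of the fallback, assuming a simplified model:
add_memory_model() and struct add_result are hypothetical, the flag's bit
value is illustrative, and the "supported" parameter stands in for the
kernel's own support check.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int mhp_t;
#define MHP_MEMMAP_ON_MEMORY	((mhp_t)1u << 1)  /* bit value is illustrative */

/* Hypothetical result record so the outcome can be inspected. */
struct add_result {
	int ret;			/* 0 on success */
	bool used_memmap_on_memory;	/* was the request honored? */
};

/* Hypothetical model of adding memory with the fallback in the core. */
struct add_result add_memory_model(mhp_t flags, bool supported)
{
	struct add_result res = { 0, false };

	if ((flags & MHP_MEMMAP_ON_MEMORY) && supported)
		res.used_memmap_on_memory = true;
	/*
	 * Otherwise continue with a normally allocated memmap instead of
	 * failing, so the caller needs no retry logic of its own.
	 */
	return res;
}
```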

Link: https://lkml.kernel.org/r/20230808091501.287660-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V 
Acked-by: Michal Hocko 
Acked-by: David Hildenbrand 
Cc: Christophe Leroy 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Oscar Salvador 
Cc: Vishal Verma 
Signed-off-by: Andrew Morton

mm/sparse: remove unused parameters in sparse_remove_section()

2023-06-19T23:19:04+00:00

The parameters ms and map_offset are not used in
sparse_remove_section(), so remove them.

__remove_section() is only called by __remove_pages(), so remove it and
put the WARN_ON_ONCE() in sparse_remove_section().

Link: https://lkml.kernel.org/r/20230607023952.2247489-1-yajun.deng@linux.dev
Signed-off-by: Yajun Deng 
Reviewed-by: David Hildenbrand 
Cc: Oscar Salvador 
Signed-off-by: Andrew Morton

mm: page_alloc: move set_zone_contiguous() into mm_init.c

2023-06-09T23:25:22+00:00

set_zone_contiguous() is only used in mm init/hotplug, and
clear_zone_contiguous() is only used in hotplug, so move them from
page_alloc.c to the more appropriate file.

Link: https://lkml.kernel.org/r/20230516063821.121844-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang 
Cc: David Hildenbrand 
Cc: "Huang, Ying" 
Cc: Iurii Zaikin 
Cc: Kees Cook 
Cc: Len Brown 
Cc: Luis Chamberlain 
Cc: Mike Rapoport (IBM) 
Cc: Oscar Salvador 
Cc: Pavel Machek 
Cc: Rafael J. Wysocki 
Signed-off-by: Andrew Morton

mm, memory_hotplug: remove obsolete generic_free_nodedata()

2022-10-03T21:03:29+00:00

Commit 390511e1476e ("mm, memory_hotplug: drop arch_free_nodedata") drops
the last caller of generic_free_nodedata().  Remove it too.

Link: https://lkml.kernel.org/r/20220916072257.9639-11-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin 
Reviewed-by: David Hildenbrand 
Reviewed-by: Anshuman Khandual 
Reviewed-by: Oscar Salvador 
Cc: Matthew Wilcox 
Signed-off-by: Andrew Morton

mm: fix null-ptr-deref in kswapd_is_running()

2022-09-12T03:26:04+00:00

kswapd_run/stop() will set pgdat->kswapd to NULL, which could race with
kswapd_is_running() in kcompactd():

kswapd_run/stop()                       kcompactd()
                                          kswapd_is_running()
  pgdat->kswapd // error or normal ptr
                                          verify pgdat->kswapd
                                            // load non-NULL pgdat->kswapd
  pgdat->kswapd = NULL
                                          task_is_running(pgdat->kswapd)
                                            // NULL pointer dereference

KASAN reports the null-ptr-deref shown below,

  vmscan: Failed to start kswapd on node 0
  ...
  BUG: KASAN: null-ptr-deref in kcompactd+0x440/0x504
  Read of size 8 at addr 0000000000000024 by task kcompactd0/37

  CPU: 0 PID: 37 Comm: kcompactd0 Kdump: loaded Tainted: G           OE     5.10.60 #1
  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  Call trace:
   dump_backtrace+0x0/0x394
   show_stack+0x34/0x4c
   dump_stack+0x158/0x1e4
   __kasan_report+0x138/0x140
   kasan_report+0x44/0xdc
   __asan_load8+0x94/0xd0
   kcompactd+0x440/0x504
   kthread+0x1a4/0x1f0
   ret_from_fork+0x10/0x18

At present kswapd/kcompactd_run() and kswapd/kcompactd_stop() are protected
by mem_hotplug_begin/done(), but kcompactd() is not.  There is no need to
involve the memory hotplug lock in kcompactd(), so let's add a new mutex to
protect pgdat->kswapd accesses.

Also, because the kcompactd task will check the state of kswapd task, it's
better to call kcompactd_stop() before kswapd_stop() to reduce lock
conflicts.
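
The locking idea can be illustrated with a user-space model built on a
pthread mutex.  This is only a sketch: struct task, struct pgdat_model and
the *_model() functions are hypothetical stand-ins, and the kernel uses its
own mutex type rather than pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's task and node structures. */
struct task {
	bool running;
};

struct pgdat_model {
	pthread_mutex_t kswapd_lock;	/* protects the kswapd pointer */
	struct task *kswapd;
};

/* Writer side: start and stop update the pointer under the lock. */
void kswapd_run_model(struct pgdat_model *pgdat, struct task *t)
{
	pthread_mutex_lock(&pgdat->kswapd_lock);
	pgdat->kswapd = t;
	pthread_mutex_unlock(&pgdat->kswapd_lock);
}

void kswapd_stop_model(struct pgdat_model *pgdat)
{
	pthread_mutex_lock(&pgdat->kswapd_lock);
	pgdat->kswapd = NULL;
	pthread_mutex_unlock(&pgdat->kswapd_lock);
}

/*
 * Reader side: the NULL check and the dereference happen inside one
 * critical section, so the pointer cannot be cleared in between.
 */
bool kswapd_is_running_model(struct pgdat_model *pgdat)
{
	bool running;

	pthread_mutex_lock(&pgdat->kswapd_lock);
	running = pgdat->kswapd && pgdat->kswapd->running;
	pthread_mutex_unlock(&pgdat->kswapd_lock);
	return running;
}
```

Because the reader takes a dedicated lock, it does not need to involve the
memory hotplug lock at all.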

[akpm@linux-foundation.org: add comments]
Link: https://lkml.kernel.org/r/20220827111959.186838-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang 
Cc: David Hildenbrand 
Cc: Muchun Song 
Signed-off-by: Andrew Morton