linux-toradex.git/include/linux/memory_hotplug.h, branch v5.1-rc1

Merge tag 'for-linus-5.1a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

2019-03-12T00:08:14+00:00

Pull xen updates from Juergen Gross:
 "xen fixes and features:

   - remove fallback code for very old Xen hypervisors

   - three patches for fixing Xen dom0 boot regressions

   - an old patch for Xen PCI passthrough which was never applied for
     unknown reasons

   - some more minor fixes and cleanup patches"

* tag 'for-linus-5.1a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: fix dom0 boot on huge systems
  xen, cpu_hotplug: Prevent an out of bounds access
  xen: remove pre-xen3 fallback handlers
  xen/ACPI: Switch to bitmap_zalloc()
  x86/xen: dont add memory above max allowed allocation
  x86: respect memory size limiting via mem= parameter
  xen/gntdev: Check and release imported dma-bufs on close
  xen/gntdev: Do not destroy context while dma-bufs are in use
  xen/pciback: Don't disable PCI_COMMAND on PCI device reset.
  xen-scsiback: mark expected switch fall-through
  xen: mark expected switch fall-through

mm/page_alloc.c: memory hotplug: free pages as higher order

2019-03-06T05:07:14+00:00

When pages are freed as higher-order blocks, the time the buddy allocator
spends coalescing pages is reduced.  With a section size of 256MB, the hot
add latency of a single section improves from 50-60 ms to less than 1 ms,
roughly a 60-fold improvement.  Modify external providers of the online
callback to align with the change.
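A minimal userspace sketch of the idea, assuming 4 KiB pages and a MAX_ORDER of 11 (largest buddy block 2^10 pages); it is not the kernel's implementation. It walks a pfn range in the largest naturally aligned power-of-two chunks that fit and issues one callback per chunk, the way an online callback that receives an order can hand a whole block to the allocator at once. SKETCH_MAX_ORDER, chunk_order() and online_range_by_order() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: MAX_ORDER of 11, so the largest block is 2^10 pages. */
#define SKETCH_MAX_ORDER 11

/* Largest order such that a block starting at pfn is naturally aligned
 * and does not run past end. Requires pfn < end. */
static unsigned int chunk_order(unsigned long pfn, unsigned long end)
{
	unsigned int order = 0;

	while (order + 1 < SKETCH_MAX_ORDER) {
		unsigned long size = 1UL << (order + 1);

		if (pfn & (size - 1))	/* not aligned to the next size up */
			break;
		if (pfn + size > end)	/* next size up would overrun the range */
			break;
		order++;
	}
	return order;
}

/* Walk [start, end) in buddy-sized chunks, calling free_fn (if non-NULL)
 * once per chunk instead of once per page. Returns the number of chunks. */
static size_t online_range_by_order(unsigned long start, unsigned long end,
				    void (*free_fn)(unsigned long pfn,
						    unsigned int order))
{
	size_t calls = 0;

	assert(start <= end);
	while (start < end) {
		unsigned int order = chunk_order(start, end);

		if (free_fn)
			free_fn(start, order);
		start += 1UL << order;
		calls++;
	}
	return calls;
}
```

For a 256MB section of 4 KiB pages (65536 pages) starting at an aligned pfn, online_range_by_order(0, 65536, NULL) issues 64 order-10 callbacks instead of 65536 order-0 ones.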

[arunks@codeaurora.org: v11]
  Link: http://lkml.kernel.org/r/1547792588-18032-1-git-send-email-arunks@codeaurora.org
[akpm@linux-foundation.org: remove unused local, per Arun]
[akpm@linux-foundation.org: avoid return of void-returning __free_pages_core(), per Oscar]
[akpm@linux-foundation.org: fix it for mm-convert-totalram_pages-and-totalhigh_pages-variables-to-atomic.patch]
[arunks@codeaurora.org: v8]
  Link: http://lkml.kernel.org/r/1547032395-24582-1-git-send-email-arunks@codeaurora.org
[arunks@codeaurora.org: v9]
  Link: http://lkml.kernel.org/r/1547098543-26452-1-git-send-email-arunks@codeaurora.org
Link: http://lkml.kernel.org/r/1538727006-5727-1-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS 
Reviewed-by: Andrew Morton 
Acked-by: Michal Hocko 
Reviewed-by: Oscar Salvador 
Reviewed-by: Alexander Duyck 
Cc: K. Y. Srinivasan 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Dan Williams 
Cc: Vlastimil Babka 
Cc: Joonsoo Kim 
Cc: Greg Kroah-Hartman 
Cc: Mathieu Malaterre 
Cc: "Kirill A. Shutemov" 
Cc: Souptick Joarder 
Cc: Mel Gorman 
Cc: Aaron Lu 
Cc: Srivatsa Vaddagiri 
Cc: Vinayak Menon 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

x86: respect memory size limiting via mem= parameter

2019-02-18T05:50:34+00:00

When the memory size is limited via the "mem=" kernel parameter, the limit
should be respected even for memory made accessible via a PCI card.

Today this kind of memory won't be made usable during initial memory
setup, as it won't be visible in the E820 map, but it might be added
later when PCI devices are added, due to corresponding ACPI table entries.

This can be corrected by adding a global max_mem_size variable, set by
parse_memopt(), and rejecting any memory area whose addition would push
the memory size above the allowed limit.
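A minimal sketch of the check described above, written as plain C outside the kernel. max_mem_size here only stands in for the global set by parse_memopt(); check_mem_limit() is a hypothetical helper, and returning -E2BIG is an assumption about the error code. The comparison is arranged so that start + size cannot overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for the global set from "mem="; UINT64_MAX means no limit. */
static uint64_t max_mem_size = UINT64_MAX;

/* Reject a hot-added range [start, start + size) that ends above the limit.
 * Equivalent to start + size > max_mem_size, but without overflow. */
static int check_mem_limit(uint64_t start, uint64_t size)
{
	if (size > max_mem_size || start > max_mem_size - size)
		return -E2BIG;
	return 0;
}
```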

Signed-off-by: Juergen Gross 
Acked-by: Ingo Molnar 
Reviewed-by: William Kucharski 
Signed-off-by: Juergen Gross

mm/hotplug: invalid PFNs from pfn_to_online_page()

2019-02-01T23:46:23+00:00

On an arm64 ThunderX2 server, the first kmemleak scan would crash [1]
with CONFIG_DEBUG_VM_PGFLAGS=y because page_to_nid() found a pfn that is
not directly mapped (MEMBLOCK_NOMAP), whose page->flags is therefore
uninitialized.

This is because commit 9f1eb38e0e11 ("mm, kmemleak: little
optimization while scanning") started to use pfn_to_online_page() instead
of pfn_valid().  However, in the CONFIG_MEMORY_HOTPLUG=y case,
pfn_to_online_page() does not call memblock_is_map_memory() while
pfn_valid() does.

Historically, commit 68709f45385a ("arm64: only consider memblocks
with NOMAP cleared for linear mapping") caused pages marked as nomap
to no longer be reassigned to the new zone in memmap_init_zone() by
calling __init_single_page().

Commit 2d070eab2e82 ("mm: consider zone which is not fully
populated to have holes") introduced pfn_to_online_page(), which was
designed to return only valid pfns, but it is clearly broken on arm64.

Therefore, let pfn_to_online_page() call pfn_valid_within(), so it can
handle nomap thanks to commit f52bb98f5ade ("arm64: mm: always
enable CONFIG_HOLES_IN_ZONE"), while the check is optimized away on
architectures that do not have HOLES_IN_ZONE.
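A toy userspace model of the fix, not kernel code: memory is split into sections, each with an "online" bit, and individual pfns inside an online section may still be holes (as MEMBLOCK_NOMAP pfns are on arm64). The sizes and all names (struct toy_mem, online_old(), online_new()) are hypothetical. The only point is that a section-level check alone accepts the hole, while adding a per-pfn validity check, which is the role pfn_valid_within() plays when HOLES_IN_ZONE is enabled, rejects it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy memory model (hypothetical sizes): 4 sections of 8 pfns each. */
#define PFNS_PER_SECTION 8UL
#define NR_SECTIONS      4UL

struct toy_mem {
	bool section_online[NR_SECTIONS];
	/* true where the pfn has an initialized memmap entry; false models a
	 * NOMAP hole inside an otherwise online section */
	bool pfn_valid[NR_SECTIONS * PFNS_PER_SECTION];
};

/* Before the fix: only the section-level online bit is consulted. */
static bool online_old(const struct toy_mem *m, unsigned long pfn)
{
	unsigned long nr = pfn / PFNS_PER_SECTION;

	return nr < NR_SECTIONS && m->section_online[nr];
}

/* After the fix: additionally check the individual pfn. Short-circuiting
 * guarantees pfn is in range before pfn_valid[] is indexed. */
static bool online_new(const struct toy_mem *m, unsigned long pfn)
{
	return online_old(m, pfn) && m->pfn_valid[pfn];
}
```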

[1]
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000006
  Mem abort info:
    ESR = 0x96000005
    Exception class = DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
  Data abort info:
    ISV = 0, ISS = 0x00000005
    CM = 0, WnR = 0
  Internal error: Oops: 96000005 [#1] SMP
  CPU: 60 PID: 1408 Comm: kmemleak Not tainted 5.0.0-rc2+ #8
  pstate: 60400009 (nZCv daif +PAN -UAO)
  pc : page_mapping+0x24/0x144
  lr : __dump_page+0x34/0x3dc
  sp : ffff00003a5cfd10
  x29: ffff00003a5cfd10 x28: 000000000000802f
  x27: 0000000000000000 x26: 0000000000277d00
  x25: ffff000010791f56 x24: ffff7fe000000000
  x23: ffff000010772f8b x22: ffff00001125f670
  x21: ffff000011311000 x20: ffff000010772f8b
  x19: fffffffffffffffe x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000
  x15: 0000000000000000 x14: ffff802698b19600
  x13: ffff802698b1a200 x12: ffff802698b16f00
  x11: ffff802698b1a400 x10: 0000000000001400
  x9 : 0000000000000001 x8 : ffff00001121a000
  x7 : 0000000000000000 x6 : ffff0000102c53b8
  x5 : 0000000000000000 x4 : 0000000000000003
  x3 : 0000000000000100 x2 : 0000000000000000
  x1 : ffff000010772f8b x0 : ffffffffffffffff
  Process kmemleak (pid: 1408, stack limit = 0x(____ptrval____))
  Call trace:
   page_mapping+0x24/0x144
   __dump_page+0x34/0x3dc
   dump_page+0x28/0x4c
   kmemleak_scan+0x4ac/0x680
   kmemleak_scan_thread+0xb4/0xdc
   kthread+0x12c/0x13c
   ret_from_fork+0x10/0x18
  Code: d503201f f9400660 36000040 d1000413 (f9400661)
  ---[ end trace 4d4bd7f573490c8e ]---
  Kernel panic - not syncing: Fatal exception
  SMP: stopping secondary CPUs
  Kernel Offset: disabled
  CPU features: 0x002,20000c38
  Memory Limit: none
  ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/20190122132916.28360-1-cai@lca.pw
Fixes: 9f1eb38e0e11 ("mm, kmemleak: little optimization while scanning")
Signed-off-by: Qian Cai 
Acked-by: Michal Hocko 
Cc: Oscar Salvador 
Cc: Catalin Marinas 
Cc: Vlastimil Babka 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

include/linux/memory_hotplug.h: remove duplicate declaration of offline_pages()

2018-12-28T20:11:50+00:00

offline_pages() is already declared in this file.

Just remove the duplicated one.

Link: http://lkml.kernel.org/r/20181205031357.24769-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang 
Reviewed-by: David Hildenbrand 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, sparse: pass nid instead of pgdat to sparse_add_one_section()

2018-12-28T20:11:49+00:00

Since the only information sparse_add_one_section() needs is the node id,
used to allocate memory on the proper node, it is not necessary to pass
the pgdat.

This patch changes the prototype of sparse_add_one_section() to pass the
node id directly.  This is intended to avoid the misleading impression
that sparse_add_one_section() touches the pgdat.

Link: http://lkml.kernel.org/r/20181204085657.20472-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang 
Reviewed-by: David Hildenbrand 
Acked-by: Michal Hocko 
Cc: Dave Hansen 
Cc: Oscar Salvador 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm, memory_hotplug: add nid parameter to arch_remove_memory

2018-12-28T20:11:49+00:00

Patch series "Do not touch pages in hot-remove path", v2.

This patchset aims for two things:

 1) A better definition about offline and hot-remove stage
 2) Solving bugs where we can access non-initialized pages
    during hot-remove operations [2] [3].

This is achieved by moving all page/zone handling to the offline
stage, so we do not need to access pages when hot-removing memory.

[1] https://patchwork.kernel.org/cover/10691415/
[2] https://patchwork.kernel.org/patch/10547445/
[3] https://www.spinics.net/lists/linux-mm/msg161316.html

This patch (of 5):

This is a preparation for the follow-up patches.  Passing the nid will
allow us to get rid of the zone parameter afterwards.

Link: http://lkml.kernel.org/r/20181127162005.15833-2-osalvador@suse.de
Signed-off-by: Oscar Salvador 
Reviewed-by: David Hildenbrand 
Reviewed-by: Pavel Tatashin 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Jerome Glisse 
Cc: Jonathan Cameron 
Cc: "Rafael J. Wysocki" 

Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/memory_hotplug: drop "online" parameter from add_memory_resource()

2018-12-28T20:11:48+00:00

Userspace should always be in charge of how to online memory and whether
memory should be onlined automatically in the kernel.  Let's drop the
parameter that allows overriding this: XEN passes memhp_auto_online, just
like add_memory(), so we can use that directly internally instead.

Link: http://lkml.kernel.org/r/20181123123740.27652-1-david@redhat.com
Signed-off-by: David Hildenbrand 
Acked-by: Michal Hocko 
Reviewed-by: Oscar Salvador 
Acked-by: Juergen Gross 
Cc: Boris Ostrovsky 
Cc: Stefano Stabellini 
Cc: Dan Williams 
Cc: Pavel Tatashin 
Cc: David Hildenbrand 
Cc: Joonsoo Kim 
Cc: Arun KS 
Cc: Mathieu Malaterre 
Cc: Stephen Rothwell 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/memory_hotplug: make add_memory() take the device_hotplug_lock

2018-10-31T15:54:17+00:00

add_memory() currently does not take the device_hotplug_lock; however,
it is already called under the lock from
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar operations.

In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline requests (e.g.  from user space), which
have already resulted in lock inversions due to device_lock() and
mem_hotplug_lock; see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock").  add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can only be accessed from user space (e.g. via .online/.state) once the
memory has been fully added to the system.

The lock is not yet held in
	drivers/xen/balloon.c
	arch/powerpc/platforms/powernv/memtrace.c
	drivers/s390/char/sclp_cmd.c
	drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.

Don't export add_memory_resource(): it was once exported for use by XEN,
which is never built as a module.  If somebody requires it, we will also
have to export a locked variant (as device_hotplug_lock is never
exported).
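A userspace sketch of the locked/unlocked split this patch introduces, assuming a pthread mutex as a stand-in for device_hotplug_lock. add_memory_locked() and add_memory_nolock() are hypothetical names; in the kernel the split appears as add_memory() plus a separate __add_memory() for callers that already hold the lock, but the sketch only illustrates the pattern, not the real code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for the kernel's device_hotplug_lock (assumption: a plain mutex). */
static pthread_mutex_t device_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Toy state that would race without the lock, e.g. created memory blocks. */
static unsigned long nr_memory_blocks;

/* Unlocked variant: the caller must already hold device_hotplug_lock. */
static int add_memory_nolock(int nid, unsigned long long start,
			     unsigned long long size)
{
	(void)nid;
	(void)start;
	if (size == 0)
		return -EINVAL;
	nr_memory_blocks++;	/* models creating a memory block device */
	return 0;
}

/* Locked variant: the only entry point meant for external callers. */
static int add_memory_locked(int nid, unsigned long long start,
			     unsigned long long size)
{
	int rc;

	pthread_mutex_lock(&device_hotplug_lock);
	rc = add_memory_nolock(nid, start, size);
	pthread_mutex_unlock(&device_hotplug_lock);
	return rc;
}
```

Callers that already hold the lock (like the ACPI and pseries paths listed above) would call the unlocked variant directly; everyone else goes through the locked one, so the lock never has to be exported.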

Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com
Signed-off-by: David Hildenbrand 
Reviewed-by: Pavel Tatashin 
Reviewed-by: Rafael J. Wysocki 
Reviewed-by: Rashmica Gupta 
Reviewed-by: Oscar Salvador 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Greg Kroah-Hartman 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Nathan Fontenot 
Cc: John Allen 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Joonsoo Kim 
Cc: Vlastimil Babka 
Cc: Mathieu Malaterre 
Cc: Pavel Tatashin 
Cc: YASUAKI ISHIMATSU 
Cc: Balbir Singh 
Cc: Haiyang Zhang 
Cc: Heiko Carstens 
Cc: Jonathan Corbet 
Cc: Kate Stewart 
Cc: "K. Y. Srinivasan" 
Cc: Martin Schwidefsky 
Cc: Michael Neuling 
Cc: Philippe Ombredanne 
Cc: Stephen Hemminger 
Cc: Thomas Gleixner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm/memory_hotplug: make remove_memory() take the device_hotplug_lock

2018-10-31T15:54:17+00:00

Patch series "mm: online/offline_pages called w.o. mem_hotplug_lock", v3.

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock.  And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
	echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
	echo 1 > /sys/devices/system/memory/memory9/online
will not take the mem_hotplug_lock, only the device_lock() and the
device_hotplug_lock.

E.g.  via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock, so we can
have concurrent callers in online_pages().  In online_pages() we then
touch, e.g., zone->present_pages basically unprotected.

There seems to be a longer history to this (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible.  We
would e.g.  have to take the mem_hotplug_lock in drivers/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.

I propose the general rules (documentation added in patch 6):

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
   already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock. This is already documented and true for now in core
   code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages.
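The rules above amount to a fixed lock order: device_hotplug_lock is taken by the caller on the outside, and mem_hotplug_lock is taken inside the hotplug operations themselves. A userspace sketch, assuming a pthread mutex and rwlock as stand-ins for the kernel locks; device_hotplug_held is a crude, process-wide substitute for lockdep, and all toy_/ *_sketch names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t device_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t mem_hotplug_lock = PTHREAD_RWLOCK_INITIALIZER;
static bool device_hotplug_held;	/* toy check, not per-thread */
static long present_pages;		/* stands in for zone->present_pages */

/* Rules 1-3: callers take the outer lock. */
static void toy_lock_device_hotplug(void)
{
	pthread_mutex_lock(&device_hotplug_lock);
	device_hotplug_held = true;
}

static void toy_unlock_device_hotplug(void)
{
	device_hotplug_held = false;
	pthread_mutex_unlock(&device_hotplug_lock);
}

/* Rule 4: the operation takes mem_hotplug_lock itself, nested inside. */
static void online_pages_sketch(long nr_pages)
{
	assert(device_hotplug_held);
	pthread_rwlock_wrlock(&mem_hotplug_lock);
	present_pages += nr_pages;
	pthread_rwlock_unlock(&mem_hotplug_lock);
}

static void offline_pages_sketch(long nr_pages)
{
	assert(device_hotplug_held);
	pthread_rwlock_wrlock(&mem_hotplug_lock);
	present_pages -= nr_pages;
	pthread_rwlock_unlock(&mem_hotplug_lock);
}
```

Because every path acquires the two locks in the same order, the inversion between mem_hotplug_lock and device_lock() described above cannot arise within this model.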

To me, this looks way cleaner than what we have right now (and easier to
verify).  And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.

This patch (of 6):

remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported.  So let's provide a variant
that takes the lock and only export that one.

The lock is already held in
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
	arch/powerpc/platforms/powernv/memtrace.c

Apart from that, there are no other users in the tree.

Link: http://lkml.kernel.org/r/20180925091457.28651-2-david@redhat.com
Signed-off-by: David Hildenbrand 
Reviewed-by: Pavel Tatashin 
Reviewed-by: Rafael J. Wysocki 
Reviewed-by: Rashmica Gupta 
Reviewed-by: Oscar Salvador 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Rashmica Gupta 
Cc: Michael Neuling 
Cc: Balbir Singh 
Cc: Nathan Fontenot 
Cc: John Allen 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Joonsoo Kim 
Cc: Vlastimil Babka 
Cc: Greg Kroah-Hartman 
Cc: YASUAKI ISHIMATSU 
Cc: Mathieu Malaterre 
Cc: Boris Ostrovsky 
Cc: Haiyang Zhang 
Cc: Heiko Carstens 
Cc: Jonathan Corbet 
Cc: Juergen Gross 
Cc: Kate Stewart 
Cc: "K. Y. Srinivasan" 
Cc: Martin Schwidefsky 
Cc: Philippe Ombredanne 
Cc: Stephen Hemminger 
Cc: Thomas Gleixner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds