<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/node.h, branch v6.0-rc1</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>drivers/base/node: rename link_mem_sections() to register_memory_block_under_node()</title>
<updated>2022-03-22T22:57:10+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2022-03-22T21:47:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cc6515591b25f08ce199e9379844a964f52a27f2'/>
<id>cc6515591b25f08ce199e9379844a964f52a27f2</id>
<content type='text'>
Patch series "drivers/base/memory: determine and store zone for single-zone memory blocks", v2.

I remember talking to Michal in the past about removing
test_pages_in_a_zone(), which we use for:
* verifying that a memory block we intend to offline is really only managed
  by a single zone. We don't support offlining of memory blocks that are
  managed by multiple zones (e.g., multiple nodes, DMA and DMA32)
* exposing that zone to user space via
  /sys/devices/system/memory/memory*/valid_zones

Now that I identified some more cases where test_pages_in_a_zone() might
go wrong, and we received an UBSAN report (see patch #3), let's get rid of
this PFN walker.

So instead of detecting the zone at runtime with test_pages_in_a_zone() by
scanning the memmap, let's determine and remember for each memory block if
it's managed by a single zone.  The stored zone can then be used for the
above two cases, avoiding a manual lookup using test_pages_in_a_zone().

This avoids eventually stumbling over uninitialized memmaps in corner
cases, especially when ZONE_DEVICE ranges partly fall into memory block
(that are responsible for managing System RAM).

Handling memory onlining is easy, because we online to exactly one zone.
Handling boot memory is more tricky, because we want to avoid scanning all
zones of all nodes to detect possible zones that overlap with the physical
memory region of interest.  Fortunately, we already have code that
determines the applicable nodes for a memory block, to create sysfs links
-- we'll hook into that.

Patch #1 is a simple cleanup I had laying around for a longer time.
Patch #2 contains the main logic to remove test_pages_in_a_zone() and
further details.

[1] https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com
[2] https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com

This patch (of 2):

Let's adjust the stale terminology, making it match
unregister_memory_block_under_nodes() and
do_register_memory_block_under_node().  We're dealing with memory block
devices, which span 1..X memory sections.

Link: https://lkml.kernel.org/r/20220210184359.235565-1-david@redhat.com
Link: https://lkml.kernel.org/r/20220210184359.235565-2-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Rafael Parra &lt;rparrazo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "drivers/base/memory: determine and store zone for single-zone memory blocks", v2.

I remember talking to Michal in the past about removing
test_pages_in_a_zone(), which we use for:
* verifying that a memory block we intend to offline is really only managed
  by a single zone. We don't support offlining of memory blocks that are
  managed by multiple zones (e.g., multiple nodes, DMA and DMA32)
* exposing that zone to user space via
  /sys/devices/system/memory/memory*/valid_zones

Now that I identified some more cases where test_pages_in_a_zone() might
go wrong, and we received an UBSAN report (see patch #3), let's get rid of
this PFN walker.

So instead of detecting the zone at runtime with test_pages_in_a_zone() by
scanning the memmap, let's determine and remember for each memory block if
it's managed by a single zone.  The stored zone can then be used for the
above two cases, avoiding a manual lookup using test_pages_in_a_zone().

This avoids eventually stumbling over uninitialized memmaps in corner
cases, especially when ZONE_DEVICE ranges partly fall into memory block
(that are responsible for managing System RAM).

Handling memory onlining is easy, because we online to exactly one zone.
Handling boot memory is more tricky, because we want to avoid scanning all
zones of all nodes to detect possible zones that overlap with the physical
memory region of interest.  Fortunately, we already have code that
determines the applicable nodes for a memory block, to create sysfs links
-- we'll hook into that.

Patch #1 is a simple cleanup I had laying around for a longer time.
Patch #2 contains the main logic to remove test_pages_in_a_zone() and
further details.

[1] https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com
[2] https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com

This patch (of 2):

Let's adjust the stale terminology, making it match
unregister_memory_block_under_nodes() and
do_register_memory_block_under_node().  We're dealing with memory block
devices, which span 1..X memory sections.

Link: https://lkml.kernel.org/r/20220210184359.235565-1-david@redhat.com
Link: https://lkml.kernel.org/r/20220210184359.235565-2-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Rafael Parra &lt;rparrazo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drivers/base/node: consolidate node device subsystem initialization in node_dev_init()</title>
<updated>2022-03-22T22:57:10+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2022-03-22T21:47:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2848a28b0a6052a4c8450397d2647d7d8e3f6f06'/>
<id>2848a28b0a6052a4c8450397d2647d7d8e3f6f06</id>
<content type='text'>
...  and call node_dev_init() after memory_dev_init() from driver_init(),
so before any of the existing arch/subsys calls.  All online nodes should
be known at that point: early during boot, arch code determines node and
zone ranges and sets the relevant nodes online; usually this happens in
setup_arch().

This is in line with memory_dev_init(), which initializes the memory
device subsystem and creates all memory block devices.

Similar to memory_dev_init(), panic() if anything goes wrong, we don't
want to continue with such basic initialization errors.

The important part is that node_dev_init() gets called after
memory_dev_init() and after cpu_dev_init(), but before any of the relevant
archs call register_cpu() to register the new cpu device under the node
device.  The latter should be the case for the current users of
topology_init().

Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Tested-by: Anatoly Pugachev &lt;matorola@gmail.com&gt; (sparc64)
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
...  and call node_dev_init() after memory_dev_init() from driver_init(),
so before any of the existing arch/subsys calls.  All online nodes should
be known at that point: early during boot, arch code determines node and
zone ranges and sets the relevant nodes online; usually this happens in
setup_arch().

This is in line with memory_dev_init(), which initializes the memory
device subsystem and creates all memory block devices.

Similar to memory_dev_init(), panic() if anything goes wrong, we don't
want to continue with such basic initialization errors.

The important part is that node_dev_init() gets called after
memory_dev_init() and after cpu_dev_init(), but before any of the relevant
archs call register_cpu() to register the new cpu device under the node
device.  The latter should be the case for the current users of
topology_init().

Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Tested-by: Anatoly Pugachev &lt;matorola@gmail.com&gt; (sparc64)
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>NUMA Balancing: add page promotion counter</title>
<updated>2022-03-22T22:57:09+00:00</updated>
<author>
<name>Huang Ying</name>
<email>ying.huang@intel.com</email>
</author>
<published>2022-03-22T21:46:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e39bb6be9f2b39a6dbaeff484361de76021b175d'/>
<id>e39bb6be9f2b39a6dbaeff484361de76021b175d</id>
<content type='text'>
Patch series "NUMA balancing: optimize memory placement for memory tiering system", v13

With the advent of various new memory types, some machines will have
multiple types of memory, e.g.  DRAM and PMEM (persistent memory).  The
memory subsystem of these machines can be called memory tiering system,
because the performance of the different types of memory are different.

After commit c221c0b0308f ("device-dax: "Hotplug" persistent memory for
use like normal RAM"), the PMEM could be used as the cost-effective
volatile memory in separate NUMA nodes.  In a typical memory tiering
system, there are CPUs, DRAM and PMEM in each physical NUMA node.  The
CPUs and the DRAM will be put in one logical node, while the PMEM will
be put in another (faked) logical node.

To optimize the system overall performance, the hot pages should be
placed in DRAM node.  To do that, we need to identify the hot pages in
the PMEM node and migrate them to DRAM node via NUMA migration.

In the original NUMA balancing, there are already a set of existing
mechanisms to identify the pages recently accessed by the CPUs in a node
and migrate the pages to the node.  So we can reuse these mechanisms to
build the mechanisms to optimize the page placement in the memory
tiering system.  This is implemented in this patchset.

At the other hand, the cold pages should be placed in PMEM node.  So, we
also need to identify the cold pages in the DRAM node and migrate them
to PMEM node.

In commit 26aa2d199d6f ("mm/migrate: demote pages during reclaim"), a
mechanism to demote the cold DRAM pages to PMEM node under memory
pressure is implemented.  Based on that, the cold DRAM pages can be
demoted to PMEM node proactively to free some memory space on DRAM node
to accommodate the promoted hot PMEM pages.  This is implemented in this
patchset too.

We have tested the solution with the pmbench memory accessing benchmark
with the 80:20 read/write ratio and the Gauss access address
distribution on a 2 socket Intel server with Optane DC Persistent Memory
Model.  The test results shows that the pmbench score can improve up to
95.9%.

This patch (of 3):

In a system with multiple memory types, e.g.  DRAM and PMEM, the CPU
and DRAM in one socket will be put in one NUMA node as before, while
the PMEM will be put in another NUMA node as described in the
description of the commit c221c0b0308f ("device-dax: "Hotplug"
persistent memory for use like normal RAM").  So, the NUMA balancing
mechanism will identify all PMEM accesses as remote access and try to
promote the PMEM pages to DRAM.

To distinguish the number of the inter-type promoted pages from that of
the inter-socket migrated pages.  A new vmstat count is added.  The
counter is per-node (count in the target node).  So this can be used to
identify promotion imbalance among the NUMA nodes.

Link: https://lkml.kernel.org/r/20220301085329.3210428-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20220221084529.1052339-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20220221084529.1052339-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: Yang Shi &lt;shy828301@gmail.com&gt;
Tested-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: zhongjiang-ali &lt;zhongjiang-ali@linux.alibaba.com&gt;
Cc: Feng Tang &lt;feng.tang@intel.com&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "NUMA balancing: optimize memory placement for memory tiering system", v13

With the advent of various new memory types, some machines will have
multiple types of memory, e.g.  DRAM and PMEM (persistent memory).  The
memory subsystem of these machines can be called memory tiering system,
because the performance of the different types of memory are different.

After commit c221c0b0308f ("device-dax: "Hotplug" persistent memory for
use like normal RAM"), the PMEM could be used as the cost-effective
volatile memory in separate NUMA nodes.  In a typical memory tiering
system, there are CPUs, DRAM and PMEM in each physical NUMA node.  The
CPUs and the DRAM will be put in one logical node, while the PMEM will
be put in another (faked) logical node.

To optimize the system overall performance, the hot pages should be
placed in DRAM node.  To do that, we need to identify the hot pages in
the PMEM node and migrate them to DRAM node via NUMA migration.

In the original NUMA balancing, there are already a set of existing
mechanisms to identify the pages recently accessed by the CPUs in a node
and migrate the pages to the node.  So we can reuse these mechanisms to
build the mechanisms to optimize the page placement in the memory
tiering system.  This is implemented in this patchset.

At the other hand, the cold pages should be placed in PMEM node.  So, we
also need to identify the cold pages in the DRAM node and migrate them
to PMEM node.

In commit 26aa2d199d6f ("mm/migrate: demote pages during reclaim"), a
mechanism to demote the cold DRAM pages to PMEM node under memory
pressure is implemented.  Based on that, the cold DRAM pages can be
demoted to PMEM node proactively to free some memory space on DRAM node
to accommodate the promoted hot PMEM pages.  This is implemented in this
patchset too.

We have tested the solution with the pmbench memory accessing benchmark
with the 80:20 read/write ratio and the Gauss access address
distribution on a 2 socket Intel server with Optane DC Persistent Memory
Model.  The test results shows that the pmbench score can improve up to
95.9%.

This patch (of 3):

In a system with multiple memory types, e.g.  DRAM and PMEM, the CPU
and DRAM in one socket will be put in one NUMA node as before, while
the PMEM will be put in another NUMA node as described in the
description of the commit c221c0b0308f ("device-dax: "Hotplug"
persistent memory for use like normal RAM").  So, the NUMA balancing
mechanism will identify all PMEM accesses as remote access and try to
promote the PMEM pages to DRAM.

To distinguish the number of the inter-type promoted pages from that of
the inter-socket migrated pages.  A new vmstat count is added.  The
counter is per-node (count in the target node).  So this can be used to
identify promotion imbalance among the NUMA nodes.

Link: https://lkml.kernel.org/r/20220301085329.3210428-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20220221084529.1052339-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20220221084529.1052339-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: Yang Shi &lt;shy828301@gmail.com&gt;
Tested-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: zhongjiang-ali &lt;zhongjiang-ali@linux.alibaba.com&gt;
Cc: Feng Tang &lt;feng.tang@intel.com&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE</title>
<updated>2021-11-06T20:30:42+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2021-11-05T20:44:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=50f9481ed9fb8a2d2a06a155634c7f9eeff9fa61'/>
<id>50f9481ed9fb8a2d2a06a155634c7f9eeff9fa61</id>
<content type='text'>
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.

Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Shuah Khan &lt;skhan@linuxfoundation.org&gt;	[kselftest]
Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Acked-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Alex Shi &lt;alexs@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jason Wang &lt;jasowang@redhat.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: "Michael S. Tsirkin" &lt;mst@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.

Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Shuah Khan &lt;skhan@linuxfoundation.org&gt;	[kselftest]
Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Acked-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Alex Shi &lt;alexs@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jason Wang &lt;jasowang@redhat.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: "Michael S. Tsirkin" &lt;mst@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: don't panic when links can't be created in sysfs</title>
<updated>2020-10-16T18:11:18+00:00</updated>
<author>
<name>Laurent Dufour</name>
<email>ldufour@linux.ibm.com</email>
</author>
<published>2020-10-16T03:09:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=90c7eaeb14a325a760d732184ff1fbed47e5fa98'/>
<id>90c7eaeb14a325a760d732184ff1fbed47e5fa98</id>
<content type='text'>
At boot time, or when doing memory hot-add operations, if the links in
sysfs can't be created, the system is still able to run, so just report
the error in the kernel log rather than BUG_ON and potentially make system
unusable because the callpath can be called with locks held.

Since the number of memory blocks managed could be high, the messages are
rate limited.

As a consequence, link_mem_sections() has no status to report anymore.

Signed-off-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Nathan Lynch &lt;nathanl@linux.ibm.com&gt;
Cc: "Rafael J . Wysocki" &lt;rafael@kernel.org&gt;
Cc: Scott Cheloha &lt;cheloha@linux.ibm.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200915094143.79181-4-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
At boot time, or when doing memory hot-add operations, if the links in
sysfs can't be created, the system is still able to run, so just report
the error in the kernel log rather than BUG_ON and potentially make system
unusable because the callpath can be called with locks held.

Since the number of memory blocks managed could be high, the messages are
rate limited.

As a consequence, link_mem_sections() has no status to report anymore.

Signed-off-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Nathan Lynch &lt;nathanl@linux.ibm.com&gt;
Cc: "Rafael J . Wysocki" &lt;rafael@kernel.org&gt;
Cc: Scott Cheloha &lt;cheloha@linux.ibm.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200915094143.79181-4-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: don't rely on system state to detect hot-plug operations</title>
<updated>2020-09-26T17:33:57+00:00</updated>
<author>
<name>Laurent Dufour</name>
<email>ldufour@linux.ibm.com</email>
</author>
<published>2020-09-26T04:19:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f85086f95fa36194eb0db5cd5c12e56801b98523'/>
<id>f85086f95fa36194eb0db5cd5c12e56801b98523</id>
<content type='text'>
In register_mem_sect_under_node() the system_state's value is checked to
detect whether the call is made during boot time or during an hot-plug
operation.  Unfortunately, that check against SYSTEM_BOOTING is wrong
because regular memory is registered at SYSTEM_SCHEDULING state.  In
addition, memory hot-plug operation can be triggered at this system
state by the ACPI [1].  So checking against the system state is not
enough.

The consequence is that on system with interleaved node's ranges like this:

 Early memory node ranges
   node   1: [mem 0x0000000000000000-0x000000011fffffff]
   node   2: [mem 0x0000000120000000-0x000000014fffffff]
   node   1: [mem 0x0000000150000000-0x00000001ffffffff]
   node   0: [mem 0x0000000200000000-0x000000048fffffff]
   node   2: [mem 0x0000000490000000-0x00000007ffffffff]

This can be seen on PowerPC LPAR after multiple memory hot-plug and
hot-unplug operations are done.  At the next reboot the node's memory
ranges can be interleaved and since the call to link_mem_sections() is
made in topology_init() while the system is in the SYSTEM_SCHEDULING
state, the node's id is not checked, and the sections registered to
multiple nodes:

  $ ls -l /sys/devices/system/memory/memory21/node*
  total 0
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node1 -&gt; ../../node/node1
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node2 -&gt; ../../node/node2

In that case, the system is able to boot but if later one of theses
memory blocks is hot-unplugged and then hot-plugged, the sysfs
inconsistency is detected and this is triggering a BUG_ON():

  kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
  CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
  Call Trace:
    add_memory_resource+0x23c/0x340 (unreliable)
    __add_memory+0x5c/0xf0
    dlpar_add_lmb+0x1b4/0x500
    dlpar_memory+0x1f8/0xb80
    handle_dlpar_errorlog+0xc0/0x190
    dlpar_store+0x198/0x4a0
    kobj_attr_store+0x30/0x50
    sysfs_kf_write+0x64/0x90
    kernfs_fop_write+0x1b0/0x290
    vfs_write+0xe8/0x290
    ksys_write+0xdc/0x130
    system_call_exception+0x160/0x270
    system_call_common+0xf0/0x27c

This patch addresses the root cause by not relying on the system_state
value to detect whether the call is due to a hot-plug operation.  An
extra parameter is added to link_mem_sections() detailing whether the
operation is due to a hot-plug operation.

[1] According to Oscar Salvador, using this qemu command line, ACPI
memory hotplug operations are raised at SYSTEM_SCHEDULING state:

  $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \
        -m size=$MEM,slots=255,maxmem=4294967296k  \
        -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \
        -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \
        -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \
        -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \
        -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \
        -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \
        -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \
        -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \

Fixes: 4fbce633910e ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()")
Signed-off-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Nathan Lynch &lt;nathanl@linux.ibm.com&gt;
Cc: Scott Cheloha &lt;cheloha@linux.ibm.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In register_mem_sect_under_node() the system_state's value is checked to
detect whether the call is made during boot time or during an hot-plug
operation.  Unfortunately, that check against SYSTEM_BOOTING is wrong
because regular memory is registered at SYSTEM_SCHEDULING state.  In
addition, memory hot-plug operation can be triggered at this system
state by the ACPI [1].  So checking against the system state is not
enough.

The consequence is that on system with interleaved node's ranges like this:

 Early memory node ranges
   node   1: [mem 0x0000000000000000-0x000000011fffffff]
   node   2: [mem 0x0000000120000000-0x000000014fffffff]
   node   1: [mem 0x0000000150000000-0x00000001ffffffff]
   node   0: [mem 0x0000000200000000-0x000000048fffffff]
   node   2: [mem 0x0000000490000000-0x00000007ffffffff]

This can be seen on PowerPC LPAR after multiple memory hot-plug and
hot-unplug operations are done.  At the next reboot the node's memory
ranges can be interleaved and since the call to link_mem_sections() is
made in topology_init() while the system is in the SYSTEM_SCHEDULING
state, the node's id is not checked, and the sections registered to
multiple nodes:

  $ ls -l /sys/devices/system/memory/memory21/node*
  total 0
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node1 -&gt; ../../node/node1
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node2 -&gt; ../../node/node2

In that case, the system is able to boot but if later one of theses
memory blocks is hot-unplugged and then hot-plugged, the sysfs
inconsistency is detected and this is triggering a BUG_ON():

  kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
  CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
  Call Trace:
    add_memory_resource+0x23c/0x340 (unreliable)
    __add_memory+0x5c/0xf0
    dlpar_add_lmb+0x1b4/0x500
    dlpar_memory+0x1f8/0xb80
    handle_dlpar_errorlog+0xc0/0x190
    dlpar_store+0x198/0x4a0
    kobj_attr_store+0x30/0x50
    sysfs_kf_write+0x64/0x90
    kernfs_fop_write+0x1b0/0x290
    vfs_write+0xe8/0x290
    ksys_write+0xdc/0x130
    system_call_exception+0x160/0x270
    system_call_common+0xf0/0x27c

This patch addresses the root cause by not relying on the system_state
value to detect whether the call is due to a hot-plug operation.  An
extra parameter is added to link_mem_sections() detailing whether the
operation is due to a hot-plug operation.

[1] According to Oscar Salvador, using this qemu command line, ACPI
memory hotplug operations are raised at SYSTEM_SCHEDULING state:

  $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \
        -m size=$MEM,slots=255,maxmem=4294967296k  \
        -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \
        -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \
        -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \
        -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \
        -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \
        -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \
        -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \
        -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \

Fixes: 4fbce633910e ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()")
Signed-off-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Nathan Lynch &lt;nathanl@linux.ibm.com&gt;
Cc: Scott Cheloha &lt;cheloha@linux.ibm.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: make register_mem_sect_under_node() static</title>
<updated>2019-07-19T00:08:06+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-07-18T22:57:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8d595c4c0f768f19db043d378b22e98405f9fd47'/>
<id>8d595c4c0f768f19db043d378b22e98405f9fd47</id>
<content type='text'>
It is only used internally.

Link: http://lkml.kernel.org/r/20190614100114.311-4-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is only used internally.

Link: http://lkml.kernel.org/r/20190614100114.311-4-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory_hotplug: make unregister_memory_block_under_nodes() never fail</title>
<updated>2019-07-19T00:08:06+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-07-18T22:57:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a31b264c2b415b29660da0bc2ba291a98629ce51'/>
<id>a31b264c2b415b29660da0bc2ba291a98629ce51</id>
<content type='text'>
We really don't want anything during memory hotunplug to fail.  We
always pass a valid memory block device, that check can go.  Avoid
allocating memory and eventually failing.  As we are always called under
lock, we can use a static piece of memory.  This avoids having to put
the structure onto the stack, having to guess about the stack size of
callers.

Patch inspired by a patch from Oscar Salvador.

In the future, there might be no need to iterate over nodes at all.
mem-&gt;nid should tell us exactly what to remove.  Memory block devices
with mixed nodes (added during boot) should properly fenced off and
never removed.

Link: http://lkml.kernel.org/r/20190527111152.16324-11-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Wei Yang &lt;richardw.yang@linux.intel.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Andrew Banman &lt;andrew.banman@hpe.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Cc: Arun KS &lt;arunks@codeaurora.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chintan Pandya &lt;cpandya@codeaurora.org&gt;
Cc: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Jun Yao &lt;yaojun8558363@gmail.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Masahiro Yamada &lt;yamada.masahiro@socionext.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: "mike.travis@hpe.com" &lt;mike.travis@hpe.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Rob Herring &lt;robh@kernel.org&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We really don't want anything during memory hotunplug to fail.  We
always pass a valid memory block device, that check can go.  Avoid
allocating memory and eventually failing.  As we are always called under
lock, we can use a static piece of memory.  This avoids having to put
the structure onto the stack, having to guess about the stack size of
callers.

Patch inspired by a patch from Oscar Salvador.

In the future, there might be no need to iterate over nodes at all.
mem-&gt;nid should tell us exactly what to remove.  Memory block devices
with mixed nodes (added during boot) should properly fenced off and
never removed.

Link: http://lkml.kernel.org/r/20190527111152.16324-11-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Wei Yang &lt;richardw.yang@linux.intel.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Andrew Banman &lt;andrew.banman@hpe.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Cc: Arun KS &lt;arunks@codeaurora.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chintan Pandya &lt;cpandya@codeaurora.org&gt;
Cc: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Jun Yao &lt;yaojun8558363@gmail.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Masahiro Yamada &lt;yamada.masahiro@socionext.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: "mike.travis@hpe.com" &lt;mike.travis@hpe.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Rob Herring &lt;robh@kernel.org&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/memory_hotplug: remove memory block devices before arch_remove_memory()</title>
<updated>2019-07-19T00:08:06+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-07-18T22:57:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4c4b7f9ba9486c565aead99a198ceeef73ae81f6'/>
<id>4c4b7f9ba9486c565aead99a198ceeef73ae81f6</id>
<content type='text'>
Let's factor out removing of memory block devices, which is only
necessary for memory added via add_memory() and friends that created
memory block devices.  Remove the devices before calling
arch_remove_memory().

This finishes factoring out memory block device handling from
arch_add_memory() and arch_remove_memory().

Link: http://lkml.kernel.org/r/20190527111152.16324-10-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "mike.travis@hpe.com" &lt;mike.travis@hpe.com&gt;
Cc: Andrew Banman &lt;andrew.banman@hpe.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Arun KS &lt;arunks@codeaurora.org&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chintan Pandya &lt;cpandya@codeaurora.org&gt;
Cc: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Jun Yao &lt;yaojun8558363@gmail.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Masahiro Yamada &lt;yamada.masahiro@socionext.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Rob Herring &lt;robh@kernel.org&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Let's factor out removing of memory block devices, which is only
necessary for memory added via add_memory() and friends that created
memory block devices.  Remove the devices before calling
arch_remove_memory().

This finishes factoring out memory block device handling from
arch_add_memory() and arch_remove_memory().

Link: http://lkml.kernel.org/r/20190527111152.16324-10-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "mike.travis@hpe.com" &lt;mike.travis@hpe.com&gt;
Cc: Andrew Banman &lt;andrew.banman@hpe.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Arun KS &lt;arunks@codeaurora.org&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chintan Pandya &lt;cpandya@codeaurora.org&gt;
Cc: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Jun Yao &lt;yaojun8558363@gmail.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Masahiro Yamada &lt;yamada.masahiro@socionext.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Rob Herring &lt;robh@kernel.org&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>node: Add memory-side caching attributes</title>
<updated>2019-04-04T16:41:21+00:00</updated>
<author>
<name>Keith Busch</name>
<email>keith.busch@intel.com</email>
</author>
<published>2019-03-11T20:56:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=acc02a109b0497e917c83f986a89c51e47d0022c'/>
<id>acc02a109b0497e917c83f986a89c51e47d0022c</id>
<content type='text'>
System memory may have caches to help improve access speed to frequently
requested address ranges. While the system provided cache is transparent
to the software accessing these memory ranges, applications can optimize
their own access based on cache attributes.

Provide a new API for the kernel to register these memory-side caches
under the memory node that provides it.

The new sysfs representation is modeled from the existing cpu cacheinfo
attributes, as seen from /sys/devices/system/cpu/&lt;cpu&gt;/cache/.  Unlike CPU
cacheinfo though, the node cache level is reported from the view of the
memory. A higher level number is nearer to the CPU, while lower levels
are closer to the last level memory.

The exported attributes are the cache size, the line size, associativity
indexing, and write back policy, and add the attributes for the system
memory caches to sysfs stable documentation.

Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Brice Goglin &lt;Brice.Goglin@inria.fr&gt;
Tested-by: Brice Goglin &lt;Brice.Goglin@inria.fr&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
System memory may have caches to help improve access speed to frequently
requested address ranges. While the system provided cache is transparent
to the software accessing these memory ranges, applications can optimize
their own access based on cache attributes.

Provide a new API for the kernel to register these memory-side caches
under the memory node that provides it.

The new sysfs representation is modeled from the existing cpu cacheinfo
attributes, as seen from /sys/devices/system/cpu/&lt;cpu&gt;/cache/.  Unlike CPU
cacheinfo though, the node cache level is reported from the view of the
memory. A higher level number is nearer to the CPU, while lower levels
are closer to the last level memory.

The exported attributes are the cache size, the line size, associativity
indexing, and write back policy, and add the attributes for the system
memory caches to sysfs stable documentation.

Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Brice Goglin &lt;Brice.Goglin@inria.fr&gt;
Tested-by: Brice Goglin &lt;Brice.Goglin@inria.fr&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
