<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/mm/vmstat.c, branch v6.0-rc7</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>mm: add DEVICE_ZONE to FOR_ALL_ZONES</title>
<updated>2022-08-20T22:17:45+00:00</updated>
<author>
<name>Hao Lee</name>
<email>haolee.swjtu@gmail.com</email>
</author>
<published>2022-08-07T15:44:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a39c5d3ce03dd890ab6a9be44b21177cec32da55'/>
<id>a39c5d3ce03dd890ab6a9be44b21177cec32da55</id>
<content type='text'>
FOR_ALL_ZONES should be consistent with enum zone_type.  Otherwise,
__count_zid_vm_events have the potential to add count to wrong item when
zid is ZONE_DEVICE.

Link: https://lkml.kernel.org/r/20220807154442.GA18167@haolee.io
Signed-off-by: Hao Lee &lt;haolee.swjtu@gmail.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
FOR_ALL_ZONES should be consistent with enum zone_type.  Otherwise,
__count_zid_vm_events have the potential to add count to wrong item when
zid is ZONE_DEVICE.

Link: https://lkml.kernel.org/r/20220807154442.GA18167@haolee.io
Signed-off-by: Hao Lee &lt;haolee.swjtu@gmail.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux</title>
<updated>2022-06-04T21:04:27+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-06-04T21:04:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d0e60d46bc03252b8d4ffaaaa0b371970ac16cda'/>
<id>d0e60d46bc03252b8d4ffaaaa0b371970ac16cda</id>
<content type='text'>
Pull bitmap updates from Yury Norov:

 - bitmap: optimize bitmap_weight() usage, from me

 - lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
   Carvalho Chehab

 - include/linux/find: Fix documentation, from Anna-Maria Behnsen

 - bitmap: fix conversion from/to fix-sized arrays, from me

 - bitmap: Fix return values to be unsigned, from Kees Cook

It has been in linux-next for at least a week with no problems.

* tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits)
  nodemask: Fix return values to be unsigned
  bitmap: Fix return values to be unsigned
  KVM: x86: hyper-v: replace bitmap_weight() with hweight64()
  KVM: x86: hyper-v: fix type of valid_bank_mask
  ia64: cleanup remove_siblinginfo()
  drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
  KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate
  lib/bitmap: add test for bitmap_{from,to}_arr64
  lib: add bitmap_{from,to}_arr64
  lib/bitmap: extend comment for bitmap_(from,to)_arr32()
  include/linux/find: Fix documentation
  lib/bitmap.c make bitmap_print_bitmask_to_buf parseable
  MAINTAINERS: add cpumask and nodemask files to BITMAP_API
  arch/x86: replace nodes_weight with nodes_empty where appropriate
  mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
  clocksource: replace cpumask_weight with cpumask_empty in clocksource.c
  genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate
  irq: mips: replace cpumask_weight with cpumask_empty where appropriate
  drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate
  arch/x86: replace cpumask_weight with cpumask_empty where appropriate
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull bitmap updates from Yury Norov:

 - bitmap: optimize bitmap_weight() usage, from me

 - lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro
   Carvalho Chehab

 - include/linux/find: Fix documentation, from Anna-Maria Behnsen

 - bitmap: fix conversion from/to fix-sized arrays, from me

 - bitmap: Fix return values to be unsigned, from Kees Cook

It has been in linux-next for at least a week with no problems.

* tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits)
  nodemask: Fix return values to be unsigned
  bitmap: Fix return values to be unsigned
  KVM: x86: hyper-v: replace bitmap_weight() with hweight64()
  KVM: x86: hyper-v: fix type of valid_bank_mask
  ia64: cleanup remove_siblinginfo()
  drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
  KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate
  lib/bitmap: add test for bitmap_{from,to}_arr64
  lib: add bitmap_{from,to}_arr64
  lib/bitmap: extend comment for bitmap_(from,to)_arr32()
  include/linux/find: Fix documentation
  lib/bitmap.c make bitmap_print_bitmask_to_buf parseable
  MAINTAINERS: add cpumask and nodemask files to BITMAP_API
  arch/x86: replace nodes_weight with nodes_empty where appropriate
  mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate
  clocksource: replace cpumask_weight with cpumask_empty in clocksource.c
  genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate
  irq: mips: replace cpumask_weight with cpumask_empty where appropriate
  drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate
  arch/x86: replace cpumask_weight with cpumask_empty where appropriate
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate</title>
<updated>2022-06-03T13:52:57+00:00</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2022-01-23T18:38:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b55032f1067a02c7f80943dd118060952e8bd7ac'/>
<id>b55032f1067a02c7f80943dd118060952e8bd7ac</id>
<content type='text'>
mm/vmstat.c code calls cpumask_weight() to check if any bit of a given
cpumask is set. We can do it more efficiently with cpumask_empty() because
cpumask_empty() stops traversing the cpumask as soon as it finds first set
bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Acked-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
mm/vmstat.c code calls cpumask_weight() to check if any bit of a given
cpumask is set. We can do it more efficiently with cpumask_empty() because
cpumask_empty() stops traversing the cpumask as soon as it finds first set
bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Acked-by: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: zswap: add basic meminfo and vmstat coverage</title>
<updated>2022-05-19T21:08:53+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2022-05-19T21:08:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f6498b776d280b30a4614d8261840961e993c2c8'/>
<id>f6498b776d280b30a4614d8261840961e993c2c8</id>
<content type='text'>
Currently it requires poking at debugfs to figure out the size and
population of the zswap cache on a host.  There are no counters for reads
and writes against the cache.  As a result, it's difficult to understand
zswap behavior on production systems.

Print zswap memory consumption and how many pages are zswapped out in
/proc/meminfo.  Count zswapouts and zswapins in /proc/vmstat.

Link: https://lkml.kernel.org/r/20220510152847.230957-6-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dan Streetman &lt;ddstreet@ieee.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Seth Jennings &lt;sjenning@redhat.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently it requires poking at debugfs to figure out the size and
population of the zswap cache on a host.  There are no counters for reads
and writes against the cache.  As a result, it's difficult to understand
zswap behavior on production systems.

Print zswap memory consumption and how many pages are zswapped out in
/proc/meminfo.  Count zswapouts and zswapins in /proc/vmstat.

Link: https://lkml.kernel.org/r/20220510152847.230957-6-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dan Streetman &lt;ddstreet@ieee.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Seth Jennings &lt;sjenning@redhat.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/vmstat: add events for ksm cow</title>
<updated>2022-04-29T06:16:16+00:00</updated>
<author>
<name>Yang Yang</name>
<email>yang.yang29@zte.com.cn</email>
</author>
<published>2022-04-29T06:16:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=94bfe85bde18a99612bb5d0ce348643c2d352836'/>
<id>94bfe85bde18a99612bb5d0ce348643c2d352836</id>
<content type='text'>
Users may use ksm by calling madvise(, , MADV_MERGEABLE) when they want to
save memory, it's a tradeoff by suffering delay on ksm cow.  Users can get
to know how much memory ksm saved by reading
/sys/kernel/mm/ksm/pages_sharing, but they don't know what's the costs of
ksm cow, and this is important of some delay sensitive tasks.

So add ksm cow events to help users evaluate whether or how to use ksm. 
Also update Documentation/admin-guide/mm/ksm.rst with new added events.

Link: https://lkml.kernel.org/r/20220331035616.2390805-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: xu xin &lt;xu.xin16@zte.com.cn&gt;
Reviewed-by: Ran Xiaokai &lt;ran.xiaokai@zte.com.cn&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Saravanan D &lt;saravanand@fb.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Users may use ksm by calling madvise(, , MADV_MERGEABLE) when they want to
save memory, it's a tradeoff by suffering delay on ksm cow.  Users can get
to know how much memory ksm saved by reading
/sys/kernel/mm/ksm/pages_sharing, but they don't know what's the costs of
ksm cow, and this is important of some delay sensitive tasks.

So add ksm cow events to help users evaluate whether or how to use ksm. 
Also update Documentation/admin-guide/mm/ksm.rst with new added events.

Link: https://lkml.kernel.org/r/20220331035616.2390805-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: xu xin &lt;xu.xin16@zte.com.cn&gt;
Reviewed-by: Ran Xiaokai &lt;ran.xiaokai@zte.com.cn&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Saravanan D &lt;saravanand@fb.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: untangle config dependencies for demote-on-reclaim</title>
<updated>2022-04-29T06:16:09+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2022-04-29T06:16:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7d6e2d96384556a4f30547803be1f606eb805a62'/>
<id>7d6e2d96384556a4f30547803be1f606eb805a62</id>
<content type='text'>
At the time demote-on-reclaim was introduced, it was tied to
CONFIG_HOTPLUG_CPU + CONFIG_MIGRATE, but that is not really accurate.

The only two things we need to depend on are CONFIG_NUMA + CONFIG_MIGRATE,
so clean this up.  Furthermore, we only register the hotplug memory
notifier when the system has CONFIG_MEMORY_HOTPLUG.

Link: https://lkml.kernel.org/r/20220322224016.4574-1-osalvador@suse.de
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Suggested-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Abhishek Goel &lt;huntbag@linux.vnet.ibm.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
At the time demote-on-reclaim was introduced, it was tied to
CONFIG_HOTPLUG_CPU + CONFIG_MIGRATE, but that is not really accurate.

The only two things we need to depend on are CONFIG_NUMA + CONFIG_MIGRATE,
so clean this up.  Furthermore, we only register the hotplug memory
notifier when the system has CONFIG_MEMORY_HOTPLUG.

Link: https://lkml.kernel.org/r/20220322224016.4574-1-osalvador@suse.de
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Suggested-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Abhishek Goel &lt;huntbag@linux.vnet.ibm.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: only re-generate demotion targets when a numa node changes its N_CPU state</title>
<updated>2022-03-22T22:57:11+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2022-03-22T21:47:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=734c15700cdf9062ae98d8b131c6fe873dfad26d'/>
<id>734c15700cdf9062ae98d8b131c6fe873dfad26d</id>
<content type='text'>
Abhishek reported that after patch [1], hotplug operations are taking
roughly double the expected time.  [2]

The reason behind is that the CPU callbacks that
migrate_on_reclaim_init() sets always call set_migration_target_nodes()
whenever a CPU is brought up/down.

But we only care about numa nodes going from having cpus to become
cpuless, and vice versa, as that influences the demotion_target order.

We do already have two CPU callbacks (vmstat_cpu_online() and
vmstat_cpu_dead()) that check exactly that, so get rid of the CPU
callbacks in migrate_on_reclaim_init() and only call
set_migration_target_nodes() from vmstat_cpu_{dead,online}() whenever a
numa node change its N_CPU state.

[1] https://lore.kernel.org/linux-mm/20210721063926.3024591-2-ying.huang@intel.com/
[2] https://lore.kernel.org/linux-mm/eb438ddd-2919-73d4-bd9f-b7eecdd9577a@linux.vnet.ibm.com/

[osalvador@suse.de: add feedback from Huang Ying]
  Link: https://lkml.kernel.org/r/20220314150945.12694-1-osalvador@suse.de

Link: https://lkml.kernel.org/r/20220310120749.23077-1-osalvador@suse.de
Fixes: 884a6e5d1f93b ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Tested-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reported-by: Abhishek Goel &lt;huntbag@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Abhishek Goel &lt;huntbag@linux.vnet.ibm.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Abhishek reported that after patch [1], hotplug operations are taking
roughly double the expected time.  [2]

The reason behind is that the CPU callbacks that
migrate_on_reclaim_init() sets always call set_migration_target_nodes()
whenever a CPU is brought up/down.

But we only care about numa nodes going from having cpus to become
cpuless, and vice versa, as that influences the demotion_target order.

We do already have two CPU callbacks (vmstat_cpu_online() and
vmstat_cpu_dead()) that check exactly that, so get rid of the CPU
callbacks in migrate_on_reclaim_init() and only call
set_migration_target_nodes() from vmstat_cpu_{dead,online}() whenever a
numa node change its N_CPU state.

[1] https://lore.kernel.org/linux-mm/20210721063926.3024591-2-ying.huang@intel.com/
[2] https://lore.kernel.org/linux-mm/eb438ddd-2919-73d4-bd9f-b7eecdd9577a@linux.vnet.ibm.com/

[osalvador@suse.de: add feedback from Huang Ying]
  Link: https://lkml.kernel.org/r/20220314150945.12694-1-osalvador@suse.de

Link: https://lkml.kernel.org/r/20220310120749.23077-1-osalvador@suse.de
Fixes: 884a6e5d1f93b ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Tested-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reported-by: Abhishek Goel &lt;huntbag@linux.vnet.ibm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Abhishek Goel &lt;huntbag@linux.vnet.ibm.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/vmstat: add event for ksm swapping in copy</title>
<updated>2022-03-22T22:57:09+00:00</updated>
<author>
<name>Yang Yang</name>
<email>yang.yang29@zte.com.cn</email>
</author>
<published>2022-03-22T21:46:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4d45c3aff5ebf80d329eba0f90544d20224f612d'/>
<id>4d45c3aff5ebf80d329eba0f90544d20224f612d</id>
<content type='text'>
When faults in from swap what used to be a KSM page and that page had been
swapped in before, system has to make a copy, and leaves remerging the
pages to a later pass of ksmd.

That is not good for performace, we'd better to reduce this kind of copy.
There are some ways to reduce it, for example lessen swappiness or
madvise(, , MADV_MERGEABLE) range.  So add this event to support doing
this tuning.  Just like this patch: "mm, THP, swap: add THP swapping out
fallback counting".

Link: https://lkml.kernel.org/r/20220113023839.758845-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Reviewed-by: Ran Xiaokai &lt;ran.xiaokai@zte.com.cn&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Yang Shi &lt;yang.shi@linux.alibaba.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Saravanan D &lt;saravanand@fb.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When faults in from swap what used to be a KSM page and that page had been
swapped in before, system has to make a copy, and leaves remerging the
pages to a later pass of ksmd.

That is not good for performace, we'd better to reduce this kind of copy.
There are some ways to reduce it, for example lessen swappiness or
madvise(, , MADV_MERGEABLE) range.  So add this event to support doing
this tuning.  Just like this patch: "mm, THP, swap: add THP swapping out
fallback counting".

Link: https://lkml.kernel.org/r/20220113023839.758845-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Reviewed-by: Ran Xiaokai &lt;ran.xiaokai@zte.com.cn&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Yang Shi &lt;yang.shi@linux.alibaba.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Saravanan D &lt;saravanand@fb.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>NUMA Balancing: add page promotion counter</title>
<updated>2022-03-22T22:57:09+00:00</updated>
<author>
<name>Huang Ying</name>
<email>ying.huang@intel.com</email>
</author>
<published>2022-03-22T21:46:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e39bb6be9f2b39a6dbaeff484361de76021b175d'/>
<id>e39bb6be9f2b39a6dbaeff484361de76021b175d</id>
<content type='text'>
Patch series "NUMA balancing: optimize memory placement for memory tiering system", v13

With the advent of various new memory types, some machines will have
multiple types of memory, e.g.  DRAM and PMEM (persistent memory).  The
memory subsystem of these machines can be called memory tiering system,
because the performance of the different types of memory are different.

After commit c221c0b0308f ("device-dax: "Hotplug" persistent memory for
use like normal RAM"), the PMEM could be used as the cost-effective
volatile memory in separate NUMA nodes.  In a typical memory tiering
system, there are CPUs, DRAM and PMEM in each physical NUMA node.  The
CPUs and the DRAM will be put in one logical node, while the PMEM will
be put in another (faked) logical node.

To optimize the system overall performance, the hot pages should be
placed in DRAM node.  To do that, we need to identify the hot pages in
the PMEM node and migrate them to DRAM node via NUMA migration.

In the original NUMA balancing, there are already a set of existing
mechanisms to identify the pages recently accessed by the CPUs in a node
and migrate the pages to the node.  So we can reuse these mechanisms to
build the mechanisms to optimize the page placement in the memory
tiering system.  This is implemented in this patchset.

At the other hand, the cold pages should be placed in PMEM node.  So, we
also need to identify the cold pages in the DRAM node and migrate them
to PMEM node.

In commit 26aa2d199d6f ("mm/migrate: demote pages during reclaim"), a
mechanism to demote the cold DRAM pages to PMEM node under memory
pressure is implemented.  Based on that, the cold DRAM pages can be
demoted to PMEM node proactively to free some memory space on DRAM node
to accommodate the promoted hot PMEM pages.  This is implemented in this
patchset too.

We have tested the solution with the pmbench memory accessing benchmark
with the 80:20 read/write ratio and the Gauss access address
distribution on a 2 socket Intel server with Optane DC Persistent Memory
Model.  The test results shows that the pmbench score can improve up to
95.9%.

This patch (of 3):

In a system with multiple memory types, e.g.  DRAM and PMEM, the CPU
and DRAM in one socket will be put in one NUMA node as before, while
the PMEM will be put in another NUMA node as described in the
description of the commit c221c0b0308f ("device-dax: "Hotplug"
persistent memory for use like normal RAM").  So, the NUMA balancing
mechanism will identify all PMEM accesses as remote access and try to
promote the PMEM pages to DRAM.

To distinguish the number of the inter-type promoted pages from that of
the inter-socket migrated pages.  A new vmstat count is added.  The
counter is per-node (count in the target node).  So this can be used to
identify promotion imbalance among the NUMA nodes.

Link: https://lkml.kernel.org/r/20220301085329.3210428-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20220221084529.1052339-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20220221084529.1052339-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: Yang Shi &lt;shy828301@gmail.com&gt;
Tested-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: zhongjiang-ali &lt;zhongjiang-ali@linux.alibaba.com&gt;
Cc: Feng Tang &lt;feng.tang@intel.com&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "NUMA balancing: optimize memory placement for memory tiering system", v13

With the advent of various new memory types, some machines will have
multiple types of memory, e.g.  DRAM and PMEM (persistent memory).  The
memory subsystem of these machines can be called memory tiering system,
because the performance of the different types of memory are different.

After commit c221c0b0308f ("device-dax: "Hotplug" persistent memory for
use like normal RAM"), the PMEM could be used as the cost-effective
volatile memory in separate NUMA nodes.  In a typical memory tiering
system, there are CPUs, DRAM and PMEM in each physical NUMA node.  The
CPUs and the DRAM will be put in one logical node, while the PMEM will
be put in another (faked) logical node.

To optimize the system overall performance, the hot pages should be
placed in DRAM node.  To do that, we need to identify the hot pages in
the PMEM node and migrate them to DRAM node via NUMA migration.

In the original NUMA balancing, there are already a set of existing
mechanisms to identify the pages recently accessed by the CPUs in a node
and migrate the pages to the node.  So we can reuse these mechanisms to
build the mechanisms to optimize the page placement in the memory
tiering system.  This is implemented in this patchset.

At the other hand, the cold pages should be placed in PMEM node.  So, we
also need to identify the cold pages in the DRAM node and migrate them
to PMEM node.

In commit 26aa2d199d6f ("mm/migrate: demote pages during reclaim"), a
mechanism to demote the cold DRAM pages to PMEM node under memory
pressure is implemented.  Based on that, the cold DRAM pages can be
demoted to PMEM node proactively to free some memory space on DRAM node
to accommodate the promoted hot PMEM pages.  This is implemented in this
patchset too.

We have tested the solution with the pmbench memory accessing benchmark
with the 80:20 read/write ratio and the Gauss access address
distribution on a 2 socket Intel server with Optane DC Persistent Memory
Model.  The test results shows that the pmbench score can improve up to
95.9%.

This patch (of 3):

In a system with multiple memory types, e.g.  DRAM and PMEM, the CPU
and DRAM in one socket will be put in one NUMA node as before, while
the PMEM will be put in another NUMA node as described in the
description of the commit c221c0b0308f ("device-dax: "Hotplug"
persistent memory for use like normal RAM").  So, the NUMA balancing
mechanism will identify all PMEM accesses as remote access and try to
promote the PMEM pages to DRAM.

To distinguish the number of the inter-type promoted pages from that of
the inter-socket migrated pages.  A new vmstat count is added.  The
counter is per-node (count in the target node).  So this can be used to
identify promotion imbalance among the NUMA nodes.

Link: https://lkml.kernel.org/r/20220301085329.3210428-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20220221084529.1052339-1-ying.huang@intel.com
Link: https://lkml.kernel.org/r/20220221084529.1052339-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: Yang Shi &lt;shy828301@gmail.com&gt;
Tested-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: Wei Xu &lt;weixugc@google.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: zhongjiang-ali &lt;zhongjiang-ali@linux.alibaba.com&gt;
Cc: Feng Tang &lt;feng.tang@intel.com&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm/vmstat: add events for THP max_ptes_* exceeds</title>
<updated>2022-01-15T14:30:29+00:00</updated>
<author>
<name>Yang Yang</name>
<email>yang.yang29@zte.com.cn</email>
</author>
<published>2022-01-14T22:07:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e9ea874a8ffb0f8ebed4f4981531a32c5b663d79'/>
<id>e9ea874a8ffb0f8ebed4f4981531a32c5b663d79</id>
<content type='text'>
There are interfaces to adjust max_ptes_none, max_ptes_swap,
max_ptes_shared values, see
  /sys/kernel/mm/transparent_hugepage/khugepaged/.

But system administrator may not know which value is the best.  So Add
those events to support adjusting max_ptes_* to suitable values.

For example, if default max_ptes_swap value causes too much failures,
and system uses zram whose IO is fast, administrator could increase
max_ptes_swap until THP_SCAN_EXCEED_SWAP_PTE not increase anymore.

Link: https://lkml.kernel.org/r/20211225094036.574157-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Saravanan D &lt;saravanand@fb.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are interfaces to adjust max_ptes_none, max_ptes_swap,
max_ptes_shared values, see
  /sys/kernel/mm/transparent_hugepage/khugepaged/.

But system administrator may not know which value is the best.  So Add
those events to support adjusting max_ptes_* to suitable values.

For example, if default max_ptes_swap value causes too much failures,
and system uses zram whose IO is fast, administrator could increase
max_ptes_swap until THP_SCAN_EXCEED_SWAP_PTE not increase anymore.

Link: https://lkml.kernel.org/r/20211225094036.574157-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Cc: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Saravanan D &lt;saravanand@fb.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
