From 1f6c6ac03db4a8fface0ad34ba12e5ffb4d51327 Mon Sep 17 00:00:00 2001 From: Libo Chen Date: Wed, 23 Apr 2025 19:45:22 -0700 Subject: sched/numa: skip VMA scanning on memory pinned to one NUMA node via cpuset.mems MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Patch series "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems", v5. This patch (of 2): When the memory of the current task is pinned to one NUMA node by cgroup, there is no point in continuing the rest of VMA scanning and hinting page faults as they will just be overhead. With this change, there will be no more unnecessary PTE updates or page faults in this scenario. We have seen up to a 6x improvement on a typical java workload running on VMs with memory and CPU pinned to one NUMA node via cpuset in a two-socket AARCH64 system. With the same pinning, on a 18-cores-per-socket Intel platform, we have seen 20% improvment in a microbench that creates a 30-vCPU selftest KVM guest with 4GB memory, where each vCPU reads 4KB pages in a fixed number of loops. Link: https://lkml.kernel.org/r/20250424024523.2298272-1-libo.chen@oracle.com Link: https://lkml.kernel.org/r/20250424024523.2298272-2-libo.chen@oracle.com Signed-off-by: Libo Chen Tested-by: Chen Yu Tested-by: Srikanth Aithal Tested-by: Venkat Rao Bagalkote Cc: "Chen, Tim C" Cc: Chris Hyser Cc: Daniel Jordan Cc: Ingo Molnar Cc: Juri Lelli Cc: K Prateek Nayak Cc: Lorenzo Stoakes Cc: Madadi Vineeth Reddy Cc: Mel Gorman Cc: Michal Koutný Cc: Peter Zijlstra Cc: Raghavendra K T Cc: Steven Rostedt Cc: Tejun Heo Cc: Vincent Guittot Signed-off-by: Andrew Morton --- kernel/sched/fair.c | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'kernel/sched') diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0fb9bf995a47..b3b715e8a7cb 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3329,6 +3329,13 @@ static void task_numa_work(struct callback_head *work) if (p->flags & PF_EXITING) return; + /* + * Memory is pinned to only one NUMA node via cpuset.mems, naturally + * no page can be migrated. + */ + if (cpusets_enabled() && nodes_weight(cpuset_current_mems_allowed) == 1) + return; + if (!mm->numa_next_scan) { mm->numa_next_scan = now + msecs_to_jiffies(sysctl_numa_balancing_scan_delay); -- cgit v1.2.3 From 3fc567e4c0b71d6c59ba26c5d6e54cf3c490dd3a Mon Sep 17 00:00:00 2001 From: Libo Chen Date: Wed, 23 Apr 2025 19:45:23 -0700 Subject: sched/numa: add tracepoint that tracks the skipping of numa balancing due to cpuset memory pinning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Unlike sched_skip_vma_numa tracepoint which tracks skipped VMAs, this tracks the task subjected to cpuset.mems pinning and prints out its allowed memory node mask. Link: https://lkml.kernel.org/r/20250424024523.2298272-3-libo.chen@oracle.com Signed-off-by: Libo Chen Cc: "Chen, Tim C" Cc: Chen Yu Cc: Chris Hyser Cc: Daniel Jordan Cc: Ingo Molnar Cc: Juri Lelli Cc: K Prateek Nayak Cc: Lorenzo Stoakes Cc: Madadi Vineeth Reddy Cc: Mel Gorman Cc: Michal Koutný Cc: Peter Zijlstra Cc: Raghavendra K T Cc: Srikanth Aithal Cc: Steven Rostedt Cc: Tejun Heo Cc: Venkat Rao Bagalkote Cc: Vincent Guittot Signed-off-by: Andrew Morton --- kernel/sched/fair.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'kernel/sched') diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index b3b715e8a7cb..cef163c174bd 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3333,8 +3333,10 @@ static void task_numa_work(struct callback_head *work) * Memory is pinned to only one NUMA node via cpuset.mems, naturally * no page can be migrated. */ - if (cpusets_enabled() && nodes_weight(cpuset_current_mems_allowed) == 1) + if (cpusets_enabled() && nodes_weight(cpuset_current_mems_allowed) == 1) { + trace_sched_skip_cpuset_numa(current, &cpuset_current_mems_allowed); return; + } if (!mm->numa_next_scan) { mm->numa_next_scan = now + -- cgit v1.2.3