<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/kernel/sched/isolation.c, branch v7.0-rc7</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock</title>
<updated>2026-02-23T20:46:49+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2026-02-21T18:54:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a84097e625f2b9e7f273161c004f34b7be63b348'/>
<id>a84097e625f2b9e7f273161c004f34b7be63b348</id>
<content type='text'>
The current cpuset partition code is able to dynamically update
the sched domains of a running system and the corresponding
HK_TYPE_DOMAIN housekeeping cpumask to perform what is essentially the
"isolcpus=domain,..." boot command line feature at run time.

The housekeeping cpumask update requires flushing a number of different
workqueues which may not be safe with cpus_read_lock() held as the
workqueue flushing code may acquire cpus_read_lock() or acquiring locks
which have locking dependency with cpus_read_lock() down the chain. Below
is an example of such circular locking problem.

  ======================================================
  WARNING: possible circular locking dependency detected
  6.18.0-test+ #2 Tainted: G S
  ------------------------------------------------------
  test_cpuset_prs/10971 is trying to acquire lock:
  ffff888112ba4958 ((wq_completion)sync_wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x7a/0x180

  but task is already holding lock:
  ffffffffae47f450 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:
  -&gt; #4 (cpuset_mutex){+.+.}-{4:4}:
  -&gt; #3 (cpu_hotplug_lock){++++}-{0:0}:
  -&gt; #2 (rtnl_mutex){+.+.}-{4:4}:
  -&gt; #1 ((work_completion)(&amp;arg.work)){+.+.}-{0:0}:
  -&gt; #0 ((wq_completion)sync_wq){+.+.}-{0:0}:

  Chain exists of:
    (wq_completion)sync_wq --&gt; cpu_hotplug_lock --&gt; cpuset_mutex

  5 locks held by test_cpuset_prs/10971:
   #0: ffff88816810e440 (sb_writers#7){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
   #1: ffff8891ab620890 (&amp;of-&gt;mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x260/0x5f0
   #2: ffff8890a78b83e8 (kn-&gt;active#187){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x2b6/0x5f0
   #3: ffffffffadf32900 (cpu_hotplug_lock){++++}-{0:0}, at: cpuset_partition_write+0x77/0x130
   #4: ffffffffae47f450 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130

  Call Trace:
   &lt;TASK&gt;
     :
   touch_wq_lockdep_map+0x93/0x180
   __flush_workqueue+0x111/0x10b0
   housekeeping_update+0x12d/0x2d0
   update_parent_effective_cpumask+0x595/0x2440
   update_prstate+0x89d/0xce0
   cpuset_partition_write+0xc5/0x130
   cgroup_file_write+0x1a5/0x680
   kernfs_fop_write_iter+0x3df/0x5f0
   vfs_write+0x525/0xfd0
   ksys_write+0xf9/0x1d0
   do_syscall_64+0x95/0x520
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

To avoid such a circular locking dependency problem, we have to
call housekeeping_update() without holding the cpus_read_lock() and
cpuset_mutex. The current set of wq's flushed by housekeeping_update()
may not have work functions that call cpus_read_lock() directly,
but we are likely to extend the list of wq's that are flushed in the
future. Moreover, the current set of work functions may hold locks that
may have cpu_hotplug_lock down the dependency chain.

So housekeeping_update() is now called after releasing cpus_read_lock
and cpuset_mutex at the end of a cpuset operation. These two locks are
then re-acquired later before calling rebuild_sched_domains_locked().

To enable mutual exclusion between the housekeeping_update() call and
other cpuset control file write actions, a new top level cpuset_top_mutex
is introduced. This new mutex will be acquired first to allow sharing
variables used by both code paths. However, cpuset update from CPU
hotplug can still happen in parallel with the housekeeping_update()
call, though that should be rare in production environment.

As cpus_read_lock() is now no longer held when
tmigr_isolated_exclude_cpumask() is called, it needs to acquire it
directly.

The lockdep_is_cpuset_held() is also updated to return true if either
cpuset_top_mutex or cpuset_mutex is held.

Fixes: 03ff73510169 ("cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current cpuset partition code is able to dynamically update
the sched domains of a running system and the corresponding
HK_TYPE_DOMAIN housekeeping cpumask to perform what is essentially the
"isolcpus=domain,..." boot command line feature at run time.

The housekeeping cpumask update requires flushing a number of different
workqueues which may not be safe with cpus_read_lock() held as the
workqueue flushing code may acquire cpus_read_lock() or acquiring locks
which have locking dependency with cpus_read_lock() down the chain. Below
is an example of such circular locking problem.

  ======================================================
  WARNING: possible circular locking dependency detected
  6.18.0-test+ #2 Tainted: G S
  ------------------------------------------------------
  test_cpuset_prs/10971 is trying to acquire lock:
  ffff888112ba4958 ((wq_completion)sync_wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x7a/0x180

  but task is already holding lock:
  ffffffffae47f450 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:
  -&gt; #4 (cpuset_mutex){+.+.}-{4:4}:
  -&gt; #3 (cpu_hotplug_lock){++++}-{0:0}:
  -&gt; #2 (rtnl_mutex){+.+.}-{4:4}:
  -&gt; #1 ((work_completion)(&amp;arg.work)){+.+.}-{0:0}:
  -&gt; #0 ((wq_completion)sync_wq){+.+.}-{0:0}:

  Chain exists of:
    (wq_completion)sync_wq --&gt; cpu_hotplug_lock --&gt; cpuset_mutex

  5 locks held by test_cpuset_prs/10971:
   #0: ffff88816810e440 (sb_writers#7){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
   #1: ffff8891ab620890 (&amp;of-&gt;mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x260/0x5f0
   #2: ffff8890a78b83e8 (kn-&gt;active#187){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x2b6/0x5f0
   #3: ffffffffadf32900 (cpu_hotplug_lock){++++}-{0:0}, at: cpuset_partition_write+0x77/0x130
   #4: ffffffffae47f450 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130

  Call Trace:
   &lt;TASK&gt;
     :
   touch_wq_lockdep_map+0x93/0x180
   __flush_workqueue+0x111/0x10b0
   housekeeping_update+0x12d/0x2d0
   update_parent_effective_cpumask+0x595/0x2440
   update_prstate+0x89d/0xce0
   cpuset_partition_write+0xc5/0x130
   cgroup_file_write+0x1a5/0x680
   kernfs_fop_write_iter+0x3df/0x5f0
   vfs_write+0x525/0xfd0
   ksys_write+0xf9/0x1d0
   do_syscall_64+0x95/0x520
   entry_SYSCALL_64_after_hwframe+0x76/0x7e

To avoid such a circular locking dependency problem, we have to
call housekeeping_update() without holding the cpus_read_lock() and
cpuset_mutex. The current set of wq's flushed by housekeeping_update()
may not have work functions that call cpus_read_lock() directly,
but we are likely to extend the list of wq's that are flushed in the
future. Moreover, the current set of work functions may hold locks that
may have cpu_hotplug_lock down the dependency chain.

So housekeeping_update() is now called after releasing cpus_read_lock
and cpuset_mutex at the end of a cpuset operation. These two locks are
then re-acquired later before calling rebuild_sched_domains_locked().

To enable mutual exclusion between the housekeeping_update() call and
other cpuset control file write actions, a new top level cpuset_top_mutex
is introduced. This new mutex will be acquired first to allow sharing
variables used by both code paths. However, cpuset update from CPU
hotplug can still happen in parallel with the housekeeping_update()
call, though that should be rare in production environment.

As cpus_read_lock() is now no longer held when
tmigr_isolated_exclude_cpumask() is called, it needs to acquire it
directly.

The lockdep_is_cpuset_held() is also updated to return true if either
cpuset_top_mutex or cpuset_mutex is held.

Fixes: 03ff73510169 ("cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kthread: Honour kthreads preferred affinity after cpuset changes</title>
<updated>2026-02-03T14:23:35+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2025-06-04T12:02:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e894f633980804a528a2d6996c4ea651df631632'/>
<id>e894f633980804a528a2d6996c4ea651df631632</id>
<content type='text'>
When cpuset isolated partitions get updated, unbound kthreads get
indifferently affine to all non isolated CPUs, regardless of their
individual affinity preferences.

For example kswapd is a per-node kthread that prefers to be affine to
the node it refers to. Whenever an isolated partition is created,
updated or deleted, kswapd's node affinity is going to be broken if any
CPU in the related node is not isolated because kswapd will be affine
globally.

Fix this with letting the consolidated kthread managed affinity code do
the affinity update on behalf of cpuset.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: cgroups@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When cpuset isolated partitions get updated, unbound kthreads get
indifferently affine to all non isolated CPUs, regardless of their
individual affinity preferences.

For example kswapd is a per-node kthread that prefers to be affine to
the node it refers to. Whenever an isolated partition is created,
updated or deleted, kswapd's node affinity is going to be broken if any
CPU in the related node is not isolated because kswapd will be affine
globally.

Fix this with letting the consolidated kthread managed affinity code do
the affinity update on behalf of cpuset.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: cgroups@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset: Propagate cpuset isolation update to timers through housekeeping</title>
<updated>2026-02-03T14:23:34+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2025-12-21T23:55:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f5c145ae4f26c25b853f059f1aa14a46f4e9f1ab'/>
<id>f5c145ae4f26c25b853f059f1aa14a46f4e9f1ab</id>
<content type='text'>
Until now, cpuset would propagate isolated partition changes to
timer migration so that unbound timers don't get migrated to isolated
CPUs.

Since housekeeping now centralizes, synchronize and propagates isolation
cpumask changes, perform the work from that subsystem for consolidation
and consistency purposes.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Until now, cpuset would propagate isolated partition changes to
timer migration so that unbound timers don't get migrated to isolated
CPUs.

Since housekeeping now centralizes, synchronize and propagates isolation
cpumask changes, perform the work from that subsystem for consolidation
and consistency purposes.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset: Propagate cpuset isolation update to workqueue through housekeeping</title>
<updated>2026-02-03T14:23:34+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2025-05-28T16:19:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=23f09dcc0a0fa3b4e48516bdea1c90223dfb3d6c'/>
<id>23f09dcc0a0fa3b4e48516bdea1c90223dfb3d6c</id>
<content type='text'>
Until now, cpuset would propagate isolated partition changes to
workqueues so that unbound workers get properly reaffined.

Since housekeeping now centralizes, synchronize and propagates isolation
cpumask changes, perform the work from that subsystem for consolidation
and consistency purposes.

For simplification purpose, the target function is adapted to take the
new housekeeping mask instead of the isolated mask.

Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "Michal Koutný" &lt;mkoutny@suse.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: cgroups@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Until now, cpuset would propagate isolated partition changes to
workqueues so that unbound workers get properly reaffined.

Since housekeeping now centralizes, synchronize and propagates isolation
cpumask changes, perform the work from that subsystem for consolidation
and consistency purposes.

For simplification purpose, the target function is adapted to take the
new housekeeping mask instead of the isolated mask.

Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "Michal Koutný" &lt;mkoutny@suse.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: cgroups@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>PCI: Flush PCI probe workqueue on cpuset isolated partition change</title>
<updated>2026-02-03T14:23:34+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2025-09-30T13:21:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=29b306c44eb5eefdfa02d6ba1205f479f82fb088'/>
<id>29b306c44eb5eefdfa02d6ba1205f479f82fb088</id>
<content type='text'>
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In
order to synchronize against PCI probe works and make sure that no
asynchronous probing is still pending or executing on a newly isolated
CPU, the housekeeping subsystem must flush the PCI probe works.

However the PCI probe works can't be flushed easily since they are
queued to the main per-CPU workqueue pool.

Solve this with creating a PCI probe-specific pool and provide and use
the appropriate flushing API.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Acked-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: linux-pci@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In
order to synchronize against PCI probe works and make sure that no
asynchronous probing is still pending or executing on a newly isolated
CPU, the housekeeping subsystem must flush the PCI probe works.

However the PCI probe works can't be flushed easily since they are
queued to the main per-CPU workqueue pool.

Solve this with creating a PCI probe-specific pool and provide and use
the appropriate flushing API.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Acked-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: linux-pci@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/isolation: Flush vmstat workqueues on cpuset isolated partition change</title>
<updated>2026-02-03T14:23:34+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2025-06-13T12:48:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ce84ad5e994aea5d41ff47135a71439ad4f54005'/>
<id>ce84ad5e994aea5d41ff47135a71439ad4f54005</id>
<content type='text'>
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime.
In order to synchronize against vmstat workqueue to make sure
that no asynchronous vmstat work is still pending or executing on a
newly made isolated CPU, the housekeeping susbsystem must flush the
vmstat workqueues.

This involves flushing the whole mm_percpu_wq workqueue, shared with
LRU drain, introducing here a welcome side effect.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: linux-mm@kvack.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime.
In order to synchronize against vmstat workqueue to make sure
that no asynchronous vmstat work is still pending or executing on a
newly made isolated CPU, the housekeeping susbsystem must flush the
vmstat workqueues.

This involves flushing the whole mm_percpu_wq workqueue, shared with
LRU drain, introducing here a welcome side effect.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: linux-mm@kvack.org
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/isolation: Flush memcg workqueues on cpuset isolated partition change</title>
<updated>2026-02-03T14:23:34+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2025-06-12T13:36:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b7eb4edcc3b5cd7ffdcbd56fa7a12de41b39424d'/>
<id>b7eb4edcc3b5cd7ffdcbd56fa7a12de41b39424d</id>
<content type='text'>
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In
order to synchronize against memcg workqueue to make sure that no
asynchronous draining is still pending or executing on a newly made
isolated CPU, the housekeeping susbsystem must flush the memcg
workqueues.

However the memcg workqueues can't be flushed easily since they are
queued to the main per-CPU workqueue pool.

Solve this with creating a memcg specific pool and provide and use the
appropriate flushing API.

Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In
order to synchronize against memcg workqueue to make sure that no
asynchronous draining is still pending or executing on a newly made
isolated CPU, the housekeeping susbsystem must flush the memcg
workqueues.

However the memcg workqueues can't be flushed easily since they are
queued to the main per-CPU workqueue pool.

Solve this with creating a memcg specific pool and provide and use the
appropriate flushing API.

Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset</title>
<updated>2026-02-03T14:23:34+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2025-05-28T16:05:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=03ff7351016983653bab5b38e58529d0e38a28cf'/>
<id>03ff7351016983653bab5b38e58529d0e38a28cf</id>
<content type='text'>
Until now, HK_TYPE_DOMAIN used to only include boot defined isolated
CPUs passed through isolcpus= boot option. Users interested in also
knowing the runtime defined isolated CPUs through cpuset must use
different APIs: cpuset_cpu_is_isolated(), cpu_is_isolated(), etc...

There are many drawbacks to that approach:

1) Most interested subsystems want to know about all isolated CPUs, not
  just those defined on boot time.

2) cpuset_cpu_is_isolated() / cpu_is_isolated() are not synchronized with
  concurrent cpuset changes.

3) Further cpuset modifications are not propagated to subsystems

Solve 1) and 2) and centralize all isolated CPUs within the
HK_TYPE_DOMAIN housekeeping cpumask.

Subsystems can rely on RCU to synchronize against concurrent changes.

The propagation mentioned in 3) will be handled in further patches.

[Chen Ridong: Fix cpu_hotplug_lock deadlock and use correct static
branch API]

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Reviewed-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Cc: "Michal Koutný" &lt;mkoutny@suse.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: cgroups@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Until now, HK_TYPE_DOMAIN used to only include boot defined isolated
CPUs passed through isolcpus= boot option. Users interested in also
knowing the runtime defined isolated CPUs through cpuset must use
different APIs: cpuset_cpu_is_isolated(), cpu_is_isolated(), etc...

There are many drawbacks to that approach:

1) Most interested subsystems want to know about all isolated CPUs, not
  just those defined on boot time.

2) cpuset_cpu_is_isolated() / cpu_is_isolated() are not synchronized with
  concurrent cpuset changes.

3) Further cpuset modifications are not propagated to subsystems

Solve 1) and 2) and centralize all isolated CPUs within the
HK_TYPE_DOMAIN housekeeping cpumask.

Subsystems can rely on RCU to synchronize against concurrent changes.

The propagation mentioned in 3) will be handled in further patches.

[Chen Ridong: Fix cpu_hotplug_lock deadlock and use correct static
branch API]

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Reviewed-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Cc: "Michal Koutný" &lt;mkoutny@suse.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: cgroups@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/isolation: Convert housekeeping cpumasks to rcu pointers</title>
<updated>2026-02-03T14:23:33+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2025-05-27T13:36:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=27c3a5967f054ab666704cb28f2aeb18ca07cab7'/>
<id>27c3a5967f054ab666704cb28f2aeb18ca07cab7</id>
<content type='text'>
HK_TYPE_DOMAIN's cpumask will soon be made modifiable by cpuset.
A synchronization mechanism is then needed to synchronize the updates
with the housekeeping cpumask readers.

Turn the housekeeping cpumasks into RCU pointers. Once a housekeeping
cpumask will be modified, the update side will wait for an RCU grace
period and propagate the change to interested subsystem when deemed
necessary.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
HK_TYPE_DOMAIN's cpumask will soon be made modifiable by cpuset.
A synchronization mechanism is then needed to synchronize the updates
with the housekeeping cpumask readers.

Turn the housekeeping cpumasks into RCU pointers. Once a housekeeping
cpumask will be modified, the update side will wait for an RCU grace
period and propagate the change to interested subsystem when deemed
necessary.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/isolation: Save boot defined domain flags</title>
<updated>2026-02-03T14:23:33+00:00</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2025-05-26T11:06:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4fca0e550d506e1c95504c2d9247bc92bf621bf6'/>
<id>4fca0e550d506e1c95504c2d9247bc92bf621bf6</id>
<content type='text'>
HK_TYPE_DOMAIN will soon integrate not only boot defined isolcpus= CPUs
but also cpuset isolated partitions.

Housekeeping still needs a way to record what was initially passed
to isolcpus= in order to keep these CPUs isolated after a cpuset
isolated partition is modified or destroyed while containing some of
them.

Create a new HK_TYPE_DOMAIN_BOOT to keep track of those.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Phil Auld &lt;pauld@redhat.com&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
HK_TYPE_DOMAIN will soon integrate not only boot defined isolcpus= CPUs
but also cpuset isolated partitions.

Housekeeping still needs a way to record what was initially passed
to isolcpus= in order to keep these CPUs isolated after a cpuset
isolated partition is modified or destroyed while containing some of
them.

Create a new HK_TYPE_DOMAIN_BOOT to keep track of those.

Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Phil Auld &lt;pauld@redhat.com&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Marco Crivellari &lt;marco.crivellari@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
