diff options
| author | Andrea Righi <arighi@nvidia.com> | 2026-05-13 13:24:38 +0200 |
|---|---|---|
| committer | Tejun Heo <tj@kernel.org> | 2026-05-13 10:02:57 -1000 |
| commit | 6ae315d37924435516d697ea7dde0b799a5928e0 (patch) | |
| tree | c73ac9805b64864cded6d8edb39e5b6024c4dea6 /kernel | |
| parent | cceb874eee46fe4b3d3c6c496f19125d9a3a9a8f (diff) | |
sched_ext: Use HK_TYPE_DOMAIN_BOOT to detect isolcpus= domain isolation
scx_enable() refuses to attach a BPF scheduler when isolcpus=domain is
in effect by comparing housekeeping_cpumask(HK_TYPE_DOMAIN) against
cpu_possible_mask.
Since commit 27c3a5967f05 ("sched/isolation: Convert housekeeping
cpumasks to rcu pointers"), HK_TYPE_DOMAIN's cpumask is RCU protected
and dereferencing it requires either RCU read lock, the cpu_hotplug
write lock, or the cpuset lock; scx_enable() holds none of these, so
booting with isolcpus=domain and attaching any BPF scheduler triggers
the following lockdep splat:
=============================
WARNING: suspicious RCU usage
-----------------------------
kernel/sched/isolation.c:60 suspicious rcu_dereference_check() usage!
1 lock held by scx_flash/281:
#0: ffffffff8379fce0 (update_mutex){+.+.}-{4:4}, at:
bpf_struct_ops_link_create+0x134/0x1c0
Call Trace:
dump_stack_lvl+0x6f/0xb0
lockdep_rcu_suspicious.cold+0x37/0x70
housekeeping_cpumask+0xcd/0xe0
scx_enable.isra.0+0x17/0x120
bpf_scx_reg+0x5e/0x80
bpf_struct_ops_link_create+0x151/0x1c0
__sys_bpf+0x1e4b/0x33c0
__x64_sys_bpf+0x21/0x30
do_syscall_64+0x117/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
In addition, commit 03ff73510169 ("cpuset: Update HK_TYPE_DOMAIN cpumask
from cpuset") made HK_TYPE_DOMAIN include cpuset isolated partitions as
well, which means the current check also rejects BPF schedulers when a
cpuset partition is active. That contradicts the original intent of
commit 9f391f94a173 ("sched_ext: Disallow loading BPF scheduler if
isolcpus= domain isolation is in effect"), which explicitly noted that
cpuset partitions are honored through per-task cpumasks and should not
be rejected.
Switch to housekeeping_enabled(HK_TYPE_DOMAIN_BOOT), which reads only
the housekeeping flag bit (no RCU dereference) and reflects exactly the
boot-time isolcpus= configuration that the error message refers to.
Fixes: 27c3a5967f05 ("sched/isolation: Convert housekeeping cpumasks to rcu pointers")
Cc: stable@vger.kernel.org # v7.0+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Diffstat (limited to 'kernel')
| -rw-r--r-- | kernel/sched/ext.c | 3 |
1 files changed, 1 insertions, 2 deletions
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index 23f7b3f63b09..a6d0a93d8174 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -7415,8 +7415,7 @@ static s32 scx_enable(struct sched_ext_ops *ops, struct bpf_link *link) static DEFINE_MUTEX(helper_mutex); struct scx_enable_cmd cmd; - if (!cpumask_equal(housekeeping_cpumask(HK_TYPE_DOMAIN), - cpu_possible_mask)) { + if (housekeeping_enabled(HK_TYPE_DOMAIN_BOOT)) { pr_err("sched_ext: Not compatible with \"isolcpus=\" domain isolation\n"); return -EINVAL; } |
