From a8ad873113d3fe01f9b5d737d4b0570fa36826b0 Mon Sep 17 00:00:00 2001 From: Emil Tsalapatis Date: Fri, 10 Oct 2025 12:12:50 -0700 Subject: sched_ext: defer queue_balance_callback() until after ops.dispatch The sched_ext code calls queue_balance_callback() during enqueue_task() to defer operations that drop multiple locks until we can unpin them. The call assumes that the rq lock is held until the callbacks are invoked, and the pending callbacks will not be visible to any other threads. This is enforced by a WARN_ON_ONCE() in rq_pin_lock(). However, balance_one() may actually drop the lock during a BPF dispatch call. Another thread may win the race to get the rq lock and see the pending callback. To avoid this, sched_ext must only queue the callback after the dispatch calls have completed. CPU 0 CPU 1 CPU 2 scx_balance() rq_unpin_lock() scx_balance_one() |= IN_BALANCE scx_enqueue() ops.dispatch() rq_unlock() rq_lock() queue_balance_callback() rq_unlock() [WARN] rq_pin_lock() rq_lock() &= ~IN_BALANCE rq_repin_lock() Changelog v2-> v1 (https://lore.kernel.org/sched-ext/aOgOxtHCeyRT_7jn@gpd4) - Fixed explanation in patch description (Andrea) - Fixed scx_rq mask state updates (Andrea) - Added Reviewed-by tag from Andrea Reported-by: Jakub Kicinski Signed-off-by: Emil Tsalapatis (Meta) Reviewed-by: Andrea Righi Signed-off-by: Tejun Heo --- kernel/sched/sched.h | 1 + 1 file changed, 1 insertion(+) (limited to 'kernel/sched/sched.h') diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 1f5d07067f60..3f7fab3d7960 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -784,6 +784,7 @@ enum scx_rq_flags { SCX_RQ_BAL_KEEP = 1 << 3, /* balance decided to keep current */ SCX_RQ_BYPASSING = 1 << 4, SCX_RQ_CLK_VALID = 1 << 5, /* RQ clock is fresh and valid */ + SCX_RQ_BAL_CB_PENDING = 1 << 6, /* must queue a cb after dispatching */ SCX_RQ_IN_WAKEUP = 1 << 16, SCX_RQ_IN_BALANCE = 1 << 17, -- cgit v1.2.3 From 53abe3e1c154628cc74e33a1bfcd865656e433a5 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 15 Oct 2025 11:19:34 +0200 Subject: sched: Remove never used code in mm_cid_get() Clang is not happy with set but unused variable (this is visible with `make W=1` build: kernel/sched/sched.h:3744:18: error: variable 'cpumask' set but not used [-Werror,-Wunused-but-set-variable] It seems like the variable was never used along with the assignment that does not have side effects as far as I can see. Remove those altogether. Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid") Signed-off-by: Andy Shevchenko Tested-by: Eric Biggers Reviewed-by: Breno Leitao Signed-off-by: Linus Torvalds --- kernel/sched/sched.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'kernel/sched/sched.h') diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 1f5d07067f60..361f9101cef9 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3740,11 +3740,9 @@ static inline int mm_cid_get(struct rq *rq, struct task_struct *t, struct mm_struct *mm) { struct mm_cid __percpu *pcpu_cid = mm->pcpu_cid; - struct cpumask *cpumask; int cid; lockdep_assert_rq_held(rq); - cpumask = mm_cidmask(mm); cid = __this_cpu_read(pcpu_cid->cid); if (mm_cid_is_valid(cid)) { mm_cid_snapshot_time(rq, mm); -- cgit v1.2.3