| author | Tejun Heo <tj@kernel.org> | 2026-04-19 08:36:45 -1000 |
|---|---|---|
| committer | Tejun Heo <tj@kernel.org> | 2026-04-20 06:55:33 -1000 |
| commit | 41e3312861eafba171d9620150aaf2e99165d044 (patch) | |
| tree | 0f22d2d6af12df3f4f39c7c203171df3eaae79d1 /include/linux | |
| parent | ed859d4319863263665b239cd2c62c3aad1664ce (diff) | |
sched_ext: add p->scx.tid and SCX_OPS_TID_TO_TASK lookup
BPF schedulers that can't hold task_struct pointers (arena-backed ones in
particular) key tasks by pid. During exit, pid is released before the
task finishes passing through scheduler callbacks, so a dying task
becomes invisible to the BPF side mid-schedule. scx_qmap hits this: an
exiting task's dispatch callback can't recover its queue entry, stalling
dispatch until SCX_EXIT_ERROR_STALL.
Add a unique non-zero u64 p->scx.tid assigned at fork that survives the
full task lifetime including exit. scx_bpf_tid_to_task() looks up the
task; unlike bpf_task_from_pid(), it handles exiting tasks.
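
The difference between pid-keyed and tid-keyed lookup during exit can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `model_*` name is hypothetical, a linear array stands in for the kernel's hash table, and none of it reflects the actual kernel implementation or the signature of scx_bpf_tid_to_task().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_MAX_TASKS 8

struct model_task {
	int      pid;     /* set to 0 once the pid is released during exit */
	uint64_t tid;     /* non-zero, assigned once at fork, never reused */
	int      in_use;  /* task is still passing through the scheduler */
};

static struct model_task model_tasks[MODEL_MAX_TASKS];
static uint64_t model_next_tid = 1;  /* start at 1 so that 0 means "no task" */

/* Model of fork: claim a free slot and hand out the next unused tid. */
static struct model_task *model_fork(int pid)
{
	for (size_t i = 0; i < MODEL_MAX_TASKS; i++) {
		struct model_task *t = &model_tasks[i];

		if (!t->in_use) {
			t->pid = pid;
			t->tid = model_next_tid++;
			t->in_use = 1;
			assert(t->tid != 0);
			return t;
		}
	}
	return NULL;  /* model table is full */
}

/* Model of the exit path releasing the pid while the task is still live. */
static void model_release_pid(struct model_task *t)
{
	t->pid = 0;
}

/* pid-keyed lookup: stops finding the task once its pid is released. */
static struct model_task *model_lookup_by_pid(int pid)
{
	if (pid == 0)
		return NULL;
	for (size_t i = 0; i < MODEL_MAX_TASKS; i++)
		if (model_tasks[i].in_use && model_tasks[i].pid == pid)
			return &model_tasks[i];
	return NULL;
}

/* tid-keyed lookup: keeps working for as long as the task is tracked. */
static struct model_task *model_lookup_by_tid(uint64_t tid)
{
	if (tid == 0)
		return NULL;
	for (size_t i = 0; i < MODEL_MAX_TASKS; i++)
		if (model_tasks[i].in_use && model_tasks[i].tid == tid)
			return &model_tasks[i];
	return NULL;
}
```

In this model, a scheduler that stored only the pid loses the task as soon as model_release_pid() runs, while one that stored the tid can still find it until the slot is retired.
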
The lookup costs an rhashtable insert/remove under scx_tasks_lock, so
root schedulers opt in via SCX_OPS_TID_TO_TASK. Sub-schedulers that set
the flag to declare a dependency are rejected at attach if root didn't
opt in.
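
The opt-in rule above can be sketched as a simple flag check. This is a userspace illustration only: the `MODEL_OPS_TID_TO_TASK` bit value and the `model_sub_attach_ok()` helper are hypothetical, and the real attach path and its error code are not shown in this part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SCX_OPS_TID_TO_TASK; the real bit value may differ. */
#define MODEL_OPS_TID_TO_TASK (1ULL << 0)

/*
 * A sub-scheduler that sets the flag declares that it depends on tid lookup.
 * It may attach only if the root scheduler has opted in; a sub-scheduler that
 * does not set the flag is unaffected by the root's choice.
 */
static bool model_sub_attach_ok(uint64_t root_flags, uint64_t sub_flags)
{
	bool sub_needs_tid = (sub_flags & MODEL_OPS_TID_TO_TASK) != 0;
	bool root_has_tid = (root_flags & MODEL_OPS_TID_TO_TASK) != 0;

	return !sub_needs_tid || root_has_tid;
}
```

Only one of the four flag combinations is rejected in this sketch: the sub-scheduler sets the flag while the root does not.
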
scx_qmap is converted: it now keys tasks by tid and enables SCX_OPS_ENQ_EXITING.
Pre-patch it stalls within seconds under a non-leader-exec workload;
with the patch it runs cleanly.
v3: Warn on rhashtable_lookup_insert_fast() failure via new
scx_tid_hash_insert() helper (Cheng-Yang Chou).
v2: Guard scx_root deref in scx_bpf_tid_to_task() error path. The kfunc
is registered via scx_kfunc_set_any and reachable from tracing and
syscall programs when no scheduler is attached (Cheng-Yang Chou).
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Diffstat (limited to 'include/linux')
| -rw-r--r-- | include/linux/sched/ext.h | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
index 1a3af2ea2a79..d05efcac794d 100644
--- a/include/linux/sched/ext.h
+++ b/include/linux/sched/ext.h
@@ -203,6 +203,15 @@ struct sched_ext_entity {
 	u64			core_sched_at;	/* see scx_prio_less() */
 #endif
 
+	/*
+	 * Unique non-zero task ID assigned at fork. Persists across exec and
+	 * is never reused. Lets BPF schedulers identify tasks without storing
+	 * kernel pointers - arena-backed schedulers being one example. See
+	 * scx_bpf_tid_to_task().
+	 */
+	u64			tid;
+	struct rhash_head	tid_hash_node;	/* see SCX_OPS_TID_TO_TASK */
+
 	/* BPF scheduler modifiable fields */
 
 	/*
