diff options
| author | Tejun Heo <tj@kernel.org> | 2026-03-24 10:21:25 -1000 |
|---|---|---|
| committer | Tejun Heo <tj@kernel.org> | 2026-03-24 10:21:40 -1000 |
| commit | 1b164b876c36c3eb5561dd9b37702b04401b0166 (patch) | |
| tree | f691f34f1be6889dc67c6c73ff47189ffa9c85ff /include/uapi/linux | |
| parent | a72f73c4dd9b209c53cf8b03b6e97fcefad4262c (diff) | |
cgroup: Wait for dying tasks to leave on rmdir
a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") hid PF_EXITING
tasks from cgroup.procs so that systemd doesn't see tasks that have already
been reaped via waitpid(). However, the populated counter (nr_populated_csets)
is only decremented when the task later passes through cgroup_task_dead() in
finish_task_switch(). This means cgroup.procs can appear empty while the
cgroup is still populated, causing rmdir to fail with -EBUSY.
Fix this by making cgroup_rmdir() wait for dying tasks to fully leave. If the
cgroup is populated but all remaining tasks have PF_EXITING set (the task
iterator returns none due to the existing filter), wait for a kick from
cgroup_task_dead() and retry. The wait is brief as tasks are removed from the
cgroup's css_set between PF_EXITING assertion in do_exit() and
cgroup_task_dead() in finish_task_switch().
v2: cgroup_is_populated() true to false transition happens under css_set_lock
not cgroup_mutex, so retest under css_set_lock before sleeping to avoid
missed wakeups (Sebastian).
Fixes: a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202603222104.2c81684e-lkp@intel.com
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Bert Karwatzki <spasswolf@web.de>
Cc: Michal Koutny <mkoutny@suse.com>
Cc: cgroups@vger.kernel.org
Diffstat (limited to 'include/uapi/linux')
0 files changed, 0 insertions, 0 deletions
