linux/kernel/sched
Tejun Heo d245698d72 cgroup: Defer task cgroup unlink until after the task is done switching out
When a task exits, css_set_move_task(tsk, cset, NULL, false) unlinks the task
from its cgroup. From the cgroup's perspective, the task is now gone. If this
makes the cgroup empty, it can be removed, triggering ->css_offline() callbacks
that notify controllers the cgroup is going offline resource-wise.
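The "empty cgroup can be removed, triggering ->css_offline()" step can be sketched as a toy userspace model. This is not kernel code. `struct model_cgroup`, `model_unlink_task()` and `model_remove_cgroup()` are hypothetical stand-ins for the css_set unlink and cgroup removal paths, and a single task counter replaces the real css_set accounting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
struct model_cgroup {
	int nr_tasks;      /* tasks still linked to the cgroup */
	bool online;       /* cleared when ->css_offline() would run */
	int offline_calls; /* how many times the offline hook fired */
};

/* Stand-in for css_set_move_task(tsk, cset, NULL, false): unlink one task. */
static void model_unlink_task(struct model_cgroup *cgrp)
{
	assert(cgrp->nr_tasks > 0); /* unlinking from an empty cgroup is a bug */
	cgrp->nr_tasks--;
}

/* Stand-in for cgroup removal: only an empty, online cgroup may be removed,
 * and removal notifies controllers through the offline hook. */
static bool model_remove_cgroup(struct model_cgroup *cgrp)
{
	if (cgrp->nr_tasks != 0 || !cgrp->online)
		return false;
	cgrp->online = false;
	cgrp->offline_calls++;
	return true;
}
```

In this model, removal is refused while any task is linked; as soon as the last unlink drops the count to zero, removal succeeds and the offline hook fires.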

However, the exiting task can still run, perform memory operations, and schedule
until the final context switch in finish_task_switch(). This creates a confusing
situation where controllers are told a cgroup is offline while resource
activities are still happening in it. While this hasn't broken existing
controllers, it has caused direct confusion for sched_ext schedulers.
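A minimal sketch of the pre-patch timeline illustrates the confusion described above. It assumes a toy model in which "resource activity" is a single counter. `struct old_cgroup`, `old_do_exit()`, `old_try_remove()`, `old_task_activity()` and `old_timeline()` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the pre-patch ordering, not kernel code. */
struct old_cgroup {
	int nr_tasks;
	bool online;
	int charges_while_offline; /* activity a controller sees after offline */
};

/* Pre-patch do_exit(): the task is unlinked from its cgroup here. */
static void old_do_exit(struct old_cgroup *cg)
{
	assert(cg->nr_tasks > 0);
	cg->nr_tasks--;
}

/* The cgroup is removed as soon as it looks empty. */
static void old_try_remove(struct old_cgroup *cg)
{
	if (cg->nr_tasks == 0 && cg->online)
		cg->online = false; /* ->css_offline() would run here */
}

/* The exiting task still allocates memory or schedules before its
 * final context switch, and a controller observes that activity. */
static void old_task_activity(struct old_cgroup *cg)
{
	if (!cg->online)
		cg->charges_while_offline++;
}

/* Exit, removal, then late activity: returns the offline-time charges. */
static int old_timeline(void)
{
	struct old_cgroup cg = { .nr_tasks = 1, .online = true };

	old_do_exit(&cg);       /* cgroup now appears empty */
	old_try_remove(&cg);    /* offline callbacks fire */
	old_task_activity(&cg); /* ...but the task is still running */
	return cg.charges_while_offline;
}
```

In this model, `old_timeline()` reports one charge against a cgroup that controllers were already told is offline.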

Split cgroup_task_exit() into two functions. cgroup_task_exit() now only calls
the subsystem exit callbacks and continues to be called from do_exit(). The
css_set cleanup is moved to the new cgroup_task_dead() which is called from
finish_task_switch() after the final context switch, so that the cgroup only
appears empty after the task is truly done running.
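The split can be sketched with the same kind of toy model. `model_cgroup_task_exit()` and `model_cgroup_task_dead()` are hypothetical stand-ins named after `cgroup_task_exit()` and `cgroup_task_dead()`. They do not reproduce the kernel signatures, and subsystem exit callbacks are reduced to a no-op.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the post-patch ordering, not kernel code. */
struct new_cgroup {
	int nr_tasks;
	bool online;
	int charges_while_offline;
};

/* Stand-in for cgroup_task_exit(), called from do_exit(): it would run the
 * subsystem exit callbacks only. The task stays linked (no-op here). */
static void model_cgroup_task_exit(struct new_cgroup *cg)
{
	(void)cg;
}

/* Stand-in for cgroup_task_dead(), called from finish_task_switch() after
 * the final context switch: the css_set cleanup (unlink) happens here. */
static void model_cgroup_task_dead(struct new_cgroup *cg)
{
	assert(cg->nr_tasks > 0);
	cg->nr_tasks--;
}

static bool new_try_remove(struct new_cgroup *cg)
{
	if (cg->nr_tasks != 0 || !cg->online)
		return false;
	cg->online = false; /* ->css_offline() would run here */
	return true;
}

static void new_task_activity(struct new_cgroup *cg)
{
	if (!cg->online)
		cg->charges_while_offline++;
}

/* True if the cgroup could only be removed after the final switch and saw
 * no activity while offline. */
static bool new_timeline(void)
{
	struct new_cgroup cg = { .nr_tasks = 1, .online = true };
	bool removed_early, removed_late;

	model_cgroup_task_exit(&cg);         /* do_exit() */
	removed_early = new_try_remove(&cg); /* still populated: refused */
	new_task_activity(&cg);              /* task keeps running */
	model_cgroup_task_dead(&cg);         /* finish_task_switch() */
	removed_late = new_try_remove(&cg);  /* now empty: allowed */

	return !removed_early && removed_late && cg.charges_while_offline == 0;
}
```

Because the unlink is deferred, any removal attempt made while the task is still running is refused. The offline callbacks can only run once no further activity from that task is possible.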

This also reorders operations so that subsys->exit() is now called before
unlinking from the cgroup, which shouldn't break anything.
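The reordering can be checked with a small event log. This is a sketch under stated assumptions: the event strings and the helpers `old_exit_path()`, `new_exit_path()` and `event_index()` are hypothetical. The pre-patch sequence (unlink, then `subsys->exit()`) is inferred from the statement above that the exit callbacks now run before the unlink.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical event log; the names are illustrative, not kernel symbols. */
#define MAX_EVENTS 4

struct event_log {
	const char *ev[MAX_EVENTS];
	int n;
};

static void log_event(struct event_log *log, const char *what)
{
	assert(log->n < MAX_EVENTS);
	log->ev[log->n++] = what;
}

/* Pre-patch: a single exit path unlinked first, then ran subsys->exit(). */
static void old_exit_path(struct event_log *log)
{
	log_event(log, "unlink");
	log_event(log, "subsys_exit");
}

/* Post-patch: subsys->exit() runs from do_exit(); the unlink waits for the
 * final context switch. */
static void new_exit_path(struct event_log *log)
{
	log_event(log, "subsys_exit");  /* do_exit() -> cgroup_task_exit() */
	log_event(log, "final_switch"); /* last context switch */
	log_event(log, "unlink");       /* finish_task_switch() -> cgroup_task_dead() */
}

/* Index of the first occurrence of an event, or -1 if absent. */
static int event_index(const struct event_log *log, const char *what)
{
	for (int i = 0; i < log->n; i++)
		if (strcmp(log->ev[i], what) == 0)
			return i;
	return -1;
}
```

Comparing `event_index()` results for the two paths shows that the exit callbacks now precede the unlink, and that the unlink follows the final switch.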

Cc: Dan Schatzberg <dschatzberg@meta.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-03 11:46:18 -10:00