mirror of
https://github.com/torvalds/linux.git
synced 2026-01-25 15:03:52 +08:00
drm/sched: Document race condition in drm_sched_fini()
In drm_sched_fini() all entities are marked as stopped - without taking
the appropriate lock, because that would deadlock. That means that
drm_sched_fini() and drm_sched_entity_push_job() can race against each
other.

This should most likely be fixed by establishing the rule that all
entities associated with a scheduler must be torn down first. Then,
however, the locking should be removed from drm_sched_fini() altogether
with an appropriate comment.

Reported-by: James Flowers <bold.zone2373@fastmail.com>
Link: https://lore.kernel.org/dri-devel/20250720235748.2798-1-bold.zone2373@fastmail.com/
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250813085654.102504-2-phasta@kernel.org
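The lifetime rule proposed in the commit message (tear down every entity before its scheduler) can be illustrated with a small userspace model. This is only a sketch: `toy_sched`, `toy_entity` and all `toy_*` functions are hypothetical names invented for illustration and are not part of the DRM scheduler API. The model is single-threaded and shows only the ordering rule, not the kernel's locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy types; they model the lifetime rule, not the DRM API. */
struct toy_sched {
	int live_entities;	/* entities that have not been torn down yet */
	bool finalized;
};

struct toy_entity {
	struct toy_sched *sched;
	bool stopped;
};

static void toy_sched_init(struct toy_sched *s)
{
	s->live_entities = 0;
	s->finalized = false;
}

static void toy_entity_init(struct toy_entity *e, struct toy_sched *s)
{
	e->sched = s;
	e->stopped = false;
	s->live_entities++;
}

/* Tearing down an entity stops it and detaches it from its scheduler. */
static void toy_entity_fini(struct toy_entity *e)
{
	if (e->stopped)
		return;
	e->stopped = true;
	e->sched->live_entities--;
}

/* Pushing a job to a stopped entity is rejected. */
static int toy_entity_push_job(struct toy_entity *e)
{
	return e->stopped ? -ENOENT : 0;
}

/*
 * Enforces the rule: the scheduler refuses to finalize while any entity
 * is still alive. Because no entity can exist at this point, the function
 * never has to touch (or lock) an entity.
 */
static int toy_sched_fini(struct toy_sched *s)
{
	if (s->live_entities > 0)
		return -EBUSY;
	s->finalized = true;
	return 0;
}
```

In this model a caller that invokes `toy_sched_fini()` too early gets `-EBUSY` back and must first call `toy_entity_fini()` on each entity. Once that ordering holds, the scheduler teardown has no entity state left to race on.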
@@ -1424,6 +1424,22 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 			 * Prevents reinsertion and marks job_queue as idle,
 			 * it will be removed from the rq in drm_sched_entity_fini()
 			 * eventually
+			 *
+			 * FIXME:
+			 * This lacks the proper spin_lock(&s_entity->lock) and
+			 * is, therefore, a race condition. Most notably, it
+			 * can race with drm_sched_entity_push_job(). The lock
+			 * cannot be taken here, however, because this would
+			 * lead to lock inversion -> deadlock.
+			 *
+			 * The best solution probably is to enforce the life
+			 * time rule of all entities having to be torn down
+			 * before their scheduler. Then, however, locking could
+			 * be dropped alltogether from this function.
+			 *
+			 * For now, this remains a potential race in all
+			 * drivers that keep entities alive for longer than
+			 * the scheduler.
 			 */
 			s_entity->stopped = true;
 		spin_unlock(&rq->lock);
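The race described in the FIXME comment can be replayed deterministically in a small userspace model. This is a sketch under stated assumptions: it assumes the push path first checks the `stopped` flag and only afterwards inserts the entity into the run queue, and that the teardown path may write the flag between those two steps because it does not hold the entity lock. All `model_*` names are hypothetical, and no real locks or threads are involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an entity; not the kernel's drm_sched_entity. */
struct model_entity {
	bool stopped;	/* set by the teardown path */
	bool on_rq;	/* true once the entity sits in the run queue */
};

/* Push path, step 1: decide whether pushing is still allowed. */
static bool model_push_check(const struct model_entity *e)
{
	return !e->stopped;
}

/* Push path, step 2: insert the entity into the run queue. */
static void model_push_enqueue(struct model_entity *e)
{
	e->on_rq = true;
}

/*
 * Teardown path: marks the entity as stopped. In this model it is not
 * serialized against the push path, so it can run between step 1 and 2.
 */
static void model_fini_mark_stopped(struct model_entity *e)
{
	e->stopped = true;
}

/* Replays the bad interleaving: check, then stop, then enqueue. */
static bool model_replay_race(void)
{
	struct model_entity e = { .stopped = false, .on_rq = false };
	bool may_push = model_push_check(&e);

	model_fini_mark_stopped(&e);
	if (may_push)
		model_push_enqueue(&e);

	/* True means a stopped entity was reinserted into the run queue. */
	return e.stopped && e.on_rq;
}

/* Replays the serialized order: stop first, then attempt the push. */
static bool model_replay_ordered(void)
{
	struct model_entity e = { .stopped = false, .on_rq = false };

	model_fini_mark_stopped(&e);
	if (model_push_check(&e))
		model_push_enqueue(&e);

	return e.stopped && e.on_rq;
}
```

`model_replay_race()` ends with a stopped entity on the run queue, which is the reinsertion the flag is meant to prevent. `model_replay_ordered()` does not, because the check observes the flag before any insertion is attempted.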