linux-toradex.git/drivers/gpu/drm/xe/xe_exec_queue.c, branch v6.19

drm/xe/uapi: disallow bind queue sharing

2026-01-21T14:24:16+00:00

Currently this is very broken if someone attempts to create a bind
queue and share it across multiple VMs. For example currently we assume
it is safe to acquire the user VM lock to protect some of the bind queue
state, but if allow sharing the bind queue with multiple VMs then this
quickly breaks down.

To fix this reject using a bind queue with any VM that is not the same
VM that was originally passed when creating the bind queue. This a uAPI
change, however this was more of an oversight on kernel side that we
didn't reject this, and expectation is that userspace shouldn't be using
bind queues in this way, so in theory this change should go unnoticed.

Based on a patch from Matt Brost.

v2 (Matt B):
  - Hold the vm lock over queue create, to ensure it can't be closed as
    we attach the user_vm to the queue.
  - Make sure we actually check for NULL user_vm in destruction path.
v3:
  - Fix error path handling.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Reported-by: Thomas Hellström 
Signed-off-by: Matthew Auld 
Cc: José Roberto de Souza 
Cc: Matthew Brost 
Cc: Michal Mrozek 
Cc: Carl Zhang 
Cc:  # v6.8+
Acked-by: José Roberto de Souza 
Reviewed-by: Matthew Brost 
Reviewed-by: Arvind Yadav 
Acked-by: Michal Mrozek 
Link: https://patch.msgid.link/20260120110609.77958-3-matthew.auld@intel.com
(cherry picked from commit 9dd08fdecc0c98d6516c2d2d1fa189c1332f8dab)
Signed-off-by: Thomas Hellström

drm/xe: Fix duplicated put due to merge resolution

2025-12-04T22:13:04+00:00

An incorrect backmerge resolution resulted in an
incorrect duplicate put. Fix.

Reported-by: Linus Torvalds 
Closes: https://lore.kernel.org/dri-devel/CAHk-=whaiMayMx=LrL7P119MLBX6exM_mEu4S2uBRT+xWQ-mbA@mail.gmail.com/
Fixes: Fixes: ce0478b02ed2 ("Merge tag 'v6.18-rc6' into drm-next")
Signed-off-by: Thomas Hellström 
Acked-by: Dave Airlie 
Signed-off-by: Linus Torvalds

Merge tag 'v6.18-rc6' into drm-next

2025-11-20T22:55:08+00:00

Linux 6.18-rc6

Backmerge in order to merge msm next

Signed-off-by: Dave Airlie

drm/xe: Enforce correct user fence signaling order using

2025-11-07T11:55:19+00:00

Prevent application hangs caused by out-of-order fence signaling when
user fences are attached. Use drm_syncobj (via dma-fence-chain) to
guarantee that each user fence signals in order, regardless of the
signaling order of the attached fences. Ensure user fence writebacks to
user space occur in the correct sequence.

v7:
 - Skip drm_syncbj create of error (CI)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost 
Reviewed-by: Thomas Hellström 
Link: https://patch.msgid.link/20251031234050.3043507-2-matthew.brost@intel.com
(cherry picked from commit adda4e855ab6409a3edaa585293f1f2069ab7299)
Signed-off-by: Lucas De Marchi

drm/xe: Remove last fence dependency check from binds and execs

2025-11-04T16:21:18+00:00

Eliminate redundant last fence dependency checks in exec and bind jobs,
as they are now equivalent to xe_exec_queue_is_idle. Simplify the code
by removing this dead logic.

Signed-off-by: Matthew Brost 
Reviewed-by: Thomas Hellström 
Link: https://patch.msgid.link/20251031234050.3043507-7-matthew.brost@intel.com

drm/xe: Attach last fence to TLB invalidation job queues

2025-11-04T16:20:57+00:00

Add support for attaching the last fence to TLB invalidation job queues
to address serialization issues during bursts of unbind jobs. Ensure
that user fence signaling for a bind job reflects both the bind job
itself and the last fences of all related TLB invalidations. Maintain
submission order based solely on the state of the bind and TLB
invalidation queues.

Introduce support functions for last fence attachment to TLB
invalidation queues.

v3:
 - Fix assert in xe_exec_queue_tlb_inval_last_fence_set (CI)
 - Ensure migrate lock held for migrate queues (Testing)
v5:
 - Style nits (Thomas)
 - Rewrite commit message (Thomas)

Signed-off-by: Matthew Brost 
Reviewed-by: Thomas Hellström 
Link: https://patch.msgid.link/20251031234050.3043507-3-matthew.brost@intel.com

drm/xe: Enforce correct user fence signaling order using

2025-11-04T16:20:46+00:00

Prevent application hangs caused by out-of-order fence signaling when
user fences are attached. Use drm_syncobj (via dma-fence-chain) to
guarantee that each user fence signals in order, regardless of the
signaling order of the attached fences. Ensure user fence writebacks to
user space occur in the correct sequence.

v7:
 - Skip drm_syncbj create of error (CI)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost 
Reviewed-by: Thomas Hellström 
Link: https://patch.msgid.link/20251031234050.3043507-2-matthew.brost@intel.com

drm/xe: Limit number of jobs per exec queue

2025-10-29T01:46:19+00:00

Add a limit to the number of jobs that can be queued in a single
exec queue to avoid potential resource exhaustion.

A new field `job_cnt` is introduced in `struct xe_exec_queue` to
track the number of active DRM jobs, along with a maximum limit
`XE_MAX_JOB_COUNT_PER_EXEC_QUEUE` set to 1000.

If the job count exceeds this threshold, `xe_exec_ioctl()` now
returns `-EAGAIN` to signal that the caller should retry later.

A trace event is added to track when the limit is reached:
"xe_exec_queue_reach_max_job_count: dev=0000:03:00.0, job count
exceeded the maximum limit (1000) per exec queue. engine_class=0x3,
logical_mask=0x1, guc_id=2"

v3: add assert in xe_exec_queue_destroy that q->job_cnt is zero. (Matt)
v2 (Matt):
 - add log to trace the limit is hit.
 - Change max count from 0x1000 to 1000.
 - Use atomic_t for job_cnt.

Suggested-by: Matthew Brost 
Signed-off-by: Shuicheng Lin 
Reviewed-by: Matthew Brost 
Signed-off-by: Matthew Brost 
Link: https://patch.msgid.link/20251027202118.3339905-2-shuicheng.lin@intel.com

drm/xe: Move queue init before LRC creation

2025-10-09T10:22:53+00:00

A queue must be in the submission backend's tracking state before the
LRC is created to avoid a race condition where the LRC's GGTT addresses
are not properly fixed up during VF post-migration recovery.

Move the queue initialization—which adds the queue to the submission
backend's tracking state—before LRC creation.

Also wait on pending GGTT fixups before allocating LRCs to avoid racing
with fixups.

v2:
 - Wait on VF GGTT fixes before creating LRC (testing)
v5:
 - Adjust comment in code (Tomasz)
 - Reduce race window
v7:
 - Only wakeup waiters in recovery path (CI)
 - Wakeup waiters on abort
 - Use GT warn on (Michal)
 - Fix kernel doc for LRC ring size function (Tomasz)
v8:
 - Guard against migration not supported or no memirq (CI)

Signed-off-by: Matthew Brost 
Reviewed-by: Tomasz Lis 
Link: https://lore.kernel.org/r/20251008214532.3442967-28-matthew.brost@intel.com

drm/xe: Track LR jobs in DRM scheduler pending list

2025-10-09T10:22:19+00:00

VF migration requires jobs to remain pending so they can be replayed
after the VF comes back. Previously, LR job fences were intentionally
signaled immediately after submission to avoid the risk of exporting
them, as these fences do not naturally signal in a timely manner and
could break dma-fence contracts. A side effect of this approach was that
LR jobs were never added to the DRM scheduler’s pending list, preventing
them from being tracked for later resubmission.

We now avoid signaling LR job fences and ensure they are never exported;
Xe already guards against exporting these internal fences. With that
guarantee in place, we can safely track LR jobs in the scheduler’s
pending list so they are eligible for resubmission during VF
post-migration recovery (and similar recovery paths).

An added benefit is that LR queues now gain the DRM scheduler’s built-in
flow control over ring usage rather than rejecting new jobs in the exec
IOCTL if the ring is full.

v2:
 - Ensure DRM scheduler TDR doesn't run for LR jobs
 - Stack variable for killed_or_banned_or_wedged
v4:
 - Clarify commit message (Tomasz)

Signed-off-by: Matthew Brost 
Reviewed-by: Tomasz Lis 
Link: https://lore.kernel.org/r/20251008214532.3442967-5-matthew.brost@intel.com