<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/gpu/drm/xe, branch master</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>drm/xe: fix job timeout recovery for unstarted jobs and kernel queues</title>
<updated>2026-06-11T13:39:43+00:00</updated>
<author>
<name>Rodrigo Vivi</name>
<email>rodrigo.vivi@intel.com</email>
</author>
<published>2026-06-10T15:25:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=347ccc0453fca2c669e8dc8a72000e76ca4adf10'/>
<id>347ccc0453fca2c669e8dc8a72000e76ca4adf10</id>
<content type='text'>
A job that GuC never scheduled (never started) indicates a GuC
scheduling failure; previously such jobs were silently errored out
instead of triggering a GT reset to recover. Trigger a GT reset and
resubmit them, but only when the queue was not already killed or banned:
an unstarted job on an already banned queue is the ban working as
intended and must neither clear the ban nor kick off a reset, otherwise
a banned userspace queue could be resurrected and spam GT resets.

Kernel queues are always recovered this way and wedge the device once
recovery attempts are exhausted, since kernel work must not silently
fail. A started job that times out on a userspace VM bind queue stays
banned rather than being reset and retried.

The queue is banned early in the timeout handler to signal the G2H
scheduling-done handler so it wakes the disable-scheduling waiter;
without it the waiter sleeps the full 5s timeout. When a reset is
warranted the ban is cleared before rearming so that
guc_exec_queue_start() can resubmit jobs after the GT reset - a
still-banned queue would block resubmission and cause an infinite TDR
loop. The already-banned case is gated out before this point via
skip_timeout_check, so it is unaffected.

v2: (Himal) Do it for any queue type, not just kernel/migration
v3: - (Sashiko and Sanjay): don't clear the ban / GT reset for already
      killed/banned queues on unstarted-job timeout
    - Update commit message
    - (Matt) Add Fixes tag

Fixes: fe05cee4d953 ("drm/xe: Don't short circuit TDR on jobs not started")
Cc: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Sanjay Yadav &lt;sanjay.kumar.yadav@intel.com&gt;
Cc: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Assisted-by: GitHub-Copilot:claude-sonnet-4.6
Assisted-by: GitHub-Copilot:claude-opus-4.8
Tested-by: Sanjay Yadav &lt;sanjay.kumar.yadav@intel.com&gt;
Reviewed-by: Sanjay Yadav &lt;sanjay.kumar.yadav@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Link: https://patch.msgid.link/20260610152548.404575-3-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
(cherry picked from commit b1107d085e7e8ed15ba6f80c102528a9c8a6cb0e)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A job that GuC never scheduled (never started) indicates a GuC
scheduling failure; previously such jobs were silently errored out
instead of triggering a GT reset to recover. Trigger a GT reset and
resubmit them, but only when the queue was not already killed or banned:
an unstarted job on an already banned queue is the ban working as
intended and must neither clear the ban nor kick off a reset, otherwise
a banned userspace queue could be resurrected and spam GT resets.

Kernel queues are always recovered this way and wedge the device once
recovery attempts are exhausted, since kernel work must not silently
fail. A started job that times out on a userspace VM bind queue stays
banned rather than being reset and retried.

The queue is banned early in the timeout handler to signal the G2H
scheduling-done handler so it wakes the disable-scheduling waiter;
without it the waiter sleeps the full 5s timeout. When a reset is
warranted the ban is cleared before rearming so that
guc_exec_queue_start() can resubmit jobs after the GT reset - a
still-banned queue would block resubmission and cause an infinite TDR
loop. The already-banned case is gated out before this point via
skip_timeout_check, so it is unaffected.

v2: (Himal) Do it for any queue type, not just kernel/migration
v3: - (Sashiko and Sanjay): don't clear the ban / GT reset for already
      killed/banned queues on unstarted-job timeout
    - Update commit message
    - (Matt) Add Fixes tag

Fixes: fe05cee4d953 ("drm/xe: Don't short circuit TDR on jobs not started")
Cc: Matthew Auld &lt;matthew.auld@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Sanjay Yadav &lt;sanjay.kumar.yadav@intel.com&gt;
Cc: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Assisted-by: GitHub-Copilot:claude-sonnet-4.6
Assisted-by: GitHub-Copilot:claude-opus-4.8
Tested-by: Sanjay Yadav &lt;sanjay.kumar.yadav@intel.com&gt;
Reviewed-by: Sanjay Yadav &lt;sanjay.kumar.yadav@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Link: https://patch.msgid.link/20260610152548.404575-3-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
(cherry picked from commit b1107d085e7e8ed15ba6f80c102528a9c8a6cb0e)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe: fix refcount leak in xe_range_fence_insert()</title>
<updated>2026-06-11T13:39:40+00:00</updated>
<author>
<name>Wentao Liang</name>
<email>vulab@iscas.ac.cn</email>
</author>
<published>2026-06-10T17:27:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ba36786b21d19082e696eda85bfcd49e7071944a'/>
<id>ba36786b21d19082e696eda85bfcd49e7071944a</id>
<content type='text'>
xe_range_fence_insert() acquires a reference on fence via
dma_fence_get() and stores it in rfence-&gt;fence.  It then calls
dma_fence_add_callback() and handles two cases: when the callback
is successfully registered (err == 0) the fence is transferred to
the tree for later cleanup; when the fence is already signaled
(err == -ENOENT) it manually drops the extra reference with
dma_fence_put(fence).

However, dma_fence_add_callback() can fail with other errors
(e.g. -EINVAL) and in that case the code falls through to the free:
label without releasing the acquired reference, leaking it.

Fix the leak by adding an else branch that calls dma_fence_put()
before jumping to free: for any error other than -ENOENT.
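
The three outcomes above can be modelled with a toy refcount. This is
only an illustrative sketch in Python, not the driver code: the Fence
class, insert() and the fixed flag are hypothetical names, and cb_err
stands in for the return value of dma_fence_add_callback().

```python
import errno


class Fence:
    """Toy refcounted object standing in for a dma_fence."""

    def __init__(self):
        self.refcount = 1  # the caller's own reference

    def get(self):
        self.refcount += 1
        return self

    def put(self):
        self.refcount -= 1


def insert(fence, tree, cb_err, fixed=True):
    """Model of the insert path.

    cb_err plays the result of dma_fence_add_callback();
    fixed=False reproduces the leak described above.
    """
    held = fence.get()         # reference stored in rfence.fence
    if cb_err == 0:
        tree.append(held)      # callback armed: the tree owns the reference
    elif cb_err == -errno.ENOENT:
        held.put()             # already signaled: drop the extra reference
    elif fixed:
        held.put()             # any other error: the added else branch
    return cb_err


leaky, patched = Fence(), Fence()
insert(leaky, [], -errno.EINVAL, fixed=False)
insert(patched, [], -errno.EINVAL)
print(leaky.refcount, patched.refcount)  # prints "2 1"
```

In this model the unpatched path leaves the count at 2 after an
unexpected error, while the patched path returns it to the caller's
single reference.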

Fixes: 845f64bdbfc9 ("drm/xe: Introduce a range-fence utility")
Signed-off-by: Wentao Liang &lt;vulab@iscas.ac.cn&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patch.msgid.link/20260610172705.3450560-1-matthew.brost@intel.com
(cherry picked from commit 98c4a4201290823c2c5c7ba21692bd9a64b61021)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
xe_range_fence_insert() acquires a reference on fence via
dma_fence_get() and stores it in rfence-&gt;fence.  It then calls
dma_fence_add_callback() and handles two cases: when the callback
is successfully registered (err == 0) the fence is transferred to
the tree for later cleanup; when the fence is already signaled
(err == -ENOENT) it manually drops the extra reference with
dma_fence_put(fence).

However, dma_fence_add_callback() can fail with other errors
(e.g. -EINVAL) and in that case the code falls through to the free:
label without releasing the acquired reference, leaking it.

Fix the leak by adding an else branch that calls dma_fence_put()
before jumping to free: for any error other than -ENOENT.

Fixes: 845f64bdbfc9 ("drm/xe: Introduce a range-fence utility")
Signed-off-by: Wentao Liang &lt;vulab@iscas.ac.cn&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patch.msgid.link/20260610172705.3450560-1-matthew.brost@intel.com
(cherry picked from commit 98c4a4201290823c2c5c7ba21692bd9a64b61021)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe: include all registered queues in TLB invalidation</title>
<updated>2026-06-10T16:33:29+00:00</updated>
<author>
<name>Tangudu Tilak Tirumalesh</name>
<email>tilak.tirumalesh.tangudu@intel.com</email>
</author>
<published>2026-06-08T16:27:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e4aaac46593733a06ec1a1f1a63128206d67fcaa'/>
<id>e4aaac46593733a06ec1a1f1a63128206d67fcaa</id>
<content type='text'>
Context-based TLB invalidation currently selects only scheduling-active
exec queues via q-&gt;ops-&gt;active(). During rebind flows, queues may be
suspended (or transitioning through resume) while still owning valid
translations, causing them to be skipped from invalidation and leading
to missed TLB invalidations on LR rebinds.

The underlying issue is a TOCTOU: q-&gt;guc-&gt;state bits are flipped lock-free
from enable_scheduling(), disable_scheduling{,_deregister}(), the
suspend/resume sched-msg handlers, handle_sched_done(), and
guc_exec_queue_stop(); nothing in send_tlb_inval_ctx_ppgtt() serializes
against them, so any state-based predicate can race.

Include all the registered queues so that TLB invalidations are not
missed. This is race-free because list membership on vm-&gt;exec_queues.list
is stable under vm-&gt;exec_queues.lock held by the caller. The performance
impact is expected to be minimal and harmless. If it does turn out to be
a concern, we can come back with a race-safe solution to ignore certain
queues.

Fixes: 6cdaa5346d6f ("drm/xe: Add context-based invalidation to GuC TLB invalidation backend")
Assisted-by: Claude:claude-opus-4.6
Suggested-by: Thomas Hellstrom &lt;thomas.hellstrom@linux.intel.com&gt;
Signed-off-by: Tangudu Tilak Tirumalesh &lt;tilak.tirumalesh.tangudu@intel.com&gt;
Reviewed-by: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patch.msgid.link/20260608162745.338725-2-tilak.tirumalesh.tangudu@intel.com
Signed-off-by: Shuicheng Lin &lt;shuicheng.lin@intel.com&gt;
(cherry picked from commit aa625e1e9f0710e424fe4f0e3f032807df81b5b0)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Context-based TLB invalidation currently selects only scheduling-active
exec queues via q-&gt;ops-&gt;active(). During rebind flows, queues may be
suspended (or transitioning through resume) while still owning valid
translations, causing them to be skipped from invalidation and leading
to missed TLB invalidations on LR rebinds.

The underlying issue is a TOCTOU: q-&gt;guc-&gt;state bits are flipped lock-free
from enable_scheduling(), disable_scheduling{,_deregister}(), the
suspend/resume sched-msg handlers, handle_sched_done(), and
guc_exec_queue_stop(); nothing in send_tlb_inval_ctx_ppgtt() serializes
against them, so any state-based predicate can race.

Include all the registered queues so that TLB invalidations are not
missed. This is race-free because list membership on vm-&gt;exec_queues.list
is stable under vm-&gt;exec_queues.lock held by the caller. The performance
impact is expected to be minimal and harmless. If it does turn out to be
a concern, we can come back with a race-safe solution to ignore certain
queues.

Fixes: 6cdaa5346d6f ("drm/xe: Add context-based invalidation to GuC TLB invalidation backend")
Assisted-by: Claude:claude-opus-4.6
Suggested-by: Thomas Hellstrom &lt;thomas.hellstrom@linux.intel.com&gt;
Signed-off-by: Tangudu Tilak Tirumalesh &lt;tilak.tirumalesh.tangudu@intel.com&gt;
Reviewed-by: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Link: https://patch.msgid.link/20260608162745.338725-2-tilak.tirumalesh.tangudu@intel.com
Signed-off-by: Shuicheng Lin &lt;shuicheng.lin@intel.com&gt;
(cherry picked from commit aa625e1e9f0710e424fe4f0e3f032807df81b5b0)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/hw_error: Use HW_ERR prefix in log</title>
<updated>2026-06-10T16:33:25+00:00</updated>
<author>
<name>Raag Jadav</name>
<email>raag.jadav@intel.com</email>
</author>
<published>2026-06-02T04:48:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=381b3576a87f4ed6e76adb78d7d9400428f8f4b7'/>
<id>381b3576a87f4ed6e76adb78d7d9400428f8f4b7</id>
<content type='text'>
Hardware errors should be logged with HW_ERR prefix. Make them
consistent with existing logs.

Fixes: 01aab7e1c9d4 ("drm/xe/xe_hw_error: Add support for PVC SoC errors")
Signed-off-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Reviewed-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Link: https://patch.msgid.link/20260602044919.702209-5-raag.jadav@intel.com
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
(cherry picked from commit ad60a618c49fef07d1860bfb1091140d29f5eddb)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hardware errors should be logged with HW_ERR prefix. Make them
consistent with existing logs.

Fixes: 01aab7e1c9d4 ("drm/xe/xe_hw_error: Add support for PVC SoC errors")
Signed-off-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Reviewed-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Link: https://patch.msgid.link/20260602044919.702209-5-raag.jadav@intel.com
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
(cherry picked from commit ad60a618c49fef07d1860bfb1091140d29f5eddb)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/drm_ras: Add per node cleanup action</title>
<updated>2026-06-10T16:33:22+00:00</updated>
<author>
<name>Raag Jadav</name>
<email>raag.jadav@intel.com</email>
</author>
<published>2026-06-02T04:48:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3e3f5b0c5ae6845b4d8d23f079e872635cd8b0ae'/>
<id>3e3f5b0c5ae6845b4d8d23f079e872635cd8b0ae</id>
<content type='text'>
cleanup_node_param() is not registered for previous node in case of counter
allocation failure, which results in stale memory of previous node that
isn't cleaned up on unwind. Add per node cleanup action which guarantees
cleanup on unwind and also simplifies the cleanup logic.

Fixes: b40db12b542f ("drm/xe/xe_drm_ras: Add support for XE DRM RAS")
Signed-off-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Reviewed-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Link: https://patch.msgid.link/20260602044919.702209-4-raag.jadav@intel.com
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
(cherry picked from commit 67fc5543d8274b2fcbef87734fad0469358f4478)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cleanup_node_param() is not registered for previous node in case of counter
allocation failure, which results in stale memory of previous node that
isn't cleaned up on unwind. Add per node cleanup action which guarantees
cleanup on unwind and also simplifies the cleanup logic.

Fixes: b40db12b542f ("drm/xe/xe_drm_ras: Add support for XE DRM RAS")
Signed-off-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Reviewed-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Link: https://patch.msgid.link/20260602044919.702209-4-raag.jadav@intel.com
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
(cherry picked from commit 67fc5543d8274b2fcbef87734fad0469358f4478)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/drm_ras: Make counter allocation drm managed</title>
<updated>2026-06-10T16:33:18+00:00</updated>
<author>
<name>Raag Jadav</name>
<email>raag.jadav@intel.com</email>
</author>
<published>2026-06-02T04:48:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4dcfcdc9fbb5efef21e149adf349d42d84c9da04'/>
<id>4dcfcdc9fbb5efef21e149adf349d42d84c9da04</id>
<content type='text'>
cleanup_node_param() is not registered for previous node in case of counter
allocation failure, which results in stale memory of previous node that
isn't cleaned up on unwind. Fix this using drm managed allocation, which is
guaranteed to be cleaned up on unwind.

Fixes: b40db12b542f ("drm/xe/xe_drm_ras: Add support for XE DRM RAS")
Signed-off-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Reviewed-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Link: https://patch.msgid.link/20260602044919.702209-3-raag.jadav@intel.com
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
(cherry picked from commit 58d77c77ea0c5cb2b755ebe23e973c8272acd896)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cleanup_node_param() is not registered for previous node in case of counter
allocation failure, which results in stale memory of previous node that
isn't cleaned up on unwind. Fix this using drm managed allocation, which is
guaranteed to be cleaned up on unwind.

Fixes: b40db12b542f ("drm/xe/xe_drm_ras: Add support for XE DRM RAS")
Signed-off-by: Raag Jadav &lt;raag.jadav@intel.com&gt;
Reviewed-by: Riana Tauro &lt;riana.tauro@intel.com&gt;
Link: https://patch.msgid.link/20260602044919.702209-3-raag.jadav@intel.com
Signed-off-by: Matt Roper &lt;matthew.d.roper@intel.com&gt;
(cherry picked from commit 58d77c77ea0c5cb2b755ebe23e973c8272acd896)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/display: fix oops in suspend/shutdown without display</title>
<updated>2026-06-10T16:33:09+00:00</updated>
<author>
<name>Jani Nikula</name>
<email>jani.nikula@intel.com</email>
</author>
<published>2026-05-15T16:09:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=68938cc08e23a94fd881e845837ff918de005ce7'/>
<id>68938cc08e23a94fd881e845837ff918de005ce7</id>
<content type='text'>
The xe driver keeps track of whether to probe display, and whether
display hardware is there, using xe-&gt;info.probe_display. It gets set to
false if there's no display after intel_display_device_probe(). However,
the display may also be disabled via fuses, detected at a later time in
intel_display_device_info_runtime_init().

In this case, the xe driver does for_each_intel_crtc() on uninitialized
mode config in xe_display_flush_cleanup_work(), leading to a NULL
pointer dereference, and generally calls display code with display info
cleared.

Check for intel_display_device_present() after
intel_display_device_info_runtime_init(), and reset
xe-&gt;info.probe_display as necessary. Also do unset_display_features()
for completeness, although display runtime init has already done
that. This will need to be unified across all cases later.

Move intel_display_device_info_runtime_init() call slightly earlier,
similar to i915, to avoid a bunch of unnecessary setup for no display
cases.

Note #1: The xe driver has no business doing low level display plumbing
like for_each_intel_crtc() to begin with. It all needs to happen in
display code.

Note #2: The actual bug is present already in commit 44e694958b95
("drm/xe/display: Implement display support"), but the oops was likely
introduced later at commit ddf6492e0e50 ("drm/xe/display: Make display
suspend/resume work on discrete").

Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7904
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/6150
Cc: stable@vger.kernel.org # v6.8+
Reviewed-by: Suraj Kandpal &lt;suraj.kandpal@intel.com&gt;
Link: https://patch.msgid.link/20260515160920.1082842-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
(cherry picked from commit 7c3eb9f47533220888a67266448185fd0775d4da)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The xe driver keeps track of whether to probe display, and whether
display hardware is there, using xe-&gt;info.probe_display. It gets set to
false if there's no display after intel_display_device_probe(). However,
the display may also be disabled via fuses, detected at a later time in
intel_display_device_info_runtime_init().

In this case, the xe driver does for_each_intel_crtc() on uninitialized
mode config in xe_display_flush_cleanup_work(), leading to a NULL
pointer dereference, and generally calls display code with display info
cleared.

Check for intel_display_device_present() after
intel_display_device_info_runtime_init(), and reset
xe-&gt;info.probe_display as necessary. Also do unset_display_features()
for completeness, although display runtime init has already done
that. This will need to be unified across all cases later.

Move intel_display_device_info_runtime_init() call slightly earlier,
similar to i915, to avoid a bunch of unnecessary setup for no display
cases.

Note #1: The xe driver has no business doing low level display plumbing
like for_each_intel_crtc() to begin with. It all needs to happen in
display code.

Note #2: The actual bug is present already in commit 44e694958b95
("drm/xe/display: Implement display support"), but the oops was likely
introduced later at commit ddf6492e0e50 ("drm/xe/display: Make display
suspend/resume work on discrete").

Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7904
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/6150
Cc: stable@vger.kernel.org # v6.8+
Reviewed-by: Suraj Kandpal &lt;suraj.kandpal@intel.com&gt;
Link: https://patch.msgid.link/20260515160920.1082842-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
(cherry picked from commit 7c3eb9f47533220888a67266448185fd0775d4da)
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/multi_queue: skip submit when primary queue is suspended</title>
<updated>2026-06-04T13:04:09+00:00</updated>
<author>
<name>Niranjana Vishwanathapura</name>
<email>niranjana.vishwanathapura@intel.com</email>
</author>
<published>2026-06-03T23:39:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ec4cbdd163f9bb2a2bd44eb93ecf4a2fa0e912a9'/>
<id>ec4cbdd163f9bb2a2bd44eb93ecf4a2fa0e912a9</id>
<content type='text'>
Return early in submit path when the multi-queue primary exec
queue is suspended to avoid submitting while suspended.

v2: Remove idle_skip_suspend fix as that feature is being
reverted here https://patchwork.freedesktop.org/series/167262/

Fixes: bc5775c59258 ("drm/xe/multi_queue: Add GuC interface for multi queue support")
Cc: stable@vger.kernel.org # v7.0+
Assisted-by: GitHub-Copilot:claude-sonnet-4.6
Reviewed-by: Daniele Ceraolo Spurio &lt;daniele.ceraolospurio@intel.com&gt;
Signed-off-by: Niranjana Vishwanathapura &lt;niranjana.vishwanathapura@intel.com&gt;
Link: https://patch.msgid.link/20260603233946.863663-2-niranjana.vishwanathapura@intel.com
(cherry picked from commit b7fb55cc3364ca128cfff9d50649ffd4327cd01e)
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Return early in submit path when the multi-queue primary exec
queue is suspended to avoid submitting while suspended.

v2: Remove idle_skip_suspend fix as that feature is being
reverted here https://patchwork.freedesktop.org/series/167262/

Fixes: bc5775c59258 ("drm/xe/multi_queue: Add GuC interface for multi queue support")
Cc: stable@vger.kernel.org # v7.0+
Assisted-by: GitHub-Copilot:claude-sonnet-4.6
Reviewed-by: Daniele Ceraolo Spurio &lt;daniele.ceraolospurio@intel.com&gt;
Signed-off-by: Niranjana Vishwanathapura &lt;niranjana.vishwanathapura@intel.com&gt;
Link: https://patch.msgid.link/20260603233946.863663-2-niranjana.vishwanathapura@intel.com
(cherry picked from commit b7fb55cc3364ca128cfff9d50649ffd4327cd01e)
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe: Clear pending_disable before signaling suspend fence</title>
<updated>2026-06-04T13:04:03+00:00</updated>
<author>
<name>Tangudu Tilak Tirumalesh</name>
<email>tilak.tirumalesh.tangudu@intel.com</email>
</author>
<published>2026-06-03T06:52:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=54f2a0442a30fe7a0f6bc8345e81f8b2db8effbd'/>
<id>54f2a0442a30fe7a0f6bc8345e81f8b2db8effbd</id>
<content type='text'>
In the schedule-disable done path for suspend, we
signal the suspend fence before clearing pending_disable.

That wakeup can let suspend_wait complete and resume be queued
immediately. The resume path may then reach enable_scheduling()
while pending_disable is still set and hit the
!exec_queue_pending_disable(q) assertion.

Fix this by clearing pending_disable before signaling
the suspend fence, so any resumed transition observes a
consistent state.
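
The ordering problem can be sketched with a deterministic toy model.
This is an assumption-laden illustration in Python, not the driver
code: Queue, resume() and the two sched_done_*() helpers are
hypothetical names, and the real race is between concurrent contexts
rather than a synchronous callback.

```python
class Queue:
    """Toy exec queue; field and method names are illustrative only."""

    def __init__(self):
        self.pending_disable = True
        self.waiters = []  # callbacks run when the suspend fence signals

    def signal_suspend_fence(self):
        for wake in self.waiters:
            wake(self)


def resume(q):
    # Stands in for enable_scheduling() and its assertion.
    assert not q.pending_disable, "resume saw pending_disable still set"


def sched_done_buggy(q):
    q.signal_suspend_fence()   # a waiter may resume right here
    q.pending_disable = False


def sched_done_fixed(q):
    q.pending_disable = False  # publish the consistent state first
    q.signal_suspend_fence()


q = Queue()
q.waiters.append(resume)
sched_done_fixed(q)            # completes without tripping the assertion
```

Calling sched_done_buggy() on a fresh queue with the same waiter raises
AssertionError in this model, which mirrors the assertion described
above.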

Fixes: 87651f31ae4e ("drm/xe/guc_submit: fix race around suspend_pending")
Cc: stable@vger.kernel.org # v7.0+
Signed-off-by: Tangudu Tilak Tirumalesh &lt;tilak.tirumalesh.tangudu@intel.com&gt;
Reviewed-by: Thomas Hellstrom &lt;thomas.hellstrom@linux.intel.com&gt;
Signed-off-by: Daniele Ceraolo Spurio &lt;daniele.ceraolospurio@intel.com&gt;
Link: https://patch.msgid.link/20260603065217.3131066-3-tilak.tirumalesh.tangudu@intel.com
(cherry picked from commit 4b1ae138b0e103d753773956a84eebc2edbf62c4)
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the schedule-disable done path for suspend, we
signal the suspend fence before clearing pending_disable.

That wakeup can let suspend_wait complete and resume be queued
immediately. The resume path may then reach enable_scheduling()
while pending_disable is still set and hit the
!exec_queue_pending_disable(q) assertion.

Fix this by clearing pending_disable before signaling
the suspend fence, so any resumed transition observes a
consistent state.

Fixes: 87651f31ae4e ("drm/xe/guc_submit: fix race around suspend_pending")
Cc: stable@vger.kernel.org # v7.0+
Signed-off-by: Tangudu Tilak Tirumalesh &lt;tilak.tirumalesh.tangudu@intel.com&gt;
Reviewed-by: Thomas Hellstrom &lt;thomas.hellstrom@linux.intel.com&gt;
Signed-off-by: Daniele Ceraolo Spurio &lt;daniele.ceraolospurio@intel.com&gt;
Link: https://patch.msgid.link/20260603065217.3131066-3-tilak.tirumalesh.tangudu@intel.com
(cherry picked from commit 4b1ae138b0e103d753773956a84eebc2edbf62c4)
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "drm/xe: Skip exec queue schedule toggle if queue is idle during suspend"</title>
<updated>2026-06-04T13:03:57+00:00</updated>
<author>
<name>Tangudu Tilak Tirumalesh</name>
<email>tilak.tirumalesh.tangudu@intel.com</email>
</author>
<published>2026-06-03T06:52:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fa7c84726dc217ce0c183926ef9411636c7a2213'/>
<id>fa7c84726dc217ce0c183926ef9411636c7a2213</id>
<content type='text'>
This reverts commit 8533051ce92015e9cc6f75e0d52119b9d91610b6.

The idle-skip optimization bypasses GuC suspend, so the GPU may not
perform the context switch that flushes TLB entries for invalidated
userptr VMAs. In LR/preempt-fence VM mode, this can lead to missed TLB
invalidation and page faults during userptr invalidation tests.

Restore unconditional schedule toggling on suspend so the context-switch
TLB flush is always performed.

This optimization will be reintroduced with a fix that does not skip
suspend in LR/preempt-fence VM mode.

Fixes: 8533051ce920 ("drm/xe: Skip exec queue schedule toggle if queue is idle during suspend")
Cc: stable@vger.kernel.org # v7.0+
Suggested-by: Thomas Hellstrom &lt;thomas.hellstrom@linux.intel.com&gt;
Signed-off-by: Tangudu Tilak Tirumalesh &lt;tilak.tirumalesh.tangudu@intel.com&gt;
Reviewed-by: Thomas Hellstrom &lt;thomas.hellstrom@linux.intel.com&gt;
Signed-off-by: Daniele Ceraolo Spurio &lt;daniele.ceraolospurio@intel.com&gt;
Link: https://patch.msgid.link/20260603065217.3131066-2-tilak.tirumalesh.tangudu@intel.com
(cherry picked from commit 6a1e7934d9a6cf46aecae00a99c2603d1295e170)
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 8533051ce92015e9cc6f75e0d52119b9d91610b6.

The idle-skip optimization bypasses GuC suspend, so the GPU may not
perform the context switch that flushes TLB entries for invalidated
userptr VMAs. In LR/preempt-fence VM mode, this can lead to missed TLB
invalidation and page faults during userptr invalidation tests.

Restore unconditional schedule toggling on suspend so the context-switch
TLB flush is always performed.

This optimization will be reintroduced with a fix that does not skip
suspend in LR/preempt-fence VM mode.

Fixes: 8533051ce920 ("drm/xe: Skip exec queue schedule toggle if queue is idle during suspend")
Cc: stable@vger.kernel.org # v7.0+
Suggested-by: Thomas Hellstrom &lt;thomas.hellstrom@linux.intel.com&gt;
Signed-off-by: Tangudu Tilak Tirumalesh &lt;tilak.tirumalesh.tangudu@intel.com&gt;
Reviewed-by: Thomas Hellstrom &lt;thomas.hellstrom@linux.intel.com&gt;
Signed-off-by: Daniele Ceraolo Spurio &lt;daniele.ceraolospurio@intel.com&gt;
Link: https://patch.msgid.link/20260603065217.3131066-2-tilak.tirumalesh.tangudu@intel.com
(cherry picked from commit 6a1e7934d9a6cf46aecae00a99c2603d1295e170)
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
