<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/uapi/drm/xe_drm.h, branch v6.17</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>drm/xe/uapi: Correct sync type definition in comments</title>
<updated>2025-06-25T14:15:54+00:00</updated>
<author>
<name>Shuicheng Lin</name>
<email>shuicheng.lin@intel.com</email>
</author>
<published>2025-06-08T23:01:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=771f002ef1d6f6c2b9bddf779abd31da6b9ccd25'/>
<id>771f002ef1d6f6c2b9bddf779abd31da6b9ccd25</id>
<content type='text'>
Commit 37d078e51b4c ("drm/xe/uapi: Split xe_sync types from flags") renamed some DRM_XE_SYNC_*
defines but later commits kept using the old names. Correct them with the new definition.

v2: correct fixes tag and update commit message to explain why (Lucas)

Fixes: 9329f0667215 ("drm/xe/uapi: Use LR abbrev for long-running vms")
Fixes: 4b437893a826 ("drm/xe/uapi: More uAPI documentation additions and cosmetic updates")
Reviewed-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Cc: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Cc: Francois Dugast &lt;francois.dugast@intel.com&gt;
Cc: Zongyao Bai &lt;zongyao.bai@intel.com&gt;
Signed-off-by: Shuicheng Lin &lt;shuicheng.lin@intel.com&gt;
Link: https://lore.kernel.org/r/20250608230133.1250849-1-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 37d078e51b4c ("drm/xe/uapi: Split xe_sync types from flags") renamed some DRM_XE_SYNC_*
defines but later commits kept using the old names. Correct them with the new definition.

v2: correct fixes tag and update commit message to explain why (Lucas)

Fixes: 9329f0667215 ("drm/xe/uapi: Use LR abbrev for long-running vms")
Fixes: 4b437893a826 ("drm/xe/uapi: More uAPI documentation additions and cosmetic updates")
Reviewed-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Cc: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Cc: Francois Dugast &lt;francois.dugast@intel.com&gt;
Cc: Zongyao Bai &lt;zongyao.bai@intel.com&gt;
Signed-off-by: Shuicheng Lin &lt;shuicheng.lin@intel.com&gt;
Link: https://lore.kernel.org/r/20250608230133.1250849-1-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/oa/uapi: Expose media OA units</title>
<updated>2025-06-17T18:31:50+00:00</updated>
<author>
<name>Ashutosh Dixit</name>
<email>ashutosh.dixit@intel.com</email>
</author>
<published>2025-06-06T19:26:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e04dac12cec853347908432b663a3f78e26d3b8d'/>
<id>e04dac12cec853347908432b663a3f78e26d3b8d</id>
<content type='text'>
On Xe2+ platforms, media engines are attached to "SCMI" OA media (OAM)
units. One or more SCMI OAM units might be present on a platform. In
addition there is another OAM unit for global events, called
OAM-SAG. Performance metrics for media workloads can be obtained from these
OAM units, similar to OAG.

Expose these OAM units for userspace to use. OAM-SAG is exposed as an OA
unit without any attached engines.

Bspec: 70819, 67103, 63844, 72572, 74476, 61284

v2: Fix xe_gt_WARN_ON in __hwe_oam_unit for &lt; 12.7 platforms
v3: Return XE_OA_UNIT_INVALID for &lt; 12.7 to indicate no OAM units
v4: Move xe_oa_print_oa_units() to separate patch
v5: Introduce DRM_XE_OA_UNIT_TYPE_OAM_SAG
v6: Introduce DRM_XE_OA_CAPS_OAM

Signed-off-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Reviewed-by: Umesh Nerlige Ramappa &lt;umesh.nerlige.ramappa@intel.com&gt;
Link: https://lore.kernel.org/r/20250606192618.4133817-2-ashutosh.dixit@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On Xe2+ platforms, media engines are attached to "SCMI" OA media (OAM)
units. One or more SCMI OAM units might be present on a platform. In
addition there is another OAM unit for global events, called
OAM-SAG. Performance metrics for media workloads can be obtained from these
OAM units, similar to OAG.

Expose these OAM units for userspace to use. OAM-SAG is exposed as an OA
unit without any attached engines.

Bspec: 70819, 67103, 63844, 72572, 74476, 61284

v2: Fix xe_gt_WARN_ON in __hwe_oam_unit for &lt; 12.7 platforms
v3: Return XE_OA_UNIT_INVALID for &lt; 12.7 to indicate no OAM units
v4: Move xe_oa_print_oa_units() to separate patch
v5: Introduce DRM_XE_OA_UNIT_TYPE_OAM_SAG
v6: Introduce DRM_XE_OA_CAPS_OAM

Signed-off-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Reviewed-by: Umesh Nerlige Ramappa &lt;umesh.nerlige.ramappa@intel.com&gt;
Link: https://lore.kernel.org/r/20250606192618.4133817-2-ashutosh.dixit@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/pxp: Clarify PXP queue creation behavior if PXP is not ready</title>
<updated>2025-06-02T15:28:48+00:00</updated>
<author>
<name>Daniele Ceraolo Spurio</name>
<email>daniele.ceraolospurio@intel.com</email>
</author>
<published>2025-05-22T22:54:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=21784ca96025b62d95b670b7639ad70ddafa69b8'/>
<id>21784ca96025b62d95b670b7639ad70ddafa69b8</id>
<content type='text'>
The expected flow of operations when using PXP is to query the PXP
status and wait for it to transition to "ready" before attempting to
create an exec_queue. This flow is followed by the Mesa driver, but
there is no guarantee that an incorrectly coded (or malicious) app
will not attempt to create the queue first without querying the status.
Therefore, we need to clarify what the expected behavior of the queue
creation ioctl is in this scenario.

Currently, the ioctl always fails with an -EBUSY code no matter the
error, but for consistency it is better to distinguish between "failed
to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl
does. Note that, while this is a change in the return code of an ioctl,
the behavior of the ioctl in this particular corner case was not clearly
spec'd, so no one should have been relying on it (and we know that Mesa,
which is the only known userspace for this, didn't).

v2: Minor rework of the doc (Rodrigo)

Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio &lt;daniele.ceraolospurio@intel.com&gt;
Cc: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Cc: José Roberto de Souza &lt;jose.souza@intel.com&gt;
Reviewed-by: José Roberto de Souza &lt;jose.souza@intel.com&gt;
Reviewed-by: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Acked-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The expected flow of operations when using PXP is to query the PXP
status and wait for it to transition to "ready" before attempting to
create an exec_queue. This flow is followed by the Mesa driver, but
there is no guarantee that an incorrectly coded (or malicious) app
will not attempt to create the queue first without querying the status.
Therefore, we need to clarify what the expected behavior of the queue
creation ioctl is in this scenario.

Currently, the ioctl always fails with an -EBUSY code no matter the
error, but for consistency it is better to distinguish between "failed
to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl
does. Note that, while this is a change in the return code of an ioctl,
the behavior of the ioctl in this particular corner case was not clearly
spec'd, so no one should have been relying on it (and we know that Mesa,
which is the only known userspace for this, didn't).

v2: Minor rework of the doc (Rodrigo)

Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio &lt;daniele.ceraolospurio@intel.com&gt;
Cc: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Cc: José Roberto de Souza &lt;jose.souza@intel.com&gt;
Reviewed-by: José Roberto de Souza &lt;jose.souza@intel.com&gt;
Reviewed-by: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Acked-by: Rodrigo Vivi &lt;rodrigo.vivi@intel.com&gt;
Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe: Allow scratch page under fault mode for certain platform</title>
<updated>2025-04-07T05:47:30+00:00</updated>
<author>
<name>Oak Zeng</name>
<email>oak.zeng@intel.com</email>
</author>
<published>2025-04-03T16:53:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ae28e34400aab0dc4876eac83766fa9731f9901c'/>
<id>ae28e34400aab0dc4876eac83766fa9731f9901c</id>
<content type='text'>
Normally scratch page is not allowed when a vm is operate under page
fault mode, i.e., in the existing codes, DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason
is fault mode relies on recoverable page to work, while scratch page
can mute recoverable page fault.

On xe2 and xe3, out of bound prefetch can cause page fault and further
system hang because xekmd can't resolve such page fault. SYCL and OCL
language runtime requires out of bound prefetch to be silently dropped
without causing any functional problem, thus the existing behavior
doesn't meet language runtime requirement.

At the same time, HW prefetching can cause page fault interrupt. Due to
page fault interrupt overhead (i.e., need Guc and KMD involved to fix
the page fault), HW prefetching can be slowed by many orders of magnitude.

Fix those problems by allowing scratch page under fault mode for xe2 and
xe3. With scratch page in place, HW prefetching could always hit scratch
page instead of causing interrupt.

A side effect is, scratch page could hide application program error.
Application out of bound accesses are hided by scratch page mapping,
instead of get reported to user.

v2: Refine commit message (Thomas)

v3: Move the scratch page flag check to after scratch page wa (Thomas)

v4: drop NEEDS_SCRATCH macro (matt)
    Add a comment to DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE

Signed-off-by: Oak Zeng &lt;oak.zeng@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Link: https://lore.kernel.org/r/20250403165328.2438690-4-oak.zeng@intel.com
Signed-off-by: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Normally scratch page is not allowed when a vm is operate under page
fault mode, i.e., in the existing codes, DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE
and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason
is fault mode relies on recoverable page to work, while scratch page
can mute recoverable page fault.

On xe2 and xe3, out of bound prefetch can cause page fault and further
system hang because xekmd can't resolve such page fault. SYCL and OCL
language runtime requires out of bound prefetch to be silently dropped
without causing any functional problem, thus the existing behavior
doesn't meet language runtime requirement.

At the same time, HW prefetching can cause page fault interrupt. Due to
page fault interrupt overhead (i.e., need Guc and KMD involved to fix
the page fault), HW prefetching can be slowed by many orders of magnitude.

Fix those problems by allowing scratch page under fault mode for xe2 and
xe3. With scratch page in place, HW prefetching could always hit scratch
page instead of causing interrupt.

A side effect is, scratch page could hide application program error.
Application out of bound accesses are hided by scratch page mapping,
instead of get reported to user.

v2: Refine commit message (Thomas)

v3: Move the scratch page flag check to after scratch page wa (Thomas)

v4: drop NEEDS_SCRATCH macro (matt)
    Add a comment to DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE

Signed-off-by: Oak Zeng &lt;oak.zeng@intel.com&gt;
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Link: https://lore.kernel.org/r/20250403165328.2438690-4-oak.zeng@intel.com
Signed-off-by: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/uapi: Add DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR</title>
<updated>2025-03-06T19:35:52+00:00</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2025-03-06T01:26:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=77613a2e10087b1e613649ecb337c4922900421c'/>
<id>77613a2e10087b1e613649ecb337c4922900421c</id>
<content type='text'>
Add the DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR device query flag,
which indicates whether the device supports CPU address mirroring. The
intent is for UMDs to use this query to determine if a VM can be set up
with CPU address mirroring. This flag is implemented by checking if the
device supports GPU faults.

v7:
 - Only report enabled if CONFIG_DRM_GPUSVM is selected (CI)

Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Reviewed-by: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Reviewed-by: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-20-matthew.brost@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add the DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR device query flag,
which indicates whether the device supports CPU address mirroring. The
intent is for UMDs to use this query to determine if a VM can be set up
with CPU address mirroring. This flag is implemented by checking if the
device supports GPU faults.

v7:
 - Only report enabled if CONFIG_DRM_GPUSVM is selected (CI)

Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Reviewed-by: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Reviewed-by: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-20-matthew.brost@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR</title>
<updated>2025-03-06T19:35:33+00:00</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2025-03-06T01:26:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b43e864af0d4e74636c0e1dee857ce3275a84829'/>
<id>b43e864af0d4e74636c0e1dee857ce3275a84829</id>
<content type='text'>
Add the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag, which is used to
create unpopulated virtual memory areas (VMAs) without memory backing or
GPU page tables. These VMAs are referred to as CPU address mirror VMAs.
The idea is that upon a page fault or prefetch, the memory backing and
GPU page tables will be populated.

CPU address mirror VMAs only update GPUVM state; they do not have an
internal page table (PT) state, nor do they have GPU mappings.

It is expected that CPU address mirror VMAs will be mixed with buffer
object (BO) VMAs within a single VM. In other words, system allocations
and runtime allocations can be mixed within a single user-mode driver
(UMD) program.

Expected usage:

- Bind the entire virtual address (VA) space upon program load using the
  DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag.
- If a buffer object (BO) requires GPU mapping (runtime allocation),
  allocate a CPU address using mmap(PROT_NONE), bind the BO to the
  mmapped address using existing bind IOCTLs. If a CPU map of the BO is
  needed, mmap it again to the same CPU address using mmap(MAP_FIXED)
- If a BO no longer requires GPU mapping, munmap it from the CPU address
  space and them bind the mapping address with the
  DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag.
- Any malloc'd or mmapped CPU address accessed by the GPU will be
  faulted in via the SVM implementation (system allocation).
- Upon freeing any mmapped or malloc'd data, the SVM implementation will
  remove GPU mappings.

Only supporting 1 to 1 mapping between user address space and GPU
address space at the moment as that is the expected use case. uAPI
defines interface for non 1 to 1 but enforces 1 to 1, this restriction
can be lifted if use cases arrise for non 1 to 1 mappings.

This patch essentially short-circuits the code in the existing VM bind
paths to avoid populating page tables when the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag is set.

v3:
 - Call vm_bind_ioctl_ops_fini on -ENODATA
 - Don't allow DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR on non-faulting VMs
 - s/DRM_XE_VM_BIND_FLAG_SYSTEM_ALLOCATOR/DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR (Thomas)
 - Rework commit message for expected usage (Thomas)
 - Describe state of code after patch in commit message (Thomas)
v4:
 - Fix alignment (Checkpatch)

Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Reviewed-by: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-9-matthew.brost@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag, which is used to
create unpopulated virtual memory areas (VMAs) without memory backing or
GPU page tables. These VMAs are referred to as CPU address mirror VMAs.
The idea is that upon a page fault or prefetch, the memory backing and
GPU page tables will be populated.

CPU address mirror VMAs only update GPUVM state; they do not have an
internal page table (PT) state, nor do they have GPU mappings.

It is expected that CPU address mirror VMAs will be mixed with buffer
object (BO) VMAs within a single VM. In other words, system allocations
and runtime allocations can be mixed within a single user-mode driver
(UMD) program.

Expected usage:

- Bind the entire virtual address (VA) space upon program load using the
  DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag.
- If a buffer object (BO) requires GPU mapping (runtime allocation),
  allocate a CPU address using mmap(PROT_NONE), bind the BO to the
  mmapped address using existing bind IOCTLs. If a CPU map of the BO is
  needed, mmap it again to the same CPU address using mmap(MAP_FIXED)
- If a BO no longer requires GPU mapping, munmap it from the CPU address
  space and them bind the mapping address with the
  DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag.
- Any malloc'd or mmapped CPU address accessed by the GPU will be
  faulted in via the SVM implementation (system allocation).
- Upon freeing any mmapped or malloc'd data, the SVM implementation will
  remove GPU mappings.

Only supporting 1 to 1 mapping between user address space and GPU
address space at the moment as that is the expected use case. uAPI
defines interface for non 1 to 1 but enforces 1 to 1, this restriction
can be lifted if use cases arrise for non 1 to 1 mappings.

This patch essentially short-circuits the code in the existing VM bind
paths to avoid populating page tables when the
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag is set.

v3:
 - Call vm_bind_ioctl_ops_fini on -ENODATA
 - Don't allow DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR on non-faulting VMs
 - s/DRM_XE_VM_BIND_FLAG_SYSTEM_ALLOCATOR/DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR (Thomas)
 - Rework commit message for expected usage (Thomas)
 - Describe state of code after patch in commit message (Thomas)
v4:
 - Fix alignment (Checkpatch)

Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Reviewed-by: Himal Prasad Ghimiray &lt;himal.prasad.ghimiray@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-9-matthew.brost@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/uapi: Use hint for guc to set GT frequency</title>
<updated>2025-03-05T04:24:24+00:00</updated>
<author>
<name>Tejas Upadhyay</name>
<email>tejas.upadhyay@intel.com</email>
</author>
<published>2025-02-28T07:02:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5488bec96bccbd87335921338f8dc38b87db7d2c'/>
<id>5488bec96bccbd87335921338f8dc38b87db7d2c</id>
<content type='text'>
Allow user to provide a low latency hint. When set, KMD sends a hint
to GuC which results in special handling for that process. SLPC will
ramp the GT frequency aggressively every time it switches to this
process.

We need to enable the use of SLPC Compute strategy during init, but
it will apply only to processes that set this bit during process
creation.

Improvement with this approach as below:

Before,

:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
  Device: Intel(R) Graphics [0xe20b]
    Driver version  : 24.52.0 (Linux x64)
    Compute units   : 160
    Clock frequency : 2850 MHz
    Kernel launch latency : 283.16 us

After,

:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
  Device: Intel(R) Graphics [0xe20b]
    Driver version  : 24.52.0 (Linux x64)
    Compute units   : 160
    Clock frequency : 2850 MHz

    Kernel launch latency : 63.38 us

Compute PR: https://github.com/intel/compute-runtime/pull/794
Mesa PR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214
IGT PR: https://patchwork.freedesktop.org/patch/639989/

V10(Lucas):
  - Remove doc from drm-uapi.rst
v9(Vinay):
  - remove extra line, align commit message
v8(Vinay):
  - Add separate example for using low latency hint
v7(Jose):
  - Update UMD PR
  - applicable to all gpus
V6:
  - init flags, remove redundant flags check (MAuld)
V5:
  - Move uapi doc to documentation and GuC ABI specific change (Rodrigo)
  - Modify logic to restrict exec queue flags (MAuld)
V4:
  - To make it clear, dont use exec queue word (Vinay)
  - Correct typo in description of flag (Jose/Vinay)
  - rename set_strategy api and replace ctx with exec queue(Vinay)
  - Start with 0th bit to indentify user flags (Jose)
V3:
  - Conver user flag to kernel internal flag and use (Oak)
  - Support query config for use to check kernel support (Jose)
  - Dont need to take runtime pm (Vinay)
V2:
  - DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT 1 planned for other hint(Szymon)
  - Add motivation to description (Lucas)

Acked-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Reviewed-by: Vinay Belgaumkar &lt;vinay.belgaumkar@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250228070224.739295-2-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Allow user to provide a low latency hint. When set, KMD sends a hint
to GuC which results in special handling for that process. SLPC will
ramp the GT frequency aggressively every time it switches to this
process.

We need to enable the use of SLPC Compute strategy during init, but
it will apply only to processes that set this bit during process
creation.

Improvement with this approach as below:

Before,

:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
  Device: Intel(R) Graphics [0xe20b]
    Driver version  : 24.52.0 (Linux x64)
    Compute units   : 160
    Clock frequency : 2850 MHz
    Kernel launch latency : 283.16 us

After,

:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
  Device: Intel(R) Graphics [0xe20b]
    Driver version  : 24.52.0 (Linux x64)
    Compute units   : 160
    Clock frequency : 2850 MHz

    Kernel launch latency : 63.38 us

Compute PR: https://github.com/intel/compute-runtime/pull/794
Mesa PR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214
IGT PR: https://patchwork.freedesktop.org/patch/639989/

V10(Lucas):
  - Remove doc from drm-uapi.rst
v9(Vinay):
  - remove extra line, align commit message
v8(Vinay):
  - Add separate example for using low latency hint
v7(Jose):
  - Update UMD PR
  - applicable to all gpus
V6:
  - init flags, remove redundant flags check (MAuld)
V5:
  - Move uapi doc to documentation and GuC ABI specific change (Rodrigo)
  - Modify logic to restrict exec queue flags (MAuld)
V4:
  - To make it clear, dont use exec queue word (Vinay)
  - Correct typo in description of flag (Jose/Vinay)
  - rename set_strategy api and replace ctx with exec queue(Vinay)
  - Start with 0th bit to indentify user flags (Jose)
V3:
  - Conver user flag to kernel internal flag and use (Oak)
  - Support query config for use to check kernel support (Jose)
  - Dont need to take runtime pm (Vinay)
V2:
  - DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT 1 planned for other hint(Szymon)
  - Add motivation to description (Lucas)

Acked-by: Lucas De Marchi &lt;lucas.demarchi@intel.com&gt;
Reviewed-by: Vinay Belgaumkar &lt;vinay.belgaumkar@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250228070224.739295-2-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay &lt;tejas.upadhyay@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/uapi: Add a device query to get EU stall sampling information</title>
<updated>2025-02-26T19:31:05+00:00</updated>
<author>
<name>Harish Chegondi</name>
<email>harish.chegondi@intel.com</email>
</author>
<published>2025-02-26T01:47:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cd5bbb2532f276626d1416b237676772bb61f11e'/>
<id>cd5bbb2532f276626d1416b237676772bb61f11e</id>
<content type='text'>
User space can get the EU stall data record size, EU stall capabilities,
EU stall sampling rates, and per XeCore buffer size with query IOCTL
DRM_IOCTL_XE_DEVICE_QUERY with .query set to DRM_XE_DEVICE_QUERY_EU_STALL.
A struct drm_xe_query_eu_stall will be returned to the user space along
with an array of supported sampling rates sorted in the fastest sampling
rate first order. sampling_rates in struct drm_xe_query_eu_stall will
point to the array of sampling rates.

Any capabilities in EU stall sampling as of this patch are considered
as base capabilities. New capability bits will be added for any new
functionality added later.

v12: Rename has_eu_stall_sampling_support() to
     xe_eu_stall_supported_on_platform() and move it to header file.
v11: Check if EU stall sampling is supported on the platform.
v10: Change comments and variable names as per feedback
v9: Move reserved fields above num_sampling_rates in
    struct drm_xe_query_eu_stall.
v7: Change sampling_rates from a pointer to flexible array.
v6: Include EU stall sampling rates information and
    per XeCore buffer size in the query information.

Reviewed-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Signed-off-by: Harish Chegondi &lt;harish.chegondi@intel.com&gt;
Signed-off-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/67ba42796a5a99d648239c315694cd222812a49b.1740533885.git.harish.chegondi@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
User space can get the EU stall data record size, EU stall capabilities,
EU stall sampling rates, and per XeCore buffer size with query IOCTL
DRM_IOCTL_XE_DEVICE_QUERY with .query set to DRM_XE_DEVICE_QUERY_EU_STALL.
A struct drm_xe_query_eu_stall will be returned to the user space along
with an array of supported sampling rates sorted in the fastest sampling
rate first order. sampling_rates in struct drm_xe_query_eu_stall will
point to the array of sampling rates.

Any capabilities in EU stall sampling as of this patch are considered
as base capabilities. New capability bits will be added for any new
functionality added later.

v12: Rename has_eu_stall_sampling_support() to
     xe_eu_stall_supported_on_platform() and move it to header file.
v11: Check if EU stall sampling is supported on the platform.
v10: Change comments and variable names as per feedback
v9: Move reserved fields above num_sampling_rates in
    struct drm_xe_query_eu_stall.
v7: Change sampling_rates from a pointer to flexible array.
v6: Include EU stall sampling rates information and
    per XeCore buffer size in the query information.

Reviewed-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Signed-off-by: Harish Chegondi &lt;harish.chegondi@intel.com&gt;
Signed-off-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/67ba42796a5a99d648239c315694cd222812a49b.1740533885.git.harish.chegondi@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/uapi: Introduce API for EU stall sampling</title>
<updated>2025-02-26T19:30:57+00:00</updated>
<author>
<name>Harish Chegondi</name>
<email>harish.chegondi@intel.com</email>
</author>
<published>2025-02-26T01:47:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1537ec85ebd7d7aa3ce1a003007cd3588cd58bda'/>
<id>1537ec85ebd7d7aa3ce1a003007cd3588cd58bda</id>
<content type='text'>
A new hardware feature first introduced in PVC gives capability to
periodically sample EU stall state and record counts for different stall
reasons, on a per IP basis, aggregate across all EUs in a subslice and
record the samples in a buffer in each subslice. Eventually, the aggregated
data is written out to a buffer in the memory. This feature is also
supported in XE2 and later architecture GPUs.

Use an existing IOCTL - DRM_IOCTL_XE_OBSERVATION as the interface into the
driver from the user space to do initial setup and obtain a file descriptor
for the EU stall data stream.  Input parameter to the IOCTL is a struct
drm_xe_observation_param in which observation_type should be set to
DRM_XE_OBSERVATION_TYPE_EU_STALL, observation_op should be
DRM_XE_OBSERVATION_OP_STREAM_OPEN and param should point to a chain of
drm_xe_ext_set_property structures in which each structure has a pair of
property and value. The EU stall sampling input properties are defined in
drm_xe_eu_stall_property_id enum.

With the file descriptor obtained from DRM_IOCTL_XE_OBSERVATION, user space
can enable and disable EU stall sampling with the IOCTLs:
DRM_XE_OBSERVATION_IOCTL_ENABLE and DRM_XE_OBSERVATION_IOCTL_DISABLE.
User space can also call poll() to check for availability of data in the
buffer. The data can be read with read(). Finally, the file descriptor
can be closed with close().

v11: Changed a couple of variables in struct eu_stall_open_properties
     from unsigned int to int.
v10: Use extension number while parsing chain of extensions.
     Remove function description for static functions.
     Move code around as per review feedback.
v9: Changed some u32 to unsigned int.
    Moved some code around as per review feedback from v8.
v8: Used div_u64 instead of / to fix 32-bit build issue.
    Changed copyright year in xe_eu_stall.c/h to 2025.
v7: Renamed input property DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT
    to DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTS to be consistent with
    OA. Renamed the corresponding internal variables.
    Fixed some commit messages based on review feedback.
v6: Change the input sampling rate to GPU cycles instead of
    GPU cycles multiplier.

Reviewed-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Signed-off-by: Harish Chegondi &lt;harish.chegondi@intel.com&gt;
Signed-off-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/bb707a27975c33e4a912b9839b023acb7a1f9c90.1740533885.git.harish.chegondi@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A new hardware feature first introduced in PVC gives capability to
periodically sample EU stall state and record counts for different stall
reasons, on a per IP basis, aggregate across all EUs in a subslice and
record the samples in a buffer in each subslice. Eventually, the aggregated
data is written out to a buffer in the memory. This feature is also
supported in XE2 and later architecture GPUs.

Use an existing IOCTL - DRM_IOCTL_XE_OBSERVATION as the interface into the
driver from the user space to do initial setup and obtain a file descriptor
for the EU stall data stream.  Input parameter to the IOCTL is a struct
drm_xe_observation_param in which observation_type should be set to
DRM_XE_OBSERVATION_TYPE_EU_STALL, observation_op should be
DRM_XE_OBSERVATION_OP_STREAM_OPEN and param should point to a chain of
drm_xe_ext_set_property structures in which each structure has a pair of
property and value. The EU stall sampling input properties are defined in
drm_xe_eu_stall_property_id enum.

With the file descriptor obtained from DRM_IOCTL_XE_OBSERVATION, user space
can enable and disable EU stall sampling with the IOCTLs:
DRM_XE_OBSERVATION_IOCTL_ENABLE and DRM_XE_OBSERVATION_IOCTL_DISABLE.
User space can also call poll() to check for availability of data in the
buffer. The data can be read with read(). Finally, the file descriptor
can be closed with close().

v11: Changed a couple of variables in struct eu_stall_open_properties
     from unsigned int to int.
v10: Use extension number while parsing chain of extensions.
     Remove function description for static functions.
     Move code around as per review feedback.
v9: Changed some u32 to unsigned int.
    Moved some code around as per review feedback from v8.
v8: Used div_u64 instead of / to fix 32-bit build issue.
    Changed copyright year in xe_eu_stall.c/h to 2025.
v7: Renamed input property DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT
    to DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTS to be consistent with
    OA. Renamed the corresponding internal variables.
    Fixed some commit messages based on review feedback.
v6: Change the input sampling rate to GPU cycles instead of
    GPU cycles multiplier.

Reviewed-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Signed-off-by: Harish Chegondi &lt;harish.chegondi@intel.com&gt;
Signed-off-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/bb707a27975c33e4a912b9839b023acb7a1f9c90.1740533885.git.harish.chegondi@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/xe/pxp/uapi: Add API to mark a BO as using PXP</title>
<updated>2025-02-03T19:51:23+00:00</updated>
<author>
<name>Daniele Ceraolo Spurio</name>
<email>daniele.ceraolospurio@intel.com</email>
</author>
<published>2025-01-29T17:41:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=41a97c4a12947c2786a1680d6839bb72d1c57cec'/>
<id>41a97c4a12947c2786a1680d6839bb72d1c57cec</id>
<content type='text'>
The driver needs to know if a BO is encrypted with PXP to enable the
display decryption at flip time.
Furthermore, we want to keep track of the status of the encryption and
reject any operation that involves a BO that is encrypted using an old
key. There are two points in time where such checks can kick in:

1 - at VM bind time, all operations except for unmapping will be
    rejected if the key used to encrypt the BO is no longer valid. This
    check is opt-in via a new VM_BIND flag, to avoid a scenario where a
    malicious app purposely shares an invalid BO with a non-PXP aware
    app (such as a compositor). If the VM_BIND was failed, the
    compositor would be unable to display anything at all. Allowing the
    bind to go through means that output still works, it just displays
    garbage data within the bounds of the illegal BO.

2 - at job submission time, if the queue is marked as using PXP, all
    objects bound to the VM will be checked and the submission will be
    rejected if any of them was encrypted with a key that is no longer
    valid.

Note that there is no risk of leaking the encrypted data if a user does
not opt-in to those checks; the only consequence is that the user will
not realize that the encryption key is changed and that the data is no
longer valid.

v2: Better commnnts and descriptions (John), rebase

v3: Properly return the result of key_assign up the stack, do not use
xe_bo in display headers (Jani)

v4: improve key_instance variable documentation (John)

Signed-off-by: Daniele Ceraolo Spurio &lt;daniele.ceraolospurio@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Cc: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Cc: Jani Nikula &lt;jani.nikula@intel.com&gt;
Reviewed-by: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-11-daniele.ceraolospurio@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The driver needs to know if a BO is encrypted with PXP to enable the
display decryption at flip time.
Furthermore, we want to keep track of the status of the encryption and
reject any operation that involves a BO that is encrypted using an old
key. There are two points in time where such checks can kick in:

1 - at VM bind time, all operations except for unmapping will be
    rejected if the key used to encrypt the BO is no longer valid. This
    check is opt-in via a new VM_BIND flag, to avoid a scenario where a
    malicious app purposely shares an invalid BO with a non-PXP aware
    app (such as a compositor). If the VM_BIND was failed, the
    compositor would be unable to display anything at all. Allowing the
    bind to go through means that output still works, it just displays
    garbage data within the bounds of the illegal BO.

2 - at job submission time, if the queue is marked as using PXP, all
    objects bound to the VM will be checked and the submission will be
    rejected if any of them was encrypted with a key that is no longer
    valid.

Note that there is no risk of leaking the encrypted data if a user does
not opt-in to those checks; the only consequence is that the user will
not realize that the encryption key is changed and that the data is no
longer valid.

v2: Better commnnts and descriptions (John), rebase

v3: Properly return the result of key_assign up the stack, do not use
xe_bo in display headers (Jani)

v4: improve key_instance variable documentation (John)

Signed-off-by: Daniele Ceraolo Spurio &lt;daniele.ceraolospurio@intel.com&gt;
Cc: Matthew Brost &lt;matthew.brost@intel.com&gt;
Cc: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt;
Cc: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Cc: Jani Nikula &lt;jani.nikula@intel.com&gt;
Reviewed-by: John Harrison &lt;John.C.Harrison@Intel.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-11-daniele.ceraolospurio@intel.com
</pre>
</div>
</content>
</entry>
</feed>
