<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c, branch v7.0-rc5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Convert 'alloc_obj' family to use the new default GFP_KERNEL argument</title>
<updated>2026-02-22T01:09:51+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-22T00:37:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43'/>
<id>bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43</id>
<content type='text'>
This was done entirely with mindless brute force, using

    git grep -l '\&lt;k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This was done entirely with mindless brute force, using

    git grep -l '\&lt;k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Replace kmalloc with kmalloc_obj for non-scalar types</title>
<updated>2026-02-21T09:02:28+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2026-02-21T07:49:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=69050f8d6d075dc01af7a5f2f550a8067510366f'/>
<id>69050f8d6d075dc01af7a5f2f550a8067510366f</id>
<content type='text'>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Use AMDGPU_MQD_SIZE_ALIGN in KGD</title>
<updated>2026-01-29T17:26:55+00:00</updated>
<author>
<name>Lang Yu</name>
<email>lang.yu@amd.com</email>
</author>
<published>2026-01-26T08:47:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a6a4dd519cbe1fdf1f33e2942356dcc9c7b4c682'/>
<id>a6a4dd519cbe1fdf1f33e2942356dcc9c7b4c682</id>
<content type='text'>
Use AMDGPU_MQD_SIZE_ALIGN for both kernel and user queue.

Signed-off-by: Lang Yu &lt;lang.yu@amd.com&gt;
Reviewed-by: David Belanger &lt;david.belanger@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use AMDGPU_MQD_SIZE_ALIGN for both kernel and user queue.

Signed-off-by: Lang Yu &lt;lang.yu@amd.com&gt;
Reviewed-by: David Belanger &lt;david.belanger@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Reviewed-by: Mukul Joshi &lt;mukul.joshi@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: do not use amdgpu_bo_gpu_offset_no_check individually</title>
<updated>2025-12-16T18:27:13+00:00</updated>
<author>
<name>Saleemkhan Jamadar</name>
<email>saleemkhan083@gmail.com</email>
</author>
<published>2025-12-11T17:36:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1bc44dee2647b720065b71d57e594f70ea52fb3e'/>
<id>1bc44dee2647b720065b71d57e594f70ea52fb3e</id>
<content type='text'>
This should not be used indiviually, use amdgpu_bo_gpu_offset
with bo reserved.

v3 - unpin bo in queue destroy (Christian)
v2 - pin bo so that offset returned won't change after unlock (Christian)

Signed-off-by: Saleemkhan Jamadar &lt;saleemkhan083@gmail.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This should not be used indiviually, use amdgpu_bo_gpu_offset
with bo reserved.

v3 - unpin bo in queue destroy (Christian)
v2 - pin bo so that offset returned won't change after unlock (Christian)

Signed-off-by: Saleemkhan Jamadar &lt;saleemkhan083@gmail.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Change user queue interface signatures</title>
<updated>2025-12-08T18:56:39+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2025-11-24T07:17:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=473f12f820956988eb735d4cfd88ce0640f5d3af'/>
<id>473f12f820956988eb735d4cfd88ce0640f5d3af</id>
<content type='text'>
A userq is associated with its queue manager. Use that and make
the userqueue interfaces to operate on queue.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A userq is associated with its queue manager. Use that and make
the userqueue interfaces to operate on queue.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Update vm start, end, hole to support 57bit address</title>
<updated>2025-12-08T18:56:30+00:00</updated>
<author>
<name>Philip Yang</name>
<email>Philip.Yang@amd.com</email>
</author>
<published>2025-04-22T20:15:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cf856ca9b999bc81d27bf8c4e1d7b5c7740bcea8'/>
<id>cf856ca9b999bc81d27bf8c4e1d7b5c7740bcea8</id>
<content type='text'>
Change gmc macro AMDGPU_GMC_HOLE_START/END/MASK to 57bit if vm root
level is PDB3 for 5-level page tables.

The macro access adev without passing adev as parameter is to minimize
the code change to support 57bit, then we have to add adev variable in
several places to use the macro.

Because adev definition is not available in all amdgpu c files which
include amdgpu_gmc.h, change inline function amdgpu_gmc_sign_extend to
macro.

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Acked-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change gmc macro AMDGPU_GMC_HOLE_START/END/MASK to 57bit if vm root
level is PDB3 for 5-level page tables.

The macro access adev without passing adev as parameter is to minimize
the code change to support 57bit, then we have to add adev variable in
several places to use the macro.

Because adev definition is not available in all amdgpu c files which
include amdgpu_gmc.h, change inline function amdgpu_gmc_sign_extend to
macro.

Signed-off-by: Philip Yang &lt;Philip.Yang@amd.com&gt;
Acked-by: Felix Kuehling &lt;felix.kuehling@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu/mes: add multi-xcc support</title>
<updated>2025-12-08T18:56:29+00:00</updated>
<author>
<name>Jack Xiao</name>
<email>Jack.Xiao@amd.com</email>
</author>
<published>2024-11-21T08:22:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d09c7e266c8cd5590db59693ea3c3a66a55e63ab'/>
<id>d09c7e266c8cd5590db59693ea3c3a66a55e63ab</id>
<content type='text'>
a. extend mes pipe instances to num_xcc * max_mes_pipe
b. initialize mes schq/kiq pipes per xcc
c. submit mes packet to mes ring according to xcc_id

v2: rebase (Alex)

Signed-off-by: Jack Xiao &lt;Jack.Xiao@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
a. extend mes pipe instances to num_xcc * max_mes_pipe
b. initialize mes schq/kiq pipes per xcc
c. submit mes packet to mes ring according to xcc_id

v2: rebase (Alex)

Signed-off-by: Jack Xiao &lt;Jack.Xiao@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: resume MES scheduling after user queue hang detection and recovery</title>
<updated>2025-11-12T02:54:17+00:00</updated>
<author>
<name>Jesse.Zhang</name>
<email>Jesse.Zhang@amd.com</email>
</author>
<published>2025-11-07T11:19:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=46f2029fe1dbdbb2ff3d6a566b32002660d3944b'/>
<id>46f2029fe1dbdbb2ff3d6a566b32002660d3944b</id>
<content type='text'>
This patch ensures the Micro-Engine Scheduler (MES) is properly resumed
after detecting and recovering from a user queue hang condition.

Key changes:
1. Track when a hung user queue is detected using found_hung_queue flag
2. Call amdgpu_mes_resume() to restart MES scheduling after completing
   the hang recovery process
3. This complements the existing recovery steps (fence force completion
   and device wedging) by ensuring the scheduler can process new work

Without this resume call, the MES scheduler may remain in a paused state
even after the hung queue has been handled, preventing newly submitted
work from being processed and leading to system stalls.

Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch ensures the Micro-Engine Scheduler (MES) is properly resumed
after detecting and recovering from a user queue hang condition.

Key changes:
1. Track when a hung user queue is detected using found_hung_queue flag
2. Call amdgpu_mes_resume() to restart MES scheduling after completing
   the hang recovery process
3. This complements the existing recovery steps (fence force completion
   and device wedging) by ensuring the scheduler can process new work

Without this resume call, the MES scheduler may remain in a paused state
even after the hung queue has been handled, preventing newly submitted
work from being processed and leading to system stalls.

Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu/userq: fix SDMA and compute validation</title>
<updated>2025-10-28T13:59:48+00:00</updated>
<author>
<name>Alex Deucher</name>
<email>alexander.deucher@amd.com</email>
</author>
<published>2025-10-10T19:21:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a0559012a18a5a6ad87516e982892765a403b8ab'/>
<id>a0559012a18a5a6ad87516e982892765a403b8ab</id>
<content type='text'>
The CSA and EOP buffers have different alignement requirements.
Hardcode them for now as a bug fix.  A proper query will be added in
a subsequent patch.

v2: verify gfx shadow helper callback (Prike)

Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual address and size")
Reviewed-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The CSA and EOP buffers have different alignement requirements.
Hardcode them for now as a bug fix.  A proper query will be added in
a subsequent patch.

v2: verify gfx shadow helper callback (Prike)

Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual address and size")
Reviewed-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Convert amdgpu userqueue management from IDR to XArray</title>
<updated>2025-10-28T13:59:22+00:00</updated>
<author>
<name>Jesse.Zhang</name>
<email>Jesse.Zhang@amd.com</email>
</author>
<published>2025-10-21T05:01:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f18719ef4bb7b0e48dee92703b01e16f6c0d6318'/>
<id>f18719ef4bb7b0e48dee92703b01e16f6c0d6318</id>
<content type='text'>
This commit refactors the AMDGPU userqueue management subsystem to replace
IDR (ID Allocation) with XArray for improved performance, scalability, and
maintainability. The changes address several issues with the previous IDR
implementation and provide better locking semantics.

Key changes:

1. **Global XArray Introduction**:
   - Added `userq_doorbell_xa` to `struct amdgpu_device` for global queue tracking
   - Uses doorbell_index as key for efficient global lookup
   - Replaces the previous `userq_mgr_list` linked list approach

2. **Per-process XArray Conversion**:
   - Replaced `userq_idr` with `userq_mgr_xa` in `struct amdgpu_userq_mgr`
   - Maintains per-process queue tracking with queue_id as key
   - Uses XA_FLAGS_ALLOC for automatic ID allocation

3. **Locking Improvements**:
   - Removed global `userq_mutex` from `struct amdgpu_device`
   - Replaced with fine-grained XArray locking using XArray's internal spinlocks

4. **Runtime Idle Check Optimization**:
   - Updated `amdgpu_runtime_idle_check_userq()` to use xa_empty

5. **Queue Management Functions**:
   - Converted all IDR operations to equivalent XArray functions:
     - `idr_alloc()` → `xa_alloc()`
     - `idr_find()` → `xa_load()`
     - `idr_remove()` → `xa_erase()`
     - `idr_for_each()` → `xa_for_each()`

Benefits:
- **Performance**: XArray provides better scalability for large numbers of queues
- **Memory Efficiency**: Reduced memory overhead compared to IDR
- **Thread Safety**: Improved locking semantics with XArray's internal spinlocks

v2: rename userq_global_xa/userq_xa to userq_doorbell_xa/userq_mgr_xa
    Remove xa_lock and use its own lock.

v3: Set queue-&gt;userq_mgr = uq_mgr in amdgpu_userq_create()
v4: use xa_store_irq (Christian)
    hold the read side of the reset lock while creating/destroying queues and the manager data structure. (Chritian)

Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This commit refactors the AMDGPU userqueue management subsystem to replace
IDR (ID Allocation) with XArray for improved performance, scalability, and
maintainability. The changes address several issues with the previous IDR
implementation and provide better locking semantics.

Key changes:

1. **Global XArray Introduction**:
   - Added `userq_doorbell_xa` to `struct amdgpu_device` for global queue tracking
   - Uses doorbell_index as key for efficient global lookup
   - Replaces the previous `userq_mgr_list` linked list approach

2. **Per-process XArray Conversion**:
   - Replaced `userq_idr` with `userq_mgr_xa` in `struct amdgpu_userq_mgr`
   - Maintains per-process queue tracking with queue_id as key
   - Uses XA_FLAGS_ALLOC for automatic ID allocation

3. **Locking Improvements**:
   - Removed global `userq_mutex` from `struct amdgpu_device`
   - Replaced with fine-grained XArray locking using XArray's internal spinlocks

4. **Runtime Idle Check Optimization**:
   - Updated `amdgpu_runtime_idle_check_userq()` to use xa_empty

5. **Queue Management Functions**:
   - Converted all IDR operations to equivalent XArray functions:
     - `idr_alloc()` → `xa_alloc()`
     - `idr_find()` → `xa_load()`
     - `idr_remove()` → `xa_erase()`
     - `idr_for_each()` → `xa_for_each()`

Benefits:
- **Performance**: XArray provides better scalability for large numbers of queues
- **Memory Efficiency**: Reduced memory overhead compared to IDR
- **Thread Safety**: Improved locking semantics with XArray's internal spinlocks

v2: rename userq_global_xa/userq_xa to userq_doorbell_xa/userq_mgr_xa
    Remove xa_lock and use its own lock.

v3: Set queue-&gt;userq_mgr = uq_mgr in amdgpu_userq_create()
v4: use xa_store_irq (Christian)
    hold the read side of the reset lock while creating/destroying queues and the manager data structure. (Chritian)

Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
