linux-toradex.git/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c, branch v7.0-rc5

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This was done entirely with mindless brute force, using

    git grep -l '\

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28+00:00

This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".
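
As a rough illustration of the three replacement forms, here is a userspace sketch. The macro bodies are stand-ins written for this example and are not the kernel definitions: the gfp argument is accepted and ignored, malloc() replaces the kernel allocator, and toy_alloc_array(), toy_alloc_flex(), struct toy_point and struct toy_ring are hypothetical names. It relies on the GNU __typeof__ extension.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder value so call sites look like kernel code; ignored below. */
#define GFP_KERNEL 0u

/* Hypothetical helper: overflow-checked count * size allocation. */
static inline void *toy_alloc_array(size_t count, size_t size)
{
	if (size != 0 && count > SIZE_MAX / size)
		return NULL;
	return malloc(count * size);
}

/* Hypothetical helper: overflow-checked base + count * elem allocation. */
static inline void *toy_alloc_flex(size_t base, size_t count, size_t elem)
{
	if (elem != 0 && count > (SIZE_MAX - base) / elem)
		return NULL;
	return malloc(base + count * elem);
}

/* P may be a type name or an expression such as *ptr. */
#define kmalloc_obj(P, gfp) \
	((void)(gfp), (__typeof__(P) *)malloc(sizeof(P)))

#define kmalloc_objs(P, COUNT, gfp) \
	((void)(gfp), (__typeof__(P) *)toy_alloc_array((COUNT), sizeof(P)))

#define kmalloc_flex(P, FAM, COUNT, gfp) \
	((void)(gfp), (__typeof__(P) *)toy_alloc_flex(sizeof(P), (COUNT), \
		sizeof(((__typeof__(P) *)0)->FAM[0])))

/* Example types, invented for this sketch. */
struct toy_point {
	int x;
	int y;
};

struct toy_ring {
	size_t count;
	int slots[];	/* flexible array member */
};
```

With these stand-ins, `struct toy_ring *r = kmalloc_flex(*r, slots, 8, GFP_KERNEL);` yields a `struct toy_ring *` directly, so assigning the result to a pointer of a different type produces a compiler diagnostic instead of passing silently through "void *".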

Signed-off-by: Kees Cook

drm/amdgpu: Use AMDGPU_MQD_SIZE_ALIGN in KGD

2026-01-29T17:26:55+00:00

Use AMDGPU_MQD_SIZE_ALIGN for both kernel and user queue.

Signed-off-by: Lang Yu 
Reviewed-by: David Belanger 
Reviewed-by: Hawking Zhang 
Reviewed-by: Mukul Joshi 
Signed-off-by: Alex Deucher

drm/amdgpu: do not use amdgpu_bo_gpu_offset_no_check individually

2025-12-16T18:27:13+00:00

This should not be used individually; use amdgpu_bo_gpu_offset
with the bo reserved.

v3 - unpin bo in queue destroy (Christian)
v2 - pin bo so that offset returned won't change after unlock (Christian)

Signed-off-by: Saleemkhan Jamadar 
Suggested-by: Christian König 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: Change user queue interface signatures

2025-12-08T18:56:39+00:00

A userq is associated with its queue manager. Use that association
and make the user queue interfaces operate on the queue.

Signed-off-by: Lijo Lazar 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher

drm/amdgpu: Update vm start, end, hole to support 57bit address

2025-12-08T18:56:30+00:00

Change gmc macro AMDGPU_GMC_HOLE_START/END/MASK to 57bit if vm root
level is PDB3 for 5-level page tables.

The macros access adev without taking it as a parameter, which
minimizes the code change needed to support 57bit; the cost is that an
adev variable has to be added in several places that use the macros.

Because the adev definition is not available in all amdgpu C files that
include amdgpu_gmc.h, change the inline function amdgpu_gmc_sign_extend
to a macro.
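
The effect of widening the hole from 48 to 57 bits can be sketched in plain C. This is an illustration only: the toy_* helpers are hypothetical names, and they take the address width as an explicit argument, whereas the commit describes macros that read it from adev.

```c
#include <stdint.h>

/* First address inside the non-canonical hole for a given VA width. */
static inline uint64_t toy_hole_start(unsigned int bits)
{
	return 1ULL << (bits - 1);
}

/* First address above the hole: all bits from (bits - 1) upward set. */
static inline uint64_t toy_hole_end(unsigned int bits)
{
	return ~0ULL << (bits - 1);
}

/* Mask that keeps only the low "bits" address bits. */
static inline uint64_t toy_hole_mask(unsigned int bits)
{
	return (1ULL << bits) - 1;
}

/* Sign-extend an address whose top valid bit is (bits - 1). */
static inline uint64_t toy_sign_extend(uint64_t addr, unsigned int bits)
{
	if (addr >= toy_hole_start(bits))
		addr |= toy_hole_end(bits);
	return addr;
}
```

Here bits is assumed to be 48 for 4-level page tables and 57 for 5-level page tables; values of 0 or 64 and above are not handled by this sketch.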

Signed-off-by: Philip Yang 
Acked-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdgpu/mes: add multi-xcc support

2025-12-08T18:56:29+00:00

a. extend mes pipe instances to num_xcc * max_mes_pipe
b. initialize mes schq/kiq pipes per xcc
c. submit mes packet to mes ring according to xcc_id
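
One way to picture item (a) is a flat array of pipe instances indexed by XCC and pipe. The sketch below is an assumption about the layout, not taken from the patch: TOY_MAX_MES_PIPES, toy_mes_pipe_instance() and toy_mes_pipe_count() are hypothetical names, and the value 2 assumes one scheduler pipe and one KIQ pipe per XCC.

```c
#include <stddef.h>

/* Assumed: one scheduler (schq) pipe and one KIQ pipe per XCC. */
#define TOY_MAX_MES_PIPES 2u

/* Total number of pipe instances across all XCCs. */
static inline size_t toy_mes_pipe_count(unsigned int num_xcc)
{
	return (size_t)num_xcc * TOY_MAX_MES_PIPES;
}

/* Flat index of a given pipe on a given XCC (assumed row-major layout). */
static inline size_t toy_mes_pipe_instance(unsigned int xcc_id,
					   unsigned int pipe)
{
	return (size_t)xcc_id * TOY_MAX_MES_PIPES + pipe;
}
```

Under this layout, a packet for a given xcc_id would be submitted to the ring stored at toy_mes_pipe_instance(xcc_id, pipe).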

v2: rebase (Alex)

Signed-off-by: Jack Xiao 
Reviewed-by: Hawking Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu: resume MES scheduling after user queue hang detection and recovery

2025-11-12T02:54:17+00:00

This patch ensures the Micro-Engine Scheduler (MES) is properly resumed
after detecting and recovering from a user queue hang condition.

Key changes:
1. Track when a hung user queue is detected using found_hung_queue flag
2. Call amdgpu_mes_resume() to restart MES scheduling after completing
   the hang recovery process
3. This complements the existing recovery steps (fence force completion
   and device wedging) by ensuring the scheduler can process new work

Without this resume call, the MES scheduler may remain in a paused state
even after the hung queue has been handled, preventing newly submitted
work from being processed and leading to system stalls.
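
The control flow described above can be sketched with mock state. Everything here is a simplified stand-in: struct toy_queue, struct toy_device and toy_detect_and_recover() are hypothetical, and setting mes_running stands in for the amdgpu_mes_resume() call named in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_queue {
	bool hung;		/* set by hang detection */
	bool fence_forced;	/* fences force-completed during recovery */
};

struct toy_device {
	bool wedged;		/* device marked wedged after a hang */
	bool mes_running;	/* false while the scheduler is paused */
};

/* Returns true if at least one hung queue was found and recovered. */
static inline bool toy_detect_and_recover(struct toy_device *dev,
					  struct toy_queue *queues, size_t n)
{
	bool found_hung_queue = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!queues[i].hung)
			continue;
		found_hung_queue = true;
		queues[i].fence_forced = true;	/* fence force completion */
	}

	if (found_hung_queue) {
		dev->wedged = true;		/* device wedging */
		dev->mes_running = true;	/* stand-in for amdgpu_mes_resume() */
	}

	return found_hung_queue;
}
```

The point of the flag is that the resume step runs once, after all hung queues have been handled, and only when a hang was actually found.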

Acked-by: Alex Deucher 
Signed-off-by: Jesse Zhang 
Signed-off-by: Alex Deucher

drm/amdgpu/userq: fix SDMA and compute validation

2025-10-28T13:59:48+00:00

The CSA and EOP buffers have different alignment requirements.
Hardcode them for now as a bug fix.  A proper query will be added in
a subsequent patch.

v2: verify gfx shadow helper callback (Prike)

Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual address and size")
Reviewed-by: Prike Liang 
Signed-off-by: Alex Deucher

drm/amdgpu: Convert amdgpu userqueue management from IDR to XArray

2025-10-28T13:59:22+00:00

This commit refactors the AMDGPU userqueue management subsystem to replace
IDR (ID Allocation) with XArray for improved performance, scalability, and
maintainability. The changes address several issues with the previous IDR
implementation and provide better locking semantics.

Key changes:

1. **Global XArray Introduction**:
   - Added `userq_doorbell_xa` to `struct amdgpu_device` for global queue tracking
   - Uses doorbell_index as key for efficient global lookup
   - Replaces the previous `userq_mgr_list` linked list approach

2. **Per-process XArray Conversion**:
   - Replaced `userq_idr` with `userq_mgr_xa` in `struct amdgpu_userq_mgr`
   - Maintains per-process queue tracking with queue_id as key
   - Uses XA_FLAGS_ALLOC for automatic ID allocation

3. **Locking Improvements**:
   - Removed global `userq_mutex` from `struct amdgpu_device`
   - Replaced with fine-grained XArray locking using XArray's internal spinlocks

4. **Runtime Idle Check Optimization**:
   - Updated `amdgpu_runtime_idle_check_userq()` to use xa_empty

5. **Queue Management Functions**:
   - Converted all IDR operations to equivalent XArray functions:
     - `idr_alloc()` → `xa_alloc()`
     - `idr_find()` → `xa_load()`
     - `idr_remove()` → `xa_erase()`
     - `idr_for_each()` → `xa_for_each()`
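
The call shapes after the conversion can be illustrated with a userspace toy. This is not the kernel XArray: struct toy_xa is a hypothetical fixed-size table with no locking, and the toy_xa_* helpers only mimic the return conventions of `xa_alloc()` (0 on success, ID written through a pointer), `xa_load()` (entry or NULL), `xa_erase()` (previous entry) and `xa_empty()`.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_XA_SLOTS 16u

/* Toy stand-in: a NULL slot means "no entry at this index". */
struct toy_xa {
	void *slot[TOY_XA_SLOTS];
};

/* Store a non-NULL entry at the lowest free index; 0 on success, -1 if full. */
static inline int toy_xa_alloc(struct toy_xa *xa, unsigned int *id, void *entry)
{
	unsigned int i;

	if (!entry)
		return -1;
	for (i = 0; i < TOY_XA_SLOTS; i++) {
		if (!xa->slot[i]) {
			xa->slot[i] = entry;
			*id = i;
			return 0;
		}
	}
	return -1;
}

/* Look up an entry by index; NULL if absent or out of range. */
static inline void *toy_xa_load(const struct toy_xa *xa, unsigned int id)
{
	return id < TOY_XA_SLOTS ? xa->slot[id] : NULL;
}

/* Remove an entry and return what was stored there. */
static inline void *toy_xa_erase(struct toy_xa *xa, unsigned int id)
{
	void *old = toy_xa_load(xa, id);

	if (old)
		xa->slot[id] = NULL;
	return old;
}

/* True when no index holds an entry. */
static inline bool toy_xa_empty(const struct toy_xa *xa)
{
	unsigned int i;

	for (i = 0; i < TOY_XA_SLOTS; i++)
		if (xa->slot[i])
			return false;
	return true;
}
```

In this picture the per-process table would be keyed by the allocated queue_id, while the global table described in item 1 would store entries at a caller-chosen doorbell_index instead of an allocated one.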

Benefits:
- **Performance**: XArray provides better scalability for large numbers of queues
- **Memory Efficiency**: Reduced memory overhead compared to IDR
- **Thread Safety**: Improved locking semantics with XArray's internal spinlocks

v2: rename userq_global_xa/userq_xa to userq_doorbell_xa/userq_mgr_xa
    Remove xa_lock and use its own lock.

v3: Set queue->userq_mgr = uq_mgr in amdgpu_userq_create()
v4: use xa_store_irq (Christian)
    hold the read side of the reset lock while creating/destroying queues and the manager data structure. (Christian)

Acked-by: Alex Deucher 
Suggested-by: Christian König 
Signed-off-by: Jesse Zhang 
Signed-off-by: Alex Deucher