linux-toradex.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c, branch v7.0-rc5

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28+00:00

This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook
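
The three conversions listed in the kmalloc_obj commit above can be
sketched in user space. This is only an approximation under stated
assumptions: the mock_alloc_* macros below are hypothetical stand-ins, not
the kernel definitions; they drop the gfp flags argument and use
malloc()/calloc() in place of the kmalloc family. The point they show is
that the result is already a "TYPE *" rather than a "void *".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical user-space stand-ins, NOT the kernel macros. The gfp flags
 * argument is dropped. calloc() is used for the array case so the size
 * multiplication is overflow-checked; unlike kmalloc it also zero-fills.
 * The flex variant does no overflow checking at all.
 */
#define mock_alloc_obj(P) \
	((__typeof__(P) *)malloc(sizeof(P)))
#define mock_alloc_objs(P, COUNT) \
	((__typeof__(P) *)calloc((COUNT), sizeof(P)))
#define mock_alloc_flex(P, FAM, COUNT) \
	((__typeof__(P) *)malloc(sizeof(P) + (COUNT) * sizeof((P).FAM[0])))

struct point { int x, y; };

struct packet {
	size_t len;
	unsigned char data[];	/* flexible array member */
};

/* Returns 0 on success, -1 if any allocation failed. */
int demo_typed_alloc(void)
{
	/* Single object; TYPE given as *VAR. */
	struct point *one = mock_alloc_obj(*one);
	/* Array of 8 objects; TYPE given as a type name. */
	struct point *many = mock_alloc_objs(struct point, 8);
	/* Header plus 16 trailing elements of the flexible array. */
	struct packet *pkt = mock_alloc_flex(*pkt, data, 16);
	int ret = -1;

	if (one && many && pkt) {
		assert(many[0].x == 0);	/* zeroed by calloc() in this mock */
		one->x = 1;
		many[7].y = 2;
		pkt->len = 16;
		pkt->data[15] = 0xff;
		ret = 0;
	}

	free(pkt);
	free(many);
	free(one);
	return ret;
}
```

Because each macro casts to a pointer to its first argument's type,
assigning the result to a pointer of a different struct type produces a
compiler diagnostic instead of passing silently through "void *".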

drm/amdgpu: Fix missing unwind in amdgpu_ib_schedule() error path

2026-02-12T20:17:31+00:00

amdgpu_ib_schedule() returns early after calling amdgpu_ring_undo().
This skips the common free_fence cleanup path.  Other error paths were
already changed to use goto free_fence, but this one was missed.

Change the early return to goto free_fence so all error paths clean up
the same way.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c:232 amdgpu_ib_schedule()
warn: missing unwind goto?

drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
    124 int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
    125                        struct amdgpu_ib *ibs, struct amdgpu_job *job,
    126                        struct dma_fence **f)
    127 {

    ...

    224
    225         if (ring->funcs->insert_start)
    226                 ring->funcs->insert_start(ring);
    227
    228         if (job) {
    229                 r = amdgpu_vm_flush(ring, job, need_pipe_sync);
    230                 if (r) {
    231                         amdgpu_ring_undo(ring);
--> 232                         return r;

	The patch changed the other error paths to goto free_fence but
	this one was accidentally skipped.

    233                 }
    234         }
    235
    236         amdgpu_ring_ib_begin(ring);

    ...

    338
    339 free_fence:
    340         if (!job)
    341                 kfree(af);
    342         return r;
    343 }

Fixes: f903b85ed0f1 ("drm/amdgpu: fix possible fence leaks from job structure")
Reported-by: Dan Carpenter 
Cc: Alex Deucher 
Cc: Christian König 
Signed-off-by: Srinivasan Shanmugam 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher
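
The single-exit cleanup pattern that the commit above restores can be
sketched in user space as follows. This is a toy model, not the driver
code: toy_schedule(), toy_fence, toy_job, and toy_live_fences are
hypothetical names, and the failing steps only stand in for calls such as
the VM flush. As in the excerpt, the label frees the fence only when no job
was passed in, because otherwise the job owns it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_fence { int seq; };
struct toy_job { struct toy_fence *fence; };	/* a job owns its fence */

int toy_live_fences;	/* fences allocated here and not yet freed */

/*
 * fail_step: 0 = succeed, 1 or 2 = make that step fail.
 * When job is NULL the function allocates the fence itself and therefore
 * has to free it on every error path.
 */
int toy_schedule(struct toy_job *job, int fail_step, struct toy_fence **out)
{
	struct toy_fence *af;
	int r;

	assert(out != NULL);

	if (job) {
		af = job->fence;
	} else {
		af = malloc(sizeof(*af));
		if (!af)
			return -ENOMEM;
		toy_live_fences++;
	}

	if (fail_step == 1) {		/* stands in for a failed flush */
		r = -EINVAL;
		goto free_fence;	/* not "return r": keep one exit path */
	}

	if (fail_step == 2) {		/* stands in for a failed fence emit */
		r = -EIO;
		goto free_fence;
	}

	*out = af;			/* success: the caller now holds the fence */
	return 0;

free_fence:
	if (!job) {
		free(af);
		toy_live_fences--;
	}
	return r;
}
```

Routing every failure through the same label means a later change to the
cleanup code only has to be made in one place.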

drm/amdgpu: Fix cond_exec handling in amdgpu_ib_schedule()

2026-01-28T21:21:45+00:00

The EXEC_COUNT field must be > 0.  In the gfx shadow handling we
always emit a cond_exec packet after the gfx_shadow packet, but its
EXEC_COUNT never gets patched.  This leads to a hang when we try to
reset queues on gfx11 APUs.

Fixes: c68cbbfd54c6 ("drm/amdgpu: cleanup conditional execution")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789
Reviewed-by: Jesse Zhang 
Signed-off-by: Alex Deucher
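
The "emit a placeholder, patch it later" idea behind the cond_exec commit
above can be illustrated with a toy ring buffer. Everything here is an
assumption made for illustration: the toy_* names, the ring size, and the
opcode value are made up and do not match the real packet format or the
driver's helpers. The sketch only shows why a count left at its placeholder
value of 0 is invalid until a patch step fills it in.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_DW	16u	/* ring size in dwords, must be a power of two */
#define TOY_RING_MASK	(TOY_RING_DW - 1)
#define TOY_COND_EXEC	0xC0DEu	/* made-up opcode, not a real packet value */

struct toy_ring {
	uint32_t buf[TOY_RING_DW];
	uint32_t wptr;		/* next dword to write, always masked */
};

void toy_write(struct toy_ring *ring, uint32_t v)
{
	ring->buf[ring->wptr] = v;
	ring->wptr = (ring->wptr + 1) & TOY_RING_MASK;
}

/* Emit a cond_exec header and return the offset of its count dword. */
uint32_t toy_init_cond_exec(struct toy_ring *ring)
{
	uint32_t offset;

	toy_write(ring, TOY_COND_EXEC);
	offset = ring->wptr;
	toy_write(ring, 0);	/* count placeholder, invalid until patched */
	return offset;
}

/* Patch the placeholder with the number of dwords emitted after it. */
void toy_patch_cond_exec(struct toy_ring *ring, uint32_t offset)
{
	uint32_t cur = (ring->wptr - 1) & TOY_RING_MASK;	/* last written dword */

	/* Unsigned subtraction plus the mask handles ring wraparound. */
	ring->buf[offset] = (cur - offset) & TOY_RING_MASK;
}

/* Emit header, payload dwords, then patch; returns the patched count. */
uint32_t toy_demo_cond_exec(uint32_t start_wptr, uint32_t payload_dw)
{
	struct toy_ring ring = { .wptr = start_wptr & TOY_RING_MASK };
	uint32_t offset, i;

	assert(payload_dw > 0 && payload_dw + 2 <= TOY_RING_DW);

	offset = toy_init_cond_exec(&ring);
	assert(ring.buf[offset] == 0);	/* still the invalid placeholder */

	for (i = 0; i < payload_dw; i++)
		toy_write(&ring, 0x1000u + i);

	toy_patch_cond_exec(&ring, offset);
	return ring.buf[offset];
}
```

Skipping the toy_patch_cond_exec() call would leave the count at 0, which
is the kind of unpatched state the commit message describes.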

drm/amdgpu: fix error handling in ib_schedule()

2026-01-20T22:25:40+00:00

If fence emit fails, free the fence if necessary.

Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: fix possible fence leaks from job structure

2025-11-04T16:53:59+00:00

If we don't end up initializing the fences, free them when
we free the job.  We can't set the hw_fence to NULL after
emitting it because we need it in the cleanup path for the
submit direct case.

v2: take a reference to the fences if we emit them
v3: handle non-job fence in error paths

Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling")
Reviewed-by: Jesse Zhang  (v1)
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: clean up and unify hw fence handling

2025-10-13T18:14:35+00:00

Decouple the amdgpu fence from the amdgpu_job structure.
This lets us clean up the separate fence ops for the embedded
fence and other fences.  This also allows us to allocate the
vm fence up front when we allocate the job.

v2: Additional cleanup suggested by Christian
v3: Additional cleanups suggested by Christian
v4: Additional cleanups suggested by David and
    vm fence fix
v5: cast seqno (David)

Cc: David.Wu3@amd.com
Cc: christian.koenig@amd.com
Tested-by: David (Ming Qiang) Wu 
Reviewed-by: David (Ming Qiang) Wu 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: track ring state associated with a fence

2025-07-16T20:14:11+00:00

We need to know the wptr and sequence number associated
with a fence so that we can re-emit the unprocessed state
after a ring reset.  Pre-allocate storage space for
the ring buffer contents and add helpers to save off
the unprocessed state and re-emit it after the queue
is reset.

Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: remove job parameter from amdgpu_fence_emit()

2025-06-30T15:57:06+00:00

What we actually care about is the amdgpu_fence object
so pass that in explicitly to avoid possible mistakes
in the future.

The job_run_counter handling can be safely removed at this
point as we no longer support job resubmission.

Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: remove is_mes_queue flag

2025-04-08T20:48:21+00:00

This was leftover from MES bring up when we had MES
user queues in the kernel.  It's no longer used so
remove it.

Acked-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: fix ring timeout issue in gfx10 sr-iov environment

2025-01-24T14:55:04+00:00

Commit 26c95e838e63 ("drm/amdgpu: set the VM pointer to NULL in
amdgpu_job_prepare") sets job->vm to NULL if there is no fence. This
causes the switch buffer emit to be skipped when job->vm is NULL.

Checking job rather than vm solves this problem.

Fixes: 26c95e838e63 ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare")
Signed-off-by: Lin.Cao 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher