linux-toradex.git/drivers/gpu/drm/amd/scheduler, branch master

drm: move amd_gpu_scheduler into common location

2017-12-07T16:51:56+00:00

This moves and renames the AMDGPU scheduler to a common location in DRM
in order to facilitate re-use by other drivers. This is mostly a
straightforward rename with no code changes.

One notable exception is the function to_drm_sched_fence(), which is no
longer an inline header function, to avoid the need to export the
drm_sched_fence_ops_scheduled and drm_sched_fence_ops_finished structures.
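
As an illustration of why an out-of-line helper avoids the export, here is a
minimal userspace sketch. It assumes the helper identifies a scheduler fence
by comparing its ops pointer against two file-local tables. All names below
(struct fence, struct sched_fence, to_sched_fence(), sched_fence_init()) are
hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel fence types. */
struct fence_ops { const char *name; };
struct fence { const struct fence_ops *ops; };

struct sched_fence {
    struct fence scheduled;
    struct fence finished;
};

/* File-local tables: being static, no other translation unit can name them. */
static const struct fence_ops ops_scheduled = { "scheduled" };
static const struct fence_ops ops_finished  = { "finished" };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

void sched_fence_init(struct sched_fence *sf)
{
    assert(sf != NULL);
    sf->scheduled.ops = &ops_scheduled;
    sf->finished.ops = &ops_finished;
}

/*
 * Out-of-line conversion helper. Callers only need this prototype; an
 * inline version in a header would have to reference both ops tables,
 * which would then need to be visible outside this file.
 */
struct sched_fence *to_sched_fence(struct fence *f)
{
    if (f == NULL)
        return NULL;
    if (f->ops == &ops_scheduled)
        return container_of(f, struct sched_fence, scheduled);
    if (f->ops == &ops_finished)
        return container_of(f, struct sched_fence, finished);
    return NULL; /* not a scheduler fence */
}
```

The trade-off in this sketch is one extra function call per conversion in
exchange for keeping both ops tables private to a single file.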

Reviewed-by: Chunming Zhou 
Tested-by: Dieter Nützel 
Acked-by: Alex Deucher 
Signed-off-by: Lucas Stach 
Signed-off-by: Alex Deucher

drm/amdgpu: add license to files where it was missing

2017-12-07T16:51:25+00:00

These files were missing it before.

Acked-by: Harry Wentland 
Acked-by: Felix Kuehling 
Acked-by: Christian König 
Signed-off-by: Alex Deucher

drm/amd/scheduler: add WARN_ON for s_fence->parent

2017-12-06T17:47:18+00:00

Signed-off-by: Chunming Zhou 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amd/scheduler: fix page protection of cb

2017-12-06T17:47:18+00:00

We must remove the fence callback.

Signed-off-by: Chunming Zhou 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu:fix gpu recover missing skipping(v2)

2017-12-04T21:41:46+00:00

If an application closes its context right after submitting an IB, GPU
recovery fails to find the entity behind the guilty job, so the guilty
job is never skipped.

To fix this corner case, move the increment of job->karma out of the
entity iteration.

v2:
Only increment karma if bad->s_priority != KERNEL, because we always
consider a KERNEL job to be correct and always want to recover an
unfinished kernel job (sometimes a kernel job is interrupted by a VF FLR
or another GPU hang event).
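
The ordering described above can be sketched in plain C as follows. This is
a simplified model, not the scheduler code: the types, the flat entity array,
and the names mark_bad_job() and hang_limit are assumptions made for
illustration. The point is that karma is bumped before, and independently of,
the search for the owning entity, and never for a KERNEL priority job.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of priorities, entities and jobs. */
enum prio { PRIO_LOW, PRIO_NORMAL, PRIO_HIGH, PRIO_KERNEL };

struct entity {
    unsigned long fence_ctx; /* fence context owned by this entity */
    int guilty;
};

struct job {
    unsigned long fence_ctx; /* fence context the job was submitted on */
    enum prio prio;
    int karma;
};

void mark_bad_job(struct job *bad, struct entity *entities, size_t n,
                  int hang_limit)
{
    size_t i;

    assert(entities != NULL || n == 0);

    /* KERNEL jobs are assumed correct and are always recovered. */
    if (bad == NULL || bad->prio == PRIO_KERNEL)
        return;

    /* Count the hang even if the owning entity is already gone. */
    bad->karma++;

    /* Match by fence context instead of following a stored pointer. */
    for (i = 0; i < n; i++) {
        if (entities[i].fence_ctx == bad->fence_ctx) {
            if (bad->karma > hang_limit)
                entities[i].guilty = 1;
            break;
        }
    }
}
```

In this sketch a job whose entity has already been destroyed still
accumulates karma, so a later recovery pass can decide to skip it.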

Signed-off-by: Monk Liu 
Reviewed-by: Christian König 
Reviewed-by: Xiangliang Yu 
Signed-off-by: Alex Deucher

amd/scheduler:imple job skip feature(v3)

2017-12-04T21:41:30+00:00

Jobs are skipped in two cases:
1) When the entity behind the job is marked guilty, the job popped
from this entity's queue is dropped in the sched_main loop.

2) In job_recovery(), skip scheduling the job if its karma is above the
limit, and also skip other jobs sharing the same fence context. This
approach is needed because job_recovery() cannot access job->entity,
since the entity may already be dead.

v2:
Some logic fixes.

v3:
When an entity is detected as guilty, don't drop the job in the popping
stage; instead set its fence error to -ECANCELED.

In run_job(), skip scheduling if either 1) fence->error < 0, or
2) a VRAM LOST event occurred for this job.
This way we can unify the job skipping logic.

With this feature we can introduce a new GPU recovery feature.
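
A minimal sketch of the unified skip logic from v3, under stated assumptions:
the types and the names pop_job(), run_job() and vram_lost_counter are
simplified stand-ins chosen for illustration, and the VRAM LOST check is
modeled as a counter comparison. The popping stage only records an error on
the fence; the decision to skip is made in one place.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified fence and job. */
struct fence { int error; };

struct job {
    struct fence finished;
    unsigned vram_lost_counter; /* counter value sampled at submission */
    bool submitted;             /* set once the job reaches the hardware */
};

/* Popping stage: never drops the job, only tags it when the entity is guilty. */
void pop_job(struct job *job, bool entity_guilty)
{
    assert(job != NULL);
    if (entity_guilty)
        job->finished.error = -ECANCELED;
}

/* Single place that decides whether to skip. Returns true if scheduled. */
bool run_job(struct job *job, unsigned current_vram_lost_counter)
{
    assert(job != NULL);

    /* Assumption: VRAM was lost if the counter moved since submission. */
    if (job->vram_lost_counter != current_vram_lost_counter)
        job->finished.error = -ECANCELED;

    if (job->finished.error < 0)
        return false; /* skip scheduling */

    job->submitted = true;
    return true;
}
```

Because both conditions end up as a negative fence error, anything waiting
on the fence can tell from the error code that the job was cancelled.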

Signed-off-by: Monk Liu 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: Remove job->s_entity to avoid keeping reference to stale pointer.

2017-12-04T21:33:11+00:00

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: Fix deadlock during GPU reset.

2017-12-04T21:33:11+00:00

Bug:
The kfifo has a limited size. During GPU reset it can fill up to its limit,
and the pushing thread (producer) then waits for the scheduler worker to
consume items from the fifo while holding the reservation lock
on a BO. The GPU reset thread, on the other hand, blocks the scheduler
during reset. Before it unblocks the scheduler it might want
to recover VRAM, and so will try to reserve the same BO the producer
thread is already holding, creating a deadlock.

Fix:
Switch from kfifo to an SPSC queue, which is unlimited in size.
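
To show why an unbounded queue removes the wait, here is a minimal userspace
sketch of a linked-list single-producer/single-consumer queue using C11
atomics. It is not the kernel implementation: the names are hypothetical, and
it allocates a node per push with malloc(), which is a simplification made
for this example. The property of interest is that spsc_push() never waits
for the consumer; it can only fail if allocation fails.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct spsc_node {
    _Atomic(struct spsc_node *) next;
    void *item;
};

struct spsc_queue {
    struct spsc_node *head; /* consumer only: current stub node */
    struct spsc_node *tail; /* producer only: last node pushed */
};

bool spsc_init(struct spsc_queue *q)
{
    struct spsc_node *stub = malloc(sizeof(*stub));

    assert(q != NULL);
    if (stub == NULL)
        return false;
    atomic_init(&stub->next, (struct spsc_node *)NULL);
    stub->item = NULL;
    q->head = q->tail = stub;
    return true;
}

/* Producer side: links a new node; never blocks on the consumer. */
bool spsc_push(struct spsc_queue *q, void *item)
{
    struct spsc_node *n = malloc(sizeof(*n));

    if (n == NULL)
        return false;
    n->item = item;
    atomic_init(&n->next, (struct spsc_node *)NULL);
    /* Release: publish the node contents before the link becomes visible. */
    atomic_store_explicit(&q->tail->next, n, memory_order_release);
    q->tail = n;
    return true;
}

/* Consumer side: returns false when the queue is empty. */
bool spsc_pop(struct spsc_queue *q, void **out)
{
    struct spsc_node *stub = q->head;
    struct spsc_node *next =
        atomic_load_explicit(&stub->next, memory_order_acquire);

    if (next == NULL)
        return false;
    *out = next->item;
    q->head = next; /* the popped node becomes the new stub */
    free(stub);
    return true;
}

void spsc_destroy(struct spsc_queue *q)
{
    void *ignored;

    while (spsc_pop(q, &ignored))
        ;
    free(q->head);
    q->head = q->tail = NULL;
}
```

With a bounded fifo the producer has to wait, or fail, once the buffer is
full. Here the producer's progress does not depend on the consumer at all, so
it cannot end up waiting on a stalled scheduler while holding a lock.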

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: Add SPSC queue to scheduler.

2017-12-04T21:33:10+00:00

It is intended to replace the bounded fifo we are currently
using.

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu:cleanup job reset routine(v2)

2017-12-04T21:33:10+00:00

Merge setting guilty on the context into this function
to avoid implementing an extra routine.

v2:
Go through the entity list and compare the fence_ctx
before operating on the entity; otherwise the entity
may be just a wild pointer.

Signed-off-by: Monk Liu 
Reviewed-by: Chunming Zhou 
Signed-off-by: Alex Deucher