linux-toradex.git - Linux kernel for Apalis and Colibri modules

author	Ce Sun <cesun102@amd.com>	2025-07-26 20:16:24 +0800
committer	Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:27:49 -0400
commit	da467352296f8e50c7ab7057ead44a1df1c81496 (patch)
tree	c83b1901ea355184b6126fa750499af74699eaec /drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
parent	21c0ffa612c98bcc6dab5bd9d977a18d565ee28e (diff)

drm/amdgpu: Effective health check before reset

Move amdgpu_device_health_check into amdgpu_device_gpu_recover so that device presence can be checked before reset.

The reasons are:
1. During a DPC event, the device where the DPC event occurs is not present on the bus.
2. When a DPC event and an ATHUB event occur simultaneously, the DPC thread holds the reset domain lock when it detects the error, and the GPU recover thread acquires the hive lock. The device is in both the amdgpu_ras_in_recovery and occurs_dpc states at the same time, so the GPU recover thread does not reach amdgpu_device_health_check. It waits for the reset domain lock, which the DPC thread has not released. The DPC slot_reset callback, in turn, tries to obtain the hive lock, which the GPU recover thread holds at this point. The result is a deadlock.

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
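
The lock ordering described in the message can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the driver code: every `toy_*` name, the `T_*` owner tags, and the exact position of the old check relative to the hive lock are hypothetical. Locks are modeled as owner tags, and a `TOY_WOULD_BLOCK` result stands in for a thread that would sleep forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the amdgpu data structures. */
enum { T_NONE = 0, T_DPC = 1, T_RECOVER = 2 };

struct toy_lock { int owner; };          /* T_NONE means free */

struct toy_dev {
    bool present_on_bus;                 /* false during a DPC event */
    bool in_ras_recovery;                /* models amdgpu_ras_in_recovery */
    bool dpc_occurred;                   /* models occurs_dpc */
    struct toy_lock reset_domain;
    struct toy_lock hive;
};

enum toy_result { TOY_RESET_DONE, TOY_SKIPPED_UNHEALTHY, TOY_WOULD_BLOCK };

static bool toy_trylock(struct toy_lock *l, int who)
{
    if (l->owner != T_NONE)
        return false;
    l->owner = who;
    return true;
}

static void toy_unlock(struct toy_lock *l, int who)
{
    assert(l->owner == who);             /* only the holder may unlock */
    l->owner = T_NONE;
}

/* Toy health check: a device that is off the bus is unhealthy. */
static bool toy_health_check(const struct toy_dev *d)
{
    return d->present_on_bus;
}

/* State from reason 2: DPC thread holds the reset domain lock. */
static struct toy_dev toy_dpc_scenario(void)
{
    struct toy_dev d = { false, true, true, { T_DPC }, { T_NONE } };
    return d;
}

/*
 * check_first == true models the ordering the commit message asks for
 * (assumed here to run before any lock is taken); false models the old
 * path, where the check is skipped when both state flags are set.
 */
static enum toy_result toy_gpu_recover(struct toy_dev *d, bool check_first)
{
    if (check_first && !toy_health_check(d))
        return TOY_SKIPPED_UNHEALTHY;

    if (!toy_trylock(&d->hive, T_RECOVER))
        return TOY_WOULD_BLOCK;

    if (!check_first && !(d->in_ras_recovery && d->dpc_occurred) &&
        !toy_health_check(d)) {
        toy_unlock(&d->hive, T_RECOVER);
        return TOY_SKIPPED_UNHEALTHY;
    }

    /* A real thread would sleep here while still holding the hive lock. */
    if (!toy_trylock(&d->reset_domain, T_RECOVER))
        return TOY_WOULD_BLOCK;

    toy_unlock(&d->reset_domain, T_RECOVER);
    toy_unlock(&d->hive, T_RECOVER);
    return TOY_RESET_DONE;
}
```

In this model, calling `toy_gpu_recover` on `toy_dpc_scenario()` with `check_first` set to false leaves the hive lock held by the recover path while it waits on the reset domain lock, so a DPC-side attempt to take the hive lock fails. With `check_first` set to true the function returns before taking any lock and the hive lock stays free.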

Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c')

0 files changed, 0 insertions, 0 deletions

