| author | Ce Sun <cesun102@amd.com> | 2025-07-26 20:16:24 +0800 |
|---|---|---|
| committer | Alex Deucher <alexander.deucher@amd.com> | 2025-08-04 14:27:49 -0400 |
| commit | da467352296f8e50c7ab7057ead44a1df1c81496 (patch) | |
| tree | c83b1901ea355184b6126fa750499af74699eaec /drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | |
| parent | 21c0ffa612c98bcc6dab5bd9d977a18d565ee28e (diff) | |
drm/amdgpu: Effective health check before reset
Move amdgpu_device_health_check into amdgpu_device_gpu_recover so that
device presence can be checked before the reset.
The reasons are:
1. During a DPC event, the device on which the DPC event occurs is not
present on the bus.
2. When a DPC event and an ATHUB event occur simultaneously, the DPC thread
holds the reset domain lock while detecting the error, and the GPU recover
thread acquires the hive lock. The device is in both the
amdgpu_ras_in_recovery and occurs_dpc states at the same time, so the GPU
recover thread does not call amdgpu_device_health_check. It waits for the
reset domain lock held by the DPC thread, which has not released it. In the
DPC slot_reset callback, the DPC thread tries to obtain the hive lock, which
is held by the GPU recover thread at this time. The result is a deadlock.
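The lock ordering described above can be illustrated with a small user-space model. This is only a sketch and not the driver code: every `model_*` name is hypothetical, the locks are plain flags so that a "would block forever" condition is reported instead of actually hanging, and the early return when the device is absent is an assumption about how the health check result is used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not amdgpu code. */
enum model_result {
	MODEL_RECOVERED,           /* reset ran to completion */
	MODEL_SKIPPED_NOT_PRESENT, /* health check failed, no lock taken */
	MODEL_WOULD_DEADLOCK       /* a real thread would block here forever */
};

struct model_lock { bool held; };

struct model_state {
	struct model_lock reset_domain_lock;
	struct model_lock hive_lock;
	bool device_present;
};

static bool model_trylock(struct model_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void model_unlock(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* DPC error detection: take the reset domain lock; the device leaves the bus. */
static void model_dpc_error_detected(struct model_state *s)
{
	if (!model_trylock(&s->reset_domain_lock))
		return;
	s->device_present = false;
}

/* GPU recover path: hive lock first, then the reset domain lock. */
static enum model_result model_gpu_recover(struct model_state *s,
					   bool check_before_locks)
{
	/* Assumed fix: check device presence before taking any lock. */
	if (check_before_locks && !s->device_present)
		return MODEL_SKIPPED_NOT_PRESENT;

	if (!model_trylock(&s->hive_lock))
		return MODEL_WOULD_DEADLOCK;

	if (!model_trylock(&s->reset_domain_lock))
		/* A real thread would sleep here, still holding the hive lock. */
		return MODEL_WOULD_DEADLOCK;

	/* ... reset work would happen here ... */
	model_unlock(&s->reset_domain_lock);
	model_unlock(&s->hive_lock);
	return MODEL_RECOVERED;
}

/* DPC slot_reset callback: needs the hive lock to finish. */
static bool model_dpc_slot_reset(struct model_state *s)
{
	if (!model_trylock(&s->hive_lock))
		return false; /* the hive lock is held by the recover path */

	s->device_present = true;
	model_unlock(&s->hive_lock);
	model_unlock(&s->reset_domain_lock);
	return true;
}
```

Calling `model_dpc_error_detected()` and then `model_gpu_recover()` with `check_before_locks` set to false leaves the hive lock held, so `model_dpc_slot_reset()` cannot proceed. With `check_before_locks` set to true, the recover path returns before taking the hive lock and the DPC path can complete.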
Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Diffstat (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c')
0 files changed, 0 insertions, 0 deletions
