From dedf404be8cf97e6fabbed7ad97000ab816897eb Mon Sep 17 00:00:00 2001
From: Connor Abbott
Date: Tue, 20 May 2025 15:08:58 -0400
Subject: drm/msm: Delete resume_translation()

Unused since the previous commit.

Signed-off-by: Connor Abbott
Patchwork: https://patchwork.freedesktop.org/patch/654890/
Signed-off-by: Rob Clark
---
 drivers/gpu/drm/msm/msm_mmu.h | 1 -
 1 file changed, 1 deletion(-)

(limited to 'drivers/gpu/drm/msm/msm_mmu.h')

diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index daf91529e02b..c3d17aae88b0 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -15,7 +15,6 @@ struct msm_mmu_funcs {
 			size_t len, int prot);
 	int (*unmap)(struct msm_mmu *mmu, uint64_t iova, size_t len);
 	void (*destroy)(struct msm_mmu *mmu);
-	void (*resume_translation)(struct msm_mmu *mmu);
 };
 
 enum msm_mmu_type {
-- 
cgit v1.2.3

From b13044092c1e30453d2f7e9be596d3a2616582a0 Mon Sep 17 00:00:00 2001
From: Connor Abbott
Date: Tue, 20 May 2025 15:08:59 -0400
Subject: drm/msm: Temporarily disable stall-on-fault after a page fault

When things go wrong, the GPU is capable of quickly generating millions of faulting translation requests per second. When that happens, in the stall-on-fault model, each access stalls until it wins the race to signal the fault and the RESUME register is subsequently written. This slows page fault processing to a crawl, because the GPU can generate faults much faster than the CPU can acknowledge them. It also means that all available resources in the SMMU are saturated waiting for the stalled transactions, so other transactions, such as those generated by the GMU (which shares translation resources with the GPU), cannot proceed. This causes a GMU watchdog timeout, which leads to a failed reset (because GX cannot collapse while a transaction is pending) and a permanently hung GPU.

On older platforms with qcom,smmu-v2, it seems that when one transaction is stalled, subsequent faulting transactions are terminated, which avoids this problem, but the MMU-500 follows the spec here.

To work around these problems, disable stall-on-fault as soon as we get a page fault, and keep it disabled until a cooldown period has elapsed after page faults stop. This guarantees the GMU some time to continue working. We only use stall-on-fault to halt the GPU while we collect a devcoredump, and we always terminate the transaction afterward, so it's fine to miss some subsequent page faults. We also keep it disabled for as long as the current devcoredump hasn't been deleted, because in that case we likely won't capture another one if there's a fault.

After this commit, HFI messages still occasionally time out because the crashdump handler doesn't run fast enough to let the GMU resume, but the driver seems to recover from it. This will probably go away after the HFI timeout is increased.

Signed-off-by: Connor Abbott
Reviewed-by: Rob Clark
Patchwork: https://patchwork.freedesktop.org/patch/654891/
Signed-off-by: Rob Clark
---
 drivers/gpu/drm/msm/msm_mmu.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'drivers/gpu/drm/msm/msm_mmu.h')

diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index c3d17aae88b0..0c694907140d 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -15,6 +15,7 @@ struct msm_mmu_funcs {
 			size_t len, int prot);
 	int (*unmap)(struct msm_mmu *mmu, uint64_t iova, size_t len);
 	void (*destroy)(struct msm_mmu *mmu);
+	void (*set_stall)(struct msm_mmu *mmu, bool enable);
 };
 
 enum msm_mmu_type {
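
The stall-on-fault cooldown policy described in the second commit message can be modeled in a few lines of C. This is a userspace simulation under stated assumptions, not the driver's implementation: `struct mock_mmu`, `struct stall_policy`, the `stall_policy_*` helpers and the 500 ms `STALL_COOLDOWN_MS` value are all hypothetical, and the current time is passed in explicitly in milliseconds so the logic can be exercised deterministically. Only the shape of the `set_stall(mmu, enable)` hook is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cooldown length; the patch does not state the real value. */
#define STALL_COOLDOWN_MS 500u

struct mock_mmu;

/* Hypothetical ops table mirroring the new hook's signature:
 * void (*set_stall)(struct msm_mmu *mmu, bool enable); */
struct mock_mmu_funcs {
	void (*set_stall)(struct mock_mmu *mmu, bool enable);
};

struct mock_mmu {
	const struct mock_mmu_funcs *funcs;
	bool stall_on_fault;	/* simulated hardware stall-on-fault setting */
};

static void mock_set_stall(struct mock_mmu *mmu, bool enable)
{
	mmu->stall_on_fault = enable;
}

static const struct mock_mmu_funcs mock_funcs = {
	.set_stall = mock_set_stall,
};

/* Hypothetical per-GPU bookkeeping for the policy. */
struct stall_policy {
	struct mock_mmu *mmu;
	bool stall_enabled;		/* what we last programmed via set_stall */
	uint64_t reenable_after_ms;	/* earliest time stalling may come back */
	bool devcoredump_pending;	/* current dump has not been deleted yet */
};

static void stall_policy_init(struct stall_policy *p, struct mock_mmu *mmu)
{
	assert(p && mmu && mmu->funcs && mmu->funcs->set_stall);
	p->mmu = mmu;
	p->stall_enabled = true;
	p->reenable_after_ms = 0;
	p->devcoredump_pending = false;
	mmu->funcs->set_stall(mmu, true);
}

/* On every page fault: switch stalling off and push the cooldown out,
 * so a continuing fault storm keeps stalling disabled. */
static void stall_policy_on_fault(struct stall_policy *p, uint64_t now_ms)
{
	p->reenable_after_ms = now_ms + STALL_COOLDOWN_MS;
	if (p->stall_enabled) {
		p->stall_enabled = false;
		p->mmu->funcs->set_stall(p->mmu, false);
	}
}

/* Called from some periodic point (hypothetical): re-enable stalling only
 * once faults have been quiet for the cooldown period and the previous
 * devcoredump has been deleted. */
static void stall_policy_maybe_reenable(struct stall_policy *p, uint64_t now_ms)
{
	if (p->stall_enabled || p->devcoredump_pending)
		return;
	if (now_ms < p->reenable_after_ms)
		return;
	p->stall_enabled = true;
	p->mmu->funcs->set_stall(p->mmu, true);
}
```

Passing the clock in as a parameter, rather than reading it inside the helpers, keeps the policy easy to test; in a kernel context a monotonic clock would presumably supply `now_ms`. Where the re-enable check runs (job submission, a timer, or elsewhere) is left open here and is an assumption of the sketch.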