summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2023-05-05Merge tag 'drm-next-2023-05-05' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull more drm fixes from Dave Airlie: "This is the fixes for the last couple of weeks for i915 and last 3 weeks for amdgpu, lots of them but pretty scattered around and all pretty small. amdgpu: - SR-IOV fixes - DCN 3.2 fixes - DC mclk handling fixes - eDP fixes - SubVP fixes - HDCP regression fix - DSC fixes - DC FP fixes - DCN 3.x fixes - Display flickering fix when switching between vram and gtt - Z8 power saving fix - Fix hang when skipping modeset - GPU reset fixes - Doorbell fix when resizing BARs - Fix spurious warnings in gmc - Locking fix for AMDGPU_SCHED IOCTL - SR-IOV fix - DCN 3.1.4 fix - DCN 3.2 fix - Fix job cleanup when CS is aborted i915: - skl pipe source size check - mtl transcoder mask fix - DSI power on sequence fix - GuC versioning corner case fix" * tag 'drm-next-2023-05-05' of git://anongit.freedesktop.org/drm/drm: (48 commits) drm/amdgpu: drop redundant sched job cleanup when cs is aborted drm/amd/display: filter out invalid bits in pipe_fuses drm/amd/display: Change default Z8 watermark values drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOV drm/amdgpu: add a missing lock for AMDGPU_SCHED drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini() drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini drm/amdgpu: Enable doorbell selfring after resize FB BAR drm/amdgpu: Use the default reset when loading or reloading the driver drm/amdgpu: Fix mode2 reset for sienna cichlid drm/i915/dsi: Use unconditional msleep() instead of intel_dsi_msleep() drm/i915/mtl: Add the missing CPU transcoder mask in intel_device_info drm/i915/guc: Actually return an error if GuC version range check fails drm/amd/display: Lowering min Z8 residency time drm/amd/display: fix flickering caused by S/G mode drm/amd/display: Set min_width and min_height capability for DCN30 drm/amd/display: Isolate remaining FPU code in DCN32 drm/amd/display: Update bounding box values for DCN321 drm/amd/display: Do not clear GPINT register when releasing DMUB from reset ...
2023-05-03drm/amdgpu: drop redundant sched job cleanup when cs is abortedGuchun Chen
Once command submission failed due to userptr invalidation in amdgpu_cs_submit, legacy code will perform cleanup of scheduler job. However, it's not needed at all, as former commit has integrated job cleanup stuff into amdgpu_job_free. Otherwise, because of double free, a NULL pointer dereference will occur in such scenario. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2457 Fixes: f7d66fb2ea43 ("drm/amdgpu: cleanup scheduler job initialization v2") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-03drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOVHorace Chen
[Why] This WPTR_POLL_ENABLE is a hardware contigious polling which will cause FCLK and UCLK to keep on a high level. Mostly its case can be covered by F32_WPTR_POLL_ENABLE which polls by firmware. So to save power, SR-IOV also needs to disable this bit Signed-off-by: Horace Chen <horace.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-03drm/amdgpu: add a missing lock for AMDGPU_SCHEDChia-I Wu
mgr->ctx_handles should be protected by mgr->lock. v2: improve commit message v3: add a Fixes tag Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Fixes: 52c6a62c64fa ("drm/amdgpu: add interface for editing a foreign process's priority v3") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-03drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini()Hamza Mahfooz
As made mention of in commit 08c677cb0b43 ("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini") and commit 13af556104fa ("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini"). It is meaningless to call amdgpu_irq_put() for gmc.ecc_irq. So, remove it from gmc_v9_0_hw_fini(). Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: 3029c855d79f ("drm/amdgpu: Fix desktop freezed after gpu-reset") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-03drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_finiHoratio Zhang
The gmc.ecc_irq is enabled by firmware per IFWI setting, and the host driver is not privileged to enable/disable the interrupt. So, it is meaningless to use the amdgpu_irq_put function in gmc_v10_0_hw_fini, which also leads to the call trace. [ 82.340264] Call Trace: [ 82.340265] <TASK> [ 82.340269] gmc_v10_0_hw_fini+0x83/0xa0 [amdgpu] [ 82.340447] gmc_v10_0_suspend+0xe/0x20 [amdgpu] [ 82.340623] amdgpu_device_ip_suspend_phase2+0x127/0x1c0 [amdgpu] [ 82.340789] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 82.340955] amdgpu_device_pre_asic_reset+0xdd/0x2b0 [amdgpu] [ 82.341122] amdgpu_device_gpu_recover.cold+0x4dd/0xbb2 [amdgpu] [ 82.341359] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 82.341529] process_one_work+0x21d/0x3f0 [ 82.341535] worker_thread+0x1fa/0x3c0 [ 82.341538] ? process_one_work+0x3f0/0x3f0 [ 82.341540] kthread+0xff/0x130 [ 82.341544] ? kthread_complete_and_exit+0x20/0x20 [ 82.341547] ret_from_fork+0x22/0x30 Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: c8b5a95b5709 ("drm/amdgpu: Fix desktop freezed after gpu-reset") Cc: stable@vger.kernel.org
2023-05-03drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_finiHoratio Zhang
The gmc.ecc_irq is enabled by firmware per IFWI setting, and the host driver is not privileged to enable/disable the interrupt. So, it is meaningless to use the amdgpu_irq_put function in gmc_v11_0_hw_fini, which also leads to the call trace. [ 102.980303] Call Trace: [ 102.980303] <TASK> [ 102.980304] gmc_v11_0_hw_fini+0x54/0x90 [amdgpu] [ 102.980357] gmc_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.980409] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.980459] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.980520] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.980573] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.980687] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.980740] process_one_work+0x21f/0x3f0 [ 102.980741] worker_thread+0x200/0x3e0 [ 102.980742] ? process_one_work+0x3f0/0x3f0 [ 102.980743] kthread+0xfd/0x130 [ 102.980743] ? kthread_complete_and_exit+0x20/0x20 [ 102.980744] ret_from_fork+0x22/0x30 Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: c8b5a95b5709 ("drm/amdgpu: Fix desktop freezed after gpu-reset") Cc: stable@vger.kernel.org
2023-05-03drm/amdgpu: Enable doorbell selfring after resize FB BARShane Xiao
[Why] The selfring doorbell aperture will change when resize FB BAR successfully during gmc sw init, we should reorder the sequence of enabling doorbell selfring aperture. [How] Move enable_doorbell_selfring_aperture from *_common_hw_init to *_common_late_init. This fixes the potential issue that GPU ring its own doorbell when this device is in translated mode when iommu is on. v2: Remove *_enable_doorbell_aperture functions (Christian) v3: Add comments to note that why we need enable doorbell selfring late (Christian) Signed-off-by: Shane Xiao <shane.xiao@amd.com> Signed-off-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Xiaomeng Hou <Xiaomeng.Hou@amd.com> Reviewed-by: Christian K�nig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-03drm/amdgpu: Use the default reset when loading or reloading the driverlyndonli
Below call trace and errors are observed when reloading amdgpu driver with the module parameter reset_method=3. It should do a default reset when loading or reloading the driver, regardless of the module parameter reset_method. v2: add comments inside and modify commit messages. [ +2.180243] [drm] psp gfx command ID_LOAD_TOC(0x20) failed and response status is (0x0) [ +0.000011] [drm:psp_hw_start [amdgpu]] *ERROR* Failed to load toc [ +0.000890] [drm:psp_hw_start [amdgpu]] *ERROR* PSP tmr init failed! [ +0.020683] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off. [ +0.000003] RIP: 0010:amdgpu_bo_release_notify+0x1ef/0x210 [amdgpu] [ +0.000004] Call Trace: [ +0.000003] <TASK> [ +0.000008] ttm_bo_release+0x2c4/0x330 [amdttm] [ +0.000026] amdttm_bo_put+0x3c/0x70 [amdttm] [ +0.000020] amdgpu_bo_free_kernel+0xe6/0x140 [amdgpu] [ +0.000728] psp_v11_0_ring_destroy+0x34/0x60 [amdgpu] [ +0.000826] psp_hw_init+0xe7/0x2f0 [amdgpu] [ +0.000813] amdgpu_device_fw_loading+0x1ad/0x2d0 [amdgpu] [ +0.000731] amdgpu_device_init.cold+0x108e/0x2002 [amdgpu] [ +0.001071] ? do_pci_enable_device+0xe1/0x110 [ +0.000011] amdgpu_driver_load_kms+0x1a/0x160 [amdgpu] [ +0.000729] amdgpu_pci_probe+0x179/0x3a0 [amdgpu] Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-03drm/amdgpu: Fix mode2 reset for sienna cichlidlyndonli
Before this change, sienna_cichlid_get_reset_handler will always return NULL, although the module parameter reset_method is 3 when loading amdgpu driver. Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-25Merge tag 'drm-next-2023-04-24' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm updates from Dave Airlie: "There is a new Qualcomm accel driver for their QAIC, dma-fence got a deadline feature added, lots of refactoring around fbdev emulation, and the usual pre-release hw enablements from AMD and Intel and fixes everywhere. New drivers: - add QAIC acceleration driver dma-buf: - constify kobj_type structs - Reject prime DMA-Buf attachment if get_sg_table is missing. fbdev: - cmdline parser fixes - implement fbdev emulation for GEM DMA drivers - always use shadow buffer in fbdev emulation helpers dma-fence: - add deadline hint to fences - signal private stub fence core: - improve DisplayID 2.0 and EDID parsing - add gem eviction function + callback - prep to convert shmem helper to GEM resv lock - move suballocator from radeon/amdgpu to core for Xe - HPD polling fixes - Documentation improvements - Add atomic enable_plane callback - use tgid instead of pid for client tracking - DP: Add SDP Error Detection Configuration Register - Add prime import/export to vram-helper - use pci aperture helpers in more drivers panel: - Radxa 8/10HD support - Samsung AMD495QA01 support - Elida KD50T048A - Sony TD4353 - Novatek NT36523 - STARRY 2081101QFH032011-53G - B133UAN01.0 - AUO NE135FBM-N41 i915: - More MTL enabling - fix s/r problems with MEI/PXP - Implement fb_dirty for PSR,FBC,DRRS fixes - Fix eDP+DSI dual panel systems - Fix issue #6333: "list_add corruption" and full system lockup from performance monitoring - Don't use stolen memory or BAR for ring buffers on LLC platforms - Make sure DSM size has correct 1MiB granularity on Gen12+ - Whitelist COMMON_SLICE_CHICKEN3 for UMD access on Gen12+ - Add engine TLB invalidation for Meteorlake - Fix GSC races on driver load/unload on Meteorlake+ - Make kobj_type structures constant - Move fd_install after last use of fence - wm/vblank refactoring - display code refactoring - Create GSC submission targeting HDCP and PXP usages on MTL+ - Enable HDCP2.x via GSC CS - Fix context runtime accounting on sysfs fdinfo for heavy workloads - Use i915 instead of dev_priv insied the file_priv structure - Replace fake flex-array with flexible-array member amdgpu: - Make kobj structures const - Generalize dmabuf import to work with KFD - Add capped/uncapped workload handling for supported APUs - Expose additional memory stats via fdinfo - Register vga_switcheroo for apple-gmux - Initial NBIO7.9, GC 9.4.3, GFXHUB 1.2, MMHUB 1.8 support - Initial DC FAM infrastructure - Link DC backlight to connector device rather than PCI device - Add sysfs nodes for secondary VCN clocks amdkfd: - Make kobj structures const - Support for exporting buffers via dmabuf - Multi-VMA page migration fixes - initial GC 9.4.3 support radeon: - iMac fix - convert to client based fbdev emulation habanalabs: - Add opcodes to the CS ioctl to allow user to stall/resume specific engines inside Gaudi2. - INFO ioctl the amount of device memory that the driver and f/w reserve for themselves. - INFO ioctl a bit-mask of the available rotator engines - INFO ioctl the register's address of the f/w that should be used to trigger interrupts - INFO ioctl two new opcodes to fetch information on h/w and f/w events - Enable graceful reset mechanism for compute-reset. - Align to the latest firmware specs. - Enforce the release order of the compute device and dma-buf. msm: - UBWC decoder programming rework - SM8550, SM8450 bindings update - uapi C++ fix - a3xx and a4xx devfreq support - GPU and GEM updates to avoid allocations which could trigger reclaim (shrinker) in fence signaling path - dma-fence deadline hint support and wait-boost - a640/650 speed bin support cirrus: - convert to regular atomic helpers - add damage clipping mediatek: - 10-bit overlay support - mt8195 support - Only trigger DRM HPD events if bridge is attached - Change the aux retries times when receiving AUX_DEFER rockchip: - add 4K support vc4: - use drm_gem_objects virtio: - allow KMS support to be disabled - add damage clipping vmwgfx: - buffer object lifetime fixes exynos: - move MIPI DSI driver to drm bridge for iMX sharing - use kernel fbdev emulation panfrost: - add support for mali MT81xx devices - add speed binning support lima: - add usage stats tegra: - fbdev client conversion vkms: - Add primary plane positioning support" * tag 'drm-next-2023-04-24' of git://anongit.freedesktop.org/drm/drm: (1495 commits) drm/i915/dp_mst: Fix active port PLL selection for secondary MST streams drm/exynos: Implement fbdev emulation as in-kernel client drm/exynos: Initialize fbdev DRM client drm/exynos: Remove fb_helper from struct exynos_drm_private drm/exynos: Remove struct exynos_drm_fbdev drm/exynos: Remove exynos_gem from struct exynos_drm_fbdev drm/i915: Fix memory leaks in i915 selftests drm/i915: Make intel_get_crtc_new_encoder() less oopsy drm/i915/gt: Avoid out-of-bounds access when loading HuC drm/amdgpu: add some basic elements for multiple XCD case drm/amdgpu: move vmhub out of amdgpu_ring_funcs (v4) Revert "drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV" drm/amdgpu: add common ip block for GC 9.4.3 drm/amd/display: Add logging when DP link training Clock recovery is Successful drm/amdgpu: add common early init support for GC 9.4.3 drm/amdgpu: switch to v9_4_3 gfx_funcs callbacks for GC 9.4.3 drm/amd/display: Add logging when setting DP sink power state fails drm/amdkfd: Add gfx_target_version for GC 9.4.3 drm/amdkfd: Enable HW_UPDATE_RPTR on GC 9.4.3 drm/amdgpu: reserve the old gc_11_0_*_mes.bin ...
2023-04-18drm/amdgpu: Fix desktop freezed after gpu-resetAlan Liu
[Why] After gpu-reset, sometimes the driver fails to enable vblank irq, causing flip_done timed out and the desktop freezed. During gpu-reset, we disable and enable vblank irq in dm_suspend() and dm_resume(). Later on in amdgpu_irq_gpu_reset_resume_helper(), we check irqs' refcount and decide to enable or disable the irqs again. However, we have 2 sets of API for controling vblank irq, one is dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has its own refcount and flag to store the state of vblank irq, and they are not synchronized. In drm we use the first API to control vblank irq but in amdgpu_irq_gpu_reset_resume_helper() we use the second set of API. The failure happens when vblank irq was enabled by dm_vblank_get() before gpu-reset, we have vblank->enabled true. However, during gpu-reset, in amdgpu_irq_gpu_reset_resume_helper() vblank irq's state checked from amdgpu_irq_update() is DISABLED. So finally it disables vblank irq again. After gpu-reset, if there is a cursor plane commit, the driver will try to enable vblank irq by calling drm_vblank_enable(), but the vblank->enabled is still true, so it fails to turn on vblank irq and causes flip_done can't be completed in vblank irq handler and desktop become freezed. [How] Combining the 2 vblank control APIs by letting drm's API finally calls amdgpu_irq's API, so the irq's refcount and state of both APIs can be synchronized. Also add a check to prevent refcount from being less then 0 in amdgpu_irq_put(). v2: - Add warning in amdgpu_irq_enable() if the irq is already disabled. - Call dc_interrupt_set() in dm_set_vblank() to avoid refcount change if it is in gpu-reset. v3: - Improve commit message and code comments. Signed-off-by: Alan Liu <HaoPing.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-04-18drm/amdgpu: release gpu full access after "amdgpu_device_ip_late_init"Chong Li
[WHY] Function "amdgpu_irq_update()" called by "amdgpu_device_ip_late_init()" is an atomic context. We shouldn't access registers through KIQ since "msleep()" may be called in "amdgpu_kiq_rreg()". [HOW] Move function "amdgpu_virt_release_full_gpu()" after function "amdgpu_device_ip_late_init()", to ensure that registers be accessed through RLCG instead of KIQ. Call Trace: <TASK> show_stack+0x52/0x69 dump_stack_lvl+0x49/0x6d dump_stack+0x10/0x18 __schedule_bug.cold+0x4f/0x6b __schedule+0x473/0x5d0 ? __wake_up_klogd.part.0+0x40/0x70 ? vprintk_emit+0xbe/0x1f0 schedule+0x68/0x110 schedule_timeout+0x87/0x160 ? timer_migration_handler+0xa0/0xa0 msleep+0x2d/0x50 amdgpu_kiq_rreg+0x18d/0x1f0 [amdgpu] amdgpu_device_rreg.part.0+0x59/0xd0 [amdgpu] amdgpu_device_rreg+0x3a/0x50 [amdgpu] amdgpu_sriov_rreg+0x3c/0xb0 [amdgpu] gfx_v10_0_set_gfx_eop_interrupt_state.constprop.0+0x16c/0x190 [amdgpu] gfx_v10_0_set_eop_interrupt_state+0xa5/0xb0 [amdgpu] amdgpu_irq_update+0x53/0x80 [amdgpu] amdgpu_irq_get+0x7c/0xb0 [amdgpu] amdgpu_fence_driver_hw_init+0x58/0x90 [amdgpu] amdgpu_device_init.cold+0x16b7/0x2022 [amdgpu] Signed-off-by: Chong Li <chongli2@amd.com> Reviewed-by: JingWen.Chen2@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amdgpu/vcn: fix mmsch ctx table sizeJane Jian
add jpeg table size to ctx table size rather than override it Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: JingWen Chen <JingWen.Chen2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14drm/amdgpu: add some basic elements for multiple XCD caseLe Ma
Add some basic definitions and structure member. Inscrease MAX_WB slots to 1024 to support the increasing number of rings for multiple partitions. v2: unify naming style Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14drm/amdgpu: move vmhub out of amdgpu_ring_funcs (v4)Le Ma
It looks better to place this field in ring structure. Also drop the repeated ring funcs definitions if there's no difference except for vmhub field. v2: rename the field to vm_hub like others (Le) v3: apply the changes to new ip blocks (Hawking) v4: fix vcn sw ring (Alex) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14Revert "drm/amdgpu: enable ras for mp0 v13_0_10 on SRIOV"Jane Jian
This reverts commit fe120b9f5ce873516a2604e4ff0c19084be94e8c. This patch impacts sriov multi-vf stability Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14drm/amdgpu: add common ip block for GC 9.4.3Hawking Zhang
Add common IP handling for GC 9.4.3 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14drm/amdgpu: add common early init support for GC 9.4.3Hawking Zhang
init asic funcs and cp/pg flags for GC 9.4.3 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14drm/amdgpu: switch to v9_4_3 gfx_funcs callbacks for GC 9.4.3Hawking Zhang
add gfx_funcs callbacks implemenation based on gc_v9_4_3 ip headers Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14drm/amdgpu: reserve the old gc_11_0_*_mes.binLi Ma
Reserve the MOUDLE_FIRMWARE declaration of gc_11_0_*_mes.bin to fix falling back to old mes bin on failure via autoload. Fixes: 97998b893c30 ("drm/amd/amdgpu: introduce gc_*_mes_2.bin v2") Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14drm/amdgpu: change the reference clock for raven/raven2Jesse Zhang
Due to switch to golden tsc register to get clock counter for raven/ raven2. Chang the reference clock from 25MHZ to 100MHZ. Suggested-by: shanshengwang <shansheng.wang@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14drm/amdgpu: skip kfd-iommu suspend/resume for S0ixAaron Liu
GFX is in gfxoff mode during s0ix so we shouldn't need to actually execute kfd_iommu_suspend/kfd_iommu_resume operation. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14drm/amdgpu: add gc v9_4_3 rlc_funcs implementationHawking Zhang
all the gc v9_4_3 registers fall in gc_rlcpdec address range have different relative offsets and base_idx from the ones defined in gc v9_0 ip headers. gc_v9_0_rlc_funcs can not be reused anymore for gc v9_4_3 v2: drop unused handshake function (Alex) Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-13drm/amdgpu: drop temp programming for pagefault handlingHawking Zhang
Was introduced as workaround. not needed anymore Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Jack Gui <Jack.Gui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-13drm/amdgpu: include protection for doorbell.hShashank Sharma
This patch adds double include protection for doorbell.h Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-13drm/amdgpu: rename num_doorbellsShashank Sharma
Rename doorbell.num_doorbells to doorbell.num_kernel_doorbells to make it more readable. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Acked-by: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-13drm/amdgpu: switch to golden tsc registers for raven/raven2Jesse Zhang
Due to raven/raven2 maybe enable  sclk slow down, they cannot get clock count by the RLC at the auto level of dpm performance. So switch to golden tsc register. Suggested-by: shanshengwang <shansheng.wang@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-13drm/amdgpu: add gfx v11_0_3 fed irq handling for sriovYiPeng Chai
Add gfx v11_0_3 fed irq handling for sriov. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-13drm/amdgpu: Rework retry fault removalMukul Joshi
Rework retry fault removal from the software filter by storing an expired timestamp for a fault that is being removed. When a new fault comes, and it matches an entry in the sw filter, it will be added as a new fault only when its timestamp is greater than the timestamp expiry of the fault in the sw filter. This helps in avoiding stale faults being added back into the filter and preventing legitimate faults from being handled. Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-13drm/amdgpu: Enable IH retry CAM on GFX9Mukul Joshi
This patch enables the IH retry CAM on GFX9 series cards. This retry filter is used to prevent sending lots of retry interrupts in a short span of time and overflowing the IH ring buffer. This will also help reduce CPU interrupt workload. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-13drm/amdgpu: simplify amdgpu_ras_eeprom.cAlex Deucher
All chips that support RAS also support IP discovery, so use the IP versions rather than a mix of IP versions and asic types. Checking the validity of the atom_ctx pointer is not required as the vbios is already fetched at this point. v2: add comments to id asic types based on feedback from Luben Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com>
2023-04-11drm/amdgpu: Enable GFX11 SDMA context empty interruptGraham Sider
Enable SDMA queue empty context switching. SDMA context switch due to quantum programming no longer done here (as of sdma v6), so re-name sdma_v6_0_ctx_switch_enable to sdma_v6_0_ctxempty_int_enable to reflect this. Also program SDMAx_QUEUEx_SCHEDULE_CNTL for context switch due to quantum in KFD. Set to amdgpu_sdma_phase_quantum (defaults to 32 i.e. 3200us). Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdkfd: Check PCIe atomics support on GFX11 to set CP_HQD_HQ_STATUS0[29]Sreekant Somasekharan
CP_HQD_HQ_STATUS0[29] bit will be used by CPFW to acknowledge whether PCIe atomics are supported. The default value of this bit is set to 0. Driver will check whether PCIe atomics are supported and set the bit to 1 if supported. This will force CPFW to use real atomic ops. If the bit is not set, CPFW will default to read/modify/write using the firmware itself. This is applicable only to GFX11 RS64 CP with MEC FW >= 509. If MEC FW < 509 and for all GFX11 F32 CP, PCIe atomics needs to be supported else it will skip the device. This commit also involves moving amdgpu_amdkfd_device_probe() function call after per-IP early_init loop in amdgpu_device_ip_early_init() function so as to check for RS64 enabled device. Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com> Reviewed-by: Graham Sider <Graham.Sider@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: correct ras enabled flagStanley.Yang
XGMI RAS should be according to the gmc xgmi physical nodes number, XGMI RAS should not be enabled if xgmi num_physical_nodes is zero. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: fix unexpected block idStanley.Yang
Aldebaran supports VCN and JPEG RAS, it reports unexpected block id message during VCN and JPEG RAS initialization if VCN and JPEG block id not defined. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: use sdma_v6 single packet invalidationPierre-Eric Pelloux-Prayer
This achieves the same result as the sequence used in emit_flush_gpu_tlb but the invalidation is now a single packet instead of the 3 packets required to implement reg_write_reg_wait. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Monk Liu <monk.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: Fix warningsLijo Lazar
Fix below warning due to incompatible types in conditional operator ../pm/swsmu/smu13/smu_v13_0_6_ppt.c:315:17: sparse: sparse: incompatible types in conditional expression (different base types): Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Link: https://lore.kernel.org/oe-kbuild-all/202303082135.NjdX1Bij-lkp@intel.com/ Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: refine get gpu clock counter methodTong Liu01
[why] regGOLDEN_TSC_COUNT_LOWER/regGOLDEN_TSC_COUNT_UPPER are protected and unaccessible under sriov. The clock counter high bit may update during reading process. [How] Replace regGOLDEN_TSC_COUNT_LOWER/regGOLDEN_TSC_COUNT_UPPER with regCP_MES_MTIME_LO/regCP_MES_MTIME_HI to get gpu clock under sriov. Refine get gpu clock counter method to make the result more precise. Signed-off-by: Tong Liu01 <Tong.Liu01@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: optimize redundant code in umc_v6_7YiPeng Chai
Optimize redundant code in umc_v6_7. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: Fix sdma v4 sw fini errorlyndonli
Fix sdma v4 sw fini error for sdma 4.2.2 to solve the following general protection fault [ +0.108196] general protection fault, probably for non-canonical address 0xd5e5a4ae79d24a32: 0000 [#1] PREEMPT SMP PTI [ +0.000018] RIP: 0010:free_fw_priv+0xd/0x70 [ +0.000022] Call Trace: [ +0.000012] <TASK> [ +0.000011] release_firmware+0x55/0x80 [ +0.000021] amdgpu_ucode_release+0x11/0x20 [amdgpu] [ +0.000415] amdgpu_sdma_destroy_inst_ctx+0x4f/0x90 [amdgpu] [ +0.000360] sdma_v4_0_sw_fini+0xce/0x110 [amdgpu] Signed-off-by: lyndonli <Lyndon.Li@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: DROP redundant drm_prime_sg_to_dma_addr_arrayShane Xiao
For DMA-MAP userptr on other GPUs, the dma address array will be populated in amdgpu_ttm_backend_bind. Remove the redundant call from the driver. v2: update the comment Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: optimize redundant code in umc_v8_10YiPeng Chai
Optimize redundant code in umc_v8_10 Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11amd/amdgpu: Inherit coherence flags base on original BO flagsShane Xiao
For SG BO to DMA-map userptrs on other GPUs, the SG BO need inherit MTYPEs in PTEs from original BO. If we set the flags, the device can be coherent with the CPUs and other GPUs. v2: 1. Drop unnecessary flags check 2. Remove local variable align Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: Add userptr bo support for mGPUs when iommu is onShane Xiao
For userptr bo with iommu on, multiple GPUs use same system memory dma mapping address when both adev and bo_adev are in identity mode or in the same iommu group. If RAM direct map to one GPU, other GPUs can share the original BO in order to reduce dma address array usage when RAM can direct map to these GPUs. However, we should explicit check whether RAM can direct map to all these GPUs. This patch fixes a potential issue that where RAM is direct mapped on some but not all GPUs. v2: 1. Update comment 2. Add helper function reuse_dmamap Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amd/amdgpu: Drop the hang limit parameterSrinivasan Shanmugam
The driver doesn't resubmit jobs on hangs any more, hence drop the hang limit parameter - amdgpu_job_hang_limit, wherever it is used. Suggested-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Kent Russell <kent.russell@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: Add MES KIQ clear to tell RLC that KIQ is dequeuedYifan Zha
[Why] As MES KIQ is dequeued, tell RLC that KIQ is inactive [How] Clear the RLC_CP_SCHEDULERS Active bit which RLC checks KIQ status In addition, driver can halt MES under SRIOV when unloading driver v2: Use scheduler0 mask to clear KIQ portion of RLC_CP_SCHEDULERS Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Reviewed-by: Horace Chen <horace.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: Add MES KIQ dequeue in MES hw finiYifan Zha
[Why] Need dequeue MES KIQ under SRIOV when unloading driver [How] Modify mes_v11_0_kiq_dequeue_sched which was used to dequeue MES SCHED to support veriable pipe. Add MES KIQ dequeue in hw fini Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Reviewed-by: Horace Chen <horace.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amd/amdgpu: introduce gc_*_mes_2.bin v2Jack Xiao
To avoid new mes fw running with old driver, rename mes schq fw to gc_*_mes_2.bin. v2: add MODULE_FIRMWARE declaration v3: squash in fixup patch Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amdgpu: allow more APUs to do mode2 reset when go to S4Tim Huang
Skip mode2 reset only for IMU enabled APUs when do S4. This patch is to fix the regression issue https://gitlab.freedesktop.org/drm/amd/-/issues/2483 It is generated by commit b589626674de ("drm/amdgpu: skip ASIC reset for APUs when go to S4"). Fixes: b589626674de ("drm/amdgpu: skip ASIC reset for APUs when go to S4") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2483 Tested-by: Yuan Perry <Perry.Yuan@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>