summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)Author
2026-02-03drm/amd: kill the outdated "Only the pthreads threading model is supported" ↵Oleg Nesterov
checks Nowadays task->group_leader->mm != task->mm is only possible if a) task is not a group leader and b) task->group_leader->mm == NULL because task->group_leader has already exited using sys_exit(). I don't think that drm/amd tries to detect/nack this case. Link: https://lkml.kernel.org/r/aXY_yLVHd63UlWtm@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Christan König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-03drm/amdgpu: don't abuse current->group_leaderOleg Nesterov
Cleanup and preparation to simplify the next changes. - Use current->tgid instead of current->group_leader->pid - Use get_task_pid(current, PIDTYPE_TGID) instead of get_task_pid(current->group_leader, PIDTYPE_PID) Link: https://lkml.kernel.org/r/aXY_wKewzV5lCa5I@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Christan König <christian.koenig@amd.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-02Merge tag 'amd-drm-next-6.20-2026-01-30' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.20-2026-01-30: amdgpu: - Misc cleanups - SMU 13 fixes - SMU 14 fixes - GPUVM fault filter fix - USB4 fixes - DC FP guard fixes - Powergating fix - JPEG ring reset fix - RAS fixes - Xclk fix for soc21 APUs - Fix COND_EXEC handling for GC 11 - UserQ fixes - MQD size alignment fixes - SMU feature interface cleanup - GC 10-12 KGQ init fixes - GC 11-12 KGQ reset fixes amdkfd: - Fix device snapshot reporting - GC 12.1 trap handler fixes - MQD size alignment fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260130183257.28879-1-alexander.deucher@amd.com
2026-01-29Merge tag 'drm-fixes-2026-01-30' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Seems to be a bit quieter this week, mostly xe and amdgpu, with msm and imx fixes and one WARN_ON from user blocked. Nothing of note outstanding either. uapi: - Fix a WARN_ON() when passing an invalid handle to drm_gem_change_handle_ioctl() msm: - GPU: - Fix bogus hwcg register update for a690 xe: - Skip address copy for sync-only execs - Fix a WA - Derive mem_copy cap from graphics version - Fix is_bound() pci_dev lifetime - xe nvm cleanup fixes amdgpu: - SMU 13 fixes - SMU 14 fixes - GPUVM fault filter fix - Powergating fix - HDMI debounce fix - Xclk fix for soc21 APUs - Fix COND_EXEC handling for GC 11 - GC 10-12 KGQ init fixes - GC 11-12 KGQ reset fixes imx/tve: - drop ddc device reference when unloading" * tag 'drm-fixes-2026-01-30' of https://gitlab.freedesktop.org/drm/kernel: (21 commits) drm/xe/nvm: Fix double-free on aux add failure drm/xe/nvm: Manage nvm aux cleanup with devres drm/amdgpu/gfx12: adjust KGQ reset sequence drm/amdgpu/gfx11: adjust KGQ reset sequence drm/amdgpu/gfx12: fix wptr reset in KGQ init drm/amdgpu/gfx11: fix wptr reset in KGQ init drm/amdgpu/gfx10: fix wptr reset in KGQ init drm/xe/configfs: Fix is_bound() pci_dev lifetime drm/amdgpu: Fix cond_exec handling in amdgpu_ib_schedule() drm/amdgpu/soc21: fix xclk for APUs drm/amd/display: Clear HDMI HPD pending work only if it is enabled drm/imx/tve: fix probe device leak drm/amd/pm: fix race in power state check before mutex lock drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove drm/amd/pm: fix smu v14 soft clock frequency setting issue drm/amd/pm: fix smu v13 soft clock frequency setting issue drm/xe: derive mem copy capability from graphics version drm/xe/xelp: Fix Wa_18022495364 drm/xe: Skip address copy for sync-only execs drm: Do not allow userspace to trigger kernel warnings in drm_gem_change_handle_ioctl() ...
2026-01-29Merge tag 'mm-hotfixes-stable-2026-01-29-09-41' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes. 9 are cc:stable, 12 are for MM. There's a patch series from Pratyush Yadav which fixes a few things in the new-in-6.19 LUO memfd code. Plus the usual shower of singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-01-29-09-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: vmcoreinfo: make hwerr_data visible for debugging mm/zone_device: reinitialize large zone device private folios mm/mm_init: don't cond_resched() in deferred_init_memmap_chunk() if called from deferred_grow_zone() mm/kfence: randomize the freelist on initialization kho: kho_preserve_vmalloc(): don't return 0 when ENOMEM kho: init alloc tags when restoring pages from reserved memory mm: memfd_luo: restore and free memfd_luo_ser on failure mm: memfd_luo: use memfd_alloc_file() instead of shmem_file_setup() memfd: export alloc_file() flex_proportions: make fprop_new_period() hardirq safe mailmap: add entry for Viacheslav Bocharov mm/memory-failure: teach kill_accessing_process to accept hugetlb tail page pfn mm/memory-failure: fix missing ->mf_stats count in hugetlb poison mm, swap: restore swap_space attr aviod kernel panic mm/kasan: fix KASAN poisoning in vrealloc() mm/shmem, swap: fix race of truncate and swap entry split
2026-01-29drm/amdgpu/gfx12: adjust KGQ reset sequenceAlex Deucher
Kernel gfx queues do not need to be reinitialized or remapped after a reset. Align with gfx11. v2: preserve init and remap for MMIO case. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0a6d6ed694d72b66b0ed7a483d5effa01acd3951) Cc: stable@vger.kernel.org
2026-01-29drm/amdgpu/gfx11: adjust KGQ reset sequenceAlex Deucher
Kernel gfx queues do not need to be reinitialized or remapped after a reset. This fixes queue reset failures on APUs. v2: preserve init and remap for MMIO case. Fixes: b3e9bfd86658 ("drm/amdgpu/gfx11: add ring reset callbacks") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789 Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b340ff216fdabfe71ba0cdd47e9835a141d08e10) Cc: stable@vger.kernel.org
2026-01-29drm/amdgpu/gfx12: fix wptr reset in KGQ initAlex Deucher
wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a2918f958d3f677ea93c0ac257cb6ba69b7abb7c) Cc: stable@vger.kernel.org
2026-01-29drm/amdgpu/gfx11: fix wptr reset in KGQ initAlex Deucher
wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1f16866bdb1daed7a80ca79ae2837a9832a74fbc) Cc: stable@vger.kernel.org
2026-01-29drm/amdgpu/gfx10: fix wptr reset in KGQ initAlex Deucher
wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e80b1d1aa1073230b6c25a1a72e88f37e425ccda) Cc: stable@vger.kernel.org
2026-01-29drm/amdgpu/gfx12: adjust KGQ reset sequenceAlex Deucher
Kernel gfx queues do not need to be reinitialized or remapped after a reset. Align with gfx11. v2: preserve init and remap for MMIO case. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdgpu/gfx11: adjust KGQ reset sequenceAlex Deucher
Kernel gfx queues do not need to be reinitialized or remapped after a reset. This fixes queue reset failures on APUs. v2: preserve init and remap for MMIO case. Fixes: b3e9bfd86658 ("drm/amdgpu/gfx11: add ring reset callbacks") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789 Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdgpu/gfx12: fix wptr reset in KGQ initAlex Deucher
wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdgpu/gfx11: fix wptr reset in KGQ initAlex Deucher
wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdgpu/gfx10: fix wptr reset in KGQ initAlex Deucher
wptr is a 64 bit value and we need to update the full value, not just 32 bits. Align with what we already do for KCQs. Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdkfd: Use AMDGPU_MQD_SIZE_ALIGN in gfx11+ kfd mqd managerLang Yu
MES is enabled by default from gfx11+, use AMDGPU_MQD_SIZE_ALIGN unconditionally for gfx11+. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: David Belanger <david.belanger@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdkfd: Adjust parameter of allocate_mqdLang Yu
Make allocate_mqd consistent with other callbacks. Prepare for next patch to use mqd_manager->mqd_size. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: David Belanger <david.belanger@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdgpu: Use AMDGPU_MQD_SIZE_ALIGN in KGDLang Yu
Use AMDGPU_MQD_SIZE_ALIGN for both kernel and user queue. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: David Belanger <david.belanger@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amd/pm: Initialize allowed feature listLijo Lazar
Instead of returning feature bit mask of allowed features, initialize the allowed features in the callback implementation itself. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amd/pm: Remove unused logic in SMUv14.0.2Lijo Lazar
Remove commented and redundant logic in get_allowed_feature_mask implementation. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amd/pm: Add smu feature interface functionsLijo Lazar
Instead of using bitmap operations, add wrapper interface functions to operate on smu features. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amd/pm: Add smu feature bits data structLijo Lazar
Add a bitmap struct to represent smu feature bits and functions to set/clear features. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdgpu: Add a helper macro to align mqd sizeLang Yu
MES FW uses address(mqd_addr + sizeof(struct mqd) + 3*sizeof(uint32_t)) as fence address and writes a 32 bit fence value to this address. Driver needs to allocate some extra memory(at least 4 DWs) in addition to sizeof(struct mqd) as mqd memory(limited to gfx/compute/sdma queue). For gfx11/12, sizeof(struct mqd) < PAGE_SIZE, KGD allocates mqd memory with PAGE_SIZE aligned works. For gfx12.1, sizeof(struct mqd) == PAGE_SIZE, it doesn't work. KFD mqd manager hardcodes mqd size to PAGE_SIZE/MQD_SIZE across different IP versions to solve this issue. To avoid hardcoding in differnet places and across different IP versions. Let's use AMDGPU_MQD_SIZE_ALIGN instead. It is used in two places. 1. mqd memory alloction 2. mqd stride handling for multi xcc config v2: Use AMDGPU_GPU_PAGE_ALIGN. (Mukul) Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: David Belanger <david.belanger@amd.com> (v1) Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29drm/amdgpu: validate user queue size constraintsJesse.Zhang
Add validation to ensure user queue sizes meet hardware requirements: - Size must be a power of two for efficient ring buffer wrapping - Size must be at least AMDGPU_GPU_PAGE_SIZE to prevent undersized allocations This prevents invalid configurations that could lead to GPU faults or unexpected behavior. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-28drm/amdgpu: Fix cond_exec handling in amdgpu_ib_schedule()Alex Deucher
The EXEC_COUNT field must be > 0. In the gfx shadow handling we always emit a cond_exec packet after the gfx_shadow packet, but the EXEC_COUNT never gets patched. This leads to a hang when we try and reset queues on gfx11 APUs. Fixes: c68cbbfd54c6 ("drm/amdgpu: cleanup conditional execution") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ba205ac3d6e83f56c4f824f23f1b4522cb844ff3) Cc: stable@vger.kernel.org
2026-01-28drm/amdgpu/soc21: fix xclk for APUsAlex Deucher
The reference clock is supposed to be 100Mhz, but it appears to actually be slightly lower (99.81Mhz). Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14451 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 637fee3954d4bd509ea9d95ad1780fc174489860) Cc: stable@vger.kernel.org
2026-01-28drm/amdgpu: Fix cond_exec handling in amdgpu_ib_schedule()Alex Deucher
The EXEC_COUNT field must be > 0. In the gfx shadow handling we always emit a cond_exec packet after the gfx_shadow packet, but the EXEC_COUNT never gets patched. This leads to a hang when we try and reset queues on gfx11 APUs. Fixes: c68cbbfd54c6 ("drm/amdgpu: cleanup conditional execution") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-28drm/amdgpu/soc21: fix xclk for APUsAlex Deucher
The reference clock is supposed to be 100Mhz, but it appears to actually be slightly lower (99.81Mhz). Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14451 Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-28drm/amdkfd: gfx12.1 trap handler instruction fixup for VOP3PXJay Cornwall
A trap may occur in the middle of VOP3PX instruction co-issue. The PC would be restored incorrectly if left unmodified. Identify this case by examining the instruction opcode and rewind the PC 8 bytes if it occurs. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Lancelot Six <lancelot.six@amd.com> Reviewed-by: Vladimir Indic <vladimir.indic@amd.com> Cc: Shweta Khatri <shweta.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-28drm/amd/pm: Fix null pointer dereference issueJinzhou Su
If SMU is disabled, during RAS initialization, there will be null pointer dereference issue here. Signed-off-by: Jinzhou Su <jinzhou.su@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-28drm/amd/display: Clear HDMI HPD pending work only if it is enabledIvan Lipski
[Why&How] On amdgpu_dm_connector_destroy(), the driver attempts to cancel pending HDMI HPD work without checking if the HDMI HPD is enabled. Added a check that it is enabled before clearing it. Fixes: 6a681cd90345 ("drm/amd/display: Add an hdmi_hpd_debounce_delay_ms module") Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 17b2c526fd8026d8e0f4c0e7f94fc517e3901589)
2026-01-28BackMerge tag 'v6.19-rc7' into drm-nextDave Airlie
Linux 6.19-rc7 This is needed for msm and rust trees. Signed-off-by: Dave Airlie <airlied@redhat.com>
2026-01-27drm/amd/pm: fix race in power state check before mutex lockYang Wang
The power state check in amdgpu_dpm_set_powergating_by_smu() is done before acquiring the pm mutex, leading to a race condition where: 1. Thread A checks state and thinks no change is needed 2. Thread B acquires mutex and modifies the state 3. Thread A returns without updating state, causing inconsistency Fix this by moving the mutex lock before the power state check, ensuring atomicity of the state check and modification. Fixes: 6ee27ee27ba8 ("drm/amd/pm: avoid duplicate powergate/ungate setting") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7a3fbdfd19ec5992c0fc2d0bd83888644f5f2f38)
2026-01-27drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_removeJon Doron
On APUs such as Raven and Renoir (GC 9.1.0, 9.2.2, 9.3.0), the ih1 and ih2 interrupt ring buffers are not initialized. This is by design, as these secondary IH rings are only available on discrete GPUs. See vega10_ih_sw_init() which explicitly skips ih1/ih2 initialization when AMD_IS_APU is set. However, amdgpu_gmc_filter_faults_remove() unconditionally uses ih1 to get the timestamp of the last interrupt entry. When retry faults are enabled on APUs (noretry=0), this function is called from the SVM page fault recovery path, resulting in a NULL pointer dereference when amdgpu_ih_decode_iv_ts_helper() attempts to access ih->ring[]. The crash manifests as: BUG: kernel NULL pointer dereference, address: 0000000000000004 RIP: 0010:amdgpu_ih_decode_iv_ts_helper+0x22/0x40 [amdgpu] Call Trace: amdgpu_gmc_filter_faults_remove+0x60/0x130 [amdgpu] svm_range_restore_pages+0xae5/0x11c0 [amdgpu] amdgpu_vm_handle_fault+0xc8/0x340 [amdgpu] gmc_v9_0_process_interrupt+0x191/0x220 [amdgpu] amdgpu_irq_dispatch+0xed/0x2c0 [amdgpu] amdgpu_ih_process+0x84/0x100 [amdgpu] This issue was exposed by commit 1446226d32a4 ("drm/amdgpu: Remove GC HW IP 9.3.0 from noretry=1") which changed the default for Renoir APU from noretry=1 to noretry=0, enabling retry fault handling and thus exercising the buggy code path. Fix this by adding a check for ih1.ring_size before attempting to use it. Also restore the soft_ih support from commit dd299441654f ("drm/amdgpu: Rework retry fault removal"). This is needed if the hardware doesn't support secondary HW IH rings. v2: additional updates (Alex) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3814 Fixes: dd299441654f ("drm/amdgpu: Rework retry fault removal") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Jon Doron <jond@wiz.io> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6ce8d536c80aa1f059e82184f0d1994436b1d526) Cc: stable@vger.kernel.org
2026-01-27drm/amd/pm: fix smu v14 soft clock frequency setting issueYang Wang
v1: resolve the issue where some freq frequencies cannot be set correctly due to insufficient floating-point precision. v2: patch this convert on 'max' value only. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 53868dd8774344051999c880115740da92f97feb) Cc: stable@vger.kernel.org
2026-01-27drm/amd/pm: fix smu v13 soft clock frequency setting issueYang Wang
v1: resolve the issue where some freq frequencies cannot be set correctly due to insufficient floating-point precision. v2: patch this convert on 'max' value only. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6194f60c707e3878e120adeb36997075664d8429) Cc: stable@vger.kernel.org
2026-01-27drm/amdkfd: add extended capabilities to device snapshotJonathan Kim
Add additional capabilities reporting. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: James Zhu <james.zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amdgpu: Send RMA CPER at bad page loadingKent Russell
Some older builds weren't sending RMA CPERs when the bad page threshold was exceeded. Newer builds have resolved this, but there could be systems out there with bad page numbers higher than the threshold, that haven't sent out an RMA CPER. To be thorough and safe, send an RMA CPER when we load the table, if the threshold is met or exceeded, instead of waiting for the next UE to trigger the CPER. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amdgpu: Fix jpeg ring test order in vcn_v4_0_3Jesse.Zhang
Fix the vcn reset sequence in vcn_v4_0_3_ring_reset() to restore JPEG power state and unlock the JPEG powergating mutex before running the JPEG ring post-reset helper. Fixes: d25c67fd9d6f ("drm/amdgpu/vcn4.0.3: rework reset handling") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amd/pm: fix race in power state check before mutex lockYang Wang
The power state check in amdgpu_dpm_set_powergating_by_smu() is done before acquiring the pm mutex, leading to a race condition where: 1. Thread A checks state and thinks no change is needed 2. Thread B acquires mutex and modifies the state 3. Thread A returns without updating state, causing inconsistency Fix this by moving the mutex lock before the power state check, ensuring atomicity of the state check and modification. Fixes: 6ee27ee27ba8 ("drm/amd/pm: avoid duplicate powergate/ungate setting") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amd/display: Promote DC to 3.2.367Taimur Hassan
* Fw release 0.1.44.0 * Fixes for corruption on platforms older than DCN4x. * Bug fixes related to USB4 link training * Fixes related to FP guard * Debug helpers and other stability fixes. * Some refactors to improve code quality Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amd/display: [FW Promotion] Release 0.1.44.0Taimur Hassan
* Panel Replay related features/bugfixes * BootCRC feature Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amd/display: Migrate HUBBUB register access from hwseq to hubbub component.Bhuvanachandra Pinninti
[why] Direct HUBBUB register access in the hwseq layer was creating register conflicts. [how] Migrated HUBBUB registers from hwseq to the hubbub component. Reviewed-by: Martin Leung <martin.leung@amd.com> Signed-off-by: Bhuvanachandra Pinninti <bpinnint@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amd/display: mouse event trigger to boost RR when idleMuaaz Nisar
[WHY+HOW] Add trigger event to boost refresh rate on mouse movement. Reviewed-by: Jun Lei <jun.lei@amd.com> Signed-off-by: Muaaz Nisar <muanisar@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amd/display: Add debug flag to override min dispclkMichael Strauss
[WHY] Enable dynamic ODM testing without needing a valid dispclk table [HOW] Create a debug flag to specify an override value for min dispclk Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amd/display: avoid dig reg access timeout on usb4 link training failZhongwei
[Why] When usb4 link training fails, the dpia sym clock will be disabled and SYMCLK source should be changed back to phy clock. In enable_streams, it is assumed that link training succeeded and will switch from refclk to phy clock. But phy clk here might not be on. Dig reg access timeout will occur. [How] When enable_stream is hit, check if link training failed for usb4. If it did, fall back to the ref clock to avoid reg access timeout. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Zhongwei <Zhongwei.Zhang@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amd/display: Remove unnecessary DC FP guardWayne Lin
[Why & How] For dcn2x_fast_validate_bw(), not only populate_dml_pipes needs FP guard but also dml_get_voltage_level(). Remove unnecessary DC_FP_START/DC_FP_END guard in dcn20_fast_validate_bw and dcn21_fast_validate_bw. FP guard is already there before calling dcn2x_validate_bandwidth_fp(). Reviewed-by: ChiaHsuan (Tom) Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amd/display: add setup_stereo for dcn4x or laterCharlene Liu
[why] stereo_sync pin is removed, but we still support display stereo Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amd/display: perform clear update flags for all DCN asicsAurabindo Pillai
Existing version check that limits the sequence to clear update flags should be performed for all asics. Exclude DCE asics for now. Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-27drm/amd/display: Enable bootcrc on FW sideWayne Lin
[Why] The bootcrc feature is controlled on the FW side. [How] Pass the control bits in boot options to FW. Reviewed-by: ChiaHsuan (Tom) Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>