path: root/drivers/gpu
Age  Commit message  Author
2026-01-21  drm/amd/pm: Don't clear SI SMC table when setting power limit  Timur Kristóf
There is no reason to clear the SMC table. We also don't need to recalculate the power limit then. Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e214d626253f5b180db10dedab161b7caa41f5e9)
2026-01-21  drm/amd/pm: Fix si_dpm mmCG_THERMAL_INT setting  Timur Kristóf
Use WREG32 to write mmCG_THERMAL_INT. This is a direct access register. Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2555f4e4a741d31e0496572a8ab4f55941b4e30e)
2026-01-21  drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion()  Alex Deucher
The function no longer signals the fence so rename it to better match what it does. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amdgpu: fix type for wptr in ring backup  Alex Deucher
Needs to be a u64. Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amdgpu: mark invalid records with U64_MAX  Gangliang Xie
Set retired_page of invalid RAS records to U64_MAX, and skip them when reading RAS records. Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amdgpu: Avoid excessive dmesg log  Lijo Lazar
KIQ access is not guaranteed to work reliably under all reset situations. Avoid flooding dmesg with HDP flush failure messages. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amdgpu: Fix validating flush_gpu_tlb_pasid()  Timur Kristóf
When a function holds a lock and we return without unlocking it, it deadlocks the kernel. We should always unlock before returning. This commit fixes suspend/resume on SI. Tested on two Tahiti GPUs: FirePro W9000 and R9 280X. Fixes: f4db9913e4d3 ("drm/amdgpu: validate the flush_gpu_tlb_pasid()") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202601190121.z9C0uml5-lkp@intel.com/ Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
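The locking rule in this message can be illustrated with a minimal userspace sketch. It uses a POSIX mutex instead of whatever lock the driver actually holds, and `flush_tlb_checked` and its parameter are hypothetical names chosen for illustration, not the amdgpu code.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t flush_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical stand-in for a flush routine that validates its input
 * while holding a lock. Every exit path goes through the unlock label,
 * so a failed validation can never leave the lock held.
 */
int flush_tlb_checked(bool callback_present)
{
	int ret = 0;

	pthread_mutex_lock(&flush_lock);
	if (!callback_present) {
		/* Do not return here: the lock is still held. */
		ret = -EOPNOTSUPP;
		goto unlock;
	}
	/* ... the actual flush would happen here ... */
unlock:
	pthread_mutex_unlock(&flush_lock);
	return ret;
}
```

Funnelling all exits through a single label is a common C idiom for this; a bare `return` inside the locked region is the kind of bug the message describes.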
2026-01-21  drm/amdkfd: simplify svm_range_unmap_from_gpus()  Yury Norov
The function calls bitmap_or() followed by for_each_set_bit(). Switch it to the dedicated for_each_or_bit() and drop the temporary bitmap. Signed-off-by: Yury Norov <ynorov@nvidia.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
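The idea of iterating over the union of two bitmaps without a temporary can be sketched in plain userspace C. This is not the kernel's for_each_or_bit() macro; `visit_or_bits` and its callback signature are hypothetical, and the bitmaps are assumed to be arrays of 64-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Visit every bit that is set in (a | b) without materialising the OR
 * in a temporary bitmap. Returns the number of bits visited.
 */
size_t visit_or_bits(const uint64_t *a, const uint64_t *b, size_t nwords,
		     void (*visit)(size_t bit, void *ctx), void *ctx)
{
	size_t count = 0;

	for (size_t w = 0; w < nwords; w++) {
		uint64_t word = a[w] | b[w];	/* OR one word at a time */

		for (unsigned int i = 0; i < 64; i++) {
			if (word & (UINT64_C(1) << i)) {
				if (visit)
					visit(w * 64 + i, ctx);
				count++;
			}
		}
	}
	return count;
}
```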
2026-01-21  drm/amdkfd: Do not include VGPR MSBs in saved PC during save  Lancelot Six
The current trap handler uses the top bits of ttmp1 to store a copy of sq_wave_mode.*vgpr_msb (except for src2_vgpr_msb). This is so the effective values in sq_wave_mode can be cleared to ensure correct behavior of the trap handler. When saving sq_wave_mode, the trap handler correctly rebuilds the expected value (with *vgpr_msb restored), so the save area is correct. However, the PC itself is copied from ttmp[0:1], which contains the wave's PC as well as the saved MSBs. The debugger reads the PC from the save area and is confused when non-0 values from VGPR_MSBs are present. This patch fixes this by saving the PC in the save area's PC slot, not the composite of the PC and VGPR_MSBs. On restore, the VGPR_MSBs are restored from sq_wave_mode. Signed-off-by: Lancelot Six <lancelot.six@amd.com> Tested-by: Alexey Kondratiev <Alexey.Kondratiev@amd.com> Reviewed-by: Jay Cornwall <jay.cornwall@amd.com> Cc: Vladimir Indic <vladimir.indic@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amd/pm: Correct comment above power2_cap attributes  Timur Kristóf
Previously only Van Gogh supported this, but that is not true anymore since: commit 12c958d1db36 ("drm/amd/pm: Expose ppt1 limit for gc_v9_5_0") Update the comment to reflect that. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amd/pm: Return -EOPNOTSUPP when can't read power limit  Timur Kristóf
So that hwmon_attributes_visible() will see that the power2_cap attributes should not be visible on GPUs that don't support the get_power_limit() function. This fixes an error when running the "sensors" command on SI. Fixes: 12c958d1db36 ("drm/amd/pm: Expose ppt1 limit for gc_v9_5_0") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
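A rough sketch of the visibility logic, under stated assumptions: `struct pm_funcs`, `read_power_limit`, `power_cap_attr_mode` and `example_get_power_limit` are hypothetical names, and the 0444 mode is only a placeholder. The point is that a missing callback yields -EOPNOTSUPP, which the visibility check turns into mode 0 (attribute hidden).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-GPU power management callbacks. */
struct pm_funcs {
	int (*get_power_limit)(unsigned int *limit_watts);
};

/* Report -EOPNOTSUPP when the GPU has no way to read the limit. */
int read_power_limit(const struct pm_funcs *funcs, unsigned int *limit_watts)
{
	if (!funcs || !funcs->get_power_limit)
		return -EOPNOTSUPP;
	return funcs->get_power_limit(limit_watts);
}

/* Mode 0 hides the attribute; any other mode exposes it. */
unsigned int power_cap_attr_mode(const struct pm_funcs *funcs)
{
	unsigned int limit_watts = 0;

	if (read_power_limit(funcs, &limit_watts) == -EOPNOTSUPP)
		return 0;
	return 0444;	/* placeholder permissions for the sketch */
}

/* Example callback for a GPU that does support reading the limit. */
int example_get_power_limit(unsigned int *limit_watts)
{
	*limit_watts = 32;
	return 0;
}
```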
2026-01-21  drm/amd/pm: Workaround SI powertune issue on Radeon 430 (v2)  Timur Kristóf
Radeon 430 and 520 are OEM GPUs from 2016~2017. They have the same device id: 0x6611 and revision: 0x87. On the Radeon 430, powertune is buggy and throttles the GPU, never allowing it to reach its maximum SCLK. Work around this bug by raising the TDP limits we program to the SMC from 24W (specified by the VBIOS on Radeon 430) to 32W. Disabling powertune entirely is not a viable workaround, because it causes the Radeon 520 to heat up above 100 C, which I prefer to avoid. Additionally, revise the maximum SCLK limit. Considering the above issue, these GPUs never reached a high SCLK on Linux, and the workarounds were added before the GPUs were released, so the workaround likely didn't target these specifically. Use 780 MHz (the maximum SCLK according to the VBIOS on the Radeon 430). Note that the Radeon 520 VBIOS has a higher maximum SCLK: 905 MHz, but in practice it doesn't seem to perform better with the higher clock, only heats up more. v2: Move the workaround to si_populate_smc_tdp_limits. Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amd/pm: Don't clear SI SMC table when setting power limit  Timur Kristóf
There is no reason to clear the SMC table. We also don't need to recalculate the power limit then. Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amdkfd: gfx12.1 trap handler support for expert scheduling mode  Jay Cornwall
- Leave DEP_MODE unchanged as it is ignored in the trap handler - Save/restore SCHED_MODE (gfx12.0 saves in ttmp11) Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Lancelot Six <lancelot.six@amd.com> Cc: Vladimir Indic <vladimir.indic@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amdkfd: gfx12.1 cluster barrier context save workaround  Jay Cornwall
Trap cluster barrier may not serialize with user cluster barrier under some circumstances. Add a check for pending user cluster barrier complete. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Tested-by: Gang Ba <Gang.Ba@amd.com> Cc: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Lancelot Six <lancelot.six@amd.com> Cc: Vladimir Indic <vladimir.indic@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amdkfd: Fix scalar load ordering in gfx12.1 trap handler  Jay Cornwall
Scalar loads may arrive out-of-order with respect to KMCNT. The affected code expects the two loads to arrive in-order. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Lancelot Six <lancelot.six@amd.com> Cc: Joseph Greathouse <joseph.greathouse@amd.com> Cc: Vladimir Indic <vladimir.indic@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amdkfd: Sync trap handler binary with source  Jay Cornwall
Binary and source desynced during branch activity. Source merge also introduced compile error. Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Lancelot Six <lancelot.six@amd.com> Cc: Vladimir Indic <vladimir.indic@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amdgpu/vcn5.0.1: rework reset handling  Jesse.Zhang
Resetting VCN resets the entire tile, including jpeg. When resetting the VCN, we need to ensure that JPEG data blocks are accessible and we also need to handle the JPEG queue. Add a helper function to restore the JPEG queue during the VCN reset. v2: split the jpeg helper in two, in the top helper we can stop the sched workqueues and attempt to wait for any outstanding fences. Then in the bottom helper, we can force completion, re-init the rings, and restart the sched workqueues (Alex) v3: merge patches 4 and 5 into one patch (Alex) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amdgpu/vcn4.0.3: rework reset handling  Jesse.Zhang
Resetting VCN resets the entire tile, including jpeg. When resetting the VCN, we need to ensure that JPEG data blocks are accessible and we also need to handle the JPEG queue. Add a helper function to restore the JPEG queue during the VCN reset. v2: split the jpeg helper in two, in the top helper we can stop the sched workqueues and attempt to wait for any outstanding fences. Then in the bottom helper, we can force completion, re-init the rings, and restart the sched workqueues (Alex) v3: merge patches 1 and 2 into one patch (Alex) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amdgpu/vcn4.0.3: implement DPG pause mode handling for VCN 4.0.3  Jesse.Zhang
For MI projects, when Dynamic Power Gating (DPG) is enabled, VCN reset operations should be performed with DPG in pause mode. Otherwise, the hardware may perform undesirable reset operations. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/amd/pm: Fix si_dpm mmCG_THERMAL_INT setting  Timur Kristóf
Use WREG32 to write mmCG_THERMAL_INT. This is a direct access register. Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-21  drm/bridge: fix kdoc syntax  Luca Ceresoli
Use the correct kdoc syntax for bullet list. Fixes kdoc error and warning: Documentation/gpu/drm-kms-helpers:197: ./drivers/gpu/drm/drm_bridge.c:1519: ERROR: Unexpected indentation. [docutils] Documentation/gpu/drm-kms-helpers:197: ./drivers/gpu/drm/drm_bridge.c:1521: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils] Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512302319.1PGGt3CN-lkp@intel.com/ Fixes: 9da0e06abda8 ("drm/bridge: deprecate of_drm_find_bridge()") Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://patch.msgid.link/20251231-drm-bridge-alloc-getput-drm_of_find_bridge-kdoc-fix-v1-1-193a03f0609c@bootlin.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
2026-01-21  drm/xe: Update wedged.mode only after successful reset policy change  Lukasz Laguna
Previously, the driver's internal wedged.mode state was updated without verifying whether the corresponding engine reset policy update in GuC succeeded. This could leave the driver reporting a wedged.mode state that doesn't match the actual reset behavior programmed in GuC. With this change, the reset policy is updated first, and the driver's wedged.mode state is modified only if the policy update succeeds on all available GTs. This patch also introduces two functional improvements: - The policy is sent to GuC only when a change is required. An update is needed only when entering or leaving XE_WEDGED_MODE_UPON_ANY_HANG, because only in that case the reset policy changes. For example, switching between XE_WEDGED_MODE_UPON_CRITICAL_ERROR and XE_WEDGED_MODE_NEVER doesn't affect the reset policy, so there is no need to send the same value to GuC. - An inconsistent_reset flag is added to track cases where reset policy update succeeds only on a subset of GTs. If such inconsistency is detected, future wedged mode configuration will force a retry of the reset policy update to restore a consistent state across all GTs. Fixes: 6b8ef44cc0a9 ("drm/xe: Introduce the wedged_mode debugfs") Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260107174741.29163-3-lukasz.laguna@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 0f13dead4e0385859f5c9c3625a19df116b389d3) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
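The ordering described in this message (policy first, driver state only on success, with a flag for partial updates) can be sketched as follows. All type and function names here are hypothetical simplifications, not the xe driver's API; the per-GT update is modelled as a callback, and `gt_update_ok` and `gt_update_fail_second` are stub callbacks for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum wedged_mode {
	WEDGED_NEVER,
	WEDGED_UPON_CRITICAL_ERROR,
	WEDGED_UPON_ANY_HANG,
};

struct device_state {
	enum wedged_mode mode;
	bool inconsistent_reset;	/* some GTs updated, others not */
};

/* The reset policy only differs when entering or leaving UPON_ANY_HANG. */
static bool needs_policy_update(const struct device_state *dev,
				enum wedged_mode new_mode)
{
	if (dev->inconsistent_reset)
		return true;	/* force a retry to restore consistency */
	return (dev->mode == WEDGED_UPON_ANY_HANG) !=
	       (new_mode == WEDGED_UPON_ANY_HANG);
}

int set_wedged_mode(struct device_state *dev, enum wedged_mode new_mode,
		    int (*update_gt)(size_t gt, enum wedged_mode mode),
		    size_t num_gts)
{
	if (needs_policy_update(dev, new_mode)) {
		for (size_t gt = 0; gt < num_gts; gt++) {
			int ret = update_gt(gt, new_mode);

			if (ret) {
				if (gt > 0)	/* earlier GTs already changed */
					dev->inconsistent_reset = true;
				return ret;	/* dev->mode stays untouched */
			}
		}
		dev->inconsistent_reset = false;
	}
	dev->mode = new_mode;	/* only reached after full success */
	return 0;
}

/* Stub callbacks for illustration. */
int gt_update_ok(size_t gt, enum wedged_mode mode)
{
	(void)gt; (void)mode;
	return 0;
}

int gt_update_fail_second(size_t gt, enum wedged_mode mode)
{
	(void)mode;
	return gt == 1 ? -EIO : 0;
}
```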
2026-01-21  drm/xe/migrate: fix job lock assert  Matthew Auld
We are meant to be checking the user vm for the bind queue, but actually we are checking the migrate vm. For various reasons this is not currently firing but this will likely change in the future. Now that we have the user_vm attached to the bind queue, we can fix this by directly checking that here. Fixes: dba89840a920 ("drm/xe: Add GT TLB invalidation jobs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Arvind Yadav <arvind.yadav@intel.com> Link: https://patch.msgid.link/20260120110609.77958-4-matthew.auld@intel.com (cherry picked from commit 9dd1048bca4fe2aa67c7a286bafb3947537adedb) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2026-01-21  drm/xe/uapi: disallow bind queue sharing  Matthew Auld
Currently this is very broken if someone attempts to create a bind queue and share it across multiple VMs. For example currently we assume it is safe to acquire the user VM lock to protect some of the bind queue state, but if we allow sharing the bind queue with multiple VMs then this quickly breaks down. To fix this reject using a bind queue with any VM that is not the same VM that was originally passed when creating the bind queue. This is a uAPI change, however this was more of an oversight on kernel side that we didn't reject this, and expectation is that userspace shouldn't be using bind queues in this way, so in theory this change should go unnoticed. Based on a patch from Matt Brost. v2 (Matt B): - Hold the vm lock over queue create, to ensure it can't be closed as we attach the user_vm to the queue. - Make sure we actually check for NULL user_vm in destruction path. v3: - Fix error path handling. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Carl Zhang <carl.zhang@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Arvind Yadav <arvind.yadav@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Link: https://patch.msgid.link/20260120110609.77958-3-matthew.auld@intel.com (cherry picked from commit 9dd08fdecc0c98d6516c2d2d1fa189c1332f8dab) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
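The rejection rule can be sketched in a few lines. The structures and function names below are hypothetical simplifications, not the xe uAPI or its internal types; the sketch only shows the queue remembering the VM it was created with and refusing any other.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for a VM and a bind queue. */
struct vm {
	int id;
};

struct bind_queue {
	const struct vm *user_vm;	/* VM passed at queue creation */
};

void bind_queue_create(struct bind_queue *q, const struct vm *vm)
{
	q->user_vm = vm;
}

/* A bind queue may only be used with the VM it was created for. */
int bind_queue_validate(const struct bind_queue *q, const struct vm *vm)
{
	if (q->user_vm != vm)
		return -EINVAL;
	return 0;
}
```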
2026-01-21  drm: rcar-du: lvds: convert to of_drm_find_and_get_bridge()  Luca Ceresoli
of_drm_find_bridge() is deprecated. Move to its replacement of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it is put when done. Since the companion bridge pointer is used by .atomic_enable, putting its reference in the remove function would be dangerous. Use .destroy to put it on final deallocation. Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-3-v2-6-8d7a3dbacdf4@bootlin.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
2026-01-21  drm/exynos: hdmi: convert to of_drm_find_and_get_bridge()  Luca Ceresoli
of_drm_find_bridge() is deprecated. Move to its replacement of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it is put when done. Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-3-v2-5-8d7a3dbacdf4@bootlin.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
2026-01-21  drm/mediatek: mtk_hdmi*: convert to of_drm_find_and_get_bridge()  Luca Ceresoli
of_drm_find_bridge() is deprecated. Move to its replacement of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it is put when done by using the drm_bridge::next_bridge pointer. Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-3-v2-4-8d7a3dbacdf4@bootlin.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
2026-01-21  drm/imx/dw-hdmi: convert to of_drm_find_and_get_bridge()  Luca Ceresoli
of_drm_find_bridge() is deprecated. Move to its replacement of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it is put when done. Acked-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-3-v2-3-8d7a3dbacdf4@bootlin.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
2026-01-21  drm/meson/dw-hdmi: convert to of_drm_find_and_get_bridge()  Luca Ceresoli
of_drm_find_bridge() is deprecated. Move to its replacement of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it is put when done. dw_hdmi->bridge is used only in dw_hdmi_top_thread_irq(), so in order to avoid potential use-after-free ensure the irq is freed before putting the dw_hdmi->bridge reference. Acked-by: Maxime Ripard <mripard@kernel.org> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-3-v2-2-8d7a3dbacdf4@bootlin.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
2026-01-21  drm/bridge: dw-hdmi: convert to of_drm_find_and_get_bridge()  Luca Ceresoli
of_drm_find_bridge() is deprecated. Move to its replacement of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it is put when done by using the drm_bridge::next_bridge pointer. Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-3-v2-1-8d7a3dbacdf4@bootlin.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
2026-01-21  drm/bridge: simple: add the Algoltek AG6311 DP-to-HDMI bridge  Val Packett
The Algoltek AG6311 is a transparent DisplayPort to HDMI bridge. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Val Packett <val@packett.cool> Link: https://patch.msgid.link/20260120234029.419825-8-val@packett.cool Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2026-01-21  drm/bridge: anx7625: Fix invalid EDID size  Loic Poulain
DRM checks the EDID block count against the allocated size in the drm_edid_valid function. We have to allocate the right EDID size instead of the max size to prevent the EDID from being reported as invalid. Cc: stable@kernel.org Fixes: 7c585f9a71aa ("drm/bridge: anx7625: use struct drm_edid more") Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Link: https://patch.msgid.link/20251218151307.95491-1-loic.poulain@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
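For context, the "right" EDID size follows from the EDID layout itself: the data consists of 128-byte blocks, and byte 126 of the base block holds the number of extension blocks. The sketch below computes the size from that field; whether the anx7625 fix derives the size in exactly this way is an assumption, and `edid_total_size` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

#define EDID_BLOCK_SIZE		128	/* bytes per EDID block */
#define EDID_EXT_COUNT_OFFSET	126	/* extension count in the base block */

/* Total EDID size: the base block plus the advertised extension blocks. */
size_t edid_total_size(const uint8_t *base_block)
{
	size_t blocks = (size_t)base_block[EDID_EXT_COUNT_OFFSET] + 1;

	return blocks * EDID_BLOCK_SIZE;
}
```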
2026-01-21  drm/tests/drm_buddy: Add tests for allocations exceeding max_order  Sanjay Yadav
Add kunit tests that exercise edge cases where allocation requests exceed mm->max_order after rounding. This can happen with non-power-of-two VRAM sizes when the allocator rounds up requests. For example, with 10G VRAM (8G + 2G roots), mm->max_order represents the 8G block. A 9G allocation can round up to 16G in multiple ways: CONTIGUOUS allocation rounds to next power-of-two, or non-CONTIGUOUS with 8G min_block_size rounds to next alignment boundary. The test validates CONTIGUOUS and RANGE flag combinations, ensuring that only CONTIGUOUS-alone allocations use try_harder fallback, while other combinations return -EINVAL when rounded size exceeds memory, preventing BUG_ON assertions. Cc: Christian König <christian.koenig@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Link: https://patch.msgid.link/20260108113227.2101872-6-sanjay.kumar.yadav@intel.com
2026-01-21  drm/buddy: Prevent BUG_ON by validating rounded allocation  Sanjay Yadav
When DRM_BUDDY_CONTIGUOUS_ALLOCATION is set, the requested size is rounded up to the next power-of-two via roundup_pow_of_two(). Similarly, for non-contiguous allocations with large min_block_size, the size is aligned up via round_up(). Both operations can produce a rounded size that exceeds mm->size, which later triggers BUG_ON(order > mm->max_order). Example scenarios: - 9G CONTIGUOUS allocation on 10G VRAM memory: roundup_pow_of_two(9G) = 16G > 10G - 9G allocation with 8G min_block_size on 10G VRAM memory: round_up(9G, 8G) = 16G > 10G Fix this by checking the rounded size against mm->size. For non-contiguous or range allocations where size > mm->size is invalid, return -EINVAL immediately. For contiguous allocations without range restrictions, allow the request to fall through to the existing __alloc_contig_try_harder() fallback. This ensures invalid user input returns an error or uses the fallback path instead of hitting BUG_ON. v2: (Matt A) - Add Fixes, Cc stable, and Closes tags for context Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6712 Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") Cc: <stable@vger.kernel.org> # v6.7+ Cc: Christian König <christian.koenig@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Link: https://patch.msgid.link/20260108113227.2101872-5-sanjay.kumar.yadav@intel.com
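The two example scenarios can be reproduced with a small standalone calculation. This is a simplified model of the check, not the drm_buddy code: `classify_alloc`, the `alloc_path` enum and the helper names are hypothetical, and the flags are reduced to two booleans.

```c
#include <stdbool.h>
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)

enum alloc_path {
	ALLOC_NORMAL,		/* rounded size fits, allocate as usual */
	ALLOC_TRY_HARDER,	/* contiguous only: use the fallback path */
	ALLOC_INVALID,		/* would be -EINVAL in the kernel */
};

/* Smallest power of two >= v (v must be non-zero and <= 2^63). */
uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Round v up to the next multiple of align (align must be non-zero). */
uint64_t round_up_to(uint64_t v, uint64_t align)
{
	return (v + align - 1) / align * align;
}

/* Decide what to do once the rounded size is known. */
enum alloc_path classify_alloc(uint64_t mm_size, uint64_t size,
			       uint64_t min_block_size,
			       bool contiguous, bool range)
{
	uint64_t rounded = contiguous ? roundup_pow2(size)
				      : round_up_to(size, min_block_size);

	if (rounded <= mm_size)
		return ALLOC_NORMAL;
	if (contiguous && !range)
		return ALLOC_TRY_HARDER;
	return ALLOC_INVALID;
}
```

With 10G of memory, a 9G contiguous request rounds to 16G and takes the fallback path, while a 9G request with an 8G minimum block size also rounds to 16G and is rejected.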
2026-01-21  drm/atmel-hlcdc: don't reject the commit if the src rect has fractional parts  Ludovic Desroches
Don’t reject the commit when the source rectangle has fractional parts. This can occur due to scaling: drm_atomic_helper_check_plane_state() calls drm_rect_clip_scaled(), which may introduce fractional parts while computing the clipped source rectangle. This does not imply the commit is invalid, so we should accept it instead of discarding it. Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com> Reviewed-by: Manikandan Muralidharan <manikandan.m@microchip.com> Link: https://patch.msgid.link/20251120-lcd_scaling_fix-v1-1-5ffc98557923@microchip.com Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>
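As background, DRM plane source coordinates are 16.16 fixed-point values, so a "fractional part" is any non-zero bit in the low 16 bits. The sketch below shows how such a value can be detected and how the whole-pixel part can be taken. The struct and function names are hypothetical, and the assumption that a driver would simply use the integer part is mine, not something the message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Source rectangle in 16.16 fixed point (hypothetical simplified type). */
struct src_rect_fp {
	uint32_t x, y, w, h;
};

/* True if any coordinate carries a sub-pixel (fractional) component. */
bool src_has_fraction(const struct src_rect_fp *r)
{
	return ((r->x | r->y | r->w | r->h) & 0xffff) != 0;
}

/* Whole-pixel part of a 16.16 fixed-point value. */
uint32_t fp_to_pixels(uint32_t v)
{
	return v >> 16;
}
```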
2026-01-20  watchdog: softlockup: panic when lockup duration exceeds N thresholds  Li RongQing
The softlockup_panic sysctl is currently a binary option: panic immediately or never panic on soft lockups. Panicking on any soft lockup, regardless of duration, can be overly aggressive for brief stalls that may be caused by legitimate operations. Conversely, never panicking may allow severe system hangs to persist undetected. Extend softlockup_panic to accept an integer threshold, allowing the kernel to panic only when the normalized lockup duration exceeds N watchdog threshold periods. This provides finer-grained control to distinguish between transient delays and persistent system failures. The accepted values are: - 0: Don't panic (unchanged) - 1: Panic when duration >= 1 * threshold (20s default, original behavior) - N > 1: Panic when duration >= N * threshold (e.g., 2 = 40s, 3 = 60s.) The original behavior is preserved for values 0 and 1, maintaining full backward compatibility while allowing systems to tolerate brief lockups while still catching severe, persistent hangs. [lirongqing@baidu.com: v2] Link: https://lkml.kernel.org/r/20251218074300.4080-1-lirongqing@baidu.com Link: https://lkml.kernel.org/r/20251216074521.2796-1-lirongqing@baidu.com Signed-off-by: Li RongQing <lirongqing@baidu.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Lance Yang <lance.yang@linux.dev> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
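The accepted values listed above boil down to one comparison. The function below is a hypothetical illustration of that rule, not the watchdog code; durations are plain seconds here.

```c
#include <stdbool.h>

/*
 * softlockup_panic == 0: never panic.
 * softlockup_panic == N: panic once the lockup has lasted at least
 * N watchdog threshold periods.
 */
bool softlockup_should_panic(unsigned int softlockup_panic,
			     unsigned int duration_s,
			     unsigned int threshold_s)
{
	if (softlockup_panic == 0)
		return false;
	return duration_s >= softlockup_panic * threshold_s;
}
```

With the 20s default threshold, a value of 2 tolerates a 30s stall but panics at 40s.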
2026-01-21  drm/atmel-hlcdc: fix use-after-free of drm_crtc_commit after release  Ludovic Desroches
The atmel_hlcdc_plane_atomic_duplicate_state() callback was copying the atmel_hlcdc_plane state structure without properly duplicating the drm_plane_state. In particular, state->commit remained set to the old state commit, which can lead to a use-after-free in the next drm_atomic_commit() call. Fix this by calling __drm_atomic_helper_plane_duplicate_state(), which correctly clones the base drm_plane_state (including the ->commit pointer). It has been seen when closing and re-opening the device node while another DRM client (e.g. fbdev) is still attached: ============================================================================= BUG kmalloc-64 (Not tainted): Poison overwritten ----------------------------------------------------------------------------- 0xc611b344-0xc611b344 @offset=836. First byte 0x6a instead of 0x6b FIX kmalloc-64: Restoring Poison 0xc611b344-0xc611b344=0x6b Allocated in drm_atomic_helper_setup_commit+0x1e8/0x7bc age=178 cpu=0 pid=29 drm_atomic_helper_setup_commit+0x1e8/0x7bc drm_atomic_helper_commit+0x3c/0x15c drm_atomic_commit+0xc0/0xf4 drm_framebuffer_remove+0x4cc/0x5a8 drm_mode_rmfb_work_fn+0x6c/0x80 process_one_work+0x12c/0x2cc worker_thread+0x2a8/0x400 kthread+0xc0/0xdc ret_from_fork+0x14/0x28 Freed in drm_atomic_helper_commit_hw_done+0x100/0x150 age=8 cpu=0 pid=169 drm_atomic_helper_commit_hw_done+0x100/0x150 drm_atomic_helper_commit_tail+0x64/0x8c commit_tail+0x168/0x18c drm_atomic_helper_commit+0x138/0x15c drm_atomic_commit+0xc0/0xf4 drm_atomic_helper_set_config+0x84/0xb8 drm_mode_setcrtc+0x32c/0x810 drm_ioctl+0x20c/0x488 sys_ioctl+0x14c/0xc20 ret_fast_syscall+0x0/0x54 Slab 0xef8bc360 objects=21 used=16 fp=0xc611b7c0 flags=0x200(workingset|zone=0) Object 0xc611b340 @offset=832 fp=0xc611b7c0 Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com> Reviewed-by: Manikandan Muralidharan <manikandan.m@microchip.com> Link: https://patch.msgid.link/20251024-lcd_fixes_mainlining-v1-2-79b615130dc3@microchip.com Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>
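The hazard with a plain structure copy can be shown with a toy model. The types below are hypothetical stand-ins, not the DRM structures, and the sketch assumes that a correct duplicate must not carry over the old state's commit pointer; how the real helper handles each field is not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a commit object and a plane state. */
struct commit {
	int refcount;
};

struct plane_state {
	struct commit *commit;	/* owned by the state it was set up for */
	int fb_id;
};

/*
 * Duplicate a state for the next atomic update. Ordinary fields are
 * copied, but the commit pointer is reset: keeping it would leave the
 * new state pointing at an object that is freed with the old state.
 */
void plane_state_duplicate(const struct plane_state *old,
			   struct plane_state *new_state)
{
	memcpy(new_state, old, sizeof(*new_state));
	new_state->commit = NULL;	/* assumption: never shared with the copy */
}
```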
2026-01-21  drm/atmel-hlcdc: fix memory leak from the atomic_destroy_state callback  Ludovic Desroches
After several commits, the slab memory increases. Some drm_crtc_commit objects are not freed. The atomic_destroy_state callback only put the framebuffer. Use the __drm_atomic_helper_plane_destroy_state() function to put all the objects that are no longer needed. It has been seen after hours of usage of a graphics application or using kmemleak: unreferenced object 0xc63a6580 (size 64): comm "egt_basic", pid 171, jiffies 4294940784 hex dump (first 32 bytes): 40 50 34 c5 01 00 00 00 ff ff ff ff 8c 65 3a c6 @P4..........e:. 8c 65 3a c6 ff ff ff ff 98 65 3a c6 98 65 3a c6 .e:......e:..e:. backtrace (crc c25aa925): kmemleak_alloc+0x34/0x3c __kmalloc_cache_noprof+0x150/0x1a4 drm_atomic_helper_setup_commit+0x1e8/0x7bc drm_atomic_helper_commit+0x3c/0x15c drm_atomic_commit+0xc0/0xf4 drm_atomic_helper_set_config+0x84/0xb8 drm_mode_setcrtc+0x32c/0x810 drm_ioctl+0x20c/0x488 sys_ioctl+0x14c/0xc20 ret_fast_syscall+0x0/0x54 Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com> Reviewed-by: Manikandan Muralidharan <manikandan.m@microchip.com> Link: https://patch.msgid.link/20251024-lcd_fixes_mainlining-v1-1-79b615130dc3@microchip.com Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>
2026-01-20  drm/amd/display: Only poll analog connectors  Timur Kristóf
Analog connectors may be hot-plugged unlike other connector types that don't support HPD. Stop DRM from polling other connector types that don't support HPD, such as eDP, LVDS, etc. These were wrongly polled when analog connector support was added, causing issues with the seamless boot process. Fixes: c4f3f114e73c ("drm/amd/display: Poll analog connectors (v3)") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e924c7004b08e4e173782bad60b27841d889e371)
2026-01-20  drm/amdgpu: fix error handling in ib_schedule()  Alex Deucher
If fence emit fails, free the fence if necessary. Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5eb680a06007f2f6ea333d11a4e29039da90614b)
2026-01-20  drm/amdkfd: fix gfx11 restrictions on debugging cooperative launch  Jonathan Kim
Restrictions on debugging cooperative launch for GFX11 devices should align to CWSR work around requirements. i.e. devices without the need for the work around should not be subject to such restrictions. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: James Zhu <james.zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 230ef3977d6ffdd498ffa9baa6f5a061786189bf)
2026-01-20  drm/amdgpu: free hw_vm_fence when fail in amdgpu_job_alloc  Jiqian Chen
If drm_sched_job_init fails, hw_vm_fence is currently not freed, which causes a memory leak. Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling") Link: https://lore.kernel.org/amd-gfx/a5a828cb-0e4a-41f0-94c3-df31e5ddad52@amd.com/T/#t Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com> Reviewed-by: Amos Kong <kongjianjun@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5d42ee457ccd1fb5da4c7f817825b2806ec36956)
2026-01-20drm/amdgpu: remove frame cntl for gfx v12Likun Gao
Remove the emit_frame_cntl function for gfx v12, which is not supported. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5aaa5058dec5bfdcb24c42fe17ad91565a3037ca) Cc: stable@vger.kernel.org
2026-01-21drm/msm/dp: Avoid division by zero in msm_dp_ctrl_config_msa()Nathan Chancellor
An (admittedly problematic) optimization change in LLVM 20 [1] turns known division by zero into the equivalent of __builtin_unreachable(), which invokes undefined behavior if it is encountered in a control flow graph, destroying code generation. When compile testing for x86_64, objtool flags an instance of this optimization triggering in msm_dp_ctrl_config_msa(), inlined into msm_dp_ctrl_on_stream(): drivers/gpu/drm/msm/msm.o: warning: objtool: msm_dp_ctrl_on_stream(): unexpected end of section .text.msm_dp_ctrl_on_stream The zero division happens if the else branch in the first if statement in msm_dp_ctrl_config_msa() is taken because pixel_div is initialized to zero and it is not possible for LLVM to eliminate the else branch since rate is still not known after inlining into msm_dp_ctrl_on_stream(). Transform the if statements into a switch statement with a default case with the existing error print and an early return to avoid the invalid division. Add a comment to note this helps the compiler, even though the case is known to be unreachable. With this, pixel_div's default zero initialization can be dropped, as it is dead with this change. Fixes: c943b4948b58 ("drm/msm/dp: add displayPort driver support") Link: https://github.com/llvm/llvm-project/commit/37932643abab699e8bb1def08b7eb4eae7ff1448 [1] Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601081959.9UVJEOfP-lkp@intel.com/ Suggested-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/698355/ Link: https://lore.kernel.org/r/20260113-drm-msm-dp_ctrl-avoid-zero-div-v2-1-f1aa67bf6e8e@kernel.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
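The shape of the change described above can be sketched in plain C. This is only an illustration of the pattern (a switch with a default case that prints an error and returns early, so no path reaches the division with a zero divisor). The function name scaled_clock(), the RATE_* constants and the divider values are hypothetical and are not taken from the driver.

```c
#include <stdio.h>

/* Hypothetical link rates, chosen only for illustration. */
enum {
	RATE_A = 162000,
	RATE_B = 270000,
	RATE_C = 540000,
	RATE_D = 810000,
};

/*
 * Hypothetical helper: divides pixel_clk by a divider selected from rate.
 * Returns 0 and stores the quotient in *out on success, or -1 when the
 * rate is not recognised (in which case *out is left untouched).
 */
static int scaled_clock(unsigned int rate, unsigned int pixel_clk,
			unsigned int *out)
{
	unsigned int pixel_div; /* no zero initialization needed any more */

	switch (rate) {
	case RATE_D:
		pixel_div = 6;
		break;
	case RATE_A:
	case RATE_B:
		pixel_div = 2;
		break;
	case RATE_C:
		pixel_div = 4;
		break;
	default:
		/*
		 * Expected to be unreachable, but returning here means the
		 * compiler never sees a path that divides by zero below.
		 */
		fprintf(stderr, "Invalid pixel mux divider\n");
		return -1;
	}

	*out = pixel_clk / pixel_div;
	return 0;
}
```

Because every case either assigns a non-zero divider or returns, the divider is always assigned before use, which is why the zero initialization becomes dead.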
2026-01-21drm/msm/dpu: try reserving the DSPP-less LM firstDmitry Baryshkov
On most platforms only some mixers have connected DSPP blocks. If DSPP is not required for the CRTC, try looking for an LM with no DSPP block, leaving DSPP-enabled LMs to CRTCs which actually require those. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/698773/ Link: https://lore.kernel.org/r/20260115-dpu-fix-dspp-v1-2-b73152c147b3@oss.qualcomm.com
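The allocation preference described above can be sketched as a two-pass search. This is a simplified model, not the resource manager code: struct mixer and pick_mixer() are hypothetical names, and falling back to a DSPP-enabled mixer when no DSPP-less one is free is an assumption read from the word "try" in the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a layer mixer (LM). */
struct mixer {
	bool reserved;  /* already assigned to some CRTC */
	bool has_dspp;  /* a DSPP block is connected to this mixer */
};

/*
 * Returns the index of the mixer to reserve, or -1 if none fits.
 * When the CRTC does not need DSPP, mixers without DSPP are tried first
 * so that DSPP-enabled mixers stay available for CRTCs that need them.
 */
static int pick_mixer(const struct mixer *lm, size_t n, bool need_dspp)
{
	size_t i;

	/* Pass 1: DSPP not needed, so prefer a free DSPP-less mixer. */
	if (!need_dspp) {
		for (i = 0; i < n; i++) {
			if (!lm[i].reserved && !lm[i].has_dspp)
				return (int)i;
		}
	}

	/* Pass 2: any free mixer that satisfies the DSPP requirement. */
	for (i = 0; i < n; i++) {
		if (lm[i].reserved)
			continue;
		if (need_dspp && !lm[i].has_dspp)
			continue;
		return (int)i;
	}

	return -1;
}
```

With a single first-fit pass, a CRTC that does not need DSPP could take the only DSPP-enabled mixer and leave a later CRTC that does need it without one; the first pass avoids that case.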
2026-01-21drm/msm/dpu: correct error messages in RMDmitry Baryshkov
Some of the error messages in RM reference the block index, while others print the enum value (which is shifted by 1), not to mention that some of the messages are misleading. Reformat the messages, making them clearer and always printing the hardware block name. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/698774/ Link: https://lore.kernel.org/r/20260115-dpu-fix-dspp-v1-1-b73152c147b3@oss.qualcomm.com
2026-01-21drm/msm/dpu: Add support for Kaanapali DPUYuanjie Yang
Add support for Display Processing Unit (DPU) version 13.0 on the Kaanapali platform. This version introduces changes to the SSPP sub-block structure. Add common block and rectangle blocks to accommodate these structural modifications for compatibility. Co-developed-by: Yongxing Mou <yongxing.mou@oss.qualcomm.com> Signed-off-by: Yongxing Mou <yongxing.mou@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Yuanjie Yang <yuanjie.yang@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/698716/ Link: https://lore.kernel.org/r/20260115092749.533-13-yuanjie.yang@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2026-01-21drm/msm/dpu: Add Kaanapali WB supportYuanjie Yang
Add support for Kaanapali WB, which introduces register relocations. Use the updated register definitions to ensure compatibility. Co-developed-by: Yongxing Mou <yongxing.mou@oss.qualcomm.com> Signed-off-by: Yongxing Mou <yongxing.mou@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Yuanjie Yang <yuanjie.yang@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/698715/ Link: https://lore.kernel.org/r/20260115092749.533-12-yuanjie.yang@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2026-01-21drm/msm/dpu: Add Kaanapali SSPP sub-block supportYuanjie Yang
Add support for Kaanapali platform SSPP sub-blocks, which introduce structural changes including register additions, removals, and relocations. Add the new common and rectangle blocks, and update register definitions and handling to ensure compatibility with DPU v13.0. Co-developed-by: Yongxing Mou <yongxing.mou@oss.qualcomm.com> Signed-off-by: Yongxing Mou <yongxing.mou@oss.qualcomm.com> Signed-off-by: Yuanjie Yang <yuanjie.yang@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Patchwork: https://patchwork.freedesktop.org/patch/698712/ Link: https://lore.kernel.org/r/20260115092749.533-11-yuanjie.yang@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>