summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)Author
4 daysdrm/amdgpu: apply state adjust rules to some additional HAINAN vairantsAlex Deucher
They need a similar workaround. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1839 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0de31d92a173d3d94f28051b0b80a6c98913aed4) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu: rework how we handle TLB fencesAlex Deucher
Add a new VM flag to indicate whether or not we need a TLB fence. Userqs (KFD or KGD) require a TLB fence. A TLB fence is not strictly required for kernel queues, but it shouldn't hurt. That said, enabling this unconditionally should be fine, but it seems to tickle some issues in KIQ/MES. Only enable them for KFD, or when KGD userq queues are enabled (currently via module parameter). Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4798 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4749 Fixes: f3854e04b708 ("drm/amdgpu: attach tlb fence to the PTs update") Cc: Christian König <christian.koenig@amd.com> Cc: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 69c5fbd2b93b5ced77c6e79afe83371bca84c788) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu: Fix ISP segfault issue in kernel v7.0Pratap Nirujogi
Add NULL pointer checks for dev->type before accessing dev->type->name in ISP genpd add/remove functions to prevent kernel crashes. This regression was introduced in v7.0 as the wakeup sources are registered using physical device instead of ACPI device. This led to adding wakeup source device as the first child of AMDGPU device without initializing dev-type variable, and resulted in segfault when accessed it in the amdgpu isp driver. Fixes: 057edc58aa59 ("ACPI: PM: Register wakeup sources under physical devices") Suggested-by: Bin Du <Bin.Du@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c51632d1ed7ac5aed2d40dbc0718d75342c12c6a)
5 daysdrm/amdgpu/gmc9.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Cc: Benjamin Cheng <benjamin.cheng@amd.com> Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e14d468304832bcc4a082d95849bc0a41b18ddea) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub4.2.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dea5f235baf3786bfd4fd920b03c19285fdc3d9f) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub4.1.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 04f063d85090f5dd0c671010ce88ee49d9dcc8ed) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub3.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f14f27bbe2a3ed7af32d5f6eaf3f417139f45253) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub3.0.2: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1441f52c7f6ae6553664aa9e3e4562f6fc2fe8ea) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub3.0.1: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5f76083183363c4528a4aaa593f5d38c28fe7d7b) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub2.3: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 89cd90375c19fb45138990b70e9f4ba4806f05c4) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub2.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e064cef4b53552602bb6ac90399c18f662f3cacd) Cc: stable@vger.kernel.org
5 daysdrm/amd: fix dcn 2.01 checkAndy Nguyen
The ASICREV_IS_BEIGE_GOBY_P check always took precedence, because it includes all chip revisions upto NV_UNKNOWN. Fixes: 54b822b3eac3 ("drm/amd/display: Use dce_version instead of chip_id") Signed-off-by: Andy Nguyen <theofficialflow1996@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9c7be0efa6f0daa949a5f3e3fdf9ea090b0713cb)
5 daysdrm/amd/display: Fix DisplayID not-found handling in parse_edid_displayid_vrr()Srinivasan Shanmugam
parse_edid_displayid_vrr() searches the EDID extension blocks for a DisplayID extension before parsing the dynamic video timing range. The code previously checked whether edid_ext was NULL after the search loop. However, edid_ext is assigned during each iteration of the loop, so it will never be NULL once the loop has executed. If no DisplayID extension is found, edid_ext ends up pointing to the last extension block, and the NULL check does not correctly detect the failure case. Instead, check whether the loop completed without finding a matching DisplayID block by testing "i == edid->extensions". This ensures the function exits early when no DisplayID extension is present and avoids parsing an unrelated EDID extension block. Also simplify the EDID validation check using "!edid || !edid->extensions". Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:13079 parse_edid_displayid_vrr() warn: variable dereferenced before check 'edid_ext' (see line 13075) Fixes: a638b837d0e6 ("drm/amd/display: Fix refresh rate range for some panel") Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Jerry Zuo <jerry.zuo@amd.com> Cc: Sun peng Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91c7e6342e98c846b259c57273436fdea4c043f2)
5 daysdrm/amd/display: Wrap dcn32_override_min_req_memclk() in DC_FP_{START, END}Xi Ruoyao
[Why] The dcn32_override_min_req_memclk function is in dcn32_fpu.c, which is compiled with CC_FLAGS_FPU into FP instructions. So when we call it we must use DC_FP_{START,END} to save and restore the FP context, and prepare the FP unit on architectures like LoongArch where the FP unit isn't always on. Reported-by: LiarOnce <liaronce@hotmail.com> Fixes: ee7be8f3de1c ("drm/amd/display: Limit DCN32 8 channel or less parts to DPM1 for FPO") Signed-off-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 25bb1d54ba3983c064361033a8ec15474fece37e) Cc: stable@vger.kernel.org
5 daysdrm/amd/display: Fix uninitialized variable use which breaks full LTOCalvin Owens
Commit e1b385726f7f ("drm/amd/display: Add additional checks for PSP footer size") introduced a use of an uninitialized stack variable in dm_dmub_sw_init() (region_params.bss_data_size). Interestingly, this seems to cause no issue on normal kernels. But when full LTO is enabled, it causes the compiler to "optimize" out huge swaths of amdgpu initialization code, and the driver is unusable: amdgpu 0000:03:00.0: [drm] Loading DMUB firmware via PSP: version=0x07002F00 amdgpu 0000:03:00.0: sw_init of IP block <dm> failed 5 amdgpu 0000:03:00.0: amdgpu_device_ip_init failed amdgpu 0000:03:00.0: Fatal error during GPU init It surprises me that neither gcc nor clang emit a warning about this: I only found it by bisecting the LTO breakage. Fix by using the bss_data_size field from fw_meta_info_params, as was presumably intended. Fixes: e1b385726f7f ("drm/amd/display: Add additional checks for PSP footer size") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b7f1402f6ad24cc6b9a01fa09ebd1c6559d787d0)
5 daysdrm/amdgpu: Limit BO list entry count to prevent resource exhaustionJesse.Zhang
Userspace can pass an arbitrary number of BO list entries via the bo_number field. Although the previous multiplication overflow check prevents out-of-bounds allocation, a large number of entries could still cause excessive memory allocation (up to potentially gigabytes) and unnecessarily long list processing times. Introduce a hard limit of 128k entries per BO list, which is more than sufficient for any realistic use case (e.g., a single list containing all buffers in a large scene). This prevents memory exhaustion attacks and ensures predictable performance. Return -EINVAL if the requested entry count exceeds the limit Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 688b87d39e0aa8135105b40dc167d74b5ada5332) Cc: stable@vger.kernel.org
5 daysdrm/amd/display: Fix gamma 2.2 colorop TFsAlex Hung
Use GAMMA22 for degamma/blend and GAMMA22_INV for shaper so curves match the color pipeline. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5016 Tested-by: Xaver Hugl <xaver.hugl@kde.org> Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d8f9f42effd767ffa7bbcd7e05fbd6b20737e468)
11 daysdrm/amd: Set num IP blocks to 0 if discovery failsMario Limonciello
If discovery has failed for any reason (such as no support for a block) then there is no need to unwind all the IP blocks in fini. In this condition there can actually be failures during the unwind too. Reset num_ip_blocks to zero during failure path and skip the unnecessary cleanup path. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fae5984296b981c8cc3acca35b701c1f332a6cd8) Cc: stable@vger.kernel.org
11 daysdrm/amdkfd: Unreserve bo if queue update failedPhilip Yang
Error handling path should unreserve bo then return failed. Fixes: 305cd109b761 ("drm/amdkfd: Validate user queue update") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c24afed7de9ecce341825d8ab55a43a254348b33)
11 daysdrm/amd/display: Check for S0i3 to be done before DCCG init on DCN21Ivan Lipski
[WHY] On DCN21, dccg2_init() is called in dcn10_init_hw() before bios_golden_init(). During S0i3 resume, BIOS sets MICROSECOND_TIME_BASE_DIV to 0x00120464 as a marker. dccg2_init() overwrites this to 0x00120264, causing dcn21_s0i3_golden_init_wa() to misdetect the state and skip golden init. Eventually during the resume sequence, a flip timeout occurs. [HOW] Skip DCCG on dccg2_is_s0i3_golden_init_wa_done() on DCN21. Fixes: 4c595e75110e ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c61eda434336cf2c033aa35efdc9a08b31d2fdfa)
11 daysdrm/amd/display: Add missing DCCG register entries for DCN20-DCN316Ivan Lipski
Commit 4c595e75110e ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") moved register writes from hwseq to dccg2_*() functions but did not add the registers to the DCCG register list macros. The struct fields default to 0, so REG_WRITE() targets MMIO offset 0, causing a GPU hang on resume (seen on DCN21/DCN30 during IGT kms_cursor_crc@cursor-suspend). Add - MICROSECOND_TIME_BASE_DIV - MILLISECOND_TIME_BASE_DIV - DCCG_GATE_DISABLE_CNTL - DCCG_GATE_DISABLE_CNTL2 - DC_MEM_GLOBAL_PWR_REQ_CNTL to macros in dcn20_dccg.h, dcn301_dccg.h, dcn31_dccg.h, and dcn314_dccg.h. Fixes: 4c595e75110e ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") Reported-by: Rafael Passos <rafael@rcpassos.me> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e6e2b956fc814de766d3480be7018297c41d3ce0)
2026-03-06drm/amd: Fix a few more NULL pointer dereference in device cleanupMario Limonciello
I found a few more paths that cleanup fails due to a NULL version pointer on unsupported hardware. Add NULL checks as applicable. Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f5a05f8414fc10f307eb965f303580c7778f8dd2) Cc: stable@vger.kernel.org
2026-03-06drm/amdgpu: fix gpu idle power consumption issue for gfx v12Yang Wang
Older versions of the MES firmware may cause abnormal GPU power consumption. When performing inference tasks on the GPU (e.g., with Ollama using ROCm), the GPU may show abnormal power consumption in idle state and incorrect GPU load information. This issue has been fixed in firmware version 0x8b and newer. Closes: https://github.com/ROCm/ROCm/issues/5706 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4e22a5fe6ea6e0b057e7f246df4ac3ff8bfbc46a)
2026-03-06drm/amdgpu: Fix kernel-doc comments for some LUT propertiesCristian Ciocaltea
The following members of struct amdgpu_mode_info do not have valid references in the related kernel-doc sections: - plane_shaper_lut_property - plane_shaper_lut_size_property, - plane_lut3d_size_property Correct all affected comment blocks. Fixes: f545d82479b4 ("drm/amd/display: add plane shaper LUT and TF driver-specific properties") Fixes: 671994e3bf33 ("drm/amd/display: add plane 3D LUT driver-specific properties") Reviewed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ec5708d6e547f7efe2f009073bfa98dbc4c5c2ac)
2026-03-06drm/amd: Fix NULL pointer dereference in device cleanupMario Limonciello
When GPU initialization fails due to an unsupported HW block IP blocks may have a NULL version pointer. During cleanup in amdgpu_device_fini_hw, the code calls amdgpu_device_set_pg_state and amdgpu_device_set_cg_state which iterate over all IP blocks and access adev->ip_blocks[i].version without NULL checks, leading to a kernel NULL pointer dereference. Add NULL checks for adev->ip_blocks[i].version in both amdgpu_device_set_cg_state and amdgpu_device_set_pg_state to prevent dereferencing NULL pointers during GPU teardown when initialization has failed. Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b7ac77468cda92eecae560b05f62f997a12fe2f2) Cc: stable@vger.kernel.org
2026-03-06drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14Yang Wang
add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14.0.2/14.0.3 Fixes: 9710b84e2a6a ("drm/amd/pm: add overdrive support on smu v14.0.2/3") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1b5cf07d80bb16d1593579ccdb23f08ea4262c14)
2026-03-06drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13Yang Wang
add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13.0.0/13.0.7 Fixes: cfffd980bf21 ("drm/amd/pm: add zero RPM OD setting support for SMU13") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 576a10797b607ee9e4068218daf367b481564120)
2026-03-04drm/amdgpu/userq: refcount userqueues to avoid any race conditionsSunil Khatri
To avoid race condition and avoid UAF cases, implement kref based queues and protect the below operations using xa lock a. Getting a queue from xarray b. Increment/Decrement it's refcount Every time some one want to access a queue, always get via amdgpu_userq_get to make sure we have locks in place and get the object if active. A userqueue is destroyed on the last refcount is dropped which typically would be via IOCTL or during fini. v2: Add the missing drop in one the condition in the signal ioclt [Alex] v3: remove the queue from the xarray first in the free queue ioctl path [Christian] - Pass queue to the amdgpu_userq_put directly. - make amdgpu_userq_put xa_lock free since we are doing put for each get only and final put is done via destroy and we remove the queue from xa with lock. - use userq_put in fini too so cleanup is done fully. v4: Use xa_erase directly rather than doing load and erase in free ioctl. Also remove some of the error logs which could be exploited by the user to flood the logs [Christian] Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4952189b284d4d847f92636bb42dd747747129c0) Cc: <stable@vger.kernel.org> # 048c1c4e5171: drm/amdgpu/userq: Consolidate wait ioctl exit path Cc: <stable@vger.kernel.org>
2026-03-04drm/amdgpu/userq: Consolidate wait ioctl exit pathTvrtko Ursulin
If we gate the fence destruction with a check telling us whether there are valid pointers in there we can eliminate the need for dual, basically identical, exit paths. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bea29bb0dd29012949cd44fdb122465a9fd5cf91)
2026-03-04drm/amdgpu/psp: Use Indirect access address for GFX to PSP mailboxsguttula
The reason the RAP is not granting access to 0x58200 is that a dedicated RSMU slot would have to be spent for this address range, and MPASP is close to running out of RSMU slots. This will help to fix PSP TOC load failure during secureboot. GFX Driver Need to use indirect access for SMN address regs. Signed-off-by: sguttula <suresh.guttula@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9b822e26eea3899003aa8a89d5e2c4408e066e20)
2026-03-04drm/amdgpu: Fix use-after-free race in VM acquireAlysa Liu
Replace non-atomic vm->process_info assignment with cmpxchg() to prevent race when parent/child processes sharing a drm_file both try to acquire the same VM after fork(). Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alysa Liu <Alysa.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c7c573275ec20db05be769288a3e3bb2250ec618) Cc: stable@vger.kernel.org
2026-03-04drm/amd/pm: remove invalid gpu_metrics.energy_accumulator on smu v13.0.xYang Wang
v1: The metrics->EnergyAccumulator field has been deprecated on newer pmfw. v2: add smu 13.0.0/13.0.7/13.0.10 support. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8de9edb35976fa56565dc8fbb5d1310e8e10187c) Cc: stable@vger.kernel.org
2026-03-02drm/amd/display: Fallback to boot snapshot for dispclkDillon Varone
[WHY & HOW] If the dentist is unavailable, fallback to reading CLKIP via the boot snapshot to get the current dispclk. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2ab77600d1e55a042c02437326d3c7563e853c6c) Cc: stable@vger.kernel.org
2026-03-02drm/amdgpu: Enable DPG support for VCN5sguttula
This will set DPG flags for enabling power gating on GFX11_5_4 Signed-off-by: sguttula <suresh.guttula@amd.com> Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a503c266d70d3363ba6bffb883cd6ecdb092670c)
2026-03-02drm/amd/display: Enable DEGAMMA and reject COLOR_PIPELINE+DEGAMMA_LUTAlex Hung
[WHAT] Create DEGAMMA properties even if color pipeline is enabled, and enforce the mutual exclusion in atomic check by rejecting any commit that attempts to enable both COLOR_PIPELINE on the plane and DEGAMMA_LUT on the CRTC simultaneously. Fixes: 18a4127e9315 ("drm/amd/display: Disable CRTC degamma when color pipeline is enabled") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4963 Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 196a6aa727f1f15eb54dda5e60a41543ea9397ee)
2026-03-02drm/amd/display: Use mpc.preblend flag to indicate 3D LUTAlex Hung
[WHAT] New ASIC's 3D LUT is indicated by mpc.preblend. Fixes: 0de2b1afea8d ("drm/amd/display: add 3D LUT colorop") Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 43175f6164d32cb96362d16e357689f74298145c)
2026-02-25drm/amd: Disable MES LR compute W/AMario Limonciello
A workaround was introduced in commit 1fb710793ce2 ("drm/amdgpu: Enable MES lr_compute_wa by default") to help with some hangs observed in gfx1151. This WA didn't fully fix the issue. It was actually fixed by adjusting the VGPR size to the correct value that matched the hardware in commit b42f3bf9536c ("drm/amdkfd: bump minimum vgpr size for gfx1151"). There are reports of instability on other products with newer GC microcode versions, and I believe they're caused by this workaround. As we don't need the workaround any more, remove it. Fixes: b42f3bf9536c ("drm/amdkfd: bump minimum vgpr size for gfx1151") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9973e64bd6ee7642860a6f3b6958cbf14e89cabd) Cc: stable@vger.kernel.org
2026-02-25drm/amdgpu: Fix error handling in slot resetLijo Lazar
If the device has not recovered after slot reset is called, it goes to out label for error handling. There it could make decision based on uninitialized hive pointer and could result in accessing an uninitialized list. Initialize the list and hive properly so that it handles the error situation and also releases the reset domain lock which is acquired during error_detected callback. Fixes: 732c6cefc1ec ("drm/amdgpu: Replace tmp_adev with hive in amdgpu_pci_slot_reset") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ce Sun <cesun102@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bb71362182e59caa227e4192da5a612b09349696)
2026-02-25drm/amdgpu/vcn5: Add SMU dpm interface typesguttula
This will set AMDGPU_VCN_SMU_DPM_INTERFACE_* smu_type based on soc type and fixing ring timeout issue seen for DPM enabled case. Signed-off-by: sguttula <suresh.guttula@amd.com> Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f0f23c315b38c55e8ce9484cf59b65811f350630)
2026-02-25drm/amdgpu: Fix locking bugs in error pathsBart Van Assche
Do not unlock psp->ras_context.mutex if it has not been locked. This has been detected by the Clang thread-safety analyzer. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: YiPeng Chai <YiPeng.Chai@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: amd-gfx@lists.freedesktop.org Fixes: b3fb79cda568 ("drm/amdgpu: add mutex to protect ras shared memory") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6fa01b4335978051d2cd80841728fd63cc597970)
2026-02-25drm/amdgpu: Unlock a mutex before destroying itBart Van Assche
Mutexes must be unlocked before these are destroyed. This has been detected by the Clang thread-safety analyzer. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: amd-gfx@lists.freedesktop.org Fixes: f5e4cc8461c4 ("drm/amdgpu: implement RAS ACA driver framework") Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 270258ba320beb99648dceffb67e86ac76786e55)
2026-02-25drm/amd/display: Use GFP_ATOMIC in dc_create_stream_for_sinkNatalie Vock
This can be called while preemption is disabled, for example by dcn32_internal_validate_bw which is called with the FPU active. Fixes "BUG: scheduling while atomic" messages I encounter on my Navi31 machine. Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b42dae2ebc5c84a68de63ec4ffdfec49362d53f1) Cc: stable@vger.kernel.org
2026-02-25drm/amdgpu: add upper bound check on user inputs in wait ioctlSunil Khatri
Huge input values in amdgpu_userq_wait_ioctl can lead to a OOM and could be exploited. So check these input value against AMDGPU_USERQ_MAX_HANDLES which is big enough value for genuine use cases and could potentially avoid OOM. v2: squash in Srini's fix Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fcec012c664247531aed3e662f4280ff804d1476) Cc: stable@vger.kernel.org
2026-02-25drm/amdgpu: add upper bound check on user inputs in signal ioctlSunil Khatri
Huge input values in amdgpu_userq_signal_ioctl can lead to a OOM and could be exploited. So check these input value against AMDGPU_USERQ_MAX_HANDLES which is big enough value for genuine use cases and could potentially avoid OOM. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit be267e15f99bc97cbe202cd556717797cdcf79a5) Cc: stable@vger.kernel.org
2026-02-25drm/amdgpu/userq: Do not allow userspace to trivially triger kernel warningsTvrtko Ursulin
Userspace can either deliberately pass in the too small num_fences, or the required number can legitimately grow between the two calls to the userq wait ioctl. In both cases we do not want the emit the kernel warning backtrace since nothing is wrong with the kernel and userspace will simply get an errno reported back. So lets simply drop the WARN_ONs. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL") Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2c333ea579de6cc20ea7bc50e9595ef72863e65c)
2026-02-25drm/amdgpu/userq: Fix reference leak in amdgpu_userq_wait_ioctlTvrtko Ursulin
Drop reference to syncobj and timeline fence when aborting the ioctl due output array being too small. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL") Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 68951e9c3e6bb22396bc42ef2359751c8315dd27) Cc: <stable@vger.kernel.org> # v6.16+
2026-02-22Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL usesKees Cook
Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_flex' family to use the new default GFP_KERNEL argumentLinus Torvalds
This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the *alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs*'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>