summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)Author
10 daysMerge tag 'drm-next-2026-06-17' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "Highlights: - xe: add initial CRI platform support - amdgpu: initial HDMI 2.1 FRL support - rust: add some new type concepts for device lifetimes - scheduler: moves to a fair algorithm and lots of cleanups But it's mostly the usual mountain of changes across the board. core: - add docbook for DRM_IOCTL_SYNCOBJ_EVENTFD - change signature of drm_connector_attach_hdr_output_metadata_property - dedup counter and timestamp retrieval in vblank code - parse AMD VSDB v3 in CTA extension blocks - add P230, Y7, XYYY2101010, T430, XVUY210101010 formats - don't call drop master on file close if not master - use drm_printf_indent in atomic / bridge - fix 32b format descriptions - docs: fix toctree - hdmi: add common TMDS character rates - fix drm_syncobj_find_fence leak rust: - introduce Higher-Ranked lifetime types - replace drvdata with scoped registration data - add GPUVM immediate mode abstraction for rust GPU drivers - introduce DeviceContext type state for drm::Device bridge: - clarify drm_bridge_get/put - create drm_get_bridge_by_endpoint and use it - analogix_dp: add panel probing - ite-it6211 - use drm audio hdmi helpers buddy: - add lockdep annotations dp: - add PR and VRR updates - mst: fix buffer overflows - add Adaptive Sync SDP decoding support - fix OOB reads in dp-mst ttm: - bump fpfn/lpfn to 64-bit scheduler: - change default to fair scheduler - map runqueue 1:1 with scheduler dma-buf: - port selftests to kunit - convert dma-buf system/heap allocators to module - add separate DMABUF_HEAPS_SYSTEM_CC_SHARED Kconfig udmabuf: - revert hugetlb support - fix error with CONFIG_DMA_API_DEBUG dma-fence: - fix tracepoints lifetime - remove unused signal on any support ras: - add clear error counter netlink command to drm ras gpusvm: - reject VMAs with VM_IO or VM_PFNMAP when creating SVM ranges - use IOVA allocations pagemap: - use IOVA allocations panels: - update to use ref counts - add support for CSW PNB601LS1-2, LGD LP116WHA-SPB1 - add support for waveshare panels - CMN N116BCN-EA1, CMN N140HCA-EEK, IVO M140NWFQ R5, - IVO, R140NWFW R0, BOE NT140*, BOE NV133FHM-N4F, - AUO B140*, AUO B133HAN06.6 and AUO B116XTN02.3 eDP panels - Surface Pro 12 Panel xe: - add CRI PCI-IDs - debugfs add multi-lrc info - engine init cleanup - PF fair scheduling auto provisioning - system controller support for CRI/Xe3p - PXP state machine fixes - Reset/wedge/unload corner case fixes - Wedge path memory allocation fixes - PAT type cleanups - Reject unsafe PAT for CPU cached memory - OA improvements for CRI device memory - kernel doc syntax in xe headers - xe_drm.h documentation fixes - include guard cleanups - VF CCS memory pool - i915/xe step unification - Xe3p GT tuning fixes - forcewake cleanup in GT and GuC - admin-only PF mode - enable hwmon energy attributes for CRI - enable GT_MI_USER_INTERRUPT - refactor emit functions - oa workarounds - multi_queue: allow QUEUE_TIMESTAMP register - convert stolen memory to ttm range manager - use xe2 style blitter as a feature flag - make drm_driver const - add/use IRQ page to HW engine definition - fix oops when display disabled i915: - enable PIPEDMC_ERROR interrupt - more common display code refactoring - restructure DP/HDMI sink format handling - eliminate FB usage from lowlevel pinning code - panel replay bw optimization - integrate sharpness filter into the scaler - new fb_pin abstraction for xe/i915 fb transparent handling - skip inactive MST connectors on HDCP - start switching to display specific registers - use polling when irq unavailable - Adaptive-sync SDP prep amdgpu: - use drm_display_info for AMD VSDB data - Initial HDMI 2.1 FRL support - Initial DCN 4.2.1 support - GART fixes for non-4k pages - GC 11.5.6/SDMA 6.4.0/and other new IPs - GFX9/DCE6/Hawaii/SDMA4/GART/Userq fixes - Finish support for using multiple SDMA queues for TTM operations - SWSMU updates - GC 12.1 updates - SMU 15.0.8 updates - DCN 4.2 updates - DC type conversion fixes - Enable DC power module - Replay/PSR updates - SMU 13.x updates - Compute queue quantum MQD updates - ASPM fix - Align VKMS with common implementation - DC analog support fixes - UVD 3 fixes - TCC harvesting fixes for SI - GC 11 APU module reload fix - NBIO 6.3.2 support - IH 7.1 updates - DC cursor fixes - VCN/JPEG user fence fixes - DC support for connectors without DDC - Prefer ROM BAR for default VGA device - DC bandwidth fixes - Add PTL support for profiler - Introduce dc_plane_cm and migrate surface update color path - Add FRL registers for HDMI 2.1 - Restructure VM state machine - Auxless ALPM support - GEM_OP locking/warning fixes - switch to system_dfl_wq amdkfd: - GPUVM TLB flush fix - Hotplug fix - Boundary check fixes - SVM fixes - CRIU fixes - add profiler API - MES 12.1 updates msm: - core: - fix shrinker documentation - IFPC enabled for gen8 - PERFCNTR_CONFIG ioctl support - GPU: - reworked UBWC handling - a810 support - MDSS: - add support for Milos platform - reworked UBWC handling - DisplayPort: - reworked HPD handling as prep for MST - DPU: - Milos platform support - reworked UBWC handling - DSI: - Milos platform support nova: - Hopper/Blackwell enablement (GH100/GB100/GB202) - FSP support - 32-bit firmware support - HAL functions - refactor GSP boot/unload - GA100 support - VBIOS hardening/refactoring - Adopt higher order lifetime types tyr: - define register blocks - add shmem backed GEM objects - adopt higher order lifetime types - move clock cleanup into Drop radeon: - Hawaii SMU fixes - CS parser fix - use struct drm_edid instead of edid amdxdna: - export per-client BO memory via fdinfo - AIE4 device support - support medium/lower power modes - expandable device heap support - revert read-only user-pointer BO mappings ivpu: - support frequency limiting panthor: - enable GEM shrinker support - add eviction and reclaim info to fdinfo v3d: - enable runtime PM mgag200: - support XRGB1555 + C8 ast: - support XRGB1555 + C8 - use constants for lots of registers - fix register handling imagination: - fence handling refactoring nouveau: - fix sched double call - expose VBIOS on GSP-RM systems - add GA100 support virtio: - add VIRTIO_GPU_F_BLOB_ALIGNMENT flag - add deferred mapping support gud: - add RCade Display Adapter hibmc: - fix no connectors usage mediatek: - hdmi: convert error handling - simplify mtk_crtc allocation exynos: - move fbdev emulation to drm client buffers - use drm format helpers for geometry/size - adopt core DMA tracking - fix framebuffer offset handling renesas: - add RZ/T2H SOC support versilicon: - add cursor plane support tegra: - use drm client for framebuffer" * tag 'drm-next-2026-06-17' of https://gitlab.freedesktop.org/drm/kernel: (1731 commits) dma-buf: move system_cc_shared heap under separate Kconfig accel/amdxdna: Clear sva pointer after unbind agp/amd64: Fix broken error propagation in agp_amd64_probe() accel/amdxdna: Require carveout when PASID and force_iova are disabled drm/amdkfd: always resume_all after suspend_all drm/amdgpu/gfx: move fault and EOP IRQ get/put to hw_init/hw_fini drm/amd/display: Consult MCCS FreeSync cap only if requested & supported drm/amd/pm: Use strscpy in profile mode parsing drm/amdkfd: Fix infinite loop parsing CRAT with zero subtype length drm/amdkfd: fix sysfs topology prop length on buffer truncation drm/amdgpu: drop retry loop in amdgpu_hmm_range_get_pages drm/amd/pm: bound OD parameter parsing to stack array size drm/amd/pm: Stop pp_od_clk_voltage emit at PAGE_SIZE drm/amdkfd: Unwind debug trap enable on copy_to_user failure drm/amdgpu: validate the mes firmware version for gfx12.1 drm/amdgpu: validate the mes firmware version for gfx12 drm/amdgpu: compare MES firmware version ucode for gfx11 drm/amdkfd: Add bounds check for AMDKFD_IOC_WAIT_EVENTS drm/amdgpu: restart the CS if some parts of the VM are still invalidated drm/amd/display: use unsigned types for local pipe and REG_GET counters ...
2026-06-13Merge tag 'drm-misc-fixes-2026-06-12' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: amd: - track colorop changes correctly amdxdna: - fix possible leak of mm_struct colorop: - make lut interpolation mutable - track colorop updates correctly ivpu: - fix integer truncation vc4: - fix leak in krealloc() error handling virtio: - fix dma_fence ref-count leak Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260612081418.GA17001@2a02-2455-9062-2500-e496-5a17-62ba-545e.dyn6.pyur.net
2026-06-10drm/amd/display: use plane color_mgmt_changed to track colorop changesMelissa Wen
Ensure the driver tracks changes in any colorop property of a plane color pipeline by using the same mechanism of CRTC color management and update plane color blocks when any colorop property changes. It fixes an issue observed on gamescope settings for night mode which is done via shaper/3D-LUT updates. Fixes: 9ba25915efba ("drm/amd/display: Add support for sRGB EOTF in DEGAM block") Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Melissa Wen <melissa.srw@gmail.com> Link: https://patch.msgid.link/20260609110420.1298352-5-mwen@igalia.com
2026-06-08Merge tag 'amd-drm-next-7.2-2026-06-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-7.2-2026-06-04: amdgpu: - UserQ fix - Userptr fix - MCCS freesync fix - Remove some triggerable BUG() calls - DCN 4.2.1 fixes - Lockdep annotations - Guilty handling fix - VCN 5.3 fix - FRL fixes - Bounds checking fixes - HMM fix - IRQ accounting fix amdkfd: - Fix an event information leak - Events bounds check fix - Trap cleanup fix - Bounds checking fixes - MES fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260604231801.19979-1-alexander.deucher@amd.com
2026-06-04drm/amd/display: Consult MCCS FreeSync cap only if requested & supportedMichel Dänzer
When the do_mccs parameter is false, we don't call dm_helpers_read_mccs_caps, so sink->mccs_caps.freesync_supported is unlikely to be true. Fixes: 6f71d5dd3206 ("drm/amd/display: Read sink freesync support via mccs") Bug: https://gitlab.freedesktop.org/drm/amd/-/work_items/5286 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 115bf5ca318e18a3dc1888ec6271c7052774952a)
2026-06-04drm/amdkfd: Unwind debug trap enable on copy_to_user failureYongqiang Sun
If kfd_dbg_trap_enable() fails while copying runtime_info to userspace, it had already activated the trap, set debug_trap_enabled, taken an extra process reference, and opened the debug event file. Return -EFAULT without unwinding that state, leaving inconsistent trap state and a refcount imbalance that could break later DISABLE/ENABLE. On copy_to_user failure, deactivate the trap and undo the rest of the enable setup before returning. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 01112e241e37f9ac98b6f418d93ce2e0b87b7ee0)
2026-06-04drm/amdkfd: Add bounds check for AMDKFD_IOC_WAIT_EVENTSSunday Clement
The kfd_wait_on_events ioctl passes a user-supplied num_events parameter directly to alloc_event_waiters() which calls kcalloc() without validation. This allows unprivileged users with /dev/kfd access to trigger large kernel memory allocations, potentially causing memory exhaustion and denial of service via the OOM killer. Add a check to reject num_events values exceeding KFD_SIGNAL_EVENT_LIMIT (4096), which is the maximum number of events a single process can create. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 39eb6da7acee8d0cc12a8959235b590f295d7b4c)
2026-06-04drm/amdgpu: restart the CS if some parts of the VM are still invalidatedChristian König
Make sure that we only submit work with full up to date VM page tables. Backport to 7.1 and older. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 59720bfd8c6dbebeb8d5a7ab64241b007efd9213) Cc: stable@vger.kernel.org
2026-06-04drm/amdgpu/userq: Fix reading timeline points in wait ioctlDavid Rosca
Use correct u64 type. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0ac98160dfb6ab3c6d7b38e0ff9687780beed9cb)
2026-06-04drm/amdkfd: fix SMI event cross-process information leakYongqiang Sun
kfd_smi_ev_enabled() skips the suser privilege check when pid=0. PROCESS_START, PROCESS_END, and VMFAULT events are emitted with pid=0 while carrying another process's PID and command name, so any /dev/kfd user in the render group can monitor all GPU workloads. Pass the target process PID into kfd_smi_event_add() for these events so the existing per-client filter restricts delivery to the owning process or CAP_SYS_ADMIN subscribers. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 92a8dba246d371fe268280e5fd74b0955688e6df)
2026-06-04drm/amdkfd: always resume_all after suspend_allAlex Deucher
Need to restore any good queues even if the suspend_all failed for some. Always run remove_queue as that will schedule a GPU reset is removing the queue fails. v2: move resume_all after remove Fixes: eb067d65c33e ("drm/amdkfd: Update BadOpcode Interrupt handling with MES") Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu/gfx: move fault and EOP IRQ get/put to hw_init/hw_finiYunxiang Li
priv_reg / priv_inst / bad_op and (on v11+) userq EOP IRQs are acquired in late_init but released in hw_fini. This split forced gfx_v9_0_hw_fini() to defensively guard each put with amdgpu_irq_enabled() because hw_fini runs on paths that may not reach late_init. amdgpu_ip_block_hw_fini() only runs after hw_init returns success, and suspend / resume cycle the refs through the same path, so hw_init / hw_fini pair without any extra tracking. Move the gets there and drop the guards. While here, fix the pre-existing partial-failure leak in set_userq_eop_interrupts() (gfx11 / 12_0 / 12_1). amdgpu_irq_get() increments the refcount before calling .set, so a failure partway through the loop leaves earlier successful gets stranded. Track the loop position and roll back on the enable path. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/display: Consult MCCS FreeSync cap only if requested & supportedMichel Dänzer
When the do_mccs parameter is false, we don't call dm_helpers_read_mccs_caps, so sink->mccs_caps.freesync_supported is unlikely to be true. Fixes: 6f71d5dd3206 ("drm/amd/display: Read sink freesync support via mccs") Bug: https://gitlab.freedesktop.org/drm/amd/-/work_items/5286 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/pm: Use strscpy in profile mode parsingLijo Lazar
Use strscpy to copy the buffer which makes it explicit that a valid NULL terminated string gets copied. Also, make it explicit that the source buffer can be copied safely to the temporary buffer by checking against its size. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdkfd: Fix infinite loop parsing CRAT with zero subtype lengthYongqiang Sun
Malformed ACPI CRAT tables can advertise a zero or undersized subtype length. The parser then fails to advance the cursor and loops forever while the remaining image still looks large enough for a generic header. Validate sub_type_hdr->length on each iteration before parsing or advancing. Return -EINVAL and warn when length is zero or smaller than the generic subtype header. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdkfd: fix sysfs topology prop length on buffer truncationYongqiang Sun
sysfs_show_gen_prop() accumulated snprintf()'s return value into the offset. snprintf() reports bytes that would have been written, not bytes actually written, so a truncated sysfs show could over-report its length. Use sysfs_emit_at(), which returns only the bytes written. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: drop retry loop in amdgpu_hmm_range_get_pagesHonglei Huang
Since commit c08972f55594 ("drm/amdgpu: fix amdgpu_hmm_range_get_pages") moved mmu_interval_read_begin() out of the per-chunk loop, the captured notifier_seq is no longer refreshed across retries. As a result, the existing -EBUSY retry path can never make progress: hmm_range_fault() returns -EBUSY only when mmu_interval_check_retry(notifier, notifier_seq) reports that the sequence is stale. Once the sequence has advanced, the stored seq will never match again, so every subsequent call within the same invocation returns -EBUSY immediately. The "goto retry" therefore degenerates into a busy spin that simply burns CPU for the full HMM_RANGE_DEFAULT_TIMEOUT (~1s) window before finally bailing out with -EAGAIN. This is pure latency with no chance of recovery, and it actively hurts the KFD userptr stack: the caller ends up blocked for a second while holding mmap_lock, only to return -EAGAIN to the restore worker (or to userspace) which would have re-driven the operation immediately anyway. Drop the retry/timeout entirely and let -EBUSY propagate straight to out_free_pfns, where it is already translated to -EAGAIN. Recovery is handled at a higher level: the KFD restore_userptr_worker reschedules itself, and the userptr ioctl path returns -EAGAIN to userspace. No functional regression: the previous behaviour on -EBUSY was already to fail with -EAGAIN after a 1s stall; we just skip the stall. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Honglei Huang <honghuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/pm: bound OD parameter parsing to stack array sizeCandice Li
Reject inputs once parameter_size reaches the array limit, and pass ARRAY_SIZE(parameter) into parse_input_od_command_lines() for defense in depth. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/pm: Stop pp_od_clk_voltage emit at PAGE_SIZEAsad Kamal
Stop appending OD sections in amdgpu_get_pp_od_clk_voltage() once the sysfs page is full, instead of checking every sysfs_emit_at() in SMU helpers. This is purely defensive hardening. v2: Drop the prior series that checked sysfs_emit_at() return values in every SMU *_emit_clk_levels() helper and smu_cmn_print_*().(Kevin) v3: Update description, remove all clamping Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdkfd: Unwind debug trap enable on copy_to_user failureYongqiang Sun
If kfd_dbg_trap_enable() fails while copying runtime_info to userspace, it had already activated the trap, set debug_trap_enabled, taken an extra process reference, and opened the debug event file. Return -EFAULT without unwinding that state, leaving inconsistent trap state and a refcount imbalance that could break later DISABLE/ENABLE. On copy_to_user failure, deactivate the trap and undo the rest of the enable setup before returning. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: validate the mes firmware version for gfx12.1Sunil Khatri
MES firmware should report the same version whether read from the register or from the firmware ucode binary. This is not always the case, so add a log when they mismatch. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: validate the mes firmware version for gfx12Sunil Khatri
MES firmware should report the same version whether read from the register or from the firmware ucode binary. This is not always the case, so add a log when they mismatch. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: compare MES firmware version ucode for gfx11Sunil Khatri
MES firmware should report the same version whether read from the register or from the firmware ucode binary. This is not always the case, so add a log when they mismatch. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdkfd: Add bounds check for AMDKFD_IOC_WAIT_EVENTSSunday Clement
The kfd_wait_on_events ioctl passes a user-supplied num_events parameter directly to alloc_event_waiters() which calls kcalloc() without validation. This allows unprivileged users with /dev/kfd access to trigger large kernel memory allocations, potentially causing memory exhaustion and denial of service via the OOM killer. Add a check to reject num_events values exceeding KFD_SIGNAL_EVENT_LIMIT (4096), which is the maximum number of events a single process can create. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: restart the CS if some parts of the VM are still invalidatedChristian König
Make sure that we only submit work with full up to date VM page tables. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/display: use unsigned types for local pipe and REG_GET countersAurabindo Pillai
Two small type fixes that match how the values are actually consumed: - decide_zstate_support() iterates from 0 to pipe_count, which is unsigned. Make the loop index unsigned int. - hpo_enc401_read_state() reads HDMI_PIXEL_ENCODING and HDMI_DEEP_COLOR_DEPTH via REG_GET_2(), which internally casts the output pointer to (uint32_t *). Passing the address of an int is a strict-aliasing wart even when the sizes match. Declare the locals as uint32_t. No behavioural change since the values are only compared against small non-negative constants. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/display: widen dc_hdmi_frl_flags.force_frl_rate to unsigned intAurabindo Pillai
dc_hdmi_frl_flags.force_frl_rate mirrors dc_debug_options.force_frl_rate, which was just widened to unsigned int. Match the type here too so the assignment in link_hdmi_frl.c does not narrow from unsigned to signed. All call sites in link_hdmi_frl.c only compare the value against 0, 0xF, or an hdmi_frl_link_rate enum whose values are non-negative, so the change is behaviour-preserving and does not introduce sign-compare warnings. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu/userq: Fix reading timeline points in wait ioctlDavid Rosca
Use correct u64 type. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu/vcn5.0.0: enable secure submission on unified ring for VCN 5.3.0Jeevana Muthyala
Enable secure submission support on the unified ring for VCN IP version 5.3.0 by setting `secure_submission_supported = true` in vcn_v5_0_0_unified_ring_vm_funcs. Secure IB submission is supported on VCN 5.3.0 hardware/firmware, allowing protected decode workloads to bypass the common IB gate. Without this, secure playback submissions can be blocked and fail. Other VCN 5.x variants using the same vcn_v5_0_0_ip_block (e.g. IP_VERSION(5, 0, 0)) do not support secure submission on the unified ring and therefore continue using non-secure paths. This change only advertises existing hardware/firmware capability; non-secure decode paths remain unaffected. Signed-off-by: Jeevana Muthyala <Jeevana.Muthyala2@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: deprecate guilty handlingChristian König
The guilty handling tried to establish a second way of signaling problems with the GPU back to userspace. This caused quite a bunch of issue we had to work around, especially lifetime issues with the drm_sched_entity. Just drop the handling altogether and use the dma_fence based approach instead. v2: fix reversed condition in entity check (Alex) Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: Add lockdep annotations for lock ordering validationVitaly Prosyak
Add lockdep annotations to teach lockdep the correct lock hierarchy and catch ordering violations during development. This follows the pattern established by dma-resv in drivers/dma-buf/dma-resv.c. Lock ordering hierarchy (outermost to innermost): 1. userq_sch_mutex - Global userq scheduler (enforce_isolation) 2. userq_mutex - Per-context userq (held across queue create/destroy) 3. notifier_lock - MMU notifier synchronization 4. vram_lock - VRAM memory allocator 5. reset_domain->sem - GPU reset synchronization 6. reset_lock - Reset control mutex 7. srbm_mutex - SRBM register access 8. grbm_idx_mutex - GRBM index register access 9. mmio_idx_lock - MMIO index access (spinlock) The implementation provides: - Lock ordering training at module init (amdgpu_lockdep_init) - Lock class association for real driver locks (amdgpu_lockdep_set_class) Dummy locks are associated with the same class keys as real driver locks via lockdep_set_class(), ensuring lockdep connects the training ordering with actual runtime locks. Testing: Build the kernel with CONFIG_PROVE_LOCKING=y (enables CONFIG_LOCKDEP): scripts/config --enable PROVE_LOCKING scripts/config --enable DEBUG_LOCKDEP make -j$(nproc) On boot, dmesg should show: AMDGPU: Lockdep annotations initialized (9 lock levels) The companion IGT test (tests/amdgpu/amd_lockdep) exercises lock-heavy GPU code paths concurrently to trigger lockdep warnings on violations: sudo ./build/tests/amdgpu/amd_lockdep sudo dmesg | grep -A 50 "circular locking dependency" IGT subtests: concurrent-reset-and-submit - reset_sem vs submission locks concurrent-mmap-and-evict - mmap_lock vs vram_lock concurrent-userptr-and-reset - notifier_lock vs reset_sem stress-all-paths - all of the above simultaneously A clean dmesg (no "circular locking dependency" or "possible recursive locking detected" messages) confirms no lock ordering violations. For CI integration, the test should be run on kernels compiled with CONFIG_LOCKDEP=y; dmesg is scanned post-run for lockdep splats. v2: (Christian) - Move notifier_lock and vram_lock before reset locks in hierarchy. HMM invalidation holds notifier_lock and can wait for GPU reset completion, so notifier_lock must be outer to reset_domain->sem. - Associate dummy locks with lock class keys via lockdep_set_class() so lockdep connects training with real driver locks. - Update commit message to list all 9 lock levels. Requires CONFIG_PROVE_LOCKING=y to activate. Cc: Christian Konig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian Konig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdkfd: fix SMI event cross-process information leakYongqiang Sun
kfd_smi_ev_enabled() skips the suser privilege check when pid=0. PROCESS_START, PROCESS_END, and VMFAULT events are emitted with pid=0 while carrying another process's PID and command name, so any /dev/kfd user in the render group can monitor all GPU workloads. Pass the target process PID into kfd_smi_event_add() for these events so the existing per-client filter restricts delivery to the owning process or CAP_SYS_ADMIN subscribers. Signed-off-by: Yongqiang Sun <Yongqiang.Sun@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/display: Add DCN42B to dml21_translation_helperMatthew Stewart
Needed for DML to function with DCN42B. Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amd/display: Fix DCN42B version detectionMatthew Stewart
In resource_parse_asic_id, the check for GC_11_0_4 was unbounded, which caused it to override the detection of DCN42B. Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com> Reviewed-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04drm/amdgpu: Fix user-triggerable BUG()/BUG_ON() callsCe Sun
Replace BUG()/BUG_ON() with error logs and safe returns in several places where they can be triggered by invalid userspace input, preventing DoS via kernel panic. Signed-off-by: Ce Sun <cesun102@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-06-04Merge tag 'amd-drm-next-7.2-2026-06-03' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-7.2-2026-06-03: amdgpu: - BT.2020 fix for DCE - DC bounds checking fixes - SDMA 7.1 fix - UserQ fixes - SI fix - SMU 13 fixes - SMU 14 fixes - GC 12.1 fix - Userptr fix - GC 10.1 fix - GART fix for non-4K pages - DCN 4.x fixes - DCN 4.2 updates - More DC KUnit tests - PSR cleanup - Support for connectors without DDC pins - Initial DCN 4.2.1 support - Initial HDMI 2.1 FRL support - Misc bounds check fixes - RAS fixes - GC 11.5.6 support - SDMA 6.4.0 support - NBIO 7.11.5 support - IH 6.4.0 support - HDP 6.4.0 support - MMHUB 3.4.2 support - SMU 15.0.5 support - ATHUB 3.4.2 support - VPE 2.2 support - Devcoredump fixes - _PR3 fix amdkfd: - UAF race fix - Fix a potential NULL pointer dereference - GC 11 buffer overflow fix for SDMA - Profiler locking order fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260604013527.2373534-1-alexander.deucher@amd.com
2026-06-03drm/amd/pm: smu_v14_0_0: use SoftMin for gfxclk in set_soft_freq_limited_rangePriya Hosur
In smu_v14_0_0_set_soft_freq_limited_range(), the gfxclk floor is programmed via SetHardMinGfxClk together with SetSoftMaxGfxClk. Under power_dpm_force_performance_level=high this pins HardMin to peak gfxclk. In PMFW arbitration HardMin has higher priority than SoftMax, so the firmware thermal/PPT throttler cannot clamp gfxclk via SoftMax once HardMin is set to peak. Replace SetHardMinGfxClk with SetSoftMinGfxclk so the driver still requests peak performance but the firmware throttler retains the ability to clamp gfxclk under thermal/PPT pressure. SoftMax handling is unchanged and no other clock domains are affected. Signed-off-by: Priya Hosur <Priya.Hosur@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 3ea273267fd29cbf6d83ee72329f59eb5042605b) Cc: stable@vger.kernel.org
2026-06-03drm/amdgpu: Fix incorrect VRAM GART mappings on non-4K page size systemsDonet Tom
When mapping VRAM pages into the GART page table, amdgpu_gart_map_vram_range() assumes that the system page size is the same as the GPU page size. On systems with non-4K page sizes, multiple GPU pages can exist within a single CPU page. As a result, the mappings are created incorrectly because fewer page table entries are programmed than required. Fix this by programming the mappings correctly for non-4K page size systems. Fixes: 237d623ae659 ("drm/amdgpu/gart: Add helper to bind VRAM pages (v2)") Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a8f0bc22388f74e0cf4ed8b7d1846c580eaf44cc) Cc: stable@vger.kernel.org
2026-06-03drm/amdgpu/userq: move wptr_obj cleanup in mqd_destroySunil Khatri
In case when queue_create fails and mqd has already been allocated and hence wptr_obj is not cleaned up. So moving that cleanup part to mqd_destroy so it takes care of all the cases of clean up and during tear down of the queue. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 43355f62cd2ef5386c2693df537c232ea0f2ce6c)
2026-06-03drm/amdgpu: improve the userq seq BO free bit lookupPrike Liang
Use find_next_zero_bit() to locate the next free seq slot bit instead of the current walk, for more efficient bitmap scanning. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ff905a9b6228de9eedd0db71ecb1bdde91fb898d)
2026-06-03drm/amdgpu/userq: remove the vital queue unmap loggingSunil Khatri
Mesa userqueues free does not wait for the free to complete and go ahead in unmapping the vital bos while kernel is still in queue free and corresponding cleanup. So ideally we don't need the logging for that and hence remove the warn message as this is expected behaviour and functionally, we are making sure to wait for the required fences before unmap. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 758a868043dcb07eca923bc451c16da3e73dc47c)
2026-06-03drm/amdkfd: Fix buffer overflow in SDMA queue checkpoint/restore on GFX11Andrew Martin
The v11 MQD manager incorrectly assigned the CP-compute variants of checkpoint_mqd/restore_mqd for KFD_MQD_TYPE_SDMA queues. These functions use sizeof(struct v11_compute_mqd) (2048 bytes) instead of sizeof(struct v11_sdma_mqd) (512 bytes), causing a 1536-byte overflow. During CRIU checkpoint of an SDMA queue on Navi3x: - checkpoint_mqd() reads 2048 bytes from a 512-byte SDMA MQD buffer, leaking 1536 bytes of adjacent GTT memory to userspace During CRIU restore: - restore_mqd() writes 2048 bytes into a 512-byte SDMA MQD buffer, corrupting 1536 bytes of adjacent GTT memory (often the ring buffer or neighboring MQDs) This is a copy-paste regression unique to v11. All other ASIC backends (cik, vi, v9, v10, v12) correctly use the SDMA-specific variants. Add checkpoint_mqd_sdma() and restore_mqd_sdma() functions that properly handle the smaller v11_sdma_mqd structure, matching the pattern used in other MQD managers. Fixes: cc009e613de6 ("drm/amdkfd: Add KFD support for soc21 v3") Assisted-by: Claude:Sonnet 4-5 Signed-off-by: Andrew Martin <andrew.martin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6fa41db7ffdec97d62433adf03b7b9b759af8c2c) Cc: stable@vger.kernel.org
2026-06-03drm/amdkfd: fix NULL dereference in get_queue_ids()Muhammad Bilal
When usr_queue_id_array is NULL and num_queues is non-zero, get_queue_ids() returns NULL. The callers check only IS_ERR() on the return value; since IS_ERR(NULL) == false the check passes, and suspend_queues() calls q_array_invalidate() which immediately dereferences NULL while iterating num_queues times. Userspace can trigger this via kfd_ioctl_set_debug_trap() by supplying num_queues > 0 with a zero queue_array_ptr, causing a kernel panic. A NULL usr_queue_id_array with num_queues == 0 is a legitimate no-op (q_array_invalidate never executes, and resume_queues already guards all queue_ids dereferences behind a NULL check). Return ERR_PTR(-EINVAL) only when num_queues is non-zero and the pointer is absent; both callers already propagate IS_ERR() returns correctly to userspace. Fixes: a70a93fa568b ("drm/amdkfd: add debug suspend and resume process queues operation") Signed-off-by: Muhammad Bilal <meatuni001@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f165a82cdf503884bb1797771c61b2fcc72113d4) Cc: stable@vger.kernel.org
2026-06-03drm/amdgpu: set noretry=1 as default for GFX 10.1.x (Navi10/12/14)Vitaly Prosyak
Problem: While developing the amd_close_race IGT test (which intentionally triggers execute permission faults by removing VM_PAGE_EXECUTABLE from GPU page table entries), we discovered that on Navi10 (GFX 10.1.x) these faults produce zero diagnostic output. The GPU simply hangs silently for ~10s until the scheduler timeout fires. There is no way to distinguish an execute permission fault from any other type of GPU hang. Root cause: GFX 10.1.x defaults to noretry=0, which sets RETRY_PERMISSION_OR_INVALID_PAGE_FAULT=1 in the GFXHUB UTCL2 registers (gfxhub_v2_0.c line 313). With this bit set, permission faults (valid PTE, wrong R/W/X bits) are handled entirely within the UTCL1/UTCL2 hardware loop: UTCL2 returns an XNACK to UTCL1, and UTCL1 re-requests the translation indefinitely, expecting software to eventually fix the permission bits (as happens in SVM/HMM recovery). No interrupt of any kind reaches the IH ring. This is different from invalid-page faults (V=0) which DO generate a retry fault interrupt that the driver can escalate to a no-retry fault. Permission faults with valid PTEs loop silently forever in hardware. GFX 10.3+ already defaults to noretry=1, which makes permission faults generate immediate L2 protection fault interrupts. GFX 10.1.x was inadvertently left out of this default. Fix: Change the noretry=1 threshold from IP_VERSION(10, 3, 0) to IP_VERSION(10, 1, 0) in amdgpu_gmc_noretry_set(). This is a one-line change that aligns GFX 10.1.x behavior with GFX 10.3+ and all newer generations. With noretry=1, the existing non-retry fault handler (gmc_v10_0_process_interrupt) already decodes and prints the full GCVM_L2_PROTECTION_FAULT_STATUS register including PERMISSION_FAULTS, faulting address, VMID, PASID, and process name. No additional logging code is needed — the fix is purely routing permission faults to the existing, fully-capable non-retry interrupt handler. v2: Dropped GFX10-specific logging from gmc_v10_0.c and kfd_int_process_v10.c (Felix Kuehling). v1 added logging in the retry fault handler, but with noretry=1 permission faults take the non-retry path — the v1 retry handler code was dead and would never execute. Tested on Navi10 (GFX 10.1.10): - Execute permission faults now produce immediate, clear output: [gfxhub] page fault (src_id:0 ring:64 vmid:4 pasid:592) Process amd_close_race pid 13380 thread amd_close_race pid 13384 in page at address 0x40001000 from client 0x1b (UTCL2) GCVM_L2_PROTECTION_FAULT_STATUS:0x00700881 PERMISSION_FAULTS: 0x8 - No regressions with properly-mapped GPU workloads Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit eb21edd24c40d81066753f8ac6f23bce15745395) Cc: stable@vger.kernel.org
2026-06-03drm/amdgpu/gfxhub: Program CRASH_ON_*_FAULT bits to 0 as neededTimur Kristóf
When the fault stop mode isn't AMDGPU_VM_FAULT_STOP_ALWAYS, these bits should be programmed to 0. Program CRASH_ON_NO_RETRY_FAULT and CRASH_ON_RETRY_FAULT always, to make sure to clear the bits when we don't want to crash. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d0cd99e73090700b7a942b98a3327ec966597d0a)
2026-06-03drm/amdgpu: fix waiting for all submissions for userptrsChristian König
Wait for all submissions when userptrs need to be invalidated by the MMU notifier, not just the one the userptr was involved into. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Tested-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91250893cbaa25c86872deca95a540d08de1f91e) Cc: stable@vger.kernel.org
2026-06-03drm/amdgpu: drm/amdgpu: Set correct DMA mask for gfx12.1Harish Kasiviswanathan
Set correct DMA mask for gfx12 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a2ef14ee2593b48242b8d90f229f71c1710529da)
2026-06-03drm/amdgpu: Use asic specific pte_addr_maskHarish Kasiviswanathan
For PTE creation use asic specific physical page base address mask v2: Change variable name from pa_mask to pte_addr_mask Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2ea989885941a6e5607ef86dbe309e90b7191f21)
2026-06-03drm/amd/pm: zero unused SMU argument registersYang Wang
SMU messages may use fewer arguments than the available argument registers, the previous code only wrote used registers and left the rest unchanged, so stale values from a prior message could persist. Write all argument registers for each message and zero the unused tail to keep command arguments deterministic and avoid unintended carry-over. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e03b635f61f77ebd5107ef82f48e3221cb695856)
2026-06-03drm/amd/pm: mark metrics.energy_accumulator is invalid for smu 14.0.2Yang Wang
EnergyAccumulator is unsupported on SMU 14.0.2, mark it invalid. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 646b05043eeed04b51c14aad22a400a8250af4b7) Cc: stable@vger.kernel.org