summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm
AgeCommit message (Collapse)Author
2025-11-14Merge tag 'drm-xe-fixes-2025-11-13' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - New HW workarounds affecting PTL and WCL platforms (Nitin Gote, Tangudu Tilak Tirumalesh) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patch.msgid.link/ay2qztgonodwson6tuzcv5napjmqbgwzv27so4ybfola34guux@xgufrrmbzyws
2025-11-14Merge tag 'drm-intel-fixes-2025-11-13' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix PSR's pipe to vblank conversion (Jani) - Disable Panel Replay on MST links (Imre) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/aRXdQnitzyFcokhF@intel.com
2025-11-14Merge tag 'drm-misc-fixes-2025-11-13' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: client: - Fix description of module parameter panthor: - Flush writes before mapping buffers vmwgfx: - Improve command validation - Improve ref counting - Fix cursor-plane support Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20251113132317.GA451885@linux.fritz.box
2025-11-13drm/client: fix MODULE_PARM_DESC string for "active"Randy Dunlap
The MODULE_PARM_DESC string for the "active" parameter is missing a space and has an extraneous trailing ']' character. Correct these. Before patch: $ modinfo -p ./drm_client_lib.ko active:Choose which drm client to start, default isfbdev] (string) After patch: $ modinfo -p ./drm_client_lib.ko active:Choose which drm client to start, default is fbdev (string) Fixes: f7b42442c4ac ("drm/log: Introduce a new boot logger to draw the kmsg on the screen") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20251112010920.2355712-1-rdunlap@infradead.org
2025-11-12drm/i915/dp_mst: Disable Panel ReplayImre Deak
Disable Panel Replay on MST links until it's properly implemented. For instance the required VSC SDP is not programmed on MST and FEC is not enabled if Panel Replay is enabled. Fixes: 3257e55d3ea7 ("drm/i915/panelreplay: enable/disable panel replay") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15174 Cc: Jouni Högander <jouni.hogander@intel.com> Cc: Animesh Manna <animesh.manna@intel.com> Cc: stable@vger.kernel.org # v6.8+ Reviewed-by: Jouni Högander <jouni.hogander@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patch.msgid.link/20251107124141.911895-1-imre.deak@intel.com (cherry picked from commit e109f644b871df8440c886a69cdce971ed533088) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-11-11drm/amdkfd: Fix GPU mappings for APU after prefetchHarish Kasiviswanathan
Fix the following corner case:- Consider a 2M huge page SVM allocation, followed by prefetch call for the first 4K page. The whole range is initially mapped with single PTE. After the prefetch, this range gets split to first page + rest of the pages. Currently, the first page mapping is not updated on MI300A (APU) since page hasn't migrated. However, after range split PTE mapping it not valid. Fix this by forcing page table update for the whole range when prefetch is called. Calling prefetch on APU doesn't improve performance. If all it deteriotes. However, functionality has to be supported. v2: Use apu_prefer_gtt as this issue doesn't apply to APUs with carveout VRAM v3: Simplify by setting the flag for all ASICs as it doesn't affect dGPU v4: Remove v2 and v3 changes. Force update_mapping when range is split at a size that is not aligned to prange granularity Suggested-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Philip Yang<Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 076470b9f6f8d9c7c8ca73a9f054942a686f9ba7)
2025-11-11drm/amdkfd: relax checks for over allocation of save areaJonathan Kim
Over allocation of save area is not fatal, only under allocation is. ROCm has various components that independently claim authority over save area size. Unless KFD decides to claim single authority, relax size checks. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 15bd4958fe38e763bc17b607ba55155254a01f55) Cc: stable@vger.kernel.org
2025-11-11drm/amdgpu/jpeg: Add parse_cs for JPEG5_0_1Sathishkumar S
enable parse_cs callback for JPEG5_0_1. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 547985579932c1de13f57f8bcf62cd9361b9d3d3) Cc: stable@vger.kernel.org
2025-11-11drm/amd/amdgpu: Ensure isp_kernel_buffer_alloc() creates a new BOSultan Alsawaf
When the BO pointer provided to amdgpu_bo_create_kernel() points to non-NULL, amdgpu_bo_create_kernel() takes it as a hint to pin that address rather than allocate a new BO. This functionality is never desired for allocating ISP buffers. A new BO should always be created when isp_kernel_buffer_alloc() is called, per the description for isp_kernel_buffer_alloc(). Ensure this by zeroing *bo right before the amdgpu_bo_create_kernel() call. Fixes: 55d42f616976 ("drm/amd/amdgpu: Add helper functions for isp buffers") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 73c8c29baac7f0c7e703d92eba009008cbb5228e)
2025-11-11drm/amd/display: Allow VRR params change if unsynced with the streamIvan Lipski
[Why] When changing resolution (e.g., 4K → FHD) in mirror/clone mode with certain monitors, the monitor blanks and loses connection due to an early exit in vrr_settings_require_update(). The function only checks if VRR state, fixed refresh target, or min/max refresh rate range has changed. During mode changes, if the calculated min/max refresh values remain the same even though the stream's v_total changed, the function returns early without updating vrr_params.adjust.v_total_min/max, leaving the monitor's VRR timing parameters unsynced with the new mode, causing it to blank out. [How] Explicitly adjust VRR parameters to the stream's nominal v_total when VRR is supported, but inactive. Fixes: 6d31602a9f57 ("drm/amd/display: more liberal vmin/vmax update for freesync") Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 607df8248a011524211ee34850345305a1913f9e)
2025-11-11drm/amdgpu: fix lock warning in amdgpu_userq_fence_driver_processJesse.Zhang
Fix a potential deadlock caused by inconsistent spinlock usage between interrupt and process contexts in the userq fence driver. The issue occurs when amdgpu_userq_fence_driver_process() is called from both: - Interrupt context: gfx_v11_0_eop_irq() -> amdgpu_userq_fence_driver_process() - Process context: amdgpu_eviction_fence_suspend_worker() -> amdgpu_userq_fence_driver_force_completion() -> amdgpu_userq_fence_driver_process() In interrupt context, the spinlock was acquired without disabling interrupts, leaving it in {IN-HARDIRQ-W} state. When the same lock is acquired in process context, the kernel detects inconsistent locking since the process context acquisition would enable interrupts while holding a lock previously acquired in interrupt context. Kernel log shows: [ 4039.310790] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. [ 4039.310804] kworker/7:2/409 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 4039.310818] ffff9284e1bed000 (&fence_drv->fence_list_lock){?...}-{3:3}, [ 4039.310993] {IN-HARDIRQ-W} state was registered at: [ 4039.311004] lock_acquire+0xc6/0x300 [ 4039.311018] _raw_spin_lock+0x39/0x80 [ 4039.311031] amdgpu_userq_fence_driver_process.part.0+0x30/0x180 [amdgpu] [ 4039.311146] amdgpu_userq_fence_driver_process+0x17/0x30 [amdgpu] [ 4039.311257] gfx_v11_0_eop_irq+0x132/0x170 [amdgpu] Fix by using spin_lock_irqsave()/spin_unlock_irqrestore() to properly manage interrupt state regardless of calling context. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ded3ad780cf97a04927773c4600823b84f7f3cc2) Cc: stable@vger.kernel.org
2025-11-11drm/amdgpu: jump to the correct label on failurePierre-Eric Pelloux-Prayer
drm_sched_entity_init wasn't called yet, so the only thing to do is to release allocated memory. This doesn't fix any bug since entity is zero allocated and drm_sched_entity_fini does nothing in this case. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ec49374ccb8da86b465beaf09c367f3dfd648d8f)
2025-11-11drm/amdgpu: disable peer-to-peer access for DCC-enabled GC12 VRAM surfacesVitaly Prosyak
Certain multi-GPU configurations (especially GFX12) may hit data corruption when a DCC-compressed VRAM surface is shared across GPUs using peer-to-peer (P2P) DMA transfers. Such surfaces rely on device-local metadata and cannot be safely accessed through a remote GPU’s page tables. Attempting to import a DCC-enabled surface through P2P leads to incorrect rendering or GPU faults. This change disables P2P for DCC-enabled VRAM buffers that are contiguous and allocated on GFX12+ hardware. In these cases, the importer falls back to the standard system-memory path, avoiding invalid access to compressed surfaces. Future work could consider optional migration (VRAM→System→VRAM) if a performance regression is observed when `attach->peer2peer = false`. Tested on: - Dual RX 9700 XT (Navi4x) setup - GNOME and Wayland compositor scenarios - Confirmed no corruption after disabling P2P under these conditions v2: Remove check TTM_PL_VRAM & TTM_PL_FLAG_CONTIGUOUS. v3: simplify for upsteam and fix ip version check (Alex) Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9dff2bb709e6fbd97e263fd12bf12802d2b5a0cf) Cc: stable@vger.kernel.org
2025-11-10drm/xe/xe3lpg: Extend Wa_15016589081 for xe3lpgNitin Gote
Wa_15016589081 applies to Xe3_LPG renderCS Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://patch.msgid.link/20251106100516.318863-2-nitin.r.gote@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 715974499a2199bd199fb4630501f55545342ea4) Cc: stable@vger.kernel.org # v6.16+ Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-10drm/xe/xe3: Extend wa_14023061436Tangudu Tilak Tirumalesh
Extend wa_14023061436 to Graphics Versions 30.03, 30.04 and 30.05. Signed-off-by: Tangudu Tilak Tirumalesh <tilak.tirumalesh.tangudu@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20251030154626.3124565-1-tilak.tirumalesh.tangudu@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 0dd656d06f50ae4cedf160634cf13fd9e0944cf7) Cc: stable@vger.kernel.org # v6.17+ Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-10drm/xe/xe3: Add WA_14024681466 for Xe3_LPGNitin Gote
Apply WA_14024681466 to Xe3_LPG graphics IP versions from 30.00 to 30.05. v2: (Matthew Roper) - Remove stepping filter as workaround applies to all steppings. - Add an engine class filter so it only applies to the RENDER engine. Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://patch.msgid.link/20251027092643.335904-1-nitin.r.gote@intel.com Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 071089a69e199bd810ff31c4c933bd528e502743) Cc: stable@vger.kernel.org # v6.16+ Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-10drm/i915/psr: fix pipe to vblank conversionJani Nikula
First, we can't assume pipe == crtc index. If a pipe is fused off in between, it no longer holds. intel_crtc_for_pipe() is the only proper way to get from a pipe to the corresponding crtc. Second, drivers aren't supposed to access or index drm->vblank[] directly. There's drm_crtc_vblank_crtc() for this. Use both functions to fix the pipe to vblank conversion. Fixes: f02658c46cf7 ("drm/i915/psr: Add mechanism to notify PSR of pipe enable/disable") Cc: Jouni Högander <jouni.hogander@intel.com> Cc: stable@vger.kernel.org # v6.16+ Reviewed-by: Jouni Högander <jouni.hogander@intel.com> Link: https://patch.msgid.link/20251106200000.1455164-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 2750f6765d6974f7e163c5d540a96c8703f6d8dd) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-11-10drm/panthor: Flush shmem writes before mapping buffers CPU-uncachedBoris Brezillon
The shmem layer zeroes out the new pages using cached mappings, and if we don't CPU-flush we might leave dirty cachelines behind, leading to potential data leaks and/or asynchronous buffer corruption when dirty cachelines are evicted. Fixes: 8a1cc07578bf ("drm/panthor: Add GEM logical block") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patch.msgid.link/20251107171214.1186299-1-boris.brezillon@collabora.com
2025-11-09Merge tag 'kbuild-fixes-6.18-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild fixes from Nathan Chancellor: - Strip trailing padding bytes from modules.builtin.modinfo to fix error during modules_install with certain versions of kmod - Drop unused static inline function warning in .c files with clang from W=1 to W=2 - Ensure kernel-doc.py invocations use the PYTHON3 make variable to ensure user's choice of Python interpreter is always respected * tag 'kbuild-fixes-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: kbuild: Let kernel-doc.py use PYTHON3 override compiler_types: Move unused static inline functions warning to W=2 kbuild: Strip trailing padding bytes from modules.builtin.modinfo
2025-11-08kbuild: Let kernel-doc.py use PYTHON3 overrideJean Delvare
It is possible to force a specific version of python to be used when building the kernel by passing PYTHON3= on the make command line. However kernel-doc.py is currently called with python3 hard-coded and thus ignores this setting. Use $(PYTHON3) to run $(KERNELDOC) so that the desired version of python is used. Signed-off-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://patch.msgid.link/20251107192933.2bfe9e57@endymion Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-11-08Revert "drm/nouveau: set DMA mask before creating the flush page"Dave Airlie
This reverts commit ebe755605082eddff80eafe0c50915b1366ee98f. Tested the latest kernel on my GB203 and this seems to break it somehow. Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: gsp: GSP-FMC boot failed (mbox: 0x0000000b) Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: gsp: init failed, -5 Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: init failed with -5 Nov 09 04:16:14 bighp kernel: nouveau: drm:00000000:00000080: init failed with -5 Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: drm: Device allocation failed: -5 Nov 09 04:16:14 bighp kernel: nouveau 0000:02:00.0: probe with driver nouveau failed with error -5 Not sure why, I went over the patch and thought it should have worked, but there must be some 32-bit problem maybe in the FMC boot path. Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-11-08Merge tag 'drm-xe-fixes-2025-11-07' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Fix missing synchronization on unbind (Balasubramani Vivekanandan) - Fix device shutdown when doing FLR (Jouni Högander) - Fix user fence signaling order (Matthew Brost) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patch.msgid.link/mvfyflloncy76a7nmkatpj6f2afddavwsibz3y4u4wo6gznro5@rdulkuh5wvje
2025-11-07drm/xe: Enforce correct user fence signaling order usingMatthew Brost
Prevent application hangs caused by out-of-order fence signaling when user fences are attached. Use drm_syncobj (via dma-fence-chain) to guarantee that each user fence signals in order, regardless of the signaling order of the attached fences. Ensure user fence writebacks to user space occur in the correct sequence. v7: - Skip drm_syncbj create of error (CI) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20251031234050.3043507-2-matthew.brost@intel.com (cherry picked from commit adda4e855ab6409a3edaa585293f1f2069ab7299) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-07drm/xe: Do clean shutdown also when using flrJouni Högander
Currently Xe driver is triggering flr without any clean-up on shutdown. This is causing random warnings from pending related works as the underlying hardware is reset in the middle of their execution. Fix this by performing clean shutdown also when using flr. Fixes: 501d799a47e2 ("drm/xe: Wire up device shutdown handler") Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20251031122312.1836534-1-jouni.hogander@intel.com Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> (cherry picked from commit a4ff26b7c8ef38e4dd34f77cbcd73576fdde6dd4) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-07drm/xe: Move declarations under conditional branchTejas Upadhyay
The xe_device_shutdown() function was needing a few declarations that were only required under a specific condition. This change moves those declarations to be within that conditional branch to avoid unnecessary declarations. Reviewed-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20251007100208.1407021-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> (cherry picked from commit 15b3036045188f4da4ca62b2ed01b0f160252e9b) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-07drm/xe/guc: Synchronize Dead CT worker with unbindBalasubramani Vivekanandan
Cancel and wait for any Dead CT worker to complete before continuing with device unbinding. Else the worker will end up using resources freed by the undind operation. Cc: Zhanjun Dong <zhanjun.dong@intel.com> Fixes: d2c5a5a926f4 ("drm/xe/guc: Dead CT helper") Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20251103123144.3231829-6-balasubramani.vivekanandan@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 492671339114e376aaa38626d637a2751cdef263) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-07drm/vmwgfx: Restore Guest-Backed only cursor plane supportIan Forbes
The referenced fixes commit broke the cursor plane for configurations which have Guest-Backed surfaces but no cursor MOB support. Fixes: 965544150d1c ("drm/vmwgfx: Refactor cursor handling") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patch.msgid.link/20251103201920.381503-1-ian.forbes@broadcom.com
2025-11-07drm/vmwgfx: Use kref in vmw_bo_dirtyIan Forbes
Rather than using an ad hoc reference count use kref which is atomic and has underflow warnings. Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patch.msgid.link/20251030193640.153697-1-ian.forbes@broadcom.com
2025-11-06drm/vmwgfx: Validate command header size against SVGA_CMD_MAX_DATASIZEIan Forbes
This data originates from userspace and is used in buffer offset calculations which could potentially overflow causing an out-of-bounds access. Fixes: 8ce75f8ab904 ("drm/vmwgfx: Update device includes for DX device functionality") Reported-by: Rohit Keshri <rkeshri@redhat.com> Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patch.msgid.link/20251021190128.13014-1-ian.forbes@broadcom.com
2025-11-07Merge tag 'mediatek-drm-fixes-20251105' of ↵Dave Airlie
https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-fixes Mediatek DRM Fixes - 20251105 1. Disable AFBC support on Mediatek DRM driver 2. Add pm_runtime support for GCE power control Signed-off-by: Dave Airlie <airlied@redhat.com> From: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://patch.msgid.link/20251105151443.3909-1-chunkuang.hu@kernel.org
2025-11-07Merge tag 'amd-drm-fixes-6.18-2025-11-06' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.18-2025-11-06: amdgpu: - Reset fixes - Misc fixes - Panel scaling fixes - HDMI fix - S0ix fixes - Hibernation fix - Secure display fix - Suspend fix - MST fix amdkfd: - Process cleanup fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20251106201326.807230-1-alexander.deucher@amd.com
2025-11-07Merge tag 'drm-intel-fixes-2025-11-06' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Avoid lock inversion when pinning to GGTT on CHV/BXT+VTD (Janusz) - Fix conversion between clock ticks and nanoseconds (Umesh) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/aQyxT1D8IW-xcDbM@intel.com
2025-11-06drm/amd/display: Enable mst when it's detected but yet to be initializedWayne Lin
[Why] drm_dp_mst_topology_queue_probe() is used under the assumption that mst is already initialized. If we connect system with SST first then switch to the mst branch during suspend, we will fail probing topology by calling the wrong API since the mst manager is yet to be initialized. [How] At dm_resume(), once it's detected as mst branc connected, check if the mst is initialized already. If not, call dm_helpers_dp_mst_start_top_mgr() instead to initialize mst V2: Adjust the commit msg a bit Fixes: bc068194f548 ("drm/amd/display: Don't write DP_MSTM_CTRL after LT") Cc: Fangzhi Zuo <jerry.zuo@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 62320fb8d91a0bddc44a228203cfa9bfbb5395bd) Cc: stable@vger.kernel.org
2025-11-06drm/amdgpu: Fix wait after reset sequence in S3Lijo Lazar
For a mode-1 reset done at the end of S3 on PSPv11 dGPUs, only check if TOS is unloaded. Fixes: 32f73741d6ee ("drm/amdgpu: Wait for bootloader after PSPv11 reset") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4649 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1ad25fd272753db14c5d1cc8c68e20ce01f3f888)
2025-11-06drm/amd: Fix suspend failure with secure display TAMario Limonciello
commit c760bcda83571 ("drm/amd: Check whether secure display TA loaded successfully") attempted to fix extra messages, but failed to port the cleanup that was in commit 5c6d52ff4b61e ("drm/amd: Don't try to enable secure display TA multiple times") to prevent multiple tries. Add that to the failure handling path even on a quick failure. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679 Fixes: c760bcda8357 ("drm/amd: Check whether secure display TA loaded successfully") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4104c0a454f6a4d1e0d14895d03c0e7bdd0c8240)
2025-11-06drm/amdgpu: fix gpu page fault after hibernation on PF passthroughSamuel Zhang
On PF passthrough environment, after hibernate and then resume, coralgemm will cause gpu page fault. Mode1 reset happens during hibernate, but partition mode is not restored on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right after resume. When CP access the MQD BO, wrong stride size is used, this will cause out of bound access on the MQD BO, resulting page fault. The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called when resume from a hibernation. KFD resume is called separately during a reset recovery or resume from suspend sequence. Hence it's not required to be called as part of partition switch. Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5d1b32cfe4a676fe552416cb5ae847b215463a1a)
2025-11-06drm/tiny: pixpaper: add explicit dependency on MMULiangCheng Wang
The DRM_GEM_SHMEM_HELPER helper requires MMU enabled because it uses vmf_insert_pfn() in its mmap implementation. On NOMMU configurations (e.g. some RISC-V randconfig builds), this symbol is unavailable and selecting DRM_GEM_SHMEM_HELPER causes a modpost undefined reference: ERROR: modpost: "vmf_insert_pfn" [drivers/gpu/drm/drm_shmem_helper.ko] undefined! Normally, Kconfig prevents this helper from being selected when CONFIG_MMU=n. However, in some randconfig builds (such as those used by 0day CI), select statements can override unmet dependencies, triggering the issue. Add an explicit dependency on MMU to DRM_PIXPAPER to prevent this. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202510280213.0rlYA4T3-lkp@intel.com/ Fixes: 0c4932f6ddf8 ("drm/tiny: pixpaper: Fix missing dependency on DRM_GEM_SHMEM_HELPER") Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: LiangCheng Wang <zaq14760@gmail.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20251028-bar-v1-1-edfbd13fafff@gmail.com
2025-11-06drm/nouveau: Advertise correct modifiers on GB20xJames Jones
8 and 16 bit formats use a different layout on GB20x than they did on prior chips. Add the corresponding DRM format modifiers to the list of modifiers supported by the display engine on such chips, and filter the supported modifiers for each format based on its bytes per pixel in nv50_plane_format_mod_supported(). Note this logic will need to be updated when GB10 support is added, since it is a GB20x chip that uses the pre-GB20x sector layout for all formats. Fixes: 6cc6e08d4542 ("drm/nouveau/kms: add support for GB20x") Signed-off-by: James Jones <jajones@nvidia.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251030181153.1208-3-jajones@nvidia.com
2025-11-06drm/nouveau: set DMA mask before creating the flush pageTimur Tabi
Set the DMA mask before calling nvkm_device_ctor(), so that when the flush page is created in nvkm_fb_ctor(), the allocation will not fail if the page is outside of DMA address space, which can easily happen if IOMMU is disable. In such situations, you will get an error like this: nouveau 0000:65:00.0: DMA addr 0x0000000107c56000+4096 overflow (mask ffffffff, bus limit 0). Commit 38f5359354d4 ("rm/nouveau/pci: set streaming DMA mask early") set the mask after calling nvkm_device_ctor(), but back then there was no flush page being created, which might explain why the mask wasn't set earlier. Flush page allocation was added in commit 5728d064190e ("drm/nouveau/fb: handle sysmem flush page from common code"). nvkm_fb_ctor() calls alloc_page(), which can allocate a page anywhere in system memory, but then calls dma_map_page() on that page. But since the DMA mask is still set to 32, the map can fail if the page is allocated above 4GB. This is easy to reproduce on systems with a lot of memory and IOMMU disabled. An alternative approach would be to force the allocation of the flush page to low memory, by specifying __GFP_DMA32. However, this would always allocate the page in low memory, even though the hardware can access high memory. Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20251014174512.3172102-1-ttabi@nvidia.com
2025-11-05drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cbPierre-Eric Pelloux-Prayer
The Mesa issue referenced below pointed out a possible deadlock: [ 1231.611031] Possible interrupt unsafe locking scenario: [ 1231.611033] CPU0 CPU1 [ 1231.611034] ---- ---- [ 1231.611035] lock(&xa->xa_lock#17); [ 1231.611038] local_irq_disable(); [ 1231.611039] lock(&fence->lock); [ 1231.611041] lock(&xa->xa_lock#17); [ 1231.611044] <Interrupt> [ 1231.611045] lock(&fence->lock); [ 1231.611047] *** DEADLOCK *** In this example, CPU0 would be any function accessing job->dependencies through the xa_* functions that don't disable interrupts (eg: drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()). CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling callback so in an interrupt context. It will deadlock when trying to grab the xa_lock which is already held by CPU0. Replacing all xa_* usage by their xa_*_irq counterparts would fix this issue, but Christian pointed out another issue: dma_fence_signal takes fence.lock and so does dma_fence_add_callback. dma_fence_signal() // locks f1.lock -> drm_sched_entity_kill_jobs_cb() -> foreach dependencies -> dma_fence_add_callback() // locks f2.lock This will deadlock if f1 and f2 share the same spinlock. To fix both issues, the code iterating on dependencies and re-arming them is moved out to drm_sched_entity_kill_jobs_work(). Cc: stable@vger.kernel.org # v6.2+ Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini") Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908 Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> [phasta: commit message nits] Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patch.msgid.link/20251104095358.15092-1-pierre-eric.pelloux-prayer@amd.com
2025-11-04drm/amd/display: Fix NULL deref in debugfs odm_combine_segmentsRong Zhang
When a connector is connected but inactive (e.g., disabled by desktop environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading odm_combine_segments causes kernel NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 16 UID: 0 PID: 26474 Comm: cat Not tainted 6.17.0+ #2 PREEMPT(lazy) e6a17af9ee6db7c63e9d90dbe5b28ccab67520c6 Hardware name: LENOVO 21Q4/LNVNB161216, BIOS PXCN25WW 03/27/2025 RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu] Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00> RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0 PKRU: 55555554 Call Trace: <TASK> seq_read_iter+0x125/0x490 ? __alloc_frozen_pages_noprof+0x18f/0x350 seq_read+0x12c/0x170 full_proxy_read+0x51/0x80 vfs_read+0xbc/0x390 ? __handle_mm_fault+0xa46/0xef0 ? do_syscall_64+0x71/0x900 ksys_read+0x73/0xf0 do_syscall_64+0x71/0x900 ? count_memcg_events+0xc2/0x190 ? handle_mm_fault+0x1d7/0x2d0 ? do_user_addr_fault+0x21a/0x690 ? exc_page_fault+0x7e/0x1a0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 RIP: 0033:0x7f44d4031687 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00> RSP: 002b:00007ffdb4b5f0b0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 00007f44d3f9f740 RCX: 00007f44d4031687 RDX: 0000000000040000 RSI: 00007f44d3f5e000 RDI: 0000000000000003 RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007f44d3f5e000 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000 </TASK> Modules linked in: tls tcp_diag inet_diag xt_mark ccm snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device x> snd_hda_codec_atihdmi snd_hda_codec_realtek_lib lenovo_wmi_helpers think_lmi snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core kvm snd_compress uvcvideo sn> platform_profile joydev amd_pmc mousedev mac_hid sch_fq_codel uinput i2c_dev parport_pc ppdev lp parport nvme_fabrics loop nfnetlink ip_tables x_tables dm_cryp> CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu] Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00> RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0 PKRU: 55555554 Fix this by checking pipe_ctx->stream_res.tg before dereferencing. Fixes: 07926ba8a44f ("drm/amd/display: Add debugfs interface for ODM combine info") Signed-off-by: Rong Zhang <i@rong.moe> Reviewed-by: Mario Limoncello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f19bbecd34e3c15eed7e5e593db2ac0fc7a0e6d8) Cc: stable@vger.kernel.org
2025-11-04drm/amdkfd: Don't clear PT after process killedPhilip Yang
If process is killed. the vm entity is stopped, submit pt update job will trigger the error message "*ERROR* Trying to push to a killed entity", job will not execute. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 10c382ec6c6d1e11975a11962bec21cba6360391) Cc: stable@vger.kernel.org
2025-11-04drm/amdgpu/smu: Handle S0ix for vangoghAlex Deucher
Fix the flows for S0ix. There is no need to stop rlc or reintialize PMFW in S0ix. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reported-by: Antheas Kapenekakis <lkml@antheas.dev> Tested-by: Antheas Kapenekakis <lkml@antheas.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fd39b5a5830d8f2553e0c09d4d50bdff28b10080) Cc: <stable@vger.kernel.org> # c81f5cebe849: drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend() Cc: <stable@vger.kernel.org>
2025-11-04drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend()Alex Deucher
For S3 on vangogh, PMFW needs to be notified before the driver powers down RLC. This already happens in smu_disable_dpms() so drop the superfluous call in amdgpu_device_suspend(). Co-developed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 960e30a61e1a7ca5341a6cf9481e770e1cda24aa)
2025-11-04drm/amd/display: Fix black screen with HDMI outputsAlex Hung
[Why & How] This fixes the black screen issue on certain APUs with HDMI, accompanied by the following messages: amdgpu 0000:c4:00.0: amdgpu: [drm] Failed to setup vendor info frame on connector DP-1: -22 amdgpu 0000:c4:00.0: [drm] Cannot find any crtc or sizes [drm] Cannot find any crtc or sizes Fixes: 489f0f600ce2 ("drm/amd/display: Fix DVI-D/HDMI adapters") Suggested-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 678c901443a6d2e909e3b51331a20f9d8f84ce82)
2025-11-04drm/amd/display: Don't stretch non-native images by default in eDPMario Limonciello (AMD)
commit 978fa2f6d0b12 ("drm/amd/display: Use scaling for non-native resolutions on eDP") started using the GPU scaler hardware to scale when a non-native resolution was picked on eDP. This scaling was done to fill the screen instead of maintain aspect ratio. The idea was supposed to be that if a different scaling behavior is preferred then the compositor would request it. The not following aspect ratio behavior however isn't desirable, so adjust it to follow aspect ratio and still try to fill screen. Note: This will lead to black bars in some cases for non-native resolutions. Compositors can request the previous behavior if desired. Fixes: 978fa2f6d0b1 ("drm/amd/display: Use scaling for non-native resolutions on eDP") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4538 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 825df7ff4bb1a383ad4827545e09aec60d230770)
2025-11-04drm/amd/pm: fix missing device_attr cleanup in amdgpu_pm_sysfs_init()Yang Wang
Use the correct label to complete all cleanup work. Fixes: 4d154b1ca580 ("drm/amd/pm: Add support for DPM policies") Fixes: 25e82f2e2c59 ("drm/amd/pm: Add temperature metrics sysfs entry") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4c4c138a1c86775c4d47e24f26357a1f8b64d0a3)
2025-11-04drm/amdgpu: set default gfx reset masks for gfx6-8Alex Deucher
These were not set so soft recovery was inadvertantly disabled. Fixes: 6ac55eab4fc4 ("drm/amdgpu: move reset support type checks into the caller") Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1972763505d728c604b537180727ec8132e619df)
2025-11-03drm/i915: Fix conversion between clock ticks and nanosecondsUmesh Nerlige Ramappa
When tick values are large, the multiplication by NSEC_PER_SEC is larger than 64 bits and results in bad conversions. The issue is seen in PMU busyness counters that look like they have wrapped around due to bad conversion. i915 PMU implementation returns monotonically increasing counters. If a count is lesser than previous one, it will only return the larger value until the smaller value catches up. The user will see this as zero delta between two measurements even though the engines are busy. Fix it by using mul_u64_u32_div() Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14955 Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/20251016000350.1152382-2-umesh.nerlige.ramappa@intel.com (cherry picked from commit 2ada9cb1df3f5405a01d013b708b1b0914efccfe) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo: Added the Fixes tag while cherry-picking to fixes]
2025-11-03drm/i915: Avoid lock inversion when pinning to GGTT on CHV/BXT+VTDJanusz Krzysztofik
On completion of i915_vma_pin_ww(), a synchronous variant of dma_fence_work_commit() is called. When pinning a VMA to GGTT address space on a Cherry View family processor, or on a Broxton generation SoC with VTD enabled, i.e., when stop_machine() is then called from intel_ggtt_bind_vma(), that can potentially lead to lock inversion among reservation_ww and cpu_hotplug locks. [86.861179] ====================================================== [86.861193] WARNING: possible circular locking dependency detected [86.861209] 6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ #1 Tainted: G U [86.861226] ------------------------------------------------------ [86.861238] i915_module_loa/1432 is trying to acquire lock: [86.861252] ffffffff83489090 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x1c/0x50 [86.861290] but task is already holding lock: [86.861303] ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915] [86.862233] which lock already depends on the new lock. [86.862251] the existing dependency chain (in reverse order) is: [86.862265] -> #5 (reservation_ww_class_mutex){+.+.}-{3:3}: [86.862292] dma_resv_lockdep+0x19a/0x390 [86.862315] do_one_initcall+0x60/0x3f0 [86.862334] kernel_init_freeable+0x3cd/0x680 [86.862353] kernel_init+0x1b/0x200 [86.862369] ret_from_fork+0x47/0x70 [86.862383] ret_from_fork_asm+0x1a/0x30 [86.862399] -> #4 (reservation_ww_class_acquire){+.+.}-{0:0}: [86.862425] dma_resv_lockdep+0x178/0x390 [86.862440] do_one_initcall+0x60/0x3f0 [86.862454] kernel_init_freeable+0x3cd/0x680 [86.862470] kernel_init+0x1b/0x200 [86.862482] ret_from_fork+0x47/0x70 [86.862495] ret_from_fork_asm+0x1a/0x30 [86.862509] -> #3 (&mm->mmap_lock){++++}-{3:3}: [86.862531] down_read_killable+0x46/0x1e0 [86.862546] lock_mm_and_find_vma+0xa2/0x280 [86.862561] do_user_addr_fault+0x266/0x8e0 [86.862578] exc_page_fault+0x8a/0x2f0 [86.862593] asm_exc_page_fault+0x27/0x30 [86.862607] filldir64+0xeb/0x180 [86.862620] kernfs_fop_readdir+0x118/0x480 [86.862635] iterate_dir+0xcf/0x2b0 [86.862648] __x64_sys_getdents64+0x84/0x140 [86.862661] x64_sys_call+0x1058/0x2660 [86.862675] do_syscall_64+0x91/0xe90 [86.862689] entry_SYSCALL_64_after_hwframe+0x76/0x7e [86.862703] -> #2 (&root->kernfs_rwsem){++++}-{3:3}: [86.862725] down_write+0x3e/0xf0 [86.862738] kernfs_add_one+0x30/0x3c0 [86.862751] kernfs_create_dir_ns+0x53/0xb0 [86.862765] internal_create_group+0x134/0x4c0 [86.862779] sysfs_create_group+0x13/0x20 [86.862792] topology_add_dev+0x1d/0x30 [86.862806] cpuhp_invoke_callback+0x4b5/0x850 [86.862822] cpuhp_issue_call+0xbf/0x1f0 [86.862836] __cpuhp_setup_state_cpuslocked+0x111/0x320 [86.862852] __cpuhp_setup_state+0xb0/0x220 [86.862866] topology_sysfs_init+0x30/0x50 [86.862879] do_one_initcall+0x60/0x3f0 [86.862893] kernel_init_freeable+0x3cd/0x680 [86.862908] kernel_init+0x1b/0x200 [86.862921] ret_from_fork+0x47/0x70 [86.862934] ret_from_fork_asm+0x1a/0x30 [86.862947] -> #1 (cpuhp_state_mutex){+.+.}-{3:3}: [86.862969] __mutex_lock+0xaa/0xed0 [86.862982] mutex_lock_nested+0x1b/0x30 [86.862995] __cpuhp_setup_state_cpuslocked+0x67/0x320 [86.863012] __cpuhp_setup_state+0xb0/0x220 [86.863026] page_alloc_init_cpuhp+0x2d/0x60 [86.863041] mm_core_init+0x22/0x2d0 [86.863054] start_kernel+0x576/0xbd0 [86.863068] x86_64_start_reservations+0x18/0x30 [86.863084] x86_64_start_kernel+0xbf/0x110 [86.863098] common_startup_64+0x13e/0x141 [86.863114] -> #0 (cpu_hotplug_lock){++++}-{0:0}: [86.863135] __lock_acquire+0x1635/0x2810 [86.863152] lock_acquire+0xc4/0x2f0 [86.863166] cpus_read_lock+0x41/0x100 [86.863180] stop_machine+0x1c/0x50 [86.863194] bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915] [86.863987] intel_ggtt_bind_vma+0x43/0x70 [i915] [86.864735] __vma_bind+0x55/0x70 [i915] [86.865510] fence_work+0x26/0xa0 [i915] [86.866248] fence_notify+0xa1/0x140 [i915] [86.866983] __i915_sw_fence_complete+0x8f/0x270 [i915] [86.867719] i915_sw_fence_commit+0x39/0x60 [i915] [86.868453] i915_vma_pin_ww+0x462/0x1360 [i915] [86.869228] i915_vma_pin.constprop.0+0x133/0x1d0 [i915] [86.870001] initial_plane_vma+0x307/0x840 [i915] [86.870774] intel_initial_plane_config+0x33f/0x670 [i915] [86.871546] intel_display_driver_probe_nogem+0x1c6/0x260 [i915] [86.872330] i915_driver_probe+0x7fa/0xe80 [i915] [86.873057] i915_pci_probe+0xe6/0x220 [i915] [86.873782] local_pci_probe+0x47/0xb0 [86.873802] pci_device_probe+0xf3/0x260 [86.873817] really_probe+0xf1/0x3c0 [86.873833] __driver_probe_device+0x8c/0x180 [86.873848] driver_probe_device+0x24/0xd0 [86.873862] __driver_attach+0x10f/0x220 [86.873876] bus_for_each_dev+0x7f/0xe0 [86.873892] driver_attach+0x1e/0x30 [86.873904] bus_add_driver+0x151/0x290 [86.873917] driver_register+0x5e/0x130 [86.873931] __pci_register_driver+0x7d/0x90 [86.873945] i915_pci_register_driver+0x23/0x30 [i915] [86.874678] i915_init+0x37/0x120 [i915] [86.875347] do_one_initcall+0x60/0x3f0 [86.875369] do_init_module+0x97/0x2a0 [86.875385] load_module+0x2c54/0x2d80 [86.875398] init_module_from_file+0x96/0xe0 [86.875413] idempotent_init_module+0x117/0x330 [86.875426] __x64_sys_finit_module+0x77/0x100 [86.875440] x64_sys_call+0x24de/0x2660 [86.875454] do_syscall_64+0x91/0xe90 [86.875470] entry_SYSCALL_64_after_hwframe+0x76/0x7e [86.875486] other info that might help us debug this: [86.875502] Chain exists of: cpu_hotplug_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex [86.875539] Possible unsafe locking scenario: [86.875552] CPU0 CPU1 [86.875563] ---- ---- [86.875573] lock(reservation_ww_class_mutex); [86.875588] lock(reservation_ww_class_acquire); [86.875606] lock(reservation_ww_class_mutex); [86.875624] rlock(cpu_hotplug_lock); [86.875637] *** DEADLOCK *** [86.875650] 3 locks held by i915_module_loa/1432: [86.875663] #0: ffff888101f5c1b0 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x104/0x220 [86.875699] #1: ffffc90002e0b4a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915] [86.876512] #2: ffffc90002e0b4c8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.0+0x39/0x1d0 [i915] [86.877305] stack backtrace: [86.877326] CPU: 0 UID: 0 PID: 1432 Comm: i915_module_loa Tainted: G U 6.15.0-rc5-CI_DRM_16515-gca0305cadc2d+ #1 PREEMPT(voluntary) [86.877334] Tainted: [U]=USER [86.877336] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0079.2020.0420.1316 04/20/2020 [86.877339] Call Trace: [86.877344] <TASK> [86.877353] dump_stack_lvl+0x91/0xf0 [86.877364] dump_stack+0x10/0x20 [86.877369] print_circular_bug+0x285/0x360 [86.877379] check_noncircular+0x135/0x150 [86.877390] __lock_acquire+0x1635/0x2810 [86.877403] lock_acquire+0xc4/0x2f0 [86.877408] ? stop_machine+0x1c/0x50 [86.877422] ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915] [86.878173] cpus_read_lock+0x41/0x100 [86.878182] ? stop_machine+0x1c/0x50 [86.878191] ? __pfx_bxt_vtd_ggtt_insert_entries__cb+0x10/0x10 [i915] [86.878916] stop_machine+0x1c/0x50 [86.878927] bxt_vtd_ggtt_insert_entries__BKL+0x3b/0x60 [i915] [86.879652] intel_ggtt_bind_vma+0x43/0x70 [i915] [86.880375] __vma_bind+0x55/0x70 [i915] [86.881133] fence_work+0x26/0xa0 [i915] [86.881851] fence_notify+0xa1/0x140 [i915] [86.882566] __i915_sw_fence_complete+0x8f/0x270 [i915] [86.883286] i915_sw_fence_commit+0x39/0x60 [i915] [86.884003] i915_vma_pin_ww+0x462/0x1360 [i915] [86.884756] ? i915_vma_pin.constprop.0+0x6c/0x1d0 [i915] [86.885513] i915_vma_pin.constprop.0+0x133/0x1d0 [i915] [86.886281] initial_plane_vma+0x307/0x840 [i915] [86.887049] intel_initial_plane_config+0x33f/0x670 [i915] [86.887819] intel_display_driver_probe_nogem+0x1c6/0x260 [i915] [86.888587] i915_driver_probe+0x7fa/0xe80 [i915] [86.889293] ? mutex_unlock+0x12/0x20 [86.889301] ? drm_privacy_screen_get+0x171/0x190 [86.889308] ? acpi_dev_found+0x66/0x80 [86.889321] i915_pci_probe+0xe6/0x220 [i915] [86.890038] local_pci_probe+0x47/0xb0 [86.890049] pci_device_probe+0xf3/0x260 [86.890058] really_probe+0xf1/0x3c0 [86.890067] __driver_probe_device+0x8c/0x180 [86.890072] driver_probe_device+0x24/0xd0 [86.890078] __driver_attach+0x10f/0x220 [86.890083] ? __pfx___driver_attach+0x10/0x10 [86.890088] bus_for_each_dev+0x7f/0xe0 [86.890097] driver_attach+0x1e/0x30 [86.890101] bus_add_driver+0x151/0x290 [86.890107] driver_register+0x5e/0x130 [86.890113] __pci_register_driver+0x7d/0x90 [86.890119] i915_pci_register_driver+0x23/0x30 [i915] [86.890833] i915_init+0x37/0x120 [i915] [86.891482] ? __pfx_i915_init+0x10/0x10 [i915] [86.892135] do_one_initcall+0x60/0x3f0 [86.892145] ? __kmalloc_cache_noprof+0x33f/0x470 [86.892157] do_init_module+0x97/0x2a0 [86.892164] load_module+0x2c54/0x2d80 [86.892168] ? __kernel_read+0x15c/0x300 [86.892185] ? kernel_read_file+0x2b1/0x320 [86.892195] init_module_from_file+0x96/0xe0 [86.892199] ? init_module_from_file+0x96/0xe0 [86.892211] idempotent_init_module+0x117/0x330 [86.892224] __x64_sys_finit_module+0x77/0x100 [86.892230] x64_sys_call+0x24de/0x2660 [86.892236] do_syscall_64+0x91/0xe90 [86.892243] ? irqentry_exit+0x77/0xb0 [86.892249] ? sysvec_apic_timer_interrupt+0x57/0xc0 [86.892256] entry_SYSCALL_64_after_hwframe+0x76/0x7e [86.892261] RIP: 0033:0x7303e1b2725d [86.892271] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48 [86.892276] RSP: 002b:00007ffddd1fdb38 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [86.892281] RAX: ffffffffffffffda RBX: 00005d771d88fd90 RCX: 00007303e1b2725d [86.892285] RDX: 0000000000000000 RSI: 00005d771d893aa0 RDI: 000000000000000c [86.892287] RBP: 00007ffddd1fdbf0 R08: 0000000000000040 R09: 00007ffddd1fdb80 [86.892289] R10: 00007303e1c03b20 R11: 0000000000000246 R12: 00005d771d893aa0 [86.892292] R13: 0000000000000000 R14: 00005d771d88f0d0 R15: 00005d771d895710 [86.892304] </TASK> Call asynchronous variant of dma_fence_work_commit() in that case. v3: Provide more verbose in-line comment (Andi), - mention target environments in commit message. Fixes: 7d1c2618eac59 ("drm/i915: Take reservation lock around i915_vma_pin.") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14985 Cc: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com> Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com> Acked-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20251023082925.351307-6-janusz.krzysztofik@linux.intel.com (cherry picked from commit 648ef1324add1c2e2b6041cdf0b28d31fbca5f13) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>