summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm
AgeCommit message (Collapse)Author
41 hoursMerge tag 'drm-xe-fixes-2026-03-19' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - A number of teardown fixes (Daniele, Matt Brost, Zhanjun, Ashutosh) - Skip over non-leaf PTE for PRL generation (Brian) - Fix an unitialized variable (Umesh) - Fix a missing runtime PM reference (Sanjay) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/abxj4_dBHYBiSvDG@fedora
42 hoursMerge tag 'amd-drm-fixes-7.0-2026-03-19' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-7.0-2026-03-19: amdgpu: - Fix gamma 2.2 colorop TFs - BO list fix - LTO fix - DC FP fix - DisplayID handling fix - DCN 2.01 fix - MMHUB boundary fixes - ISP fix - TLB fence fix - Hainan pm fix radeon: - Hainan pm fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260319131013.36639-1-alexander.deucher@amd.com
42 hoursMerge tag 'drm-misc-fixes-2026-03-19' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes A doc warning fix and a memory leak fix for vmwgfx, a deadlock fix and interrupt handling fixes for imagination, a locking fix for pagemap_until, a UAF fix for drm_dev_unplug, and a multi-channel audio handling fix for dw-hdmi-qp. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patch.msgid.link/20260319-lush-righteous-malamute-e7bb98@houat
3 daysdrm/xe: Fix missing runtime PM reference in ccs_mode_storeSanjay Yadav
ccs_mode_store() calls xe_gt_reset() which internally invokes xe_pm_runtime_get_noresume(). That function requires the caller to already hold an outer runtime PM reference and warns if none is held: [46.891177] xe 0000:03:00.0: [drm] Missing outer runtime PM protection [46.891178] WARNING: drivers/gpu/drm/xe/xe_pm.c:885 at xe_pm_runtime_get_noresume+0x8b/0xc0 Fix this by protecting xe_gt_reset() with the scope-based guard(xe_pm_runtime)(xe), which is the preferred form when the reference lifetime matches a single scope. v2: - Use scope-based guard(xe_pm_runtime)(xe) (Shuicheng) - Update commit message accordingly Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7593 Fixes: 480b358e7d8e ("drm/xe: Do not wake device during a GT reset") Cc: <stable@vger.kernel.org> # v6.19+ Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260313071608.3459480-2-sanjay.kumar.yadav@intel.com (cherry picked from commit 7937ea733f79b3f25e802a0c8360bf7423856f36) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
3 daysdrm/xe: Open-code GGTT MMIO access protectionMatthew Brost
GGTT MMIO access is currently protected by hotplug (drm_dev_enter), which works correctly when the driver loads successfully and is later unbound or unloaded. However, if driver load fails, this protection is insufficient because drm_dev_unplug() is never called. Additionally, devm release functions cannot guarantee that all BOs with GGTT mappings are destroyed before the GGTT MMIO region is removed, as some BOs may be freed asynchronously by worker threads. To address this, introduce an open-coded flag, protected by the GGTT lock, that guards GGTT MMIO access. The flag is cleared during the dev_fini_ggtt devm release function to ensure MMIO access is disabled once teardown begins. Cc: stable@vger.kernel.org Fixes: 919bb54e989c ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-8-zhanjun.dong@intel.com (cherry picked from commit 4f3a998a173b4325c2efd90bdadc6ccd3ad9a431) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
3 daysdrm/xe/lrc: Fix uninitialized new_ts when capturing context timestampUmesh Nerlige Ramappa
Getting engine specific CTX TIMESTAMP register can fail. In that case, if the context is active, new_ts is uninitialized. Fix that case by initializing new_ts to the last value that was sampled in SW - lrc->ctx_timestamp. Flagged by static analysis. v2: Fix new_ts initialization (Ashutosh) Fixes: bb63e7257e63 ("drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/20260312125308.3126607-2-umesh.nerlige.ramappa@intel.com (cherry picked from commit 466e75d48038af252187855058a7a9312db9d2f8) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
3 daysdrm/xe/oa: Allow reading after disabling OA streamAshutosh Dixit
Some OA data might be present in the OA buffer when OA stream is disabled. Allow UMD's to retrieve this data, so that all data till the point when OA stream is disabled can be retrieved. v2: Update tail pointer after disable (Umesh) Fixes: efb315d0a013 ("drm/xe/oa/uapi: Read file_operation") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa<umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20260313053630.3176100-1-ashutosh.dixit@intel.com (cherry picked from commit 4ff57c5e8dbba23b5457be12f9709d5c016da16e) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
3 daysdrm/xe: Skip over non leaf pte for PRL generationBrian Nguyen
The check using xe_child->base.children was insufficient in determining if a pte was a leaf node. So explicitly skip over every non-leaf pt and conditionally abort if there is a scenario where a non-leaf pt is interleaved between leaf pt, which results in the page walker skipping over some leaf pt. Note that the behavior being targeted for abort is PD[0] = 2M PTE PD[1] = PT -> 512 4K PTEs PD[2] = 2M PTE results in abort, page walker won't descend PD[1]. With new abort, ensuring valid PRL before handling a second abort. v2: - Revert to previous assert. - Revised non-leaf handling for interleaf child pt and leaf pte. - Update comments to specifications. (Stuart) - Remove unnecessary XE_PTE_PS64. (Matthew B) v3: - Modify secondary abort to only check non-leaf PTEs. (Matthew B) Fixes: b912138df299 ("drm/xe: Create page reclaim list on unbind") Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260305171546.67691-6-brian3.nguyen@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 1d123587525db86cc8f0d2beb35d9e33ca3ade83) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
3 daysdrm/xe/guc: Ensure CT state transitions via STOP before DISABLEDZhanjun Dong
The GuC CT state transition requires moving to the STOP state before entering the DISABLED state. Update the driver teardown sequence to make the proper state machine transitions. Fixes: ee4b32220a6b ("drm/xe/guc: Add devm release action to safely tear down CT") Cc: stable@vger.kernel.org Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-6-zhanjun.dong@intel.com (cherry picked from commit dace8cb0032f57ea67c87b3b92ad73c89dd2db44) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
3 daysdrm/xe: Trigger queue cleanup if not in wedged mode 2Matthew Brost
The intent of wedging a device is to allow queues to continue running only in wedged mode 2. In other modes, queues should initiate cleanup and signal all remaining fences. Fix xe_guc_submit_wedge to correctly clean up queues when wedge mode != 2. Fixes: 7dbe8af13c18 ("drm/xe: Wedge the entire device") Cc: stable@vger.kernel.org Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-4-zhanjun.dong@intel.com (cherry picked from commit e25ba41c8227c5393c16e4aab398076014bd345f) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
3 daysdrm/xe: Forcefully tear down exec queues in GuC submit finiMatthew Brost
In GuC submit fini, forcefully tear down any exec queues by disabling CTs, stopping the scheduler (which cleans up lost G2H), killing all remaining queues, and resuming scheduling to allow any remaining cleanup actions to complete and signal any remaining fences. Split guc_submit_fini into device related and software only part. Using device-managed and drm-managed action guarantees the correct ordering of cleanup. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-3-zhanjun.dong@intel.com (cherry picked from commit a6ab444a111a59924bd9d0c1e0613a75a0a40b89) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
3 daysdrm/xe: Always kill exec queues in xe_guc_submit_pause_abortMatthew Brost
xe_guc_submit_pause_abort is intended to be called after something disastrous occurs (e.g., VF migration fails, device wedging, or driver unload) and should immediately trigger the teardown of remaining submission state. With that, kill any remaining queues in this function. Fixes: 7c4b7e34c83b ("drm/xe/vf: Abort VF post migration recovery on failure") Cc: stable@vger.kernel.org Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-2-zhanjun.dong@intel.com (cherry picked from commit 78f3bf00be4f15daead02ba32d4737129419c902) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
3 daysdrm/xe/guc: Fail immediately on GuC load errorDaniele Ceraolo Spurio
By using the same variable for both the return of poll_timeout_us and the return of the polled function guc_wait_ucode, the return value of the latter is overwritten and lost after exiting the polling loop. Since guc_wait_ucode returns -1 on GuC load failure, we lose that information and always continue as if the GuC had been loaded correctly. This is fixed by simply using 2 separate variables. Fixes: a4916b4da448 ("drm/xe/guc: Refactor GuC load to use poll_timeout_us()") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20260303001732.2540493-2-daniele.ceraolospurio@intel.com (cherry picked from commit c85ec5c5753a46b5c2aea1292536487be9470ffe) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
4 daysdrm/i915/gt: Check set_default_submission() before deferencingRahul Bukte
When the i915 driver firmware binaries are not present, the set_default_submission pointer is not set. This pointer is dereferenced during suspend anyways. Add a check to make sure it is set before dereferencing. [ 23.289926] PM: suspend entry (deep) [ 23.293558] Filesystems sync: 0.000 seconds [ 23.298010] Freezing user space processes [ 23.302771] Freezing user space processes completed (elapsed 0.000 seconds) [ 23.309766] OOM killer disabled. [ 23.313027] Freezing remaining freezable tasks [ 23.318540] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 23.342038] serial 00:05: disabled [ 23.345719] serial 00:02: disabled [ 23.349342] serial 00:01: disabled [ 23.353782] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 23.358993] sd 1:0:0:0: [sdb] Synchronizing SCSI cache [ 23.361635] ata1.00: Entering standby power mode [ 23.368863] ata2.00: Entering standby power mode [ 23.445187] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 23.452194] #PF: supervisor instruction fetch in kernel mode [ 23.457896] #PF: error_code(0x0010) - not-present page [ 23.463065] PGD 0 P4D 0 [ 23.465640] Oops: Oops: 0010 [#1] SMP NOPTI [ 23.469869] CPU: 8 UID: 0 PID: 211 Comm: kworker/u48:18 Tainted: G S W 6.19.0-rc4-00020-gf0b9d8eb98df #10 PREEMPT(voluntary) [ 23.482512] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN [ 23.496511] Workqueue: async async_run_entry_fn [ 23.501087] RIP: 0010:0x0 [ 23.503755] Code: Unable to access opcode bytes at 0xffffffffffffffd6. [ 23.510324] RSP: 0018:ffffb4a60065fca8 EFLAGS: 00010246 [ 23.515592] RAX: 0000000000000000 RBX: ffff9f428290e000 RCX: 000000000000000f [ 23.522765] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff9f428290e000 [ 23.529937] RBP: ffff9f4282907070 R08: ffff9f4281130428 R09: 00000000ffffffff [ 23.537111] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f42829070f8 [ 23.544284] R13: ffff9f4282906028 R14: ffff9f4282900000 R15: ffff9f4282906b68 [ 23.551457] FS: 0000000000000000(0000) GS:ffff9f466b2cf000(0000) knlGS:0000000000000000 [ 23.559588] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 23.565365] CR2: ffffffffffffffd6 CR3: 000000031c230001 CR4: 0000000000f70ef0 [ 23.572539] PKRU: 55555554 [ 23.575281] Call Trace: [ 23.577770] <TASK> [ 23.579905] intel_engines_reset_default_submission+0x42/0x60 [ 23.585695] __intel_gt_unset_wedged+0x191/0x200 [ 23.590360] intel_gt_unset_wedged+0x20/0x40 [ 23.594675] gt_sanitize+0x15e/0x170 [ 23.598290] i915_gem_suspend_late+0x6b/0x180 [ 23.602692] i915_drm_suspend_late+0x35/0xf0 [ 23.607008] ? __pfx_pci_pm_suspend_late+0x10/0x10 [ 23.611843] dpm_run_callback+0x78/0x1c0 [ 23.615817] device_suspend_late+0xde/0x2e0 [ 23.620037] async_suspend_late+0x18/0x30 [ 23.624082] async_run_entry_fn+0x25/0xa0 [ 23.628129] process_one_work+0x15b/0x380 [ 23.632182] worker_thread+0x2a5/0x3c0 [ 23.635973] ? __pfx_worker_thread+0x10/0x10 [ 23.640279] kthread+0xf6/0x1f0 [ 23.643464] ? __pfx_kthread+0x10/0x10 [ 23.647263] ? __pfx_kthread+0x10/0x10 [ 23.651045] ret_from_fork+0x131/0x190 [ 23.654837] ? __pfx_kthread+0x10/0x10 [ 23.658634] ret_from_fork_asm+0x1a/0x30 [ 23.662597] </TASK> [ 23.664826] Modules linked in: [ 23.667914] CR2: 0000000000000000 [ 23.671271] ------------[ cut here ]------------ Signed-off-by: Rahul Bukte <rahul.bukte@sony.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260203044839.1555147-1-suraj.kandpal@intel.com (cherry picked from commit daa199abc3d3d1740c9e3a2c3e9216ae5b447cad) Fixes: ff44ad51ebf8 ("drm/i915: Move engine->submit_request selection to a vfunc") Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
4 daysdrm/radeon: apply state adjust rules to some additional HAINAN vairantsAlex Deucher
They need a similar workaround. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1839 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 87327658c848f56eac166cb382b57b83bf06c5ac) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu: apply state adjust rules to some additional HAINAN vairantsAlex Deucher
They need a similar workaround. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1839 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0de31d92a173d3d94f28051b0b80a6c98913aed4) Cc: stable@vger.kernel.org
4 daysdrm/amdgpu: rework how we handle TLB fencesAlex Deucher
Add a new VM flag to indicate whether or not we need a TLB fence. Userqs (KFD or KGD) require a TLB fence. A TLB fence is not strictly required for kernel queues, but it shouldn't hurt. That said, enabling this unconditionally should be fine, but it seems to tickle some issues in KIQ/MES. Only enable them for KFD, or when KGD userq queues are enabled (currently via module parameter). Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4798 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4749 Fixes: f3854e04b708 ("drm/amdgpu: attach tlb fence to the PTs update") Cc: Christian König <christian.koenig@amd.com> Cc: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 69c5fbd2b93b5ced77c6e79afe83371bca84c788) Cc: stable@vger.kernel.org
5 daysdrm/bridge: dw-hdmi-qp: fix multi-channel audio outputJonas Karlman
Channel Allocation (PB4) and Level Shift Information (PB5) are configured with values from PB1 and PB2 due to the wrong offset being used. This results in missing audio channels or incorrect speaker placement when playing multi-channel audio. Use the correct offset to fix multi-channel audio output. Fixes: fd0141d1a8a2 ("drm/bridge: synopsys: Add audio support for dw-hdmi-qp") Reported-by: Christian Hewitt <christianshewitt@gmail.com> Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patch.msgid.link/20260228112822.4056354-1-christianshewitt@gmail.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
5 daysdrm: Fix use-after-free on framebuffers and property blobs when calling ↵Maarten Lankhorst
drm_dev_unplug When trying to do a rather aggressive test of igt's "xe_module_load --r reload" with a full desktop environment and game running I noticed a few OOPSes when dereferencing freed pointers, related to framebuffers and property blobs after the compositor exits. Solve this by guarding the freeing in drm_file with drm_dev_enter/exit, and immediately put the references from struct drm_file objects during drm_dev_unplug(). Related warnings for framebuffers on the subtest: [ 739.713076] ------------[ cut here ]------------ WARN_ON(!list_empty(&dev->mode_config.fb_list)) [ 739.713079] WARNING: drivers/gpu/drm/drm_mode_config.c:584 at drm_mode_config_cleanup+0x30b/0x320 [drm], CPU#12: xe_module_load/13145 .... [ 739.713328] Call Trace: [ 739.713330] <TASK> [ 739.713335] ? intel_pmdemand_destroy_state+0x11/0x20 [xe] [ 739.713574] ? intel_atomic_global_obj_cleanup+0xe4/0x1a0 [xe] [ 739.713794] intel_display_driver_remove_noirq+0x51/0xb0 [xe] [ 739.714041] xe_display_fini_early+0x33/0x50 [xe] [ 739.714284] devm_action_release+0xf/0x20 [ 739.714294] devres_release_all+0xad/0xf0 [ 739.714301] device_unbind_cleanup+0x12/0xa0 [ 739.714305] device_release_driver_internal+0x1b7/0x210 [ 739.714311] device_driver_detach+0x14/0x20 [ 739.714315] unbind_store+0xa6/0xb0 [ 739.714319] drv_attr_store+0x21/0x30 [ 739.714322] sysfs_kf_write+0x48/0x60 [ 739.714328] kernfs_fop_write_iter+0x16b/0x240 [ 739.714333] vfs_write+0x266/0x520 [ 739.714341] ksys_write+0x72/0xe0 [ 739.714345] __x64_sys_write+0x19/0x20 [ 739.714347] x64_sys_call+0xa15/0xa30 [ 739.714355] do_syscall_64+0xd8/0xab0 [ 739.714361] entry_SYSCALL_64_after_hwframe+0x4b/0x53 and [ 739.714459] ------------[ cut here ]------------ [ 739.714461] xe 0000:67:00.0: [drm] drm_WARN_ON(!list_empty(&fb->filp_head)) [ 739.714464] WARNING: drivers/gpu/drm/drm_framebuffer.c:833 at drm_framebuffer_free+0x6c/0x90 [drm], CPU#12: xe_module_load/13145 [ 739.714715] RIP: 0010:drm_framebuffer_free+0x7a/0x90 [drm] ... [ 739.714869] Call Trace: [ 739.714871] <TASK> [ 739.714876] drm_mode_config_cleanup+0x26a/0x320 [drm] [ 739.714998] ? __drm_printfn_seq_file+0x20/0x20 [drm] [ 739.715115] ? drm_mode_config_cleanup+0x207/0x320 [drm] [ 739.715235] intel_display_driver_remove_noirq+0x51/0xb0 [xe] [ 739.715576] xe_display_fini_early+0x33/0x50 [xe] [ 739.715821] devm_action_release+0xf/0x20 [ 739.715828] devres_release_all+0xad/0xf0 [ 739.715843] device_unbind_cleanup+0x12/0xa0 [ 739.715850] device_release_driver_internal+0x1b7/0x210 [ 739.715856] device_driver_detach+0x14/0x20 [ 739.715860] unbind_store+0xa6/0xb0 [ 739.715865] drv_attr_store+0x21/0x30 [ 739.715868] sysfs_kf_write+0x48/0x60 [ 739.715873] kernfs_fop_write_iter+0x16b/0x240 [ 739.715878] vfs_write+0x266/0x520 [ 739.715886] ksys_write+0x72/0xe0 [ 739.715890] __x64_sys_write+0x19/0x20 [ 739.715893] x64_sys_call+0xa15/0xa30 [ 739.715900] do_syscall_64+0xd8/0xab0 [ 739.715905] entry_SYSCALL_64_after_hwframe+0x4b/0x53 and then finally file close blows up: [ 743.186530] Oops: general protection fault, probably for non-canonical address 0xdead000000000122: 0000 [#1] SMP [ 743.186535] CPU: 3 UID: 1000 PID: 3453 Comm: kwin_wayland Tainted: G W 7.0.0-rc1-valkyria+ #110 PREEMPT_{RT,(lazy)} [ 743.186537] Tainted: [W]=WARN [ 743.186538] Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS Gaming 3/X299 AORUS Gaming 3-CF, BIOS F8n 12/06/2021 [ 743.186539] RIP: 0010:drm_framebuffer_cleanup+0x55/0xc0 [drm] [ 743.186588] Code: d8 72 73 0f b6 42 05 ff c3 39 c3 72 e8 49 8d bd 50 07 00 00 31 f6 e8 3a 80 d3 e1 49 8b 44 24 10 49 8d 7c 24 08 49 8b 54 24 08 <48> 3b 38 0f 85 95 7f 02 00 48 3b 7a 08 0f 85 8b 7f 02 00 48 89 42 [ 743.186589] RSP: 0018:ffffc900085e3cf8 EFLAGS: 00010202 [ 743.186591] RAX: dead000000000122 RBX: 0000000000000001 RCX: ffffffff8217ed03 [ 743.186592] RDX: dead000000000100 RSI: 0000000000000000 RDI: ffff88814675ba08 [ 743.186593] RBP: ffffc900085e3d10 R08: 0000000000000000 R09: 0000000000000000 [ 743.186593] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88814675ba00 [ 743.186594] R13: ffff88810d778000 R14: ffff888119f6dca0 R15: ffff88810c660bb0 [ 743.186595] FS: 00007ff377d21280(0000) GS:ffff888cec3f8000(0000) knlGS:0000000000000000 [ 743.186596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 743.186596] CR2: 000055690b55e000 CR3: 0000000113586003 CR4: 00000000003706f0 [ 743.186597] Call Trace: [ 743.186598] <TASK> [ 743.186603] intel_user_framebuffer_destroy+0x12/0x90 [xe] [ 743.186722] drm_framebuffer_free+0x3a/0x90 [drm] [ 743.186750] ? trace_hardirqs_on+0x5f/0x120 [ 743.186754] drm_mode_object_put+0x51/0x70 [drm] [ 743.186786] drm_fb_release+0x105/0x190 [drm] [ 743.186812] ? rt_mutex_slowunlock+0x3aa/0x410 [ 743.186817] ? rt_spin_lock+0xea/0x1b0 [ 743.186819] drm_file_free+0x1e0/0x2c0 [drm] [ 743.186843] drm_release_noglobal+0x91/0xf0 [drm] [ 743.186865] __fput+0x100/0x2e0 [ 743.186869] fput_close_sync+0x40/0xa0 [ 743.186870] __x64_sys_close+0x3e/0x80 [ 743.186873] x64_sys_call+0xa07/0xa30 [ 743.186879] do_syscall_64+0xd8/0xab0 [ 743.186881] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 743.186882] RIP: 0033:0x7ff37e567732 [ 743.186884] Code: 08 0f 85 a1 38 ff ff 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 55 bf 01 00 [ 743.186885] RSP: 002b:00007ffc818169a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 743.186886] RAX: ffffffffffffffda RBX: 00007ffc81816a30 RCX: 00007ff37e567732 [ 743.186887] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000012 [ 743.186888] RBP: 00007ffc818169d0 R08: 0000000000000000 R09: 0000000000000000 [ 743.186889] R10: 0000000000000000 R11: 0000000000000246 R12: 000055d60a7996e0 [ 743.186889] R13: 00007ffc81816a90 R14: 00007ffc81816a90 R15: 000055d60a782a30 [ 743.186892] </TASK> [ 743.186893] Modules linked in: rfcomm snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_addrtype nft_compat x_tables nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay cfg80211 bnep mtd_intel_dg snd_hda_codec_intelhdmi mtd snd_hda_codec_hdmi nls_utf8 mxm_wmi intel_wmi_thunderbolt gigabyte_wmi wmi_bmof xe drm_gpuvm drm_gpusvm_helper i2c_algo_bit drm_buddy drm_ttm_helper ttm video drm_suballoc_helper gpu_sched drm_client_lib drm_exec drm_display_helper cec drm_kunit_helpers drm_kms_helper kunit x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_intel_dspcfg snd_soc_core snd_compress ac97_bus snd_pcm snd_seq snd_seq_device snd_timer i2c_i801 i2c_mux snd i2c_smbus btusb btrtl btbcm btmtk btintel bluetooth ecdh_generic rfkill ecc mei_me mei ioatdma dca wmi nfsd drm i2c_dev fuse nfnetlink [ 743.186938] ---[ end trace 0000000000000000 ]--- And for property blobs: void drm_mode_config_cleanup(struct drm_device *dev) { ... list_for_each_entry_safe(blob, bt, &dev->mode_config.property_blob_list, head_global) { drm_property_blob_put(blob); } Resulting in: [ 371.072940] BUG: unable to handle page fault for address: 000001ffffffffff [ 371.072944] #PF: supervisor read access in kernel mode [ 371.072945] #PF: error_code(0x0000) - not-present page [ 371.072947] PGD 0 P4D 0 [ 371.072950] Oops: Oops: 0000 [#1] SMP [ 371.072953] CPU: 0 UID: 1000 PID: 3693 Comm: kwin_wayland Not tainted 7.0.0-rc1-valkyria+ #111 PREEMPT_{RT,(lazy)} [ 371.072956] Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS Gaming 3/X299 AORUS Gaming 3-CF, BIOS F8n 12/06/2021 [ 371.072957] RIP: 0010:drm_property_destroy_user_blobs+0x3b/0x90 [drm] [ 371.073019] Code: 00 00 48 83 ec 10 48 8b 86 30 01 00 00 48 39 c3 74 59 48 89 c2 48 8d 48 c8 48 8b 00 4c 8d 60 c8 eb 04 4c 8d 60 c8 48 8b 71 40 <48> 39 16 0f 85 39 32 01 00 48 3b 50 08 0f 85 2f 32 01 00 48 89 70 [ 371.073021] RSP: 0018:ffffc90006a73de8 EFLAGS: 00010293 [ 371.073022] RAX: 000001ffffffffff RBX: ffff888118a1a930 RCX: ffff8881b92355c0 [ 371.073024] RDX: ffff8881b92355f8 RSI: 000001ffffffffff RDI: ffff888118be4000 [ 371.073025] RBP: ffffc90006a73e08 R08: ffff8881009b7300 R09: ffff888cecc5b000 [ 371.073026] R10: ffffc90006a73e90 R11: 0000000000000002 R12: 000001ffffffffc7 [ 371.073027] R13: ffff888118a1a980 R14: ffff88810b366d20 R15: ffff888118a1a970 [ 371.073028] FS: 00007f1faccbb280(0000) GS:ffff888cec2db000(0000) knlGS:0000000000000000 [ 371.073029] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 371.073030] CR2: 000001ffffffffff CR3: 000000010655c001 CR4: 00000000003706f0 [ 371.073031] Call Trace: [ 371.073033] <TASK> [ 371.073036] drm_file_free+0x1df/0x2a0 [drm] [ 371.073077] drm_release_noglobal+0x7a/0xe0 [drm] [ 371.073113] __fput+0xe2/0x2b0 [ 371.073118] fput_close_sync+0x40/0xa0 [ 371.073119] __x64_sys_close+0x3e/0x80 [ 371.073122] x64_sys_call+0xa07/0xa30 [ 371.073126] do_syscall_64+0xc0/0x840 [ 371.073130] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 371.073132] RIP: 0033:0x7f1fb3501732 [ 371.073133] Code: 08 0f 85 a1 38 ff ff 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 55 bf 01 00 [ 371.073135] RSP: 002b:00007ffe8e6f0278 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 371.073136] RAX: ffffffffffffffda RBX: 00007ffe8e6f0300 RCX: 00007f1fb3501732 [ 371.073137] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000012 [ 371.073138] RBP: 00007ffe8e6f02a0 R08: 0000000000000000 R09: 0000000000000000 [ 371.073139] R10: 0000000000000000 R11: 0000000000000246 R12: 00005585ba46eea0 [ 371.073140] R13: 00007ffe8e6f0360 R14: 00007ffe8e6f0360 R15: 00005585ba458a30 [ 371.073143] </TASK> [ 371.073144] Modules linked in: rfcomm snd_hrtimer xt_addrtype xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat x_tables nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay cfg80211 bnep snd_hda_codec_intelhdmi snd_hda_codec_hdmi mtd_intel_dg mtd nls_utf8 wmi_bmof mxm_wmi gigabyte_wmi intel_wmi_thunderbolt xe drm_gpuvm drm_gpusvm_helper i2c_algo_bit drm_buddy drm_ttm_helper ttm video drm_suballoc_helper gpu_sched drm_client_lib drm_exec drm_display_helper cec drm_kunit_helpers drm_kms_helper kunit x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_intel_dspcfg snd_soc_core snd_compress ac97_bus snd_pcm snd_seq snd_seq_device snd_timer i2c_i801 btusb i2c_mux i2c_smbus btrtl snd btbcm btmtk btintel bluetooth ecdh_generic rfkill ecc mei_me mei ioatdma dca wmi nfsd drm i2c_dev fuse nfnetlink [ 371.073198] CR2: 000001ffffffffff [ 371.073199] ---[ end trace 0000000000000000 ]--- Add a guard around file close, and ensure the warnings from drm_mode_config do not trigger. Fix those by allowing an open reference to the file descriptor and cleaning up the file linked list entry in drm_mode_config_cleanup(). Cc: <stable@vger.kernel.org> # v4.18+ Fixes: bee330f3d672 ("drm: Use srcu to protect drm_device.unplugged") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260313151728.14990-4-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
5 daysdrm/amdgpu: Fix ISP segfault issue in kernel v7.0Pratap Nirujogi
Add NULL pointer checks for dev->type before accessing dev->type->name in ISP genpd add/remove functions to prevent kernel crashes. This regression was introduced in v7.0 as the wakeup sources are registered using physical device instead of ACPI device. This led to adding wakeup source device as the first child of AMDGPU device without initializing dev-type variable, and resulted in segfault when accessed it in the amdgpu isp driver. Fixes: 057edc58aa59 ("ACPI: PM: Register wakeup sources under physical devices") Suggested-by: Bin Du <Bin.Du@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c51632d1ed7ac5aed2d40dbc0718d75342c12c6a)
5 daysdrm/amdgpu/gmc9.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Cc: Benjamin Cheng <benjamin.cheng@amd.com> Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e14d468304832bcc4a082d95849bc0a41b18ddea) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub4.2.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dea5f235baf3786bfd4fd920b03c19285fdc3d9f) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub4.1.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 04f063d85090f5dd0c671010ce88ee49d9dcc8ed) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub3.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f14f27bbe2a3ed7af32d5f6eaf3f417139f45253) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub3.0.2: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1441f52c7f6ae6553664aa9e3e4562f6fc2fe8ea) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub3.0.1: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5f76083183363c4528a4aaa593f5d38c28fe7d7b) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub2.3: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 89cd90375c19fb45138990b70e9f4ba4806f05c4) Cc: stable@vger.kernel.org
5 daysdrm/amdgpu/mmhub2.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e064cef4b53552602bb6ac90399c18f662f3cacd) Cc: stable@vger.kernel.org
5 daysdrm/amd: fix dcn 2.01 checkAndy Nguyen
The ASICREV_IS_BEIGE_GOBY_P check always took precedence, because it includes all chip revisions upto NV_UNKNOWN. Fixes: 54b822b3eac3 ("drm/amd/display: Use dce_version instead of chip_id") Signed-off-by: Andy Nguyen <theofficialflow1996@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9c7be0efa6f0daa949a5f3e3fdf9ea090b0713cb)
5 daysdrm/amd/display: Fix DisplayID not-found handling in parse_edid_displayid_vrr()Srinivasan Shanmugam
parse_edid_displayid_vrr() searches the EDID extension blocks for a DisplayID extension before parsing the dynamic video timing range. The code previously checked whether edid_ext was NULL after the search loop. However, edid_ext is assigned during each iteration of the loop, so it will never be NULL once the loop has executed. If no DisplayID extension is found, edid_ext ends up pointing to the last extension block, and the NULL check does not correctly detect the failure case. Instead, check whether the loop completed without finding a matching DisplayID block by testing "i == edid->extensions". This ensures the function exits early when no DisplayID extension is present and avoids parsing an unrelated EDID extension block. Also simplify the EDID validation check using "!edid || !edid->extensions". Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:13079 parse_edid_displayid_vrr() warn: variable dereferenced before check 'edid_ext' (see line 13075) Fixes: a638b837d0e6 ("drm/amd/display: Fix refresh rate range for some panel") Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Jerry Zuo <jerry.zuo@amd.com> Cc: Sun peng Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91c7e6342e98c846b259c57273436fdea4c043f2)
5 daysdrm/amd/display: Wrap dcn32_override_min_req_memclk() in DC_FP_{START, END}Xi Ruoyao
[Why] The dcn32_override_min_req_memclk function is in dcn32_fpu.c, which is compiled with CC_FLAGS_FPU into FP instructions. So when we call it we must use DC_FP_{START,END} to save and restore the FP context, and prepare the FP unit on architectures like LoongArch where the FP unit isn't always on. Reported-by: LiarOnce <liaronce@hotmail.com> Fixes: ee7be8f3de1c ("drm/amd/display: Limit DCN32 8 channel or less parts to DPM1 for FPO") Signed-off-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 25bb1d54ba3983c064361033a8ec15474fece37e) Cc: stable@vger.kernel.org
5 daysdrm/amd/display: Fix uninitialized variable use which breaks full LTOCalvin Owens
Commit e1b385726f7f ("drm/amd/display: Add additional checks for PSP footer size") introduced a use of an uninitialized stack variable in dm_dmub_sw_init() (region_params.bss_data_size). Interestingly, this seems to cause no issue on normal kernels. But when full LTO is enabled, it causes the compiler to "optimize" out huge swaths of amdgpu initialization code, and the driver is unusable: amdgpu 0000:03:00.0: [drm] Loading DMUB firmware via PSP: version=0x07002F00 amdgpu 0000:03:00.0: sw_init of IP block <dm> failed 5 amdgpu 0000:03:00.0: amdgpu_device_ip_init failed amdgpu 0000:03:00.0: Fatal error during GPU init It surprises me that neither gcc nor clang emit a warning about this: I only found it by bisecting the LTO breakage. Fix by using the bss_data_size field from fw_meta_info_params, as was presumably intended. Fixes: e1b385726f7f ("drm/amd/display: Add additional checks for PSP footer size") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b7f1402f6ad24cc6b9a01fa09ebd1c6559d787d0)
5 daysdrm/amdgpu: Limit BO list entry count to prevent resource exhaustionJesse.Zhang
Userspace can pass an arbitrary number of BO list entries via the bo_number field. Although the previous multiplication overflow check prevents out-of-bounds allocation, a large number of entries could still cause excessive memory allocation (up to potentially gigabytes) and unnecessarily long list processing times. Introduce a hard limit of 128k entries per BO list, which is more than sufficient for any realistic use case (e.g., a single list containing all buffers in a large scene). This prevents memory exhaustion attacks and ensures predictable performance. Return -EINVAL if the requested entry count exceeds the limit Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 688b87d39e0aa8135105b40dc167d74b5ada5332) Cc: stable@vger.kernel.org
5 daysdrm/amd/display: Fix gamma 2.2 colorop TFsAlex Hung
Use GAMMA22 for degamma/blend and GAMMA22_INV for shaper so curves match the color pipeline. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5016 Tested-by: Xaver Hugl <xaver.hugl@kde.org> Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d8f9f42effd767ffa7bbcd7e05fbd6b20737e468)
5 daysdrm/pagemap_util: Ensure proper cache lock management on freeJonathan Cavitt
For the sake of consistency, ensure that the cache lock is always unlocked after drm_pagemap_cache_fini. Spinlocks typically disable preemption and if the code-path missing the unlock is hit, preemption will remain disabled even if the lock is subsequently freed. Fixes static analysis issue. v2: - Use requested code flow (Maarten) v3: - Clear cache->dpagemap (Matt Brost, Maarten) v4: - Reword commit message (Thomas) Fixes: 77f14f2f2d73f ("drm/pagemap: Add a drm_pagemap cache and shrinker") Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260316151555.7553-2-jonathan.cavitt@intel.com
5 daysdrm/imagination: Disable interrupts before suspending the GPUAlessio Belle
This is an additional safety layer to ensure no accesses to the GPU registers can be made while it is powered off. While we can disable IRQ generation from GPU, META firmware, MIPS firmware and for safety events, we cannot do the same for the RISC-V firmware. To keep a unified approach, once the firmware has completed its power off sequence, disable IRQs for the while GPU at the kernel level instead. Signed-off-by: Alessio Belle <alessio.belle@imgtec.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Link: https://patch.msgid.link/20260310-drain-irqs-before-suspend-v1-2-bf4f9ed68e75@imgtec.com Signed-off-by: Matt Coster <matt.coster@imgtec.com>
5 daysdrm/imagination: Synchronize interrupts before suspending the GPUAlessio Belle
The runtime PM suspend callback doesn't know whether the IRQ handler is in progress on a different CPU core and doesn't wait for it to finish. Depending on timing, the IRQ handler could be running while the GPU is suspended, leading to kernel crashes when trying to access GPU registers. See example signature below. In a power off sequence initiated by the runtime PM suspend callback, wait for any IRQ handlers in progress on other CPU cores to finish, by calling synchronize_irq(). At the same time, remove the runtime PM resume/put calls in the threaded IRQ handler. On top of not being the right approach to begin with, and being at the wrong place as they should have wrapped all GPU register accesses, the driver would hit a deadlock between synchronize_irq() being called from a runtime PM suspend callback, holding the device power lock, and the resume callback requiring the same. Example crash signature on a TI AM68 SK platform: [ 337.241218] SError Interrupt on CPU0, code 0x00000000bf000000 -- SError [ 337.241239] CPU: 0 UID: 0 PID: 112 Comm: irq/234-gpu Tainted: G M 6.17.7-B2C-00005-g9c7bbe4ea16c #2 PREEMPT [ 337.241246] Tainted: [M]=MACHINE_CHECK [ 337.241249] Hardware name: Texas Instruments AM68 SK (DT) [ 337.241252] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 337.241256] pc : pvr_riscv_irq_pending+0xc/0x24 [ 337.241277] lr : pvr_device_irq_thread_handler+0x64/0x310 [ 337.241282] sp : ffff800085b0bd30 [ 337.241284] x29: ffff800085b0bd50 x28: ffff0008070d9eab x27: ffff800083a5ce10 [ 337.241291] x26: ffff000806e48f80 x25: ffff0008070d9eac x24: 0000000000000000 [ 337.241296] x23: ffff0008068e9bf0 x22: ffff0008068e9bd0 x21: ffff800085b0bd30 [ 337.241301] x20: ffff0008070d9e00 x19: ffff0008068e9000 x18: 0000000000000001 [ 337.241305] x17: 637365645f656c70 x16: 0000000000000000 x15: ffff000b7df9ff40 [ 337.241310] x14: 0000a585fe3c0d0e x13: 000000999704f060 x12: 000000000002771a [ 337.241314] x11: 00000000000000c0 x10: 0000000000000af0 x9 : ffff800085b0bd00 [ 337.241318] x8 : ffff0008071175d0 x7 : 000000000000b955 x6 : 0000000000000003 [ 337.241323] x5 : 0000000000000000 x4 : 0000000000000002 x3 : 0000000000000000 [ 337.241327] x2 : ffff800080e39d20 x1 : ffff800080e3fc48 x0 : 0000000000000000 [ 337.241333] Kernel panic - not syncing: Asynchronous SError Interrupt [ 337.241337] CPU: 0 UID: 0 PID: 112 Comm: irq/234-gpu Tainted: G M 6.17.7-B2C-00005-g9c7bbe4ea16c #2 PREEMPT [ 337.241342] Tainted: [M]=MACHINE_CHECK [ 337.241343] Hardware name: Texas Instruments AM68 SK (DT) [ 337.241345] Call trace: [ 337.241348] show_stack+0x18/0x24 (C) [ 337.241357] dump_stack_lvl+0x60/0x80 [ 337.241364] dump_stack+0x18/0x24 [ 337.241368] vpanic+0x124/0x2ec [ 337.241373] abort+0x0/0x4 [ 337.241377] add_taint+0x0/0xbc [ 337.241384] arm64_serror_panic+0x70/0x80 [ 337.241389] do_serror+0x3c/0x74 [ 337.241392] el1h_64_error_handler+0x30/0x48 [ 337.241400] el1h_64_error+0x6c/0x70 [ 337.241404] pvr_riscv_irq_pending+0xc/0x24 (P) [ 337.241410] irq_thread_fn+0x2c/0xb0 [ 337.241416] irq_thread+0x170/0x334 [ 337.241421] kthread+0x12c/0x210 [ 337.241428] ret_from_fork+0x10/0x20 [ 337.241434] SMP: stopping secondary CPUs [ 337.241451] Kernel Offset: disabled [ 337.241453] CPU features: 0x040000,02002800,20002001,0400421b [ 337.241456] Memory Limit: none [ 337.457921] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]--- Fixes: cc1aeedb98ad ("drm/imagination: Implement firmware infrastructure and META FW support") Fixes: 96822d38ff57 ("drm/imagination: Handle Rogue safety event IRQs") Cc: stable@vger.kernel.org # see patch description, needs adjustments for < 6.16 Signed-off-by: Alessio Belle <alessio.belle@imgtec.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Link: https://patch.msgid.link/20260310-drain-irqs-before-suspend-v1-1-bf4f9ed68e75@imgtec.com Signed-off-by: Matt Coster <matt.coster@imgtec.com>
5 daysdrm/imagination: Fix deadlock in soft reset sequenceAlessio Belle
The soft reset sequence is currently executed from the threaded IRQ handler, hence it cannot call disable_irq() which internally waits for IRQ handlers, i.e. itself, to complete. Use disable_irq_nosync() during a soft reset instead. Fixes: cc1aeedb98ad ("drm/imagination: Implement firmware infrastructure and META FW support") Cc: stable@vger.kernel.org Signed-off-by: Alessio Belle <alessio.belle@imgtec.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Link: https://patch.msgid.link/20260309-fix-soft-reset-v1-1-121113be554f@imgtec.com Signed-off-by: Matt Coster <matt.coster@imgtec.com>
5 daysdrm/i915/psr: Compute PSR entry_setup_frames into intel_crtc_stateJouni Högander
PSR entry_setup_frames is currently computed directly into struct intel_dp:intel_psr:entry_setup_frames. This causes a problem if mode change gets rejected after PSR compute config: Psr_entry_setup_frames computed for this rejected state is in intel_dp:intel_psr:entry_setup_frame. Fix this by computing it into intel_crtc_state and copy the value into intel_dp:intel_psr:entry_setup_frames on PSR enable. Fixes: 2b981d57e480 ("drm/i915/display: Support PSR entry VSC packet to be transmitted one frame earlier") Cc: Mika Kahola <mika.kahola@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260312083710.1593781-3-jouni.hogander@intel.com (cherry picked from commit 8c229b4aa00262c13787982e998c61c0783285e0) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
5 daysdrm/i915/psr: Disable PSR on update_m_n and update_lrrJouni Högander
PSR/PR parameters might change based on update_m_n or update_lrr. Disable on update_m_n and update_lrr to ensure proper parameters are taken into use on next PSR enable in intel_psr_post_plane_update. Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15771 Fixes: 2bc98c6f97af ("drm/i915/alpm: Compute ALPM parameters into crtc_state->alpm_state") Cc: <stable@vger.kernel.org> # v6.19+ Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Link: https://patch.msgid.link/20260312083710.1593781-2-jouni.hogander@intel.com (cherry picked from commit 65852b56bfa929f99e28c96fd98b02058959da7f) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
6 daysdrm/vmwgfx: Don't overwrite KMS surface dirty trackerIan Forbes
We were overwriting the surface's dirty tracker here causing a memory leak. Reported-by: Mika Penttilä <mpenttil@redhat.com> Closes: https://lore.kernel.org/dri-devel/8c53f3c6-c6de-46fe-a8ca-d98dd52b3abe@redhat.com/ Fixes: 965544150d1c ("drm/vmwgfx: Refactor cursor handling") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patch.msgid.link/20260302200330.66763-1-ian.forbes@broadcom.com
6 daysdrm/vmwgfx: fix kernel-doc warnings in vmwgfx_drv.hRandy Dunlap
Fix 45+ kernel-doc warnings in vmwgfx_drv.h: - spell a struct name correctly - don't have structs between kernel-doc and its struct - end description of struct members with ':' - start all kernel-doc lines with " *" - mark private struct member and enum value with "private:" - add kernel-doc for enum vmw_dma_map_mode - add missing struct member comments - add missing function parameter comments - convert "/**" to "/*" for non-kernel-doc comments - add missing "Returns:" comments for several functions - correct a function parameter name to eliminate kernel-doc warnings (examples): Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:128 struct vmw_bo; error: Cannot parse struct or union! Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:151 struct member 'used_prio' not described in 'vmw_resource' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:151 struct member 'mob_node' not described in 'vmw_resource' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:199 bad line: SM4 device. Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:270 struct member 'private' not described in 'vmw_res_cache_entry' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:280 Enum value 'vmw_dma_alloc_coherent' not described in enum 'vmw_dma_map_mode' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:280 Enum value 'vmw_dma_map_bind' not described in enum 'vmw_dma_map_mode' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:295 struct member 'addrs' not described in 'vmw_sg_table' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:295 struct member 'mode' not described in 'vmw_sg_table' vmwgfx_drv.h:309: warning: Excess struct member 'num_regions' description in 'vmw_sg_table' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:402 struct member 'filp' not described in 'vmw_sw_context' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:732 This comment starts with '/**', but isn't a kernel-doc comment. Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:742 This comment starts with '/**', but isn't a kernel-doc comment. Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:762 This comment starts with '/**', but isn't a kernel-doc comment. Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:887 No description found for return value of 'vmw_fifo_caps' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:901 No description found for return value of 'vmw_is_cursor_bypass3_enabled' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:906 This comment starts with '/**', but isn't a kernel-doc comment. Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:961 This comment starts with '/**', but isn't a kernel-doc comment. Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:996 This comment starts with '/**', but isn't a kernel-doc comment. Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:1082 cannot understand function prototype: 'const struct dma_buf_ops vmw_prime_dmabuf_ops;' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:1303 struct member 'do_cpy' not described in 'vmw_diff_cpy' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:1385 function parameter 'fmt' not described in 'VMW_DEBUG_KMS' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:1389 This comment starts with '/**', but isn't a kernel-doc comment. Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:1426 function parameter 'vmw' not described in 'vmw_fifo_mem_read' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:1426 No description found for return value of 'vmw_fifo_mem_read' Warning: drivers/gpu/drm/vmwgfx/vmwgfx_drv.h:1441 function parameter 'fifo_reg' not described in 'vmw_fifo_mem_write' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patch.msgid.link/20260219215548.470810-1-rdunlap@infradead.org
6 daysdrm/i915/dmc: Fix an unlikely NULL pointer deference at probeImre Deak
intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been initialized, and dmc is thus NULL. That would be the case when the call path is intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() -> gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as intel_power_domains_init_hw() is called *before* intel_dmc_init(). However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count() conditionally, depending on the current and target DC states. At probe, the target is disabled, but if DC6 is enabled, the function is called, and an oops follows. Apparently it's quite unlikely that DC6 is enabled at probe, as we haven't seen this failure mode before. It is also strange to have DC6 enabled at boot, since that would require the DMC firmware (loaded by BIOS); the BIOS loading the DMC firmware and the driver stopping / reprogramming the firmware is a poorly specified sequence and as such unlikely an intentional BIOS behaviour. It's more likely that BIOS is leaving an unintentionally enabled DC6 HW state behind (without actually loading the required DMC firmware for this). The tracking of the DC6 allowed counter only works if starting / stopping the counter depends on the _SW_ DC6 state vs. the current _HW_ DC6 state (since stopping the counter requires the DC5 counter captured when the counter was started). Thus, using the HW DC6 state is incorrect and it also leads to the above oops. Fix both issues by using the SW DC6 state for the tracking. This is v2 of the fix originally sent by Jani, updated based on the first Link: discussion below. Link: https://lore.kernel.org/all/3626411dc9e556452c432d0919821b76d9991217@intel.com Link: https://lore.kernel.org/all/20260228130946.50919-2-ltao@redhat.com Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter") Cc: Mohammed Thasleem <mohammed.thasleem@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Tao Liu <ltao@redhat.com> Cc: <stable@vger.kernel.org> # v6.16+ Tested-by: Tao Liu <ltao@redhat.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patch.msgid.link/20260309164803.1918158-1-imre.deak@intel.com (cherry picked from commit 2344b93af8eb5da5d496b4e0529d35f0f559eaf0) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
9 daysMerge tag 'amd-drm-fixes-7.0-2026-03-12' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-7.0-2026-03-12: amdgpu: - SMU13 fix - SMU14 fix - Fixes for bringup hw testing - Kerneldoc fix - GC12 idle power fix for compute workloads - DCCG fixes amdkfd: - Fix missing BO unreserve in an error path Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260312180351.3874990-1-alexander.deucher@amd.com
9 daysMerge tag 'drm-intel-fixes-2026-03-12' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Avoid hang when configuring VRR [icl] (Ville Syrjälä) - Fix sg_table overflow with >4GB folios (Janusz Krzysztofik) - Fix PSR Selective Update handling [psr] (Jouni Högander) - Fix eDP ALPM read-out sequence [dp] (Arun R Murthy) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tursulin@igalia.com> Link: https://patch.msgid.link/abJ_MQ7o-5ghyaNW@linux
9 daysMerge tag 'drm-misc-fixes-2026-03-12' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes A pixel byte swap fix for st7586, a null pointer dereference fix for gud, two timings fixes for ti-sn65dsi83, an initialization fix for ivpu, and a runtime suspend deadlock fix for amdxdna. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patch.msgid.link/20260312-accurate-ambrosial-trout-bfabf8@houat
10 daysMerge tag 'drm-msm-fixes-2026-03-06' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/msm into drm-fixes Fixes for v7.0: Core: - Adjusted msm_iommu_pagetable_prealloc_allocate() allocation type DPU: - Fixed blue screens on Hamoa laptops by reverting the LM reservation - Fixed the size of the LM block on several platforms - Dropped usage of %pK (again) - Fixed smatch warning on SSPP v13+ code - Fixed INTF_6 interrupts on Lemans DSI: - Fixed DSI PHY revision on Kaanapali - Fixed pixel clock calculation for the bonded DSI mode panels with compression enabled DT bindings: - Fixed DisplayPort description on Glymur - Fixed model name in SM8750 MDSS schema GPU: - Added MODULE_DEVICE_TABLE to the GPU driver - Fix bogus protect error on X2-85 - Fix dma_free_attrs() buffer size - Gen8 UBWC fix for Glymur From: Rob Clark <rob.clark@oss.qualcomm.com> Link: https://patch.msgid.link/CACSVV00wZ95gFDLfzJ0Ywb8rsjPSjZ1aHdwE4smnyuZ=Fg-g8Q@mail.gmail.com Signed-off-by: Dave Airlie <airlied@redhat.com>
11 daysdrm/amd: Set num IP blocks to 0 if discovery failsMario Limonciello
If discovery has failed for any reason (such as no support for a block) then there is no need to unwind all the IP blocks in fini. In this condition there can actually be failures during the unwind too. Reset num_ip_blocks to zero during failure path and skip the unnecessary cleanup path. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fae5984296b981c8cc3acca35b701c1f332a6cd8) Cc: stable@vger.kernel.org
11 daysdrm/amdkfd: Unreserve bo if queue update failedPhilip Yang
Error handling path should unreserve bo then return failed. Fixes: 305cd109b761 ("drm/amdkfd: Validate user queue update") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c24afed7de9ecce341825d8ab55a43a254348b33)
11 daysdrm/amd/display: Check for S0i3 to be done before DCCG init on DCN21Ivan Lipski
[WHY] On DCN21, dccg2_init() is called in dcn10_init_hw() before bios_golden_init(). During S0i3 resume, BIOS sets MICROSECOND_TIME_BASE_DIV to 0x00120464 as a marker. dccg2_init() overwrites this to 0x00120264, causing dcn21_s0i3_golden_init_wa() to misdetect the state and skip golden init. Eventually during the resume sequence, a flip timeout occurs. [HOW] Skip DCCG on dccg2_is_s0i3_golden_init_wa_done() on DCN21. Fixes: 4c595e75110e ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c61eda434336cf2c033aa35efdc9a08b31d2fdfa)