summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2026-03-17hwmon: (pmbus/hac300s) Add error check for pmbus_read_word_data() return valueSanman Pradhan
hac300s_read_word_data() passes the return value of pmbus_read_word_data() directly to FIELD_GET() without checking for errors. If the I2C transaction fails, a negative error code is sign-extended and passed to FIELD_GET(), which silently produces garbage data instead of propagating the error. Add the missing error check before using the return value in the FIELD_GET() macro. Fixes: 669cf162f7a1 ("hwmon: Add support for HiTRON HAC300S PSU") Cc: stable@vger.kernel.org Signed-off-by: Sanman Pradhan <psanman@juniper.net> Link: https://lore.kernel.org/r/20260317173308.382545-2-sanman.pradhan@hpe.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2026-03-17drm/radeon: apply state adjust rules to some additional HAINAN vairantsAlex Deucher
They need a similar workaround. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1839 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 87327658c848f56eac166cb382b57b83bf06c5ac) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu: apply state adjust rules to some additional HAINAN vairantsAlex Deucher
They need a similar workaround. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1839 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0de31d92a173d3d94f28051b0b80a6c98913aed4) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu: rework how we handle TLB fencesAlex Deucher
Add a new VM flag to indicate whether or not we need a TLB fence. Userqs (KFD or KGD) require a TLB fence. A TLB fence is not strictly required for kernel queues, but it shouldn't hurt. That said, enabling this unconditionally should be fine, but it seems to tickle some issues in KIQ/MES. Only enable them for KFD, or when KGD userq queues are enabled (currently via module parameter). Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4798 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4749 Fixes: f3854e04b708 ("drm/amdgpu: attach tlb fence to the PTs update") Cc: Christian König <christian.koenig@amd.com> Cc: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 69c5fbd2b93b5ced77c6e79afe83371bca84c788) Cc: stable@vger.kernel.org
2026-03-17libie: prevent memleak in fwlog codeMichal Swiatkowski
All cmd_buf buffers are allocated and need to be freed after usage. Add an error unwinding path that properly frees these buffers. The memory leak happens whenever fwlog configuration is changed. For example: $echo 256K > /sys/kernel/debug/ixgbe/0000\:32\:00.0/fwlog/log_size Fixes: 96a9a9341cda ("ice: configure FW logging") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2026-03-17Merge tag 'hid-for-linus-2026031701' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - various fixes dealing with (intentionally) broken devices in HID core, logitech-hidpp and multitouch drivers (Lee Jones) - fix for OOB in wacom driver (Benoît Sevens) - fix for potentialy HID-bpf-induced buffer overflow in () (Benjamin Tissoires) - various other small fixes and device ID / quirk additions * tag 'hid-for-linus-2026031701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: multitouch: Check to ensure report responses match the request HID: logitech-hidpp: Prevent use-after-free on force feedback initialisation failure HID: bpf: prevent buffer overflow in hid_hw_request selftests/hid: fix compilation when bpf_wq and hid_device are not exported HID: core: Mitigate potential OOB by removing bogus memset() HID: intel-thc-hid: Set HID_PHYS with PCI BDF HID: appletb-kbd: add .resume method in PM HID: logitech-hidpp: Enable MX Master 4 over bluetooth HID: input: Add HID_BATTERY_QUIRK_DYNAMIC for Elan touchscreens HID: input: Drop Asus UX550* touchscreen ignore battery quirks HID: asus: add xg mobile 2022 external hardware support HID: wacom: fix out-of-bounds read in wacom_intuos_bt_irq
2026-03-17iavf: fix VLAN filter lost on add/delete racePetr Oros
When iavf_add_vlan() finds an existing filter in IAVF_VLAN_REMOVE state, it transitions the filter to IAVF_VLAN_ACTIVE assuming the pending delete can simply be cancelled. However, there is no guarantee that iavf_del_vlans() has not already processed the delete AQ request and removed the filter from the PF. In that case the filter remains in the driver's list as IAVF_VLAN_ACTIVE but is no longer programmed on the NIC. Since iavf_add_vlans() only picks up filters in IAVF_VLAN_ADD state, the filter is never re-added, and spoof checking drops all traffic for that VLAN. CPU0 CPU1 Workqueue ---- ---- --------- iavf_del_vlan(vlan 100) f->state = REMOVE schedule AQ_DEL_VLAN iavf_add_vlan(vlan 100) f->state = ACTIVE iavf_del_vlans() f is ACTIVE, skip iavf_add_vlans() f is ACTIVE, skip Filter is ACTIVE in driver but absent from NIC. Transition to IAVF_VLAN_ADD instead and schedule IAVF_FLAG_AQ_ADD_VLAN_FILTER so iavf_add_vlans() re-programs the filter. A duplicate add is idempotent on the PF. Fixes: 0c0da0e95105 ("iavf: refactor VLAN filter states") Signed-off-by: Petr Oros <poros@redhat.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2026-03-17igc: fix page fault in XDP TX timestamps handlingZdenek Bouska
If an XDP application that requested TX timestamping is shutting down while the link of the interface in use is still up the following kernel splat is reported: [ 883.803618] [ T1554] BUG: unable to handle page fault for address: ffffcfb6200fd008 ... [ 883.803650] [ T1554] Call Trace: [ 883.803652] [ T1554] <TASK> [ 883.803654] [ T1554] igc_ptp_tx_tstamp_event+0xdf/0x160 [igc] [ 883.803660] [ T1554] igc_tsync_interrupt+0x2d5/0x300 [igc] ... During shutdown of the TX ring the xsk_meta pointers are left behind, so that the IRQ handler is trying to touch them. This issue is now being fixed by cleaning up the stale xsk meta data on TX shutdown. TX timestamps on other queues remain unaffected. Fixes: 15fd021bc427 ("igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet") Signed-off-by: Zdenek Bouska <zdenek.bouska@siemens.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Florian Bezdeka <florian.bezdeka@siemens.com> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2026-03-17igc: fix missing update of skb->tail in igc_xmit_frame()Kohei Enju
igc_xmit_frame() misses updating skb->tail when the packet size is shorter than the minimum one. Use skb_put_padto() in alignment with other Intel Ethernet drivers. Fixes: 0507ef8a0372 ("igc: Add transmit and receive fastpath and interrupt handlers") Signed-off-by: Kohei Enju <kohei@enjuk.jp> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Avigail Dahan <avigailx.dahan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2026-03-17ACPICA: Update the format of Arg3 of _DSMSaket Dumbre
To get rid of type incompatibility warnings in Linux. Fixes: 81f92cff6d42 ("ACPICA: ACPI_TYPE_ANY does not include the package type") Link: https://github.com/acpica/acpica/commit/4fb74872dcec Signed-off-by: Saket Dumbre <saket.dumbre@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/12856643.O9o76ZdvQC@rafael.j.wysocki
2026-03-17driver core: platform: use generic driver_override infrastructureDanilo Krummrich
When a driver is probed through __driver_attach(), the bus' match() callback is called without the device lock held, thus accessing the driver_override field without a lock, which can cause a UAF. Fix this by using the driver-core driver_override infrastructure taking care of proper locking internally. Note that calling match() from __driver_attach() without the device lock held is intentional. [1] Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1] Reported-by: Gui-Dong Han <hanguidong02@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789 Fixes: 3d713e0e382e ("driver core: platform: add device binding path 'driver_override'") Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260303115720.48783-5-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-03-17hwmon: axi-fan: don't use driver_override as IRQ nameDanilo Krummrich
Do not use driver_override as IRQ name, as it is not guaranteed to point to a valid string; use NULL instead (which makes the devm IRQ helpers use dev_name()). Fixes: 8412b410fa5e ("hwmon: Support ADI Fan Control IP") Reviewed-by: Nuno Sá <nuno.sa@analog.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260303115720.48783-4-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-03-17driver core: generalize driver_override in struct deviceDanilo Krummrich
Currently, there are 12 busses (including platform and PCI) that duplicate the driver_override logic for their individual devices. All of them seem to be prone to the bug described in [1]. While this could be solved for every bus individually using a separate lock, solving this in the driver-core generically results in less (and cleaner) changes overall. Thus, move driver_override to struct device, provide corresponding accessors for busses and handle locking with a separate lock internally. In particular, add device_set_driver_override(), device_has_driver_override(), device_match_driver_override() and generalize the sysfs store() and show() callbacks via a driver_override feature flag in struct bus_type. Until all busses have migrated, keep driver_set_override() in place. Note that we can't use the device lock for the reasons described in [2]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=220789 [1] Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [2] Tested-by: Gui-Dong Han <hanguidong02@gmail.com> Co-developed-by: Gui-Dong Han <hanguidong02@gmail.com> Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260303115720.48783-2-dakr@kernel.org [ Use dev->bus instead of sp->bus for consistency; fix commit message to refer to the struct bus_type's driver_override feature flag. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-03-17RDMA/efa: Fix possible deadlockEthan Tidmore
In the error path for efa_com_alloc_comp_ctx() the semaphore assigned to &aq->avail_cmds is not released. Detected by Smatch: drivers/infiniband/hw/efa/efa_com.c:662 efa_com_cmd_exec() warn: inconsistent returns '&aq->avail_cmds' Add release for &aq->avail_cmds in efa_com_alloc_comp_ctx() error path. Fixes: ef3b06742c8a2 ("RDMA/efa: Fix use of completion ctx after free") Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com> Link: https://patch.msgid.link/20260314045730.1143862-1-ethantidmore06@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-17RDMA/rw: Fix MR pool exhaustion in bvec RDMA READ pathChuck Lever
When IOVA-based DMA mapping is unavailable (e.g., IOMMU passthrough mode), rdma_rw_ctx_init_bvec() falls back to checking rdma_rw_io_needs_mr() with the raw bvec count. Unlike the scatterlist path in rdma_rw_ctx_init(), which passes a post-DMA-mapping entry count that reflects coalescing of physically contiguous pages, the bvec path passes the pre-mapping page count. This overstates the number of DMA entries, causing every multi-bvec RDMA READ to consume an MR from the QP's pool. Under NFS WRITE workloads the server performs RDMA READs to pull data from the client. With the inflated MR demand, the pool is rapidly exhausted, ib_mr_pool_get() returns NULL, and rdma_rw_init_one_mr() returns -EAGAIN. svcrdma treats this as a DMA mapping failure, closes the connection, and the client reconnects -- producing a cycle of 71% RPC retransmissions and ~100 reconnections per test run. RDMA WRITEs (NFS READ direction) are unaffected because DMA_TO_DEVICE never triggers the max_sgl_rd check. Remove the rdma_rw_io_needs_mr() gate from the bvec path entirely, so that bvec RDMA operations always use the map_wrs path (direct WR posting without MR allocation). The bvec caller has no post-DMA-coalescing segment count available -- xdr_buf and svc_rqst hold pages as individual pointers, and physical contiguity is discovered only during DMA mapping -- so the raw page count cannot serve as a reliable input to rdma_rw_io_needs_mr(). iWARP devices, which require MRs unconditionally, are handled by an earlier check in rdma_rw_ctx_init_bvec() and are unaffected. Fixes: bea28ac14cab ("RDMA/core: add MR support for bvec-based RDMA operations") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260313194201.5818-3-cel@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-17RDMA/rw: Fall back to direct SGE on MR pool exhaustionChuck Lever
When IOMMU passthrough mode is active, ib_dma_map_sgtable_attrs() produces no coalescing: each scatterlist page maps 1:1 to a DMA entry, so sgt.nents equals the raw page count. A 1 MB transfer yields 256 DMA entries. If that count exceeds the device's max_sgl_rd threshold (an optimization hint from mlx5 firmware), rdma_rw_io_needs_mr() steers the operation into the MR registration path. Each such operation consumes one or more MRs from a pool sized at max_rdma_ctxs -- roughly one MR per concurrent context. Under write-intensive workloads that issue many concurrent RDMA READs, the pool is rapidly exhausted, ib_mr_pool_get() returns NULL, and rdma_rw_init_one_mr() returns -EAGAIN. Upper layer protocols treat this as a fatal DMA mapping failure and tear down the connection. The max_sgl_rd check is a performance optimization, not a correctness requirement: the device can handle large SGE counts via direct posting, just less efficiently than with MR registration. When the MR pool cannot satisfy a request, falling back to the direct SGE (map_wrs) path avoids the connection reset while preserving the MR optimization for the common case where pool resources are available. Add a fallback in rdma_rw_ctx_init() so that -EAGAIN from rdma_rw_init_mr_wrs() triggers direct SGE posting instead of propagating the error. iWARP devices, which mandate MR registration for RDMA READs, and force_mr debug mode continue to treat -EAGAIN as terminal. Fixes: 00bd1439f464 ("RDMA/rw: Support threshold for registration vs scattering to local pages") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260313194201.5818-2-cel@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-17drm/bridge: dw-hdmi-qp: fix multi-channel audio outputJonas Karlman
Channel Allocation (PB4) and Level Shift Information (PB5) are configured with values from PB1 and PB2 due to the wrong offset being used. This results in missing audio channels or incorrect speaker placement when playing multi-channel audio. Use the correct offset to fix multi-channel audio output. Fixes: fd0141d1a8a2 ("drm/bridge: synopsys: Add audio support for dw-hdmi-qp") Reported-by: Christian Hewitt <christianshewitt@gmail.com> Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> Reviewed-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patch.msgid.link/20260228112822.4056354-1-christianshewitt@gmail.com Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
2026-03-17spi: geni-qcom: Check DMA interrupts early in ISRPraveen Talari
The current interrupt handler only checks the GENI main IRQ status (m_irq) before deciding to return IRQ_NONE. This can lead to spurious IRQ_NONE returns when DMA interrupts are pending but m_irq is zero. Move the DMA TX/RX status register reads to the beginning of the ISR, right after reading m_irq. Update the early return condition to check all three status registers (m_irq, dma_tx_status, dma_rx_status) before returning IRQ_NONE. Signed-off-by: Praveen Talari <praveen.talari@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://patch.msgid.link/20260313-spi-geni-qcom-fix-dma-irq-handling-v1-1-0bd122589e02@oss.qualcomm.com Signed-off-by: Mark Brown <broonie@kernel.org>
2026-03-17drm: Fix use-after-free on framebuffers and property blobs when calling ↵Maarten Lankhorst
drm_dev_unplug When trying to do a rather aggressive test of igt's "xe_module_load --r reload" with a full desktop environment and game running I noticed a few OOPSes when dereferencing freed pointers, related to framebuffers and property blobs after the compositor exits. Solve this by guarding the freeing in drm_file with drm_dev_enter/exit, and immediately put the references from struct drm_file objects during drm_dev_unplug(). Related warnings for framebuffers on the subtest: [ 739.713076] ------------[ cut here ]------------ WARN_ON(!list_empty(&dev->mode_config.fb_list)) [ 739.713079] WARNING: drivers/gpu/drm/drm_mode_config.c:584 at drm_mode_config_cleanup+0x30b/0x320 [drm], CPU#12: xe_module_load/13145 .... [ 739.713328] Call Trace: [ 739.713330] <TASK> [ 739.713335] ? intel_pmdemand_destroy_state+0x11/0x20 [xe] [ 739.713574] ? intel_atomic_global_obj_cleanup+0xe4/0x1a0 [xe] [ 739.713794] intel_display_driver_remove_noirq+0x51/0xb0 [xe] [ 739.714041] xe_display_fini_early+0x33/0x50 [xe] [ 739.714284] devm_action_release+0xf/0x20 [ 739.714294] devres_release_all+0xad/0xf0 [ 739.714301] device_unbind_cleanup+0x12/0xa0 [ 739.714305] device_release_driver_internal+0x1b7/0x210 [ 739.714311] device_driver_detach+0x14/0x20 [ 739.714315] unbind_store+0xa6/0xb0 [ 739.714319] drv_attr_store+0x21/0x30 [ 739.714322] sysfs_kf_write+0x48/0x60 [ 739.714328] kernfs_fop_write_iter+0x16b/0x240 [ 739.714333] vfs_write+0x266/0x520 [ 739.714341] ksys_write+0x72/0xe0 [ 739.714345] __x64_sys_write+0x19/0x20 [ 739.714347] x64_sys_call+0xa15/0xa30 [ 739.714355] do_syscall_64+0xd8/0xab0 [ 739.714361] entry_SYSCALL_64_after_hwframe+0x4b/0x53 and [ 739.714459] ------------[ cut here ]------------ [ 739.714461] xe 0000:67:00.0: [drm] drm_WARN_ON(!list_empty(&fb->filp_head)) [ 739.714464] WARNING: drivers/gpu/drm/drm_framebuffer.c:833 at drm_framebuffer_free+0x6c/0x90 [drm], CPU#12: xe_module_load/13145 [ 739.714715] RIP: 0010:drm_framebuffer_free+0x7a/0x90 [drm] ... [ 739.714869] Call Trace: [ 739.714871] <TASK> [ 739.714876] drm_mode_config_cleanup+0x26a/0x320 [drm] [ 739.714998] ? __drm_printfn_seq_file+0x20/0x20 [drm] [ 739.715115] ? drm_mode_config_cleanup+0x207/0x320 [drm] [ 739.715235] intel_display_driver_remove_noirq+0x51/0xb0 [xe] [ 739.715576] xe_display_fini_early+0x33/0x50 [xe] [ 739.715821] devm_action_release+0xf/0x20 [ 739.715828] devres_release_all+0xad/0xf0 [ 739.715843] device_unbind_cleanup+0x12/0xa0 [ 739.715850] device_release_driver_internal+0x1b7/0x210 [ 739.715856] device_driver_detach+0x14/0x20 [ 739.715860] unbind_store+0xa6/0xb0 [ 739.715865] drv_attr_store+0x21/0x30 [ 739.715868] sysfs_kf_write+0x48/0x60 [ 739.715873] kernfs_fop_write_iter+0x16b/0x240 [ 739.715878] vfs_write+0x266/0x520 [ 739.715886] ksys_write+0x72/0xe0 [ 739.715890] __x64_sys_write+0x19/0x20 [ 739.715893] x64_sys_call+0xa15/0xa30 [ 739.715900] do_syscall_64+0xd8/0xab0 [ 739.715905] entry_SYSCALL_64_after_hwframe+0x4b/0x53 and then finally file close blows up: [ 743.186530] Oops: general protection fault, probably for non-canonical address 0xdead000000000122: 0000 [#1] SMP [ 743.186535] CPU: 3 UID: 1000 PID: 3453 Comm: kwin_wayland Tainted: G W 7.0.0-rc1-valkyria+ #110 PREEMPT_{RT,(lazy)} [ 743.186537] Tainted: [W]=WARN [ 743.186538] Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS Gaming 3/X299 AORUS Gaming 3-CF, BIOS F8n 12/06/2021 [ 743.186539] RIP: 0010:drm_framebuffer_cleanup+0x55/0xc0 [drm] [ 743.186588] Code: d8 72 73 0f b6 42 05 ff c3 39 c3 72 e8 49 8d bd 50 07 00 00 31 f6 e8 3a 80 d3 e1 49 8b 44 24 10 49 8d 7c 24 08 49 8b 54 24 08 <48> 3b 38 0f 85 95 7f 02 00 48 3b 7a 08 0f 85 8b 7f 02 00 48 89 42 [ 743.186589] RSP: 0018:ffffc900085e3cf8 EFLAGS: 00010202 [ 743.186591] RAX: dead000000000122 RBX: 0000000000000001 RCX: ffffffff8217ed03 [ 743.186592] RDX: dead000000000100 RSI: 0000000000000000 RDI: ffff88814675ba08 [ 743.186593] RBP: ffffc900085e3d10 R08: 0000000000000000 R09: 0000000000000000 [ 743.186593] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88814675ba00 [ 743.186594] R13: ffff88810d778000 R14: ffff888119f6dca0 R15: ffff88810c660bb0 [ 743.186595] FS: 00007ff377d21280(0000) GS:ffff888cec3f8000(0000) knlGS:0000000000000000 [ 743.186596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 743.186596] CR2: 000055690b55e000 CR3: 0000000113586003 CR4: 00000000003706f0 [ 743.186597] Call Trace: [ 743.186598] <TASK> [ 743.186603] intel_user_framebuffer_destroy+0x12/0x90 [xe] [ 743.186722] drm_framebuffer_free+0x3a/0x90 [drm] [ 743.186750] ? trace_hardirqs_on+0x5f/0x120 [ 743.186754] drm_mode_object_put+0x51/0x70 [drm] [ 743.186786] drm_fb_release+0x105/0x190 [drm] [ 743.186812] ? rt_mutex_slowunlock+0x3aa/0x410 [ 743.186817] ? rt_spin_lock+0xea/0x1b0 [ 743.186819] drm_file_free+0x1e0/0x2c0 [drm] [ 743.186843] drm_release_noglobal+0x91/0xf0 [drm] [ 743.186865] __fput+0x100/0x2e0 [ 743.186869] fput_close_sync+0x40/0xa0 [ 743.186870] __x64_sys_close+0x3e/0x80 [ 743.186873] x64_sys_call+0xa07/0xa30 [ 743.186879] do_syscall_64+0xd8/0xab0 [ 743.186881] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 743.186882] RIP: 0033:0x7ff37e567732 [ 743.186884] Code: 08 0f 85 a1 38 ff ff 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 55 bf 01 00 [ 743.186885] RSP: 002b:00007ffc818169a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 743.186886] RAX: ffffffffffffffda RBX: 00007ffc81816a30 RCX: 00007ff37e567732 [ 743.186887] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000012 [ 743.186888] RBP: 00007ffc818169d0 R08: 0000000000000000 R09: 0000000000000000 [ 743.186889] R10: 0000000000000000 R11: 0000000000000246 R12: 000055d60a7996e0 [ 743.186889] R13: 00007ffc81816a90 R14: 00007ffc81816a90 R15: 000055d60a782a30 [ 743.186892] </TASK> [ 743.186893] Modules linked in: rfcomm snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_addrtype nft_compat x_tables nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay cfg80211 bnep mtd_intel_dg snd_hda_codec_intelhdmi mtd snd_hda_codec_hdmi nls_utf8 mxm_wmi intel_wmi_thunderbolt gigabyte_wmi wmi_bmof xe drm_gpuvm drm_gpusvm_helper i2c_algo_bit drm_buddy drm_ttm_helper ttm video drm_suballoc_helper gpu_sched drm_client_lib drm_exec drm_display_helper cec drm_kunit_helpers drm_kms_helper kunit x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_intel_dspcfg snd_soc_core snd_compress ac97_bus snd_pcm snd_seq snd_seq_device snd_timer i2c_i801 i2c_mux snd i2c_smbus btusb btrtl btbcm btmtk btintel bluetooth ecdh_generic rfkill ecc mei_me mei ioatdma dca wmi nfsd drm i2c_dev fuse nfnetlink [ 743.186938] ---[ end trace 0000000000000000 ]--- And for property blobs: void drm_mode_config_cleanup(struct drm_device *dev) { ... list_for_each_entry_safe(blob, bt, &dev->mode_config.property_blob_list, head_global) { drm_property_blob_put(blob); } Resulting in: [ 371.072940] BUG: unable to handle page fault for address: 000001ffffffffff [ 371.072944] #PF: supervisor read access in kernel mode [ 371.072945] #PF: error_code(0x0000) - not-present page [ 371.072947] PGD 0 P4D 0 [ 371.072950] Oops: Oops: 0000 [#1] SMP [ 371.072953] CPU: 0 UID: 1000 PID: 3693 Comm: kwin_wayland Not tainted 7.0.0-rc1-valkyria+ #111 PREEMPT_{RT,(lazy)} [ 371.072956] Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS Gaming 3/X299 AORUS Gaming 3-CF, BIOS F8n 12/06/2021 [ 371.072957] RIP: 0010:drm_property_destroy_user_blobs+0x3b/0x90 [drm] [ 371.073019] Code: 00 00 48 83 ec 10 48 8b 86 30 01 00 00 48 39 c3 74 59 48 89 c2 48 8d 48 c8 48 8b 00 4c 8d 60 c8 eb 04 4c 8d 60 c8 48 8b 71 40 <48> 39 16 0f 85 39 32 01 00 48 3b 50 08 0f 85 2f 32 01 00 48 89 70 [ 371.073021] RSP: 0018:ffffc90006a73de8 EFLAGS: 00010293 [ 371.073022] RAX: 000001ffffffffff RBX: ffff888118a1a930 RCX: ffff8881b92355c0 [ 371.073024] RDX: ffff8881b92355f8 RSI: 000001ffffffffff RDI: ffff888118be4000 [ 371.073025] RBP: ffffc90006a73e08 R08: ffff8881009b7300 R09: ffff888cecc5b000 [ 371.073026] R10: ffffc90006a73e90 R11: 0000000000000002 R12: 000001ffffffffc7 [ 371.073027] R13: ffff888118a1a980 R14: ffff88810b366d20 R15: ffff888118a1a970 [ 371.073028] FS: 00007f1faccbb280(0000) GS:ffff888cec2db000(0000) knlGS:0000000000000000 [ 371.073029] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 371.073030] CR2: 000001ffffffffff CR3: 000000010655c001 CR4: 00000000003706f0 [ 371.073031] Call Trace: [ 371.073033] <TASK> [ 371.073036] drm_file_free+0x1df/0x2a0 [drm] [ 371.073077] drm_release_noglobal+0x7a/0xe0 [drm] [ 371.073113] __fput+0xe2/0x2b0 [ 371.073118] fput_close_sync+0x40/0xa0 [ 371.073119] __x64_sys_close+0x3e/0x80 [ 371.073122] x64_sys_call+0xa07/0xa30 [ 371.073126] do_syscall_64+0xc0/0x840 [ 371.073130] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 371.073132] RIP: 0033:0x7f1fb3501732 [ 371.073133] Code: 08 0f 85 a1 38 ff ff 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 55 bf 01 00 [ 371.073135] RSP: 002b:00007ffe8e6f0278 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 371.073136] RAX: ffffffffffffffda RBX: 00007ffe8e6f0300 RCX: 00007f1fb3501732 [ 371.073137] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000012 [ 371.073138] RBP: 00007ffe8e6f02a0 R08: 0000000000000000 R09: 0000000000000000 [ 371.073139] R10: 0000000000000000 R11: 0000000000000246 R12: 00005585ba46eea0 [ 371.073140] R13: 00007ffe8e6f0360 R14: 00007ffe8e6f0360 R15: 00005585ba458a30 [ 371.073143] </TASK> [ 371.073144] Modules linked in: rfcomm snd_hrtimer xt_addrtype xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat x_tables nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay cfg80211 bnep snd_hda_codec_intelhdmi snd_hda_codec_hdmi mtd_intel_dg mtd nls_utf8 wmi_bmof mxm_wmi gigabyte_wmi intel_wmi_thunderbolt xe drm_gpuvm drm_gpusvm_helper i2c_algo_bit drm_buddy drm_ttm_helper ttm video drm_suballoc_helper gpu_sched drm_client_lib drm_exec drm_display_helper cec drm_kunit_helpers drm_kms_helper kunit x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_hda_codec snd_hwdep snd_hda_core snd_intel_dspcfg snd_soc_core snd_compress ac97_bus snd_pcm snd_seq snd_seq_device snd_timer i2c_i801 btusb i2c_mux i2c_smbus btrtl snd btbcm btmtk btintel bluetooth ecdh_generic rfkill ecc mei_me mei ioatdma dca wmi nfsd drm i2c_dev fuse nfnetlink [ 371.073198] CR2: 000001ffffffffff [ 371.073199] ---[ end trace 0000000000000000 ]--- Add a guard around file close, and ensure the warnings from drm_mode_config do not trigger. Fix those by allowing an open reference to the file descriptor and cleaning up the file linked list entry in drm_mode_config_cleanup(). Cc: <stable@vger.kernel.org> # v4.18+ Fixes: bee330f3d672 ("drm: Use srcu to protect drm_device.unplugged") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260313151728.14990-4-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2026-03-17drm/amdgpu: Fix ISP segfault issue in kernel v7.0Pratap Nirujogi
Add NULL pointer checks for dev->type before accessing dev->type->name in ISP genpd add/remove functions to prevent kernel crashes. This regression was introduced in v7.0 as the wakeup sources are registered using physical device instead of ACPI device. This led to adding wakeup source device as the first child of AMDGPU device without initializing dev-type variable, and resulted in segfault when accessed it in the amdgpu isp driver. Fixes: 057edc58aa59 ("ACPI: PM: Register wakeup sources under physical devices") Suggested-by: Bin Du <Bin.Du@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c51632d1ed7ac5aed2d40dbc0718d75342c12c6a)
2026-03-17drm/amdgpu/gmc9.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Cc: Benjamin Cheng <benjamin.cheng@amd.com> Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e14d468304832bcc4a082d95849bc0a41b18ddea) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub4.2.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dea5f235baf3786bfd4fd920b03c19285fdc3d9f) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub4.1.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 04f063d85090f5dd0c671010ce88ee49d9dcc8ed) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub3.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f14f27bbe2a3ed7af32d5f6eaf3f417139f45253) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub3.0.2: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1441f52c7f6ae6553664aa9e3e4562f6fc2fe8ea) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub3.0.1: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5f76083183363c4528a4aaa593f5d38c28fe7d7b) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub2.3: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 89cd90375c19fb45138990b70e9f4ba4806f05c4) Cc: stable@vger.kernel.org
2026-03-17drm/amdgpu/mmhub2.0: add bounds checking for cidAlex Deucher
The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e064cef4b53552602bb6ac90399c18f662f3cacd) Cc: stable@vger.kernel.org
2026-03-17drm/amd: fix dcn 2.01 checkAndy Nguyen
The ASICREV_IS_BEIGE_GOBY_P check always took precedence, because it includes all chip revisions upto NV_UNKNOWN. Fixes: 54b822b3eac3 ("drm/amd/display: Use dce_version instead of chip_id") Signed-off-by: Andy Nguyen <theofficialflow1996@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9c7be0efa6f0daa949a5f3e3fdf9ea090b0713cb)
2026-03-17drm/amd/display: Fix DisplayID not-found handling in parse_edid_displayid_vrr()Srinivasan Shanmugam
parse_edid_displayid_vrr() searches the EDID extension blocks for a DisplayID extension before parsing the dynamic video timing range. The code previously checked whether edid_ext was NULL after the search loop. However, edid_ext is assigned during each iteration of the loop, so it will never be NULL once the loop has executed. If no DisplayID extension is found, edid_ext ends up pointing to the last extension block, and the NULL check does not correctly detect the failure case. Instead, check whether the loop completed without finding a matching DisplayID block by testing "i == edid->extensions". This ensures the function exits early when no DisplayID extension is present and avoids parsing an unrelated EDID extension block. Also simplify the EDID validation check using "!edid || !edid->extensions". Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:13079 parse_edid_displayid_vrr() warn: variable dereferenced before check 'edid_ext' (see line 13075) Fixes: a638b837d0e6 ("drm/amd/display: Fix refresh rate range for some panel") Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Jerry Zuo <jerry.zuo@amd.com> Cc: Sun peng Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91c7e6342e98c846b259c57273436fdea4c043f2)
2026-03-17drm/amd/display: Wrap dcn32_override_min_req_memclk() in DC_FP_{START, END}Xi Ruoyao
[Why] The dcn32_override_min_req_memclk function is in dcn32_fpu.c, which is compiled with CC_FLAGS_FPU into FP instructions. So when we call it we must use DC_FP_{START,END} to save and restore the FP context, and prepare the FP unit on architectures like LoongArch where the FP unit isn't always on. Reported-by: LiarOnce <liaronce@hotmail.com> Fixes: ee7be8f3de1c ("drm/amd/display: Limit DCN32 8 channel or less parts to DPM1 for FPO") Signed-off-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 25bb1d54ba3983c064361033a8ec15474fece37e) Cc: stable@vger.kernel.org
2026-03-17drm/amd/display: Fix uninitialized variable use which breaks full LTOCalvin Owens
Commit e1b385726f7f ("drm/amd/display: Add additional checks for PSP footer size") introduced a use of an uninitialized stack variable in dm_dmub_sw_init() (region_params.bss_data_size). Interestingly, this seems to cause no issue on normal kernels. But when full LTO is enabled, it causes the compiler to "optimize" out huge swaths of amdgpu initialization code, and the driver is unusable: amdgpu 0000:03:00.0: [drm] Loading DMUB firmware via PSP: version=0x07002F00 amdgpu 0000:03:00.0: sw_init of IP block <dm> failed 5 amdgpu 0000:03:00.0: amdgpu_device_ip_init failed amdgpu 0000:03:00.0: Fatal error during GPU init It surprises me that neither gcc nor clang emit a warning about this: I only found it by bisecting the LTO breakage. Fix by using the bss_data_size field from fw_meta_info_params, as was presumably intended. Fixes: e1b385726f7f ("drm/amd/display: Add additional checks for PSP footer size") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b7f1402f6ad24cc6b9a01fa09ebd1c6559d787d0)
2026-03-17drm/amdgpu: Limit BO list entry count to prevent resource exhaustionJesse.Zhang
Userspace can pass an arbitrary number of BO list entries via the bo_number field. Although the previous multiplication overflow check prevents out-of-bounds allocation, a large number of entries could still cause excessive memory allocation (up to potentially gigabytes) and unnecessarily long list processing times. Introduce a hard limit of 128k entries per BO list, which is more than sufficient for any realistic use case (e.g., a single list containing all buffers in a large scene). This prevents memory exhaustion attacks and ensures predictable performance. Return -EINVAL if the requested entry count exceeds the limit Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 688b87d39e0aa8135105b40dc167d74b5ada5332) Cc: stable@vger.kernel.org
2026-03-17drm/amd/display: Fix gamma 2.2 colorop TFsAlex Hung
Use GAMMA22 for degamma/blend and GAMMA22_INV for shaper so curves match the color pipeline. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5016 Tested-by: Xaver Hugl <xaver.hugl@kde.org> Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d8f9f42effd767ffa7bbcd7e05fbd6b20737e468)
2026-03-17PCI: endpoint: pci-epf-test: Roll back BAR mapping when subrange setup failsKoichiro Den
When the BAR subrange mapping test on DWC-based platforms fails due to insufficient free inbound iATU regions, pci_epf_test_bar_subrange_setup() returns an error (-ENOSPC) but does not restore the original BAR mapping. This causes subsequent test runs to become confusing, since the failure may leave room for the next subrange mapping test to pass. Fix this by restoring the original BAR mapping when preparation of the subrange mapping fails, so that no side effect remains regardless of the test success or failure. Fixes: 6c5e6101423b ("PCI: endpoint: pci-epf-test: Add BAR subrange mapping test support") Reported-by: Christian Bruel <christian.bruel@foss.st.com> Closes: https://lore.kernel.org/linux-pci/b2b03ebe-9482-4a13-b22f-7b44da096eed@foss.st.com/ Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Christian Bruel <christian.bruel@foss.st.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20260316140225.1481658-1-den@valinux.co.jp
2026-03-17drm/pagemap_util: Ensure proper cache lock management on freeJonathan Cavitt
For the sake of consistency, ensure that the cache lock is always unlocked after drm_pagemap_cache_fini. Spinlocks typically disable preemption and if the code-path missing the unlock is hit, preemption will remain disabled even if the lock is subsequently freed. Fixes static analysis issue. v2: - Use requested code flow (Maarten) v3: - Clear cache->dpagemap (Matt Brost, Maarten) v4: - Reword commit message (Thomas) Fixes: 77f14f2f2d73f ("drm/pagemap: Add a drm_pagemap cache and shrinker") Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260316151555.7553-2-jonathan.cavitt@intel.com
2026-03-17drm/imagination: Disable interrupts before suspending the GPUAlessio Belle
This is an additional safety layer to ensure no accesses to the GPU registers can be made while it is powered off. While we can disable IRQ generation from GPU, META firmware, MIPS firmware and for safety events, we cannot do the same for the RISC-V firmware. To keep a unified approach, once the firmware has completed its power off sequence, disable IRQs for the while GPU at the kernel level instead. Signed-off-by: Alessio Belle <alessio.belle@imgtec.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Link: https://patch.msgid.link/20260310-drain-irqs-before-suspend-v1-2-bf4f9ed68e75@imgtec.com Signed-off-by: Matt Coster <matt.coster@imgtec.com>
2026-03-17drm/imagination: Synchronize interrupts before suspending the GPUAlessio Belle
The runtime PM suspend callback doesn't know whether the IRQ handler is in progress on a different CPU core and doesn't wait for it to finish. Depending on timing, the IRQ handler could be running while the GPU is suspended, leading to kernel crashes when trying to access GPU registers. See example signature below. In a power off sequence initiated by the runtime PM suspend callback, wait for any IRQ handlers in progress on other CPU cores to finish, by calling synchronize_irq(). At the same time, remove the runtime PM resume/put calls in the threaded IRQ handler. On top of not being the right approach to begin with, and being at the wrong place as they should have wrapped all GPU register accesses, the driver would hit a deadlock between synchronize_irq() being called from a runtime PM suspend callback, holding the device power lock, and the resume callback requiring the same. Example crash signature on a TI AM68 SK platform: [ 337.241218] SError Interrupt on CPU0, code 0x00000000bf000000 -- SError [ 337.241239] CPU: 0 UID: 0 PID: 112 Comm: irq/234-gpu Tainted: G M 6.17.7-B2C-00005-g9c7bbe4ea16c #2 PREEMPT [ 337.241246] Tainted: [M]=MACHINE_CHECK [ 337.241249] Hardware name: Texas Instruments AM68 SK (DT) [ 337.241252] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 337.241256] pc : pvr_riscv_irq_pending+0xc/0x24 [ 337.241277] lr : pvr_device_irq_thread_handler+0x64/0x310 [ 337.241282] sp : ffff800085b0bd30 [ 337.241284] x29: ffff800085b0bd50 x28: ffff0008070d9eab x27: ffff800083a5ce10 [ 337.241291] x26: ffff000806e48f80 x25: ffff0008070d9eac x24: 0000000000000000 [ 337.241296] x23: ffff0008068e9bf0 x22: ffff0008068e9bd0 x21: ffff800085b0bd30 [ 337.241301] x20: ffff0008070d9e00 x19: ffff0008068e9000 x18: 0000000000000001 [ 337.241305] x17: 637365645f656c70 x16: 0000000000000000 x15: ffff000b7df9ff40 [ 337.241310] x14: 0000a585fe3c0d0e x13: 000000999704f060 x12: 000000000002771a [ 337.241314] x11: 00000000000000c0 x10: 0000000000000af0 x9 : ffff800085b0bd00 [ 337.241318] x8 : ffff0008071175d0 x7 : 000000000000b955 x6 : 0000000000000003 [ 337.241323] x5 : 0000000000000000 x4 : 0000000000000002 x3 : 0000000000000000 [ 337.241327] x2 : ffff800080e39d20 x1 : ffff800080e3fc48 x0 : 0000000000000000 [ 337.241333] Kernel panic - not syncing: Asynchronous SError Interrupt [ 337.241337] CPU: 0 UID: 0 PID: 112 Comm: irq/234-gpu Tainted: G M 6.17.7-B2C-00005-g9c7bbe4ea16c #2 PREEMPT [ 337.241342] Tainted: [M]=MACHINE_CHECK [ 337.241343] Hardware name: Texas Instruments AM68 SK (DT) [ 337.241345] Call trace: [ 337.241348] show_stack+0x18/0x24 (C) [ 337.241357] dump_stack_lvl+0x60/0x80 [ 337.241364] dump_stack+0x18/0x24 [ 337.241368] vpanic+0x124/0x2ec [ 337.241373] abort+0x0/0x4 [ 337.241377] add_taint+0x0/0xbc [ 337.241384] arm64_serror_panic+0x70/0x80 [ 337.241389] do_serror+0x3c/0x74 [ 337.241392] el1h_64_error_handler+0x30/0x48 [ 337.241400] el1h_64_error+0x6c/0x70 [ 337.241404] pvr_riscv_irq_pending+0xc/0x24 (P) [ 337.241410] irq_thread_fn+0x2c/0xb0 [ 337.241416] irq_thread+0x170/0x334 [ 337.241421] kthread+0x12c/0x210 [ 337.241428] ret_from_fork+0x10/0x20 [ 337.241434] SMP: stopping secondary CPUs [ 337.241451] Kernel Offset: disabled [ 337.241453] CPU features: 0x040000,02002800,20002001,0400421b [ 337.241456] Memory Limit: none [ 337.457921] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]--- Fixes: cc1aeedb98ad ("drm/imagination: Implement firmware infrastructure and META FW support") Fixes: 96822d38ff57 ("drm/imagination: Handle Rogue safety event IRQs") Cc: stable@vger.kernel.org # see patch description, needs adjustments for < 6.16 Signed-off-by: Alessio Belle <alessio.belle@imgtec.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Link: https://patch.msgid.link/20260310-drain-irqs-before-suspend-v1-1-bf4f9ed68e75@imgtec.com Signed-off-by: Matt Coster <matt.coster@imgtec.com>
2026-03-17drm/imagination: Fix deadlock in soft reset sequenceAlessio Belle
The soft reset sequence is currently executed from the threaded IRQ handler, hence it cannot call disable_irq() which internally waits for IRQ handlers, i.e. itself, to complete. Use disable_irq_nosync() during a soft reset instead. Fixes: cc1aeedb98ad ("drm/imagination: Implement firmware infrastructure and META FW support") Cc: stable@vger.kernel.org Signed-off-by: Alessio Belle <alessio.belle@imgtec.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Link: https://patch.msgid.link/20260309-fix-soft-reset-v1-1-121113be554f@imgtec.com Signed-off-by: Matt Coster <matt.coster@imgtec.com>
2026-03-17iommu/amd: Block identity domain when SNP enabledJoe Damato
Previously, commit 8388f7df936b ("iommu/amd: Do not support IOMMU_DOMAIN_IDENTITY after SNP is enabled") prevented users from changing the IOMMU domain to identity if SNP was enabled. This resulted in an error when writing to sysfs: # echo "identity" > /sys/kernel/iommu_groups/50/type -bash: echo: write error: Cannot allocate memory However, commit 4402f2627d30 ("iommu/amd: Implement global identity domain") changed the flow of the code, skipping the SNP guard and allowing users to change the IOMMU domain to identity after a machine has booted. Once the user does that, they will probably try to bind and the device/driver will start to do DMA which will trigger errors: iommu ivhd3: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:43:00.0 pasid=0x00000 address=0x3737b01000 flags=0x0020] iommu ivhd3: AMD-Vi: Control Reg : 0xc22000142148d AMD-Vi: DTE[0]: 6000000000000003 AMD-Vi: DTE[1]: 0000000000000001 AMD-Vi: DTE[2]: 2000003088b3e013 AMD-Vi: DTE[3]: 0000000000000000 bnxt_en 0000:43:00.0 (unnamed net_device) (uninitialized): Error (timeout: 500015) msg {0x0 0x0} len:0 iommu ivhd3: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:43:00.0 pasid=0x00000 address=0x3737b01000 flags=0x0020] iommu ivhd3: AMD-Vi: Control Reg : 0xc22000142148d AMD-Vi: DTE[0]: 6000000000000003 AMD-Vi: DTE[1]: 0000000000000001 AMD-Vi: DTE[2]: 2000003088b3e013 AMD-Vi: DTE[3]: 0000000000000000 bnxt_en 0000:43:00.0: probe with driver bnxt_en failed with error -16 To prevent this from happening, create an attach wrapper for identity_domain_ops which returns EINVAL if amd_iommu_snp_en is true. With this commit applied: # echo "identity" > /sys/kernel/iommu_groups/62/type -bash: echo: write error: Invalid argument Fixes: 4402f2627d30 ("iommu/amd: Implement global identity domain") Signed-off-by: Joe Damato <joe@dama.to> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-17iommu/sva: Fix crash in iommu_sva_unbind_device()Lizhi Hou
domain->mm->iommu_mm can be freed by iommu_domain_free(): iommu_domain_free() mmdrop() __mmdrop() mm_pasid_drop() After iommu_domain_free() returns, accessing domain->mm->iommu_mm may dereference a freed mm structure, leading to a crash. Fix this by moving the code that accesses domain->mm->iommu_mm to before the call to iommu_domain_free(). Fixes: e37d5a2d60a3 ("iommu/sva: invalidate stale IOTLB entries for kernel address space") Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-17net: usb: aqc111: Do not perform PM inside suspend callbackNikola Z. Ivanov
syzbot reports "task hung in rpm_resume" This is caused by aqc111_suspend calling the PM variant of its write_cmd routine. The simplified call trace looks like this: rpm_suspend() usb_suspend_both() - here udev->dev.power.runtime_status == RPM_SUSPENDING aqc111_suspend() - called for the usb device interface aqc111_write32_cmd() usb_autopm_get_interface() pm_runtime_resume_and_get() rpm_resume() - here we call rpm_resume() on our parent rpm_resume() - Here we wait for a status change that will never happen. At this point we block another task which holds rtnl_lock and locks up the whole networking stack. Fix this by replacing the write_cmd calls with their _nopm variants Reported-by: syzbot+48dc1e8dfc92faf1124c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=48dc1e8dfc92faf1124c Fixes: e58ba4544c77 ("net: usb: aqc111: Add support for wake on LAN by MAGIC packet") Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com> Link: https://patch.msgid.link/20260313141643.1181386-1-zlatistiv@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-17iommu: Fix mapping check for 0x0 to avoid re-mapping itAntheas Kapenekakis
Commit 789a5913b29c ("iommu/amd: Use the generic iommu page table") introduces the shared iommu page table for AMD IOMMU. Some bioses contain an identity mapping for address 0x0, which is not parsed properly (e.g., certain Strix Halo devices). This causes the DMA components of the device to fail to initialize (e.g., the NVMe SSD controller), leading to a failed post. Specifically, on the GPD Win 5, the NVME and SSD GPU fail to mount, making collecting errors difficult. While debugging, it was found that a -EADDRINUSE error was emitted and its source was traced to iommu_iova_to_phys(). After adding some debug prints, it was found that phys_addr becomes 0, which causes the code to try to re-map the 0 address and fail, causing a cascade leading to a failed post. This is because the GPD Win 5 contains a 0x0-0x1 identity mapping for DMA devices, causing it to be repeated for each device. The cause of this failure is the following check in iommu_create_device_direct_mappings(), where address aliasing is handled via the following check: ``` phys_addr = iommu_iova_to_phys(domain, addr); if (!phys_addr) { map_size += pg_size; continue; } ```` Obviously, the iommu_iova_to_phys() signature is faulty and aliases unmapped and 0 together, causing the allocation code to try to re-allocate the 0 address per device. However, it has too many instantiations to fix. Therefore, use a ternary so that when addr is 0, the check is done for address 1 instead. Suggested-by: Robin Murphy <robin.murphy@arm.com> Fixes: 789a5913b29c ("iommu/amd: Use the generic iommu page table") Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-17iommu/vt-d: Only handle IOPF for SVA when PRI is supportedLu Baolu
In intel_svm_set_dev_pasid(), the driver unconditionally manages the IOPF handling during a domain transition. However, commit a86fb7717320 ("iommu/vt-d: Allow SVA with device-specific IOPF") introduced support for SVA on devices that handle page faults internally without utilizing the PCI PRI. On such devices, the IOMMU-side IOPF infrastructure is not required. Calling iopf_for_domain_replace() on these devices is incorrect and can lead to unexpected failures during PASID attachment or unwinding. Add a check for info->pri_supported to ensure that the IOPF queue logic is only invoked for devices that actually rely on the IOMMU's PRI-based fault handling. Fixes: 17fce9d2336d ("iommu/vt-d: Put iopf enablement in domain attach path") Cc: stable@vger.kernel.org Suggested-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20260310075520.295104-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-17iommu/vt-d: Fix intel iommu iotlb sync hardlockup and retryGuanghui Feng
During the qi_check_fault process after an IOMMU ITE event, requests at odd-numbered positions in the queue are set to QI_ABORT, only satisfying single-request submissions. However, qi_submit_sync now supports multiple simultaneous submissions, and can't guarantee that the wait_desc will be at an odd-numbered position. Therefore, if an item times out, IOMMU can't re-initiate the request, resulting in an infinite polling wait. This modifies the process by setting the status of all requests already fetched by IOMMU and recorded as QI_IN_USE status (including wait_desc requests) to QI_ABORT, thus enabling multiple requests to be resubmitted. Fixes: 8a1d82462540 ("iommu/vt-d: Multiple descriptors per qi_submit_sync()") Cc: stable@vger.kernel.org Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Link: https://lore.kernel.org/r/20260306101516.3885775-1-guanghuifeng@linux.alibaba.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Fixes: 8a1d82462540 ("iommu/vt-d: Multiple descriptors per qi_submit_sync()") Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-17dmaengine: xilinx_dma: Fix reset related timeout with two-channel AXIDMATomi Valkeinen
A single AXIDMA controller can have one or two channels. When it has two channels, the reset for both are tied together: resetting one channel resets the other as well. This creates a problem where resetting one channel will reset the registers for both channels, including clearing interrupt enable bits for the other channel, which can then lead to timeouts as the driver is waiting for an interrupt which never comes. The driver currently has a probe-time work around for this: when a channel is created, the driver also resets and enables the interrupts. With two channels the reset for the second channel will clear the interrupt enables for the first one. The work around in the driver is just to manually enable the interrupts again in xilinx_dma_alloc_chan_resources(). This workaround only addresses the probe-time issue. When channels are reset at runtime (e.g., in xilinx_dma_terminate_all() or during error recovery), there's no corresponding mechanism to restore the other channel's interrupt enables. This leads to one channel having its interrupts disabled while the driver expects them to work, causing timeouts and DMA failures. A proper fix is a complicated matter, as we should not reset the other channel when it's operating normally. So, perhaps, there should be some kind of synchronization for a common reset, which is not trivial to implement. To add to the complexity, the driver also supports other DMA types, like VDMA, CDMA and MCDMA, which don't have a shared reset. However, when the two-channel AXIDMA is used in the (assumably) normal use case, providing DMA for a single memory-to-memory device, the common reset is a bit smaller issue: when something bad happens on one channel, or when one channel is terminated, the assumption is that we also want to terminate the other channel. And thus resetting both at the same time is "ok". With that line of thinking we can implement a bit better work around than just the current probe time work around: let's enable the AXIDMA interrupts at xilinx_dma_start_transfer() instead. This ensures interrupts are enabled whenever a transfer starts, regardless of any prior resets that may have cleared them. This approach is also more logical: enable interrupts only when needed for a transfer, rather than at resource allocation time, and, I think, all the other DMA types should also use this model, but I'm reluctant to do such changes as I cannot test them. The reset function still enables interrupts even though it's not needed for AXIDMA anymore, but it's common code for all DMA types (VDMA, CDMA, MCDMA), so leave it unchanged to avoid affecting other variants. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Fixes: c0bba3a99f07 ("dmaengine: vdma: Add Support for Xilinx AXI Direct Memory Access Engine") Link: https://patch.msgid.link/20260311-xilinx-dma-fix-v2-1-a725abb66e3c@ideasonboard.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-03-17dmaengine: xilinx: xilinx_dma: Fix unmasked residue subtractionMarek Vasut
The segment .control and .status fields both contain top bits which are not part of the buffer size, the buffer size is located only in the bottom max_buffer_len bits. To avoid interference from those top bits, mask out the size using max_buffer_len first, and only then subtract the values. Fixes: a575d0b4e663 ("dmaengine: xilinx_dma: Introduce xilinx_dma_get_residue") Signed-off-by: Marek Vasut <marex@nabladev.com> Link: https://patch.msgid.link/20260316222530.163815-1-marex@nabladev.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-03-17dmaengine: xilinx: xilinx_dma: Fix residue calculation for cyclic DMAMarek Vasut
The cyclic DMA calculation is currently entirely broken and reports residue only for the first segment. The problem is twofold. First, when the first descriptor finishes, it is moved from active_list to done_list, but it is never returned back into the active_list. The xilinx_dma_tx_status() expects the descriptor to be in the active_list to report any meaningful residue information, which never happens after the first descriptor finishes. Fix this up in xilinx_dma_start_transfer() and if the descriptor is cyclic, lift it from done_list and place it back into active_list list. Second, the segment .status fields of the descriptor remain dirty. Once the DMA did one pass on the descriptor, the .status fields are populated with data by the DMA, but the .status fields are not cleared before reuse during the next cyclic DMA round. The xilinx_dma_get_residue() recognizes that as if the descriptor was complete and had 0 residue, which is bogus. Reinitialize the status field before placing the descriptor back into the active_list. Fixes: c0bba3a99f07 ("dmaengine: vdma: Add Support for Xilinx AXI Direct Memory Access Engine") Signed-off-by: Marek Vasut <marex@nabladev.com> Link: https://patch.msgid.link/20260316221943.160375-1-marex@nabladev.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-03-17dmaengine: xilinx: xilinx_dma: Fix dma_device directionsMarek Vasut
Unlike chan->direction , struct dma_device .directions field is a bitfield. Turn chan->direction into a bitfield to make it compatible with struct dma_device .directions . Fixes: 7e01511443c3 ("dmaengine: xilinx_dma: Set dma_device directions") Signed-off-by: Marek Vasut <marex@nabladev.com> Link: https://patch.msgid.link/20260316221728.160139-1-marex@nabladev.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-03-17dmaengine: sh: rz-dmac: Move CHCTRL updates under spinlockClaudiu Beznea
Both rz_dmac_disable_hw() and rz_dmac_irq_handle_channel() update the CHCTRL register. To avoid concurrency issues when configuring functionalities exposed by this registers, take the virtual channel lock. All other CHCTRL updates were already protected by the same lock. Previously, rz_dmac_disable_hw() disabled and re-enabled local IRQs, before accessing CHCTRL registers but this does not ensure race-free access. Remove the local IRQ disable/enable code as well. Fixes: 5000d37042a6 ("dmaengine: sh: Add DMAC driver for RZ/G2L SoC") Cc: stable@vger.kernel.org Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Link: https://patch.msgid.link/20260316133252.240348-3-claudiu.beznea.uj@bp.renesas.com Signed-off-by: Vinod Koul <vkoul@kernel.org>