summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2025-12-02drm/amdgpu/cz_ih: Enable soft IRQ handler ringTimur Kristóf
We are going to use the soft IRQ handler ring on GMC v8 to process interrupts from VM faults. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/tonga_ih: Enable soft IRQ handler ringTimur Kristóf
We are going to use the soft IRQ handler ring on GMC v8 to process interrupts from VM faults. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/iceland_ih: Enable soft IRQ handler ringTimur Kristóf
We are going to use the soft IRQ handler ring on GMC v8 to process interrupts from VM faults. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/cik_ih: Enable soft IRQ handler ringTimur Kristóf
We are going to use the soft IRQ handler ring on GMC v7 (CIK) to process interrupts from VM faults. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/si_ih: Enable soft IRQ handler ringTimur Kristóf
We are going to use the soft IRQ handler ring on GMC v6 (SI) to process interrupts from VM faults. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu: add missing lock to amdgpu_ttm_access_memory_sdmaPierre-Eric Pelloux-Prayer
Users of ttm entities need to hold the gtt_window_lock before using them to guarantee proper ordering of jobs. Cc: stable@vger.kernel.org Fixes: cb5cc4f573e1 ("drm/amdgpu: improve debug VRAM access performance using sdma") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02Merge tag 'drm-misc-next-2025-12-01-1' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next Extra drm-misc-next for v6.19-rc1: UAPI Changes: - Add support for drm colorop pipeline. - Add COLOR PIPELINE plane property. - Add DRM_CLIENT_CAP_PLANE_COLOR_PIPELINE. Cross-subsystem Changes: - Attempt to use higher order mappings in system heap allocator. - Always taint kernel with sw-sync. Core Changes: - Small fixes to drm/gem. - Support emergency restore to drm-client. - Allocate and release fb_info in single place. - Rework ttm pipelined eviction fence handling. Driver Changes: - Support the drm color pipeline in vkms, amdgfx. - Add NVJPG driver for tegra. - Assorted small fixes and updates to rockchip, bridge/dw-hdmi-qp, panthor. - Add ASL CS5263 DP-to-HDMI simple bridge. - Add and improve support for G LD070WX3-SL01 MIPI DSI, Samsung LTL106AL0, Samsung LTL106AL01, Raystar RFF500F-AWH-DNN, Winstar WF70A8SYJHLNGA, Wanchanglong w552946aaa, Samsung SOFEF00, Lenovo X13s panel. - Add support for it66122 to it66121. - Support mali-G1 gpu in panthor. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/aa5cbd50-7676-4a59-bbed-e8428af86804@linux.intel.com
2025-11-26drm/amdgpu: fix cyan_skillfish2 gpu info fw handlingAlex Deucher
If the board supports IP discovery, we don't need to parse the gpu info firmware. Backport to 6.18. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4721 Fixes: fa819e3a7c1e ("drm/amdgpu: add support for cyan skillfish gpu_info") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5427e32fa3a0ba9a016db83877851ed277b065fb)
2025-11-26drm/amdgpu: attach tlb fence to the PTs updatePrike Liang
Ensure the userq TLB flush is emitted only after the VM update finishes and the PT BOs have been annotated with bookkeeping fences. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f3854e04b708d73276c4488231a8bd66d30b4671) Cc: stable@vger.kernel.org
2025-11-26drm/amdgpu: fix cyan_skillfish2 gpu info fw handlingAlex Deucher
If the board supports IP discovery, we don't need to parse the gpu info firmware. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4721 Fixes: fa819e3a7c1e ("drm/amdgpu: add support for cyan skillfish gpu_info") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-26drm/amdgpu: Fix CPER ring debugfs read buffer overflow riskSrinivasan Shanmugam
The CPER ring debugfs read code always writes a 12-byte header when the file is read for the first time (*offset == 0): copy_to_user(buf, ring_header, 12); But the code never checks whether the user buffer (@size) is at least 12 bytes long. After writing the 12-byte header, the code then gives the full original @size to the CPER payload handler: record_req->buf_size = size; This means the function can write: 12 bytes (header) + payload bytes (up to @size) into a buffer that is only @size bytes big. In other words, the kernel may write more data than the user asked for. This can overflow the user buffer. The fix is: - If the user buffer is smaller than 12 bytes on the first read, return -EINVAL instead of copying the header. - After writing the 12-byte header, subtract 12 from @size and pass the reduced size to record_req->buf_size. This ensures the CPER payload only uses the remaining free space in the buffer. Reads after the first one (*offset != 0) do not write the header, so their behavior stays exactly the same. The only user-visible change is that tiny buffers now fail safely instead of risking an overflow. Fixes: drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:523 amdgpu_ras_cper_debugfs_read() warn: userbuf overflow? is 'ring_header_size' <= 'size' Fixes: 527e3d40339b ("drm/amd/ras: Add CPER ring read for uniras") Reported by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Xiang Liu <xiang.liu@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-26drm/amdgpu: attach tlb fence to the PTs updatePrike Liang
Ensure the userq TLB flush is emitted only after the VM update finishes and the PT BOs have been annotated with bookkeeping fences. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-26drm/amdgpu: free job fences on failure in amdgpu_job_alloc_with_ibPierre-Eric Pelloux-Prayer
Otherwise we're leaking memory. Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-26drm/amdgpu: clear job on failure in amdgpu_job_alloc(_with_ib)Pierre-Eric Pelloux-Prayer
If memory is freed we need to nullify the pointer or the caller might call kfree again (eg: amdgpu_cs_parser_fini calls kfree on all non-null job pointers). Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-26drm/amdgpu: use ttm_resource_manager_cleanupPierre-Eric Pelloux-Prayer
Rather than open-coding it. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20251121101315.3585-3-pierre-eric.pelloux-prayer@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>
2025-11-24drm/amd/amdgpu: reserve vm invalidation engine for uni_mesMichael Chen
Reserve vm invalidation engine 6 when uni_mes enabled. It is used in processing tlb flush request from host. Signed-off-by: Michael Chen <michael.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Shaoyun liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 873373739b9b150720ea2c5390b4e904a4d21505) Cc: stable@vger.kernel.org
2025-11-24Revert "drm/amd: fix gfx hang on renoir in IGT reload test"Rodrigo Siqueira
The original patch introduced additional latency during boot time because it triggers a driver reload to avoid a CP hang when the driver is reloaded multiple times. This has been addressed with a more generic solution that triggers the GPU reset only during the unload phase, avoiding extra latency during boot time. For this reason, this commit reverts the original change. This reverts commit 72a98763b473890e6605604bfcaf71fc212b4720. This patch should only be applied if commit: 4355e61835e7 ("drm/amdgpu: Fix GFX hang on SteamDeck when amdgpu is reloaded") is present. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-24drm/amdgpu: Fix GFX hang on SteamDeck when amdgpu is reloadedRodrigo Siqueira
When trying to unload amdgpu in the SteamDeck (TTY mode), the following set of errors happens and the system gets unstable: [..] [drm] Initialized amdgpu 3.64.0 for 0000:04:00.0 on minor 0 amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx_0.0.0 (-110). amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110). [..] amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff! amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff! [..] When the driver initializes the GPU, the PSP validates all the firmware loaded, and after that, it is not possible to load any other firmware unless the device is reset. What is happening in the load/unload situation is that PSP halts the GC engine because it suspects that something is amiss. To address this issue, this commit ensures that the GPU is reset (mode 2 reset) in the unload sequence. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-24drm/amd/amdgpu: reserve vm invalidation engine for uni_mesMichael Chen
Reserve vm invalidation engine 6 when uni_mes enabled. It is used in processing tlb flush request from host. Signed-off-by: Michael Chen <michael.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Shaoyun liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-21Merge tag 'v6.18-rc6' into drm-nextDave Airlie
Linux 6.18-rc6 Backmerge in order to merge msm next Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-11-19drm/amdgpu: Add sriov vf check for VCN per queue reset support.Shikang Fan
Add SRIOV check when setting VCN ring's supported reset mask. Signed-off-by: Shikang Fan <shikang.fan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ee9b603ad43f9870eb75184f9fb0a84f8c3cc852) Cc: stable@vger.kernel.org
2025-11-19drm/amdgpu/ttm: Fix crash when handling MMIO_REMAP in PDE flagsSrinivasan Shanmugam
The MMIO_REMAP BO is a special 4K IO page that does not have a ttm_tt behind it. However, amdgpu_ttm_tt_pde_flags() was treating it like normal TT/doorbell/preempt memory and unconditionally accessed ttm->caching. For the MMIO_REMAP BO, ttm is NULL, so this leads to a NULL pointer dereference when computing PDE flags. Fix this by checking that ttm is non-NULL before reading ttm->caching. This prevents the crash for MMIO_REMAP and also makes the code more defensive if other BOs ever come through without a ttm_tt. Fixes: fb5a52dbe9fe ("drm/amdgpu: Implement TTM handling for MMIO_REMAP placement") Suggested-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Tested-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0db94da5a0a1cacda080b9ec8425fcbe4babc141)
2025-11-19drm/amdgpu/vm: Check PRT uAPI flag instead of PTE flagTimur Kristóf
This fixes sparse mappings (aka. partially resident textures). Check the correct flags. Since a recent refactor, the code works with uAPI flags (for mapping buffer objects), and not PTE (page table entry) flags. Fixes: 6716a823d18d ("drm/amdgpu: rework how PTE flags are generated v3") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8feeab26c80635b802f72b3ed986c693ff8f3212)
2025-11-19drm/amdgpu: Skip emit de meta data on gfx11 with rs64 enabledYifan Zha
[Why] Accoreding to CP updated to RS64 on gfx11, WRITE_DATA with PREEMPTION_META_MEMORY(dst_sel=8) is illegal for CP FW. That packet is used for MCBP on F32 based system. So it would lead to incorrect GRBM write and FW is not handling that extra case correctly. [How] With gfx11 rs64 enabled, skip emit de meta data. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8366cd442d226463e673bed5d199df916f4ecbcf) Cc: stable@vger.kernel.org
2025-11-19drm/amd: Skip power ungate during suspend for VPEMario Limonciello
During the suspend sequence VPE is already going to be power gated as part of vpe_suspend(). It's unnecessary to call during calls to amdgpu_device_set_pg_state(). It actually can expose a race condition with the firmware if s0i3 sequence starts as well. Drop these calls. Cc: Peyton.Lee@amd.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2a6c826cfeedd7714611ac115371a959ead55bda) Cc: stable@vger.kernel.org
2025-11-19drm/amdgpu: Add sriov vf check for VCN per queue reset support.Shikang Fan
Add SRIOV check when setting VCN ring's supported reset mask. Signed-off-by: Shikang Fan <shikang.fan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-19drm/amdgpu/ttm: Fix crash when handling MMIO_REMAP in PDE flagsSrinivasan Shanmugam
The MMIO_REMAP BO is a special 4K IO page that does not have a ttm_tt behind it. However, amdgpu_ttm_tt_pde_flags() was treating it like normal TT/doorbell/preempt memory and unconditionally accessed ttm->caching. For the MMIO_REMAP BO, ttm is NULL, so this leads to a NULL pointer dereference when computing PDE flags. Fix this by checking that ttm is non-NULL before reading ttm->caching. This prevents the crash for MMIO_REMAP and also makes the code more defensive if other BOs ever come through without a ttm_tt. Fixes: fb5a52dbe9fe ("drm/amdgpu: Implement TTM handling for MMIO_REMAP placement") Suggested-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Tested-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-19drm/amdgpu/vm: Check PRT uAPI flag instead of PTE flagTimur Kristóf
This fixes sparse mappings (aka. partially resident textures). Check the correct flags. Since a recent refactor, the code works with uAPI flags (for mapping buffer objects), and not PTE (page table entry) flags. Fixes: 6716a823d18d ("drm/amdgpu: rework how PTE flags are generated v3") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-19drm/amdgpu: Skip emit de meta data on gfx11 with rs64 enabledYifan Zha
[Why] Accoreding to CP updated to RS64 on gfx11, WRITE_DATA with PREEMPTION_META_MEMORY(dst_sel=8) is illegal for CP FW. That packet is used for MCBP on F32 based system. So it would lead to incorrect GRBM write and FW is not handling that extra case correctly. [How] With gfx11 rs64 enabled, skip emit de meta data. Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-19drm/amd: Skip power ungate during suspend for VPEMario Limonciello
During the suspend sequence VPE is already going to be power gated as part of vpe_suspend(). It's unnecessary to call during calls to amdgpu_device_set_pg_state(). It actually can expose a race condition with the firmware if s0i3 sequence starts as well. Drop these calls. Cc: Peyton.Lee@amd.com Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-19drm/amdgpu: Switch to use %ptSpAndy Shevchenko
Use %ptSp instead of open coded variants to print content of struct timespec64 in human readable format. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251113150217.3030010-6-andriy.shevchenko@linux.intel.com Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-18drm/amdgpu: Unregister mce notifierLijo Lazar
Unregister mce notifier on unload. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: Use pci_rebar_get_max_size()Ilpo Järvinen
Use pci_rebar_get_max_size() to simplify amdgpu_device_resize_fb_bar(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113180053.27944-11-ilpo.jarvinen@linux.intel.com
2025-11-14drm/amdgpu: Remove driver side BAR release before resizeIlpo Järvinen
PCI core handles releasing device's resources and their rollback in case of failure of a BAR resizing operation. Releasing resource prior to calling pci_resize_resource() prevents PCI core from restoring the BARs as they were. Remove driver-side release of BARs from the amdgpu driver. Also remove the driver initiated assignment as pci_resize_resource() should try to assign as much as possible. If the driver side call manages to get more required resources assigned in some scenario, such a problem should be fixed inside pci_resize_resource() instead. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU Link: https://patch.msgid.link/20251113162628.5946-11-ilpo.jarvinen@linux.intel.com
2025-11-14PCI: Fix restoring BARs on BAR resize rollback pathIlpo Järvinen
BAR resize operation is implemented in the pci_resize_resource() and pbus_reassign_bridge_resources() functions. pci_resize_resource() can be called either from __resource_resize_store() from sysfs or directly by the driver for the Endpoint Device. The pci_resize_resource() requires that caller has released the device resources that share the bridge window with the BAR to be resized as otherwise the bridge window is pinned in place and cannot be changed. pbus_reassign_bridge_resources() rolls back resources if the resize operation fails, but rollback is performed only for the bridge windows. Because releasing the device resources are done by the caller of the BAR resize interface, these functions performing the BAR resize do not have access to the device resources as they were before the resize. pbus_reassign_bridge_resources() could try __pci_bridge_assign_resources() after rolling back the bridge windows as they were, however, it will not guarantee the resource are assigned due to differences in how FW and the kernel assign the resources (alignment of the start address and tail). To perform rollback robustly, the BAR resize interface has to be altered to also release the device resources that share the bridge window with the BAR to be resized. Also, remove restoring from the entries failed list as saved list should now contain both the bridge windows and device resources so the extra restore is duplicated work. Some drivers (currently only amdgpu) want to prevent releasing some resources. Add exclude_bars param to pci_resize_resource() and make amdgpu pass its register BAR (BAR 2 or 5), which should never be released during resize operation. Normally 64-bit prefetchable resources do not share a bridge window with the 32-bit only register BAR, but there are various fallbacks in the resource assignment logic which may make the resources share the bridge window in rare cases. This change (together with the driver side changes) is to counter the resource releases that had to be done to prevent resource tree corruption in the ("PCI: Release assigned resource before restoring them") change. As such, it likely restores functionality in cases where device resources were released to avoid resource tree conflicts which appeared to be "working" when such conflicts were not correctly detected by the kernel. Reported-by: Simon Richter <Simon.Richter@hogyros.de> Link: https://lore.kernel.org/linux-pci/f9a8c975-f5d3-4dd2-988e-4371a1433a60@hogyros.de/ Reported-by: Alex Bennée <alex.bennee@linaro.org> Link: https://lore.kernel.org/linux-pci/874irqop6b.fsf@draig.linaro.org/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash amdgpu BAR selection from https://lore.kernel.org/r/20251114103053.13778-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113162628.5946-7-ilpo.jarvinen@linux.intel.com
2025-11-14drm/amdgpu: Use amdgpu by default on SI dedicated GPUs (v2)Timur Kristóf
Now that the DC analog connector support and VCE1 support landed, amdgpu is at feature parity with the old radeon driver on SI dGPUs. Enabling the amdgpu driver by default for SI dGPUs has the following benefits: - More stable OpenGL support through RadeonSI - Vulkan support through RADV - Improved performance - Better display features through DC Users who want to keep using the old driver can do so using: amdgpu.si_support=0 radeon.si_support=1 v2: - Update documentation in Kconfig file Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: Use amdgpu by default on CIK dedicated GPUsTimur Kristóf
The amdgpu driver has been working well on CIK dGPUs for years. Now that the DC analog connector support landed, amdgpu is at feature parity with the old radeon driver on CIK dGPUs. Enabling the amdgpu driver by default for CIK dGPUs has the following benefits: - More stable OpenGL support through RadeonSI - Vulkan support through RADV - Improved performance - Better display features through DC Users who want to keep using the old driver can do so using: amdgpu.cik_support=0 radeon.cik_support=1 v2: - Update documentation in Kconfig file v3: - Rebase documentation updates (Alex) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: Fix the issue of missing ras message on sriov hostYiPeng Chai
This code only applies to amdgpu processing poison consumption after uniras is enabled, but not to sriov. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: Add lock to serialize sriov command executionYiPeng Chai
Add lock to serialize sriov command execution. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: Synchronize sriov host to add block_mmsch bit fieldYiPeng Chai
Synchronize sriov host to add block_mmsch bit field. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: use GFP_ATOMIC instead of NOWAIT in the critical pathChristian König
Otherwise job submissions can fail with ENOMEM. We probably need to re-design the per VMID tracking at some point. Signed-off-by: Christian König <christian.koenig@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4258 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: avoid memory allocation in the critical code path v3Christian König
When we run out of VMIDs we need to wait for some to become available. Previously we were using a dma_fence_array for that, but this means that we have to allocate memory. Instead just wait for the first not signaled fence from the least recently used VMID to signal. That is not as efficient since we end up in this function multiple times again, but allocating memory can easily fail or deadlock if we have to wait for memory to become available. v2: remove now unused VM manager fields v3: fix dma_fence reference Signed-off-by: Christian König <christian.koenig@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4258 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: Enable xgmi extended peer links for sriov guestWill Aitken
The amd-smi tool relies on extended peer link information to report xgmi link metrics. The necessary xgmi ta command, GET_EXTEND_PEER_LINKS, has been enabled in the host driver and this change is necessary for the guest to make use of it. To handle the case where the host driver does not have the latest xgmi ta, the guest driver checks for guest support through a pf2vf feature flag before invoking psp. Signed-off-by: Will Aitken <wiaitken@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: Update headers for sriov xgmi ext peer link support feature flagWill Aitken
Adds new sriov msg flag to match host, feature flag in the amdgim enum, and a wrapper macro to check it. Signed-off-by: Will Aitken <wiaitken@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: Refactor sriov xgmi topology filling to common codeWill Aitken
amdgpu_xgmi_fill_topology_info and psp_xgmi_reflect_topology_info perform the same logic of copying topology info of one node to every other node in the hive. Instead of having two functions that purport to do the same thing, this refactoring moves the logic of the fill function to the reflect function and adds reflecting port number info as well for complete functionality. Signed-off-by: Will Aitken <wiaitken@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: Use amdgpu by default on CIK dedicated GPUsTimur Kristóf
The amdgpu driver has been working well on CIK dGPUs for years. Now that the DC analog connector support landed, these GPUs are at feature parity with the old radeon driver. Additionally, amdgpu yields extra performance, supports Vulkan and provides more display features through DC as well as more robust power management. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: Refactor how SI and CIK support is determinedTimur Kristóf
Move the determination into a separate function. Change amdgpu.si_support and amdgpu.cik_support so that their default value is -1 (default). This prepares the code for changing the default driver based on the chip. Also adjust the module param documentation. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-14drm/amdgpu: Avoid xgmi register accessLijo Lazar
On single GPU systems, avoid accesses to XGMI link registers. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-11drm/amdgpu/jpeg: Add parse_cs for JPEG5_0_1Sathishkumar S
enable parse_cs callback for JPEG5_0_1. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 547985579932c1de13f57f8bcf62cd9361b9d3d3) Cc: stable@vger.kernel.org
2025-11-11drm/amd/amdgpu: Ensure isp_kernel_buffer_alloc() creates a new BOSultan Alsawaf
When the BO pointer provided to amdgpu_bo_create_kernel() points to non-NULL, amdgpu_bo_create_kernel() takes it as a hint to pin that address rather than allocate a new BO. This functionality is never desired for allocating ISP buffers. A new BO should always be created when isp_kernel_buffer_alloc() is called, per the description for isp_kernel_buffer_alloc(). Ensure this by zeroing *bo right before the amdgpu_bo_create_kernel() call. Fixes: 55d42f616976 ("drm/amd/amdgpu: Add helper functions for isp buffers") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 73c8c29baac7f0c7e703d92eba009008cbb5228e)