summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2025-11-06drm/i915/xe3p_lpd: Load DMC firmwareGustavo Sousa
Load the DMC firmware for Xe3p_LPD. Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patch.msgid.link/20251103-xe3p_lpd-basic-enabling-v3-9-00e87b510ae7@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-11-06drm/i915/xe3p_lpd: Add CDCLK tableGustavo Sousa
Add CDCLK table for Xe3p_LPD. Just as with Xe3_LPD, we don't need to send voltage index info in the PMDemand message, so we are able to re-use xe3lpd_cdclk_funcs. With the new CDCLK table, we also need to update the maximum CDCLK value returned by intel_update_max_cdclk(). Bspec: 68861, 68863 Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20251103-xe3p_lpd-basic-enabling-v3-8-00e87b510ae7@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-11-06drm/i915/xe3p_lpd: Remove gamma,csc bottom color checksSai Teja Pottumuttu
With Xe3p_LPD, the SKL_BOTTOM_COLOR_GAMMA_ENABLE and SKL_BOTTOM_COLOR_CSC_ENABLE bits are being removed. Thus, we need not set gamma_enable nor csc_enable in crtc_state. Note that GAMMA_MODE.POST_CSC_GAMMA_ENABLE and CSC_MODE.ICL_CSC_ENABLE are the documented alternatives for the bottom color bits being removed. But as these suggested bits are being checked in state checker as part of gamma_mode, csc_mode fields and as gamma_enable/csc_enable are not being used anywhere else functionally post ICL, we need not set these fields in crtc_state. Bspec: 69734 Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Link: https://patch.msgid.link/20251103-xe3p_lpd-basic-enabling-v3-7-00e87b510ae7@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-11-06drm/i915/xe3p_lpd: Horizontal flip support for linear surfacesSai Teja Pottumuttu
Starting from Xe3p_LPD, linear surfaces also support horizontal flip. Bspec: 68904 Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20251103-xe3p_lpd-basic-enabling-v3-6-00e87b510ae7@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-11-06drm/i915/xe3p_lpd: Expand bifield masks dbuf blocks fieldsSai Teja Pottumuttu
On Xe3p_LPD, the dbuf blocks fields of different registers are now documented as 13-bit fields. The dbuf isn't really large enough to need the 13th bit, but let's go ahead and update the definition now just in case some new display IP in future ends up needing the larger size. The extra bit is an unused bit in previous display versions, so we can safely just extend the existing definition. Bspec: 69847, 69880, 72053 Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20251103-xe3p_lpd-basic-enabling-v3-5-00e87b510ae7@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-11-06drm/i915/xe3p_lpd: Update bandwidth parametersMatt Atwood
Bandwidth parameters for Xe3p_LPD are the same as for Xe3_LPD. Re-use them. Since handling for Xe3_LPD version 30.02 is more like a special case, let's use a "== 3002" check for it inside the ">= 30" branch instead of adding a new branch for version 35. That allows us to re-use the ">= 30" branch for Xe3p_LPD. v2: - Do not have a special case for ecc_impacting_de_bw, since there are no specific instructions in Bspec for this scenario. (Matt Roper) v3: - Re-use the ">= 30" branch in the if-ladder. (Matt Roper) Bspec: 68859 Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20251103-xe3p_lpd-basic-enabling-v3-4-00e87b510ae7@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-11-06drm/i915/display: Use braces for if-ladder in intel_bw_init_hw()Gustavo Sousa
Looking at the current if-ladder in intel_bw_init_hw(), we see that Xe2_HPD contains two entries, differing only for ECC memories. Let's improve readability by using braces and allowing adding extra conditions for each case. v2: - Tweaked commit message, since we are not going to add the ECC case for Xe3p_LPD anymore. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20251103-xe3p_lpd-basic-enabling-v3-3-00e87b510ae7@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-11-06drm/i915/xe3p_lpd: Drop north display reset option programmingMatt Roper
The NDE_RSTWRN_OPT has been removed on Xe3p platforms and reset option programming is no longer necessary during display init. Bspec: 68846, 69137 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patch.msgid.link/20251103-xe3p_lpd-basic-enabling-v3-2-00e87b510ae7@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-11-06drm/i915/xe3p_lpd: Add Xe3p_LPD display IP featuresSai Teja Pottumuttu
Xe3p_LPD (display version 35) is similar to Xe2_LPD with respect to the features described by struct intel_display_device_info, so reuse its device descriptor. v2: - Add reference to Bspec 74201. (Shekhar) Bspec: 74201, 74304 Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Link: https://patch.msgid.link/20251103-xe3p_lpd-basic-enabling-v3-1-00e87b510ae7@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
2025-11-06drm/xe/xe3lpg: Extend Wa_15016589081 for xe3lpgNitin Gote
Wa_15016589081 applies to Xe3_LPG renderCS Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://patch.msgid.link/20251106100516.318863-2-nitin.r.gote@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-11-06drm/amd/display: Enable mst when it's detected but yet to be initializedWayne Lin
[Why] drm_dp_mst_topology_queue_probe() is used under the assumption that mst is already initialized. If we connect system with SST first then switch to the mst branch during suspend, we will fail probing topology by calling the wrong API since the mst manager is yet to be initialized. [How] At dm_resume(), once it's detected as mst branc connected, check if the mst is initialized already. If not, call dm_helpers_dp_mst_start_top_mgr() instead to initialize mst V2: Adjust the commit msg a bit Fixes: bc068194f548 ("drm/amd/display: Don't write DP_MSTM_CTRL after LT") Cc: Fangzhi Zuo <jerry.zuo@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 62320fb8d91a0bddc44a228203cfa9bfbb5395bd) Cc: stable@vger.kernel.org
2025-11-06drm/amdgpu: Fix wait after reset sequence in S3Lijo Lazar
For a mode-1 reset done at the end of S3 on PSPv11 dGPUs, only check if TOS is unloaded. Fixes: 32f73741d6ee ("drm/amdgpu: Wait for bootloader after PSPv11 reset") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4649 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1ad25fd272753db14c5d1cc8c68e20ce01f3f888)
2025-11-06drm/amd: Fix suspend failure with secure display TAMario Limonciello
commit c760bcda83571 ("drm/amd: Check whether secure display TA loaded successfully") attempted to fix extra messages, but failed to port the cleanup that was in commit 5c6d52ff4b61e ("drm/amd: Don't try to enable secure display TA multiple times") to prevent multiple tries. Add that to the failure handling path even on a quick failure. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679 Fixes: c760bcda8357 ("drm/amd: Check whether secure display TA loaded successfully") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4104c0a454f6a4d1e0d14895d03c0e7bdd0c8240)
2025-11-06drm/amdgpu: fix gpu page fault after hibernation on PF passthroughSamuel Zhang
On PF passthrough environment, after hibernate and then resume, coralgemm will cause gpu page fault. Mode1 reset happens during hibernate, but partition mode is not restored on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right after resume. When CP access the MQD BO, wrong stride size is used, this will cause out of bound access on the MQD BO, resulting page fault. The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called when resume from a hibernation. KFD resume is called separately during a reset recovery or resume from suspend sequence. Hence it's not required to be called as part of partition switch. Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5d1b32cfe4a676fe552416cb5ae847b215463a1a)
2025-11-06drm/edid: add 6 bpc quirk to the Sharp LQ116M1JW10Ajye Huang
The Sharp LQ116M1JW105 reports that it supports 8 bpc modes, but it will happen display noise in some videos. So, limit it to 6 bpc modes. Signed-off-by: Ajye Huang <ajye_huang@compal.corp-partner.google.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patch.msgid.link/20251101040043.3768848-1-ajye_huang@compal.corp-partner.google.com
2025-11-06drm/amd/pm: Update default power1_capAsad Kamal
Update default power1_cap to max limit for smu_v13_0_6 and smu_v13_0_12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amdgpu: skip writing eeprom when PMFW manages RAS dataTao Zhou
Only update bad page number in legacy eeprom write path. v2: add null pointer check for con. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amd/display: Enable mst when it's detected but yet to be initializedWayne Lin
[Why] drm_dp_mst_topology_queue_probe() is used under the assumption that mst is already initialized. If we connect system with SST first then switch to the mst branch during suspend, we will fail probing topology by calling the wrong API since the mst manager is yet to be initialized. [How] At dm_resume(), once it's detected as mst branc connected, check if the mst is initialized already. If not, call dm_helpers_dp_mst_start_top_mgr() instead to initialize mst V2: Adjust the commit msg a bit Fixes: bc068194f548 ("drm/amd/display: Don't write DP_MSTM_CTRL after LT") Cc: Fangzhi Zuo <jerry.zuo@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amdgpu: support to load RAS bad pages from PMFWTao Zhou
PMFW manages eeprom bad page records, update bad page loading accrodingly. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amdgpu: Fix wait after reset sequence in S3Lijo Lazar
For a mode-1 reset done at the end of S3 on PSPv11 dGPUs, only check if TOS is unloaded. Fixes: 32f73741d6ee ("drm/amdgpu: Wait for bootloader after PSPv11 reset") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4649 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amdgpu: add ras_eeprom_read_idx interfaceTao Zhou
PMFW will manage RAS eeprom data by itself, add new interface to read eeprom data via PMFW, we can read part of records by setting index. v2: use IPID parse interface. pa is not used and set it to a fixed value. v3: optimize the null pointer check for IPID parse interface. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amdgpu: make MCA IPID parse globalTao Zhou
So we can call it in other blocks. v2: add a new IPID parse interface for umc and we can implement it for each ASIC. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amd: Fix suspend failure with secure display TAMario Limonciello
commit c760bcda83571 ("drm/amd: Check whether secure display TA loaded successfully") attempted to fix extra messages, but failed to port the cleanup that was in commit 5c6d52ff4b61e ("drm/amd: Don't try to enable secure display TA multiple times") to prevent multiple tries. Add that to the failure handling path even on a quick failure. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679 Fixes: c760bcda8357 ("drm/amd: Check whether secure display TA loaded successfully") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amd/ras: Fix the issue of incorrect function callYiPeng Chai
When amdgpu_device_health_check fails, amdgpu_ras_pre_reset will not be called and therefore amdgpu_ras_post_reset cannot be called either. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amdgpu: fix gpu page fault after hibernation on PF passthroughSamuel Zhang
On PF passthrough environment, after hibernate and then resume, coralgemm will cause gpu page fault. Mode1 reset happens during hibernate, but partition mode is not restored on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right after resume. When CP access the MQD BO, wrong stride size is used, this will cause out of bound access on the MQD BO, resulting page fault. The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called when resume from a hibernation. KFD resume is called separately during a reset recovery or resume from suspend sequence. Hence it's not required to be called as part of partition switch. Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amd/ras: ras supports i2c eeprom for mp1 v13_0_12YiPeng Chai
ras supports i2c eeprom for mp1 v13_0_12. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amdkfd: Do not wait for queue op response during resetAhmad Rehman
This patch adds the condition to not wait for the queue response for unmap, if the gpu is in reset. Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amdgpu/userq: need to unref boDavid (Ming Qiang) Wu
unref bo after amdgpu_bo_reserve() failure as it has called amdgpu_bo_ref() already Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amdgpu: initialize max record count after table resetGangliang Xie
initialize max record count and record offset after table reset Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amd/pm: check pmfw eeprom feature bitGangliang Xie
get and check the pmfw eeprom feature bit to decide if pmfw eeprom is supported Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amdgpu: add check function for pmfw eepromGangliang Xie
add check function for pmfw eeprom Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amdgpu: add initialization function for pmfw eepromGangliang Xie
add initialization function for pmfw eeprom Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/amdgpu: adapt reset function for pmfw eepromGangliang Xie
adapt reset function for pmfw eeprom Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06drm/tiny: pixpaper: add explicit dependency on MMULiangCheng Wang
The DRM_GEM_SHMEM_HELPER helper requires MMU enabled because it uses vmf_insert_pfn() in its mmap implementation. On NOMMU configurations (e.g. some RISC-V randconfig builds), this symbol is unavailable and selecting DRM_GEM_SHMEM_HELPER causes a modpost undefined reference: ERROR: modpost: "vmf_insert_pfn" [drivers/gpu/drm/drm_shmem_helper.ko] undefined! Normally, Kconfig prevents this helper from being selected when CONFIG_MMU=n. However, in some randconfig builds (such as those used by 0day CI), select statements can override unmet dependencies, triggering the issue. Add an explicit dependency on MMU to DRM_PIXPAPER to prevent this. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202510280213.0rlYA4T3-lkp@intel.com/ Fixes: 0c4932f6ddf8 ("drm/tiny: pixpaper: Fix missing dependency on DRM_GEM_SHMEM_HELPER") Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: LiangCheng Wang <zaq14760@gmail.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20251028-bar-v1-1-edfbd13fafff@gmail.com
2025-11-06drm/ttm: Fix @alloc_flags descriptionBagas Sanjaya
Stephen Rothwell reports htmldocs warnings when merging drm-misc tree: Documentation/gpu/drm-mm:40: include/drm/ttm/ttm_device.h:225: ERROR: Unknown target name: "ttm_allocation". [docutils] Documentation/gpu/drm-mm:43: drivers/gpu/drm/ttm/ttm_device.c:202: ERROR: Unknown target name: "ttm_allocation". [docutils] Documentation/gpu/drm-mm:73: include/drm/ttm/ttm_pool.h:68: ERROR: Unknown target name: "ttm_allocation_pool". [docutils] Documentation/gpu/drm-mm:76: drivers/gpu/drm/ttm/ttm_pool.c:1070: ERROR: Unknown target name: "ttm_allocation_pool". [docutils] Fix these by adding missing wildcard on TTM_ALLOCATION_* and TTM_ALLOCATION_POOL_* in @alloc_flags description. Fixes: 0af5b6a8f8dd ("drm/ttm: Replace multiple booleans with flags in pool init") Fixes: 77e19f8d3297 ("drm/ttm: Replace multiple booleans with flags in device init") Fixes: 402b3a865090 ("drm/ttm: Add an allocation flag to propagate -ENOSPC on OOM") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/linux-next/20251105161838.55b962a3@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://lore.kernel.org/r/20251106005217.14026-1-bagasdotme@gmail.com
2025-11-06drm/nouveau: Advertise correct modifiers on GB20xJames Jones
8 and 16 bit formats use a different layout on GB20x than they did on prior chips. Add the corresponding DRM format modifiers to the list of modifiers supported by the display engine on such chips, and filter the supported modifiers for each format based on its bytes per pixel in nv50_plane_format_mod_supported(). Note this logic will need to be updated when GB10 support is added, since it is a GB20x chip that uses the pre-GB20x sector layout for all formats. Fixes: 6cc6e08d4542 ("drm/nouveau/kms: add support for GB20x") Signed-off-by: James Jones <jajones@nvidia.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251030181153.1208-3-jajones@nvidia.com
2025-11-06drm/nouveau: set DMA mask before creating the flush pageTimur Tabi
Set the DMA mask before calling nvkm_device_ctor(), so that when the flush page is created in nvkm_fb_ctor(), the allocation will not fail if the page is outside of DMA address space, which can easily happen if IOMMU is disable. In such situations, you will get an error like this: nouveau 0000:65:00.0: DMA addr 0x0000000107c56000+4096 overflow (mask ffffffff, bus limit 0). Commit 38f5359354d4 ("rm/nouveau/pci: set streaming DMA mask early") set the mask after calling nvkm_device_ctor(), but back then there was no flush page being created, which might explain why the mask wasn't set earlier. Flush page allocation was added in commit 5728d064190e ("drm/nouveau/fb: handle sysmem flush page from common code"). nvkm_fb_ctor() calls alloc_page(), which can allocate a page anywhere in system memory, but then calls dma_map_page() on that page. But since the DMA mask is still set to 32, the map can fail if the page is allocated above 4GB. This is easy to reproduce on systems with a lot of memory and IOMMU disabled. An alternative approach would be to force the allocation of the flush page to low memory, by specifying __GFP_DMA32. However, this would always allocate the page in low memory, even though the hardware can access high memory. Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20251014174512.3172102-1-ttabi@nvidia.com
2025-11-05drm/i915/dmc: Fix extra bracket and wrong variable in PIPEDMC error logsAlok Tiwari
Fixes two issues in intel_pipedmc_irq_handler(): - Removed an extra ']' in the PIPEDMC error and interrupt vector log. - Corrected the interrupt vector log to print int_vector instead of tmp, as tmp will be zero in this case. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251103132337.762156-1-alok.a.tiwari@oracle.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-11-05drm/xe/gt_throttle: Avoid TOCTOU when monitoring reasonsLucas De Marchi
It's currently not possible to safely monitor if there's throttling happening and what are the reasons. The approach of reading the status and then reading the reasons is not reliable as by the time sysadmin reads the reason, the throttling could not be happening anymore. Previous tentative to fix that[1] was breaking the ABI and potentially sysadmin's scripts. This takes a different approach of adding and documenting the additional attribute. It's still valuable, though redundant, to provide the simpler 0/1 interface. In order to avoid userspace knowledge on the bitmask meaning and to be able to maintain the kernel side in sync with possible changes in future, just walk the attribute group and check what are the masks that match the value read. [1] https://lore.kernel.org/intel-xe/20241025092238.167042-1-raag.jadav@intel.com/ Cc: Raag Jadav <raag.jadav@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://patch.msgid.link/20251104-gt-throttle-cri-v5-1-4948b060bbfd@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-05drm/xe: Remove never used code in xe_vm_create()Gwan-gyeong Mun
Clang is not happy with set but unused variable (this is visible with `make LLVM=1` build: drivers/gpu/drm/xe/xe_vm.c:1462:11: error: variable 'number_tiles' set but not used [-Werror,-Wunused-but-set-variable] The use of this variable was removed in the commit mentioned below as "Fixes:" but only its declaration and update remain. It seems like the variable is not used along with the assignment that does not have side effects as far as I can see. Remove those altogether. Fixes: cb99e12ba8cb ("drm/xe: Decouple bind queue last fence from TLB invalidations") Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patch.msgid.link/20251105011311.3177875-1-gwan-gyeong.mun@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-11-05drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cbPierre-Eric Pelloux-Prayer
The Mesa issue referenced below pointed out a possible deadlock: [ 1231.611031] Possible interrupt unsafe locking scenario: [ 1231.611033] CPU0 CPU1 [ 1231.611034] ---- ---- [ 1231.611035] lock(&xa->xa_lock#17); [ 1231.611038] local_irq_disable(); [ 1231.611039] lock(&fence->lock); [ 1231.611041] lock(&xa->xa_lock#17); [ 1231.611044] <Interrupt> [ 1231.611045] lock(&fence->lock); [ 1231.611047] *** DEADLOCK *** In this example, CPU0 would be any function accessing job->dependencies through the xa_* functions that don't disable interrupts (eg: drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()). CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling callback so in an interrupt context. It will deadlock when trying to grab the xa_lock which is already held by CPU0. Replacing all xa_* usage by their xa_*_irq counterparts would fix this issue, but Christian pointed out another issue: dma_fence_signal takes fence.lock and so does dma_fence_add_callback. dma_fence_signal() // locks f1.lock -> drm_sched_entity_kill_jobs_cb() -> foreach dependencies -> dma_fence_add_callback() // locks f2.lock This will deadlock if f1 and f2 share the same spinlock. To fix both issues, the code iterating on dependencies and re-arming them is moved out to drm_sched_entity_kill_jobs_work(). Cc: stable@vger.kernel.org # v6.2+ Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini") Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908 Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> [phasta: commit message nits] Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patch.msgid.link/20251104095358.15092-1-pierre-eric.pelloux-prayer@amd.com
2025-11-05gpu: nova-core: vbios: use FromBytes for NpdeStructAlexandre Courbot
Use `from_bytes_copy_prefix` to create `NpdeStruct` instead of building it ourselves from the bytes stream. This lets us remove a few array accesses and results in shorter code. Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Message-ID: <20251029-nova-vbios-frombytes-v1-5-ac441ebc1de3@nvidia.com>
2025-11-05gpu: nova-core: vbios: use FromBytes for BitHeaderAlexandre Courbot
Use `from_bytes_copy_prefix` to create `BitHeader` instead of building it ourselves from the bytes stream. This lets us remove a few array accesses and results in shorter code. Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Message-ID: <20251029-nova-vbios-frombytes-v1-4-ac441ebc1de3@nvidia.com>
2025-11-05gpu: nova-core: vbios: use FromBytes for PcirStructAlexandre Courbot
Use `from_bytes_copy_prefix` to create `PcirStruct` instead of building it ourselves from the bytes stream. This lets us remove a few array accesses and results in shorter code. Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Message-ID: <20251029-nova-vbios-frombytes-v1-3-ac441ebc1de3@nvidia.com>
2025-11-05gpu: nova-core: vbios: use FromBytes for PmuLookupTable headerAlexandre Courbot
Use `from_bytes_copy_prefix` to create the `PmuLookupTable` header instead of building it ourselves from the bytes stream. This lets us remove a few `as` conversions and array accesses. Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Message-ID: <20251029-nova-vbios-frombytes-v1-2-ac441ebc1de3@nvidia.com>
2025-11-05drm: rcar-du: fix incorrect return in rcar_du_crtc_cleanup()Alok Tiwari
The rcar_du_crtc_cleanup() function has a void return type, but incorrectly uses a return statement with a call to drm_crtc_cleanup(), which also returns void. Remove the return statement to ensure proper function semantics. No functional change intended. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Link: https://patch.msgid.link/20251017191634.1454201-1-alok.a.tiwari@oracle.com Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
2025-11-04drm/amd/display: Fix NULL deref in debugfs odm_combine_segmentsRong Zhang
When a connector is connected but inactive (e.g., disabled by desktop environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading odm_combine_segments causes kernel NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 16 UID: 0 PID: 26474 Comm: cat Not tainted 6.17.0+ #2 PREEMPT(lazy) e6a17af9ee6db7c63e9d90dbe5b28ccab67520c6 Hardware name: LENOVO 21Q4/LNVNB161216, BIOS PXCN25WW 03/27/2025 RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu] Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00> RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0 PKRU: 55555554 Call Trace: <TASK> seq_read_iter+0x125/0x490 ? __alloc_frozen_pages_noprof+0x18f/0x350 seq_read+0x12c/0x170 full_proxy_read+0x51/0x80 vfs_read+0xbc/0x390 ? __handle_mm_fault+0xa46/0xef0 ? do_syscall_64+0x71/0x900 ksys_read+0x73/0xf0 do_syscall_64+0x71/0x900 ? count_memcg_events+0xc2/0x190 ? handle_mm_fault+0x1d7/0x2d0 ? do_user_addr_fault+0x21a/0x690 ? exc_page_fault+0x7e/0x1a0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 RIP: 0033:0x7f44d4031687 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00> RSP: 002b:00007ffdb4b5f0b0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 00007f44d3f9f740 RCX: 00007f44d4031687 RDX: 0000000000040000 RSI: 00007f44d3f5e000 RDI: 0000000000000003 RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007f44d3f5e000 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000 </TASK> Modules linked in: tls tcp_diag inet_diag xt_mark ccm snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device x> snd_hda_codec_atihdmi snd_hda_codec_realtek_lib lenovo_wmi_helpers think_lmi snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core kvm snd_compress uvcvideo sn> platform_profile joydev amd_pmc mousedev mac_hid sch_fq_codel uinput i2c_dev parport_pc ppdev lp parport nvme_fabrics loop nfnetlink ip_tables x_tables dm_cryp> CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu] Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00> RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286 RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8 RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0 R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08 R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001 FS: 00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0 PKRU: 55555554 Fix this by checking pipe_ctx->stream_res.tg before dereferencing. Fixes: 07926ba8a44f ("drm/amd/display: Add debugfs interface for ODM combine info") Signed-off-by: Rong Zhang <i@rong.moe> Reviewed-by: Mario Limoncello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f19bbecd34e3c15eed7e5e593db2ac0fc7a0e6d8) Cc: stable@vger.kernel.org
2025-11-04drm/amdkfd: Don't clear PT after process killedPhilip Yang
If process is killed. the vm entity is stopped, submit pt update job will trigger the error message "*ERROR* Trying to push to a killed entity", job will not execute. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 10c382ec6c6d1e11975a11962bec21cba6360391) Cc: stable@vger.kernel.org
2025-11-04drm/amdgpu/smu: Handle S0ix for vangoghAlex Deucher
Fix the flows for S0ix. There is no need to stop rlc or reintialize PMFW in S0ix. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reported-by: Antheas Kapenekakis <lkml@antheas.dev> Tested-by: Antheas Kapenekakis <lkml@antheas.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fd39b5a5830d8f2553e0c09d4d50bdff28b10080) Cc: <stable@vger.kernel.org> # c81f5cebe849: drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend() Cc: <stable@vger.kernel.org>
2025-11-04drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend()Alex Deucher
For S3 on vangogh, PMFW needs to be notified before the driver powers down RLC. This already happens in smu_disable_dpms() so drop the superfluous call in amdgpu_device_suspend(). Co-developed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 960e30a61e1a7ca5341a6cf9481e770e1cda24aa)