From 337b1b566db087347194e4543ddfdfa5645275cc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= Date: Thu, 13 Nov 2025 18:26:23 +0200 Subject: PCI: Fix restoring BARs on BAR resize rollback path MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit BAR resize operation is implemented in the pci_resize_resource() and pbus_reassign_bridge_resources() functions. pci_resize_resource() can be called either from __resource_resize_store() from sysfs or directly by the driver for the Endpoint Device. The pci_resize_resource() requires that caller has released the device resources that share the bridge window with the BAR to be resized as otherwise the bridge window is pinned in place and cannot be changed. pbus_reassign_bridge_resources() rolls back resources if the resize operation fails, but rollback is performed only for the bridge windows. Because releasing the device resources are done by the caller of the BAR resize interface, these functions performing the BAR resize do not have access to the device resources as they were before the resize. pbus_reassign_bridge_resources() could try __pci_bridge_assign_resources() after rolling back the bridge windows as they were, however, it will not guarantee the resource are assigned due to differences in how FW and the kernel assign the resources (alignment of the start address and tail). To perform rollback robustly, the BAR resize interface has to be altered to also release the device resources that share the bridge window with the BAR to be resized. Also, remove restoring from the entries failed list as saved list should now contain both the bridge windows and device resources so the extra restore is duplicated work. Some drivers (currently only amdgpu) want to prevent releasing some resources. Add exclude_bars param to pci_resize_resource() and make amdgpu pass its register BAR (BAR 2 or 5), which should never be released during resize operation. Normally 64-bit prefetchable resources do not share a bridge window with the 32-bit only register BAR, but there are various fallbacks in the resource assignment logic which may make the resources share the bridge window in rare cases. This change (together with the driver side changes) is to counter the resource releases that had to be done to prevent resource tree corruption in the ("PCI: Release assigned resource before restoring them") change. As such, it likely restores functionality in cases where device resources were released to avoid resource tree conflicts which appeared to be "working" when such conflicts were not correctly detected by the kernel. Reported-by: Simon Richter Link: https://lore.kernel.org/linux-pci/f9a8c975-f5d3-4dd2-988e-4371a1433a60@hogyros.de/ Reported-by: Alex Bennée Link: https://lore.kernel.org/linux-pci/874irqop6b.fsf@draig.linaro.org/ Signed-off-by: Ilpo Järvinen [bhelgaas: squash amdgpu BAR selection from https://lore.kernel.org/r/20251114103053.13778-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas Tested-by: Alex Bennée # AVA, AMD GPU Reviewed-by: Christian König Link: https://patch.msgid.link/20251113162628.5946-7-ilpo.jarvinen@linux.intel.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_device.c') diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 7a899fb4de29..bf0bc38e1c47 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -1736,7 +1736,9 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev) pci_release_resource(adev->pdev, 0); - r = pci_resize_resource(adev->pdev, 0, rbar_size); + r = pci_resize_resource(adev->pdev, 0, rbar_size, + (adev->asic_type >= CHIP_BONAIRE) ? 1 << 5 + : 1 << 2); if (r == -ENOSPC) dev_info(adev->dev, "Not enough PCI address space for a large BAR."); -- cgit v1.2.3 From db92e3fef53e2083001dcc4c86a3ecff4ab4dc4e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= Date: Thu, 13 Nov 2025 18:26:27 +0200 Subject: drm/amdgpu: Remove driver side BAR release before resize MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PCI core handles releasing device's resources and their rollback in case of failure of a BAR resizing operation. Releasing resource prior to calling pci_resize_resource() prevents PCI core from restoring the BARs as they were. Remove driver-side release of BARs from the amdgpu driver. Also remove the driver initiated assignment as pci_resize_resource() should try to assign as much as possible. If the driver side call manages to get more required resources assigned in some scenario, such a problem should be fixed inside pci_resize_resource() instead. Signed-off-by: Ilpo Järvinen Signed-off-by: Bjorn Helgaas Tested-by: Alex Bennée # AVA, AMD GPU Link: https://patch.msgid.link/20251113162628.5946-11-ilpo.jarvinen@linux.intel.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_device.c') diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index bf0bc38e1c47..d777454a170a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -1729,12 +1729,8 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev) pci_write_config_word(adev->pdev, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY); - /* Free the VRAM and doorbell BAR, we most likely need to move both. */ + /* Tear down doorbell as resizing will release BARs */ amdgpu_doorbell_fini(adev); - if (adev->asic_type >= CHIP_BONAIRE) - pci_release_resource(adev->pdev, 2); - - pci_release_resource(adev->pdev, 0); r = pci_resize_resource(adev->pdev, 0, rbar_size, (adev->asic_type >= CHIP_BONAIRE) ? 1 << 5 @@ -1745,8 +1741,6 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev) else if (r && r != -ENOTSUPP) dev_err(adev->dev, "Problem resizing BAR0 (%d).", r); - pci_assign_unassigned_bus_resources(adev->pdev->bus); - /* When the doorbell or fb BAR isn't available we have no chance of * using the device. */ -- cgit v1.2.3 From c7df7059e3baa1df96d03429923cc137f134658c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= Date: Thu, 13 Nov 2025 20:00:52 +0200 Subject: drm/amdgpu: Use pci_rebar_get_max_size() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Use pci_rebar_get_max_size() to simplify amdgpu_device_resize_fb_bar(). Signed-off-by: Ilpo Järvinen Signed-off-by: Bjorn Helgaas Reviewed-by: Christian König Link: https://patch.msgid.link/20251113180053.27944-11-ilpo.jarvinen@linux.intel.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'drivers/gpu/drm/amd/amdgpu/amdgpu_device.c') diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index d777454a170a..9c9b482e7ec2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -1673,9 +1673,9 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev) int rbar_size = pci_rebar_bytes_to_size(adev->gmc.real_vram_size); struct pci_bus *root; struct resource *res; + int max_size, r; unsigned int i; u16 cmd; - int r; if (!IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT)) return 0; @@ -1721,8 +1721,10 @@ int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev) return 0; /* Limit the BAR size to what is available */ - rbar_size = min(fls(pci_rebar_get_possible_sizes(adev->pdev, 0)) - 1, - rbar_size); + max_size = pci_rebar_get_max_size(adev->pdev, 0); + if (max_size < 0) + return 0; + rbar_size = min(max_size, rbar_size); /* Disable memory decoding while we change the BAR addresses and size */ pci_read_config_word(adev->pdev, PCI_COMMAND, &cmd); -- cgit v1.2.3