summaryrefslogtreecommitdiff
path: root/drivers/pci
AgeCommit message (Collapse)Author
10 daysPCI: tegra194: Use DWC IP core versionManikanta Maddireddy
Tegra194 PCIe driver used custom version numbers to detect Tegra194 and Tegra234 IPs. With version detect logic added, version check results in mismatch warnings: tegra194-pcie 14100000.pcie: Versions don't match (0000562a != 3536322a) Use HW version numbers which match to PORT_LOGIC.PCIE_VERSION_OFF in Tegra194 driver to avoid these kernel warnings. Fixes: a54e19073718 ("PCI: tegra194: Add Tegra234 PCIe support") Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Vidya Sagar <vidyas@nvidia.com> Link: https://patch.msgid.link/20260324190755.1094879-12-mmaddireddy@nvidia.com
10 daysPCI: tegra194: Free up Endpoint resources during remove()Vidya Sagar
Free up the resources during remove() that were acquired by the DesignWare driver for the Endpoint mode during probe(). Fixes: bb617cbd8151 ("PCI: tegra194: Clean up the exit path for Endpoint mode") Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Vidya Sagar <vidyas@nvidia.com> Link: https://patch.msgid.link/20260324190755.1094879-11-mmaddireddy@nvidia.com
10 daysPCI: tegra194: Allow system suspend when the Endpoint link is not upVidya Sagar
Host software initiates the L2 sequence. PCIe link is kept in L2 state during suspend. If Endpoint mode is enabled and the link is up, the software cannot proceed with suspend. However, when the PCIe Endpoint driver is probed, but the PCIe link is not up, Tegra can go into suspend state. So, allow system to suspend in this case. Fixes: de2bbf2b71bb ("PCI: tegra194: Don't allow suspend when Tegra PCIe is in EP mode") Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Vidya Sagar <vidyas@nvidia.com> Link: https://patch.msgid.link/20260324190755.1094879-10-mmaddireddy@nvidia.com
10 daysPCI: tegra194: Set LTR message request before PCIe link up in Endpoint modeVidya Sagar
LTR message should be sent as soon as the Root Port enables LTR in the Endpoint mode. So set snoop and no-snoop LTR timing and LTR message request before the PCIe link comes up, so that the LTR message is sent upstream as soon as LTR is enabled. Without programming these values, the Endpoint would send latencies of 0 to the host, which will be inaccurate. Fixes: c57247f940e8 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194") Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> [mani: commit log] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Link: https://patch.msgid.link/20260324190755.1094879-9-mmaddireddy@nvidia.com
10 daysPCI: tegra194: Disable direct speed change for Endpoint modeVidya Sagar
Pre-silicon simulation showed the controller operating in Endpoint mode initiating link speed change after completing Secondary Bus Reset. Ideally, the Root Port or the Switch Downstream Port should initiate the link speed change post SBR, not the Endpoint. So, as per the hardware team recommendation, disable direct speed change for the Endpoint mode to prevent it from initiating speed change after the physical layer link is up at Gen1, leaving speed change ownership with the host. Fixes: c57247f940e8 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194") Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> [mani: commit log] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Vidya Sagar <vidyas@nvidia.com> Link: https://patch.msgid.link/20260324190755.1094879-8-mmaddireddy@nvidia.com
10 daysPCI: tegra194: Use devm_gpiod_get_optional() to parse "nvidia,refclk-select"Vidya Sagar
The GPIO DT property "nvidia,refclk-select", to select the PCIe reference clock is optional. Use devm_gpiod_get_optional() to get it. Fixes: c57247f940e8 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194") Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Vidya Sagar <vidyas@nvidia.com> Link: https://patch.msgid.link/20260324190755.1094879-7-mmaddireddy@nvidia.com
10 daysPCI: tegra194: Disable PERST# IRQ only in Endpoint modeManikanta Maddireddy
The PERST# GPIO interrupt is only registered when the controller is operating in Endpoint mode. In Root Port mode, the PERST# GPIO is configured as an output to control downstream devices, and no interrupt is registered for it. Currently, tegra_pcie_dw_stop_link() unconditionally calls disable_irq() on pex_rst_irq, which causes issues in Root Port mode where this IRQ is not registered. Fix this by only disabling the PERST# IRQ when operating in Endpoint mode, where the interrupt is actually registered and used to detect PERST# assertion/deassertion from the host. Fixes: c57247f940e8 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194") Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Vidya Sagar <vidyas@nvidia.com> Link: https://patch.msgid.link/20260324190755.1094879-6-mmaddireddy@nvidia.com
10 daysPCI: tegra194: Don't force the device into the D0 state before L2Vidya Sagar
As per PCIe CEM r6.0, sec 2.3, the PCIe Endpoint device should be in D3cold to assert WAKE# pin. The previous workaround that forced downstream devices to D0 before taking the link to L2 cited PCIe r4.0, sec 5.2, "Link State Power Management"; however, that spec does not explicitly require putting the device into D0 and only indicates that power removal may be initiated without transitioning to D3hot. Remove the D0 workaround so that Endpoint devices can use wake functionality (WAKE# from D3). With some Endpoints the link may not enter L2 when they remain in D3, but the Root Port continues with the usual flow after PME timeout, so there is no functional issue. Fixes: 56e15a238d92 ("PCI: tegra: Add Tegra194 PCIe support") Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Vidya Sagar <vidyas@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Link: https://patch.msgid.link/20260324190755.1094879-5-mmaddireddy@nvidia.com
10 daysPCI: tegra194: Disable LTSSM after transition to Detect on surprise link downManikanta Maddireddy
After the link reaches a Detect-related LTSSM state, disable LTSSM so it does not keep toggling between Polling and Detect. Do this by polling for the Detect state first, then clearing APPL_CTRL_LTSSM_EN in both tegra_pcie_dw_pme_turnoff() and pex_ep_event_pex_rst_assert(). Fixes: 56e15a238d92 ("PCI: tegra: Add Tegra194 PCIe support") Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Link: https://patch.msgid.link/20260324190755.1094879-4-mmaddireddy@nvidia.com
10 daysPCI: tegra194: Increase LTSSM poll time on surprise link downManikanta Maddireddy
On surprise link down, LTSSM state transits from L0 -> Recovery.RcvrLock -> Recovery.RcvrSpeed -> Gen1 Recovery.RcvrLock -> Detect. Recovery.RcvrLock and Recovery.RcvrSpeed transit times are 24 ms and 48 ms respectively, so the total time from L0 to Detect is ~96 ms. Increase the poll timeout to 120 ms to account for this. While at it, add LTSSM state defines for Detect-related states and use them in the poll condition. Use readl_poll_timeout() instead of readl_poll_timeout_atomic() in tegra_pcie_dw_pme_turnoff() since that path runs in non-atomic context. Fixes: 56e15a238d92 ("PCI: tegra: Add Tegra194 PCIe support") Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Link: https://patch.msgid.link/20260324190755.1094879-3-mmaddireddy@nvidia.com
10 daysPCI: tegra194: Fix polling delay for L2 stateVidya Sagar
As per PCIe r7.0, sec 5.3.3.2.1, after sending PME_Turn_Off message, Root Port should wait for 1-10 msec for PME_TO_Ack message. Currently, driver is polling for 10 msec with 1 usec delay which is aggressive. Use existing macro PCIE_PME_TO_L2_TIMEOUT_US to poll for 10 msec with 1 msec delay. Since this function is used in non-atomic context only, use non-atomic poll function. Fixes: 56e15a238d92 ("PCI: tegra: Add Tegra194 PCIe support") Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Link: https://patch.msgid.link/20260324190755.1094879-2-mmaddireddy@nvidia.com
10 daysselftests: pci_endpoint: Skip BAR subrange test on -ENOSPCChristian Bruel
In pci-epf-test.c, set the STATUS_NO_RESOURCE status bit if pci_epc_set_bar() returns -ENOSPC. This status bit is used to indicate that there are not enough inbound window resources to allocate the subrange. In pci_endpoint_test.c, return -ENOSPC instead of -EIO when STATUS_NO_RESOURCE is set. In pci_endpoint_test.c, skip the BAR subrange test if -ENOSPC, i.e., there are not enough inbound window resources to run the test. Signed-off-by: Christian Bruel <christian.bruel@foss.st.com> [mani: commit log] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> [bhelgaas: squash related commits] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Koichiro Den <den@valinux.co.jp> Link: https://patch.msgid.link/20260407-skip-bar_subrange-tests-if-enospc-v4-1-6f2e65f2298c@foss.st.com Link: https://patch.msgid.link/20260407-skip-bar_subrange-tests-if-enospc-v4-2-6f2e65f2298c@foss.st.com Link: https://patch.msgid.link/20260407-skip-bar_subrange-tests-if-enospc-v4-3-6f2e65f2298c@foss.st.com
11 daysPCI: Remove no_pci_devices()Heiner Kallweit
After having removed the last usage of no_pci_devices(), this function can be removed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/b0ce592d-c34c-4e0b-b389-4e346b3a0c44@gmail.com
11 daysMerge tag 'hyperv-fixes-signed-20260406' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull Hyper-V fixes from Wei Liu: - Two fixes for Hyper-V PCI driver (Long Li, Sahil Chandna) - Fix an infinite loop issue in MSHV driver (Stanislav Kinsburskii) * tag 'hyperv-fixes-signed-20260406' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: mshv: Fix infinite fault loop on permission-denied GPA intercepts PCI: hv: Fix double ida_free in hv_pci_probe error path PCI: hv: Set default NUMA node to 0 for devices without affinity info
11 daysPCI: dw-rockchip: Add pcie_ltssm_state_transition tracepoint supportShawn Lin
Rockchip platforms provide a 64x4 bytes debug FIFO to trace the LTSSM transition and data rate change history. These will be useful for debugging issues such as link failure, etc. Hence, expose these information over pcie_ltssm_state_transition tracepoint. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> [mani: commit log] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Anand Moon <linux.amoon@gmail.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/1774403912-210670-4-git-send-email-shawn.lin@rock-chips.com
11 daysPCI: trace: Add PCI controller tracepoint featureShawn Lin
Some PCI controllers may provide debug functionalities to track PCI bus activities like LTSSM state transitions and data rate changes. These will be very useful for debugging PCI link specific issues such as endpoint not getting detected or performance issues. Hence, implement the PCI controller tracepoint feature for recording LTSSM state transitions and data rate changes. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> [mani: commit log and maintainers entry] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Tested-by: Anand Moon <linux.amoon@gmail.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/1774403912-210670-2-git-send-email-shawn.lin@rock-chips.com
12 daysPCI/NPEM: Set LED_HW_PLUGGABLE for hotplug-capable portsRichard Cheng
NPEM registers LED classdevs on PCI endpoint that may be behind hotplug-capable ports. During hot-removal, led_classdev_unregister() calls led_set_brightness(LED_OFF) which leads to a PCI config read to a disconnected device, which fails and returns -ENODEV (topology details in msgid.link below): leds 0003:01:00.0:enclosure:ok: Setting an LED's brightness failed (-19) The LED core already suppresses this for devices with LED_HW_PLUGGABLE set, but NPEM never sets it. Add the flag since NPEM LEDs are on hot-pluggable hardware by nature. Fixes: 4e893545ef87 ("PCI/NPEM: Add Native PCIe Enclosure Management support") Signed-off-by: Richard Cheng <icheng@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Acked-by: Kai-Heng Feng <kaihengf@nvidia.com> Link: https://patch.msgid.link/20260402093850.23075-1-icheng@nvidia.com
12 daysPCI: imx6: Fix reference clock source selection for i.MX95Franz Schnyder
In the PCIe PHY init for the i.MX95, the reference clock source selection uses a conditional instead of always passing the mask. This currently breaks functionality if the internal refclk is used. To fix this issue, always pass IMX95_PCIE_REF_USE_PAD as the mask and clear bit if external refclk is not used. This essentially swaps the parameters. Fixes: d8574ce57d76 ("PCI: imx6: Add external reference clock input mode support") Signed-off-by: Franz Schnyder <franz.schnyder@toradex.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Richard Zhu <hongxing.zhu@nxp.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260325093118.684142-1-fra.schnyder@gmail.com
12 daysPCI/TPH: Pass ACPI Processor UID to Cache Locality _DSMChengwen Feng
pcie_tph_get_cpu_st() uses the Query Cache Locality Features _DSM [1] to retrieve the TPH Steering Tag for memory associated with the CPU identified by its "cpu_uid" parameter, a Linux logical CPU ID. The _DSM requires an ACPI Processor UID, which pcie_tph_get_cpu_st() previously assumed was the same as the Linux logical CPU ID. This is true on x86 but not on arm64, so pcie_tph_get_cpu_st() returned the wrong Steering Tag, resulting in incorrect TPH functionality on arm64. Convert the Linux logical CPU ID to the ACPI Processor UID with acpi_get_cpu_uid() before passing it to the _DSM. Additionally, rename the pcie_tph_get_cpu_st() parameter from "cpu_uid" to "cpu" to reflect that it represents a logical CPU ID (not an ACPI Processor UID). [1] According to ECN_TPH-ST_Revision_20200924 (https://members.pcisig.com/wg/PCI-SIG/document/15470), the input is defined as: "If the target is a processor, then this field represents the ACPI Processor UID of the processor as specified in the MADT. If the target is a processor container, then this field represents the ACPI Processor UID of the processor container as specified in the PPTT." Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support") Signed-off-by: Chengwen Feng <fengchengwen@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260401081640.26875-9-fengchengwen@huawei.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
12 daysPCI: hisi: Use devm_ghes_register_vendor_record_notifier()Kai-Heng Feng
Switch to the device-managed variant so the notifier is automatically unregistered on device removal, allowing the open-coded remove callback to be dropped entirely. Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com> Acked-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20260330094203.38022-3-kaihengf@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
14 daysPCI: cadence: Use cdns_pcie_read_sz() for byte or word read accessAksh Garg
The commit 18ac51ae9df9 ("PCI: cadence: Implement capability search using PCI core APIs") assumed all the platforms using Cadence PCIe controller support byte and word register accesses. This is not true for all platforms (e.g., TI J721E SoC, which only supports dword register accesses). This causes capability searches via cdns_pcie_find_capability() to fail on such platforms. Fix this by using cdns_pcie_read_sz() for config read functions, which properly handles size-aligned accesses. Remove the now-unused byte and word read wrapper functions (cdns_pcie_readw and cdns_pcie_readb). Fixes: 18ac51ae9df9 ("PCI: cadence: Implement capability search using PCI core APIs") Signed-off-by: Aksh Garg <a-garg7@ti.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260402085545.284457-1-a-garg7@ti.com
14 daysPCI: mediatek-gen3: Prevent leaking IRQ domains when IRQ not foundChen-Yu Tsai
In mtk_pcie_setup_irq(), the IRQ domains are allocated before the controller's IRQ is fetched. If the latter fails, the function directly returns an error, without cleaning up the allocated domains. Hence, reverse the order so that the IRQ domains are allocated after the controller's IRQ is found. This was flagged by Sashiko during a review of "[PATCH v6 0/7] PCI: mediatek-gen3: add power control support". Fixes: 814cceebba9b ("PCI: mediatek-gen3: Add INTx support") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://sashiko.dev/#/patchset/20260324052002.4072430-1-wenst%40chromium.org Link: https://patch.msgid.link/20260324093542.18523-1-wenst@chromium.org
2026-04-04PCI: tegra194: Expose BAR2 (MSI-X) and BAR4 (DMA) as 64-bit BAR_RESERVEDManikanta Maddireddy
Tegra Endpoint exposes three 64-bit BARs at indices 0, 2, and 4: - BAR0+BAR1: EPF test/data (programmable 64-bit BAR) - BAR2+BAR3: MSI-X table (hardware-backed) - BAR4+BAR5: DMA registers (hardware-backed) Update tegra_pcie_epc_features so that BAR2 is BAR_RESERVED with PCI_EPC_BAR_RSVD_MSIX_TBL_RAM (64 KB) & PCI_EPC_BAR_RSVD_MSIX_PBA_RAM (64 KB) and BAR4 is BAR_RESERVED with PCI_EPC_BAR_RSVD_DMA_CTRL_MMIO (4KB). This keeps CONSECUTIVE_BAR_TEST working while allowing the host to use 64-bit BAR2 (MSI-X) and BAR4 (DMA). Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20260324080857.916263-4-mmaddireddy@nvidia.com
2026-04-04PCI: tegra194: Make BAR0 programmable and remove 1MB size limitManikanta Maddireddy
The Tegra194/234 Endpoint does not support the Resizable BAR capability, but BAR0 can be programmed to different sizes via the DBI2 BAR registers in dw_pcie_ep_set_bar_programmable(). The BAR0 size is set once during initialization. Remove the fixed 1MB limit from pci_epc_features so Endpoint function drivers can configure the BAR0 size they need. Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Link: https://patch.msgid.link/20260324080857.916263-3-mmaddireddy@nvidia.com
2026-04-04PCI: aspeed: Fix IRQ domain leak on platform_get_irq() failureFelix Gu
The aspeed_pcie_probe() function calls aspeed_pcie_init_irq_domain() which allocates pcie->intx_domain and initializes MSI. However, if platform_get_irq() fails afterwards, the cleanup action was not yet registered via devm_add_action_or_reset(), causing the IRQ domain resources to leak. Fix this by registering the devm cleanup action immediately after aspeed_pcie_init_irq_domain() succeeds, before calling platform_get_irq(). This ensures proper cleanup on any subsequent failure. Fixes: 9aa0cb68fcc1 ("PCI: aspeed: Add ASPEED PCIe RC driver") Signed-off-by: Felix Gu <ustc.gu@gmail.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Tested-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20260324-aspeed-v1-1-354181624c00@gmail.com
2026-04-04PCI: hv: Fix double ida_free in hv_pci_probe error pathSahil Chandna
If hv_pci_probe() fails after storing the domain number in hbus->bridge->domain_nr, there is a call to free this domain_nr via pci_bus_release_emul_domain_nr(), however, during cleanup, the bridge release callback pci_release_host_bridge_dev() also frees the domain_nr causing ida_free to be called on same ID twice and triggering following warning: ida_free called for id=28971 which is not allocated. WARNING: lib/idr.c:594 at ida_free+0xdf/0x160, CPU#0: kworker/0:2/198 Call Trace: pci_bus_release_emul_domain_nr+0x17/0x20 pci_release_host_bridge_dev+0x4b/0x60 device_release+0x3b/0xa0 kobject_put+0x8e/0x220 devm_pci_alloc_host_bridge_release+0xe/0x20 devres_release_all+0x9a/0xd0 device_unbind_cleanup+0x12/0xa0 really_probe+0x1c5/0x3f0 vmbus_add_channel_work+0x135/0x1a0 Fix this by letting pci core handle the free domain_nr and remove the explicit free called in pci-hyperv driver. Fixes: bcce8c74f1ce ("PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms") Signed-off-by: Sahil Chandna <sahilchandna@linux.microsoft.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-04PCI: use generic driver_override infrastructureDanilo Krummrich
When a driver is probed through __driver_attach(), the bus' match() callback is called without the device lock held, thus accessing the driver_override field without a lock, which can cause a UAF. Fix this by using the driver-core driver_override infrastructure taking care of proper locking internally. Note that calling match() from __driver_attach() without the device lock held is intentional. [1] Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1] Reported-by: Gui-Dong Han <hanguidong02@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789 Fixes: 782a985d7af2 ("PCI: Introduce new device binding path using pci_dev.driver_override") Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Williamson <alex@shazbot.org> Tested-by: Gui-Dong Han <hanguidong02@gmail.com> Reviewed-by: Gui-Dong Han <hanguidong02@gmail.com> Link: https://patch.msgid.link/20260324005919.2408620-6-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-04-03PCI: imx6: Keep Root Port MSI capability with iMSI-RX to work around ↵Richard Zhu
hardware bug On NXP i.MX7D, i.MX8MM, and i.MX8MQ chipsets, MSIs from the endpoints won't be received by the iMSI-RX MSI controller if the Root Port MSI capability is disabled. Even though the Root Port MSIs won't be received by the iMSI-RX controller due to design, these chipsets have some weird hardware bug that prevents the endpoint MSIs from reaching when the Root Port MSI capability is disabled. Hence, introduce a new flag, 'dw_pcie_rp::keep_rp_msi_en', set it for the above mentioned SoCs, and always keep the Root Port MSI capability when this flag is set. Note that by keeping Root Port MSI capability, Root Port MSIs such as AER, PME and others won't be received by default. So users need to use workarounds such as passing 'pcie_pme=nomsi' cmdline param. Fixes: f5cd8a929c825 ("PCI: dwc: Remove MSI/MSIX capability for Root Port if iMSI-RX is used as MSI controller") Suggested-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> [mani: commit log] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> [bhelgaas: fix typos] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260331085252.1243108-1-hongxing.zhu@nxp.com
2026-04-03PCI: Update PCIe spec references for AtomicOpsGerd Bayer
Point to the relevant sections in the most recent release 7.0 of the PCIe spec. Text has mostly just moved around without any semantic change. Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260330-fix_pciatops-v7-3-f601818417e8@linux.ibm.com
2026-04-03PCI: Enable AtomicOps only if Root Port supports themGerd Bayer
When inspecting the config space of a Connect-X physical function in an s390 system after it was initialized by the mlx5_core device driver, we found the function to be enabled to request AtomicOps despite the Root Port lacking support for completing them: 00:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx] Subsystem: Mellanox Technologies Device 0002 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- AtomicOpsCtl: ReqEn+ On s390 and many virtualized guests, the Endpoint is visible but the Root Port is not. In this case, pci_enable_atomic_ops_to_root() previously enabled AtomicOps in the Endpoint even though it can't tell whether the Root Port supports them as a completer. Change pci_enable_atomic_ops_to_root() to fail if there's no Root Port or the Root Port doesn't support AtomicOps. Fixes: 430a23689dea ("PCI: Add pci_enable_atomic_ops_to_root()") Reported-by: Alexander Schmidt <alexs@linux.ibm.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> [bhelgaas: commit log, check RP first to simplify flow] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260330-fix_pciatops-v7-2-f601818417e8@linux.ibm.com
2026-04-03PCI: Do not enable AtomicOps by RCiEPsGerd Bayer
Since Root Complex Integrated Endpoints (RCiEPs) attach to a bus that has no bridge device describing the Root Port, the capability to complete AtomicOps requests cannot be determined with PCIe methods. Change default of pci_enable_atomic_ops_to_root() to not enable AtomicOps requests on RCiEPs. As far as we know, there are no RCiEPs that need AtomicOps (see Link below). Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260330-fix_pciatops-v7-1-f601818417e8@linux.ibm.com
2026-04-02PCI: dwc: Fix type mismatch for kstrtou32_from_user() return valueHans Zhang
kstrtou32_from_user() returns int, but the return value was stored in a u32 variable 'val', risking sign loss. Use a dedicated int variable to correctly handle the return code. Fixes: 4fbfa17f9a07 ("PCI: dwc: Add debugfs based Silicon Debug support for DWC") Signed-off-by: Hans Zhang <18255117159@163.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20260401023048.4182452-1-18255117159@163.com
2026-03-30PCI: Clean up dead code in KconfigJulian Braha
There is already an 'if PCI' condition wrapping several config options, e.g., PCI_DOMAINS and VGA_ARB, making the 'depends on PCI' statement for each of these a duplicate dependency (dead code). Leave the outer 'if PCI...endif' and remove the individual 'depends on PCI' statement from each option. This dead code was found by kconfirm, a static analysis tool for Kconfig. Signed-off-by: Julian Braha <julianbraha@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260330214549.16157-1-julianbraha@gmail.com
2026-03-30PCI/DPC: Log AER error info for DPC/EDR uncorrectable errorsKuppuswamy Sathyanarayanan
aer_print_error() skips printing if ratelimit_print[i] is not set. In the native AER path, ratelimit_print is initialized by add_error_device() during source device discovery, and is set to 1 for fatal errors to bypass rate limiting since fatal errors should always be logged. The DPC/EDR path uses the DPC-capable port as the error source and reads its AER uncorrectable error status registers directly in dpc_get_aer_uncorrect_severity(). Since it does not go through add_error_device(), ratelimit_print[0] is left uninitialized and zero. As a result, aer_print_error() silently drops all AER error messages for DPC/EDR triggered events. Set ratelimit_print[0] to 1 to bypass rate limiting and always print AER logs for uncorrectable errors detected by the DPC port. Fixes: a57f2bfb4a58 ("PCI/AER: Ratelimit correctable and non-fatal error logging") Co-developed-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com> Signed-off-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260318170449.2733581-1-sathyanarayanan.kuppuswamy@linux.intel.com
2026-03-30PCI/sysfs: Suppress FW_BUG warning when NUMA node already matchesLi RongQing
The numa_node sysfs interface allows users to manually override a PCI device's NUMA node assignment. Currently, every write triggers a FW_BUG warning and taints the kernel, even when writing the same value that is already set. Check if the requested node is already assigned to the device. If it matches, return success immediately without tainting the kernel or printing a warning. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260317002803.2353-1-lirongqing@baidu.com
2026-03-30PCI/AER: Stop ruling out unbound devices as error sourceLukas Wunner
When searching for the error source, the AER driver rules out devices whose enable_cnt is zero. This was introduced in 2009 by commit 28eb27cf0839 ("PCI AER: support invalid error source IDs") without providing a rationale. Drivers typically call pci_enable_device() on probe, hence the enable_cnt check essentially filters out unbound devices. At the time of the commit, drivers had to opt in to AER by calling pci_enable_pcie_error_reporting() and so any AER-enabled device could be assumed to be bound to a driver. The check thus made sense because it allowed skipping config space accesses to devices which were known not to be the error source. But since 2022, AER is universally enabled on all devices when they are enumerated, cf. commit f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"). Errors may very well be reported by unbound devices, e.g. due to link instability. By ruling them out as error source, errors reported by them are neither logged nor cleared. When they do get bound and another error occurs, the earlier error is reported together with the new error, which may confuse users. Stop doing so. Fixes: f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Stefan Roese <stefan.roese@mailbox.org> Cc: stable@vger.kernel.org # v6.0+ Link: https://patch.msgid.link/734338c2e8b669db5a5a3b45d34131b55ffebfca.1774605029.git.lukas@wunner.de
2026-03-30PCI/VGA: Fail pci_set_vga_state() if VGA decoding not supportedSimon Richter
PCI bridges are allowed to refuse activating VGA decoding, by simply ignoring attempts to set the bit that enables it, so after setting the bit, read it back to verify. One example of such a bridge is the root bridge in IBM PowerNV, but this is also useful for GPU passthrough into virtual machines, where it is difficult to set up routing for legacy IO through IOMMU. Signed-off-by: Simon Richter <Simon.Richter@hogyros.de> [bhelgaas: subject, add comment about VGA Enable writability] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260307173538.763188-5-Simon.Richter@hogyros.de
2026-03-27PCI: endpoint: pci-epf-vntb: Implement .get_dma_dev()Koichiro Den
When vNTB is used as a PCI endpoint function, the NTB device is backed by a virtual PCI function. For DMA API allocations and mappings, NTB clients must use the device that is associated with the IOMMU domain. Implement ntb_dev_ops->get_dma_dev() for pci-epf-vntb and return the EPC parent device. Suggested-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260306031443.1911860-4-den@valinux.co.jp
2026-03-27PCI: Fix alignment calculation for resource size larger than alignIlpo Järvinen
The commit bc75c8e50711 ("PCI: Rewrite bridge window head alignment function") did not use if (r_size <= align) check from pbus_size_mem() for the new head alignment bookkeeping structure (aligns2[]). In some configurations, this can result in producing a gap into the bridge window which the resource larger than its alignment cannot fill. The old alignment calculation algorithm was removed by the subsequent commit 3958bf16e2fe ("PCI: Stop over-estimating bridge window size") which renamed the aligns2[] array leaving only aligns[] array. Add the if (r_size <= align) check back to avoid this problem. Fixes: bc75c8e50711 ("PCI: Rewrite bridge window head alignment function") Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/all/b05a6f14-979d-42c9-924c-d8408cb12ae7@roeck-us.net/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xifer <xiferdev@gmail.com> Link: https://patch.msgid.link/20260324165633.4583-11-ilpo.jarvinen@linux.intel.com
2026-03-27PCI: Align head space betterIlpo Järvinen
When a bridge window contains big and small resource(s), the small resource(s) may not amount to the half of the size of the big resource which would allow calculate_head_align() to shrink the head alignment. This results in always placing the small resource(s) after the big resource. In general, it would be good to be able to place the small resource(s) before the big resource to achieve better utilization of the address space. In the cases where the large resource can only fit at the end of the window, it is even required. However, carrying the information over from pbus_size_mem() and calculate_head_align() to __pci_assign_resource() and pcibios_align_resource() is not easy with the current data structures. A somewhat hacky way to move the non-aligning tail part to the head is possible within pcibios_align_resource(). The free space between the start of the free space span and the aligned start address can be compared with the non-aligning remainder of the size. If the free space is larger than the remainder, placing the remainder before the start address is possible. This relocation should generally work, because PCI resources consist only power-of-2 atoms. Various arch requirements may still need to override the relocation, so the relocation is only applied selectively in such cases. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221205 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xifer <xiferdev@gmail.com> Link: https://patch.msgid.link/20260324165633.4583-10-ilpo.jarvinen@linux.intel.com
2026-03-27PCI: Rename window_alignment() to pci_min_window_alignment()Ilpo Järvinen
window_alignment() lacks prefix. Rename it to pci_min_window_alignment() in order to include the prefix and also add min to indicate the returned window alignment is the minimum PCI spec and arch allows. Also make it available in drivers/pci/pci.h as upcoming changes will need to call it from outside of setup-bus.c. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xifer <xiferdev@gmail.com> Link: https://patch.msgid.link/20260324165633.4583-9-ilpo.jarvinen@linux.intel.com
2026-03-27resource: Pass full extent of empty space to resource_alignf callbackIlpo Järvinen
__find_resource_space() calculates the full extent of empty space but only passes the aligned space to resource_alignf callback. In some situations, the callback may choose take advantage of the free space before the requested alignment. Pass the full extent of the calculated empty space to resource_alignf callback as an additional parameter. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xifer <xiferdev@gmail.com> Link: https://patch.msgid.link/20260324165633.4583-3-ilpo.jarvinen@linux.intel.com
2026-03-26PCI: Fix premature removal from realloc_head list during resource assignmentIlpo Järvinen
reassign_resources_sorted() checks for two things: a) Resource assignment failures for mandatory resources by checking if the resource remains unassigned, which are known to always repeat, and does not attempt to assign them again. b) That resource is not among the ones being processed/assigned at this stage, leading to skip processing such resources in reassign_resources_sorted() as well (resource assignment progresses one PCI hierarchy level at a time). The problem here is that a) is checked before b), but b) also implies the resource is not being assigned yet, making also a) true. As a) only skips resource assignment but still removes the resource from realloc_head, the later stages that would need to process the information in realloc_head cannot obtain the optional size information anymore. This leads to considering only non-optional part for bridge windows deeper in the PCI hierarchy. This problem has been observed during rescan (add_size is not considered while attempting assignment for 0000:e2:00.0 indicating the corresponding entry was removed from realloc_head while processing resource assignments for 0000:e1): pci_bus 0000:e1: scanning bus ... pci 0000:e3:01.0: bridge window [mem 0x800000000-0x1000ffffff 64bit pref] to [bus e4] add_size 60c000000 add_align 800000000 pci 0000:e3:01.0: bridge window [mem 0x00100000-0x000fffff] to [bus e4] add_size 200000 add_align 200000 pci 0000:e3:02.0: disabling bridge window [mem 0x00000000-0x000fffff 64bit pref] to [bus e5] (unused) pci 0000:e2:00.0: bridge window [mem 0x800000000-0x1000ffffff 64bit pref] to [bus e3-e5] add_size 60c000000 add_align 800000000 pci 0000:e2:00.0: bridge window [mem 0x00100000-0x001fffff] to [bus e3-e5] add_size 200000 add_align 200000 pcieport 0000:e1:02.0: bridge window [io size 0x2000]: can't assign; no space pcieport 0000:e1:02.0: bridge window [io size 0x2000]: failed to assign pcieport 0000:e1:02.0: bridge window [io 0x1000-0x2fff]: resource restored pcieport 0000:e1:02.0: bridge window [io 0x1000-0x2fff]: resource restored pcieport 0000:e1:02.0: bridge window [io size 0x2000]: can't assign; no space pcieport 0000:e1:02.0: bridge window [io size 0x2000]: failed to assign pci 0000:e2:00.0: bridge window [mem 0x28f000000000-0x28f800ffffff 64bit pref]: assigned Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync") Reported-by: Peter Nisbet <peter.nisbet@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Peter Nisbet <peter.nisbet@intel.com> Link: https://patch.msgid.link/20260313084551.1934-1-ilpo.jarvinen@linux.intel.com
2026-03-26PCI: Prevent shrinking bridge window from its required sizeIlpo Järvinen
Steve reported an eGPU (either Radeon Instinct MI50 32GB or NVIDIA 3080 10GB) connected via Thunderbolt was not assigned sufficient BAR space in v6.11, so the amdgpu and nvidia drivers were unable to initialize the device. pci_bridge_distribute_available_resources() -> ... -> adjust_bridge_window() is called between __pci_bus_size_bridges() and assigning the resources. Since the commit 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink resource if necessary") adjust_bridge_window() can also shrink the bridge window. The shrunken size, however, conflicts with what __pci_bus_size_bridges() -> pbus_size_mem() calculated as the required bridge window size. By shrinking the size, adjust_bridge_window() prevents the rest of the resource fitting algorithm from working as intended. Resource fitting logic is expecting assignment failures when bridge windows need resizing, but there are cases where failures are no longer happening after the commit 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink resource if necessary"). The commit 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink resource if necessary") justifies the change by the extra reservation made due to hpmemsize parameter, however, the kernel code contradicts that statement. (For simplicity, finer-grained hpmmiosize and hpmmiopref parameters that can be used to the same effect as hpmemsize are ignored in this description.) pbus_size_mem() calls calculate_memsize() twice. First with add_size=0 to find out the minimal required resource size. The second call occurs with add_size=hpmemsize (effectively) but the result does not directly affect the resource size only resulting in an entry on the realloc_head list (a.k.a. add_list). Yet, adjust_bridge_window() directly changes the resource size which does not include what is reserved due to hpmemsize. Also, if the required size for the bridge window exceeds hpmemsize, the parameter does not have any effect even on the second size calculation made by pbus_size_mem(); from calculate_memsize(): size = max(size, add_size) + children_add_size; The commit ae4611f1d7e9 ("PCI: Set resource size directly in adjust_bridge_window()") that precedes the commit 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink resource if necessary") is also related to causing this problem. Its changelog explicitly states adjust_bridge_window() wants to "guarantee" allocation success. Guaranteed allocations, however, are incompatible with how the other parts of the resource fitting algorithm work. The given justification fails to explain why guaranteed allocations at this stage are required nor why forcing window to a smaller value than what was calculated by pbus_size_mem() is correct. While the change might have worked by chance in some test scenario, too small bridge window does not "guarantee" success from the point of view of the endpoint device resource assignments. No issue is mentioned within the changelog so it's unclear if the change was made to fix some observed issue nor and what that issue was. The unwanted shrinking of a bridge window occurs, e.g., when a device with large BARs such as eGPU is attached using Thunderbolt and the Root Port holds less than enough resource space for the eGPU. The GPU resources are in order of GBs and the default hotplug allocation is a mere 2MB (DEFAULT_HOTPLUG_MMIO_PREF_SIZE). The problem is illustrated by this log (filtered to the relevant content only): pci 0000:00:07.0: PCI bridge to [bus 03-2c] pci 0000:00:07.0: bridge window [mem 0x6000000000-0x601bffffff 64bit pref] pci 0000:03:00.0: PCI bridge to [bus 00] pci 0000:03:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref] pci 0000:03:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring pci 0000:03:00.0: PCI bridge to [bus 04-2c] pcieport 0000:00:07.0: Assigned bridge window [mem 0x6000000000-0x601bffffff 64bit pref] to [bus 03-2c] cannot fit 0xc00000000 required for 0000:03:00.0 bridging to [bus 04-2c] pci 0000:03:00.0: bridge window [mem 0x800000000-0x10003fffff 64bit pref] to [bus 04-2c] add_size 100000 add_align 100000 pcieport 0000:00:07.0: distributing available resources pci 0000:03:00.0: bridge window [mem 0x800000000-0x10003fffff 64bit pref] shrunken by 0x00000007e4400000 pci 0000:03:00.0: bridge window [mem 0x6000000000-0x601bffffff 64bit pref]: assigned The initial size of the Root Port's window is 448MB (0x601bffffff - 0x6000000000). __pci_bus_size_bridges() -> pbus_size_mem() calculates the required size to be 32772 MB (0x10003fffff - 0x800000000) which would fit the eGPU resources. adjust_bridge_window() then shrinks the bridge window down to what is guaranteed to fit into the Root Port's bridge window. The bridge window for 03:00.0 is also eliminated from the add_list (a.k.a. realloc_head) list by adjust_bridge_window(). After adjustment, the resources are assigned and as the bridge window for 03:00.0 is assigned successfully, no failure is recorded. Without a failure, no attempt to resize the window of the Root Port is required. The end result is eGPU not having large enough resources to work. The commit 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink resource if necessary") also claims nested bridge windows are sized the same, which is false. pbus_size_mem() calculates the size for the parent bridge window by summing all the downstream resources so the resource fitting calculates larger bridge window for the parent to accommodate the childen. That is, hpmemsize does not result the same size for the case where there are nested bridge windows. In order to fix the most immediate problem, don't shrink the resource size in adjust_bridge_window() as hpmemsize had nothing to do with it. When considering add_size, only reduce it up to what is added due to hpmemsize (if required size is larger than hpmemsize, the parameter has no impact, see calculate_memsize()). Unfortunately, if the tail of the bridge window was aligned in calculate_memsize() from below hpmemsize to above it, the size check will falsely match but the check at least errs to the side of caution. There's not enough information available in adjust_bridge_window() to know the calculated size precisely. This is not exactly a revert of the commits e4611f1d7e9 ("PCI: Set resource size directly in adjust_bridge_window()") and 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink resource if necessary") as shrinking still remains in place but is implemented differently, and the end result behaves very differently. It is possible that those two commits fixed some other issue that is not described with enough detail in the changelog and undoing parts of them results in another regression due to behavioral change. Nonetheless, as described above, the solution by those two commits was flawed and the issue, if one exists, should be solved in a way that is compatible with the rest of the resource fitting algorithm instead of working against it. Besides shrinking, the case where adjust_bridge_window() expands the bridge window is likely somewhat wrong as well because it removes the entry from add_list (a.k.a. realloc_head), but it is less damaging as that only impacts optional resources and may have no impact if expanding by hpmemsize is larger than what add_size was. Fixing it is left as further work. Fixes: 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink resource if necessary") Fixes: ae4611f1d7e9 ("PCI: Set resource size directly in adjust_bridge_window()") Reported-by: Steve Oswald <stevepeter.oswald@gmail.com> Closes: https://lore.kernel.org/linux-pci/CAN95MYEaO8QYYL=5cN19nv_qDGuuP5QOD17pD_ed6a7UqFVZ-g@mail.gmail.com/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260219153951.68869-1-ilpo.jarvinen@linux.intel.com
2026-03-26PCI: Prevent assignment to unsupported bridge windowsAhmed Naseef
Previously, pci_read_bridge_io() and pci_read_bridge_mmio_pref() unconditionally set resource type flags (IORESOURCE_IO or IORESOURCE_MEM | IORESOURCE_PREFETCH) when reading bridge window registers. For windows that are not implemented in hardware, this may cause the allocator to assign space for a window that doesn't exist. For example, the EcoNET EN7528 SoC Root Port doesn't support the prefetchable window, but since a downstream device had a prefetchable BAR, the allocator mistakenly assigned a prefetchable window: pci 0001:00:01.0: [14c3:0811] type 01 class 0x060400 PCIe Root Port pci 0001:00:01.0: PCI bridge to [bus 01-ff] pci 0001:00:01.0: bridge window [mem 0x28000000-0x280fffff]: assigned pci 0001:00:01.0: bridge window [mem 0x28100000-0x282fffff pref]: assigned pci 0001:01:00.0: BAR 0 [mem 0x28100000-0x281fffff 64bit pref]: assigned pci_read_bridge_windows() already detects unsupported windows by testing register writability and sets dev->io_window/pref_window accordingly. Check dev->io_window/pref_window so we don't set the resource flags for unsupported windows, which prevents the allocator from assigning space to them. After this commit, the prefetchable BAR is correctly allocated from the non-prefetchable window: pci 0001:00:01.0: bridge window [mem 0x28000000-0x281fffff]: assigned pci 0001:01:00.0: BAR 0 [mem 0x28000000-0x280fffff 64bit pref]: assigned Suggested-by: Bjorn Helgaas <helgaas@kernel.org> Link: https://lore.kernel.org/all/20260113210259.GA715789@bhelgaas/ Signed-off-by: Ahmed Naseef <naseefkm@gmail.com> Signed-off-by: Caleb James DeLisle <cjd@cjdns.fr> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260312165332.569772-4-cjd@cjdns.fr
2026-03-27PCI: of: Remove max-link-speed generation validationHans Zhang
The of_pci_get_max_link_speed() function currently validates the "max-link-speed" DT property to be in the range 1..4 (Gen1..Gen4). This imposes a maintenance burden because each new PCIe generation would require updating this validation. Remove the range check so the function returns the raw property value (or a negative error code if the property is missing or malformed). Since the callers are now validating the returned speed against the range they support, this check can now be safely removed. Removing the validation from this common function also allows future PCIe generations to be supported without modifying drivers/pci/of.c. Signed-off-by: Hans Zhang <18255117159@163.com> [mani: commit log and kernel doc fix] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20260313165522.123518-6-18255117159@163.com
2026-03-27PCI: controller: Validate max-link-speedHans Zhang
Add validation for the "max-link-speed" DT property in three more drivers, using the pcie_get_link_speed() helper. - brcmstb: If the value is missing or invalid, fall back to no limitation (pcie->gen = 0). Fix the previous incorrect logic. - mediatek-gen3: If the value is missing or invalid, use the maximum speed supported by the controller. - rzg3s-host: If the value is missing or invalid, fall back to Gen2. This ensures that all users of of_pci_get_max_link_speed() are ready for the removal of the central range check. Signed-off-by: Hans Zhang <18255117159@163.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20260313165522.123518-5-18255117159@163.com
2026-03-27PCI: j721e: Validate max-link-speed from DTHans Zhang
Use the new pcie_get_link_speed() helper to validate the value read from the "max-link-speed" DT property. If the value is missing or invalid, fall back to Gen2 (speed = 2). This prepares for the removal of the range check in of_pci_get_max_link_speed(). Signed-off-by: Hans Zhang <18255117159@163.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20260313165522.123518-4-18255117159@163.com
2026-03-27PCI: dwc: Use pcie_get_link_speed() helper for safe array accessHans Zhang
Replace direct indexing of pcie_link_speed[] with the new helper pcie_get_link_speed() in all DesignWare core and glue drivers. This ensures that out-of-range generation numbers do not cause out-of-bounds accesses when the helper returns PCI_SPEED_UNKNOWN, and prepares for the removal of the range check in of_pci_get_max_link_speed(). The actual validation of the "max-link-speed" DT property (e.g., fallback to a safe default and warning) is added in subsequent patches for each driver that reads the property. Signed-off-by: Hans Zhang <18255117159@163.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20260313165522.123518-3-18255117159@163.com
2026-03-27PCI: Add pcie_get_link_speed() helper for safe array accessHans Zhang
The pcie_link_speed[] array is indexed by PCIe generation numbers (1 = 2.5 GT/s, 2 = 5 GT/s, ...). Several drivers use it directly, which can lead to out-of-bounds accesses if an invalid generation number is used. Introduce a helper function pcie_get_link_speed() that returns the pci_bus_speed value for a given generation number, or PCI_SPEED_UNKNOWN if the generation is out of range. This will allow us to safely handle invalid values after the range check is removed from of_pci_get_max_link_speed(). Signed-off-by: Hans Zhang <18255117159@163.com> [mani: Fixed kernel-doc] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260313165522.123518-2-18255117159@163.com