| Age | Commit message (Collapse) | Author |
|
Tegra194 PCIe driver used custom version numbers to detect Tegra194 and
Tegra234 IPs. With version detect logic added, version check results in
mismatch warnings:
tegra194-pcie 14100000.pcie: Versions don't match (0000562a != 3536322a)
Use HW version numbers which match to PORT_LOGIC.PCIE_VERSION_OFF in
Tegra194 driver to avoid these kernel warnings.
Fixes: a54e19073718 ("PCI: tegra194: Add Tegra234 PCIe support")
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
Link: https://patch.msgid.link/20260324190755.1094879-12-mmaddireddy@nvidia.com
|
|
Free up the resources during remove() that were acquired by the DesignWare
driver for the Endpoint mode during probe().
Fixes: bb617cbd8151 ("PCI: tegra194: Clean up the exit path for Endpoint mode")
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
Link: https://patch.msgid.link/20260324190755.1094879-11-mmaddireddy@nvidia.com
|
|
Host software initiates the L2 sequence. PCIe link is kept in L2 state
during suspend. If Endpoint mode is enabled and the link is up, the
software cannot proceed with suspend. However, when the PCIe Endpoint
driver is probed, but the PCIe link is not up, Tegra can go into suspend
state. So, allow system to suspend in this case.
Fixes: de2bbf2b71bb ("PCI: tegra194: Don't allow suspend when Tegra PCIe is in EP mode")
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
Link: https://patch.msgid.link/20260324190755.1094879-10-mmaddireddy@nvidia.com
|
|
LTR message should be sent as soon as the Root Port enables LTR in the
Endpoint mode. So set snoop and no-snoop LTR timing and LTR message request
before the PCIe link comes up, so that the LTR message is sent upstream as
soon as LTR is enabled.
Without programming these values, the Endpoint would send latencies of 0 to
the host, which will be inaccurate.
Fixes: c57247f940e8 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194")
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
[mani: commit log]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260324190755.1094879-9-mmaddireddy@nvidia.com
|
|
Pre-silicon simulation showed the controller operating in Endpoint mode
initiating link speed change after completing Secondary Bus Reset. Ideally,
the Root Port or the Switch Downstream Port should initiate the link speed
change post SBR, not the Endpoint.
So, as per the hardware team recommendation, disable direct speed change
for the Endpoint mode to prevent it from initiating speed change after the
physical layer link is up at Gen1, leaving speed change ownership with the
host.
Fixes: c57247f940e8 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194")
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
[mani: commit log]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
Link: https://patch.msgid.link/20260324190755.1094879-8-mmaddireddy@nvidia.com
|
|
The GPIO DT property "nvidia,refclk-select", to select the PCIe reference
clock is optional. Use devm_gpiod_get_optional() to get it.
Fixes: c57247f940e8 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194")
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
Link: https://patch.msgid.link/20260324190755.1094879-7-mmaddireddy@nvidia.com
|
|
The PERST# GPIO interrupt is only registered when the controller is
operating in Endpoint mode. In Root Port mode, the PERST# GPIO is
configured as an output to control downstream devices, and no interrupt is
registered for it.
Currently, tegra_pcie_dw_stop_link() unconditionally calls disable_irq()
on pex_rst_irq, which causes issues in Root Port mode where this IRQ is
not registered.
Fix this by only disabling the PERST# IRQ when operating in Endpoint mode,
where the interrupt is actually registered and used to detect PERST#
assertion/deassertion from the host.
Fixes: c57247f940e8 ("PCI: tegra: Add support for PCIe endpoint mode in Tegra194")
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
Link: https://patch.msgid.link/20260324190755.1094879-6-mmaddireddy@nvidia.com
|
|
As per PCIe CEM r6.0, sec 2.3, the PCIe Endpoint device should be in D3cold
to assert WAKE# pin. The previous workaround that forced downstream devices
to D0 before taking the link to L2 cited PCIe r4.0, sec 5.2, "Link State
Power Management"; however, that spec does not explicitly require putting
the device into D0 and only indicates that power removal may be initiated
without transitioning to D3hot.
Remove the D0 workaround so that Endpoint devices can use wake
functionality (WAKE# from D3). With some Endpoints the link may not enter
L2 when they remain in D3, but the Root Port continues with the usual flow
after PME timeout, so there is no functional issue.
Fixes: 56e15a238d92 ("PCI: tegra: Add Tegra194 PCIe support")
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260324190755.1094879-5-mmaddireddy@nvidia.com
|
|
After the link reaches a Detect-related LTSSM state, disable LTSSM so it
does not keep toggling between Polling and Detect. Do this by polling for
the Detect state first, then clearing APPL_CTRL_LTSSM_EN in both
tegra_pcie_dw_pme_turnoff() and pex_ep_event_pex_rst_assert().
Fixes: 56e15a238d92 ("PCI: tegra: Add Tegra194 PCIe support")
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260324190755.1094879-4-mmaddireddy@nvidia.com
|
|
On surprise link down, LTSSM state transits from L0 -> Recovery.RcvrLock ->
Recovery.RcvrSpeed -> Gen1 Recovery.RcvrLock -> Detect. Recovery.RcvrLock
and Recovery.RcvrSpeed transit times are 24 ms and 48 ms respectively, so
the total time from L0 to Detect is ~96 ms. Increase the poll timeout to
120 ms to account for this.
While at it, add LTSSM state defines for Detect-related states and use them
in the poll condition. Use readl_poll_timeout() instead of
readl_poll_timeout_atomic() in tegra_pcie_dw_pme_turnoff() since that path
runs in non-atomic context.
Fixes: 56e15a238d92 ("PCI: tegra: Add Tegra194 PCIe support")
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260324190755.1094879-3-mmaddireddy@nvidia.com
|
|
As per PCIe r7.0, sec 5.3.3.2.1, after sending PME_Turn_Off message, Root
Port should wait for 1-10 msec for PME_TO_Ack message. Currently, driver is
polling for 10 msec with 1 usec delay which is aggressive. Use existing
macro PCIE_PME_TO_L2_TIMEOUT_US to poll for 10 msec with 1 msec delay.
Since this function is used in non-atomic context only, use non-atomic poll
function.
Fixes: 56e15a238d92 ("PCI: tegra: Add Tegra194 PCIe support")
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260324190755.1094879-2-mmaddireddy@nvidia.com
|
|
In pci-epf-test.c, set the STATUS_NO_RESOURCE status bit if
pci_epc_set_bar() returns -ENOSPC. This status bit is used to indicate
that there are not enough inbound window resources to allocate the
subrange.
In pci_endpoint_test.c, return -ENOSPC instead of -EIO when
STATUS_NO_RESOURCE is set.
In pci_endpoint_test.c, skip the BAR subrange test if -ENOSPC, i.e., there
are not enough inbound window resources to run the test.
Signed-off-by: Christian Bruel <christian.bruel@foss.st.com>
[mani: commit log]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: squash related commits]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Koichiro Den <den@valinux.co.jp>
Link: https://patch.msgid.link/20260407-skip-bar_subrange-tests-if-enospc-v4-1-6f2e65f2298c@foss.st.com
Link: https://patch.msgid.link/20260407-skip-bar_subrange-tests-if-enospc-v4-2-6f2e65f2298c@foss.st.com
Link: https://patch.msgid.link/20260407-skip-bar_subrange-tests-if-enospc-v4-3-6f2e65f2298c@foss.st.com
|
|
After having removed the last usage of no_pci_devices(), this function
can be removed.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/b0ce592d-c34c-4e0b-b389-4e346b3a0c44@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V fixes from Wei Liu:
- Two fixes for Hyper-V PCI driver (Long Li, Sahil Chandna)
- Fix an infinite loop issue in MSHV driver (Stanislav Kinsburskii)
* tag 'hyperv-fixes-signed-20260406' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
mshv: Fix infinite fault loop on permission-denied GPA intercepts
PCI: hv: Fix double ida_free in hv_pci_probe error path
PCI: hv: Set default NUMA node to 0 for devices without affinity info
|
|
Rockchip platforms provide a 64x4 bytes debug FIFO to trace the LTSSM
transition and data rate change history. These will be useful for debugging
issues such as link failure, etc.
Hence, expose these information over pcie_ltssm_state_transition
tracepoint.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
[mani: commit log]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Anand Moon <linux.amoon@gmail.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://patch.msgid.link/1774403912-210670-4-git-send-email-shawn.lin@rock-chips.com
|
|
Some PCI controllers may provide debug functionalities to track PCI bus
activities like LTSSM state transitions and data rate changes. These will
be very useful for debugging PCI link specific issues such as endpoint not
getting detected or performance issues.
Hence, implement the PCI controller tracepoint feature for recording LTSSM
state transitions and data rate changes.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
[mani: commit log and maintainers entry]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Tested-by: Anand Moon <linux.amoon@gmail.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://patch.msgid.link/1774403912-210670-2-git-send-email-shawn.lin@rock-chips.com
|
|
NPEM registers LED classdevs on PCI endpoint that may be behind
hotplug-capable ports. During hot-removal, led_classdev_unregister() calls
led_set_brightness(LED_OFF) which leads to a PCI config read to a
disconnected device, which fails and returns -ENODEV (topology details in
msgid.link below):
leds 0003:01:00.0:enclosure:ok: Setting an LED's brightness failed (-19)
The LED core already suppresses this for devices with LED_HW_PLUGGABLE set,
but NPEM never sets it. Add the flag since NPEM LEDs are on hot-pluggable
hardware by nature.
Fixes: 4e893545ef87 ("PCI/NPEM: Add Native PCIe Enclosure Management support")
Signed-off-by: Richard Cheng <icheng@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Kai-Heng Feng <kaihengf@nvidia.com>
Link: https://patch.msgid.link/20260402093850.23075-1-icheng@nvidia.com
|
|
In the PCIe PHY init for the i.MX95, the reference clock source selection
uses a conditional instead of always passing the mask. This currently
breaks functionality if the internal refclk is used.
To fix this issue, always pass IMX95_PCIE_REF_USE_PAD as the mask and clear
bit if external refclk is not used. This essentially swaps the parameters.
Fixes: d8574ce57d76 ("PCI: imx6: Add external reference clock input mode support")
Signed-off-by: Franz Schnyder <franz.schnyder@toradex.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Richard Zhu <hongxing.zhu@nxp.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260325093118.684142-1-fra.schnyder@gmail.com
|
|
pcie_tph_get_cpu_st() uses the Query Cache Locality Features _DSM [1]
to retrieve the TPH Steering Tag for memory associated with the CPU
identified by its "cpu_uid" parameter, a Linux logical CPU ID.
The _DSM requires an ACPI Processor UID, which pcie_tph_get_cpu_st()
previously assumed was the same as the Linux logical CPU ID. This is
true on x86 but not on arm64, so pcie_tph_get_cpu_st() returned the
wrong Steering Tag, resulting in incorrect TPH functionality on arm64.
Convert the Linux logical CPU ID to the ACPI Processor UID with
acpi_get_cpu_uid() before passing it to the _DSM. Additionally, rename
the pcie_tph_get_cpu_st() parameter from "cpu_uid" to "cpu" to reflect
that it represents a logical CPU ID (not an ACPI Processor UID).
[1] According to ECN_TPH-ST_Revision_20200924
(https://members.pcisig.com/wg/PCI-SIG/document/15470), the input
is defined as: "If the target is a processor, then this field
represents the ACPI Processor UID of the processor as specified in
the MADT. If the target is a processor container, then this field
represents the ACPI Processor UID of the processor container as
specified in the PPTT."
Fixes: d2e8a34876ce ("PCI/TPH: Add Steering Tag support")
Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260401081640.26875-9-fengchengwen@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Switch to the device-managed variant so the notifier is automatically
unregistered on device removal, allowing the open-coded remove callback
to be dropped entirely.
Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Shiju Jose <shiju.jose@huawei.com>
Link: https://patch.msgid.link/20260330094203.38022-3-kaihengf@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The commit 18ac51ae9df9 ("PCI: cadence: Implement capability search
using PCI core APIs") assumed all the platforms using Cadence PCIe
controller support byte and word register accesses. This is not true
for all platforms (e.g., TI J721E SoC, which only supports dword
register accesses).
This causes capability searches via cdns_pcie_find_capability() to fail
on such platforms.
Fix this by using cdns_pcie_read_sz() for config read functions, which
properly handles size-aligned accesses. Remove the now-unused byte and
word read wrapper functions (cdns_pcie_readw and cdns_pcie_readb).
Fixes: 18ac51ae9df9 ("PCI: cadence: Implement capability search using PCI core APIs")
Signed-off-by: Aksh Garg <a-garg7@ti.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260402085545.284457-1-a-garg7@ti.com
|
|
In mtk_pcie_setup_irq(), the IRQ domains are allocated before the
controller's IRQ is fetched. If the latter fails, the function
directly returns an error, without cleaning up the allocated domains.
Hence, reverse the order so that the IRQ domains are allocated after the
controller's IRQ is found.
This was flagged by Sashiko during a review of "[PATCH v6 0/7] PCI:
mediatek-gen3: add power control support".
Fixes: 814cceebba9b ("PCI: mediatek-gen3: Add INTx support")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://sashiko.dev/#/patchset/20260324052002.4072430-1-wenst%40chromium.org
Link: https://patch.msgid.link/20260324093542.18523-1-wenst@chromium.org
|
|
Tegra Endpoint exposes three 64-bit BARs at indices 0, 2, and 4:
- BAR0+BAR1: EPF test/data (programmable 64-bit BAR)
- BAR2+BAR3: MSI-X table (hardware-backed)
- BAR4+BAR5: DMA registers (hardware-backed)
Update tegra_pcie_epc_features so that BAR2 is BAR_RESERVED with
PCI_EPC_BAR_RSVD_MSIX_TBL_RAM (64 KB) & PCI_EPC_BAR_RSVD_MSIX_PBA_RAM
(64 KB) and BAR4 is BAR_RESERVED with PCI_EPC_BAR_RSVD_DMA_CTRL_MMIO (4KB).
This keeps CONSECUTIVE_BAR_TEST working while allowing the host to use
64-bit BAR2 (MSI-X) and BAR4 (DMA).
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260324080857.916263-4-mmaddireddy@nvidia.com
|
|
The Tegra194/234 Endpoint does not support the Resizable BAR capability,
but BAR0 can be programmed to different sizes via the DBI2 BAR registers
in dw_pcie_ep_set_bar_programmable(). The BAR0 size is set once during
initialization.
Remove the fixed 1MB limit from pci_epc_features so Endpoint function
drivers can configure the BAR0 size they need.
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260324080857.916263-3-mmaddireddy@nvidia.com
|
|
The aspeed_pcie_probe() function calls aspeed_pcie_init_irq_domain()
which allocates pcie->intx_domain and initializes MSI. However, if
platform_get_irq() fails afterwards, the cleanup action was not yet
registered via devm_add_action_or_reset(), causing the IRQ domain
resources to leak.
Fix this by registering the devm cleanup action immediately after
aspeed_pcie_init_irq_domain() succeeds, before calling
platform_get_irq(). This ensures proper cleanup on any subsequent
failure.
Fixes: 9aa0cb68fcc1 ("PCI: aspeed: Add ASPEED PCIe RC driver")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Tested-by: Jacky Chou <jacky_chou@aspeedtech.com>
Link: https://patch.msgid.link/20260324-aspeed-v1-1-354181624c00@gmail.com
|
|
If hv_pci_probe() fails after storing the domain number in
hbus->bridge->domain_nr, there is a call to free this domain_nr via
pci_bus_release_emul_domain_nr(), however, during cleanup, the bridge
release callback pci_release_host_bridge_dev() also frees the domain_nr
causing ida_free to be called on same ID twice and triggering following
warning:
ida_free called for id=28971 which is not allocated.
WARNING: lib/idr.c:594 at ida_free+0xdf/0x160, CPU#0: kworker/0:2/198
Call Trace:
pci_bus_release_emul_domain_nr+0x17/0x20
pci_release_host_bridge_dev+0x4b/0x60
device_release+0x3b/0xa0
kobject_put+0x8e/0x220
devm_pci_alloc_host_bridge_release+0xe/0x20
devres_release_all+0x9a/0xd0
device_unbind_cleanup+0x12/0xa0
really_probe+0x1c5/0x3f0
vmbus_add_channel_work+0x135/0x1a0
Fix this by letting pci core handle the free domain_nr and remove
the explicit free called in pci-hyperv driver.
Fixes: bcce8c74f1ce ("PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms")
Signed-off-by: Sahil Chandna <sahilchandna@linux.microsoft.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 782a985d7af2 ("PCI: Introduce new device binding path using pci_dev.driver_override")
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex@shazbot.org>
Tested-by: Gui-Dong Han <hanguidong02@gmail.com>
Reviewed-by: Gui-Dong Han <hanguidong02@gmail.com>
Link: https://patch.msgid.link/20260324005919.2408620-6-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
hardware bug
On NXP i.MX7D, i.MX8MM, and i.MX8MQ chipsets, MSIs from the endpoints won't
be received by the iMSI-RX MSI controller if the Root Port MSI capability
is disabled.
Even though the Root Port MSIs won't be received by the iMSI-RX controller
due to design, these chipsets have some weird hardware bug that prevents
the endpoint MSIs from reaching when the Root Port MSI capability is
disabled.
Hence, introduce a new flag, 'dw_pcie_rp::keep_rp_msi_en', set it for the
above mentioned SoCs, and always keep the Root Port MSI capability when
this flag is set.
Note that by keeping Root Port MSI capability, Root Port MSIs such as AER,
PME and others won't be received by default. So users need to use
workarounds such as passing 'pcie_pme=nomsi' cmdline param.
Fixes: f5cd8a929c825 ("PCI: dwc: Remove MSI/MSIX capability for Root Port if iMSI-RX is used as MSI controller")
Suggested-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: commit log]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: fix typos]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260331085252.1243108-1-hongxing.zhu@nxp.com
|
|
Point to the relevant sections in the most recent release 7.0 of the PCIe
spec. Text has mostly just moved around without any semantic change.
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260330-fix_pciatops-v7-3-f601818417e8@linux.ibm.com
|
|
When inspecting the config space of a Connect-X physical function in an
s390 system after it was initialized by the mlx5_core device driver, we
found the function to be enabled to request AtomicOps despite the Root Port
lacking support for completing them:
00:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
Subsystem: Mellanox Technologies Device 0002
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn+
On s390 and many virtualized guests, the Endpoint is visible but the Root
Port is not. In this case, pci_enable_atomic_ops_to_root() previously
enabled AtomicOps in the Endpoint even though it can't tell whether the
Root Port supports them as a completer.
Change pci_enable_atomic_ops_to_root() to fail if there's no Root Port or
the Root Port doesn't support AtomicOps.
Fixes: 430a23689dea ("PCI: Add pci_enable_atomic_ops_to_root()")
Reported-by: Alexander Schmidt <alexs@linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
[bhelgaas: commit log, check RP first to simplify flow]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260330-fix_pciatops-v7-2-f601818417e8@linux.ibm.com
|
|
Since Root Complex Integrated Endpoints (RCiEPs) attach to a bus that has
no bridge device describing the Root Port, the capability to complete
AtomicOps requests cannot be determined with PCIe methods.
Change default of pci_enable_atomic_ops_to_root() to not enable AtomicOps
requests on RCiEPs.
As far as we know, there are no RCiEPs that need AtomicOps (see Link
below).
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260330-fix_pciatops-v7-1-f601818417e8@linux.ibm.com
|
|
kstrtou32_from_user() returns int, but the return value was stored in
a u32 variable 'val', risking sign loss. Use a dedicated int variable
to correctly handle the return code.
Fixes: 4fbfa17f9a07 ("PCI: dwc: Add debugfs based Silicon Debug support for DWC")
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260401023048.4182452-1-18255117159@163.com
|
|
There is already an 'if PCI' condition wrapping several config options,
e.g., PCI_DOMAINS and VGA_ARB, making the 'depends on PCI' statement for
each of these a duplicate dependency (dead code).
Leave the outer 'if PCI...endif' and remove the individual 'depends on PCI'
statement from each option.
This dead code was found by kconfirm, a static analysis tool for Kconfig.
Signed-off-by: Julian Braha <julianbraha@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260330214549.16157-1-julianbraha@gmail.com
|
|
aer_print_error() skips printing if ratelimit_print[i] is not set. In the
native AER path, ratelimit_print is initialized by add_error_device()
during source device discovery, and is set to 1 for fatal errors to bypass
rate limiting since fatal errors should always be logged.
The DPC/EDR path uses the DPC-capable port as the error source and reads
its AER uncorrectable error status registers directly in
dpc_get_aer_uncorrect_severity(). Since it does not go through
add_error_device(), ratelimit_print[0] is left uninitialized and zero. As
a result, aer_print_error() silently drops all AER error messages for
DPC/EDR triggered events.
Set ratelimit_print[0] to 1 to bypass rate limiting and always print AER
logs for uncorrectable errors detected by the DPC port.
Fixes: a57f2bfb4a58 ("PCI/AER: Ratelimit correctable and non-fatal error logging")
Co-developed-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com>
Signed-off-by: Goudar Manjunath Ramanagouda <manjunath.ramanagouda.goudar@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260318170449.2733581-1-sathyanarayanan.kuppuswamy@linux.intel.com
|
|
The numa_node sysfs interface allows users to manually override a PCI
device's NUMA node assignment. Currently, every write triggers a FW_BUG
warning and taints the kernel, even when writing the same value that is
already set.
Check if the requested node is already assigned to the device. If it
matches, return success immediately without tainting the kernel or printing
a warning.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260317002803.2353-1-lirongqing@baidu.com
|
|
When searching for the error source, the AER driver rules out devices whose
enable_cnt is zero. This was introduced in 2009 by commit 28eb27cf0839
("PCI AER: support invalid error source IDs") without providing a
rationale.
Drivers typically call pci_enable_device() on probe, hence the enable_cnt
check essentially filters out unbound devices. At the time of the commit,
drivers had to opt in to AER by calling pci_enable_pcie_error_reporting()
and so any AER-enabled device could be assumed to be bound to a driver.
The check thus made sense because it allowed skipping config space accesses
to devices which were known not to be the error source.
But since 2022, AER is universally enabled on all devices when they are
enumerated, cf. commit f26e58bf6f54 ("PCI/AER: Enable error reporting when
AER is native").
Errors may very well be reported by unbound devices, e.g. due to link
instability. By ruling them out as error source, errors reported by them
are neither logged nor cleared. When they do get bound and another error
occurs, the earlier error is reported together with the new error, which
may confuse users. Stop doing so.
Fixes: f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stefan Roese <stefan.roese@mailbox.org>
Cc: stable@vger.kernel.org # v6.0+
Link: https://patch.msgid.link/734338c2e8b669db5a5a3b45d34131b55ffebfca.1774605029.git.lukas@wunner.de
|
|
PCI bridges are allowed to refuse activating VGA decoding, by simply
ignoring attempts to set the bit that enables it, so after setting the bit,
read it back to verify.
One example of such a bridge is the root bridge in IBM PowerNV, but this is
also useful for GPU passthrough into virtual machines, where it is
difficult to set up routing for legacy IO through IOMMU.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
[bhelgaas: subject, add comment about VGA Enable writability]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260307173538.763188-5-Simon.Richter@hogyros.de
|
|
When vNTB is used as a PCI endpoint function, the NTB device is backed
by a virtual PCI function. For DMA API allocations and mappings, NTB
clients must use the device that is associated with the IOMMU domain.
Implement ntb_dev_ops->get_dma_dev() for pci-epf-vntb and return the EPC
parent device.
Suggested-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20260306031443.1911860-4-den@valinux.co.jp
|
|
The commit bc75c8e50711 ("PCI: Rewrite bridge window head alignment
function") did not use if (r_size <= align) check from pbus_size_mem() for
the new head alignment bookkeeping structure (aligns2[]). In some
configurations, this can result in producing a gap into the bridge window
which the resource larger than its alignment cannot fill.
The old alignment calculation algorithm was removed by the subsequent
commit 3958bf16e2fe ("PCI: Stop over-estimating bridge window size") which
renamed the aligns2[] array leaving only aligns[] array.
Add the if (r_size <= align) check back to avoid this problem.
Fixes: bc75c8e50711 ("PCI: Rewrite bridge window head alignment function")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/b05a6f14-979d-42c9-924c-d8408cb12ae7@roeck-us.net/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xifer <xiferdev@gmail.com>
Link: https://patch.msgid.link/20260324165633.4583-11-ilpo.jarvinen@linux.intel.com
|
|
When a bridge window contains big and small resource(s), the small
resource(s) may not amount to the half of the size of the big resource
which would allow calculate_head_align() to shrink the head alignment.
This results in always placing the small resource(s) after the big
resource.
In general, it would be good to be able to place the small resource(s)
before the big resource to achieve better utilization of the address space.
In the cases where the large resource can only fit at the end of the
window, it is even required.
However, carrying the information over from pbus_size_mem() and
calculate_head_align() to __pci_assign_resource() and
pcibios_align_resource() is not easy with the current data structures.
A somewhat hacky way to move the non-aligning tail part to the head is
possible within pcibios_align_resource(). The free space between the start
of the free space span and the aligned start address can be compared with
the non-aligning remainder of the size. If the free space is larger than
the remainder, placing the remainder before the start address is possible.
This relocation should generally work, because PCI resources consist only
power-of-2 atoms.
Various arch requirements may still need to override the relocation, so the
relocation is only applied selectively in such cases.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221205
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xifer <xiferdev@gmail.com>
Link: https://patch.msgid.link/20260324165633.4583-10-ilpo.jarvinen@linux.intel.com
|
|
window_alignment() lacks prefix. Rename it to pci_min_window_alignment() in
order to include the prefix and also add min to indicate the returned
window alignment is the minimum PCI spec and arch allows.
Also make it available in drivers/pci/pci.h as upcoming changes will need
to call it from outside of setup-bus.c.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xifer <xiferdev@gmail.com>
Link: https://patch.msgid.link/20260324165633.4583-9-ilpo.jarvinen@linux.intel.com
|
|
__find_resource_space() calculates the full extent of empty space but only
passes the aligned space to resource_alignf callback. In some situations,
the callback may choose take advantage of the free space before the
requested alignment.
Pass the full extent of the calculated empty space to resource_alignf
callback as an additional parameter.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xifer <xiferdev@gmail.com>
Link: https://patch.msgid.link/20260324165633.4583-3-ilpo.jarvinen@linux.intel.com
|
|
reassign_resources_sorted() checks for two things:
a) Resource assignment failures for mandatory resources by checking if the
resource remains unassigned, which are known to always repeat, and does
not attempt to assign them again.
b) That resource is not among the ones being processed/assigned at this
stage, leading to skip processing such resources in
reassign_resources_sorted() as well (resource assignment progresses
one PCI hierarchy level at a time).
The problem here is that a) is checked before b), but b) also implies the
resource is not being assigned yet, making also a) true. As a) only skips
resource assignment but still removes the resource from realloc_head, the
later stages that would need to process the information in realloc_head
cannot obtain the optional size information anymore. This leads to
considering only non-optional part for bridge windows deeper in the PCI
hierarchy.
This problem has been observed during rescan (add_size is not considered
while attempting assignment for 0000:e2:00.0 indicating the corresponding
entry was removed from realloc_head while processing resource assignments
for 0000:e1):
pci_bus 0000:e1: scanning bus
...
pci 0000:e3:01.0: bridge window [mem 0x800000000-0x1000ffffff 64bit pref] to [bus e4] add_size 60c000000 add_align 800000000
pci 0000:e3:01.0: bridge window [mem 0x00100000-0x000fffff] to [bus e4] add_size 200000 add_align 200000
pci 0000:e3:02.0: disabling bridge window [mem 0x00000000-0x000fffff 64bit pref] to [bus e5] (unused)
pci 0000:e2:00.0: bridge window [mem 0x800000000-0x1000ffffff 64bit pref] to [bus e3-e5] add_size 60c000000 add_align 800000000
pci 0000:e2:00.0: bridge window [mem 0x00100000-0x001fffff] to [bus e3-e5] add_size 200000 add_align 200000
pcieport 0000:e1:02.0: bridge window [io size 0x2000]: can't assign; no space
pcieport 0000:e1:02.0: bridge window [io size 0x2000]: failed to assign
pcieport 0000:e1:02.0: bridge window [io 0x1000-0x2fff]: resource restored
pcieport 0000:e1:02.0: bridge window [io 0x1000-0x2fff]: resource restored
pcieport 0000:e1:02.0: bridge window [io size 0x2000]: can't assign; no space
pcieport 0000:e1:02.0: bridge window [io size 0x2000]: failed to assign
pci 0000:e2:00.0: bridge window [mem 0x28f000000000-0x28f800ffffff 64bit pref]: assigned
Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync")
Reported-by: Peter Nisbet <peter.nisbet@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Peter Nisbet <peter.nisbet@intel.com>
Link: https://patch.msgid.link/20260313084551.1934-1-ilpo.jarvinen@linux.intel.com
|
|
Steve reported an eGPU (either Radeon Instinct MI50 32GB or NVIDIA 3080
10GB) connected via Thunderbolt was not assigned sufficient BAR space in
v6.11, so the amdgpu and nvidia drivers were unable to initialize the
device.
pci_bridge_distribute_available_resources() -> ... ->
adjust_bridge_window() is called between __pci_bus_size_bridges()
and assigning the resources. Since the commit 948675736a77 ("PCI: Allow
adjust_bridge_window() to shrink resource if necessary")
adjust_bridge_window() can also shrink the bridge window. The shrunken
size, however, conflicts with what __pci_bus_size_bridges() ->
pbus_size_mem() calculated as the required bridge window size. By shrinking
the size, adjust_bridge_window() prevents the rest of the resource fitting
algorithm from working as intended. Resource fitting logic is expecting
assignment failures when bridge windows need resizing, but there are cases
where failures are no longer happening after the commit 948675736a77 ("PCI:
Allow adjust_bridge_window() to shrink resource if necessary").
The commit 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink
resource if necessary") justifies the change by the extra reservation
made due to hpmemsize parameter, however, the kernel code contradicts
that statement. (For simplicity, finer-grained hpmmiosize and hpmmiopref
parameters that can be used to the same effect as hpmemsize are ignored in
this description.)
pbus_size_mem() calls calculate_memsize() twice. First with add_size=0
to find out the minimal required resource size. The second call occurs
with add_size=hpmemsize (effectively) but the result does not directly
affect the resource size only resulting in an entry on the realloc_head
list (a.k.a. add_list). Yet, adjust_bridge_window() directly changes
the resource size which does not include what is reserved due to
hpmemsize. Also, if the required size for the bridge window exceeds
hpmemsize, the parameter does not have any effect even on the second
size calculation made by pbus_size_mem(); from calculate_memsize():
size = max(size, add_size) + children_add_size;
The commit ae4611f1d7e9 ("PCI: Set resource size directly in
adjust_bridge_window()") that precedes the commit 948675736a77 ("PCI:
Allow adjust_bridge_window() to shrink resource if necessary") is also
related to causing this problem. Its changelog explicitly states
adjust_bridge_window() wants to "guarantee" allocation success.
Guaranteed allocations, however, are incompatible with how the other
parts of the resource fitting algorithm work. The given justification
fails to explain why guaranteed allocations at this stage are required
nor why forcing window to a smaller value than what was calculated by
pbus_size_mem() is correct. While the change might have worked by chance
in some test scenario, too small bridge window does not "guarantee"
success from the point of view of the endpoint device resource
assignments. No issue is mentioned within the changelog so it's unclear
if the change was made to fix some observed issue nor and what that
issue was.
The unwanted shrinking of a bridge window occurs, e.g., when a device with
large BARs such as eGPU is attached using Thunderbolt and the Root Port
holds less than enough resource space for the eGPU. The GPU resources are
in order of GBs and the default hotplug allocation is a mere 2MB
(DEFAULT_HOTPLUG_MMIO_PREF_SIZE). The problem is illustrated by this log
(filtered to the relevant content only):
pci 0000:00:07.0: PCI bridge to [bus 03-2c]
pci 0000:00:07.0: bridge window [mem 0x6000000000-0x601bffffff 64bit pref]
pci 0000:03:00.0: PCI bridge to [bus 00]
pci 0000:03:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref]
pci 0000:03:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:03:00.0: PCI bridge to [bus 04-2c]
pcieport 0000:00:07.0: Assigned bridge window [mem 0x6000000000-0x601bffffff 64bit pref] to [bus 03-2c] cannot fit 0xc00000000 required for 0000:03:00.0 bridging to [bus 04-2c]
pci 0000:03:00.0: bridge window [mem 0x800000000-0x10003fffff 64bit pref] to [bus 04-2c] add_size 100000 add_align 100000
pcieport 0000:00:07.0: distributing available resources
pci 0000:03:00.0: bridge window [mem 0x800000000-0x10003fffff 64bit pref] shrunken by 0x00000007e4400000
pci 0000:03:00.0: bridge window [mem 0x6000000000-0x601bffffff 64bit pref]: assigned
The initial size of the Root Port's window is 448MB (0x601bffffff -
0x6000000000). __pci_bus_size_bridges() -> pbus_size_mem() calculates the
required size to be 32772 MB (0x10003fffff - 0x800000000) which would fit
the eGPU resources. adjust_bridge_window() then shrinks the bridge window
down to what is guaranteed to fit into the Root Port's bridge window. The
bridge window for 03:00.0 is also eliminated from the add_list (a.k.a.
realloc_head) list by adjust_bridge_window().
After adjustment, the resources are assigned and as the bridge window for
03:00.0 is assigned successfully, no failure is recorded. Without a
failure, no attempt to resize the window of the Root Port is required. The
end result is eGPU not having large enough resources to work.
The commit 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink
resource if necessary") also claims nested bridge windows are sized the
same, which is false. pbus_size_mem() calculates the size for the parent
bridge window by summing all the downstream resources so the resource
fitting calculates larger bridge window for the parent to accommodate the
childen. That is, hpmemsize does not result the same size for the case
where there are nested bridge windows.
In order to fix the most immediate problem, don't shrink the resource size
in adjust_bridge_window() as hpmemsize had nothing to do with it. When
considering add_size, only reduce it up to what is added due to hpmemsize
(if required size is larger than hpmemsize, the parameter has no impact,
see calculate_memsize()). Unfortunately, if the tail of the bridge window
was aligned in calculate_memsize() from below hpmemsize to above it, the
size check will falsely match but the check at least errs to the side of
caution. There's not enough information available in adjust_bridge_window()
to know the calculated size precisely.
This is not exactly a revert of the commits e4611f1d7e9 ("PCI: Set resource
size directly in adjust_bridge_window()") and 948675736a77 ("PCI: Allow
adjust_bridge_window() to shrink resource if necessary") as shrinking still
remains in place but is implemented differently, and the end result behaves
very differently.
It is possible that those two commits fixed some other issue that is not
described with enough detail in the changelog and undoing parts of them
results in another regression due to behavioral change. Nonetheless, as
described above, the solution by those two commits was flawed and the
issue, if one exists, should be solved in a way that is compatible with the
rest of the resource fitting algorithm instead of working against it.
Besides shrinking, the case where adjust_bridge_window() expands the bridge
window is likely somewhat wrong as well because it removes the entry from
add_list (a.k.a. realloc_head), but it is less damaging as that only
impacts optional resources and may have no impact if expanding by hpmemsize
is larger than what add_size was. Fixing it is left as further work.
Fixes: 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink resource if necessary")
Fixes: ae4611f1d7e9 ("PCI: Set resource size directly in adjust_bridge_window()")
Reported-by: Steve Oswald <stevepeter.oswald@gmail.com>
Closes: https://lore.kernel.org/linux-pci/CAN95MYEaO8QYYL=5cN19nv_qDGuuP5QOD17pD_ed6a7UqFVZ-g@mail.gmail.com/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260219153951.68869-1-ilpo.jarvinen@linux.intel.com
|
|
Previously, pci_read_bridge_io() and pci_read_bridge_mmio_pref()
unconditionally set resource type flags (IORESOURCE_IO or IORESOURCE_MEM |
IORESOURCE_PREFETCH) when reading bridge window registers. For windows that
are not implemented in hardware, this may cause the allocator to assign
space for a window that doesn't exist.
For example, the EcoNET EN7528 SoC Root Port doesn't support the
prefetchable window, but since a downstream device had a prefetchable BAR,
the allocator mistakenly assigned a prefetchable window:
pci 0001:00:01.0: [14c3:0811] type 01 class 0x060400 PCIe Root Port
pci 0001:00:01.0: PCI bridge to [bus 01-ff]
pci 0001:00:01.0: bridge window [mem 0x28000000-0x280fffff]: assigned
pci 0001:00:01.0: bridge window [mem 0x28100000-0x282fffff pref]: assigned
pci 0001:01:00.0: BAR 0 [mem 0x28100000-0x281fffff 64bit pref]: assigned
pci_read_bridge_windows() already detects unsupported windows by testing
register writability and sets dev->io_window/pref_window accordingly.
Check dev->io_window/pref_window so we don't set the resource flags for
unsupported windows, which prevents the allocator from assigning space to
them.
After this commit, the prefetchable BAR is correctly allocated from the
non-prefetchable window:
pci 0001:00:01.0: bridge window [mem 0x28000000-0x281fffff]: assigned
pci 0001:01:00.0: BAR 0 [mem 0x28000000-0x280fffff 64bit pref]: assigned
Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
Link: https://lore.kernel.org/all/20260113210259.GA715789@bhelgaas/
Signed-off-by: Ahmed Naseef <naseefkm@gmail.com>
Signed-off-by: Caleb James DeLisle <cjd@cjdns.fr>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260312165332.569772-4-cjd@cjdns.fr
|
|
The of_pci_get_max_link_speed() function currently validates the
"max-link-speed" DT property to be in the range 1..4 (Gen1..Gen4).
This imposes a maintenance burden because each new PCIe generation
would require updating this validation.
Remove the range check so the function returns the raw property value
(or a negative error code if the property is missing or malformed).
Since the callers are now validating the returned speed against the
range they support, this check can now be safely removed.
Removing the validation from this common function also allows future PCIe
generations to be supported without modifying drivers/pci/of.c.
Signed-off-by: Hans Zhang <18255117159@163.com>
[mani: commit log and kernel doc fix]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260313165522.123518-6-18255117159@163.com
|
|
Add validation for the "max-link-speed" DT property in three more
drivers, using the pcie_get_link_speed() helper.
- brcmstb: If the value is missing or invalid, fall back to no
limitation (pcie->gen = 0). Fix the previous incorrect logic.
- mediatek-gen3: If the value is missing or invalid, use the maximum
speed supported by the controller.
- rzg3s-host: If the value is missing or invalid, fall back to Gen2.
This ensures that all users of of_pci_get_max_link_speed() are ready
for the removal of the central range check.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20260313165522.123518-5-18255117159@163.com
|
|
Use the new pcie_get_link_speed() helper to validate the value read from
the "max-link-speed" DT property. If the value is missing or invalid,
fall back to Gen2 (speed = 2). This prepares for the removal of the
range check in of_pci_get_max_link_speed().
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260313165522.123518-4-18255117159@163.com
|
|
Replace direct indexing of pcie_link_speed[] with the new helper
pcie_get_link_speed() in all DesignWare core and glue drivers. This
ensures that out-of-range generation numbers do not cause out-of-bounds
accesses when the helper returns PCI_SPEED_UNKNOWN, and prepares for
the removal of the range check in of_pci_get_max_link_speed().
The actual validation of the "max-link-speed" DT property (e.g., fallback
to a safe default and warning) is added in subsequent patches for each
driver that reads the property.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260313165522.123518-3-18255117159@163.com
|
|
The pcie_link_speed[] array is indexed by PCIe generation numbers
(1 = 2.5 GT/s, 2 = 5 GT/s, ...). Several drivers use it directly,
which can lead to out-of-bounds accesses if an invalid generation
number is used.
Introduce a helper function pcie_get_link_speed() that returns the
pci_bus_speed value for a given generation number, or PCI_SPEED_UNKNOWN if
the generation is out of range. This will allow us to safely handle
invalid values after the range check is removed from
of_pci_get_max_link_speed().
Signed-off-by: Hans Zhang <18255117159@163.com>
[mani: Fixed kernel-doc]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260313165522.123518-2-18255117159@163.com
|