summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-11-04net: Convert proto_ops bind() callbacks to use sockaddr_unsizedKees Cook
Update all struct proto_ops bind() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: Add struct sockaddr_unsized for sockaddr of unknown lengthKees Cook
Add flexible sockaddr structure to support addresses longer than the traditional 14-byte struct sockaddr::sa_data limitation without requiring the full 128-byte sa_data of struct sockaddr_storage. This allows the network APIs to pass around a pointer to an object that isn't lying to the compiler about how big it is, but must be accompanied by its actual size as an additional parameter. It's possible we may way to migrate to including the size with the struct in the future, e.g.: struct sockaddr_unsized { u16 sa_data_len; u16 sa_family; u8 sa_data[] __counted_by(sa_data_len); }; Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-1-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: phy: fixed_phy: remove fixed_phy_addHeiner Kallweit
fixed_phy_add() has a number of problems/disadvantages: - It uses phy address 0 w/o checking whether a fixed phy with this address exists already. - A subsequent call to fixed_phy_register() would also use phy address 0, because fixed_phy_add() doesn't mark it as used. - fixed_phy_add() is used from platform code, therefore requires that fixed_phy code is built-in. Now that for the only two users (coldfire/5272 and bcm47xx) fixed_phy creation has been moved to the respective ethernet driver (fec, b44), we can remove fixed_phy_add(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/bee046a1-1e77-4057-8b04-fdb2a1bbbd08@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: phy: fixed_phy: add helper fixed_phy_register_100fdHeiner Kallweit
In few places a 100FD fixed PHY is used. Create a helper so that users don't have to define the struct fixed_phy_status. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/bf564b19-e9bc-4896-aeae-9f721cc4fecd@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: phy: make phy_device members pause and asym_pause bitfield bitsHeiner Kallweit
We can reduce the size of struct phy_device a little by switching the type of members pause and asym_pause from int to a single bit. As C99 is supported now, we can use type bool for the bitfield members, what provides us with the benefit of the usual implicit bool conversions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/764e9a31-b40b-4dc9-b808-118192a16d87@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: stmmac: add support for configuring the phy_intf_sel inputsRussell King (Oracle)
When dwmac is synthesised with support for multiple PHY interfaces, the core provides phy_intf_sel inputs, sampled on reset, to configure the PHY facing interface. Use stmmac_get_phy_intf_sel() in core code to determine the dwmac phy_intf_sel input value, and provide a new platform method called with this value just before we issue a soft reset to the dwmac core. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vFt4h-0000000Chos-3wxX@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-03net: phy: introduce internal API for PHY MSE diagnosticsOleksij Rempel
Add the base infrastructure for Mean Square Error (MSE) diagnostics, as proposed by the OPEN Alliance "Advanced diagnostic features for 100BASE-T1 automotive Ethernet PHYs" [1] specification. The OPEN Alliance spec defines only average MSE and average peak MSE over a fixed number of symbols. However, other PHYs, such as the KSZ9131, additionally expose a worst-peak MSE value latched since the last channel capture. This API accounts for such vendor extensions by adding a distinct capability bit and snapshot field. Channel-to-pair mapping is normally straightforward, but in some cases (e.g. 100BASE-TX with MDI-X resolution unknown) the mapping is ambiguous. If hardware does not expose MDI-X status, the exact pair cannot be determined. To avoid returning misleading per-channel data in this case, a LINK selector is defined for aggregate MSE measurements. All investigated devices differ in MSE capabilities, such as sample rate, number of analyzed symbols, and scaling factors. For example, the KSZ9131 uses different scaling for MSE and pMSE. To make this visible to callers, scale limits and timing information are returned via get_mse_capability(). Some PHYs sample very few symbols at high frequency (e.g. 2 us update rate). To cover such cases and allow for future high-speed PHYs with even shorter intervals, the refresh rate is reported as u64 in picoseconds. This patch introduces the internal PHY API for Mean Square Error diagnostics. It defines new kernel-side data types and driver hooks: - struct phy_mse_capability: describes supported metrics, scale limits, update interval, and sampling length. - struct phy_mse_snapshot: holds one correlated measurement set. - New phy_driver ops: `get_mse_capability()` and `get_mse_snapshot()`. These definitions form the core kernel API. No user-visible interfaces are added in this commit. Standardization notes: OPEN Alliance defines presence and interpretation of some metrics but does not fix numeric scales or sampling internals: - SQI (3-bit, 0..7) is mandatory; correlation to SNR/BER is informative (OA 100BASE-T1 TC1 v1.0 6.1.2; OA 1000BASE-T1 TC12 v2.2 6.1.2). - MSE is optional; OA recommends 2^16 symbols and scaling to 0..511, with a worst-case latch since last read (OA 100BASE-T1 TC1 v1.0 6.1.1; OA 1000BASE-T1 TC12 v2.2 6.1.1). Refresh is recommended (~0.8-2.0 ms for 100BASE-T1; ~80-200 us for 1000BASE-T1). Exact scaling/time windows are vendor-specific. - Peak MSE (pMSE) is defined only for 100BASE-T1 as optional, e.g. 128-symbol sliding window with 8-bit range and worst-case latch (OA 100BASE-T1 TC1 v1.0 6.1.3). Therefore this API exposes which measures and selectors a PHY supports, and documents where behavior is standard-referenced vs vendor-specific. [1] <https://opensig.org/wp-content/uploads/2024/01/ Advanced_PHY_features_for_automotive_Ethernet_V1.0.pdf> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20251027122801.982364-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-03net: Extend NAPI threaded polling to allow kthread based busy pollingSamiullah Khawaja
Add a new state NAPI_STATE_THREADED_BUSY_POLL to the NAPI state enum to enable and disable threaded busy polling. When threaded busy polling is enabled for a NAPI, enable NAPI_STATE_THREADED also. When the threaded NAPI is scheduled, set NAPI_STATE_IN_BUSY_POLL to signal napi_complete_done not to rearm interrupts. Whenever NAPI_STATE_THREADED_BUSY_POLL is unset, the NAPI_STATE_IN_BUSY_POLL will be unset, napi_complete_done unsets the NAPI_STATE_SCHED_THREADED bit also, which in turn will make the kthread go to sleep. Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Martin Karsten <mkarsten@uwaterloo.ca> Tested-by: Martin Karsten <mkarsten@uwaterloo.ca> Link: https://patch.msgid.link/20251028203007.575686-2-skhawaja@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31dpll: add phase-adjust-gran pin attributeIvan Vecera
Phase-adjust values are currently limited by a min-max range. Some hardware requires, for certain pin types, that values be multiples of a specific granularity, as in the zl3073x driver. Add a `phase-adjust-gran` pin attribute and an appropriate field in dpll_pin_properties. If set by the driver, use its value to validate user-provided phase-adjust values. Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Petr Oros <poros@redhat.com> Tested-by: Prathosh Satish <Prathosh.Satish@microchip.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Link: https://patch.msgid.link/20251029153207.178448-2-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.18-rc4). No conflicts, adjacent changes: drivers/net/ethernet/stmicro/stmmac/stmmac_main.c ded9813d17d3 ("net: stmmac: Consider Tx VLAN offload tag length for maxSDU") 26ab9830beab ("net: stmmac: replace has_xxxx with core_type") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29net: phy: add iterator mdiobus_for_each_phyHeiner Kallweit
Add an iterator for all PHY's on a MII bus, and phy_find_next() as a prerequisite. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/cd112f15-401a-43d9-8525-9ff0965a68cd@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29ipv4: icmp: Add RFC 5837 supportIdo Schimmel
Add the ability to append the incoming IP interface information to ICMPv4 error messages in accordance with RFC 5837 and RFC 4884. This is required for more meaningful traceroute results in unnumbered networks. The feature is disabled by default and controlled via a new sysctl ("net.ipv4.icmp_errors_extension_mask") which accepts a bitmask of ICMP extensions to append to ICMP error messages. Currently, only a single value is supported, but the interface and the implementation should be able to support more extensions, if needed. Clone the skb and copy the relevant data portions before modifying the skb as the caller of __icmp_send() still owns the skb after the function returns. This should be fine since by default ICMP error messages are rate limited to 1000 per second and no more than 1 per second per specific host. Trim or pad the packet to 128 bytes before appending the ICMP extension structure in order to be compatible with legacy applications that assume that the ICMP extension structure always starts at this offset (the minimum length specified by RFC 4884). Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251027082232.232571-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-28net: rps: softnet_data reorg to make enqueue_to_backlog() fastEric Dumazet
enqueue_to_backlog() is showing up in kernel profiles on hosts with many cores, when RFS/RPS is used. The following softnet_data fields need to be updated: - input_queue_tail - input_pkt_queue (next, prev, qlen, lock) - backlog.state (if input_pkt_queue was empty) Unfortunately they are currenly using two cache lines: /* --- cacheline 3 boundary (192 bytes) --- */ call_single_data_t csd __attribute__((__aligned__(64))); /* 0xc0 0x20 */ struct softnet_data * rps_ipi_next; /* 0xe0 0x8 */ unsigned int cpu; /* 0xe8 0x4 */ unsigned int input_queue_tail; /* 0xec 0x4 */ struct sk_buff_head input_pkt_queue; /* 0xf0 0x18 */ /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */ struct napi_struct backlog __attribute__((__aligned__(8))); /* 0x108 0x1f0 */ Add one ____cacheline_aligned_in_smp to make sure they now are using a single cache line. Also, because napi_struct has written fields, make @state its first field. We want to make sure that cpus adding packets to sd->input_pkt_queue are not slowing down cpus processing their backlog because of false sharing. After this patch new layout is: /* --- cacheline 5 boundary (320 bytes) --- */ long int pad[3] __attribute__((__aligned__(64))); /* 0x140 0x18 */ unsigned int input_queue_tail; /* 0x158 0x4 */ /* XXX 4 bytes hole, try to pack */ struct sk_buff_head input_pkt_queue; /* 0x160 0x18 */ struct napi_struct backlog __attribute__((__aligned__(8))); /* 0x178 0x1f0 */ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251024091240.3292546-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-28net/mlx5: Add balance ID support for LAG multiplane groupsMark Bloch
Implement balance ID support for multiplane LAG configurations. This feature enables per-multiplane group load balancing by extending the software system image GUID with a balance ID component. Key implementations: - Enable lag_per_mp_group capability when supported by hardware. - Append load_balance_id to software system image GUID when conditions are met. - Increase MLX5_SW_IMAGE_GUID_MAX_BYTES from 8 to 9 to accommodate the extra byte. The balance ID is appended to the system image GUID only when both load_balance_id and lag_per_mp_group capabilities are available, ensuring backward compatibility while enabling enhanced LAG functionality. This enhancement allows for more granular load balancing control in complex multi-plane LAG deployments, improving network performance and flexibility. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1761211020-925651-6-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28net/mlx5: Add software system image GUID infrastructureMark Bloch
Replace direct hardware system image GUID usage with a new software system image GUID function that supports variable-length identifiers. Key changes: - Add mlx5_query_nic_sw_system_image_guid() function with length parameter. - Update all callsites to use the new function and buffer/length approach. - Modify mapping contexts to use byte arrays instead of u64 keys. - Update devcom matching to support variable-length keys. - Change mlx5_same_hw_devs() to use buffer comparison instead of u64. This refactoring prepares the infrastructure for balance ID support, which requires extending the system image GUID with additional data. The change maintains backward compatibility while enabling future enhancements. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1761211020-925651-3-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-24net: phylink: add phylink managed wake-on-lan PHY speed controlRussell King (Oracle)
Some drivers, e.g. stmmac, use the speed_up()/speed_down() APIs to gain additional power saving during Wake-on-LAN where the PHY is managing the state. Add support to phylink for this, which can be enabled by the MAC driver. Only change the PHY speed if the PHY is configured for wake-up, but without any wake-up on the MAC side, as MAC side means changing the configuration once the negotiation has completed. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vBrR7-0000000BLza-2PjK@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24net: phylink: add phylink managed MAC Wake-on-Lan supportRussell King (Oracle)
Add core phylink managed Wake-on-Lan support, which is enabled when the MAC driver fills in the new .mac_wol_set() method that this commit creates. When this feature is disabled, phylink acts as it has in the past, merely passing the ethtool WoL calls to phylib whenever a PHY exists. No other new functionality provided by this commit is enabled. When this feature is enabled, a more inteligent approach is used. Phylink will first pass WoL options to the PHY, read them back, and attempt to set any options that were not set at the PHY at the MAC. Since we have PHY drivers that report they support WoL, and accept WoL configuration even though they aren't wired up to be capable of waking the system, we need a way to differentiate between PHYs that think they support WoL and those which actually do. As PHY drivers do not make use of the driver model's wake-up infrastructure, but could, we use this to determine whether PHY drivers can participate. This gives a path forward where, as MAC drivers are converted to this, it encourages PHY drivers to also be converted. Phylink will also ignore the mac_wol argument to phylink_suspend() as it now knows the WoL state at the MAC. MAC drivers are expected to record/configure the Wake-on-Lan state in their .mac_set_wol() method, and deal appropriately with it in their suspend/resume methods. The driver model provides assistance to set the IRQ wake support which may assist driver authors in achieving the necessary configuration. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vBrR2-0000000BLzU-1xYL@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24net: phy: add phy_may_wakeup()Russell King (Oracle)
Add phy_may_wakeup() which uses the driver model's device_may_wakeup() when the PHY driver has marked the device as wakeup capable in the driver model, otherwise use phy_drv_wol_enabled(). Replace the sites that used to call phy_drv_wol_enabled() with this as checking the driver model will be more efficient than checking the WoL state. Export phy_may_wakeup() so that phylink can use it. Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vBrQx-0000000BLzO-1RLt@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24net: phy: add phy_can_wakeup()Russell King (Oracle)
Add phy_can_wakeup() to report whether the PHY driver has marked the PHY device as being wake-up capable as far as the driver model is concerned. Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vBrQs-0000000BLzI-0w3U@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24Merge tag 'soc-fixes-6.18-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "The main change this time is an update to the MAINTAINERS file, listing Krzysztof Kozlowski, Alexandre Belloni, and Linus Walleij as additional maintainers for the SoC tree, in order to go back to a group maintainership. Drew Fustini joins as an additional reviewer for the SoC tree. Thanks to all of you for volunteering to help out. On the actual bugfixes, we have a few correctness changes for firmware drivers (qtee, arm-ffa, scmi) and two devicetree fixes for Raspberry Pi" * tag 'soc-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: soc: officially expand maintainership team firmware: arm_scmi: Fix premature SCMI_XFER_FLAG_IS_RAW clearing in raw mode firmware: arm_scmi: Skip RAW initialization on failure include: trace: Fix inflight count helper on failed initialization firmware: arm_scmi: Account for failed debug initialization ARM: dts: broadcom: rpi: Switch to V3D firmware clock arm64: dts: broadcom: bcm2712: Define VGIC interrupt firmware: arm_ffa: Add support for IMPDEF value in the memory access descriptor tee: QCOMTEE should depend on ARCH_QCOM tee: qcom: return -EFAULT instead of -EINVAL if copy_from_user() fails tee: qcom: prevent potential off by one read
2025-10-24Merge tag 'gpio-fixes-for-v6.18-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix regressions in regmap cache initialization in gpio-104-idio-16 and gpio-pci-idio-16 - configure first 16 GPIO lines of the IDIO-16 as fixed outputs - fix duplicated IRQ mapping that can lead to an RCU stall in gpio-ljca - fix printf formatters passed to dev_err() and make failure to set debounce period non fatal * tag 'gpio-fixes-for-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: ljca: Fix duplicated IRQ mapping gpiolib: acpi: Use %pe when passing an error pointer to dev_err() gpiolib: acpi: Make set debounce errors non fatal gpio: idio-16: Define fixed direction of the GPIO lines gpio: regmap: add the .fixed_direction_output configuration parameter gpio: pci-idio-16: Define maximum valid register address offset gpio: 104-idio-16: Define maximum valid register address offset
2025-10-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.18-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-23Merge tag 'net-6.18-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from can. Slim pickings, I'm guessing people haven't really started testing. Current release - new code bugs: - eth: mlx5e: - psp: avoid 'accel' NULL pointer dereference - skip PPHCR register query for FEC histogram if not supported Previous releases - regressions: - bonding: update the slave array for broadcast mode - rtnetlink: re-allow deleting FDB entries in user namespace - eth: dpaa2: fix the pointer passed to PTR_ALIGN on Tx path Previous releases - always broken: - can: drop skb on xmit if device is in listen-only mode - gro: clear skb_shinfo(skb)->hwtstamps in napi_reuse_skb() - eth: mlx5e - RX, fix generating skb from non-linear xdp_buff if program trims frags - make devcom init failures non-fatal, fix races with IPSec Misc: - some documentation formatting 'fixes'" * tag 'net-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits) net/mlx5: Fix IPsec cleanup over MPV device net/mlx5: Refactor devcom to return NULL on failure net/mlx5e: Skip PPHCR register query if not supported by the device net/mlx5: Add PPHCR to PCAM supported registers mask virtio-net: zero unused hash fields net: phy: micrel: always set shared->phydev for LAN8814 vsock: fix lock inversion in vsock_assign_transport() ovpn: use datagram_poll_queue for socket readiness in TCP espintcp: use datagram_poll_queue for socket readiness net: datagram: introduce datagram_poll_queue for custom receive queues net: bonding: fix possible peer notify event loss or dup issue net: hsr: prevent creation of HSR device with slaves from another netns sctp: avoid NULL dereference when chunk data buffer is missing ptp: ocp: Fix typo using index 1 instead of i in SMA initialization loop net: ravb: Ensure memory write completes before ringing TX doorbell net: ravb: Enforce descriptor type ordering net: hibmcge: select FIXED_PHY net: dlink: use dev_kfree_skb_any instead of dev_kfree_skb Documentation: networking: ax25: update the mailing list info. net: gro_cells: fix lock imbalance in gro_cells_receive() ...
2025-10-23Merge tag 'pm-6.18-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These revert a cpuidle menu governor commit leading to a performance regression, fix an amd-pstate driver regression introduced recently, and fix new conditional guard definitions for runtime PM. - Add missing _RET == 0 condition to recently introduced conditional guard definitions for runtime PM (Rafael Wysocki) - Revert a cpuidle menu governor change that introduced a serious performance regression on Chromebooks with Intel Jasper Lake processors (Rafael Wysocki) - Fix an amd-pstate driver regression leading to EPP=0 after hibernation (Mario Limonciello)" * tag 'pm-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: runtime: Fix conditional guard definitions Revert "cpuidle: menu: Avoid discarding useful information" cpufreq/amd-pstate: Fix a regression leading to EPP 0 after hibernate
2025-10-23net/mlx5: Add PPHCR to PCAM supported registers maskAlexei Lazar
Add the PPHCR bit to the port_access_reg_cap_mask field of PCAM register to indicate that the device supports the PPHCR register and the RS-FEC histogram feature. Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1761136182-918470-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-23virtio-net: zero unused hash fieldsJason Wang
When GSO tunnel is negotiated virtio_net_hdr_tnl_from_skb() tries to initialize the tunnel metadata but forget to zero unused rxhash fields. This may leak information to another side. Fixing this by zeroing the unused hash fields. Acked-by: Michael S. Tsirkin <mst@redhat.com> Fixes: a2fb4bc4e2a6a ("net: implement virtio helpers to handle UDP GSO tunneling") Cc: <stable@vger.kernel.org> Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20251022034421.70244-1-jasowang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-23net: datagram: introduce datagram_poll_queue for custom receive queuesRalf Lici
Some protocols using TCP encapsulation (e.g., espintcp, openvpn) deliver userspace-bound packets through a custom skb queue rather than the standard sk_receive_queue. Introduce datagram_poll_queue that accepts an explicit receive queue, and convert datagram_poll into a wrapper around datagram_poll_queue. This allows protocols with custom skb queues to reuse the core polling logic without relying on sk_receive_queue. Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: Antonio Quartulli <antonio@openvpn.net> Signed-off-by: Ralf Lici <ralf@mandelbit.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Antonio Quartulli <antonio@openvpn.net> Link: https://patch.msgid.link/20251021100942.195010-2-ralf@mandelbit.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-22net: stmmac: replace has_xxxx with core_typeRussell King (Oracle)
Replace the has_gmac, has_gmac4 and has_xgmac ints, of which only one can be set when matching a core to its driver backend, with an enumerated type carrying the DWMAC core type. Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Chen-Yu Tsai <wens@kernel.org> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://patch.msgid.link/E1vB6ld-0000000BIPy-2Qi4@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-22Merge tag 'mm-hotfixes-stable-2025-10-22-12-43' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "17 hotfixes. 12 are cc:stable and 14 are for MM. There's a two-patch DAMON series from SeongJae Park which addresses a missed check and possible memory leak. Apart from that it's all singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2025-10-22-12-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: csky: abiv2: adapt to new folio flags field mm/damon/core: use damos_commit_quota_goal() for new goal commit mm/damon/core: fix potential memory leak by cleaning ops_filter in damon_destroy_scheme hugetlbfs: move lock assertions after early returns in huge_pmd_unshare() vmw_balloon: indicate success when effectively deflating during migration mm/damon/core: fix list_add_tail() call on damon_call() mm/mremap: correctly account old mapping after MREMAP_DONTUNMAP remap mm: prevent poison consumption when splitting THP ocfs2: clear extent cache after moving/defragmenting extents mm: don't spin in add_stack_record when gfp flags don't allow dma-debug: don't report false positives with DMA_BOUNCE_UNALIGNED_KMALLOC mm/damon/sysfs: dealloc commit test ctx always mm/damon/sysfs: catch commit test ctx alloc failure hung_task: fix warnings caused by unaligned lock pointers
2025-10-22PM: runtime: Fix conditional guard definitionsRafael J. Wysocki
Since pm_runtime_get_active() returns 0 on success, all of the DEFINE_GUARD_COND() macros in pm_runtime.h need the "_RET == 0" condition at the end of the argument list or they would not work correctly. Fixes: 9a0abc39450a ("PM: runtime: Add auto-cleanup macros for "resume and get" operations") Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/linux-pm/202510191529.BCyjKlLQ-lkp@intel.com/ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Farhan Ali <alifm@linux.ibm.com> Link: https://patch.msgid.link/5943878.DvuYhMxLoT@rafael.j.wysocki
2025-10-22gpio: regmap: add the .fixed_direction_output configuration parameterIoana Ciornei
There are GPIO controllers such as the one present in the LX2160ARDB QIXIS FPGA which have fixed-direction input and output GPIO lines mixed together in a single register. This cannot be modeled using the gpio-regmap as-is since there is no way to present the true direction of a GPIO line. In order to make this use case possible, add a new configuration parameter - fixed_direction_output - into the gpio_regmap_config structure. This will enable user drivers to provide a bitmap that represents the fixed direction of the GPIO lines. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Michael Walle <mwalle@kernel.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-10-21net: add a common function to compute features for upper devicesHangbin Liu
Some high level software drivers need to compute features from lower devices. But each has their own implementations and may lost some feature compute. Let's use one common function to compute features for kinds of these devices. The new helper uses the current bond implementation as the reference one, as the latter already handles all the relevant aspects: netdev features, TSO limits and dst retention. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20251017034155.61990-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-20Merge tag 'linux-can-next-for-6.19-20251017' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2025-10-17 The first patch is by me and adds support for an optional reset to the m_can drivers. Vincent Mailhol's patch targets all drivers and removes the can_change_mtu() function, since the netdev's min and max MTU are populated. Markus Schneider-Pargmann contributes 4 patches to the m_can driver to add am62 wakeup support. The last 7 patches are by me and provide various cleanups to the m_can driver. * tag 'linux-can-next-for-6.19-20251017' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: m_can: m_can_get_berr_counter(): don't wake up controller if interface is down can: m_can: m_can_tx_submit(): remove unneeded sanity checks can: m_can: m_can_class_register(): remove error message in case devm_kzalloc() fails can: m_can: m_can_interrupt_enable(): use m_can_write() instead of open coding it net: m_can: convert dev_{dbg,info,err} -> netdev_{dbg,info,err} can: m_can: hrtimer_callback(): rename to m_can_polling_timer() can: m_can: m_can_init_ram(): make static can: m_can: Support pinctrl wakeup state can: m_can: Return ERR_PTR on error in allocation can: m_can: Map WoL to device_set_wakeup_enable dt-bindings: can: m_can: Add wakeup properties can: treewide: remove can_change_mtu() can: m_can: add support for optional reset ==================== Link: https://patch.msgid.link/20251017150819.1415685-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-20net: stmmac: remove broken PCS codeRussell King (Oracle)
Changing the netif_carrier_*() state behind phylink's back has always been prohibited because it messes up with phylinks state tracking, and means that phylink no longer guarantees to call the mac_link_down() and mac_link_up() methods at the appropriate times. This was later documented in the sfp-phylink network driver conversion guide. stmmac was converted to phylink in 2019, but nothing was done with the "PCS" code. Since then, apart from the updates as part of phylink development, nothing has happened with stmmac to improve its use of phylink, or even to address this point. A couple of years ago, a has_integrated_pcs boolean was added by Bart, which later became the STMMAC_FLAG_HAS_INTEGRATED_PCS flag, to avoid manipulating the netif_carrier_*() state. This flag is mis-named, because whenever the stmmac is synthesized for its native SGMII, TBI or RTBI interfaces, it has an "integrated PCS". This boolean/flag actually means "ignore the status from the integrated PCS". Discussing with Bart, the reasons for this are lost to the winds of time (which is why we should always document the reasons in the commit message.) RGMII also has in-band status, and the dwmac cores and stmmac code supports this but with one bug that saves the day. When dwmac cores are synthesised for RGMII only, they do not contain an integrated PCS, and so priv->dma_cap.pcs is clear, which prevents (incorrectly) the "RGMII PCS" being used, meaning we don't read the in-band status. However, a core synthesised for RGMII and also SGMII, TBI or RTBI will have this capability bit set, thus making these code paths reachable. The Jetson Xavier NX uses RGMII mode to talk to its PHY, and removing the incorrect check for priv->dma_cap.pcs reveals the theortical issue with netif_carrier_*() manipulation is real: dwc-eth-dwmac 2490000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0 dwc-eth-dwmac 2490000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=141) dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported dwc-eth-dwmac 2490000.ethernet eth0: registered PTP clock dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii-id link mode 8021q: adding VLAN 0 to HW filter on device eth0 dwc-eth-dwmac 2490000.ethernet eth0: Adding VLAN ID 0 is not supported Link is Up - 1000/Full Link is Down Link is Up - 1000/Full This looks good until one realises that the phylink "Link" status messages are missing, even when the RJ45 cable is reconnected. Nothing one can do results in the interface working. The interrupt handler (which prints those "Link is" messages) always wins over phylink's resolve worker, meaning phylink never calls the mac_link_up() nor mac_link_down() methods. eth0 also sees no traffic received, and is unable to obtain a DHCP address: 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group defa ult qlen 1000 link/ether e6:d3:6a:e6:92:de brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 0 0 0 0 0 0 TX: bytes packets errors dropped carrier collsns 27686 149 0 0 0 0 With the STMMAC_FLAG_HAS_INTEGRATED_PCS flag set, which disables the netif_carrier_*() manipulation then stmmac works normally: dwc-eth-dwmac 2490000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0 dwc-eth-dwmac 2490000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=141) dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported dwc-eth-dwmac 2490000.ethernet eth0: registered PTP clock dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii-id link mode 8021q: adding VLAN 0 to HW filter on device eth0 dwc-eth-dwmac 2490000.ethernet eth0: Adding VLAN ID 0 is not supported Link is Up - 1000/Full dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx and packets can be transferred. This clearly shows that when priv->hw->pcs is set, but STMMAC_FLAG_HAS_INTEGRATED_PCS is clear, the driver reliably fails. Discovering whether a platform falls into this is impossible as parsing all the dtsi and dts files to find out which use the stmmac driver, whether any of them use RGMII or SGMII and also depends whether an external interface is being used. The kernel likely doesn't contain all dts files either. The only driver that sets this flag uses the qcom,sa8775p-ethqos compatible, and uses SGMII or 2500BASE-X. but these are saved from this problem by the incorrect check for priv->dma_cap.pcs. So, we have to assume that for every other platform that uses SGMII with stmmac is using an external PCS. Moreover, ethtool output can be incorrect. With the full-duplex link negotiated, ethtool reports: Speed: 1000Mb/s Duplex: Half because with dwmac4, the full-duplex bit is in bit 16 of the status, priv->xstats.pcs_duplex becomes BIT(16) for full duplex, but the ethtool ksettings duplex member is u8 - so becomes zero. Moreover, the supported, advertised and link partner modes are all "not reported". Finally, ksettings_set() won't be able to set the advertisement on a PHY if this PCS code is activated, which is incorrect when SGMII is used with a PHY. Thus, remove: 1. the incorrect netif_carrier_*() manipulation. 2. the broken ethtool ksettings code. Given that all uses of STMMAC_FLAG_HAS_INTEGRATED_PCS are now gone, remove the flag from stmmac.h and dwmac-qcom-ethqos.c. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/E1v9P5y-0000000AolC-1QWH@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-20Merge tag 'cgroup-for-6.18-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Fix seqcount lockdep assertion failure in cgroup freezer on PREEMPT_RT. Plain seqcount_t expects preemption disabled, but PREEMPT_RT spinlocks don't disable preemption. Switch to seqcount_spinlock_t to properly associate css_set_lock with the freeze timing seqcount. - Misc changes including kernel-doc warning fix for misc_res_type enum and improved selftest diagnostics. * tag 'cgroup-for-6.18-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/misc: fix misc_res_type kernel-doc warning selftests: cgroup: Use values_close_report in test_cpu selftests: cgroup: add values_close_report helper cgroup: Fix seqcount lockdep assertion in cgroup freezer
2025-10-20Merge tag 'fsnotify_for_v6.18-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fixes from Jan Kara: - Stop-gap solution for a race between unmount of a filesystem with fsnotify marks and someone inspecting fdinfo of fsnotify group with those marks in procfs. A proper solution is in the works but it will get a while to settle. - Fix for non-decodable file handles (used by unprivileged apps using fanotify) * tag 'fsnotify_for_v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fs/notify: call exportfs_encode_fid with s_umount expfs: Fix exportfs_can_encode_fh() for EXPORT_FH_FID
2025-10-18Merge tag 'hid-for-linus-2025101701' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - fix for sticky fingers handling in hid-multitouch (Benjamin Tissoires) - fix for reporting of 0 battery levels (Dmitry Torokhov) - build fix for hid-haptic in certain configurations (Jonathan Denose) - improved probe and avoiding spamming kernel log by hid-nintendo (Vicki Pfau) - fix for OOB in hid-cp2112 (Deepak Sharma) - interrupt handling fix for intel-thc-hid (Even Xu) - a couple of new device IDs and device-specific quirks * tag 'hid-for-linus-2025101701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: logitech-hidpp: Add HIDPP_QUIRK_RESET_HI_RES_SCROLL selftests/hid: add tests for missing release on the Dell Synaptics HID: multitouch: fix sticky fingers HID: multitouch: fix name of Stylus input devices HID: hid-input: only ignore 0 battery events for digitizers HID: hid-debug: Fix spelling mistake "Rechargable" -> "Rechargeable" HID: Kconfig: Fix build error from CONFIG_HID_HAPTIC HID: nintendo: Rate limit IMU compensation message HID: nintendo: Wait longer for initial probe HID: core: Add printk_ratelimited variants to hid_warn() etc HID: quirks: Add ALWAYS_POLL quirk for VRS R295 steering wheel HID: quirks: avoid Cooler Master MM712 dongle wakeup bug HID: cp2112: Add parameter validation to data length HID: intel-thc-hid: intel-quickspi: Add ARL PCI Device Id's HID: intel-thc-hid: Intel-quickspi: switch first interrupt from level to edge detection HID: intel-thc-hid: intel-quicki2c: Fix wrong type casting
2025-10-18Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Alexei Starovoitov: - Replace bpf_map_kmalloc_node() with kmalloc_nolock() to fix kmemleak imbalance in tracking of bpf_async_cb structures (Alexei Starovoitov) - Make selftests/bpf arg_parsing.c more robust to errors (Andrii Nakryiko) - Fix redefinition of 'off' as different kind of symbol when I40E driver is builtin (Brahmajit Das) - Do not disable preemption in bpf_test_run (Sahil Chandna) - Fix memory leak in __lookup_instance error path (Shardul Bankar) - Ensure test data is flushed to disk before reading it (Xing Guo) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Fix redefinition of 'off' as different kind of symbol bpf: Do not disable preemption in bpf_test_run(). bpf: Fix memory leak in __lookup_instance error path selftests: arg_parsing: Ensure data is flushed to disk before reading. bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures. selftests/bpf: make arg_parsing.c more robust to crashes bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path
2025-10-18Merge tag 'nfs-for-6.18-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: - Fix for FlexFiles mirror->dss allocation - Apply delay_retrans to async operations - Check if suid/sgid is cleared after a write when needed - Fix setting the state renewal timer for early mounts after a reboot * tag 'nfs-for-6.18-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS4: Fix state renewals missing after boot NFS: check if suid/sgid was cleared after a write as needed NFS4: Apply delay_retrans to async operations NFSv4/flexfiles: fix to allocate mirror->dss before use
2025-10-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix the handling of ZCR_EL2 in NV VMs - Pick the correct translation regime when doing a PTW on the back of a SEA - Prevent userspace from injecting an event into a vcpu that isn't initialised yet - Move timer save/restore to the sysreg handling code, fixing EL2 timer access in the process - Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug - Fix trapping configuration when the host isn't GICv3 - Improve the detection of HCR_EL2.E2H being RES1 - Drop a spurious 'break' statement in the S1 PTW - Don't try to access SPE when owned by EL3 Documentation updates: - Document the failure modes of event injection - Document that a GICv3 guest can be created on a GICv5 host with FEAT_GCIE_LEGACY Selftest improvements: - Add a selftest for the effective value of HCR_EL2.AMO - Address build warning in the timer selftest when building with clang - Teach irqfd selftests about non-x86 architectures - Add missing sysregs to the set_id_regs selftest - Fix vcpu allocation in the vgic_lpi_stress selftest - Correctly enable interrupts in the vgic_lpi_stress selftest x86: - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault") - Don't try to get PMU capabilities from perf when running a CPU with hybrid CPUs/PMUs, as perf will rightly WARN. guest_memfd: - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more generic KVM_CAP_GUEST_MEMFD_FLAGS - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set said flag to initialize memory as SHARED, irrespective of MMAP. The behavior merged in 6.18 is that enabling mmap() implicitly initializes memory as SHARED, which would result in an ABI collision for x86 CoCo VMs as their memory is currently always initialized PRIVATE. - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private memory, to enable testing such setups, i.e. to hopefully flush out any other lurking ABI issues before 6.18 is officially released. - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP, and host userspace accesses to mmap()'d private memory" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits) arm64: Revamp HCR_EL2.E2H RES1 detection KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available KVM: arm64: Compute per-vCPU FGTs at vcpu_load() KVM: arm64: selftests: Fix misleading comment about virtual timer encoding KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit KVM: arm64: Kill leftovers of ad-hoc timer userspace access KVM: arm64: Fix WFxT handling of nested virt KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure KVM: arm64: Move CNT*_CTL_EL0 userspace accessors to generic infrastructure KVM: arm64: Add timer UAPI workaround to sysreg infrastructure KVM: arm64: Make timer_set_offset() generally accessible KVM: arm64: Replace timer context vcpu pointer with timer_id KVM: arm64: Introduce timer_context_to_vcpu() helper KVM: arm64: Hide CNTHV_*_EL2 from userspace for nVHE guests Documentation: KVM: Update GICv3 docs for GICv5 hosts KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress KVM: arm64: selftests: Allocate vcpus with correct size ...
2025-10-18Merge tag 'kvm-x86-fixes-6.18-rc2' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 fixes for 6.18: - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault") - Don't try to get PMU capabbilities from perf when running a CPU with hybrid CPUs/PMUs, as perf will rightly WARN. - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more generic KVM_CAP_GUEST_MEMFD_FLAGS - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set said flag to initialize memory as SHARED, irrespective of MMAP. The behavior merged in 6.18 is that enabling mmap() implicitly initializes memory as SHARED, which would result in an ABI collision for x86 CoCo VMs as their memory is currently always initialized PRIVATE. - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private memory, to enable testing such setups, i.e. to hopefully flush out any other lurking ABI issues before 6.18 is officially released. - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP, and host userspace accesses to mmap()'d private memory.
2025-10-17rculist: Add hlist_nulls_replace_rcu() and hlist_nulls_replace_init_rcu()Xuanqiang Luo
Add two functions to atomically replace RCU-protected hlist_nulls entries. Keep using WRITE_ONCE() to assign values to ->next and ->pprev, as mentioned in the patch below: commit efd04f8a8b45 ("rcu: Use WRITE_ONCE() for assignments to ->next for rculist_nulls") commit 860c8802ace1 ("rcu: Use WRITE_ONCE() for assignments to ->pprev for hlist_nulls") Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Xuanqiang Luo <luoxuanqiang@kylinos.cn> Link: https://patch.msgid.link/20251015020236.431822-2-xuanqiang.luo@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-17ipv6: Move ipv6_fl_list from ipv6_pinfo to inet_sock.Kuniyuki Iwashima
In {tcp6,udp6,raw6}_sock, struct ipv6_pinfo is always placed at the beginning of a new cache line because 1. __alignof__(struct tcp_sock) is 64 due to ____cacheline_aligned of __cacheline_group_begin(tcp_sock_write_tx) 2. __alignof__(struct udp_sock) is 64 due to ____cacheline_aligned of struct numa_drop_counters 3. in raw6_sock, struct numa_drop_counters is placed before struct ipv6_pinfo . struct ipv6_pinfo is 136 bytes, but the last cache line is only used by ipv6_fl_list: $ pahole -C ipv6_pinfo vmlinux struct ipv6_pinfo { ... /* --- cacheline 2 boundary (128 bytes) --- */ struct ipv6_fl_socklist * ipv6_fl_list; /* 128 8 */ /* size: 136, cachelines: 3, members: 23 */ Let's move ipv6_fl_list from struct ipv6_pinfo to struct inet_sock to save a full cache line for {tcp6,udp6,raw6}_sock. Now, struct ipv6_pinfo is 128 bytes, and {tcp6,udp6,raw6}_sock have 64 bytes less, while {tcp,udp,raw}_sock retain the same size. Before: # grep -E "^(RAW|UDP[^L\-]|TCP)" /proc/slabinfo | awk '{print $1, "\t", $4}' RAWv6 1408 UDPv6 1472 TCPv6 2560 RAW 1152 UDP 1280 TCP 2368 After: # grep -E "^(RAW|UDP[^L\-]|TCP)" /proc/slabinfo | awk '{print $1, "\t", $4}' RAWv6 1344 UDPv6 1408 TCPv6 2496 RAW 1152 UDP 1280 TCP 2368 Also, ipv6_fl_list and inet_flags (SNDFLOW bit) are placed in the same cache line. $ pahole -C inet_sock vmlinux ... /* --- cacheline 11 boundary (704 bytes) was 56 bytes ago --- */ struct ipv6_pinfo * pinet6; /* 760 8 */ /* --- cacheline 12 boundary (768 bytes) --- */ struct ipv6_fl_socklist * ipv6_fl_list; /* 768 8 */ unsigned long inet_flags; /* 776 8 */ Doc churn is due to the insufficient Type column (only 1 space short). Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251014224210.2964778-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-17cgroup/misc: fix misc_res_type kernel-doc warningRandy Dunlap
Format the kernel-doc for SCALE_HW_CALIB_INVALID correctly to avoid a kernel-doc warning: Warning: include/linux/misc_cgroup.h:26 Enum value 'MISC_CG_RES_TDX' not described in enum 'misc_res_type' Fixes: 7c035bea9407 ("KVM: TDX: Register TDX host key IDs to cgroup misc controller") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-17Merge tag 'mmc-v6.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull mmc cleanup from Ulf Hansson: "Move rpmb_frame struct and constants to rpmb common header This helps us to avoid sharing an immutable branch between our git trees. I was planning to send it before rc1, but I didn't make it" * tag 'mmc-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: rpmb: move rpmb_frame struct and constants to common header
2025-10-17can: treewide: remove can_change_mtu()Vincent Mailhol
can_change_mtu() became obsolete by commit 23049938605b ("can: populate the minimum and maximum MTU values"). Now that net_device->min_mtu and net_device->max_mtu are populated, all the checks are already done by dev_validate_mtu() in net/core/dev.c. Remove the net_device_ops->ndo_change_mtu() callback of all the physical interfaces, then remove can_change_mtu(). Only keep the vcan_change_mtu() and vxcan_change_mtu() because the virtual interfaces use their own different MTU logic. The only functional change this patch introduces is that now the user will be able to change the MTU even if the interface is up. This does not matter for Classical CAN and CAN FD because their MTU range is composed of only one value, respectively CAN_MTU and CANFD_MTU. For the upcoming CAN XL, the MTU will be configurable within the CANXL_MIN_MTU to CANXL_MAX_MTU range at any time, even if the interface is up. This is consistent with the other net protocols and does not contradict ISO 11898-1:2024 as having a modifiable MTU is a kernel extension. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Link: https://patch.msgid.link/20251003-remove-can_change_mtu-v1-1-337f8bc21181@kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-10-16net/sched: act_mirred: add loop detectionEric Dumazet
Commit 0f022d32c3ec ("net/sched: Fix mirred deadlock on device recursion") added code in the fast path, even when act_mirred is not used. Prepare its revert by implementing loop detection in act_mirred. Adds an array of device pointers in struct netdev_xmit. tcf_mirred_is_act_redirect() can detect if the array already contains the target device. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20251014171907.3554413-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.18-rc2). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-16Merge tag 'net-6.18-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from CAN Current release - regressions: - udp: do not use skb_release_head_state() before skb_attempt_defer_free() - gro_cells: use nested-BH locking for gro_cell - dpll: zl3073x: increase maximum size of flash utility Previous releases - regressions: - core: fix lockdep splat on device unregister - tcp: fix tcp_tso_should_defer() vs large RTT - tls: - don't rely on tx_work during send() - wait for pending async decryptions if tls_strp_msg_hold fails - can: j1939: add missing calls in NETDEV_UNREGISTER notification handler - eth: lan78xx: fix lost EEPROM write timeout in lan78xx_write_raw_eeprom Previous releases - always broken: - ip6_tunnel: prevent perpetual tunnel growth - dpll: zl3073x: handle missing or corrupted flash configuration - can: m_can: fix pm_runtime and CAN state handling - eth: - ixgbe: fix too early devlink_free() in ixgbe_remove() - ixgbevf: fix mailbox API compatibility - gve: Check valid ts bit on RX descriptor before hw timestamping - idpf: cleanup remaining SKBs in PTP flows - r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H" * tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) udp: do not use skb_release_head_state() before skb_attempt_defer_free() net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset netdevsim: set the carrier when the device goes up selftests: tls: add test for short splice due to full skmsg selftests: net: tls: add tests for cmsg vs MSG_MORE tls: don't rely on tx_work during send() tls: wait for pending async decryptions if tls_strp_msg_hold fails tls: always set record_type in tls_process_cmsg tls: wait for async encrypt in case of error during latter iterations of sendmsg tls: trim encrypted message to match the plaintext on short splice tg3: prevent use of uninitialized remote_adv and local_adv variables MAINTAINERS: new entry for IPv6 IOAM gve: Check valid ts bit on RX descriptor before hw timestamping net: core: fix lockdep splat on device unregister MAINTAINERS: add myself as maintainer for b53 selftests: net: check jq command is supported net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit() tcp: fix tcp_tso_should_defer() vs large RTT r8152: add error handling in rtl8152_driver_init usbnet: Fix using smp_processor_id() in preemptible code warnings ...
2025-10-15hung_task: fix warnings caused by unaligned lock pointersLance Yang
The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding. However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger. To fix this, the runtime checks are adjusted to silently ignore any lock that is not 4-byte aligned, effectively disabling the feature in such cases and avoiding the related warnings. Thanks to Geert Uytterhoeven for bisecting! Link: https://lkml.kernel.org/r/20250909145243.17119-1-lance.yang@linux.dev Fixes: e711faaafbe5 ("hung_task: replace blocker_mutex with encoded blocker") Signed-off-by: Lance Yang <lance.yang@linux.dev> Reported-by: Eero Tamminen <oak@helsinkinet.fi> Closes: https://lore.kernel.org/lkml/CAMuHMdW7Ab13DdGs2acMQcix5ObJK0O2dG_Fxzr8_g58Rc1_0g@mail.gmail.com Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Finn Thain <fthain@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Lance Yang <lance.yang@linux.dev> Cc: Mingzhe Yang <mingzhe.yang@ly.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Yongliang Gao <leonylgao@tencent.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>