summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-11-25ptp: ocp: Refactor signal_show() and fix %ptT misuseAndy Shevchenko
Refactor signal_show() to avoid sequential calls to sysfs_emit*() and use the same pattern to get the index of a signal as it's done in signal_store(). While at it, fix wrong use of %ptT against struct timespec64. It's kinda lucky that it worked just because the first member there 64-bit and it's of time64_t type. Now with %ptS it may be used correctly. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25vsock/test: Extend transport change null-ptr-deref testMichal Luczaj
syzkaller reported a lockdep lock order inversion warning[1] due to commit 687aa0c5581b ("vsock: Fix transport_* TOCTOU"). This was fixed in commit f7c877e75352 ("vsock: fix lock inversion in vsock_assign_transport()"). Redo syzkaller's repro by piggybacking on a somewhat related test implemented in commit 3a764d93385c ("vsock/test: Add test for null ptr deref when transport changes"). [1]: https://lore.kernel.org/netdev/68f6cdb0.a70a0220.205af.0039.GAE@google.com/ Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251123-vsock_test-linger-lockdep-warn-v1-1-4b1edf9d8cdc@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25r8169: improve MAC EEE handlingHeiner Kallweit
Let phydev->enable_tx_lpi control whether MAC enables TX LPI, instead of enabling it unconditionally. This way TX LPI is disabled if e.g. link partner doesn't support EEE. This helps to avoid potential issues like link flaps. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/91bcb837-3fab-4b4e-b495-038df0932e44@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: phy: mxl-gpy: add support for MxL86252 and MxL86282Daniel Golle
Add PHY driver support for Maxlinear MxL86252 and MxL86282 switches. The PHYs built-into those switches are just like any other GPY 2.5G PHYs with the exception of the temperature sensor data being encoded in a different way. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/a6cd7fe461b011cec2b59dffaf34e9c8b0819059.1763818120.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: phy: mxl-gpy: add support for MxL86211CChad Monroe
MxL86211C is a smaller and more efficient version of the GPY211C. Add the PHY ID and phy_driver instance to the mxl-gpy driver. Signed-off-by: Chad Monroe <chad@monroe.io> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/cabf3559d6511bed6b8a925f540e3162efc20f6b.1763818120.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: mdio: remove redundant fwnode cleanupBuday Csaba
Remove redundant fwnode cleanup in of_mdiobus_register_device() and xpcs_plat_init_dev(). mdio_device_free() eventually calls mdio_device_release(), which already performs fwnode_handle_put(), making the manual cleanup unnecessary. Combine fwnode_handle_get() with device_set_node() in of_mdiobus_register_device() for clarity. Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/00847693daa8f7c8ff5dfa19dd35fc712fa4e2b5.1763995734.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: mdio: eliminate kdoc warnings in mdio_device.c and mdio_bus.cBuday Csaba
Fix all warnings reported by scripts/kernel-doc in mdio_device.c and mdio_bus.c Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/7ef7b80669da2b899d38afdb6c45e122229c3d8c.1763968667.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25Merge branch 'net-enetc-add-port-mdio-support-for-both-i-mx94-and-i-mx95'Jakub Kicinski
Wei Fang says: ==================== net: enetc: add port MDIO support for both i.MX94 and i.MX95 The NETC IP has one external master MDIO interface (eMDIO) for managing external PHYs, all ENETC ports share this eMDIO. The EMDIO function and the ENETC port MDIO are the virtual ports of this eMDIO, ENETC can use these virtual ports to access their PHYs. The difference is that EMDIO function is a 'global port', it can access all the PHYs on the eMDIO, so it provides a means for different software modules to share a single set of MDIO signals to access their PHYs. The ENETC port MDIO can only access its own external PHY. Furthermore, its PHY address must be set to its corresponding LaBCR register in IERB module, which is is a 64 KB size page containing registers that are used for pre-boot initialization for all NETC PCIe functions. And this IERB is owned by the host OS and it will be locked after the initialization, so it cannot be configured at running time any more. The port MDIO can only work properly when the PHY address accessed by it matches the value of its corresponding LaBCR[MDIO_PHYAD_PRTAD]. Otherwise, the MDIO access by the port MDIO will not take effect. Note that the same PHY is either controlled by port MDIO or by the EMDIO function. The netc-blk-ctrl driver will only set the PHY address in the LaBCR register corresponding to the ENETC when the ENETC node contains an mdio child node, and the ENETC driver will only create the port MDIO bus then. An example in DTS is as follows, the EMDIO function will not\ access this PHY. enetc_port0 { phy-handle = <&ethphy0>; phy-mode = "rgmii-id"; mdio { #address-cells = <1>; #size-cells = <0>; ethphy0: ethernet-phy@1 { reg = <1>; }; }; }; If users want to use EMDIO funtion to manage the PHY, they only need to place the PHY node in the emdio node. The same PHY must not be placed simultaneously within the ENETC node. An example in DTS to use EMDIO is as below. netc_emdio { ethphy0: ethernet-phy@1 { reg = <1>; }; ethphy2: ethernet-phy@8 { reg = <8>; }; }; In the host OS, when there are multiple ENETCs, they can all access their PHYs using their own port MDIO, or they can all access their PHYs using the EMDIO function, or they can partially use port MDIO and partially use the EMDIO function. Another typical use case of port MDIO is the Jailhouse usage. An ENETC is assigned to a guest OS. The EMDIO function will be unavailable in the guest OS because EMDIO is controlled by the host OS. Therefore, the ENETC can use its port MDIO to manage its external PHY in this situation. In this use case, the host OS's root dtb will disable the ENETC node, so the host OS's ENETC driver will not probe the ENETC and its PHY. In addition, this series also adds the internal MDIO bus support, each ENETC has an internal MDIO interface for managing on-die PHY (PCS) if it has PCS layer. ==================== Link: https://patch.msgid.link/20251119102557.1041881-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: enetc: update the base address of port MDIO registers for ENETC v4Wei Fang
Each ENETC has a set of external MDIO registers to access its external PHY based on its port EMDIO bus, these registers are used for MDIO bus access, such as setting the PHY address, PHY register address and value, read or write operations, C22 or C45 format, etc. The base address of this set of registers has been modified in ENETC v4 and is different from that in ENETC v1. So the base address needs to be updated so that ENETC v4 can use port MDIO to manage its own external PHY. Additionally, if ENETC has the PCS layer, it also has a set of internal MDIO registers for managing its on-die PHY (PCS/Serdes). The base address of this set of registers is also different from that of ENETC v1, so the base address also needs to be updated so that ENETC v4 can support the management of on-die PHY through the internal MDIO bus. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20251119102557.1041881-4-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: enetc: set external PHY address in IERB for i.MX94 ENETCWei Fang
NETC IP has only one external master MDIO interface (eMDIO) for managing the external PHYs. ENETC can use the interfaces provided by the EMDIO function or its port MDIO to access and manage its external PHY. Both the EMDIO function and the port MDIO are all virtual ports of the eMDIO. The difference is that the EMDIO function is a 'global port', it can access all the PHYs on the eMDIO, but port MDIO can only access its own PHY. To ensure that ENETC can only access its own PHY through port MDIO, LaBCR[MDIO_PHYAD_PRTAD] needs to be set, which represents the address of the external PHY connected to ENETC. If the accessed PHY address is not consistent with LaBCR[MDIO_PHYAD_PRTAD], then the MDIO access initiated by port MDIO will be invalid. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20251119102557.1041881-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: enetc: set the external PHY address in IERB for port MDIO usageWei Fang
The ENETC supports managing its own external PHY through its port MDIO functionality. To use this function, the PHY address needs be set in the corresponding LaBCR register in the Integrated Endpoint Register Block (IERB), which is used for pre-boot initialization of NETC PCIe functions. The port MDIO can only work properly when the PHY address accessed by the port MDIO matches the corresponding LaBCR[MDIO_PHYAD_PRTAD] value. Because the ENETC driver only registers the MDIO bus (port MDIO bus) when it detects an MDIO child node in its node, similarly, the netc-blk-ctrl driver only resolves the PHY address and sets it in the corresponding LaBCR when it detects an MDIO child node in the ENETC node. Co-developed-by: Aziz Sellami <aziz.sellami@nxp.com> Signed-off-by: Aziz Sellami <aziz.sellami@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20251119102557.1041881-2-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25Merge branch 'improvements-over-dsa-conduit-ethtool-ops'Jakub Kicinski
Vladimir Oltean says: ==================== Improvements over DSA conduit ethtool ops DSA interceps 'ethtool -S eth0', where eth0 is the host port of the switch (called 'conduit'). It does this because otherwise there is no way to report port counters for the CPU port, which is a MAC like any other of that switch, except Linux exposes no net_device for it, thus no ethtool hook. Having understood all downsides of this debugging interface, when we need it we needed, so the proposed changes here are to make it more useful by dumping more counters in it: not just the switch CPU port, but all other switch ports in the tree which lack a net_device. Not reinventing any wheel, just putting more output in an existing command. That is patch 3/3. The other 2 are cleanup. ==================== Link: https://patch.msgid.link/20251122112311.138784-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: dsa: append ethtool counters of all hidden ports to conduitVladimir Oltean
Currently there is no way to see packet counters on cascade ports, and no clarity on how the API for that would look like. Because it's something that is currently needed, just extend the hack where ethtool -S on the conduit interface dumps CPU port counters, and also use it to dump counters of cascade ports. Note that the "pXX_" naming convention changes to "sXX_pYY", to distinguish between ports having the same index but belonging to different switches. This has a slight chance of causing regressions to existing tooling: - grepping for "p04_counter_name" still works, but might return more than one string now - grepping for " p04_counter_name" no longer works Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: dsa: use kernel data types for ethtool ops on conduitVladimir Oltean
Suppress some checkpatch 'CHECK' messages about u8 being preferable over uint8_t, etc. No functional change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: dsa: cpu_dp->orig_ethtool_ops might be NULLVladimir Oltean
In theory this would have been seen by now, but it seems that all drivers used as DSA conduit interfaces thus far have had ethtool_ops set, and it's hard to even find modern Ethernet drivers (and not VF ones) which don't use ethtool. Here is the unfiltered list of drivers which register any sort of net_device but don't set its ethtool_ops pointer. I don't think any of them 'risks' being used as a DSA conduit, maybe except for moxart, rnpbge and icssm, I'm not sure. - drivers/net/can/dev/dev.c - drivers/net/wwan/qcom_bam_dmux.c - drivers/net/wwan/t7xx/t7xx_netdev.c - drivers/net/arcnet/arcnet.c - drivers/net/hamradio/ - drivers/net/slip/slip.c - drivers/net/ethernet/ezchip/nps_enet.c - drivers/net/ethernet/moxa/moxart_ether.c - drivers/net/ethernet/wangxun/txgbevf/txgbevf_main.c - drivers/net/ethernet/wangxun/ngbevf/ngbevf_main.c - drivers/net/ethernet/huawei/hinic3/hinic3_main.c - drivers/net/ethernet/i825xx/ - drivers/net/ethernet/ti/icssm/icssm_prueth.c - drivers/net/ethernet/seeq/ - drivers/net/ethernet/litex/litex_liteeth.c - drivers/net/ethernet/sunplus/spl2sw_driver.c - drivers/net/ethernet/mucse/rnpgbe/rnpgbe_main.c - drivers/net/ipa/ - drivers/net/wireless/microchip/wilc1000/ - drivers/net/wireless/mediatek/mt76/dma.c - drivers/net/wireless/ath/ath12k/ - drivers/net/wireless/ath/ath11k/ - drivers/net/wireless/ath/ath6kl/ - drivers/net/wireless/ath/ath10k/ - drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c - drivers/net/wireless/virtual/mac80211_hwsim.c - drivers/net/wireless/quantenna/qtnfmac/pcie/pcie.c - drivers/net/wireless/realtek/rtw89/core.c - drivers/net/wireless/realtek/rtw88/pci.c - drivers/net/caif/ - drivers/net/plip/ - drivers/net/wan/ - drivers/net/mctp/ - drivers/net/ppp/ - drivers/net/thunderbolt/ Nonetheless, it's good for the framework not to make such assumptions, and not panic when coming across such kind of host device in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25cxgb4: Rename sched_class to avoid type clashAlan Maguire
drivers/net/ethernet/chelsio/cxgb4/sched.h declares a sched_class struct which has a type name clash with struct sched_class in kernel/sched/sched.h (a type used in a field in task_struct). When cxgb4 is a builtin we end up with both sched_class types, and as a result of this we wind up with DWARF (and derived from that BTF) with a duplicate incorrect task_struct representation. When cxgb4 is built-in this type clash can cause kernel builds to fail as resolve_btfids will fail when confused which task_struct to use. See [1] for more details. As such, renaming sched_class to ch_sched_class (in line with other structs like ch_sched_flowc) makes sense. [1] https://lore.kernel.org/bpf/2412725b-916c-47bd-91c3-c2d57e3e6c7b@acm.org/ Reported-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://patch.msgid.link/20251121181231.64337-1-alan.maguire@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25r8169: add support for RTL9151AJaven Xu
This adds support for chip RTL9151A. Its XID is 0x68b. It is bascially basd on the one with XID 0x688, but with different firmware file. Signed-off-by: Javen Xu <javen_xu@realsil.com.cn> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20251121090104.3753-1-javen_xu@realsil.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25Merge branch 'net_sched-speedup-qdisc-dequeue'Paolo Abeni
Eric Dumazet says: ==================== net_sched: speedup qdisc dequeue Avoid up to two cache line misses in qdisc dequeue() to fetch skb_shinfo(skb)->gso_segs/gso_size while qdisc spinlock is held. Idea is to cache gso_segs at enqueue time before spinlock is acquired, in the first skb cache line, where we already have qdisc_skb_cb(skb)->pkt_len. This series gives a 8 % improvement in a TX intensive workload. (120 Mpps -> 130 Mpps on a Turin host, IDPF with 32 TX queues) v2: https://lore.kernel.org/netdev/20251111093204.1432437-1-edumazet@google.com/ v1: https://lore.kernel.org/netdev/20251110094505.3335073-1-edumazet@google.com/T/#m8f562ed148f807c02fd02c6cd243604d449615b9 ==================== Link: https://patch.msgid.link/20251121083256.674562-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: use qdisc_dequeue_drop() in cake, codel, fq_codelEric Dumazet
cake, codel and fq_codel can drop many packets from dequeue(). Use qdisc_dequeue_drop() so that the freeing can happen outside of the qdisc spinlock scope. Add TCQ_F_DEQUEUE_DROPS to sch->flags. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-15-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: add qdisc_dequeue_drop() helperEric Dumazet
Some qdisc like cake, codel, fq_codel might drop packets in their dequeue() method. This is currently problematic because dequeue() runs with the qdisc spinlock held. Freeing skbs can be extremely expensive. Add qdisc_dequeue_drop() method and a new TCQ_F_DEQUEUE_DROPS so that these qdiscs can opt-in to defer the skb frees after the socket spinlock is released. TCQ_F_DEQUEUE_DROPS is an attempt to not penalize other qdiscs with an extra cache line miss. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-14-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: add tcf_kfree_skb_list() helperEric Dumazet
Using kfree_skb_list_reason() to free list of skbs from qdisc operations seems wrong as each skb might have a different drop reason. Cleanup __dev_xmit_skb() to call tcf_kfree_skb_list() once in preparation of the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-13-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: annotate a data-race in __dev_xmit_skb()Eric Dumazet
q->limit is read locklessly, add a READ_ONCE(). Fixes: 100dfa74cad9 ("net: dev_queue_xmit() llist adoption") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-12-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: prefech skb->priority in __dev_xmit_skb()Eric Dumazet
Most qdiscs need to read skb->priority at enqueue time(). In commit 100dfa74cad9 ("net: dev_queue_xmit() llist adoption") I added a prefetch(next), lets add another one for the second half of skb. Note that skb->priority and skb->hash share a common cache line, so this patch helps qdiscs needing both fields. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-11-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: sch_fq: prefetch one skb ahead in dequeue()Eric Dumazet
prefetch the skb that we are likely to dequeue at the next dequeue(). Also call fq_dequeue_skb() a bit sooner in fq_dequeue(). This reduces the window between read of q.qlen and changes of fields in the cache line that could be dirtied by another cpu trying to queue a packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-10-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: sch_fq: move qdisc_bstats_update() to fq_dequeue_skb()Eric Dumazet
Group together changes to qdisc fields to reduce chances of false sharing if another cpu attempts to acquire the qdisc spinlock. qdisc_qstats_backlog_dec(sch, skb); sch->q.qlen--; qdisc_bstats_update(sch, skb); Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-9-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: add Qdisc_read_mostly and Qdisc_write groupsEric Dumazet
It is possible to reorg Qdisc to avoid always dirtying 2 cache lines in fast path by reducing this to a single dirtied cache line. In current layout, we change only four/six fields in the first cache line: - q.spinlock - q.qlen - bstats.bytes - bstats.packets - some Qdisc also change q.next/q.prev In the second cache line we change in the fast path: - running - state - qstats.backlog /* --- cacheline 2 boundary (128 bytes) --- */ struct sk_buff_head gso_skb __attribute__((__aligned__(64))); /* 0x80 0x18 */ struct qdisc_skb_head q; /* 0x98 0x18 */ struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); /* 0xb0 0x10 */ /* --- cacheline 3 boundary (192 bytes) --- */ struct gnet_stats_queue qstats; /* 0xc0 0x14 */ bool running; /* 0xd4 0x1 */ /* XXX 3 bytes hole, try to pack */ unsigned long state; /* 0xd8 0x8 */ struct Qdisc * next_sched; /* 0xe0 0x8 */ struct sk_buff_head skb_bad_txq; /* 0xe8 0x18 */ /* --- cacheline 4 boundary (256 bytes) --- */ Reorganize things to have a first cache line mostly read, then a mostly written one. This gives a ~3% increase of performance under tx stress. Note that there is an additional hole because @qstats now spans over a third cache line. /* --- cacheline 2 boundary (128 bytes) --- */ __u8 __cacheline_group_begin__Qdisc_read_mostly[0] __attribute__((__aligned__(64))); /* 0x80 0 */ struct sk_buff_head gso_skb; /* 0x80 0x18 */ struct Qdisc * next_sched; /* 0x98 0x8 */ struct sk_buff_head skb_bad_txq; /* 0xa0 0x18 */ __u8 __cacheline_group_end__Qdisc_read_mostly[0]; /* 0xb8 0 */ /* XXX 8 bytes hole, try to pack */ /* --- cacheline 3 boundary (192 bytes) --- */ __u8 __cacheline_group_begin__Qdisc_write[0] __attribute__((__aligned__(64))); /* 0xc0 0 */ struct qdisc_skb_head q; /* 0xc0 0x18 */ unsigned long state; /* 0xd8 0x8 */ struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); /* 0xe0 0x10 */ bool running; /* 0xf0 0x1 */ /* XXX 3 bytes hole, try to pack */ struct gnet_stats_queue qstats; /* 0xf4 0x14 */ /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */ __u8 __cacheline_group_end__Qdisc_write[0]; /* 0x108 0 */ /* XXX 56 bytes hole, try to pack */ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-8-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: cake: use qdisc_pkt_segs()Eric Dumazet
Use new qdisc_pkt_segs() to avoid a cache line miss in cake_enqueue() for non GSO packets. cake_overhead() does not have to recompute it. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-7-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: use qdisc_skb_cb(skb)->pkt_segs in bstats_update()Eric Dumazet
Avoid up to two cache line misses in qdisc dequeue() to fetch skb_shinfo(skb)->gso_segs/gso_size while qdisc spinlock is held. This gives a 5 % improvement in a TX intensive workload. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-6-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: use qdisc_pkt_len_segs_init() in sch_handle_ingress()Eric Dumazet
sch_handle_ingress() sets qdisc_skb_cb(skb)->pkt_len. We also need to initialize qdisc_skb_cb(skb)->pkt_segs. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-5-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: initialize qdisc_skb_cb(skb)->pkt_segs in qdisc_pkt_len_init()Eric Dumazet
qdisc_pkt_len_init() is currently initalizing qdisc_skb_cb(skb)->pkt_len. Add qdisc_skb_cb(skb)->pkt_segs initialization and rename this function to qdisc_pkt_len_segs_init(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-4-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: init shinfo->gso_segs from qdisc_pkt_len_init()Eric Dumazet
Qdisc use shinfo->gso_segs for their pkts stats in bstats_update(), but this field needs to be initialized for SKB_GSO_DODGY users. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-3-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: make room for (struct qdisc_skb_cb)->pkt_segsEric Dumazet
Add a new u16 field, next to pkt_len : pkt_segs This will cache shinfo->gso_segs to speed up qdisc deqeue(). Move slave_dev_queue_mapping at the end of qdisc_skb_cb, and move three bits from tc_skb_cb : - post_ct - post_ct_snat - post_ct_dnat Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-2-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25dt-bindings: net: aspeed: add AST2700 MDIO compatibleJacky Chou
Add "aspeed,ast2700-mdio" compatible to the binding schema with a fallback to "aspeed,ast2600-mdio". Although the MDIO controller on AST2700 is functionally the same as the one on AST2600, it's good practice to add a SoC-specific compatible for new silicon. This allows future driver updates to handle any 2700-specific integration issues without requiring devicetree changes or complex runtime detection logic. For now, the driver continues to bind via the existing "aspeed,ast2600-mdio" compatible, so no driver changes are needed. Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20251120-aspeed_mdio_ast2700-v2-1-0d722bfb2c54@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-24Merge branch 'mptcp-memcg-accounting-for-passive-sockets-backlog-processing'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: memcg accounting for passive sockets & backlog processing This series is split in two: the 4 first patches are linked to memcg accounting for passive sockets, and the rest introduce the backlog processing. They are sent together, because the first one appeared to be needed to get the second one fully working. The second part includes RX path improvement built around backlog processing. The main goals are improving the RX performances _and_ increase the long term maintainability. - Patches 1-3: preparation work to ease the introduction of the next patch. - Patch 4: fix memcg accounting for passive sockets. Note that this is a (non-urgent) fix, but it depends on material that is currently only in net-next, e.g. commit 4a997d49d92a ("tcp: Save lock_sock() for memcg in inet_csk_accept()."). - Patches 5-6: preparation of the stack for backlog processing, removing assumptions that will not hold true any more after the backlog introduction. - Patches 7,8,10,11,12 are more cleanups that will make the backlog patch a little less huge. - Patch 9: somewhat an unrelated cleanup, included here not to forget about it. - Patches 13-14: The real work is done by them. Patch 13 introduces the helpers needed to manipulate the msk-level backlog, and the data struct itself, without any actual functional change. Patch 14 finally uses the backlog for RX skb processing. Note that MPTCP can't use the sk_backlog, as the MPTCP release callback can also release and re-acquire the msk-level spinlock and core backlog processing works under the assumption that such event is not possible. A relevant point is memory accounts for skbs in the backlog. It's somewhat "original" due to MPTCP constraints. Such skbs use space from the incoming subflow receive buffer, do not use explicitly any forward allocated memory, as we can't update the msk fwd mem while enqueuing, nor we want to acquire again the ssk socket lock while processing the skbs. Instead the msk borrows memory from the subflow and reserve it for the backlog, see patch 5 and 14 for the gory details. ==================== Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: leverage the backlog for RX packet processingPaolo Abeni
When the msk socket is owned or the msk receive buffer is full, move the incoming skbs in a msk level backlog list. This avoid traversing the joined subflows and acquiring the subflow level socket lock at reception time, improving the RX performances. When processing the backlog, use the fwd alloc memory borrowed from the incoming subflow. skbs exceeding the msk receive space are not dropped; instead they are kept into the backlog until the receive buffer is freed. Dropping packets already acked at the TCP level is explicitly discouraged by the RFC and would corrupt the data stream for fallback sockets. Special care is needed to avoid adding skbs to the backlog of a closed msk and to avoid leaving dangling references into the backlog at subflow closing time. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-14-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: introduce mptcp-level backlogPaolo Abeni
We are soon using it for incoming data processing. MPTCP can't leverage the sk_backlog, as the latter is processed before the release callback, and such callback for MPTCP releases and re-acquire the socket spinlock, breaking the sk_backlog processing assumption. Add a skb backlog list inside the mptcp sock struct, and implement basic helper to transfer packet to and purge such list. Packets in the backlog are memory accounted and still use the incoming subflow receive memory, to allow back-pressure. The backlog size is implicitly bounded to the sum of subflows rcvbuf. When a subflow is closed, references from the backlog to such sock are removed. No packet is currently added to the backlog, so no functional changes intended here. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-13-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: borrow forward memory from subflowPaolo Abeni
In the MPTCP receive path, we release the subflow allocated fwd memory just to allocate it again shortly after for the msk. That could increases the failures chances, especially when we will add backlog processing, with other actions could consume the just released memory before the msk socket has a chance to do the rcv allocation. Replace the skb_orphan() call with an open-coded variant that explicitly borrows, the fwd memory from the subflow socket instead of releasing it. The borrowed memory does not have PAGE_SIZE granularity; rounding to the page size will make the fwd allocated memory higher than what is strictly required and could make the incoming subflow fwd mem consistently negative. Instead, keep track of the accumulated frag and borrow the full page at subflow close time. This allow removing the last drop in the TCP to MPTCP transition and the associated, now unused, MIB. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-12-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: handle first subflow closing consistentlyPaolo Abeni
Currently, as soon as the PM closes a subflow, the msk stops accepting data from it, even if the TCP socket could be still formally open in the incoming direction, with the notable exception of the first subflow. The root cause of such behavior is that code currently piggy back two separate semantic on the subflow->disposable bit: the subflow context must be released and that the subflow must stop accepting incoming data. The first subflow is never disposed, so it also never stop accepting incoming data. Use a separate bit to mark the latter status and set such bit in __mptcp_close_ssk() for all subflows. Beyond making per subflow behaviour more consistent this will also simplify the next patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-11-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: drop the __mptcp_data_ready() helperPaolo Abeni
It adds little clarity and there is a single user of such helper, just inline it in the caller. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-10-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: make mptcp_destroy_common() staticPaolo Abeni
Such function is only used inside protocol.c, there is no need to expose it to the whole stack. Note that the function definition most be moved earlier to avoid forward declaration. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-9-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: do not miss early first subflow close event notificationPaolo Abeni
The MPTCP protocol is not currently emitting the NL event when the first subflow is closed before msk accept() time. By replacing the in use close helper is such scenario, implicitly introduce the missing notification. Note that in such scenario we want to be sure that mptcp_close_ssk() will not trigger any PM work, move the msk state change update earlier, so that the previous patch will offer such guarantee. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-8-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: ensure the kernel PM does not take action too latePaolo Abeni
The PM hooks can currently take place when the msk is already shutting down. Subflow creation will fail, thanks to the existing check at join time, but we can entirely avoid starting the to be failed operations. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-7-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: cleanup fallback dummy mapping generationPaolo Abeni
MPTCP currently access ack_seq outside the msk socket log scope to generate the dummy mapping for fallback socket. Soon we are going to introduce backlog usage and even for fallback socket the ack_seq value will be significantly off outside of the msk socket lock scope. Avoid relying on ack_seq for dummy mapping generation, using instead the subflow sequence number. Note that in case of disconnect() and (re)connect() we must ensure that any previous state is re-set. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-6-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: cleanup fallback data fin receptionPaolo Abeni
MPTCP currently generate a dummy data_fin for fallback socket when the fallback subflow has completed data reception using the current ack_seq. We are going to introduce backlog usage for the msk soon, even for fallback sockets: the ack_seq value will not match the most recent sequence number seen by the fallback subflow socket, as it will ignore data_seq sitting in the backlog. Instead use the last map sequence number to set the data_fin, as fallback (dummy) map sequences are always in sequence. Reviewed-by: Geliang Tang <geliang@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-5-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: fix memcg accounting for passive socketsPaolo Abeni
The passive sockets never got proper memcg accounting: the msk socket is associated with the memcg at accept time, but the passive subflows never got it right. At accept time, traverse the subflows list and associate each of them with the msk memcg, and try to do the same at join completion time, if the msk has been already accepted. Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/298 Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/597 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-4-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: grafting MPJ subflow earlierPaolo Abeni
Later patches need to ensure that all MPJ subflows are grafted to the msk socket before accept() completion. Currently the grafting happens under the msk socket lock: potentially at msk release_cb time which make satisfying the above condition a bit tricky. Move the MPJ subflow grafting earlier, under the msk data lock, so that we can use such lock as a synchronization point. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-3-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24mptcp: factor-out cgroup data inherit helperPaolo Abeni
MPTCP will soon need the same functionality for passive sockets, factor them out in a common helper. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-2-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24net: factor-out _sk_charge() helperPaolo Abeni
Move out of __inet_accept() the code dealing charging newly accepted socket to memcg. MPTCP will soon use it to on a per subflow basis, in different contexts. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Geliang Tang <geliang@kernel.org> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251121-net-next-mptcp-memcg-backlog-imp-v1-1-1f34b6c1e0b1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24ipvlan: fix sparse warning about __be32 -> u32Dmitry Skorodumov
Fixed a sparse warning: ipvlan_core.c:56: warning: incorrect type in argument 1 (different base types) expected unsigned int [usertype] a got restricted __be32 const [usertype] s_addr Force cast the s_addr to u32 Signed-off-by: Dmitry Skorodumov <skorodumov.dmitry@huawei.com> Link: https://patch.msgid.link/20251121155112.4182007-1-skorodumov.dmitry@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24net: mvpp2: extract GRXRINGS from .get_rxnfcBreno Leitao
Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to optimize RX ring queries") added specific support for GRXRINGS callback, simplifying .get_rxnfc. Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new .get_rx_ring_count() for the mvpp2 driver. This simplifies the RX ring count retrieval and aligns mvpp2 with the new ethtool API for querying RX ring parameters, while keeping the other rxnfc handlers (GRXCLSRLCNT, GRXCLSRULE, GRXCLSRLALL) intact. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20251121-marvell-v1-2-8338f3e55a4c@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>