From 2cf424f5ac01682c93e3decfddee6282b7552f50 Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert"
Date: Mon, 3 Feb 2025 18:52:29 +0000
Subject: mlx4: Remove unused functions

The last use of mlx4_find_cached_mac() was removed in 2014 by
commit 2f5bb473681b ("mlx4: Add ref counting to port MAC table for RoCE")

mlx4_zone_free_entries() was added in 2014 by
commit 7a89399ffad7 ("net/mlx4: Add mlx4_bitmap zone allocator")
but hasn't been used. (The _unique version is used)

Remove them.

Signed-off-by: Dr. David Alan Gilbert
Reviewed-by: Simon Horman
Reviewed-by: Tariq Toukan
Reviewed-by: Kalesh AP
Link: https://patch.msgid.link/20250203185229.204279-1-linux@treblig.org
Signed-off-by: Jakub Kicinski
---
 include/linux/mlx4/device.h | 1 -
 1 file changed, 1 deletion(-)

(limited to 'include/linux')

diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 27f42f713c89..87edb7a8173b 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1415,7 +1415,6 @@ int mlx4_get_is_vlan_offload_disabled(struct mlx4_dev *dev, u8 port,
 				      bool *vlan_offload_disabled);
 void mlx4_handle_eth_header_mcast_prio(struct mlx4_net_trans_rule_hw_ctrl *ctrl,
 				       struct _rule_hw *eth_header);
-int mlx4_find_cached_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *idx);
 int mlx4_find_cached_vlan(struct mlx4_dev *dev, u8 port, u16 vid, int *idx);
 int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index);
 void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, u16 vlan);
-- 
cgit v1.2.3

From 3e0d3cb3fbe06a7bc09d98324a21a446c80f9d3b Mon Sep 17 00:00:00 2001
From: Michal Swiatkowski
Date: Tue, 3 Dec 2024 07:58:13 +0100
Subject: ice, irdma: move interrupts code to irdma

Move responsibility of MSI-X requesting for RDMA feature from ice driver
to irdma driver. It is done to allow simple fallback when there is not
enough MSI-X available.

Change amount of MSI-X used for control from 4 to 1, as it isn't needed
to have more than one MSI-X for this purpose.
Reviewed-by: Jacob Keller
Signed-off-by: Michal Swiatkowski
Signed-off-by: Tony Nguyen
---
 include/linux/net/intel/iidc.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/net/intel/iidc.h b/include/linux/net/intel/iidc.h
index 1c1332e4df26..13274c3def66 100644
--- a/include/linux/net/intel/iidc.h
+++ b/include/linux/net/intel/iidc.h
@@ -78,6 +78,8 @@ int ice_del_rdma_qset(struct ice_pf *pf, struct iidc_rdma_qset_params *qset);
 int ice_rdma_request_reset(struct ice_pf *pf, enum iidc_reset_type reset_type);
 int ice_rdma_update_vsi_filter(struct ice_pf *pf, u16 vsi_id, bool enable);
 void ice_get_qos_params(struct ice_pf *pf, struct iidc_qos_params *qos);
+int ice_alloc_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry);
+void ice_free_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry);
 
 /* Structure representing auxiliary driver tailored information about the core
  * PCI dev, each auxiliary driver using the IIDC interface will have an
-- 
cgit v1.2.3

From 79c61899b5eee317907efd1b0d06a1ada0cc00d8 Mon Sep 17 00:00:00 2001
From: Antoine Tenart
Date: Tue, 4 Feb 2025 18:03:10 +0100
Subject: net-sysfs: remove rtnl_trylock from device attributes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

There is an ABBA deadlock between net device unregistration and sysfs
files being accessed[1][2]. To prevent this from happening all paths
taking the rtnl lock after the sysfs one (actually kn->active refcount)
use rtnl_trylock and return early (using restart_syscall)[3], which can
make syscalls spin for a long time when there is contention on the rtnl
lock[4].

There are not many possibilities to improve the above:
- Rework the entire net/ locking logic.
- Invert two locks in one of the paths: not possible.

But here it's actually possible to drop one of the locks safely: the
kernfs_node refcount. More details in the code itself, which comes with
lots of comments.
Note that we check the device is alive in the added sysfs_rtnl_lock
helper to disallow sysfs operations to run after device dismantle has
started. This also helps keep the same behavior as before. Because of
this, calls to dev_isalive in sysfs ops were removed.

[1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/
[2] https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/
[3] https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/
[4] https://lore.kernel.org/all/20210928125500.167943-1-atenart@kernel.org/T/

Signed-off-by: Antoine Tenart
Link: https://patch.msgid.link/20250204170314.146022-2-atenart@kernel.org
Signed-off-by: Jakub Kicinski
---
 include/linux/rtnetlink.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include/linux')

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 4bc2ee0b10b0..ccaaf4c7d5f6 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -43,6 +43,7 @@ extern void rtnl_lock(void);
 extern void rtnl_unlock(void);
 extern int rtnl_trylock(void);
 extern int rtnl_is_locked(void);
+extern int rtnl_lock_interruptible(void);
 extern int rtnl_lock_killable(void);
 extern bool refcount_dec_and_rtnl_lock(refcount_t *r);
 
-- 
cgit v1.2.3

From b7ecc1de51ca7d0a9fa8dbc3f756ab87b99a1838 Mon Sep 17 00:00:00 2001
From: Antoine Tenart
Date: Tue, 4 Feb 2025 18:03:11 +0100
Subject: net-sysfs: move queue attribute groups outside the default groups

Rx/tx queues embed their own kobject for registering their per-queue
sysfs files. The issue is they're using the kobject default groups for
this and entirely rely on the kobject refcounting for releasing their
sysfs paths.

In order to remove rtnl_trylock calls we need sysfs files not to rely on
their associated kobject refcounting for their release. Thus we here
move queues sysfs files from the kobject default groups to their own
groups which can be removed separately.
Signed-off-by: Antoine Tenart
Link: https://patch.msgid.link/20250204170314.146022-3-atenart@kernel.org
Signed-off-by: Jakub Kicinski
---
 include/linux/netdevice.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include/linux')

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2a59034a5fa2..1dcc76af7520 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -658,6 +658,7 @@ struct netdev_queue {
 	struct Qdisc __rcu *qdisc_sleeping;
 #ifdef CONFIG_SYSFS
 	struct kobject kobj;
+	const struct attribute_group **groups;
 #endif
 	unsigned long tx_maxrate;
 	/*
-- 
cgit v1.2.3

From f9beaf4fac64c84631ba9a2eb864cea6b52032a2 Mon Sep 17 00:00:00 2001
From: Jianbo Liu
Date: Mon, 3 Feb 2025 23:35:06 +0200
Subject: net/mlx5: Change clock in mlx5_core_dev to mlx5_clock pointer

Change clock member in mlx5_core_dev to a pointer, so it can point to
a clock shared by multiple functions in a later patch.

For now, each function has its own clock, so mdev in mlx5_clock_priv
is the back pointer to the function. Later it points to one (normally
the first one) of the multiple functions sharing the same clock.

Change mlx5_init_clock() to return error if mlx5_clock is not
allocated. Besides, a null clock is defined and used when hardware
clock is not supported. So, the clock pointer is always pointing to
something valid.
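
As an illustration of the "pointer plus null clock" arrangement described
above, here is a minimal user-space sketch. It is not the mlx5 code:
struct demo_clock, struct demo_dev, demo_null_clock, demo_init_clock() and
demo_cleanup_clock() are hypothetical names invented for this example, and
the real struct mlx5_clock is only forward-declared in this header.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real mlx5 definitions. */
struct demo_clock {
	int has_hw;			/* 1 when backed by a hardware clock */
	unsigned long long cycles;
};

struct demo_dev {
	struct demo_clock *clock;	/* valid after a successful demo_init_clock() */
};

/* Shared placeholder used when the hardware clock is not supported. */
static struct demo_clock demo_null_clock = { 0, 0 };

/* Returns 0 on success, -1 if a real clock is needed but cannot be allocated. */
int demo_init_clock(struct demo_dev *dev, int hw_supported)
{
	if (!hw_supported) {
		dev->clock = &demo_null_clock;
		return 0;
	}

	dev->clock = calloc(1, sizeof(*dev->clock));
	if (!dev->clock)
		return -1;

	dev->clock->has_hw = 1;
	return 0;
}

/* Frees only clocks that were really allocated; the null clock is static. */
void demo_cleanup_clock(struct demo_dev *dev)
{
	if (dev->clock != &demo_null_clock)
		free(dev->clock);
	dev->clock = NULL;
}
```

With this shape, code that runs between init and cleanup can dereference
dev->clock without a NULL check, which appears to be the point of the null
clock in the patch.
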
Signed-off-by: Jianbo Liu
Reviewed-by: Carolina Jubran
Reviewed-by: Dragos Tatulea
Signed-off-by: Tariq Toukan
Signed-off-by: Paolo Abeni
---
 include/linux/mlx5/driver.h | 31 ++-----------------------------
 1 file changed, 2 insertions(+), 29 deletions(-)

(limited to 'include/linux')

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index af86097641b0..5dab3d8d05e4 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -54,7 +54,6 @@
 #include
 #include
 #include
-#include
 #include
 
 #define MLX5_ADEV_NAME "mlx5_core"
@@ -679,33 +678,7 @@ struct mlx5_rsvd_gids {
 	struct ida ida;
 };
 
-#define MAX_PIN_NUM 8
-struct mlx5_pps {
-	u8 pin_caps[MAX_PIN_NUM];
-	struct work_struct out_work;
-	u64 start[MAX_PIN_NUM];
-	u8 enabled;
-	u64 min_npps_period;
-	u64 min_out_pulse_duration_ns;
-};
-
-struct mlx5_timer {
-	struct cyclecounter cycles;
-	struct timecounter tc;
-	u32 nominal_c_mult;
-	unsigned long overflow_period;
-};
-
-struct mlx5_clock {
-	struct mlx5_nb pps_nb;
-	seqlock_t lock;
-	struct hwtstamp_config hwtstamp_config;
-	struct ptp_clock *ptp;
-	struct ptp_clock_info ptp_info;
-	struct mlx5_pps pps_info;
-	struct mlx5_timer timer;
-};
-
+struct mlx5_clock;
 struct mlx5_dm;
 struct mlx5_fw_tracer;
 struct mlx5_vxlan;
@@ -789,7 +762,7 @@ struct mlx5_core_dev {
 #ifdef CONFIG_MLX5_FPGA
 	struct mlx5_fpga_device *fpga;
 #endif
-	struct mlx5_clock clock;
+	struct mlx5_clock *clock;
 	struct mlx5_ib_clock_info *clock_info;
 	struct mlx5_fw_tracer *tracer;
 	struct mlx5_rsc_dump *rsc_dump;
-- 
cgit v1.2.3

From 574998cf3b3f59afa9e3a6bbb609d9d4eb2023b4 Mon Sep 17 00:00:00 2001
From: Jianbo Liu
Date: Mon, 3 Feb 2025 23:35:07 +0200
Subject: net/mlx5: Add devcom component for the clock shared by functions

Add new devcom component for hardware clock. When it is running in
real time mode, the functions are grouped by the identity they query.
According to the firmware document, the clock identity size is 64 bits,
so it's safe to memcpy to component key, as the key size is also 64
bits.

Signed-off-by: Jianbo Liu
Reviewed-by: Carolina Jubran
Reviewed-by: Dragos Tatulea
Signed-off-by: Tariq Toukan
Signed-off-by: Paolo Abeni
---
 include/linux/mlx5/driver.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 5dab3d8d05e4..46bd7550adf8 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -679,6 +679,7 @@ struct mlx5_rsvd_gids {
 };
 
 struct mlx5_clock;
+struct mlx5_clock_dev_state;
 struct mlx5_dm;
 struct mlx5_fw_tracer;
 struct mlx5_vxlan;
@@ -763,6 +764,7 @@ struct mlx5_core_dev {
 	struct mlx5_fpga_device *fpga;
 #endif
 	struct mlx5_clock *clock;
+	struct mlx5_clock_dev_state *clock_state;
 	struct mlx5_ib_clock_info *clock_info;
 	struct mlx5_fw_tracer *tracer;
 	struct mlx5_rsc_dump *rsc_dump;
-- 
cgit v1.2.3

From ee0a4fc396f1b6fd1b34e99754896961fb67e4e3 Mon Sep 17 00:00:00 2001
From: Jianbo Liu
Date: Mon, 3 Feb 2025 23:35:12 +0200
Subject: net/mlx5: Add support for 200Gbps per lane link modes

This patch exposes new link modes using 200Gbps per lane, including
200G, 400G and 800G modes.
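
To make the "200Gbps per lane" arithmetic concrete, the sketch below maps
the three enum values added by this patch to a total speed and a lane count
and divides one by the other. The enum values (14, 17, 20) come from the
diff; the DEMO_ names, struct demo_mode_info and demo_per_lane_gbps() are
hypothetical, and the speed/lane pairs are simply read off the mode names
(for example 400GAUI_2 is taken to mean 400G over 2 lanes).

```c
#include <stddef.h>

/* Values copied from the enum entries added by the patch; names are local. */
enum demo_ext_link_mode {
	DEMO_200GAUI_1 = 14,
	DEMO_400GAUI_2 = 17,
	DEMO_800GAUI_4 = 20,
};

struct demo_mode_info {
	int mode;
	unsigned int speed_gbps;	/* total link speed */
	unsigned int lanes;		/* number of lanes */
};

/* Speed and lane count as implied by the mode names. */
static const struct demo_mode_info demo_modes[] = {
	{ DEMO_200GAUI_1, 200, 1 },
	{ DEMO_400GAUI_2, 400, 2 },
	{ DEMO_800GAUI_4, 800, 4 },
};

/* Returns the per-lane rate in Gbps, or 0 for a mode not in the table. */
unsigned int demo_per_lane_gbps(int mode)
{
	size_t i;

	for (i = 0; i < sizeof(demo_modes) / sizeof(demo_modes[0]); i++)
		if (demo_modes[i].mode == mode)
			return demo_modes[i].speed_gbps / demo_modes[i].lanes;

	return 0;
}
```

All three new modes come out at 200 Gbps per lane, which is what
distinguishes them from the existing 200G/400G/800G entries in the enum.
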
Signed-off-by: Jianbo Liu
Reviewed-by: Shahar Shitrit
Signed-off-by: Tariq Toukan
Signed-off-by: Paolo Abeni
---
 include/linux/mlx5/port.h | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h
index e68d42b8ce65..fd625e0dd869 100644
--- a/include/linux/mlx5/port.h
+++ b/include/linux/mlx5/port.h
@@ -115,9 +115,12 @@ enum mlx5e_ext_link_mode {
 	MLX5E_100GAUI_1_100GBASE_CR_KR = 11,
 	MLX5E_200GAUI_4_200GBASE_CR4_KR4 = 12,
 	MLX5E_200GAUI_2_200GBASE_CR2_KR2 = 13,
+	MLX5E_200GAUI_1_200GBASE_CR1_KR1 = 14,
 	MLX5E_400GAUI_8_400GBASE_CR8 = 15,
 	MLX5E_400GAUI_4_400GBASE_CR4_KR4 = 16,
+	MLX5E_400GAUI_2_400GBASE_CR2_KR2 = 17,
 	MLX5E_800GAUI_8_800GBASE_CR8_KR8 = 19,
+	MLX5E_800GAUI_4_800GBASE_CR4_KR4 = 20,
 	MLX5E_EXT_LINK_MODES_NUMBER,
 };
 
-- 
cgit v1.2.3

From 8d3bbe4355aded32961b9009b31de6d41b7352e9 Mon Sep 17 00:00:00 2001
From: Biju Das
Date: Wed, 5 Feb 2025 12:42:21 +0000
Subject: of: base: Add of_get_available_child_by_name()

There are a lot of drivers using of_get_child_by_name() followed by
of_device_is_available() to find the available child node by name for a
given parent. Provide a helper for these users to simplify the code.

Suggested-by: Geert Uytterhoeven
Reviewed-by: Rob Herring
Signed-off-by: Biju Das
Reviewed-by: Simon Horman
Signed-off-by: David S. Miller
---
 include/linux/of.h | 9 +++++++++
 1 file changed, 9 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/of.h b/include/linux/of.h
index eaf0e2a2b75c..9d6b8a61607f 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -301,6 +301,8 @@ extern struct device_node *of_get_compatible_child(const struct device_node *par
 					const char *compatible);
 extern struct device_node *of_get_child_by_name(const struct device_node *node,
 					const char *name);
+extern struct device_node *of_get_available_child_by_name(const struct device_node *node,
+							   const char *name);
 
 /* cache lookup */
 extern struct device_node *of_find_next_cache_node(const struct device_node *);
@@ -578,6 +580,13 @@ static inline struct device_node *of_get_child_by_name(
 	return NULL;
 }
 
+static inline struct device_node *of_get_available_child_by_name(
+					const struct device_node *node,
+					const char *name)
+{
+	return NULL;
+}
+
 static inline int of_device_is_compatible(const struct device_node *device,
 					  const char *name)
 {
-- 
cgit v1.2.3

From c6594d64271704b335378e7b74c39fe4d4fcc777 Mon Sep 17 00:00:00 2001
From: Alexander Lobakin
Date: Thu, 6 Feb 2025 19:26:26 +0100
Subject: unroll: add generic loop unroll helpers

There are cases when we need to explicitly unroll loops. For example,
cache operations, filling DMA descriptors at very high speeds etc.
Add compiler-specific attribute macros to give the compiler a hint
that we'd like to unroll a loop.
Example usage:

 #define UNROLL_BATCH 8

	unrolled_count(UNROLL_BATCH)
	for (u32 i = 0; i < UNROLL_BATCH; i++)
		op(priv, i);

Note that sometimes the compilers won't unroll loops if they think this
would have worse optimization and perf than without unrolling, and that
unroll attributes are available only starting with GCC 8. For older
compiler versions, no hints/attributes will be applied.
For better unrolling/parallelization, don't have any variables that
interfere between iterations except for the iterator itself.

Co-developed-by: Jose E. Marchesi # pragmas
Signed-off-by: Jose E. Marchesi
Reviewed-by: Przemek Kitszel
Signed-off-by: Alexander Lobakin
Link: https://patch.msgid.link/20250206182630.3914318-2-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski
---
 include/linux/unroll.h | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/unroll.h b/include/linux/unroll.h
index d42fd6366373..863fb69f6a7e 100644
--- a/include/linux/unroll.h
+++ b/include/linux/unroll.h
@@ -9,6 +9,50 @@
 
 #include
 
+#ifdef CONFIG_CC_IS_CLANG
+#define __pick_unrolled(x, y) _Pragma(#x)
+#elif CONFIG_GCC_VERSION >= 80000
+#define __pick_unrolled(x, y) _Pragma(#y)
+#else
+#define __pick_unrolled(x, y) /* not supported */
+#endif
+
+/**
+ * unrolled - loop attributes to ask the compiler to unroll it
+ *
+ * Usage:
+ *
+ * #define BATCH 8
+ *
+ *	unrolled_count(BATCH)
+ *	for (u32 i = 0; i < BATCH; i++)
+ *		// loop body without cross-iteration dependencies
+ *
+ * This is only a hint and the compiler is free to disable unrolling if it
+ * thinks the count is suboptimal and may hurt performance and/or hugely
+ * increase object code size.
+ * Not having any cross-iteration dependencies (i.e. when iter x + 1 depends
+ * on what iter x will do with variables) is not a strict requirement, but
+ * provides best performance and object code size.
+ * Available only on Clang and GCC 8.x onwards.
+ */
+
+/* Ask the compiler to pick an optimal unroll count, Clang only */
+#define unrolled \
+	__pick_unrolled(clang loop unroll(enable), /* nothing */)
+
+/* Unroll each @n iterations of the loop */
+#define unrolled_count(n) \
+	__pick_unrolled(clang loop unroll_count(n), GCC unroll n)
+
+/* Unroll the whole loop */
+#define unrolled_full \
+	__pick_unrolled(clang loop unroll(full), GCC unroll 65534)
+
+/* Never unroll the loop */
+#define unrolled_none \
+	__pick_unrolled(clang loop unroll(disable), GCC unroll 1)
+
 #define UNROLL(N, MACRO, args...) CONCATENATE(__UNROLL_, N)(MACRO, args)
 
 #define __UNROLL_0(MACRO, args...)
-- 
cgit v1.2.3

From 848b09d53d923b4caee5491f57a5c5b22d81febc Mon Sep 17 00:00:00 2001
From: Aleksander Jan Bajkowski
Date: Thu, 6 Feb 2025 23:40:33 +0100
Subject: r8152: add vendor/device ID pair for Dell Alienware AW1022z

The Dell AW1022z is an RTL8156B based 2.5G Ethernet controller.

Add the vendor and product ID values to the driver. This makes Ethernet
work with the adapter.

Signed-off-by: Aleksander Jan Bajkowski
Link: https://patch.msgid.link/20250206224033.980115-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski
---
 include/linux/usb/r8152.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include/linux')

diff --git a/include/linux/usb/r8152.h b/include/linux/usb/r8152.h
index 33a4c146dc19..2ca60828f28b 100644
--- a/include/linux/usb/r8152.h
+++ b/include/linux/usb/r8152.h
@@ -30,6 +30,7 @@
 #define VENDOR_ID_NVIDIA 0x0955
 #define VENDOR_ID_TPLINK 0x2357
 #define VENDOR_ID_DLINK 0x2001
+#define VENDOR_ID_DELL 0x413c
 #define VENDOR_ID_ASUS 0x0b05
 
 #if IS_REACHABLE(CONFIG_USB_RTL8152)
-- 
cgit v1.2.3

From de86c5f60839dc0d771711a848b4f55ad3f90844 Mon Sep 17 00:00:00 2001
From: Ilan Peer
Date: Wed, 5 Feb 2025 11:39:13 +0200
Subject: wifi: mac80211: Add support for EPCS configuration

Add support for configuring EPCS state:

- When EPCS is enabled, send an EPCS enable request action frame to the
  AP. When the AP replies with EPCS enable response, enable EPCS by
  applying the QoS parameters provided by the AP. Do so for all the
  valid MLD links. Once EPCS is enabled, support processing of
  unsolicited EPCS enable response frames.
- When EPCS is disabled, send an EPCS teardown request to the AP and
  apply the QoS parameters as obtained from the last received beacons.
  Do so for all the valid links.
Signed-off-by: Ilan Peer
Signed-off-by: Miri Korenblit
Link: https://patch.msgid.link/20250205110958.7a90afd7e140.I3f602d65f5c1fd849d6c70b12307dda33aa91ccb@changeid
Signed-off-by: Johannes Berg
---
 include/linux/ieee80211.h | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h
index 16741e542e81..8f35a3a5211c 100644
--- a/include/linux/ieee80211.h
+++ b/include/linux/ieee80211.h
@@ -1543,6 +1543,10 @@ struct ieee80211_mgmt {
 					u8 count;
 					u8 variable[];
 				} __packed ml_reconf_resp;
+				struct {
+					u8 action_code;
+					u8 variable[];
+				} __packed epcs;
 			} u;
 		} __packed action;
 		DECLARE_FLEX_ARRAY(u8, body); /* Generic frame body */
@@ -5570,6 +5574,9 @@ static inline bool ieee80211_mle_reconf_sta_prof_size_ok(const u8 *data,
 	       fixed + prof->sta_info_len - 1 <= len;
 }
 
+#define IEEE80211_MLE_STA_EPCS_CONTROL_LINK_ID 0x000f
+#define IEEE80211_EPCS_ENA_RESP_BODY_LEN 3
+
 static inline bool ieee80211_tid_to_link_map_size_ok(const u8 *data, size_t len)
 {
 	const struct ieee80211_ttlm_elem *t2l = (const void *)data;
-- 
cgit v1.2.3

From 282eeec9196fc6593540c7bf7479305a8384de32 Mon Sep 17 00:00:00 2001
From: Ilan Peer
Date: Wed, 5 Feb 2025 11:39:14 +0200
Subject: wifi: ieee80211: Add missing EHT MAC capabilities

Add missing EHT MAC capabilities definitions.
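
For readers unfamiliar with how such capability masks are consumed, a short
sketch follows. The bit values are copied from the defines this patch adds;
the DEMO_ names and both helper functions are hypothetical, and the sketch
assumes cap1 is the capability octet that the CAP1 masks apply to. The shift
of 4 for the link adaptation field is derived from the 0x30 mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the values added by the patch; DEMO_ marks them as such. */
#define DEMO_EHT_MAC_CAP1_EHT_TRS			0x02
#define DEMO_EHT_MAC_CAP1_TXOP_RET			0x04
#define DEMO_EHT_MAC_CAP1_TWO_BQRS			0x08
#define DEMO_EHT_MAC_CAP1_EHT_LINK_ADAPT_MASK		0x30
#define DEMO_EHT_MAC_CAP1_UNSOL_EPCS_PRIO_ACCESS	0x40

/* Single-bit capability: set or not set. */
bool demo_cap1_has(uint8_t cap1, uint8_t flag)
{
	return (cap1 & flag) != 0;
}

/* Two-bit field: mask it out, then shift down by the mask's lowest set bit. */
unsigned int demo_cap1_link_adapt(uint8_t cap1)
{
	return (cap1 & DEMO_EHT_MAC_CAP1_EHT_LINK_ADAPT_MASK) >> 4;
}
```
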
Signed-off-by: Ilan Peer
Reviewed-by: Johannes Berg
Signed-off-by: Miri Korenblit
Link: https://patch.msgid.link/20250205110958.6c1643c345a1.I7405b9c35cb39ae97a52c3fbcc36b0bd81e495dc@changeid
Signed-off-by: Johannes Berg
---
 include/linux/ieee80211.h | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h
index 8f35a3a5211c..508d466de1cc 100644
--- a/include/linux/ieee80211.h
+++ b/include/linux/ieee80211.h
@@ -3113,6 +3113,11 @@ ieee80211_he_spr_size(const u8 *he_spr_ie)
 #define IEEE80211_EHT_MAC_CAP0_MAX_MPDU_LEN_11454 2
 
 #define IEEE80211_EHT_MAC_CAP1_MAX_AMPDU_LEN_MASK 0x01
+#define IEEE80211_EHT_MAC_CAP1_EHT_TRS 0x02
+#define IEEE80211_EHT_MAC_CAP1_TXOP_RET 0x04
+#define IEEE80211_EHT_MAC_CAP1_TWO_BQRS 0x08
+#define IEEE80211_EHT_MAC_CAP1_EHT_LINK_ADAPT_MASK 0x30
+#define IEEE80211_EHT_MAC_CAP1_UNSOL_EPCS_PRIO_ACCESS 0x40
 
 /* EHT PHY capabilities as defined in P802.11be_D2.0 section 9.4.2.313.3 */
 #define IEEE80211_EHT_PHY_CAP0_320MHZ_IN_6GHZ 0x02
-- 
cgit v1.2.3

From 8eb0d381be31bfa01f768ad38a15af7ade805e69 Mon Sep 17 00:00:00 2001
From: Heiner Kallweit
Date: Mon, 10 Feb 2025 21:49:22 +0100
Subject: net: phy: rename eee_broken_modes to eee_disabled_modes

This bitmap is used also if the MAC doesn't support an EEE mode.
So the mode isn't necessarily broken in the PHY. Therefore rename
the bitmap.
Signed-off-by: Heiner Kallweit
Reviewed-by: Andrew Lunn
Link: https://patch.msgid.link/6cd11422-dd67-4c87-a642-308de694af92@gmail.com
Signed-off-by: Jakub Kicinski
---
 include/linux/phy.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'include/linux')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 19f076a71f94..dbc7e7245881 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -611,7 +611,7 @@ struct macsec_ops;
  * @eee_cfg: User configuration of EEE
  * @lp_advertising: Current link partner advertised linkmodes
  * @host_interfaces: PHY interface modes supported by host
- * @eee_broken_modes: Energy efficient ethernet modes which should be prohibited
+ * @eee_disabled_modes: Energy efficient ethernet modes not to be advertised
  * @autoneg: Flag autoneg being used
  * @rate_matching: Current rate matching mode
  * @link: Current link state
@@ -727,7 +727,7 @@ struct phy_device {
 	__ETHTOOL_DECLARE_LINK_MODE_MASK(supported_eee);
 	__ETHTOOL_DECLARE_LINK_MODE_MASK(advertising_eee);
 	/* Energy efficient ethernet modes which should be prohibited */
-	__ETHTOOL_DECLARE_LINK_MODE_MASK(eee_broken_modes);
+	__ETHTOOL_DECLARE_LINK_MODE_MASK(eee_disabled_modes);
 	bool enable_tx_lpi;
 	bool eee_active;
 	struct eee_config eee_cfg;
@@ -1353,7 +1353,7 @@ int phy_speed_down_core(struct phy_device *phydev);
  */
 static inline void phy_set_eee_broken(struct phy_device *phydev, u32 link_mode)
 {
-	linkmode_set_bit(link_mode, phydev->eee_broken_modes);
+	linkmode_set_bit(link_mode, phydev->eee_disabled_modes);
 }
 
 /**
-- 
cgit v1.2.3

From 5e7a74b6a35782be83b433979e71df2636ab05f0 Mon Sep 17 00:00:00 2001
From: Heiner Kallweit
Date: Mon, 10 Feb 2025 21:50:10 +0100
Subject: net: phy: rename phy_set_eee_broken to phy_disable_eee_mode

Consider that an EEE mode may not be broken but simply not supported
by the MAC, and rename function phy_set_eee_broken().
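
The effect of a "disabled modes" bitmap can be sketched in user space as
follows. This is an assumption-laden illustration, not phylib code: the
kernel uses linkmode bitmaps indexed by ETHTOOL_LINK_MODE_* bits, whereas
this sketch uses one 64-bit word, made-up mode indices, and hypothetical
names (struct demo_phy, demo_disable_eee_mode(), demo_eee_to_advertise()).
It assumes the advertised set is "supported minus disabled".

```c
#include <stdint.h>

/* Made-up mode indices for the example only. */
enum { DEMO_MODE_100TX = 0, DEMO_MODE_1000T = 1, DEMO_MODE_2500T = 2 };

struct demo_phy {
	uint64_t supported_eee;		/* modes the PHY can do EEE on */
	uint64_t eee_disabled_modes;	/* modes that must not be advertised */
};

/* Counterpart of phy_disable_eee_mode(): mark one mode as not to be advertised. */
void demo_disable_eee_mode(struct demo_phy *phy, unsigned int mode)
{
	phy->eee_disabled_modes |= UINT64_C(1) << mode;
}

/* Modes left to advertise: supported, with the disabled ones masked out. */
uint64_t demo_eee_to_advertise(const struct demo_phy *phy)
{
	return phy->supported_eee & ~phy->eee_disabled_modes;
}
```

Note that disabling a mode leaves supported_eee untouched, which matches the
rationale for the rename: the mode is not broken in the PHY, it is just
withheld (for instance because the MAC cannot handle it).
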
Signed-off-by: Heiner Kallweit
Reviewed-by: Andrew Lunn
Link: https://patch.msgid.link/30deb630-3f6b-4ffb-a1e6-a9736021f43a@gmail.com
Signed-off-by: Jakub Kicinski
---
 include/linux/phy.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'include/linux')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index dbc7e7245881..29df4c602589 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1347,11 +1347,11 @@ void of_set_phy_timing_role(struct phy_device *phydev);
 int phy_speed_down_core(struct phy_device *phydev);
 
 /**
- * phy_set_eee_broken - Mark an EEE mode as broken so that it isn't advertised.
+ * phy_disable_eee_mode - Don't advertise an EEE mode.
  * @phydev: The phy_device struct
- * @link_mode: The broken EEE mode
+ * @link_mode: The EEE mode to be disabled
  */
-static inline void phy_set_eee_broken(struct phy_device *phydev, u32 link_mode)
+static inline void phy_disable_eee_mode(struct phy_device *phydev, u32 link_mode)
 {
 	linkmode_set_bit(link_mode, phydev->eee_disabled_modes);
 }
-- 
cgit v1.2.3

From 16d11fdaeb22715d8b55b08890173ffa2326baee Mon Sep 17 00:00:00 2001
From: Heiner Kallweit
Date: Sun, 9 Feb 2025 13:12:44 +0100
Subject: net: phy: remove unused PHY_INIT_TIMEOUT and PHY_FORCE_TIMEOUT

Both definitions are unused.
Last users have been removed with:

f3ba9d490d6e ("net: s6gmac: remove driver")
2bd229df5e2e ("net: phy: remove state PHY_FORCING")

Signed-off-by: Heiner Kallweit
Reviewed-by: Gerhard Engleder
Reviewed-by: Andrew Lunn
Link: https://patch.msgid.link/f8e7b8ed-a665-41ad-b0ce-cbfdb65262ef@gmail.com
Signed-off-by: Jakub Kicinski
---
 include/linux/phy.h | 3 ---
 1 file changed, 3 deletions(-)

(limited to 'include/linux')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 29df4c602589..83994b394d8e 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -303,9 +303,6 @@ static inline long rgmii_clock(int speed)
 	}
 }
 
-#define PHY_INIT_TIMEOUT 100000
-#define PHY_FORCE_TIMEOUT 10
-
 #define PHY_MAX_ADDR 32
 
 /* Used when trying to connect to a specific phy (mii bus id:phy device id) */
-- 
cgit v1.2.3

From 8bf47e4d7b87cbd6a69541643d3fa4003c99d95f Mon Sep 17 00:00:00 2001
From: Oleksij Rempel
Date: Mon, 10 Feb 2025 09:23:57 +0100
Subject: net: phy: Add support for driver-specific next update time

Introduce the `phy_get_next_update_time` function to allow PHY drivers to
dynamically determine the time (in jiffies) until the next state update
event. This enables more flexible and adaptive polling intervals based on
the link state or other conditions.
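
One way to picture the optional callback is the user-space sketch below: the
caller asks the driver for a delay when the hook is present and otherwise
falls back to a fixed interval. Everything here is hypothetical
(struct demo_phy, struct demo_phy_driver, demo_next_delay(),
DEMO_DEFAULT_INTERVAL and the example policy); only the callback's shape,
a function taking the PHY and returning an unsigned delay, is taken from
this patch.

```c
#include <stddef.h>

#define DEMO_DEFAULT_INTERVAL 1000u	/* hypothetical fallback, in ticks */

struct demo_phy;

struct demo_phy_driver {
	/* Optional hook; mirrors the shape of the new callback. */
	unsigned int (*get_next_update_time)(struct demo_phy *phy);
};

struct demo_phy {
	const struct demo_phy_driver *drv;
	int link;	/* 1 = link up, 0 = link down */
};

/* Delay until the next state-machine run: driver-provided if available,
 * otherwise the fixed default. */
unsigned int demo_next_delay(struct demo_phy *phy)
{
	if (phy->drv && phy->drv->get_next_update_time)
		return phy->drv->get_next_update_time(phy);

	return DEMO_DEFAULT_INTERVAL;
}

/* Made-up driver policy: poll faster while the link is down. */
unsigned int demo_adaptive_interval(struct demo_phy *phy)
{
	return phy->link ? 2000u : 100u;
}
```

Drivers that leave the hook unset keep the fixed interval, so in this sketch
the new behavior is strictly opt-in.
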
Signed-off-by: Oleksij Rempel
Reviewed-by: Andrew Lunn
Link: https://patch.msgid.link/20250210082358.200751-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski
---
 include/linux/phy.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 83994b394d8e..64982eba71d1 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1270,6 +1270,19 @@ struct phy_driver {
 	 */
 	int (*led_polarity_set)(struct phy_device *dev, int index,
 				unsigned long modes);
+
+	/**
+	 * @get_next_update_time: Get the time until the next update event
+	 * @dev: PHY device
+	 *
+	 * Callback to determine the time (in jiffies) until the next
+	 * update event for the PHY state machine. Allows PHY drivers to
+	 * dynamically adjust polling intervals based on link state or other
+	 * conditions.
+	 *
+	 * Returns the time in jiffies until the next update event.
+	 */
+	unsigned int (*get_next_update_time)(struct phy_device *dev);
 };
 #define to_phy_driver(d) container_of_const(to_mdio_common_driver(d), \
 				      struct phy_driver, mdiodrv)
-- 
cgit v1.2.3

From 2001d21592e5eb531d23950223eedad55c987db8 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)"
Date: Mon, 10 Feb 2025 10:36:44 +0000
Subject: net: phylink: provide phylink_mac_implements_lpi()

Provide a helper to determine whether the MAC operations structure
implements the LPI operations, which will be used by both phylink and
DSA.
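
A caller-side view of such a helper can be sketched as below. The ops table
is a hypothetical cut-down stand-in (struct demo_mac_ops) with simplified
callback signatures; only the rule being checked, that both the disable and
the enable LPI callbacks must be present, is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down ops table; signatures are simplified for the demo. */
struct demo_mac_ops {
	void (*mac_disable_tx_lpi)(void *config);
	int (*mac_enable_tx_lpi)(void *config, unsigned int timer,
				 bool tx_clk_stop);
};

/* True only if the ops table exists and provides both LPI callbacks. */
bool demo_mac_implements_lpi(const struct demo_mac_ops *ops)
{
	return ops && ops->mac_disable_tx_lpi && ops->mac_enable_tx_lpi;
}

/* Dummy callbacks so an ops table can be populated in the example. */
void demo_disable_lpi(void *config)
{
	(void)config;
}

int demo_enable_lpi(void *config, unsigned int timer, bool tx_clk_stop)
{
	(void)config;
	(void)timer;
	(void)tx_clk_stop;
	return 0;
}
```

A table that fills in only one of the two callbacks is treated the same as
one that fills in neither, so callers never end up able to enable LPI
without being able to disable it again.
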
Signed-off-by: Russell King (Oracle)
Link: https://patch.msgid.link/E1thR9g-003vX6-4s@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski
---
 include/linux/phylink.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

(limited to 'include/linux')

diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index 898b00451bbf..0de78673172d 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -737,6 +737,18 @@ static inline int phylink_get_link_timer_ns(phy_interface_t interface)
 	}
 }
 
+/**
+ * phylink_mac_implements_lpi() - determine if MAC implements LPI ops
+ * @ops: phylink_mac_ops structure
+ *
+ * Returns true if the phylink MAC operations structure indicates that the
+ * LPI operations have been implemented, false otherwise.
+ */
+static inline bool phylink_mac_implements_lpi(const struct phylink_mac_ops *ops)
+{
+	return ops && ops->mac_disable_tx_lpi && ops->mac_enable_tx_lpi;
+}
+
 void phylink_mii_c22_pcs_decode_state(struct phylink_link_state *state,
 				      unsigned int neg_mode, u16 bmsr, u16 lpa);
 void phylink_mii_c22_pcs_get_state(struct mdio_device *pcs,
-- 
cgit v1.2.3

From 34dba73b231f2a46af88519d573052cc57a84952 Mon Sep 17 00:00:00 2001
From: Thorsten Blum
Date: Tue, 11 Feb 2025 11:20:56 +0100
Subject: sctp: Remove commented out code

Remove commented out code.
Signed-off-by: Thorsten Blum
Link: https://patch.msgid.link/20250211102057.587182-1-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski
---
 include/linux/sctp.h | 1 -
 1 file changed, 1 deletion(-)

(limited to 'include/linux')

diff --git a/include/linux/sctp.h b/include/linux/sctp.h
index 836a7e200f39..812011d8b67e 100644
--- a/include/linux/sctp.h
+++ b/include/linux/sctp.h
@@ -222,7 +222,6 @@ struct sctp_datahdr {
 	__be16 stream;
 	__be16 ssn;
 	__u32 ppid;
-	/* __u8 payload[]; */
 };
 
 struct sctp_data_chunk {
-- 
cgit v1.2.3

From 27ebd8bf9e4b53388cef2d9cdb2947bc456b0b33 Mon Sep 17 00:00:00 2001
From: Jacob Keller
Date: Wed, 6 Nov 2024 12:37:18 -0500
Subject: virtchnl: add support for enabling PTP on iAVF

Add support for allowing a VF to enable PTP feature - Rx timestamps

The new capability is gated by VIRTCHNL_VF_CAP_PTP, which must be
set by the VF to request access to the new operations. In addition, the
VIRTCHNL_OP_1588_PTP_GET_CAPS command is used to determine the specific
capabilities available to the VF.

This support includes the following additional capabilities:

* Rx timestamps enabled in the Rx queues (when using flexible advanced
  descriptors)
* Read access to PHC time over virtchnl using
  VIRTCHNL_OP_1588_PTP_GET_TIME

Extra space is reserved in most structures to allow for future
extension (like set clock, Tx timestamps). Additional opcode numbers
are reserved and space in the virtchnl_ptp_caps structure is
specifically set aside for this.
Additionally, each structure has some space reserved for future
extensions to allow some flexibility.
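
The request/reply rule for the capability mask ("the PF shall not set bits
which were not requested by the VF", with reserved bytes left at zero) can
be sketched in user space as follows. The capability bit positions and the
48-byte layout are taken from this patch; the DEMO_ names,
demo_pf_build_reply(), demo_vf_reply_ok() and the pf_supported parameter are
hypothetical and only illustrate one way both sides could apply the rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit positions as in the patch; names are local to this sketch. */
#define DEMO_PTP_CAP_RX_TSTAMP	(UINT32_C(1) << 1)
#define DEMO_PTP_CAP_READ_PHC	(UINT32_C(1) << 2)

/* Same layout as the message in the patch: 4 bytes of caps + 44 reserved. */
struct demo_ptp_caps {
	uint32_t caps;
	uint8_t rsvd[44];
};

_Static_assert(sizeof(struct demo_ptp_caps) == 48, "message must stay 48 bytes");

/* PF side: grant only what was requested AND is supported; zero the rest. */
void demo_pf_build_reply(const struct demo_ptp_caps *req, uint32_t pf_supported,
			 struct demo_ptp_caps *reply)
{
	memset(reply, 0, sizeof(*reply));
	reply->caps = req->caps & pf_supported;
}

/* VF side: reject replies with unrequested bits or non-zero reserved bytes. */
bool demo_vf_reply_ok(const struct demo_ptp_caps *req,
		      const struct demo_ptp_caps *reply)
{
	size_t i;

	if (reply->caps & ~req->caps)
		return false;

	for (i = 0; i < sizeof(reply->rsvd); i++)
		if (reply->rsvd[i])
			return false;

	return true;
}
```

Checking the reserved bytes on the VF side is what lets a later protocol
revision give them meaning without older VFs silently misreading them.
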
Signed-off-by: Jacob Keller Reviewed-by: Rahul Rameshbabu Reviewed-by: Simon Horman Reviewed-by: Alexander Lobakin Tested-by: Rafal Romanowski Signed-off-by: Mateusz Polchlopek Signed-off-by: Tony Nguyen --- include/linux/avf/virtchnl.h | 67 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 66 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h index 13a11f3c09b8..92866e449b21 100644 --- a/include/linux/avf/virtchnl.h +++ b/include/linux/avf/virtchnl.h @@ -154,7 +154,10 @@ enum virtchnl_ops { VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 = 55, VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2 = 56, VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2 = 57, - /* opcode 57 - 65 are reserved */ + /* opcode 58 and 59 are reserved */ + VIRTCHNL_OP_1588_PTP_GET_CAPS = 60, + VIRTCHNL_OP_1588_PTP_GET_TIME = 61, + /* opcode 62 - 65 are reserved */ VIRTCHNL_OP_GET_QOS_CAPS = 66, /* opcode 68 through 111 are reserved */ VIRTCHNL_OP_CONFIG_QUEUE_BW = 112, @@ -270,6 +273,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource); #define VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF BIT(27) #define VIRTCHNL_VF_OFFLOAD_FDIR_PF BIT(28) #define VIRTCHNL_VF_OFFLOAD_QOS BIT(29) +#define VIRTCHNL_VF_CAP_PTP BIT(31) #define VF_BASE_MODE_OFFLOADS (VIRTCHNL_VF_OFFLOAD_L2 | \ VIRTCHNL_VF_OFFLOAD_VLAN | \ @@ -1425,6 +1429,61 @@ struct virtchnl_fdir_del { VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del); +#define VIRTCHNL_1588_PTP_CAP_RX_TSTAMP BIT(1) +#define VIRTCHNL_1588_PTP_CAP_READ_PHC BIT(2) + +/** + * struct virtchnl_ptp_caps - Defines the PTP caps available to the VF. + * @caps: On send, VF sets what capabilities it requests. On reply, PF + * indicates what has been enabled for this VF. The PF shall not set + * bits which were not requested by the VF. + * @rsvd: Reserved bits for future extension. + * + * Structure that defines the PTP capabilities available to the VF. 
The VF + * sends VIRTCHNL_OP_1588_PTP_GET_CAPS, and must fill in the ptp_caps field + * indicating what capabilities it is requesting. The PF will respond with the + * same message with the virtchnl_ptp_caps structure indicating what is + * enabled for the VF. + * + * VIRTCHNL_1588_PTP_CAP_RX_TSTAMP indicates that the VF receive queues have + * receive timestamps enabled in the flexible descriptors. Note that this + * requires a VF to also negotiate to enable advanced flexible descriptors in + * the receive path instead of the default legacy descriptor format. + * + * VIRTCHNL_1588_PTP_CAP_READ_PHC indicates that the VF may read the PHC time + * via the VIRTCHNL_OP_1588_PTP_GET_TIME command. + * + * Note that in the future, additional capability flags may be added which + * indicate additional extended support. All fields marked as reserved by this + * header will be set to zero. VF implementations should verify this to ensure + * that future extensions do not break compatibility. + */ +struct virtchnl_ptp_caps { + u32 caps; + u8 rsvd[44]; +}; + +VIRTCHNL_CHECK_STRUCT_LEN(48, virtchnl_ptp_caps); + +/** + * struct virtchnl_phc_time - Contains the 64bits of PHC clock time in ns. + * @time: PHC time in nanoseconds + * @rsvd: Reserved for future extension + * + * Structure received with VIRTCHNL_OP_1588_PTP_GET_TIME. Contains the 64bits + * of PHC clock time in nanoseconds. + * + * VIRTCHNL_OP_1588_PTP_GET_TIME may be sent to request the current time of + * the PHC. This op is available in case direct access via the PHC registers + * is not available. 
+ */ +struct virtchnl_phc_time { + u64 time; + u8 rsvd[8]; +}; + +VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_phc_time); + struct virtchnl_shaper_bw { /* Unit is Kbps */ u32 committed; @@ -1757,6 +1816,12 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode, } } break; + case VIRTCHNL_OP_1588_PTP_GET_CAPS: + valid_len = sizeof(struct virtchnl_ptp_caps); + break; + case VIRTCHNL_OP_1588_PTP_GET_TIME: + valid_len = sizeof(struct virtchnl_phc_time); + break; /* These are always errors coming from the VF. */ case VIRTCHNL_OP_EVENT: case VIRTCHNL_OP_UNKNOWN: -- cgit v1.2.3 From 7c1178a9df583454fc76be2ca8a6f0bef6613fba Mon Sep 17 00:00:00 2001 From: Simei Su Date: Wed, 6 Nov 2024 12:37:19 -0500 Subject: ice: support Rx timestamp on flex descriptor To support Rx timestamp offload, the VF sends VIRTCHNL_OP_1588_PTP_GET_CAPS to request PTP capabilities, and the PF responds with the capabilities that are enabled for that VF. Hardware captures timestamps which contain only 32 bits of nominal nanoseconds, as opposed to the 64bit timestamps that the stack expects. To convert 32b to 64b, we need a current PHC time. The VF sends VIRTCHNL_OP_1588_PTP_GET_TIME, and the PF responds with the current PHC time. Signed-off-by: Simei Su Reviewed-by: Simon Horman Tested-by: Rafal Romanowski Co-developed-by: Mateusz Polchlopek Signed-off-by: Mateusz Polchlopek Signed-off-by: Tony Nguyen --- include/linux/avf/virtchnl.h | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h index 92866e449b21..56baf97c44d0 100644 --- a/include/linux/avf/virtchnl.h +++ b/include/linux/avf/virtchnl.h @@ -313,6 +313,18 @@ struct virtchnl_txq_info { VIRTCHNL_CHECK_STRUCT_LEN(24, virtchnl_txq_info); +/* virtchnl_rxq_info_flags - definition of bits in the flags field of the + * virtchnl_rxq_info structure.
+ * + * @VIRTCHNL_PTP_RX_TSTAMP: request to enable Rx timestamping + * + * Other flag bits are currently reserved and they may be extended in the + * future. + */ +enum virtchnl_rxq_info_flags { + VIRTCHNL_PTP_RX_TSTAMP = BIT(0), +}; + /* VIRTCHNL_OP_CONFIG_RX_QUEUE * VF sends this message to set up parameters for one RX queue. * External data buffer contains one instance of virtchnl_rxq_info. @@ -336,7 +348,8 @@ struct virtchnl_rxq_info { u32 max_pkt_size; u8 crc_disable; u8 rxdid; - u8 pad1[2]; + enum virtchnl_rxq_info_flags flags:8; /* see virtchnl_rxq_info_flags */ + u8 pad1; u64 dma_ring_addr; /* see enum virtchnl_rx_hsplit; deprecated with AVF 1.0 */ -- cgit v1.2.3 From 6a88c797ab4005cd5dd02575c1b3d1e7b53fe715 Mon Sep 17 00:00:00 2001 From: Jacob Keller Date: Wed, 6 Nov 2024 12:37:20 -0500 Subject: virtchnl: add enumeration for the rxdid format Support for allowing VF to negotiate the descriptor format requires that the VF specify which descriptor format to use when requesting Rx queues. The VF is supposed to request the set of supported formats via the new VIRTCHNL_OP_GET_SUPPORTED_RXDIDS, and then set one of the supported formats in the rxdid field of the virtchnl_rxq_info structure. The virtchnl.h header does not provide an enumeration of the format values. The existing implementations in the PF directly use the values from the DDP package. Make the formats explicit by defining an enumeration of the RXDIDs. Provide an enumeration for the values as well as the bit positions as returned by the supported_rxdids data from the VIRTCHNL_OP_GET_SUPPORTED_RXDIDS. 
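As a rough illustration of how the ID enumeration and the bitmask enumeration relate, the sketch below re-declares two of the values locally and picks a descriptor format from a supported_rxdids bitmap. The names rxdid_sketch, RXDID_BIT(), pick_rxdid() and pick_rxdid_example(), as well as the preference order, are assumptions made for this example only; they are not part of virtchnl.h and do not describe what any driver actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of two values from enum virtchnl_rx_desc_ids. */
enum rxdid_sketch {
	RXDID_1_32B_BASE = 1,
	RXDID_2_FLEX_SQ_NIC = 2,
};

/* Same idea as VIRTCHNL_RXDID_BIT(): one bit per descriptor ID. */
#define RXDID_BIT(id) (UINT64_C(1) << (id))

/*
 * Hypothetical helper: prefer the flexible format when the PF reports it,
 * otherwise fall back to the 32-byte base format.
 */
static int pick_rxdid(uint64_t supported_rxdids)
{
	if (supported_rxdids & RXDID_BIT(RXDID_2_FLEX_SQ_NIC))
		return RXDID_2_FLEX_SQ_NIC;
	return RXDID_1_32B_BASE;
}

/* Usage example: the bitmap decides which ID goes into the rxdid field. */
void pick_rxdid_example(void)
{
	uint64_t base_only = RXDID_BIT(RXDID_1_32B_BASE);
	uint64_t base_and_flex = base_only | RXDID_BIT(RXDID_2_FLEX_SQ_NIC);

	assert(pick_rxdid(base_only) == RXDID_1_32B_BASE);
	assert(pick_rxdid(base_and_flex) == RXDID_2_FLEX_SQ_NIC);
}
```

The point of having both enumerations is that the value written to rxdid and the bit tested in supported_rxdids come from the same named constant, so the two cannot drift apart.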
Signed-off-by: Jacob Keller Reviewed-by: Rahul Rameshbabu Reviewed-by: Simon Horman Reviewed-by: Alexander Lobakin Tested-by: Rafal Romanowski Signed-off-by: Mateusz Polchlopek Signed-off-by: Tony Nguyen --- include/linux/avf/virtchnl.h | 50 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 49 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h index 56baf97c44d0..bc10e6ffa50b 100644 --- a/include/linux/avf/virtchnl.h +++ b/include/linux/avf/virtchnl.h @@ -313,6 +313,48 @@ struct virtchnl_txq_info { VIRTCHNL_CHECK_STRUCT_LEN(24, virtchnl_txq_info); +/* RX descriptor IDs (range from 0 to 63) */ +enum virtchnl_rx_desc_ids { + VIRTCHNL_RXDID_0_16B_BASE = 0, + VIRTCHNL_RXDID_1_32B_BASE = 1, + VIRTCHNL_RXDID_2_FLEX_SQ_NIC = 2, + VIRTCHNL_RXDID_3_FLEX_SQ_SW = 3, + VIRTCHNL_RXDID_4_FLEX_SQ_NIC_VEB = 4, + VIRTCHNL_RXDID_5_FLEX_SQ_NIC_ACL = 5, + VIRTCHNL_RXDID_6_FLEX_SQ_NIC_2 = 6, + VIRTCHNL_RXDID_7_HW_RSVD = 7, + /* 8 through 15 are reserved */ + VIRTCHNL_RXDID_16_COMMS_GENERIC = 16, + VIRTCHNL_RXDID_17_COMMS_AUX_VLAN = 17, + VIRTCHNL_RXDID_18_COMMS_AUX_IPV4 = 18, + VIRTCHNL_RXDID_19_COMMS_AUX_IPV6 = 19, + VIRTCHNL_RXDID_20_COMMS_AUX_FLOW = 20, + VIRTCHNL_RXDID_21_COMMS_AUX_TCP = 21, + /* 22 through 63 are reserved */ +}; + +#define VIRTCHNL_RXDID_BIT(x) BIT_ULL(VIRTCHNL_RXDID_##x) + +/* RX descriptor ID bitmasks */ +enum virtchnl_rx_desc_id_bitmasks { + VIRTCHNL_RXDID_0_16B_BASE_M = VIRTCHNL_RXDID_BIT(0_16B_BASE), + VIRTCHNL_RXDID_1_32B_BASE_M = VIRTCHNL_RXDID_BIT(1_32B_BASE), + VIRTCHNL_RXDID_2_FLEX_SQ_NIC_M = VIRTCHNL_RXDID_BIT(2_FLEX_SQ_NIC), + VIRTCHNL_RXDID_3_FLEX_SQ_SW_M = VIRTCHNL_RXDID_BIT(3_FLEX_SQ_SW), + VIRTCHNL_RXDID_4_FLEX_SQ_NIC_VEB_M = VIRTCHNL_RXDID_BIT(4_FLEX_SQ_NIC_VEB), + VIRTCHNL_RXDID_5_FLEX_SQ_NIC_ACL_M = VIRTCHNL_RXDID_BIT(5_FLEX_SQ_NIC_ACL), + VIRTCHNL_RXDID_6_FLEX_SQ_NIC_2_M = VIRTCHNL_RXDID_BIT(6_FLEX_SQ_NIC_2), + VIRTCHNL_RXDID_7_HW_RSVD_M = 
VIRTCHNL_RXDID_BIT(7_HW_RSVD), + /* 8 through 15 are reserved */ + VIRTCHNL_RXDID_16_COMMS_GENERIC_M = VIRTCHNL_RXDID_BIT(16_COMMS_GENERIC), + VIRTCHNL_RXDID_17_COMMS_AUX_VLAN_M = VIRTCHNL_RXDID_BIT(17_COMMS_AUX_VLAN), + VIRTCHNL_RXDID_18_COMMS_AUX_IPV4_M = VIRTCHNL_RXDID_BIT(18_COMMS_AUX_IPV4), + VIRTCHNL_RXDID_19_COMMS_AUX_IPV6_M = VIRTCHNL_RXDID_BIT(19_COMMS_AUX_IPV6), + VIRTCHNL_RXDID_20_COMMS_AUX_FLOW_M = VIRTCHNL_RXDID_BIT(20_COMMS_AUX_FLOW), + VIRTCHNL_RXDID_21_COMMS_AUX_TCP_M = VIRTCHNL_RXDID_BIT(21_COMMS_AUX_TCP), + /* 22 through 63 are reserved */ +}; + /* virtchnl_rxq_info_flags - definition of bits in the flags field of the * virtchnl_rxq_info structure. * @@ -347,7 +389,12 @@ struct virtchnl_rxq_info { u32 databuffer_size; u32 max_pkt_size; u8 crc_disable; - u8 rxdid; + /* see enum virtchnl_rx_desc_ids; + * only used when VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC is supported. Note + * that when the offload is not supported, the descriptor format aligns + * with VIRTCHNL_RXDID_1_32B_BASE. + */ + enum virtchnl_rx_desc_ids rxdid:8; enum virtchnl_rxq_info_flags flags:8; /* see virtchnl_rxq_info_flags */ u8 pad1; u64 dma_ring_addr; @@ -1050,6 +1097,7 @@ struct virtchnl_filter { VIRTCHNL_CHECK_STRUCT_LEN(272, virtchnl_filter); struct virtchnl_supported_rxdids { + /* see enum virtchnl_rx_desc_id_bitmasks */ u64 supported_rxdids; }; -- cgit v1.2.3 From 2a86e210f1a102614116e347efda59896f780417 Mon Sep 17 00:00:00 2001 From: Jacob Keller Date: Wed, 6 Nov 2024 12:37:21 -0500 Subject: iavf: add support for negotiating flexible RXDID format Enable support for VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC, to enable the VF driver the ability to determine what Rx descriptor formats are available. This requires sending an additional message during initialization and reset, the VIRTCHNL_OP_GET_SUPPORTED_RXDIDS. This operation requests the supported Rx descriptor IDs available from the PF. This is treated the same way that VLAN V2 capabilities are handled. 
Add a new set of extended capability flags, used to process send and receipt of the VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message. This ensures we finish negotiating for the supported descriptor formats prior to beginning configuration of receive queues. This change stores the supported format bitmap into the iavf_adapter structure. Additionally, if VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC is enabled by the PF, we need to make sure that the Rx queue configuration specifies the format. Signed-off-by: Jacob Keller Reviewed-by: Simon Horman Tested-by: Rafal Romanowski Co-developed-by: Mateusz Polchlopek Signed-off-by: Mateusz Polchlopek Signed-off-by: Tony Nguyen --- include/linux/avf/virtchnl.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h index bc10e6ffa50b..4811b9a14604 100644 --- a/include/linux/avf/virtchnl.h +++ b/include/linux/avf/virtchnl.h @@ -1096,11 +1096,6 @@ struct virtchnl_filter { VIRTCHNL_CHECK_STRUCT_LEN(272, virtchnl_filter); -struct virtchnl_supported_rxdids { - /* see enum virtchnl_rx_desc_id_bitmasks */ - u64 supported_rxdids; -}; - /* VIRTCHNL_OP_EVENT * PF sends this message to inform the VF driver of events that may affect it. * No direct response is expected from the VF, though it may generate other -- cgit v1.2.3 From a045e40645dfa02a68c17ad8a3c92a8ef62375b0 Mon Sep 17 00:00:00 2001 From: Swathi K S Date: Thu, 13 Feb 2025 09:45:59 +0530 Subject: net: stmmac: refactor clock management in EQoS driver Refactor clock management in EQoS driver for code reuse and to avoid redundancy. This way, only minimal changes are required when a new platform is added. 
Suggested-by: Andrew Lunn Signed-off-by: Swathi K S Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/20250213041559.106111-1-swathi.ks@samsung.com Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index c9878a612e53..24422ac4e417 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -254,6 +254,8 @@ struct plat_stmmacenet_data { struct clk *clk_ptp_ref; unsigned long clk_ptp_rate; unsigned long clk_ref_rate; + struct clk_bulk_data *clks; + int num_clks; unsigned int mult_fact_100ns; s32 ptp_max_adj; u32 cdc_error_adj; -- cgit v1.2.3 From e9f03a6a879bffea0ee51af87ac7a0c77716dda6 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 10 Feb 2025 10:53:39 +0000 Subject: net: phylink: add support for notifying PCS about EEE There are hooks in the stmmac driver into XPCS to control the EEE settings when LPI is configured at the MAC. This bypasses the layering. To allow this to be removed from the stmmac driver, add two new methods for PCS to inform them when the LPI/EEE enablement state changes at the MAC. Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1thRQ3-003w6u-RH@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/phylink.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 0de78673172d..41ab85e591ad 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -477,6 +477,10 @@ struct phylink_pcs { * @pcs_an_restart: restart 802.3z BaseX autonegotiation. * @pcs_link_up: program the PCS for the resolved link configuration * (where necessary). + * @pcs_disable_eee: optional notification to PCS that EEE has been disabled + * at the MAC. + * @pcs_enable_eee: optional notification to PCS that EEE will be enabled at + * the MAC. 
* @pcs_pre_init: configure PCS components necessary for MAC hardware * initialization e.g. RX clock for stmmac. */ @@ -500,6 +504,8 @@ struct phylink_pcs_ops { void (*pcs_an_restart)(struct phylink_pcs *pcs); void (*pcs_link_up)(struct phylink_pcs *pcs, unsigned int neg_mode, phy_interface_t interface, int speed, int duplex); + void (*pcs_disable_eee)(struct phylink_pcs *pcs); + void (*pcs_enable_eee)(struct phylink_pcs *pcs); int (*pcs_pre_init)(struct phylink_pcs *pcs); }; @@ -625,6 +631,22 @@ void pcs_an_restart(struct phylink_pcs *pcs); void pcs_link_up(struct phylink_pcs *pcs, unsigned int neg_mode, phy_interface_t interface, int speed, int duplex); +/** + * pcs_disable_eee() - Disable EEE at the PCS + * @pcs: a pointer to a &struct phylink_pcs + * + * Optional method informing the PCS that EEE has been disabled at the MAC. + */ +void pcs_disable_eee(struct phylink_pcs *pcs); + +/** + * pcs_enable_eee() - Enable EEE at the PCS + * @pcs: a pointer to a &struct phylink_pcs + * + * Optional method informing the PCS that EEE is about to be enabled at the MAC. + */ +void pcs_enable_eee(struct phylink_pcs *pcs); + /** * pcs_pre_init() - Configure PCS components necessary for MAC initialization * @pcs: a pointer to a &struct phylink_pcs. -- cgit v1.2.3 From 8c841486674a43df9db08210232987cf039b8942 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 10 Feb 2025 10:53:44 +0000 Subject: net: xpcs: add function to configure EEE clock multiplying factor Add a function to separate out the EEE clock multiplying factor. This will be called by the stmmac driver to configure this value. It would have been better had the driver used the CLK API to retrieve this clock, get its rate and calculate the appropriate multiplier, but that door has closed. 
Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1thRQ8-003w70-VT@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/pcs/pcs-xpcs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/pcs/pcs-xpcs.h b/include/linux/pcs/pcs-xpcs.h index 733f4ddd2ef1..749d40a9a086 100644 --- a/include/linux/pcs/pcs-xpcs.h +++ b/include/linux/pcs/pcs-xpcs.h @@ -52,6 +52,7 @@ struct phylink_pcs *xpcs_to_phylink_pcs(struct dw_xpcs *xpcs); int xpcs_get_an_mode(struct dw_xpcs *xpcs, phy_interface_t interface); int xpcs_config_eee(struct dw_xpcs *xpcs, int mult_fact_100ns, int enable); +void xpcs_config_eee_mult_fact(struct dw_xpcs *xpcs, u8 mult_fact); struct dw_xpcs *xpcs_create_mdiodev(struct mii_bus *bus, int addr); struct dw_xpcs *xpcs_create_fwnode(struct fwnode_handle *fwnode); void xpcs_destroy(struct dw_xpcs *xpcs); -- cgit v1.2.3 From 55faeb89968aab5a32f0de93cd97fc1c4169d0f7 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 10 Feb 2025 10:54:05 +0000 Subject: net: xpcs: remove xpcs_config_eee() from global scope Make xpcs_config_eee() private to the XPCS driver, called only from the phylink pcs_disable_eee() and pcs_enable_eee() methods. 
Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1thRQT-003w7O-Ec@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/pcs/pcs-xpcs.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pcs/pcs-xpcs.h b/include/linux/pcs/pcs-xpcs.h index 749d40a9a086..e40f554ff717 100644 --- a/include/linux/pcs/pcs-xpcs.h +++ b/include/linux/pcs/pcs-xpcs.h @@ -50,8 +50,6 @@ struct dw_xpcs; struct phylink_pcs *xpcs_to_phylink_pcs(struct dw_xpcs *xpcs); int xpcs_get_an_mode(struct dw_xpcs *xpcs, phy_interface_t interface); -int xpcs_config_eee(struct dw_xpcs *xpcs, int mult_fact_100ns, - int enable); void xpcs_config_eee_mult_fact(struct dw_xpcs *xpcs, u8 mult_fact); struct dw_xpcs *xpcs_create_mdiodev(struct mii_bus *bus, int addr); struct dw_xpcs *xpcs_create_fwnode(struct fwnode_handle *fwnode); -- cgit v1.2.3 From ea47e70e476ffb3fc8969c842d95609da24266b1 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Thu, 13 Feb 2025 22:48:11 +0100 Subject: net: phy: remove fixup-related definitions from phy.h which are not used outside phylib Certain fixup-related definitions aren't used outside phy_device.c. So make them private and remove them from phy.h. 
Signed-off-by: Heiner Kallweit Reviewed-by: Russell King (Oracle) Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/ea6fde13-9183-4c7c-8434-6c0eb64fc72c@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 14 -------------- 1 file changed, 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 64982eba71d1..245578ed710e 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1287,9 +1287,6 @@ struct phy_driver { #define to_phy_driver(d) container_of_const(to_mdio_common_driver(d), \ struct phy_driver, mdiodrv) -#define PHY_ANY_ID "MATCH ANY PHY" -#define PHY_ANY_UID 0xffffffff - #define PHY_ID_MATCH_EXACT(id) .phy_id = (id), .phy_id_mask = GENMASK(31, 0) #define PHY_ID_MATCH_MODEL(id) .phy_id = (id), .phy_id_mask = GENMASK(31, 4) #define PHY_ID_MATCH_VENDOR(id) .phy_id = (id), .phy_id_mask = GENMASK(31, 10) @@ -1322,15 +1319,6 @@ static inline bool phydev_id_compare(struct phy_device *phydev, u32 id) return phy_id_compare(id, phydev->phy_id, phydev->drv->phy_id_mask); } -/* A Structure for boards to register fixups with the PHY Lib */ -struct phy_fixup { - struct list_head list; - char bus_id[MII_BUS_ID_SIZE + 3]; - u32 phy_uid; - u32 phy_uid_mask; - int (*run)(struct phy_device *phydev); -}; - const char *phy_speed_to_str(int speed); const char *phy_duplex_to_str(unsigned int duplex); const char *phy_rate_matching_to_str(int rate_matching); @@ -2127,8 +2115,6 @@ s32 phy_get_internal_delay(struct phy_device *phydev, struct device *dev, void phy_resolve_pause(unsigned long *local_adv, unsigned long *partner_adv, bool *tx_pause, bool *rx_pause); -int phy_register_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask, - int (*run)(struct phy_device *)); int phy_register_fixup_for_id(const char *bus_id, int (*run)(struct phy_device *)); int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask, -- cgit v1.2.3 From d3a0e217f850a768851974a6efbd70f5673bb584 Mon Sep 17 00:00:00 2001 From: 
Heiner Kallweit Date: Thu, 13 Feb 2025 22:49:19 +0100 Subject: net: phy: stop exporting feature arrays which aren't used outside phylib Stop exporting feature arrays which aren't used outside phylib. Signed-off-by: Heiner Kallweit Reviewed-by: Mateusz Polchlopek Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/01886672-4880-4ca8-b7b0-94d40f6e0ec5@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 245578ed710e..104c0b489991 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -54,11 +54,6 @@ extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_eee_cap2_features) __ro_after_init; #define PHY_EEE_CAP2_FEATURES ((unsigned long *)&phy_eee_cap2_features) extern const int phy_basic_ports_array[3]; -extern const int phy_10_100_features_array[4]; -extern const int phy_basic_t1_features_array[3]; -extern const int phy_basic_t1s_p2mp_features_array[2]; -extern const int phy_gbit_features_array[2]; -extern const int phy_10gbit_features_array[1]; /* * Set phydev->irq to PHY_POLL if interrupts are not supported, -- cgit v1.2.3 From ef6249e37df5eac72bacdfe0a3000b08ae153146 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Thu, 13 Feb 2025 22:50:02 +0100 Subject: net: phy: stop exporting phy_queue_state_machine phy_queue_state_machine() isn't used outside phy.c, so stop exporting it. 
Signed-off-by: Heiner Kallweit Reviewed-by: Mateusz Polchlopek Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/16986d3d-7baf-4b02-a641-e2916d491264@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 104c0b489991..2ab83d24a573 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -2071,7 +2071,6 @@ int phy_drivers_register(struct phy_driver *new_driver, int n, struct module *owner); void phy_error(struct phy_device *phydev); void phy_state_machine(struct work_struct *work); -void phy_queue_state_machine(struct phy_device *phydev, unsigned long jiffies); void phy_trigger_machine(struct phy_device *phydev); void phy_mac_interrupt(struct phy_device *phydev); void phy_start_machine(struct phy_device *phydev); -- cgit v1.2.3 From 6b2edfba74696e0defd181989c9effe911e6f54f Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Thu, 13 Feb 2025 22:51:53 +0100 Subject: net: phy: remove helper phy_is_internal Helper phy_is_internal() is just used in two places phylib-internally. So let's remove it from the API. 
Signed-off-by: Heiner Kallweit Reviewed-by: Mateusz Polchlopek Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/f3f35265-80a9-4ed7-ad78-ae22c21e288b@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 9 --------- 1 file changed, 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 2ab83d24a573..3665cdd610a3 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1739,15 +1739,6 @@ static inline bool phy_is_default_hwtstamp(struct phy_device *phydev) return phy_has_hwtstamp(phydev) && phydev->default_timestamp; } -/** - * phy_is_internal - Convenience function for testing if a PHY is internal - * @phydev: the phy_device struct - */ -static inline bool phy_is_internal(struct phy_device *phydev) -{ - return phydev->is_internal; -} - /** * phy_on_sfp - Convenience function for testing if a PHY is on an SFP module * @phydev: the phy_device struct -- cgit v1.2.3 From de38503b74e28c47e28ed800d2a8d12c713b2c63 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 13 Feb 2025 17:54:19 +0000 Subject: net: remove phylink_pcs .neg_mode boolean As all PCS are using the neg_mode parameter rather than the legacy an_mode, remove the ability to use the legacy an_mode. We remove the tests in the phylink code, unconditionally passing the PCS neg_mode parameter to PCS methods, and remove setting the flag from drivers. Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1tidPn-0040hd-2R@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/phylink.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 41ab85e591ad..08df65f6867a 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -442,7 +442,6 @@ struct phylink_pcs_ops; * are supported by this PCS. 
* @ops: a pointer to the &struct phylink_pcs_ops structure * @phylink: pointer to &struct phylink_config - * @neg_mode: provide PCS neg mode via "mode" argument * @poll: poll the PCS for link changes * @rxc_always_on: The MAC driver requires the reference clock * to always be on. Standalone PCS drivers which @@ -459,7 +458,6 @@ struct phylink_pcs { DECLARE_PHY_INTERFACE_MASK(supported_interfaces); const struct phylink_pcs_ops *ops; struct phylink *phylink; - bool neg_mode; bool poll; bool rxc_always_on; }; -- cgit v1.2.3 From 4a6f18f28627e121bd1f74b5fcc9f945d6dbeb1e Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 10 Feb 2025 09:45:05 -0800 Subject: net/mlx4_core: Avoid impossible mlx4_db_alloc() order value GCC can see that the value range for "order" is capped, but this leads it to consider that it might be negative, leading to a false positive warning (with GCC 15 with -Warray-bounds -fdiagnostics-details): ../drivers/net/ethernet/mellanox/mlx4/alloc.c:691:47: error: array subscript -1 is below array bounds of 'long unsigned int *[2]' [-Werror=array-bounds=] 691 | i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); | ~~~~~~~~~~~^~~ 'mlx4_alloc_db_from_pgdir': events 1-2 691 | i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | | | | (2) out of array bounds here | (1) when the condition is evaluated to true In file included from ../drivers/net/ethernet/mellanox/mlx4/mlx4.h:53, from ../drivers/net/ethernet/mellanox/mlx4/alloc.c:42: ../include/linux/mlx4/device.h:664:33: note: while referencing 'bits' 664 | unsigned long *bits[2]; | ^~~~ Switch the argument to unsigned int, which removes the compiler needing to consider negative values. 
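A simplified user-space sketch of the idea, under the assumption that the allocator walks a two-entry array starting at the requested order. The names pgdir_sketch, first_free_order() and first_free_order_example() are invented for this illustration and are not the mlx4 code. With an unsigned order, the array index cannot be below zero, so only the upper bound remains to be checked.

```c
#include <assert.h>

#define NUM_ORDERS 2

/* Stand-in for the two per-order bitmaps in the page directory. */
struct pgdir_sketch {
	unsigned long bits[NUM_ORDERS];
};

/*
 * Hypothetical lookup: return the first order >= 'order' that still has a
 * free slot (a non-zero bitmap), or -1 if there is none. Because 'order'
 * is unsigned, 'o' never indexes below bits[0].
 */
static int first_free_order(const struct pgdir_sketch *pgdir,
			    unsigned int order)
{
	unsigned int o;

	for (o = order; o < NUM_ORDERS; o++) {
		if (pgdir->bits[o] != 0)
			return (int)o;
	}
	return -1;
}

/* Usage example with only the order-1 bitmap populated. */
void first_free_order_example(void)
{
	struct pgdir_sketch pgdir = { { 0UL, 4UL } };

	assert(first_free_order(&pgdir, 0) == 1);
	assert(first_free_order(&pgdir, 1) == 1);
	assert(first_free_order(&pgdir, 2) == -1);
}
```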
Signed-off-by: Kees Cook Link: https://patch.msgid.link/20250210174504.work.075-kees@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx4/device.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 87edb7a8173b..f016263e1fcf 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -1135,7 +1135,7 @@ int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, struct mlx4_buf *buf); -int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order); +int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, unsigned int order); void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db); int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, -- cgit v1.2.3 From 961ee5aeea048aa292f28d61f3a96a48554e91af Mon Sep 17 00:00:00 2001 From: Dimitri Fedrau Date: Fri, 14 Feb 2025 15:14:10 +0100 Subject: net: phy: Add helper for getting tx amplitude gain Add helper which returns the tx amplitude gain defined in device tree. Modifying it can be necessary to compensate losses on the PCB and connector, so the voltages measured on the RJ45 pins are conforming. 
Signed-off-by: Dimitri Fedrau Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/20250214-dp83822-tx-swing-v5-2-02ca72620599@liebherr.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 3665cdd610a3..70c632799082 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -2097,6 +2097,10 @@ void phy_get_pause(struct phy_device *phydev, bool *tx_pause, bool *rx_pause); s32 phy_get_internal_delay(struct phy_device *phydev, struct device *dev, const int *delay_values, int size, bool is_rx); +int phy_get_tx_amplitude_gain(struct phy_device *phydev, struct device *dev, + enum ethtool_link_mode_bit_indices linkmode, + u32 *val); + void phy_resolve_pause(unsigned long *local_adv, unsigned long *partner_adv, bool *tx_pause, bool *rx_pause); -- cgit v1.2.3 From 8a6a77bb5a41d5a91c6155e1b902d9f75b5bf9a6 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Sun, 16 Feb 2025 22:15:42 +0100 Subject: net: phy: move definition of phy_is_started before phy_disable_eee_mode In preparation of a follow-up patch, move phy_is_started() to before phy_disable_eee_mode(). Signed-off-by: Heiner Kallweit Reviewed-by: Andrew Lunn Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/04d1e7a5-f4c0-42ab-8fa4-88ad26b74813@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 70c632799082..96ea56de71b2 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1340,22 +1340,22 @@ void of_set_phy_timing_role(struct phy_device *phydev); int phy_speed_down_core(struct phy_device *phydev); /** - * phy_disable_eee_mode - Don't advertise an EEE mode. 
+ * phy_is_started - Convenience function to check whether PHY is started * @phydev: The phy_device struct - * @link_mode: The EEE mode to be disabled */ -static inline void phy_disable_eee_mode(struct phy_device *phydev, u32 link_mode) +static inline bool phy_is_started(struct phy_device *phydev) { - linkmode_set_bit(link_mode, phydev->eee_disabled_modes); + return phydev->state >= PHY_UP; } /** - * phy_is_started - Convenience function to check whether PHY is started + * phy_disable_eee_mode - Don't advertise an EEE mode. * @phydev: The phy_device struct + * @link_mode: The EEE mode to be disabled */ -static inline bool phy_is_started(struct phy_device *phydev) +static inline void phy_disable_eee_mode(struct phy_device *phydev, u32 link_mode) { - return phydev->state >= PHY_UP; + linkmode_set_bit(link_mode, phydev->eee_disabled_modes); } void phy_resolve_aneg_pause(struct phy_device *phydev); -- cgit v1.2.3 From a9b6a860d7789d8183530aedbb46cf70f843e40d Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Sun, 16 Feb 2025 22:16:34 +0100 Subject: net: phy: improve phy_disable_eee_mode If a mode is to be disabled, remove it from advertising_eee. Disabling EEE modes shall be done before calling phy_start(), warn if that's not the case. 
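The intended behaviour can be sketched in user space as follows. This is only a model: phy_sketch, disable_eee_mode() and disable_eee_mode_example() are hypothetical names, the link mode bitmaps are reduced to a single 64-bit word, and the WARN_ON() is replaced by a message on stderr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Minimal stand-in for the relevant phy_device fields. */
struct phy_sketch {
	bool started;
	uint64_t eee_disabled_modes;
	uint64_t advertising_eee;
};

/*
 * Model of the improved helper: mark the mode as disabled and also drop it
 * from the advertised EEE modes. 'link_mode' must be below 64 in this model.
 */
static void disable_eee_mode(struct phy_sketch *phy, unsigned int link_mode)
{
	uint64_t bit = UINT64_C(1) << link_mode;

	/* Stand-in for WARN_ON(): complain, but carry on. */
	if (phy->started)
		fprintf(stderr, "warning: EEE mode disabled after start\n");

	phy->eee_disabled_modes |= bit;
	phy->advertising_eee &= ~bit;
}

/* Usage example: modes 1 and 2 are advertised, mode 1 gets disabled. */
void disable_eee_mode_example(void)
{
	struct phy_sketch phy = { false, 0, 0x6 };

	disable_eee_mode(&phy, 1);
	assert(phy.eee_disabled_modes == 0x2);
	assert(phy.advertising_eee == 0x4);
}
```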
Signed-off-by: Heiner Kallweit Reviewed-by: Andrew Lunn Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/92164896-38ff-4474-b98b-e83fc05b9509@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 96ea56de71b2..0d5da01d275c 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1355,7 +1355,10 @@ static inline bool phy_is_started(struct phy_device *phydev) */ static inline void phy_disable_eee_mode(struct phy_device *phydev, u32 link_mode) { + WARN_ON(phy_is_started(phydev)); + linkmode_set_bit(link_mode, phydev->eee_disabled_modes); + linkmode_clear_bit(link_mode, phydev->advertising_eee); } void phy_resolve_aneg_pause(struct phy_device *phydev); -- cgit v1.2.3 From 809265fe96fe3eb7a85a9260356767587c482cb7 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Sun, 16 Feb 2025 22:20:07 +0100 Subject: net: phy: c45: remove local advertisement parameter from genphy_c45_eee_is_active After the last user has gone, we can remove the local advertisement parameter from genphy_c45_eee_is_active. 
Signed-off-by: Heiner Kallweit Reviewed-by: Andrew Lunn Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/bd121330-9e28-4bc8-8422-794bd54d561f@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 0d5da01d275c..584710e084eb 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -2032,8 +2032,7 @@ int genphy_c45_plca_set_cfg(struct phy_device *phydev, const struct phy_plca_cfg *plca_cfg); int genphy_c45_plca_get_status(struct phy_device *phydev, struct phy_plca_status *plca_st); -int genphy_c45_eee_is_active(struct phy_device *phydev, unsigned long *adv, - unsigned long *lp); +int genphy_c45_eee_is_active(struct phy_device *phydev, unsigned long *lp); int genphy_c45_ethtool_get_eee(struct phy_device *phydev, struct ethtool_keee *data); int genphy_c45_ethtool_set_eee(struct phy_device *phydev, -- cgit v1.2.3 From ac9a8587edc78f3a66fdf6a99973ef151cbff72a Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Tue, 18 Feb 2025 10:24:39 +0000 Subject: net: stmmac: "speed" passed to fix_mac_speed is an int priv->plat->fix_mac_speed() is called from stmmac_mac_link_up(), which is passed the speed as an "int". However, fix_mac_speed() implicitly casts this to an unsigned int. Some platform glue code print this value using %u, others with %d. Some implicitly cast it back to an int, and others to u32. Good practice is to use one type and only one type to represent a value being passed around a driver. Switch all of these over to consistently use "int" when dealing with a speed passed from stmmac_mac_link_up(), even though the speed will always be positive. 
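For illustration, a platform glue callback that keeps the speed as an int from end to end might look like the sketch below. Everything here is hypothetical: glue_sketch, glue_fix_mac_speed() and the local SPEED_* constants are invented for the example, and the clock rates are the usual RGMII transmit clock values, not taken from any particular driver.

```c
#include <assert.h>

#define SPEED_10	10
#define SPEED_100	100
#define SPEED_1000	1000

/* Callback type mirroring the new prototype: speed is a plain int. */
typedef void (*fix_mac_speed_fn)(void *priv, int speed, unsigned int mode);

/* Hypothetical per-platform state. */
struct glue_sketch {
	long tx_clk_rate_hz;
};

/*
 * Hypothetical glue callback: map the link speed to a transmit clock rate.
 * Unknown speeds leave the rate unchanged. No cast of 'speed' is needed.
 */
static void glue_fix_mac_speed(void *priv, int speed, unsigned int mode)
{
	struct glue_sketch *glue = priv;

	(void)mode;	/* unused in this sketch */

	switch (speed) {
	case SPEED_1000:
		glue->tx_clk_rate_hz = 125000000;
		break;
	case SPEED_100:
		glue->tx_clk_rate_hz = 25000000;
		break;
	case SPEED_10:
		glue->tx_clk_rate_hz = 2500000;
		break;
	default:
		break;
	}
}

/* Usage example: call through the function pointer, as the core would. */
void glue_fix_mac_speed_example(void)
{
	struct glue_sketch glue = { 0 };
	fix_mac_speed_fn fn = glue_fix_mac_speed;

	fn(&glue, SPEED_1000, 0);
	assert(glue.tx_clk_rate_hz == 125000000);
}
```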
Signed-off-by: Russell King (Oracle) Acked-by: Chen-Yu Tsai Reviewed-by: Andrew Lunn Acked-by: Nobuhiro Iwamatsu Link: https://patch.msgid.link/E1tkKmN-004ObM-Ge@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 24422ac4e417..6d2aa77ea963 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -231,7 +231,7 @@ struct plat_stmmacenet_data { u8 tx_sched_algorithm; struct stmmac_rxq_cfg rx_queues_cfg[MTL_MAX_RX_QUEUES]; struct stmmac_txq_cfg tx_queues_cfg[MTL_MAX_TX_QUEUES]; - void (*fix_mac_speed)(void *priv, unsigned int speed, unsigned int mode); + void (*fix_mac_speed)(void *priv, int speed, unsigned int mode); int (*fix_soc_reset)(void *priv, void __iomem *ioaddr); int (*serdes_powerup)(struct net_device *ndev, void *priv); void (*serdes_powerdown)(struct net_device *ndev, void *priv); -- cgit v1.2.3 From fd93eaffb3f977b23bc0a48d4c8616e654fcf133 Mon Sep 17 00:00:00 2001 From: Jason Xing Date: Thu, 20 Feb 2025 15:29:31 +0800 Subject: bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback The subsequent patch will implement BPF TX timestamping. It will call the sockops BPF program without holding the sock lock. This breaks the current assumption that all sock ops programs will hold the sock lock. The sock's fields of the uapi's bpf_sock_ops requires this assumption. To address this, a new "u8 is_locked_tcp_sock;" field is added. This patch sets it in the current sock_ops callbacks. The "is_fullsock" test is then replaced by the "is_locked_tcp_sock" test during sock_ops_convert_ctx_access(). The new TX timestamping callbacks added in the subsequent patch will not have this set. This will prevent unsafe access from the new timestamping callbacks. Potentially, we could allow read-only access. 
However, this would require identifying which callbacks are read-safe and would also require additional BPF instruction rewrites in convert_ctx. Since the BPF program can always read everything from a socket (e.g., by using bpf_core_cast), this patch keeps it simple and disables all read and write access to any socket fields through the bpf_sock_ops UAPI from the new TX timestamping callback. Moreover, note that some of the fields in bpf_sock_ops are specific to tcp_sock, and sock_ops currently only supports tcp_sock. In the future, UDP timestamping will be added, which will also break this assumption. The same idea used in this patch will be reused. Considering that the current sock_ops only supports tcp_sock, the variable is named is_locked_"tcp"_sock. Signed-off-by: Jason Xing Signed-off-by: Martin KaFai Lau Link: https://patch.msgid.link/20250220072940.99994-4-kerneljasonxing@gmail.com --- include/linux/filter.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index a3ea46281595..d36d5d5180b1 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1508,6 +1508,7 @@ struct bpf_sock_ops_kern { void *skb_data_end; u8 op; u8 is_fullsock; + u8 is_locked_tcp_sock; u8 remaining_opt_len; u64 temp; /* temp and everything after is not * initialized to 0 before calling -- cgit v1.2.3 From 6b98ec7e882af1c3088a88757e2226d06c8514f9 Mon Sep 17 00:00:00 2001 From: Jason Xing Date: Thu, 20 Feb 2025 15:29:34 +0800 Subject: bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback Support SCM_TSTAMP_SCHED case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SCHED_CB. This callback will occur at the same timestamping point as the user space's SCM_TSTAMP_SCHED. The BPF program can use it to get the same SCM_TSTAMP_SCHED timestamp without modifying the user-space application.
A new SKBTX_BPF flag is added to mark skb_shinfo(skb)->tx_flags, ensuring that the new BPF timestamping and the current user space's SO_TIMESTAMPING do not interfere with each other. Signed-off-by: Jason Xing Signed-off-by: Martin KaFai Lau Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20250220072940.99994-7-kerneljasonxing@gmail.com --- include/linux/skbuff.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index bb2b751d274a..52f6e033e704 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -489,10 +489,14 @@ enum { /* generate software time stamp when entering packet scheduling */ SKBTX_SCHED_TSTAMP = 1 << 6, + + /* used for bpf extension when a bpf program is loaded */ + SKBTX_BPF = 1 << 7, }; #define SKBTX_ANY_SW_TSTAMP (SKBTX_SW_TSTAMP | \ - SKBTX_SCHED_TSTAMP) + SKBTX_SCHED_TSTAMP | \ + SKBTX_BPF) #define SKBTX_ANY_TSTAMP (SKBTX_HW_TSTAMP | \ SKBTX_HW_TSTAMP_USE_CYCLES | \ SKBTX_ANY_SW_TSTAMP) -- cgit v1.2.3 From ecebb17ad818bc043e558c278a6c56d5bbaebacc Mon Sep 17 00:00:00 2001 From: Jason Xing Date: Thu, 20 Feb 2025 15:29:35 +0800 Subject: bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback Support sw SCM_TSTAMP_SND case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_SW_CB. This callback will occur at the same timestamping point as the user space's software SCM_TSTAMP_SND. The BPF program can use it to get the same SCM_TSTAMP_SND timestamp without modifying the user-space application. Based on this patch, the BPF program will get the software timestamp when the driver is ready to send the skb. In the subsequent patch, the hardware timestamp will be supported.
Signed-off-by: Jason Xing Signed-off-by: Martin KaFai Lau Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20250220072940.99994-8-kerneljasonxing@gmail.com --- include/linux/skbuff.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 52f6e033e704..76582500c5ea 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4568,7 +4568,7 @@ void skb_tstamp_tx(struct sk_buff *orig_skb, static inline void skb_tx_timestamp(struct sk_buff *skb) { skb_clone_tx_timestamp(skb); - if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP) + if (skb_shinfo(skb)->tx_flags & (SKBTX_SW_TSTAMP | SKBTX_BPF)) skb_tstamp_tx(skb, NULL); } -- cgit v1.2.3 From 2deaf7f42b8c551e84da20483ca2d4a65c3623b3 Mon Sep 17 00:00:00 2001 From: Jason Xing Date: Thu, 20 Feb 2025 15:29:36 +0800 Subject: bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback Support hw SCM_TSTAMP_SND case for bpf timestamping. Add a new sock_ops callback, BPF_SOCK_OPS_TSTAMP_SND_HW_CB. This callback will occur at the same timestamping point as the user space's hardware SCM_TSTAMP_SND. The BPF program can use it to get the same SCM_TSTAMP_SND timestamp without modifying the user-space application. To avoid increasing the code complexity, replace SKBTX_HW_TSTAMP with SKBTX_HW_TSTAMP_NOBPF instead of changing numerous callers from driver side using SKBTX_HW_TSTAMP. The new definition of SKBTX_HW_TSTAMP means the combination tests of socket timestamping and bpf timestamping. After this patch, drivers can work under the bpf timestamping. Considering some drivers don't assign the skb with hardware timestamp, this patch does the assignment and then BPF program can acquire the hwstamp from skb directly. 
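The mask trick described above can be sketched in a few lines of standalone user-space C. This is only an illustration, not kernel code: the bit values are copied from the skbuff.h hunks in this series, while the helper wants_hw_tstamp() is a hypothetical stand-in for the kind of test an unmodified driver performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as they appear in the skbuff.h hunks of this series. */
enum {
	SKBTX_HW_TSTAMP_NOBPF = 1 << 0, /* hardware timestamp requested by a socket */
	SKBTX_SW_TSTAMP       = 1 << 1, /* software timestamp requested by a socket */
	SKBTX_BPF             = 1 << 7, /* timestamping requested by a BPF program */
};

/* After the rename, the old name becomes a two-bit mask. */
#define SKBTX_HW_TSTAMP (SKBTX_HW_TSTAMP_NOBPF | SKBTX_BPF)

/*
 * Hypothetical helper (not a kernel function): a driver that keeps testing
 * "tx_flags & SKBTX_HW_TSTAMP" now matches socket and BPF requests alike.
 */
static bool wants_hw_tstamp(uint8_t tx_flags)
{
	return (tx_flags & SKBTX_HW_TSTAMP) != 0;
}
```

With these definitions the helper returns true for either SKBTX_HW_TSTAMP_NOBPF or SKBTX_BPF and false for SKBTX_SW_TSTAMP alone, which is why the driver call sites did not have to be touched.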
Signed-off-by: Jason Xing Signed-off-by: Martin KaFai Lau Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20250220072940.99994-9-kerneljasonxing@gmail.com --- include/linux/skbuff.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 76582500c5ea..0b4f1889500d 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -470,7 +470,7 @@ struct skb_shared_hwtstamps { /* Definitions for tx_flags in struct skb_shared_info */ enum { /* generate hardware time stamp */ - SKBTX_HW_TSTAMP = 1 << 0, + SKBTX_HW_TSTAMP_NOBPF = 1 << 0, /* generate software time stamp when queueing packet to NIC */ SKBTX_SW_TSTAMP = 1 << 1, @@ -494,6 +494,8 @@ enum { SKBTX_BPF = 1 << 7, }; +#define SKBTX_HW_TSTAMP (SKBTX_HW_TSTAMP_NOBPF | SKBTX_BPF) + #define SKBTX_ANY_SW_TSTAMP (SKBTX_SW_TSTAMP | \ SKBTX_SCHED_TSTAMP | \ SKBTX_BPF) -- cgit v1.2.3 From bb3bb6c92e5719c0f5d7adb9d34db7e76705ac33 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Wed, 19 Feb 2025 21:15:05 +0100 Subject: net: phy: remove unused feature array declarations After 12d5151be010 ("net: phy: remove leftovers from switch to linkmode bitmaps") the following declarations are unused and can be removed too. 
Signed-off-by: Heiner Kallweit Reviewed-by: Michal Swiatkowski Reviewed-by: Mateusz Polchlopek Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/b2883c75-4108-48f2-ab73-e81647262bc2@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 584710e084eb..13be48d3b8b3 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -37,10 +37,7 @@ extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_basic_t1_features) __ro_after_init; extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_basic_t1s_p2mp_features) __ro_after_init; extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_gbit_features) __ro_after_init; extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_gbit_fibre_features) __ro_after_init; -extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_gbit_all_ports_features) __ro_after_init; extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_10gbit_features) __ro_after_init; -extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_10gbit_fec_features) __ro_after_init; -extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_10gbit_full_features) __ro_after_init; extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_eee_cap1_features) __ro_after_init; extern __ETHTOOL_DECLARE_LINK_MODE_MASK(phy_eee_cap2_features) __ro_after_init; -- cgit v1.2.3 From 69c7be1b903fca2835e80ec506bd1d75ce84fb4d Mon Sep 17 00:00:00 2001 From: Xiao Liang Date: Wed, 19 Feb 2025 20:50:28 +0800 Subject: rtnetlink: Pack newlink() params into struct There are 4 net namespaces involved when creating links: - source netns - where the netlink socket resides, - target netns - where to put the device being created, - link netns - netns associated with the device (backend), - peer netns - netns of peer device. Currently, two nets are passed to newlink() callback - "src_net" parameter and "dev_net" (implicitly in net_device). They are set as follows, depending on netlink attributes in the request. 
+------------+-------------------+---------+---------+
| peer netns | IFLA_LINK_NETNSID | src_net | dev_net |
+------------+-------------------+---------+---------+
|            | absent            | source  | target  |
| absent     +-------------------+---------+---------+
|            | present           | link    | link    |
+------------+-------------------+---------+---------+
|            | absent            | peer    | target  |
| present    +-------------------+---------+---------+
|            | present           | peer    | link    |
+------------+-------------------+---------+---------+

When IFLA_LINK_NETNSID is present, the device is created in link netns first and then moved to target netns. This has some side effects, including extra ifindex allocation, ifname validation and link events. These could be avoided if we create it in target netns from the beginning.

On the other hand, the meaning of the src_net parameter is ambiguous. It varies depending on how parameters are passed. It is the effective link (or peer) netns by design, but some drivers ignore it and use dev_net instead.

To provide more netns context for drivers, this patch packs existing newlink() parameters, along with the source netns, link netns and peer netns, into a struct. The old "src_net" is renamed to "net" to avoid confusion with real source netns, and will be deprecated later. Uses of src_net are converted to params->net trivially.
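To make the table in this commit message easier to check, here is a small standalone sketch that encodes it as a function. It is not kernel code: the enum, the struct and resolve_old_newlink_nets() are hypothetical names invented for illustration, and only the mapping itself is taken from the table.

```c
#include <stdbool.h>

/* The four namespaces named in the commit message. */
enum netns_role { NETNS_SOURCE, NETNS_TARGET, NETNS_LINK, NETNS_PEER };

/* Hypothetical container for the two values the old newlink() could see. */
struct old_newlink_nets {
	enum netns_role src_net; /* the explicit "src_net" argument */
	enum netns_role dev_net; /* the netns implied by the net_device */
};

/*
 * Encodes the table: a peer netns always wins for src_net; otherwise
 * IFLA_LINK_NETNSID selects the link netns, else the source netns.
 * dev_net is the link netns when IFLA_LINK_NETNSID is present, else target.
 */
static struct old_newlink_nets
resolve_old_newlink_nets(bool has_peer_netns, bool has_link_netnsid)
{
	struct old_newlink_nets n;

	if (has_peer_netns)
		n.src_net = NETNS_PEER;
	else
		n.src_net = has_link_netnsid ? NETNS_LINK : NETNS_SOURCE;

	n.dev_net = has_link_netnsid ? NETNS_LINK : NETNS_TARGET;
	return n;
}
```

Written this way, the ambiguity the message complains about is visible: src_net can mean three different namespaces depending on which attributes the request carried.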
Signed-off-by: Xiao Liang Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/if_macvlan.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h index 523025106a64..0f7281e3e448 100644 --- a/include/linux/if_macvlan.h +++ b/include/linux/if_macvlan.h @@ -59,8 +59,10 @@ static inline void macvlan_count_rx(const struct macvlan_dev *vlan, extern void macvlan_common_setup(struct net_device *dev); -extern int macvlan_common_newlink(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[], +struct rtnl_newlink_params; + +extern int macvlan_common_newlink(struct net_device *dev, + struct rtnl_newlink_params *params, struct netlink_ext_ack *extack); extern void macvlan_dellink(struct net_device *dev, struct list_head *head); -- cgit v1.2.3 From dcc35baae732b9079b2c6595cfd86da02b34a4e6 Mon Sep 17 00:00:00 2001 From: Jeremy Kerr Date: Fri, 21 Feb 2025 08:56:57 +0800 Subject: usb: Add base USB MCTP definitions Upcoming changes will add a USB host (and later gadget) driver for the MCTP-over-USB protocol. Add a header that provides common definitions for protocol support: the packet header format and a few framing definitions. Add a define for the MCTP class code, as per https://usb.org/defined-class-codes. 
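For readers unfamiliar with the framing, the following user-space sketch builds one MCTP-over-USB frame from the definitions this patch introduces (4-byte header, big-endian id, MCTP_USB_DMTF_ID, MCTP_USB_MTU_MAX). It is an illustration under assumptions: mctp_usb_build_frame() and MCTP_USB_HDR_LEN are hypothetical names, and the sketch assumes that the len field counts the header plus the payload, which is what the MCTP_USB_MTU_MAX definition suggests but which should be confirmed against the specification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values taken from the mctp-usb.h header added by this patch. */
#define MCTP_USB_BTU      68
#define MCTP_USB_DMTF_ID  0x1ab4

/* Hypothetical name: size of struct mctp_usb_hdr (id 2 + rsvd 1 + len 1). */
#define MCTP_USB_HDR_LEN  4
/* Same arithmetic as the patch: U8_MAX minus the header size. */
#define MCTP_USB_MTU_MAX  (UINT8_MAX - MCTP_USB_HDR_LEN)

/*
 * Hypothetical helper: writes header + payload into out.
 * Returns the total frame length, or 0 if the payload does not fit.
 */
static size_t mctp_usb_build_frame(uint8_t *out, size_t out_size,
				   const uint8_t *payload, size_t payload_len)
{
	size_t total = MCTP_USB_HDR_LEN + payload_len;

	if (payload_len > MCTP_USB_MTU_MAX || out_size < total)
		return 0;

	out[0] = MCTP_USB_DMTF_ID >> 8;   /* __be16 id: high byte first */
	out[1] = MCTP_USB_DMTF_ID & 0xff;
	out[2] = 0;                       /* rsvd */
	out[3] = (uint8_t)total;          /* len: assumed to include the header */
	memcpy(out + MCTP_USB_HDR_LEN, payload, payload_len);
	return total;
}
```

The 8-bit len field is what caps the MTU: with a 4-byte header, the largest payload that still fits in a length of 255 is 251 bytes.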
Signed-off-by: Jeremy Kerr Acked-by: Greg Kroah-Hartman Link: https://patch.msgid.link/20250221-dev-mctp-usb-v3-1-3353030fe9cc@codeconstruct.com.au Signed-off-by: Jakub Kicinski --- include/linux/usb/mctp-usb.h | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) create mode 100644 include/linux/usb/mctp-usb.h (limited to 'include/linux') diff --git a/include/linux/usb/mctp-usb.h b/include/linux/usb/mctp-usb.h new file mode 100644 index 000000000000..a2f6f1e04efb --- /dev/null +++ b/include/linux/usb/mctp-usb.h @@ -0,0 +1,30 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * mctp-usb.h - MCTP USB transport binding: common definitions, + * based on DMTF0283 specification: + * https://www.dmtf.org/sites/default/files/standards/documents/DSP0283_1.0.1.pdf + * + * These are protocol-level definitions, that may be shared between host + * and gadget drivers. + * + * Copyright (C) 2024-2025 Code Construct Pty Ltd + */ + +#ifndef __LINUX_USB_MCTP_USB_H +#define __LINUX_USB_MCTP_USB_H + +#include + +struct mctp_usb_hdr { + __be16 id; + u8 rsvd; + u8 len; +} __packed; + +#define MCTP_USB_XFER_SIZE 512 +#define MCTP_USB_BTU 68 +#define MCTP_USB_MTU_MIN MCTP_USB_BTU +#define MCTP_USB_MTU_MAX (U8_MAX - sizeof(struct mctp_usb_hdr)) +#define MCTP_USB_DMTF_ID 0x1ab4 + +#endif /* __LINUX_USB_MCTP_USB_H */ -- cgit v1.2.3 From 531ca2b9a215d072ffb4b1ff760a73f5e80c9c46 Mon Sep 17 00:00:00 2001 From: Shahar Shitrit Date: Wed, 19 Feb 2025 10:58:07 +0200 Subject: net/mlx5: Add new health syndrome error and crr bit offset Add new error value for trust lockdown in health syndrome enum. Also, include the offset for crr bit in the health buffer layout. These changes prepare for downstream patches that update health event handling. 
Signed-off-by: Shahar Shitrit Signed-off-by: Tariq Toukan Link: https://patch.msgid.link/20250219085808.349923-2-tariqt@nvidia.com Signed-off-by: Leon Romanovsky --- include/linux/mlx5/device.h | 1 + include/linux/mlx5/mlx5_ifc.h | 1 + 2 files changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 0c48b20f818a..fd37f4e54d76 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -538,6 +538,7 @@ struct mlx5_cmd_layout { }; enum mlx5_rfr_severity_bit_offsets { + MLX5_CRR_BIT_OFFSET = 0x6, MLX5_RFR_BIT_OFFSET = 0x7, }; diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 4f3716e124c9..cc2875e843f7 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -11119,6 +11119,7 @@ enum { MLX5_INITIAL_SEG_HEALTH_SYNDROME_FFSER_ERR = 0xf, MLX5_INITIAL_SEG_HEALTH_SYNDROME_HIGH_TEMP_ERR = 0x10, MLX5_INITIAL_SEG_HEALTH_SYNDROME_ICM_PCI_POISONED_ERR = 0x12, + MLX5_INITIAL_SEG_HEALTH_SYNDROME_TRUST_LOCKDOWN_ERR = 0x13, }; struct mlx5_ifc_initial_seg_bits { -- cgit v1.2.3 From 80df31f384b4146a62a01b3d4beb376cc7b9a89e Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Wed, 19 Feb 2025 10:58:08 +0200 Subject: net/mlx5: Change POOL_NEXT_SIZE define value and make it global Change POOL_NEXT_SIZE define value from 0 to BIT(30), since this define is used to request the available maximum sized flow table, and zero doesn't make sense for it, whereas some places in the driver use zero explicitly expecting the smallest table size possible but instead due to this define they end up allocating the biggest table size unawarely. In addition move the definition to "include/linux/mlx5/fs.h" to expose the define to IB driver as well, while appropriately renaming it. 
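The problem with using 0 as the "largest table" sentinel can be shown with a toy model. This is a sketch only: MLX5_FS_MAX_POOL_SIZE and its value come from the fs.h hunk of this patch, but the TOY_* limits and both toy_size_*() functions are hypothetical and do not reproduce how the driver actually sizes flow tables.

```c
#include <stdint.h>

#define BIT(n) (1U << (n))
#define MLX5_FS_MAX_POOL_SIZE BIT(30) /* value from the fs.h hunk */

/* Hypothetical pool limits, for illustration only. */
#define TOY_MIN_TABLE_SIZE 1U
#define TOY_MAX_TABLE_SIZE 4096U

/* Old scheme: 0 doubles as the "give me the largest table" sentinel. */
static uint32_t toy_size_old(uint32_t requested)
{
	if (requested == 0)
		return TOY_MAX_TABLE_SIZE;
	return requested < TOY_MAX_TABLE_SIZE ? requested : TOY_MAX_TABLE_SIZE;
}

/* New scheme: the sentinel is a value no caller passes by accident. */
static uint32_t toy_size_new(uint32_t requested)
{
	if (requested == MLX5_FS_MAX_POOL_SIZE)
		return TOY_MAX_TABLE_SIZE;
	if (requested < TOY_MIN_TABLE_SIZE)
		return TOY_MIN_TABLE_SIZE;
	return requested < TOY_MAX_TABLE_SIZE ? requested : TOY_MAX_TABLE_SIZE;
}
```

In the toy model a caller that passes 0 hoping for the smallest table gets 4096 entries under the old scheme and 1 entry under the new one, which mirrors the unintended allocation the commit message describes.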
Signed-off-by: Patrisious Haddad Reviewed-by: Maor Gottlieb Reviewed-by: Mark Bloch Signed-off-by: Tariq Toukan Link: https://patch.msgid.link/20250219085808.349923-3-tariqt@nvidia.com Signed-off-by: Leon Romanovsky --- include/linux/mlx5/fs.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 2a69d9d71276..01cb72d68c23 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -40,6 +40,8 @@ #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v) +#define MLX5_FS_MAX_POOL_SIZE BIT(30) + enum mlx5_flow_destination_type { MLX5_FLOW_DESTINATION_TYPE_NONE, MLX5_FLOW_DESTINATION_TYPE_VPORT, -- cgit v1.2.3 From 89ac4a59ca6d98091e8317bc5a342f38669e19e2 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Fri, 21 Feb 2025 12:07:28 +0100 Subject: skbuff: kill skb_flow_get_ports() Since commit a815bde56b15 ("net, bonding: Refactor bond_xmit_hash for use with xdp_buff"), this function is not used anymore. 
Signed-off-by: Nicolas Dichtel Reviewed-by: Simon Horman Link: https://patch.msgid.link/20250221110941.2041629-2-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0b4f1889500d..21644e9215e4 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1536,12 +1536,6 @@ u32 __skb_get_poff(const struct sk_buff *skb, const void *data, __be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto, const void *data, int hlen_proto); -static inline __be32 skb_flow_get_ports(const struct sk_buff *skb, - int thoff, u8 ip_proto) -{ - return __skb_flow_get_ports(skb, thoff, ip_proto, NULL, 0); -} - void skb_flow_dissector_init(struct flow_dissector *flow_dissector, const struct flow_dissector_key *key, unsigned int key_count); -- cgit v1.2.3 From c52fd4f083cc634c57fc98fce36860e63f6bce2b Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Fri, 21 Feb 2025 12:07:29 +0100 Subject: net: remove '__' from __skb_flow_get_ports() Only one version of skb_flow_get_ports() exists after the previous commit, so let's remove the useless '__'. 
Suggested-by: Simon Horman Signed-off-by: Nicolas Dichtel Link: https://patch.msgid.link/20250221110941.2041629-3-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 21644e9215e4..f2bb8473d99a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1533,8 +1533,8 @@ void __skb_get_hash_net(const struct net *net, struct sk_buff *skb); u32 skb_get_poff(const struct sk_buff *skb); u32 __skb_get_poff(const struct sk_buff *skb, const void *data, const struct flow_keys_basic *keys, int hlen); -__be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto, - const void *data, int hlen_proto); +__be32 skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto, + const void *data, int hlen_proto); void skb_flow_dissector_init(struct flow_dissector *flow_dissector, const struct flow_dissector_key *key, -- cgit v1.2.3 From 85e4a808af2545fefaf18c8fe50071b06fcbdabc Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Thu, 20 Feb 2025 23:39:53 +0200 Subject: net/mlx5e: Add correct match to check IPSec syndromes for switchdev mode In commit dddb49b63d86 ("net/mlx5e: Add IPsec and ASO syndromes check in HW"), IPSec and ASO syndromes checks after decryption for the specified ASO object were added. But they are correct only for eswitch in legacy mode. For switchdev mode, metadata register c1 is used to save the mapped id (not ASO object id). So the match for the check rules in the status table needs to change accordingly.
Signed-off-by: Jianbo Liu Reviewed-by: Leon Romanovsky Reviewed-by: Patrisious Haddad Signed-off-by: Tariq Toukan Link: https://patch.msgid.link/20250220213959.504304-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski --- include/linux/mlx5/eswitch.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/eswitch.h b/include/linux/mlx5/eswitch.h index df73a2ccc9af..67256e776566 100644 --- a/include/linux/mlx5/eswitch.h +++ b/include/linux/mlx5/eswitch.h @@ -147,6 +147,8 @@ u32 mlx5_eswitch_get_vport_metadata_for_set(struct mlx5_eswitch *esw, /* reuse tun_opts for the mapped ipsec obj id when tun_id is 0 (invalid) */ #define ESW_IPSEC_RX_MAPPED_ID_MASK GENMASK(ESW_TUN_OPTS_BITS - 1, 0) +#define ESW_IPSEC_RX_MAPPED_ID_MATCH_MASK \ + GENMASK(31 - ESW_RESERVED_BITS, ESW_ZONE_ID_BITS) u8 mlx5_eswitch_mode(const struct mlx5_core_dev *dev); u16 mlx5_eswitch_get_total_vports(const struct mlx5_core_dev *dev); -- cgit v1.2.3 From a3e51d4711793be001220784bd7d8ce81517003e Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Sat, 22 Feb 2025 09:36:10 +0100 Subject: net: phy: add phylib-internal.h This patch is a starting point for moving phylib-internal declarations to a private header file. 
Signed-off-by: Heiner Kallweit Link: https://patch.msgid.link/082eacd2-a888-4716-8797-b3491ce02820@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 13 ------------- 1 file changed, 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 13be48d3b8b3..7bfbae51070a 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -181,13 +181,6 @@ static inline void phy_interface_set_rgmii(unsigned long *intf) __set_bit(PHY_INTERFACE_MODE_RGMII_TXID, intf); } -/* - * phy_supported_speeds - return all speeds currently supported by a PHY device - */ -unsigned int phy_supported_speeds(struct phy_device *phy, - unsigned int *speeds, - unsigned int size); - /** * phy_modes - map phy_interface_t enum to device tree binding of phy-mode * @interface: enum phy_interface_t value @@ -1331,10 +1324,6 @@ phy_lookup_setting(int speed, int duplex, const unsigned long *mask, bool exact); size_t phy_speeds(unsigned int *speeds, size_t size, unsigned long *mask); -void of_set_phy_supported(struct phy_device *phydev); -void of_set_phy_eee_broken(struct phy_device *phydev); -void of_set_phy_timing_role(struct phy_device *phydev); -int phy_speed_down_core(struct phy_device *phydev); /** * phy_is_started - Convenience function to check whether PHY is started @@ -1360,7 +1349,6 @@ static inline void phy_disable_eee_mode(struct phy_device *phydev, u32 link_mode void phy_resolve_aneg_pause(struct phy_device *phydev); void phy_resolve_aneg_linkmode(struct phy_device *phydev); -void phy_check_downshift(struct phy_device *phydev); /** * phy_read - Convenience function for reading a given PHY register @@ -2035,7 +2023,6 @@ int genphy_c45_ethtool_get_eee(struct phy_device *phydev, int genphy_c45_ethtool_set_eee(struct phy_device *phydev, struct ethtool_keee *data); int genphy_c45_an_config_eee_aneg(struct phy_device *phydev); -int genphy_c45_read_eee_adv(struct phy_device *phydev, unsigned long *adv); /* Generic C45 PHY driver */ 
extern struct phy_driver genphy_c45_driver; -- cgit v1.2.3 From 287044abff8291993ce9565ac6e6a72b85e33b85 Mon Sep 17 00:00:00 2001 From: Thorsten Blum Date: Sun, 23 Feb 2025 21:45:07 +0100 Subject: sctp: Remove unused payload from sctp_idatahdr Remove the unused payload array from the struct sctp_idatahdr. Cc: Kees Cook Signed-off-by: Thorsten Blum Link: https://patch.msgid.link/20250223204505.2499-3-thorsten.blum@linux.dev Signed-off-by: Paolo Abeni --- include/linux/sctp.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sctp.h b/include/linux/sctp.h index 812011d8b67e..6719949135c9 100644 --- a/include/linux/sctp.h +++ b/include/linux/sctp.h @@ -238,7 +238,6 @@ struct sctp_idatahdr { __u32 ppid; __be32 fsn; }; - __u8 payload[0]; }; struct sctp_idata_chunk { -- cgit v1.2.3 From ecdff893384cf169a55a66ca68c9d8917f8417a9 Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Mon, 24 Feb 2025 19:44:13 +0200 Subject: ethtool: Symmetric OR-XOR RSS hash Add an additional type of symmetric RSS hash type: OR-XOR. The "Symmetric-OR-XOR" algorithm transforms the input as follows: (SRC_IP | DST_IP, SRC_IP ^ DST_IP, SRC_PORT | DST_PORT, SRC_PORT ^ DST_PORT) Change 'cap_rss_sym_xor_supported' to 'supported_input_xfrm', a bitmap of supported RXH_XFRM_* types. Reviewed-by: Cosmin Ratiu Reviewed-by: Tariq Toukan Signed-off-by: Gal Pressman Reviewed-by: Edward Cree Link: https://patch.msgid.link/20250224174416.499070-2-gal@nvidia.com Signed-off-by: Jakub Kicinski --- include/linux/ethtool.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 870994cc3ef7..7f222dccc7d1 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -763,13 +763,12 @@ struct kernel_ethtool_ts_info { /** * struct ethtool_ops - optional netdev operations + * @supported_input_xfrm: supported types of input xfrm from %RXH_XFRM_*. 
* @cap_link_lanes_supported: indicates if the driver supports lanes * parameter. * @cap_rss_ctx_supported: indicates if the driver supports RSS * contexts via legacy API, drivers implementing @create_rxfh_context * do not have to set this bit. - * @cap_rss_sym_xor_supported: indicates if the driver supports symmetric-xor - * RSS. * @rxfh_per_ctx_key: device supports setting different RSS key for each * additional context. Netlink API should report hfunc, key, and input_xfrm * for every context, not just context 0. @@ -995,9 +994,9 @@ struct kernel_ethtool_ts_info { * of the generic netdev features interface. */ struct ethtool_ops { + u32 supported_input_xfrm:8; u32 cap_link_lanes_supported:1; u32 cap_rss_ctx_supported:1; - u32 cap_rss_sym_xor_supported:1; u32 rxfh_per_ctx_key:1; u32 cap_rss_rxnfc_adds:1; u32 rxfh_indir_space; -- cgit v1.2.3 From 3ba075278c11cdb19e2dbb80362042f1b0c08f74 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 25 Feb 2025 17:10:48 +0000 Subject: tcp: be less liberal in TSEcr received while in SYN_RECV state Yong-Hao Zou mentioned that linux was not strict as other OS in 3WHS, for flows using TCP TS option (RFC 7323) As hinted by an old comment in tcp_check_req(), we can check the TSEcr value in the incoming packet corresponds to one of the SYNACK TSval values we have sent. In this patch, I record the oldest and most recent values that SYNACK packets have used. Send a challenge ACK if we receive a TSEcr outside of this range, and increase a new SNMP counter. nstat -az | grep TSEcrRejected TcpExtTSEcrRejected 0 0.0 Due to TCP fastopen implementation, do not apply yet these checks for fastopen flows. v2: No longer use req->num_timeout, but treq->snt_tsval_first to detect when first SYNACK is prepared. This means we make sure to not send an initial zero TSval. Make sure MPTCP and TCP selftests are passing. 
Change MIB name to TcpExtTSEcrRejected v1: https://lore.kernel.org/netdev/CADVnQykD8i4ArpSZaPKaoNxLJ2if2ts9m4As+=Jvdkrgx1qMHw@mail.gmail.com/T/ Reported-by: Yong-Hao Zou Signed-off-by: Eric Dumazet Reviewed-by: Matthieu Baerts (NGI0) Reviewed-by: Neal Cardwell Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20250225171048.3105061-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/linux/tcp.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index f88daaa76d83..159b2c59eb62 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -160,6 +160,8 @@ struct tcp_request_sock { u32 rcv_isn; u32 snt_isn; u32 ts_off; + u32 snt_tsval_first; + u32 snt_tsval_last; u32 last_oow_ack_time; /* last SYNACK */ u32 rcv_nxt; /* the ack # by SYNACK. For * FastOpen it's the seq# -- cgit v1.2.3 From e6116fc605574bb58c2016938ff24a7fbafe6e2a Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Mon, 24 Feb 2025 21:33:55 -0500 Subject: net: skb: free up one bit in tx_flags

The linked series wants to add skb tx completion timestamps. That needs a bit in skb_shared_info.tx_flags, but all are in use.

A per-skb bit is only needed for features that are configured on a per packet basis. Per socket features can be read from sk->sk_tsflags.

Per packet tsflags can be set in sendmsg using cmsg, but only those in SOF_TIMESTAMPING_TX_RECORD_MASK.

Per packet tsflags can also be set without cmsg by sandwiching a send in between two setsockopts:

    val |= SOF_TIMESTAMPING_$FEATURE;
    setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
    write(fd, buf, sz);
    val &= ~SOF_TIMESTAMPING_$FEATURE;
    setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));

Changing a datapath test from skb_shinfo(skb)->tx_flags to skb->sk->sk_tsflags can change behavior in that case, as the tx_flags is written before the second setsockopt updates sk_tsflags.
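The behavior change described in the last paragraph can be reproduced with a toy model. Everything here is hypothetical (toy_sock, toy_skb, TOY_FEATURE and the helpers are invented for illustration and are not kernel types); the sketch only assumes what the text states, namely that the per-packet flags are captured at send time while sk_tsflags reflects the latest setsockopt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for some SOF_TIMESTAMPING_$FEATURE bit. */
#define TOY_FEATURE 0x1u

struct toy_sock {
	uint32_t sk_tsflags;        /* updated by every "setsockopt" */
};

struct toy_skb {
	uint32_t tx_flags;          /* snapshot taken when the packet is sent */
	const struct toy_sock *sk;  /* back-pointer to the owning socket */
};

/* Send path: copy the socket's current flags into the packet. */
static struct toy_skb toy_send(const struct toy_sock *sk)
{
	struct toy_skb skb = { .tx_flags = sk->sk_tsflags, .sk = sk };

	return skb;
}

/* Datapath test on the per-packet snapshot. */
static bool toy_check_skb(const struct toy_skb *skb)
{
	return (skb->tx_flags & TOY_FEATURE) != 0;
}

/* The same test, reading the socket's current flags instead. */
static bool toy_check_sock(const struct toy_skb *skb)
{
	return (skb->sk->sk_tsflags & TOY_FEATURE) != 0;
}
```

If the flag is set, a packet is sent, and the flag is cleared before the packet completes, toy_check_skb() still reports the feature while toy_check_sock() does not. That divergence is the reason a bit can only be moved from tx_flags to sk_tsflags when it is unlikely to be toggled per packet.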
Therefore, only bits can be reclaimed that cannot be set by cmsg and are also highly unlikely to be used to target individual packets otherwise. Free up the bit currently used for SKBTX_HW_TSTAMP_USE_CYCLES. This selects between clock and free running counter source for HW TX timestamps. It is probable that all packets of the same socket will always use the same source. Link: https://lore.kernel.org/netdev/cover.1739988644.git.pav@iki.fi/ Signed-off-by: Willem de Bruijn Reviewed-by: Jason Xing Reviewed-by: Gerhard Engleder Link: https://patch.msgid.link/20250225023416.2088705-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f2bb8473d99a..171aa15f6541 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -478,8 +478,8 @@ enum { /* device driver is going to provide hardware time stamp */ SKBTX_IN_PROGRESS = 1 << 2, - /* generate hardware time stamp based on cycles if supported */ - SKBTX_HW_TSTAMP_USE_CYCLES = 1 << 3, + /* reserved */ + SKBTX_RESERVED = 1 << 3, /* generate wifi status information (where possible) */ SKBTX_WIFI_STATUS = 1 << 4, @@ -500,7 +500,6 @@ enum { SKBTX_SCHED_TSTAMP | \ SKBTX_BPF) #define SKBTX_ANY_TSTAMP (SKBTX_HW_TSTAMP | \ - SKBTX_HW_TSTAMP_USE_CYCLES | \ SKBTX_ANY_SW_TSTAMP) /* Definitions for flags in struct skb_shared_info */ -- cgit v1.2.3 From bd7c00605ee0cf0bf27764d5da0b948d8004229e Mon Sep 17 00:00:00 2001 From: Ahmed Zaki Date: Mon, 24 Feb 2025 16:22:22 -0700 Subject: net: move aRFS rmap management and CPU affinity to core A common task for most drivers is to remember the user-set CPU affinity to its IRQs. On each netdev reset, the driver should re-assign the user's settings to the IRQs. Unify this task across all drivers by moving the CPU affinity to napi->config. 
However, to move the CPU affinity to core, we also need to move aRFS rmap management since aRFS uses its own IRQ notifiers.

For the aRFS, add a new netdev flag "rx_cpu_rmap_auto". Drivers supporting aRFS should set the flag via netif_enable_cpu_rmap() and core will allocate and manage the aRFS rmaps. Freeing the rmap is also done by core when the netdev is freed. For better IRQ affinity management, move the IRQ rmap notifier inside the napi_struct and add new notify.notify and notify.release functions: netif_irq_cpu_rmap_notify() and netif_napi_affinity_release().

Now that we have the aRFS rmap management in core, add CPU affinity mask to napi_config. To delegate the CPU affinity management to the core, drivers must:
 1 - set the new netdev flag "irq_affinity_auto": netif_enable_irq_affinity(netdev)
 2 - create the napi with persistent config: netif_napi_add_config()
 3 - bind an IRQ to the napi instance: netif_napi_set_irq()

The core will then make sure to re-assign the affinity to the napi's IRQ.

The default IRQ mask is set to one CPU, starting from the closest NUMA node.
Signed-off-by: Ahmed Zaki Link: https://patch.msgid.link/20250224232228.990783-2-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski --- include/linux/cpu_rmap.h | 1 + include/linux/netdevice.h | 24 ++++++++++++++++++++---- 2 files changed, 21 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpu_rmap.h b/include/linux/cpu_rmap.h index 20b5729903d7..2fd7ba75362a 100644 --- a/include/linux/cpu_rmap.h +++ b/include/linux/cpu_rmap.h @@ -32,6 +32,7 @@ struct cpu_rmap { #define CPU_RMAP_DIST_INF 0xffff extern struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags); +extern void cpu_rmap_get(struct cpu_rmap *rmap); extern int cpu_rmap_put(struct cpu_rmap *rmap); extern int cpu_rmap_add(struct cpu_rmap *rmap, void *obj); diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 9a387d456592..2094d3edda73 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -352,6 +352,7 @@ struct napi_config { u64 gro_flush_timeout; u64 irq_suspend_timeout; u32 defer_hard_irqs; + cpumask_t affinity_mask; unsigned int napi_id; }; @@ -394,6 +395,8 @@ struct napi_struct { struct list_head dev_list; struct hlist_node napi_hash_node; int irq; + struct irq_affinity_notify notify; + int napi_rmap_idx; int index; struct napi_config *config; }; @@ -409,6 +412,7 @@ enum { NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/ NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */ + NAPI_STATE_HAS_NOTIFIER, /* Napi has an IRQ notifier */ }; enum { @@ -422,6 +426,7 @@ enum { NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED), NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED), + NAPIF_STATE_HAS_NOTIFIER = BIT(NAPI_STATE_HAS_NOTIFIER), }; enum gro_result { @@ -1989,6 +1994,15 @@ enum netdev_reg_state { * * @threaded: napi threaded mode is 
enabled * + * @irq_affinity_auto: driver wants the core to store and re-assign the IRQ + * affinity. Set by netif_enable_irq_affinity(), then + * the driver must create a persistent napi by + * netif_napi_add_config() and finally bind the napi to + * IRQ (via netif_napi_set_irq()). + * + * @rx_cpu_rmap_auto: driver wants the core to manage the ARFS rmap. + * Set by calling netif_enable_cpu_rmap(). + * * @see_all_hwtstamp_requests: device wants to see calls to * ndo_hwtstamp_set() for all timestamp requests * regardless of source, even if those aren't @@ -2396,6 +2410,8 @@ struct net_device { struct lock_class_key *qdisc_tx_busylock; bool proto_down; bool threaded; + bool irq_affinity_auto; + bool rx_cpu_rmap_auto; /* priv_flags_slow, ungrouped to save space */ unsigned long see_all_hwtstamp_requests:1; @@ -2724,10 +2740,7 @@ static inline void netdev_assert_locked_or_invisible(struct net_device *dev) netdev_assert_locked(dev); } -static inline void netif_napi_set_irq_locked(struct napi_struct *napi, int irq) -{ - napi->irq = irq; -} +void netif_napi_set_irq_locked(struct napi_struct *napi, int irq); static inline void netif_napi_set_irq(struct napi_struct *napi, int irq) { @@ -2865,6 +2878,9 @@ static inline void netif_napi_del(struct napi_struct *napi) synchronize_net(); } +int netif_enable_cpu_rmap(struct net_device *dev, unsigned int num_irqs); +void netif_set_affinity_auto(struct net_device *dev); + struct packet_type { __be16 type; /* This is really htons(ether_type). */ bool ignore_outgoing; -- cgit v1.2.3 From 291515c7640962f8865e4c54897a5e91526b450c Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Tue, 25 Feb 2025 18:17:43 +0100 Subject: net: gro: decouple GRO from the NAPI layer MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In fact, these two are not tied closely to each other. The only requirements to GRO are to use it in the BH context and have some sane limits on the packet batches, e.g. 
NAPI has a limit of its budget (64/8/etc.). Move purely GRO fields into a new structure, &gro_node. Embed it into &napi_struct and adjust all the references. gro_node::cached_napi_id is effectively the same as napi_struct::napi_id, but to be used on GRO hotpath to mark skbs. napi_struct::napi_id is now a fully control path field. Three Ethernet drivers use napi_gro_flush() not really meant to be exported, so move it to and add that include there. napi_gro_receive() is used in more than 100 drivers, keep it in . This does not make GRO ready to use outside of the NAPI context yet. Tested-by: Daniel Xu Acked-by: Jakub Kicinski Reviewed-by: Toke Høiland-Jørgensen Signed-off-by: Alexander Lobakin Signed-off-by: Paolo Abeni --- include/linux/netdevice.h | 37 ++++++++++++++++++++++++++++--------- 1 file changed, 28 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2094d3edda73..26a0c4e4d963 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -340,11 +340,27 @@ struct gro_list { }; /* - * size of gro hash buckets, must less than bit number of - * napi_struct::gro_bitmask + * size of gro hash buckets, must be <= the number of bits in + * gro_node::bitmask */ #define GRO_HASH_BUCKETS 8 +/** + * struct gro_node - structure to support Generic Receive Offload + * @bitmask: bitmask to indicate used buckets in @hash + * @hash: hashtable of pending aggregated skbs, separated by flows + * @rx_list: list of pending ``GRO_NORMAL`` skbs + * @rx_count: cached current length of @rx_list + * @cached_napi_id: napi_struct::napi_id cached for hotpath, 0 for standalone + */ +struct gro_node { + unsigned long bitmask; + struct gro_list hash[GRO_HASH_BUCKETS]; + struct list_head rx_list; + u32 rx_count; + u32 cached_napi_id; +}; + /* * Structure for per-NAPI config */ @@ -371,7 +387,6 @@ struct napi_struct { unsigned long state; int weight; u32 defer_hard_irqs_count; - unsigned long 
gro_bitmask; int (*poll)(struct napi_struct *, int); #ifdef CONFIG_NETPOLL /* CPU actively polling if netpoll is configured */ @@ -380,11 +395,8 @@ struct napi_struct { /* CPU on which NAPI has been scheduled for processing */ int list_owner; struct net_device *dev; - struct gro_list gro_hash[GRO_HASH_BUCKETS]; struct sk_buff *skb; - struct list_head rx_list; /* Pending GRO_NORMAL skbs */ - int rx_count; /* length of rx_list */ - unsigned int napi_id; /* protected by netdev_lock */ + struct gro_node gro; struct hrtimer timer; /* all fields past this point are write-protected by netdev_lock */ struct task_struct *thread; @@ -392,6 +404,7 @@ struct napi_struct { unsigned long irq_suspend_timeout; u32 defer_hard_irqs; /* control-path-only fields follow */ + u32 napi_id; struct list_head dev_list; struct hlist_node napi_hash_node; int irq; @@ -4131,8 +4144,14 @@ int netif_receive_skb(struct sk_buff *skb); int netif_receive_skb_core(struct sk_buff *skb); void netif_receive_skb_list_internal(struct list_head *head); void netif_receive_skb_list(struct list_head *head); -gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb); -void napi_gro_flush(struct napi_struct *napi, bool flush_old); +gro_result_t gro_receive_skb(struct gro_node *gro, struct sk_buff *skb); + +static inline gro_result_t napi_gro_receive(struct napi_struct *napi, + struct sk_buff *skb) +{ + return gro_receive_skb(&napi->gro, skb); +} + struct sk_buff *napi_get_frags(struct napi_struct *napi); gro_result_t napi_gro_frags(struct napi_struct *napi); -- cgit v1.2.3 From 859d6acd94cc4ad65e9eb3fa2a9815a19e5b35cf Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Tue, 25 Feb 2025 18:17:47 +0100 Subject: net: skbuff: introduce napi_skb_cache_get_bulk() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a function to get an array of skbs from the NAPI percpu cache. 
It's supposed to be a drop-in replacement for kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC) and xdp_alloc_skb_bulk(GFP_ATOMIC). The difference (apart from the requirement to call it only from the BH) is that it tries to use as many NAPI cache entries for skbs as possible, and allocate new ones only if needed.

The logic is as follows:
* there are enough skbs in the cache: decache them and return to the caller;
* not enough: try refilling the cache first. If there are now enough skbs, return;
* still not enough: try allocating skbs directly to the output array with %GFP_ZERO, maybe we'll be able to get some. If there are now enough, return;
* still not enough: return as many as we were able to obtain.

Most of the time, if called from the NAPI polling loop, the first one will be true, sometimes (rarely) the second one. The third and the fourth -- only under heavy memory pressure. It can save significant amounts of CPU cycles if there are GRO cycles and/or Tx completion cycles (anything that descends to napi_skb_cache_put()) happening on this CPU.
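The four-step fallback described above can be modelled in plain userspace C. This is only a sketch under assumptions: `struct obj_cache`, `CACHE_SIZE`, `alloc_bulk()` and `cache_get_bulk()` are hypothetical names, calloc() stands in for the slab allocator, and the refill size is a guess rather than what the kernel does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SIZE 64	/* hypothetical capacity of the per-CPU cache */

/* Hypothetical model of a per-CPU object cache. */
struct obj_cache {
	void *slots[CACHE_SIZE];
	unsigned int count;
};

/* Stand-in for a bulk slab allocation; may return fewer than n objects. */
static unsigned int alloc_bulk(void **out, unsigned int n, size_t size)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		out[i] = calloc(1, size);	/* zeroed, like a GFP_ZERO request */
		if (!out[i])
			break;
	}
	return i;
}

/* Fill out[] with up to n objects and return how many were obtained. */
unsigned int cache_get_bulk(struct obj_cache *c, void **out,
			    unsigned int n, size_t size)
{
	unsigned int got = 0;

	assert(c != NULL && out != NULL);

	/* Step 2: the cache is short, so try to refill it first. */
	if (c->count < n) {
		unsigned int room = CACHE_SIZE - c->count;
		unsigned int want = n - c->count;

		if (want > room)
			want = room;
		c->count += alloc_bulk(&c->slots[c->count], want, size);
	}

	/* Steps 1 and 2: hand out whatever the cache holds, up to n. */
	while (got < n && c->count > 0)
		out[got++] = c->slots[--c->count];

	/* Step 3: still short, allocate straight into the output array. */
	if (got < n)
		got += alloc_bulk(&out[got], n - got, size);

	/* Step 4: return as many as we were able to obtain. */
	return got;
}
```

The caller has to cope with a short return, which is why the model returns a count rather than a success flag.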
Tested-by: Daniel Xu Reviewed-by: Toke Høiland-Jørgensen Signed-off-by: Alexander Lobakin Signed-off-by: Paolo Abeni --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 171aa15f6541..14517e95a46c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1320,6 +1320,7 @@ struct sk_buff *build_skb_around(struct sk_buff *skb, void *data, unsigned int frag_size); void skb_attempt_defer_free(struct sk_buff *skb); +u32 napi_skb_cache_get_bulk(void **skbs, u32 n); struct sk_buff *napi_build_skb(void *data, unsigned int frag_size); struct sk_buff *slab_build_skb(void *data); -- cgit v1.2.3 From f8131f4cc5bda6154d81ee5bafe3db7e2e72a89c Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 25 Feb 2025 21:09:23 +0100 Subject: net: qed: make 'qed_ll2_ops_pass' as __maybe_unused gcc warns about unused const variables even in header files when building with W=1: In file included from include/linux/qed/qed_rdma_if.h:14, from drivers/net/ethernet/qlogic/qed/qed_rdma.h:16, from drivers/net/ethernet/qlogic/qed/qed_cxt.c:23: include/linux/qed/qed_ll2_if.h:270:33: error: 'qed_ll2_ops_pass' defined but not used [-Werror=unused-const-variable=] 270 | static const struct qed_ll2_ops qed_ll2_ops_pass = { This one is intentional, so mark it as __maybe_unused so it can be included from a file that doesn't use this variable.
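The same situation can be reproduced outside the kernel. The sketch below is an assumption-laden illustration: `MAYBE_UNUSED`, `struct demo_ops`, `demo_ops_pass` and `demo_unrelated()` are hypothetical names, and the attribute spelling is the GCC/Clang one, not a portable C feature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace equivalent of the kernel's __maybe_unused. */
#define MAYBE_UNUSED __attribute__((__unused__))

struct demo_ops {
	int (*start)(void);
	int (*stop)(void);
};

/*
 * Imagine this definition living in a header that many .c files include.
 * Without the attribute, a file that never references it can trigger
 * -Wunused-const-variable in builds that enable that warning.
 */
static MAYBE_UNUSED const struct demo_ops demo_ops_pass = {
	.start = NULL,
	.stop = NULL,
};

/* A translation unit that includes the header but ignores demo_ops_pass. */
int demo_unrelated(int x)
{
	assert(x >= 0);
	return x + 1;
}
```

The attribute only silences the warning; the object is still defined in every file that includes the header.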
Signed-off-by: Arnd Bergmann Reviewed-by: Simon Horman Tested-by: Simon Horman # build-tested Link: https://patch.msgid.link/20250225200926.4057723-1-arnd@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/qed/qed_ll2_if.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/qed/qed_ll2_if.h b/include/linux/qed/qed_ll2_if.h index 5b67cd03276e..aa29ac53b833 100644 --- a/include/linux/qed/qed_ll2_if.h +++ b/include/linux/qed/qed_ll2_if.h @@ -267,7 +267,7 @@ struct qed_ll2_ops { int qed_ll2_alloc_if(struct qed_dev *); void qed_ll2_dealloc_if(struct qed_dev *); #else -static const struct qed_ll2_ops qed_ll2_ops_pass = { +static __maybe_unused const struct qed_ll2_ops qed_ll2_ops_pass = { .start = NULL, .stop = NULL, .start_xmit = NULL, -- cgit v1.2.3 From dea5c8ec20be7603e4a56b9d6571321f330626c1 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 27 Feb 2025 09:16:23 +0000 Subject: net: stmmac: provide set_clk_tx_rate() hook Several stmmac sub-drivers which support RGMII follow the same pattern. They calculate the transmit clock rate, and then call clk_set_rate(). Analysis of several implementation documents suggests that the platform is responsible for providing the transmit clock to the DWMAC core's clk_tx_i. The expected rates are:

        10Mbps  100Mbps 1Gbps
MII     2.5MHz  25MHz
RMII    2.5MHz  25MHz
GMII                    125MHz
RGMII   2.5MHz  25MHz   125MHz

It seems some platforms require this clock to be manually configured, but there are outputs from the MAC core that indicate the speed, so a platform may use these to automatically configure the clock. Thus, we can't just provide one solution to configure this clock rate. Moreover, the clock may need to be derived from one of several sources depending on the interface mode. Provide a platform hook that is passed the transmit clock, interface mode and speed.
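The rate table above maps directly onto a small lookup. The following is a userspace sketch, not driver code: `enum demo_iface`, `demo_tx_clk_rate()` and `demo_set_clk_tx_rate()` are hypothetical names, the clock is modelled as a plain variable instead of a struct clk, and -1 stands in for a negative errno.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the phy_interface_t values in the table. */
enum demo_iface { DEMO_MII, DEMO_RMII, DEMO_GMII, DEMO_RGMII };

/* Expected clk_tx_i rate in Hz, or -1 where the table has no entry. */
long demo_tx_clk_rate(enum demo_iface iface, int speed_mbps)
{
	switch (speed_mbps) {
	case 10:
		return iface == DEMO_GMII ? -1 : 2500000L;
	case 100:
		return iface == DEMO_GMII ? -1 : 25000000L;
	case 1000:
		return (iface == DEMO_GMII || iface == DEMO_RGMII) ?
			125000000L : -1;
	default:
		return -1;
	}
}

/*
 * Hook-shaped wrapper: takes the clock, the interface mode and the speed.
 * The stored rate is left untouched when the combination is unsupported.
 */
int demo_set_clk_tx_rate(long *clk_tx_rate_hz, enum demo_iface iface,
			 int speed_mbps)
{
	long rate = demo_tx_clk_rate(iface, speed_mbps);

	assert(clk_tx_rate_hz != NULL);
	if (rate < 0)
		return -1;

	*clk_tx_rate_hz = rate;	/* a real hook would call clk_set_rate() */
	return 0;
}
```

Keeping the lookup separate from the hook mirrors the point of the commit: the table is common, while how the clock is actually programmed is left to the platform.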
Reviewed-by: Thierry Reding Signed-off-by: Russell King (Oracle) Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/E1tna0F-0052sS-Lr@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 6d2aa77ea963..cd0d1383df87 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -78,6 +78,7 @@ | DMA_AXI_BLEN_32 | DMA_AXI_BLEN_64 \ | DMA_AXI_BLEN_128 | DMA_AXI_BLEN_256) +struct clk; struct stmmac_priv; /* Platfrom data for platform device structure's platform_data field */ @@ -231,6 +232,8 @@ struct plat_stmmacenet_data { u8 tx_sched_algorithm; struct stmmac_rxq_cfg rx_queues_cfg[MTL_MAX_RX_QUEUES]; struct stmmac_txq_cfg tx_queues_cfg[MTL_MAX_TX_QUEUES]; + int (*set_clk_tx_rate)(void *priv, struct clk *clk_tx_i, + phy_interface_t interface, int speed); void (*fix_mac_speed)(void *priv, int speed, unsigned int mode); int (*fix_soc_reset)(void *priv, void __iomem *ioaddr); int (*serdes_powerup)(struct net_device *ndev, void *priv); @@ -252,6 +255,7 @@ struct plat_stmmacenet_data { struct clk *stmmac_clk; struct clk *pclk; struct clk *clk_ptp_ref; + struct clk *clk_tx_i; /* clk_tx_i to MAC core */ unsigned long clk_ptp_rate; unsigned long clk_ref_rate; struct clk_bulk_data *clks; -- cgit v1.2.3 From 0c493da86374dffff7505e67289ad75b21f5b301 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Fri, 28 Feb 2025 11:20:56 +0100 Subject: net: rename netns_local to netns_immutable The name 'netns_local' is confusing. A following commit will export it via netlink, so let's use a more explicit name. 
Reported-by: Eric Dumazet Suggested-by: Kuniyuki Iwashima Signed-off-by: Nicolas Dichtel Reviewed-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni --- include/linux/netdevice.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 26a0c4e4d963..b8728d67ea91 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2021,7 +2021,7 @@ enum netdev_reg_state { * regardless of source, even if those aren't * HWTSTAMP_SOURCE_NETDEV * @change_proto_down: device supports setting carrier via IFLA_PROTO_DOWN - * @netns_local: interface can't change network namespaces + * @netns_immutable: interface can't change network namespaces * @fcoe_mtu: device supports maximum FCoE MTU, 2158 bytes * * @net_notifier_list: List of per-net netdev notifier block @@ -2429,7 +2429,7 @@ struct net_device { /* priv_flags_slow, ungrouped to save space */ unsigned long see_all_hwtstamp_requests:1; unsigned long change_proto_down:1; - unsigned long netns_local:1; + unsigned long netns_immutable:1; unsigned long fcoe_mtu:1; struct list_head net_notifier_list; -- cgit v1.2.3 From 12b6f7069ba5aa160f3332916408b34ae8e0b0f6 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Fri, 28 Feb 2025 11:20:58 +0100 Subject: net: plumb extack in __dev_change_net_namespace() It could be hard to understand why the netlink command fails. For example, if dev->netns_immutable is set, the error is "Invalid argument". 
Signed-off-by: Nicolas Dichtel Reviewed-by: Eric Dumazet Reviewed-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni --- include/linux/netdevice.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b8728d67ea91..7ab86ec228b7 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4191,12 +4191,13 @@ int dev_change_flags(struct net_device *dev, unsigned int flags, int dev_set_alias(struct net_device *, const char *, size_t); int dev_get_alias(const struct net_device *, char *, size_t); int __dev_change_net_namespace(struct net_device *dev, struct net *net, - const char *pat, int new_ifindex); + const char *pat, int new_ifindex, + struct netlink_ext_ack *extack); static inline int dev_change_net_namespace(struct net_device *dev, struct net *net, const char *pat) { - return __dev_change_net_namespace(dev, net, pat, 0); + return __dev_change_net_namespace(dev, net, pat, 0, NULL); } int __dev_set_mtu(struct net_device *, int); int dev_set_mtu(struct net_device *, int); -- cgit v1.2.3 From 95d0d094ba26432ec467e2260f4bf553053f1f8f Mon Sep 17 00:00:00 2001 From: Qingfang Deng Date: Sat, 1 Mar 2025 21:55:16 +0800 Subject: ppp: use IFF_NO_QUEUE in virtual interfaces MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit For PPPoE, PPTP, and PPPoL2TP, the start_xmit() function directly forwards packets to the underlying network stack and never returns anything other than 1. So these interfaces do not require a qdisc, and the IFF_NO_QUEUE flag should be set. Introduce a direct_xmit flag in struct ppp_channel to indicate when IFF_NO_QUEUE should be applied. The flag is set in ppp_connect_channel() for relevant protocols. While at it, remove the unused latency member from struct ppp_channel.
Signed-off-by: Qingfang Deng Reviewed-by: Toke Høiland-Jørgensen Link: https://patch.msgid.link/20250301135517.695809-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/ppp_channel.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ppp_channel.h b/include/linux/ppp_channel.h index 45e6e427ceb8..f73fbea0dbc2 100644 --- a/include/linux/ppp_channel.h +++ b/include/linux/ppp_channel.h @@ -42,8 +42,7 @@ struct ppp_channel { int hdrlen; /* amount of headroom channel needs */ void *ppp; /* opaque to channel */ int speed; /* transfer rate (bytes/second) */ - /* the following is not used at present */ - int latency; /* overhead time in milliseconds */ + bool direct_xmit; /* no qdisc, xmit directly */ }; #ifdef __KERNEL__ -- cgit v1.2.3 From e859d375d1694488015e6804bfeea527a0b25b9f Mon Sep 17 00:00:00 2001 From: Wojtek Wasko Date: Mon, 3 Mar 2025 18:13:43 +0200 Subject: posix-clock: Store file pointer in struct posix_clock_context File descriptor based pc_clock_*() operations of dynamic posix clocks have access to the file pointer and implement permission checks in the generic code before invoking the relevant dynamic clock callback. Character device operations (open, read, poll, ioctl) do not implement a generic permission control and the dynamic clock callbacks have no access to the file pointer to implement them. Extend struct posix_clock_context with a struct file pointer and initialize it in posix_clock_open(), so that all dynamic clock callbacks can access it. Acked-by: Richard Cochran Reviewed-by: Vadim Fedorenko Reviewed-by: Thomas Gleixner Signed-off-by: Wojtek Wasko Signed-off-by: David S. 
Miller --- include/linux/posix-clock.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h index ef8619f48920..a500d3160fe8 100644 --- a/include/linux/posix-clock.h +++ b/include/linux/posix-clock.h @@ -95,10 +95,13 @@ struct posix_clock { * struct posix_clock_context - represents clock file operations context * * @clk: Pointer to the clock + * @fp: Pointer to the file used to open the clock * @private_clkdata: Pointer to user data * * Drivers should use struct posix_clock_context during specific character - * device file operation methods to access the posix clock. + * device file operation methods to access the posix clock. In particular, + * the file pointer can be used to verify correct access mode for ioctl() + * calls. * * Drivers can store a private data structure during the open operation * if they have specific information that is required in other file @@ -106,6 +109,7 @@ struct posix_clock { */ struct posix_clock_context { struct posix_clock *clk; + struct file *fp; void *private_clkdata; }; -- cgit v1.2.3 From 7e2f7e25f6ffc229fd5be0488e1d8aac7bdd80e8 Mon Sep 17 00:00:00 2001 From: "David E. Box" Date: Thu, 27 Feb 2025 20:15:19 +0800 Subject: arch: x86: add IPC mailbox accessor function and add SoC register access MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Exports intel_pmc_ipc() for host access to the PMC IPC mailbox - Enables the host to access specific SoC registers through the PMC firmware using IPC commands. This access method is necessary for registers that are not available through direct Memory-Mapped I/O (MMIO), which is used for other accessible parts of the PMC. Signed-off-by: David E. 
Box Signed-off-by: Chao Qin Signed-off-by: Choong Yong Liang Acked-by: Ilpo Järvinen Link: https://patch.msgid.link/20250227121522.1802832-4-yong.liang.choong@linux.intel.com Signed-off-by: Jakub Kicinski --- include/linux/platform_data/x86/intel_pmc_ipc.h | 94 +++++++++++++++++++++++++ 1 file changed, 94 insertions(+) create mode 100644 include/linux/platform_data/x86/intel_pmc_ipc.h (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/intel_pmc_ipc.h b/include/linux/platform_data/x86/intel_pmc_ipc.h new file mode 100644 index 000000000000..6e603a8c075f --- /dev/null +++ b/include/linux/platform_data/x86/intel_pmc_ipc.h @@ -0,0 +1,94 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Intel Core SoC Power Management Controller Header File + * + * Copyright (c) 2025, Intel Corporation. + * All Rights Reserved. + * + */ +#ifndef INTEL_PMC_IPC_H +#define INTEL_PMC_IPC_H +#include + +#define IPC_SOC_REGISTER_ACCESS 0xAA +#define IPC_SOC_SUB_CMD_READ 0x00 +#define IPC_SOC_SUB_CMD_WRITE 0x01 +#define PMC_IPCS_PARAM_COUNT 7 +#define VALID_IPC_RESPONSE 5 + +struct pmc_ipc_cmd { + u32 cmd; + u32 sub_cmd; + u32 size; + u32 wbuf[4]; +}; + +struct pmc_ipc_rbuf { + u32 buf[4]; +}; + +/** + * intel_pmc_ipc() - PMC IPC Mailbox accessor + * @ipc_cmd: Prepared input command to send + * @rbuf: Allocated array for returned IPC data + * + * Return: 0 on success. 
Non-zero on mailbox error + */ +static inline int intel_pmc_ipc(struct pmc_ipc_cmd *ipc_cmd, struct pmc_ipc_rbuf *rbuf) +{ + struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL }; + union acpi_object params[PMC_IPCS_PARAM_COUNT] = { + {.type = ACPI_TYPE_INTEGER,}, + {.type = ACPI_TYPE_INTEGER,}, + {.type = ACPI_TYPE_INTEGER,}, + {.type = ACPI_TYPE_INTEGER,}, + {.type = ACPI_TYPE_INTEGER,}, + {.type = ACPI_TYPE_INTEGER,}, + {.type = ACPI_TYPE_INTEGER,}, + }; + struct acpi_object_list arg_list = { PMC_IPCS_PARAM_COUNT, params }; + union acpi_object *obj; + int status; + + if (!ipc_cmd || !rbuf) + return -EINVAL; + + /* + * 0: IPC Command + * 1: IPC Sub Command + * 2: Size + * 3-6: Write Buffer for offset + */ + params[0].integer.value = ipc_cmd->cmd; + params[1].integer.value = ipc_cmd->sub_cmd; + params[2].integer.value = ipc_cmd->size; + params[3].integer.value = ipc_cmd->wbuf[0]; + params[4].integer.value = ipc_cmd->wbuf[1]; + params[5].integer.value = ipc_cmd->wbuf[2]; + params[6].integer.value = ipc_cmd->wbuf[3]; + + status = acpi_evaluate_object(NULL, "\\IPCS", &arg_list, &buffer); + if (ACPI_FAILURE(status)) + return -ENODEV; + + obj = buffer.pointer; + + if (obj && obj->type == ACPI_TYPE_PACKAGE && + obj->package.count == VALID_IPC_RESPONSE) { + const union acpi_object *objs = obj->package.elements; + + if ((u8)objs[0].integer.value != 0) + return -EINVAL; + + rbuf->buf[0] = objs[1].integer.value; + rbuf->buf[1] = objs[2].integer.value; + rbuf->buf[2] = objs[3].integer.value; + rbuf->buf[3] = objs[4].integer.value; + } else { + return -EINVAL; + } + + return 0; +} + +#endif /* INTEL_PMC_IPC_H */ -- cgit v1.2.3 From e654cfc718d451bdf6c1554efc945171239f03f5 Mon Sep 17 00:00:00 2001 From: Choong Yong Liang Date: Thu, 27 Feb 2025 20:15:20 +0800 Subject: net: stmmac: configure SerDes on mac_finish SerDes will configure according to the provided interface mode after finish a major reconfiguration of the interface mode. 
Reviewed-by: Russell King (Oracle) Signed-off-by: Choong Yong Liang Link: https://patch.msgid.link/20250227121522.1802832-5-yong.liang.choong@linux.intel.com Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index cd0d1383df87..b6f03ab12595 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -239,6 +239,10 @@ struct plat_stmmacenet_data { int (*serdes_powerup)(struct net_device *ndev, void *priv); void (*serdes_powerdown)(struct net_device *ndev, void *priv); void (*speed_mode_2500)(struct net_device *ndev, void *priv); + int (*mac_finish)(struct net_device *ndev, + void *priv, + unsigned int mode, + phy_interface_t interface); void (*ptp_clk_freq_config)(struct stmmac_priv *priv); int (*init)(struct platform_device *pdev, void *priv); void (*exit)(struct platform_device *pdev, void *priv); -- cgit v1.2.3 From e7f984e925d23f8fd0469e344a8dc225e1a1b0ab Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Mon, 3 Mar 2025 21:18:46 +0100 Subject: net: phy: move PHY package related code from phy.h to phy_package.c Move PHY package related inline functions from phy.h to phy_package.c. While doing so remove locked versions phy_package_read() and phy_package_write() which have no user. 
Signed-off-by: Heiner Kallweit Link: https://patch.msgid.link/a4518379-7a5d-45f3-831c-b7fde6512c65@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 86 ----------------------------------------------------- 1 file changed, 86 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 7bfbae51070a..2b12d1bef3cc 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -350,10 +350,6 @@ struct phy_package_shared { void *priv; }; -/* used as bit number in atomic bitops */ -#define PHY_SHARED_F_INIT_DONE 0 -#define PHY_SHARED_F_PROBE_DONE 1 - /** * struct mii_bus - Represents an MDIO bus * @@ -2149,67 +2145,6 @@ int __phy_hwtstamp_set(struct phy_device *phydev, struct kernel_hwtstamp_config *config, struct netlink_ext_ack *extack); -static inline int phy_package_address(struct phy_device *phydev, - unsigned int addr_offset) -{ - struct phy_package_shared *shared = phydev->shared; - u8 base_addr = shared->base_addr; - - if (addr_offset >= PHY_MAX_ADDR - base_addr) - return -EIO; - - /* we know that addr will be in the range 0..31 and thus the - * implicit cast to a signed int is not a problem. 
- */ - return base_addr + addr_offset; -} - -static inline int phy_package_read(struct phy_device *phydev, - unsigned int addr_offset, u32 regnum) -{ - int addr = phy_package_address(phydev, addr_offset); - - if (addr < 0) - return addr; - - return mdiobus_read(phydev->mdio.bus, addr, regnum); -} - -static inline int __phy_package_read(struct phy_device *phydev, - unsigned int addr_offset, u32 regnum) -{ - int addr = phy_package_address(phydev, addr_offset); - - if (addr < 0) - return addr; - - return __mdiobus_read(phydev->mdio.bus, addr, regnum); -} - -static inline int phy_package_write(struct phy_device *phydev, - unsigned int addr_offset, u32 regnum, - u16 val) -{ - int addr = phy_package_address(phydev, addr_offset); - - if (addr < 0) - return addr; - - return mdiobus_write(phydev->mdio.bus, addr, regnum, val); -} - -static inline int __phy_package_write(struct phy_device *phydev, - unsigned int addr_offset, u32 regnum, - u16 val) -{ - int addr = phy_package_address(phydev, addr_offset); - - if (addr < 0) - return addr; - - return __mdiobus_write(phydev->mdio.bus, addr, regnum, val); -} - int __phy_package_read_mmd(struct phy_device *phydev, unsigned int addr_offset, int devad, u32 regnum); @@ -2226,27 +2161,6 @@ int phy_package_write_mmd(struct phy_device *phydev, unsigned int addr_offset, int devad, u32 regnum, u16 val); -static inline bool __phy_package_set_once(struct phy_device *phydev, - unsigned int b) -{ - struct phy_package_shared *shared = phydev->shared; - - if (!shared) - return false; - - return !test_and_set_bit(b, &shared->flags); -} - -static inline bool phy_package_init_once(struct phy_device *phydev) -{ - return __phy_package_set_once(phydev, PHY_SHARED_F_INIT_DONE); -} - -static inline bool phy_package_probe_once(struct phy_device *phydev) -{ - return __phy_package_set_once(phydev, PHY_SHARED_F_PROBE_DONE); -} - extern const struct bus_type mdio_bus_type; struct mdio_board_info { -- cgit v1.2.3 From a4002849776948f257d564944e8fd7a727362ed9 
Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Mon, 3 Mar 2025 21:19:25 +0100 Subject: net: phy: remove remaining PHY package related definitions from phy.h Move definition of struct phy_package_shared to phy_package.c, and move remaining PHY package related declarations from phy.h to phylib.h, thus making them accessible for PHY drivers only. Signed-off-by: Heiner Kallweit Link: https://patch.msgid.link/211e14b6-e2f8-43d7-b533-3628ec548456@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 38 -------------------------------------- 1 file changed, 38 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 2b12d1bef3cc..c4a6385faf41 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -319,37 +319,6 @@ struct mdio_bus_stats { struct u64_stats_sync syncp; }; -/** - * struct phy_package_shared - Shared information in PHY packages - * @base_addr: Base PHY address of PHY package used to combine PHYs - * in one package and for offset calculation of phy_package_read/write - * @np: Pointer to the Device Node if PHY package defined in DT - * @refcnt: Number of PHYs connected to this shared data - * @flags: Initialization of PHY package - * @priv_size: Size of the shared private data @priv - * @priv: Driver private data shared across a PHY package - * - * Represents a shared structure between different phydev's in the same - * package, for example a quad PHY. See phy_package_join() and - * phy_package_leave(). - */ -struct phy_package_shared { - u8 base_addr; - /* With PHY package defined in DT this points to the PHY package node */ - struct device_node *np; - refcount_t refcnt; - unsigned long flags; - size_t priv_size; - - /* private data pointer */ - /* note that this pointer is shared between different phydevs and - * the user has to take care of appropriate locking. It is allocated - * and freed automatically by phy_package_join() and - * phy_package_leave(). 
- */ - void *priv; -}; - /** * struct mii_bus - Represents an MDIO bus * @@ -2109,13 +2078,6 @@ int phy_ethtool_get_link_ksettings(struct net_device *ndev, int phy_ethtool_set_link_ksettings(struct net_device *ndev, const struct ethtool_link_ksettings *cmd); int phy_ethtool_nway_reset(struct net_device *ndev); -int phy_package_join(struct phy_device *phydev, int base_addr, size_t priv_size); -int of_phy_package_join(struct phy_device *phydev, size_t priv_size); -void phy_package_leave(struct phy_device *phydev); -int devm_phy_package_join(struct device *dev, struct phy_device *phydev, - int base_addr, size_t priv_size); -int devm_of_phy_package_join(struct device *dev, struct phy_device *phydev, - size_t priv_size); int __init mdio_bus_init(void); void mdio_bus_exit(void); -- cgit v1.2.3 From d4c22ec680c8db832ffc0b964c6008e65436cba8 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 5 Mar 2025 08:37:19 -0800 Subject: net: hold netdev instance lock during ndo_open/ndo_stop For the drivers that use shaper API, switch to the mode where core stack holds the netdev lock. This affects two drivers: * iavf - already grabs netdev lock in ndo_open/ndo_stop, so mostly remove these * netdevsim - switch to _locked APIs to avoid deadlock iavf_close diff is a bit confusing, the existing call looks like this: iavf_close() { netdev_lock() .. netdev_unlock() wait_event_timeout(down_waitqueue) } I change it to the following: netdev_lock() iavf_close() { .. netdev_unlock() wait_event_timeout(down_waitqueue) netdev_lock() // reusing this lock call } netdev_unlock() Since I'm reusing existing netdev_lock call, so it looks like I only add netdev_unlock. 
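The reshuffled locking in the close path can be pictured with a toy model. This is a sketch only: `struct demo_dev` and the `demo_*()` functions are hypothetical, a boolean stands in for the netdev instance lock, and the wait is reduced to recording that it ran without the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; lock_held stands in for the instance lock. */
struct demo_dev {
	bool lock_held;
	bool waited_unlocked;	/* set if the wait ran without the lock */
};

static void demo_lock(struct demo_dev *d)
{
	assert(!d->lock_held);	/* the model lock is not recursive */
	d->lock_held = true;
}

static void demo_unlock(struct demo_dev *d)
{
	assert(d->lock_held);
	d->lock_held = false;
}

/* Driver close callback: entered and left with the lock held. */
static void demo_driver_close(struct demo_dev *d)
{
	assert(d->lock_held);
	/* ... teardown that needs the lock ... */

	demo_unlock(d);
	/* models wait_event_timeout(): must not sleep under the lock */
	d->waited_unlocked = !d->lock_held;
	demo_lock(d);
}

/* Core-side caller: takes the lock around the driver callback. */
void demo_core_stop(struct demo_dev *d)
{
	demo_lock(d);
	demo_driver_close(d);
	demo_unlock(d);
}
```

The asserts encode the contract the commit message describes: the callback may drop the lock temporarily, but it has to hand it back to the caller in the held state.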
Cc: Saeed Mahameed Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7ab86ec228b7..33066b155c84 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2753,6 +2753,29 @@ static inline void netdev_assert_locked_or_invisible(struct net_device *dev) netdev_assert_locked(dev); } +static inline bool netdev_need_ops_lock(struct net_device *dev) +{ + bool ret = false; + +#if IS_ENABLED(CONFIG_NET_SHAPER) + ret |= !!dev->netdev_ops->net_shaper_ops; +#endif + + return ret; +} + +static inline void netdev_lock_ops(struct net_device *dev) +{ + if (netdev_need_ops_lock(dev)) + netdev_lock(dev); +} + +static inline void netdev_unlock_ops(struct net_device *dev) +{ + if (netdev_need_ops_lock(dev)) + netdev_unlock(dev); +} + void netif_napi_set_irq_locked(struct napi_struct *napi, int irq); static inline void netif_napi_set_irq(struct napi_struct *napi, int irq) -- cgit v1.2.3 From c4f0f30b424e7258a777bcbcbf9006207da4854c Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 5 Mar 2025 08:37:20 -0800 Subject: net: hold netdev instance lock during nft ndo_setup_tc Introduce new dev_setup_tc for nft ndo_setup_tc paths. 
Reviewed-by: Eric Dumazet Cc: Saeed Mahameed Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 33066b155c84..69951eeb96d2 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3353,6 +3353,8 @@ int dev_alloc_name(struct net_device *dev, const char *name); int dev_open(struct net_device *dev, struct netlink_ext_ack *extack); void dev_close(struct net_device *dev); void dev_close_many(struct list_head *head, bool unlink); +int dev_setup_tc(struct net_device *dev, enum tc_setup_type type, + void *type_data); void dev_disable_lro(struct net_device *dev); int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *newskb); u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb, -- cgit v1.2.3 From cae03e5bdd9e0c8570506c50f1f234da40201732 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 5 Mar 2025 08:37:23 -0800 Subject: net: hold netdev instance lock during queue operations For the drivers that use queue management API, switch to the mode where core stack holds the netdev instance lock. This affects the following drivers: - bnxt - gve - netdevsim Originally I locked only start/stop, but switched to holding the lock over all iterations to make them look atomic to the device (feels like it should be easier to reason about). 
Reviewed-by: Eric Dumazet Cc: Saeed Mahameed Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-6-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 69951eeb96d2..abda17b15950 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2755,7 +2755,7 @@ static inline void netdev_assert_locked_or_invisible(struct net_device *dev) static inline bool netdev_need_ops_lock(struct net_device *dev) { - bool ret = false; + bool ret = !!dev->queue_mgmt_ops; #if IS_ENABLED(CONFIG_NET_SHAPER) ret |= !!dev->netdev_ops->net_shaper_ops; -- cgit v1.2.3 From 7e4d784f5810bba76c4593791028e13cce4af547 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 5 Mar 2025 08:37:24 -0800 Subject: net: hold netdev instance lock during rtnetlink operations To preserve the atomicity, hold the lock while applying multiple attributes. The major issue with a full conversion to the instance lock are software nesting devices (bonding/team/vrf/etc). Those devices call into the core stack for their lower (potentially real hw) devices. To avoid explicitly wrapping all those places into instance lock/unlock, introduce new API boundaries: - (some) existing dev_xxx calls are now considered "external" (to drivers) APIs and they transparently grab the instance lock if needed (dev_api.c) - new netif_xxx calls are internal core stack API (naming is sketchy, I've tried netdev_xxx_locked per Jakub's suggestion, but it feels a bit verbose; but happy to get back to this naming scheme if this is the preference) This avoids touching most of the existing ioctl/sysfs/drivers paths. Note the special handling of ndo_xxx_slave operations: I exploit the fact that none of the drivers that call these functions need/use instance lock. 
At the same time, they use dev_xxx APIs, so the lower device has to be unlocked. Changes in unregister_netdevice_many_notify (to protect dev->state with instance lock) trigger lockdep - the loop over close_list (mostly from cleanup_net) introduces spurious ordering issues. netdev_lock_cmp_fn has a justification on why it's ok to suppress for now. Cc: Saeed Mahameed Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 40 ++++++++++++++++++++++++++++++++++------ 1 file changed, 34 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index abda17b15950..be3d09b61e95 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2620,16 +2620,35 @@ static inline void netdev_for_each_tx_queue(struct net_device *dev, f(dev, &dev->_tx[i], arg); } +static inline int netdev_lock_cmp_fn(const struct lockdep_map *a, + const struct lockdep_map *b) +{ + /* Only lower devices currently grab the instance lock, so no + * real ordering issues can occur. In the near future, only + * hardware devices will grab instance lock which also does not + * involve any ordering. Suppress lockdep ordering warnings + * until (if) we start grabbing instance lock on pure SW + * devices (bond/team/veth/etc). 
+ */ + if (a == b) + return 0; + return -1; +} + #define netdev_lockdep_set_classes(dev) \ { \ static struct lock_class_key qdisc_tx_busylock_key; \ static struct lock_class_key qdisc_xmit_lock_key; \ static struct lock_class_key dev_addr_list_lock_key; \ + static struct lock_class_key dev_instance_lock_key; \ unsigned int i; \ \ (dev)->qdisc_tx_busylock = &qdisc_tx_busylock_key; \ lockdep_set_class(&(dev)->addr_list_lock, \ &dev_addr_list_lock_key); \ + lockdep_set_class(&(dev)->lock, \ + &dev_instance_lock_key); \ + lock_set_cmp_fn(&dev->lock, netdev_lock_cmp_fn, NULL); \ for (i = 0; i < (dev)->num_tx_queues; i++) \ lockdep_set_class(&(dev)->_tx[i]._xmit_lock, \ &qdisc_xmit_lock_key); \ @@ -2776,6 +2795,12 @@ static inline void netdev_unlock_ops(struct net_device *dev) netdev_unlock(dev); } +static inline void netdev_ops_assert_locked(struct net_device *dev) +{ + if (netdev_need_ops_lock(dev)) + lockdep_assert_held(&dev->lock); +} + void netif_napi_set_irq_locked(struct napi_struct *napi, int irq); static inline void netif_napi_set_irq(struct napi_struct *napi, int irq) @@ -3350,7 +3375,9 @@ struct net_device *dev_get_by_name_rcu(struct net *net, const char *name); struct net_device *__dev_get_by_name(struct net *net, const char *name); bool netdev_name_in_use(struct net *net, const char *name); int dev_alloc_name(struct net_device *dev, const char *name); +int netif_open(struct net_device *dev, struct netlink_ext_ack *extack); int dev_open(struct net_device *dev, struct netlink_ext_ack *extack); +void netif_close(struct net_device *dev); void dev_close(struct net_device *dev); void dev_close_many(struct list_head *head, bool unlink); int dev_setup_tc(struct net_device *dev, enum tc_setup_type type, @@ -4211,25 +4238,26 @@ int dev_ethtool(struct net *net, struct ifreq *ifr, void __user *userdata); unsigned int dev_get_flags(const struct net_device *); int __dev_change_flags(struct net_device *dev, unsigned int flags, struct netlink_ext_ack *extack); +int 
netif_change_flags(struct net_device *dev, unsigned int flags, + struct netlink_ext_ack *extack); int dev_change_flags(struct net_device *dev, unsigned int flags, struct netlink_ext_ack *extack); +int netif_set_alias(struct net_device *dev, const char *alias, size_t len); int dev_set_alias(struct net_device *, const char *, size_t); int dev_get_alias(const struct net_device *, char *, size_t); -int __dev_change_net_namespace(struct net_device *dev, struct net *net, +int netif_change_net_namespace(struct net_device *dev, struct net *net, const char *pat, int new_ifindex, struct netlink_ext_ack *extack); -static inline int dev_change_net_namespace(struct net_device *dev, struct net *net, - const char *pat) -{ - return __dev_change_net_namespace(dev, net, pat, 0, NULL); -} + const char *pat); int __dev_set_mtu(struct net_device *, int); int dev_set_mtu(struct net_device *, int); int dev_pre_changeaddr_notify(struct net_device *dev, const char *addr, struct netlink_ext_ack *extack); int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa, struct netlink_ext_ack *extack); +int netif_set_mac_address_user(struct net_device *dev, struct sockaddr *sa, + struct netlink_ext_ack *extack); int dev_set_mac_address_user(struct net_device *dev, struct sockaddr *sa, struct netlink_ext_ack *extack); int dev_get_mac_address(struct sockaddr *sa, struct net *net, char *dev_name); -- cgit v1.2.3 From ffb7ed19ac0a9fa9ea79af1d7b42c03a10da98a5 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 5 Mar 2025 08:37:25 -0800 Subject: net: hold netdev instance lock during ioctl operations Convert all ndo_eth_ioctl invocations to dev_eth_ioctl which does the locking. Reflow some of the dev_siocxxx to drop else clause. 
Cc: Saeed Mahameed Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-8-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index be3d09b61e95..8d243c0ec39d 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4229,6 +4229,8 @@ int put_user_ifreq(struct ifreq *ifr, void __user *arg); int dev_ioctl(struct net *net, unsigned int cmd, struct ifreq *ifr, void __user *data, bool *need_copyout); int dev_ifconf(struct net *net, struct ifconf __user *ifc); +int dev_eth_ioctl(struct net_device *dev, + struct ifreq *ifr, unsigned int cmd); int generic_hwtstamp_get_lower(struct net_device *dev, struct kernel_hwtstamp_config *kernel_cfg); int generic_hwtstamp_set_lower(struct net_device *dev, @@ -4251,6 +4253,7 @@ int netif_change_net_namespace(struct net_device *dev, struct net *net, int dev_change_net_namespace(struct net_device *dev, struct net *net, const char *pat); int __dev_set_mtu(struct net_device *, int); +int netif_set_mtu(struct net_device *dev, int new_mtu); int dev_set_mtu(struct net_device *, int); int dev_pre_changeaddr_notify(struct net_device *dev, const char *addr, struct netlink_ext_ack *extack); -- cgit v1.2.3 From ad7c7b2172c388818a111455643491d75f535e90 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 5 Mar 2025 08:37:26 -0800 Subject: net: hold netdev instance lock during sysfs operations Most of them are already covered by the converted dev_xxx APIs. Add the locking wrappers for the remaining ones. 
Cc: Saeed Mahameed Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-9-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 8d243c0ec39d..c61b12809588 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3382,6 +3382,7 @@ void dev_close(struct net_device *dev); void dev_close_many(struct list_head *head, bool unlink); int dev_setup_tc(struct net_device *dev, enum tc_setup_type type, void *type_data); +void netif_disable_lro(struct net_device *dev); void dev_disable_lro(struct net_device *dev); int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *newskb); u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb, @@ -4257,6 +4258,8 @@ int netif_set_mtu(struct net_device *dev, int new_mtu); int dev_set_mtu(struct net_device *, int); int dev_pre_changeaddr_notify(struct net_device *dev, const char *addr, struct netlink_ext_ack *extack); +int netif_set_mac_address(struct net_device *dev, struct sockaddr *sa, + struct netlink_ext_ack *extack); int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa, struct netlink_ext_ack *extack); int netif_set_mac_address_user(struct net_device *dev, struct sockaddr *sa, @@ -5016,6 +5019,7 @@ static inline void __dev_mc_unsync(struct net_device *dev, /* Functions used for secondary unicast and multicast support */ void dev_set_rx_mode(struct net_device *dev); int dev_set_promiscuity(struct net_device *dev, int inc); +int netif_set_allmulti(struct net_device *dev, int inc, bool notify); int dev_set_allmulti(struct net_device *dev, int inc); void netdev_state_change(struct net_device *dev); void __netdev_notify_peers(struct net_device *dev); -- cgit v1.2.3 From 97246d6d21c21fb4c5235770a21855e457096a96 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 5 Mar 2025 
08:37:27 -0800 Subject: net: hold netdev instance lock during ndo_bpf Cover the paths that come via bpf system call and XSK bind. Cc: Saeed Mahameed Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-10-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c61b12809588..ca9c09dab14e 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4277,6 +4277,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); u8 dev_xdp_prog_count(struct net_device *dev); +int netif_xdp_propagate(struct net_device *dev, struct netdev_bpf *bpf); int dev_xdp_propagate(struct net_device *dev, struct netdev_bpf *bpf); u8 dev_xdp_sb_prog_count(struct net_device *dev); u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode); -- cgit v1.2.3 From df43d8bf10316a7c3b1e47e3cc0057a54df4a5b8 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 5 Mar 2025 08:37:29 -0800 Subject: net: replace dev_addr_sem with netdev instance lock Lockdep reports possible circular dependency in [0]. Instead of fixing the ordering, replace global dev_addr_sem with netdev instance lock. Most of the paths that set/get mac are RTNL protected. 
Two places where it's not, convert to explicit locking: - sysfs address_show - dev_get_mac_address via dev_ioctl 0: https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/993321/24-router-bridge-1d-lag-sh/stderr Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-12-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index ca9c09dab14e..f3e6e6f27e22 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2492,7 +2492,7 @@ struct net_device { * * Protects: * @gro_flush_timeout, @napi_defer_hard_irqs, @napi_list, - * @net_shaper_hierarchy, @reg_state, @threaded + * @net_shaper_hierarchy, @reg_state, @threaded, @dev_addr * * Partially protects (writers must hold both @lock and rtnl_lock): * @up @@ -4262,10 +4262,6 @@ int netif_set_mac_address(struct net_device *dev, struct sockaddr *sa, struct netlink_ext_ack *extack); int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa, struct netlink_ext_ack *extack); -int netif_set_mac_address_user(struct net_device *dev, struct sockaddr *sa, - struct netlink_ext_ack *extack); -int dev_set_mac_address_user(struct net_device *dev, struct sockaddr *sa, - struct netlink_ext_ack *extack); int dev_get_mac_address(struct sockaddr *sa, struct net *net, char *dev_name); int dev_get_port_parent_id(struct net_device *dev, struct netdev_phys_item_id *ppid, bool recurse); -- cgit v1.2.3 From 605ef7aec0605bd5506b31591de278d652bd0096 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 5 Mar 2025 08:37:30 -0800 Subject: net: add option to request netdev instance lock Currently only the drivers that implement shaper or queue APIs are grabbing instance lock. Add an explicit opt-in for the drivers that want to grab the lock without implementing the above APIs. 
There is a 3-byte hole after @up, use it: /* --- cacheline 47 boundary (3008 bytes) --- */ u32 napi_defer_hard_irqs; /* 3008 4 */ bool up; /* 3012 1 */ /* XXX 3 bytes hole, try to pack */ struct mutex lock; /* 3016 144 */ /* XXX last struct has 1 hole */ Cc: Saeed Mahameed Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-13-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f3e6e6f27e22..adf201617b72 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2485,6 +2485,12 @@ struct net_device { */ bool up; + /** + * @request_ops_lock: request the core to run all @netdev_ops and + * @ethtool_ops under the @lock. + */ + bool request_ops_lock; + /** * @lock: netdev-scope lock, protects a small selection of fields. * Should always be taken using netdev_lock() / netdev_unlock() helpers. @@ -2774,7 +2780,7 @@ static inline void netdev_assert_locked_or_invisible(struct net_device *dev) static inline bool netdev_need_ops_lock(struct net_device *dev) { - bool ret = !!dev->queue_mgmt_ops; + bool ret = dev->request_ops_lock || !!dev->queue_mgmt_ops; #if IS_ENABLED(CONFIG_NET_SHAPER) ret |= !!dev->netdev_ops->net_shaper_ops; -- cgit v1.2.3 From cc34acd577f1a6ed805106bfcc9a262837dbd0da Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 5 Mar 2025 08:37:31 -0800 Subject: docs: net: document new locking reality Also clarify ndo_get_stats (that read and write paths can run concurrently) and mention only RCU. 
Cc: Saeed Mahameed Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-14-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index adf201617b72..2f8560a354ba 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2505,6 +2505,10 @@ struct net_device { * * Also protects some fields in struct napi_struct. * + * For the drivers that implement shaper or queue API, the scope + * of this lock is expanded to cover most ndo/queue/ethtool/sysfs + * operations. + * * Ordering: take after rtnl_lock. */ struct mutex lock; -- cgit v1.2.3 From 004b5008016a2cc37103bf8d9968573771cd311f Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 5 Mar 2025 08:37:32 -0800 Subject: eth: bnxt: remove most dependencies on RTNL Only devlink and sriov paths are grabbing rtnl explicitly. The rest is covered by netdev instance lock which the core now grabs, so there is no need to manage rtnl in most places anymore. On the core side we can now try to drop rtnl in some places (do_setlink for example) for the drivers that signal non-rtnl mode (TBD). Boot-tested and with `ethtool -L eth1 combined 24` to trigger reset. 
Cc: Saeed Mahameed Reviewed-by: Michael Chan Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250305163732.2766420-15-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2f8560a354ba..d206c9592b60 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2765,6 +2765,11 @@ static inline void netdev_lock(struct net_device *dev) mutex_lock(&dev->lock); } +static inline bool netdev_trylock(struct net_device *dev) +{ + return mutex_trylock(&dev->lock); +} + static inline void netdev_unlock(struct net_device *dev) { mutex_unlock(&dev->lock); -- cgit v1.2.3 From a2f61f1db85532e72fb8a3af51b06df94bb82912 Mon Sep 17 00:00:00 2001 From: Shahar Shitrit Date: Tue, 4 Mar 2025 18:06:15 +0200 Subject: net/mlx5: Relocate function declarations from port.h to mlx5_core.h The port header is a general file under include, yet it contains declarations for functions that are either not exported or exported but not used outside the mlx5_core driver. To enhance code organization, we move these declarations to mlx5_core.h, where they are more appropriately scoped. This refactor removes unnecessary exported symbols and prevents unexported functions from being inadvertently referenced outside of the mlx5_core driver. 
Signed-off-by: Shahar Shitrit Reviewed-by: Carolina Jubran Signed-off-by: Tariq Toukan Link: https://patch.msgid.link/20250304160620.417580-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski --- include/linux/mlx5/port.h | 85 +---------------------------------------------- 1 file changed, 1 insertion(+), 84 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h index fd625e0dd869..58770b86f793 100644 --- a/include/linux/mlx5/port.h +++ b/include/linux/mlx5/port.h @@ -61,15 +61,6 @@ enum mlx5_an_status { #define MLX5_EEPROM_PAGE_LENGTH 256 #define MLX5_EEPROM_HIGH_PAGE_LENGTH 128 -struct mlx5_module_eeprom_query_params { - u16 size; - u16 offset; - u16 i2c_address; - u32 page; - u32 bank; - u32 module_number; -}; - enum mlx5e_link_mode { MLX5E_1000BASE_CX_SGMII = 0, MLX5E_1000BASE_KX = 1, @@ -145,12 +136,6 @@ enum mlx5_ptys_width { MLX5_PTYS_WIDTH_12X = 1 << 4, }; -struct mlx5_port_eth_proto { - u32 cap; - u32 admin; - u32 oper; -}; - #define MLX5E_PROT_MASK(link_mode) (1U << link_mode) #define MLX5_GET_ETH_PROTO(reg, out, ext, field) \ (ext ? 
MLX5_GET(reg, out, ext_##field) : \ @@ -163,14 +148,7 @@ int mlx5_query_port_ptys(struct mlx5_core_dev *dev, u32 *ptys, int mlx5_query_ib_port_oper(struct mlx5_core_dev *dev, u16 *link_width_oper, u16 *proto_oper, u8 local_port, u8 plane_index); -void mlx5_toggle_port_link(struct mlx5_core_dev *dev); -int mlx5_set_port_admin_status(struct mlx5_core_dev *dev, - enum mlx5_port_status status); -int mlx5_query_port_admin_status(struct mlx5_core_dev *dev, - enum mlx5_port_status *status); -int mlx5_set_port_beacon(struct mlx5_core_dev *dev, u16 beacon_duration); - -int mlx5_set_port_mtu(struct mlx5_core_dev *dev, u16 mtu, u8 port); + void mlx5_query_port_max_mtu(struct mlx5_core_dev *dev, u16 *max_mtu, u8 port); void mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, u16 *oper_mtu, u8 port); @@ -178,65 +156,4 @@ void mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, u16 *oper_mtu, int mlx5_query_port_vl_hw_cap(struct mlx5_core_dev *dev, u8 *vl_hw_cap, u8 local_port); -int mlx5_set_port_pause(struct mlx5_core_dev *dev, u32 rx_pause, u32 tx_pause); -int mlx5_query_port_pause(struct mlx5_core_dev *dev, - u32 *rx_pause, u32 *tx_pause); - -int mlx5_set_port_pfc(struct mlx5_core_dev *dev, u8 pfc_en_tx, u8 pfc_en_rx); -int mlx5_query_port_pfc(struct mlx5_core_dev *dev, u8 *pfc_en_tx, - u8 *pfc_en_rx); - -int mlx5_set_port_stall_watermark(struct mlx5_core_dev *dev, - u16 stall_critical_watermark, - u16 stall_minor_watermark); -int mlx5_query_port_stall_watermark(struct mlx5_core_dev *dev, - u16 *stall_critical_watermark, u16 *stall_minor_watermark); - -int mlx5_max_tc(struct mlx5_core_dev *mdev); - -int mlx5_set_port_prio_tc(struct mlx5_core_dev *mdev, u8 *prio_tc); -int mlx5_query_port_prio_tc(struct mlx5_core_dev *mdev, - u8 prio, u8 *tc); -int mlx5_set_port_tc_group(struct mlx5_core_dev *mdev, u8 *tc_group); -int mlx5_query_port_tc_group(struct mlx5_core_dev *mdev, - u8 tc, u8 *tc_group); -int mlx5_set_port_tc_bw_alloc(struct mlx5_core_dev *mdev, u8 *tc_bw); -int 
mlx5_query_port_tc_bw_alloc(struct mlx5_core_dev *mdev, - u8 tc, u8 *bw_pct); -int mlx5_modify_port_ets_rate_limit(struct mlx5_core_dev *mdev, - u8 *max_bw_value, - u8 *max_bw_unit); -int mlx5_query_port_ets_rate_limit(struct mlx5_core_dev *mdev, - u8 *max_bw_value, - u8 *max_bw_unit); -int mlx5_set_port_wol(struct mlx5_core_dev *mdev, u8 wol_mode); -int mlx5_query_port_wol(struct mlx5_core_dev *mdev, u8 *wol_mode); - -int mlx5_query_ports_check(struct mlx5_core_dev *mdev, u32 *out, int outlen); -int mlx5_set_ports_check(struct mlx5_core_dev *mdev, u32 *in, int inlen); -int mlx5_set_port_fcs(struct mlx5_core_dev *mdev, u8 enable); -void mlx5_query_port_fcs(struct mlx5_core_dev *mdev, bool *supported, - bool *enabled); -int mlx5_query_module_eeprom(struct mlx5_core_dev *dev, - u16 offset, u16 size, u8 *data); -int mlx5_query_module_eeprom_by_page(struct mlx5_core_dev *dev, - struct mlx5_module_eeprom_query_params *params, u8 *data); - -int mlx5_query_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *out); -int mlx5_set_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *in); - -int mlx5_set_trust_state(struct mlx5_core_dev *mdev, u8 trust_state); -int mlx5_query_trust_state(struct mlx5_core_dev *mdev, u8 *trust_state); -int mlx5_set_dscp2prio(struct mlx5_core_dev *mdev, u8 dscp, u8 prio); -int mlx5_query_dscp2prio(struct mlx5_core_dev *mdev, u8 *dscp2prio); - -int mlx5_port_query_eth_proto(struct mlx5_core_dev *dev, u8 port, bool ext, - struct mlx5_port_eth_proto *eproto); -bool mlx5_ptys_ext_supported(struct mlx5_core_dev *mdev); -u32 mlx5_port_ptys2speed(struct mlx5_core_dev *mdev, u32 eth_proto_oper, - bool force_legacy); -u32 mlx5_port_speed2linkmodes(struct mlx5_core_dev *mdev, u32 speed, - bool force_legacy); -int mlx5_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed); - #endif /* __MLX5_PORT_H__ */ -- cgit v1.2.3 From c8be7018d47cfdf36ae6b5bedddca9fc99cd2f0b Mon Sep 17 00:00:00 2001 From: "Dr. 
David Alan Gilbert" Date: Thu, 6 Mar 2025 18:45:34 +0000 Subject: net: phylink: Remove unused phylink_init_eee phylink_init_eee() is currently unused. It was last added in 2019 by commit 86e58135bc4a ("net: phylink: add phylink_init_eee() helper") but it didn't actually wire a use up. It had previously been removed in 2017 by commit 939eae25d9a5 ("phylink: remove phylink_init_eee()"). Remove it again. Signed-off-by: Dr. David Alan Gilbert Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/20250306184534.246152-1-linux@treblig.org Signed-off-by: Jakub Kicinski --- include/linux/phylink.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 08df65f6867a..c187267a15b6 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -714,7 +714,6 @@ void phylink_ethtool_get_pauseparam(struct phylink *, int phylink_ethtool_set_pauseparam(struct phylink *, struct ethtool_pauseparam *); int phylink_get_eee_err(struct phylink *); -int phylink_init_eee(struct phylink *, bool); int phylink_ethtool_get_eee(struct phylink *link, struct ethtool_keee *eee); int phylink_ethtool_set_eee(struct phylink *link, struct ethtool_keee *eee); int phylink_mii_ioctl(struct phylink *, struct ifreq *, int); -- cgit v1.2.3 From 248f6571fd4c51531f7f8f07f186f7ae98a50afc Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Tue, 4 Mar 2025 07:50:41 -0800 Subject: netpoll: Optimize skb refilling on critical path netpoll tries to refill the skb queue on every packet send, regardless of whether packets are being consumed from the pool or not. This was particularly problematic while being called from printk(), where the operation would be done while holding the console lock. Introduce a more intelligent approach to skb queue management. Instead of constantly attempting to refill the queue, the system now defers refilling to a work queue and only triggers the workqueue when a buffer is actually dequeued.
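The deferral described above can be sketched as a small userspace model. It is only an illustration under stated assumptions: the `mock_` names, `POOL_TARGET`, and the boolean flag standing in for a scheduled work item are all hypothetical, and the real code would presumably queue the work with something like schedule_work() rather than set a flag.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_TARGET 4  /* hypothetical pool size, not the kernel constant */

struct mock_pool {
	int available;        /* buffers currently in the pool */
	bool refill_pending;  /* stands in for a queued work item */
	int refill_runs;      /* how often the deferred refill actually ran */
};

/* Hot path: take a buffer if one exists, and only then request a refill. */
static bool mock_find_buf(struct mock_pool *pool)
{
	assert(pool->available >= 0 && pool->available <= POOL_TARGET);
	if (pool->available == 0)
		return false;
	pool->available--;
	pool->refill_pending = true;  /* the kernel would queue work here */
	return true;
}

/* Deferred path: runs later, outside the critical section. */
static void mock_refill_worker(struct mock_pool *pool)
{
	if (!pool->refill_pending)
		return;  /* nothing was dequeued, so nothing to do */
	pool->available = POOL_TARGET;
	pool->refill_pending = false;
	pool->refill_runs++;
}
```

When no buffer is dequeued the worker does nothing, which is the case the message describes as wasted effort in the old scheme.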
This change significantly reduces operations with the lock held. Add a work_struct to the netpoll structure for asynchronous refilling, updating find_skb() to schedule refill work only when necessary (skb is dequeued). These changes have demonstrated a 15% reduction in time spent during netpoll_send_msg operations, especially when no SKBs are consumed from the pool. When SKBs are being dequeued, the improvement is even better, around 70%, mainly because refilling the SKB pool is now happening outside of the critical path (with console_owner lock held). Signed-off-by: Breno Leitao Reviewed-by: Simon Horman Link: https://patch.msgid.link/20250304-netpoll_refill_v2-v1-1-06e2916a4642@debian.org Signed-off-by: Jakub Kicinski --- include/linux/netpoll.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h index f91e50a76efd..f6e8abe0b1f1 100644 --- a/include/linux/netpoll.h +++ b/include/linux/netpoll.h @@ -33,6 +33,7 @@ struct netpoll { u16 local_port, remote_port; u8 remote_mac[ETH_ALEN]; struct sk_buff_head skb_pool; + struct work_struct refill_wq; }; struct netpoll_info { -- cgit v1.2.3 From 8ef890df4031121a94407c84659125cbccd3fdbe Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 7 Mar 2025 10:30:06 -0800 Subject: net: move misc netdev_lock flavors to a separate header Move the more esoteric helpers for netdev instance lock to a dedicated header. This avoids growing netdevice.h to infinity and makes rebuilding the kernel much faster (after touching the header with the helpers). The main netdev_lock() / netdev_unlock() functions are used in static inlines in netdevice.h and will probably be used most commonly, so keep them in netdevice.h.
Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250307183006.2312761-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 81 +---------------------------------------------- 1 file changed, 1 insertion(+), 80 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index d206c9592b60..9a297757df7e 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2630,40 +2630,6 @@ static inline void netdev_for_each_tx_queue(struct net_device *dev, f(dev, &dev->_tx[i], arg); } -static inline int netdev_lock_cmp_fn(const struct lockdep_map *a, - const struct lockdep_map *b) -{ - /* Only lower devices currently grab the instance lock, so no - * real ordering issues can occur. In the near future, only - * hardware devices will grab instance lock which also does not - * involve any ordering. Suppress lockdep ordering warnings - * until (if) we start grabbing instance lock on pure SW - * devices (bond/team/veth/etc). 
- */ - if (a == b) - return 0; - return -1; -} - -#define netdev_lockdep_set_classes(dev) \ -{ \ - static struct lock_class_key qdisc_tx_busylock_key; \ - static struct lock_class_key qdisc_xmit_lock_key; \ - static struct lock_class_key dev_addr_list_lock_key; \ - static struct lock_class_key dev_instance_lock_key; \ - unsigned int i; \ - \ - (dev)->qdisc_tx_busylock = &qdisc_tx_busylock_key; \ - lockdep_set_class(&(dev)->addr_list_lock, \ - &dev_addr_list_lock_key); \ - lockdep_set_class(&(dev)->lock, \ - &dev_instance_lock_key); \ - lock_set_cmp_fn(&dev->lock, netdev_lock_cmp_fn, NULL); \ - for (i = 0; i < (dev)->num_tx_queues; i++) \ - lockdep_set_class(&(dev)->_tx[i]._xmit_lock, \ - &qdisc_xmit_lock_key); \ -} - u16 netdev_pick_tx(struct net_device *dev, struct sk_buff *skb, struct net_device *sb_dev); struct netdev_queue *netdev_core_pick_tx(struct net_device *dev, @@ -2765,56 +2731,11 @@ static inline void netdev_lock(struct net_device *dev) mutex_lock(&dev->lock); } -static inline bool netdev_trylock(struct net_device *dev) -{ - return mutex_trylock(&dev->lock); -} - static inline void netdev_unlock(struct net_device *dev) { mutex_unlock(&dev->lock); } - -static inline void netdev_assert_locked(struct net_device *dev) -{ - lockdep_assert_held(&dev->lock); -} - -static inline void netdev_assert_locked_or_invisible(struct net_device *dev) -{ - if (dev->reg_state == NETREG_REGISTERED || - dev->reg_state == NETREG_UNREGISTERING) - netdev_assert_locked(dev); -} - -static inline bool netdev_need_ops_lock(struct net_device *dev) -{ - bool ret = dev->request_ops_lock || !!dev->queue_mgmt_ops; - -#if IS_ENABLED(CONFIG_NET_SHAPER) - ret |= !!dev->netdev_ops->net_shaper_ops; -#endif - - return ret; -} - -static inline void netdev_lock_ops(struct net_device *dev) -{ - if (netdev_need_ops_lock(dev)) - netdev_lock(dev); -} - -static inline void netdev_unlock_ops(struct net_device *dev) -{ - if (netdev_need_ops_lock(dev)) - netdev_unlock(dev); -} - -static inline void 
netdev_ops_assert_locked(struct net_device *dev) -{ - if (netdev_need_ops_lock(dev)) - lockdep_assert_held(&dev->lock); -} +/* Additional netdev_lock()-related helpers are in net/netdev_lock.h */ void netif_napi_set_irq_locked(struct napi_struct *napi, int irq); -- cgit v1.2.3 From f6f425f3d251c059d1251edd4f37024290d3efca Mon Sep 17 00:00:00 2001 From: Chiara Meiohas Date: Wed, 26 Feb 2025 15:01:05 +0200 Subject: net/mlx5: Add RDMA_CTRL HW capabilities Add RDMA_CTRL UCTX capabilities and add the RDMA_CTRL general object type in hca_cap_2. Reviewed-by: Moshe Shemesh Signed-off-by: Chiara Meiohas Link: https://patch.msgid.link/ef7eb24be9a6f247ab52e8b4480350072e5182f5.1740574103.git.leon@kernel.org Signed-off-by: Leon Romanovsky --- include/linux/mlx5/mlx5_ifc.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index cc2875e843f7..3b3d88ffcacc 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1570,6 +1570,8 @@ enum { enum { MLX5_UCTX_CAP_RAW_TX = 1UL << 0, MLX5_UCTX_CAP_INTERNAL_DEV_RES = 1UL << 1, + MLX5_UCTX_CAP_RDMA_CTRL = 1UL << 3, + MLX5_UCTX_CAP_RDMA_CTRL_OTHER_VHCA = 1UL << 4, }; #define MLX5_FC_BULK_SIZE_FACTOR 128 @@ -2140,7 +2142,8 @@ struct mlx5_ifc_cmd_hca_cap_2_bits { u8 log_min_mkey_entity_size[0x5]; u8 reserved_at_1b0[0x10]; - u8 reserved_at_1c0[0x60]; + u8 general_obj_types_127_64[0x40]; + u8 reserved_at_200[0x20]; u8 reserved_at_220[0x1]; u8 sw_vhca_id_valid[0x1]; @@ -12494,6 +12497,10 @@ enum { MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_FLOW_METER_ASO = BIT_ULL(0x24), }; +enum { + MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_RDMA_CTRL = BIT_ULL(0x13), +}; + enum { MLX5_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = 0xc, MLX5_GENERAL_OBJECT_TYPES_IPSEC = 0x13, @@ -12501,6 +12508,7 @@ enum { MLX5_GENERAL_OBJECT_TYPES_FLOW_METER_ASO = 0x24, MLX5_GENERAL_OBJECT_TYPES_MACSEC = 0x27, MLX5_GENERAL_OBJECT_TYPES_INT_KEK = 0x47, + 
MLX5_GENERAL_OBJECT_TYPES_RDMA_CTRL = 0x53, MLX5_GENERAL_OBJECT_TYPES_FLOW_TABLE_ALIAS = 0xff15, }; -- cgit v1.2.3 From 0a34fad1bed45ff2245ab8c315bc3d4c6471af46 Mon Sep 17 00:00:00 2001 From: Chiara Meiohas Date: Wed, 26 Feb 2025 15:01:06 +0200 Subject: net/mlx5: Allow the throttle mechanism to be more dynamic Previously, throttle commands were identified and limited based on opcode. These commands were limited to half the command slots using a semaphore, and callback commands checked the opcode to determine semaphore release. To allow exceptions, we introduce a variable to indicate when the throttle lock is held. This allows scenarios where throttle commands are not limited. Callback functions use this variable to determine if the throttle semaphore needs to be released. This patch contains no functional changes. It's a preparation for the next patch. Signed-off-by: Chiara Meiohas Reviewed-by: Tariq Toukan Reviewed-by: Moshe Shemesh Link: https://patch.msgid.link/055d975edeb816ac4c0fd1e665c6157d11947d26.1740574103.git.leon@kernel.org Signed-off-by: Leon Romanovsky --- include/linux/mlx5/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index af86097641b0..876d6b03a87a 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -989,6 +989,7 @@ struct mlx5_async_work { mlx5_async_cbk_t user_callback; u16 opcode; /* cmd opcode */ u16 op_mod; /* cmd op_mod */ + u8 throttle_locked:1; void *out; /* pointer to the cmd output buffer */ }; -- cgit v1.2.3 From f9deed0980fe29c57488f395528fe3d923c91b1f Mon Sep 17 00:00:00 2001 From: Chiara Meiohas Date: Wed, 26 Feb 2025 15:01:07 +0200 Subject: net/mlx5: Limit non-privileged commands Limit non-privileged UID commands to half of the available command slots when privileged UIDs are present. Privileged throttle commands will not be limited. Use an xarray to store privileged UIDs. 
Add insert and remove functions for privileged UIDs management. Non-user commands (with uid 0) are not limited. Signed-off-by: Chiara Meiohas Reviewed-by: Moshe Shemesh Reviewed-by: Tariq Toukan Link: https://patch.msgid.link/d2f3dd9a0dbad3c9f2b4bb0723837995e4e06de2.1740574103.git.leon@kernel.org Signed-off-by: Leon Romanovsky --- include/linux/mlx5/driver.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 876d6b03a87a..4f593a61220d 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -305,6 +305,8 @@ struct mlx5_cmd { struct semaphore sem; struct semaphore pages_sem; struct semaphore throttle_sem; + struct semaphore unprivileged_sem; + struct xarray privileged_uids; } vars; enum mlx5_cmdif_state state; void *cmd_alloc_buf; @@ -990,6 +992,7 @@ struct mlx5_async_work { u16 opcode; /* cmd opcode */ u16 op_mod; /* cmd op_mod */ u8 throttle_locked:1; + u8 unpriv_locked:1; void *out; /* pointer to the cmd output buffer */ }; @@ -1020,6 +1023,8 @@ int mlx5_cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out, int mlx5_cmd_exec_polling(struct mlx5_core_dev *dev, void *in, int in_size, void *out, int out_size); bool mlx5_cmd_is_down(struct mlx5_core_dev *dev); +int mlx5_cmd_add_privileged_uid(struct mlx5_core_dev *dev, u16 uid); +void mlx5_cmd_remove_privileged_uid(struct mlx5_core_dev *dev, u16 uid); void mlx5_core_uplink_netdev_set(struct mlx5_core_dev *mdev, struct net_device *netdev); void mlx5_core_uplink_netdev_event_replay(struct mlx5_core_dev *mdev); -- cgit v1.2.3 From ab7d228c7e0d0efcac52b81f8514b43985747dc6 Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Wed, 26 Feb 2025 15:01:08 +0200 Subject: net/mlx5: Query ADV_RDMA capabilities Query ADV_RDMA capabilities which provide information for advanced RDMA related features. 
Signed-off-by: Patrisious Haddad Reviewed-by: Mark Bloch Link: https://patch.msgid.link/e3e6ede03ea31cd201078dcdd4e407608e4a5a87.1740574103.git.leon@kernel.org Signed-off-by: Leon Romanovsky --- include/linux/mlx5/device.h | 5 +++++ include/linux/mlx5/mlx5_ifc.h | 42 +++++++++++++++++++++++++++++++++++++++++- 2 files changed, 46 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index fd37f4e54d76..0ae6d69c5221 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1251,6 +1251,7 @@ enum mlx5_cap_type { MLX5_CAP_GENERAL_2 = 0x20, MLX5_CAP_PORT_SELECTION = 0x25, MLX5_CAP_ADV_VIRTUALIZATION = 0x26, + MLX5_CAP_ADV_RDMA = 0x28, /* NUM OF CAP Types */ MLX5_CAP_NUM }; @@ -1384,6 +1385,10 @@ enum mlx5_qcam_feature_groups { MLX5_GET(adv_virtualization_cap, \ mdev->caps.hca[MLX5_CAP_ADV_VIRTUALIZATION]->cur, cap) +#define MLX5_CAP_ADV_RDMA(mdev, cap) \ + MLX5_GET(adv_rdma_cap, \ + mdev->caps.hca[MLX5_CAP_ADV_RDMA]->cur, cap) + #define MLX5_CAP_FLOWTABLE_PORT_SELECTION(mdev, cap) \ MLX5_CAP_PORT_SELECTION(mdev, flow_table_properties_port_selection.cap) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 3b3d88ffcacc..fea8af42f954 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1993,7 +1993,9 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 max_geneve_tlv_options[0x8]; u8 reserved_at_568[0x3]; u8 max_geneve_tlv_option_data_len[0x5]; - u8 reserved_at_570[0x9]; + u8 reserved_at_570[0x1]; + u8 adv_rdma[0x1]; + u8 reserved_at_572[0x7]; u8 adv_virtualization[0x1]; u8 reserved_at_57a[0x6]; @@ -13076,6 +13078,44 @@ struct mlx5_ifc_load_vhca_state_out_bits { u8 reserved_at_40[0x40]; }; +struct mlx5_ifc_adv_rdma_cap_bits { + u8 rdma_transport_manager[0x1]; + u8 rdma_transport_manager_other_eswitch[0x1]; + u8 reserved_at_2[0x1e]; + + u8 rcx_type[0x8]; + u8 reserved_at_28[0x2]; + u8 ps_entry_log_max_value[0x6]; + u8 
reserved_at_30[0x6]; + u8 qp_max_ps_num_entry[0xa]; + + u8 mp_max_num_queues[0x8]; + u8 ps_user_context_max_log_size[0x8]; + u8 message_based_qp_and_striding_wq[0x8]; + u8 reserved_at_58[0x8]; + + u8 max_receive_send_message_size_stride[0x10]; + u8 reserved_at_70[0x10]; + + u8 max_receive_send_message_size_byte[0x20]; + + u8 reserved_at_a0[0x160]; + + struct mlx5_ifc_flow_table_prop_layout_bits rdma_transport_rx_flow_table_properties; + + struct mlx5_ifc_flow_table_prop_layout_bits rdma_transport_tx_flow_table_properties; + + struct mlx5_ifc_flow_table_fields_supported_2_bits rdma_transport_rx_ft_field_support_2; + + struct mlx5_ifc_flow_table_fields_supported_2_bits rdma_transport_tx_ft_field_support_2; + + struct mlx5_ifc_flow_table_fields_supported_2_bits rdma_transport_rx_ft_field_bitmask_support_2; + + struct mlx5_ifc_flow_table_fields_supported_2_bits rdma_transport_tx_ft_field_bitmask_support_2; + + u8 reserved_at_800[0x3800]; +}; + struct mlx5_ifc_adv_virtualization_cap_bits { u8 reserved_at_0[0x3]; u8 pg_track_log_max_num[0x5]; -- cgit v1.2.3 From 15b103df80b25025040faa8f35164c2595977bdb Mon Sep 17 00:00:00 2001 From: Patrisious Haddad Date: Wed, 26 Feb 2025 15:01:09 +0200 Subject: net/mlx5: fs, add RDMA TRANSPORT steering domain support Add RX and TX RDMA_TRANSPORT flow table namespace, and the ability to create flow tables in those namespaces. The RDMA_TRANSPORT RX and TX are per vport. Packets will traverse through RDMA_TRANSPORT_RX after RDMA_RX and through RDMA_TRANSPORT_TX before RDMA_TX, ensuring proper control and management. RDMA_TRANSPORT domains are managed by the vport group manager. 
Signed-off-by: Patrisious Haddad Reviewed-by: Mark Bloch Link: https://patch.msgid.link/a6b550d9859a197eafa804b9a8d76916ca481da9.1740574103.git.leon@kernel.org Signed-off-by: Leon Romanovsky --- include/linux/mlx5/device.h | 6 ++++++ include/linux/mlx5/fs.h | 10 +++++++--- 2 files changed, 13 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 0ae6d69c5221..8fe56d0362c6 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1346,6 +1346,12 @@ enum mlx5_qcam_feature_groups { #define MLX5_CAP_FLOWTABLE_RDMA_TX(mdev, cap) \ MLX5_CAP_FLOWTABLE(mdev, flow_table_properties_nic_transmit_rdma.cap) +#define MLX5_CAP_FLOWTABLE_RDMA_TRANSPORT_RX(mdev, cap) \ + MLX5_CAP_ADV_RDMA(mdev, rdma_transport_rx_flow_table_properties.cap) + +#define MLX5_CAP_FLOWTABLE_RDMA_TRANSPORT_TX(mdev, cap) \ + MLX5_CAP_ADV_RDMA(mdev, rdma_transport_tx_flow_table_properties.cap) + #define MLX5_CAP_ESW_FLOWTABLE(mdev, cap) \ MLX5_GET(flow_table_eswitch_cap, \ mdev->caps.hca[MLX5_CAP_ESWITCH_FLOW_TABLE]->cur, cap) diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 01cb72d68c23..fd62b2b1611d 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -40,6 +40,7 @@ #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v) +#define MLX5_RDMA_TRANSPORT_BYPASS_PRIO 0 #define MLX5_FS_MAX_POOL_SIZE BIT(30) enum mlx5_flow_destination_type { @@ -110,6 +111,8 @@ enum mlx5_flow_namespace_type { MLX5_FLOW_NAMESPACE_RDMA_TX_IPSEC, MLX5_FLOW_NAMESPACE_RDMA_RX_MACSEC, MLX5_FLOW_NAMESPACE_RDMA_TX_MACSEC, + MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_RX, + MLX5_FLOW_NAMESPACE_RDMA_TRANSPORT_TX, }; enum { @@ -194,9 +197,9 @@ struct mlx5_flow_namespace * mlx5_get_flow_namespace(struct mlx5_core_dev *dev, enum mlx5_flow_namespace_type type); struct mlx5_flow_namespace * -mlx5_get_flow_vport_acl_namespace(struct mlx5_core_dev *dev, - enum mlx5_flow_namespace_type type, - int vport); 
+mlx5_get_flow_vport_namespace(struct mlx5_core_dev *dev, + enum mlx5_flow_namespace_type type, + int vport_idx); struct mlx5_flow_table_attr { int prio; @@ -204,6 +207,7 @@ struct mlx5_flow_table_attr { u32 level; u32 flags; u16 uid; + u16 vport; struct mlx5_flow_table *next_ft; struct { -- cgit v1.2.3 From f550694e88b7b13b647777f889e03e544d9db60c Mon Sep 17 00:00:00 2001 From: Yael Chemla Date: Sun, 9 Mar 2025 20:41:37 +0200 Subject: net/mlx5: Add IFC bits for PPCNT recovery counters group Add recovery counters group layout of PPCNT (Ports Performance Counters Register). This group counts recovery events per link. Also add the corresponding bit in PCAM to indicate this group is supported. Signed-off-by: Yael Chemla Reviewed-by: Cosmin Ratiu Signed-off-by: Tariq Toukan Link: https://patch.msgid.link/1741545697-23041-1-git-send-email-tariqt@nvidia.com Signed-off-by: Leon Romanovsky --- include/linux/mlx5/device.h | 1 + include/linux/mlx5/mlx5_ifc.h | 11 ++++++++++- 2 files changed, 11 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 8fe56d0362c6..904804e995aa 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1517,6 +1517,7 @@ enum { MLX5_PHYSICAL_LAYER_COUNTERS_GROUP = 0x12, MLX5_PER_TRAFFIC_CLASS_CONGESTION_GROUP = 0x13, MLX5_PHYSICAL_LAYER_STATISTICAL_GROUP = 0x16, + MLX5_PHYSICAL_LAYER_RECOVERY_GROUP = 0x1a, MLX5_INFINIBAND_PORT_COUNTERS_GROUP = 0x20, MLX5_INFINIBAND_EXTENDED_PORT_COUNTERS_GROUP = 0x21, }; diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index fea8af42f954..2c09df4ee574 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -2645,6 +2645,12 @@ struct mlx5_ifc_field_select_802_1qau_rp_bits { u8 field_select_8021qaurp[0x20]; }; +struct mlx5_ifc_phys_layer_recovery_cntrs_bits { + u8 total_successful_recovery_events[0x20]; + + u8 reserved_at_20[0x7a0]; +}; + struct 
mlx5_ifc_phys_layer_cntrs_bits { u8 time_since_last_clear_high[0x20]; @@ -4846,6 +4852,7 @@ union mlx5_ifc_eth_cntrs_grp_data_layout_auto_bits { struct mlx5_ifc_ib_ext_port_cntrs_grp_data_layout_bits ib_ext_port_cntrs_grp_data_layout; struct mlx5_ifc_phys_layer_cntrs_bits phys_layer_cntrs; struct mlx5_ifc_phys_layer_statistical_cntrs_bits phys_layer_statistical_cntrs; + struct mlx5_ifc_phys_layer_recovery_cntrs_bits phys_layer_recovery_cntrs; u8 reserved_at_0[0x7c0]; }; @@ -10584,7 +10591,9 @@ struct mlx5_ifc_mtutc_reg_bits { }; struct mlx5_ifc_pcam_enhanced_features_bits { - u8 reserved_at_0[0x1d]; + u8 reserved_at_0[0x10]; + u8 ppcnt_recovery_counters[0x1]; + u8 reserved_at_11[0xc]; u8 fec_200G_per_lane_in_pplm[0x1]; u8 reserved_at_1e[0x2a]; u8 fec_100G_per_lane_in_pplm[0x1]; -- cgit v1.2.3 From a18dfa9925b9ef6107ea3aa5814ca3c704d34a8a Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Thu, 6 Mar 2025 22:34:09 -0500 Subject: ipv6: save dontfrag in cork When spanning datagram construction over multiple send calls using MSG_MORE, per datagram settings are configured on the first send. That is when ip(6)_setup_cork stores these settings for subsequent use in __ip(6)_append_data and others. The only flag that escaped this was dontfrag. As a result, a datagram could be constructed with df=0 on the first sendmsg, but df=1 on a next. Which is what cmsg_ip.sh does in an upcoming MSG_MORE test in the "diff" scenario. Changing datagram conditions in the middle of constructing an skb makes this already complex code path even more convoluted. It is here unintentional. Bring this flag in line with expected sockopt/cmsg behavior. And stop passing ipc6 to __ip6_append_data, to avoid such issues in the future. This is already the case for __ip_append_data. inet6_cork had a 6 byte hole, so the 1B flag has no impact. 
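A minimal userspace sketch of the behaviour described above, under stated assumptions: `cork_model`, `setup_cork` and `append_uses_dontfrag` are hypothetical names for illustration, not kernel functions. It only models the idea that the per-datagram setting is latched on the first send and that later MSG_MORE sends reuse the latched value instead of their own argument.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-datagram state kept in inet6_cork. */
struct cork_model {
	bool active;               /* a datagram is under construction */
	unsigned char dontfrag:1;  /* latched on the first send */
};

/* Latch the setting only when a new datagram starts (first send). */
static void setup_cork(struct cork_model *cork, bool dontfrag)
{
	if (!cork->active) {
		cork->active = true;
		cork->dontfrag = dontfrag;
	}
}

/* Later sends consult the cork, not their own per-call argument. */
static bool append_uses_dontfrag(struct cork_model *cork, bool dontfrag)
{
	setup_cork(cork, dontfrag);
	return cork->dontfrag;
}
```

With this shape, a first send with df=0 followed by a send with df=1 keeps df=0 for the whole datagram, which is the consistency the commit message asks for.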
Signed-off-by: Willem de Bruijn Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/20250307033620.411611-3-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/ipv6.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index a6e2aadbb91b..5aeeed22f35b 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -207,6 +207,7 @@ struct inet6_cork { struct ipv6_txoptions *opt; u8 hop_limit; u8 tclass; + u8 dontfrag:1; }; /* struct ipv6_pinfo - ipv6 private area */ -- cgit v1.2.3 From 0a13c1e0a449917b29c45d90701eededa69c99d3 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Fri, 7 Mar 2025 20:47:26 -0800 Subject: net: revert to lockless TC_SETUP_BLOCK and TC_SETUP_FT There are a couple of places from which we can arrive at ndo_setup_tc with TC_SETUP_BLOCK/TC_SETUP_FT: - netlink - netlink notifier - netdev notifier Locking netdev too deep in this call chain seems to be problematic (especially assuming some/all of the call_netdevice_notifiers NETDEV_UNREGISTER might soon be running with the instance lock). Revert to lockless ndo_setup_tc for TC_SETUP_BLOCK/TC_SETUP_FT. NFT framework already takes care of most of the locking. Document the assumptions.
ndo_setup_tc TC_SETUP_BLOCK nft_block_offload_cmd nft_chain_offload_cmd nft_flow_block_chain nft_flow_offload_chain nft_flow_rule_offload_abort nft_flow_rule_offload_commit nft_flow_rule_offload_commit nf_tables_commit nfnetlink_rcv_batch nfnetlink_rcv_skb_batch nfnetlink_rcv nft_offload_netdev_event NETDEV_UNREGISTER notifier ndo_setup_tc TC_SETUP_FT nf_flow_table_offload_cmd nf_flow_table_offload_setup nft_unregister_flowtable_hook nft_register_flowtable_net_hooks nft_flowtable_update nf_tables_newflowtable nfnetlink_rcv_batch (.call NFNL_CB_BATCH) nft_flowtable_update nf_tables_newflowtable nft_flowtable_event nf_tables_flowtable_event NETDEV_UNREGISTER notifier __nft_unregister_flowtable_net_hooks nft_unregister_flowtable_net_hooks nf_tables_commit nfnetlink_rcv_batch (.call NFNL_CB_BATCH) __nf_tables_abort nf_tables_abort nfnetlink_rcv_batch __nft_release_hook __nft_release_hooks nf_tables_pre_exit_net -> module unload nft_rcv_nl_event netlink_register_notifier (oh boy) nft_register_flowtable_net_hooks nft_flowtable_update nf_tables_newflowtable nf_tables_newflowtable Fixes: c4f0f30b424e ("net: hold netdev instance lock during nft ndo_setup_tc") Signed-off-by: Stanislav Fomichev Reported-by: syzbot+0afb4bcf91e5a1afdcad@syzkaller.appspotmail.com Reviewed-by: Simon Horman Link: https://patch.msgid.link/20250308044726.1193222-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 9a297757df7e..0dbfe069a6e3 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3316,8 +3316,6 @@ int dev_open(struct net_device *dev, struct netlink_ext_ack *extack); void netif_close(struct net_device *dev); void dev_close(struct net_device *dev); void dev_close_many(struct list_head *head, bool unlink); -int dev_setup_tc(struct net_device *dev, enum tc_setup_type type, - void *type_data); void 
netif_disable_lro(struct net_device *dev); void dev_disable_lro(struct net_device *dev); int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *newskb); -- cgit v1.2.3 From 023af5a72ab161f2e661afb53e3b6a6901f6ba00 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= Date: Wed, 5 Mar 2025 23:38:48 +0100 Subject: gso: AccECN support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Handling the CWR flag differs between RFC 3168 ECN and AccECN. With RFC 3168 ECN-aware TSO (NETIF_F_TSO_ECN), the CWR flag is cleared starting from the 2nd segment, which is incompatible with how AccECN handles the CWR flag. Such super-segments are indicated by SKB_GSO_TCP_ECN. With AccECN, the CWR flag (or more accurately, the ACE field that also includes ECE & AE flags) changes only when new packet(s) with a CE mark arrive, so the flag should not be changed within a super-skb. The new skb/feature flags are necessary to prevent such TSO engines from corrupting AccECN ACE counters by clearing the CWR flag (if the CWR handling feature cannot be turned off). If a NIC is completely unaware of RFC 3168 ECN (doesn't support NETIF_F_TSO_ECN) or its TSO engine can be set to not touch the CWR flag despite also supporting NETIF_F_TSO_ECN, TSO could be safely used with AccECN on such a NIC. This should be evaluated on a per-NIC basis (not done in this patch series for any NICs). For the cases where TSO cannot keep its hands off the CWR flag, a GSO fallback is provided by this patch. Signed-off-by: Ilpo Järvinen Signed-off-by: Chia-Yu Chang Reviewed-by: Eric Dumazet Signed-off-by: David S.
Miller --- include/linux/netdev_features.h | 8 +++++--- include/linux/netdevice.h | 2 ++ include/linux/skbuff.h | 2 ++ 3 files changed, 9 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index 11be70a7929f..7a01c518e573 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -53,12 +53,12 @@ enum { NETIF_F_GSO_UDP_BIT, /* ... UFO, deprecated except tuntap */ NETIF_F_GSO_UDP_L4_BIT, /* ... UDP payload GSO (not UFO) */ NETIF_F_GSO_FRAGLIST_BIT, /* ... Fraglist GSO */ + NETIF_F_GSO_ACCECN_BIT, /* TCP AccECN w/ TSO (no clear CWR) */ /**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */ - NETIF_F_GSO_FRAGLIST_BIT, + NETIF_F_GSO_ACCECN_BIT, NETIF_F_FCOE_CRC_BIT, /* FCoE CRC32 */ NETIF_F_SCTP_CRC_BIT, /* SCTP checksum offload */ - __UNUSED_NETIF_F_37, NETIF_F_NTUPLE_BIT, /* N-tuple filters supported */ NETIF_F_RXHASH_BIT, /* Receive hashing offload */ NETIF_F_RXCSUM_BIT, /* Receive checksumming offload */ @@ -128,6 +128,7 @@ enum { #define NETIF_F_SG __NETIF_F(SG) #define NETIF_F_TSO6 __NETIF_F(TSO6) #define NETIF_F_TSO_ECN __NETIF_F(TSO_ECN) +#define NETIF_F_GSO_ACCECN __NETIF_F(GSO_ACCECN) #define NETIF_F_TSO __NETIF_F(TSO) #define NETIF_F_VLAN_CHALLENGED __NETIF_F(VLAN_CHALLENGED) #define NETIF_F_RXFCS __NETIF_F(RXFCS) @@ -210,7 +211,8 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start) NETIF_F_TSO_ECN | NETIF_F_TSO_MANGLEID) /* List of features with software fallbacks. 
*/ -#define NETIF_F_GSO_SOFTWARE (NETIF_F_ALL_TSO | NETIF_F_GSO_SCTP | \ +#define NETIF_F_GSO_SOFTWARE (NETIF_F_ALL_TSO | \ + NETIF_F_GSO_ACCECN | NETIF_F_GSO_SCTP | \ NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST) /* diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 0dbfe069a6e3..67527243459b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -5269,6 +5269,8 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type) BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_FRAGLIST != (NETIF_F_GSO_FRAGLIST >> NETIF_F_GSO_SHIFT)); + BUILD_BUG_ON(SKB_GSO_TCP_ACCECN != + (NETIF_F_GSO_ACCECN >> NETIF_F_GSO_SHIFT)); return (features & feature) == feature; } diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 14517e95a46c..b8a1343d6785 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -708,6 +708,8 @@ enum { SKB_GSO_UDP_L4 = 1 << 17, SKB_GSO_FRAGLIST = 1 << 18, + + SKB_GSO_TCP_ACCECN = 1 << 19, }; #if BITS_PER_LONG > 32 -- cgit v1.2.3 From 82d3639ef7dc54d5b5cb454d9a13005202d7a701 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Sun, 9 Mar 2025 20:07:42 +0200 Subject: net/mlx5: fs, add support for flow meters HWS action Add support for HW Steering action of flow meter range. Flow meters range can use one HWS action for the whole range. Thus, share a cached HWS action among rules that use same flow meter object range. Hold refcount for each rule using the cached action. 
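The sharing scheme in the commit message above can be sketched in plain C as follows. This is a userspace model under assumptions, not the mlx5 implementation: `cached_action`, `action_get`, `action_put`, the fixed-size table, and the use of a `base_id` as the lookup key are all hypothetical choices made for illustration.

```c
#include <stddef.h>

/* Hypothetical cache entry: one shared action per meter object range. */
struct cached_action {
	int base_id;       /* assumed key: first object id of the range */
	unsigned int refs; /* number of rules currently using the action */
	int in_use;        /* slot holds a live action */
};

#define CACHE_SLOTS 4

static struct cached_action cache[CACHE_SLOTS];

/* Return the shared action for base_id, creating it on first use. */
static struct cached_action *action_get(int base_id)
{
	struct cached_action *free_slot = NULL;
	size_t i;

	for (i = 0; i < CACHE_SLOTS; i++) {
		if (cache[i].in_use && cache[i].base_id == base_id) {
			cache[i].refs++;
			return &cache[i];
		}
		if (!cache[i].in_use && !free_slot)
			free_slot = &cache[i];
	}
	if (!free_slot)
		return NULL; /* cache full */
	free_slot->in_use = 1;
	free_slot->base_id = base_id;
	free_slot->refs = 1;
	return free_slot;
}

/* Drop one rule's reference; release the slot when the last user goes. */
static void action_put(struct cached_action *act)
{
	if (--act->refs == 0)
		act->in_use = 0;
}
```

Two rules that reference the same range get the same entry back, and the entry is only released after both have called `action_put()`.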
Signed-off-by: Moshe Shemesh Reviewed-by: Mark Bloch Signed-off-by: Tariq Toukan Reviewed-by: Jacob Keller Link: https://patch.msgid.link/1741543663-22123-3-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni --- include/linux/mlx5/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index fd62b2b1611d..939e58c2f386 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -244,6 +244,7 @@ void mlx5_destroy_flow_group(struct mlx5_flow_group *fg); struct mlx5_exe_aso { u32 object_id; + int base_id; u8 type; u8 return_reg_id; union { -- cgit v1.2.3 From 43e2aa56aea2e292c209cfd77c3f2d8553fb66e9 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Sun, 9 Mar 2025 21:04:14 +0100 Subject: net: phy: move PHY package MMD access function declarations from phy.h to phylib.h These functions are used by PHY drivers only, therefore move their declaration to phylib.h. Signed-off-by: Heiner Kallweit Reviewed-by: Simon Horman Link: https://patch.msgid.link/406c8a20-b62e-4ee3-b174-b566724a0876@gmail.com Signed-off-by: Paolo Abeni --- include/linux/phy.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index c4a6385faf41..fc028bab10b7 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -2107,18 +2107,10 @@ int __phy_hwtstamp_set(struct phy_device *phydev, struct kernel_hwtstamp_config *config, struct netlink_ext_ack *extack); -int __phy_package_read_mmd(struct phy_device *phydev, - unsigned int addr_offset, int devad, - u32 regnum); - int phy_package_read_mmd(struct phy_device *phydev, unsigned int addr_offset, int devad, u32 regnum); -int __phy_package_write_mmd(struct phy_device *phydev, - unsigned int addr_offset, int devad, - u32 regnum, u16 val); - int phy_package_write_mmd(struct phy_device *phydev, unsigned int addr_offset, int devad, u32 regnum, u16 val); -- cgit v1.2.3 From 
8ea221b22172a2cc5d0edbfec4b34ef3fe8de167 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Sun, 9 Mar 2025 21:05:08 +0100 Subject: net: phy: remove unused functions phy_package_[read|write]_mmd These functions have never had a user, so remove them. Signed-off-by: Heiner Kallweit Reviewed-by: Simon Horman Link: https://patch.msgid.link/5792e2cd-6f0a-4f7d-a5ef-b932f94d82f3@gmail.com Signed-off-by: Paolo Abeni --- include/linux/phy.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index fc028bab10b7..61a8cb9d1247 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -2107,14 +2107,6 @@ int __phy_hwtstamp_set(struct phy_device *phydev, struct kernel_hwtstamp_config *config, struct netlink_ext_ack *extack); -int phy_package_read_mmd(struct phy_device *phydev, - unsigned int addr_offset, int devad, - u32 regnum); - -int phy_package_write_mmd(struct phy_device *phydev, - unsigned int addr_offset, int devad, - u32 regnum, u16 val); - extern const struct bus_type mdio_bus_type; struct mdio_board_info { -- cgit v1.2.3 From 35b862eac9099bb492229627c44e541633fb5938 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Mon, 10 Mar 2025 11:10:52 +0000 Subject: net: phylink: expand on .pcs_config() method documentation Expand on the requirements of the .pcs_config() method documentation, specifically mentioning that it should cause minimal disruption to an established link, and that it should return a positive non-zero value when requiring the .pcs_an_restart() method to be called. 
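The return convention described above can be illustrated with a small userspace model. The names `pcs_model`, `model_pcs_config`, `model_pcs_an_restart` and `apply_pcs_config` are hypothetical; this is a sketch of how a caller might honour the convention, not phylink's actual code.

```c
/* Hypothetical model of a PCS and of a caller honouring the convention. */
struct pcs_model {
	int config_ret;   /* value the modelled pcs_config() returns */
	int an_restarts;  /* how many times an_restart was invoked */
};

static int model_pcs_config(struct pcs_model *pcs)
{
	return pcs->config_ret;
}

static void model_pcs_an_restart(struct pcs_model *pcs)
{
	pcs->an_restarts++;
}

/* Negative: error; zero: nothing more to do; positive: restart autoneg. */
static int apply_pcs_config(struct pcs_model *pcs)
{
	int ret = model_pcs_config(pcs);

	if (ret < 0)
		return ret;
	if (ret > 0)
		model_pcs_an_restart(pcs);
	return 0;
}
```

Only a positive return triggers the restart, so a configuration call that changes nothing leaves an established link undisturbed.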
Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1trb24-005oVq-Is@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni --- include/linux/phylink.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index c187267a15b6..79876c84ae81 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -595,6 +595,14 @@ void pcs_get_state(struct phylink_pcs *pcs, unsigned int neg_mode, * The %neg_mode argument should be tested via the phylink_mode_*() family of * functions, or for PCS that set pcs->neg_mode true, should be tested * against the PHYLINK_PCS_NEG_* definitions. + * + * pcs_config() will be called when configuration of the PCS is required + * or when the advertisement is possibly updated. It must not unnecessarily + * disrupt an established link. + * + * When an autonegotiation restart is required for 802.3z modes, .pcs_config() + * should return a positive non-zero integer (e.g. 1) to indicate to phylink + * to call the pcs_an_restart() method. */ int pcs_config(struct phylink_pcs *pcs, unsigned int neg_mode, phy_interface_t interface, const unsigned long *advertising, -- cgit v1.2.3 From 79f88a584e35133359c394506b351a60230cf37b Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Fri, 7 Mar 2025 18:35:58 +0100 Subject: net: ethtool: Export the link_mode_params definitions link_mode_params contains a lookup table of all 802.3 link modes that are currently supported with structured data about each mode's speed, duplex, number of lanes and mediums. As a preparation for a port representation, export that table for the rest of the net stack to use. 
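As a rough illustration of what consuming such a lookup table looks like, here is a self-contained sketch. The struct mirrors the three fields (speed, lanes, duplex) exported by this patch, but the `_model` names, the table contents and the indices are made up for the example and are not the real ethtool link mode bit numbers.

```c
#include <stddef.h>

/* Mirrors the fields of the exported struct; the name is hypothetical. */
struct link_mode_info_model {
	int speed;
	unsigned char lanes;
	unsigned char duplex;
};

enum { DUPLEX_HALF_MODEL = 0, DUPLEX_FULL_MODEL = 1 };

/* Illustrative subset; indices here are made up, not ethtool bit numbers. */
static const struct link_mode_info_model demo_params[] = {
	{ 10,   1, DUPLEX_HALF_MODEL },
	{ 10,   1, DUPLEX_FULL_MODEL },
	{ 100,  1, DUPLEX_FULL_MODEL },
	{ 1000, 1, DUPLEX_FULL_MODEL },
};

/* Look up one mode by index; NULL when the index is out of range. */
static const struct link_mode_info_model *demo_link_mode(size_t idx)
{
	if (idx >= sizeof(demo_params) / sizeof(demo_params[0]))
		return NULL;
	return &demo_params[idx];
}
```

Because the table is indexed directly by link mode, a consumer gets speed, lane count and duplex in one bounds-checked array access.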
Signed-off-by: Maxime Chevallier Link: https://patch.msgid.link/20250307173611.129125-2-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni --- include/linux/ethtool.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 7f222dccc7d1..8210ece94fa6 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -210,6 +210,14 @@ static inline u8 *ethtool_rxfh_context_key(struct ethtool_rxfh_context *ctx) void ethtool_rxfh_context_lost(struct net_device *dev, u32 context_id); +struct link_mode_info { + int speed; + u8 lanes; + u8 duplex; +}; + +extern const struct link_mode_info link_mode_params[]; + /* declare a link mode bitmap */ #define __ETHTOOL_DECLARE_LINK_MODE_MASK(name) \ DECLARE_BITMAP(name, __ETHTOOL_LINK_MODE_MASK_NBITS) -- cgit v1.2.3 From 8c8c4a87933dd924f9fb56dfd35bae7e8f30a4b5 Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Fri, 7 Mar 2025 18:36:00 +0100 Subject: net: phy: phy_caps: Move phy_speeds to phy_caps Use the newly introduced link_capabilities array to derive the list of possible speeds when given a combination of linkmodes. As link_capabilities is indexed by speed, we don't have to iterate the whole phy_settings array. 
Signed-off-by: Maxime Chevallier Link: https://patch.msgid.link/20250307173611.129125-4-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni --- include/linux/phy.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 61a8cb9d1247..83c50bb21939 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1287,8 +1287,6 @@ struct phy_setting { const struct phy_setting * phy_lookup_setting(int speed, int duplex, const unsigned long *mask, bool exact); -size_t phy_speeds(unsigned int *speeds, size_t size, - unsigned long *mask); /** * phy_is_started - Convenience function to check whether PHY is started -- cgit v1.2.3 From ce60fef7feccb5c71d5b49e489d24db7d79c2ac7 Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Fri, 7 Mar 2025 18:36:07 +0100 Subject: net: phy: drop phy_settings and the associated lookup helpers The phy_settings array is no longer relevant as it has now been replaced by the link_caps array and associated phy_caps helpers. 
Signed-off-by: Maxime Chevallier Link: https://patch.msgid.link/20250307173611.129125-11-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni --- include/linux/phy.h | 13 ------------- 1 file changed, 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 83c50bb21939..c24e1a565819 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1275,19 +1275,6 @@ const char *phy_rate_matching_to_str(int rate_matching); int phy_interface_num_ports(phy_interface_t interface); -/* A structure for mapping a particular speed and duplex - * combination to a particular SUPPORTED and ADVERTISED value - */ -struct phy_setting { - u32 speed; - u8 duplex; - u8 bit; -}; - -const struct phy_setting * -phy_lookup_setting(int speed, int duplex, const unsigned long *mask, - bool exact); - /** * phy_is_started - Convenience function to check whether PHY is started * @phydev: The phy_device struct -- cgit v1.2.3 From 8d4880db378350f8ed8969feea13bdc164564fc1 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Tue, 11 Mar 2025 21:42:28 +0100 Subject: udp_tunnel: create a fastpath GRO lookup. Most UDP tunnels bind a socket to a local port, with ANY address, no peer and no interface index specified. Additionally it's quite common to have a single tunnel device per namespace. Track in each namespace the UDP tunnel socket respecting the above. When only a single one is present, store a reference in the netns. When such a reference is not NULL, the UDP tunnel GRO lookup just needs to match the incoming packet destination port vs the socket local port. The tunnel socket never sets the reuse[port] flag[s]. When bound to no address and interface, no other socket can exist in the same netns matching the specified local port. Matching packets with non-local destination addresses will be aggregated, and eventually segmented as needed - no behavior changes intended.
Note that the UDP tunnel socket reference is stored into struct netns_ipv4 for both IPv4 and IPv6 tunnels. That is intentional to keep all the fastpath-related netns fields in the same struct and allow cacheline-based optimization. Currently both the IPv4 and IPv6 socket pointer share the same cacheline as the `udp_table` field. Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/4d5c319c4471161829f50cb8436841de81a5edae.1741718157.git.pabeni@redhat.com Signed-off-by: Paolo Abeni --- include/linux/udp.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/udp.h b/include/linux/udp.h index 0807e21cfec9..895240177f4f 100644 --- a/include/linux/udp.h +++ b/include/linux/udp.h @@ -101,6 +101,13 @@ struct udp_sock { /* Cache friendly copy of sk->sk_peek_off >= 0 */ bool peeking_with_offset; + + /* + * Accounting for the tunnel GRO fastpath. + * Unprotected by compilers guard, as it uses space available in + * the last UDP socket cacheline. + */ + struct hlist_node tunnel_list; }; #define udp_test_bit(nr, sk) \ @@ -219,4 +226,13 @@ static inline void udp_allow_gso(struct sock *sk) #define IS_UDPLITE(__sk) (__sk->sk_protocol == IPPROTO_UDPLITE) +static inline struct sock *udp_tunnel_sk(const struct net *net, bool is_ipv6) +{ +#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) + return rcu_dereference(net->ipv4.udp_tunnel_gro[is_ipv6].sk); +#else + return NULL; +#endif +} + #endif /* _LINUX_UDP_H */ -- cgit v1.2.3 From 24faa63bcea88b6f24b0a3a710708505a876f9ba Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Wed, 12 Mar 2025 14:34:50 +0800 Subject: net: skbuff: Remove unused skb_add_data() Since commit a4ea4c477619 ("rxrpc: Don't use a ring buffer for call Tx queue") this function is not used anymore. 
Signed-off-by: Yue Haibing Reviewed-by: Simon Horman Link: https://patch.msgid.link/20250312063450.183652-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni --- include/linux/skbuff.h | 19 ------------------- 1 file changed, 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index b8a1343d6785..cd8294cdc249 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3867,25 +3867,6 @@ static inline int __must_check skb_put_padto(struct sk_buff *skb, unsigned int l bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i) __must_check; -static inline int skb_add_data(struct sk_buff *skb, - struct iov_iter *from, int copy) -{ - const int off = skb->len; - - if (skb->ip_summed == CHECKSUM_NONE) { - __wsum csum = 0; - if (csum_and_copy_from_iter_full(skb_put(skb, copy), copy, - &csum, from)) { - skb->csum = csum_block_add(skb->csum, csum, off); - return 0; - } - } else if (copy_from_iter_full(skb_put(skb, copy), copy, from)) - return 0; - - __skb_trim(skb, off); - return -EFAULT; -} - static inline bool skb_can_coalesce(struct sk_buff *skb, int i, const struct page *page, int off) { -- cgit v1.2.3 From db5e8ea155fc1d89c87cb81f0e4a681a77b9b03f Mon Sep 17 00:00:00 2001 From: Jan Glaza Date: Tue, 4 Mar 2025 12:08:31 +0100 Subject: virtchnl: make proto and filter action count unsigned The count field in virtchnl_proto_hdrs and virtchnl_filter_action_set should never be negative while still being valid. Changing it from int to u32 ensures proper handling of values in virtchnl messages in drivers and prevents unintended behavior. In its current signed form, a negative count does not trigger an error in the ice driver but instead results in it being treated as 0. This can lead to unexpected outcomes when processing messages. By using u32, any invalid values will correctly trigger -EINVAL, making error detection more robust.
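The effect of the signedness change can be sketched in plain userspace C. This is only an illustration under stated assumptions: DEMO_MAX_HDRS, check_count_signed() and check_count_unsigned() are hypothetical names, and the loop-based handling of the signed count is a guess at how a negative value could end up treated as 0, not the ice driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical limit standing in for VIRTCHNL_MAX_NUM_PROTO_HDRS. */
#define DEMO_MAX_HDRS 32

/*
 * Signed count with only an upper-bound check: a negative value passes
 * validation and the loop below simply runs zero times.
 * Returns the number of headers "processed", or -EINVAL.
 */
static int check_count_signed(int count)
{
	int processed = 0;

	if (count > DEMO_MAX_HDRS)
		return -EINVAL;
	for (int i = 0; i < count; i++)
		processed++;
	return processed;
}

/*
 * Unsigned count: the same bit pattern is now a very large value, so the
 * unchanged upper-bound check rejects it.
 */
static int check_count_unsigned(uint32_t count)
{
	int processed = 0;

	if (count > DEMO_MAX_HDRS)
		return -EINVAL;
	for (uint32_t i = 0; i < count; i++)
		processed++;
	return processed;
}

/* Small self-check contrasting the two variants. */
static void count_demo(void)
{
	assert(check_count_signed(-1) == 0);                   /* silently treated as 0 */
	assert(check_count_unsigned((uint32_t)-1) == -EINVAL); /* rejected */
	assert(check_count_unsigned(3) == 3);                  /* valid input unchanged */
}
```

Calling count_demo() from a test program exercises both paths; the point is that the validation code itself does not change, only the type of the field.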
Fixes: 1f7ea1cd6a374 ("ice: Enable FDIR Configure for AVF") Reviewed-by: Jedrzej Jagielski Reviewed-by: Simon Horman Signed-off-by: Jan Glaza Signed-off-by: Martyna Szapar-Mudlaw Tested-by: Rafal Romanowski Signed-off-by: Tony Nguyen --- include/linux/avf/virtchnl.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h index 13a11f3c09b8..aca06f300f83 100644 --- a/include/linux/avf/virtchnl.h +++ b/include/linux/avf/virtchnl.h @@ -1283,7 +1283,7 @@ struct virtchnl_proto_hdrs { * 2 - from the second inner layer * .... **/ - int count; /* the proto layers must < VIRTCHNL_MAX_NUM_PROTO_HDRS */ + u32 count; /* the proto layers must < VIRTCHNL_MAX_NUM_PROTO_HDRS */ union { struct virtchnl_proto_hdr proto_hdr[VIRTCHNL_MAX_NUM_PROTO_HDRS]; @@ -1335,7 +1335,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(36, virtchnl_filter_action); struct virtchnl_filter_action_set { /* action number must be less then VIRTCHNL_MAX_NUM_ACTIONS */ - int count; + u32 count; struct virtchnl_filter_action actions[VIRTCHNL_MAX_NUM_ACTIONS]; }; -- cgit v1.2.3 From 0c1f1eb65425451b8bcde52c055803cddecd1151 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Wed, 12 Mar 2025 09:34:25 +0000 Subject: net: stmmac: allow platforms to use PHY tx clock stop capability Allow platform glue to instruct stmmac to make use of the PHY transmit clock stop capability when deciding whether to allow the transmit clock from the DWMAC core to be stopped. 
Reviewed-by: Lad Prabhakar Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1tsITp-005vG9-Px@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni --- include/linux/stmmac.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index b6f03ab12595..c4ec8bb8144e 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -183,7 +183,8 @@ struct dwmac4_addrs { #define STMMAC_FLAG_INT_SNAPSHOT_EN BIT(9) #define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI BIT(10) #define STMMAC_FLAG_EN_TX_LPI_CLOCKGATING BIT(11) -#define STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY BIT(12) +#define STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP BIT(12) +#define STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY BIT(13) struct plat_stmmacenet_data { int bus_id; -- cgit v1.2.3 From 8033d2aef51722fe74068b52553625ed91ea256c Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 12 Mar 2025 12:05:12 -0700 Subject: Revert "net: replace dev_addr_sem with netdev instance lock" This reverts commit df43d8bf10316a7c3b1e47e3cc0057a54df4a5b8. 
Cc: Kohei Enju Reviewed-by: Kuniyuki Iwashima Fixes: df43d8bf1031 ("net: replace dev_addr_sem with netdev instance lock") Signed-off-by: Stanislav Fomichev Link: https://patch.msgid.link/20250312190513.1252045-2-sdf@fomichev.me Tested-by: Lei Yang Signed-off-by: Paolo Abeni --- include/linux/netdevice.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 67527243459b..0db9fc0afe36 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2498,7 +2498,7 @@ struct net_device { * * Protects: * @gro_flush_timeout, @napi_defer_hard_irqs, @napi_list, - * @net_shaper_hierarchy, @reg_state, @threaded, @dev_addr + * @net_shaper_hierarchy, @reg_state, @threaded * * Partially protects (writers must hold both @lock and rtnl_lock): * @up @@ -4196,6 +4196,10 @@ int netif_set_mac_address(struct net_device *dev, struct sockaddr *sa, struct netlink_ext_ack *extack); int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa, struct netlink_ext_ack *extack); +int netif_set_mac_address_user(struct net_device *dev, struct sockaddr *sa, + struct netlink_ext_ack *extack); +int dev_set_mac_address_user(struct net_device *dev, struct sockaddr *sa, + struct netlink_ext_ack *extack); int dev_get_mac_address(struct sockaddr *sa, struct net *net, char *dev_name); int dev_get_port_parent_id(struct net_device *dev, struct netdev_phys_item_id *ppid, bool recurse); -- cgit v1.2.3 From 6dd132516f8e467f144f7871ff2708ce827417a1 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 12 Mar 2025 12:05:13 -0700 Subject: net: reorder dev_addr_sem lock Lockdep complains about circular lock in 1 -> 2 -> 3 (see below).

Change the lock ordering to be:
- rtnl_lock
- dev_addr_sem
- netdev_ops (only for lower devices!)
- team_lock (or other per-upper device lock)

1. rtnl_lock -> netdev_ops -> dev_addr_sem

rtnl_setlink
  rtnl_lock
    do_setlink IFLA_ADDRESS on lower
      netdev_ops
        dev_addr_sem

2. rtnl_lock -> team_lock -> netdev_ops

rtnl_newlink
  rtnl_lock
    do_setlink IFLA_MASTER on lower
      do_set_master
        team_add_slave
          team_lock
            team_port_add
              dev_set_mtu
                netdev_ops

3. rtnl_lock -> dev_addr_sem -> team_lock

rtnl_newlink
  rtnl_lock
    do_setlink IFLA_ADDRESS on upper
      dev_addr_sem
        netif_set_mac_address
          team_set_mac_address
            team_lock

4. rtnl_lock -> netdev_ops -> dev_addr_sem

rtnl_lock
  dev_ifsioc
    dev_set_mac_address_user

__tun_chr_ioctl
  rtnl_lock
    dev_set_mac_address_user

tap_ioctl
  rtnl_lock
    dev_set_mac_address_user

dev_set_mac_address_user
  netdev_lock_ops
    netif_set_mac_address_user
      dev_addr_sem

v2:
- move lock reorder to happen after kmalloc (Kuniyuki)

Cc: Kohei Enju Fixes: df43d8bf1031 ("net: replace dev_addr_sem with netdev instance lock") Signed-off-by: Stanislav Fomichev Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20250312190513.1252045-3-sdf@fomichev.me Tested-by: Lei Yang Signed-off-by: Paolo Abeni --- include/linux/netdevice.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 0db9fc0afe36..0c5b1f7f8f3a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4196,8 +4196,6 @@ int netif_set_mac_address(struct net_device *dev, struct sockaddr *sa, struct netlink_ext_ack *extack); int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa, struct netlink_ext_ack *extack); -int netif_set_mac_address_user(struct net_device *dev, struct sockaddr *sa, - struct netlink_ext_ack *extack); int dev_set_mac_address_user(struct net_device *dev, struct sockaddr *sa, struct netlink_ext_ack *extack); int dev_get_mac_address(struct sockaddr *sa, struct net *net, char *dev_name); -- cgit v1.2.3 From 6d6c1ba7824022528dbe3e283fafbd0775424128 Mon Sep 17 00:00:00 2001 From: Uday Shankar Date: Wed, 12 Mar 2025 13:51:46 -0600
Subject: net, treewide: define and use MAC_ADDR_STR_LEN There are a few places in the tree which compute the length of the string representation of a MAC address as 3 * ETH_ALEN - 1. Define a constant for this and use it where relevant. No functionality changes are expected. Signed-off-by: Uday Shankar Reviewed-by: Michal Swiatkowski Acked-by: Johannes Berg Reviewed-by: Breno Leitao Reviewed-by: Simon Horman Link: https://patch.msgid.link/20250312-netconsole-v6-1-3437933e79b8@purestorage.com Signed-off-by: Paolo Abeni --- include/linux/if_ether.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h index 8a9792a6427a..61b7335aa037 100644 --- a/include/linux/if_ether.h +++ b/include/linux/if_ether.h @@ -19,6 +19,9 @@ #include #include +/* XX:XX:XX:XX:XX:XX */ +#define MAC_ADDR_STR_LEN (3 * ETH_ALEN - 1) + static inline struct ethhdr *eth_hdr(const struct sk_buff *skb) { return (struct ethhdr *)skb_mac_header(skb); -- cgit v1.2.3 From f8a10bed32f5fbede13a5f22fdc4ab8740ea213a Mon Sep 17 00:00:00 2001 From: Uday Shankar Date: Wed, 12 Mar 2025 13:51:47 -0600 Subject: netconsole: allow selection of egress interface via MAC address Currently, netconsole has two methods of configuration - module parameter and configfs. The former interface allows for netconsole activation earlier during boot (by specifying the module parameter on the kernel command line), so it is preferred for debugging issues which arise before userspace is up/the configfs interface can be used. The module parameter syntax requires specifying the egress interface name. This requirement makes it hard to use for a couple reasons: - The egress interface name can be hard or impossible to predict. For example, installing a new network card in a system can change the interface names assigned by the kernel. 
- When constructing the module parameter, one may have trouble determining the original (kernel-assigned) name of the interface (which is the name that should be given to netconsole) if some stable interface naming scheme is in effect. A human can usually look at kernel logs to determine the original name, but this is very painful if automation is constructing the parameter. For these reasons, allow selection of the egress interface via MAC address when configuring netconsole using the module parameter. Update the netconsole documentation with an example of the new syntax. Selection of egress interface by MAC address via configfs is far less interesting (since when this interface can be used, one should be able to easily convert between MAC address and interface name), so it is left unimplemented. Signed-off-by: Uday Shankar Reviewed-by: Breno Leitao Tested-by: Breno Leitao Reviewed-by: Simon Horman Link: https://patch.msgid.link/20250312-netconsole-v6-2-3437933e79b8@purestorage.com Signed-off-by: Paolo Abeni --- include/linux/netpoll.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h index f6e8abe0b1f1..0477208ed9ff 100644 --- a/include/linux/netpoll.h +++ b/include/linux/netpoll.h @@ -25,7 +25,13 @@ union inet_addr { struct netpoll { struct net_device *dev; netdevice_tracker dev_tracker; + /* + * Either dev_name or dev_mac can be used to specify the local + * interface - dev_name is used if it is a nonempty string, else + * dev_mac is used. + */ char dev_name[IFNAMSIZ]; + u8 dev_mac[ETH_ALEN]; const char *name; union inet_addr local_ip, remote_ip; -- cgit v1.2.3 From 45456e38c44eda2f1285601398fd289b3cec7002 Mon Sep 17 00:00:00 2001 From: Gerhard Engleder Date: Wed, 12 Mar 2025 21:30:06 +0100 Subject: net: phy: Allow loopback speed selection for PHY drivers PHY drivers support loopback mode, but it is not possible to select the speed of the loopback mode. 
The speed is chosen by the set_loopback() operation of the PHY driver. Same is valid for genphy_loopback(). There are PHYs that support loopback with different speeds. Extend set_loopback() to make loopback speed selection possible. Signed-off-by: Gerhard Engleder Reviewed-by: Andrew Lunn Link: https://patch.msgid.link/20250312203010.47429-2-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni --- include/linux/phy.h | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index c24e1a565819..1c05158e9438 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1136,8 +1136,16 @@ struct phy_driver { int (*set_tunable)(struct phy_device *dev, struct ethtool_tunable *tuna, const void *data); - /** @set_loopback: Set the loopback mode of the PHY */ - int (*set_loopback)(struct phy_device *dev, bool enable); + /** + * @set_loopback: Set the loopback mode of the PHY + * enable selects if the loopback mode is enabled or disabled. If the + * loopback mode is enabled, then the speed of the loopback mode can be + * requested with the speed argument. If the speed argument is zero, + * then any speed can be selected. If the speed argument is > 0, then + * this speed shall be selected for the loopback mode or EOPNOTSUPP + * shall be returned if speed selection is not supported.
+ */ + int (*set_loopback)(struct phy_device *dev, bool enable, int speed); /** @get_sqi: Get the signal quality indication */ int (*get_sqi)(struct phy_device *dev); /** @get_sqi_max: Get the maximum signal quality indication */ @@ -1915,7 +1923,7 @@ int genphy_read_status(struct phy_device *phydev); int genphy_read_master_slave(struct phy_device *phydev); int genphy_suspend(struct phy_device *phydev); int genphy_resume(struct phy_device *phydev); -int genphy_loopback(struct phy_device *phydev, bool enable); +int genphy_loopback(struct phy_device *phydev, bool enable, int speed); int genphy_soft_reset(struct phy_device *phydev); irqreturn_t genphy_handle_interrupt_no_ack(struct phy_device *phydev); @@ -1957,7 +1965,7 @@ int genphy_c45_pma_baset1_read_master_slave(struct phy_device *phydev); int genphy_c45_read_status(struct phy_device *phydev); int genphy_c45_baset1_read_status(struct phy_device *phydev); int genphy_c45_config_aneg(struct phy_device *phydev); -int genphy_c45_loopback(struct phy_device *phydev, bool enable); +int genphy_c45_loopback(struct phy_device *phydev, bool enable, int speed); int genphy_c45_pma_resume(struct phy_device *phydev); int genphy_c45_pma_suspend(struct phy_device *phydev); int genphy_c45_fast_retrain(struct phy_device *phydev, bool enable); -- cgit v1.2.3 From 0d60fd50328a96a901b09ed653704ce7f41d15ce Mon Sep 17 00:00:00 2001 From: Gerhard Engleder Date: Wed, 12 Mar 2025 21:30:07 +0100 Subject: net: phy: Support speed selection for PHY loopback phy_loopback() leaves it to the PHY driver to select the speed of the loopback mode. Thus, the speed of the loopback mode depends on the PHY driver in use. Add support for speed selection to phy_loopback() to enable loopback with defined speeds. Ensure that link up is signaled if speed changes as speed is not allowed to change during link up. Link down and up is necessary for a new speed. 
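The set_loopback() contract spelled out in the kernel-doc comment of the previous patch (speed 0 means any speed; a positive speed must be honoured, or EOPNOTSUPP returned when speed selection is not supported) can be sketched in userspace C. Everything here is a hypothetical stand-in: struct mock_phy and the two mock_*_set_loopback() callbacks are not kernel code, and rejecting unlisted speeds with -EOPNOTSUPP in the second callback is a choice made for this sketch only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct phy_device. */
struct mock_phy {
	bool loopback;
	int speed;	/* Mbit/s, 0 while loopback is off */
};

/*
 * A driver without speed selection: it picks its own speed when asked
 * for "any" (speed == 0) and rejects every explicit speed request.
 */
static int mock_fixed_set_loopback(struct mock_phy *phy, bool enable, int speed)
{
	if (enable && speed > 0)
		return -EOPNOTSUPP;
	phy->loopback = enable;
	phy->speed = enable ? 1000 : 0;
	return 0;
}

/*
 * A driver with speed selection for 10/100/1000 Mbit/s; speed == 0 lets
 * the driver choose (1000 here). Other speeds are rejected.
 */
static int mock_multi_set_loopback(struct mock_phy *phy, bool enable, int speed)
{
	if (enable && speed != 0 && speed != 10 && speed != 100 && speed != 1000)
		return -EOPNOTSUPP;
	phy->loopback = enable;
	phy->speed = !enable ? 0 : (speed ? speed : 1000);
	return 0;
}

/* Small self-check walking through the cases from the kernel-doc text. */
static void loopback_demo(void)
{
	struct mock_phy phy = { false, 0 };

	assert(mock_fixed_set_loopback(&phy, true, 0) == 0);	/* any speed */
	assert(mock_fixed_set_loopback(&phy, true, 100) == -EOPNOTSUPP);
	assert(mock_multi_set_loopback(&phy, true, 100) == 0);
	assert(phy.speed == 100);
	assert(mock_multi_set_loopback(&phy, false, 0) == 0);
	assert(!phy.loopback && phy.speed == 0);
}
```

A caller that needs a defined speed would pass it explicitly and treat -EOPNOTSUPP as "this PHY cannot guarantee the speed"; a caller that does not care keeps passing 0, which matches the old behaviour.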
Signed-off-by: Gerhard Engleder Link: https://patch.msgid.link/20250312203010.47429-3-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni --- include/linux/phy.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 1c05158e9438..60d3b8860ea2 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1808,7 +1808,7 @@ int phy_init_hw(struct phy_device *phydev); int phy_suspend(struct phy_device *phydev); int phy_resume(struct phy_device *phydev); int __phy_resume(struct phy_device *phydev); -int phy_loopback(struct phy_device *phydev, bool enable); +int phy_loopback(struct phy_device *phydev, bool enable, int speed); int phy_sfp_connect_phy(void *upstream, struct phy_device *phy); void phy_sfp_disconnect_phy(void *upstream, struct phy_device *phy); void phy_sfp_attach(void *upstream, struct sfp_bus *bus); -- cgit v1.2.3 From ed3ba9b6e280e14cc3148c1b226ba453f02fa76c Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Sun, 16 Mar 2025 12:28:37 -0700 Subject: net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF. SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to br_ioctl_call(), which causes unnecessary RTNL dance and the splat below [0] under RTNL pressure. Let's say Thread A is trying to detach a device from a bridge and Thread B is trying to remove the bridge. In dev_ioctl(), Thread A bumps the bridge device's refcnt by netdev_hold() and releases RTNL because the following br_ioctl_call() also re-acquires RTNL. In the race window, Thread B could acquire RTNL and try to remove the bridge device. Then, rtnl_unlock() by Thread B will release RTNL and wait for netdev_put() by Thread A. Thread A, however, must hold RTNL after the unlock in dev_ifsioc(), which may take long under RTNL pressure, resulting in the splat by Thread B. 
  Thread A (SIOCBRDELIF)           Thread B (SIOCBRDELBR)
  ----------------------           ----------------------
  sock_ioctl                       sock_ioctl
  `- sock_do_ioctl                 `- br_ioctl_call
     `- dev_ioctl                     `- br_ioctl_stub
        |- rtnl_lock                     |
        |- dev_ifsioc                    '
        '  |- dev = __dev_get_by_name(...)
           |- netdev_hold(dev, ...)      .
       /   |- rtnl_unlock  ------.       |
       |   |- br_ioctl_call       `--->  |- rtnl_lock
  Race |   |  `- br_ioctl_stub           |- br_del_bridge
  Window   |     |                       |  |- dev = __dev_get_by_name(...)
       |   |     |  May take long        |  `- br_dev_delete(dev, ...)
       |   |     |  under RTNL pressure  |     `- unregister_netdevice_queue(dev, ...)
       |   |     |               |       `- rtnl_unlock
       \   |     |- rtnl_lock  <-'          `- netdev_run_todo
           |     |- ...                        `- netdev_run_todo
           |     `- rtnl_unlock                   |- __rtnl_unlock
           |                                      |
           |                                      |- netdev_wait_allrefs_any
           |- netdev_put(dev, ...)  <----------------'
                                                Wait refcnt decrement
                                                and log splat below

To avoid blocking SIOCBRDELBR unnecessarily, let's not call dev_ioctl() for SIOCBRADDIF and SIOCBRDELIF.

In the dev_ioctl() path, we do the following:

  1. Copy struct ifreq by get_user_ifreq in sock_do_ioctl()
  2. Check CAP_NET_ADMIN in dev_ioctl()
  3. Call dev_load() in dev_ioctl()
  4. Fetch the master dev from ifr.ifr_name in dev_ifsioc()

3. can be done by request_module() in br_ioctl_call(), so we move 1., 2., and 4. to br_ioctl_stub().

Note that 2. is also checked later in add_del_if(), but it's better performed before RTNL.

SIOCBRADDIF and SIOCBRDELIF have been processed in dev_ioctl() since the pre-git era, and there seems to be no specific reason to process them there.

[0]: unregister_netdevice: waiting for wpan3 to become free.
Usage count = 2 ref_tracker: wpan3@ffff8880662d8608 has 1/1 users at __netdev_tracker_alloc include/linux/netdevice.h:4282 [inline] netdev_hold include/linux/netdevice.h:4311 [inline] dev_ifsioc+0xc6a/0x1160 net/core/dev_ioctl.c:624 dev_ioctl+0x255/0x10c0 net/core/dev_ioctl.c:826 sock_do_ioctl+0x1ca/0x260 net/socket.c:1213 sock_ioctl+0x23a/0x6c0 net/socket.c:1318 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl fs/ioctl.c:892 [inline] __x64_sys_ioctl+0x1a4/0x210 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 893b19587534 ("net: bridge: fix ioctl locking") Reported-by: syzkaller Reported-by: yan kang Reported-by: yue sun Closes: https://lore.kernel.org/netdev/SY8P300MB0421225D54EB92762AE8F0F2A1D32@SY8P300MB0421.AUSP300.PROD.OUTLOOK.COM/ Signed-off-by: Kuniyuki Iwashima Acked-by: Stanislav Fomichev Reviewed-by: Ido Schimmel Acked-by: Nikolay Aleksandrov Link: https://patch.msgid.link/20250316192851.19781-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni --- include/linux/if_bridge.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index 3ff96ae31bf6..c5fe3b2a53e8 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -65,11 +65,9 @@ struct br_ip_list { #define BR_DEFAULT_AGEING_TIME (300 * HZ) struct net_bridge; -void brioctl_set(int (*hook)(struct net *net, struct net_bridge *br, - unsigned int cmd, struct ifreq *ifr, +void brioctl_set(int (*hook)(struct net *net, unsigned int cmd, void __user *uarg)); -int br_ioctl_call(struct net *net, struct net_bridge *br, unsigned int cmd, - struct ifreq *ifr, void __user *uarg); +int br_ioctl_call(struct net *net, unsigned int cmd, void __user *uarg); #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_IGMP_SNOOPING) int 
br_multicast_list_adjacent(struct net_device *dev, -- cgit v1.2.3 From ca1914a32cdcad26c4b003df743fe4f9e4bb2877 Mon Sep 17 00:00:00 2001 From: Ihor Matushchak Date: Sun, 16 Mar 2025 08:15:51 +0100 Subject: net: phy: phy_interface_t: Fix RGMII_TXID code comment Fix copy-paste error in the code comment for Interface Mode definitions. The code refers to Internal TX delay, not Internal RX delay. It was likely copied from the line above this one. Signed-off-by: Ihor Matushchak Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/20250316071551.9794-1-ihor.matushchak@foobox.net Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 60d3b8860ea2..bfdbdc538910 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -81,7 +81,7 @@ extern const int phy_basic_ports_array[3]; * @PHY_INTERFACE_MODE_RGMII: Reduced gigabit media-independent interface * @PHY_INTERFACE_MODE_RGMII_ID: RGMII with Internal RX+TX delay * @PHY_INTERFACE_MODE_RGMII_RXID: RGMII with Internal RX delay - * @PHY_INTERFACE_MODE_RGMII_TXID: RGMII with Internal RX delay + * @PHY_INTERFACE_MODE_RGMII_TXID: RGMII with Internal TX delay * @PHY_INTERFACE_MODE_RTBI: Reduced TBI * @PHY_INTERFACE_MODE_SMII: Serial MII * @PHY_INTERFACE_MODE_XGMII: 10 gigabit media-independent interface -- cgit v1.2.3 From 1937a0be28c01a13e18912602b8eff08d7db77cf Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 17 Mar 2025 08:53:13 +0000 Subject: tcp: move icsk_clean_acked to a better location As a followup of my presentation in Zagreb for netdev 0x19: icsk_clean_acked is only used by TCP when/if CONFIG_TLS_DEVICE is enabled from tcp_ack(). Rename it to tcp_clean_acked, move it to tcp_sock structure in the tcp_sock_read_rx for better cache locality in TCP fast path. Define this field only when CONFIG_TLS_DEVICE is enabled saving 8 bytes on configs not using it. 
Signed-off-by: Eric Dumazet Reviewed-by: Neal Cardwell Reviewed-by: Sabrina Dubroca Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20250317085313.2023214-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/linux/tcp.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 159b2c59eb62..1669d95bb0f9 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -244,6 +244,9 @@ struct tcp_sock { struct minmax rtt_min; /* OOO segments go in this rbtree. Socket lock must be held. */ struct rb_root out_of_order_queue; +#if defined(CONFIG_TLS_DEVICE) + void (*tcp_clean_acked)(struct sock *sk, u32 acked_seq); +#endif u32 snd_ssthresh; /* Slow start size threshold */ u8 recvmsg_inq : 1;/* Indicate # of bytes in queue upon recvmsg */ __cacheline_group_end(tcp_sock_read_rx); -- cgit v1.2.3 From c353e8983e0dea5dbba7789033326e1ad34135b7 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Thu, 20 Mar 2025 19:22:38 +0100 Subject: net: introduce per netns packet chains Currently network taps unbound to any interface are linked in the global ptype_all list, affecting the performance in all the network namespaces. Add per netns ptypes chains, so that in the mentioned case only the netns owning the packet socket(s) is affected. While at that drop the global ptype_all list: no in-kernel user registers a tap on "any" type without specifying either the target device or the target namespace (and IMHO doing that would not make any sense). Note that this adds a conditional in the fast path (to check for per netns ptype_specific list) and increases the dataset size by a cacheline (owing to the per netns lists).
Reviewed-by: Sabrina Dubroca Signed-off-by: Paolo Abeni Reviewed-by: Eric Dumazet Link: https://patch.msgid.link/ae405f98875ee87f8150c460ad162de7e466f8a7.1742494826.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 0c5b1f7f8f3a..f22cca7c03ad 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4278,7 +4278,17 @@ static __always_inline int ____dev_forward_skb(struct net_device *dev, return 0; } -bool dev_nit_active(struct net_device *dev); +bool dev_nit_active_rcu(const struct net_device *dev); +static inline bool dev_nit_active(const struct net_device *dev) +{ + bool ret; + + rcu_read_lock(); + ret = dev_nit_active_rcu(dev); + rcu_read_unlock(); + return ret; +} + void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev); static inline void __dev_put(struct net_device *dev) -- cgit v1.2.3 From 367f1854d442b33c4a0305b068ae40d67ccd7d6a Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 20 Mar 2025 22:11:07 +0000 Subject: net: phylink: add phylink_prepare_resume() When the system is suspended, the PHY may be placed in low-power mode by setting the BMCR 0.11 Power down bit. IEEE 802.3 states that the behaviour of the PHY in this state is implementation specific, and the PHY is not required to meet the RX_CLK and TX_CLK requirements. Essentially, this means that a PHY may stop the clocks that it is generating while in power down state. However, MACs exist which require the clocks from the PHY to be running in order to properly resume. phylink_prepare_resume() provides them with a way to clear the Power down bit early. Note, however, that IEEE 802.3 gives PHYs up to 500ms grace before the transmit and receive clocks meet the requirements after clearing the power down bit. 
Add a resume preparation function, which will ensure that the receive clock from the PHY is appropriately configured while resuming. Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1tvO6V-008Vjb-AP@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/phylink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 79876c84ae81..06f1b649f173 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -707,6 +707,7 @@ void phylink_start(struct phylink *); void phylink_stop(struct phylink *); void phylink_suspend(struct phylink *pl, bool mac_wol); +void phylink_prepare_resume(struct phylink *pl); void phylink_resume(struct phylink *pl); void phylink_ethtool_get_wol(struct phylink *, struct ethtool_wolinfo *); -- cgit v1.2.3 From ddf4bd3f738485c84edb98ff96a5759904498e70 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Thu, 20 Mar 2025 22:11:22 +0000 Subject: net: phylink: add functions to block/unblock rx clock stop Some MACs require the PHY receive clock to be running to complete setup actions. This may fail if the PHY has negotiated EEE, the MAC supports receive clock stop, and the link has entered LPI state. Provide a pair of APIs that MAC drivers can use to temporarily block the PHY disabling the receive clock. 
Signed-off-by: Russell King (Oracle) Link: https://patch.msgid.link/E1tvO6k-008Vjt-MZ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- include/linux/phylink.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 06f1b649f173..1f5773ab5660 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -706,6 +706,9 @@ int phylink_pcs_pre_init(struct phylink *pl, struct phylink_pcs *pcs); void phylink_start(struct phylink *); void phylink_stop(struct phylink *); +void phylink_rx_clk_stop_block(struct phylink *); +void phylink_rx_clk_stop_unblock(struct phylink *); + void phylink_suspend(struct phylink *pl, bool mac_wol); void phylink_prepare_resume(struct phylink *pl); void phylink_resume(struct phylink *pl); -- cgit v1.2.3 From 1f6154227b49c3d3f306f624858e695bfee50aae Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 25 Mar 2025 09:14:27 -0700 Subject: Revert "udp_tunnel: GRO optimizations" Revert "udp_tunnel: use static call for GRO hooks when possible" This reverts commit 311b36574ceaccfa3f91b74054a09cd4bb877702. Revert "udp_tunnel: create a fastpath GRO lookup." This reverts commit 8d4880db378350f8ed8969feea13bdc164564fc1. There are multiple small issues with the series. In the interest of unblocking the merge window let's opt for a revert. Link: https://lore.kernel.org/cover.1742557254.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski --- include/linux/udp.h | 16 ---------------- 1 file changed, 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/udp.h b/include/linux/udp.h index 895240177f4f..0807e21cfec9 100644 --- a/include/linux/udp.h +++ b/include/linux/udp.h @@ -101,13 +101,6 @@ struct udp_sock { /* Cache friendly copy of sk->sk_peek_off >= 0 */ bool peeking_with_offset; - - /* - * Accounting for the tunnel GRO fastpath. - * Unprotected by compilers guard, as it uses space available in - * the last UDP socket cacheline. 
- */ - struct hlist_node tunnel_list; }; #define udp_test_bit(nr, sk) \ @@ -226,13 +219,4 @@ static inline void udp_allow_gso(struct sock *sk) #define IS_UDPLITE(__sk) (__sk->sk_protocol == IPPROTO_UDPLITE) -static inline struct sock *udp_tunnel_sk(const struct net *net, bool is_ipv6) -{ -#if IS_ENABLED(CONFIG_NET_UDP_TUNNEL) - return rcu_dereference(net->ipv4.udp_tunnel_gro[is_ipv6].sk); -#else - return NULL; -#endif -} - #endif /* _LINUX_UDP_H */ -- cgit v1.2.3 From 983e0e4e87bdf465e8424b1902e41bfe51ba128a Mon Sep 17 00:00:00 2001 From: Pauli Virtanen Date: Tue, 18 Mar 2025 21:06:42 +0200 Subject: net-timestamp: COMPLETION timestamp on packet tx completion Add SOF_TIMESTAMPING_TX_COMPLETION, for requesting a software timestamp when hardware reports a packet completed. Completion tstamp is useful for Bluetooth, as hardware timestamps do not exist in the HCI specification except for ISO packets, and the hardware has a queue where packets may wait. In this case the software SND timestamp only reflects the kernel-side part of the total latency (usually small) and queue length (usually 0 unless HW buffers congested), whereas the completion report time is more informative of the true latency. It may also be useful in other cases where HW TX timestamps cannot be obtained and user wants to estimate an upper bound to when the TX probably happened. 
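How the new flag folds into the "any software timestamp" mask can be sketched as below. The DEMO_TX_* names are hypothetical stand-ins for the SKBTX_* flags; bit 3 for the completion flag is taken from the diff that follows, while the other bit positions are assumptions made for this sketch, and the BPF flag is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SKBTX_* tx_flags bits. */
enum {
	DEMO_TX_HW_TSTAMP         = 1 << 0,	/* assumed position */
	DEMO_TX_SW_TSTAMP         = 1 << 1,	/* assumed position */
	DEMO_TX_COMPLETION_TSTAMP = 1 << 3,	/* bit reused from the reserved slot */
	DEMO_TX_SCHED_TSTAMP      = 1 << 6,	/* assumed position */
};

/* Before the patch: completion is not part of the software mask. */
#define DEMO_TX_ANY_SW_TSTAMP_OLD (DEMO_TX_SW_TSTAMP | DEMO_TX_SCHED_TSTAMP)

/* After the patch: a completion request counts as a software timestamp. */
#define DEMO_TX_ANY_SW_TSTAMP (DEMO_TX_ANY_SW_TSTAMP_OLD | \
			       DEMO_TX_COMPLETION_TSTAMP)

#define DEMO_TX_ANY_TSTAMP (DEMO_TX_HW_TSTAMP | DEMO_TX_ANY_SW_TSTAMP)

/* True if any software timestamp was requested for the packet. */
static bool demo_wants_sw_tstamp(unsigned int tx_flags)
{
	return (tx_flags & DEMO_TX_ANY_SW_TSTAMP) != 0;
}

/* Small self-check: a completion-only request is now seen by the mask. */
static void tstamp_demo(void)
{
	unsigned int flags = DEMO_TX_COMPLETION_TSTAMP;

	assert((flags & DEMO_TX_ANY_SW_TSTAMP_OLD) == 0);
	assert(demo_wants_sw_tstamp(flags));
	assert((flags & DEMO_TX_ANY_TSTAMP) != 0);
	assert(!demo_wants_sw_tstamp(DEMO_TX_HW_TSTAMP));
}
```

Extending the mask rather than adding a separate check means existing code paths that test the "any software timestamp" mask pick up completion requests without further changes, at least in this simplified model.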
Signed-off-by: Pauli Virtanen Reviewed-by: Willem de Bruijn Signed-off-by: Luiz Augusto von Dentz --- include/linux/skbuff.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index cd8294cdc249..b974a277975a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -478,8 +478,8 @@ enum { /* device driver is going to provide hardware time stamp */ SKBTX_IN_PROGRESS = 1 << 2, - /* reserved */ - SKBTX_RESERVED = 1 << 3, + /* generate software time stamp on packet tx completion */ + SKBTX_COMPLETION_TSTAMP = 1 << 3, /* generate wifi status information (where possible) */ SKBTX_WIFI_STATUS = 1 << 4, @@ -498,7 +498,8 @@ enum { #define SKBTX_ANY_SW_TSTAMP (SKBTX_SW_TSTAMP | \ SKBTX_SCHED_TSTAMP | \ - SKBTX_BPF) + SKBTX_BPF | \ + SKBTX_COMPLETION_TSTAMP) #define SKBTX_ANY_TSTAMP (SKBTX_HW_TSTAMP | \ SKBTX_ANY_SW_TSTAMP) -- cgit v1.2.3 From bae2da826196ff4ab439b57683dce883e274faef Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 24 Mar 2025 15:45:28 -0700 Subject: net: remove netif_set_real_num_rx_queues() helper for when SYSFS=n Since commit a953be53ce40 ("net-sysfs: add support for device-specific rx queue sysfs attributes"), so for at least a decade now it is safe to call net_rx_queue_update_kobjects() when SYSFS=n. That function does its own ifdef-inery and will return 0. Remove the unnecessary stub for netif_set_real_num_rx_queues(). 
Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250324224537.248800-3-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f22cca7c03ad..55859c565f84 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4062,17 +4062,7 @@ static inline bool netif_is_multiqueue(const struct net_device *dev) } int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq); - -#ifdef CONFIG_SYSFS int netif_set_real_num_rx_queues(struct net_device *dev, unsigned int rxq); -#else -static inline int netif_set_real_num_rx_queues(struct net_device *dev, - unsigned int rxqs) -{ - dev->real_num_rx_queues = rxqs; - return 0; -} -#endif int netif_set_real_num_queues(struct net_device *dev, unsigned int txq, unsigned int rxq); -- cgit v1.2.3 From 4b702f8b72c7b05daa1b763fdc0840aa78178c3a Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 24 Mar 2025 15:45:30 -0700 Subject: net: explain "protection types" for the instance lock Try to define some terminology for which fields are protected by which lock and how. Some fields are protected by both rtnl_lock and instance lock which is hard to talk about without having a "key phrase" to refer to a particular protection scheme. "ops protected" fields are defined later in the series, one by one. Add ASSERT_RTNL() to netdev_ops_assert_locked() for drivers that do not opt into instance protection of ops. Hopefully it's not too confusing that netdev_lock_ops() does not match the lock which netdev_ops_assert_locked() will assert, exactly. The noun "ops" is in a different place in the name, so I think it's acceptable...
Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250324224537.248800-5-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 55859c565f84..2b91fb96a411 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2496,19 +2496,35 @@ struct net_device { * Should always be taken using netdev_lock() / netdev_unlock() helpers. * Drivers are free to use it for other protection. * - * Protects: + * For the drivers that implement shaper or queue API, the scope + * of this lock is expanded to cover most ndo/queue/ethtool/sysfs + * operations. Drivers may opt-in to this behavior by setting + * @request_ops_lock. + * + * @lock protection mixes with rtnl_lock in multiple ways, fields are + * either: + * + * - simply protected by the instance @lock; + * + * - double protected - writers hold both locks, readers hold either; + * + * - ops protected - protected by the lock held around the NDOs + * and other callbacks, that is the instance lock on devices for + * which netdev_need_ops_lock() returns true, otherwise by rtnl_lock; + * + * - double ops protected - always protected by rtnl_lock but for + * devices for which netdev_need_ops_lock() returns true - also + * the instance lock. + * + * Simply protects: * @gro_flush_timeout, @napi_defer_hard_irqs, @napi_list, * @net_shaper_hierarchy, @reg_state, @threaded * - * Partially protects (writers must hold both @lock and rtnl_lock): + * Double protects: * @up * * Also protects some fields in struct napi_struct. * - * For the drivers that implement shaper or queue API, the scope - * of this lock is expanded to cover most ndo/queue/ethtool/sysfs - * operations. - * * Ordering: take after rtnl_lock. 
*/ struct mutex lock; -- cgit v1.2.3 From 0a65dcf6249b75c841b4218426b0d246a805c7e0 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 24 Mar 2025 15:45:31 -0700 Subject: net: designate queue counts as "double ops protected" by instance lock Drivers which opt into instance lock protection of ops should only call set_real_num_*_queues() under the instance lock. This means that queue counts are double protected (writes are under both rtnl_lock and instance lock, readers under either). Some readers may still be under the rtnl_lock, however, so for now we need double protection of writers. OTOH queue API paths are only under the protection of the instance lock, so we need to validate that the instance is actually locking ops, otherwise the input checks we do against queue count are racy. Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250324224537.248800-6-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2b91fb96a411..60ef367d8575 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2523,6 +2523,9 @@ struct net_device { * Double protects: * @up * + * Double ops protects: + * @real_num_rx_queues, @real_num_tx_queues + * * Also protects some fields in struct napi_struct. * * Ordering: take after rtnl_lock. -- cgit v1.2.3 From 310ae9eb2617c62deedef8f121d7ca1ae774fa76 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 24 Mar 2025 15:45:32 -0700 Subject: net: designate queue -> napi linking as "ops protected" netdev netlink is the only reader of netdev_{,rx_}queue->napi, and it already holds netdev->lock. Switch protection of the writes to netdev->lock to "ops protected". The expectation will be now that accessing queue->napi will require netdev->lock for "ops locked" drivers, and rtnl_lock for all other drivers. 
Current "ops locked" drivers don't require any changes. gve and netdevsim use _locked() helpers right next to netif_queue_set_napi() so they must be holding the instance lock. iavf doesn't call it. bnxt is a bit messy but all paths seem locked. Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250324224537.248800-7-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 60ef367d8575..fa79145518d1 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -710,7 +710,7 @@ struct netdev_queue { * slow- / control-path part */ /* NAPI instance for the queue - * Readers and writers must hold RTNL + * "ops protected", see comment about net_device::lock */ struct napi_struct *napi; @@ -2526,7 +2526,8 @@ struct net_device { * Double ops protects: * @real_num_rx_queues, @real_num_tx_queues * - * Also protects some fields in struct napi_struct. + * Also protects some fields in: + * struct napi_struct, struct netdev_queue, struct netdev_rx_queue * * Ordering: take after rtnl_lock. */ -- cgit v1.2.3 From 2c5ac026fd1421cf6a78770b48570b2563ef40b7 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 24 Mar 2025 16:39:29 +0200 Subject: =?UTF-8?q?net:=20phy:=20Introduce=20PHY=5FID=5FSIZE=20=E2=80=94?= =?UTF-8?q?=20minimum=20size=20for=20PHY=20ID=20string?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The PHY_ID_FMT defines the format specifier "%s:%02x" to form the PHY ID string, where the maximum of the first part is defined in MII_BUS_ID_SIZE, including NUL terminator, and the second part is implied to be 3 as the maximum address is limited to 32, meaning that 2 hex digits is more than enough, plus ':' (colon) delimiter. However, some drivers, which are using PHY_ID_FMT, customise buffer size and do that incorrectly. 
Introduce a new constant PHY_ID_SIZE that makes the minimum required size explicit, so drivers are encouraged to use it. Suggested-by: "Russell King (Oracle)" Signed-off-by: Andy Shevchenko Reviewed-by: Russell King (Oracle) Link: https://patch.msgid.link/20250324144751.1271761-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index bfdbdc538910..a2bfae80c449 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -292,6 +292,7 @@ static inline long rgmii_clock(int speed) /* Used when trying to connect to a specific phy (mii bus id:phy device id) */ #define PHY_ID_FMT "%s:%02x" +#define PHY_ID_SIZE (MII_BUS_ID_SIZE + 3) #define MII_BUS_ID_SIZE 61 -- cgit v1.2.3
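Note on the PHY_ID_SIZE patch above: the arithmetic behind the constant can be checked in a small userspace sketch. The three macros are copied from the diff; format_phy_id() and phy_id_selfcheck() are hypothetical helpers written only for this illustration and are not part of the kernel or of this patch. MII_BUS_ID_SIZE already counts the NUL terminator, so the longest bus ID is 60 characters; adding ':' and two hex digits gives 63 characters plus NUL, which is exactly MII_BUS_ID_SIZE + 3.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values copied from include/linux/phy.h as shown in the diff above. */
#define MII_BUS_ID_SIZE 61
#define PHY_ID_FMT "%s:%02x"
#define PHY_ID_SIZE (MII_BUS_ID_SIZE + 3)

/*
 * Hypothetical helper (not a kernel API): format a PHY ID string and
 * report truncation. Returns the string length on success, -1 if the
 * buffer was too small to hold the whole ID.
 */
static int format_phy_id(char *buf, size_t len, const char *bus_id,
			 unsigned int addr)
{
	int n = snprintf(buf, len, PHY_ID_FMT, bus_id, addr);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}

/* Hypothetical self-check exercising the worst case. */
static void phy_id_selfcheck(void)
{
	char bus[MII_BUS_ID_SIZE];	/* longest possible bus ID */
	char ok[PHY_ID_SIZE];		/* sized as the patch recommends */
	char small[MII_BUS_ID_SIZE];	/* undersized custom buffer */

	memset(bus, 'a', sizeof(bus) - 1);
	bus[sizeof(bus) - 1] = '\0';

	/* 60 + 1 + 2 characters fit, leaving room for the NUL. */
	assert(format_phy_id(ok, sizeof(ok), bus, 0x1f) == PHY_ID_SIZE - 1);

	/* A buffer of only MII_BUS_ID_SIZE bytes truncates the ID. */
	assert(format_phy_id(small, sizeof(small), bus, 0x1f) == -1);
}
```

The undersized case is meant to mirror the mistake the commit message describes, assuming a driver picks its own buffer length rather than PHY_ID_SIZE.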