summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2025-10-27sctp: Use sk_clone() in sctp_accept().Kuniyuki Iwashima
sctp_accept() calls sctp_v[46]_create_accept_sk() to allocate a new socket and calls sctp_sock_migrate() to copy fields from the parent socket to the new socket. sctp_v4_create_accept_sk() allocates sk by sk_alloc(), initialises it by sock_init_data(), and copy a bunch of fields from the parent socekt by sctp_copy_sock(). sctp_sock_migrate() calls sctp_copy_descendant() to copy most fields in sctp_sock from the parent socket by memcpy(). These can be simply replaced by sk_clone(). Let's consolidate sctp_v[46]_create_accept_sk() to sctp_clone_sock() with sk_clone(). We will reuse sctp_clone_sock() for sctp_do_peeloff() and then remove sctp_copy_descendant(). Note that sock_reset_flag(newsk, SOCK_ZAPPED) is not copied to sctp_clone_sock() as sctp does not use SOCK_ZAPPED at all. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20251023231751.4168390-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27net: Add sk_clone().Kuniyuki Iwashima
sctp_accept() will use sk_clone_lock(), but it will be called with the parent socket locked, and sctp_migrate() acquires the child lock later. Let's add no lock version of sk_clone_lock(). Note that lockdep complains if we simply use bh_lock_sock_nested(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20251023231751.4168390-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27sctp: Don't call sk->sk_prot->init() in sctp_v[46]_create_accept_sk().Kuniyuki Iwashima
sctp_accept() calls sctp_v[46]_create_accept_sk() to allocate a new socket and calls sctp_sock_migrate() to copy fields from the parent socket to the new socket. sctp_v[46]_create_accept_sk() calls sctp_init_sock() to initialise sctp_sock, but most fields are overwritten by sctp_copy_descendant() called from sctp_sock_migrate(). Things done in sctp_init_sock() but not in sctp_sock_migrate() are the following: 1. Copy sk->sk_gso 2. Copy sk->sk_destruct (sctp_v6_init_sock()) 3. Allocate sctp_sock.ep 4. Initialise sctp_sock.pd_lobby 5. Count sk_sockets_allocated_inc(), sock_prot_inuse_add(), and SCTP_DBG_OBJCNT_INC() Let's do these in sctp_copy_sock() and sctp_sock_migrate() and avoid calling sk->sk_prot->init() in sctp_v[46]_create_accept_sk(). Note that sk->sk_destruct is already copied in sctp_copy_sock(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20251023231751.4168390-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27sctp: Don't copy sk_sndbuf and sk_rcvbuf in sctp_sock_migrate().Kuniyuki Iwashima
sctp_sock_migrate() is called from 2 places. 1) sctp_accept() calls sp->pf->create_accept_sk() before sctp_sock_migrate(), and sp->pf->create_accept_sk() calls sctp_copy_sock(). 2) sctp_do_peeloff() also calls sctp_copy_sock() before sctp_sock_migrate(). sctp_copy_sock() copies sk_sndbuf and sk_rcvbuf from the parent socket. Let's not copy the two fields in sctp_sock_migrate(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20251023231751.4168390-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27sctp: Defer SCTP_DBG_OBJCNT_DEC() to sctp_destroy_sock().Kuniyuki Iwashima
SCTP_DBG_OBJCNT_INC() is called only when sctp_init_sock() returns 0 after successfully allocating sctp_sk(sk)->ep. OTOH, SCTP_DBG_OBJCNT_DEC() is called in sctp_close(). The code seems to expect that the socket is always exposed to userspace once SCTP_DBG_OBJCNT_INC() is incremented, but there is a path where the assumption is not true. In sctp_accept(), sctp_sock_migrate() could fail after sctp_init_sock(). Then, sk_common_release() does not call inet_release() nor sctp_close(). Instead, it calls sk->sk_prot->destroy(). Let's move SCTP_DBG_OBJCNT_DEC() from sctp_close() to sctp_destroy_sock(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20251023231751.4168390-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27Merge tag 'batadv-next-pullrequest-20251024' of ↵Jakub Kicinski
https://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - use skb_crc32c() instead of skb_seq_read(), by Sven Eckelmann * tag 'batadv-next-pullrequest-20251024' of https://git.open-mesh.org/linux-merge: batman-adv: use skb_crc32c() instead of skb_seq_read() batman-adv: Start new development cycle ==================== Link: https://patch.msgid.link/20251024092315.232636-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27Merge tag 'batadv-net-pullrequest-20251024' of ↵Jakub Kicinski
https://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - release references to inactive interfaces, by Sven Eckelmann * tag 'batadv-net-pullrequest-20251024' of https://git.open-mesh.org/linux-merge: batman-adv: Release references to inactive interfaces ==================== Link: https://patch.msgid.link/20251024091150.231141-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27net: bridge: Flush multicast groups when snooping is disabledPetr Machata
When forwarding multicast packets, the bridge takes MDB into account when IGMP / MLD snooping is enabled. Currently, when snooping is disabled, the MDB is retained, even though it is not used anymore. At the same time, during the time that snooping is disabled, the IGMP / MLD control packets are obviously ignored, and after the snooping is reenabled, the administrator has to assume it is out of sync. In particular, missed join and leave messages would lead to traffic being forwarded to wrong interfaces. Keeping the MDB entries around thus serves no purpose, and just takes memory. Note also that disabling per-VLAN snooping does actually flush the relevant MDB entries. This patch flushes non-permanent MDB entries as global snooping is disabled. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/5e992df1bb93b88e19c0ea5819e23b669e3dde5d.1761228273.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27net/tls: support setting the maximum payload sizeWilfred Mallawa
During a handshake, an endpoint may specify a maximum record size limit. Currently, the kernel defaults to TLS_MAX_PAYLOAD_SIZE (16KB) for the maximum record size. Meaning that, the outgoing records from the kernel can exceed a lower size negotiated during the handshake. In such a case, the TLS endpoint must send a fatal "record_overflow" alert [1], and thus the record is discarded. Upcoming Western Digital NVMe-TCP hardware controllers implement TLS support. For these devices, supporting TLS record size negotiation is necessary because the maximum TLS record size supported by the controller is less than the default 16KB currently used by the kernel. Currently, there is no way to inform the kernel of such a limit. This patch adds support to a new setsockopt() option `TLS_TX_MAX_PAYLOAD_LEN` that allows for setting the maximum plaintext fragment size. Once set, outgoing records are no larger than the size specified. This option can be used to specify the record size limit. [1] https://www.rfc-editor.org/rfc/rfc8449 Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20251022001937.20155-1-wilfred.opensource@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-27xfrm: Skip redundant replay recheck for the hardware offload pathJianbo Liu
The xfrm_replay_recheck() function was introduced to handle the issues arising from asynchronous crypto algorithms. The crypto offload path is now effectively synchronous, as it holds the state lock throughout its operation. This eliminates the race condition, making the recheck an unnecessary overhead. This patch improves performance by skipping the redundant call when crypto_done is true. Additionally, the sequence number assignment is moved to an earlier point in the function. This improves performance by reducing lock contention and places the logic at a more appropriate point, as the full sequence number (including the higher-order bits) can be determined as soon as the packet is received. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-27xfrm: Refactor xfrm_input lock to reduce contention with RSSJianbo Liu
With newer NICs like mlx5 supporting RSS for IPsec crypto offload, packets for a single Security Association (SA) are scattered across multiple CPU cores for parallel processing. The xfrm_state spinlock (x->lock) is held for each packet during xfrm processing. When multiple connections or flows share the same SA, this parallelism causes high lock contention on x->lock, creating a performance bottleneck and limiting scalability. The original xfrm_input() function exacerbated this issue by releasing and immediately re-acquiring x->lock. For hardware crypto offload paths, this unlock/relock sequence is unnecessary and introduces significant overhead. This patch refactors the function to relocate the type_offload->input_tail call for the offload path, performing all necessary work while continuously holding the lock. This reordering is safe, since packets which don't pass the checks below will still fail them with the new code. Performance testing with iperf using multiple parallel streams over a single IPsec SA shows significant improvement in throughput as the number of queues (and thus CPU cores) increases: +-----------+---------------+--------------+-----------------+ | RX queues | Before (Gbps) | After (Gbps) | Improvement (%) | +-----------+---------------+--------------+-----------------+ | 2 | 32.3 | 34.4 | 6.5 | | 4 | 34.4 | 40.0 | 16.3 | | 6 | 24.5 | 38.3 | 56.3 | | 8 | 23.1 | 38.3 | 65.8 | | 12 | 18.1 | 29.9 | 65.2 | | 16 | 16.0 | 25.2 | 57.5 | +-----------+---------------+--------------+-----------------+ Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-27wifi: cfg80211: Add parameters to radio-specific debugfs directoriesRoopni Devanathan
In multi-radio wiphy architecture, where a single wiphy can have multiple radios tied to it, radio specific configuration parameters and global wiphy parameters are maintained for the entire physical device and common to all radios. But, each radio in a wiphy can have different values for each radio configuration parameter like RTS threshold. With the current debugfs directory structure, the values of global wiphy configuration parameters can be viewed, but, values of individual radio configuration parameters cannot be viewed. To address this requirement, maintain separate entries of each radio configuration parameter i.e., RTS threshold in corresponding radio- specific debugfs directory. This way, radio-specific configuration parameters can be maintained along with global wiphy configuration parameters. Whenever the values are changed for one radio, the values for rest of the radios in the wiphy and the global wiphy parameter value will remain intact. Sample output: /# iw phy#0 set rts 100 radio 1 /# iw phy#0 set rts 468 radio 0 /# cat /sys/kernel/debug/ieee80211/phy0/rts_threshold -1 /# cat /sys/kernel/debug/ieee80211/phy0/radio0/radio_rts_threshold 468 /# cat /sys/kernel/debug/ieee80211/phy0/radio1/radio_rts_threshold 100 /# iw phy#0 set rts 500 /# cat /sys/kernel/debug/ieee80211/phy0/rts_threshold 500 /# cat /sys/kernel/debug/ieee80211/phy0/radio0/radio_rts_threshold 500 /# cat /sys/kernel/debug/ieee80211/phy0/radio1/radio_rts_threshold 500 Signed-off-by: Roopni Devanathan <quic_rdevanat@quicinc.com> Link: https://patch.msgid.link/20251024044649.483557-3-quic_rdevanat@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27wifi: cfg80211: Add debugfs support for multi-radio wiphyRoopni Devanathan
In multi-radio wiphy architecture, where a single wiphy can have multiple radios tied to it, radio specific configuration parameters and global wiphy parameters are maintained for the entire physical device and common to all radios. But, each radio in a wiphy can have different values for each radio configuration parameter, like RTS threshold. With the current debugfs directory structure, the values of global wiphy configuration parameters can be viewed, but, values of individual radio configuration parameters cannot be viewed, as radio specific configuration parameters are not maintained, separately. To address this, in addition to maintaining global wiphy configuration parameters common to all radios, create separate debugfs directories for each radio in a wiphy to maintain parameters corresponding to that radio in this directory. In implementation, maintain a dentry structure in wiphy_radio_cfg, a structure containing radio configurations of a wiphy. This struct is maintained to denote per-radio configurations of a wiphy. Create separate directories representing each radio within phy#X directory in debugfs during wiphy registration. Sample directory structure with this change: ls /sys/kernel/debug/ieee80211/phy0/radio radio0/ radio1/ radio2/ Signed-off-by: Roopni Devanathan <quic_rdevanat@quicinc.com> Link: https://patch.msgid.link/20251024044649.483557-2-quic_rdevanat@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27wifi: mac80211: fix missing RX bitrate update for mesh forwarding pathSarika Sharma
Currently, RX bitrate statistics are not updated for packets received on the mesh forwarding path during fast RX processing. This results in incomplete RX rate tracking in station dump outputs for mesh scenarios. Update ieee80211_invoke_fast_rx() to record the RX rate using sta_stats_encode_rate() and store it in the last_rate field of ieee80211_sta_rx_stats when RX_QUEUED is returned from ieee80211_rx_mesh_data(). This ensures that RX bitrate is properly accounted for in both RSS and non-RSS paths. Signed-off-by: Sarika Sharma <sarika.sharma@oss.qualcomm.com> Link: https://patch.msgid.link/20251024043627.1640447-1-sarika.sharma@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27wifi: cfg80211: default S1G chandef width to 1MHzLachlan Hodges
When management frames are passed down to be transmitted by usermode, often times the NL80211_ATTR_CHANNEL_WIDTH is not used as its implied to be transmitted on the control width. This can lead to errors during chandef validation as the offsets from the channel center are wrong. Ensure we initialise S1G chandefs to a width of 1MHz rather then 20MHz. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20251021061201.235754-1-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27wifi: mac80211: get probe response chan via ieee80211_get_channel_khzLachlan Hodges
Make use of ieee80211_get_channel_khz() rather then the MHz counterpart to ensure probe responses received on an S1G channel pass the check. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20251021061051.235258-1-lachlan.hodges@morsemicro.com [modify indentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27wifi: mac80211: reset CRC valid after CSAJohannes Berg
While waiting for a beacon after CSA, reset the CRC valid so that the next beacon is handled even if it happens to be identical the last one on the old channel. This is an AP bug either way, but it's better to disconnect cleanly than to have lingering CSA state. In the iwlwifi instantiation of this problem, mac80211 is ignoring the beacon but the firmware creates a new CSA, and then crashes later because mac80211/driver didn't do anything about it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251019115024.521ad9c6b87d.I86376900df3d3423185b75bf63358c29f33a5eb6@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-27wifi: cfg80211/mac80211: validate radio frequency range for monitor modeRyder Lee
In multi-radio devices, it is possible to have an MLD AP and a monitor interface active at the same time. In such cases, monitor mode may not be able to specify a fixed channel and could end up capturing frames from all radios, including those outside the intended frequency bands. This patch adds frequency validation for monitor mode. Received frames are now only processed if their frequency fall within the allowed ranges of the radios specified by the interface's radio_mask. This prevents monitor mode from capturing frames outside the supported radio. Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Link: https://patch.msgid.link/700b8284e845d96654eb98431f8eeb5a81503862.1758647858.git.ryder.lee@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-10-24smc: rename smc_find_ism_store_rc to reflect broader usageDust Li
The function smc_find_ism_store_rc() is used to record the reason why a suitable device (either ISM or RDMA) could not be found. However, its name suggests it is ISM-specific, which is misleading. Rename it to better reflect its actual usage. No functional changes. Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251023020012.69609-1-dust.li@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24strparser: fix typo in commentJulia Lawall
The name frags_list doesn't appear in the kernel. It should be frag_list as in the next sentence. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://patch.msgid.link/20251023013051.1728388-1-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24neighbour: Convert rwlock of struct neigh_table to spinlock.Kuniyuki Iwashima
Only neigh_for_each() and neigh_seq_start/stop() are on the reader side of neigh_table.lock. Let's convert rwlock to the plain spinlock. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251022054004.2514876-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24neighbour: Convert RTM_SETNEIGHTBL to RCU.Kuniyuki Iwashima
neightbl_set() fetches neigh_tables[] and updates attributes under write_lock_bh(&tbl->lock), so RTNL is not needed. neigh_table_clear() synchronises RCU only, and rcu_dereference_rtnl() protects nothing here. If we released RCU after fetching neigh_tables[], there would be no synchronisation to block neigh_table_clear() further, so RCU is held until the end of the function. Another option would be to protect neigh_tables[] user with SRCU and add synchronize_srcu() in neigh_table_clear(). But, holding RCU should be fine as we hold write_lock_bh() for the rest of neightbl_set() anyway. Let's perform RTM_SETNEIGHTBL under RCU and drop RTNL. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251022054004.2514876-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24neighbour: Convert RTM_GETNEIGHTBL to RCU.Kuniyuki Iwashima
neightbl_dump_info() calls these functions for each neigh_tables[] entry: 1. neightbl_fill_info() for tbl->parms 2. neightbl_fill_param_info() for tbl->parms_list (except tbl->parms) Both functions rely on the table lock (read_lock_bh(&tbl->lock)) and RTNL is not needed. Let's fetch the table under RCU and convert RTM_GETNEIGHTBL to RCU. Note that the first entry of tbl->parms_list is tbl->parms.list and embedded in neigh_table, so list_next_entry() is safe. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251022054004.2514876-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24neighbour: Annotate access to neigh_parms fields.Kuniyuki Iwashima
NEIGH_VAR() is read locklessly in the fast path, and IPv6 ndisc uses NEIGH_VAR_SET() locklessly. The next patch will convert neightbl_dump_info() to RCU. Let's annotate accesses to neigh_param with READ_ONCE() and WRITE_ONCE(). Note that ndisc_ifinfo_sysctl_change() uses &NEIGH_VAR() and we cannot use '&' with READ_ONCE(), so NEIGH_VAR_PTR() is introduced. Note also that NEIGH_VAR_INIT() does not need WRITE_ONCE() as it is before parms is published. Also, the only user hippi_neigh_setup_dev() is no longer called since commit e3804cbebb67 ("net: remove COMPAT_NET_DEV_OPS"), which looks wrong, but probably no one uses HIPPI and RoadRunner. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251022054004.2514876-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24neighbour: Use RCU list helpers for neigh_parms.list writers.Kuniyuki Iwashima
We will convert RTM_GETNEIGHTBL to RCU soon, where we traverse tbl->parms_list under RCU in neightbl_dump_info(). Let's use RCU list helper for neigh_parms in neigh_parms_alloc() and neigh_parms_release(). neigh_table_init() uses the plain list_add() for the default neigh_parm that is embedded in the table and not yet published. Note that neigh_parms_release() already uses call_rcu() to free neigh_parms. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251022054004.2514876-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-24Bluetooth: rfcomm: fix modem control handlingJohan Hovold
The RFCOMM driver confuses the local and remote modem control signals, which specifically means that the reported DTR and RTS state will instead reflect the remote end (i.e. DSR and CTS). This issue dates back to the original driver (and a follow-on update) merged in 2002, which resulted in a non-standard implementation of TIOCMSET that allowed controlling also the TS07.10 IC and DV signals by mapping them to the RI and DCD input flags, while TIOCMGET failed to return the actual state of DTR and RTS. Note that the bogus control of input signals in tiocmset() is just dead code as those flags will have been masked out by the tty layer since 2003. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-10-24Bluetooth: hci_core: Fix tracking of periodic advertisementLuiz Augusto von Dentz
Periodic advertising enabled flag cannot be tracked by the enabled flag since advertising and periodic advertising each can be enabled/disabled separately from one another causing the states to be inconsistent when for example an advertising set is disabled its enabled flag is set to false which is then used for periodic which has not being disabled. Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-10-24Bluetooth: hci_conn: Fix connection cleanup with BIG with 2 or more BISLuiz Augusto von Dentz
This fixes bis_cleanup not considering connections in BT_OPEN state before attempting to remove the BIG causing the following error: btproxy[20110]: < HCI Command: LE Terminate Broadcast Isochronous Group (0x08|0x006a) plen 2 BIG Handle: 0x01 Reason: Connection Terminated By Local Host (0x16) > HCI Event: Command Status (0x0f) plen 4 LE Terminate Broadcast Isochronous Group (0x08|0x006a) ncmd 1 Status: Unknown Advertising Identifier (0x42) Fixes: fa224d0c094a ("Bluetooth: ISO: Reassociate a socket with an active BIS") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
2025-10-24Bluetooth: ISO: Fix another instance of dst_type handlingLuiz Augusto von Dentz
Socket dst_type cannot be directly assigned to hci_conn->type since there domain is different which may lead to the wrong address type being used. Fixes: 6a5ad251b7cd ("Bluetooth: ISO: Fix possible circular locking dependency") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-10-24Revert "Bluetooth: L2CAP: convert timeouts to secs_to_jiffies()"Frédéric Danis
This reverts commit c9d84da18d1e0d28a7e16ca6df8e6d47570501d4. It replaces in L2CAP calls to msecs_to_jiffies() to secs_to_jiffies() and updates the constants accordingly. But the constants are also used in LCAP Configure Request and L2CAP Configure Response which expect values in milliseconds. This may prevent correct usage of L2CAP channel. To fix it, keep those constants in milliseconds and so revert this change. Fixes: c9d84da18d1e ("Bluetooth: L2CAP: convert timeouts to secs_to_jiffies()") Signed-off-by: Frédéric Danis <frederic.danis@collabora.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-10-24Bluetooth: MGMT: fix crash in set_mesh_sync and set_mesh_completePauli Virtanen
There is a BUG: KASAN: stack-out-of-bounds in set_mesh_sync due to memcpy from badly declared on-stack flexible array. Another crash is in set_mesh_complete() due to double list_del via mgmt_pending_valid + mgmt_pending_remove. Use DEFINE_FLEX to declare the flexible array right, and don't memcpy outside bounds. As mgmt_pending_valid removes the cmd from list, use mgmt_pending_free, and also report status on error. Fixes: 302a1f674c00d ("Bluetooth: MGMT: Fix possible UAFs") Signed-off-by: Pauli Virtanen <pav@iki.fi> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-10-24Bluetooth: HCI: Fix tracking of advertisement set/instance 0x00Luiz Augusto von Dentz
This fixes the state tracking of advertisement set/instance 0x00 which is considered a legacy instance and is not tracked individually by adv_instances list, previously it was assumed that hci_dev itself would track it via HCI_LE_ADV but that is a global state not specifc to instance 0x00, so to fix it a new flag is introduced that only tracks the state of instance 0x00. Fixes: 1488af7b8b5f ("Bluetooth: hci_sync: Fix hci_resume_advertising_sync") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-10-24Bluetooth: ISO: Fix BIS connection dst_type handlingLuiz Augusto von Dentz
Socket dst_type cannot be directly assigned to hci_conn->type since there domain is different which may lead to the wrong address type being used. Fixes: 6a5ad251b7cd ("Bluetooth: ISO: Fix possible circular locking dependency") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-10-24Bluetooth: hci_sync: fix race in hci_cmd_sync_dequeue_onceCen Zhang
hci_cmd_sync_dequeue_once() does lookup and then cancel the entry under two separate lock sections. Meanwhile, hci_cmd_sync_work() can also delete the same entry, leading to double list_del() and "UAF". Fix this by holding cmd_sync_work_lock across both lookup and cancel, so that the entry cannot be removed concurrently. Fixes: 505ea2b29592 ("Bluetooth: hci_sync: Add helper functions to manipulate cmd_sync queue") Reported-by: Cen Zhang <zzzccc427@163.com> Signed-off-by: Cen Zhang <zzzccc427@163.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-10-23tcp: Remove unnecessary null check in tcp_inbound_md5_hash()Eric Biggers
The 'if (!key && hash_location)' check in tcp_inbound_md5_hash() implies that hash_location might be null. However, later code in the function dereferences hash_location anyway, without checking for null first. Fortunately, there is no real bug, since tcp_inbound_md5_hash() is called only with non-null values of hash_location. Therefore, remove the unnecessary and misleading null check of hash_location. This silences a Smatch static checker warning (https://lore.kernel.org/netdev/aPi4b6aWBbBR52P1@stanley.mountain/) Also fix the related comment at the beginning of the function. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20251022221209.19716-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-23net: unix: remove outdated BSD behavior comment in unix_release_sock()Sunday Adelodun
Remove the long-standing comment in unix_release_sock() that described a behavioral difference between Linux and BSD regarding when ECONNRESET is sent to connected UNIX sockets upon closure. As confirmed by testing on macOS (similar to BSD behavior), ECONNRESET is only observed for SOCK_DGRAM sockets, not for SOCK_STREAM. Meanwhile, Linux already returns ECONNRESET in cases where a socket is closed with unread data or is not yet accept()ed. This means the previous comment no longer accurately describes current behavior and is misleading. Suggested-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251021195906.20389-1-adelodunolaoluwa@yahoo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-23Merge tag 'wireless-2025-10-23' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== First set of fixes: - brcmfmac: long-standing crash when used w/o P2P - iwlwifi: fix for a use-after-free bug - mac80211: key tailroom accounting bug could leave allocation overhead and cause a warning - ath11k: add a missing platform, fix key flag operations - bcma: skip devices disabled in OF/DT - various (potential) memory leaks * tag 'wireless-2025-10-23' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: nl80211: call kfree without a NULL check wifi: mac80211: fix key tailroom accounting leak wifi: brcmfmac: fix crash while sending Action Frames in standalone AP Mode MAINTAINERS: wcn36xx: Add linux-wireless list bcma: don't register devices disabled in OF wifi: mac80211: reset FILS discovery and unsol probe resp intervals wifi: iwlwifi: fix potential use after free in iwl_mld_remove_link() wifi: ath11k: avoid bit operation on key flags wifi: ath12k: free skb during idr cleanup callback wifi: ath11k: Add missing platform IDs for quirk table wifi: ath10k: Fix memory leak on unsupported WMI command ==================== Link: https://patch.msgid.link/20251023180604.626946-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.18-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-23Merge tag 'net-6.18-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from can. Slim pickings, I'm guessing people haven't really started testing. Current release - new code bugs: - eth: mlx5e: - psp: avoid 'accel' NULL pointer dereference - skip PPHCR register query for FEC histogram if not supported Previous releases - regressions: - bonding: update the slave array for broadcast mode - rtnetlink: re-allow deleting FDB entries in user namespace - eth: dpaa2: fix the pointer passed to PTR_ALIGN on Tx path Previous releases - always broken: - can: drop skb on xmit if device is in listen-only mode - gro: clear skb_shinfo(skb)->hwtstamps in napi_reuse_skb() - eth: mlx5e - RX, fix generating skb from non-linear xdp_buff if program trims frags - make devcom init failures non-fatal, fix races with IPSec Misc: - some documentation formatting 'fixes'" * tag 'net-6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits) net/mlx5: Fix IPsec cleanup over MPV device net/mlx5: Refactor devcom to return NULL on failure net/mlx5e: Skip PPHCR register query if not supported by the device net/mlx5: Add PPHCR to PCAM supported registers mask virtio-net: zero unused hash fields net: phy: micrel: always set shared->phydev for LAN8814 vsock: fix lock inversion in vsock_assign_transport() ovpn: use datagram_poll_queue for socket readiness in TCP espintcp: use datagram_poll_queue for socket readiness net: datagram: introduce datagram_poll_queue for custom receive queues net: bonding: fix possible peer notify event loss or dup issue net: hsr: prevent creation of HSR device with slaves from another netns sctp: avoid NULL dereference when chunk data buffer is missing ptp: ocp: Fix typo using index 1 instead of i in SMA initialization loop net: ravb: Ensure memory write completes before ringing TX doorbell net: ravb: Enforce descriptor type ordering net: hibmcge: select FIXED_PHY net: dlink: use dev_kfree_skb_any instead of dev_kfree_skb Documentation: networking: ax25: update the mailing list info. net: gro_cells: fix lock imbalance in gro_cells_receive() ...
2025-10-23vsock: fix lock inversion in vsock_assign_transport()Stefano Garzarella
Syzbot reported a potential lock inversion deadlock between vsock_register_mutex and sk_lock-AF_VSOCK when vsock_linger() is called. The issue was introduced by commit 687aa0c5581b ("vsock: Fix transport_* TOCTOU") which added vsock_register_mutex locking in vsock_assign_transport() around the transport->release() call, that can call vsock_linger(). vsock_assign_transport() can be called with sk_lock held. vsock_linger() calls sk_wait_event() that temporarily releases and re-acquires sk_lock. During this window, if another thread hold vsock_register_mutex while trying to acquire sk_lock, a circular dependency is created. Fix this by releasing vsock_register_mutex before calling transport->release() and vsock_deassign_transport(). This is safe because we don't need to hold vsock_register_mutex while releasing the old transport, and we ensure the new transport won't disappear by obtaining a module reference first via try_module_get(). Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com Tested-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com Fixes: 687aa0c5581b ("vsock: Fix transport_* TOCTOU") Cc: mhal@rbox.co Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251021121718.137668-1-sgarzare@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-23espintcp: use datagram_poll_queue for socket readinessRalf Lici
espintcp uses a custom queue (ike_queue) to deliver packets to userspace. The polling logic relies on datagram_poll, which checks sk_receive_queue, which can lead to false readiness signals when that queue contains non-userspace packets. Switch espintcp_poll to use datagram_poll_queue with ike_queue, ensuring poll only signals readiness when userspace data is actually available. Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20251021100942.195010-3-ralf@mandelbit.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-23net: datagram: introduce datagram_poll_queue for custom receive queuesRalf Lici
Some protocols using TCP encapsulation (e.g., espintcp, openvpn) deliver userspace-bound packets through a custom skb queue rather than the standard sk_receive_queue. Introduce datagram_poll_queue that accepts an explicit receive queue, and convert datagram_poll into a wrapper around datagram_poll_queue. This allows protocols with custom skb queues to reuse the core polling logic without relying on sk_receive_queue. Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: Antonio Quartulli <antonio@openvpn.net> Signed-off-by: Ralf Lici <ralf@mandelbit.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Antonio Quartulli <antonio@openvpn.net> Link: https://patch.msgid.link/20251021100942.195010-2-ralf@mandelbit.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-22net: hsr: prevent creation of HSR device with slaves from another netnsFernando Fernandez Mancera
HSR/PRP driver does not handle correctly having slaves/interlink devices in a different net namespace. Currently, it is possible to create a HSR link in a different net namespace than the slaves/interlink with the following command: ip link add hsr0 netns hsr-ns type hsr slave1 eth1 slave2 eth2 As there is no use-case on supporting this scenario, enforce that HSR device link matches netns defined by IFLA_LINK_NETNSID. The iproute2 command mentioned above will throw the following error: Error: hsr: HSR slaves/interlink must be on the same net namespace than HSR link. Fixes: f421436a591d ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20251020135533.9373-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-22sctp: avoid NULL dereference when chunk data buffer is missingAlexey Simakov
chunk->skb pointer is dereferenced in the if-block where it's supposed to be NULL only. chunk->skb can only be NULL if chunk->head_skb is not. Check for frag_list instead and do it just before replacing chunk->skb. We're sure that otherwise chunk->skb is non-NULL because of outer if() condition. Fixes: 90017accff61 ("sctp: Add GSO support") Signed-off-by: Alexey Simakov <bigalex934@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://patch.msgid.link/20251021130034.6333-1-bigalex934@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-21net: dsa: tag_yt921x: add support for Motorcomm YT921x tagsDavid Yang
Add support for Motorcomm YT921x tags, which includes a proper configurable ethertype field (default to 0x9988). Signed-off-by: David Yang <mmyangfl@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251017060859.326450-3-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-21net: bridge: use common function to compute the featuresHangbin Liu
Previously, bridge ignored all features propagation and DST retention, only handling explicitly the GSO limits. By switching to the new helper netdev_compute_master_upper_features(), the bridge now expose additional features, depending on the lowers capabilities. Since br_set_gso_limits() is already covered by the helper, it can be removed safely. Bridge has it's own way to update needed_headroom. So we don't need to update it in the helper. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20251017034155.61990-5-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-21net: add a common function to compute features for upper devicesHangbin Liu
Some high level software drivers need to compute features from lower devices. But each has their own implementations and may lost some feature compute. Let's use one common function to compute features for kinds of these devices. The new helper uses the current bond implementation as the reference one, as the latter already handles all the relevant aspects: netdev features, TSO limits and dst retention. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20251017034155.61990-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-21net: gro_cells: fix lock imbalance in gro_cells_receive()Eric Dumazet
syzbot found that the local_unlock_nested_bh() call was missing in some cases. WARNING: possible recursive locking detected syzkaller #0 Not tainted -------------------------------------------- syz.2.329/7421 is trying to acquire lock: ffffe8ffffd48888 ((&cell->bh_lock)){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:44 [inline] ffffe8ffffd48888 ((&cell->bh_lock)){+...}-{3:3}, at: gro_cells_receive+0x404/0x790 net/core/gro_cells.c:30 but task is already holding lock: ffffe8ffffd48888 ((&cell->bh_lock)){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:44 [inline] ffffe8ffffd48888 ((&cell->bh_lock)){+...}-{3:3}, at: gro_cells_receive+0x404/0x790 net/core/gro_cells.c:30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock((&cell->bh_lock)); lock((&cell->bh_lock)); *** DEADLOCK *** Given the introduction of @have_bh_lock variable, it seems the author intent was to have the local_unlock_nested_bh() after the @unlock label. Fixes: 25718fdcbdd2 ("net: gro_cells: Use nested-BH locking for gro_cell") Reported-by: syzbot+f9651b9a8212e1c8906f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68f65eb9.a70a0220.205af.0034.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20251020161114.1891141-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-21devlink: region: correct port region lookup to use port_opsAlok Tiwari
The function devlink_port_region_get_by_name() incorrectly uses region->ops->name to compare the region name. as it is not any critical impact as ops and port_ops define as union for devlink_region but as per code logic it should refer port_ops here. No functional impact as ops and port_ops are part of same union, and name is the first member of both. Update it to use region->port_ops->name to properly reference the name of the devlink port region. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20251020170916.1741808-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-21mptcp: pm: in-kernel: C-flag: handle late ADD_ADDRMatthieu Baerts (NGI0)
The special C-flag case expects the ADD_ADDR to be received when switching to 'fully-established'. But for various reasons, the ADD_ADDR could be sent after the "4th ACK", and the special case doesn't work. On NIPA, the new test validating this special case for the C-flag failed a few times, e.g. 102 default limits, server deny join id 0 syn rx [FAIL] got 0 JOIN[s] syn rx expected 2 Server ns stats (...) MPTcpExtAddAddrTx 1 MPTcpExtEchoAdd 1 Client ns stats (...) MPTcpExtAddAddr 1 MPTcpExtEchoAddTx 1 synack rx [FAIL] got 0 JOIN[s] synack rx expected 2 ack rx [FAIL] got 0 JOIN[s] ack rx expected 2 join Rx [FAIL] see above syn tx [FAIL] got 0 JOIN[s] syn tx expected 2 join Tx [FAIL] see above I had a suspicion about what the issue could be: the ADD_ADDR might have been received after the switch to the 'fully-established' state. The issue was not easy to reproduce. The packet capture shown that the ADD_ADDR can indeed be sent with a delay, and the client would not try to establish subflows to it as expected. A simple fix is not to mark the endpoints as 'used' in the C-flag case, when looking at creating subflows to the remote initial IP address and port. In this case, there is no need to try. Note: newly added fullmesh endpoints will still continue to be used as expected, thanks to the conditions behind mptcp_pm_add_addr_c_flag_case. Fixes: 4b1ff850e0c1 ("mptcp: pm: in-kernel: usable client side with C-flag") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251020-net-mptcp-c-flag-late-add-addr-v1-1-8207030cb0e8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>