summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2026-04-27netpoll: fix IPv6 local-address corruptionBreno Leitao
netpoll_setup() decides whether to auto-populate the local source address by testing np->local_ip.ip, which only inspects the first 4 bytes of the union inet_addr storage. For an IPv6 netpoll whose caller-supplied local address has a zero high-32 bits (::1, ::<suffix>, IPv4-mapped ::ffff:a.b.c.d, etc.), this misdetects the address as unset (which they are not, but the first 4 bytes are empty), calls netpoll_take_ipv6() and overwrites it with whatever matching link-local/global address the device happens to expose first. Introduce a helper netpoll_local_ip_unset() that picks the correct family-aware test (ipv6_addr_any() for IPv6, !.ip for IPv4) and use it from netpoll_setup(). Reproducer is something like: echo "::2" > local_ip echo 1 > enabled cat local_ip # before this fix: 2001:db8::1 (caller-supplied ::2 was clobbered) # after this fix: ::2 Fixes: b7394d2429c1 ("netpoll: prepare for ipv6") Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260424-netpoll_fix-v1-1-3a55348c625f@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27tcp: make probe0 timer handle expired user timeoutAltan Hacigumus
tcp_clamp_probe0_to_user_timeout() computes remaining time in jiffies using subtraction with an unsigned lvalue. If elapsed probing time exceeds the configured TCP_USER_TIMEOUT, the underflow yields a large value. This ends up re-arming the probe timer for a full backoff interval instead of expiring immediately, delaying connection teardown beyond the configured timeout. Fix this by preventing underflow so user-set timeout expiration is handled correctly without extending the probe timer. Fixes: 344db93ae3ee ("tcp: make TCP_USER_TIMEOUT accurate for zero window probes") Link: https://lore.kernel.org/r/20260414013634.43997-1-ahacigu.linux@gmail.com Signed-off-by: Altan Hacigumus <ahacigu.linux@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260424014639.54110-1-ahacigu.linux@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27neigh: let neigh_xmit take skb ownershipFlorian Westphal
neigh_xmit always releases the skb, except when no neighbour table is found. But even the first added user of neigh_xmit (mpls) relied on neigh_xmit to release the skb (or queue it for tx). sashiko reported: If neigh_xmit() is called with an uninitialized neighbor table (for example, NEIGH_ND_TABLE when IPv6 is disabled), it returns -EAFNOSUPPORT and bypasses its internal out_kfree_skb error path. Because the return value of neigh_xmit() is ignored here, does this leak the SKB? Assume full ownership and remove the last code path that doesn't xmit or free skb. Fixes: 4fd3d7d9e868 ("neigh: Add helper function neigh_xmit") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260424145843.74055-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27ipmr: Free mr_table after RCU grace period.Kuniyuki Iwashima
With CONFIG_IP_MROUTE_MULTIPLE_TABLES=n, ipmr_fib_lookup() does not check if net->ipv4.mrt is NULL. Since default_device_exit_batch() is called after ->exit_rtnl(), a device could receive IGMP packets and access net->ipv4.mrt during/after ipmr_rules_exit_rtnl(). If ipmr_rules_exit_rtnl() had already cleared it and freed the memory, the access would trigger null-ptr-deref or use-after-free. Let's fix it by using RCU helper and free mrt after RCU grace period. In addition, check_net(net) is added to mroute_clean_tables() and ipmr_cache_unresolved() to synchronise via mfc_unres_lock. This prevents ipmr_cache_unresolved() from putting skb into c->_c.mfc_un.unres.unresolved after mroute_clean_tables() purges it. For the same reason, timer_shutdown_sync() is moved after mroute_clean_tables(). Since rhltable_destroy() holds mutex internally, rcu_work is used, and it is placed as the first member because rcu_head must be placed within <4K offset. mr_table is alraedy 3864 bytes without rcu_work. Note that IP6MR is not yet converted to ->exit_rtnl(), so this change is not needed for now but will be. Fixes: b22b01867406 ("ipmr: Convert ipmr_net_exit_batch() to ->exit_rtnl().") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260423053456.4097409-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27net: phonet: do not BUG_ON() in pn_socket_autobind() on failed bindMorduan Zang
syzbot reported a kernel BUG triggered from pn_socket_sendmsg() via pn_socket_autobind(): kernel BUG at net/phonet/socket.c:213! RIP: 0010:pn_socket_autobind net/phonet/socket.c:213 [inline] RIP: 0010:pn_socket_sendmsg+0x240/0x250 net/phonet/socket.c:421 Call Trace: sock_sendmsg_nosec+0x112/0x150 net/socket.c:797 __sock_sendmsg net/socket.c:812 [inline] __sys_sendto+0x402/0x590 net/socket.c:2280 ... pn_socket_autobind() calls pn_socket_bind() with port 0 and, on -EINVAL, assumes the socket was already bound and asserts that the port is non-zero: err = pn_socket_bind(sock, ..., sizeof(struct sockaddr_pn)); if (err != -EINVAL) return err; BUG_ON(!pn_port(pn_sk(sock->sk)->sobject)); return 0; /* socket was already bound */ However pn_socket_bind() also returns -EINVAL when sk->sk_state is not TCP_CLOSE, even when the socket has never been bound and pn_port() is still 0. In that case the BUG_ON() fires and panics the kernel from a user-triggerable path. Treat the "bind returned -EINVAL but pn_port() is still 0" case as a regular error and propagate -EINVAL to the caller instead of crashing. Existing callers already translate a non-zero return from pn_socket_autobind() into -ENOBUFS/-EAGAIN, so returning -EINVAL here only changes behaviour from panic to a normal errno. Fixes: ba113a94b750 ("Phonet: common socket glue") Reported-by: syzbot+706f5eb79044e686c794@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=706f5eb79044e686c794 Suggested-by: Remi Denis-Courmont <courmisch@gmail.com> Signed-off-by: Morduan Zang <zhangdandan@uniontech.com> Signed-off-by: zhanjun <zhanjun@uniontech.com> Acked-by: Rémi Denis-Courmont <remi@remlab.net> Link: https://patch.msgid.link/87A8960A2045AF3C+20260423010557.138124-1-zhangdandan@uniontech.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27net/sched: taprio: fix NULL pointer dereference in class dumpWeiming Shi
When a TAPRIO child qdisc is deleted via RTM_DELQDISC, taprio_graft() is called with new == NULL and stores NULL into q->qdiscs[cl - 1]. Subsequent RTM_GETTCLASS dump operations walk all classes via taprio_walk() and call taprio_dump_class(), which calls taprio_leaf() returning the NULL pointer, then dereferences it to read child->handle, causing a kernel NULL pointer dereference. The bug is reachable with namespace-scoped CAP_NET_ADMIN on any kernel with CONFIG_NET_SCH_TAPRIO enabled. On systems with unprivileged user namespaces enabled, an unprivileged local user can trigger a kernel panic by creating a taprio qdisc inside a new network namespace, grafting an explicit child qdisc, deleting it, and requesting a class dump. The RTM_GETTCLASS dump itself requires no capability. Oops: general protection fault, probably for non-canonical address 0xdffffc0000000007: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f] RIP: 0010:taprio_dump_class (net/sched/sch_taprio.c:2478) Call Trace: <TASK> tc_fill_tclass (net/sched/sch_api.c:1966) qdisc_class_dump (net/sched/sch_api.c:2326) taprio_walk (net/sched/sch_taprio.c:2514) tc_dump_tclass_qdisc (net/sched/sch_api.c:2352) tc_dump_tclass_root (net/sched/sch_api.c:2370) tc_dump_tclass (net/sched/sch_api.c:2431) rtnl_dumpit (net/core/rtnetlink.c:6864) netlink_dump (net/netlink/af_netlink.c:2325) rtnetlink_rcv_msg (net/core/rtnetlink.c:6959) netlink_rcv_skb (net/netlink/af_netlink.c:2550) </TASK> Fix this by substituting &noop_qdisc when new is NULL in taprio_graft(), a common pattern used by other qdiscs (e.g., multiq_graft()) to ensure the q->qdiscs[] slots are never NULL. This makes control-plane dump paths safe without requiring individual NULL checks. Since the data-plane paths (taprio_enqueue and taprio_dequeue_from_txq) previously had explicit NULL guards that would drop/skip the packet cleanly, update those checks to test for &noop_qdisc instead. Without this, packets would reach taprio_enqueue_one() which increments the root qdisc's qlen and backlog before calling the child's enqueue; noop_qdisc drops the packet but those counters are never rolled back, permanently inflating the root qdisc's statistics. After this change *old can be a valid qdisc, NULL, or &noop_qdisc. Only call qdisc_put(*old) in the first case to avoid decreasing noop_qdisc's refcount, which was never increased. Fixes: 665338b2a7a0 ("net/sched: taprio: dump class stats for the actual q->qdiscs[]") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Tested-by: Weiming Shi <bestswngs@gmail.com> Link: https://patch.msgid.link/20260422161958.2517539-3-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27ipv6: rpl: reserve mac_len headroom when recompressed SRH growsGreg Kroah-Hartman
ipv6_rpl_srh_rcv() decompresses an RFC 6554 Source Routing Header, swaps the next segment into ipv6_hdr->daddr, recompresses, then pulls the old header and pushes the new one plus the IPv6 header back. The recompressed header can be larger than the received one when the swap reduces the common-prefix length the segments share with daddr (CmprI=0, CmprE>0, seg[0][0] != daddr[0] gives the maximum +8 bytes). pskb_expand_head() was gated on segments_left == 0, so on earlier segments the push consumed unchecked headroom. Once skb_push() leaves fewer than skb->mac_len bytes in front of data, skb_mac_header_rebuild()'s call to: skb_set_mac_header(skb, -skb->mac_len); will store (data - head) - mac_len into the u16 mac_header field, which wraps to ~65530, and the following memmove() writes mac_len bytes ~64KiB past skb->head. A single AF_INET6/SOCK_RAW/IPV6_HDRINCL packet over lo with a two segment type-3 SRH (CmprI=0, CmprE=15) reaches headroom 8 after one pass; KASAN reports a 14-byte OOB write in ipv6_rthdr_rcv. Fix this by expanding the head whenever the remaining room is less than the push size plus mac_len, and request that much extra so the rebuilt MAC header fits afterwards. Fixes: 8610c7c6e3bd ("net: ipv6: add support for rpl sr exthdr") Cc: stable <stable@kernel.org> Reported-by: Anthropic Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2026042133-gout-unvented-1bd9@gregkh Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats()Eric Dumazet
fq_codel_dump_stats() acquires the qdisc spinlock a bit too late. Move this acquisition before we fill tc_fq_pie_xstats with live data. Alternative would be to add READ_ONCE() and WRITE_ONCE() annotations, but the spinlock is needed anyway to scan q->new_flows and q->old_flows. Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260423063527.2568262-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27net/sched: sch_choke: annotate data-races in choke_dump_stats()Eric Dumazet
choke_dump_stats() only runs with RTNL held. It reads fields that can be changed in qdisc fast path. Add READ_ONCE()/WRITE_ONCE() annotations. Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260423062839.2524324-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27net/sched: netem: check for negative latency and jitterStephen Hemminger
Reject requests with negative latency or jitter. A negative value added to current timestamp (u64) wraps to an enormous time_to_send, disabling dequeue. The original UAPI used u32 for these values; the conversion to 64-bit time values via TCA_NETEM_LATENCY64 and TCA_NETEM_JITTER64 allowed signed values to reach the kernel without validation. Jitter is already silently clamped by an abs() in netem_change(); that abs() can be removed in a follow-up once this rejection is in place. Fixes: 99803171ef04 ("netem: add uapi to express delay and jitter in nanoseconds") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260418032027.900913-7-stephen@networkplumber.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27net/sched: netem: fix slot delay calculation overflowStephen Hemminger
get_slot_next() computes a random delay between min_delay and max_delay using: get_random_u32() * (max_delay - min_delay) >> 32 This overflows signed 64-bit arithmetic when the delay range exceeds approximately 2.1 seconds (2^31 nanoseconds), producing a negative result that effectively disables slot-based pacing. This is a realistic configuration for WAN emulation (e.g., slot 1s 5s). Use mul_u64_u32_shr() which handles the widening multiply without overflow. Fixes: 0a9fe5c375b5 ("netem: slotting with non-uniform distribution") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260418032027.900913-6-stephen@networkplumber.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27net/sched: netem: validate slot configurationStephen Hemminger
Reject slot configurations that have no defensible meaning: - negative min_delay or max_delay - min_delay greater than max_delay - negative dist_delay or dist_jitter - negative max_packets or max_bytes Negative or out-of-order delays underflow in get_slot_next(), producing garbage intervals. Negative limits trip the per-slot accounting (packets_left/bytes_left <= 0) on the first packet of every slot, defeating the rate-limiting half of the slot feature. Note that dist_jitter has been silently coerced to its absolute value by get_slot() since the feature was introduced; rejecting negatives here converts that silent coercion into -EINVAL. The abs() can be removed in a follow-up. Fixes: 836af83b54e3 ("netem: support delivering packets in delayed time slots") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260418032027.900913-5-stephen@networkplumber.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27net/sched: netem: only reseed PRNG when seed is explicitly providedStephen Hemminger
netem_change() unconditionally reseeds the PRNG on every tc change command. If TCA_NETEM_PRNG_SEED is not specified, a new random seed is generated, destroying reproducibility for users who set a deterministic seed on a previous change. Move the initial random seed generation to netem_init() and only reseed in netem_change() when TCA_NETEM_PRNG_SEED is explicitly provided by the user. Fixes: 4072d97ddc44 ("netem: add prng attribute to netem_sched_data") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260418032027.900913-4-stephen@networkplumber.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27net/sched: netem: fix queue limit check to include reordered packetsStephen Hemminger
The queue limit check in netem_enqueue() uses q->t_len which only counts packets in the internal tfifo. Packets placed in sch->q by the reorder path (__qdisc_enqueue_head) are not counted, allowing the total queue occupancy to exceed sch->limit under reordering. Include sch->q.qlen in the limit check. Fixes: f8d4bc455047 ("net/sched: netem: account for backlog updates from child qdisc") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260418032027.900913-3-stephen@networkplumber.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27net/sched: netem: fix probability gaps in 4-state loss modelStephen Hemminger
The 4-state Markov chain in loss_4state() has gaps at the boundaries between transition probability ranges. The comparisons use: if (rnd < a4) else if (a4 < rnd && rnd < a1 + a4) When rnd equals a boundary value exactly, neither branch matches and no state transition occurs. The redundant lower-bound check (a4 < rnd) is already implied by being in the else branch. Remove the unnecessary lower-bound comparisons so the ranges are contiguous and every random value produces a transition, matching the GI (General and Intuitive) loss model specification. This bug goes back to original implementation of this model. Fixes: 661b79725fea ("netem: revised correlated loss generator") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260418032027.900913-2-stephen@networkplumber.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27wifi: mac80211: drop stray 'static' from fast-RX rx_resultCatherine
ieee80211_invoke_fast_rx() is documented as safe for parallel RX, but its per-invocation rx_result is declared static. Concurrent callers then share one instance and can overwrite each other's result between ieee80211_rx_mesh_data() and the switch on res. That can make a packet that was queued or consumed by ieee80211_rx_mesh_data() fall through into ieee80211_rx_8023(), or make a packet that should continue return as queued. Make res an automatic variable so each invocation keeps its own result. Fixes: 3468e1e0c639 ("wifi: mac80211: add mesh fast-rx support") Cc: stable@vger.kernel.org Signed-off-by: Catherine <enderaoelyther@gmail.com> Link: https://patch.msgid.link/20260424131435.83212-2-enderaoelyther@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-04-27wifi: mac80211: check ieee80211_rx_data_set_link return in pubsta MLO pathMichael Bommarito
__ieee80211_rx_handle_packet() resolves the link via ieee80211_rx_data_set_link() on the pubsta->mlo path but ignores the helper's return value. Inside the helper, rx->link = rcu_dereference(rx->sdata->link[link_id]); can leave rx->link NULL if link_id references a slot already cleared by ieee80211_vif_set_links() during station-initiated ML reconfiguration (see mlme.c's ieee80211_ml_reconfiguration(), which invalidates sdata->link[] before the matching ieee80211_sta_remove_link() loop walks the link-sta hash). RX dispatch still resolves a link_sta from the hash and then drops into ieee80211_prepare_and_rx_handle(), which dereferences link->conf->addr. Every other user site of ieee80211_rx_data_set_link() checks the return and bails on failure; only this branch did not. Mirror the safe pattern. Fixes: e66b7920aa5a ("wifi: mac80211: fix initialization of rx->link and rx->link_sta") Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/20260422000651.4184602-1-michael.bommarito@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-04-27wifi: nl80211: require admin perm on SET_PMK / DEL_PMKMichael Bommarito
NL80211_CMD_SET_PMK and NL80211_CMD_DEL_PMK manage the offloaded 4-way-handshake PMK state used by drivers advertising NL80211_EXT_FEATURE_4WAY_HANDSHAKE_STA_1X. The only in-tree driver that wires up both ->set_pmk / ->del_pmk and advertises the feature today is brcmfmac, so the practical reach of this patch is narrow. Both ops were introduced without a .flags gate, so the generic netlink layer dispatches them to an unprivileged caller instead of rejecting with -EPERM at the permission check. Every other connection-state op in the adjacent block (CONNECT, ASSOCIATE, AUTHENTICATE, SET_KEY, ...) carries GENL_UNS_ADMIN_PERM; SET_PMK / DEL_PMK were introduced without the flag in 2017 and left unchanged by later refactors. Johannes checked the original Intel submission history and confirmed there is no admin check in any prior revision either, so this seems likely to be a simple oversight rather than an intentional carve-out. Require GENL_UNS_ADMIN_PERM so the genl layer performs the same capable(CAP_NET_ADMIN) check as its siblings. wpa_supplicant already needs CAP_NET_ADMIN for every other nl80211 op it issues, so supplicant operation is unaffected. The worst case the missing gate enables today is an unprivileged local process on a multi-user system invalidating the offloaded PMK state of another user's 4-way-handshake session, forcing a full EAP re-auth on the next reconnect. Verified in UML: an unprivileged probe (uid=1000) sees SET_MULTICAST_TO_UNICAST (sibling op with GENL_UNS_ADMIN_PERM) return -EPERM on both pre- and post-fix kernels, while SET_PMK / DEL_PMK return -ENODEV from nl80211_pre_doit()'s wdev lookup pre- fix (proving dispatch crossed the genl permission check) and -EPERM post-fix (rejected at the genl layer as intended). Suggested-by: Johannes Berg <johannes@sipsolutions.net> Fixes: 3a00df5707b6 ("cfg80211: support 4-way handshake offloading for 802.1X") Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Acked-by: Arend van Spriel <arend.vanspriel@broadcom> Link: https://patch.msgid.link/20260421224552.4044147-1-michael.bommarito@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-04-27wifi: mac80211: skip ieee80211_verify_sta_ht_mcs_support check in non-strict ↵Rio Liu
mode Some Xfinity XB8 firmware advertises >1 spatial stream MCS indexes in their basic HT-MCS set. On cards with lower spatial streams, the check would fail, and we'd be stuck with no HT when in fact work fine with its own supported rate. This change makes it so the check is only performed in strict mode. Fixes: 574faa0e936d ("wifi: mac80211: add HT and VHT basic set verification") Signed-off-by: Rio Liu <rio@r26.me> Link: https://patch.msgid.link/99Mv9QEceyPrQhSP52MtAVmz0_kWJmzqotJjD9YW6LGLqk-AZloAueUyHCURilFkuqOh6Ecv8i2KKdSE1ujP3AnbU5QEouVisT1w_V3xdfc=@r26.me Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-04-24Merge tag 'nfs-for-7.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Bugfixes: - Fix handling of ENOSPC so that if we have to resend writes, they are written synchronously - SUNRPC RDMA transport fixes from Chuck - Several fixes for delegated timestamps in NFSv4.2 - Failure to obtain a directory delegation should not cause stat() to fail with NFSv4 - Rename was failing to update timestamps when a directory delegation is held on NFSv4 - Ensure we check rsize/wsize after crossing a NFSv4 filesystem boundary - NFSv4/pnfs: - If the server is down, retry the layout returns on reboot - Fallback to MDS could result in a short write being incorrectly logged Cleanups: - Use memcpy_and_pad in decode_fh" * tag 'nfs-for-7.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (21 commits) NFS: Fix RCU dereference of cl_xprt in nfs_compare_super_address NFS: remove redundant __private attribute from nfs_page_class NFSv4.2: fix CLONE/COPY attrs in presence of delegated attributes NFS: fix writeback in presence of errors nfs: use memcpy_and_pad in decode_fh NFSv4.1: Apply session size limits on clone path NFSv4: retry GETATTR if GET_DIR_DELEGATION failed NFS: fix RENAME attr in presence of directory delegations pnfs/flexfiles: validate ds_versions_cnt is non-zero NFS/blocklayout: print each device used for SCSI layouts xprtrdma: Post receive buffers after RPC completion xprtrdma: Scale receive batch size with credit window xprtrdma: Replace rpcrdma_mr_seg with xdr_buf cursor xprtrdma: Decouple frwr_wp_create from frwr_map xprtrdma: Close lost-wakeup race in xprt_rdma_alloc_slot xprtrdma: Avoid 250 ms delay on backlog wakeup xprtrdma: Close sendctx get/put race that can block a transport nfs: update inode ctime after removexattr operation nfs: fix utimensat() for atime with delegated timestamps NFS: improve "Server wrote zero bytes" error ...
2026-04-24Merge tag 'ceph-for-7.1-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "We have a series from Alex which extends CephFS client metrics with support for per-subvolume data I/O performance and latency tracking (metadata operations aren't included) and a good variety of fixes and cleanups across RBD and CephFS" * tag 'ceph-for-7.1-rc1' of https://github.com/ceph/ceph-client: ceph: add subvolume metrics collection and reporting ceph: parse subvolume_id from InodeStat v9 and store in inode ceph: handle InodeStat v8 versioned field in reply parsing libceph: Fix slab-out-of-bounds access in auth message processing rbd: fix null-ptr-deref when device_add_disk() fails crush: cleanup in crush_do_rule() method ceph: clear s_cap_reconnect when ceph_pagelist_encode_32() fails ceph: only d_add() negative dentries when they are unhashed libceph: update outdated comment in ceph_sock_write_space() libceph: Remove obsolete session key alignment logic ceph: fix num_ops off-by-one when crypto allocation fails libceph: Prevent potential null-ptr-deref in ceph_handle_auth_reply()
2026-04-24Merge tag '9p-for-7.1-rc1' of https://github.com/martinetd/linuxLinus Torvalds
Pull 9p updates from Dominique Martinet: - 9p access flag fix (cannot change access flag since new mount API implem) - some minor cleanup * tag '9p-for-7.1-rc1' of https://github.com/martinetd/linux: 9p/trans_xen: replace simple_strto* with kstrtouint 9p/trans_xen: make cleanup idempotent after dataring alloc errors 9p: document missing enum values in kernel-doc comments 9p: fix access mode flags being ORed instead of replaced 9p: fix memory leak in v9fs_init_fs_context error path
2026-04-24bpf: Fix sk_local_storage diag dumping uninitialized special fieldsAmery Hung
Call check_and_init_map_value() after the copy_map_value() to zero out special field regions. diag_get() copies sk_local_storage map values into a netlink message using copy_map_value{_locked}(), which intentionally skip special fields. However, the destination buffer from nla_reserve_64bit() is not zeroed and the skipped regions contain uninitialized skb data can be sent to userspace. Fixes: 1ed4d92458a9 ("bpf: INET_DIAG support in bpf_sk_storage") Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260423222356.155387-1-ameryhung@gmail.com
2026-04-24netfilter: nf_conntrack_sip: don't use simple_strtoulFlorian Westphal
Replace unsafe port parsing in epaddr_len(), ct_sip_parse_header_uri(), and ct_sip_parse_request() with a new sip_parse_port() helper that validates each digit against the buffer limit, eliminating the use of simple_strtoul() which assumes NUL-terminated strings. The previous code dereferenced pointers without bounds checks after sip_parse_addr() and relied on simple_strtoul() on non-NUL-terminated skb data. A port that reaches the buffer limit without a trailing character is also rejected as malformed. Also get rid of all simple_strtoul() usage in conntrack, prefer a stricter version instead. There are intentional changes: - Bail out if number is > UINT_MAX and indicate a failure, same for too long sequences. While we do accept 05535 as port 5535, we will not accept e.g. 'sip:10.0.0.1:005060'. While its syntactically valid under RFC 3261, we should restrict this to not waste cycles when presented with malformed packets with 64k '0' characters. - Force base 10 in ct_sip_parse_numerical_param(). This is used to fetch 'expire=' and 'rports='; both are expected to use base-10. - In nf_nat_sip.c, only accept the parsed value if its within the 1k-64k range. - epaddr_len now returns 0 if the port is invalid, as it already does for invalid ip addresses. This is intentional. nf_conntrack_sip performs lots of guesswork to find the right parts of the message to parse. Being stricter could break existing setups. Connection tracking helpers are designed to allow traffic to pass, not to block it. Based on an earlier patch from Jenny Guanni Qu <qguanni@gmail.com>. Fixes: 05e3ced297fe ("[NETFILTER]: nf_conntrack_sip: introduce SIP-URI parsing helper") Reported-by: Klaudia Kloc <klaudia@vidocsecurity.com> Reported-by: Dawid Moczadło <dawid@vidocsecurity.com> Reported-by: Jenny Guanni Qu <qguanni@gmail.com>. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-24netfilter: reject zero shift in nft_bitwiseKai Ma
Reject zero shift operands for nft_bitwise left and right shift expressions during initialization. The carry propagation logic computes the carry from the adjacent 32-bit word using BITS_PER_TYPE(u32) - shift. A zero shift operand turns this into a 32-bit shift, which is undefined behaviour. Reject zero shift operands in the control plane, alongside the existing check for values greater than or equal to 32, so malformed rules never reach the packet path. Fixes: 567d746b55bc ("netfilter: bitwise: add support for shifts.") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Kai Ma <k4729.23098@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-24netfilter: xt_policy: fix strict mode inbound policy matchingJiexun Wang
match_policy_in() walks sec_path entries from the last transform to the first one, but strict policy matching needs to consume info->pol[] in the same forward order as the rule layout. Derive the strict-match policy position from the number of transforms already consumed so that multi-element inbound rules are matched consistently. Fixes: c4b885139203 ("[NETFILTER]: x_tables: replace IPv4/IPv6 policy match by address family independant version") Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-24Merge tag 'net-deletions' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking deletions from Jakub Kicinski: "Delete some obsolete networking code Old code like amateur radio and NFC have long been a burden to core networking developers. syzbot loves to find bugs in BKL-era code, and noobs try to fix them. If we want to have a fighting chance of surviving the LLM-pocalypse this code needs to find a dedicated owner or get deleted. We've talked about these deletions multiple times in the past and every time someone wanted the code to stay. It is never very clear to me how many of those people actually use the code vs are just nostalgic to see it go. Amateur radio did have occasional users (or so I think) but most users switched to user space implementations since its all super slow stuff. Nobody stepped up to maintain the kernel code. We were lucky enough to find someone who wants to help with NFC so we're giving that a chance. Let's try to put the rest of this code behind us" * tag 'net-deletions' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: drivers: net: 8390: wd80x3: Remove this driver drivers: net: 8390: ultra: Remove this driver drivers: net: 8390: AX88190: Remove this driver drivers: net: fujitsu: fmvj18x: Remove this driver drivers: net: smsc: smc91c92: Remove this driver drivers: net: smsc: smc9194: Remove this driver drivers: net: amd: nmclan: Remove this driver drivers: net: amd: lance: Remove this driver drivers: net: 3com: 3c589: Remove this driver drivers: net: 3com: 3c574: Remove this driver drivers: net: 3com: 3c515: Remove this driver drivers: net: 3com: 3c509: Remove this driver net: packetengines: remove obsolete yellowfin driver and vendor dir net: packetengines: remove obsolete hamachi driver net: remove unused ATM protocols and legacy ATM device drivers net: remove ax25 and amateur radio (hamradio) subsystem net: remove ISDN subsystem and Bluetooth CMTP caif: remove CAIF NETWORK LAYER
2026-04-23bpf: Fix NULL pointer dereference in bpf_skb_fib_lookup()Weiming Shi
When tot_len is not provided by the user, bpf_skb_fib_lookup() resolves the FIB result's output device via dev_get_by_index_rcu() to check skb forwardability and fill in mtu_result. The returned pointer is dereferenced without a NULL check. If the device is concurrently unregistered, dev_get_by_index_rcu() returns NULL and is_skb_forwardable() crashes at dev->flags: KASAN: null-ptr-deref in range [0x00000000000000b0-0x00000000000000b7] Call Trace: is_skb_forwardable (include/linux/netdevice.h:4365) bpf_skb_fib_lookup (net/core/filter.c:6446) bpf_prog_test_run_skb (net/bpf/test_run.c) __sys_bpf (kernel/bpf/syscall.c) Add the missing NULL check, returning -ENODEV to be consistent with how bpf_ipv4_fib_lookup() and bpf_ipv6_fib_lookup() handle the same condition. Fixes: 4f74fede40df ("bpf: Add mtu checking to FIB forwarding helper") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://patch.msgid.link/20260423183831.1325480-2-bestswngs@gmail.com
2026-04-23sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}().Kuniyuki Iwashima
syzbot reported a splat in sock_map_destroy() [0], where psock was NULL even though sk->sk_prot still pointed to tcp_bpf_prots[][]. The stack trace shows how badly the path was excercised, see inet_release() calls tcp_close(), not sock_map_close() yet, but finally reaching sock_map_destroy(). The root cause is a lack of synchronisation. Even if sk_psock_get() fails to bump psock->refcnt, it does not guarantee that sk_psock_drop() has finished, and thus sk->sk_prot might not have been restored to the original one. Commit 4b4647add7d3 ("sock_map: avoid race between sock_map_close and sk_psock_put") attempted to address this, but it was insufficient for two reasons. It did not cover sock_map_unhash() and sock_map_destroy(), and it missed the corner case where sk_psock() is NULL. On non-x86 platforms, sk_psock_restore_proto(sk, psock) and rcu_assign_sk_user_data(sk, NULL) can be reordered because there is no address dependency between sk->sk_prot and sk->sk_user_data. sk_psock_get() returning NULL implies nothing about sk->sk_prot. Let's simply retry sk_psock_get() in the unlikely case. Note that we cannot avoid loop even if we added memory barrier in sk_psock_drop() and sock_map_psock_get_checked(). Also note that sock_map_destroy() cannot be called from softirq while sock_map_close() has also been running. It is because sock_map_destroy() requires SOCK_DEAD, so sock_map_destroy() cannot happen until sock_map_close() has finished the saved_close() (which is tcp_close()). [0]: WARNING: CPU: 1 PID: 8459 at net/core/sock_map.c:1667 sock_map_destroy+0x28b/0x2b0 net/core/sock_map.c:1667 Modules linked in: CPU: 1 UID: 0 PID: 8459 Comm: syz.0.1109 Not tainted syzkaller #0 PREEMPT_{RT,(full)} Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 RIP: 0010:sock_map_destroy+0x28b/0x2b0 net/core/sock_map.c:1667 Code: 8b 36 49 83 c6 38 4c 89 f0 48 c1 e8 03 42 80 3c 38 00 74 08 4c 89 f7 e8 93 62 22 f9 4d 8b 3e e9 79 ff ff ff e8 a6 2b c3 f8 90 <0f> 0b 90 eb 9c e8 9b 2b c3 f8 4c 89 e7 be 03 00 00 00 e8 0e 4e bc RSP: 0018:ffffc9000d067be8 EFLAGS: 00010293 RAX: ffffffff88fb30aa RBX: ffff888024832000 RCX: ffff888024283b80 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: dffffc0000000000 R11: ffffed100862e946 R12: dffffc0000000000 R13: ffff888024832000 R14: ffffffff995b2208 R15: ffffffff88fb2e20 FS: 0000555579a7d500(0000) GS:ffff8881269c2000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00002000000048c0 CR3: 000000003713a000 CR4: 00000000003526f0 Call Trace: <TASK> inet_csk_destroy_sock+0x166/0x3a0 net/ipv4/inet_connection_sock.c:1294 __tcp_close+0xcc1/0xfd0 net/ipv4/tcp.c:3262 tcp_close+0x28/0x110 net/ipv4/tcp.c:3274 inet_release+0x144/0x190 net/ipv4/af_inet.c:435 __sock_release net/socket.c:649 [inline] sock_close+0xc0/0x240 net/socket.c:1439 __fput+0x45b/0xa80 fs/file_table.c:468 task_work_run+0x1d4/0x260 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xec/0x110 kernel/entry/common.c:43 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline] do_syscall_64+0x2bd/0x3b0 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f265847ebe9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd158dfbd8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 RAX: 0000000000000000 RBX: 000000000002ddb0 RCX: 00007f265847ebe9 RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003 RBP: 00007f26586a7da0 R08: 0000000000000001 R09: 0000000e158dfecf R10: 0000001b30a20000 R11: 0000000000000246 R12: 00007f26586a5fac R13: 00007f26586a5fa0 R14: ffffffffffffffff R15: 00007ffd158dfcf0 </TASK> Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks") Fixes: b05545e15e1f ("bpf: sockmap, fix transition through disconnect without close") Fixes: d8616ee2affc ("bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues") Reported-by: syzbot+b0842d38af58376d1fdc@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/69cec5ef.050a0220.2dbe29.0009.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://patch.msgid.link/20260420194846.1089595-1-kuniyu@google.com
2026-04-23bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag pathsWeiming Shi
bpf_selem_unlink_nofail() sets SDATA(selem)->smap to NULL before removing the selem from the storage hlist. A concurrent RCU reader in bpf_sk_storage_clone() can observe the selem still on the list with smap already NULL, causing a NULL pointer dereference. general protection fault, probably for non-canonical address 0xdffffc000000000a: KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057] RIP: 0010:bpf_sk_storage_clone+0x1cd/0xaa0 net/core/bpf_sk_storage.c:174 Call Trace: <IRQ> sk_clone+0xfed/0x1980 net/core/sock.c:2591 inet_csk_clone_lock+0x30/0x760 net/ipv4/inet_connection_sock.c:1222 tcp_create_openreq_child+0x35/0x2680 net/ipv4/tcp_minisocks.c:571 tcp_v4_syn_recv_sock+0x123/0xf90 net/ipv4/tcp_ipv4.c:1729 tcp_check_req+0x8e1/0x2580 include/net/tcp.h:855 tcp_v4_rcv+0x1845/0x3b80 net/ipv4/tcp_ipv4.c:2347 Add a NULL check for smap in bpf_sk_storage_clone(). bpf_sk_storage_diag_put_all() has the same issue. Add a NULL check and pass the validated smap directly to diag_get(), which is refactored to take smap as a parameter instead of reading it internally. bpf_sk_storage_diag_put() uses diag->maps[i] which is always valid under its refcount, so diag->maps[i] is passed directly to diag_get(). Fixes: 5d800f87d0a5 ("bpf: Support lockless unlink when freeing map or local storage") Reported-by: Xiang Mei <xmei5@asu.edu> Acked-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260422065411.1007737-2-bestswngs@gmail.com
2026-04-23Merge tag 'net-7.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Netfilter. Steady stream of fixes. Last two weeks feel comparable to the two weeks before the merge window. Lots of AI-aided bug discovery. A newer big source is Sashiko/Gemini (Roman Gushchin's system), which points out issues in existing code during patch review (maybe 25% of fixes here likely originating from Sashiko). Nice thing is these are often fixed by the respective maintainers, not drive-bys. Current release - new code bugs: - kconfig: MDIO_PIC64HPSC should depend on ARCH_MICROCHIP Previous releases - regressions: - add async ndo_set_rx_mode and switch drivers which we promised to be called under the per-netdev mutex to it - dsa: remove duplicate netdev_lock_ops() for conduit ethtool ops - hv_sock: report EOF instead of -EIO for FIN - vsock/virtio: fix MSG_PEEK calculation on bytes to copy Previous releases - always broken: - ipv6: fix possible UAF in icmpv6_rcv() - icmp: validate reply type before using icmp_pointers - af_unix: drop all SCM attributes for SOCKMAP - netfilter: fix a number of bugs in the osf (OS fingerprinting) - eth: intel: fix timestamp interrupt configuration for E825C Misc: - bunch of data-race annotations" * tag 'net-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (148 commits) rxrpc: Fix error handling in rxgk_extract_token() rxrpc: Fix re-decryption of RESPONSE packets rxrpc: Fix rxrpc_input_call_event() to only unshare DATA packets rxrpc: Fix missing validation of ticket length in non-XDR key preparsing rxgk: Fix potential integer overflow in length check rxrpc: Fix conn-level packet handling to unshare RESPONSE packets rxrpc: Fix potential UAF after skb_unshare() failure rxrpc: Fix rxkad crypto unalignment handling rxrpc: Fix memory leaks in rxkad_verify_response() net: rds: fix MR cleanup on copy error m68k: mvme147: Make me the maintainer net: txgbe: fix firmware version check selftests/bpf: check epoll readiness during reuseport migration tcp: call sk_data_ready() after listener migration vhost_net: fix sleeping with preempt-disabled in vhost_net_busy_poll() ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim tipc: fix double-free in tipc_buf_append() llc: Return -EINPROGRESS from llc_ui_connect() ipv4: icmp: validate reply type before using icmp_pointers selftests/net: packetdrill: cover RFC 5961 5.2 challenge ACK on both edges ...
2026-04-23rxrpc: Fix error handling in rxgk_extract_token()David Howells
Fix a missing bit of error handling in rxgk_extract_token(): in the event that rxgk_decrypt_skb() returns -ENOMEM, it should just return that rather than continuing on (for anything else, it generates an abort). Fixes: 64863f4ca494 ("rxrpc: Fix unhandled errors in rxgk_verify_packet_integrity()") Closes: https://sashiko.dev/#/patchset/20260422161438.2593376-4-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260423200909.3049438-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23rxrpc: Fix re-decryption of RESPONSE packetsDavid Howells
If a RESPONSE packet gets a temporary failure during processing, it may end up in a partially decrypted state - and then get requeued for a retry. Fix this by just discarding the packet; we will send another CHALLENGE packet and thereby elicit a further response. Similarly, discard an incoming CHALLENGE packet if we get an error whilst generating a RESPONSE; the server will send another CHALLENGE. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Closes: https://sashiko.dev/#/patchset/20260422161438.2593376-4-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260423200909.3049438-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23rxrpc: Fix rxrpc_input_call_event() to only unshare DATA packetsDavid Howells
Fix rxrpc_input_call_event() to only unshare DATA packets and not ACK, ABORT, etc.. And with that, rxrpc_input_packet() doesn't need to take a pointer to the pointer to the packet, so change that to just a pointer. Fixes: 1f2740150f90 ("rxrpc: Fix potential UAF after skb_unshare() failure") Closes: https://sashiko.dev/#/patchset/20260422161438.2593376-4-dhowells@redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260423200909.3049438-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23rxrpc: Fix missing validation of ticket length in non-XDR key preparsingAnderson Nascimento
In rxrpc_preparse(), there are two paths for parsing key payloads: the XDR path (for large payloads) and the non-XDR path (for payloads <= 28 bytes). While the XDR path (rxrpc_preparse_xdr_rxkad()) correctly validates the ticket length against AFSTOKEN_RK_TIX_MAX, the non-XDR path fails to do so. This allows an unprivileged user to provide a very large ticket length. When this key is later read via rxrpc_read(), the total token size (toksize) calculation results in a value that exceeds AFSTOKEN_LENGTH_MAX, triggering a WARN_ON(). [ 2001.302904] WARNING: CPU: 2 PID: 2108 at net/rxrpc/key.c:778 rxrpc_read+0x109/0x5c0 [rxrpc] Fix this by adding a check in the non-XDR parsing path of rxrpc_preparse() to ensure the ticket length does not exceed AFSTOKEN_RK_TIX_MAX, bringing it into parity with the XDR parsing logic. Fixes: 8a7a3eb4ddbe ("KEYS: RxRPC: Use key preparsing") Fixes: 84924aac08a4 ("rxrpc: Fix checker warning") Reported-by: Anderson Nascimento <anderson@allelesecurity.com> Signed-off-by: Anderson Nascimento <anderson@allelesecurity.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260422161438.2593376-7-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23rxgk: Fix potential integer overflow in length checkDavid Howells
Fix potential integer overflow in rxgk_extract_token() when checking the length of the ticket. Rather than rounding up the value to be tested (which might overflow), round down the size of the available data. Fixes: 2429a1976481 ("rxrpc: Fix untrusted unsigned subtract") Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260422161438.2593376-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23rxrpc: Fix conn-level packet handling to unshare RESPONSE packetsDavid Howells
The security operations that verify the RESPONSE packets decrypt bits of it in place - however, the sk_buff may be shared with a packet sniffer, which would lead to the sniffer seeing an apparently corrupt packet (actually decrypted). Fix this by handing a copy of the packet off to the specific security handler if the packet was cloned. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260422161438.2593376-5-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23rxrpc: Fix potential UAF after skb_unshare() failureDavid Howells
If skb_unshare() fails to unshare a packet due to allocation failure in rxrpc_input_packet(), the skb pointer in the parent (rxrpc_io_thread()) will be NULL'd out. This will likely cause the call to trace_rxrpc_rx_done() to oops. Fix this by moving the unsharing down to where rxrpc_input_call_event() calls rxrpc_input_call_packet(). There are a number of places prior to that where we ignore DATA packets for a variety of reasons (such as the call already being complete) for which an unshare is then avoided. And with that, rxrpc_input_packet() doesn't need to take a pointer to the pointer to the packet, so change that to just a pointer. Fixes: 2d1faf7a0ca3 ("rxrpc: Simplify skbuff accounting in receive path") Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260422161438.2593376-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23rxrpc: Fix rxkad crypto unalignment handlingDavid Howells
Fix handling of a packet with a misaligned crypto length. Also handle non-ENOMEM errors from decryption by aborting. Further, remove the WARN_ON_ONCE() so that it can't be remotely triggered (a trace line can still be emitted). Fixes: f93af41b9f5f ("rxrpc: Fix missing error checks for rxkad encryption/decryption failure") Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260422161438.2593376-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23rxrpc: Fix memory leaks in rxkad_verify_response()David Howells
Fix rxkad_verify_response() to free the ticket and the server key under all circumstances by initialising the ticket pointer to NULL and then making all paths through the function after the first allocation has been done go through a single common epilogue that just releases everything - where all the releases skip on a NULL pointer. Fixes: 57af281e5389 ("rxrpc: Tidy up abort generation infrastructure") Fixes: ec832bd06d6f ("rxrpc: Don't retain the server key in the connection") Closes: https://sashiko.dev/#/patchset/20260408121252.2249051-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeffrey Altman <jaltman@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org cc: stable@kernel.org Link: https://patch.msgid.link/20260422161438.2593376-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23net: remove unused ATM protocols and legacy ATM device driversJakub Kicinski
Remove the ATM protocol modules and PCI/SBUS ATM device drivers that are no longer in active use. The ATM core protocol stack, PPPoATM, BR2684, and USB DSL modem drivers (drivers/usb/atm/) are retained in-tree to maintain PPP over ATM (PPPoA) and PPPoE-over-BR2684 support for DSL connections. The Solos ADSL2+ PCI driver is also retained. Removed ATM protocol modules: - net/atm/clip.c - Classical IP over ATM (RFC 2225) - net/atm/lec.c - LAN Emulation Client (LANE) - net/atm/mpc.c, mpoa_caches.c, mpoa_proc.c - Multi-Protocol Over ATM Removed PCI/SBUS ATM device drivers (drivers/atm/): - adummy, atmtcp - software/testing ATM devices - eni - Efficient Networks ENI155P (OC-3, ~1995) - fore200e - FORE Systems 200E PCI/SBUS (OC-3, ~1999) - he - ForeRunner HE (OC-3/OC-12, ~2000) - idt77105 - IDT 77105 25 Mbps ATM PHY - idt77252 - IDT 77252 NICStAR II (OC-3, ~2000) - iphase - Interphase ATM PCI (OC-3/DS3/E3) - lanai - Efficient Networks Speedstream 3010 - nicstar - IDT 77201 NICStAR (155/25 Mbps, ~1999) - suni - PMC S/UNI SONET PHY library Also clean up references in: - net/bridge/ - remove ATM LANE hook (br_fdb_test_addr_hook, br_fdb_test_addr) - net/core/dev.c - remove br_fdb_test_addr_hook export - defconfig files - remove ATM driver config options The removed code is moved to an out-of-tree module package (mod-orphan). Acked-by: Andy Shevchenko <andriy.shevchenko@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260422041846.2035118-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23net: rds: fix MR cleanup on copy errorAo Zhou
__rds_rdma_map() hands sg/pages ownership to the transport after get_mr() succeeds. If copying the generated cookie back to user space fails after that point, the error path must not free those resources again before dropping the MR reference. Remove the duplicate unpin/free from the put_user() failure branch so that MR teardown is handled only through the existing final cleanup path. Fixes: 0d4597c8c5ab ("net/rds: Track user mapped pages through special API") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Ao Zhou <draw51280@163.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/79c8ef73ec8e5844d71038983940cc2943099baf.1776764247.git.draw51280@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23tcp: call sk_data_ready() after listener migrationZhenzhong Wu
When inet_csk_listen_stop() migrates an established child socket from a closing listener to another socket in the same SO_REUSEPORT group, the target listener gets a new accept-queue entry via inet_csk_reqsk_queue_add(), but that path never notifies the target listener's waiters. A nonblocking accept() still works because it checks the queue directly, but poll()/epoll_wait() waiters and blocking accept() callers can also remain asleep indefinitely. Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration in inet_csk_listen_stop(). However, after inet_csk_reqsk_queue_add() succeeds, the ref acquired in reuseport_migrate_sock() is effectively transferred to nreq->rsk_listener. Another CPU can then dequeue nreq via accept() or listener shutdown, hit reqsk_put(), and drop that listener ref. Since listeners are SOCK_RCU_FREE, wrap the post-queue_add() dereferences of nsk in rcu_read_lock()/rcu_read_unlock(), which also covers the existing sock_net(nsk) access in that path. The reqsk_timer_handler() path does not need the same changes for two reasons: half-open requests become readable only after the final ACK, where tcp_child_process() already wakes the listener; and once nreq is visible via inet_ehash_insert(), the success path no longer touches nsk directly. Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.") Cc: stable@vger.kernel.org Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Zhenzhong Wu <jt26wzz@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260422024554.130346-2-jt26wzz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_limDaniel Borkmann
Commit 47d3d7ac656a ("ipv6: Implement limits on Hop-by-Hop and Destination options") added net.ipv6.max_{hbh,dst}_opts_{cnt,len} and applied them in ip6_parse_tlv(), the generic TLV walker invoked from ipv6_destopt_rcv() and ipv6_parse_hopopts(). ip6_tnl_parse_tlv_enc_lim() does not go through ip6_parse_tlv(); it has its own hand-rolled TLV scanner inside its NEXTHDR_DEST branch which looks for IPV6_TLV_TNL_ENCAP_LIMIT. That inner loop is bounded only by optlen, which can be up to 2048 bytes. Stuffing the Destination Options header with 2046 Pad1 (type=0) entries advances the scanner a single byte at a time, yielding ~2000 TLV iterations per extension header. Reusing max_dst_opts_cnt to bound the TLV iterations, matching the semantics from 47d3d7ac656a, would require duplicating ip6_parse_tlv() to also validate Pad1/PadN payload. It would also mandate enforcing max_dst_opts_len, since otherwise an attacker shifts the axis to few options with a giant PadN and recovers the original DoS. Allowing up to 8 options before the tunnel encapsulation limit TLV is liberal enough; in practice encap limit is the first TLV. Thus, go with a hard-coded limit IP6_TUNNEL_MAX_DEST_TLVS (8). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23tipc: fix double-free in tipc_buf_append()Lee Jones
tipc_msg_validate() can potentially reallocate the skb it is validating, freeing the old one. In tipc_buf_append(), it was being called with a pointer to a local variable which was a copy of the caller's skb pointer. If the skb was reallocated and validation subsequently failed, the error handling path would free the original skb pointer, which had already been freed, leading to double-free. Fix this by checking if head now points to a newly allocated reassembled skb. If it does, reassign *headbuf for later freeing operations. Fixes: d618d09a68e4 ("tipc: enforce valid ratio between skb truesize and contents") Suggested-by: Tung Nguyen <tung.quang.nguyen@est.tech> Signed-off-by: Lee Jones <lee@kernel.org> Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23llc: Return -EINPROGRESS from llc_ui_connect()Ernestas Kulik
Given a zero sk_sndtimeo, llc_ui_connect() skips waiting for state change and returns 0, confusing userspace applications that will assume the socket is connected, making e.g. getpeername() calls error out. More specifically, the issue was discovered in libcoap, where newly-added AF_LLC socket support was behaving differently from AF_INET connections due to EINPROGRESS handling being skipped. Set rc to -EINPROGRESS if connect() would not block, akin to AF_INET sockets. Signed-off-by: Ernestas Kulik <ernestas.k@iconn-networks.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260421060304.285419-1-ernestas.k@iconn-networks.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23ipv4: icmp: validate reply type before using icmp_pointersRuide Cao
Extended echo replies use ICMP_EXT_ECHOREPLY as the outbound reply type. That value is outside the range covered by icmp_pointers[], which only describes the traditional ICMP types up to NR_ICMP_TYPES. Avoid consulting icmp_pointers[] for reply types outside that range, and use array_index_nospec() for the remaining in-range lookup. Normal ICMP replies keep their existing behavior unchanged. Fixes: d329ea5bd884 ("icmp: add response to RFC 8335 PROBE messages") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Ruide Cao <caoruide123@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/0dace90c01a5978e829ca741ef684dbd7304ce62.1776628519.git.caoruide123@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23tcp: send a challenge ACK on SEG.ACK > SND.NXTJiayuan Chen
RFC 5961 Section 5.2 validates an incoming segment's ACK value against the range [SND.UNA - MAX.SND.WND, SND.NXT] and states: "All incoming segments whose ACK value doesn't satisfy the above condition MUST be discarded and an ACK sent back." Commit 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation") opted Linux into this mitigation and implements the challenge ACK on the lower side (SEG.ACK < SND.UNA - MAX.SND.WND), but the symmetric upper side (SEG.ACK > SND.NXT) still takes the pre-RFC-5961 path and silently returns SKB_DROP_REASON_TCP_ACK_UNSENT_DATA, even though RFC 793 Section 3.9 (now RFC 9293 Section 3.10.7.4) has always required: "If the ACK acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an ACK, drop the segment, and return." Complete the mitigation by sending a challenge ACK on that branch, reusing the existing tcp_send_challenge_ack() path which already enforces the per-socket RFC 5961 Section 7 rate limit via __tcp_oow_rate_limited(). FLAG_NO_CHALLENGE_ACK is honoured for symmetry with the lower-edge case. Update the existing tcp_ts_recent_invalid_ack.pkt selftest, which drives this exact path, to consume the new challenge ACK. Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260422123605.320000-2-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23net/smc: avoid early lgr access in smc_clc_wait_msgRuijie Li
A CLC decline can be received while the handshake is still in an early stage, before the connection has been associated with a link group. The decline handling in smc_clc_wait_msg() updates link-group level sync state for first-contact declines, but that state only exists after link group setup has completed. Guard the link-group update accordingly and keep the per-socket peer diagnosis handling unchanged. This preserves the existing sync_err handling for established link-group contexts and avoids touching link-group state before it is available. Fixes: 0cfdd8f92cac ("smc: connection and link group creation") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Ruijie Li <ruijieli51@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Link: https://patch.msgid.link/08c68a5c817acf198cce63d22517e232e8d60718.1776850759.git.ruijieli51@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-23hv_sock: Return -EIO for malformed/short packetsDexuan Cui
Commit f63152958994 fixes a regression, however it fails to report an error for malformed/short packets -- normally we should never see such packets, but let's report an error for them just in case. Fixes: f63152958994 ("hv_sock: Report EOF instead of -EIO for FIN") Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui <decui@microsoft.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260423064811.1371749-1-decui@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>