path: root/net/ipv4
Age | Commit message | Author
14 hours | Merge tag 'mm-stable-2026-04-13-21-45' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

- "maple_tree: Replace big node with maple copy" (Liam Howlett)
  Mainly preparatory work for ongoing development but it does reduce stack usage and is an improvement.
- "mm, swap: swap table phase III: remove swap_map" (Kairui Song)
  Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups.
- "mm: memfd_luo: preserve file seals" (Pratyush Yadav)
  Add file seal preservation to LUO's memfd code
- "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen)
  Additional userspace stats reporting to zswap
- "arch, mm: consolidate empty_zero_page" (Mike Rapoport)
  Some cleanups for our handling of ZERO_PAGE() and zero_pfn
- "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han)
  A robustness improvement and some cleanups in the kmemleak code
- "Improve khugepaged scan logic" (Vernon Yang)
  Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently
- "Make KHO Stateless" (Jason Miu)
  Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel
- "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt)
  Enhance vmscan's tracepointing
- "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas)
  Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation
- "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)
  Fix a WARN() which can be emitted when KHO restores a vmalloc area
- "mm: Remove stray references to pagevec" (Tal Zussman)
  Several cleanups, mainly updating references to "struct pagevec", which became folio_batch three years ago
- "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau)
  Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page
- "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park)
  Improve two problematic behaviors of DAMOS that make it less efficient when core layer filters are used
- "mm/damon: strictly respect min_nr_regions" (SeongJae Park)
  Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter
- "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)
  The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued
- "mm: cleanups around unmapping / zapping" (David Hildenbrand)
  A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions
- "support batched checking of the young flag for MGLRU" (Baolin Wang)
  Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64
- "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)
  memcg cleanup and robustness improvements
- "Allow order zero pages in page reporting" (Yuvraj Sakshith)
  Enhance free page reporting - it presently and undesirably excludes order-0 pages when reporting free memory.
- "mm: vma flag tweaks" (Lorenzo Stoakes)
  Cleanup work following from the recent conversion of the VMA flags to a bitmap
- "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park)
  Add some more developer-facing debug checks into DAMON core
- "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park)
  Add a DAMON kunit test and make some adjustments to the addr_unit parameter handling
- "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park)
  Fix a hard-to-hit time overflow issue in DAMON core
- "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park)
  A batch of misc/minor improvements and fixups for DAMON
- "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand)
  Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required.
- "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)
  A somewhat random mix of fixups, recompression cleanups and improvements in the zram code
- "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park)
  Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select
- "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)
  Fix the khugepaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged
- "mm: improve map count checks" (Lorenzo Stoakes)
  Provide some cleanups and slight fixes in the mremap, mmap and vma code
- "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park)
  Extend the use of DAMON core's addr_unit tunable
- "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)
  Cleanups to khugepaged, which also serve as a base for Nico's planned khugepaged mTHP support
- "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)
  Code movement and cleanups in the memhotplug and sparsemem code
- "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand)
  Rationalize some memhotplug Kconfig support
- "change young flag check functions to return bool" (Baolin Wang)
  Cleanups to change all young flag check functions to return bool
- "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park)
  Fix a few potential DAMON bugs
- "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes)
  Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code.
- "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)
  Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers
- "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)
  Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed.
* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
32 hours | Merge tag 'net-next-7.1' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:

"Core & protocols:

- Support HW queue leasing, allowing containers to be granted access to HW queues for zero-copy operations and AF_XDP
- Number of code moves to help the compiler with inlining. Avoid output arguments for returning drop reason where possible
- Rework drop handling within qdiscs to include more metadata about the reason and dropping qdisc in the tracepoints
- Remove the rtnl_lock use from IP Multicast Routing
- Pack size information into the Rx Flow Steering table pointer itself. This allows making the table itself a flat array of u32s, thus making the table allocation size a power of two
- Report TCP delayed ack timer information via socket diag
- Add ip_local_port_step_width sysctl to allow distributing the randomly selected ports more evenly throughout the allowed space
- Add support for per-route tunsrc in IPv6 segment routing
- Start work of switching sockopt handling to iov_iter
- Improve dynamic recvbuf sizing in MPTCP, limit burstiness and avoid buffer size drifting up
- Support MSG_EOR in MPTCP
- Add stp_mode attribute to the bridge driver for STP mode selection. This addresses concerns about call_usermodehelper() usage
- Remove UDP-Lite support (as announced in 2023)
- Remove support for building IPv6 as a module. Remove the now unnecessary function calling indirection

Cross-tree stuff:

- Move Michael MIC code from generic crypto into wireless, it's considered insecure but some WiFi networks still need it

Netfilter:

- Switch nft_fib_ipv6 module to no longer need temporary dst_entry object allocations by using fib6_lookup() + RCU. Florian W reports this gets us ~13% higher packet rate
- Convert IPVS's global __ip_vs_mutex to per-net service_mutex and switch the service tables to be per-net. Convert some code that walks the service lists to use RCU instead of the service_mutex
- Add more opinionated input validation to lower security exposure
- Make IPVS hash tables to be per-netns and resizable

Wireless:

- Finished assoc frame encryption/EPPKE/802.1X-over-auth
- Radar detection improvements
- Add 6 GHz incumbent signal detection APIs
- Multi-link support for FILS, probe response templates and client probing
- New APIs and mac80211 support for NAN (Neighbor Aware Networking, aka Wi-Fi Aware) so less work must be in firmware

Driver API:

- Add numerical ID for devlink instances (to avoid having to create fake bus/device pairs just to have an ID). Support shared devlink instances which span multiple PFs
- Add standard counters for reporting pause storm events (implement in mlx5 and fbnic)
- Add configuration API for completion writeback buffering (implement in mana)
- Support driver-initiated change of RSS context sizes
- Support DPLL monitoring input frequency (implement in zl3073x)
- Support per-port resources in devlink (implement in mlx5)

Misc:

- Expand the YAML spec for Netfilter

Drivers

- Software:
  - macvlan: support multicast rx for bridge ports with shared source MAC address
  - team: decouple receive and transmit enablement for IEEE 802.3ad LACP "independent control"
- Ethernet high-speed NICs:
  - nVidia/Mellanox:
    - support high order pages in zero-copy mode (for payload coalescing)
    - support multiple packets in a page (for systems with 64kB pages)
  - Broadcom 25-400GE (bnxt):
    - implement XDP RSS hash metadata extraction
    - add software fallback for UDP GSO, lowering the IOMMU cost
  - Broadcom 800GE (bnge):
    - add link status and configuration handling
    - add various HW and SW statistics
  - Marvell/Cavium:
    - NPC HW block support for cn20k
  - Huawei (hinic3):
    - add mailbox / control queue
    - add rx VLAN offload
    - add driver info and link management
- Ethernet NICs:
  - Marvell/Aquantia:
    - support reading SFP module info on some AQC100 cards
  - Realtek PCI (r8169):
    - add support for RTL8125cp
  - Realtek USB (r8152):
    - support for the RTL8157 5Gbit chip
    - add 2500baseT EEE status/configuration support
- Ethernet NICs embedded and off-the-shelf IP:
  - Synopsys (stmmac):
    - cleanup and reorganize SerDes handling and PCS support
    - cleanup descriptor handling and per-platform data
    - cleanup and consolidate MDIO defines and handling
    - shrink driver memory use for internal structures
    - improve Tx IRQ coalescing
    - improve TCP segmentation handling
    - add support for Spacemit K3
  - Cadence (macb):
    - support PHYs that have inband autoneg disabled with GEM
    - support IEEE 802.3az EEE
    - rework usrio capabilities and handling
  - AMD (xgbe):
    - improve power management for S0i3
    - improve TX resilience for link-down handling
- Virtual:
  - Google cloud vNIC:
    - support larger ring sizes in DQO-QPL mode
    - improve HW-GRO handling
    - support UDP GSO for DQO format
  - PCIe NTB:
    - support queue count configuration
- Ethernet PHYs:
  - automatically disable PHY autonomous EEE if MAC is in charge
  - Broadcom:
    - add BCM84891/BCM84892 support
  - Micrel:
    - support for LAN9645X internal PHY
  - Realtek:
    - add RTL8224 pair order support
    - support PHY LEDs on RTL8211F-VD
    - support spread spectrum clocking (SSC)
  - Maxlinear:
    - add PHY-level statistics via ethtool
- Ethernet switches:
  - Maxlinear (mxl862xx):
    - support for bridge offloading
    - support for VLANs
    - support driver statistics
- Bluetooth:
  - large number of fixes and new device IDs
  - Mediatek:
    - support MT6639 (MT7927)
    - support MT7902 SDIO
- WiFi:
  - Intel (iwlwifi):
    - UNII-9 and continuing UHR work
  - MediaTek (mt76):
    - mt7996/mt7925 MLO fixes/improvements
    - mt7996 NPU support (HW eth/wifi traffic offload)
  - Qualcomm (ath12k):
    - monitor mode support on IPQ5332
    - basic hwmon temperature reporting
    - support IPQ5424
  - Realtek:
    - add USB RX aggregation to improve performance
    - add USB TX flow control by tracking in-flight URBs
- Cellular:
  - IPA v5.2 support"

* tag 'net-next-7.1' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1561 commits) net: pse-pd: fix kernel-doc function name for pse_control_find_by_id() wireguard: device: use exit_rtnl callback instead of manual rtnl_lock in pre_exit wireguard: allowedips: remove redundant space tools: ynl: add sample for wireguard wireguard: allowedips: Use kfree_rcu() instead of call_rcu() MAINTAINERS: Add netkit selftest files selftests/net: Add additional test coverage in nk_qlease selftests/net: Split netdevsim tests from HW tests in nk_qlease tools/ynl: Make YnlFamily closeable as a context manager net: airoha: Add missing PPE configurations in airoha_ppe_hw_init() net: airoha: Fix VIP configuration for AN7583 SoC net: caif: clear client service pointer on teardown net: strparser: fix skb_head leak in strp_abort_strp() net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete() selftests/bpf: add test for xdp_master_redirect with bond not up net, bpf: fix null-ptr-deref in xdp_master_redirect() for down master net: airoha: Remove PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration sctp: disable BH before calling udp_tunnel_xmit_skb() sctp: fix missing encap_port propagation for GSO fragments net: airoha: Rely on net_device pointer in ETS callbacks ...
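The Rx Flow Steering item in the pull request above describes packing the table size into the table pointer so the table can be a flat, power-of-two-sized array of u32s. The following is a minimal user-space sketch of that general technique (a tagged pointer), not the kernel's actual encoding: the names `packed_table_t`, `table_alloc()`, `table_slots()`, `table_index_mask()` and `table_free()`, the 32-byte alignment and the size limits are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the table is allocated with 32-byte alignment, so the low
 * 5 bits of its address are zero and can hold log2(number of slots). */
#define TABLE_ALIGN 32u
#define LOG_MASK    ((uintptr_t)(TABLE_ALIGN - 1))

typedef uintptr_t packed_table_t;

/* Allocate 2^log zeroed 32-bit slots and fold `log` into the pointer.
 * Returns 0 on failure. Hypothetical limits: 3 <= log <= 20. */
packed_table_t table_alloc(unsigned int log)
{
	size_t bytes;
	void *mem;

	if (log < 3 || log > 20)
		return 0;
	bytes = sizeof(uint32_t) << log;   /* power of two, multiple of 32 */
	mem = aligned_alloc(TABLE_ALIGN, bytes);   /* C11 */
	if (!mem)
		return 0;
	memset(mem, 0, bytes);
	assert(((uintptr_t)mem & LOG_MASK) == 0);
	return (uintptr_t)mem | log;
}

/* Recover the flat array by clearing the tag bits. */
uint32_t *table_slots(packed_table_t t)
{
	return (uint32_t *)(t & ~LOG_MASK);
}

/* Recover the index mask (slots - 1) from the tag bits. */
uint32_t table_index_mask(packed_table_t t)
{
	return ((uint32_t)1 << (t & LOG_MASK)) - 1;
}

void table_free(packed_table_t t)
{
	free(table_slots(t));
}
```

Because no separate header struct sits in front of the array, the allocation is exactly `4 << log` bytes, which is what keeps its size a power of two.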
39 hours | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Merge in late fixes in preparation for the net-next PR. Conflicts: include/net/sch_generic.h a6bd339dbb351 ("net_sched: fix skb memory leak in deferred qdisc drops") ff2998f29f390 ("net: sched: introduce qdisc-specific drop reason tracing") https://lore.kernel.org/adz0iX85FHMz0HdO@sirena.org.uk drivers/net/ethernet/airoha/airoha_eth.c 1acdfbdb516b ("net: airoha: Fix VIP configuration for AN7583 SoC") bf3471e6e6c0 ("net: airoha: Make flow control source port mapping dependent on nbq parameter") Adjacent changes: drivers/net/ethernet/airoha/airoha_ppe.c f44218cd5e6a ("net: airoha: Reset PPE cpu port configuration in airoha_ppe_hw_init()") 7da62262ec96 ("inet: add ip_local_port_step_width sysctl to improve port usage distribution") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 days | tcp: Don't set treq->req_usec_ts in cookie_tcp_reqsk_init(). | Kuniyuki Iwashima
Commit de5626b95e13 ("tcp: Factorise cookie-independent fields initialisation in cookie_v[46]_check().") miscategorised tcp_rsk(req)->req_usec_ts init to cookie_tcp_reqsk_init(), which is used by both BPF/non-BPF SYN cookie reqsk. Rather, it should have been moved to cookie_tcp_reqsk_alloc() by commit 8e7bab6b9652 ("tcp: Factorise cookie-dependent fields initialisation in cookie_v[46]_check()") so that only non-BPF SYN cookie sets tcp_rsk(req)->req_usec_ts to false. Let's move the initialisation to cookie_tcp_reqsk_alloc() to respect bpf_tcp_req_attrs.usec_ts_ok. Fixes: e472f88891ab ("bpf: tcp: Support arbitrary SYN Cookie.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260410235328.1773449-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 days | udp: Force compute_score to always inline | Gabriel Krisman Bertazi
Back in 2024 I reported a 7-12% regression on an iperf3 UDP loopback throughput test that we traced to the extra overhead of calling compute_score in two places, introduced by commit f0ea27e7bfe1 ("udp: re-score reuseport groups when connected sockets are present"). At the time, I pointed out the overhead was caused by the multiple calls, associated with cpu-specific mitigations, and merged commit 50aee97d1511 ("udp: Avoid call to compute_score on multiple sites") to jump back explicitly, to force the rescore call in a single place.

Recently though, we got another regression report against a newer distro version, which a team colleague traced back to the same root cause. Turns out that once we updated to gcc-13, the compiler got smart enough to unroll the loop, undoing my previous mitigation.

Let's bite the bullet and __always_inline compute_score on both ipv4 and ipv6 to prevent gcc from de-optimizing it again in the future. These functions are only called in two places each, udpX_lib_lookup1 and udpX_lib_lookup2, so the extra size shouldn't be a problem and it is hot enough to be very visible in profiles. In fact, with gcc13, forcing the inline will prevent gcc from unrolling the fix from commit 50aee97d1511, so we don't end up increasing udpX_lib_lookup2 at all.

I haven't recollected the results myself, as I don't have access to the machine at the moment. But the same colleague reported 4.67% improvement with this patch in the loopback benchmark, solving the regression report within noise margins.

Eric Dumazet reported no size change to vmlinux when built with clang. I report the same also with gcc-13:

  scripts/bloat-o-meter vmlinux vmlinux-inline
  add/remove: 0/2 grow/shrink: 4/0 up/down: 616/-416 (200)
  Function                 old     new   delta
  udp6_lib_lookup2         762     949    +187
  __udp6_lib_lookup        810     975    +165
  udp4_lib_lookup2         757     906    +149
  __udp4_lib_lookup        871     986    +115
  __pfx_compute_score       32       -     -32
  compute_score            384       -    -384
  Total: Before=35011784, After=35011984, chg +0.00%

Fixes: 50aee97d1511 ("udp: Avoid call to compute_score on multiple sites")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://patch.msgid.link/20260410155936.654915-1-krisman@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
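The pattern discussed in this commit, one hot scoring helper forced inline into its two call sites, can be illustrated outside the kernel. This is a minimal user-space sketch assuming GCC or Clang. `FORCE_INLINE`, `struct sock_entry`, `score_one()`, `lookup_first()` and `lookup_second()` are hypothetical stand-ins, not the kernel's `__always_inline`, `compute_score()` or `udpX_lib_lookup*()`.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's __always_inline (GCC/Clang only). */
#define FORCE_INLINE static inline __attribute__((always_inline))

struct sock_entry {
	unsigned short port;   /* bound local port */
	int connected;         /* non-zero if the socket is connected */
};

/* Scoring helper: -1 means "no match", a higher value is a better match. */
FORCE_INLINE int score_one(const struct sock_entry *e, unsigned short port)
{
	if (e->port != port)
		return -1;
	return e->connected ? 2 : 1;
}

/* First call site: return the first entry that matches at all. */
const struct sock_entry *lookup_first(const struct sock_entry *tab,
				      size_t n, unsigned short port)
{
	for (size_t i = 0; i < n; i++)
		if (score_one(&tab[i], port) >= 0)
			return &tab[i];
	return NULL;
}

/* Second call site: return the best-scoring entry. */
const struct sock_entry *lookup_second(const struct sock_entry *tab,
				       size_t n, unsigned short port)
{
	const struct sock_entry *best = NULL;
	int best_score = -1;

	for (size_t i = 0; i < n; i++) {
		int s = score_one(&tab[i], port);

		if (s > best_score) {
			best_score = s;
			best = &tab[i];
		}
	}
	return best;
}
```

With the attribute in place the compiler emits no out-of-line copy of `score_one()`, which corresponds to `compute_score` disappearing from the bloat-o-meter output above while its callers grow.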
3 days | Merge tag 'vfs-7.1-rc1.kino' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs i_ino updates from Christian Brauner: "For historical reasons, the inode->i_ino field is an unsigned long, which means that it's 32 bits on 32 bit architectures. This has caused a number of filesystems to implement hacks to hash a 64-bit identifier into a 32-bit field, and deprives us of a universal identifier field for an inode. This changes the inode->i_ino field from an unsigned long to a u64. This shouldn't make any material difference on 64-bit hosts, but 32-bit hosts will see struct inode grow by at least 4 bytes. This could have effects on slabcache sizes and field alignment. The bulk of the changes are to format strings and tracepoints, since the kernel itself doesn't care that much about the i_ino field. The first patch changes some vfs function arguments, so check that one out carefully. With this change, we may be able to shrink some inode structures. For instance, struct nfs_inode has a fileid field that holds the 64-bit inode number. With this set of changes, that field could be eliminated. 
I'd rather leave that sort of cleanups for later just to keep this simple" * tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group() EVM: add comment describing why ino field is still unsigned long vfs: remove externs from fs.h on functions modified by i_ino widening treewide: fix missed i_ino format specifier conversions ext4: fix signed format specifier in ext4_load_inode trace event treewide: change inode->i_ino from unsigned long to u64 nilfs2: widen trace event i_ino fields to u64 f2fs: widen trace event i_ino fields to u64 ext4: widen trace event i_ino fields to u64 zonefs: widen trace event i_ino fields to u64 hugetlbfs: widen trace event i_ino fields to u64 ext2: widen trace event i_ino fields to u64 cachefiles: widen trace event i_ino fields to u64 vfs: widen trace event i_ino fields to u64 net: change sock.sk_ino and sock_i_ino() to u64 audit: widen ino fields to u64 vfs: widen inode hash/lookup functions to u64
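The pull request above notes that most of the i_ino changes are to format strings, and that some filesystems hashed a 64-bit identifier into a 32-bit field. A small user-space sketch of both points follows; `format_ino()` and `fold_ino32()` are hypothetical helpers written for illustration, and the XOR fold is only one possible hashing hack, not taken from any particular filesystem.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit inode number. PRIu64 selects the correct length
 * modifier on both 32-bit and 64-bit hosts, so no value is truncated. */
int format_ino(char *buf, size_t len, uint64_t ino)
{
	return snprintf(buf, len, "ino=%" PRIu64, ino);
}

/* Illustrative 64-to-32-bit fold of the kind a 32-bit-wide inode field
 * forces. Distinct identifiers can collide after folding. */
uint32_t fold_ino32(uint64_t id)
{
	return (uint32_t)(id ^ (id >> 32));
}
```

For example, `fold_ino32(0x100000001)` and `fold_ino32(0)` both yield 0, which is the loss of a universal identifier that a 64-bit field avoids.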
3 days | tcp: update window_clamp when SO_RCVBUF is set | Jakub Kicinski
Commit under Fixes moved recomputing the window clamp to tcp_measure_rcv_mss() (when scaling_ratio changes). I suspect it missed the fact that we don't recompute the clamp when rcvbuf is set. Until scaling_ratio changes we are stuck with the old window clamp which may be based on the small initial buffer. scaling_ratio may never change. Inspired by Eric's recent commit d1361840f8c5 ("tcp: fix SO_RCVLOWAT and RCVBUF autotuning") plumb the user action thru to TCP and have it update the clamp. A smaller fix would be to just have tcp_rcvbuf_grow() adjust the clamp even if SOCK_RCVBUF_LOCK is set. But IIUC this is what we were trying to get away from in the first place. Fixes: a2cbb1603943 ("tcp: Update window clamping condition") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumaze@google.com> Link: https://patch.msgid.link/20260408001438.129165-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 days | Merge branch 'net-reduce-sk_filter-and-friends-bloat' | Jakub Kicinski
Eric Dumazet says:

====================
net: reduce sk_filter() (and friends) bloat

Some functions return an error by value, and a drop_reason by an output parameter. This extra parameter can force stack canaries.

A drop_reason is enough and more efficient.

This series reduces bloat by 678 bytes on x86_64:

  $ scripts/bloat-o-meter -t vmlinux.old vmlinux.final
  add/remove: 0/0 grow/shrink: 3/18 up/down: 79/-757 (-678)
  Function                        old     new   delta
  vsock_queue_rcv_skb              50      79     +29
  ipmr_cache_report              1290    1315     +25
  ip6mr_cache_report             1322    1347     +25
  tcp_v6_rcv                     3169    3167      -2
  packet_rcv_spkt                 329     327      -2
  unix_dgram_sendmsg             1731    1726      -5
  netlink_unicast                 957     945     -12
  netlink_dump                   1372    1359     -13
  sk_filter_trim_cap              889     858     -31
  netlink_broadcast_filtered     1633    1595     -38
  tcp_v4_rcv                     3152    3111     -41
  raw_rcv_skb                     122      80     -42
  ping_queue_rcv_skb              109      61     -48
  ping_rcv                        215     162     -53
  rawv6_rcv_skb                   278     224     -54
  __sk_receive_skb                690     632     -58
  raw_rcv                         591     527     -64
  udpv6_queue_rcv_one_skb         935     869     -66
  udp_queue_rcv_one_skb           919     853     -66
  tun_net_xmit                   1146    1074     -72
  sock_queue_rcv_skb_reason       166      76     -90
  Total: Before=29722890, After=29722212, chg -0.00%

Future conversions from sock_queue_rcv_skb() to sock_queue_rcv_skb_reason() can be done later.
====================

Link: https://patch.msgid.link/20260409145625.2306224-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
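The change of calling convention that this series describes can be sketched in plain C. The example below is a user-space illustration under assumptions: `enum drop_reason`, `struct packet`, `check_packet_old()` and `check_packet_new()` are hypothetical and do not reproduce the kernel's `enum skb_drop_reason` or `sk_filter_trim_cap()`.

```c
#include <stddef.h>

/* Hypothetical reasons. 0 means "not dropped", so a single enum value
 * carries both success and the cause of a drop. */
enum drop_reason {
	REASON_NOT_DROPPED = 0,
	REASON_TOO_SHORT,
	REASON_FILTERED,
};

struct packet {
	size_t len;     /* payload length in bytes */
	int filtered;   /* non-zero if a filter rejected the packet */
};

/* Old shape: error code by value, reason through an output parameter.
 * Every caller has to pass the address of a local variable. */
int check_packet_old(const struct packet *p, enum drop_reason *reason)
{
	if (p->len < 20) {
		*reason = REASON_TOO_SHORT;
		return -1;
	}
	if (p->filtered) {
		*reason = REASON_FILTERED;
		return -1;
	}
	*reason = REASON_NOT_DROPPED;
	return 0;
}

/* New shape: the reason is the return value, no output parameter. */
enum drop_reason check_packet_new(const struct packet *p)
{
	if (p->len < 20)
		return REASON_TOO_SHORT;
	if (p->filtered)
		return REASON_FILTERED;
	return REASON_NOT_DROPPED;
}
```

In the new shape the caller no longer takes the address of a local, which is the property the cover letter ties to avoiding stack canaries.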
4 days | net: change sk_filter_trim_cap() to return a drop_reason by value | Eric Dumazet
Current return value can be replaced with the drop_reason, reducing kernel bloat:

  $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
  add/remove: 0/2 grow/shrink: 1/11 up/down: 32/-603 (-571)
  Function                        old     new   delta
  tcp_v6_rcv                     3135    3167     +32
  unix_dgram_sendmsg             1731    1726      -5
  netlink_unicast                 957     945     -12
  netlink_dump                   1372    1359     -13
  sk_filter_trim_cap              882     858     -24
  tcp_v4_rcv                     3143    3111     -32
  __pfx_tcp_filter                 32       -     -32
  netlink_broadcast_filtered     1633    1595     -38
  sock_queue_rcv_skb_reason       126      76     -50
  tun_net_xmit                   1127    1074     -53
  __sk_receive_skb                690     632     -58
  udpv6_queue_rcv_one_skb         935     869     -66
  udp_queue_rcv_one_skb           919     853     -66
  tcp_filter                      154       -    -154
  Total: Before=29722783, After=29722212, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409145625.2306224-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 days | tcp: change tcp_filter() to return the reason by value | Eric Dumazet
sk_filter_trim_cap() will soon return the reason by value, do the same for tcp_filter().

Note: tcp_filter() is no longer inlined. Following patch will inline it again.

  $ scripts/bloat-o-meter -t vmlinux.4 vmlinux.5
  add/remove: 2/0 grow/shrink: 0/2 up/down: 186/-43 (143)
  Function              old     new   delta
  tcp_filter              -     154    +154
  __pfx_tcp_filter        -      32     +32
  tcp_v4_rcv           3152    3143      -9
  tcp_v6_rcv           3169    3135     -34
  Total: Before=29722640, After=29722783, chg +0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409145625.2306224-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 days | net: change sock_queue_rcv_skb_reason() to return a drop_reason | Eric Dumazet
Change sock_queue_rcv_skb_reason() to return the drop_reason directly instead of using a reference.

This is part of an effort to remove stack canaries and reduce bloat.

  $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
  add/remove: 0/0 grow/shrink: 3/7 up/down: 79/-301 (-222)
  Function                       old     new   delta
  vsock_queue_rcv_skb             50      79     +29
  ipmr_cache_report             1290    1315     +25
  ip6mr_cache_report            1322    1347     +25
  packet_rcv_spkt                329     327      -2
  sock_queue_rcv_skb_reason      166     128     -38
  raw_rcv_skb                    122      80     -42
  ping_queue_rcv_skb             109      61     -48
  ping_rcv                       215     162     -53
  rawv6_rcv_skb                  278     224     -54
  raw_rcv                        591     527     -64
  Total: Before=29722890, After=29722668, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409145625.2306224-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 days | gre: Count GRE packet drops | Gal Pressman
GRE is silently dropping packets without updating statistics. In case of drop, increment rx_dropped counter to provide visibility into packet loss. For the case where no GRE protocol handler is registered, use rx_nohandler. Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20260409090945.1542440-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 days | tcp: add indirect call wrapper in tcp_conn_request() | Eric Dumazet
Small improvement in SYN processing, to directly call tcp_v6_init_seq_and_ts_off() or tcp_v4_init_seq_and_ts_off(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260410174950.745670-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 days | tcp: return a drop_reason from tcp_add_backlog() | Eric Dumazet
Part of a stack canary removal from tcp_v{4,6}_rcv().

Return a drop_reason instead of a boolean, so that we no longer have to pass the address of a local variable.

  $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
  add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-37 (-37)
  Function             old     new   delta
  tcp_v6_rcv          3133    3129      -4
  tcp_v4_rcv          3206    3202      -4
  tcp_add_backlog     1281    1252     -29
  Total: Before=25567186, After=25567149, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409101147.1642967-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 days | net: Add net_cookie to Dead loop messages | Chris J Arges
Network devices can have the same name within different network namespaces. To help distinguish these devices, add the net_cookie value which can be used to identify the netns. Signed-off-by: Chris J Arges <carges@cloudflare.com> Link: https://patch.msgid.link/20260408191056.1036330-1-carges@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc8). Conflicts: net/ipv6/seg6_iptunnel.c c3812651b522f ("seg6: separate dst_cache for input and output paths in seg6 lwtunnel") 78723a62b969a ("seg6: add per-route tunnel source address") https://lore.kernel.org/adZhwtOYfo-0ImSa@sirena.org.uk net/ipv4/icmp.c fde29fd934932 ("ipv4: icmp: fix null-ptr-deref in icmp_build_probe()") d98adfbdd5c01 ("ipv4: drop ipv6_stub usage and use direct function calls") https://lore.kernel.org/adO3dccqnr6j-BL9@sirena.org.uk Adjacent changes: drivers/net/ethernet/stmicro/stmmac/chain_mode.c 51f4e090b9f8 ("net: stmmac: fix integer underflow in chain mode") 6b4286e05508 ("net: stmmac: rename STMMAC_GET_ENTRY() -> STMMAC_NEXT_ENTRY()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days | Merge tag 'ipsec-2026-04-08' of ↵ | Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2026-04-08

1) Clear trailing padding in build_polexpire() to prevent leaking uninitialized memory. From Yasuaki Torimaru.
2) Fix aevent size calculation when XFRMA_IF_ID is used. From Keenan Dong.
3) Wait for RCU readers during policy netns exit before freeing the policy hash tables.
4) Fix some too early dropped references on the netdev when using transport mode. From Qi Tang.
5) Fix refcount leak in xfrm_migrate_policy_find(). From Kotlyarov Mihail.
6) Fix two info leaks in build_report() and in build_mapping(). From Greg Kroah-Hartman.
7) Zero aligned sockaddr tail in PF_KEY exports. From Zhengchuan Liang.

* tag 'ipsec-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  net: af_key: zero aligned sockaddr tail in PF_KEY exports
  xfrm_user: fix info leak in build_report()
  xfrm_user: fix info leak in build_mapping()
  xfrm: fix refcount leak in xfrm_migrate_policy_find
  xfrm: hold dev ref until after transport_finish NF_HOOK
  xfrm: Wait for RCU readers during policy netns exit
  xfrm: account XFRMA_IF_ID in aevent size calculation
  xfrm: clear trailing padding in build_polexpire()
====================

Link: https://patch.msgid.link/20260408095925.253681-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 days | xfrm: hold dev ref until after transport_finish NF_HOOK | Qi Tang
After async crypto completes, xfrm_input_resume() calls dev_put() immediately on re-entry before the skb reaches transport_finish. The skb->dev pointer is then used inside NF_HOOK and its okfn, which can race with device teardown. Remove the dev_put from the async resumption entry and instead drop the reference after the NF_HOOK call in transport_finish, using a saved device pointer since NF_HOOK may consume the skb. This covers NF_DROP, NF_QUEUE and NF_STOLEN paths that skip the okfn. For non-transport exits (decaps, gro, drop) and secondary async return points, release the reference inline when async is set. Suggested-by: Florian Westphal <fw@strlen.de> Fixes: acf568ee859f ("xfrm: Reinject transport-mode packets through tasklet") Cc: stable@vger.kernel.org Signed-off-by: Qi Tang <tpluszz77@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
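The ordering rule in this fix (keep the device reference across the hook, and release it through a pointer saved beforehand because the hook may consume the packet) can be shown with a small user-space model. All names below are hypothetical: `struct device`, `struct packet`, `dev_ref_get()`, `dev_ref_put()`, `packet_new()`, `consuming_hook()` and `finish_packet()` only mimic the shape of the problem and are not the xfrm or netfilter APIs.

```c
#include <stdlib.h>

struct device {
	int refcnt;            /* simplified, non-atomic reference count */
};

struct packet {
	struct device *dev;    /* device the packet arrived on */
};

void dev_ref_get(struct device *d) { d->refcnt++; }
void dev_ref_put(struct device *d) { d->refcnt--; }

/* Allocate a packet that holds one reference on its device. */
struct packet *packet_new(struct device *d)
{
	struct packet *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	dev_ref_get(d);
	p->dev = d;
	return p;
}

/* Hypothetical hook that takes ownership of the packet and frees it,
 * standing in for a verdict that drops, queues or steals the packet. */
int consuming_hook(struct packet *p)
{
	free(p);
	return 0;
}

/* Run the hook, then release the device reference. The device pointer
 * is saved first because p may already be freed when the hook returns. */
int finish_packet(struct packet *p)
{
	struct device *dev = p->dev;
	int ret = consuming_hook(p);

	/* p must not be dereferenced from here on. */
	dev_ref_put(dev);
	return ret;
}
```

Dropping the reference before the hook would let the device go away while the hook still uses it. Reading `p->dev` after the hook would touch freed memory. Saving the pointer first avoids both.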
9 days | tcp: add recv_should_stop helper | Geliang Tang
Factor out a new helper tcp_recv_should_stop() from tcp_recvmsg_locked() and tcp_splice_read() to check whether to stop receiving. And use this helper in mptcp_recvmsg() and mptcp_splice_read() to reduce redundant code. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-3-b0b33bea3fed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days | mm: rename zap_page_range_single() to zap_vma_range() | David Hildenbrand (Arm)
Let's rename it to make it better match our new naming scheme. While at it, polish the kerneldoc. [akpm@linux-foundation.org: fix rustfmtcheck] Link: https://lkml.kernel.org/r/20260227200848.114019-15-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: Puranjay Mohan <puranjay@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
11 days | mm/memory: remove "zap_details" parameter from zap_page_range_single() | David Hildenbrand (Arm)
Nobody except memory.c should really set that parameter to non-NULL. So let's just drop it and make unmap_mapping_range_vma() use zap_page_range_single_batched() instead. [david@kernel.org: format on a single line] Link: https://lkml.kernel.org/r/8a27e9ac-2025-4724-a46d-0a7c90894ba7@kernel.org Link: https://lkml.kernel.org/r/20260227200848.114019-3-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: Puranjay Mohan <puranjay@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
12 days | ipv4: icmp: fix null-ptr-deref in icmp_build_probe() | Yiqi Sun
ipv6_stub->ipv6_dev_find() may return ERR_PTR(-EAFNOSUPPORT) when the IPv6 stack is not active (CONFIG_IPV6=m and not loaded), and passing this error pointer to dev_hold() will cause a kernel crash with null-ptr-deref. Instead, silently discard the request. RFC 8335 does not appear to define a specific response for the case where an IPv6 interface identifier is syntactically valid but the implementation cannot perform the lookup at runtime, and silently dropping the request may be safer than misreporting "No Such Interface". Fixes: d329ea5bd884 ("icmp: add response to RFC 8335 PROBE messages") Signed-off-by: Yiqi Sun <sunyiqixm@gmail.com> Link: https://patch.msgid.link/20260402070419.2291578-1-sunyiqixm@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days | ipv4: nexthop: allocate skb dynamically in rtm_get_nexthop() | Fernando Fernandez Mancera
When querying a nexthop object via RTM_GETNEXTHOP, the kernel currently allocates a fixed-size skb using NLMSG_GOODSIZE. While sufficient for single nexthops and small Equal-Cost Multi-Path groups, this fixed allocation fails for large nexthop groups like 512 nexthops. This results in the following warning splat: WARNING: net/ipv4/nexthop.c:3395 at rtm_get_nexthop+0x176/0x1c0, CPU#20: rep/4608 [...] RIP: 0010:rtm_get_nexthop (net/ipv4/nexthop.c:3395) [...] Call Trace: <TASK> rtnetlink_rcv_msg (net/core/rtnetlink.c:6989) netlink_rcv_skb (net/netlink/af_netlink.c:2550) netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344) netlink_sendmsg (net/netlink/af_netlink.c:1894) ____sys_sendmsg (net/socket.c:721 net/socket.c:736 net/socket.c:2585) ___sys_sendmsg (net/socket.c:2641) __sys_sendmsg (net/socket.c:2671) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) </TASK> Fix this by allocating the size dynamically using nh_nlmsg_size() and using nlmsg_new(), this is consistent with nexthop_notify() behavior. In addition, adjust nh_nlmsg_size_grp() so it calculates the size needed based on flags passed. While at it, also add the size of NHA_FDB for nexthop group size calculation as it was missing too. This cannot be reproduced via iproute2 as the group size is currently limited and the command fails as follows: addattr_l ERROR: message exceeded bound of 1048 Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Reported-by: Yiming Qian <yimingqian591@gmail.com> Closes: https://lore.kernel.org/netdev/CAL_bE8Li2h4KO+AQFXW4S6Yb_u5X4oSKnkywW+LPFjuErhqELA@mail.gmail.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260402072613.25262-2-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
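To see why a fixed-size buffer cannot hold a large group, the size of the group attribute alone can be estimated. The sketch below is a userspace model and not the kernel's nh_nlmsg_size(): `model_nexthop_grp` assumes the 8-byte per-entry layout of the UAPI group entry, and `model_nla_total_size()` and `model_nha_group_size()` are hypothetical names that apply the usual netlink rule of a 4-byte attribute header plus payload, padded to a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NLA_ALIGNTO 4u	/* netlink attributes are padded to 4 bytes */
#define MODEL_NLA_HDRLEN  4u	/* length (16 bit) + type (16 bit) */

/* Assumed per-entry layout of a nexthop group member (8 bytes). */
struct model_nexthop_grp {
	uint32_t id;		/* nexthop id */
	uint8_t  weight;	/* lower bits of the weight */
	uint8_t  weight_high;	/* reserved in older headers */
	uint16_t resvd2;
};

/* Header plus payload, rounded up to the attribute alignment. */
static size_t model_nla_total_size(size_t payload)
{
	size_t len = MODEL_NLA_HDRLEN + payload;

	return (len + MODEL_NLA_ALIGNTO - 1) & ~(size_t)(MODEL_NLA_ALIGNTO - 1);
}

/* Size of the group attribute alone for num_nh members. */
static size_t model_nha_group_size(unsigned int num_nh)
{
	return model_nla_total_size(num_nh * sizeof(struct model_nexthop_grp));
}
```

Under these assumptions a 512-member group needs 4100 bytes for the group attribute alone, which is already more than a 4 KiB page before any other attribute is counted. That is the reason for sizing the skb from the object being dumped.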
12 days | ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group dump | Fernando Fernandez Mancera
Currently NHA_HW_STATS_ENABLE is included twice every time a dump of a nexthop group is performed with NHA_OP_FLAG_DUMP_STATS. As all the stats querying was moved to nla_put_nh_group_stats(), leave only that instance of the attribute querying. Fixes: 5072ae00aea4 ("net: nexthop: Expose nexthop group HW stats to user space") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260402072613.25262-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days | inet: remove leftover EXPORT_SYMBOL() | Eric Dumazet
IPv6 is no longer a module, we no longer need to export these symbols. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260402174430.2462800-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-29 | ipv4: drop ipv6_stub usage and use direct function calls | Fernando Fernandez Mancera
As IPv6 is built-in only, the ipv6_stub infrastructure is no longer necessary. The IPv4 stack interacts with IPv6 mainly to support IPv4 routes with IPv6 next-hops (RFC 8950). Convert all these cross-family calls from ipv6_stub to direct function calls. The fallback functions introduced previously will prevent linkage errors when CONFIG_IPV6 is disabled. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Link: https://patch.msgid.link/20260325120928.15848-8-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-29 | net: remove EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() macros | Fernando Fernandez Mancera
As IPv6 is built-in only, the macro is always evaluating to an empty one. Remove it completely from the code. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260325120928.15848-3-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-29 | ipv6: convert CONFIG_IPV6 to built-in only and clean up Kconfigs | Fernando Fernandez Mancera
Although maintaining a modular IPv6 stack offers image size savings for specific setups, this benefit is outweighed by the architectural burden it imposes on other subsystems in implementation and maintenance. Therefore, drop it. Change CONFIG_IPV6 from tristate to bool. Remove all Kconfig dependencies across the tree that explicitly checked for IPV6=m. In addition, remove MODULE_DESCRIPTION(), MODULE_ALIAS(), MODULE_AUTHOR() and MODULE_LICENSE(). This also replaces module_init() with device_initcall(). It is not possible to use fs_initcall() as IPv4 does because that creates a race condition on IPv6 addrconf. Finally, modify the default configs from CONFIG_IPV6=m to CONFIG_IPV6=y, except for m68k: according to bloat-o-meter the image grows by roughly 330KB there, which isn't acceptable. Instead, disable IPv6 on this architecture by default. This is aligned with m68k RAM requirements and recommendations [1]. [1] http://www.linux-m68k.org/faq/ram.html Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> # arm64 Link: https://patch.msgid.link/20260325120928.15848-2-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 | tcp: tcp_vegas: use tcp_vegas_cwnd_event_tx_start() | Eric Dumazet
While net/ipv4/tcp_yeah.c is correctly setting .cwnd_event_tx_start to tcp_vegas_cwnd_event_tx_start(), I forgot to do the same in tcp_vegas.c Fixes: d1e59a469737 ("tcp: add cwnd_event_tx_start to tcp_congestion_ops") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260325212440.4146579-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-26 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc6). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-24 | tcp: add cwnd_event_tx_start to tcp_congestion_ops | Eric Dumazet
(tcp_congestion_ops)->cwnd_event() is called very often, with @event oscillating between CA_EVENT_TX_START and other values. This is not branch prediction friendly. Provide a new cwnd_event_tx_start pointer dedicated for CA_EVENT_TX_START. Both BBR and CUBIC benefit from this change, since they only care about CA_EVENT_TX_START.

No change in kernel size:

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux
add/remove: 4/4 grow/shrink: 3/1 up/down: 564/-568 (-4)
Function                                     old     new   delta
bbr_cwnd_event_tx_start                        -     450    +450
cubictcp_cwnd_event_tx_start                   -      70     +70
__pfx_cubictcp_cwnd_event_tx_start             -      16     +16
__pfx_bbr_cwnd_event_tx_start                  -      16     +16
tcp_unregister_congestion_control             93      99      +6
tcp_update_congestion_control                518     521      +3
tcp_register_congestion_control              422     425      +3
__tcp_transmit_skb                          3308    3306      -2
__pfx_cubictcp_cwnd_event                     16       -     -16
__pfx_bbr_cwnd_event                          16       -     -16
cubictcp_cwnd_event                           80       -     -80
bbr_cwnd_event                               454       -    -454
Total: Before=25240512, After=25240508, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260323234920.1097858-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
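The idea of a dedicated callback can be illustrated with a small userspace model. Everything named `model_*` below is hypothetical and only mirrors the shape described in the commit message: an ops table with a generic event hook plus a separate hook for the transmit-start event, so that an algorithm interested only in transmit start never has to inspect an event code.

```c
#include <stddef.h>

enum model_ca_event {
	MODEL_CA_EVENT_CWND_RESTART,
	MODEL_CA_EVENT_LOSS,
};

struct model_conn {
	unsigned int tx_starts;		/* times the tx-start hook ran */
	unsigned int other_events;	/* times the generic hook ran */
};

struct model_ca_ops {
	/* Generic hook: receives every event except transmit start. */
	void (*cwnd_event)(struct model_conn *c, enum model_ca_event ev);
	/* Dedicated hook: no event argument, so no branch on it. */
	void (*cwnd_event_tx_start)(struct model_conn *c);
};

/* Example algorithm callback that only cares about transmit start. */
static void model_on_tx_start(struct model_conn *c)
{
	c->tx_starts++;
}

/* Call site for transmit start: goes straight to the dedicated hook. */
static void model_tx_start(const struct model_ca_ops *ops, struct model_conn *c)
{
	if (ops->cwnd_event_tx_start)
		ops->cwnd_event_tx_start(c);
}

/* Call site for all other events: skipped if the algorithm has no hook. */
static void model_ca_event(const struct model_ca_ops *ops, struct model_conn *c,
			   enum model_ca_event ev)
{
	if (ops->cwnd_event)
		ops->cwnd_event(c, ev);
}
```

In this model an algorithm that leaves `cwnd_event` unset is not called at all for the other events, which matches the motivation given above for BBR and CUBIC.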
2026-03-24 | Merge tag 'ipsec-2026-03-23' of ↵ | Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-03-23 1) Add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi. From Sabrina Dubroca. 2) Fix the condition on x->pcpu_num in xfrm_sa_len by using the proper check. From Sabrina Dubroca. 3) Call xdo_dev_state_delete during state update to properly cleanup the xdo device state. From Sabrina Dubroca. 4) Fix a potential skb leak in espintcp when async crypto is used. From Sabrina Dubroca. 5) Validate inner IPv4 header length in IPTFS payload to avoid parsing malformed packets. From Roshan Kumar. 6) Fix skb_put() panic on non-linear skb during IPTFS reassembly. From Fernando Fernandez Mancera. 7) Silence various sparse warnings related to RCU, state, and policy handling. From Sabrina Dubroca. 8) Fix work re-schedule race after cancel in xfrm_nat_keepalive_net_fini(). From Hyunwoo Kim. 9) Prevent policy_hthresh.work from racing with netns teardown by using a proper cleanup mechanism. From Minwoo Ra. 10) Validate that the family of the source and destination addresses match in pfkey_send_migrate(). From Eric Dumazet. 11) Only publish mode_data after the clone is setup in the IPTFS receive path. This prevents leaving x->mode_data pointing at freed memory on error. From Paul Moses. Please pull or let me know if there are problems. 
ipsec-2026-03-23 * tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: iptfs: only publish mode_data after clone setup af_key: validate families in pfkey_send_migrate() xfrm: prevent policy_hthresh.work from racing with netns teardown xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini() xfrm: avoid RCU warnings around the per-netns netlink socket xfrm: add rcu_access_pointer to silence sparse warning for xfrm_input_afinfo xfrm: policy: silence sparse warning in xfrm_policy_unregister_afinfo xfrm: policy: fix sparse warnings in xfrm_policy_{init,fini} xfrm: state: silence sparse warnings during netns exit xfrm: remove rcu/state_hold from xfrm_state_lookup_spi_proto xfrm: state: add xfrm_state_deref_prot to state_by* walk under lock xfrm: state: fix sparse warnings around XFRM_STATE_INSERT xfrm: state: fix sparse warnings in xfrm_state_init xfrm: state: fix sparse warnings on xfrm_state_hold_rcu xfrm: iptfs: fix skb_put() panic on non-linear skb during reassembly xfrm: iptfs: validate inner IPv4 header length in IPTFS payload esp: fix skb leak with espintcp and async crypto xfrm: call xdo_dev_state_delete during state update xfrm: fix the condition on x->pcpu_num in xfrm_sa_len xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi ==================== Link: https://patch.msgid.link/20260323083440.2741292-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-23 | udp: Fix wildcard bind conflict check when using hash2 | Martin KaFai Lau
When binding a udp_sock to a local address and port, UDP uses two hashes (udptable->hash and udptable->hash2) for collision detection. The current code switches to "hash2" when hslot->count > 10. "hash2" is keyed by local address and local port. "hash" is keyed by local port only.

The issue can be shown in the following bind sequence (pseudo code):

  bind(fd1, "[fd00::1]:8888")
  bind(fd2, "[fd00::2]:8888")
  bind(fd3, "[fd00::3]:8888")
  bind(fd4, "[fd00::4]:8888")
  bind(fd5, "[fd00::5]:8888")
  bind(fd6, "[fd00::6]:8888")
  bind(fd7, "[fd00::7]:8888")
  bind(fd8, "[fd00::8]:8888")
  bind(fd9, "[fd00::9]:8888")
  bind(fd10, "[fd00::10]:8888")

  /* Correctly return -EADDRINUSE because "hash" is used
   * instead of "hash2". udp_lib_lport_inuse() detects the
   * conflict.
   */
  bind(fail_fd, "[::]:8888")

  /* After one more socket is bound to "[fd00::11]:8888",
   * hslot->count exceeds 10 and "hash2" is used instead.
   */
  bind(fd11, "[fd00::11]:8888")
  bind(fail_fd, "[::]:8888") /* succeeds unexpectedly */

The same issue applies to the IPv4 wildcard address "0.0.0.0" and the IPv4-mapped wildcard address "::ffff:0.0.0.0". For example, if there are existing sockets bound to "192.168.1.[1-11]:8888", then binding "0.0.0.0:8888" or "[::ffff:0.0.0.0]:8888" can also miss the conflict when hslot->count > 10.

TCP inet_csk_get_port() already has the correct check in inet_use_bhash2_on_bind(). Rename it to inet_use_hash2_on_bind() and move it to inet_hashtables.h so udp.c can reuse it in this fix.

Fixes: 30fff9231fad ("udp: bind() optimisation")
Reported-by: Andrew Onyshchuk <oandrew@meta.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260319181817.1901357-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
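The rule implied by the description above is that the address-keyed hash may be used for conflict detection only when the socket binds to a specific address, since a wildcard bind conflicts with every address on that port. A minimal userspace sketch of that decision follows. `model_use_hash2_on_bind()` is a hypothetical name and takes a textual address for convenience; it is a model of the rule and not the kernel helper.

```c
#include <stdbool.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Return true when the address-keyed table ("hash2") is safe to use for
 * the bind conflict check, false when the port-only table must be scanned.
 * Unparsable input is treated conservatively as "scan the port-only table".
 */
static bool model_use_hash2_on_bind(int family, const char *addr)
{
	struct in_addr a4;

	if (family == AF_INET6) {
		struct in6_addr a6;

		if (inet_pton(AF_INET6, addr, &a6) != 1)
			return false;
		if (IN6_IS_ADDR_UNSPECIFIED(&a6))
			return false;	/* "::" is a wildcard */
		if (!IN6_IS_ADDR_V4MAPPED(&a6))
			return true;	/* specific IPv6 address */
		/* IPv4-mapped: decide on the embedded IPv4 address. */
		memcpy(&a4, &a6.s6_addr[12], sizeof(a4));
		return a4.s_addr != htonl(INADDR_ANY);
	}

	if (inet_pton(AF_INET, addr, &a4) != 1)
		return false;
	return a4.s_addr != htonl(INADDR_ANY);	/* "0.0.0.0" is a wildcard */
}
```

With this rule, the wildcard binds from the sequence above ("::", "0.0.0.0", "::ffff:0.0.0.0") always take the port-only path regardless of how many sockets share the port.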
2026-03-19 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc5). net/netfilter/nft_set_rbtree.c 598adea720b97 ("netfilter: revert nft_set_rbtree: validate open interval overlap") 3aea466a43998 ("netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock") https://lore.kernel.org/abgaQBpeGstdN4oq@sirena.org.uk No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-19 | icmp: fix NULL pointer dereference in icmp_tag_validation() | Weiming Shi
icmp_tag_validation() unconditionally dereferences the result of rcu_dereference(inet_protos[proto]) without checking for NULL. The inet_protos[] array is sparse -- only about 15 of 256 protocol numbers have registered handlers. When ip_no_pmtu_disc is set to 3 (hardened PMTU mode) and the kernel receives an ICMP Fragmentation Needed error with a quoted inner IP header containing an unregistered protocol number, the NULL dereference causes a kernel panic in softirq context. Oops: general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017] RIP: 0010:icmp_unreach (net/ipv4/icmp.c:1085 net/ipv4/icmp.c:1143) Call Trace: <IRQ> icmp_rcv (net/ipv4/icmp.c:1527) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207) ip_local_deliver_finish (net/ipv4/ip_input.c:242) ip_local_deliver (net/ipv4/ip_input.c:262) ip_rcv (net/ipv4/ip_input.c:573) __netif_receive_skb_one_core (net/core/dev.c:6164) process_backlog (net/core/dev.c:6628) handle_softirqs (kernel/softirq.c:561) </IRQ> Add a NULL check before accessing icmp_strict_tag_validation. If the protocol has no registered handler, return false since it cannot perform strict tag validation. Fixes: 8ed1dc44d3e9 ("ipv4: introduce hardened ip_no_pmtu_disc mode") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Link: https://patch.msgid.link/20260318130558.1050247-4-bestswngs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
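The shape of the fix, a NULL check on a lookup into a sparse handler table, can be shown with a small userspace model. The names `model_proto_handler`, `model_protos` and `model_tag_validation()` are hypothetical, and the RCU protection used by the kernel is deliberately left out of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_proto_handler {
	bool strict_tag_validation;
};

/* Sparse table: most of the 256 slots stay NULL. */
static const struct model_proto_handler *model_protos[256];

/*
 * A protocol without a registered handler cannot perform strict tag
 * validation, so the lookup reports false instead of dereferencing NULL.
 */
static bool model_tag_validation(unsigned char proto)
{
	const struct model_proto_handler *h = model_protos[proto];

	return h ? h->strict_tag_validation : false;
}
```

Taking the protocol number as `unsigned char` keeps the index inside the table in this model; an unregistered number such as one quoted in a crafted inner header then simply yields false.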
2026-03-17 | fou: Remove IPPROTO_UDPLITE check in gue_err() and gue6_err(). | Kuniyuki Iwashima
UDP-Lite has been removed, and its error handler is no longer found in either inet_protos[IPPROTO_UDPLITE] or inet6_protos[IPPROTO_UDPLITE]. The recursion fixed by the protocol check in gue_err() and gue6_err() no longer occurs with UDP-Lite. Let's remove the checks. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260316133127.2646421-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-16 | bonding: prevent potential infinite loop in bond_header_parse() | Eric Dumazet
bond_header_parse() can loop if a stack of two bonding devices is setup, because skb->dev always points to the hierarchy top. Add new "const struct net_device *dev" parameter to (struct header_ops)->parse() method to make sure the recursion is bounded, and that the final leaf parse method is called. Fixes: 950803f72547 ("bonding: fix type confusion in bond_setup_by_slave()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@shopee.com> Tested-by: Jiayuan Chen <jiayuan.chen@shopee.com> Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Link: https://patch.msgid.link/20260315104152.1436867-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-14 | ipv4: validate IPV4_DEVCONF attributes properly | Fernando Fernandez Mancera
As the IPV4_DEVCONF netlink attributes are not being validated, it is possible to use netlink to set read-only values like mc_forwarding. In addition, valid ranges are not being validated either, but that is less relevant as they aren't in sysctl. To avoid similar situations in the future, define a NLA policy for IPV4_DEVCONF attributes which are nested in IFLA_INET_CONF. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260312142637.5704-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-14 | net: dropreason: add SKB_DROP_REASON_RECURSION_LIMIT | Eric Dumazet
ip[6]tunnel_xmit() can drop packets if a too deep recursion level is detected. Add SKB_DROP_REASON_RECURSION_LIMIT drop reason. We will use this reason later in __dev_queue_xmit(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260312201824.203093-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-14 | tcp: increase LINUX_MIB_BEYOND_WINDOW for SKB_DROP_REASON_TCP_OVERWINDOW | Simon Baatz
Since commit 9ca48d616ed7 ("tcp: do not accept packets beyond window"), the path leading to SKB_DROP_REASON_TCP_OVERWINDOW in tcp_data_queue() is probably dead. However, it can be reached now when tcp_max_receive_window() is larger than tcp_receive_window(). In that case, increment LINUX_MIB_BEYOND_WINDOW as done in tcp_sequence(). Signed-off-by: Simon Baatz <gmbnomis@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-3-4c7f96b1ec69@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-14 | tcp: implement RFC 7323 window retraction receiver requirements | Simon Baatz
By default, the Linux TCP implementation does not shrink the advertised window (RFC 7323 calls this "window retraction") with the following exceptions: - When an incoming segment cannot be added due to the receive buffer running out of memory. Since commit 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze") a zero window will be advertised in this case. It turns out that reaching the required memory pressure is easy when window scaling is in use. In the simplest case, sending a sufficient number of segments smaller than the scale factor to a receiver that does not read data is enough. - Commit b650d953cd39 ("tcp: enforce receive buffer memory limits by allowing the tcp window to shrink") addressed the "eating memory" problem by introducing a sysctl knob that allows shrinking the window before running out of memory. However, RFC 7323 does not only state that shrinking the window is necessary in some cases, it also formulates requirements for TCP implementations when doing so (Section 2.4). This commit addresses the receiver-side requirements: After retracting the window, the peer may have a snd_nxt that lies within a previously advertised window but is now beyond the retracted window. This means that all incoming segments (including pure ACKs) will be rejected until the application happens to read enough data to let the peer's snd_nxt be in window again (which may be never). To comply with RFC 7323, the receiver MUST honor any segment that would have been in window for any ACK sent by the receiver and, when window scaling is in effect, SHOULD track the maximum window sequence number it has advertised. This patch tracks that maximum window sequence number rcv_mwnd_seq throughout the connection and uses it in tcp_sequence() when deciding whether a segment is acceptable. rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in tcp_select_window(). If we count tcp_sequence() as fast path, it is read in the fast path. 
Therefore, rcv_mwnd_seq is put into rcv_wnd's cacheline group. The logic for handling received data in tcp_data_queue() is already sufficient and does not need to be updated. Signed-off-by: Simon Baatz <gmbnomis@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260309-tcp_rfc7323_retract_wnd_rfc-v3-1-4c7f96b1ec69@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
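The tracking described above can be sketched in userspace. This is a simplified model and not the kernel code: `model_rx`, `model_select_window()` and `model_seq_acceptable()` are hypothetical names, only the right edge of the window is checked, and sequence numbers are compared with the usual wraparound-safe signed difference.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a is after b, modulo 2^32. */
static bool model_seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

struct model_rx {
	uint32_t rcv_nxt;	/* next sequence number expected */
	uint32_t rcv_wup;	/* rcv_nxt at the last window update */
	uint32_t rcv_wnd;	/* currently advertised window */
	uint32_t rcv_mwnd_seq;	/* highest right edge ever advertised */
};

/* Advertise a new window and remember the maximum right edge. */
static void model_select_window(struct model_rx *rx, uint32_t new_wnd)
{
	uint32_t edge;

	rx->rcv_wup = rx->rcv_nxt;
	rx->rcv_wnd = new_wnd;
	edge = rx->rcv_wup + new_wnd;
	if (model_seq_after(edge, rx->rcv_mwnd_seq))
		rx->rcv_mwnd_seq = edge;
}

/*
 * Right-edge check only: a segment starting at seq is acceptable if it
 * does not lie beyond any window edge that was advertised earlier.
 */
static bool model_seq_acceptable(const struct model_rx *rx, uint32_t seq)
{
	return !model_seq_after(seq, rx->rcv_mwnd_seq);
}
```

In this model, a segment that falls between the retracted edge and the previously advertised edge is still accepted, which is the receiver-side behavior the commit message attributes to RFC 7323.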
2026-03-13 | udp: Don't pass proto to __udp4_lib_rcv() and __udp6_lib_rcv(). | Kuniyuki Iwashima
UDP and UDP-Lite shared __udp4_lib_rcv() and __udp6_lib_rcv() by passing IPPROTO_UDP or IPPROTO_UDPLITE. Now, @proto is always IPPROTO_UDP. Let's not pass it and rename the functions accordingly.

With this series removing a bunch of conditionals for UDP-Lite from the fast path, udp_rr with 20,000 flows sees a 10% increase in pps (13.3 Mpps -> 14.7 Mpps) on an AMD EPYC 7B12 (Zen 2) 64-Core Processor platform.

[ With FDO, the baseline is much higher and the delta was ~3%, 20.1 Mpps -> 20.7 Mpps ]

Before:

$ nstat > /dev/null; sleep 1; nstat | grep Udp
Udp6InDatagrams                 14013408           0.0
Udp6OutDatagrams                14013128           0.0

After:

$ nstat > /dev/null; sleep 1; nstat | grep Udp
Udp6InDatagrams                 15491971           0.0
Udp6OutDatagrams                15491671           0.0

$ ./scripts/bloat-o-meter vmlinux.before vmlinux.after
add/remove: 13/75 grow/shrink: 11/75 up/down: 13777/-18401 (-4624)
Function                                     old     new   delta
udp4_gro_receive                             872     866      -6
udp6_gro_receive                             910     903      -7
udp_rcv                                       32    1727   +1695
udpv6_rcv                                     32    1450   +1418
__udp4_lib_rcv                              2045       -   -2045
__udp6_lib_rcv                              2084       -   -2084
udp_unicast_rcv_skb                          160     149     -11
udp6_unicast_rcv_skb                         196     181     -15
__udp4_lib_mcast_deliver                     925     846     -79
__udp6_lib_mcast_deliver                     922     810    -112
__udp4_lib_lookup                            973     969      -4
__udp6_lib_lookup                            940     929     -11
__udp4_lib_lookup_skb                        106     100      -6
__udp6_lib_lookup_skb                         71      66      -5
udp4_lib_lookup_skb                          132     127      -5
udp6_lib_lookup_skb                           87      81      -6
udp_queue_rcv_skb                            326     356     +30
udpv6_queue_rcv_skb                          331     361     +30
udp_queue_rcv_one_skb                       1233     914    -319
udpv6_queue_rcv_one_skb                     1250     930    -320
__udp_enqueue_schedule_skb                  1067     995     -72
udp_rcv_segment                              520     480     -40
udp_post_segment_fix_csum                    120       -    -120
udp_lib_checksum_complete                    200      84    -116
udp_err                                       27    1103   +1076
udpv6_err                                     36    1417   +1381
__udp4_lib_err                              1112       -   -1112
__udp6_lib_err                              1448       -   -1448
udp_recvmsg                                 1149     994    -155
udpv6_recvmsg                               1349    1294     -55
udp_sendmsg                                 2730    2648     -82
udp_send_skb                                 909     681    -228
udpv6_sendmsg                               3022    2861    -161
udp_v6_send_skb                             1214     952    -262
...
Total: Before=18446744073748075501, After=18446744073748070877, chg -0.00%

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-16-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-13 | udp: Don't pass udptable to IPv4 socket lookup functions. | Kuniyuki Iwashima
Since UDP and UDP-Lite had dedicated socket hash tables for each, we have had to pass the pointer down to many socket lookup functions. UDP-Lite is gone, so we no longer need to do that. Let's fetch net->ipv4.udp_table only where needed in the IPv4 stack: __udp4_lib_lookup(), __udp4_lib_mcast_deliver(), and udp_diag_dump(). Some functions are renamed as the wrapper functions are no longer needed. __udp4_lib_err() -> udp_err() __udp_diag_destroy() -> udp_diag_destroy() udp_dump_one() -> udp_diag_dump_one() udp_dump() -> udp_diag_dump() Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260311052020.1213705-15-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-13 | udp: Don't pass udptable to IPv6 socket lookup functions. | Kuniyuki Iwashima
Since UDP and UDP-Lite had dedicated socket hash tables for each, we have had to pass the pointer down to many socket lookup functions. UDP-Lite is gone, so we no longer need to do that. Let's fetch net->ipv4.udp_table only where needed in the IPv6 stack: __udp6_lib_lookup() and __udp6_lib_mcast_deliver(). __udp6_lib_err() is renamed to udpv6_err() as its wrapper is no longer needed. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260311052020.1213705-14-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-13 | udp: Remove dead check in __udp[46]_lib_lookup() for BPF. | Kuniyuki Iwashima
BPF socket lookup for SO_REUSEPORT does not support UDP-Lite. In __udp4_lib_lookup() and __udp6_lib_lookup(), it checks if the passed udptable pointer is the same as net->ipv4.udp_table, which is only true for UDP. Now, the condition is always true. Let's remove the check. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260311052020.1213705-13-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-13 | udp: Remove udp_table in struct udp_seq_afinfo. | Kuniyuki Iwashima
Since UDP and UDP-Lite had dedicated socket hash tables for each, we have had to fetch them from different pointers for procfs or bpf iterator. UDP always has its global or per-netns table in net->ipv4.udp_table and struct udp_seq_afinfo.udp_table is NULL. OTOH, UDP-Lite had only one global table in the pointer. We no longer use the field. Let's remove it and udp_get_table_seq(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260311052020.1213705-12-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-13 | udp: Remove struct proto.h.udp_table. | Kuniyuki Iwashima
Since UDP and UDP-Lite had dedicated socket hash tables for each, we have had to fetch them from different pointers. UDP always has its global or per-netns table in net->ipv4.udp_table and struct proto.h.udp_table is NULL. OTOH, UDP-Lite had only one global table in the pointer. We no longer use the field. Let's remove it and udp_get_table_prot(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260311052020.1213705-11-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-13  udp: Remove UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV.  Kuniyuki Iwashima
UDP-Lite supports variable-length checksum coverage and has two socket options, UDPLITE_SEND_CSCOV and UDPLITE_RECV_CSCOV, to control it.

Let's remove the support.

setsockopt(UDPLITE_SEND_CSCOV / UDPLITE_RECV_CSCOV) was only available for UDP-Lite and returned -ENOPROTOOPT for UDP. Now, the options are handled in ip_setsockopt() and ipv6_setsockopt(), which still return the same error.

getsockopt(UDPLITE_SEND_CSCOV / UDPLITE_RECV_CSCOV) was available for UDP and always returned 0, meaning full checksum, but now -ENOPROTOOPT is returned.

Given that this getsockopt() is meaningless for UDP and the options are not even defined under include/uapi/, this should not be a problem.

  $ man 7 udplite
  ...
  BUGS
         Where glibc support is missing, the following definitions are needed:

             #define IPPROTO_UDPLITE     136
             #define UDPLITE_SEND_CSCOV  10
             #define UDPLITE_RECV_CSCOV  11

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-10-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
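The user-visible getsockopt() change can be probed from user space with something like the sketch below. The option values are the ones quoted from udplite(7). The helper names probe_udp_cscov() and classify_probe() are hypothetical, and querying at level IPPROTO_UDP is an assumption on my part; the message does not state which level applications use.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Not defined under include/uapi/; values taken from udplite(7). */
#ifndef UDPLITE_SEND_CSCOV
#define UDPLITE_SEND_CSCOV 10
#endif
#ifndef UDPLITE_RECV_CSCOV
#define UDPLITE_RECV_CSCOV 11
#endif

enum cscov_state { CSCOV_LEGACY, CSCOV_REMOVED, CSCOV_UNKNOWN };

/* Query one coverage option on a plain UDP socket.
 * Returns 0 if getsockopt() succeeded, the errno value if it failed,
 * or -1 if no UDP socket could be created. */
static int probe_udp_cscov(int optname)
{
    int val = 0;
    socklen_t len = sizeof(val);
    int err = 0;
    int fd;

    assert(optname == UDPLITE_SEND_CSCOV || optname == UDPLITE_RECV_CSCOV);

    fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (fd < 0)
        return -1;
    /* Assumption: the option is queried at level IPPROTO_UDP. */
    if (getsockopt(fd, IPPROTO_UDP, optname, &val, &len) < 0)
        err = errno; /* save errno before close() can overwrite it */
    close(fd);
    return err;
}

/* Map a probe result onto the two behaviours the message describes. */
static enum cscov_state classify_probe(int err)
{
    if (err == 0)
        return CSCOV_LEGACY;   /* option accepted on a UDP socket */
    if (err == ENOPROTOOPT)
        return CSCOV_REMOVED;  /* option rejected */
    return CSCOV_UNKNOWN;      /* no socket, or an unrelated error */
}
```

Calling classify_probe(probe_udp_cscov(UDPLITE_SEND_CSCOV)) tells a program which of the two behaviours the running kernel shows. Other operating systems, sandboxes, or a different option level may produce CSCOV_UNKNOWN or a different error, so treat the result as a hint only.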
2026-03-13  udp: Remove partial csum code in TX.  Kuniyuki Iwashima
UDP TX paths also have some code for UDP-Lite partial checksum:

  * udplite_csum() in udp_send_skb() and udp_v6_send_skb()
  * udplite_getfrag() in udp_sendmsg() and udpv6_sendmsg()

Let's remove such code.

Now, we can use IPPROTO_UDP directly instead of sk->sk_protocol or fl6->flowi6_proto for csum_tcpudp_magic() and csum_ipv6_magic().

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-9-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
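As a rough user-space analogue of why the protocol number can become a constant, the sketch below computes a full-coverage IPv4 UDP checksum over the pseudo-header with the protocol fixed at 17 (IPPROTO_UDP). It follows the RFC 768 pseudo-header layout and is not the kernel's csum_tcpudp_magic(); sum16() and udp4_checksum() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROTO_UDP 17 /* same value as IPPROTO_UDP */

/* Add the buffer to sum as big-endian 16-bit words; an odd trailing
 * byte is treated as the high byte of a zero-padded word. */
static uint32_t sum16(const uint8_t *buf, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)buf[i] << 8 | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    return sum;
}

/* saddr/daddr: IPv4 addresses in network byte order.
 * udp: UDP header plus payload, with the checksum field set to zero.
 * len: total UDP length in bytes (header included). */
static uint16_t udp4_checksum(const uint8_t saddr[4], const uint8_t daddr[4],
                              const uint8_t *udp, size_t len)
{
    uint32_t sum = 0;

    assert(len >= 8 && len <= 0xffff);

    /* Pseudo-header: addresses, zero byte + protocol, UDP length. */
    sum = sum16(saddr, 4, sum);
    sum = sum16(daddr, 4, sum);
    sum += PROTO_UDP; /* no per-socket protocol lookup is needed */
    sum += (uint32_t)len;

    /* Full coverage: the whole datagram is always summed. */
    sum = sum16(udp, len, sum);

    /* Fold the carries and take the ones' complement. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    sum = ~sum & 0xffff;

    /* A computed zero is transmitted as all ones (RFC 768). */
    return sum ? (uint16_t)sum : 0xffff;
}
```

For a 10-byte datagram from 192.168.0.1 port 0x1234 to 192.168.0.2 port 0x5678 carrying "hi", this yields 0xad70. With UDP-Lite gone, neither the protocol number nor the covered length varies per socket, which is the simplification the message describes.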
2026-03-13  udp: Remove partial csum code in RX.  Kuniyuki Iwashima
UDP-Lite supports partial checksums, and the coverage is stored in the position of the length field of struct udphdr.

In RX paths, udp4_csum_init() / udp6_csum_init() save the value in UDP_SKB_CB(skb)->cscov and set UDP_SKB_CB(skb)->partial_cov to 1 if the coverage is not full.

The subsequent processing diverges depending on the value, but such paths are now dead.

Also, these functions have some code guarded for UDP:

  * udp_unicast_rcv_skb() / udp6_unicast_rcv_skb()
  * __udp4_lib_rcv() and __udp6_lib_rcv()

Let's remove the partial csum code and the unnecessary guard for UDP-Lite in RX.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260311052020.1213705-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
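For readers unfamiliar with the RX logic being removed, the sketch below models how a coverage value read from the length-field position could be turned into cscov and partial_cov. The validation rules follow my reading of RFC 3828 (0 means full coverage, 1 to 7 and values beyond the packet length are invalid). struct cov_result and udplite_parse_coverage() are hypothetical names, and the kernel's udp4_csum_init() / udp6_csum_init() may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the two UDP_SKB_CB() fields named in the message. */
struct cov_result {
    size_t cscov;    /* number of bytes covered by the checksum */
    int partial_cov; /* 1 if coverage is shorter than the packet */
};

/* hdr: the 8-byte header; bytes 4-5 hold the coverage where plain UDP
 * stores the length. pkt_len: header plus payload, in bytes.
 * Returns 0 on success, -1 if the coverage value is invalid. */
static int udplite_parse_coverage(const uint8_t *hdr, size_t pkt_len,
                                  struct cov_result *out)
{
    size_t cscov;

    assert(hdr != NULL && out != NULL);

    if (pkt_len < 8)
        return -1; /* too short to hold a header */

    cscov = (size_t)hdr[4] << 8 | hdr[5];
    if (cscov == 0)
        cscov = pkt_len; /* zero means the whole packet is covered */
    else if (cscov < 8 || cscov > pkt_len)
        return -1; /* must cover the header and fit in the packet */

    out->cscov = cscov;
    out->partial_cov = cscov < pkt_len;
    return 0;
}
```

For plain UDP the same two bytes are the datagram length, so coverage is always full and a partial_cov flag would always be 0. That is why the branches that test it are dead once UDP-Lite is removed.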