Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2025-09-26
1) Fix field-spanning memcpy warning in AH output.
From Charalampos Mitrodimas.
2) Replace the strcpy() calls for alg_name by strscpy().
From Miguel García.
* tag 'ipsec-next-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: xfrm_user: use strscpy() for alg_name
net: ipv6: fix field-spanning memcpy warning in AH output
====================
Link: https://patch.msgid.link/20250926053025.2242061-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.17-rc8).
Conflicts:
drivers/net/can/spi/hi311x.c
6b6968084721 ("can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled")
27ce71e1ce81 ("net: WQ_PERCPU added to alloc_workqueue users")
https://lore.kernel.org/72ce7599-1b5b-464a-a5de-228ff9724701@kernel.org
net/smc/smc_loopback.c
drivers/dibs/dibs_loopback.c
a35c04de2565 ("net/smc: fix warning in smc_rx_splice() when calling get_page()")
cc21191b584c ("dibs: Move data path to dibs layer")
https://lore.kernel.org/74368a5c-48ac-4f8e-a198-40ec1ed3cf5f@kernel.org
Adjacent changes:
drivers/net/dsa/lantiq/lantiq_gswip.c
c0054b25e2f1 ("net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()")
7a1eaef0a791 ("net: dsa: lantiq_gswip: support model-specific mac_select_pcs()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Xiumei reported a regression in IPsec offload tests over xfrmi, where
the traffic for IPv6 over IPv4 tunnels is processed in SW instead of
going through crypto offload, after commit
cc18f482e8b6 ("xfrm: provide common xdo_dev_offload_ok callback
implementation").
Commit cc18f482e8b6 added a generic version of existing checks
attempting to prevent packets with IPv4 options or IPv6 extension
headers from being sent to HW that doesn't support offloading such
packets. The check mistakenly uses x->props.family (the outer family)
to determine the inner packet's family and verify if
options/extensions are present.
In the case of IPv6 over IPv4, the check compares some of the traffic
class bits to the expected no-options ihl value (5). The original
check was introduced in commit 2ac9cfe78223 ("net/mlx5e: IPSec, Add
Innova IPSec offload TX data path"), and then duplicated in the other
drivers. Before commit cc18f482e8b6, the loose check (ihl > 5) passed
because those traffic class bits were not set to a value that
triggered the no-offload codepath. Packets with options/extension
headers that should have been handled in SW went through the offload
path, and were likely dropped by the NIC or incorrectly
processed. Since commit cc18f482e8b6, the check is now strict (ihl !=
5), and in a basic setup (no traffic class configured), all packets go
through the no-offload codepath.
The commits that introduced the incorrect family checks in each driver
are:
2ac9cfe78223 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
8362ea16f69f ("crypto: chcr - ESN for Inline IPSec Tx")
859a497fe80c ("nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer")
32188be805d0 ("cn10k-ipsec: Allow ipsec crypto offload for skb with SA")
[ixgbe/ixgbevf commits are ignored, as that HW does not support tunnel
mode, thus no cross-family setups are possible]
Fixes: cc18f482e8b6 ("xfrm: provide common xdo_dev_offload_ok callback implementation")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Use ARRAY_SIZE(), so that we know the limit at compile time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250905165813.1470708-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
x->id.spi == 0 means "no SPI assigned", but since commit
94f39804d891 ("xfrm: Duplicate SPI Handling"), we now create states
and add them to the byspi list with this value.
__xfrm_state_delete doesn't remove those states from the byspi list,
since they shouldn't be there, and this shows up as a UAF the next
time we go through the byspi list.
Reported-by: syzbot+a25ee9d20d31e483ba7b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a25ee9d20d31e483ba7b
Fixes: 94f39804d891 ("xfrm: Duplicate SPI Handling")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Convert the ->flowic_tos field of struct flowi_common from __u8 to
dscp_t, rename it ->flowic_dscp and propagate these changes to struct
flowi and struct flowi4.
We've had several bugs in the past where ECN bits could interfere with
IPv4 routing, because these bits were not properly cleared when setting
->flowi4_tos. These bugs should be fixed now and the dscp_t type has
been introduced to ensure that variables carrying DSCP values don't
accidentally have any ECN bits set. Several variables and structure
fields have been converted to dscp_t already, but the main IPv4 routing
structure, struct flowi4, is still using a __u8. To avoid any future
regression, this patch converts it to dscp_t.
There are many users to convert at once. Fortunately, around half of
->flowi4_tos users already have a dscp_t value at hand, which they
currently convert to __u8 using inet_dscp_to_dsfield(). For all of
these users, we just need to drop that conversion.
But, although we try to do the __u8 <-> dscp_t conversions at the
boundaries of the network or of user space, some places still store
TOS/DSCP variables as __u8 in core networking code. Those can hardly be
converted either because the data structure is part of UAPI or because
the same variable or field is also used for handling ECN in other parts
of the code. In all of these cases where we don't have a dscp_t
variable at hand, we need to use inet_dsfield_to_dscp() when
interacting with ->flowi4_dscp.
Changes since v1:
* Fix space alignment in __bpf_redirect_neigh_v4() (Ido).
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/29acecb45e911d17446b9a3dbdb1ab7b821ea371.1756128932.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Going forward skb_dst_set will assert that skb dst_entry
is empty during skb_dst_set. skb_dstref_steal is added to reset
existing entry without doing refcnt. Switch to skb_dstref_steal
in __xfrm_route_forward and add a comment on why it's safe
to skip skb_dstref_restore.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250818154032.3173645-3-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace the strcpy() calls that copy the canonical algorithm name into
alg_name with strscpy() to avoid potential overflows and guarantee NULL
termination.
Destination is alg_name in xfrm_algo/xfrm_algo_auth/xfrm_algo_aead
(size CRYPTO_MAX_ALG_NAME).
Tested in QEMU (BusyBox/Alpine rootfs):
- Added ESP AEAD (rfc4106(gcm(aes))) and classic ESP (sha256 + cbc(aes))
- Verified canonical names via ip -d xfrm state
- Checked IPComp negative (unknown algo) and deflate path
Signed-off-by: Miguel García <miguelgarciaroman8@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
This is partial revert of commit d53dda291bbd993a29b84d358d282076e3d01506.
This change causes traffic using GSO with SW crypto running through a
NIC capable of HW offload to no longer get segmented during
validate_xmit_xfrm, and is unrelated to the bonding use case mentioned
in the commit.
Fixes: d53dda291bbd ("xfrm: Remove unneeded device check from validate_xmit_xfrm")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Commit 49431af6c4ef incorrectly assumes that the GSO path is only used
by HW offload, but it's also useful for SW crypto.
This patch re-enables GSO for SW crypto. It's not an exact revert to
preserve the other changes made to xfrm_dev_offload_ok afterwards, but
it reverts all of its effects.
Fixes: 49431af6c4ef ("xfrm: rely on XFRM offload")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
While reverting commit f75a2804da39 ("xfrm: destroy xfrm_state
synchronously on net exit path"), I incorrectly changed
xfrm_state_flush's "proto" argument back to IPSEC_PROTO_ANY. This
reverts some of the changes in commit dbb2483b2a46 ("xfrm: clean up
xfrm protocol checks"), and leads to some states not being removed
when we exit the netns.
Pass 0 instead of IPSEC_PROTO_ANY from both xfrm_state_fini
xfrm6_tunnel_net_exit, so that xfrm_state_flush deletes all states.
Fixes: 2a198bbec691 ("Revert "xfrm: destroy xfrm_state synchronously on net exit path"")
Reported-by: syzbot+6641a61fe0e2e89ae8c5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6641a61fe0e2e89ae8c5
Tested-by: syzbot+6641a61fe0e2e89ae8c5@syzkaller.appspotmail.com
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc8).
Conflicts:
drivers/net/ethernet/microsoft/mana/gdma_main.c
9669ddda18fb ("net: mana: Fix warnings for missing export.h header inclusion")
755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au
Adjacent changes:
drivers/net/ethernet/ti/icssg/icssg_prueth.h
6e86fb73de0f ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG")
ffe8a4909176 ("net: ti: icssg-prueth: Read firmware-names from device tree")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2025-07-23
1) Optimize to hold device only for the asynchronous decryption,
where it is really needed.
From Jianbo Liu.
2) Align our inbund SA lookup to RFC 4301. Only SPI and protocol
should be used for an inbound SA lookup.
From Aakash Kumar S.
3) Skip redundant statistics update for xfrm crypto offload.
From Jianbo Liu.
Please pull or let me know if there are problems.
* tag 'ipsec-next-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Skip redundant statistics update for crypto offload
xfrm: Duplicate SPI Handling
xfrm: hold device only for the asynchronous decryption
====================
Link: https://patch.msgid.link/20250723080402.3439619-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2025-07-23
1) Premption fixes for xfrm_state_find.
From Sabrina Dubroca.
2) Initialize offload path also for SW IPsec GRO. This fixes a
performance regression on SW IPsec offload.
From Leon Romanovsky.
3) Fix IPsec UDP GRO for IKE packets.
From Tobias Brunner,
4) Fix transport header setting for IPcomp after decompressing.
From Fernando Fernandez Mancera.
5) Fix use-after-free when xfrmi_changelink tries to change
collect_md for a xfrm interface.
From Eyal Birger .
6) Delete the special IPcomp x->tunnel state along with the state x
to avoid refcount problems.
From Sabrina Dubroca.
Please pull or let me know if there are problems.
* tag 'ipsec-2025-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
Revert "xfrm: destroy xfrm_state synchronously on net exit path"
xfrm: delete x->tunnel as we delete x
xfrm: interface: fix use-after-free after changing collect_md xfrm interface
xfrm: ipcomp: adjust transport header after decompressing
xfrm: Set transport header to fix UDP GRO handling
xfrm: always initialize offload path
xfrm: state: use a consistent pcpu_id in xfrm_state_find
xfrm: state: initialize state_ptrs earlier in xfrm_state_find
====================
Link: https://patch.msgid.link/20250723075417.3432644-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This reverts commit f75a2804da391571563c4b6b29e7797787332673.
With all states (whether user or kern) removed from the hashtables
during deletion, there's no need for synchronous destruction of
states. xfrm6_tunnel states still need to have been destroyed (which
will be the case when its last user is deleted (not destroyed)) so
that xfrm6_tunnel_free_spi removes it from the per-netns hashtable
before the netns is destroyed.
This has the benefit of skipping one synchronize_rcu per state (in
__xfrm_state_destroy(sync=true)) when we exit a netns.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
The ipcomp fallback tunnels currently get deleted (from the various
lists and hashtables) as the last user state that needed that fallback
is destroyed (not deleted). If a reference to that user state still
exists, the fallback state will remain on the hashtables/lists,
triggering the WARN in xfrm_state_fini. Because of those remaining
references, the fix in commit f75a2804da39 ("xfrm: destroy xfrm_state
synchronously on net exit path") is not complete.
We recently fixed one such situation in TCP due to defered freeing of
skbs (commit 9b6412e6979f ("tcp: drop secpath at the same time as we
currently drop dst")). This can also happen due to IP reassembly: skbs
with a secpath remain on the reassembly queue until netns
destruction. If we can't guarantee that the queues are flushed by the
time xfrm_state_fini runs, there may still be references to a (user)
xfrm_state, preventing the timely deletion of the corresponding
fallback state.
Instead of chasing each instance of skbs holding a secpath one by one,
this patch fixes the issue directly within xfrm, by deleting the
fallback state as soon as the last user state depending on it has been
deleted. Destruction will still happen when the final reference is
dropped.
A separate lockdep class for the fallback state is required since
we're going to lock x->tunnel while x is locked.
Fixes: 9d4139c76905 ("netns xfrm: per-netns xfrm_state_all list")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
In the crypto offload path, every packet is still processed by the
software stack. The state's statistics required for the expiration
check are being updated in software.
However, the code also calls xfrm_dev_state_update_stats(), which
triggers a query to the hardware device to fetch statistics. This
hardware query is redundant and introduces unnecessary performance
overhead.
Skip this call when it's crypto offload (not packet offload) to avoid
the unnecessary hardware access, thereby improving performance.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
collect_md property on xfrm interfaces can only be set on device creation,
thus xfrmi_changelink() should fail when called on such interfaces.
The check to enforce this was done only in the case where the xi was
returned from xfrmi_locate() which doesn't look for the collect_md
interface, and thus the validation was never reached.
Calling changelink would thus errornously place the special interface xi
in the xfrmi_net->xfrmi hash, but since it also exists in the
xfrmi_net->collect_md_xfrmi pointer it would lead to a double free when
the net namespace was taken down [1].
Change the check to use the xi from netdev_priv which is available earlier
in the function to prevent changes in xfrm collect_md interfaces.
[1] resulting oops:
[ 8.516540] kernel BUG at net/core/dev.c:12029!
[ 8.516552] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 8.516559] CPU: 0 UID: 0 PID: 12 Comm: kworker/u80:0 Not tainted 6.15.0-virtme #5 PREEMPT(voluntary)
[ 8.516565] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 8.516569] Workqueue: netns cleanup_net
[ 8.516579] RIP: 0010:unregister_netdevice_many_notify+0x101/0xab0
[ 8.516590] Code: 90 0f 0b 90 48 8b b0 78 01 00 00 48 8b 90 80 01 00 00 48 89 56 08 48 89 32 4c 89 80 78 01 00 00 48 89 b8 80 01 00 00 eb ac 90 <0f> 0b 48 8b 45 00 4c 8d a0 88 fe ff ff 48 39 c5 74 5c 41 80 bc 24
[ 8.516593] RSP: 0018:ffffa93b8006bd30 EFLAGS: 00010206
[ 8.516598] RAX: ffff98fe4226e000 RBX: ffffa93b8006bd58 RCX: ffffa93b8006bc60
[ 8.516601] RDX: 0000000000000004 RSI: 0000000000000000 RDI: dead000000000122
[ 8.516603] RBP: ffffa93b8006bdd8 R08: dead000000000100 R09: ffff98fe4133c100
[ 8.516605] R10: 0000000000000000 R11: 00000000000003d2 R12: ffffa93b8006be00
[ 8.516608] R13: ffffffff96c1a510 R14: ffffffff96c1a510 R15: ffffa93b8006be00
[ 8.516615] FS: 0000000000000000(0000) GS:ffff98fee73b7000(0000) knlGS:0000000000000000
[ 8.516619] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.516622] CR2: 00007fcd2abd0700 CR3: 000000003aa40000 CR4: 0000000000752ef0
[ 8.516625] PKRU: 55555554
[ 8.516627] Call Trace:
[ 8.516632] <TASK>
[ 8.516635] ? rtnl_is_locked+0x15/0x20
[ 8.516641] ? unregister_netdevice_queue+0x29/0xf0
[ 8.516650] ops_undo_list+0x1f2/0x220
[ 8.516659] cleanup_net+0x1ad/0x2e0
[ 8.516664] process_one_work+0x160/0x380
[ 8.516673] worker_thread+0x2aa/0x3c0
[ 8.516679] ? __pfx_worker_thread+0x10/0x10
[ 8.516686] kthread+0xfb/0x200
[ 8.516690] ? __pfx_kthread+0x10/0x10
[ 8.516693] ? __pfx_kthread+0x10/0x10
[ 8.516697] ret_from_fork+0x82/0xf0
[ 8.516705] ? __pfx_kthread+0x10/0x10
[ 8.516709] ret_from_fork_asm+0x1a/0x30
[ 8.516718] </TASK>
Fixes: abc340b38ba2 ("xfrm: interface: support collect metadata mode")
Reported-by: Lonial Con <kongln9170@gmail.com>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
(dst_entry)->obsolete is read locklessly, add corresponding
annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250630121934.3399505-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The issue originates when Strongswan initiates an XFRM_MSG_ALLOCSPI
Netlink message, which triggers the kernel function xfrm_alloc_spi().
This function is expected to ensure uniqueness of the Security Parameter
Index (SPI) for inbound Security Associations (SAs). However, it can
return success even when the requested SPI is already in use, leading
to duplicate SPIs assigned to multiple inbound SAs, differentiated
only by their destination addresses.
This behavior causes inconsistencies during SPI lookups for inbound packets.
Since the lookup may return an arbitrary SA among those with the same SPI,
packet processing can fail, resulting in packet drops.
According to RFC 4301 section 4.4.2 , for inbound processing a unicast SA
is uniquely identified by the SPI and optionally protocol.
Reproducing the Issue Reliably:
To consistently reproduce the problem, restrict the available SPI range in
charon.conf : spi_min = 0x10000000 spi_max = 0x10000002
This limits the system to only 2 usable SPI values.
Next, create more than 2 Child SA. each using unique pair of src/dst address.
As soon as the 3rd Child SA is initiated, it will be assigned a duplicate
SPI, since the SPI pool is already exhausted.
With a narrow SPI range, the issue is consistently reproducible.
With a broader/default range, it becomes rare and unpredictable.
Current implementation:
xfrm_spi_hash() lookup function computes hash using daddr, proto, and family.
So if two SAs have the same SPI but different destination addresses, then
they will:
a. Hash into different buckets
b. Be stored in different linked lists (byspi + h)
c. Not be seen in the same hlist_for_each_entry_rcu() iteration.
As a result, the lookup will result in NULL and kernel allows that Duplicate SPI
Proposed Change:
xfrm_state_lookup_spi_proto() does a truly global search - across all states,
regardless of hash bucket and matches SPI and proto.
Signed-off-by: Aakash Kumar S <saakashkumar@marvell.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
The skb transport header pointer needs to be adjusted by network header
pointer plus the size of the ipcomp header.
This shows up when running traffic over ipcomp using transport mode.
After being reinjected, packets are dropped because the header isn't
adjusted properly and some checks can be triggered. E.g the skb is
mistakenly considered as IP fragmented packet and later dropped.
kworker/30:1-mm 443 [030] 102.055250: skb:kfree_skb:skbaddr=0xffff8f104aa3ce00 rx_sk=(
ffffffff8419f1f4 sk_skb_reason_drop+0x94 ([kernel.kallsyms])
ffffffff8419f1f4 sk_skb_reason_drop+0x94 ([kernel.kallsyms])
ffffffff84281420 ip_defrag+0x4b0 ([kernel.kallsyms])
ffffffff8428006e ip_local_deliver+0x4e ([kernel.kallsyms])
ffffffff8432afb1 xfrm_trans_reinject+0xe1 ([kernel.kallsyms])
ffffffff83758230 process_one_work+0x190 ([kernel.kallsyms])
ffffffff83758f37 worker_thread+0x2d7 ([kernel.kallsyms])
ffffffff83761cc9 kthread+0xf9 ([kernel.kallsyms])
ffffffff836c3437 ret_from_fork+0x197 ([kernel.kallsyms])
ffffffff836718da ret_from_fork_asm+0x1a ([kernel.kallsyms])
Fixes: eb2953d26971 ("xfrm: ipcomp: Use crypto_acomp interface")
Link: https://bugzilla.suse.com/1244532
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
The dev_hold() on skb->dev during packet reception was originally
added to prevent the device from being released prematurely during
asynchronous decryption operations.
As current hardware can offload decryption, this asynchronous path is
not always utilized. This often results in a pattern of dev_hold()
immediately followed by dev_put() for each packet, creating
unnecessary reference counting overhead detrimental to performance.
This patch optimizes this by skipping the dev_hold() and subsequent
dev_put() when asynchronous decryption is not being performed.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Offload path is used for GRO with SW IPsec, and not just for HW
offload. So initialize it anyway.
Fixes: 585b64f5a620 ("xfrm: delay initialization of offload path till its actually requested")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Closes: https://lore.kernel.org/all/aEGW_5HfPqU1rFjl@krikkit
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
|
|
If we get preempted during xfrm_state_find, we could run
xfrm_state_look_at using a different pcpu_id than the one
xfrm_state_find saw. This could lead to ignoring states that should
have matched, and triggering acquires on a CPU that already has a pcpu
state.
xfrm_state_find starts on CPU1
pcpu_id = 1
lookup starts
<preemption, we're now on CPU2>
xfrm_state_look_at pcpu_id = 2
finds a state
found:
best->pcpu_num != pcpu_id (2 != 1)
if (!x && !error && !acquire_in_progress) {
...
xfrm_state_alloc
xfrm_init_tempstate
...
This can be avoided by passing the original pcpu_id down to all
xfrm_state_look_at() calls.
Also switch to raw_smp_processor_id, disabling preempting just to
re-enable it immediately doesn't really make sense.
Fixes: 1ddf9916ac09 ("xfrm: Add support for per cpu xfrm state handling.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
In case of preemption, xfrm_state_look_at will find a different
pcpu_id and look up states for that other CPU. If we matched a state
for CPU2 in the state_cache while the lookup started on CPU1, we will
jump to "found", but the "best" state that we got will be ignored and
we will enter the "acquire" block. This block uses state_ptrs, which
isn't initialized at this point.
Let's initialize state_ptrs just after taking rcu_read_lock. This will
also prevent a possible misuse in the future, if someone adjusts this
function.
Reported-by: syzbot+7ed9d47e15e88581dc5b@syzkaller.appspotmail.com
Fixes: e952837f3ddb ("xfrm: state: fix out-of-bounds read during lookup")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
1) Remove some unnecessary strscpy_pad() size arguments.
From Thorsten Blum.
2) Correct use of xso.real_dev on bonding offloads.
Patchset from Cosmin Ratiu.
3) Add hardware offload configuration to XFRM_MSG_MIGRATE.
From Chiachang Wang.
4) Refactor migration setup during cloning. This was
done after the clone was created. Now it is done
in the cloning function itself.
From Chiachang Wang.
5) Validate assignment of maximal possible SEQ number.
Prevent from setting to the maximum sequrnce number
as this would cause for traffic drop.
From Leon Romanovsky.
6) Prevent configuration of interface index when offload
is used. Hardware can't handle this case.i
From Leon Romanovsky.
7) Always use kfree_sensitive() for SA secret zeroization.
From Zilin Guan.
ipsec-next-2025-05-23
* tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: use kfree_sensitive() for SA secret zeroization
xfrm: prevent configuration of interface index when offload is used
xfrm: validate assignment of maximal possible SEQ number
xfrm: Refactor migration setup during the cloning process
xfrm: Migrate offload configuration
bonding: Fix multiple long standing offload races
bonding: Mark active offloaded xfrm_states
xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
xfrm: Remove unneeded device check from validate_xmit_xfrm
xfrm: Use xdo.dev instead of xdo.real_dev
net/mlx5: Avoid using xso.real_dev unnecessarily
xfrm: Remove unnecessary strscpy_pad() size arguments
====================
Link: https://patch.msgid.link/20250523075611.3723340-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.15-rc8).
Conflicts:
80f2ab46c2ee ("irdma: free iwdev->rf after removing MSI-X")
4bcc063939a5 ("ice, irdma: fix an off by one in error handling code")
c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers")
https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au
No extra adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
High-level copy_to_user_* APIs already redact SA secret fields when
redaction is enabled, but the state teardown path still freed aead,
aalg and ealg structs with plain kfree(), which does not clear memory
before deallocation. This can leave SA keys and other confidential
data in memory, risking exposure via post-free vulnerabilities.
Since this path is outside the packet fast path, the cost of zeroization
is acceptable and prevents any residual key material. This patch
replaces those kfree() calls unconditionally with kfree_sensitive(),
which zeroizes the entire buffer before freeing.
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
nat_keepalive_sk_ipv[46] is a per-CPU variable and relies on disabled BH
for its locking. Without per-CPU locking in local_bh_disable() on
PREEMPT_RT this data structure requires explicit locking.
Use sock_bh_locked which has a sock pointer and a local_lock_t. Use
local_lock_nested_bh() for locking. This change adds only lockdep
coverage and does not alter the functional behaviour for !PREEMPT_RT.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20250512092736.229935-7-bigeasy@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Both packet and crypto offloads perform decryption while packet is
arriving to the HW from the wire. It means that there is no possible
way to perform lookup on XFRM if_id as it can't be set to be "before' HW.
So instead of silently ignore this configuration, let's warn users about
misconfiguration.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Users can set any seq/seq_hi/oseq/oseq_hi values. The XFRM core code
doesn't prevent from them to set even 0xFFFFFFFF, however this value
will cause for traffic drop.
Is is happening because SEQ numbers here mean that packet with such
number was processed and next number should be sent on the wire. In this
case, the next number will be 0, and it means overflow which causes to
(expected) packet drops.
While it can be considered as misconfiguration and handled by XFRM
datapath in the same manner as any other SEQ number, let's add
validation to easy for packet offloads implementations which need to
configure HW with next SEQ to send and not with current SEQ like it is
done in core code.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Prior to this patch, the mark is sanitized (applying the state's mask to
the state's value) only on inserts when checking if a conflicting XFRM
state or policy exists.
We discovered in Cilium that this same sanitization does not occur
in the hot-path __xfrm_state_lookup. In the hot-path, the sk_buff's mark
is simply compared to the state's value:
if ((mark & x->mark.m) != x->mark.v)
continue;
Therefore, users can define unsanitized marks (ex. 0xf42/0xf00) which will
never match any packet.
This commit updates __xfrm_state_insert and xfrm_policy_insert to store
the sanitized marks, thus removing this footgun.
This has the side effect of changing the ip output, as the
returned mark will have the mask applied to it when printed.
Fixes: 3d6acfa7641f ("xfrm: SA lookups with mark")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com>
Co-developed-by: Louis DeLosSantos <louis.delos.devel@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
ipcomp_post_acomp currently drops all frags (via pskb_trim_unique(skb,
0)), and then subtracts the old skb->data_len from truesize. This
adjustment has already be done during trimming (in skb_condense), so
we don't need to do it again.
This shows up for example when running fragmented traffic over ipcomp,
we end up hitting the WARN_ON_ONCE in skb_try_coalesce.
Fixes: eb2953d26971 ("xfrm: ipcomp: Use crypto_acomp interface")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Previously, migration related setup, such as updating family,
destination address, and source address, was performed after
the clone was created in `xfrm_state_migrate`. This change
moves this setup into the cloning function itself, improving
code locality and reducing redundancy.
The `xfrm_state_clone_and_setup` function now conditionally
applies the migration parameters from struct xfrm_migrate
if it is provided. This allows the function to be used both
for simple cloning and for cloning with migration setup.
Test: Tested with kernel test in the Android tree located
in https://android.googlesource.com/kernel/tests/
The xfrm_tunnel_test.py under the tests folder in
particular.
Signed-off-by: Chiachang Wang <chiachangwang@google.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Add hardware offload configuration to XFRM_MSG_MIGRATE
using an option netlink attribute XFRMA_OFFLOAD_DEV.
In the existing xfrm_state_migrate(), the xfrm_init_state()
is called assuming no hardware offload by default. Even the
original xfrm_state is configured with offload, the setting will
be reset. If the device is configured with hardware offload,
it's reasonable to allow the device to maintain its hardware
offload mode. But the device will end up with offload disabled
after receiving a migration event when the device migrates the
connection from one netdev to another one.
The devices that support migration may work with different
underlying networks, such as mobile devices. The hardware setting
should be forwarded to the different netdev based on the
migration configuration. This change provides the capability
for user space to migrate from one netdev to another.
Test: Tested with kernel test in the Android tree located
in https://android.googlesource.com/kernel/tests/
The xfrm_tunnel_test.py under the tests folder in
particular.
Signed-off-by: Chiachang Wang <chiachangwang@google.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Previously, device driver IPSec offload implementations would fall into
two categories:
1. Those that used xso.dev to determine the offload device.
2. Those that used xso.real_dev to determine the offload device.
The first category didn't work with bonding while the second did.
In a non-bonding setup the two pointers are the same.
This commit adds explicit pointers for the offload netdevice to
.xdo_dev_state_add() / .xdo_dev_state_delete() / .xdo_dev_state_free()
which eliminates the confusion and allows drivers from the first
category to work with bonding.
xso.real_dev now becomes a private pointer managed by the bonding
driver.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
validate_xmit_xfrm checks whether a packet already passed through it on
the master device (xso.dev) and skips processing the skb again on the
slave device (xso.real_dev).
This check was added in commit [1] to avoid tx packets on a bond device
pass through xfrm twice and get two sets of headers, but the check was
soon obsoleted by commit [2], which was added around the same time to
fix a similar but unrelated problem. Commit [3] set XFRM_XMIT only when
packets are hw offloaded.
xso.dev is usually equal to xso.real_dev, unless bonding is used, in
which case the bonding driver uses xso.real_dev to manage offloaded xfrm
states.
Since commit [3], the check added in commit [1] is unused on all cases,
since packets going through validate_xmit_xfrm twice bail out on the
check added in commit [2]. Here's a breakdown of relevant scenarios:
1. ESP offload off: validate_xmit_xfrm returns early on !xo.
2. ESP offload on, no bond: skb->dev == xso.real_dev == xso.dev.
3. ESP offload on, bond, xs on bond dev: 1st pass adds XFRM_XMIT, 2nd
pass returns early on XFRM_XMIT.
3. ESP offload on, bond, xs on slave dev: 1st pass returns early on
!xo, 2nd pass adds XFRM_XMIT.
4. ESP offload on, bond, xs on both bond AND slave dev: only 1 offload
possible in secpath. Either 1st pass adds XFRM_XMIT and 2nd pass returns
early on XFRM_XMIT, or 1st pass is sw and returns early on !xo.
6. ESP offload on, crypto fallback triggered in esp_xmit/esp6_xmit: 1st
pass does sw crypto & secpath_reset, 2nd pass returns on !xo.
This commit removes the unnecessary check, so xso.real_dev becomes what
it is in practice: a private field managed by bonding driver.
The check immediately below that can be simplified as well.
[1] commit 272c2330adc9 ("xfrm: bail early on slave pass over skb")
[2] commit 94579ac3f6d0 ("xfrm: Fix double ESP trailer insertion in
IPsec crypto offload.")
[3] commit c7dbf4c08868 ("xfrm: Provide private skb extensions for
segmented and hw offloaded ESP packets")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
The policy offload struct was reused from the state offload and
real_dev was copied from dev, but it was never set to anything else.
Simplify the code by always using xdo.dev for policies.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
xfrmi_exit_batch_rtnl() iterates the dying netns list and
performs the same operations for each.
Let's use ->exit_rtnl().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250411205258.63164-9-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current scheme for caching the encap socket can lead to reference
leaks when we try to delete the netns.
The reference chain is: xfrm_state -> enacp_sk -> netns
Since the encap socket is a userspace socket, it holds a reference on
the netns. If we delete the espintcp state (through flush or
individual delete) before removing the netns, the reference on the
socket is dropped and the netns is correctly deleted. Otherwise, the
netns may not be reachable anymore (if all processes within the ns
have terminated), so we cannot delete the xfrm state to drop its
reference on the socket.
This patch results in a small (~2% in my tests) performance
regression.
A GC-type mechanism could be added for the socket cache, to clear
references if the state hasn't been used "recently", but it's a lot
more complex than just not caching the socket.
Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
A few error paths are missing a kfree_skb.
Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
If the destination buffer has a fixed length, strscpy_pad()
automatically determines its size using sizeof() when the argument is
omitted. This makes the explicit sizeof() calls unnecessary - remove
them.
No functional changes intended.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Remove legacy compression interface
- Improve scatterwalk API
- Add request chaining to ahash and acomp
- Add virtual address support to ahash and acomp
- Add folio support to acomp
- Remove NULL dst support from acomp
Algorithms:
- Library options are fuly hidden (selected by kernel users only)
- Add Kerberos5 algorithms
- Add VAES-based ctr(aes) on x86
- Ensure LZO respects output buffer length on compression
- Remove obsolete SIMD fallback code path from arm/ghash-ce
Drivers:
- Add support for PCI device 0x1134 in ccp
- Add support for rk3588's standalone TRNG in rockchip
- Add Inside Secure SafeXcel EIP-93 crypto engine support in eip93
- Fix bugs in tegra uncovered by multi-threaded self-test
- Fix corner cases in hisilicon/sec2
Others:
- Add SG_MITER_LOCAL to sg miter
- Convert ubifs, hibernate and xfrm_ipcomp from legacy API to acomp"
* tag 'v6.15-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (187 commits)
crypto: testmgr - Add multibuffer acomp testing
crypto: acomp - Fix synchronous acomp chaining fallback
crypto: testmgr - Add multibuffer hash testing
crypto: hash - Fix synchronous ahash chaining fallback
crypto: arm/ghash-ce - Remove SIMD fallback code path
crypto: essiv - Replace memcpy() + NUL-termination with strscpy()
crypto: api - Call crypto_alg_put in crypto_unregister_alg
crypto: scompress - Fix incorrect stream freeing
crypto: lib/chacha - remove unused arch-specific init support
crypto: remove obsolete 'comp' compression API
crypto: compress_null - drop obsolete 'comp' implementation
crypto: cavium/zip - drop obsolete 'comp' implementation
crypto: zstd - drop obsolete 'comp' implementation
crypto: lzo - drop obsolete 'comp' implementation
crypto: lzo-rle - drop obsolete 'comp' implementation
crypto: lz4hc - drop obsolete 'comp' implementation
crypto: lz4 - drop obsolete 'comp' implementation
crypto: deflate - drop obsolete 'comp' implementation
crypto: 842 - drop obsolete 'comp' implementation
crypto: nx - Migrate to scomp API
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Continue Netlink conversions to per-namespace RTNL lock
(IPv4 routing, routing rules, routing next hops, ARP ioctls)
- Continue extending the use of netdev instance locks. As a driver
opt-in protect queue operations and (in due course) ethtool
operations with the instance lock and not RTNL lock.
- Support collecting TCP timestamps (data submitted, sent, acked) in
BPF, allowing for transparent (to the application) and lower
overhead tracking of TCP RPC performance.
- Tweak existing networking Rx zero-copy infra to support zero-copy
Rx via io_uring.
- Optimize MPTCP performance in single subflow mode by 29%.
- Enable GRO on packets which went thru XDP CPU redirect (were queued
for processing on a different CPU). Improving TCP stream
performance up to 2x.
- Improve performance of contended connect() by 200% by searching for
an available 4-tuple under RCU rather than a spin lock. Bring an
additional 229% improvement by tweaking hash distribution.
- Avoid unconditionally touching sk_tsflags on RX, improving
performance under UDP flood by as much as 10%.
- Avoid skb_clone() dance in ping_rcv() to improve performance under
ping flood.
- Avoid FIB lookup in netfilter if socket is available, 20% perf win.
- Rework network device creation (in-kernel) API to more clearly
identify network namespaces and their roles. There are up to 4
namespace roles but we used to have just 2 netns pointer arguments,
interpreted differently based on context.
- Use sysfs_break_active_protection() instead of trylock to avoid
deadlocks between unregistering objects and sysfs access.
- Add a new sysctl and sockopt for capping max retransmit timeout in
TCP.
- Support masking port and DSCP in routing rule matches.
- Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
- Support specifying at what time packet should be sent on AF_XDP
sockets.
- Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
users.
- Add Netlink YAML spec for WiFi (nl80211) and conntrack.
- Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
which only need to be exported when IPv6 support is built as a
module.
- Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
normal bridging.
- Allow users to specify source port range for GENEVE tunnels.
- netconsole: allow attaching kernel release, CPU ID and task name to
messages as metadata
Driver API:
- Continue rework / fixing of Energy Efficient Ethernet (EEE) across
the SW layers. Delegate the responsibilities to phylink where
possible. Improve its handling in phylib.
- Support symmetric OR-XOR RSS hashing algorithm.
- Support tracking and preserving IRQ affinity by NAPI itself.
- Support loopback mode speed selection for interface selftests.
Device drivers:
- Remove the IBM LCS driver for s390
- Remove the sb1000 cable modem driver
- Add support for SFP module access over SMBus
- Add MCTP transport driver for MCTP-over-USB
- Enable XDP metadata support in multiple drivers
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- add PCIe TLP Processing Hints (TPH) support for new AMD
platforms
- support dumping RoCE queue state for debug
- opt into instance locking
- Intel (100G, ice, idpf):
- ice: rework MSI-X IRQ management and distribution
- ice: support for E830 devices
- iavf: add support for Rx timestamping
- iavf: opt into instance locking
- nVidia/Mellanox:
- mlx4: use page pool memory allocator for Rx
- mlx5: support for one PTP device per hardware clock
- mlx5: support for 200Gbps per-lane link modes
- mlx5: move IPSec policy check after decryption
- AMD/Solarflare:
- support FW flashing via devlink
- Cisco (enic):
- use page pool memory allocator for Rx
- enable 32, 64 byte CQEs
- get max rx/tx ring size from the device
- Meta (fbnic):
- support flow steering and RSS configuration
- report queue stats
- support TCP segmentation
- support IRQ coalescing
- support ring size configuration
- Marvell/Cavium:
- support AF_XDP
- Wangxun:
- support for PTP clock and timestamping
- Huawei (hibmcge):
- checksum offload
- add more statistics
- Ethernet virtual:
- VirtIO net:
- aggressively suppress Tx completions, improve perf by 96%
with 1 CPU and 55% with 2 CPUs
- expose NAPI to IRQ mapping and persist NAPI settings
- Google (gve):
- support XDP in DQO RDA Queue Format
- opt into instance locking
- Microsoft vNIC:
- support BIG TCP
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- cleanup Tx and Tx clock setting and other link-focused
cleanups
- enable SGMII and 2500BASEX mode switching for Intel platforms
- support Sophgo SG2044
- Broadcom switches (b53):
- support for BCM53101
- TI:
- iep: add perout configuration support
- icssg: support XDP
- Cadence (macb):
- implement BQL
- Xilinx (axinet):
- support dynamic IRQ moderation and changing coalescing at
runtime
- implement BQL
- report standard stats
- MediaTek:
- support phylink managed EEE
- Intel:
- igc: don't restart the interface on every XDP program change
- RealTek (r8169):
- support reading registers of internal PHYs directly
- increase max jumbo packet size on RTL8125/RTL8126
- Airoha:
- support for RISC-V NPU packet processing unit
- enable scatter-gather and support MTU up to 9kB
- Tehuti (tn40xx):
- support cards with TN4010 MAC and an Aquantia AQR105 PHY
- Ethernet PHYs:
- support for TJA1102S, TJA1121
- dp83tg720: add randomized polling intervals for link detection
- dp83822: support changing the transmit amplitude voltage
- support for LEDs on 88q2xxx
- CAN:
- canxl: support Remote Request Substitution bit access
- flexcan: add S32G2/S32G3 SoC
- WiFi:
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
- batman-adv: add support for jumbo frames
- WiFi drivers:
- RealTek (rtw88):
- support RTL8814AE and RTL8814AU
- RealTek (rtw89):
- switch using wiphy_lock and wiphy_work
- add BB context to manipulate two PHY as preparation of MLO
- improve BT-coexistence mechanism to play A2DP smoothly
- Intel (iwlwifi):
- add new iwlmld sub-driver for latest HW/FW combinations
- MediaTek (mt76):
- preparation for mt7996 Multi-Link Operation (MLO) support
- Qualcomm/Atheros (ath12k):
- continued work on MLO
- Silabs (wfx):
- Wake-on-WLAN support
- Bluetooth:
- add support for skb TX SND/COMPLETION timestamping
- hci_core: enable buffer flow control for SCO/eSCO
- coredump: log devcd dumps into the monitor
- Bluetooth drivers:
- intel: add support to configure TX power
- nxp: handle bootloader error during cmd5 and cmd7"
* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
unix: fix up for "apparmor: add fine grained af_unix mediation"
mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
net: usb: asix: ax88772: Increase phy_name size
net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
net: libwx: fix Tx L4 checksum
net: libwx: fix Tx descriptor content for some tunnel packets
atm: Fix NULL pointer dereference
net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
net: phy: aquantia: add essential functions to aqr105 driver
net: phy: aquantia: search for firmware-name in fwnode
net: phy: aquantia: add probe function to aqr105 for firmware loading
net: phy: Add swnode support to mdiobus_scan
gve: add XDP DROP and PASS support for DQ
gve: update XDP allocation path support RX buffer posting
gve: merge packet buffer size fields
gve: update GQ RX to use buf_size
gve: introduce config-based allocation for XDP
gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"A treewide hrtimer timer cleanup
hrtimers are initialized with hrtimer_init() and a subsequent store to
the callback pointer. This turned out to be suboptimal for the
upcoming Rust integration and is obviously a silly implementation to
begin with.
This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
with hrtimer_setup(T, cb);
The conversion was done with Coccinelle and a few manual fixups.
Once the conversion has completely landed in mainline, hrtimer_init()
will be removed and the hrtimer::function becomes a private member"
* tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
wifi: rt2x00: Switch to use hrtimer_update_function()
io_uring: Use helper function hrtimer_update_function()
serial: xilinx_uartps: Use helper function hrtimer_update_function()
ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup()
RDMA: Switch to use hrtimer_setup()
virtio: mem: Switch to use hrtimer_setup()
drm/vmwgfx: Switch to use hrtimer_setup()
drm/xe/oa: Switch to use hrtimer_setup()
drm/vkms: Switch to use hrtimer_setup()
drm/msm: Switch to use hrtimer_setup()
drm/i915/request: Switch to use hrtimer_setup()
drm/i915/uncore: Switch to use hrtimer_setup()
drm/i915/pmu: Switch to use hrtimer_setup()
drm/i915/perf: Switch to use hrtimer_setup()
drm/i915/gvt: Switch to use hrtimer_setup()
drm/i915/huc: Switch to use hrtimer_setup()
drm/amdgpu: Switch to use hrtimer_setup()
stm class: heartbeat: Switch to use hrtimer_setup()
i2c: Switch to use hrtimer_setup()
iio: Switch to use hrtimer_setup()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2025-03-24
1) Prevent setting high order sequence number bits input in
non-ESN mode. From Leon Romanovsky.
2) Support PMTU handling in tunnel mode for packet offload.
From Leon Romanovsky.
3) Make xfrm_state_lookup_byaddr lockless.
From Florian Westphal.
4) Remove unnecessary NULL check in xfrm_lookup_with_ifid().
From Dan Carpenter.
* tag 'ipsec-next-2025-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Remove unnecessary NULL check in xfrm_lookup_with_ifid()
xfrm: state: make xfrm_state_lookup_byaddr lockless
xfrm: check for PMTU in tunnel mode for packet offload
xfrm: provide common xdo_dev_offload_ok callback implementation
xfrm: rely on XFRM offload
xfrm: simplify SA initialization routine
xfrm: delay initialization of offload path till its actually requested
xfrm: prevent high SEQ input in non-ESN mode
====================
Link: https://patch.msgid.link/20250324061855.4116819-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace the legacy comperssion interface with the new acomp
interface. This is the first user to make full user of the
asynchronous nature of acomp by plugging into the existing xfrm
resume interface.
As a result of SG support by acomp, the linear scratch buffer
in ipcomp can be removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
If a malformed packet is received there may not be enough data
to pull. This isn't a problem in practice because the caller
has already done xfrm_parse_spi which in effect does the same
thing.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|