<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/net/xfrm, branch master</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>xfrm: espintcp: do not reuse an in-progress partial send</title>
<updated>2026-06-05T11:20:03+00:00</updated>
<author>
<name>Wyatt Feng</name>
<email>bronzed_45_vested@icloud.com</email>
</author>
<published>2026-06-02T16:46:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c381039ade2e161ab08c0eda73c4f8b9a7115928'/>
<id>c381039ade2e161ab08c0eda73c4f8b9a7115928</id>
<content type='text'>
espintcp keeps a single in-flight transmit in ctx-&gt;partial.
Before building a new sk_msg, espintcp_sendmsg() first tries to flush
that state through espintcp_push_msgs().

For blocking callers, espintcp_push_msgs() may return success even when
the previous partial send is still pending. espintcp_sendmsg() would
then reinitialize emsg-&gt;skmsg and reuse ctx-&gt;partial while the old
transfer still owns that state.

Do not rebuild the send message when ctx-&gt;partial is still in progress.
If espintcp_push_msgs() returns with emsg-&gt;len still set, fail the new
send instead of overwriting the live partial state.

This is a memory-safety fix: reusing the live partial-send state can
leave a stale offset attached to a new sk_msg and lead to an out-of-
bounds read in the send path.

tcp_sendmsg_locked() already handles waiting for send buffer memory, so
the fix here is just to preserve espintcp's one-message-at-a-time
transmit state.

Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
Cc: stable@kernel.org
Reported-by: Yuan Tan &lt;yuantan098@gmail.com&gt;
Reported-by: Yifan Wu &lt;yifanwucs@gmail.com&gt;
Reported-by: Juefei Pu &lt;tomapufckgml@gmail.com&gt;
Reported-by: Zhengchuan Liang &lt;zcliangcn@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Assisted-by: Codex:GPT-5.4
Signed-off-by: Wyatt Feng &lt;bronzed_45_vested@icloud.com&gt;
Signed-off-by: Ren Wei &lt;n05ec@lzu.edu.cn&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
espintcp keeps a single in-flight transmit in ctx-&gt;partial.
Before building a new sk_msg, espintcp_sendmsg() first tries to flush
that state through espintcp_push_msgs().

For blocking callers, espintcp_push_msgs() may return success even when
the previous partial send is still pending. espintcp_sendmsg() would
then reinitialize emsg-&gt;skmsg and reuse ctx-&gt;partial while the old
transfer still owns that state.

Do not rebuild the send message when ctx-&gt;partial is still in progress.
If espintcp_push_msgs() returns with emsg-&gt;len still set, fail the new
send instead of overwriting the live partial state.

This is a memory-safety fix: reusing the live partial-send state can
leave a stale offset attached to a new sk_msg and lead to an out-of-
bounds read in the send path.

tcp_sendmsg_locked() already handles waiting for send buffer memory, so
the fix here is just to preserve espintcp's one-message-at-a-time
transmit state.

Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)")
Cc: stable@kernel.org
Reported-by: Yuan Tan &lt;yuantan098@gmail.com&gt;
Reported-by: Yifan Wu &lt;yifanwucs@gmail.com&gt;
Reported-by: Juefei Pu &lt;tomapufckgml@gmail.com&gt;
Reported-by: Zhengchuan Liang &lt;zcliangcn@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Assisted-by: Codex:GPT-5.4
Signed-off-by: Wyatt Feng &lt;bronzed_45_vested@icloud.com&gt;
Signed-off-by: Ren Wei &lt;n05ec@lzu.edu.cn&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: iptfs: fix ABBA deadlock in iptfs_destroy_state()</title>
<updated>2026-06-05T11:06:08+00:00</updated>
<author>
<name>Tristan Madani</name>
<email>tristmd@gmail.com</email>
</author>
<published>2026-06-02T17:16:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c8a8a75b733467b00c08b91a38dbaf207a08ed6e'/>
<id>c8a8a75b733467b00c08b91a38dbaf207a08ed6e</id>
<content type='text'>
iptfs_destroy_state() calls hrtimer_cancel() while holding a spinlock
that the timer callback also acquires, leading to an ABBA deadlock on
SMP systems.

For the output timer (iptfs_timer):
  - iptfs_destroy_state() holds x-&gt;lock, calls hrtimer_cancel()
  - iptfs_delay_timer() callback takes x-&gt;lock

For the drop timer (drop_timer):
  - iptfs_destroy_state() holds drop_lock, calls hrtimer_cancel()
  - iptfs_drop_timer() callback takes drop_lock

Both timers use HRTIMER_MODE_REL_SOFT, so their callbacks run in softirq
context.  When hrtimer_cancel() is called for a soft timer that is
currently executing on another CPU, hrtimer_cancel_wait_running() spins
on softirq_expiry_lock -- the same lock held by the softirq running the
callback.  If the callback is blocked waiting for the spinlock held by
the caller of hrtimer_cancel(), a circular dependency forms:

  CPU 0: holds lock_A -&gt; waits for softirq_expiry_lock
  CPU 1: holds softirq_expiry_lock -&gt; waits for lock_A

Fix by calling hrtimer_cancel() before acquiring the respective locks.
hrtimer_cancel() is safe to call without holding any lock and will wait
for any in-progress callback to complete.  For the output timer, the
lock is still acquired afterwards to drain the packet queue.  For the
drop timer, the lock/unlock pair is removed entirely since it only
existed to serialize with the timer callback, which hrtimer_cancel()
already guarantees.
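
The ordering rule can be modelled in user space. The following is a
minimal Python sketch, not kernel code: a thread stands in for the
soft-timer callback, threading.Lock stands in for x-&gt;lock, and a bounded
join() stands in for hrtimer_cancel(), which in the kernel would wait
with no timeout. The name destroy and the 0.2 s wait are assumptions made
for the illustration.

```python
import threading


def destroy(cancel_before_lock, wait=0.2):
    """Return True if the 'cancel' step was still stuck after `wait` seconds."""
    lock = threading.Lock()  # plays the role of the spinlock shared with the callback

    def timer_callback():
        # The callback needs the same lock as the destroy path.
        with lock:
            pass

    timer = threading.Thread(target=timer_callback)

    if cancel_before_lock:
        # Fixed order: wait for the callback first, take the lock afterwards.
        timer.start()
        timer.join(wait)
        stuck = timer.is_alive()
        with lock:
            pass  # e.g. drain the packet queue under the lock
    else:
        # Buggy order: wait for the callback while holding the lock it wants.
        with lock:
            timer.start()
            timer.join(wait)
            stuck = timer.is_alive()

    timer.join()  # the lock is free here, so the callback can always finish
    return stuck
```

Calling destroy(True) models the fixed ordering and destroy(False) the
original one; only the latter leaves the callback blocked while the
caller waits for it.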

Found by source code audit.

Fixes: 4b3faf610cc6 ("xfrm: iptfs: add new iptfs xfrm mode impl")
Cc: Christian Hopps &lt;chopps@labn.net&gt;
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani &lt;tristan@talencesecurity.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
iptfs_destroy_state() calls hrtimer_cancel() while holding a spinlock
that the timer callback also acquires, leading to an ABBA deadlock on
SMP systems.

For the output timer (iptfs_timer):
  - iptfs_destroy_state() holds x-&gt;lock, calls hrtimer_cancel()
  - iptfs_delay_timer() callback takes x-&gt;lock

For the drop timer (drop_timer):
  - iptfs_destroy_state() holds drop_lock, calls hrtimer_cancel()
  - iptfs_drop_timer() callback takes drop_lock

Both timers use HRTIMER_MODE_REL_SOFT, so their callbacks run in softirq
context.  When hrtimer_cancel() is called for a soft timer that is
currently executing on another CPU, hrtimer_cancel_wait_running() spins
on softirq_expiry_lock -- the same lock held by the softirq running the
callback.  If the callback is blocked waiting for the spinlock held by
the caller of hrtimer_cancel(), a circular dependency forms:

  CPU 0: holds lock_A -&gt; waits for softirq_expiry_lock
  CPU 1: holds softirq_expiry_lock -&gt; waits for lock_A

Fix by calling hrtimer_cancel() before acquiring the respective locks.
hrtimer_cancel() is safe to call without holding any lock and will wait
for any in-progress callback to complete.  For the output timer, the
lock is still acquired afterwards to drain the packet queue.  For the
drop timer, the lock/unlock pair is removed entirely since it only
existed to serialize with the timer callback, which hrtimer_cancel()
already guarantees.

Found by source code audit.

Fixes: 4b3faf610cc6 ("xfrm: iptfs: add new iptfs xfrm mode impl")
Cc: Christian Hopps &lt;chopps@labn.net&gt;
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Tristan Madani &lt;tristan@talencesecurity.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: policy: fix use-after-free on inexact bin in xfrm_policy_bysel_ctx()</title>
<updated>2026-06-04T09:55:22+00:00</updated>
<author>
<name>Sanghyun Park</name>
<email>sanghyun.park.cnu@gmail.com</email>
</author>
<published>2026-06-02T09:49:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7f2d76c9c03257c0782afef9d95321fa04096f60'/>
<id>7f2d76c9c03257c0782afef9d95321fa04096f60</id>
<content type='text'>
Fix the race by pruning the bin while still holding xfrm_policy_lock,
before dropping it. Use __xfrm_policy_inexact_prune_bin() directly since
the lock is already held. The wrapper xfrm_policy_inexact_prune_bin()
becomes unused and is removed.

Race:

  CPU0 (XFRM_MSG_DELPOLICY)           CPU1 (XFRM_MSG_NEWSPDINFO)
  ==========================          ==========================
  xfrm_policy_bysel_ctx():
    spin_lock_bh(xfrm_policy_lock)
    bin = xfrm_policy_inexact_lookup()
    __xfrm_policy_unlink(pol)
    spin_unlock_bh(xfrm_policy_lock)
    xfrm_policy_kill(ret)
    // wide window, lock not held
                                       xfrm_hash_rebuild():
                                         spin_lock_bh(xfrm_policy_lock)
                                         __xfrm_policy_inexact_flush():
                                           kfree_rcu(bin)  // bin freed
                                         spin_unlock_bh(xfrm_policy_lock)
    xfrm_policy_inexact_prune_bin(bin)
    // UAF: bin is freed

Fixes: 6be3b0db6db8 ("xfrm: policy: add inexact policy search tree infrastructure")
Signed-off-by: Sanghyun Park &lt;sanghyun.park.cnu@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix the race by pruning the bin while still holding xfrm_policy_lock,
before dropping it. Use __xfrm_policy_inexact_prune_bin() directly since
the lock is already held. The wrapper xfrm_policy_inexact_prune_bin()
becomes unused and is removed.

Race:

  CPU0 (XFRM_MSG_DELPOLICY)           CPU1 (XFRM_MSG_NEWSPDINFO)
  ==========================          ==========================
  xfrm_policy_bysel_ctx():
    spin_lock_bh(xfrm_policy_lock)
    bin = xfrm_policy_inexact_lookup()
    __xfrm_policy_unlink(pol)
    spin_unlock_bh(xfrm_policy_lock)
    xfrm_policy_kill(ret)
    // wide window, lock not held
                                       xfrm_hash_rebuild():
                                         spin_lock_bh(xfrm_policy_lock)
                                         __xfrm_policy_inexact_flush():
                                           kfree_rcu(bin)  // bin freed
                                         spin_unlock_bh(xfrm_policy_lock)
    xfrm_policy_inexact_prune_bin(bin)
    // UAF: bin is freed

Fixes: 6be3b0db6db8 ("xfrm: policy: add inexact policy search tree infrastructure")
Signed-off-by: Sanghyun Park &lt;sanghyun.park.cnu@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: iptfs: fix use-after-free on first_skb in __input_process_payload</title>
<updated>2026-06-02T10:21:50+00:00</updated>
<author>
<name>Zhenghang Xiao</name>
<email>kipreyyy@gmail.com</email>
</author>
<published>2026-05-26T10:53:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=eb48730bb827d1550401a5d391903f9d90b493c8'/>
<id>eb48730bb827d1550401a5d391903f9d90b493c8</id>
<content type='text'>
__input_process_payload() stores first_skb into xtfs-&gt;ra_newskb under
drop_lock when starting partial reassembly, then unlocks and breaks out
of the processing loop. The post-loop check reads xtfs-&gt;ra_newskb
without the lock to decide whether first_skb is still owned:

    if (first_skb &amp;&amp; first_iplen &amp;&amp; !defer &amp;&amp; first_skb != xtfs-&gt;ra_newskb)

Between spin_unlock and this read, a concurrent CPU running
iptfs_reassem_cont() (or the drop_timer hrtimer) can complete
reassembly, NULL xtfs-&gt;ra_newskb, and free the skb. The check then
evaluates first_skb != NULL as true, and pskb_trim/ip_summed/consume_skb
operate on the freed skb — a use-after-free in skbuff_head_cache.

Replace the unlocked read with a local bool that records whether
first_skb was handed to the reassembly state in the current call. The
flag is set after the existing spin_unlock, before the break, using the
pointer equality that is stable at that point (first_skb == skb iff
first_skb was stored in ra_newskb).

Fixes: 3f3339885fb3 ("xfrm: iptfs: add reusing received skb for the tunnel egress packet")
Signed-off-by: Zhenghang Xiao &lt;kipreyyy@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
__input_process_payload() stores first_skb into xtfs-&gt;ra_newskb under
drop_lock when starting partial reassembly, then unlocks and breaks out
of the processing loop. The post-loop check reads xtfs-&gt;ra_newskb
without the lock to decide whether first_skb is still owned:

    if (first_skb &amp;&amp; first_iplen &amp;&amp; !defer &amp;&amp; first_skb != xtfs-&gt;ra_newskb)

Between spin_unlock and this read, a concurrent CPU running
iptfs_reassem_cont() (or the drop_timer hrtimer) can complete
reassembly, NULL xtfs-&gt;ra_newskb, and free the skb. The check then
evaluates first_skb != NULL as true, and pskb_trim/ip_summed/consume_skb
operate on the freed skb — a use-after-free in skbuff_head_cache.

Replace the unlocked read with a local bool that records whether
first_skb was handed to the reassembly state in the current call. The
flag is set after the existing spin_unlock, before the break, using the
pointer equality that is stable at that point (first_skb == skb iff
first_skb was stored in ra_newskb).

Fixes: 3f3339885fb3 ("xfrm: iptfs: add reusing received skb for the tunnel egress packet")
Signed-off-by: Zhenghang Xiao &lt;kipreyyy@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: iptfs: preserve shared-frag marker in iptfs_consume_frags()</title>
<updated>2026-06-01T06:38:51+00:00</updated>
<author>
<name>Takao Sato</name>
<email>takaosato1997@gmail.com</email>
</author>
<published>2026-05-26T16:09:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e9096a5a170e7ecd6467bc2e08668ec39897cda7'/>
<id>e9096a5a170e7ecd6467bc2e08668ec39897cda7</id>
<content type='text'>
iptfs_consume_frags() transfers paged fragments from one socket buffer
to another but fails to propagate the SKBFL_SHARED_FRAG flag. This is
the same class of bug that was fixed in skb_try_coalesce() for
CVE-2026-46300: when fragments backed by read-only page-cache pages are
merged, the marker indicating their shared nature must be preserved so
that ESP can decide correctly whether in-place encryption is safe.

Apply the same two-line fix used in skb_try_coalesce() to
iptfs_consume_frags().

Fixes: b96ba312e21c ("xfrm: iptfs: share page fragments of inner packets")
Cc: stable@vger.kernel.org # 6.14+
Signed-off-by: Takao Sato &lt;takaosato1997@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
iptfs_consume_frags() transfers paged fragments from one socket buffer
to another but fails to propagate the SKBFL_SHARED_FRAG flag. This is
the same class of bug that was fixed in skb_try_coalesce() for
CVE-2026-46300: when fragments backed by read-only page-cache pages are
merged, the marker indicating their shared nature must be preserved so
that ESP can decide correctly whether in-place encryption is safe.

Apply the same two-line fix used in skb_try_coalesce() to
iptfs_consume_frags().

Fixes: b96ba312e21c ("xfrm: iptfs: share page fragments of inner packets")
Cc: stable@vger.kernel.org # 6.14+
Signed-off-by: Takao Sato &lt;takaosato1997@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: input: hold netns during deferred transport reinjection</title>
<updated>2026-05-26T08:35:30+00:00</updated>
<author>
<name>Zhengchuan Liang</name>
<email>zcliangcn@gmail.com</email>
</author>
<published>2026-05-22T09:31:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c16f74dc1d75d0e2e7670076d5375deda110ebeb'/>
<id>c16f74dc1d75d0e2e7670076d5375deda110ebeb</id>
<content type='text'>
Transport-mode reinjection stores a struct net pointer in skb-&gt;cb and
uses it later from xfrm_trans_reinject(). That pointer must stay valid
until the deferred callback runs.

Take a netns reference when queueing deferred reinjection work and drop
it after the callback completes. Use maybe_get_net() so the queueing
path does not revive a namespace that is already being torn down.

This keeps the existing workqueue design and fixes the netns lifetime
handling in one place for all users of xfrm_trans_queue_net().
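
The "take a reference unless the object is already dying" pattern can be
sketched as follows. This is a toy Python model under stated assumptions,
not the kernel implementation: Net, maybe_get, put, queue_reinject and
run_queue are hypothetical names, and what the real code does when the
reference cannot be taken is not stated in this message (the sketch
simply declines to queue).

```python
class Net:
    """Toy namespace with a reference count; not the kernel's struct net."""

    def __init__(self):
        self.refs = 1


def maybe_get(net):
    """Take a reference only if the count has not already dropped to zero."""
    if net.refs == 0:
        return None
    net.refs += 1
    return net


def put(net):
    """Drop one reference."""
    net.refs -= 1


def queue_reinject(net, queue, skb):
    """Queue deferred work only when a reference could be taken."""
    held = maybe_get(net)
    if held is None:
        return False  # namespace is being torn down: do not revive it
    queue.append((held, skb))
    return True


def run_queue(queue, seen):
    """Deferred callback: use the namespace, then drop the reference."""
    while queue:
        net, skb = queue.pop(0)
        seen.append(skb)
        put(net)
```

The reference is held for exactly the lifetime of the queued item, so the
pointer stored alongside the packet stays valid until the callback has
run.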

Fixes: 7b3801927e52 ("xfrm: introduce xfrm_trans_queue_net")
Cc: stable@kernel.org
Reported-by: Yuan Tan &lt;yuantan098@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Co-developed-by: Luxing Yin &lt;tr0jan@lzu.edu.cn&gt;
Signed-off-by: Luxing Yin &lt;tr0jan@lzu.edu.cn&gt;
Signed-off-by: Zhengchuan Liang &lt;zcliangcn@gmail.com&gt;
Signed-off-by: Ren Wei &lt;n05ec@lzu.edu.cn&gt;
Assisted-by: Codex:gpt-5.4
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Transport-mode reinjection stores a struct net pointer in skb-&gt;cb and
uses it later from xfrm_trans_reinject(). That pointer must stay valid
until the deferred callback runs.

Take a netns reference when queueing deferred reinjection work and drop
it after the callback completes. Use maybe_get_net() so the queueing
path does not revive a namespace that is already being torn down.

This keeps the existing workqueue design and fixes the netns lifetime
handling in one place for all users of xfrm_trans_queue_net().

Fixes: 7b3801927e52 ("xfrm: introduce xfrm_trans_queue_net")
Cc: stable@kernel.org
Reported-by: Yuan Tan &lt;yuantan098@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Co-developed-by: Luxing Yin &lt;tr0jan@lzu.edu.cn&gt;
Signed-off-by: Luxing Yin &lt;tr0jan@lzu.edu.cn&gt;
Signed-off-by: Zhengchuan Liang &lt;zcliangcn@gmail.com&gt;
Signed-off-by: Ren Wei &lt;n05ec@lzu.edu.cn&gt;
Assisted-by: Codex:gpt-5.4
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: move policy_bydst RCU sync from per-netns .exit to .pre_exit</title>
<updated>2026-05-26T08:35:29+00:00</updated>
<author>
<name>Usama Arif</name>
<email>usama.arif@linux.dev</email>
</author>
<published>2026-05-21T10:29:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3e52417318473782012b236d0325bf7d2266a597'/>
<id>3e52417318473782012b236d0325bf7d2266a597</id>
<content type='text'>
The struct pernet_operations docstring in include/net/net_namespace.h
explicitly warns against blocking RCU primitives in .exit handlers:

    Exit methods using blocking RCU primitives, such as
    synchronize_rcu(), should be implemented via exit_batch.
    [...]
    Please, avoid synchronize_rcu() at all, where it's possible.

    Note that a combination of pre_exit() and exit() can
    be used, since a synchronize_rcu() is guaranteed between
    the calls.

xfrm_policy_fini() violates this: it calls synchronize_rcu() before
freeing the policy_bydst hash tables (so no RCU reader is mid-
traversal at free time), but runs from xfrm_net_ops.exit -- once per
namespace -- so a cleanup_net() of N namespaces pays N full RCU
grace periods serially.

Use the documented pre_exit/exit split. Move the policy flush (and
the workqueue drains it depends on) into a new .pre_exit handler;
xfrm_policy_fini() then runs in .exit and frees the hash tables
after the synchronize_rcu_expedited() that cleanup_net() guarantees
between the two phases. This provides O(1) RCU grace periods per batch
instead of O(N).
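
The difference in grace-period count can be illustrated with a small
counting model. This is a Python sketch under simplifying assumptions: it
only records the order of events for a batch of namespaces and does not
model RCU itself; the function and event names are hypothetical.

```python
def teardown_per_namespace(namespaces):
    """Old shape: each .exit call waits for its own grace period."""
    events = []
    for ns in namespaces:
        events.append(("flush", ns))
        events.append(("grace_period", ns))
        events.append(("free_tables", ns))
    return events


def teardown_split(namespaces):
    """New shape: flush in .pre_exit, one shared wait, free in .exit."""
    events = [("pre_exit_flush", ns) for ns in namespaces]
    events.append(("grace_period", "batch"))
    events += [("exit_free_tables", ns) for ns in namespaces]
    return events


def grace_periods(events):
    """Count how many grace periods a teardown sequence pays for."""
    return sum(1 for kind, _ in events if kind == "grace_period")
```

For a batch of N namespaces, the first function records N grace periods
and the second records one, regardless of N.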

Observed on Linux 6.18 with a workload doing unshare(CLONE_NEWNET)
at ~13/sec sustained: cleanup_net() and the netns_wq rescuer kthread
both stuck in xfrm_policy_fini()'s synchronize_rcu(), &gt;300k struct
net accumulated in the cleanup queue, Percpu in /proc/meminfo climbed
to 130+ GB on 256-CPU hosts, and memcg OOMs followed. setup_net and
__put_net counts were balanced, ruling out a refcount leak.

Fixes: 069daad4f2ae ("xfrm: Wait for RCU readers during policy netns exit")
Signed-off-by: Usama Arif &lt;usama.arif@linux.dev&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The struct pernet_operations docstring in include/net/net_namespace.h
explicitly warns against blocking RCU primitives in .exit handlers:

    Exit methods using blocking RCU primitives, such as
    synchronize_rcu(), should be implemented via exit_batch.
    [...]
    Please, avoid synchronize_rcu() at all, where it's possible.

    Note that a combination of pre_exit() and exit() can
    be used, since a synchronize_rcu() is guaranteed between
    the calls.

xfrm_policy_fini() violates this: it calls synchronize_rcu() before
freeing the policy_bydst hash tables (so no RCU reader is mid-
traversal at free time), but runs from xfrm_net_ops.exit -- once per
namespace -- so a cleanup_net() of N namespaces pays N full RCU
grace periods serially.

Use the documented pre_exit/exit split. Move the policy flush (and
the workqueue drains it depends on) into a new .pre_exit handler;
xfrm_policy_fini() then runs in .exit and frees the hash tables
after the synchronize_rcu_expedited() that cleanup_net() guarantees
between the two phases. This provides O(1) RCU grace periods per batch
instead of O(N).

Observed on Linux 6.18 with a workload doing unshare(CLONE_NEWNET)
at ~13/sec sustained: cleanup_net() and the netns_wq rescuer kthread
both stuck in xfrm_policy_fini()'s synchronize_rcu(), &gt;300k struct
net accumulated in the cleanup queue, Percpu in /proc/meminfo climbed
to 130+ GB on 256-CPU hosts, and memcg OOMs followed. setup_net and
__put_net counts were balanced, ruling out a refcount leak.

Fixes: 069daad4f2ae ("xfrm: Wait for RCU readers during policy netns exit")
Signed-off-by: Usama Arif &lt;usama.arif@linux.dev&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: iptfs: reset runtime state when cloning SAs</title>
<updated>2026-05-26T08:35:28+00:00</updated>
<author>
<name>Shaomin Chen</name>
<email>eeesssooo020@gmail.com</email>
</author>
<published>2026-05-20T18:07:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7f83d174073234839aea176f265e517e0d50a1d2'/>
<id>7f83d174073234839aea176f265e517e0d50a1d2</id>
<content type='text'>
iptfs_clone_state() clones the IPTFS mode data with kmemdup(). This
copies runtime objects which must not be shared with the original SA,
including the embedded sk_buff_head, hrtimers, spinlock, and in-flight
reassembly/reorder state.

If xfrm_state_migrate() fails after clone_state() but before the later
init_state() call has reinitialized those fields, the cloned state can be
destroyed by xfrm_state_gc_task() with list and timer state copied from the
original SA. With queued packets this lets the clone splice and free skbs
owned by the original IPTFS queue, leading to use-after-free and
double-free reports in iptfs_destroy_state() and skb release paths.

Reinitialize the clone's runtime state before publishing it through
x-&gt;mode_data. Because clone_state() now publishes a destroyable mode_data
object before init_state(), take the mode callback module reference there.
Avoid taking it again from __iptfs_init_state() for the same object.
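
Why a byte-wise copy of runtime state is dangerous can be shown with a
loose user-space analogy. In this Python sketch, copy.copy() plays the
role of kmemdup() and a shared list plays the role of the copied queue
head; ModeData, max_queue_size and the helper names are hypothetical and
do not mirror the real IPTFS structures.

```python
import copy


class ModeData:
    """Toy mode data: one configuration field and one runtime queue."""

    def __init__(self, max_queue_size):
        self.max_queue_size = max_queue_size  # configuration: safe to copy
        self.queue = []                       # runtime state: must not be shared


def clone_shallow(orig):
    """Plain copy: the clone still refers to the original's queue."""
    return copy.copy(orig)


def clone_and_reset(orig):
    """Copy, then reinitialize runtime state before the clone is published."""
    clone = copy.copy(orig)
    clone.queue = []
    return clone


def destroy(mode_data):
    """Drain and return whatever the object believes it owns."""
    freed = list(mode_data.queue)
    mode_data.queue.clear()
    return freed
```

Destroying a clone made with clone_shallow() drains packets that still
belong to the original, whereas a clone made with clone_and_reset() has
nothing of the original's to free.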

Fixes: 0e4fbf013fa5 ("xfrm: iptfs: add user packet (tunnel ingress) handling")
Cc: stable@vger.kernel.org
Signed-off-by: Shaomin Chen &lt;eeesssooo020@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
iptfs_clone_state() clones the IPTFS mode data with kmemdup(). This
copies runtime objects which must not be shared with the original SA,
including the embedded sk_buff_head, hrtimers, spinlock, and in-flight
reassembly/reorder state.

If xfrm_state_migrate() fails after clone_state() but before the later
init_state() call has reinitialized those fields, the cloned state can be
destroyed by xfrm_state_gc_task() with list and timer state copied from the
original SA. With queued packets this lets the clone splice and free skbs
owned by the original IPTFS queue, leading to use-after-free and
double-free reports in iptfs_destroy_state() and skb release paths.

Reinitialize the clone's runtime state before publishing it through
x-&gt;mode_data. Because clone_state() now publishes a destroyable mode_data
object before init_state(), take the mode callback module reference there.
Avoid taking it again from __iptfs_init_state() for the same object.

Fixes: 0e4fbf013fa5 ("xfrm: iptfs: add user packet (tunnel ingress) handling")
Cc: stable@vger.kernel.org
Signed-off-by: Shaomin Chen &lt;eeesssooo020@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: Check for underflow in xfrm_state_mtu</title>
<updated>2026-05-14T08:17:43+00:00</updated>
<author>
<name>David Ahern</name>
<email>dahern@nvidia.com</email>
</author>
<published>2026-05-13T16:49:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=742b04d0550b0ec89dcbc99537ec88653bd1ad90'/>
<id>742b04d0550b0ec89dcbc99537ec88653bd1ad90</id>
<content type='text'>
Leo Lin reported an OOB write issue in the esp component:

  xfrm_state_mtu() returns u32 but performs its arithmetic in unsigned
  modulo-2^32 space using an attacker-influenced "header_len + authsize +
  net_adj" subtracted from a small "mtu" argument. A nobody user can
  install an IPv4 ESP tunnel SA with a large authentication key
  (XFRMA_ALG_AUTH_TRUNC, e.g. hmac(sha512), 64-byte key, 64-byte trunc),
  configure a small interface MTU (68 bytes), and set XFRMA_TFCPAD to a
  large value. When a single UDP datagram is then sent through the
  tunnel, xfrm_state_mtu() underflows to a near-2^32 value, and
  esp_output() consumes it as a signed int via:

        padto      = min(x-&gt;tfcpad, xfrm_state_mtu(x, mtu_cached))
        esp.tfclen = padto - skb-&gt;len   (assigned to int)

  esp.tfclen ends up negative (e.g. -207). It is sign-extended to size_t
  when passed to memset() inside esp_output_fill_trailer(), producing a
  ~16 EB write of zeroes at skb_tail_pointer(skb). KASAN logs it as
  "Write of size 18446744073709551537 at addr ffff888...".

Check for underflow and return 1. This causes the sendmsg attempt to
fail with ENETUNREACH.
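
The wrap-around and sign extension described in the report can be
reproduced with plain integer arithmetic. This is a simplified Python
sketch, not the kernel function: the real xfrm_state_mtu() also applies
block-size alignment, which is omitted here, and the overhead and packet
length values are hypothetical. The exact condition used by the patch is
not shown in this message; the checked variant below assumes it returns 1
whenever the subtraction would not leave a positive value.

```python
U32 = 2**32
U64 = 2**64


def u32_sub(a, b):
    """Subtract the way a C u32 does: wrap modulo 2^32."""
    return (a - b) % U32


def as_int(value):
    """Reinterpret a 32-bit pattern as a signed int."""
    value %= U32
    return value - U32 if value // 2**31 else value


def as_size_t(value):
    """Sign-extend a signed int to a 64-bit size, as a memset() length would be."""
    return value % U64


def mtu_unchecked(mtu, overhead):
    """Unsigned subtraction with no underflow check."""
    return u32_sub(mtu, overhead)


def mtu_checked(mtu, overhead):
    """Assumed shape of the fix: report 1 instead of wrapping."""
    if max(mtu - overhead, 0) == 0:
        return 1
    return mtu - overhead


mtu, overhead = 68, 120          # small interface MTU, hypothetical overhead
tfcpad, skb_len = U32 - 1, 100   # large pad request, hypothetical packet length

padto = min(tfcpad, mtu_unchecked(mtu, overhead))
tfclen = as_int(u32_sub(padto, skb_len))   # stored in an int, so it goes negative
memset_len = as_size_t(tfclen)             # close to 2^64 once used as a size
```

With the checked variant, the same inputs yield an MTU of 1 and the
padding length never goes negative.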

Fixes: c5c252389374 ("[XFRM]: Optimize MTU calculation")
Reported-by: Leo Lin &lt;leo@depthfirst.com&gt;
Assisted-by: Codex:26.506.31004
Signed-off-by: David Ahern &lt;dahern@nvidia.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Leo Lin reported an OOB write issue in the esp component:

  xfrm_state_mtu() returns u32 but performs its arithmetic in unsigned
  modulo-2^32 space using an attacker-influenced "header_len + authsize +
  net_adj" subtracted from a small "mtu" argument. A nobody user can
  install an IPv4 ESP tunnel SA with a large authentication key
  (XFRMA_ALG_AUTH_TRUNC, e.g. hmac(sha512), 64-byte key, 64-byte trunc),
  configure a small interface MTU (68 bytes), and set XFRMA_TFCPAD to a
  large value. When a single UDP datagram is then sent through the
  tunnel, xfrm_state_mtu() underflows to a near-2^32 value, and
  esp_output() consumes it as a signed int via:

        padto      = min(x-&gt;tfcpad, xfrm_state_mtu(x, mtu_cached))
        esp.tfclen = padto - skb-&gt;len   (assigned to int)

  esp.tfclen ends up negative (e.g. -207). It is sign-extended to size_t
  when passed to memset() inside esp_output_fill_trailer(), producing a
  ~16 EB write of zeroes at skb_tail_pointer(skb). KASAN logs it as
  "Write of size 18446744073709551537 at addr ffff888...".

Check for underflow and return 1. This causes the sendmsg attempt to
fail with ENETUNREACH.

Fixes: c5c252389374 ("[XFRM]: Optimize MTU calculation")
Reported-by: Leo Lin &lt;leo@depthfirst.com&gt;
Assisted-by: Codex:26.506.31004
Signed-off-by: David Ahern &lt;dahern@nvidia.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: ipcomp: Free destination pages on acomp errors</title>
<updated>2026-05-11T08:34:35+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2026-05-06T13:23:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7dbac7680eb629b3b4dc7e98c34f943b8814c0c8'/>
<id>7dbac7680eb629b3b4dc7e98c34f943b8814c0c8</id>
<content type='text'>
Move the out_free_req label up by a couple of lines so that the
allocated dst SG list gets freed on error as well as success.

Fixes: eb2953d26971 ("xfrm: ipcomp: Use crypto_acomp interface")
Cc: stable@kernel.org
Reported-by: Yuan Tan &lt;yuantan098@gmail.com&gt;
Reported-by: Yifan Wu &lt;yifanwucs@gmail.com&gt;
Reported-by: Juefei Pu &lt;tomapufckgml@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Reported-by: Yilin Zhu &lt;zylzyl2333@gmail.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move the out_free_req label up by a couple of lines so that the
allocated dst SG list gets freed on error as well as success.

Fixes: eb2953d26971 ("xfrm: ipcomp: Use crypto_acomp interface")
Cc: stable@kernel.org
Reported-by: Yuan Tan &lt;yuantan098@gmail.com&gt;
Reported-by: Yifan Wu &lt;yifanwucs@gmail.com&gt;
Reported-by: Juefei Pu &lt;tomapufckgml@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Reported-by: Yilin Zhu &lt;zylzyl2333@gmail.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
