diff options
| author | Jakub Kicinski <kuba@kernel.org> | 2025-12-06 17:09:41 -0800 |
|---|---|---|
| committer | Jakub Kicinski <kuba@kernel.org> | 2025-12-10 01:15:27 -0800 |
| commit | 006a5035b495dec008805df249f92c22c89c3d2e (patch) | |
| tree | 28ec392e8b2177b26cca538b1bbbab2fe658d5e1 /lib/mpi/mpi-inline.c | |
| parent | 1231eec6994be29d6bb5c303dfa54731ed9fc0e6 (diff) | |
inet: frags: flush pending skbs in fqdir_pre_exit()
We have been seeing occasional deadlocks on pernet_ops_rwsem since
September in NIPA. The stuck task was usually modprobe (often loading
a driver like ipvlan), trying to take the lock as a Writer.
lockdep does not track readers for rwsems so the read wasn't obvious
from the reports.
On closer inspection the Reader holding the lock was conntrack looping
forever in nf_conntrack_cleanup_net_list(). Based on past experience
with occasional NIPA crashes I looked thru the tests which run before
the crash and noticed that the crash follows ip_defrag.sh. An immediate
red flag. Scouring thru (de)fragmentation queues reveals skbs sitting
around, holding conntrack references.
The problem is that since conntrack depends on nf_defrag_ipv6,
nf_defrag_ipv6 will load first. Since nf_defrag_ipv6 loads first its
netns exit hooks run _after_ conntrack's netns exit hook.
Flush all fragment queue SKBs during fqdir_pre_exit() to release
conntrack references before conntrack cleanup runs. Also flush
the queues in timer expiry handlers when they discover fqdir->dead
is set, in case packet sneaks in while we're running the pre_exit
flush.
The commit under Fixes is not exactly the culprit, but I think
previously the timer firing would eventually unblock the spinning
conntrack.
Fixes: d5dd88794a13 ("inet: fix various use-after-free in defrags units")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251207010942.1672972-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'lib/mpi/mpi-inline.c')
0 files changed, 0 insertions, 0 deletions
