summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-12-10ALSA: hda: intel-dsp-config: Prefer legacy driver as fallbackTakashi Iwai
When config table entries don't match with the device to be probed, currently we fall back to SND_INTEL_DSP_DRIVER_ANY, which means to allow any drivers to bind with it. This was set so with the assumption (or hope) that all controller drivers should cover the devices generally, but in practice, this caused a problem as reported recently. Namely, when a specific kconfig for SOF isn't set for the modern Intel chips like Alderlake, a wrong driver (AVS) got probed and failed. This is because we have entries like: #if IS_ENABLED(CONFIG_SND_SOC_SOF_ALDERLAKE) /* Alder Lake / Raptor Lake */ { .flags = FLAG_SOF | FLAG_SOF_ONLY_IF_DMIC_OR_SOUNDWIRE, .device = PCI_DEVICE_ID_INTEL_HDA_ADL_S, }, .... #endif so this entry is effective only when CONFIG_SND_SOC_SOF_ALDERLAKE is set. If not set, there is no matching entry, hence it returns SND_INTEL_DSP_DRIVER_ANY as fallback. OTOH, if the kconfig is set, it explicitly falls back to SND_INTEL_DSP_DRIVER_LEGACY when no DMIC or SoundWire is found -- that was the working scenario. That being said, the current setup may be broken for modern Intel chips that are supposed to work with either SOF or legacy driver when the corresponding kconfig were missing. For addressing the problem above, this patch changes the fallback driver to the legacy driver, i.e. return SND_INTEL_DSP_DRIVER_LEGACY type as much as possible. When CONFIG_SND_HDA_INTEL is also disabled, the fallback is set to SND_INTEL_DSP_DRIVER_ANY type, just to be sure. Reported-by: Askar Safin <safinaskar@gmail.com> Closes: https://lore.kernel.org/all/20251014034156.4480-1-safinaskar@gmail.com/ Tested-by: Askar Safin <safinaskar@gmail.com> Reviewed-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20251210131553.184404-1-tiwai@suse.de
2025-12-10mm/slub: reset KASAN tag in defer_free() before accessing freed memoryDeepanshu Kartikey
When CONFIG_SLUB_TINY is enabled, kfree_nolock() calls kasan_slab_free() before defer_free(). On ARM64 with MTE (Memory Tagging Extension), kasan_slab_free() poisons the memory and changes the tag from the original (e.g., 0xf3) to a poison tag (0xfe). When defer_free() then tries to write to the freed object to build the deferred free list via llist_add(), the pointer still has the old tag, causing a tag mismatch and triggering a KASAN use-after-free report: BUG: KASAN: slab-use-after-free in defer_free+0x3c/0xbc mm/slub.c:6537 Write at addr f3f000000854f020 by task kworker/u8:6/983 Pointer tag: [f3], memory tag: [fe] Fix this by calling kasan_reset_tag() before accessing the freed memory. This is safe because defer_free() is part of the allocator itself and is expected to manipulate freed memory for bookkeeping purposes. Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().") Cc: stable@vger.kernel.org Reported-by: syzbot+7a25305a76d872abcfa1@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7a25305a76d872abcfa1 Tested-by: syzbot+7a25305a76d872abcfa1@syzkaller.appspotmail.com Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://patch.msgid.link/20251210022024.3255826-1-kartikey406@gmail.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-12-10Merge branches 'fixes' and 'misc' into for-nextRussell King (Oracle)
2025-12-10ARM: fix branch predictor hardeningRussell King (Oracle)
__do_user_fault() may be called with indeterminent interrupt enable state, which means we may be preemptive at this point. This causes problems when calling harden_branch_predictor(). For example, when called from a data abort, do_alignment_fault()->do_bad_area(). Move harden_branch_predictor() out of __do_user_fault() and into the calling contexts. Moving it into do_kernel_address_page_fault(), we can be sure that interrupts will be disabled here. Converting do_translation_fault() to use do_kernel_address_page_fault() rather than do_bad_area() means that we keep branch predictor handling for translation faults. Interrupts will also be disabled at this call site. do_sect_fault() needs special handling, so detect user mode accesses to kernel-addresses, and add an explicit call to branch predictor hardening. Finally, add branch predictor hardening to do_alignment() for the faulting case (user mode accessing kernel addresses) before interrupts are enabled. This should cover all cases where harden_branch_predictor() is called, ensuring that it is always has interrupts disabled, also ensuring that it is called early in each call path. Reviewed-by: Xie Yuanbin <xieyuanbin1@huawei.com> Tested-by: Xie Yuanbin <xieyuanbin1@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2025-12-10ARM: fix hash_name() faultRussell King (Oracle)
Zizhi Wo reports: "During the execution of hash_name()->load_unaligned_zeropad(), a potential memory access beyond the PAGE boundary may occur. For example, when the filename length is near the PAGE_SIZE boundary. This triggers a page fault, which leads to a call to do_page_fault()->mmap_read_trylock(). If we can't acquire the lock, we have to fall back to the mmap_read_lock() path, which calls might_sleep(). This breaks RCU semantics because path lookup occurs under an RCU read-side critical section." This is seen with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_KFENCE=y. Kernel addresses (with the exception of the vectors/kuser helper page) do not have VMAs associated with them. If the vectors/kuser helper page faults, then there are two possibilities: 1. if the fault happened while in kernel mode, then we're basically dead, because the CPU won't be able to vector through this page to handle the fault. 2. if the fault happened while in user mode, that means the page was protected from user access, and we want to fault anyway. Thus, we can handle kernel addresses from any context entirely separately without going anywhere near the mmap lock. This gives us an entirely non-sleeping path for all kernel mode kernel address faults. As we handle the kernel address faults before interrupts are enabled, this change has the side effect of improving the branch predictor hardening, but does not completely solve the issue. Reported-by: Zizhi Wo <wozizhi@huaweicloud.com> Reported-by: Xie Yuanbin <xieyuanbin1@huawei.com> Link: https://lore.kernel.org/r/20251126090505.3057219-1-wozizhi@huaweicloud.com Reviewed-by: Xie Yuanbin <xieyuanbin1@huawei.com> Tested-by: Xie Yuanbin <xieyuanbin1@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2025-12-10ARM: allow __do_kernel_fault() to report execution of memory faultsRussell King (Oracle)
Allow __do_kernel_fault() to detect the execution of memory, so we can provide the same fault message as do_page_fault() would do. This is required when we split the kernel address fault handling from the main do_page_fault() code path. Reviewed-by: Xie Yuanbin <xieyuanbin1@huawei.com> Tested-by: Xie Yuanbin <xieyuanbin1@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2025-12-10selftests: netfilter: prefer xfail in case race wasn't triggeredFlorian Westphal
Jakub says: "We try to reserve SKIP for tests skipped because tool is missing in env, something isn't built into the kernel etc." use xfail, we can't force the race condition to appear at will so its expected that the test 'fails' occasionally. Fixes: 78a588363587 ("selftests: netfilter: add conntrack clash resolution test case") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20251206175647.5c32f419@kernel.org/ Signed-off-by: Florian Westphal <fw@strlen.de>
2025-12-10netfilter: always set route tuple out ifindexLorenzo Bianconi
Always set nf_flow_route tuple out ifindex even if the indev is not one of the flowtable configured devices since otherwise the outdev lookup in nf_flow_offload_ip_hook() or nf_flow_offload_ipv6_hook() for FLOW_OFFLOAD_XMIT_NEIGH flowtable entries will fail. The above issue occurs in the following configuration since IP6IP6 tunnel does not support flowtable acceleration yet: $ip addr show 5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:11:22:33:22:55 brd ff:ff:ff:ff:ff:ff link-netns ns1 inet6 2001:db8:1::2/64 scope global nodad valid_lft forever preferred_lft forever inet6 fe80::211:22ff:fe33:2255/64 scope link tentative proto kernel_ll valid_lft forever preferred_lft forever 6: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:22:22:33:22:55 brd ff:ff:ff:ff:ff:ff link-netns ns3 inet6 2001:db8:2::1/64 scope global nodad valid_lft forever preferred_lft forever inet6 fe80::222:22ff:fe33:2255/64 scope link tentative proto kernel_ll valid_lft forever preferred_lft forever 7: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN group default qlen 1000 link/tunnel6 2001:db8:2::1 peer 2001:db8:2::2 permaddr a85:e732:2c37:: inet6 2002:db8:1::1/64 scope global nodad valid_lft forever preferred_lft forever inet6 fe80::885:e7ff:fe32:2c37/64 scope link proto kernel_ll valid_lft forever preferred_lft forever $ip -6 route show 2001:db8:1::/64 dev eth0 proto kernel metric 256 pref medium 2001:db8:2::/64 dev eth1 proto kernel metric 256 pref medium 2002:db8:1::/64 dev tun0 proto kernel metric 256 pref medium default via 2002:db8:1::2 dev tun0 metric 1024 pref medium $nft list ruleset table inet filter { flowtable ft { hook ingress priority filter devices = { eth0, eth1 } } chain forward { type filter hook forward priority filter; policy accept; meta l4proto { tcp, udp } flow add @ft } } Fixes: b5964aac51e0 ("netfilter: flowtable: consolidate xmit path") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-12-10ipvs: fix ipv4 null-ptr-deref in route error pathSlavin Liu
The IPv4 code path in __ip_vs_get_out_rt() calls dst_link_failure() without ensuring skb->dev is set, leading to a NULL pointer dereference in fib_compute_spec_dst() when ipv4_link_failure() attempts to send ICMP destination unreachable messages. The issue emerged after commit ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") started calling __ip_options_compile() from ipv4_link_failure(). This code path eventually calls fib_compute_spec_dst() which dereferences skb->dev. An attempt was made to fix the NULL skb->dev dereference in commit 0113d9c9d1cc ("ipv4: fix null-deref in ipv4_link_failure"), but it only addressed the immediate dev_net(skb->dev) dereference by using a fallback device. The fix was incomplete because fib_compute_spec_dst() later in the call chain still accesses skb->dev directly, which remains NULL when IPVS calls dst_link_failure(). The crash occurs when: 1. IPVS processes a packet in NAT mode with a misconfigured destination 2. Route lookup fails in __ip_vs_get_out_rt() before establishing a route 3. The error path calls dst_link_failure(skb) with skb->dev == NULL 4. ipv4_link_failure() → ipv4_send_dest_unreach() → __ip_options_compile() → fib_compute_spec_dst() 5. fib_compute_spec_dst() dereferences NULL skb->dev Apply the same fix used for IPv6 in commit 326bf17ea5d4 ("ipvs: fix ipv6 route unreach panic"): set skb->dev from skb_dst(skb)->dev before calling dst_link_failure(). KASAN: null-ptr-deref in range [0x0000000000000328-0x000000000000032f] CPU: 1 PID: 12732 Comm: syz.1.3469 Not tainted 6.6.114 #2 RIP: 0010:__in_dev_get_rcu include/linux/inetdevice.h:233 RIP: 0010:fib_compute_spec_dst+0x17a/0x9f0 net/ipv4/fib_frontend.c:285 Call Trace: <TASK> spec_dst_fill net/ipv4/ip_options.c:232 spec_dst_fill net/ipv4/ip_options.c:229 __ip_options_compile+0x13a1/0x17d0 net/ipv4/ip_options.c:330 ipv4_send_dest_unreach net/ipv4/route.c:1252 ipv4_link_failure+0x702/0xb80 net/ipv4/route.c:1265 dst_link_failure include/net/dst.h:437 __ip_vs_get_out_rt+0x15fd/0x19e0 net/netfilter/ipvs/ip_vs_xmit.c:412 ip_vs_nat_xmit+0x1d8/0xc80 net/netfilter/ipvs/ip_vs_xmit.c:764 Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") Signed-off-by: Slavin Liu <slavin452@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-12-10netfilter: nf_conncount: fix leaked ct in error pathsFernando Fernandez Mancera
There are some situations where ct might be leaked as error paths are skipping the refcounted check and return immediately. In order to solve it make sure that the check is always called. Fixes: be102eb6a0e7 ("netfilter: nf_conncount: rework API to use sk_buff directly") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-12-10rbd: stop selecting CRC32, CRYPTO, and CRYPTO_AESIlya Dryomov
None of the RBD code directly requires CRC32, CRYPTO, or CRYPTO_AES. These options are needed by CEPH_LIB code and they are selected there directly. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@linux.dev>
2025-12-10ceph: stop selecting CRC32, CRYPTO, and CRYPTO_AESEric Biggers
None of the CEPH_FS code directly requires CRC32, CRYPTO, or CRYPTO_AES. These options do get selected indirectly anyway via CEPH_LIB, which does need them, but there is no need for CEPH_FS to select them too. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-12-10libceph: make decode_pool() more resilient against corrupted osdmapsIlya Dryomov
If the osdmap is (maliciously) corrupted such that the encoded length of ceph_pg_pool envelope is less than what is expected for a particular encoding version, out-of-bounds reads may ensue because the only bounds check that is there is based on that length value. This patch adds explicit bounds checks for each field that is decoded or skipped. Cc: stable@vger.kernel.org Reported-by: ziming zhang <ezrakiez@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Tested-by: ziming zhang <ezrakiez@gmail.com>
2025-12-10libceph: Amend checking to fix `make W=1` build breakageAndy Shevchenko
In a few cases the code compares 32-bit value to a SIZE_MAX derived constant which is much higher than that value on 64-bit platforms, Clang, in particular, is not happy about this net/ceph/osdmap.c:1441:10: error: result of comparison of constant 4611686018427387891 with expression of type 'u32' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 1441 | if (len > (SIZE_MAX - sizeof(*pg)) / sizeof(u32)) | ~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/ceph/osdmap.c:1624:10: error: result of comparison of constant 2305843009213693945 with expression of type 'u32' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 1624 | if (len > (SIZE_MAX - sizeof(*pg)) / (2 * sizeof(u32))) | ~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by casting to size_t. Note, that possible replacement of SIZE_MAX by U32_MAX may lead to the behaviour changes on the corner cases. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-12-10ceph: Amend checking to fix `make W=1` build breakageAndy Shevchenko
In a few cases the code compares 32-bit value to a SIZE_MAX derived constant which is much higher than that value on 64-bit platforms, Clang, in particular, is not happy about this fs/ceph/snap.c:377:10: error: result of comparison of constant 2305843009213693948 with expression of type 'u32' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 377 | if (num > (SIZE_MAX - sizeof(*snapc)) / sizeof(u64)) | ~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by casting to size_t. Note, that possible replacement of SIZE_MAX by U32_MAX may lead to the behaviour changes on the corner cases. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-12-10ceph: add trace points to the MDS clientMax Kellermann
This patch adds trace points to the Ceph filesystem MDS client: - request submission (CEPH_MSG_CLIENT_REQUEST) and completion (CEPH_MSG_CLIENT_REPLY) - capabilities (CEPH_MSG_CLIENT_CAPS) These are the central pieces that are useful for analyzing MDS latency/performance problems from the client's perspective. In the long run, all doutc() calls should be replaced with tracepoints. This way, the Ceph filesystem can be traced at any time (without spamming the kernel log). Additionally, trace points can be used in BPF programs (which can even deference the pointer parameters and extract more values). Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-12-10libceph: fix log output race condition in OSD clientSimon Buttgereit
OSD client logging has a problem in get_osd() and put_osd(). For one logging output refcount_read() is called twice. If recount value changes between both calls logging output is not consistent. This patch prints out only the resulting value. [ idryomov: don't make the log messages more verbose ] Signed-off-by: Simon Buttgereit <simon.buttgereit@tu-ilmenau.de> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-12-10blk-mq: delete task running check in blk_hctx_poll()Fengnan Chang
blk_hctx_poll() always checks if the task is running or not, and returns 1 if the task is running. This is a leftover from when polled IO was purely for synchronous IO, and doesn't make sense anymore when polled IO is purely asynchronous. Similarly, marking the task as TASK_RUNNING is also superflous, as the very much has to be running to enter the function in the first place. It looks like there has been this judgment for historical reasons, and in very early versions of this function the user would set the process state to TASK_UNINTERRUPTIBLE. Signed-off-by: Diangang Li <lidiangang@bytedance.com> Signed-off-by: Fengnan Chang <changfengnan@bytedance.com> [axboe: kill all remnants of task running, pointless now. massage message] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-10Merge branch 'bpf-fix-bpf_d_path-helper-prototype'Alexei Starovoitov
Shuran Liu says: ==================== bpf: fix bpf_d_path() helper prototype Hi, This series fixes a verifier issue with bpf_d_path() and adds a regression test to cover its use within a hook function. Patch 1 updates the bpf_d_path() helper prototype so that the second argument is marked as MEM_WRITE. This makes it explicit to the verifier that the helper writes into the provided buffer. Patch 2 extends the existing d_path selftest to cover incorrect verifier assumptions caused by an incorrect function prototype. The test program calls bpf_d_path() and checks if the first character of the path can be read. It ensures the verifier does not assume the buffer remains unwritten. Changelog ========= v5: - Moved the temporary file for the fallocate test from /tmp to /dev/shm Since bpf CI's 9P filesystem under /tmp does not support fallocate. v4: - Use the fallocate hook instead of an LSM hook to simplify the selftest, as suggested by Matt and Alexei. - Add a utility function in test_d_path.c to load the BPF program, improving code reuse. v3: - Switch the pathname prefix loop to use bpf_for() instead of #pragma unroll, as suggested by Matt. - Remove /tmp/bpf_d_path_test in the test cleanup path. - Add the missing Reviewed-by tags. v2: - Merge the new test into the existing d_path selftest rather than creating new files. - Add PID filtering in the LSM program to avoid nondeterministic failures due to unrelated processes triggering bprm_check_security. - Synchronize child execution using a pipe to ensure deterministic updates to the PID. Thanks for your time and reviews. ==================== Link: https://patch.msgid.link/20251206141210.3148-1-electronlsr@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-10selftests/bpf: add regression test for bpf_d_path()Shuran Liu
Add a regression test for bpf_d_path() to cover incorrect verifier assumptions caused by an incorrect function prototype. The test attaches to the fallocate hook, calls bpf_d_path() and verifies that a simple prefix comparison on the returned pathname behaves correctly after the fix in patch 1. It ensures the verifier does not assume the buffer remains unwritten. Co-developed-by: Zesen Liu <ftyg@live.com> Signed-off-by: Zesen Liu <ftyg@live.com> Co-developed-by: Peili Gao <gplhust955@gmail.com> Signed-off-by: Peili Gao <gplhust955@gmail.com> Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Shuran Liu <electronlsr@gmail.com> Link: https://lore.kernel.org/r/20251206141210.3148-3-electronlsr@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-10bpf: Fix verifier assumptions of bpf_d_path's output bufferShuran Liu
Commit 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking") started distinguishing read vs write accesses performed by helpers. The second argument of bpf_d_path() is a pointer to a buffer that the helper fills with the resulting path. However, its prototype currently uses ARG_PTR_TO_MEM without MEM_WRITE. Before 37cce22dbd51, helper accesses were conservatively treated as potential writes, so this mismatch did not cause issues. Since that commit, the verifier may incorrectly assume that the buffer contents are unchanged across the helper call and base its optimizations on this wrong assumption. This can lead to misbehaviour in BPF programs that read back the buffer, such as prefix comparisons on the returned path. Fix this by marking the second argument of bpf_d_path() as ARG_PTR_TO_MEM | MEM_WRITE so that the verifier correctly models the write to the caller-provided buffer. Fixes: 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking") Co-developed-by: Zesen Liu <ftyg@live.com> Signed-off-by: Zesen Liu <ftyg@live.com> Co-developed-by: Peili Gao <gplhust955@gmail.com> Signed-off-by: Peili Gao <gplhust955@gmail.com> Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Shuran Liu <electronlsr@gmail.com> Reviewed-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20251206141210.3148-2-electronlsr@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-10Merge branch 'inet-frags-flush-pending-skbs-in-fqdir_pre_exit'Jakub Kicinski
Jakub Kicinski says: ==================== inet: frags: flush pending skbs in fqdir_pre_exit() Fix the issue reported by NIPA starting on Sep 18th [1], where pernet_ops_rwsem is constantly held by a reader, preventing writers from grabbing it (specifically driver modules from loading). The fact that reports started around that time seems coincidental. The issue seems to be skbs queued for defrag preventing conntrack from exiting. First patch fixes another theoretical issue, it's mostly a leftover from an attempt to get rid of the inet_frag_queue refcnt, which I gave up on (still think it's doable but a bit of a time sink). Second patch is a minor refactor. The real fix is in the third patch. It's the simplest fix I can think of which is to flush the frag queues. Perhaps someone has a better suggestion? Last patch adds an explicit warning for conntrack getting stuck, as this seems like something that can easily happen if bugs sneak in. The warning will hopefully save us the first 20% of the investigation effort. Link: https://lore.kernel.org/20251001082036.0fc51440@kernel.org # [1] ==================== Link: https://patch.msgid.link/20251207010942.1672972-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10netfilter: conntrack: warn when cleanup is stuckJakub Kicinski
nf_conntrack_cleanup_net_list() calls schedule() so it does not show up as a hung task. Add an explicit check to make debugging leaked skbs/conntack references more obvious. Acked-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10inet: frags: flush pending skbs in fqdir_pre_exit()Jakub Kicinski
We have been seeing occasional deadlocks on pernet_ops_rwsem since September in NIPA. The stuck task was usually modprobe (often loading a driver like ipvlan), trying to take the lock as a Writer. lockdep does not track readers for rwsems so the read wasn't obvious from the reports. On closer inspection the Reader holding the lock was conntrack looping forever in nf_conntrack_cleanup_net_list(). Based on past experience with occasional NIPA crashes I looked thru the tests which run before the crash and noticed that the crash follows ip_defrag.sh. An immediate red flag. Scouring thru (de)fragmentation queues reveals skbs sitting around, holding conntrack references. The problem is that since conntrack depends on nf_defrag_ipv6, nf_defrag_ipv6 will load first. Since nf_defrag_ipv6 loads first its netns exit hooks run _after_ conntrack's netns exit hook. Flush all fragment queue SKBs during fqdir_pre_exit() to release conntrack references before conntrack cleanup runs. Also flush the queues in timer expiry handlers when they discover fqdir->dead is set, in case packet sneaks in while we're running the pre_exit flush. The commit under Fixes is not exactly the culprit, but I think previously the timer firing would eventually unblock the spinning conntrack. Fixes: d5dd88794a13 ("inet: fix various use-after-free in defrags units") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10inet: frags: add inet_frag_queue_flush()Jakub Kicinski
Instead of exporting inet_frag_rbtree_purge() which requires that caller takes care of memory accounting, add a new helper. We will need to call it from a few places in the next patch. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10inet: frags: avoid theoretical race in ip_frag_reinit()Jakub Kicinski
In ip_frag_reinit() we want to move the frag timeout timer into the future. If the timer fires in the meantime we inadvertently scheduled it again, and since the timer assumes a ref on frag_queue we need to acquire one to balance things out. This is technically racy, we should have acquired the reference _before_ we touch the timer, it may fire again before we take the ref. Avoid this entire dance by using mod_timer_pending() which only modifies the timer if its pending (and which exists since Linux v2.6.30) Note that this was the only place we ever took a ref on frag_queue since Eric's conversion to RCU. So we could potentially replace the whole refcnt field with an atomic flag and a bit more RCU. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10Merge branch 'selftests-fix-build-warnings-and-errors' (part)Jakub Kicinski
Guenter Roeck says: ==================== selftests: Fix build warnings and errors This series fixes build warnings and errors observed when trying to build selftests. ==================== Link: https://patch.msgid.link/20251205171010.515236-1-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10selftests: net: tfo: Fix build warningGuenter Roeck
Fix tfo.c: In function ‘run_server’: tfo.c:84:9: warning: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’ by evaluating the return value from read() and displaying an error message if it reports an error. Fixes: c65b5bb2329e3 ("selftests: net: add passive TFO test binary") Cc: David Wei <dw@davidwei.uk> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://patch.msgid.link/20251205171010.515236-14-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10selftests: net: Fix build warningsGuenter Roeck
Fix ksft.h: In function ‘ksft_ready’: ksft.h:27:9: warning: ignoring return value of ‘write’ declared with attribute ‘warn_unused_result’ ksft.h: In function ‘ksft_wait’: ksft.h:51:9: warning: ignoring return value of ‘read’ declared with attribute ‘warn_unused_result’ by checking the return value of the affected functions and displaying an error message if an error is seen. Fixes: 2b6d490b82668 ("selftests: drv-net: Factor out ksft C helpers") Cc: Joe Damato <jdamato@fastly.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://patch.msgid.link/20251205171010.515236-11-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10selftest: af_unix: Support compilers without flex-array-member-not-at-end ↵Guenter Roeck
support Fix: gcc: error: unrecognized command-line option ‘-Wflex-array-member-not-at-end’ by making the compiler option dependent on its support. Fixes: 1838731f1072c ("selftest: af_unix: Add -Wall and -Wflex-array-member-not-at-end to CFLAGS.") Cc: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://patch.msgid.link/20251205171010.515236-7-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10selftests: tls: fix warning of uninitialized variableAnkit Khushwaha
In 'poll_partial_rec_async' a uninitialized char variable 'token' with is used for write/read instruction to synchronize between threads via a pipe. tls.c:2833:26: warning: variable 'token' is uninitialized when passed as a const pointer argument Initialize 'token' to '\0' to silence compiler warning. Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com> Link: https://patch.msgid.link/20251205163242.14615-1-ankitkhushwaha.linux@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10broadcom: b44: prevent uninitialized value usageAlexey Simakov
On execution path with raised B44_FLAG_EXTERNAL_PHY, b44_readphy() leaves bmcr value uninitialized and it is used later in the code. Add check of this flag at the beginning of the b44_nway_reset() and exit early of the function with restarting autonegotiation if an external PHY is used. Fixes: 753f492093da ("[B44]: port to native ssb support") Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alexey Simakov <bigalex934@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20251205155815.4348-1-bigalex934@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10net/handshake: restore destructor on submit failurecaoping
handshake_req_submit() replaces sk->sk_destruct but never restores it when submission fails before the request is hashed. handshake_sk_destruct() then returns early and the original destructor never runs, leaking the socket. Restore sk_destruct on the error path. Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests") Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: caoping <caoping@cmss.chinamobile.com> Link: https://patch.msgid.link/20251204091058.1545151-1-caoping@cmss.chinamobile.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10net: ti: icssg-prueth: add PTP_1588_CLOCK_OPTIONAL dependencyArnd Bergmann
The new icssg-prueth driver needs the same dependency as the other parts that use the ptp-1588: WARNING: unmet direct dependencies detected for TI_ICSS_IEP Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PTP_1588_CLOCK_OPTIONAL [=m] && TI_PRUSS [=y] Selected by [y]: - TI_PRUETH [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PRU_REMOTEPROC [=y] && NET_SWITCHDEV [=y] Add the correct dependency on the two drivers missing it, and remove the pointless 'imply' in the process. Fixes: e654b85a693e ("net: ti: icssg-prueth: Add ICSSG Ethernet driver for AM65x SR1.0 platforms") Fixes: 511f6c1ae093 ("net: ti: icssm-prueth: Adds ICSSM Ethernet driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20251204100138.1034175-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10net: openvswitch: fix middle attribute validation in push_nsh() actionIlya Maximets
The push_nsh() action structure looks like this: OVS_ACTION_ATTR_PUSH_NSH(OVS_KEY_ATTR_NSH(OVS_NSH_KEY_ATTR_BASE,...)) The outermost OVS_ACTION_ATTR_PUSH_NSH attribute is OK'ed by the nla_for_each_nested() inside __ovs_nla_copy_actions(). The innermost OVS_NSH_KEY_ATTR_BASE/MD1/MD2 are OK'ed by the nla_for_each_nested() inside nsh_key_put_from_nlattr(). But nothing checks if the attribute in the middle is OK. We don't even check that this attribute is the OVS_KEY_ATTR_NSH. We just do a double unwrap with a pair of nla_data() calls - first time directly while calling validate_push_nsh() and the second time as part of the nla_for_each_nested() macro, which isn't safe, potentially causing invalid memory access if the size of this attribute is incorrect. The failure may not be noticed during validation due to larger netlink buffer, but cause trouble later during action execution where the buffer is allocated exactly to the size: BUG: KASAN: slab-out-of-bounds in nsh_hdr_from_nlattr+0x1dd/0x6a0 [openvswitch] Read of size 184 at addr ffff88816459a634 by task a.out/22624 CPU: 8 UID: 0 PID: 22624 6.18.0-rc7+ #115 PREEMPT(voluntary) Call Trace: <TASK> dump_stack_lvl+0x51/0x70 print_address_description.constprop.0+0x2c/0x390 kasan_report+0xdd/0x110 kasan_check_range+0x35/0x1b0 __asan_memcpy+0x20/0x60 nsh_hdr_from_nlattr+0x1dd/0x6a0 [openvswitch] push_nsh+0x82/0x120 [openvswitch] do_execute_actions+0x1405/0x2840 [openvswitch] ovs_execute_actions+0xd5/0x3b0 [openvswitch] ovs_packet_cmd_execute+0x949/0xdb0 [openvswitch] genl_family_rcv_msg_doit+0x1d6/0x2b0 genl_family_rcv_msg+0x336/0x580 genl_rcv_msg+0x9f/0x130 netlink_rcv_skb+0x11f/0x370 genl_rcv+0x24/0x40 netlink_unicast+0x73e/0xaa0 netlink_sendmsg+0x744/0xbf0 __sys_sendto+0x3d6/0x450 do_syscall_64+0x79/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> Let's add some checks that the attribute is properly sized and it's the only one attribute inside the action. Technically, there is no real reason for OVS_KEY_ATTR_NSH to be there, as we know that we're pushing an NSH header already, it just creates extra nesting, but that's how uAPI works today. So, keeping as it is. Fixes: b2d0f5d5dc53 ("openvswitch: enable NSH support") Reported-by: Junvy Yang <zhuque@tencent.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Eelco Chaudron echaudro@redhat.com Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20251204105334.900379-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10drm/mgag200: Fix big-endian supportRené Rebe
Unlike the original, deleted Matrox mga driver, the new mgag200 driver has the XRGB frame-buffer byte swapped on big-endian "RISC" systems. Fix by enabling byte swapping "PowerPC" OPMODE for any __BIG_ENDIAN config. Fixes: 414c45310625 ("mgag200: initial g200se driver (v2)") Signed-off-by: René Rebe <rene@exactco.de> Cc: stable@kernel.org Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20251208.141827.965103015954471168.rene@exactco.de
2025-12-10can: gs_usb: gs_can_open(): fix error handlingMarc Kleine-Budde
Commit 2603be9e8167 ("can: gs_usb: gs_can_open(): improve error handling") added missing error handling to the gs_can_open() function. The driver uses 2 USB anchors to track the allocated URBs: the TX URBs in struct gs_can::tx_submitted for each netdev and the RX URBs in struct gs_usb::rx_submitted for the USB device. gs_can_open() allocates the RX URBs, while TX URBs are allocated during gs_can_start_xmit(). The cleanup in gs_can_open() kills all anchored dev->tx_submitted URBs (which is not necessary since the netdev is not yet registered), but misses the parent->rx_submitted URBs. Fix the problem by killing the rx_submitted instead of the tx_submitted. Fixes: 2603be9e8167 ("can: gs_usb: gs_can_open(): improve error handling") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251210-gs_usb-fix-error-handling-v1-1-d6a5a03f10bb@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-12-10Merge tag 'locking-futex-2025-12-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull futex updates from Ingo Molnar: - Standardize on ktime_t in restart_block::time as well (Thomas Weißschuh) - Futex selftests: - Add robust list testcases (André Almeida) - Formatting fixes/cleanups (Carlos Llamas) * tag 'locking-futex-2025-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Store time as ktime_t in restart block selftests/futex: Create test for robust list selftests/futex: Skip tests if shmget unsupported selftests/futex: Add newline to ksft_exit_fail_msg() selftests/futex: Remove unused test_futex_mpol()
2025-12-10can: fix build dependencyArnd Bergmann
A recent bugfix introduced a new problem with Kconfig dependencies: WARNING: unmet direct dependencies detected for CAN_DEV Depends on [n]: NETDEVICES [=n] && CAN [=m] Selected by [m]: - CAN [=m] && NET [=y] Since the CAN core code now links into the CAN device code, that particular function needs to be available, though the rest of it does not. Revert the incomplete fix and instead use Makefile logic to avoid the link failure. Fixes: cb2dc6d2869a ("can: Kconfig: select CAN driver infrastructure by default") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512091523.zty3CLmc-lkp@intel.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251204100015.1033688-1-arnd@kernel.org [mkl: removed module option from CAN_DEV help text (thanks Vincent)] [mkl: removed '&& CAN' from Kconfig dependency (thanks Vincent)] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-12-10Merge tag 'kbuild-6.19-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild fix from Nathan Chancellor: - Fix install-extmod-build when ccache is used via CC * tag 'kbuild-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: kbuild: install-extmod-build: Properly fix CC expansion when ccache is used
2025-12-09selftests/bpf: Add test for truncated dmabuf_iter readsT.J. Mercier
If many dmabufs are present, reads of the dmabuf iterator can be truncated at PAGE_SIZE or user buffer size boundaries before the fix in "bpf: Fix truncated dmabuf iterator reads". Add a test to confirm truncation does not occur. Signed-off-by: T.J. Mercier <tjmercier@google.com> Link: https://lore.kernel.org/r/20251204000348.1413593-2-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-09bpf: Fix truncated dmabuf iterator readsT.J. Mercier
If there is a large number (hundreds) of dmabufs allocated, the text output generated from dmabuf_iter_seq_show can exceed common user buffer sizes (e.g. PAGE_SIZE) necessitating multiple start/stop cycles to iterate through all dmabufs. However the dmabuf iterator currently returns NULL in dmabuf_iter_seq_start for all non-zero pos values, which results in the truncation of the output before all dmabufs are handled. After dma_buf_iter_begin / dma_buf_iter_next, the refcount of the buffer is elevated so that the BPF iterator program can run without holding any locks. When a stop occurs, instead of immediately dropping the reference on the buffer, stash a pointer to the buffer in seq->priv until either start is called or the iterator is released. This also enables the resumption of iteration without first walking through the list of dmabufs based on the pos value. Fixes: 76ea95534995 ("bpf: Add dmabuf iterator") Signed-off-by: T.J. Mercier <tjmercier@google.com> Link: https://lore.kernel.org/r/20251204000348.1413593-1-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-10x86/fpu: Fix FPU state core dump truncation on CPUs with no extended xfeaturesYongxin Liu
Zero can be a valid value of num_records. For example, on Intel Atom x6425RE, only x87 and SSE are supported (features 0, 1), and fpu_user_cfg.max_features is 3. The for_each_extended_xfeature() loop only iterates feature 2, which is not enabled, so num_records = 0. This is valid and should not cause core dump failure. The issue is that dump_xsave_layout_desc() returns 0 for both genuine errors (dump_emit() failure) and valid cases (no extended features). Use negative return values for errors and only abort on genuine failures. Fixes: ba386777a30b ("x86/elf: Add a new FPU buffer layout info to x86 core files") Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251210000219.4094353-2-yongxin.liu@windriver.com
2025-12-10Merge tag 'input-for-v6.19-rc0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: - DT bindings for Melfas MIP4 touchscreen controller and TWL4030 keypad have been converted to the DT schema - simple touch controller bindings have been consolidated to trivial-touch.yaml DT schema - memory allocation failure noise was removed from qnap-mcu-input and zforce_ts dirvers - ti_am335x_tsc driver was hardened to handle invalid (too large) number of coordinates specified in device tree - a cleanup in Cypress cyttsp5 driver to use %pe to print error code * tag 'input-for-v6.19-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: ti_am335x_tsc - clamp coordinate_readouts to DT maximum (6) dt-bindings: touchscreen: consolidate simple touch controller to trivial-touch.yaml dt-bindings: touchscreen: trivial-touch: add reset-gpios and wakeup-source dt-bindings: input: ti,twl4030-keypad: convert to DT schema Input: zforce_ts - omit error message when memory allocation fails Input: qnap-mcu-input - omit error message when memory allocation fails dt-bindings: input: Convert MELFAS MIP4 Touchscreen to DT schema dt-bindings: touchscreen: move ar1021.txt to trivial-touch.yaml dt-bindings: touchscreen: rename maxim,max11801.yaml to trivial-touch.yaml Input: cyttsp5 - use %pe format specifier
2025-12-10Merge tag 'trace-v6.19-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix unused tracepoint build for modules only using exported tracepoints The tracepoint-update.c code that looks for unused tracepoints expects if tracepoints are used then it will have tracepoints defined. If not, it errors out which fails the build. In most cases this the way things work. A tracepoint can't be used if it is not defined. There is one exception; If a module only uses tracepoints that are defined in other modules or the vmlinux proper, where the tracepoints are exported. In this case, the tracepoint-update.c code thinks tracepoints are used but not defined and errors out, failing the build. When tracepoint-update.c detects this case, if it is a module that is being processed, exit out normally as it is a legitimate case. - Add tracepoint-update.c to MAINTAINERS file The tracepoint-update.c file is specific to tracing so add it to the tracing subsystem in the MAINTAINERS file. * tag 'trace-v6.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: MAINTAINERS: Add tracepoint-update.c to TRACING section tracing: Fix unused tracepoints when module uses only exported ones
2025-12-10x86/boot/Documentation: Fix htmldocs build warning due to malformed table in ↵Swaraj Gaikwad
boot.rst Sphinx reports htmldocs warnings: Documentation/arch/x86/boot.rst:437: ERROR: Malformed table. Text in column margin in table line 2. The table header defined the first column width as 2 characters ("=="), which is too narrow for entries like "0x10" and "0x13". This caused the text to spill into the margin, triggering a docutils parsing failure. Fix it by extending the first column of assigned boot loader ID to 4 characters ("====") to fit the widest entries. Fixes: 1c3377bee212 ("x86/boot/Documentation: Prefix hexadecimal literals with 0x") Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Swaraj Gaikwad <swarajgaikwad1925@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://patch.msgid.link/20251210092814.9986-1-swarajgaikwad1925@gmail.com
2025-12-09Merge branch ↵Alexei Starovoitov
'bpf-x86-unwind-orc-support-reliable-unwinding-through-bpf-stack-frames' Josh Poimboeuf says: ==================== bpf, x86/unwind/orc: Support reliable unwinding through BPF stack frames Fix livepatch stalls which may be seen when a task is blocked with BPF JIT on its kernel stack. Changes since v1 (https://lore.kernel.org/cover.1764699074.git.jpoimboe@kernel.org): - fix NULL ptr deref in __arch_prepare_bpf_trampoline() ==================== Link: https://patch.msgid.link/cover.1764818927.git.jpoimboe@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-09x86/unwind/orc: Support reliable unwinding through BPF stack framesJosh Poimboeuf
BPF JIT programs and trampolines use a frame pointer, so the current ORC unwinder strategy of falling back to frame pointers (when an ORC entry is missing) usually works in practice when unwinding through BPF JIT stack frames. However, that frame pointer fallback is just a guess, so the unwind gets marked unreliable for live patching, which can cause livepatch transition stalls. Make the common case reliable by calling the bpf_has_frame_pointer() helper to detect the valid frame pointer region of BPF JIT programs and trampolines. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reported-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> Closes: https://lore.kernel.org/0e555733-c670-4e84-b2e6-abb8b84ade38@crowdstrike.com Acked-by: Song Liu <song@kernel.org> Acked-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/a18505975662328c8ffb1090dded890c6f8c1004.1764818927.git.jpoimboe@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org>
2025-12-09bpf: Add bpf_has_frame_pointer()Josh Poimboeuf
Introduce a bpf_has_frame_pointer() helper that unwinders can call to determine whether a given instruction pointer is within the valid frame pointer region of a BPF JIT program or trampoline (i.e., after the prologue, before the epilogue). This will enable livepatch (with the ORC unwinder) to reliably unwind through BPF JIT frames. Acked-by: Song Liu <song@kernel.org> Acked-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/fd2bc5b4e261a680774b28f6100509fd5ebad2f0.1764818927.git.jpoimboe@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org>
2025-12-09bpf, arm64: Do not audit capability check in do_jit()Ondrej Mosnacek
Analogically to the x86 commit 881a9c9cb785 ("bpf: Do not audit capability check in do_jit()"), change the capable() call to ns_capable_noaudit() in order to avoid spurious SELinux denials in audit log. The commit log from that commit applies here as well: """ The failure of this check only results in a security mitigation being applied, slightly affecting performance of the compiled BPF program. It doesn't result in a failed syscall, an thus auditing a failed LSM permission check for it is unwanted. For example with SELinux, it causes a denial to be reported for confined processes running as root, which tends to be flagged as a problem to be fixed in the policy. Yet dontauditing or allowing CAP_SYS_ADMIN to the domain may not be desirable, as it would allow/silence also other checks - either going against the principle of least privilege or making debugging potentially harder. Fix it by changing it from capable() to ns_capable_noaudit(), which instructs the LSMs to not audit the resulting denials. """ Fixes: f300769ead03 ("arm64: bpf: Only mitigate cBPF programs loaded by unprivileged users") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Link: https://lore.kernel.org/r/20251204125916.441021-1-omosnace@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>