summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2 daysPCI: endpoint: pci-epf-vntb: Implement .db_vector_count()/mask() for doorbellsKoichiro Den
Implement .db_vector_count() and .db_vector_mask() so NTB core/clients can map doorbell events to per-vector work and avoid the thundering-herd behavior. pci-epf-vntb reserves two slots in db_count: slot 0 for link events and slot 1 which is historically unused. Therefore the number of doorbell vectors is (db_count - 2). Report vectors as 0..N-1 and return BIT_ULL(db_vector) for the corresponding doorbell bit. Build db_valid_mask from a validated vector count so out-of-range db_count values cannot create invalid shifts. Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260513024923.451765-8-den@valinux.co.jp
2 daysPCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_maskKoichiro Den
In pci-epf-vntb, db_count represents the total number of doorbell slots exposed to the peer, including: - slot #0 reserved for link events, and - slot #1 historically unused (kept for compatibility). Only the remaining slots correspond to actual doorbell bits. The current db_valid_mask() exposes all slots as valid doorbells. Limit db_valid_mask() to the real doorbell bits by returning BIT_ULL(db_count - 2) - 1, and guard against db_count < 2. Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP") Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260513024923.451765-7-den@valinux.co.jp
2 daysMerge tag 'platform-drivers-x86-v7.2-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Ilpo Järvinen: - amd/hfi: Add support for dynamic ranking tables (version 3) - amd/pmc: - Add PMC driver support for AMD 1Ah M80H SoC - Delay suspend for some Lenovo Laptops to avoid keyboard and lid switch problems after s2idle - arm64: qcom-hamoa-ec: Add Hamoa/Purwa/Glymur EC driver - asus-armoury: add support for G614PR, GA402NJ, GA403UM, and FX608JPR - asus-wmi: add keystone dongle support - dell-dw5826e: Add reset driver for DW5826e - dell-laptop: Fix rollback path - hp-wmi: - Add support for Omen 16-ap0xxx (board ID 8D26) and board ID 8B2F - intel-hid: - Add HP ProBook x360 440 G1 5 button array support - Prevent racing ACPI notify handlers - intel/pmc: - Add Nova Lake support - Rate-limit LTR scale-factor warning - intel-uncore-freq: - Expose instance ID in the sysfs - Fix current_freq_khz after CPU hotplug - intel/vsec: Restore BAR fallback for header walk - ISST: Restore SST-PP control to all domains - lenovo-wmi-*: - Add more CPU tunable attributes - Add GPU tunable attributes - Add WMI battery charge limiting - oxpec: add support for OneXPlayer Super X - sel3350-platform: Retain LED state on load and unload - surface: SAM: Add support for Surface Pro 12in - uniwill-laptop: Add support for battery charge modes - tools/power/x86/intel-speed-select: Harden daemon pidfile open - Major refactoring efforts: - ACPI driver to platform driver conversion - Converting drivers to use the improved WMI API - Miscellaneous cleanups / refactoring / improvements * tag 'platform-drivers-x86-v7.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (115 commits) platform/x86/intel/pmc: Add NVL PCI IDs for SSRAM telemetry discovery platform/x86/intel/pmc/ssram: Make PMT registration optional platform/x86/intel/pmc/ssram: Add ACPI discovery scaffolding platform/x86/intel/pmc/ssram: Switch to static array with per-index probe state platform/x86/intel/pmc/ssram: Refactor DEVID/PWRMBASE extraction into helper platform/x86/intel/pmc/ssram: Add PCI platform data platform/x86/intel/pmc/ssram: Rename probe and PCI ID table for consistency platform/x86/intel/pmc: Add ACPI PWRM telemetry driver for Nova Lake S platform/x86/intel/pmc: Add PMC SSRAM Kconfig description platform/x86/intel/pmt: Unify header fetch and add ACPI source platform/x86/intel/pmt: Cache the telemetry discovery header platform/x86/intel/pmt: Pass discovery index instead of resource platform/x86/intel/pmt/telemetry: Move overlap check to post-decode hook platform/x86/intel/pmt/crashlog: Split init into pre-decode platform/x86/intel/pmt: Add pre/post decode hooks around header parsing modpost: Handle malformed WMI GUID strings platform/wmi: Make sysfs attributes const platform/wmi: Make wmi_bus_class const hwmon: (dell-smm) Use new buffer-based WMI API platform/x86: dell-ddv: Use new buffer-based WMI API ...
3 daysMerge tag 'nvme-7.2-2026-06-23' of git://git.infradead.org/nvme into block-7.2Jens Axboe
Pull NVMe fixes from Keith: "- Apple A11 quirk for sharing tags across admin and IO queues (Nick) - Target fix for short AUTH_RECEIVE buffers (Michael) - Target fix for SQ refcount leak (Wentao) - Target RDMA handling inline data with nonzero offset (Bryam) - Target TCP fix handling the TCP_CLOSING state (Maurizio) - FC abort fixes in early initialization (Mohamed) - Controller device teardown fixes (Maurizio, John) - Allocate the target ana_state with the port (Rosen) - Quieten sparse and sysfs symbol warnings (John)" * tag 'nvme-7.2-2026-06-23' of git://git.infradead.org/nvme: nvmet-tcp: handle TCP_CLOSING state in nvmet_tcp_state_change nvmet-auth: reject short AUTH_RECEIVE buffers nvme-fc: Do not cancel requests in io target before it is initialized nvme: make nvme_add_ns{_head}_cdev return void nvme: make some sysfs diagnostic structures static nvmet-rdma: handle inline data with a nonzero offset nvme: target: allocate ana_state with port nvme: fix crash and memory leak during invalid cdev teardown nvmet: fix refcount leak in nvmet_sq_create() nvme: quieten sparse warning in valid LBA size check nvme-apple: Prevent shared tags across queues on Apple A11
3 daysMerge tag 'mailbox-v7.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox Pull mailbox updates from Jassi Brar: "Core: - add debugfs support for used channels - fix resource leak on startup failure - propagate tx error codes - clarify blocking mode thread support Drivers: - exynos: remove unused register definitions - imx: refactor IRQ handlers, migrate to devm helpers, and other minor improvements - mpfs: fix syscon presence check in inbox ISR - mtk-adsp: fix use-after-free during device teardown - qcom: add dt-bindings for QCOM Maili, Hawi, Shikra APCS, and Nord CPUCP platform support" * tag 'mailbox-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox: (23 commits) mailbox: imx: Don't force-thread the primary handler mailbox: imx: Move the RXDB part of the mailbox into the threaded handler mailbox: imx: Move the RX part of the mailbox into the threaded handler mailbox: imx: Start splitting the IRQ handler in primary and threaded handler mailbox: imx: Use channel index instead of zero in imx_mu_specific_rx() mailbox: imx: use devm_of_platform_populate() mailbox: imx: Use devm_pm_runtime_enable() mailbox: imx: Add a channel shutdown field mailbox: imx: Forward the timeout/ error in imx_mu_generic_tx() dt-bindings: mailbox: qcom: Add IPCC support for Maili Platform mailbox: add list of used channels to debugfs mailbox: don't free the channel if the startup callback failed mailbox: Make mbox_send_message() return error code when tx fails mailbox: Clarify multi-thread is not supported in blocking mode mailbox: mtk-adsp: fix UAF during device teardown mailbox: qcom: Unify user-visible "Qualcomm" name mailbox: exynos: Drop unused register definitions dt-bindings: mailbox: qcom: Add IPCC support for Hawi Platform dt-bindings: mailbox: qcom,cpucp-mbox: Add Hawi compatible dt-bindings: mailbox: qcom: Add Shikra APCS compatible ...
3 daysMerge tag 'for-next-tpm-7.2-rc1-fixed' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm updates from Jarkko Sakkinen: "Only bug fixes" * tag 'for-next-tpm-7.2-rc1-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: fix event_size output in tpm1_binary_bios_measurements_show tpm: tpm_crb_ffa: revert defered_probed when tpm_crb_ffa is built-in tpm: tpm2-sessions: wait for async KPP completion in tpm_buf_append_salt tpm: tpm_tis: Add settle time for some TPMs tpm: tpm_tis: store entire did_vid tpm_crb: Check ACPI_COMPANION() against NULL during probe tpm: tpm_tis_spi: Use wait_woken() in wait_for_tmp_stat() tpm: Initialize name_size_alg for non-NULL name in tpm_buf_append_name() tpm: restore timeout for key creation commands tpm: svsm: constify tpm_chip_ops
3 daysMerge tag 'linux_kselftest-next-7.2-rc1-second' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull more kselftest updates from Shuah Khan: "Docs: -remove obsolete wiki link from kselftest.rst ftrace: - drop invalid top-level local in test_ownership - Fix trace_marker_raw test on 64K page kernels" * tag 'linux_kselftest-next-7.2-rc1-second' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: docs: kselftest: remove link to obsolete wiki selftests/ftrace: Fix trace_marker_raw test on 64K page kernels selftests/ftrace: Drop invalid top-level local in test_ownership
3 daysnetfilter: nf_conntrack_helper: cap maximum number of expectation at helper ↵Pablo Neira Ayuso
registration On helper registration, the maximum number of expectations cannot go over NF_CT_EXPECT_MAX_CNT (255), but zero can be specified then nf_conntrack_expect_max applies. Turn zero into NF_CT_EXPECT_MAX_CNT otherwise, expectation LRU eviction on insertion is disabled. Moreover, expand this sanity check all expectation classes. This max_expecy policy is only tunable since userspace helpers are available, set Fixes: tag to the commit that adds such infrastructure. Remove the check for p->max_expected given this field must always be non-zero after this patch. Fixes: 12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysnetfilter: nft_ct: expectation timeouts are passed in millisecondsFlorian Westphal
Userspace passes '5000' in case user asks for 5 seconds. Allowing for sub-second expectation lifetimes makes sense to me. so fix up the kernel side instead of munging nft to send a value rounded up to next second. Also note that this violates nft convention of passing integers in network byte order, but we can't change this anymore. Fixes: 857b46027d6f ("netfilter: nft_ct: add ct expectations support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysnetfilter: nf_conntrack_expect: run expectation eviction with no helperPablo Neira Ayuso
Run expectation eviction if no helper is specified to deal with the nft_ct expectation support. Cap the maximum expectation limit per master conntrack to NF_CT_EXPECT_MAX_CNT (255). Fixes: 857b46027d6f ("netfilter: nft_ct: add ct expectations support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysnetfilter: nf_conntrack_expect: store master_tuple in expectationPablo Neira Ayuso
Store master conntrack tuple in the expectation since exp->master might refer to a different conntrack when accessed from rcu read side lock area due to typesafe rcu rules. Fixes: 02a3231b6d82 ("netfilter: nf_conntrack_expect: store netns and zone in expectation") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysnetfilter: conntrack: add deprecation warnings for irc and pptp trackersFlorian Westphal
IRC Direct client-to-client requires plaintext. IRC over TLS should be preferred, making this helper ineffective. Add a deprecation warning and update the help text to better reflect that this is needed for the DCC extension, not IRC itself. PPTP is esoteric these days and it is the only helper that requires the destroy callback in the conntrack helper API. Removal would simplify the conntrack core. Both helpers are IPv4 only. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysgpio: davinci: fix IRQ domain leak on devm_kzalloc failureQingshuang Fu
In davinci_gpio_irq_setup(), after successfully creating an IRQ domain with irq_domain_create_legacy(), a subsequent devm_kzalloc() failure in the bank loop causes the function to return -ENOMEM without removing the IRQ domain. Unlike devm-managed resources, irq_domain_create_legacy() does not auto-clean up on probe failure, so the domain is leaked. Fix by calling irq_domain_remove() before returning on allocation failure. Fixes: b5cf3fd827d2 ("gpio: davinci: Redesign driver to accommodate ngpios in one gpio chip") Signed-off-by: Qingshuang Fu <fuqingshuang@kylinos.cn> Link: https://patch.msgid.link/20260623023106.117229-1-fffsqian@163.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
3 daysgpio: tegra: do not call pinctrl for GPIO directionRunyu Xiao
tegra_gpio_direction_input() and tegra_gpio_direction_output() already program the GPIO controller direction registers directly. The additional pinctrl_gpio_direction_input/output() calls do not add a Tegra pinctrl operation, because the Tegra pinmux ops provide GPIO request/free handling but no gpio_set_direction hook. The extra call still enters the pinctrl core and takes pctldev->mutex. Shared GPIO users can call the direction path while holding their per-line spinlock, so this otherwise redundant pinctrl direction call can sleep in an atomic context. This was found by our static analysis tool and then confirmed by manual review of tegra_gpio_probe(), the Tegra GPIO direction callbacks and the Tegra pinctrl ops. The reviewed path has a default non-sleeping struct gpio_chip while the direction callback still enters the pinctrl mutex path. A directed runtime validation kept the same non-sleeping chip registration and drove: gpio_shared_proxy_direction_output() gpiod_direction_output_raw_commit() tegra_gpio_direction_output() pinctrl_gpio_direction_output() Lockdep reported a sleep-in-atomic warning with the shared GPIO spinlock held and pinctrl_get_device_gpio_range() plus tegra_gpio_direction_output() on the stack. Do not mark the whole chip as can_sleep to paper over this: can_sleep describes whether get()/set() may sleep, and Tegra value access is MMIO. Remove the redundant pinctrl direction calls and keep pinctrl involvement in the existing request/free path. Fixes: 11da90541283 ("gpio: tegra: Fix offset of pinctrl calls") Cc: stable@vger.kernel.org Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn> Link: https://patch.msgid.link/20260619152439.1239561-1-runyu.xiao@seu.edu.cn Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
3 daysapparmor: mediate the implicit connect of TCP fast open sendmsgBryam Vargas
sendmsg()/sendto() with MSG_FASTOPEN is a combination of connect(2) and write(2): it opens the connection in the SYN. apparmor_socket_sendmsg() only checks AA_MAY_SEND, so a profile that grants send but denies connect lets a confined task open an outbound TCP/MPTCP connection that connect(2) would have refused, bypassing connect mediation. Mediate the implicit connect when MSG_FASTOPEN is set and a destination is supplied. Add it to apparmor_socket_sendmsg() (not the shared aa_sock_msg_perm() helper, which recvmsg also uses) and call aa_sk_perm() directly, mirroring the selinux and tomoyo fixes. sk_is_tcp() does not cover MPTCP fast open, so the SOCK_STREAM/IPPROTO_MPTCP arm is explicit. Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)") Cc: stable@vger.kernel.org Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me> Signed-off-by: John Johansen <john.johansen@canonical.com>
3 daysnetfilter: ctnetlink: do not allow to reset helper on existing conntrackPablo Neira Ayuso
This feature allows to reset a helper for an existing conntrack, but it is not safe. This requires a synchronized_rcu() call after resetting the helper, which is going to be expensive for a large batch of conntrack entries. This also needs to call to the .destroy callback to release the GRE/PPTP mappings to fix it. This feature antedates the creation of the conntrack-tools and I cannot find a good use-case for this. Given that I cannot find any user in the netfilter.org userspace tree, I prefer to remove this feature. Fixes: c1d10adb4a52 ("[NETFILTER]: Add ctnetlink port for nf_conntrack") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysselftests: nft_queue.sh: add a bridge queue testFlorian Westphal
Add a test queueing from bridge family. This was lacking: we queued from inet for ipv4 and ipv6 but we had no bridge queue test so far. Given kernel MUST validate that in/out port are still part of a bridge device on reinject add a test case for this before adding this check. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysnetfilter: nft_compat: ebtables emulation must reject non-bridge targetsFlorian Westphal
xtables targets return netfilter verdicts: NF_ACCEPT, NF_DROP, and so on. ebtables targets return incompatible verdicts: EBT_ACCEPT, EBT_DROP, ... We cannot allow fallback to NFPROTO_UNSPEC. ebtables doesn't permit this since 11ff7288beb2 ("netfilter: ebtables: reject non-bridge targets") but that commit missed the nft_compat layer. Reported-by: Ren Wei <n05ec@lzu.edu.cn> Reported-by: Wyatt Feng <bronzed_45_vested@icloud.com> Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysselftests: netfilter: conntrack_sctp_collision.sh: Introduce SCTP INIT ↵Yi Chen
collision test The existing test covered a scenario where a delayed INIT_ACK chunk updates the vtag in conntrack after the association has already been established. A similar issue can occur with a delayed SCTP INIT chunk. Add a new simultaneous-open test case where the client's INIT is delayed, allowing conntrack to establish the association based on the server-initiated handshake. When the stale INIT arrives later, it may get recorded and cause a following INIT_ACK from the peer to be accepted instead of dropped. This INIT_ACK overwrites the vtag in conntrack, causing subsequent SCTP DATA chunks to be considered as invalid and then dropped by nft rules matching on ct state invalid. This test verifies such stale INIT chunks do not cause problems. Signed-off-by: Yi Chen <yiche.cy@gmail.com> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysnetfilter: nft_synproxy: stop bypassing the priv->info snapshotRunyu Xiao
nft_synproxy_eval_v4() and nft_synproxy_eval_v6() already take a whole-object READ_ONCE() snapshot of the shared priv->info state before building the SYNACK reply, but nft_synproxy_tcp_options() still masks opts->options with priv->info.options from the live shared object. When a named synproxy object is updated concurrently with SYN traffic, the eval path can then mix mss and timestamp handling from the local snapshot with an options mask taken from a newer configuration, so one SYNACK no longer reflects a coherent synproxy configuration. Use info->options so nft_synproxy_tcp_options() stays on the same local snapshot that the eval path already copied from priv->info. Fixes: ee394f96ad75 ("netfilter: nft_synproxy: add synproxy stateful object support") Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysnetfilter: x_tables.h: fix all kernel-doc warningsRandy Dunlap
- use correct names in kernel-doc comments - add missing struct members to kernel-doc comments Warning: include/linux/netfilter/x_tables.h:41 struct member 'targinfo' not described in 'xt_action_param' Warning: include/linux/netfilter/x_tables.h:41 Excess struct member 'targetinfo' description in 'xt_action_param' Warning: include/linux/netfilter/x_tables.h:90 struct member 'family' not described in 'xt_mtchk_param' Warning: include/linux/netfilter/x_tables.h:90 struct member 'nft_compat' not described in 'xt_mtchk_param' Warning: include/linux/netfilter/x_tables.h:101 expecting prototype for struct xt_mdtor_param. Prototype was for struct xt_mtdtor_param instead Warning: include/linux/netfilter/x_tables.h:121 struct member 'net' not described in 'xt_tgchk_param' Warning: include/linux/netfilter/x_tables.h:121 struct member 'table' not described in 'xt_tgchk_param' Warning: include/linux/netfilter/x_tables.h:121 struct member 'target' not described in 'xt_tgchk_param' Warning: include/linux/netfilter/x_tables.h:121 struct member 'targinfo' not described in 'xt_tgchk_param' Warning: include/linux/netfilter/x_tables.h:121 struct member 'hook_mask' not described in 'xt_tgchk_param' Warning: include/linux/netfilter/x_tables.h:121 struct member 'family' not described in 'xt_tgchk_param' Warning: include/linux/netfilter/x_tables.h:121 struct member 'nft_compat' not described in 'xt_tgchk_param' Warning: include/linux/netfilter/x_tables.h:345 expecting prototype for xt_recseq(). Prototype was for DECLARE_PER_CPU() instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysnetfilter: flowtable: Validate iph->ihl in nf_flow_ip4_tunnel_proto()Lorenzo Bianconi
Add sanity check for iph->ihl field in nf_flow_ip4_tunnel_proto() before using it to compute the header size, avoiding out-of-bounds access with malformed IP headers. While at it, use iph->protocol instead of the hardcoded IPPROTO_IPIP constant when setting ctx->tun.proto and reference ctx->tun.hdr_size when updating ctx->offset. Fixes: ab427db178858 ("netfilter: flowtable: Add IPIP rx sw acceleration") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysnetfilter: nf_conncount: prevent connlimit drops for early confirmed ctFernando Fernandez Mancera
Commit 69894e5b4c5e ("netfilter: nft_connlimit: update the count if add was skipped") introduced a regression where packets for valid connections are dropped when using connlimit for soft-limiting scenarios. The issue occurs when a new connection reuses a socket currently in the TIME_WAIT state. In this scenario, the connection tracking entry is evaluated as already confirmed. Previously, __nf_conncount_add() assumed that if a connection was confirmed and did not originate from the loopback interface, it should skip the addition and return -EEXIST. Skipping the addition triggers a garbage collection run that cleans up the TIME_WAIT connection. Consequently, the active connection count drops to 0, which xt_connlimit mishandles, leading to the false rejection of the perfectly valid new connection. Fix this by replacing the interface check with protocol-agnostic state checks. We now skip the tree insertion and preserve the lockless garbage collection optimization only if the connection is IPS_ASSURED. This allows early-confirmed setup packets (such as reused TIME_WAIT sockets or locally generated SYN-ACKs) to be properly evaluated and counted without falsely dropping. The goto check_connections path is maintained to ensure these setup packets are deduplicated correctly. This has been tested with slowhttptest and HTTP server configured locally to ensure we are not breaking soft-limiting scenarios for local or external connections. In addition, it was tested with a OVS zone limit too. Fixes: 69894e5b4c5e ("netfilter: nft_connlimit: update the count if add was skipped") Reported-by: Alejandro Olivan Alvarez <alejandro.olivan.alvarez@gmail.com> Closes: https://lore.kernel.org/netfilter-devel/177349610461.3071718.4083978280323144323@eldamar.lan/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysnetfilter: nf_nat: avoid invalid nat_net pointer use on failed nf_nat_init()Mathias Krause
We ran into below KASAN splat, which is mostly uninteresting, beside for having nf_nat_register_fn() in the call chain as a cause for the offending access: ================================================================== BUG: KASAN: slab-out-of-bounds in nf_nat_register_fn+0x5f9/0x640 Read of size 8 at addr ffff890031e54c20 by task iptables/9510 CPU: 0 UID: 0 PID: 9510 Comm: iptables Not tainted 6.18.18-grsec-full-20260320181326 #1 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 Call Trace: <TASK> […] dump_stack_lvl+0xee/0x160 ffff88004117eeb8 […] print_report+0x6e/0x640 ffff88004117eee0 […] ? __phys_addr+0x8e/0x140 ffff88004117eef0 […] ? kasan_addr_to_slab+0x51/0xe0 ffff88004117ef08 […] ? complete_report_info+0xec/0x1c0 ffff88004117ef20 […] ? nf_nat_register_fn+0x5f9/0x640 ffff88004117ef48 […] kasan_report+0xbc/0x140 ffff88004117ef50 […] ? nf_nat_register_fn+0x5f9/0x640 ffff88004117ef90 […] nf_nat_register_fn+0x5f9/0x640 ffff88004117eff8 […] ? nf_nat_icmp_reply_translation+0x6e0/0x6e0 ffff88004117f070 […] nf_tables_register_hook.part.0+0xa0/0x220 ffff88004117f080 […] nf_tables_addchain.constprop.0+0x1054/0x1fc0 ffff88004117f0b8 […] ? nft_chain_lookup.part.0+0x4ce/0xac0 ffff88004117f130 […] ? nf_tables_abort+0x3d80/0x3d80 ffff88004117f190 […] ? nf_tables_dumpreset_obj+0x100/0x100 ffff88004117f1c8 […] ? nft_table_lookup.part.0+0x255/0x300 ffff88004117f310 […] ? nf_tables_newchain+0x21a4/0x2fa0 ffff88004117f358 […] nf_tables_newchain+0x21a4/0x2fa0 ffff88004117f360 […] ? nf_tables_addchain.constprop.0+0x1fc0/0x1fc0 ffff88004117f458 […] ? nla_get_range_signed+0x4a0/0x4a0 ffff88004117f488 […] ? lock_acquire+0x16f/0x320 ffff88004117f490 […] ? find_held_lock+0x3b/0xe0 ffff88004117f4b0 […] ? __nla_parse+0x45/0x80 ffff88004117f500 […] nfnetlink_rcv_batch+0xbca/0x19a0 ffff88004117f550 […] ? nfnetlink_net_exit_batch+0x120/0x120 ffff88004117f618 […] ? __sanitizer_cov_trace_switch+0x63/0xe0 ffff88004117f720 […] ? gr_acl_handle_mmap+0x1c4/0x320 ffff88004117f7c0 […] ? nla_get_range_signed+0x4a0/0x4a0 ffff88004117f7e8 […] ? gr_is_capable+0x6f/0xe0 ffff88004117f830 […] ? __nla_parse+0x45/0x80 ffff88004117f860 […] ? skb_pull+0x103/0x1a0 ffff88004117f880 […] nfnetlink_rcv+0x3db/0x4a0 ffff88004117f8b0 […] ? nfnetlink_rcv_batch+0x19a0/0x19a0 ffff88004117f8d8 […] ? netlink_lookup+0xe2/0x240 ffff88004117f900 […] netlink_unicast+0x74b/0xb00 ffff88004117f930 […] ? netlink_attachskb+0xb20/0xb20 ffff88004117f980 […] ? __check_object_size+0x3e/0xaa0 ffff88004117f998 […] ? security_netlink_send+0x51/0x160 ffff88004117f9c8 […] netlink_sendmsg+0xa03/0x1200 ffff88004117f9f8 […] ? netlink_unicast+0xb00/0xb00 ffff88004117fa70 […] ? netlink_unicast+0xb00/0xb00 ffff88004117fac8 […] ? ____sys_sendmsg+0xe2a/0x1040 ffff88004117faf8 […] ____sys_sendmsg+0xe2a/0x1040 ffff88004117fb00 […] ? kernel_recvmsg+0x300/0x300 ffff88004117fb60 […] ? reacquire_held_locks+0xe9/0x260 ffff88004117fbc8 […] ___sys_sendmsg+0x138/0x200 ffff88004117fbf8 […] ? do_recvmmsg+0x7e0/0x7e0 ffff88004117fc30 […] ? lockdep_hardirqs_on_prepare+0x101/0x1e0 ffff88004117fc50 […] ? lock_acquire+0x16f/0x320 ffff88004117fd20 […] ? lock_acquire+0x16f/0x320 ffff88004117fd58 […] ? find_held_lock+0x3b/0xe0 ffff88004117fd70 […] __sys_sendmsg+0x17a/0x260 ffff88004117fdc8 […] ? __sys_sendmsg_sock+0x80/0x80 ffff88004117fdf0 […] ? syscall_trace_enter+0x15e/0x2c0 ffff88004117fe98 […] do_syscall_64+0x7d/0x400 ffff88004117fec8 […] entry_SYSCALL_64_safe_stack+0x4a/0x60 ffff88004117fef8 </TASK> ================================================================== The out-of-bounds report, though, is a red herring as it is for an access that shouldn't have happened in the first place. When nf_nat_init() fails to register its BPF kfuncs, it'll unwind and, among others, call unregister_pernet_subsys() to deregister its per-net ops. This makes the previously allocated net id available for reuse by the next caller of register_pernet_subsys(), in our case, synproxy. However, 'nat_net_id' will still hold the previously allocated value. If nf_nat.o gets build as a module, all this doesn't matter. A failed initialization routine makes the module fail to load and any dependent module won't be able to load either. However, if nf_nat.o is built-in, a failing init won't /completely/ make its functionality unavailable to dependent modules, namely the code and static data is still there, free to be called by modules like nft_chain_nat.ko. Case in point, nft_chain_nat registers hooks that'll call into nf_nat which, in our case, failed to initialize and therefore won't have a valid net id nor related net_nat object any more. Code in nf_nat, namely nf_nat_register_fn() and nf_nat_unregister_fn(), still making use of the reallocated net id, lead to a type confusion as the call to net_generic() will no longer return memory belonging to an object suited to fit 'struct nat_net' but 'struct synproxy_net' instead. The latter is only 24 bytes on 64-bit systems, much smaller than struct nat_net which is 176 bytes, perfectly explaining the OOB KASAN report. Detect and handle a failed nf_nat_init() by testing the 'nf_nat_hook' pointer which will be reset to NULL on initialization errors to prevent the usage of an invalid nat_net pointer. As this check is only needed when nf_nat.o is built-in, guard it by '#ifndef MODULE...'. Fixes: cbc1dd5b659f ("netfilter: nf_nat: Fix possible memory leak in nf_nat_init()") Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3 daysMerge branch 'next' into for-linusDmitry Torokhov
Prepare input updates for 7.2 merge window.
3 daysInput: mms114 - fix touch indexing for MMS134S and MMS136Dmitry Torokhov
The MMS134S and MMS136 touch controllers have an event size of 6 bytes rather than 8 bytes. When __mms114_read_reg() reads the touch data packet from the device into the touch buffer, the events are packed tightly at 6-byte intervals. However, the driver iterates through the events using standard C array indexing (touch[index]), where each element is sizeof(struct mms114_touch) (8 bytes) apart. As a result, any touch events beyond the first one are read from incorrect offsets and parsed improperly. Fix this by explicitly calculating the byte offset for each touch event based on the device's specific event size. Fixes: 53fefdd1d3a3 ("Input: mms114 - support MMS136") Fixes: ab108678195f ("Input: mms114 - support MMS134S") Reported-by: sashiko-bot@kernel.org Assisted-by: Antigravity:gemini-3.5-flash Reviewed-by: Bryam Vargas <hexlabsecurity@proton.me> Link: https://patch.msgid.link/20260616050912.1531241-1-dmitry.torokhov@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
3 daysInput: elan_i2c - prevent division by zero and arithmetic underflowRanjan Kumar
The Elan I2C touchpad driver queries the device for its physical dimensions and trace counts to calculate the device resolution and width. However, if the device firmware or device tree provides invalid zero values for x_traces or y_traces, it results in a fatal division-by-zero exception leading to a kernel panic during device probe. Add checks to ensure these parameters are non-zero before performing the division. If invalid trace values are detected, fall back to a safe default of 1. Additionally, prevent an arithmetic underflow in the touch reporting logic. Previously, if the calculated or fallback width was smaller than ETP_FWIDTH_REDUCE (90), the subtraction would underflow, resulting in a massive unsigned integer being reported to userspace. Clamp the adjusted width to a minimum of 0 to safely handle small physical dimensions and fallback scenarios. Completing the probe with safe fallback values ensures the sysfs nodes are created, keeping the firmware update path intact so a recovery firmware can be flashed to the device. Fixes: 6696777c6506 ("Input: add driver for Elan I2C/SMbus touchpad") Fixes: e3a9a1290688 ("Input: elan_i2c - do not query the info if they are provided") Signed-off-by: Ranjan Kumar <kumarranja@chromium.org> Link: https://patch.msgid.link/20260612060339.3829666-1-kumarranja@chromium.org Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
3 daysInput: stop force-feedback timer when unregistering input devicesDmitry Torokhov
Memoryless force-feedback devices use a timer to manage playback of effects. When a driver for such a device is unbound (or the device is unregistered for other reasons), the driver typically frees its private data synchronously. However, the input_dev structure (and its associated force-feedback structures, including the timer) is only freed when the last user closes the corresponding device node. If userspace keeps the device node open while the device is unregistered (e.g., during driver unbind), the force-feedback timer can still fire after the driver's private data has been freed. Introduce a new 'stop' callback to struct ff_device, and call it from input_unregister_device() before the device is deleted. Implement this callback for memoryless devices and synchronously shut down the timer to ensure it is stopped and cannot be rearmed once unregistration happens. Assisted-by: Gemini:gemini-3.1-pro Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
3 daysInput: iforce - bound the device-reported force-feedback effect indexBryam Vargas
iforce_process_packet() handles a status report (packet id 0x02) by taking a force-feedback effect index straight from the device wire and using it to address the per-effect state array: i = data[1] & 0x7f; if (data[1] & 0x80) { if (!test_and_set_bit(FF_CORE_IS_PLAYED, iforce->core_effects[i].flags)) ... } else if (test_and_clear_bit(FF_CORE_IS_PLAYED, iforce->core_effects[i].flags)) { ... } The index is masked only with 0x7f, so it ranges 0..127, but core_effects[] holds only IFORCE_EFFECTS_MAX (32) entries. For an index of 32..127 the test_and_set_bit()/test_and_clear_bit() is an out-of-bounds single-bit read-modify-write past the array. core_effects[] is the second-to-last member of struct iforce, so the write lands in the trailing members and beyond the embedding kzalloc()'d iforce_serio / iforce_usb object. data[1] is unvalidated device payload on both transports (the USB interrupt endpoint and serio), and the status path is not gated on force feedback being present, so a malicious or counterfeit device can set or clear a bit at an attacker-chosen offset past the object. Reject an out-of-range index instead of indexing with it. Bound against the array dimension IFORCE_EFFECTS_MAX rather than dev->ff->max_effects so the check guarantees memory safety regardless of how many effects the device registered. A legitimate "effect started/stopped" status always carries an index below IFORCE_EFFECTS_MAX, so well-formed devices are unaffected; the neighbouring mark_core_as_ready() loop is already bounded and is left untouched. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260613-b4-disp-4828d263-v1-1-02320e1a89dd@proton.me Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
3 daysInput: goodix - clamp the device-reported contact countBryam Vargas
goodix_ts_read_input_report() copies the number of touch points reported by the device into an on-stack buffer u8 point_data[2 + GOODIX_MAX_CONTACT_SIZE * GOODIX_MAX_CONTACTS]; which is sized for at most GOODIX_MAX_CONTACTS (10) contacts. The only runtime check bounds the per-interrupt count against ts->max_touch_num, but that value is taken verbatim from a 4-bit field of the device configuration block and is never clamped: ts->max_touch_num = ts->config[MAX_CONTACTS_LOC] & 0x0f; The nibble can be 0..15, so a malfunctioning, malicious or counterfeit controller (or an attacker tampering with the I2C bus) can advertise up to 15 contacts. goodix_ts_read_input_report() then accepts a touch_num of up to 15 and the second goodix_i2c_read() writes ts->contact_size * (touch_num - 1) bytes past the one-contact header into point_data - up to 30 bytes (45 with the 9-byte report format) beyond the 92-byte buffer: a stack out-of-bounds write. Clamp max_touch_num to GOODIX_MAX_CONTACTS, the number of contacts point_data[] is sized for, when reading it from the configuration. Fixes: a7ac7c95d468 ("Input: goodix - use max touch number from device config") Cc: stable@vger.kernel.org Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me> Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com> Link: https://patch.msgid.link/20260612-b4-disp-6844625d-v1-1-df0aed080c9d@proton.me Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
3 daysbpf: Disable xfrm_decode_session hook attachmentBradley Morgan
BPF LSM programs can currently attach to xfrm_decode_session(). That hook may return an error, but security_skb_classify_flow() calls it from a void path and triggers BUG_ON() if an error is returned. Disable BPF attachment to the hook to prevent a BPF LSM program from turning packet classification into a full panic. Fixes: 9e4e01dfd325 ("bpf: lsm: Implement attach, detach and execution") Signed-off-by: Bradley Morgan <include@grrlz.net> Link: https://lore.kernel.org/r/20260619130305.27779-1-include@grrlz.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysMerge tag 'erofs-for-7.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "The most notable change is the removal of the fscache backend: it has been deprecated for almost two years, mainly because EROFS file-backed mounts and fanotify pre-content hooks (together with erofs-utils) now provide better functionality and simpler codebase. In addition, fscache has depended on netfslib for years, which is undesirable for EROFS since it is a local filesystem. More details in [1]. In addition, sparse support has been added to the pcluster layout, which is helpful for large sparse AI datasets, and map requests for chunk-based inodes have been optimized to be more efficient as well. There are also the usual fixes and cleanups. Summary: - Report more consecutive chunks of the same type for each iomap request - Add sparse support for the pcluster layout - Update the EROFS documentation overview - Remove the deprecated fscache backend - Various fixes and cleanups" Link: https://lore.kernel.org/r/20260622013622.934174-1-hsiangkao@linux.alibaba.com [1] * tag 'erofs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: handle 48-bit blocks_hi for compressed inodes erofs: remove fscache backend entirely erofs: simplify RCU read critical sections erofs: add sparse support to pcluster layout erofs: add folio order to trace_erofs_read_folio erofs: introduce erofs_map_chunks() erofs: call erofs_exit_ishare() before rcu_barrier() erofs: update the overview of the documentation erofs: clean up erofs_ishare_fill_inode()
3 daysmd/raid5: avoid R5_Overlap races while breaking stripe batchesChen Cheng
KCSAN report a race in break_stripe_batch_list() vs. raid5_make_request() on sh->dev[i].flags (plain word write vs. atomic bit op).. and .. one possible scenario is: CPU1 CPU2 break_stripe_batch_list(sh1) -> handle sh2 -> lock(sh2) -> sh2->batch_head = NULL -> unlock(sh2) -> test_and_clear_bit(R5_Overlap, sh2->dev[i].flags) -> wake_up_bit(sh2->dev[i].flags) raid5_make_request() -> add_all_stripe_bios(sh2) -> lock(sh2) -> stripe_bio_overlaps(sh2) returns true batch_head is NULL, so new bio overlap exist bio on sh2 -> true -> set_bit(R5_Overlap, sh2->dev[i].flags) -> unlock(sh2) -> wait_on_bit(sh2->dev[i].flags) -> sh2->dev[i].flags = sh1->dev[i].flags & ~R5_Overlap No wait_up_bit(), CPU2 could be wait_on_bit() forever... Fix by : - Expand the protect zone. - Use batch_head's device flag's snaphot when no held head_sh->stripe_lock. - Move sh/head_sh->batch_head = NULL to the end of protected zone , and , any concurrent add_all_stripe_bios() grabs sh->stripe_lock now either: - see batch_head != null, and , is rejected by stripe_bio_overlaps() under the lock (no R5_Overlap wait ) , or , - sees batch_head == NULL, only after dev[i].flags has already been set and the prior R5_Overlap waiters worken. KCSAN report: ================================================ BUG: KCSAN: data-race in break_stripe_batch_list / raid5_make_request write (marked) to 0xffff8e89c8117548 of 8 bytes by task 4042 on cpu 0: raid5_make_request+0xea0/0x2930 md_handle_request+0x4a2/0xa40 md_submit_bio+0x109/0x1a0 __submit_bio+0x2ec/0x390 submit_bio_noacct_nocheck+0x457/0x710 submit_bio_noacct+0x2a7/0xc20 submit_bio+0x56/0x250 blkdev_direct_IO+0x54c/0xda0 blkdev_write_iter+0x38f/0x570 aio_write+0x22b/0x490 io_submit_one+0xa51/0xf70 __x64_sys_io_submit+0xf7/0x220 x64_sys_call+0x1907/0x1c60 do_syscall_64+0x130/0x570 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff8e89c8117548 of 8 bytes by task 4010 on cpu 5: break_stripe_batch_list+0x249/0x480 handle_stripe_clean_event+0x720/0x9b0 handle_stripe+0x32fb/0x4500 handle_active_stripes.isra.0+0x6e0/0xa50 raid5d+0x7e0/0xba0 md_thread+0x15a/0x2d0 kthread+0x1e3/0x220 ret_from_fork+0x37a/0x410 ret_from_fork_asm+0x1a/0x30 value changed: 0x0000000000000019 -> 0x0000000000000099 --> R5_Overlap Fixes: fb642b92c267 ("md/raid5: duplicate some more handle_stripe_clean_event code in break_stripe_batch_list") Signed-off-by: Chen Cheng <chencheng@fnnas.com> Link: https://patch.msgid.link/20260619041013.1207148-1-chencheng@fnnas.com Signed-off-by: Yu Kuai <yukuai@fygo.io>
3 daysmd/raid5: use stripe state snapshot in break_stripe_batch_list()Chen Cheng
The patch just suppress KCSAN noise. No functional change. RAID-5 can group multi full-stripe-write aka stripe_head into a batch aka batch_list, with one head_sh leading them. Call break_stripe_batch_list() when the batch is finished, or, a stripe has to be dropped out of the batch. break_stripe_batch_list() reads stripe state several times while request paths can update thost state words concurrently with lockless bitops, which reported by KCSAN. Use a snapshot to guarantees that the value used for warning, copying, and handle checks is internally consistent at current read moment. KCSAN report: ============================================== BUG: KCSAN: data-race in __add_stripe_bio / break_stripe_batch_list write (marked) to 0xffff8e89d4f0b988 of 8 bytes by task 4323 on cpu 3: __add_stripe_bio+0x35e/0x400 raid5_make_request+0x6ac/0x2930 md_handle_request+0x4a2/0xa40 md_submit_bio+0x109/0x1a0 __submit_bio+0x2ec/0x390 submit_bio_noacct_nocheck+0x457/0x710 submit_bio_noacct+0x2a7/0xc20 submit_bio+0x56/0x250 blkdev_direct_IO+0x54c/0xda0 blkdev_write_iter+0x38f/0x570 aio_write+0x22b/0x490 io_submit_one+0xa51/0xf70 read to 0xffff8e89d4f0b988 of 8 bytes by task 4290 on cpu 4: break_stripe_batch_list+0x3ce/0x480 handle_stripe_clean_event+0x720/0x9b0 handle_stripe+0x32fb/0x4500 handle_active_stripes.isra.0+0x6e0/0xa50 raid5d+0x7e0/0xba0 Signed-off-by: Chen Cheng <chencheng@fnnas.com> Link: https://patch.msgid.link/20260618134748.1168360-1-chencheng@fnnas.com Signed-off-by: Yu Kuai <yukuai@fygo.io>
3 daysselftests: drv-net: so_txtime: relax variance boundsWillem de Bruijn
The net-next-hw spinners on netdev.bots.linux.dev observe failing so-txtime-py tests. A review of stdout shows most failures to be due to exceeding the 4ms grace period. All I saw were within 8ms. So increase to that. Double the bounds from 4 to 8ms. This is still is small enough to differentiate the delays programmed by the test, 10 and 20ms. Fixes: 5c6baef3885c ("selftests: drv-net: convert so_txtime to drv-net") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20260610170651.1b644001@kernel.org/ Signed-off-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260621200137.1564776-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 daysnet: airoha: Fix TX scheduler queue mask loop upper boundWayen Yan
In airoha_qdma_set_chan_tx_sched(), the loop clearing queue mask was using AIROHA_NUM_TX_RING (32) instead of AIROHA_NUM_QOS_QUEUES (8). Each channel has 8 queues, and TXQ_DISABLE_CHAN_QUEUE_MASK(channel, i) computes BIT(i + (channel * 8)). With i ranging 0..31, this causes: - channel 0: clears bit 0..31 (all 4 channels) instead of 0..7 - channel 1: clears bit 8..31 (channels 1-3) instead of 8..15 - channel 2: clears bit 16..31 (channels 2-3) instead of 16..23 - channel 3: clears bit 24..31 (channel 3 only) - correct by accident While BIT(32+) on arm64 produces 64-bit values truncated to 0 in u32 mask parameter, the loop still incorrectly clears queues within the same channel beyond queue 7. Even though this is functionally harmless (the register resets to 0 and is only ever cleared, never set — so clearing extra bits is a no-op), the loop bound is semantically wrong and should be fixed for correctness and clarity. Fix by using AIROHA_NUM_QOS_QUEUES (8) as the loop upper bound. Fixes: ef1ca9271313 ("net: airoha: Add sched HTB offload support") Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Wayen Yan <win847@gmail.com> Link: https://patch.msgid.link/178187479434.2400840.1312143943526335838@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 daysbnx2x: fix potential memory leak in bnx2x_alloc_mem_bp()Abdun Nihaal
If the allocation of fp[i].tpa_info fails, the error path will not free the struct bnx2x_fastpath allocated earlier, as it is not linked to the bp structure yet. Fix that by linking it immediately after allocation. Cc: stable@vger.kernel.org Fixes: 15192a8cf8a8 ("bnx2x: Split the FP structure") Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260620062402.89549-1-nihaal@cse.iitm.ac.in Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 daysipv4: fib: Don't ignore error route in local/main tables.Kuniyuki Iwashima
When CONFIG_IP_MULTIPLE_TABLES is enabled but no rule is added, fib_lookup() performs route lookup directly on two tables. Since the first lookup does not properly bail out, the result of an error route in the merged local/main table could be overwritten by another route in the default table: # unshare -n # ip link set lo up # ip route add 192.168.0.0/24 dev lo table 253 # ip route add unreachable 192.168.0.0/24 # ip route get 192.168.0.1 192.168.0.1 dev lo table default uid 0 cache <local> Once a random rule is added, the error route is respected: # ip rule add table 0 # ip rule del table 0 # ip route get 192.168.0.1 RTNETLINK answers: No route to host Let's fix the inconsistent behaviour. Fixes: f4530fa574df ("ipv4: Avoid overhead when no custom FIB rules are installed.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260619212753.3367244-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 dayseth: bnxt: improve the timing of statsJakub Kicinski
Kernel selftests wait 1.25x of the promised stats refresh time (as read from ethtool -c). bnxt reports 1sec by default, but the stats update process has two steps. First device DMAs the new values, then the service task performs update in full-width SW counters. So the worst case delay is actually 2x. Note that the behavior is different for ring stats and port stats. Port stats are fetched synchronously by the service worker, so there's no risk of doubling up the delay there. The problem of stale stats impacts not only tests but real workloads which monitor egress bandwidth of a NIC. The inaccuracy causes double counting in the next cycle and spurious overload alarms. Try to read from the DMA buffer more aggressively, to mitigate timing issues between DMA and service task. The SW update should be cheap. Fixes: 51f307856b60 ("bnxt_en: Allow statistics DMA to be configurable using ethtool -C.") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260619191538.104165-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 daysipv6: Fix null-ptr-deref in fib6_nh_mtu_change().Xiang Mei
fib6_nh_mtu_change() re-fetches idev via __in6_dev_get(arg->dev) and dereferences idev->cnf.mtu6 without a NULL check. addrconf_ifdown() clears dev->ip6_ptr with RCU_INIT_POINTER() after rt6_disable_ip() has released tb6_lock, so the RA-driven MTU walk can observe a NULL idev and oops. The caller rt6_mtu_change_route() guards its own __in6_dev_get(), but this re-fetch is unguarded; nexthop-backed routes survive addrconf_ifdown()'s flush, so the walk still reaches it after ip6_ptr is nulled. Return 0 when idev is NULL, matching rt6_mtu_change_route() and the fib6_mtu() fix in commit 5ad509c1fdad ("ipv6: Fix null-ptr-deref in fib6_mtu()."). Oops: general protection fault, ... KASAN: null-ptr-deref in range [0x00000000000002a8-0x00000000000002af] RIP: 0010:fib6_nh_mtu_change+0x203/0x990 rt6_mtu_change_route+0x141/0x1d0 __fib6_clean_all+0xd0/0x160 rt6_mtu_change+0xb4/0x100 ndisc_router_discovery+0x24b5/0x2cb0 icmpv6_rcv+0x12e9/0x1710 ipv6_rcv+0x39b/0x410 Fixes: c0b220cf7d80 ("ipv6: Refactor exception functions") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260619045334.2427073-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 dayshdlc_ppp: sync per-proto timers before freeing hdlc stateFan Wu
Each PPP control protocol (LCP/IPCP/IPV6CP) embedded in struct ppp registers a timer via timer_setup(). That struct ppp is the hdlc->state allocation, which detach_hdlc_protocol() frees with kfree() in both teardown paths: unregister_hdlc_device() and the re-attach inside attach_hdlc_protocol(). The ppp proto never registered a .detach callback, so detach_hdlc_protocol() performs no timer synchronization before the kfree(). The only cancel, timer_delete(&proto->timer) in ppp_cp_event(), is partial (it does not wait for a running callback) and only runs on the ->CLOSED transition; ppp_stop()/ppp_close() do not sync either. A ppp_timer callback already executing (blocked on ppp->lock) survives the kfree and then dereferences proto->state / ppp->lock in freed memory, leading to a use-after-free. Fix this by adding a .detach helper that calls timer_shutdown_sync() on every per-proto timer. detach_hdlc_protocol() invokes proto->detach(dev) before kfree(hdlc->state), so timer_shutdown_sync() now runs on both free paths. timer_shutdown_sync() is used instead of timer_delete_sync() because the keepalive path re-arms the timer through add_timer()/mod_timer() and shutdown blocks any re-activation during teardown. Initialize the per-protocol timers in ppp_ioctl() when the protocol is attached, and remove the now-redundant timer_setup() from ppp_start(), so that the timers are initialized exactly once at attach time and ppp_timer_release() never operates on uninitialized timer_list structures. attach_hdlc_protocol() uses kmalloc() (not kzalloc), so struct ppp's protos[i].timer is uninitialized garbage until the first timer_setup(); without this init-at-attach, attaching the PPP protocol without ever bringing the device up would leave timer_shutdown_sync() operating on uninitialized memory in .detach. Moving the init out of ppp_start() (which only runs on NETDEV_UP) into the attach path makes the initialization unconditional and avoids initializing the same timer_list twice. This bug was found by static analysis. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Fan Wu <fanwu01@zju.edu.cn> Link: https://patch.msgid.link/20260617020518.116319-1-fanwu01@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 daysnet: ethernet: ti: icssg: guard PA stat lookupsPhilippe Schenker
icssg_ndo_get_stats64() unconditionally calls emac_get_stat_by_name() with FW PA stat names regardless of whether the PA stats block is present on the hardware. emac_get_stat_by_name() already guards the PA stats lookup with `if (emac->prueth->pa_stats)`; when that pointer is NULL the lookup falls through to netdev_err() and returns -EINVAL. Because ndo_get_stats64 is polled regularly by the networking stack this produces thousands of log entries of the form: icssg-prueth icssg1-eth end0: Invalid stats FW_RX_ERROR A secondary consequence is that the int(-EINVAL) return value is implicitly widened to a near-ULLONG_MAX unsigned value when accumulated into the __u64 fields of rtnl_link_stats64, silently corrupting the rx_errors, rx_dropped and tx_dropped counters reported by `ip -s link`. Every other PA-aware code path in the driver is already guarded with the same `if (emac->prueth->pa_stats)` check. Apply the same guard here. Fixes: 0d15a26b247d ("net: ti: icssg-prueth: Add ICSSG FW Stats") Signed-off-by: Philippe Schenker <philippe.schenker@impulsing.ch> Reviewed-by: Simon Horman <horms@kernel.org> Cc: danishanwar@ti.com Cc: rogerq@kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260618093037.3448858-1-dev@pschenker.ch Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 daysMerge branch 'fix-stale-register-bounds-on-lsm-retval-context-load'Alexei Starovoitov
Tristan Madani says: ==================== Fix stale register bounds on LSM retval context load From: Tristan Madani <tristan@talencesecurity.com> check_mem_access() calls __mark_reg_s32_range() to narrow a register to the LSM hook retval range, but the intersection preserves stale bounds from prior instructions. Add mark_reg_unknown() before narrowing (same pattern as the else branch) and a selftest that catches the mismatch. Changes in v3: - Add selftest demonstrating the issue (Eduard Zingerman) - No code change in patch 1 from v2 ==================== Link: https://patch.msgid.link/20260622230123.3695446-1-tristmd@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysselftests/bpf: Add test for stale bounds on LSM retval context loadTristan Madani
Add a verifier test that catches the stale-bounds issue fixed in the previous patch. The test sets r6 = 0 to create known bounds, then loads the LSM hook return value into r6 from the context. Without the fix, the verifier intersects the retval range with the stale bounds and incorrectly narrows r6 to a single value, pruning the fall-through branch as dead code and missing the div-by-zero. Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Tristan Madani <tristan@talencesecurity.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260622230123.3695446-3-tristmd@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysbpf: Reset register bounds before narrowing retval range in check_mem_access()Tristan Madani
When the BPF verifier processes a context load of an LSM hook return value, it calls __mark_reg_s32_range() to narrow the register to the hook's valid range. However, __mark_reg_s32_range() intersects the new range with the register's existing bounds using max_t()/min_t() rather than replacing them. If the destination register carries stale bounds from a prior instruction (e.g. BPF_MOV64_IMM), the intersection can produce a range narrower than reality. The verifier then believes it knows the register's exact value, while at runtime the actual hook return value is loaded, creating a verifier/runtime mismatch that can be used to bypass BPF memory safety checks. The else branch already calls mark_reg_unknown() to reset register state before any narrowing. Apply the same reset in the is_retval path so stale bounds are cleared before __mark_reg_s32_range() intersects. Fixes: 5d99e198be27 ("bpf, lsm: Add check for BPF LSM return value") Cc: stable@vger.kernel.org Signed-off-by: Tristan Madani <tristan@talencesecurity.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260622230123.3695446-2-tristmd@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysrocker: Fix memory leak in ofdpa_port_fdb()Ziran Zhang
In ofdpa_port_fdb(), the hash_del() only unlinks the node from hash table, but does not free it. Fix this by adding kfree(found) after the !found == removing check, where the pointer value is no longer needed. Found by Coccinelle kfree script. Cc: <stable+noautosel@kernel.org> # rocker is a test harness, it's never loaded on production systems Signed-off-by: Ziran Zhang <zhangcoder@yeah.net> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260616013245.7098-1-zhangcoder@yeah.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 daysMerge branch 'bpf-guard-conntrack-opts-error-writes'Alexei Starovoitov
Yiyang Chen says: ==================== bpf: Guard conntrack opts error writes The conntrack lookup/allocation kfuncs expose an opts/opts__sz pair. The verifier checks the caller-provided opts__sz range, but the wrappers currently write opts->error after internal errors even when opts__sz is too small to include that field. Patch 1 writes opts->error only when opts__sz includes it, and uses a single helper to fold ERR_PTR returns into the kfunc ABI result while keeping the local nfct result variable in each wrapper. Patch 2 adds a bpf_nf regression check that keeps a guard in opts->error while passing opts__sz covering only netns_id. The regression check follows the existing bpf_nf test shape. Before the fix, the guard is overwritten with -EINVAL even though opts__sz covers only the first four bytes of the options object. After the fix, the kfunc still returns NULL for the invalid size, but the guard remains intact. Validation, rebased and tested on bpf-next master e771677c937d ("Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd"): git diff --check origin/master..HEAD: OK scripts/checkpatch.pl --strict on 1/2 and 2/2: OK make O=/root/ebpf-verifier-bug-detection/kernel-build/bpf-next \ net/netfilter/nf_conntrack_bpf.o: OK Focused QEMU direct-runner against XDP and TC lookup/alloc paths: unpatched bpf-next e771677c937d: guard overwritten with -EINVAL patched v2 007dfd0341cd: guard preserved as 0x12345678 QEMU upstream bpf_nf selftest with CONFIG_NF_CONNTRACK_MARK, CONFIG_NF_CONNTRACK_ZONES, and legacy iptables enabled: ./test_progs -t bpf_nf -vv: OK git am of exported 1/2 and 2/2 on a fresh worktree at base: OK range-diff between branch commits and git-am result: equivalent Changes in v2: - Rebased onto current bpf-next master. - Reworked patch 1 to use bpf_ct_opts_result() for the ERR_PTR-to-NULL conversion and guarded opts->error write, as suggested by Alexei. - Kept the local nfct result variable in each wrapper before returning through bpf_ct_opts_result(). - Added matching Fixes tags to the selftest patch so the regression test can be backported with the fix. v1: https://lore.kernel.org/bpf/cover.1781586477.git.chenyy23@mails.tsinghua.edu.cn/ ==================== Link: https://patch.msgid.link/cover.1781765747.git.chenyy23@mails.tsinghua.edu.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysselftests/bpf: Cover small conntrack opts error writesYiyang Chen
Add a conntrack kfunc regression check for opts__sz values that do not cover opts->error. The BPF program initializes opts->error with a guard value, calls the lookup and allocation kfuncs with opts__sz set to sizeof(opts->netns_id), and verifies that the guard is still intact after the kfunc returns NULL. Without the conntrack wrapper guard, the kfunc error path overwrites that guard with -EINVAL even though the verifier checked only the first four bytes of the options object. Fixes: b4c2b9593a1c ("net/netfilter: Add unstable CT lookup helpers for XDP and TC-BPF") Fixes: d7e79c97c00c ("net: netfilter: Add kfuncs to allocate and insert CT") Signed-off-by: Yiyang Chen <chenyy23@mails.tsinghua.edu.cn> Link: https://lore.kernel.org/r/007dfd0341cd84560e4795a2a951cc56d4adff1d.1781765747.git.chenyy23@mails.tsinghua.edu.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysbpf: Guard conntrack opts error writesYiyang Chen
The conntrack lookup and allocation kfuncs take an opts pointer together with an opts__sz argument. The verifier checks only the memory range described by opts__sz, but the wrappers unconditionally write opts->error whenever the internal lookup or allocation helper returns an error. For an invalid size smaller than the end of opts->error, that write can land outside the verifier-checked range. Keep returning NULL for invalid arguments, but only report the error through opts->error when the supplied size includes the field. This preserves error reporting for the supported 12-byte and 16-byte layouts, and for other invalid sizes that still include opts->error. Fixes: b4c2b9593a1c ("net/netfilter: Add unstable CT lookup helpers for XDP and TC-BPF") Fixes: d7e79c97c00c ("net: netfilter: Add kfuncs to allocate and insert CT") Signed-off-by: Yiyang Chen <chenyy23@mails.tsinghua.edu.cn> Link: https://lore.kernel.org/r/9535e781fe14449b1d4e9bbc3baa7566a93bf512.1781765747.git.chenyy23@mails.tsinghua.edu.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysblk-cgroup: defer blkcg css_put until blkg is unlinked from queueZizhi Wo
[BUG] Our fuzz testing triggered a blkcg use-after-free issue: BUG: KASAN: slab-use-after-free in _raw_spin_lock+0x75/0xe0 Call Trace: ... blkcg_deactivate_policy+0x244/0x4d0 ioc_rqos_exit+0x44/0xe0 rq_qos_exit+0xba/0x120 __del_gendisk+0x50b/0x800 del_gendisk+0xff/0x190 ... [CAUSE] process1 process2 cgroup_rmdir ... css_killed_work_fn offline_css ... blkcg_destroy_blkgs ... __blkg_release css_put(&blkg->blkcg->css) blkg_free INIT_WORK(xxx, blkg_free_workfn) schedule_work css_put ... blkcg_css_free kfree(blkcg)--------blkcg has been freed!!! ====================================schedule_work blkg_free_workfn __del_gendisk rq_qos_exit ioc_rqos_exit blkcg_deactivate_policy mutex_lock(&q->blkcg_mutex) spin_lock_irq(&q->queue_lock) list_for_each_entry(blkg, xxx) blkcg = blkg->blkcg spin_lock(&blkcg->lock)-------UAF!!! mutex_lock(&q->blkcg_mutex) spin_lock_irq(&q->queue_lock) /* Only then is the blkg removed from the list */ list_del_init(&blkg->q_node) As a result, a blkg can still be reachable through q->blkg_list while its ->blkcg has already been freed. [Fix] Fix this by deferring the blkcg css_put() until after the blkg has been unlinked from q->blkg_list in blkg_free_workfn(). This ensures that the blkcg outlives every blkg still reachable through q->blkg_list, so any iterator holding q->queue_lock is guaranteed to observe a valid blkg->blkcg. While at it, move css_tryget_online() from blkg_create() into blkg_alloc() so that the css reference is owned by the alloc/free pair rather than straddling layers: blkg_alloc() <-> blkg_free() blkg_create() <-> blkg_destroy() Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()") Suggested-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Yu Kuai <yukuai@fygo.io> Reviewed-by: Tang Yizhou <yizhou.tang@shopee.com> Link: https://patch.msgid.link/20260616011746.2451461-1-wozizhi@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>