7 days  ALSA: core: Serialize deferred fasync state checks  Cássio Gabriel
snd_fasync_helper() updates fasync->on under snd_fasync_lock, and snd_fasync_work_fn() now also evaluates fasync->on under the same lock. snd_kill_fasync() still tests the flag before taking the lock, leaving an unsynchronized read against FASYNC enable/disable updates. Move the enabled-state check into the locked section. Also clear fasync->on under snd_fasync_lock in snd_fasync_free() before unlinking the pending entry. Together with the locked sender-side check, this publishes teardown before flushing the deferred work and prevents a racing sender from requeueing the entry after free has started. Fixes: ef34a0ae7a26 ("ALSA: core: Add async signal helpers") Fixes: 8146cd333d23 ("ALSA: core: Fix potential data race at fasync handling") Cc: stable@vger.kernel.org Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com> Link: https://patch.msgid.link/20260506-alsa-core-fasync-on-lock-v1-1-ea48c77d6ca4@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
7 days  ALSA: hda/realtek: Add mute LED fixup for HP Pavilion 15-cs1xxx  Rodrigo Faria
Add a new fixup for the mute LED on the HP Pavilion 15-cs1xxx series using the VREF on NID 0x1b. The BIOS on these models (tested up to F.32) incorrectly reports the mute LED on NID 0x18 via DMI OEM strings, which lacks VREF capabilities. This fixup overrides the LED pin to the correct NID 0x1b. Signed-off-by: Rodrigo Faria <rodrigofilipefaria@gmail.com> Link: https://patch.msgid.link/20260505185518.23625-1-rodrigofilipefaria@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
7 days  ALSA: seq: Fix UMP group 16 filtering  Cássio Gabriel
The sequencer UAPI defines group_filter as an unsigned int bitmap. Bit 0 filters groupless messages and bits 1-16 filter UMP groups 1-16. The internal snd_seq_client storage is only unsigned short, so bit 16 is truncated when userspace sets the filter. The same truncation affects the automatic UMP client filter used to avoid delivery to inactive groups, so events for group 16 cannot be filtered. Store the internal bitmap as unsigned int and keep both userspace-provided and automatically generated values limited to the defined UAPI bits. Fixes: d2b706077792 ("ALSA: seq: Add UMP group filter") Cc: stable@vger.kernel.org Signed-off-by: Cássio Gabriel <cassiogabrielcontato@gmail.com> Link: https://patch.msgid.link/20260506-alsa-seq-ump-group16-filter-v1-1-b75160bf6993@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
7 days  timers/migration: Fix another hotplug activation race  Frederic Weisbecker
The hotplug control CPU is assumed to be active in the hierarchy, but that doesn't imply that the root is active. If the current CPU is not the one that activated the current hierarchy, and the CPU performing this duty is still halfway through the tree, the root may still be observed inactive. This can break the activation of a new root, as in the following scenario:

1) Initially, the whole system has 64 CPUs and only CPU 63 is awake.

                       [GRP1:0]
                        active
                    /     |     \
              [GRP0:0]  [...]  [GRP0:7]
                idle     idle    active
               /  |  \              |
          CPU 0  CPU 1  ...      CPU 63
           idle   idle            active

2) CPU 63 goes idle _but_ due to a #VMEXIT it hasn't yet reached the [GRP1:0]->parent dereference (that would be NULL and stop the walk) in __walk_groups_from().

                       [GRP1:0]
                         idle
                    /     |     \
              [GRP0:0]  [...]  [GRP0:7]
                idle     idle     idle
               /  |  \              |
          CPU 0  CPU 1  ...      CPU 63
           idle   idle             idle

3) CPU 1 wakes up and activates GRP0:0, but didn't yet manage to propagate up to GRP1:0 due to yet another #VMEXIT.

                       [GRP1:0]
                         idle
                    /     |     \
              [GRP0:0]  [...]  [GRP0:7]
                active   idle     idle
               /  |  \              |
          CPU 0  CPU 1  ...      CPU 63
           idle  active            idle

4) CPU 0 wakes up and doesn't need to walk above GRP0:0, as that is CPU 1's role.

                       [GRP1:0]
                         idle
                    /     |     \
              [GRP0:0]  [...]  [GRP0:7]
                active   idle     idle
               /  |  \              |
          CPU 0  CPU 1  ...      CPU 63
          active active            idle

5) CPU 0 boots CPU 64. It creates a new root for it.

                            [GRP2:0]
                              idle
                          /          \
                   [GRP1:0]        [GRP1:1]
                     idle             idle
                 /     |     \            \
           [GRP0:0]  [...]  [GRP0:7]   [GRP0:8]
             active   idle    idle        idle
            /  |  \             |            |
       CPU 0  CPU 1  ...     CPU 63       CPU 64
       active active           idle      offline

6) CPU 0 activates the new root, but note that GRP1:0 is still idle, waiting for CPU 1 to resume from #VMEXIT and activate it.

                            [GRP2:0]
                             active
                          /          \
                   [GRP1:0]        [GRP1:1]
                     idle             idle
                 /     |     \            \
           [GRP0:0]  [...]  [GRP0:7]   [GRP0:8]
             active   idle    idle        idle
            /  |  \             |            |
       CPU 0  CPU 1  ...     CPU 63       CPU 64
       active active           idle      offline

7) CPU 63 resumes after #VMEXIT and sees the new GRP1:0 parent. Therefore it propagates the stale inactive state of GRP1:0 up to GRP2:0.

                            [GRP2:0]
                              idle
                          /          \
                   [GRP1:0]        [GRP1:1]
                     idle             idle
                 /     |     \            \
           [GRP0:0]  [...]  [GRP0:7]   [GRP0:8]
             active   idle    idle        idle
            /  |  \             |            |
       CPU 0  CPU 1  ...     CPU 63       CPU 64
       active active           idle      offline

8) CPU 1 resumes after #VMEXIT and finally activates GRP1:0. But it doesn't observe its parent link because no ordering enforced that. Therefore GRP2:0 is spuriously left idle.

                            [GRP2:0]
                              idle
                          /          \
                   [GRP1:0]        [GRP1:1]
                     active           idle
                 /     |     \            \
           [GRP0:0]  [...]  [GRP0:7]   [GRP0:8]
             active   idle    idle        idle
            /  |  \             |            |
       CPU 0  CPU 1  ...     CPU 63       CPU 64
       active active           idle      offline

Such races are highly theoretical and the problem would solve itself once the old root ever becomes idle again. But it still leaves a taste of discomfort.

Fix it by enforcing a fully ordered atomic read of the old root state before propagating the active state up to the new root. This ordering acts in two directions:

* Acquire + release of the latest old root state: if the hotplug control CPU is not the one that woke up the old root, make sure to acquire its active state and propagate it upwards through the ordered chain of activation (the acquire pairs with the cmpxchg() in tmigr_active_up(), and subsequent releases will pair with atomic_read_acquire() and smp_mb__after_atomic() in tmigr_inactive_up()).

* Release: if the hotplug control CPU is not the one that must wake up the old root, but the CPU covering that is lagging behind its duty, publish the links from the old root to the new parents. This way the lagging CPU will propagate the active state itself.

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260423165354.95152-2-frederic@kernel.org
7 days  Merge tag 'loongarch-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson  Linus Torvalds

Pull LoongArch fixes from Huacai Chen:
 "Fix some build and runtime issues after 32BIT Kconfig option enabled, improve the platform-specific PCI controller compatibility, drop custom __arch_vdso_hres_capable(), and fix a lot of KVM bugs"

* tag 'loongarch-fixes-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Move unconditional delay into timer clear scenery
  LoongArch: KVM: Fix HW timer interrupt lost when inject interrupt by software
  LoongArch: KVM: Move AVEC interrupt injection into switch loop
  LoongArch: KVM: Use kvm_set_pte() in kvm_flush_pte()
  LoongArch: KVM: Fix missing EMULATE_FAIL in kvm_emu_mmio_read()
  LoongArch: KVM: Cap KVM_CAP_NR_VCPUS by KVM_CAP_MAX_VCPUS
  LoongArch: KVM: Fix "unreliable stack" for kvm_exc_entry
  LoongArch: KVM: Compile switch.S directly into the kernel
  LoongArch: vDSO: Drop custom __arch_vdso_hres_capable()
  LoongArch: Fix potential ADE in loongson_gpu_fixup_dma_hang()
  LoongArch: Use per-root-bridge PCIH flag to skip mem resource fixup
  LoongArch: Fix SYM_SIGFUNC_START definition for 32BIT
  LoongArch: Specify -m32/-m64 explicitly for 32BIT/64BIT
  LoongArch: Make CONFIG_64BIT as the default option
7 days  Merge branch 'xsk-fix-bugs-around-xsk-skb-allocation'  Jakub Kicinski
Jason Xing says:

====================
xsk: fix bugs around xsk skb allocation

There are rare issues around xsk_build_skb(). Some of them were found by Sashiko [1][2].

[1]: https://lore.kernel.org/all/20260415082654.21026-1-kerneljasonxing@gmail.com/
[2]: https://lore.kernel.org/all/20260418045644.28612-1-kerneljasonxing@gmail.com/
====================

Link: https://patch.msgid.link/20260502200722.53960-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  xsk: fix u64 descriptor address truncation on 32-bit architectures  Jason Xing
In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a uintptr_t cast: skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL); On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned mode the chunk offset is encoded in bits 48-63 of the descriptor address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is lost entirely. The completion queue then returns a truncated address to userspace, making buffer recycling impossible. Fix this by handling the 32-bit case directly in xsk_skb_destructor_set_addr(): when !CONFIG_64BIT, allocate an xsk_addrs struct (the same path already used for multi-descriptor SKBs) to store the full u64 address. The existing tagged-pointer logic in xsk_skb_destructor_is_addr() stays unchanged: slab pointers returned from kmem_cache_zalloc() are always word-aligned and therefore have bit 0 clear, which correctly identifies them as a struct pointer rather than an inline tagged address on every architecture. Factor the shared kmem_cache_zalloc + destructor_arg assignment into __xsk_addrs_alloc() and add a wrapper xsk_addrs_alloc() that handles the inline-to-list upgrade (is_addr check + get_addr + num_descs = 1). The three former open-coded kmem_cache_zalloc call sites now reduce to a single call each. Propagate the -ENOMEM from xsk_skb_destructor_set_addr() through xsk_skb_init_misc() so the caller can clean up the skb via kfree_skb() before skb->destructor is installed. The overhead is one extra kmem_cache_zalloc per first descriptor on 32-bit only; 64-bit builds are completely unchanged. 
Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/ Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number") Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260502200722.53960-9-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  xsk: fix xsk_addrs slab leak on multi-buffer error path  Jason Xing
When xsk_build_skb() / xsk_build_skb_zerocopy() sees the first continuation descriptor, it promotes destructor_arg from an inlined address to a freshly allocated xsk_addrs (num_descs = 1). The counter is bumped to >= 2 only at the very end of a successful build (by calling xsk_inc_num_desc()).

If the build fails in between (e.g. alloc_page() returns NULL with -EAGAIN, or the MAX_SKB_FRAGS overflow hits), we jump to free_err, skip calling xsk_inc_num_desc() to increment num_descs, and leave the half-built skb attached to xs->skb for the app to retry. The skb now has:

1) destructor_arg = a real xsk_addrs pointer,
2) num_descs = 1

If the app never retries and just close()s the socket, xsk_release() calls xsk_drop_skb() -> xsk_consume_skb(), which decides whether to free xsk_addrs by testing num_descs > 1:

	if (unlikely(num_descs > 1))
		kmem_cache_free(xsk_tx_generic_cache, destructor_arg);

Because num_descs is exactly 1, the branch is skipped and the xsk_addrs object is leaked to the xsk_tx_generic_cache slab.

Fix it by directly testing whether destructor_arg is still the inline address. If it is not, it has been repurposed to store memory newly allocated from xsk_tx_generic_cache, which must be freed regardless of whether num_descs was ever incremented.

Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-8-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  xsk: avoid skb leak in XDP_TX_METADATA case  Jason Xing
Fix it by explicitly adding kfree_skb() before returning to the caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means no frag and xs->skb is NULL) and users enable the metadata feature.
2. xsk_skb_metadata() returns an error code.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
4. there is no chance to free this skb anymore.

Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-7-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()  Jason Xing
Once xsk_skb_init_misc() has been called on an skb, its destructor is set to xsk_destruct_skb(), which submits the descriptor address(es) to the completion queue and advances the CQ producer. If such an skb is subsequently freed via kfree_skb() along an error path - before the skb has ever been handed to the driver - the destructor still runs and submits a bogus, half-initialized address to the CQ.

Postpone the init phase until the allocation of the first frag has successfully completed. Before this init, the skb can safely be freed by kfree_skb().

Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/
Fixes: c30d084960cf ("xsk: avoid overwriting skb fields for multi-buffer traffic")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-6-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path  Jason Xing
When xsk_build_skb() processes multi-buffer packets in copy mode, the first descriptor stores data into the skb linear area without adding any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb to accumulate subsequent descriptors. If a continuation descriptor fails (e.g. alloc_page returns NULL with -EAGAIN), we jump to free_err where the condition:

	if (skb && !skb_shinfo(skb)->nr_frags)
		kfree_skb(skb);

evaluates to true because nr_frags is still 0 (the first descriptor used the linear area, not frags). This frees the skb while xs->skb still points to it, creating a dangling pointer. On the next transmit attempt or socket close, xs->skb is dereferenced, causing a use-after-free or double-free.

Fix it by using a !xs->skb check to handle the first-frag situation, ensuring we only free skbs that were freshly allocated in this call (xs->skb is NULL) and never free an in-progress multi-buffer skb that the caller still references.

Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-5-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  xsk: handle NULL dereference of the skb without frags issue  Jason Xing
When a first descriptor (xs->skb == NULL) triggers -EOVERFLOW in xsk_build_skb_zerocopy() (e.g., MAX_SKB_FRAGS exceeded), the free_err -EOVERFLOW handler unconditionally dereferences xs->skb via xsk_inc_num_desc(xs->skb) and xsk_drop_skb(xs->skb), causing a NULL pointer dereference. Fix this by guarding the existing xsk_inc_num_desc()/xsk_drop_skb() calls with an xs->skb check (for the continuation case), and add an else branch for the first-descriptor case that manually cancels the one reserved CQ slot and increments invalid_descs by one to account for the single invalid descriptor. Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20260502200722.53960-4-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS  Jason Xing
Fix it by explicitly adding kfree_skb() before returning to the caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means xs->skb is NULL) and hits the MAX_SKB_FRAGS limit.
2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This is why the bug can be triggered.
4. there is no chance to free this skb anymore.

Note that if in this case xs->skb is not NULL, xsk_build_skb() will call xsk_drop_skb(xs->skb) to do the right thing.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices  Jason Xing
skb_checksum_help() is a common helper that writes the folded 16-bit checksum back via skb->data + csum_start + csum_offset, i.e. it relies on the skb's linear head and fails (with WARN_ONCE and -EINVAL) when skb_headlen() is 0.

AF_XDP generic xmit takes two very different paths depending on the netdev. Drivers that advertise IFF_TX_SKB_NO_LINEAR (e.g. virtio_net) skip the "copy payload into a linear head" step on purpose as a performance optimisation: xsk_build_skb_zerocopy() only attaches UMEM pages as frags and never calls skb_put(), so skb_headlen() stays 0 for the whole skb. For these skbs there is simply no linear area for skb_checksum_help() to write the csum into - the sw-csum fallback is structurally inapplicable.

Catch this and reject the combination at setup time: rejecting at bind() converts the silent per-packet failure into a synchronous, actionable -EOPNOTSUPP. HW csum and launch_time metadata on IFF_TX_SKB_NO_LINEAR drivers are unaffected because they do not call skb_checksum_help().

Without the patch, every descriptor carrying 'XDP_TX_METADATA | XDP_TXMD_FLAGS_CHECKSUM' produces:
1) a WARN_ONCE "offset (N) >= skb_headlen() (0)" from skb_checksum_help(),
2) sendmsg() returning -EINVAL without consuming the descriptor (invalid_descs is not incremented),
3) a wedged TX ring: __xsk_generic_xmit() does not advance the consumer on non-EOVERFLOW errors, so the next sendmsg() re-reads the same descriptor and re-hits the same WARN until the socket is closed.
Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/#t
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20260502200722.53960-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  powerpc/pasemi: Drop redundant res assignment  Krzysztof Kozlowski
Return value of pas_add_bridge() is not used, so code can be simplified to fix W=1 clang warnings: arch/powerpc/platforms/pasemi/pci.c:275:6: error: variable 'res' set but not used [-Werror,-Wunused-but-set-variable] Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260317130823.240279-4-krzysztof.kozlowski@oss.qualcomm.com
7 days  powerpc/ps3: Drop redundant result assignment  Krzysztof Kozlowski
Return value of ps3_start_probe_thread() is not used, so code can be simplified to fix W=1 clang warnings: arch/powerpc/platforms/ps3/device-init.c:953:6: error: variable 'result' set but not used [-Werror,-Wunused-but-set-variable] Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260317130823.240279-3-krzysztof.kozlowski@oss.qualcomm.com
7 days  powerpc/vdso: Drop -DCC_USING_PATCHABLE_FUNCTION_ENTRY from 32-bit flags with clang  Nathan Chancellor

After commit 73cdf24e81e4 ("powerpc64: make clang cross-build friendly"), building 64-bit little endian + CONFIG_COMPAT=y with clang results in many warnings along the lines of:

  $ cat arch/powerpc/configs/compat.config
  CONFIG_COMPAT=y

  $ make -skj"$(nproc)" ARCH=powerpc LLVM=1 ppc64le_defconfig compat.config arch/powerpc/kernel/vdso/
  ...
  In file included from <built-in>:4:
  In file included from lib/vdso/gettimeofday.c:6:
  In file included from include/vdso/datapage.h:15:
  In file included from include/vdso/cache.h:5:
  arch/powerpc/include/asm/cache.h:77:8: warning: unknown attribute 'patchable_function_entry' ignored [-Wunknown-attributes]
     77 | static inline u32 l1_icache_bytes(void)
        |        ^~~~~~
  include/linux/compiler_types.h:235:58: note: expanded from macro 'inline'
    235 | #define inline inline __gnu_inline __inline_maybe_unused notrace
        |                                                          ^~~~~~~
  include/linux/compiler_types.h:215:34: note: expanded from macro 'notrace'
    215 | #define notrace __attribute__((patchable_function_entry(0, 0)))
        |                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ...

arch/powerpc/Makefile adds -DCC_USING_PATCHABLE_FUNCTION_ENTRY to KBUILD_CPPFLAGS, which is inherited by the 32-bit vDSO. However, the 32-bit little endian target does not support '-fpatchable-function-entry', resulting in the warnings above. Remove -DCC_USING_PATCHABLE_FUNCTION_ENTRY from the 32-bit vDSO flags when building with clang to avoid the warnings.

Fixes: 73cdf24e81e4 ("powerpc64: make clang cross-build friendly")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20260311-ppc-vdso-drop-cc-using-pfe-define-clang-v1-1-66c790e22650@kernel.org
7 days  platform/chrome: cros_ec_typec: Init mutex in Thunderbolt registration  Tzung-Bi Shih
cros_typec_register_thunderbolt() missed initializing the `adata->lock` mutex. This leads to a NULL dereference when the mutex is later acquired (e.g. in cros_typec_altmode_work()). Initialize the mutex in cros_typec_register_thunderbolt() to fix the issue. Cc: stable@vger.kernel.org Fixes: 3b00be26b16a ("platform/chrome: cros_ec_typec: Thunderbolt support") Reviewed-by: Benson Leung <bleung@chromium.org> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Link: https://lore.kernel.org/r/20260505053403.3335740-1-tzungbi@kernel.org Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
7 days  Merge branch 'net-mlx5-fixes-for-socket-direct'  Jakub Kicinski
Tariq Toukan says: ==================== net/mlx5: Fixes for Socket-Direct This series fixes several race conditions and bugs in the mlx5 Socket-Direct (SD) single netdev flow. Patch 1 serializes mlx5_sd_init()/mlx5_sd_cleanup() with mlx5_devcom_comp_lock() and tracks the SD group state on the primary device, preventing concurrent or duplicate bring-up/tear-down. Patch 2 fixes the debugfs "multi-pf" directory being stored on the calling device's sd struct instead of the primary's, which caused memory leaks and recreation errors when cleanup ran from a different PF. Patch 3 fixes a race where a secondary PF could access the primary's auxiliary device after it had been unbound, by holding the primary's device lock while operating on its auxiliary device. Patch 4 fixes missing cleanup on ETH probe errors. The analogous gap on the resume path requires introducing sd_suspend/resume APIs that only destroy FW resources and is left for a follow-up series. ==================== Link: https://patch.msgid.link/20260504180206.268568-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  net/mlx5e: SD, Fix race condition in secondary device probe/remove  Shay Drory
When utilizing Socket-Direct single netdev functionality, the driver resolves the actual auxiliary device using mlx5_sd_get_adev(). However, the current implementation returns the primary ETH auxiliary device without holding the device lock, leading to a potential race condition where the ETH device could be unbound or removed concurrently during probe, suspend, resume, or remove operations. [1]

Fix this by introducing mlx5_sd_put_adev() and updating mlx5_sd_get_adev() so that secondary devices take a reference and acquire the device lock of the returned auxiliary device. After the lock is acquired, a second devcom check is needed [2]. In addition, update the callers to pair the get operation with the new put operation, ensuring the lock is held while the auxiliary device is being operated on and released afterwards.

The "primary" designation is determined once in sd_register(). It is set before devcom is marked ready, and it never changes after that. In addition, the primary path never locks a secondary: when the primary device invokes mlx5_sd_get_adev(), it sees dev == primary and returns; no additional lock is taken. Therefore lock ordering is always: secondary_lock -> primary_lock. The reverse never happens, so an ABBA deadlock is impossible.

[1] for example:

  BUG: kernel NULL pointer dereference, address: 0000000000000370
  PGD 0 P4D 0
  Oops: Oops: 0000 [#1] SMP
  CPU: 4 UID: 0 PID: 3945 Comm: bash Not tainted 6.19.0-rc3+ #1 NONE
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:mlx5e_dcbnl_dscp_app+0x23/0x100 [mlx5_core]
  Call Trace:
   <TASK>
   mlx5e_remove+0x82/0x12a [mlx5_core]
   device_release_driver_internal+0x194/0x1f0
   bus_remove_device+0xc6/0x140
   device_del+0x159/0x3c0
   ? devl_param_driverinit_value_get+0x29/0x80
   mlx5_rescan_drivers_locked+0x92/0x160 [mlx5_core]
   mlx5_unregister_device+0x34/0x50 [mlx5_core]
   mlx5_uninit_one+0x43/0xb0 [mlx5_core]
   remove_one+0x4e/0xc0 [mlx5_core]
   pci_device_remove+0x39/0xa0
   device_release_driver_internal+0x194/0x1f0
   unbind_store+0x99/0xa0
   kernfs_fop_write_iter+0x12e/0x1e0
   vfs_write+0x215/0x3d0
   ksys_write+0x5f/0xd0
   do_syscall_64+0x55/0xe90
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

[2]

  CPU0 (primary)                      CPU1 (secondary)
  ==========================================================================
  mlx5e_remove()                      mlx5e_remove()
  (device_lock held)                  (2nd device_lock held)
                                      mlx5_sd_get_adev()
                                        mlx5_devcom_comp_is_ready() => true
                                        device_lock(primary)
  mlx5_sd_get_adev() ==> ret adev
  _mlx5e_remove()
    mlx5_sd_cleanup()
  // mlx5e_remove finished
  // releasing device_lock
                                        // need another check here...
                                        mlx5_devcom_comp_is_ready() => false

Fixes: 381978d28317 ("net/mlx5e: Create single netdev per SD group")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504180206.268568-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  net/mlx5e: SD, Fix missing cleanup on probe error  Shay Drory
When _mlx5e_probe() fails, the preceding successful mlx5_sd_init() is not undone. Auxiliary bus probe failure skips binding, so mlx5e_remove() is never called for that adev and the matching mlx5_sd_cleanup() never runs - leaking the per-dev SD struct. Call mlx5_sd_cleanup() on the probe error path to balance mlx5_sd_init(). A similar gap exists on the resume path: mlx5_sd_init() and mlx5_sd_cleanup() are currently bundled with both probe/remove and suspend/resume, even though only the FW alias state actually needs to follow the suspend/resume lifecycle - the sd struct allocation and devcom membership are software state that should track the full bound lifetime. As a result, a failed resume can leave a still-bound device with sd == NULL, which mlx5_sd_get_adev() can't distinguish from a non-SD device. Fixing this requires sd_suspend/resume APIs which will only destroy FW resources and is left for a follow-up series. Fixes: 381978d28317 ("net/mlx5e: Create single netdev per SD group") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504180206.268568-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  net/mlx5: SD, Keep multi-pf debugfs entries on primary  Shay Drory
mlx5_sd_init() creates the "multi-pf" debugfs directory under the primary device debugfs root, but stores the dentry in the calling device's sd struct. When sd_cleanup() runs on a different PF, this leads to using the wrong sd->dfs for removing entries, which results in a memory leak and an error when re-creating the SD. [1]

Fix it by explicitly storing the debugfs dentry in the primary device's sd struct and using it for all per-group files.

[1] debugfs: 'multi-pf' already exists in '0000:08:00.1'

Fixes: 4375130bf527 ("net/mlx5: SD, Add debugfs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260504180206.268568-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  net/mlx5: SD: Serialize init/cleanup  Shay Drory
mlx5_sd_init() / mlx5_sd_cleanup() may run from multiple PFs in the same Socket-Direct group. This can cause the SD bring-up/tear-down sequence to be executed more than once or interleaved across PFs. Protect SD init/cleanup with mlx5_devcom_comp_lock() and track the SD group state on the primary device. Skip init if the primary is already UP, and skip cleanup unless the primary is UP. The state check on cleanup is needed because sd_register() drops the devcom comp lock between marking the comp ready and assigning primary_dev on each peer. A concurrent cleanup that acquires the lock in this window would observe devcom_is_ready==true while primary_dev is still NULL (causing mlx5_sd_get_primary() to return NULL) or while the FW alias setup performed by mlx5_sd_init()'s body has not yet run (causing sd_cmd_unset_primary() to dereference a NULL tx_ft). Gate the cleanup body on primary_sd->state == MLX5_SD_STATE_UP, which is set only at the very end of mlx5_sd_init() under the same comp lock - so observing UP guarantees primary_dev, secondaries[], tx_ft, and dfs are all populated. Also bail explicitly if mlx5_sd_get_primary() returns NULL, in case state is checked on a peer whose primary_dev hasn't been assigned yet. In addition, move mlx5_devcom_comp_set_ready(false) from sd_unregister() into the cleanup's locked section, including the !primary and state != UP early-exit paths, so the device cannot unregister and free its struct mlx5_sd while devcom is still marked ready. A concurrent init acquiring the devcom lock will now observe devcom is no longer ready and bail out immediately. Fixes: 381978d28317 ("net/mlx5e: Create single netdev per SD group") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504180206.268568-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days  arch/powerpc: Drop CONFIG_FIRMWARE_EDID from defconfig files  Thomas Zimmermann
CONFIG_FIRMWARE_EDID=y depends on X86 or EFI_GENERIC_STUB. Neither is true here, so drop the lines from the defconfig files. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260401083023.214426-1-tzimmermann@suse.de
7 daysMerge branch 'net-mlx5e-psp-fixes'Jakub Kicinski
Tariq Toukan says: ==================== net/mlx5e: PSP fixes This patchset provides bug fixes from Cosmin to the mlx5e PSP feature. ==================== Link: https://patch.msgid.link/20260504181100.269334-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysnet/mlx5e: psp: Hook PSP dev reg/unreg to profile enable/disableCosmin Ratiu
devlink reload while PSP connections are active does: mlx5_unload_one_devl_locked() -> mlx5_detach_device() -> _mlx5e_suspend() -> mlx5e_detach_netdev() -> profile->cleanup_rx -> profile->cleanup_tx -> mlx5e_destroy_mdev_resources() -> mlx5_core_dealloc_pd() fails: ... mlx5_core 0000:08:00.0: mlx5_cmd_out_err:821:(pid 19722): DEALLOC_PD(0x801) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xef0c8a), err(-22) ... The reason for failure is the existence of TX keys, which are removed by the PSP dev unregistration happening in: profile->cleanup() -> mlx5e_psp_unregister() -> mlx5e_psp_cleanup() -> psp_dev_unregister() ...but this isn't invoked in the devlink reload flow, only when changing the NIC profile (e.g. when transitioning to switchdev mode) or on dev teardown. Move PSP device registration into mlx5e_nic_enable(), and unregistration into the corresponding mlx5e_nic_disable(). These functions are called during netdev attach/detach after RX & TX are set up. This ensures that the keys will be gone by the time the PD is destroyed. Fixes: 89ee2d92f66c ("net/mlx5e: Support PSP offload functionality") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504181100.269334-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysnet/mlx5e: psp: Expose only a fully initialized priv->pspCosmin Ratiu
Currently, during PSP init, priv->psp is initialized to an incompletely built psp struct. Additionally, on fs init failure priv->psp is reset to NULL. Change this so that only a fully initialized priv->psp is set, which makes the code easier to reason about in failure scenarios. Fixes: af2196f49480 ("net/mlx5e: Implement PSP operations .assoc_add and .assoc_del") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504181100.269334-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysnet/mlx5e: psp: Fix invalid access on PSP dev registration failCosmin Ratiu
priv->psp->psp is initialized with the PSP device as returned by psp_dev_create(). This could also return an error, in which case a future psp_dev_unregister() will result in unpleasantness. Avoid that by using a local variable and only saving the PSP device when registration succeeds. In case psp_dev_create() fails, priv->psp and steering structs are left in place, but they will be inert. The unchecked access of priv->psp in mlx5e_psp_offload_handle_rx_skb() won't happen because without a PSP device, there can be no SAs added and therefore no packets will be successfully decrypted and be handed off to the SW handler. Fixes: 89ee2d92f66c ("net/mlx5e: Support PSP offload functionality") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260504181100.269334-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 dayspowerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked eventsShivani Nittor
The core-book3s PMU sampling code validates the SIER TYPE field when PERF_SAMPLE_DATA_SRC is requested. The SIER TYPE field indicates the instruction type and is only valid for random sampling (marked events). To handle cases observed where SIER TYPE could be zero even for marked events, validation was added to drop such samples and increment event->lost_samples. However, this validation was applied to all samples, including continuous sampling. In continuous sampling mode, the PMU does not set the SIER TYPE field, so it remains zero. As a result, valid continuous samples were incorrectly treated as invalid and dropped. Fix this by gating the SIER TYPE validation with mark_event, so the check runs only for marked (random) events. Continuous samples now skip this check and are recorded normally in the final data recording path. Fixes: 2ffb26afa642 ("arch/powerpc/perf: Check the instruction type before creating sample with perf_mem_data_src") Signed-off-by: Shivani Nittor <shivani@linux.ibm.com> Reviewed-by: Mukesh Kumar Chaurasiya (IBM) <mkchauras@gmail.com> Reviewed-by: Athira Rajeev <atrajeev@linux.ibm.com> [Maddy: Fixed reviewed-by tag] Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260421150628.96500-1-shivani@linux.ibm.com
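The gating logic reduces to a small predicate (illustrative names, not the actual core-book3s identifiers):

```c
#include <stdbool.h>

/* Sketch of the fix: only validate the SIER TYPE field for marked
 * (random sampling) events. In continuous sampling the PMU leaves
 * TYPE at zero, so the check must not run there. */
static bool should_drop_sample(bool mark_event, unsigned int sier_type)
{
	/* Continuous samples: TYPE is not set by the PMU, never drop. */
	if (!mark_event)
		return false;
	/* Marked events: a zero TYPE is invalid; drop and count as lost. */
	return sier_type == 0;
}
```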
7 dayspowerpc/8xx: Fix interrupt mask in cpm1_gpiochip_add16()Christophe Leroy (CS GROUP)
Although fsl,cpm1-gpio-irq-mask always contains a 16-bit value, it is a standard u32 OF property as documented in Documentation/devicetree/bindings/soc/fsl/cpm_qe/gpio.txt The driver erroneously uses of_property_read_u16() leading to a mask which is always 0. Fix it by using of_property_read_u32() instead. Fixes: 726bd223105c ("powerpc/8xx: Adding support of IRQ in MPC8xx GPIO") Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/bb0b6d6c4543238c38d5d29a776d0674a8c0c180.1776752750.git.chleroy@kernel.org
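Why the u16 read always yields 0 can be seen from the big-endian encoding of a u32 cell; a 16-bit mask sits in the low half, so the first two bytes are zero (hypothetical decode helpers, not the OF API):

```c
#include <stdint.h>

/* A devicetree u32 cell is stored big-endian, so 0x0000ffff is encoded
 * as bytes 00 00 ff ff. A 16-bit read (as of_property_read_u16() would
 * do) sees only the first two bytes: 0x0000. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}
```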
7 daysnet: wwan: t7xx: validate port_count against message length in ↵Pavitra Jha
t7xx_port_enum_msg_handler t7xx_port_enum_msg_handler() uses the modem-supplied port_count field as a loop bound over port_msg->data[] without checking that the message buffer contains sufficient data. A modem sending port_count=65535 in a 12-byte buffer triggers a slab-out-of-bounds read of up to 262140 bytes. Add a sizeof(*port_msg) check before accessing the port message header fields to guard against undersized messages. Add a struct_size() check after extracting port_count and before the loop. In t7xx_parse_host_rt_data(), guard the rt_feature header read with a remaining-buffer check before accessing data_len, validate feat_data_len against the actual remaining buffer to prevent OOB reads and signed integer overflow on offset. Pass msg_len from both call sites: skb->len at the DPMAIF path after skb_pull(), and the validated feat_data_len at the handshake path. Fixes: da45d2566a1d ("net: wwan: t7xx: Add control port") Cc: stable@vger.kernel.org Signed-off-by: Pavitra Jha <jhapavitra98@gmail.com> Link: https://patch.msgid.link/20260501110713.145563-1-jhapavitra98@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 dayspowerpc/vmx: avoid KASAN instrumentation in enter_vmx_ops() for kexecSourabh Jain
The kexec sequence invokes enter_vmx_ops() via copy_page() with the MMU disabled. In this context, code must not rely on normal virtual address translations or trigger page faults. With KASAN enabled, functions get instrumented and may access shadow memory using regular address translation. When executed with the MMU off, this can lead to page faults (bad_page_fault) from which the kernel cannot recover in the kexec path, resulting in a hang. The kexec path sets preempt_count to HARDIRQ_OFFSET before entering the MMU-off copy sequence. current_thread_info()->preempt_count = HARDIRQ_OFFSET kexec_sequence(..., copy_with_mmu_off = 1) -> kexec_copy_flush(image) copy_segments() -> copy_page(dest, addr) bl enter_vmx_ops() if (in_interrupt()) return 0 beq .Lnonvmx_copy Since kexec sets preempt_count to HARDIRQ_OFFSET, in_interrupt() evaluates to true and enter_vmx_ops() returns early. As in_interrupt() (and preempt_count()) are always inlined, mark enter_vmx_ops() with __no_sanitize_address to avoid KASAN instrumentation and shadow memory access with MMU disabled, helping kexec boot fine with KASAN enabled. Reported-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Reviewed-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260407124349.1698552-2-sourabhjain@linux.ibm.com
7 dayspowerpc/kdump: fix KASAN sanitization flag for core_$(BITS).oSourabh Jain
KASAN instrumentation is intended to be disabled for the kexec core code, but the existing Makefile entry misses the object suffix. As a result, the flag is not applied correctly to core_$(BITS).o. So when KASAN is enabled, kexec_copy_flush and copy_segments in kexec/core_64.c are instrumented, which can result in accesses to shadow memory via normal address translation paths. Since these run with the MMU disabled, such accesses may trigger page faults (bad_page_fault) that cannot be handled in the kdump path, ultimately causing a hang and preventing the kdump kernel from booting. The same is true for kexec as well, since the same functions are used there. Update the entry to include the “.o” suffix so that KASAN instrumentation is properly disabled for this object file. Fixes: 2ab2d5794f14 ("powerpc/kasan: Disable address sanitization in kexec paths") Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/1dee8891-8bcc-46b4-93f3-fc3a774abd5b@linux.ibm.com/ Cc: stable@vger.kernel.org Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Reviewed-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260407124349.1698552-1-sourabhjain@linux.ibm.com
7 dayspseries/papr-hvpipe: Fix style and checkpatch issues in enable_hvpipe_IRQ()Ritesh Harjani (IBM)
While at it let's also fix the similar style issue in enable_hvpipe_IRQ() function. This also fixes a minor checkpatch warning which I got due to an extra space before " ==". Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/1174f60d0ae128e773dbefd11dd8d46d69e7f50e.1777606826.git.ritesh.list@gmail.com
7 dayspseries/papr-hvpipe: Refactor and simplify hvpipe_rtas_recv_msg()Ritesh Harjani (IBM)
Simplify hvpipe_rtas_recv_msg() by removing three levels of nesting... if (!ret) if (buf) if (size < bytes_written) ... this refactoring of the function bails out to the "out:" label first, in case of any error. This simplifies the function's flow. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/bbe7ddf8b8e25c9be8fc5e2c4aea9e5fca128bf4.1777606826.git.ritesh.list@gmail.com
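The shape of the refactor, with placeholder names and error codes rather than the real hvpipe_rtas_recv_msg() internals, looks like:

```c
#include <stddef.h>

/* Early bail-outs to an "out:" label replace the nested success checks. */
static int recv_msg_sketch(int ret, const char *buf, size_t size,
			   size_t bytes_written)
{
	int err = 0;

	if (ret) {			/* RTAS call failed */
		err = -1;
		goto out;
	}
	if (!buf) {			/* no buffer was returned */
		err = -2;
		goto out;
	}
	if (size < bytes_written) {	/* buffer too small for payload */
		err = -3;
		goto out;
	}
	/* ... process the received message here ... */
out:
	return err;
}
```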
7 dayspseries/papr-hvpipe: Kill task_struct pointer from struct hvpipe_source_infoRitesh Harjani (IBM)
We don't really use task_struct pointer for anything meaningful. So just kill it for now, and we can bring back later if we need this for any future debug purposes. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/895e061e45cdc95db36fa7f27aa1922b81eed867.1777606826.git.ritesh.list@gmail.com
7 dayspseries/papr-hvpipe: Simplify spin unlock usage in papr_hvpipe_handle_release()Ritesh Harjani (IBM)
Once the src_info is removed from the global list, no one can access it. This simplifies the usage of spin_unlock_irqrestore() in papr_hvpipe_handle_release(). Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/4a980331557af3d10aada8576aaa16cddc691c65.1777606826.git.ritesh.list@gmail.com
7 dayspseries/papr-hvpipe: Fix the usage of copy_to_user()Ritesh Harjani (IBM)
copy_to_user() returns the number of bytes that could not be copied to the user buffer. If there was an error writing bytes into the user buffer, i.e. if copy_to_user returns a non-zero value, then we should simply return -EFAULT from the ->read() call. Otherwise, in the non-patched version, we may end up mixing "bytes_not_copied + bytes_copied (HVPIPE_HDR_LEN)" as the return value to the user in ->read() call Also let's make sure we clear the hvpipe_status flag, if we have consumed the hvpipe msg by making the rtas call. ret = -EFAULT means copy_to_user has failed but that still means that the msg was read from the hvpipe, hence for both cases, success & -EFAULT, we should clear the HVPIPE_MSG_AVAILABLE flag in hvpipe_status. Cc: stable@vger.kernel.org Fixes: cebdb522fd3edd1 ("powerpc/pseries: Receive payload with ibm,receive-hvpipe-msg RTAS") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/8fda3212a1ad48879c174e92f67472d9b9f1c3b7.1777606826.git.ritesh.list@gmail.com
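A userspace model of the contract makes the point concrete; mock_copy_to_user() and EFAULT_ERR are stand-ins for the kernel API, not real interfaces:

```c
#include <stddef.h>
#include <string.h>

#define EFAULT_ERR 14	/* stand-in for the kernel's EFAULT value */

/* Pretend the destination becomes unwritable 'fail_after' bytes in;
 * like copy_to_user(), return the number of bytes NOT copied. */
static size_t mock_copy_to_user(char *dst, const char *src, size_t n,
				size_t fail_after)
{
	size_t ok = n < fail_after ? n : fail_after;

	memcpy(dst, src, ok);
	return n - ok;
}

/* ->read()-style helper: any non-zero remainder becomes -EFAULT instead
 * of being mixed into the byte count returned to the user. */
static long hvpipe_read_sketch(char *dst, const char *src, size_t n,
			       size_t fail_after)
{
	if (mock_copy_to_user(dst, src, n, fail_after))
		return -EFAULT_ERR;
	return (long)n;
}
```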
7 dayspseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init()Ritesh Harjani (IBM)
Remove the three levels of nesting used to check success return values from function calls. ret = enable_hvpipe_IRQ() if (!ret) ret = set_hvpipe_sys_param(1) if (!ret) ret = misc_register() Instead just bail out to "out*:" labels, in case of any error. This simplifies the init flow. While at it let's also fix the following error handling logic: We have already enabled interrupt sources and enabled hvpipe to receive interrupts; if misc_register() fails, we will destroy the workqueue, but the HMC might send us a msg via hvpipe which will queue work on the workqueue that might already be destroyed. So instead, let's reverse the order so that set_hvpipe_sys_param(1) is enabled last, and in case of an error remove the misc dev by calling misc_deregister(). Cc: stable@vger.kernel.org Fixes: 39a08a4f94980 ("powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/f2141eafb80e7780395e03aa9a22e8a37be80513.1777606826.git.ritesh.list@gmail.com
7 dayspseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle()Ritesh Harjani (IBM)
commit 6d3789d347a7 ("papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()"), changed the create handle to FD_PREPARE(), but it caused a kernel null-ptr-deref because, after the call to retain_and_null_ptr(src_info), src_info is re-used for adding it to the global list. Getting the following kernel panic in papr_hvpipe_dev_create_handle() when trying to add src_info to the list. Kernel attempted to write user page (0) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on write at 0x00000000 Faulting instruction address: 0xc0000000001b44a0 Oops: Kernel access of bad area, sig: 11 [#1] ... Call Trace: papr_hvpipe_dev_ioctl+0x1f4/0x48c (unreliable) sys_ioctl+0x528/0x1064 system_call_exception+0x128/0x360 system_call_vectored_common+0x15c/0x2ec Now, the error handling with FD_PREPARE's file cleanup and __free(kfree) auto cleanup is getting too convoluted. This is mainly because we need to ensure only 1 user gets the srcID handle. To simplify this, we allocate and prepare the src_info in the beginning and add it to the global list under a spinlock after checking that no duplicates exist. This simplifies the error handling: if the FD_ADD fails, we can simply remove the src_info from the list and clear any pending msg in the hvpipe that arrived after src_info became visible in the global list. Cc: stable@vger.kernel.org Fixes: 6d3789d347a7 ("papr-hvpipe: convert papr_hvpipe_dev_create_handle() to FD_PREPARE()") Reported-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/31ad94bc89d44156ee700c5bd006cb47a748e3cb.1777606826.git.ritesh.list@gmail.com
7 dayspseries/papr-hvpipe: Prevent kernel stack memory leak to userspaceRitesh Harjani (IBM)
The hdr variable is allocated on the stack and only hdr.version and hdr.flags are initialized explicitly. Because the struct papr_hvpipe_hdr contains reserved padding bytes (reserved[3] and reserved2[40]), these could leak the uninitialized bytes to userspace after copy_to_user(). This patch fixes that by initializing the whole struct to 0. Cc: stable@vger.kernel.org Fixes: cebdb522fd3ed ("powerpc/pseries: Receive payload with ibm,receive-hvpipe-msg RTAS") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/7bfe03b65a282c856ed8182d1871bb973c0b78f2.1777606826.git.ritesh.list@gmail.com
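The fix amounts to zeroing the whole struct, padding and reserved bytes included, before anything is copied out. A sketch with a struct that only mirrors the shape described above (field sizes taken from the commit message, not the real UAPI header):

```c
#include <string.h>

struct hvpipe_hdr_sketch {
	unsigned char version;
	unsigned char flags;
	unsigned char reserved[3];
	unsigned char reserved2[40];
};

/* Build a header with every byte deterministic: memset first, then set
 * the two meaningful fields. No stack garbage can reach userspace. */
static struct hvpipe_hdr_sketch make_hdr(unsigned char version,
					 unsigned char flags)
{
	struct hvpipe_hdr_sketch hdr;

	memset(&hdr, 0, sizeof(hdr));
	hdr.version = version;
	hdr.flags = flags;
	return hdr;
}

/* Check that all reserved bytes are zero. */
static int hdr_is_clean(struct hvpipe_hdr_sketch hdr)
{
	for (size_t i = 0; i < sizeof(hdr.reserved); i++)
		if (hdr.reserved[i])
			return 0;
	for (size_t i = 0; i < sizeof(hdr.reserved2); i++)
		if (hdr.reserved2[i])
			return 0;
	return 1;
}
```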
7 dayspseries/papr-hvpipe: Fix race with interrupt handlerRitesh Harjani (IBM)
While executing ->ioctl handler or ->release handler, if an interrupt fires on the same cpu, then we can enter into a deadlock. This patch fixes both these handlers to take spin_lock_irq{save|restore} versions of the lock to prevent this deadlock. Cc: stable@vger.kernel.org Fixes: 814ef095f12c9 ("powerpc/pseries: Add papr-hvpipe char driver for HVPIPE interfaces") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/e4ed435c44fc191f2eb23c7907ba6f72f193e6aa.1777606826.git.ritesh.list@gmail.com
7 dayspowerpc/pseries/htmdump: Add memory configuration dump support to htmdump moduleAthira Rajeev
H_HTM (Hardware Trace Macro) hypervisor call has capability to capture SystemMemory Configuration. This information helps to understand the address mapping for the partitions in the system. Support dumping system memory configuration from Hardware Trace Macro (HTM) function via debugfs interface. Under debugfs folder "/sys/kernel/debug/powerpc/htmdump", add file "htmsystem_mem". The interface allows only read of this file which will present the content of HTM buffer from the hcall. The 16th offset of the HTM buffer holds the number of entries for the array of processors. Use this information to copy data to the debugfs file. Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260314132953.27269-1-atrajeev@linux.ibm.com
7 dayspowerpc/pseries/htmdump: Fix the offset value used in htm status dumpAthira Rajeev
H_HTM call is invoked using three parameters specifying the address of the buffer, size of the buffer and offset where to read from. The offset used was always zero. "offset" is a value from the output buffer header that points to the next entry to dump; zero is the first entry to dump. The next entry offset is read from the output buffer at byte offset 0x8. Update the htmstatus_read() function to use the right offset, and return when the offset points to -1. Fixes: 627cf584f4c3 ("powerpc/pseries/htmdump: Add htm status support to htmdump module") Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260314132432.25581-3-atrajeev@linux.ibm.com
7 dayspowerpc/pseries/htmdump: Fix the offset value used in processor ↵Athira Rajeev
configuration dump H_HTM call is invoked using three parameters specifying the address of the buffer, size of the buffer and offset where to read from. The offset used was always zero. "offset" is a value from the output buffer header that points to the next entry to dump; zero is the first entry to dump. The next entry offset is read from the output buffer at byte offset 0x8. Update the htminfo_read() function to use the right offset, and return when the offset points to -1. Fixes: dea7384e14e7 ("powerpc/pseries/htmdump: Add htm info support to htmdump module") Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260314132432.25581-2-atrajeev@linux.ibm.com
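Reading the next-entry offset from the output buffer header can be sketched as follows (layout and field widths are illustrative, not the exact H_HTM buffer format):

```c
#include <stdint.h>
#include <string.h>

#define NEXT_OFFSET_POS 0x8	/* next-entry offset lives at byte 0x8 */

/* Fetch the next entry offset from the output buffer header. Offset 0
 * is the first entry to dump; the caller stops when this returns -1. */
static int64_t htm_next_offset(const uint8_t *buf)
{
	int64_t next;

	memcpy(&next, buf + NEXT_OFFSET_POS, sizeof(next));
	return next;
}
```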
7 dayspowerpc/pseries/htmdump: Free the global buffers in htmdump module exitAthira Rajeev
The htmdump module uses global memory buffers to capture details like capabilities, status of the specified HTM, and the trace buffer contents. These are initialized during module init and hence need to be freed in module exit. Patch adds freeing of the memory in module exit. The change also includes a minor clean up for the variable names. The read callback for the debugfs interface file saves filp->private_data to a local variable whose name is the same as the global variable name for the memory buffers. Rename these local variables. Signed-off-by: Athira Rajeev <atrajeev@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260314132432.25581-1-atrajeev@linux.ibm.com
7 daysnet/sched: sch_fq_codel: annotate data-races from fq_codel_dump_class_stats()Eric Dumazet
fq_codel_dump_class_stats() acquires qdisc spinlock only when requested to follow flow->head chain. As we did in sch_cake recently, add the missing READ_ONCE()/WRITE_ONCE() annotations. Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260504163842.1162001-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge tag 'nf-26-05-05' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== IPVS fixes for net The following batch contains IPVS fixes for net to address issues from the latest net-next pull request. Julian Anastasov made the following summary: 1-3) Fixes for the recently added resizable hash tables 4) dest from trash can be leaked if ip_vs_start_estimator() fails 5) fixed races and locking for the estimation kthreads 6) fix for wrong roundup_pow_of_two() usage in the resizable hash tables 7-8) v2 of the changes from Waiman Long to properly guard against the housekeeping_cpumask() updates: https://lore.kernel.org/netfilter-devel/20260331165015.2777765-1-longman@redhat.com/ I added missing Fixes tag. The original description: Since commit 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management"), the HK_TYPE_KTHREAD housekeeping cpumask may no longer be correct in showing the actual CPU affinity of kthreads that have no predefined CPU affinity. As the ipvs networking code is still using HK_TYPE_KTHREAD, we need to make HK_TYPE_KTHREAD reflect the reality. This patch series makes HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN and uses RCU to protect access to the HK_TYPE_KTHREAD housekeeping cpumask. Julian plans to post a nf-next patch to limit the connections by using "conn_max" sysctl. With Simon Horman, they agreed that this is an old problem that we do not have a limit of connections and it is not a stopper for this patchset. 
* tag 'nf-26-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: sched/isolation: Make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN ipvs: Guard access of HK_TYPE_KTHREAD cpumask with RCU ipvs: fix shift-out-of-bounds in ip_vs_rht_desired_size ipvs: fix races around est_mutex and est_cpulist ipvs: do not leak dest after get from dest trash ipvs: fix the spin_lock usage for RT build ipvs: fix races around the conn_lfactor and svc_lfactor sysctl vars ipvs: fixes for the new ip_vs_status info ==================== Link: https://patch.msgid.link/20260505001648.360569-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge branch 'bnxt_en-bug-fixes'Jakub Kicinski
Pavan Chebbi says: ==================== bnxt_en: Bug fixes This patchset adds the following fixes for bnxt: Patch #1 fixes DPC AER handling to make it more reliable Patch #2 fixes incorrect capping of bp->max_tpa based on what the FW supports Patch #3 fixes ignoring of VNIC configuration result when RDMA driver is loading Patch #4 fixes logic to make phase adjustment on the PPS OUT signal ==================== Link: https://patch.msgid.link/20260504083611.1383776-1-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysbnxt_en: Use absolute target ns from ptp_clock_requestPavan Chebbi
There is no need to calculate the target PHC cycles required to make phase adjustment on the PPS OUT signal. This is because the application supplies an absolute n_sec value in the future, which is already the actual desired target value. Remove the unnecessary code. Fixes: 9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Tested-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260504083611.1383776-5-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>