summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2025-11-27kho: remove global preserved_mem_map and store state in FDTPasha Tatashin
Currently, the serialized memory map is tracked via kho_out.preserved_mem_map and copied to the FDT during finalization. This double tracking is redundant. Remove preserved_mem_map from kho_out. Instead, maintain the physical address of the head chunk directly in the preserved-memory-map FDT property. Introduce kho_update_memory_map() to manage this property. This function handles: 1. Retrieving and freeing any existing serialized map (handling the abort/retry case). 2. Updating the FDT property with the new chunk address. This establishes the FDT as the single source of truth for the handover state. Link: https://lkml.kernel.org/r/20251114190002.3311679-9-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27kho: simplify serialization and remove __kho_abortPasha Tatashin
Currently, __kho_finalize() performs memory serialization in the middle of FDT construction. If FDT construction fails later, the function must manually clean up the serialized memory via __kho_abort(). Refactor __kho_finalize() to perform kho_mem_serialize() only after the FDT has been successfully constructed and finished. This reordering has two benefits: 1. It avoids expensive serialization work if FDT generation fails. 2. It removes the need for cleanup in the FDT error path. As a result, the internal helper __kho_abort() is no longer needed for internal error handling. Inline its remaining logic (cleanup of the preserved memory map) directly into kho_abort() and remove the helper. Link: https://lkml.kernel.org/r/20251114190002.3311679-8-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27kho: always expose output FDT in debugfsPasha Tatashin
Currently, the output FDT is added to debugfs only when KHO is finalized and removed when aborted. There is no need to hide the FDT based on the state. Always expose it starting from initialization. This aids the transition toward removing the explicit abort functionality and converting KHO to be fully stateless. Link: https://lkml.kernel.org/r/20251114190002.3311679-7-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27kho: verify deserialization status and fix FDT alignment accessPasha Tatashin
During boot, kho_restore_folio() relies on the memory map having been successfully deserialized. If deserialization fails or no map is present, attempting to restore the FDT folio is unsafe. Update kho_mem_deserialize() to return a boolean indicating success. Use this return value in kho_memory_init() to disable KHO if deserialization fails. Also, the incoming FDT folio is never used, there is no reason to restore it. Additionally, use get_unaligned() to retrieve the memory map pointer from the FDT. FDT properties are not guaranteed to be naturally aligned, and accessing a 64-bit value via a pointer that is only 32-bit aligned can cause faults. Link: https://lkml.kernel.org/r/20251114190002.3311679-6-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27kho: preserve FDT folio only once during initializationPasha Tatashin
Currently, the FDT folio is preserved inside __kho_finalize(). If the user performs multiple finalize/abort cycles, kho_preserve_folio() is called repeatedly for the same FDT folio. Since the FDT folio is allocated once during kho_init(), it should be marked for preservation at the same time. Move the preservation call to kho_init() to align the preservation state with the object's lifecycle and simplify the finalize path. Also, pre-zero the FDT tree so we do not expose random bits to the user and to the next kernel by using the new kho_alloc_preserve() api. Link: https://lkml.kernel.org/r/20251114190002.3311679-5-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27kho: introduce high-level memory allocation APIPasha Tatashin
Currently, clients of KHO must manually allocate memory (e.g., via alloc_pages), calculate the page order, and explicitly call kho_preserve_folio(). Similarly, cleanup requires separate calls to unpreserve and free the memory. Introduce a high-level API to streamline this common pattern: - kho_alloc_preserve(size): Allocates physically contiguous, zeroed memory and immediately marks it for preservation. - kho_unpreserve_free(ptr): Unpreserves and frees the memory in the current kernel. - kho_restore_free(ptr): Restores the struct page state of preserved memory in the new kernel and immediately frees it to the page allocator. [pasha.tatashin@soleen.com: build fixes] Link: https://lkml.kernel.org/r/CA+CK2bBgXDhrHwTVgxrw7YTQ-0=LgW0t66CwPCgG=C85ftz4zw@mail.gmail.com Link: https://lkml.kernel.org/r/20251114190002.3311679-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27kho: convert __kho_abort() to return voidPasha Tatashin
The internal helper __kho_abort() always returns 0 and has no failure paths. Its return value is ignored by __kho_finalize and checked needlessly by kho_abort. Change the return type to void to reflect that this function cannot fail, and simplify kho_abort by removing dead error handling code. Link: https://lkml.kernel.org/r/20251114190002.3311679-3-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27kho: fix misleading log message in kho_populate()Pasha Tatashin
Patch series "kho: simplify state machine and enable dynamic updates", v2. This patch series refactors the Kexec Handover subsystem to transition from a rigid, state-locked model to a dynamic, re-entrant architecture. It also introduces usability improvements. Motivation Currently, KHO relies on a strict state machine where memory preservation is locked upon finalization. If a change is required, the user must explicitly "abort" to reset the state. Additionally, the kexec image cannot be loaded until KHO is finalized, and the FDT is rebuilt from scratch on every finalization. This series simplifies this workflow to support "load early, finalize late" scenarios. Key Changes State Machine Simplification: - Removed kho_abort(). kho_finalize() is now re-entrant; calling it a second time automatically flushes the previous serialized state and generates a fresh one. - Removed kho_out.finalized checks from preservation APIs, allowing drivers to add/remove pages even after an initial finalization. - Decoupled kexec_file_load from KHO finalization. The KHO FDT physical address is now stable from boot, allowing the kexec image to be loaded before the handover metadata is finalized. FDT Management: - The FDT is now updated in-place dynamically when subtrees are added or removed, removing the need for complex reconstruction logic. - The output FDT is always exposed in debugfs (initialized and zeroed at boot), improving visibility and debugging capabilities throughout the system lifecycle. - Removed the redundant global preserved_mem_map pointer, establishing the FDT property as the single source of truth. New Features & API Enhancements: - High-Level Allocators: Introduced kho_alloc_preserve() and friends to reduce boilerplate for drivers that need to allocate, preserve, and eventually restore simple memory buffers. - Configuration: Added CONFIG_KEXEC_HANDOVER_ENABLE_DEFAULT to allow KHO to be active by default without requiring the kho=on command line parameter. Fixes: - Fixed potential alignment faults when accessing 64-bit FDT properties. - Fixed the lifecycle of the FDT folio preservation (now preserved once at init). This patch (of 13): The log message in kho_populate() currently states "Will skip init for some devices". This implies that Kexec Handover always involves skipping device initialization. However, KHO is a generic mechanism used to preserve kernel memory across reboot for various purposes, such as memfd, telemetry, or reserve_mem. Skipping device initialization is a specific property of live update drivers using KHO, not a property of the mechanism itself. Remove the misleading suffix to accurately reflect the generic nature of KHO discovery. Link: https://lkml.kernel.org/r/20251114190002.3311679-2-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Coiby Xu <coxu@redhat.com> Cc: Dave Vasilevsky <dave@vasilevsky.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27liveupdate: kho: use %pe format specifier for error pointer printingZhu Yanjun
Make pr_xxx() call to use the %pe format specifier instead of %d. The %pe specifier prints a symbolic error string (e.g., -ENOMEM, -EINVAL) when given an error pointer created with ERR_PTR(err). This change enhances the clarity and diagnostic value of the error message by showing a descriptive error name rather than a numeric error code. Note, that some err are still printed by value, as those errors might come from libfdt and not regular errnos. Link: https://lkml.kernel.org/r/20251101142325.1326536-10-pasha.tatashin@soleen.com Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: "Mike Rapoport (Microsoft)" <rppt@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27liveupdate: kho: move to kernel/liveupdatePasha Tatashin
Move KHO to kernel/liveupdate/ in preparation of placing all Live Update core kernel related files to the same place. [pasha.tatashin@soleen.com: disable the menu when DEFERRED_STRUCT_PAGE_INIT] Link: https://lkml.kernel.org/r/CA+CK2bAvh9Oa2SLfsbJ8zztpEjrgr_hr-uGgF1coy8yoibT39A@mail.gmail.com Link: https://lkml.kernel.org/r/20251101142325.1326536-8-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Simon Horman <horms@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27kho: don't unpreserve memory during abortPasha Tatashin
KHO allows clients to preserve memory regions at any point before the KHO state is finalized. The finalization process itself involves KHO performing its own actions, such as serializing the overall preserved memory map. If this finalization process is aborted, the current implementation destroys KHO's internal memory tracking structures (`kho_out.ser.track.orders`). This behavior effectively unpreserves all memory from KHO's perspective, regardless of whether those preservations were made by clients before the finalization attempt or by KHO itself during finalization. This premature unpreservation is incorrect. An abort of the finalization process should only undo actions taken by KHO as part of that specific finalization attempt. Individual memory regions preserved by clients prior to finalization should remain preserved, as their lifecycle is managed by the clients themselves. These clients might still need to call kho_unpreserve_folio() or kho_unpreserve_phys() based on their own logic, even after a KHO finalization attempt is aborted. Link: https://lkml.kernel.org/r/20251101142325.1326536-7-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: "Mike Rapoport (Microsoft)" <rppt@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Simon Horman <horms@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27kho: add interfaces to unpreserve folios, page ranges, and vmallocPasha Tatashin
Allow users of KHO to cancel the previous preservation by adding the necessary interfaces to unpreserve folio, pages, and vmallocs. Link: https://lkml.kernel.org/r/20251101142325.1326536-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Simon Horman <horms@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27kho: drop notifiersMike Rapoport (Microsoft)
The KHO framework uses a notifier chain as the mechanism for clients to participate in the finalization process. While this works for a single, central state machine, it is too restrictive for kernel-internal components like pstore/reserve_mem or IMA. These components need a simpler, direct way to register their state for preservation (e.g., during their initcall) without being part of a complex, shutdown-time notifier sequence. The notifier model forces all participants into a single finalization flow and makes direct preservation from an arbitrary context difficult. This patch refactors the client participation model by removing the notifier chain and introducing a direct API for managing FDT subtrees. The core kho_finalize() and kho_abort() state machine remains, but clients now register their data with KHO beforehand. Link: https://lkml.kernel.org/r/20251101142325.1326536-3-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Simon Horman <horms@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27kho: make debugfs interface optionalPasha Tatashin
Patch series "liveupdate: Rework KHO for in-kernel users", v9. This series refactors the KHO framework to better support in-kernel users like the upcoming LUO. The current design, which relies on a notifier chain and debugfs for control, is too restrictive for direct programmatic use. The core of this rework is the removal of the notifier chain in favor of a direct registration API. This decouples clients from the shutdown-time finalization sequence, allowing them to manage their preserved state more flexibly and at any time. In support of this new model, this series also: - Makes the debugfs interface optional. - Introduces APIs to unpreserve memory and fixes a bug in the abort path where client state was being incorrectly discarded. Note that this is an interim step, as a more comprehensive fix is planned as part of the stateless KHO work [1]. - Moves all KHO code into a new kernel/liveupdate/ directory to consolidate live update components. This patch (of 9): Currently, KHO is controlled via debugfs interface, but once LUO is introduced, it can control KHO, and the debug interface becomes optional. Add a separate config CONFIG_KEXEC_HANDOVER_DEBUGFS that enables the debugfs interface, and allows to inspect the tree. Move all debugfs related code to a new file to keep the .c files clear of ifdefs. Link: https://lkml.kernel.org/r/20251101142325.1326536-1-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20251101142325.1326536-2-pasha.tatashin@soleen.com Link: https://lore.kernel.org/all/20251020100306.2709352-1-jasonmiu@google.com [1] Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Changyuan Lyu <changyuanl@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Simon Horman <horms@kernel.org> Cc: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27fork: stop ignoring NUMA while handling cached thread stacksMateusz Guzik
1. the numa parameter was straight up ignored. 2. nothing was done to check if the to-be-cached/allocated stack matches the local node The id remains ignored on free in case of memoryless nodes. Note the current caching is already bad as the cache keeps overflowing and a different solution is needed for the long run, to be worked out(tm). Stats collected over a kernel build with the patch with the following topology: NUMA node(s): 2 NUMA node0 CPU(s): 0-11 NUMA node1 CPU(s): 12-23 caller's node vs stack backing pages on free: matching: 50083 (70%) mismatched: 21492 (30%) caching efficiency: cached: 32651 (65.2%) dropped: 17432 (34.8%) Link: https://lkml.kernel.org/r/20251120054015.3019419-1-mjguzik@gmail.com Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Linus Waleij <linus.walleij@linaro.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Kees Cook <kees@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-27Merge branch 'mm-hotfixes-stable' into mm-nonmm-stable in order to be ableAndrew Morton
to merge "kho: make debugfs interface optional" into mm-nonmm-stable.
2025-11-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: net/xdp/xsk.c 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number") 8da7bea7db69 ("xsk: add indirect call for xsk_destruct_skb") 30ed05adca4a ("xsk: use a smaller new lock for shared pool case") https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-27printk: Use console_is_usable on console_unblankMarcos Paulo de Souza
The macro for_each_console_srcu iterates over all registered consoles. It's implied that all registered consoles have CON_ENABLED flag set, making the check for the flag unnecessary. Call console_is_usable function to fully verify if the given console is usable before calling the ->unblank callback. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://patch.msgid.link/20251121-printk-cleanup-part2-v2-3-57b8b78647f4@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>
2025-11-27sysctl: Wrap do_proc_douintvec with the public function proc_douintvec_convJoel Granados
Make do_proc_douintvec static and export proc_douintvec_conv wrapper function for external use. This is to keep with the design in sysctl.c. Update fs/pipe.c to use the new public API. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Create pipe-max-size converter using sysctl UINT macrosJoel Granados
Create a converter for the pipe-max-size proc_handler using the SYSCTL_UINT_CONV_CUSTOM. Move SYSCTL_CONV_IDENTITY macro to the sysctl header to make it available for pipe size validation. Keep returning -EINVAL when (val == 0) by using a range checking converter and setting the minimal valid value (extern1) to SYSCTL_ONE. Keep round_pipe_size by passing it as the operation for SYSCTL_USER_TO_KERN_INT_CONV. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Move proc_doulongvec_ms_jiffies_minmax to kernel/time/jiffies.cJoel Granados
Move proc_doulongvec_ms_jiffies_minmax to kernel/time/jiffies.c. Create a non static wrapper function proc_doulongvec_minmax_conv that forwards the custom convmul and convdiv argument values to the internal do_proc_doulongvec_minmax. Remove unused linux/times.h include from kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Move jiffies converters to kernel/time/jiffies.cJoel Granados
Move integer jiffies converters (proc_dointvec{_,_ms_,_userhz_}jiffies and proc_dointvec_ms_jiffies_minmax) to kernel/time/jiffies.c. Error stubs for when CONFIG_PRCO_SYSCTL is not defined are not reproduced because all the jiffies converters go through proc_dointvec_conv which is already stubbed. This is part of the greater effort to move sysctl logic out of kernel/sysctl.c thereby reducing merge conflicts in kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Move UINT converter macros to sysctl headerJoel Granados
Move SYSCTL_USER_TO_KERN_UINT_CONV and SYSCTL_UINT_CONV_CUSTOM macros to include/linux/sysctl.h. No need to embed sysctl_kern_to_user_uint_conv in a macro as it will not need a custom kernel pointer operation. This is a preparation commit to enable jiffies converter creation outside kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Move INT converter macros to sysctl headerJoel Granados
Move direction macros (SYSCTL_{USER_TO_KERN,KERN_TO_USER}) and the integer converter macros (SYSCTL_{USER_TO_KERN,KERN_TO_USER}_INT_CONV, SYSCTL_INT_CONV_CUSTOM) into include/linux/sysctl.h. This is a preparation commit to enable jiffies converter creation outside kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Allow custom converters from outside sysctlJoel Granados
The new non-static proc_dointvec_conv forwards a custom converter function to do_proc_dointvec from outside the sysctl scope. Rename the do_proc_dointvec call points so any future changes to proc_dointvec_conv are propagated in sysctl.c This is a preparation commit that allows the integer jiffie converter functions to move out of kernel/sysctl.c. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: remove __user qualifier from stack_erasing_sysctl buffer argumentJoel Granados
The buffer arg in proc handler functions have been void* (no __user qualifier) since commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler"). The __user qualifier was erroneously brought back in commit 0df8bdd5e3b3 ("stackleak: move stack_erasing sysctl to stackleak.c"). This fixes the error by removing the __user qualifier. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202510221719.3ggn070M-lkp@intel.com/ Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Create macro for user-to-kernel uint converterJoel Granados
Replace sysctl_user_to_kern_uint_conv function with SYSCTL_USER_TO_KERN_UINT_CONV macro that accepts u_ptr_op parameter for value transformation. Replacing sysctl_kern_to_user_uint_conv is not needed as it will only be used from within sysctl.c. This is a preparation commit for creating a custom converter in fs/pipe.c. No Functional changes are intended. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Add optional range checking to SYSCTL_UINT_CONV_CUSTOMJoel Granados
Add k_ptr_range_check parameter to SYSCTL_UINT_CONV_CUSTOM macro to enable range validation using table->extra1/extra2. Replace do_proc_douintvec_minmax_conv with do_proc_uint_conv_minmax generated by the updated macro. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Create unsigned int converter using new macroJoel Granados
Pass sysctl_{user_to_kern,kern_to_user}_uint_conv (unsigned integer uni-directional converters) to the new SYSCTL_UINT_CONV_CUSTOM macro to create do_proc_douintvec_conv's replacement (do_proc_uint_conv). This is a preparation commit to use the unsigned integer converter from outside sysctl. No functional change is intended. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Add optional range checking to SYSCTL_INT_CONV_CUSTOMJoel Granados
Extend the SYSCTL_INT_CONV_CUSTOM macro with a k_ptr_range_check parameter to conditionally generate range validation code. When enabled, validation is done against table->extra1 (min) and table->extra2 (max) bounds before assignment. Add base minmax and ms_jiffies_minmax converter instances that utilize the range checking functionality. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Create integer converters with one macroJoel Granados
New SYSCTL_INT_CONV_CUSTOM macro creates "bi-directional" converters from a user-to-kernel and a kernel-to-user functions. Replace integer versions of do_proc_*_conv functions with the ones from the new macro. Rename "_dointvec_" to just "_int_" as these converters are not applied to vectors and the "do" is already in the name. Move the USER_HZ validation directly into proc_dointvec_userhz_jiffies() Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Create converter functions with two new macrosJoel Granados
Eight converter functions are created using two new macros (SYSCTL_USER_TO_KERN_INT_CONV & SYSCTL_KERN_TO_USER_INT_CONV); they are called from four pre-existing converter functions: do_proc_dointvec_conv and do_proc_dointvec{,_userhz,_ms}_jiffies_conv. The function names generated by the macros are differentiated by a string suffix passed as the first macro argument. The SYSCTL_USER_TO_KERN_INT_CONV macro first executes the u_ptr_op operation, then checks for overflow, assigns sign (-, +) and finally writes to the kernel var with WRITE_ONCE; it always returns an -EINVAL when an overflow is detected. The SYSCTL_KERN_TO_USER_INT_CONV uses READ_ONCE, casts to unsigned long, then executes the k_ptr_op before assigning the value to the user space buffer. The overflow check is always done against MAX_INT after applying {k,u}_ptr_op. This approach avoids rounding or precision errors that might occur when using the inverse operations. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Discriminate between kernel and user converter paramsJoel Granados
Rename converter parameter to indicate data flow direction: "lvalp" to "u_ptr" indicating a user space parsed value pointer. "valp" to "k_ptr" indicating a kernel storage value pointer. This facilitates the identification of discrepancies between direction (copy to kernel or copy to user space) and the modified variable. This is a preparation commit for when the converter functions are exposed to the rest of the kernel. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Indicate the direction of operation with macro namesJoel Granados
Replace the "write" integer parameter with SYSCTL_USER_TO_KERN() and SYSCTL_KERN_TO_USER() that clearly indicate data flow direction in sysctl operations. "write" originates in proc_sysctl.c (proc_sys_{read,write}) and can take one of two values: "0" or "1" when called from proc_sys_read and proc_sys_write respectively. When write has a value of zero, data is "written" to a user space buffer from a kernel variable (usually ctl_table->data). Whereas when write has a value greater than zero, data is "written" to an internal kernel variable from a user space buffer. Remove this ambiguity by introducing macros that clearly indicate the direction of the "write". The write mode names in sysctl_writes_mode are left unchanged as these directly relate to the sysctl_write_strict file in /proc/sys where the word "write" unambiguously refers to writing to a file. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Remove superfluous __do_proc_* indirectionJoel Granados
Remove "__" from __do_proc_do{intvec,uintvec,ulongvec_minmax} internal functions and delete their corresponding do_proc_do* wrappers. These indirections are unnecessary as they do not add extra logic nor do they indicate a layer separation. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Remove superfluous tbl_data param from "dovec" functionsJoel Granados
Remove superfluous tbl_data param from do_proc_douintvec{,_r,_w} and __do_proc_do{intvec,uintvec,ulongvec_minmax}. There is no need to pass it as it is always contained within the ctl_table struct. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27sysctl: Replace void pointer with const pointer to ctl_tableJoel Granados
* Replace void* data in the converter functions with a const struct ctl_table* table as it was only getting forwarding values from ctl_table->extra{1,2}. * Remove the void* data in the do_proc_* functions as they already had a pointer to the ctl_table. * Remove min/max structures do_proc_do{uint,int}vec_minmax_conv_param; the min/max values get passed directly in ctl_table. * Keep min/max initialization in extra{1,2} in proc_dou8vec_minmax. * The do_proc_douintvec was adjusted outside sysctl.c as it is exported to fs/pipe.c. Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-11-27refscale: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast()Paul E. McKenney
This commit updates the initialization for the "srcu-fast" scale type to use DEFINE_STATIC_SRCU_FAST() when reader_flavor is equal to SRCU_READ_FLAVOR_FAST. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: <bpf@vger.kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-27rcutorture: Make srcu{,d}_torture_init() announce the SRCU typePaul E. McKenney
This commit causes rcutorture's srcu_torture_init() and srcud_torture_init() functions to announce on the console log which variant of SRCU is being tortured, for example: "torture: srcud_torture_init fast SRCU". [ paulmck: Apply feedback from kernel test robot. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-27srcu: Create an SRCU-fast-updown APIPaul E. McKenney
This commit creates an SRCU-fast-updown API, including DEFINE_SRCU_FAST_UPDOWN(), DEFINE_STATIC_SRCU_FAST_UPDOWN(), __init_srcu_struct_fast_updown(), init_srcu_struct_fast_updown(), srcu_read_lock_fast_updown(), srcu_read_unlock_fast_updown(), __srcu_read_lock_fast_updown(), and __srcu_read_unlock_fast_updown(). These are initially identical to their SRCU-fast counterparts, but both SRCU-fast and SRCU-fast-updown will be optimized in different directions by later commits. SRCU-fast will lack any sort of srcu_down_read() and srcu_up_read() APIs, which will enable extremely efficient NMI safety. For its part, SRCU-fast-updown will not be NMI safe, which will enable reasonably efficient implementations of srcu_down_read_fast() and srcu_up_read_fast(). This API fork happens to meet two different future use cases. * SRCU-fast will become the reimplementation basis for RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must be NMI safe, SRCU-fast must be as well. * SRCU-fast-updown will be needed for uretprobes code in order to get rid of the read-side memory barriers while still allowing entering the reader at task level while exiting it in a timer handler. This commit also adds rcutorture tests for the new APIs. This (annoyingly) needs to be in the same commit for bisectability. With this commit, the 0x8 value tests SRCU-fast-updown. However, most SRCU-fast testing will be via the RCU Tasks Trace wrappers. [ paulmck: Apply s/0x8/0x4/ missing change per Boqun Feng feedback. ] [ paulmck: Apply Akira Yokosawa feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <bpf@vger.kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-11-26Merge tag 'trace-ringbuffer-v6.18-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer fix from Steven Rostedt: - Do not allow mmapped ring buffer to be split When the ring buffer VMA is split by a partial munmap or a MAP_FIXED, the kernel calls vm_ops->close() on each portion. This causes the ring_buffer_unmap() to be called multiple times. This causes subsequent calls to return -ENODEV and triggers a warning. There's no reason to allow user space to split up memory mapping of the ring buffer. Have it return -EINVAL when that happens. * tag 'trace-ringbuffer-v6.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix WARN_ON in tracing_buffers_mmap_close for split VMAs
2025-11-26dma-direct: Fix missing sg_dma_len assignment in P2PDMA bus mappingsPranjal Shrivastava
Prior to commit a25e7962db0d7 ("PCI/P2PDMA: Refactor the p2pdma mapping helpers"), P2P segments were mapped using the pci_p2pdma_map_segment() helper. This helper was responsible for populating sg->dma_address, marking the bus address, and also setting sg_dma_len(sg). The refactor[1] removed this helper and moved the mapping logic directly into the callers. While iommu_dma_map_sg() was correctly updated to set the length in the new flow, it was missed in dma_direct_map_sg(). Thus, in dma_direct_map_sg(), the PCI_P2PDMA_MAP_BUS_ADDR case sets the dma_address and marks the segment, but immediately executes 'continue', which causes the loop to skip the standard assignment logic at the end: sg_dma_len(sg) = sg->length; As a result, when CONFIG_NEED_SG_DMA_LENGTH is enabled, the dma_length field remains uninitialized (zero) for P2P bus address mappings. This breaks upper-layer drivers (for e.g. RDMA/IB) that rely on sg_dma_len() to determine the transfer size. Fix this by explicitly setting the DMA length in the PCI_P2PDMA_MAP_BUS_ADDR case before continuing to the next scatterlist entry. Fixes: a25e7962db0d7 ("PCI/P2PDMA: Refactor the p2pdma mapping helpers") Reported-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Pranjal Shrivastava <praan@google.com> [1] https://lore.kernel.org/all/ac14a0e94355bf898de65d023ccf8a2ad22a3ece.1746424934.git.leon@kernel.org/ Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shivaji Kant <shivajikant@google.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251126114112.3694469-1-praan@google.com
2025-11-26fgraph: Remove coarse PID filtering from graph_entry()Shengming Hu
With PID filtering working via ftrace_pids_enabled() and fgraph_pid_func, the coarse-grained ftrace_trace_task() check in graph_entry() is obsolete. It was only a fallback for uninitialized op->private (now fixed), and its removal ensures consistent PID filtering with standard function tracing. Also remove unused ftrace_trace_task() definition from trace.h. Cc: <wang.yaxin@zte.com.cn> Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: <mathieu.desnoyers@efficios.com> Cc: <zhang.run@zte.com.cn> Cc: <yang.yang29@zte.com.cn> Link: https://patch.msgid.link/20251126173552333XoJZN20143fWbsdTEtWoU@zte.com.cn Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26fgraph: Check ftrace_pids_enabled on registration for early filteringShengming Hu
When registering ftrace_graph, check if ftrace_pids_enabled is active. If enabled, assign entryfunc to fgraph_pid_func to ensure filtering is performed before executing the saved original entry function. Cc: stable@vger.kernel.org Cc: <wang.yaxin@zte.com.cn> Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: <mathieu.desnoyers@efficios.com> Cc: <zhang.run@zte.com.cn> Cc: <yang.yang29@zte.com.cn> Link: https://patch.msgid.link/20251126173331679XGVF98NLhyLJRdtNkVZ6w@zte.com.cn Fixes: df3ec5da6a1e7 ("function_graph: Add pid tracing back to function graph tracer") Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26fgraph: Initialize ftrace_ops->private for function graph opsShengming Hu
The ftrace_pids_enabled(op) check relies on op->private being properly initialized, but fgraph_ops's underlying ftrace_ops->private was left uninitialized. This caused ftrace_pids_enabled() to always return false, effectively disabling PID filtering for function graph tracing. Fix this by copying src_ops->private to dst_ops->private in fgraph_init_ops(), ensuring PID filter state is correctly propagated. Cc: stable@vger.kernel.org Cc: <wang.yaxin@zte.com.cn> Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: <mathieu.desnoyers@efficios.com> Cc: <zhang.run@zte.com.cn> Cc: <yang.yang29@zte.com.cn> Fixes: c132be2c4fcc1 ("function_graph: Have the instances use their own ftrace_ops for filtering") Link: https://patch.msgid.link/20251126172926004y3hC8QyU4WFOjBkU_UxLC@zte.com.cn Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26function_graph: Enable funcgraph-args and funcgraph-retaddr to work ↵pengdonglin
simultaneously Currently, the funcgraph-args and funcgraph-retaddr features are mutually exclusive. This patch resolves this limitation by allowing funcgraph-retaddr to have an args array. To verify the change, use perf to trace vfs_write with both options enabled: Before: # perf ftrace -G vfs_write --graph-opts args,retaddr ...... down_read() { /* <-n_tty_write+0xa3/0x540 */ __cond_resched(); /* <-down_read+0x12/0x160 */ preempt_count_add(); /* <-down_read+0x3b/0x160 */ preempt_count_sub(); /* <-down_read+0x8b/0x160 */ } After: # perf ftrace -G vfs_write --graph-opts args,retaddr ...... down_read(sem=0xffff8880100bea78) { /* <-n_tty_write+0xa3/0x540 */ __cond_resched(); /* <-down_read+0x12/0x160 */ preempt_count_add(val=1); /* <-down_read+0x3b/0x160 */ preempt_count_sub(val=1); /* <-down_read+0x8b/0x160 */ } Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Xiaoqin Zhang <zhangxiaoqin@xiaomi.com> Link: https://patch.msgid.link/20251125093425.2563849-1-dolinux.peng@gmail.com Signed-off-by: pengdonglin <pengdonglin@xiaomi.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26tracing: Add boot-time backup of persistent ring bufferMasami Hiramatsu (Google)
Currently, the persistent ring buffer instance needs to be read before using it. This means we have to wait for boot up user space and dump the persistent ring buffer. However, in that case we can not start tracing on it from the kernel cmdline. To solve this limitation, this adds an option which allows to create a trace instance as a backup of the persistent ring buffer at boot. If user specifies trace_instance=<BACKUP>=<PERSIST_RB> then the <BACKUP> instance is made as a copy of the <PERSIST_RB> instance. For example, the below kernel cmdline records all syscalls, scheduler and interrupt events on the persistent ring buffer `boot_map` but before starting the tracing, it makes a `backup` instance from the `boot_map`. Thus, the `backup` instance has the previous boot events. 'reserve_mem=12M:4M:trace trace_instance=boot_map@trace,syscalls:*,sched:*,irq:* trace_instance=backup=boot_map' As you can see, this just make a copy of entire reserved area and make a backup instance on it. So you can release (or shrink) the backup instance after use it to save the memory usage. /sys/kernel/tracing/instances # free total used free shared buff/cache available Mem: 1999284 55704 1930520 10132 13060 1914628 Swap: 0 0 0 /sys/kernel/tracing/instances # rmdir backup/ /sys/kernel/tracing/instances # free total used free shared buff/cache available Mem: 1999284 40640 1945584 10132 13060 1929692 Swap: 0 0 0 Note: since there is no reason to make a copy of empty buffer, this backup only accepts a persistent ring buffer as the original instance. Also, since this backup is based on vmalloc(), it does not support user-space mmap(). Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/176377150002.219692.9425536150438129267.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26ftrace: Allow tracing of some of the tracing codeSteven Rostedt
There is times when tracing the tracing infrastructure can be useful for debugging the tracing code. Currently all files in the tracing directory are set to "notrace" the functions. Add a new config option FUNCTION_SELF_TRACING that will allow some of the files in the tracing infrastructure to be traced. It requires a config to enable because it will add noise to the function tracer if events and other tracing features are enabled. Tracing functions and events together is quite common, so not tracing the event code should be the default. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20251120181514.736f2d5f@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26tracing: Use strim() in trigger_process_regex() instead of skip_spaces()Steven Rostedt
The function trigger_process_regex() is called by a few functions, where only one calls strim() on the buffer passed to it. That leaves the other functions not trimming the end of the buffer passed in and making it a little inconsistent. Remove the strim() from event_trigger_regex_write() and have trigger_process_regex() use strim() instead of skip_spaces(). The buff variable is not passed in as const, so it can be modified. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20251125214032.323747707@kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26tracing: Add bulk garbage collection of freeing event_trigger_dataSteven Rostedt
The event trigger data requires a full tracepoint_synchronize_unregister() call before freeing. That call can take 100s of milliseconds to complete. In order to allow for bulk freeing of the trigger data, it can not call the tracepoint_synchronize_unregister() for every individual trigger data being free. Create a kthread that gets created the first time a trigger data is freed, and have it use the lockless llist to get the list of data to free, run the tracepoint_synchronize_unregister() then free everything in the list. By freeing hundreds of event_trigger_data elements together, it only requires two runs of the synchronization function, and not hundreds of runs. This speeds up the operation by orders of magnitude (milliseconds instead of several seconds). Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20251125214032.151674992@kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>