summaryrefslogtreecommitdiff
path: root/kernel/liveupdate
AgeCommit message (Collapse)Author
2026-06-03liveupdate: Remove limit on the number of files per sessionPasha Tatashin
To remove the fixed limit on the number of preserved files per session, transition the file metadata serialization from a single contiguous memory block to a chain of linked blocks. Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260603154402.468928-11-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-03liveupdate: Remove limit on the number of sessionsPasha Tatashin
Currently, the number of LUO sessions is limited by a fixed number of pre-allocated pages for serialization (16 pages, allowing for ~819 sessions). This limitation is problematic if LUO is used to support things such as systemd file descriptor store, and would be used not just as VM memory but to save other states on the machine. Remove this limit by transitioning to a linked-block approach for session metadata serialization. Instead of a single contiguous block, session metadata is now stored in a chain of 16-page blocks. Each block starts with a header containing the physical address of the next block and the number of session entries in the current block. Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260603154402.468928-10-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-03liveupdate: defer session block allocation and physical address settingPasha Tatashin
Currently, luo_session_setup_outgoing() allocates the session block and sets its physical address in the header immediately. With upcoming dynamic block-based session management, this makes the first block different from the rest. Move the allocation to where it is first needed. Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260603154402.468928-9-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-03kho: add support for linked-block serializationPasha Tatashin
Introduce a linked-block serialization mechanism for state handover. Previously, LUO used contiguous memory blocks for serializing sessions and files, which imposed limits on the total number of items that could be preserved across a live update. This commit adds the infrastructure for a more flexible, block-based approach where serialized data is stored in a chain of linked blocks. This is a generic KHO serialization block infrastructure that can be used by multiple subsystems. Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260603154402.468928-8-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-03liveupdate: Extract luo_session_deserialize_one helperPasha Tatashin
Extract the logic for deserializing single entries for sessions into separate helper functions. In preparation to a linked-block serialization for sessions. This is a pure code movement, no other changes intended. Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260603154402.468928-7-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-03liveupdate: Extract luo_file_deserialize_one helperPasha Tatashin
Extract the logic for deserializing single entries for files into separate helper functions. In preparation to a linked-block serialization for files. This is a pure code movement, no other changes intended. Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260603154402.468928-6-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-03liveupdate: register luo_ser as KHO subtreePasha Tatashin
Entirely remove the LUO FDT wrapper since the FDT only carries the compatible string and the pointer to the centralized struct luo_ser. Instead, register the struct luo_ser via the KHO raw subtree API, placing the compatibility string inside the structure itself. Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260603154402.468928-5-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-03liveupdate: centralize state management into struct luo_serPasha Tatashin
Transition the LUO to ABI v2, which centralizes state management into a single struct luo_ser header. Previously, LUO state was spread across multiple FDT properties and subnodes. ABI v2 simplifies this by placing all core state, including the liveupdate number and physical addresses for sessions and FLB headers into a centralized struct luo_ser. Note that this change introduces a semantic difference: the sessions and FLB serialization formats are no longer completely independent of the core LUO. Their metadata (such as physical addresses for sessions and FLB headers) is now coupled to and managed via the centralized struct luo_ser. Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260603154402.468928-4-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-03liveupdate: avoid mixing cleanup guards with goto in luo_session_retrieve_fdPasha Tatashin
Refactoring luo_session_retrieve_fd() to avoid mixing automated cleanup-style guards with goto-based resource release, which is not recommended under the Linux kernel coding style. Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260603154402.468928-3-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-03liveupdate: change file_set->count type to u64 for type safetyPasha Tatashin
This improves type safety and aligns the in-memory file_set->count with the serialized count type. It avoids potential truncation or sign conversion mismatch issues. Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260603154402.468928-2-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01liveupdate: Remove unused ser field from struct luo_sessionPasha Tatashin
The ser field in struct luo_session was intended to point to the serialized data for a session, but it was never actually utilized in the implementation. All serialization and deserialization logic consistently uses the pointers maintained in struct luo_session_header. Remove the dead field to clean up the structure. Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260527202737.1345192-6-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01liveupdate: fix u-a-f in luo_file_unpreserve_files() and luo_file_finish()Pasha Tatashin
In luo_file_unpreserve_files() and luo_file_finish(), reorder module_put() and xa_erase() to ensure the file handler module remains pinned while its operations are being accessed. Specifically, luo_get_id() dereferences fh->ops->get_id, so the module reference must be held until after xa_erase() (which calls luo_get_id) completes. For luo_file_finish(), this requires moving the module_put() call out of the luo_file_finish_one() helper and into the main loop of luo_file_finish() itself. Fixes: 00d0b372374f ("liveupdate: prevent double management of files") Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260527202737.1345192-5-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01liveupdate: block session mutations during rebootPasha Tatashin
During the reboot() syscall, user processes may still be running concurrently and attempting to mutate sessions (e.g., creating, retrieving, or releasing sessions). To prevent this, introduce luo_session_serialize_rwsem to synchronize mutations with the serialization process. All session mutation operations (create, retrieve, release, ioctl) take the read lock. The serialization process (luo_session_serialize) takes the write lock and holds it indefinitely on success. This effectively freezes the LUO session subsystem during the transition to the new kernel. If serialization fails, the lock is released to allow recovery. Fixes: 0153094d03df ("liveupdate: luo_session: add sessions support") Reported-by: Oskar Gerlicz Kowalczuk <oskar@gerlicz.space> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Link: https://patch.msgid.link/20260527202737.1345192-4-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01liveupdate: fix TOCTOU race in luo_session_retrieve()Pasha Tatashin
Extend the scope of the rwsem_read lock in luo_session_retrieve() to overlap with the acquisition of the session mutex. This prevents a concurrent thread from releasing and freeing the session between the lookup and the mutex lock. Fixes: 0153094d03df ("liveupdate: luo_session: add sessions support") Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260527202737.1345192-3-pasha.tatashin@soleen.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01kho: make sure scratch size is always aligned by CMA_MIN_ALIGNMENT_BYTESPratyush Yadav (Google)
When using scratch_scale, the scratch sizes are rounded up to CMA_MIN_ALIGNMENT_BYTES since they will be released as MIGRATE_CMA. This is not done when using fixed scratch sizes via command line. This can result in user specifying a size which is not aligned, and thus kernel releasing a pageblock that is only partially scratch. Do the rounding up for both cases in scratch_size_update(). Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") Cc: stable@kernel.org Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Link: https://patch.msgid.link/20260519160554.2713361-1-pratyush@kernel.org Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01liveupdate: document liveupdate=onPratyush Yadav (Google)
While the liveupdate= parameter is documented in kernel-parameters.txt, it is not listed in LUO's user facing documentation. This can make it hard for users to figure out how to enable the subsystem, since enabling just the config isn't enough. Note the need for the kernel parameter in LUO core documentation, which gets exported to Documentation/core-api/liveupdate.rst. Suggested-by: Luca Boccassi <luca.boccassi@gmail.com> Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Link: https://patch.msgid.link/20260519125714.2435640-1-pratyush@kernel.org Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01liveupdate: validate session type before performing operationPratyush Yadav (Google)
The sessions ioctls are not applicable to all session types. PRESERVE_FD is only applicable to outgoing sessions. RETRIEVE_FD and FINISH are only valid for incoming session. Calling a incoming ioctl on an outgoing session is invalid and can cause file handlers to run into unexpected errors. For example, a user can create a (outgoing) session, preserve a memfd, and then immediately do a retrieve without doing a kexec in between. This would result in memfd's retrieve handler to run. The handlers expects to be called from a post-kexec context, and will try to do a kho_restore_vmalloc() or kho_restore_folio() to try and restore memory. KHO catches this (thanks to KHO_PAGE_MAGIC) and returns an error, but since this is considered an internal error and KHO throws out a bunch of WARN()s. Associate a type with each ioctl op and validate the type in luo_session_ioctl() before dispatching the ioctl handler to make sure the op is being called for the right session type. Fixes: 16cec0d26521 ("liveupdate: luo_session: add ioctls for file preservation") Cc: stable@vger.kernel.org Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Link: https://patch.msgid.link/20260519122428.2378446-1-pratyush@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01liveupdate: Reference count incoming FLB dataDavid Matlack
Increment the incoming FLB refcount in liveupdate_flb_get_incoming() so that the FLB structure cannot be freed while the caller is actively using it. Add an additional liveupdate_flb_put_incoming() function so the caller can explicitly indicate when it is done using the FLB data. During a Live Update, a subsystem might need to hold onto the incoming File-Lifecycle-Bound (FLB) data for an extended period, such as during device enumeration. Incrementing the reference count guarantees that the data remains valid and accessible until the subsystem releases it, preventing future use-after-free bugs. Fixes: cab056f2aae7 ("liveupdate: luo_flb: introduce File-Lifecycle-Bound global state") Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/r/20260423174032.3140399-3-dmatlack@google.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01liveupdate: Use refcount_t for FLB reference countsDavid Matlack
Use refcount_t instead of a raw integer to keep track of references on incoming and outgoing FLBs. Using refcount_t provides protection from overflow, underflow, and other issues. Fixes: cab056f2aae7 ("liveupdate: luo_flb: introduce File-Lifecycle-Bound global state") Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/r/20260423174032.3140399-2-dmatlack@google.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01liveupdate: add LIVEUPDATE_SESSION_GET_NAME ioctlLuca Boccassi
Userspace when requesting a session via the ioctl specifies a name and gets a FD, but then there is no ioctl to go back the other way and get the name given a LUO session FD. This is problematic especially when there is a userspace orchestrator that wants to check what FDs it is handling for clients without having to do manual string scraping of procfs, or without procfs at all. Add a ioctl to simply get the name from an FD. Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Link: https://lore.kernel.org/r/20260429212221.814107-4-luca.boccassi@gmail.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01liveupdate: reject LIVEUPDATE_IOCTL_CREATE_SESSION with invalid name lengthLuca Boccassi
A session name must not be an empty string, and must not exceed the maximum size define in the uapi header, including null termination. Fixes: 0153094d03df ("liveupdate: luo_session: add sessions support") Cc: stable@vger.kernel.org Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://lore.kernel.org/r/20260429212221.814107-2-luca.boccassi@gmail.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01kho: make preserved pages compatible with deferred struct page initEvangelos Petrongonas
When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, struct page initialization is deferred to parallel kthreads that run later in the boot process. During KHO restoration, kho_preserved_memory_reserve() writes metadata for each preserved memory region. However, if the struct page has not been initialized, this write targets uninitialized memory, potentially leading to errors like: BUG: unable to handle page fault for address: ... Fix this by introducing kho_get_preserved_page(), which ensures all struct pages in a preserved region are initialized by calling init_deferred_page() which is a no-op when the struct page is already initialized. Signed-off-by: Evangelos Petrongonas <epetron@amazon.de> Co-developed-by: Michal Clapinski <mclapinski@google.com> Signed-off-by: Michal Clapinski <mclapinski@google.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260423122538.140993-3-mclapinski@google.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-01kho: fix deferred initialization of scratch areasMichal Clapinski
Currently, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, kho_release_scratch() will initialize the struct pages and set migratetype of KHO scratch. Unless the whole scratch fits below first_deferred_pfn, some of that will be overwritten either by deferred_init_pages() or memmap_init_reserved_range(). To fix it, make memmap_init_range(), deferred_init_memmap_chunk() and __init_page_from_nid() recognize KHO scratch regions and set migratetype of pageblocks in those regions to MIGRATE_CMA. Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Michal Clapinski <mclapinski@google.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Link: https://patch.msgid.link/20260423122538.140993-2-mclapinski@google.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-05-26kho: fix order calculation for kho_unpreserve_pages()Pratyush Yadav (Google)
Commit 91e74fa8b1bc ("kho: make sure preservations do not span multiple NUMA nodes") made sure preservations from kho_preserve_pages() do not span multiple NUMA nodes. If they do, the order is reduced and tried again. The same logic was not implemented for kho_unpreserve_pages(). This can result in unpreserve calculating a different order than preserve, and thus not actually unpreserving the pages. Fix this by moving the order calculation logic to __kho_preserve_pages_order() and use it from both preserve and unpreserve paths. Move __kho_unpreserve() down to avoid having a forward declaration. Its users are further down in the file anyway. Also, it results in grouping for all the page-level preservation and unpreservation functions. This unfortunately makes the diff hard to read, but the main change in __kho_unpreserve() is to call __kho_preserve_pages_order() instead of open-coding the order calculation. Fixes: 91e74fa8b1bc ("kho: make sure preservations do not span multiple NUMA nodes") Cc: stable@vger.kernel.org Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260519133332.2498092-1-pratyush@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-05-13Merge tag 'fixes-2026-05-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux Pull liveupdate fixes from Mike Rapoport: "A few fixes for kexec handover and liveupdate: - make sure KHO is skipped for crash kernel - fix error reporting in memfd preservation if it fails mid-loop - don't allow preserving memfds whose page count exceeds UINT_MAX - fix documentation of memfd seals preservation to match the code" * tag 'fixes-2026-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux: mm/memfd_luo: document preservation of file seals mm/memfd_luo: reject memfds whose page count exceeds UINT_MAX mm/memfd_luo: report error when restoring a folio fails mid-loop kho: skip KHO for crash kernel
2026-04-28kho: skip KHO for crash kernelEvangelos Petrongonas
kho_fill_kimage() unconditionally populates the kimage with KHO metadata for every kexec image type. When the image is a crash kernel, this can be problematic as the crash kernel can run in a small reserved region and the KHO scratch areas can sit outside it. The crash kernel then faults during kho_memory_init() when it tries phys_to_virt() on the KHO FDT address: Unable to handle kernel paging request at virtual address xxxxxxxx ... fdt_offset_ptr+... fdt_check_node_offset_+... fdt_first_property_offset+... fdt_get_property_namelen_+... fdt_getprop+... kho_memory_init+... mm_core_init+... start_kernel+... kho_locate_mem_hole() already skips KHO logic for KEXEC_TYPE_CRASH images, but kho_fill_kimage() was missing the same guard. As kho_fill_kimage() is the single point that populates image->kho.fdt and image->kho.scratch, fixing it here is sufficient for both arm64 and x86 as the FDT and boot_params path are bailing out when these fields are unset. Fixes: d7255959b69a ("kho: allow kexec load before KHO finalization") Signed-off-by: Evangelos Petrongonas <epetron@amazon.de> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260410011609.1103-1-epetron@amazon.de Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-04-27kho: fix error handling in kho_add_subtree()Breno Leitao
Fix two error handling issues in kho_add_subtree(), where it doesn't handle the error path correctly. 1. If fdt_setprop() fails after the subnode has been created, the subnode is not removed. This leaves an incomplete node in the FDT (missing "preserved-data" or "blob-size" properties). 2. The fdt_setprop() return value (an FDT error code) is stored directly in err and returned to the caller, which expects -errno. Fix both by storing fdt_setprop() results in fdt_err, jumping to a new out_del_node label that removes the subnode on failure, and only setting err = 0 on the success path, otherwise returning -ENOMEM (instead of FDT_ERR_ errors that would come from fdt_setprop). No user-visible changes. This patch fixes error handling in the KHO (Kexec HandOver) subsystem, which is used to preserve data across kexec reboots. The fix only affects a rare failure path during kexec preparation — specifically when the kernel runs out of space in the Flattened Device Tree buffer while registering preserved memory regions. In the unlikely event that this error path was triggered, the old code would leave a malformed node in the device tree and return an incorrect error code to the calling subsystem, which could lead to confusing log messages or incorrect recovery decisions. With this fix, the incomplete node is properly cleaned up and the appropriate errno value is propagated, this error code is not returned to the user. Link: https://lore.kernel.org/20260410-kho_fix_send-v2-1-1b4debf7ee08@debian.org Fixes: 3dc92c311498 ("kexec: add Kexec HandOver (KHO) generation helpers") Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Breno Leitao <leitao@debian.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-27liveupdate: fix return value on session allocation failurePasha Tatashin
When session allocation fails during deserialization, the global 'err' variable was not updated before returning. This caused subsequent calls to luo_session_deserialize() to incorrectly report success. Ensure 'err' is set to the error code from PTR_ERR(session). This ensures that an error is correctly returned to userspace when it attempts to open /dev/liveupdate in the new kernel if deserialization failed. Link: https://lore.kernel.org/20260415193738.515491-1-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-19Merge tag 'mm-stable-2026-04-18-02-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - "Eliminate Dying Memory Cgroup" (Qi Zheng and Muchun Song) Address the longstanding "dying memcg problem". A situation wherein a no-longer-used memory control group will hang around for an extended period pointlessly consuming memory - "fix unexpected type conversions and potential overflows" (Qi Zheng) Fix a couple of potential 32-bit/64-bit issues which were identified during review of the "Eliminate Dying Memory Cgroup" series - "kho: history: track previous kernel version and kexec boot count" (Breno Leitao) Use Kexec Handover (KHO) to pass the previous kernel's version string and the number of kexec reboots since the last cold boot to the next kernel, and print it at boot time - "liveupdate: prevent double preservation" (Pasha Tatashin) Teach LUO to avoid managing the same file across different active sessions - "liveupdate: Fix module unloading and unregister API" (Pasha Tatashin) Address an issue with how LUO handles module reference counting and unregistration during module unloading - "zswap pool per-CPU acomp_ctx simplifications" (Kanchana Sridhar) Simplify and clean up the zswap crypto compression handling and improve the lifecycle management of zswap pool's per-CPU acomp_ctx resources - "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race" (SeongJae Park) Address unlikely but possible leaks and deadlocks in damon_call() and damon_walk() - "mm/damon/core: validate damos_quota_goal->nid" (SeongJae Park) Fix a couple of root-only wild pointer dereferences - "Docs/admin-guide/mm/damon: warn commit_inputs vs other params race" (SeongJae Park) Update the DAMON documentation to warn operators about potential races which can occur if the commit_inputs parameter is altered at the wrong time - "Minor hmm_test fixes and cleanups" (Alistair Popple) Bugfixes and a cleanup for the HMM kernel selftests - "Modify memfd_luo code" (Chenghao Duan) Cleanups, simplifications and speedups to the memfd_lou code - "mm, kvm: allow uffd support in guest_memfd" (Mike Rapoport) Support for userfaultfd in guest_memfd - "selftests/mm: skip several tests when thp is not available" (Chunyu Hu) Fix several issues in the selftests code which were causing breakage when the tests were run on CONFIG_THP=n kernels - "mm/mprotect: micro-optimization work" (Pedro Falcato) A couple of nice speedups for mprotect() - "MAINTAINERS: update KHO and LIVE UPDATE entries" (Pratyush Yadav) Document upcoming changes in the maintenance of KHO, LUO, memfd_luo, kexec, crash, kdump and probably other kexec-based things - they are being moved out of mm.git and into a new git tree * tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (121 commits) MAINTAINERS: add page cache reviewer mm/vmscan: avoid false-positive -Wuninitialized warning MAINTAINERS: update Dave's kdump reviewer email address MAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE MAINTAINERS: drop include/linux/kho/abi/ from KHO MAINTAINERS: update KHO and LIVE UPDATE maintainers MAINTAINERS: update kexec/kdump maintainers entries mm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd() selftests: mm: skip charge_reserved_hugetlb without killall userfaultfd: allow registration of ranges below mmap_min_addr mm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update mm/hugetlb: fix early boot crash on parameters without '=' separator zram: reject unrecognized type= values in recompress_store() docs: proc: document ProtectionKey in smaps mm/mprotect: special-case small folios when applying permissions mm/mprotect: move softleaf code out of the main function mm: remove '!root_reclaim' checking in should_abort_scan() mm/sparse: fix comment for section map alignment mm/page_io: use sio->len for PSWPIN accounting in sio_read_complete() selftests/mm: transhuge_stress: skip the test when thp not available ...
2026-04-18liveupdate: defer file handler module refcounting to active sessionsPasha Tatashin
Stop pinning modules indefinitely upon file handler registration. Instead, dynamically increment the module reference count only when a live update session actively uses the file handler (e.g., during preservation or deserialization), and release it when the session ends. This allows modules providing live update handlers to be gracefully unloaded when no live update is in progress. Link: https://lore.kernel.org/20260327033335.696621-11-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18liveupdate: make unregister functions return voidPasha Tatashin
Change liveupdate_unregister_file_handler and liveupdate_unregister_flb to return void instead of an error code. This follows the design principle that unregistration during module unload should not fail, as the unload cannot be stopped at that point. Link: https://lore.kernel.org/20260327033335.696621-10-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18liveupdate: remove liveupdate_test_unregister()Pasha Tatashin
Now that file handler unregistration automatically unregisters all associated file handlers (FLBs), the liveupdate_test_unregister() function is no longer needed. Remove it along with its usages and declarations. Link: https://lore.kernel.org/20260327033335.696621-9-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18liveupdate: auto unregister FLBs on file handler unregistrationPasha Tatashin
To ensure that unregistration is always successful and doesn't leave dangling resources, introduce auto-unregistration of FLBs: when a file handler is unregistered, all FLBs associated with it are automatically unregistered. Introduce a new helper luo_flb_unregister_all() which unregisters all FLBs linked to the given file handler. Link: https://lore.kernel.org/20260327033335.696621-8-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18liveupdate: remove luo_session_quiesce()Pasha Tatashin
Now that FLB module references are handled dynamically during active sessions, we can safely remove the luo_session_quiesce() and luo_session_resume() mechanism. Link: https://lore.kernel.org/20260327033335.696621-7-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18liveupdate: defer FLB module refcounting to active sessionsPasha Tatashin
Stop pinning modules indefinitely upon FLB registration. Instead, dynamically take a module reference when the FLB is actively used in a session (e.g., during preserve and retrieve) and release it when the session concludes. This allows modules providing FLB operations to be cleanly unloaded when not in active use by the live update orchestrator. Link: https://lore.kernel.org/20260327033335.696621-6-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18liveupdate: protect FLB lists with luo_register_rwlockPasha Tatashin
Because liveupdate FLB objects will soon drop their persistent module references when registered, list traversals must be protected against concurrent module unloading. To provide this protection, utilize the global luo_register_rwlock. It protects the global registry of FLBs and the handler's specific list of FLB dependencies. Read locks are used during concurrent list traversals (e.g., during preservation and serialization). Write locks are taken during registration and unregistration. Link: https://lore.kernel.org/20260327033335.696621-5-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18liveupdate: protect file handler list with rwsemPasha Tatashin
Because liveupdate file handlers will no longer hold a module reference when registered, we must ensure that the access to the handler list is protected against concurrent module unloading. Utilize the global luo_register_rwlock to protect the global registry of file handlers. Read locks are taken during list traversals in luo_preserve_file() and luo_file_deserialize(). Write locks are taken during registration and unregistration. Link: https://lore.kernel.org/20260327033335.696621-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18liveupdate: synchronize lazy initialization of FLB private statePasha Tatashin
The luo_flb_get_private() function, which is responsible for lazily initializing the private state of FLB objects, can be called concurrently from multiple threads. This creates a data race on the 'initialized' flag and can lead to multiple executions of mutex_init() and INIT_LIST_HEAD() on the same memory. Introduce a static spinlock (luo_flb_init_lock) local to the function to synchronize the initialization path. Use smp_load_acquire() and smp_store_release() for memory ordering between the fast path and the slow path. Link: https://lore.kernel.org/20260327033335.696621-3-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18liveupdate: safely print untrusted stringsPasha Tatashin
Patch series "liveupdate: Fix module unloading and unregister API", v3. This patch series addresses an issue with how LUO handles module reference counting and unregistration during a module unload (e.g., via rmmod). Currently, modules that register live update file handlers are pinned for the entire duration they are registered. This prevents the modules from being unloaded gracefully, even when no live update session is in progress. Furthermore, if a module is forcefully unloaded, the unregistration functions return an error (e.g. -EBUSY) if a session is active, which is ignored by the kernel's module unload path, leaving dangling pointers in the LUO global lists. To resolve these issues, this series introduces the following changes: 1. Adds a global read-write semaphore (luo_register_rwlock) to protect the registration lists for both file handlers and FLBs. 2. Reduces the scope of module reference counting for file handlers and FLBs. Instead of pinning modules indefinitely upon registration, references are now taken only when they are actively used in a live update session (e.g., during preservation, retrieval, or deserialization). 3. Removes the global luo_session_quiesce() mechanism since module unload behavior now handles active sessions implicitly. 4. Introduces auto-unregistration of FLBs during file handler unregistration to prevent leaving dangling resources. 5. Changes the unregistration functions to return void instead of an error code. 6. Fixes a data race in luo_flb_get_private() by introducing a spinlock for thread-safe lazy initialization. 7. Strengthens security by using %.*s when printing untrusted deserialized compatible strings and session names to prevent out-of-bounds reads. This patch (of 10): Deserialized strings from KHO data (such as file handler compatible strings and session names) are provided by the previous kernel and might not be null-terminated if the data is corrupted or maliciously crafted. When printing these strings in error messages, use the %.*s format specifier with the maximum buffer size to prevent out-of-bounds reads into adjacent kernel memory. Link: https://lore.kernel.org/20260327033335.696621-1-pasha.tatashin@soleen.com Link: https://lore.kernel.org/20260327033335.696621-2-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18liveupdate: prevent double management of filesPasha Tatashin
Patch series "liveupdate: prevent double preservation", v4. Currently, LUO does not prevent the same file from being managed twice across different active sessions. Because LUO preserves files of absolutely different types: memfd, and upcoming vfiofd [1], iommufd [2], guestmefd (and possible kvmfd/cpufd). There is no common private data or guarantee on how to prevent that the same file is not preserved twice beside using inode or some slower and expensive method like hashtables. This patch (of 4) Currently, LUO does not prevent the same file from being managed twice across different active sessions. Use a global xarray luo_preserved_files to keep track of file identifiers being preserved by LUO. Update luo_preserve_file() to check and insert the file identifier into this xarray when it is preserved, and erase it in luo_file_unpreserve_files() when it is released. To allow handlers to define what constitutes a "unique" file (e.g., different struct file objects pointing to the same hardware resource), add a get_id() callback to struct liveupdate_file_ops. If not provided, the default identifier is the struct file pointer itself. This ensures that the same file (or resource) cannot be managed by multiple sessions. If another session attempts to preserve an already managed file, it will now fail with -EBUSY. Link: https://lore.kernel.org/20260326163943.574070-1-pasha.tatashin@soleen.com Link: https://lore.kernel.org/20260326163943.574070-2-pasha.tatashin@soleen.com Link: https://lore.kernel.org/all/20260129212510.967611-1-dmatlack@google.com [1] Link: https://lore.kernel.org/all/20260203220948.2176157-1-skhawaja@google.com [2] Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: David Matlack <dmatlack@google.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18kho: kexec-metadata: track previous kernel chainBreno Leitao
Use Kexec Handover (KHO) to pass the previous kernel's version string and the number of kexec reboots since the last cold boot to the next kernel, and print it at boot time. Example output: [ 0.000000] KHO: exec from: 6.19.0-rc4-next-20260107 (count 1) Motivation ========== Bugs that only reproduce when kexecing from specific kernel versions are difficult to diagnose. These issues occur when a buggy kernel kexecs into a new kernel, with the bug manifesting only in the second kernel. Recent examples include the following commits: * commit eb2266312507 ("x86/boot: Fix page table access in 5-level to 4-level paging transition") * commit 77d48d39e991 ("efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption") * commit 64b45dd46e15 ("x86/efi: skip memattr table on kexec boot") As kexec-based reboots become more common, these version-dependent bugs are appearing more frequently. At scale, correlating crashes to the previous kernel version is challenging, especially when issues only occur in specific transition scenarios. Implementation ============== The kexec metadata is stored as a plain C struct (struct kho_kexec_metadata) rather than FDT format, for simplicity and direct field access. It is registered via kho_add_subtree() as a separate subtree, keeping it independent from the core KHO ABI. This design choice: - Keeps the core KHO ABI minimal and stable - Allows the metadata format to evolve independently - Avoids requiring version bumps for all KHO consumers (LUO, etc.) when the metadata format changes The struct kho_kexec_metadata contains two fields: - previous_release: The kernel version that initiated the kexec - kexec_count: Number of kexec boots since last cold boot On cold boot, kexec_count starts at 0 and increments with each kexec. The count helps identify issues that only manifest after multiple consecutive kexec reboots. [leitao@debian.org: call kho_kexec_metadata_init() for both boot paths] Link: https://lore.kernel.org/all/20260309-kho-v8-5-c3abcf4ac750@debian.org/ [1] Link: https://lore.kernel.org/20260409-kho_fix_merge_issue-v1-1-710c84ceaa85@debian.org Link: https://lore.kernel.org/20260316-kho-v9-5-ed6dcd951988@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18kho: fix kho_in_debugfs_init() to handle non-FDT blobsBreno Leitao
kho_in_debugfs_init() calls fdt_totalsize() to determine blob sizes, which assumes all blobs are FDTs. This breaks for non-FDT blobs like struct kho_kexec_metadata. Fix this by reading the "blob-size" property from the FDT (persisted by kho_add_subtree()) instead of calling fdt_totalsize(). Also rename local variables from fdt_phys/sub_fdt to blob_phys/blob for consistency with the non-FDT-specific naming. Link: https://lore.kernel.org/20260316-kho-v9-4-ed6dcd951988@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18kho: persist blob size in KHO FDTBreno Leitao
kho_add_subtree() accepts a size parameter but only forwards it to debugfs. The size is not persisted in the KHO FDT, so it is lost across kexec. This makes it impossible for the incoming kernel to determine the blob size without understanding the blob format. Store the blob size as a "blob-size" property in the KHO FDT alongside the "preserved-data" physical address. This allows the receiving kernel to recover the size for any blob regardless of format. Also extend kho_retrieve_subtree() with an optional size output parameter so callers can learn the blob size without needing to understand the blob format. Update all callers to pass NULL for the new parameter. Link: https://lore.kernel.org/20260316-kho-v9-3-ed6dcd951988@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18kho: rename fdt parameter to blob in kho_add/remove_subtree()Breno Leitao
Since kho_add_subtree() now accepts arbitrary data blobs (not just FDTs), rename the parameter from 'fdt' to 'blob' to better reflect its purpose. Apply the same rename to kho_remove_subtree() for consistency. Also rename kho_debugfs_fdt_add() and kho_debugfs_fdt_remove() to kho_debugfs_blob_add() and kho_debugfs_blob_remove() respectively, with the same parameter rename from 'fdt' to 'blob'. Link: https://lore.kernel.org/20260316-kho-v9-2-ed6dcd951988@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18kho: add size parameter to kho_add_subtree()Breno Leitao
Patch series "kho: history: track previous kernel version and kexec boot count", v9. Use Kexec Handover (KHO) to pass the previous kernel's version string and the number of kexec reboots since the last cold boot to the next kernel, and print it at boot time. Example ======= [ 0.000000] Linux version 6.19.0-rc3-upstream-00047-ge5d992347849 ... [ 0.000000] KHO: exec from: 6.19.0-rc4-next-20260107upstream-00004-g3071b0dc4498 (count 1) Motivation ========== Bugs that only reproduce when kexecing from specific kernel versions are difficult to diagnose. These issues occur when a buggy kernel kexecs into a new kernel, with the bug manifesting only in the second kernel. Recent examples include: * eb2266312507 ("x86/boot: Fix page table access in 5-level to 4-level paging transition") * 77d48d39e991 ("efistub/tpm: Use ACPI reclaim memory for event log to avoid corruption") * 64b45dd46e15 ("x86/efi: skip memattr table on kexec boot") As kexec-based reboots become more common, these version-dependent bugs are appearing more frequently. At scale, correlating crashes to the previous kernel version is challenging, especially when issues only occur in specific transition scenarios. Some bugs manifest only after multiple consecutive kexec reboots. Tracking the kexec count helps identify these cases (this metric is already used by live update sub-system). KHO provides a reliable mechanism to pass information between kernels. By carrying the previous kernel's release string and kexec count forward, we can print this context at boot time to aid debugging. The goal of this feature is to have this information being printed in early boot, so, users can trace back kernel releases in kexec. Systemd is not helpful because we cannot assume that the previous kernel has systemd or even write access to the disk (common when using Linux as bootloaders) This patch (of 6): kho_add_subtree() assumes the fdt argument is always an FDT and calls fdt_totalsize() on it in the debugfs code path. This assumption will break if a caller passes arbitrary data instead of an FDT. When CONFIG_KEXEC_HANDOVER_DEBUGFS is enabled, kho_debugfs_fdt_add() calls __kho_debugfs_fdt_add(), which executes: f->wrapper.size = fdt_totalsize(fdt); Fix this by adding an explicit size parameter to kho_add_subtree() so callers specify the blob size. This allows subtrees to contain arbitrary data formats, not just FDTs. Update all callers: - memblock.c: use fdt_totalsize(fdt) - luo_core.c: use fdt_totalsize(fdt_out) - test_kho.c: use fdt_totalsize() - kexec_handover.c (root fdt): use fdt_totalsize(kho_out.fdt) Also update __kho_debugfs_fdt_add() to receive the size explicitly instead of computing it internally via fdt_totalsize(). In kho_in_debugfs_init(), pass fdt_totalsize() for the root FDT and sub-blobs since all current users are FDTs. A subsequent patch will persist the size in the KHO FDT so the incoming side can handle non-FDT blobs correctly. Link: https://lore.kernel.org/20260323110747.193569-1-duanchenghao@kylinos.cn Link: https://lore.kernel.org/20260316-kho-v9-1-ed6dcd951988@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15Merge tag 'mm-stable-2026-04-13-21-45' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
2026-04-06liveupdate: propagate file deserialization failuresLeo Timmins
luo_session_deserialize() ignored the return value from luo_file_deserialize(). As a result, a session could be left partially restored even though the /dev/liveupdate open path treats deserialization failures as fatal. Propagate the error so a failed file deserialization aborts session deserialization instead of silently continuing. Link: https://lkml.kernel.org/r/20260325044608.8407-1-leotimmins1974@gmail.com Link: https://lkml.kernel.org/r/20260325044608.8407-2-leotimmins1974@gmail.com Fixes: 16cec0d26521 ("liveupdate: luo_session: add ioctls for file preservation") Signed-off-by: Leo Timmins <leotimmins1974@gmail.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05kho: drop restriction on maximum page orderPratyush Yadav
KHO currently restricts the maximum order of a restored page to the maximum order supported by the buddy allocator. While this works fine for much of the data passed across kexec, it is possible to have pages larger than MAX_PAGE_ORDER. For one, it is possible to get a larger order when using kho_preserve_pages() if the number of pages is large enough, since it tries to combine multiple aligned 0-order preservations into one higher order preservation. For another, upcoming support for hugepages can have gigantic hugepages being preserved over KHO. There is no real reason for this limit. The KHO preservation machinery can handle any page order. Remove this artificial restriction on max page order. Link: https://lkml.kernel.org/r/20260309123410.382308-2-pratyush@kernel.org Signed-off-by: Pratyush Yadav <pratyush@kernel.org> Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05kho: make sure preservations do not span multiple NUMA nodesPratyush Yadav (Google)
The KHO restoration machinery is not capable of dealing with preservations that span multiple NUMA nodes. kho_preserve_folio() guarantees the preservation will only span one NUMA node since folios can't span multiple nodes. This leaves kho_preserve_pages(). While semantically kho_preserve_pages() only deals with 0-order pages, so all preservations should be single page only, in practice it combines preservations to higher orders for efficiency. This can result in a preservation spanning multiple nodes. Break up the preservations into a smaller order if that happens. Link: https://lkml.kernel.org/r/20260309123410.382308-1-pratyush@kernel.org Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Suggested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05kho: fix KASAN support for restored vmalloc regionsPasha Tatashin
Restored vmalloc regions are currently not properly marked for KASAN, causing KASAN to treat accesses to these regions as out-of-bounds. Fix this by properly unpoisoning the restored vmalloc area using kasan_unpoison_vmalloc(). This requires setting the VM_UNINITIALIZED flag during the initial area allocation and clearing it after the pages have been mapped and unpoisoned, using the clear_vm_uninitialized_flag() helper. Link: https://lkml.kernel.org/r/20260225223857.1714801-3-pasha.tatashin@soleen.com Fixes: a667300bd53f ("kho: add support for preserving vmalloc allocations") Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reported-by: Pratyush Yadav <pratyush@kernel.org> Reviewed-by: Pratyush Yadav (Google) <pratyush@kernel.org> Tested-by: Pratyush Yadav (Google) <pratyush@kernel.org> Cc: Alexander Graf <graf@amazon.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>