summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-11-20include/linux/once_lite.h: fix judgment in WARN_ONCE with clangXie Yuanbin
For c code: ```c extern int xx; void test(void) { if (WARN_ONCE(xx, "x")) __asm__ volatile ("nop":::); } ``` Clang will generate the following assembly code: ```assemble test: movl xx(%rip), %eax // Assume xx == 0 (likely case) testl %eax, %eax // judge once je .LBB0_3 // jump to .LBB0_3 testb $1, test.__already_done(%rip) je .LBB0_2 .LBB0_3: testl %eax, %eax // judge again je .LBB0_5 // jump to .LBB0_5 .LBB0_4: nop .LBB0_5: retq // omit ``` In the above code, `xx == 0` should be a likely case, but in this case, xx has been judged twice. Test info: 1. kernel source: linux-next commit 9c0826a5d9aa4d52206d ("Add linux-next specific files for 20251107") 2. compiler: clang: Debian clang version 21.1.4 (8) with Debian LLD 21.1.4 (compatible with GNU linkers) 3. config: base on default x86_64_defconfig, and setting: CONFIG_MITIGATION_RETHUNK=n CONFIG_STACKPROTECTOR=n Add unlikely to __ret_cond to help the compiler optimize correctly. [akpm@linux-foundation.org: undo whitespace changes] Link: https://lkml.kernel.org/r/20251109083715.24495-1-qq570070308@gmail.com Signed-off-by: Xie Yuanbin <qq570070308@gmail.com> Cc: Bill Wendling <morbo@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Justin Stitt <justinstitt@google.com> Cc: Maninder Singh <maninder1.s@samsung.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20math.h: amend abs() kernel-doc and add a note about signed type limitsAndy Shevchenko
- amend the kernel-doc so the description is decoupled from the parameter descriptions. - add a note to explain behaviour for the signed types when supplied value is the minimum (e.g., INT_MIN for int type). Link: https://lkml.kernel.org/r/20251106152051.2361551-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20lib: mul_u64_u64_div_u64(): optimise multiply on 32bit x86David Laight
gcc generates horrid code for both ((u64)u32_a * u32_b) and (u64_a + u32_b). As well as the extra instructions it can generate a lot of spills to stack (including spills of constant zeros and even multiplies by constant zero). mul_u32_u32() already exists to optimise the multiply. Add a similar add_u64_32() for the addition. Disable both for clang - it generates better code without them. Move the 64x64 => 128 multiply into a static inline helper function for code clarity. No need for the a/b_hi/lo variables, the implicit casts on the function calls do the work for us. Should have minimal effect on the generated code. Use mul_u32_u32() and add_u64_u32() in the 64x64 => 128 multiply in mul_u64_add_u64_div_u64(). Link: https://lkml.kernel.org/r/20251105201035.64043-8-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20lib: add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup()David Laight
The existing mul_u64_u64_div_u64() rounds down, a 'rounding up' variant needs 'divisor - 1' adding in between the multiply and divide so cannot easily be done by a caller. Add mul_u64_add_u64_div_u64(a, b, c, d) that calculates (a * b + c)/d and implement the 'round down' and 'round up' using it. Update the x86-64 asm to optimise for 'c' being a constant zero. Add kerndoc definitions for all three functions. Link: https://lkml.kernel.org/r/20251105201035.64043-5-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Nicolas Pitre <npitre@baylibre.com> Cc: Biju Das <biju.das.jz@bp.renesas.com> Cc: Borislav Betkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Li RongQing <lirongqing@baidu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20util_macros.h: fix kernel-doc for u64_to_user_ptr()Andy Shevchenko
The added documentation to u64_to_user_ptr() misspelled the function name. Fix it. Link: https://lkml.kernel.org/r/20251104183834.1046584-1-andriy.shevchenko@linux.intel.com Fixes: 029c896c4105 ("kernel.h: move PTR_IF() and u64_to_user_ptr() to util_macros.h") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexandru Ardelean <aardelean@baylibre.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20panic: sys_info: align constant definition names with parametersAndy Shevchenko
Align constant definition names with parameters to make it easier to map. It's also better to maintain and extend the names while keeping their uniqueness. Link: https://lkml.kernel.org/r/20251030132007.3742368-3-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20mm: propagate VM_SOFTDIRTY on mergeLorenzo Stoakes
Patch series "make VM_SOFTDIRTY a sticky VMA flag", v2. Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by establishing a new VMA, or via merge) as implemented in __mmap_complete() and do_brk_flags(). However, when performing a merge of existing mappings such as when performing mprotect(), we may lose the VM_SOFTDIRTY flag. Now we have the concept of making VMA flags 'sticky', that is that they both don't prevent merge and, importantly, are propagated to merged VMAs, this seems a sensible alternative to the existing special-casing of VM_SOFTDIRTY. We additionally add a self-test that demonstrates that this logic behaves as expected. This patch (of 2): Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by establishing a new VMA, or via merge) as implemented in __mmap_complete() and do_brk_flags(). However, when performing a merge of existing mappings such as when performing mprotect(), we may lose the VM_SOFTDIRTY flag. This is because currently we simply ignore VM_SOFTDIRTY for the purposes of merge, so one VMA may possess the flag and another not, and whichever happens to be the target VMA will be the one upon which the merge is performed which may or may not have VM_SOFTDIRTY set. Now we have the concept of 'sticky' VMA flags, let's make VM_SOFTDIRTY one which solves this issue. Additionally update VMA userland tests to propagate changes. [akpm@linux-foundation.org: update comments, per Lorenzo] Link: https://lkml.kernel.org/r/0019e0b8-ee1e-4359-b5ee-94225cbe5588@lucifer.local Link: https://lkml.kernel.org/r/cover.1763399675.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/955478b5170715c895d1ef3b7f68e0cd77f76868.1763399675.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Andrey Vagin <avagin@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20mm/damon: rename damos->filters to damos->core_filtersSeongJae Park
DAMOS filters that are handled by the ops layer are linked to damos->ops_filters. Owing to the ops_ prefix on the name, it is easy to understand it is for ops layer handled filters. The other types of filters, which are handled by the core layer, are linked to damos->filters. Because of the name, it is easy to confuse the list is there for not only core layer handled ones but all filters. Avoid such confusions by renaming the field to core_filters. Link: https://lkml.kernel.org/r/20251112154114.66053-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Bill Wendling <morbo@google.com> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Justin Stitt <justinstitt@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20mm/damon: rename damos core filter helpers to have word coreSeongJae Park
Patch series "mm/damon: misc cleanups". Yet another batch of misc cleanups and refactoring for DAMON code, tests, and documents. First two patches (1and 2) rename DAMOS core filters related code for readability. Three following patches (3-5) refactor page table walk callback functions in DAMON, as suggested by Hugh and David, and I promised. Next two patches (6 and 7) refactor DAMON core layer kunit test and sysfs interface selftest to be simple and deduplicated. Final two patches (8 and 9) fix up sphinx and grammatical errors on documents. This patch (of 9): DAMOS filters handled by the core layer are called core filters, while those handled by the ops layer are called ops filters. They share the same type but are managed in different places since core filters are evaluated before the ops filters. They also have different helper functions that depend on their managed places. The helper functions for ops filters have '_ops_' keyword on their name, so it is easy to know they are for ops filters. Meanwhile, the helper functions for core filters are not having the 'core' keyword on their name. This makes it easy to be mistakenly used for ops filters. Actually there was such a bug. To avoid future mistakes from similar confusions, rename DAMOS core filters helper functions to have a keyword 'core' on their names. Link: https://lkml.kernel.org/r/20251112154114.66053-1-sj@kernel.org Link: https://lkml.kernel.org/r/20251112154114.66053-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Bill Wendling <morbo@google.com> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Justin Stitt <justinstitt@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@kernel.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20mm: add vma_start_write_killable()Matthew Wilcox (Oracle)
Patch series "vma_start_write_killable"", v2. When we added the VMA lock, we made a major oversight in not adding a killable variant. That can run us into trouble where a thread takes the VMA lock for read (eg handling a page fault) and then goes out to lunch for an hour (eg doing reclaim). Another thread tries to modify the VMA, taking the mmap_lock for write, then attempts to lock the VMA for write. That blocks on the first thread, and ensures that every other page fault now tries to take the mmap_lock for read. Because everything's in an uninterruptible sleep, we can't kill the task, which makes me angry. This patchset just adds vma_start_write_killable() and converts one caller to use it. Most users are somewhat tricky to convert, so expect follow-up individual patches per call-site which need careful analysis to make sure we've done proper cleanup. This patch (of 2): The vma can be held read-locked for a substantial period of time, eg if memory allocation needs to go into reclaim. It's useful to be able to send fatal signals to threads which are waiting for the write lock. Link: https://lkml.kernel.org/r/20251110203204.1454057-1-willy@infradead.org Link: https://lkml.kernel.org/r/20251110203204.1454057-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Chris Li <chriscli@google.com> Cc: Jann Horn <jannh@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20mm: memcg: dump memcg protection info on oom or alloc failuresShakeel Butt
Currently kernel dumps memory state on oom and allocation failures. One of the question usually raised on those dumps is why the kernel has not reclaimed the reclaimable memory instead of triggering oom. One potential reason is the usage of memory protection provided by memcg. So, let's also dump the memory protected by the memcg in such reports to ease the debugging. Link: https://lkml.kernel.org/r/20251107234041.3632644-1-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20mm: introduce copy-on-fork VMAs and make VM_MAYBE_GUARD oneLorenzo Stoakes
Gather all the VMA flags whose presence implies that page tables must be copied on fork into a single bitmap - VM_COPY_ON_FORK - and use this rather than specifying individual flags in vma_needs_copy(). We also add VM_MAYBE_GUARD to this list, as it being set on a VMA implies that there may be metadata contained in the page tables (that is - guard markers) which would will not and cannot be propagated upon fork. This was already being done manually previously in vma_needs_copy(), but this makes it very explicit, alongside VM_PFNMAP, VM_MIXEDMAP and VM_UFFD_WP all of which imply the same. Note that VM_STICKY flags ought generally to be marked VM_COPY_ON_FORK too - because equally a flag being VM_STICKY indicates that the VMA contains metadat that is not propagated by being faulted in - i.e. that the VMA metadata does not fully describe the VMA alone, and thus we must propagate whatever metadata there is on a fork. However, for maximum flexibility, we do not make this necessarily the case here. Link: https://lkml.kernel.org/r/5d41b24e7bc622cda0af92b6d558d7f4c0d1bc8c.1763460113.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20mm: implement sticky VMA flagsLorenzo Stoakes
It is useful to be able to designate that certain flags are 'sticky', that is, if two VMAs are merged one with a flag of this nature and one without, the merged VMA sets this flag. As a result we ignore these flags for the purposes of determining VMA flag differences between VMAs being considered for merge. This patch therefore updates the VMA merge logic to perform this action, with flags possessing this property being described in the VM_STICKY bitmap. Those flags which ought to be ignored for the purposes of VMA merge are described in the VM_IGNORE_MERGE bitmap, which the VMA merge logic is also updated to use. As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE as it already had this behaviour, alongside VM_STICKY as sticky flags by implication must not disallow merge. Ultimately it seems that we should make VM_SOFTDIRTY a sticky flag in its own right, but this change is out of scope for this series. The only sticky flag designated as such is VM_MAYBE_GUARD, so as a result of this change, once the VMA flag is set upon guard region installation, VMAs with guard ranges will now not have their merge behaviour impacted as a result and can be freely merged with other VMAs without VM_MAYBE_GUARD set. Also update the comments for vma_modify_flags() to directly reference sticky flags now we have established the concept. We also update the VMA userland tests to account for the changes. Link: https://lkml.kernel.org/r/22ad5269f7669d62afb42ce0c79bad70b994c58d.1763460113.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrei Vagin <avagin@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20mm: add atomic VMA flags and set VM_MAYBE_GUARD as suchLorenzo Stoakes
This patch adds the ability to atomically set VMA flags with only the mmap read/VMA read lock held. As this could be hugely problematic for VMA flags in general given that all other accesses are non-atomic and serialised by the mmap/VMA locks, we implement this with a strict allow-list - that is, only designated flags are allowed to do this. We make VM_MAYBE_GUARD one of these flags. Link: https://lkml.kernel.org/r/97e57abed09f2663077ed7a36fb8206e243171a9.1763460113.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lance Yang <lance.yang@linux.dev> Cc: Andrei Vagin <avagin@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20mm: introduce VM_MAYBE_GUARD and make visible in /proc/$pid/smapsLorenzo Stoakes
Patch series "introduce VM_MAYBE_GUARD and make it sticky", v4. Currently, guard regions are not visible to users except through /proc/$pid/pagemap, with no explicit visibility at the VMA level. This makes the feature less useful, as it isn't entirely apparent which VMAs may have these entries present, especially when performing actions which walk through memory regions such as those performed by CRIU. This series addresses this issue by introducing the VM_MAYBE_GUARD flag which fulfils this role, updating the smaps logic to display an entry for these. The semantics of this flag are that a guard region MAY be present if set (we cannot be sure, as we can't efficiently track whether an MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if not set the VMA definitely does NOT have any guard regions present. It's problematic to establish this flag without further action, because that means that VMAs with guard regions in them become non-mergeable with adjacent VMAs for no especially good reason. To work around this, this series also introduces the concept of 'sticky' VMA flags - that is flags which: a. if set in one VMA and not in another still permit those VMAs to be merged (if otherwise compatible). b. When they are merged, the resultant VMA must have the flag set. The VMA logic is updated to propagate these flags correctly. Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve an issue with file-backed guard regions - previously these established an anon_vma object for file-backed mappings solely to have vma_needs_copy() correctly propagate guard region mappings to child processes. We introduce a new flag alias VM_COPY_ON_FORK (which currently only specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly for this flag and to copy page tables if it is present, which resolves this issue. Additionally, we add the ability for allow-listed VMA flags to be atomically writable with only mmap/VMA read locks held. The only flag we allow so far is VM_MAYBE_GUARD, which we carefully ensure does not cause any races by being allowed to do so. This allows us to maintain guard region installation as a read-locked operation and not endure the overhead of obtaining a write lock here. Finally we introduce extensive VMA userland tests to assert that the sticky VMA logic behaves correctly as well as guard region self tests to assert that smaps visibility is correctly implemented. This patch (of 9): Currently, if a user needs to determine if guard regions are present in a range, they have to scan all VMAs (or have knowledge of which ones might have guard regions). Since commit 8e2f2aeb8b48 ("fs/proc/task_mmu: add guard region bit to pagemap") and the related commit a516403787e0 ("fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions"), users can use either /proc/$pid/pagemap or the PAGEMAP_SCAN functionality to perform this operation at a virtual address level. This is not ideal, and it gives no visibility at a /proc/$pid/smaps level that guard regions exist in ranges. This patch remedies the situation by establishing a new VMA flag, VM_MAYBE_GUARD, to indicate that a VMA may contain guard regions (it is uncertain because we cannot reasonably determine whether a MADV_GUARD_REMOVE call has removed all of the guard regions in a VMA, and additionally VMAs may change across merge/split). We utilise 0x800 for this flag which makes it available to 32-bit architectures also, a flag that was previously used by VM_DENYWRITE, which was removed in commit 8d0920bde5eb ("mm: remove VM_DENYWRITE") and hasn't bee reused yet. We also update the smaps logic and documentation to identify these VMAs. Another major use of this functionality is that we can use it to identify that we ought to copy page tables on fork. We do not actually implement usage of this flag in mm/madvise.c yet as we need to allow some VMA flags to be applied atomically under mmap/VMA read lock in order to avoid the need to acquire a write lock for this purpose. Link: https://lkml.kernel.org/r/cover.1763460113.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/cf8ef821eba29b6c5b5e138fffe95d6dcabdedb9.1763460113.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lance Yang <lance.yang@linux.dev> Cc: Andrei Vagin <avagin@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20mm: kill mm_wr_locked from unmap_vmas() and unmap_single_vma()Kefeng Wang
Kill mm_wr_locked since commit f8e97613fed2 ("mm: convert VM_PFNMAP tracking to pfnmap_track() + pfnmap_untrack()") remove the user. Link: https://lkml.kernel.org/r/20251104085709.2688433-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20lockd: don't allow locking on reexported NFSv2/3Jeff Layton
Since commit 9254c8ae9b81 ("nfsd: disallow file locking and delegations for NFSv4 reexport"), file locking when reexporting an NFS mount via NFSv4 is expressly prohibited by nfsd. Do the same in lockd: Add a new nlmsvc_file_cannot_lock() helper that will test whether file locking is allowed for a given file, and return nlm_lck_denied_nolocks if it isn't. Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-20Merge back material related to system sleep for 6.19Rafael J. Wysocki
2025-11-20sched: Provide and use set_need_resched_current()Peter Zijlstra
set_tsk_need_resched(current) requires set_preempt_need_resched(current) to work correctly outside of the scheduler. Provide set_need_resched_current() which wraps this correctly and replace all the open coded instances. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251116174750.665769842@linutronix.de
2025-11-20io_uring/kbuf: remove obsolete buf_nr_pages and update commentsJoanne Koong
The buf_nr_pages field in io_buffer_list was previously used to determine whether the buffer list uses ring-provided buffers or classic provided buffers. This is now determined by checking the IOBL_BUF_RING flag. Remove the buf_nr_pages field and update related comments. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-20timers/migration: Exclude isolated cpus from hierarchyGabriele Monaco
The timer migration mechanism allows active CPUs to pull timers from idle ones to improve the overall idle time. This is however undesired when CPU intensive workloads run on isolated cores, as the algorithm would move the timers from housekeeping to isolated cores, negatively affecting the isolation. Exclude isolated cores from the timer migration algorithm, extend the concept of unavailable cores, currently used for offline ones, to isolated ones: * A core is unavailable if isolated or offline; * A core is available if non isolated and online; A core is considered unavailable as isolated if it belongs to: * the isolcpus (domain) list * an isolated cpuset Except if it is: * in the nohz_full list (already idle for the hierarchy) * the nohz timekeeper core (must be available to handle global timers) CPUs are added to the hierarchy during late boot, excluding isolated ones, the hierarchy is also adapted when the cpuset isolation changes. Due to how the timer migration algorithm works, any CPU part of the hierarchy can have their global timers pulled by remote CPUs and have to pull remote timers, only skipping pulling remote timers would break the logic. For this reason, prevent isolated CPUs from pulling remote global timers, but also the other way around: any global timer started on an isolated CPU will run there. This does not break the concept of isolation (global timers don't come from outside the CPU) and, if considered inappropriate, can usually be mitigated with other isolation techniques (e.g. IRQ pinning). This effect was noticed on a 128 cores machine running oslat on the isolated cores (1-31,33-63,65-95,97-127). The tool monopolises CPUs, and the CPU with lowest count in a timer migration hierarchy (here 1 and 65) appears as always active and continuously pulls global timers, from the housekeeping CPUs. This ends up moving driver work (e.g. delayed work) to isolated CPUs and causes latency spikes: before the change: # oslat -c 1-31,33-63,65-95,97-127 -D 62s ... Maximum: 1203 10 3 4 ... 5 (us) after the change: # oslat -c 1-31,33-63,65-95,97-127 -D 62s ... Maximum: 10 4 3 4 3 ... 5 (us) The same behaviour was observed on a machine with as few as 20 cores / 40 threads with isocpus set to: 1-9,11-39 with rtla-osnoise-top. Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: John B. Wyatt IV <jwyatt@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://patch.msgid.link/20251120145653.296659-8-gmonaco@redhat.com
2025-11-20cpumask: Add initialiser to use cleanup helpersYury Norov
Now we can simplify a code that allocates cpumasks for local needs. Automatic variables have to be initialized at declaration, or at least before any possibility for the logic to return, so that compiler wouldn't try to call an associate destructor function on a random stack number. Because cpumask_var_t, depending on the CPUMASK_OFFSTACK config, is either a pointer or an array, we have to have a macro for initialization. So define a CPUMASK_VAR_NULL macro, which allows to init struct cpumask pointer with NULL when CPUMASK_OFFSTACK is enabled, and effectively a no-op when CPUMASK_OFFSTACK is disabled (initialisation optimised out with -O2). Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://patch.msgid.link/20251120145653.296659-7-gmonaco@redhat.com
2025-11-20vfio: Export vfio device get and put registration helpersVivek Kasireddy
These helpers are useful for managing additional references taken on the device from other associated VFIO modules. Original-patch-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-7-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-20dma-buf: provide phys_vec to scatter-gather mapping routineLeon Romanovsky
Add dma_buf_phys_vec_to_sgt() and dma_buf_free_sgt() helpers to convert an array of MMIO physical address ranges into scatter-gather tables with proper DMA mapping. These common functions are a starting point and support any PCI drivers creating mappings from their BAR's MMIO addresses. VFIO is one case, as shortly will be RDMA. We can review existing DRM drivers to refactor them separately. We hope this will evolve into routines to help common DRM that include mixed CPU and MMIO mappings. Compared to the dma_map_resource() abuse this implementation handles the complicated PCI P2P scenarios properly, especially when an IOMMU is enabled: - Direct bus address mapping without IOVA allocation for PCI_P2PDMA_MAP_BUS_ADDR, using pci_p2pdma_bus_addr_map(). This happens if the IOMMU is enabled but the PCIe switch ACS flags allow transactions to avoid the host bridge. Further, this handles the slightly obscure, case of MMIO with a phys_addr_t that is different from the physical BAR programming (bus offset). The phys_addr_t is converted to a dma_addr_t and accommodates this effect. This enables certain real systems to work, especially on ARM platforms. - Mapping through host bridge with IOVA allocation and DMA_ATTR_MMIO attribute for MMIO memory regions (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE). This happens when the IOMMU is enabled and the ACS flags are forcing all traffic to the IOMMU - ie for virtualization systems. - Cases where P2P is not supported through the host bridge/CPU. The P2P subsystem is the proper place to detect this and block it. Helper functions fill_sg_entry() and calc_sg_nents() handle the scatter-gather table construction, splitting large regions into UINT_MAX-sized chunks to fit within sg->length field limits. Since the physical address based DMA API forbids use of the CPU list of the scatterlist this will produce a mangled scatterlist that has a fully zero-length and NULL'd CPU list. The list is 0 length, all the struct page pointers are NULL and zero sized. This is stronger and more robust than the existing mangle_sg_table() technique. It is a future project to migrate DMABUF as a subsystem away from using scatterlist for this data structure. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-6-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-20PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() functionLeon Romanovsky
Provide an access to pci_p2pdma_map_type() function to allow subsystems to determine the appropriate mapping type for P2PDMA transfers between a provider and target device. The pci_p2pdma_map_type() function is the core P2P layer version of the existing public, but struct page focused, pci_p2pdma_state() function. It returns the same result. It is required to use the p2p subsystem from drivers that don't use the struct page layer. Like __pci_p2pdma_update_state() it is not an exported function. The idea is that only subsystem code will implement mapping helpers for taking in phys_addr_t lists, this is deliberately not made accessible to every driver to prevent abuse. Following patches will use this function to implement a shared DMA mapping helper for DMABUF. Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-4-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-20PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocationLeon Romanovsky
Refactor the PCI P2PDMA subsystem to separate the core peer-to-peer DMA functionality from the optional memory allocation layer. This creates a two-tier architecture: The core layer provides P2P mapping functionality for physical addresses based on PCI device MMIO BARs and integrates with the DMA API for mapping operations. This layer is required for all P2PDMA users. The optional upper layer provides memory allocation capabilities including gen_pool allocator, struct page support, and sysfs interface for user space access. This separation allows subsystems like DMABUF to use only the core P2P mapping functionality without the overhead of memory allocation features they don't need. The core functionality is now available through the new pcim_p2pdma_provider() function that returns a p2pdma_provider structure. Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-3-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-20PCI/P2PDMA: Simplify bus address mapping APILeon Romanovsky
Update the pci_p2pdma_bus_addr_map() function to take a direct pointer to the p2pdma_provider structure instead of the pci_p2pdma_map_state. This simplifies the API by removing the need for callers to extract the provider from the state structure. The change updates all callers across the kernel (block layer, IOMMU, DMA direct, and HMM) to pass the provider pointer directly, making the code more explicit and reducing unnecessary indirection. This also removes the runtime warning check since callers now have direct control over which provider they use. Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-2-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-20PCI/P2PDMA: Separate the mmap() support from the core logicLeon Romanovsky
Currently the P2PDMA code requires a pgmap and a struct page to function. The was serving three important purposes: - DMA API compatibility, where scatterlist required a struct page as input - Life cycle management, the percpu_ref is used to prevent UAF during device hot unplug - A way to get the P2P provider data through the pci_p2pdma_pagemap The DMA API now has a new flow, and has gained phys_addr_t support, so it no longer needs struct pages to perform P2P mapping. Lifecycle management can be delegated to the user, DMABUF for instance has a suitable invalidation protocol that does not require struct page. Finding the P2P provider data can also be managed by the caller without need to look it up from the phys_addr. Split the P2PDMA code into two layers. The optional upper layer, effectively, provides a way to mmap() P2P memory into a VMA by providing struct page, pgmap, a genalloc and sysfs. The lower layer provides the actual P2P infrastructure and is wrapped up in a new struct p2pdma_provider. Rework the mmap layer to use new p2pdma_provider based APIs. Drivers that do not want to put P2P memory into VMA's can allocate a struct p2pdma_provider after probe() starts and free it before remove() completes. When DMA mapping the driver must convey the struct p2pdma_provider to the DMA mapping code along with a phys_addr of the MMIO BAR slice to map. The driver must ensure that no DMA mapping outlives the lifetime of the struct p2pdma_provider. The intended target of this new API layer is DMABUF. There is usually only a single p2pdma_provider for a DMABUF exporter. Most drivers can establish the p2pdma_provider during probe, access the single instance during DMABUF attach and use that to drive the DMA mapping. DMABUF provides an invalidation mechanism that can guarantee all DMA is halted and the DMA mappings are undone prior to destroying the struct p2pdma_provider. This ensures there is no UAF through DMABUFs that are lingering past driver removal. The new p2pdma_provider layer cannot be used to create P2P memory that can be mapped into VMA's, be used with pin_user_pages(), O_DIRECT, and so on. These use cases must still use the mmap() layer. The p2pdma_provider layer is principally for DMABUF-like use cases where DMABUF natively manages the life cycle and access instead of vmas/pin_user_pages()/struct page. In addition, remove the bus_off field from pci_p2pdma_map_state since it duplicates information already available in the pgmap structure. The bus_offset is only used in one location (pci_p2pdma_bus_addr_map) and is always identical to pgmap->bus_offset. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-1-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-20Merge tag 'reset-gpio-for-v6.19' of https://git.pengutronix.de/git/pza/linux ↵Bartosz Golaszewski
into gpio/for-next Reset/GPIO/swnode changes for v6.19 * Extend software node implementation, allowing its properties to reference existing firmware nodes. * Update the GPIO property interface to use reworked swnode macros. * Rework reset-gpio code to use GPIO lookup via swnode. * Fix spi-cs42l43 driver to work with swnode changes.
2025-11-20ASoC: SDCA: Add basic SDCA class driverCharles Keepax
Add a device level driver as the entry point for the class driver. Additional auxiliary drivers will be registered to support each function within the device. This driver will register those function drivers and provide the device level functionality, such as monitoring bus attach/detach, the device level register map, and the root for the IRQ handling. Co-developed-by: Maciej Strozek <mstrozek@opensource.cirrus.com> Tested-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Maciej Strozek <mstrozek@opensource.cirrus.com> Reviewed-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Tested-by: Richard Fitzgerald <rf@opensource.cirrus.com> Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://patch.msgid.link/20251120153023.2105663-13-ckeepax@opensource.cirrus.com Reviewed-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.18-rc7). No conflicts, adjacent changes: tools/testing/selftests/net/af_unix/Makefile e1bb28bf13f4 ("selftest: af_unix: Add test for SO_PEEK_OFF.") 45a1cd8346ca ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()Pingfan Liu
cpuset_cpus_allowed() uses a reader lock that is sleepable under RT, which means it cannot be called inside raw_spin_lock_t context. Introduce a new cpuset_cpus_allowed_locked() helper that performs the same function as cpuset_cpus_allowed() except that the caller must have acquired the cpuset_mutex so that no further locking will be needed. Suggested-by: Waiman Long <longman@redhat.com> Signed-off-by: Pingfan Liu <piliu@redhat.com> Cc: Waiman Long <longman@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: linux-kernel@vger.kernel.org To: cgroups@vger.kernel.org Reviewed-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-20software node: allow referencing firmware nodesBartosz Golaszewski
At the moment software nodes can only reference other software nodes. This is a limitation for devices created, for instance, on the auxiliary bus with a dynamic software node attached which cannot reference devices the firmware node of which is "real" (as an OF node or otherwise). Make it possible for a software node to reference all firmware nodes in addition to static software nodes. To that end: add a second pointer to struct software_node_ref_args of type struct fwnode_handle. The core swnode code will first check the swnode pointer and if it's NULL, it will assume the fwnode pointer should be set. Software node graphs remain the same, as in: the remote endpoints still have to be software nodes. Acked-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Tested-by: Charles Keepax <ckeepax@opensource.cirrus.com> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-20Merge patch series "Add OP-TEE based RPMB driver for UFS devices"Martin K. Petersen
Bean Huo <beanhuo@iokpp.de> says: This patch series introduces OP-TEE based RPMB (Replay Protected Memory Block) support for UFS devices, extending the kernel-level secure storage capabilities that are currently available for eMMC devices. Previously, OP-TEE required a userspace supplicant to access RPMB partitions, which created complex dependencies and reliability issues, especially during early boot scenarios. Recent work by Linaro has moved core supplicant functionality directly into the Linux kernel for eMMC devices, eliminating userspace dependencies and enabling immediate secure storage access. This series extends the same approach to UFS devices, which are used in enterprise and mobile applications that require secure storage capabilities. Benefits: - Eliminates dependency on userspace supplicant for UFS RPMB access - Enables early boot secure storage access (e.g., fTPM, secure UEFI variables) - Provides kernel-level RPMB access as soon as UFS driver is initialized - Removes complex initramfs dependencies and boot ordering requirements - Ensures reliable and deterministic secure storage operations - Supports both built-in and modular fTPM configurations. Prerequisites: -------------- This patch series depends on commit 7e8242405b94 ("rpmb: move struct rpmb_frame to common header") which has been merged into mainline v6.18-rc2. Link: https://patch.msgid.link/20251107230518.4060231-1-beanhuo@iokpp.de Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-11-20nodemask: use min() instead of min_t()David Laight
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'. Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long' and so cannot discard significant bits. In this case the 'unsigned long' value is small enough that the result is ok. Detected by an extra check added to min_t(). Signed-off-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
2025-11-20of/fdt: Consolidate duplicate code into helper functionsYuntao Wang
Currently, there are many pieces of nearly identical code scattered across different places. Consolidate the duplicate code into helper functions to improve maintainability and reduce the likelihood of errors. Signed-off-by: Yuntao Wang <yuntao.wang@linux.dev> Link: https://patch.msgid.link/20251115134753.179931-2-yuntao.wang@linux.dev Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-11-20string: fix kerneldoc formatting in strends()Bartosz Golaszewski
strends() kernel doc should have used `@str:` format for arguments instead of `@str -`. Fixes: 197b3f3c70d6 ("string: provide strends()") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/all/20251118134748.40f03b9c@canb.auug.org.au/ Link: https://lore.kernel.org/r/20251118-strends-follow-up-v1-1-d3f8ef750f59@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-11-20firmware: cs_dsp: Factor out common debugfs string readRichard Fitzgerald
cs_dsp_debugfs_wmfw_read() and cs_dsp_debugfs_bin_read() were identical except for which struct member they printed. Move all this duplicated code into a common function cs_dsp_debugfs_string_read(). The check for dsp->booted has been removed because this is redundant. The two strings are set when the DSP is booted and cleared when the DSP is powered-down. Access to the string char * must be protected by the pwr_lock mutex. The string is passed into cs_dsp_debugfs_string_read() as a pointer to the char * so that the mutex lock can also be factored out into cs_dsp_debugfs_string_read(). wmfw_file_name and bin_file_name members of struct cs_dsp have been changed to const char *. It makes for a better API to pass a const pointer into cs_dsp_debugfs_string_read(). Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20251120130640.1169780-2-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-20gpio: improve support for shared GPIOsMark Brown
Merge series from Bartosz Golaszewski <brgl@bgdev.pl>: Problem statement: GPIOs are implemented as a strictly exclusive resource in the kernel but there are lots of platforms on which single pin is shared by multiple devices which don't communicate so need some way of properly sharing access to a GPIO. What we have now is the GPIOD_FLAGS_BIT_NONEXCLUSIVE flag which was introduced as a hack and doesn't do any locking or arbitration of access - it literally just hand the same GPIO descriptor to all interested users. The proposed solution is composed of three major parts: the high-level, shared GPIO proxy driver that arbitrates access to the shared pin and exposes a regular GPIO chip interface to consumers, a low-level shared GPIOLIB module that scans firmware nodes and creates auxiliary devices that attach to the proxy driver and finally a set of core GPIOLIB changes that plug the former into the GPIO lookup path. The changes are implemented in a way that allows to seamlessly compile out any code related to sharing GPIOs for systems that don't need it. The practical use-case for this are the powerdown GPIOs shared by speakers on Qualcomm db845c platform, however I have also extensively tested it using gpio-virtuser on arm64 qemu with various DT configurations.
2025-11-20ata: libata-scsi: Fix system suspend for a security locked driveNiklas Cassel
Commit cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status handling") fixed ata_to_sense_error() to properly generate sense key ABORTED COMMAND (without any additional sense code), instead of the previous bogus sense key ILLEGAL REQUEST with the additional sense code UNALIGNED WRITE COMMAND, for a failed command. However, this broke suspend for Security locked drives (drives that have Security enabled, and have not been Security unlocked by boot firmware). The reason for this is that the SCSI disk driver, for the Synchronize Cache command only, treats any sense data with sense key ILLEGAL REQUEST as a successful command (regardless of ASC / ASCQ). After commit cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status handling") the code that treats any sense data with sense key ILLEGAL REQUEST as a successful command is no longer applicable, so the command fails, which causes the system suspend to be aborted: sd 1:0:0:0: PM: dpm_run_callback(): scsi_bus_suspend returns -5 sd 1:0:0:0: PM: failed to suspend async: error -5 PM: Some devices failed to suspend, or early wake event detected To make suspend work once again, for a Security locked device only, return sense data LOGICAL UNIT ACCESS NOT AUTHORIZED, the actual sense data which a real SCSI device would have returned if locked. The SCSI disk driver treats this sense data as a successful command. Cc: stable@vger.kernel.org Reported-by: Ilia Baryshnikov <qwelias@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220704 Fixes: cf3fc037623c ("ata: libata-scsi: Fix ata_to_sense_error() status handling") Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
2025-11-20cpumask: Introduce cpumask_weighted_or()Thomas Gleixner
CID management OR's two cpumasks and then calculates the weight on the result. That's inefficient as that has to walk the same stuff twice. As this is done with runqueue lock held, there is a real benefit of speeding this up. Depending on the system this results in 10-20% less cycles spent with runqueue lock held for a 4K cpumask. Provide cpumask_weighted_or() and the corresponding bitmap functions which return the weight of the OR result right away. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.448263340@linutronix.de
2025-11-20sched/mmcid: Move scheduler code out of global headerThomas Gleixner
This is only used in the scheduler core code, so there is no point to have it in a global header. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Link: https://patch.msgid.link/20251119172549.321259077@linutronix.de
2025-11-20sched/mmcid: Cacheline align MM CID storageThomas Gleixner
Both the per CPU storage and the data in mm_struct are heavily used in context switch. As they can end up next to other frequently modified data, they are subject to false sharing. Make them cache line aligned. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.194111661@linutronix.de
2025-11-20sched/mmcid: Use proper data structuresThomas Gleixner
Having a lot of CID functionality specific members in struct task_struct and struct mm_struct is not really making the code easier to read. Encapsulate the CID specific parts in data structures and keep them separate from the stuff they are embedded in. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.131573768@linutronix.de
2025-11-20sched/mmcid: Revert the complex CID managementThomas Gleixner
The CID management is a complex beast, which affects both scheduling and task migration. The compaction mechanism forces random tasks of a process into task work on exit to user space causing latency spikes. Revert back to the initial simple bitmap allocating mechanics, which are known to have scalability issues as that allows to gradually build up a replacement functionality in a reviewable way. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.068197830@linutronix.de
2025-11-20wifi: mac80211: add generic MMIE struct definesChien Wong
The added struct is needed when writing generic handler for both CMAC-128 and CMAC-256. Signed-off-by: Chien Wong <m@xv97.com> Link: https://patch.msgid.link/20251113140511.48658-3-m@xv97.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-20mfd: sec: Use chained IRQs for s2mpg10André Draszik
On S2MPG10 (and similar like S2MPG11), top-level interrupt status and mask registers exist which need to be unmasked to get the PMIC interrupts. This additional status doesn't seem to exist on other PMICs in the S2MP* family, and the S2MPG10 driver is manually dealing with masking and unmasking currently. The correct approach here is to register this hierarchy as chained interrupts, though, without any additional manual steps. Doing so will also simplify addition of other, similar, PMICs (like S2MPG11) in the future. Update the driver to do just that. Signed-off-by: André Draszik <andre.draszik@linaro.org> Link: https://patch.msgid.link/20251114-s2mpg10-chained-irq-v1-1-34ddfa49c4cd@linaro.org Signed-off-by: Lee Jones <lee@kernel.org>
2025-11-20wifi: cfg80211: Add support for 6GHz AP role not relevant AP typePagadala Yesu Anjaneyulu
Add IEEE80211_6GHZ_CTRL_REG_AP_ROLE_NOT_RELEVANT and map it to IEEE80211_REG_LPI_AP for safe regulatory compliance when AP role classification is not applicable. Use LPI as safe fallback to prevent power limit violations. Signed-off-by: Pagadala Yesu Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251112110828.856283677cc7.I36138a34847c3b4e680974bf347dde844448f3bc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-20gpio: improve support for shared GPIOsMark Brown
Merge series from Bartosz Golaszewski <brgl@bgdev.pl>: Problem statement: GPIOs are implemented as a strictly exclusive resource in the kernel but there are lots of platforms on which single pin is shared by multiple devices which don't communicate so need some way of properly sharing access to a GPIO. What we have now is the GPIOD_FLAGS_BIT_NONEXCLUSIVE flag which was introduced as a hack and doesn't do any locking or arbitration of access - it literally just hand the same GPIO descriptor to all interested users. The proposed solution is composed of three major parts: the high-level, shared GPIO proxy driver that arbitrates access to the shared pin and exposes a regular GPIO chip interface to consumers, a low-level shared GPIOLIB module that scans firmware nodes and creates auxiliary devices that attach to the proxy driver and finally a set of core GPIOLIB changes that plug the former into the GPIO lookup path. The changes are implemented in a way that allows to seamlessly compile out any code related to sharing GPIOs for systems that don't need it. The practical use-case for this are the powerdown GPIOs shared by speakers on Qualcomm db845c platform, however I have also extensively tested it using gpio-virtuser on arm64 qemu with various DT configurations.
2025-11-20Merge branch 'for-linus' into for-nextTakashi Iwai
Pull 6.18-devel branch for applying the further HD-audio fixups for HP. Signed-off-by: Takashi Iwai <tiwai@suse.de>