From 2b6a3f061f11372af79b862d6184d43193ae927f Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes Date: Tue, 25 Nov 2025 10:00:59 +0000 Subject: mm: declare VMA flags by bit MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Patch series "initial work on making VMA flags a bitmap", v3. We are in the rather silly situation that we are running out of VMA flags as they are currently limited to a system word in size. This leads to absurd situations where we limit features to 64-bit architectures only because we simply do not have the ability to add a flag for 32-bit ones. This is very constraining and leads to hacks or, in the worst case, simply an inability to implement features we want for entirely arbitrary reasons. This also of course gives us something of a Y2K type situation in mm where we might eventually exhaust all of the VMA flags even on 64-bit systems. This series lays the groundwork for getting away from this limitation by establishing VMA flags as a bitmap whose size we can increase in future beyond 64 bits if required. This is necessarily a highly iterative process given the extensive use of VMA flags throughout the kernel, so we start by performing basic steps. Firstly, we declare VMA flags by bit number rather than by value, retaining the VM_xxx fields but in terms of these newly introduced VMA_xxx_BIT fields. While we are here, we use sparse annotations to ensure that, when dealing with VMA bit number parameters, we cannot be passed values which are not declared as such - providing some useful type safety. We then introduce an opaque VMA flag type, much like the opaque mm_struct flag type introduced in commit bb6525f2f8c4 ("mm: add bitmap mm->flags field"), which we establish in union with vma->vm_flags (but still set at system word size meaning there is no functional or data type size change). We update the vm_flags_xxx() helpers to use this new bitmap, introducing sensible helpers to do so. This series lays the foundation for further work to expand the use of bitmap VMA flags and eventually eliminate these arbitrary restrictions. This patch (of 4): In order to lay the groundwork for VMA flags being a bitmap rather than a system word in size, we need to be able to consistently refer to VMA flags by bit number rather than value. Take this opportunity to do so in an enum which we which is additionally useful for tooling to extract metadata from. This additionally makes it very clear which bits are being used for what at a glance. We use the VMA_ prefix for the bit values as it is logical to do so since these reference VMAs. We consistently suffix with _BIT to make it clear what the values refer to. We declare bit values even when the flags that use them would not be enabled by config options as this is simply clearer and clearly defines what bit numbers are used for what, at no additional cost. We declare a sparse-bitwise type vma_flag_t which ensures that users can't pass around invalid VMA flags by accident and prepares for future work towards VMA flags being a bitmap where we want to ensure bit values are type safe. To make life easier, we declare some macro helpers - DECLARE_VMA_BIT() allows us to avoid duplication in the enum bit number declarations (and maintaining the sparse __bitwise attribute), and INIT_VM_FLAG() is used to assist with declaration of flags. Unfortunately we can't declare both in the enum, as we run into issue with logic in the kernel requiring that flags are preprocessor definitions, and additionally we cannot have a macro which declares another macro so we must define each flag macro directly. Additionally, update the VMA userland testing vma_internal.h header to include these changes. We also have to fix the parameters to the vma_flag_*_atomic() functions since VMA_MAYBE_GUARD_BIT is now of type vma_flag_t and sparse will complain otherwise. We have to update some rather silly if-deffery found in mm/task_mmu.c which would otherwise break. Finally, we update the rust binding helper as now it cannot auto-detect the flags at all. Link: https://lkml.kernel.org/r/cover.1764064556.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/3a35e5a0bcfa00e84af24cbafc0653e74deda64a.1764064556.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes Acked-by: Vlastimil Babka Reviewed-by: Pedro Falcato Acked-by: Alice Ryhl [rust] Cc: Alex Gaynor Cc: Alistair Popple Cc: Andreas Hindborg Cc: Axel Rasmussen Cc: Baolin Wang Cc: Baoquan He Cc: Barry Song Cc: Ben Segall Cc: Björn Roy Baron Cc: Boqun Feng Cc: Byungchul Park Cc: Chengming Zhou Cc: Chris Li Cc: Danilo Krummrich Cc: David Hildenbrand Cc: David Rientjes Cc: Dev Jain Cc: Dietmar Eggemann Cc: Gary Guo Cc: Gregory Price Cc: "Huang, Ying" Cc: Ingo Molnar Cc: Jann Horn Cc: Jason Gunthorpe Cc: Johannes Weiner Cc: John Hubbard Cc: Joshua Hahn Cc: Juri Lelli Cc: Kairui Song Cc: Kees Cook Cc: Kemeng Shi Cc: Lance Yang Cc: Leon Romanovsky Cc: Liam Howlett Cc: Mathew Brost Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Michal Hocko Cc: Miguel Ojeda Cc: Mike Rapoport Cc: Muchun Song Cc: Nhat Pham Cc: Nico Pache Cc: Oscar Salvador Cc: Peter Xu Cc: Peter Zijlstra Cc: Qi Zheng Cc: Rakie Kim Cc: Rik van Riel Cc: Ryan Roberts Cc: Shakeel Butt Cc: Steven Rostedt Cc: Suren Baghdasaryan Cc: Trevor Gross Cc: Valentin Schneider Cc: Vincent Guittot Cc: Wei Xu Cc: xu xin Cc: Yuanchu Xie Cc: Zi Yan Signed-off-by: Andrew Morton --- rust/bindgen_parameters | 25 +++++++++++++++++++++++++ rust/bindings/bindings_helper.h | 25 +++++++++++++++++++++++++ 2 files changed, 50 insertions(+) (limited to 'rust') diff --git a/rust/bindgen_parameters b/rust/bindgen_parameters index e13c6f9dd17b..fd2fd1c3cb9a 100644 --- a/rust/bindgen_parameters +++ b/rust/bindgen_parameters @@ -35,6 +35,31 @@ # recognized, block generation of the non-helper constants. --blocklist-item ARCH_SLAB_MINALIGN --blocklist-item ARCH_KMALLOC_MINALIGN +--blocklist-item VM_MERGEABLE +--blocklist-item VM_READ +--blocklist-item VM_WRITE +--blocklist-item VM_EXEC +--blocklist-item VM_SHARED +--blocklist-item VM_MAYREAD +--blocklist-item VM_MAYWRITE +--blocklist-item VM_MAYEXEC +--blocklist-item VM_MAYEXEC +--blocklist-item VM_PFNMAP +--blocklist-item VM_IO +--blocklist-item VM_DONTCOPY +--blocklist-item VM_DONTEXPAND +--blocklist-item VM_LOCKONFAULT +--blocklist-item VM_ACCOUNT +--blocklist-item VM_NORESERVE +--blocklist-item VM_HUGETLB +--blocklist-item VM_SYNC +--blocklist-item VM_ARCH_1 +--blocklist-item VM_WIPEONFORK +--blocklist-item VM_DONTDUMP +--blocklist-item VM_SOFTDIRTY +--blocklist-item VM_MIXEDMAP +--blocklist-item VM_HUGEPAGE +--blocklist-item VM_NOHUGEPAGE # Structs should implement `Zeroable` when all of their fields do. --with-derive-custom-struct .*=MaybeZeroable diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h index 2e43c66635a2..4c327db01ca0 100644 --- a/rust/bindings/bindings_helper.h +++ b/rust/bindings/bindings_helper.h @@ -108,7 +108,32 @@ const xa_mark_t RUST_CONST_HELPER_XA_PRESENT = XA_PRESENT; const gfp_t RUST_CONST_HELPER_XA_FLAGS_ALLOC = XA_FLAGS_ALLOC; const gfp_t RUST_CONST_HELPER_XA_FLAGS_ALLOC1 = XA_FLAGS_ALLOC1; + const vm_flags_t RUST_CONST_HELPER_VM_MERGEABLE = VM_MERGEABLE; +const vm_flags_t RUST_CONST_HELPER_VM_READ = VM_READ; +const vm_flags_t RUST_CONST_HELPER_VM_WRITE = VM_WRITE; +const vm_flags_t RUST_CONST_HELPER_VM_EXEC = VM_EXEC; +const vm_flags_t RUST_CONST_HELPER_VM_SHARED = VM_SHARED; +const vm_flags_t RUST_CONST_HELPER_VM_MAYREAD = VM_MAYREAD; +const vm_flags_t RUST_CONST_HELPER_VM_MAYWRITE = VM_MAYWRITE; +const vm_flags_t RUST_CONST_HELPER_VM_MAYEXEC = VM_MAYEXEC; +const vm_flags_t RUST_CONST_HELPER_VM_MAYSHARE = VM_MAYEXEC; +const vm_flags_t RUST_CONST_HELPER_VM_PFNMAP = VM_PFNMAP; +const vm_flags_t RUST_CONST_HELPER_VM_IO = VM_IO; +const vm_flags_t RUST_CONST_HELPER_VM_DONTCOPY = VM_DONTCOPY; +const vm_flags_t RUST_CONST_HELPER_VM_DONTEXPAND = VM_DONTEXPAND; +const vm_flags_t RUST_CONST_HELPER_VM_LOCKONFAULT = VM_LOCKONFAULT; +const vm_flags_t RUST_CONST_HELPER_VM_ACCOUNT = VM_ACCOUNT; +const vm_flags_t RUST_CONST_HELPER_VM_NORESERVE = VM_NORESERVE; +const vm_flags_t RUST_CONST_HELPER_VM_HUGETLB = VM_HUGETLB; +const vm_flags_t RUST_CONST_HELPER_VM_SYNC = VM_SYNC; +const vm_flags_t RUST_CONST_HELPER_VM_ARCH_1 = VM_ARCH_1; +const vm_flags_t RUST_CONST_HELPER_VM_WIPEONFORK = VM_WIPEONFORK; +const vm_flags_t RUST_CONST_HELPER_VM_DONTDUMP = VM_DONTDUMP; +const vm_flags_t RUST_CONST_HELPER_VM_SOFTDIRTY = VM_SOFTDIRTY; +const vm_flags_t RUST_CONST_HELPER_VM_MIXEDMAP = VM_MIXEDMAP; +const vm_flags_t RUST_CONST_HELPER_VM_HUGEPAGE = VM_HUGEPAGE; +const vm_flags_t RUST_CONST_HELPER_VM_NOHUGEPAGE = VM_NOHUGEPAGE; #if IS_ENABLED(CONFIG_ANDROID_BINDER_IPC_RUST) #include "../../drivers/android/binder/rust_binder.h" -- cgit v1.2.3 From 9ea35a25d51b13013b724943a177a7aaf4bfed71 Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes Date: Tue, 25 Nov 2025 10:01:02 +0000 Subject: mm: introduce VMA flags bitmap type MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit It is useful to transition to using a bitmap for VMA flags so we can avoid running out of flags, especially for 32-bit kernels which are constrained to 32 flags, necessitating some features to be limited to 64-bit kernels only. By doing so, we remove any constraint on the number of VMA flags moving forwards no matter the platform and can decide in future to extend beyond 64 if required. We start by declaring an opaque types, vma_flags_t (which resembles mm_struct flags of type mm_flags_t), setting it to precisely the same size as vm_flags_t, and place it in union with vm_flags in the VMA declaration. We additionally update struct vm_area_desc equivalently placing the new opaque type in union with vm_flags. This change therefore does not impact the size of struct vm_area_struct or struct vm_area_desc. In order for the change to be iterative and to avoid impacting performance, we designate VM_xxx declared bitmap flag values as those which must exist in the first system word of the VMA flags bitmap. We therefore declare vma_flags_clear_all(), vma_flags_overwrite_word(), vma_flags_overwrite_word(), vma_flags_overwrite_word_once(), vma_flags_set_word() and vma_flags_clear_word() in order to allow us to update the existing vm_flags_*() functions to utilise these helpers. This is a stepping stone towards converting users to the VMA flags bitmap and behaves precisely as before. By doing this, we can eliminate the existing private vma->__vm_flags field in the vma->vm_flags union and replace it with the newly introduced opaque type vma_flags, which we call flags so we refer to the new bitmap field as vma->flags. We update vma_flag_[test, set]_atomic() to account for the change also. We adapt vm_flags_reset_once() to only clear those bits above the first system word providing write-once semantics to the first system word (which it is presumed the caller requires - and in all current use cases this is so). As we currently only specify that the VMA flags bitmap size is equal to BITS_PER_LONG number of bits, this is a noop, but is defensive in preparation for a future change that increases this. We additionally update the VMA userland test declarations to implement the same changes there. Finally, we update the rust code to reference vma->vm_flags on update rather than vma->__vm_flags which has been removed. This is safe for now, albeit it is implicitly performing a const cast. Once we introduce flag helpers we can improve this more. No functional change intended. Link: https://lkml.kernel.org/r/bab179d7b153ac12f221b7d65caac2759282cfe9.1764064557.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes Acked-by: Vlastimil Babka Reviewed-by: Pedro Falcato Acked-by: Alice Ryhl [rust] Cc: Alex Gaynor Cc: Alistair Popple Cc: Andreas Hindborg Cc: Axel Rasmussen Cc: Baolin Wang Cc: Baoquan He Cc: Barry Song Cc: Ben Segall Cc: Björn Roy Baron Cc: Boqun Feng Cc: Byungchul Park Cc: Chengming Zhou Cc: Chris Li Cc: Danilo Krummrich Cc: David Hildenbrand Cc: David Rientjes Cc: Dev Jain Cc: Dietmar Eggemann Cc: Gary Guo Cc: Gregory Price Cc: "Huang, Ying" Cc: Ingo Molnar Cc: Jann Horn Cc: Jason Gunthorpe Cc: Johannes Weiner Cc: John Hubbard Cc: Joshua Hahn Cc: Juri Lelli Cc: Kairui Song Cc: Kees Cook Cc: Kemeng Shi Cc: Lance Yang Cc: Leon Romanovsky Cc: Liam Howlett Cc: Mathew Brost Cc: Matthew Wilcox (Oracle) Cc: Mel Gorman Cc: Michal Hocko Cc: Miguel Ojeda Cc: Mike Rapoport Cc: Muchun Song Cc: Nhat Pham Cc: Nico Pache Cc: Oscar Salvador Cc: Peter Xu Cc: Peter Zijlstra Cc: Qi Zheng Cc: Rakie Kim Cc: Rik van Riel Cc: Ryan Roberts Cc: Shakeel Butt Cc: Steven Rostedt Cc: Suren Baghdasaryan Cc: Trevor Gross Cc: Valentin Schneider Cc: Vincent Guittot Cc: Wei Xu Cc: xu xin Cc: Yuanchu Xie Cc: Zi Yan Signed-off-by: Andrew Morton --- rust/kernel/mm/virt.rs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'rust') diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs index a1bfa4e19293..da21d65ccd20 100644 --- a/rust/kernel/mm/virt.rs +++ b/rust/kernel/mm/virt.rs @@ -250,7 +250,7 @@ impl VmaNew { // SAFETY: This is not a data race: the vma is undergoing initial setup, so it's not yet // shared. Additionally, `VmaNew` is `!Sync`, so it cannot be used to write in parallel. // The caller promises that this does not set the flags to an invalid value. - unsafe { (*self.as_ptr()).__bindgen_anon_2.__vm_flags = flags }; + unsafe { (*self.as_ptr()).__bindgen_anon_2.vm_flags = flags }; } /// Set the `VM_MIXEDMAP` flag on this vma. -- cgit v1.2.3