linux-toradex.git/arch/arm/include, branch master

ARM: clean up the memset64() C wrapper

2026-02-13T19:15:05+00:00

The current logic to split the 64-bit argument into its 32-bit halves is
byte-order specific and a bit clunky.  Use a union instead which is
easier to read and works in all cases.

GCC still generates the same machine code.

While at it, rename the arguments of the __memset64() prototype to
actually reflect their semantics.

Signed-off-by: Thomas Weißschuh 
Signed-off-by: Linus Torvalds

Merge tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2026-02-12T19:32:37+00:00

Pull MM updates from Andrew Morton:

 - "powerpc/64s: do not re-activate batched TLB flush" makes
   arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev)

   It adds a generic enter/leave layer and switches architectures to use
   it. Various hacks were removed in the process.

 - "zram: introduce compressed data writeback" implements data
   compression for zram writeback (Richard Chang and Sergey Senozhatsky)

 - "mm: folio_zero_user: clear page ranges" adds clearing of contiguous
   page ranges for hugepages. Large improvements during demand faulting
   are demonstrated (David Hildenbrand)

 - "memcg cleanups" tidies up some memcg code (Chen Ridong)

 - "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos
   stats" improves DAMOS stat's provided information, deterministic
   control, and readability (SeongJae Park)

 - "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few
   issues in the hugetlb cgroup charging selftests (Li Wang)

 - "Fix va_high_addr_switch.sh test failure - again" addresses several
   issues in the va_high_addr_switch test (Chunyu Hu)

 - "mm/damon/tests/core-kunit: extend existing test scenarios" improves
   the KUnit test coverage for DAMON (Shu Anzai)

 - "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a
   glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to
   transiently return -EAGAIN (Shivank Garg)

 - "arch, mm: consolidate hugetlb early reservation" reworks and
   consolidates a pile of straggly code related to reservation of
   hugetlb memory from bootmem and creation of CMA areas for hugetlb
   (Mike Rapoport)

 - "mm: clean up anon_vma implementation" cleans up the anon_vma
   implementation in various ways (Lorenzo Stoakes)

 - "tweaks for __alloc_pages_slowpath()" does a little streamlining of
   the page allocator's slowpath code (Vlastimil Babka)

 - "memcg: separate private and public ID namespaces" cleans up the
   memcg ID code and prevents the internal-only private IDs from being
   exposed to userspace (Shakeel Butt)

 - "mm: hugetlb: allocate frozen gigantic folio" cleans up the
   allocation of frozen folios and avoids some atomic refcount
   operations (Kefeng Wang)

 - "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement
   of memory betewwn the active and inactive LRUs and adds auto-tuning
   of the ratio-based quotas and of monitoring intervals (SeongJae Park)

 - "Support page table check on PowerPC" makes
   CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan)

 - "nodemask: align nodes_and{,not} with underlying bitmap ops" makes
   nodes_and() and nodes_andnot() propagate the return values from the
   underlying bit operations, enabling some cleanup in calling code
   (Yury Norov)

 - "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up
   some DAMON internal interfaces (SeongJae Park)

 - "mm/khugepaged: cleanups and scan limit fix" does some cleanup work
   in khupaged and fixes a scan limit accounting issue (Shivank Garg)

 - "mm: balloon infrastructure cleanups" goes to town on the balloon
   infrastructure and its page migration function. Mainly cleanups, also
   some locking simplification (David Hildenbrand)

 - "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds
   additional tracepoints to the page reclaim code (Jiayuan Chen)

 - "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is
   part of Marco's kernel-wide migration from the legacy workqueue APIs
   over to the preferred unbound workqueues (Marco Crivellari)

 - "Various mm kselftests improvements/fixes" provides various unrelated
   improvements/fixes for the mm kselftests (Kevin Brodsky)

 - "mm: accelerate gigantic folio allocation" greatly speeds up gigantic
   folio allocation, mainly by avoiding unnecessary work in
   pfn_range_valid_contig() (Kefeng Wang)

 - "selftests/damon: improve leak detection and wss estimation
   reliability" improves the reliability of two of the DAMON selftests
   (SeongJae Park)

 - "mm/damon: cleanup kdamond, damon_call(), damos filter and
   DAMON_MIN_REGION" does some cleanup work in the core DAMON code
   (SeongJae Park)

 - "Docs/mm/damon: update intro, modules, maintainer profile, and misc"
   performs maintenance work on the DAMON documentation (SeongJae Park)

 - "mm: add and use vma_assert_stabilised() helper" refactors and cleans
   up the core VMA code. The main aim here is to be able to use the mmap
   write lock's lockdep state to perform various assertions regarding
   the locking which the VMA code requires (Lorenzo Stoakes)

 - "mm, swap: swap table phase II: unify swapin use" removes some old
   swap code (swap cache bypassing and swap synchronization) which
   wasn't working very well. Various other cleanups and simplifications
   were made. The end result is a 20% speedup in one benchmark (Kairui
   Song)

 - "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM
   available on 64-bit alpha, loongarch, mips, parisc, and um. Various
   cleanups were performed along the way (Qi Zheng)

* tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits)
  mm/memory: handle non-split locks correctly in zap_empty_pte_table()
  mm: move pte table reclaim code to memory.c
  mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
  mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
  um: mm: enable MMU_GATHER_RCU_TABLE_FREE
  parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
  mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
  LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
  alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
  mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h
  mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles
  zsmalloc: make common caches global
  mm: add SPDX id lines to some mm source files
  mm/zswap: use %pe to print error pointers
  mm/vmscan: use %pe to print error pointers
  mm/readahead: fix typo in comment
  mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
  mm: refactor vma_map_pages to use vm_insert_pages
  mm/damon: unify address range representation with damon_addr_range
  mm/cma: replace snprintf with strscpy in cma_new_area
  ...

Merge tag 'asm-generic-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

2026-02-11T04:27:33+00:00

Pull asm-generic header updates from Arnd Bergmann:
 "A series from Thomas Weißschuh cleans up the UAPI header files to no
  longer contain any references to Kconfig symbols, as these make no
  sense in userspace.

  The build-time check for these was originally added by Sam Ravnborg in
  linux-2.6.28, and a later version started warning for all newly added
  CONFIG_* checks here but kept a list of known exceptions. With the
  last exceptions gone from that list, the warning is now unconditional
  in 'make headers_install'.

  John Garry contributed a cleanup of cpumask_of_node()"

* tag 'asm-generic-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  scripts: headers_install.sh: Remove config leak ignore machinery
  x86/uapi: Stop leaking kconfig references to userspace
  nios2: uapi: Remove custom asm/swab.h from UAPI
  ARM: uapi: Drop PSR_ENDSTATE
  ARC: Always use SWAPE instructions for __arch_swab32()
  include/asm-generic/topology.h: Remove unused definition of cpumask_of_node()

Merge tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2026-02-11T03:01:45+00:00

Pull x86 paravirt updates from Borislav Petkov:

 - A nice cleanup to the paravirt code containing a unification of the
   paravirt clock interface, taming the include hell by splitting the
   pv_ops structure and removing of a bunch of obsolete code (Juergen
   Gross)

* tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/paravirt: Use XOR r32,r32 to clear register in pv_vcpu_is_preempted()
  x86/paravirt: Remove trailing semicolons from alternative asm templates
  x86/pvlocks: Move paravirt spinlock functions into own header
  x86/paravirt: Specify pv_ops array in paravirt macros
  x86/paravirt: Allow pv-calls outside paravirt.h
  objtool: Allow multiple pv_ops arrays
  x86/xen: Drop xen_mmu_ops
  x86/xen: Drop xen_cpu_ops
  x86/xen: Drop xen_irq_ops
  x86/paravirt: Move pv_native_*() prototypes to paravirt.c
  x86/paravirt: Introduce new paravirt-base.h header
  x86/paravirt: Move paravirt_sched_clock() related code into tsc.c
  x86/paravirt: Use common code for paravirt_steal_clock()
  riscv/paravirt: Use common code for paravirt_steal_clock()
  loongarch/paravirt: Use common code for paravirt_steal_clock()
  arm64/paravirt: Use common code for paravirt_steal_clock()
  arm/paravirt: Use common code for paravirt_steal_clock()
  sched: Move clock related paravirt code to kernel/sched
  paravirt: Remove asm/paravirt_api_clock.h
  x86/paravirt: Move thunk macros to paravirt_types.h
  ...

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

2026-02-10T04:28:45+00:00

Pull arm64 updates from Will Deacon:
 "There's a little less than normal, probably due to LPC & Christmas/New
  Year meaning that a few series weren't quite ready or reviewed in
  time. It's still useful across the board, despite the only real
  feature being support for the LS64 feature enabling 64-byte atomic
  accesses to endpoints that support it.

  ACPI:
   - Add interrupt signalling support to the AGDI handler
   - Add Catalin and myself to the arm64 ACPI MAINTAINERS entry

  CPU features:
   - Drop Kconfig options for PAN and LSE (these are detected at runtime)
   - Add support for 64-byte single-copy atomic instructions (LS64/LS64V)
   - Reduce MTE overhead when executing in the kernel on Ampere CPUs
   - Ensure POR_EL0 value exposed via ptrace is up-to-date
   - Fix error handling on GCS allocation failure

  CPU frequency:
   - Add CPU hotplug support to the FIE setup in the AMU driver

  Entry code:
   - Minor optimisations and cleanups to the syscall entry path
   - Preparatory rework for moving to the generic syscall entry code

  Hardware errata:
   - Work around Spectre-BHB on TSV110 processors
   - Work around broken CMO propagation on some systems with the SI-L1
     interconnect

  Miscellaneous:
   - Disable branch profiling for arch/arm64/ to avoid issues with
     noinstr
   - Minor fixes and cleanups (kexec + ubsan, WARN_ONCE() instead of
     WARN_ON(), reduction of boolean expression)
   - Fix custom __READ_ONCE() implementation for LTO builds when
     operating on non-atomic types

  Perf and PMUs:
   - Support for CMN-600AE
   - Be stricter about supported hardware in the CMN driver
   - Support for DSU-110 and DSU-120
   - Support for the cycles event in the DSU driver (alongside the
     dedicated cycles counter)
   - Use IRQF_NO_THREAD instead of IRQF_ONESHOT in the cxlpmu driver
   - Use !bitmap_empty() as a faster alternative to bitmap_weight()
   - Fix SPE error handling when failing to resume profiling

  Selftests:
   - Add support for the FORCE_TARGETS option to the arm64 kselftests
   - Avoid nolibc-specific my_syscall() function
   - Add basic test for the LS64 HWCAP
   - Extend fp-pidbench to cover additional workload patterns"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (43 commits)
  perf/arm-cmn: Reject unsupported hardware configurations
  perf: arm_spe: Properly set hw.state on failures
  arm64/gcs: Fix error handling in arch_set_shadow_stack_status()
  arm64: Fix non-atomic __READ_ONCE() with CONFIG_LTO=y
  arm64: poe: fix stale POR_EL0 values for ptrace
  kselftest/arm64: Raise default number of loops in fp-pidbench
  kselftest/arm64: Add a no-SVE loop after SVE in fp-pidbench
  perf/cxlpmu: Replace IRQF_ONESHOT with IRQF_NO_THREAD
  arm64: mte: Set TCMA1 whenever MTE is present in the kernel
  arm64/ptrace: Return early for ptrace_report_syscall_entry() error
  arm64/ptrace: Split report_syscall()
  arm64: Remove unused _TIF_WORK_MASK
  kselftest/arm64: Add missing file in .gitignore
  arm64: errata: Workaround for SI L1 downstream coherency issue
  kselftest/arm64: Add HWCAP test for FEAT_LS64
  arm64: Add support for FEAT_{LS64, LS64_V}
  KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest
  arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1
  KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory
  KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B
  ...

ARM: 9468/1: fix memset64() on big-endian

2026-01-31T12:58:27+00:00

On big-endian systems the 32-bit low and high halves need to be swapped
for the underlying assembly implementation to work correctly.

Fixes: fd1d362600e2 ("ARM: implement memset32 & memset64")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh 
Reviewed-by: Matthew Wilcox (Oracle) 
Reviewed-by: Arnd Bergmann 
Signed-off-by: Russell King (Oracle)

ARM: uapi: Drop PSR_ENDSTATE

2026-01-30T15:46:17+00:00

The symbol PSR_ENDSTATE is pointless for userspace. Drop it from the
UAPI headers and instead inline it into the only two callers.

As as side-effect, remove a leak of an internal kconfig symbol through
the UAPI headers.

Suggested-by: Arnd Bergmann 
Link: https://lore.kernel.org/lkml/d2ad12f2-3d65-4bef-890c-65d78a33d790@app.fastmail.com/
Signed-off-by: Thomas Weißschuh 
Signed-off-by: Arnd Bergmann

arm: make initialization of zero page independent of the memory map

2026-01-27T04:02:13+00:00

Unlike most architectures, arm keeps a struct page pointer to the
empty_zero_page and to initialize it requires conversion of a virtual
address to page which makes it necessary to have memory map initialized
before creating the empty_zero_page.

Make empty_zero_page a stataic array in BSS to decouple it's
initialization from the initialization of the memory map.

This also aligns arm with vast majorty of architectures.

Link: https://lkml.kernel.org/r/20260111082105.290734-5-rppt@kernel.org
Signed-off-by: Klara Modin 
Signed-off-by: Mike Rapoport (Microsoft) 
Cc: Alexander Gordeev 
Cc: Alex Shi 
Cc: Andreas Larsson 
Cc: "Borislav Petkov (AMD)" 
Cc: Catalin Marinas 
Cc: David Hildenbrand 
Cc: David S. Miller 
Cc: Dinh Nguyen 
Cc: Geert Uytterhoeven 
Cc: Guo Ren 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: Huacai Chen 
Cc: Ingo Molnar 
Cc: Johannes Berg 
Cc: John Paul Adrian Glaubitz 
Cc: Jonathan Corbet 
Cc: Liam Howlett 
Cc: Lorenzo Stoakes 
Cc: Magnus Lindholm 
Cc: Matt Turner 
Cc: Max Filippov 
Cc: Michael Ellerman 
Cc: Michal Hocko 
Cc: Michal Simek 
Cc: Muchun Song 
Cc: Oscar Salvador 
Cc: Palmer Dabbelt 
Cc: Pratyush Yadav 
Cc: Richard Weinberger 
Cc: "Ritesh Harjani (IBM)" 
Cc: Russell King 
Cc: Stafford Horne 
Cc: Suren Baghdasaryan 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: Vasily Gorbik 
Cc: Vineet Gupta 
Cc: Vlastimil Babka 
Cc: Will Deacon 
Signed-off-by: Andrew Morton

treewide: provide a generic clear_user_page() variant

2026-01-21T03:24:39+00:00

Patch series "mm: folio_zero_user: clear page ranges", v11.

This series adds clearing of contiguous page ranges for hugepages.

The series improves on the current discontiguous clearing approach in two
ways:

  - clear pages in a contiguous fashion.
  - use batched clearing via clear_pages() wherever exposed.

The first is useful because it allows us to make much better use of
hardware prefetchers.

The second, enables advertising the real extent to the processor.  Where
specific instructions support it (ex.  string instructions on x86; "mops"
on arm64 etc), a processor can optimize based on this because, instead of
seeing a sequence of 8-byte stores, or a sequence of 4KB pages, it sees a
larger unit being operated on.

For instance, AMD Zen uarchs (for extents larger than LLC-size) switch to
a mode where they start eliding cacheline allocation.  This is helpful not
just because it results in higher bandwidth, but also because now the
cache is not evicting useful cachelines and replacing them with zeroes.

Demand faulting a 64GB region shows performance improvement:

 $ perf bench mem mmap -p $pg-sz -f demand -s 64GB -l 5

                       baseline              +series
                   (GBps +- %stdev)      (GBps +- %stdev)

   pg-sz=2MB       11.76 +- 1.10%        25.34 +- 1.18% [*]   +115.47%  	preempt=*

   pg-sz=1GB       24.85 +- 2.41%        39.22 +- 2.32%       + 57.82%  	preempt=none|voluntary
   pg-sz=1GB         (similar)           52.73 +- 0.20% [#]   +112.19%  	preempt=full|lazy

 [*] This improvement is because switching to sequential clearing
  allows the hardware prefetchers to do a much better job.

 [#] For pg-sz=1GB a large part of the improvement is because of the
  cacheline elision mentioned above. preempt=full|lazy improves upon
  that because, not needing explicit invocations of cond_resched() to
  ensure reasonable preemption latency, it can clear the full extent
  as a single unit. In comparison the maximum extent used for
  preempt=none|voluntary is PROCESS_PAGES_NON_PREEMPT_BATCH (32MB).

  When provided the full extent the processor forgoes allocating
  cachelines on this path almost entirely.

  (The hope is that eventually, in the fullness of time, the lazy
   preemption model will be able to do the same job that none or
   voluntary models are used for, allowing us to do away with
   cond_resched().)

Raghavendra also tested previous version of the series on AMD Genoa and
sees similar improvement [1] with preempt=lazy.

  $ perf bench mem map -p $page-size -f populate -s 64GB -l 10

                    base               patched              change
   pg-sz=2MB       12.731939 GB/sec    26.304263 GB/sec     106.6%
   pg-sz=1GB       26.232423 GB/sec    61.174836 GB/sec     133.2%


This patch (of 8):

Let's drop all variants that effectively map to clear_page() and provide
it in a generic variant instead.

We'll use the macro clear_user_page to indicate whether an architecture
provides it's own variant.

Also, clear_user_page() is only called from the generic variant of
clear_user_highpage(), so define it only if the architecture does not
provide a clear_user_highpage().  And, for simplicity define it in
linux/highmem.h.

Note that for parisc, clear_page() and clear_user_page() map to
clear_page_asm(), so we can just get rid of the custom clear_user_page()
implementation.  There is a clear_user_page_asm() function on parisc, that
seems to be unused.  Not sure what's up with that.

Link: https://lkml.kernel.org/r/20260107072009.1615991-1-ankur.a.arora@oracle.com
Link: https://lkml.kernel.org/r/20260107072009.1615991-2-ankur.a.arora@oracle.com
Signed-off-by: David Hildenbrand 
Co-developed-by: Ankur Arora 
Signed-off-by: Ankur Arora 
Cc: Andy Lutomirski 
Cc: Ankur Arora 
Cc: "Borislav Petkov (AMD)" 
Cc: Boris Ostrovsky 
Cc: David Hildenbrand 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Konrad Rzessutek Wilk 
Cc: Lance Yang 
Cc: "Liam R. Howlett" 
Cc: Li Zhe 
Cc: Lorenzo Stoakes 
Cc: Mateusz Guzik 
Cc: Matthew Wilcox (Oracle) 
Cc: Michal Hocko 
Cc: Mike Rapoport 
Cc: Peter Zijlstra 
Cc: Raghavendra K T 
Cc: Suren Baghdasaryan 
Cc: Thomas Gleixner 
Cc: Vlastimil Babka 
Signed-off-by: Andrew Morton

arm/paravirt: Use common code for paravirt_steal_clock()

2026-01-12T14:57:23+00:00

Remove the arch-specific variant of paravirt_steal_clock() and use
the common one instead.

This allows to remove paravirt.c and paravirt.h from arch/arm.

Until all archs supporting Xen have been switched to the common code
of paravirt_steal_clock(), drivers/xen/time.c needs to include
asm/paravirt.h for those archs, while this is not necessary for arm
any longer.

Signed-off-by: Juergen Gross 
Signed-off-by: Borislav Petkov (AMD) 
Acked-by: Peter Zijlstra (Intel) 
Link: https://patch.msgid.link/20260105110520.21356-8-jgross@suse.com