| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for CONFIG_PAGE_TABLE_CHECK and enable it in
debug_defconfig. s390 can only tell user from kernel PTEs via the mm,
so mm_struct is now passed into pxx_user_accessible_page() callbacks
- Expose the PCI function UID as an arch-specific slot attribute in
sysfs so a function can be identified by its user-defined id while
still in standby. Introduces a generic ARCH_PCI_SLOT_GROUPS hook in
drivers/pci/slot.c
- Refresh s390 PCI documentation to reflect current behavior and cover
previously undocumented sysfs attributes
- zcrypt device driver cleanup series: consistent field types, clearer
variable naming, a kernel-doc warning fix, and a comment explaining
the intentional synchronize_rcu() in pkey_handler_register()
- Provide an s390 arch_raw_cpu_ptr() that avoids the detour via
get_lowcore() using alternatives, shrinking defconfig by ~27 kB
- Guard identity-base randomization with kaslr_enabled() so nokaslr
keeps the identity mapping at 0 even with RANDOMIZE_IDENTITY_BASE=y
- Build S390_MODULES_SANITY_TEST as a module only by requiring KUNIT &&
m, since built-in would not exercise module loading
- Remove the permanently commented-out HMCDRV_DEV_CLASS create_class()
code in the hmcdrv driver
- Drop stale ident_map_size extern conflicting with asm/page.h
* tag 's390-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: Fix warning about wrong kernel doc comment
PCI: s390: Expose the UID as an arch specific PCI slot attribute
docs: s390/pci: Improve and update PCI documentation
s390/pkey: Add comment about synchronize_rcu() to pkey base
s390/hmcdrv: Remove commented out code
s390/zcrypt: Slight rework on the agent_id field
s390/zcrypt: Explicitly use a card variable in _zcrypt_send_cprb
s390/zcrypt: Rework MKVP fields and handling
s390/zcrypt: Make apfs a real unsigned int field
s390/zcrypt: Rework domain processing within zcrypt device driver
s390/zcrypt: Move inline function rng_type6cprb_msgx from header to code
s390/percpu: Provide arch_raw_cpu_ptr()
s390: Enable page table check for debug_defconfig
s390/pgtable: Add s390 support for page table check
s390/pgtable: Use set_pmd_bit() to invalidate PMD entry
mm/page_table_check: Pass mm_struct to pxx_user_accessible_page()
s390/boot: Respect kaslr_enabled() for identity randomization
s390/Kconfig: Make modules sanity test a module-only option
s390/setup: Drop stale ident_map_size declaration
|
|
Pull kvm updates from Paolo Bonzini:
"Arm:
- Add support for tracing in the standalone EL2 hypervisor code,
which should help both debugging and performance analysis. This
uses the new infrastructure for 'remote' trace buffers that can be
exposed by non-kernel entities such as firmware, and which came
through the tracing tree
- Add support for GICv5 Per Processor Interrupts (PPIs), as the
starting point for supporting the new GIC architecture in KVM
- Finally add support for pKVM protected guests, where pages are
unmapped from the host as they are faulted into the guest and can
be shared back from the guest using pKVM hypercalls. Protected
guests are created using a new machine type identifier. As the
elusive guestmem has not yet delivered on its promises, anonymous
memory is also supported
This is only a first step towards full isolation from the host; for
example, the CPU register state and DMA accesses are not yet
isolated. Because this does not really yet bring fully what it
promises, it is hidden behind CONFIG_ARM_PKVM_GUEST +
'kvm-arm.mode=protected', and also triggers TAINT_USER when a VM is
created. Caveat emptor
- Rework the dreaded user_mem_abort() function to make it more
maintainable, reducing the amount of state being exposed to the
various helpers and rendering a substantial amount of state
immutable
- Expand the Stage-2 page table dumper to support NV shadow page
tables on a per-VM basis
- Tidy up the pKVM PSCI proxy code to be slightly less hard to
follow
- Fix both SPE and TRBE in non-VHE configurations so that they do not
generate spurious, out of context table walks that ultimately lead
to very bad HW lockups
- A small set of patches fixing the Stage-2 MMU freeing in error
cases
- Tighten-up accepted SMC immediate value to be only #0 for host
SMCCC calls
- The usual cleanups and other selftest churn
LoongArch:
- Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel()
- Add DMSINTC irqchip in kernel support
RISC-V:
- Fix steal time shared memory alignment checks
- Fix vector context allocation leak
- Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi()
- Fix double-free of sdata in kvm_pmu_clear_snapshot_area()
- Fix integer overflow in kvm_pmu_validate_counter_mask()
- Fix shift-out-of-bounds in make_xfence_request()
- Fix lost write protection on huge pages during dirty logging
- Split huge pages during fault handling for dirty logging
- Skip CSR restore if VCPU is reloaded on the same core
- Implement kvm_arch_has_default_irqchip() for KVM selftests
- Factored-out ISA checks into separate sources
- Added hideleg to struct kvm_vcpu_config
- Factored-out VCPU config into separate sources
- Support configuration of per-VM HGATP mode from KVM user space
s390:
- Support for ESA (31-bit) guests inside nested hypervisors
- Remove restriction on memslot alignment, which is not needed
anymore with the new gmap code
- Fix LPSW/E to update the bear (which of course is the breaking
event address register)
x86:
- Shut up various UBSAN warnings on reading module parameter before
they were initialized
- Don't zero-allocate page tables that are used for splitting
hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in
the page table and thus write all bytes
- As an optimization, bail early when trying to unsync 4KiB mappings
if the target gfn can just be mapped with a 2MiB hugepage
x86 generic:
- Copy single-chunk MMIO write values into struct kvm_vcpu (more
precisely struct kvm_mmio_fragment) to fix use-after-free stack
bugs where KVM would dereference stack pointer after an exit to
userspace
- Clean up and comment the emulated MMIO code to try to make it
easier to maintain (not necessarily "easy", but "easier")
- Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not *all* of
VMX and SVM enabling) as it is needed for trusted I/O
- Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions
- Immediately fail the build if a required #define is missing in one
of KVM's headers that is included multiple times
- Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected
exception, mostly to prevent syzkaller from abusing the uAPI to
trigger WARNs, but also because it can help prevent userspace from
unintentionally crashing the VM
- Exempt SMM from CPUID faulting on Intel, as per the spec
- Misc hardening and cleanup changes
x86 (AMD):
- Fix and optimize IRQ window inhibit handling for AVIC; make it
per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple
vCPUs have to-be-injected IRQs
- Clean up and optimize the OSVW handling, avoiding a bug in which
KVM would overwrite state when enabling virtualization on multiple
CPUs in parallel. This should not be a problem because OSVW should
usually be the same for all CPUs
- Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains
about a "too large" size based purely on user input
- Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION
- Disallow synchronizing a VMSA of an already-launched/encrypted
vCPU, as doing so for an SNP guest will crash the host due to an
RMP violation page fault
- Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped
queries are required to hold kvm->lock, and enforce it by lockdep.
Fix various bugs where sev_guest() was not ensured to be stable for
the whole duration of a function or ioctl
- Convert a pile of kvm->lock SEV code to guard()
- Play nicer with userspace that does not enable
KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6
as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the
payload would end up in EXITINFO2 rather than CR2, for example).
Only set CR2 and DR6 when consumption of the payload is imminent,
but on the other hand force delivery of the payload in all paths
where userspace retrieves CR2 or DR6
- Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT
instead of vmcb02->save.cr2. The value is out of sync after a
save/restore or after a #PF is injected into L2
- Fix a class of nSVM bugs where some fields written by the CPU are
not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so
are not up-to-date when saved by KVM_GET_NESTED_STATE
- Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE
and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly
initialized after save+restore
- Add a variety of missing nSVM consistency checks
- Fix several bugs where KVM failed to correctly update VMCB fields
on nested #VMEXIT
- Fix several bugs where KVM failed to correctly synthesize #UD or
#GP for SVM-related instructions
- Add support for save+restore of virtualized LBRs (on SVM)
- Refactor various helpers and macros to improve clarity and
(hopefully) make the code easier to maintain
- Aggressively sanitize fields when copying from vmcb12, to guard
against unintentionally allowing L1 to utilize yet-to-be-defined
features
- Fix several bugs where KVM botched rAX legality checks when
emulating SVM instructions. There are remaining issues in that KVM
doesn't handle size prefix overrides for 64-bit guests
- Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails
instead of somewhat arbitrarily synthesizing #GP (i.e. don't double
down on AMD's architectural but sketchy behavior of generating #GP
for "unsupported" addresses)
- Cache all used vmcb12 fields to further harden against TOCTOU bugs
x86 (Intel):
- Drop obsolete branch hint prefixes from the VMX instruction macros
- Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a
register input when appropriate
- Code cleanups
guest_memfd:
- Don't mark guest_memfd folios as accessed, as guest_memfd doesn't
support reclaim, the memory is unevictable, and there is no storage
to write back to
LoongArch selftests:
- Add KVM PMU test cases
s390 selftests:
- Enable more memory selftests
x86 selftests:
- Add support for Hygon CPUs in KVM selftests
- Fix a bug in the MSR test where it would get false failures on
AMD/Hygon CPUs with exactly one of RDPID or RDTSCP
- Add an MADV_COLLAPSE testcase for guest_memfd as a regression test
for a bug where the kernel would attempt to collapse guest_memfd
folios against KVM's will"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits)
KVM: x86: use inlines instead of macros for is_sev_*guest
x86/virt: Treat SVM as unsupported when running as an SEV+ guest
KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails
KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper
KVM: SEV: use mutex guard in snp_handle_guest_req()
KVM: SEV: use mutex guard in sev_mem_enc_unregister_region()
KVM: SEV: use mutex guard in sev_mem_enc_ioctl()
KVM: SEV: use mutex guard in snp_launch_update()
KVM: SEV: Assert that kvm->lock is held when querying SEV+ support
KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe"
KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y
KVM: SEV: WARN on unhandled VM type when initializing VM
KVM: LoongArch: selftests: Add PMU overflow interrupt test
KVM: LoongArch: selftests: Add basic PMU event counting test
KVM: LoongArch: selftests: Add cpucfg read/write helpers
LoongArch: KVM: Add DMSINTC inject msi to vCPU
LoongArch: KVM: Add DMSINTC device support
LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function
LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch
LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "pid: make sub-init creation retryable" (Oleg Nesterov)
Make creation of init in a new namespace more robust by clearing away
some historical cruft which is no longer needed. Also some
documentation fixups
- "selftests/fchmodat2: Error handling and general" (Mark Brown)
Fix and a cleanup for the fchmodat2() syscall selftest
- "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko)
- "hung_task: Provide runtime reset interface for hung task detector"
(Aaron Tomlin)
Give administrators the ability to zero out
/proc/sys/kernel/hung_task_detect_count
- "tools/getdelays: use the static UAPI headers from
tools/include/uapi" (Thomas Weißschuh)
Teach getdelays to use the in-kernel UAPI headers rather than the
system-provided ones
- "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta)
Several cleanups and fixups to the hardlockup detector code and its
documentation
- "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law)
A couple of small/theoretical fixes in the bch code
- "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo)
- "cleanup the RAID5 XOR library" (Christoph Hellwig)
A quite far-reaching cleanup to this code. I can't do better than to
quote Christoph:
"The XOR library used for the RAID5 parity is a bit of a mess right
now. The main file sits in crypto/ despite not being cryptography
and not using the crypto API, with the generic implementations
sitting in include/asm-generic and the arch implementations
sitting in an asm/ header in theory. The latter doesn't work for
many cases, so architectures often build the code directly into
the core kernel, or create another module for the architecture
code.
Change this to a single module in lib/ that also contains the
architecture optimizations, similar to the library work Eric
Biggers has done for the CRC and crypto libraries later. After
that it changes to better calling conventions that allow for
smarter architecture implementations (although none is contained
here yet), and uses static_call to avoid indirection function call
overhead"
- "lib/list_sort: Clean up list_sort() scheduling workarounds"
(Kuan-Wei Chiu)
Clean up this library code by removing a hacky thing which was added
for UBIFS, which UBIFS doesn't actually need
- "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt)
Fix a few bugs in the scatterlist code, add in-kernel tests for the
now-fixed bugs and fix a leak in the test itself
- "kdump: Enable LUKS-encrypted dump target support in ARM64 and
PowerPC" (Coiby Xu)
Enable support of the LUKS-encrypted device dump target on arm64 and
powerpc
- "ocfs2: consolidate extent list validation into block read callbacks"
(Joseph Qi)
Cleanup, simplify, and make more robust ocfs2's validation of extent
list fields (Kernel test robot loves mounting corrupted fs images!)
* tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits)
ocfs2: validate group add input before caching
ocfs2: validate bg_bits during freefrag scan
ocfs2: fix listxattr handling when the buffer is full
doc: watchdog: fix typos etc
update Sean's email address
ocfs2: use get_random_u32() where appropriate
ocfs2: split transactions in dio completion to avoid credit exhaustion
ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path()
ocfs2: validate extent block list fields during block read
ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec()
ocfs2: validate dx_root extent list fields during block read
ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY
ocfs2: handle invalid dinode in ocfs2_group_extend
.get_maintainer.ignore: add Askar
ocfs2: validate bg_list extent bounds in discontig groups
checkpatch: exclude forward declarations of const structs
tools/accounting: handle truncated taskstats netlink messages
taskstats: set version in TGID exit notifications
ocfs2/heartbeat: fix slot mapping rollback leaks on error paths
arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "maple_tree: Replace big node with maple copy" (Liam Howlett)
Mainly prepararatory work for ongoing development but it does reduce
stack usage and is an improvement.
- "mm, swap: swap table phase III: remove swap_map" (Kairui Song)
Offers memory savings by removing the static swap_map. It also yields
some CPU savings and implements several cleanups.
- "mm: memfd_luo: preserve file seals" (Pratyush Yadav)
File seal preservation to LUO's memfd code
- "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan
Chen)
Additional userspace stats reportng to zswap
- "arch, mm: consolidate empty_zero_page" (Mike Rapoport)
Some cleanups for our handling of ZERO_PAGE() and zero_pfn
- "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu
Han)
A robustness improvement and some cleanups in the kmemleak code
- "Improve khugepaged scan logic" (Vernon Yang)
Improve khugepaged scan logic and reduce CPU consumption by
prioritizing scanning tasks that access memory frequently
- "Make KHO Stateless" (Jason Miu)
Simplify Kexec Handover by transitioning KHO from an xarray-based
metadata tracking system with serialization to a radix tree data
structure that can be passed directly to the next kernel
- "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
Ballasi and Steven Rostedt)
Enhance vmscan's tracepointing
- "mm: arch/shstk: Common shadow stack mapping helper and
VM_NOHUGEPAGE" (Catalin Marinas)
Cleanup for the shadow stack code: remove per-arch code in favour of
a generic implementation
- "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)
Fix a WARN() which can be emitted the KHO restores a vmalloc area
- "mm: Remove stray references to pagevec" (Tal Zussman)
Several cleanups, mainly udpating references to "struct pagevec",
which became folio_batch three years ago
- "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
Shutsemau)
Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail
pages encode their relationship to the head page
- "mm/damon/core: improve DAMOS quota efficiency for core layer
filters" (SeongJae Park)
Improve two problematic behaviors of DAMOS that makes it less
efficient when core layer filters are used
- "mm/damon: strictly respect min_nr_regions" (SeongJae Park)
Improve DAMON usability by extending the treatment of the
min_nr_regions user-settable parameter
- "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)
The proper fix for a previously hotfixed SMP=n issue. Code
simplifications and cleanups ensued
- "mm: cleanups around unmapping / zapping" (David Hildenbrand)
A bunch of cleanups around unmapping and zapping. Mostly
simplifications, code movements, documentation and renaming of
zapping functions
- "support batched checking of the young flag for MGLRU" (Baolin Wang)
Batched checking of the young flag for MGLRU. It's part cleanups; one
benchmark shows large performance benefits for arm64
- "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)
memcg cleanup and robustness improvements
- "Allow order zero pages in page reporting" (Yuvraj Sakshith)
Enhance free page reporting - it is presently and undesirably order-0
pages when reporting free memory.
- "mm: vma flag tweaks" (Lorenzo Stoakes)
Cleanup work following from the recent conversion of the VMA flags to
a bitmap
- "mm/damon: add optional debugging-purpose sanity checks" (SeongJae
Park)
Add some more developer-facing debug checks into DAMON core
- "mm/damon: test and document power-of-2 min_region_sz requirement"
(SeongJae Park)
An additional DAMON kunit test and makes some adjustments to the
addr_unit parameter handling
- "mm/damon/core: make passed_sample_intervals comparisons
overflow-safe" (SeongJae Park)
Fix a hard-to-hit time overflow issue in DAMON core
- "mm/damon: improve/fixup/update ratio calculation, test and
documentation" (SeongJae Park)
A batch of misc/minor improvements and fixups for DAMON
- "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
Hildenbrand)
Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code
movement was required.
- "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)
A somewhat random mix of fixups, recompression cleanups and
improvements in the zram code
- "mm/damon: support multiple goal-based quota tuning algorithms"
(SeongJae Park)
Extend DAMOS quotas goal auto-tuning to support multiple tuning
algorithms that users can select
- "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)
Fix the khugpaged sysfs handling so we no longer spam the logs with
reams of junk when starting/stopping khugepaged
- "mm: improve map count checks" (Lorenzo Stoakes)
Provide some cleanups and slight fixes in the mremap, mmap and vma
code
- "mm/damon: support addr_unit on default monitoring targets for
modules" (SeongJae Park)
Extend the use of DAMON core's addr_unit tunable
- "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)
Cleanups to khugepaged and is a base for Nico's planned khugepaged
mTHP support
- "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)
Code movement and cleanups in the memhotplug and sparsemem code
- "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
CONFIG_MIGRATION" (David Hildenbrand)
Rationalize some memhotplug Kconfig support
- "change young flag check functions to return bool" (Baolin Wang)
Cleanups to change all young flag check functions to return bool
- "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
Law and SeongJae Park)
Fix a few potential DAMON bugs
- "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
Stoakes)
Convert a lot of the existing use of the legacy vm_flags_t data type
to the new vma_flags_t type which replaces it. Mainly in the vma
code.
- "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)
Expand the mmap_prepare functionality, which is intended to replace
the deprecated f_op->mmap hook which has been the source of bugs and
security issues for some time. Cleanups, documentation, extension of
mmap_prepare into filesystem drivers
- "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)
Simplify and clean up zap_huge_pmd(). Additional cleanups around
vm_normal_folio_pmd() and the softleaf functionality are performed.
* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm: fix deferred split queue races during migration
mm/khugepaged: fix issue with tracking lock
mm/huge_memory: add and use has_deposited_pgtable()
mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
mm/huge_memory: separate out the folio part of zap_huge_pmd()
mm/huge_memory: use mm instead of tlb->mm
mm/huge_memory: remove unnecessary sanity checks
mm/huge_memory: deduplicate zap deposited table call
mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
mm/huge_memory: add a common exit path to zap_huge_pmd()
mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
mm/huge: avoid big else branch in zap_huge_pmd()
mm/huge_memory: simplify vma_is_specal_huge()
mm: on remap assert that input range within the proposed VMA
mm: add mmap_action_map_kernel_pages[_full]()
uio: replace deprecated mmap hook with mmap_prepare in uio_info
drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
mm: allow handling of stacked mmap_prepare hooks in more drivers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
- randomize_kstack: Improve implementation across arches (Ryan Roberts)
- lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test
- refcount: Remove unused __signed_wrap function annotations
* tag 'hardening-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test
refcount: Remove unused __signed_wrap function annotations
randomize_kstack: Unify random source across arches
randomize_kstack: Maintain kstack_offset per task
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- ESA nesting support
- 4k memslots
- LPSW/E fix
|
|
On s390, an individual PCI function can generally be identified by two
identifiers, the FID and the UID. Which identifier is used depends on
the scope and the platform configuration.
The first identifier, the FID, is always available and identifies a PCI
device uniquely within a machine. The FID may be virtualized by
hypervisors, but on the LPAR level, the machine scope makes it
impossible to create the same configuration based on FIDs on two
different LPARs of the same machine, and difficult to reuse across
machines.
Such matching LPAR configurations are useful, though, allowing
standardized setups and booting a Linux installation on different LPARs.
To this end the UID, or user-defined identifier, was introduced. While
it is only guaranteed to be unique within an LPAR and only if indicated
by firmware, it allows users to replicate PCI device setups.
On s390, which uses a machine hypervisor, a per PCI function hotplug
model is used. The shortcoming with the UID then is, that it is not
visible to the user without first attaching the PCI function and
accessing the "uid" device attribute. The FID, on the other hand, is
used as the slot name and is thus known even with the PCI function in
standby.
Remedy this shortcoming by providing the UID as an attribute on the slot
allowing the user to identify a PCI function based on the UID without
having to first attach it. Do this via a macro mechanism analogous to
what was introduced by commit 265baca69a07 ("s390/pci: Stop usurping
pdev->dev.groups") for the PCI device attributes.
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/slot.c
Link: https://lore.kernel.org/r/20260407-uid_slot-v8-2-15ae4409d2ce@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
In order to be able to do this, we need to change VM_DATA_DEFAULT_FLAGS
and friends and update the architecture-specific definitions also.
We then have to update some KSM logic to handle VMA flags, and introduce
VMA_STACK_FLAGS to define the vma_flags_t equivalent of VM_STACK_FLAGS.
We also introduce two helper functions for use during the time we are
converting legacy flags to vma_flags_t values - vma_flags_to_legacy() and
legacy_to_vma_flags().
This enables us to iteratively make changes to break these changes up into
separate parts.
We use these explicitly here to keep VM_STACK_FLAGS around for certain
users which need to maintain the legacy vm_flags_t values for the time
being.
We are no longer able to rely on the simple VM_xxx being set to zero if
the feature is not enabled, so in the case of VM_DROPPABLE we introduce
VMA_DROPPABLE as the vma_flags_t equivalent, which is set to
EMPTY_VMA_FLAGS if the droppable flag is not available.
While we're here, we make the description of do_brk_flags() into a kdoc
comment, as it almost was already.
We use vma_flags_to_legacy() to not need to update the vm_get_page_prot()
logic as this time.
Note that in create_init_stack_vma() we have to replace the BUILD_BUG_ON()
with a VM_WARN_ON_ONCE() as the tested values are no longer build time
available.
We also update mprotect_fixup() to use VMA flags where possible, though we
have to live with a little duplication between vm_flags_t and vma_flags_t
values for the time being until further conversions are made.
While we're here, update VM_SPECIAL to be defined in terms of
VMA_SPECIAL_FLAGS now we have vma_flags_to_legacy().
Finally, we update the VMA tests to reflect these changes.
Link: https://lkml.kernel.org/r/d02e3e45d9a33d7904b149f5604904089fd640ae.1774034900.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> [SELinux]
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The pmdp_clear_flush_young() is used to clear the young flag and flush the
TLB, returning whether the young flag was set for this PMD entry. Change
the return type to bool to make the intention clearer.
Link: https://lkml.kernel.org/r/a668b9a974c0d675e7a41f6973bcbe3336e8b373.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Callers use pmdp_test_and_clear_young() to clear the young flag and check
whether it was set for this PMD entry. Change the return type to bool to
make the intention clearer.
Link: https://lkml.kernel.org/r/f1d31307a13365d3d0fed5809727dcc2dd59631b.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The ptep_clear_flush_young() and clear_flush_young_ptes() are used to
clear the young flag and flush the TLB, returning whether the young flag
was set. Change the return type to bool to make the intention clearer.
Link: https://lkml.kernel.org/r/24af5144b96103631594501f77d4525f2475c1be.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "change young flag check functions to return bool", v2.
This is a cleanup patchset to change all young flag check functions to
return bool, as discussed with David in the previous thread[1]. Since
callers only care about whether the young flag was set, returning bool
makes the intention clearer. No functional changes intended.
This patch (of 6):
Callers use ptep_test_and_clear_young() to clear the young flag and check
whether it was set. Change the return type to bool to make the intention
clearer.
Link: https://lkml.kernel.org/r/cover.1774075004.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/57e70efa9703d43959aa645246ea3cbdba14fa17.1774075004.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Move the asm/xor.h headers to lib/raid/xor/$(SRCARCH)/xor_arch.h and
include/linux/raid/xor_impl.h to lib/raid/xor/xor_impl.h so that the
xor.ko module implementation is self-contained in lib/raid/.
As this remove the asm-generic mechanism a new kconfig symbol is added to
indicate that a architecture-specific implementations exists, and
xor_arch.h should be included.
Link: https://lkml.kernel.org/r/20260327061704.3707577-22-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Sterba <dsterba@suse.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Li Nan <linan122@huawei.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Drop the pretty confusing historic XOR_TRY_TEMPLATES and
XOR_SELECT_TEMPLATE, and instead let the architectures provide a
arch_xor_init that calls either xor_register to register candidates or
xor_force to force a specific implementation.
Link: https://lkml.kernel.org/r/20260327061704.3707577-10-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Sterba <dsterba@suse.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Li Nan <linan122@huawei.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The prefix page address occupies a different number
of bits for z/Architecture versus ESA mode. Adjust the
definition to cover both, and permit an ESA mode
address within the nested codepath.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
|
Linux/KVM runs in z/Architecture-only mode. Although z/Architecture
is built upon a long history of hardware refinements, any other
CPU mode is not permitted.
Allow a userspace to explicitly enable the use of ESA mode for
nested guests, otherwise usage will be rejected.
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Add array_index_nospec() to syscall dispatch table lookup to prevent
limited speculative out-of-bounds access with user-controlled syscall
number
- Mark array_index_mask_nospec() __always_inline since GCC may emit an
out-of-line call instead of the inline data dependency sequence the
mitigation relies on
- Clear r12 on kernel entry to prevent potential speculative use of
user value in system_call, ext/io/mcck interrupt handlers
* tag 's390-7.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/entry: Scrub r12 register on kernel entry
s390/syscalls: Add spectre boundary for syscall dispatch table
s390/barrier: Make array_index_mask_nospec() __always_inline
|
|
Mark array_index_mask_nospec() as __always_inline to guarantee the
mitigation is emitted inline regardless of compiler inlining decisions.
Fixes: e2dd833389cc ("s390: add optimized array_index_mask_nospec")
Cc: stable@kernel.org
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Previously different architectures were using random sources of
differing strength and cost to decide the random kstack offset. A number
of architectures (loongarch, powerpc, s390, x86) were using their
timestamp counter, at whatever the frequency happened to be. Other
arches (arm64, riscv) were using entropy from the crng via
get_random_u16().
There have been concerns that in some cases the timestamp counters may
be too weak, because they can be easily guessed or influenced by user
space. And get_random_u16() has been shown to be too costly for the
level of protection kstack offset randomization provides.
So let's use a common, architecture-agnostic source of entropy; a
per-cpu prng, seeded at boot-time from the crng. This has a few
benefits:
- We can remove choose_random_kstack_offset(); That was only there to
try to make the timestamp counter value a bit harder to influence
from user space [*].
- The architecture code is simplified. All it has to do now is call
add_random_kstack_offset() in the syscall path.
- The strength of the randomness can be reasoned about independently
of the architecture.
- Arches previously using get_random_u16() now have much faster
syscall paths, see below results.
[*] Additionally, this gets rid of some redundant work on s390 and x86.
Before this patch, those architectures called
choose_random_kstack_offset() under arch_exit_to_user_mode_prepare(),
which is also called for exception returns to userspace which were *not*
syscalls (e.g. regular interrupts). Getting rid of
choose_random_kstack_offset() avoids a small amount of redundant work
for the non-syscall cases.
In some configurations, add_random_kstack_offset() will now call
instrumentable code, so for a couple of arches, I have moved the call a
bit later to the first point where instrumentation is allowed. This
doesn't impact the efficacy of the mechanism.
There have been some claims that a prng may be less strong than the
timestamp counter if not regularly reseeded. But the prng has a period
of about 2^113. So as long as the prng state remains secret, it should
not be possible to guess. If the prng state can be accessed, we have
bigger problems.
Additionally, we are only consuming 6 bits to randomize the stack, so
there are only 64 possible random offsets. I assert that it would be
trivial for an attacker to brute force by repeating their attack and
waiting for the random stack offset to be the desired one. The prng
approach seems entirely proportional to this level of protection.
Performance data are provided below. The baseline is v6.18 with rndstack
on for each respective arch. (I)/(R) indicate statistically significant
improvement/regression. arm64 platform is AWS Graviton3 (m7g.metal).
x86_64 platform is AWS Sapphire Rapids (m7i.24xlarge):
+-----------------+--------------+---------------+---------------+
| Benchmark | Result Class | per-cpu-prng | per-cpu-prng |
| | | arm64 (metal) | x86_64 (VM) |
+=================+==============+===============+===============+
| syscall/getpid | mean (ns) | (I) -9.50% | (I) -17.65% |
| | p99 (ns) | (I) -59.24% | (I) -24.41% |
| | p99.9 (ns) | (I) -59.52% | (I) -28.52% |
+-----------------+--------------+---------------+---------------+
| syscall/getppid | mean (ns) | (I) -9.52% | (I) -19.24% |
| | p99 (ns) | (I) -59.25% | (I) -25.03% |
| | p99.9 (ns) | (I) -59.50% | (I) -28.17% |
+-----------------+--------------+---------------+---------------+
| syscall/invalid | mean (ns) | (I) -10.31% | (I) -18.56% |
| | p99 (ns) | (I) -60.79% | (I) -20.06% |
| | p99.9 (ns) | (I) -61.04% | (I) -25.04% |
+-----------------+--------------+---------------+---------------+
I tested an earlier version of this change on x86 bare metal and it
showed a smaller but still significant improvement. The bare metal
system wasn't available this time around so testing was done in a VM
instance. I'm guessing the cost of rdtsc is higher for VMs.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://patch.msgid.link/20260303150840.3789438-3-ryan.roberts@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Provide an s390 specific arch_raw_cpu_ptr() implementation which avoids the
detour over get_lowcore() to get the lowcore pointer. The inline assembly
is implemented with an alternative so that relocated lowcore (percpu offset
is at a different address) is handled correctly.
This turns code like this
102f78: a7 39 00 00 lghi %r3,0
102f7c: e3 20 33 b8 00 08 ag %r2,952(%r3)
which adds the percpu offset to register r2 into a single instruction
102f7c: e3 20 33 b8 00 08 ag %r2,952(%r0)
and also avoids the need of a base register, thus reducing register
pressure.
With defconfig bloat-o-meter -t provides this result:
add/remove: 12/26 grow/shrink: 183/3391 up/down: 14880/-41950 (-27070)
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fixes for 7.0
- fix deadlock in new memory management
- handle kernel faults on donated memory properly
- fix bounds checking for irq routing + selftest
- fix invalid machine checks + logging
|
|
The recent XFER_TO_GUEST_WORK change resulted in a situation, where the
vsie code would interpret a signal during work as a machine check during
SIE as both use the EINTR return code.
The exit_reason of the sie64a function has nothing to do with the
kvm_run exit_reason. Rename it and define a specific code for machine
checks instead of abusing -EINTR.
rename exit_reason into sie_return to avoid the naming conflict
and change the code flow in vsie.c to have a separate variable for rc
and sie_return.
Fixes: 2bd1337a1295e ("KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions")
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux into features
Add s390 support for CONFIG_PAGE_TABLE_CHECK.
* 'page-table-check-support' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: Enable page table check for debug_defconfig
s390/pgtable: Add s390 support for page table check
s390/pgtable: Use set_pmd_bit() to invalidate PMD entry
mm/page_table_check: Pass mm_struct to pxx_user_accessible_page()
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Add page table check hooks into routines that modify user page
tables.
Unlike other architectures s390 does not have means to distinguish
between kernel and user page table entries. Rely on the fact the
page table check infrastructure itself operates on non-init_mm
memory spaces only.
Use the provided mm_struct to verify that the memory space is not
init_mm (aka not the kernel memory space) indeed. That check is
supposed to be succeeded already (on some code paths even twice).
If the passed memory space by contrast is init_mm that would be an
unexpected semantical change in generic code, so do VM_BUG_ON() in
such case.
Unset _SEGMENT_ENTRY_READ bit to indicate that pmdp_invalidate()
was applied against a huge PMD and is going to be updated by
set_pmd_at() shortly. The hook pmd_user_accessible_page() should
skip such entries until that, otherwise the page table accounting
falls apart and BUG_ON() gets hit as result.
The invalidated huge PMD entry should not be confused with a PROT_NONE
entry as reported by pmd_protnone(), though the entry characteristics
exactly match: _SEGMENT_ENTRY_LARGE is set while _SEGMENT_ENTRY_READ is
unset. Since pmd_protnone() implementation depends on NUMA_BALANCING
configuration option, it should not be used in pmd_user_accessible_page()
check, which is expected to be CONFIG_NUMA_BALANCING-agnostic.
Nevertheless, an invalidated huge PMD is technically still pmd_protnone()
entry and it should not break other code paths once _SEGMENT_ENTRY_READ
is unset. As of now, all pmd_protnone() checks are done under page table
locks or exercise GUP-fast and HMM code paths, which are expected to be
safe against concurrent page table updates.
Alternative approach would be using the last remaining unused PMD entry
bit 0x800 to indicate that pmdp_invalidate() was called on a PMD. That
would allow avoiding collisions with pmd_protnone() handling code paths,
but saving the bit is more preferable way to go.
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Tobias Huschle <huschle@linux.ibm.com>
Co-developed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/4db8a681205bd555298d62441cdcfca43317a35a.1772812343.git.agordeev@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Commit 3a5a8d343e1c ("mm: fix race between __split_huge_pmd_locked()
and GUP-fast") failed to follow the convention and used direct PMD
entry modification instead of set_pmd_bit().
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/a9248694a38cc898d3f0628f59b8abb57d56a416.1772812343.git.agordeev@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
ident_map_size is no longer a standalone variable and the declaration
in asm/setup.h conflicts with its current definition in asm/page.h
introduced with commit 236f324b7473 ("s390/mm: Create virtual memory
layout structure"). Remove the stale declaration.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The __stackleak_poison() inline assembly comes with a "count" operand where
the "d" constraint is used. "count" is used with the exrl instruction and
"d" means that the compiler may allocate any register from 0 to 15.
If the compiler would allocate register 0 then the exrl instruction would
not or the value of "count" into the executed instruction - resulting in a
stackframe which is only partially poisoned.
Use the correct "a" constraint, which excludes register 0 from register
allocation.
Fixes: 2a405f6bb3a5 ("s390/stackleak: provide fast __stackleak_poison() implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20260302133500.1560531-4-hca@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
psw_idle() does not exist anymore. Remove its prototype.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Inline update_timer_idle() again to avoid an extra function call. This
way the generated code is close to old assembler version again.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
With the conversion to generic entry [1] cpu idle exit cpu time accounting
was converted from assembly to C. This introduced an reversed order of cpu
time accounting.
On cpu idle exit the current accounting happens with the following call
chain:
-> do_io_irq()/do_ext_irq()
-> irq_enter_rcu()
-> account_hardirq_enter()
-> vtime_account_irq()
-> vtime_account_kernel()
vtime_account_kernel() accounts the passed cpu time since last_update_timer
as system time, and updates last_update_timer to the current cpu timer
value.
However the subsequent call of
-> account_idle_time_irq()
will incorrectly subtract passed cpu time from timer_idle_enter to the
updated last_update_timer value from system_timer. Then last_update_timer
is updated to a sys_enter_timer, which means that last_update_timer goes
back in time.
Subsequently account_hardirq_exit() will account too much cpu time as
hardirq time. The sum of all accounted cpu times is still correct, however
some cpu time which was previously accounted as system time is now
accounted as hardirq time, plus there is the oddity that last_update_timer
goes back in time.
Restore previous behavior by extracting cpu time accounting code from
account_idle_time_irq() into a new update_timer_idle() function and call it
before irq_enter_rcu().
Fixes: 56e62a737028 ("s390: convert to generic entry") [1]
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Make KEXEC_SIG available again for CONFIG_MODULES=n
- The s390 topology code used to call rebuild_sched_domains() before
common code scheduling domains were setup. This was silently ignored
by common code, but now results in a warning. Address by avoiding the
early call
- Convert debug area lock from spinlock to raw spinlock to address
lockdep warnings
- The recent 3490 tape device driver rework resulted in a different
device driver name, which is visible via sysfs for user space. This
breaks at least one user space application. Change the device driver
name back to its old name to fix this
* tag 's390-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/tape: Fix device driver name
s390/debug: Convert debug area lock from a spinlock to a raw spinlock
s390/smp: Avoid calling rebuild_sched_domains() early
s390/kexec: Make KEXEC_SIG available when CONFIG_MODULES=n
|
|
With PREEMPT_RT as potential configuration option, spinlock_t is now
considered as a sleeping lock, and thus might cause issues when used in
an atomic context. But even with PREEMPT_RT as potential configuration
option, raw_spinlock_t remains as a true spinning lock/atomic context.
This creates potential issues with the s390 debug/tracing feature. The
functions to trace errors are called in various contexts, including
under lock of raw_spinlock_t, and thus the used spinlock_t in each debug
area is in violation of the locking semantics.
Here are two examples involving failing PCI Read accesses that are
traced while holding `pci_lock` in `drivers/pci/access.c`:
=============================
[ BUG: Invalid wait context ]
6.19.0-devel #18 Not tainted
-----------------------------
bash/3833 is trying to lock:
0000027790baee30 (&rc->lock){-.-.}-{3:3}, at: debug_event_common+0xfc/0x300
other info that might help us debug this:
context-{5:5}
5 locks held by bash/3833:
#0: 0000027efbb29450 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x7c/0xf0
#1: 00000277f0504a90 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x13e/0x260
#2: 00000277beed8c18 (kn->active#339){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x164/0x260
#3: 00000277e9859190 (&dev->mutex){....}-{4:4}, at: pci_dev_lock+0x2e/0x40
#4: 00000383068a7708 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x4a/0xb0
stack backtrace:
CPU: 6 UID: 0 PID: 3833 Comm: bash Kdump: loaded Not tainted 6.19.0-devel #18 PREEMPTLAZY
Hardware name: IBM 9175 ME1 701 (LPAR)
Call Trace:
[<00000383048afec2>] dump_stack_lvl+0xa2/0xe8
[<00000383049ba166>] __lock_acquire+0x816/0x1660
[<00000383049bb1fa>] lock_acquire+0x24a/0x370
[<00000383059e3860>] _raw_spin_lock_irqsave+0x70/0xc0
[<00000383048bbb6c>] debug_event_common+0xfc/0x300
[<0000038304900b0a>] __zpci_load+0x17a/0x1f0
[<00000383048fad88>] pci_read+0x88/0xd0
[<00000383054cbce0>] pci_bus_read_config_dword+0x70/0xb0
[<00000383054d55e4>] pci_dev_wait+0x174/0x290
[<00000383054d5a3e>] __pci_reset_function_locked+0xfe/0x170
[<00000383054d9b30>] pci_reset_function+0xd0/0x100
[<00000383054ee21a>] reset_store+0x5a/0x80
[<0000038304e98758>] kernfs_fop_write_iter+0x1e8/0x260
[<0000038304d995da>] new_sync_write+0x13a/0x180
[<0000038304d9c5d0>] vfs_write+0x200/0x330
[<0000038304d9c88c>] ksys_write+0x7c/0xf0
[<00000383059cfa80>] __do_syscall+0x210/0x500
[<00000383059e4c06>] system_call+0x6e/0x90
INFO: lockdep is turned off.
=============================
[ BUG: Invalid wait context ]
6.19.0-devel #3 Not tainted
-----------------------------
bash/6861 is trying to lock:
0000009da05c7430 (&rc->lock){-.-.}-{3:3}, at: debug_event_common+0xfc/0x300
other info that might help us debug this:
context-{5:5}
5 locks held by bash/6861:
#0: 000000acff404450 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x7c/0xf0
#1: 000000acff41c490 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x13e/0x260
#2: 0000009da36937d8 (kn->active#75){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x164/0x260
#3: 0000009dd15250d0 (&zdev->state_lock){+.+.}-{4:4}, at: enable_slot+0x2e/0xc0
#4: 000001a19682f708 (pci_lock){....}-{2:2}, at: pci_bus_read_config_byte+0x42/0xa0
stack backtrace:
CPU: 16 UID: 0 PID: 6861 Comm: bash Kdump: loaded Not tainted 6.19.0-devel #3 PREEMPTLAZY
Hardware name: IBM 9175 ME1 701 (LPAR)
Call Trace:
[<000001a194837ec2>] dump_stack_lvl+0xa2/0xe8
[<000001a194942166>] __lock_acquire+0x816/0x1660
[<000001a1949431fa>] lock_acquire+0x24a/0x370
[<000001a19596b810>] _raw_spin_lock_irqsave+0x70/0xc0
[<000001a194843b6c>] debug_event_common+0xfc/0x300
[<000001a194888b0a>] __zpci_load+0x17a/0x1f0
[<000001a194882d88>] pci_read+0x88/0xd0
[<000001a195453b88>] pci_bus_read_config_byte+0x68/0xa0
[<000001a195457bc2>] pci_setup_device+0x62/0xad0
[<000001a195458e70>] pci_scan_single_device+0x90/0xe0
[<000001a19488a0f6>] zpci_bus_scan_device+0x46/0x80
[<000001a19547f958>] enable_slot+0x98/0xc0
[<000001a19547f134>] power_write_file+0xc4/0x110
[<000001a194e20758>] kernfs_fop_write_iter+0x1e8/0x260
[<000001a194d215da>] new_sync_write+0x13a/0x180
[<000001a194d245d0>] vfs_write+0x200/0x330
[<000001a194d2488c>] ksys_write+0x7c/0xf0
[<000001a195957a30>] __do_syscall+0x210/0x500
[<000001a19596cbb6>] system_call+0x6e/0x90
INFO: lockdep is turned off.
Since it is desired to keep it possible to create trace records in most
situations, including this particular case (failing PCI config space
accesses are relevant), convert the used spinlock_t in `struct
debug_info` to raw_spinlock_t.
The impact is small, as the debug area lock only protects bounded memory
access without external dependencies, apart from one function
debug_set_size() where kfree() is implicitly called with the lock held.
Move debug_info_free() out of this lock, to keep remove this external
dependency.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Pull KVM updates from Paolo Bonzini:
"Loongarch:
- Add more CPUCFG mask bits
- Improve feature detection
- Add lazy load support for FPU and binary translation (LBT) register
state
- Fix return value for memory reads from and writes to in-kernel
devices
- Add support for detecting preemption from within a guest
- Add KVM steal time test case to tools/selftests
ARM:
- Add support for FEAT_IDST, allowing ID registers that are not
implemented to be reported as a normal trap rather than as an UNDEF
exception
- Add sanitisation of the VTCR_EL2 register, fixing a number of
UXN/PXN/XN bugs in the process
- Full handling of RESx bits, instead of only RES0, and resulting in
SCTLR_EL2 being added to the list of sanitised registers
- More pKVM fixes for features that are not supposed to be exposed to
guests
- Make sure that MTE being disabled on the pKVM host doesn't give it
the ability to attack the hypervisor
- Allow pKVM's host stage-2 mappings to use the Force Write Back
version of the memory attributes by using the "pass-through'
encoding
- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
guest
- Preliminary work for guest GICv5 support
- A bunch of debugfs fixes, removing pointless custom iterators
stored in guest data structures
- A small set of FPSIMD cleanups
- Selftest fixes addressing the incorrect alignment of page
allocation
- Other assorted low-impact fixes and spelling fixes
RISC-V:
- Fixes for issues discoverd by KVM API fuzzing in
kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and
kvm_riscv_vcpu_aia_imsic_update()
- Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM
- Transparent huge page support for hypervisor page tables
- Adjust the number of available guest irq files based on MMIO
register sizes found in the device tree or the ACPI tables
- Add RISC-V specific paging modes to KVM selftests
- Detect paging mode at runtime for selftests
s390:
- Performance improvement for vSIE (aka nested virtualization)
- Completely new memory management. s390 was a special snowflake that
enlisted help from the architecture's page table management to
build hypervisor page tables, in particular enabling sharing the
last level of page tables. This however was a lot of code (~3K
lines) in order to support KVM, and also blocked several features.
The biggest advantages is that the page size of userspace is
completely independent of the page size used by the guest:
userspace can mix normal pages, THPs and hugetlbfs as it sees fit,
and in fact transparent hugepages were not possible before. It's
also now possible to have nested guests and guests with huge pages
running on the same host
- Maintainership change for s390 vfio-pci
- Small quality of life improvement for protected guests
x86:
- Add support for giving the guest full ownership of PMU hardware
(contexted switched around the fastpath run loop) and allowing
direct access to data MSRs and PMCs (restricted by the vPMU model).
KVM still intercepts access to control registers, e.g. to enforce
event filtering and to prevent the guest from profiling sensitive
host state. This is more accurate, since it has no risk of
contention and thus dropped events, and also has significantly less
overhead.
For more information, see the commit message for merge commit
bf2c3138ae36 ("Merge tag 'kvm-x86-pmu-6.20' ...")
- Disallow changing the virtual CPU model if L2 is active, for all
the same reasons KVM disallows change the model after the first
KVM_RUN
- Fix a bug where KVM would incorrectly reject host accesses to PV
MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled,
even if those were advertised as supported to userspace,
- Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs,
where KVM would attempt to read CR3 configuring an async #PF entry
- Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM
(for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL.
Only a few exports that are intended for external usage, and those
are allowed explicitly
- When checking nested events after a vCPU is unblocked, ignore
-EBUSY instead of WARNing. Userspace can sometimes put the vCPU
into what should be an impossible state, and spurious exit to
userspace on -EBUSY does not really do anything to solve the issue
- Also throw in the towel and drop the WARN on INIT/SIPI being
blocked when vCPU is in Wait-For-SIPI, which also resulted in
playing whack-a-mole with syzkaller stuffing architecturally
impossible states into KVM
- Add support for new Intel instructions that don't require anything
beyond enumerating feature flags to userspace
- Grab SRCU when reading PDPTRs in KVM_GET_SREGS2
- Add WARNs to guard against modifying KVM's CPU caps outside of the
intended setup flow, as nested VMX in particular is sensitive to
unexpected changes in KVM's golden configuration
- Add a quirk to allow userspace to opt-in to actually suppress EOI
broadcasts when the suppression feature is enabled by the guest
(currently limited to split IRQCHIP, i.e. userspace I/O APIC).
Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an
option as some userspaces have come to rely on KVM's buggy behavior
(KVM advertises Supress EOI Broadcast irrespective of whether or
not userspace I/O APIC supports Directed EOIs)
- Clean up KVM's handling of marking mapped vCPU pages dirty
- Drop a pile of *ancient* sanity checks hidden behind in KVM's
unused ASSERT() macro, most of which could be trivially triggered
by the guest and/or user, and all of which were useless
- Fold "struct dest_map" into its sole user, "struct rtc_status", to
make it more obvious what the weird parameter is used for, and to
allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n
- Bury all of ioapic.h, i8254.h and related ioctls (including
KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y
- Add a regression test for recent APICv update fixes
- Handle "hardware APIC ISR", a.k.a. SVI, updates in
kvm_apic_update_apicv() to consolidate the updates, and to
co-locate SVI updates with the updates for KVM's own cache of ISR
information
- Drop a dead function declaration
- Minor cleanups
x86 (Intel):
- Rework KVM's handling of VMCS updates while L2 is active to
temporarily switch to vmcs01 instead of deferring the update until
the next nested VM-Exit.
The deferred updates approach directly contributed to several bugs,
was proving to be a maintenance burden due to the difficulty in
auditing the correctness of deferred updates, and was polluting
"struct nested_vmx" with a growing pile of booleans
- Fix an SGX bug where KVM would incorrectly try to handle EPCM page
faults, and instead always reflect them into the guest. Since KVM
doesn't shadow EPCM entries, EPCM violations cannot be due to KVM
interference and can't be resolved by KVM
- Fix a bug where KVM would register its posted interrupt wakeup
handler even if loading kvm-intel.ko ultimately failed
- Disallow access to vmcb12 fields that aren't fully supported,
mostly to avoid weirdness and complexity for FRED and other
features, where KVM wants enable VMCS shadowing for fields that
conditionally exist
- Print out the "bad" offsets and values if kvm-intel.ko refuses to
load (or refuses to online a CPU) due to a VMCS config mismatch
x86 (AMD):
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure
- Add support for virtualizing ERAPS. Note, correct virtualization of
ERAPS relies on an upcoming, publicly announced change in the APM
to reduce the set of conditions where hardware (i.e. KVM) *must*
flush the RAP
- Ignore nSVM intercepts for instructions that are not supported
according to L1's virtual CPU model
- Add support for expedited writes to the fast MMIO bus, a la VMX's
fastpath for EPT Misconfig
- Don't set GIF when clearing EFER.SVME, as GIF exists independently
of SVM, and allow userspace to restore nested state with GIF=0
- Treat exit_code as an unsigned 64-bit value through all of KVM
- Add support for fetching SNP certificates from userspace
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when
emulating VMLOAD or VMSAVE on behalf of L2
- Misc fixes and cleanups
x86 selftests:
- Add a regression test for TPR<=>CR8 synchronization and IRQ masking
- Overhaul selftest's MMU infrastructure to genericize stage-2 MMU
support, and extend x86's infrastructure to support EPT and NPT
(for L2 guests)
- Extend several nested VMX tests to also cover nested SVM
- Add a selftest for nested VMLOAD/VMSAVE
- Rework the nested dirty log test, originally added as a regression
test for PML where KVM logged L2 GPAs instead of L1 GPAs, to
improve test coverage and to hopefully make the test easier to
understand and maintain
guest_memfd:
- Remove kvm_gmem_populate()'s preparation tracking and half-baked
hugepage handling. SEV/SNP was the only user of the tracking and it
can do it via the RMP
- Retroactively document and enforce (for SNP) that
KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the
source page to be 4KiB aligned, to avoid non-trivial complexity for
something that no known VMM seems to be doing and to avoid an API
special case for in-place conversion, which simply can't support
unaligned sources
- When populating guest_memfd memory, GUP the source page in common
code and pass the refcounted page to the vendor callback, instead
of letting vendor code do the heavy lifting. Doing so avoids a
looming deadlock bug with in-place due an AB-BA conflict betwee
mmap_lock and guest_memfd's filemap invalidate lock
Generic:
- Fix a bug where KVM would ignore the vCPU's selected address space
when creating a vCPU-specific mapping of guest memory. Actually
this bug could not be hit even on x86, the only architecture with
multiple address spaces, but it's a bug nevertheless"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits)
KVM: s390: Increase permitted SE header size to 1 MiB
MAINTAINERS: Replace backup for s390 vfio-pci
KVM: s390: vsie: Fix race in acquire_gmap_shadow()
KVM: s390: vsie: Fix race in walk_guest_tables()
KVM: s390: Use guest address to mark guest page dirty
irqchip/riscv-imsic: Adjust the number of available guest irq files
RISC-V: KVM: Transparent huge page support
RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test
RISC-V: KVM: Allow Zalasr extensions for Guest/VM
KVM: riscv: selftests: Add riscv vm satp modes
KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test
riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM
RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized
RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr()
RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr()
RISC-V: KVM: Remove unnecessary 'ret' assignment
KVM: s390: Add explicit padding to struct kvm_s390_keyop
KVM: LoongArch: selftests: Add steal time test case
LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side
LoongArch: KVM: Add paravirt preempt feature in hypervisor side
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves
disk space by teaching ocfs2 to reclaim suballocator block group
space (Heming Zhao)
- "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the
ARRAY_END() macro and uses it in various places (Alejandro Colomar)
- "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes
the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the
page size (Pnina Feder)
- "kallsyms: Prevent invalid access when showing module buildid" cleans
up kallsyms code related to module buildid and fixes an invalid
access crash when printing backtraces (Petr Mladek)
- "Address page fault in ima_restore_measurement_list()" fixes a
kexec-related crash that can occur when booting the second-stage
kernel on x86 (Harshit Mogalapalli)
- "kho: ABI headers and Documentation updates" updates the kexec
handover ABI documentation (Mike Rapoport)
- "Align atomic storage" adds the __aligned attribute to atomic_t and
atomic64_t definitions to get natural alignment of both types on
csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)
- "kho: clean up page initialization logic" simplifies the page
initialization logic in kho_restore_page() (Pratyush Yadav)
- "Unload linux/kernel.h" moves several things out of kernel.h and into
more appropriate places (Yury Norov)
- "don't abuse task_struct.group_leader" removes the usage of
->group_leader when it is "obviously unnecessary" (Oleg Nesterov)
- "list private v2 & luo flb" adds some infrastructure improvements to
the live update orchestrator (Pasha Tatashin)
* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
procfs: fix missing RCU protection when reading real_parent in do_task_stat()
watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
kho: fix doc for kho_restore_pages()
tests/liveupdate: add in-kernel liveupdate test
liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
liveupdate: luo_file: Use private list
list: add kunit test for private list primitives
list: add primitives for private list manipulations
delayacct: fix uapi timespec64 definition
panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
netclassid: use thread_group_leader(p) in update_classid_task()
RDMA/umem: don't abuse current->group_leader
drm/pan*: don't abuse current->group_leader
drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
drm/amdgpu: don't abuse current->group_leader
android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
android/binder: don't abuse current->group_leader
kho: skip memoryless NUMA nodes when reserving scratch areas
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "powerpc/64s: do not re-activate batched TLB flush" makes
arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev)
It adds a generic enter/leave layer and switches architectures to use
it. Various hacks were removed in the process.
- "zram: introduce compressed data writeback" implements data
compression for zram writeback (Richard Chang and Sergey Senozhatsky)
- "mm: folio_zero_user: clear page ranges" adds clearing of contiguous
page ranges for hugepages. Large improvements during demand faulting
are demonstrated (David Hildenbrand)
- "memcg cleanups" tidies up some memcg code (Chen Ridong)
- "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos
stats" improves DAMOS stat's provided information, deterministic
control, and readability (SeongJae Park)
- "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few
issues in the hugetlb cgroup charging selftests (Li Wang)
- "Fix va_high_addr_switch.sh test failure - again" addresses several
issues in the va_high_addr_switch test (Chunyu Hu)
- "mm/damon/tests/core-kunit: extend existing test scenarios" improves
the KUnit test coverage for DAMON (Shu Anzai)
- "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a
glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to
transiently return -EAGAIN (Shivank Garg)
- "arch, mm: consolidate hugetlb early reservation" reworks and
consolidates a pile of straggly code related to reservation of
hugetlb memory from bootmem and creation of CMA areas for hugetlb
(Mike Rapoport)
- "mm: clean up anon_vma implementation" cleans up the anon_vma
implementation in various ways (Lorenzo Stoakes)
- "tweaks for __alloc_pages_slowpath()" does a little streamlining of
the page allocator's slowpath code (Vlastimil Babka)
- "memcg: separate private and public ID namespaces" cleans up the
memcg ID code and prevents the internal-only private IDs from being
exposed to userspace (Shakeel Butt)
- "mm: hugetlb: allocate frozen gigantic folio" cleans up the
allocation of frozen folios and avoids some atomic refcount
operations (Kefeng Wang)
- "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement
of memory betewwn the active and inactive LRUs and adds auto-tuning
of the ratio-based quotas and of monitoring intervals (SeongJae Park)
- "Support page table check on PowerPC" makes
CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan)
- "nodemask: align nodes_and{,not} with underlying bitmap ops" makes
nodes_and() and nodes_andnot() propagate the return values from the
underlying bit operations, enabling some cleanup in calling code
(Yury Norov)
- "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up
some DAMON internal interfaces (SeongJae Park)
- "mm/khugepaged: cleanups and scan limit fix" does some cleanup work
in khupaged and fixes a scan limit accounting issue (Shivank Garg)
- "mm: balloon infrastructure cleanups" goes to town on the balloon
infrastructure and its page migration function. Mainly cleanups, also
some locking simplification (David Hildenbrand)
- "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds
additional tracepoints to the page reclaim code (Jiayuan Chen)
- "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is
part of Marco's kernel-wide migration from the legacy workqueue APIs
over to the preferred unbound workqueues (Marco Crivellari)
- "Various mm kselftests improvements/fixes" provides various unrelated
improvements/fixes for the mm kselftests (Kevin Brodsky)
- "mm: accelerate gigantic folio allocation" greatly speeds up gigantic
folio allocation, mainly by avoiding unnecessary work in
pfn_range_valid_contig() (Kefeng Wang)
- "selftests/damon: improve leak detection and wss estimation
reliability" improves the reliability of two of the DAMON selftests
(SeongJae Park)
- "mm/damon: cleanup kdamond, damon_call(), damos filter and
DAMON_MIN_REGION" does some cleanup work in the core DAMON code
(SeongJae Park)
- "Docs/mm/damon: update intro, modules, maintainer profile, and misc"
performs maintenance work on the DAMON documentation (SeongJae Park)
- "mm: add and use vma_assert_stabilised() helper" refactors and cleans
up the core VMA code. The main aim here is to be able to use the mmap
write lock's lockdep state to perform various assertions regarding
the locking which the VMA code requires (Lorenzo Stoakes)
- "mm, swap: swap table phase II: unify swapin use" removes some old
swap code (swap cache bypassing and swap synchronization) which
wasn't working very well. Various other cleanups and simplifications
were made. The end result is a 20% speedup in one benchmark (Kairui
Song)
- "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM
available on 64-bit alpha, loongarch, mips, parisc, and um. Various
cleanups were performed along the way (Qi Zheng)
* tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits)
mm/memory: handle non-split locks correctly in zap_empty_pte_table()
mm: move pte table reclaim code to memory.c
mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
um: mm: enable MMU_GATHER_RCU_TABLE_FREE
parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h
mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles
zsmalloc: make common caches global
mm: add SPDX id lines to some mm source files
mm/zswap: use %pe to print error pointers
mm/vmscan: use %pe to print error pointers
mm/readahead: fix typo in comment
mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
mm: refactor vma_map_pages to use vm_insert_pages
mm/damon: unify address range representation with damon_addr_range
mm/cma: replace snprintf with strscpy in cma_new_area
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance event updates from Ingo Molnar:
"x86 PMU driver updates:
- Add support for the core PMU for Intel Diamond Rapids (DMR) CPUs
(Dapeng Mi)
Compared to previous iterations of the Intel PMU code, there's been
a lot of changes, which center around three main areas:
- Introduce the OFF-MODULE RESPONSE (OMR) facility to replace the
Off-Core Response (OCR) facility
- New PEBS data source encoding layout
- Support the new "RDPMC user disable" feature
- Likewise, a large series adds uncore PMU support for Intel Diamond
Rapids (DMR) CPUs (Zide Chen)
This centers around these four main areas:
- DMR may have two Integrated I/O and Memory Hub (IMH) dies,
separate from the compute tile (CBB) dies. Each CBB and each IMH
die has its own discovery domain.
- Unlike prior CPUs that retrieve the global discovery table
portal exclusively via PCI or MSR, DMR uses PCI for IMH PMON
discovery and MSR for CBB PMON discovery.
- DMR introduces several new PMON types: SCA, HAMVF, D2D_ULA, UBR,
PCIE4, CRS, CPC, ITC, OTC, CMS, and PCIE6.
- IIO free-running counters in DMR are MMIO-based, unlike SPR.
- Also add support for Add missing PMON units for Intel Panther Lake,
and support Nova Lake (NVL), which largely maps to Panther Lake.
(Zide Chen)
- KVM integration: Add support for mediated vPMUs (by Kan Liang and
Sean Christopherson, with fixes and cleanups by Peter Zijlstra,
Sandipan Das and Mingwei Zhang)
- Add Intel cstate driver to support for Wildcat Lake (WCL) CPUs,
which are a low-power variant of Panther Lake (Zide Chen)
- Add core, cstate and MSR PMU support for the Airmont NP Intel CPU
(aka MaxLinear Lightning Mountain), which maps to the existing
Airmont code (Martin Schiller)
Performance enhancements:
- Speed up kexec shutdown by avoiding unnecessary cross CPU calls
(Jan H. Schönherr)
- Fix slow perf_event_task_exit() with LBR callstacks (Namhyung Kim)
User-space stack unwinding support:
- Various cleanups and refactorings in preparation to generalize the
unwinding code for other architectures (Jens Remus)
Uprobes updates:
- Transition from kmap_atomic to kmap_local_page (Keke Ming)
- Fix incorrect lockdep condition in filter_chain() (Breno Leitao)
- Fix XOL allocation failure for 32-bit tasks (Oleg Nesterov)
Misc fixes and cleanups:
- s390: Remove kvm_types.h from Kbuild (Randy Dunlap)
- x86/intel/uncore: Convert comma to semicolon (Chen Ni)
- x86/uncore: Clean up const mismatch (Greg Kroah-Hartman)
- x86/ibs: Fix typo in dc_l2tlb_miss comment (Xiang-Bin Shi)"
* tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
s390: remove kvm_types.h from Kbuild
uprobes: Fix incorrect lockdep condition in filter_chain()
x86/ibs: Fix typo in dc_l2tlb_miss comment
x86/uprobes: Fix XOL allocation failure for 32-bit tasks
perf/x86/intel/uncore: Convert comma to semicolon
perf/x86/intel: Add support for rdpmc user disable feature
perf/x86: Use macros to replace magic numbers in attr_rdpmc
perf/x86/intel: Add core PMU support for Novalake
perf/x86/intel: Add support for PEBS memory auxiliary info field in NVL
perf/x86/intel: Add core PMU support for DMR
perf/x86/intel: Add support for PEBS memory auxiliary info field in DMR
perf/x86/intel: Support the 4 new OMR MSRs introduced in DMR and NVL
perf/core: Fix slow perf_event_task_exit() with LBR callstacks
perf/core: Speed up kexec shutdown by avoiding unnecessary cross CPU calls
uprobes: use kmap_local_page() for temporary page mappings
arm/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
mips/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
arm64/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
riscv/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
perf/x86/intel/uncore: Add Nova Lake support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
"API:
- Fix race condition in hwrng core by using RCU
Algorithms:
- Allow authenc(sha224,rfc3686) in fips mode
- Add test vectors for authenc(hmac(sha384),cbc(aes))
- Add test vectors for authenc(hmac(sha224),cbc(aes))
- Add test vectors for authenc(hmac(md5),cbc(des3_ede))
- Add lz4 support in hisi_zip
- Only allow clear key use during self-test in s390/{phmac,paes}
Drivers:
- Set rng quality to 900 in airoha
- Add gcm(aes) support for AMD/Xilinx Versal device
- Allow tfms to share device in hisilicon/trng"
* tag 'v7.0-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (100 commits)
crypto: img-hash - Use unregister_ahashes in img_{un}register_algs
crypto: testmgr - Add test vectors for authenc(hmac(md5),cbc(des3_ede))
crypto: cesa - Simplify return statement in mv_cesa_dequeue_req_locked
crypto: testmgr - Add test vectors for authenc(hmac(sha224),cbc(aes))
crypto: testmgr - Add test vectors for authenc(hmac(sha384),cbc(aes))
hwrng: core - use RCU and work_struct to fix race condition
crypto: starfive - Fix memory leak in starfive_aes_aead_do_one_req()
crypto: xilinx - Fix inconsistant indentation
crypto: rng - Use unregister_rngs in register_rngs
crypto: atmel - Use unregister_{aeads,ahashes,skciphers}
hwrng: optee - simplify OP-TEE context match
crypto: ccp - Add sysfs attribute for boot integrity
dt-bindings: crypto: atmel,at91sam9g46-sha: add microchip,lan9691-sha
dt-bindings: crypto: atmel,at91sam9g46-aes: add microchip,lan9691-aes
dt-bindings: crypto: qcom,inline-crypto-engine: document the Milos ICE
crypto: caam - fix netdev memory leak in dpaa2_caam_probe
crypto: hisilicon/qm - increase wait time for mailbox
crypto: hisilicon/qm - obtain the mailbox configuration at one time
crypto: hisilicon/qm - remove unnecessary code in qm_mb_write()
crypto: hisilicon/qm - move the barrier before writing to the mailbox register
...
|
|
kvm_types.h is mandatory in include/asm-generic/Kbuild so having it
in another Kbuild file causes a warning. Remove it from the arch/
Kbuild file to fix the warning.
../scripts/Makefile.asm-headers:39: redundant generic-y found in ../arch/s390/include/asm/Kbuild: kvm_types.h
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260203184204.1329414-1-rdunlap@infradead.org
|
|
Remove the PGSTE config option.
Remove all code from linux/s390 mm that involves PGSTEs.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
Remove the now unused include/asm/gmap.h and mm/gmap.c files.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
Switch KVM/s390 to use the new gmap code.
Remove includes to <gmap.h> and include "gmap.h" instead; fix all the
existing users of the old gmap functions to use the new ones instead.
Fix guest storage key access functions to work with the new gmap.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
Switch to using IS_ENABLED(CONFIG_KVM) instead of CONFIG_PGSTE, since
the latter will be removed soon.
Many CONFIG_PGSTE are left behind, because they will be removed
completely in upcoming patches. The ones replaced here are mostly the
ones that will stay.
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
Add some helper functions for handling multiple guest faults at the
same time.
This will be needed for VSIE, where a nested guest access also needs to
access all the page tables that map it.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
Enable KVM_GENERIC_MMU_NOTIFIER, for now with empty placeholder callbacks.
Also enable KVM_MMU_LOCKLESS_AGING and define KVM_HAVE_MMU_RWLOCK.
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
If uv_convert_from_secure_pte() fails, the page becomes unusable by the
host. The failure can only occour in case of hardware malfunction or a
serious KVM bug.
When the unusable page is reused, the system can have issues and
hang.
Print a warning to aid debugging such unlikely scenarios.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
Export __make_folio_secure() and s390_wiggle_split_folio(), as they will
be needed to be used by KVM.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
Introduce import_lock to avoid future races when converting pages to
secure.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
Add gmap_helper_set_unused() to mark userspace ptes as unused.
Core mm code will use that information to discard unused pages instead
of attempting to swap them.
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Tested-by: Nico Boehr <nrb@linux.ibm.com>
Acked-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|