| Age | Commit message (Collapse) | Author |
|
Add a selftest to verify reads and writes to various MSRs, from both the
guest and host, and expect success/failure based on whether or not the
vCPU supports the MSR according to supported CPUID.
Note, this test is extremely similar to KVM-Unit-Test's "msr" test, but
provides more coverage with respect to host accesses, and will be extended
to provide addition testing of CPUID-based features, save/restore lists,
and KVM_{G,S}ET_ONE_REG, all which are extremely difficult to validate in
KUT.
If kvm.ignore_msrs=true, skip the unsupported and reserved testcases as
KVM's ABI is a mess; what exactly is supposed to be ignored, and when,
varies wildly.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-46-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Merge the queue of KVM selftests changes for 6.18 to pick up the ex_str()
helper so that it can be used to pretty print expected versus actual
exceptions in a new MSR selftest. CET virtualization will add support for
several MSRs with non-trivial semantics, along with new uAPI for accessing
the guest's Shadow Stack Pointer (SSP) from userspace.
|
|
Steal exception_mnemonic() from KVM-Unit-Tests as ex_str() (to keep line
lengths reasonable) and use it in assert messages that currently print the
raw vector number.
Co-developed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-45-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The TODO about using the number of vCPUs instead of vcpu.id + 1
was already addressed by commit 376bc1b458c9 ("KVM: selftests: Don't
assume vcpu->id is '0' in xAPIC state test"). The comment is now
stale and can be removed.
Signed-off-by: Sukrut Heroorkar <hsukrut3@gmail.com>
Link: https://lore.kernel.org/r/20250908210547.12748-1-hsukrut3@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add a PMU errata framework and use it to relax precise event counts on
Atom platforms that overcount "Instruction Retired" and "Branch Instruction
Retired" events, as the overcount issues on VM-Exit/VM-Entry are impossible
to prevent from userspace, e.g. the test can't prevent host IRQs.
Setup errata during early initialization and automatically sync the mask
to VMs so that tests can check for errata without having to manually
manage host=>guest variables.
For Intel Atom CPUs, the PMU events "Instruction Retired" or
"Branch Instruction Retired" may be overcounted for some certain
instructions, like FAR CALL/JMP, RETF, IRET, VMENTRY/VMEXIT/VMPTRLD
and complex SGX/SMX/CSTATE instructions/flows.
The detailed information can be found in the errata (section SRF7):
https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/sierra-forest/xeon-6700-series-processor-with-e-cores-specification-update/errata-details/
For the Atom platforms before Sierra Forest (including Sierra Forest),
Both 2 events "Instruction Retired" and "Branch Instruction Retired" would
be overcounted on these certain instructions, but for Clearwater Forest
only "Instruction Retired" event is overcounted on these instructions.
Signed-off-by: dongsheng <dongsheng.x.zhang@intel.com>
Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add support for 5 new architectural events (4 topdown level 1 metrics
events and LBR inserts event) that will first show up in Intel's
Clearwater Forest CPUs. Detailed info about the new events can be found
in SDM section 21.2.7 "Pre-defined Architectural Performance Events".
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
[sean: drop "unavailable_mask" changes]
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Reduce the number of combinations of unavailable PMU events masks that are
testing by the PMU counters test. In reality, testing every possible
combination isn't all that interesting, and certainly not worth the tens
of seconds (or worse, minutes) of runtime. Fully testing the N^2 space
will be especially problematic in the near future, as 5! new arch events
are on their way.
Use alternating bit patterns (and 0 and -1u) in the hopes that _if_ there
is ever a KVM bug, it's not something horribly convoluted that shows up
only with a super specific pattern/value.
Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Track the mask of "unavailable" PMU events as a 32-bit value. While bits
31:9 are currently reserved, silently truncating those bits is unnecessary
and asking for missed coverage. To avoid running afoul of the sanity check
in vcpu_set_cpuid_property(), explicitly adjust the mask based on the
non-reserved bits as reported by KVM's supported CPUID.
Opportunistically update the "all ones" testcase to pass -1u instead of
0xff.
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
A new bit PERF_CAPABILITIES[17] called "PEBS_TIMING_INFO" bit is added
to indicated if PEBS supports to record timing information in a new
"Retried Latency" field.
Since KVM requires user can only set host consistent PEBS capabilities,
otherwise the PERF_CAPABILITIES setting would fail, add pebs_timing_info
into the "immutable_caps" to block host inconsistent PEBS configuration
and cause errors.
Opportunistically drop the anythread_deprecated bit. It isn't and likely
never was a PERF_CAPABILITIES flag, the test's definition snuck in when
the union was copy+pasted from the kernel's definition.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
[sean: call out anythread_deprecated change]
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250919214648.1585683-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add stress tests for BPF task-work scheduling kfuncs. The tests spawn
multiple threads that concurrently schedule task_work callbacks against
the same and different map values to exercise the kfuncs under high
contention.
Verify callbacks are reliably enqueued and executed with no drops.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20250923112404.668720-10-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Introducing selftests that check BPF task work scheduling mechanism.
Validate that verifier does not accepts incorrect calls to
bpf_task_work_schedule kfunc.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250923112404.668720-9-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Exercise the scenario described in detail in the cover letter:
1) socket A: connect() from ephemeral port X
2) socket B: explicitly bind() to port X
3) check that port X is now excluded from ephemeral ports
4) close socket B to release the port bind
5) socket C: connect() from ephemeral port X
As well as a corner case to test that the connect-bind flag is cleared:
1) connect() from ephemeral port X
2) disconnect the socket with connect(AF_UNSPEC)
3) bind() it explicitly to port X
4) check that port X is now excluded from ephemeral ports
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20250917-update-bind-bucket-state-on-unhash-v5-2-57168b661b47@cloudflare.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for
nested file systems") we introduced the ability for stacked drivers and
file systems to correctly invoke the f_op->mmap_prepare() handler from an
f_op->mmap() handler via a compatibility layer implemented in
compat_vma_mmap_prepare().
This populates vm_area_desc fields according to those found in the (not
yet fully initialised) VMA passed to f_op->mmap().
However this function implicitly assumes that the struct file which we are
operating upon is equal to vma->vm_file. This is not a safe assumption in
all cases.
The only really sane situation in which this matters would be something
like e.g. i915_gem_dmabuf_mmap() which invokes vfs_mmap() against
obj->base.filp:
ret = vfs_mmap(obj->base.filp, vma);
if (ret)
return ret;
And then sets the VMA's file to this, should the mmap operation succeed:
vma_set_file(vma, obj->base.filp);
That is - it is the file that is intended to back the VMA mapping.
This is not an issue currently, as so far we have only implemented
f_op->mmap_prepare() handlers for some file systems and internal mm uses,
and the only stacked f_op->mmap() operations that can be performed upon
these are those in backing_file_mmap() and coda_file_mmap(), both of which
use vma->vm_file.
However, moving forward, as we convert drivers to using
f_op->mmap_prepare(), this will become a problem.
Resolve this issue by explicitly setting desc->file to the provided file
parameter and update callers accordingly.
Callers are expected to read desc->file and update desc->vm_file - the
former will be the file provided by the caller (if stacked, this may
differ from vma->vm_file).
If the caller needs to differentiate between the two they therefore now
can.
While we are here, also provide a variant of compat_vma_mmap_prepare()
that operates against a pointer to any file_operations struct and does not
assume that the file_operations struct we are interested in is file->f_op.
This function is __compat_vma_mmap_prepare() and we invoke it from
compat_vma_mmap_prepare() so that we share code between the two functions.
This is important, because some drivers provide hooks in a separate
struct, for instance struct drm_device provides an fops field for this
purpose.
Also update the VMA selftests accordingly.
Link: https://lkml.kernel.org/r/dd0c72df8a33e8ffaa243eeb9b01010b670610e9.1756920635.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm: do not assume file == vma->vm_file in
compat_vma_mmap_prepare()", v2.
As part of the efforts to eliminate the problematic f_op->mmap callback, a
new callback - f_op->mmap_prepare was provided.
While we are converting these callbacks, we must deal with 'stacked'
filesystems and drivers - those which in their own f_op->mmap callback
invoke an inner f_op->mmap callback.
To accomodate for this, a compatibility layer is provided that, via
vfs_mmap(), detects if f_op->mmap_prepare is provided and if so, generates
a vm_area_desc containing the VMA's metadata and invokes the call.
So far, we have provided desc->file equal to vma->vm_file. However this
is not necessarily valid, especially in the case of stacked drivers which
wish to assign a new file after the inner hook is invoked.
To account for this, we adjust vm_area_desc to have both file and vm_file
fields. The .vm_file field is strictly set to vma->vm_file (or in the
case of a new mapping, what will become vma->vm_file).
However, .file is set to whichever file vfs_mmap() is invoked with when
using the compatibilty layer.
Therefore, if the VMA's file needs to be updated in .mmap_prepare,
desc->vm_file should be assigned, whilst desc->file should be read.
No current f_op->mmap_prepare users assign desc->file so this is safe to
do.
This makes the .mmap_prepare callback in the context of a stacked
filesystem or driver completely consistent with the existing .mmap
implementations.
While we're here, we do a few small cleanups, and ensure that we const-ify
things correctly in the vm_area_desc struct to avoid hooks accidentally
trying to assign fields they should not.
This patch (of 2):
Stacked filesystems and drivers may invoke mmap hooks with a struct file
pointer that differs from the overlying file. We will make this
functionality possible in a subsequent patch.
In order to prepare for this, let's update vm_area_struct to separately
provide desc->file and desc->vm_file parameters.
The desc->file parameter is the file that the hook is expected to operate
upon, and is not assignable (though the hok may wish to e.g. update the
file's accessed time for instance).
The desc->vm_file defaults to what will become vma->vm_file and is what
the hook must reassign should it wish to change the VMA"s vma->vm_file.
For now we keep desc->file, vm_file the same to remain consistent.
No f_op->mmap_prepare() callback sets a new vma->vm_file currently, so
this is safe to change.
While we're here, make the mm_struct desc->mm pointers at immutable as
well as the desc->mm field itself.
As part of this change, also update the single hook which this would
otherwise break - mlock_future_ok(), invoked by secretmem_mmap_prepare()).
We additionally update set_vma_from_desc() to compare fields in a more
logical fashion, checking the (possibly) user-modified fields as the first
operand against the existing value as the second one.
Additionally, update VMA tests to accommodate changes.
Link: https://lkml.kernel.org/r/cover.1756920635.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/3fa15a861bb7419f033d22970598aa61850ea267.1756920635.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pedro Falcato <pfalcato@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The test harness uses the verify_sig_setup.sh to generate the required
key material for program signing.
Generate key material for signing LSKEL some lskel programs and use
xxd to convert the verification certificate into a C header file.
Finally, update the main test runner to load this
certificate into the session keyring via the add_key() syscall before
executing any tests. Use the session keyring in the tests with signed
programs.
Signed-off-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/r/20250921160120.9711-6-kpsingh@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
server-side info linked to the MPTCP connect/established events can now
come from the flags, in addition to the dedicated attribute.
The attribute is now deprecated -- in favour of the new flag, and will
be removed later on.
Print this info only once.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-4-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This attribute is a boolean. No need to add it to set it to 'false'.
Indeed, the default value when this attribute is not set is naturally
'false'. A few bytes can then be saved by not adding this attribute if
the connection is not on the server side.
This prepares the future deprecation of its attribute, in favour of a
new flag.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250919-net-next-mptcp-server-side-flag-v1-1-a97a5d561a8b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Quoted from musl wiki:
GNU getopt permutes argv to pull options to the front, ahead of
non-option arguments. musl and the POSIX standard getopt stop
processing options at the first non-option argument with no
permutation.
Thus these scripts stop working on musl since non-option arguments for
tools using getopt() (in this case, (ar)ping) do not always come last.
Fix it by reordering arguments.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250919053538.1106753-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:
"Fix two user triggerable use-after-free issues:
- Possible race UAF setting up mmaps
- Syzkaller found UAF when erroring an file descriptor creation ioctl
due to the fput() work queue"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
iommufd/selftest: Update the fail_nth limit
iommufd: WARN if an object is aborted with an elevated refcount
iommufd: Fix race during abort for file descriptors
iommufd: Fix refcounting race during mmap
|
|
Remove the IsBranch flag from ENTER and LEAVE in KVM's emulator, as ENTER
and LEAVE are stack operations, not branches. Add forced emulation of
said instructions to the PMU counters test to prove that KVM diverges from
hardware, and to guard against regressions.
Opportunistically add a missing "1 MOV" to the selftest comment regarding
the number of instructions per loop, which commit 7803339fa929 ("KVM:
selftests: Use data load to trigger LLC references/misses in Intel PMU")
forgot to add.
Fixes: 018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions")
Cc: Jim Mattson <jmattson@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://lore.kernel.org/r/20250919004639.1360453-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The while loop doesn't execute and following warning gets generated:
protection_keys.c:561:15: warning: code will never be executed
[-Wunreachable-code]
int rpkey = alloc_random_pkey();
Let's enable the while loop such that it gets executed nr_iterations
times. Simplify the code a bit as well.
Link: https://lkml.kernel.org/r/20250912123025.1271051-3-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "selftests/mm: Add -Wunreachable-code and fix warnings".
Add -Wunreachable-code to selftests and remove dead code from generated
warnings.
This patch (of 2):
Enable -Wunreachable-code flag to catch dead code and fix them.
1. Remove the dead code and write a comment instead:
hmm-tests.c:2033:3: warning: code will never be executed
[-Wunreachable-code]
perror("Should not reach this\n");
^~~~~~
2. ksft_exit_fail_msg() calls exit(). So cleanup isn't done. Replace it
with ksft_print_msg().
split_huge_page_test.c:301:3: warning: code will never be executed
[-Wunreachable-code]
goto cleanup;
^~~~~~~~~~~~
3. Remove duplicate inline.
pkey_sighandler_tests.c:44:15: warning: duplicate 'inline' declaration
specifier [-Wduplicate-decl-specifier]
static inline __always_inline
Link: https://lkml.kernel.org/r/20250912123025.1271051-1-usama.anjum@collabora.com
Link: https://lkml.kernel.org/r/20250912123025.1271051-2-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This macro gets used in different tests. Add it to kselftest.h which is
central location and tests use this header. Then use this new macro.
Link: https://lkml.kernel.org/r/20250912125102.1309796-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Antonio Quartulli <antonio@openvpn.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Sabrina Dubroca" <sd@queasysnail.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We recently missed detecting an issue during early testing because the
default (!all) tests would not trigger it and even when running "all"
tests it only would happen sometimes because of races.
So let's allow for an easy way to specify "GUP all pages in a single
call", extend the test matrix and extend our default (!all) tests.
By GUP'ing all pages in a single call, with the default size of 128MiB
we'll cover multiple leaf page tables / PMDs on architectures with sane
THP sizes.
Link: https://lkml.kernel.org/r/20250910093051.1693097-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As of v6.8 commit 7fbb5e188248 ("mm: remove VM_EXEC requirement for THP
eligibility") thp collapse no longer requires file-backed mappings be
created with PROT_EXEC.
Remove the overly-strict dependency from thp collapse tests so we test the
least-strict requirement for success.
Link: https://lkml.kernel.org/r/20250909190534.512801-1-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The test will fail as below on x86_64 with cpu la57 support (will skip if
no la57 support). Note, the test requries nr_hugepages to be set first.
# running bash ./va_high_addr_switch.sh
# -------------------------------------
# mmap(addr_switch_hint - pagesize, pagesize): 0x7f55b60fa000 - OK
# mmap(addr_switch_hint - pagesize, (2 * pagesize)): 0x7f55b60f9000 - OK
# mmap(addr_switch_hint, pagesize): 0x800000000000 - OK
# mmap(addr_switch_hint, 2 * pagesize, MAP_FIXED): 0x800000000000 - OK
# mmap(NULL): 0x7f55b60f9000 - OK
# mmap(low_addr): 0x40000000 - OK
# mmap(high_addr): 0x1000000000000 - OK
# mmap(high_addr) again: 0xffff55b6136000 - OK
# mmap(high_addr, MAP_FIXED): 0x1000000000000 - OK
# mmap(-1): 0xffff55b6134000 - OK
# mmap(-1) again: 0xffff55b6132000 - OK
# mmap(addr_switch_hint - pagesize, pagesize): 0x7f55b60fa000 - OK
# mmap(addr_switch_hint - pagesize, 2 * pagesize): 0x7f55b60f9000 - OK
# mmap(addr_switch_hint - pagesize/2 , 2 * pagesize): 0x7f55b60f7000 - OK
# mmap(addr_switch_hint, pagesize): 0x800000000000 - OK
# mmap(addr_switch_hint, 2 * pagesize, MAP_FIXED): 0x800000000000 - OK
# mmap(NULL, MAP_HUGETLB): 0x7f55b5c00000 - OK
# mmap(low_addr, MAP_HUGETLB): 0x40000000 - OK
# mmap(high_addr, MAP_HUGETLB): 0x1000000000000 - OK
# mmap(high_addr, MAP_HUGETLB) again: 0xffff55b5e00000 - OK
# mmap(high_addr, MAP_FIXED | MAP_HUGETLB): 0x1000000000000 - OK
# mmap(-1, MAP_HUGETLB): 0x7f55b5c00000 - OK
# mmap(-1, MAP_HUGETLB) again: 0x7f55b5a00000 - OK
# mmap(addr_switch_hint - pagesize, 2*hugepagesize, MAP_HUGETLB): 0x800000000000 - FAILED
# mmap(addr_switch_hint , 2*hugepagesize, MAP_FIXED | MAP_HUGETLB): 0x800000000000 - OK
# [FAIL]
addr_switch_hint is defined as DFEFAULT_MAP_WINDOW in the failed test (for
x86_64, DFEFAULT_MAP_WINDOW is defined as (1UL<<47) - pagesize) in 64 bit.
Before commit cc92882ee218 ("mm: drop hugetlb_get_unmapped_area{_*}
functions"), for x86_64 hugetlb_get_unmapped_area() is handled in arch
code arch/x86/mm/hugetlbpage.c and addr is checked with
map_address_hint_valid() after align with 'addr &= huge_page_mask(h)'
which is a round down way, and it will fail the check because the addr is
within the DEFAULT_MAP_WINDOW but (addr + len) is above the
DFEFAULT_MAP_WINDOW. So it wil go through the
hugetlb_get_unmmaped_area_top_down() to find an area within the
DFEFAULT_MAP_WINDOW.
After commit cc92882ee218 ("mm: drop hugetlb_get_unmapped_area{_*}
functions"). The addr hint for hugetlb_get_unmmaped_area() will be
rounded up and aligned to hugepage size with ALIGN() for all arches. And
after the align, the addr will be above the default MAP_DEFAULT_WINDOW,
and the map_addresshint_valid() check will pass because both aligned addr
(addr0) and (addr + len) are above the DEFAULT_MAP_WINDOW, and the aligned
hint address (0x800000000000) is returned as an suitable gap is found
there, in arch_get_unmapped_area_topdown().
To still cover the case that addr is within the DEFAULT_MAP_WINDOW, and
addr + len is above the DFEFAULT_MAP_WINDOW, change to choose the last
hugepage aligned address within the DEFAULT_MAP_WINDOW as the hint addr,
and the addr + len (2 hugepages) will be one hugepage above the
DEFAULT_MAP_WINDOW. An aligned address won't be affected by the page
round up or round down from kernel, so it's determistic.
Link: https://lkml.kernel.org/r/20250912013711.3002969-4-chuhu@redhat.com
Fixes: cc92882ee218 ("mm: drop hugetlb_get_unmapped_area{_*} functions")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Alloc hugepages in the test internally, so we don't fully rely on the
run_vmtests.sh. If run_vmtests.sh does that great, free hugepages is
enough for being used to run the test, leave it as it is, otherwise setup
the hugepages in the test.
Save the original nr_hugepages value and restore it after test finish, so
leave a stable test envronment.
Link: https://lkml.kernel.org/r/20250912013711.3002969-3-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Fix va_high_addr_switch.sh test failure", v3.
These three patches fix the va_high_addr_switch.sh test failure on x86_64.
Patch 1 fixes the hugepage setup issue that nr_hugepages is reset too
early in run_vmtests.sh and break the later va_high_addr_switch testing.
Patch 2 adds hugepage setup in va_high_addr_switch test, so that it can
still work if vm_runtests.sh changes the hugepage setup someday.
Patch 3 fixes the test failure caused by the hint addr align method change
in hugetlb_get_unmapped_area().
This patch (of 3):
The nr_hugepgs variable is used to keep the original nr_hugepages at the
hugepage setup step at test beginning. After userfaultfd test, a cleaup
is executed, both /sys/kernel/mm/hugepages/hugepages-*/nr_hugepages and
/proc/sys//vm/nr_hugepages are reset to 'original' value before
userfaultfd test starts.
Issue here is the value used to restore /proc/sys/vm/nr_hugepages is
nr_hugepgs which is the initial value before the vm_runtests.sh runs, not
the value before userfaultfd test starts. 'va_high_addr_swith.sh' tests
runs after that will possibly see no hugepages available for test, and got
EINVAL when mmap(HUGETLB), making the result invalid.
And before pkey tests, nr_hugepgs is changed to be used as a temp variable
to save nr_hugepages before pkey test, and restore it after pkey tests
finish. The original nr_hugepages value is not tracked anymore, so no way
to restore it after all tests finish.
Add a new variable orig_nr_hugepgs to save the original nr_hugepages, and
and restore it to nr_hugepages after all tests finish. And change to use
the nr_hugepgs variable to save the /proc/sys/vm/nr_hugeages after
hugepage setup, it's also the value before userfaultfd test starts, and
the correct value to be restored after userfaultfd finishes. The
va_high_addr_switch.sh broken will be resolved.
Link: https://lkml.kernel.org/r/20250912013711.3002969-1-chuhu@redhat.com
Link: https://lkml.kernel.org/r/20250912013711.3002969-2-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There is room for improvement, so let's clean up a bit:
(1) Define "4" as a constant.
(2) SKIP if we fail to allocate all THPs (e.g., fragmented) and add
recovery code for all other failure cases: no need to exit the test.
(3) Rename "len" to thp_area_size, and "one_page" to "thp_area".
(4) Allocate a new area "page_area" into which we will mremap the
pages; add "page_area_size". Now we can easily merge the two
mremap instances into a single one.
(5) Iterate THPs instead of bytes when checking for missed THPs after
mremap.
(6) Rename "pte_mapped2" to "tmp", used to verify mremap(MAP_FIXED)
result.
(7) Split the corruption test from the failed-split test, so we can just
iterate bytes vs. thps naturally.
(8) Extend comments and clarify why we are using mremap in the first
place.
Link: https://lkml.kernel.org/r/20250903070253.34556-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
wrong results
Patch series "selftests/mm: split_huge_page_test: split_pte_mapped_thp
improvements", v2.
One fix for occasional failures I found while testing and a bunch of
cleanups that should make that test easier to digest.
This patch (of 2):
When checking for actual tail or head pages of a folio, we must make sure
that the KPF_COMPOUND_HEAD/KPF_COMPOUND_TAIL flag is paired with KPF_THP.
For example, if we have another large folio after our large folio in
physical memory, our "pfn_flags & (KPF_THP | KPF_COMPOUND_TAIL)" would
trigger even though it's actually a head page of the next folio.
If is_backed_by_folio() returns a wrong result, split_pte_mapped_thp() can
fail with "Some THPs are missing during mremap".
Fix it by checking for head/tail pages of folios properly. Add
folio_tail_flags/folio_head_flags to improve readability and use these
masks also when just testing for any compound page.
Link: https://lkml.kernel.org/r/20250903070253.34556-1-david@redhat.com
Link: https://lkml.kernel.org/r/20250903070253.34556-2-david@redhat.com
Fixes: 169b456b0162 ("selftests/mm: reimplement is_backed_by_thp() with more precise check")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Now that all users are gone, let's remove it.
Link: https://lkml.kernel.org/r/20250901150359.867252-38-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It's no longer user-selectable (and the default was already "y"), so let's
just drop it.
It was never really relevant to the wireguard selftests either way.
Link: https://lkml.kernel.org/r/20250901150359.867252-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Refactor macros and non-composite global variable definitions into a
struct that is defined at the start of a test and is passed around instead
of relying on global vars.
Link: https://lkml.kernel.org/r/20250829155600.2000-1-ujwal.kundur@gmail.com
Signed-off-by: Ujwal Kundur <ujwal.kundur@gmail.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
With zswap using zsmalloc directly, there are no more in-tree users of
this code. Remove it.
With zpool gone, zsmalloc is now always a simple dependency and no
longer something the user needs to configure. Hide CONFIG_ZSMALLOC
from the user and have zswap and zram pull it in as needed.
Link: https://lkml.kernel.org/r/20250829162212.208258-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: SeongJae Park <sj@kernel.org>
Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Some ublk selftests have strange behavior when fio is not installed.
While most tests behave correctly (run if they don't need fio, or skip
if they need fio), the following tests have different behavior:
- test_null_01, test_null_02, test_generic_01, test_generic_02, and
test_generic_12 try to run fio without checking if it exists first,
and fail on any failure of the fio command (including "fio command
not found"). So these tests fail when they should skip.
- test_stress_05 runs fio without checking if it exists first, but
doesn't fail on fio command failure. This test passes, but that pass
is misleading as the test doesn't do anything useful without fio
installed. So this test passes when it should skip.
Fix these issues by adding _have_program fio checks to the top of all of
these tests.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
With latest llvm22, I got the following verification failure:
...
; int big_alloc2(void *ctx) @ verifier_arena_large.c:207
0: (b4) w6 = 1 ; R6_w=1
...
; if (err) @ verifier_arena_large.c:233
53: (56) if w6 != 0x0 goto pc+62 ; R6=0
54: (b7) r7 = -4 ; R7_w=-4
55: (18) r8 = 0x7f4000000000 ; R8_w=scalar()
57: (bf) r9 = addr_space_cast(r8, 0, 1) ; R8_w=scalar() R9_w=arena
58: (b4) w6 = 5 ; R6_w=5
; pg = page[i]; @ verifier_arena_large.c:238
59: (bf) r1 = r7 ; R1_w=-4 R7_w=-4
60: (07) r1 += 4 ; R1_w=0
61: (79) r2 = *(u64 *)(r9 +0) ; R2_w=scalar() R9_w=arena
; if (*pg != i) @ verifier_arena_large.c:239
62: (bf) r3 = addr_space_cast(r2, 0, 1) ; R2_w=scalar() R3_w=arena
63: (71) r3 = *(u8 *)(r3 +0) ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
64: (5d) if r1 != r3 goto pc+51 ; R1_w=0 R3_w=0
; bpf_arena_free_pages(&arena, (void __arena *)pg, 2); @ verifier_arena_large.c:241
65: (18) r1 = 0xff11000114548000 ; R1_w=map_ptr(map=arena,ks=0,vs=0)
67: (b4) w3 = 2 ; R3_w=2
68: (85) call bpf_arena_free_pages#72675 ;
69: (b7) r1 = 0 ; R1_w=0
; page[i + 1] = NULL; @ verifier_arena_large.c:243
70: (7b) *(u64 *)(r8 +8) = r1
R8 invalid mem access 'scalar'
processed 61 insns (limit 1000000) max_states_per_insn 0 total_states 6 peak_states 6 mark_read 2
=============
#489/5 verifier_arena_large/big_alloc2:FAIL
The main reason is that 'r8' in insn '70' is not an arena pointer.
Further debugging at llvm side shows that llvm commit ([1]) caused
the failure. For the original code:
page[i] = NULL;
page[i + 1] = NULL;
the llvm transformed it to something like below at source level:
__builtin_memset(&page[i], 0, 16)
Such transformation prevents llvm BPFCheckAndAdjustIR pass from
generating proper addr_space_cast insns ([2]).
Adding support in llvm BPFCheckAndAdjustIR pass should work, but
not sure that such a pattern exists or not in real applications.
At the same time, simply adding a memory barrier between two 'page'
assignment can fix the issue.
[1] https://github.com/llvm/llvm-project/pull/155415
[2] https://github.com/llvm/llvm-project/pull/84410
Cc: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250920045805.3288551-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
There is a spelling mistake in a test message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Every futex selftest uses the kselftest_harness.h helper and don't need
the logging.h file. Delete it.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
futex_numa doesn't really use logging.h helpers, it's only need two
includes from this file. So drop it and include the two missing
includes.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
To reduce the boilerplate code, refactor futex_numa_mpol test to use
kselftest_harness header instead of futex's logging header.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
To reduce the boilerplate code, refactor futex_priv_hash test to use
kselftest_harness header instead of futex's logging header.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
To reduce the boilerplate code, refactor futex_waitv test to use
kselftest_harness header instead of futex's logging header.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
To reduce the boilerplate code, refactor futex_requeue test to use
kselftest_harness header instead of futex's logging header.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
To reduce the boilerplate code, refactor futex_wait test to use
kselftest_harness header instead of futex's logging header.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
kselftest_harness.h
To reduce the boilerplate code, refactor futex_wait_private_mapped_file
test to use kselftest_harness header instead of futex's logging header.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
To reduce the boilerplate code, refactor futex_wait_unitialized_heap
test to use kselftest_harness header instead of futex's logging header.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
To reduce the boilerplate code, refactor futex_wait_wouldblock test to
use kselftest_harness header instead of futex's logging header.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
To reduce the boilerplate code, refactor futex_wait_timeout test to use
kselftest_harness header instead of futex's logging header.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
kselftest_harness.h
To reduce the boilerplate code, refactor futex_requeue_pi_signal_restart
test to use kselftest_harness header instead of futex's logging header.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
kselftest_harness.h
To reduce the boilerplate code, refactor futex_requeue_pi_mismatched_ops
test to use kselftest_harness header instead of futex's logging header.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|