| Age | Commit message (Collapse) | Author |
|
Merge in late fixes in preparation for the net-next PR.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CPU feature updates from Dave Hansen:
"The biggest thing of note here is Linear Address Space Separation
(LASS). It represents the first time I can think of that the
upper=>kernel/lower=>user address space convention is actually
recognized by the hardware on x86. It ensures that userspace can not
even get the hardware to _start_ page walks for the kernel address
space. This, of course, is a really nice generic side channel defense.
This is really only a down payment on LASS support. There are still
some details to work out in its interaction with EFI calls and
vsyscall emulation. For now, LASS is disabled if either of those
features is compiled in (which is almost always the case).
There's also one straggler commit in here which converts an
under-utilized AMD CPU feature leaf into a generic Linux-defined leaf
so more feature can be packed in there.
Summary:
- Enable Linear Address Space Separation (LASS)
- Change X86_FEATURE leaf 17 from an AMD leaf to Linux-defined"
* tag 'x86_cpu_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Enable LASS during CPU initialization
selftests/x86: Update the negative vsyscall tests to expect a #GP
x86/traps: Communicate a LASS violation in #GP message
x86/kexec: Disable LASS during relocate kernel
x86/alternatives: Disable LASS when patching kernel code
x86/asm: Introduce inline memcpy and memset
x86/cpu: Add an LASS dependency on SMAP
x86/cpufeatures: Enumerate the LASS feature bits
x86/cpufeatures: Make X86_FEATURE leaf 17 Linux-specific
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- SCA rework
- VIRT_XFER_TO_GUEST_WORK support
- Operation exception forwarding support
- Cleanups
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
- Prevent a thundering herd problem when the timekeeper CPU is delayed
and a large number of CPUs compete to acquire jiffies_lock to do the
update. Limit it to one CPU with a separate "uncontended" atomic
variable.
- A set of improvements for the timer migration mechanism:
- Support imbalanced NUMA trees correctly
- Support dynamic exclusion of CPUs from the migrator duty to allow
the cpuset/isolation mechanism to exclude them from handling
timers of remote idle CPUs
- The usual small updates, cleanups and enhancements
* tag 'timers-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/migration: Exclude isolated cpus from hierarchy
cpumask: Add initialiser to use cleanup helpers
sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
timers/migration: Use scoped_guard on available flag set/clear
timers/migration: Add mask for CPUs available in the hierarchy
timers/migration: Rename 'online' bit to 'available'
selftests/timers/nanosleep: Add tests for return of remaining time
selftests/timers: Clean up kernel version check in posix_timers
time: Fix a few typos in time[r] related code comments
time: tick-oneshot: Add missing Return and parameter descriptions to kernel-doc
hrtimer: Store time as ktime_t in restart block
timers/migration: Remove dead code handling idle CPU checking for remote timers
timers/migration: Remove unused "cpu" parameter from tmigr_get_group()
timers/migration: Assert that hotplug preparing CPU is part of stable active hierarchy
timers/migration: Fix imbalanced NUMA trees
timers/migration: Remove locking on group connection
timers/migration: Convert "while" loops to use "for"
tick/sched: Limit non-timekeeper CPUs calling jiffies update
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.19
- Support for userspace handling of synchronous external aborts (SEAs),
allowing the VMM to potentially handle the abort in a non-fatal
manner.
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers in
hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ.
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU.
- Allow page table destruction to reschedule, fixing long need_resched
latencies observed when destroying a large VM.
- Minor fixes to KVM and selftests
|
|
KVM/riscv changes for 6.19
- SBI MPXY support for KVM guest
- New KVM_EXIT_FAIL_ENTRY_NO_VSFILE for the case when in-kernel
AIA virtualization fails to allocate IMSIC VS-file
- Support enabling dirty log gradually in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.19
1. Get VM PMU capability from HW GCFG register.
2. Add AVEC basic support.
3. Use 64-bit register definition for EIOINTC.
4. Add KVM timer test cases for tools/selftests.
|
|
Add a CPUID testcase to verify that KVM allows KVM_SET_CPUID2 after (or in
conjunction with) runtime updates. This is a regression test for the bug
introduced by commit 93da6af3ae56 ("KVM: x86: Defer runtime updates of
dynamic CPUID bits until CPUID emulation"), where KVM would incorrectly
reject KVM_SET_CPUID due to a not handling a pending runtime update on the
current CPUID, resulting in a false mismatch between the "old" and "new"
CPUID entries.
Link: https://lore.kernel.org/all/20251128123202.68424a95@imammedo
Link: https://patch.msgid.link/20251202015049.1167490-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
In commit 0297cdc12a87 ("KVM: selftests: Add option to rseq test to
override /dev/cpu_dma_latency"), a 'break' is missed before the option
'l' in the argument parsing loop, which leads to an unexpected core
dump in atoi_paranoid(). It tries to get the latency from non-existent
argument.
host$ ./rseq_test -u
Random seed: 0x6b8b4567
Segmentation fault (core dumped)
Add a 'break' before the option 'l' in the argument parsing loop to avoid
the unexpected core dump.
Fixes: 0297cdc12a87 ("KVM: selftests: Add option to rseq test to override /dev/cpu_dma_latency")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Gavin Shan <gshan@redhat.com>
Link: https://patch.msgid.link/20251124050427.1924591-1-gshan@redhat.com
[sean: describe code change in shortlog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add tests that trigger packet drops in cake_enqueue(): "CAKE with QFQ
Parent - CAKE enqueue with packets dropping". It forces CAKE_enqueue to
return NET_XMIT_CN after dropping the packets when it has a QFQ parent.
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20251128001415.377823-3-xmei5@asu.edu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v6.19
This is a very large set of updates, as well as some more extensive
cleanup work from Morimto-san we've also added a generic SCDA class
driver for SoundWire devices enabling us to support many chips with
no custom code. There's also a batch of new drivers added for both
SoCs and CODECs.
- Added a SoundWire SCDA generic class driver, pulling in a little
regmap work to support it.
- A *lot* of cleaup and API improvement work from Morimoto-san.
- Lots of work on the existing Cirrus, Intel, Maxim and Qualcomm
drivers.
- Support for Allwinner A523, Mediatek MT8189, Qualcomm QCM2290,
QRB2210 and SM6115, SpacemiT K1, and TI TAS2568, TAS5802, TAS5806,
TAS5815, TAS5828 and TAS5830.
This also pulls in some gpiolib changes supporting shared GPIOs in the
core there so we can convert some of the ASoC drivers open coding
handling of that to the core functionality.
|
|
Currently, tolerance is computed against the TC’s expected percentage,
making TC3 (20%) validation overly strict and TC4 (80%) overly loose.
Update BandwidthValidator to take a dict of shares and compute bounds
relative to the overall total, so that all shares are validated
consistently.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-7-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Correct the documented bandwidth distribution between TC3 and TC4
from 80/20 to 20/80. Update test descriptions and printed messages
to consistently reflect the intended split.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-6-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 7c32f7a2d3db ("selftests: net: py: don't default to shell=True")
changed the cmd() helper to avoid spawning a shell unless explicitly
requested.
The devlink_rate_tc_bw test enables SR-IOV by writing to the
sriov_numvfs sysfs attribute using redirection. Without shell=True the
redirection is not interpreted and the VF device never appears,
causing the test to fail.
Fix by explicitly passing shell=True in the two places that update
sriov_numvfs.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-5-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace the inline iperf3 subprocess and JSON parsing with
Iperf3Runner.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-4-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
GenerateTraffic was added to spin up long-running iperf3 load, mainly
to drive high PPS background traffic. It was never meant to provide
stable throughput numbers, and trying to repurpose it for measurement
does not make sense.
Introduce Iperf3Runner to allow tests to split out server/client
configuration, control start/stop, and collect JSON output for
analysis. This makes it possible to measure bandwidth directly when
validating egress shaping.
GenerateTraffic stays as the background load generator, reusing the
common iperf3 helpers under the hood.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-3-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This makes devlink_rate_tc_bw.py present in the Makefile under the same
directory.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Link: https://patch.msgid.link/20251130091938.4109055-2-cjubran@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This removes some noise that can be distracting while looking at
selftests by redirecting socat stderr to /dev/null.
Before this commit, netcons_basic would output:
Running with target mode: basic (ipv6)
2025/11/29 12:08:03 socat[259] W exiting on signal 15
2025/11/29 12:08:03 socat[271] W exiting on signal 15
basic : ipv6 : Test passed
Running with target mode: basic (ipv4)
2025/11/29 12:08:05 socat[329] W exiting on signal 15
2025/11/29 12:08:05 socat[322] W exiting on signal 15
basic : ipv4 : Test passed
Running with target mode: extended (ipv6)
2025/11/29 12:08:08 socat[386] W exiting on signal 15
2025/11/29 12:08:08 socat[386] W exiting on signal 15
2025/11/29 12:08:08 socat[380] W exiting on signal 15
extended : ipv6 : Test passed
Running with target mode: extended (ipv4)
2025/11/29 12:08:10 socat[440] W exiting on signal 15
2025/11/29 12:08:10 socat[435] W exiting on signal 15
2025/11/29 12:08:10 socat[435] W exiting on signal 15
extended : ipv4 : Test passed
After these changes, output looks like:
Running with target mode: basic (ipv6)
basic : ipv6 : Test passed
Running with target mode: basic (ipv4)
basic : ipv4 : Test passed
Running with target mode: extended (ipv6)
extended : ipv6 : Test passed
Running with target mode: extended (ipv4)
extended : ipv4 : Test passed
Signed-off-by: Andre Carvalho <asantostc@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20251129-netcons-socat-noise-v1-1-605a0cea8fca@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
New NIPA installation had been reporting a few flaky tests.
arp_ndisc_evict_nocarrier is most flaky of them all.
I suspect that the flakiness is due to udev swapping the MAC
addresses on the interfaces. Extend the message in
arp_ndisc_evict_nocarrier to hint at this potential issue.
Having the neigh get fail right after ping is rather unusual,
unless udev changes the MAC addr causing a flush in the meantime.
Link: https://patch.msgid.link/20251127194556.2409574-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Following up on the old discussion [1]. Let the BaseExceptions out of
defer()'ed cleanup. And handle it in the main loop. This allows us to
exit the tests if user hit Ctrl-C during defer().
Link: https://lore.kernel.org/20251119063228.3adfd743@kernel.org # [1]
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20251128004846.2602687-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfd and coredump updates from Christian Brauner:
"Features:
- Expose coredump signal via pidfd
Expose the signal that caused the coredump through the pidfd
interface. The recent changes to rework coredump handling to rely
on unix sockets are in the process of being used in systemd. The
previous systemd coredump container interface requires the coredump
file descriptor and basic information including the signal number
to be sent to the container. This means the signal number needs to
be available before sending the coredump to the container.
- Add supported_mask field to pidfd
Add a new supported_mask field to struct pidfd_info that indicates
which information fields are supported by the running kernel. This
allows userspace to detect feature availability without relying on
error codes or kernel version checks.
Cleanups:
- Drop struct pidfs_exit_info and prepare to drop exit_info pointer,
simplifying the internal publication mechanism for exit and
coredump information retrievable via the pidfd ioctl
- Use guard() for task_lock in pidfs
- Reduce wait_pidfd lock scope
- Add missing PIDFD_INFO_SIZE_VER1 constant
- Add missing BUILD_BUG_ON() assert on struct pidfd_info
Fixes:
- Fix PIDFD_INFO_COREDUMP handling
Selftests:
- Split out coredump socket tests and common helpers into separate
files for better organization
- Fix userspace coredump client detection issues
- Handle edge-triggered epoll correctly
- Ignore ENOSPC errors in tests
- Add debug logging to coredump socket tests, socket protocol tests,
and test helpers
- Add tests for PIDFD_INFO_COREDUMP_SIGNAL
- Add tests for supported_mask field
- Update pidfd header for selftests"
* tag 'vfs-6.19-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
pidfs: reduce wait_pidfd lock scope
selftests/coredump: add second PIDFD_INFO_COREDUMP_SIGNAL test
selftests/coredump: add first PIDFD_INFO_COREDUMP_SIGNAL test
selftests/coredump: ignore ENOSPC errors
selftests/coredump: add debug logging to coredump socket protocol tests
selftests/coredump: add debug logging to coredump socket tests
selftests/coredump: add debug logging to test helpers
selftests/coredump: handle edge-triggered epoll correctly
selftests/coredump: fix userspace coredump client detection
selftests/coredump: fix userspace client detection
selftests/coredump: split out coredump socket tests
selftests/coredump: split out common helpers
selftests/pidfd: add second supported_mask test
selftests/pidfd: add first supported_mask test
selftests/pidfd: update pidfd header
pidfs: expose coredump signal
pidfs: drop struct pidfs_exit_info
pidfs: prepare to drop exit_info pointer
pidfd: add a new supported_mask field
pidfs: add missing BUILD_BUG_ON() assert on struct pidfd_info
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull namespace updates from Christian Brauner:
"This contains substantial namespace infrastructure changes including a new
system call, active reference counting, and extensive header cleanups.
The branch depends on the shared kbuild branch for -fms-extensions support.
Features:
- listns() system call
Add a new listns() system call that allows userspace to iterate
through namespaces in the system. This provides a programmatic
interface to discover and inspect namespaces, addressing
longstanding limitations:
Currently, there is no direct way for userspace to enumerate
namespaces. Applications must resort to scanning /proc/*/ns/ across
all processes, which is:
- Inefficient - requires iterating over all processes
- Incomplete - misses namespaces not attached to any running
process but kept alive by file descriptors, bind mounts, or
parent references
- Permission-heavy - requires access to /proc for many processes
- No ordering or ownership information
- No filtering per namespace type
The listns() system call solves these problems:
ssize_t listns(const struct ns_id_req *req, u64 *ns_ids,
size_t nr_ns_ids, unsigned int flags);
struct ns_id_req {
__u32 size;
__u32 spare;
__u64 ns_id;
struct /* listns */ {
__u32 ns_type;
__u32 spare2;
__u64 user_ns_id;
};
};
Features include:
- Pagination support for large namespace sets
- Filtering by namespace type (MNT_NS, NET_NS, USER_NS, etc.)
- Filtering by owning user namespace
- Permission checks respecting namespace isolation
- Active Reference Counting
Introduce an active reference count that tracks namespace
visibility to userspace. A namespace is visible in the following
cases:
- The namespace is in use by a task
- The namespace is persisted through a VFS object (namespace file
descriptor or bind-mount)
- The namespace is a hierarchical type and is the parent of child
namespaces
The active reference count does not regulate lifetime (that's still
done by the normal reference count) - it only regulates visibility
to namespace file handles and listns().
This prevents resurrection of namespaces that are pinned only for
internal kernel reasons (e.g., user namespaces held by
file->f_cred, lazy TLB references on idle CPUs, etc.) which should
not be accessible via (1)-(3).
- Unified Namespace Tree
Introduce a unified tree structure for all namespaces with:
- Fixed IDs assigned to initial namespaces
- Lookup based solely on inode number
- Maintained list of owned namespaces per user namespace
- Simplified rbtree comparison helpers
Cleanups
- Header Reorganization:
- Move namespace types into separate header (ns_common_types.h)
- Decouple nstree from ns_common header
- Move nstree types into separate header
- Switch to new ns_tree_{node,root} structures with helper functions
- Use guards for ns_tree_lock
- Initial Namespace Reference Count Optimization
- Make all reference counts on initial namespaces a nop to avoid
pointless cacheline ping-pong for namespaces that can never go
away
- Drop custom reference count initialization for initial namespaces
- Add NS_COMMON_INIT() macro and use it for all namespaces
- pid: rely on common reference count behavior
- Miscellaneous Cleanups
- Rename exit_task_namespaces() to exit_nsproxy_namespaces()
- Rename is_initial_namespace() and make argument const
- Use boolean to indicate anonymous mount namespace
- Simplify owner list iteration in nstree
- nsfs: raise SB_I_NODEV, SB_I_NOEXEC, and DCACHE_DONTCACHE explicitly
- nsfs: use inode_just_drop()
- pidfs: raise DCACHE_DONTCACHE explicitly
- pidfs: simplify PIDFD_GET__NAMESPACE ioctls
- libfs: allow to specify s_d_flags
- cgroup: add cgroup namespace to tree after owner is set
- nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
Fixes:
- setns(pidfd, ...) race condition
Fix a subtle race when using pidfds with setns(). When the target
task exits after prepare_nsset() but before commit_nsset(), the
namespace's active reference count might have been dropped. If
setns() then installs the namespaces, it would bump the active
reference count from zero without taking the required reference on
the owner namespace, leading to underflow when later decremented.
The fix resurrects the ownership chain if necessary - if the caller
succeeded in grabbing passive references, the setns() should
succeed even if the target task exits or gets reaped.
- Return EFAULT on put_user() error instead of success
- Make sure references are dropped outside of RCU lock (some
namespaces like mount namespace sleep when putting the last
reference)
- Don't skip active reference count initialization for network
namespace
- Add asserts for active refcount underflow
- Add asserts for initial namespace reference counts (both passive
and active)
- ipc: enable is_ns_init_id() assertions
- Fix kernel-doc comments for internal nstree functions
- Selftests
- 15 active reference count tests
- 9 listns() functionality tests
- 7 listns() permission tests
- 12 inactive namespace resurrection tests
- 3 threaded active reference count tests
- commit_creds() active reference tests
- Pagination and stress tests
- EFAULT handling test
- nsid tests fixes"
* tag 'namespace-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (103 commits)
pidfs: simplify PIDFD_GET_<type>_NAMESPACE ioctls
nstree: fix kernel-doc comments for internal functions
nsproxy: fix free_nsproxy() and simplify create_new_namespaces()
selftests/namespaces: fix nsid tests
ns: drop custom reference count initialization for initial namespaces
pid: rely on common reference count behavior
ns: add asserts for initial namespace active reference counts
ns: add asserts for initial namespace reference counts
ns: make all reference counts on initial namespace a nop
ipc: enable is_ns_init_id() assertions
fs: use boolean to indicate anonymous mount namespace
ns: rename is_initial_namespace()
ns: make is_initial_namespace() argument const
nstree: use guards for ns_tree_lock
nstree: simplify owner list iteration
nstree: switch to new structures
nstree: add helper to operate on struct ns_tree_{node,root}
nstree: move nstree types into separate header
nstree: decouple from ns_common header
ns: move namespace types into separate header
...
|
|
Pull remaining 6.18-devel changes.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
* kvm-arm64/nv-xnx-haf: (22 commits)
: Support for FEAT_XNX and FEAT_HAF in nested
:
: Add support for a couple of MMU-related features that weren't
: implemented by KVM's software page table walk:
:
: - FEAT_XNX: Allows the hypervisor to describe execute permissions
: separately for EL0 and EL1
:
: - FEAT_HAF: Hardware update of the Access Flag, which in the context of
: nested means software walkers must also set the Access Flag.
:
: The series also adds some basic support for testing KVM's emulation of
: the AT instruction, including the implementation detail that AT sets the
: Access Flag in KVM.
KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
KVM: arm64: selftests: Add test for AT emulation
KVM: arm64: nv: Expose hardware access flag management to NV guests
KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
KVM: arm64: Implement HW access flag management in stage-1 SW PTW
KVM: arm64: Propagate PTW errors up to AT emulation
KVM: arm64: Add helper for swapping guest descriptor
KVM: arm64: nv: Use pgtable definitions in stage-2 walk
KVM: arm64: Handle endianness in read helper for emulated PTW
KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
KVM: arm64: Call helper for reading descriptors directly
KVM: arm64: nv: Advertise support for FEAT_XNX
KVM: arm64: Teach ptdump about FEAT_XNX permissions
KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2
...
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
* kvm-arm64/vgic-lr-overflow: (50 commits)
: Support for VGIC LR overflows, courtesy of Marc Zyngier
:
: Address deficiencies in KVM's GIC emulation when a vCPU has more active
: IRQs than can be represented in the VGIC list registers. Sort the AP
: list to prioritize inactive and pending IRQs, potentially spilling
: active IRQs outside of the LRs.
:
: Handle deactivation of IRQs outside of the LRs for both EOImode=0/1,
: which involves special consideration for SPIs being deactivated from a
: different vCPU than the one that acked it.
KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATURE
KVM: arm64: selftests: vgic_irq: Add timer deactivation test
KVM: arm64: selftests: vgic_irq: Add Group-0 enable test
KVM: arm64: selftests: vgic_irq: Add asymmetric SPI deaectivation test
KVM: arm64: selftests: vgic_irq: Perform EOImode==1 deactivation in ack order
KVM: arm64: selftests: vgic_irq: Remove LR-bound limitation
KVM: arm64: selftests: vgic_irq: Exclude timer-controlled interrupts
KVM: arm64: selftests: vgic_irq: Change configuration before enabling interrupt
KVM: arm64: selftests: vgic_irq: Fix GUEST_ASSERT_IAR_EMPTY() helper
KVM: arm64: selftests: gic_v3: Disable Group-0 interrupts by default
KVM: arm64: selftests: gic_v3: Add irq group setting helper
KVM: arm64: GICv2: Always trap GICV_DIR register
KVM: arm64: GICv2: Handle deactivation via GICV_DIR traps
KVM: arm64: GICv2: Handle LR overflow when EOImode==0
KVM: arm64: GICv3: Force exit to sync ICH_HCR_EL2.En
KVM: arm64: GICv3: nv: Plug L1 LR sync into deactivation primitive
KVM: arm64: GICv3: nv: Resync LRs/VMCR/HCR early for better MI emulation
KVM: arm64: GICv3: Avoid broadcast kick on CPUs lacking TDIR
KVM: arm64: GICv3: Handle in-LR deactivation when possible
KVM: arm64: GICv3: Add SPI tracking to handle asymmetric deactivation
...
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
* kvm-arm64/sea-user:
: Userspace handling of SEAs, courtesy of Jiaqi Yan
:
: Add support for processing external aborts in userspace in situations
: where the host has failed to do so, allowing the VMM to potentially
: reinject an external abort into the VM.
Documentation: kvm: new UAPI for handling SEA
KVM: selftests: Test for KVM_EXIT_ARM_SEA
KVM: arm64: VM exit to userspace to handle SEA
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
There is a spelling mistake in a TEST_FAIL message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://msgid.link/20251128175124.319094-1-colin.i.king@gmail.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Add a basic test for AT emulation in the EL2&0 and EL1&0 translation
regimes.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-16-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
- In order to prepare the layout for nohz_full work deferral to
user exit, the context tracking state must shrink the counter
of transitions to/from RCU not watching. The only possible hazard
is to trigger wrap-around more easily, delaying a bit grace periods
when that happens. This should be a rare event though. Yet add
debugging and torture code to test that assumption.
- Fix memory leak on locktorture module
- Annotate accesses in rculist_nulls.h to prevent from KCSAN warnings.
On recent discussions, we also concluded that all those WRITE_ONCE()
and READ_ONCE() on list APIs deserve appropriate comments. Something
to be expected for the next cycle.
- Provide a script to apply several configs to several commits with torture.
- Allow torture to reuse a build directory in order to save needless
rebuild time.
- Various cleanups.
|
|
In "uffd-stress.c" & "uffd-unit-tests.c". address of char variable having
garbage value (uninitialized) is passed to 'write' syscall triggers
warning.
uffd-stress.c:246:39: warning: variable 'c' is uninitialized when
passed as a const pointer argument here
[-Wuninitialized-const-pointer]
uffd-unit-tests.c:581:31: warning: variable 'c' is uninitialized
when passed as a const pointer argument here
[-Wuninitialized-const-pointer]
so the fix is to assign char variable to '\0' to prevent writing of
garbage value.
Link: https://lkml.kernel.org/r/20251126160830.52124-1-ankitkhushwaha.linux@gmail.com
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It is useful to transition to using a bitmap for VMA flags so we can avoid
running out of flags, especially for 32-bit kernels which are constrained
to 32 flags, necessitating some features to be limited to 64-bit kernels
only.
By doing so, we remove any constraint on the number of VMA flags moving
forwards no matter the platform and can decide in future to extend beyond
64 if required.
We start by declaring an opaque types, vma_flags_t (which resembles
mm_struct flags of type mm_flags_t), setting it to precisely the same size
as vm_flags_t, and place it in union with vm_flags in the VMA declaration.
We additionally update struct vm_area_desc equivalently placing the new
opaque type in union with vm_flags.
This change therefore does not impact the size of struct vm_area_struct or
struct vm_area_desc.
In order for the change to be iterative and to avoid impacting
performance, we designate VM_xxx declared bitmap flag values as those
which must exist in the first system word of the VMA flags bitmap.
We therefore declare vma_flags_clear_all(), vma_flags_overwrite_word(),
vma_flags_overwrite_word(), vma_flags_overwrite_word_once(),
vma_flags_set_word() and vma_flags_clear_word() in order to allow us to
update the existing vm_flags_*() functions to utilise these helpers.
This is a stepping stone towards converting users to the VMA flags bitmap
and behaves precisely as before.
By doing this, we can eliminate the existing private vma->__vm_flags field
in the vma->vm_flags union and replace it with the newly introduced opaque
type vma_flags, which we call flags so we refer to the new bitmap field as
vma->flags.
We update vma_flag_[test, set]_atomic() to account for the change also.
We adapt vm_flags_reset_once() to only clear those bits above the first
system word providing write-once semantics to the first system word (which
it is presumed the caller requires - and in all current use cases this is
so).
As we currently only specify that the VMA flags bitmap size is equal to
BITS_PER_LONG number of bits, this is a noop, but is defensive in
preparation for a future change that increases this.
We additionally update the VMA userland test declarations to implement the
same changes there.
Finally, we update the rust code to reference vma->vm_flags on update
rather than vma->__vm_flags which has been removed. This is safe for now,
albeit it is implicitly performing a const cast.
Once we introduce flag helpers we can improve this more.
No functional change intended.
Link: https://lkml.kernel.org/r/bab179d7b153ac12f221b7d65caac2759282cfe9.1764064557.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com> [rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The userland VMA test code relied on an internal implementation detail -
the existence of vma->__vm_flags to directly access VMA flags. There is
no need to do so when we have the vm_flags_*() helper functions available.
This is ugly, but also a subsequent commit will eliminate this field
altogether so this will shortly become broken.
This patch has us utilise the helper functions instead.
Link: https://lkml.kernel.org/r/6275c53a6bb20743edcbe92d3e130183b47d18d0.1764064557.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com> [rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "initial work on making VMA flags a bitmap", v3.
We are in the rather silly situation that we are running out of VMA flags
as they are currently limited to a system word in size.
This leads to absurd situations where we limit features to 64-bit
architectures only because we simply do not have the ability to add a flag
for 32-bit ones.
This is very constraining and leads to hacks or, in the worst case, simply
an inability to implement features we want for entirely arbitrary reasons.
This also of course gives us something of a Y2K type situation in mm where
we might eventually exhaust all of the VMA flags even on 64-bit systems.
This series lays the groundwork for getting away from this limitation by
establishing VMA flags as a bitmap whose size we can increase in future
beyond 64 bits if required.
This is necessarily a highly iterative process given the extensive use of
VMA flags throughout the kernel, so we start by performing basic steps.
Firstly, we declare VMA flags by bit number rather than by value,
retaining the VM_xxx fields but in terms of these newly introduced
VMA_xxx_BIT fields.
While we are here, we use sparse annotations to ensure that, when dealing
with VMA bit number parameters, we cannot be passed values which are not
declared as such - providing some useful type safety.
We then introduce an opaque VMA flag type, much like the opaque mm_struct
flag type introduced in commit bb6525f2f8c4 ("mm: add bitmap mm->flags
field"), which we establish in union with vma->vm_flags (but still set at
system word size meaning there is no functional or data type size change).
We update the vm_flags_xxx() helpers to use this new bitmap, introducing
sensible helpers to do so.
This series lays the foundation for further work to expand the use of
bitmap VMA flags and eventually eliminate these arbitrary restrictions.
This patch (of 4):
In order to lay the groundwork for VMA flags being a bitmap rather than a
system word in size, we need to be able to consistently refer to VMA flags
by bit number rather than value.
Take this opportunity to do so in an enum which we which is additionally
useful for tooling to extract metadata from.
This additionally makes it very clear which bits are being used for what
at a glance.
We use the VMA_ prefix for the bit values as it is logical to do so since
these reference VMAs. We consistently suffix with _BIT to make it clear
what the values refer to.
We declare bit values even when the flags that use them would not be
enabled by config options as this is simply clearer and clearly defines
what bit numbers are used for what, at no additional cost.
We declare a sparse-bitwise type vma_flag_t which ensures that users can't
pass around invalid VMA flags by accident and prepares for future work
towards VMA flags being a bitmap where we want to ensure bit values are
type safe.
To make life easier, we declare some macro helpers - DECLARE_VMA_BIT()
allows us to avoid duplication in the enum bit number declarations (and
maintaining the sparse __bitwise attribute), and INIT_VM_FLAG() is used to
assist with declaration of flags.
Unfortunately we can't declare both in the enum, as we run into issue with
logic in the kernel requiring that flags are preprocessor definitions, and
additionally we cannot have a macro which declares another macro so we
must define each flag macro directly.
Additionally, update the VMA userland testing vma_internal.h header to
include these changes.
We also have to fix the parameters to the vma_flag_*_atomic() functions
since VMA_MAYBE_GUARD_BIT is now of type vma_flag_t and sparse will
complain otherwise.
We have to update some rather silly if-deffery found in mm/task_mmu.c
which would otherwise break.
Finally, we update the rust binding helper as now it cannot auto-detect
the flags at all.
Link: https://lkml.kernel.org/r/cover.1764064556.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/3a35e5a0bcfa00e84af24cbafc0653e74deda64a.1764064556.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Alice Ryhl <aliceryhl@google.com> [rust]
Cc: Alex Gaynor <alex.gaynor@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Hindborg <a.hindborg@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Björn Roy Baron <bjorn3_gh@protonmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Chris Li <chrisl@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Gary Guo <gary@garyguo.net>
Cc: Gregory Price <gourry@gourry.net>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Trevor Gross <tmgross@umich.edu>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Wei Xu <weixugc@google.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
test_tc_edt currently defines the target rate in both the userspace and
BPF parts. This value could be defined once in the userspace part if we
make it able to configure the BPF program before starting the test.
Add a target_rate variable in the BPF part, and make the userspace part
set it to the desired rate before attaching the shaping program.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-4-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Now that test_tc_edt has been integrated in test_progs, remove the
legacy shell script.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-3-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
test_tc_edt.sh uses a pair of veth and a BPF program attached to the TX
veth to shape the traffic to 5MBps. It then checks that the amount of
received bytes (at interface level), compared to the TX duration, indeed
matches 5Mbps.
Convert this test script to the test_progs framework:
- keep the double veth setup, isolated in two veths
- run a small tcp server, and connect client to server
- push a pre-configured amount of bytes, and measure how much time has
been needed to push those
- ensure that this rate is in a 2% error margin around the target rate
This two percent value, while being tight, is hopefully large enough to
not make the test too flaky in CI, while also turning it into a small
example of BPF-based shaping.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-2-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The test_tc_edt BPF program uses a custom section name, which works fine
when manually loading it with tc, but prevents it from being loaded with
libbpf.
Update the program section name to "tc" to be able to manipulate it with
a libbpf-based C test.
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20251128-tc_edt-v2-1-26db48373e73@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add stats to observe the success and failure rate of lock acquisition
attempts in various contexts.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251128232802.1031906-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Jakub reported increased flakiness in bond_macvlan_ipvlan.sh on regular
kernel, while the tests consistently pass on a debug kernel. This suggests
a timing-sensitive issue.
To mitigate this, introduce a short sleep before each xvlan_over_bond
connectivity check. The delay helps ensure neighbor and route cache
have fully converged before verifying connectivity.
The sleep interval is kept minimal since check_connection() is invoked
nearly 100 times during the test.
Fixes: 246af950b940 ("selftests: bonding: add macvlan over bond testing")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20251114082014.750edfad@kernel.org
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20251127143310.47740-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
runqslower was added in commit 9c01546d26d2 "tools/bpf: Add runqslower
tool to tools/bpf" as a BCC port to showcase early BPF CO-RE + libbpf
workflows. runqslower continues to live in BCC (libbpf-tools), so there
is no need to keep building and maintaining it.
Drop tools/bpf/runqslower and remove all build hooks in tools/bpf and
selftests accordingly.
Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com>
Link: https://lore.kernel.org/r/20251126093821.373291-1-hoyeon.lee@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
file_alloc_security hook is disabled. Use other LSM hooks in selftests
instead.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20251126202927.2584874-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The original implementation added a hack to check_mem_access()
to prevent programs from writing into insn arrays. To get rid
of this hack, enforce BPF_F_RDONLY_PROG on map creation.
Also fix the corresponding selftest, as the error message changes
with this patch.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20251128063224.1305482-2-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a new VFIO selftest for measuring the time it takes to run
vfio_pci_device_init() in parallel for one or more devices.
This test serves as manual regression test for the performance
improvement of commit e908f58b6beb ("vfio/pci: Separate SR-IOV VF
dev_set"). For example, when running this test with 64 VFs under the
same PF:
Before:
$ ./vfio_pci_device_init_perf_test -r vfio_pci_device_init_perf_test.iommufd.init 0000:1a:00.0 0000:1a:00.1 ...
...
Wall time: 6.653234463s
Min init time (per device): 0.101215344s
Max init time (per device): 6.652755941s
Avg init time (per device): 3.377609608s
After:
$ ./vfio_pci_device_init_perf_test -r vfio_pci_device_init_perf_test.iommufd.init 0000:1a:00.0 0000:1a:00.1 ...
...
Wall time: 0.122978332s
Min init time (per device): 0.108121915s
Max init time (per device): 0.122762761s
Avg init time (per device): 0.113816748s
This test does not make any assertions about performance, since any such
assertion is likely to be flaky due to system differences and random
noise. However this test can be fed into automation to detect
regressions, and can be used by developers in the future to measure
performance optimizations.
Suggested-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-19-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Eliminate INVALID_IOVA as there are platforms where UINT64_MAX is a
valid iova.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-18-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Split out the contents of libvfio.h into separate header files, but keep
libvfio.h as the top-level include that all tests can use.
Put all new header files into a libvfio/ subdirectory to avoid future
name conflicts in include paths when libvfio is used by other selftests
like KVM.
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-17-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Move the vfio_selftests_*() helpers into their own file libvfio.c. These
helpers have nothing to do with struct vfio_pci_device, so they don't
make sense in vfio_pci_device.c.
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-16-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Rename vfio_util.h to libvfio.h to match the name of libvfio.mk.
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-15-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Drop the struct vfio_pci_device wrappers for IOMMU map/unmap functions
and require tests to directly call iommu_map(), iommu_unmap(), etc. This
results in more concise code, and also makes it clear the map operations
are happening on a struct iommu, not necessarily on a specific device,
especially when multi-device tests are introduced.
Do the same for iova_allocator_init() as that function only needs the
struct iommu, not struct vfio_pci_device.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-14-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Move the IOVA allocator into its own file, to provide better separation
between the allocator and the struct vfio_pci_device helper code.
The allocator could go into iommu.c, but it is standalone enough that a
separate file seems cleaner. This also continues the trend of having a
.c for every major object in VFIO selftests (vfio_pci_device.c,
vfio_pci_driver.c, iommu.c, and now iova_allocator.c).
No functional change intended.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-13-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|
|
Move all the IOMMU related library code into their own file iommu.c.
This provides a better separation between the vfio_pci_device helper
code and the iommu code.
No function change intended.
Reviewed-by: Alex Mastro <amastro@fb.com>
Tested-by: Alex Mastro <amastro@fb.com>
Reviewed-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20251126231733.3302983-12-dmatlack@google.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
|