| Age | Commit message | Author |
|
For TFO, only the file descriptor is needed, not the family.
Also, the error can be handled the same way whether 'sendto()' or
'connect()' is used; only the printed error message differs.
This avoids a bit of confusion.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-14-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A few loops were declaring 'i', but this variable was not used.
To avoid confusion, use '_' instead: it makes it explicit that this
variable is not needed.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-13-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
nstat outputs are already printed when calling 'fail_test', so no need
to do it again.
While at it, there is no need to use the dump_stats variable: print the
extra stats directly. Also use 'ip -n $ns' instead of
'ip netns exec $ns', which is shorter and clearer.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-12-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Instead of waiting for an arbitrary amount of time (1 second), wait for
an event to be received on the other side.
To do that, when an address is announced (userspace_pm_add_addr), the
ANNOUNCED event is expected. When a new subflow is created
(userspace_pm_add_sf), the SUB_ESTABLISHED event is expected.
With this, the tests can finish quicker.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-11-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It looks like most of the time, this helper was simply waiting a bit
more than one second: the previous MPJoin counter was often already at
the expected value. In the end, it was just checking 10 times whether
the MPJoin counter had changed, which never happened. For the tests,
that was wasted time: they were simply waiting longer for nothing.
Instead, use 'wait_mpj' with the expected counter: in the tests, the MPJ
counter can easily be predicted. While at it, stop passing the netns as
an argument: the received MPJoin ACK is checked here, which happens on
the server side. If this later needs to be checked on the client side,
the helper can be adapted for that case, but better to avoid confusion
now if it is not needed.
While at it, stop using 'i' for the loop variable when it is not used.
With this, the tests can finish quicker.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-10-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'wait_mpj' was used just after having created a background connection,
but before creating new subflows. So no MPJ was sent. The intention was
to wait for the connection to be established, which was the same as
doing a simple sleep with an arbitrary value.
Instead, wait for an "established" event. With this, the tests can
finish quicker.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-9-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This file is the only one in this directory whose header inclusions are
not all sorted by type and in alphabetical order.
Adapt them to ease reading, prevent conflicts during potential future
backports modifying these lines, and also avoid having UAPI header
inclusions before libc ones, see [1].
Link: https://lore.kernel.org/20260120-uapi-sockaddr-v2-1-63c319111cf6@linutronix.de
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260203-net-next-mptcp-misc-feat-6-20-v1-8-31ec8bfc56d1@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add tests for linked register tracking with negative offsets, BPF_SUB,
and alu32. They cover edge cases such as overflows.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260204151741.2678118-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Previously, the verifier only tracked positive constant deltas between
linked registers using BPF_ADD. This limitation meant patterns like:
r1 = r0;
r1 += -4;
if r1 s< 0 goto l0_%=; // fall-through: r1 >= 0 implies r0 >= 4
// verifier couldn't propagate bounds back to r0
if r0 != 0 goto l0_%=; // always taken once r0 >= 4 is known
r0 /= 0; // Verifier thinks this is reachable
l0_%=:
A similar limitation exists for 32-bit registers.
With this change, the verifier can now track negative deltas in reg->off,
enabling bounds propagation for the above pattern.
For alu32, we make sure the destination register has its upper 32 bits
cleared before creating the link. BPF_ADD_CONST is split into
BPF_ADD_CONST64 and BPF_ADD_CONST32; the latter is used in the alu32
case, and sync_linked_regs() uses it to zero-extend the result if
known_reg has this flag.
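For illustration, a minimal selftest-style program exercising this
pattern could look like the following sketch (not the actual test added
by this series; the SEC/__naked/__success annotations are the ones used
by the BPF selftests, and the extra mask only keeps the example free of
overflow concerns):
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"

SEC("socket")
__success
__naked void sub_const_then_bound(void)
{
        asm volatile (
        "call %[bpf_get_prandom_u32];"
        /* bound the unknown value so the example stays overflow-free */
        "r0 &= 0xff;"
        "r1 = r0;"
        "r1 += -4;"
        /* fall-through: r1 >= 0, so r0 >= 4 once bounds propagate */
        "if r1 s< 0 goto l0_%=;"
        /* always taken when r0 >= 4 is known, making the division dead */
        "if r0 != 0 goto l0_%=;"
        "r0 /= 0;"
        "l0_%=: r0 = 0;"
        "exit;"
        :
        : __imm(bpf_get_prandom_u32)
        : __clobber_all);
}

char _license[] SEC("license") = "GPL";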
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260204151741.2678118-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Now BPF_END has bitwise tracking support. This patch adds selftests to
cover various cases of BPF_END (`bswap(16|32|64)`, `be(16|32|64)`,
`le(16|32|64)`) with bitwise propagation.
This patch is based on the existing `verifier_bswap.c` and adds several
types of new tests:
1. Unconditional byte swap operations:
- bswap16/bswap32/bswap64 with unknown bytes
2. Endian conversion operations (architecture-aware):
- be16/be32/be64: convert to big-endian
* on little-endian: do swap
* on big-endian: truncation (16/32-bit) or no-op (64-bit)
- le16/le32/le64: convert to little-endian
* on big-endian: do swap
* on little-endian: truncation (16/32-bit) or no-op (64-bit)
Each test simulates realistic networking scenarios where a value is
masked with unknown bits (e.g., var_off=(0x0; 0x3f00), range=[0,0x3f00]),
then byte-swapped, and the verifier must prove the result stays within
expected bounds.
Specifically, these selftests are based on dead code elimination:
if the BPF verifier can precisely track bitwise information through
byte swap operations, it can prune the trap path (an invalid memory
access) that should be unreachable, allowing the program to pass
verification. If the bitwise tracking is incorrect, the verifier cannot
prove the trap is unreachable, causing verification failure.
The tests use preprocessor conditionals (#ifdef __BYTE_ORDER__) to
verify correct behavior on both little-endian and big-endian
architectures, and require Clang 18+ for bswap instruction support.
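As an illustration of this dead-code-elimination approach, a minimal
sketch in the style of verifier_bswap.c could look as follows (not one
of the actual new tests; the mask value and the trap shape are only
examples):
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"

SEC("socket")
__success
__naked void bswap16_prunes_trap(void)
{
        asm volatile (
        "call %[bpf_get_prandom_u32];"
        /* unknown bits only in 0x3f00, i.e. var_off=(0x0; 0x3f00) */
        "r0 &= 0x3f00;"
        /* after the swap, the value must fit within 0x003f */
        "r0 = bswap16 r0;"
        "if r0 > 0x3f goto l0_%=;"
        "r0 = 0;"
        "exit;"
        /* trap: only reachable if the bitwise tracking is imprecise */
        "l0_%=: r1 = 0;"
        "r0 = *(u64 *)(r1 + 0);"
        "exit;"
        :
        : __imm(bpf_get_prandom_u32)
        : __clobber_all);
}

char _license[] SEC("license") = "GPL";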
Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Co-developed-by: Yazhou Tang <tangyazhou518@outlook.com>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260204111503.77871-3-ziye@zju.edu.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Strengthen the timer_start_deadlock test and also check for recursion.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260204055147.54960-5-alexei.starovoitov@gmail.com
|
|
Add a testcase that checks that deadlock avoidance is working
as expected.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260204055147.54960-3-alexei.starovoitov@gmail.com
|
|
Merge cpupower utility updates for 6.20-rc1/7.0-rc1:
- Fix miscellaneous problems in cpupower (Kaushlendra Kumar):
* idle_monitor: Fix incorrect value logged after stop
* Fix inverted APERF capability check
* Use strcspn() to strip trailing newline
* Reset errno before strtoull()
* Show C0 in idle-info dump
- Improve cpupower installation procedure by making the systemd step
optional and allowing users to disable the installation of systemd's
unit file (João Marcos Costa)
* pm-tools:
cpupower: make systemd unit installation optional
tools/power cpupower: Show C0 in idle-info dump
tools/power cpupower: Reset errno before strtoull()
tools/cpupower: Use strcspn() to strip trailing newline
tools/cpupower: Fix inverted APERF capability check
cpupower: idle_monitor: fix incorrect value logged after stop
|
|
Pull KVM fixes from Paolo Bonzini:
- Fix a bug where AVIC is incorrectly inhibited when running with
x2AVIC disabled via module param (or on a system without x2AVIC)
- Fix a dangling device posted IRQs bug by explicitly checking if the
irqfd is still active (on the list) when handling an eventfd signal,
instead of zeroing the irqfd's routing information when the irqfd is
deassigned.
Zeroing the irqfd's routing info causes arm64 and x86 to not
disable posting for the IRQ (kvm_arch_irq_bypass_del_producer() looks
for an MSI), incorrectly leaving the IRQ in posted mode (and leading
to use-after-free and memory leaks on AMD in particular).
This is both the most pressing and the scariest fix, but it has been
in -next for a while.
- Disable FORTIFY_SOURCE for KVM selftests to prevent the compiler from
generating calls to the checked versions of memset() and friends,
which leads to unexpected page faults in guest code due to, e.g.,
__memset_chk@plt not being resolved.
- Explicitly configure the supported XSS capabilities from within
{svm,vmx}_set_cpu_caps() to fix a bug where VMX will compute the
reference VMCS configuration with SHSTK and IBT enabled, but then
compute each CPU's local config with SHSTK and IBT disabled if not all
CET xfeatures are enabled, e.g. if the kernel is built with
X86_KERNEL_IBT=n.
The mismatch in features results in differing nVMX settings, and
ultimately causes kvm-intel.ko to refuse to load with nested=1.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Explicitly configure supported XSS from {svm,vmx}_set_cpu_caps()
KVM: selftests: Add -U_FORTIFY_SOURCE to avoid some unpredictable test failures
KVM: x86: Assert that non-MSI doesn't have bypass vCPU when deleting producer
KVM: Don't clobber irqfd routing type when deassigning irqfd
KVM: SVM: Check vCPU ID against max x2AVIC ID if and only if x2AVIC is enabled
|
|
Merge updates of Intel thermal drivers for 6.20/7.0:
- Add Panther Lake, Wildcat Lake and Nova Lake processor IDs to the
list of supported processors in the intel_tcc_cooling thermal
driver (Srinivas Pandruvada)
- Drop unnecessary explicit driver data clearing on removal from the
intel_pch_thermal driver (Kaushlendra Kumar)
- Add support for "slow" workload type hints to the int340x
processor_thermal driver and enable it on the Panther Lake
platform (Srinivas Pandruvada)
- Use sysfs_emit{_at}() in sysfs show functions in Intel thermal
drivers (Thorsten Blum)
- Update the x86_pkg_temp_thermal driver to handle THERMAL_TEMP_INVALID
that can be passed to it via sysfs as expected (Rafael Wysocki)
- Drop a redundant local variable from the intel_tcc_cooling thermal
driver and fix a kerneldoc comment typo in the TCC library (Sumeet
Pawnikar)
* thermal-intel:
drivers: thermal: intel: tcc_cooling: Drop redundant local variable
thermal: intel: x86_pkg_temp_thermal: Handle invalid temperature
thermal: intel: Use sysfs_emit() in a sysfs show function
thermal: intel: fix typo "nagative" in comment for cpu argument
thermal: intel: int340x: Use sysfs_emit{_at}() in sysfs show functions
thermal: intel: selftests: workload_hint: Support slow workload hints
thermal: int340x: processor_thermal: Enable slow workload type hints
thermal: intel: intel_pch_thermal: Drop explicit driver data clearing
thermal: intel: intel_tcc_cooling: Add CPU models in the support list
|
|
Preserve the original relative order of anonymous or same-named
types to improve consistency.
No functional changes.
Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260202120114.3707141-1-dolinux.peng@gmail.com
|
|
Add support for normalized CXL address translation through the ACPI PRM
method to support AMD Zen5 platforms. This includes a conventions
document that explains how the translation is implemented, so that
future implementations needing such a setup can comply with the current
method.
cxl: Disable HPA/SPA translation handlers for Normalized Addressing
cxl/region: Factor out code into cxl_region_setup_poison()
cxl/atl: Lock decoders that need address translation
cxl: Enable AMD Zen5 address translation using ACPI PRMT
cxl/acpi: Prepare use of EFI runtime services
cxl: Introduce callback for HPA address ranges translation
cxl/region: Use region data to get the root decoder
cxl/region: Add @hpa_range argument to function cxl_calc_interleave_pos()
cxl/region: Separate region parameter setup and region construction
cxl: Simplify cxl_root_ops allocation and handling
cxl/region: Store HPA range in struct cxl_region
cxl/region: Store root decoder in struct cxl_region
cxl/region: Rename misleading variable name @hpa to @hpa_range
Documentation/driver-api/cxl: ACPI PRM Address Translation Support and AMD Zen5 enablement
cxl, doc: Moving conventions in separate files
cxl, doc: Remove isonum.txt inclusion
|
|
Final KVM fixes for 6.19:
- Fix a bug where AVIC is incorrectly inhibited when running with x2AVIC
disabled via module param (or on a system without x2AVIC).
- Fix a dangling device posted IRQs bug by explicitly checking if the irqfd is
still active (on the list) when handling an eventfd signal, instead of
zeroing the irqfd's routing information when the irqfd is deassigned.
Zeroing the irqfd's routing info causes arm64 and x86 to not disable
posting for the IRQ (kvm_arch_irq_bypass_del_producer() looks for an MSI),
incorrectly leaving the IRQ in posted mode (and leading to use-after-free
and memory leaks on AMD in particular).
This is both the most pressing and the scariest fix, but it has been
in -next for a while.
- Disable FORTIFY_SOURCE for KVM selftests to prevent the compiler from
generating calls to the checked versions of memset() and friends, which
leads to unexpected page faults in guest code due to, e.g.,
__memset_chk@plt not being resolved.
- Explicitly configure the supported XSS from within {svm,vmx}_set_cpu_caps()
to fix a bug where VMX will compute the reference VMCS configuration with
SHSTK and IBT enabled, but then compute each CPU's local config with SHSTK
and IBT disabled if not all CET xfeatures are enabled, e.g. if the kernel
is built with X86_KERNEL_IBT=n. The mismatch in features results in
differing nVMX settings, and ultimately causes kvm-intel.ko to refuse to
load with nested=1.
|
|
Add AMD Zen5 support for address translation.
Zen5 systems may be configured to use 'Normalized addresses'. Then,
host physical addresses (HPA) are different from their system physical
addresses (SPA). The endpoint has its own physical address space and
an incoming HPA is already converted to the device's physical address
(DPA). Thus it has interleaving disabled and CXL endpoints are
programmed passthrough (DPA == HPA).
Host Physical Addresses (HPAs) need to be translated from the endpoint
to its CXL host bridge, esp. to identify the endpoint's root decoder
and region's address range. ACPI Platform Runtime Mechanism (PRM)
provides a handler to translate the DPA to its SPA. This is documented
in:
AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
ACPI v6.5 Porting Guide, Publication # 58088
https://www.amd.com/en/search/documentation/hub.html
With Normalized Addressing this PRM handler must be used to translate
an HPA of an endpoint to its SPA.
Do the following to implement AMD Zen5 address translation:
Introduce a new file core/atl.c to handle ACPI PRM specific address
translation code. The naming is loosely related to the kernel's AMD
Address Translation Library (CONFIG_AMD_ATL), but the implementation
does not depend on it, nor is it vendor specific. Use Kbuild and Kconfig
options respectively to enable the code depending on architecture and
platform options.
AMD Zen5 systems support the ACPI PRM CXL Address Translation firmware
call (see ACPI v6.5 Porting Guide, Address Translation - CXL DPA to
System Physical Address). Firmware enables the PRM handler if the
platform has address translation implemented. Check firmware and
kernel support of ACPI PRM using the specific GUID. On success, enable
address translation by setting up the earlier-introduced root port
callback; see cxl_prm_setup_translation(). Setup is done in
cxl_setup_prm_address_translation(); it is the only function that
needs to be exported. For low-level PRM firmware calls, use the ACPI
framework.
Identify the region's interleaving ways by inspecting the address
ranges. Also determine the interleaving granularity using the address
translation callback. Note that the position of the chunk from one
interleaving block to the next may vary and thus cannot be considered
constant. Address offsets larger than the interleaving block size
cannot be used to calculate the granularity. Thus, probe the
granularity using address translation for various HPAs in the same
interleaving block.
[ dj: Add atl.o build to cxl_test ]
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://patch.msgid.link/20260114164837.1076338-11-rrichter@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
This test exercises the various storage key handling functions.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
We had a few patches in this area and no explicit coverage so far.
The test case covers the scenario addressed by the previous fix,
reusing the existing udpgro_fwd.sh script to leverage part of the
virtual network setup, even if that script is possibly not a perfect
fit.
Note that the mentioned script already contains several shellcheck
violations; this patch does not fix the existing code, it just avoids
adding more issues in the new one.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/768ca132af81e83856e34d3105b86c37e566a7ad.1770032084.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a test to stress the race between bpf_timer_start and map_delete.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-10-alexei.starovoitov@gmail.com
|
|
Now bpf_timer can be used in tracepoints, so these tests are no longer
relevant.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-9-alexei.starovoitov@gmail.com
|
|
Add stress tests for BPF timers that run in NMI context using perf_event
programs attached to PERF_COUNT_HW_CPU_CYCLES.
The tests cover three scenarios:
- nmi_race: Tests concurrent timer start and async cancel operations
- nmi_update: Tests updating a map element (effectively deleting and
inserting new for array map) from within a timer callback
- nmi_cancel: Tests timer self-cancellation attempt.
A common test_common() helper is used to share timer setup logic across
all test modes.
The tests spawn multiple threads in a child process to generate
perf events, which trigger the BPF programs in NMI context. Hit counters
verify that the NMI code paths were actually exercised.
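For reference, the rough shape of such a perf_event program is sketched
below (illustrative only; the real tests use different maps, callbacks
and hit counters):
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct elem {
        struct bpf_timer t;
};

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, int);
        __type(value, struct elem);
} tmap SEC(".maps");

static int timer_cb(void *map, int *key, struct elem *val)
{
        return 0;
}

SEC("perf_event")
int nmi_timer_start(void *ctx)
{
        int key = 0;
        struct elem *val = bpf_map_lookup_elem(&tmap, &key);

        if (!val)
                return 0;
        /* runs in NMI context when attached to PERF_COUNT_HW_CPU_CYCLES */
        bpf_timer_init(&val->t, &tmap, 1 /* CLOCK_MONOTONIC */);
        bpf_timer_set_callback(&val->t, timer_cb);
        bpf_timer_start(&val->t, 0, 0);
        return 0;
}

char _license[] SEC("license") = "GPL";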
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-8-alexei.starovoitov@gmail.com
|
|
Add a test that verifies that bpf_timer_cancel_async() works: it can
cancel the callback successfully.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-7-alexei.starovoitov@gmail.com
|
|
Extend the BPF timer selftest to run a stress test for async cancel.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-6-alexei.starovoitov@gmail.com
|
|
Refactor the timer selftests, extracting the stress test into a separate
test. This makes it easier to debug test failures and to extend the
tests.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-5-alexei.starovoitov@gmail.com
|
|
Add a selftest to ensure BPF stream functions can now be called
while holding a lock.
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260203180424.14057-5-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add selftests for the new bpf_stream_print_stack kfunc.
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260203180424.14057-3-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Test that two registers with their id=0 (unlinked) in the cached state
can be mapped to a single id (linked) in the current state.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260203165102.2302462-6-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Scalar register IDs are used by the verifier to track relationships
between registers and enable bounds propagation across those
relationships. Once an ID becomes singular (i.e. only a single
register/stack slot carries it), it can no longer contribute to bounds
propagation and effectively becomes stale. The previous commit makes the
verifier clear such ids before caching the state.
When comparing the current and cached states for pruning, these stale
IDs can cause technically equivalent states to be considered different
and thus prevent pruning.
For example, in the selftest added in the next commit, two registers,
r6 and r7, are not linked to any other registers and get cached with
id=0; in the current state, they are both linked to each other with
id=A. Before this commit, check_scalar_ids() would give temporary ids
to r6 and r7 (say tid1 and tid2), check_ids() would map tid1->A, and
when it then saw tid2->A, it would not consider these states
equivalent.
Relax scalar ID equivalence by treating rold->id == 0 as "independent":
if the old state did not rely on any ID relationships for a register,
then any ID/linking present in the current state only adds constraints
and is always safe to accept for pruning. Implement this by returning
true immediately in check_scalar_ids() when old_id == 0.
Maintain correctness for the opposite direction (old_id != 0 && cur_id
== 0) by still allocating a temporary ID for cur_id == 0. This avoids
incorrectly allowing multiple independent current registers (id==0) to
satisfy a single linked old ID during mapping.
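A sketch of the relaxed check, following the verifier's existing naming
(illustrative, not the exact diff):
static bool check_scalar_ids(u32 old_id, u32 cur_id, struct bpf_idmap *idmap)
{
        /* Old state had no ID relationship for this register: any linking
         * in the current state only adds constraints, so it is always safe
         * to accept for pruning.
         */
        if (!old_id)
                return true;

        /* Old state relied on a link but the current register is unlinked:
         * keep allocating a temporary ID so a single linked old ID cannot
         * be satisfied by multiple independent current registers.
         */
        if (!cur_id)
                cur_id = ++idmap->tmp_id_gen;

        return check_ids(old_id, cur_id, idmap);
}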
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260203165102.2302462-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add Linux Accurate ECN test sets using ACE counters and AccECN options to
cover several scenarios: Connection teardown, different ACK conditions,
counter wrapping, SACK space grabbing, fallback schemes, negotiation
retransmission/reorder/loss, AccECN option drop/loss, different
handshake reflectors, data with marking, and different sysctl values.
The packetdrill version used is commit
cbe405666c9c8698ac1e72f5e8ffc551216dfa56 of the repo
https://github.com/minuscat/packetdrill/tree/upstream_accecn, and the
corresponding patches have been sent to the google/packetdrill mailing
list.
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Co-developed-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-16-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, GRO does not flush packets when the CWR bit is set.
A corresponding self-test is being added, in which the CWR flag
is set for two consecutive packets, but the first packet with the
CWR flag set will not be flushed immediately.
+===================+==========+===============+===========+
| Packet id | CWR flag | Payload | Flushing? |
+===================+==========+===============+===========+
| 0 | 0 | PAYLOAD_LEN | 0 |
| ... | 0 | PAYLOAD_LEN | 1 |
+-------------------+----------+---------------+-----------+
| NUM_PACKETS/2 - 1 | 1 | payload_len | 0 |
| NUM_PACKETS/2 | 1 | payload_len | 1 |
+-------------------+----------+---------------+-----------+
| ... | 0 | PAYLOAD_LEN | 0 |
| NUM_PACKETS | 0 | PAYLOAD_LEN | 1 |
+===================+==========+===============+===========+
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260131222515.8485-4-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add a new kselftest to verify that the total_bw value in
/sys/kernel/debug/sched/debug remains consistent across all CPUs
under different sched_ext BPF program states:
1. Before a BPF scheduler is loaded
2. While a BPF scheduler is loaded and active
3. After a BPF scheduler is unloaded
The test runs CPU stress threads to ensure DL server bandwidth
values stabilize before checking consistency. This helps catch
potential issues with DL server bandwidth accounting during
sched_ext transitions.
Co-developed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20260126100050.3854740-8-arighi@nvidia.com
|
|
Add a selftest to validate the correct behavior of the deadline server
for the ext_sched_class.
Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20260126100050.3854740-7-arighi@nvidia.com
|
|
Update to avoid conflicts with /urgent patches.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
|
The "splice" alternate mode for mptcp_connect.sh/.c is available now,
this patch adds mptcp_connect_splice.sh to test it in the MPTCP CI by
default.
Note that this mode is also supported by stable kernel versions, but
optimised in this patch series.
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-6-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds a new 'splice' io mode for mptcp_connect to test
the newly added read_sock() and splice_read() functions of MPTCP.
do_splice() efficiently transfers data directly between two file
descriptors (infd and outfd) without copying to userspace, using
Linux's splice() system call.
Usage:
./mptcp_connect.sh -m splice
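The splice()-based relay roughly follows the pattern sketched below
(illustrative; it assumes a pipe as the in-kernel intermediary since
splice() requires one end to be a pipe, and the actual do_splice() in
mptcp_connect.c may differ in buffer sizes and error handling):
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

static int splice_relay(int infd, int outfd)
{
        int pfd[2];
        ssize_t in;

        if (pipe(pfd) < 0)
                return -1;

        for (;;) {
                /* move data from the source fd into the pipe, no user copy */
                in = splice(infd, NULL, pfd[1], NULL, 65536,
                            SPLICE_F_MOVE | SPLICE_F_MORE);
                if (in <= 0)
                        break;

                /* drain the pipe into the destination fd */
                while (in > 0) {
                        ssize_t out = splice(pfd[0], NULL, outfd, NULL, in,
                                             SPLICE_F_MOVE | SPLICE_F_MORE);
                        if (out <= 0) {
                                in = -1;
                                break;
                        }
                        in -= out;
                }
                if (in < 0)
                        break;
        }

        close(pfd[0]);
        close(pfd[1]);
        return in < 0 ? -1 : 0;
}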
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-5-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Make the default (non-JSON) output more compact. Looking at RSS
context dumps is pretty much impossible without this, because the
default print shows the indirection table with one line per entry:
'indir': [0,
1,
2,
...
And indirection tables have 100-200 entries each.
The compact output is far more readable:
'indir': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260131203029.1173492-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a test which checks that the RSS table is at least 4x the max
queue count supported by the device. The original RSS spec from
Microsoft stated that the RSS indirection table should be 2 to 8
times the CPU count, presumably assuming one queue per CPU. If the
CPU count is not a power of two, however, a power-of-2 table
2x larger than the queue count results in a 33% traffic imbalance.
Validate that the indirection table is at least 4x the queue
count. This lowers the imbalance to 16%, which empirically
appears to be more acceptable to memcache-like workloads.
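A worked example of the imbalance arithmetic (illustrative; it assumes
the power-of-2 table size is the queue count times the multiplier,
rounded up):
#include <stdio.h>

static unsigned int pow2_roundup(unsigned int x)
{
        unsigned int p = 1;

        while (p < x)
                p <<= 1;
        return p;
}

int main(void)
{
        unsigned int queues = 6;        /* any non-power-of-2 queue count */

        for (unsigned int mult = 2; mult <= 4; mult *= 2) {
                unsigned int tbl = pow2_roundup(queues * mult);
                unsigned int lo = tbl / queues;
                unsigned int hi = lo + (tbl % queues ? 1 : 0);

                /* worst-served queue vs best-served queue */
                printf("%ux table (%3u entries): imbalance %.1f%%\n",
                       mult, tbl, 100.0 * (hi - lo) / hi);
        }
        return 0;
}
This prints 33.3% for the 2x case and 16.7% for the 4x case, matching
the numbers above.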
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260131225454.1225151-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The init_enable_count test is flaky. The test forks 1024 children before
attaching the scheduler to verify that existing tasks get ops.init_task()
called. The children were using sleep(1) before exiting.
7900aa699c34 ("sched_ext: Fix cgroup exit ordering by moving sched_ext_free()
to finish_task_switch()") changed when tasks are removed from scx_tasks -
previously when the task_struct was freed, now immediately in
finish_task_switch() when the task dies.
Before the commit, pre-forked children would linger on scx_tasks until freed
regardless of when they exited, so the scheduler would always see them during
iteration. The sleep(1) was unnecessary. After the commit, children are
removed as soon as they die. The sleep(1) masks the problem in most cases but
the test becomes flaky depending on timing.
Fix by synchronizing properly using a pipe. All children block on read() and
the parent signals them to exit by closing the write end after attaching the
scheduler. The children are auto-reaped so there's no need to wait on them.
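A condensed sketch of this synchronization scheme (illustrative; the
real selftest uses the sched_ext test harness and different
bookkeeping):
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define NR_CHILDREN 1024

int main(void)
{
        int pfd[2];
        char byte;

        if (pipe(pfd) < 0)
                exit(1);

        for (int i = 0; i < NR_CHILDREN; i++) {
                pid_t pid = fork();

                if (pid == 0) {
                        /* child: block until the parent closes the write end */
                        close(pfd[1]);
                        read(pfd[0], &byte, 1);
                        exit(0);
                }
        }

        /* ... attach the scheduler and verify init_task() counts here ... */

        /* signal all children to exit at once by closing the write end;
         * the parent does not need to wait on them
         */
        close(pfd[1]);
        return 0;
}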
Reported-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Cc: David Vernet <void@manifault.com>
Cc: Andrea Righi <arighi@nvidia.com>
Cc: Changwoo Min <changwoo@igalia.com>
Cc: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Besides the parameters that are passed via command line arguments,
the wrapper's behavior is affected by several environment variables.
Document that. While here, use __doc__ for its description.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <e0ccee75f7e7fb499e0f59d5b84469f4b6a21627.1769500383.git.mchehab+huawei@kernel.org>
|
|
Documentation builds were using "-q" for a long time, but sometimes
it is nice to see the Sphinx progress, without increasing build
verbosity - which would also turn on kernel-doc verbosity.
Instead of doing that, let's parse sphinx-build's already-existing -v
option: each time it is used, it increases the verbosity level.
With that, if the default is to use -q, a single -v will disable
quiet mode. Passing more -v options will keep increasing the verbosity.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <38b24e97a3cbd2def418359a8e69b1b087a945ad.1769500383.git.mchehab+huawei@kernel.org>
|
|
the "\1" inside a docstring requires proper scaping to not be
considered a hex character and break the build.
Reported-by: Akira Yokosawa <akiyks@gmail.com>
Closes: https://lore.kernel.org/linux-doc/63e99049-cc72-4156-83af-414fdde34312@gmail.com/
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <2fff8ef1d0d64e8b68f15f5c07613f302d773855.1769500383.git.mchehab+huawei@kernel.org>
|
|
Fix up and refactor downstream port enumeration to prepare for CXL port
protocol error handling. The main motivation is to move endpoint
component register mapping to a port object.
cxl/port: Unify endpoint and switch port lookup
cxl/port: Move endpoint component register management to cxl_port
cxl/port: Map Port RAS registers
cxl/port: Move dport RAS setup to dport add time
cxl/port: Move dport probe operations to a driver event
cxl/port: Move decoder setup before dport creation
cxl/port: Cleanup dport removal with a devres group
cxl/port: Reduce number of @dport variables in cxl_port_add_dport()
cxl/port: Cleanup handling of the nr_dports 0 -> 1 transition
|
|
Fix a pointer arithmetic error in scx_sdt during freeing that
causes the allocator to use the wrong memory address for the
allocation's data header.
Fixes: 36929ebd17ae ("tools/sched_ext: add arena based scheduler")
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Acked-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Towards the end goal of making all CXL RAS capability handling uniform
across host bridge ports, upstream switch ports, and endpoint ports, move
dport RAS setup. Move it to cxl_switch_port_probe() context for switch / VH
dports (via cxl_port_add_dport()) and cxl_endpoint_port_probe() context for
an RCH dport. Rename the RAS setup helper to devm_cxl_dport_ras_setup() for
symmetry with devm_cxl_switch_port_decoders_setup().
Only the RCH version needs to be exported and the cxl_test mocking can be
deleted with a dev_is_pci() check on the dport_dev.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260131000403.2135324-7-dan.j.williams@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
In preparation for adding more register setup to the cxl_port_add_dport()
path (for RAS register mapping), move the dport creation event to a driver
callback. This achieves two goals: it puts driver operations logically
where they belong, in a driver, and it obviates the gymnastics of
DECLARE_TESTABLE(), which just makes a mess of grepping for CXL symbols.
In other words, a driver callback is less of an ongoing maintenance burden
than this DECLARE_TESTABLE arrangement that does not scale and diminishes
the grep-ability of the codebase.
cxl_port_add_dport() moves mostly unmodified from drivers/cxl/core/port.c.
The only deliberate change is that it now assumes that the device_lock is
held on entry and the driver is attached (just like cxl_port_probe()).
Reviewed-by: Terry Bowman <terry.bowman@amd.com>
Tested-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260131000403.2135324-6-dan.j.williams@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
- Fix a build error on ia32-x86_64 cross builds
- Replace locally open-coded ALIGN_UP(), ALIGN_UP_POW2()
and MAX(), which, beyond being duplicates, include a
buggy ALIGN_UP_POW2()
- Fix objtool klp-diff regression caused by a recent
change to the bug table format
- Fix klp-build vs CONFIG_MODULE_SRCVERSION_ALL build
failure
* tag 'objtool-urgent-2026-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
livepatch/klp-build: Fix klp-build vs CONFIG_MODULE_SRCVERSION_ALL
objtool/klp: Fix bug table handling for __WARN_printf()
objtool: Replace custom macros in elf.c with shared ones
objtool: Print bfd_vma as unsigned long long on ia32-x86_64 cross build
|
|
Add scx_bpf_error() calls when scx_bpf_create_dsq() fails in the remaining
schedulers to improve debuggability:
- scx_simple.bpf.c: simple_init()
- scx_sdt.bpf.c: sdt_init()
- scx_cpu0.bpf.c: cpu0_init()
- scx_flatcg.bpf.c: fcg_init()
This follows the same pattern established in commit 2f8d489897ae
("sched_ext: Add error logging for dsq creation failures") for other
schedulers and ensures consistent error reporting across all schedulers.
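The resulting pattern in each init callback looks roughly like this
(sketch based on scx_simple.bpf.c; the DSQ id and message wording are
illustrative):
s32 BPF_STRUCT_OPS_SLEEPABLE(simple_init)
{
        s32 ret = scx_bpf_create_dsq(SHARED_DSQ, -1);

        if (ret)
                scx_bpf_error("failed to create shared DSQ: %d", ret);

        return ret;
}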
Signed-off-by: zhidao su <suzhidao@xiaomi.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|