| Age | Commit message (Collapse) | Author |
|
Add --rdonly_shmem_buf option to kublk that registers shared memory
buffers with UBLK_SHMEM_BUF_READ_ONLY (read-only pinning without
FOLL_WRITE) and mmaps with PROT_READ only.
Add test_shmemzc_04.sh which exercises the new flag with a null target,
hugetlbfs buffer, and write workload. Write I/O works because the
server only reads from the shared buffer — the data flows from client
to kernel to the shared pages, and the server reads them out.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-11-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add test_shmemzc_03.sh which exercises shmem_zc through the full
filesystem stack: mkfs ext4 on the ublk device, mount it, then run
fio verify on a file inside the filesystem with --mem=mmaphuge.
Extend _mkfs_mount_test() to accept an optional command that runs
between mount and umount. The function cd's into the mount directory
so the command can use relative file paths. Existing callers that
pass only the device are unaffected.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-10-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add test_shmem_zc_02.sh which tests the UBLK_IO_F_SHMEM_ZC zero-copy
path on the loop target using a hugetlbfs shared buffer. Both kublk and
fio mmap the same hugetlbfs file with MAP_SHARED, sharing physical
pages. The kernel's PFN matching enables zero-copy — the loop target
reads/writes directly from the shared buffer to the backing file.
Uses standard fio --mem=mmaphuge:<path> (supported since fio 1.10),
no patched fio required.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-9-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add test_shmem_zc_01.sh which tests UBLK_IO_F_SHMEM_ZC on the null
target using a hugetlbfs shared buffer. Both kublk (--htlb) and fio
(--mem=mmaphuge:<path>) mmap the same hugetlbfs file with MAP_SHARED,
sharing physical pages. The kernel PFN match enables zero-copy I/O.
Uses standard fio --mem=mmaphuge:<path> (supported since fio 1.10),
no patched fio required.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add loop_queue_shmem_zc_io() which handles I/O requests marked with
UBLK_IO_F_SHMEM_ZC. When the kernel sets this flag, the request data
lives in a registered shared memory buffer — decode index + offset
from iod->addr and use the server's mmap as the I/O buffer.
The dispatch check in loop_queue_tgt_rw_io() routes SHMEM_ZC requests
to this new function, bypassing the normal buffer registration path.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add infrastructure for UBLK_F_SHMEM_ZC shared memory zero-copy:
- kublk.h: struct ublk_shmem_entry and table for tracking registered
shared memory buffers
- kublk.c: per-device unix socket listener that accepts memfd
registrations from clients via SCM_RIGHTS fd passing. The listener
mmaps the memfd and registers the VA range with the kernel for PFN
matching. Also adds --shmem_zc command line option.
- kublk.c: --htlb <path> option to open a pre-allocated hugetlbfs
file, mmap it with MAP_SHARED|MAP_POPULATE, and register it with
the kernel via ublk_ctrl_reg_buf(). Any process that mmaps the same
hugetlbfs file shares the same physical pages, enabling zero-copy
without socket-based fd passing.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260331153207.3635125-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Before the fix, teardown of a ublk server that was attempting to recover
a device, but died when it had submitted a nonempty proper subset of the
fetch commands to any queue would loop forever. Add a test to verify
that, after the fix, teardown completes. This is done by:
- Adding a new argument to the fault_inject target that causes it die
after fetching a nonempty proper subset of the IOs to a queue
- Using that argument in a new test while trying to recover an
already-created device
- Attempting to delete the ublk device at the end of the test; this
hangs forever if teardown from the fault-injected ublk server never
completed.
It was manually verified that the test passes with the fix and hangs
without it.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://patch.msgid.link/20260405-cancel-v2-2-02d711e643c2@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
- Support for batch request processing for ublk, improving the
efficiency of the kernel/ublk server communication. This can yield
nice 7-12% performance improvements
- Support for integrity data for ublk
- Various other ublk improvements and additions, including a ton of
selftests additions and updated
- Move the handling of blk-crypto software fallback from below the
block layer to above it. This reduces the complexity of dealing with
bio splitting
- Series fixing a number of potential deadlocks in blk-mq related to
the queue usage counter and writeback throttling and rq-qos debugfs
handling
- Add an async_depth queue attribute, to resolve a performance
regression that's been around for a qhilw related to the scheduler
depth handling
- Only use task_work for IOPOLL completions on NVMe, if it is necessary
to do so. An earlier fix for an issue resulted in all these
completions being punted to task_work, to guarantee that completions
were only run for a given io_uring ring when it was local to that
ring. With the new changes, we can detect if it's necessary to use
task_work or not, and avoid it if possible.
- rnbd fixes:
- Fix refcount underflow in device unmap path
- Handle PREFLUSH and NOUNMAP flags properly in protocol
- Fix server-side bi_size for special IOs
- Zero response buffer before use
- Fix trace format for flags
- Add .release to rnbd_dev_ktype
- MD pull requests via Yu Kuai
- Fix raid5_run() to return error when log_init() fails
- Fix IO hang with degraded array with llbitmap
- Fix percpu_ref not resurrected on suspend timeout in llbitmap
- Fix GPF in write_page caused by resize race
- Fix NULL pointer dereference in process_metadata_update
- Fix hang when stopping arrays with metadata through dm-raid
- Fix any_working flag handling in raid10_sync_request
- Refactor sync/recovery code path, improve error handling for
badblocks, and remove unused recovery_disabled field
- Consolidate mddev boolean fields into mddev_flags
- Use mempool to allocate stripe_request_ctx and make sure
max_sectors is not less than io_opt in raid5
- Fix return value of mddev_trylock
- Fix memory leak in raid1_run()
- Add Li Nan as mdraid reviewer
- Move phys_vec definitions to the kernel types, mostly in preparation
for some VFIO and RDMA changes
- Improve the speed for secure erase for some devices
- Various little rust updates
- Various other minor fixes, improvements, and cleanups
* tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
blk-mq: ABI/sysfs-block: fix docs build warnings
selftests: ublk: organize test directories by test ID
block: decouple secure erase size limit from discard size limit
block: remove redundant kill_bdev() call in set_blocksize()
blk-mq: add documentation for new queue attribute async_dpeth
block, bfq: convert to use request_queue->async_depth
mq-deadline: covert to use request_queue->async_depth
kyber: covert to use request_queue->async_depth
blk-mq: add a new queue sysfs attribute async_depth
blk-mq: factor out a helper blk_mq_limit_depth()
blk-mq-sched: unify elevators checking for async requests
block: convert nr_requests to unsigned int
block: don't use strcpy to copy blockdev name
blk-mq-debugfs: warn about possible deadlock
blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs()
blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos()
blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static
blk-rq-qos: fix possible debugfs_mutex deadlock
blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos
blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
"resctrl test:
- fix division by zero error on Hygon
- fix non-contiguous CBM check for Hygon
- define CPU vendor IDs as bits to match usage
- add CPU vendor detection for Hygon
misc:
- coredeump test: use __builtin_trap() instead of a null pointer
- anon_inode: replace null pointers with empty arrays
- kublk: include message in _Static_assert for C11 compatibility
- run_kselftest.sh: add `--skip` argument option
- pidfd: fix typo in comment"
* tag 'linux_kselftest-next-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/pidfd: fix typo in comment
selftests/run_kselftest.sh: Add `--skip` argument option
selftests/resctrl: Fix non-contiguous CBM check for Hygon
selftests/resctrl: Add CPU vendor detection for Hygon
selftests/resctrl: Define CPU vendor IDs as bits to match usage
selftests/resctrl: Fix a division by zero error on Hygon
kselftest/kublk: include message in _Static_assert for C11 compatibility
kselftest/anon_inode: replace null pointers with empty arrays
kselftest/coredump: use __builtin_trap() instead of null pointer
|
|
Set UBLK_TEST_DIR to ${TMPDIR:-./ublktest-dir}/${TID}.XXXXXX to create
per-test subdirectories organized by test ID. This makes it easier to
identify and debug specific test runs.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove test_generic_01.sh since block layer may reorder I/O, making
the test prone to false positives. Apply the improvements to
test_generic_02.sh instead, which supposes for covering ublk dispatch
io order.
Rework test_generic_02 to verify that ublk dispatch doesn't reorder I/O
by comparing request start order with completion order using bpftrace.
The bpftrace script now:
- Tracks each request's start sequence number in a map keyed by sector
- On completion, verifies the request's start order matches expected
completion order
- Reports any out-of-order completions detected
The test script:
- Wait bpftrace BEGIN code block is run
- Pins fio to CPU 0 for deterministic behavior
- Uses block_io_start and block_rq_complete tracepoints
- Checks bpftrace output for reordering errors
Reported-and-tested-by: Alexander Atanasov <alex@zazolabs.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move integrity-focused tests into new 'integrity' group:
- test_null_04.sh -> test_integrity_01.sh
- test_loop_08.sh -> test_integrity_02.sh
Move recovery-focused tests into new 'recover' group:
- test_generic_04.sh -> test_recover_01.sh
- test_generic_05.sh -> test_recover_02.sh
- test_generic_11.sh -> test_recover_03.sh
- test_generic_14.sh -> test_recover_04.sh
Update Makefile to reflect the reorganization.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When running tests in parallel with high JOBS count (e.g., JOBS=64),
the existing timeouts can be insufficient due to system load:
- Increase state wait loops from 20/50 to 100 iterations in
_recover_ublk_dev(), __ublk_quiesce_dev(), and __ublk_kill_daemon()
to handle slower state transitions under heavy load
- Add --timeout=20 to udevadm settle calls to prevent indefinite
hangs when udev event queue is overwhelmed by rapid device
creation/deletion
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add _ublk_sleep() helper function that uses different sleep times
depending on whether tests run in parallel or sequential mode.
Usage: _ublk_sleep <normal_secs> <parallel_secs>
Export JOBS variable from Makefile so test scripts can detect parallel
execution, and use _ublk_sleep in test_part_02.sh to handle the
partition scan delay (1s normal, 5s parallel).
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add convenient Makefile targets for running specific test groups:
- run_generic, run_batch, run_null, run_loop, run_stripe, run_stress, etc.
- run_all for running all tests
Test groups are auto-detected from TEST_PROGS using pattern matching
(test_<group>_<num>.sh -> group), and targets are generated dynamically
using define/eval templates.
Supports parallel execution via JOBS variable:
- JOBS=1 (default): sequential with kselftest TAP output
- JOBS>1: parallel execution with xargs -P
Usage examples:
make run_null # Sequential execution
make run_stress JOBS=4 # Parallel with 4 jobs
make run_all JOBS=8 # Run all tests with 8 parallel jobs
With JOBS=8, running time of `make run_all` is reduced to 2m2s from 6m5s
in my test VM.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Track device IDs in UBLK_DEVS array when created. Update
_cleanup_test() to only delete devices created by this test
instead of using 'del -a' which removes all devices.
This prepares for running tests concurrently where each test
should only clean up its own devices.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add _ublk_del_dev() to delete a specific ublk device by ID and
use it in all test scripts instead of calling UBLK_PROG directly.
Also remove unused _remove_ublk_devices() function.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Encapsulate each test case in its own function for better organization
and maintainability:
- _setup_device(): device and backfile initialization
- _test_fill_and_verify(): initial data population
- _test_corrupted_reftag(): reftag corruption detection test
- _test_corrupted_data(): data corruption detection test
- _test_bad_apptag(): apptag mismatch detection test
Also fix temp file creation to use ${UBLK_TEST_DIR}/fio_err_XXXXX instead of
creating in current directory.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove intermediate TDIR variable and set UBLK_TEST_DIR directly
in _prep_test(). Remove default initialization since the directory
is created dynamically when tests run.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Create and use a temporary directory for the files created during
test runs. If TMPDIR environment variable is set use it as a base
for the temporary directory path.
TMPDIR=/mnt/scratch make run_tests
and
TMPDIR=/mnt/scratch ./test_generic_01.sh
will place test directory under /mnt/scratch
Signed-off-by: Alexander Atanasov <alex@zazolabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Log test start and end time in dmesg, so generated log messages
during the test run can be linked to specific test from the test
suite.
(switch to `date +%F %T`)
Signed-off-by: Alexander Atanasov <alex@zazolabs.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The null target doesn't handle IO, so disable partition scan to avoid IO
failures caused by integrity verification during the kernel's partition
table read.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Encapsulate each test case in its own function that creates the
device, runs checks, and deletes only that device. This avoids
calling _cleanup_test multiple times.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This test exercises partition scanning behavior, so move it to
the test_part_* group for consistency.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add test_part_01.sh to test the UBLK_F_NO_AUTO_PART_SCAN feature
flag which allows suppressing automatic partition scanning during
device startup while still allowing manual partition probing.
The test verifies:
- Normal behavior: partitions are auto-detected without the flag
- With flag: partitions are not auto-detected during START_DEV
- Manual scan: blockdev --rereadpt works with the flag
Also update kublk tool to support --no_auto_part_scan option and
recognize the feature flag.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add automatic TID derivation in test_common.sh based on the script
filename. The TID is extracted by stripping the "test_" prefix and
".sh" suffix from the script name (e.g., test_loop_01.sh -> loop_01).
This removes the need for each test script to manually define TID,
reducing boilerplate and preventing potential mismatches between
the script name and TID. Scripts can still override TID after
sourcing test_common.sh if needed.
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A new utility metadata_size was added in
commit 261b67f4e347 ("selftests: ublk: add utility to get block device metadata size")
but it was not added to .gitignore. Fix that by adding it there.
While at it sort all entries alphabetically and add a SPDX license header.
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: 261b67f4e347 ("selftests: ublk: add utility to get block device metadata size")
Signed-off-by: Alexander Atanasov <alex@zazolabs.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix the two added test name.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Enable flexible thread-to-queue mapping in batch I/O mode to support
arbitrary combinations of threads and queues, improving resource
utilization and scalability.
Key improvements:
- Support N:M thread-to-queue mapping (previously limited to 1:1)
- Dynamic buffer allocation based on actual queue assignment per thread
- Thread-safe queue preparation with spinlock protection
- Intelligent buffer index calculation for multi-queue scenarios
- Enhanced validation for thread/queue combination constraints
Implementation details:
- Add q_thread_map matrix to track queue-to-thread assignments
- Dynamic allocation of commit and fetch buffers per thread
- Round-robin queue assignment algorithm for load balancing
- Per-queue spinlock to prevent race conditions during prep
- Updated buffer index calculation using queue position within thread
This enables efficient configurations like:
- Any other N:M combinations for optimal resource matching
Testing:
- Added test_batch_02.sh: 4 threads vs 1 queue
- Added test_batch_03.sh: 1 thread vs 4 queues
- Validates correctness across different mapping scenarios
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add --batch/-b for enabling F_BATCH_IO.
Add batch_01 for covering its basic function.
Add stress_08 and stress_09 for covering stress test.
Add recovery test for F_BATCH_IO in generic_04 and generic_05.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
More tests need to be covered in existing generic tests, and default
45sec isn't enough, and timeout is often triggered, increase timeout
by adding setting file.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add support for UBLK_U_IO_FETCH_IO_CMDS to enable efficient batch
fetching of I/O commands using multishot io_uring operations.
Key improvements:
- Implement multishot UBLK_U_IO_FETCH_IO_CMDS for continuous command fetching
- Add fetch buffer management with page-aligned, mlocked buffers
- Process fetched I/O command tags from kernel-provided buffers
- Integrate fetch operations with existing batch I/O infrastructure
- Significantly reduce uring_cmd issuing overhead through batching
The implementation uses two fetch buffers per thread with automatic
requeuing to maintain continuous I/O command flow. Each fetch operation
retrieves multiple command tags in a single syscall, dramatically
improving performance compared to individual command fetching.
Technical details:
- Fetch buffers are page-aligned and mlocked for optimal performance
- Uses IORING_URING_CMD_MULTISHOT for continuous operation
- Automatic buffer management and requeuing on completion
- Enhanced CQE handling for fetch command completions
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Implement UBLK_U_IO_COMMIT_IO_CMDS to enable efficient batched
completion of I/O operations in the batch I/O framework.
This completes the batch I/O infrastructure by adding the commit
phase that notifies the kernel about completed I/O operations:
Key features:
- Batch multiple I/O completions into single UBLK_U_IO_COMMIT_IO_CMDS
- Dynamic commit buffer allocation and management per thread
- Automatic commit buffer preparation before processing events
- Commit buffer submission after processing completed I/Os
- Integration with existing completion workflows
Implementation details:
- ublk_batch_prep_commit() allocates and initializes commit buffers
- ublk_batch_complete_io() adds completed I/Os to current batch
- ublk_batch_commit_io_cmds() submits batched completions to kernel
- Modified ublk_process_io() to handle batch commit lifecycle
- Enhanced ublk_complete_io() to route to batch or legacy completion
The commit buffer stores completion information (tag, result, buffer
details) for multiple I/Os, then submits them all at once, significantly
reducing syscall overhead compared to individual I/O completions.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Implement support for UBLK_U_IO_PREP_IO_CMDS in the batch I/O framework:
- Add batch command initialization and setup functions
- Implement prep command queueing with proper buffer management
- Add command completion handling for prep and commit commands
- Integrate batch I/O setup into thread initialization
- Update CQE handling to support batch commands
The implementation uses the previously established buffer management
infrastructure to queue UBLK_U_IO_PREP_IO_CMDS commands. Commands are
prepared in the first thread context and use commit buffers for
efficient command batching.
Key changes:
- ublk_batch_queue_prep_io_cmds() prepares I/O command batches
- ublk_batch_compl_cmd() handles batch command completions
- Modified thread setup to use batch operations when enabled
- Enhanced buffer index calculation for batch mode
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add the foundational infrastructure for UBLK_F_BATCH_IO buffer
management including:
- Allocator utility functions for small sized per-thread allocation
- Batch buffer allocation and deallocation functions
- Buffer index management for commit buffers
- Thread state management for batch I/O mode
- Buffer size calculation based on device features
This prepares the groundwork for handling batch I/O commands by
establishing the buffer management layer needed for UBLK_U_IO_PREP_IO_CMDS
and UBLK_U_IO_COMMIT_IO_CMDS operations.
The allocator uses CPU sets for efficient per-thread buffer tracking,
and commit buffers are pre-allocated with 2 buffers per thread to handle
overlapping command operations.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since UBLK_F_PER_IO_DAEMON is added, io buffer index may depend on current
thread because the common way is to use per-pthread io_ring_ctx for issuing
ublk uring_cmd.
Add one helper for returning io buffer index, so we can hide the buffer
index implementation details for target code.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Replace assert() with ublk_assert() since it is often triggered in daemon,
and we may get nothing shown in terminal.
Add ublk_assert(), so we can log something to syslog when assert() is
triggered.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The build_user_data() function packs multiple fields into a __u64
value using bit shifts. Without explicit __u64 casts before shifting,
the shift operations are performed on 32-bit unsigned integers before
being promoted to 64-bit, causing data loss.
Specifically, when tgt_data >= 256, the expression (tgt_data << 24)
shifts on a 32-bit value, truncating the upper 8 bits before promotion
to __u64. Since tgt_data can be up to 16 bits (assertion allows up to
65535), values >= 256 would have their high byte lost.
Add explicit __u64 casts to both op and tgt_data before shifting to
ensure the shift operations happen in 64-bit space, preserving all
bits of the input values.
user_data_to_tgt_data() is only used by stripe.c, in which the max
supported member disks are 4, so won't trigger this issue.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output
when running kublk in foreground mode. Without this, _evtfd is
zero-initialized to 0 (stdin), and ublk_send_dev_event() writes
binary data to stdin which appears as garbage on the terminal.
Also fix debug message format string.
Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix error handling in ublk_start_daemon() when start_dev fails:
1. Call ublk_ctrl_stop_dev() to cancel inflight uring_cmd before
cleanup. Without this, the device deletion may hang waiting for
I/O completion that will never happen.
2. Add fail_start label so that pthread_join() is called on the
error path. This ensures proper thread cleanup when startup fails.
Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Include cmd_inflight in ublk_thread_is_done() check. Without this,
the thread may exit before all FETCH commands are completed, which
may cause device deletion to hang.
Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add 'stop' subcommand to kublk utility that uses the new
UBLK_CMD_TRY_STOP_DEV command when --safe option is specified.
This allows stopping a device only if it has no active openers,
returning -EBUSY otherwise.
Also add test_generic_16.sh to test the new functionality.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add test case loop_08 to verify the ublk integrity data flow. It uses
the kublk loop target to create a ublk device with integrity on top of
backing data and integrity files. It then writes to the whole device
with fio configured to generate integrity data. Then it reads back the
whole device with fio configured to verify the integrity data.
It also verifies that injected guard, reftag, and apptag corruptions are
correctly detected.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add test case null_04 to exercise all the different integrity params. It
creates 4 different ublk devices with different combinations of
integrity arguments and verifies their integrity limits via sysfs and
the metadata_size utility.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
To perform and end-to-end test of integrity information through a ublk
device, we need to actually store it somewhere and retrieve it. Add this
support to kublk's loop target. It uses a second backing file for the
integrity data corresponding to the data stored in the first file.
The integrity file is initialized with byte 0xFF, which ensures the app
and reference tags are set to the "escape" pattern to disable the
bio-integrity-auto guard and reftag checks until the blocks are written.
The integrity file is opened without O_DIRECT since it will be accessed
at sub-block granularity. Each incoming read/write results in a pair of
reads/writes, one to the data file, and one to the integrity file. If
either backing I/O fails, the error is propagated to the ublk request.
If both backing I/Os read/write some bytes, the ublk request is
completed with the smaller of the number of blocks accessed by each I/O.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A subsequent commit will add support for using a backing file to store
integrity data. Since integrity data is accessed in intervals of
metadata_size, which may be much smaller than a logical block on the
backing device, direct I/O cannot be used. Add an argument to
backing_file_tgt_init() to specify the number of files to open for
direct I/O. The remaining files will use buffered I/O. For now, continue
to request direct I/O for all the files.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If integrity data is enabled for kublk, allocate an integrity buffer for
each I/O. Extend ublk_user_copy() to copy the integrity data between the
ublk request and the integrity buffer if the ublksrv_io_desc indicates
that the request has integrity data.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add integrity param command line arguments to kublk. Plumb these to
struct ublk_params for the null and fault_inject targets, as they don't
need to actually read or write the integrity data. Forbid the integrity
params for loop or stripe until the integrity data copy is implemented.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Some block device integrity parameters are available in sysfs, but
others are only accessible using the FS_IOC_GETLBMD_CAP ioctl. Add a
metadata_size utility program to print out the logical block metadata
size, PI offset, and PI size within the metadata. Example output:
$ metadata_size /dev/ublkb0
metadata_size: 64
pi_offset: 56
pi_tuple_size: 8
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add support for printing the UBLK_F_INTEGRITY feature flag in the
human-readable kublk features output.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|