path: root/drivers/md
Age | Commit message | Author
2026-01-20 | block: pass io_comp_batch to rq_end_io_fn callback | Ming Lei
Add a third parameter 'const struct io_comp_batch *' to the rq_end_io_fn callback signature. This allows end_io handlers to access the completion batch context when requests are completed via blk_mq_end_request_batch().

The io_comp_batch is passed from blk_mq_end_request_batch(), while NULL is passed from __blk_mq_end_request() and blk_mq_put_rq_ref() which don't have batch context.

This infrastructure change enables drivers to detect whether they're being called from a batched completion path (like iopoll) and access additional context stored in the io_comp_batch.

Update all rq_end_io_fn implementations:
- block/blk-mq.c: blk_end_sync_rq
- block/blk-flush.c: flush_end_io, mq_flush_data_end_io
- drivers/nvme/host/ioctl.c: nvme_uring_cmd_end_io
- drivers/nvme/host/core.c: nvme_keep_alive_end_io
- drivers/nvme/host/pci.c: abort_endio, nvme_del_queue_end, nvme_del_cq_end
- drivers/nvme/target/passthru.c: nvmet_passthru_req_done
- drivers/scsi/scsi_error.c: eh_lock_door_done
- drivers/scsi/sg.c: sg_rq_end_io
- drivers/scsi/st.c: st_scsi_execute_end
- drivers/target/target_core_pscsi.c: pscsi_req_done
- drivers/md/dm-rq.c: end_clone_request

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-19 | dm-unstripe: fix mapping bug when there are multiple targets in a table | Matt Whitlock
The "unstriped" device-mapper target incorrectly calculates the sector offset on the mapped device when the target's origin is not zero. Take for example this hypothetical concatenation of the members of a two-disk RAID0:

  linearized: 0 2097152 unstriped 2 128 0 /dev/md/raid0 0
  linearized: 2097152 2097152 unstriped 2 128 1 /dev/md/raid0 0

The intent in this example is to create a single device named /dev/mapper/linearized that comprises all of the chunks of the first disk of the RAID0 set, followed by all of the chunks of the second disk of the RAID0 set.

This fails because dm-unstripe.c's map_to_core function does its computations based on the sector number within the mapper device rather than the sector number within the target. The bug turns invisible when the target's origin is at sector zero of the mapper device, as is the common case. In the example above, however, what happens is that the first half of the mapper device gets mapped correctly to the first disk of the RAID0, but the second half of the mapper device gets mapped past the end of the RAID0 device, and accesses to any of those sectors return errors.

Signed-off-by: Matt Whitlock <kernel@mattwhitlock.name>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 18a5bf270532 ("dm: add unstriped target")
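The arithmetic behind this failure can be modeled outside the kernel. The Python sketch below is a simplified model, not the code in dm-unstripe.c: the function names and the `target_begin` parameter are hypothetical, the row/offset computation is an assumption about how an unstriped mapping works in general, and the RAID0 size is inferred from the two 2097152-sector targets in the example table.

```python
CHUNK_SIZE = 128          # sectors per chunk, from the example table
STRIPES = 2               # number of stripes in the RAID0 set
RAID0_SECTORS = 4194304   # assumed: two members of 2097152 sectors each


def unstripe(sector, stripe):
    """Map a sector offset within one stripe to a sector on the striped device."""
    row = sector // CHUNK_SIZE
    # Each row of the striped device holds STRIPES chunks; skip the other
    # stripes' chunks in every preceding row, then step to our own stripe.
    return sector + (STRIPES - 1) * CHUNK_SIZE * row + stripe * CHUNK_SIZE


def map_buggy(mapped_sector, target_begin, stripe):
    # Bug model: computes from the sector number within the whole mapped device.
    return unstripe(mapped_sector, stripe)


def map_fixed(mapped_sector, target_begin, stripe):
    # Fix model: computes from the sector number relative to the target's start.
    return unstripe(mapped_sector - target_begin, stripe)


# First sector of the second target (begins at 2097152, unstripes stripe 1).
first = 2097152
print(map_buggy(first, first, 1) > RAID0_SECTORS)   # lands past the end
print(map_fixed(first, first, 1))                   # first chunk of stripe 1
```

In this model the two functions agree whenever `target_begin` is zero, which matches the observation that the bug is invisible in the common case.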
2026-01-19 | dm-integrity: fix recalculation in bitmap mode | Mikulas Patocka
There's a logic quirk in the handling of suspend in the bitmap mode. This is the sequence of calls if we are reloading a dm-integrity table:
* dm_integrity_ctr reads a superblock with the flag SB_FLAG_DIRTY_BITMAP set.
* dm_integrity_postsuspend initializes a journal and clears the flag SB_FLAG_DIRTY_BITMAP.
* dm_integrity_resume sees the superblock with SB_FLAG_DIRTY_BITMAP set - thus it interprets the journal as if it were a bitmap.

This quirk causes a recalculation problem if the user increases the size of the device in the bitmap mode.

Fix this by reading a fresh copy of the superblock in dm_integrity_resume. This commit also fixes another logic quirk - the branch that sets bitmap bits if the device was extended should only be executed if the flag SB_FLAG_DIRTY_BITMAP is set.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Ondrej Kozina <okozina@redhat.com>
Fixes: 468dfca38b1a ("dm integrity: add a bitmap mode")
Cc: stable@vger.kernel.org
2026-01-19 | dm-bufio: avoid redundant buffer_tree lookups | Eric Biggers
dm-bufio's map from block number to buffer is organized as a hash table of red-black trees. It does far more lookups in this hash table than necessary: typically one lookup to lock the tree, one lookup to search the tree, and one lookup to unlock the tree. Only one of those lookups is needed. Optimize it to do only the minimum number of lookups. This improves performance. It also reduces the object code size, considering that the redundant hash table lookups were being inlined. For example, the size of the text section of dm-bufio.o decreases from 15599 to 15070 bytes with gcc 15 and x86_64, or from 20652 to 20244 bytes with clang 21 and arm64. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-19 | dm-bufio: merge cache_put() into cache_put_and_wake() | Eric Biggers
Merge cache_put() into its only caller, cache_put_and_wake(). Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-19 | dm-verity: add dm-verity keyring | Christian Brauner
Add a dedicated ".dm-verity" keyring for root hash signature verification, similar to the ".fs-verity" keyring used by fs-verity.

By default the keyring is unused, retaining the exact same old behavior. For systems that provision additional keys only intended for dm-verity images during boot, the dm_verity.keyring_unsealed=1 kernel parameter leaves the keyring open.

We want to use this in systemd as a way to add keys during boot that are only used for creating dm-verity devices for later mounting and nothing else. The discoverable disk image (DDI) spec at [1] heavily relies on dm-verity and we would like to expand this even more. This will allow us to do that in a fully backward compatible way.

Once provisioning is complete, userspace restricts and activates it for dm-verity verification. If userspace fully seals the keyring then it gains the guarantee that no new keys can be added.

Link: https://uapi-group.org/specifications/specs/discoverable_partitions_specification [1]
Co-developed-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-14 | dm: clear cloned request bio pointer when last clone bio completes | Michael Liang
Stale rq->bio values have been observed to cause double-initialization of cloned bios in request-based device-mapper targets, leading to use-after-free and double-free scenarios.

One such case occurs when using dm-multipath on top of a PCIe NVMe namespace, where cloned request bios are freed during blk_complete_request(), but rq->bio is left intact. Subsequent clone teardown then attempts to free the same bios again via blk_rq_unprep_clone().

The resulting double-free path looks like:

  nvme_pci_complete_batch()
  nvme_complete_batch()
  blk_mq_end_request_batch()
  blk_complete_request() // called on a DM clone request
  bio_endio() // first free of all clone bios
  ...
  rq->end_io() // end_clone_request()
  dm_complete_request(tio->orig)
  dm_softirq_done()
  dm_done()
  dm_end_request()
  blk_rq_unprep_clone() // second free of clone bios

Fix this by clearing the clone request's bio pointer when the last cloned bio completes, ensuring that later teardown paths do not attempt to free already-released bios.

Signed-off-by: Michael Liang <mliang@purestorage.com>
Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
2026-01-14 | dm-verity: fix up various workqueue-related comments | Eric Biggers
Replace obsolete mentions of "tasklets" with "softirq context", and "workqueue" with "kworker". This reflects the fact that the implementation of the "try_verify_in_tasklet" dm-verity option now accesses softirq context using either the BH workqueue API or inline execution, not the tasklet API. The old names conflated the API with the intended execution context, so they became outdated when the APIs changed. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-14 | dm-verity: switch to bio_advance_iter_single() | Eric Biggers
dm-verity doesn't support data blocks that span pages, and it sets dma_alignment accordingly. As such, instead of using bio_advance_iter(), it can use the more lightweight function bio_advance_iter_single() to get the same result. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-14 | dm-verity: consolidate the BH and normal work structs | Eric Biggers
Since each dm_verity_io is never on both the BH and normal workqueues at the same time, there's no need for two different work_structs. Replace the 'bh_work' and 'work' fields with just 'work'. Note: this is correct even though it means 'work' may be reused while verity_bh_work() is running. The workqueue API allows work functions to reuse or free their work_struct, and many workqueue users rely on that. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-14 | dm: add WQ_PERCPU to alloc_workqueue users | Marco Crivellari
This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq") commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") The refactoring is going to alter the default behavior of alloc_workqueue() to be unbound by default. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. For more details see the Link tag below. In order to keep alloc_workqueue() behavior identical, explicitly request WQ_PERCPU. Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-14 | dm-integrity: fix a typo in the code for write/discard race | Mikulas Patocka
If we send a write followed by a discard, it may be possible that the discarded data end up being overwritten by the previous write from the journal. The code tries to prevent that, but there was a typo in this logic that prevented it from being activated as it should be.

Note that if we end up here the second time (when discard_retried is true), it means that the write bio is actually racing with the discard bio, and in this situation it is not specified which of them should win.

Cc: stable@vger.kernel.org
Fixes: 31843edab7cb ("dm integrity: improve discard in journal mode")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-14 | dm: use READ_ONCE in dm_blk_report_zones | Mikulas Patocka
The function dm_blk_report_zones reads md->zone_revalidate_map, however it may change while the function is running. Use READ_ONCE.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 37f53a2c60d0 ("dm: fix dm_blk_report_zones")
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
2026-01-14 | dm: fix unlocked test for dm_suspended_md | Mikulas Patocka
The function dm_blk_report_zones tests if the device is suspended with the "dm_suspended_md" call. However, this function is called without holding any locks, so the device may be suspended just after it. Move the call to dm_suspended_md after dm_get_live_table, so that the device can't be suspended after the suspended state was tested. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 37f53a2c60d0 ("dm: fix dm_blk_report_zones") Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
2026-01-11 | Merge branch 'block-6.19' into for-7.0/block | Jens Axboe
Merge in fixes that went to 6.19 after for-7.0/block was branched. Pending ublk changes depend on particularly the async scan work.

* block-6.19:
  block: zero non-PI portion of auto integrity buffer
  ublk: fix use-after-free in ublk_partition_scan_work
  blk-mq: avoid stall during boot due to synchronize_rcu_expedited
  loop: add missing bd_abort_claiming in loop_set_status
  block: don't merge bios with different app_tags
  blk-rq-qos: Remove unlikely() hints from QoS checks
  loop: don't change loop device under exclusive opener in loop_set_status
  block, bfq: update outdated comment
  blk-mq: skip CPU offline notify on unmapped hctx
  selftests/ublk: fix Makefile to rebuild on header changes
  selftests/ublk: add test for async partition scan
  ublk: scan partition in async way
  block,bfq: fix aux stat accumulation destination
  md: Fix forward incompatibility from configurable logical block size
  md: Fix logical_block_size configuration being overwritten
  md: suspend array while updating raid_disks via sysfs
  md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt()
  md: Fix static checker warning in analyze_sbs
2026-01-04 | dm-verity: allow REED_SOLOMON to be 'm' if DM_VERITY is 'm' | Eric Biggers
The dm-verity kconfig options make the common mistake of selecting a dependency from a bool "sub-option" rather than the main tristate option. This unnecessarily forces the dependency to built-in ('y'). Fix this by moving the selections of REED_SOLOMON and REED_SOLOMON_DEC8 into DM_VERITY, conditional on DM_VERITY_FEC. This allows REED_SOLOMON to be 'm' if DM_VERITY is 'm'. Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-04 | dm-verity: correctly handle dm_bufio_client_create() failure | Eric Biggers
If either of the calls to dm_bufio_client_create() in verity_fec_ctr() fails, then dm_bufio_client_destroy() is later called with an ERR_PTR() argument. That causes a crash. Fix this. Fixes: a739ff3f543a ("dm verity: add support for forward error correction") Cc: stable@vger.kernel.org Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-04 | dm-verity: make verity_fec_is_enabled() an inline function | Eric Biggers
verity_fec_is_enabled() is very short and is called in quite a few places, so make it an inline function. Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-04 | dm-verity: remove unnecessary ifdef around verity_fec_decode() | Eric Biggers
Since verity_fec_decode() has a !CONFIG_DM_VERITY_FEC stub, it can just be called unconditionally, similar to the other calls in the same file. Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-04 | dm-verity: remove unnecessary condition for verity_fec_finish_io() | Eric Biggers
Make verity_finish_io() call verity_fec_finish_io() unconditionally, instead of skipping it when 'in_bh' is true. Although FEC can't have been done when 'in_bh' is true, verity_fec_finish_io() is a no-op when FEC wasn't done. An earlier change also made verity_fec_finish_io() very lightweight when FEC wasn't done. So it should just be called unconditionally. Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-04 | dm-verity: make dm_verity_fec_io::bufs variable-length | Eric Biggers
When correcting a data block, the FEC code performs optimally when it has enough buffers to hold all the needed RS blocks. That number of buffers is '1 << (v->data_dev_block_bits - DM_VERITY_FEC_BUF_RS_BITS)'. However, since v->data_dev_block_bits isn't a compile-time constant, the code actually used PAGE_SHIFT instead. With the traditional PAGE_SIZE == data_block_size == 4096, this was fine. However, when PAGE_SIZE > data_block_size, this wastes space. E.g., with data_block_size == 4096 && PAGE_SIZE == 16384, struct dm_verity_fec_io is 9240 bytes, when in fact only 3096 bytes are needed. Fix this by making dm_verity_fec_io::bufs a variable-length array. This makes the macros DM_VERITY_FEC_BUF_MAX and fec_for_each_extra_buffer() no longer apply, so remove them. For consistency, and because DM_VERITY_FEC_BUF_PREALLOC is fixed at 1 and was already assumed to be 1 (considering that mempool_alloc() shouldn't be called in a loop), also remove the related macros DM_VERITY_FEC_BUF_PREALLOC and fec_for_each_prealloc_buffer(). Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
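The sizes quoted in this message can be cross-checked with a little arithmetic. The sketch below assumes that DM_VERITY_FEC_BUF_RS_BITS is 4 and that each entry of the bufs array is an 8-byte pointer on a 64-bit kernel; neither value is stated in the message, so treat both as assumptions. Under them, the space wasted by sizing the array from PAGE_SHIFT matches the difference between the two struct sizes given above.

```python
POINTER_SIZE = 8                 # assumed: bytes per bufs[] entry on 64-bit
DM_VERITY_FEC_BUF_RS_BITS = 4    # assumed value of the kernel constant


def buffers_for(block_bits):
    """Number of buffers from the formula quoted in the commit message."""
    return 1 << (block_bits - DM_VERITY_FEC_BUF_RS_BITS)


needed = buffers_for(12)         # data_block_size == 4096 -> 12 bits
allocated = buffers_for(14)      # PAGE_SIZE == 16384 -> PAGE_SHIFT == 14
wasted_bytes = (allocated - needed) * POINTER_SIZE

print(needed, allocated, wasted_bytes)
print(wasted_bytes == 9240 - 3096)   # compare with the sizes in the message
```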
2026-01-04 | dm-verity: move dm_verity_fec_io to mempool | Eric Biggers
Currently, struct dm_verity_fec_io is allocated in the front padding of struct bio using dm_target::per_io_data_size. Unfortunately, struct dm_verity_fec_io is very large: 3096 bytes when CONFIG_64BIT=y && PAGE_SIZE == 4096, or 9240 bytes when CONFIG_64BIT=y && PAGE_SIZE == 16384. This makes the bio size very large.

Moreover, most of dm_verity_fec_io gets iterated over up to three times, even on I/O requests that don't require any error correction:
1. To zero the memory on allocation, if init_on_alloc=1. (This happens when the bio is allocated, not in dm-verity itself.)
2. To zero the buffers array in verity_fec_init_io().
3. To free the buffers in verity_fec_finish_io().

Fix all of these inefficiencies by moving dm_verity_fec_io to a mempool. Replace the embedded dm_verity_fec_io with a pointer dm_verity_io::fec_io. verity_fec_init_io() initializes it to NULL, verity_fec_decode() allocates it on the first call, and verity_fec_finish_io() cleans it up. The normal case is that the pointer simply stays NULL, so the overhead becomes negligible.

Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
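The lifecycle described here (pointer starts as NULL, is allocated on the first decode, and is released on finish) is a lazy-allocation pattern. The Python sketch below only illustrates that pattern; the class and function names are hypothetical stand-ins, and a plain object construction takes the place of the kernel mempool.

```python
class FecIo:
    """Hypothetical stand-in for the large per-I/O FEC state."""

    def __init__(self):
        self.bufs = []


class VerityIo:
    """Hypothetical stand-in for the per-I/O structure holding the pointer."""

    def __init__(self):
        self.fec_io = None       # init step: nothing is allocated yet


def fec_decode(io):
    # Allocate the FEC state only on the first decode for this I/O.
    if io.fec_io is None:
        io.fec_io = FecIo()      # the kernel would take this from a mempool
    io.fec_io.bufs.append(bytearray(16))
    return io.fec_io


def fec_finish_io(io):
    # Normal case: no error correction happened, so there is nothing to free.
    if io.fec_io is None:
        return False
    io.fec_io = None             # the kernel would return it to the mempool
    return True


clean = VerityIo()
print(fec_finish_io(clean))      # no decode, nothing to release

corrected = VerityIo()
fec_decode(corrected)
fec_decode(corrected)            # reuses the state allocated by the first call
print(fec_finish_io(corrected))
```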
2026-01-04 | dm clone: drop redundant size checks | Li Chen
The clone target already exposes both source and destination devices via clone_iterate_devices(), so dm-table's device_area_is_invalid() helper ensures that the mapping does not extend past either underlying block device. The manual comparisons between ti->len and the source/destination device sizes in parse_source_dev() and parse_dest_dev() are therefore redundant. Remove these checks and rely on the core validation instead. This changes the error strings reported when the devices are too small, but preserves the failure behaviour. Signed-off-by: Li Chen <me@linux.beauty> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-04 | dm cache: drop redundant origin size check | Li Chen
The cache target already exposes the origin device through cache_iterate_devices(), which allows dm-table to call device_area_is_invalid() and verify that the mapping fits inside the underlying block device. The explicit ti->len > origin_sectors test in parse_origin_dev() is therefore redundant. Drop this check and rely on the core device validation instead. This changes the user-visible error string when the origin is too small, but preserves the failure behaviour. Signed-off-by: Li Chen <me@linux.beauty> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-02 | Merge tag 'block-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux | Linus Torvalds
Pull block fixes from Jens Axboe:

 - Scan partition tables asynchronously for ublk, similarly to how nvme does it. This avoids potential deadlocks, which is why nvme does it that way too. Includes a set of selftests as well.

 - MD pull request via Yu:
    - Fix null-pointer dereference in raid5 sysfs group_thread_cnt store (Tuo Li)
    - Fix possible mempool corruption during raid1 raid_disks update via sysfs (FengWei Shih)
    - Fix logical_block_size configuration being overwritten during super_1_validate() (Li Nan)
    - Fix forward incompatibility with configurable logical block size: arrays assembled on new kernels could not be assembled on older kernels (v6.18 and before) due to non-zero reserved pad rejection (Li Nan)
    - Fix static checker warning about iterator not incremented (Li Nan)

 - Skip CPU offlining notifications on unmapped hardware queues

 - bfq-iosched block stats fix

 - Fix outdated comment in bfq-iosched

* tag 'block-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  block, bfq: update outdated comment
  blk-mq: skip CPU offline notify on unmapped hctx
  selftests/ublk: fix Makefile to rebuild on header changes
  selftests/ublk: add test for async partition scan
  ublk: scan partition in async way
  block,bfq: fix aux stat accumulation destination
  md: Fix forward incompatibility from configurable logical block size
  md: Fix logical_block_size configuration being overwritten
  md: suspend array while updating raid_disks via sysfs
  md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt()
  md: Fix static checker warning in analyze_sbs
2026-01-02 | dm-stripe: adjust max_hw_discard_sectors to avoid unnecessary discard bio splitting | Yongpeng Yang
Currently, the max_hw_discard_sectors of a stripe target is set to the minimum max_hw_discard_sectors among all sub devices. When the discard bio is larger than max_hw_discard_sectors, this may cause the stripe device to split discard bios unnecessarily, because the value of max_hw_discard_sectors affects max_discard_sectors, which equals min(max_hw_discard_sectors, max_user_discard_sectors).

For example:

  root@vm:~# echo '0 33554432 striped 2 256 /dev/vdd 0 /dev/vde 0' | dmsetup create stripe_dev
  root@vm:~# cat /sys/block/dm-1/queue/discard_max_bytes
  536870912
  root@vm:~# cat /sys/block/dm-1/slaves/vdd/queue/discard_max_bytes
  536870912
  root@vm:~# blkdiscard -o 0 -l 1073741824 -p 1073741824 /dev/mapper/stripe_dev

dm-1 is the stripe device, and its discard_max_bytes is equal to each sub device’s discard_max_bytes. Since the requested discard length exceeds discard_max_bytes, the block layer splits the discard bio:

  block_bio_queue: 252,1 DS 0 + 2097152 [blkdiscard]
  block_split: 252,1 DS 0 / 1048576 [blkdiscard]
  block_rq_issue: 253,48 DS 268435456 () 0 + 524288 be,0,4 [blkdiscard]
  block_bio_queue: 253,64 DS 524288 + 524288 [blkdiscard]

However, both vdd and vde can actually handle a discard bio of 536870912 bytes, so this split is not necessary.

This patch updates the stripe target’s q->limits.max_hw_discard_sectors to be the minimum max_hw_discard_sectors of the sub devices multiplied by the number of stripe devices. The result must be rounded down to a multiple of the chunk size times the number of stripe devices, so that no discard bio larger than max_hw_discard_sectors is issued to a sub device. This patch enables the stripe device to handle larger discard bios without incurring unnecessary splitting.

Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
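The new limit described in this message can be worked through with the numbers from the example. This is a minimal Python sketch of the stated rule only (minimum sub-device limit times the number of stripes, rounded down to a whole number of full stripes); the function name is hypothetical, the 512-byte sector size is an assumption, and the kernel implementation may differ in detail.

```python
SECTOR_SIZE = 512   # assumed: bytes per sector


def stripe_max_hw_discard_sectors(sub_limits, chunk_sectors):
    """Stripe-wide discard limit derived from per-device limits (in sectors)."""
    stripes = len(sub_limits)
    limit = min(sub_limits) * stripes
    # Round down to a multiple of one full stripe (chunk size * stripe count).
    full_stripe = chunk_sectors * stripes
    return limit - limit % full_stripe


# vdd and vde each report discard_max_bytes == 536870912; chunk size is 256.
per_device = 536870912 // SECTOR_SIZE
limit = stripe_max_hw_discard_sectors([per_device, per_device], 256)
print(limit, limit * SECTOR_SIZE)
```

With these inputs the limit becomes 2097152 sectors, which is the 1073741824-byte length passed to blkdiscard in the example, so under this rule that request would no longer exceed the stripe device's limit.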
2026-01-02 | dm: replace -EEXIST with -EBUSY | Daniel Gomez
The -EEXIST error code is reserved by the module loading infrastructure to indicate that a module is already loaded. When a module's init function returns -EEXIST, userspace tools like kmod interpret this as "module already loaded" and treat the operation as successful, returning 0 to the user even though the module initialization actually failed. This follows the precedent set by commit 54416fd76770 ("netfilter: conntrack: helper: Replace -EEXIST by -EBUSY") which fixed the same issue in nf_conntrack_helper_register(). Affected modules: * dm_cache dm_clone dm_integrity dm_mirror dm_multipath dm_pcache * dm_vdo dm-ps-round-robin dm_historical_service_time dm_io_affinity * dm_queue_length dm_service_time dm_snapshot Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
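Why the choice of error code matters can be shown with a toy model. The Python sketch below is not kmod's code; `loader_exit_status` is a hypothetical function that only mimics the behaviour described above, where a loader treats -EEXIST from a module's init function as "already loaded" and reports success.

```python
import errno


def loader_exit_status(init_result):
    """Toy loader: 0 means success is reported to the user, 1 means failure."""
    # -EEXIST is read as "module already loaded", so it is not reported.
    if init_result == 0 or init_result == -errno.EEXIST:
        return 0
    return 1


# An init failure reported as -EEXIST is silently swallowed...
print(loader_exit_status(-errno.EEXIST))
# ...while the same failure reported as -EBUSY reaches the user.
print(loader_exit_status(-errno.EBUSY))
```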
2026-01-02 | dm: remove fake timeout to avoid leak request | Ding Hui
Since commit 15f73f5b3e59 ("blk-mq: move failure injection out of blk_mq_complete_request"), drivers are responsible for calling blk_should_fake_timeout() at appropriate code paths and opportunities.

However, the dm driver does not implement its own timeout handler and relies on the timeout handling of its slave devices.

If an io-timeout-fail error is injected to a dm device, the request will be leaked and never completed, causing tasks to hang indefinitely.

Reproduce:
1. prepare dm which has iscsi slave device
2. inject io-timeout-fail to dm
   echo 1 >/sys/class/block/dm-0/io-timeout-fail
   echo 100 >/sys/kernel/debug/fail_io_timeout/probability
   echo 10 >/sys/kernel/debug/fail_io_timeout/times
3. read/write dm
4. iscsiadm -m node -u

Result: hang task like below
[ 862.243768] INFO: task kworker/u514:2:151 blocked for more than 122 seconds.
[ 862.244133] Tainted: G E 6.19.0-rc1+ #51
[ 862.244337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 862.244718] task:kworker/u514:2 state:D stack:0 pid:151 tgid:151 ppid:2 task_flags:0x4288060 flags:0x00080000
[ 862.245024] Workqueue: iscsi_ctrl_3:1 __iscsi_unbind_session [scsi_transport_iscsi]
[ 862.245264] Call Trace:
[ 862.245587] <TASK>
[ 862.245814] __schedule+0x810/0x15c0
[ 862.246557] schedule+0x69/0x180
[ 862.246760] blk_mq_freeze_queue_wait+0xde/0x120
[ 862.247688] elevator_change+0x16d/0x460
[ 862.247893] elevator_set_none+0x87/0xf0
[ 862.248798] blk_unregister_queue+0x12e/0x2a0
[ 862.248995] __del_gendisk+0x231/0x7e0
[ 862.250143] del_gendisk+0x12f/0x1d0
[ 862.250339] sd_remove+0x85/0x130 [sd_mod]
[ 862.250650] device_release_driver_internal+0x36d/0x530
[ 862.250849] bus_remove_device+0x1dd/0x3f0
[ 862.251042] device_del+0x38a/0x930
[ 862.252095] __scsi_remove_device+0x293/0x360
[ 862.252291] scsi_remove_target+0x486/0x760
[ 862.252654] __iscsi_unbind_session+0x18a/0x3e0 [scsi_transport_iscsi]
[ 862.252886] process_one_work+0x633/0xe50
[ 862.253101] worker_thread+0x6df/0xf10
[ 862.253647] kthread+0x36d/0x720
[ 862.254533] ret_from_fork+0x2a6/0x470
[ 862.255852] ret_from_fork_asm+0x1a/0x30
[ 862.256037] </TASK>

Remove the blk_should_fake_timeout() check from dm, as dm has no native timeout handling and should not attempt to fake timeouts.

Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2026-01-02 | dm-vdo: adjust function name reference | Julia Lawall
There is no function advance_compression_stage(). But advance_data_vio_compression_stage() does iterate through the values of the data_vio_compression_stage enum, so it seems to be what was intended. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-27 | md: Fix forward incompatibility from configurable logical block size | Li Nan
Commit 62ed1b582246 ("md: allow configuring logical block size") used reserved pad to add 'logical_block_size' to metadata. RAID rejects non-zero reserved pad, so arrays fail when rolling back to old kernels after booting new ones.

Set 'logical_block_size' only for newly created arrays to support rollback to old kernels. Importantly, new arrays still won't work on old kernels, which prevents data loss from LBS changes.

For arrays created on old kernels that are confirmed never to be rolled back, configure the LBS by echoing the current LBS (queue/logical_block_size) to md/logical_block_size.

Fixes: 62ed1b582246 ("md: allow configuring logical block size")
Reported-by: BugReports <bugreports61@gmail.com>
Closes: https://lore.kernel.org/linux-raid/825e532d-d1e1-44bb-5581-692b7c091796@huaweicloud.com/T/#t
Signed-off-by: Li Nan <linan122@huawei.com>
Link: https://lore.kernel.org/linux-raid/20251226024221.724201-2-linan666@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2025-12-27 | md: Fix logical_block_size configuration being overwritten | Li Nan
In super_1_validate(), mddev->logical_block_size is directly overwritten with the value from metadata. This causes the previously configured lbs to be lost, making the configuration ineffective. Fix it. Fixes: 62ed1b582246 ("md: allow configuring logical block size") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Xiao Ni <xni@redhat.com> Link: https://lore.kernel.org/linux-raid/20251226024221.724201-1-linan666@huaweicloud.com Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2025-12-27 | md: suspend array while updating raid_disks via sysfs | FengWei Shih
In raid1_reshape(), freeze_array() is called before modifying the r1bio memory pool (conf->r1bio_pool) and conf->raid_disks, and unfreeze_array() is called after the update is completed. However, freeze_array() only waits until nr_sync_pending and (nr_pending - nr_queued) of all buckets reaches zero. When an I/O error occurs, nr_queued is increased and the corresponding r1bio is queued to either retry_list or bio_end_io_list. As a result, freeze_array() may unblock before these r1bios are released. This can lead to a situation where conf->raid_disks and the mempool have already been updated while queued r1bios, allocated with the old raid_disks value, are later released. Consequently, free_r1bio() may access memory out of bounds in put_all_bios() and release r1bios of the wrong size to the new mempool, potentially causing issues with the mempool as well. Since only normal I/O might increase nr_queued while an I/O error occurs, suspending the array avoids this issue. Note: Updating raid_disks via ioctl SET_ARRAY_INFO already suspends the array. Therefore, we suspend the array when updating raid_disks via sysfs to avoid this issue too. Signed-off-by: FengWei Shih <dannyshih@synology.com> Link: https://lore.kernel.org/linux-raid/20251226101816.4506-1-dannyshih@synology.com Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2025-12-27 | md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt() | Tuo Li
The variable mddev->private is first assigned to conf and then checked:

  conf = mddev->private;
  if (!conf) ...

If conf is NULL, then mddev->private is also NULL. In this case, null-pointer dereferences can occur when calling raid5_quiesce():

  raid5_quiesce(mddev, true);
  raid5_quiesce(mddev, false);

since mddev->private is assigned to conf again in raid5_quiesce(), and conf is dereferenced in several places, for example:

  conf->quiesce = 0;
  wake_up(&conf->wait_for_quiescent);

To fix this issue, the function should unlock mddev and return before invoking raid5_quiesce() when conf is NULL, following the existing pattern in raid5_change_consistency_policy().

Fixes: fa1944bbe622 ("md/raid5: Wait sync io to finish before changing group cnt")
Signed-off-by: Tuo Li <islituo@gmail.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/linux-raid/20251225130326.67780-1-islituo@gmail.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2025-12-25 | md: Fix static checker warning in analyze_sbs | Li Nan
The following warning is reported:

  drivers/md/md.c:3912 analyze_sbs() warn: iterator 'i' not incremented

Fixes: d8730f0cf4ef ("md: Remove deprecated CONFIG_MD_MULTIPATH")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-raid/7e2e95ce-3740-09d8-a561-af6bfb767f18@huaweicloud.com/T/#t
Signed-off-by: Li Nan <linan122@huawei.com>
Link: https://lore.kernel.org/linux-raid/20251215124412.4015572-1-linan666@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
2025-12-14 | Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi | Linus Torvalds
Pull SCSI fixes from James Bottomley:
 "The only core fix is in doc; all the others are in drivers, with the biggest impacts in libsas being the rollback on error handling and in ufs coming from a couple of error handling fixes, one causing a crash if it's activated before scanning and the other fixing W-LUN resumption"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: qcom: Fix confusing cleanup.h syntax
  scsi: libsas: Add rollback handling when an error occurs
  scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name()
  scsi: ufs: core: Fix a deadlock in the frequency scaling code
  scsi: ufs: core: Fix an error handler crash
  scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"
  scsi: ufs: core: Fix RPMB link error by reversing Kconfig dependencies
  scsi: qla4xxx: Use time conversion macros
  scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during reset
  scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
  scsi: imm: Fix use-after-free bug caused by unfinished delayed work
  scsi: target: sbp: Remove KMSG_COMPONENT macro
  scsi: core: Correct documentation for scsi_device_quiesce()
  scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1
  scsi: target: Reset t_task_cdb pointer in error case
  scsi: ufs: core: Fix EH failure after W-LUN resume error
2025-12-12Merge tag 'block-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Always initialize DMA state, fixing a potentially nasty issue on the block side - btrfs zoned write fix with cached zone reports - Fix corruption issues in bcache with chained bio's, and further make it clear that the chained IO handler is simply a marker, it's not code meant to be executed - Kill old code dealing with synchronous IO polling in the block layer, that has been dead for a long time. Only async polling is supported these days - Fix a lockdep issue in tag_set management, moving it to RCU - Fix an issue with ublk's bio_vec iteration - Don't unconditionally enforce blocking issue of ublk control commands, allow some of them with non-blocking issue as they do not block * tag 'block-6.19-20251211' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: blk-mq-dma: always initialize dma state blk-mq: delete task running check in blk_hctx_poll() block: fix cached zone reports on devices with native zone append block: Use RCU in blk_mq_[un]quiesce_tagset() instead of set->tag_list_lock ublk: don't mutate struct bio_vec in iteration block: prohibit calls to bio_chain_endio bcache: fix improper use of bi_end_io ublk: allow non-blocking ctrl cmds in IO_URING_F_NONBLOCK issue
2025-12-11Merge tag 'for-6.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dmLinus Torvalds
Pull device mapper updates from Mikulas Patocka: - convert crypto_shash users to direct crypto library use with simpler and faster code and reduced stack usage (Eric Biggers): - the dm-verity SHA-256 conversion also teaches it to do two-way interleaved hashing for added performance - dm-crypt MD5 conversion (used for Loop-AES compatibility) - added documentation for takeover/reshape raid1 -> raid5 examples (Heinz Mauelshagen) - fix dm-vdo kerneldoc warnings (Matthew Sakai) - various random fixes and cleanups * tag 'for-6.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits) dm pcache: fix segment info indexing dm pcache: fix cache info indexing dm-pcache: advance slot index before writing slot dm raid: add documentation for takeover/reshape raid1 -> raid5 table line examples dm log-writes: Add missing set_freezable() for freezable kthread dm-raid: fix possible NULL dereference with undefined raid type dm-snapshot: fix 'scheduling while atomic' on real-time kernels dm: ignore discard return value MAINTAINERS: add Benjamin Marzinski as a device mapper maintainer dm-mpath: Simplify the setup_scsi_dh code dm vdo: fix kerneldoc warnings dm-bufio: align write boundary on physical block size dm-crypt: enable DM_TARGET_ATOMIC_WRITES dm: test for REQ_ATOMIC in dm_accept_partial_bio() dm-verity: remove useless mempool dm-verity: disable recursive forward error correction dm-ebs: Mark full buffer dirty even on partial write dm mpath: enable DM_TARGET_ATOMIC_WRITES dm verity fec: Expose corrected block count via status dm: Don't warn if IMA_DISABLE_HTABLE is not enabled ...
2025-12-10dm pcache: fix segment info indexingLi Chen
Segment info indexing also used sizeof(struct) instead of the 4K metadata stride, so info_index could point between slots and subsequent writes would advance incorrectly. Derive info_index from the pointer returned by the segment meta search using PCACHE_SEG_INFO_SIZE and advance to the next slot for future updates. Signed-off-by: Li Chen <chenl311@chinatelecom.cn> Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Zheng Gu <cengku@gmail.com> Cc: stable@vger.kernel.org # 6.18
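The difference between stepping by sizeof(struct) and stepping by the 4K metadata stride can be sketched in plain C. This is a minimal userspace illustration, not the dm-pcache code: META_STRIDE stands in for PCACHE_SEG_INFO_SIZE, and struct seg_info, META_SLOTS and every function name below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration; the real layout lives in dm-pcache. */
#define META_STRIDE 4096u  /* stands in for PCACHE_SEG_INFO_SIZE */
#define META_SLOTS  2u     /* assumed number of metadata copies */

struct seg_info {          /* hypothetical; much smaller than one 4K slot */
    uint64_t seq;
    uint32_t flags;
};

/* Buggy form: steps by sizeof(struct), so index 1 lands inside slot 0. */
static size_t slot_offset_buggy(unsigned int index)
{
    return (size_t)index * sizeof(struct seg_info);
}

/* Fixed form: every slot starts on a stride boundary. */
static size_t slot_offset(unsigned int index)
{
    return (size_t)index * META_STRIDE;
}

/* Recover the slot index from the pointer a "find latest" search returned. */
static unsigned int slot_index_of(const unsigned char *base,
                                  const unsigned char *found)
{
    return (unsigned int)((size_t)(found - base) / META_STRIDE);
}

/* The next update goes to the slot after the one that was found. */
static unsigned int next_slot(unsigned int index)
{
    return (index + 1) % META_SLOTS;
}
```

With the buggy offset, a write aimed at index 1 starts a few bytes into slot 0 instead of at byte 4096, which is the "between slots" symptom the commit message describes.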
2025-12-10dm pcache: fix cache info indexingLi Chen
The on-media cache_info index used sizeof(struct) instead of the 4K metadata stride, so gc_percent updates from dmsetup message were written between slots and lost after reboot. Use PCACHE_CACHE_INFO_SIZE in get_cache_info_addr() and align info_index with the slot returned by pcache_meta_find_latest(). Signed-off-by: Li Chen <chenl311@chinatelecom.cn> Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Zheng Gu <cengku@gmail.com> Cc: stable@vger.kernel.org # 6.18
2025-12-10dm-pcache: advance slot index before writing slotDongsheng Yang
In dm-pcache, in order to ensure crash-consistency, a dual-copy scheme is used to alternately update metadata, and there is a slot index that records the current slot. However, in the write path the current implementation writes directly to the current slot indexed by slot index, and then advances the slot, which ends up overwriting the existing slot and violating the crash-consistency guarantee. This patch fixes that behavior, preventing metadata from being overwritten incorrectly. In addition, this patch adds a missing pmem_wmb() after memcpy_flushcache(). Signed-off-by: Dongsheng Yang <dongsheng.yang@linux.dev> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Zheng Gu <cengku@gmail.com> Cc: stable@vger.kernel.org # 6.18
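The ordering problem can be shown with a small userspace model of a dual-copy store. This is a sketch under assumptions: struct meta, struct meta_store and both update functions are hypothetical, and plain memcpy() stands in for the persistent-memory primitives named in the commit message.

```c
#include <stdint.h>
#include <string.h>

#define NSLOTS 2u  /* dual-copy scheme, as described in the commit message */

struct meta {              /* hypothetical metadata record */
    uint64_t seq;
    uint32_t value;
};

struct meta_store {        /* hypothetical in-memory model of the two slots */
    struct meta slot[NSLOTS];
    unsigned int cur;      /* slot holding the latest valid copy */
};

/* Buggy order: overwrite the only valid copy, then advance. */
static void update_buggy(struct meta_store *s, const struct meta *m)
{
    memcpy(&s->slot[s->cur], m, sizeof(*m));
    s->cur = (s->cur + 1) % NSLOTS;
}

/* Fixed order: write to the other slot, so the previous copy survives a
 * crash in the middle of the write, then make the new slot current. */
static void update_fixed(struct meta_store *s, const struct meta *m)
{
    unsigned int next = (s->cur + 1) % NSLOTS;

    memcpy(&s->slot[next], m, sizeof(*m));
    /* Real code would flush and order the write here; the commit message
     * mentions memcpy_flushcache() followed by pmem_wmb(). */
    s->cur = next;
}
```

In the fixed order, the old copy stays intact until the new one is completely written, so a reader always has one complete copy to fall back on.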
2025-12-10dm log-writes: Add missing set_freezable() for freezable kthreadHaotian Zhang
The log_writes_kthread() calls try_to_freeze() but lacks set_freezable(), rendering the freeze attempt ineffective since kernel threads are non-freezable by default. This prevents proper thread suspension during system suspend/hibernate. Add set_freezable() to explicitly mark the thread as freezable. Fixes: 0e9cebe72459 ("dm: add log writes target") Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-10dm-raid: fix possible NULL dereference with undefined raid typeAlexey Simakov
rs->raid_type is assigned from get_raid_type_by_ll(), which may return NULL. This NULL value could be dereferenced later in the condition 'if (!(rs_is_raid10(rs) && rt_is_raid0(rs->raid_type)))'. Add a fail-fast check to return early with an error if raid_type is NULL, similar to other uses of this function. Found by Linux Verification Center (linuxtesting.org) with Svace. Fixes: 33e53f06850f ("dm raid: introduce extended superblock and new raid types to support takeover/reshaping") Signed-off-by: Alexey Simakov <bigalex934@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-10dm-snapshot: fix 'scheduling while atomic' on real-time kernelsMikulas Patocka
A 'scheduling while atomic' bug has been reported when using dm-snapshot on real-time kernels. The reason for the bug is that the hlist_bl code does preempt_disable() when taking the lock and the kernel attempts to take other spinlocks while holding the hlist_bl lock. Fix this by converting a hlist_bl spinlock into a regular spinlock. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Jiping Ma <jiping.ma2@windriver.com>
2025-12-10dm: ignore discard return valueChaitanya Kulkarni
__blkdev_issue_discard() always returns 0, making all error checking at call sites dead code. For dm-thin, change the issue_discard() return type to void; in passdown_double_checking_shared_status(), remove the assignment of issue_discard()'s return value to r; and in end_discard(), hardcode r to 0, the only value __blkdev_issue_discard() returns. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-10dm-mpath: Simplify the setup_scsi_dh codeBenjamin Marzinski
There's no point to the MPATHF_RETAIN_ATTACHED_HW_HANDLER flag any more. The way setup_scsi_dh() worked, if that flag wasn't set, it would attempt to attach any passed in hardware handler. This would always fail if a different hardware handler was attached, which caused setup_scsi_dh() to rerun as if the flag was set. So the code would already retain any attached handler, because attaching a different one would always fail. Also, the code had a bug. If attached_handler_name was NULL but there was a scsi device handler attached (because either scsi_dh_attached_handler_name() failed to allocate a name, or a handler got attached after it was called) the code would loop endlessly. Instead, ignore MPATHF_RETAIN_ATTACHED_HW_HANDLER, and always free the passed in handler if *attached_handler_name is set. This simplifies the code, and avoids the endless loop bug, while keeping the functionality the same. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-10dm vdo: fix kerneldoc warningsMatthew Sakai
Fix kerneldoc warnings across the dm-vdo target. Also remove some unhelpful or inaccurate doc comments, and fix some format inconsistencies that did not produce warnings. No functional changes. Suggested-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-10dm-bufio: align write boundary on physical block sizeMikulas Patocka
There may be devices with physical block size larger than 4k. If dm-bufio sends I/O that is not aligned on physical block size, performance is degraded. The 4k minimum alignment limit is there because some SSDs report logical and physical block size 512 despite having 4k internally - so dm-bufio shouldn't send I/Os not aligned on 4k boundary, because they perform badly (the SSD does read-modify-write for them). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: stable@vger.kernel.org
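The alignment rule described above amounts to "the larger of 4k and the physical block size". The following is a rough userspace sketch of that rule, not the dm-bufio implementation: MIN_WRITE_ALIGN, write_align() and align_range() are hypothetical names, and the block size is assumed to be a power of two.

```c
#include <stdint.h>

#define MIN_WRITE_ALIGN 4096u  /* the 4k floor mentioned in the commit message */

/* Alignment to use: the larger of 4k and the reported physical block size. */
static uint32_t write_align(uint32_t phys_block_size)
{
    return phys_block_size > MIN_WRITE_ALIGN ? phys_block_size
                                             : MIN_WRITE_ALIGN;
}

/* Expand the byte range [*start, *end) outward to alignment boundaries.
 * Assumes align is a power of two. */
static void align_range(uint64_t *start, uint64_t *end, uint32_t align)
{
    uint64_t mask = (uint64_t)align - 1;

    *start &= ~mask;                 /* round the start down */
    *end = (*end + mask) & ~mask;    /* round the end up */
}
```

For a device that reports 512-byte blocks, a dirty range of bytes 5000 to 6000 grows to 4096 to 8192. For a device with 16k physical blocks, the same range grows to 0 to 16384.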
2025-12-10dm-crypt: enable DM_TARGET_ATOMIC_WRITESMikulas Patocka
Allow handling of bios with REQ_ATOMIC flag set. Don't split these bios and fail them if they overrun the hard limit "BIO_MAX_VECS << PAGE_SHIFT". In order to simplify the code, this commit joins the logic that avoids splitting emulated zone append bios with the logic that avoids splitting atomic write bios. Signed-off-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: John Garry <john.g.garry@oracle.com>
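The hard limit is simple arithmetic, and the no-split rule can be modelled as a three-way decision. The sketch below is illustrative only. BIO_MAX_VECS is taken as 256 and PAGE_SHIFT as 12 (4 KiB pages); both are assumptions about the build. enum verdict and classify() are hypothetical, and the split branch is a simplification, since dm-crypt may split ordinary bios at smaller thresholds.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIO_MAX_VECS 256u  /* assumed value; check the kernel headers in use */
#define PAGE_SHIFT   12    /* assumes 4 KiB pages */

/* Hard limit from the commit message: BIO_MAX_VECS << PAGE_SHIFT bytes. */
static uint64_t hard_limit_bytes(void)
{
    return (uint64_t)BIO_MAX_VECS << PAGE_SHIFT;
}

enum verdict { PASS_WHOLE, SPLIT, FAIL };  /* hypothetical outcome labels */

/* no_split is true for bios that must not be split, such as atomic writes
 * or emulated zone appends. Such a bio either fits within the limit or is
 * failed; other bios may be split instead (simplified here). */
static enum verdict classify(uint64_t len_bytes, bool no_split)
{
    if (len_bytes <= hard_limit_bytes())
        return PASS_WHOLE;
    return no_split ? FAIL : SPLIT;
}
```

Under these assumptions the limit is 1 MiB, so an atomic write one byte over that size is failed rather than split.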
2025-12-10dm: test for REQ_ATOMIC in dm_accept_partial_bio()Mikulas Patocka
Any bio with the REQ_ATOMIC flag set should never be split or partially completed, so BUG_ON() on this scenario in dm_accept_partial_bio() (whose intent is to allow partial completions). Also, we must reject atomic bios sent to targets that don't support them, otherwise this BUG could be triggered by stray bios that have REQ_ATOMIC set. Signed-off-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: John Garry <john.g.garry@oracle.com>
2025-12-10dm-verity: remove useless mempoolMikulas Patocka
v->fec->extra_pool has zero reserved entries, so we can remove it and use the kernel cache directly. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org>