summaryrefslogtreecommitdiff
path: root/drivers/md
AgeCommit message (Collapse)Author
2023-12-08bcache: revert replacing IS_ERR_OR_NULL with IS_ERRMarkus Weippert
commit bb6cc253861bd5a7cf8439e2118659696df9619f upstream. Commit 028ddcac477b ("bcache: Remove unnecessary NULL point check in node allocations") replaced IS_ERR_OR_NULL by IS_ERR. This leads to a NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000080 Call Trace: ? __die_body.cold+0x1a/0x1f ? page_fault_oops+0xd2/0x2b0 ? exc_page_fault+0x70/0x170 ? asm_exc_page_fault+0x22/0x30 ? btree_node_free+0xf/0x160 [bcache] ? up_write+0x32/0x60 btree_gc_coalesce+0x2aa/0x890 [bcache] ? bch_extent_bad+0x70/0x170 [bcache] btree_gc_recurse+0x130/0x390 [bcache] ? btree_gc_mark_node+0x72/0x230 [bcache] bch_btree_gc+0x5da/0x600 [bcache] ? cpuusage_read+0x10/0x10 ? bch_btree_gc+0x600/0x600 [bcache] bch_gc_thread+0x135/0x180 [bcache] The relevant code starts with: new_nodes[0] = NULL; for (i = 0; i < nodes; i++) { if (__bch_keylist_realloc(&keylist, bkey_u64s(&r[i].b->key))) goto out_nocoalesce; // ... out_nocoalesce: // ... for (i = 0; i < nodes; i++) if (!IS_ERR(new_nodes[i])) { // IS_ERR_OR_NULL before 028ddcac477b btree_node_free(new_nodes[i]); // new_nodes[0] is NULL rw_unlock(true, new_nodes[i]); } This patch replaces IS_ERR() by IS_ERR_OR_NULL() to fix this. Fixes: 028ddcac477b ("bcache: Remove unnecessary NULL point check in node allocations") Link: https://lore.kernel.org/all/3DF4A87A-2AC1-4893-AE5F-E921478419A9@suse.de/ Cc: stable@vger.kernel.org Cc: Zheng Wang <zyytlz.wz@163.com> Cc: Coly Li <colyli@suse.de> Signed-off-by: Markus Weippert <markus@gekmihesg.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08dm verity: don't perform FEC for failed readahead IOWu Bo
commit 0193e3966ceeeef69e235975918b287ab093082b upstream. We found an issue under Android OTA scenario that many BIOs have to do FEC where the data under dm-verity is 100% complete and no corruption. Android OTA has many dm-block layers, from upper to lower: dm-verity dm-snapshot dm-origin & dm-cow dm-linear ufs DM tables have to change 2 times during Android OTA merging process. When doing table change, the dm-snapshot will be suspended for a while. During this interval, many readahead IOs are submitted to dm_verity from filesystem. Then the kverity works are busy doing FEC process which cost too much time to finish dm-verity IO. This causes needless delay which feels like system is hung. After adding debugging it was found that each readahead IO needed around 10s to finish when this situation occurred. This is due to IO amplification: dm-snapshot suspend erofs_readahead // 300+ io is submitted dm_submit_bio (dm_verity) dm_submit_bio (dm_snapshot) bio return EIO bio got nothing, it's empty verity_end_io verity_verify_io forloop range(0, io->n_blocks) // each io->nblocks ~= 20 verity_fec_decode fec_decode_rsb fec_read_bufs forloop range(0, v->fec->rsn) // v->fec->rsn = 253 new_read submit_bio (dm_snapshot) end loop end loop dm-snapshot resume Readahead BIOs get nothing while dm-snapshot is suspended, so all of them will cause verity's FEC. Each readahead BIO needs to verify ~20 (io->nblocks) blocks. Each block needs to do FEC, and every block needs to do 253 (v->fec->rsn) reads. So during the suspend interval(~200ms), 300 readahead BIOs trigger ~1518000 (300*20*253) IOs to dm-snapshot. As readahead IO is not required by userspace, and to fix this issue, it is best to pass readahead errors to upper layer to handle it. Cc: stable@vger.kernel.org Fixes: a739ff3f543a ("dm verity: add support for forward error correction") Signed-off-by: Wu Bo <bo.wu@vivo.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08dm-verity: align struct dm_verity_fec_io properlyMikulas Patocka
commit 38bc1ab135db87577695816b190e7d6d8ec75879 upstream. dm_verity_fec_io is placed after the end of two hash digests. If the hash digest has unaligned length, struct dm_verity_fec_io could be unaligned. This commit fixes the placement of struct dm_verity_fec_io, so that it's aligned. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: a739ff3f543a ("dm verity: add support for forward error correction") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08bcache: prevent potential division by zero errorRand Deeb
commit 2c7f497ac274a14330208b18f6f734000868ebf9 upstream. In SHOW(), the variable 'n' is of type 'size_t.' While there is a conditional check to verify that 'n' is not equal to zero before executing the 'do_div' macro, concerns arise regarding potential division by zero error in 64-bit environments. The concern arises when 'n' is 64 bits in size, greater than zero, and the lower 32 bits of it are zeros. In such cases, the conditional check passes because 'n' is non-zero, but the 'do_div' macro casts 'n' to 'uint32_t,' effectively truncating it to its lower 32 bits. Consequently, the 'n' value becomes zero. To fix this potential division by zero error and ensure precise division handling, this commit replaces the 'do_div' macro with div64_u64(). div64_u64() is designed to work with 64-bit operands, guaranteeing that division is performed correctly. This change enhances the robustness of the code, ensuring that division operations yield accurate results in all scenarios, eliminating the possibility of division by zero, and improving compatibility across different 64-bit environments. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Rand Deeb <rand.sec96@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20231120052503.6122-5-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08bcache: check return value from btree_node_alloc_replacement()Coly Li
commit 777967e7e9f6f5f3e153abffb562bffaf4430d26 upstream. In btree_gc_rewrite_node(), pointer 'n' is not checked after it returns from btree_gc_rewrite_node(). There is potential possibility that 'n' is a non NULL ERR_PTR(), referencing such error code is not permitted in following code. Therefore a return value checking is necessary after 'n' is back from btree_node_alloc_replacement(). Signed-off-by: Coly Li <colyli@suse.de> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20231120052503.6122-3-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08dm-delay: fix a race between delay_presuspend and delay_bioMikulas Patocka
[ Upstream commit 6fc45b6ed921dc00dfb264dc08c7d67ee63d2656 ] In delay_presuspend, we set the atomic variable may_delay and then stop the timer and flush pending bios. The intention here is to prevent the delay target from re-arming the timer again. However, this test is racy. Suppose that one thread goes to delay_bio, sees that dc->may_delay is one and proceeds; now, another thread executes delay_presuspend, it sets dc->may_delay to zero, deletes the timer and flushes pending bios. Then, the first thread continues and adds the bio to delayed->list despite the fact that dc->may_delay is false. Fix this bug by changing may_delay's type from atomic_t to bool and only access it while holding the delayed_bios_lock mutex. Note that we don't have to grab the mutex in delay_resume because there are no bios in flight at this point. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-08bcache: replace a mistaken IS_ERR() by IS_ERR_OR_NULL() in btree_gc_coalesce()Coly Li
commit f72f4312d4388376fc8a1f6cf37cb21a0d41758b upstream. Commit 028ddcac477b ("bcache: Remove unnecessary NULL point check in node allocations") do the following change inside btree_gc_coalesce(), 31 @@ -1340,7 +1340,7 @@ static int btree_gc_coalesce( 32 memset(new_nodes, 0, sizeof(new_nodes)); 33 closure_init_stack(&cl); 34 35 - while (nodes < GC_MERGE_NODES && !IS_ERR_OR_NULL(r[nodes].b)) 36 + while (nodes < GC_MERGE_NODES && !IS_ERR(r[nodes].b)) 37 keys += r[nodes++].keys; 38 39 blocks = btree_default_blocks(b->c) * 2 / 3; At line 35 the original r[nodes].b is not always allocatored from __bch_btree_node_alloc(), and possibly initialized as NULL pointer by caller of btree_gc_coalesce(). Therefore the change at line 36 is not correct. This patch replaces the mistaken IS_ERR() by IS_ERR_OR_NULL() to avoid potential issue. Fixes: 028ddcac477b ("bcache: Remove unnecessary NULL point check in node allocations") Cc: <stable@vger.kernel.org> # 6.5+ Cc: Zheng Wang <zyytlz.wz@163.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20231120052503.6122-9-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-23md/raid1: fix error: ISO C90 forbids mixed declarationsNigel Croxon
[ Upstream commit df203da47f4428bc286fc99318936416253a321c ] There is a compile error when this commit is added: md: raid1: fix potential OOB in raid1_remove_disk() drivers/md/raid1.c: In function 'raid1_remove_disk': drivers/md/raid1.c:1844:9: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 1844 |         struct raid1_info *p = conf->mirrors + number;     |         ^~~~~~ That's because the new code was inserted before the struct. The change is move the struct command above this commit. Fixes: 8b0472b50bcf ("md: raid1: fix potential OOB in raid1_remove_disk()") Signed-off-by: Nigel Croxon <ncroxon@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/46d929d0-2aab-4cf2-b2bf-338963e8ba5a@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-23md: raid1: fix potential OOB in raid1_remove_disk()Zhang Shurong
[ Upstream commit 8b0472b50bcf0f19a5119b00a53b63579c8e1e4d ] If rddev->raid_disk is greater than mddev->raid_disks, there will be an out-of-bounds in raid1_remove_disk(). We have already found similar reports as follows: 1) commit d17f744e883b ("md-raid10: fix KASAN warning") 2) commit 1ebc2cec0b7d ("dm raid: fix KASAN warning in raid5_remove_disk") Fix this bug by checking whether the "number" variable is valid. Signed-off-by: Zhang Shurong <zhang_shurong@foxmail.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/tencent_0D24426FAC6A21B69AC0C03CE4143A508F09@qq.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-23md/md-bitmap: hold 'reconfig_mutex' in backlog_store()Yu Kuai
[ Upstream commit 44abfa6a95df425c0660d56043020b67e6d93ab8 ] Several reasons why 'reconfig_mutex' should be held: 1) rdev_for_each() is not safe to be called without the lock, because rdev can be removed concurrently. 2) mddev_destroy_serial_pool() and mddev_create_serial_pool() should not be called concurrently. 3) mddev_suspend() from mddev_destroy/create_serial_pool() should be protected by the lock. Fixes: 10c92fca636e ("md-bitmap: create and destroy wb_info_pool with the change of backlog") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20230706083727.608914-3-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-23md/bitmap: don't set max_write_behind if there is no write mostly deviceGuoqing Jiang
[ Upstream commit 8c13ab115b577bd09097b9d77916732e97e31b7b ] We shouldn't set it since write behind IO should only happen to write mostly device. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <songliubraving@fb.com> Stable-dep-of: 44abfa6a95df ("md/md-bitmap: hold 'reconfig_mutex' in backlog_store()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30dm integrity: reduce vmalloc space footprint on 32-bit architecturesMikulas Patocka
[ Upstream commit 6d50eb4725934fd22f5eeccb401000687c790fd0 ] It was reported that dm-integrity runs out of vmalloc space on 32-bit architectures. On x86, there is only 128MiB vmalloc space and dm-integrity consumes it quickly because it has a 64MiB journal and 8MiB recalculate buffer. Fix this by reducing the size of the journal to 4MiB and the size of the recalculate buffer to 1MiB, so that multiple dm-integrity devices can be created and activated on 32-bit architectures. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30dm integrity: increase RECALC_SECTORS to improve recalculate speedMikulas Patocka
[ Upstream commit b1a2b9332050c7ae32a22c2c74bc443e39f37b23 ] Increase RECALC_SECTORS because it improves recalculate speed slightly (from 390kiB/s to 410kiB/s). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Stable-dep-of: 6d50eb472593 ("dm integrity: reduce vmalloc space footprint on 32-bit architectures") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11dm cache policy smq: ensure IO doesn't prevent cleaner policy progressJoe Thornber
commit 1e4ab7b4c881cf26c1c72b3f56519e03475486fb upstream. When using the cleaner policy to decommission the cache, there is never any writeback started from the cache as it is constantly delayed due to normal I/O keeping the device busy. Meaning @idle=false was always being passed to clean_target_met() Fix this by adding a specific 'cleaner' flag that is set when the cleaner policy is configured. This flag serves to always allow the cleaner's writeback work to be queued until the cache is decommissioned (even if the cache isn't idle). Reported-by: David Jeffery <djeffery@redhat.com> Fixes: b29d4986d0da ("dm cache: significant rework to leverage dm-bio-prison-v2") Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11dm raid: fix missing reconfig_mutex unlock in raid_ctr() error pathsYu Kuai
[ Upstream commit bae3028799dc4f1109acc4df37c8ff06f2d8f1a0 ] In the error paths 'bad_stripe_cache' and 'bad_check_reshape', 'reconfig_mutex' is still held after raid_ctr() returns. Fixes: 9dbd1aa3a81c ("dm raid: add reshaping support to the target") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11bcache: Fix __bch_btree_node_alloc to make the failure behavior consistentZheng Wang
[ Upstream commit 80fca8a10b604afad6c14213fdfd816c4eda3ee4 ] In some specific situations, the return value of __bch_btree_node_alloc may be NULL. This may lead to a potential NULL pointer dereference in caller function like a calling chain : btree_split->bch_btree_node_alloc->__bch_btree_node_alloc. Fix it by initializing the return value in __bch_btree_node_alloc. Fixes: cafe56359144 ("bcache: A block layer cache") Cc: stable@vger.kernel.org Signed-off-by: Zheng Wang <zyytlz.wz@163.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20230615121223.22502-6-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11bcache: remove 'int n' from parameter list of bch_bucket_alloc_set()Coly Li
[ Upstream commit 17e4aed8309ff28670271546c2c3263eb12f5eb6 ] The parameter 'int n' from bch_bucket_alloc_set() is not cleared defined. From the code comments n is the number of buckets to alloc, but from the code itself 'n' is the maximum cache to iterate. Indeed all the locations where bch_bucket_alloc_set() is called, 'n' is alwasy 1. This patch removes the confused and unnecessary 'int n' from parameter list of bch_bucket_alloc_set(), and explicitly allocates only 1 bucket for its caller. Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 80fca8a10b60 ("bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27md/raid10: prevent soft lockup while flush writesYu Kuai
[ Upstream commit 010444623e7f4da6b4a4dd603a7da7469981e293 ] Currently, there is no limit for raid1/raid10 plugged bio. While flushing writes, raid1 has cond_resched() while raid10 doesn't, and too many writes can cause soft lockup. Follow up soft lockup can be triggered easily with writeback test for raid10 with ramdisks: watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293] Call Trace: <TASK> call_rcu+0x16/0x20 put_object+0x41/0x80 __delete_object+0x50/0x90 delete_object_full+0x2b/0x40 kmemleak_free+0x46/0xa0 slab_free_freelist_hook.constprop.0+0xed/0x1a0 kmem_cache_free+0xfd/0x300 mempool_free_slab+0x1f/0x30 mempool_free+0x3a/0x100 bio_free+0x59/0x80 bio_put+0xcf/0x2c0 free_r10bio+0xbf/0xf0 raid_end_bio_io+0x78/0xb0 one_write_done+0x8a/0xa0 raid10_end_write_request+0x1b4/0x430 bio_endio+0x175/0x320 brd_submit_bio+0x3b9/0x9b7 [brd] __submit_bio+0x69/0xe0 submit_bio_noacct_nocheck+0x1e6/0x5a0 submit_bio_noacct+0x38c/0x7e0 flush_pending_writes+0xf0/0x240 raid10d+0xac/0x1ed0 Fix the problem by adding cond_resched() to raid10 like what raid1 did. Note that unlimited plugged bio still need to be optimized, for example, in the case of lots of dirty pages writeback, this will take lots of memory and io will spend a long time in plug, hence io latency is bad. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230529131106.2123367-2-yukuai1@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27md: fix data corruption for raid456 when reshape restart while grow upYu Kuai
[ Upstream commit 873f50ece41aad5c4f788a340960c53774b5526e ] Currently, if reshape is interrupted, echo "reshape" to sync_action will restart reshape from scratch, for example: echo frozen > sync_action echo reshape > sync_action This will corrupt data before reshape_position if the array is growing, fix the problem by continue reshape from reshape_position. Reported-by: Peter Neuwirth <reddunur@online.de> Link: https://lore.kernel.org/linux-raid/e2f96772-bfbc-f43b-6da1-f520e5164536@online.de/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230512015610.821290-3-yukuai1@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27md/raid0: add discard support for the 'original' layoutJason Baron
commit e836007089ba8fdf24e636ef2b007651fb4582e6 upstream. We've found that using raid0 with the 'original' layout and discard enabled with different disk sizes (such that at least two zones are created) can result in data corruption. This is due to the fact that the discard handling in 'raid0_handle_discard()' assumes the 'alternate' layout. We've seen this corruption using ext4 but other filesystems are likely susceptible as well. More specifically, while multiple zones are necessary to create the corruption, the corruption may not occur with multiple zones if they layout in such a way the layout matches what the 'alternate' layout would have produced. Thus, not all raid0 devices with the 'original' layout, different size disks and discard enabled will encounter this corruption. The 3.14 kernel inadvertently changed the raid0 disk layout for different size disks. Thus, running a pre-3.14 kernel and post-3.14 kernel on the same raid0 array could corrupt data. This lead to the creation of the 'original' layout (to match the pre-3.14 layout) and the 'alternate' layout (to match the post 3.14 layout) in the 5.4 kernel time frame and an option to tell the kernel which layout to use (since it couldn't be autodetected). However, when the 'original' layout was added back to 5.4 discard support for the 'original' layout was not added leading this issue. I've been able to reliably reproduce the corruption with the following test case: 1. create raid0 array with different size disks using original layout 2. mkfs 3. mount -o discard 4. create lots of files 5. remove 1/2 the files 6. fstrim -a (or just the mount point for the raid0 array) 7. umount 8. fsck -fn /dev/md0 (spews all sorts of corruptions) Let's fix this by adding proper discard support to the 'original' layout. The fix 'maps' the 'original' layout disks to the order in which they are read/written such that we can compare the disks in the same way that the current 'alternate' layout does. A 'disk_shift' field is added to 'struct strip_zone'. This could be computed on the fly in raid0_handle_discard() but by adding this field, we save some computation in the discard path. Note we could also potentially fix this by re-ordering the disks in the zones that follow the first one, and then always read/writing them using the 'alternate' layout. However, that is seen as a more substantial change, and we are attempting the least invasive fix at this time to remedy the corruption. I've verified the change using the reproducer mentioned above. Typically, the corruption is seen after less than 3 iterations, while the patch has run 500+ iterations. Cc: NeilBrown <neilb@suse.de> Cc: Song Liu <song@kernel.org> Fixes: c84a1372df92 ("md/raid0: avoid RAID0 data corruption due to layout confusion.") Cc: stable@vger.kernel.org Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230623180523.1901230-1-jbaron@akamai.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-27bcache: Remove unnecessary NULL point check in node allocationsZheng Wang
commit 028ddcac477b691dd9205c92f991cc15259d033e upstream. Due to the previous fix of __bch_btree_node_alloc, the return value will never be a NULL pointer. So IS_ERR is enough to handle the failure situation. Fix it by replacing IS_ERR_OR_NULL check by an IS_ERR check. Fixes: cafe56359144 ("bcache: A block layer cache") Cc: stable@vger.kernel.org Signed-off-by: Zheng Wang <zyytlz.wz@163.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20230615121223.22502-5-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-27md/raid10: fix io loss while replacement replace rdevLi Nan
[ Upstream commit 2ae6aaf76912bae53c74b191569d2ab484f24bf3 ] When removing a disk with replacement, the replacement will be used to replace rdev. During this process, there is a brief window in which both rdev and replacement are read as NULL in raid10_write_request(). This will result in io not being submitted but it should be. //remove //write raid10_remove_disk raid10_write_request mirror->rdev = NULL read rdev -> NULL mirror->rdev = mirror->replacement mirror->replacement = NULL read replacement -> NULL Fix it by reading replacement first and rdev later, meanwhile, use smp_mb() to prevent memory reordering. Fixes: 475b0321a4df ("md/raid10: writes should get directed to replacement as well as original.") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230602091839.743798-3-linan666@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27md/raid10: fix null-ptr-deref of mreplace in raid10_sync_requestLi Nan
[ Upstream commit 34817a2441747b48e444cb0e05d84e14bc9443da ] There are two check of 'mreplace' in raid10_sync_request(). In the first check, 'need_replace' will be set and 'mreplace' will be used later if no-Faulty 'mreplace' exists, In the second check, 'mreplace' will be set to NULL if it is Faulty, but 'need_replace' will not be changed accordingly. null-ptr-deref occurs if Faulty is set between two check. Fix it by merging two checks into one. And replace 'need_replace' with 'mreplace' because their values are always the same. Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new added disk faulty") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230527072218.2365857-2-linan666@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27md/raid10: fix wrong setting of max_corr_read_errorsLi Nan
[ Upstream commit f8b20a405428803bd9881881d8242c9d72c6b2b2 ] There is no input check when echo md/max_read_errors and overflow might occur. Add check of input number. Fixes: 1e50915fe0bb ("raid: improve MD/raid10 handling of correctable read errors.") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230522072535.1523740-3-linan666@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27md/raid10: fix overflow of md/safe_mode_delayLi Nan
[ Upstream commit 6beb489b2eed25978523f379a605073f99240c50 ] There is no input check when echo md/safe_mode_delay in safe_delay_store(). And msec might also overflow when HZ < 1000 in safe_delay_show(), Fix it by checking overflow in safe_delay_store() and use unsigned long conversion in safe_delay_show(). Fixes: 72e02075a33f ("md: factor out parsing of fixed-point numbers") Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230522072535.1523740-2-linan666@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27md/raid10: check slab-out-of-bounds in md_bitmap_get_counterLi Nan
[ Upstream commit 301867b1c16805aebbc306aafa6ecdc68b73c7e5 ] If we write a large number to md/bitmap_set_bits, md_bitmap_checkpage() will return -EINVAL because 'page >= bitmap->pages', but the return value was not checked immediately in md_bitmap_get_counter() in order to set *blocks value and slab-out-of-bounds occurs. Move check of 'page >= bitmap->pages' to md_bitmap_get_counter() and return directly if true. Fixes: ef4256733506 ("md/bitmap: optimise scanning of empty bitmaps.") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230515134808.3936750-2-linan666@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09treewide: Remove uninitialized_var() usageKees Cook
commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream. Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17dm verity: fix error handling for check_at_most_once on FECYeongjin Gil
[ Upstream commit e8c5d45f82ce0c238a4817739892fe8897a3dcc3 ] In verity_end_io(), if bi_status is not BLK_STS_OK, it can be return directly. But if FEC configured, it is desired to correct the data page through verity_verify_io. And the return value will be converted to blk_status and passed to verity_finish_io(). BTW, when a bit is set in v->validated_blocks, verity_verify_io() skips verification regardless of I/O error for the corresponding bio. In this case, the I/O error could not be returned properly, and as a result, there is a problem that abnormal data could be read for the corresponding block. To fix this problem, when an I/O error occurs, do not skip verification even if the bit related is set in v->validated_blocks. Fixes: 843f38d382b1 ("dm verity: add 'check_at_most_once' option to only validate hashes once") Cc: stable@vger.kernel.org Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17dm verity: skip redundant verity_handle_err() on I/O errorsAkilesh Kailash
[ Upstream commit 2c0468e054c0adb660ac055fc396622ec7235df9 ] Without FEC, dm-verity won't call verity_handle_err() when I/O fails, but with FEC enabled, it currently does even if an I/O error has occurred. If there is an I/O error and FEC correction fails, return the error instead of calling verity_handle_err() again. Suggested-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Akilesh Kailash <akailash@google.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Stable-dep-of: e8c5d45f82ce ("dm verity: fix error handling for check_at_most_once on FEC") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17dm ioctl: fix nested locking in table_clear() to remove deadlock concernMike Snitzer
commit 3d32aaa7e66d5c1479a3c31d6c2c5d45dd0d3b89 upstream. syzkaller found the following problematic rwsem locking (with write lock already held): down_read+0x9d/0x450 kernel/locking/rwsem.c:1509 dm_get_inactive_table+0x2b/0xc0 drivers/md/dm-ioctl.c:773 __dev_status+0x4fd/0x7c0 drivers/md/dm-ioctl.c:844 table_clear+0x197/0x280 drivers/md/dm-ioctl.c:1537 In table_clear, it first acquires a write lock https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L1520 down_write(&_hash_lock); Then before the lock is released at L1539, there is a path shown above: table_clear -> __dev_status -> dm_get_inactive_table -> down_read https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L773 down_read(&_hash_lock); It tries to acquire the same read lock again, resulting in the deadlock problem. Fix this by moving table_clear()'s __dev_status() call to after its up_write(&_hash_lock); Cc: stable@vger.kernel.org Reported-by: Zheng Zhang <zheng.zhang@email.ucr.edu> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17dm flakey: fix a crash with invalid table lineMikulas Patocka
commit 98dba02d9a93eec11bffbb93c7c51624290702d2 upstream. This command will crash with NULL pointer dereference: dmsetup create flakey --table \ "0 `blockdev --getsize /dev/ram0` flakey /dev/ram0 0 0 1 2 corrupt_bio_byte 512" Fix the crash by checking if arg_name is non-NULL before comparing it. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17dm integrity: call kmem_cache_destroy() in dm_integrity_init() error pathMike Snitzer
commit 6b79a428c02769f2a11f8ae76bf866226d134887 upstream. Otherwise the journal_io_cache will leak if dm_register_target() fails. Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17dm clone: call kmem_cache_destroy() in dm_clone_init() error pathMike Snitzer
commit 6827af4a9a9f5bb664c42abf7c11af4978d72201 upstream. Otherwise the _hydration_cache will leak if dm_register_target() fails. Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17md/raid10: fix null-ptr-deref in raid10_sync_requestLi Nan
commit a405c6f0229526160aa3f177f65e20c86fce84c5 upstream. init_resync() inits mempool and sets conf->have_replacemnt at the beginning of sync, close_sync() frees the mempool when sync is completed. After [1] recovery might be skipped and init_resync() is called but close_sync() is not. null-ptr-deref occurs with r10bio->dev[i].repl_bio. The following is one way to reproduce the issue. 1) create a array, wait for resync to complete, mddev->recovery_cp is set to MaxSector. 2) recovery is woken and it is skipped. conf->have_replacement is set to 0 in init_resync(). close_sync() not called. 3) some io errors and rdev A is set to WantReplacement. 4) a new device is added and set to A's replacement. 5) recovery is woken, A have replacement, but conf->have_replacemnt is 0. r10bio->dev[i].repl_bio will not be alloced and null-ptr-deref occurs. Fix it by not calling init_resync() if recovery skipped. [1] commit 7e83ccbecd60 ("md/raid10: Allow skipping recovery when clean arrays are assembled") Fixes: 7e83ccbecd60 ("md/raid10: Allow skipping recovery when clean arrays are assembled") Cc: stable@vger.kernel.org Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230222041000.3341651-3-linan666@huaweicloud.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17md/raid10: fix memleak of md threadYu Kuai
[ Upstream commit f0ddb83da3cbbf8a1f9087a642c448ff52ee9abd ] In raid10_run(), if setup_conf() succeed and raid10_run() failed before setting 'mddev->thread', then in the error path 'conf->thread' is not freed. Fix the problem by setting 'mddev->thread' right after setup_conf(). Fixes: 43a521238aca ("md-cluster: choose correct label when clustered layout is not supported") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230310073855.1337560-7-yukuai1@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17md: update the optimal I/O size on reshapeChristoph Hellwig
[ Upstream commit 16ef510139315a2147ee7525796f8dbd4e4b7864 ] The raid5 and raid10 drivers currently update the read-ahead size, but not the optimal I/O size on reshape. To prepare for deriving the read-ahead size from the optimal I/O size make sure it is updated as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: f0ddb83da3cb ("md/raid10: fix memleak of md thread") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17md/raid10: fix memleak for 'conf->bio_split'Yu Kuai
[ Upstream commit c9ac2acde53f5385de185bccf6aaa91cf9ac1541 ] In the error path of raid10_run(), 'conf' need be freed, however, 'conf->bio_split' is missed and memory will be leaked. Since there are 3 places to free 'conf', factor out a helper to fix the problem. Fixes: fc9977dd069e ("md/raid10: simplify the splitting of requests.") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230310073855.1337560-6-yukuai1@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17md/raid10: fix leak of 'r10bio->remaining' for recoveryYu Kuai
[ Upstream commit 26208a7cffd0c7cbf14237ccd20c7270b3ffeb7e ] raid10_sync_request() will add 'r10bio->remaining' for both rdev and replacement rdev. However, if the read io fails, recovery_request_write() returns without issuing the write io, in this case, end_sync_request() is only called once and 'remaining' is leaked, cause an io hang. Fix the problem by decreasing 'remaining' according to if 'bio' and 'repl_bio' is valid. Fixes: 24afd80d99f8 ("md/raid10: handle recovery of replacement devices.") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230310073855.1337560-5-yukuai1@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05md: avoid signed overflow in slot_store()NeilBrown
[ Upstream commit 3bc57292278a0b6ac4656cad94c14f2453344b57 ] slot_store() uses kstrtouint() to get a slot number, but stores the result in an "int" variable (by casting a pointer). This can result in a negative slot number if the unsigned int value is very large. A negative number means that the slot is empty, but setting a negative slot number this way will not remove the device from the array. I don't think this is a serious problem, but it could cause confusion and it is best to fix it. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05dm crypt: add cond_resched() to dmcrypt_write()Mikulas Patocka
commit fb294b1c0ba982144ca467a75e7d01ff26304e2b upstream. The loop in dmcrypt_write may be running for unbounded amount of time, thus we need cond_resched() in it. This commit fixes the following warning: [ 3391.153255][ C12] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [dmcrypt_write/2:2897] ... [ 3391.387210][ C12] Call trace: [ 3391.390338][ C12] blk_attempt_bio_merge.part.6+0x38/0x158 [ 3391.395970][ C12] blk_attempt_plug_merge+0xc0/0x1b0 [ 3391.401085][ C12] blk_mq_submit_bio+0x398/0x550 [ 3391.405856][ C12] submit_bio_noacct+0x308/0x380 [ 3391.410630][ C12] dmcrypt_write+0x1e4/0x208 [dm_crypt] [ 3391.416005][ C12] kthread+0x130/0x138 [ 3391.419911][ C12] ret_from_fork+0x10/0x18 Reported-by: yangerkun <yangerkun@huawei.com> Fixes: dc2676210c42 ("dm crypt: offload writes to thread") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-05dm stats: check for and propagate alloc_percpu failureJiasheng Jiang
commit d3aa3e060c4a80827eb801fc448debc9daa7c46b upstream. Check alloc_precpu()'s return value and return an error from dm_stats_init() if it fails. Update alloc_dev() to fail if dm_stats_init() does. Otherwise, a NULL pointer dereference will occur in dm_stats_cleanup() even if dm-stats isn't being actively used. Fixes: fd2ed4d25270 ("dm: add statistics support") Cc: stable@vger.kernel.org Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-05dm thin: fix deadlock when swapping to thin deviceColy Li
commit 9bbf5feecc7eab2c370496c1c161bbfe62084028 upstream. This is an already known issue that dm-thin volume cannot be used as swap, otherwise a deadlock may happen when dm-thin internal memory demand triggers swap I/O on the dm-thin volume itself. But thanks to commit a666e5c05e7c ("dm: fix deadlock when swapping to encrypted device"), the limit_swap_bios target flag can also be used for dm-thin to avoid the recursive I/O when it is used as swap. Fix is to simply set ti->limit_swap_bios to true in both pool_ctr() and thin_ctr(). In my test, I create a dm-thin volume /dev/vg/swap and use it as swap device. Then I run fio on another dm-thin volume /dev/vg/main and use large --blocksize to trigger swap I/O onto /dev/vg/swap. The following fio command line is used in my test, fio --name recursive-swap-io --lockmem 1 --iodepth 128 \ --ioengine libaio --filename /dev/vg/main --rw randrw \ --blocksize 1M --numjobs 32 --time_based --runtime=12h Without this fix, the whole system can be locked up within 15 seconds. With this fix, there is no any deadlock or hung task observed after 2 hours of running fio. Furthermore, if blocksize is changed from 1M to 128M, after around 30 seconds fio has no visible I/O, and the out-of-memory killer message shows up in kernel message. After around 20 minutes all fio processes are killed and the whole system is back to being alive. This is exactly what is expected when recursive I/O happens on dm-thin volume when it is used as swap. Depends-on: a666e5c05e7c ("dm: fix deadlock when swapping to encrypted device") Cc: stable@vger.kernel.org Signed-off-by: Coly Li <colyli@suse.de> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11dm flakey: don't corrupt the zero pageMikulas Patocka
commit f50714b57aecb6b3dc81d578e295f86d9c73f078 upstream. When we need to zero some range on a block device, the function __blkdev_issue_zero_pages submits a write bio with the bio vector pointing to the zero page. If we use dm-flakey with corrupt bio writes option, it will corrupt the content of the zero page which results in crashes of various userspace programs. Glibc assumes that memory returned by mmap is zeroed and it uses it for calloc implementation; if the newly mapped memory is not zeroed, calloc will return non-zeroed memory. Fix this bug by testing if the page is equal to ZERO_PAGE(0) and avoiding the corruption in this case. Cc: stable@vger.kernel.org Fixes: a00f5276e266 ("dm flakey: Properly corrupt multi-page bios.") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11dm flakey: fix logic when corrupting a bioMikulas Patocka
commit aa56b9b75996ff4c76a0a4181c2fa0206c3d91cc upstream. If "corrupt_bio_byte" is set to corrupt reads and corrupt_bio_flags is used, dm-flakey would erroneously return all writes as errors. Likewise, if "corrupt_bio_byte" is set to corrupt writes, dm-flakey would return errors for all reads. Fix the logic so that if fc->corrupt_bio_byte is non-zero, dm-flakey will not abort reads on writes with an error. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11dm cache: add cond_resched() to various workqueue loopsMike Snitzer
[ Upstream commit 76227f6dc805e9e960128bcc6276647361e0827c ] Otherwise on resource constrained systems these workqueues may be too greedy. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11dm thin: add cond_resched() to various workqueue loopsMike Snitzer
[ Upstream commit e4f80303c2353952e6e980b23914e4214487f2a6 ] Otherwise on resource constrained systems these workqueues may be too greedy. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11dm: remove flush_scheduled_work() during local_exit()Mike Snitzer
[ Upstream commit 0b22ff5360f5c4e11050b89206370fdf7dc0a226 ] Commit acfe0ad74d2e1 ("dm: allocate a special workqueue for deferred device removal") switched from using system workqueue to a single workqueue local to DM. But it didn't eliminate the call to flush_scheduled_work() that was introduced purely for the benefit of deferred device removal with commit 2c140a246dc ("dm: allow remove to be deferred"). Since DM core uses its own workqueue (and queue_work) there is no need to call flush_scheduled_work() from local_exit(). local_exit()'s destroy_workqueue(deferred_remove_workqueue) handles flushing work started with queue_work(). Fixes: acfe0ad74d2e1 ("dm: allocate a special workqueue for deferred device removal") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18dm thin: resume even if in FAIL modeLuo Meng
[ Upstream commit 19eb1650afeb1aa86151f61900e9e5f1de5d8d02 ] If a thinpool set fail_io while suspending, resume will fail with: device-mapper: resume ioctl on vg-thinpool failed: Invalid argument The thin-pool also can't be removed if an in-flight bio is in the deferred list. This can be easily reproduced using: echo "offline" > /sys/block/sda/device/state dd if=/dev/zero of=/dev/mapper/thin bs=4K count=1 dmsetup suspend /dev/mapper/pool mkfs.ext4 /dev/mapper/thin dmsetup resume /dev/mapper/pool The root cause is maybe_resize_data_dev() will check fail_io and return error before called dm_resume. Fix this by adding FAIL mode check at the end of pool_preresume(). Cc: stable@vger.kernel.org Fixes: da105ed5fd7e ("dm thin metadata: introduce dm_pool_abort_metadata") Signed-off-by: Luo Meng <luomeng12@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18md/bitmap: Fix bitmap chunk size overflow issuesFlorian-Ewald Mueller
commit 4555211190798b6b6fa2c37667d175bf67945c78 upstream. - limit bitmap chunk size internal u64 variable to values not overflowing the u32 bitmap superblock structure variable stored on persistent media - assign bitmap chunk size internal u64 variable from unsigned values to avoid possible sign extension artifacts when assigning from a s32 value The bug has been there since at least kernel 4.0. Steps to reproduce it: 1: mdadm -C /dev/mdx -l 1 --bitmap=internal --bitmap-chunk=256M -e 1.2 -n2 /dev/rnbd1 /dev/rnbd2 2 resize member device rnbd1 and rnbd2 to 8 TB 3 mdadm --grow /dev/mdx --size=max The bitmap_chunksize will overflow without patch. Cc: stable@vger.kernel.org Signed-off-by: Florian-Ewald Mueller <florian-ewald.mueller@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-18dm cache: set needs_check flag after aborting metadataMike Snitzer
commit 6b9973861cb2e96dcd0bb0f1baddc5c034207c5c upstream. Otherwise the commit that will be aborted will be associated with the metadata objects that will be torn down. Must write needs_check flag to metadata with a reset block manager. Found through code-inspection (and compared against dm-thin.c). Cc: stable@vger.kernel.org Fixes: 028ae9f76f29 ("dm cache: add fail io mode and needs_check flag") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>