summaryrefslogtreecommitdiff
path: root/drivers/md/persistent-data
AgeCommit message (Collapse)Author
2017-05-25dm space map disk: fix some book keeping in the disk space mapJoe Thornber
commit 0377a07c7a035e0d033cd8b29f0cb15244c0916a upstream. When decrementing the reference count for a block, the free count wasn't being updated if the reference count went to zero. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25dm btree: fix for dm_btree_find_lowest_key()Vinothkumar Raja
commit 7d1fedb6e96a960aa91e4ff70714c3fb09195a5a upstream. dm_btree_find_lowest_key() is giving incorrect results. find_key() traverses the btree correctly for finding the highest key, but there is an error in the way it traverses the btree for retrieving the lowest key. dm_btree_find_lowest_key() fetches the first key of the rightmost block of the btree instead of fetching the first key from the leftmost block. Fix this by conditionally passing the correct parameter to value64() based on the @find_highest flag. Signed-off-by: Erez Zadok <ezk@fsl.cs.sunysb.edu> Signed-off-by: Vinothkumar Raja <vinraja@cs.stonybrook.edu> Signed-off-by: Nidhi Panpalia <npanpalia@cs.stonybrook.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06dm space map metadata: fix 'struct sm_metadata' leak on failed createBenjamin Marzinski
commit 314c25c56c1ee5026cf99c570bdfe01847927acb upstream. In dm_sm_metadata_create() we temporarily change the dm_space_map operations from 'ops' (whose .destroy function deallocates the sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't). If dm_sm_metadata_create() fails in sm_ll_new_metadata() or sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls dm_sm_destroy() with the intention of freeing the sm_metadata, but it doesn't (because the dm_space_map operations is still set to 'bootstrap_ops'). Fix this by setting the dm_space_map operations back to 'ops' if dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-22dm array: introduce cursor apiJoe Thornber
More efficient way to iterate an array due to prefetching (makes use of the new dm_btree_cursor_* api). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22dm btree: introduce cursor apiJoe Thornber
This uses prefetching to speed up iteration through a btree. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22dm array: add dm_array_new()Joe Thornber
dm_array_new() creates a new, populated array more efficiently than starting with an empty one and resizing. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20dm btree: fix a bug in dm_btree_find_next_single()Joe Thornber
dm_btree_find_next_single() can short-circuit the search for a block with a return of -ENODATA if all entries are higher than the search key passed to lower_bound(). This hasn't been a problem because of the way the btree has been used by DM thinp. But it must be fixed now in preparation for fixing the race in DM thinp's handling of simultaneous block discard vs allocation. Otherwise, once that fix is in place, some of the blocks in a discard would not be unmapped as expected. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-14dm space map metadata: remove unused variable in brb_pop()Mike Snitzer
Remove the unused struct block_op pointer that was inadvertantly introduced, via cut-and-paste of previous brb_op() code, as part of commit 50dd842ad. (Cc'ing stable@ because commit 50dd842ad did) Fixes: 50dd842ad ("dm space map metadata: fix ref counting bug when bootstrapping a new space map") Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-12-10dm btree: factor out need_insert() helperMike Snitzer
Eliminates code duplication within insert(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10dm bufio: store stacktrace in buffers to help find buffer leaksMikulas Patocka
The option DM_DEBUG_BLOCK_STACK_TRACING is moved from persistent-data directory to device mapper directory because it will now be used by persistent-data and bufio. When the option is enabled, each bufio buffer stores the stacktrace of the last dm_bufio_get(), dm_bufio_read() or dm_bufio_new() call that increased the hold count to 1. The buffer's stacktrace is printed if the buffer was not released before the bufio client is destroyed. When DM_DEBUG_BLOCK_STACK_TRACING is enabled, any bufio buffer leaks are considered warnings - i.e. the kernel continues afterwards. If not enabled, buffer leaks are considered BUGs and the kernel with crash. Reasoning on this disposition is: if we only ever warned on buffer leaks users would generally ignore them and the problematic code would never get fixed. Successfully used to find source of bufio leaks fixed with commit fce079f63c3 ("dm btree: fix bufio buffer leaks in dm_btree_del() error path"). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10dm block manager: cleanup code that prints stacktraceMikulas Patocka
There is no need to record stack trace and immediately print it. Just use dump_stack() to print the current stack. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10dm btree: fix bufio buffer leaks in dm_btree_del() error pathJoe Thornber
If dm_btree_del()'s call to push_frame() fails, e.g. due to btree_node_validator finding invalid metadata, the dm_btree_del() error path must unlock all frames (which have active dm-bufio buffers) that were pushed onto the del_stack. Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio buffers have leaked, e.g.: device-mapper: bufio: leaked buffer 3, hold count 1, list 0 Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-12-09dm space map metadata: fix ref counting bug when bootstrapping a new space mapJoe Thornber
When applying block operations (BOPs) do not remove them from the uncommitted BOP ring-buffer until after they've been applied -- in case we recurse. Also, perform BOP_INC operation, in dm_sm_metadata_create() and sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather than using direct calls to sm_ll_inc(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-12-02dm thin metadata: fix bug in dm_thin_remove_range()Joe Thornber
dm_btree_remove_leaves() only unmaps a contiguous region so we need a loop, in __remove_range(), to handle ranges that contain multiple regions. A new btree function, dm_btree_lookup_next(), is introduced which is more efficiently able to skip over regions of the thin device which aren't mapped. __remove_range() uses dm_btree_lookup_next() for each iteration of __remove_range()'s loop. Also, improve description of dm_btree_remove_leaves(). Fixes: 6550f075 ("dm thin metadata: add dm_thin_remove_range()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.1+
2015-12-02dm btree: fix leak of bufio-backed block in btree_split_sibling error pathMike Snitzer
The block allocated at the start of btree_split_sibling() is never released if later insert_at() fails. Fix this by releasing the previously allocated bufio block using unlock_block(). Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-11-04Merge tag 'dm-4.4-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: "Smaller set of DM changes for this merge. I've based these changes on Jens' for-4.4/reservations branch because the associated DM changes required it. - Revert a dm-multipath change that caused a regression for unprivledged users (e.g. kvm guests) that issued ioctls when a multipath device had no available paths. - Include Christoph's refactoring of DM's ioctl handling and add support for passing through persistent reservations with DM multipath. - All other changes are very simple cleanups" * tag 'dm-4.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm switch: simplify conditional in alloc_region_table() dm delay: document that offsets are specified in sectors dm delay: capitalize the start of an delay_ctr() error message dm delay: Use DM_MAPIO macros instead of open-coded equivalents dm linear: remove redundant target name from error messages dm persistent data: eliminate unnecessary return values dm: eliminate unused "bioset" process for each bio-based DM device dm: convert ffs to __ffs dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() dm: add support for passing through persistent reservations dm: refactor ioctl handling Revert "dm mpath: fix stalls when handling invalid ioctls" dm: initialize non-blk-mq queue data before queue is used
2015-10-31dm persistent data: eliminate unnecessary return valuesMikulas Patocka
dm_bm_unlock and dm_tm_unlock return an integer value but the returned value is always 0. The calling code sometimes checks the return value and sometimes doesn't. Eliminate these unnecessary return values and also the checks for them. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-10-23dm btree: fix leak of bufio-backed block in btree_split_beneath error pathMike Snitzer
btree_split_beneath()'s error path had an outstanding FIXME that speaks directly to the potential for _not_ cleaning up a previously allocated bufio-backed block. Fix this by releasing the previously allocated bufio block using unlock_block(). Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <thornber@redhat.com> Cc: stable@vger.kernel.org
2015-10-23dm btree remove: fix a bug when rebalancing nodes after removalJoe Thornber
Commit 4c7e309340ff ("dm btree remove: fix bug in redistribute3") wasn't a complete fix for redistribute3(). The redistribute3 function takes 3 btree nodes and shares out the entries evenly between them. If the three nodes in total contained (MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in the center. Fix this issue by being more careful about calculating the target number of entries for the left and right nodes. Unit tested in userspace using this program: https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-08-12dm: remove unlikely() before IS_ERR()viresh kumar
IS_ERR() already contains an 'unlikely' compiler flag so there is no need to do that again from IS_ERR() callers. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-12dm btree remove: remove unused function get_nr_entries()Vivek Goyal
rebalance_children() calls get_nr_entries() and assigns the result to an unused local 'child_entries' variable. Remove get_nr_entries() and cleanup rebalance_children() accordingly. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-12dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling()Vivek Goyal
Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-12dm btree: add ref counting ops for the leaves of top level btreesJoe Thornber
When using nested btrees, the top leaves of the top levels contain block addresses for the root of the next tree down. If we shadow a shared leaf node the leaf values (sub tree roots) should be incremented accordingly. This is only an issue if there is metadata sharing in the top levels. Which only occurs if metadata snapshots are being used (as is possible with dm-thinp). And could result in a block from the thinp metadata snap being reused early, thus corrupting the thinp metadata snap. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-08-07dm btree remove: fix bug in remove_one()Joe Thornber
remove_one() was not incrementing the key for the beginning of the range, so not all entries were being removed. This resulted in discards that were not unmapping all blocks. Fixes: 4ec331c3ea ("dm btree: add dm_btree_remove_leaves()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-07-06dm btree: silence lockdep lock inversion in dm_btree_del()Joe Thornber
Allocate memory using GFP_NOIO when deleting a btree. dm_btree_del() can be called via an ioctl and we don't want to recurse into the FS or block layer. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-07-05dm btree remove: fix bug in redistribute3Dennis Yang
redistribute3() shares entries out across 3 nodes. Some entries were being moved the wrong way, breaking the ordering. This manifested as a BUG() in dm-btree-remove.c:shift() when entries were removed from the btree. For additional context see: https://www.redhat.com/archives/dm-devel/2015-May/msg00113.html Signed-off-by: Dennis Yang <shinrairis@gmail.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-06-17dm space map metadata: fix occasional leak of a metadata block on resizeJoe Thornber
The metadata space map has a simplified 'bootstrap' mode that is operational when extending the space maps. Whilst in this mode it's possible for some refcount decrement operations to become queued (eg, as a result of shadowing one of the bitmap indexes). These decrements were not being applied when switching out of bootstrap mode. The effect of this bug was the leaking of a 4k metadata block. This is detected by the latest version of thin_check as a non fatal error. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-06-11dm btree: add dm_btree_remove_leaves()Joe Thornber
Removes a range of leaf values from the tree. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29dm thin metadata: remove in-core 'read_only' flagMike Snitzer
Leverage the block manager's read_only flag instead of duplicating it; access with new dm_bm_is_read_only() method. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-21Merge tag 'dm-3.20-changes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull more device mapper changes from Mike Snitzer: - Significant dm-crypt CPU scalability performance improvements thanks to changes that enable effective use of an unbound workqueue across all available CPUs. A large battery of tests were performed to validate these changes, summary of results is available here: https://www.redhat.com/archives/dm-devel/2015-February/msg00106.html - A few additional stable fixes (to DM core, dm-snapshot and dm-mirror) and a small fix to the dm-space-map-disk. * tag 'dm-3.20-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm snapshot: fix a possible invalid memory access on unload dm: fix a race condition in dm_get_md dm crypt: sort writes dm crypt: add 'submit_from_crypt_cpus' option dm crypt: offload writes to thread dm crypt: remove unused io_pool and _crypt_io_pool dm crypt: avoid deadlock in mempools dm crypt: don't allocate pages for a partial request dm crypt: use unbound workqueue for request processing dm io: reject unsupported DISCARD requests with EOPNOTSUPP dm mirror: do not degrade the mirror on discard error dm space map disk: fix sm_disk_count_is_more_than_one()
2015-02-13dm space map disk: fix sm_disk_count_is_more_than_one()Mike Snitzer
dm_tm_shadow_block() is the only caller of dm_sm_count_is_more_than_one() which only ever operates on a metadata space-map. So in practice, sm_disk_count_is_more_than_one() isn't actually used (which explains why this bug never amounted to anything). But fix sm_disk_count_is_more_than_one() to properly set *result and return 0. Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-01-07kconfig: use bool instead of boolean for type definition attributesChristoph Jaeger
Support for keyword 'boolean' will be dropped later on. No functional change. Reference: http://lkml.kernel.org/r/cover.1418003065.git.cj@linux.com Signed-off-by: Christoph Jaeger <cj@linux.com> Signed-off-by: Michal Marek <mmarek@suse.cz>
2014-12-02dm space map metadata: fix sm_bootstrap_get_count()Joe Thornber
Must set 'result' accordingly rather than return it. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-12-01dm space map metadata: fix sm_bootstrap_get_nr_blocks()Dan Carpenter
This function isn't right and it causes a static checker warning: drivers/md/dm-thin.c:3016 maybe_resize_data_dev() error: potentially using uninitialized 'sb_data_size'. It should set "*count" and return zero on success the same as the sm_metadata_get_nr_blocks() function does earlier. Fixes: 3241b1d3e0aa ('dm: add persistent data library') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-12-01dm array: if resizing the array is a noop set the new root to the old oneJoe Thornber
This could've been quite bad (to return success but not update the new root to point at the old) but in practice the only known consumer of the dm array code is the DM cache target. And the DM cache target passes in the same old root to array_resize() anyway. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm transaction manager: add support for prefetching blocks of metadataJoe Thornber
Introduce the dm_tm_issue_prefetches interface. If you're using a non-blocking clone the tm will build up a list of requested blocks that weren't in core. dm_tm_issue_prefetches will request those blocks to be prefetched. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm btree: fix a recursion depth bug in btree walking codeJoe Thornber
The walk code was using a 'ro_spine' to hold it's locked btree nodes. But this data structure is designed for the rolling lock scheme, and as such automatically unlocks blocks that are two steps up the call chain. This is not suitable for the simple recursive walk algorithm, which retraces its steps. This code is only used by the persistent array code, which in turn is only used by dm-cache. In order to trigger it you need to have a mapping tree that is more than 2 levels deep; which equates to 8-16 million cache blocks. For instance a 4T ssd with a very small block size of 32k only just triggers this bug. The fix just places the locked blocks on the stack, and stops using the ro_spine altogether. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2014-03-27dm transaction manager: fix corruption due to non-atomic transaction commitJoe Thornber
The persistent-data library used by dm-thin, dm-cache, etc is transactional. If anything goes wrong, such as an io error when writing new metadata or a power failure, then we roll back to the last transaction. Atomicity when committing a transaction is achieved by: a) Never overwriting data from the previous transaction. b) Writing the superblock last, after all other metadata has hit the disk. This commit and the following commit ("dm: take care to copy the space map roots before locking the superblock") fix a bug associated with (b). When committing it was possible for the superblock to still be written in spite of an io error occurring during the preceeding metadata flush. With these commits we're careful not to take the write lock out on the superblock until after the metadata flush has completed. Change the transaction manager's semantics for dm_tm_commit() to assume all data has been flushed _before_ the single superblock that is passed in. As a prerequisite, split the block manager's block unlocking and flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush(). Now the unlocking must be done separately. This issue was discovered by forcing io errors at the crucial time using dm-flakey. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2014-03-27dm bitset: only flush the current word if it has been dirtiedJoe Thornber
This change offers a big performance boost for dm-era. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-03-07dm space map metadata: fix refcount decrement below 0 which caused corruptionJoe Thornber
This has been a relatively long-standing issue that wasn't nailed down until Teng-Feng Yang's meticulous bug report to dm-devel on 3/7/2014, see: http://www.redhat.com/archives/dm-devel/2014-March/msg00021.html From that report: "When decreasing the reference count of a metadata block with its reference count equals 3, we will call dm_btree_remove() to remove this enrty from the B+tree which keeps the reference count info in metadata device. The B+tree will try to rebalance the entry of the child nodes in each node it traversed, and the rebalance process contains the following steps. (1) Finding the corresponding children in current node (shadow_current(s)) (2) Shadow the children block (issue BOP_INC) (3) redistribute keys among children, and free children if necessary (issue BOP_DEC) Since the update of a metadata block's reference count could be recursive, we will stash these reference count update operations in smm->uncommitted and then process them in a FILO fashion. The problem is that step(3) could free the children which is created in step(2), so the BOP_DEC issued in step(3) will be carried out before the BOP_INC issued in step(2) since these BOPs will be processed in FILO fashion. Once the BOP_DEC from step(3) tries to decrease the reference count of newly shadow block, it will report failure for its reference equals 0 before decreasing. It looks like we can solve this issue by processing these BOPs in a FIFO fashion instead of FILO." Commit 5b564d80 ("dm space map: disallow decrementing a reference count below zero") changed the code to report an error for this temporary refcount decrement below zero. So what was previously a harmless invalid refcount became a hard failure due to the new error path: device-mapper: space map common: unable to decrement a reference count below 0 device-mapper: thin: 253:6: dm_thin_insert_block() failed: error = -22 device-mapper: thin: 253:6: switching pool to read-only mode This bug is in dm persistent-data code that is common to the DM thin and cache targets. So any users of those targets should apply this fix. Fix this by applying recursive space map operations in FIFO order rather than FILO. Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=68801 Reported-by: Apollon Oikonomopoulos <apoikos@debian.org> Reported-by: edwillam1007@gmail.com Reported-by: Teng-Feng Yang <shinrairis@gmail.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.13+
2014-03-03dm: fix Kconfig indentationMike Snitzer
Since DM_DEBUG_BLOCK_STACK_TRACING is a DM_PERSISTENT_DATA config option move it from drivers/md/Kconfig to drivers/md/persistent-data/Kconfig. Doing so fixes indentation for other DM config options. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-02-27dm thin: allow metadata space larger than supported to go unusedMike Snitzer
It was always intended that a user could provide a thin metadata device that is larger than the max supported by the on-disk format. The extra space would just go unused. Unfortunately that never worked. If the user attempted to use a larger metadata device on creation they would get an error like the following: device-mapper: space map common: space map too large device-mapper: transaction manager: couldn't create metadata space map device-mapper: thin metadata: tm_create_with_sm failed device-mapper: table: 252:17: thin-pool: Error creating metadata object device-mapper: ioctl: error adding target to table Fix this by allowing the initial metadata space map creation to cap its size at the max number of blocks supported (DM_SM_METADATA_MAX_BLOCKS). get_metadata_dev_size() must also impose DM_SM_METADATA_MAX_BLOCKS (via THIN_METADATA_MAX_SECTORS), otherwise extending metadata would cap at THIN_METADATA_MAX_SECTORS_WARNING (which is larger than supported). Also, the calculation for THIN_METADATA_MAX_SECTORS didn't account for the sizeof the disk_bitmap_header. So the supported maximum metadata size is a bit smaller (reduced from 33423360 to 33292800 sectors). Lastly, remove the "excess space will not be used" warning message from get_metadata_dev_size(); it resulted in printing the warning multiple times. Factor out warn_if_metadata_device_too_big(), call it from pool_ctr() and maybe_resize_metadata_dev(). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
2014-01-21dm space map metadata: fix bug in resizing of thin metadataJoe Thornber
This bug was introduced in commit 7e664b3dec431e ("dm space map metadata: fix extending the space map"). When extending a dm-thin metadata volume we: - Switch the space map into a simple bootstrap mode, which allocates all space linearly from the newly added space. - Add new bitmap entries for the new space - Increment the reference counts for those newly allocated bitmap entries - Commit changes to disk - Switch back out of bootstrap mode. But, the disk commit may allocate space itself, if so this fact will be lost when switching out of bootstrap mode. The bug exhibited itself as an error when the bitmap_root, with an erroneous ref count of 0, was subsequently decremented as part of a later disk commit. This would cause the disk commit to fail, and thinp to enter read_only mode. The metadata was not damaged (thin_check passed). The fix is to put the increments + commit into a loop, running until the commit has not allocated extra space. In practise this loop only runs twice. With this fix the following device mapper testsuite test passes: dmtest run --suite thin-provisioning -n thin_remove_works_after_resize Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # depends on commit 7e664b3dec431e
2014-01-09dm btree: add dm_btree_find_lowest_keyJoe Thornber
dm_btree_find_lowest_key is the reciprocal of dm_btree_find_highest_key. Factor out common code for dm_btree_find_{highest,lowest}_key. dm_btree_find_lowest_key is needed for an upcoming DM target, as such it is best to get this interface in place. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-01-07dm space map metadata: fix extending the space mapJoe Thornber
When extending a metadata space map we should do the first commit whilst still in bootstrap mode -- a mode where all blocks get allocated in the new area. That way the commit overhead is allocated from the newly added space. Otherwise we risk running out of space. With this fix, and the previous commit "dm space map common: make sure new space is used during extend", the following device mapper testsuite test passes: dmtest run --suite thin-provisioning -n /resize_metadata_no_io/ Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2014-01-07dm space map common: make sure new space is used during extendJoe Thornber
When extending a low level space map we should update nr_blocks at the start so the new space is used for the index entries. Otherwise extend can fail, e.g.: sm_metadata_extend call sequence that fails: -> sm_ll_extend -> dm_tm_new_block -> dm_sm_new_block -> sm_bootstrap_new_block => returns -ENOSPC because smm->begin == smm->ll.nr_blocks Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2014-01-07dm persistent data: cleanup dm-thin specific references in textMike Snitzer
DM's persistent-data library is now used my multiple targets so exclusive references to "pool" or "thin provisioning" need to be cleaned up. Adjust Kconfig's DM_DEBUG_BLOCK_STACK_TRACING text and remove "pool" from a block manager error message. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
2014-01-07dm space map metadata: limit errors in sm_metadata_new_blockMike Snitzer
The "unable to allocate new metadata block" error can be a particularly verbose error if there is a systemic issue with the metadata device. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
2013-12-13dm array: fix a reference counting bug in shadow_ablockJoe Thornber
An old array block could have its reference count decremented below zero when it is being replaced in the btree by a new array block. The fix is to increment the old ablock's reference count just before inserting a new ablock into the btree. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.9+
2013-12-13dm space map: disallow decrementing a reference count below zeroJoe Thornber
The old behaviour, returning -EINVAL if a ref_count of 0 would be decremented, was removed in commit f722063 ("dm space map: optimise sm_ll_dec and sm_ll_inc"). To fix this regression we return an error code from the mutator function pointer passed to sm_ll_mutate() and have dec_ref_count() return -EINVAL if the old ref_count is 0. Add a DMERR to reflect the potential seriousness of this error. Also, add missing dm_tm_unlock() to sm_ll_mutate()'s error path. With this fix the following dmts regression test now passes: dmtest run --suite cache -n /metadata_use_kernel/ The next patch fixes the higher-level dm-array code that exposed this regression. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.12+