summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2014-11-19dm thin: do not allow thin device activation while pool is suspendedMike Snitzer
Otherwise IO could be issued to the pool while it is suspended. Care was taken to properly interlock between the thin and thin-pool targets when accessing the pool's 'suspended' flag. The thin_ctr will not add a new thin device to the pool's active_thins list if the pool is susepended. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
2014-11-19dm: add presuspend_undo hook to target_typeMike Snitzer
The DM thin-pool target now must undo the changes performed during pool_presuspend() so introduce presuspend_undo hook in target_type. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
2014-11-19dm: return earlier from dm_blk_ioctl if target doesn't implement .ioctlMike Snitzer
No point checking if the device is suspended if the current target doesn't even implement .ioctl Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-12dm thin: remove stale 'trim' message in block comment above pool_messageMike Snitzer
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-12dm thin: fix a race in thin_dtrMikulas Patocka
As long as struct thin_c is in the list, anyone can grab a reference of it. Consequently, we must wait for the reference count to drop to zero *after* we remove the structure from the list, not before. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-12dm cache: emit a warning message if there are a lot of cache blocksJoe Thornber
Loading and saving millions of block mappings takes time. We may as well explain what's going on, and encourage people to use a larger cache block size. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm cache: improve discard supportJoe Thornber
Safely allow the discard blocksize to be larger than the cache blocksize by using the bio prison's range locking support. This also improves discard performance considerly because larger discards are issued to the dm-cache device. The discard blocksize was always intended to be greater than the cache blocksize. But until now it wasn't implemented safely. Also, by safely restoring the ability to have discard blocksize larger than cache blocksize we're able to significantly reduce the memory used for the cache's discard bitset. Before, with a small discard blocksize, the discard bitset could get quite large because its size is a function of the discard blocksize and the origin device's size. For example, previously, using a 32KB cache blocksize with a 40TB origin resulted in 1280MB of incore memory use for the discard bitset! Now, the discard blocksize is scaled up accordingly to ensure the discard bitset is capped at 2**14 bits, or 16KB. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm cache: revert "prevent corruption caused by discard_block_size > ↵Joe Thornber
cache_block_size" This reverts commit d132cc6d9e92424bb9d4fd35f5bd0e55d583f4be because we actually do want to allow the discard blocksize to be larger than the cache blocksize. Further dm-cache discard changes will make this possible. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm cache: revert "remove remainder of distinct discard block size"Joe Thornber
This reverts commit 64ab346a360a4b15c28fb8531918d4a01f4eabd9 because we actually do want to allow the discard blocksize to be larger than the cache blocksize. Further dm-cache discard changes will make this possible. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm bio prison: introduce support for locking ranges of blocksJoe Thornber
Ranges will be placed in the same cell if they overlap. Range locking is a prerequisite for more efficient multi-block discard support in both the cache and thin-provisioning targets. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm cache policy mq: simplify ability to promote sequential IO to the cacheMike Snitzer
Before, if the user wanted sequential IO to be promoted to the cache they'd have to set sequential_threshold to some nebulous large value. Now, the user may easily disable sequential IO detection (and sequential IO's implicit bypass of the cache) by setting sequential_threshold to 0. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm cache policy mq: tweak algorithm that decides when to promote a blockJoe Thornber
Rather than maintaining a separate promote_threshold variable that we periodically update we now use the hit count of the oldest clean block. Also add a fudge factor to discourage demoting dirty blocks. With some tests this has a sizeable difference, because the old code was too eager to demote blocks. For example, device-mapper-test-suite's git_extract_cache_quick test goes from taking 190 seconds, to 142 (linear on spindle takes 250). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm: do not call dm_sync_table() when creating new devicesHannes Reinecke
When creating new devices dm_sync_table() calls synchronize_rcu_expedited(), causing _all_ pending RCU pointers to be flushed. This causes a latency overhead that is especially noticeable when creating lots of devices. And all of this is pointless as there are no old maps to be disconnected, and hence no stale pointers which would need to be cleared up. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm: sparse: Annotate field with __rcu for checkingPranith Kumar
Annotate the map field with __rcu since this is a rcu pointer which is checked by sparse. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm: Use rcu_dereference() for accessing rcu pointerPranith Kumar
The map field in 'struct mapped_device' is an rcu pointer. Use rcu_dereference() while accessing it. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: refactor requeue_io to eliminate spinlock bouncingMike Snitzer
Also refactor some other bio_list erroring helpers. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: optimize retry_bios_on_resumeMike Snitzer
Eliminate redundant should_error_unserviceable_bio check and error loop. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: sort the deferred cellsJoe Thornber
Sort the cells in logical block order before processing each cell in process_thin_deferred_cells(). This significantly improves the ondisk layout on rotational storage, whereby improving read performance. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: direct dispatch when breaking sharingJoe Thornber
This use of direct submission in process_shared_bio() reduces latency for submitting bios in the shared cell by avoiding adding those bios to the deferred list and waiting for the next iteration of the worker. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: remap the bios in a cell immediatelyJoe Thornber
This use of direct submission in process_prepared_mapping() reduces latency for submitting bios in a cell by avoiding adding those bios to the deferred list and waiting for the next iteration of the worker. But this direct submission exposes the potential for a race between releasing a cell and incrementing deferred set. Fix this by introducing dm_cell_visit_release() and refactoring inc_remap_and_issue_cell() accordingly. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: defer whole cells rather than individual biosJoe Thornber
This avoids dropping the cell, so increases the probability that other bios will collect within the cell, rather than being passed individually to the worker. Also add required process_cell and process_discard_cell error handling wrappers and set associated pool-mode function pointers accordingly. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: factor out remap_and_issue_overwriteMike Snitzer
Purely cleanup of duplicated code, no functional change. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: performance improvement to discard processingJoe Thornber
When processing a discard bio, if the block is already quiesced do the discard immediately rather than adding the mapping to a list for the next iteration of the worker thread. Discarding a fully provisioned 100G thin volume with 64k block size goes from 860s to 95s with this change. Clearly there's something wrong with the worker architecture, more investigation needed. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: implement thin_mergeMike Snitzer
Introduce thin_merge so that any additional constraints from the data volume may be taken into account when determing the maximum number of sectors that can be issued relative to the specified logical offset. This is particularly important if/when the data volume is layered ontop of a more sophisticated device (e.g. dm-raid or some other DM target). Reviewed-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm: improve documentation and code clarity in dm_merge_bvecMike Snitzer
These code changes do not introduce a functional change. But bio_add_page() will never attempt to build up a bio larger than queue_max_sectors(). Similarly, bio_get_nr_vecs() is also bound by queue_max_sectors(). Therefore, there is no point in allowing dm_merge_bvec() to answer "how many sectors can a bio have at this offset?" with anything larger than queue_max_sectors(). Using queue_max_sectors() rather than BIO_MAX_SECTORS serves to more accurately convey the limits that are being imposed. Also, use unlikely() to clarify the fact that the defensive code in dm_merge_bvec() relative to max_size going negative shouldn't ever happen -- if it does happen there is a bug in the block layer for requesting larger than dm_merge_bvec()'s initial response for a given offset. Also, update a comment in dm_merge_bvec() relative to max_hw_sectors_kb. And fix empty newline whitespace. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: adjust max_sectors_kb based on thinp blocksizeMike Snitzer
Allows for filesystems to submit bios that are a factor of the thinp blocksize, improving dm-thinp efficiency (particularly when the data volume is RAID). Also set io_min to max_sectors_kb if it is a factor of the thinp blocksize. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: throttle incoming IOJoe Thornber
Throttle IO based on the time it's taking the worker to do one loop. There were reports of hung task timeouts occuring and it was observed that the excessively long avgqu-sz (as reported by iostat) was contributing to these hung tasks. Throttling definitely helps dm-thinp perform better under heavy IO load (without being detremental by being overzealous). It reduces avgqu-sz drastically, e.g.: from 60K to ~6K, and even as low as 150 once metadata is cached by bufio, when dirty_ratio=5, dirty_background_ratio=2. And avgqu-sz stays at or below 30K even with dirty_ratio=20, dirty_background_ratio=10. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: prefetch missing metadata pagesJoe Thornber
Prefetch metadata at the start of the worker thread and then again every 128th bio processed from the deferred list. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm transaction manager: add support for prefetching blocks of metadataJoe Thornber
Introduce the dm_tm_issue_prefetches interface. If you're using a non-blocking clone the tm will build up a list of requested blocks that weren't in core. dm_tm_issue_prefetches will request those blocks to be prefetched. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin metadata: change dm_thin_find_block to allow blocking, but not ↵Joe Thornber
issuing, IO This change is a prerequisite for allowing metadata to be prefetched. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm bio prison: switch to using a red black treeJoe Thornber
Previously it was using a fixed sized hash table. There are times when very many concurrent cells are held (such as when processing a very large discard). When this happens the hash table performance becomes very poor. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm bufio: evict buffers that are past the max age but retain some buffersJoe Thornber
These changes help keep metadata backed by dm-bufio in-core longer which fixes reports of metadata churn in the face of heavy random IO workloads. Before, bufio evicted all buffers older than DM_BUFIO_DEFAULT_AGE_SECS. Having a device (e.g. dm-thinp or dm-cache) lose all metadata just because associated buffers had been idle for some time is unfriendly. Now, the user may now configure the number of bytes that bufio retains using the 'retain_bytes' module parameter. The default is 256K. Also, the DM_BUFIO_WORK_TIMER_SECS and DM_BUFIO_DEFAULT_AGE_SECS defaults were quite low so increase them (to 30 and 300 respectively). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm bufio: switch from a huge hash table to an rbtreeJoe Thornber
Converting over to using an rbtree eliminates a fixed 8MB allocation from vmalloc space for the hash table. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm btree: fix a recursion depth bug in btree walking codeJoe Thornber
The walk code was using a 'ro_spine' to hold it's locked btree nodes. But this data structure is designed for the rolling lock scheme, and as such automatically unlocks blocks that are two steps up the call chain. This is not suitable for the simple recursive walk algorithm, which retraces its steps. This code is only used by the persistent array code, which in turn is only used by dm-cache. In order to trigger it you need to have a mapping tree that is more than 2 levels deep; which equates to 8-16 million cache blocks. For instance a 4T ssd with a very small block size of 32k only just triggers this bug. The fix just places the locked blocks on the stack, and stops using the ro_spine altogether. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2014-11-04dm thin: grab a virtual cell before looking up the mappingJoe Thornber
Avoids normal IO racing with discard. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2014-10-29dm raid: fix inaccessible superblocks causing oops in configure_discard_supportHeinz Mauelshagen
Commit 48cf06bc5f ("dm raid: add discard support for RAID levels 4, 5 and 6") did not properly handle missing metadata device(s). A failing read of the superblock causes the metadata and data devices to be removed from the dev array in struct raid_set, setting references to both devices to NULL. configure_discard_support() nonetheless tries to access the data dev unconditionally causing an oops. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-10-21dm raid: ensure superblock's size matches device's logical block sizeHeinz Mauelshagen
The dm-raid superblock (struct dm_raid_superblock) is padded to 512 bytes and that size is being used to read it in from the metadata device into one preallocated page. Reading or writing this on a 512-byte sector device works fine but on a 4096-byte sector device this fails. Set the dm-raid superblock's size to the logical block size of the metadata device, because IO at that size is guaranteed too work. Also add a size check to avoid silent partial metadata loss in case the superblock should ever grow past the logical block size or PAGE_SIZE. [includes pointer math fix from Dan Carpenter] Reported-by: "Liuhua Wang" <lwang@suse.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2014-10-17dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacksMikulas Patocka
The shrinker uses gfp flags to indicate what kind of operation can the driver wait for. If __GFP_IO flag is present, the driver can wait for block I/O operations, if __GFP_FS flag is present, the driver can wait on operations involving the filesystem. dm-bufio tested for __GFP_IO. However, dm-bufio can run on a loop block device that makes calls into the filesystem. If __GFP_IO is present and __GFP_FS isn't, dm-bufio could still block on filesystem operations if it runs on a loop block device. The change from __GFP_IO to __GFP_FS supposedly fixes one observed (though unreproducible) deadlock involving dm-bufio and loop device. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2014-10-10dm stripe: fix potential for leak in stripe_ctr error pathPavitra Kumar
Fix a potential struct stripe_c leak that would occur if the chunk_size exceeded the maximum allowed by dm_set_target_max_io_len (UINT_MAX). However, in practice there is no possibility of this occuring given that chunk_size is of type uint32_t. But it is good to fix this to future-proof in case dm_set_target_max_io_len's implementation were to change. Signed-off-by: Pavitra Kumar <pavitrak@nvidia.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-10-05dm log userspace: fix memory leak in dm_ulog_tfr_init failure pathAlexey Khoroshilov
If cn_add_callback() fails in dm_ulog_tfr_init(), it does not deallocate prealloced memory but calls cn_del_callback(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Reviewed-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2014-10-05dm bufio: when done scanning return from __scan immediatelyMikulas Patocka
When __scan frees the required number of buffer entries that the shrinker requested (nr_to_scan becomes zero) it must return. Before this fix the __scan code exited only the inner loop and continued in the outer loop -- which could result in reduced performance due to extra buffers being freed (e.g. unnecessarily evicted thinp metadata needing to be synchronously re-read into bufio's cache). Also, move dm_bufio_cond_resched to __scan's inner loop, so that iterating the bufio client's lru lists doesn't result in scheduling latency. Reported-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.2+
2014-10-05dm bufio: update last_accessed when relinking a bufferJoe Thornber
The 'last_accessed' member of the dm_buffer structure was only set when the the buffer was created. This led to each buffer being discarded after dm_bufio_max_age time even if it was used recently. In practice this resulted in all thinp metadata being evicted soon after being read -- this is particularly problematic for metadata intensive workloads like multithreaded small random IO. 'last_accessed' is now updated each time the buffer is moved to the head of the LRU list, so the buffer is now properly discarded if it was not used in dm_bufio_max_age time. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.2+
2014-10-05dm raid: add discard support for RAID levels 4, 5 and 6Heinz Mauelshagen
In case of RAID levels 4, 5 and 6 we have to verify each RAID members' ability to zero data on discards to avoid stripe data corruption -- if discard_zeroes_data is not set for each RAID member discard support must be disabled. But given the uncertainty of whether or not a RAID member properly supports zeroing data on discard we require the user to explicitly allow discard support on RAID levels 4, 5, and 6 by setting a dm-raid module paramter, e.g.: dm-raid.devices_handle_discard_safely=Y Otherwise, discards could cause data corruption on RAID4/5/6. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-10-05dm raid: add discard support for RAID levels 1 and 10Heinz Mauelshagen
Discard support is not enabled for RAID levels 4, 5, and 6 at this time due to concerns about unreliable discard_zeroes_data support on some hardware. Otherwise, discards could cause stripe data corruption (classic example of bad apples spoiling the bunch). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-10-05dm: allow active and inactive tables to share dm_devsBenjamin Marzinski
Until this change, when loading a new DM table, DM core would re-open all of the devices in the DM table. Now, DM core will avoid redundant device opens (and closes when destroying the old table) if the old table already has a device open using the same mode. This is achieved by managing reference counts on the table_devices that DM core now stores in the mapped_device structure (rather than in the dm_table structure). So a mapped_device's active and inactive dm_tables' dm_dev lists now just point to the dm_devs stored in the mapped_device's table_devices list. This improvement in DM core's device reference counting has the side-effect of fixing a long-standing limitation of the multipath target: a DM multipath table couldn't include any paths that were unusable (failed). For example: if all paths have failed and you add a new, working, path to the table; you can't use it since the table load would fail due to it still containing failed paths. Now a re-load of a multipath table can include failed devices and when those devices become active again they can be used instantly. The device list code in dm.c isn't a straight copy/paste from the code in dm-table.c, but it's very close (aside from some variable renames). One subtle difference is that find_table_device for the tables_devices list will only match devices with the same name and mode. This is because we don't want to upgrade a device's mode in the active table when an inactive table is loaded. Access to the mapped_device structure's tables_devices list requires a mutex (tables_devices_lock), so that tables cannot be created and destroyed concurrently. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-10-05dm mpath: stop queueing IO when no valid paths existBenjamin Marzinski
'queue_io' is set so that IO is queued while paths are being initialized. Clear queue_io in __choose_pgpath if there are no valid paths, since there are obviously no paths that can be initialized. Otherwise IOs to the device will back up. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-10-05dm: use bioset_create_nobvec()Junichi Nomura
Since DM core uses bio_clone_fast() for both bio-based and request-based DM devices there is no need for DM's bioset to have a bvec mempool. With this patch, on arch with 4KB page for example, memory usage will be reduced by 64KB for each bio-based DM device and 1MB for each request-based DM device. For example, when you create 10,000 bio-based DM devices and 1,000 request-based DM devices, memory usage of biovec under no load is: # grep biovec /proc/slabinfo biovec-256 418068 418068 4096 ... biovec-128 0 0 2048 ... biovec-64 0 0 1024 ... biovec-16 0 0 256 ... With this patch series applied, the usage becomes: # grep biovec /proc/slabinfo biovec-256 116 116 4096 ... biovec-128 0 0 2048 ... biovec-64 0 0 1024 ... biovec-16 0 0 256 ... So 4096 * (418068 - 116) = 1.6GB of memory is saved in this example. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-10-05dm: remove nr_iovecs parameter from alloc_tio()Junichi Nomura
alloc_tio() uses bio_alloc_bioset() to allocate a clone-bio for a bio. alloc_tio() takes the number of bvecs to allocate for the clone-bio. However, with v3.14's immutable biovec changes DM now uses __bio_clone_fast() and no longer needs to allocate bvecs. In practice, the 'nr_iovecs' passed to alloc_tio() is always effectively 0. __clone_and_map_simple_bio() looked like it was passing non-zero nr_iovecs, but its value was always within the range of inline bvecs and no allocation actually happened. If allocation happened, the BUG_ON() in __bio_clone_fast() would've triggered. Remove the nr_iovecs parameter from alloc_tio() to prevent possible future bio_alloc_bioset() mis-use of a new bioset interface that will no longer allow bvecs to be allocated. Also fix extra whitespace before the __bio_clone_fast() call in __clone_and_map_simple_bio(). Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-09-30sd: Honor block layer integrity handling flagsMartin K. Petersen
A set of flags introduced in the block layer enable better control over how protection information is handled. These flags are useful for both error injection and data recovery purposes. Checking can be enabled and disabled for controller and disk, and the guard tag format is now a per-I/O property. Update sd_protect_op to communicate the relevant information to the low-level device driver via a set of flags in scsi_cmnd. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-27block: Add T10 Protection Information functionsMartin K. Petersen
The T10 Protection Information format is also used by some devices that do not go through the SCSI layer (virtual block devices, NVMe). Relocate the relevant functions to a block layer library that can be used without involving SCSI. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>