linux-toradex.git/drivers/md, branch v3.2.68

dm space map metadata: fix sm_bootstrap_get_nr_blocks()

2015-02-20T00:49:28+00:00

commit c1c6156fe4d4577444b769d7edd5dd503e57bbc9 upstream.

This function isn't right and it causes a static checker warning:

	drivers/md/dm-thin.c:3016 maybe_resize_data_dev()
	error: potentially using uninitialized 'sb_data_size'.

It should set "*count" and return zero on success the same as the
sm_metadata_get_nr_blocks() function does earlier.

Fixes: 3241b1d3e0aa ('dm: add persistent data library')
Signed-off-by: Dan Carpenter 
Acked-by: Joe Thornber 
Signed-off-by: Mike Snitzer 
Signed-off-by: Ben Hutchings

dm raid: ensure superblock's size matches device's logical block size

2014-12-14T16:23:49+00:00

commit 40d43c4b4cac4c2647bf07110d7b07d35f399a84 upstream.

The dm-raid superblock (struct dm_raid_superblock) is padded to 512
bytes and that size is being used to read it in from the metadata
device into one preallocated page.

Reading or writing this on a 512-byte sector device works fine but on
a 4096-byte sector device this fails.

Set the dm-raid superblock's size to the logical block size of the
metadata device, because IO at that size is guaranteed too work.  Also
add a size check to avoid silent partial metadata loss in case the
superblock should ever grow past the logical block size or PAGE_SIZE.

[includes pointer math fix from Dan Carpenter]
Reported-by: "Liuhua Wang" 
Signed-off-by: Heinz Mauelshagen 
Signed-off-by: Dan Carpenter 
Signed-off-by: Mike Snitzer 
Signed-off-by: Ben Hutchings

dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks

2014-12-14T16:23:49+00:00

commit 9d28eb12447ee08bb5d1e8bb3195cf20e1ecd1c0 upstream.

The shrinker uses gfp flags to indicate what kind of operation can the
driver wait for. If __GFP_IO flag is present, the driver can wait for
block I/O operations, if __GFP_FS flag is present, the driver can wait on
operations involving the filesystem.

dm-bufio tested for __GFP_IO. However, dm-bufio can run on a loop block
device that makes calls into the filesystem. If __GFP_IO is present and
__GFP_FS isn't, dm-bufio could still block on filesystem operations if it
runs on a loop block device.

The change from __GFP_IO to __GFP_FS supposedly fixes one observed (though
unreproducible) deadlock involving dm-bufio and loop device.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
[bwh: Backported to 3.2:
 - There's only one shrinker callback
 - Adjust context]
Signed-off-by: Ben Hutchings

dm log userspace: fix memory leak in dm_ulog_tfr_init failure path

2014-12-14T16:23:46+00:00

commit 56ec16cb1e1ce46354de8511eef962a417c32c92 upstream.

If cn_add_callback() fails in dm_ulog_tfr_init(), it does not
deallocate prealloced memory but calls cn_del_callback().

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
Reviewed-by: Jonathan Brassow 
Signed-off-by: Mike Snitzer 
Signed-off-by: Ben Hutchings

dm bufio: update last_accessed when relinking a buffer

2014-12-14T16:23:46+00:00

commit eb76faf53b1ff7a77ce3f78cc98ad392ac70c2a0 upstream.

The 'last_accessed' member of the dm_buffer structure was only set when
the the buffer was created.  This led to each buffer being discarded
after dm_bufio_max_age time even if it was used recently.  In practice
this resulted in all thinp metadata being evicted soon after being read
-- this is particularly problematic for metadata intensive workloads
like multithreaded small random IO.

'last_accessed' is now updated each time the buffer is moved to the head
of the LRU list, so the buffer is now properly discarded if it was not
used in dm_bufio_max_age time.

Signed-off-by: Joe Thornber 
Signed-off-by: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

dm crypt: fix access beyond the end of allocated space

2014-11-05T20:27:49+00:00

commit d49ec52ff6ddcda178fc2476a109cf1bd1fa19ed upstream.

The DM crypt target accesses memory beyond allocated space resulting in
a crash on 32 bit x86 systems.

This bug is very old (it dates back to 2.6.25 commit 3a7f6c990ad04 "dm
crypt: use async crypto").  However, this bug was masked by the fact
that kmalloc rounds the size up to the next power of two.  This bug
wasn't exposed until 3.17-rc1 commit 298a9fa08a ("dm crypt: use per-bio
data").  By switching to using per-bio data there was no longer any
padding beyond the end of a dm-crypt allocated memory block.

To minimize allocation overhead dm-crypt puts several structures into one
block allocated with kmalloc.  The block holds struct ablkcipher_request,
cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))),
struct dm_crypt_request and an initialization vector.

The variable dmreq_start is set to offset of struct dm_crypt_request
within this memory block.  dm-crypt allocates the block with this size:
cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size.

When accessing the initialization vector, dm-crypt uses the function
iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq
+ 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1).

dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request
structure.  However, when dm-crypt accesses the initialization vector, it
takes a pointer to the end of dm_crypt_request, aligns it, and then uses
it as the initialization vector.  If the end of dm_crypt_request is not
aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the
alignment causes the initialization vector to point beyond the allocated
space.

Fix this bug by calculating the variable iv_size_padding and adding it
to the allocated size.

Also correct the alignment of dm_crypt_request.  struct dm_crypt_request
is specific to dm-crypt (it isn't used by the crypto subsystem at all),
so it is aligned on __alignof__(struct dm_crypt_request).

Signed-off-by: Mikulas Patocka 
Signed-off-by: Ben Hutchings

md/raid6: avoid data corruption during recovery of double-degraded RAID6

2014-09-13T22:41:45+00:00

commit 9c4bdf697c39805078392d5ddbbba5ae5680e0dd upstream.

During recovery of a double-degraded RAID6 it is possible for
some blocks not to be recovered properly, leading to corruption.

If a write happens to one block in a stripe that would be written to a
missing device, and at the same time that stripe is recovering data
to the other missing device, then that recovered data may not be written.

This patch skips, in the double-degraded case, an optimisation that is
only safe for single-degraded arrays.

Bug was introduced in 2.6.32 and fix is suitable for any kernel since
then.  In an older kernel with separate handle_stripe5() and
handle_stripe6() functions the patch must change handle_stripe6().

Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8
Cc: Yuri Tikhonov 
Cc: Dan Williams 
Reported-by: "Manibalan P" 
Tested-by: "Manibalan P" 
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423
Signed-off-by: NeilBrown 
Acked-by: Dan Williams 
Signed-off-by: Ben Hutchings

md/raid1,raid10: always abort recover on write error.

2014-09-13T22:41:40+00:00

commit 2446dba03f9dabe0b477a126cbeb377854785b47 upstream.

Currently we don't abort recovery on a write error if the write error
to the recovering device was triggerd by normal IO (as opposed to
recovery IO).

This means that for one bitmap region, the recovery might write to the
recovering device for a few sectors, then not bother for subsequent
sectors (as it never writes to failed devices).  In this case
the bitmap bit will be cleared, but it really shouldn't.

The result is that if the recovering device fails and is then re-added
(after fixing whatever hardware problem triggerred the failure),
the second recovery won't redo the region it was in the middle of,
so some of the device will not be recovered properly.

If we abort the recovery, the region being processes will be cancelled
(bit not cleared) and the whole region will be retried.

As the bug can result in data corruption the patch is suitable for
-stable.  For kernels prior to 3.11 there is a conflict in raid10.c
which will require care.

Original-from: jiao hui 
Reported-and-tested-by: jiao hui 
Signed-off-by: NeilBrown 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

dm io: fix a race condition in the wake up code for sync_io

2014-08-06T17:07:37+00:00

commit 10f1d5d111e8aed46a0f1179faf9a3cf422f689e upstream.

There's a race condition between the atomic_dec_and_test(&io->count)
in dec_count() and the waking of the sync_io() thread.  If the thread
is spuriously woken immediately after the decrement it may exit,
making the on stack io struct invalid, yet the dec_count could still
be using it.

Fix this race by using a completion in sync_io() and dec_count().

Reported-by: Minfei Huang 
Signed-off-by: Joe Thornber 
Signed-off-by: Mike Snitzer 
Acked-by: Mikulas Patocka 
[bwh: Backported to 3.2: use wait_for_completion() as wait_for_completion_io()
 is not available]
Signed-off-by: Ben Hutchings

md: flush writes before starting a recovery.

2014-08-06T17:07:33+00:00

commit 133d4527eab8d199a62eee6bd433f0776842df2e upstream.

When we write to a degraded array which has a bitmap, we
make sure the relevant bit in the bitmap remains set when
the write completes (so a 're-add' can quickly rebuilt a
temporarily-missing device).

If, immediately after such a write starts, we incorporate a spare,
commence recovery, and skip over the region where the write is
happening (because the 'needs recovery' flag isn't set yet),
then that write will not get to the new device.

Once the recovery finishes the new device will be trusted, but will
have incorrect data, leading to possible corruption.

We cannot set the 'needs recovery' flag when we start the write as we
do not know easily if the write will be "degraded" or not.  That
depends on details of the particular raid level and particular write
request.

This patch fixes a corruption issue of long standing and so it
suitable for any -stable kernel.  It applied correctly to 3.0 at
least and will minor editing to earlier kernels.

Reported-by: Bill 
Tested-by: Bill 
Link: http://lkml.kernel.org/r/53A518BB.60709@sbcglobal.net
Signed-off-by: NeilBrown 
Signed-off-by: Ben Hutchings