linux-toradex.git/drivers/md/raid10.c, branch T30_LinuxImageV2.0Beta2_20130626

md/raid10: Fix bug when activating a hot-spare.

2011-11-11T17:43:29+00:00

commit 7fcc7c8acf0fba44d19a713207af7e58267c1179 upstream.

This is a fairly serious bug in RAID10.

When a RAID10 array is degraded and a hot-spare is activated, the
spare does not take up the empty slot, but rather replaces the first
working device.
This is likely to make the array non-functional.   It would normally
be possible to recover the data, but that would need care and is not
guaranteed.

This bug was introduced in commit
   2bb77736ae5dca0a189829fbb7379d43364a9dac
which first appeared in 3.1.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: Avoid waking up a thread after it has been freed.

2011-09-21T05:30:20+00:00

Two related problems:

1/ some error paths call "md_unregister_thread(mddev->thread)"
   without subsequently clearing ->thread.  A subsequent call
   to mddev_unlock will try to wake the thread, and crash.

2/ Most calls to md_wakeup_thread are protected against the thread
   disappeared either by:
      - holding the ->mutex
      - having an active request, so something else must be keeping
        the array active.
   However mddev_unlock calls md_wakeup_thread after dropping the
   mutex and without any certainty of an active request, so the
   ->thread could theoretically disappear.
   So we need a spinlock to provide some protections.

So change md_unregister_thread to take a pointer to the thread
pointer, and ensure that it always does the required locking, and
clears the pointer properly.

Reported-by: "Moshe Melnikov" 
Signed-off-by: NeilBrown 
cc: stable@kernel.org

md/raid1,10: Remove use-after-free bug in make_request.

2011-09-10T07:21:23+00:00

A single request to RAID1 or RAID10 might result in multiple
requests if there are known bad blocks that need to be avoided.

To detect if we need to submit another write request we test:
 	if (sectors_handled < (bio->bi_size >> 9)) {

However this is after we call **_write_done() so the 'bio' no longer
belongs to us - the writes could have completed and the bio freed.

So move the **_write_done call until after the test against
bio->bi_size.

This addresses https://bugzilla.kernel.org/show_bug.cgi?id=41862

Reported-by: Bruno Wolff III 
Tested-by: Bruno Wolff III 
Signed-off-by: NeilBrown

md/raid10: unify handling of write completion.

2011-09-10T07:21:17+00:00

A write can complete at two different places:
1/ when the last member-device write completes, through
   raid10_end_write_request
2/ in make_request() when we remove the initial bias from ->remaining.

These two should do exactly the same thing and the comment says they
do, but they don't.

So factor the correct code out into a function and call it in both
places.  This makes the code much more similar to RAID1.

The difference is only significant if there is an error, and they
usually take a while, so it is unlikely that there will be an error
already when make_request is completing, so this is unlikely to cause
real problems.

Signed-off-by: NeilBrown

md/raid10: handle further errors during fix_read_error better.

2011-07-28T01:39:25+00:00

If we find more read/write errors we should record a bad block before
failing the device.

Signed-off-by: NeilBrown

md/raid10: Handle read errors during recovery better.

2011-07-28T01:39:25+00:00

Currently when we get a read error during recovery, we simply abort
the recovery.

Instead, repeat the read in page-sized blocks.
On successful reads, write to the target.
On read errors, record a bad block on the destination,
and only if that fails do we abort the recovery.

As we now retry reads we need to know where we read from.  This was in
bi_sector but that can be changed during a read attempt.
So store the correct from_addr and to_addr in the r10_bio for later
access.


Signed-off-by: NeilBrown

md/raid10: simplify read error handling during recovery.

2011-07-28T01:39:25+00:00

If a read error is detected during recovery the code currently
fails the read device.
This isn't really necessary.  recovery_request_write will signal
a write error to end_sync_write and it will record a write
error on the destination device which will record a bad block
there or kick it from the array.

So just remove this call to do md_error.

Signed-off-by: NeilBrown

md/raid10: record bad blocks due to write errors during resync/recovery.

2011-07-28T01:39:25+00:00

If we get a write error during resync/recovery don't fail the device
but instead record a bad block.  If that fails we can then fail the
device.

Signed-off-by: NeilBrown

md/raid10: attempt to fix read errors during resync/check

2011-07-28T01:39:25+00:00

We already attempt to fix read errors found during normal IO
and a 'repair' process.
It is best to try to repair them at any time they are found,
so move a test so that during sync and check a read error will
be corrected by over-writing with good data.

If both (all) devices have known bad blocks in the sync section we
won't try to fix even though the bad blocks might not overlap.  That
should be considered later.

Also if we hit a read error during recovery we don't try to fix it.
It would only be possible to fix if there were at least three copies
of data, which is not very common with RAID10.  But it should still
be considered later.

Signed-off-by: NeilBrown

md/raid10: Handle write errors by updating badblock log.

2011-07-28T01:39:24+00:00

When we get a write error (in the data area, not in metadata),
update the badblock log rather than failing the whole device.

As the write may well be many blocks, we trying writing each
block individually and only log the ones which fail.

Signed-off-by: NeilBrown