linux-toradex.git/drivers/md/raid10.c, branch v2.6.27.38

md/raid10: don't clear bitmap during recovery if array will still be degraded.

2009-05-20T05:20:05+00:00

commit 18055569127253755d01733f6ecc004ed02f88d0 upstream.

If we have a raid10 with multiple missing devices, and we recover just
one of these to a spare, then we risk (depending on the bitmap and
array chunk size) clearing bits of the bitmap for which recovery isn't
complete (because a device is still missing).

This can lead to a subsequent "re-add" being recovered without
any IO happening, which would result in loss of data.

This patch takes the safe approach of not clearing bitmap bits
if the array will still be degraded.
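
To illustrate the rule described above, here is a minimal userspace sketch, not the kernel code. The struct and function names (`array_state`, `may_clear_bitmap_bits`) are hypothetical and only model the decision: bitmap bits may be cleared only when the recovery leaves no device missing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of an array; not the kernel's structures. */
struct array_state {
    int raid_disks;     /* devices the array is configured for */
    int working_disks;  /* devices in sync before this recovery */
};

/*
 * Return true only if recovering `recovered` devices leaves no device
 * missing.  While any device is still missing, the bits stay set so
 * that a later re-add still triggers real IO.
 */
static bool may_clear_bitmap_bits(const struct array_state *st, int recovered)
{
    assert(recovered >= 0);
    assert(st->working_disks + recovered <= st->raid_disks);
    return st->working_disks + recovered == st->raid_disks;
}
```

With four configured devices and two working, recovering one spare leaves the array degraded, so the function returns false and the bits are kept.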

This patch is suitable for all active -stable kernels.

Cc: stable@kernel.org
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/raid10: Don't skip more than 1 bitmap-chunk at a time during recovery.

2009-03-17T00:52:54+00:00

commit 09b4068a7fe442efc40e9dcbcf5ff37c3338ab15 upstream.

When doing recovery on a raid10 with a write-intent bitmap, we only
need to recover chunks that are flagged in the bitmap.

However, if we choose to skip a chunk because it isn't flagged, the code
currently skips the whole raid10-chunk, so it might not recover
some blocks that need recovering.

This patch fixes it.

In case that is confusing, it might help to understand that there
is a 'raid10 chunk size' which guides how data is distributed across
the devices, and a 'bitmap chunk size' which says how much data
corresponds to a single bit in the bitmap.

This bug only affects cases where the bitmap chunk size is smaller
than the raid10 chunk size.
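
The relationship between the two chunk sizes can be sketched as follows. This is a simplified userspace illustration, not the patch itself: it treats both chunk sizes as sector counts over a single linear address space and ignores the raid10 device/array address mapping. The helper names (`sectors_to_chunk_end`, `safe_skip`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Sectors from `pos` to the end of the chunk that contains it. */
static sector_t sectors_to_chunk_end(sector_t pos, sector_t chunk_sectors)
{
    assert(chunk_sectors > 0);
    return chunk_sectors - (pos % chunk_sectors);
}

/*
 * How far recovery may skip when the bitmap bit covering `pos` is clear:
 * never past the end of the current bitmap chunk, because the next bit
 * may be set, and never past the end of the current raid10 chunk.
 */
static sector_t safe_skip(sector_t pos, sector_t raid_chunk,
                          sector_t bitmap_chunk)
{
    sector_t to_raid_end = sectors_to_chunk_end(pos, raid_chunk);
    sector_t to_bitmap_end = sectors_to_chunk_end(pos, bitmap_chunk);

    return to_bitmap_end < to_raid_end ? to_bitmap_end : to_raid_end;
}
```

With a 128-sector raid10 chunk and a 32-sector bitmap chunk, a skip from sector 0 covers only 32 sectors. Skipping all 128 would jump over three further bitmap bits, which is the failure described above.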



Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/raid10: Don't call bitmap_cond_end_sync when we are doing recovery.

2009-03-17T00:52:54+00:00

commit 78200d45cde2a79c0d0ae0407883bb264caa3c18 upstream.

For raid1/4/5/6, resync (fixing inconsistencies between devices) is
very similar to recovery (rebuilding a failed device onto a spare).
They both walk through the device addresses in order.

For raid10 it can be quite different.  Resync follows the 'array'
address, and makes sure all copies are the same.  Recovery walks
through 'device' addresses and recreates each missing block.

The 'bitmap_cond_end_sync' function allows the write-intent-bitmap
(when present) to be updated to reflect a partially completed resync.
It makes assumptions which mean that it does not work correctly for
raid10 recovery at all.

In particular, it can cause bitmap-directed recovery of a raid10 to
not recover some of the blocks that need to be recovered.

So move the call to bitmap_cond_end_sync into the resync path, rather
than being in the common "resync or recovery" path.


Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: avoid races when stopping resync.

2009-03-17T00:52:54+00:00

commit 73d5c38a9536142e062c35997b044e89166e063b upstream.

There has been a race in raid10 and raid1 for a long time
which has only recently started showing up due to a scheduler change.

When a sync_read request finishes, as soon as reschedule_retry
is called, another thread can mark the resync request as having
completed, so md_do_sync can finish, ->stop can be called, and
->conf can be freed.  So using conf after reschedule_retry is not
safe.

Similarly, when finishing a sync_write, calling md_done_sync must be
the last thing we do, as it allows a chain of events which will free
conf and other data structures.

The first of these requires action in raid10.c.
The second requires action in raid1.c and raid10.c.
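
The ordering rule described above can be sketched in plain userspace C. This is an illustration under stated assumptions, not the kernel change: `hand_off` is a hypothetical stand-in for a call such as reschedule_retry or md_done_sync, and it frees `conf` directly to model the worst case in which another thread tears the array down immediately.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct conf { int pending; };
struct request { struct conf *conf; int sectors; };

/*
 * Stand-in for the hand-off call.  After it returns, conf must be
 * treated as freed; here it really is freed, to model the worst case.
 */
static void hand_off(struct request *r)
{
    free(r->conf);
    r->conf = NULL;
}

/* Finish a request: do every access to conf first, hand off last. */
static int finish_request(struct request *r)
{
    int sectors;

    assert(r->conf != NULL);
    sectors = r->sectors;   /* read what we need up front */
    r->conf->pending--;     /* last access to conf */
    hand_off(r);            /* nothing may touch conf after this */
    return sectors;
}
```

The design point is only the ordering: any bookkeeping that needs `conf` is done before the hand-off, and the hand-off is the final statement that can reach it.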

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: fix bug in raid10 recovery.

2008-11-13T17:55:57+00:00

commit a53a6c85756339f82ff19e001e90cfba2d6299a8 upstream

Adding a spare to a raid10 doesn't cause recovery to start.
This is due to a silly typo in
  commit 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda
and so is a bug in 2.6.27 and .28-rc.

Thanks to Thomas Backlund for bisecting to find this.

Cc: Thomas Backlund 
Cc: George Spelvin 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

Allow raid10 resync to happen in larger chunks.

2008-08-05T05:56:32+00:00

The raid10 resync/recovery code currently limits the amount of
in-flight resync IO to 2Meg.  This was copied from raid1 where
it seems quite adequate.  However for raid10, some layouts require
a bit of seeking to perform a resync, and allowing a larger buffer
size means that the seeking can be significantly reduced.

There is probably no real need to limit the amount of in-flight
IO at all.  Any shortage of memory will naturally reduce the
amount of buffer space available down to a set minimum, and any
concurrent normal IO will quickly cause resync IO to back off.

The only problem would be that normal IO has to wait for all resync IO
to finish, so a very large amount of resync IO could cause unpleasant
latency when normal IO starts up.

So: increase RESYNC_DEPTH to allow 32Meg of buffer (if memory is
available) which seems to be a good amount.  Also reduce the amount
of memory reserved as there is no need to keep 2Meg just for resync if
memory is tight.
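
As a rough check of the arithmetic, the sketch below derives a request depth from a buffer limit. The 64 KiB per-request buffer size is an assumption made for illustration; the message above does not state it. The names `SKETCH_BLOCK_SIZE`, `SKETCH_BUFFER_LIMIT` and `resync_depth` are hypothetical.

```c
#include <assert.h>

/* Assumed per-request buffer size; not stated in the commit message. */
#define SKETCH_BLOCK_SIZE   (64u * 1024u)
/* The "32Meg" buffer limit named in the text. */
#define SKETCH_BUFFER_LIMIT (32u * 1024u * 1024u)

/* Number of requests that fit in flight within the buffer limit. */
static unsigned int resync_depth(unsigned int limit_bytes,
                                 unsigned int block_bytes)
{
    assert(block_bytes > 0);
    return limit_bytes / block_bytes;
}
```

Under that assumption, 32Meg allows 512 requests in flight, compared with 32 for the old 2Meg limit.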

Thanks to Keld for the suggestion.

Cc: Keld Jørn Simonsen 
Signed-off-by: NeilBrown

Merge branch 'for-linus' of git://neil.brown.name/md

2008-08-01T18:56:07+00:00

* 'for-linus' of git://neil.brown.name/md:
  md: raid10: wake up frozen array
  md: do not count blocked devices as spares
  md: do not progress the resync process if the stripe was blocked
  md: delay notification of 'active_idle' to the recovery thread
  md: fix merge error
  md: move async_tx_issue_pending_all outside spin_lock_irq

md: raid10: wake up frozen array

2008-08-01T02:55:14+00:00

When rescheduling a bio in raid10, we wake up
the md thread, but if the array is frozen, this
will have no effect.  This causes the array to
remain frozen for eternity.  We add a wake_up
to allow the array to de-freeze.  This code is
nearly identical to the raid1 code, which has
this fix already.

Signed-off-by: Arthur Jones 
Signed-off-by: NeilBrown

Merge branch 'for-linus' of git://neil.brown.name/md

2008-07-21T17:29:12+00:00

* 'for-linus' of git://neil.brown.name/md: (52 commits)
  md: Protect access to mddev->disks list using RCU
  md: only count actual openers as access which prevent a 'stop'
  md: linear: Make array_size sector-based and rename it to array_sectors.
  md: Make mddev->array_size sector-based.
  md: Make super_type->rdev_size_change() take sector-based sizes.
  md: Fix check for overlapping devices.
  md: Tidy up rdev_size_store a bit:
  md: Remove some unused macros.
  md: Turn rdev->sb_offset into a sector-based quantity.
  md: Make calc_dev_sboffset() return a sector count.
  md: Replace calc_dev_size() by calc_num_sectors().
  md: Make update_size() take the number of sectors.
  md: Better control of when do_md_stop is allowed to stop the array.
  md: get_disk_info(): Don't convert between signed and unsigned and back.
  md: Simplify restart_array().
  md: alloc_disk_sb(): Return proper error value.
  md: Simplify sb_equal().
  md: Simplify uuid_equal().
  md: sb_equal(): Fix misleading printk.
  md: Fix a typo in the comment to cmd_match().
  ...

md: Make mddev->array_size sector-based.

2008-07-21T07:05:22+00:00

This patch renames the array_size field of struct mddev_s to array_sectors
and converts all instances to use units of 512 byte sectors instead of 1k
blocks.
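
The unit change amounts to a factor of two, as in this small sketch. The helper names are hypothetical and are not functions from the md code.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a size in 1k blocks to 512-byte sectors. */
static uint64_t kib_blocks_to_sectors(uint64_t blocks)
{
    assert(blocks <= UINT64_MAX / 2);   /* guard against overflow */
    return blocks * 2;
}

/* Convert 512-byte sectors to 1k blocks; an odd trailing sector is dropped. */
static uint64_t sectors_to_kib_blocks(uint64_t sectors)
{
    return sectors / 2;
}
```

Keeping the size in sectors avoids the rounding that the second helper shows: an odd sector count cannot be represented exactly in 1k blocks.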

Signed-off-by: Andre Noll 
Signed-off-by: NeilBrown