linux-toradex.git/include/linux/raid, branch v2.6.26-rc5

md: restart recovery cleanly after device failure.

2008-05-24T16:56:10+00:00

When we get any IO error during a recovery (rebuilding a spare), we abort
the recovery and restart it.

For RAID6 (and multi-drive RAID1) it may not be best to restart at the
beginning: when multiple failures can be tolerated, the recovery may be
able to continue and re-doing all that has already been done doesn't make
sense.

We already have the infrastructure to record where a recovery is up to
and restart from there, but it is not being used properly.
This is because:
  - We sometimes abort with MD_RECOVERY_ERR rather than just MD_RECOVERY_INTR,
    which causes the recovery not be be checkpointed.
  - We remove spares and then re-added them which loses important state
    information.

The distinction between MD_RECOVERY_ERR and MD_RECOVERY_INTR really isn't
needed.  If there is an error, the relevant drive will be marked as
Faulty, and that is enough to ensure correct handling of the error.  So we
first remove MD_RECOVERY_ERR, changing some of the uses of it to
MD_RECOVERY_INTR.

Then we cause the attempt to remove a non-faulty device from an array to
fail (unless recovery is impossible as the array is too degraded).  Then
when remove_and_add_spares attempts to remove the devices on which
recovery can continue, it will fail, they will remain in place, and
recovery will continue on them as desired.

Issue:  If we are halfway through rebuilding a spare and another drive
fails, and a new spare is immediately available,  do we want to:
 1/ complete the current rebuild, then go back and rebuild the new spare or
 2/ restart the rebuild from the start and rebuild both devices in
    parallel.

Both options can be argued for.  The code currently takes option 2 as
  a/ this requires least code change
  b/ this results in a minimally-degraded array in minimal time.

Cc: "Eivind Sarto" 
Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

md: allow parallel resync of md-devices.

2008-05-24T16:56:10+00:00

In some configurations, a raid6 resync can be limited by CPU speed
(Calculating P and Q and moving data) rather than by device speed.  In
these cases there is nothing to be gained byt serialising resync of arrays
that share a device, and doing the resync in parallel can provide benefit.
 So add a sysfs tunable to flag an array as being allowed to resync in
parallel with other arrays that use (a different part of) the same device.

Signed-off-by: Bernd Schubert 
Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

md: kill file_path wrapper

2008-05-24T16:56:09+00:00

Kill the trivial and rather pointless file_path wrapper around d_path.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

md: proper extern for mdp_major

2008-05-24T16:56:09+00:00

This patch adds a proper extern for mdp_major in include/linux/raid/md.h

Signed-off-by: Adrian Bunk 
Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

md: support blocking writes to an array on device failure

2008-04-30T15:29:33+00:00

Allows a userspace metadata handler to take action upon detecting a device
failure.

Based on an original patch by Neil Brown.

Changes:
-added blocked_wait waitqueue to rdev
-don't qualify Blocked with Faulty always let userspace block writes
-added md_wait_for_blocked_rdev to wait for the block device to be clear, if
 userspace misses the notification another one is sent every 5 seconds
-set MD_RECOVERY_NEEDED after clearing "blocked"
-kill DoBlock flag, just test mddev->external

Signed-off-by: Dan Williams 
Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

md: introduce get_priority_stripe() to improve raid456 write performance

2008-04-28T15:58:42+00:00

Improve write performance by preventing the delayed_list from dumping all its
stripes onto the handle_list in one shot.  Delayed stripes are now further
delayed by being held on the 'hold_list'.  The 'hold_list' is bypassed when:

  * a STRIPE_IO_STARTED stripe is found at the head of 'handle_list'
  * 'handle_list' is empty and i/o is being done to satisfy full stripe-width
    write requests
  * 'bypass_count' is less than 'bypass_threshold'.  By default the threshold
    is 1, i.e. every other stripe handled is a preread stripe provided the
    top two conditions are false.

Benchmark data:
System: 2x Xeon 5150, 4x SATA, mem=1GB
Baseline: 2.6.24-rc7
Configuration: mdadm --create /dev/md0 /dev/sd[b-e] -n 4 -l 5 --assume-clean
Test1: dd if=/dev/zero of=/dev/md0 bs=1024k count=2048
  * patched:  +33% (stripe_cache_size = 256), +25% (stripe_cache_size = 512)

Test2: tiobench --size 2048 --numruns 5 --block 4096 --block 131072 (XFS)
  * patched: +13%
  * patched + preread_bypass_threshold = 0: +37%

Changes since v1:
* reduce bypass_threshold from (chunk_size / sectors_per_chunk) to (1) and
  make it configurable.  This defaults to fairness and modest performance
  gains out of the box.
Changes since v2:
* [neilb@suse.de]: kill STRIPE_PRIO_HI and preread_needed as they are not
  necessary, the important change was clearing STRIPE_DELAYED in
  add_stripe_bio and this has been moved out to make_request for the hang
  fix.
* [neilb@suse.de]: simplify get_priority_stripe
* [dan.j.williams@intel.com]: reset the bypass_count when ->hold_list is
  sampled empty (+11%)
* [dan.j.williams@intel.com]: decrement the bypass_count at the detection
  of stripes being naturally promoted off of hold_list +2%.  Note, resetting
  bypass_count instead of decrementing on these events yields +4% but that is
  probably too aggressive.
Changes since v3:
* cosmetic fixups

Tested-by: James W. Laferriere 
Signed-off-by: Dan Williams 
Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

include: Remove unnecessary inclusions of asm/semaphore.h

2008-04-19T02:16:54+00:00

None of these files use any of the functionality promised by
asm/semaphore.h.  It's possible that they (or some user of them) rely
on it dragging in some unrelated header file, but I can't build all
these files, so we'll have to fix any build failures as they come up.

Signed-off-by: Matthew Wilcox

md: clean up irregularity with raid autodetect

2008-03-05T00:35:18+00:00

When a raid1 array is stopped, all components currently get added to the list
for auto-detection.  However we should really only add components that were
found by autodetection in the first place.  So add a flag to record that
information, and use it.

Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

md: reduce CPU wastage on idle md array with a write-intent bitmap

2008-03-05T00:35:17+00:00

On an md array with a write-intent bitmap, a thread wakes up every few seconds
and scans the bitmap looking for work to do.  If the array is idle, there will
be no work to do, but a lot of scanning is done to discover this.

So cache the fact that the bitmap is completely clean, and avoid scanning the
whole bitmap when the cache is known to be clean.

Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

md: change ITERATE_RDEV_GENERIC to rdev_for_each_list, and remove ITERATE_RDEV_PENDING.

2008-02-06T18:41:19+00:00

Finish ITERATE_ to for_each conversion.

Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds