linux-toradex.git/drivers/md/raid5.c, branch v2.6.20.19

Fix various bugs with aligned reads in RAID5.

2007-03-09T18:50:19+00:00

It is possible for raid5 to be sent a bio that is too big
for an underlying device.  So if it is a READ that we
pass straight down to a device, it will fail and confuse
RAID5.

So in 'chunk_aligned_read' we check that the bio fits within the
parameters for the target device and if it doesn't fit, fall back
on reading through the stripe cache and making lots of one-page
requests.

Note that this is the earliest time we can check against the device
because earlier we don't have a lock on the device, so it could change
underneath us.
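
As an illustration only, the check-then-fall-back decision described above can be modelled in user space as follows.  The structures, field names and limits here are hypothetical stand-ins, not the kernel's request queue or bio definitions, and the real check may consider more than these two limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the limits advertised by a member device. */
struct dev_limits {
    unsigned int max_sectors;   /* largest request, in 512-byte sectors */
    unsigned int max_segments;  /* most scatter/gather segments allowed */
};

/* Hypothetical stand-in for a read request. */
struct read_req {
    unsigned int size_bytes;
    unsigned int segments;
};

enum read_path { READ_DIRECT, READ_VIA_STRIPE_CACHE };

/* Return true if the request is within both limits of the device. */
static bool req_fits_device(const struct read_req *req,
                            const struct dev_limits *lim)
{
    assert(req != NULL && lim != NULL);
    if ((req->size_bytes >> 9) > lim->max_sectors)
        return false;
    if (req->segments > lim->max_segments)
        return false;
    return true;
}

/* Pass the read straight down only when it fits; otherwise fall back. */
static enum read_path choose_read_path(const struct read_req *req,
                                       const struct dev_limits *lim)
{
    return req_fits_device(req, lim) ? READ_DIRECT : READ_VIA_STRIPE_CACHE;
}
```

In this model a 64KiB read with 16 segments goes straight to a device limited to 255 sectors and 32 segments, while a 256KiB read takes the stripe-cache path.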

Also, the code for handling a retry through the cache when a read
fails has not been tested and was badly broken.  This patch fixes that
code.

Signed-off-by: Neil Brown 
Signed-off-by: Greg Kroah-Hartman

[PATCH] md: remove unnecessary printk when raid5 gets an unaligned read.

2007-01-26T21:51:00+00:00

raid5_mergeable_bvec tries to ensure that raid5 never sees a read request
that does not fit within just one chunk.  However as we must always accept
a single-page read, that is not always possible.

So when "in_chunk_boundary" fails, it might be unusual, but it is not a
problem and printing a message every time is a bad idea.
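
A rough user-space sketch of the kind of test involved, assuming the chunk size is a power of two and everything is counted in 512-byte sectors.  The function name and signature are made up for this example and are not the kernel's in_chunk_boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a read of nr_sectors starting at start_sector lies
 * entirely within one chunk.  Assumes chunk_sectors is a power of two.
 */
static bool fits_in_one_chunk(uint64_t start_sector,
                              unsigned int nr_sectors,
                              unsigned int chunk_sectors)
{
    assert(chunk_sectors != 0 &&
           (chunk_sectors & (chunk_sectors - 1)) == 0);

    /* Offset of the first sector within its chunk. */
    uint64_t offset = start_sector & (chunk_sectors - 1);

    return offset + nr_sectors <= chunk_sectors;
}
```

With 128-sector chunks, an 8-sector (one page) read starting at sector 124 crosses into the next chunk.  That is the case which cannot always be prevented and which does not deserve a message.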

Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] md: fix potential memalloc deadlock in md

2007-01-26T21:51:00+00:00

If a GFP_KERNEL allocation is attempted in md while the mddev_lock is held,
it is possible for a deadlock to eventuate.

This happens if the array was marked 'clean', and the memalloc triggers a
write-out to the md device.

For the writeout to succeed, the array must be marked 'dirty', and that
requires getting the mddev_lock.

So, before attempting a GFP_KERNEL allocation while holding the lock, make
sure the array is marked 'dirty' (unless it is currently read-only).

Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] md: Don't assume that READ==0 and WRITE==1 - use the names explicitly

2006-12-13T17:05:48+00:00

Thanks Jens for alerting me to this.

Cc: Jens Axboe 
Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] md: return a non-zero error to bi_end_io as appropriate in raid5

2006-12-10T17:57:21+00:00

Currently raid5 depends on clearing the BIO_UPTODATE flag to signal an error
to higher levels.  While this should be sufficient, it is safer to explicitly
set the error code as well - less room for confusion.
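
The idea, sketched in user space with a made-up bio structure and a simplified callback signature (not the kernel's): the completion path clears the up-to-date flag on failure and also hands a non-zero error to the callback, so a caller that looks at either one sees the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bio;
typedef void (*end_io_fn)(struct model_bio *bio, int error);

/* Hypothetical, much reduced stand-in for a bio. */
struct model_bio {
    bool uptodate;      /* models the BIO_UPTODATE flag */
    end_io_fn end_io;   /* models bi_end_io */
    int seen_error;     /* what the example callback was told */
};

/* Example callback: remember the error code that was passed in. */
static void record_error(struct model_bio *bio, int error)
{
    bio->seen_error = error;
}

/* Complete a request, signalling failure through both channels. */
static void complete_bio(struct model_bio *bio, bool io_ok)
{
    assert(bio != NULL && bio->end_io != NULL);
    if (!io_ok)
        bio->uptodate = false;
    bio->end_io(bio, bio->uptodate ? 0 : -EIO);
}
```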

Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] md: remove some old ifdefed-out code from raid5.c

2006-12-10T17:57:21+00:00

There are some vestiges of old code that was used for bypassing the stripe
cache on reads in raid5.c.  This was never updated after the change from
buffer_heads to bios, but was left as a reminder.

That functionality has now been implemented in a completely different way, so
the old code can go.

Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] md: fix innocuous bug in raid6 stripe_to_pdidx

2006-12-10T17:57:21+00:00

stripe_to_pdidx finds the index of the parity disk for a given stripe.  It
assumes raid5 in that it uses "disks-1" to determine the number of data disks.

This is incorrect for raid6 but fortunately the two usages cancel each other
out.  The only way that 'data_disks' affects the calculation of pd_idx in
raid5_compute_sector is when it is divided into the sector number.  But as
that sector number is calculated by multiplying in the wrong value of
'data_disks' the division produces the right value.

So it is innocuous but needs to be fixed.

Also change the calculation of raid_disks in compute_blocknr to make it
more obviously correct (it seems at first to always use disks-1 too).
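
The cancellation can be checked with a small user-space model.  This is a simplification that keeps only the multiply and the divide described above; the helper names are invented for the example and the rest of raid5_compute_sector is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Build a sector number for a stripe, as the caller does. */
static uint64_t sector_for_stripe(uint64_t stripe,
                                  unsigned int chunk_offset,
                                  unsigned int sectors_per_chunk,
                                  unsigned int data_disks)
{
    assert(sectors_per_chunk != 0 && data_disks != 0);
    assert(chunk_offset < sectors_per_chunk);
    return stripe * data_disks * sectors_per_chunk + chunk_offset;
}

/* Recover the stripe number from a sector, as the callee does. */
static uint64_t stripe_of_sector(uint64_t sector,
                                 unsigned int sectors_per_chunk,
                                 unsigned int data_disks)
{
    assert(sectors_per_chunk != 0 && data_disks != 0);
    uint64_t chunk_number = sector / sectors_per_chunk;
    return chunk_number / data_disks;
}

/* Multiply and divide by the same data_disks value, right or wrong. */
static uint64_t round_trip(uint64_t stripe, unsigned int chunk_offset,
                           unsigned int sectors_per_chunk,
                           unsigned int data_disks)
{
    uint64_t sector = sector_for_stripe(stripe, chunk_offset,
                                        sectors_per_chunk, data_disks);
    return stripe_of_sector(sector, sectors_per_chunk, data_disks);
}
```

For a 6-disk raid6 the correct data_disks is 4 and "disks-1" gives 5.  Stripe 1000 with chunk offset 17 and 128-sector chunks becomes sector 640017 with the wrong value and 512017 with the right one, and both divide back to stripe 1000.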

Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] md: enable bypassing cache for reads

2006-12-10T17:57:20+00:00

Call the chunk_aligned_read where appropriate.

Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] md: allow reads that have bypassed the cache to be retried on failure

2006-12-10T17:57:20+00:00

If a bypass-the-cache read fails, we simply try again through the cache.  If
it fails again it will trigger normal recovery procedures.

update 1:

From: NeilBrown 

1/
  chunk_aligned_read and retry_aligned_read assume that
      data_disks == raid_disks - 1
  which is not true for raid6.
  So when an aligned read request bypasses the cache, we can get the wrong data.

2/ The cloned bio is being used-after-free in raid5_align_endio
   (to test BIO_UPTODATE).

3/ We forgot to add rdev->data_offset when submitting
   a bio for aligned-read

4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
   so we need to invalidate the segment counts.

5/ We don't de-reference the rdev when the read completes.
   This means we need to record the rdev so it is still
   available in the end_io routine.  Fortunately
   bi_next in the original bio is unused at this point so
   we can stuff it in there.

6/ We leak a cloned bio if the target rdev is not usable.

From: NeilBrown 

update 2:

1/ When aligned requests fail (read error) they need to be retried
   via the normal method (stripe cache).  As we cannot be sure that
   we can process a single read in one go (we may not be able to
   allocate all the stripes needed) we store a bio-being-retried
   and a list of bios-that-still-need-to-be-retried.
   When we find a bio that needs to be retried, we should add it to
   the list, not to single-bio...

2/ We were never incrementing 'scnt' when resubmitting failed
   aligned requests.
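
A user-space sketch of the bookkeeping described in update 2, item 1/.  The structure and function names are hypothetical and locking is omitted: a failed read is always pushed onto the pending list, and the single slot is reserved for a bio whose retry was started but could not be finished in one go.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio that needs to be retried. */
struct retry_bio {
    int id;
    struct retry_bio *next;
};

/* Hypothetical stand-in for the per-array retry state. */
struct retry_state {
    struct retry_bio *in_progress;  /* bio-being-retried (partly done) */
    struct retry_bio *pending;      /* bios that still need a retry */
};

/* A newly failed aligned read goes on the list, not in the single slot. */
static void queue_for_retry(struct retry_state *st, struct retry_bio *bio)
{
    assert(st != NULL && bio != NULL);
    bio->next = st->pending;
    st->pending = bio;
}

/* Resume the partly processed bio first, then take one from the list. */
static struct retry_bio *next_retry(struct retry_state *st)
{
    assert(st != NULL);
    struct retry_bio *bio = st->in_progress;

    if (bio) {
        st->in_progress = NULL;
        return bio;
    }
    bio = st->pending;
    if (bio) {
        st->pending = bio->next;
        bio->next = NULL;
    }
    return bio;
}
```

Putting a newly failed bio into the single slot instead would overwrite whatever retry was already in progress there.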

[akpm@osdl.org: build fix]
Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

[PATCH] md: handle bypassing the read cache (assuming nothing fails)

2006-12-10T17:57:20+00:00

Signed-off-by: Neil Brown 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds