<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/md/raid5.c, branch v2.6.20.19</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Fix various bugs with aligned reads in RAID5.</title>
<updated>2007-03-09T18:50:19+00:00</updated>
<author>
<name>Neil Brown</name>
<email>neilb@suse.de</email>
</author>
<published>2007-02-06T23:26:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=789daef87cd3fe0ed77dfc8a26356982f14fa023'/>
<id>789daef87cd3fe0ed77dfc8a26356982f14fa023</id>
<content type='text'>
Fix various bugs with aligned reads in RAID5.

It is possible for raid5 to be sent a bio that is too big
for an underlying device.  So if it is a READ that we
pass stright down to a device, it will fail and confuse
RAID5.

So in 'chunk_aligned_read' we check that the bio fits within the
parameters for the target device and if it doesn't fit, fall back
on reading through the stripe cache and making lots of one-page
requests.

Note that this is the earliest time we can check against the device
because earlier we don't have a lock on the device, so it could change
underneath us.

Also, the code for handling a retry through the cache when a read
fails has not been tested and was badly broken.  This patch fixes that
code.

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix various bugs with aligned reads in RAID5.

It is possible for raid5 to be sent a bio that is too big
for an underlying device.  So if it is a READ that we
pass stright down to a device, it will fail and confuse
RAID5.

So in 'chunk_aligned_read' we check that the bio fits within the
parameters for the target device and if it doesn't fit, fall back
on reading through the stripe cache and making lots of one-page
requests.

Note that this is the earliest time we can check against the device
because earlier we don't have a lock on the device, so it could change
underneath us.

Also, the code for handling a retry through the cache when a read
fails has not been tested and was badly broken.  This patch fixes that
code.

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] md: remove unnecessary printk when raid5 gets an unaligned read.</title>
<updated>2007-01-26T21:51:00+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2007-01-26T08:57:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c20086de9319ac406f1e96ad459763c9f9965b18'/>
<id>c20086de9319ac406f1e96ad459763c9f9965b18</id>
<content type='text'>
raid5_mergeable_bvec tries to ensure that raid5 never sees a read request
that does not fit within just one chunk.  However as we must always accept
a single-page read, that is not always possible.

So when "in_chunk_boundary" fails, it might be unusual, but it is not a
problem and printing a message every time is a bad idea.

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
raid5_mergeable_bvec tries to ensure that raid5 never sees a read request
that does not fit within just one chunk.  However as we must always accept
a single-page read, that is not always possible.

So when "in_chunk_boundary" fails, it might be unusual, but it is not a
problem and printing a message every time is a bad idea.

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] md: fix potential memalloc deadlock in md</title>
<updated>2007-01-26T21:51:00+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2007-01-26T08:57:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2a2275d630b982e5f90206f9bc497f6695a3ec5d'/>
<id>2a2275d630b982e5f90206f9bc497f6695a3ec5d</id>
<content type='text'>
If a GFP_KERNEL allocation is attempted in md while the mddev_lock is held,
it is possible for a deadlock to eventuate.

This happens if the array was marked 'clean', and the memalloc triggers a
write-out to the md device.

For the writeout to succeed, the array must be marked 'dirty', and that
requires getting the mddev_lock.

So, before attempting a GFP_KERNEL allocation while holding the lock, make
sure the array is marked 'dirty' (unless it is currently read-only).

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a GFP_KERNEL allocation is attempted in md while the mddev_lock is held,
it is possible for a deadlock to eventuate.

This happens if the array was marked 'clean', and the memalloc triggers a
write-out to the md device.

For the writeout to succeed, the array must be marked 'dirty', and that
requires getting the mddev_lock.

So, before attempting a GFP_KERNEL allocation while holding the lock, make
sure the array is marked 'dirty' (unless it is currently read-only).

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] md: Don't assume that READ==0 and WRITE==1 - use the names explicitly</title>
<updated>2006-12-13T17:05:48+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2006-12-13T08:34:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=802ba064c49f655d20fed563f2a4924c8256ea10'/>
<id>802ba064c49f655d20fed563f2a4924c8256ea10</id>
<content type='text'>
Thanks Jens for alerting me to this.

Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: &lt;raziebe@gmail.com&gt;
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Thanks Jens for alerting me to this.

Cc: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Cc: &lt;raziebe@gmail.com&gt;
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] md: return a non-zero error to bi_end_io as appropriate in raid5</title>
<updated>2006-12-10T17:57:21+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2006-12-10T10:20:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c2b00852fbae4f8c45c2651530ded3bd01bde814'/>
<id>c2b00852fbae4f8c45c2651530ded3bd01bde814</id>
<content type='text'>
Currently raid5 depends on clearing the BIO_UPTODATE flag to signal an error
to higher levels.  While this should be sufficient, it is safer to explicitly
set the error code as well - less room for confusion.

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently raid5 depends on clearing the BIO_UPTODATE flag to signal an error
to higher levels.  While this should be sufficient, it is safer to explicitly
set the error code as well - less room for confusion.

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] md: remove some old ifdefed-out code from raid5.c</title>
<updated>2006-12-10T17:57:21+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2006-12-10T10:20:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b8c6b645563d641df91fdcfd84a9c73c91d75b61'/>
<id>b8c6b645563d641df91fdcfd84a9c73c91d75b61</id>
<content type='text'>
There are some vestiges of old code that was used for bypassing the stripe
cache on reads in raid5.c.  This was never updated after the change from
buffer_heads to bios, but was left as a reminder.

That functionality has nowe been implemented in a completely different way, so
the old code can go.

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are some vestiges of old code that was used for bypassing the stripe
cache on reads in raid5.c.  This was never updated after the change from
buffer_heads to bios, but was left as a reminder.

That functionality has nowe been implemented in a completely different way, so
the old code can go.

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] md: fix innocuous bug in raid6 stripe_to_pdidx</title>
<updated>2006-12-10T17:57:21+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2006-12-10T10:20:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b875e531fc82db592d6093594593d5cafde0a1cd'/>
<id>b875e531fc82db592d6093594593d5cafde0a1cd</id>
<content type='text'>
stripe_to_pdidx finds the index of the parity disk for a given stripe.  It
assumes raid5 in that it uses "disks-1" to determine the number of data disks.

This is incorrect for raid6 but fortunately the two usages cancel each other
out.  The only way that 'data_disks' affects the calculation of pd_idx in
raid5_compute_sector is when it is divided into the sector number.  But as
that sector number is calculated by multiplying in the wrong value of
'data_disks' the division produces the right value.

So it is innocuous but needs to be fixed.

Also change the calculation of raid_disks in compute_blocknr to make it
more obviously correct (it seems at first to always use disks-1 too).

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
stripe_to_pdidx finds the index of the parity disk for a given stripe.  It
assumes raid5 in that it uses "disks-1" to determine the number of data disks.

This is incorrect for raid6 but fortunately the two usages cancel each other
out.  The only way that 'data_disks' affects the calculation of pd_idx in
raid5_compute_sector is when it is divided into the sector number.  But as
that sector number is calculated by multiplying in the wrong value of
'data_disks' the division produces the right value.

So it is innocuous but needs to be fixed.

Also change the calculation of raid_disks in compute_blocknr to make it
more obviously correct (it seems at first to always use disks-1 too).

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] md: enable bypassing cache for reads</title>
<updated>2006-12-10T17:57:20+00:00</updated>
<author>
<name>Raz Ben-Jehuda(caro)</name>
<email>raziebe@gmail.com</email>
</author>
<published>2006-12-10T10:20:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5248861511d6aae4997a5aa7152824d87587b0b6'/>
<id>5248861511d6aae4997a5aa7152824d87587b0b6</id>
<content type='text'>
Call the chunk_aligned_read where appropriate.

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Call the chunk_aligned_read where appropriate.

Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] md: allow reads that have bypassed the cache to be retried on failure</title>
<updated>2006-12-10T17:57:20+00:00</updated>
<author>
<name>Raz Ben-Jehuda(caro)</name>
<email>raziebe@gmail.com</email>
</author>
<published>2006-12-10T10:20:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=46031f9a38a9773021f1872abc713d62467ac22e'/>
<id>46031f9a38a9773021f1872abc713d62467ac22e</id>
<content type='text'>
If a bypass-the-cache read fails, we simply try again through the cache.  If
it fails again it will trigger normal recovery precedures.

update 1:

From: NeilBrown &lt;neilb@suse.de&gt;

1/
  chunk_aligned_read and retry_aligned_read assume that
      data_disks == raid_disks - 1
  which is not true for raid6.
  So when an aligned read request bypasses the cache, we can get the wrong data.

2/ The cloned bio is being used-after-free in raid5_align_endio
   (to test BIO_UPTODATE).

3/ We forgot to add rdev-&gt;data_offset when submitting
   a bio for aligned-read

4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
   so we need to invalidate the segment counts.

5/ We don't de-reference the rdev when the read completes.
   This means we need to record the rdev to so it is still
   available in the end_io routine.  Fortunately
   bi_next in the original bio is unused at this point so
   we can stuff it in there.

6/ We leak a cloned bio if the target rdev is not usable.

From: NeilBrown &lt;neilb@suse.de&gt;

update 2:

1/ When aligned requests fail (read error) they need to be retried
   via the normal method (stripe cache).  As we cannot be sure that
   we can process a single read in one go (we may not be able to
   allocate all the stripes needed) we store a bio-being-retried
   and a list of bioes-that-still-need-to-be-retried.
   When find a bio that needs to be retried, we should add it to
   the list, not to single-bio...

2/ We were never incrementing 'scnt' when resubmitting failed
   aligned requests.

[akpm@osdl.org: build fix]
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a bypass-the-cache read fails, we simply try again through the cache.  If
it fails again it will trigger normal recovery precedures.

update 1:

From: NeilBrown &lt;neilb@suse.de&gt;

1/
  chunk_aligned_read and retry_aligned_read assume that
      data_disks == raid_disks - 1
  which is not true for raid6.
  So when an aligned read request bypasses the cache, we can get the wrong data.

2/ The cloned bio is being used-after-free in raid5_align_endio
   (to test BIO_UPTODATE).

3/ We forgot to add rdev-&gt;data_offset when submitting
   a bio for aligned-read

4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
   so we need to invalidate the segment counts.

5/ We don't de-reference the rdev when the read completes.
   This means we need to record the rdev to so it is still
   available in the end_io routine.  Fortunately
   bi_next in the original bio is unused at this point so
   we can stuff it in there.

6/ We leak a cloned bio if the target rdev is not usable.

From: NeilBrown &lt;neilb@suse.de&gt;

update 2:

1/ When aligned requests fail (read error) they need to be retried
   via the normal method (stripe cache).  As we cannot be sure that
   we can process a single read in one go (we may not be able to
   allocate all the stripes needed) we store a bio-being-retried
   and a list of bioes-that-still-need-to-be-retried.
   When find a bio that needs to be retried, we should add it to
   the list, not to single-bio...

2/ We were never incrementing 'scnt' when resubmitting failed
   aligned requests.

[akpm@osdl.org: build fix]
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>[PATCH] md: handle bypassing the read cache (assuming nothing fails)</title>
<updated>2006-12-10T17:57:20+00:00</updated>
<author>
<name>Raz Ben-Jehuda(caro)</name>
<email>raziebe@gmail.com</email>
</author>
<published>2006-12-10T10:20:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f679623f50545bc0577caf2d0f8675b61162f059'/>
<id>f679623f50545bc0577caf2d0f8675b61162f059</id>
<content type='text'>
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Neil Brown &lt;neilb@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
