linux-toradex.git/drivers/md, branch v2.6.34.4

md/raid1: delay reads that could overtake behind-writes.

2010-08-13T20:27:37+00:00

commit e555190d82c0f58e825e3cbd9e6ebe2e7ac713bd upstream.

When a raid1 array is configured to support write-behind
on some devices, it normally only reads from other devices.
If all devices are write-behind (because the rest have failed)
it is possible for a read request to be serviced before a
behind-write request, which would appear as data corruption.

So when forced to read from a WriteMostly device, wait for any
write-behind to complete, and don't start any more behind-writes.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/raid10: fix deadlock with unaligned read during resync

2010-08-13T20:27:20+00:00

commit 51e9ac77035a3dfcb6fc0a88a0d80b6f99b5edb1 upstream.

If the 'bio_split' path in raid10-read is used while
resync/recovery is happening it is possible to deadlock.
Fix this be elevating ->nr_waiting for the duration of both
parts of the split request.

This fixes a bug that has been present since 2.6.22
but has only started manifesting recently for unknown reasons.
It is suitable for and -stable since then.

Reported-by:  Justin Bronder 
Tested-by:  Justin Bronder 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: raid10: Fix null pointer dereference in fix_read_error()

2010-08-02T17:29:42+00:00

commit 0544a21db02c1d8883158fd6f323364f830a120a upstream.

Such NULL pointer dereference can occur when the driver was fixing the
read errors/bad blocks and the disk was physically removed
causing a system crash. This patch check if the
rcu_dereference() returns valid rdev before accessing it in fix_read_error().

Signed-off-by: Prasanna S. Panchamukhi 
Signed-off-by: Rob Becker 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: manage redundancy group in sysfs when changing level.

2010-07-05T18:22:30+00:00

commit a64c876fd357906a1f7193723866562ad290654c upstream.

Some levels expect the 'redundancy group' to be present,
others don't.
So when we change level of an array we might need to
add or remove this group.

This requires fixing up the current practice of overloading ->private
to indicate (when ->pers == NULL) that something needs to be removed.
So create a new ->to_remove to fill that role.

When changing levels, we may need to add or remove attributes.  When
changing RAID5 -> RAID6, we both add and remove the same thing.  It is
important to catch this and optimise it out as the removal is delayed
until a lock is released, so trying to add immediately would cause
problems.


Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: set mddev readonly flag on blkdev BLKROSET ioctl

2010-07-05T18:22:24+00:00

commit e2218350465e7e0931676b4849b594c978437bce upstream.

When the user sets the block device to readwrite then the mddev should
follow suit.  Otherwise, the BUG_ON in md_write_start() will be set to
trigger.

The reverse direction, setting mddev->ro to match a set readonly
request, can be ignored because the blkdev level readonly flag precludes
the need to have mddev->ro set correctly.  Nevermind the fact that
setting mddev->ro to 1 may fail if the array is in use.

Signed-off-by: Dan Williams 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: remove unneeded sysfs files more promptly

2010-07-05T18:22:23+00:00

commit b6eb127d274385d81ce8dd45c98190f097bce1b4 upstream.

When an array is stopped we need to remove some
sysfs files which are dependent on the type of array.

We need to delay that deletion as deleting them while holding
reconfig_mutex can lead to deadlocks.

We currently delay them until the array is completely destroyed.
However it is possible to deactivate and then reactivate the array.
It is also possible to need to remove sysfs files when changing level,
which can potentially happen several times before an array is
destroyed.

So we need to delete these files more promptly: as soon as
reconfig_mutex is dropped.

We need to ensure this happens before do_md_run can restart the array,
so we use open_mutex for some extra locking.  This is not deadlock
prone.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/linear: avoid possible oops and array stop

2010-07-05T18:22:23+00:00

commit ef2f80ff7325b2c1888ff02ead28957b5840bf51 upstream.

Since commit ef286f6fa673cd7fb367e1b145069d8dbfcc6081
it has been important that each personality clears
->private in the ->stop() function, or sets it to a
attribute group to be removed.
linear.c doesn't.  This can sometimes lead to an oops,
though it doesn't always.

Suitable for 2.6.33-stable and 2.6.34.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: Fix read balancing in RAID1 and RAID10 on drives > 2TB

2010-07-05T18:22:22+00:00

commit af3a2cd6b8a479345786e7fe5e199ad2f6240e56 upstream.

read_balance uses a "unsigned long" for a sector number which
will get truncated beyond 2TB.
This will cause read-balancing to be non-optimal, and can cause
data to be read from the 'wrong' branch during a resync.  This has a
very small chance of returning wrong data.

Reported-by: Jordan Russell 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/raid1: fix counting of write targets.

2010-07-05T18:22:21+00:00

commit 964147d5c86d63be79b442c30f3783d49860c078 upstream.

There is a very small race window when writing to a
RAID1 such that if a device is marked faulty at exactly the wrong
time, the write-in-progress will not be sent to the device,
but the bitmap (if present) will be updated to say that
the write was sent.

Then if the device turned out to still be usable as was re-added
to the array, the bitmap-based-resync would skip resyncing that
block, possibly leading to corruption.  This would only be a problem
if no further writes were issued to that area of the device (i.e.
that bitmap chunk).

Suitable for any pending -stable kernel.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: restore ability of spare drives to spin down.

2010-05-07T11:10:57+00:00

Some time ago we stopped the clean/active metadata updates
from being written to a 'spare' device in most cases so that
it could spin down and say spun down.  Device failure/removal
etc are still recorded on spares.

However commit 51d5668cb2e3fd1827a55 broke this 50% of the time,
depending on whether the event count is even or odd.
The change log entry said:

   This means that the alignment between 'odd/even' and
    'clean/dirty' might take a little longer to attain,

how ever the code makes no attempt to create that alignment, so it
could take arbitrarily long.

So when we find that clean/dirty is not aligned with odd/even,
force a second metadata-update immediately.  There are already cases
where a second metadata-update is needed immediately (e.g. when a
device fails during the metadata update).  We just piggy-back on that.

Reported-by: Joe Bryant 
Signed-off-by: NeilBrown 
Cc: stable@kernel.org