linux-toradex.git/drivers/md, branch v2.6.35.3

md/raid10: fix deadlock with unaligned read during resync

2010-08-13T20:30:49+00:00

commit 51e9ac77035a3dfcb6fc0a88a0d80b6f99b5edb1 upstream.

If the 'bio_split' path in raid10-read is used while
resync/recovery is happening it is possible to deadlock.
Fix this be elevating ->nr_waiting for the duration of both
parts of the split request.

This fixes a bug that has been present since 2.6.22
but has only started manifesting recently for unknown reasons.
It is suitable for and -stable since then.

Reported-by:  Justin Bronder 
Tested-by:  Justin Bronder 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: fix another deadlock with removing sysfs attributes.

2010-08-13T20:30:48+00:00

commit bb4f1e9d0e2ef93de8e36ca0f5f26625fcd70b7d upstream.

Move the deletion of sysfs attributes from reconfig_mutex to
open_mutex didn't really help as a process can try to take
open_mutex while holding reconfig_mutex, so the same deadlock can
happen, just requiring one more process to be involved in the chain.

I looks like I cannot easily use locking to wait for the sysfs
deletion to complete, so don't.

The only things that we cannot do while the deletions are still
pending is other things which can change the sysfs namespace: run,
takeover, stop.  Each of these can fail with -EBUSY.
So set a flag while doing a sysfs deletion, and fail run, takeover,
stop if that flag is set.

This is suitable for 2.6.35.x

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: move revalidate_disk() back outside open_mutex

2010-08-13T20:30:48+00:00

commit 147e0b6a639ac581ca3bf627bedc3f4a6d3eca66 upstream.

Commit b821eaa5 "md: remove ->changed and related code" moved
revalidate_disk() under open_mutex, and lockdep noticed.

[ INFO: possible circular locking dependency detected ]
2.6.32-mdadm-locking #1
-------------------------------------------------------
mdadm/3640 is trying to acquire lock:
 (&bdev->bd_mutex){+.+.+.}, at: [] revalidate_disk+0x5b/0x90

but task is already holding lock:
 (&mddev->open_mutex){+.+...}, at: [] do_md_stop+0x4a/0x4d0 [md_mod]

which lock already depends on the new lock.

It is suitable for 2.6.35.x

Reported-by: Przemyslaw Czarnowski 
Signed-off-by: Dan Williams 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/raid5: don't include 'spare' drives when reshaping to fewer devices.

2010-06-24T03:36:04+00:00

There are few situations where it would make any sense to add a spare
when reducing the number of devices in an array, but it is
conceivable:  A 6 drive RAID6 with two missing devices could be
reshaped to a 5 drive RAID6, and a spare could become available
just in time for the reshape, but not early enough to have been
recovered first.  'freezing' recovery can make this easy to
do without any races.

However doing such a thing is a bad idea.  md will not record the
partially-recovered state of the 'spare' and when the reshape
finished it will think that the spare is still spare.
Easiest way to avoid this confusion is to simply disallow it.

Signed-off-by: NeilBrown

md/raid5: add a missing 'continue' in a loop.

2010-06-24T03:35:49+00:00

As the comment says, the tail of this loop only applies to devices
that are not fully in sync, so if In_sync was set, we should avoid
the rest of the loop.

This bug will hardly ever cause an actual problem.  The worst it
can do is allow an array to be assembled that is dirty and degraded,
which is not generally a good idea (without warning the sysadmin
first).

This will only happen if the array is RAID4 or a RAID5/6 in an
intermediate state during a reshape and so has one drive that is
all 'parity' - no data - while some other device has failed.

This is certainly possible, but not at all common.

Signed-off-by: NeilBrown

md/raid5: Allow recovered part of partially recovered devices to be in-sync

2010-06-24T03:35:39+00:00

During a recovery of reshape the early part of some devices might be
in-sync while the later parts are not.
We we know we are looking at an early part it is good to treat that
part as in-sync for stripe calculations.

This is particularly important for a reshape which suffers device
failure.  Treating the data as in-sync can mean the difference between
data-safety and data-loss.

Signed-off-by: NeilBrown

md/raid5: More careful check for "has array failed".

2010-06-24T03:35:27+00:00

When we are reshaping an array, the device failure combinations
that cause us to decide that the array as failed are more subtle.

In particular, any 'spare' will be fully in-sync in the section
of the array that has already been reshaped, thus failures that
affect only that section are less critical.

So encode this subtlety in a new function and call it as appropriate.

The case that showed this problem was a 4 drive RAID5 to 8 drive RAID6
conversion where the last two devices failed.
This resulted in:

  good good good good incomplete good good failed failed

while converting a 5-drive RAID6 to 8 drive RAID5
The incomplete device causes the whole array to look bad,
bad as it was actually good for the section that had been
converted to 8-drives, all the data was actually safe.

Reported-by: Terry Morris 
Signed-off-by: NeilBrown

md: Don't update ->recovery_offset when reshaping an array to fewer devices.

2010-06-24T03:35:18+00:00

When an array is reshaped to have fewer devices, the reshape proceeds
from the end of the devices to the beginning.

If a device happens to be non-In_sync (which is possible but rare)
we would normally update the ->recovery_offset as the reshape
progresses. However that would be wrong as the recover_offset records
that the early part of the device is in_sync, while in fact it would
only be the later part that is in_sync, and in any case the offset
number would be measured from the wrong end of the device.

Relatedly, if after a reshape a spare is discovered to not be
recoverred all the way to the end, not allow spare_active
to incorporate it in the array.

This becomes relevant in the following sample scenario:

A 4 drive RAID5 is converted to a 6 drive RAID6 in a combined
operation.
The RAID5->RAID6 conversion will cause a 5 drive to be included as a
spare, then the 5drive -> 6drive reshape will effectively rebuild that
spare as it progresses.  The 6th drive is treated as in_sync the whole
time as there is never any case that we might consider reading from
it, but must not because there is no valid data.

If we interrupt this reshape part-way through and reverse it to return
to a 5-drive RAID6 (or event a 4-drive RAID5), we don't want to update
the recovery_offset - as that would be wrong - and we don't want to
include that spare as active in the 5-drive RAID6 when the reversed
reshape completed and it will be mostly out-of-sync still.

Signed-off-by: NeilBrown

md/raid5: avoid oops when number of devices is reduced then increased.

2010-06-24T03:35:02+00:00

The entries in the stripe_cache maintained by raid5 are enlarged
when we increased the number of devices in the array, but not
shrunk when we reduce the number of devices.
So if entries are added after reducing the number of devices, we
much ensure to initialise the whole entry, not just the part that
is currently relevant.  Otherwise if we enlarge the array again,
we will reference uninitialised values.

As grow_buffers/shrink_buffer now want to use a count that is stored
explicity in the raid_conf, they should get it from there rather than
being passed it as a parameter.

Signed-off-by: NeilBrown

md: enable raid4->raid0 takeover

2010-06-24T03:34:57+00:00

Only level 5 with layout=PARITY_N can be taken over to raid0 now.
Lets allow level 4 either.

Signed-off-by: Maciej Trela 
Signed-off-by: NeilBrown