<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/md, branch v2.6.35.3</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>md/raid10: fix deadlock with unaligned read during resync</title>
<updated>2010-08-13T20:30:49+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-08-07T11:17:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f986737b930114d41549f405a4efdeb13002db70'/>
<id>f986737b930114d41549f405a4efdeb13002db70</id>
<content type='text'>
commit 51e9ac77035a3dfcb6fc0a88a0d80b6f99b5edb1 upstream.

If the 'bio_split' path in raid10-read is used while
resync/recovery is happening it is possible to deadlock.
Fix this be elevating -&gt;nr_waiting for the duration of both
parts of the split request.

This fixes a bug that has been present since 2.6.22
but has only started manifesting recently for unknown reasons.
It is suitable for and -stable since then.

Reported-by:  Justin Bronder &lt;jsbronder@gentoo.org&gt;
Tested-by:  Justin Bronder &lt;jsbronder@gentoo.org&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 51e9ac77035a3dfcb6fc0a88a0d80b6f99b5edb1 upstream.

If the 'bio_split' path in raid10-read is used while
resync/recovery is happening it is possible to deadlock.
Fix this be elevating -&gt;nr_waiting for the duration of both
parts of the split request.

This fixes a bug that has been present since 2.6.22
but has only started manifesting recently for unknown reasons.
It is suitable for and -stable since then.

Reported-by:  Justin Bronder &lt;jsbronder@gentoo.org&gt;
Tested-by:  Justin Bronder &lt;jsbronder@gentoo.org&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md: fix another deadlock with removing sysfs attributes.</title>
<updated>2010-08-13T20:30:48+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-08-08T11:18:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8fa8b10c9dc5729d3af41babae879b48ab4ee095'/>
<id>8fa8b10c9dc5729d3af41babae879b48ab4ee095</id>
<content type='text'>
commit bb4f1e9d0e2ef93de8e36ca0f5f26625fcd70b7d upstream.

Move the deletion of sysfs attributes from reconfig_mutex to
open_mutex didn't really help as a process can try to take
open_mutex while holding reconfig_mutex, so the same deadlock can
happen, just requiring one more process to be involved in the chain.

I looks like I cannot easily use locking to wait for the sysfs
deletion to complete, so don't.

The only things that we cannot do while the deletions are still
pending is other things which can change the sysfs namespace: run,
takeover, stop.  Each of these can fail with -EBUSY.
So set a flag while doing a sysfs deletion, and fail run, takeover,
stop if that flag is set.

This is suitable for 2.6.35.x

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit bb4f1e9d0e2ef93de8e36ca0f5f26625fcd70b7d upstream.

Move the deletion of sysfs attributes from reconfig_mutex to
open_mutex didn't really help as a process can try to take
open_mutex while holding reconfig_mutex, so the same deadlock can
happen, just requiring one more process to be involved in the chain.

I looks like I cannot easily use locking to wait for the sysfs
deletion to complete, so don't.

The only things that we cannot do while the deletions are still
pending is other things which can change the sysfs namespace: run,
takeover, stop.  Each of these can fail with -EBUSY.
So set a flag while doing a sysfs deletion, and fail run, takeover,
stop if that flag is set.

This is suitable for 2.6.35.x

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md: move revalidate_disk() back outside open_mutex</title>
<updated>2010-08-13T20:30:48+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2010-08-07T01:01:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3b9f17f709e0b691940bfd9eda45b716932696f6'/>
<id>3b9f17f709e0b691940bfd9eda45b716932696f6</id>
<content type='text'>
commit 147e0b6a639ac581ca3bf627bedc3f4a6d3eca66 upstream.

Commit b821eaa5 "md: remove -&gt;changed and related code" moved
revalidate_disk() under open_mutex, and lockdep noticed.

[ INFO: possible circular locking dependency detected ]
2.6.32-mdadm-locking #1
-------------------------------------------------------
mdadm/3640 is trying to acquire lock:
 (&amp;bdev-&gt;bd_mutex){+.+.+.}, at: [&lt;ffffffff811acecb&gt;] revalidate_disk+0x5b/0x90

but task is already holding lock:
 (&amp;mddev-&gt;open_mutex){+.+...}, at: [&lt;ffffffffa055e07a&gt;] do_md_stop+0x4a/0x4d0 [md_mod]

which lock already depends on the new lock.

It is suitable for 2.6.35.x

Reported-by: Przemyslaw Czarnowski &lt;przemyslaw.hawrylewicz.czarnowski@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 147e0b6a639ac581ca3bf627bedc3f4a6d3eca66 upstream.

Commit b821eaa5 "md: remove -&gt;changed and related code" moved
revalidate_disk() under open_mutex, and lockdep noticed.

[ INFO: possible circular locking dependency detected ]
2.6.32-mdadm-locking #1
-------------------------------------------------------
mdadm/3640 is trying to acquire lock:
 (&amp;bdev-&gt;bd_mutex){+.+.+.}, at: [&lt;ffffffff811acecb&gt;] revalidate_disk+0x5b/0x90

but task is already holding lock:
 (&amp;mddev-&gt;open_mutex){+.+...}, at: [&lt;ffffffffa055e07a&gt;] do_md_stop+0x4a/0x4d0 [md_mod]

which lock already depends on the new lock.

It is suitable for 2.6.35.x

Reported-by: Przemyslaw Czarnowski &lt;przemyslaw.hawrylewicz.czarnowski@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: don't include 'spare' drives when reshaping to fewer devices.</title>
<updated>2010-06-24T03:36:04+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-17T07:48:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3424bf6a772cff606fc4bc24a3639c937afb547f'/>
<id>3424bf6a772cff606fc4bc24a3639c937afb547f</id>
<content type='text'>
There are few situations where it would make any sense to add a spare
when reducing the number of devices in an array, but it is
conceivable:  A 6 drive RAID6 with two missing devices could be
reshaped to a 5 drive RAID6, and a spare could become available
just in time for the reshape, but not early enough to have been
recovered first.  'freezing' recovery can make this easy to
do without any races.

However doing such a thing is a bad idea.  md will not record the
partially-recovered state of the 'spare' and when the reshape
finished it will think that the spare is still spare.
Easiest way to avoid this confusion is to simply disallow it.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are few situations where it would make any sense to add a spare
when reducing the number of devices in an array, but it is
conceivable:  A 6 drive RAID6 with two missing devices could be
reshaped to a 5 drive RAID6, and a spare could become available
just in time for the reshape, but not early enough to have been
recovered first.  'freezing' recovery can make this easy to
do without any races.

However doing such a thing is a bad idea.  md will not record the
partially-recovered state of the 'spare' and when the reshape
finished it will think that the spare is still spare.
Easiest way to avoid this confusion is to simply disallow it.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: add a missing 'continue' in a loop.</title>
<updated>2010-06-24T03:35:49+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-17T07:41:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2f115882499f3e5eca33d1df07b8876cc752a1ff'/>
<id>2f115882499f3e5eca33d1df07b8876cc752a1ff</id>
<content type='text'>
As the comment says, the tail of this loop only applies to devices
that are not fully in sync, so if In_sync was set, we should avoid
the rest of the loop.

This bug will hardly ever cause an actual problem.  The worst it
can do is allow an array to be assembled that is dirty and degraded,
which is not generally a good idea (without warning the sysadmin
first).

This will only happen if the array is RAID4 or a RAID5/6 in an
intermediate state during a reshape and so has one drive that is
all 'parity' - no data - while some other device has failed.

This is certainly possible, but not at all common.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As the comment says, the tail of this loop only applies to devices
that are not fully in sync, so if In_sync was set, we should avoid
the rest of the loop.

This bug will hardly ever cause an actual problem.  The worst it
can do is allow an array to be assembled that is dirty and degraded,
which is not generally a good idea (without warning the sysadmin
first).

This will only happen if the array is RAID4 or a RAID5/6 in an
intermediate state during a reshape and so has one drive that is
all 'parity' - no data - while some other device has failed.

This is certainly possible, but not at all common.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: Allow recovered part of partially recovered devices to be in-sync</title>
<updated>2010-06-24T03:35:39+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-17T07:25:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=415e72d034c50520ddb7ff79e7d1792c1306f0c9'/>
<id>415e72d034c50520ddb7ff79e7d1792c1306f0c9</id>
<content type='text'>
During a recovery of reshape the early part of some devices might be
in-sync while the later parts are not.
We we know we are looking at an early part it is good to treat that
part as in-sync for stripe calculations.

This is particularly important for a reshape which suffers device
failure.  Treating the data as in-sync can mean the difference between
data-safety and data-loss.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
During a recovery of reshape the early part of some devices might be
in-sync while the later parts are not.
We we know we are looking at an early part it is good to treat that
part as in-sync for stripe calculations.

This is particularly important for a reshape which suffers device
failure.  Treating the data as in-sync can mean the difference between
data-safety and data-loss.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: More careful check for "has array failed".</title>
<updated>2010-06-24T03:35:27+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-16T07:17:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=674806d62fb02a22eea948c9f1b5e58e0947b728'/>
<id>674806d62fb02a22eea948c9f1b5e58e0947b728</id>
<content type='text'>
When we are reshaping an array, the device failure combinations
that cause us to decide that the array as failed are more subtle.

In particular, any 'spare' will be fully in-sync in the section
of the array that has already been reshaped, thus failures that
affect only that section are less critical.

So encode this subtlety in a new function and call it as appropriate.

The case that showed this problem was a 4 drive RAID5 to 8 drive RAID6
conversion where the last two devices failed.
This resulted in:

  good good good good incomplete good good failed failed

while converting a 5-drive RAID6 to 8 drive RAID5
The incomplete device causes the whole array to look bad,
bad as it was actually good for the section that had been
converted to 8-drives, all the data was actually safe.

Reported-by: Terry Morris &lt;tbmorris@tbmorris.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we are reshaping an array, the device failure combinations
that cause us to decide that the array as failed are more subtle.

In particular, any 'spare' will be fully in-sync in the section
of the array that has already been reshaped, thus failures that
affect only that section are less critical.

So encode this subtlety in a new function and call it as appropriate.

The case that showed this problem was a 4 drive RAID5 to 8 drive RAID6
conversion where the last two devices failed.
This resulted in:

  good good good good incomplete good good failed failed

while converting a 5-drive RAID6 to 8 drive RAID5
The incomplete device causes the whole array to look bad,
bad as it was actually good for the section that had been
converted to 8-drives, all the data was actually safe.

Reported-by: Terry Morris &lt;tbmorris@tbmorris.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: Don't update -&gt;recovery_offset when reshaping an array to fewer devices.</title>
<updated>2010-06-24T03:35:18+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-16T07:01:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=70fffd0bfab1558a8c64c5e903dea1fb84cd9f6b'/>
<id>70fffd0bfab1558a8c64c5e903dea1fb84cd9f6b</id>
<content type='text'>
When an array is reshaped to have fewer devices, the reshape proceeds
from the end of the devices to the beginning.

If a device happens to be non-In_sync (which is possible but rare)
we would normally update the -&gt;recovery_offset as the reshape
progresses. However that would be wrong as the recover_offset records
that the early part of the device is in_sync, while in fact it would
only be the later part that is in_sync, and in any case the offset
number would be measured from the wrong end of the device.

Relatedly, if after a reshape a spare is discovered to not be
recoverred all the way to the end, not allow spare_active
to incorporate it in the array.

This becomes relevant in the following sample scenario:

A 4 drive RAID5 is converted to a 6 drive RAID6 in a combined
operation.
The RAID5-&gt;RAID6 conversion will cause a 5 drive to be included as a
spare, then the 5drive -&gt; 6drive reshape will effectively rebuild that
spare as it progresses.  The 6th drive is treated as in_sync the whole
time as there is never any case that we might consider reading from
it, but must not because there is no valid data.

If we interrupt this reshape part-way through and reverse it to return
to a 5-drive RAID6 (or event a 4-drive RAID5), we don't want to update
the recovery_offset - as that would be wrong - and we don't want to
include that spare as active in the 5-drive RAID6 when the reversed
reshape completed and it will be mostly out-of-sync still.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When an array is reshaped to have fewer devices, the reshape proceeds
from the end of the devices to the beginning.

If a device happens to be non-In_sync (which is possible but rare)
we would normally update the -&gt;recovery_offset as the reshape
progresses. However that would be wrong as the recover_offset records
that the early part of the device is in_sync, while in fact it would
only be the later part that is in_sync, and in any case the offset
number would be measured from the wrong end of the device.

Relatedly, if after a reshape a spare is discovered to not be
recoverred all the way to the end, not allow spare_active
to incorporate it in the array.

This becomes relevant in the following sample scenario:

A 4 drive RAID5 is converted to a 6 drive RAID6 in a combined
operation.
The RAID5-&gt;RAID6 conversion will cause a 5 drive to be included as a
spare, then the 5drive -&gt; 6drive reshape will effectively rebuild that
spare as it progresses.  The 6th drive is treated as in_sync the whole
time as there is never any case that we might consider reading from
it, but must not because there is no valid data.

If we interrupt this reshape part-way through and reverse it to return
to a 5-drive RAID6 (or event a 4-drive RAID5), we don't want to update
the recovery_offset - as that would be wrong - and we don't want to
include that spare as active in the 5-drive RAID6 when the reversed
reshape completed and it will be mostly out-of-sync still.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: avoid oops when number of devices is reduced then increased.</title>
<updated>2010-06-24T03:35:02+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-06-16T06:45:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e4e11e385d1e5516ac76c956d6c25e6c2fa1b8d0'/>
<id>e4e11e385d1e5516ac76c956d6c25e6c2fa1b8d0</id>
<content type='text'>
The entries in the stripe_cache maintained by raid5 are enlarged
when we increased the number of devices in the array, but not
shrunk when we reduce the number of devices.
So if entries are added after reducing the number of devices, we
much ensure to initialise the whole entry, not just the part that
is currently relevant.  Otherwise if we enlarge the array again,
we will reference uninitialised values.

As grow_buffers/shrink_buffer now want to use a count that is stored
explicity in the raid_conf, they should get it from there rather than
being passed it as a parameter.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The entries in the stripe_cache maintained by raid5 are enlarged
when we increased the number of devices in the array, but not
shrunk when we reduce the number of devices.
So if entries are added after reducing the number of devices, we
much ensure to initialise the whole entry, not just the part that
is currently relevant.  Otherwise if we enlarge the array again,
we will reference uninitialised values.

As grow_buffers/shrink_buffer now want to use a count that is stored
explicity in the raid_conf, they should get it from there rather than
being passed it as a parameter.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: enable raid4-&gt;raid0 takeover</title>
<updated>2010-06-24T03:34:57+00:00</updated>
<author>
<name>Maciej Trela</name>
<email>maciej.trela@intel.com</email>
</author>
<published>2010-06-16T10:56:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=049d6c1ef983c9ac43aa423dfd752071a5b0002d'/>
<id>049d6c1ef983c9ac43aa423dfd752071a5b0002d</id>
<content type='text'>
Only level 5 with layout=PARITY_N can be taken over to raid0 now.
Lets allow level 4 either.

Signed-off-by: Maciej Trela &lt;maciej.trela@intel.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Only level 5 with layout=PARITY_N can be taken over to raid0 now.
Lets allow level 4 either.

Signed-off-by: Maciej Trela &lt;maciej.trela@intel.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
