linux-toradex.git/drivers/md/md.c, branch v4.9.87

md: only allow remove_and_add_spares when no sync_thread running.

2018-03-11T15:21:31+00:00

commit 39772f0a7be3b3dc26c74ea13fe7847fd1522c8b upstream.

The locking protocols in md assume that a device will
never be removed from an array during resync/recovery/reshape.
When that isn't happening, rcu or reconfig_mutex is needed
to protect an rdev pointer while taking a refcount.  When
it is happening, that protection isn't needed.

Unfortunately there are cases where remove_and_add_spares() is
called when recovery might be happening: in state_store(),
slot_store() and hot_remove_disk().
In each case, this is just an optimization, to try to expedite
removal from the personality so the device can be removed from
the array.  If resync etc is happening, we just have to wait
for md_check_recovery() to find a suitable time to call
remove_and_add_spares().

This optimization is not essential so it doesn't
matter if it fails.
So change remove_and_add_spares() to abort early if
resync/recovery/reshape is happening, unless it is called
from md_check_recovery() as part of a newly started recovery.
The parameter "this" is only NULL when called from
md_check_recovery() so when it is NULL, there is no need to abort.
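
The early abort described above can be sketched in userspace C. This is
only a model under stated assumptions: struct rdev_model, struct
mddev_model and remove_and_add_spares_model() are hypothetical stand-ins,
and the recovery_running field stands in for whatever test the real code
uses to detect a running sync thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct rdev_model {
	bool faulty;    /* device has failed and is a removal candidate */
	bool in_array;  /* device is still attached to the personality */
};

struct mddev_model {
	bool recovery_running;  /* assumption: models "sync thread is running" */
	struct rdev_model *devs;
	size_t ndevs;
};

/*
 * Model of remove_and_add_spares(). 'this' selects a single device, or is
 * NULL when the caller is md_check_recovery(). Returns the number of
 * devices detached from the personality.
 */
int remove_and_add_spares_model(struct mddev_model *mddev,
				struct rdev_model *this)
{
	int removed = 0;

	/* Abort early: devices must not be removed while resync may use them. */
	if (this && mddev->recovery_running)
		return 0;

	for (size_t i = 0; i < mddev->ndevs; i++) {
		struct rdev_model *rdev = &mddev->devs[i];

		if ((this == NULL || rdev == this) &&
		    rdev->in_array && rdev->faulty) {
			rdev->in_array = false;
			removed++;
		}
	}
	return removed;
}
```

With recovery_running set, a call that names a specific device returns 0
and leaves the device attached, while a call with this == NULL still
proceeds.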

As this can result in a NULL dereference, the fix is suitable
for -stable.

cc: yuyufen 
Cc: Tomasz Majchrzak 
Fixes: 8430e7e0af9a ("md: disconnect device from personality before trying to remove it.")
Cc: stable@vger.kernel.org (v4.8+)
Signed-off-by: NeilBrown 
Signed-off-by: Shaohua Li 
Signed-off-by: Greg Kroah-Hartman

md: fix super_offset endianness in super_1_rdev_size_change

2017-07-15T10:16:15+00:00

commit 3fb632e40d7667d8bedfabc28850ac06d5493f54 upstream.

The sb->super_offset should be little-endian, but the rdev->sb_start is in
host byte order, so fix this by adding cpu_to_le64.
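
For illustration, cpu_to_le64() converts a host-order value into a value
whose in-memory bytes are little-endian. A minimal userspace sketch follows;
cpu_to_le64_model(), le64_to_cpu_model(), struct sb_model and
set_super_offset() are hypothetical names, not the kernel helpers
themselves.

```c
#include <stdint.h>

/* Stand-in for cpu_to_le64(): the result's in-memory bytes are little-endian. */
uint64_t cpu_to_le64_model(uint64_t host)
{
	uint64_t out = 0;
	unsigned char *p = (unsigned char *)&out;

	for (int i = 0; i < 8; i++)
		p[i] = (unsigned char)(host >> (8 * i));
	return out;
}

/* Stand-in for le64_to_cpu(): decode little-endian bytes into a host value. */
uint64_t le64_to_cpu_model(uint64_t le)
{
	const unsigned char *p = (const unsigned char *)&le;
	uint64_t host = 0;

	for (int i = 0; i < 8; i++)
		host |= (uint64_t)p[i] << (8 * i);
	return host;
}

/* Hypothetical cut-down superblock holding only the field of interest. */
struct sb_model {
	uint64_t super_offset;  /* stored in little-endian byte order */
};

/* Store a host-order sector number into the little-endian field. */
void set_super_offset(struct sb_model *sb, uint64_t sb_start)
{
	sb->super_offset = cpu_to_le64_model(sb_start);
}
```

On a little-endian host both helpers are the identity, which is why a
missing conversion like this one only misbehaves on big-endian machines.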

Signed-off-by: Jason Yan 
Signed-off-by: Shaohua Li 
Signed-off-by: Greg Kroah-Hartman

md: fix incorrect use of lexx_to_cpu in does_sb_need_changing

2017-07-15T10:16:15+00:00

commit 1345921393ba23b60d3fcf15933e699232ad25ae upstream.

The sb->layout is of type __le32, so we should use le32_to_cpu.

Signed-off-by: Jason Yan 
Signed-off-by: Shaohua Li 
Signed-off-by: Greg Kroah-Hartman

md: MD_CLOSING needs to be cleared after called md_set_readonly or do_md_stop

2017-05-25T13:44:33+00:00

commit 065e519e71b2c1f41936cce75b46b5ab34adb588 upstream.

If md_set_readonly() is called and the MD_CLOSING bit is set, the mddev
cannot be opened any more because the MD_CLOSING bit was not cleared. Thus
it needs to be cleared in md_ioctl() after any call to md_set_readonly()
or do_md_stop().
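
A minimal userspace sketch of the flag handling, assuming a single flag
bit and an open routine that refuses to open while that bit is set. All
names (struct closing_model, MD_CLOSING_BIT, md_open_sketch(),
set_readonly_ioctl_sketch()) and the -ENODEV return value are assumptions
made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

#define MD_CLOSING_BIT 1u  /* hypothetical bit value for the MD_CLOSING flag */

struct closing_model {
	unsigned int flags;
	bool read_only;
};

/* Model of the open path: refuse to open while the array is closing. */
int md_open_sketch(const struct closing_model *m)
{
	return (m->flags & MD_CLOSING_BIT) ? -ENODEV : 0;
}

/* Model of the ioctl path that switches the array to read-only. */
int set_readonly_ioctl_sketch(struct closing_model *m)
{
	m->flags |= MD_CLOSING_BIT;   /* block new opens during the change */
	m->read_only = true;          /* stands in for md_set_readonly() */
	m->flags &= ~MD_CLOSING_BIT;  /* clear the bit so later opens work */
	return 0;
}
```

Without the final clearing step, every later md_open_sketch() call on the
same object would keep failing.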

Signed-off-by: NeilBrown 
Fixes: af8d8e6f0315 ("md: changes for MD_STILL_CLOSED flag")
Signed-off-by: Zhilong Liu 
Signed-off-by: Shaohua Li 
Signed-off-by: Greg Kroah-Hartman

md: fix refcount problem on mddev when stopping array.

2017-01-12T10:39:35+00:00

commit e2342ca832726a840ca6bd196dd2cc073815b08a upstream.

md_open() gets a counted reference on an mddev using mddev_find().
If it ends up returning an error, it must drop this reference.

There are two error paths where the reference is not dropped.
One only happens if the process is signalled at an awkward time,
which is quite unlikely.
The other was introduced recently in commit af8d8e6f0.

Change the code to ensure that the reference is dropped when returning an
error, and make it harder to re-introduce this sort of bug in the future.
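
One common way to make such leaks harder to re-introduce is to funnel
every error path through a single exit label that drops the reference.
The sketch below is a userspace model of that pattern; struct open_model,
the get/put helpers and the error codes are illustrative assumptions.

```c
#include <errno.h>
#include <stdbool.h>

struct open_model {
	int refcount;
	bool closing;  /* models the MD_CLOSING flag */
};

static void open_model_get(struct open_model *m) { m->refcount++; }
static void open_model_put(struct open_model *m) { m->refcount--; }

/*
 * Model of md_open(): take a counted reference, then either succeed
 * (keeping the reference) or fail (dropping it at the single exit).
 * 'interrupted' models a signal arriving while waiting for a lock.
 */
int md_open_refcount_sketch(struct open_model *m, bool interrupted)
{
	int err = 0;

	open_model_get(m);  /* stands in for mddev_find() */

	if (interrupted) {
		err = -EINTR;
		goto out;
	}
	if (m->closing) {
		err = -ENODEV;
		goto out;
	}
out:
	if (err)
		open_model_put(m);  /* every error path drops the reference */
	return err;
}
```

Because the put lives at the common exit, a new error path only needs to
set err and jump to out.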

Reported-by: Marc Smith 
Fixes: af8d8e6f0315 ("md: changes for MD_STILL_CLOSED flag")
Signed-off-by: NeilBrown 
Acked-by: Guoqing Jiang 
Signed-off-by: Shaohua Li 
Signed-off-by: Greg Kroah-Hartman

md: MD_RECOVERY_NEEDED is set for mddev->recovery

2017-01-12T10:39:34+00:00

commit 82a301cb0ea2df8a5c88213094a01660067c7fb4 upstream.

Fixes: 90f5f7ad4f38 ("md: Wait for md_check_recovery before attempting device removal.")

Reviewed-by: NeilBrown 
Signed-off-by: Shaohua Li 
Signed-off-by: Greg Kroah-Hartman

md: be careful not to leak internal curr_resync value into metadata. -- (all)

2016-10-29T05:04:05+00:00

mddev->curr_resync usually records where the current resync is up to,
but during the starting phase it has some "magic" values.

 1 - means that the array is trying to start a resync, but has yielded
     to another array which shares physical devices, and also needs to
     start a resync
 2 - means the array is trying to start resync, but has found another
     array which shares physical devices and has already started resync.

 3 - means that resync has commenced, but it is possible that nothing
     has actually been resynced yet.

It is important that this value not be visible to user-space and
particularly that it doesn't get written to the metadata, as the
resync or recovery checkpoint.  In part, this is because it may be
slightly higher than the correct value, though this is very rare.
In part, because it is not a multiple of 4K, and some devices only
support 4K aligned accesses.

There are two places where this value is propagated into either
->curr_resync_completed or ->recovery_cp or ->recovery_offset.
These currently avoid the propagation of values 1 and 2, but will
allow 3 to leak through.

Change them to only propagate the value if it is > 3.
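
The guard can be sketched as a pure function. sector_model_t and
update_checkpoint() are hypothetical names; the real code updates fields
of the mddev in place rather than returning a value.

```c
#include <stdint.h>

typedef uint64_t sector_model_t;  /* stand-in for the kernel's sector_t */

/*
 * Return the checkpoint to record. curr_resync values 1, 2 and 3 are
 * internal "magic" markers, so only values above 3 are propagated.
 */
sector_model_t update_checkpoint(sector_model_t checkpoint,
				 sector_model_t curr_resync)
{
	if (curr_resync > 3)
		return curr_resync;
	return checkpoint;
}
```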

As this can cause an array to fail, the patch is suitable for -stable.

Cc: stable@vger.kernel.org (v3.7+)
Reported-by: Viswesh 
Signed-off-by: NeilBrown 
Signed-off-by: Shaohua Li

md: report 'write_pending' state when array in sync

2016-10-24T22:28:19+00:00

If there is a bad block on a disk and a recovery is performed from
this disk, the same bad block is reported for the new disk. This involves
setting the MD_CHANGE_PENDING flag in rdev_set_badblocks(). For external
metadata this flag is never cleared, because the array state is reported as
'clean'. A read request to the bad block in a RAID5 array gets stuck because
it waits for the flag to be cleared, as per commit c3cce6cda162
("md/raid5: ensure device failure recorded before write request
returns.").

The meaning of the MD_CHANGE_PENDING and MD_CHANGE_CLEAN flags was
clarified in commit 070dc6dd7103 ("md: resolve confusion of
MD_CHANGE_CLEAN"). However, the MD_CHANGE_PENDING flag has since been used
in personality error handlers, where it does not fully match its
initial purpose. It was supposed to signal that a write request is about
to start, but now it is also used to request a metadata update.
Initially (in md_allow_write, md_write_start) the MD_CHANGE_PENDING flag was
set and in_sync was set to 0 at the same time. Error handlers
just set the flag without modifying the in_sync value. The sysfs array state
is a single value, so it now reports 'clean' when the MD_CHANGE_PENDING flag
is set and in_sync is 1. Userspace has no idea that it is expected to
take some action.

Swap the order that array state is checked so 'write_pending' is
reported ahead of 'clean' ('write_pending' is a misleading name but it
is too late to rename it now).
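
The effect of swapping the checks can be sketched as follows. The enum and
report_state() are hypothetical and cover only the three states relevant
here; the real sysfs handler distinguishes more states.

```c
#include <stdbool.h>

enum array_state_model {
	STATE_CLEAN,
	STATE_ACTIVE,
	STATE_WRITE_PENDING
};

/*
 * Check the pending flag first, so that an array with in_sync == 1 and a
 * pending metadata update reports 'write_pending' instead of 'clean'.
 */
enum array_state_model report_state(bool change_pending, bool in_sync)
{
	if (change_pending)
		return STATE_WRITE_PENDING;
	if (in_sync)
		return STATE_CLEAN;
	return STATE_ACTIVE;
}
```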

Signed-off-by: Tomasz Majchrzak 
Signed-off-by: Shaohua Li

md: set rotational bit

2016-10-03T17:20:27+00:00

If all disks in an array are non-rotational, set the array
non-rotational.

This only works for arrays with all disks populated at startup. Support
for disk hotadd/hotremove could be added later if necessary.
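
The rule amounts to an "all of" check over the member disks at startup. A
minimal sketch, assuming the per-disk property is already available as an
array of booleans (array_is_nonrot() is a hypothetical helper, not a
kernel function):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true only if every member disk is non-rotational. A single
 * rotational disk makes the whole array rotational.
 */
bool array_is_nonrot(const bool *disk_nonrot, size_t ndisks)
{
	for (size_t i = 0; i < ndisks; i++) {
		if (!disk_nonrot[i])
			return false;
	}
	return true;
}
```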

Acked-by: Tejun Heo 
Signed-off-by: Shaohua Li

md: fix a potential deadlock

2016-09-21T16:09:44+00:00

lockdep reports a potential deadlock. Fix this by dropping the mutex
before md_import_device().

[ 1137.126601] ======================================================
[ 1137.127013] [ INFO: possible circular locking dependency detected ]
[ 1137.127013] 4.8.0-rc4+ #538 Not tainted
[ 1137.127013] -------------------------------------------------------
[ 1137.127013] mdadm/16675 is trying to acquire lock:
[ 1137.127013]  (&bdev->bd_mutex){+.+.+.}, at: [] __blkdev_get+0x63/0x450
[ 1137.127013]
but task is already holding lock:
[ 1137.127013]  (detected_devices_mutex){+.+.+.}, at: [] md_ioctl+0x2ac/0x1f50
[ 1137.127013]
which lock already depends on the new lock.

[ 1137.127013]
the existing dependency chain (in reverse order) is:
[ 1137.127013]
-> #1 (detected_devices_mutex){+.+.+.}:
[ 1137.127013]        [] lock_acquire+0xb9/0x220
[ 1137.127013]        [] mutex_lock_nested+0x67/0x3d0
[ 1137.127013]        [] md_autodetect_dev+0x3f/0x90
[ 1137.127013]        [] rescan_partitions+0x1a8/0x2c0
[ 1137.127013]        [] __blkdev_reread_part+0x71/0xb0
[ 1137.127013]        [] blkdev_reread_part+0x25/0x40
[ 1137.127013]        [] blkdev_ioctl+0x51b/0xa30
[ 1137.127013]        [] block_ioctl+0x41/0x50
[ 1137.127013]        [] do_vfs_ioctl+0x96/0x6e0
[ 1137.127013]        [] SyS_ioctl+0x41/0x70
[ 1137.127013]        [] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 1137.127013]
-> #0 (&bdev->bd_mutex){+.+.+.}:
[ 1137.127013]        [] __lock_acquire+0x1662/0x1690
[ 1137.127013]        [] lock_acquire+0xb9/0x220
[ 1137.127013]        [] mutex_lock_nested+0x67/0x3d0
[ 1137.127013]        [] __blkdev_get+0x63/0x450
[ 1137.127013]        [] blkdev_get+0x227/0x350
[ 1137.127013]        [] blkdev_get_by_dev+0x36/0x50
[ 1137.127013]        [] lock_rdev+0x35/0x80
[ 1137.127013]        [] md_import_device+0xb4/0x1b0
[ 1137.127013]        [] md_ioctl+0x2f6/0x1f50
[ 1137.127013]        [] blkdev_ioctl+0x283/0xa30
[ 1137.127013]        [] block_ioctl+0x41/0x50
[ 1137.127013]        [] do_vfs_ioctl+0x96/0x6e0
[ 1137.127013]        [] SyS_ioctl+0x41/0x70
[ 1137.127013]        [] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 1137.127013]
other info that might help us debug this:

[ 1137.127013]  Possible unsafe locking scenario:

[ 1137.127013]        CPU0                    CPU1
[ 1137.127013]        ----                    ----
[ 1137.127013]   lock(detected_devices_mutex);
[ 1137.127013]                                lock(&bdev->bd_mutex);
[ 1137.127013]                                lock(detected_devices_mutex);
[ 1137.127013]   lock(&bdev->bd_mutex);
[ 1137.127013]
 *** DEADLOCK ***
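
The shape of the fix (unlink an entry under the list lock, drop the lock
around the call that takes other locks, then re-take it) can be sketched
in userspace. Everything here is a model: struct lock_model is a
bookkeeping flag rather than a real mutex, and autostart_model(),
record_import() and the list layout are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bookkeeping stand-in for a mutex; it only tracks whether it is held. */
struct lock_model { bool held; };

static void lock_model_acquire(struct lock_model *l)
{
	assert(!l->held);
	l->held = true;
}

static void lock_model_release(struct lock_model *l)
{
	assert(l->held);
	l->held = false;
}

struct pending_dev {
	int dev;
	struct pending_dev *next;
};

struct lock_model detected_devices_lock;  /* models detected_devices_mutex */
struct pending_dev *detected_devices;     /* list protected by the lock */

typedef void (*import_fn)(int dev);

/* Sample callback: counts imports that ran with the list lock held. */
int imports_with_lock_held;
void record_import(int dev)
{
	(void)dev;
	if (detected_devices_lock.held)
		imports_with_lock_held++;
}

/* Drain the list, calling 'import' for each entry without the lock held. */
int autostart_model(import_fn import)
{
	int count = 0;

	lock_model_acquire(&detected_devices_lock);
	while (detected_devices) {
		struct pending_dev *node = detected_devices;

		detected_devices = node->next;  /* unlink while locked */

		lock_model_release(&detected_devices_lock);
		import(node->dev);              /* may take other locks */
		lock_model_acquire(&detected_devices_lock);
		count++;
	}
	lock_model_release(&detected_devices_lock);
	return count;
}
```

Because the list lock is never held across the import call, the ordering
between the two locks shown in the report cannot form a cycle in this
model.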

Cc: Cong Wang 
Signed-off-by: Shaohua Li