<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/md, branch v4.9.49</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>md/raid5: add thread_group worker async_tx_issue_pending_all</title>
<updated>2017-08-07T01:59:40+00:00</updated>
<author>
<name>Ofer Heifetz</name>
<email>oferh@marvell.com</email>
</author>
<published>2017-07-24T06:17:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fabc7dffe9e123e5be2456139853a7e12c5adce8'/>
<id>fabc7dffe9e123e5be2456139853a7e12c5adce8</id>
<content type='text'>
commit 7e96d559634b73a8158ee99a7abece2eacec2668 upstream.

Since the thread_group worker and the raid5d kthread are not in sync, if
the worker writes a stripe before raid5d does, the requests will be left
waiting for issue_pending.

The issue was observed when building raid5 with ext4: in some build runs
jbd2 would hang, with requests sitting in the HW engine waiting to be
issued.

Fix this by adding a call to async_tx_issue_pending_all() in
raid5_do_work().
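
A rough sketch of the shape of the change (kernel-internal code, shown
for orientation only; the exact placement is in the upstream commit):

```c
static void raid5_do_work(struct work_struct *work)
{
	...
	/* flush any queued dma descriptors before the worker returns,
	 * since raid5d may not run soon enough to issue them */
	async_tx_issue_pending_all();
}
```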

Signed-off-by: Ofer Heifetz &lt;oferh@marvell.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7e96d559634b73a8158ee99a7abece2eacec2668 upstream.

Since the thread_group worker and the raid5d kthread are not in sync, if
the worker writes a stripe before raid5d does, the requests will be left
waiting for issue_pending.

The issue was observed when building raid5 with ext4: in some build runs
jbd2 would hang, with requests sitting in the HW engine waiting to be
issued.

Fix this by adding a call to async_tx_issue_pending_all() in
raid5_do_work().

Signed-off-by: Ofer Heifetz &lt;oferh@marvell.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Raid5 should update rdev-&gt;sectors after reshape</title>
<updated>2017-07-27T22:08:02+00:00</updated>
<author>
<name>Xiao Ni</name>
<email>xni@redhat.com</email>
</author>
<published>2017-07-05T09:34:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1e95148551f3ea57aef1a97d1b2a825be5a7704a'/>
<id>1e95148551f3ea57aef1a97d1b2a825be5a7704a</id>
<content type='text'>
commit b5d27718f38843a74552e9a93d32e2391fd3999f upstream.

A raid5 md device can be created from disks without using their total
size. For example, each device is 5G but only 3G of it is used to create
the raid5 array. Then the chunk size is changed and we wait for the
reshape to finish. After the reshape finishes, stopping the array and
assembling it again fails:
mdadm -CR /dev/md0 -l5 -n3 /dev/loop[0-2] --size=3G --chunk=32 --assume-clean
mdadm /dev/md0 --grow --chunk=64
# wait for the reshape to finish
mdadm -S /dev/md0
mdadm -As
The error messages:
[197519.814302] md: loop1 does not have a valid v1.2 superblock, not importing!
[197519.821686] md: md_import_device returned -22

After the reshape the data offset is changed (the backwards direction is
selected in this condition). In super_1_load the available space of the
underlying device is compared with sb-&gt;data_size. The new data offset
gets bigger after the reshape, so super_1_load returns -EINVAL.
rdev-&gt;sectors is updated in md_finish_reshape, and sb-&gt;data_size is
then set in super_1_sync based on rdev-&gt;sectors. So call
md_finish_reshape in end_reshape.
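
As a rough sketch (kernel-internal code, indicative only), the change
amounts to:

```c
static void end_reshape(struct r5conf *conf)
{
	...
	/* update rdev-&gt;sectors so a later super_1_sync() writes a
	 * sb-&gt;data_size that matches the new data offset */
	md_finish_reshape(conf-&gt;mddev);
}
```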

Signed-off-by: Xiao Ni &lt;xni@redhat.com&gt;
Acked-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b5d27718f38843a74552e9a93d32e2391fd3999f upstream.

A raid5 md device can be created from disks without using their total
size. For example, each device is 5G but only 3G of it is used to create
the raid5 array. Then the chunk size is changed and we wait for the
reshape to finish. After the reshape finishes, stopping the array and
assembling it again fails:
mdadm -CR /dev/md0 -l5 -n3 /dev/loop[0-2] --size=3G --chunk=32 --assume-clean
mdadm /dev/md0 --grow --chunk=64
# wait for the reshape to finish
mdadm -S /dev/md0
mdadm -As
The error messages:
[197519.814302] md: loop1 does not have a valid v1.2 superblock, not importing!
[197519.821686] md: md_import_device returned -22

After the reshape the data offset is changed (the backwards direction is
selected in this condition). In super_1_load the available space of the
underlying device is compared with sb-&gt;data_size. The new data offset
gets bigger after the reshape, so super_1_load returns -EINVAL.
rdev-&gt;sectors is updated in md_finish_reshape, and sb-&gt;data_size is
then set in super_1_sync based on rdev-&gt;sectors. So call
md_finish_reshape in end_reshape.

Signed-off-by: Xiao Ni &lt;xni@redhat.com&gt;
Acked-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md: don't use flush_signals in userspace processes</title>
<updated>2017-07-27T22:08:01+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2017-06-07T23:05:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=03c1d9d45582e8a0991747a6f7a3a235b39b3e2b'/>
<id>03c1d9d45582e8a0991747a6f7a3a235b39b3e2b</id>
<content type='text'>
commit f9c79bc05a2a91f4fba8bfd653579e066714b1ec upstream.

The function flush_signals clears all pending signals for the process. It
may be used by kernel threads when we need to prepare a kernel thread for
responding to signals. However, using this function for userspace
processes is incorrect - clearing signals without the program expecting
it can cause misbehavior.

The raid1 and raid5 code uses flush_signals in its request routine because
it wants to prepare for an interruptible wait. This patch drops
flush_signals and uses sigprocmask instead to block all signals (including
SIGKILL) around the schedule() call. The signals are not lost, but the
schedule() call won't respond to them.
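
The pattern, as a simplified sketch (kernel-side helpers, abbreviated;
see the upstream commit for the exact code):

```c
sigset_t full, old;

sigfillset(&amp;full);
sigprocmask(SIG_SETMASK, &amp;full, &amp;old); /* block all; signals stay pending */
schedule();                                  /* wait without being interrupted */
sigprocmask(SIG_SETMASK, &amp;old, NULL);  /* restore; pending signals are delivered */
```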

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Acked-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f9c79bc05a2a91f4fba8bfd653579e066714b1ec upstream.

The function flush_signals clears all pending signals for the process. It
may be used by kernel threads when we need to prepare a kernel thread for
responding to signals. However, using this function for userspace
processes is incorrect - clearing signals without the program expecting
it can cause misbehavior.

The raid1 and raid5 code uses flush_signals in its request routine because
it wants to prepare for an interruptible wait. This patch drops
flush_signals and uses sigprocmask instead to block all signals (including
SIGKILL) around the schedule() call. The signals are not lost, but the
schedule() call won't respond to them.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Acked-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>dm mpath: cleanup -Wbool-operation warning in choose_pgpath()</title>
<updated>2017-07-27T22:07:55+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2017-01-06T20:33:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=63d32e8af0dd6343cde33993eabc9c7562d92fe9'/>
<id>63d32e8af0dd6343cde33993eabc9c7562d92fe9</id>
<content type='text'>
commit d19a55ccad15a486ffe03030570744e5d5bd9f8e upstream.

Reported-by: David Binderman &lt;dcb314@hotmail.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit d19a55ccad15a486ffe03030570744e5d5bd9f8e upstream.

Reported-by: David Binderman &lt;dcb314@hotmail.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md: fix super_offset endianness in super_1_rdev_size_change</title>
<updated>2017-07-15T10:16:15+00:00</updated>
<author>
<name>Jason Yan</name>
<email>yanaijie@huawei.com</email>
</author>
<published>2017-03-10T03:27:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3953403ca660817a33e7f7ad65a265facc0e59c7'/>
<id>3953403ca660817a33e7f7ad65a265facc0e59c7</id>
<content type='text'>
commit 3fb632e40d7667d8bedfabc28850ac06d5493f54 upstream.

The sb-&gt;super_offset field is little-endian on disk, but the
rdev-&gt;sb_start is in host byte order, so fix this by adding cpu_to_le64.

Signed-off-by: Jason Yan &lt;yanaijie@huawei.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 3fb632e40d7667d8bedfabc28850ac06d5493f54 upstream.

The sb-&gt;super_offset field is little-endian on disk, but the
rdev-&gt;sb_start is in host byte order, so fix this by adding cpu_to_le64.

Signed-off-by: Jason Yan &lt;yanaijie@huawei.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md: fix incorrect use of lexx_to_cpu in does_sb_need_changing</title>
<updated>2017-07-15T10:16:15+00:00</updated>
<author>
<name>Jason Yan</name>
<email>yanaijie@huawei.com</email>
</author>
<published>2017-03-10T03:49:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9a37d02c497cd839beee3e0cab0fc606bcfe08bf'/>
<id>9a37d02c497cd839beee3e0cab0fc606bcfe08bf</id>
<content type='text'>
commit 1345921393ba23b60d3fcf15933e699232ad25ae upstream.

The sb-&gt;layout field is of type __le32, so we should use le32_to_cpu.

Signed-off-by: Jason Yan &lt;yanaijie@huawei.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 1345921393ba23b60d3fcf15933e699232ad25ae upstream.

The sb-&gt;layout field is of type __le32, so we should use le32_to_cpu.

Signed-off-by: Jason Yan &lt;yanaijie@huawei.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>dm thin: do not queue freed thin mapping for next stage processing</title>
<updated>2017-07-05T12:40:18+00:00</updated>
<author>
<name>Vallish Vaidyeshwara</name>
<email>vallish@amazon.com</email>
</author>
<published>2017-06-23T18:53:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1c0fa383b3391f1e5528b264bd9b3ca9209054cf'/>
<id>1c0fa383b3391f1e5528b264bd9b3ca9209054cf</id>
<content type='text'>
commit 00a0ea33b495ee6149bf5a77ac5807ce87323abb upstream.

process_prepared_discard_passdown_pt1() should clean up the
dm_thin_new_mapping in case of error.

dm_pool_inc_data_range() can fail trying to get a block reference:

metadata operation 'dm_pool_inc_data_range' failed: error = -61

When dm_pool_inc_data_range() fails, dm thin aborts the current metadata
transaction and marks the pool as PM_READ_ONLY. The memory for the thin
mapping is released as well. However, the current thin mapping is still
queued onto the next stage as part of queue_passdown_pt2() or
passdown_endio(). When this dangling thin mapping memory is processed
and accessed in the next stage, it crashes device mapper.

Code flow without fix:
-&gt; process_prepared_discard_passdown_pt1(m)
   -&gt; dm_thin_remove_range()
   -&gt; discard passdown
      --&gt; passdown_endio(m) queues m onto next stage
   -&gt; dm_pool_inc_data_range() fails, frees memory m
            but does not remove it from next stage queue

-&gt; process_prepared_discard_passdown_pt2(m)
   -&gt; processes freed memory m and crashes

One such stack:

Call Trace:
[&lt;ffffffffa037a46f&gt;] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
[&lt;ffffffffa039b6dc&gt;] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
[&lt;ffffffffa039b88b&gt;] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
[&lt;ffffffffa0399611&gt;] process_prepared+0x81/0xa0 [dm_thin_pool]
[&lt;ffffffffa039e735&gt;] do_worker+0xc5/0x820 [dm_thin_pool]
[&lt;ffffffff8152bf54&gt;] ? __schedule+0x244/0x680
[&lt;ffffffff81087e72&gt;] ? pwq_activate_delayed_work+0x42/0xb0
[&lt;ffffffff81089f53&gt;] process_one_work+0x153/0x3f0
[&lt;ffffffff8108a71b&gt;] worker_thread+0x12b/0x4b0
[&lt;ffffffff8108a5f0&gt;] ? rescuer_thread+0x350/0x350
[&lt;ffffffff8108fd6a&gt;] kthread+0xca/0xe0
[&lt;ffffffff8108fca0&gt;] ? kthread_park+0x60/0x60
[&lt;ffffffff81530b45&gt;] ret_from_fork+0x25/0x30

The fix is to first take the block ref count for the discarded block and
then do the passdown discard of this block. If taking the block ref count
fails, bail out: abort the current metadata transaction, mark the pool
PM_READ_ONLY and free the current thin mapping memory (the existing error
handling code) without queueing this thin mapping onto the next stage of
processing. If taking the block ref count succeeds, the passdown discard
of this block proceeds, and the discard callback passdown_endio() queues
this thin mapping onto the next stage of processing.

Code flow with fix:
-&gt; process_prepared_discard_passdown_pt1(m)
   -&gt; dm_thin_remove_range()
   -&gt; dm_pool_inc_data_range()
      --&gt; if fails, free memory m and bail out
   -&gt; discard passdown
      --&gt; passdown_endio(m) queues m onto next stage

Reviewed-by: Eduardo Valentin &lt;eduval@amazon.com&gt;
Reviewed-by: Cristian Gafton &lt;gafton@amazon.com&gt;
Reviewed-by: Anchal Agarwal &lt;anchalag@amazon.com&gt;
Signed-off-by: Vallish Vaidyeshwara &lt;vallish@amazon.com&gt;
Reviewed-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 00a0ea33b495ee6149bf5a77ac5807ce87323abb upstream.

process_prepared_discard_passdown_pt1() should clean up the
dm_thin_new_mapping in case of error.

dm_pool_inc_data_range() can fail trying to get a block reference:

metadata operation 'dm_pool_inc_data_range' failed: error = -61

When dm_pool_inc_data_range() fails, dm thin aborts the current metadata
transaction and marks the pool as PM_READ_ONLY. The memory for the thin
mapping is released as well. However, the current thin mapping is still
queued onto the next stage as part of queue_passdown_pt2() or
passdown_endio(). When this dangling thin mapping memory is processed
and accessed in the next stage, it crashes device mapper.

Code flow without fix:
-&gt; process_prepared_discard_passdown_pt1(m)
   -&gt; dm_thin_remove_range()
   -&gt; discard passdown
      --&gt; passdown_endio(m) queues m onto next stage
   -&gt; dm_pool_inc_data_range() fails, frees memory m
            but does not remove it from next stage queue

-&gt; process_prepared_discard_passdown_pt2(m)
   -&gt; processes freed memory m and crashes

One such stack:

Call Trace:
[&lt;ffffffffa037a46f&gt;] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
[&lt;ffffffffa039b6dc&gt;] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
[&lt;ffffffffa039b88b&gt;] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
[&lt;ffffffffa0399611&gt;] process_prepared+0x81/0xa0 [dm_thin_pool]
[&lt;ffffffffa039e735&gt;] do_worker+0xc5/0x820 [dm_thin_pool]
[&lt;ffffffff8152bf54&gt;] ? __schedule+0x244/0x680
[&lt;ffffffff81087e72&gt;] ? pwq_activate_delayed_work+0x42/0xb0
[&lt;ffffffff81089f53&gt;] process_one_work+0x153/0x3f0
[&lt;ffffffff8108a71b&gt;] worker_thread+0x12b/0x4b0
[&lt;ffffffff8108a5f0&gt;] ? rescuer_thread+0x350/0x350
[&lt;ffffffff8108fd6a&gt;] kthread+0xca/0xe0
[&lt;ffffffff8108fca0&gt;] ? kthread_park+0x60/0x60
[&lt;ffffffff81530b45&gt;] ret_from_fork+0x25/0x30

The fix is to first take the block ref count for the discarded block and
then do the passdown discard of this block. If taking the block ref count
fails, bail out: abort the current metadata transaction, mark the pool
PM_READ_ONLY and free the current thin mapping memory (the existing error
handling code) without queueing this thin mapping onto the next stage of
processing. If taking the block ref count succeeds, the passdown discard
of this block proceeds, and the discard callback passdown_endio() queues
this thin mapping onto the next stage of processing.

Code flow with fix:
-&gt; process_prepared_discard_passdown_pt1(m)
   -&gt; dm_thin_remove_range()
   -&gt; dm_pool_inc_data_range()
      --&gt; if fails, free memory m and bail out
   -&gt; discard passdown
      --&gt; passdown_endio(m) queues m onto next stage

Reviewed-by: Eduardo Valentin &lt;eduval@amazon.com&gt;
Reviewed-by: Cristian Gafton &lt;gafton@amazon.com&gt;
Reviewed-by: Anchal Agarwal &lt;anchalag@amazon.com&gt;
Signed-off-by: Vallish Vaidyeshwara &lt;vallish@amazon.com&gt;
Reviewed-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md: MD_CLOSING needs to be cleared after called md_set_readonly or do_md_stop</title>
<updated>2017-05-25T13:44:33+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-04-06T03:16:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7e789787878321fc2b1b7869963fa5eeede7402e'/>
<id>7e789787878321fc2b1b7869963fa5eeede7402e</id>
<content type='text'>
commit 065e519e71b2c1f41936cce75b46b5ab34adb588 upstream.

If md_set_readonly was called and it set the MD_CLOSING bit, the mddev
cannot be opened any more because the MD_CLOSING bit was never cleared.
Thus it needs to be cleared in md_ioctl after any call to
md_set_readonly() or do_md_stop().
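
A simplified sketch of the fix's shape in md_ioctl() (abbreviated kernel
code; the flag-tracking variable is illustrative):

```c
	bool did_set_md_closing = false;
	...
	if (test_and_set_bit(MD_CLOSING, &amp;mddev-&gt;flags)) {
		err = -EBUSY;
		goto out;
	}
	did_set_md_closing = true;
	...
out:
	if (did_set_md_closing)
		clear_bit(MD_CLOSING, &amp;mddev-&gt;flags);
```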

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Fixes: af8d8e6f0315 ("md: changes for MD_STILL_CLOSED flag")
Signed-off-by: Zhilong Liu &lt;zlliu@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 065e519e71b2c1f41936cce75b46b5ab34adb588 upstream.

If md_set_readonly was called and it set the MD_CLOSING bit, the mddev
cannot be opened any more because the MD_CLOSING bit was never cleared.
Thus it needs to be cleared in md_ioctl after any call to
md_set_readonly() or do_md_stop().

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Fixes: af8d8e6f0315 ("md: changes for MD_STILL_CLOSED flag")
Signed-off-by: Zhilong Liu &lt;zlliu@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md: update slab_cache before releasing new stripes when stripes resizing</title>
<updated>2017-05-25T13:44:33+00:00</updated>
<author>
<name>Dennis Yang</name>
<email>dennisyang@qnap.com</email>
</author>
<published>2017-03-29T07:46:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fa9a4a9c6d6ffb21c220418384ed9d89f8c18e35'/>
<id>fa9a4a9c6d6ffb21c220418384ed9d89f8c18e35</id>
<content type='text'>
commit 583da48e388f472e8818d9bb60ef6a1d40ee9f9d upstream.

When growing a raid5 device on a machine with little memory, there is a
chance that mdadm will be killed and the following bug report can be
observed. The same bug can also be reproduced on linux-4.10.6.

[57600.075774] BUG: unable to handle kernel NULL pointer dereference at           (null)
[57600.083796] IP: [&lt;ffffffff81a6aa87&gt;] _raw_spin_lock+0x7/0x20
[57600.110378] PGD 421cf067 PUD 4442d067 PMD 0
[57600.114678] Oops: 0002 [#1] SMP
[57600.180799] CPU: 1 PID: 25990 Comm: mdadm Tainted: P           O    4.2.8 #1
[57600.187849] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS QV05AR66 03/06/2013
[57600.197490] task: ffff880044e47240 ti: ffff880043070000 task.ti: ffff880043070000
[57600.204963] RIP: 0010:[&lt;ffffffff81a6aa87&gt;]  [&lt;ffffffff81a6aa87&gt;] _raw_spin_lock+0x7/0x20
[57600.213057] RSP: 0018:ffff880043073810  EFLAGS: 00010046
[57600.218359] RAX: 0000000000000000 RBX: 000000000000000c RCX: ffff88011e296dd0
[57600.225486] RDX: 0000000000000001 RSI: ffffe8ffffcb46c0 RDI: 0000000000000000
[57600.232613] RBP: ffff880043073878 R08: ffff88011e5f8170 R09: 0000000000000282
[57600.239739] R10: 0000000000000005 R11: 28f5c28f5c28f5c3 R12: ffff880043073838
[57600.246872] R13: ffffe8ffffcb46c0 R14: 0000000000000000 R15: ffff8800b9706a00
[57600.253999] FS:  00007f576106c700(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
[57600.262078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[57600.267817] CR2: 0000000000000000 CR3: 00000000428fe000 CR4: 00000000001406e0
[57600.274942] Stack:
[57600.276949]  ffffffff8114ee35 ffff880043073868 0000000000000282 000000000000eb3f
[57600.284383]  ffffffff81119043 ffff880043073838 ffff880043073838 ffff88003e197b98
[57600.291820]  ffffe8ffffcb46c0 ffff88003e197360 0000000000000286 ffff880043073968
[57600.299254] Call Trace:
[57600.301698]  [&lt;ffffffff8114ee35&gt;] ? cache_flusharray+0x35/0xe0
[57600.307523]  [&lt;ffffffff81119043&gt;] ? __page_cache_release+0x23/0x110
[57600.313779]  [&lt;ffffffff8114eb53&gt;] kmem_cache_free+0x63/0xc0
[57600.319344]  [&lt;ffffffff81579942&gt;] drop_one_stripe+0x62/0x90
[57600.324915]  [&lt;ffffffff81579b5b&gt;] raid5_cache_scan+0x8b/0xb0
[57600.330563]  [&lt;ffffffff8111b98a&gt;] shrink_slab.part.36+0x19a/0x250
[57600.336650]  [&lt;ffffffff8111e38c&gt;] shrink_zone+0x23c/0x250
[57600.342039]  [&lt;ffffffff8111e4f3&gt;] do_try_to_free_pages+0x153/0x420
[57600.348210]  [&lt;ffffffff8111e851&gt;] try_to_free_pages+0x91/0xa0
[57600.353959]  [&lt;ffffffff811145b1&gt;] __alloc_pages_nodemask+0x4d1/0x8b0
[57600.360303]  [&lt;ffffffff8157a30b&gt;] check_reshape+0x62b/0x770
[57600.365866]  [&lt;ffffffff8157a4a5&gt;] raid5_check_reshape+0x55/0xa0
[57600.371778]  [&lt;ffffffff81583df7&gt;] update_raid_disks+0xc7/0x110
[57600.377604]  [&lt;ffffffff81592b73&gt;] md_ioctl+0xd83/0x1b10
[57600.382827]  [&lt;ffffffff81385380&gt;] blkdev_ioctl+0x170/0x690
[57600.388307]  [&lt;ffffffff81195238&gt;] block_ioctl+0x38/0x40
[57600.393525]  [&lt;ffffffff811731c5&gt;] do_vfs_ioctl+0x2b5/0x480
[57600.399010]  [&lt;ffffffff8115e07b&gt;] ? vfs_write+0x14b/0x1f0
[57600.404400]  [&lt;ffffffff811733cc&gt;] SyS_ioctl+0x3c/0x70
[57600.409447]  [&lt;ffffffff81a6ad97&gt;] entry_SYSCALL_64_fastpath+0x12/0x6a
[57600.415875] Code: 00 00 00 00 55 48 89 e5 8b 07 85 c0 74 04 31 c0 5d c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 ef b0 01 5d c3 90 31 c0 ba 01 00 00 00 &lt;f0&gt; 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 85 d1 63 ff 5d
[57600.435460] RIP  [&lt;ffffffff81a6aa87&gt;] _raw_spin_lock+0x7/0x20
[57600.441208]  RSP &lt;ffff880043073810&gt;
[57600.444690] CR2: 0000000000000000
[57600.448000] ---[ end trace cbc6b5cc4bf9831d ]---

The problem is that resize_stripes() releases the new stripe_heads before
assigning the new slab cache to conf-&gt;slab_cache. If the shrinker
function raid5_cache_scan() gets called after resize_stripes() has
started releasing the new stripes, but right before the new slab cache is
assigned, these new stripe_heads will be freed with the old slab_cache,
which has already been destroyed, and that triggers this bug.

Signed-off-by: Dennis Yang &lt;dennisyang@qnap.com&gt;
Fixes: edbe83ab4c27 ("md/raid5: allow the stripe_cache to grow and shrink.")
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 583da48e388f472e8818d9bb60ef6a1d40ee9f9d upstream.

When growing a raid5 device on a machine with little memory, there is a
chance that mdadm will be killed and the following bug report can be
observed. The same bug can also be reproduced on linux-4.10.6.

[57600.075774] BUG: unable to handle kernel NULL pointer dereference at           (null)
[57600.083796] IP: [&lt;ffffffff81a6aa87&gt;] _raw_spin_lock+0x7/0x20
[57600.110378] PGD 421cf067 PUD 4442d067 PMD 0
[57600.114678] Oops: 0002 [#1] SMP
[57600.180799] CPU: 1 PID: 25990 Comm: mdadm Tainted: P           O    4.2.8 #1
[57600.187849] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS QV05AR66 03/06/2013
[57600.197490] task: ffff880044e47240 ti: ffff880043070000 task.ti: ffff880043070000
[57600.204963] RIP: 0010:[&lt;ffffffff81a6aa87&gt;]  [&lt;ffffffff81a6aa87&gt;] _raw_spin_lock+0x7/0x20
[57600.213057] RSP: 0018:ffff880043073810  EFLAGS: 00010046
[57600.218359] RAX: 0000000000000000 RBX: 000000000000000c RCX: ffff88011e296dd0
[57600.225486] RDX: 0000000000000001 RSI: ffffe8ffffcb46c0 RDI: 0000000000000000
[57600.232613] RBP: ffff880043073878 R08: ffff88011e5f8170 R09: 0000000000000282
[57600.239739] R10: 0000000000000005 R11: 28f5c28f5c28f5c3 R12: ffff880043073838
[57600.246872] R13: ffffe8ffffcb46c0 R14: 0000000000000000 R15: ffff8800b9706a00
[57600.253999] FS:  00007f576106c700(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
[57600.262078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[57600.267817] CR2: 0000000000000000 CR3: 00000000428fe000 CR4: 00000000001406e0
[57600.274942] Stack:
[57600.276949]  ffffffff8114ee35 ffff880043073868 0000000000000282 000000000000eb3f
[57600.284383]  ffffffff81119043 ffff880043073838 ffff880043073838 ffff88003e197b98
[57600.291820]  ffffe8ffffcb46c0 ffff88003e197360 0000000000000286 ffff880043073968
[57600.299254] Call Trace:
[57600.301698]  [&lt;ffffffff8114ee35&gt;] ? cache_flusharray+0x35/0xe0
[57600.307523]  [&lt;ffffffff81119043&gt;] ? __page_cache_release+0x23/0x110
[57600.313779]  [&lt;ffffffff8114eb53&gt;] kmem_cache_free+0x63/0xc0
[57600.319344]  [&lt;ffffffff81579942&gt;] drop_one_stripe+0x62/0x90
[57600.324915]  [&lt;ffffffff81579b5b&gt;] raid5_cache_scan+0x8b/0xb0
[57600.330563]  [&lt;ffffffff8111b98a&gt;] shrink_slab.part.36+0x19a/0x250
[57600.336650]  [&lt;ffffffff8111e38c&gt;] shrink_zone+0x23c/0x250
[57600.342039]  [&lt;ffffffff8111e4f3&gt;] do_try_to_free_pages+0x153/0x420
[57600.348210]  [&lt;ffffffff8111e851&gt;] try_to_free_pages+0x91/0xa0
[57600.353959]  [&lt;ffffffff811145b1&gt;] __alloc_pages_nodemask+0x4d1/0x8b0
[57600.360303]  [&lt;ffffffff8157a30b&gt;] check_reshape+0x62b/0x770
[57600.365866]  [&lt;ffffffff8157a4a5&gt;] raid5_check_reshape+0x55/0xa0
[57600.371778]  [&lt;ffffffff81583df7&gt;] update_raid_disks+0xc7/0x110
[57600.377604]  [&lt;ffffffff81592b73&gt;] md_ioctl+0xd83/0x1b10
[57600.382827]  [&lt;ffffffff81385380&gt;] blkdev_ioctl+0x170/0x690
[57600.388307]  [&lt;ffffffff81195238&gt;] block_ioctl+0x38/0x40
[57600.393525]  [&lt;ffffffff811731c5&gt;] do_vfs_ioctl+0x2b5/0x480
[57600.399010]  [&lt;ffffffff8115e07b&gt;] ? vfs_write+0x14b/0x1f0
[57600.404400]  [&lt;ffffffff811733cc&gt;] SyS_ioctl+0x3c/0x70
[57600.409447]  [&lt;ffffffff81a6ad97&gt;] entry_SYSCALL_64_fastpath+0x12/0x6a
[57600.415875] Code: 00 00 00 00 55 48 89 e5 8b 07 85 c0 74 04 31 c0 5d c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 ef b0 01 5d c3 90 31 c0 ba 01 00 00 00 &lt;f0&gt; 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 85 d1 63 ff 5d
[57600.435460] RIP  [&lt;ffffffff81a6aa87&gt;] _raw_spin_lock+0x7/0x20
[57600.441208]  RSP &lt;ffff880043073810&gt;
[57600.444690] CR2: 0000000000000000
[57600.448000] ---[ end trace cbc6b5cc4bf9831d ]---

The problem is that resize_stripes() releases the new stripe_heads before
assigning the new slab cache to conf-&gt;slab_cache. If the shrinker
function raid5_cache_scan() gets called after resize_stripes() has
started releasing the new stripes, but right before the new slab cache is
assigned, these new stripe_heads will be freed with the old slab_cache,
which has already been destroyed, and that triggers this bug.

Signed-off-by: Dennis Yang &lt;dennisyang@qnap.com&gt;
Fixes: edbe83ab4c27 ("md/raid5: allow the stripe_cache to grow and shrink.")
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>dm space map disk: fix some bookkeeping in the disk space map</title>
<updated>2017-05-25T13:44:33+00:00</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2017-05-15T13:45:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f2bb8bcbc09dfe32138efc44105be9a58fad5cef'/>
<id>f2bb8bcbc09dfe32138efc44105be9a58fad5cef</id>
<content type='text'>
commit 0377a07c7a035e0d033cd8b29f0cb15244c0916a upstream.

When decrementing the reference count for a block, the free count wasn't
being updated if the reference count went to zero.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0377a07c7a035e0d033cd8b29f0cb15244c0916a upstream.

When decrementing the reference count for a block, the free count wasn't
being updated if the reference count went to zero.

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
