<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/md/raid1.c, branch v6.7-rc1</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>md/raid1: don't split discard io for write behind</title>
<updated>2023-10-09T23:32:02+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-10-07T11:21:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9e55a22fce1384837c213274d1a3b93be16ed9d7'/>
<id>9e55a22fce1384837c213274d1a3b93be16ed9d7</id>
<content type='text'>
Currently, discad io is treated the same as normal write io, and for
write behind case, io size is limited to:

BIO_MAX_VECS * (PAGE_SIZE &gt;&gt; 9)

For 0.5KB sector size and 4KB PAGE_SIZE, this is just 1MB. For
consequence, if 'WriteMostly' is set to one of the underlying disks,
then diskcard io will be splited into 1MB and it will take a long time
for the diskcard to finish.

Fix this problem by disable write behind for discard io.

Reported-by: Roman Mamedov &lt;rm@romanrm.net&gt;
Closes: https://lore.kernel.org/all/6a1165f7-c792-c054-b8f0-1ad4f7b8ae01@ultracoder.org/
Reported-and-tested-by: Kirill Kirilenko &lt;kirill@ultracoder.org&gt;
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20231007112105.407449-1-yukuai1@huaweicloud.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, discad io is treated the same as normal write io, and for
write behind case, io size is limited to:

BIO_MAX_VECS * (PAGE_SIZE &gt;&gt; 9)

For 0.5KB sector size and 4KB PAGE_SIZE, this is just 1MB. For
consequence, if 'WriteMostly' is set to one of the underlying disks,
then diskcard io will be splited into 1MB and it will take a long time
for the diskcard to finish.

Fix this problem by disable write behind for discard io.

Reported-by: Roman Mamedov &lt;rm@romanrm.net&gt;
Closes: https://lore.kernel.org/all/6a1165f7-c792-c054-b8f0-1ad4f7b8ae01@ultracoder.org/
Reported-and-tested-by: Kirill Kirilenko &lt;kirill@ultracoder.org&gt;
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20231007112105.407449-1-yukuai1@huaweicloud.com
</pre>
</div>
</content>
</entry>
<entry>
<title>md: initialize 'writes_pending' while allocating mddev</title>
<updated>2023-09-22T17:28:26+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-08-25T03:09:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b8494823e236326500aa1004155e83f748dd10da'/>
<id>b8494823e236326500aa1004155e83f748dd10da</id>
<content type='text'>
Currently 'writes_pending' is initialized in pers-&gt;run for raid1/5/10,
and it's freed while deleing mddev, instead of pers-&gt;free. pers-&gt;run can
be called multiple times before mddev is deleted, and a helper
mddev_init_writes_pending() is used to prevent 'writes_pending' to be
initialized multiple times, this usage is safe but a litter weird.

On the other hand, 'writes_pending' is only initialized for raid1/5/10,
however, it's used in common layer, for example:

array_state_store
 set_in_sync
  if (!mddev-&gt;in_sync) -&gt; in_sync is used for all levels
   // access writes_pending

There might be some implicit dependency that I don't recognized to make
sure 'writes_pending' can only be accessed for raid1/5/10, but there are
no comments about that.

By the way, it make sense to initialize 'writes_pending' in common layer
because there are already three levels use it.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230825030956.1527023-3-yukuai1@huaweicloud.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently 'writes_pending' is initialized in pers-&gt;run for raid1/5/10,
and it's freed while deleing mddev, instead of pers-&gt;free. pers-&gt;run can
be called multiple times before mddev is deleted, and a helper
mddev_init_writes_pending() is used to prevent 'writes_pending' to be
initialized multiple times, this usage is safe but a litter weird.

On the other hand, 'writes_pending' is only initialized for raid1/5/10,
however, it's used in common layer, for example:

array_state_store
 set_in_sync
  if (!mddev-&gt;in_sync) -&gt; in_sync is used for all levels
   // access writes_pending

There might be some implicit dependency that I don't recognized to make
sure 'writes_pending' can only be accessed for raid1/5/10, but there are
no comments about that.

By the way, it make sense to initialize 'writes_pending' in common layer
because there are already three levels use it.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230825030956.1527023-3-yukuai1@huaweicloud.com
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid1: fix error: ISO C90 forbids mixed declarations</title>
<updated>2023-09-11T21:41:58+00:00</updated>
<author>
<name>Nigel Croxon</name>
<email>ncroxon@redhat.com</email>
</author>
<published>2023-09-11T21:25:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=df203da47f4428bc286fc99318936416253a321c'/>
<id>df203da47f4428bc286fc99318936416253a321c</id>
<content type='text'>
There is a compile error when this commit is added:
md: raid1: fix potential OOB in raid1_remove_disk()

drivers/md/raid1.c: In function 'raid1_remove_disk':
drivers/md/raid1.c:1844:9: error: ISO C90 forbids mixed declarations
and code [-Werror=declaration-after-statement]
1844 |         struct raid1_info *p = conf-&gt;mirrors + number;
     |         ^~~~~~

That's because the new code was inserted before the struct.
The change is move the struct command above this commit.

Fixes: 8b0472b50bcf ("md: raid1: fix potential OOB in raid1_remove_disk()")
Signed-off-by: Nigel Croxon &lt;ncroxon@redhat.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/46d929d0-2aab-4cf2-b2bf-338963e8ba5a@redhat.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is a compile error when this commit is added:
md: raid1: fix potential OOB in raid1_remove_disk()

drivers/md/raid1.c: In function 'raid1_remove_disk':
drivers/md/raid1.c:1844:9: error: ISO C90 forbids mixed declarations
and code [-Werror=declaration-after-statement]
1844 |         struct raid1_info *p = conf-&gt;mirrors + number;
     |         ^~~~~~

That's because the new code was inserted before the struct.
The change is move the struct command above this commit.

Fixes: 8b0472b50bcf ("md: raid1: fix potential OOB in raid1_remove_disk()")
Signed-off-by: Nigel Croxon &lt;ncroxon@redhat.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/46d929d0-2aab-4cf2-b2bf-338963e8ba5a@redhat.com
</pre>
</div>
</content>
</entry>
<entry>
<title>md raid1: allow writebehind to work on any leg device set WriteMostly</title>
<updated>2023-08-18T04:11:31+00:00</updated>
<author>
<name>Heinz Mauelshagen</name>
<email>heinzm@redhat.com</email>
</author>
<published>2023-08-14T12:01:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6b2460e66ce6d483b5ff77227ac799d6e8a9ebd6'/>
<id>6b2460e66ce6d483b5ff77227ac799d6e8a9ebd6</id>
<content type='text'>
As the WriteMostly flag can be set on any component device of a RAID1
array, remove the constraint that it only works if set on the first one.

Signed-off-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Tested-by: Xiao Ni &lt;xni@redhat.com&gt;
Link: https://lore.kernel.org/r/2a9592bf3340f34bf588eec984b23ee219f3985e.1692013451.git.heinzm@redhat.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As the WriteMostly flag can be set on any component device of a RAID1
array, remove the constraint that it only works if set on the first one.

Signed-off-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Tested-by: Xiao Ni &lt;xni@redhat.com&gt;
Link: https://lore.kernel.org/r/2a9592bf3340f34bf588eec984b23ee219f3985e.1692013451.git.heinzm@redhat.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid1: hold the barrier until handle_read_error() finishes</title>
<updated>2023-08-18T04:11:30+00:00</updated>
<author>
<name>Xueshi Hu</name>
<email>xueshi.hu@smartx.com</email>
</author>
<published>2023-08-14T13:53:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c069da449a13669ffa754fd971747e7e17e7d691'/>
<id>c069da449a13669ffa754fd971747e7e17e7d691</id>
<content type='text'>
handle_read_error() will call allow_barrier() to match the former barrier
raising. However, it should put the allow_barrier() at the end to avoid a
concurrent raid reshape.

Fixes: 689389a06ce7 ("md/raid1: simplify handle_read_error().")
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Xueshi Hu &lt;xueshi.hu@smartx.com&gt;
Link: https://lore.kernel.org/r/20230814135356.1113639-4-xueshi.hu@smartx.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
handle_read_error() will call allow_barrier() to match the former barrier
raising. However, it should put the allow_barrier() at the end to avoid a
concurrent raid reshape.

Fixes: 689389a06ce7 ("md/raid1: simplify handle_read_error().")
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Xueshi Hu &lt;xueshi.hu@smartx.com&gt;
Link: https://lore.kernel.org/r/20230814135356.1113639-4-xueshi.hu@smartx.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid1: free the r1bio before waiting for blocked rdev</title>
<updated>2023-08-18T04:11:30+00:00</updated>
<author>
<name>Xueshi Hu</name>
<email>xueshi.hu@smartx.com</email>
</author>
<published>2023-08-14T13:53:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=992db13a4aee766c8bfbf046ad15c2db5fa7cab8'/>
<id>992db13a4aee766c8bfbf046ad15c2db5fa7cab8</id>
<content type='text'>
Raid1 reshape will change mempool and r1conf::raid_disks which are
needed to free r1bio. allow_barrier() make a concurrent raid1_reshape()
possible. So, free the in-flight r1bio before waiting blocked rdev.

Fixes: 6bfe0b499082 ("md: support blocking writes to an array on device failure")
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Xueshi Hu &lt;xueshi.hu@smartx.com&gt;
Link: https://lore.kernel.org/r/20230814135356.1113639-3-xueshi.hu@smartx.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Raid1 reshape will change mempool and r1conf::raid_disks which are
needed to free r1bio. allow_barrier() make a concurrent raid1_reshape()
possible. So, free the in-flight r1bio before waiting blocked rdev.

Fixes: 6bfe0b499082 ("md: support blocking writes to an array on device failure")
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Xueshi Hu &lt;xueshi.hu@smartx.com&gt;
Link: https://lore.kernel.org/r/20230814135356.1113639-3-xueshi.hu@smartx.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io()</title>
<updated>2023-08-18T04:11:30+00:00</updated>
<author>
<name>Xueshi Hu</name>
<email>xueshi.hu@smartx.com</email>
</author>
<published>2023-08-14T13:53:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c5d736f548ec5aab7e877872417ac23a5c42f1fd'/>
<id>c5d736f548ec5aab7e877872417ac23a5c42f1fd</id>
<content type='text'>
After allow_barrier, a concurrent raid1_reshape() will replace old mempool
and r1conf::raid_disks. Move allow_barrier() to the end of raid_end_bio_io(),
so that r1bio can be freed safely.

Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Xueshi Hu &lt;xueshi.hu@smartx.com&gt;
Link: https://lore.kernel.org/r/20230814135356.1113639-2-xueshi.hu@smartx.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After allow_barrier, a concurrent raid1_reshape() will replace old mempool
and r1conf::raid_disks. Move allow_barrier() to the end of raid_end_bio_io(),
so that r1bio can be freed safely.

Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Xueshi Hu &lt;xueshi.hu@smartx.com&gt;
Link: https://lore.kernel.org/r/20230814135356.1113639-2-xueshi.hu@smartx.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: Hold mddev-&gt;reconfig_mutex when trying to get mddev-&gt;sync_thread</title>
<updated>2023-08-15T16:40:26+00:00</updated>
<author>
<name>Li Lingfeng</name>
<email>lilingfeng3@huawei.com</email>
</author>
<published>2023-08-03T07:17:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7eb8ff02c1df279bf7f7f29b866beb655a9eebe9'/>
<id>7eb8ff02c1df279bf7f7f29b866beb655a9eebe9</id>
<content type='text'>
Commit ba9d9f1a707f ("Revert "md: unlock mddev before reap sync_thread in
action_store"") removed the scenario of calling md_unregister_thread()
without holding mddev-&gt;reconfig_mutex, so add a lock holding check before
acquiring mddev-&gt;sync_thread by passing mdev to md_unregister_thread().

Signed-off-by: Li Lingfeng &lt;lilingfeng3@huawei.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230803071711.2546560-1-lilingfeng@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit ba9d9f1a707f ("Revert "md: unlock mddev before reap sync_thread in
action_store"") removed the scenario of calling md_unregister_thread()
without holding mddev-&gt;reconfig_mutex, so add a lock holding check before
acquiring mddev-&gt;sync_thread by passing mdev to md_unregister_thread().

Signed-off-by: Li Lingfeng &lt;lilingfeng3@huawei.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230803071711.2546560-1-lilingfeng@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: raid1: fix potential OOB in raid1_remove_disk()</title>
<updated>2023-08-15T16:37:27+00:00</updated>
<author>
<name>Zhang Shurong</name>
<email>zhang_shurong@foxmail.com</email>
</author>
<published>2023-07-22T07:53:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8b0472b50bcf0f19a5119b00a53b63579c8e1e4d'/>
<id>8b0472b50bcf0f19a5119b00a53b63579c8e1e4d</id>
<content type='text'>
If rddev-&gt;raid_disk is greater than mddev-&gt;raid_disks, there will be
an out-of-bounds in raid1_remove_disk(). We have already found
similar reports as follows:

1) commit d17f744e883b ("md-raid10: fix KASAN warning")
2) commit 1ebc2cec0b7d ("dm raid: fix KASAN warning in raid5_remove_disk")

Fix this bug by checking whether the "number" variable is
valid.

Signed-off-by: Zhang Shurong &lt;zhang_shurong@foxmail.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/tencent_0D24426FAC6A21B69AC0C03CE4143A508F09@qq.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If rddev-&gt;raid_disk is greater than mddev-&gt;raid_disks, there will be
an out-of-bounds in raid1_remove_disk(). We have already found
similar reports as follows:

1) commit d17f744e883b ("md-raid10: fix KASAN warning")
2) commit 1ebc2cec0b7d ("dm raid: fix KASAN warning in raid5_remove_disk")

Fix this bug by checking whether the "number" variable is
valid.

Signed-off-by: Zhang Shurong &lt;zhang_shurong@foxmail.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/tencent_0D24426FAC6A21B69AC0C03CE4143A508F09@qq.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid1: Avoid lock contention from wake_up()</title>
<updated>2023-07-27T07:13:30+00:00</updated>
<author>
<name>Jack Wang</name>
<email>jinpu.wang@ionos.com</email>
</author>
<published>2023-07-05T11:32:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=21bd9a68fef47c4f0e951be9a6fac9745cee1bab'/>
<id>21bd9a68fef47c4f0e951be9a6fac9745cee1bab</id>
<content type='text'>
wake_up is called unconditionally in a few paths such as make_request(),
which cause lock contention under high concurrency workload like below
    raid1_end_write_request
     wake_up
      __wake_up_common_lock
       spin_lock_irqsave

Improve performance by only call wake_up() if waitqueue is not empty

Fio test script:

[global]
name=random reads and writes
ioengine=libaio
direct=1
readwrite=randrw
rwmixread=70
iodepth=64
buffered=0
filename=/dev/md0
size=1G
runtime=30
time_based
randrepeat=0
norandommap
refill_buffers
ramp_time=10
bs=4k
numjobs=400
group_reporting=1
[job1]

Test result with 2 ramdisk in raid1 on a Intel Broadwell 56 cores server.

	Before this patch       With this patch
	READ	BW=4621MB/s	BW=7337MB/s
	WRITE	BW=1980MB/s	BW=3144MB/s

The patch is inspired by Yu Kuai's change for raid10:
https://lore.kernel.org/r/20230621105728.1268542-1-yukuai1@huaweicloud.com

Cc: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Jack Wang &lt;jinpu.wang@ionos.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230705113227.148494-1-jinpu.wang@ionos.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
wake_up is called unconditionally in a few paths such as make_request(),
which cause lock contention under high concurrency workload like below
    raid1_end_write_request
     wake_up
      __wake_up_common_lock
       spin_lock_irqsave

Improve performance by only call wake_up() if waitqueue is not empty

Fio test script:

[global]
name=random reads and writes
ioengine=libaio
direct=1
readwrite=randrw
rwmixread=70
iodepth=64
buffered=0
filename=/dev/md0
size=1G
runtime=30
time_based
randrepeat=0
norandommap
refill_buffers
ramp_time=10
bs=4k
numjobs=400
group_reporting=1
[job1]

Test result with 2 ramdisk in raid1 on a Intel Broadwell 56 cores server.

	Before this patch       With this patch
	READ	BW=4621MB/s	BW=7337MB/s
	WRITE	BW=1980MB/s	BW=3144MB/s

The patch is inspired by Yu Kuai's change for raid10:
https://lore.kernel.org/r/20230621105728.1268542-1-yukuai1@huaweicloud.com

Cc: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Jack Wang &lt;jinpu.wang@ionos.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230705113227.148494-1-jinpu.wang@ionos.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
