<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/md/raid5.c, branch v4.10</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>md/r5cache: disable write back for degraded array</title>
<updated>2017-01-24T19:26:06+00:00</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2017-01-24T18:45:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2e38a37f23c98d7fad87ff022670060b8a0e2bf5'/>
<id>2e38a37f23c98d7fad87ff022670060b8a0e2bf5</id>
<content type='text'>
write-back cache in degraded mode introduces corner cases to the array.
Although we try to cover all these corner cases, it is safer to just
disable write-back cache when the array is in degraded mode.

In this patch, we disable writeback cache for degraded mode:
1. On device failure, if the array enters degraded mode, raid5_error()
   will submit async job r5c_disable_writeback_async to disable
   writeback;
2. In r5c_journal_mode_store(), it is invalid to enable writeback in
   degraded mode;
3. In r5c_try_caching_write(), stripes with s-&gt;failed&gt;0 will be handled
   in write-through mode.

Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
write-back cache in degraded mode introduces corner cases to the array.
Although we try to cover all these corner cases, it is safer to just
disable write-back cache when the array is in degraded mode.

In this patch, we disable writeback cache for degraded mode:
1. On device failure, if the array enters degraded mode, raid5_error()
   will submit async job r5c_disable_writeback_async to disable
   writeback;
2. In r5c_journal_mode_store(), it is invalid to enable writeback in
   degraded mode;
3. In r5c_try_caching_write(), stripes with s-&gt;failed&gt;0 will be handled
   in write-through mode.

Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/r5cache: shift complex rmw from read path to write path</title>
<updated>2017-01-24T19:20:15+00:00</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2017-01-24T01:12:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=07e83364845e1e1c7e189a01206a9d7d33831568'/>
<id>07e83364845e1e1c7e189a01206a9d7d33831568</id>
<content type='text'>
Write back cache requires a complex RMW mechanism, where old data is
read into dev-&gt;orig_page for prexor, and then xor is done with
dev-&gt;page. This logic is already implemented in the write path.

However, current read path is not awared of this requirement. When
the array is optimal, the RMW is not required, as the data are
read from raid disks. However, when the target stripe is degraded,
complex RMW is required to generate right data.

To keep read path as clean as possible, we handle read path by
flushing degraded, in-journal stripes before processing reads to
missing dev.

Specifically, when there is read requests to a degraded stripe
with data in journal, handle_stripe_fill() calls
r5c_make_stripe_write_out() and exits. Then handle_stripe_dirtying()
will do the complex RMW and flush the stripe to RAID disks. After
that, read requests are handled.

There is one more corner case when there is non-overwrite bio for
the missing (or out of sync) dev. handle_stripe_dirtying() will not
be able to process the non-overwrite bios without constructing the
data in handle_stripe_fill(). This is fixed by delaying non-overwrite
bios in handle_stripe_dirtying(). So handle_stripe_fill() works on
these bios after the stripe is flushed to raid disks.

Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Write back cache requires a complex RMW mechanism, where old data is
read into dev-&gt;orig_page for prexor, and then xor is done with
dev-&gt;page. This logic is already implemented in the write path.

However, current read path is not awared of this requirement. When
the array is optimal, the RMW is not required, as the data are
read from raid disks. However, when the target stripe is degraded,
complex RMW is required to generate right data.

To keep read path as clean as possible, we handle read path by
flushing degraded, in-journal stripes before processing reads to
missing dev.

Specifically, when there is read requests to a degraded stripe
with data in journal, handle_stripe_fill() calls
r5c_make_stripe_write_out() and exits. Then handle_stripe_dirtying()
will do the complex RMW and flush the stripe to RAID disks. After
that, read requests are handled.

There is one more corner case when there is non-overwrite bio for
the missing (or out of sync) dev. handle_stripe_dirtying() will not
be able to process the non-overwrite bios without constructing the
data in handle_stripe_fill(). This is fixed by delaying non-overwrite
bios in handle_stripe_dirtying(). So handle_stripe_fill() works on
these bios after the stripe is flushed to raid disks.

Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: move comment of fetch_block to right location</title>
<updated>2017-01-24T19:20:14+00:00</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2017-01-13T01:22:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ba02684daf7fb4a827580f909b7c7db61c05ae7d'/>
<id>ba02684daf7fb4a827580f909b7c7db61c05ae7d</id>
<content type='text'>
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/r5cache: read data into orig_page for prexor of cached data</title>
<updated>2017-01-24T19:20:14+00:00</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2017-01-13T01:22:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=86aa1397ddfde563b3692adadb8b8e32e97b4e5e'/>
<id>86aa1397ddfde563b3692adadb8b8e32e97b4e5e</id>
<content type='text'>
With write back cache, we use orig_page to do prexor. This patch
makes sure we read data into orig_page for it.

Flag R5_OrigPageUPTDODATE is added to show whether orig_page
has the latest data from raid disk.

We introduce a helper function uptodate_for_rmw() to simplify
the a couple conditions in handle_stripe_dirtying().

Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With write back cache, we use orig_page to do prexor. This patch
makes sure we read data into orig_page for it.

Flag R5_OrigPageUPTDODATE is added to show whether orig_page
has the latest data from raid disk.

We introduce a helper function uptodate_for_rmw() to simplify
the a couple conditions in handle_stripe_dirtying().

Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: Use correct IS_ERR() variation on pointer check</title>
<updated>2017-01-09T21:58:10+00:00</updated>
<author>
<name>Jes Sorensen</name>
<email>Jes.Sorensen@redhat.com</email>
</author>
<published>2017-01-07T00:31:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=32cd7cbbacf700885a2316275f188f2d5739b5f4'/>
<id>32cd7cbbacf700885a2316275f188f2d5739b5f4</id>
<content type='text'>
This fixes a build error on certain architectures, such as ppc64.

Fixes: 6995f0b247e("md: takeover should clear unrelated bits")
Signed-off-by: Jes Sorensen &lt;Jes.Sorensen@redhat.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This fixes a build error on certain architectures, such as ppc64.

Fixes: 6995f0b247e("md: takeover should clear unrelated bits")
Signed-off-by: Jes Sorensen &lt;Jes.Sorensen@redhat.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: cleanup mddev flag clear for takeover</title>
<updated>2017-01-05T19:45:18+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-01-05T00:10:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=394ed8e4743b0cfc5496fe49059fbfc2bc8eae35'/>
<id>394ed8e4743b0cfc5496fe49059fbfc2bc8eae35</id>
<content type='text'>
Commit 6995f0b (md: takeover should clear unrelated bits) clear
unrelated bits, but it's quite fragile. To avoid error in the future,
define a macro for unsupported mddev flags for each raid type and use it
to clear unsupported mddev flags. This should be less error-prone.

Suggested-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 6995f0b (md: takeover should clear unrelated bits) clear
unrelated bits, but it's quite fragile. To avoid error in the future,
define a macro for unsupported mddev flags for each raid type and use it
to clear unsupported mddev flags. This should be less error-prone.

Suggested-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'md-next' into md-linus</title>
<updated>2016-12-13T20:40:15+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2016-12-13T20:40:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=20737738d397dfadbca1ea50dcc00d7259f500cf'/>
<id>20737738d397dfadbca1ea50dcc00d7259f500cf</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>md: separate flags for superblock changes</title>
<updated>2016-12-09T06:01:47+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2016-12-08T23:48:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2953079c692da067aeb6345659875b97378f9b0a'/>
<id>2953079c692da067aeb6345659875b97378f9b0a</id>
<content type='text'>
The mddev-&gt;flags are used for different purposes. There are a lot of
places we check/change the flags without masking unrelated flags, we
could check/change unrelated flags. These usage are most for superblock
write, so spearate superblock related flags. This should make the code
clearer and also fix real bugs.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The mddev-&gt;flags are used for different purposes. There are a lot of
places we check/change the flags without masking unrelated flags, we
could check/change unrelated flags. These usage are most for superblock
write, so spearate superblock related flags. This should make the code
clearer and also fix real bugs.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: takeover should clear unrelated bits</title>
<updated>2016-12-09T06:00:11+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2016-12-08T23:48:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6995f0b247e15e34fbcd10852c08b30bdc1a78da'/>
<id>6995f0b247e15e34fbcd10852c08b30bdc1a78da</id>
<content type='text'>
When we change level from raid1 to raid5, the MD_FAILFAST_SUPPORTED bit
will be accidentally set, but raid5 doesn't support it. The same is true
for the MD_HAS_JOURNAL bit.

Fix: 46533ff (md: Use REQ_FAILFAST_* on metadata writes where appropriate)
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we change level from raid1 to raid5, the MD_FAILFAST_SUPPORTED bit
will be accidentally set, but raid5 doesn't support it. The same is true
for the MD_HAS_JOURNAL bit.

Fix: 46533ff (md: Use REQ_FAILFAST_* on metadata writes where appropriate)
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: limit request size according to implementation limits</title>
<updated>2016-11-29T23:53:21+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2016-11-27T16:32:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e8d7c33232e5fdfa761c3416539bc5b4acd12db5'/>
<id>e8d7c33232e5fdfa761c3416539bc5b4acd12db5</id>
<content type='text'>
Current implementation employ 16bit counter of active stripes in lower
bits of bio-&gt;bi_phys_segments. If request is big enough to overflow
this counter bio will be completed and freed too early.

Fortunately this not happens in default configuration because several
other limits prevent that: stripe_cache_size * nr_disks effectively
limits count of active stripes. And small max_sectors_kb at lower
disks prevent that during normal read/write operations.

Overflow easily happens in discard if it's enabled by module parameter
"devices_handle_discard_safely" and stripe_cache_size is set big enough.

This patch limits requests size with 256Mb - 8Kb to prevent overflows.

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: Neil Brown &lt;neilb@suse.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Current implementation employ 16bit counter of active stripes in lower
bits of bio-&gt;bi_phys_segments. If request is big enough to overflow
this counter bio will be completed and freed too early.

Fortunately this not happens in default configuration because several
other limits prevent that: stripe_cache_size * nr_disks effectively
limits count of active stripes. And small max_sectors_kb at lower
disks prevent that during normal read/write operations.

Overflow easily happens in discard if it's enabled by module parameter
"devices_handle_discard_safely" and stripe_cache_size is set big enough.

This patch limits requests size with 256Mb - 8Kb to prevent overflows.

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: Neil Brown &lt;neilb@suse.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
