<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/block/drbd/drbd_worker.c, branch v3.6.7</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>drbd: do not reset rs_pending_cnt too early</title>
<updated>2012-07-24T12:09:53+00:00</updated>
<author>
<name>Lars Ellenberg</name>
<email>lars.ellenberg@linbit.com</email>
</author>
<published>2012-06-14T16:02:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0029d62434d9045bc3e8b2eb48ae696e30336e92'/>
<id>0029d62434d9045bc3e8b2eb48ae696e30336e92</id>
<content type='text'>
Fix asserts like
  block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 &lt; 0 !

We reset the resync lru cache and related information (rs_pending_cnt),
once we successfully finished a resync or online verify, or if the
replication connection is lost.

We also need to reset it if a resync or online verify is aborted
because a lower level disk failed.

In that case the replication link is still established,
and we may still have packets queued in the network buffers
which want to touch rs_pending_cnt.

We do not have any synchronization mechanism to know for sure when all
such pending resync related packets have been drained.

To avoid this counter to go negative (and violate the ASSERT that it
will always be &gt;= 0), just do not reset it when we lose a disk.

It is good enough to make sure it is re-initialized before the next
resync can start: reset it when we re-attach a disk.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix asserts like
  block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 &lt; 0 !

We reset the resync lru cache and related information (rs_pending_cnt),
once we successfully finished a resync or online verify, or if the
replication connection is lost.

We also need to reset it if a resync or online verify is aborted
because a lower level disk failed.

In that case the replication link is still established,
and we may still have packets queued in the network buffers
which want to touch rs_pending_cnt.

We do not have any synchronization mechanism to know for sure when all
such pending resync related packets have been drained.

To avoid this counter to go negative (and violate the ASSERT that it
will always be &gt;= 0), just do not reset it when we lose a disk.

It is good enough to make sure it is re-initialized before the next
resync can start: reset it when we re-attach a disk.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: differentiate between normal and forced detach</title>
<updated>2012-07-24T12:06:18+00:00</updated>
<author>
<name>Lars Ellenberg</name>
<email>lars.ellenberg@linbit.com</email>
</author>
<published>2012-06-14T12:21:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=383606e0dea6a380097dbcb0c319b09ca372f36b'/>
<id>383606e0dea6a380097dbcb0c319b09ca372f36b</id>
<content type='text'>
Aborting local requests (not waiting for completion from the lower level
disk) is dangerous: if the master bio has been completed to upper
layers, data pages may be re-used for other things already.
If local IO is still pending and later completes,
this may cause crashes or corrupt unrelated data.

Only abort local IO if explicitly requested.
Intended use case is a lower level device that turned into a tarpit,
not completing io requests, not even doing error completion.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Aborting local requests (not waiting for completion from the lower level
disk) is dangerous: if the master bio has been completed to upper
layers, data pages may be re-used for other things already.
If local IO is still pending and later completes,
this may cause crashes or corrupt unrelated data.

Only abort local IO if explicitly requested.
Intended use case is a lower level device that turned into a tarpit,
not completing io requests, not even doing error completion.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: move put_ldev from __req_mod() to the endio callback</title>
<updated>2012-05-09T13:16:51+00:00</updated>
<author>
<name>Lars Ellenberg</name>
<email>lars.ellenberg@linbit.com</email>
</author>
<published>2012-01-16T14:04:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=46385c84acd6654d3a38c9c7af1921dbded74aa2'/>
<id>46385c84acd6654d3a38c9c7af1921dbded74aa2</id>
<content type='text'>
One invocation in the endio handler is good enough,
we don't need mention it for each of the different ways
it calls __req_mod().

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
One invocation in the endio handler is good enough,
we don't need mention it for each of the different ways
it calls __req_mod().

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: fix potential data corruption and protocol error</title>
<updated>2012-05-09T13:16:39+00:00</updated>
<author>
<name>Lars Ellenberg</name>
<email>lars.ellenberg@linbit.com</email>
</author>
<published>2012-03-08T15:43:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=001a88687aff26d62f8b61d55c6973618cf0f72f'/>
<id>001a88687aff26d62f8b61d55c6973618cf0f72f</id>
<content type='text'>
We assumed only bios with bi_idx == 0 would end up
in drbd_make_request().

That is wrong.

At least device mapper, in __clone_and_map(), may submit
clones only covering a partial bio, but sharing
the original bvec, by adjusting bi_idx and relevant
other bio members of the clone.

We used __bio_for_each_segment() in various places,
even though that is documented as
 * drivers should not use the __ version unless they _really_ want to
 * run through the entire bio and not just pending pieces

Impact: we would send the full bio bvec, even for the clone
with bi_idx &gt; 0, which will cause data corruption on the
peer (because we submit wrong data at the clone offset),
and will cause a DRBD protocol error, disconnect/reconnect
and resync (thus fixing the corruption),
because the next package header would be expected right
in the middle of the sent data, causing DRBD magic mismatch.

Fix: drop the assert, and use bio_for_each_segment()
instead of the __ version.

Conflicts:

	drbd/drbd_tracing.c

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We assumed only bios with bi_idx == 0 would end up
in drbd_make_request().

That is wrong.

At least device mapper, in __clone_and_map(), may submit
clones only covering a partial bio, but sharing
the original bvec, by adjusting bi_idx and relevant
other bio members of the clone.

We used __bio_for_each_segment() in various places,
even though that is documented as
 * drivers should not use the __ version unless they _really_ want to
 * run through the entire bio and not just pending pieces

Impact: we would send the full bio bvec, even for the clone
with bi_idx &gt; 0, which will cause data corruption on the
peer (because we submit wrong data at the clone offset),
and will cause a DRBD protocol error, disconnect/reconnect
and resync (thus fixing the corruption),
because the next package header would be expected right
in the middle of the sent data, causing DRBD magic mismatch.

Fix: drop the assert, and use bio_for_each_segment()
instead of the __ version.

Conflicts:

	drbd/drbd_tracing.c

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Fixed an obvious copy-n-paste mistake</title>
<updated>2012-05-09T13:15:57+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2011-11-09T20:04:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e89868a0927cfb8a3f535c938e5d6dd7edc6353c'/>
<id>e89868a0927cfb8a3f535c938e5d6dd7edc6353c</id>
<content type='text'>
This bug might have caused troubles if disk-barriers and the ahead-behind
more are enabled at the same time.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This bug might have caused troubles if disk-barriers and the ahead-behind
more are enabled at the same time.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Fixed a race condition between detach and start of resync</title>
<updated>2012-05-09T13:15:53+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2011-10-06T16:29:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=aaae506d545bb9d06f4d8362f670f406f12e4b58'/>
<id>aaae506d545bb9d06f4d8362f670f406f12e4b58</id>
<content type='text'>
drbd_state_lock() is only there to serialize cluster wide state
changes. Testing the local disk state needs to happen while
holding the global_state_lock.

Otherwise you might see something like this (Oct 6 on kugel)
14:20:24 drbd0: conn( WFSyncUUID -&gt; Connected ) disk( Inconsistent -&gt; Failed )
14:20:24 drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
14:20:24 drbd0: conn( Connected -&gt; SyncTarget ) disk( Failed -&gt; Inconsistent )

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
drbd_state_lock() is only there to serialize cluster wide state
changes. Testing the local disk state needs to happen while
holding the global_state_lock.

Otherwise you might see something like this (Oct 6 on kugel)
14:20:24 drbd0: conn( WFSyncUUID -&gt; Connected ) disk( Inconsistent -&gt; Failed )
14:20:24 drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
14:20:24 drbd0: conn( Connected -&gt; SyncTarget ) disk( Failed -&gt; Inconsistent )

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: fix harmless race to not trigger an ASSERT</title>
<updated>2012-05-09T13:15:52+00:00</updated>
<author>
<name>Lars Ellenberg</name>
<email>lars.ellenberg@linbit.com</email>
</author>
<published>2011-10-06T15:10:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6a9a92f4ef05bb3e94bbfe123c21482fa5da9866'/>
<id>6a9a92f4ef05bb3e94bbfe123c21482fa5da9866</id>
<content type='text'>
We have one pre-allocated page to do certain synchronous meta data IO with,
using it is serialized like so:
	drbd_md_get_buffer();
	drbd_md_sync_page_io();
	drbd_md_sync_page_io();
	...
	drbd_md_put_buffer();

In drbd_md_sync_page_io() there is an
	ASSERT(atomic_read(&amp;mdev-&gt;md_io_in_use) == 1);

We want to be able to timeout on unresponsive lower level devices, so we
can "detach" in that case. Inside drbd_md_sync_page_io() we grab an extra
reference, to not have a dangling pointer in case a delayed IO eventually
does still complete, even after we "detached" already.

We need to put the extra reference before we signal completion from the
completion handler, or the second drbd_md_sync_page_io() above may
trigger the assert (reference count still 2).

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have one pre-allocated page to do certain synchronous meta data IO with,
using it is serialized like so:
	drbd_md_get_buffer();
	drbd_md_sync_page_io();
	drbd_md_sync_page_io();
	...
	drbd_md_put_buffer();

In drbd_md_sync_page_io() there is an
	ASSERT(atomic_read(&amp;mdev-&gt;md_io_in_use) == 1);

We want to be able to timeout on unresponsive lower level devices, so we
can "detach" in that case. Inside drbd_md_sync_page_io() we grab an extra
reference, to not have a dangling pointer in case a delayed IO eventually
does still complete, even after we "detached" already.

We need to put the extra reference before we signal completion from the
completion handler, or the second drbd_md_sync_page_io() above may
trigger the assert (reference count still 2).

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Hold a reference to ldev while doing meta-data IO</title>
<updated>2012-05-09T08:31:11+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2011-06-27T10:23:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b2057629ea96c33e4ae38102ecd0f27ed9a3c3ef'/>
<id>b2057629ea96c33e4ae38102ecd0f27ed9a3c3ef</id>
<content type='text'>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Keep a reference to the bio until the completion handler finished</title>
<updated>2012-05-09T08:28:51+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2011-07-04T09:14:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4a2fe568b5428abc56d7d172e3571e33d8ab7265'/>
<id>4a2fe568b5428abc56d7d172e3571e33d8ab7265</id>
<content type='text'>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Implemented wait_until_done_or_disk_failure()</title>
<updated>2012-05-09T08:26:51+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2011-06-26T20:26:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0c464425158482647226fb30708c68fffc061585'/>
<id>0c464425158482647226fb30708c68fffc061585</id>
<content type='text'>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
