<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/block/drbd/drbd_state.c, branch v3.9</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>drbd: fix potential protocol error and resulting disconnect/reconnect</title>
<updated>2013-01-21T21:58:36+00:00</updated>
<author>
<name>Lars Ellenberg</name>
<email>lars.ellenberg@linbit.com</email>
</author>
<published>2013-01-21T14:43:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2681f7f6ce6c7416eb619d0fb19422bcc68bd9e1'/>
<id>2681f7f6ce6c7416eb619d0fb19422bcc68bd9e1</id>
<content type='text'>
When we notice a disk failure on the receiving side,
we stop sending it new incoming writes.

Depending on exact timing of various events, the same transfer log epoch
could end up containing both replicated (before we noticed the failure)
and local-only requests (after we noticed the failure).

The sanity checks in tl_release(), called when receiving a
P_BARRIER_ACK, check that the ack'ed transfer log epoch matches
the expected epoch, and the number of contained writes matches
the number of ack'ed writes.

In this case, they counted both replicated and local-only writes,
but the peer only acknowledges those it has seen.  We get a mismatch,
resulting in a protocol error and disconnect/reconnect cycle.

Messages logged are
  "BAD! BarrierAck #%u received with n_writes=%u, expected n_writes=%u!\n"

A similar issue can also be triggered when starting a resync while
having a healthy replication link, by invalidating one side, forcing a
full sync, or attaching to a diskless node.

Fix this by closing the current epoch if the state changes in a way
that would cause the replication intent of the next write.

Epochs now contain either only non-replicated,
or only replicated writes.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we notice a disk failure on the receiving side,
we stop sending it new incoming writes.

Depending on exact timing of various events, the same transfer log epoch
could end up containing both replicated (before we noticed the failure)
and local-only requests (after we noticed the failure).

The sanity checks in tl_release(), called when receiving a
P_BARRIER_ACK, check that the ack'ed transfer log epoch matches
the expected epoch, and the number of contained writes matches
the number of ack'ed writes.

In this case, they counted both replicated and local-only writes,
but the peer only acknowledges those it has seen.  We get a mismatch,
resulting in a protocol error and disconnect/reconnect cycle.

Messages logged are
  "BAD! BarrierAck #%u received with n_writes=%u, expected n_writes=%u!\n"

A similar issue can also be triggered when starting a resync while
having a healthy replication link, by invalidating one side, forcing a
full sync, or attaching to a diskless node.

Fix this by closing the current epoch if the state changes in a way
that would cause the replication intent of the next write.

Epochs now contain either only non-replicated,
or only replicated writes.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: fixup after wait_even_lock_irq() addition to generic code</title>
<updated>2012-11-30T20:20:15+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2012-11-30T20:20:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2cecb7309897c872b977a60d53e5de0af4265dd6'/>
<id>2cecb7309897c872b977a60d53e5de0af4265dd6</id>
<content type='text'>
Compiling drbd yields:

drivers/block/drbd/drbd_state.c: In function ‘_conn_request_state’:
drivers/block/drbd/drbd_state.c:1804:5: error: macro "wait_event_lock_irq" passed 4 arguments, but takes just 3
drivers/block/drbd/drbd_state.c:1801:3: error: ‘wait_event_lock_irq’ undeclared (first use in this function)
drivers/block/drbd/drbd_state.c:1801:3: note: each undeclared identifier is reported only once for each function it appears in
drivers/block/drbd/drbd_state.c: At top level:
drivers/block/drbd/drbd_state.c:1734:1: warning: ‘_conn_rq_cond’ defined but not used [-Wunused-function]

Due to drbd having copied the MD definition for wait_event_lock_irq()
as well. Kill them.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Compiling drbd yields:

drivers/block/drbd/drbd_state.c: In function ‘_conn_request_state’:
drivers/block/drbd/drbd_state.c:1804:5: error: macro "wait_event_lock_irq" passed 4 arguments, but takes just 3
drivers/block/drbd/drbd_state.c:1801:3: error: ‘wait_event_lock_irq’ undeclared (first use in this function)
drivers/block/drbd/drbd_state.c:1801:3: note: each undeclared identifier is reported only once for each function it appears in
drivers/block/drbd/drbd_state.c: At top level:
drivers/block/drbd/drbd_state.c:1734:1: warning: ‘_conn_rq_cond’ defined but not used [-Wunused-function]

Due to drbd having copied the MD definition for wait_event_lock_irq()
as well. Kill them.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Fix comparison of is_valid_transition()'s return code</title>
<updated>2012-11-09T13:11:38+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-09-03T13:39:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a3025a273700fc51dd561c7b2941f3c9db9be90a'/>
<id>a3025a273700fc51dd561c7b2941f3c9db9be90a</id>
<content type='text'>
is_valid_transition() might return SS_NOTHING_TO_DO.

The condition function _req_st_cond() returned SS_NOTHING_TO_DO, which
caused the wait_event to abort too early. Therefore drbd_req_state()
did not consume the next CL_ST_CHG_SUCCESS or SS_CW_FAILED_BY_PEER
causing serve disruption of the state machine logic...

Detaching from a single volue was one way to trigger this bug.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
is_valid_transition() might return SS_NOTHING_TO_DO.

The condition function _req_st_cond() returned SS_NOTHING_TO_DO, which
caused the wait_event to abort too early. Therefore drbd_req_state()
did not consume the next CL_ST_CHG_SUCCESS or SS_CW_FAILED_BY_PEER
causing serve disruption of the state machine logic...

Detaching from a single volue was one way to trigger this bug.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Call drbd_md_sync() explicitly after a state change on the connection</title>
<updated>2012-11-09T13:11:08+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-08-28T14:48:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=19fffd7b0303e8843aa2decfd43fa57c9d511409'/>
<id>19fffd7b0303e8843aa2decfd43fa57c9d511409</id>
<content type='text'>
Without this, the meta-data gets updates after 5 seconds by the
md_sync_timer. Better to do it immeditaly after a state change.

If the asender detects a network failure, it may take a bit until
the worker processes the according after-conn-state-change work item.

  The worker might be blocked in sending something, i.e. it
  takes until it gets into its timeout. That is 6 seconds by
  default which is longer than the 5 seconds of the md_sync_timer.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Without this, the meta-data gets updates after 5 seconds by the
md_sync_timer. Better to do it immeditaly after a state change.

If the asender detects a network failure, it may take a bit until
the worker processes the according after-conn-state-change work item.

  The worker might be blocked in sending something, i.e. it
  takes until it gets into its timeout. That is 6 seconds by
  default which is longer than the 5 seconds of the md_sync_timer.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Imporve the error reporting of failed conn state changes</title>
<updated>2012-11-09T13:08:24+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-08-28T09:46:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4ae98b4db3d0e9336d0b771636b7d053a1460805'/>
<id>4ae98b4db3d0e9336d0b771636b7d053a1460805</id>
<content type='text'>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Fix the way the STATE_SENT bit is cleared</title>
<updated>2012-11-09T13:08:23+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-08-28T09:33:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=797020117761eee21ef284cea90c51c690fca169'/>
<id>797020117761eee21ef284cea90c51c690fca169</id>
<content type='text'>
With merging the commit
'drbd: Delay/reject other state changes while establishing a connection'
the condition check for clearing the flag was wrong.

Move the bit clearing to the __drbd_set_state() function
in order to have it already cleared for the other parts of
the function. I.e. clearing the susp_fen in the after_state_ch() function.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With merging the commit
'drbd: Delay/reject other state changes while establishing a connection'
the condition check for clearing the flag was wrong.

Move the bit clearing to the __drbd_set_state() function
in order to have it already cleared for the other parts of
the function. I.e. clearing the susp_fen in the after_state_ch() function.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Do not check aspects that are not subject to change in _conn_requests_state()</title>
<updated>2012-11-09T13:08:23+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-08-28T09:07:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=07fc96197aec46f7ad4f08a4b2a1ea426112e24d'/>
<id>07fc96197aec46f7ad4f08a4b2a1ea426112e24d</id>
<content type='text'>
When _conn_requests_state() is used to change other parts of the state
than the connection, do not check for a valid connection transition.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When _conn_requests_state() is used to change other parts of the state
than the connection, do not check for a valid connection transition.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Improve readability of IO resuming after freeze due to no data access</title>
<updated>2012-11-09T13:08:22+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-08-27T15:20:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=892fdd1aee35803bfa41ccf4df850ddb70f0811c'/>
<id>892fdd1aee35803bfa41ccf4df850ddb70f0811c</id>
<content type='text'>
The previous way of doing the state change was also okay since the
state change on the susp flag gets propagated from the mdev
to the tconn.

Fortunately all this goes away in drbd-9.0

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The previous way of doing the state change was also okay since the
state change on the susp flag gets propagated from the mdev
to the tconn.

Fortunately all this goes away in drbd-9.0

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Fix IO resuming after connection was established while executing the fence handler</title>
<updated>2012-11-09T13:08:22+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-08-27T15:16:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=88f79ec4ae22a82d7b9a0cd4c9f3ee1def2a382c'/>
<id>88f79ec4ae22a82d7b9a0cd4c9f3ee1def2a382c</id>
<content type='text'>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: mutex_unlock "... must no be used in interrupt context"</title>
<updated>2012-11-09T13:08:21+00:00</updated>
<author>
<name>Lars Ellenberg</name>
<email>lars.ellenberg@linbit.com</email>
</author>
<published>2012-08-22T14:15:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c02abda2b28734a7e97e15db866507b3cb92b7d0'/>
<id>c02abda2b28734a7e97e15db866507b3cb92b7d0</id>
<content type='text'>
Documentation of mutex_unlock says
we must not use it in interrupt context.
So do not call it while holding the spin_lock_irq,
but give up the spinlock temporarily.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Documentation of mutex_unlock says
we must not use it in interrupt context.
So do not call it while holding the spin_lock_irq,
but give up the spinlock temporarily.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
