<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/block/drbd/drbd_receiver.c, branch v3.9</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>drbd: close race between drbd_set_role and drbd_connect</title>
<updated>2012-12-06T12:00:33+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-11-22T16:06:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=13c76aba7846647f86d479293ae0a0adc1ca840a'/>
<id>13c76aba7846647f86d479293ae0a0adc1ca840a</id>
<content type='text'>
drbd_set_role(, R_PRIMARY, ) does the state change to Primary,
some more housekeeping, and possibly generates a new UUID set.

All of this holding the "state_mutex".

The connection handshake involves sending of various state information,
including the current data generation UUID set, and two connection
state changes from C_WF_CONNECTION to C_WF_REPORT_PARAMS further to
a number of different outcomes, resync being one of them.

If the connection handshake happens between the state change to Primary
and the generation of the new UUIDs, the resync decision based on the
old UUID set may be confused, depending on circumstances.

Make sure that, before we do the handshake, any promotion to Primary
role will either be complete (including the housekeeping stuff), or can
see, and serialize with, the ongoing handshake, based on the
"STATE_SENT" bit, which is set when we start the handshake, and cleared
only when we leave C_WF_REPORT_PARAMS again.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
drbd_set_role(, R_PRIMARY, ) does the state change to Primary,
some more housekeeping, and possibly generates a new UUID set.

All of this holding the "state_mutex".

The connection handshake involves sending of various state information,
including the current data generation UUID set, and two connection
state changes from C_WF_CONNECTION to C_WF_REPORT_PARAMS further to
a number of different outcomes, resync being one of them.

If the connection handshake happens between the state change to Primary
and the generation of the new UUIDs, the resync decision based on the
old UUID set may be confused, depending on circumstances.

Make sure that, before we do the handshake, any promotion to Primary
role will either be complete (including the housekeeping stuff), or can
see, and serialize with, the ongoing handshake, based on the
"STATE_SENT" bit, which is set when we start the handshake, and cleared
only when we leave C_WF_REPORT_PARAMS again.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: if the replication link breaks during handshake, keep retrying</title>
<updated>2012-11-09T13:22:19+00:00</updated>
<author>
<name>Lars Ellenberg</name>
<email>lars.ellenberg@linbit.com</email>
</author>
<published>2012-11-05T10:54:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ed635cb0674d6e4303d1a2e27d9e6e80b451a338'/>
<id>ed635cb0674d6e4303d1a2e27d9e6e80b451a338</id>
<content type='text'>
The 8.3.12 commit drbd: Bugfix for the connection behavior fixes a
"wasted established connection", if a former connection attempt failed
during its early stages.

However it opened a window for a regression, if a connection attempt
fails during its last stages.  The result was a terminated receiver
thread, that left behind the supposedly transient "C_UNCONNECTED" state.
Any later requests to change the connection state fail, as they wait for
the connection state to "stabilize".

Fix: short circuit and keep retrying to restablish a new connection,
if we don't reach C_WF_REPORT_PARAMS.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The 8.3.12 commit drbd: Bugfix for the connection behavior fixes a
"wasted established connection", if a former connection attempt failed
during its early stages.

However it opened a window for a regression, if a connection attempt
fails during its last stages.  The result was a terminated receiver
thread, that left behind the supposedly transient "C_UNCONNECTED" state.
Any later requests to change the connection state fail, as they wait for
the connection state to "stabilize".

Fix: short circuit and keep retrying to restablish a new connection,
if we don't reach C_WF_REPORT_PARAMS.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: check return of kmalloc in receive_uuids</title>
<updated>2012-11-09T13:22:10+00:00</updated>
<author>
<name>Jing Wang</name>
<email>windsdaemon@gmail.com</email>
</author>
<published>2012-10-25T07:00:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=063eacf88cc1394ece125d106c05cba1ca03aa3d'/>
<id>063eacf88cc1394ece125d106c05cba1ca03aa3d</id>
<content type='text'>
Signed-off-by: Jing Wang &lt;windsdaemon@gmail.com&gt;
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Jing Wang &lt;windsdaemon@gmail.com&gt;
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'drbd-8.4_ed6' into for-3.8-drivers-drbd-8.4_ed6</title>
<updated>2012-11-09T13:20:23+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-11-09T13:18:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=986836503e49ccf7e84b813715d344964ec93566'/>
<id>986836503e49ccf7e84b813715d344964ec93566</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Remove duplicate code</title>
<updated>2012-11-09T13:11:38+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-09-03T12:04:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1393b59f8c46001c8dbd47078881483cf97813c3'/>
<id>1393b59f8c46001c8dbd47078881483cf97813c3</id>
<content type='text'>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Call drbd_md_sync() explicitly after a state change on the connection</title>
<updated>2012-11-09T13:11:08+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-08-28T14:48:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=19fffd7b0303e8843aa2decfd43fa57c9d511409'/>
<id>19fffd7b0303e8843aa2decfd43fa57c9d511409</id>
<content type='text'>
Without this, the meta-data gets updates after 5 seconds by the
md_sync_timer. Better to do it immeditaly after a state change.

If the asender detects a network failure, it may take a bit until
the worker processes the according after-conn-state-change work item.

  The worker might be blocked in sending something, i.e. it
  takes until it gets into its timeout. That is 6 seconds by
  default which is longer than the 5 seconds of the md_sync_timer.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Without this, the meta-data gets updates after 5 seconds by the
md_sync_timer. Better to do it immeditaly after a state change.

If the asender detects a network failure, it may take a bit until
the worker processes the according after-conn-state-change work item.

  The worker might be blocked in sending something, i.e. it
  takes until it gets into its timeout. That is 6 seconds by
  default which is longer than the 5 seconds of the md_sync_timer.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: properly call drbd_rs_cancel_all() in drbd_disconnected()</title>
<updated>2012-11-09T13:08:19+00:00</updated>
<author>
<name>Lars Ellenberg</name>
<email>lars.ellenberg@linbit.com</email>
</author>
<published>2012-08-17T13:09:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=08332d73250eec349b055843a503d45a9b5c13b6'/>
<id>08332d73250eec349b055843a503d45a9b5c13b6</id>
<content type='text'>
drbd_disconnected() is supposed to clear the resync lru cache,
by calling drbd_rs_cancel_all().

We must do so before we call drbd_flush_workqueue(), as at least the
callback w_restart_disk_io() may wait for resync progres, and would
otherwise deadlock.

drbd_finish_peer_reqs() may again populate that cache, which will
then potentially be stale after the next resync handshake and bitmap
exchange, we have to do it again after that.

A stale resync lru cache causes no harm but ugly messages like this:
 BAD! sector=196608s enr=6 rs_left=-256 rs_failed=0 count=256 cstate=SyncTarget

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
drbd_disconnected() is supposed to clear the resync lru cache,
by calling drbd_rs_cancel_all().

We must do so before we call drbd_flush_workqueue(), as at least the
callback w_restart_disk_io() may wait for resync progres, and would
otherwise deadlock.

drbd_finish_peer_reqs() may again populate that cache, which will
then potentially be stale after the next resync handshake and bitmap
exchange, we have to do it again after that.

A stale resync lru cache causes no harm but ugly messages like this:
 BAD! sector=196608s enr=6 rs_left=-256 rs_failed=0 count=256 cstate=SyncTarget

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Remove dead code</title>
<updated>2012-11-09T13:08:19+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-08-08T19:19:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=155522df5b8ac24ee66a903e51d5b3023b2a76f9'/>
<id>155522df5b8ac24ee66a903e51d5b3023b2a76f9</id>
<content type='text'>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Avoid NetworkFailure state during disconnect</title>
<updated>2012-11-09T13:08:18+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-08-08T19:19:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b66623e33efbbf55717df7bfc49882371118b866'/>
<id>b66623e33efbbf55717df7bfc49882371118b866</id>
<content type='text'>
Disconnecting is a cluster wide state change. In case the peer node agrees
to the state transition, it sends back the fact on the meta-data connection
and closes both sockets.

In case the node node that initiated the state transfer sees the closing
action on the data-socket, before the P_STATE_CHG_REPLY packet, it was
going into one of the network failure states.

At least with the fencing option set to something else thatn "dont-care",
the unclean shutdown of the connection causes a short IO freeze or
a fence operation.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Disconnecting is a cluster wide state change. In case the peer node agrees
to the state transition, it sends back the fact on the meta-data connection
and closes both sockets.

In case the node node that initiated the state transfer sees the closing
action on the data-socket, before the P_STATE_CHG_REPLY packet, it was
going into one of the network failure states.

At least with the fencing option set to something else thatn "dont-care",
the unclean shutdown of the connection causes a short IO freeze or
a fence operation.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: Protect accesses to the uuid set with a spinlock</title>
<updated>2012-11-09T13:08:04+00:00</updated>
<author>
<name>Philipp Reisner</name>
<email>philipp.reisner@linbit.com</email>
</author>
<published>2012-08-08T19:19:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=39a1aa7f49dc8eae5c8d3a4bf759eb7abeabe6c0'/>
<id>39a1aa7f49dc8eae5c8d3a4bf759eb7abeabe6c0</id>
<content type='text'>
There is at least the worker context, the receiver context, the context of
receiving netlink packts.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is at least the worker context, the receiver context, the context of
receiving netlink packts.

Signed-off-by: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
Signed-off-by: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
