<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/md/md-cluster.c, branch v4.10</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>md-cluster: make resync lock also could be interruptted</title>
<updated>2016-09-21T16:09:44+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2016-08-12T05:42:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d6385db94196b253ae5eb3678fa95cdf1f839fcc'/>
<id>d6385db94196b253ae5eb3678fa95cdf1f839fcc</id>
<content type='text'>
When one node is perform resync or recovery, other nodes
can't get resync lock and could block for a while before
it holds the lock, so we can't stop array immediately for
this scenario.

To make array could be stop quickly, we check MD_CLOSING
in dlm_lock_sync_interruptible to make us can interrupt
the lock request.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When one node is perform resync or recovery, other nodes
can't get resync lock and could block for a while before
it holds the lock, so we can't stop array immediately for
this scenario.

To make array could be stop quickly, we check MD_CLOSING
in dlm_lock_sync_interruptible to make us can interrupt
the lock request.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang</title>
<updated>2016-09-21T16:09:44+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2016-08-12T05:42:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7bcda7149dd6911bee15a51b22477f592f8b9620'/>
<id>7bcda7149dd6911bee15a51b22477f592f8b9620</id>
<content type='text'>
When some node leaves cluster, then it's bitmap need to be
synced by another node, so "md*_recover" thread is triggered
for the purpose. However, with below steps. we can find tasks
hang happened either in B or C.

1. Node A create a resyncing cluster raid1, assemble it in
   other two nodes (B and C).
2. stop array in B and C.
3. stop array in A.

linux44:~ # ps aux|grep md|grep D
root	5938	0.0  0.1  19852  1964 pts/0    D+   14:52   0:00 mdadm -S md0
root	5939	0.0  0.0      0     0 ?        D    14:52   0:00 [md0_recover]

linux44:~ # cat /proc/5939/stack
[&lt;ffffffffa04cf321&gt;] dlm_lock_sync+0x71/0x90 [md_cluster]
[&lt;ffffffffa04d0705&gt;] recover_bitmaps+0x125/0x220 [md_cluster]
[&lt;ffffffffa052105d&gt;] md_thread+0x16d/0x180 [md_mod]
[&lt;ffffffff8107ad94&gt;] kthread+0xb4/0xc0
[&lt;ffffffff8152a518&gt;] ret_from_fork+0x58/0x90

linux44:~ # cat /proc/5938/stack
[&lt;ffffffff8107afde&gt;] kthread_stop+0x6e/0x120
[&lt;ffffffffa0519da0&gt;] md_unregister_thread+0x40/0x80 [md_mod]
[&lt;ffffffffa04cfd20&gt;] leave+0x70/0x120 [md_cluster]
[&lt;ffffffffa0525e24&gt;] md_cluster_stop+0x14/0x30 [md_mod]
[&lt;ffffffffa05269ab&gt;] bitmap_free+0x14b/0x150 [md_mod]
[&lt;ffffffffa0523f3b&gt;] do_md_stop+0x35b/0x5a0 [md_mod]
[&lt;ffffffffa0524e83&gt;] md_ioctl+0x873/0x1590 [md_mod]
[&lt;ffffffff81288464&gt;] blkdev_ioctl+0x214/0x7d0
[&lt;ffffffff811dd3dd&gt;] block_ioctl+0x3d/0x40
[&lt;ffffffff811b92d4&gt;] do_vfs_ioctl+0x2d4/0x4b0
[&lt;ffffffff811b9538&gt;] SyS_ioctl+0x88/0xa0
[&lt;ffffffff8152a5c9&gt;] system_call_fastpath+0x16/0x1b

The problem is caused by recover_bitmaps can't reliably abort
when the thread is unregistered. So dlm_lock_sync_interruptible
is introduced to detect the thread's situation to fix the problem.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When some node leaves cluster, then it's bitmap need to be
synced by another node, so "md*_recover" thread is triggered
for the purpose. However, with below steps. we can find tasks
hang happened either in B or C.

1. Node A create a resyncing cluster raid1, assemble it in
   other two nodes (B and C).
2. stop array in B and C.
3. stop array in A.

linux44:~ # ps aux|grep md|grep D
root	5938	0.0  0.1  19852  1964 pts/0    D+   14:52   0:00 mdadm -S md0
root	5939	0.0  0.0      0     0 ?        D    14:52   0:00 [md0_recover]

linux44:~ # cat /proc/5939/stack
[&lt;ffffffffa04cf321&gt;] dlm_lock_sync+0x71/0x90 [md_cluster]
[&lt;ffffffffa04d0705&gt;] recover_bitmaps+0x125/0x220 [md_cluster]
[&lt;ffffffffa052105d&gt;] md_thread+0x16d/0x180 [md_mod]
[&lt;ffffffff8107ad94&gt;] kthread+0xb4/0xc0
[&lt;ffffffff8152a518&gt;] ret_from_fork+0x58/0x90

linux44:~ # cat /proc/5938/stack
[&lt;ffffffff8107afde&gt;] kthread_stop+0x6e/0x120
[&lt;ffffffffa0519da0&gt;] md_unregister_thread+0x40/0x80 [md_mod]
[&lt;ffffffffa04cfd20&gt;] leave+0x70/0x120 [md_cluster]
[&lt;ffffffffa0525e24&gt;] md_cluster_stop+0x14/0x30 [md_mod]
[&lt;ffffffffa05269ab&gt;] bitmap_free+0x14b/0x150 [md_mod]
[&lt;ffffffffa0523f3b&gt;] do_md_stop+0x35b/0x5a0 [md_mod]
[&lt;ffffffffa0524e83&gt;] md_ioctl+0x873/0x1590 [md_mod]
[&lt;ffffffff81288464&gt;] blkdev_ioctl+0x214/0x7d0
[&lt;ffffffff811dd3dd&gt;] block_ioctl+0x3d/0x40
[&lt;ffffffff811b92d4&gt;] do_vfs_ioctl+0x2d4/0x4b0
[&lt;ffffffff811b9538&gt;] SyS_ioctl+0x88/0xa0
[&lt;ffffffff8152a5c9&gt;] system_call_fastpath+0x16/0x1b

The problem is caused by recover_bitmaps can't reliably abort
when the thread is unregistered. So dlm_lock_sync_interruptible
is introduced to detect the thread's situation to fix the problem.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md-cluster: convert the completion to wait queue</title>
<updated>2016-09-21T16:09:44+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2016-08-12T05:42:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fccb60a42cdd863aa80f32214ae58ae13936c927'/>
<id>fccb60a42cdd863aa80f32214ae58ae13936c927</id>
<content type='text'>
Previously, we used completion to sync between require dlm lock
and sync_ast, however we will have to expose completion.wait
and completion.done in dlm_lock_sync_interruptible (introduced
later), it is not a common usage for completion, so convert
related things to wait queue.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously, we used completion to sync between require dlm lock
and sync_ast, however we will have to expose completion.wait
and completion.done in dlm_lock_sync_interruptible (introduced
later), it is not a common usage for completion, so convert
related things to wait queue.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md-cluster: protect md_find_rdev_nr_rcu with rcu lock</title>
<updated>2016-09-21T16:09:44+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2016-08-12T05:42:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5f0aa21da6cc620b08e5f69f51db29cb1f722174'/>
<id>5f0aa21da6cc620b08e5f69f51db29cb1f722174</id>
<content type='text'>
We need to use rcu_read_lock/unlock to avoid potential
race.

Reported-by: Shaohua Li &lt;shli@fb.com&gt;
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We need to use rcu_read_lock/unlock to avoid potential
race.

Reported-by: Shaohua Li &lt;shli@fb.com&gt;
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md-cluster: remove some unnecessary dlm_unlock_sync</title>
<updated>2016-09-21T16:09:44+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2016-08-12T05:42:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e3f924d3dfc672dd1292bc7eb6f2a305c13981ec'/>
<id>e3f924d3dfc672dd1292bc7eb6f2a305c13981ec</id>
<content type='text'>
Since DLM_LKF_FORCEUNLOCK is used in lockres_free,
we don't need to call dlm_unlock_sync before free
lock resource.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since DLM_LKF_FORCEUNLOCK is used in lockres_free,
we don't need to call dlm_unlock_sync before free
lock resource.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md-cluster: use FORCEUNLOCK in lockres_free</title>
<updated>2016-09-21T16:09:44+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2016-08-12T05:42:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=400cb454a4205ec1d7311bc3dd8104859c26ba46'/>
<id>400cb454a4205ec1d7311bc3dd8104859c26ba46</id>
<content type='text'>
For dlm_unlock, we need to pass flag to dlm_unlock as the
third parameter instead of set res-&gt;flags.

Also, DLM_LKF_FORCEUNLOCK is more suitable for dlm_unlock
since it works even the lock is on waiting or convert queue.

Acked-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For dlm_unlock, we need to pass flag to dlm_unlock as the
third parameter instead of set res-&gt;flags.

Also, DLM_LKF_FORCEUNLOCK is more suitable for dlm_unlock
since it works even the lock is on waiting or convert queue.

Acked-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md-cluster: fix error return code in join()</title>
<updated>2016-08-24T17:21:51+00:00</updated>
<author>
<name>Wei Yongjun</name>
<email>weiyongjun1@huawei.com</email>
</author>
<published>2016-08-21T14:42:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0f6187dbe542d71ace8ba0908954b0f4f8a30a1e'/>
<id>0f6187dbe542d71ace8ba0908954b0f4f8a30a1e</id>
<content type='text'>
Fix to return error code -ENOMEM from the lockres_init() error
handling case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun &lt;weiyongjun1@huawei.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix to return error code -ENOMEM from the lockres_init() error
handling case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun &lt;weiyongjun1@huawei.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md-cluster: check the return value of process_recvd_msg</title>
<updated>2016-05-09T16:24:04+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2016-05-04T02:22:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1fa9a1ad0a9db3c745fe0c1bfa73fd87901fd7f3'/>
<id>1fa9a1ad0a9db3c745fe0c1bfa73fd87901fd7f3</id>
<content type='text'>
We don't need to run the full path of recv_daemon
if process_recvd_msg doesn't return 0.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We don't need to run the full path of recv_daemon
if process_recvd_msg doesn't return 0.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md-cluster: gather resync infos and enable recv_thread after bitmap is ready</title>
<updated>2016-05-09T16:24:03+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2016-05-04T06:17:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=51e453aecb267b6a99b1d2853bccd5bba7340236'/>
<id>51e453aecb267b6a99b1d2853bccd5bba7340236</id>
<content type='text'>
The in-memory bitmap is not ready when node joins cluster,
so it doesn't make sense to make gather_all_resync_info()
called so earlier, we need to call it after the node's
bitmap is setup. Also, recv_thread could be wake up after
node joins cluster, but it could cause problem if node
receives RESYNCING message without persionality since
mddev-&gt;pers-&gt;quiesce is called in process_suspend_info.

This commit introduces a new cluster interface load_bitmaps
to fix above problems, load_bitmaps is called in bitmap_load
where bitmap and persionality are ready, and load_bitmaps
does the following tasks:

1. call gather_all_resync_info to load all the node's
   bitmap info.
2. set MD_CLUSTER_ALREADY_IN_CLUSTER bit to recv_thread
   could be wake up, and wake up recv_thread if there is
   pending recv event.

Then ack_bast only wakes up recv_thread after IN_CLUSTER
bit is ready otherwise MD_CLUSTER_PENDING_RESYNC_EVENT is
set.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The in-memory bitmap is not ready when node joins cluster,
so it doesn't make sense to make gather_all_resync_info()
called so earlier, we need to call it after the node's
bitmap is setup. Also, recv_thread could be wake up after
node joins cluster, but it could cause problem if node
receives RESYNCING message without persionality since
mddev-&gt;pers-&gt;quiesce is called in process_suspend_info.

This commit introduces a new cluster interface load_bitmaps
to fix above problems, load_bitmaps is called in bitmap_load
where bitmap and persionality are ready, and load_bitmaps
does the following tasks:

1. call gather_all_resync_info to load all the node's
   bitmap info.
2. set MD_CLUSTER_ALREADY_IN_CLUSTER bit to recv_thread
   could be wake up, and wake up recv_thread if there is
   pending recv event.

Then ack_bast only wakes up recv_thread after IN_CLUSTER
bit is ready otherwise MD_CLUSTER_PENDING_RESYNC_EVENT is
set.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md-cluster: sync bitmap when node received RESYNCING msg</title>
<updated>2016-05-04T19:39:35+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2016-05-02T15:50:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=18c9ff7f487efa8e88886bee21bd3516dde05bc5'/>
<id>18c9ff7f487efa8e88886bee21bd3516dde05bc5</id>
<content type='text'>
If the node received RESYNCING message which means
another node will perform resync with the area, then
we don't want to do it again in another node.

Let's set RESYNC_MASK and clear NEEDED_MASK for the
region from old-low to new-low which has finished
syncing, and the region from old-hi to new-hi is about
to syncing, bitmap_sync_with_cluste is introduced for
the purpose.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If the node received RESYNCING message which means
another node will perform resync with the area, then
we don't want to do it again in another node.

Let's set RESYNC_MASK and clear NEEDED_MASK for the
region from old-low to new-low which has finished
syncing, and the region from old-hi to new-hi is about
to syncing, bitmap_sync_with_cluste is introduced for
the purpose.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
