<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/dlm, branch v6.2-rc3</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Treewide: Stop corrupting socket's task_frag</title>
<updated>2022-12-20T01:28:49+00:00</updated>
<author>
<name>Benjamin Coddington</name>
<email>bcodding@redhat.com</email>
</author>
<published>2022-12-16T12:45:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=98123866fcf3fe95a0c1b198ef122dfdbd351916'/>
<id>98123866fcf3fe95a0c1b198ef122dfdbd351916</id>
<content type='text'>
Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current-&gt;task_frag.  The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.

The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code.  I believe this problem to
be much more pervasive than reports to the community may indicate.

Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false.  Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.

CC: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
CC: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
CC: "Christoph Böhmwalder" &lt;christoph.boehmwalder@linbit.com&gt;
CC: Jens Axboe &lt;axboe@kernel.dk&gt;
CC: Josef Bacik &lt;josef@toxicpanda.com&gt;
CC: Keith Busch &lt;kbusch@kernel.org&gt;
CC: Christoph Hellwig &lt;hch@lst.de&gt;
CC: Sagi Grimberg &lt;sagi@grimberg.me&gt;
CC: Lee Duncan &lt;lduncan@suse.com&gt;
CC: Chris Leech &lt;cleech@redhat.com&gt;
CC: Mike Christie &lt;michael.christie@oracle.com&gt;
CC: "James E.J. Bottomley" &lt;jejb@linux.ibm.com&gt;
CC: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
CC: Valentina Manea &lt;valentina.manea.m@gmail.com&gt;
CC: Shuah Khan &lt;shuah@kernel.org&gt;
CC: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
CC: David Howells &lt;dhowells@redhat.com&gt;
CC: Marc Dionne &lt;marc.dionne@auristor.com&gt;
CC: Steve French &lt;sfrench@samba.org&gt;
CC: Christine Caulfield &lt;ccaulfie@redhat.com&gt;
CC: David Teigland &lt;teigland@redhat.com&gt;
CC: Mark Fasheh &lt;mark@fasheh.com&gt;
CC: Joel Becker &lt;jlbec@evilplan.org&gt;
CC: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
CC: Eric Van Hensbergen &lt;ericvh@gmail.com&gt;
CC: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
CC: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
CC: Ilya Dryomov &lt;idryomov@gmail.com&gt;
CC: Xiubo Li &lt;xiubli@redhat.com&gt;
CC: Chuck Lever &lt;chuck.lever@oracle.com&gt;
CC: Jeff Layton &lt;jlayton@kernel.org&gt;
CC: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
CC: Anna Schumaker &lt;anna@kernel.org&gt;
CC: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
CC: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;

Suggested-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Reviewed-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current-&gt;task_frag.  The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.

The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code.  I believe this problem to
be much more pervasive than reports to the community may indicate.

Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false.  Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.

CC: Philipp Reisner &lt;philipp.reisner@linbit.com&gt;
CC: Lars Ellenberg &lt;lars.ellenberg@linbit.com&gt;
CC: "Christoph Böhmwalder" &lt;christoph.boehmwalder@linbit.com&gt;
CC: Jens Axboe &lt;axboe@kernel.dk&gt;
CC: Josef Bacik &lt;josef@toxicpanda.com&gt;
CC: Keith Busch &lt;kbusch@kernel.org&gt;
CC: Christoph Hellwig &lt;hch@lst.de&gt;
CC: Sagi Grimberg &lt;sagi@grimberg.me&gt;
CC: Lee Duncan &lt;lduncan@suse.com&gt;
CC: Chris Leech &lt;cleech@redhat.com&gt;
CC: Mike Christie &lt;michael.christie@oracle.com&gt;
CC: "James E.J. Bottomley" &lt;jejb@linux.ibm.com&gt;
CC: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
CC: Valentina Manea &lt;valentina.manea.m@gmail.com&gt;
CC: Shuah Khan &lt;shuah@kernel.org&gt;
CC: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
CC: David Howells &lt;dhowells@redhat.com&gt;
CC: Marc Dionne &lt;marc.dionne@auristor.com&gt;
CC: Steve French &lt;sfrench@samba.org&gt;
CC: Christine Caulfield &lt;ccaulfie@redhat.com&gt;
CC: David Teigland &lt;teigland@redhat.com&gt;
CC: Mark Fasheh &lt;mark@fasheh.com&gt;
CC: Joel Becker &lt;jlbec@evilplan.org&gt;
CC: Joseph Qi &lt;joseph.qi@linux.alibaba.com&gt;
CC: Eric Van Hensbergen &lt;ericvh@gmail.com&gt;
CC: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
CC: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
CC: Ilya Dryomov &lt;idryomov@gmail.com&gt;
CC: Xiubo Li &lt;xiubli@redhat.com&gt;
CC: Chuck Lever &lt;chuck.lever@oracle.com&gt;
CC: Jeff Layton &lt;jlayton@kernel.org&gt;
CC: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
CC: Anna Schumaker &lt;anna@kernel.org&gt;
CC: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
CC: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;

Suggested-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Reviewed-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: dlm: fix building without lockdep</title>
<updated>2022-11-22T16:14:26+00:00</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2022-11-22T14:48:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7a5e9f1f83e3271a9f05933a80b870fe55ebbb3d'/>
<id>7a5e9f1f83e3271a9f05933a80b870fe55ebbb3d</id>
<content type='text'>
This patch uses assert_spin_locked() instead of lockdep_is_held()
where it's available to use because lockdep_is_held() is only available
if CONFIG_LOCKDEP is set.

In other cases like lockdep_sock_is_held() we surround it by a
CONFIG_LOCKDEP idef.

Fixes: dbb751ffab0b ("fs: dlm: parallelize lowcomms socket handling")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch uses assert_spin_locked() instead of lockdep_is_held()
where it's available to use because lockdep_is_held() is only available
if CONFIG_LOCKDEP is set.

In other cases like lockdep_sock_is_held() we surround it by a
CONFIG_LOCKDEP idef.

Fixes: dbb751ffab0b ("fs: dlm: parallelize lowcomms socket handling")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: dlm: parallelize lowcomms socket handling</title>
<updated>2022-11-21T15:45:49+00:00</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2022-11-17T22:11:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dbb751ffab0b764720e360efd642ba6bf076d87f'/>
<id>dbb751ffab0b764720e360efd642ba6bf076d87f</id>
<content type='text'>
This patch is rework of lowcomms handling, the main goal was here to
handle recvmsg() and sendpage() to run parallel. Parallel in two senses:
1. per connection and 2. that recvmsg()/sendpage() doesn't block each
other.

Currently recvmsg()/sendpage() cannot run parallel because two
workqueues "dlm_recv" and "dlm_send" are ordered workqueues. That means
only one work item can be executed. The amount of queue items will be
increased about the amount of nodes being inside the cluster. The current
two workqueues for sending and receiving can also block each other if the
same connection is executed at the same time in dlm_recv and dlm_send
workqueue because a per connection mutex for the socket handling.

To make it more parallel we introduce one "dlm_io" workqueue which is
not an ordered workqueue, the amount of workers are not limited. Due
per connection flags SEND/RECV pending we schedule workers ordered per
connection and per send and receive task. To get rid of the mutex
blocking same workers to do socket handling we switched to a semaphore
which handles socket operations as read lock and sock releases as write
operations, to prevent sock_release() being called while the socket is
being used.

There might be more optimization removing the semaphore and replacing it
with other synchronization mechanism, however due other circumstances
e.g. othercon behaviour it seems complicated to doing this change. I
added comments to remove the othercon handling and moving to a different
synchronization mechanism as this is done. We need to do that to the next
dlm major version upgrade because it is not backwards compatible with the
current connect mechanism.

The processing of dlm messages need to be still handled by a ordered
workqueue. An dlm_process ordered workqueue was introduced which gets
filled by the receive worker. This is probably the next bottleneck of
DLM but the application can't currently parse dlm messages parallel. A
comment was introduced to lift the workqueue context of dlm processing
in a non-sleepable softirq to get messages processing done fast.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch is rework of lowcomms handling, the main goal was here to
handle recvmsg() and sendpage() to run parallel. Parallel in two senses:
1. per connection and 2. that recvmsg()/sendpage() doesn't block each
other.

Currently recvmsg()/sendpage() cannot run parallel because two
workqueues "dlm_recv" and "dlm_send" are ordered workqueues. That means
only one work item can be executed. The amount of queue items will be
increased about the amount of nodes being inside the cluster. The current
two workqueues for sending and receiving can also block each other if the
same connection is executed at the same time in dlm_recv and dlm_send
workqueue because a per connection mutex for the socket handling.

To make it more parallel we introduce one "dlm_io" workqueue which is
not an ordered workqueue, the amount of workers are not limited. Due
per connection flags SEND/RECV pending we schedule workers ordered per
connection and per send and receive task. To get rid of the mutex
blocking same workers to do socket handling we switched to a semaphore
which handles socket operations as read lock and sock releases as write
operations, to prevent sock_release() being called while the socket is
being used.

There might be more optimization removing the semaphore and replacing it
with other synchronization mechanism, however due other circumstances
e.g. othercon behaviour it seems complicated to doing this change. I
added comments to remove the othercon handling and moving to a different
synchronization mechanism as this is done. We need to do that to the next
dlm major version upgrade because it is not backwards compatible with the
current connect mechanism.

The processing of dlm messages need to be still handled by a ordered
workqueue. An dlm_process ordered workqueue was introduced which gets
filled by the receive worker. This is probably the next bottleneck of
DLM but the application can't currently parse dlm messages parallel. A
comment was introduced to lift the workqueue context of dlm processing
in a non-sleepable softirq to get messages processing done fast.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: dlm: don't init error value</title>
<updated>2022-11-21T15:45:49+00:00</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2022-11-17T22:11:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1351975ac1377225cef5d858971e17252c06ff51'/>
<id>1351975ac1377225cef5d858971e17252c06ff51</id>
<content type='text'>
This patch removes a init of an error value to -EINVAL which is not
necessary.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch removes a init of an error value to -EINVAL which is not
necessary.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: dlm: use saved sk_error_report()</title>
<updated>2022-11-21T15:45:49+00:00</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2022-11-17T22:11:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c852a6d70698d5e4a9f3670ecabf9656ff00c73c'/>
<id>c852a6d70698d5e4a9f3670ecabf9656ff00c73c</id>
<content type='text'>
This patch changes the handling of calling the original
sk_error_report() by not putting it on the stack and calling it later.
If the listen_sock.sk_error_report() is NULL in this moment it indicates
a bug in our implementation.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch changes the handling of calling the original
sk_error_report() by not putting it on the stack and calling it later.
If the listen_sock.sk_error_report() is NULL in this moment it indicates
a bug in our implementation.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: dlm: use sock2con without checking null</title>
<updated>2022-11-21T15:45:49+00:00</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2022-11-17T22:11:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e9dd5fd849f1ac125919a79eb4be66f078dd7f51'/>
<id>e9dd5fd849f1ac125919a79eb4be66f078dd7f51</id>
<content type='text'>
This patch removes null checks on private data for sockets. If we have a
null dereference there we having a bug in our implementation that such
callback occurs in this state.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch removes null checks on private data for sockets. If we have a
null dereference there we having a bug in our implementation that such
callback occurs in this state.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: dlm: remove dlm_node_addrs lookup list</title>
<updated>2022-11-21T15:45:49+00:00</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2022-11-17T22:11:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6f0b0b5d7ae70423c94915989c45b29e87c61ad7'/>
<id>6f0b0b5d7ae70423c94915989c45b29e87c61ad7</id>
<content type='text'>
This patch merges the dlm_node_addrs lookup list to the connection
structure. It is a per node mapping to some configuration setup by
configfs. We don't need two lookup structures. The connection hash has
now a lifetime like the dlm_node_addrs entries. Means we add only new
entries when configure cluster and not while new connections are coming
in, remove connection when a node got fenced and cleanup all connection
when the dlm exits. It should work the same and even will show more
issues because we don't try to somehow keep those two data structures in
sync with the current cluster configuration.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch merges the dlm_node_addrs lookup list to the connection
structure. It is a per node mapping to some configuration setup by
configfs. We don't need two lookup structures. The connection hash has
now a lifetime like the dlm_node_addrs entries. Means we add only new
entries when configure cluster and not while new connections are coming
in, remove connection when a node got fenced and cleanup all connection
when the dlm exits. It should work the same and even will show more
issues because we don't try to somehow keep those two data structures in
sync with the current cluster configuration.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: dlm: don't put dlm_local_addrs on heap</title>
<updated>2022-11-21T15:45:49+00:00</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2022-11-17T22:11:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c51c9cd8addcfbdc097dbefd59f022402183644b'/>
<id>c51c9cd8addcfbdc097dbefd59f022402183644b</id>
<content type='text'>
This patch removes to allocate the dlm_local_addr[] pointers on the
heap. Instead we directly store the type of "struct sockaddr_storage".
This removes function deinit_local() because it was freeing memory only.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch removes to allocate the dlm_local_addr[] pointers on the
heap. Instead we directly store the type of "struct sockaddr_storage".
This removes function deinit_local() because it was freeing memory only.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: dlm: cleanup listen sock handling</title>
<updated>2022-11-21T15:45:49+00:00</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2022-11-17T22:11:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c3d88dfd15835f42c79467eee2c40db3095e5271'/>
<id>c3d88dfd15835f42c79467eee2c40db3095e5271</id>
<content type='text'>
This patch removes save_listen_callbacks() and add_listen_sock() as they
are only used once in lowcomms functionality. For shutdown lowcomms it's
not necessary to whole flush the workqueues to synchronize with
restoring the old sk_data_ready() callback. Only the listen con receive
work need to be cancelled. For each individual node shutdown we should be
sure that last ack was been transmitted which is done by flushing per
connection swork worker.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch removes save_listen_callbacks() and add_listen_sock() as they
are only used once in lowcomms functionality. For shutdown lowcomms it's
not necessary to whole flush the workqueues to synchronize with
restoring the old sk_data_ready() callback. Only the listen con receive
work need to be cancelled. For each individual node shutdown we should be
sure that last ack was been transmitted which is done by flushing per
connection swork worker.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: dlm: remove socket shutdown handling</title>
<updated>2022-11-21T15:45:49+00:00</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2022-11-17T22:11:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4f567acb0b8622fe265aac4b0037a03ef544ba24'/>
<id>4f567acb0b8622fe265aac4b0037a03ef544ba24</id>
<content type='text'>
Since commit 489d8e559c65 ("fs: dlm: add reliable connection if
reconnect") we have functionality like TCP offers for half-closed
sockets on dlm application protocol layer. This feature is required
because the cluster manager events about leaving resource memberships
can be locally already occurred but other cluster nodes having a pending
leaving membership over the cluster manager protocol happening. In this
time the local dlm node already shutdown it's connection and don't
transmit anymore any new dlm messages, but however it still needs to be
able to accept dlm messages because the pending leave membership request
of the cluster manager protocol which the dlm kernel implementation has
no control about it.

We have this functionality on the application for two reasons, the main
reason is that SCTP does not support such functionality on socket
layer. But we can do it inside application layer.

Another small issue is that this feature is broken in the TCP world
because some NAT devices does not implement such functionality
correctly. This is the same reason why the reliable connection session
layer in DLM exists. We give up on middle devices in the networking
which sends e.g. TCP resets out. In DLM we cannot have any message
dropping and we ensure it over a session layer that it can't happen.

Back to the half-closed grace shutdown handling. It's not necessary
anymore to do it on socket layer (which is only support for TCP sockets)
because we do it on application layer. This patch removes this handling,
if there are still issues then we have a problem on the application
layer for such handling.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit 489d8e559c65 ("fs: dlm: add reliable connection if
reconnect") we have functionality like TCP offers for half-closed
sockets on dlm application protocol layer. This feature is required
because the cluster manager events about leaving resource memberships
can be locally already occurred but other cluster nodes having a pending
leaving membership over the cluster manager protocol happening. In this
time the local dlm node already shutdown it's connection and don't
transmit anymore any new dlm messages, but however it still needs to be
able to accept dlm messages because the pending leave membership request
of the cluster manager protocol which the dlm kernel implementation has
no control about it.

We have this functionality on the application for two reasons, the main
reason is that SCTP does not support such functionality on socket
layer. But we can do it inside application layer.

Another small issue is that this feature is broken in the TCP world
because some NAT devices does not implement such functionality
correctly. This is the same reason why the reliable connection session
layer in DLM exists. We give up on middle devices in the networking
which sends e.g. TCP resets out. In DLM we cannot have any message
dropping and we ensure it over a session layer that it can't happen.

Back to the half-closed grace shutdown handling. It's not necessary
anymore to do it on socket layer (which is only support for TCP sockets)
because we do it on application layer. This patch removes this handling,
if there are still issues then we have a problem on the application
layer for such handling.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
