<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/sunrpc/sched.h, branch v2.6.38.5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>SUNRPC: Close a race in __rpc_wait_for_completion_task()</title>
<updated>2011-03-10T20:04:52+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2011-02-21T19:05:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bf294b41cefcb22fc3139e0f42c5b3f06728bd5e'/>
<id>bf294b41cefcb22fc3139e0f42c5b3f06728bd5e</id>
<content type='text'>
Although they run as rpciod background tasks, under normal operation
(i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
and nfs4_do_close() want to be fully synchronous. This means that when we
exit, we want all references to the rpc_task to be gone, and we want
any dentry references etc. held by that task to be released.

For this reason these functions call __rpc_wait_for_completion_task(),
followed by rpc_put_task() in the expectation that the latter will be
releasing the last reference to the rpc_task, and thus ensuring that the
callback_ops-&gt;rpc_release() has been called synchronously.

This patch fixes a race which exists due to the fact that
rpciod calls rpc_complete_task() (in order to wake up the callers of
__rpc_wait_for_completion_task()) and then subsequently calls
rpc_put_task() without ensuring that these two steps are done atomically.

In order to avoid adding new spin locks, the patch uses the existing
waitqueue spin lock to order the rpc_task reference count releases between
the waiting process and rpciod.
The common case where nobody is waiting for completion is optimised for by
checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
reference count is 1: in those cases we drop trying to grab the spin lock,
and immediately free up the rpc_task.

Those few processes that need to put the rpc_task from inside an
asynchronous context and that do not care about ordering are given a new
helper: rpc_put_task_async().

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Although they run as rpciod background tasks, under normal operation
(i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
and nfs4_do_close() want to be fully synchronous. This means that when we
exit, we want all references to the rpc_task to be gone, and we want
any dentry references etc. held by that task to be released.

For this reason these functions call __rpc_wait_for_completion_task(),
followed by rpc_put_task() in the expectation that the latter will be
releasing the last reference to the rpc_task, and thus ensuring that the
callback_ops-&gt;rpc_release() has been called synchronously.

This patch fixes a race which exists due to the fact that
rpciod calls rpc_complete_task() (in order to wake up the callers of
__rpc_wait_for_completion_task()) and then subsequently calls
rpc_put_task() without ensuring that these two steps are done atomically.

In order to avoid adding new spin locks, the patch uses the existing
waitqueue spin lock to order the rpc_task reference count releases between
the waiting process and rpciod.
The common case where nobody is waiting for completion is optimised for by
checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
reference count is 1: in those cases we drop trying to grab the spin lock,
and immediately free up the rpc_task.

Those few processes that need to put the rpc_task from inside an
asynchronous context and that do not care about ordering are given a new
helper: rpc_put_task_async().

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>SUNRPC: Ensure that rpc_exit() always wakes up a sleeping task</title>
<updated>2010-08-04T12:54:07+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2010-07-31T18:29:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d9b6cd94601e1d17273f93a326a135fbf487a918'/>
<id>d9b6cd94601e1d17273f93a326a135fbf487a918</id>
<content type='text'>
Make rpc_exit() non-inline, and ensure that it always wakes up a task that
has been queued.

Kill off the now unused rpc_wake_up_task().

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make rpc_exit() non-inline, and ensure that it always wakes up a task that
has been queued.

Kill off the now unused rpc_wake_up_task().

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>SUNRPC: Reorder the struct rpc_task fields</title>
<updated>2010-05-14T19:09:37+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2010-05-13T16:51:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9bb0b8136a7d5b50c5807af3bf12b758fb257814'/>
<id>9bb0b8136a7d5b50c5807af3bf12b758fb257814</id>
<content type='text'>
This improves the packing of the rpc_task, and ensures that on 64-bit
platforms the size reduces to 216 bytes.

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This improves the packing of the rpc_task, and ensures that on 64-bit
platforms the size reduces to 216 bytes.

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>SUNRPC: Remove the 'tk_magic' debugging field</title>
<updated>2010-05-14T19:09:36+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2010-05-13T16:51:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d72b6cec8d42eb7c2a249b613abf2c2b7a6eeb47'/>
<id>d72b6cec8d42eb7c2a249b613abf2c2b7a6eeb47</id>
<content type='text'>
It has not triggered in almost a decade. Time to get rid of it...

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It has not triggered in almost a decade. Time to get rid of it...

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>SUNRPC: Move the task-&gt;tk_bytes_sent and tk_rtt to struct rpc_rqst</title>
<updated>2010-05-14T19:09:36+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2010-05-13T16:51:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d60dbb20a74c2cfa142be0a34dac3c6547ea086c'/>
<id>d60dbb20a74c2cfa142be0a34dac3c6547ea086c</id>
<content type='text'>
It seems strange to maintain stats for bytes_sent in one structure, and
bytes received in another. Try to assemble all the RPC request-related
stats in struct rpc_rqst

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It seems strange to maintain stats for bytes_sent in one structure, and
bytes received in another. Try to assemble all the RPC request-related
stats in struct rpc_rqst

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>SUNRPC: Replace jiffies-based metrics with ktime-based metrics</title>
<updated>2010-05-14T19:09:33+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2010-05-07T17:34:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ff8399709e41bf72b4cb145612a0f9a9f7283c83'/>
<id>ff8399709e41bf72b4cb145612a0f9a9f7283c83</id>
<content type='text'>
Currently RPC performance metrics that tabulate elapsed time use
jiffies time values.  This is problematic on systems that use slow
jiffies (for instance 100HZ systems built for paravirtualized
environments).  It is also a problem for computing precise latency
statistics for advanced network transports, such as InfiniBand,
that can have round-trip latencies significanly faster than a single
clock tick.

For the RPC client, adopt the high resolution time stamp mechanism
already used by the network layer and blktrace: ktime.

We use ktime format time stamps for all internal computations, and
convert to milliseconds for presentation.  As a result, we need only
addition operations in the performance critical paths; multiply/divide
is required only for presentation.

We could report RTT metrics in microseconds.  In fact the mountstats
format is versioned to accomodate exactly this kind of interface
improvement.

For now, however, we'll stay with millisecond precision for
presentation to maintain backwards compatibility with the handful of
currently deployed user space tools.  At a later point, we'll move to
an API such as BDI_STATS where a finer timestamp precision can be
reported.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently RPC performance metrics that tabulate elapsed time use
jiffies time values.  This is problematic on systems that use slow
jiffies (for instance 100HZ systems built for paravirtualized
environments).  It is also a problem for computing precise latency
statistics for advanced network transports, such as InfiniBand,
that can have round-trip latencies significanly faster than a single
clock tick.

For the RPC client, adopt the high resolution time stamp mechanism
already used by the network layer and blktrace: ktime.

We use ktime format time stamps for all internal computations, and
convert to milliseconds for presentation.  As a result, we need only
addition operations in the performance critical paths; multiply/divide
is required only for presentation.

We could report RTT metrics in microseconds.  In fact the mountstats
format is versioned to accomodate exactly this kind of interface
improvement.

For now, however, we'll stay with millisecond precision for
presentation to maintain backwards compatibility with the handful of
currently deployed user space tools.  At a later point, we'll move to
an API such as BDI_STATS where a finer timestamp precision can be
reported.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: add a new priority in RPC task</title>
<updated>2009-12-15T18:53:54+00:00</updated>
<author>
<name>Alexandros Batsakis</name>
<email>batsakis@netapp.com</email>
</author>
<published>2009-12-15T05:27:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cf3b01b54880debb01ea7d471123da5887a7c2cb'/>
<id>cf3b01b54880debb01ea7d471123da5887a7c2cb</id>
<content type='text'>
Signed-off-by: Alexandros Batsakis &lt;batsakis@netapp.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Alexandros Batsakis &lt;batsakis@netapp.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rpc: add rpc_queue_empty function</title>
<updated>2009-12-15T18:51:17+00:00</updated>
<author>
<name>Alexandros Batsakis</name>
<email>batsakis@netapp.com</email>
</author>
<published>2009-12-15T05:27:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=48f186124220794fce85ed1439fc32f16f69d3e2'/>
<id>48f186124220794fce85ed1439fc32f16f69d3e2</id>
<content type='text'>
Signed-off-by: Alexandros Batsakis &lt;batsakis@netapp.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Alexandros Batsakis &lt;batsakis@netapp.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>SUNRPC: Allow RPCs to fail quickly if the server is unreachable</title>
<updated>2009-12-03T20:58:56+00:00</updated>
<author>
<name>Chuck Lever</name>
<email>chuck.lever@oracle.com</email>
</author>
<published>2009-12-03T20:58:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=09a21c4102c8f7893368553273d39c0cadedf9af'/>
<id>09a21c4102c8f7893368553273d39c0cadedf9af</id>
<content type='text'>
The kernel sometimes makes RPC calls to services that aren't running.
Because the kernel's RPC client always assumes the hard retry semantic
when reconnecting a connection-oriented RPC transport, the underlying
reconnect logic takes a long while to time out, even though the remote
may have responded immediately with ECONNREFUSED.

In certain cases, like upcalls to our local rpcbind daemon, or for NFS
mount requests, we'd like the kernel to fail immediately if the remote
service isn't reachable.  This allows another transport to be tried
immediately, or the pending request can be abandoned quickly.

Introduce a per-request flag which controls how call_transmit_status()
behaves when request transmission fails because the server cannot be
reached.

We don't want soft connection semantics to apply to other errors.  The
default case of the switch statement in call_transmit_status() no
longer falls through; the fall through code is copied to the default
case, and a "break;" is added.

The transport's connection re-establishment timeout is also ignored for
such requests.  We want the request to fail immediately, so the
reconnect delay is skipped.  Additionally, we don't want a connect
failure here to further increase the reconnect timeout value, since
this request will not be retried.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The kernel sometimes makes RPC calls to services that aren't running.
Because the kernel's RPC client always assumes the hard retry semantic
when reconnecting a connection-oriented RPC transport, the underlying
reconnect logic takes a long while to time out, even though the remote
may have responded immediately with ECONNREFUSED.

In certain cases, like upcalls to our local rpcbind daemon, or for NFS
mount requests, we'd like the kernel to fail immediately if the remote
service isn't reachable.  This allows another transport to be tried
immediately, or the pending request can be abandoned quickly.

Introduce a per-request flag which controls how call_transmit_status()
behaves when request transmission fails because the server cannot be
reached.

We don't want soft connection semantics to apply to other errors.  The
default case of the switch statement in call_transmit_status() no
longer falls through; the fall through code is copied to the default
case, and a "break;" is added.

The transport's connection re-establishment timeout is also ignored for
such requests.  We want the request to fail immediately, so the
reconnect delay is skipped.  Additionally, we don't want a connect
failure here to further increase the reconnect timeout value, since
this request will not be retried.

Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nfs41: Add backchannel processing support to RPC state machine</title>
<updated>2009-06-17T21:11:24+00:00</updated>
<author>
<name>Ricardo Labiaga</name>
<email>Ricardo.Labiaga@netapp.com</email>
</author>
<published>2009-04-01T13:23:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=55ae1aabfb108106dd095de2578ceef1c755a8b8'/>
<id>55ae1aabfb108106dd095de2578ceef1c755a8b8</id>
<content type='text'>
Adds rpc_run_bc_task() which is called by the NFS callback service to
process backchannel requests.  It performs similar work to rpc_run_task()
though "schedules" the backchannel task to be executed starting at the
call_trasmit state in the RPC state machine.

It also introduces some miscellaneous updates to the argument validation,
call_transmit, and transport cleanup functions to take into account
that there are now forechannel and backchannel tasks.

Backchannel requests do not carry an RPC message structure, since the
payload has already been XDR encoded using the existing NFSv4 callback
mechanism.

Introduce a new transmit state for the client to reply on to backchannel
requests.  This new state simply reserves the transport and issues the
reply.  In case of a connection related error, disconnects the transport and
drops the reply.  It requires the forechannel to re-establish the connection
and the server to retransmit the request, as stated in NFSv4.1 section
2.9.2 "Client and Server Transport Behavior".

Note: There is no need to loop attempting to reserve the transport.  If EAGAIN
is returned by xprt_prepare_transmit(), return with tk_status == 0,
setting tk_action to call_bc_transmit.  rpc_execute() will invoke it again
after the task is taken off the sleep queue.

[nfs41: rpc_run_bc_task() need not be exported outside RPC module]
[nfs41: New call_bc_transmit RPC state]
Signed-off-by: Ricardo Labiaga &lt;Ricardo.Labiaga@netapp.com&gt;
Signed-off-by: Benny Halevy &lt;bhalevy@panasas.com&gt;
[nfs41: Backchannel: No need to loop in call_bc_transmit()]
Signed-off-by: Andy Adamson &lt;andros@netapp.com&gt;
Signed-off-by: Ricardo Labiaga &lt;Ricardo.Labiaga@netapp.com&gt;
Signed-off-by: Benny Halevy &lt;bhalevy@panasas.com&gt;
[rpc_count_iostats incorrectly exits early]
Signed-off-by: Ricardo Labiaga &lt;Ricardo.Labiaga@netapp.com&gt;
Signed-off-by: Benny Halevy &lt;bhalevy@panasas.com&gt;
[Convert rpc_reply_expected() to inline function]
[Remove unnecessary BUG_ON()]
[Rename variable]
Signed-off-by: Ricardo Labiaga &lt;Ricardo.Labiaga@netapp.com&gt;
Signed-off-by: Benny Halevy &lt;bhalevy@panasas.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Adds rpc_run_bc_task() which is called by the NFS callback service to
process backchannel requests.  It performs similar work to rpc_run_task()
though "schedules" the backchannel task to be executed starting at the
call_trasmit state in the RPC state machine.

It also introduces some miscellaneous updates to the argument validation,
call_transmit, and transport cleanup functions to take into account
that there are now forechannel and backchannel tasks.

Backchannel requests do not carry an RPC message structure, since the
payload has already been XDR encoded using the existing NFSv4 callback
mechanism.

Introduce a new transmit state for the client to reply on to backchannel
requests.  This new state simply reserves the transport and issues the
reply.  In case of a connection related error, disconnects the transport and
drops the reply.  It requires the forechannel to re-establish the connection
and the server to retransmit the request, as stated in NFSv4.1 section
2.9.2 "Client and Server Transport Behavior".

Note: There is no need to loop attempting to reserve the transport.  If EAGAIN
is returned by xprt_prepare_transmit(), return with tk_status == 0,
setting tk_action to call_bc_transmit.  rpc_execute() will invoke it again
after the task is taken off the sleep queue.

[nfs41: rpc_run_bc_task() need not be exported outside RPC module]
[nfs41: New call_bc_transmit RPC state]
Signed-off-by: Ricardo Labiaga &lt;Ricardo.Labiaga@netapp.com&gt;
Signed-off-by: Benny Halevy &lt;bhalevy@panasas.com&gt;
[nfs41: Backchannel: No need to loop in call_bc_transmit()]
Signed-off-by: Andy Adamson &lt;andros@netapp.com&gt;
Signed-off-by: Ricardo Labiaga &lt;Ricardo.Labiaga@netapp.com&gt;
Signed-off-by: Benny Halevy &lt;bhalevy@panasas.com&gt;
[rpc_count_iostats incorrectly exits early]
Signed-off-by: Ricardo Labiaga &lt;Ricardo.Labiaga@netapp.com&gt;
Signed-off-by: Benny Halevy &lt;bhalevy@panasas.com&gt;
[Convert rpc_reply_expected() to inline function]
[Remove unnecessary BUG_ON()]
[Rename variable]
Signed-off-by: Ricardo Labiaga &lt;Ricardo.Labiaga@netapp.com&gt;
Signed-off-by: Benny Halevy &lt;bhalevy@panasas.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
