summaryrefslogtreecommitdiff
path: root/net/sunrpc/xprtrdma/transport.c
AgeCommit message (Collapse)Author
2012-09-19SUNRPC: Fix a UDP transport regressionTrond Myklebust
commit f39c1bfb5a03e2d255451bff05be0d7255298fa4 upstream. Commit 43cedbf0e8dfb9c5610eb7985d5f21263e313802 (SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing hangs in the case of NFS over UDP mounts. Since neither the UDP or the RDMA transport mechanism use dynamic slot allocation, we can skip grabbing the socket lock for those transports. Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA case. Note that the NFSv4.1 back channel assigns the slot directly through rpc_run_bc_task, so we can ignore that case. Reported-by: Dick Streefland <dick.streefland@altium.nl> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-08-10nfs: skip commit in releasepage if we're freeing memory for fs-related reasonsJeff Layton
commit 5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 upstream. We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2011-07-17SUNRPC: Support dynamic slot allocation for TCP connectionsTrond Myklebust
Allow the number of available slots to grow with the TCP window size. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-17SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slotTrond Myklebust
This throttles the allocation of new slots when the socket is busy reconnecting and/or is out of buffer space. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-21sunrpc/xprtrdma: clean up workqueue usageTejun Heo
* Create and use svc_rdma_wq instead of using the system workqueue and flush_scheduled_work(). This workqueue is necessary to serve as flushing domain for rdma->sc_work which is used to destroy itself and thus can't be flushed explicitly. * Replace cancel_delayed_work() + flush_scheduled_work() with cancel_delayed_work_sync(). * Implement synchronous connect in xprt_rdma_connect() using flush_delayed_work() on the rdma_connect work instead of using flush_scheduled_work(). This is to prepare for the deprecation and removal of flush_scheduled_work(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01sunrpc: Tag rpc_xprt with netPavel Emelyanov
The net is known from the xprt_create and this tagging will also give un the context in the conntection workers where real sockets are created. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01sunrpc: Factor out rpc_xprt freeingPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01sunrpc: Factor out rpc_xprt allocationPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-05-14SUNRPC: Move the task->tk_bytes_sent and tk_rtt to struct rpc_rqstTrond Myklebust
It seems strange to maintain stats for bytes_sent in one structure, and bytes received in another. Try to assemble all the RPC request-related stats in struct rpc_rqst Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14SUNRPC: Fail over more quickly on connect errorsTrond Myklebust
We should not allow soft tasks to wait for longer than the major timeout period when waiting for a reconnect to occur. Remove the field xprt->connect_timeout since it has been obsoleted by xprt->reestablish_timeout. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14SUNRPC: Move the test for XPRT_CONNECTING into xprt_connect()Trond Myklebust
This fixes a bug with setting xprt->stat.connect_start. Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-08net/sunrpc: Convert (void)snprintf to snprintfJoe Perches
(Applies on top of "Remove uses of NIPQUAD, use %pI4") Casts to void of snprintf are most uncommon in kernel source. 9 use casts, 1301 do not. Remove the remaining uses in net/sunrpc/ Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08net/sunrpc: Remove uses of NIPQUAD, use %pI4Joe Perches
Originally submitted Jan 1, 2010 http://patchwork.kernel.org/patch/71221/ Convert NIPQUAD to the %pI4 format extension where possible Convert %02x%02x%02x%02x/NIPQUAD to %08x/ntohl Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18sysctl: Drop & in front of every proc_handler.Eric W. Biederman
For consistency drop & in front of every proc_handler. Explicity taking the address is unnecessary and it prevents optimizations like stubbing the proc_handlers to NULL. Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Joe Perches <joe@perches.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-12sysctl net: Remove unused binary sysctl codeEric W. Biederman
Now that sys_sysctl is a compatiblity wrapper around /proc/sys all sysctl strategy routines, and all ctl_name and strategy entries in the sysctl tables are unused, and can be revmoed. In addition neigh_sysctl_register has been modified to no longer take a strategy argument and it's callers have been modified not to pass one. Cc: "David Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-08-09SUNRPC: Kill RPC_DISPLAY_ALLChuck Lever
At some point, I recall that rpc_pipe_fs used RPC_DISPLAY_ALL. Currently there are no uses of RPC_DISPLAY_ALL outside the transport modules themselves, so we can safely get rid of it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09SUNRPC: Use rpc_ntop() for constructing transport address stringsChuck Lever
Clean up: In addition to using the new generic rpc_ntop() and rpc_get_port() functions, have the RPC client compute the presentation address buffer sizes dynamically using kstrdup(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09SUNRPC: Remove duplicate universal address generationChuck Lever
RPC universal address generation is currently done in several places: rpcb_clnt.c, nfs4proc.c xprtsock.c, and xprtrdma.c. Remove the redundant cases that convert a socket address to a universal address. The nfs4proc.c case takes a pre-formatted presentation address string, not a socket address, so we'll leave that one. Because the new uaddr constructor uses the recently introduced rpc_ntop(), it now supports proper "::" shorthanding for IPv6 addresses. This allows the kernel to register properly formed universal addresses with the local rpcbind service, in _all_ cases. The kernel can now also send properly formed universal addresses in RPCB_GETADDR requests, and support link-local properly when encoding and decoding IPv6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-31net: replace NIPQUAD() in net/*/Harvey Harrison
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-10RPC/RDMA: correct the reconnect timer backoffTom Talpey
The RPC/RDMA code had a constant 5-second reconnect backoff, and always performed it, even when re-establishing a connection to a server after the RPC layer closed it due to being idle. Make it an geometric backoff (up to 30 seconds), and don't delay idle reconnect. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10RPC/RDMA: optionally emit useful transport info upon connect/disconnect.Tom Talpey
Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10RPC/RDMA: adhere to protocol for unpadded client trailing write chunks.Tom Talpey
The RPC/RDMA protocol allows clients and servers to avoid RDMA operations for data which is purely the result of XDR padding. On the client, automatically insert the necessary padding for such server replies, and optionally don't marshal such chunks. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10RPC/RDMA: maintain the RPC task bytes-sent statistic.Tom Talpey
Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10RPC/RDMA: suppress retransmit on RPC/RDMA clients.Tom Talpey
An RPC/RDMA client cannot retransmit on an unbroken connection, doing so violates its flow control with the server. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10RPC/RDMA: support FRMR client memory registration.Tom Talpey
Configure, detect and use "fastreg" support from IB/iWARP verbs layer to perform RPC/RDMA memory registration. Make FRMR the default memreg mode (will fall back if not supported by the selected RDMA adapter). This allows full and optimal operation over the cxgb3 adapter, and others. Signed-off-by: Tom Talpey <talpey@netapp.com> Acked-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-07SUNRPC: Fix a nfs4 over rdma transport oopsTom Talpey
Prevent an RPC oops when freeing a dynamically allocated RDMA buffer, used in certain special-case large metadata operations. Signed-off-by: Tom Talpey <tmt@netapp.com> Signed-off-by: James Lentini <jlentini@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Clean up functions that free address_strings arrayChuck Lever
Clean up: document the rule (kfree) and the exceptions (RPC_DISPLAY_PROTO and RPC_DISPLAY_NETID) when freeing the objects in a transport's address_strings array. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Add support for per-client timeout valuesTrond Myklebust
In order to be able to support setting the timeo and retrans parameters on a per-mountpoint basis, we move the rpc_timeout structure into the rpc_clnt. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Clean up the transport timeout initialisationTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Rename xprt_disconnect()Trond Myklebust
xprt_disconnect() should really only be called when the transport shutdown is completed, and it is time to wake up any pending tasks. Rename it to xprt_disconnect_done() in order to reflect the semantical change. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26SUNRPC: remove NFS/RDMA client's binary sysctlsJames Lentini
Support for binary sysctls is being deprecated in 2.6.24. Since there are no applications using the NFS/RDMA client's binary sysctls, it makes sense to remove them. The patch below does this while leaving the /proc/sys interface unchanged. Please consider this for 2.6.24. Signed-off-by: James Lentini <jlentini@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-14sunrpc/xprtrdma/transport.c: fix use-after-freeAdrian Bunk
Fix an obvious use-after-free spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-09RPCRDMA: rpc rdma transport switch\"Talpey, Thomas\
This implements the configuration and building of the core transport switch implementation of the rpcrdma transport. Stubs are provided for the rpcrdma protocol handling, and the infiniband/iwarp verbs interface. These are provided in following patches. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>