linux-toradex.git/net/sunrpc, branch v3.18

SUNRPC: Fix locking around callback channel reply receive

2014-11-19T17:03:20+00:00

Both xprt_lookup_rqst() and xprt_complete_rqst() require that you
take the transport lock in order to avoid races with xprt_transmit().
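
The locking rule above can be illustrated with a small userspace sketch. Everything named toy_* here is hypothetical and is not the kernel's rpc_xprt or rpc_rqst code; a C11 atomic_flag spinlock is assumed as a stand-in for the transport lock. The point is only that lookup and completion share one critical section, so a concurrent transmitter cannot observe a half-completed request.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel structures. */
struct toy_rqst {
    uint32_t xid;        /* transaction id used to match a reply */
    int      done;       /* set once a reply has been accepted */
    int      reply_len;  /* bytes copied for the accepted reply */
};

struct toy_xprt {
    atomic_flag      transport_lock;  /* plays the role of the transport lock */
    struct toy_rqst *reqs;
    size_t           nreqs;
};

static void toy_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;  /* spin until the holder releases the lock */
}

static void toy_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Caller must hold transport_lock. */
static struct toy_rqst *toy_lookup_rqst(struct toy_xprt *xprt, uint32_t xid)
{
    for (size_t i = 0; i < xprt->nreqs; i++)
        if (xprt->reqs[i].xid == xid && !xprt->reqs[i].done)
            return &xprt->reqs[i];
    return NULL;
}

/* Caller must hold transport_lock. */
static void toy_complete_rqst(struct toy_rqst *req, int copied)
{
    req->reply_len = copied;
    req->done = 1;
}

/* Reply receive path: lookup and completion form one critical section. */
static int toy_receive_reply(struct toy_xprt *xprt, uint32_t xid, int copied)
{
    int ret = -1;  /* no matching outstanding request */

    toy_lock(&xprt->transport_lock);
    struct toy_rqst *req = toy_lookup_rqst(xprt, xid);
    if (req) {
        toy_complete_rqst(req, copied);
        ret = 0;
    }
    toy_unlock(&xprt->transport_lock);
    return ret;
}
```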

Signed-off-by: Trond Myklebust 
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton 
Signed-off-by: J. Bruce Fields

sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor

2014-11-13T18:15:49+00:00

Bruce reported that he was seeing the following BUG pop:

    BUG: sleeping function called from invalid context at mm/slab.c:2846
    in_atomic(): 0, irqs_disabled(): 0, pid: 4539, name: mount.nfs
    2 locks held by mount.nfs/4539:
    #0:  (nfs_clid_init_mutex){+.+.+.}, at: [] nfs4_discover_server_trunking+0x4a/0x2f0 [nfsv4]
    #1:  (rcu_read_lock){......}, at: [] gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    Preemption disabled at:[] printk+0x4d/0x4f

    CPU: 3 PID: 4539 Comm: mount.nfs Not tainted 3.18.0-rc1-00013-g5b095e9 #3393
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    ffff880021499390 ffff8800381476a8 ffffffff81a534cf 0000000000000001
    0000000000000000 ffff8800381476c8 ffffffff81097854 00000000000000d0
    0000000000000018 ffff880038147718 ffffffff8118e4f3 0000000020479f00
    Call Trace:
    [] dump_stack+0x4f/0x7c
    [] __might_sleep+0x114/0x180
    [] __kmalloc+0x1a3/0x280
    [] gss_stringify_acceptor+0x58/0xb0 [auth_rpcgss]
    [] ? gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    [] rpcauth_stringify_acceptor+0x18/0x30 [sunrpc]
    [] nfs4_proc_setclientid+0x199/0x380 [nfsv4]
    [] ? nfs4_proc_setclientid+0x200/0x380 [nfsv4]
    [] nfs40_discover_server_trunking+0xda/0x150 [nfsv4]
    [] ? nfs40_discover_server_trunking+0x5/0x150 [nfsv4]
    [] nfs4_discover_server_trunking+0x7f/0x2f0 [nfsv4]
    [] nfs4_init_client+0x104/0x2f0 [nfsv4]
    [] nfs_get_client+0x314/0x3f0 [nfs]
    [] ? nfs_get_client+0xe0/0x3f0 [nfs]
    [] nfs4_set_client+0x8a/0x110 [nfsv4]
    [] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc]
    [] nfs4_create_server+0x12f/0x390 [nfsv4]
    [] nfs4_remote_mount+0x32/0x60 [nfsv4]
    [] mount_fs+0x39/0x1b0
    [] ? __alloc_percpu+0x15/0x20
    [] vfs_kern_mount+0x6b/0x150
    [] nfs_do_root_mount+0x86/0xc0 [nfsv4]
    [] nfs4_try_mount+0x44/0xc0 [nfsv4]
    [] ? get_nfs_version+0x27/0x90 [nfs]
    [] nfs_fs_mount+0x47d/0xd60 [nfs]
    [] ? mutex_unlock+0xe/0x10
    [] ? nfs_remount+0x430/0x430 [nfs]
    [] ? nfs_clone_super+0x140/0x140 [nfs]
    [] mount_fs+0x39/0x1b0
    [] ? __alloc_percpu+0x15/0x20
    [] vfs_kern_mount+0x6b/0x150
    [] do_mount+0x210/0xbe0
    [] ? copy_mount_options+0x3a/0x160
    [] SyS_mount+0x6f/0xb0
    [] system_call_fastpath+0x12/0x17

Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping
the rcu_read_lock before doing the allocation and then reacquiring it
and redoing the dereference before doing the copy. If we find that the
string has somehow grown in the meantime, we'll reallocate and try again.
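
The drop, allocate, reacquire and recheck pattern described above might look roughly like the following userspace sketch. It is not the patch itself: the toy_* names are hypothetical, malloc() stands in for a sleeping allocation, and an atomic_flag spinlock is assumed as a crude substitute for the RCU read-side critical section (real RCU readers do not spin).

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not the auth_rpcgss data structures. */
struct toy_acceptor {
    size_t      len;
    const char *data;
};

struct toy_ctx {
    atomic_flag                lock;  /* stand-in for the read-side section */
    const struct toy_acceptor *acc;   /* may be replaced by a writer */
};

static void toy_read_lock(struct toy_ctx *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ;
}

static void toy_read_unlock(struct toy_ctx *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Returns a malloc'd NUL-terminated copy, or NULL if there is nothing to copy. */
static char *toy_stringify_acceptor(struct toy_ctx *ctx)
{
    size_t len;

    /* First pass: only read the length, then leave the critical section. */
    toy_read_lock(ctx);
    len = ctx->acc ? ctx->acc->len : 0;
    toy_read_unlock(ctx);
    if (len == 0)
        return NULL;

    for (;;) {
        /* The allocation may sleep, so it happens with no lock held. */
        char *buf = malloc(len + 1);
        if (!buf)
            return NULL;

        /* Reacquire and redo the dereference; the pointer may have changed. */
        toy_read_lock(ctx);
        const struct toy_acceptor *acc = ctx->acc;
        if (!acc || acc->len == 0) {
            toy_read_unlock(ctx);
            free(buf);
            return NULL;
        }
        if (acc->len <= len) {
            memcpy(buf, acc->data, acc->len);
            buf[acc->len] = '\0';
            toy_read_unlock(ctx);
            return buf;
        }
        /* The string grew while unlocked: remember the new size and retry. */
        len = acc->len;
        toy_read_unlock(ctx);
        free(buf);
    }
}
```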

Cc:  # v3.17+
Reported-by: "J. Bruce Fields" 
Signed-off-by: Jeff Layton 
Signed-off-by: Trond Myklebust

Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux

2014-10-08T16:51:44+00:00

Pull nfsd updates from Bruce Fields:
 "Highlights:

   - support the NFSv4.2 SEEK operation (allowing clients to support
     SEEK_HOLE/SEEK_DATA), thanks to Anna.
   - end the grace period early in a number of cases, mitigating a
     long-standing annoyance, thanks to Jeff
   - improve SMP scalability, thanks to Trond"

* 'for-3.18' of git://linux-nfs.org/~bfields/linux: (55 commits)
  nfsd: eliminate "to_delegation" define
  NFSD: Implement SEEK
  NFSD: Add generic v4.2 infrastructure
  svcrdma: advertise the correct max payload
  nfsd: introduce nfsd4_callback_ops
  nfsd: split nfsd4_callback initialization and use
  nfsd: introduce a generic nfsd4_cb
  nfsd: remove nfsd4_callback.cb_op
  nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence
  nfsd: fix nfsd4_cb_recall_done error handling
  nfsd4: clarify how grace period ends
  nfsd4: stop grace_time update at end of grace period
  nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients
  nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls
  nfsd: serialize nfsdcltrack upcalls for a particular client
  nfsd: pass extra info in env vars to upcalls to allow for early grace period end
  nfsd: add a v4_end_grace file to /proc/fs/nfsd
  lockd: add a /proc/fs/lockd/nlm_end_grace file
  nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE
  nfsd: remove redundant boot_time parm from grace_done client tracking op
  ...

Merge branch 'bugfixes' into linux-next

2014-09-30T21:21:41+00:00

* bugfixes:
  NFSv4.1: Fix an NFSv4.1 state renewal regression
  NFSv4: fix open/lock state recovery error handling
  NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
  NFS: Fabricate fscache server index key correctly
  SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
  nfs: fix duplicate proc entries

svcrdma: advertise the correct max payload

2014-09-29T18:35:18+00:00

Svcrdma currently advertises 1MB, which is too large.  The correct value
is the minimum of RPCSVC_MAXPAYLOAD and the max scatter-gather allowed
in an NFSRDMA IO chunk * the host page size. This bug is usually benign
because the Linux X64 NFSRDMA client correctly limits the payload size to
the correct value (64*4096 = 256KB).  But if the Linux client is PPC64
with a 64KB page size, then the client will indeed use a payload size
that will overflow the server.
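
The arithmetic above can be checked with a short sketch. The constants are assumptions for illustration: a 1MB cap for RPCSVC_MAXPAYLOAD and 64 scatter-gather entries per chunk (taken from the 64*4096 figure in the text). The toy_* names are hypothetical and do not appear in the kernel.

```c
#include <stddef.h>

#define TOY_RPCSVC_MAXPAYLOAD (1024u * 1024u)  /* assumed 1MB upper bound */
#define TOY_MAX_SGE           64u              /* assumed entries per chunk */

/* Largest payload the server should advertise for a given host page size. */
static size_t toy_rdma_max_payload(size_t page_size)
{
    size_t by_sge = (size_t)TOY_MAX_SGE * page_size;

    return by_sge < TOY_RPCSVC_MAXPAYLOAD ? by_sge : TOY_RPCSVC_MAXPAYLOAD;
}
```

Under these assumptions a server with 4096-byte pages would advertise 262144 bytes (256KB) rather than 1MB, so a client with larger pages has no advertised headroom to exceed what the server can map.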

Signed-off-by: Steve Wise 
Signed-off-by: J. Bruce Fields

SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT

2014-09-26T01:25:17+00:00

The flag RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT was introduced in
order to allow NFSv4 clients to disable resend timeouts. Since those
cause the RPC layer to break the connection, they mess up the duplicate
reply caches that remain indexed on the port number in NFSv4.

This patch includes the code that was missing in the original to
set the appropriate flag in struct rpc_clnt, when the caller of
rpc_create() sets RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT.
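
As a rough illustration of what carrying a create-time flag over to the client object means, here is a minimal sketch. The struct layouts, the bit value and all toy_* names are hypothetical; only the idea of testing the caller's flag and setting a per-client bit comes from the description above.

```c
/* Hypothetical bit value; the real flag's value is not given in this log. */
#define TOY_CREATE_NO_RETRANS_TIMEOUT (1UL << 0)

/* Hypothetical stand-in for the arguments passed to rpc_create(). */
struct toy_create_args {
    unsigned long flags;
};

/* Hypothetical stand-in for struct rpc_clnt. */
struct toy_clnt {
    unsigned int no_retrans_timeout : 1;  /* set when resend timeouts are off */
};

/* Copy the caller's request into the per-client state at creation time. */
static void toy_apply_create_flags(struct toy_clnt *clnt,
                                   const struct toy_create_args *args)
{
    if (args->flags & TOY_CREATE_NO_RETRANS_TIMEOUT)
        clnt->no_retrans_timeout = 1;
}
```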

Fixes: 8a19a0b6cb2e (SUNRPC: Add RPC task and client level options to...)
Signed-off-by: Trond Myklebust

NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page()

2014-09-25T12:25:47+00:00

Now that nfs_release_page() doesn't block indefinitely, other deadlock
avoidance mechanisms aren't needed.
 - it doesn't hurt for kswapd to block occasionally.  If it doesn't
   want to block it would clear __GFP_WAIT.  The current_is_kswapd()
   was only added to avoid deadlocks and we have a new approach for
   that.
 - memory allocation in the SUNRPC layer can very rarely try to
   ->releasepage() a page it is trying to handle.  The deadlock
   is removed as nfs_release_page() doesn't block indefinitely.

So we don't need to set PF_FSTRANS for sunrpc network operations any
more.

Signed-off-by: NeilBrown 
Acked-by: Jeff Layton 
Signed-off-by: Trond Myklebust

rpc: Add -EPERM processing for xs_udp_send_request()

2014-09-25T03:13:46+00:00

If an iptables drop rule is added for an nfs server, the client can end up in
a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM
is ignored since the prior bits of the packet may have been successfully queued
and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request()
thinks that because some bits were queued it should return -EAGAIN. We then try
the request again and again, resulting in cpu spinning. Reproducer:

1) open a file on the nfs server '/nfs/foo' (mounted using udp)
2) iptables -A OUTPUT -d  -j DROP
3) write to /nfs/foo
4) close /nfs/foo
5) iptables -D OUTPUT -d  -j DROP

The softlockup occurs in step 4 above.

The previous patch allows xs_sendpages() to return both a sent count and
any error values that may have occurred. Thus, if we get an -EPERM, return
that to the higher level code.
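
One way to picture the decision described above is a small helper that maps the error and the sent count to the status handed upward. This is a sketch under assumptions, not the code in xs_udp_send_request(); toy_udp_send_status and its parameters are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/*
 * err:   0 or a negative errno reported by the send helper
 * sent:  bytes queued before any error occurred
 * total: bytes the request needed to send
 */
static int toy_udp_send_status(int err, size_t sent, size_t total)
{
    /* A hard failure such as -EPERM is reported even after partial progress. */
    if (err == -EPERM)
        return -EPERM;

    /* Otherwise partial progress means "come back later". */
    if (err == 0 || sent > 0)
        return sent >= total ? 0 : -EAGAIN;

    /* Nothing was queued: pass the error through unchanged. */
    return err;
}
```

With the -EPERM check removed, the partial-progress branch would turn the failure into -EAGAIN on every attempt, which is the endless retry the commit message describes.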

With this patch in place we can successfully abort the above sequence and
avoid the softlockup.

I also tried the above test case on an nfs mount on tcp and although the system
does not softlockup, I still ended up with the 'hung_task' firing after 120
seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix,
since -EPERM appears to get ignored much lower down in the stack and does not
propagate up to xs_sendpages(). This case is not quite as insidious as the
softlockup and it is not addressed here.

Reported-by: Yigong Lou 
Signed-off-by: Jason Baron 
Signed-off-by: Trond Myklebust

rpc: return sent and err from xs_sendpages()

2014-09-25T03:13:37+00:00

If an error is returned after the first bits of a packet have already been
successfully queued, xs_sendpages() will return a positive 'int' value
indicating success. Callers seem to treat this as -EAGAIN.

However, there are cases where it's not a question of waiting for the write
queue to drain. For example, when there is an iptables rule dropping packets
to the destination, the lower level code can return -EPERM only after parts
of the packet have been successfully queued. In this case, we can end up
continuously retrying resulting in a kernel softlockup.

This patch is intended to make no changes in behavior but is in preparation for
subsequent patches that can make decisions based on both the number of bytes
sent by xs_sendpages() and any errors that may have been returned.
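
The interface change described above, returning the error separately from the byte count, can be sketched as follows. This is an illustration only: toy_sock, toy_sock_send and toy_sendpages are hypothetical, and the fake socket simply accepts a fixed number of bytes and then fails with a chosen errno.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical fake socket: accepts accept_bytes in total, then fails. */
struct toy_sock {
    size_t accept_bytes;
    int    fail_errno;
};

/* Returns the bytes queued for this fragment, or a negative errno. */
static int toy_sock_send(struct toy_sock *s, size_t len)
{
    if (s->accept_bytes == 0)
        return -s->fail_errno;

    size_t n = len < s->accept_bytes ? len : s->accept_bytes;
    s->accept_bytes -= n;
    return (int)n;
}

/*
 * Sends the fragments in order. The return value is 0 or a negative errno;
 * *sent always reports how many bytes were queued before returning, so the
 * caller sees both pieces of information.
 */
static int toy_sendpages(struct toy_sock *s, const size_t *frag_lens,
                         size_t nfrags, size_t *sent)
{
    *sent = 0;
    for (size_t i = 0; i < nfrags; i++) {
        int n = toy_sock_send(s, frag_lens[i]);
        if (n < 0)
            return n;           /* earlier progress stays in *sent */
        *sent += (size_t)n;
        if ((size_t)n < frag_lens[i])
            return 0;           /* short write: caller decides what to do */
    }
    return 0;
}
```

A function that returned a single int would have to choose between the positive count and the error, which is how an -EPERM that arrives after the first fragment gets lost.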

Signed-off-by: Jason Baron 
Signed-off-by: Trond Myklebust

SUNRPC: Don't wake tasks during connection abort

2014-09-25T03:06:56+00:00

When aborting a connection to preserve source ports, don't wake the task in
xs_error_report.  This allows tasks with RPC_TASK_SOFTCONN to succeed if the
connection needs to be re-established since it preserves the task's status
instead of setting it to the status of the aborting kernel_connect().

This may also avoid a potential conflict on the socket's lock.

Signed-off-by: Benjamin Coddington 
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Trond Myklebust