summaryrefslogtreecommitdiff
path: root/fs/nfsd
AgeCommit message (Collapse)Author
12 daysMerge tag 'nfsd-6.18-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Regression fixes: - Revert the patch that removed the cap on MAX_OPS_PER_COMPOUND - Address a kernel build issue Stable fixes: - Fix crash when a client queries new attributes on forechannel - Fix rare NFSD crash when tracing is enabled" * tag 'nfsd-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "NFSD: Remove the cap on number of operations per NFSv4 COMPOUND" nfsd: Avoid strlen conflict in nfsd4_encode_components_esc() NFSD: Fix crash in nfsd4_read_release() NFSD: Define actions for the new time_deleg FATTR4 attributes
2025-10-21Revert "NFSD: Remove the cap on number of operations per NFSv4 COMPOUND"Chuck Lever
I've found that pynfs COMP6 now leaves the connection or lease in a strange state, which causes CLOSE9 to hang indefinitely. I've dug into it a little, but I haven't been able to root-cause it yet. However, I bisected to commit 48aab1606fa8 ("NFSD: Remove the cap on number of operations per NFSv4 COMPOUND"). Tianshuo Han also reports a potential vulnerability when decoding an NFSv4 COMPOUND. An attacker can place an arbitrarily large op count in the COMPOUND header, which results in: [ 51.410584] nfsd: vmalloc error: size 1209533382144, exceeds total pages, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 when NFSD attempts to allocate the COMPOUND op array. Let's restore the operation-per-COMPOUND limit, but increased to 200 for now. Reported-by: tianshuo han <hantianshuo233@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Tested-by: Tianshuo Han <hantianshuo233@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-21nfsd: Avoid strlen conflict in nfsd4_encode_components_esc()Nathan Chancellor
There is an error building nfs4xdr.c with CONFIG_SUNRPC_DEBUG_TRACE=y and CONFIG_FORTIFY_SOURCE=n due to the local variable strlen conflicting with the function strlen(): In file included from include/linux/cpumask.h:11, from arch/x86/include/asm/paravirt.h:21, from arch/x86/include/asm/irqflags.h:102, from include/linux/irqflags.h:18, from include/linux/spinlock.h:59, from include/linux/mmzone.h:8, from include/linux/gfp.h:7, from include/linux/slab.h:16, from fs/nfsd/nfs4xdr.c:37: fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_components_esc': include/linux/kernel.h:321:46: error: called object 'strlen' is not a function or function pointer 321 | __trace_puts(_THIS_IP_, str, strlen(str)); \ | ^~~~~~ include/linux/kernel.h:265:17: note: in expansion of macro 'trace_puts' 265 | trace_puts(fmt); \ | ^~~~~~~~~~ include/linux/sunrpc/debug.h:34:41: note: in expansion of macro 'trace_printk' 34 | # define __sunrpc_printk(fmt, ...) trace_printk(fmt, ##__VA_ARGS__) | ^~~~~~~~~~~~ include/linux/sunrpc/debug.h:42:17: note: in expansion of macro '__sunrpc_printk' 42 | __sunrpc_printk(fmt, ##__VA_ARGS__); \ | ^~~~~~~~~~~~~~~ include/linux/sunrpc/debug.h:25:9: note: in expansion of macro 'dfprintk' 25 | dfprintk(FACILITY, fmt, ##__VA_ARGS__) | ^~~~~~~~ fs/nfsd/nfs4xdr.c:2646:9: note: in expansion of macro 'dprintk' 2646 | dprintk("nfsd4_encode_components(%s)\n", components); | ^~~~~~~ fs/nfsd/nfs4xdr.c:2643:13: note: declared here 2643 | int strlen, count=0; | ^~~~~~ This dprintk() instance is not particularly useful, so just remove it altogether to get rid of the immediate strlen() conflict. At the same time, eliminate the local strlen variable to avoid potential conflicts with strlen() in the future. Fixes: ec7d8e68ef0e ("sunrpc: add a Kconfig option to redirect dfprintk() output to trace buffer") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-21NFSD: Fix crash in nfsd4_read_release()Chuck Lever
When tracing is enabled, the trace_nfsd_read_done trace point crashes during the pynfs read.testNoFh test. Fixes: 15a8b55dbb1b ("nfsd: call op_release, even when op_func returns an error") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-21NFSD: Define actions for the new time_deleg FATTR4 attributesChuck Lever
NFSv4 clients won't send legitimate GETATTR requests for these new attributes because they are intended to be used only with CB_GETATTR and SETATTR. But NFSD has to do something besides crashing if it ever sees a GETATTR request that queries these attributes. RFC 8881 Section 18.7.3 states: > The server MUST return a value for each attribute that the client > requests if the attribute is supported by the server for the > target file system. If the server does not support a particular > attribute on the target file system, then it MUST NOT return the > attribute value and MUST NOT set the attribute bit in the result > bitmap. The server MUST return an error if it supports an > attribute on the target but cannot obtain its value. In that case, > no attribute values will be returned. Further, RFC 9754 Section 5 states: > These new attributes are invalid to be used with GETATTR, VERIFY, > and NVERIFY, and they can only be used with CB_GETATTR and SETATTR > by a client holding an appropriate delegation. Thus there does not appear to be a specific server response mandated by specification. Taking the guidance that querying these attributes via GETATTR is "invalid", NFSD will return nfserr_inval, failing the request entirely. Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/linux-nfs/7819419cf0cb50d8130dc6b747765d2b8febc88a.camel@kernel.org/T/#t Fixes: 51c0d4f7e317 ("nfsd: add support for FATTR4_OPEN_ARGUMENTS") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-14Merge tag 'nfsd-6.18-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix a crasher reported by rtm@csail.mit.edu * tag 'nfsd-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Define a proc_layoutcommit for the FlexFiles layout type
2025-10-10NFSD: Define a proc_layoutcommit for the FlexFiles layout typeChuck Lever
Avoid a crash if a pNFS client should happen to send a LAYOUTCOMMIT operation on a FlexFiles layout. Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/linux-nfs/152f99b2-ba35-4dec-93a9-4690e625dccd@oracle.com/T/#t Cc: Thomas Haynes <loghyr@hammerspace.com> Cc: stable@vger.kernel.org Fixes: 9b9960a0ca47 ("nfsd: Add a super simple flex file server") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-06Merge tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "Mike Snitzer has prototyped a mechanism for disabling I/O caching in NFSD. This is introduced in v6.18 as an experimental feature. This enables scaling NFSD in /both/ directions: - NFS service can be supported on systems with small memory footprints, such as low-cost cloud instances - Large NFS workloads will be less likely to force the eviction of server-local activity, helping it avoid thrashing Jeff Layton contributed a number of fixes to the new attribute delegation implementation (based on a pending Internet RFC) that we hope will make attribute delegation reliable enough to enable by default, as it is on the Linux NFS client. The remaining patches in this pull request are clean-ups and minor optimizations. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.18 NFSD development cycle" * tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits) nfsd: discard nfserr_dropit SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it NFSD: Add io_cache_{read,write} controls to debugfs NFSD: Do the grace period check in ->proc_layoutget nfsd: delete unnecessary NULL check in __fh_verify() NFSD: Allow layoutcommit during grace period NFSD: Disallow layoutget during grace period sunrpc: fix "occurence"->"occurrence" nfsd: Don't force CRYPTO_LIB_SHA256 to be built-in nfsd: nfserr_jukebox in nlm_fopen should lead to a retry NFSD: Reduce DRC bucket size NFSD: Delay adding new entries to LRU SUNRPC: Move the svc_rpcb_cleanup() call sites NFS: Remove rpcbind cleanup for NFSv4.0 callback nfsd: unregister with rpcbind when deleting a transport NFSD: Drop redundant conversion to bool sunrpc: eliminate return pointer in svc_tcp_sendmsg() sunrpc: fix pr_notice in svc_tcp_sendto() to show correct length nfsd: decouple the xprtsec policy check from check_nfsd_access() NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul() ...
2025-10-03Merge tag 'pull-f_path' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull file->f_path constification from Al Viro: "Only one thing was modifying ->f_path of an opened file - acct(2). Massaging that away and constifying a bunch of struct path * arguments in functions that might be given &file->f_path ends up with the situation where we can turn ->f_path into an anon union of const struct path f_path and struct path __f_path, the latter modified only in a few places in fs/{file_table,open,namei}.c, all for struct file instances that are yet to be opened" * tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits) Have cc(1) catch attempts to modify ->f_path kernel/acct.c: saner struct file treatment configfs:get_target() - release path as soon as we grab configfs_item reference apparmor/af_unix: constify struct path * arguments ovl_is_real_file: constify realpath argument ovl_sync_file(): constify path argument ovl_lower_dir(): constify path argument ovl_get_verity_digest(): constify path argument ovl_validate_verity(): constify {meta,data}path arguments ovl_ensure_verity_loaded(): constify datapath argument ksmbd_vfs_set_init_posix_acl(): constify path argument ksmbd_vfs_inherit_posix_acl(): constify path argument ksmbd_vfs_kern_path_unlock(): constify path argument ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path * check_export(): constify path argument export_operations->open(): constify path argument rqst_exp_get_by_name(): constify path argument nfs: constify path argument of __vfs_getattr() bpf...d_path(): constify path argument done_path_create(): constify path argument ...
2025-10-03Merge tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Add a Kconfig option to redirect dfprintk() to the trace buffer - Enable use of the RWF_DONTCACHE flag on the NFS client - Add striped layout handling to pNFS flexfiles - Add proper localio handling for READ and WRITE O_DIRECT Bugfixes: - Handle NFS4ERR_GRACE errors during delegation recall - Fix NFSv4.1 backchannel max_resp_sz verification check - Fix mount hang after CREATE_SESSION failure - Fix d_parent->d_inode locking in nfs4_setup_readdir() Other Cleanups and Improvements: - Improvements to write handling tracepoints - Fix a few trivial spelling mistakes - Cleanups to the rpcbind cleanup call sites - Convert the SUNRPC xdr_buf to use a scratch folio instead of scratch page - Remove unused NFS_WBACK_BUSY() macro - Remove __GFP_NOWARN flags - Unexport rpc_malloc() and rpc_free()" * tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (46 commits) NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support nfs/localio: add tracepoints for misaligned DIO READ and WRITE support nfs/localio: add proper O_DIRECT support for READ and WRITE nfs/localio: refactor iocb initialization nfs/localio: refactor iocb and iov_iter_bvec initialization nfs/localio: avoid issuing misaligned IO using O_DIRECT nfs/localio: make trace_nfs_local_open_fh more useful NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support sunrpc: unexport rpc_malloc() and rpc_free() NFSv4/flexfiles: Add support for striped layouts NFSv4/flexfiles: Update layout stats & error paths for striped layouts NFSv4/flexfiles: Write path updates for striped layouts NFSv4/flexfiles: Commit path updates for striped layouts NFSv4/flexfiles: Read path updates for striped layouts NFSv4/flexfiles: Update low level helper functions to be DS stripe aware. NFSv4/flexfiles: Add data structure support for striped layouts NFSv4/flexfiles: Use ds_commit_idx when marking a write commit NFSv4/flexfiles: Remove cred local variable dependency nfs4_setup_readdir(): insufficient locking for ->d_parent->d_inode dereferencing NFS: Enable use of the RWF_DONTCACHE flag on the NFS client ...
2025-10-03Merge tag 'pull-nfsctl' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull nfsctl updates from Al Viro: "nfsctl cleanups and a fix" * tag 'pull-nfsctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nfsd_get_inode(): lift setting ->i_{,f}op to callers. nfsdfs_create_files(): switch to simple_start_creating() _nfsd_symlink(): switch to simple_start_creating() nfsd_mkdir(): switch to simple_start_creating() nfsctl: symlink has no business bumping link count of parent directory
2025-10-01nfsd: discard nfserr_dropitNeilBrown
nfserr_dropit hasn't been used for over a decade, since rq_dropme and the RQ_DROPME were introduced. Time to get rid of it completely. Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-01NFSD: Add io_cache_{read,write} controls to debugfsMike Snitzer
Add 'io_cache_read' to NFSD's debugfs interface so that any data read by NFSD will either be: - cached using page cache (NFSD_IO_BUFFERED=0) - cached but removed from the page cache upon completion (NFSD_IO_DONTCACHE=1). io_cache_read may be set by writing to: /sys/kernel/debug/nfsd/io_cache_read Add 'io_cache_write' to NFSD's debugfs interface so that any data written by NFSD will either be: - cached using page cache (NFSD_IO_BUFFERED=0) - cached but removed from the page cache upon completion (NFSD_IO_DONTCACHE=1). io_cache_write may be set by writing to: /sys/kernel/debug/nfsd/io_cache_write The default value for both settings is NFSD_IO_BUFFERED, which is NFSD's existing behavior for both read and write. Changes to these settings take immediate effect for all exports and NFS versions. Currently only xfs and ext4 implement RWF_DONTCACHE. For file systems that do not implement RWF_DONTCACHE, NFSD use only buffered I/O when the io_cache setting is NFSD_IO_DONTCACHE. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-01NFSD: Do the grace period check in ->proc_layoutgetChuck Lever
RFC 8881 Section 18.43.3 states: > If the metadata server is in a grace period, and does not persist > layouts and device ID to device address mappings, then it MUST > return NFS4ERR_GRACE (see Section 8.4.2.1). Jeff observed that this suggests the grace period check is better done by the individual layout type implementations, because checking for the server grace period is unnecessary for some layout types. Suggested-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/linux-nfs/7h5p5ktyptyt37u6jhpbjfd5u6tg44lriqkdc7iz7czeeabrvo@ijgxz27dw4sg/T/#t Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-01nfsd: delete unnecessary NULL check in __fh_verify()Dan Carpenter
In commit 4a0de50a44bb ("nfsd: decouple the xprtsec policy check from check_nfsd_access()") we added a NULL check on "rqstp" to earlier in the function. This check is no longer required so delete it. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-01NFSD: Allow layoutcommit during grace periodSergey Bashirov
If the loca_reclaim field is set to TRUE, this indicates that the client is attempting to commit changes to a layout after the restart of the metadata server during the metadata server's recovery grace period. This type of request may be necessary when the client has uncommitted writes to provisionally allocated byte-ranges of a file that were sent to the storage devices before the restart of the metadata server. See RFC 8881, section 18.42.3. Without this, the client is not able to increase the file size and commit preallocated extents when the block/scsi layout server is restarted during a write and is in a grace period. And when the grace period ends, the client also cannot perform layoutcommit because the old layout state becomes invalid, resulting in file corruption. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-30nfs/localio: avoid issuing misaligned IO using O_DIRECTMike Snitzer
Add nfsd_file_dio_alignment and use it to avoid issuing misaligned IO using O_DIRECT. Any misaligned DIO falls back to using buffered IO. Because misaligned DIO is now handled safely, remove the nfs modparam 'localio_O_DIRECT_semantics' that was added to require users opt-in to the requirement that all O_DIRECT be properly DIO-aligned. Also, introduce nfs_iov_iter_aligned_bvec() which is a variant of iov_iter_aligned_bvec() that also verifies the offset associated with an iov_iter is DIO-aligned. NOTE: in a parallel effort, iov_iter_aligned_bvec() is being removed along with iov_iter_is_aligned(). Lastly, add pr_info_ratelimited if underlying filesystem returns -EINVAL because it was made to try O_DIRECT for IO that is not DIO-aligned (shouldn't happen, so its best to be louder if it does). Fixes: 3feec68563d ("nfs/localio: add direct IO enablement with sync and async IO support") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-30NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN supportMike Snitzer
Use STATX_DIOALIGN and STATX_DIO_READ_ALIGN to get DIO alignment attributes from the underlying filesystem and store them in the associated nfsd_file. This is done when the nfsd_file is first opened for each regular file. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-29Merge tag 'vfs-6.18-rc1.async' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs async directory updates from Christian Brauner: "This contains further preparatory changes for the asynchronous directory locking scheme: - Add lookup_one_positive_killable() which allows overlayfs to perform lookup that won't block on a fatal signal - Unify the mount idmap handling in struct renamedata as a rename can only happen within a single mount - Introduce kern_path_parent() for audit which sets the path to the parent and returns a dentry for the target without holding any locks on return - Rename kern_path_locked() as it is only used to prepare for the removal of an object from the filesystem: kern_path_locked() => start_removing_path() kern_path_create() => start_creating_path() user_path_create() => start_creating_user_path() user_path_locked_at() => start_removing_user_path_at() done_path_create() => end_creating_path() NA => end_removing_path()" * tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: debugfs: rename start_creating() to debugfs_start_creating() VFS: rename kern_path_locked() and related functions. VFS/audit: introduce kern_path_parent() for audit VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata VFS: discard err2 in filename_create() VFS/ovl: add lookup_one_positive_killable()
2025-09-29Merge tag 'vfs-6.18-rc1.workqueue' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs workqueue updates from Christian Brauner: "This contains various workqueue changes affecting the filesystem layer. Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This replaces the use of system_wq and system_unbound_wq. system_wq is a per-CPU workqueue which isn't very obvious from the name and system_unbound_wq is to be used when locality is not required. So this renames system_wq to system_percpu_wq, and system_unbound_wq to system_dfl_wq. This also adds a new WQ_PERCPU flag to allow the fs subsystem users to explicitly request the use of per-CPU behavior. Both WQ_UNBOUND and WQ_PERCPU flags coexist for one release cycle to allow callers to transition their calls. WQ_UNBOUND will be removed in a next release cycle" * tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: WQ_PERCPU added to alloc_workqueue users fs: replace use of system_wq with system_percpu_wq fs: replace use of system_unbound_wq with system_dfl_wq
2025-09-25NFSD: Disallow layoutget during grace periodSergey Bashirov
When the server is recovering from a reboot and is in a grace period, any operation that may result in deletion or reallocation of block extents should not be allowed. See RFC 8881, section 18.43.3. If multiple clients write data to the same file, rebooting the server during writing may result in file corruption. In the worst case, the exported XFS may also become corrupted. Observed this behavior while testing pNFS block volume setup. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-23SUNRPC: Move the svc_rpcb_cleanup() call sitesChuck Lever
Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all() are always invoked in pairs, we can deduplicate code by moving the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23VFS: unify old_mnt_idmap and new_mnt_idmap in renamedataNeilBrown
A rename operation can only rename within a single mount. Callers of vfs_rename() must and do ensure this is the case. So there is no point in having two mnt_idmaps in renamedata as they are always the same. Only one of them is passed to ->rename in any case. This patch replaces both with a single "mnt_idmap" and changes all callers. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-21nfsd: Don't force CRYPTO_LIB_SHA256 to be built-inEric Biggers
Now that nfsd is accessing SHA-256 via the library API instead of via crypto_shash, there is a direct symbol dependency on the SHA-256 code and there is no benefit to be gained from forcing it to be built-in. Therefore, select CRYPTO_LIB_SHA256 from NFSD (conditional on NFSD_V4) instead of from NFSD_V4, so that it can be 'm' if NFSD is 'm'. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: nfserr_jukebox in nlm_fopen should lead to a retryOlga Kornievskaia
When v3 NLM request finds a conflicting delegation, it triggers a delegation recall and nfsd_open fails with EAGAIN. nfsd_open then translates EAGAIN into nfserr_jukebox. In nlm_fopen, instead of returning nlm_failed for when there is a conflicting delegation, drop this NLM request so that the client retries. Once delegation is recalled and if a local lock is claimed, a retry would lead to nfsd returning a nlm_lck_blocked error or a successful nlm lock. Fixes: d343fce148a4 ("[PATCH] knfsd: Allow lockd to drop replies as appropriate") Cc: stable@vger.kernel.org # v6.6 Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21NFSD: Reduce DRC bucket sizeChuck Lever
The common case is that a DRC lookup will not find the XID in the bucket. Reduce the amount of pointer chasing during the lookup by keeping fewer entries in each hash bucket. Changing the bucket size constant forces the size of the DRC hash table to increase, and the height of each bucket r-b tree to be reduced. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21NFSD: Delay adding new entries to LRUChuck Lever
Neil Brown observes: > I would not include RC_INPROG entries in the lru at all - they are > always ignored, and will be added when they are switched to > RCU_DONE. I also removed a stale comment. Suggested-by: NeilBrown <neil@brown.name> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21SUNRPC: Move the svc_rpcb_cleanup() call sitesChuck Lever
Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all() are always invoked in pairs, we can deduplicate code by moving the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all(). Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21NFSD: Drop redundant conversion to boolXichao Zhao
The result of integer comparison already evaluates to bool. No need for explicit conversion. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: decouple the xprtsec policy check from check_nfsd_access()Scott Mayhew
A while back I had reported that an NFSv3 client could successfully mount using '-o xprtsec=none' an export that had been exported with 'xprtsec=tls:mtls'. By "successfully" I mean that the mount command would succeed and the mount would show up in /proc/mount. Attempting to do anything futher with the mount would be met with NFS3ERR_ACCES. This was fixed (albeit accidentally) by commit bb4f07f2409c ("nfsd: Fix NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT") and was subsequently re-broken by commit 0813c5f01249 ("nfsd: fix access checking for NLM under XPRTSEC policies"). Transport Layer Security isn't an RPC security flavor or pseudo-flavor, so we shouldn't be conflating them when determining whether the access checks can be bypassed. Split check_nfsd_access() into two helpers, and have __fh_verify() call the helpers directly since __fh_verify() has logic that allows one or both of the checks to be skipped. All other sites will continue to call check_nfsd_access(). Link: https://lore.kernel.org/linux-nfs/ZjO3Qwf_G87yNXb2@aion/ Fixes: 9280c5774314 ("NFSD: Handle new xprtsec= export option") Cc: stable@vger.kernel.org Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul()Thorsten Blum
Commit 5304877936c0 ("NFSD: Fix strncpy() fortify warning") replaced strncpy(,, sizeof(..)) with strlcpy(,, sizeof(..) - 1), but strlcpy() already guaranteed NUL-termination of the destination buffer and subtracting one byte potentially truncated the source string. The incorrect size was then carried over in commit 72f78ae00a8e ("NFSD: move from strlcpy with unused retval to strscpy") when switching from strlcpy() to strscpy(). Fix this off-by-one error by using the full size of the destination buffer again. Cc: stable@vger.kernel.org Fixes: 5304877936c0 ("NFSD: Fix strncpy() fortify warning") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: Eliminate an allocation in nfs4_make_rec_clidname()Eric Biggers
Since MD5 digests are fixed-size, make nfs4_make_rec_clidname() store the digest in a stack buffer instead of a dynamically allocated buffer. Use MD5_DIGEST_SIZE instead of a hard-coded value, both in nfs4_make_rec_clidname() and in the definition of HEXDIR_LEN. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: Replace open-coded conversion of bytes to hexEric Biggers
Since the Linux kernel's sprintf() has conversion to hex built-in via "%*phN", delete md5_to_hex() and just use that. Also add an explicit array bound to the dname parameter of nfs4_make_rec_clidname() to make its size clear. No functional change. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegationJeff Layton
Instead of allowing the ctime to roll backward with a WRITE_ATTRS delegation, set FMODE_NOCMTIME on the file and have it skip mtime and ctime updates. It is possible that the client will never send a SETATTR to set the times before returning the delegation. Add two new bools to struct nfs4_delegation: dl_written: tracks whether the file has been written since the delegation was granted. This is set in the WRITE and LAYOUTCOMMIT handlers. dl_setattr: tracks whether the client has sent at least one valid mtime that can also update the ctime in a SETATTR. When unlocking the lease for the delegation, clear FMODE_NOCMTIME. If the file has been written, but no setattr for the delegated mtime and ctime has been done, update the timestamps to current_time(). Suggested-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: fix timestamp updates in CB_GETATTRJeff Layton
When updating the local timestamps from CB_GETATTR, the updated values are not being properly vetted. Compare the update times vs. the saved times in the delegation rather than the current times in the inode. Also, ensure that the ctime is properly vetted vs. its original value. Fixes: 6ae30d6eb26b ("nfsd: add support for delegated timestamps") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: fix SETATTR updates for delegated timestampsJeff Layton
SETATTRs containing delegated timestamp updates are currently not being vetted properly. Since we no longer need to compare the timestamps vs. the current timestamps, move the vetting of delegated timestamps wholly into nfsd. Rename the set_cb_time() helper to nfsd4_vet_deleg_time(), and make it non-static. Add a new vet_deleg_attrs() helper that is called from nfsd4_setattr that uses nfsd4_vet_deleg_time() to properly validate the all the timestamps. If the validation indicates that the update should be skipped, unset the appropriate flags in ia_valid. Fixes: 7e13f4f8d27d ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: track original timestamps in nfs4_delegationJeff Layton
As Trond points out [1], the "original time" mentioned in RFC 9754 refers to the timestamps on the files at the time that the delegation was granted, and not the current timestamp of the file on the server. Store the current timestamps for the file in the nfs4_delegation when granting one. Add STATX_ATIME and STATX_MTIME to the request mask in nfs4_delegation_stat(). When granting OPEN_DELEGATE_READ_ATTRS_DELEG, do a nfs4_delegation_stat() and save the correct atime. If the stat() fails for any reason, fall back to granting a normal read deleg. [1]: https://lore.kernel.org/linux-nfs/47a4e40310e797f21b5137e847b06bb203d99e66.camel@kernel.org/ Fixes: 7e13f4f8d27d ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: use ATTR_CTIME_SET for delegated ctime updatesJeff Layton
Ensure that notify_change() doesn't clobber a delegated ctime update with current_time() by setting ATTR_CTIME_SET for those updates. Don't bother setting the timestamps in cb_getattr_update_times() in the non-delegated case. notify_change() will do that itself. Fixes: 7e13f4f8d27d ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: ignore ATTR_DELEG when checking ia_valid before notify_change()Jeff Layton
If the only flag left is ATTR_DELEG, then there are no changes to be made. Fixes: 7e13f4f8d27d ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: fix assignment of ia_ctime.tv_nsec on delegated mtime updateJeff Layton
The ia_ctime.tv_nsec field should be set to modify.nseconds. Fixes: 7e13f4f8d27d ("nfsd: handle delegated timestamps in SETATTR") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21NFSD: Fix last write offset handling in layoutcommitSergey Bashirov
The data type of loca_last_write_offset is newoffset4 and is switched on a boolean value, no_newoffset, that indicates if a previous write occurred or not. If no_newoffset is FALSE, an offset is not given. This means that client does not try to update the file size. Thus, server should not try to calculate new file size and check if it fits into the segment range. See RFC 8881, section 12.5.4.2. Sometimes the current incorrect logic may cause clients to hang when trying to sync an inode. If layoutcommit fails, the client marks the inode as dirty again. Fixes: 9cf514ccfacb ("nfsd: implement pNFS operations") Cc: stable@vger.kernel.org Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21NFSD: Implement large extent array support in pNFSSergey Bashirov
When pNFS client in the block or scsi layout mode sends layoutcommit to MDS, a variable length array of modified extents is supplied within the request. This patch allows the server to accept such extent arrays if they do not fit within single memory page. The issue can be reproduced when writing to a 1GB file using FIO with O_DIRECT, 4K block and large I/O depth without preallocation of the file. In this case, the server returns NFSERR_BADXDR to the client. Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21NFSD: Minor cleanup in layoutcommit decodingSergey Bashirov
Use the appropriate xdr function to decode the lc_newoffset field, which is a boolean value. See RFC 8881, section 18.42.1. Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21NFSD: Minor cleanup in layoutcommit processingSergey Bashirov
Remove dprintk in nfsd4_layoutcommit. These are not needed in day to day usage, and the information is also available in Wireshark when capturing NFS traffic. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21NFSD: Rework encoding and decoding of nfsd4_deviceidSergey Bashirov
Compilers may optimize the layout of C structures, so we should not rely on sizeof struct and memcpy to encode and decode XDR structures. The byte order of the fields should also be taken into account. This patch adds the correct functions to handle the deviceid4 structure and removes the pad field, which is currently not used by NFSD, from the runtime state. The server's byte order is preserved because the deviceid4 blob on the wire is only used as a cookie by the client. Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: discard nfsd_file_get_local()NeilBrown
This interface was deprecated by commit e6f7e1487ab5 ("nfs_localio: simplify interface to nfsd for getting nfsd_file") and is now unused. So let's remove it. Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21NFSD: Move the fh_getattr() helperChuck Lever
Clean up: The fh_getattr() function is part of NFSD's file handle API, so relocate it. I've made it an un-inlined function so that trace points and new functionality can easily be introduced. That increases the size of nfsd.ko by about a page on my x86_64 system (out of 26MB; compiled with -O2). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21NFSD: Relocate the fh_want_write() and fh_drop_write() helpersChuck Lever
Clean up: these helpers are part of the NFSD file handle API. Relocate them to fs/nfsd/nfsfh.h. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-19fs: replace use of system_unbound_wq with system_dfl_wqMarco Crivellari
Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. system_unbound_wq should be the default workqueue so as not to enforce locality constraints for random work whenever it's not required. Adding system_dfl_wq to encourage its use when unbound work should be used. The old system_unbound_wq will be kept for a few release cycles. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://lore.kernel.org/20250916082906.77439-2-marco.crivellari@suse.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-17nfsd_get_inode(): lift setting ->i_{,f}op to callers.Al Viro
Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>