summaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)Author
2026-02-27NFS: Fix NFS KConfig typosAnna Schumaker
Two issues were noticed after the NFS v4.0 KConfig changes were merged upstream. First, the text of CONFIG_NFS_V4 should not encourage people to select it if they are unsure. Second, the new CONFIG_NFS_V4_0 option should default to "on" instead of "off" to avoid breaking people's setups if they are using NFS v4.0. Reported-by: Niklas Cassel <cassel@kernel.org> Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Fixes: 4e0269352534 ("NFS: Add a way to disable NFS v4.0 via KConfig") Fixes: 7537db24806f ("NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4") Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-23nfs: return EISDIR on nfs3_proc_create if d_alias is a dirRoberto Bergantinos Corpas
If we found an alias through nfs3_do_create/nfs_add_or_obtain /d_splice_alias which happens to be a dir dentry, we don't return any error, and simply forget about this alias, but the original dentry we were adding and passed as parameter remains negative. This later causes an oops on nfs_atomic_open_v23/finish_open since we supply a negative dentry to do_dentry_open. This has been observed running lustre-racer, where dirs and files are created/removed concurrently with the same name and O_EXCL is not used to open files (frequent file redirection). While d_splice_alias typically returns a directory alias or NULL, we explicitly check d_is_dir() to ensure that we don't attempt to perform file operations (like finish_open) on a directory inode, which triggers the observed oops. Fixes: 7c6c5249f061 ("NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.") Reviewed-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-22Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL usesKees Cook
Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-12Merge tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Use an LRU list for returning unused delegations - Introduce a KConfig option to disable NFS v4.0 and make NFS v4.1 the default Bugfixes: - NFS/localio: - Handle short writes by retrying - Prevent direct reclaim recursion into NFS via nfs_writepages - Use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit - Remove -EAGAIN handling in nfs_local_doio() - pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN - fs/nfs: Fix a readdir slow-start regression - SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path Other cleanups and improvements: - A few other NFS/localio cleanups - Various other delegation handling cleanups from Christoph - Unify security_inode_listsecurity() calls - Improvements to NFSv4 lease handling - Clean up SUNRPC *_debug fields when CONFIG_SUNRPC_DEBUG is not set" * tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (60 commits) SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path nfs: nfs4proc: Convert comma to semicolon SUNRPC: Change list definition method sunrpc: rpc_debug and others are defined even if CONFIG_SUNRPC_DEBUG unset NFSv4: limit lease period in nfs4_set_lease_period() NFSv4: pass lease period in seconds to nfs4_set_lease_period() nfs: unify security_inode_listsecurity() calls fs/nfs: Fix readdir slow-start regression pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN NFS: fix delayed delegation return handling NFS: simplify error handling in nfs_end_delegation_return NFS: fold nfs_abort_delegation_return into nfs_end_delegation_return NFS: remove the delegation == NULL check in nfs_end_delegation_return NFS: use bool for the issync argument to nfs_end_delegation_return NFS: return void from ->return_delegation NFS: return void from nfs4_inode_make_writeable NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4 NFS: Add a way to disable NFS v4.0 via KConfig NFS: Move sequence slot operations into minorversion operations NFS: Pass a struct nfs_client to nfs4_init_sequence() ...
2026-02-12Merge tag 'nfsd-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "Neil Brown and Jeff Layton contributed a dynamic thread pool sizing mechanism for NFSD. The sunrpc layer now tracks minimum and maximum thread counts per pool, and NFSD adjusts running thread counts based on workload: idle threads exit after a timeout when the pool exceeds its minimum, and new threads spawn automatically when all threads are busy. Administrators control this behavior via the nfsdctl netlink interface. Rick Macklem, FreeBSD NFS maintainer, generously contributed server- side support for the POSIX ACL extension to NFSv4, as specified in draft-ietf-nfsv4-posix-acls. This extension allows NFSv4 clients to get and set POSIX access and default ACLs using native NFSv4 operations, eliminating the need for sideband protocols. The feature is gated by a Kconfig option since the IETF draft has not yet been ratified. Chuck Lever delivered numerous improvements to the xdrgen tool. Error reporting now covers parsing, AST transformation, and invalid declarations. Generated enum decoders validate incoming values against valid enumerator lists. New features include pass-through line support for embedding C directives in XDR specifications, 16-bit integer types, and program number definitions. Several code generation issues were also addressed. When an administrator revokes NFSv4 state for a filesystem via the unlock_fs interface, ongoing async COPY operations referencing that filesystem are now cancelled, with CB_OFFLOAD callbacks notifying affected clients. The remaining patches in this pull request are clean-ups and minor optimizations. Sincere thanks to all contributors, reviewers, testers, and bug reporters who participated in the v7.0 NFSD development cycle" * tag 'nfsd-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits) NFSD: Add POSIX ACL file attributes to SUPPATTR bitmasks NFSD: Add POSIX draft ACL support to the NFSv4 SETATTR operation NFSD: Add support for POSIX draft ACLs for file creation NFSD: Add support for XDR decoding POSIX draft ACLs NFSD: Refactor nfsd_setattr()'s ACL error reporting NFSD: Do not allow NFSv4 (N)VERIFY to check POSIX ACL attributes NFSD: Add nfsd4_encode_fattr4_posix_access_acl NFSD: Add nfsd4_encode_fattr4_posix_default_acl NFSD: Add nfsd4_encode_fattr4_acl_trueform_scope NFSD: Add nfsd4_encode_fattr4_acl_trueform Add RPC language definition of NFSv4 POSIX ACL extension NFSD: Add a Kconfig setting to enable support for NFSv4 POSIX ACLs xdrgen: Implement pass-through lines in specifications nfsd: cancel async COPY operations when admin revokes filesystem state nfsd: add controls to set the minimum number of threads per pool nfsd: adjust number of running nfsd threads based on activity sunrpc: allow svc_recv() to return -ETIMEDOUT and -EBUSY sunrpc: split new thread creation into a separate function sunrpc: introduce the concept of a minimum number of threads per pool sunrpc: track the max number of requested threads in a pool ...
2026-02-09Merge tag 'vfs-7.0-rc1.leases' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs lease updates from Christian Brauner: "This contains updates for lease support to require filesystems to explicitly opt-in to lease support Currently kernel_setlease() falls through to generic_setlease() when a a filesystem does not define ->setlease(), silently granting lease support to every filesystem regardless of whether it is prepared for it. This is a poor default: most filesystems never intended to support leases, and the silent fallthrough makes it impossible to distinguish "supports leases" from "never thought about it". This inverts the default. It adds explicit .setlease = generic_setlease; assignments to every in-tree filesystem that should retain lease support, then changes kernel_setlease() to return -EINVAL when ->setlease is NULL. With the new default in place, simple_nosetlease() is redundant and is removed along with all references to it" * tag 'vfs-7.0-rc1.leases' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits) fuse: add setlease file operation fs: remove simple_nosetlease() filelock: default to returning -EINVAL when ->setlease operation is NULL xfs: add setlease file operation ufs: add setlease file operation udf: add setlease file operation tmpfs: add setlease file operation squashfs: add setlease file operation overlayfs: add setlease file operation orangefs: add setlease file operation ocfs2: add setlease file operation ntfs3: add setlease file operation nilfs2: add setlease file operation jfs: add setlease file operation jffs2: add setlease file operation gfs2: add a setlease file operation fat: add setlease file operation f2fs: add setlease file operation exfat: add setlease file operation ext4: add setlease file operation ...
2026-02-09Merge tag 'vfs-7.0-rc1.nonblocking_timestamps' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This contains the changes to support non-blocking timestamp updates. Since commit 66fa3cedf16a ("fs: Add async write file modification handling") file_update_time_flags() unconditionally returns -EAGAIN when any timestamp needs updating and IOCB_NOWAIT is set. This makes non-blocking direct writes impossible on file systems with granular enough timestamps, which in practice means all of them. This reworks the timestamp update path to propagate IOCB_NOWAIT through ->update_time so that file systems which can update timestamps without blocking are no longer penalized. With that groundwork in place, the core change passes IOCB_NOWAIT into ->update_time and returns -EAGAIN only when the file system indicates it would block. XFS implements non-blocking timestamp updates by using the new ->sync_lazytime and open-coding generic_update_time without the S_NOWAIT check, since the lazytime path through the generic helpers can never block in XFS" * tag 'vfs-7.0-rc1.nonblocking_timestamps' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: enable non-blocking timestamp updates xfs: implement ->sync_lazytime fs: refactor file_update_time_flags fs: add support for non-blocking timestamp updates fs: add a ->sync_lazytime method fs: factor out a sync_lazytime helper fs: refactor ->update_time handling fat: cleanup the flags for fat_truncate_time nfs: split nfs_update_timestamps fs: allow error returns from generic_update_time fs: remove inode_update_time
2026-02-09nfs: nfs4proc: Convert comma to semicolonChen Ni
Replace comma between expressions with semicolons. Using a ',' in place of a ';' can have unintended side effects. Although that is not the case here, it is seems best to use ';' unless ',' is intended. Found by inspection. No functional change intended. Compile tested only. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-09NFSv4: limit lease period in nfs4_set_lease_period()Sergey Shtylyov
In nfs4_set_lease_period(), the passed 32-bit lease period in seconds is multiplied by HZ -- that might overflow before being implicitly cast to *unsigned long* (32/64-bit type), while initializing the lease variable. Cap the lease period at MAX_LEASE_PERIOD (#define'd to 1 hour for now), before multipying to avoid such overflow... Found by Linux Verification Center (linuxtesting.org) with the Svace static analysis tool. Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Suggested-by: Trond Myklebust <trondmy@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-09NFSv4: pass lease period in seconds to nfs4_set_lease_period()Sergey Shtylyov
There's no need to multiply the lease period by HZ at all the call sites of nfs4_set_lease_period() -- it makes more sense to do that only once, inside that function, by passing to it lease period as 32-bit # of seconds instead of 32/64-bit *unsigned long* # of jiffies... Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-09nfs: unify security_inode_listsecurity() callsStephen Smalley
commit 243fea134633 ("NFSv4.2: fix listxattr to return selinux security label") introduced a direct call to security_inode_listsecurity() in nfs4_listxattr(). However, nfs4_listxattr() already indirectly called security_inode_listsecurity() via nfs4_listxattr_nfs4_label() if CONFIG_NFS_V4_SECURITY_LABEL is enabled and the server has the NFS_CAP_SECURITY_LABEL capability enabled. This duplication was fixed by commit 9acb237deff7 ("NFSv4.2: another fix for listxattr") by making the second call conditional on NFS_CAP_SECURITY_LABEL not being set by the server. However, the combination of the two changes effectively makes one call to security_inode_listsecurity() in every case - which is the desired behavior since getxattr() always returns a security xattr even if it has to synthesize one. Further, the two different calls produce different xattr name ordering between security.* and user.* xattr names. Unify the two separate calls into a single call and get rid of nfs4_listxattr_nfs4_label() altogether. Link: https://lore.kernel.org/selinux/CAEjxPJ6e8z__=MP5NfdUxkOMQ=EnUFSjWFofP4YPwHqK=Ki5nw@mail.gmail.com/ Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-09fs/nfs: Fix readdir slow-start regressionSagi Grimberg
Commit 580f236737d1 ("NFS: Adjust the amount of readahead performed by NFS readdir") reduces the amount of readahead names caching done by the client. The downside of this approach is READDIR now may suffer from a slow-start issue, where initially it will fetch names that fit in a single page, then in 2, 4, 8 until the maximum supported transfer size (usually 1M). This patch tries to take a balanced approach between mitigating the slow-start issue still maintaining some efficiency gains. Fixes: 580f236737d1 ("NFS: Adjust the amount of readahead performed by NFS readdir") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-09Merge tag 'lsm-pr-20260203' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Unify the security_inode_listsecurity() calls in NFSv4 While looking at security_inode_listsecurity() with an eye towards improving the interface, we realized that the NFSv4 code was making multiple calls to the LSM hook that could be consolidated into one. - Mark the LSM static branch keys as static - this helps resolve some sparse warnings - Add __rust_helper annotations to the LSM and cred wrapper functions - Remove the unsused set_security_override_from_ctx() function - Minor fixes to some of the LSM kdoc comment blocks * tag 'lsm-pr-20260203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: make keys for static branch static cred: remove unused set_security_override_from_ctx() rust: security: add __rust_helper to helpers rust: cred: add __rust_helper to helpers nfs: unify security_inode_listsecurity() calls lsm: fix kernel-doc struct member names
2026-02-02pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAINOlga Kornievskaia
It is possible to have a task get stuck on waiting on the NFS_LAYOUT_DRAIN in the following scenario 1. cpu a: waiter test NFS_LAYOUT_DRAIN (1) and plh_outstanding (1) 2. cpu b: atomic_dec_and_test() -> clear bit -> wake up 3. cpu c: sets NFS_LAYOUT_DRAIN again 4. cpu a: calls wait_on_bit() sleeps forever. To expand on this we have say 2 outstanding pnfs write IO that get ESTALE which causes both to call pnfs_destroy_layout() and set the NFS_LAYOUT_DRAIN bit but the 1st one doesn't call the pnfs_put_layout_hdr() yet (as that would prevent the 2nd ESTALE write from trying to call pnfs_destroy_layout()). If the 1st ESTALE write is the one that initially sets the NFS_LAYOUT_DRAIN so that new IO on this file initiates new LAYOUTGET. Another new write would find NFS_LAYOUT_DRAIN set and phl_outstanding>0 (step 1) and would wait_on_bit(). LAYOUTGET completes doing step 2. Now, the 2nd of ESTALE writes is calling pnfs_destory_layout() and set the NFS_LAYOUT_DRAIN bit (step 3). Finally, the waiting write wakes up to check the bit and goes back to sleep. The problem revolves around the fact that if NFS_LAYOUT_INVALID_STID was already set, it should not do the work of pnfs_mark_layout_stateid_invalid(), thus NFS_LAYOUT_DRAIN will not be set more than once for an invalid layout. Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> Fixes: 880265c77ac4 ("pNFS: Avoid a live lock condition in pnfs_update_layout()") Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: fix delayed delegation return handlingChristoph Hellwig
Rework this code that was totally busted at least as of my most recent changes. Introduce a separate list for delayed delegations so that they can't get lost and don't clutter up the returns list. Add a missing spin_unlock in the helper marking it as a regular pending return. Fixes: 0ebe655bd033 ("NFS: add a separate delegation return list") Reported-by: Chris Mason <clm@meta.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: simplify error handling in nfs_end_delegation_returnChristoph Hellwig
Drop the pointless delegation->lock held over setting multiple atomic bits in different structures, and use separate labels for the delay vs abort cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: fold nfs_abort_delegation_return into nfs_end_delegation_returnChristoph Hellwig
This will allow to simplify the error handling flow. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: remove the delegation == NULL check in nfs_end_delegation_returnChristoph Hellwig
All callers now pass a non-NULL delegation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: use bool for the issync argument to nfs_end_delegation_returnChristoph Hellwig
Replace the integer used as boolean with a bool type, and tidy up the prototype and top of function comment. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: return void from ->return_delegationChristoph Hellwig
The caller doesn't check the return value, so drop it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: return void from nfs4_inode_make_writeableChristoph Hellwig
None of the callers checks the return value, so drop it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4Anna Schumaker
Compiling the NFSv4 module without any minorversion support doesn't make much sense, so this patch sets NFS v4.1 as the default, always enabled NFS version allowing us to replace all the CONFIG_NFS_V4_1s scattered throughout the code with CONFIG_NFS_V4. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Add a way to disable NFS v4.0 via KConfigAnna Schumaker
I introduce NFS4_MIN_MINOR_VERSION as a parallel to NFS4_MAX_MINOR_VERSION to check if NFS v4.0 has been compiled in and return an appropriate error if not. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Move sequence slot operations into minorversion operationsAnna Schumaker
At the same time, I move the NFS v4.0 functions into nfs40proc.c to keep v4.0 features together in their own files. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Pass a struct nfs_client to nfs4_init_sequence()Anna Schumaker
No functional change in this patch. This just makes the next patch where I introduce "sequence slot operations" simpler. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Move NFS v4.0 pathdown recovery into nfs40client.cAnna Schumaker
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Move nfs40_init_client into nfs40client.cAnna Schumaker
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Move nfs40_shutdown_client into nfs40client.cAnna Schumaker
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Make the various NFS v4.0 operations static againAnna Schumaker
They don't need to be visible outside of nfs40proc.c anymore now that the minor version ops have been moved over. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Move the NFS v4.0 minor version ops into nfs40proc.cAnna Schumaker
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Split out the nfs40_mig_recovery_ops to nfs40proc.cAnna Schumaker
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Split out the nfs40_state_renewal_ops into nfs40proc.cAnna Schumaker
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Split out the nfs40_nograce_recovery_ops into nfs40proc.cAnna Schumaker
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Split out the nfs40_reboot_recovery_ops into nfs40client.cAnna Schumaker
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Move nfs40_call_sync_ops into nfs40proc.cAnna Schumaker
This is the first step in extracting NFS v4.0 into its own set of files that can be disabled through Kconfig. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-28sunrpc: allow svc_recv() to return -ETIMEDOUT and -EBUSYJeff Layton
To dynamically adjust the thread count, nfsd requires some information about how busy things are. Change svc_recv() to take a timeout value, and then allow the wait for work to time out if it's set. If a timeout is not defined, then the schedule will be set to MAX_SCHEDULE_TIMEOUT. If the task waits for the full timeout, then have it return -ETIMEDOUT to the caller. If it wakes up, finds that there is more work and that no threads are available, then attempt to set SP_TASK_STARTING. If wasn't already set, have the task return -EBUSY to cue to the caller that the service could use more threads. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28sunrpc: introduce the concept of a minimum number of threads per poolJeff Layton
Add a new pool->sp_nrthrmin field to track the minimum number of threads in a pool. Add min_threads parameters to both svc_set_num_threads() and svc_set_pool_threads(). If min_threads is non-zero and less than the max, svc_set_num_threads() will ensure that the number of running threads is between the min and the max. If the min is 0 or greater than the max, then it is ignored, and the maximum number of threads will be started, and never spun down. For now, the min_threads is always 0, but a later patch will pass the proper value through from nfsd. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28sunrpc: split svc_set_num_threads() into two functionsJeff Layton
svc_set_num_threads() will set the number of running threads for a given pool. If the pool argument is set to NULL however, it will distribute the threads among all of the pools evenly. These divergent codepaths complicate the move to dynamic threading. Simplify the API by splitting these two cases into different helpers: Add a new svc_set_pool_threads() function that sets the number of threads in a single, given pool. Modify svc_set_num_threads() to distribute the threads evenly between all of the pools and then call svc_set_pool_threads() for each. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26Merge tag 'vfs-6.19-rc8.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix the the buggy conversion of fuse_reverse_inval_entry() introduced during the creation rework - Disallow nfs delegation requests for directories by setting simple_nosetlease() - Require an opt-in for getting readdir flag bits outside of S_DT_MASK set in d_type - Fix scheduling delayed writeback work by only scheduling when the dirty time expiry interval is non-zero and cancel the delayed work if the interval is set to zero - Use rounded_jiffies_interval for dirty time work - Check the return value of sb_set_blocksize() for romfs - Wait for batched folios to be stable in __iomap_get_folio() - Use private naming for fuse hash size - Fix the stale dentry cleanup to prevent a race that causes a UAF * tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: document d_dispose_if_unused() fuse: shrink once after all buckets have been scanned fuse: clean up fuse_dentry_tree_work() fuse: add need_resched() before unlocking bucket fuse: make sure dentry is evicted if stale fuse: fix race when disposing stale dentries fuse: use private naming for fuse hash size writeback: use round_jiffies_relative for dirtytime_work iomap: wait for batched folios to be stable in __iomap_get_folio romfs: check sb_set_blocksize() return value docs: clarify that dirtytime_expire_seconds=0 disables writeback writeback: fix 100% CPU usage when dirtytime_expire_interval is 0 readdir: require opt-in for d_type flags vboxsf: don't allow delegations to be set on directories ceph: don't allow delegations to be set on directories gfs2: don't allow delegations to be set on directories 9p: don't allow delegations to be set on directories smb/client: properly disallow delegations on directories nfs: properly disallow delegation requests on directories fuse: fix conversion of fuse_reverse_inval_entry() to start_removing()
2026-01-22NFS/localio: switch nfs_local_do_read and nfs_local_do_write to return voidMike Snitzer
Both nfs_local_do_read and nfs_local_do_write only return 0 at the end, so switch them to returning void. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-22NFS/localio: remove -EAGAIN handling in nfs_local_doio()Mike Snitzer
Handling -EAGAIN in nfs_local_doio() was introduced with commit 0978e5b85fc08 (nfs_do_local_{read,write} were made to have negative checks for correspoding iter method) but commit e43e9a3a3d66 since eliminated the possibility for this -EAGAIN early return. So remove nfs_local_doio()'s -EAGAIN handling that calls nfs_localio_disable_client() -- while it should never happen from nfs_do_local_{read,write} this particular -EAGAIN handling is now "dead" and so it has become a liability. Fixes: e43e9a3a3d66 ("nfs/localio: refactor iocb initialization") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-22NFS/localio: use GFP_NOIO and non-memreclaim workqueue in nfs_local_commitMike Snitzer
nfslocaliod_workqueue is a non-memreclaim workqueue (it isn't initialized with WQ_MEM_RECLAIM), see commit b9f5dd57f4a5 ("nfs/localio: use dedicated workqueues for filesystem read and write"). Use nfslocaliod_workqueue for LOCALIO's SYNC work. Also, set PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO in nfs_local_fsync_work. Fixes: b9f5dd57f4a5 ("nfs/localio: use dedicated workqueues for filesystem read and write") Signed-off-by: Mike Snitzer <snitzer@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-22NFS/localio: prevent direct reclaim recursion into NFS via nfs_writepagesMike Snitzer
LOCALIO is an NFS loopback mount optimization that avoids using the network for READ, WRITE and COMMIT if the NFS client and server are determined to be on the same system. But because LOCALIO is still fundamentally "just NFS loopback mount" it is susceptible to recursion deadlock via direct reclaim, e.g.: NFS LOCALIO down to XFS and then back into NFS via nfs_writepages. Fix LOCALIO's potential for direct reclaim deadlock by ensuring that all its page cache allocations are done from GFP_NOFS context. Thanks to Ben Coddington for pointing out commit ad22c7a043c2 ("xfs: prevent stack overflows from page cache allocation"). Reported-by: John Cagle <john.cagle@hammerspace.com> Tested-by: Allen Lu <allen.lu@hammerspace.com> Suggested-by: Benjamin Coddington <bcodding@hammerspace.com> Fixes: 70ba381e1a43 ("nfs: add LOCALIO support") Signed-off-by: Mike Snitzer <snitzer@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-22NFS/localio: Cleanup the nfs_local_pgio_done() parametersTrond Myklebust
Remove the redundant 'force' parameter. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-22NFS/localio: Handle short writes by retryingTrond Myklebust
The current code for handling short writes in localio just truncates the I/O and then sets an error. While that is close to how the ordinary NFS code behaves, it does mean there is a chance the data that got written is lost because it isn't persisted. To fix this, change localio so that the upper layers can direct the behaviour to persist any unstable data by rewriting it, and then continuing writing until an ENOSPC is hit. Fixes: 70ba381e1a43 ("nfs: add LOCALIO support") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-20NFS: make nfs_mark_return_unreferenced_delegations less aggressiveChristoph Hellwig
Currently nfs_mark_return_unreferenced_delegations marks all open but not referenced delegations (i.e., those were found by a previous pass) as return on close, which means that we'll return them on close without a way out. Replace this with only iterating delegations that are on the LRU list, and avoid delegations that are in use by an open files to avoid this. Delegations that were never referenced while open still are be prime candidates for return from the LRU if the number of delegations is over the watermark, or otherwise will be returned by the next nfs_mark_return_unreferenced_delegations pass after they are closed. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-20NFS: return delegations from the end of a LRU when over the watermarkChristoph Hellwig
Directly returning delegations on close when over the watermark is rather suboptimal as these delegations are much more likely to be reused than those that have been unused for a long time. Switch to returning unused delegations from a new LRU list when we are above the threshold and there are reclaimable delegations instead. Pass over referenced delegations during the first pass to give delegations that aren't in active used by frequently used for stat() or similar another chance to not be instantly reclaimed. This scheme works the same as the referenced flags in the VFS inode and dentry caches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-20NFS: add a separate delegation return listChristoph Hellwig
Searching for returnable delegations in the per-server delegations list can be very expensive. While commit e04bbf6b1bbe ("NFS: Avoid quadratic search when freeing delegations.") reduced the overhead a bit, the fact that all the non-returnable delegations have to be searched limits the amount of optimizations that can be done. Fix this by introducing a separate list that only contains delegations scheduled for return. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>