| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Regression fixes:
- Revert the patch that removed the cap on MAX_OPS_PER_COMPOUND
- Address a kernel build issue
Stable fixes:
- Fix crash when a client queries new attributes on forechannel
- Fix rare NFSD crash when tracing is enabled"
* tag 'nfsd-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "NFSD: Remove the cap on number of operations per NFSv4 COMPOUND"
nfsd: Avoid strlen conflict in nfsd4_encode_components_esc()
NFSD: Fix crash in nfsd4_read_release()
NFSD: Define actions for the new time_deleg FATTR4 attributes
|
|
I've found that pynfs COMP6 now leaves the connection or lease in a
strange state, which causes CLOSE9 to hang indefinitely. I've dug
into it a little, but I haven't been able to root-cause it yet.
However, I bisected to commit 48aab1606fa8 ("NFSD: Remove the cap on
number of operations per NFSv4 COMPOUND").
Tianshuo Han also reports a potential vulnerability when decoding
an NFSv4 COMPOUND. An attacker can place an arbitrarily large op
count in the COMPOUND header, which results in:
[ 51.410584] nfsd: vmalloc error: size 1209533382144, exceeds total
pages, mode:0xdc0(GFP_KERNEL|__GFP_ZERO),
nodemask=(null),cpuset=/,mems_allowed=0
when NFSD attempts to allocate the COMPOUND op array.
Let's restore the operation-per-COMPOUND limit, but increased to 200
for now.
Reported-by: tianshuo han <hantianshuo233@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: stable@vger.kernel.org
Tested-by: Tianshuo Han <hantianshuo233@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
There is an error building nfs4xdr.c with CONFIG_SUNRPC_DEBUG_TRACE=y
and CONFIG_FORTIFY_SOURCE=n due to the local variable strlen conflicting
with the function strlen():
In file included from include/linux/cpumask.h:11,
from arch/x86/include/asm/paravirt.h:21,
from arch/x86/include/asm/irqflags.h:102,
from include/linux/irqflags.h:18,
from include/linux/spinlock.h:59,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:7,
from include/linux/slab.h:16,
from fs/nfsd/nfs4xdr.c:37:
fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_components_esc':
include/linux/kernel.h:321:46: error: called object 'strlen' is not a function or function pointer
321 | __trace_puts(_THIS_IP_, str, strlen(str)); \
| ^~~~~~
include/linux/kernel.h:265:17: note: in expansion of macro 'trace_puts'
265 | trace_puts(fmt); \
| ^~~~~~~~~~
include/linux/sunrpc/debug.h:34:41: note: in expansion of macro 'trace_printk'
34 | # define __sunrpc_printk(fmt, ...) trace_printk(fmt, ##__VA_ARGS__)
| ^~~~~~~~~~~~
include/linux/sunrpc/debug.h:42:17: note: in expansion of macro '__sunrpc_printk'
42 | __sunrpc_printk(fmt, ##__VA_ARGS__); \
| ^~~~~~~~~~~~~~~
include/linux/sunrpc/debug.h:25:9: note: in expansion of macro 'dfprintk'
25 | dfprintk(FACILITY, fmt, ##__VA_ARGS__)
| ^~~~~~~~
fs/nfsd/nfs4xdr.c:2646:9: note: in expansion of macro 'dprintk'
2646 | dprintk("nfsd4_encode_components(%s)\n", components);
| ^~~~~~~
fs/nfsd/nfs4xdr.c:2643:13: note: declared here
2643 | int strlen, count=0;
| ^~~~~~
This dprintk() instance is not particularly useful, so just remove it
altogether to get rid of the immediate strlen() conflict.
At the same time, eliminate the local strlen variable to avoid potential
conflicts with strlen() in the future.
Fixes: ec7d8e68ef0e ("sunrpc: add a Kconfig option to redirect dfprintk() output to trace buffer")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
When tracing is enabled, the trace_nfsd_read_done trace point
crashes during the pynfs read.testNoFh test.
Fixes: 15a8b55dbb1b ("nfsd: call op_release, even when op_func returns an error")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
NFSv4 clients won't send legitimate GETATTR requests for these new
attributes because they are intended to be used only with CB_GETATTR
and SETATTR. But NFSD has to do something besides crashing if it
ever sees a GETATTR request that queries these attributes.
RFC 8881 Section 18.7.3 states:
> The server MUST return a value for each attribute that the client
> requests if the attribute is supported by the server for the
> target file system. If the server does not support a particular
> attribute on the target file system, then it MUST NOT return the
> attribute value and MUST NOT set the attribute bit in the result
> bitmap. The server MUST return an error if it supports an
> attribute on the target but cannot obtain its value. In that case,
> no attribute values will be returned.
Further, RFC 9754 Section 5 states:
> These new attributes are invalid to be used with GETATTR, VERIFY,
> and NVERIFY, and they can only be used with CB_GETATTR and SETATTR
> by a client holding an appropriate delegation.
Thus there does not appear to be a specific server response mandated
by specification. Taking the guidance that querying these attributes
via GETATTR is "invalid", NFSD will return nfserr_inval, failing the
request entirely.
Reported-by: Robert Morris <rtm@csail.mit.edu>
Closes: https://lore.kernel.org/linux-nfs/7819419cf0cb50d8130dc6b747765d2b8febc88a.camel@kernel.org/T/#t
Fixes: 51c0d4f7e317 ("nfsd: add support for FATTR4_OPEN_ARGUMENTS")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Fix a crasher reported by rtm@csail.mit.edu
* tag 'nfsd-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Define a proc_layoutcommit for the FlexFiles layout type
|
|
Avoid a crash if a pNFS client should happen to send a LAYOUTCOMMIT
operation on a FlexFiles layout.
Reported-by: Robert Morris <rtm@csail.mit.edu>
Closes: https://lore.kernel.org/linux-nfs/152f99b2-ba35-4dec-93a9-4690e625dccd@oracle.com/T/#t
Cc: Thomas Haynes <loghyr@hammerspace.com>
Cc: stable@vger.kernel.org
Fixes: 9b9960a0ca47 ("nfsd: Add a super simple flex file server")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull nfsd updates from Chuck Lever:
"Mike Snitzer has prototyped a mechanism for disabling I/O caching in
NFSD. This is introduced in v6.18 as an experimental feature. This
enables scaling NFSD in /both/ directions:
- NFS service can be supported on systems with small memory
footprints, such as low-cost cloud instances
- Large NFS workloads will be less likely to force the eviction of
server-local activity, helping it avoid thrashing
Jeff Layton contributed a number of fixes to the new attribute
delegation implementation (based on a pending Internet RFC) that we
hope will make attribute delegation reliable enough to enable by
default, as it is on the Linux NFS client.
The remaining patches in this pull request are clean-ups and minor
optimizations. Many thanks to the contributors, reviewers, testers,
and bug reporters who participated during the v6.18 NFSD development
cycle"
* tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
nfsd: discard nfserr_dropit
SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it
NFSD: Add io_cache_{read,write} controls to debugfs
NFSD: Do the grace period check in ->proc_layoutget
nfsd: delete unnecessary NULL check in __fh_verify()
NFSD: Allow layoutcommit during grace period
NFSD: Disallow layoutget during grace period
sunrpc: fix "occurence"->"occurrence"
nfsd: Don't force CRYPTO_LIB_SHA256 to be built-in
nfsd: nfserr_jukebox in nlm_fopen should lead to a retry
NFSD: Reduce DRC bucket size
NFSD: Delay adding new entries to LRU
SUNRPC: Move the svc_rpcb_cleanup() call sites
NFS: Remove rpcbind cleanup for NFSv4.0 callback
nfsd: unregister with rpcbind when deleting a transport
NFSD: Drop redundant conversion to bool
sunrpc: eliminate return pointer in svc_tcp_sendmsg()
sunrpc: fix pr_notice in svc_tcp_sendto() to show correct length
nfsd: decouple the xprtsec policy check from check_nfsd_access()
NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull file->f_path constification from Al Viro:
"Only one thing was modifying ->f_path of an opened file - acct(2).
Massaging that away and constifying a bunch of struct path * arguments
in functions that might be given &file->f_path ends up with the
situation where we can turn ->f_path into an anon union of const
struct path f_path and struct path __f_path, the latter modified only
in a few places in fs/{file_table,open,namei}.c, all for struct file
instances that are yet to be opened"
* tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits)
Have cc(1) catch attempts to modify ->f_path
kernel/acct.c: saner struct file treatment
configfs:get_target() - release path as soon as we grab configfs_item reference
apparmor/af_unix: constify struct path * arguments
ovl_is_real_file: constify realpath argument
ovl_sync_file(): constify path argument
ovl_lower_dir(): constify path argument
ovl_get_verity_digest(): constify path argument
ovl_validate_verity(): constify {meta,data}path arguments
ovl_ensure_verity_loaded(): constify datapath argument
ksmbd_vfs_set_init_posix_acl(): constify path argument
ksmbd_vfs_inherit_posix_acl(): constify path argument
ksmbd_vfs_kern_path_unlock(): constify path argument
ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path *
check_export(): constify path argument
export_operations->open(): constify path argument
rqst_exp_get_by_name(): constify path argument
nfs: constify path argument of __vfs_getattr()
bpf...d_path(): constify path argument
done_path_create(): constify path argument
...
|
|
Pull NFS client updates from Anna Schumaker:
"New Features:
- Add a Kconfig option to redirect dfprintk() to the trace buffer
- Enable use of the RWF_DONTCACHE flag on the NFS client
- Add striped layout handling to pNFS flexfiles
- Add proper localio handling for READ and WRITE O_DIRECT
Bugfixes:
- Handle NFS4ERR_GRACE errors during delegation recall
- Fix NFSv4.1 backchannel max_resp_sz verification check
- Fix mount hang after CREATE_SESSION failure
- Fix d_parent->d_inode locking in nfs4_setup_readdir()
Other Cleanups and Improvements:
- Improvements to write handling tracepoints
- Fix a few trivial spelling mistakes
- Cleanups to the rpcbind cleanup call sites
- Convert the SUNRPC xdr_buf to use a scratch folio instead of
scratch page
- Remove unused NFS_WBACK_BUSY() macro
- Remove __GFP_NOWARN flags
- Unexport rpc_malloc() and rpc_free()"
* tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (46 commits)
NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support
nfs/localio: add tracepoints for misaligned DIO READ and WRITE support
nfs/localio: add proper O_DIRECT support for READ and WRITE
nfs/localio: refactor iocb initialization
nfs/localio: refactor iocb and iov_iter_bvec initialization
nfs/localio: avoid issuing misaligned IO using O_DIRECT
nfs/localio: make trace_nfs_local_open_fh more useful
NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support
sunrpc: unexport rpc_malloc() and rpc_free()
NFSv4/flexfiles: Add support for striped layouts
NFSv4/flexfiles: Update layout stats & error paths for striped layouts
NFSv4/flexfiles: Write path updates for striped layouts
NFSv4/flexfiles: Commit path updates for striped layouts
NFSv4/flexfiles: Read path updates for striped layouts
NFSv4/flexfiles: Update low level helper functions to be DS stripe aware.
NFSv4/flexfiles: Add data structure support for striped layouts
NFSv4/flexfiles: Use ds_commit_idx when marking a write commit
NFSv4/flexfiles: Remove cred local variable dependency
nfs4_setup_readdir(): insufficient locking for ->d_parent->d_inode dereferencing
NFS: Enable use of the RWF_DONTCACHE flag on the NFS client
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull nfsctl updates from Al Viro:
"nfsctl cleanups and a fix"
* tag 'pull-nfsctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nfsd_get_inode(): lift setting ->i_{,f}op to callers.
nfsdfs_create_files(): switch to simple_start_creating()
_nfsd_symlink(): switch to simple_start_creating()
nfsd_mkdir(): switch to simple_start_creating()
nfsctl: symlink has no business bumping link count of parent directory
|
|
nfserr_dropit hasn't been used for over a decade, since rq_dropme and
the RQ_DROPME were introduced.
Time to get rid of it completely.
Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Add 'io_cache_read' to NFSD's debugfs interface so that any data
read by NFSD will either be:
- cached using page cache (NFSD_IO_BUFFERED=0)
- cached but removed from the page cache upon completion
(NFSD_IO_DONTCACHE=1).
io_cache_read may be set by writing to:
/sys/kernel/debug/nfsd/io_cache_read
Add 'io_cache_write' to NFSD's debugfs interface so that any data
written by NFSD will either be:
- cached using page cache (NFSD_IO_BUFFERED=0)
- cached but removed from the page cache upon completion
(NFSD_IO_DONTCACHE=1).
io_cache_write may be set by writing to:
/sys/kernel/debug/nfsd/io_cache_write
The default value for both settings is NFSD_IO_BUFFERED, which is
NFSD's existing behavior for both read and write. Changes to these
settings take immediate effect for all exports and NFS versions.
Currently only xfs and ext4 implement RWF_DONTCACHE. For file
systems that do not implement RWF_DONTCACHE, NFSD use only buffered
I/O when the io_cache setting is NFSD_IO_DONTCACHE.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
RFC 8881 Section 18.43.3 states:
> If the metadata server is in a grace period, and does not persist
> layouts and device ID to device address mappings, then it MUST
> return NFS4ERR_GRACE (see Section 8.4.2.1).
Jeff observed that this suggests the grace period check is better
done by the individual layout type implementations, because checking
for the server grace period is unnecessary for some layout types.
Suggested-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/linux-nfs/7h5p5ktyptyt37u6jhpbjfd5u6tg44lriqkdc7iz7czeeabrvo@ijgxz27dw4sg/T/#t
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
In commit 4a0de50a44bb ("nfsd: decouple the xprtsec policy check from
check_nfsd_access()") we added a NULL check on "rqstp" to earlier in
the function. This check is no longer required so delete it.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If the loca_reclaim field is set to TRUE, this indicates that the client
is attempting to commit changes to a layout after the restart of the
metadata server during the metadata server's recovery grace period. This
type of request may be necessary when the client has uncommitted writes
to provisionally allocated byte-ranges of a file that were sent to the
storage devices before the restart of the metadata server. See RFC 8881,
section 18.42.3.
Without this, the client is not able to increase the file size and commit
preallocated extents when the block/scsi layout server is restarted
during a write and is in a grace period. And when the grace period ends,
the client also cannot perform layoutcommit because the old layout state
becomes invalid, resulting in file corruption.
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Add nfsd_file_dio_alignment and use it to avoid issuing misaligned IO
using O_DIRECT. Any misaligned DIO falls back to using buffered IO.
Because misaligned DIO is now handled safely, remove the nfs modparam
'localio_O_DIRECT_semantics' that was added to require users opt-in to
the requirement that all O_DIRECT be properly DIO-aligned.
Also, introduce nfs_iov_iter_aligned_bvec() which is a variant of
iov_iter_aligned_bvec() that also verifies the offset associated with
an iov_iter is DIO-aligned. NOTE: in a parallel effort,
iov_iter_aligned_bvec() is being removed along with
iov_iter_is_aligned().
Lastly, add pr_info_ratelimited if underlying filesystem returns
-EINVAL because it was made to try O_DIRECT for IO that is not
DIO-aligned (shouldn't happen, so its best to be louder if it does).
Fixes: 3feec68563d ("nfs/localio: add direct IO enablement with sync and async IO support")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Use STATX_DIOALIGN and STATX_DIO_READ_ALIGN to get DIO alignment
attributes from the underlying filesystem and store them in the
associated nfsd_file. This is done when the nfsd_file is first
opened for each regular file.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs async directory updates from Christian Brauner:
"This contains further preparatory changes for the asynchronous directory
locking scheme:
- Add lookup_one_positive_killable() which allows overlayfs to
perform lookup that won't block on a fatal signal
- Unify the mount idmap handling in struct renamedata as a rename can
only happen within a single mount
- Introduce kern_path_parent() for audit which sets the path to the
parent and returns a dentry for the target without holding any
locks on return
- Rename kern_path_locked() as it is only used to prepare for the
removal of an object from the filesystem:
kern_path_locked() => start_removing_path()
kern_path_create() => start_creating_path()
user_path_create() => start_creating_user_path()
user_path_locked_at() => start_removing_user_path_at()
done_path_create() => end_creating_path()
NA => end_removing_path()"
* tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
debugfs: rename start_creating() to debugfs_start_creating()
VFS: rename kern_path_locked() and related functions.
VFS/audit: introduce kern_path_parent() for audit
VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
VFS: discard err2 in filename_create()
VFS/ovl: add lookup_one_positive_killable()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs workqueue updates from Christian Brauner:
"This contains various workqueue changes affecting the filesystem
layer.
Currently if a user enqueue a work item using schedule_delayed_work()
the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies
to schedule_work() that is using system_wq and queue_work(), that
makes use again of WORK_CPU_UNBOUND.
This replaces the use of system_wq and system_unbound_wq. system_wq is
a per-CPU workqueue which isn't very obvious from the name and
system_unbound_wq is to be used when locality is not required.
So this renames system_wq to system_percpu_wq, and system_unbound_wq
to system_dfl_wq.
This also adds a new WQ_PERCPU flag to allow the fs subsystem users to
explicitly request the use of per-CPU behavior. Both WQ_UNBOUND and
WQ_PERCPU flags coexist for one release cycle to allow callers to
transition their calls. WQ_UNBOUND will be removed in a next release
cycle"
* tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: WQ_PERCPU added to alloc_workqueue users
fs: replace use of system_wq with system_percpu_wq
fs: replace use of system_unbound_wq with system_dfl_wq
|
|
When the server is recovering from a reboot and is in a grace period,
any operation that may result in deletion or reallocation of block
extents should not be allowed. See RFC 8881, section 18.43.3.
If multiple clients write data to the same file, rebooting the server
during writing may result in file corruption. In the worst case, the
exported XFS may also become corrupted. Observed this behavior while
testing pNFS block volume setup.
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all()
are always invoked in pairs, we can deduplicate code by moving
the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
A rename operation can only rename within a single mount. Callers of
vfs_rename() must and do ensure this is the case.
So there is no point in having two mnt_idmaps in renamedata as they are
always the same. Only one of them is passed to ->rename in any case.
This patch replaces both with a single "mnt_idmap" and changes all
callers.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now that nfsd is accessing SHA-256 via the library API instead of via
crypto_shash, there is a direct symbol dependency on the SHA-256 code
and there is no benefit to be gained from forcing it to be built-in.
Therefore, select CRYPTO_LIB_SHA256 from NFSD (conditional on NFSD_V4)
instead of from NFSD_V4, so that it can be 'm' if NFSD is 'm'.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
When v3 NLM request finds a conflicting delegation, it triggers
a delegation recall and nfsd_open fails with EAGAIN. nfsd_open
then translates EAGAIN into nfserr_jukebox. In nlm_fopen, instead
of returning nlm_failed for when there is a conflicting delegation,
drop this NLM request so that the client retries. Once delegation
is recalled and if a local lock is claimed, a retry would lead to
nfsd returning a nlm_lck_blocked error or a successful nlm lock.
Fixes: d343fce148a4 ("[PATCH] knfsd: Allow lockd to drop replies as appropriate")
Cc: stable@vger.kernel.org # v6.6
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The common case is that a DRC lookup will not find the XID in the
bucket. Reduce the amount of pointer chasing during the lookup by
keeping fewer entries in each hash bucket.
Changing the bucket size constant forces the size of the DRC hash
table to increase, and the height of each bucket r-b tree to be
reduced.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Neil Brown observes:
> I would not include RC_INPROG entries in the lru at all - they are
> always ignored, and will be added when they are switched to
> RCU_DONE.
I also removed a stale comment.
Suggested-by: NeilBrown <neil@brown.name>
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all()
are always invoked in pairs, we can deduplicate code by moving
the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all().
Tested-by: Olga Kornievskaia <okorniev@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The result of integer comparison already evaluates to bool. No need for
explicit conversion.
Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
A while back I had reported that an NFSv3 client could successfully
mount using '-o xprtsec=none' an export that had been exported with
'xprtsec=tls:mtls'. By "successfully" I mean that the mount command
would succeed and the mount would show up in /proc/mount. Attempting
to do anything futher with the mount would be met with NFS3ERR_ACCES.
This was fixed (albeit accidentally) by commit bb4f07f2409c ("nfsd:
Fix NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT") and was
subsequently re-broken by commit 0813c5f01249 ("nfsd: fix access
checking for NLM under XPRTSEC policies").
Transport Layer Security isn't an RPC security flavor or pseudo-flavor,
so we shouldn't be conflating them when determining whether the access
checks can be bypassed. Split check_nfsd_access() into two helpers, and
have __fh_verify() call the helpers directly since __fh_verify() has
logic that allows one or both of the checks to be skipped. All other
sites will continue to call check_nfsd_access().
Link: https://lore.kernel.org/linux-nfs/ZjO3Qwf_G87yNXb2@aion/
Fixes: 9280c5774314 ("NFSD: Handle new xprtsec= export option")
Cc: stable@vger.kernel.org
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Commit 5304877936c0 ("NFSD: Fix strncpy() fortify warning") replaced
strncpy(,, sizeof(..)) with strlcpy(,, sizeof(..) - 1), but strlcpy()
already guaranteed NUL-termination of the destination buffer and
subtracting one byte potentially truncated the source string.
The incorrect size was then carried over in commit 72f78ae00a8e ("NFSD:
move from strlcpy with unused retval to strscpy") when switching from
strlcpy() to strscpy().
Fix this off-by-one error by using the full size of the destination
buffer again.
Cc: stable@vger.kernel.org
Fixes: 5304877936c0 ("NFSD: Fix strncpy() fortify warning")
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Since MD5 digests are fixed-size, make nfs4_make_rec_clidname() store
the digest in a stack buffer instead of a dynamically allocated buffer.
Use MD5_DIGEST_SIZE instead of a hard-coded value, both in
nfs4_make_rec_clidname() and in the definition of HEXDIR_LEN.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Since the Linux kernel's sprintf() has conversion to hex built-in via
"%*phN", delete md5_to_hex() and just use that. Also add an explicit
array bound to the dname parameter of nfs4_make_rec_clidname() to make
its size clear. No functional change.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Instead of allowing the ctime to roll backward with a WRITE_ATTRS
delegation, set FMODE_NOCMTIME on the file and have it skip mtime and
ctime updates.
It is possible that the client will never send a SETATTR to set the
times before returning the delegation. Add two new bools to struct
nfs4_delegation:
dl_written: tracks whether the file has been written since the
delegation was granted. This is set in the WRITE and LAYOUTCOMMIT
handlers.
dl_setattr: tracks whether the client has sent at least one valid
mtime that can also update the ctime in a SETATTR.
When unlocking the lease for the delegation, clear FMODE_NOCMTIME. If
the file has been written, but no setattr for the delegated mtime and
ctime has been done, update the timestamps to current_time().
Suggested-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
When updating the local timestamps from CB_GETATTR, the updated values
are not being properly vetted.
Compare the update times vs. the saved times in the delegation rather
than the current times in the inode. Also, ensure that the ctime is
properly vetted vs. its original value.
Fixes: 6ae30d6eb26b ("nfsd: add support for delegated timestamps")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
SETATTRs containing delegated timestamp updates are currently not being
vetted properly. Since we no longer need to compare the timestamps vs.
the current timestamps, move the vetting of delegated timestamps wholly
into nfsd.
Rename the set_cb_time() helper to nfsd4_vet_deleg_time(), and make it
non-static. Add a new vet_deleg_attrs() helper that is called from
nfsd4_setattr that uses nfsd4_vet_deleg_time() to properly validate the
all the timestamps. If the validation indicates that the update should
be skipped, unset the appropriate flags in ia_valid.
Fixes: 7e13f4f8d27d ("nfsd: handle delegated timestamps in SETATTR")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
As Trond points out [1], the "original time" mentioned in RFC 9754
refers to the timestamps on the files at the time that the delegation
was granted, and not the current timestamp of the file on the server.
Store the current timestamps for the file in the nfs4_delegation when
granting one. Add STATX_ATIME and STATX_MTIME to the request mask in
nfs4_delegation_stat(). When granting OPEN_DELEGATE_READ_ATTRS_DELEG, do
a nfs4_delegation_stat() and save the correct atime. If the stat() fails
for any reason, fall back to granting a normal read deleg.
[1]: https://lore.kernel.org/linux-nfs/47a4e40310e797f21b5137e847b06bb203d99e66.camel@kernel.org/
Fixes: 7e13f4f8d27d ("nfsd: handle delegated timestamps in SETATTR")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Ensure that notify_change() doesn't clobber a delegated ctime update
with current_time() by setting ATTR_CTIME_SET for those updates.
Don't bother setting the timestamps in cb_getattr_update_times() in the
non-delegated case. notify_change() will do that itself.
Fixes: 7e13f4f8d27d ("nfsd: handle delegated timestamps in SETATTR")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If the only flag left is ATTR_DELEG, then there are no changes to be
made.
Fixes: 7e13f4f8d27d ("nfsd: handle delegated timestamps in SETATTR")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The ia_ctime.tv_nsec field should be set to modify.nseconds.
Fixes: 7e13f4f8d27d ("nfsd: handle delegated timestamps in SETATTR")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The data type of loca_last_write_offset is newoffset4 and is switched
on a boolean value, no_newoffset, that indicates if a previous write
occurred or not. If no_newoffset is FALSE, an offset is not given.
This means that client does not try to update the file size. Thus,
server should not try to calculate new file size and check if it fits
into the segment range. See RFC 8881, section 12.5.4.2.
Sometimes the current incorrect logic may cause clients to hang when
trying to sync an inode. If layoutcommit fails, the client marks the
inode as dirty again.
Fixes: 9cf514ccfacb ("nfsd: implement pNFS operations")
Cc: stable@vger.kernel.org
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
When pNFS client in the block or scsi layout mode sends layoutcommit
to MDS, a variable length array of modified extents is supplied within
the request. This patch allows the server to accept such extent arrays
if they do not fit within single memory page.
The issue can be reproduced when writing to a 1GB file using FIO with
O_DIRECT, 4K block and large I/O depth without preallocation of the
file. In this case, the server returns NFSERR_BADXDR to the client.
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Use the appropriate xdr function to decode the lc_newoffset field,
which is a boolean value. See RFC 8881, section 18.42.1.
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Remove dprintk in nfsd4_layoutcommit. These are not needed
in day to day usage, and the information is also available
in Wireshark when capturing NFS traffic.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Compilers may optimize the layout of C structures, so we should not rely
on sizeof struct and memcpy to encode and decode XDR structures. The byte
order of the fields should also be taken into account.
This patch adds the correct functions to handle the deviceid4 structure
and removes the pad field, which is currently not used by NFSD, from the
runtime state. The server's byte order is preserved because the deviceid4
blob on the wire is only used as a cookie by the client.
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
This interface was deprecated by commit e6f7e1487ab5 ("nfs_localio:
simplify interface to nfsd for getting nfsd_file") and is now
unused. So let's remove it.
Signed-off-by: NeilBrown <neil@brown.name>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: The fh_getattr() function is part of NFSD's file handle
API, so relocate it.
I've made it an un-inlined function so that trace points and new
functionality can easily be introduced. That increases the size of
nfsd.ko by about a page on my x86_64 system (out of 26MB; compiled
with -O2).
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up: these helpers are part of the NFSD file handle API.
Relocate them to fs/nfsd/nfsfh.h.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.
system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.
Adding system_dfl_wq to encourage its use when unbound work should be used.
The old system_unbound_wq will be kept for a few release cycles.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://lore.kernel.org/20250916082906.77439-2-marco.crivellari@suse.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|