summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2025-11-13Merge tag 'v6.18-rc5-smb-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Fix smbdirect (RDMA) disconnect hang bug - Fix potential Denial of Service when connection limit exceeded - Fix smbdirect (RDMA) connection (potentially accessing freed memory) bug * tag 'v6.18-rc5-smb-server-fixes' of git://git.samba.org/ksmbd: smb: server: let smb_direct_disconnect_rdma_connection() turn CREATED into DISCONNECTED ksmbd: close accepted socket when per-IP limit rejects connection smb: server: rdma: avoid unmapping posted recv on accept failure
2025-11-13fuse: rename 'namelen' to 'namesize'Miquel Sabaté Solà
By "length of a string" usually the number of non-null chars is meant (i.e. strlen(str)). So the variable 'namelen' was confusingly named, whereas 'namesize' refers more to what's being done in 'get_security_context'. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-13fuse: use strscpy instead of strcpyMiquel Sabaté Solà
As pointed out in [1], strcpy() is deprecated in favor of strscpy(). Furthermore, the size of the buffer for the name to be copied is well known at this point since we are going to move the pointer by that much on the next line. Hence, it's safe to assume 'namelen' for the size of the string to be copied. [1] https://github.com/KSPP/linux/issues/88 Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12Merge tag 'nfsd-6.18-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Address recently reported issues or issues found at the recent NFS bake-a-thon held in Raleigh, NC. Issues reported with v6.18-rc: - Address a kernel build issue - Reorder SEQUENCE processing to avoid spurious NFS4ERR_SEQ_MISORDERED Issues that need expedient stable backports: - Close a refcount leak exposure - Report support for NFSv4.2 CLONE correctly - Fix oops during COPY_NOTIFY processing - Prevent rare crash after XDR encoding failure - Prevent crash due to confused or malicious NFSv4.1 client" * tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it" nfsd: ensure SEQUENCE replay sends a valid reply. NFSD: Never cache a COMPOUND when the SEQUENCE operation fails NFSD: Skip close replay processing if XDR encoding fails NFSD: free copynotify stateid in nfs4_free_ol_stateid() nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes nfsd: fix refcount leak in nfsd_set_fh_dentry()
2025-11-12nilfs2: replace vmalloc + copy_from_user with vmemdup_userThorsten Blum
Replace vmalloc() followed by copy_from_user() with vmemdup_user() to improve nilfs_ioctl_clean_segments() and nilfs_ioctl_set_suinfo(). Use kvfree() to free the buffers created by vmemdup_user(). Use u64_to_user_ptr() instead of manually casting the pointers and remove the obsolete 'out_free' label. No functional changes intended. Link: https://lkml.kernel.org/r/20251030154700.7444-1-konishi.ryusuke@gmail.com Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12ocfs2: add inline inode consistency check to ocfs2_validate_inode_block()Dmitry Antipov
In 'ocfs2_validate_inode_block()', add an extra check whether an inode with inline data (i.e. self-contained) has no clusters, thus preventing an invalid inode from being passed to 'ocfs2_evict_inode()' and below. Link: https://lkml.kernel.org/r/20251023141650.417129-1-dmantipov@yandex.ru Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reported-by: syzbot+c16daba279a1161acfb0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=c16daba279a1161acfb0 Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12ocfs2: convert to host endian in ocfs2_validate_inode_blockJoseph Qi
Convert to host endian when checking OCFS2_VALID_FL to keep consistent with other checks. Link: https://lkml.kernel.org/r/20251025123218.3997866-2-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12ocfs2: use correct endian in ocfs2_dinode_has_extentsJoseph Qi
Fields in ocfs2_dinode is little endian, covert to host endian when checking those contents. Link: https://lkml.kernel.org/r/20251025123218.3997866-1-joseph.qi@linux.alibaba.com Fixes: fdbb6cd96ed5 ("ocfs2: correct l_next_free_rec in online check") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12ocfs2: add boundary check to ocfs2_check_dir_entry()Dmitry Antipov
In 'ocfs2_check_dir_entry()', add extra check whether at least the smallest possible dirent may be located at the specified offset within bh's data, thus preventing an out-of-bounds accesses below. Link: https://lkml.kernel.org/r/20251013062826.122586-1-dmantipov@yandex.ru Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reported-by: syzbot+b20bbf680bb0f2ecedae@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b20bbf680bb0f2ecedae Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12ocfs2: add directory size check to ocfs2_find_dir_space_id()Dmitry Antipov
Fix a null-pointer-deref which was detected by UBSAN: KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 UID: 0 PID: 5317 Comm: syz-executor310 Not tainted 6.15.0-syzkaller-12141-gec7714e49479 #0 PREEMPT(full) In 'ocfs2_find_dir_space_id()', add extra check whether the directory data block is large enough to hold at least one directory entry, and raise 'ocfs2_error()' if the former is unexpectedly small. Link: https://lkml.kernel.org/r/20251013103709.146001-1-dmantipov@yandex.ru Reported-by: syzbot+ded9116588a7b73c34bc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ded9116588a7b73c34bc Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12ocfs2: add extra consistency check to ocfs2_dx_dir_lookup_rec()Dmitry Antipov
In 'ocfs2_dx_dir_lookup_rec()', check whether an extent list length of the directory indexing block matches the one configured via the superblock parameters established at mount, thus preventing an out-of-bounds accesses while iterating over the extent records below. Link: https://lkml.kernel.org/r/20251007094626.196143-1-dmantipov@yandex.ru Reported-by: syzbot+30b53487d00b4f7f0922@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=30b53487d00b4f7f0922 Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com>> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12ocfs2: annotate flexible array members with __counted_by_le()Dmitry Antipov
Annotate flexible array members of 'struct ocfs2_extent_list', 'struct ocfs2_chain_list', 'struct ocfs2_truncate_log', 'struct ocfs2_dx_entry_list', 'ocfs2_refcount_list' and 'struct ocfs2_xattr_header' with '__counted_by_le()' attribute to improve array bounds checking when CONFIG_UBSAN_BOUNDS is enabled. [dmantipov@yandex.ru: fix __counted_by_le() usage in ocfs2_expand_inline_dx_root()] Link: https://lkml.kernel.org/r/20251014070324.130313-1-dmantipov@yandex.ru Link: https://lkml.kernel.org/r/20251007123526.213150-1-dmantipov@yandex.ru Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12ocfs2: relax BUG() to ocfs2_error() in __ocfs2_move_extent()Dmitry Antipov
In '__ocfs2_move_extent()', relax 'BUG()' to 'ocfs2_error()' just to avoid crashing the whole kernel due to a filesystem corruption. Fixes: 8f603e567aa7 ("Ocfs2/move_extents: move a range of extent.") Link: https://lkml.kernel.org/r/20251009102349.181126-2-dmantipov@yandex.ru Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Closes: https://syzkaller.appspot.com/bug?extid=727d161855d11d81e411 Reported-by: syzbot+727d161855d11d81e411@syzkaller.appspotmail.com Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-12fs: move fd_install() slowpath into a dedicated routine and provide commentaryMateusz Guzik
On stock kernel gcc 14 emits avoidable register spillage: endbr64 call ffffffff81374630 <__fentry__> push %r13 push %r12 push %rbx sub $0x8,%rsp [snip] Total fast path is 99 bytes. Moving the slowpath out avoids it and shortens the fast path to 74 bytes. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251110095634.1433061-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12fs: hide dentry_cache behind runtime const machineryMateusz Guzik
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251105153622.758836-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12fs: touch predicts in do_dentry_open()Mateusz Guzik
Helps out some of the asm, the routine is still a mess. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251109125254.1288882-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12fs: retire now stale MAY_WRITE predicts in inode_permission()Mateusz Guzik
The primary non-MAY_WRITE consumer now uses lookup_inode_permission_may_exec(). Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251107142149.989998-4-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12btrfs: utilize IOP_FASTPERM_MAY_EXECMateusz Guzik
Root filesystem was ext4, btrfs was mounted on /testfs. Then issuing access(2) in a loop on /testfs/repos/linux/include/linux/fs.h on Sapphire Rapids (ops/s): before: 3447976 after: 3620879 (+5%) Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251107142149.989998-3-mjguzik@gmail.com Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12fs: speed up path lookup with cheaper handling of MAY_EXECMateusz Guzik
The generic inode_permission() routine does work which is known to be of no significance for lookup. There are checks for MAY_WRITE, while the requested permission is MAY_EXEC. Additionally devcgroup_inode_permission() is called to check for devices, but it is an invariant the inode is a directory. Absent a ->permission func, execution lands in generic_permission() which checks upfront if the requested permission is granted for everyone. We can elide the branches which are guaranteed to be false and cut straight to the check if everyone happens to be allowed MAY_EXEC on the inode (which holds true most of the time). Moreover, filesystems which provide their own ->permission routine can take advantage of the optimization by setting the IOP_FASTPERM_MAY_EXEC flag on their inodes, which they can legitimately do if their MAY_EXEC handling matches generic_permission(). As a simple benchmark, as part of compilation gcc issues access(2) on numerous long paths, for example /usr/lib/gcc/x86_64-linux-gnu/12/crtendS.o Issuing access(2) on it in a loop on ext4 on Sapphire Rapids (ops/s): before: 3797556 after: 3987789 (+5%) Note: this depends on the not-yet-landed ext4 patch to mark inodes with cache_no_acl() Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251107142149.989998-2-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12fuse: refactor fuse_conn_put() to remove negative logic.Luis Henriques
There is no functional change with this patch. It simply refactors function fuse_conn_put() to not use negative logic, which makes it more easier to read. Signed-off-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12fuse: new work queue to invalidate dentries from old epochsLuis Henriques
With the infrastructure introduced to periodically invalidate expired dentries, it is now possible to add an extra work queue to invalidate dentries when an epoch is incremented. This work queue will only be triggered when the 'inval_wq' parameter is set. Signed-off-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12fuse: new work queue to periodically invalidate expired dentriesLuis Henriques
This patch adds the necessary infrastructure to keep track of all dentries created for FUSE file systems. A set of rbtrees, protected by hashed locks, will be used to keep all these dentries sorted by expiry time. A new module parameter 'inval_wq' is also added. When set, it will start a work queue which will periodically invalidate expired dentries. The value of this new parameter is the period, in seconds, for this work queue. Once this parameter is set, every new dentry will be added to one of the rbtrees. When the work queue is executed, it will check all the rbtrees and will invalidate those dentries that have timed-out. The work queue period can not be smaller than 5 seconds, but can be disabled by setting 'inval_wq' to zero (which is the default). Signed-off-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12dcache: export shrink_dentry_list() and add new helper d_dispose_if_unused()Luis Henriques
Add and export a new helper d_dispose_if_unused() which is simply a wrapper around to_shrink_list(), to add an entry to a dispose list if it's not used anymore. Also export shrink_dentry_list() to kill all dentries in a dispose list. Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Luis Henriques <luis@igalia.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12fuse: add WARN_ON and comment for RCU revalidateMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12fuse: Fix whitespace for fuse_uring_args_to_ring() commentBernd Schubert
The function comment accidentally got wrong indentation. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12fuse: missing copy_finish in fuse-over-io-uring argument copiesCheng Ding
Fix a possible reference count leak of payload pages during fuse argument copies. [Joanne: simplified error cleanup] Fixes: c090c8abae4b ("fuse: Add io-uring sqe commit and fetch support") Cc: stable@vger.kernel.org # v6.14 Signed-off-by: Cheng Ding <cding@ddn.com> Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-11-12xfs: remove xarray mark for reclaimable zonesHans Holmberg
We can easily check if there are any reclaimble zones by just looking at the used counters in the reclaim buckets, so do that to free up the xarray mark we currently use for this purpose. Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-12xfs: remove the xlog_in_core_t typedefChristoph Hellwig
Switch the few remaining users to use the underlying struct directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-12xfs: remove l_iclog_headsChristoph Hellwig
l_iclog_heads is only used in one place and can be trivially derived from l_iclog_hsize by a single shift operation. Remove it, and switch the initialization of l_iclog_hsize to use struct_size so that it is directly derived from the on-disk format definition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-12xfs: remove the xlog_rec_header_t typedefChristoph Hellwig
There are almost no users of the typedef left, kill it and switch the remaining users to use the underlying struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-12xfs: remove xlog_in_core_2_tChristoph Hellwig
xlog_in_core_2_t is a really odd type, not only is it grossly misnamed because it actually is an on-disk structure, but it also reprents the actual on-disk structure in a rather odd way. A v1 or small v2 log header look like: +-----------------------+ | xlog_record | +-----------------------+ while larger v2 log headers look like: +-----------------------+ | xlog_record | +-----------------------+ | xlog_rec_ext_header | +-------------------+---+ | ..... | +-----------------------+ | xlog_rec_ext_header | +-----------------------+ I.e., the ext headers are a variable sized array at the end of the header. So instead of declaring a union of xlog_rec_header, xlog_rec_ext_header and padding to BBSIZE, add the proper padding to struct struct xlog_rec_header and struct xlog_rec_ext_header, and add a variable sized array of the latter to the former. This also exposes the somewhat unusual scope of the log checksums, which is made explicitly now by adding proper padding and macro designating the actual payload length. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-12xfs: remove a very outdated comment from xlog_alloc_logChristoph Hellwig
The xlog_iclog definition has been pretty standard for a while, so drop this now rather misleading comment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-12xfs: cleanup xlog_alloc_log a bitChristoph Hellwig
Remove the separate head variable, move the ic_datap initialization up a bit where the context is more obvious and remove the duplicate memset right after a zeroing memory allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-12xfs: don't use xlog_in_core_2_t in struct xlog_in_coreChristoph Hellwig
Most accessed to the on-disk log record header are for the original xlog_rec_header. Make that the main structure, and case for the single remaining place using other union legs. This prepares for removing xlog_in_core_2_t entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-12xfs: add a on-disk log header cycle array accessorChristoph Hellwig
Accessing the cycle arrays in the original log record header vs the extended header is messy and duplicated in multiple places. Add a xlog_cycle_data helper to abstract it out. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-12xfs: add a XLOG_CYCLE_DATA_SIZE constantChristoph Hellwig
The XLOG_HEADER_CYCLE_SIZE / BBSIZE expression is used a lot in the log code, give it a symbolic name. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-11-12iomap: simplify when reads can be skipped for writesJoanne Koong
Currently, the logic for skipping the read range for a write is if (!(iter->flags & IOMAP_UNSHARE) && (from <= poff || from >= poff + plen) && (to <= poff || to >= poff + plen)) which breaks down to skipping the read if any of these are true: a) from <= poff && to <= poff b) from <= poff && to >= poff + plen c) from >= poff + plen && to <= poff d) from >= poff + plen && to >= poff + plen This can be simplified to if (!(iter->flags & IOMAP_UNSHARE) && from <= poff && to >= poff + plen) from the following reasoning: a) from <= poff && to <= poff This reduces to 'to <= poff' since it is guaranteed that 'from <= to' (since to = from + len). It is not possible for 'from <= to' to be true here because we only reach here if plen > 0 (thanks to the preceding 'if (plen == 0)' check that would break us out of the loop). If 'to <= poff', plen would have to be 0 since poff and plen get adjusted in lockstep for uptodate blocks. This means we can eliminate this check. c) from >= poff + plen && to <= poff This is not possible since 'from <= to' and 'plen > 0'. We can eliminate this check. d) from >= poff + plen && to >= poff + plen This reduces to 'from >= poff + plen' since 'from <= to'. It is not possible for 'from >= poff + plen' to be true here. We only reach here if plen > 0 and for writes, poff and plen will always be block-aligned, which means poff <= from < poff + plen. We can eliminate this check. The only valid check is b) from <= poff && to >= poff + plen. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20251111193658.3495942-7-joannelkoong@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12iomap: simplify ->read_folio_range() error handling for readsJoanne Koong
Instead of requiring that the caller calls iomap_finish_folio_read() even if the ->read_folio_range() callback returns an error, account for this internally in iomap instead, which makes the interface simpler and makes it match writeback's ->read_folio_range() error handling expectations. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20251111193658.3495942-6-joannelkoong@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12iomap: optimize pending async writeback accountingJoanne Koong
Pending writebacks must be accounted for to determine when all requests have completed and writeback on the folio should be ended. Currently this is done by atomically incrementing ifs->write_bytes_pending for every range to be written back. Instead, the number of atomic operations can be minimized by setting ifs->write_bytes_pending to the folio size, internally tracking how many bytes are written back asynchronously, and then after sending off all the requests, decrementing ifs->write_bytes_pending by the number of bytes not written back asynchronously. Now, for N ranges written back, only N + 2 atomic operations are required instead of 2N + 2. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20251111193658.3495942-5-joannelkoong@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12iomap: account for unaligned end offsets when truncating read rangeJoanne Koong
The end position to start truncating from may be at an offset into a block, which under the current logic would result in overtruncation. Adjust the calculation to account for unaligned end offsets. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20251111193658.3495942-3-joannelkoong@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12iomap: rename bytes_pending/bytes_accounted to ↵Joanne Koong
bytes_submitted/bytes_not_submitted The naming "bytes_pending" and "bytes_accounted" may be confusing and could be better named. Rename this to "bytes_submitted" and "bytes_not_submitted" to make it more clear that these are bytes we passed to the IO helper to read in. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20251111193658.3495942-2-joannelkoong@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12fs: add iput_not_last()Mateusz Guzik
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251105212025.807549-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12fs/namespace: correctly handle errors returned by grab_requested_mnt_nsAndrei Vagin
grab_requested_mnt_ns was changed to return error codes on failure, but its callers were not updated to check for error pointers, still checking only for a NULL return value. This commit updates the callers to use IS_ERR() or IS_ERR_OR_NULL() and PTR_ERR() to correctly check for and propagate errors. This also makes sure that the logic actually works and mount namespace file descriptors can be used to refere to mounts. Christian Brauner <brauner@kernel.org> says: Rework the patch to be more ergonomic and in line with our overall error handling patterns. Fixes: 7b9d14af8777 ("fs: allow mount namespace fd") Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Andrei Vagin <avagin@google.com> Link: https://patch.msgid.link/20251111062815.2546189-1-avagin@google.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12power: always freeze efivarfsChristian Brauner
The efivarfs filesystems must always be frozen and thawed to resync variable state. Make it so. Link: https://patch.msgid.link/20251105-vorbild-zutreffen-fe00d1dd98db@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12vfs: expose delegation support to userlandJeff Layton
Now that support for recallable directory delegations is available, expose this functionality to userland with new F_SETDELEG and F_GETDELEG commands for fcntl(). Note that this also allows userland to request a FL_DELEG type lease on files too. Userland applications that do will get signalled when there are metadata changes in addition to just data changes (which is a limitation of FL_LEASE leases). These commands accept a new "struct delegation" argument that contains a flags field for future expansion. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-17-52f3feebb2f2@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12nfsd: wire up GET_DIR_DELEGATION handlingJeff Layton
Add a new routine for acquiring a read delegation on a directory. These are recallable-only delegations with no support for CB_NOTIFY. That will be added in a later phase. Since the same CB_RECALL/DELEGRETURN infrastructure is used for regular and directory delegations, a normal nfs4_delegation is used to represent a directory delegation. Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-16-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12nfsd: allow DELEGRETURN on directoriesJeff Layton
As Trond pointed out: "...provided that the presented stateid is actually valid, it is also sufficient to uniquely identify the file to which it is associated (see RFC8881 Section 8.2.4), so the filehandle should be considered mostly irrelevant for operations like DELEGRETURN." Don't ask fh_verify to filter on file type. Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-15-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12nfsd: allow filecache to hold S_IFDIR filesJeff Layton
The filecache infrastructure will only handle S_IFREG files at the moment. Directory delegations will require adding support for opening S_IFDIR inodes. Plumb a "type" argument into nfsd_file_do_acquire() and have all of the existing callers set it to S_IFREG. Add a new nfsd_file_acquire_dir() wrapper that nfsd can call to request a nfsd_file that holds a directory open. For now, there is no need for a fsnotify_mark for directories, as CB_NOTIFY is not yet supported. Change nfsd_file_do_acquire() to avoid allocating one for non-S_IFREG inodes. Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-14-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12filelock: lift the ban on directory leases in generic_setleaseJeff Layton
With the addition of the try_break_lease calls in directory changing operations, allow generic_setlease to hand them out. Write leases on directories are never allowed however, so continue to reject them. For now, there is no API for requesting delegations from userland, so ensure that userland is prevented from acquiring a lease on a directory. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-13-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12vfs: make vfs_symlink break delegations on parent dirJeff Layton
In order to add directory delegation support, we must break delegations on the parent on any change to the directory. Add a delegated_inode parameter to vfs_symlink() and have it break the delegation. do_symlinkat() can then wait on the delegation break before proceeding. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-12-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>