path: root/fs/xfs
Age | Commit message | Author
5 days | Merge tag 'xfs-merge-7.2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux | Linus Torvalds
Pull xfs updates from Carlos Maiolino:
"The main highlight is the removal of experimental tag of the zone allocator feature. Besides that, this contains a collection of bug fixes and code refactoring but no new features have been added"
* tag 'xfs-merge-7.2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (29 commits)
  xfs: shut down the filesystem on a failed mount
  xfs: skip inode inactivation on a shut down mount
  xfs: move XFS_LSN_CMP to xfs_log_format.h
  xfs: shut down zoned file systems on writeback errors
  xfs: cleanup xfs_growfs_compute_deltas
  xfs: pass back updated nb from xfs_growfs_compute_deltas
  xfs: fix pointer arithmetic error on 32-bit systems
  xfs: initialize iomap->flags earlier in xfs_bmbt_to_iomap
  xfs: only log freed extents for the current RTG in zoned growfs
  xfs: add newly added RTGs to the free pool in growfs
  xfs: factor out a xfs_zone_mark_free helper
  xfs: mark struct xfs_imap as __packed
  xfs: store an agbno in struct xfs_imap
  xfs: massage xfs_imap_to_bp into xfs_read_icluster
  xfs: remove im_len field in struct xfs_imap
  xfs: cleanup xfs_imap
  xfs: remove the call to xfs_buf_reverify in xfs_trans_read_buf_map
  xfs: remove the i_ino field in struct xfs_inode
  xfs: remove xfs_setup_existing_inode
  xfs: convert xchk_inode_xref_set_corrupt to xchk_ip_xref_set_corrupt
  ...
6 days | Merge tag 'vfs-7.2-rc1.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs | Linus Torvalds
Pull vfs casefolding updates from Christian Brauner:
"This exposes the case folding behavior of local filesystems so that file servers - nfsd, ksmbd, and user space file servers - can report the actual behavior to clients instead of guessing.
Filesystems report case-insensitive and case-nonpreserving behavior via new file_kattr flags in their fileattr_get implementations. fat, exfat, ntfs3, hfs, hfsplus, xfs, cifs, nfs, vboxsf, and isofs are wired up. Local filesystems that are not explicitly handled default to the usual POSIX behavior of case-sensitive and case-preserving.
nfsd uses this to report case folding via NFSv3 PATHCONF and to implement the NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING attributes - both have been part of the NFS protocols for decades to support clients on non-POSIX systems - and ksmbd reports it via FS_ATTRIBUTE_INFORMATION. Exposing the information through the fileattr uapi covers user space file servers.
The immediate motivation is interoperability: Windows NFS clients hard-require servers to report case-insensitivity for Win32 applications to work correctly, and a client that knows the server is case-insensitive can avoid issuing multiple LOOKUP/READDIR requests searching for case variants. The Linux NFS client already grew support for case-insensitive shares years ago in support of the Hammerspace NFS server - negative dentry caching must be disabled (a lookup for "FILE.TXT" failing must not cache a negative entry when "file.txt" exists) and directory change invalidation must drop cached case-folded name variants. Such servers often operate in multi-protocol environments where a single file service instance caters to both NFS and SMB clients, and nfsd needs to report case folding properly to participate as a first-class citizen there.
A follow-up series brings fixes for the initial work: the nfsd case-info probe now uses kernel credentials, maps -ESTALE to NFS3ERR_STALE, and has its cost capped across READDIR entries; the nfs client avoids transiently zeroed case capability bits during the probe and skips the pathconf probe when neither field is consumed; the FS_CASEFOLD_FL semantics are clarified in the UAPI header; and the tools UAPI headers are synced"
* tag 'vfs-7.2-rc1.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
  nfsd: Cap case-folding probe cost across READDIR entries
  nfsd: Map -ESTALE from case probe to NFS3ERR_STALE
  nfsd: Use kernel credentials for case-info probe
  fs: Clarify FS_CASEFOLD_FL semantics in UAPI header
  nfs: Skip pathconf probe when neither field is consumed
  nfs: Avoid transient zeroed case capability bits during probe
  tools headers UAPI: Sync case-sensitivity flags from linux/fs.h
  ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
  nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
  nfsd: Report export case-folding via NFSv3 PATHCONF
  isofs: Implement fileattr_get for case sensitivity
  vboxsf: Implement fileattr_get for case sensitivity
  nfs: Implement fileattr_get for case sensitivity
  cifs: Implement fileattr_get for case sensitivity
  xfs: Report case sensitivity in fileattr_get
  hfsplus: Report case sensitivity in fileattr_get
  hfs: Implement fileattr_get for case sensitivity
  ntfs3: Implement fileattr_get for case sensitivity
  exfat: Implement fileattr_get for case sensitivity
  fat: Implement fileattr_get for case sensitivity
  ...
6 days | Merge tag 'vfs-7.2-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs | Linus Torvalds
Pull vfs inode updates from Christian Brauner:
"This extends the lockless ->i_count handling. iput() could already decrement any value greater than one locklessly but acquiring a reference always required taking inode->i_lock. Now acquiring a reference is lockless as long as the count was already at least 1, i.e., only the 0->1 and 1->0 transitions take the lock. This avoids the lock for the common cases of nfs calling into the inode hash and btrfs using igrab().
Cleanup-wise icount_read_once() is added to line up with inode_state_read_once() and the open-coded ->i_count loads across the tree are converted, and ihold() is relocated and tidied up.
On top of that some stale lock ordering annotations are retired from the inode hash code: iunique() no longer takes the hash lock since the inode hash became RCU-searchable and s_inode_list_lock is no longer taken under the hash lock either"
* tag 'vfs-7.2-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: retire stale lock ordering annotations from inode hash
  fs: allow lockless ->i_count bumps as long as it does not transition 0->1
  fs: relocate and tidy up ihold()
  fs: add icount_read_once() and stop open-coding ->i_count loads
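The lockless reference rule described in this pull ("lockless as long as the count was already at least 1") can be illustrated with a small userspace sketch. This is not the kernel code: struct demo_inode and demo_tryget_lockless() are hypothetical names, and C11 atomics stand in for the kernel's own primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode that carries a reference count. */
struct demo_inode {
	atomic_int count;
};

/*
 * Try to take a reference without a lock. This succeeds only if the count
 * is already >= 1, so the 0->1 transition is never performed here.
 */
static bool demo_tryget_lockless(struct demo_inode *inode)
{
	int old = atomic_load_explicit(&inode->count, memory_order_relaxed);

	while (old >= 1) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&inode->count, &old, old + 1))
			return true;
	}
	return false;	/* caller would fall back to a locked slow path */
}
```

A caller that gets false back would take the lock and handle the 0->1 case there, which is the only acquire-side transition the pull message says still needs the lock.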
6 days | Merge tag 'vfs-7.2-rc1.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs | Linus Torvalds
Pull exportfs updates from Christian Brauner:
"This cleans up the exportfs support for block-style layouts that provide direct block device access: the operations for layout-based block device access are split out of struct export_operations into a separate header, ->commit_blocks() no longer takes a struct iattr argument, and the way support for layout-based block device access is detected is reworked. nfsd's blocklayout code also stops honoring loca_time_modify.
This is preparation for supporting export of more than a single device per file system"
* tag 'vfs-7.2-rc1.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  exportfs,nfsd: rework checking for layout-based block device access support
  exportfs: don't pass struct iattr to ->commit_blocks
  exportfs: split out the ops for layout-based block device access
  nfsd/blocklayout: always ignore loca_time_modify
9 days | xfs: shut down the filesystem on a failed mount | Mikhail Lobanov
A corrupt/crafted XFS image can make mount fail after background inode inactivation has already been enabled. xfs_mountfs() turns on inodegc (xfs_inodegc_start()) right after log recovery, but the quota subsystem (mp->m_quotainfo) is only allocated much later, in xfs_qm_newmount() / xfs_qm_mount_quotas(). The quota accounting flags in mp->m_qflags are parsed from the mount options before xfs_mountfs() even runs.
If the mount then aborts in between - e.g. xfs_rtmount_inodes() failing with "failed to read RT inodes" - the unwind path flushes the inodegc queue, which inactivates the inodes that are still queued, and xfs_inactive() calls xfs_qm_dqattach(). That path trusts XFS_IS_QUOTA_ON() (the flag is set) and dereferences the not yet allocated mp->m_quotainfo:
  XFS (loop0): failed to read RT inodes
  Oops: general protection fault, probably for non-canonical address 0xdffffc000000002a: 0000 [#1] PREEMPT SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x0000000000000150-0x0000000000000157]
  Workqueue: xfs-inodegc/loop0 xfs_inodegc_worker
  RIP: 0010:__mutex_lock+0xfe/0x930
  Call Trace:
   xfs_qm_dqget_cache_lookup+0x63/0x7f0
   xfs_qm_dqget_inode+0x336/0x860
   xfs_qm_dqattach_one+0x232/0x4e0
   xfs_qm_dqattach_locked+0x2c6/0x470
   xfs_qm_dqattach+0x46/0x70
   xfs_inactive+0x988/0xe80
   xfs_inodegc_worker+0x27c/0x730
The NULL m_quotainfo deref is only one symptom. The deeper problem is that a failed mount should not be inactivating inodes at all: it must not write to the (possibly corrupt, only partially set up) persistent metadata of a filesystem we just refused to mount, and the subsystems inactivation relies on may not be initialised.
Mark the filesystem shut down before flushing the inodegc queue in the xfs_mountfs() failure path. With the preceding patch a shut down mount no longer inactivates the queued inodes: xfs_inactive() returns early so they are dropped straight to reclaim instead. They are still pulled down so reclaim can free them (which is why the flush was added in commit ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")), but without touching the on-disk structures - matching that comment's own "pull down all the state and flee" intent.
Use SHUTDOWN_META_IO_ERROR for the shutdown: it is the generic "cannot safely touch metadata" reason already used elsewhere in this file and in the xfs_ifree() failure path, and unlike SHUTDOWN_FORCE_UMOUNT it does not log a misleading "User initiated shutdown received". A failed mount is not necessarily on-disk corruption (it can be a transient I/O or resource error), so SHUTDOWN_CORRUPT_ONDISK would not be accurate either.
Found by fuzzing XFS with syzkaller (corrupt image mount); reproduced and verified under QEMU/KASAN.
Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues")
Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
9 days | xfs: skip inode inactivation on a shut down mount | Mikhail Lobanov
XFS already declines to inactivate inodes on a shut down mount, but only at queue time: xfs_inode_mark_reclaimable() calls xfs_inode_needs_inactive(), which returns false when the mount is shut down ("If the log isn't running, push inodes straight to reclaim"), and then drops the dquots and marks the inode reclaimable directly. An inode that was queued for background inactivation while the mount was still live is not covered by that check: the inodegc worker still calls xfs_inactive() on it even after the mount has been shut down in the meantime. Inactivation modifies persistent metadata and runs transactions that cannot complete on a shut down mount, and it relies on subsystems (e.g. quota) that a torn down, or never fully set up, mount may not have available. Honour the same invariant in xfs_inactive() itself: if the mount is shut down, return early before doing any inactivation work. The dquots attached to the inode are released by the existing xfs_qm_dqdetach() at the out: label, so references are not leaked, and the caller then makes the inode reclaimable exactly as before. On its own this is a consistency fix with the existing queue-time behaviour; it is also a prerequisite for shutting the mount down in the xfs_mountfs() failure path in the following patch. Fixes: ab23a7768739 ("xfs: per-cpu deferred inode inactivation queues") Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
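The early return described above can be sketched as a control-flow model. This is only an illustration under assumptions: the demo_* types, fields and counters are hypothetical and merely mimic the shape of "return early, but still run the dquot detach at the out: label".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical models of a mount and an inode. */
struct demo_mount {
	bool shut_down;
	int metadata_writes;	/* counts simulated transactions */
};

struct demo_inode {
	struct demo_mount *mount;
	bool dquots_attached;
};

static void demo_inactive(struct demo_inode *ip)
{
	struct demo_mount *mp = ip->mount;

	/* A shut down mount must not touch persistent metadata. */
	if (mp->shut_down)
		goto out;

	ip->dquots_attached = true;	/* stands in for attaching dquots */
	mp->metadata_writes++;		/* stands in for the inactivation work */
out:
	/* Always drop dquot references so that nothing is leaked. */
	ip->dquots_attached = false;
}
```

The point of routing the early exit through the common label is that an inode queued while the mount was live still has its dquot references released, even though no inactivation work runs.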
9 days | xfs: move XFS_LSN_CMP to xfs_log_format.h | Christoph Hellwig
Because CYCLE_LSN/BLOCK_LSN are defined in xfs_log_format.h, XFS_LSN_CMP forces a xfs_log_format.h dependency in xfs_log.h. Move XFS_LSN_CMP to xfs_log_format.h and drop the macro/inline indirection to clean up our header mess a little bit. This also helps xfsprogs, which doesn't have xfs_log.h, but needs XFS_LSN_CMP. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
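For readers unfamiliar with why the comparison depends on CYCLE_LSN/BLOCK_LSN, here is a minimal sketch of a two-level LSN comparison. It assumes the cycle number lives in the upper 32 bits and the block number in the lower 32 bits; the demo_* names are hypothetical and this is not the kernel's definition.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t demo_lsn_t;

/* Assumed layout: cycle in the upper 32 bits, block in the lower 32 bits. */
static uint32_t demo_cycle(demo_lsn_t lsn)
{
	return (uint32_t)((uint64_t)lsn >> 32);
}

static uint32_t demo_block(demo_lsn_t lsn)
{
	return (uint32_t)lsn;
}

/* Three-way compare: cycle first, then block within the same cycle. */
static int demo_lsn_cmp(demo_lsn_t a, demo_lsn_t b)
{
	if (demo_cycle(a) != demo_cycle(b))
		return demo_cycle(a) < demo_cycle(b) ? -1 : 1;
	if (demo_block(a) != demo_block(b))
		return demo_block(a) < demo_block(b) ? -1 : 1;
	return 0;
}
```

Because the comparison is built entirely from the two accessor helpers, keeping it in the same header as those helpers avoids a cross-header dependency.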
9 days | xfs: shut down zoned file systems on writeback errors | Yao Sang
Zoned writeback allocates space from an open zone and advances the in-memory allocation state before submitting the bio. The completion path only records the written blocks and updates the mapping on success. If the write fails, XFS cannot tell how far the device write pointer advanced and cannot safely roll the open zone accounting back. This was observed while investigating xfs/643 and xfs/646 on an external ZNS realtime device. A writeback error after consuming space from an open zone left later writers waiting for open-zone or GC progress that could not happen. xfs/643 exposed this through the GC defragmentation path, while xfs/646 exposed the same failure mode through the truncate/EOF-zeroing space wait path. There is no local recovery path in ioend completion that can restore a consistent zoned allocation state after the device has rejected the write. Treat writeback errors for zoned inodes as fatal and force a file system shutdown from the ioend completion path. The existing shutdown path wakes zoned allocation waiters and makes future space waits return -EIO instead of leaving tasks stuck waiting for progress. Signed-off-by: Yao Sang <sangyao@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
10 days | xfs: cleanup xfs_growfs_compute_deltas | Christoph Hellwig
xfs_growfs_compute_deltas has odd calling conventions, and looks very convoluted due to the use of do_div and strangely named and typed variables. Rename it, make it return the agcount and let the caller calculate the delta. Then internally use the better div_u64_rem helper and descriptive variable names and types. Also add a comment describing what the function is used for. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
10 days | xfs: pass back updated nb from xfs_growfs_compute_deltas | Christoph Hellwig
xfs_growfs_compute_deltas can update nb for corner cases like a number of blocks that would create a less than minimal sized AG, or running past the max AG limit. Pass back the calculated value to the caller, as it relies on it to calculate the new number of perag structures. Note that the grown file system size is not affected by this miscalculation as it uses the passed back delta value. Fixes: a49b7ff63f98 ("xfs: Refactoring the nagcount and delta calculation") Cc: stable@vger.kernel.org # v7.0 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
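The two corner cases named above (an undersized trailing AG and the maximum AG limit) can be sketched as follows. The function name, the DEMO_* limits and the in/out nb parameter are assumptions for illustration; the real helper, its name after the rename, and the real limits are not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limits for illustration only; not the real XFS values. */
#define DEMO_MIN_AG_BLOCKS	64u
#define DEMO_MAX_AGCOUNT	4u

/*
 * Compute how many AGs fit into *nb blocks. *nb is rounded down when the
 * tail is too small to form an AG or when the AG limit would be exceeded,
 * so the caller sees the size that is actually usable.
 */
static uint32_t demo_growfs_agcount(uint64_t *nb, uint32_t agblocks)
{
	uint64_t agcount = *nb / agblocks;
	uint32_t rem = (uint32_t)(*nb % agblocks);

	if (rem >= DEMO_MIN_AG_BLOCKS)
		agcount++;			/* partial last AG is big enough */
	else if (rem)
		*nb = agcount * agblocks;	/* drop the undersized tail */

	if (agcount > DEMO_MAX_AGCOUNT) {
		agcount = DEMO_MAX_AGCOUNT;
		*nb = agcount * agblocks;
	}
	return (uint32_t)agcount;
}
```

Passing nb by pointer is one way to "pass back" the adjusted value; a caller that kept using its own unadjusted copy would size later structures from a block count that no longer matches the AG count.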
10 days | xfs: fix pointer arithmetic error on 32-bit systems | Darrick J. Wong
The translation of the old XFS_BMBT_KEY_ADDR macro into a static function is not correct on 32-bit systems because the sizeof() argument went from being a xfs_bmbt_key_t (i.e. a struct) to a (struct xfs_bmbt_key *) (i.e. a pointer to the same struct). On 64-bit systems this turns out ok because they are the same size, but on 32-bit systems this is catastrophic because they are not the same size. So far there have been no complaints, most likely because the xfs developers urge against running it on 32-bit systems. But this needs fixing asap. Cc: stable@vger.kernel.org # v6.12 Fixes: 79124b37400635 ("xfs: replace shouty XFS_BM{BT,DR} macros") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
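The sizeof() mix-up is easy to reproduce outside the kernel. The sketch below uses a hypothetical 8-byte struct demo_key and two hypothetical offset helpers; it only demonstrates the class of bug, not the actual XFS function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte key, standing in for a bmbt key. */
struct demo_key {
	uint64_t startoff;
};

/* Buggy: the size of a pointer type is 4 on 32-bit and 8 on 64-bit. */
static size_t demo_key_offset_buggy(unsigned int index)
{
	return (index - 1) * sizeof(struct demo_key *);
}

/* Fixed: the size of the struct itself is 8 on both. */
static size_t demo_key_offset(unsigned int index)
{
	return (index - 1) * sizeof(struct demo_key);
}
```

On a 64-bit build both helpers return the same value, which is why the bug stays hidden there; on a 32-bit build the buggy helper yields half the correct offset for every index above 1.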
10 days | xfs: initialize iomap->flags earlier in xfs_bmbt_to_iomap | Christoph Hellwig
Otherwise we lose the IOMAP_IOEND_BOUNDARY assignment for writes to the first block in a realtime group, and could cause incorrect merges for such writes. Fixes: b91afef72471 ("xfs: don't merge ioends across RTGs") Cc: <stable@vger.kernel.org> # v6.13 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
10 days | xfs: only log freed extents for the current RTG in zoned growfs | Christoph Hellwig
Otherwise a power fail or crash during growfs could lead to an elevated sb_rblocks counter. Note that the step function is much simpler compared to the classic RT allocator as zoned RT sections must be aligned to real time group boundaries. Fixes: 01b71e64bb87 ("xfs: support growfs on zoned file systems") Cc: <stable@vger.kernel.org> # v6.15 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
11 days | xfs: add newly added RTGs to the free pool in growfs | Christoph Hellwig
When growing a zoned RT section, the newly added RTGs also need to be tagged as free in the radix tree and added to the nr_free_zones counters. Call xfs_add_free_zone to do that, otherwise using up the newly added space will wait for free zones forever. Fixes: 01b71e64bb87 ("xfs: support growfs on zoned file systems") Cc: stable@vger.kernel.org # v6.15 Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
11 days | xfs: factor out a xfs_zone_mark_free helper | Christoph Hellwig
Add a helper for adding a zone to the free pool in preparation of adding another caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: mark struct xfs_imap as __packed | Christoph Hellwig
This returns 2 bytes of padding at the end to struct xfs_inode, into which this structure is embedded. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: store an agbno in struct xfs_imap | Christoph Hellwig
The xfs_imap structure is embedded into the xfs_inode, which means the size of it directly affects the inode size. Replacing the xfs_daddr_t with an xfs_agbno_t and taking the AG information from other easily available sources allows us to shrink the structure including the typical padding from 16 bytes to 8 bytes. As a side-effect the debugging check in xfs_imap() naturally now converges to a stricter variant that checks that the cluster is located inside a single AG, and not just inside the entire device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
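The size effect of swapping a 64-bit disk address for a 32-bit AG block number can be checked with a few stand-in structs. The layouts below are assumptions (a 64-bit address or 32-bit block number followed by a hypothetical 16-bit offset field), not the real struct xfs_imap, and the packed variant uses the GCC/Clang attribute to show where the 2 bytes of tail padding go.

```c
#include <assert.h>
#include <stdint.h>

/* Before (assumed shape): 64-bit disk address plus a 16-bit offset. */
struct demo_imap_old {
	int64_t blkno;
	uint16_t boffset;
};

/* After (assumed shape): 32-bit AG block number plus the same offset. */
struct demo_imap_new {
	uint32_t agbno;
	uint16_t boffset;
};

/* Packed variant: the 2 bytes of tail padding are given back. */
struct demo_imap_packed {
	uint32_t agbno;
	uint16_t boffset;
} __attribute__((packed));
```

On a typical LP64 ABI these come out as 16, 8 and 6 bytes respectively, which lines up with the "16 bytes to 8 bytes" figure in the message and the 2 bytes returned by the later __packed change.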
12 days | xfs: massage xfs_imap_to_bp into xfs_read_icluster | Christoph Hellwig
xfs_imap_to_bp only uses the im_blkno field from struct xfs_imap, so pass that directly. Rename the function to xfs_read_icluster, which describes the functionality much better and matches other helpers like xfs_read_agf and xfs_read_agi. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: remove im_len field in struct xfs_imap | Christoph Hellwig
im_len is always set to the same value for a given file system, which makes it redundant. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: cleanup xfs_imap | Christoph Hellwig
Reshuffle the code a bit so that the imap_lookup and filling out of the xfs_imap structure aren't duplicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: remove the call to xfs_buf_reverify in xfs_trans_read_buf_map | Christoph Hellwig
xfs_trans_read_buf_map asserts bp->b_ops is non-NULL just before calling xfs_buf_reverify which is a no-op if bp->b_ops is set, making the call dead code ever since it was added in commit 1aff5696f3e0 ("xfs: always assign buffer verifiers when one is provided"). Remove the useless call, mark xfs_buf_reverify static and clean up the branch dealing with a buffer attached to the transaction a bit by deduplicating and keeping together the asserts and removing the bip variable only used once outside of asserts and tracing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: remove the i_ino field in struct xfs_inode | Christoph Hellwig
Now that the VFS inode has a u64 i_ino field, there is no need to store a copy of the inode number in the xfs_inode structure. Introduce an I_INO() wrapper as a shortcut to the inode number so that we don't have to propagate the VFS inode everywhere. The only non-obvious part is the clearing of i_ino to 0 for RCU freeing the inode. None of this calls into VFS paths, which makes clearing the VFS inode field here just as safe as clearing the old field in the xfs_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: remove xfs_setup_existing_inode | Christoph Hellwig
xfs_setup_existing_inode only has a single caller, fold it into that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: convert xchk_inode_xref_set_corrupt to xchk_ip_xref_set_corrupt | Christoph Hellwig
All xref corruption reports have the xfs_inode structure, so switch the helper to work based on that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: add a xchk_ip_set_corrupt helper | Christoph Hellwig
Add a smaller wrapper to set an inode corrupted by the xfs_inode pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: add a xfs_rmap_inode_owner helper | Christoph Hellwig
Add a small wrapper for initializing the rmap owner to i_ino. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: add a xfs_rmap_inode_bmbt_owner | Christoph Hellwig
Add a small wrapper for initializing the bmbt owner to i_ino. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: add a XFS_INO_TO_FSB helper | Christoph Hellwig
Add a shortcut for the common XFS_INO_TO_FSB(mp, ip->i_ino) pattern. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: add a XFS_INODE_TO_AGINO helper | Christoph Hellwig
Add a shortcut for the common XFS_INO_TO_AGINO(mp, ip->i_ino) pattern. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: add a XFS_INODE_TO_AGNO helper | Christoph Hellwig
Add a shortcut for the common XFS_INO_TO_AGNO(mp, ip->i_ino) pattern. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: fix unreachable BIGTIME check in dquot flush validation | Alexey Nepomnyashih
The dqp->q_id == 0 check inside the XFS_DQTYPE_BIGTIME block is unreachable because root dquots return successfully earlier. Reject root dquots with XFS_DQTYPE_BIGTIME before that early return, preserving the intended validation and removing the unreachable condition. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 4ea1ff3b4968 ("xfs: widen ondisk quota expiration timestamps to handle y2038+") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Allison Henderson <achender@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
12 days | xfs: fix exchmaps reservation limit check | Yingjie Gao
xfs_exchmaps_estimate_overhead() adds the bmbt and rmapbt overhead to a local resblks variable, but the final UINT_MAX check still tests req->resblks. That is the reservation value from before the overhead was added. The computed value is stored back in req->resblks and later passed to xfs_trans_alloc(), whose block reservation argument is unsigned int. Check the computed reservation so the existing limit applies to the value that will be used. Fixes: 966ceafc7a43 ("xfs: create deferred log items for file mapping exchanges") Cc: stable@vger.kernel.org # v6.10 Signed-off-by: Yingjie Gao <gaoyingjie@uniontech.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
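The stale-value check is a common pattern bug, sketched here with hypothetical names (struct demo_req, demo_estimate_overhead()). The sketch only models the limit check against UINT_MAX; it assumes the 64-bit additions themselves do not overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical request carrying a block reservation. */
struct demo_req {
	uint64_t resblks;
};

/* Returns 0 on success, -1 if the total does not fit in unsigned int. */
static int demo_estimate_overhead(struct demo_req *req, uint64_t bmbt,
				  uint64_t rmapbt)
{
	uint64_t resblks = req->resblks;

	resblks += bmbt;
	resblks += rmapbt;

	/* Check the computed total, not the stale req->resblks. */
	if (resblks > UINT_MAX)
		return -1;

	req->resblks = resblks;
	return 0;
}
```

Testing req->resblks at the marked line instead would let a request that starts just under the limit pass, even though the stored total no longer fits the unsigned int reservation argument.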
12 days | xfs: drop the experimental warning for the zoned allocator | Christoph Hellwig
The zoned allocator has been released with 6.15 on May 25, 2025. It has seen constant maintenance and improvements and no major issues, so promote it out of the experimental category. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-05-30 | xfs: Remove mention of PageWriteback | Matthew Wilcox (Oracle)
Update a comment to refer to folios instead of pages. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-05-30 | xfs: abort mount if xfs_fs_reserve_ag_blocks fails | Christoph Hellwig
xfs_mountfs currently ignores all errors from xfs_fs_reserve_ag_blocks, which can lead to the mount path continuing on corruption errors. Fix the check to only ignore -ENOSPC as in other callers, and unwind for all other errors. Fixes: 81ed94751b15 ("xfs: fix log intent recovery ENOSPC shutdowns when inactivating inodes") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
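The corrected error filter amounts to "ignore only -ENOSPC". A minimal sketch, using a hypothetical predicate name and the userspace errno values:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Decide whether a reservation error should abort the mount. */
static bool demo_reserve_error_is_fatal(int error)
{
	/* Success and -ENOSPC are tolerated; everything else unwinds. */
	return error != 0 && error != -ENOSPC;
}
```

With this shape, an I/O or corruption error from the reservation step stops the mount instead of being silently dropped alongside the tolerated out-of-space case.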
2026-05-30 | xfs: factor rtgroup geom write pointer reporting into a helper | Christoph Hellwig
Sticks out a bit better if we add a separate helper for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-05-30 | xfs: drop the RTG reference later in xfs_ioc_rtgroup_geometry | Christoph Hellwig
Keep the rtgroup reference until after reporting the write pointer, as that uses it. Right now this is not a major issue as we don't support shrinking file systems in a way that makes RTGs go away, but let's stick to the proper reference counting to prepare for that. Fixes: c6ce65cb17aa ("xfs: add write pointer to xfs_rtgroup_geometry") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-05-30 | xfs: fix rtgroup cleanup in CoW fork repair | Yingjie Gao
xrep_cow_find_bad_rt() initializes scrub rtgroup state before the force-rebuild path calls xrep_cow_mark_file_range(). If that call fails, the code jumps directly to out_rtg, which skips the scrub rtgroup cleanup and only drops the local rtgroup reference. Remove the unnecessary jump so the function falls through to out_sr, ensuring the realtime cursors, lock state, and sr->rtg reference are released before returning. Fixes: fd97fe111208 ("xfs: fix CoW forks for realtime files") Cc: <stable@vger.kernel.org> # v6.14 Signed-off-by: Yingjie Gao <gaoyingjie@uniontech.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-05-30 | xfs: fix error returns in CoW fork repair | Yingjie Gao
xrep_cow_find_bad() returns success after the cleanup labels even if AG setup, btree queries, or bitmap updates failed. This can make repair continue with an incomplete bad-file-offset bitmap instead of stopping at the original error. The force-rebuild path has a related cleanup problem. If xrep_cow_mark_file_range() fails, the function returns directly and skips the scrub AG context and perag cleanup. Let the force-rebuild path fall through to the existing cleanup code and return the saved error after cleanup. Fixes: dbbdbd008632 ("xfs: repair problems in CoW forks") Cc: <stable@vger.kernel.org> # v6.8 Signed-off-by: Yingjie Gao <gaoyingjie@uniontech.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-05-30 | xfs: fix overlapping extents returned for pNFS LAYOUTGET | Dai Ngo
xfs_fs_map_blocks() currently passes XFS_BMAPI_ENTIRE to xfs_bmapi_read(), which causes the bmap code to expand the mapping to cover the entire extent rather than the requested range. A single LAYOUTGET request from the client can cause the server to issue multiple calls to xfs_fs_map_blocks() for different offsets within the same extent. Because of the use of the XFS_BMAPI_ENTIRE flag, these calls can produce overlapping mappings. As a result, the LAYOUTGET reply sent to the NFS client may contain overlapping extents. This creates ambiguity in extent selection for a given file range, which can lead to incorrect device selection, inconsistent handling of datastate, and ultimately data corruption or protocol violations on the client side. Problem discovered with xfstest generic/075 test using NFSv4.2 mount with SCSI layout. Fix this by replacing the XFS_BMAPI_ENTIRE flag with '0' so that xfs_bmapi_read() returns only the mapping for the requested range. Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET"). Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
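Why expanding to the entire extent produces overlaps can be seen with a toy model. The demo_* types and the boolean 'entire' parameter are hypothetical and only mimic the described effect of the flag; the model assumes the requested offset lies inside the extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_extent {
	uint64_t off;
	uint64_t len;
};

/*
 * Map the range [off, off + len) against one extent. With 'entire' set the
 * whole extent is returned; otherwise the result is trimmed to the request.
 */
static struct demo_extent demo_map(struct demo_extent ext, uint64_t off,
				   uint64_t len, bool entire)
{
	struct demo_extent got = ext;

	if (!entire) {
		uint64_t ext_end = ext.off + ext.len;
		uint64_t end = off + len < ext_end ? off + len : ext_end;

		got.off = off;
		got.len = end - off;
	}
	return got;
}

/* True if the two half-open ranges share at least one block. */
static bool demo_overlap(struct demo_extent a, struct demo_extent b)
{
	return a.off < b.off + b.len && b.off < a.off + a.len;
}
```

Two back-to-back requests inside the same extent each get the full extent back in the 'entire' case and therefore overlap; in the trimmed case they come back as adjacent, disjoint ranges.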
2026-05-30 | xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path | Dai Ngo
xfs_fs_map_blocks() acquires the data map lock and then calls xfs_bmapi_read(). If xfs_bmapi_read() fails, the function currently still falls through to xfs_bmbt_to_iomap(), which consumes an uninitialized imap record and may return invalid data to the caller. Fix this by releasing the data map lock and returning immediately when xfs_bmapi_read() reports an error. This prevents xfs_bmbt_to_iomap() from being called with an uninitialized xfs_bmbt_irec. Fixes: 527851124d10f ("xfs: implement pNFS export operations") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
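The fixed error path follows the usual "unlock and return before consuming the output" shape. The sketch below is a hypothetical model (demo_lock, demo_read_mapping() and the value 42 are made up for illustration), not the XFS function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct demo_lock {
	bool held;
};

/* Simulated read: fills *imap only on success. */
static int demo_read_mapping(bool fail, int *imap)
{
	if (fail)
		return -EIO;	/* *imap is left untouched */
	*imap = 42;
	return 0;
}

static int demo_map_blocks(struct demo_lock *lock, bool fail, int *result)
{
	int imap;	/* not initialized until the read succeeds */
	int error;

	lock->held = true;
	error = demo_read_mapping(fail, &imap);
	if (error) {
		/* Release the lock and bail out before touching imap. */
		lock->held = false;
		return error;
	}

	*result = imap;	/* safe: imap was filled in by the read */
	lock->held = false;
	return 0;
}
```

Without the early return, the conversion step would run on whatever happened to be in imap, which is the "invalid data to the caller" outcome the message describes.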
2026-05-30  xfs: handle racing deletions in xfs_zone_gc_iter_irec  Hans Holmberg
Under heavy garbage collection pressure from RocksDB workloads, filesystem shutdowns can occur in xfs_zone_gc_iter_irec when xfs_iget() returns -EINVAL for deleted files. Fix this by handling -EINVAL just like we handle -ENOENT, allowing zone GC to safely ignore stale mappings. Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection") Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
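A minimal sketch of the error classification described above, assuming a GC loop that maps the result of an inode lookup to one of three actions. The enum and function names are hypothetical and do not appear in the kernel.

```c
#include <assert.h>
#include <errno.h>

enum demo_action { DEMO_PROCESS, DEMO_SKIP, DEMO_FAIL };

/* Decide what to do with a mapping based on the inode lookup result. */
static enum demo_action demo_gc_classify(int iget_error)
{
	assert(iget_error <= 0);	/* 0 or a negative errno */
	switch (iget_error) {
	case 0:
		return DEMO_PROCESS;	/* inode found: relocate its data */
	case -ENOENT:
	case -EINVAL:
		return DEMO_SKIP;	/* inode was deleted: stale mapping */
	default:
		return DEMO_FAIL;	/* anything else is a real error */
	}
}
```

Treating both error codes as "inode is gone" lets the loop skip the stale mapping instead of escalating to a shutdown.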
2026-05-21  xfs: fix a buffer lookup against removal race  Christoph Hellwig
When a buffer is freed either by LRU eviction or because it is unset, the lockref is marked as dead instantly, which prevents the buffer from being used after finding it in the buffer hash in xfs_buf_lookup and xfs_buf_find_insert. But the latter will then not add the new buffer to the hash because it already found an existing buffer. Fix this in two places: Remove the buffer from the hash before marking the lockref dead so that no buffer with a dead lockref can be found in the hash, but if we find one in xfs_buf_find_insert due to store reordering, handle this case correctly instead of returning an unhashed buffer. Fixes: 67fe4303972e ("xfs: don't keep a reference for buffers on the LRU") Reported-by: Andrey Albershteyn <aalbersh@redhat.com> Reported-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-05-11  fs: add icount_read_once() and stop open-coding ->i_count loads  Mateusz Guzik
Similarly to inode_state_read_once(), it makes the caller spell out they acknowledge instability of the returned value. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20260421182538.1215894-2-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-05-11  xfs: Report case sensitivity in fileattr_get  Chuck Lever
Upper layers such as NFSD need to query whether a filesystem is case-sensitive. Add FS_XFLAG_CASEFOLD to xfs_ip2xflags() when the filesystem is formatted with the ASCIICI feature flag. This serves both FS_IOC_FSGETXATTR (via xfs_fill_fsxattr() in xfs_fileattr_get()) and XFS_IOC_BULKSTAT (which populates bs_xflags directly from xfs_ip2xflags()), so bulkstat consumers and per-inode queries see a consistent view of the filesystem's case-folding behavior. FS_XFLAG_CASEFOLD is read-only: FS_XFLAG_RDONLY_MASK ensures FS_IOC_FSSETXATTR strips it, and xfs_flags2diflags() has no clause for CASEFOLD so the on-disk diflags are unaffected. The legacy FS_IOC_SETFLAGS path in xfs_fileattr_set() also allows FS_CASEFOLD_FL through its allowlist on ASCIICI filesystems so that a chattr read-modify-write cycle does not fail with EOPNOTSUPP. XFS always preserves case. XFS is case-sensitive by default, but supports ASCII case-insensitive lookups when formatted with the ASCIICI feature flag. Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260507-case-sensitivity-v14-8-e62cc8200435@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
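The get/set asymmetry described above can be sketched as follows. This is a user-space illustration only: the `DEMO_` bit values are made up and are not the uapi constants, and the two helpers are hypothetical stand-ins for the roles that xfs_ip2xflags() and the FS_XFLAG_RDONLY_MASK filtering play in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; not the uapi constants. */
#define DEMO_XFLAG_APPEND	(1u << 0)
#define DEMO_XFLAG_CASEFOLD	(1u << 1)
#define DEMO_XFLAG_RDONLY_MASK	DEMO_XFLAG_CASEFOLD

struct demo_fs { int asciici; };	/* formatted with the ASCIICI feature? */
struct demo_inode {
	const struct demo_fs *fs;
	uint32_t diflags;		/* stands in for the on-disk flags */
};

/* "get": stored flags plus CASEFOLD derived from the filesystem feature. */
static uint32_t demo_get_xflags(const struct demo_inode *ip)
{
	uint32_t xflags;

	assert(ip && ip->fs);
	xflags = ip->diflags;
	if (ip->fs->asciici)
		xflags |= DEMO_XFLAG_CASEFOLD;
	return xflags;
}

/* "set": read-only bits are stripped, so they never reach the stored flags. */
static void demo_set_xflags(struct demo_inode *ip, uint32_t xflags)
{
	assert(ip);
	ip->diflags = xflags & ~DEMO_XFLAG_RDONLY_MASK;
}
```

With this split, a read-modify-write cycle that passes the reported flags straight back succeeds, and the stored flags stay free of the derived bit.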
2026-05-11  fs: Move file_kattr initialization to callers  Chuck Lever
fileattr_fill_xflags() and fileattr_fill_flags() memset the entire file_kattr struct before populating select fields, so callers cannot pre-set fields in fa->fsx_xflags without having their values clobbered. Darrick Wong noted that a function named "fill_xflags" touching more than xflags forces callers to know implementation details beyond its apparent scope. Drop the memset from both fill functions and initialize at the entry points instead: ioctl_setflags(), ioctl_fssetxattr(), the file_setattr() syscall, and xfs_ioc_fsgetxattra() now declare fa with an aggregate initializer. ioctl_getflags(), ioctl_fsgetxattr(), and the file_getattr() syscall already aggregate-initialize fa to pass flags_valid/fsx_valid hints into vfs_fileattr_get(). Subsequent patches rely on this so that ->fileattr_get() handlers can set case-sensitivity flags (FS_XFLAG_CASEFOLD, FS_XFLAG_CASENONPRESERVING) in fa->fsx_xflags before the fill functions run. Suggested-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260507-case-sensitivity-v14-1-e62cc8200435@oracle.com Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
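The clobbering problem can be shown with a reduced sketch. `demo_kattr` and both fill helpers are hypothetical. In particular, merging with `|=` in the new helper is an assumption about how pre-set bits survive, not a statement about what fileattr_fill_xflags() does after this change.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CASEFOLD	(1u << 1)	/* hypothetical bit value */

struct demo_kattr {
	uint32_t xflags;
	uint32_t flags;
};

/* Old shape: zeroes the struct, discarding anything the caller pre-set. */
static void demo_fill_old(struct demo_kattr *fa, uint32_t xflags)
{
	assert(fa);
	memset(fa, 0, sizeof(*fa));
	fa->xflags = xflags;
}

/* New shape (assumed): no memset; merge into what the caller provided. */
static void demo_fill_new(struct demo_kattr *fa, uint32_t xflags)
{
	assert(fa);
	fa->xflags |= xflags;
}
```

A caller would then declare the struct with an aggregate initializer, for example `struct demo_kattr fa = { .xflags = DEMO_CASEFOLD };`, so every field that is not named starts at zero without a memset in the fill helper.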
2026-05-11  xfs: Fix typo in comment  Md Shofiqul Islam
Fix spelling mistake in comment: - occured -> occurred Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Md Shofiqul Islam <shofiqtest@gmail.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-05-11  xfs: fix the "limiting open zones" message  Christoph Hellwig
The xfs logging macros already append a newline, so remove the explicit \n from the message, which adds an extra one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
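The doubled newline is easy to reproduce with any logging macro that appends its own terminator. The sketch below uses a hypothetical `demo_log()` macro built on snprintf(); it is not one of the xfs logging macros, and the message text is made up.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical logging macro that appends the newline itself. */
#define demo_log(buf, size, fmt, ...) \
	snprintf((buf), (size), fmt "\n", __VA_ARGS__)

/* Count the newline characters in a formatted message. */
static int demo_count_newlines(const char *s)
{
	int n = 0;

	assert(s);
	for (; *s; s++)
		if (*s == '\n')
			n++;
	return n;
}
```

Calling `demo_log(buf, sizeof(buf), "limit is %d\n", 4)` yields two newlines, while the same call without the trailing `\n` in the format string yields exactly one.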
2026-05-11  exportfs,nfsd: rework checking for layout-based block device access support  Christoph Hellwig
Currently NFSD hard codes checking support for block-style layouts. Lift the checks into a file system-helper and provide an exportfs-level helper to implement the typical checks. This prepares for supporting block layout export of multiple devices per file system. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260423181854.743150-5-cel@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-05-11  exportfs: don't pass struct iattr to ->commit_blocks  Christoph Hellwig
The only thing ->commit_blocks really needs is the new size, with a magic -1 placeholder 0 for "do not change the size" because it only ever extends the size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260423181854.743150-4-cel@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>