summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2026-03-10smb: client: fix atomic open with O_DIRECT & O_SYNCPaulo Alcantara
When user application requests O_DIRECT|O_SYNC along with O_CREAT on open(2), CREATE_NO_BUFFER and CREATE_WRITE_THROUGH bits were missed in CREATE request when performing an atomic open, thus leading to potentially data integrity issues. Fix this by setting those missing bits in CREATE request when O_DIRECT|O_SYNC has been specified in cifs_do_create(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Reviewed-by: David Howells <dhowells@redhat.com> Acked-by: Henrique Carvalho <henrique.carvalho@suse.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-10xfs: fix undersized l_iclog_roundoff valuesDarrick J. Wong
If the superblock doesn't list a log stripe unit, we set the incore log roundoff value to 512. This leads to corrupt logs and unmountable filesystems in generic/617 on a disk with 4k physical sectors... XFS (sda1): Mounting V5 Filesystem ff3121ca-26e6-4b77-b742-aaff9a449e1c XFS (sda1): Torn write (CRC failure) detected at log block 0x318e. Truncating head block from 0x3197. XFS (sda1): failed to locate log tail XFS (sda1): log mount/recovery failed: error -74 XFS (sda1): log mount failed XFS (sda1): Mounting V5 Filesystem ff3121ca-26e6-4b77-b742-aaff9a449e1c XFS (sda1): Ending clean mount ...on the current xfsprogs for-next which has a broken mkfs. xfs_info shows this... meta-data=/dev/sda1 isize=512 agcount=4, agsize=644992 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=1 = reflink=1 bigtime=1 inobtcount=1 nrext64=1 = exchange=1 metadir=1 data = bsize=4096 blocks=2579968, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1, parent=1 log =internal log bsize=4096 blocks=16384, version=2 = sectsz=4096 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 = rgcount=0 rgsize=268435456 extents = zoned=0 start=0 reserved=0 ...observe that the log section has sectsz=4096 sunit=0, which means that the roundoff factor is 512, not 4096 as you'd expect. We should fix mkfs not to generate broken filesystems, but anyone can fuzz the ondisk superblock so we should be more cautious. I think the inadequate logic predates commit a6a65fef5ef8d0, but that's clearly going to require a different backport. Cc: stable@vger.kernel.org # v5.14 Fixes: a6a65fef5ef8d0 ("xfs: log stripe roundoff is a property of the log") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-10xfs: ensure dquot item is deleted from AIL only after log shutdownLong Li
In xfs_qm_dqflush(), when a dquot flush fails due to corruption (the out_abort error path), the original code removed the dquot log item from the AIL before calling xfs_force_shutdown(). This ordering introduces a subtle race condition that can lead to data loss after a crash. The AIL tracks the oldest dirty metadata in the journal. The position of the tail item in the AIL determines the log tail LSN, which is the oldest LSN that must be preserved for crash recovery. When an item is removed from the AIL, the log tail can advance past the LSN of that item. The race window is as follows: if the dquot item happens to be at the tail of the log, removing it from the AIL allows the log tail to advance. If a concurrent log write is sampling the tail LSN at the same time and subsequently writes a complete checkpoint (i.e., one containing a commit record) to disk before the shutdown takes effect, the journal will no longer protect the dquot's last modification. On the next mount, log recovery will not replay the dquot changes, even though they were never written back to disk, resulting in silent data loss. Fix this by calling xfs_force_shutdown() before xfs_trans_ail_delete() in the out_abort path. Once the log is shut down, no new log writes can complete with an updated tail LSN, making it safe to remove the dquot item from the AIL. Cc: stable@vger.kernel.org Fixes: b707fffda6a3 ("xfs: abort consistently on dquot flush failure") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-10xfs: remove redundant set null for ip->i_itempLong Li
ip->i_itemp has been set null in xfs_inode_item_destroy(), so there is no need set it null again in xfs_inode_free_callback(). Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-09ceph: do not skip the first folio of the next object in writebackHristo Venev
When `ceph_process_folio_batch` encounters a folio past the end of the current object, it should leave it in the batch so that it is picked up in the next iteration. Removing the folio from the batch means that it does not get written back and remains dirty instead. This makes `fsync()` silently skip some of the data, delays capability release, and breaks coherence with `O_DIRECT`. The link below contains instructions for reproducing the bug. Cc: stable@vger.kernel.org Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method") Link: https://tracker.ceph.com/issues/75156 Signed-off-by: Hristo Venev <hristo@venev.name> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-03-09ceph: fix memory leaks in ceph_mdsc_build_path()Max Kellermann
Add __putname() calls to error code paths that did not free the "path" pointer obtained by __getname(). If ownership of this pointer is not passed to the caller via path_info.path, the function must free it before returning. Cc: stable@vger.kernel.org Fixes: 3fd945a79e14 ("ceph: encode encrypted name in ceph_mdsc_build_path and dentry release") Fixes: 550f7ca98ee0 ("ceph: give up on paths longer than PATH_MAX") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-03-09ceph: add a bunch of missing ceph_path_info initializersMax Kellermann
ceph_mdsc_build_path() must be called with a zero-initialized ceph_path_info parameter, or else the following ceph_mdsc_free_path_info() may crash. Example crash (on Linux 6.18.12): virt_to_cache: Object is not a Slab page! WARNING: CPU: 184 PID: 2871736 at mm/slub.c:6732 kmem_cache_free+0x316/0x400 [...] Call Trace: [...] ceph_open+0x13d/0x3e0 do_dentry_open+0x134/0x480 vfs_open+0x2a/0xe0 path_openat+0x9a3/0x1160 [...] cache_from_obj: Wrong slab cache. names_cache but object is from ceph_inode_info WARNING: CPU: 184 PID: 2871736 at mm/slub.c:6746 kmem_cache_free+0x2dd/0x400 [...] kernel BUG at mm/slub.c:634! Oops: invalid opcode: 0000 [#1] SMP NOPTI RIP: 0010:__slab_free+0x1a4/0x350 Some of the ceph_mdsc_build_path() callers had initializers, but others had not, even though they were all added by commit 15f519e9f883 ("ceph: fix race condition validating r_parent before applying state"). The ones without initializer are suspectible to random crashes. (I can imagine it could even be possible to exploit this bug to elevate privileges.) Unfortunately, these Ceph functions are undocumented and its semantics can only be derived from the code. I see that ceph_mdsc_build_path() initializes the structure only on success, but not on error. Calling ceph_mdsc_free_path_info() after a failed ceph_mdsc_build_path() call does not even make sense, but that's what all callers do, and for it to be safe, the structure must be zero-initialized. The least intrusive approach to fix this is therefore to add initializers everywhere. Cc: stable@vger.kernel.org Fixes: 15f519e9f883 ("ceph: fix race condition validating r_parent before applying state") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-03-09ceph: fix i_nlink underrun during async unlinkMax Kellermann
During async unlink, we drop the `i_nlink` counter before we receive the completion (that will eventually update the `i_nlink`) because "we assume that the unlink will succeed". That is not a bad idea, but it races against deletions by other clients (or against the completion of our own unlink) and can lead to an underrun which emits a WARNING like this one: WARNING: CPU: 85 PID: 25093 at fs/inode.c:407 drop_nlink+0x50/0x68 Modules linked in: CPU: 85 UID: 3221252029 PID: 25093 Comm: php-cgi8.1 Not tainted 6.14.11-cm4all1-ampere #655 Hardware name: Supermicro ARS-110M-NR/R12SPD-A, BIOS 1.1b 10/17/2023 pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : drop_nlink+0x50/0x68 lr : ceph_unlink+0x6c4/0x720 sp : ffff80012173bc90 x29: ffff80012173bc90 x28: ffff086d0a45aaf8 x27: ffff0871d0eb5680 x26: ffff087f2a64a718 x25: 0000020000000180 x24: 0000000061c88647 x23: 0000000000000002 x22: ffff07ff9236d800 x21: 0000000000001203 x20: ffff07ff9237b000 x19: ffff088b8296afc0 x18: 00000000f3c93365 x17: 0000000000070000 x16: ffff08faffcbdfe8 x15: ffff08faffcbdfec x14: 0000000000000000 x13: 45445f65645f3037 x12: 34385f6369706f74 x11: 0000a2653104bb20 x10: ffffd85f26d73290 x9 : ffffd85f25664f94 x8 : 00000000000000c0 x7 : 0000000000000000 x6 : 0000000000000002 x5 : 0000000000000081 x4 : 0000000000000481 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff08727d3f91e8 Call trace: drop_nlink+0x50/0x68 (P) vfs_unlink+0xb0/0x2e8 do_unlinkat+0x204/0x288 __arm64_sys_unlinkat+0x3c/0x80 invoke_syscall.constprop.0+0x54/0xe8 do_el0_svc+0xa4/0xc8 el0_svc+0x18/0x58 el0t_64_sync_handler+0x104/0x130 el0t_64_sync+0x154/0x158 In ceph_unlink(), a call to ceph_mdsc_submit_request() submits the CEPH_MDS_OP_UNLINK to the MDS, but does not wait for completion. Meanwhile, between this call and the following drop_nlink() call, a worker thread may process a CEPH_CAP_OP_IMPORT, CEPH_CAP_OP_GRANT or just a CEPH_MSG_CLIENT_REPLY (the latter of which could be our own completion). These will lead to a set_nlink() call, updating the `i_nlink` counter to the value received from the MDS. If that new `i_nlink` value happens to be zero, it is illegal to decrement it further. But that is exactly what ceph_unlink() will do then. The WARNING can be reproduced this way: 1. Force async unlink; only the async code path is affected. Having no real clue about Ceph internals, I was unable to find out why the MDS wouldn't give me the "Fxr" capabilities, so I patched get_caps_for_async_unlink() to always succeed. (Note that the WARNING dump above was found on an unpatched kernel, without this kludge - this is not a theoretical bug.) 2. Add a sleep call after ceph_mdsc_submit_request() so the unlink completion gets handled by a worker thread before drop_nlink() is called. This guarantees that the `i_nlink` is already zero before drop_nlink() runs. The solution is to skip the counter decrement when it is already zero, but doing so without a lock is still racy (TOCTOU). Since ceph_fill_inode() and handle_cap_grant() both hold the `ceph_inode_info.i_ceph_lock` spinlock while set_nlink() runs, this seems like the proper lock to protect the `i_nlink` updates. I found prior art in NFS and SMB (using `inode.i_lock`) and AFS (using `afs_vnode.cb_lock`). All three have the zero check as well. Cc: stable@vger.kernel.org Fixes: 2ccb45462aea ("ceph: perform asynchronous unlink if we have sufficient caps") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-03-08ksmbd: Don't log keys in SMB3 signing and encryption key generationThorsten Blum
When KSMBD_DEBUG_AUTH logging is enabled, generate_smb3signingkey() and generate_smb3encryptionkey() log the session, signing, encryption, and decryption key bytes. Remove the logs to avoid exposing credentials. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-08smb: server: fix use-after-free in smb2_open()Marios Makassikis
The opinfo pointer obtained via rcu_dereference(fp->f_opinfo) is dereferenced after rcu_read_unlock(), creating a use-after-free window. Cc: stable@vger.kernel.org Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-08ksmbd: fix use-after-free in smb_lazy_parent_lease_break_close()Namjae Jeon
opinfo pointer obtained via rcu_dereference(fp->f_opinfo) is being accessed after rcu_read_unlock() has been called. This creates a race condition where the memory could be freed by a concurrent writer between the unlock and the subsequent pointer dereferences (opinfo->is_lease, etc.), leading to a use-after-free. Fixes: 5fb282ba4fef ("ksmbd: fix possible null-deref in smb_lazy_parent_lease_break_close") Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-08ksmbd: fix use-after-free by using call_rcu() for oplock_infoNamjae Jeon
ksmbd currently frees oplock_info immediately using kfree(), even though it is accessed under RCU read-side critical sections in places like opinfo_get() and proc_show_files(). Since there is no RCU grace period delay between nullifying the pointer and freeing the memory, a reader can still access oplock_info structure after it has been freed. This can leads to a use-after-free especially in opinfo_get() where atomic_inc_not_zero() is called on already freed memory. Fix this by switching to deferred freeing using call_rcu(). Fixes: 18b4fac5ef17 ("ksmbd: fix use-after-free in smb_break_all_levII_oplock()") Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-08ksmbd: fix use-after-free in proc_show_files due to early rcu_read_unlockAli Khaledi
The opinfo pointer obtained via rcu_dereference(fp->f_opinfo) is dereferenced after rcu_read_unlock(), creating a use-after-free window. A concurrent opinfo_put() can free the opinfo between the unlock and the subsequent access to opinfo->is_lease, opinfo->o_lease->state, and opinfo->level. Fix this by deferring rcu_read_unlock() until after all opinfo field accesses are complete. The values needed (const_names, count, level) are copied into local variables under the RCU read lock, and the potentially-sleeping seq_printf calls happen after the lock is released. Found by AI-assisted code review (Claude Opus 4.6, Anthropic) in collaboration with Ali Khaledi. Cc: stable@vger.kernel.org Fixes: b38f99c1217a ("ksmbd: add procfs interface for runtime monitoring and statistics") Signed-off-by: Ali Khaledi <ali.khaledi1989@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-08smb/server: Fix another refcount leak in smb2_open()Guenter Roeck
If ksmbd_override_fsids() fails, we jump to err_out2. At that point, fp is NULL because it hasn't been assigned dh_info.fp yet, so ksmbd_fd_put(work, fp) will not be called. However, dh_info.fp was already inserted into the session file table by ksmbd_reopen_durable_fd(), so it will leak in the session file table until the session is closed. Move fp = dh_info.fp; ahead of the ksmbd_override_fsids() check to fix the problem. Found by an experimental AI code review agent at Google. Fixes: c8efcc786146a ("ksmbd: add support for durable handles v1/v2") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-06rxrpc, afs: Fix missing error pointer check after rxrpc_kernel_lookup_peer()Miaoqian Lin
rxrpc_kernel_lookup_peer() can also return error pointers in addition to NULL, so just checking for NULL is not sufficient. Fix this by: (1) Changing rxrpc_kernel_lookup_peer() to return -ENOMEM rather than NULL on allocation failure. (2) Making the callers in afs use IS_ERR() and PTR_ERR() to pass on the error code returned. Fixes: 72904d7b9bfb ("rxrpc, afs: Allow afs to pin rxrpc_peer objects") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Co-developed-by: David Howells <dhowells@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/368272.1772713861@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-06Merge tag 'v7.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - Fix potential oops on open failure - Fix unmount to better free deferred closes - Use proper constant-time MAC comparison function - Two buffer allocation size fixes - Two minor cleanups - make SMB2 kunit tests a distinct module * tag 'v7.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix oops due to uninitialised var in smb2_unlink() cifs: open files should not hold ref on superblock smb: client: Compare MACs in constant time smb/client: remove unused SMB311_posix_query_info() smb/client: fix buffer size for smb311_posix_qinfo in SMB311_posix_query_info() smb/client: fix buffer size for smb311_posix_qinfo in smb2_compound_op() smb: update some doc references smb/client: make SMB2 maperror KUnit tests a separate module
2026-03-06xfs: fix returned valued from xfs_defer_can_appendCarlos Maiolino
xfs_defer_can_append returns a bool, it shouldn't be returning a NULL. Found by code inspection. Fixes: 4dffb2cbb483 ("xfs: allow pausing of pending deferred work items") Cc: <stable@vger.kernel.org> # v6.8 Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Souptick Joarder <souptick.joarder@hpe.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-05smb: client: fix oops due to uninitialised var in smb2_unlink()Paulo Alcantara
If SMB2_open_init() or SMB2_close_init() fails (e.g. reconnect), the iovs set @rqst will be left uninitialised, hence calling SMB2_open_free(), SMB2_close_free() or smb2_set_related() on them will oops. Fix this by initialising @close_iov and @open_iov before setting them in @rqst. Reported-by: Thiago Becker <tbecker@redhat.com> Fixes: 1cf9f2a6a544 ("smb: client: handle unlink(2) of files open by different clients") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-05Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linuxLinus Torvalds
Pull fsverity fix from Eric Biggers: "Prevent CONFIG_FS_VERITY from being enabled when the page size is 256K, since it doesn't work in that case" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: add dependency on 64K or smaller pages
2026-03-05xfs: Remove redundant NULL check after __GFP_NOFAILhongao
kzalloc() is called with __GFP_NOFAIL, so a NULL return is not expected. Drop the redundant !map check in xfs_dabuf_map(). Also switch the nirecs-sized allocation to kcalloc(). Signed-off-by: hongao <hongao@uniontech.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-04Merge tag 'vfs-7.0-rc3.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - kthread: consolidate kthread exit paths to prevent use-after-free - iomap: - don't mark folio uptodate if read IO has bytes pending - don't report direct-io retries to fserror - reject delalloc mappings during writeback - ns: tighten visibility checks - netfs: Fix unbuffered/DIO writes to dispatch subrequests in strict sequence * tag 'vfs-7.0-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: reject delalloc mappings during writeback iomap: don't mark folio uptodate if read IO has bytes pending selftests: fix mntns iteration selftests nstree: tighten permission checks for listing nsfs: tighten permission checks for handle opening nsfs: tighten permission checks for ns iteration ioctls netfs: Fix unbuffered/DIO writes to dispatch subrequests in strict sequence kthread: consolidate kthread exit paths to prevent use-after-free iomap: don't report direct-io retries to fserror
2026-03-04cifs: open files should not hold ref on superblockShyam Prasad N
Today whenever we deal with a file, in addition to holding a reference on the dentry, we also get a reference on the superblock. This happens in two cases: 1. when a new cinode is allocated 2. when an oplock break is being processed The reasoning for holding the superblock ref was to make sure that when umount happens, if there are users of inodes and dentries, it does not try to clean them up and wait for the last ref to superblock to be dropped by last of such users. But the side effect of doing that is that umount silently drops a ref on the superblock and we could have deferred closes and lease breaks still holding these refs. Ideally, we should ensure that all of these users of inodes and dentries are cleaned up at the time of umount, which is what this code is doing. This code change allows these code paths to use a ref on the dentry (and hence the inode). That way, umount is ensured to clean up SMB client resources when it's the last ref on the superblock (For ex: when same objects are shared). The code change also moves the call to close all the files in deferred close list to the umount code path. It also waits for oplock_break workers to be flushed before calling kill_anon_super (which eventually frees up those objects). Fixes: 24261fc23db9 ("cifs: delay super block destruction until all cifsFileInfo objects are gone") Fixes: 705c79101ccf ("smb: client: fix use-after-free in cifs_oplock_break") Cc: <stable@vger.kernel.org> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-04iomap: reject delalloc mappings during writebackDarrick J. Wong
Filesystems should never provide a delayed allocation mapping to writeback; they're supposed to allocate the space before replying. This can lead to weird IO errors and crashes in the block layer if the filesystem is being malicious, or if it hadn't set iomap->dev because it's a delalloc mapping. Fix this by failing writeback on delalloc mappings. Currently no filesystems actually misbehave in this manner, but we ought to be stricter about things like that. Cc: stable@vger.kernel.org # v5.5 Fixes: 598ecfbaa742ac ("iomap: lift the xfs writeback code to iomap") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/20260302173002.GL13829@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-04iomap: don't mark folio uptodate if read IO has bytes pendingJoanne Koong
If a folio has ifs metadata attached to it and the folio is partially read in through an async IO helper with the rest of it then being read in through post-EOF zeroing or as inline data, and the helper successfully finishes the read first, then post-EOF zeroing / reading inline will mark the folio as uptodate in iomap_set_range_uptodate(). This is a problem because when the read completion path later calls iomap_read_end(), it will call folio_end_read(), which sets the uptodate bit using XOR semantics. Calling folio_end_read() on a folio that was already marked uptodate clears the uptodate bit. Fix this by not marking the folio as uptodate if the read IO has bytes pending. The folio uptodate state will be set in the read completion path through iomap_end_read() -> folio_end_read(). Reported-by: Wei Gao <wegao@suse.com> Suggested-by: Sasha Levin <sashal@kernel.org> Tested-by: Wei Gao <wegao@suse.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Cc: stable@vger.kernel.org # v6.19 Link: https://lore.kernel.org/linux-fsdevel/aYbmy8JdgXwsGaPP@autotest-wegao.qe.prg2.suse.org/ Fixes: b2f35ac4146d ("iomap: add caller-provided callbacks for read and readahead") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260303233420.874231-2-joannelkoong@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-04xfs: fix race between healthmon unmount and read_iterDarrick J. Wong
xfs/1879 on one of my test VMs got stuck due to the xfs_io healthmon subcommand sleeping in wait_event_interruptible at: xfs_healthmon_read_iter+0x558/0x5f8 [xfs] vfs_read+0x248/0x320 ksys_read+0x78/0x120 Looking at xfs_healthmon_read_iter, in !O_NONBLOCK mode it will sleep until the mount cookie == DETACHED_MOUNT_COOKIE, there are events waiting to be formatted, or there are formatted events in the read buffer that could be copied to userspace. Poking into the running kernel, I see that there are zero events in the list, the read buffer is empty, and the mount cookie is indeed in DETACHED state. IOWs, xfs_healthmon_has_eventdata should have returned true, but instead we're asleep waiting for a wakeup. I think what happened here is that xfs_healthmon_read_iter and xfs_healthmon_unmount were racing with each other, and _read_iter lost the race. _unmount queued an unmount event, which woke up _read_iter. It found, formatted, and copied the event out to userspace. That cleared out the pending event list and emptied the read buffer. xfs_io then called read() again, so _has_eventdata decided that we should sleep on the empty event queue. Next, _unmount called xfs_healthmon_detach, which set the mount cookie to DETACHED. Unfortunately, it didn't call wake_up_all on the hm, so the wait_event_interruptible in the _read_iter thread remains asleep. That's why the test stalled. Fix this by moving the wake_up_all call to xfs_healthmon_detach. Fixes: b3a289a2a9397b ("xfs: create event queuing, formatting, and discovery infrastructure") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-04xfs: remove scratch field from struct xfs_gc_bioDamien Le Moal
The scratch field in struct xfs_gc_bio is unused. Remove it. Fixes: 102f444b57b3 ("xfs: rework zone GC buffer management") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-03smb: client: Compare MACs in constant timeEric Biggers
To prevent timing attacks, MAC comparisons need to be constant-time. Replace the memcmp() with the correct function, crypto_memneq(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-03smb/client: remove unused SMB311_posix_query_info()ZhangGuoDong
It is currently unused, as now we are doing compounding instead (see smb2_query_path_info()). Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-03smb/client: fix buffer size for smb311_posix_qinfo in SMB311_posix_query_info()ZhangGuoDong
SMB311_posix_query_info() is currently unused, but it may still be used in some stable versions, so these changes are submitted as a separate patch. Use `sizeof(struct smb311_posix_qinfo)` instead of sizeof its pointer, so the allocated buffer matches the actual struct size. Fixes: b1bc1874b885 ("smb311: Add support for SMB311 query info (non-compounded)") Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-03smb/client: fix buffer size for smb311_posix_qinfo in smb2_compound_op()ZhangGuoDong
Use `sizeof(struct smb311_posix_qinfo)` instead of sizeof its pointer, so the allocated buffer matches the actual struct size. Fixes: 6a5f6592a0b6 ("SMB311: Add support for query info using posix extensions (level 100)") Reported-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-03Merge tag 'for-7.0-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "One-liner or short fixes for minor/moderate problems reported recently: - fixes or level adjustments of error messages - fix leaked transaction handles after aborted transactions, when using the remap tree feature - fix a few leaked chunk maps after errors - fix leaked page array in io_uring encoded read if an error occurs and the 'finished' is not called - fix double release of reserved extents when doing a range COW - don't commit super block when the filesystem is in shutdown state - fix squota accounting condition when checking members vs parent usage - other error handling fixes" * tag 'for-7.0-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: check block group lookup in remove_range_from_remap_tree() btrfs: fix transaction handle leaks in btrfs_last_identity_remap_gone() btrfs: fix chunk map leak in btrfs_map_block() after btrfs_translate_remap() btrfs: fix chunk map leak in btrfs_map_block() after btrfs_chunk_map_num_copies() btrfs: fix compat mask in error messages in btrfs_check_features() btrfs: print correct subvol num if active swapfile prevents deletion btrfs: fix warning in scrub_verify_one_metadata() btrfs: fix objectid value in error message in check_extent_data_ref() btrfs: fix incorrect key offset in error message in check_dev_extent_item() btrfs: fix error message order of parameters in btrfs_delete_delayed_dir_index() btrfs: don't commit the super block when unmounting a shutdown filesystem btrfs: free pages on error in btrfs_uring_read_extent() btrfs: fix referenced/exclusive check in squota_check_parent_usage() btrfs: remove pointless WARN_ON() in cache_save_setup() btrfs: convert log messages to error level in btrfs_replay_log() btrfs: remove btrfs_handle_fs_error() after failure to recover log trees btrfs: remove redundant warning message in btrfs_check_uuid_tree() btrfs: change warning messages to error level in open_ctree() btrfs: fix a double release on reserved extents in cow_one_range() btrfs: handle discard errors in in btrfs_finish_extent_commit()
2026-03-03btrfs: remove duplicated definition of btrfs_printk_in_rcu()Filipe Manana
It's defined twice in a row for the !CONFIG_PRINTK case, so remove one of the duplicates. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-03btrfs: remove unnecessary transaction abort in the received subvol ioctlFilipe Manana
If we fail to remove an item from the uuid tree, we don't need to abort the transaction since we have not done any change before. So remove that transaction abort. Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-03btrfs: abort transaction on failure to update root in the received subvol ioctlFilipe Manana
If we failed to update the root we don't abort the transaction, which is wrong since we already used the transaction to remove an item from the uuid tree. Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree") CC: stable@vger.kernel.org # 3.12+ Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-03btrfs: fix transaction abort on set received ioctl due to item overflowFilipe Manana
If the set received ioctl fails due to an item overflow when attempting to add the BTRFS_UUID_KEY_RECEIVED_SUBVOL we have to abort the transaction since we did some metadata updates before. This means that if a user calls this ioctl with the same received UUID field for a lot of subvolumes, we will hit the overflow, trigger the transaction abort and turn the filesystem into RO mode. A malicious user could exploit this, and this ioctl does not even requires that a user has admin privileges (CAP_SYS_ADMIN), only that he/she owns the subvolume. Fix this by doing an early check for item overflow before starting a transaction. This is also race safe because we are holding the subvol_sem semaphore in exclusive (write) mode. A test case for fstests will follow soon. Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree") CC: stable@vger.kernel.org # 3.12+ Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-03btrfs: fix transaction abort when snapshotting received subvolumesFilipe Manana
Currently a user can trigger a transaction abort by snapshotting a previously received snapshot a bunch of times until we reach a BTRFS_UUID_KEY_RECEIVED_SUBVOL item overflow (the maximum item size we can store in a leaf). This is very likely not common in practice, but if it happens, it turns the filesystem into RO mode. The snapshot, send and set_received_subvol and subvol_setflags (used by receive) don't require CAP_SYS_ADMIN, just inode_owner_or_capable(). A malicious user could use this to turn a filesystem into RO mode and disrupt a system. Reproducer script: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi # Use smallest node size to make the test faster. mkfs.btrfs -f --nodesize 4K $DEV mount $DEV $MNT # Create a subvolume and set it to RO so that it can be used for send. btrfs subvolume create $MNT/sv touch $MNT/sv/foo btrfs property set $MNT/sv ro true # Send and receive the subvolume into snaps/sv. mkdir $MNT/snaps btrfs send $MNT/sv | btrfs receive $MNT/snaps # Now snapshot the received subvolume, which has a received_uuid, a # lot of times to trigger the leaf overflow. total=500 for ((i = 1; i <= $total; i++)); do echo -ne "\rCreating snapshot $i/$total" btrfs subvolume snapshot -r $MNT/snaps/sv $MNT/snaps/sv_$i > /dev/null done echo umount $MNT When running the test: $ ./test.sh (...) Create subvolume '/mnt/sdi/sv' At subvol /mnt/sdi/sv At subvol sv Creating snapshot 496/500ERROR: Could not create subvolume: Value too large for defined data type Creating snapshot 497/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 498/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 499/500ERROR: Could not create subvolume: Read-only file system Creating snapshot 500/500ERROR: Could not create subvolume: Read-only file system And in dmesg/syslog: $ dmesg (...) [251067.627338] BTRFS warning (device sdi): insert uuid item failed -75 (0x4628b21c4ac8d898, 0x2598bee2b1515c91) type 252! [251067.629212] ------------[ cut here ]------------ [251067.630033] BTRFS: Transaction aborted (error -75) [251067.630871] WARNING: fs/btrfs/transaction.c:1907 at create_pending_snapshot.cold+0x52/0x465 [btrfs], CPU#10: btrfs/615235 [251067.632851] Modules linked in: btrfs dm_zero (...) [251067.644071] CPU: 10 UID: 0 PID: 615235 Comm: btrfs Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full) [251067.646165] Tainted: [W]=WARN [251067.646733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [251067.648735] RIP: 0010:create_pending_snapshot.cold+0x55/0x465 [btrfs] [251067.649984] Code: f0 48 0f (...) [251067.653313] RSP: 0018:ffffce644908fae8 EFLAGS: 00010292 [251067.653987] RAX: 00000000ffffff01 RBX: ffff8e5639e63a80 RCX: 00000000ffffffd3 [251067.655042] RDX: ffff8e53faa76b00 RSI: 00000000ffffffb5 RDI: ffffffffc0919750 [251067.656077] RBP: ffffce644908fbd8 R08: 0000000000000000 R09: ffffce644908f820 [251067.657068] R10: ffff8e5adc1fffa8 R11: 0000000000000003 R12: ffff8e53c0431bd0 [251067.658050] R13: ffff8e5414593600 R14: ffff8e55efafd000 R15: 00000000ffffffb5 [251067.659019] FS: 00007f2a4944b3c0(0000) GS:ffff8e5b27dae000(0000) knlGS:0000000000000000 [251067.660115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [251067.660943] CR2: 00007ffc5aa57898 CR3: 00000005813a2003 CR4: 0000000000370ef0 [251067.661972] Call Trace: [251067.662292] <TASK> [251067.662653] create_pending_snapshots+0x97/0xc0 [btrfs] [251067.663413] btrfs_commit_transaction+0x26e/0xc00 [btrfs] [251067.664257] ? btrfs_qgroup_convert_reserved_meta+0x35/0x390 [btrfs] [251067.665238] ? _raw_spin_unlock+0x15/0x30 [251067.665837] ? record_root_in_trans+0xa2/0xd0 [btrfs] [251067.666531] btrfs_mksubvol+0x330/0x580 [btrfs] [251067.667145] btrfs_mksnapshot+0x74/0xa0 [btrfs] [251067.667827] __btrfs_ioctl_snap_create+0x194/0x1d0 [btrfs] [251067.668595] btrfs_ioctl_snap_create_v2+0x107/0x130 [btrfs] [251067.669479] btrfs_ioctl+0x1580/0x2690 [btrfs] [251067.670093] ? count_memcg_events+0x6d/0x180 [251067.670849] ? handle_mm_fault+0x1a0/0x2a0 [251067.671652] __x64_sys_ioctl+0x92/0xe0 [251067.672406] do_syscall_64+0x50/0xf20 [251067.673129] entry_SYSCALL_64_after_hwframe+0x76/0x7e [251067.674096] RIP: 0033:0x7f2a495648db [251067.674812] Code: 00 48 89 (...) [251067.678227] RSP: 002b:00007ffc5aa57840 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [251067.679691] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2a495648db [251067.681145] RDX: 00007ffc5aa588b0 RSI: 0000000050009417 RDI: 0000000000000004 [251067.682511] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 [251067.683842] R10: 000000000000000a R11: 0000000000000246 R12: 00007ffc5aa59910 [251067.685176] R13: 00007ffc5aa588b0 R14: 0000000000000004 R15: 0000000000000006 [251067.686524] </TASK> [251067.686972] ---[ end trace 0000000000000000 ]--- [251067.687890] BTRFS: error (device sdi state A) in create_pending_snapshot:1907: errno=-75 unknown [251067.689049] BTRFS info (device sdi state EA): forced readonly [251067.689054] BTRFS warning (device sdi state EA): Skipping commit of aborted transaction. [251067.690119] BTRFS: error (device sdi state EA) in cleanup_transaction:2043: errno=-75 unknown [251067.702028] BTRFS info (device sdi state EA): last unmount of filesystem 46dc3975-30a2-4a69-a18f-418b859cccda Fix this by ignoring -EOVERFLOW errors from btrfs_uuid_tree_add() in the snapshot creation code when attempting to add the BTRFS_UUID_KEY_RECEIVED_SUBVOL item. This is OK because it's not critical and we are still able to delete the snapshot, as snapshot/subvolume deletion ignores if a BTRFS_UUID_KEY_RECEIVED_SUBVOL is missing (see inode.c:btrfs_delete_subvolume()). As for send/receive, we can still do send/receive operations since it always peeks the first root ID in the existing BTRFS_UUID_KEY_RECEIVED_SUBVOL (it could peek any since all snapshots have the same content), and even if the key is missing, it falls back to searching by BTRFS_UUID_KEY_SUBVOL key. A test case for fstests will be sent soon. Fixes: dd5f9615fc5c ("Btrfs: maintain subvolume items in the UUID tree") CC: stable@vger.kernel.org # 3.12+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-03btrfs: fix transaction abort on file creation due to name hash collisionFilipe Manana
If we attempt to create several files with names that result in the same hash, we have to pack them in same dir item and that has a limit inherent to the leaf size. However if we reach that limit, we trigger a transaction abort and turns the filesystem into RO mode. This allows for a malicious user to disrupt a system, without the need to have administration privileges/capabilities. Reproducer: $ cat exploit-hash-collisions.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi # Use smallest node size to make the test faster and require fewer file # names that result in hash collision. mkfs.btrfs -f --nodesize 4K $DEV mount $DEV $MNT # List of names that result in the same crc32c hash for btrfs. declare -a names=( 'foobar' '%a8tYkxfGMLWRGr55QSeQc4PBNH9PCLIvR6jZnkDtUUru1t@RouaUe_L:@xGkbO3nCwvLNYeK9vhE628gss:T$yZjZ5l-Nbd6CbC$M=hqE-ujhJICXyIxBvYrIU9-TDC' 'AQci3EUB%shMsg-N%frgU:02ByLs=IPJU0OpgiWit5nexSyxZDncY6WB:=zKZuk5Zy0DD$Ua78%MelgBuMqaHGyKsJUFf9s=UW80PcJmKctb46KveLSiUtNmqrMiL9-Y0I_l5Fnam04CGIg=8@U:Z' 'CvVqJpJzueKcuA$wqwePfyu7VxuWNN3ho$p0zi2H8QFYK$7YlEqOhhb%:hHgjhIjW5vnqWHKNP4' 'ET:vk@rFU4tsvMB0$C_p=xQHaYZjvoF%-BTc%wkFW8yaDAPcCYoR%x$FH5O:' 'HwTon%v7SGSP4FE08jBwwiu5aot2CFKXHTeEAa@38fUcNGOWvE@Mz6WBeDH_VooaZ6AgsXPkVGwy9l@@ZbNXabUU9csiWrrOp0MWUdfi$EZ3w9GkIqtz7I_eOsByOkBOO' 'Ij%2VlFGXSuPvxJGf5UWy6O@1svxGha%b@=%wjkq:CIgE6u7eJOjmQY5qTtxE2Rjbis9@us' 'KBkjG5%9R8K9sOG8UTnAYjxLNAvBmvV5vz3IiZaPmKuLYO03-6asI9lJ_j4@6Xo$KZicaLWJ3Pv8XEwVeUPMwbHYWwbx0pYvNlGMO9F:ZhHAwyctnGy%_eujl%WPd4U2BI7qooOSr85J-C2V$LfY' 'NcRfDfuUQ2=zP8K3CCF5dFcpfiOm6mwenShsAb_F%n6GAGC7fT2JFFn:c35X-3aYwoq7jNX5$ZJ6hI3wnZs$7KgGi7wjulffhHNUxAT0fRRLF39vJ@NvaEMxsMO' 'Oj42AQAEzRoTxa5OuSKIr=A_lwGMy132v4g3Pdq1GvUG9874YseIFQ6QU' 'Ono7avN5GjC:_6dBJ_' 'WHmN2gnmaN-9dVDy4aWo:yNGFzz8qsJyJhWEWcud7$QzN2D9R0efIWWEdu5kwWr73NZm4=@CoCDxrrZnRITr-kGtU_cfW2:%2_am' 'WiFnuTEhAG9FEC6zopQmj-A-$LDQ0T3WULz%ox3UZAPybSV6v1Z$b4L_XBi4M4BMBtJZpz93r9xafpB77r:lbwvitWRyo$odnAUYlYMmU4RvgnNd--e=I5hiEjGLETTtaScWlQp8mYsBovZwM2k' 'XKyH=OsOAF3p%uziGF_ZVr$ivrvhVgD@1u%5RtrV-gl_vqAwHkK@x7YwlxX3qT6WKKQ%PR56NrUBU2dOAOAdzr2=5nJuKPM-T-$ZpQfCL7phxQbUcb:BZOTPaFExc-qK-gDRCDW2' 'd3uUR6OFEwZr%ns1XH_@tbxA@cCPmbBRLdyh7p6V45H$P2$F%w0RqrD3M0g8aGvWpoTFMiBdOTJXjD:JF7=h9a_43xBywYAP%r$SPZi%zDg%ql-KvkdUCtF9OLaQlxmd' 'ePTpbnit%hyNm@WELlpKzNZYOzOTf8EQ$sEfkMy1VOfIUu3coyvIr13-Y7Sv5v-Ivax2Go_GQRFMU1b3362nktT9WOJf3SpT%z8sZmM3gvYQBDgmKI%%RM-G7hyrhgYflOw%z::ZRcv5O:lDCFm' 'evqk743Y@dvZAiG5J05L_ROFV@$2%rVWJ2%3nxV72-W7$e$-SK3tuSHA2mBt$qloC5jwNx33GmQUjD%akhBPu=VJ5g$xhlZiaFtTrjeeM5x7dt4cHpX0cZkmfImndYzGmvwQG:$euFYmXn$_2rA9mKZ' 'gkgUtnihWXsZQTEkrMAWIxir09k3t7jk_IK25t1:cy1XWN0GGqC%FrySdcmU7M8MuPO_ppkLw3=Dfr0UuBAL4%GFk2$Ma10V1jDRGJje%Xx9EV2ERaWKtjpwiZwh0gCSJsj5UL7CR8RtW5opCVFKGGy8Cky' 'hNgsG_8lNRik3PvphqPm0yEH3P%%fYG:kQLY=6O-61Wa6nrV_WVGR6TLB09vHOv%g4VQRP8Gzx7VXUY1qvZyS' 'isA7JVzN12xCxVPJZ_qoLm-pTBuhjjHMvV7o=F:EaClfYNyFGlsfw-Kf%uxdqW-kwk1sPl2vhbjyHU1A6$hz' 'kiJ_fgcdZFDiOptjgH5PN9-PSyLO4fbk_:u5_2tz35lV_iXiJ6cx7pwjTtKy-XGaQ5IefmpJ4N_ZqGsqCsKuqOOBgf9LkUdffHet@Wu' 'lvwtxyhE9:%Q3UxeHiViUyNzJsy:fm38pg_b6s25JvdhOAT=1s0$pG25x=LZ2rlHTszj=gN6M4zHZYr_qrB49i=pA--@WqWLIuX7o1S_SfS@2FSiUZN' 'rC24cw3UBDZ=5qJBUMs9e$=S4Y94ni%Z8639vnrGp=0Hv4z3dNFL0fBLmQ40=EYIY:Z=SLc@QLMSt2zsss2ZXrP7j4=' 'uwGl2s-fFrf@GqS=DQqq2I0LJSsOmM%xzTjS:lzXguE3wChdMoHYtLRKPvfaPOZF2fER@j53evbKa7R%A7r4%YEkD=kicJe@SFiGtXHbKe4gCgPAYbnVn' 'UG37U6KKua2bgc:IHzRs7BnB6FD:2Mt5Cc5NdlsW%$1tyvnfz7S27FvNkroXwAW:mBZLA1@qa9WnDbHCDmQmfPMC9z-Eq6QT0jhhPpqyymaD:R02ghwYo%yx7SAaaq-:x33LYpei$5g8DMl3C' 'y2vjek0FE1PDJC0qpfnN:x8k2wCFZ9xiUF2ege=JnP98R%wxjKkdfEiLWvQzmnW' '8-HCSgH5B%K7P8_jaVtQhBXpBk:pE-$P7ts58U0J@iR9YZntMPl7j$s62yAJO@_9eanFPS54b=UTw$94C-t=HLxT8n6o9P=QnIxq-f1=Ne2dvhe6WbjEQtc' 'YPPh:IFt2mtR6XWSmjHptXL_hbSYu8bMw-JP8@PNyaFkdNFsk$M=xfL6LDKCDM-mSyGA_2MBwZ8Dr4=R1D%7-mCaaKGxb990jzaagRktDTyp' '9hD2ApKa_t_7x-a@GCG28kY:7$M@5udI1myQ$x5udtggvagmCQcq9QXWRC5hoB0o-_zHQUqZI5rMcz_kbMgvN5jr63LeYA4Cj-c6F5Ugmx6DgVf@2Jqm%MafecpgooqreJ53P-QTS' ) # Now create files with all those names in the same parent directory. # It should not fail since a 4K leaf has enough space for them. for name in "${names[@]}"; do touch $MNT/$name done # Now add one more file name that causes a crc32c hash collision. # This should fail, but it should not turn the filesystem into RO mode # (which could be exploited by malicious users) due to a transaction # abort. touch $MNT/'W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt' # Check that we are able to create another file, with a name that does not cause # a crc32c hash collision. echo -n "hello world" > $MNT/baz # Unmount and mount again, verify file baz exists and with the right content. umount $MNT mount $DEV $MNT echo "File baz content: $(cat $MNT/baz)" umount $MNT When running the reproducer: $ ./exploit-hash-collisions.sh (...) touch: cannot touch '/mnt/sdi/W6tIm-VK2@BGC@IBfcgg6j_p:pxp_QUqtWpGD5Ok_GmijKOJJt': Value too large for defined data type ./exploit-hash-collisions.sh: line 57: /mnt/sdi/baz: Read-only file system cat: /mnt/sdi/baz: No such file or directory File baz content: And the transaction abort stack trace in dmesg/syslog: $ dmesg (...) [758240.509761] ------------[ cut here ]------------ [758240.510668] BTRFS: Transaction aborted (error -75) [758240.511577] WARNING: fs/btrfs/inode.c:6854 at btrfs_create_new_inode+0x805/0xb50 [btrfs], CPU#6: touch/888644 [758240.513513] Modules linked in: btrfs dm_zero (...) [758240.523221] CPU: 6 UID: 0 PID: 888644 Comm: touch Tainted: G W 6.19.0-rc8-btrfs-next-225+ #1 PREEMPT(full) [758240.524621] Tainted: [W]=WARN [758240.525037] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [758240.526331] RIP: 0010:btrfs_create_new_inode+0x80b/0xb50 [btrfs] [758240.527093] Code: 0f 82 cf (...) [758240.529211] RSP: 0018:ffffce64418fbb48 EFLAGS: 00010292 [758240.529935] RAX: 00000000ffffffd3 RBX: 0000000000000000 RCX: 00000000ffffffb5 [758240.531040] RDX: 0000000d04f33e06 RSI: 00000000ffffffb5 RDI: ffffffffc0919dd0 [758240.531920] RBP: ffffce64418fbc10 R08: 0000000000000000 R09: 00000000ffffffb5 [758240.532928] R10: 0000000000000000 R11: ffff8e52c0000000 R12: ffff8e53eee7d0f0 [758240.533818] R13: ffff8e57f70932a0 R14: ffff8e5417629568 R15: 0000000000000000 [758240.534664] FS: 00007f1959a2a740(0000) GS:ffff8e5b27cae000(0000) knlGS:0000000000000000 [758240.535821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [758240.536644] CR2: 00007f1959b10ce0 CR3: 000000012a2cc005 CR4: 0000000000370ef0 [758240.537517] Call Trace: [758240.537828] <TASK> [758240.538099] btrfs_create_common+0xbf/0x140 [btrfs] [758240.538760] path_openat+0x111a/0x15b0 [758240.539252] do_filp_open+0xc2/0x170 [758240.539699] ? preempt_count_add+0x47/0xa0 [758240.540200] ? __virt_addr_valid+0xe4/0x1a0 [758240.540800] ? __check_object_size+0x1b3/0x230 [758240.541661] ? alloc_fd+0x118/0x180 [758240.542315] do_sys_openat2+0x70/0xd0 [758240.543012] __x64_sys_openat+0x50/0xa0 [758240.543723] do_syscall_64+0x50/0xf20 [758240.544462] entry_SYSCALL_64_after_hwframe+0x76/0x7e [758240.545397] RIP: 0033:0x7f1959abc687 [758240.546019] Code: 48 89 fa (...) [758240.548522] RSP: 002b:00007ffe16ff8690 EFLAGS: 00000202 ORIG_RAX: 0000000000000101 [758240.566278] RAX: ffffffffffffffda RBX: 00007f1959a2a740 RCX: 00007f1959abc687 [758240.567068] RDX: 0000000000000941 RSI: 00007ffe16ffa333 RDI: ffffffffffffff9c [758240.567860] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [758240.568707] R10: 00000000000001b6 R11: 0000000000000202 R12: 0000561eec7c4b90 [758240.569712] R13: 0000561eec7c311f R14: 00007ffe16ffa333 R15: 0000000000000000 [758240.570758] </TASK> [758240.571040] ---[ end trace 0000000000000000 ]--- [758240.571681] BTRFS: error (device sdi state A) in btrfs_create_new_inode:6854: errno=-75 unknown [758240.572899] BTRFS info (device sdi state EA): forced readonly Fix this by checking for hash collision, and if the adding a new name is possible, early in btrfs_create_new_inode() before we do any tree updates, so that we don't need to abort the transaction if we cannot add the new name due to the leaf size limit. A test case for fstests will be sent soon. Fixes: caae78e03234 ("btrfs: move common inode creation code into btrfs_create_new_inode()") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-03btrfs: read key again after incrementing slot in move_existing_remaps()Mark Harmstone
Fix move_existing_remaps() so that if we increment the slot because the key we encounter isn't a REMAP_BACKREF, we don't reuse the objectid and offset of the old item. Link: https://lore.kernel.org/linux-btrfs/20260125123908.2096548-1-clm@meta.com/ Reported-by: Chris Mason <clm@fb.com> Fixes: bbea42dfb91f ("btrfs: move existing remaps before relocating block group") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-03btrfs: add missing RCU unlock in error path in ↵Bart Van Assche
try_release_subpage_extent_buffer() Call rcu_read_lock() before exiting the loop in try_release_subpage_extent_buffer() because there is a rcu_read_unlock() call past the loop. This has been detected by the Clang thread-safety analyzer. Fixes: ad580dfa388f ("btrfs: fix subpage deadlock in try_release_subpage_extent_buffer()") CC: stable@vger.kernel.org # 6.18+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-03btrfs: set BTRFS_ROOT_ORPHAN_CLEANUP during subvol createBoris Burkov
We have recently observed a number of subvolumes with broken dentries. ls-ing the parent dir looks like: drwxrwxrwt 1 root root 16 Jan 23 16:49 . drwxr-xr-x 1 root root 24 Jan 23 16:48 .. d????????? ? ? ? ? ? broken_subvol and similarly stat-ing the file fails. In this state, deleting the subvol fails with ENOENT, but attempting to create a new file or subvol over it errors out with EEXIST and even aborts the fs. Which leaves us a bit stuck. dmesg contains a single notable error message reading: "could not do orphan cleanup -2" 2 is ENOENT and the error comes from the failure handling path of btrfs_orphan_cleanup(), with the stack leading back up to btrfs_lookup(). btrfs_lookup btrfs_lookup_dentry btrfs_orphan_cleanup // prints that message and returns -ENOENT After some detailed inspection of the internal state, it became clear that: - there are no orphan items for the subvol - the subvol is otherwise healthy looking, it is not half-deleted or anything, there is no drop progress, etc. - the subvol was created a while ago and does the meaningful first btrfs_orphan_cleanup() call that sets BTRFS_ROOT_ORPHAN_CLEANUP much later. - after btrfs_orphan_cleanup() fails, btrfs_lookup_dentry() returns -ENOENT, which results in a negative dentry for the subvolume via d_splice_alias(NULL, dentry), leading to the observed behavior. The bug can be mitigated by dropping the dentry cache, at which point we can successfully delete the subvolume if we want. i.e., btrfs_lookup() btrfs_lookup_dentry() if (!sb_rdonly(inode->vfs_inode)->vfs_inode) btrfs_orphan_cleanup(sub_root) test_and_set_bit(BTRFS_ROOT_ORPHAN_CLEANUP) btrfs_search_slot() // finds orphan item for inode N ... prints "could not do orphan cleanup -2" if (inode == ERR_PTR(-ENOENT)) inode = NULL; return d_splice_alias(NULL, dentry) // NEGATIVE DENTRY for valid subvolume btrfs_orphan_cleanup() does test_and_set_bit(BTRFS_ROOT_ORPHAN_CLEANUP) on the root when it runs, so it cannot run more than once on a given root, so something else must run concurrently. However, the obvious routes to deleting an orphan when nlinks goes to 0 should not be able to run without first doing a lookup into the subvolume, which should run btrfs_orphan_cleanup() and set the bit. The final important observation is that create_subvol() calls d_instantiate_new() but does not set BTRFS_ROOT_ORPHAN_CLEANUP, so if the dentry cache gets dropped, the next lookup into the subvolume will make a real call into btrfs_orphan_cleanup() for the first time. This opens up the possibility of concurrently deleting the inode/orphan items but most typical evict() paths will be holding a reference on the parent dentry (child dentry holds parent->d_lockref.count via dget in d_alloc(), released in __dentry_kill()) and prevent the parent from being removed from the dentry cache. The one exception is delayed iputs. Ordered extent creation calls igrab() on the inode. If the file is unlinked and closed while those refs are held, iput() in __dentry_kill() decrements i_count but does not trigger eviction (i_count > 0). The child dentry is freed and the subvol dentry's d_lockref.count drops to 0, making it evictable while the inode is still alive. Since there are two races (the race between writeback and unlink and the race between lookup and delayed iputs), and there are too many moving parts, the following three diagrams show the complete picture. (Only the second and third are races) Phase 1: Create Subvol in dentry cache without BTRFS_ROOT_ORPHAN_CLEANUP set btrfs_mksubvol() lookup_one_len() __lookup_slow() d_alloc_parallel() __d_alloc() // d_lockref.count = 1 create_subvol(dentry) // doesn't touch the bit.. d_instantiate_new(dentry, inode) // dentry in cache with d_lockref.count == 1 Phase 2: Create a delayed iput for a file in the subvol but leave the subvol in state where its dentry can be evicted (d_lockref.count == 0) T1 (task) T2 (writeback) T3 (OE workqueue) write() // dirty pages btrfs_writepages() btrfs_run_delalloc_range() cow_file_range() btrfs_alloc_ordered_extent() igrab() // i_count: 1 -> 2 btrfs_unlink_inode() btrfs_orphan_add() close() __fput() dput() finish_dput() __dentry_kill() dentry_unlink_inode() iput() // 2 -> 1 --parent->d_lockref.count // 1 -> 0; evictable finish_ordered_fn() btrfs_finish_ordered_io() btrfs_put_ordered_extent() btrfs_add_delayed_iput() Phase 3: Once the delayed iput is pending and the subvol dentry is evictable, the shrinker can free it, causing the next lookup to go through btrfs_lookup() and call btrfs_orphan_cleanup() for the first time. If the cleaner kthread processes the delayed iput concurrently, the two race: T1 (shrinker) T2 (cleaner kthread) T3 (lookup) super_cache_scan() prune_dcache_sb() __dentry_kill() // subvol dentry freed btrfs_run_delayed_iputs() iput() // i_count -> 0 evict() // sets I_FREEING btrfs_evict_inode() // truncation loop btrfs_lookup() btrfs_lookup_dentry() btrfs_orphan_cleanup() // first call (bit never set) btrfs_iget() // blocks on I_FREEING btrfs_orphan_del() // inode freed // returns -ENOENT btrfs_del_orphan_item() // -ENOENT // "could not do orphan cleanup -2" d_splice_alias(NULL, dentry) // negative dentry for valid subvol The most straightforward fix is to ensure the invariant that a dentry for a subvolume can exist if and only if that subvolume has BTRFS_ROOT_ORPHAN_CLEANUP set on its root (and is known to have no orphans or ran btrfs_orphan_cleanup()). Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-03btrfs: zoned: move btrfs_zoned_reserve_data_reloc_bg() after kthread startJohannes Thumshirn
btrfs_zoned_reserve_data_reloc_bg() is called on each mount of a file system and allocates a new block-group, to assign it to be the dedicated relocation target, if no pre-existing usable block-group for this task is found. If for some reason the transaction is aborted, btrfs_end_transaction() will wake up the transaction kthread. But the transaction kthread is not yet initialized at the time btrfs_zoned_reserve_data_reloc_bg() is called, leading to the following NULL-pointer dereference: RSP: 0018:ffffc9000c617c98 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 000000000000073c RCX: 0000000000000002 RDX: 0000000000000001 RSI: 0000000000000003 RDI: 0000000000000001 RBP: 0000000000000207 R08: ffffffff8223c71d R09: 0000000000000635 R10: ffff888108588000 R11: 0000000000000003 R12: 0000000000000003 R13: 000000000000073c R14: 0000000000000000 R15: ffff888114dd6000 FS: 00007f2993745840(0000) GS:ffff8882b508d000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000073c CR3: 0000000121a82006 CR4: 0000000000770eb0 PKRU: 55555554 Call Trace: <TASK> try_to_wake_up (./include/linux/spinlock.h:557 kernel/sched/core.c:4106) __btrfs_end_transaction (fs/btrfs/transaction.c:1115 (discriminator 2)) btrfs_zoned_reserve_data_reloc_bg (fs/btrfs/zoned.c:2840) open_ctree (fs/btrfs/disk-io.c:3588) btrfs_get_tree.cold (fs/btrfs/super.c:982 fs/btrfs/super.c:1944 fs/btrfs/super.c:2087 fs/btrfs/super.c:2121) vfs_get_tree (fs/super.c:1752) __do_sys_fsconfig (fs/fsopen.c:231 fs/fsopen.c:295 fs/fsopen.c:473) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131) RIP: 0033:0x7f299392740e Move the call to btrfs_zoned_reserve_data_reloc_bg() after the transaction_kthread has been initialized to fix this problem. Fixes: 694ce5e143d6 ("btrfs: zoned: reserve data_reloc block group on mount") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-03btrfs: hold space_info->lock when clearing periodic reclaim readySun YangKai
btrfs_set_periodic_reclaim_ready() requires space_info->lock to be held, as enforced by lockdep_assert_held(). However, btrfs_reclaim_sweep() was calling it after do_reclaim_sweep() returns, at which point space_info->lock is no longer held. Fix this by explicitly acquiring space_info->lock before clearing the periodic reclaim ready flag in btrfs_reclaim_sweep(). Reported-by: Chris Mason <clm@meta.com> Link: https://lore.kernel.org/linux-btrfs/20260208182556.891815-1-clm@meta.com/ Fixes: 19eff93dc738 ("btrfs: fix periodic reclaim condition") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Sun YangKai <sunk67188@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-03btrfs: print-tree: add remap tree definitionsMark Harmstone
Add the definitions for the remap tree to print-tree.c, so that we get more useful information if a tree is dumped to dmesg. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Mark Harmstone <mark@harmstone.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-02fsverity: add dependency on 64K or smaller pagesEric Biggers
Currently, all filesystems that support fsverity (ext4, f2fs, and btrfs) cache the Merkle tree in the pagecache at a 64K aligned offset after the end of the file data. This offset needs to be a multiple of the page size, which is guaranteed only when the page size is 64K or smaller. 64K was chosen to be the "largest reasonable page size". But it isn't the largest *possible* page size: the hexagon and powerpc ports of Linux support 256K pages, though that configuration is rarely used. For now, just disable support for FS_VERITY in these odd configurations to ensure it isn't used in cases where it would have incorrect behavior. Fixes: 671e67b47e9f ("fs-verity: add Kconfig and the helper functions for hashing") Reported-by: Christoph Hellwig <hch@lst.de> Closes: https://lore.kernel.org/r/20260119063349.GA643@lst.de Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20260221204525.30426-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-02Merge tag 'nfsd-7.0-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Restore previous nfsd thread count reporting behavior - Fix credential reference leaks in the NFSD netlink admin protocol * tag 'nfsd-7.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: report the requested maximum number of threads instead of number running nfsd: Fix cred ref leak in nfsd_nl_listener_set_doit(). nfsd: Fix cred ref leak in nfsd_nl_threads_set_doit().
2026-03-01smb: update some doc referencesZhangGuoDong
To make it easier to locate the documentation during development. Signed-off-by: ZhangGuoDong <zhangguodong@kylinos.cn> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-01smb/client: make SMB2 maperror KUnit tests a separate moduleChenXiaoSong
Build the SMB2 maperror KUnit tests as `smb2maperror_test.ko`. Link: https://lore.kernel.org/linux-cifs/20260210081040.4156383-1-geert@linux-m68k.org/ Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-03-01Merge tag 'sched-urgent-2026-03-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix zero_vruntime tracking when there's a single task running - Fix slice protection logic - Fix the ->vprot logic for reniced tasks - Fix lag clamping in mixed slice workloads - Fix objtool uaccess warning (and bug) in the !CONFIG_RSEQ_SLICE_EXTENSION case caused by unexpected un-inlining, which triggers with older compilers - Fix a comment in the rseq registration rseq_size bound check code - Fix a legacy RSEQ ABI quirk that handled 32-byte area sizes differently, which special size we now reached naturally and want to avoid. The visible ugliness of the new reserved field will be avoided the next time the RSEQ area is extended. * tag 'sched-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: slice ext: Ensure rseq feature size differs from original rseq size rseq: Clarify rseq registration rseq_size bound check comment sched/core: Fix wakeup_preempt's next_class tracking rseq: Mark rseq_arm_slice_extension_timer() __always_inline sched/fair: Fix lag clamp sched/eevdf: Update se->vprot in reweight_entity() sched/fair: Only set slice protection at pick time sched/fair: Fix zero_vruntime tracking
2026-02-28Merge tag 'v7.0rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client fixes from Steve French: - Two multichannel fixes - Locking fix for superblock flags - Fix to remove debug message that could log password - Cleanup fix for setting credentials * tag 'v7.0rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: Use snprintf in cifs_set_cifscreds smb: client: Don't log plaintext credentials in cifs_set_cifscreds smb: client: fix broken multichannel with krb5+signing smb: client: use atomic_t for mnt_cifs_flags smb: client: fix cifs_pick_channel when channels are equally loaded
2026-02-27nsfs: tighten permission checks for handle openingChristian Brauner
Even privileged services should not necessarily be able to see other privileged service's namespaces so they can't leak information to each other. Use may_see_all_namespaces() helper that centralizes this policy until the nstree adapts. Link: https://patch.msgid.link/20260226-work-visibility-fixes-v1-2-d2c2853313bd@kernel.org Fixes: 5222470b2fbb ("nsfs: support file handles") Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@kernel.org # v6.18+ Signed-off-by: Christian Brauner <brauner@kernel.org>