summaryrefslogtreecommitdiff
path: root/fs/overlayfs/util.c
AgeCommit message (Collapse)Author
2025-12-01Merge tag 'vfs-6.19-rc1.ovl' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull overlayfs cred guard conversion from Christian Brauner: "This converts all of overlayfs to use credential guards, eliminating manual credential management throughout the filesystem. Credential guard conversion: - Convert all of overlayfs to use credential guards, replacing the manual ovl_override_creds()/ovl_revert_creds() pattern with scoped guards. This makes credential handling visually explicit and eliminates a class of potential bugs from mismatched override/revert calls. (1) Basic credential guard (with_ovl_creds) (2) Creator credential guard (ovl_override_creator_creds): Introduced a specialized guard for file creation operations that handles the two-phase credential override (mounter credentials, then fs{g,u}id override). The new pattern is much clearer: with_ovl_creds(dentry->d_sb) { scoped_class(prepare_creds_ovl, cred, dentry, inode, mode) { if (IS_ERR(cred)) return PTR_ERR(cred); /* creation operations */ } } (3) Copy-up credential guard (ovl_cu_creds): Introduced a specialized guard for copy-up operations, simplifying the previous struct ovl_cu_creds helper and associated functions. Ported ovl_copy_up_workdir() and ovl_copy_up_tmpfile() to this pattern. Cleanups: - Remove ovl_revert_creds() after all callers converted to guards - Remove struct ovl_cu_creds and associated functions - Drop ovl_setup_cred_for_create() after conversion - Refactor ovl_fill_super(), ovl_lookup(), ovl_iterate(), ovl_rename() for cleaner credential guard scope - Introduce struct ovl_renamedata to simplify rename handling - Don't override credentials for ovl_check_whiteouts() (unnecessary) - Remove unneeded semicolon" * tag 'vfs-6.19-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (54 commits) ovl: remove unneeded semicolon ovl: remove struct ovl_cu_creds and associated functions ovl: port ovl_copy_up_tmpfile() to cred guard ovl: mark *_cu_creds() as unused temporarily ovl: port ovl_copy_up_workdir() to cred guard ovl: add copy up credential guard ovl: drop ovl_setup_cred_for_create() ovl: port ovl_create_or_link() to new ovl_override_creator_creds cleanup guard ovl: mark ovl_setup_cred_for_create() as unused temporarily ovl: reflow ovl_create_or_link() ovl: port ovl_create_tmpfile() to new ovl_override_creator_creds cleanup guard ovl: add ovl_override_creator_creds cred guard ovl: remove ovl_revert_creds() ovl: port ovl_fill_super() to cred guard ovl: refactor ovl_fill_super() ovl: port ovl_lower_positive() to cred guard ovl: port ovl_lookup() to cred guard ovl: refactor ovl_lookup() ovl: port ovl_copyfile() to cred guard ovl: port ovl_rename() to cred guard ...
2025-12-01Merge tag 'vfs-6.19-rc1.directory.locking' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull directory locking updates from Christian Brauner: "This contains the work to add centralized APIs for directory locking operations. This series is part of a larger effort to change directory operation locking to allow multiple concurrent operations in a directory. The ultimate goal is to lock the target dentry(s) rather than the whole parent directory. To help with changing the locking protocol, this series centralizes locking and lookup in new helper functions. The helpers establish a pattern where it is the dentry that is being locked and unlocked (currently the lock is held on dentry->d_parent->d_inode, but that can change in the future). This also changes vfs_mkdir() to unlock the parent on failure, as well as dput()ing the dentry. This allows end_creating() to only require the target dentry (which may be IS_ERR() after vfs_mkdir()), not the parent" * tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nfsd: fix end_creating() conversion VFS: introduce end_creating_keep() VFS: change vfs_mkdir() to unlock on failure. ecryptfs: use new start_creating/start_removing APIs Add start_renaming_two_dentries() VFS/ovl/smb: introduce start_renaming_dentry() VFS/nfsd/ovl: introduce start_renaming() and end_renaming() VFS: add start_creating_killable() and start_removing_killable() VFS: introduce start_removing_dentry() smb/server: use end_removing_noperm for for target of smb2_create_link() VFS: introduce start_creating_noperm() and start_removing_noperm() VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing() VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() VFS: tidy up do_unlinkat() VFS: introduce start_dirop() and end_dirop() debugfs: rename end_creating() to debugfs_end_creating()
2025-12-01Merge tag 'vfs-6.19-rc1.inode' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "Features: - Hide inode->i_state behind accessors. Open-coded accesses prevent asserting they are done correctly. One obvious aspect is locking, but significantly more can be checked. For example it can be detected when the code is clearing flags which are already missing, or is setting flags when it is illegal (e.g., I_FREEING when ->i_count > 0) - Provide accessors for ->i_state, converts all filesystems using coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2, overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to compile - Rework I_NEW handling to operate without fences, simplifying the code after the accessor infrastructure is in place Cleanups: - Move wait_on_inode() from writeback.h to fs.h - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb for clarity - Cosmetic fixes to LRU handling - Push list presence check into inode_io_list_del() - Touch up predicts in __d_lookup_rcu() - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage - Assert on ->i_count in iput_final() - Assert ->i_lock held in __iget() Fixes: - Add missing fences to I_NEW handling" * tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) dcache: touch up predicts in __d_lookup_rcu() fs: push list presence check into inode_io_list_del() fs: cosmetic fixes to lru handling fs: rework I_NEW handling to operate without fences fs: make plain ->i_state access fail to compile xfs: use the new ->i_state accessors nilfs2: use the new ->i_state accessors overlayfs: use the new ->i_state accessors gfs2: use the new ->i_state accessors f2fs: use the new ->i_state accessors smb: use the new ->i_state accessors ceph: use the new ->i_state accessors btrfs: use the new ->i_state accessors Manual conversion to use ->i_state accessors of all places not covered by coccinelle Coccinelle-based conversion to use ->i_state accessors fs: provide accessors for ->i_state fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb fs: move wait_on_inode() from writeback.h to fs.h fs: add missing fences to I_NEW handling ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage ...
2025-11-28ovl: fail ovl_lock_rename_workdir() if either target is unhashedNeilBrown
As well as checking that the parent hasn't changed after getting the lock we need to check that the dentry hasn't been unhashed. Otherwise we might try to rename something that has been removed. Reported-by: syzbot+bfc9a0ccf0de47d04e8c@syzkaller.appspotmail.com Fixes: d2c995581c7c ("ovl: Call ovl_create_temp() without lock held.") Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/176429295510.634289.1552337113663461690@noble.neil.brown.name Tested-by: syzbot+bfc9a0ccf0de47d04e8c@syzkaller.appspotmail.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-19ovl: remove ovl_revert_creds()Christian Brauner
The wrapper isn't needed anymore. Overlayfs completely relies on its cleanup guard. Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-42-b31603935724@kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-19ovl: port ovl_nlink_end() to cred guardChristian Brauner
Use the scoped ovl cred guard. Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-29-b31603935724@kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-19ovl: port ovl_nlink_start() to cred guardChristian Brauner
Use the scoped ovl cred guard. Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-28-b31603935724@kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14VFS/ovl/smb: introduce start_renaming_dentry()NeilBrown
Several callers perform a rename on a dentry they already have, and only require lookup for the target name. This includes smb/server and a few different places in overlayfs. start_renaming_dentry() performs the required lookup and takes the required lock using lock_rename_child() It is used in three places in overlayfs and in ksmbd_vfs_rename(). In the ksmbd case, the parent of the source is not important - the source must be renamed from wherever it is. So start_renaming_dentry() allows rd->old_parent to be NULL and only checks it if it is non-NULL. On success rd->old_parent will be the parent of old_dentry with an extra reference taken. Other start_renaming function also now take the extra reference and end_renaming() now drops this reference as well. ovl_lookup_temp(), ovl_parent_lock(), and ovl_parent_unlock() are all removed as they are no longer needed. OVL_TEMPNAME_SIZE and ovl_tempname() are now declared in overlayfs.h so that ovl_check_rename_whiteout() can access them. ovl_copy_up_workdir() now always cleans up on error. Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-12-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-20overlayfs: use the new ->i_state accessorsMateusz Guzik
Change generated with coccinelle and fixed up by hand as appropriate. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-03Merge tag 'pull-f_path' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull file->f_path constification from Al Viro: "Only one thing was modifying ->f_path of an opened file - acct(2). Massaging that away and constifying a bunch of struct path * arguments in functions that might be given &file->f_path ends up with the situation where we can turn ->f_path into an anon union of const struct path f_path and struct path __f_path, the latter modified only in a few places in fs/{file_table,open,namei}.c, all for struct file instances that are yet to be opened" * tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits) Have cc(1) catch attempts to modify ->f_path kernel/acct.c: saner struct file treatment configfs:get_target() - release path as soon as we grab configfs_item reference apparmor/af_unix: constify struct path * arguments ovl_is_real_file: constify realpath argument ovl_sync_file(): constify path argument ovl_lower_dir(): constify path argument ovl_get_verity_digest(): constify path argument ovl_validate_verity(): constify {meta,data}path arguments ovl_ensure_verity_loaded(): constify datapath argument ksmbd_vfs_set_init_posix_acl(): constify path argument ksmbd_vfs_inherit_posix_acl(): constify path argument ksmbd_vfs_kern_path_unlock(): constify path argument ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path * check_export(): constify path argument export_operations->open(): constify path argument rqst_exp_get_by_name(): constify path argument nfs: constify path argument of __vfs_getattr() bpf...d_path(): constify path argument done_path_create(): constify path argument ...
2025-09-23ovl: Support mounting case-insensitive enabled layersAndré Almeida
Drop the restriction for casefold dentries lookup to enable support for case-insensitive layers in overlayfs. Support case-insensitive layers with the condition that they should be uniformly enabled across the stack and (i.e. if the root mount dir has casefold enabled, so should all the dirs bellow for every layer). Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-09-15ovl_get_verity_digest(): constify path argumentAl Viro
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-15ovl_validate_verity(): constify {meta,data}path argumentsAl Viro
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-15ovl_ensure_verity_loaded(): constify datapath argumentAl Viro
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-08-18ovl: fix possible double unlinkAmir Goldstein
commit 9d23967b18c6 ("ovl: simplify an error path in ovl_copy_up_workdir()") introduced the helper ovl_cleanup_unlocked(), which is later used in several following patches to re-acquire the parent inode lock and unlink a dentry that was earlier found using lookup. This helper was eventually renamed to ovl_cleanup(). The helper ovl_parent_lock() is used to re-acquire the parent inode lock. After acquiring the parent inode lock, the helper verifies that the dentry has not since been moved to another parent, but it failed to verify that the dentry wasn't unlinked from the parent. This means that now every call to ovl_cleanup() could potentially race with another thread, unlinking the dentry to be cleaned up underneath overlayfs and trigger a vfs assertion. Reported-by: syzbot+ec9fab8b7f0386b98a17@syzkaller.appspotmail.com Tested-by: syzbot+ec9fab8b7f0386b98a17@syzkaller.appspotmail.com Fixes: 9d23967b18c6 ("ovl: simplify an error path in ovl_copy_up_workdir()") Suggested-by: NeilBrown <neil@brown.name> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2025-07-28Merge tag 'vfs-6.17-rc1.fileattr' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fileattr updates from Christian Brauner: "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions. Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations. These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects. XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit project ID set on parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files. In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore" * tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: tighten a sanity check in file_attr_to_fileattr() tree-wide: s/struct fileattr/struct file_kattr/g fs: introduce file_getattr and file_setattr syscalls fs: prepare for extending file_get/setattr() fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP selinux: implement inode_file_[g|s]etattr hooks lsm: introduce new hooks for setting/getting inode fsxattr fs: split fileattr related helpers into separate file
2025-07-18ovl: rename ovl_cleanup_unlocked() to ovl_cleanup()NeilBrown
The only remaining user of ovl_cleanup() is ovl_cleanup_locked(), so we no longer need both. This patch renames ovl_cleanup() to ovl_cleanup_locked() and makes it static. ovl_cleanup_unlocked() is renamed to ovl_cleanup(). Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-22-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: change ovl_cleanup_and_whiteout() to take rename lock as neededNeilBrown
Rather than locking the directory(s) before calling ovl_cleanup_and_whiteout(), change it (and ovl_whiteout()) to do the locking, so the locking can be fine grained as will be needed for proposed locking changes. Sometimes this is called to whiteout something in the index dir, in which case only that dir must be locked. In one case it is called on something in an upperdir, so two directories must be locked. We use ovl_lock_rename_workdir() for this and remove the restriction that upperdir cannot be indexdir - because now sometimes it is. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-18-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: narrow locking in ovl_cleanup_index()NeilBrown
ovl_cleanup_index() takes a lock on the directory and then does a lookup and possibly one of two different cleanups. This patch narrows the locking to use the _unlocked() versions of the lookup and one cleanup, and just takes the lock for the other cleanup. A subsequent patch will take the lock into the cleanup. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-12-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: Call ovl_create_temp() without lock held.NeilBrown
ovl currently locks a directory or two and then performs multiple actions in one or both directories. This is incompatible with proposed changes which will lock just the dentry of objects being acted on. This patch moves calls to ovl_create_temp() out of the locked regions and has it take and release the relevant lock itself. The lock that was taken before this function was called is now taken after. This means that any code between where the lock was taken and ovl_create_temp() is now unlocked. This necessitates the use of ovl_cleanup_unlocked() and the creation of ovl_lookup_upper_unlocked(). These will be used more widely in future patches. Now that the file is created before the lock is taken for rename, we need to ensure the parent wasn't changed before the lock was gained. ovl_lock_rename_workdir() is changed to optionally receive the dentries that will be involved in the rename. If either is present but has the wrong parent, an error is returned. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-4-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: simplify an error path in ovl_copy_up_workdir()NeilBrown
If ovl_copy_up_data() fails the error is not immediately handled but the code continues on to call ovl_start_write() and lock_rename(), presumably because both of these locks are needed for the cleanup. Only then (if the lock was successful) is the error checked. This makes the code a little hard to follow and could be fragile. This patch changes to handle the error after the ovl_start_write() (which cannot fail, so there aren't multiple errors to deail with). A new ovl_cleanup_unlocked() is created which takes the required directory lock. This will be used extensively in later patches. In general we need to check the parent is still correct after taking the lock (as ovl_copy_up_workdir() does after a successful lock_rename()) so that is included in ovl_cleanup_unlocked() using new ovl_parent_lock() and ovl_parent_unlock() calls (it is planned to move this API into VFS code eventually, though in a slightly different form). Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/20250716004725.1206467-2-neil@brown.name Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-18ovl: support layers on case-folding capable filesystemsAmir Goldstein
Case folding is often applied to subtrees and not on an entire filesystem. Disallowing layers from filesystems that support case folding is over limiting. Replace the rule that case-folding capable are not allowed as layers with a rule that case folded directories are not allowed in a merged directory stack. Should case folding be enabled on an underlying directory while overlayfs is mounted the outcome is generally undefined. Specifically in ovl_lookup(), we check the base underlying directory and fail with -ESTALE and write a warning to kmsg if an underlying directory case folding is enabled. Suggested-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/linux-fsdevel/20250520051600.1903319-1-kent.overstreet@linux.dev/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250602171702.1941891-1-amir73il@gmail.com Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04tree-wide: s/struct fileattr/struct file_kattr/gChristian Brauner
Now that we expose struct file_attr as our uapi struct rename all the internal struct to struct file_kattr to clearly communicate that it is a kernel internal struct. This is similar to struct mount_{k}attr and others. Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-16VFS: change old_dir and new_dir in struct renamedata to dentrysNeilBrown
all users of 'struct renamedata' have the dentry for the old and new directories, and often have no use for the inode except to store it in the renamedata. This patch changes struct renamedata to hold the dentry, rather than the inode, for the old and new directories, and changes callers to match. The names are also changed from a _dir suffix to _parent. This is consistent with other usage in namei.c and elsewhere. This results in the removal of several local variables and several dereferences of ->d_inode at the cost of adding ->d_inode dereferences to vfs_rename(). Acked-by: Miklos Szeredi <miklos@szeredi.hu> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/174977089072.608730.4244531834577097454@noble.neil.brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-05ovl: Replace offsetof() with struct_size() in ovl_stack_free()Thorsten Blum
Compared to offsetof(), struct_size() provides additional compile-time checks for structs with flexible arrays (e.g., __must_be_array()). No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-04-30ovl: Check for NULL d_inode() in ovl_dentry_upper()Kees Cook
In ovl_path_type() and ovl_is_metacopy_dentry() GCC notices that it is possible for OVL_E() to return NULL (which implies that d_inode(dentry) may be NULL). This would result in out of bounds reads via container_of(), seen with GCC 15's -Warray-bounds -fdiagnostics-details. For example: In file included from arch/x86/include/generated/asm/rwonce.h:1, from include/linux/compiler.h:339, from include/linux/export.h:5, from include/linux/linkage.h:7, from include/linux/fs.h:5, from fs/overlayfs/util.c:7: In function 'ovl_upperdentry_dereference', inlined from 'ovl_dentry_upper' at ../fs/overlayfs/util.c:305:9, inlined from 'ovl_path_type' at ../fs/overlayfs/util.c:216:6: include/asm-generic/rwonce.h:44:26: error: array subscript 0 is outside array bounds of 'struct inode[7486503276667837]' [-Werror=array-bounds=] 44 | #define __READ_ONCE(x) (*(const volatile __unqual_scalar_typeof(x) *)&(x)) | ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/asm-generic/rwonce.h:50:9: note: in expansion of macro '__READ_ONCE' 50 | __READ_ONCE(x); \ | ^~~~~~~~~~~ fs/overlayfs/ovl_entry.h:195:16: note: in expansion of macro 'READ_ONCE' 195 | return READ_ONCE(oi->__upperdentry); | ^~~~~~~~~ 'ovl_path_type': event 1 185 | return inode ? OVL_I(inode)->oe : NULL; 'ovl_path_type': event 2 Avoid this by allowing ovl_dentry_upper() to return NULL if d_inode() is NULL, as that means the problematic dereferencing can never be reached. Note that this fixes the over-eager compiler warning in an effort to being able to enable -Warray-bounds globally. There is no known behavioral bug here. Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-12-02tree-wide: s/revert_creds_light()/revert_creds()/gChristian Brauner
Rename all calls to revert_creds_light() back to revert_creds(). Link: https://lore.kernel.org/r/20241125-work-cred-v2-6-68b9d38bb5b2@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-02tree-wide: s/override_creds_light()/override_creds()/gChristian Brauner
Rename all calls to override_creds_light() back to overrid_creds(). Link: https://lore.kernel.org/r/20241125-work-cred-v2-5-68b9d38bb5b2@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-20ovl: Filter invalid inodes with missing lookup functionVasiliy Kovalev
Add a check to the ovl_dentry_weird() function to prevent the processing of directory inodes that lack the lookup function. This is important because such inodes can cause errors in overlayfs when passed to the lowerstack. Reported-by: syzbot+a8c9d476508bd14a90e5@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=a8c9d476508bd14a90e5 Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Link: https://lore.kernel.org/linux-unionfs/CAJfpegvx-oS9XGuwpJx=Xe28_jzWx5eRo1y900_ZzWY+=gGzUg@mail.gmail.com/ Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Cc: <stable@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-11-15ovl: Optimize override/revert credsVinicius Costa Gomes
Use override_creds_light() in ovl_override_creds() and revert_creds_light() in ovl_revert_creds(). The _light() functions do not change the 'usage' of the credentials in question, as they refer to the credentials associated with the mounter, which have a longer lifetime. In ovl_setup_cred_for_create(), do not need to modify the mounter credentials (returned by override_creds_light()) 'usage' counter. Add a warning to verify that we are indeed working with the mounter credentials (stored in the superblock). Failure in this assumption means that creds may leak. Suggested-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-11-11ovl: use wrapper ovl_revert_creds()Vinicius Costa Gomes
Introduce ovl_revert_creds() wrapper of revert_creds() to match callers of ovl_override_creds(). Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-04-15kernel_file_open(): get rid of inode argumentAl Viro
always equal to ->dentry->d_inode of the path argument these days. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-03-11Merge tag 'vfs-6.9.uuid' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs uuid updates from Christian Brauner: "This adds two new ioctl()s for getting the filesystem uuid and retrieving the sysfs path based on the path of a mounted filesystem. Getting the filesystem uuid has been implemented in filesystem specific code for a while it's now lifted as a generic ioctl" * tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: add support for FS_IOC_GETFSSYSFSPATH fs: add FS_IOC_GETFSSYSFSPATH fat: Hook up sb->s_uuid fs: FS_IOC_GETUUID ovl: convert to super_set_uuid() fs: super_set_uuid()
2024-02-08ovl: convert to super_set_uuid()Kent Overstreet
We don't want to be settingc sb->s_uuid directly anymore, as there's a length field that also has to be set, and this conversion was not completely trivial. Acked-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20240207025624.1019754-3-kent.overstreet@linux.dev Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: linux-unionfs@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-23ovl: mark xwhiteouts directory with overlay.opaque='x'Amir Goldstein
An opaque directory cannot have xwhiteouts, so instead of marking an xwhiteouts directory with a new xattr, overload overlay.opaque xattr for marking both opaque dir ('y') and xwhiteouts dir ('x'). This is more efficient as the overlay.opaque xattr is checked during lookup of directory anyway. This also prevents unnecessary checking the xattr when reading a directory without xwhiteouts, i.e. most of the time. Note that the xwhiteouts marker is not checked on the upper layer and on the last layer in lowerstack, where xwhiteouts are not expected. Fixes: bc8df7a3dc03 ("ovl: Add an alternative type of whiteout") Cc: <stable@vger.kernel.org> # v6.7 Reviewed-by: Alexander Larsson <alexl@redhat.com> Tested-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2024-01-11Merge tag 'pull-rename' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rename updates from Al Viro: "Fix directory locking scheme on rename This was broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that" * tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rename(): avoid a deadlock in the case of parents having no common ancestor kill lock_two_inodes() rename(): fix the locking of subdirectories f2fs: Avoid reading renamed directory if parent does not change ext4: don't access the source subdirectory content on same-directory rename ext2: Avoid reading renamed directory if parent does not change udf_rename(): only access the child content on cross-directory rename ocfs2: Avoid touching renamed directory if parent does not change reiserfs: Avoid touching renamed directory if parent does not change
2023-11-25rename(): avoid a deadlock in the case of parents having no common ancestorAl Viro
... and fix the directory locking documentation and proof of correctness. Holding ->s_vfs_rename_mutex *almost* prevents ->d_parent changes; the case where we really don't want it is splicing the root of disconnected tree to somewhere. In other words, ->s_vfs_rename_mutex is sufficient to stabilize "X is an ancestor of Y" only if X and Y are already in the same tree. Otherwise it can go from false to true, and one can construct a deadlock on that. Make lock_two_directories() report an error in such case and update the callers of lock_rename()/lock_rename_child() to handle such errors. And yes, such conditions are not impossible to create ;-/ Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-20ovl: remove redundant ofs->indexdir memberAmir Goldstein
When the index feature is disabled, ofs->indexdir is NULL. When the index feature is enabled, ofs->indexdir has the same value as ofs->workdir and takes an extra reference. This makes the code harder to understand when it is not always clear that ofs->indexdir in one function is the same dentry as ofs->workdir in another function. Remove this redundancy, by referencing ofs->workdir directly in index helpers and by using the ovl_indexdir() accessor in generic code. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-11-14ovl: fix misformatted commentAmir Goldstein
Remove misleading /** prefix from a regular comment. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311121628.byHp8tkv-lkp@intel.com/ Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-11-07Merge tag 'vfs-6.7.fsid' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fanotify fsid updates from Christian Brauner: "This work is part of the plan to enable fanotify to serve as a drop-in replacement for inotify. While inotify is availabe on all filesystems, fanotify currently isn't. In order to support fanotify on all filesystems two things are needed: (1) all filesystems need to support AT_HANDLE_FID (2) all filesystems need to report a non-zero f_fsid This contains (1) and allows filesystems to encode non-decodable file handlers for fanotify without implementing any exportfs operations by encoding a file id of type FILEID_INO64_GEN from i_ino and i_generation. Filesystems that want to opt out of encoding non-decodable file ids for fanotify that don't support NFS export can do so by providing an empty export_operations struct. This also partially addresses (2) by generating f_fsid for simple filesystems as well as freevxfs. Remaining filesystems will be dealt with by separate patches. Finally, this contains the patch from the current exportfs maintainers which moves exportfs under vfs with Chuck, Jeff, and Amir as maintainers and vfs.git as tree" * tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: MAINTAINERS: create an entry for exportfs fs: fix build error with CONFIG_EXPORTFS=m or not defined freevxfs: derive f_fsid from bdev->bd_dev fs: report f_fsid from s_dev for "simple" filesystems exportfs: support encoding non-decodeable file handles by default exportfs: define FILEID_INO64_GEN* file handle types exportfs: make ->encode_fh() a mandatory method for NFS export exportfs: add helpers to check if filesystem can encode/decode file handles
2023-10-31ovl: Add an alternative type of whiteoutAlexander Larsson
An xattr whiteout (called "xwhiteout" in the code) is a reguar file of zero size with the "overlay.whiteout" xattr set. A file like this in a directory with the "overlay.whiteouts" xattrs set will be treated the same way as a regular whiteout. The "overlay.whiteouts" directory xattr is used in order to efficiently handle overlay checks in readdir(), as we only need to checks xattrs in affected directories. The advantage of this kind of whiteout is that they can be escaped using the standard overlay xattr escaping mechanism. So, a file with a "overlay.overlay.whiteout" xattr would be unescaped to "overlay.whiteout", which could then be consumed by another overlayfs as a whiteout. Overlayfs itself doesn't create whiteouts like this, but a userspace mechanism could use this alternative mechanism to convert images that may contain whiteouts to be used with overlayfs. To work as a whiteout for both regular overlayfs mounts as well as userxattr mounts both the "user.overlay.whiteout*" and the "trusted.overlay.whiteout*" xattrs will need to be created. Signed-off-by: Alexander Larsson <alexl@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31ovl: do not encode lower fh with upper sb_writers heldAmir Goldstein
When lower fs is a nested overlayfs, calling encode_fh() on a lower directory dentry may trigger copy up and take sb_writers on the upper fs of the lower nested overlayfs. The lower nested overlayfs may have the same upper fs as this overlayfs, so nested sb_writers lock is illegal. Move all the callers that encode lower fh to before ovl_want_write(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31ovl: do not open/llseek lower file with upper sb_writers heldAmir Goldstein
overlayfs file open (ovl_maybe_lookup_lowerdata) and overlay file llseek take the ovl_inode_lock, without holding upper sb_writers. In case of nested lower overlay that uses same upper fs as this overlay, lockdep will warn about (possibly false positive) circular lock dependency when doing open/llseek of lower ovl file during copy up with our upper sb_writers held, because the locking ordering seems reverse to the locking order in ovl_copy_up_start(): - lower ovl_inode_lock - upper sb_writers Let the copy up "transaction" keeps an elevated mnt write count on upper mnt, but leaves taking upper sb_writers to lower level helpers only when they actually need it. This allows to avoid holding upper sb_writers during lower file open/llseek and prevents the lockdep warning. Minimizing the scope of upper sb_writers during copy up is also needed for fixing another possible deadlocks by a following patch. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31ovl: reorder ovl_want_write() after ovl_inode_lock()Amir Goldstein
Make the locking order of ovl_inode_lock() strictly between the two vfs stacked layers, i.e.: - ovl vfs locks: sb_writers, inode_lock, ... - ovl_inode_lock - upper vfs locks: sb_writers, inode_lock, ... To that effect, move ovl_want_write() into the helpers ovl_nlink_start() and ovl_copy_up_start which currently take the ovl_inode_lock() after ovl_want_write(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31ovl: split ovl_want_write() into two helpersAmir Goldstein
ovl_get_write_access() gets write access to upper mnt without taking freeze protection on upper sb and ovl_start_write() only takes freeze protection on upper sb. These helpers will be used to breakup the large ovl_want_write() scope during copy up into finer grained freeze protection scopes. Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-31ovl: protect copying of realinode attributes to ovl inodeAmir Goldstein
ovl_copyattr() may be called concurrently from aio completion context without any lock and that could lead to overlay inode attributes getting permanently out of sync with real inode attributes. Use ovl inode spinlock to protect ovl_copyattr(). Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-24exportfs: add helpers to check if filesystem can encode/decode file handlesAmir Goldstein
The logic of whether filesystem can encode/decode file handles is open coded in many places. In preparation to changing the logic, move the open coded logic into inline helpers. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-2-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-18overlayfs: convert to new timestamp accessorsJeff Layton
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-58-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-30Merge tag 'ovl-update-6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs updates from Amir Goldstein: - add verification feature needed by composefs (Alexander Larsson) - improve integration of overlayfs and fanotify (Amir Goldstein) - fortify some overlayfs code (Andrea Righi) * tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: validate superblock in OVL_FS() ovl: make consistent use of OVL_FS() ovl: Kconfig: introduce CONFIG_OVERLAY_FS_DEBUG ovl: auto generate uuid for new overlay filesystems ovl: store persistent uuid/fsid with uuid=on ovl: add support for unique fsid per instance ovl: support encoding non-decodable file handles ovl: Handle verity during copy-up ovl: Validate verity xattr when resolving lowerdata ovl: Add versioned header for overlay.metacopy xattr ovl: Add framework for verity support
2023-08-12ovl: make consistent use of OVL_FS()Andrea Righi
Always use OVL_FS() to retrieve the corresponding struct ovl_fs from a struct super_block. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>