| Age | Commit message (Collapse) | Author |
|
When creating new files the security layer expects the original
credentials to be passed. When cleaning up the code this was accidently
changed to pass the mounter's credentials by relying on current->cred
which is already overriden at this point. Pass the original credentials
directly.
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Reported-by: Paul Moore <paul@paul-moore.com>
Fixes: e566bff96322 ("ovl: port ovl_create_or_link() to new ovl_override_creator_creds")
Link: https://lore.kernel.org/CAFqZXNvL1ciLXMhHrnoyBmQu1PAApH41LkSWEhrcvzAAbFij8Q@mail.gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Tested-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull overlayfs cred guard conversion from Christian Brauner:
"This converts all of overlayfs to use credential guards, eliminating
manual credential management throughout the filesystem.
Credential guard conversion:
- Convert all of overlayfs to use credential guards, replacing the
manual ovl_override_creds()/ovl_revert_creds() pattern with scoped
guards.
This makes credential handling visually explicit and eliminates a
class of potential bugs from mismatched override/revert calls.
(1) Basic credential guard (with_ovl_creds)
(2) Creator credential guard (ovl_override_creator_creds):
Introduced a specialized guard for file creation operations
that handles the two-phase credential override (mounter
credentials, then fs{g,u}id override). The new pattern is much
clearer:
with_ovl_creds(dentry->d_sb) {
scoped_class(prepare_creds_ovl, cred, dentry, inode, mode) {
if (IS_ERR(cred))
return PTR_ERR(cred);
/* creation operations */
}
}
(3) Copy-up credential guard (ovl_cu_creds):
Introduced a specialized guard for copy-up operations,
simplifying the previous struct ovl_cu_creds helper and
associated functions.
Ported ovl_copy_up_workdir() and ovl_copy_up_tmpfile() to this
pattern.
Cleanups:
- Remove ovl_revert_creds() after all callers converted to guards
- Remove struct ovl_cu_creds and associated functions
- Drop ovl_setup_cred_for_create() after conversion
- Refactor ovl_fill_super(), ovl_lookup(), ovl_iterate(),
ovl_rename() for cleaner credential guard scope
- Introduce struct ovl_renamedata to simplify rename handling
- Don't override credentials for ovl_check_whiteouts() (unnecessary)
- Remove unneeded semicolon"
* tag 'vfs-6.19-rc1.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (54 commits)
ovl: remove unneeded semicolon
ovl: remove struct ovl_cu_creds and associated functions
ovl: port ovl_copy_up_tmpfile() to cred guard
ovl: mark *_cu_creds() as unused temporarily
ovl: port ovl_copy_up_workdir() to cred guard
ovl: add copy up credential guard
ovl: drop ovl_setup_cred_for_create()
ovl: port ovl_create_or_link() to new ovl_override_creator_creds cleanup guard
ovl: mark ovl_setup_cred_for_create() as unused temporarily
ovl: reflow ovl_create_or_link()
ovl: port ovl_create_tmpfile() to new ovl_override_creator_creds cleanup guard
ovl: add ovl_override_creator_creds cred guard
ovl: remove ovl_revert_creds()
ovl: port ovl_fill_super() to cred guard
ovl: refactor ovl_fill_super()
ovl: port ovl_lower_positive() to cred guard
ovl: port ovl_lookup() to cred guard
ovl: refactor ovl_lookup()
ovl: port ovl_copyfile() to cred guard
ovl: port ovl_rename() to cred guard
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull directory locking updates from Christian Brauner:
"This contains the work to add centralized APIs for directory locking
operations.
This series is part of a larger effort to change directory operation
locking to allow multiple concurrent operations in a directory. The
ultimate goal is to lock the target dentry(s) rather than the whole
parent directory.
To help with changing the locking protocol, this series centralizes
locking and lookup in new helper functions. The helpers establish a
pattern where it is the dentry that is being locked and unlocked
(currently the lock is held on dentry->d_parent->d_inode, but that can
change in the future).
This also changes vfs_mkdir() to unlock the parent on failure, as well
as dput()ing the dentry. This allows end_creating() to only require
the target dentry (which may be IS_ERR() after vfs_mkdir()), not the
parent"
* tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
nfsd: fix end_creating() conversion
VFS: introduce end_creating_keep()
VFS: change vfs_mkdir() to unlock on failure.
ecryptfs: use new start_creating/start_removing APIs
Add start_renaming_two_dentries()
VFS/ovl/smb: introduce start_renaming_dentry()
VFS/nfsd/ovl: introduce start_renaming() and end_renaming()
VFS: add start_creating_killable() and start_removing_killable()
VFS: introduce start_removing_dentry()
smb/server: use end_removing_noperm for for target of smb2_create_link()
VFS: introduce start_creating_noperm() and start_removing_noperm()
VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()
VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()
VFS: tidy up do_unlinkat()
VFS: introduce start_dirop() and end_dirop()
debugfs: rename end_creating() to debugfs_end_creating()
|
|
Remove unnecessary semicolons reported by Coccinelle/coccicheck and the
semantic patch at scripts/coccinelle/misc/semicolon.cocci.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Fixed: 7ab96df840e60 ("VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
It is now unused and can be removed.
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-prepare-v2-6-bd1c97a36d7b@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This clearly indicates the double-credential override and makes the code
a lot easier to grasp with one glance.
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-prepare-v2-5-bd1c97a36d7b@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The function will become unused in the next patch.
We'll remove it in later patches to keep the diff legible.
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-prepare-v2-4-bd1c97a36d7b@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Reflow the creation routine in preparation of porting it to a guard.
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-prepare-v2-3-bd1c97a36d7b@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This clearly indicates the double-credential override and makes the code
a lot easier to grasp with one glance.
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-prepare-v2-2-bd1c97a36d7b@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The current code to override credentials for creation operations is
pretty difficult to understand. We effectively override the credentials
twice:
(1) override with the mounter's credentials
(2) copy the mounts credentials and override the fs{g,u}id with the inode {u,g}id
And then we elide the revert because it would be an idempotent revert.
That elision doesn't buy us anything anymore though because I've made it
all work without any reference counting anyway. All it does is mix the
two credential overrides together.
We can use a cleanup guard to clarify the creation codepaths and make
them easier to understand.
This just introduces the cleanup guard keeping the patch reviewable.
We'll convert the caller in follow-up patches and then drop the
duplicated code.
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-prepare-v2-1-bd1c97a36d7b@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Use the scoped ovl cred guard.
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-35-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Extract the code that runs under overridden credentials into a separate
ovl_rename_upper() helper function and the code that runs before/after to
ovl_rename_start/end(). Error handling is simplified.
The helpers returns errors directly instead of using goto labels.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-34-b31603935724@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a struct ovl_renamedata to group rename-related state that was
previously stored in local variables. Embedd struct renamedata directly
aligning with the vfs.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-33-b31603935724@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Use the scoped ovl cred guard.
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-6-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Use the scoped ovl cred guard.
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-5-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Use the scoped ovl cred guard.
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-4-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Use the scoped ovl cred guard.
Link: https://patch.msgid.link/20251117-work-ovl-cred-guard-v4-3-b31603935724@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Occasionally the caller of end_creating() wants to keep using the dentry.
Rather then requiring them to dget() the dentry (when not an error)
before calling end_creating(), provide end_creating_keep() which does
this.
cachefiles and overlayfs make use of this.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20251113002050.676694-16-neilb@ownmail.net
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
vfs_mkdir() already drops the reference to the dentry on failure but it
leaves the parent locked.
This complicates end_creating() which needs to unlock the parent even
though the dentry is no longer available.
If we change vfs_mkdir() to unlock on failure as well as releasing the
dentry, we can remove the "parent" arg from end_creating() and simplify
the rules for calling it.
Note that cachefiles_get_directory() can choose to substitute an error
instead of actually calling vfs_mkdir(), for fault injection. In that
case it needs to call end_creating(), just as vfs_mkdir() now does on
error.
ovl_create_real() will now unlock on error. So the conditional
end_creating() after the call is removed, and end_creating() is called
internally on error.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20251113002050.676694-15-neilb@ownmail.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
A few callers want to lock for a rename and already have both dentries.
Also debugfs does want to perform a lookup but doesn't want permission
checking, so start_renaming_dentry() cannot be used.
This patch introduces start_renaming_two_dentries() which is given both
dentries. debugfs performs one lookup itself. As it will only continue
with a negative dentry and as those cannot be renamed or unlinked, it is
safe to do the lookup before getting the rename locks.
overlayfs uses start_renaming_two_dentries() in three places and selinux
uses it twice in sel_make_policy_nodes().
In sel_make_policy_nodes() we now lock for rename twice instead of just
once so the combined operation is no longer atomic w.r.t the parent
directory locks. As selinux_state.policy_mutex is held across the whole
operation this does not open up any interesting races.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20251113002050.676694-13-neilb@ownmail.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Several callers perform a rename on a dentry they already have, and only
require lookup for the target name. This includes smb/server and a few
different places in overlayfs.
start_renaming_dentry() performs the required lookup and takes the
required lock using lock_rename_child()
It is used in three places in overlayfs and in ksmbd_vfs_rename().
In the ksmbd case, the parent of the source is not important - the
source must be renamed from wherever it is. So start_renaming_dentry()
allows rd->old_parent to be NULL and only checks it if it is non-NULL.
On success rd->old_parent will be the parent of old_dentry with an extra
reference taken. Other start_renaming function also now take the extra
reference and end_renaming() now drops this reference as well.
ovl_lookup_temp(), ovl_parent_lock(), and ovl_parent_unlock() are
all removed as they are no longer needed.
OVL_TEMPNAME_SIZE and ovl_tempname() are now declared in overlayfs.h so
that ovl_check_rename_whiteout() can access them.
ovl_copy_up_workdir() now always cleans up on error.
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20251113002050.676694-12-neilb@ownmail.net
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
start_renaming() combines name lookup and locking to prepare for rename.
It is used when two names need to be looked up as in nfsd and overlayfs -
cases where one or both dentries are already available will be handled
separately.
__start_renaming() avoids the inode_permission check and hash
calculation and is suitable after filename_parentat() in do_renameat2().
It subsumes quite a bit of code from that function.
start_renaming() does calculate the hash and check X permission and is
suitable elsewhere:
- nfsd_rename()
- ovl_rename()
In ovl, ovl_do_rename_rd() is factored out of ovl_do_rename(), which
itself will be gone by the end of the series.
Acked-by: Chuck Lever <chuck.lever@oracle.com> (for nfsd parts)
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
--
Changes since v3:
- added missig dput() in ovl_rename when "whiteout" is not-NULL.
Changes since v2:
- in __start_renaming() some label have been renamed, and err
is always set before a "goto out_foo" rather than passing the
error in a dentry*.
- ovl_do_rename() changed to call the new ovl_do_rename_rd() rather
than keeping duplicate code
- code around ovl_cleanup() call in ovl_rename() restructured.
Link: https://patch.msgid.link/20251113002050.676694-11-neilb@ownmail.net
Tested-by: syzbot@syzkaller.appspotmail.com
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
start_removing_dentry() is similar to start_removing() but instead of
providing a name for lookup, the target dentry is given.
start_removing_dentry() checks that the dentry is still hashed and in
the parent, and if so it locks and increases the refcount so that
end_removing() can be used to finish the operation.
This is used in cachefiles, overlayfs, smb/server, and apparmor.
There will be other users including ecryptfs.
As start_removing_dentry() takes an extra reference to the dentry (to be
put by end_removing()), there is no need to explicitly take an extra
reference to stop d_delete() from using dentry_unlink_inode() to negate
the dentry - as in cachefiles_delete_object(), and ksmbd_vfs_unlink().
cachefiles_bury_object() now gets an extra ref to the victim, which is
drops. As it includes the needed end_removing() calls, the caller
doesn't need them.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20251113002050.676694-9-neilb@ownmail.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
start_removing() is similar to start_creating() but will only return a
positive dentry with the expectation that it will be removed. This is
used by nfsd, cachefiles, and overlayfs. They are changed to also use
end_removing() to terminate the action begun by start_removing(). This
is a simple alias for end_dirop().
Apart from changes to the error paths, as we no longer need to unlock on
a lookup error, an effect on callers is that they don't need to test if
the found dentry is positive or negative - they can be sure it is
positive.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20251113002050.676694-6-neilb@ownmail.net
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
start_creating() is similar to simple_start_creating() but is not so
simple.
It takes a qstr for the name, includes permission checking, and does NOT
report an error if the name already exists, returning a positive dentry
instead.
This is currently used by nfsd, cachefiles, and overlayfs.
end_creating() is called after the dentry has been used.
end_creating() drops the reference to the dentry as it is generally no
longer needed. This is exactly the first section of end_creating_path()
so that function is changed to call the new end_creating()
These calls help encapsulate locking rules so that directory locking can
be changed.
Occasionally this change means that the parent lock is held for a
shorter period of time, for example in cachefiles_commit_tmpfile().
As this function now unlocks after an unlink and before the following
lookup, it is possible that the lookup could again find a positive
dentry, so a while loop is introduced there.
In overlayfs the ovl_lookup_temp() function has ovl_tempname()
split out to be used in ovl_start_creating_temp(). The other use
of ovl_lookup_temp() is preparing for a rename. When rename handling
is updated, ovl_lookup_temp() will be removed.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://patch.msgid.link/20251113002050.676694-5-neilb@ownmail.net
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Change generated with coccinelle and fixed up by hand as appropriate.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
e8bd877fb76bb9f3 ("ovl: fix possible double unlink") added a sanity
check of !d_unhashed(child) to try to verify that child dentry was not
unlinked while parent dir was unlocked.
This "was not unlink" check has a false positive result in the case of
casefolded parent dir, because in that case, ovl_create_temp() returns
an unhashed dentry after ovl_create_real() gets an unhashed dentry from
ovl_lookup_upper() and makes it positive.
To avoid returning unhashed dentry from ovl_create_temp(), let
ovl_create_real() lookup again after making the newdentry positive,
so it always returns a hashed positive dentry (or an error).
This fixes the error in ovl_parent_lock() in ovl_check_rename_whiteout()
after ovl_create_temp() and allows mount of overlayfs with casefolding
enabled layers.
Reported-by: André Almeida <andrealmeid@igalia.com>
Closes: https://lore.kernel.org/r/18704e8c-c734-43f3-bc7c-b8be345e1bf5@igalia.com/
Suggested-by: Neil Brown <neil@brown.name>
Reviewed-by: Neil Brown <neil@brown.name>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
In a overlayfs with casefold enabled, all new dentries should have
casefold enabled as well. Check this at ovl_create_real().
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
ovl_create_temp() treats "workdir" as a parent in which it creates an
object so it should use I_MUTEX_PARENT.
Prior to the commit identified below the lock was taken by the caller
which sometimes used I_MUTEX_PARENT and sometimes used I_MUTEX_NORMAL.
The use of I_MUTEX_NORMAL was incorrect but unfortunately copied into
ovl_create_temp().
Note to backporters: This patch only applies after the last Fixes given
below (post v6.16). To fix the bug in v6.7 and later the
inode_lock() call in ovl_copy_up_workdir() needs to nest using
I_MUTEX_PARENT.
Link: https://lore.kernel.org/all/67a72070.050a0220.3d72c.0022.GAE@google.com/
Cc: stable@vger.kernel.org
Reported-by: syzbot+7836a68852a10ec3d790@syzkaller.appspotmail.com
Tested-by: syzbot+7836a68852a10ec3d790@syzkaller.appspotmail.com
Fixes: c63e56a4a652 ("ovl: do not open/llseek lower file with upper sb_writers held")
Fixes: d2c995581c7c ("ovl: Call ovl_create_temp() without lock held.")
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
In case of ovl_lookup_temp() failure, we currently print `err`
which is actually not initialized at all.
Instead, properly print PTR_ERR(whiteout) which is where the
actual error really is.
Address-Coverity-ID: 1647983 ("Uninitialized variables (UNINIT)")
Fixes: 8afa0a7367138 ("ovl: narrow locking in ovl_whiteout()")
Signed-off-by: Antonio Quartulli <antonio@mandelbit.com>
Link: https://lore.kernel.org/20250721203821.7812-1-antonio@mandelbit.com
Reviewed-by: NeilBrown <neil@brown.name>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The only remaining user of ovl_cleanup() is ovl_cleanup_locked(), so we
no longer need both.
This patch renames ovl_cleanup() to ovl_cleanup_locked() and makes it
static.
ovl_cleanup_unlocked() is renamed to ovl_cleanup().
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-22-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Instead of passing an inode *dir, pass a dentry *parent. This makes the
calling slightly cleaner.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-21-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl_whiteout() relies on the workdir i_rwsem to provide exclusive access
to ofs->whiteout which it manipulates. Rather than depending on this,
add a new mutex, "whiteout_lock" to explicitly provide the required
locking. Use guard(mutex) for this so that we can return without
needing to explicitly unlock.
Then take the lock on workdir only when needed - to lookup the temp name
and to do the whiteout or link.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-19-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rather than locking the directory(s) before calling
ovl_cleanup_and_whiteout(), change it (and ovl_whiteout()) to do the
locking, so the locking can be fine grained as will be needed for
proposed locking changes.
Sometimes this is called to whiteout something in the index dir, in
which case only that dir must be locked. In one case it is called on
something in an upperdir, so two directories must be locked. We use
ovl_lock_rename_workdir() for this and remove the restriction that
upperdir cannot be indexdir - because now sometimes it is.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-18-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This code:
performs a lookup_upper
creates a whiteout object
renames the whiteout over the result of the lookup
The create and the rename must be locked separately for proposed
directory locking changes. This patch takes a first step of moving the
lookup out of the locked region. A subsequent patch will separate the
create from the rename.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-17-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Drop the rename lock immediately after the rename, and use
ovl_cleanup_unlocked() for cleanup.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-10-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rather than having three separate goto label: out_unlock, out_dput_old,
and out_dput, make use of that fact that dput() happily accepts a NULL
pointer to reduce this to just one goto label: out_unlock.
olddentry and newdentry are initialised to NULL and only set once a
value dentry is found. They are then dput() late in the function.
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-9-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Unlock the parents immediately after the rename, and use
ovl_cleanup_unlocked() for cleanup, which takes a separate lock.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-8-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Drop the locks immediately after rename, and use a separate lock for
cleanup.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Note that ovl_cleanup_whiteouts() operates on "upper", a child of
"upperdir" and does not require upperdir or workdir to be locked.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-7-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Drop the directory lock immediately after the ovl_create_real() call and
take a separate lock later for cleanup in ovl_cleanup_unlocked() - if
needed.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-6-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl currently locks a directory or two and then performs multiple actions
in one or both directories. This is incompatible with proposed changes
which will lock just the dentry of objects being acted on.
This patch moves calls to ovl_create_temp() out of the locked regions and
has it take and release the relevant lock itself.
The lock that was taken before this function was called is now taken
after. This means that any code between where the lock was taken and
ovl_create_temp() is now unlocked. This necessitates the use of
ovl_cleanup_unlocked() and the creation of ovl_lookup_upper_unlocked().
These will be used more widely in future patches.
Now that the file is created before the lock is taken for rename, we
need to ensure the parent wasn't changed before the lock was gained.
ovl_lock_rename_workdir() is changed to optionally receive the dentries
that will be involved in the rename. If either is present but has the
wrong parent, an error is returned.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-4-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If ovl_copy_up_data() fails the error is not immediately handled but the
code continues on to call ovl_start_write() and lock_rename(),
presumably because both of these locks are needed for the cleanup.
Only then (if the lock was successful) is the error checked.
This makes the code a little hard to follow and could be fragile.
This patch changes to handle the error after the ovl_start_write()
(which cannot fail, so there aren't multiple errors to deail with). A
new ovl_cleanup_unlocked() is created which takes the required directory
lock. This will be used extensively in later patches.
In general we need to check the parent is still correct after taking the
lock (as ovl_copy_up_workdir() does after a successful lock_rename()) so
that is included in ovl_cleanup_unlocked() using new ovl_parent_lock()
and ovl_parent_unlock() calls (it is planned to move this API into VFS code
eventually, though in a slightly different form).
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-2-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
all users of 'struct renamedata' have the dentry for the old and new
directories, and often have no use for the inode except to store it in
the renamedata.
This patch changes struct renamedata to hold the dentry, rather than
the inode, for the old and new directories, and changes callers to
match. The names are also changed from a _dir suffix to _parent. This
is consistent with other usage in namei.c and elsewhere.
This results in the removal of several local variables and several
dereferences of ->d_inode at the cost of adding ->d_inode dereferences
to vfs_rename().
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/174977089072.608730.4244531834577097454@noble.neil.brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
vfs_mkdir() does not guarantee to leave the child dentry hashed or make
it positive on success, and in many such cases the filesystem had to use
a different dentry which it can now return.
This patch changes vfs_mkdir() to return the dentry provided by the
filesystems which is hashed and positive when provided. This reduces
the number of cases where the resulting dentry is not positive to a
handful which don't deserve extra efforts.
The only callers of vfs_mkdir() which are interested in the resulting
inode are in-kernel filesystem clients: cachefiles, nfsd, smb/server.
The only filesystems that don't reliably provide the inode are:
- kernfs, tracefs which these clients are unlikely to be interested in
- cifs in some configurations would need to do a lookup to find the
created inode, but doesn't. cifs cannot be exported via NFS, is
unlikely to be used by cachefiles, and smb/server only has a soft
requirement for the inode, so this is unlikely to be a problem in
practice.
- hostfs, nfs, cifs may need to do a lookup (rarely for NFS) and it is
possible for a race to make that lookup fail. Actual failure
is unlikely and providing callers handle negative dentries graceful
they will fail-safe.
So this patch removes the lookup code in nfsd and smb/server and adjusts
them to fail safe if a negative dentry is provided:
- cache-files already fails safe by restarting the task from the
top - it still does with this change, though it no longer calls
cachefiles_put_directory() as that will crash if the dentry is
negative.
- nfsd reports "Server-fault" which it what it used to do if the lookup
failed. This will never happen on any file-systems that it can actually
export, so this is of no consequence. I removed the fh_update()
call as that is not needed and out-of-place. A subsequent
nfsd_create_setattr() call will call fh_update() when needed.
- smb/server only wants the inode to call ksmbd_smb_inherit_owner()
which updates ->i_uid (without calling notify_change() or similar)
which can be safely skipping on cifs (I hope).
If a different dentry is returned, the first one is put. If necessary
the fact that it is new can be determined by comparing pointers. A new
dentry will certainly have a new pointer (as the old is put after the
new is obtained).
Similarly if an error is returned (via ERR_PTR()) the original dentry is
put.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-7-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have
complete control of sequencing on the actual filesystem (e.g. on a
different server) and may find that the inode created for a mkdir
request already exists in the icache and dcache by the time the mkdir
request returns. For example, if the filesystem is mounted twice the
directory could be visible on the other mount before it is on the
original mount, and a pair of name_to_handle_at(), open_by_handle_at()
calls could instantiate the directory inode with an IS_ROOT() dentry
before the first mkdir returns.
This means that the dentry passed to ->mkdir() may not be the one that
is associated with the inode after the ->mkdir() completes. Some
callers need to interact with the inode after the ->mkdir completes and
they currently need to perform a lookup in the (rare) case that the
dentry is no longer hashed.
This lookup-after-mkdir requires that the directory remains locked to
avoid races. Planned future patches to lock the dentry rather than the
directory will mean that this lookup cannot be performed atomically with
the mkdir.
To remove this barrier, this patch changes ->mkdir to return the
resulting dentry if it is different from the one passed in.
Possible returns are:
NULL - the directory was created and no other dentry was used
ERR_PTR() - an error occurred
non-NULL - this other dentry was spliced in
This patch only changes file-systems to return "ERR_PTR(err)" instead of
"err" or equivalent transformations. Subsequent patches will make
further changes to some file-systems to return a correct dentry.
Not all filesystems reliably result in a positive hashed dentry:
- NFS, cifs, hostfs will sometimes need to perform a lookup of
the name to get inode information. Races could result in this
returning something different. Note that this lookup is
non-atomic which is what we are trying to avoid. Placing the
lookup in filesystem code means it only happens when the filesystem
has no other option.
- kernfs and tracefs leave the dentry negative and the ->revalidate
operation ensures that lookup will be called to correctly populate
the dentry. This could be fixed but I don't think it is important
to any of the users of vfs_mkdir() which look at the dentry.
The recommendation to use
d_drop();d_splice_alias()
is ugly but fits with current practice. A planned future patch will
change this.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rename all calls to revert_creds_light() back to revert_creds().
Link: https://lore.kernel.org/r/20241125-work-cred-v2-6-68b9d38bb5b2@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rename all calls to override_creds_light() back to overrid_creds().
Link: https://lore.kernel.org/r/20241125-work-cred-v2-5-68b9d38bb5b2@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Instead of using ->private_data to point at realfile directly, so
that we can add more context per ovl open file.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Use override_creds_light() in ovl_override_creds() and
revert_creds_light() in ovl_revert_creds().
The _light() functions do not change the 'usage' of the credentials in
question, as they refer to the credentials associated with the
mounter, which have a longer lifetime.
In ovl_setup_cred_for_create(), do not need to modify the mounter
credentials (returned by override_creds_light()) 'usage' counter.
Add a warning to verify that we are indeed working with the mounter
credentials (stored in the superblock). Failure in this assumption
means that creds may leak.
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
ovl_setup_cred_for_create() decrements one refcount of new creds and
ovl_revert_creds() in callers decrements the last refcount.
In preparation to revert_creds_light() back to caller creds, pass an
explicit reference of the creators creds to the callers and drop the
refcount explicitly in the callers after ovl_revert_creds().
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|