summaryrefslogtreecommitdiff
path: root/fs/namei.c
AgeCommit message (Collapse)Author
2016-05-02lookup_open(): put the dentry fed to ->lookup() or ->atomic_open() into ↵Al Viro
in-lookup hash Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02lookup_open(): expand the call of real_lookup()Al Viro
... and lose the duplicate IS_DEADDIR() - we'd already checked that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02atomic_open(): reorder and clean up a bitAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02lookup_open(): lift the "fallback to !O_CREAT" logics from atomic_open()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02atomic_open(): be paranoid about may_open() return valueAl Viro
It should never return positives; however, with Linux S&M crowd involved, no bogosity is impossible. Results would be unpleasant... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02atomic_open(): delay open_to_namei_flags() until the method callAl Viro
nobody else needs that transformation. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02do_last(): take fput() on error after opening to out:Al Viro
make it conditional on *opened & FILE_OPENED; in addition to getting rid of exit_fput: thing, it simplifies atomic_open() cleanup on may_open() failure. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02do_last(): get rid of duplicate ELOOP checkAl Viro
may_open() will catch it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02atomic_open(): massage the create_error logics a bitAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02atomic_open(): consolidate "overridden ENOENT" in open-yourself casesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02atomic_open(): don't bother with EEXIST check - it's done in do_last()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02Merge branch 'for-linus' into work.lookupsAl Viro
2016-05-02lookup_open(): expand the call of vfs_create()Al Viro
Lift IS_DEADDIR handling up into the part common with atomic_open(), remove it from the latter. Collapse permission checks into the call of may_o_create(), getting it closer to atomic_open() case. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02path_openat(): take O_PATH handling out of do_last()Al Viro
do_last() and lookup_open() simpler that way and so does O_PATH itself. As it bloody well should: we find what the pathname resolves to, same way as in stat() et.al. and associate it with FMODE_PATH struct file. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02parallel lookups: actual switch to rwsemAl Viro
ta-da! The main issue is the lack of down_write_killable(), so the places like readdir.c switched to plain inode_lock(); once killable variants of rwsem primitives appear, that'll be dealt with. lockdep side also might need more work Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02parallel lookups machinery, part 4 (and last)Al Viro
If we *do* run into an in-lookup match, we need to wait for it to cease being in-lookup. Fortunately, we do have unused space in in-lookup dentries - d_lru is never looked at until it stops being in-lookup. So we can stash a pointer to wait_queue_head from stack frame of the caller of ->lookup(). Some precautions are needed while waiting, but it's not that hard - we do hold a reference to dentry we are waiting for, so it can't go away. If it's found to be in-lookup the wait_queue_head is still alive and will remain so at least while ->d_lock is held. Moreover, the condition we are waiting for becomes true at the same point where everything on that wq gets woken up, so we can just add ourselves to the queue once. d_alloc_parallel() gets a pointer to wait_queue_head_t from its caller; lookup_slow() adjusted, d_add_ci() taught to use d_alloc_parallel() if the dentry passed to it happens to be in-lookup one (i.e. if it's been called from the parallel lookup). That's pretty much it - all that remains is to switch ->i_mutex to rwsem and have lookup_slow() take it shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02parallel lookups machinery, part 3Al Viro
We will need to be able to check if there is an in-lookup dentry with matching parent/name. Right now it's impossible, but as soon as start locking directories shared such beasts will appear. Add a secondary hash for locating those. Hash chains go through the same space where d_alias will be once it's not in-lookup anymore. Search is done under the same bitlock we use for modifications - with the primary hash we can rely on d_rehash() into the wrong chain being the worst that could happen, but here the pointers are buggered once it's removed from the chain. On the other hand, the chains are not going to be long and normally we'll end up adding to the chain anyway. That allows us to avoid bothering with ->d_lock when doing the comparisons - everything is stable until removed from chain. New helper: d_alloc_parallel(). Right now it allocates, verifies that no hashed and in-lookup matches exist and adds to in-lookup hash. Returns ERR_PTR() for error, hashed match (in the unlikely case it's been found) or new dentry. In-lookup matches trigger BUG() for now; that will change in the next commit when we introduce waiting for ongoing lookup to finish. Note that in-lookup matches won't be possible until we actually go for shared locking. lookup_slow() switched to use of d_alloc_parallel(). Again, these commits are separated only for making it easier to review. All this machinery will start doing something useful only when we go for shared locking; it's just that the combination is too large for my taste. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02beginning of transition to parallel lookups - marking in-lookup dentriesAl Viro
marked as such when (would be) parallel lookup is about to pass them to actual ->lookup(); unmarked when * __d_add() is about to make it hashed, positive or not. * __d_move() (from d_splice_alias(), directly or via __d_unalias()) puts a preexisting dentry in its place * in caller of ->lookup() if it has escaped all of the above. Bug (WARN_ON, actually) if it reaches the final dput() or d_instantiate() while still marked such. As the result, we are guaranteed that for as long as the flag is set, dentry will * remain negative unhashed with positive refcount * never have its ->d_alias looked at * never have its ->d_lru looked at * never have its ->d_parent and ->d_name changed Right now we have at most one such for any given parent directory. With parallel lookups that restriction will weaken to * only exist when parent is locked shared * at most one with given (parent,name) pair (comparison of names is according to ->d_compare()) * only exist when there's no hashed dentry with the same (parent,name) Transition will take the next several commits; unfortunately, we'll only be able to switch to rwsem at the end of this series. The reason for not making it a single patch is to simplify review. New primitives: d_in_lookup() (a predicate checking if dentry is in the in-lookup state) and d_lookup_done() (tells the system that we are done with lookup and if it's still marked as in-lookup, it should cease to be such). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02lookup_slow(): bugger off on IS_DEADDIR() from the very beginningAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02Merge getxattr prototype change into work.lookupsAl Viro
The rest of work.xattr stuff isn't needed for this branch
2016-05-01ima: add support for creating files using the mknodat syscallMimi Zohar
Commit 3034a14 "ima: pass 'opened' flag to identify newly created files" stopped identifying empty files as new files. However new empty files can be created using the mknodat syscall. On systems with IMA-appraisal enabled, these empty files are not labeled with security.ima extended attributes properly, preventing them from subsequently being opened in order to write the file data contents. This patch defines a new hook named ima_post_path_mknod() to mark these empty files, created using mknodat, as new in order to allow the file data contents to be written. In addition, files with security.ima xattrs containing a file signature are considered "immutable" and can not be modified. The file contents need to be written, before signing the file. This patch relaxes this requirement for new files, allowing the file signature to be written before the file contents. Changelog: - defer identifying files with signatures stored as security.ima (based on Dmitry Rozhkov's comments) - removing tests (eg. dentry, dentry->d_inode, inode->i_size == 0) (based on Al's review) Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Al Viro <<viro@zeniv.linux.org.uk> Tested-by: Dmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
2016-04-30atomic_open(): fix the handling of create_errorAl Viro
* if we have a hashed negative dentry and either CREAT|EXCL on r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL with failing may_o_create(), we should fail with EROFS or the error may_o_create() has returned, but not ENOENT. Which is what the current code ends up returning. * if we have CREAT|TRUNC hitting a regular file on a read-only filesystem, we can't fail with EROFS here. At the very least, not until we'd done follow_managed() - we might have a writable file (or a device, for that matter) bound on top of that one. Moreover, the code downstream will see that O_TRUNC and attempt to grab the write access (*after* following possible mount), so if we really should fail with EROFS, it will happen. No need to do that inside atomic_open(). The real logics is much simpler than what the current code is trying to do - if we decided to go for simple lookup, ended up with a negative dentry *and* had create_error set, fail with create_error. No matter whether we'd got that negative dentry from lookup_real() or had found it in dcache. Cc: stable@vger.kernel.org # v3.6+ Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10don't bother with ->d_inode->i_sb - it's always equal to ->d_sbAl Viro
... and neither can ever be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-06xfs: use ->readlink to implement the readlink_by_handle ioctlChristoph Hellwig
Also drop the now unused readlink_copy export. [dchinner: use d_inode(dentry) rather than dentry->d_inode] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-03-31posix_acl: Inode acl caching fixesAndreas Gruenbacher
When get_acl() is called for an inode whose ACL is not cached yet, the get_acl inode operation is called to fetch the ACL from the filesystem. The inode operation is responsible for updating the cached acl with set_cached_acl(). This is done without locking at the VFS level, so another task can call set_cached_acl() or forget_cached_acl() before the get_acl inode operation gets to calling set_cached_acl(), and then get_acl's call to set_cached_acl() results in caching an outdate ACL. Prevent this from happening by setting the cached ACL pointer to a task-specific sentinel value before calling the get_acl inode operation. Move the responsibility for updating the cached ACL from the get_acl inode operations to get_acl(). There, only set the cached ACL if the sentinel value hasn't changed. The sentinel values are chosen to have odd values. Likewise, the value of ACL_NOT_CACHED is odd. In contrast, ACL object pointers always have an even value (ACLs are aligned in memory). This allows to distinguish uncached ACLs values from ACL objects. In addition, switch from guarding inode->i_acl and inode->i_default_acl upates by the inode->i_lock spinlock to using xchg() and cmpxchg(). Filesystems that do not want ACLs returned from their get_acl inode operations to be cached must call forget_cached_acl() to prevent the VFS from doing so. (Patch written by Al Viro and Andreas Gruenbacher.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-31fix the braino in "namei: massage lookup_slow() to be usable by ↵Al Viro
lookup_one_len_unlocked()" We should try to trigger automount *before* bailing out on negative dentry. Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Reported-by: Arend van Spriel <arend@broadcom.com> Tested-by: Arend van Spriel <arend@broadcom.com> Tested-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-28constify security_path_{mkdir,mknod,symlink}Al Viro
... as well as unix_mknod() and may_o_create() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-14kill dentry_unhash()Al Viro
the last user is gone Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-14namei: teach lookup_slow() to skip revalidateAl Viro
... and make mountpoint_last() use it. That makes all candidates for lookup with parent locked shared go through lookup_slow(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-14namei: massage lookup_slow() to be usable by lookup_one_len_unlocked()Al Viro
Return dentry and don't pass nameidata or path; lift crossing mountpoints into the caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-14lookup_one_len_unlocked(): use lookup_dcache()Al Viro
No need to lock parent just because of ->d_revalidate() on child; contrary to the stale comment, lookup_dcache() *can* be used without locking the parent. Result can be moved as soon as we return, of course, but the same is true for lookup_one_len_unlocked() itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-14namei: simplify invalidation logics in lookup_dcache()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-14namei: change calling conventions for lookup_{fast,slow} and follow_managed()Al Viro
Have lookup_fast() return 1 on success and 0 on "need to fall back"; lookup_slow() and follow_managed() return positive (1) on success. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-14namei: untanlge lookup_fast()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-05lookup_dcache(): lift d_alloc() into callersAl Viro
... and kill need_lookup thing Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-05do_last(): reorder and simplify a bitAl Viro
bugger off on negatives a bit earlier, simplify the tests Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27do_last(): ELOOP failure exit should be done after leaving RCU modeAl Viro
... or we risk seeing a bogus value of d_is_symlink() there. Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27should_follow_link(): validate ->d_seq after having decided to followAl Viro
... otherwise d_is_symlink() above might have nothing to do with the inode value we've got. Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27namei: ->d_inode of a pinned dentry is stable only for positivesAl Viro
both do_last() and walk_component() risk picking a NULL inode out of dentry about to become positive, *then* checking its flags and seeing that it's not negative anymore and using (already stale by then) value they'd fetched earlier. Usually ends up oopsing soon after that... Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-27do_last(): don't let a bogus return value from ->open() et.al. to confuse usAl Viro
... into returning a positive to path_openat(), which would interpret that as "symlink had been encountered" and proceed to corrupt memory, etc. It can only happen due to a bug in some ->open() instance or in some LSM hook, etc., so we report any such event *and* make sure it doesn't trick us into further unpleasantness. Cc: stable@vger.kernel.org # v3.6+, at least Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22wrappers for ->i_mutex accessAl Viro
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-12Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "All kinds of stuff. That probably should've been 5 or 6 separate branches, but by the time I'd realized how large and mixed that bag had become it had been too close to -final to play with rebasing. Some fs/namei.c cleanups there, memdup_user_nul() introduction and switching open-coded instances, burying long-dead code, whack-a-mole of various kinds, several new helpers for ->llseek(), assorted cleanups and fixes from various people, etc. One piece probably deserves special mention - Neil's lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets called without ->i_mutex and tries to avoid ever taking it. That, of course, means that it's not useful for any directory modifications, but things like getting inode attributes in nfds readdirplus are fine with that. I really should've asked for moratorium on lookup-related changes this cycle, but since I hadn't done that early enough... I *am* asking for that for the coming cycle, though - I'm going to try and get conversion of i_mutex to rwsem with ->lookup() done under lock taken shared. There will be a patch closer to the end of the window, along the lines of the one Linus had posted last May - mechanical conversion of ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/ inode_is_locked()/inode_lock_nested(). To quote Linus back then: ----- | This is an automated patch using | | sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/' | sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/' | sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/' | sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/' | sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/' | | with a very few manual fixups ----- I'm going to send that once the ->i_mutex-affecting stuff in -next gets mostly merged (or when Linus says he's about to stop taking merges)" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) nfsd: don't hold i_mutex over userspace upcalls fs:affs:Replace time_t with time64_t fs/9p: use fscache mutex rather than spinlock proc: add a reschedule point in proc_readfd_common() logfs: constify logfs_block_ops structures fcntl: allow to set O_DIRECT flag on pipe fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE fs: xattr: Use kvfree() [s390] page_to_phys() always returns a multiple of PAGE_SIZE nbd: use ->compat_ioctl() fs: use block_device name vsprintf helper lib/vsprintf: add %*pg format specifier fs: use gendisk->disk_name where possible poll: plug an unused argument to do_poll amdkfd: don't open-code memdup_user() cdrom: don't open-code memdup_user() rsxx: don't open-code memdup_user() mtip32xx: don't open-code memdup_user() [um] mconsole: don't open-code memdup_user_nul() [um] hostaudio: don't open-code memdup_user() ...
2016-01-09nfsd: don't hold i_mutex over userspace upcallsNeilBrown
We need information about exports when crossing mountpoints during lookup or NFSv4 readdir. If we don't already have that information cached, we may have to ask (and wait for) rpc.mountd. In both cases we currently hold the i_mutex on the parent of the directory we're asking rpc.mountd about. We've seen situations where rpc.mountd performs some operation on that directory that tries to take the i_mutex again, resulting in deadlock. With some care, we may be able to avoid that in rpc.mountd. But it seems better just to avoid holding a mutex while waiting on userspace. It appears that lookup_one_len is pretty much the only operation that needs the i_mutex. So we could just drop the i_mutex elsewhere and do something like mutex_lock() lookup_one_len() mutex_unlock() In many cases though the lookup would have been cached and not required the i_mutex, so it's more efficient to create a lookup_one_len() variant that only takes the i_mutex when necessary. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04don't carry MAY_OPEN in op->acc_modeAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-30switch ->get_link() to delayed_call, kill ->put_link()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08teach page_get_link() to work in RCU modeAl Viro
more or less along the lines of Neil's patchset, sans the insanity around kmap(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08replace ->follow_link() with new method that could stay in RCU modeAl Viro
new method: ->get_link(); replacement of ->follow_link(). The differences are: * inode and dentry are passed separately * might be called both in RCU and non-RCU mode; the former is indicated by passing it a NULL dentry. * when called that way it isn't allowed to block and should return ERR_PTR(-ECHILD) if it needs to be called in non-RCU mode. It's a flagday change - the old method is gone, all in-tree instances converted. Conversion isn't hard; said that, so far very few instances do not immediately bail out when called in RCU mode. That'll change in the next commits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-08don't put symlink bodies in pagecache into highmemAl Viro
kmap() in page_follow_link_light() needed to go - allowing to hold an arbitrary number of kmaps for long is a great way to deadlocking the system. new helper (inode_nohighmem(inode)) needs to be used for pagecache symlinks inodes; done for all in-tree cases. page_follow_link_light() instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06restore_nameidata(): no need to clear now->stackAl Viro
microoptimization: in all callers *now is in the frame we are about to leave. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-06namei.c: take "jump to root" into a new helperAl Viro
... and use it both in path_init() (for absolute pathnames) and get_link() (for absolute symlinks). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>