diff options
Diffstat (limited to 'Documentation/filesystems/vfs.txt')
-rw-r--r-- | Documentation/filesystems/vfs.txt | 173 |
1 files changed, 132 insertions, 41 deletions
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 20899e095e7e..ef0714aa8e40 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -95,10 +95,11 @@ functions: extern int unregister_filesystem(struct file_system_type *); The passed struct file_system_type describes your filesystem. When a -request is made to mount a device onto a directory in your filespace, -the VFS will call the appropriate get_sb() method for the specific -filesystem. The dentry for the mount point will then be updated to -point to the root inode for the new filesystem. +request is made to mount a filesystem onto a directory in your namespace, +the VFS will call the appropriate mount() method for the specific +filesystem. New vfsmount refering to the tree returned by ->mount() +will be attached to the mountpoint, so that when pathname resolution +reaches the mountpoint it will jump into the root of that vfsmount. You can see all filesystems that are registered to the kernel in the file /proc/filesystems. @@ -107,14 +108,14 @@ file /proc/filesystems. struct file_system_type ----------------------- -This describes the filesystem. As of kernel 2.6.22, the following +This describes the filesystem. As of kernel 2.6.39, the following members are defined: struct file_system_type { const char *name; int fs_flags; - int (*get_sb) (struct file_system_type *, int, - const char *, void *, struct vfsmount *); + struct dentry (*mount) (struct file_system_type *, int, + const char *, void *); void (*kill_sb) (struct super_block *); struct module *owner; struct file_system_type * next; @@ -128,11 +129,11 @@ struct file_system_type { fs_flags: various flags (i.e. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.) - get_sb: the method to call when a new instance of this + mount: the method to call when a new instance of this filesystem should be mounted kill_sb: the method to call when an instance of this filesystem - should be unmounted + should be shut down owner: for internal VFS use: you should initialize this to THIS_MODULE in most cases. @@ -141,7 +142,7 @@ struct file_system_type { s_lock_key, s_umount_key: lockdep-specific -The get_sb() method has the following arguments: +The mount() method has the following arguments: struct file_system_type *fs_type: describes the filesystem, partly initialized by the specific filesystem code @@ -153,32 +154,39 @@ The get_sb() method has the following arguments: void *data: arbitrary mount options, usually comes as an ASCII string (see "Mount Options" section) - struct vfsmount *mnt: a vfs-internal representation of a mount point +The mount() method must return the root dentry of the tree requested by +caller. An active reference to its superblock must be grabbed and the +superblock must be locked. On failure it should return ERR_PTR(error). -The get_sb() method must determine if the block device specified -in the dev_name and fs_type contains a filesystem of the type the method -supports. If it succeeds in opening the named block device, it initializes a -struct super_block descriptor for the filesystem contained by the block device. -On failure it returns an error. +The arguments match those of mount(2) and their interpretation +depends on filesystem type. E.g. for block filesystems, dev_name is +interpreted as block device name, that device is opened and if it +contains a suitable filesystem image the method creates and initializes +struct super_block accordingly, returning its root dentry to caller. + +->mount() may choose to return a subtree of existing filesystem - it +doesn't have to create a new one. The main result from the caller's +point of view is a reference to dentry at the root of (sub)tree to +be attached; creation of new superblock is a common side effect. The most interesting member of the superblock structure that the -get_sb() method fills in is the "s_op" field. This is a pointer to +mount() method fills in is the "s_op" field. This is a pointer to a "struct super_operations" which describes the next level of the filesystem implementation. -Usually, a filesystem uses one of the generic get_sb() implementations -and provides a fill_super() method instead. The generic methods are: +Usually, a filesystem uses one of the generic mount() implementations +and provides a fill_super() callback instead. The generic variants are: - get_sb_bdev: mount a filesystem residing on a block device + mount_bdev: mount a filesystem residing on a block device - get_sb_nodev: mount a filesystem that is not backed by a device + mount_nodev: mount a filesystem that is not backed by a device - get_sb_single: mount a filesystem which shares the instance between + mount_single: mount a filesystem which shares the instance between all mounts -A fill_super() method implementation has the following arguments: +A fill_super() callback implementation has the following arguments: - struct super_block *sb: the superblock structure. The method fill_super() + struct super_block *sb: the superblock structure. The callback must initialize this properly. void *data: arbitrary mount options, usually comes as an ASCII @@ -325,7 +333,8 @@ struct inode_operations { void * (*follow_link) (struct dentry *, struct nameidata *); void (*put_link) (struct dentry *, struct nameidata *, void *); void (*truncate) (struct inode *); - int (*permission) (struct inode *, int, struct nameidata *); + int (*permission) (struct inode *, int, unsigned int); + int (*check_acl)(struct inode *, int, unsigned int); int (*setattr) (struct dentry *, struct iattr *); int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *); int (*setxattr) (struct dentry *, const char *,const void *,size_t,int); @@ -414,6 +423,13 @@ otherwise noted. permission: called by the VFS to check for access rights on a POSIX-like filesystem. + May be called in rcu-walk mode (flags & IPERM_FLAG_RCU). If in rcu-walk + mode, the filesystem must check the permission without blocking or + storing to the inode. + + If a situation is encountered that rcu-walk cannot handle, return + -ECHILD and it will be called again in ref-walk mode. + setattr: called by the VFS to set attributes for a file. This method is called by chmod(2) and related system calls. @@ -847,12 +863,17 @@ defined: struct dentry_operations { int (*d_revalidate)(struct dentry *, struct nameidata *); - int (*d_hash) (struct dentry *, struct qstr *); - int (*d_compare) (struct dentry *, struct qstr *, struct qstr *); - int (*d_delete)(struct dentry *); + int (*d_hash)(const struct dentry *, const struct inode *, + struct qstr *); + int (*d_compare)(const struct dentry *, const struct inode *, + const struct dentry *, const struct inode *, + unsigned int, const char *, const struct qstr *); + int (*d_delete)(const struct dentry *); void (*d_release)(struct dentry *); void (*d_iput)(struct dentry *, struct inode *); char *(*d_dname)(struct dentry *, char *, int); + struct vfsmount *(*d_automount)(struct path *); + int (*d_manage)(struct dentry *, bool, bool); }; d_revalidate: called when the VFS needs to revalidate a dentry. This @@ -860,13 +881,45 @@ struct dentry_operations { dcache. Most filesystems leave this as NULL, because all their dentries in the dcache are valid - d_hash: called when the VFS adds a dentry to the hash table + d_revalidate may be called in rcu-walk mode (nd->flags & LOOKUP_RCU). + If in rcu-walk mode, the filesystem must revalidate the dentry without + blocking or storing to the dentry, d_parent and d_inode should not be + used without care (because they can go NULL), instead nd->inode should + be used. + + If a situation is encountered that rcu-walk cannot handle, return + -ECHILD and it will be called again in ref-walk mode. + + d_hash: called when the VFS adds a dentry to the hash table. The first + dentry passed to d_hash is the parent directory that the name is + to be hashed into. The inode is the dentry's inode. - d_compare: called when a dentry should be compared with another + Same locking and synchronisation rules as d_compare regarding + what is safe to dereference etc. - d_delete: called when the last reference to a dentry is - deleted. This means no-one is using the dentry, however it is - still valid and in the dcache + d_compare: called to compare a dentry name with a given name. The first + dentry is the parent of the dentry to be compared, the second is + the parent's inode, then the dentry and inode (may be NULL) of the + child dentry. len and name string are properties of the dentry to be + compared. qstr is the name to compare it with. + + Must be constant and idempotent, and should not take locks if + possible, and should not or store into the dentry or inodes. + Should not dereference pointers outside the dentry or inodes without + lots of care (eg. d_parent, d_inode, d_name should not be used). + + However, our vfsmount is pinned, and RCU held, so the dentries and + inodes won't disappear, neither will our sb or filesystem module. + ->i_sb and ->d_sb may be used. + + It is a tricky calling convention because it needs to be called under + "rcu-walk", ie. without any locks or references on things. + + d_delete: called when the last reference to a dentry is dropped and the + dcache is deciding whether or not to cache it. Return 1 to delete + immediately, or 0 to cache the dentry. Default is NULL which means to + always cache a reachable dentry. d_delete must be constant and + idempotent. d_release: called when a dentry is really deallocated @@ -887,6 +940,47 @@ struct dentry_operations { at the end of the buffer, and returns a pointer to the first char. dynamic_dname() helper function is provided to take care of this. + d_automount: called when an automount dentry is to be traversed (optional). + This should create a new VFS mount record and return the record to the + caller. The caller is supplied with a path parameter giving the + automount directory to describe the automount target and the parent + VFS mount record to provide inheritable mount parameters. NULL should + be returned if someone else managed to make the automount first. If + the vfsmount creation failed, then an error code should be returned. + If -EISDIR is returned, then the directory will be treated as an + ordinary directory and returned to pathwalk to continue walking. + + If a vfsmount is returned, the caller will attempt to mount it on the + mountpoint and will remove the vfsmount from its expiration list in + the case of failure. The vfsmount should be returned with 2 refs on + it to prevent automatic expiration - the caller will clean up the + additional ref. + + This function is only used if DCACHE_NEED_AUTOMOUNT is set on the + dentry. This is set by __d_instantiate() if S_AUTOMOUNT is set on the + inode being added. + + d_manage: called to allow the filesystem to manage the transition from a + dentry (optional). This allows autofs, for example, to hold up clients + waiting to explore behind a 'mountpoint' whilst letting the daemon go + past and construct the subtree there. 0 should be returned to let the + calling process continue. -EISDIR can be returned to tell pathwalk to + use this directory as an ordinary directory and to ignore anything + mounted on it and not to check the automount flag. Any other error + code will abort pathwalk completely. + + If the 'mounting_here' parameter is true, then namespace_sem is being + held by the caller and the function should not initiate any mounts or + unmounts that it will then wait for. + + If the 'rcu_walk' parameter is true, then the caller is doing a + pathwalk in RCU-walk mode. Sleeping is not permitted in this mode, + and the caller can be asked to leave it and call again by returing + -ECHILD. + + This function is only used if DCACHE_MANAGE_TRANSIT is set on the + dentry being transited from. + Example : static char *pipefs_dname(struct dentry *dent, char *buffer, int buflen) @@ -910,14 +1004,11 @@ manipulate dentries: the usage count) dput: close a handle for a dentry (decrements the usage count). If - the usage count drops to 0, the "d_delete" method is called - and the dentry is placed on the unused list if the dentry is - still in its parents hash list. Putting the dentry on the - unused list just means that if the system needs some RAM, it - goes through the unused list of dentries and deallocates them. - If the dentry has already been unhashed and the usage count - drops to 0, in this case the dentry is deallocated after the - "d_delete" method is called + the usage count drops to 0, and the dentry is still in its + parent's hash, the "d_delete" method is called to check whether + it should be cached. If it should not be cached, or if the dentry + is not hashed, it is deleted. Otherwise cached dentries are put + into an LRU list to be reclaimed on memory shortage. d_drop: this unhashes a dentry from its parents hash list. A subsequent call to dput() will deallocate the dentry if its |