summaryrefslogtreecommitdiff
path: root/fs/ceph/file.c
AgeCommit message (Collapse)Author
2015-04-15VFS: normal filesystems (and lustre): d_inode() annotationsDavid Howells
that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11mirror O_APPEND and O_DIRECT into iocb->ki_flagsAl Viro
... avoiding write_iter/fcntl races. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11switch generic_write_checks() to iocb and iterAl Viro
... returning -E... upon error and amount of data left in iter after (possible) truncation upon success. Note, that normal case gives a non-zero (positive) return value, so any tests for != 0 _must_ be updated. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Conflicts: fs/ext4/file.c
2015-04-11generic_write_checks(): drop isblk argumentAl Viro
all remaining callers are passing 0; some just obscure that fact. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11make new_sync_{read,write}() staticAl Viro
All places outside of core VFS that checked ->read and ->write for being NULL or called the methods directly are gone now, so NULL {read,write} with non-NULL {read,write}_iter will do the right thing in all cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11Merge branch 'iocb' into for-nextAl Viro
2015-03-25fs: move struct kiocb to fs.hChristoph Hellwig
struct kiocb now is a generic I/O container, so move it to fs.h. Also do a #include diet for aio.h while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-12fs: remove ki_nbytesChristoph Hellwig
There is no need to pass the total request length in the kiocb, as we already get passed in through the iov_iter argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22Merge branch 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted stuff from this cycle. The big ones here are multilayer overlayfs from Miklos and beginning of sorting ->d_inode accesses out from David" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits) autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation procfs: fix race between symlink removals and traversals debugfs: leave freeing a symlink body until inode eviction Documentation/filesystems/Locking: ->get_sb() is long gone trylock_super(): replacement for grab_super_passive() fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) SELinux: Use d_is_positive() rather than testing dentry->d_inode Smack: Use d_is_positive() rather than testing dentry->d_inode TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR() Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb VFS: Split DCACHE_FILE_TYPE into regular and special types VFS: Add a fallthrough flag for marking virtual dentries VFS: Add a whiteout dentry type VFS: Introduce inode-getting helpers for layered/unioned fs environments Infiniband: Fix potential NULL d_inode dereference posix_acl: fix reference leaks in posix_acl_create autofs4: Wrong format for printing dentry ...
2015-02-22VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)David Howells
Convert the following where appropriate: (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry). (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry). (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more complicated than it appears as some calls should be converted to d_can_lookup() instead. The difference is whether the directory in question is a real dir with a ->lookup op or whether it's a fake dir with a ->d_automount op. In some circumstances, we can subsume checks for dentry->d_inode not being NULL into this, provided we the code isn't in a filesystem that expects d_inode to be NULL if the dirent really *is* negative (ie. if we're going to use d_inode() rather than d_backing_inode() to get the inode pointer). Note that the dentry type field may be set to something other than DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS manages the fall-through from a negative dentry to a lower layer. In such a case, the dentry type of the negative union dentry is set to the same as the type of the lower dentry. However, if you know d_inode is not NULL at the call site, then you can use the d_is_xxx() functions even in a filesystem. There is one further complication: a 0,0 chardev dentry may be labelled DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was intended for special directory entry types that don't have attached inodes. The following perl+coccinelle script was used: use strict; my @callers; open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') || die "Can't grep for S_ISDIR and co. callers"; @callers = <$fd>; close($fd); unless (@callers) { print "No matches\n"; exit(0); } my @cocci = ( '@@', 'expression E;', '@@', '', '- S_ISLNK(E->d_inode->i_mode)', '+ d_is_symlink(E)', '', '@@', 'expression E;', '@@', '', '- S_ISDIR(E->d_inode->i_mode)', '+ d_is_dir(E)', '', '@@', 'expression E;', '@@', '', '- S_ISREG(E->d_inode->i_mode)', '+ d_is_reg(E)' ); my $coccifile = "tmp.sp.cocci"; open($fd, ">$coccifile") || die $coccifile; print($fd "$_\n") || die $coccifile foreach (@cocci); close($fd); foreach my $file (@callers) { chomp $file; print "Processing ", $file, "\n"; system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 || die "spatch failed"; } [AV: overlayfs parts skipped] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Sage Weil: "On the RBD side, there is a conversion to blk-mq from Christoph, several long-standing bug fixes from Ilya, and some cleanup from Rickard Strandqvist. On the CephFS side there is a long list of fixes from Zheng, including improved session handling, a few IO path fixes, some dcache management correctness fixes, and several blocking while !TASK_RUNNING fixes. The core code gets a few cleanups and Chaitanya has added support for TCP_NODELAY (which has been used on the server side for ages but we somehow missed on the kernel client). There is also an update to MAINTAINERS to fix up some email addresses and reflect that Ilya and Zheng are doing most of the maintenance for RBD and CephFS these days. Do not be surprised to see a pull request come from one of them in the future if I am unavailable for some reason" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits) MAINTAINERS: update Ceph and RBD maintainers libceph: kfree() in put_osd() shouldn't depend on authorizer libceph: fix double __remove_osd() problem rbd: convert to blk-mq ceph: return error for traceless reply race ceph: fix dentry leaks ceph: re-send requests when MDS enters reconnecting stage ceph: show nocephx_require_signatures and notcp_nodelay options libceph: tcp_nodelay support rbd: do not treat standalone as flatten ceph: fix atomic_open snapdir ceph: properly mark empty directory as complete client: include kernel version in client metadata ceph: provide seperate {inode,file}_operations for snapdir ceph: fix request time stamp encoding ceph: fix reading inline data when i_size > PAGE_SIZE ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions) ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps) ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync) rbd: fix error paths in rbd_dev_refresh() ...
2015-02-19ceph: fix atomic_open snapdirYan, Zheng
ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value and creates snapdir inode if it's -ENOENT Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: fix reading inline data when i_size > PAGE_SIZEYan, Zheng
when inode has inline data but its size > PAGE_SIZE (it was truncated to larger size), previous direct read code return -EIO. This patch adds code to return zeros for data whose offset > PAGE_SIZE. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-02-19ceph: properly zero data pages for file holes.Yan, Zheng
A bug is found in striped_read() of fs/ceph/file.c. striped_read() calls ceph_zero_pape_vector_range(). The first argument, page_align + read + ret, passed to ceph_zero_pape_vector_range() is wrong. When a file has holes, this wrong parameter may cause memory corruption either in kernal space or user space. Kernel space memory may be corrupted in the case of non direct IO; user space memory may be corrupted in the case of direct IO. In the latter case, the application doing direct IO may crash due to memory corruption, as we have experienced. The correct value should be initial_align + read + ret, where intial_align = o_direct ? buf_align : io_align. Compared with page_align, the current page offest, initial_align is the initial page offest, which should be used to calculate the page and offset in ceph_zero_pape_vector_range(). Reported-by: caifeng zhu <zhucaifeng@unissoft-nj.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-01-20fs: export inode_to_bdi and use it in favor of mapping->backing_dev_infoChristoph Hellwig
Now that we got rid of the bdi abuse on character devices we can always use sb->s_bdi to get at the backing_dev_info for a file, except for the block device special case. Export inode_to_bdi and replace uses of mapping->backing_dev_info with it to prepare for the removal of mapping->backing_dev_info. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-12-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "The big item here is support for inline data for CephFS and for message signatures from Zheng. There are also several bug fixes, including interrupted flock request handling, 0-length xattrs, mksnap, cached readdir results, and a message version compat field. Finally there are several cleanups from Ilya, Dan, and Markus. Note that there is another series coming soon that fixes some bugs in the RBD 'lingering' requests, but it isn't quite ready yet" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits) ceph: fix setting empty extended attribute ceph: fix mksnap crash ceph: do_sync is never initialized libceph: fixup includes in pagelist.h ceph: support inline data feature ceph: flush inline version ceph: convert inline data to normal data before data write ceph: sync read inline data ceph: fetch inline data when getting Fcr cap refs ceph: use getattr request to fetch inline data ceph: add inline data to pagecache ceph: parse inline data in MClientReply and MClientCaps libceph: specify position of extent operation libceph: add CREATE osd operation support libceph: add SETXATTR/CMPXATTR osd operations support rbd: don't treat CEPH_OSD_OP_DELETE as extent op ceph: remove unused stringification macros libceph: require cephx message signature by default ceph: introduce global empty snap context ceph: message versioning fixes ...
2014-12-17ceph: convert inline data to normal data before data writeYan, Zheng
Before any data write, convert inline data to normal data and set i_inline_version to CEPH_INLINE_NONE. The OSD request that saves inline data to object contains 3 operations (CMPXATTR, WRITE and SETXATTR). It compares a xattr named 'inline_version' to prevent old data overwrites newer data. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17ceph: sync read inline dataYan, Zheng
we can't use getattr to fetch inline data while holding Fr cap, because it can cause deadlock. If we need to sync read inline data, drop cap refs first, then use getattr to fetch inline data. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17ceph: fetch inline data when getting Fcr cap refsYan, Zheng
we can't use getattr to fetch inline data after getting Fcr caps, because it can cause deadlock. The solution is try bringing inline data to page cache when not holding any cap, and hope the inline data page is still there after getting the Fcr caps. If the page is still there, pin it in page cache for later IO. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17libceph: specify position of extent operationYan, Zheng
allow specifying position of extent operation in multi-operations osd request. This is required for cephfs to convert inline data to normal data (compare xattr, then write object). Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-11-19kill f_dentry usesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19assorted conversions to %p[dD]Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-14ceph: include the initial ACL in create/mkdir/mknod MDS requestsYan, Zheng
Current code set new file/directory's initial ACL in a non-atomic manner. Client first sends request to MDS to create new file/directory, then set the initial ACL after the new file/directory is successfully created. The fix is include the initial ACL in create/mkdir/mknod MDS requests. So MDS can handle creating file/directory and setting the initial ACL in one request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-10-14ceph: remove redundant io_iter_advance()Yan, Zheng
ceph_sync_read and generic_file_read_iter() have already advanced the IO iterator. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-10-14ceph: request xattrs if xattr_version is zeroYan, Zheng
Following sequence of events can happen. - Client releases an inode, queues cap release message. - A 'lookup' reply brings the same inode back, but the reply doesn't contain xattrs because MDS didn't receive the cap release message and thought client already has up-to-data xattrs. The fix is force sending a getattr request to MDS if xattrs_version is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so MDS knows client does not have xattr. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-07-28ceph: fix append mode writeYan, Zheng
generic_write_checks() may update 'pos', so we need to pass 'pos' to ceph_sync_write() and ceph_sync_direct_write(); Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-21ceph: check zero length in ceph_sync_read()Yan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-08ceph: pass proper page offset to copy_page_to_iter()Yan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-07-08ceph: check unsupported fallocate modeYan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-06-12ceph: switch to iter_file_splice_write()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06ceph: switch to ->write_iter()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06ceph_sync_direct_write: stop poking into iov_iter gutsAl Viro
all needed primitives are there... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06ceph_sync_read: stop poking into iov_iter gutsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06ceph: switch to ->read_iter()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06start adding the tag to iov_iterAl Viro
For now, just use the same thing we pass to ->direct_IO() - it's all iovec-based at the moment. Pass it explicitly to iov_iter_init() and account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO() uses. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06new helper: generic_file_read_iter()Al Viro
iov_iter-using variant of generic_file_aio_read(). Some callers converted. Note that it's still not quite there for use as ->read_iter() - we depend on having zero iter->iov_offset in O_DIRECT case. Fortunately, that's true for all converted callers (and for generic_file_aio_read() itself). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06ceph_aio_read(): keep iov_iter across retriesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06kill generic_segment_checks()Al Viro
all callers of ->aio_read() and ->aio_write() have iov/nr_segs already checked - generic_segment_checks() done after that is just an odd way to spell iov_length(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06kill iov_iter_copy_from_user()Al Viro
all callers can use copy_page_from_iter() and it actually simplifies them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-20Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "These are regression and bug fixes for ext4. We had a number of new features in ext4 during this merge window (ZERO_RANGE and COLLAPSE_RANGE fallocate modes, renameat, etc.) so there were many more regression and bug fixes this time around. It didn't help that xfstests hadn't been fully updated to fully stress test COLLAPSE_RANGE until after -rc1" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (31 commits) ext4: disable COLLAPSE_RANGE for bigalloc ext4: fix COLLAPSE_RANGE failure with 1KB block size ext4: use EINVAL if not a regular file in ext4_collapse_range() ext4: enforce we are operating on a regular file in ext4_zero_range() ext4: fix extent merging in ext4_ext_shift_path_extents() ext4: discard preallocations after removing space ext4: no need to truncate pagecache twice in collapse range ext4: fix removing status extents in ext4_collapse_range() ext4: use filemap_write_and_wait_range() correctly in collapse range ext4: use truncate_pagecache() in collapse range ext4: remove temporary shim used to merge COLLAPSE_RANGE and ZERO_RANGE ext4: fix ext4_count_free_clusters() with EXT4FS_DEBUG and bigalloc enabled ext4: always check ext4_ext_find_extent result ext4: fix error handling in ext4_ext_shift_extents ext4: silence sparse check warning for function ext4_trim_extent ext4: COLLAPSE_RANGE only works on extent-based files ext4: fix byte order problems introduced by the COLLAPSE_RANGE patches ext4: use i_size_read in ext4_unaligned_aio() fs: disallow all fallocate operation on active swapfile fs: move falloc collapse range check into the filesystem methods ...
2014-04-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "The first vfs pile, with deep apologies for being very late in this window. Assorted cleanups and fixes, plus a large preparatory part of iov_iter work. There's a lot more of that, but it'll probably go into the next merge window - it *does* shape up nicely, removes a lot of boilerplate, gets rid of locking inconsistencie between aio_write and splice_write and I hope to get Kent's direct-io rewrite merged into the same queue, but some of the stuff after this point is having (mostly trivial) conflicts with the things already merged into mainline and with some I want more testing. This one passes LTP and xfstests without regressions, in addition to usual beating. BTW, readahead02 in ltp syscalls testsuite has started giving failures since "mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages" - might be a false positive, might be a real regression..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) missing bits of "splice: fix racy pipe->buffers uses" cifs: fix the race in cifs_writev() ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure kill generic_file_buffered_write() ocfs2_file_aio_write(): switch to generic_perform_write() ceph_aio_write(): switch to generic_perform_write() xfs_file_buffered_aio_write(): switch to generic_perform_write() export generic_perform_write(), start getting rid of generic_file_buffer_write() generic_file_direct_write(): get rid of ppos argument btrfs_file_aio_write(): get rid of ppos kill the 5th argument of generic_file_buffered_write() kill the 4th argument of __generic_file_aio_write() lustre: don't open-code kernel_recvmsg() ocfs2: don't open-code kernel_recvmsg() drbd: don't open-code kernel_recvmsg() constify blk_rq_map_user_iov() and friends lustre: switch to kernel_sendmsg() ocfs2: don't open-code kernel_sendmsg() take iov_iter stuff to mm/iov_iter.c process_vm_access: tidy up a bit ...
2014-04-12fs: disallow all fallocate operation on active swapfileLukas Czerner
Currently some file system have IS_SWAPFILE check in their fallocate implementations and some do not. However we should really prevent any fallocate operation on swapfile so move the check to vfs and remove the redundant checks from the file systems fallocate implementations. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-04-12ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failureAl Viro
ceph_osdc_put_request(ERR_PTR(-error)) oopses. What we want there is break, not goto out. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-04ceph: drop extra open file reference in ceph_atomic_open()Yan, Zheng
ceph_atomic_open() calls ceph_open() after receiving the MDS reply. ceph_open() grabs an extra open file reference. (The open request already holds an open file reference) Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-03ceph: fscache: Update object store limit after file writingYunchuan Wen
Synchronize object->store_limit[_l] with new inode->i_size after file writing. Tested-by: Milosz Tanski <milosz@adfin.com> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com> Signed-off-by: Min Chen <minchen@ubuntukylin.com> Signed-off-by: Li Wang <liwang@ubuntukylin.com>
2014-04-03ceph: do not chain inode updates to parent fsyncSage Weil
The fsync(dirfd) only covers namespace operations, not inode updates. We do not need to cover setattr variants or O_TRUNC. Reported-by: Al Viro <viro@xeniv.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-04-01ceph_aio_write(): switch to generic_perform_write()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01kill the 5th argument of generic_file_buffered_write()Al Viro
same story - it's &iocb->ki_pos in all cases Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-02-17ceph: add missing init_acl() for mkdir() and atomic_open()Yan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-01-28ceph: cast PAGE_SIZE to size_t in ceph_sync_write()Ilya Dryomov
Use min_t(size_t, ...) instead of plain min(), which does strict type checking, to avoid compile warning on i386. Cc: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>