summaryrefslogtreecommitdiff
path: root/fs/xfs
AgeCommit message (Collapse)Author
2013-06-07xfs: kill suid/sgid through the truncate path.Dave Chinner
commit 2962f5a5dcc56f69cbf62121a7be67cc15d6940b upstream. XFS has failed to kill suid/sgid bits correctly when truncating files of non-zero size since commit c4ed4243 ("xfs: split xfs_setattr") introduced in the 3.1 kernel. Fix it. Fix it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-03xfs: Fix possible use-after-free with AIOJan Kara
commit 4b05d09c18d9aa62d2e7fb4b057f54e5a38963f5 upstream. Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. Signed-off-by: Jan Kara <jack@suse.cz> CC: xfs@oss.sgi.com CC: Ben Myers <bpm@sgi.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26xfs: drop buffer io reference when a bad bio is builtDave Chinner
commit d69043c42d8c6414fa28ad18d99973aa6c1c2e24 upstream. Error handling in xfs_buf_ioapply_map() does not handle IO reference counts correctly. We increment the b_io_remaining count before building the bio, but then fail to decrement it in the failure case. This leads to the buffer never running IO completion and releasing the reference that the IO holds, so at unmount we can leak the buffer. This leak is captured by this assert failure during unmount: XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273 This is not a new bug - the b_io_remaining accounting has had this problem for a long, long time - it's just very hard to get a zero length bio being built by this code... Further, the buffer IO error can be overwritten on a multi-segment buffer by subsequent bio completions for partial sections of the buffer. Hence we should only set the buffer error status if the buffer is not already carrying an error status. This ensures that a partial IO error on a multi-segment buffer will not be lost. This part of the problem is a regression, however. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-17xfs: fix reading of wrapped log dataDave Chinner
commit 6ce377afd1755eae5c93410ca9a1121dfead7b87 upstream. Commit 4439647 ("xfs: reset buffer pointers before freeing them") in 3.0-rc1 introduced a regression when recovering log buffers that wrapped around the end of log. The second part of the log buffer at the start of the physical log was being read into the header buffer rather than the data buffer, and hence recovery was seeing garbage in the data buffer when it got to the region of the log buffer that was incorrectly read. Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-21tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checkingHugh Dickins
commit 35c2a7f4908d404c9124c2efc6ada4640ca4d5d5 upstream. Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(), u64 inum = fid->raw[2]; which is unhelpfully reported as at the end of shmem_alloc_inode(): BUG: unable to handle kernel paging request at ffff880061cd3000 IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Call Trace: [<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0 [<ffffffff812d77c3>] do_handle_open+0x163/0x2c0 [<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10 [<ffffffff83a5f3f8>] tracesys+0xe1/0xe6 Right, tmpfs is being stupid to access fid->raw[2] before validating that fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may fall at the end of a page, and the next page not be present. But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and could oops in the same way: add the missing fh_len checks to those. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Sage Weil <sage@inktank.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-28Merge tag 'split-asm_system_h-for-linus-20120328' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...
2012-03-28Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull XFS update (part 2) from Ben Myers: "Fixes for tracing of xfs_name strings, flag handling in open_by_handle, a log space hang with freeze/unfreeze, fstrim offset calculations, a section mismatch with xfs_qm_exit, an oops in xlog_recover_process_iunlinks, and a deadlock in xfs_rtfree_extent. There are also additional trace points for attributes, and the addition of a workqueue for allocation to work around kernel stack size limitations." * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: add lots of attribute trace points xfs: Fix oops on IO error during xlog_recover_process_iunlinks() xfs: fix fstrim offset calculations xfs: Account log unmount transaction correctly xfs: don't cache inodes read through bulkstat xfs: trace xfs_name strings correctly xfs: introduce an allocation workqueue xfs: Fix open flag handling in open_by_handle code xfs: fix deadlock in xfs_rtfree_extent fs: xfs: fix section mismatch in linux-next
2012-03-28Remove all #inclusions of asm/system.hDavid Howells
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-27xfs: add lots of attribute trace pointsDave Chinner
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-27xfs: Fix oops on IO error during xlog_recover_process_iunlinks()Jan Kara
When an IO error happens during inode deletion run from xlog_recover_process_iunlinks() filesystem gets shutdown. Thus any subsequent attempt to read buffers fails. Code in xlog_recover_process_iunlinks() does not count with the fact that read of a buffer which was read a while ago can really fail which results in the oops on agi = XFS_BUF_TO_AGI(agibp); Fix the problem by cleaning up the buffer handling in xlog_recover_process_iunlinks() as suggested by Dave Chinner. We release buffer lock but keep buffer reference to AG buffer. That is enough for buffer to stay pinned in memory and we don't have to call xfs_read_agi() all the time. CC: stable@kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-27xfs: fix fstrim offset calculationsDave Chinner
xfs_ioc_fstrim() doesn't treat the incoming offset and length correctly. It treats them as a filesystem block address, rather than a disk address. This is wrong because the range passed in is a linear representation, while the filesystem block address notation is a sparse representation. Hence we cannot convert the range direct to filesystem block units and then use that for calculating the range to trim. While this sounds dangerous, the problem is limited to calculating what AGs need to be trimmed. The code that calcuates the actual ranges to trim gets the right result (i.e. only ever discards free space), even though it uses the wrong ranges to limit what is trimmed. Hence this is not a bug that endangers user data. Fix this by treating the range as a disk address range and use the appropriate functions to convert the range into the desired formats for calculations. Further, fix the first free extent lookup (the longest) to actually find the largest free extent. Currently this lookup uses a <= lookup, which results in finding the extent to the left of the largest because we can never get an exact match on the largest extent. This is due to the fact that while we know it's size, we don't know it's location and so the exact match fails and we move one record to the left to get the next largest extent. Instead, use a >= search so that the lookup returns the largest extent regardless of the fact we don't get an exact match on it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-26xfs: Account log unmount transaction correctlyDave Chinner
There have been a few reports of this warning appearing recently: XFS (dm-4): xlog_space_left: head behind tail tail_cycle = 129, tail_bytes = 20163072 GH cycle = 129, GH bytes = 20162880 The common cause appears to be lots of freeze and unfreeze cycles, and the output from the warnings indicates that we are leaking around 8 bytes of log space per freeze/unfreeze cycle. When we freeze the filesystem, we write an unmount record and that uses xlog_write directly - a special type of transaction, effectively. What it doesn't do, however, is correctly account for the log space it uses. The unmount record writes an 8 byte structure with a special magic number into the log, and the space this consumes is not accounted for in the log ticket tracking the operation. Hence we leak 8 bytes every unmount record that is written. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-26xfs: don't cache inodes read through bulkstatDave Chinner
When we read inodes via bulkstat, we generally only read them once and then throw them away - they never get used again. If we retain them in cache, then it simply causes the working set of inodes and other cached items to be reclaimed just so the inode cache can grow. Avoid this problem by marking inodes read by bulkstat not to be cached and check this flag in .drop_inode to determine whether the inode should be added to the VFS LRU or not. If the inode lookup hits an already cached inode, then don't set the flag. If the inode lookup hits an inode marked with no cache flag, remove the flag and allow it to be cached once the current reference goes away. Inodes marked as not cached will get cleaned up by the background inode reclaim or via memory pressure, so they will still generate some short term cache pressure. They will, however, be reclaimed much sooner and in preference to cache hot inodes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-26xfs: trace xfs_name strings correctlyChristoph Hellwig
Strings store in an xfs_name structure are often not NUL terminated, print them using the correct printf specifiers that make use of the string length store in the xfs_name structure. Reported-by: Brian Candler <B.Candler@pobox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-23Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull XFS updates from Ben Myers: "Scalability improvements for dquots, log grant code cleanups, plus bugfixes and cleanups large and small" Fix up various trivial conflicts that were due to some of the earlier patches already having been integrated into v3.3 as bugfixes, and then there were development patches on top of those. Easily merged by just taking the newer version from the pulled branch. * 'for-linus' of git://oss.sgi.com/xfs/xfs: (45 commits) xfs: fallback to vmalloc for large buffers in xfs_getbmap xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get xfs: remove remaining scraps of struct xfs_iomap xfs: fix inode lookup race xfs: clean up minor sparse warnings xfs: remove the global xfs_Gqm structure xfs: remove the per-filesystem list of dquots xfs: use per-filesystem radix trees for dquot lookup xfs: per-filesystem dquot LRU lists xfs: use common code for quota statistics xfs: reimplement fdatasync support xfs: split in-core and on-disk inode log item fields xfs: make xfs_inode_item_size idempotent xfs: log timestamp updates xfs: log file size updates at I/O completion time xfs: log file size updates as part of unwritten extent conversion xfs: do not require an ioend for new EOF calculation xfs: use per-filesystem I/O completion workqueues quota: make Q_XQUOTASYNC a noop xfs: include reservations in quota reporting ...
2012-03-22xfs: introduce an allocation workqueueDave Chinner
We currently have significant issues with the amount of stack that allocation in XFS uses, especially in the writeback path. We can easily consume 4k of stack between mapping the page, manipulating the bmap btree and allocating blocks from the free list. Not to mention btree block readahead and other functionality that issues IO in the allocation path. As a result, we can no longer fit allocation in the writeback path in the stack space provided on x86_64. To alleviate this problem, introduce an allocation workqueue and move all allocations to a seperate context. This can be easily added as an interposing layer into xfs_alloc_vextent(), which takes a single argument structure and does not return until the allocation is complete or has failed. To do this, add a work structure and a completion to the allocation args structure. This allows xfs_alloc_vextent to queue the args onto the workqueue and wait for it to be completed by the worker. This can be done completely transparently to the caller. The worker function needs to ensure that it sets and clears the PF_TRANS flag appropriately as it is being run in an active transaction context. Work can also be queued in a memory reclaim context, so a rescuer is needed for the workqueue. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-22xfs: Fix open flag handling in open_by_handle codeDave Chinner
Sparse identified some unsafe handling of open flags in the xfs open by handle ioctl code. Update the code to use the correct access macros to ensure that we handle the open flags correctly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-22xfs: fix deadlock in xfs_rtfree_extentKamal Dasu
To fix the deadlock caused by repeatedly calling xfs_rtfree_extent - removed xfs_ilock() and xfs_trans_ijoin() from xfs_rtfree_extent(), instead added asserts that the inode is locked and has an inode_item attached to it. - in xfs_bunmapi() when dealing with an inode with the rt flag call xfs_ilock() and xfs_trans_ijoin() so that the reference count is bumped on the inode and attached it to the transaction before calling into xfs_bmap_del_extent, similar to what we do in xfs_bmap_rtalloc. Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-22fs: xfs: fix section mismatch in linux-nextGerard Snitselaar
xfs_qm_exit() is called in init_xfs_fs(). Signed-off-by: Gerard Snitselaar <dev@snitselaar.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-20switch open-coded instances of d_make_root() to new helperAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20vfs: check i_nlink limits in vfs_{mkdir,rename_dir,link}Al Viro
New field of struct super_block - ->s_max_links. Maximal allowed value of ->i_nlink or 0; in the latter case all checks still need to be done in ->link/->mkdir/->rename instances. Note that this limit applies both to directoris and to non-directories. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-15xfs: fallback to vmalloc for large buffers in xfs_getbmapDave Chinner
xfs_getbmap uses for a large buffer for extents, which is kmalloc'd. This can fail after the system has been running for some time as it is a high order allocation. Add a fallback to vmalloc so that it doesn't require contiguous memory and so won't randomly fail on files with large extent lists. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-15xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_getDave Chinner
xfsdump uses for a large buffer for extended attributes, which has a kmalloc'd shadow buffer in the kernel. This can fail after the system has been running for some time as it is a high order allocation. Add a fallback to vmalloc so that it doesn't require contiguous memory and so won't randomly fail while xfsdump is running. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-15xfs: remove remaining scraps of struct xfs_iomapDave Chinner
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-15xfs: fix inode lookup raceDave Chinner
When we get concurrent lookups of the same inode that is not in the per-AG inode cache, there is a race condition that triggers warnings in unlock_new_inode() indicating that we are initialising an inode that isn't in a the correct state for a new inode. When we do an inode lookup via a file handle or a bulkstat, we don't serialise lookups at a higher level through the dentry cache (i.e. pathless lookup), and so we can get concurrent lookups of the same inode. The race condition is between the insertion of the inode into the cache in the case of a cache miss and a concurrently lookup: Thread 1 Thread 2 xfs_iget() xfs_iget_cache_miss() xfs_iread() lock radix tree radix_tree_insert() rcu_read_lock radix_tree_lookup lock inode flags XFS_INEW not set igrab() unlock inode flags rcu_read_unlock use uninitialised inode ..... lock inode flags set XFS_INEW unlock inode flags unlock radix tree xfs_setup_inode() inode flags = I_NEW unlock_new_inode() WARNING as inode flags != I_NEW This can lead to inode corruption, inode list corruption, etc, and is generally a bad thing to occur. Fix this by setting XFS_INEW before inserting the inode into the radix tree. This will ensure any concurrent lookup will find the new inode with XFS_INEW set and that forces the lookup to wait until the XFS_INEW flag is removed before allowing the lookup to succeed. cc: <stable@vger.kernel.org> # for 3.0.x, 3.2.x Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-14xfs: clean up minor sparse warningsDave Chinner
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-14xfs: remove the global xfs_Gqm structureChristoph Hellwig
If we initialize the slab caches for the quota code when XFS is loaded there is no need for a global and reference counted quota manager structure. Drop all this overhead and also fix the error handling during quota initialization. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-14xfs: remove the per-filesystem list of dquotsChristoph Hellwig
Instead of keeping a separate per-filesystem list of dquots we can walk the radix tree for the two places where we need to iterate all quota structures. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-14xfs: use per-filesystem radix trees for dquot lookupChristoph Hellwig
Replace the global hash tables for looking up in-memory dquot structures with per-filesystem radix trees to allow scaling to a large number of in-memory dquot structures. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-14xfs: per-filesystem dquot LRU listsChristoph Hellwig
Replace the global dquot lru lists with a per-filesystem one. Note that the shrinker isn't wire up to the per-superblock VFS shrinker infrastructure as would have problems summing up and splitting the counts for inodes and dquots. I don't think this is a major problem as the quota cache isn't as interwinded with the inode cache as the dentry cache is, because an inode that is dropped from the cache will generally release a dquot reference, but most of the time it won't be the last one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-14xfs: use common code for quota statisticsChristoph Hellwig
Switch the quota code over to use the generic XFS statistics infrastructure. While the legacy /proc/fs/xfs/xqm and /proc/fs/xfs/xqmstats interfaces are preserved for now the statistics that still have a meaning with the current code are now also available from /proc/fs/xfs/stats. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-13xfs: reimplement fdatasync supportChristoph Hellwig
Add an in-memory only flag to say we logged timestamps only, and use it to check if fdatasync can optimize away the log force. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-13xfs: split in-core and on-disk inode log item fieldsChristoph Hellwig
Add a new ili_fields member to the inode log item to isolate the in-memory flags from the ones that actually go to the log. This will allow tracking timestamp-only updates for fdatasync and O_DSYNC in the next patch and prepares for divorcing the on-disk log format from the in-memory log item a little further down the road. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-13xfs: make xfs_inode_item_size idempotentChristoph Hellwig
Move all code messing with the inode log item flags into xfs_inode_item_format to make sure xfs_inode_item_size really only calculates the the number of vectors, but doesn't modify any state of the inode item. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-13xfs: log timestamp updatesChristoph Hellwig
Timestamps on regular files are the last metadata that XFS does not update transactionally. Now that we use the delaylog mode exclusively and made the log scode scale extremly well there is no need to bypass that code for timestamp updates. Logging all updates allows to drop a lot of code, and will allow for further performance improvements later on. Note that this patch drops optimized handling of fdatasync - it will be added back in a separate commit. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-13xfs: log file size updates at I/O completion timeChristoph Hellwig
Do not use unlogged metadata updates and the VFS dirty bit for updating the file size after writeback. In addition to causing various problems with updates getting delayed for far too long this also drags in the unscalable VFS dirty tracking, and is one of the few remaining unlogged metadata updates. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-05xfs: log file size updates as part of unwritten extent conversionChristoph Hellwig
If we convert and unwritten extent past the current i_size log the size update as part of the extent manipulation transactions instead of doing an unlogged metadata update later. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-05xfs: do not require an ioend for new EOF calculationChristoph Hellwig
Replace xfs_ioend_new_eof with a new inline xfs_new_eof helper that doesn't require and ioend, and is available also outside of xfs_aops.c. Also make the code a bit more clear by using a normal if statement instead of a slightly misleading MIN(). Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-03-05xfs: use per-filesystem I/O completion workqueuesChristoph Hellwig
The new concurrency managed workqueues are cheap enough that we can create per-filesystem instead of global workqueues. This allows us to remove the trylock or defer scheme on the ilock, which is not helpful once we have outstanding log reservations until finishing a size update. Also allow the default concurrency on this workqueues so that I/O completions blocking on the ilock for one inode do not block process for another inode. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-29xfs: include reservations in quota reportingChristoph Hellwig
Report all quota usage including the currently pending reservations. This avoids the need to flush delalloc space before gathering quota information, and matches quota enforcement, which already takes the reservations into account. This fixes xfstests 270. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-29xfs: merge xfs_qm_export_dquot into xfs_qm_scall_getquotaChristoph Hellwig
The is no good reason to have these two separate, and for the next change we would need the full struct xfs_dquot in xfs_qm_export_dquot, so better just fold the code now instead of changing it spuriously. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-25xfs: only take the ILOCK in xfs_reclaim_inode()Alex Elder
At the end of xfs_reclaim_inode(), the inode is locked in order to we wait for a possible concurrent lookup to complete before the inode is freed. This synchronization step was taking both the ILOCK and the IOLOCK, but the latter was causing lockdep to produce reports of the possibility of deadlock. It turns out that there's no need to acquire the IOLOCK at this point anyway. It may have been required in some earlier version of the code, but there should be no need to take the IOLOCK in xfs_iget(), so there's no (longer) any need to get it here for synchronization. Add an assertion in xfs_iget() as a reminder of this assumption. Dave Chinner diagnosed this on IRC, and Christoph Hellwig suggested no longer including the IOLOCK. I just put together the patch. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-22xfs: split and cleanup xfs_log_reserveChristoph Hellwig
Split the log regrant case out of xfs_log_reserve into a separate function, and merge xlog_grant_log_space and xlog_regrant_write_log_space into their respective callers. Also replace the XFS_LOG_PERM_RESERV flag, which easily got misused before the previous cleanups with a simple boolean parameter. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-22xfs: share code for grant head availability checksChristoph Hellwig
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-22xfs: share code for grant head wakeupsChristoph Hellwig
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-22xfs: share code for grant head waitingChristoph Hellwig
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-22xfs: add xlog_grant_head_wake_allChristoph Hellwig
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-22xfs: add xlog_grant_head_initChristoph Hellwig
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-22xfs: add the xlog_grant_head structureChristoph Hellwig
Add a new data structure to allow sharing code between the log grant and regrant code. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-22xfs: remove log space waitqueuesChristoph Hellwig
The tic->t_wait waitqueues can never have more than a single waiter on them, so we can easily replace them with a task_struct pointer and wake_up_process. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>