linux-toradex.git/fs/xfs, branch v3.2.5

xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()

2012-02-03T17:21:27+00:00

commit 9b025eb3a89e041bab6698e3858706be2385d692 upstream.

Commit b52a360b forgot to call xfs_iunlock() when it detected a corrupted
symlink and bailed out. Fix it by jumping to 'out' instead of returning.

CC: Carlos Maiolino 
Signed-off-by: Jan Kara 
Reviewed-by: Alex Elder 
Reviewed-by: Dave Chinner 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: fix endian conversion issue in discard code

2012-01-26T00:13:55+00:00

commit b1c770c273a4787069306fc82aab245e9ac72e9d upstream

When finding the longest extent in an AG, we read the value directly
out of the AGF buffer without endian conversion. This gives an
incorrect length, so FITRIM operations may not trim everything
they should.

Signed-off-by: Dave Chinner 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: fix acl count validation in xfs_acl_from_disk()

2012-01-12T19:29:46+00:00

commit 093019cf1b18dd31b2c3b77acce4e000e2cbc9ce upstream.

Commit fa8b18ed didn't prevent the integer overflow and possible
memory corruption.  "count" can go negative and bypass the check.

Signed-off-by: Xi Wang 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: log all dirty inodes in xfs_fs_sync_fs

2011-12-23T22:41:47+00:00

Since Linux 2.6.36 the writeback code has introduced various measures for
livelock prevention during sync().  Unfortunately, some of these are
actively harmful for the XFS model, where the inode gets marked dirty for
metadata from the data I/O handler.

The older_than_this checks have been enforced more strictly since

    writeback: avoid livelocking WB_SYNC_ALL writeback

which calls into __writeback_inodes_sb only once and thus samples the
current cut-off time only once.  But on slow enough devices the previous
asynchronous sync pass might not have fully completed yet, and thus XFS
might mark metadata dirty only after the cut-off time for the blocking
pass has already been sampled.  I have not reproduced this on a real
system myself, but it can be reproduced easily by introducing an
artificial delay into the XFS I/O completion workqueues.

Fix this by iterating over all XFS inodes in ->sync_fs and logging all that
are dirty.  This might log inodes that only got redirtied after the
previous pass, but given how cheap delayed logging of inodes is, this
isn't a major performance concern.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Tested-by: Mark Tinguely 
Reviewed-by: Mark Tinguely 
Signed-off-by: Ben Myers

xfs: log the inode in ->write_inode calls for kupdate

2011-12-23T22:41:47+00:00

If the writeback code writes back an inode because it has expired, we currently
use the non-blocking ->write_inode path.  This means any inode that is pinned
is skipped.  With delayed logging and a workload that otherwise has very little
log traffic, an inode that is constantly written to is very likely to always be
pinned, and thus we keep refusing to write it.  The VM
writeback code at that point redirties it and doesn't try to write it again
for another 30 seconds.  This means that in certain scenarios time-based
metadata writeback never happens.

Fix this by calling into xfs_log_inode for kupdate in addition to data
integrity syncs, and thus transfer the inode to the log ASAP.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Tested-by: Mark Tinguely 
Reviewed-by: Mark Tinguely 
Signed-off-by: Ben Myers

xfs: fix the logspace waiting algorithm

2011-12-06T20:19:47+00:00

Apply the scheme used in log_regrant_write_log_space to wake up any other
threads waiting for log space before the newly added one to
log_regrant_write_log_space as well, and factor the code into readable
helpers.  For each of the queues, add two helpers:

 - one to try to wake up all waiting threads.  This helper will also be
   usable by xfs_log_move_tail once we remove the current opportunistic
   wakeups in it.
 - one to sleep on t_wait until enough log space is available, loosely
   modelled after Linux waitqueues.
 
And use them to reimplement the guts of log_regrant_write_log_space and
log_regrant_write_log_space.  These two functions now use one and the same
algorithm for waiting on log space instead of subtly different ones before,
with an option to completely unify them in the near future.

Also move the filesystem shutdown handling to the common caller given
that we had to touch it anyway.

Based on hard debugging and an earlier patch from
Chandra Seetharaman.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Chandra Seetharaman 
Tested-by: Chandra Seetharaman 
Signed-off-by: Ben Myers

xfs: fix nfs export of 64-bit inodes numbers on 32-bit kernels

2011-12-06T16:46:23+00:00

The i_ino field in the VFS inode is of type unsigned long and thus can't
hold the full 64-bit inode number on 32-bit kernels.  We have the full
inode number in the XFS inode, so use that one for nfs exports.  Note
that I've also switched the 32-bit file handle types to it, just to make
the code more consistent and copy & paste errors less likely to happen.

Reported-by: Guoquan Yang 
Reported-by: Hank Peng 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Ben Myers

xfs: fix allocation length overflow in xfs_bmapi_write()

2011-12-02T22:24:02+00:00

When testing the new xfstests --large-fs option that does very large
file preallocations, this assert was tripped deep in
xfs_alloc_vextent():

XFS: Assertion failed: args->minlen <= args->maxlen, file: fs/xfs/xfs_alloc.c, line: 2239

The allocation was trying to allocate a zero-length extent because
the lower 32 bits of the allocation length were zero. The remaining
length of the allocation to be done was an exact multiple of 2^32 blocks;
the first case I saw was at 496TB remaining to be allocated.

This turns out to be an overflow when converting the allocation
length (a 64-bit quantity) into the extent length to allocate (a 32-bit
quantity), and it only trips the assert when the length to be allocated
is an exact multiple of 2^32 blocks.

Fix it by limiting the extent length to allocate to MAXEXTLEN.

Signed-off-by: Dave Chinner 
Signed-off-by: Ben Myers 
Reviewed-by: Christoph Hellwig

xfs: fix attr2 vs large data fork assert

2011-11-29T19:03:12+00:00

With Dmitry's fsstress updates I've seen very reproducible crashes in
xfs_attr_shortform_remove because xfs_attr_shortform_bytesfit claims that
the attributes would not fit inline into the inode after removing an
attribute.  It turns out that we were operating on an inode with lots
of delalloc extents, and thus an if_bytes value for the data fork that
is larger than the biggest possible on-disk storage for it, which utterly
confuses the code near the end of xfs_attr_shortform_bytesfit.

Fix this by always allowing the current attribute fork, like we already
do for the attr1 format, given that delalloc conversion will take care
of moving either the data or attribute area out of line if it doesn't
fit at that point, or make the point moot by merging extents.

Also document the function better, and clean up some loose bits.

Reviewed-by: Dave Chinner 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Ben Myers

xfs: force buffer writeback before blocking on the ilock in inode reclaim

2011-11-29T18:06:14+00:00

If we are doing synchronous inode reclaim, we block the VM from making
progress in memory reclaim.  So if we encounter a flush-locked inode,
promote it in the delwri list and wake up xfsbufd to write it out now.
Without this, we can get hangs of up to 30 seconds during workloads hitting
synchronous inode reclaim.

The scheme is copied from what we do for dquot reclaims.

Reported-by: Simon Kirby 
Signed-off-by: Christoph Hellwig 
Tested-by: Simon Kirby 
Signed-off-by: Ben Myers