linux-toradex.git/fs/xfs, branch v3.2.38

xfs: Fix possible use-after-free with AIO

2013-02-06T04:33:49+00:00

commit 4b05d09c18d9aa62d2e7fb4b057f54e5a38963f5 upstream.

Running AIO is pinning inode in memory using file reference. Once AIO
is completed using aio_complete(), file reference is put and inode can
be freed from memory. So we have to be sure that calling aio_complete()
is the last thing we do with the inode.

CC: xfs@oss.sgi.com
CC: Ben Myers 
Signed-off-by: Jan Kara 
Reviewed-by: Ben Myers 
Signed-off-by: Ben Myers 
Signed-off-by: Ben Hutchings

xfs: drop buffer io reference when a bad bio is built

2012-12-06T11:20:25+00:00

commit d69043c42d8c6414fa28ad18d99973aa6c1c2e24 upstream.

Error handling in xfs_buf_ioapply_map() does not handle IO reference
counts correctly. We increment the b_io_remaining count before
building the bio, but then fail to decrement it in the failure case.
This leads to the buffer never running IO completion and releasing
the reference that the IO holds, so at unmount we can leak the
buffer. This leak is captured by this assert failure during unmount:

XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273

This is not a new bug - the b_io_remaining accounting has had this
problem for a long, long time - it's just very hard to get a
zero length bio being built by this code...

Further, the buffer IO error can be overwritten on a multi-segment
buffer by subsequent bio completions for partial sections of the
buffer. Hence we should only set the buffer error status if the
buffer is not already carrying an error status. This ensures that a
partial IO error on a multi-segment buffer will not be lost. This
part of the problem is a regression, however.

Signed-off-by: Dave Chinner 
Reviewed-by: Mark Tinguely 
Signed-off-by: Ben Myers 
Signed-off-by: Ben Hutchings

xfs: fix reading of wrapped log data

2012-11-16T16:47:12+00:00

commit 6ce377afd1755eae5c93410ca9a1121dfead7b87 upstream.

Commit 4439647 ("xfs: reset buffer pointers before freeing them") in
3.0-rc1 introduced a regression when recovering log buffers that
wrapped around the end of log. The second part of the log buffer at
the start of the physical log was being read into the header buffer
rather than the data buffer, and hence recovery was seeing garbage
in the data buffer when it got to the region of the log buffer that
was incorrectly read.

Reported-by: Torsten Kaiser 
Signed-off-by: Dave Chinner 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Mark Tinguely 
Signed-off-by: Ben Myers 
Signed-off-by: Ben Hutchings

tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking

2012-10-30T23:26:40+00:00

commit 35c2a7f4908d404c9124c2efc6ada4640ca4d5d5 upstream.

Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
	u64 inum = fid->raw[2];
which is unhelpfully reported as at the end of shmem_alloc_inode():

BUG: unable to handle kernel paging request at ffff880061cd3000
IP: [] shmem_alloc_inode+0x40/0x40
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Call Trace:
 [] ? exportfs_decode_fh+0x79/0x2d0
 [] do_handle_open+0x163/0x2c0
 [] sys_open_by_handle_at+0xc/0x10
 [] tracesys+0xe1/0xe6

Right, tmpfs is being stupid to access fid->raw[2] before validating that
fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
fall at the end of a page, and the next page not be present.

But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
could oops in the same way: add the missing fh_len checks to those.

Reported-by: Sasha Levin 
Signed-off-by: Hugh Dickins 
Cc: Al Viro 
Cc: Sage Weil 
Cc: Steven Whitehouse 
Cc: Christoph Hellwig 
Signed-off-by: Al Viro 
Signed-off-by: Ben Hutchings

xfs: Fix oops on IO error during xlog_recover_process_iunlinks()

2012-04-02T16:53:06+00:00

commit d97d32edcd732110758799ae60af725e5110b3dc upstream.

When an IO error happens during inode deletion run from
xlog_recover_process_iunlinks() filesystem gets shutdown. Thus any subsequent
attempt to read buffers fails. Code in xlog_recover_process_iunlinks() does not
count with the fact that read of a buffer which was read a while ago can
really fail which results in the oops on
  agi = XFS_BUF_TO_AGI(agibp);

Fix the problem by cleaning up the buffer handling in
xlog_recover_process_iunlinks() as suggested by Dave Chinner. We release buffer
lock but keep buffer reference to AG buffer. That is enough for buffer to stay
pinned in memory and we don't have to call xfs_read_agi() all the time.

Signed-off-by: Jan Kara 
Reviewed-by: Dave Chinner 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: fix inode lookup race

2012-04-02T16:52:50+00:00

commit f30d500f809eca67a21704347ab14bb35877b5ee upstream.

When we get concurrent lookups of the same inode that is not in the
per-AG inode cache, there is a race condition that triggers warnings
in unlock_new_inode() indicating that we are initialising an inode
that isn't in a the correct state for a new inode.

When we do an inode lookup via a file handle or a bulkstat, we don't
serialise lookups at a higher level through the dentry cache (i.e.
pathless lookup), and so we can get concurrent lookups of the same
inode.

The race condition is between the insertion of the inode into the
cache in the case of a cache miss and a concurrently lookup:

Thread 1			Thread 2
xfs_iget()
  xfs_iget_cache_miss()
    xfs_iread()
    lock radix tree
    radix_tree_insert()
				rcu_read_lock
				radix_tree_lookup
				lock inode flags
				XFS_INEW not set
				igrab()
				unlock inode flags
				rcu_read_unlock
				use uninitialised inode
				.....
    lock inode flags
    set XFS_INEW
    unlock inode flags
    unlock radix tree
  xfs_setup_inode()
    inode flags = I_NEW
    unlock_new_inode()
      WARNING as inode flags != I_NEW

This can lead to inode corruption, inode list corruption, etc, and
is generally a bad thing to occur.

Fix this by setting XFS_INEW before inserting the inode into the
radix tree. This will ensure any concurrent lookup will find the new
inode with XFS_INEW set and that forces the lookup to wait until the
XFS_INEW flag is removed before allowing the lookup to succeed.

Signed-off-by: Dave Chinner 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()

2012-02-03T17:21:27+00:00

commit 9b025eb3a89e041bab6698e3858706be2385d692 upstream.

Commit b52a360b forgot to call xfs_iunlock() when it detected corrupted
symplink and bailed out. Fix it by jumping to 'out' instead of doing return.

CC: Carlos Maiolino 
Signed-off-by: Jan Kara 
Reviewed-by: Alex Elder 
Reviewed-by: Dave Chinner 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: fix endian conversion issue in discard code

2012-01-26T00:13:55+00:00

commit b1c770c273a4787069306fc82aab245e9ac72e9d upstream

When finding the longest extent in an AG, we read the value directly
out of the AGF buffer without endian conversion. This will give an
incorrect length, resulting in FITRIM operations potentially not
trimming everything that it should.

Signed-off-by: Dave Chinner 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: fix acl count validation in xfs_acl_from_disk()

2012-01-12T19:29:46+00:00

commit 093019cf1b18dd31b2c3b77acce4e000e2cbc9ce upstream.

Commit fa8b18ed didn't prevent the integer overflow and possible
memory corruption.  "count" can go negative and bypass the check.

Signed-off-by: Xi Wang 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Ben Myers 
Signed-off-by: Greg Kroah-Hartman

xfs: log all dirty inodes in xfs_fs_sync_fs

2011-12-23T22:41:47+00:00

Since Linux 2.6.36 the writeback code has introduces various measures for
live lock prevention during sync().  Unfortunately some of these are
actively harmful for the XFS model, where the inode gets marked dirty for
metadata from the data I/O handler.

The older_than_this checks that are now more strictly enforced since

    writeback: avoid livelocking WB_SYNC_ALL writeback

by only calling into __writeback_inodes_sb and thus only sampling the
current cut off time once.  But on a slow enough devices the previous
asynchronous sync pass might not have fully completed yet, and thus XFS
might mark metadata dirty only after that sampling of the cut off time for
the blocking pass already happened.  I have not myself reproduced this
myself on a real system, but by introducing artificial delay into the
XFS I/O completion workqueues it can be reproduced easily.

Fix this by iterating over all XFS inodes in ->sync_fs and log all that
are dirty.  This might log inode that only got redirtied after the
previous pass, but given how cheap delayed logging of inodes is it
isn't a major concern for performance.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dave Chinner 
Tested-by: Mark Tinguely 
Reviewed-by: Mark Tinguely 
Signed-off-by: Ben Myers