linux-toradex.git/fs, branch v3.18.13

xfs: ensure truncate forces zeroed blocks to disk

2015-04-24T21:14:13+00:00

[ Upstream commit 5885ebda878b47c4b4602d4b0410cb4b282af024 ]

A new fsync vs power fail test in xfstests indicated that XFS can
have unreliable data consistency when doing extending truncates that
require block zeroing. The blocks beyond EOF get zeroed in memory,
but we never force those changes to disk before we run the
transaction that extends the file size and exposes those blocks to
userspace. This can result in the blocks not being correctly zeroed
after a crash.

Because in-memory behaviour is correct, tools like fsx don't pick up
any coherency problems - it's not until the filesystem is shutdown
or the system crashes after writing the truncate transaction to the
journal but before the zeroed data in the page cache is flushed that
the issue is exposed.

Fix this by also flushing the dirty data in memory region between
the old size and new size when we've found blocks that need zeroing
in the truncate process.

Reported-by: Liu Bo 
cc: 
Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Signed-off-by: Dave Chinner 
Signed-off-by: Sasha Levin

ext4: fix indirect punch hole corruption

2015-04-24T21:14:13+00:00

[ Upstream commit 6f30b7e37a8239f9d27db626a1d3427bc7951908 ]

Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect
mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
mapping. However, there are bugs in several corner cases. This fixes 5
distinct bugs:

1. When there is at least one entire level of indirection between the
start and end of the punch range and the end of the punch range is the
first block of its level, we can't return early; we have to free the
intervening levels.

2. When the end is at a higher level of indirection than the start and
ext4_find_shared returns a top branch for the end, we still need to free
the rest of the shared branch it returns; we can't decrement partial2.

3. When a punch happens within one level of indirection, we need to
converge on an indirect block that contains the start and end. However,
because the branches returned from ext4_find_shared do not necessarily
start at the same level (e.g., the partial2 chain will be shallower if
the last block occurs at the beginning of an indirect group), the walk
of the two chains can end up "missing" each other and freeing a bunch of
extra blocks in the process. This mismatch can be handled by first
making sure that the chains are at the same level, then walking them
together until they converge.

4. When the punch happens within one level of indirection and
ext4_find_shared returns a top branch for the start, we must free it,
but only if the end does not occur within that branch.

5. When the punch happens within one level of indirection and
ext4_find_shared returns a top branch for the end, then we shouldn't
free the block referenced by the end of the returned chain (this mirrors
the different levels case).

Signed-off-by: Omar Sandoval 
Signed-off-by: Sasha Levin

ioctx_alloc(): fix vma (and file) leak on failure

2015-04-24T21:14:06+00:00

[ Upstream commit deeb8525f9bcea60f5e86521880c1161de7a5829 ]

If we fail past the aio_setup_ring(), we need to destroy the
mapping.  We don't need to care about anybody having found ctx,
or added requests to it, since the last failure exit is exactly
the failure to make ctx visible to lookups.

Reproducer (based on one by Joe Mario ):

void count(char *p)
{
	char s[80];
	printf("%s: ", p);
	fflush(stdout);
	sprintf(s, "/bin/cat /proc/%d/maps|/bin/fgrep -c '/[aio] (deleted)'", getpid());
	system(s);
}

int main()
{
	io_context_t *ctx;
	int created, limit, i, destroyed;
	FILE *f;

	count("before");
	if ((f = fopen("/proc/sys/fs/aio-max-nr", "r")) == NULL)
		perror("opening aio-max-nr");
	else if (fscanf(f, "%d", &limit) != 1)
		fprintf(stderr, "can't parse aio-max-nr\n");
	else if ((ctx = calloc(limit, sizeof(io_context_t))) == NULL)
		perror("allocating aio_context_t array");
	else {
		for (i = 0, created = 0; i < limit; i++) {
			if (io_setup(1000, ctx + created) == 0)
				created++;
		}
		for (i = 0, destroyed = 0; i < created; i++)
			if (io_destroy(ctx[i]) == 0)
				destroyed++;
		printf("created %d, failed %d, destroyed %d\n",
			created, limit - created, destroyed);
		count("after");
	}
}

Found-by: Joe Mario 
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro 
Signed-off-by: Sasha Levin

ocfs2: _really_ sync the right range

2015-04-24T21:14:05+00:00

[ Upstream commit 64b4e2526d1cf6e6a4db6213d6e2b6e6ab59479a ]

"ocfs2 syncs the wrong range" had been broken; prior to it the
code was doing the wrong thing in case of O_APPEND, all right,
but _after_ it we were syncing the wrong range in 100% cases.
*ppos, aka iocb->ki_pos is incremented prior to that point,
so we are always doing sync on the area _after_ the one we'd
written to.

Spotted by Joseph Qi  back in January;
unfortunately, I'd missed his mail back then ;-/

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro 
Signed-off-by: Sasha Levin

cifs: fix use-after-free bug in find_writable_file

2015-04-24T21:14:01+00:00

[ Upstream commit e1e9bda22d7ddf88515e8fe401887e313922823e ]

Under intermittent network outages, find_writable_file() is susceptible
to the following race condition, which results in a user-after-free in
the cifs_writepages code-path:

Thread 1                                        Thread 2
========                                        ========

inv_file = NULL
refind = 0
spin_lock(&cifs_file_list_lock)

// invalidHandle found on openFileList

inv_file = open_file
// inv_file->count currently 1

cifsFileInfo_get(inv_file)
// inv_file->count = 2

spin_unlock(&cifs_file_list_lock);

cifs_reopen_file()                            cifs_close()
// fails (rc != 0)                            ->cifsFileInfo_put()
                                       spin_lock(&cifs_file_list_lock)
                                       // inv_file->count = 1
                                       spin_unlock(&cifs_file_list_lock)

spin_lock(&cifs_file_list_lock);
list_move_tail(&inv_file->flist,
      &cifs_inode->openFileList);
spin_unlock(&cifs_file_list_lock);

cifsFileInfo_put(inv_file);
->spin_lock(&cifs_file_list_lock)

  // inv_file->count = 0
  list_del(&cifs_file->flist);
  // cleanup!!
  kfree(cifs_file);

  spin_unlock(&cifs_file_list_lock);

spin_lock(&cifs_file_list_lock);
++refind;
// refind = 1
goto refind_writable;

At this point we loop back through with an invalid inv_file pointer
and a refind value of 1. On second pass, inv_file is not overwritten on
openFileList traversal, and is subsequently dereferenced.

Signed-off-by: David Disseldorp 
Reviewed-by: Jeff Layton 
CC: 
Signed-off-by: Steve French 
Signed-off-by: Sasha Levin

cifs: smb2_clone_range() - exit on unhandled error

2015-04-24T21:14:00+00:00

[ Upstream commit 2477bc58d49edb1c0baf59df7dc093dce682af2b ]

While attempting to clone a file on a samba server, we receive a
STATUS_INVALID_DEVICE_REQUEST. This is mapped to -EOPNOTSUPP which
isn't handled in smb2_clone_range(). We end up looping in the while loop
making same call to the samba server over and over again.

The proposed fix is to exit and return the error value when encountered
with an unhandled error.

Cc: 
Signed-off-by: Sachin Prabhu 
Signed-off-by: Steve French 
Signed-off-by: Steve French 
Signed-off-by: Sasha Levin

nfsd: return correct lockowner when there is a race on hash insert

2015-04-23T18:58:27+00:00

[ Upstream commit 340f0ba1c6c8412aa35fd6476044836b84361ea6 ]

alloc_init_lock_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.

Noticed by inspection after Jeff Layton fixed the same bug for open
owners.  Depending on client behavior, this one may be trickier to
trigger in practice.

Fixes: c58c6610ec24 "nfsd: Protect adding/removing lock owners using client_lock"
Cc: 
Cc: Trond Myklebust 
Acked-by: Jeff Layton 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Sasha Levin

nfsd: return correct openowner when there is a race to put one in the hash

2015-04-23T18:58:26+00:00

[ Upstream commit c5952338bfc234e54deda45b7228f610a545e28a ]

alloc_init_open_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.

In commit 7ffb588086e9, we changed it so that we allocate and initialize
an openowner, and then check to see if a matching one got stuffed into
the hashtable in the meantime. If it did, then we free the one we just
allocated and take a reference on the one already there. There is a bug
here though. The code will then return the pointer to the one that was
allocated (and has now been freed).

This wasn't evident before as this race almost never occurred. The Linux
kernel client used to serialize requests for a single openowner.  That
has changed now with v4.0 kernels, and this race can now easily occur.

Fixes: 7ffb588086e9
Cc:  # v3.17+
Cc: Trond Myklebust 
Reported-by: Christoph Hellwig 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Jeff Layton 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Sasha Levin

btrfs: simplify insert_orphan_item

2015-04-23T18:58:24+00:00

[ Upstream commit 9c4f61f01d269815bb7c37be3ede59c5587747c6 ]

We can search and add the orphan item in one go,
btrfs_insert_orphan_item will find out if the item already exists.

Signed-off-by: David Sterba 
Signed-off-by: Sasha Levin

hfsplus: fix B-tree corruption after insertion at position 0

2015-04-17T00:13:14+00:00

[ Upstream commit 98cf21c61a7f5419d82f847c4d77bf6e96a76f5f ]

Fix B-tree corruption when a new record is inserted at position 0 in the
node in hfs_brec_insert().  In this case a hfs_brec_update_parent() is
called to update the parent index node (if exists) and it is passed
hfs_find_data with a search_key containing a newly inserted key instead
of the key to be updated.  This results in an inconsistent index node.
The bug reproduces on my machine after an extents overflow record for
the catalog file (CNID=4) is inserted into the extents overflow B-tree.
Because of a low (reserved) value of CNID=4, it has to become the first
record in the first leaf node.

The resulting first leaf node is correct:

  ----------------------------------------------------
  | key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... |
  ----------------------------------------------------

But the parent index key0 still contains the previous key CNID=123:

  -----------------------
  | key0.CNID=123 | ... |
  -----------------------

A change in hfs_brec_insert() makes hfs_brec_update_parent() work
correctly by preventing it from getting fd->record=-1 value from
__hfs_brec_find().

Along the way, I removed duplicate code with unification of the if
condition.  The resulting code is equivalent to the original code
because node is never 0.

Also hfs_brec_update_parent() will now return an error after getting a
negative fd->record value.  However, the return value of
hfs_brec_update_parent() is not checked anywhere in the file and I'm
leaving it unchanged by this patch.  brec.c lacks error checking after
some other calls too, but this issue is of less importance than the one
being fixed by this patch.

Signed-off-by: Sergei Antonov 
Cc: Joe Perches 
Reviewed-by: Vyacheslav Dubeyko 
Acked-by: Hin-Tak Leung 
Cc: Anton Altaparmakov 
Cc: Al Viro 
Cc: Christoph Hellwig 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Sasha Levin