linux-toradex.git/fs, branch v4.1.5

xfs: remote attributes need to be considered data

2015-08-10T19:21:59+00:00

commit df150ed102baa0e78c06e08e975dfb47147dd677 upstream.

We don't log remote attribute contents, and instead write them
synchronously before we commit the block allocation and attribute
tree update transaction. As a result we are writing to the allocated
space before the allcoation has been made permanent.

As a result, we cannot consider this allocation to be a metadata
allocation. Metadata allocation can take blocks from the free list
and so reuse them before the transaction that freed the block is
committed to disk. This behaviour is perfectly fine for journalled
metadata changes as log recovery will ensure the free operation is
replayed before the overwrite, but for remote attribute writes this
is not the case.

Hence we have to consider the remote attribute blocks to contain
data and allocate accordingly. We do this by dropping the
XFS_BMAPI_METADATA flag from the block allocation. This means the
allocation will not use blocks that are on the busy list without
first ensuring that the freeing transaction has been committed to
disk and the blocks removed from the busy list. This ensures we will
never overwrite a freed block without first ensuring that it is
really free.

Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

xfs: remote attribute headers contain an invalid LSN

2015-08-10T19:21:59+00:00

commit e3c32ee9e3e747fec01eb38e6610a9157d44c3ea upstream.

In recent testing, a system that crashed failed log recovery on
restart with a bad symlink buffer magic number:

XFS (vda): Starting recovery (logdev: internal)
XFS (vda): Bad symlink block magic!
XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 2060

On examination of the log via xfs_logprint, none of the symlink
buffers in the log had a bad magic number, nor were any other types
of buffer log format headers mis-identified as symlink buffers.
Tracing was used to find the buffer the kernel was tripping over,
and xfs_db identified it's contents as:

000: 5841524d 00000000 00000346 64d82b48 8983e692 d71e4680 a5f49e2c b317576e
020: 00000000 00602038 00000000 006034ce d0020000 00000000 4d4d4d4d 4d4d4d4d
040: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
060: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
.....

This is a remote attribute buffer, which are notable in that they
are not logged but are instead written synchronously by the remote
attribute code so that they exist on disk before the attribute
transactions are committed to the journal.

The above remote attribute block has an invalid LSN in it - cycle
0xd002000, block 0 - which means when log recovery comes along to
determine if the transaction that writes to the underlying block
should be replayed, it sees a block that has a future LSN and so
does not replay the buffer data in the transaction. Instead, it
validates the buffer magic number and attaches the buffer verifier
to it.  It is this buffer magic number check that is failing in the
above assert, indicating that we skipped replay due to the LSN of
the underlying buffer.

The problem here is that the remote attribute buffers cannot have a
valid LSN placed into them, because the transaction that contains
the attribute tree pointer changes and the block allocation that the
attribute data is being written to hasn't yet been committed. Hence
the LSN field in the attribute block is completely unwritten,
thereby leaving the underlying contents of the block in the LSN
field. It could have any value, and hence a future overwrite of the
block by log recovery may or may not work correctly.

Fix this by always writing an invalid LSN to the remote attribute
block, as any buffer in log recovery that needs to write over the
remote attribute should occur. We are protected from having old data
written over the attribute by the fact that freeing the block before
the remote attribute is written will result in the buffer being
marked stale in the log and so all changes prior to the buffer stale
transaction will be cancelled by log recovery.

Hence it is safe to ignore the LSN in the case or synchronously
written, unlogged metadata such as remote attribute blocks, and to
ensure we do that correctly, we need to write an invalid LSN to all
remote attribute blocks to trigger immediate recovery of metadata
that is written over the top.

As a further protection for filesystems that may already have remote
attribute blocks with bad LSNs on disk, change the log recovery code
to always trigger immediate recovery of metadata over remote
attribute blocks.

Signed-off-by: Dave Chinner 
Reviewed-by: Brian Foster 
Signed-off-by: Dave Chinner 
Signed-off-by: Greg Kroah-Hartman

NFS: Fix a memory leak in nfs_do_recoalesce

2015-08-10T19:21:58+00:00

commit 03d5eb65b53889fe98a5ecddfe205c16e3093190 upstream.

If the function exits early, then we must put those requests that were
not processed back onto the &mirror->pg_list so they can be cleaned up
by nfs_pgio_error().

Fixes: a7d42ddb30997 ("nfs: add mirroring support to pgio layer")
Signed-off-by: Trond Myklebust 
Signed-off-by: Greg Kroah-Hartman

NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked

2015-08-10T19:21:58+00:00

commit 3c38cbe2ade88240fabb585b408f779ad3b9a31b upstream.

Otherwise, nfs4_select_rw_stateid() will always return the zero stateid
instead of the correct open stateid.

Fixes: f95549cf24660 ("NFSv4: More CLOSE/OPEN races")
Signed-off-by: Trond Myklebust 
Signed-off-by: Greg Kroah-Hartman

NFS: Don't revalidate the mapping if both size and change attr are up to date

2015-08-10T19:21:58+00:00

commit 85a23cee3f2c928475f31777ead5a71340a12fc3 upstream.

If we've ensured that the size and the change attribute are both correct,
then there is no point in marking those attributes as needing revalidation
again. Only do so if we know the size is incorrect and was not updated.

Fixes: f2467b6f64da ("NFS: Clear NFS_INO_REVAL_PAGECACHE when...")
Signed-off-by: Trond Myklebust 
Signed-off-by: Greg Kroah-Hartman

mnt: In detach_mounts detach the appropriate unmounted mount

2015-08-10T19:21:55+00:00

commit fe78fcc85a2046c51f1535710996860557eeec20 upstream.

The handling of in detach_mounts of unmounted but connected mounts is
buggy and can lead to an infinite loop.

Correct the handling of unmounted mounts in detach_mount.  When the
mountpoint of an unmounted but connected mount is connected to a
dentry, and that dentry is deleted we need to disconnect that mount
from the parent mount and the deleted dentry.

Nothing changes for the unmounted and connected children.  They can be
safely ignored.

Fixes: ce07d891a0891d3c0d0c2d73d577490486b809e1 mnt: Honor MNT_LOCKED when detaching mounts
Signed-off-by: "Eric W. Biederman" 
Signed-off-by: Greg Kroah-Hartman

mnt: Clarify and correct the disconnect logic in umount_tree

2015-08-10T19:21:54+00:00

commit f2d0a123bcf16d1a9cf7942ddc98e0ef77862c2b upstream.

rmdir mntpoint will result in an infinite loop when there is
a mount locked on the mountpoint in another mount namespace.

This is because the logic to test to see if a mount should
be disconnected in umount_tree is buggy.

Move the logic to decide if a mount should remain connected to
it's mountpoint into it's own function disconnect_mount so that
clarity of expression instead of terseness of expression becomes
a virtue.

When the conditions where it is invalid to leave a mount connected
are first ruled out, the logic for deciding if a mount should
be disconnected becomes much clearer and simpler.

Fixes: e0c9c0afd2fc958ffa34b697972721d81df8a56f mnt: Update detach_mounts to leave mounts connected
Fixes: ce07d891a0891d3c0d0c2d73d577490486b809e1 mnt: Honor MNT_LOCKED when detaching mounts
Signed-off-by: "Eric W. Biederman" 
Signed-off-by: Greg Kroah-Hartman

freeing unlinked file indefinitely delayed

2015-08-10T19:21:51+00:00

commit 75a6f82a0d10ef8f13cd8fe7212911a0252ab99e upstream.

	Normally opening a file, unlinking it and then closing will have
the inode freed upon close() (provided that it's not otherwise busy and
has no remaining links, of course).  However, there's one case where that
does *not* happen.  Namely, if you open it by fhandle with cold dcache,
then unlink() and close().

	In normal case you get d_delete() in unlink(2) notice that dentry
is busy and unhash it; on the final dput() it will be forcibly evicted from
dcache, triggering iput() and inode removal.  In this case, though, we end
up with *two* dentries - disconnected (created by open-by-fhandle) and
regular one (used by unlink()).  The latter will have its reference to inode
dropped just fine, but the former will not - it's considered hashed (it
is on the ->s_anon list), so it will stay around until the memory pressure
will finally do it in.  As the result, we have the final iput() delayed
indefinitely.  It's trivial to reproduce -

void flush_dcache(void)
{
        system("mount -o remount,rw /");
}

static char buf[20 * 1024 * 1024];

main()
{
        int fd;
        union {
                struct file_handle f;
                char buf[MAX_HANDLE_SZ];
        } x;
        int m;

        x.f.handle_bytes = sizeof(x);
        chdir("/root");
        mkdir("foo", 0700);
        fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
        close(fd);
        name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
        flush_dcache();
        fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
        unlink("foo/bar");
        write(fd, buf, sizeof(buf));
        system("df .");			/* 20Mb eaten */
        close(fd);
        system("df .");			/* should've freed those 20Mb */
        flush_dcache();
        system("df .");			/* should be the same as #2 */
}

will spit out something like
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/root         322023 303843      1131 100% /
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/root         322023 303843      1131 100% /
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/root         322023 283282     21692  93% /
- inode gets freed only when dentry is finally evicted (here we trigger
than by remount; normally it would've happened in response to memory
pressure hell knows when).

Acked-by: J. Bruce Fields 
Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman

hpfs: hpfs_error: Remove static buffer, use vsprintf extension %pV instead

2015-08-03T16:29:19+00:00

commit a28e4b2b18ccb90df402da3f21e1a83c9d4f8ec1 upstream.

Removing unnecessary static buffers is good.
Use the vsprintf %pV extension instead.

Signed-off-by: Joe Perches 
Signed-off-by: Mikulas Patocka 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

hpfs: kstrdup() out of memory handling

2015-08-03T16:29:19+00:00

commit ce657611baf902f14ae559ce4e0787ead6712067 upstream.

There is a possibility of nothing being allocated to the new_opts in
case of memory pressure, therefore return ENOMEM for such case.

Signed-off-by: Sanidhya Kashyap 
Signed-off-by: Mikulas Patocka 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman