linux-toradex.git/fs, branch v4.4.105

Revert "ocfs2: should wait dio before inode lock in ocfs2_setattr()"

2017-12-09T17:42:43+00:00

This reverts commit c4baa4a5870cb02f713def1620052bfca7a82bbb which is
commit 28f5a8a7c033cbf3e32277f4cc9c6afd74f05300 upstream.

It shouldn't be applied to the 4.4-stable tree.

Ben and Alex write:

> Now that ocfs2_setattr() calls this outside of the inode locked region,
> what prevents another task adding a new dio request immediately
> afterward?
>

In the kernel 4.6, firstly, we use the inode_lock() in do_truncate() to
prevent another bio to be issued from this node.
Furthermore, we use the ocfs2_rw_lock() and ocfs2_inode_lock() in ocfs2_setattr()
to guarantee no more bio will be issued from the other nodes in this cluster.

> Also, ocfs2_dio_end_io_write() was introduced in 4.6 and it looks like
> the dio completion path didn't previously take the inode lock.  So it
> doesn't look this fix is needed in 3.18 or 4.4.

Yes, ocfs2_dio_end_io_write() was introduced in 4.6 and the problem this patch
fixes is only exist in the kernel 4.6 and above 4.6.

Reported-by: Ben Hutchings 
Cc: Alex Chen 
Cc: Jun Piao 
Cc: Joseph Qi 
Cc: Changwei Ge 
Cc: Mark Fasheh 
Cc: Joel Becker 
Cc: Junxiao Bi 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

NFSv4: Fix client recovery when server reboots multiple times

2017-12-09T17:42:42+00:00

[ Upstream commit c6180a6237174f481dc856ed6e890d8196b6f0fb ]

If the server reboots multiple times, the client should rely on the
server to tell it that it cannot reclaim state as per section 9.6.3.4
in RFC7530 and section 8.4.2.1 in RFC5661.
Currently, the client is being to conservative, and is assuming that
if the server reboots while state recovery is in progress, then it must
ignore state that was not recovered before the reboot.

Signed-off-by: Trond Myklebust 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

nfs: Don't take a reference on fl->fl_file for LOCK operation

2017-12-09T17:42:42+00:00

[ Upstream commit 4b09ec4b14a168bf2c687e1f598140c3c11e9222 ]

I have reports of a crash that look like __fput() was called twice for
a NFSv4.0 file.  It seems possible that the state manager could try to
reclaim a lock and take a reference on the fl->fl_file at the same time the
file is being released if, during the close(), a signal interrupts the wait
for outstanding IO while removing locks which then skips the removal
of that lock.

Since 83bfff23e9ed ("nfs4: have do_vfs_lock take an inode pointer") has
removed the need to traverse fl->fl_file->f_inode in nfs4_lock_done(),
taking that reference is no longer necessary.

Signed-off-by: Benjamin Coddington 
Reviewed-by: Jeff Layton 
Signed-off-by: Trond Myklebust 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

nfsd: Fix another OPEN stateid race

2017-12-05T10:22:52+00:00

commit d8a1a000555ecd1b824ac1ed6df8fe364dfbbbb0 upstream.

If nfsd4_process_open2() is initialising a new stateid, and yet the
call to nfs4_get_vfs_file() fails for some reason, then we must
declare the stateid closed, and unhash it before dropping the mutex.

Right now, we unhash the stateid after dropping the mutex, and without
changing the stateid type, meaning that another OPEN could theoretically
look it up and attempt to use it.

Reported-by: Andrew W Elble 
Signed-off-by: Trond Myklebust 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman

nfsd: Fix stateid races between OPEN and CLOSE

2017-12-05T10:22:52+00:00

commit 15ca08d3299682dc49bad73251677b2c5017ef08 upstream.

Open file stateids can linger on the nfs4_file list of stateids even
after they have been closed. In order to avoid reusing such a
stateid, and confusing the client, we need to recheck the
nfs4_stid's type after taking the mutex.
Otherwise, we risk reusing an old stateid that was already closed,
which will confuse clients that expect new stateids to conform to
RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4.

Signed-off-by: Trond Myklebust 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman

nfsd: Make init_open_stateid() a bit more whole

2017-12-05T10:22:52+00:00

commit 8c7245abda877d4689b3371db8ae2a4400d7d9ce upstream.

Move the state selection logic inside from the caller,
always making it return correct stp to use.

Signed-off-by: J . Bruce Fields 
Signed-off-by: Oleg Drokin 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman

NFS: revalidate "." etc correctly on "open".

2017-12-05T10:22:51+00:00

commit b688741cb06695312f18b730653d6611e1bad28d upstream.

For correct close-to-open semantics, NFS must validate
the change attribute of a directory (or file) on open.

Since commit ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
d_weak_revalidate dentry op"), open() of "." or a path ending ".." is
not revalidated reliably (except when that direct is a mount point).

Prior to that commit, "." was revalidated using nfs_lookup_revalidate()
which checks the LOOKUP_OPEN flag and forces revalidation if the flag is
set.
Since that commit, nfs_weak_revalidate() is used for NFSv3 (which
ignores the flags) and nothing is used for NFSv4.

This is fixed by using nfs_lookup_verify_inode() in
nfs_weak_revalidate().  This does the revalidation exactly when needed.
Also, add a definition of .d_weak_revalidate for NFSv4.

The incorrect behavior is easily demonstrated by running "echo *" in
some non-mountpoint NFS directory while watching network traffic.
Without this patch, "echo *" sometimes doesn't produce any traffic.
With the patch it always does.

Fixes: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op")
cc: stable@vger.kernel.org (3.9+)
Signed-off-by: NeilBrown 
Signed-off-by: Anna Schumaker 
Signed-off-by: Greg Kroah-Hartman

btrfs: clear space cache inode generation always

2017-12-05T10:22:50+00:00

commit 8e138e0d92c6c9d3d481674fb14e3439b495be37 upstream.

We discovered a box that had double allocations, and suspected the space
cache may be to blame.  While auditing the write out path I noticed that
if we've already setup the space cache we will just carry on.  This
means that any error we hit after cache_save_setup before we go to
actually write the cache out we won't reset the inode generation, so
whatever was already written will be considered correct, except it'll be
stale.  Fix this by _always_ resetting the generation on the block group
inode, this way we only ever have valid or invalid cache.

With this patch I was no longer able to reproduce cache corruption with
dm-log-writes and my bpf error injection tool.

Signed-off-by: Josef Bacik 
Signed-off-by: David Sterba 
Signed-off-by: Greg Kroah-Hartman

btrfs: return the actual error value from from btrfs_uuid_tree_iterate

2017-11-30T08:37:28+00:00

[ Upstream commit 73ba39ab9307340dc98ec3622891314bbc09cc2e ]

In function btrfs_uuid_tree_iterate(), errno is assigned to variable ret
on errors. However, it directly returns 0. It may be better to return
ret. This patch also removes the warning, because the caller already
prints a warning.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188731
Signed-off-by: Pan Bian 
Reviewed-by: Omar Sandoval 
[ edited subject ]
Signed-off-by: David Sterba 

Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

fscrypt: lock mutex before checking for bounce page pool

2017-11-30T08:37:25+00:00

commit a0b3bc855374c50b5ea85273553485af48caf2f7 upstream.

fscrypt_initialize(), which allocates the global bounce page pool when
an encrypted file is first accessed, uses "double-checked locking" to
try to avoid locking fscrypt_init_mutex.  However, it doesn't use any
memory barriers, so it's theoretically possible for a thread to observe
a bounce page pool which has not been fully initialized.  This is a
classic bug with "double-checked locking".

While "only a theoretical issue" in the latest kernel, in pre-4.8
kernels the pointer that was checked was not even the last to be
initialized, so it was easily possible for a crash (NULL pointer
dereference) to happen.  This was changed only incidentally by the large
refactor to use fs/crypto/.

Solve both problems in a trivial way that can easily be backported: just
always take the mutex.  It's theoretically less efficient, but it
shouldn't be noticeable in practice as the mutex is only acquired very
briefly once per encrypted file.

Later I'd like to make this use a helper macro like DO_ONCE().  However,
DO_ONCE() runs in atomic context, so we'd need to add a new macro that
allows blocking.

Signed-off-by: Eric Biggers 
Signed-off-by: Theodore Ts'o 
Signed-off-by: Greg Kroah-Hartman