linux-toradex.git/fs, branch v3.15.10

xfs: log vector rounding leaks log space

2014-08-14T01:51:50+00:00

commit 110dc24ad2ae4e9b94b08632fe1eb2fcdff83045 upstream.

The addition of direct formatting of log items into the CIL
linear buffer added alignment restrictions that the start of each
vector needed to be 64 bit aligned. Hence padding was added in
xlog_finish_iovec() to round up the vector length to ensure the next
vector started with the correct alignment.

This adds a small number of bytes to the size of
the linear buffer that is otherwise unused. The issue is that we
then use the linear buffer size to determine the log space used by
the log item, and this includes the unused space. Hence when we
account for space used by the log item, it's more than is actually
written into the iclogs, and hence we slowly leak this space.

This results in log hangs when reserving space, with threads getting
stuck with these stack traces:

Call Trace:
[] schedule+0x29/0x70
[] xlog_grant_head_wait+0xa2/0x1a0
[] xlog_grant_head_check+0xbd/0x140
[] xfs_log_reserve+0x103/0x220
[] xfs_trans_reserve+0x2f5/0x310
.....

The 4 bytes is significant. Brian Foster did all the hard work in
tracking down a reproducible leak to inode chunk allocation (it went
away with the ikeep mount option). His rough numbers were that
creating 50,000 inodes leaked 11 log blocks. This turns out to be
roughly 800 inode chunks or 1600 inode cluster buffers. That
works out at roughly 4 bytes per cluster buffer logged, and at that
point I started looking for a 4 byte leak in the buffer logging code.

What I found was that a struct xfs_buf_log_format structure for an
inode cluster buffer is 28 bytes in length. This gets rounded up to
32 bytes, but the vector length remains 28 bytes. Hence the CIL
ticket reservation is decremented by 32 bytes (via lv->lv_buf_len)
for that vector rather than 28 bytes which are written into the log.

The fix for this problem is to separately track the bytes used by
the log vectors in the item and use that instead of the buffer
length when accounting for the log space that will be used by the
formatted log item.
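
The accounting difference can be illustrated with a small userspace sketch.
It is not the kernel code: struct demo_log_vec, its fields and the demo_*
helpers are hypothetical names chosen for this example, and the 8-byte
alignment simply mirrors the 64 bit requirement described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a log vector; field names are illustrative. */
struct demo_log_vec {
	uint32_t buf_len;	/* linear buffer bytes consumed, padding included */
	uint32_t bytes;		/* bytes that will actually be written to the log */
};

/* Round len up to the next multiple of align (align must be a power of two). */
static uint32_t demo_round_up(uint32_t len, uint32_t align)
{
	return (len + align - 1) & ~(align - 1);
}

/* Record one formatted region: pad the buffer, but account only real bytes. */
static void demo_finish_iovec(struct demo_log_vec *lv, uint32_t len)
{
	lv->buf_len += demo_round_up(len, (uint32_t)sizeof(uint64_t));
	lv->bytes += len;
}

/* Space that would be over-reserved if accounting used the padded length. */
static uint32_t demo_leak(const struct demo_log_vec *lv)
{
	return lv->buf_len - lv->bytes;
}
```

Feeding a single 28 byte region through demo_finish_iovec() leaves buf_len
at 32 and bytes at 28, so demo_leak() reports the 4 byte difference that
accounting by buffer length would lose per vector.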

Again, thanks to Brian Foster for doing all the hard work and long
hours to isolate this leak and make finding the bug relatively
simple.

Signed-off-by: Dave Chinner 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Brian Foster 
Signed-off-by: Dave Chinner 
Cc: Bill 
Signed-off-by: Greg Kroah-Hartman

vfs: fix check for fallocate on active swapfile

2014-08-07T23:53:53+00:00

commit 6d2b6170c8914c6c69256b687651fb16d7ec3e18 upstream.

Fix the broken check for calling sys_fallocate() on an active swapfile,
introduced by commit 0790b31b69374ddadefe ("fs: disallow all fallocate
operation on active swapfile").

Signed-off-by: Eric Biggers 
Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman

fs: umount on symlink leaks mnt count

2014-07-31T19:44:08+00:00

commit 295dc39d941dc2ae53d5c170365af4c9d5c16212 upstream.

Currently umount on a symlink blocks a following umount:

/vz is separate mount

# ls /vz/ -al | grep test
drwxr-xr-x.  2 root root       4096 Jul 19 01:14 testdir
lrwxrwxrwx.  1 root root         11 Jul 19 01:16 testlink -> /vz/testdir
# umount -l /vz/testlink
umount: /vz/testlink: not mounted (expected)

# lsof /vz
# umount /vz
umount: /vz: device is busy. (unexpected)

In this case mountpoint_last() gets an extra refcount on path->mnt.

Signed-off-by: Vasily Averin 
Acked-by: Ian Kent 
Acked-by: Jeff Layton 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Greg Kroah-Hartman

fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT

2014-07-31T19:44:08+00:00

commit d7afaec0b564f0609e116f562983b8e72fc3e9c9 upstream.

Here are some additional changes to set a capability flag so that clients can
detect when it's appropriate to return -ENOSYS from open.

This amends the following commit introduced in 3.14:

  7678ac50615d  fuse: support clients that don't implement 'open'

However we can only add the flag to 3.15 and later since there was no
protocol version update in 3.14.

Signed-off-by: Miklos Szeredi 
Signed-off-by: Greg Kroah-Hartman

fuse: s_time_gran fix

2014-07-31T19:44:08+00:00

commit a800bad36619ce47ac0222004635448e6c91ff72 upstream.

Default s_time_gran is 1, don't overwrite that if userspace didn't
explicitly specify one.

Signed-off-by: Miklos Szeredi 
Signed-off-by: Greg Kroah-Hartman

coredump: fix the setting of PF_DUMPCORE

2014-07-31T19:44:08+00:00

commit aed8adb7688d5744cb484226820163af31d2499a upstream.

Commit 079148b919d0 ("coredump: factor out the setting of PF_DUMPCORE")
cleaned up the setting of PF_DUMPCORE by removing it from all the
linux_binfmt->core_dump() and moving it to zap_threads(). But this ended
up clearing all the previously set flags.  This causes issues during
core generation when tsk->flags is checked again (e.g. for PF_USED_MATH
to dump floating point registers).  Fix this.
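
As a rough illustration of how setting one flag can clear the others, the
sketch below contrasts a plain assignment with OR-ing the bit in. It assumes
the clearing came from an assignment of this kind; the DEMO_* bit values and
the demo_* helpers are made up for the example and are not the kernel's
definitions.

```c
#include <stdint.h>

/* Illustrative bit values only; the kernel's real PF_* constants differ. */
#define DEMO_PF_USED_MATH 0x1u
#define DEMO_PF_DUMPCORE  0x2u

/* Buggy pattern: plain assignment discards every previously set flag. */
static uint32_t demo_set_dumpcore_buggy(uint32_t flags)
{
	(void)flags;		/* old flags are ignored, which is the bug */
	return DEMO_PF_DUMPCORE;
}

/* Fixed pattern: OR the bit in so earlier flags survive. */
static uint32_t demo_set_dumpcore_fixed(uint32_t flags)
{
	return flags | DEMO_PF_DUMPCORE;
}
```

Starting from DEMO_PF_USED_MATH, the buggy helper returns a value with that
bit cleared, while the fixed helper returns both bits set, so a later check
for the floating point flag still succeeds.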

Signed-off-by: Silesh C V 
Acked-by: Oleg Nesterov 
Cc: Mandeep Singh Baines 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

nfs: only show Posix ACLs in listxattr if actually present

2014-07-31T19:44:06+00:00

commit 74adf83f5d7720925499b4938f930591f947b660 upstream.

The big ACL switched nfs to use generic_listxattr, which calls all existing
->list handlers.  Add a custom .listxattr implementation that only lists
the ACLs if they actually are present on the given inode.

Signed-off-by: Christoph Hellwig 
Reported-by: Philippe Troin 
Tested-by: Philippe Troin 
Fixes: 013cdf1088d7 (nfs: use generic posix ACL infrastructure ...)
Signed-off-by: Trond Myklebust 
Signed-off-by: Greg Kroah-Hartman

aio: protect reqs_available updates from changes in interrupt handlers

2014-07-28T15:08:28+00:00

commit 263782c1c95bbddbb022dc092fd89a36bb8d5577 upstream.

As of commit f8567a3845ac05bb28f3c1b478ef752762bd39ef it is now possible to
have put_reqs_available() called from irq context.  While put_reqs_available()
is per cpu, it did not protect itself from interrupts on the same CPU.  This
led to aio_complete() corrupting the available io requests count when run
under heavy O_DIRECT workloads as reported by Robert Elliott.  Fix this by
disabling irqs around the per cpu batch updates of reqs_available.
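
The lost update can be modelled deterministically in userspace. The sketch
below is a toy model, not the aio code: struct demo_cpu and the demo_*
functions are hypothetical, and the "interrupt" is simulated by an explicit
call placed between the read and the write of the counter.

```c
#include <stdbool.h>

/* Toy model of one CPU: a counter, an interrupt-enable bit, a pending IRQ. */
struct demo_cpu {
	int reqs_available;
	bool irqs_enabled;
	bool irq_pending;
};

/* The simulated interrupt handler returns one request slot to the counter. */
static void demo_deliver_irq(struct demo_cpu *cpu)
{
	if (cpu->irqs_enabled && cpu->irq_pending) {
		cpu->irq_pending = false;
		cpu->reqs_available += 1;
	}
}

/* Unprotected read-modify-write: an IRQ between read and write is lost. */
static void demo_put_unprotected(struct demo_cpu *cpu, int nr)
{
	int tmp = cpu->reqs_available;

	demo_deliver_irq(cpu);			/* interrupt lands mid-update */
	cpu->reqs_available = tmp + nr;		/* overwrites the handler's change */
}

/* Protected update: the IRQ is held off until the write has completed. */
static void demo_put_protected(struct demo_cpu *cpu, int nr)
{
	int tmp;

	cpu->irqs_enabled = false;		/* models disabling irqs on this CPU */
	tmp = cpu->reqs_available;
	demo_deliver_irq(cpu);			/* no effect while irqs are off */
	cpu->reqs_available = tmp + nr;
	cpu->irqs_enabled = true;		/* models re-enabling irqs */
	demo_deliver_irq(cpu);			/* the pending IRQ runs now */
}
```

With a pending interrupt and a starting count of zero, returning 4 slots
through demo_put_unprotected() ends at 4 (the handler's increment is lost),
whereas demo_put_protected() ends at 5.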

Many thanks to Robert and folks for testing and tracking this down.

Reported-by: Robert Elliot 
Tested-by: Robert Elliot 
Signed-off-by: Benjamin LaHaise 
Cc: Jens Axboe , Christoph Hellwig 
Signed-off-by: Greg Kroah-Hartman

quota: missing lock in dqcache_shrink_scan()

2014-07-28T15:08:21+00:00

commit d68aab6b8f572406aa93b45ef6483934dd3b54a6 upstream.

Commit 1ab6c4997e04 (fs: convert fs shrinkers to new scan/count API)
accidentally removed locking from quota shrinker. Fix it -
dqcache_shrink_scan() should use dq_list_lock to protect the
scan on free_dquots list.

Fixes: 1ab6c4997e04a00c50c6d786c2f046adc0d1f5de
Signed-off-by: Niu Yawei 
Signed-off-by: Jan Kara 
Signed-off-by: Greg Kroah-Hartman

fuse: ignore entry-timeout on LOOKUP_REVAL

2014-07-28T15:08:20+00:00

commit 154210ccb3a871e631bf39fdeb7a8731d98af87b upstream.

The following test case demonstrates the bug:

  sh# mount -t glusterfs localhost:meta-test /mnt/one

  sh# mount -t glusterfs localhost:meta-test /mnt/two

  sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; echo stuff > /mnt/one/file
  bash: /mnt/one/file: Stale file handle

  sh# echo stuff > /mnt/one/file; rm -f /mnt/two/file; sleep 1; echo stuff > /mnt/one/file

On the second open() on /mnt/one, FUSE would have used the old
nodeid (file handle) trying to re-open it. Gluster is returning
-ESTALE. The ESTALE propagates back to namei.c:filename_lookup()
where lookup is re-attempted with LOOKUP_REVAL. The right
behavior now would be for FUSE to ignore the entry-timeout
and do the up-call revalidation. Instead FUSE is ignoring
LOOKUP_REVAL, succeeding the revalidation (because entry-timeout
has not passed), and open() is again retried on the old file
handle and finally the ESTALE is going back to the application.

Fix: if revalidation is happening with LOOKUP_REVAL, then ignore
entry-timeout and always do the up-call.
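
The decision described in the fix can be sketched as a small predicate. This
is a minimal illustration under stated assumptions, not the FUSE source:
DEMO_LOOKUP_REVAL and demo_needs_upcall() are hypothetical names, and the
timestamps are plain integers in an arbitrary unit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag value; not the kernel's actual LOOKUP_REVAL constant. */
#define DEMO_LOOKUP_REVAL 0x1u

/*
 * Decide whether a cached entry must be revalidated with an up-call.
 * entry_expiry and now are expressed in the same arbitrary time unit.
 */
static bool demo_needs_upcall(uint64_t entry_expiry, uint64_t now,
			      unsigned int flags)
{
	if (flags & DEMO_LOOKUP_REVAL)
		return true;		/* forced revalidation ignores entry-timeout */
	return now > entry_expiry;	/* otherwise only after the timeout passed */
}
```

For an entry that expires at time 100 and is looked up at time 50, the
predicate is false without the flag and true with it, which corresponds to
the retried lookup in the test case above no longer reusing the stale entry.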

Signed-off-by: Anand Avati 
Reviewed-by: Niels de Vos 
Signed-off-by: Miklos Szeredi 
Signed-off-by: Greg Kroah-Hartman