linux-toradex.git/fs, branch v3.4.35

pstore: Avoid deadlock in panic and emergency-restart path

2013-03-03T22:06:43+00:00

commit 9f244e9cfd70c7c0f82d3c92ce772ab2a92d9f64 upstream.

[Issue]

When pstore is in panic and emergency-restart paths, it may be blocked
in those paths because it simply takes spin_lock.

This is an example scenario which pstore may hang up in a panic path:

 - cpuA grabs psinfo->buf_lock
 - cpuB panics and calls smp_send_stop
 - smp_send_stop sends IRQ to cpuA
 - after 1 second, cpuB gives up on cpuA and sends an NMI instead
 - cpuA is now in an NMI handler while still holding buf_lock
 - cpuB is deadlocked

This case may happen if a firmware has a bug and
cpuA is stuck talking with it more than one second.

Also, this is a similar scenario in an emergency-restart path:

 - cpuA grabs psinfo->buf_lock and stucks in a firmware
 - cpuB kicks emergency-restart via either sysrq-b or hangcheck timer.
   And then, cpuB is deadlocked by taking psinfo->buf_lock again.

[Solution]

This patch avoids the deadlocking issues in both panic and emergency_restart
paths by introducing a function, is_non_blocking_path(), to check if a cpu
can be blocked in current path.

With this patch, pstore is not blocked even if another cpu has
taken a spin_lock, in those paths by changing from spin_lock_irqsave
to spin_trylock_irqsave.

In addition, according to a comment of emergency_restart() in kernel/sys.c,
spin_lock shouldn't be taken in an emergency_restart path to avoid
deadlock. This patch fits the comment below.


/**
 *      emergency_restart - reboot the system
 *
 *      Without shutting down any hardware or taking any locks
 *      reboot the system.  This is called when we know we are in
 *      trouble so this is our best effort to reboot.  This is
 *      safe to call in interrupt context.
 */
void emergency_restart(void)


Signed-off-by: Seiji Aguchi 
Acked-by: Don Zickus 
Signed-off-by: Tony Luck 
Cc: CAI Qian 
Signed-off-by: Greg Kroah-Hartman

fuse: don't WARN when nlink is zero

2013-03-03T22:06:43+00:00

commit dfca7cebc2679f3d129f8e680a8f199a7ad16e38 upstream.

drop_nlink() warns if nlink is already zero.  This is triggerable by a buggy
userspace filesystem.  The cure, I think, is worse than the disease so disable
the warning.

Reported-by: Tero Roponen 
Signed-off-by: Miklos Szeredi 
Signed-off-by: Greg Kroah-Hartman

nfsd: Fix memleak

2013-03-03T22:06:42+00:00

commit 2d32b29a1c2830f7c42caa8258c714acd983961f upstream.

When free nfs-client, it must free the ->cl_stateids.

Signed-off-by: Jianpeng Ma 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman

ext4: fix free clusters calculation in bigalloc filesystem

2013-03-03T22:06:42+00:00

commit 304e220f0879198b1f5309ad6f0be862b4009491 upstream.

ext4_has_free_clusters() should tell us whether there is enough free
clusters to allocate, however number of free clusters in the file system
is converted to blocks using EXT4_C2B() which is not only wrong use of
the macro (we should have used EXT4_NUM_B2C) but it's also completely
wrong concept since everything else is in cluster units.

Moreover when calculating number of root clusters we should be using
macro EXT4_NUM_B2C() instead of EXT4_B2C() otherwise the result might be
off by one. However r_blocks_count should always be a multiple of the
cluster ratio so doing a plain bit shift should be enough here. We
avoid using EXT4_B2C() because it's confusing.

As a result of the first problem number of free clusters is much bigger
than it should have been and ext4_has_free_clusters() would return 1 even
if there is really not enough free clusters available.

Fix this by removing the EXT4_C2B() conversion of free clusters and
using bit shift when calculating number of root clusters. This bug
affects number of xfstests tests covering file system ENOSPC situation
handling. With this patch most of the ENOSPC problems with bigalloc file
system disappear, especially the errors caused by delayed allocation not
having enough space when the actual allocation is finally requested.

Signed-off-by: Lukas Czerner 
Signed-off-by: "Theodore Ts'o" 
Signed-off-by: Greg Kroah-Hartman

ext4: fix xattr block allocation/release with bigalloc

2013-03-03T22:06:42+00:00

commit 1231b3a1eb5740192aeebf5344dd6d6da000febf upstream.

Currently when new xattr block is created or released we we would call
dquot_free_block() or dquot_alloc_block() respectively, among the else
decrementing or incrementing the number of blocks assigned to the
inode by one block.

This however does not work for bigalloc file system because we always
allocate/free the whole cluster so we have to count with that in
dquot_free_block() and dquot_alloc_block() as well.

Use the clusters-to-blocks conversion EXT4_C2B() when passing number of
blocks to the dquot_alloc/free functions to fix the problem.

The problem has been revealed by xfstests #117 (and possibly others).

Signed-off-by: Lukas Czerner 
Signed-off-by: "Theodore Ts'o" 
Reviewed-by: Eric Sandeen 
Signed-off-by: Greg Kroah-Hartman

ext4: fix race in ext4_mb_add_n_trim()

2013-03-03T22:06:42+00:00

commit f1167009711032b0d747ec89a632a626c901a1ad upstream.

In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when
changing the lg_prealloc_list.

Signed-off-by: Niu Yawei 
Signed-off-by: "Theodore Ts'o" 
Signed-off-by: Greg Kroah-Hartman

ext4: check bh in ext4_read_block_bitmap()

2013-03-03T22:06:42+00:00

commit 15b49132fc972c63894592f218ea5a9a61b1a18f upstream.

Validate the bh pointer before using it, since
ext4_read_block_bitmap_nowait() might return NULL.

I've seen this in fsfuzz testing.

 EXT4-fs error (device loop0): ext4_read_block_bitmap_nowait:385: comm touch: Cannot get buffer for block bitmap - block_group = 0, block_bitmap = 3925999616
 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [] ext4_wait_block_bitmap+0x25/0xe0
 ...
 Call Trace:
  [] ext4_read_block_bitmap+0x35/0x60
  [] ext4_free_blocks+0x236/0xb80
  [] ? __getblk+0x36/0x70
  [] ? __find_get_block+0x8f/0x210
  [] ? kmem_cache_free+0x33/0x140
  [] ext4_xattr_release_block+0x1b5/0x1d0
  [] ext4_xattr_delete_inode+0xbe/0x100
  [] ext4_free_inode+0x7c/0x4d0
  [] ? ext4_mark_inode_dirty+0x88/0x230
  [] ext4_evict_inode+0x32c/0x490
  [] evict+0xa7/0x1c0
  [] iput_final+0xe3/0x170
  [] iput+0x3e/0x50
  [] ext4_add_nondir+0x4d/0x90
  [] ext4_create+0xeb/0x170
  [] vfs_create+0xac/0xd0
  [] lookup_open+0x185/0x1c0
  [] ? selinux_inode_permission+0xa9/0x170
  [] do_last+0x2d4/0x7a0
  [] path_openat+0xb3/0x480
  [] ? handle_mm_fault+0x251/0x3b0
  [] do_filp_open+0x49/0xa0
  [] ? __alloc_fd+0xdd/0x150
  [] do_sys_open+0x108/0x1f0
  [] sys_open+0x21/0x30
  [] system_call_fastpath+0x16/0x1b

Also fix comment for ext4_read_block_bitmap_nowait()

Signed-off-by: Eryu Guan 
Signed-off-by: "Theodore Ts'o" 
Signed-off-by: Greg Kroah-Hartman

fs: Fix possible use-after-free with AIO

2013-03-03T22:06:41+00:00

commit 54c807e71d5ac59dee56c685f2b66e27cd54c475 upstream.

Running AIO is pinning inode in memory using file reference. Once AIO
is completed using aio_complete(), file reference is put and inode can
be freed from memory. So we have to be sure that calling aio_complete()
is the last thing we do with the inode.

Acked-by: Jeff Moyer 
CC: Christoph Hellwig 
CC: Jens Axboe 
CC: Jeff Moyer 
Signed-off-by: Jan Kara 
Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman

ocfs2: ac->ac_allow_chain_relink=0 won't disable group relink

2013-03-03T22:06:40+00:00

commit 309a85b6861fedbb48a22d45e0e079d1be993b3a upstream.

ocfs2_block_group_alloc_discontig() disables chain relink by setting
ac->ac_allow_chain_relink = 0 because it grabs clusters from multiple
cluster groups.

It doesn't keep the credits for all chain relink,but
ocfs2_claim_suballoc_bits overrides this in this call trace:
ocfs2_block_group_claim_bits()->ocfs2_claim_clusters()->
__ocfs2_claim_clusters()->ocfs2_claim_suballoc_bits()
ocfs2_claim_suballoc_bits set ac->ac_allow_chain_relink = 1; then call
ocfs2_search_chain() one time and disable it again, and then we run out
of credits.

Fix is to allow relink by default and disable it in
ocfs2_block_group_alloc_discontig.

Without this patch, End-users will run into a crash due to run out of
credits, backtrace like this:

  RIP: 0010:[]  []
  jbd2_journal_dirty_metadata+0x164/0x170 [jbd2]
  RSP: 0018:ffff8801b919b5b8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88022139ddc0 RCX: ffff880159f652d0
  RDX: ffff880178aa3000 RSI: ffff880159f652d0 RDI: ffff880087f09bf8
  RBP: ffff8801b919b5e8 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000001e00 R11: 00000000000150b0 R12: ffff880159f652d0
  R13: ffff8801a0cae908 R14: ffff880087f09bf8 R15: ffff88018d177800
  FS:  00007fc9b0b6b6e0(0000) GS:ffff88022fd40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 000000000040819c CR3: 0000000184017000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process dd (pid: 9945, threadinfo ffff8801b919a000, task ffff880149a264c0)
  Call Trace:
    ocfs2_journal_dirty+0x2f/0x70 [ocfs2]
    ocfs2_relink_block_group+0x111/0x480 [ocfs2]
    ocfs2_search_chain+0x455/0x9a0 [ocfs2]
    ...

Signed-off-by: Xiaowei.Hu 
Reviewed-by: Srinivas Eeda 
Cc: Mark Fasheh 
Cc: Joel Becker 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

ocfs2: fix ocfs2_init_security_and_acl() to initialize acl correctly

2013-03-03T22:06:40+00:00

commit 32918dd9f19e5960af4cdfa41190bb843fb2247b upstream.

We need to re-initialize the security for a new reflinked inode with its
parent dirs if it isn't specified to be preserved for ocfs2_reflink().
However, the code logic is broken at ocfs2_init_security_and_acl()
although ocfs2_init_security_get() succeed.  As a result,
ocfs2_acl_init() does not involked and therefore the default ACL of
parent dir was missing on the new inode.

Note this was introduced by 9d8f13ba3 ("security: new
security_inode_init_security API adds function callback")

To reproduce:

    set default ACL for the parent dir(ocfs2 in this case):
    $ setfacl -m default:user:jeff:rwx ../ocfs2/
    $ getfacl ../ocfs2/
    # file: ../ocfs2/
    # owner: jeff
    # group: jeff
    user::rwx
    group::r-x
    other::r-x
    default:user::rwx
    default:user:jeff:rwx
    default:group::r-x
    default:mask::rwx
    default:other::r-x

    $ touch a
    $ getfacl a
    # file: a
    # owner: jeff
    # group: jeff
    user::rw-
    group::rw-
    other::r--

Before patching, create reflink file b from a, the user
default ACL entry(user:jeff:rwx)was missing:

    $ ./ocfs2_reflink a b
    $ getfacl b
    # file: b
    # owner: jeff
    # group: jeff
    user::rw-
    group::rw-
    other::r--

In this case, the end user can also observed an error message at syslog:

  (ocfs2_reflink,3229,2):ocfs2_init_security_and_acl:7193 ERROR: status = 0

After applying this patch, create reflink file c from a:

    $ ./ocfs2_reflink a c
    $ getfacl c
    # file: c
    # owner: jeff
    # group: jeff
    user::rw-
    user:jeff:rwx			#effective:rw-
    group::r-x			#effective:r--
    mask::rw-
    other::r--

Test program:
/* Usage: reflink   */
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

static int
reflink_file(char const *src_name, char const *dst_name,
	     bool preserve_attrs)
{
	int fd;

#ifndef REFLINK_ATTR_NONE
#  define REFLINK_ATTR_NONE 0
#endif
#ifndef REFLINK_ATTR_PRESERVE
#  define REFLINK_ATTR_PRESERVE 1
#endif
#ifndef OCFS2_IOC_REFLINK
	struct reflink_arguments {
		uint64_t old_path;
		uint64_t new_path;
		uint64_t preserve;
	};

#  define OCFS2_IOC_REFLINK _IOW ('o', 4, struct reflink_arguments)
#endif
	struct reflink_arguments args = {
		.old_path = (unsigned long) src_name,
		.new_path = (unsigned long) dst_name,
		.preserve = preserve_attrs ? REFLINK_ATTR_PRESERVE :
					     REFLINK_ATTR_NONE,
	};

	fd = open(src_name, O_RDONLY);
	if (fd < 0) {
		fprintf(stderr, "Failed to open %s: %s\n",
			src_name, strerror(errno));
		return -1;
	}

	if (ioctl(fd, OCFS2_IOC_REFLINK, &args) < 0) {
		fprintf(stderr, "Failed to reflink %s to %s: %s\n",
			src_name, dst_name, strerror(errno));
		return -1;
	}
}

int
main(int argc, char *argv[])
{
	if (argc != 3) {
		fprintf(stdout, "Usage: %s source dest\n", argv[0]);
		return 1;
	}

	return reflink_file(argv[1], argv[2], 0);
}

Signed-off-by: Jie Liu 
Reviewed-by: Tao Ma 
Cc: Mimi Zohar 
Cc: Joel Becker 
Cc: Mark Fasheh 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman