linux-toradex.git/fs, branch v3.10.41

don't bother with {get,put}_write_access() on non-regular files

2014-05-31T04:52:12+00:00

commit dd20908a8a06b22c171f6c3fcdbdbd65bed07505 upstream.

it's pointless and actually leads to wrong behaviour in at least one
moderately convoluted case (pipe(), close one end, try to get to
another via /proc/*/fd and run into ETXTBUSY).

Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman

lockd: ensure we tear down any live sockets when socket creation fails during lockd_up

2014-05-13T11:59:46+00:00

commit 679b033df48422191c4cac52b610d9980e019f9b upstream.

We had a Fedora ABRT report with a stack trace like this:

kernel BUG at net/sunrpc/svc.c:550!
invalid opcode: 0000 [#1] SMP
[...]
CPU: 2 PID: 913 Comm: rpc.nfsd Not tainted 3.13.6-200.fc20.x86_64 #1
Hardware name: Hewlett-Packard HP ProBook 4740s/1846, BIOS 68IRR Ver. F.40 01/29/2013
task: ffff880146b00000 ti: ffff88003f9b8000 task.ti: ffff88003f9b8000
RIP: 0010:[]  [] svc_destroy+0x128/0x130 [sunrpc]
RSP: 0018:ffff88003f9b9de0  EFLAGS: 00010206
RAX: ffff88003f829628 RBX: ffff88003f829600 RCX: 00000000000041ee
RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286
RBP: ffff88003f9b9de8 R08: 0000000000017360 R09: ffff88014fa97360
R10: ffffffff8114ce57 R11: ffffea00051c9c00 R12: ffff88003f829600
R13: 00000000ffffff9e R14: ffffffff81cc7cc0 R15: 0000000000000000
FS:  00007f4fde284840(0000) GS:ffff88014fa80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4fdf5192f8 CR3: 00000000a569a000 CR4: 00000000001407e0
Stack:
 ffff88003f792300 ffff88003f9b9e18 ffffffffa02de02a 0000000000000000
 ffffffff81cc7cc0 ffff88003f9cb000 0000000000000008 ffff88003f9b9e60
 ffffffffa033bb35 ffffffff8131c86c ffff88003f9cb000 ffff8800a5715008
Call Trace:
 [] lockd_up+0xaa/0x330 [lockd]
 [] nfsd_svc+0x1b5/0x2f0 [nfsd]
 [] ? simple_strtoull+0x2c/0x50
 [] ? write_pool_threads+0x280/0x280 [nfsd]
 [] write_threads+0x8b/0xf0 [nfsd]
 [] ? __get_free_pages+0x14/0x50
 [] ? get_zeroed_page+0x16/0x20
 [] ? simple_transaction_get+0xb1/0xd0
 [] nfsctl_transaction_write+0x48/0x80 [nfsd]
 [] vfs_write+0xb4/0x1f0
 [] ? putname+0x29/0x40
 [] SyS_write+0x49/0xa0
 [] ? __audit_syscall_exit+0x1f6/0x2a0
 [] system_call_fastpath+0x16/0x1b
Code: 31 c0 e8 82 db 37 e1 e9 2a ff ff ff 48 8b 07 8b 57 14 48 c7 c7 d5 c6 31 a0 48 8b 70 20 31 c0 e8 65 db 37 e1 e9 f4 fe ff ff 0f 0b <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55
RIP  [] svc_destroy+0x128/0x130 [sunrpc]
 RSP 

Evidently, we created some lockd sockets and then failed to create
others. make_socks then returned an error and we tried to tear down the
svc, but svc->sv_permsocks was not empty so we ended up tripping over
the BUG() in svc_destroy().

Fix this by ensuring that we tear down any live sockets we created when
socket creation is going to return an error.

Fixes: 786185b5f8abefa (SUNRPC: move per-net operations from...)
Reported-by: Raphos 
Signed-off-by: Jeff Layton 
Reviewed-by: Stanislav Kinsbursky 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman

locks: allow __break_lease to sleep even when break_time is 0

2014-05-13T11:59:44+00:00

commit 4991a628a789dc5954e98e79476d9808812292ec upstream.

A fl->fl_break_time of 0 has a special meaning to the lease break code
that basically means "never break the lease". knfsd uses this to ensure
that leases don't disappear out from under it.

Unfortunately, the code in __break_lease can end up passing this value
to wait_event_interruptible as a timeout, which prevents it from going
to sleep at all. This causes __break_lease to spin in a tight loop and
causes soft lockups.

Fix this by ensuring that we pass a minimum value of 1 as a timeout
instead.

Cc: J. Bruce Fields 
Reported-by: Terry Barnaby 
Signed-off-by: Jeff Layton 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman

ext4: use i_size_read in ext4_unaligned_aio()

2014-05-06T14:55:32+00:00

commit 6e6358fc3c3c862bfe9a5bc029d3f8ce43dc9765 upstream.

We haven't taken i_mutex yet, so we need to use i_size_read().

Signed-off-by: "Theodore Ts'o" 
Signed-off-by: Greg Kroah-Hartman

ext4: fix jbd2 warning under heavy xattr load

2014-05-06T14:55:32+00:00

commit ec4cb1aa2b7bae18dd8164f2e9c7c51abcf61280 upstream.

When heavily exercising xattr code the assertion that
jbd2_journal_dirty_metadata() shouldn't return error was triggered:

WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
jbd2_journal_dirty_metadata+0x1ba/0x260()

CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G    W 3.10.0-ceph-00049-g68d04c9 #1
Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
 ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
 ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
 0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
Call Trace:
 [] dump_stack+0x19/0x1b
 [] warn_slowpath_common+0x70/0xa0
 [] warn_slowpath_null+0x1a/0x20
 [] jbd2_journal_dirty_metadata+0x1ba/0x260
 [] __ext4_handle_dirty_metadata+0xa3/0x140
 [] ext4_xattr_release_block+0x103/0x1f0
 [] ext4_xattr_block_set+0x1e0/0x910
 [] ext4_xattr_set_handle+0x38b/0x4a0
 [] ? trace_hardirqs_on+0xd/0x10
 [] ext4_xattr_set+0xc2/0x140
 [] ext4_xattr_user_set+0x47/0x50
 [] generic_setxattr+0x6e/0x90
 [] __vfs_setxattr_noperm+0x7b/0x1c0
 [] vfs_setxattr+0xc4/0xd0
 [] setxattr+0x13e/0x1e0
 [] ? __sb_start_write+0xe7/0x1b0
 [] ? mnt_want_write_file+0x28/0x60
 [] ? fget_light+0x3c/0x130
 [] ? mnt_want_write_file+0x28/0x60
 [] ? __mnt_want_write+0x58/0x70
 [] SyS_fsetxattr+0xbe/0x100
 [] system_call_fastpath+0x16/0x1b

The reason for the warning is that buffer_head passed into
jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
caused by the following race of two ext4_xattr_release_block() calls:

CPU1                                CPU2
ext4_xattr_release_block()          ext4_xattr_release_block()
lock_buffer(bh);
/* False */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
} else {
  le32_add_cpu(&BHDR(bh)->h_refcount, -1);
  unlock_buffer(bh);
                                    lock_buffer(bh);
                                    /* True */
                                    if (BHDR(bh)->h_refcount == cpu_to_le32(1))
                                      get_bh(bh);
                                      ext4_free_blocks()
                                        ...
                                        jbd2_journal_forget()
                                          jbd2_journal_unfile_buffer()
                                          -> JH is gone
  error = ext4_handle_dirty_xattr_block(handle, inode, bh);
  -> triggers the warning

We fix the problem by moving ext4_handle_dirty_xattr_block() under the
buffer lock. Sadly this cannot be done in nojournal mode as that
function can call sync_dirty_buffer() which would deadlock. Luckily in
nojournal mode the race is harmless (we only dirty already freed buffer)
and thus for nojournal mode we leave the dirtying outside of the buffer
lock.

Reported-by: Sage Weil 
Signed-off-by: Jan Kara 
Signed-off-by: "Theodore Ts'o" 
Signed-off-by: Greg Kroah-Hartman

ocfs2: do not put bh when buffer_uptodate failed

2014-05-06T14:55:32+00:00

commit f7cf4f5bfe073ad792ab49c04f247626b3e38db6 upstream.

Do not put bh when buffer_uptodate failed in ocfs2_write_block and
ocfs2_write_super_or_backup, because it will put bh in b_end_io.
Otherwise it will hit a warning "VFS: brelse: Trying to free free
buffer".

Signed-off-by: Alex Chen 
Reviewed-by: Joseph Qi 
Reviewed-by: Srinivas Eeda 
Cc: Mark Fasheh 
Acked-by: Joel Becker 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

ocfs2: dlm: fix recovery hung

2014-05-06T14:55:32+00:00

commit ded2cf71419b9353060e633b59e446c42a6a2a09 upstream.

There is a race window in dlm_do_recovery() between dlm_remaster_locks()
and dlm_reset_recovery() when the recovery master nearly finish the
recovery process for a dead node.  After the master sends FINALIZE_RECO
message in dlm_remaster_locks(), another node may become the recovery
master for another dead node, and then send the BEGIN_RECO message to
all the nodes included the old master, in the handler of this message
dlm_begin_reco_handler() of old master, dlm->reco.dead_node and
dlm->reco.new_master will be set to the second dead node and the new
master, then in dlm_reset_recovery(), these two variables will be reset
to default value.  This will cause new recovery master can not finish
the recovery process and hung, at last the whole cluster will hung for
recovery.

old recovery master:                                 new recovery master:
dlm_remaster_locks()
                                                  become recovery master for
                                                  another dead node.
                                                  dlm_send_begin_reco_message()
dlm_begin_reco_handler()
{
 if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
  return -EAGAIN;
 }
 dlm_set_reco_master(dlm, br->node_idx);
 dlm_set_reco_dead_node(dlm, br->dead_node);
}
dlm_reset_recovery()
{
 dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
 dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
}
                                                  will hang in dlm_remaster_locks() for
                                                  request dlm locks info

Before send FINALIZE_RECO message, recovery master should set
DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done,
this can break the race windows as the BEGIN_RECO messages will not be
handled before DLM_RECO_STATE_FINALIZE flag is cleared.

A similar race may happen between new recovery master and normal node
which is in dlm_finalize_reco_handler(), also fix it.

Signed-off-by: Junxiao Bi 
Reviewed-by: Srinivas Eeda 
Reviewed-by: Wengang Wang 
Cc: Joel Becker 
Cc: Mark Fasheh 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

ocfs2: dlm: fix lock migration crash

2014-05-06T14:55:32+00:00

commit 34aa8dac482f1358d59110d5e3a12f4351f6acaa upstream.

This issue was introduced by commit 800deef3f6f8 ("ocfs2: use
list_for_each_entry where benefical") in 2007 where it replaced
list_for_each with list_for_each_entry.  The variable "lock" will point
to invalid data if "tmpq" list is empty and a panic will be triggered
due to this.  Sunil advised reverting it back, but the old version was
also not right.  At the end of the outer for loop, that
list_for_each_entry will also set "lock" to an invalid data, then in the
next loop, if the "tmpq" list is empty, "lock" will be an stale invalid
data and cause the panic.  So reverting the list_for_each back and reset
"lock" to NULL to fix this issue.

Another concern is that this seemes can not happen because the "tmpq"
list should not be empty.  Let me describe how.

old lock resource owner(node 1):                                  migratation target(node 2):
image there's lockres with a EX lock from node 2 in
granted list, a NR lock from node x with convert_type
EX in converting list.
dlm_empty_lockres() {
 dlm_pick_migration_target() {
   pick node 2 as target as its lock is the first one
   in granted list.
 }
 dlm_migrate_lockres() {
   dlm_mark_lockres_migrating() {
     res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
     wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
	 //after the above code, we can not dirty lockres any more,
     // so dlm_thread shuffle list will not run
                                                                   downconvert lock from EX to NR
                                                                   upconvert lock from NR to EX
<<< migration may schedule out here, then
<<< node 2 send down convert request to convert type from EX to
<<< NR, then send up convert request to convert type from NR to
<<< EX, at this time, lockres granted list is empty, and two locks
<<< in the converting list, node x up convert lock followed by
<<< node 2 up convert lock.

	 // will set lockres RES_MIGRATING flag, the following
	 // lock/unlock can not run
     dlm_lockres_release_ast(dlm, res);
   }

   dlm_send_one_lockres()
                                                                 dlm_process_recovery_data()
                                                                   for (i=0; inum_locks; i++)
                                                                     if (ml->node == dlm->node_num)
                                                                       for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
                                                                        list_for_each_entry(lock, tmpq, list)
                                                                        if (lock) break; <<< lock is invalid as grant list is empty.
                                                                       }
                                                                       if (lock->ml.node != ml->node)
                                                                         BUG() >>> crash here
 }

I see the above locks status from a vmcore of our internal bug.

Signed-off-by: Junxiao Bi 
Reviewed-by: Wengang Wang 
Cc: Sunil Mushran 
Reviewed-by: Srinivas Eeda 
Cc: Joel Becker 
Cc: Mark Fasheh 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

reiserfs: fix race in readdir

2014-05-06T14:55:30+00:00

commit 01d8885785a60ae8f4c37b0ed75bdc96d0fc6a44 upstream.

jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2)

The -ENOENT is due to readdir calling dir_emit on the same entry twice.

If the dir_emit callback sleeps and the tree is changed underneath us,
we won't be able to trust deh_offset(deh) anymore. We need to save
next_pos before we might sleep so we can find the next entry.

Signed-off-by: Jeff Mahoney 
Signed-off-by: Jan Kara 
Signed-off-by: Greg Kroah-Hartman

nfsd: set timeparms.to_maxval in setup_callback_client

2014-05-06T14:55:29+00:00

commit 3758cf7e14b753838fe754ede3862af10b35fdac upstream.

...otherwise the logic in the timeout handling doesn't work correctly.

Spotted-by: Trond Myklebust 
Signed-off-by: Jeff Layton 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman