linux-toradex.git/fs, branch v3.4.71

configfs: fix race between dentry put and lookup

2013-11-29T18:50:37+00:00

commit 76ae281f6307331aa063288edb6422ae99f435f0 upstream.

A race window in configfs, it starts from one dentry is UNHASHED and end
before configfs_d_iput is called.  In this window, if a lookup happen,
since the original dentry was UNHASHED, so a new dentry will be
allocated, and then in configfs_attach_attr(), sd->s_dentry will be
updated to the new dentry.  Then in configfs_d_iput(),
BUG_ON(sd->s_dentry != dentry) will be triggered and system panic.

sys_open:                     sys_close:
 ...                           fput
                                dput
                                 dentry_kill
                                  __d_drop <--- dentry unhashed here,
                                           but sd->dentry still point
                                           to this dentry.

 lookup_real
  configfs_lookup
   configfs_attach_attr---> update sd->s_dentry
                            to new allocated dentry here.

                                   d_kill
                                     configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry)
                                                     triggered here.

To fix it, change configfs_d_iput to not update sd->s_dentry if
sd->s_count > 2, that means there are another dentry is using the sd
beside the one that is going to be put.  Use configfs_dirent_lock in
configfs_attach_attr to sync with configfs_d_iput.

With the following steps, you can reproduce the bug.

1. enable ocfs2, this will mount configfs at /sys/kernel/config and
   fill configure in it.

2. run the following script.
	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &

Signed-off-by: Junxiao Bi 
Cc: Joel Becker 
Cc: Al Viro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

nfsd: make sure to balance get/put_write_access

2013-11-29T18:50:35+00:00

commit 987da4791052fa298b7cfcde4dea9f6f2bbc786b upstream.

Use a straight goto error label style in nfsd_setattr to make sure
we always do the put_write_access call after we got it earlier.

Note that the we have been failing to do that in the case
nfsd_break_lease() returns an error, a bug introduced into 2.6.38 with
6a76bebefe15d9a08864f824d7f8d5beaf37c997 "nfsd4: break lease on nfsd
setattr".

Signed-off-by: Christoph Hellwig 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman

nfsd: split up nfsd_setattr

2013-11-29T18:50:35+00:00

commit 818e5a22e907fbae75e9c1fd78233baec9fa64b6 upstream.

Split out two helpers to make the code more readable and easier to verify
for correctness.

Signed-off-by: Christoph Hellwig 
Signed-off-by: J. Bruce Fields 
Signed-off-by: Greg Kroah-Hartman

NFSv4: Fix a use-after-free situation in _nfs4_proc_getlk()

2013-11-29T18:50:34+00:00

commit a6f951ddbdfb7bd87d31a44f61abe202ed6ce57f upstream.

In nfs4_proc_getlk(), when some error causes a retry of the call to
_nfs4_proc_getlk(), we can end up with Oopses of the form

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000134
 IP: [] _raw_spin_lock+0xe/0x30

 Call Trace:
  [] _atomic_dec_and_lock+0x4d/0x70
  [] nfs4_put_lock_state+0x32/0xb0 [nfsv4]
  [] nfs4_fl_release_lock+0x15/0x20 [nfsv4]
  [] _nfs4_proc_getlk.isra.40+0x146/0x170 [nfsv4]
  [] nfs4_proc_lock+0x399/0x5a0 [nfsv4]

The problem is that we don't clear the request->fl_ops after the first
try and so when we retry, nfs4_set_lock_state() exits early without
setting the lock stateid.
Regression introduced by commit 70cc6487a4e08b8698c0e2ec935fb48d10490162
(locks: make ->lock release private data before returning in GETLK case)

Reported-by: Weston Andros Adamson 
Reported-by: Jorge Mora 
Signed-off-by: Trond Myklebust 
Signed-off-by: Greg Kroah-Hartman

exec/ptrace: fix get_dumpable() incorrect tests

2013-11-29T18:50:34+00:00

commit d049f74f2dbe71354d43d393ac3a188947811348 upstream.

The get_dumpable() return value is not boolean.  Most users of the
function actually want to be testing for non-SUID_DUMP_USER(1) rather than
SUID_DUMP_DISABLE(0).  The SUID_DUMP_ROOT(2) is also considered a
protected state.  Almost all places did this correctly, excepting the two
places fixed in this patch.

Wrong logic:
    if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
        or
    if (dumpable == 0) { /* be protective */ }
        or
    if (!dumpable) { /* be protective */ }

Correct logic:
    if (dumpable != SUID_DUMP_USER) { /* be protective */ }
        or
    if (dumpable != 1) { /* be protective */ }

Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
user was able to ptrace attach to processes that had dropped privileges to
that user.  (This may have been partially mitigated if Yama was enabled.)

The macros have been moved into the file that declares get/set_dumpable(),
which means things like the ia64 code can see them too.

CVE-2013-2929

Reported-by: Vasily Kulikov 
Signed-off-by: Kees Cook 
Cc: "Luck, Tony" 
Cc: Oleg Nesterov 
Cc: "Eric W. Biederman" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

exec: do not abuse ->cred_guard_mutex in threadgroup_lock()

2013-11-29T18:50:33+00:00

commit e56fb2874015370e3b7f8d85051f6dce26051df9 upstream.

threadgroup_lock() takes signal->cred_guard_mutex to ensure that
thread_group_leader() is stable.  This doesn't look nice, the scope of
this lock in do_execve() is huge.

And as Dave pointed out this can lead to deadlock, we have the
following dependencies:

	do_execve:		cred_guard_mutex -> i_mutex
	cgroup_mount:		i_mutex -> cgroup_mutex
	attach_task_by_pid:	cgroup_mutex -> cred_guard_mutex

Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
->cred_guard_mutex.

Note that de_thread() can't sleep with ->group_rwsem held, this can
obviously deadlock with the exiting leader if the writer is active, so it
does threadgroup_change_end() before schedule().

Reported-by: Dave Jones 
Acked-by: Tejun Heo 
Acked-by: Li Zefan 
Signed-off-by: Oleg Nesterov 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[ zhj: adjust context ]
Signed-off-by: Zhao Hongjiang 
Signed-off-by: Greg Kroah-Hartman

Nest rename_lock inside vfsmount_lock

2013-11-29T18:50:33+00:00

commit 7ea600b5314529f9d1b9d6d3c41cb26fce6a7a4a upstream.

... lest we get livelocks between path_is_under() and d_path() and friends.

The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks;
it is possible to have thread B spin on attempt to take lock shared while thread
A is already holding it shared, if B is on lower-numbered CPU than A and there's
a thread C spinning on attempt to take the same lock exclusive.

As the result, we need consistent ordering between vfsmount_lock (lglock) and
rename_lock (seq_lock), even though everything that takes both is going to take
vfsmount_lock only shared.

Spotted-by: Brad Spengler 
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro 
[ zhj: backport to 3.4:
  - Adjust context
  - s/&vfsmount_lock/vfsmount_lock/]
Signed-off-by: Zhao Hongjiang 
Signed-off-by: Greg Kroah-Hartman

SUNRPC handle EKEYEXPIRED in call_refreshresult

2013-11-29T18:50:32+00:00

commit eb96d5c97b0825d542e9c4ba5e0a22b519355166 upstream.

Currently, when an RPCSEC_GSS context has expired or is non-existent
and the users (Kerberos) credentials have also expired or are non-existent,
the client receives the -EKEYEXPIRED error and tries to refresh the context
forever.  If an application is performing I/O, or other work against the share,
the application hangs, and the user is not prompted to refresh/establish their
credentials. This can result in a denial of service for other users.

Users are expected to manage their Kerberos credential lifetimes to mitigate
this issue.

Move the -EKEYEXPIRED handling into the RPC layer. Try tk_cred_retry number
of times to refresh the gss_context, and then return -EACCES to the application.

Signed-off-by: Andy Adamson 
Signed-off-by: Trond Myklebust 
[bwh: Backported to 3.2:
 - Adjust context
 - Drop change to nfs4_handle_reclaim_lease_error()]
Signed-off-by: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman

nfs: don't allow nfs_find_actor to match inodes of the wrong type

2013-11-29T18:50:30+00:00

commit f6488c9ba51d65410e2dbc4345413c0d9120971e upstream.

Benny Halevy reported the following oops when testing RHEL6:

<7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [] nfs_closedir+0x15/0x30 [nfs]
<4>PGD 81448a067 PUD 831632067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled
<4>CPU 6
<4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6  /ProLiant DL170e G6
<4>RIP: 0010:[]  [] nfs_closedir+0x15/0x30 [nfs]
<4>RSP: 0018:ffff88081458bb98  EFLAGS: 00010292
<4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003
<4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760
<4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008
<4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780
<4>FS:  00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040)
<4>Stack:
<4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745
<4> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea
<4> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760
<4>Call Trace:
<4> [] __fput+0xf5/0x210
<4> [] ? do_open+0x0/0x20 [nfs]
<4> [] fput+0x25/0x30
<4> [] __dentry_open+0x27e/0x360
<4> [] ? inotify_d_instantiate+0x2a/0x60
<4> [] lookup_instantiate_filp+0x69/0x90
<4> [] nfs_intent_set_file+0x59/0x90 [nfs]
<4> [] nfs_atomic_lookup+0x1bb/0x310 [nfs]
<4> [] __lookup_hash+0x102/0x160
<4> [] ? selinux_inode_permission+0x72/0xb0
<4> [] lookup_hash+0x3a/0x50
<4> [] do_filp_open+0x2eb/0xdd0
<4> [] ? __do_page_fault+0x1ec/0x480
<4> [] ? alloc_fd+0x92/0x160
<4> [] do_sys_open+0x69/0x140
<4> [] ? sys_lseek+0x66/0x80
<4> [] sys_open+0x20/0x30
<4> [] system_call_fastpath+0x16/0x1b
<4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31
<1>RIP  [] nfs_closedir+0x15/0x30 [nfs]
<4> RSP 
<4>CR2: 0000000000000000

I think this is ultimately due to a bug on the server. The client had
previously found a directory dentry. It then later tried to do an atomic
open on a new (regular file) dentry. The attributes it got back had the
same filehandle as the previously found directory inode. It then tried
to put the filp because it failed the aops tests for O_DIRECT opens, and
oopsed here because the ctx was still NULL.

Obviously the root cause here is a server issue, but we can take steps
to mitigate this on the client. When nfs_fhget is called, we always know
what type of inode it is. In the event that there's a broken or
malicious server on the other end of the wire, the client can end up
crashing because the wrong ops are set on it.

Have nfs_find_actor check that the inode type is correct after checking
the fileid. The fileid check should rarely ever match, so it should only
rarely ever get to this check. In the case where we have a broken
server, we may see two different inodes with the same i_ino, but the
client should be able to cope with them without crashing.

This should fix the oops reported here:

    https://bugzilla.redhat.com/show_bug.cgi?id=913660

Reported-by: Benny Halevy 
Signed-off-by: Jeff Layton 
Signed-off-by: Trond Myklebust 
Cc: Rui Xiang 
Signed-off-by: Greg Kroah-Hartman

vfs,proc: guarantee unique inodes in /proc

2013-11-29T18:50:30+00:00

commit 51f0885e5415b4cc6535e9cdcc5145bfbc134353 upstream.

Dave Jones found another /proc issue with his Trinity tool: thanks to
the namespace model, we can have multiple /proc dentries that point to
the same inode, aliasing directories in /proc//net/ for example.

This ends up being a total disaster, because it acts like hardlinked
directories, and causes locking problems.  We rely on the topological
sort of the inodes pointed to by dentries, and if we have aliased
directories, that odering becomes unreliable.

In short: don't do this.  Multiple dentries with the same (directory)
inode is just a bad idea, and the namespace code should never have
exposed things this way.  But we're kind of stuck with it.

This solves things by just always allocating a new inode during /proc
dentry lookup, instead of using "iget_locked()" to look up existing
inodes by superblock and number.  That actually simplies the code a bit,
at the cost of potentially doing more inode [de]allocations.

That said, the inode lookup wasn't free either (and did a lot of locking
of inodes), so it is probably not that noticeable.  We could easily keep
the old lookup model for non-directory entries, but rather than try to
be excessively clever this just implements the minimal and simplest
workaround for the problem.

Reported-and-tested-by: Dave Jones 
Analyzed-by: Al Viro 
Signed-off-by: Linus Torvalds 
[bwh: Backported to 3.2:
 - Adjust context
 - Never drop the pde reference in proc_get_inode(), as callers only
   expect this when we return an existing inode, and we never do that now]
Signed-off-by: Ben Hutchings 
Cc: Rui Xiang 
Signed-off-by: Greg Kroah-Hartman