summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2010-12-09nfs: handle lock context allocation failures in nfs_create_requestJeff Layton
commit 015f0212d51d85bd281a831639a769b4a1a3307a upstream. nfs_get_lock_context can return NULL on an allocation failure. Regression introduced by commit f11ac8db. Reported-by: Steve Dickson <steved@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09ecryptfs: call vfs_setxattr() in ecryptfs_setxattr()Roberto Sassu
commit 48b512e6857139393cdfce26348c362b87537018 upstream. Ecryptfs is a stackable filesystem which relies on lower filesystems the ability of setting/getting extended attributes. If there is a security module enabled on the system it updates the 'security' field of inodes according to the owned extended attribute set with the function vfs_setxattr(). When this function is performed on a ecryptfs filesystem the 'security' field is not updated for the lower filesystem since the call security_inode_post_setxattr() is missing for the lower inode. Further, the call security_inode_setxattr() is missing for the lower inode, leading to policy violations in the security module because specific checks for this hook are not performed (i. e. filesystem 'associate' permission on SELinux is not checked for the lower filesystem). This patch replaces the call of the setxattr() method of the lower inode in the function ecryptfs_setxattr() with vfs_setxattr(). Signed-off-by: Roberto Sassu <roberto.sassu@polito.it> Cc: Dustin Kirkland <kirkland@canonical.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09eCryptfs: Clear LOOKUP_OPEN flag when creating lower fileTyler Hicks
commit 2e21b3f124eceb6ab5a07c8a061adce14ac94e14 upstream. eCryptfs was passing the LOOKUP_OPEN flag through to the lower file system, even though ecryptfs_create() doesn't support the flag. A valid filp for the lower filesystem could be returned in the nameidata if the lower file system's create() function supported LOOKUP_OPEN, possibly resulting in unencrypted writes to the lower file. However, this is only a potential problem in filesystems (FUSE, NFS, CIFS, CEPH, 9p) that eCryptfs isn't known to support today. https://bugs.launchpad.net/ecryptfs/+bug/641703 Reported-by: Kevin Buhr Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09hostfs: fix UML crash: remove f_spare from hostfsRichard Weinberger
commit 1b627d5771312c92404b66f0a0b16f66036dd2e1 upstream. 365b1818 ("add f_flags to struct statfs(64)") resized f_spare within struct statfs which caused a UML crash. There is no need to copy f_spare. Signed-off-by: Richard Weinberger <richard@nod.at> Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09reiserfs: don't acquire lock recursively in reiserfs_acl_chmodFrederic Weisbecker
commit 238af8751f64a75f8b638193353b1c31ea32e738 upstream. reiserfs_acl_chmod() can be called by reiserfs_set_attr() and then take the reiserfs lock a second time. Thereafter it may call journal_begin() that definitely requires the lock not to be nested in order to release it before taking the journal mutex because the reiserfs lock depends on the journal mutex already. So, aviod nesting the lock in reiserfs_acl_chmod(). Reported-by: Pawel Zawora <pzawora@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Pawel Zawora <pzawora@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09reiserfs: fix inode mutex - reiserfs lock misorderingFrederic Weisbecker
commit da905873effecd1c0166e578bc4b5006f041b18b upstream. reiserfs_unpack() locks the inode mutex with reiserfs_mutex_lock_safe() to protect against reiserfs lock dependency. However this protection requires to have the reiserfs lock to be locked. This is the case if reiserfs_unpack() is called by reiserfs_ioctl but not from reiserfs_quota_on() when it tries to unpack tails of quota files. Fix the ordering of the two locks in reiserfs_unpack() to fix this issue. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: Markus Gapp <markus.gapp@gmx.net> Reported-by: Jan Kara <jack@suse.cz> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09NFS: Don't SIGBUS if nfs_vm_page_mkwrite races with a cache invalidationTrond Myklebust
commit bc4866b6e0b44f8ea0df22a16e5927714beb4983 upstream. In the case where we lock the page, and then find out that the page has been thrown out of the page cache, we should just return VM_FAULT_NOPAGE. This is what block_page_mkwrite() does in these situations. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09NFSv4: Fix open recoveryTrond Myklebust
commit b0ed9dbc24f1fd912b2dd08b995153cafc1d5b1c upstream. NFSv4 open recovery is currently broken: since we do not clear the state->flags states before attempting recovery, we end up with the 'can_open_cached()' function triggering. This again leads to no OPEN call being put on the wire. Reported-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error handlersTrond Myklebust
commit ae1007d37e00144b72906a4bdc47d517ae91bcc1 upstream. In the case of a server reboot, the state recovery thread starts by calling nfs4_state_end_reclaim_reboot() in order to avoid edge conditions when the server reboots while the client is in the middle of recovery. However, if the client has already marked the nfs4_state as requiring reboot recovery, then the above behaviour will cause the recovery thread to treat the open as if it was part of such an edge condition: the open will be recovered as if it was part of a lease expiration (and all the locks will be lost). Fix is to remove the call to nfs4_state_mark_reclaim_reboot from nfs4_async_handle_error(), and nfs4_handle_exception(). Instead we leave it to the recovery thread to do this for us. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09NFSv4: Don't call nfs4_reclaim_complete() on receiving NFS4ERR_STALE_CLIENTIDTrond Myklebust
commit 6eaa61496fb3b93cceface7a296415fc4c030bce upstream. If the server sends us an NFS4ERR_STALE_CLIENTID while the state management thread is busy reclaiming state, we do want to treat all state that wasn't reclaimed before the STALE_CLIENTID as if a network partition occurred (see the edge conditions described in RFC3530 and RFC5661). What we do not want to do is to send an nfs4_reclaim_complete(), since we haven't yet even started reclaiming state after the server rebooted. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09block: limit vec count in bio_kmalloc() and bio_alloc_map_data()Jens Axboe
commit f3f63c1c28bc861a931fac283b5bc3585efb8967 upstream. Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22Fixed Regression in NFS Direct I/O pathSteve Dickson
commit 568a810d7edd58bd505222dd1c7e48895532290b upstream. A typo, introduced by commit f11ac8db, in the nfs_direct_write() routine causes writes with O_DIRECT set to fail with a ENOMEM error. Found-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22pipe: fix failure to return error code on ->confirm()Nicolas Kaiser
commit e5953cbdff26f7cbae7eff30cd9b18c4e19b7594 upstream. The arguments were transposed, we want to assign the error code to 'ret', which is being returned. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22cifs: fix broken oplock handlingSuresh Jayaraman
commit aa91c7e4ab9b0842b7d7a7cbf8cca18b20df89b5 upstream. cifs_new_fileinfo() does not use the 'oplock' value from the callers. Instead, it sets it to REQ_OPLOCK which seems wrong. We should be using the oplock value obtained from the Server to set the inode's clientCanCacheAll or clientCanCacheRead flags. Fix this by passing oplock from the callers to cifs_new_fileinfo(). This change dates back to commit a6ce4932 (2.6.30-rc3). So, all the affected versions will need this fix. Please Cc stable once reviewed and accepted. Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-14Export dump_{write,seek} to binary loader modulesLinus Torvalds
If you build aout support as a module, you'll want these exported. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-14Un-inline the core-dump helper functionsLinus Torvalds
Tony Luck reports that the addition of the access_ok() check in commit 0eead9ab41da ("Don't dump task struct in a.out core-dumps") broke the ia64 compile due to missing the necessary header file includes. Rather than add yet another include (<asm/unistd.h>) to make everything happy, just uninline the silly core dump helper functions and move the bodies to fs/exec.c where they make a lot more sense. dump_seek() in particular was too big to be an inline function anyway, and none of them are in any way performance-critical. And we really don't need to mess up our include file headers more than they already are. Reported-and-tested-by: Tony Luck <tony.luck@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-14Don't dump task struct in a.out core-dumpsLinus Torvalds
akiphie points out that a.out core-dumps have that odd task struct dumping that was never used and was never really a good idea (it goes back into the mists of history, probably the original core-dumping code). Just remove it. Also do the access_ok() check on dump_write(). It probably doesn't matter (since normal filesystems all seem to do it anyway), but he points out that it's normally done by the VFS layer, so ... [ I suspect that we should possibly do "vfs_write()" instead of calling ->write directly. That also does the whole fsnotify and write statistics thing, which may or may not be a good idea. ] And just to be anal, do this all for the x86-64 32-bit a.out emulation code too, even though it's not enabled (and won't currently even compile) Reported-by: akiphie <akiphie@lavabit.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-13Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux: nfsd: fix BUG at fs/nfsd/nfsfh.h:199 on unlink
2010-10-13nfsd: fix BUG at fs/nfsd/nfsfh.h:199 on unlinkJ. Bruce Fields
As of commit 43a9aa64a2f4330a9cb59aaf5c5636566bce067c "NFSD: Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIR", we sometimes call fh_unlock on a filehandle that isn't fully initialized. We should fix up the callers, but as a quick fix it is also sufficient just to remove this assertion. Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-11fanotify: disable fanotify syscallsEric Paris
This patch disables the fanotify syscalls by just not building them and letting the cond_syscall() statements in kernel/sys_ni.c redirect them to sys_ni_syscall(). It was pointed out by Tvrtko Ursulin that the fanotify interface did not include an explicit prioritization between groups. This is necessary for fanotify to be usable for hierarchical storage management software, as they must get first access to the file, before inotify-like notifiers see the file. This feature can be added in an ABI compatible way in the next release (by using a number of bits in the flags field to carry the info) but it was suggested by Alan that maybe we should just hold off and do it in the next cycle, likely with an (new) explicit argument to the syscall. I don't like this approach best as I know people are already starting to use the current interface, but Alan is all wise and noone on list backed me up with just using what we have. I feel this is needlessly ripping the rug out from under people at the last minute, but if others think it needs to be a new argument it might be the best way forward. Three choices: Go with what we got (and implement the new feature next cycle). Add a new field right now (and implement the new feature next cycle). Wait till next cycle to release the ABI (and implement the new feature next cycle). This is number 3. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: update issue_seq on cap grant ceph: send cap release message early on failed revoke. ceph: Update max_len with minimum required size ceph: Fix return value of encode_fh function ceph: avoid null deref in osd request error path ceph: fix list_add usage on unsafe_writes list
2010-10-09Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osdLinus Torvalds
* 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: Fix double page_unlock BUG in write_begin/end
2010-10-08exofs: Fix double page_unlock BUG in write_begin/endBoaz Harrosh
This BUG is there since the first submit of the code, but only triggered in last Kernel. It's timing related do to the asynchronous object-creation behaviour of exofs. (Which should be investigated farther) The bug is obvious hence the fixed. Signed-off-by: Boaz Harrosh <Boaz Harrosh bharrosh@panasas.com>
2010-10-07Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: properly account for reclaimed inodes
2010-10-07ceph: update issue_seq on cap grantSage Weil
We need to update the issue_seq on any grant operation, be it via an MDS reply or a separate grant message. The update in the grant path was missing. This broke cap release for inodes in which the MDS sent an explicit grant message that was not soon after followed by a successful MDS reply on the same inode. Also fix the signedness on seq locals. Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07ceph: send cap release message early on failed revoke.Greg Farnum
If an MDS tries to revoke caps that we don't have, we want to send releases early since they probably contain the caps message the MDS is looking for. Previously, we only sent the messages if we didn't have the inode either. But in a multi-mds system we can retain the inode after dropping all caps for a single MDS. Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07ceph: Update max_len with minimum required sizeAneesh Kumar K.V
encode_fh on error should update max_len with minimum required size, so that caller can redo the call with the reallocated buffer. This is required with open by handle patch series Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07ceph: Fix return value of encode_fh functionAneesh Kumar K.V
encode_fh function should return 255 on error as done by other file system to indicate EOVERFLOW. Also max_len is in sizeof(u32) units and not in bytes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07ceph: avoid null deref in osd request error pathSage Weil
If we interrupt an osd request, we call __cancel_request, but it wasn't verifying that req->r_osd was non-NULL before dereferencing it. This could cause a crash if osds were flapping and we aborted a request on said osd. Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07ceph: fix list_add usage on unsafe_writes listHenry C Chang
Fix argument order. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-06xfs: properly account for reclaimed inodesJohannes Weiner
When marking an inode reclaimable, a per-AG counter is increased, the inode is tagged reclaimable in its per-AG tree, and, when this is the first reclaimable inode in the AG, the AG entry in the per-mount tree is also tagged. When an inode is finally reclaimed, however, it is only deleted from the per-AG tree. Neither the counter is decreased, nor is the parent tree's AG entry untagged properly. Since the tags in the per-mount tree are not cleared, the inode shrinker iterates over all AGs that have had reclaimable inodes at one point in time. The counters on the other hand signal an increasing amount of slab objects to reclaim. Since "70e60ce xfs: convert inode shrinker to per-filesystem context" this is not a real issue anymore because the shrinker bails out after one iteration. But the problem was observable on a machine running v2.6.34, where the reclaimable work increased and each process going into direct reclaim eventually got stuck on the xfs inode shrinking path, trying to scan several million objects. Fix this by properly unwinding the reclaimable-state tracking of an inode when it is reclaimed. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: stable@kernel.org Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2010-10-06Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: writeback: always use sb->s_bdi for writeback purposes
2010-10-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Initialize total_len in fuse_retrieve()
2010-10-04writeback: always use sb->s_bdi for writeback purposesChristoph Hellwig
We currently use struct backing_dev_info for various different purposes. Originally it was introduced to describe a backing device which includes an unplug and congestion function and various bits of readahead information and VM-relevant flags. We're also using for tracking dirty inodes for writeback. To make writeback properly find all inodes we need to only access the per-filesystem backing_device pointed to by the superblock in ->s_bdi inside the writeback code, and not the instances pointeded to by inode->i_mapping->backing_dev which can be overriden by special devices or might not be set at all by some filesystems. Long term we should split out the writeback-relevant bits of struct backing_device_info (which includes more than the current bdi_writeback) and only point to it from the superblock while leaving the traditional backing device as a separate structure that can be overriden by devices. The one exception for now is the block device filesystem which really wants different writeback contexts for it's different (internal) inodes to handle the writeout more efficiently. For now we do this with a hack in fs-writeback.c because we're so late in the cycle, but in the future I plan to replace this with a superblock method that allows for multiple writeback contexts per filesystem. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-04fuse: Initialize total_len in fuse_retrieve()Geert Uytterhoeven
fs/fuse/dev.c:1357: warning: ‘total_len’ may be used uninitialized in this function Initialize total_len to zero, else its value will be undefined. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2010-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: prevent infinite recursion in cifs_reconnect_tcon cifs: set backing_dev_info on new S_ISREG inodes
2010-10-01reiserfs: fix unwanted reiserfs lock recursionFrederic Weisbecker
Prevent from recursively locking the reiserfs lock in reiserfs_unpack() because we may call journal_begin() that requires the lock to be taken only once, otherwise it won't be able to release the lock while taking other mutexes, ending up in inverted dependencies between the journal mutex and the reiserfs lock for example. This fixes: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35.4.4a #3 ------------------------------------------------------- lilo/1620 is trying to acquire lock: (&journal->j_mutex){+.+...}, at: [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs] but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c10562b7>] lock_acquire+0x67/0x80 [<c12facad>] __mutex_lock_common+0x4d/0x410 [<c12fb0c8>] mutex_lock_nested+0x18/0x20 [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs] [<d0325c06>] do_journal_begin_r+0x86/0x340 [reiserfs] [<d0325f77>] journal_begin+0x77/0x140 [reiserfs] [<d0315be4>] reiserfs_remount+0x224/0x530 [reiserfs] [<c10b6a20>] do_remount_sb+0x60/0x110 [<c10cee25>] do_mount+0x625/0x790 [<c10cf014>] sys_mount+0x84/0xb0 [<c12fca3d>] syscall_call+0x7/0xb -> #0 (&journal->j_mutex){+.+...}: [<c10560f6>] __lock_acquire+0x1026/0x1180 [<c10562b7>] lock_acquire+0x67/0x80 [<c12facad>] __mutex_lock_common+0x4d/0x410 [<c12fb0c8>] mutex_lock_nested+0x18/0x20 [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs] [<d0325f77>] journal_begin+0x77/0x140 [reiserfs] [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs] [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs] [<c10db9db>] __block_prepare_write+0x1bb/0x3a0 [<c10dbbe6>] block_prepare_write+0x26/0x40 [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs] [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs] [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3188>] vfs_ioctl+0x28/0xa0 [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3eb3>] sys_ioctl+0x63/0x70 [<c12fca3d>] syscall_call+0x7/0xb other info that might help us debug this: 2 locks held by lilo/1620: #0: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d032945a>] reiserfs_unpack+0x6a/0x120 [reiserfs] #1: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a278>] reiserfs_write_lock+0x28/0x40 [reiserfs] stack backtrace: Pid: 1620, comm: lilo Not tainted 2.6.35.4.4a #3 Call Trace: [<c10560f6>] __lock_acquire+0x1026/0x1180 [<c10562b7>] lock_acquire+0x67/0x80 [<c12facad>] __mutex_lock_common+0x4d/0x410 [<c12fb0c8>] mutex_lock_nested+0x18/0x20 [<d0325bff>] do_journal_begin_r+0x7f/0x340 [reiserfs] [<d0325f77>] journal_begin+0x77/0x140 [reiserfs] [<d0326271>] reiserfs_persistent_transaction+0x41/0x90 [reiserfs] [<d030d06c>] reiserfs_get_block+0x22c/0x1530 [reiserfs] [<c10db9db>] __block_prepare_write+0x1bb/0x3a0 [<c10dbbe6>] block_prepare_write+0x26/0x40 [<d030b738>] reiserfs_prepare_write+0x88/0x170 [reiserfs] [<d03294d6>] reiserfs_unpack+0xe6/0x120 [reiserfs] [<d0329782>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3188>] vfs_ioctl+0x28/0xa0 [<c10c3bbd>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3eb3>] sys_ioctl+0x63/0x70 [<c12fca3d>] syscall_call+0x7/0xb Reported-by: Jarek Poplawski <jarkao2@gmail.com> Tested-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: All since 2.6.32 <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01reiserfs: fix dependency inversion between inode and reiserfs mutexesFrederic Weisbecker
The reiserfs mutex already depends on the inode mutex, so we can't lock the inode mutex in reiserfs_unpack() without using the safe locking API, because reiserfs_unpack() is always called with the reiserfs mutex locked. This fixes: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35c #13 ------------------------------------------------------- lilo/1606 is trying to acquire lock: (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs] but task is already holding lock: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&REISERFS_SB(s)->lock){+.+.+.}: [<c1056347>] lock_acquire+0x67/0x80 [<c12f083d>] __mutex_lock_common+0x4d/0x410 [<c12f0c58>] mutex_lock_nested+0x18/0x20 [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs] [<d0329e9a>] reiserfs_lookup_privroot+0x2a/0x90 [reiserfs] [<d0316b81>] reiserfs_fill_super+0x941/0xe60 [reiserfs] [<c10b7d17>] get_sb_bdev+0x117/0x170 [<d0313e21>] get_super_block+0x21/0x30 [reiserfs] [<c10b74ba>] vfs_kern_mount+0x6a/0x1b0 [<c10b7659>] do_kern_mount+0x39/0xe0 [<c10cebe0>] do_mount+0x340/0x790 [<c10cf0b4>] sys_mount+0x84/0xb0 [<c12f25cd>] syscall_call+0x7/0xb -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}: [<c1056186>] __lock_acquire+0x1026/0x1180 [<c1056347>] lock_acquire+0x67/0x80 [<c12f083d>] __mutex_lock_common+0x4d/0x410 [<c12f0c58>] mutex_lock_nested+0x18/0x20 [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs] [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3228>] vfs_ioctl+0x28/0xa0 [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3f53>] sys_ioctl+0x63/0x70 [<c12f25cd>] syscall_call+0x7/0xb other info that might help us debug this: 1 lock held by lilo/1606: #0: (&REISERFS_SB(s)->lock){+.+.+.}, at: [<d032a268>] reiserfs_write_lock+0x28/0x40 [reiserfs] stack backtrace: Pid: 1606, comm: lilo Not tainted 2.6.35c #13 Call Trace: [<c1056186>] __lock_acquire+0x1026/0x1180 [<c1056347>] lock_acquire+0x67/0x80 [<c12f083d>] __mutex_lock_common+0x4d/0x410 [<c12f0c58>] mutex_lock_nested+0x18/0x20 [<d0329450>] reiserfs_unpack+0x60/0x110 [reiserfs] [<d0329772>] reiserfs_ioctl+0x272/0x320 [reiserfs] [<c10c3228>] vfs_ioctl+0x28/0xa0 [<c10c3c5d>] do_vfs_ioctl+0x32d/0x5c0 [<c10c3f53>] sys_ioctl+0x63/0x70 [<c12f25cd>] syscall_call+0x7/0xb Reported-by: Jarek Poplawski <jarkao2@gmail.com> Tested-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: <stable@kernel.org> [2.6.32 and later] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01proc: make /proc/pid/limits world readableJiri Olsa
Having the limits file world readable will ease the task of system management on systems where root privileges might be restricted. Having admin restricted with root priviledges, he/she could not check other users process' limits. Also it'd align with most of the /proc stat files. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Eugene Teo <eugene@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01cifs: prevent infinite recursion in cifs_reconnect_tconJeff Layton
cifs_reconnect_tcon is called from smb_init. After a successful reconnect, cifs_reconnect_tcon will call reset_cifs_unix_caps. That function will, in turn call CIFSSMBQFSUnixInfo and CIFSSMBSetFSUnixInfo. Those functions also call smb_init. It's possible for the session and tcon reconnect to succeed, and then for another cifs_reconnect to occur before CIFSSMBQFSUnixInfo or CIFSSMBSetFSUnixInfo to be called. That'll cause those functions to call smb_init and cifs_reconnect_tcon again, ad infinitum... Break the infinite recursion by having those functions use a new smb_init variant that doesn't attempt to perform a reconnect. Reported-and-Tested-by: Michal Suchanek <hramrach@centrum.cz> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-09-29Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Don't walk off the end of fast symlinks.
2010-09-29ocfs2: Don't walk off the end of fast symlinks.Joel Becker
ocfs2 fast symlinks are NUL terminated strings stored inline in the inode data area. However, disk corruption or a local attacker could, in theory, remove that NUL. Because we're using strlen() (my fault, introduced in a731d1 when removing vfs_follow_link()), we could walk off the end of that string. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org
2010-09-29cifs: set backing_dev_info on new S_ISREG inodesJeff Layton
Testing on very recent kernel (2.6.36-rc6) made this warning pop: WARNING: at fs/fs-writeback.c:87 inode_to_bdi+0x65/0x70() Hardware name: Dirtiable inode bdi default != sb bdi cifs ...the following patch fixes it and seems to be the obviously correct thing to do for cifs. Cc: stable@kernel.org Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2010-09-29xfs: force background CIL push under sustained loadDave Chinner
I have been seeing occasional pauses in transaction throughput up to 30s long under heavy parallel workloads. The only notable thing was that the xfsaild was trying to be active during the pauses, but making no progress. It was running exactly 20 times a second (on the 50ms no-progress backoff), and the number of pushbuf events was constant across this time as well. IOWs, the xfsaild appeared to be stuck on buffers that it could not push out. Further investigation indicated that it was trying to push out inode buffers that were pinned and/or locked. The xfsbufd was also getting woken at the same frequency (by the xfsaild, no doubt) to push out delayed write buffers. The xfsbufd was not making any progress because all the buffers in the delwri queue were pinned. This scan- and-make-no-progress dance went one in the trace for some seconds, before the xfssyncd came along an issued a log force, and then things started going again. However, I noticed something strange about the log force - there were way too many IO's issued. 516 log buffers were written, to be exact. That added up to 129MB of log IO, which got me very interested because it's almost exactly 25% of the size of the log. He delayed logging code is suppose to aggregate the minimum of 25% of the log or 8MB worth of changes before flushing. That's what really puzzled me - why did a log force write 129MB instead of only 8MB? Essentially what has happened is that no CIL pushes had occurred since the previous tail push which cleared out 25% of the log space. That caused all the new transactions to block because there wasn't log space for them, but they kick the xfsaild to push the tail. However, the xfsaild was not making progress because there were buffers it could not lock and flush, and the xfsbufd could not flush them because they were pinned. As a result, both the xfsaild and the xfsbufd could not move the tail of the log forward without the CIL first committing. The cause of the problem was that the background CIL push, which should happen when 8MB of aggregated changes have been committed, is being held off by the concurrent transaction commit load. The background push does a down_write_trylock() which will fail if there is a concurrent transaction commit holding the push lock in read mode. With 8 CPUs all doing transactions as fast as they can, there was enough concurrent transaction commits to hold off the background push until tail-pushing could no longer free log space, and the halt would occur. It should be noted that there is no reason why it would halt at 25% of log space used by a single CIL checkpoint. This bug could definitely violate the "no transaction should be larger than half the log" requirement and hence result in corruption if the system crashed under heavy load. This sort of bug is exactly the reason why delayed logging was tagged as experimental.... The fix is to start blocking background pushes once the threshold has been exceeded. Rework the threshold calculations to keep the amount of log space a CIL checkpoint can use to below that of the AIL push threshold to avoid the problem completely. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-09-24Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: o2dlm: force free mles during dlm exit ocfs2: Sync inode flags with ext2. ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits. ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent. ocfs2: update ctime when changing the file's permission by setfacl ocfs2/net: fix uninitialized ret in o2net_send_message_vec() Ocfs2: Handle empty list in lockres_seq_start() for dlmdebug.c Ocfs2: Re-access the journal after ocfs2_insert_extent() in dxdir codes. ocfs2: Fix lockdep warning in reflink. ocfs2/lockdep: Move ip_xattr_sem out of ocfs2_xattr_get_nolock.
2010-09-23o2dlm: force free mles during dlm exitSrinivas Eeda
While umounting, a block mle doesn't get freed if dlm is shutdown after master request is received but before assert master. This results in unclean shutdown of dlm domain. This patch frees all mles that lie around after other nodes were notified about exiting the dlm and marking dlm state as leaving. Only block mles are expected to be around, so we log ERROR for other mles but still free them. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-23ocfs2: Sync inode flags with ext2.Tao Ma
We sync our inode flags with ext2 and define them by hex values. But actually in commit 3669567(4 years ago), all these values are moved to include/linux/fs.h. So we'd better also use them as what ext2 did. So sync our inode flags with ext2 by using FS_*. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-23ocfs2: Move 'wanted' into parens of ocfs2_resmap_resv_bits.Tao Ma
The first time I read the function ocfs2_resmap_resv_bits, I consider about what 'wanted' will be used and consider about the comments. Then I find it is only used if the reservation is empty. ;) So we'd better move it to the parens so that it make the code more readable, what's more, ocfs2_resmap_resv_bits is used so frequently and we should save some cpus. Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-23ocfs2: Use cpu_to_le16 for e_leaf_clusters in ocfs2_bg_discontig_add_extent.Tao Ma
e_leaf_clusters is a le16, so use cpu_to_le16 instead of cpu_to_le32. What's more, we change 'clusters' to unsigned int to signify that the size of 'clusters' isn't important here. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-09-23ocfs2: update ctime when changing the file's permission by setfaclTao Ma
In commit 30e2bab, ext3 fixed it. So change it accordingly in ocfs2. Steps to reproduce: # touch aaa # stat -c %Z aaa 1283760364 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1283760364 Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>