summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2008-06-16cifs: fix oops on mount when CONFIG_CIFS_DFS_UPCALL is enabledMarcin Slusarz
simple "mount -t cifs //xxx /mnt" oopsed on strlen of options http://kerneloops.org/guilty.php?guilty=cifs_get_sb&version=2.6.25-release&start=16711 \ 68&end=1703935&class=oops Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-16ecryptfs: fix missed mutex_unlockCyrill Gorcunov
upstream commit: 71fd5179e8d1d4d503b517e0c5374f7c49540bfc Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16ecryptfs: clean up (un)lock_parentMiklos Szeredi
upstream commit: 8dc4e37362a5dc910d704d52ac6542bfd49ddc2f dget(dentry->d_parent) --> dget_parent(dentry) unlock_parent() is racy and unnecessary. Replace single caller with unlock_dir(). There are several other suspect uses of ->d_parent in ecryptfs... Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16eCryptfs: protect crypt_stat->flags in ecryptfs_open()Michael Halcrow
upstream commit: 2f9b12a31fcb738ea8c9eb0d4ddf906c6f1d696c Make sure crypt_stat->flags is protected with a lock in ecryptfs_open(). Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16ecryptfs: add missing lock around notify_changeMiklos Szeredi
upstream commit: 9c3580aa52195699065bc2d7242b1c7e3e6903fa Callers of notify_change() need to hold i_mutex. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-16double-free of inode on alloc_file() failure exit in create_write_pipe()Al Viro
upstream commit: ed1524371716466e9c762808b02601d0d0276a92 Duh... Fortunately, the bug is quite recent (post-2.6.25) and, embarrassingly, mine ;-/ http://bugzilla.kernel.org/show_bug.cgi?id=10878 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09capabilities: remain source compatible with 32-bit raw legacy capability ↵Andrew G. Morgan
support. upstream commit: ca05a99a54db1db5bca72eccb5866d2a86f8517f Source code out there hard-codes a notion of what the _LINUX_CAPABILITY_VERSION #define means in terms of the semantics of the raw capability system calls capget() and capset(). Its unfortunate, but true. Since the confusing header file has been in a released kernel, there is software that is erroneously using 64-bit capabilities with the semantics of 32-bit compatibilities. These recently compiled programs may suffer corruption of their memory when sys_getcap() overwrites more memory than they are coded to expect, and the raising of added capabilities when using sys_capset(). As such, this patch does a number of things to clean up the situation for all. It 1. forces the _LINUX_CAPABILITY_VERSION define to always retain its legacy value. 2. adopts a new #define strategy for the kernel's internal implementation of the preferred magic. 3. deprecates v2 capability magic in favor of a new (v3) magic number. The functionality of v3 is entirely equivalent to v2, the only difference being that the v2 magic causes the kernel to log a "deprecated" warning so the admin can find applications that may be using v2 inappropriately. [User space code continues to be encouraged to use the libcap API which protects the application from details like this. libcap-2.10 is the first to support v3 capabilities.] Fixes issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=447518. Thanks to Bojan Smojver for the report. [akpm@linux-foundation.org: s/depreciate/deprecate/g] [akpm@linux-foundation.org: be robust about put_user size] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Bojan Smojver <bojan@rexursive.com> Cc: stable@kernel.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09eCryptfs: remove unnecessary page decrypt callMichael Halcrow
upstream commit: d3e49afbb66109613c3474f2273f5830ac2dcb09 The page decrypt calls in ecryptfs_write() are both pointless and buggy. Pointless because ecryptfs_get_locked_page() has already brought the page up to date, and buggy because prior mmap writes will just be blown away by the decrypt call. This patch also removes the declaration of a now-nonexistent function ecryptfs_write_zeros(). Thanks to Eric Sandeen and David Kleikamp for helping to track this down. Eric said: fsx w/ mmap dies quickly ( < 100 ops) without this, and survives nicely (to millions of ops+) with it in place. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Eric Sandeen <sandeen@redhat.com> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [chrisw: backport to 2.6.25.5] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09pagemap: fix bug in add_to_pagemap, require aligned-length reads of ↵Thomas Tuttle
/proc/pid/pagemap upstream commit: aae8679b0ebcaa92f99c1c3cb0cd651594a43915 Fix a bug in add_to_pagemap. Previously, since pm->out was a char *, put_user was only copying 1 byte of every PFN, resulting in the top 7 bytes of each PFN not being copied. By requiring that reads be a multiple of 8 bytes, I can make pm->out and pm->end u64*s instead of char*s, which makes put_user work properly, and also simplifies the logic in add_to_pagemap a bit. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Thomas Tuttle <ttuttle@google.com> Cc: Matt Mackall <mpm@selenic.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09proc: calculate the correct /proc/<pid> link countVegard Nossum
upstream commit: aed5417593ad125283f35513573282139a8664b5 This patch: commit e9720acd728a46cb40daa52c99a979f7c4ff195c Author: Pavel Emelyanov <xemul@openvz.org> Date: Fri Mar 7 11:08:40 2008 -0800 [NET]: Make /proc/net a symlink on /proc/self/net (v3) introduced a /proc/self/net directory without bumping the corresponding link count for /proc/self. This patch replaces the static link count initializations with a call that counts the number of directory entries in the given pid_entry table whenever it is instantiated, and thus relieves the burden of manually keeping the two in sync. [akpm@linux-foundation.org: cleanup] Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09XFS: Fix memory corruption with small buffer readsChristoph Hellwig
upstream commit: 6ab455eeaff6893cd06da33843e840d888cdc04a When we have multiple buffers in a single page for a blocksize == pagesize filesystem we might overwrite the page contents if two callers hit it shortly after each other. To prevent that we need to keep the page locked until I/O is completed and the page marked uptodate. Thanks to Eric Sandeen for triaging this bug and finding a reproducible testcase and Dave Chinner for additional advice. This should fix kernel.org bz #10421. Tested-by: Eric Sandeen <sandeen@sandeen.net> SGI-PV: 981813 SGI-Modid: xfs-linux-melb:xfs-kern:31173a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09CIFS: Fix UNC path prefix on QueryUnixPathInfo to have correct slashSteve French
upstream commit: 076d8423a98659a92837b07aa494cb74bfefe77c When a share was in DFS and the server was Unix/Linux, we were sending paths of the form \\server\share/dir/file rather than //server/share/dir/file There was some discussion between me and jra over whether we should use /server/share/dir/file as MS sometimes says - but the documentation for this claims it should be doubleslash for this type of UNC-like path format and that works, so leaving it as doubleslash but converting the \ to / in the the //server/share portion. This gets Samba to now correctly return STATUS_PATH_NOT_COVERED when it is supposed to (Windows already did since the direction of the slash was not an issue for them). Still need another minor change to fully enable DFS (need to finish some chages to SMBGetDFSRefer Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-09ext3/4: fix uninitialized bs in ext3/4_xattr_set_handle()Tiger Yang
commit 7e01c8e5420b6c7f9d85d34c15d8c7a15c9fc720 upstream This fix the uninitialized bs when we try to replace a xattr entry in ibody with the new value which require more than free space. This situation only happens we format ext3/4 with inode size more than 128 and we have put xattr entries both in ibody and block. The consequences about this bug is we will lost the xattr block which pointed by i_file_acl with all xattr entires in it. We will alloc a new xattr block and put that large value entry in it. The old xattr block will become orphan block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Cc: <linux-ext4@vger.kernel.org> Cc: Andreas Gruenbacher <agruen@suse.de> Acked-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-06asn1: additional sanity checking during BER decoding (CVE-2008-1673)Chris Wright
upstream commit: ddb2c43594f22843e9f3153da151deaba1a834c5 - Don't trust a length which is greater than the working buffer. An invalid length could cause overflow when calculating buffer size for decoding oid. - An oid length of zero is invalid and allows for an off-by-one error when decoding oid because the first subid actually encodes first 2 subids. - A primitive encoding may not have an indefinite length. Thanks to Wei Wang from McAfee for report. Cc: Steven French <sfrench@us.ibm.com> Cc: stable@kernel.org Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-09reiserfs: Unpack tails on quota filesJan Kara
commit d5dee5c395062a55236318ac4eec1f4ebb9de6db upstream Quota files cannot have tails because quota_write and quota_read functions do not support them. So far when quota files did have tail, we just refused to turn quotas on it. Sadly this check has been wrong and so there are now plenty installations where quota files don't have NOTAIL flag set and so now after fixing the check, they suddently fail to turn quotas on. Since it's easy to unpack the tail from kernel, do this from reiserfs_quota_on() which solves the problem and is generally nicer to users anyway. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: <urhausen@urifabi.net> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-09vfs: fix permission checking in sys_utimensatMiklos Szeredi
commit: 02c6be615f1fcd37ac5ed93a3ad6692ad8991cd9 upstream If utimensat() is called with both times set to UTIME_NOW or one of them to UTIME_NOW and the other to UTIME_OMIT, then it will update the file time without any permission checking. I don't think this can be used for anything other than a local DoS, but could be quite bewildering at that (e.g. "Why was that large source tree rebuilt when I didn't modify anything???") This affects all kernels from 2.6.22, when the utimensat() syscall was introduced. Fix by doing the same permission checking as for the "times == NULL" case. Thanks to Michael Kerrisk, whose utimensat-non-conformances-and-fixes.patch in -mm also fixes this (and breaks other stuff), only he didn't realize the security implications of this bug. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-06fix SMP ordering hole in fcntl_setlk() (CVE-2008-1669)Al Viro
commit 0b2bac2f1ea0d33a3621b27ca68b9ae760fca2e9 upstream. fcntl_setlk()/close() race prevention has a subtle hole - we need to make sure that if we *do* have an fcntl/close race on SMP box, the access to descriptor table and inode->i_flock won't get reordered. As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs. STORE descriptor table entry, LOAD inode->i_flock with not a single lock in common on both sides. We do have BKL around the first STORE, but check in locks_remove_posix() is outside of BKL and for a good reason - we don't want BKL on common path of close(2). Solution is to hold ->file_lock around fcheck() in there; that orders us wrt removal from descriptor table that preceded locks_remove_posix() on close path and we either come first (in which case eviction will be handled by the close side) or we'll see the effect of close and do eviction ourselves. Note that even though it's read-only access, we do need ->file_lock here - rcu_read_lock() won't be enough to order the things. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01Fix dnotify/close race (CVE-2008-1375)Al Viro
commit 214b7049a7929f03bbd2786aaef04b8b79db34e2 upstream. We have a race between fcntl() and close() that can lead to dnotify_struct inserted into inode's list *after* the last descriptor had been gone from current->files. Since that's the only point where dnotify_struct gets evicted, we are screwed - it will stick around indefinitely. Even after struct file in question is gone and freed. Worse, we can trigger send_sigio() on it at any later point, which allows to send an arbitrary signal to arbitrary process if we manage to apply enough memory pressure to get the page that used to host that struct file and fill it with the right pattern... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01aio: io_getevents() should return if io_destroy() is invokedJeff Moyer
commit e92adcba261fd391591bb63c1703185a04a41554 upstream This patch wakes up a thread waiting in io_getevents if another thread destroys the context. This was tested using a small program that spawns a thread to wait in io_getevents while the parent thread destroys the io context and then waits for the getevents thread to exit. Without this patch, the program hangs indefinitely. With the patch, the program exits as expected. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Christopher Smith <x@xman.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01JFFS2: Fix free space leak with in-band cleanmarkersDavid Woodhouse
We were accounting for the cleanmarker by calling jffs2_link_node_ref() (without locking!), which adjusted both superblock and per-eraseblock accounting, subtracting the size of the cleanmarker from {jeb,c}->free_size and adding it to {jeb,c}->used_size. But only _then_ were we adding the size of the newly-erased block back to the superblock counts, and we were adding each of jeb->{free,used}_size to the corresponding superblock counts. Thus, the size of the cleanmarker was effectively subtracted from the superblock's free_size _twice_. Fix this, by always adding a full eraseblock size to c->free_size when we've erased a block. And call jffs2_link_node_ref() under the proper lock, while we're at it. Thanks to Alexander Yurchenko and/or Damir Shayhutdinov for (almost) pinpointing the problem. [Backport of commit 014b164e1392a166fe96e003d2f0e7ad2e2a0bb7] Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-16AFS: Do not describe debug parameters with their valuePaul Bolle
Describe debug parameters with their names (and not their values). Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15vfs: fix possible deadlock in ext2, ext3, ext4 when using xattrsJan Kara
mb_cache_entry_alloc() was allocating cache entries with GFP_KERNEL. But filesystems are calling this function while holding xattr_sem so possible recursion into the fs violates locking ordering of xattr_sem and transaction start / i_mutex for ext2-4. Change mb_cache_entry_alloc() so that filesystems can specify desired gfp mask and use GFP_NOFS from all of them. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Dave Jones <davej@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-14JFFS2 Fix of panics caused by wrong condition for hole frag creation in ↵Alexey Korolev
write_begin This fixes a regression introduced in commit 205c109a7a96d9a3d8ffe64c4068b70811fef5e8 when switching to write_begin/write_end operations in JFFS2. The page offset is miscalculated, leading to corruption of the fragment lists and subsequently to memory corruption and panics. [ Side note: the bug is a fairly direct result of the naming. Nick was likely misled by the use of "offs", since we tend to use the notion of "offset" not as an absolute position, but as an offset _within_ a page or allocation. Alternatively, a "pgoff_t" is a page index, but not a byte offset - our VM naming can be a bit confusing. So in this case, a VM person would likely have called this a "pos", not an "offs", or perhaps talked about byte offsets rather than page offsets (since it's counted in bytes, not pages). - Linus ] Signed-off-by: Alexey Korolev <akorolev@infradead.org> Signed-off-by: Vasiliy Leonenko <vasiliy.leonenko@mail.ru> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-14locks: fix possible infinite loop in fcntl(F_SETLKW) over nfsJ. Bruce Fields
Miklos Szeredi found the bug: "Basically what happens is that on the server nlm_fopen() calls nfsd_open() which returns -EACCES, to which nlm_fopen() returns NLM_LCK_DENIED. "On the client this will turn into a -EAGAIN (nlm_stat_to_errno()), which in will cause fcntl_setlk() to retry forever." So, for example, opening a file on an nfs filesystem, changing permissions to forbid further access, then trying to lock the file, could result in an infinite loop. And Trond Myklebust identified the culprit, from Marc Eshel and I: 7723ec9777d9832849b76475b1a21a2872a40d20 "locks: factor out generic/filesystem switch from setlock code" That commit claimed to just be reshuffling code, but actually introduced a behavioral change by calling the lock method repeatedly as long as it returned -EAGAIN. We assumed this would be safe, since we assumed a lock of type SETLKW would only return with either success or an error other than -EAGAIN. However, nfs does can in fact return -EAGAIN in this situation, and independently of whether that behavior is correct or not, we don't actually need this change, and it seems far safer not to depend on such assumptions about the filesystem's ->lock method. Therefore, revert the problematic part of the original commit. This leaves vfs_lock_file() and its other callers unchanged, while returning fcntl_setlk and fcntl_setlk64 to their former behavior. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Tested-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-11Merge branch 'docs' of git://git.lwn.net/linux-2.6Linus Torvalds
* 'docs' of git://git.lwn.net/linux-2.6: Add additional examples in Documentation/spinlocks.txt Move sched-rt-group.txt to scheduler/ Documentation: move rpc-cache.txt to filesystems/ Documentation: move nfsroot.txt to filesystems/ Spell out behavior of atomic_dec_and_lock() in kerneldoc Fix a typo in highres.txt Fixes to the seq_file document Fill out information on patch tags in SubmittingPatches Add the seq_file documentation
2008-04-11Documentation: move nfsroot.txt to filesystems/J. Bruce Fields
Documentation/ is a little large, and filesystems/ seems an obvious place for this file. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-04-11signalfd: fix for incorrect SI_QUEUE user data reportingDavide Libenzi
Michael Kerrisk found out that signalfd was not reporting back user data pushed using sigqueue: http://groups.google.com/group/linux.kernel/msg/9397cab8551e3123 The following patch makes signalfd report back the ssi_ptr and ssi_int members of the signalfd_siginfo structure. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Acked-by: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-11eventfd/kaio integration fixDavide Libenzi
Jeff Roberson discovered a race when using kaio eventfd based notifications. When it occurs it can lead tomissed wakeups and hung userspace. This patch fixes the race by moving the notification inside the spinlocked section of kaio. The operation is safe since eventfd spinlock and kaio one are unrelated. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jeff Roberson <jroberson@chesapeake.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10asmlinkage_protect sys_io_geteventsRoland McGrath
Use asmlinkage_protect in sys_io_getevents, because GCC for i386 with CONFIG_FRAME_POINTER=n can decide to clobber an argument word on the stack, i.e. the user struct pt_regs. Here the problem is not a tail call, but just the compiler's use of the stack when it inlines and optimizes the body of the called function. This seems to avoid it. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10asmlinkage_protect replaces prevent_tail_callRoland McGrath
The prevent_tail_call() macro works around the problem of the compiler clobbering argument words on the stack, which for asmlinkage functions is the caller's (user's) struct pt_regs. The tail/sibling-call optimization is not the only way that the compiler can decide to use stack argument words as scratch space, which we have to prevent. Other optimizations can do it too. Until we have new compiler support to make "asmlinkage" binding on the compiler's own use of the stack argument frame, we have work around all the manifestations of this issue that crop up. More cases seem to be prevented by also keeping the incoming argument variables live at the end of the function. This makes their original stack slots attractive places to leave those variables, so the compiler tends not clobber them for something else. It's still no guarantee, but it handles some observed cases that prevent_tail_call() did not. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6Linus Torvalds
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] Ensure "both" features2 slots are consistent [XFS] Fix superblock features2 field alignment problem [XFS] remove shouting-indirection macros from xfs_sb.h
2008-04-10Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: cfq-iosched: do not leak ioc_data across iosched switches splice: fix infinite loop in generic_file_splice_read()
2008-04-10HFS+: fix unlink of linksRoman Zippel
Some time ago while attempting to handle invalid link counts, I botched the unlink of links itself, so this patch fixes this now correctly, so that only the link count of nodes that don't point to links is ignored. Thanks to Vlado Plaga <rechner@vlado-do.de> to notify me of this problem. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10[XFS] Ensure "both" features2 slots are consistentEric Sandeen
Since older kernels may look in the sb_bad_features2 slot for flags, rather than zeroing it out on fixup, we should make it equal to the sb_features2 value. Also, if the ATTR2 flag was not found prior to features2 fixup, it was not set in the mount flags, so re-check after the fixup so that the current session will use the feature. Also fix up the comments to reflect these changes. SGI-PV: 980085 SGI-Modid: xfs-linux-melb:xfs-kern:30778a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-10[XFS] Fix superblock features2 field alignment problemDavid Chinner
Due to the xfs_dsb_t structure not being 64 bit aligned, the last field of the on-disk superblock can vary in location This causes problems when the filesystem gets moved to a different platform, or there is a 32 bit userspace and 64 bit kernel. This patch detects the defect at mount time, logs a warning such as: XFS: correcting sb_features alignment problem in dmesg and corrects the problem so that everything is OK. it also blacklists the bad field in the superblock so it does not get used for something else later on. SGI-PV: 977636 SGI-Modid: xfs-linux-melb:xfs-kern:30539a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-10[XFS] remove shouting-indirection macros from xfs_sb.hEric Sandeen
Remove macro-to-small-function indirection from xfs_sb.h, and remove some which are completely unused. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30528a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-10splice: fix infinite loop in generic_file_splice_read()Jens Axboe
There's a quirky loop in generic_file_splice_read() that could go on indefinitely, if the file splice returns 0 permanently (and not just as a temporary condition). Get rid of the loop and pass back -EAGAIN correctly from __generic_file_splice_read(), so we handle that condition properly as well. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-04-08fix bug - executing FDPIC ELF on NFS mount triggers BUG() at ↵Bryan Wu
mm/nommu.c:862:/do_mmap_private() NFS needs a NOMMU version mmap function to support uClinux on NOMMU machine http://blackfin.uclinux.org/gf/project/uclinux-dist/tracker/?action=TrackerItemEdit&tracker_id=141&tracker_item_id=3992 Signed-off-by: Bryan Wu <cooloney@kernel.org> Cc: Mike Frysinger <vapier.adi@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-08NFS: initialize flags field in nfs_open_contextJeff Layton
The nfs_open_context struct had a "flags" field added recently, but the allocator isn't initializing it. It also looks like the allocator isn't initializing the mode or list either, but they seem to be overwritten by the caller, so that's less of an issue. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-04Be more careful about marking buffers dirtyLinus Torvalds
Mikulas Patocka noted that the optimization where we check if a buffer was already dirty (and we avoid re-dirtying it) was not really SMP-safe. Since the read of the old status was not synchronized with anything, an aggressive CPU re-ordering of memory accesses might have moved that read up to before the data was even written to the buffer, and another CPU that cleaned it again, causing the newly dirty state to never actually hit the disk. Admittedly this would probably never trigger in practice, but it's still wrong. Mikulas sent a patch that fixed the problem, but I dislike the subtlety of the whole optimization, so this is an alternate fix that is more explicit about the particular SMP ordering for the optimization, and separates out the speculative reads of the buffer state into its own conditional (and makes the memory barrier only happen if we are likely to actually hit the optimized case in the first place). I considered removing the optimization entirely, but Andrew argued for it's continued existence. I'm a push-over. Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-03afs: remove smp_prcessor_id() from debug macroSven Schnelle
Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-03splice: use mapping_gfp_maskHugh Dickins
The loop block driver is careful to mask __GFP_IO|__GFP_FS out of its mapping_gfp_mask, to avoid hangs under memory pressure. But nowadays it uses splice, usually going through __generic_file_splice_read. That must use mapping_gfp_mask instead of GFP_KERNEL to avoid those hangs. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-02efs: update error msg to not refer to deleted read_inode()Robert P. J. Day
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-02afs: add missing up_write() on returnSven Schnelle
If afs_cell_alloc() fails, afs_cells_sem doesn't get unlocked, which leads to a deadlock. Unlock it before returning. Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30cifs: fix misannotationsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30NULL noise: fs/*, mm/*, kernel/*Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30jbd/jbd2 NULL noiseAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock [PATCH] do shrink_submounts() for all fs types [PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts() [PATCH] count ghost references to vfsmounts [PATCH] reduce stack footprint in namespace.c
2008-03-28afs: prevent double cell registrationSven Schnelle
kafs doesn't check if the cell already exists - so if you do an echo "add newcell.org 1.2.3.4" >/proc/fs/afs/cells it will try to create this cell again. kobject will also complain about a double registration. To prevent such problems, return -EEXIST in that case. Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28vfs: fix data leak in nobh_write_end()Dmitri Monakhov
Current nobh_write_end() implementation ignore partial writes(copied < len) case if page was fully mapped and simply mark page as Uptodate, which is totally wrong because area [pos+copied, pos+len) wasn't updated explicitly in previous write_begin call. It simply contains garbage from pagecache and result in data leakage. #TEST_CASE_BEGIN: ~~~~~~~~~~~~~~~~ In fact issue triggered by classical testcase open("/mnt/test", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3 ftruncate(3, 409600) = 0 writev(3, [{"a", 1}, {NULL, 4095}], 2) = 1 ##TESTCASE_SOURCE: ~~~~~~~~~~~~~~~~~ #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <sys/uio.h> #include <sys/mman.h> #include <errno.h> int main(int argc, char **argv) { int fd, ret; void* p; struct iovec iov[2]; fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666); ftruncate(fd, 409600); iov[0].iov_base="a"; iov[0].iov_len=1; iov[1].iov_base=NULL; iov[1].iov_len=4096; ret = writev(fd, iov, sizeof(iov)/sizeof(struct iovec)); printf("writev = %d, err = %d\n", ret, errno); return 0; } ##TESTCASE RESULT: ~~~~~~~~~~~~~~~~~~ [root@ts63 ~]# mount | grep mnt2 /dev/mapper/test on /mnt2 type ext2 (rw,nobh) [root@ts63 ~]# /tmp/writev /mnt2/test writev = 1, err = 0 [root@ts63 ~]# hexdump -C /mnt2/test 00000000 61 65 62 6f 6f 74 00 00 f0 b9 b4 59 3a 00 00 00 |aeboot.....Y:...| 00000010 20 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 | .......!.......| 00000020 df df df df df df df df df df df df df df df df |................| 00000030 3a 00 00 00 2a 00 00 00 21 00 00 00 00 00 00 00 |:...*...!.......| 00000040 60 c0 8c 00 00 00 00 00 40 4a 8d 00 00 00 00 00 |`.......@J......| 00000050 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 |........A.......| 00000060 74 69 6d 65 20 64 64 20 69 66 3d 2f 64 65 76 2f |time dd if=/dev/| 00000070 6c 6f 6f 70 30 20 20 6f 66 3d 2f 64 65 76 2f 6e |loop0 of=/dev/n| skip.. 00000f50 00 00 00 00 00 00 00 00 31 00 00 00 00 00 00 00 |........1.......| 00000f60 6d 6b 66 73 2e 65 78 74 33 20 2f 64 65 76 2f 76 |mkfs.ext3 /dev/v| 00000f70 7a 76 67 2f 74 65 73 74 20 2d 62 34 30 39 36 00 |zvg/test -b4096.| 00000f80 a0 fe 8c 00 00 00 00 00 21 00 00 00 00 00 00 00 |........!.......| 00000f90 23 31 32 30 35 39 35 30 34 30 34 00 3a 00 00 00 |#1205950404.:...| 00000fa0 20 00 8d 00 00 00 00 00 21 00 00 00 00 00 00 00 | .......!.......| 00000fb0 d0 cf 8c 00 00 00 00 00 10 d0 8c 00 00 00 00 00 |................| 00000fc0 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 |........A.......| 00000fd0 6d 6f 75 6e 74 20 2f 64 65 76 2f 76 7a 76 67 2f |mount /dev/vzvg/| 00000fe0 74 65 73 74 20 20 2f 76 7a 20 2d 6f 20 64 61 74 |test /vz -o dat| 00000ff0 61 3d 77 72 69 74 65 62 61 63 6b 00 00 00 00 00 |a=writeback.....| 00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| As you can see file's page contains garbage from pagecache instead of zeros. #TEST_CASE_END Attached patch: - Add sanity check BUG_ON in order to prevent incorrect usage by caller, This is function invariant because page can has buffers and in no zero *fadata pointer at the same time. - Always attach buffers to page is it is partial write case. - Always switch back to generic_write_end if page has buffers. This is reasonable because if page already has buffer then generic_write_begin was called previously. Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org> Reviewed-by: Nick Piggin <npiggin@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>