Age | Commit message (Collapse) | Author |
|
simple "mount -t cifs //xxx /mnt" oopsed on strlen of options
http://kerneloops.org/guilty.php?guilty=cifs_get_sb&version=2.6.25-release&start=16711 \
68&end=1703935&class=oops
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
upstream commit: 71fd5179e8d1d4d503b517e0c5374f7c49540bfc
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: 8dc4e37362a5dc910d704d52ac6542bfd49ddc2f
dget(dentry->d_parent) --> dget_parent(dentry)
unlock_parent() is racy and unnecessary. Replace single caller with
unlock_dir().
There are several other suspect uses of ->d_parent in ecryptfs...
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: 2f9b12a31fcb738ea8c9eb0d4ddf906c6f1d696c
Make sure crypt_stat->flags is protected with a lock in ecryptfs_open().
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: 9c3580aa52195699065bc2d7242b1c7e3e6903fa
Callers of notify_change() need to hold i_mutex.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: ed1524371716466e9c762808b02601d0d0276a92
Duh... Fortunately, the bug is quite recent (post-2.6.25) and, embarrassingly,
mine ;-/
http://bugzilla.kernel.org/show_bug.cgi?id=10878
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
support.
upstream commit: ca05a99a54db1db5bca72eccb5866d2a86f8517f
Source code out there hard-codes a notion of what the
_LINUX_CAPABILITY_VERSION #define means in terms of the semantics of the
raw capability system calls capget() and capset(). Its unfortunate, but
true.
Since the confusing header file has been in a released kernel, there is
software that is erroneously using 64-bit capabilities with the semantics
of 32-bit compatibilities. These recently compiled programs may suffer
corruption of their memory when sys_getcap() overwrites more memory than
they are coded to expect, and the raising of added capabilities when using
sys_capset().
As such, this patch does a number of things to clean up the situation
for all. It
1. forces the _LINUX_CAPABILITY_VERSION define to always retain its
legacy value.
2. adopts a new #define strategy for the kernel's internal
implementation of the preferred magic.
3. deprecates v2 capability magic in favor of a new (v3) magic
number. The functionality of v3 is entirely equivalent to v2,
the only difference being that the v2 magic causes the kernel
to log a "deprecated" warning so the admin can find applications
that may be using v2 inappropriately.
[User space code continues to be encouraged to use the libcap API which
protects the application from details like this. libcap-2.10 is the first
to support v3 capabilities.]
Fixes issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=447518.
Thanks to Bojan Smojver for the report.
[akpm@linux-foundation.org: s/depreciate/deprecate/g]
[akpm@linux-foundation.org: be robust about put_user size]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Bojan Smojver <bojan@rexursive.com>
Cc: stable@kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: d3e49afbb66109613c3474f2273f5830ac2dcb09
The page decrypt calls in ecryptfs_write() are both pointless and buggy.
Pointless because ecryptfs_get_locked_page() has already brought the page
up to date, and buggy because prior mmap writes will just be blown away by
the decrypt call.
This patch also removes the declaration of a now-nonexistent function
ecryptfs_write_zeros().
Thanks to Eric Sandeen and David Kleikamp for helping to track this
down.
Eric said:
fsx w/ mmap dies quickly ( < 100 ops) without this, and survives
nicely (to millions of ops+) with it in place.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[chrisw: backport to 2.6.25.5]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
/proc/pid/pagemap
upstream commit: aae8679b0ebcaa92f99c1c3cb0cd651594a43915
Fix a bug in add_to_pagemap. Previously, since pm->out was a char *,
put_user was only copying 1 byte of every PFN, resulting in the top 7
bytes of each PFN not being copied. By requiring that reads be a multiple
of 8 bytes, I can make pm->out and pm->end u64*s instead of char*s, which
makes put_user work properly, and also simplifies the logic in
add_to_pagemap a bit.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Thomas Tuttle <ttuttle@google.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: aed5417593ad125283f35513573282139a8664b5
This patch:
commit e9720acd728a46cb40daa52c99a979f7c4ff195c
Author: Pavel Emelyanov <xemul@openvz.org>
Date: Fri Mar 7 11:08:40 2008 -0800
[NET]: Make /proc/net a symlink on /proc/self/net (v3)
introduced a /proc/self/net directory without bumping the corresponding
link count for /proc/self.
This patch replaces the static link count initializations with a call that
counts the number of directory entries in the given pid_entry table
whenever it is instantiated, and thus relieves the burden of manually
keeping the two in sync.
[akpm@linux-foundation.org: cleanup]
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: 6ab455eeaff6893cd06da33843e840d888cdc04a
When we have multiple buffers in a single page for a blocksize == pagesize
filesystem we might overwrite the page contents if two callers hit it
shortly after each other. To prevent that we need to keep the page locked
until I/O is completed and the page marked uptodate.
Thanks to Eric Sandeen for triaging this bug and finding a reproducible
testcase and Dave Chinner for additional advice.
This should fix kernel.org bz #10421.
Tested-by: Eric Sandeen <sandeen@sandeen.net>
SGI-PV: 981813
SGI-Modid: xfs-linux-melb:xfs-kern:31173a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: 076d8423a98659a92837b07aa494cb74bfefe77c
When a share was in DFS and the server was Unix/Linux, we were sending paths of the form
\\server\share/dir/file
rather than
//server/share/dir/file
There was some discussion between me and jra over whether we should use
/server/share/dir/file
as MS sometimes says - but the documentation for this claims it should be
doubleslash for this type of UNC-like path format and that works, so leaving
it as doubleslash but converting the \ to / in the the //server/share portion.
This gets Samba to now correctly return STATUS_PATH_NOT_COVERED when it is
supposed to (Windows already did since the direction of the slash was not an issue
for them). Still need another minor change to fully enable DFS (need to finish
some chages to SMBGetDFSRefer
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 7e01c8e5420b6c7f9d85d34c15d8c7a15c9fc720 upstream
This fix the uninitialized bs when we try to replace a xattr entry in
ibody with the new value which require more than free space.
This situation only happens we format ext3/4 with inode size more than 128 and
we have put xattr entries both in ibody and block. The consequences about
this bug is we will lost the xattr block which pointed by i_file_acl with all
xattr entires in it. We will alloc a new xattr block and put that large value
entry in it. The old xattr block will become orphan block.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: ddb2c43594f22843e9f3153da151deaba1a834c5
- Don't trust a length which is greater than the working buffer.
An invalid length could cause overflow when calculating buffer size
for decoding oid.
- An oid length of zero is invalid and allows for an off-by-one error when
decoding oid because the first subid actually encodes first 2 subids.
- A primitive encoding may not have an indefinite length.
Thanks to Wei Wang from McAfee for report.
Cc: Steven French <sfrench@us.ibm.com>
Cc: stable@kernel.org
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
commit d5dee5c395062a55236318ac4eec1f4ebb9de6db upstream
Quota files cannot have tails because quota_write and quota_read functions do
not support them. So far when quota files did have tail, we just refused to
turn quotas on it. Sadly this check has been wrong and so there are now plenty
installations where quota files don't have NOTAIL flag set and so now after
fixing the check, they suddently fail to turn quotas on. Since it's easy to
unpack the tail from kernel, do this from reiserfs_quota_on() which solves the
problem and is generally nicer to users anyway.
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: <urhausen@urifabi.net>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit: 02c6be615f1fcd37ac5ed93a3ad6692ad8991cd9 upstream
If utimensat() is called with both times set to UTIME_NOW or one of them to
UTIME_NOW and the other to UTIME_OMIT, then it will update the file time
without any permission checking.
I don't think this can be used for anything other than a local DoS, but could
be quite bewildering at that (e.g. "Why was that large source tree rebuilt
when I didn't modify anything???")
This affects all kernels from 2.6.22, when the utimensat() syscall was
introduced.
Fix by doing the same permission checking as for the "times == NULL" case.
Thanks to Michael Kerrisk, whose utimensat-non-conformances-and-fixes.patch in
-mm also fixes this (and breaks other stuff), only he didn't realize the
security implications of this bug.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0b2bac2f1ea0d33a3621b27ca68b9ae760fca2e9 upstream.
fcntl_setlk()/close() race prevention has a subtle hole - we need to
make sure that if we *do* have an fcntl/close race on SMP box, the
access to descriptor table and inode->i_flock won't get reordered.
As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs.
STORE descriptor table entry, LOAD inode->i_flock with not a single
lock in common on both sides. We do have BKL around the first STORE,
but check in locks_remove_posix() is outside of BKL and for a good
reason - we don't want BKL on common path of close(2).
Solution is to hold ->file_lock around fcheck() in there; that orders
us wrt removal from descriptor table that preceded locks_remove_posix()
on close path and we either come first (in which case eviction will be
handled by the close side) or we'll see the effect of close and do
eviction ourselves. Note that even though it's read-only access,
we do need ->file_lock here - rcu_read_lock() won't be enough to
order the things.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 214b7049a7929f03bbd2786aaef04b8b79db34e2 upstream.
We have a race between fcntl() and close() that can lead to
dnotify_struct inserted into inode's list *after* the last descriptor
had been gone from current->files.
Since that's the only point where dnotify_struct gets evicted, we are
screwed - it will stick around indefinitely. Even after struct file in
question is gone and freed. Worse, we can trigger send_sigio() on it at
any later point, which allows to send an arbitrary signal to arbitrary
process if we manage to apply enough memory pressure to get the page
that used to host that struct file and fill it with the right pattern...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e92adcba261fd391591bb63c1703185a04a41554 upstream
This patch wakes up a thread waiting in io_getevents if another thread
destroys the context. This was tested using a small program that spawns a
thread to wait in io_getevents while the parent thread destroys the io context
and then waits for the getevents thread to exit. Without this patch, the
program hangs indefinitely. With the patch, the program exits as expected.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Christopher Smith <x@xman.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
We were accounting for the cleanmarker by calling jffs2_link_node_ref()
(without locking!), which adjusted both superblock and per-eraseblock
accounting, subtracting the size of the cleanmarker from {jeb,c}->free_size
and adding it to {jeb,c}->used_size.
But only _then_ were we adding the size of the newly-erased block back
to the superblock counts, and we were adding each of jeb->{free,used}_size
to the corresponding superblock counts. Thus, the size of the cleanmarker
was effectively subtracted from the superblock's free_size _twice_.
Fix this, by always adding a full eraseblock size to c->free_size when
we've erased a block. And call jffs2_link_node_ref() under the proper
lock, while we're at it.
Thanks to Alexander Yurchenko and/or Damir Shayhutdinov for (almost)
pinpointing the problem.
[Backport of commit 014b164e1392a166fe96e003d2f0e7ad2e2a0bb7]
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Describe debug parameters with their names (and not their values).
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
mb_cache_entry_alloc() was allocating cache entries with GFP_KERNEL. But
filesystems are calling this function while holding xattr_sem so possible
recursion into the fs violates locking ordering of xattr_sem and transaction
start / i_mutex for ext2-4. Change mb_cache_entry_alloc() so that filesystems
can specify desired gfp mask and use GFP_NOFS from all of them.
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Dave Jones <davej@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
write_begin
This fixes a regression introduced in commit
205c109a7a96d9a3d8ffe64c4068b70811fef5e8 when switching to
write_begin/write_end operations in JFFS2.
The page offset is miscalculated, leading to corruption of the fragment
lists and subsequently to memory corruption and panics.
[ Side note: the bug is a fairly direct result of the naming. Nick was
likely misled by the use of "offs", since we tend to use the notion of
"offset" not as an absolute position, but as an offset _within_ a page
or allocation.
Alternatively, a "pgoff_t" is a page index, but not a byte offset -
our VM naming can be a bit confusing.
So in this case, a VM person would likely have called this a "pos",
not an "offs", or perhaps talked about byte offsets rather than page
offsets (since it's counted in bytes, not pages). - Linus ]
Signed-off-by: Alexey Korolev <akorolev@infradead.org>
Signed-off-by: Vasiliy Leonenko <vasiliy.leonenko@mail.ru>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Miklos Szeredi found the bug:
"Basically what happens is that on the server nlm_fopen() calls
nfsd_open() which returns -EACCES, to which nlm_fopen() returns
NLM_LCK_DENIED.
"On the client this will turn into a -EAGAIN (nlm_stat_to_errno()),
which in will cause fcntl_setlk() to retry forever."
So, for example, opening a file on an nfs filesystem, changing
permissions to forbid further access, then trying to lock the file,
could result in an infinite loop.
And Trond Myklebust identified the culprit, from Marc Eshel and I:
7723ec9777d9832849b76475b1a21a2872a40d20 "locks: factor out
generic/filesystem switch from setlock code"
That commit claimed to just be reshuffling code, but actually introduced
a behavioral change by calling the lock method repeatedly as long as it
returned -EAGAIN.
We assumed this would be safe, since we assumed a lock of type SETLKW
would only return with either success or an error other than -EAGAIN.
However, nfs does can in fact return -EAGAIN in this situation, and
independently of whether that behavior is correct or not, we don't
actually need this change, and it seems far safer not to depend on such
assumptions about the filesystem's ->lock method.
Therefore, revert the problematic part of the original commit. This
leaves vfs_lock_file() and its other callers unchanged, while returning
fcntl_setlk and fcntl_setlk64 to their former behavior.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Tested-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* 'docs' of git://git.lwn.net/linux-2.6:
Add additional examples in Documentation/spinlocks.txt
Move sched-rt-group.txt to scheduler/
Documentation: move rpc-cache.txt to filesystems/
Documentation: move nfsroot.txt to filesystems/
Spell out behavior of atomic_dec_and_lock() in kerneldoc
Fix a typo in highres.txt
Fixes to the seq_file document
Fill out information on patch tags in SubmittingPatches
Add the seq_file documentation
|
|
Documentation/ is a little large, and filesystems/ seems an obvious
place for this file.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
Michael Kerrisk found out that signalfd was not reporting back user data
pushed using sigqueue:
http://groups.google.com/group/linux.kernel/msg/9397cab8551e3123
The following patch makes signalfd report back the ssi_ptr and ssi_int members
of the signalfd_siginfo structure.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Acked-by: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Jeff Roberson discovered a race when using kaio eventfd based notifications.
When it occurs it can lead tomissed wakeups and hung userspace.
This patch fixes the race by moving the notification inside the spinlocked
section of kaio. The operation is safe since eventfd spinlock and kaio one
are unrelated.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jeff Roberson <jroberson@chesapeake.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use asmlinkage_protect in sys_io_getevents, because GCC for i386 with
CONFIG_FRAME_POINTER=n can decide to clobber an argument word on the
stack, i.e. the user struct pt_regs. Here the problem is not a tail
call, but just the compiler's use of the stack when it inlines and
optimizes the body of the called function. This seems to avoid it.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The prevent_tail_call() macro works around the problem of the compiler
clobbering argument words on the stack, which for asmlinkage functions
is the caller's (user's) struct pt_regs. The tail/sibling-call
optimization is not the only way that the compiler can decide to use
stack argument words as scratch space, which we have to prevent.
Other optimizations can do it too.
Until we have new compiler support to make "asmlinkage" binding on the
compiler's own use of the stack argument frame, we have work around all
the manifestations of this issue that crop up.
More cases seem to be prevented by also keeping the incoming argument
variables live at the end of the function. This makes their original
stack slots attractive places to leave those variables, so the compiler
tends not clobber them for something else. It's still no guarantee, but
it handles some observed cases that prevent_tail_call() did not.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
[XFS] Ensure "both" features2 slots are consistent
[XFS] Fix superblock features2 field alignment problem
[XFS] remove shouting-indirection macros from xfs_sb.h
|
|
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
cfq-iosched: do not leak ioc_data across iosched switches
splice: fix infinite loop in generic_file_splice_read()
|
|
Some time ago while attempting to handle invalid link counts, I botched
the unlink of links itself, so this patch fixes this now correctly, so
that only the link count of nodes that don't point to links is ignored.
Thanks to Vlado Plaga <rechner@vlado-do.de> to notify me of this
problem.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since older kernels may look in the sb_bad_features2 slot for flags,
rather than zeroing it out on fixup, we should make it equal to the
sb_features2 value.
Also, if the ATTR2 flag was not found prior to features2 fixup, it was not
set in the mount flags, so re-check after the fixup so that the current
session will use the feature.
Also fix up the comments to reflect these changes.
SGI-PV: 980085
SGI-Modid: xfs-linux-melb:xfs-kern:30778a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
|
Due to the xfs_dsb_t structure not being 64 bit aligned, the last field of
the on-disk superblock can vary in location This causes problems when the
filesystem gets moved to a different platform, or there is a 32 bit
userspace and 64 bit kernel.
This patch detects the defect at mount time, logs a warning such as:
XFS: correcting sb_features alignment problem
in dmesg and corrects the problem so that everything is OK. it also
blacklists the bad field in the superblock so it does not get used for
something else later on.
SGI-PV: 977636
SGI-Modid: xfs-linux-melb:xfs-kern:30539a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
|
Remove macro-to-small-function indirection from xfs_sb.h, and remove some
which are completely unused.
SGI-PV: 976035
SGI-Modid: xfs-linux-melb:xfs-kern:30528a
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
|
There's a quirky loop in generic_file_splice_read() that could go
on indefinitely, if the file splice returns 0 permanently (and not
just as a temporary condition). Get rid of the loop and pass
back -EAGAIN correctly from __generic_file_splice_read(), so we
handle that condition properly as well.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
mm/nommu.c:862:/do_mmap_private()
NFS needs a NOMMU version mmap function to support uClinux on NOMMU machine
http://blackfin.uclinux.org/gf/project/uclinux-dist/tracker/?action=TrackerItemEdit&tracker_id=141&tracker_item_id=3992
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The nfs_open_context struct had a "flags" field added recently, but the
allocator isn't initializing it. It also looks like the allocator isn't
initializing the mode or list either, but they seem to be overwritten
by the caller, so that's less of an issue.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Mikulas Patocka noted that the optimization where we check if a buffer
was already dirty (and we avoid re-dirtying it) was not really SMP-safe.
Since the read of the old status was not synchronized with anything, an
aggressive CPU re-ordering of memory accesses might have moved that read
up to before the data was even written to the buffer, and another CPU
that cleaned it again, causing the newly dirty state to never actually
hit the disk.
Admittedly this would probably never trigger in practice, but it's still
wrong.
Mikulas sent a patch that fixed the problem, but I dislike the subtlety
of the whole optimization, so this is an alternate fix that is more
explicit about the particular SMP ordering for the optimization, and
separates out the speculative reads of the buffer state into its own
conditional (and makes the memory barrier only happen if we are likely
to actually hit the optimized case in the first place).
I considered removing the optimization entirely, but Andrew argued for
it's continued existence. I'm a push-over.
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The loop block driver is careful to mask __GFP_IO|__GFP_FS out of its
mapping_gfp_mask, to avoid hangs under memory pressure. But nowadays
it uses splice, usually going through __generic_file_splice_read. That
must use mapping_gfp_mask instead of GFP_KERNEL to avoid those hangs.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If afs_cell_alloc() fails, afs_cells_sem doesn't get unlocked, which
leads to a deadlock. Unlock it before returning.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock
[PATCH] do shrink_submounts() for all fs types
[PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts()
[PATCH] count ghost references to vfsmounts
[PATCH] reduce stack footprint in namespace.c
|
|
kafs doesn't check if the cell already exists - so if you do an echo "add
newcell.org 1.2.3.4" >/proc/fs/afs/cells it will try to create this cell
again. kobject will also complain about a double registration. To prevent
such problems, return -EEXIST in that case.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Current nobh_write_end() implementation ignore partial writes(copied < len)
case if page was fully mapped and simply mark page as Uptodate, which is
totally wrong because area [pos+copied, pos+len) wasn't updated explicitly in
previous write_begin call. It simply contains garbage from pagecache and
result in data leakage.
#TEST_CASE_BEGIN:
~~~~~~~~~~~~~~~~
In fact issue triggered by classical testcase
open("/mnt/test", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
ftruncate(3, 409600) = 0
writev(3, [{"a", 1}, {NULL, 4095}], 2) = 1
##TESTCASE_SOURCE:
~~~~~~~~~~~~~~~~~
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <sys/mman.h>
#include <errno.h>
int main(int argc, char **argv)
{
int fd, ret;
void* p;
struct iovec iov[2];
fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666);
ftruncate(fd, 409600);
iov[0].iov_base="a";
iov[0].iov_len=1;
iov[1].iov_base=NULL;
iov[1].iov_len=4096;
ret = writev(fd, iov, sizeof(iov)/sizeof(struct iovec));
printf("writev = %d, err = %d\n", ret, errno);
return 0;
}
##TESTCASE RESULT:
~~~~~~~~~~~~~~~~~~
[root@ts63 ~]# mount | grep mnt2
/dev/mapper/test on /mnt2 type ext2 (rw,nobh)
[root@ts63 ~]# /tmp/writev /mnt2/test
writev = 1, err = 0
[root@ts63 ~]# hexdump -C /mnt2/test
00000000 61 65 62 6f 6f 74 00 00 f0 b9 b4 59 3a 00 00 00 |aeboot.....Y:...|
00000010 20 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 | .......!.......|
00000020 df df df df df df df df df df df df df df df df |................|
00000030 3a 00 00 00 2a 00 00 00 21 00 00 00 00 00 00 00 |:...*...!.......|
00000040 60 c0 8c 00 00 00 00 00 40 4a 8d 00 00 00 00 00 |`.......@J......|
00000050 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 |........A.......|
00000060 74 69 6d 65 20 64 64 20 69 66 3d 2f 64 65 76 2f |time dd if=/dev/|
00000070 6c 6f 6f 70 30 20 20 6f 66 3d 2f 64 65 76 2f 6e |loop0 of=/dev/n|
skip..
00000f50 00 00 00 00 00 00 00 00 31 00 00 00 00 00 00 00 |........1.......|
00000f60 6d 6b 66 73 2e 65 78 74 33 20 2f 64 65 76 2f 76 |mkfs.ext3 /dev/v|
00000f70 7a 76 67 2f 74 65 73 74 20 2d 62 34 30 39 36 00 |zvg/test -b4096.|
00000f80 a0 fe 8c 00 00 00 00 00 21 00 00 00 00 00 00 00 |........!.......|
00000f90 23 31 32 30 35 39 35 30 34 30 34 00 3a 00 00 00 |#1205950404.:...|
00000fa0 20 00 8d 00 00 00 00 00 21 00 00 00 00 00 00 00 | .......!.......|
00000fb0 d0 cf 8c 00 00 00 00 00 10 d0 8c 00 00 00 00 00 |................|
00000fc0 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 |........A.......|
00000fd0 6d 6f 75 6e 74 20 2f 64 65 76 2f 76 7a 76 67 2f |mount /dev/vzvg/|
00000fe0 74 65 73 74 20 20 2f 76 7a 20 2d 6f 20 64 61 74 |test /vz -o dat|
00000ff0 61 3d 77 72 69 74 65 62 61 63 6b 00 00 00 00 00 |a=writeback.....|
00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
As you can see file's page contains garbage from pagecache instead of zeros.
#TEST_CASE_END
Attached patch:
- Add sanity check BUG_ON in order to prevent incorrect usage by caller,
This is function invariant because page can has buffers and in no zero
*fadata pointer at the same time.
- Always attach buffers to page is it is partial write case.
- Always switch back to generic_write_end if page has buffers.
This is reasonable because if page already has buffer then generic_write_begin
was called previously.
Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org>
Reviewed-by: Nick Piggin <npiggin@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|