Age | Commit message (Collapse) | Author |
|
commit 838726c4756813576078203eb7e1e219db0da870 upstream
The direct I/O write codepath for CIFS is done through
cifs_user_write(). That function does not currently call
generic_write_checks() so the file position isn't being properly set
when the file is opened with O_APPEND. It's also not doing the other
"normal" checks that should be done for a write call.
The problem is currently that when you open a file with O_APPEND on a
mount with the directio mount option, the file position is set to the
beginning of the file. This makes any subsequent writes clobber the data
in the file starting at the beginning.
This seems to fix the problem in cursory testing. It is, however
important to note that NFS disallows the combination of
(O_DIRECT|O_APPEND). If my understanding is correct, the concern is
races with multiple clients appending to a file clobbering each others'
data. Since the write model for CIFS and NFS is pretty similar in this
regard, CIFS is probably subject to the same sort of races. What's
unclear to me is why this is a particular problem with O_DIRECT and not
with buffered writes...
Regardless, disallowing O_APPEND on an entire mount is probably not
reasonable, so we'll probably just have to deal with it and reevaluate
this flag combination when we get proper support for O_DIRECT. In the
meantime this patch at least fixes the existing problem.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 82d63fc9e30687c055b97928942b8893ea65b0bb upstream
After commit a97c9bf33f4612e2aed6f000f6b1d268b6814f3c (fix cramfs
making duplicate entries in inode cache) in kernel 2.6.14, named-pipe
on cramfs does not work properly.
It seems the commit make all named-pipe on cramfs share their inode
(and named-pipe buffer).
Make ..._test() refuse to merge inodes with ->i_ino == 1, take inode setup
back to get_cramfs_inode() and make ->drop_inode() evict ones with ->i_ino
== 1 immediately.
Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 91b80969ba466ba4b915a4a1d03add8c297add3f upstream
The array we kmalloc() here is not large enough.
Thanks to Johann Dahm and David Richter for bug report and testing.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: David Richter <richterd@citi.umich.edu>
Tested-by: Johann Dahm <jdahm@umich.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 04e1e0cccade330ab3715ce59234f7e3b087e246 upstream.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Cc: Eugene Teo <eteo@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2c731afb0d4ba16018b400c75665fbdb8feb2175 upstream
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit ad661334b8ae421154b121ee6ad3b56807adbf11 upstream
In looking at network named pipe support on cifs, I noticed that
Dave Howell's iget patch:
iget: stop CIFS from using iget() and read_inode()
broke mounts to IPC$ (the interprocess communication share), and don't
handle the error case (when getting info on the root inode fails).
Thanks to Gunter who noted a typo in a debug line in the original
version of this patch.
CC: David Howells <dhowells@redhat.com>
CC: Gunter Kukkukk <linux@kukkukk.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d70b67c8bc72ee23b55381bd6a884f4796692f77 upstream
Lookup can install a child dentry for a deleted directory. This keeps
the directory dentry alive, and the inode pinned in the cache and on
disk, even after all external references have gone away.
This isn't a big problem normally, since memory pressure or umount
will clear out the directory dentry and its children, releasing the
inode. But for UBIFS this causes problems because its orphan area can
overflow.
Fix this by returning ENOENT for all lookups on a S_DEAD directory
before creating a child dentry.
Thanks to Zoltan Sogor for noticing this while testing UBIFS, and
Artem for the excellent analysis of the problem and testing.
Reported-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Tested-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0607fd02587a6b4b086dc746d63123c1f284db68 upstream
I received a complaint that some FAT formated medias (e.g. sd memory cards)
trigger a "unknown partition table" message even though there is no partition
table and they work correctly, while in general (when e.g. formated with
mkdosfs or even Windows Vista) this message is not shown.
Currently this seems only to happen when the medias get formatted with Windows
XP (and possibly Win 2000). Then the boot indicator byte contains garbage
(part of text message) and so do the other parts checked by msdos_paritition
which then later triggers this message.
References: novell bug #364365
Most fat formatted media without partition table contains zeros in the boot
indication and the other tested bytes and so falls through the checks in
msdos_partition, leading it to return with 1 (all is fine).
But some (e.g. WinXP formatted) fat fomated medias don't use boot_ind and so
the check fails and causes a "unkown partition table" warning eventhough there
is none and everything would be fine.
This additional check directly verifies if there is a fat formatted medium
without a partition table.
Signed-off-by: Frank Seidel <fseidel@suse.de>
Cc: Andreas Dilger <adilger@sun.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 73f20e58b1d586e9f6d3ddc3aad872829aca7743 upstream
The on-disk media specification field in FAT is only 8-bits, so testing for
<=0xff is pointless, and can generate a "comparison is always true due to
limited range of data type" warning.
While we're there, convert FAT_VALID_MEDIA() into a C function - the present
implementation is buggy: it generates either one or two references to its
argument.
Cc: Frank Seidel <fseidel@suse.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 5b9a499d77e9dd39c9e6611ea10c56a31604f274 upstream
There are several cases where the running transaction can get buffers added to
its BJ_Metadata list which it never dirtied, which makes its t_nr_buffers
counter end up larger than its t_outstanding_credits counter.
This will cause issues when starting new transactions as while we are logging
buffers we decrement t_outstanding_buffers, so when t_outstanding_buffers goes
negative, we will report that we need less space in the journal than we
actually need, so transactions will be started even though there may not be
enough room for them. In the worst case scenario (which admittedly is almost
impossible to reproduce) this will result in the journal running out of space.
The fix is to only
refile buffers from the committing transaction to the running transactions
BJ_Modified list when b_modified is set on that journal, which is the only way
to be sure if the running transaction has modified that buffer.
This patch also fixes an accounting error in journal_forget, it is possible
that we can call journal_forget on a buffer without having modified it, only
gotten write access to it, so instead of freeing a credit, we only do so if
the buffer was modified. The assert will help catch if this problem occurs.
Without these two patches I could hit this assert within minutes of running
postmark, with them this issue no longer arises. Thank you,
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91 upstream
journal_try_to_free_buffers() could race with jbd commit transaction when
the later is holding the buffer reference while waiting for the data
buffer to flush to disk. If the caller of journal_try_to_free_buffers()
request tries hard to release the buffers, it will treat the failure as
error and return back to the caller. We have seen the directo IO failed
due to this race. Some of the caller of releasepage() also expecting the
buffer to be dropped when passed with GFP_KERNEL mask to the
releasepage()->journal_try_to_free_buffers().
With this patch, if the caller is passing the __GFP_WAIT and __GFP_FS to
indicating this call could wait, in case of try_to_free_buffers() failed,
let's waiting for journal_commit_transaction() to finish commit the
current committing transaction, then try to free those buffers again.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 5bc833feaa8b2236265764e7e81f44937be46eda upstream
Currently at the start of a journal commit we loop through all of the buffers
on the committing transaction and clear the b_modified flag (the flag that is
set when a transaction modifies the buffer) under the j_list_lock.
The problem is that everywhere else this flag is modified only under the jbd
lock buffer flag, so it will race with a running transaction who could
potentially set it, and have it unset by the committing transaction.
This is also a big waste, you can have several thousands of buffers that you
are clearing the modified flag on when you may not need to. This patch
removes this code and instead clears the b_modified flag upon entering
do_get_write_access/journal_get_create_access, so if that transaction does
indeed use the buffer then it will be accounted for properly, and if it does
not then we know we didn't use it.
That will be important for the next patch in this series. Tested thoroughly
by myself using postmark/iozone/bonnie++.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit f41f741838480aeaa3a189cff6e210503cf9c42d upstream
...and ensure that we obey the NFS_INO_INVALID_ACL flag when retrieving the
acls.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e9baf6e59842285bcf9570f5094e4c27674a0f7c upstream
In case when both EEXIST and EROFS would apply we used to
return the former in mkdir(2) and friends. Lest anyone suspects
us of being consistent, in the same situation knfsd gave clients
nfs_erofs...
ro-bind series had switched the syscall side of things to
returning -EROFS and immediately broke an application - namely,
mkdir -p. Patch restores the original behaviour...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0056e65f9e28d83ee1a3fb4f7d0041e838f03c34 upstream
We zero-fill them like we are supposed to, and that's all fine. It's
only an error if the 'romfs_copyfrom()' routine isn't able to fill the
data that is supposed to be there.
Most of the patch is really just re-organizing the code a bit, and using
separate variables for the error value and for how much of the page we
actually filled from the filesystem.
Reported-and-tested-by: Chris Fester <cfester@wms.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Matt Waddel <matt.waddel@freescale.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 7fcba054373d5dfc43d26e243a5c9b92069972ee upstream
Date: Mon, 28 Jul 2008 15:46:39 -0700
Subject: eCryptfs: use page_alloc not kmalloc to get a page of memory
With SLUB debugging turned on in 2.6.26, I was getting memory corruption
when testing eCryptfs. The root cause turned out to be that eCryptfs was
doing kmalloc(PAGE_CACHE_SIZE); virt_to_page() and treating that as a nice
page-aligned chunk of memory. But at least with SLUB debugging on, this
is not always true, and the page we get from virt_to_page does not
necessarily match the PAGE_CACHE_SIZE worth of memory we got from kmalloc.
My simple testcase was 2 loops doing "rm -f fileX; cp /tmp/fileX ." for 2
different multi-megabyte files. With this change I no longer see the
corruption.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Michael Halcrow <mhalcrow@us.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 3971e1a917548977cff71418a7c3575ffbc9571f upstream
This commit:
commit ba52de123d454b57369f291348266d86f4b35070
Author: Theodore Ts'o <tytso@mit.edu>
Date: Wed Sep 27 01:50:49 2006 -0700
[PATCH] inode-diet: Eliminate i_blksize from the inode structure
caused the block size used by pseudo-filesystems to decrease from
PAGE_SIZE to 1024 leading to a doubling of the number of context switches
during a kernbench run.
Signed-off-by: Alex Nixon <Alex.Nixon@citrix.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ian Campbell <Ian.Campbell@eu.citrix.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c0a1633b6201ef79e31b7da464d44fdf5953054d upstream
Some iso9660 images contain files with rockridge data that is either
incorrect or incompletely parsed. Prior to commit
f2966632a134e865db3c819346a1dc7d96e05309 ("[PATCH] rock: handle directory
overflows") (included with kernel 2.6.13) the kernel ignored the rockridge
data for these files, while still allowing the files to be accessed under
their non-rockridge names. That commit inadvertently changed things so
that files with invalid rockridge data could not be accessed at all. (I
ran across the problem when comparing some old CDs with hard disk copies I
had made long ago under kernel 2.4: a few of the files on the hard disk
copies were no longer visible on the CDs.)
This change reverts to the pre-2.6.13 behavior.
Signed-off-by: Adam Greenblatt <adam.greenblatt@gmail.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b48d380541f634663b71766005838edbb7261685 upstream
When quota structure is going to be dropped and it is dirty, quota code tries
to write it. If the write fails for some reason (e. g. transaction cannot
be started because the journal is aborted), we try writing again and again and
again... Fix the problem by clearing the dirty bit even if the write failed.
(akpm: for 2.6.27, 2.6.26.x and 2.6.25.x)
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: dingdinghua <dingdinghua85@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 536abdb0802f3fac1b217530741853843d63c281 upstream
The current definition of wksidarr works fine on little endian arches
(since cpu_to_le32 is a no-op there), but on big-endian arches, it fails
to compile with this error:
error: braced-group within expression allowed only inside a function
The problem is that this static declaration has cpu_to_le32 embedded
within it, and that expands into a function macro. We need to use
__constant_cpu_to_le32() instead.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 96a8e13ed44e380fc2bb6c711d74d5ba698c00b2 upstream
Kernel Bugzilla #11063 points out that on some architectures (e.g. x86_32)
exec'ing an ELF without a PT_GNU_STACK program header should default to an
executable stack; but this got broken by the unlimited argv feature because
stack vma is now created before the right personality has been established:
so breaking old binaries using nested function trampolines.
Therefore re-evaluate VM_STACK_FLAGS in setup_arg_pages, where stack
vm_flags used to be set, before the mprotect_fixup. Checking through
our existing VM_flags, none would have changed since insert_vm_struct:
so this seems safer than finding a way through the personality labyrinth.
Reported-by: pageexec@freemail.hu
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit eb35c218d83ec0780d9db869310f2e333f628702 upstream
With the removal of struct file from the xattr code,
reiserfs_file_release() isn't used anymore, so the prealloc isn't
discarded. This causes hangs later down the line.
This patch adds it to reiserfs_delete_inode. In most cases it will be a
no-op due to it already having been called, but will avoid hangs with
xattrs.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 18ce3751ccd488c78d3827e9f6bf54e6322676fb upstream
fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
then immediately wait on them. Conceptually, that makes them sync writes
and we should treat them as such so that the IO schedulers can handle
them appropriately.
This patch fixes a write starvation issue that Lin Ming reported, where
xx is stuck for more than 2 minutes because of a large number of
synchronous IO in the system:
INFO: task kjournald:20558 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kjournald D ffff810010820978 6712 20558 2
ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
Call Trace:
[<ffffffff803ba6f2>] kobject_get+0x12/0x17
[<ffffffff80247537>] getnstimeofday+0x2f/0x83
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d195>] io_schedule+0x5d/0x9f
[<ffffffff8029c1e7>] sync_buffer+0x3b/0x3f
[<ffffffff8066d3f0>] __wait_on_bit+0x40/0x6f
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d48b>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff80243909>] wake_bit_function+0x0/0x23
[<ffffffff8029e3ad>] sync_dirty_buffer+0x98/0xcb
[<ffffffff8030056b>] journal_commit_transaction+0x97d/0xcb6
[<ffffffff8023a676>] lock_timer_base+0x26/0x4b
[<ffffffff8030300a>] kjournald+0xc1/0x1fb
[<ffffffff802438db>] autoremove_wake_function+0x0/0x2e
[<ffffffff80302f49>] kjournald+0x0/0x1fb
[<ffffffff802437bb>] kthread+0x47/0x74
[<ffffffff8022de51>] schedule_tail+0x28/0x5d
[<ffffffff8020cac8>] child_rip+0xa/0x12
[<ffffffff80243774>] kthread+0x0/0x74
[<ffffffff8020cabe>] child_rip+0x0/0x12
Lin Ming confirms that this patch fixes the issue. I've run tests with
it for the past week and no ill effects have been observed, so I'm
proposing it for inclusion into 2.6.26.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
simple "mount -t cifs //xxx /mnt" oopsed on strlen of options
http://kerneloops.org/guilty.php?guilty=cifs_get_sb&version=2.6.25-release&start=16711 \
68&end=1703935&class=oops
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
upstream commit: 71fd5179e8d1d4d503b517e0c5374f7c49540bfc
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: 8dc4e37362a5dc910d704d52ac6542bfd49ddc2f
dget(dentry->d_parent) --> dget_parent(dentry)
unlock_parent() is racy and unnecessary. Replace single caller with
unlock_dir().
There are several other suspect uses of ->d_parent in ecryptfs...
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: 2f9b12a31fcb738ea8c9eb0d4ddf906c6f1d696c
Make sure crypt_stat->flags is protected with a lock in ecryptfs_open().
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: 9c3580aa52195699065bc2d7242b1c7e3e6903fa
Callers of notify_change() need to hold i_mutex.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: ed1524371716466e9c762808b02601d0d0276a92
Duh... Fortunately, the bug is quite recent (post-2.6.25) and, embarrassingly,
mine ;-/
http://bugzilla.kernel.org/show_bug.cgi?id=10878
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
support.
upstream commit: ca05a99a54db1db5bca72eccb5866d2a86f8517f
Source code out there hard-codes a notion of what the
_LINUX_CAPABILITY_VERSION #define means in terms of the semantics of the
raw capability system calls capget() and capset(). Its unfortunate, but
true.
Since the confusing header file has been in a released kernel, there is
software that is erroneously using 64-bit capabilities with the semantics
of 32-bit compatibilities. These recently compiled programs may suffer
corruption of their memory when sys_getcap() overwrites more memory than
they are coded to expect, and the raising of added capabilities when using
sys_capset().
As such, this patch does a number of things to clean up the situation
for all. It
1. forces the _LINUX_CAPABILITY_VERSION define to always retain its
legacy value.
2. adopts a new #define strategy for the kernel's internal
implementation of the preferred magic.
3. deprecates v2 capability magic in favor of a new (v3) magic
number. The functionality of v3 is entirely equivalent to v2,
the only difference being that the v2 magic causes the kernel
to log a "deprecated" warning so the admin can find applications
that may be using v2 inappropriately.
[User space code continues to be encouraged to use the libcap API which
protects the application from details like this. libcap-2.10 is the first
to support v3 capabilities.]
Fixes issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=447518.
Thanks to Bojan Smojver for the report.
[akpm@linux-foundation.org: s/depreciate/deprecate/g]
[akpm@linux-foundation.org: be robust about put_user size]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: Bojan Smojver <bojan@rexursive.com>
Cc: stable@kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: d3e49afbb66109613c3474f2273f5830ac2dcb09
The page decrypt calls in ecryptfs_write() are both pointless and buggy.
Pointless because ecryptfs_get_locked_page() has already brought the page
up to date, and buggy because prior mmap writes will just be blown away by
the decrypt call.
This patch also removes the declaration of a now-nonexistent function
ecryptfs_write_zeros().
Thanks to Eric Sandeen and David Kleikamp for helping to track this
down.
Eric said:
fsx w/ mmap dies quickly ( < 100 ops) without this, and survives
nicely (to millions of ops+) with it in place.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[chrisw: backport to 2.6.25.5]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
/proc/pid/pagemap
upstream commit: aae8679b0ebcaa92f99c1c3cb0cd651594a43915
Fix a bug in add_to_pagemap. Previously, since pm->out was a char *,
put_user was only copying 1 byte of every PFN, resulting in the top 7
bytes of each PFN not being copied. By requiring that reads be a multiple
of 8 bytes, I can make pm->out and pm->end u64*s instead of char*s, which
makes put_user work properly, and also simplifies the logic in
add_to_pagemap a bit.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Thomas Tuttle <ttuttle@google.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: aed5417593ad125283f35513573282139a8664b5
This patch:
commit e9720acd728a46cb40daa52c99a979f7c4ff195c
Author: Pavel Emelyanov <xemul@openvz.org>
Date: Fri Mar 7 11:08:40 2008 -0800
[NET]: Make /proc/net a symlink on /proc/self/net (v3)
introduced a /proc/self/net directory without bumping the corresponding
link count for /proc/self.
This patch replaces the static link count initializations with a call that
counts the number of directory entries in the given pid_entry table
whenever it is instantiated, and thus relieves the burden of manually
keeping the two in sync.
[akpm@linux-foundation.org: cleanup]
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: 6ab455eeaff6893cd06da33843e840d888cdc04a
When we have multiple buffers in a single page for a blocksize == pagesize
filesystem we might overwrite the page contents if two callers hit it
shortly after each other. To prevent that we need to keep the page locked
until I/O is completed and the page marked uptodate.
Thanks to Eric Sandeen for triaging this bug and finding a reproducible
testcase and Dave Chinner for additional advice.
This should fix kernel.org bz #10421.
Tested-by: Eric Sandeen <sandeen@sandeen.net>
SGI-PV: 981813
SGI-Modid: xfs-linux-melb:xfs-kern:31173a
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: 076d8423a98659a92837b07aa494cb74bfefe77c
When a share was in DFS and the server was Unix/Linux, we were sending paths of the form
\\server\share/dir/file
rather than
//server/share/dir/file
There was some discussion between me and jra over whether we should use
/server/share/dir/file
as MS sometimes says - but the documentation for this claims it should be
doubleslash for this type of UNC-like path format and that works, so leaving
it as doubleslash but converting the \ to / in the the //server/share portion.
This gets Samba to now correctly return STATUS_PATH_NOT_COVERED when it is
supposed to (Windows already did since the direction of the slash was not an issue
for them). Still need another minor change to fully enable DFS (need to finish
some chages to SMBGetDFSRefer
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 7e01c8e5420b6c7f9d85d34c15d8c7a15c9fc720 upstream
This fix the uninitialized bs when we try to replace a xattr entry in
ibody with the new value which require more than free space.
This situation only happens we format ext3/4 with inode size more than 128 and
we have put xattr entries both in ibody and block. The consequences about
this bug is we will lost the xattr block which pointed by i_file_acl with all
xattr entires in it. We will alloc a new xattr block and put that large value
entry in it. The old xattr block will become orphan block.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
|
|
upstream commit: ddb2c43594f22843e9f3153da151deaba1a834c5
- Don't trust a length which is greater than the working buffer.
An invalid length could cause overflow when calculating buffer size
for decoding oid.
- An oid length of zero is invalid and allows for an off-by-one error when
decoding oid because the first subid actually encodes first 2 subids.
- A primitive encoding may not have an indefinite length.
Thanks to Wei Wang from McAfee for report.
Cc: Steven French <sfrench@us.ibm.com>
Cc: stable@kernel.org
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
commit d5dee5c395062a55236318ac4eec1f4ebb9de6db upstream
Quota files cannot have tails because quota_write and quota_read functions do
not support them. So far when quota files did have tail, we just refused to
turn quotas on it. Sadly this check has been wrong and so there are now plenty
installations where quota files don't have NOTAIL flag set and so now after
fixing the check, they suddently fail to turn quotas on. Since it's easy to
unpack the tail from kernel, do this from reiserfs_quota_on() which solves the
problem and is generally nicer to users anyway.
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: <urhausen@urifabi.net>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit: 02c6be615f1fcd37ac5ed93a3ad6692ad8991cd9 upstream
If utimensat() is called with both times set to UTIME_NOW or one of them to
UTIME_NOW and the other to UTIME_OMIT, then it will update the file time
without any permission checking.
I don't think this can be used for anything other than a local DoS, but could
be quite bewildering at that (e.g. "Why was that large source tree rebuilt
when I didn't modify anything???")
This affects all kernels from 2.6.22, when the utimensat() syscall was
introduced.
Fix by doing the same permission checking as for the "times == NULL" case.
Thanks to Michael Kerrisk, whose utimensat-non-conformances-and-fixes.patch in
-mm also fixes this (and breaks other stuff), only he didn't realize the
security implications of this bug.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0b2bac2f1ea0d33a3621b27ca68b9ae760fca2e9 upstream.
fcntl_setlk()/close() race prevention has a subtle hole - we need to
make sure that if we *do* have an fcntl/close race on SMP box, the
access to descriptor table and inode->i_flock won't get reordered.
As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs.
STORE descriptor table entry, LOAD inode->i_flock with not a single
lock in common on both sides. We do have BKL around the first STORE,
but check in locks_remove_posix() is outside of BKL and for a good
reason - we don't want BKL on common path of close(2).
Solution is to hold ->file_lock around fcheck() in there; that orders
us wrt removal from descriptor table that preceded locks_remove_posix()
on close path and we either come first (in which case eviction will be
handled by the close side) or we'll see the effect of close and do
eviction ourselves. Note that even though it's read-only access,
we do need ->file_lock here - rcu_read_lock() won't be enough to
order the things.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 214b7049a7929f03bbd2786aaef04b8b79db34e2 upstream.
We have a race between fcntl() and close() that can lead to
dnotify_struct inserted into inode's list *after* the last descriptor
had been gone from current->files.
Since that's the only point where dnotify_struct gets evicted, we are
screwed - it will stick around indefinitely. Even after struct file in
question is gone and freed. Worse, we can trigger send_sigio() on it at
any later point, which allows to send an arbitrary signal to arbitrary
process if we manage to apply enough memory pressure to get the page
that used to host that struct file and fill it with the right pattern...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e92adcba261fd391591bb63c1703185a04a41554 upstream
This patch wakes up a thread waiting in io_getevents if another thread
destroys the context. This was tested using a small program that spawns a
thread to wait in io_getevents while the parent thread destroys the io context
and then waits for the getevents thread to exit. Without this patch, the
program hangs indefinitely. With the patch, the program exits as expected.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Christopher Smith <x@xman.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
We were accounting for the cleanmarker by calling jffs2_link_node_ref()
(without locking!), which adjusted both superblock and per-eraseblock
accounting, subtracting the size of the cleanmarker from {jeb,c}->free_size
and adding it to {jeb,c}->used_size.
But only _then_ were we adding the size of the newly-erased block back
to the superblock counts, and we were adding each of jeb->{free,used}_size
to the corresponding superblock counts. Thus, the size of the cleanmarker
was effectively subtracted from the superblock's free_size _twice_.
Fix this, by always adding a full eraseblock size to c->free_size when
we've erased a block. And call jffs2_link_node_ref() under the proper
lock, while we're at it.
Thanks to Alexander Yurchenko and/or Damir Shayhutdinov for (almost)
pinpointing the problem.
[Backport of commit 014b164e1392a166fe96e003d2f0e7ad2e2a0bb7]
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Describe debug parameters with their names (and not their values).
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
mb_cache_entry_alloc() was allocating cache entries with GFP_KERNEL. But
filesystems are calling this function while holding xattr_sem so possible
recursion into the fs violates locking ordering of xattr_sem and transaction
start / i_mutex for ext2-4. Change mb_cache_entry_alloc() so that filesystems
can specify desired gfp mask and use GFP_NOFS from all of them.
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Dave Jones <davej@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
write_begin
This fixes a regression introduced in commit
205c109a7a96d9a3d8ffe64c4068b70811fef5e8 when switching to
write_begin/write_end operations in JFFS2.
The page offset is miscalculated, leading to corruption of the fragment
lists and subsequently to memory corruption and panics.
[ Side note: the bug is a fairly direct result of the naming. Nick was
likely misled by the use of "offs", since we tend to use the notion of
"offset" not as an absolute position, but as an offset _within_ a page
or allocation.
Alternatively, a "pgoff_t" is a page index, but not a byte offset -
our VM naming can be a bit confusing.
So in this case, a VM person would likely have called this a "pos",
not an "offs", or perhaps talked about byte offsets rather than page
offsets (since it's counted in bytes, not pages). - Linus ]
Signed-off-by: Alexey Korolev <akorolev@infradead.org>
Signed-off-by: Vasiliy Leonenko <vasiliy.leonenko@mail.ru>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Miklos Szeredi found the bug:
"Basically what happens is that on the server nlm_fopen() calls
nfsd_open() which returns -EACCES, to which nlm_fopen() returns
NLM_LCK_DENIED.
"On the client this will turn into a -EAGAIN (nlm_stat_to_errno()),
which in will cause fcntl_setlk() to retry forever."
So, for example, opening a file on an nfs filesystem, changing
permissions to forbid further access, then trying to lock the file,
could result in an infinite loop.
And Trond Myklebust identified the culprit, from Marc Eshel and I:
7723ec9777d9832849b76475b1a21a2872a40d20 "locks: factor out
generic/filesystem switch from setlock code"
That commit claimed to just be reshuffling code, but actually introduced
a behavioral change by calling the lock method repeatedly as long as it
returned -EAGAIN.
We assumed this would be safe, since we assumed a lock of type SETLKW
would only return with either success or an error other than -EAGAIN.
However, nfs does can in fact return -EAGAIN in this situation, and
independently of whether that behavior is correct or not, we don't
actually need this change, and it seems far safer not to depend on such
assumptions about the filesystem's ->lock method.
Therefore, revert the problematic part of the original commit. This
leaves vfs_lock_file() and its other callers unchanged, while returning
fcntl_setlk and fcntl_setlk64 to their former behavior.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Tested-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* 'docs' of git://git.lwn.net/linux-2.6:
Add additional examples in Documentation/spinlocks.txt
Move sched-rt-group.txt to scheduler/
Documentation: move rpc-cache.txt to filesystems/
Documentation: move nfsroot.txt to filesystems/
Spell out behavior of atomic_dec_and_lock() in kerneldoc
Fix a typo in highres.txt
Fixes to the seq_file document
Fill out information on patch tags in SubmittingPatches
Add the seq_file documentation
|
|
Documentation/ is a little large, and filesystems/ seems an obvious
place for this file.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
Michael Kerrisk found out that signalfd was not reporting back user data
pushed using sigqueue:
http://groups.google.com/group/linux.kernel/msg/9397cab8551e3123
The following patch makes signalfd report back the ssi_ptr and ssi_int members
of the signalfd_siginfo structure.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Acked-by: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|