summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2008-03-31[GFS2] Allow bmap to allocate extentsSteven Whitehouse
We've supported mapping of extents when no block allocation is required for some time. This patch extends that to mapping of extents when an allocation has been requested. In that case we try to allocate as many blocks as are requested, but we might return fewer in case there is something preventing us from returning the complete amount (e.g. an already allocated block is in the way). Currently the only code path which can actually request multiple data blocks in a single bmap call is the page_mkwrite path and even then it only happens if there are multiple blocks per page. What this patch does do however, is merge the allocation requests for metadata (growing the metadata tree in either height or depth) with the allocation of the data blocks in the case that both are needed. This results in lower overheads even in the single block allocation case. The one thing which we can't handle here at the moment is unstuffing. I would like to be able to do that, but the problem which arises is that in order to unstuff one has to get a locked page from the page cache which results in locking problems in the (usual) case that the caller is holding the page lock on the page it wishes to map. So that case will have to be addressed in future patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Fix a page lock / glock deadlockSteven Whitehouse
We've previously been using a "try lock" in readpage on the basis that it would prevent deadlocks due to the inverted lock ordering (our normal lock ordering is glock first and then page lock). Unfortunately tests have shown that this isn't enough. If the glock has a demote request queued such that run_queue() in the glock code tries to do a demote when its called under readpage then it will try and write out all the dirty pages which requires locking them. This then deadlocks with the page locked by readpage. The solution is to always require two calls into readpage. The first unlocks the page, gets the glock and returns AOP_TRUNCATED_PAGE, the second does the actual readpage and unlocks the glock & page as required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] proper extern for gfs2/locking/dlm/mount.c:gdlm_opsAdrian Bunk
This patch adds a proper extern declaration for gdlm_ops in fs/gfs2/locking/dlm/lock_dlm.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] gfs2/ops_file.c should #include "ops_inode.h"Adrian Bunk
Every file should include the headers containing the prototypes for its global functions (in this case for gfs2_set_inode_flags()). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] be*_add_cpu conversionMarcin Slusarz
replace all: big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) + expression_in_cpu_byteorder); with: beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder); generated with semantic patch Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Fix bug where we called drop_bh incorrectlySteven Whitehouse
As a result of an earlier patch, drop_bh was being called in cases when it shouldn't have been. Since we never have a gh in the drop case and we always have a gh in the promote case, we can use that extra information to tell which case has been seen. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
2008-03-31[GFS2] Get inode buffer only once per block map callSteven Whitehouse
In the case that we needed to grow the height of the metadata tree we were looking up the inode buffer and then brelse()ing it despite the fact that it is needed later in the block map process. This patch ensures that we look up the inode's buffer once and only once during the block map process. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Eliminate (almost) duplicate field from gfs2_inodeSteven Whitehouse
The blocks counter is almost a duplicate of the i_blocks field in the VFS inode. The only difference is that i_blocks can be only 32bits long for 32bit arch without large single file support. Since GFS2 doesn't handle the non-large single file case (for 32 bit anyway) this adds a new config dependency on 64BIT || LSF. This has always been the case, however we've never explicitly said so before. Even if we do add support for the non-LSF case, we will still not require this field to be duplicated since we will not be able to access oversized files anyway. So the net result of all this is that we shave 8 bytes from a gfs2_inode and get our config deps correct. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Add a function to interate over an extentSteven Whitehouse
This adds a function (currently the only use is during mapping of already allocated blocks, but watch this space) which iterates over a number of pointers in a block and returns the extent length. If the initial pointer is 0 (i.e. unallocated) it will return the number of unallocated blocks in the extent. If the initial pointer is allocated, then it returns the number of contiguously allocated blocks in the extent. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] The case of the missing asteriskSteven Whitehouse
A dereference was forgotten. This adds it back correctly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Add extent allocation to block allocatorSteven Whitehouse
Rather than having to allocate a single block at a time, this patch allows the block allocator to allocate an extent. Since there is no difference (so far as the block allocator is concerned) between data blocks and indirect blocks, it is posible to allocate a single extent and for the caller to unrevoke just the blocks required for indirect blocks. Currently the only bit of GFS2 to make use of this feature is the build height function. The intention is that gfs2_block_map will be changed to make use of this feature in future patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Merge gfs2_alloc_meta and gfs2_alloc_dataSteven Whitehouse
Thanks to the preceeding patches, the only difference between these two functions is their name. We can thus merge them and call the new function gfs2_alloc_block to reflect the fact that it can allocate either kind of block. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Update gfs2_trans_add_unrevoke to accept extentsSteven Whitehouse
By adding an extra argument to gfs2_trans_add_unrevoke we can now specify an extent length of blocks to unrevoke. This means that we only need to make one pass through the list for each extent rather than each block. Currently the only extent length which is used is 1, but that will change in the future. Also gfs2_trans_add_unrevoke is removed from gfs2_alloc_meta since its the only difference between this and gfs2_alloc_data which is left. This will allow a future patch to merge these two functions into one (i.e. one call to allocate both data and metadata in a single extent in the future). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Merge the rd_last_alloc_meta and rd_last_alloc_data fieldsSteven Whitehouse
We don't need to keep track of when we last allocated data and metadata separately since the only thing thats important when searching for a free block is whether its free or not, which is independent from what type of block it is. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Reduce inode size by merging fieldsSteven Whitehouse
There were three fields being used to keep track of the location of the most recently allocated block for each inode. These have been merged into a single field in order to better keep the data and metadata for an inode close on disk, and also to reduce the space required for storage. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Remove unused countersBob Peterson
This is kind of trivial in the greater scheme of things, but this removes three counters that AFAICT are never used. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Shrink & rename di_depthSteven Whitehouse
This patch forms a pair with the previous patch which shrunk di_height. Like that patch di_depth is renamed i_depth and moved into struct gfs2_inode directly. Also the field goes from 16 bits to 8 bits since it is also limited to a max value which is rather small (17 in this case). In addition we also now validate the field against this maximum value when its read in. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Remove rgrp and glock version numbersBob Peterson
This patch further reduces GFS2's memory requirements by eliminating the 64-bit version number fields in lieu of a couple bits. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Remove lm.[ch] and distribute contentSteven Whitehouse
The functions in lm.c were just wrappers which were mostly only used in one other file. By moving the functions to the files where they are being used, they can be marked static and also this will usually result in them being inlined since they are often only used from one point in the code. A couple of really trivial functions have been inlined by hand into the function which called them as it makes the code clearer to do that. We also gain from one fewer function call in the glock lock and unlock paths. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Eliminate gl_req_bhBob Peterson
This patch further reduces the memory needs of GFS2 by eliminating the gl_req_bh variable from struct gfs2_glock. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Add consts to various bits of rgrp.cSteven Whitehouse
There are a couple of routines which scan bitmaps where we can mark the bitmaps const, plus a couple of call sites that can be updated too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Introduce array of buffers to struct metapathSteven Whitehouse
The reason for doing this is to allow all the block mapping code to share the same array. As a result we can remove two arguments from lookup_metapath since they are now returned via the array. We also add a function to drop all refs to buffer heads when we are done with the metapath. The build_height function shares the struct metapath, but currently still frees its own buffers, and this will change in a future patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Move part of gfs2_block_map into a separate functionSteven Whitehouse
This is required to enable future changes to the block mapping code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Get rid of gl_waiters2Bob Peterson
This patch reduces memory by replacing the int variable gl_waiters2 by a single bit in the gl_flags. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Combine rg_flags and rd_flagsBob Peterson
This patch reduces the memory required by GFS2 by combining the rd_flags and rg_flags (in core only). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Allocate gfs2_rgrpd from slab memoryBob Peterson
This patch moves the gfs2_rgrpd structure to its own slab memory. This makes it easier to control and monitor, and yields less memory fragmentation. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Plug an unlikely leakBob Peterson
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] make gfs2_glock_hold() staticAdrian Bunk
gfs2_glock_hold() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Only wake the reclaim daemon if we need toBob Peterson
This patch only wakes up the glock reclaim daemon if there is actually something to be reclaimed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Misc fixupsBob Peterson
This patch contains two small fixups that didn't fit elsewhere. They are: (1) get rid of temp variable in find_metapath. (2) Remove vestigial "ret" variable from gfs2_writepage_common. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Only do lo_incore_commit onceBob Peterson
This patch is performance related. When we're doing a log flush, I noticed we were calling buf_lo_incore_commit twice: once for data bufs and once for metadata bufs. Since this is the same function and does the same thing in both cases, there should be no reason to call it twice. Since we only need to call it once, we can also make it faster by removing it from the generic "lops" code and making it a stand-along static function. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Fix debug inode printingBob Peterson
I noticed that the latest change to i_height got rid of the value from the inode dump. This patch adds it back. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Get rid of unneeded parameter in gfs2_rlist_allocBob Peterson
This patch removed the unnecessary parameter from function gfs2_rlist_alloc. The parameter was always passed in as 0. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Streamline indirect pointer tree height calculationSteven Whitehouse
This patch improves the calculation of the tree height in order to reduce the number of operations which are carried out on each call to gfs2_block_map. In the common case, we now make a single comparison, rather than calculating the required tree height from scratch each time. Also in the case that the tree does need some extra height, we start from the current height rather from zero when we work out what the new height ought to be. In addition the di_height field is moved into the inode proper and reduced in size to a u8 since the value must be between 0 and GFS2_MAX_META_HEIGHT (10). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Speed up gfs2_write_alloc_required, deprecate gfs2_extent_mapSteven Whitehouse
This patch removes the call to gfs2_extent_map from gfs2_write_alloc_required, instead we call gfs2_block_map directly. This results in fewer overall calls to gfs2_block_map in the multi-block case. Also, gfs2_extent_map is marked as deprecated so that people know that its going away as soon as all the callers have been converted. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-30cifs: fix misannotationsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30NULL noise: fs/*, mm/*, kernel/*Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-30jbd/jbd2 NULL noiseAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock [PATCH] do shrink_submounts() for all fs types [PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts() [PATCH] count ghost references to vfsmounts [PATCH] reduce stack footprint in namespace.c
2008-03-28afs: prevent double cell registrationSven Schnelle
kafs doesn't check if the cell already exists - so if you do an echo "add newcell.org 1.2.3.4" >/proc/fs/afs/cells it will try to create this cell again. kobject will also complain about a double registration. To prevent such problems, return -EEXIST in that case. Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28vfs: fix data leak in nobh_write_end()Dmitri Monakhov
Current nobh_write_end() implementation ignore partial writes(copied < len) case if page was fully mapped and simply mark page as Uptodate, which is totally wrong because area [pos+copied, pos+len) wasn't updated explicitly in previous write_begin call. It simply contains garbage from pagecache and result in data leakage. #TEST_CASE_BEGIN: ~~~~~~~~~~~~~~~~ In fact issue triggered by classical testcase open("/mnt/test", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3 ftruncate(3, 409600) = 0 writev(3, [{"a", 1}, {NULL, 4095}], 2) = 1 ##TESTCASE_SOURCE: ~~~~~~~~~~~~~~~~~ #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <sys/uio.h> #include <sys/mman.h> #include <errno.h> int main(int argc, char **argv) { int fd, ret; void* p; struct iovec iov[2]; fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666); ftruncate(fd, 409600); iov[0].iov_base="a"; iov[0].iov_len=1; iov[1].iov_base=NULL; iov[1].iov_len=4096; ret = writev(fd, iov, sizeof(iov)/sizeof(struct iovec)); printf("writev = %d, err = %d\n", ret, errno); return 0; } ##TESTCASE RESULT: ~~~~~~~~~~~~~~~~~~ [root@ts63 ~]# mount | grep mnt2 /dev/mapper/test on /mnt2 type ext2 (rw,nobh) [root@ts63 ~]# /tmp/writev /mnt2/test writev = 1, err = 0 [root@ts63 ~]# hexdump -C /mnt2/test 00000000 61 65 62 6f 6f 74 00 00 f0 b9 b4 59 3a 00 00 00 |aeboot.....Y:...| 00000010 20 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 | .......!.......| 00000020 df df df df df df df df df df df df df df df df |................| 00000030 3a 00 00 00 2a 00 00 00 21 00 00 00 00 00 00 00 |:...*...!.......| 00000040 60 c0 8c 00 00 00 00 00 40 4a 8d 00 00 00 00 00 |`.......@J......| 00000050 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 |........A.......| 00000060 74 69 6d 65 20 64 64 20 69 66 3d 2f 64 65 76 2f |time dd if=/dev/| 00000070 6c 6f 6f 70 30 20 20 6f 66 3d 2f 64 65 76 2f 6e |loop0 of=/dev/n| skip.. 00000f50 00 00 00 00 00 00 00 00 31 00 00 00 00 00 00 00 |........1.......| 00000f60 6d 6b 66 73 2e 65 78 74 33 20 2f 64 65 76 2f 76 |mkfs.ext3 /dev/v| 00000f70 7a 76 67 2f 74 65 73 74 20 2d 62 34 30 39 36 00 |zvg/test -b4096.| 00000f80 a0 fe 8c 00 00 00 00 00 21 00 00 00 00 00 00 00 |........!.......| 00000f90 23 31 32 30 35 39 35 30 34 30 34 00 3a 00 00 00 |#1205950404.:...| 00000fa0 20 00 8d 00 00 00 00 00 21 00 00 00 00 00 00 00 | .......!.......| 00000fb0 d0 cf 8c 00 00 00 00 00 10 d0 8c 00 00 00 00 00 |................| 00000fc0 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 |........A.......| 00000fd0 6d 6f 75 6e 74 20 2f 64 65 76 2f 76 7a 76 67 2f |mount /dev/vzvg/| 00000fe0 74 65 73 74 20 20 2f 76 7a 20 2d 6f 20 64 61 74 |test /vz -o dat| 00000ff0 61 3d 77 72 69 74 65 62 61 63 6b 00 00 00 00 00 |a=writeback.....| 00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| As you can see file's page contains garbage from pagecache instead of zeros. #TEST_CASE_END Attached patch: - Add sanity check BUG_ON in order to prevent incorrect usage by caller, This is function invariant because page can has buffers and in no zero *fadata pointer at the same time. - Always attach buffers to page is it is partial write case. - Always switch back to generic_write_end if page has buffers. This is reasonable because if page already has buffer then generic_write_begin was called previously. Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org> Reviewed-by: Nick Piggin <npiggin@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-27[PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lockAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-27[PATCH] do shrink_submounts() for all fs typesAl Viro
... and take it out of ->umount_begin() instances. Call with all locks already taken (by do_umount()) and leave calling release_mounts() to caller (it will do release_mounts() anyway, so we can just put into the same list). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-27[PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts()Al Viro
... and fix a race on access of ->mnt_share et.al. without namespace_sem in the latter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-27[PATCH] count ghost references to vfsmountsAl Viro
make propagate_mount_busy() exclude references from the vfsmounts that had been isolated by umount_tree() and are just waiting for release_mounts() to dispose of their ->mnt_parent/->mnt_mountpoint. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-27[PATCH] reduce stack footprint in namespace.cAl Viro
A lot of places misuse struct nameidata when they need struct path. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [PATCH] get stack footprint of pathname resolution back to relative sanity [PATCH] double iput() on failure exit in hugetlb [PATCH] double dput() on failure exit in tiny-shmem [PATCH] fix up new filp allocators [PATCH] check for null vfsmount in dentry_open() [PATCH] reiserfs: eliminate private use of struct file in xattr [PATCH] sanitize hppfs hppfs pass vfsmount to dentry_open() [PATCH] restore export of do_kern_mount()
2008-03-24driver core: debug for bad dev_attr_show() return value.Andrew Morton
Try to find the culprit who caused http://bugzilla.kernel.org/show_bug.cgi?id=10150 Cc: <balajirrao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-03-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Fix mem leak on dfs referral [CIFS] file create with acl support enabled is slow [CIFS] Fix mtime on cp -p when file data cached but written out too late [CIFS] Fix build problem [CIFS] cifs: replace remaining __FUNCTION__ occurrences [CIFS] DFS patch that connects inode with dfs handling ops
2008-03-22Change pagemap output format to allow for future reporting of huge pagesHans Rosenfeld
Change pagemap output format to allow for future reporting of huge pages. (Format comment and minor cleanups: mpm@selenic.com) Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>