summaryrefslogtreecommitdiff
path: root/fs/ext4/inode.c
AgeCommit message (Collapse)Author
2012-03-21ext4: do not mark superblock as dirty unnecessarilyArtem Bityutskiy
Commit a0375156ca1041574b5d47cc7e32f10b891151b0 cleaned up superblock dirtying handling, but missed one place. This patch does what was intended: if we have the journal, then we update the superblock through the journal rather than doing this directly. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-21ext4: correct ext4_punch_hole return codesAllison Henderson
ext4_punch_hole returns -ENOTSUPP but it should be using -EOPNOTSUPP Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-21ext4: remove restrictive checks for EOFBLOCKS_FLLukas Czerner
We are going to remove the EOFBLOCKS_FL flag in the future, so this is the first part of the removal. We can not remove it entirely just now, since the e2fsck is still checking for it and it might cause headache to some people. Instead, remove the restrictive checks now and the rest later, when the new e2fsck code is out and common enough. This is also needed because punch hole already breaks the EOFBLOCKS_FL semantics, so it might cause the some troubles. So simply remove it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-19ext4: change some printk() calls to use ext4_msg() insteadTheodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-19ext4: remove trailing newlines from ext4_msg() and ext4_error() messagesTheodore Ts'o
The functions ext4_msg() and ext4_error() already tack on a trailing newline, so remove the unnecessary extra newline. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-19ext4: add no_printk argument validation, fix falloutJoe Perches
Add argument validation to debug functions. Use ##__VA_ARGS__. Fix format and argument mismatches. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-05ext4: clean up the flags passed to __blockdev_direct_IOJeff Moyer
For extent-based files, you can perform DIO to holes, as mentioned in the comments in ext4_ext_direct_IO. However, that function passes DIO_SKIP_HOLES to __blockdev_direct_IO, which is *really* confusing to the uninitiated reader. The key, here, is that the get_block function passed in, ext4_get_block_write, completely ignores the create flag that is passed to it (the create flag is passed in from the direct I/O code, which uses the DIO_SKIP_HOLES flag to determine whether or not it should be cleared). This is a long-winded way of saying that the DIO_SKIP_HOLES flag is ultimately ignored. So let's remove it. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-03-02ext4: remove the I_VERSION mount flag and use the super_block flag insteadTheodore Ts'o
There's no point to have two bits that are set in parallel; so use the MS_I_VERSION flag that is needed by the VFS anyway, and that way we free up a bit in sbi->s_mount_opts. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-02-20ext4: fix race between unwritten extent conversion and truncateJeff Moyer
The following comment in ext4_end_io_dio caught my attention: /* XXX: probably should move into the real I/O completion handler */ inode_dio_done(inode); The truncate code takes i_mutex, then calls inode_dio_wait. Because the ext4 code path above will end up dropping the mutex before it is reacquired by the worker thread that does the extent conversion, it seems to me that the truncate can happen out of order. Jan Kara mentioned that this might result in error messages in the system logs, but that should be the extent of the "damage." The fix is pretty straight-forward: don't call inode_dio_done until the extent conversion is complete. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-02-20ext4: ignore EXT4_INODE_JOURNAL_DATA flag with delallocLukas Czerner
Ext4 does not support data journalling with delayed allocation enabled. We even do not allow to mount the file system with delayed allocation and data journalling enabled, however it can be set via FS_IOC_SETFLAGS so we can hit the inode with EXT4_INODE_JOURNAL_DATA set even on file system mounted with delayed allocation (default) and that's where problem arises. The easies way to reproduce this problem is with the following set of commands: mkfs.ext4 /dev/sdd mount /dev/sdd /mnt/test1 dd if=/dev/zero of=/mnt/test1/file bs=1M count=4 chattr +j /mnt/test1/file dd if=/dev/zero of=/mnt/test1/file bs=1M count=4 conv=notrunc chattr -j /mnt/test1/file Additionally it can be reproduced quite reliably with xfstests 272 and 269. In fact the above reproducer is a part of test 272. To fix this we should ignore the EXT4_INODE_JOURNAL_DATA inode flag if the file system is mounted with delayed allocation. This can be easily done by fixing ext4_should_*_data() functions do ignore data journal flag when delalloc is set (suggested by Ted). We also have to set the appropriate address space operations for the inode (again, ignoring data journal flag if delalloc enabled). Additionally this commit introduces ext4_inode_journal_mode() function because ext4_should_*_data() has already had a lot of common code and this change is putting it all into one function so it is easier to read. Successfully tested with xfstests in following configurations: delalloc + data=ordered delalloc + data=writeback data=journal nodelalloc + data=ordered nodelalloc + data=writeback nodelalloc + data=journal Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-01-10Merge branch 'for_linus' into for_linus_mergedTheodore Ts'o
Conflicts: fs/ext4/ioctl.c
2012-01-09Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2/3/4: delete unneeded includes of module.h ext{3,4}: Fix potential race when setversion ioctl updates inode udf: Mark LVID buffer as uptodate before marking it dirty ext3: Don't warn from writepage when readonly inode is spotted after error jbd: Remove j_barrier mutex reiserfs: Force inode evictions before umount to avoid crash reiserfs: Fix quota mount option parsing udf: Treat symlink component of type 2 as / udf: Fix deadlock when converting file from in-ICB one to normal one udf: Cleanup calling convention of inode_getblk() ext2: Fix error handling on inode bitmap corruption ext3: Fix error handling on inode bitmap corruption ext3: replace ll_rw_block with other functions ext3: NULL dereference in ext3_evict_inode() jbd: clear revoked flag on buffers before a new transaction started ext3: call ext3_mark_recovery_complete() when recovery is really needed
2012-01-09ext2/3/4: delete unneeded includes of module.hPaul Gortmaker
Delete any instances of include module.h that were not strictly required. In the case of ext2, the declaration of MODULE_LICENSE etc. were in inode.c but the module_init/exit were in super.c, so relocate the MODULE_LICENCE/AUTHOR block to super.c which makes it consistent with ext3 and ext4 at the same time. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits) Kconfig: acpi: Fix typo in comment. misc latin1 to utf8 conversions devres: Fix a typo in devm_kfree comment btrfs: free-space-cache.c: remove extra semicolon. fat: Spelling s/obsolate/obsolete/g SCSI, pmcraid: Fix spelling error in a pmcraid_err() call tools/power turbostat: update fields in manpage mac80211: drop spelling fix types.h: fix comment spelling for 'architectures' typo fixes: aera -> area, exntension -> extension devices.txt: Fix typo of 'VMware'. sis900: Fix enum typo 'sis900_rx_bufer_status' decompress_bunzip2: remove invalid vi modeline treewide: Fix comment and string typo 'bufer' hyper-v: Update MAINTAINERS treewide: Fix typos in various parts of the kernel, and fix some comments. clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR gpio: Kconfig: drop unknown symbol 'CS5535_GPIO' leds: Kconfig: Fix typo 'D2NET_V2' sound: Kconfig: drop unknown symbol ARCH_CLPS7500 ... Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)
2012-01-04ext4: make more symbols staticEric Sandeen
A couple more functions can reasonably be made static if desired. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-28ext4: remove no longer used functions in inode.cZheng Liu
The functions ext4_block_truncate_page() and ext4_block_zero_page_range() are no longer used, so remove them. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-28ext4: add missing spaces to debugging printk'sZheng Liu
Fix ext4_debug format in ext4_ext_handle_uninitialized_extents() and ext4_end_io_dio(). Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-28ext4: flush journal when switching from data=journal modeYongqiang Yang
It's necessary to flush the journal when switching away from data=journal mode. This is because there are no revoke records when data blocks are journalled, but revoke records are required in the other journal modes. However, it is not necessary to flush the journal when switching into data=journal mode, and flushing the journal is expensive. So let's avoid it in that case. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-28ext4: allocate delalloc blocks before changing journal modeYongqiang Yang
delalloc blocks should be allocated before changing journal mode, otherwise they can not be allocated and even more truncate on delalloc blocks could triggre BUG by flushing delalloc buffers. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-12-13ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers()Yongqiang Yang
If a page has been read into memory and never been written, it has no buffers, but we should handle the page in truncate or punch hole. VFS code of writing operations has handled holes correctly, so this patch removes the code handling holes in writing operations. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-12-13ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesizeYongqiang Yang
If there is an unwritten but clean buffer in a page and there is a dirty buffer after the buffer, then mpage_submit_io does not write the dirty buffer out. As a result, da_writepages loops forever. This patch fixes the problem by checking dirty flag. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-12-13ext4: avoid hangs in ext4_da_should_update_i_disksize()Andrea Arcangeli
If the pte mapping in generic_perform_write() is unmapped between iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the "copied" parameter to ->end_write can be zero. ext4 couldn't cope with it with delayed allocations enabled. This skips the i_disksize enlargement logic if copied is zero and no new data was appeneded to the inode. gdb> bt #0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\ 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467 #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 #2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\ ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440 #3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\ os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482 #4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\ xffff88001e26be40) at mm/filemap.c:2600 #5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\ zed out>, pos=<value optimized out>) at mm/filemap.c:2632 #6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\ t fs/ext4/file.c:136 #7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \ ppos=0xffff88001e26bf48) at fs/read_write.c:406 #8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\ 000, pos=0xffff88001e26bf48) at fs/read_write.c:435 #9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\ 4000) at fs/read_write.c:487 #10 <signal handler called> #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ () #12 0x0000000000000000 in ?? () gdb> print offset $22 = 0xffffffffffffffff gdb> print idx $23 = 0xffffffff gdb> print inode->i_blkbits $24 = 0xc gdb> up #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 2512 if (ext4_da_should_update_i_disksize(page, end)) { gdb> print start $25 = 0x0 gdb> print end $26 = 0xffffffffffffffff gdb> print pos $27 = 0x108000 gdb> print new_i_size $28 = 0x108000 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize $29 = 0xd9000 gdb> down 2467 for (i = 0; i < idx; i++) gdb> print i $30 = 0xd44acbee This is 100% reproducible with some autonuma development code tuned in a very aggressive manner (not normal way even for knumad) which does "exotic" changes to the ptes. It wouldn't normally trigger but I don't see why it can't happen normally if the page is added to swap cache in between the two faults leading to "copied" being zero (which then hangs in ext4). So it should be fixed. Especially possible with lumpy reclaim (albeit disabled if compaction is enabled) as that would ignore the young bits in the ptes. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-12-12ext4: fix ext4_end_io_dio() racing against fsync()Theodore Ts'o
We need to make sure iocb->private is cleared *before* we put the io_end structure on i_completed_io_list. Otherwise fsync() could potentially run on another CPU and free the iocb structure out from under us. Reported-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-12-06treewide: Fix comment and string typo 'bufer'Paul Bolle
Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-12-02treewide: Fix typos in various parts of the kernel, and fix some comments.Justin P. Mattock
The below patch fixes some typos in various parts of the kernel, as well as fixes some comments. Please let me know if I missed anything, and I will try to get it changed and resent. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-11-24ext4: fix racy use-after-free in ext4_end_io_dio()Tejun Heo
ext4_end_io_dio() queues io_end->work and then clears iocb->private; however, io_end->work calls aio_complete() which frees the iocb object. If that slab object gets reallocated, then ext4_end_io_dio() can end up clearing someone else's iocb->private, this use-after-free can cause a leak of a struct ext4_io_end_t structure. Detected and tested with slab poisoning. [ Note: Can also reproduce using 12 fio's against 12 file systems with the following configuration file: [global] direct=1 ioengine=libaio iodepth=1 bs=4k ba=4k size=128m [create] filename=${TESTDIR} rw=write -- tytso ] Google-Bug-Id: 5354697 Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Kent Overstreet <koverstreet@google.com> Tested-by: Kent Overstreet <koverstreet@google.com> Cc: stable@kernel.org
2011-11-21Merge branch 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds
* 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix up a undefined error in ext4_free_blocks in debugging code ext4: add blk_finish_plug in error case of writepages. ext4: Remove kernel_lock annotations ext4: ignore journalled data options on remount if fs has no journal
2011-11-07ext4: add blk_finish_plug in error case of writepages.Namjae Jeon
blk_finish_plug is needed in error case of writepages. Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-11-06Merge branch 'writeback-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: Add a 'reason' to wb_writeback_work writeback: send work item to queue_io, move_expired_inodes writeback: trace event balance_dirty_pages writeback: trace event bdi_dirty_ratelimit writeback: fix ppc compile warnings on do_div(long long, unsigned long) writeback: per-bdi background threshold writeback: dirty position control - bdi reserve area writeback: control dirty pause time writeback: limit max dirty pause time writeback: IO-less balance_dirty_pages() writeback: per task dirty rate limit writeback: stabilize bdi->dirty_ratelimit writeback: dirty rate control writeback: add bg_threshold parameter to __bdi_update_bandwidth() writeback: dirty position control writeback: account per-bdi accumulated dirtied pages
2011-11-02Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: vfs: add d_prune dentry operation vfs: protect i_nlink filesystems: add set_nlink() filesystems: add missing nlink wrappers logfs: remove unnecessary nlink setting ocfs2: remove unnecessary nlink setting jfs: remove unnecessary nlink setting hypfs: remove unnecessary nlink setting vfs: ignore error on forced remount readlinkat: ensure we return ENOENT for the empty pathname for normal lookups vfs: fix dentry leak in simple_fill_super()
2011-11-02Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (97 commits) jbd2: Unify log messages in jbd2 code jbd/jbd2: validate sb->s_first in journal_get_superblock() ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined ext4: fix a syntax error in ext4_ext_insert_extent when debugging enabled ext4: fix a typo in struct ext4_allocation_context ext4: Don't normalize an falloc request if it can fit in 1 extent. ext4: remove comments about extent mount option in ext4_new_inode() ext4: let ext4_discard_partial_buffers handle unaligned range correctly ext4: return ENOMEM if find_or_create_pages fails ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock() ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten ext4: optimize locking for end_io extent conversion ext4: remove unnecessary call to waitqueue_active() ext4: Use correct locking for ext4_end_io_nolock() ext4: fix race in xattr block allocation path ext4: trace punch_hole correctly in ext4_ext_map_blocks ext4: clean up AGGRESSIVE_TEST code ext4: move variables to their scope ext4: fix quota accounting during migration ext4: migrate cleanup ...
2011-11-02filesystems: add set_nlink()Miklos Szeredi
Replace remaining direct i_nlink updates with a new set_nlink() updater function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-31ext4: warn if direct reclaim tries to writeback pagesMel Gorman
Direct reclaim should never writeback pages. Warn if an attempt is made. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Alex Elder <aelder@sgi.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31ext4: let ext4_discard_partial_buffers handle unaligned range correctlyYongqiang Yang
As comment says, we should handle unaligned range rather than aligned one. This fixes a bug found by running xfstests #91. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
2011-10-31ext4: return ENOMEM if find_or_create_pages failsYongqiang Yang
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-31ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock()Yongqiang Yang
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-31ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwrittenTao Ma
EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten should be done simultaneously since ext4_end_io_nolock always clear the flag and decrease the counter in the same time. We have found some bugs that the flag is set while leaving i_aiodio_unwritten unchanged(commit 32c80b32c053d). So this patch just tries to create a helper function to wrap them to avoid any future bug. The idea is inspired by Eric. Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-31writeback: Add a 'reason' to wb_writeback_workCurt Wohlgemuth
This creates a new 'reason' field in a wb_writeback_work structure, which unambiguously identifies who initiates writeback activity. A 'wb_reason' enumeration has been added to writeback.h, to enumerate the possible reasons. The 'writeback_work_class' and tracepoint event class and 'writeback_queue_io' tracepoints are updated to include the symbolic 'reason' in all trace events. And the 'writeback_inodes_sbXXX' family of routines has had a wb_stats parameter added to them, so callers can specify why writeback is being started. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-26ext4: let ext4_page_mkwrite stop started handle in failureYongqiang Yang
The started journal handle should be stopped in failure case. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org
2011-10-25ext4: update EOFBLOCKS flag on fallocate properlyDmitry Monakhov
EOFBLOCK_FL should be updated if called w/o FALLOCATE_FL_KEEP_SIZE Currently it happens only if new extent was allocated. TESTCASE: fallocate test_file -n -l4096 fallocate test_file -l4096 Last fallocate cmd has updated size, but keept EOFBLOCK_FL set. And fsck will complain about that. Also remove ping pong in ext4_fallocate() in case of new extents, where ext4_ext_map_blocks() clear EOFBLOCKS bit, and later ext4_falloc_update_inode() restore it again. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-20ext4: fix the deadlock in mpage_da_map_and_submit()Kazuya Mio
If ext4_jbd2_file_inode() in mpage_da_map_and_submit() fails due to journal abort, this function returns to caller without unlocking the page. It leads to the deadlock, and the patch fixes this issue by calling mpage_da_submit_io(). Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-20ext4: fix deadlock in ext4_ordered_write_end()Akira Fujita
If ext4_jbd2_file_inode() in ext4_ordered_write_end() fails for some reasons, this function returns to caller without unlocking the page. It leads to the deadlock, and the patch fixes this issue. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18ext4: add block plug for .writepagesShaohua Li
Add block plug for ext4 .writepages. Though ext4 .writepages already handles request merge very well, block plug is still helpful to reduce block lock contention. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-08ext4: fix the comment describing ext4_ext_search_right()Tao Ma
The comment describing what ext4_ext_search_right() does is incorrect. We return 0 in *phys when *logical is the 'largest' allocated block, not smallest. Fix a few other typos while we're at it. Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-09-21Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-block: floppy: use del_timer_sync() in init cleanup blk-cgroup: be able to remove the record of unplugged device block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request mm: Add comment explaining task state setting in bdi_forker_thread() mm: Cleanup clearing of BDI_pending bit in bdi_forker_thread() block: simplify force plug flush code a little bit block: change force plug flush call order block: Fix queue_flag update when rq_affinity goes from 2 to 1 block: separate priority boosting from REQ_META block: remove READ_META and WRITE_META xen-blkback: fixed indentation and comments xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
2011-09-09ext4: attempt to fix race in bigalloc code pathAditya Kali
Currently, there exists a race between delayed allocated writes and the writeback when bigalloc feature is in use. The race was because we wanted to determine what blocks in a cluster are under delayed allocation and we were using buffer_delayed(bh) check for it. But, the writeback codepath clears this bit without any synchronization which resulted in a race and an ext4 warning similar to: EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0 reserved data blocks The race existed in two places. (1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from writeback code path. (2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where buffer_delayed(bh) is set. To fix (1), this patch introduces a new buffer_head state bit - BH_Da_Mapped. This bit is set under the protection of EXT4_I(inode)->i_data_sem when we have actually mapped the delayed allocated blocks during the writeout time. We can now reliably check for this bit inside ext4_find_delalloc_range() to determine whether the reservation for the blocks have already been claimed or not. To fix (2), it was necessary to set buffer_delay(bh) under the protection of i_data_sem. So, I extracted the very beginning of ext4_map_blocks into a new function - ext4_da_map_blocks() - and performed the required setting of bh_delay bit and the quota reservation under the protection of i_data_sem. These two fixes makes the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent, thus removing the race. Tested: I was able to reproduce the problem by running 'dd' and 'fsync' in parallel. Also, xfstests sometimes used to reproduce this race. After the fix both my test and xfstests were successful and no race (warning message) was observed. Google-Bug-Id: 4997027 Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09ext4: add some tracepoints in ext4/extents.cAditya Kali
This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in ext4/inode.c. Tested: Built and ran the kernel and verified that these tracepoints work. Also ran xfstests. Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09ext4: rename ext4_has_free_blocks() to ext4_has_free_clusters()Theodore Ts'o
Rename the function so it is more clear what is going on. Also rename the various variables so it's clearer what's happening. Also fix a missing blocks to cluster conversion when reading the number of reserved blocks for root. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09ext4: rename ext4_claim_free_blocks() to ext4_claim_free_clusters()Theodore Ts'o
This function really claims a number of free clusters, not blocks, so rename it so it's clearer what's going on. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09ext4: rename ext4_count_free_blocks() to ext4_count_free_clusters()Theodore Ts'o
This function really counts the free clusters reported in the block group descriptors, so rename it to reduce confusion. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>