<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/ext4, branch v3.1.4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>ext4: fix race in xattr block allocation path</title>
<updated>2011-11-11T17:43:54+00:00</updated>
<author>
<name>Eric Sandeen</name>
<email>sandeen@redhat.com</email>
</author>
<published>2011-10-29T14:15:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=87556ecf5d093a7d083ce909476783bf98dc4725'/>
<id>87556ecf5d093a7d083ce909476783bf98dc4725</id>
<content type='text'>
commit 6d6a435190bdf2e04c9465cde5bdc3ac68cf11a4 upstream.

Ceph users reported that when using Ceph on ext4, the filesystem
would often become corrupted, containing inodes with incorrect
i_blocks counters.

I managed to reproduce this with a very hacked-up "streamtest"
binary from the Ceph tree.

Ceph is doing a lot of xattr writes, to out-of-inode blocks.
There is also another thread which does sync_file_range and close,
of the same files.  The problem appears to happen due to this race:

sync/flush thread               xattr-set thread
-----------------               ----------------

do_writepages                   ext4_xattr_set
ext4_da_writepages              ext4_xattr_set_handle
mpage_da_map_blocks             ext4_xattr_block_set
        set DELALLOC_RESERVE
                                ext4_new_meta_blocks
                                        ext4_mb_new_blocks
                                                if (!i_delalloc_reserved_flag)
                                                        vfs_dq_alloc_block
ext4_get_blocks
	down_write(i_data_sem)
        set i_delalloc_reserved_flag
	...
	up_write(i_data_sem)
                                        if (i_delalloc_reserved_flag)
                                                vfs_dq_alloc_block_nofail


In other words, the sync/flush thread pops in and sets
i_delalloc_reserved_flag on the inode, which makes the xattr thread
think that it's in a delalloc path in ext4_new_meta_blocks(),
and add the block for a second time, after already having added
it once in the !i_delalloc_reserved_flag case in ext4_mb_new_blocks

The real problem is that we shouldn't be using the DELALLOC_RESERVED
state flag, and instead we should be passing
EXT4_GET_BLOCKS_DELALLOC_RESERVE down to ext4_map_blocks() instead of
using an inode state flag.  We'll fix this for now with using
i_data_sem to prevent this race, but this is really not the right way
to fix things.

Signed-off-by: Eric Sandeen &lt;sandeen@redhat.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6d6a435190bdf2e04c9465cde5bdc3ac68cf11a4 upstream.

Ceph users reported that when using Ceph on ext4, the filesystem
would often become corrupted, containing inodes with incorrect
i_blocks counters.

I managed to reproduce this with a very hacked-up "streamtest"
binary from the Ceph tree.

Ceph is doing a lot of xattr writes, to out-of-inode blocks.
There is also another thread which does sync_file_range and close,
of the same files.  The problem appears to happen due to this race:

sync/flush thread               xattr-set thread
-----------------               ----------------

do_writepages                   ext4_xattr_set
ext4_da_writepages              ext4_xattr_set_handle
mpage_da_map_blocks             ext4_xattr_block_set
        set DELALLOC_RESERVE
                                ext4_new_meta_blocks
                                        ext4_mb_new_blocks
                                                if (!i_delalloc_reserved_flag)
                                                        vfs_dq_alloc_block
ext4_get_blocks
	down_write(i_data_sem)
        set i_delalloc_reserved_flag
	...
	up_write(i_data_sem)
                                        if (i_delalloc_reserved_flag)
                                                vfs_dq_alloc_block_nofail


In other words, the sync/flush thread pops in and sets
i_delalloc_reserved_flag on the inode, which makes the xattr thread
think that it's in a delalloc path in ext4_new_meta_blocks(),
and add the block for a second time, after already having added
it once in the !i_delalloc_reserved_flag case in ext4_mb_new_blocks

The real problem is that we shouldn't be using the DELALLOC_RESERVED
state flag, and instead we should be passing
EXT4_GET_BLOCKS_DELALLOC_RESERVE down to ext4_map_blocks() instead of
using an inode state flag.  We'll fix this for now with using
i_data_sem to prevent this race, but this is really not the right way
to fix things.

Signed-off-by: Eric Sandeen &lt;sandeen@redhat.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: let ext4_page_mkwrite stop started handle in failure</title>
<updated>2011-11-11T17:43:53+00:00</updated>
<author>
<name>Yongqiang Yang</name>
<email>xiaoqiangnk@gmail.com</email>
</author>
<published>2011-10-26T09:00:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6ff60a6de3b107b9c78eb10b2c9a9987bab5138c'/>
<id>6ff60a6de3b107b9c78eb10b2c9a9987bab5138c</id>
<content type='text'>
commit fcbb5515825f1bb20b7a6f75ec48bee61416f879 upstream.

The started journal handle should be stopped in failure case.

Signed-off-by: Yongqiang Yang &lt;xiaoqiangnk@gmail.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit fcbb5515825f1bb20b7a6f75ec48bee61416f879 upstream.

The started journal handle should be stopped in failure case.

Signed-off-by: Yongqiang Yang &lt;xiaoqiangnk@gmail.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: call ext4_handle_dirty_metadata with correct inode in ext4_dx_add_entry</title>
<updated>2011-11-11T17:43:52+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2011-08-31T16:02:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=caf61680a4f161a4b6d1c6e734ec0fb8233d45d7'/>
<id>caf61680a4f161a4b6d1c6e734ec0fb8233d45d7</id>
<content type='text'>
commit 5930ea643805feb50a2f8383ae12eb6f10935e49 upstream.

ext4_dx_add_entry manipulates bh2 and frames[0].bh, which are two buffer_heads
that point to directory blocks assigned to the directory inode.  However, the
function calls ext4_handle_dirty_metadata with the inode of the file that's
being added to the directory, not the directory inode itself.  Therefore,
correct the code to dirty the directory buffers with the directory inode, not
the file inode.

Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5930ea643805feb50a2f8383ae12eb6f10935e49 upstream.

ext4_dx_add_entry manipulates bh2 and frames[0].bh, which are two buffer_heads
that point to directory blocks assigned to the directory inode.  However, the
function calls ext4_handle_dirty_metadata with the inode of the file that's
being added to the directory, not the directory inode itself.  Therefore,
correct the code to dirty the directory buffers with the directory inode, not
the file inode.

Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: ext4_mkdir should dirty dir_block with newly created directory inode</title>
<updated>2011-11-11T17:43:52+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@us.ibm.com</email>
</author>
<published>2011-08-31T16:00:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=523d90b566448dd9ca31b0a93b3cf9f601118d5c'/>
<id>523d90b566448dd9ca31b0a93b3cf9f601118d5c</id>
<content type='text'>
commit f9287c1f2d329f4d78a3bbc9cf0db0ebae6f146a upstream.

ext4_mkdir calls ext4_handle_dirty_metadata with dir_block and the inode "dir".
Unfortunately, dir_block belongs to the newly created directory (which is
"inode"), not the parent directory (which is "dir").  Fix the incorrect
association.

Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f9287c1f2d329f4d78a3bbc9cf0db0ebae6f146a upstream.

ext4_mkdir calls ext4_handle_dirty_metadata with dir_block and the inode "dir".
Unfortunately, dir_block belongs to the newly created directory (which is
"inode"), not the parent directory (which is "dir").  Fix the incorrect
association.

Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: ext4_rename should dirty dir_bh with the correct directory</title>
<updated>2011-11-11T17:43:50+00:00</updated>
<author>
<name>Darrick J. Wong</name>
<email>djwong@us.ibm.com</email>
</author>
<published>2011-08-31T15:58:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=76e72d11365ce08b2800ac70b7abc0d46ee9727d'/>
<id>76e72d11365ce08b2800ac70b7abc0d46ee9727d</id>
<content type='text'>
commit bcaa992975041e40449be8c010c26192b8c8b409 upstream.

When ext4_rename performs a directory rename (move), dir_bh is a
buffer that is modified to update the '..' link in the directory being
moved (old_inode).  However, ext4_handle_dirty_metadata is called with
the old parent directory inode (old_dir) and dir_bh, which is
incorrect because dir_bh does not belong to the parent inode.  Fix
this error.

Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit bcaa992975041e40449be8c010c26192b8c8b409 upstream.

When ext4_rename performs a directory rename (move), dir_bh is a
buffer that is modified to update the '..' link in the directory being
moved (old_inode).  However, ext4_handle_dirty_metadata is called with
the old parent directory inode (old_dir) and dir_bh, which is
incorrect because dir_bh does not belong to the parent inode.  Fix
this error.

Signed-off-by: Darrick J. Wong &lt;djwong@us.ibm.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext2,ext3,ext4: don't inherit APPEND_FL or IMMUTABLE_FL for new inodes</title>
<updated>2011-11-11T17:43:50+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2011-08-31T15:54:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=305b7dd039f8c4ab5a38057f8c6fbcc50e4f4e43'/>
<id>305b7dd039f8c4ab5a38057f8c6fbcc50e4f4e43</id>
<content type='text'>
commit 1cd9f0976aa4606db8d6e3dc3edd0aca8019372a upstream.

This doesn't make much sense, and it exposes a bug in the kernel where
attempts to create a new file in an append-only directory using
O_CREAT will fail (but still leave a zero-length file).  This was
discovered when xfstests #79 was generalized so it could run on all
file systems.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 1cd9f0976aa4606db8d6e3dc3edd0aca8019372a upstream.

This doesn't make much sense, and it exposes a bug in the kernel where
attempts to create a new file in an append-only directory using
O_CREAT will fail (but still leave a zero-length file).  This was
discovered when xfstests #79 was generalized so it could run on all
file systems.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.dk/linux-block</title>
<updated>2011-09-21T20:20:21+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-09-21T20:20:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fed678dc8a8b839c8189b5d889a94e865cd327dd'/>
<id>fed678dc8a8b839c8189b5d889a94e865cd327dd</id>
<content type='text'>
* 'for-linus' of git://git.kernel.dk/linux-block:
  floppy: use del_timer_sync() in init cleanup
  blk-cgroup: be able to remove the record of unplugged device
  block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request
  mm: Add comment explaining task state setting in bdi_forker_thread()
  mm: Cleanup clearing of BDI_pending bit in bdi_forker_thread()
  block: simplify force plug flush code a little bit
  block: change force plug flush call order
  block: Fix queue_flag update when rq_affinity goes from 2 to 1
  block: separate priority boosting from REQ_META
  block: remove READ_META and WRITE_META
  xen-blkback: fixed indentation and comments
  xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
* 'for-linus' of git://git.kernel.dk/linux-block:
  floppy: use del_timer_sync() in init cleanup
  blk-cgroup: be able to remove the record of unplugged device
  block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request
  mm: Add comment explaining task state setting in bdi_forker_thread()
  mm: Cleanup clearing of BDI_pending bit in bdi_forker_thread()
  block: simplify force plug flush code a little bit
  block: change force plug flush call order
  block: Fix queue_flag update when rq_affinity goes from 2 to 1
  block: separate priority boosting from REQ_META
  block: remove READ_META and WRITE_META
  xen-blkback: fixed indentation and comments
  xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining</title>
<updated>2011-08-31T15:50:51+00:00</updated>
<author>
<name>Jiaying Zhang</name>
<email>jiayingz@google.com</email>
</author>
<published>2011-08-31T15:50:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8c0bec2151a47906bf779c6715a10ce04453ab77'/>
<id>8c0bec2151a47906bf779c6715a10ce04453ab77</id>
<content type='text'>
The i_mutex lock and flush_completed_IO() added by commit 2581fdc810
in ext4_evict_inode() causes lockdep complaining about potential
deadlock in several places.  In most/all of these LOCKDEP complaints
it looks like it's a false positive, since many of the potential
circular locking cases can't take place by the time the
ext4_evict_inode() is called; but since at the very least it may mask
real problems, we need to address this.

This change removes the flush_completed_IO() and i_mutex lock in
ext4_evict_inode().  Instead, we take a different approach to resolve
the software lockup that commit 2581fdc810 intends to fix.  Rather
than having ext4-dio-unwritten thread wait for grabing the i_mutex
lock of an inode, we use mutex_trylock() instead, and simply requeue
the work item if we fail to grab the inode's i_mutex lock.

This should speed up work queue processing in general and also
prevents the following deadlock scenario: During page fault,
shrink_icache_memory is called that in turn evicts another inode B.
Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero.  However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue.  As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock.  Since the i_mutex lock of inode A is
still hold before the page fault happened, we enter a deadlock.

Signed-off-by: Jiaying Zhang &lt;jiayingz@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The i_mutex lock and flush_completed_IO() added by commit 2581fdc810
in ext4_evict_inode() causes lockdep complaining about potential
deadlock in several places.  In most/all of these LOCKDEP complaints
it looks like it's a false positive, since many of the potential
circular locking cases can't take place by the time the
ext4_evict_inode() is called; but since at the very least it may mask
real problems, we need to address this.

This change removes the flush_completed_IO() and i_mutex lock in
ext4_evict_inode().  Instead, we take a different approach to resolve
the software lockup that commit 2581fdc810 intends to fix.  Rather
than having ext4-dio-unwritten thread wait for grabing the i_mutex
lock of an inode, we use mutex_trylock() instead, and simply requeue
the work item if we fail to grab the inode's i_mutex lock.

This should speed up work queue processing in general and also
prevents the following deadlock scenario: During page fault,
shrink_icache_memory is called that in turn evicts another inode B.
Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero.  However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue.  As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock.  Since the i_mutex lock of inode A is
still hold before the page fault happened, we enter a deadlock.

Signed-off-by: Jiaying Zhang &lt;jiayingz@google.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: separate priority boosting from REQ_META</title>
<updated>2011-08-23T12:50:29+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2011-08-23T12:50:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=65299a3b788bd274bed92f9fa3232082c9f3ea70'/>
<id>65299a3b788bd274bed92f9fa3232082c9f3ea70</id>
<content type='text'>
Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule,
and lave REQ_META purely for marking requests as metadata in blktrace.

All existing callers of REQ_META except for XFS are updated to also
set REQ_PRIO for now.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Namhyung Kim &lt;namhyung@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule,
and lave REQ_META purely for marking requests as metadata in blktrace.

All existing callers of REQ_META except for XFS are updated to also
set REQ_PRIO for now.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Namhyung Kim &lt;namhyung@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: remove READ_META and WRITE_META</title>
<updated>2011-08-23T12:49:55+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2011-08-23T12:49:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5dc06c5a70b79a323152bec7e55783e705767e63'/>
<id>5dc06c5a70b79a323152bec7e55783e705767e63</id>
<content type='text'>
Replace all occurnanced of the undocumented READ_META with READ | REQ_META
and remove the unused WRITE_META define.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Replace all occurnanced of the undocumented READ_META with READ | REQ_META
and remove the unused WRITE_META define.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
