<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/ext4, branch v3.6.5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>ext4: Avoid underflow in ext4_trim_fs()</title>
<updated>2012-10-28T17:56:09+00:00</updated>
<author>
<name>Lukas Czerner</name>
<email>lczerner@redhat.com</email>
</author>
<published>2012-10-22T22:01:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b1f9e723563c5dae07c0c8d49b21ea9602c3e644'/>
<id>b1f9e723563c5dae07c0c8d49b21ea9602c3e644</id>
<content type='text'>
commit 5de35e8d5c02d271c20e18337e01bc20e6ef472e upstream.

Currently if the len argument in ext4_trim_fs() is smaller than one block,
the 'end' variable underflows. Avoid that by returning EINVAL if len is
smaller than one file system block.
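
The underflow can be modelled outside the kernel. The following is a
minimal sketch, not the kernel code: it assumes a 4096-byte block, assumes
that the end block is derived as start + len - 1 in block units, and uses
hypothetical function names. Unsigned 64-bit wrap-around is imitated with
a modulus.

```python
# Illustrative model only; the names below are assumptions, not kernel code.
U64 = 2 ** 64          # wrap-around modulus for a 64-bit unsigned value
BLOCK_BITS = 12        # assumed 4096-byte file system block
EINVAL = 22


def end_block_unchecked(start, length):
    """Compute the last block of the range as unsigned C arithmetic would."""
    start_blk = start // 2 ** BLOCK_BITS
    len_blk = length // 2 ** BLOCK_BITS
    return (start_blk + len_blk - 1) % U64


def end_block_checked(start, length):
    """Reject ranges shorter than one block before computing the end."""
    if length // 2 ** BLOCK_BITS == 0:
        return -EINVAL
    return end_block_unchecked(start, length)
```

With start 0 and a length of 100 bytes, end_block_unchecked() wraps around
to 2**64 - 1, while end_block_checked() returns -EINVAL.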

Also remove useless unlikely().

Signed-off-by: Lukas Czerner &lt;lczerner@redhat.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5de35e8d5c02d271c20e18337e01bc20e6ef472e upstream.

Currently if the len argument in ext4_trim_fs() is smaller than one block,
the 'end' variable underflows. Avoid that by returning EINVAL if len is
smaller than one file system block.

Also remove useless unlikely().

Signed-off-by: Lukas Czerner &lt;lczerner@redhat.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: Checksum the block bitmap properly with bigalloc enabled</title>
<updated>2012-10-28T17:56:09+00:00</updated>
<author>
<name>Tao Ma</name>
<email>boyu.mt@taobao.com</email>
</author>
<published>2012-10-22T04:34:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=462f4e60ea80fc6d82b872243c4d11a9f8286d0d'/>
<id>462f4e60ea80fc6d82b872243c4d11a9f8286d0d</id>
<content type='text'>
commit 79f1ba49569e5aec919b653c55b03274c2331701 upstream.

In mke2fs, we checksum only the whole bitmap block, which is correct.
In the kernel, however, we use EXT4_BLOCKS_PER_GROUP to indicate the
size of the checksummed bitmap, which is wrong when bigalloc is enabled.
The right size is EXT4_CLUSTERS_PER_GROUP, and this patch fixes
it.

Also, as every caller of ext4_block_bitmap_csum_set and
ext4_block_bitmap_csum_verify passes in EXT4_BLOCKS_PER_GROUP(sb)/8,
we had better remove this parameter and set it in the function itself.
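
The size mismatch can be illustrated with some arithmetic. This is a
sketch with assumed geometry (4096-byte blocks, 32768 clusters per group,
16 blocks per cluster); none of these numbers come from the patch, and the
helper name is hypothetical.

```python
# Illustrative geometry only; these values are assumptions, not from the patch.
CLUSTERS_PER_GROUP = 32768   # one bit each in a 4096-byte bitmap block
CLUSTER_RATIO = 16           # blocks per cluster with bigalloc (assumed)
BLOCKS_PER_GROUP = CLUSTERS_PER_GROUP * CLUSTER_RATIO


def bitmap_bytes(units_per_group):
    """The bitmap holds one bit per allocation unit, so bytes = units / 8."""
    return units_per_group // 8


old_size = bitmap_bytes(BLOCKS_PER_GROUP)     # size the callers used to pass in
new_size = bitmap_bytes(CLUSTERS_PER_GROUP)   # size based on clusters
```

Under these assumptions old_size is 65536 bytes, far more than the
4096-byte bitmap block, while new_size is exactly 4096 bytes.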

Signed-off-by: Tao Ma &lt;boyu.mt@taobao.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Reviewed-by: Lukas Czerner &lt;lczerner@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 79f1ba49569e5aec919b653c55b03274c2331701 upstream.

In mke2fs, we checksum only the whole bitmap block, which is correct.
In the kernel, however, we use EXT4_BLOCKS_PER_GROUP to indicate the
size of the checksummed bitmap, which is wrong when bigalloc is enabled.
The right size is EXT4_CLUSTERS_PER_GROUP, and this patch fixes
it.

Also, as every caller of ext4_block_bitmap_csum_set and
ext4_block_bitmap_csum_verify passes in EXT4_BLOCKS_PER_GROUP(sb)/8,
we had better remove this parameter and set it in the function itself.

Signed-off-by: Tao Ma &lt;boyu.mt@taobao.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Reviewed-by: Lukas Czerner &lt;lczerner@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: race-condition protection for ext4_convert_unwritten_extents_endio</title>
<updated>2012-10-28T17:56:09+00:00</updated>
<author>
<name>Dmitry Monakhov</name>
<email>dmonakhov@openvz.org</email>
</author>
<published>2012-10-10T05:04:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e28b27be06c5e6a5236c0c0b370ca0768ab086bb'/>
<id>e28b27be06c5e6a5236c0c0b370ca0768ab086bb</id>
<content type='text'>
commit dee1f973ca341c266229faa5a1a5bb268bed3531 upstream.

We assumed that at the time we call ext4_convert_unwritten_extents_endio()
the extent in question is fully inside [map.m_lblk, map-&gt;m_len] because
it was already split during submission.  But this may not be true due to
a race between writeback and fallocate.

If the extent in question is larger than requested, we will split it again.
Special precautions should be taken if zeroout is required, because
[map.m_lblk, map-&gt;m_len] already contains valid data.

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit dee1f973ca341c266229faa5a1a5bb268bed3531 upstream.

We assumed that at the time we call ext4_convert_unwritten_extents_endio()
the extent in question is fully inside [map.m_lblk, map-&gt;m_len] because
it was already split during submission.  But this may not be true due to
a race between writeback and fallocate.

If the extent in question is larger than requested, we will split it again.
Special precautions should be taken if zeroout is required, because
[map.m_lblk, map-&gt;m_len] already contains valid data.

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: fix mtime update in nodelalloc mode</title>
<updated>2012-10-12T20:50:25+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2012-10-01T03:04:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=92b77229ee73413c1ebfe793ed0085eb1ff794f1'/>
<id>92b77229ee73413c1ebfe793ed0085eb1ff794f1</id>
<content type='text'>
commit 041bbb6d369811e948ae01f3d00414264076be35 upstream.

Commits 5e8830dc85d0 and 41c4d25f78c0 introduced a regression into
v3.6-rc1 for ext4 in nodelalloc mode, such that mtime updates would not
take place for files modified via mmap if the page was already in the
page cache.  This would also affect ext3 file systems mounted using
the ext4 file system driver.

The problem was that ext4_page_mkwrite() had a shortcut which would
avoid calling __block_page_mkwrite() under some circumstances, and the
above two commits transferred the responsibility of calling
file_update_time() to __block_page_mkwrite() --- which wouldn't get
called in some circumstances.

Since __block_page_mkwrite() only has three callers,
block_page_mkwrite(), ext4_page_mkwrite(), and nilfs_page_mkwrite(), the
best way to solve this is to move the responsibility for calling
file_update_time() to its callers.

This problem was found via xfstests #215 with a file system mounted
with -o nodelalloc.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: KONISHI Ryusuke &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 041bbb6d369811e948ae01f3d00414264076be35 upstream.

Commits 5e8830dc85d0 and 41c4d25f78c0 introduced a regression into
v3.6-rc1 for ext4 in nodelalloc mode, such that mtime updates would not
take place for files modified via mmap if the page was already in the
page cache.  This would also affect ext3 file systems mounted using
the ext4 file system driver.

The problem was that ext4_page_mkwrite() had a shortcut which would
avoid calling __block_page_mkwrite() under some circumstances, and the
above two commits transferred the responsibility of calling
file_update_time() to __block_page_mkwrite() --- which wouldn't get
called in some circumstances.

Since __block_page_mkwrite() only has three callers,
block_page_mkwrite(), ext4_page_mkwrite(), and nilfs_page_mkwrite(), the
best way to solve this is to move the responsibility for calling
file_update_time() to its callers.

This problem was found via xfstests #215 with a file system mounted
with -o nodelalloc.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: KONISHI Ryusuke &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: fix fdatasync() for files with only i_size changes</title>
<updated>2012-10-12T20:50:24+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2012-09-27T01:52:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=34414b2bf58b95110bf8ccef77d66c05de9e923c'/>
<id>34414b2bf58b95110bf8ccef77d66c05de9e923c</id>
<content type='text'>
commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream.

The code that tracks when a transaction needs to be committed on
fdatasync(2) forgets to handle the situation where only the inode's i_size
is changed. Thus in such situations fdatasync(2) doesn't force the
transaction with the new i_size to disk, and that can result in a wrong
i_size after a crash.

Fix the issue by updating the inode's i_datasync_tid whenever its size is
updated.

Reported-by: Kristian Nielsen &lt;knielsen@knielsen-hq.org&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream.

The code that tracks when a transaction needs to be committed on
fdatasync(2) forgets to handle the situation where only the inode's i_size
is changed. Thus in such situations fdatasync(2) doesn't force the
transaction with the new i_size to disk, and that can result in a wrong
i_size after a crash.

Fix the issue by updating the inode's i_datasync_tid whenever its size is
updated.

Reported-by: Kristian Nielsen &lt;knielsen@knielsen-hq.org&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: always set i_op in ext4_mknod()</title>
<updated>2012-10-12T20:50:24+00:00</updated>
<author>
<name>Bernd Schubert</name>
<email>bernd.schubert@itwm.fraunhofer.de</email>
</author>
<published>2012-09-27T01:24:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=12ebdf00d08f21da12c85e1800d5d91beaa4bfbe'/>
<id>12ebdf00d08f21da12c85e1800d5d91beaa4bfbe</id>
<content type='text'>
commit 6a08f447facb4f9e29fcc30fb68060bb5a0d21c2 upstream.

ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR
to mask those methods. And ext4_iget also always sets it, so there is
an inconsistency.

Signed-off-by: Bernd Schubert &lt;bernd.schubert@itwm.fraunhofer.de&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6a08f447facb4f9e29fcc30fb68060bb5a0d21c2 upstream.

ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR
to mask those methods. And ext4_iget also always sets it, so there is
an inconsistency.

Signed-off-by: Bernd Schubert &lt;bernd.schubert@itwm.fraunhofer.de&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: online defrag is not supported for journaled files</title>
<updated>2012-10-12T20:50:24+00:00</updated>
<author>
<name>Dmitry Monakhov</name>
<email>dmonakhov@openvz.org</email>
</author>
<published>2012-09-26T16:32:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=22a5672604ff3eef101c83436ee15d8e2e148187'/>
<id>22a5672604ff3eef101c83436ee15d8e2e148187</id>
<content type='text'>
commit f066055a3449f0e5b0ae4f3ceab4445bead47638 upstream.

Proper block swap for inodes with full journaling enabled is
a truly non-obvious task. In order to be on the safe side, let's
explicitly disable it for now.

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f066055a3449f0e5b0ae4f3ceab4445bead47638 upstream.

Proper block swap for inodes with full journaling enabled is
a truly non-obvious task. In order to be on the safe side, let's
explicitly disable it for now.

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: move_extent code cleanup</title>
<updated>2012-10-12T20:50:24+00:00</updated>
<author>
<name>Dmitry Monakhov</name>
<email>dmonakhov@openvz.org</email>
</author>
<published>2012-09-26T16:32:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ba57d9ef068e6f55226bfedf0e7cd6adab37a316'/>
<id>ba57d9ef068e6f55226bfedf0e7cd6adab37a316</id>
<content type='text'>
commit 03bd8b9b896c8e940f282f540e6b4de90d666b7c upstream.

- Remove useless checks, because it is too late to check that inode != NULL
  once it has already been referenced several times.
- The double lock routines look very ugly, and the locking order relies on
  the order of i_ino, but other kernel code relies on the order of pointers.
  Let's make them simple and clean.
- Check that the inodes belong to the same SB as soon as possible.
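
The idea of taking two locks in one global order can be sketched in user
space. This is an illustration of the general technique only, not the
ext4 code: the Inode class and both helpers are hypothetical, and Python
object identity stands in for pointer order.

```python
import threading


class Inode:
    """Stand-in for an inode that carries its own lock (hypothetical type)."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()


def double_lock(first, second):
    """Take both locks in one global order, here by object identity."""
    # The dict drops a duplicate when both arguments are the same object.
    unique = {id(first): first, id(second): second}
    ordered = sorted(unique.values(), key=id)
    for inode in ordered:
        inode.lock.acquire()
    return ordered


def double_unlock(ordered):
    """Release the locks in the reverse of the acquisition order."""
    for inode in reversed(ordered):
        inode.lock.release()
```

Because the order depends only on the objects and not on the argument
order, double_lock(a, b) and double_lock(b, a) acquire the locks in the
same sequence, so two callers cannot each hold the lock the other needs.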

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 03bd8b9b896c8e940f282f540e6b4de90d666b7c upstream.

- Remove useless checks, because it is too late to check that inode != NULL
  once it has already been referenced several times.
- The double lock routines look very ugly, and the locking order relies on
  the order of i_ino, but other kernel code relies on the order of pointers.
  Let's make them simple and clean.
- Check that the inodes belong to the same SB as soon as possible.

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: fix crash when accessing /proc/mounts concurrently</title>
<updated>2012-10-12T20:50:24+00:00</updated>
<author>
<name>Herton Ronaldo Krzesinski</name>
<email>herton.krzesinski@canonical.com</email>
</author>
<published>2012-09-24T02:49:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2fdb1128f404ab026d2794de3bd604500edd444b'/>
<id>2fdb1128f404ab026d2794de3bd604500edd444b</id>
<content type='text'>
commit 50df9fd55e4271e89a7adf3b1172083dd0ca199d upstream.

The crash was caused by a variable being erroneously declared static in
token2str().

In addition to /proc/mounts, the problem can also be easily replicated
by accessing /proc/fs/ext4/&lt;partition&gt;/options in parallel:

$ cat /proc/fs/ext4/&lt;partition&gt;/options &gt; options.txt

... and then running the following command in two different terminals:

$ while diff /proc/fs/ext4/&lt;partition&gt;/options options.txt; do true; done

This is also the cause of a crash while running xfstests
#234, as reported in the following bug reports:

	https://bugs.launchpad.net/bugs/1053019
	https://bugzilla.kernel.org/show_bug.cgi?id=47731

Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Brad Figg &lt;brad.figg@canonical.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 50df9fd55e4271e89a7adf3b1172083dd0ca199d upstream.

The crash was caused by a variable being erroneously declared static in
token2str().

In addition to /proc/mounts, the problem can also be easily replicated
by accessing /proc/fs/ext4/&lt;partition&gt;/options in parallel:

$ cat /proc/fs/ext4/&lt;partition&gt;/options &gt; options.txt

... and then running the following command in two different terminals:

$ while diff /proc/fs/ext4/&lt;partition&gt;/options options.txt; do true; done

This is also the cause of a crash while running xfstests
#234, as reported in the following bug reports:

	https://bugs.launchpad.net/bugs/1053019
	https://bugzilla.kernel.org/show_bug.cgi?id=47731

Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Brad Figg &lt;brad.figg@canonical.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: fix potential deadlock in ext4_nonda_switch()</title>
<updated>2012-10-12T20:50:24+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2012-09-20T02:42:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1638f1fb5b54c51f09769cb8938798d4122225d4'/>
<id>1638f1fb5b54c51f09769cb8938798d4122225d4</id>
<content type='text'>
commit 00d4e7362ed01987183e9528295de3213031309c upstream.

In ext4_nonda_switch(), if the file system is getting full we used to
call writeback_inodes_sb_if_idle().  The problem is that we can be
holding i_mutex already, and this causes a potential deadlock when
writeback_inodes_sb_if_idle() tries to take s_umount.  (See
lockdep output below).

As it turns out we don't need to hold s_umount; the fact that we
are in the middle of the write(2) system call will keep the superblock
pinned.  Unfortunately writeback_inodes_sb() checks to make sure
s_umount is taken, and the VFS uses a different mechanism for making
sure the file system doesn't get unmounted out from under us.  The
simplest way of dealing with this is to just simply grab s_umount
using a trylock, and skip kicking the writeback flusher thread in the
very unlikely case that we can't take a read lock on s_umount without
blocking.
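
The trylock pattern described above can be sketched in user space. This
is a simplified illustration, not the kernel change: a plain mutex stands
in for the read side of s_umount, and the function and list names are
hypothetical.

```python
import threading

s_umount = threading.Lock()   # plain mutex standing in for the real lock
kicked = []                   # records each time writeback would be started


def maybe_kick_writeback():
    """Start writeback only if the lock is free; never block waiting for it."""
    if not s_umount.acquire(blocking=False):
        return False          # lock is contended, so skip the kick
    try:
        kicked.append("writeback")
        return True
    finally:
        s_umount.release()
```

Since the caller never waits for the lock, it cannot become part of a
lock cycle while it already holds another lock; the cost is that the
kick is occasionally skipped.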

Also, we now check the criteria for kicking the writeback thread
before we decide whether to fall back to non-delayed writeback, so
if there are any outstanding delayed allocation writes, we try to get
them resolved as soon as possible.

   [ INFO: possible circular locking dependency detected ]
   3.6.0-rc1-00042-gce894ca #367 Not tainted
   -------------------------------------------------------
   dd/8298 is trying to acquire lock:
    (&amp;type-&gt;s_umount_key#18){++++..}, at: [&lt;c02277d4&gt;] writeback_inodes_sb_if_idle+0x28/0x46

   but task is already holding lock:
    (&amp;sb-&gt;s_type-&gt;i_mutex_key#8){+.+...}, at: [&lt;c01ddcce&gt;] generic_file_aio_write+0x5f/0xd3

   which lock already depends on the new lock.

   2 locks held by dd/8298:
    #0:  (sb_writers#2){.+.+.+}, at: [&lt;c01ddcc5&gt;] generic_file_aio_write+0x56/0xd3
    #1:  (&amp;sb-&gt;s_type-&gt;i_mutex_key#8){+.+...}, at: [&lt;c01ddcce&gt;] generic_file_aio_write+0x5f/0xd3

   stack backtrace:
   Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367
   Call Trace:
    [&lt;c015b79c&gt;] ? console_unlock+0x345/0x372
    [&lt;c06d62a1&gt;] print_circular_bug+0x190/0x19d
    [&lt;c019906c&gt;] __lock_acquire+0x86d/0xb6c
    [&lt;c01999db&gt;] ? mark_held_locks+0x5c/0x7b
    [&lt;c0199724&gt;] lock_acquire+0x66/0xb9
    [&lt;c02277d4&gt;] ? writeback_inodes_sb_if_idle+0x28/0x46
    [&lt;c06db935&gt;] down_read+0x28/0x58
    [&lt;c02277d4&gt;] ? writeback_inodes_sb_if_idle+0x28/0x46
    [&lt;c02277d4&gt;] writeback_inodes_sb_if_idle+0x28/0x46
    [&lt;c026f3b2&gt;] ext4_nonda_switch+0xe1/0xf4
    [&lt;c0271ece&gt;] ext4_da_write_begin+0x27/0x193
    [&lt;c01dcdb0&gt;] generic_file_buffered_write+0xc8/0x1bb
    [&lt;c01ddc47&gt;] __generic_file_aio_write+0x1dd/0x205
    [&lt;c01ddce7&gt;] generic_file_aio_write+0x78/0xd3
    [&lt;c026d336&gt;] ext4_file_write+0x480/0x4a6
    [&lt;c0198c1d&gt;] ? __lock_acquire+0x41e/0xb6c
    [&lt;c0180944&gt;] ? sched_clock_cpu+0x11a/0x13e
    [&lt;c01967e9&gt;] ? trace_hardirqs_off+0xb/0xd
    [&lt;c018099f&gt;] ? local_clock+0x37/0x4e
    [&lt;c0209f2c&gt;] do_sync_write+0x67/0x9d
    [&lt;c0209ec5&gt;] ? wait_on_retry_sync_kiocb+0x44/0x44
    [&lt;c020a7b9&gt;] vfs_write+0x7b/0xe6
    [&lt;c020a9a6&gt;] sys_write+0x3b/0x64
    [&lt;c06dd4bd&gt;] syscall_call+0x7/0xb

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 00d4e7362ed01987183e9528295de3213031309c upstream.

In ext4_nonda_switch(), if the file system is getting full we used to
call writeback_inodes_sb_if_idle().  The problem is that we can be
holding i_mutex already, and this causes a potential deadlock when
writeback_inodes_sb_if_idle() tries to take s_umount.  (See
lockdep output below).

As it turns out we don't need to hold s_umount; the fact that we
are in the middle of the write(2) system call will keep the superblock
pinned.  Unfortunately writeback_inodes_sb() checks to make sure
s_umount is taken, and the VFS uses a different mechanism for making
sure the file system doesn't get unmounted out from under us.  The
simplest way of dealing with this is to just simply grab s_umount
using a trylock, and skip kicking the writeback flusher thread in the
very unlikely case that we can't take a read lock on s_umount without
blocking.

Also, we now check the criteria for kicking the writeback thread
before we decide whether to fall back to non-delayed writeback, so
if there are any outstanding delayed allocation writes, we try to get
them resolved as soon as possible.

   [ INFO: possible circular locking dependency detected ]
   3.6.0-rc1-00042-gce894ca #367 Not tainted
   -------------------------------------------------------
   dd/8298 is trying to acquire lock:
    (&amp;type-&gt;s_umount_key#18){++++..}, at: [&lt;c02277d4&gt;] writeback_inodes_sb_if_idle+0x28/0x46

   but task is already holding lock:
    (&amp;sb-&gt;s_type-&gt;i_mutex_key#8){+.+...}, at: [&lt;c01ddcce&gt;] generic_file_aio_write+0x5f/0xd3

   which lock already depends on the new lock.

   2 locks held by dd/8298:
    #0:  (sb_writers#2){.+.+.+}, at: [&lt;c01ddcc5&gt;] generic_file_aio_write+0x56/0xd3
    #1:  (&amp;sb-&gt;s_type-&gt;i_mutex_key#8){+.+...}, at: [&lt;c01ddcce&gt;] generic_file_aio_write+0x5f/0xd3

   stack backtrace:
   Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367
   Call Trace:
    [&lt;c015b79c&gt;] ? console_unlock+0x345/0x372
    [&lt;c06d62a1&gt;] print_circular_bug+0x190/0x19d
    [&lt;c019906c&gt;] __lock_acquire+0x86d/0xb6c
    [&lt;c01999db&gt;] ? mark_held_locks+0x5c/0x7b
    [&lt;c0199724&gt;] lock_acquire+0x66/0xb9
    [&lt;c02277d4&gt;] ? writeback_inodes_sb_if_idle+0x28/0x46
    [&lt;c06db935&gt;] down_read+0x28/0x58
    [&lt;c02277d4&gt;] ? writeback_inodes_sb_if_idle+0x28/0x46
    [&lt;c02277d4&gt;] writeback_inodes_sb_if_idle+0x28/0x46
    [&lt;c026f3b2&gt;] ext4_nonda_switch+0xe1/0xf4
    [&lt;c0271ece&gt;] ext4_da_write_begin+0x27/0x193
    [&lt;c01dcdb0&gt;] generic_file_buffered_write+0xc8/0x1bb
    [&lt;c01ddc47&gt;] __generic_file_aio_write+0x1dd/0x205
    [&lt;c01ddce7&gt;] generic_file_aio_write+0x78/0xd3
    [&lt;c026d336&gt;] ext4_file_write+0x480/0x4a6
    [&lt;c0198c1d&gt;] ? __lock_acquire+0x41e/0xb6c
    [&lt;c0180944&gt;] ? sched_clock_cpu+0x11a/0x13e
    [&lt;c01967e9&gt;] ? trace_hardirqs_off+0xb/0xd
    [&lt;c018099f&gt;] ? local_clock+0x37/0x4e
    [&lt;c0209f2c&gt;] do_sync_write+0x67/0x9d
    [&lt;c0209ec5&gt;] ? wait_on_retry_sync_kiocb+0x44/0x44
    [&lt;c020a7b9&gt;] vfs_write+0x7b/0xe6
    [&lt;c020a9a6&gt;] sys_write+0x3b/0x64
    [&lt;c06dd4bd&gt;] syscall_call+0x7/0xb

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
