<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/fs/jbd2/checkpoint.c, branch v3.2.17</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>jbd2: use WRITE_SYNC in journal checkpoint</title>
<updated>2011-06-27T16:36:29+00:00</updated>
<author>
<name>Tao Ma</name>
<email>boyu.mt@taobao.com</email>
</author>
<published>2011-06-27T16:36:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d3ad8434aa83ef7c88bc91edcfe012cdcbab9f3e'/>
<id>d3ad8434aa83ef7c88bc91edcfe012cdcbab9f3e</id>
<content type='text'>
In journal checkpoint, we write the buffer and wait for it to finish.
But in cfq, the async queue has a very low priority, and in our test,
if there are too many sync queues and every queue is filled up with
requests, the write request will be delayed for quite a long time and
all the tasks which are waiting for journal space will end with errors like:

INFO: task attr_set:3816 blocked for more than 120 seconds.
"echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
attr_set      D ffff880028393480     0  3816      1 0x00000000
 ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
 ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
 ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
Call Trace:
 [&lt;ffffffff8103e456&gt;] ? __dequeue_entity+0x33/0x38
 [&lt;ffffffff8103caad&gt;] ? need_resched+0x23/0x2d
 [&lt;ffffffff814006a6&gt;] ? thread_return+0xa2/0xbc
 [&lt;ffffffffa01f6224&gt;] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
 [&lt;ffffffffa01f6224&gt;] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
 [&lt;ffffffff81400d31&gt;] __mutex_lock_common+0x14e/0x1a9
 [&lt;ffffffffa021dbfb&gt;] ? brelse+0x13/0x15 [ext4]
 [&lt;ffffffff81400ddb&gt;] __mutex_lock_slowpath+0x19/0x1b
 [&lt;ffffffff81400b2d&gt;] mutex_lock+0x1b/0x32
 [&lt;ffffffffa01f927b&gt;] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
 [&lt;ffffffffa01f547b&gt;] start_this_handle+0x438/0x527 [jbd2]
 [&lt;ffffffff8106f491&gt;] ? autoremove_wake_function+0x0/0x3e
 [&lt;ffffffffa01f560b&gt;] jbd2_journal_start+0xa1/0xcc [jbd2]
 [&lt;ffffffffa02353be&gt;] ext4_journal_start_sb+0x57/0x81 [ext4]
 [&lt;ffffffffa024a314&gt;] ext4_xattr_set+0x6c/0xe3 [ext4]
 [&lt;ffffffffa024aaff&gt;] ext4_xattr_user_set+0x42/0x4b [ext4]
 [&lt;ffffffff81145adb&gt;] generic_setxattr+0x6b/0x76
 [&lt;ffffffff81146ac0&gt;] __vfs_setxattr_noperm+0x47/0xc0
 [&lt;ffffffff81146bb8&gt;] vfs_setxattr+0x7f/0x9a
 [&lt;ffffffff81146c88&gt;] setxattr+0xb5/0xe8
 [&lt;ffffffff81137467&gt;] ? do_filp_open+0x571/0xa6e
 [&lt;ffffffff81146d26&gt;] sys_fsetxattr+0x6b/0x91
 [&lt;ffffffff81002d32&gt;] system_call_fastpath+0x16/0x1b

So this patch tries to use WRITE_SYNC in __flush_batch so that the request will
be moved into the sync queue and handled by cfq in a timely manner. We also use
the new plug, so that all the WRITE_SYNC requests can be issued as a whole when
we unplug it.

Signed-off-by: Tao Ma &lt;boyu.mt@taobao.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Reported-by: Robin Dong &lt;sanbai@taobao.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In journal checkpoint, we write the buffer and wait for it to finish.
But in cfq, the async queue has a very low priority, and in our test,
if there are too many sync queues and every queue is filled up with
requests, the write request will be delayed for quite a long time and
all the tasks which are waiting for journal space will end with errors like:

INFO: task attr_set:3816 blocked for more than 120 seconds.
"echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
attr_set      D ffff880028393480     0  3816      1 0x00000000
 ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
 ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
 ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
Call Trace:
 [&lt;ffffffff8103e456&gt;] ? __dequeue_entity+0x33/0x38
 [&lt;ffffffff8103caad&gt;] ? need_resched+0x23/0x2d
 [&lt;ffffffff814006a6&gt;] ? thread_return+0xa2/0xbc
 [&lt;ffffffffa01f6224&gt;] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
 [&lt;ffffffffa01f6224&gt;] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
 [&lt;ffffffff81400d31&gt;] __mutex_lock_common+0x14e/0x1a9
 [&lt;ffffffffa021dbfb&gt;] ? brelse+0x13/0x15 [ext4]
 [&lt;ffffffff81400ddb&gt;] __mutex_lock_slowpath+0x19/0x1b
 [&lt;ffffffff81400b2d&gt;] mutex_lock+0x1b/0x32
 [&lt;ffffffffa01f927b&gt;] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
 [&lt;ffffffffa01f547b&gt;] start_this_handle+0x438/0x527 [jbd2]
 [&lt;ffffffff8106f491&gt;] ? autoremove_wake_function+0x0/0x3e
 [&lt;ffffffffa01f560b&gt;] jbd2_journal_start+0xa1/0xcc [jbd2]
 [&lt;ffffffffa02353be&gt;] ext4_journal_start_sb+0x57/0x81 [ext4]
 [&lt;ffffffffa024a314&gt;] ext4_xattr_set+0x6c/0xe3 [ext4]
 [&lt;ffffffffa024aaff&gt;] ext4_xattr_user_set+0x42/0x4b [ext4]
 [&lt;ffffffff81145adb&gt;] generic_setxattr+0x6b/0x76
 [&lt;ffffffff81146ac0&gt;] __vfs_setxattr_noperm+0x47/0xc0
 [&lt;ffffffff81146bb8&gt;] vfs_setxattr+0x7f/0x9a
 [&lt;ffffffff81146c88&gt;] setxattr+0xb5/0xe8
 [&lt;ffffffff81137467&gt;] ? do_filp_open+0x571/0xa6e
 [&lt;ffffffff81146d26&gt;] sys_fsetxattr+0x6b/0x91
 [&lt;ffffffff81002d32&gt;] system_call_fastpath+0x16/0x1b

So this patch tries to use WRITE_SYNC in __flush_batch so that the request will
be moved into the sync queue and handled by cfq in a timely manner. We also use
the new plug, so that all the WRITE_SYNC requests can be issued as a whole when
we unplug it.

Signed-off-by: Tao Ma &lt;boyu.mt@taobao.com&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Reported-by: Robin Dong &lt;sanbai@taobao.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>jbd2: Fix oops in jbd2_journal_remove_journal_head()</title>
<updated>2011-06-13T19:38:22+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2011-06-13T19:38:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=de1b794130b130e77ffa975bb58cb843744f9ae5'/>
<id>de1b794130b130e77ffa975bb58cb843744f9ae5</id>
<content type='text'>
jbd2_journal_remove_journal_head() can oops when trying to access
journal_head returned by bh2jh(). This is caused for example by the
following race:

	TASK1					TASK2
  jbd2_journal_commit_transaction()
    ...
    processing t_forget list
      __jbd2_journal_refile_buffer(jh);
      if (!jh-&gt;b_transaction) {
        jbd_unlock_bh_state(bh);
					jbd2_journal_try_to_free_buffers()
					  jbd2_journal_grab_journal_head(bh)
					  jbd_lock_bh_state(bh)
					  __journal_try_to_free_buffer()
					  jbd2_journal_put_journal_head(jh)
        jbd2_journal_remove_journal_head(bh);

jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
buffer is not part of any transaction and thus frees journal_head
before TASK1 gets to doing so. Note that even buffer_head can be
released by try_to_free_buffers() after
jbd2_journal_put_journal_head(), which adds an even larger opportunity
for an oops (but I didn't see this happen in reality).

Fix the problem by making transactions hold their own journal_head
reference (in b_jcount). That way we don't have to remove journal_head
explicitly via jbd2_journal_remove_journal_head() and instead just
remove journal_head when b_jcount drops to zero. The result of this is
that [__]jbd2_journal_refile_buffer(),
[__]jbd2_journal_unfile_buffer(), and
__jbd2_journal_remove_checkpoint() can free journal_head, which requires
modifying a few callers. Also we have to be careful because once
journal_head is removed, buffer_head might be freed as well. So we
have to get our own buffer_head reference where it matters.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
jbd2_journal_remove_journal_head() can oops when trying to access
journal_head returned by bh2jh(). This is caused for example by the
following race:

	TASK1					TASK2
  jbd2_journal_commit_transaction()
    ...
    processing t_forget list
      __jbd2_journal_refile_buffer(jh);
      if (!jh-&gt;b_transaction) {
        jbd_unlock_bh_state(bh);
					jbd2_journal_try_to_free_buffers()
					  jbd2_journal_grab_journal_head(bh)
					  jbd_lock_bh_state(bh)
					  __journal_try_to_free_buffer()
					  jbd2_journal_put_journal_head(jh)
        jbd2_journal_remove_journal_head(bh);

jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and
buffer is not part of any transaction and thus frees journal_head
before TASK1 gets to doing so. Note that even buffer_head can be
released by try_to_free_buffers() after
jbd2_journal_put_journal_head(), which adds an even larger opportunity
for an oops (but I didn't see this happen in reality).

Fix the problem by making transactions hold their own journal_head
reference (in b_jcount). That way we don't have to remove journal_head
explicitly via jbd2_journal_remove_journal_head() and instead just
remove journal_head when b_jcount drops to zero. The result of this is
that [__]jbd2_journal_refile_buffer(),
[__]jbd2_journal_unfile_buffer(), and
__jbd2_journal_remove_checkpoint() can free journal_head, which requires
modifying a few callers. Also we have to be careful because once
journal_head is removed, buffer_head might be freed as well. So we
have to get our own buffer_head reference where it matters.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'next' into upstream-merge</title>
<updated>2010-10-28T03:44:47+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2010-10-28T03:44:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a107e5a3a473a2ea62bd5af24e11b84adf1486ff'/>
<id>a107e5a3a473a2ea62bd5af24e11b84adf1486ff</id>
<content type='text'>
Conflicts:
	fs/ext4/inode.c
	fs/ext4/mballoc.c
	include/trace/events/ext4.h
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	fs/ext4/inode.c
	fs/ext4/mballoc.c
	include/trace/events/ext4.h
</pre>
</div>
</content>
</entry>
<entry>
<title>jbd2: Add sanity check for attempts to start handle during umount</title>
<updated>2010-10-28T01:30:04+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2010-10-28T01:30:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5c2178e785244341d1e6f2bc3b62f20a337cc44f'/>
<id>5c2178e785244341d1e6f2bc3b62f20a337cc44f</id>
<content type='text'>
An attempt to modify the file system during the call to
jbd2_destroy_journal() can lead to a system lockup.  So add some
checking to make it much more obvious when this happens and to
determine where the offending code is located.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
An attempt to modify the file system during the call to
jbd2_destroy_journal() can lead to a system lockup.  So add some
checking to make it much more obvious when this happens and to
determine where the offending code is located.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: remove BLKDEV_IFL_WAIT</title>
<updated>2010-09-16T18:52:58+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2010-09-16T18:51:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dd3932eddf428571762596e17b65f5dc92ca361b'/>
<id>dd3932eddf428571762596e17b65f5dc92ca361b</id>
<content type='text'>
All the blkdev_issue_* helpers can only sanely be used by synchronous
callers.  To issue cache flushes or barriers asynchronously, the caller needs
to set up a bio by itself with a completion callback to move the asynchronous
state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
specified when calling blkdev_issue_* and also remove the now unused flags
argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
blkdev_issue_discard we need to keep it for the secure discard flag, which
gains a more descriptive name and loses the bitops vs flag confusion.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All the blkdev_issue_* helpers can only sanely be used by synchronous
callers.  To issue cache flushes or barriers asynchronously, the caller needs
to set up a bio by itself with a completion callback to move the asynchronous
state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
specified when calling blkdev_issue_* and also remove the now unused flags
argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
blkdev_issue_discard we need to keep it for the secure discard flag, which
gains a more descriptive name and loses the bitops vs flag confusion.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>remove SWRITE* I/O types</title>
<updated>2010-08-18T05:09:01+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2010-08-11T15:06:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9cb569d601e0b93e01c20a22872270ec663b75f6'/>
<id>9cb569d601e0b93e01c20a22872270ec663b75f6</id>
<content type='text'>
These flags aren't real I/O types, but tell ll_rw_block to always
lock the buffer instead of giving up on a failed trylock.

Instead add a new write_dirty_buffer helper that implements this semantic
and use it from the existing SWRITE* callers.  Note that the ll_rw_block
code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
this patch fixes.

In the ufs code clean up the helper that used to call ll_rw_block
to mirror sync_dirty_buffer, which is the function it implements for
compound buffers.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These flags aren't real I/O types, but tell ll_rw_block to always
lock the buffer instead of giving up on a failed trylock.

Instead add a new write_dirty_buffer helper that implements this semantic
and use it from the existing SWRITE* callers.  Note that the ll_rw_block
code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
this patch fixes.

In the ufs code clean up the helper that used to call ll_rw_block
to mirror sync_dirty_buffer, which is the function it implements for
compound buffers.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>jbd2: Change j_state_lock to be a rwlock_t</title>
<updated>2010-08-04T01:35:12+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2010-08-04T01:35:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a931da6ac9331a6c80dd91c199105806f2336188'/>
<id>a931da6ac9331a6c80dd91c199105806f2336188</id>
<content type='text'>
Lockstat reports have shown that j_state_lock is a major source of
lock contention, especially on systems with more than 4 CPU cores.  So
change it to be a read/write spinlock.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Lockstat reports have shown that j_state_lock is a major source of
lock contention, especially on systems with more than 4 CPU cores.  So
change it to be a read/write spinlock.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop</title>
<updated>2010-08-02T12:43:25+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2010-08-02T12:43:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a51dca9cd3bb4ec5a05bfb6feabf024a5c808a37'/>
<id>a51dca9cd3bb4ec5a05bfb6feabf024a5c808a37</id>
<content type='text'>
By using an atomic_t for t_updates and t_outstanding_credits, we
should no longer need to take the transaction's t_handle_lock in
jbd2_journal_stop().

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
By using an atomic_t for t_updates and t_outstanding_credits, we
should no longer need to take the transaction's t_handle_lock in
jbd2_journal_stop().

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>blkdev: generalize flags for blkdev_issue_fn functions</title>
<updated>2010-04-28T17:47:36+00:00</updated>
<author>
<name>Dmitry Monakhov</name>
<email>dmonakhov@openvz.org</email>
</author>
<published>2010-04-28T13:55:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fbd9b09a177a481eda256447c881f014f29034fe'/>
<id>fbd9b09a177a481eda256447c881f014f29034fe</id>
<content type='text'>
This patch just converts all blkdev_issue_xxx functions to a common
set of flags. Wait/allocation semantics are preserved.

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch just converts all blkdev_issue_xxx functions to a common
set of flags. Wait/allocation semantics are preserved.

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: Add new tracepoint for jbd2_cleanup_journal_tail</title>
<updated>2009-12-23T12:45:44+00:00</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2009-12-23T12:45:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=71f2be213a0009098819e5c04f75ff19f84f2122'/>
<id>71f2be213a0009098819e5c04f75ff19f84f2122</id>
<content type='text'>
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
</feed>
