<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/backing-dev.h, branch Colibri_T30_LinuxImageV2.1Beta2_20140206</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>atomic: use &lt;linux/atomic.h&gt;</title>
<updated>2011-07-26T23:49:47+00:00</updated>
<author>
<name>Arun Sharma</name>
<email>asharma@fb.com</email>
</author>
<published>2011-07-26T23:09:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=60063497a95e716c9a689af3be2687d261f115b4'/>
<id>60063497a95e716c9a689af3be2687d261f115b4</id>
<content type='text'>
This allows us to move duplicated code in &lt;asm/atomic.h&gt;
(atomic_inc_not_zero() for now) to &lt;linux/atomic.h&gt;

Signed-off-by: Arun Sharma &lt;asharma@fb.com&gt;
Reviewed-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Mike Frysinger &lt;vapier@gentoo.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This allows us to move duplicated code in &lt;asm/atomic.h&gt;
(atomic_inc_not_zero() for now) to &lt;linux/atomic.h&gt;

Signed-off-by: Arun Sharma &lt;asharma@fb.com&gt;
Reviewed-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Mike Frysinger &lt;vapier@gentoo.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>writeback: bdi write bandwidth estimation</title>
<updated>2011-07-10T05:09:01+00:00</updated>
<author>
<name>Wu Fengguang</name>
<email>fengguang.wu@intel.com</email>
</author>
<published>2010-08-29T17:22:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e98be2d599207c6b31e9bb340d52a231b2f3662d'/>
<id>e98be2d599207c6b31e9bb340d52a231b2f3662d</id>
<content type='text'>
The estimation value will start from 100MB/s and adapt to the real
bandwidth in seconds.

It tries to update the bandwidth only when disk is fully utilized.
Any inactive period of more than one second will be skipped.

The estimated bandwidth will be reflecting how fast the device can
writeout when _fully utilized_, and won't drop to 0 when it goes idle.
The value will remain constant at disk idle time. At busy write time, if
not considering fluctuations, it will also remain high unless be knocked
down by possible concurrent reads that compete for the disk time and
bandwidth with async writes.

The estimation is not done purely in the flusher because there is no
guarantee for write_cache_pages() to return timely to update bandwidth.

The bdi-&gt;avg_write_bandwidth smoothing is very effective for filtering
out sudden spikes, however may be a little biased in long term.

The overheads are low because the bdi bandwidth update only occurs at
200ms intervals.

The 200ms update interval is suitable, because it's not possible to get
the real bandwidth for the instance at all, due to large fluctuations.

The NFS commits can be as large as seconds worth of data. One XFS
completion may be as large as half second worth of data if we are going
to increase the write chunk to half second worth of data. In ext4,
fluctuations with time period of around 5 seconds is observed. And there
is another pattern of irregular periods of up to 20 seconds on SSD tests.

That's why we are not only doing the estimation at 200ms intervals, but
also averaging them over a period of 3 seconds and then go further to do
another level of smoothing in avg_write_bandwidth.

CC: Li Shaohua &lt;shaohua.li@intel.com&gt;
CC: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The estimation value will start from 100MB/s and adapt to the real
bandwidth in seconds.

It tries to update the bandwidth only when disk is fully utilized.
Any inactive period of more than one second will be skipped.

The estimated bandwidth will be reflecting how fast the device can
writeout when _fully utilized_, and won't drop to 0 when it goes idle.
The value will remain constant at disk idle time. At busy write time, if
not considering fluctuations, it will also remain high unless be knocked
down by possible concurrent reads that compete for the disk time and
bandwidth with async writes.

The estimation is not done purely in the flusher because there is no
guarantee for write_cache_pages() to return timely to update bandwidth.

The bdi-&gt;avg_write_bandwidth smoothing is very effective for filtering
out sudden spikes, however may be a little biased in long term.

The overheads are low because the bdi bandwidth update only occurs at
200ms intervals.

The 200ms update interval is suitable, because it's not possible to get
the real bandwidth for the instance at all, due to large fluctuations.

The NFS commits can be as large as seconds worth of data. One XFS
completion may be as large as half second worth of data if we are going
to increase the write chunk to half second worth of data. In ext4,
fluctuations with time period of around 5 seconds is observed. And there
is another pattern of irregular periods of up to 20 seconds on SSD tests.

That's why we are not only doing the estimation at 200ms intervals, but
also averaging them over a period of 3 seconds and then go further to do
another level of smoothing in avg_write_bandwidth.

CC: Li Shaohua &lt;shaohua.li@intel.com&gt;
CC: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>writeback: account per-bdi accumulated written pages</title>
<updated>2011-07-10T05:09:01+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2010-12-09T04:44:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f7d2b1ecd0c714adefc7d3a942ef87beb828a763'/>
<id>f7d2b1ecd0c714adefc7d3a942ef87beb828a763</id>
<content type='text'>
Introduce the BDI_WRITTEN counter. It will be used for estimating the
bdi's write bandwidth.

Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;:
Move BDI_WRITTEN accounting into __bdi_writeout_inc().
This will cover and fix fuse, which only calls bdi_writeout_inc().

CC: Michael Rubin &lt;mrubin@google.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce the BDI_WRITTEN counter. It will be used for estimating the
bdi's write bandwidth.

Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;:
Move BDI_WRITTEN accounting into __bdi_writeout_inc().
This will cover and fix fuse, which only calls bdi_writeout_inc().

CC: Michael Rubin &lt;mrubin@google.com&gt;
Reviewed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>writeback: split inode_wb_list_lock into bdi_writeback.list_lock</title>
<updated>2011-06-08T00:25:21+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2011-04-22T00:19:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f758eeabeb96f878c860e8f110f94ec8820822a9'/>
<id>f758eeabeb96f878c860e8f110f94ec8820822a9</id>
<content type='text'>
Split the global inode_wb_list_lock into a per-bdi_writeback list_lock,
as it's currently the most contended lock in the system for metadata
heavy workloads.  It won't help for single-filesystem workloads for
which we'll need the I/O-less balance_dirty_pages, but at least we
can dedicate a cpu to spinning on each bdi now for larger systems.

Based on earlier patches from Nick Piggin and Dave Chinner.

It reduces lock contentions to 1/4 in this test case:
10 HDD JBOD, 100 dd on each disk, XFS, 6GB ram

lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vanilla 2.6.39-rc3:
                      inode_wb_list_lock:         42590          44433           0.12         147.74      144127.35         252274         886792           0.08         121.34      917211.23
                      ------------------
                      inode_wb_list_lock              2          [&lt;ffffffff81165da5&gt;] bdev_inode_switch_bdi+0x29/0x85
                      inode_wb_list_lock             34          [&lt;ffffffff8115bd0b&gt;] inode_wb_list_del+0x22/0x49
                      inode_wb_list_lock          12893          [&lt;ffffffff8115bb53&gt;] __mark_inode_dirty+0x170/0x1d0
                      inode_wb_list_lock          10702          [&lt;ffffffff8115afef&gt;] writeback_single_inode+0x16d/0x20a
                      ------------------
                      inode_wb_list_lock              2          [&lt;ffffffff81165da5&gt;] bdev_inode_switch_bdi+0x29/0x85
                      inode_wb_list_lock             19          [&lt;ffffffff8115bd0b&gt;] inode_wb_list_del+0x22/0x49
                      inode_wb_list_lock           5550          [&lt;ffffffff8115bb53&gt;] __mark_inode_dirty+0x170/0x1d0
                      inode_wb_list_lock           8511          [&lt;ffffffff8115b4ad&gt;] writeback_sb_inodes+0x10f/0x157

2.6.39-rc3 + patch:
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock:         11383          11657           0.14         151.69       40429.51          90825         527918           0.11         145.90      556843.37
                ------------------------
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock             10          [&lt;ffffffff8115b189&gt;] inode_wb_list_del+0x5f/0x86
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           1493          [&lt;ffffffff8115b1ed&gt;] writeback_inodes_wb+0x3d/0x150
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           3652          [&lt;ffffffff8115a8e9&gt;] writeback_sb_inodes+0x123/0x16f
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           1412          [&lt;ffffffff8115a38e&gt;] writeback_single_inode+0x17f/0x223
                ------------------------
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock              3          [&lt;ffffffff8110b5af&gt;] bdi_lock_two+0x46/0x4b
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock              6          [&lt;ffffffff8115b189&gt;] inode_wb_list_del+0x5f/0x86
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           2061          [&lt;ffffffff8115af97&gt;] __mark_inode_dirty+0x173/0x1cf
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           2629          [&lt;ffffffff8115a8e9&gt;] writeback_sb_inodes+0x123/0x16f

hughd@google.com: fix recursive lock when bdi_lock_two() is called with new the same as old
akpm@linux-foundation.org: cleanup bdev_inode_switch_bdi() comment

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Split the global inode_wb_list_lock into a per-bdi_writeback list_lock,
as it's currently the most contended lock in the system for metadata
heavy workloads.  It won't help for single-filesystem workloads for
which we'll need the I/O-less balance_dirty_pages, but at least we
can dedicate a cpu to spinning on each bdi now for larger systems.

Based on earlier patches from Nick Piggin and Dave Chinner.

It reduces lock contentions to 1/4 in this test case:
10 HDD JBOD, 100 dd on each disk, XFS, 6GB ram

lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vanilla 2.6.39-rc3:
                      inode_wb_list_lock:         42590          44433           0.12         147.74      144127.35         252274         886792           0.08         121.34      917211.23
                      ------------------
                      inode_wb_list_lock              2          [&lt;ffffffff81165da5&gt;] bdev_inode_switch_bdi+0x29/0x85
                      inode_wb_list_lock             34          [&lt;ffffffff8115bd0b&gt;] inode_wb_list_del+0x22/0x49
                      inode_wb_list_lock          12893          [&lt;ffffffff8115bb53&gt;] __mark_inode_dirty+0x170/0x1d0
                      inode_wb_list_lock          10702          [&lt;ffffffff8115afef&gt;] writeback_single_inode+0x16d/0x20a
                      ------------------
                      inode_wb_list_lock              2          [&lt;ffffffff81165da5&gt;] bdev_inode_switch_bdi+0x29/0x85
                      inode_wb_list_lock             19          [&lt;ffffffff8115bd0b&gt;] inode_wb_list_del+0x22/0x49
                      inode_wb_list_lock           5550          [&lt;ffffffff8115bb53&gt;] __mark_inode_dirty+0x170/0x1d0
                      inode_wb_list_lock           8511          [&lt;ffffffff8115b4ad&gt;] writeback_sb_inodes+0x10f/0x157

2.6.39-rc3 + patch:
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock:         11383          11657           0.14         151.69       40429.51          90825         527918           0.11         145.90      556843.37
                ------------------------
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock             10          [&lt;ffffffff8115b189&gt;] inode_wb_list_del+0x5f/0x86
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           1493          [&lt;ffffffff8115b1ed&gt;] writeback_inodes_wb+0x3d/0x150
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           3652          [&lt;ffffffff8115a8e9&gt;] writeback_sb_inodes+0x123/0x16f
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           1412          [&lt;ffffffff8115a38e&gt;] writeback_single_inode+0x17f/0x223
                ------------------------
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock              3          [&lt;ffffffff8110b5af&gt;] bdi_lock_two+0x46/0x4b
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock              6          [&lt;ffffffff8115b189&gt;] inode_wb_list_del+0x5f/0x86
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           2061          [&lt;ffffffff8115af97&gt;] __mark_inode_dirty+0x173/0x1cf
                &amp;(&amp;wb-&gt;list_lock)-&gt;rlock           2629          [&lt;ffffffff8115a8e9&gt;] writeback_sb_inodes+0x123/0x16f

hughd@google.com: fix recursive lock when bdi_lock_two() is called with new the same as old
akpm@linux-foundation.org: cleanup bdev_inode_switch_bdi() comment

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: remove per-queue plugging</title>
<updated>2011-03-10T07:52:07+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2011-03-10T07:52:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7eaceaccab5f40bbfda044629a6298616aeaed50'/>
<id>7eaceaccab5f40bbfda044629a6298616aeaed50</id>
<content type='text'>
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops-&gt;sync_page().

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops-&gt;sync_page().

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: declare some external symbols</title>
<updated>2010-10-26T23:52:10+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@gmail.com</email>
</author>
<published>2010-10-26T21:22:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=92c09c041f15fc88b35f8628e07639f52e1fbb38'/>
<id>92c09c041f15fc88b35f8628e07639f52e1fbb38</id>
<content type='text'>
Declare 'bdi_pending_list' and 'tag_pages_for_writeback()' to remove
following sparse warnings:

 mm/backing-dev.c:46:1: warning: symbol 'bdi_pending_list' was not declared. Should it be static?
 mm/page-writeback.c:825:6: warning: symbol 'tag_pages_for_writeback' was not declared. Should it be static?

Signed-off-by: Namhyung Kim &lt;namhyung@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Declare 'bdi_pending_list' and 'tag_pages_for_writeback()' to remove
following sparse warnings:

 mm/backing-dev.c:46:1: warning: symbol 'bdi_pending_list' was not declared. Should it be static?
 mm/page-writeback.c:825:6: warning: symbol 'tag_pages_for_writeback' was not declared. Should it be static?

Signed-off-by: Namhyung Kim &lt;namhyung@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone</title>
<updated>2010-10-26T23:52:07+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mel@csn.ul.ie</email>
</author>
<published>2010-10-26T21:21:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0e093d99763eb4cea09f8ca4f1d01f34e121d10b'/>
<id>0e093d99763eb4cea09f8ca4f1d01f34e121d10b</id>
<content type='text'>
If congestion_wait() is called with no BDI congested, the caller will
sleep for the full timeout and this may be an unnecessary sleep.  This
patch adds a wait_iff_congested() that checks congestion and only sleeps
if a BDI is congested else, it calls cond_resched() to ensure the caller
is not hogging the CPU longer than its quota but otherwise will not sleep.

This is aimed at reducing some of the major desktop stalls reported during
IO.  For example, while kswapd is operating, it calls congestion_wait()
but it could just have been reclaiming clean page cache pages with no
congestion.  Without this patch, it would sleep for a full timeout but
after this patch, it'll just call schedule() if it has been on the CPU too
long.  Similar logic applies to direct reclaimers that are not making
enough progress.

Signed-off-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If congestion_wait() is called with no BDI congested, the caller will
sleep for the full timeout and this may be an unnecessary sleep.  This
patch adds a wait_iff_congested() that checks congestion and only sleeps
if a BDI is congested else, it calls cond_resched() to ensure the caller
is not hogging the CPU longer than its quota but otherwise will not sleep.

This is aimed at reducing some of the major desktop stalls reported during
IO.  For example, while kswapd is operating, it calls congestion_wait()
but it could just have been reclaiming clean page cache pages with no
congestion.  Without this patch, it would sleep for a full timeout but
after this patch, it'll just call schedule() if it has been on the CPU too
long.  Similar logic applies to direct reclaimers that are not making
enough progress.

Signed-off-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: fix writeback_in_progress()</title>
<updated>2010-08-12T15:43:30+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2010-08-11T21:17:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=81d73a32d775ae9674ea6edf0b5b721fc3bc57d9'/>
<id>81d73a32d775ae9674ea6edf0b5b721fc3bc57d9</id>
<content type='text'>
Commit 83ba7b071f3 ("writeback: simplify the write back thread queue")
broke writeback_in_progress() as in that commit we started to remove work
items from the list at the moment we start working on them and not at the
moment they are finished.  Thus if the flusher thread was doing some work
but there was no other work queued, writeback_in_progress() returned
false.  This could in particular cause unnecessary queueing of background
writeback from balance_dirty_pages() or writeout work from
writeback_sb_if_idle().

This patch fixes the problem by introducing a bit in the bdi state which
indicates that the flusher thread is processing some work and uses this
bit for writeback_in_progress() test.

NOTE: Both callsites of writeback_in_progress() (namely,
writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
need a different information than what writeback_in_progress() provides.
They would need to know whether *the kind of writeback they are going to
submit* is already queued.  But this information isn't that simple to
provide so let's fix writeback_in_progress() for the time being.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Acked-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 83ba7b071f3 ("writeback: simplify the write back thread queue")
broke writeback_in_progress() as in that commit we started to remove work
items from the list at the moment we start working on them and not at the
moment they are finished.  Thus if the flusher thread was doing some work
but there was no other work queued, writeback_in_progress() returned
false.  This could in particular cause unnecessary queueing of background
writeback from balance_dirty_pages() or writeout work from
writeback_sb_if_idle().

This patch fixes the problem by introducing a bit in the bdi state which
indicates that the flusher thread is processing some work and uses this
bit for writeback_in_progress() test.

NOTE: Both callsites of writeback_in_progress() (namely,
writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
need a different information than what writeback_in_progress() provides.
They would need to know whether *the kind of writeback they are going to
submit* is already queued.  But this information isn't that simple to
provide so let's fix writeback_in_progress() for the time being.

Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Acked-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>writeback: optimize periodic bdi thread wakeups</title>
<updated>2010-08-07T16:53:56+00:00</updated>
<author>
<name>Artem Bityutskiy</name>
<email>Artem.Bityutskiy@nokia.com</email>
</author>
<published>2010-07-25T11:29:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6467716a37673e8d47b4984eb19839bdad0a8353'/>
<id>6467716a37673e8d47b4984eb19839bdad0a8353</id>
<content type='text'>
Whe the first inode for a bdi is marked dirty, we wake up the bdi thread which
should take care of the periodic background write-out. However, the write-out
will actually start only 'dirty_writeback_interval' centisecs later, so we can
delay the wake-up.

This change was requested by Nick Piggin who pointed out that if we delay the
wake-up, we weed out 2 unnecessary contex switches, which matters because
'__mark_inode_dirty()' is a hot-path function.

This patch introduces a new function - 'bdi_wakeup_thread_delayed()', which
sets up a timer to wake-up the bdi thread and returns. So the wake-up is
delayed.

We also delete the timer in bdi threads just before writing-back. And
synchronously delete it when unregistering bdi. At the unregister point the bdi
does not have any users, so no one can arm it again.

Since now we take 'bdi-&gt;wb_lock' in the timer, which can execute in softirq
context, we have to use 'spin_lock_bh()' for 'bdi-&gt;wb_lock'. This patch makes
this change as well.

This patch also moves the 'bdi_wb_init()' function down in the file to avoid
forward-declaration of 'bdi_wakeup_thread_delayed()'.

Signed-off-by: Artem Bityutskiy &lt;Artem.Bityutskiy@nokia.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Whe the first inode for a bdi is marked dirty, we wake up the bdi thread which
should take care of the periodic background write-out. However, the write-out
will actually start only 'dirty_writeback_interval' centisecs later, so we can
delay the wake-up.

This change was requested by Nick Piggin who pointed out that if we delay the
wake-up, we weed out 2 unnecessary contex switches, which matters because
'__mark_inode_dirty()' is a hot-path function.

This patch introduces a new function - 'bdi_wakeup_thread_delayed()', which
sets up a timer to wake-up the bdi thread and returns. So the wake-up is
delayed.

We also delete the timer in bdi threads just before writing-back. And
synchronously delete it when unregistering bdi. At the unregister point the bdi
does not have any users, so no one can arm it again.

Since now we take 'bdi-&gt;wb_lock' in the timer, which can execute in softirq
context, we have to use 'spin_lock_bh()' for 'bdi-&gt;wb_lock'. This patch makes
this change as well.

This patch also moves the 'bdi_wb_init()' function down in the file to avoid
forward-declaration of 'bdi_wakeup_thread_delayed()'.

Signed-off-by: Artem Bityutskiy &lt;Artem.Bityutskiy@nokia.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>writeback: move last_active to bdi</title>
<updated>2010-08-07T16:53:56+00:00</updated>
<author>
<name>Artem Bityutskiy</name>
<email>Artem.Bityutskiy@nokia.com</email>
</author>
<published>2010-07-25T11:29:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ecd584030da67ede1bf17955746a6ce834d9fc6b'/>
<id>ecd584030da67ede1bf17955746a6ce834d9fc6b</id>
<content type='text'>
Currently bdi threads use local variable 'last_active' which stores last time
when the bdi thread did some useful work. Move this local variable to 'struct
bdi_writeback'. This is just a preparation for the further patches which will
make the forker thread decide when bdi threads should be killed.

Signed-off-by: Artem Bityutskiy &lt;Artem.Bityutskiy@nokia.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently bdi threads use local variable 'last_active' which stores last time
when the bdi thread did some useful work. Move this local variable to 'struct
bdi_writeback'. This is just a preparation for the further patches which will
make the forker thread decide when bdi threads should be killed.

Signed-off-by: Artem Bityutskiy &lt;Artem.Bityutskiy@nokia.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
