linux-toradex.git/include/linux/backing-dev.h, branch Colibri_T30_LinuxImageV2.1Beta2_20140206

atomic: use

2011-07-26T23:49:47+00:00

This allows us to move duplicated code in 
(atomic_inc_not_zero() for now) to 

Signed-off-by: Arun Sharma 
Reviewed-by: Eric Dumazet 
Cc: Ingo Molnar 
Cc: David Miller 
Cc: Eric Dumazet 
Acked-by: Mike Frysinger 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

writeback: bdi write bandwidth estimation

2011-07-10T05:09:01+00:00

The estimation value will start from 100MB/s and adapt to the real
bandwidth in seconds.

It tries to update the bandwidth only when disk is fully utilized.
Any inactive period of more than one second will be skipped.

The estimated bandwidth reflects how fast the device can write out when
_fully utilized_, and won't drop to 0 when it goes idle. The value
remains constant while the disk is idle. During busy writes, ignoring
fluctuations, it also remains high unless it is knocked down by
concurrent reads that compete with async writes for disk time and
bandwidth.

The estimation is not done purely in the flusher because there is no
guarantee for write_cache_pages() to return timely to update bandwidth.

The bdi->avg_write_bandwidth smoothing is very effective at filtering
out sudden spikes, but may be slightly biased in the long term.

The overheads are low because the bdi bandwidth update only occurs at
200ms intervals.

The 200ms update interval is suitable because the instantaneous
bandwidth cannot be measured reliably anyway, due to large fluctuations.

NFS commits can be as large as several seconds' worth of data. One XFS
completion may be as large as half a second's worth of data if we
increase the write chunk to half a second's worth of data. In ext4,
fluctuations with a period of around 5 seconds are observed. And there
is another pattern of irregular periods of up to 20 seconds on SSD tests.

That's why we not only do the estimation at 200ms intervals, but also
average the results over a period of 3 seconds and then apply another
level of smoothing in avg_write_bandwidth.

CC: Li Shaohua 
CC: Peter Zijlstra 
Signed-off-by: Wu Fengguang
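
The estimation described in the commit above can be illustrated with a
small user-space sketch. It is not the kernel implementation: the struct,
the function names, the millisecond clock and the 1/8 second-level
smoothing factor are all assumptions made for illustration. Only the
100MB/s starting value, the 200ms update interval, the 3 second averaging
period and the skipping of idle periods longer than one second come from
the commit message. The "disk fully utilized" condition is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define UPDATE_INTERVAL_MS 200u   /* minimum spacing between updates */
#define PERIOD_MS          3000u  /* averaging window */
#define MAX_IDLE_MS        1000u  /* longer gaps count as idle time */
#define INIT_BW (100ull * 1024 * 1024) /* 100MB/s start value, bytes/s */

/* Hypothetical estimator state; field names are illustrative only. */
struct bw_est {
    uint64_t write_bw;       /* estimate averaged over PERIOD_MS */
    uint64_t avg_write_bw;   /* second level of smoothing */
    uint64_t written_stamp;  /* accumulated bytes at the last update */
    uint64_t time_stamp_ms;  /* time of the last update */
};

void bw_est_init(struct bw_est *e, uint64_t now_ms)
{
    e->write_bw = INIT_BW;
    e->avg_write_bw = INIT_BW;
    e->written_stamp = 0;
    e->time_stamp_ms = now_ms;
}

/* Returns 1 if the estimate was updated, 0 if the call was skipped. */
int bw_est_update(struct bw_est *e, uint64_t now_ms, uint64_t total_written)
{
    /* Both the clock and the written counter only move forward. */
    assert(now_ms >= e->time_stamp_ms);
    assert(total_written >= e->written_stamp);

    uint64_t elapsed = now_ms - e->time_stamp_ms;
    if (elapsed < UPDATE_INTERVAL_MS)
        return 0;                       /* rate limit: keep overhead low */

    if (elapsed > MAX_IDLE_MS) {
        /* Inactive period: restart the sample, keep the old estimate. */
        e->written_stamp = total_written;
        e->time_stamp_ms = now_ms;
        return 0;
    }

    uint64_t old = e->write_bw;
    uint64_t delta = total_written - e->written_stamp;

    /*
     * sample = delta * 1000 / elapsed           (bytes per second)
     * bw     = (sample * elapsed + old * (PERIOD_MS - elapsed)) / PERIOD_MS
     */
    uint64_t bw = (delta * 1000 + old * (PERIOD_MS - elapsed)) / PERIOD_MS;

    /*
     * Second level of smoothing (assumed factor 1/8): avg only follows
     * write_bw when write_bw keeps moving away in the same direction.
     */
    uint64_t avg = e->avg_write_bw;
    if (avg > old && old >= bw)
        avg -= (avg - old) / 8;
    else if (avg < old && old <= bw)
        avg += (old - avg) / 8;

    e->write_bw = bw;
    e->avg_write_bw = avg;
    e->written_stamp = total_written;
    e->time_stamp_ms = now_ms;
    return 1;
}
```

A caller would invoke bw_est_update() from every writeout path with the
current time and the accumulated written counter; most calls return
immediately because less than 200ms have passed since the last update.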

writeback: account per-bdi accumulated written pages

2011-07-10T05:09:01+00:00

Introduce the BDI_WRITTEN counter. It will be used for estimating the
bdi's write bandwidth.

Peter Zijlstra :
Move BDI_WRITTEN accounting into __bdi_writeout_inc().
This will cover and fix fuse, which only calls bdi_writeout_inc().

CC: Michael Rubin 
Reviewed-by: KOSAKI Motohiro 
Signed-off-by: Jan Kara 
Signed-off-by: Wu Fengguang

writeback: split inode_wb_list_lock into bdi_writeback.list_lock

2011-06-08T00:25:21+00:00

Split the global inode_wb_list_lock into a per-bdi_writeback list_lock,
as it's currently the most contended lock in the system for metadata
heavy workloads.  It won't help for single-filesystem workloads for
which we'll need the I/O-less balance_dirty_pages, but at least we
can dedicate a cpu to spinning on each bdi now for larger systems.

Based on earlier patches from Nick Piggin and Dave Chinner.

It reduces lock contentions to 1/4 in this test case:
10 HDD JBOD, 100 dd on each disk, XFS, 6GB ram

lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vanilla 2.6.39-rc3:
                      inode_wb_list_lock:         42590          44433           0.12         147.74      144127.35         252274         886792           0.08         121.34      917211.23
                      ------------------
                      inode_wb_list_lock              2          [] bdev_inode_switch_bdi+0x29/0x85
                      inode_wb_list_lock             34          [] inode_wb_list_del+0x22/0x49
                      inode_wb_list_lock          12893          [] __mark_inode_dirty+0x170/0x1d0
                      inode_wb_list_lock          10702          [] writeback_single_inode+0x16d/0x20a
                      ------------------
                      inode_wb_list_lock              2          [] bdev_inode_switch_bdi+0x29/0x85
                      inode_wb_list_lock             19          [] inode_wb_list_del+0x22/0x49
                      inode_wb_list_lock           5550          [] __mark_inode_dirty+0x170/0x1d0
                      inode_wb_list_lock           8511          [] writeback_sb_inodes+0x10f/0x157

2.6.39-rc3 + patch:
                &(&wb->list_lock)->rlock:         11383          11657           0.14         151.69       40429.51          90825         527918           0.11         145.90      556843.37
                ------------------------
                &(&wb->list_lock)->rlock             10          [] inode_wb_list_del+0x5f/0x86
                &(&wb->list_lock)->rlock           1493          [] writeback_inodes_wb+0x3d/0x150
                &(&wb->list_lock)->rlock           3652          [] writeback_sb_inodes+0x123/0x16f
                &(&wb->list_lock)->rlock           1412          [] writeback_single_inode+0x17f/0x223
                ------------------------
                &(&wb->list_lock)->rlock              3          [] bdi_lock_two+0x46/0x4b
                &(&wb->list_lock)->rlock              6          [] inode_wb_list_del+0x5f/0x86
                &(&wb->list_lock)->rlock           2061          [] __mark_inode_dirty+0x173/0x1cf
                &(&wb->list_lock)->rlock           2629          [] writeback_sb_inodes+0x123/0x16f

hughd@google.com: fix recursive lock when bdi_lock_two() is called with new the same as old
akpm@linux-foundation.org: cleanup bdev_inode_switch_bdi() comment

Signed-off-by: Christoph Hellwig 
Signed-off-by: Hugh Dickins 
Signed-off-by: Andrew Morton 
Signed-off-by: Wu Fengguang
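
The recursive-lock note above concerns taking two per-writeback locks at
once. A minimal user-space sketch of the usual pattern follows: acquire
the two locks in address order so that concurrent callers cannot
deadlock, and take the lock only once when both arguments are the same
object. The types and function names are hypothetical, the spinlock is a
plain C11 atomic_flag, and where exactly the kernel handles the
"new == old" case is not stated in the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a writeback structure with its own lock. */
struct wb_sketch {
    atomic_flag list_lock;
};

void wb_init(struct wb_sketch *wb)
{
    atomic_flag_clear(&wb->list_lock);
}

void wb_lock(struct wb_sketch *wb)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&wb->list_lock,
                                             memory_order_acquire))
        ;
}

bool wb_trylock(struct wb_sketch *wb)
{
    return !atomic_flag_test_and_set_explicit(&wb->list_lock,
                                              memory_order_acquire);
}

void wb_unlock(struct wb_sketch *wb)
{
    atomic_flag_clear_explicit(&wb->list_lock, memory_order_release);
}

/* Lock both objects; safe when a == b and independent of argument order. */
void lock_two(struct wb_sketch *a, struct wb_sketch *b)
{
    assert(a && b);
    if (a == b) {
        wb_lock(a);             /* same lock: taking it twice would hang */
        return;
    }
    if ((uintptr_t)a < (uintptr_t)b) {
        wb_lock(a);
        wb_lock(b);
    } else {
        wb_lock(b);
        wb_lock(a);
    }
}

void unlock_two(struct wb_sketch *a, struct wb_sketch *b)
{
    wb_unlock(a);
    if (a != b)
        wb_unlock(b);
}
```

Ordering by address gives every caller the same acquisition order, so
lock_two(x, y) on one CPU and lock_two(y, x) on another cannot each hold
one lock while waiting for the other.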

block: remove per-queue plugging

2011-03-10T07:52:07+00:00

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So let's kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe

mm: declare some external symbols

2010-10-26T23:52:10+00:00

Declare 'bdi_pending_list' and 'tag_pages_for_writeback()' to remove
the following sparse warnings:

 mm/backing-dev.c:46:1: warning: symbol 'bdi_pending_list' was not declared. Should it be static?
 mm/page-writeback.c:825:6: warning: symbol 'tag_pages_for_writeback' was not declared. Should it be static?

Signed-off-by: Namhyung Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone

2010-10-26T23:52:07+00:00

If congestion_wait() is called with no BDI congested, the caller will
sleep for the full timeout and this may be an unnecessary sleep.  This
patch adds a wait_iff_congested() that checks congestion and only sleeps
if a BDI is congested.  Otherwise, it calls cond_resched() to ensure the
caller is not hogging the CPU longer than its quota, but does not sleep.

This is aimed at reducing some of the major desktop stalls reported during
IO.  For example, while kswapd is operating, it calls congestion_wait()
but it could just have been reclaiming clean page cache pages with no
congestion.  Without this patch, it would sleep for a full timeout but
after this patch, it'll just call schedule() if it has been on the CPU too
long.  Similar logic applies to direct reclaimers that are not making
enough progress.

Signed-off-by: Mel Gorman 
Cc: Johannes Weiner 
Cc: Minchan Kim 
Cc: Wu Fengguang 
Cc: KAMEZAWA Hiroyuki 
Cc: KOSAKI Motohiro 
Cc: Rik van Riel 
Cc: Jens Axboe 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds
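
The difference between an unconditional wait and the conditional one
described above can be sketched in user space. This is an illustration
under assumptions, not the kernel function: the congestion counter, the
helper names and the return value are hypothetical, sched_yield() stands
in for cond_resched(), and the per-zone congestion check mentioned in the
subject line is left out.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() and sched_yield() */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical global: number of devices currently marked congested. */
static atomic_int nr_congested;

void mark_congested(void)
{
    atomic_fetch_add(&nr_congested, 1);
}

void clear_congested(void)
{
    atomic_fetch_sub(&nr_congested, 1);
}

/*
 * Sleep for timeout_ms only if some device is congested.  Otherwise just
 * give other tasks a chance to run and return immediately.
 * Returns 1 if the caller slept, 0 if it did not.
 */
int wait_iff_congested_sketch(long timeout_ms)
{
    assert(timeout_ms >= 0);

    if (atomic_load(&nr_congested) == 0) {
        sched_yield();          /* stand-in for cond_resched() */
        return 0;
    }

    struct timespec ts = {
        .tv_sec  = timeout_ms / 1000,
        .tv_nsec = (timeout_ms % 1000) * 1000000L,
    };
    nanosleep(&ts, NULL);
    return 1;
}
```

A reclaim loop that only frees clean page cache pages would hit the first
branch on every iteration and never pay the full timeout.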

mm: fix writeback_in_progress()

2010-08-12T15:43:30+00:00

Commit 83ba7b071f3 ("writeback: simplify the write back thread queue")
broke writeback_in_progress() as in that commit we started to remove work
items from the list at the moment we start working on them and not at the
moment they are finished.  Thus if the flusher thread was doing some work
but there was no other work queued, writeback_in_progress() returned
false.  This could in particular cause unnecessary queueing of background
writeback from balance_dirty_pages() or writeout work from
writeback_sb_if_idle().

This patch fixes the problem by introducing a bit in the bdi state which
indicates that the flusher thread is processing some work and uses this
bit for writeback_in_progress() test.

NOTE: Both callsites of writeback_in_progress() (namely,
writeback_inodes_sb_if_idle() and balance_dirty_pages()) actually need
different information from what writeback_in_progress() provides.
They need to know whether *the kind of writeback they are going to
submit* is already queued.  But this information isn't that simple to
provide so let's fix writeback_in_progress() for the time being.

Signed-off-by: Jan Kara 
Cc: Christoph Hellwig 
Cc: Wu Fengguang 
Acked-by: Jens Axboe 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds
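
The bug and the fix described above can be reproduced in a few lines of
user-space C. The sketch assumes a hypothetical state bit and a simple
counter in place of the work list; all names are illustrative, and the
queue counter is not thread-safe. The old test looks at whether work is
still queued, the new test looks at a bit that the flusher sets while it
is processing work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define STATE_WRITEBACK_RUNNING (1u << 0)  /* hypothetical bit name */

struct bdi_sketch {
    atomic_uint state;  /* state bits */
    int nr_queued;      /* work items still on the list */
};

void bdi_sketch_init(struct bdi_sketch *b)
{
    atomic_init(&b->state, 0);
    b->nr_queued = 0;
}

void queue_work(struct bdi_sketch *b)
{
    b->nr_queued++;
}

/* The flusher removes the item from the list when it STARTS working. */
void flusher_start(struct bdi_sketch *b)
{
    assert(b->nr_queued > 0);
    atomic_fetch_or(&b->state, STATE_WRITEBACK_RUNNING);
    b->nr_queued--;
}

void flusher_finish(struct bdi_sketch *b)
{
    atomic_fetch_and(&b->state, ~STATE_WRITEBACK_RUNNING);
}

/* Old test: false while the last item is being processed. */
bool in_progress_old(struct bdi_sketch *b)
{
    return b->nr_queued > 0;
}

/* New test: true for as long as the flusher is processing work. */
bool in_progress_new(struct bdi_sketch *b)
{
    return (atomic_load(&b->state) & STATE_WRITEBACK_RUNNING) != 0;
}
```

With a single queued item, in_progress_old() turns false as soon as
flusher_start() dequeues it, while in_progress_new() stays true until
flusher_finish() clears the bit.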

writeback: optimize periodic bdi thread wakeups

2010-08-07T16:53:56+00:00

When the first inode for a bdi is marked dirty, we wake up the bdi thread which
should take care of the periodic background write-out. However, the write-out
will actually start only 'dirty_writeback_interval' centisecs later, so we can
delay the wake-up.

This change was requested by Nick Piggin who pointed out that if we delay the
wake-up, we weed out 2 unnecessary context switches, which matters because
'__mark_inode_dirty()' is a hot-path function.

This patch introduces a new function - 'bdi_wakeup_thread_delayed()', which
sets up a timer to wake-up the bdi thread and returns. So the wake-up is
delayed.

We also delete the timer in bdi threads just before writing-back. And
synchronously delete it when unregistering bdi. At the unregister point the bdi
does not have any users, so no one can arm it again.

Since now we take 'bdi->wb_lock' in the timer, which can execute in softirq
context, we have to use 'spin_lock_bh()' for 'bdi->wb_lock'. This patch makes
this change as well.

This patch also moves the 'bdi_wb_init()' function down in the file to avoid
forward-declaration of 'bdi_wakeup_thread_delayed()'.

Signed-off-by: Artem Bityutskiy 
Signed-off-by: Jens Axboe
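
The delayed wake-up above boils down to arming a one-shot timer instead
of waking the thread immediately. A minimal sketch with a simulated clock
follows. The struct and function names are hypothetical, locking is
omitted, and timer_run() stands in for the timer subsystem firing the
callback; only the idea of delaying by an interval given in centisecs and
deleting the timer before write-back or unregistering comes from the
commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-shot timer driven by a simulated millisecond clock. */
struct wakeup_timer_sketch {
    bool armed;
    unsigned long expires_ms;
    unsigned int wakeups;       /* how often the thread was woken */
};

/* Arm the timer instead of waking the thread right away. */
void wakeup_thread_delayed(struct wakeup_timer_sketch *t,
                           unsigned long now_ms,
                           unsigned int interval_centisecs)
{
    assert(t);
    t->expires_ms = now_ms + interval_centisecs * 10ul;
    t->armed = true;
}

/* Called as the simulated clock advances; fires the timer at most once. */
void timer_run(struct wakeup_timer_sketch *t, unsigned long now_ms)
{
    if (t->armed && now_ms >= t->expires_ms) {
        t->armed = false;
        t->wakeups++;           /* stand-in for waking the bdi thread */
    }
}

/* Cancel a pending wake-up, e.g. just before writing back. */
void timer_delete(struct wakeup_timer_sketch *t)
{
    t->armed = false;
}
```

Marking an inode dirty then costs only the timer update on the hot path;
the wake-up itself happens once, when the interval has elapsed.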

writeback: move last_active to bdi

2010-08-07T16:53:56+00:00

Currently bdi threads use a local variable 'last_active' which stores the last
time the bdi thread did some useful work. Move this local variable to 'struct
bdi_writeback'. This is just preparation for later patches which will
make the forker thread decide when bdi threads should be killed.

Signed-off-by: Artem Bityutskiy 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Jens Axboe