linux-toradex.git/include/linux/backing-dev.h, branch v4.13-rc4

writeback: rework wb_[dec|inc]_stat family of functions

2017-07-12T23:26:05+00:00

Currently the writeback statistics code uses percpu counters to hold
various statistics.  Furthermore we have 2 families of functions - those
which disable local irq and those which don't and whose names begin
with double underscore.  However, they both end up calling
__add_wb_stat, which in turn calls percpu_counter_add_batch, which is
already irq-safe.

Exploiting this fact allows us to eliminate the __wb_* functions, since
they don't add any protection beyond what we already have.
Furthermore, refactor the wb_* functions to call __add_wb_stat directly
without the irq-disabling dance.  This will likely make code that
modifies the stat counters run faster.

While at it, also document why percpu_counter_add_batch is in fact
preempt- and irq-safe, since at least 3 people got confused.

Link: http://lkml.kernel.org/r/1498029937-27293-1-git-send-email-nborisov@suse.com
Signed-off-by: Nikolay Borisov 
Acked-by: Tejun Heo 
Reviewed-by: Jan Kara 
Cc: Josef Bacik 
Cc: Mel Gorman 
Cc: Jeff Layton 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds
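To make the irq-safety argument concrete, here is a minimal, single-threaded userspace model of the batching that percpu_counter_add_batch() performs, plus the shape of the inc/dec wrappers after this rework. Everything prefixed `model_`, the fixed CPU count, the explicit `cpu` argument and the batch value of 8 are hypothetical stand-ins for real per-CPU storage and for WB_STAT_BATCH. In the kernel the safety comes from preempt_disable(), from this_cpu_add() on the fast path, and from raw_spin_lock_irqsave() around the fold; the model only notes this in comments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: NR_CPUS_MODEL and the explicit cpu argument stand
 * in for real per-CPU storage and this_cpu_*() operations. */
#define NR_CPUS_MODEL 4
#define MODEL_WB_STAT_BATCH 8 /* illustrative; the kernel derives its batch from nr_cpu_ids */

struct pcpu_counter_model {
	int64_t count;                   /* shared total (lock-protected in the kernel) */
	int32_t counters[NR_CPUS_MODEL]; /* per-CPU deltas not yet folded in */
};

/* Mirrors the shape of percpu_counter_add_batch(): accumulate locally and
 * fold into the shared count only once |local delta| reaches the batch.
 * In the kernel the fast path uses the irq-safe this_cpu_add() and the
 * fold runs under raw_spin_lock_irqsave(), so callers need no extra
 * local_irq_save(). */
static void model_add_batch(struct pcpu_counter_model *c, int cpu,
			    int64_t amount, int32_t batch)
{
	int64_t local = c->counters[cpu] + amount;

	if (local >= batch || local <= -batch) {
		c->count += local;    /* fold the whole local delta */
		c->counters[cpu] = 0; /* this CPU's pending delta is now zero */
	} else {
		c->counters[cpu] = (int32_t)local;
	}
}

/* After the rework, inc/dec are thin wrappers with no irq disabling. */
static void model_inc_wb_stat(struct pcpu_counter_model *c, int cpu)
{
	model_add_batch(c, cpu, 1, MODEL_WB_STAT_BATCH);
}

static void model_dec_wb_stat(struct pcpu_counter_model *c, int cpu)
{
	model_add_batch(c, cpu, -1, MODEL_WB_STAT_BATCH);
}
```

Seven increments on one CPU stay local; the eighth reaches the batch and is folded into the shared count in one step, which is the only point where the kernel takes the irq-saving lock.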

include/linux/backing-dev.h: simplify wb_stat_sum

2017-07-10T23:32:32+00:00

wb_stat_sum() disables interrupts and calls __wb_stat_sum() which
eventually calls __percpu_counter_sum().  However, the percpu routine is
already irq-safe.  Simplify the code a bit by making wb_stat_sum()
directly call percpu_counter_sum_positive() and not disable interrupts.

Also remove the now-unneeded __wb_stat_sum(), which was just a wrapper
over percpu_counter_sum_positive().

Link: http://lkml.kernel.org/r/1498230681-29103-1-git-send-email-nborisov@suse.com
Signed-off-by: Nikolay Borisov 
Acked-by: Peter Zijlstra 
Cc: Tejun Heo 
Cc: Jan Kara 
Cc: Jens Axboe 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds
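A hedged userspace sketch of what the simplified wb_stat_sum() now reduces to: an exact sum of the shared count plus every CPU's pending delta, clamped at zero. The `model_` names and the fixed CPU count are hypothetical; the kernel's __percpu_counter_sum() walks the online CPUs while holding the counter's raw spinlock with irqsave, which is why the caller no longer needs to disable interrupts itself.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4 /* hypothetical fixed CPU count */

struct pcpu_counter_model {
	int64_t count;                   /* shared total */
	int32_t counters[NR_CPUS_MODEL]; /* per-CPU pending deltas */
};

/* Like percpu_counter_sum(): shared count plus every CPU's delta.
 * The kernel does this walk under an irq-saving spinlock. */
static int64_t model_sum(const struct pcpu_counter_model *c)
{
	int64_t ret = c->count;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		ret += c->counters[cpu];
	return ret;
}

/* Like percpu_counter_sum_positive(): clamp a transiently negative
 * total (deltas of opposite sign on different CPUs) to zero. */
static int64_t model_sum_positive(const struct pcpu_counter_model *c)
{
	int64_t ret = model_sum(c);

	return ret < 0 ? 0 : ret;
}

/* Post-patch wb_stat_sum() shape: a direct call, no irq dance. */
static int64_t model_wb_stat_sum(const struct pcpu_counter_model *c)
{
	return model_sum_positive(c);
}
```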

percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch

2017-06-20T19:42:32+00:00

Currently, percpu_counter_add is a wrapper around __percpu_counter_add,
which is preempt-safe due to explicit calls to preempt_disable.  Given
how the __ prefix is used in percpu-related interfaces, the naming
unfortunately creates the false sense that __percpu_counter_add is
less safe than percpu_counter_add.  In terms of context-safety,
they're equivalent.  The only difference is that the __ version takes
a batch parameter.

Make this a bit more explicit by just renaming __percpu_counter_add to
percpu_counter_add_batch.

This patch doesn't cause any functional changes.

tj: Minor updates to patch description for clarity.  Cosmetic
    indentation updates.

Signed-off-by: Nikolay Borisov 
Signed-off-by: Tejun Heo 
Cc: Chris Mason 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: Darrick J. Wong 
Cc: Jan Kara 
Cc: Jens Axboe 
Cc: linux-mm@kvack.org
Cc: "David S. Miller"
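The relationship after the rename can be sketched as follows: percpu_counter_add() simply forwards a default batch to percpu_counter_add_batch(), so the two have identical context-safety. This is a single-threaded userspace model; `model_counter_batch` stands in for the kernel's percpu_counter_batch and its value of 32 is illustrative only, as are the other `model_` names.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 2 /* hypothetical fixed CPU count */

/* Stands in for percpu_counter_batch; the value here is illustrative. */
static int32_t model_counter_batch = 32;

struct pcpu_counter_model {
	int64_t count;
	int32_t counters[NR_CPUS_MODEL];
};

/* The explicitly batched variant (formerly the __-prefixed name). */
static void model_counter_add_batch(struct pcpu_counter_model *c, int cpu,
				    int64_t amount, int32_t batch)
{
	int64_t local = c->counters[cpu] + amount;

	if (local >= batch || local <= -batch) {
		c->count += local;
		c->counters[cpu] = 0;
	} else {
		c->counters[cpu] = (int32_t)local;
	}
}

/* percpu_counter_add() shape: same safety, only supplies the default batch. */
static void model_counter_add(struct pcpu_counter_model *c, int cpu,
			      int64_t amount)
{
	model_counter_add_batch(c, cpu, amount, model_counter_batch);
}
```

The only observable difference is how soon a delta is folded: a caller passing a smaller batch trades more lock traffic for a more accurate shared count.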

bdi: Drop 'parent' argument from bdi_register[_va]()

2017-04-20T18:09:55+00:00

Drop 'parent' argument of bdi_register() and bdi_register_va().  It is
always NULL.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Jan Kara 
Signed-off-by: Jens Axboe

block: Remove unused functions

2017-04-20T18:09:55+00:00

Now that all backing_dev_info structures are allocated separately, we can
drop some unused functions.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Jan Kara 
Signed-off-by: Jens Axboe

bdi: Provide bdi_register_va() and bdi_alloc()

2017-04-20T18:09:55+00:00

Add a function that registers a bdi and takes a va_list instead of a
variable number of arguments.

Add bdi_alloc() as a simple wrapper for NUMA-unaware users allocating a BDI.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Jan Kara 
Signed-off-by: Jens Axboe
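C cannot forward a `...` argument list directly, so the usual pattern is a va_list worker plus a thin variadic front end; this commit applies it to bdi_register()/bdi_register_va(), and bdi_alloc() similarly wraps a node-aware allocator (bdi_alloc_node() in the kernel) with "no preferred node". The userspace sketch below models both patterns. `struct model_bdi`, the 32-byte name buffer, the -1 error return and MODEL_NUMA_NO_NODE are assumptions, and the kernel functions take further parameters (a gfp mask, and at this point a `parent` device) that are omitted here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NUMA_NO_NODE (-1) /* mirrors the kernel's NUMA_NO_NODE convention */

struct model_bdi {
	int node;      /* preferred NUMA node, or MODEL_NUMA_NO_NODE */
	char name[32]; /* e.g. "259:1", as seen under /sys/devices/virtual/bdi */
};

/* Node-aware allocator; plain calloc() here, the node is only recorded. */
static struct model_bdi *model_bdi_alloc_node(int node)
{
	struct model_bdi *bdi = calloc(1, sizeof(*bdi));

	if (bdi)
		bdi->node = node;
	return bdi;
}

/* bdi_alloc()-style wrapper for callers that don't care about NUMA. */
static struct model_bdi *model_bdi_alloc(void)
{
	return model_bdi_alloc_node(MODEL_NUMA_NO_NODE);
}

/* va_list worker: the one place that actually formats the name. */
static int model_bdi_register_va(struct model_bdi *bdi, const char *fmt,
				 va_list args)
{
	int n = vsnprintf(bdi->name, sizeof(bdi->name), fmt, args);

	return (n < 0 || (size_t)n >= sizeof(bdi->name)) ? -1 : 0;
}

/* Variadic front end: collects the arguments and forwards them. */
static int model_bdi_register(struct model_bdi *bdi, const char *fmt, ...)
{
	va_list args;
	int ret;

	va_start(args, fmt);
	ret = model_bdi_register_va(bdi, fmt, args);
	va_end(args);
	return ret;
}
```

A caller that is itself variadic can now build its own va_list and call the `_va` worker directly instead of duplicating the formatting logic.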

block: Get rid of blk_get_backing_dev_info()

2017-02-02T15:21:32+00:00

blk_get_backing_dev_info() is now a simple dereference. Remove that
function and simplify some code around that.

Signed-off-by: Jan Kara 
Signed-off-by: Jens Axboe

block: Dynamically allocate and refcount backing_dev_info

2017-02-02T15:20:50+00:00

Instead of storing backing_dev_info inside struct request_queue,
allocate it dynamically, reference count it, and free it when the last
reference is dropped. Currently only request_queue holds the reference
but in the following patch we add other users referencing
backing_dev_info.

Signed-off-by: Jan Kara 
Signed-off-by: Jens Axboe
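The lifetime rule described here (creator holds the first reference, other users take their own, the object is freed on the last put) can be sketched in userspace as below. The kernel uses a struct kref together with bdi_put(); the `model_` names, the plain `int` refcount (adequate only because this model is single-threaded) and the `model_bdi_frees` counter used to observe the free are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter so callers can observe when the object is freed. */
static int model_bdi_frees;

struct model_bdi {
	int refcnt; /* the kernel uses a struct kref; an int suffices single-threaded */
};

struct model_request_queue {
	struct model_bdi *backing_dev_info; /* now a pointer, not an embedded struct */
};

static struct model_bdi *model_bdi_alloc(void)
{
	struct model_bdi *bdi = calloc(1, sizeof(*bdi));

	if (bdi)
		bdi->refcnt = 1; /* reference owned by the creator (the request_queue) */
	return bdi;
}

/* Take an additional reference for another user. */
static struct model_bdi *model_bdi_get(struct model_bdi *bdi)
{
	bdi->refcnt++;
	return bdi;
}

/* Drop a reference; the last put frees the object. */
static void model_bdi_put(struct model_bdi *bdi)
{
	assert(bdi->refcnt > 0);
	if (--bdi->refcnt == 0) {
		model_bdi_frees++;
		free(bdi);
	}
}
```

Once the structure lives outside the queue, another user can keep it alive after the queue has dropped its reference, which an embedded struct could not allow.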

block: fix bdi vs gendisk lifetime mismatch

2016-08-04T20:19:16+00:00

The name for a bdi of a gendisk is derived from the gendisk's devt.
However, since the gendisk is destroyed before the bdi, it leaves a
window where a new gendisk could dynamically reuse the same devt while a
bdi with the same name is still live.  Arrange for the bdi to hold a
reference against its "owner" disk device while it is registered.
Otherwise we can hit sysfs duplicate name collisions like the following:

 WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80
 sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1'

 Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015
  0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec
  ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351
  0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8
 Call Trace:
  [] dump_stack+0x63/0x87
  [] __warn+0xd1/0xf0
  [] warn_slowpath_fmt+0x5f/0x80
  [] sysfs_warn_dup+0x64/0x80
  [] sysfs_create_dir_ns+0x7e/0x90
  [] kobject_add_internal+0xaa/0x320
  [] ? vsnprintf+0x34e/0x4d0
  [] kobject_add+0x75/0xd0
  [] ? mutex_lock+0x12/0x2f
  [] device_add+0x125/0x610
  [] device_create_groups_vargs+0xd8/0x100
  [] device_create_vargs+0x1c/0x20
  [] bdi_register+0x8c/0x180
  [] bdi_register_dev+0x27/0x30
  [] add_disk+0x175/0x4a0

Cc: 
Reported-by: Yi Zhang 
Tested-by: Yi Zhang 
Signed-off-by: Dan Williams 

Fixed up missing 0 return in bdi_register_owner().

Signed-off-by: Jens Axboe
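Why pinning the owner closes the window can be shown with a toy userspace model: one dynamic devt slot, one sysfs-like name slot, and a disk whose devt becomes reusable only when its last reference is dropped. Roughly, the kernel's bdi_register_owner() takes a device reference on the owner and bdi_unregister() drops it; the flags, structs and `model_` functions below are hypothetical simplifications of that, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global state: one devt ("259:1") and one bdi name slot. */
static bool devt_in_use;   /* is the devt allocated to some disk? */
static bool bdi_name_live; /* does a bdi named after that devt exist? */

struct model_disk { int refcnt; };
struct model_bdi  { struct model_disk *owner; };

/* Allocating a disk claims the devt; fails while it is still taken. */
static bool model_alloc_disk(struct model_disk *d)
{
	if (devt_in_use)
		return false;
	devt_in_use = true;
	d->refcnt = 1;
	return true;
}

/* In this model the devt is freed only when the last disk ref drops. */
static void model_put_disk(struct model_disk *d)
{
	if (--d->refcnt == 0)
		devt_in_use = false;
}

/* bdi_register_owner()-like: create the name and pin the owner disk. */
static bool model_bdi_register_owner(struct model_bdi *bdi,
				     struct model_disk *owner)
{
	if (bdi_name_live)
		return false; /* the sysfs duplicate-name collision */
	bdi_name_live = true;
	owner->refcnt++;
	bdi->owner = owner;
	return true;
}

/* bdi_unregister()-like: remove the name, then release the owner. */
static void model_bdi_unregister(struct model_bdi *bdi)
{
	bdi_name_live = false;
	if (bdi->owner) {
		model_put_disk(bdi->owner);
		bdi->owner = NULL;
	}
}
```

Tearing down the disk first no longer frees the devt while the bdi's name is still registered, so a new disk cannot pick up the same devt and collide with the old bdi's sysfs entry.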

mm, vmscan: move LRU lists to node

2016-07-28T23:07:41+00:00

This moves the LRU lists, and related data such as counters, tracing,
congestion tracking and writeback tracking, from the zone to the node.

Unfortunately, due to reclaim and compaction retry logic, it is
necessary to account for the number of LRU pages at both the zone and
the node level.  Most reclaim logic is based on the node counters, but
the retry logic uses the zone counters, which do not distinguish
inactive and active sizes.  It would be possible to leave the LRU
counters on a per-zone basis, but that requires a heavier calculation
across multiple cache lines, performed much more frequently than the
retry checks.

Other than the LRU counters, this is mostly a mechanical patch, but note
that it introduces a number of anomalies.  For example, the scans are
per-zone but use per-node counters.  We also mark a node as congested
when a zone is congested.  This causes weird problems that are fixed in
later patches, but it keeps this patch easier to review.

If there is excessive overhead on 32-bit systems due to the LRU lists
being per-node, there are two potential solutions:

1. Long-term isolation of highmem pages when reclaim is lowmem

   When pages are skipped, they are immediately added back onto the LRU
   list. If lowmem reclaim persists for long periods of time, the same
   highmem pages are continually scanned. The idea would be that lowmem
   reclaim keeps those pages on a separate list until a reclaim for
   highmem pages arrives that splices the highmem pages back onto the
   LRU. It could potentially be implemented similarly to the UNEVICTABLE
   list.

   That would reduce the skip rate, with the potential corner case that
   highmem pages have to be scanned and reclaimed to free lowmem slab pages.

2. Linear scan lowmem pages if the initial LRU shrink fails

   This will break LRU ordering but may be preferable and faster during
   memory pressure than skipping LRU pages.

Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman 
Acked-by: Johannes Weiner 
Acked-by: Vlastimil Babka 
Cc: Hillf Danton 
Cc: Joonsoo Kim 
Cc: Michal Hocko 
Cc: Minchan Kim 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds