linux-toradex.git/block/blk-cgroup.c, branch v3.10.89

blkcg: fix gendisk reference leak in blkg_conf_prep()

2015-08-10T19:20:30+00:00

commit 5f6c2d2b7dbb541c1e922538c49fa04c494ae3d7 upstream.

When a blkcg configuration is targeted to a partition rather than a
whole device, blkg_conf_prep fails with -EINVAL; unfortunately, it
forgets to put the gendisk ref in that case.  Fix it.
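
The leak pattern described above can be sketched in plain user-space C. This is only an illustrative model, not the kernel code: struct disk, disk_get(), disk_put(), conf_prep() and conf_finish() are hypothetical stand-ins, and the reference count is a bare integer.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a gendisk: only a reference count. */
struct disk {
    int refcount;
};

static struct disk *disk_get(struct disk *d) { d->refcount++; return d; }
static void disk_put(struct disk *d) { d->refcount--; }

/*
 * Model of a "prep" step: take a disk reference, then reject partitions.
 * partno != 0 means the target is a partition, not the whole device.
 * On success the caller owns the reference and must call conf_finish().
 */
static int conf_prep(struct disk *d, int partno, struct disk **out)
{
    struct disk *ref = disk_get(d);

    if (partno != 0) {
        disk_put(ref);      /* the put that the leaking path forgot */
        return -EINVAL;
    }
    *out = ref;
    return 0;
}

/* Releases the reference handed out by a successful conf_prep(). */
static void conf_finish(struct disk *d) { disk_put(d); }
```

In this model, a failed conf_prep() leaves the count where it started, while a successful one leaves exactly one reference for conf_finish() to drop.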

Signed-off-by: Tejun Heo 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

blkcg: don't call into policy draining if root_blkg is already gone

2014-09-17T16:04:02+00:00

commit 2a1b4cf2331d92bc009bf94fa02a24604cdaf24c upstream.

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL.  If someone else starts to drain
while the queue is in this state, the following oops happens.

  NULL pointer dereference at 0000000000000028
  IP: [] blk_throtl_drain+0x84/0x230
  PGD e4a1067 PUD b773067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
  CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
  RIP: 0010:[]  [] blk_throtl_drain+0x84/0x230
  RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
  R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
  FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
  Stack:
   ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
   ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
   ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
  Call Trace:
   [] blkcg_drain_queue+0x1f/0x60
   [] __blk_drain_queue+0x71/0x180
   [] blk_queue_bypass_start+0x6e/0xb0
   [] blkcg_deactivate_policy+0x38/0x120
   [] blk_throtl_exit+0x34/0x50
   [] blkcg_exit_queue+0x35/0x40
   [] blk_release_queue+0x26/0xd0
   [] kobject_cleanup+0x38/0x70
   [] kobject_put+0x28/0x60
   [] blk_put_queue+0x15/0x20
   [] scsi_device_dev_release_usercontext+0x16b/0x1c0
   [] execute_in_process_context+0x89/0xa0
   [] scsi_device_dev_release+0x1c/0x20
   [] device_release+0x32/0xa0
   [] kobject_cleanup+0x38/0x70
   [] kobject_put+0x28/0x60
   [] put_device+0x17/0x20
   [] __scsi_remove_device+0xa9/0xe0
   [] scsi_remove_device+0x2b/0x40
   [] sdev_store_delete+0x27/0x30
   [] dev_attr_store+0x18/0x30
   [] sysfs_kf_write+0x3e/0x50
   [] kernfs_fop_write+0xe7/0x170
   [] vfs_write+0xaf/0x1d0
   [] SyS_write+0x4d/0xc0
   [] system_call_fastpath+0x16/0x1b

776687bce42b ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skipping the call into policy draining if all the blkgs are
already gone.
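
A minimal user-space sketch of the guard described above, assuming a simplified queue that only carries a root_blkg pointer. The struct names, policy_drain() and drain_queue() are hypothetical and only mirror the shape of the fix.

```c
#include <stddef.h>

struct blkg_model { int pending; };                  /* hypothetical blkg stand-in */
struct queue_model { struct blkg_model *root_blkg; };

static int policy_drain_calls;

/* Dereferences q->root_blkg, so it must not run once the blkgs are gone. */
static void policy_drain(struct queue_model *q)
{
    policy_drain_calls++;
    q->root_blkg->pending = 0;
}

static void drain_queue(struct queue_model *q)
{
    /* Queue is being torn down: all blkgs destroyed, nothing to drain. */
    if (!q->root_blkg)
        return;

    policy_drain(q);
}
```

Without the NULL check, a drain that starts after the blkgs were destroyed would dereference a NULL root_blkg, which matches the kind of oops shown in the trace.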

Signed-off-by: Tejun Heo 
Reported-by: Shirish Pargaonkar 
Reported-by: Sasha Levin 
Reported-by: Jet Chen 
Tested-by: Shirish Pargaonkar 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

blkcg: don't call into policy draining if root_blkg is already gone

2014-07-31T19:53:49+00:00

commit 0b462c89e31f7eb6789713437eb551833ee16ff3 upstream.

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL.  If someone else starts to drain
while the queue is in this state, the following oops happens.

  NULL pointer dereference at 0000000000000028
  IP: [] blk_throtl_drain+0x84/0x230
  PGD e4a1067 PUD b773067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
  CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
  RIP: 0010:[]  [] blk_throtl_drain+0x84/0x230
  RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
  R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
  FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
  Stack:
   ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
   ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
   ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
  Call Trace:
   [] blkcg_drain_queue+0x1f/0x60
   [] __blk_drain_queue+0x71/0x180
   [] blk_queue_bypass_start+0x6e/0xb0
   [] blkcg_deactivate_policy+0x38/0x120
   [] blk_throtl_exit+0x34/0x50
   [] blkcg_exit_queue+0x35/0x40
   [] blk_release_queue+0x26/0xd0
   [] kobject_cleanup+0x38/0x70
   [] kobject_put+0x28/0x60
   [] blk_put_queue+0x15/0x20
   [] scsi_device_dev_release_usercontext+0x16b/0x1c0
   [] execute_in_process_context+0x89/0xa0
   [] scsi_device_dev_release+0x1c/0x20
   [] device_release+0x32/0xa0
   [] kobject_cleanup+0x38/0x70
   [] kobject_put+0x28/0x60
   [] put_device+0x17/0x20
   [] __scsi_remove_device+0xa9/0xe0
   [] scsi_remove_device+0x2b/0x40
   [] sdev_store_delete+0x27/0x30
   [] dev_attr_store+0x18/0x30
   [] sysfs_kf_write+0x3e/0x50
   [] kernfs_fop_write+0xe7/0x170
   [] vfs_write+0xaf/0x1d0
   [] SyS_write+0x4d/0xc0
   [] system_call_fastpath+0x16/0x1b

776687bce42b ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skipping the call into policy draining if all the blkgs are
already gone.

Signed-off-by: Tejun Heo 
Reported-by: Shirish Pargaonkar 
Reported-by: Sasha Levin 
Reported-by: Jet Chen 
Tested-by: Shirish Pargaonkar 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

blkcg: fix "scheduling while atomic" in blk_queue_bypass_start

2013-04-09T13:01:21+00:00

Since 749fefe677 in v3.7 ("block: lift the initial queue bypass mode
on blk_register_queue() instead of blk_init_allocated_queue()"),
the warning shown below appears when multipath is used with CONFIG_PREEMPT=y.

This patch moves blk_queue_bypass_start() before radix_tree_preload()
to avoid the sleeping call while preemption is disabled.
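
The reordering can be illustrated with a small model. It assumes only what the message and trace state: the preload step leaves preemption disabled, and the bypass step may sleep (through synchronize_rcu()). Every name here is hypothetical, and the "atomic" state is just a counter.

```c
static int preempt_count_model;   /* > 0 means "atomic": sleeping is forbidden */
static int sleep_violations;      /* counts "scheduling while atomic" events */

/* Stand-ins for a preload step that returns with preemption disabled. */
static void preload_begin(void) { preempt_count_model++; }
static void preload_end(void)   { preempt_count_model--; }

/* Stand-in for a call that may sleep. */
static void bypass_start(void)
{
    if (preempt_count_model > 0)
        sleep_violations++;
}

/* Old order: the sleeping call runs inside the atomic section. */
static void activate_buggy(void)
{
    preload_begin();
    bypass_start();
    preload_end();
}

/* New order: the sleeping call is made first, then preemption is disabled. */
static void activate_fixed(void)
{
    bypass_start();
    preload_begin();
    preload_end();
}
```

Calling activate_fixed() records no violation, while activate_buggy() records one, which is the situation the warning reports.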

  BUG: scheduling while atomic: multipath/2460/0x00000002
  1 lock held by multipath/2460:
   #0:  (&md->type_lock){......}, at: [] dm_lock_md_type+0x17/0x19 [dm_mod]
  Modules linked in: ...
  Pid: 2460, comm: multipath Tainted: G        W    3.7.0-rc2 #1
  Call Trace:
   [] __schedule_bug+0x6a/0x78
   [] __schedule+0xb4/0x5e0
   [] schedule+0x64/0x66
   [] schedule_timeout+0x39/0xf8
   [] ? put_lock_stats+0xe/0x29
   [] ? lock_release_holdtime+0xb6/0xbb
   [] wait_for_common+0x9d/0xee
   [] ? try_to_wake_up+0x206/0x206
   [] ? kfree_call_rcu+0x1c/0x1c
   [] wait_for_completion+0x1d/0x1f
   [] wait_rcu_gp+0x5d/0x7a
   [] ? wait_rcu_gp+0x7a/0x7a
   [] ? complete+0x21/0x53
   [] synchronize_rcu+0x1e/0x20
   [] blk_queue_bypass_start+0x5d/0x62
   [] blkcg_activate_policy+0x73/0x270
   [] ? kmem_cache_alloc_node_trace+0xc7/0x108
   [] cfq_init_queue+0x80/0x28e
   [] ? dm_blk_ioctl+0xa7/0xa7 [dm_mod]
   [] elevator_init+0xe1/0x115
   [] ? blk_queue_make_request+0x54/0x59
   [] blk_init_allocated_queue+0x8c/0x9e
   [] dm_setup_md_queue+0x36/0xaa [dm_mod]
   [] table_load+0x1bd/0x2c8 [dm_mod]
   [] ctl_ioctl+0x1d6/0x236 [dm_mod]
   [] ? table_clear+0xaa/0xaa [dm_mod]
   [] dm_ctl_ioctl+0x13/0x17 [dm_mod]
   [] do_vfs_ioctl+0x3fb/0x441
   [] ? file_has_perm+0x8a/0x99
   [] sys_ioctl+0x5e/0x82
   [] ? trace_hardirqs_on_thunk+0x3a/0x3f
   [] system_call_fastpath+0x16/0x1b

Signed-off-by: Jun'ichi Nomura 
Acked-by: Vivek Goyal 
Acked-by: Tejun Heo 
Cc: Alasdair G Kergon 
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block

2013-02-28T20:52:24+00:00

Pull block IO core bits from Jens Axboe:
 "Below are the core block IO bits for 3.9.  It was delayed a few days
  since my workstation kept crashing every 2-8h after pulling it into
  current -git, but turns out it is a bug in the new pstate code (divide
  by zero, will report separately).  In any case, it contains:

   - The big cfq/blkcg update from Tejun and Vivek.

   - Additional block and writeback tracepoints from Tejun.

   - Improvement of the should sort (based on queues) logic in the plug
     flushing.

   - _io() variants of the wait_for_completion() interface, using
     io_schedule() instead of schedule() to contribute to io wait
     properly.

   - Various little fixes.

  You'll get two trivial merge conflicts, which should be easy enough to
  fix up"

Fix up the trivial conflicts due to hlist traversal cleanups (commit
b67bfe0d42ca: "hlist: drop the node parameter from iterators").

* 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
  block: remove redundant check to bd_openers()
  block: use i_size_write() in bd_set_size()
  cfq: fix lock imbalance with failed allocations
  drivers/block/swim3.c: fix null pointer dereference
  block: don't select PERCPU_RWSEM
  block: account iowait time when waiting for completion of IO request
  sched: add wait_for_completion_io[_timeout]
  writeback: add more tracepoints
  block: add block_{touch|dirty}_buffer tracepoint
  buffer: make touch_buffer() an exported function
  block: add @req to bio_{front|back}_merge tracepoints
  block: add missing block_bio_complete() tracepoint
  block: Remove should_sort judgement when flush blk_plug
  block,elevator: use new hashtable implementation
  cfq-iosched: add hierarchical cfq_group statistics
  cfq-iosched: collect stats from dead cfqgs
  cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
  blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
  block: RCU free request_queue
  blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
  ...

hlist: drop the node parameter from iterators

2013-02-28T03:10:24+00:00

I'm not sure why, but the hlist for each entry iterators were conceived
differently from the list ones. The list iterators take three arguments:

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
do they not really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
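
As an illustration of why the extra cursor is unnecessary, here is a user-space sketch of a three-argument iterator over a simplified singly linked hash-list node. It is not the code from linux/list.h: struct hnode, struct hhead, entry_of(), entry_or_null() and for_each_entry() are hypothetical names, and the macro assumes the __typeof__ extension available in GCC and Clang.

```c
#include <stddef.h>

/* Simplified node and head; the names are hypothetical. */
struct hnode { struct hnode *next; };
struct hhead { struct hnode *first; };

/* Recover the containing structure from a pointer to its embedded node. */
#define entry_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Same, but map a NULL node to a NULL entry so the loop can terminate. */
#define entry_or_null(ptr, type, member) \
    ((ptr) ? entry_of(ptr, type, member) : NULL)

/* Three arguments: the entry pointer itself serves as the loop cursor. */
#define for_each_entry(pos, head, member)                                     \
    for (pos = entry_or_null((head)->first, __typeof__(*(pos)), member);      \
         pos;                                                                 \
         pos = entry_or_null((pos)->member.next, __typeof__(*(pos)), member))

struct item {
    int value;
    struct hnode link;
};

/* Walks the list with only the entry pointer, no separate node variable. */
static int sum_items(struct hhead *head)
{
    struct item *it;
    int sum = 0;

    for_each_entry(it, head, link)
        sum += it->value;
    return sum;
}
```

The next node is always reachable as pos->member.next, so the loop needs no separate node variable, and the call site reads like the list iterator.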

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small number of places were using the 'node' parameter; these
 were modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch, which is mostly the work of Peter Senna Tschudin, is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin 
Acked-by: Paul E. McKenney 
Signed-off-by: Sasha Levin 
Cc: Wu Fengguang 
Cc: Marcelo Tosatti 
Cc: Gleb Natapov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock

2013-01-09T16:05:13+00:00

Instead of holding blkcg->lock while walking ->blkg_list and executing
prfill(), RCU walk ->blkg_list and hold the blkg's queue lock while
executing prfill().  This makes prfill() implementations easier as
stats are mostly protected by queue lock.
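
A rough sketch of the locking shape described above, with the RCU walk reduced to a plain list traversal and the queue lock reduced to a flag. All names are hypothetical; the point is only that the callback runs with its own queue's lock held rather than one lock covering the whole walk.

```c
#include <stddef.h>

struct q_lock_model { int held; };            /* stands in for a queue lock */

struct blkg_entry {
    struct q_lock_model *q;                   /* queue this entry belongs to */
    unsigned long stat;
    struct blkg_entry *next;
};

typedef unsigned long (*prfill_fn)(const struct blkg_entry *);

/* Walk the list and call prfill() on each entry under that entry's queue lock. */
static unsigned long print_blkgs(struct blkg_entry *list, prfill_fn prfill)
{
    unsigned long total = 0;

    for (struct blkg_entry *e = list; e; e = e->next) {
        e->q->held = 1;                       /* "lock" the entry's queue */
        total += prfill(e);
        e->q->held = 0;                       /* "unlock" it again */
    }
    return total;
}

static int prfill_unlocked_calls;

/* Example callback: reads the stat and notes any call made without the lock. */
static unsigned long prfill_stat(const struct blkg_entry *e)
{
    if (!e->q->held)
        prfill_unlocked_calls++;
    return e->stat;
}
```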

This will be used to implement hierarchical stats.

Signed-off-by: Tejun Heo 
Acked-by: Vivek Goyal

blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()

2013-01-09T16:05:12+00:00

Implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge().
The former two collect the [rw]stats designated by the target policy
data and offset from the pd's subtree.  The latter two add one
[rw]stat to another.

Note that the recursive sum functions require the queue lock to be
held on entry to make blkg online test reliable.  This is necessary to
properly handle stats of a dying blkg.
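
The idea of a recursive sum over a subtree plus a merge helper can be sketched as follows. This is a simplified model under stated assumptions: struct group, its fixed-size child array, and both function names are hypothetical, the locking requirement is not modelled, and this sketch chooses to count the starting group unconditionally and each descendant only while it is online.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 4

/* Hypothetical group node carrying a single counter. */
struct group {
    uint64_t stat;
    bool online;
    struct group *children[MAX_CHILDREN];
    size_t nr_children;
};

/* Add one counter into another. */
static void stat_merge(uint64_t *to, const uint64_t *from)
{
    *to += *from;
}

/* Sum the counters of all online descendants of g (g itself excluded). */
static uint64_t sum_online_descendants(const struct group *g)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < g->nr_children; i++) {
        const struct group *child = g->children[i];

        if (child->online)
            sum += child->stat;
        /* An offline group is skipped, but its subtree is still visited. */
        sum += sum_online_descendants(child);
    }
    return sum;
}

/* The group's own counter plus those of its online descendants. */
static uint64_t stat_recursive_sum(const struct group *g)
{
    return g->stat + sum_online_descendants(g);
}
```

In the real code the online test is only reliable under the queue lock, as the note above says; a single-threaded model like this one cannot show that race.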

These will be used to implement hierarchical stats.

Signed-off-by: Tejun Heo 
Acked-by: Vivek Goyal

blkcg: export __blkg_prfill_rwstat()

2013-01-09T16:05:12+00:00

Hierarchical stats for cfq-iosched will need __blkg_prfill_rwstat().
Export it.

Signed-off-by: Tejun Heo 
Reported-by: Fengguang Wu

blkcg: implement blkcg_policy->on/offline_pd_fn() and blkcg_gq->online

2013-01-09T16:05:12+00:00

Add two blkcg_policy methods, ->online_pd_fn() and ->offline_pd_fn(),
which are invoked as the policy_data gets activated and deactivated
while holding both blkcg and q locks.

Also, add blkcg_gq->online bool, which is set and cleared as the
blkcg_gq gets activated and deactivated.  This flag is also toggled
while holding both blkcg and q locks.
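
A minimal sketch of the callback-plus-flag arrangement, assuming a policy object with two optional hooks. The struct layouts, gq_activate() and gq_deactivate() are hypothetical, the ordering of the flag update relative to the hook is a choice of this sketch, and the two locks are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

struct gq_model;

/* Hypothetical policy with optional online/offline hooks. */
struct policy_model {
    void (*online_pd_fn)(struct gq_model *gq);
    void (*offline_pd_fn)(struct gq_model *gq);
};

struct gq_model {
    bool online;                       /* set on activation, cleared on deactivation */
    const struct policy_model *pol;
    int online_calls;                  /* bookkeeping for this example only */
    int offline_calls;
};

/* In the kernel, both steps would run with the blkcg and queue locks held. */
static void gq_activate(struct gq_model *gq)
{
    gq->online = true;
    if (gq->pol && gq->pol->online_pd_fn)
        gq->pol->online_pd_fn(gq);
}

static void gq_deactivate(struct gq_model *gq)
{
    if (gq->pol && gq->pol->offline_pd_fn)
        gq->pol->offline_pd_fn(gq);
    gq->online = false;
}

/* Example hooks that only count how often they were invoked. */
static void count_online(struct gq_model *gq)  { gq->online_calls++; }
static void count_offline(struct gq_model *gq) { gq->offline_calls++; }
```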

These will be used to implement hierarchical stats.

Signed-off-by: Tejun Heo 
Acked-by: Vivek Goyal