linux-toradex.git/block, branch v3.10.58

genhd: fix leftover might_sleep() in blk_free_devt()

2014-10-05T21:54:13+00:00

commit 46f341ffcfb5d8530f7d1e60f3be06cce6661b62 upstream.

Commit 2da78092 changed the locking from a mutex to a spinlock,
so we now longer sleep in this context. But there was a leftover
might_sleep() in there, which now triggers since we do the final
free from an RCU callback. Get rid of it.

Reported-by: Pontus Fuchs 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

block: Fix dev_t minor allocation lifetime

2014-10-05T21:54:12+00:00

commit 2da78092dda13f1efd26edbbf99a567776913750 upstream.

Releases the dev_t minor when all references are closed to prevent
another device from acquiring the same major/minor.

Since the partition's release may be invoked from call_rcu's soft-irq
context, the ext_dev_idr's mutex had to be replaced with a spinlock so
as not so sleep.

Signed-off-by: Keith Busch 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

cfq-iosched: Fix wrong children_weight calculation

2014-10-05T21:54:08+00:00

commit e15693ef18e13e3e6bffe891fe140f18b8ff6d07 upstream.

cfq_group_service_tree_add() is applying new_weight at the beginning of
the function via cfq_update_group_weight().
This actually allows weight to change between adding it to and subtracting
it from children_weight, and triggers WARN_ON_ONCE() in
cfq_group_service_tree_del(), or even causes oops by divide error during
vfr calculation in cfq_group_service_tree_add().

The detailed scenario is as follows:
1. Create blkio cgroups X and Y as a child of X.
   Set X's weight to 500 and perform some I/O to apply new_weight.
   This X's I/O completes before starting Y's I/O.
2. Y starts I/O and cfq_group_service_tree_add() is called with Y.
3. cfq_group_service_tree_add() walks up the tree during children_weight
   calculation and adds parent X's weight (500) to children_weight of root.
   children_weight becomes 500.
4. Set X's weight to 1000.
5. X starts I/O and cfq_group_service_tree_add() is called with X.
6. cfq_group_service_tree_add() applies its new_weight (1000).
7. I/O of Y completes and cfq_group_service_tree_del() is called with Y.
8. I/O of X completes and cfq_group_service_tree_del() is called with X.
9. cfq_group_service_tree_del() subtracts X's weight (1000) from
   children_weight of root. children_weight becomes -500.
   This triggers WARN_ON_ONCE().
10. Set X's weight to 500.
11. X starts I/O and cfq_group_service_tree_add() is called with X.
12. cfq_group_service_tree_add() applies its new_weight (500) and adds it
    to children_weight of root. children_weight becomes 0. Calcularion of
    vfr triggers oops by divide error.

weight should be updated right before adding it to children_weight.

Reported-by: Ruki Sekiya 
Signed-off-by: Toshiaki Makita 
Acked-by: Tejun Heo 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

blkcg: don't call into policy draining if root_blkg is already gone

2014-09-17T16:04:02+00:00

commit 2a1b4cf2331d92bc009bf94fa02a24604cdaf24c upstream.

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL.  If someone else starts to drain
while the queue is in this state, the following oops happens.

  NULL pointer dereference at 0000000000000028
  IP: [] blk_throtl_drain+0x84/0x230
  PGD e4a1067 PUD b773067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
  CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
  RIP: 0010:[]  [] blk_throtl_drain+0x84/0x230
  RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
  R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
  FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
  Stack:
   ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
   ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
   ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
  Call Trace:
   [] blkcg_drain_queue+0x1f/0x60
   [] __blk_drain_queue+0x71/0x180
   [] blk_queue_bypass_start+0x6e/0xb0
   [] blkcg_deactivate_policy+0x38/0x120
   [] blk_throtl_exit+0x34/0x50
   [] blkcg_exit_queue+0x35/0x40
   [] blk_release_queue+0x26/0xd0
   [] kobject_cleanup+0x38/0x70
   [] kobject_put+0x28/0x60
   [] blk_put_queue+0x15/0x20
   [] scsi_device_dev_release_usercontext+0x16b/0x1c0
   [] execute_in_process_context+0x89/0xa0
   [] scsi_device_dev_release+0x1c/0x20
   [] device_release+0x32/0xa0
   [] kobject_cleanup+0x38/0x70
   [] kobject_put+0x28/0x60
   [] put_device+0x17/0x20
   [] __scsi_remove_device+0xa9/0xe0
   [] scsi_remove_device+0x2b/0x40
   [] sdev_store_delete+0x27/0x30
   [] dev_attr_store+0x18/0x30
   [] sysfs_kf_write+0x3e/0x50
   [] kernfs_fop_write+0xe7/0x170
   [] vfs_write+0xaf/0x1d0
   [] SyS_write+0x4d/0xc0
   [] system_call_fastpath+0x16/0x1b

776687bce42b ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skippping calling into policy draining if all the blkgs are
already gone.

Signed-off-by: Tejun Heo 
Reported-by: Shirish Pargaonkar 
Reported-by: Sasha Levin 
Reported-by: Jet Chen 
Tested-by: Shirish Pargaonkar 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

blkcg: don't call into policy draining if root_blkg is already gone

2014-07-31T19:53:49+00:00

commit 0b462c89e31f7eb6789713437eb551833ee16ff3 upstream.

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL.  If someone else starts to drain
while the queue is in this state, the following oops happens.

  NULL pointer dereference at 0000000000000028
  IP: [] blk_throtl_drain+0x84/0x230
  PGD e4a1067 PUD b773067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
  CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
  RIP: 0010:[]  [] blk_throtl_drain+0x84/0x230
  RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
  R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
  FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
  Stack:
   ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
   ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
   ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
  Call Trace:
   [] blkcg_drain_queue+0x1f/0x60
   [] __blk_drain_queue+0x71/0x180
   [] blk_queue_bypass_start+0x6e/0xb0
   [] blkcg_deactivate_policy+0x38/0x120
   [] blk_throtl_exit+0x34/0x50
   [] blkcg_exit_queue+0x35/0x40
   [] blk_release_queue+0x26/0xd0
   [] kobject_cleanup+0x38/0x70
   [] kobject_put+0x28/0x60
   [] blk_put_queue+0x15/0x20
   [] scsi_device_dev_release_usercontext+0x16b/0x1c0
   [] execute_in_process_context+0x89/0xa0
   [] scsi_device_dev_release+0x1c/0x20
   [] device_release+0x32/0xa0
   [] kobject_cleanup+0x38/0x70
   [] kobject_put+0x28/0x60
   [] put_device+0x17/0x20
   [] __scsi_remove_device+0xa9/0xe0
   [] scsi_remove_device+0x2b/0x40
   [] sdev_store_delete+0x27/0x30
   [] dev_attr_store+0x18/0x30
   [] sysfs_kf_write+0x3e/0x50
   [] kernfs_fop_write+0xe7/0x170
   [] vfs_write+0xaf/0x1d0
   [] SyS_write+0x4d/0xc0
   [] system_call_fastpath+0x16/0x1b

776687bce42b ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skippping calling into policy draining if all the blkgs are
already gone.

Signed-off-by: Tejun Heo 
Reported-by: Shirish Pargaonkar 
Reported-by: Sasha Levin 
Reported-by: Jet Chen 
Tested-by: Shirish Pargaonkar 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

block: don't assume last put of shared tags is for the host

2014-07-31T19:53:48+00:00

commit d45b3279a5a2252cafcd665bbf2db8c9b31ef783 upstream.

There is no inherent reason why the last put of a tag structure must be
the one for the Scsi_Host, as device model objects can be held for
arbitrary periods.  Merge blk_free_tags and __blk_free_tags into a single
funtion that just release a references and get rid of the BUG() when the
host reference wasn't the last.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

block: provide compat ioctl for BLKZEROOUT

2014-07-31T19:53:48+00:00

commit 3b3a1814d1703027f9867d0f5cbbfaf6c7482474 upstream.

This patch provides the compat BLKZEROOUT ioctl. The argument is a pointer
to two uint64_t values, so there is no need to translate it.

Signed-off-by: Mikulas Patocka 
Acked-by: Martin K. Petersen 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

blktrace: fix accounting of partially completed requests

2014-05-31T04:52:11+00:00

commit af5040da01ef980670b3741b3e10733ee3e33566 upstream.

trace_block_rq_complete does not take into account that request can
be partially completed, so we can get the following incorrect output
of blkparser:

  C   R 232 + 240 [0]
  C   R 240 + 232 [0]
  C   R 248 + 224 [0]
  C   R 256 + 216 [0]

but should be:

  C   R 232 + 8 [0]
  C   R 240 + 8 [0]
  C   R 248 + 8 [0]
  C   R 256 + 8 [0]

Also, the whole output summary statistics of completed requests and
final throughput will be incorrect.

This patch takes into account real completion size of the request and
fixes wrong completion accounting.

Signed-off-by: Roman Pen 
CC: Steven Rostedt 
CC: Frederic Weisbecker 
CC: Ingo Molnar 
CC: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

block: add cond_resched() to potentially long running ioctl discard loop

2014-02-22T20:41:28+00:00

commit c8123f8c9cb517403b51aa41c3c46ff5e10b2c17 upstream.

When mkfs issues a full device discard and the device only
supports discards of a smallish size, we can loop in
blkdev_issue_discard() for a long time. If preempt isn't enabled,
this can turn into a softlock situation and the kernel will
start complaining.

Add an explicit cond_resched() at the end of the loop to avoid
that.

Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

block: __elv_next_request() shouldn't call into the elevator if bypassing

2014-02-22T20:41:28+00:00

commit 556ee818c06f37b2e583af0363e6b16d0e0270de upstream.

request_queue bypassing is used to suppress higher-level function of a
request_queue so that they can be switched, reconfigured and shut
down.  A request_queue does the followings while bypassing.

* bypasses elevator and io_cq association and queues requests directly
  to the FIFO dispatch queue.

* bypasses block cgroup request_list lookup and always uses the root
  request_list.

Once confirmed to be bypassing, specific elevator and block cgroup
policy implementations can assume that nothing is in flight for them
and perform various operations which would be dangerous otherwise.

Such confirmation is acheived by short-circuiting all new requests
directly to the dispatch queue and waiting for all the requests which
were issued before to finish.  Unfortunately, while the request
allocating and draining sides were properly handled, we forgot to
actually plug the request dispatch path.  Even after bypassing mode is
confirmed, if the attached driver tries to fetch a request and the
dispatch queue is empty, __elv_next_request() would invoke the current
elevator's elevator_dispatch_fn() callback.  As all in-flight requests
were drained, the elevator wouldn't contain any request but once
bypass is confirmed we don't even know whether the elevator is even
there.  It might be in the process of being switched and half torn
down.

Frank Mayhar reports that this actually happened while switching
elevators, leading to an oops.

Let's fix it by making __elv_next_request() avoid invoking the
elevator_dispatch_fn() callback if the queue is bypassing.  It already
avoids invoking the callback if the queue is dying.  As a dying queue
is guaranteed to be bypassing, we can simply replace blk_queue_dying()
check with blk_queue_bypass().

Reported-by: Frank Mayhar 
References: http://lkml.kernel.org/g/1390319905.20232.38.camel@bobble.lax.corp.google.com
Tested-by: Frank Mayhar 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman