<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/block/elevator.c, branch v3.2-rc5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown</title>
<updated>2011-10-19T12:42:16+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-10-19T12:42:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c9a929dde3913780b5c416f4bb9d9ed804f509ce'/>
<id>c9a929dde3913780b5c416f4bb9d9ed804f509ce</id>
<content type='text'>
request_queue is refcounted but actually depdends on lifetime
management from the queue owner - on blk_cleanup_queue(), block layer
expects that there's no request passing through request_queue and no
new one will.

This is fundamentally broken.  The queue owner (e.g. SCSI layer)
doesn't have a way to know whether there are other active users before
calling blk_cleanup_queue() and other users (e.g. bsg) don't have any
guarantee that the queue is and would stay valid while it's holding a
reference.

With delay added in blk_queue_bio() before queue_lock is grabbed, the
following oops can be easily triggered when a device is removed with
in-flight IOs.

 sd 0:0:1:0: [sdb] Stopping disk
 ata1.01: disabled
 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 648, comm: test_rawio Not tainted 3.1.0-rc3-work+ #56 Bochs Bochs
 RIP: 0010:[&lt;ffffffff8137d651&gt;]  [&lt;ffffffff8137d651&gt;] elv_rqhash_find+0x61/0x100
 ...
 Process test_rawio (pid: 648, threadinfo ffff880019efa000, task ffff880019ef8a80)
 ...
 Call Trace:
  [&lt;ffffffff8137d774&gt;] elv_merge+0x84/0xe0
  [&lt;ffffffff81385b54&gt;] blk_queue_bio+0xf4/0x400
  [&lt;ffffffff813838ea&gt;] generic_make_request+0xca/0x100
  [&lt;ffffffff81383994&gt;] submit_bio+0x74/0x100
  [&lt;ffffffff811c53ec&gt;] dio_bio_submit+0xbc/0xc0
  [&lt;ffffffff811c610e&gt;] __blockdev_direct_IO+0x92e/0xb40
  [&lt;ffffffff811c39f7&gt;] blkdev_direct_IO+0x57/0x60
  [&lt;ffffffff8113b1c5&gt;] generic_file_aio_read+0x6d5/0x760
  [&lt;ffffffff8118c1ca&gt;] do_sync_read+0xda/0x120
  [&lt;ffffffff8118ce55&gt;] vfs_read+0xc5/0x180
  [&lt;ffffffff8118cfaa&gt;] sys_pread64+0x9a/0xb0
  [&lt;ffffffff81afaf6b&gt;] system_call_fastpath+0x16/0x1b

This happens because blk_queue_cleanup() destroys the queue and
elevator whether IOs are in progress or not and DEAD tests are
sprinkled in the request processing path without proper
synchronization.

Similar problem exists for blk-throtl.  On queue cleanup, blk-throtl
is shutdown whether it has requests in it or not.  Depending on
timing, it either oopses or throttled bios are lost putting tasks
which are waiting for bio completion into eternal D state.

The way it should work is having the usual clear distinction between
shutdown and release.  Shutdown drains all currently pending requests,
marks the queue dead, and performs partial teardown of the now
unnecessary part of the queue.  Even after shutdown is complete,
reference holders are still allowed to issue requests to the queue
although they will be immmediately failed.  The rest of teardown
happens on release.

This patch makes the following changes to make blk_queue_cleanup()
behave as proper shutdown.

* QUEUE_FLAG_DEAD is now set while holding both q-&gt;exit_mutex and
  queue_lock.

* Unsynchronized DEAD check in generic_make_request_checks() removed.
  This couldn't make any meaningful difference as the queue could die
  after the check.

* blk_drain_queue() updated such that it can drain all requests and is
  now called during cleanup.

* blk_throtl updated such that it checks DEAD on grabbing queue_lock,
  drains all throttled bios during cleanup and free td when queue is
  released.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
request_queue is refcounted but actually depdends on lifetime
management from the queue owner - on blk_cleanup_queue(), block layer
expects that there's no request passing through request_queue and no
new one will.

This is fundamentally broken.  The queue owner (e.g. SCSI layer)
doesn't have a way to know whether there are other active users before
calling blk_cleanup_queue() and other users (e.g. bsg) don't have any
guarantee that the queue is and would stay valid while it's holding a
reference.

With delay added in blk_queue_bio() before queue_lock is grabbed, the
following oops can be easily triggered when a device is removed with
in-flight IOs.

 sd 0:0:1:0: [sdb] Stopping disk
 ata1.01: disabled
 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:

 Pid: 648, comm: test_rawio Not tainted 3.1.0-rc3-work+ #56 Bochs Bochs
 RIP: 0010:[&lt;ffffffff8137d651&gt;]  [&lt;ffffffff8137d651&gt;] elv_rqhash_find+0x61/0x100
 ...
 Process test_rawio (pid: 648, threadinfo ffff880019efa000, task ffff880019ef8a80)
 ...
 Call Trace:
  [&lt;ffffffff8137d774&gt;] elv_merge+0x84/0xe0
  [&lt;ffffffff81385b54&gt;] blk_queue_bio+0xf4/0x400
  [&lt;ffffffff813838ea&gt;] generic_make_request+0xca/0x100
  [&lt;ffffffff81383994&gt;] submit_bio+0x74/0x100
  [&lt;ffffffff811c53ec&gt;] dio_bio_submit+0xbc/0xc0
  [&lt;ffffffff811c610e&gt;] __blockdev_direct_IO+0x92e/0xb40
  [&lt;ffffffff811c39f7&gt;] blkdev_direct_IO+0x57/0x60
  [&lt;ffffffff8113b1c5&gt;] generic_file_aio_read+0x6d5/0x760
  [&lt;ffffffff8118c1ca&gt;] do_sync_read+0xda/0x120
  [&lt;ffffffff8118ce55&gt;] vfs_read+0xc5/0x180
  [&lt;ffffffff8118cfaa&gt;] sys_pread64+0x9a/0xb0
  [&lt;ffffffff81afaf6b&gt;] system_call_fastpath+0x16/0x1b

This happens because blk_queue_cleanup() destroys the queue and
elevator whether IOs are in progress or not and DEAD tests are
sprinkled in the request processing path without proper
synchronization.

Similar problem exists for blk-throtl.  On queue cleanup, blk-throtl
is shutdown whether it has requests in it or not.  Depending on
timing, it either oopses or throttled bios are lost putting tasks
which are waiting for bio completion into eternal D state.

The way it should work is having the usual clear distinction between
shutdown and release.  Shutdown drains all currently pending requests,
marks the queue dead, and performs partial teardown of the now
unnecessary part of the queue.  Even after shutdown is complete,
reference holders are still allowed to issue requests to the queue
although they will be immmediately failed.  The rest of teardown
happens on release.

This patch makes the following changes to make blk_queue_cleanup()
behave as proper shutdown.

* QUEUE_FLAG_DEAD is now set while holding both q-&gt;exit_mutex and
  queue_lock.

* Unsynchronized DEAD check in generic_make_request_checks() removed.
  This couldn't make any meaningful difference as the queue could die
  after the check.

* blk_drain_queue() updated such that it can drain all requests and is
  now called during cleanup.

* blk_throtl updated such that it checks DEAD on grabbing queue_lock,
  drains all throttled bios during cleanup and free td when queue is
  released.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: reorganize queue draining</title>
<updated>2011-10-19T12:32:38+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2011-10-19T12:32:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e3c78ca524d230bc145e902625e88c392a58ddf3'/>
<id>e3c78ca524d230bc145e902625e88c392a58ddf3</id>
<content type='text'>
Reorganize queue draining related code in preparation of queue exit
changes.

* Factor out actual draining from elv_quiesce_start() to
  blk_drain_queue().

* Make elv_quiesce_start/end() responsible for their own locking.

* Replace open-coded ELVSWITCH clearing in elevator_switch() with
  elv_quiesce_end().

This patch doesn't cause any visible functional difference.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reorganize queue draining related code in preparation of queue exit
changes.

* Factor out actual draining from elv_quiesce_start() to
  blk_drain_queue().

* Make elv_quiesce_start/end() responsible for their own locking.

* Replace open-coded ELVSWITCH clearing in elevator_switch() with
  elv_quiesce_end().

This patch doesn't cause any visible functional difference.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>elevator: use ELV_NAME_MAX instead of magic number 16 for chosen_elevator</title>
<updated>2011-09-12T06:59:20+00:00</updated>
<author>
<name>Wang Sheng-Hui</name>
<email>shhuiw@gmail.com</email>
</author>
<published>2011-09-08T10:32:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=484fc254b88257a2d8b3759aa062e8e8b35e0988'/>
<id>484fc254b88257a2d8b3759aa062e8e8b35e0988</id>
<content type='text'>
We have ELV_NAME_MAX defined to 16, and hence we should use it
instead of the magic nubmer 16 for elevator's name string.

Signed-off-by: Wang Sheng-Hui &lt;shhuiw@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have ELV_NAME_MAX defined to 16, and hence we should use it
instead of the magic nubmer 16 for elevator's name string.

Signed-off-by: Wang Sheng-Hui &lt;shhuiw@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iosched: prevent aliased requests from starving other I/O</title>
<updated>2011-06-02T19:19:05+00:00</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2011-06-02T19:19:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=796d5116c407690b14fd5bda136aa67a39e7061a'/>
<id>796d5116c407690b14fd5bda136aa67a39e7061a</id>
<content type='text'>
Hi, Jens,

If you recall, I posted an RFC patch for this back in July of last year:
http://lkml.org/lkml/2010/7/13/279

The basic problem is that a process can issue a never-ending stream of
async direct I/Os to the same sector on a device, thus starving out
other I/O in the system (due to the way the alias handling works in both
cfq and deadline).  The solution I proposed back then was to start
dispatching from the fifo after a certain number of aliases had been
dispatched.  Vivek asked why we had to treat aliases differently at all,
and I never had a good answer.  So, I put together a simple patch which
allows aliases to be added to the rb tree (it adds them to the right,
though that doesn't matter as the order isn't guaranteed anyway).  I
think this is the preferred solution, as it doesn't break up time slices
in CFQ or batches in deadline.  I've tested it, and it does solve the
starvation issue.  Let me know what you think.

Cheers,
Jeff

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hi, Jens,

If you recall, I posted an RFC patch for this back in July of last year:
http://lkml.org/lkml/2010/7/13/279

The basic problem is that a process can issue a never-ending stream of
async direct I/Os to the same sector on a device, thus starving out
other I/O in the system (due to the way the alias handling works in both
cfq and deadline).  The solution I proposed back then was to start
dispatching from the fifo after a certain number of aliases had been
dispatched.  Vivek asked why we had to treat aliases differently at all,
and I never had a good answer.  So, I put together a simple patch which
allows aliases to be added to the rb tree (it adds them to the right,
though that doesn't matter as the order isn't guaranteed anyway).  I
think this is the preferred solution, as it doesn't break up time slices
in CFQ or batches in deadline.  I've tested it, and it does solve the
starvation issue.  Let me know what you think.

Cheers,
Jeff

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: get rid of on-stack plugging debug checks</title>
<updated>2011-05-20T18:52:16+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2011-05-20T18:52:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=771949d03b4f5295f648f09141325fd478f6c7ce'/>
<id>771949d03b4f5295f648f09141325fd478f6c7ce</id>
<content type='text'>
We don't need them anymore, so kill:

- REQ_ON_PLUG checks in various places
- !rq_mergeable() check in plug merging

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We don't need them anymore, so kill:

- REQ_ON_PLUG checks in various places
- !rq_mergeable() check in plug merging

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge commit 'v2.6.39' into for-2.6.40/core</title>
<updated>2011-05-20T18:33:15+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2011-05-20T18:33:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=698567f3fa790fea37509a54dea855302dd88331'/>
<id>698567f3fa790fea37509a54dea855302dd88331</id>
<content type='text'>
Since for-2.6.40/core was forked off the 2.6.39 devel tree, we've
had churn in the core area that makes it difficult to handle
patches for eg cfq or blk-throttle. Instead of requiring that they
be based in older versions with bugs that have been fixed later
in the rc cycle, merge in 2.6.39 final.

Also fixes up conflicts in the below files.

Conflicts:
	drivers/block/paride/pcd.c
	drivers/cdrom/viocd.c
	drivers/ide/ide-cd.c

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since for-2.6.40/core was forked off the 2.6.39 devel tree, we've
had churn in the core area that makes it difficult to handle
patches for eg cfq or blk-throttle. Instead of requiring that they
be based in older versions with bugs that have been fixed later
in the rc cycle, merge in 2.6.39 final.

Also fixes up conflicts in the below files.

Conflicts:
	drivers/block/paride/pcd.c
	drivers/cdrom/viocd.c
	drivers/ide/ide-cd.c

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iosched: remove redundant sprintf</title>
<updated>2011-05-06T00:02:12+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees.cook@canonical.com</email>
</author>
<published>2011-05-06T00:02:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=490b94be0282c3b67f56453628ff0aaae827a670'/>
<id>490b94be0282c3b67f56453628ff0aaae827a670</id>
<content type='text'>
After the anticipatory scheduler was dropped, there was no need to
special-case the request_module string. As such, drop the redundant
sprintf and stack variable.

Signed-off-by: Kees Cook &lt;kees.cook@canonical.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After the anticipatory scheduler was dropped, there was no need to
special-case the request_module string. As such, drop the redundant
sprintf and stack variable.

Signed-off-by: Kees Cook &lt;kees.cook@canonical.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too</title>
<updated>2011-04-21T17:28:35+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2011-04-21T17:28:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3aa72873ffdcc2f7919743efbbefc351ec73f5cb'/>
<id>3aa72873ffdcc2f7919743efbbefc351ec73f5cb</id>
<content type='text'>
The sort insert is the one that goes to the IO scheduler. With
the SORT_MERGE addition, we could bypass IO scheduler setup
but still ask the IO scheduler to insert the request. This would
cause an oops on switching IO schedulers through the sysfs
interface, unless the disk just happened to be idle while it
occured.

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The sort insert is the one that goes to the IO scheduler. With
the SORT_MERGE addition, we could bypass IO scheduler setup
but still ask the IO scheduler to insert the request. This would
cause an oops on switching IO schedulers through the sysfs
interface, unless the disk just happened to be idle while it
occured.

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: add blk_run_queue_async</title>
<updated>2011-04-18T09:41:33+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2011-04-18T09:41:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=24ecfbe27f65563909b14492afda2f1c21f7c044'/>
<id>24ecfbe27f65563909b14492afda2f1c21f7c044</id>
<content type='text'>
Instead of overloading __blk_run_queue to force an offload to kblockd
add a new blk_run_queue_async helper to do it explicitly.  I've kept
the blk_queue_stopped check for now, but I suspect it's not needed
as the check we do when the workqueue items runs should be enough.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of overloading __blk_run_queue to force an offload to kblockd
add a new blk_run_queue_async helper to do it explicitly.  I've kept
the blk_queue_stopped check for now, but I suspect it's not needed
as the check we do when the workqueue items runs should be enough.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: get rid of elv_insert() interface</title>
<updated>2011-04-05T21:51:37+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jaxboe@fusionio.com</email>
</author>
<published>2011-03-30T07:52:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b710a480554f2be682bac3cb59b0e085ba3d644b'/>
<id>b710a480554f2be682bac3cb59b0e085ba3d644b</id>
<content type='text'>
Merge it with __elv_add_request(), it's pretty pointless to
have a function with only two callers. The main interface
is elv_add_request()/__elv_add_request().

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merge it with __elv_add_request(), it's pretty pointless to
have a function with only two callers. The main interface
is elv_add_request()/__elv_add_request().

Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
