summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2011-03-12blk-cgroup: Add unaccounted time to timeslice_used.Justin TerAvest
There are two kind of times that tasks are not charged for: the first seek and the extra time slice used over the allocated timeslice. Both of these exported as a new unaccounted_time stat. I think it would be good to have this reported in 'time' as well, but that is probably a separate discussion. Signed-off-by: Justin TerAvest <teravest@google.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-11block: fixup plugging stubs for !CONFIG_BLOCKJens Axboe
They used an older prototype, fix it up. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-11block: remove obsolete comments for blkdev_issue_zeroout.Tao Ma
barrier is already removed, so remove the obsolete comments in blkdev_issue_zeroout. Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-11blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.Tao Ma
In blk_add_trace_rq, we only chose the minor 2 bits from request's cmd_flags and did some check for discard. so most of other flags(e.g, REQ_SYNC) are missing. For example, with a sync write after blkparse we get: 8,16 1 1 0.001776503 7509 A WS 1349632 + 1024 <- (8,17) 1347584 8,16 1 2 0.001776813 7509 Q WS 1349632 + 1024 [dd] 8,16 1 3 0.001780395 7509 G WS 1349632 + 1024 [dd] 8,16 1 5 0.001783186 7509 I W 1349632 + 1024 [dd] 8,16 1 11 0.001816987 7509 D W 1349632 + 1024 [dd] 8,16 0 2 0.006218192 0 C W 1349632 + 1024 [0] Since now we have integrated the flags of both bio and request, it is safe to pass rq->cmd_flags directly to __blk_add_trace. With this patch, after a sync write we get: 8,16 1 1 0.001776900 5425 A WS 1189888 + 1024 <- (8,17) 1187840 8,16 1 2 0.001777179 5425 Q WS 1189888 + 1024 [dd] 8,16 1 3 0.001780797 5425 G WS 1189888 + 1024 [dd] 8,16 1 5 0.001783402 5425 I WS 1189888 + 1024 [dd] 8,16 1 11 0.001817468 5425 D WS 1189888 + 1024 [dd] 8,16 0 2 0.005640709 0 C WS 1189888 + 1024 [0] Signed-off-by: Tao Ma <boyu.mt@taobao.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/coreJens Axboe
Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10blk-throttle: Use blk_plug in throttle dispatchVivek Goyal
Use plug in throttle dispatch also as we are dispatching a bunch of bios in throttle context and some of them might merge. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10block: kill off REQ_UNPLUGJens Axboe
With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10aio: remove request submission batchingJens Axboe
This should be useless now that we have on-stack plugging. So lets just kill it. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10fs: make aio plugShaohua Li
Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10fs: make mpage read/write_pages() plugJens Axboe
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10read-ahead: use pluggingJens Axboe
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10fs: make generic file read/write functions plugJens Axboe
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10block: remove per-queue pluggingJens Axboe
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10block: initial patch for on-stack per-task pluggingJens Axboe
This patch adds support for creating a queuing context outside of the queue itself. This enables us to batch up pieces of IO before grabbing the block device queue lock and submitting them to the IO scheduler. The context is created on the stack of the process and assigned in the task structure, so that we can auto-unplug it if we hit a schedule event. The current queue plugging happens implicitly if IO is submitted to an empty device, yet callers have to remember to unplug that IO when they are going to wait for it. This is an ugly API and has caused bugs in the past. Additionally, it requires hacks in the vm (->sync_page() callback) to handle that logic. By switching to an explicit plugging scheme we make the API a lot nicer and can get rid of the ->sync_page() hack in the vm. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10scsi: convert to blk_delay_queue()Jens Axboe
It was always abuse to reuse the plugging infrastructure for this, convert it to the (new) real API for delaying queueing a bit. A default delay of 3 msec is defined, to match the previous behaviour. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10ide-cd: convert to blk_delay_queue() for a short pauseJens Axboe
It was always abuse to reuse the plugging infrastructure for this, convert it to the (new) real API for delaying queueing a bit. Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Acked-by: David S. Miller <davem@davemloft.net>
2011-03-10block: add API for delaying work/request_fn a little bitJens Axboe
Currently we use plugging for that, but as plugging is going away, we need an alternative mechanism. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-09staging: Convert to bdops->check_events()Tejun Heo
Convert two staging drivers - blkvsc_drv and cyasblkdev_block - from ->media_changed() to ->check_events(). The former always indicated media changed while the latter always indicated media not changed. Not sure what the drivers are trying to achieve but keep the original behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-09pktcdvd: Convert to bdops->check_events()Tejun Heo
Convert from ->media_changed() to ->check_events(). pktcdvd needs to forward all event related operations to the underlying device. Forward ->check_events() instead of ->media_changed() and inherit disk->[async_]events. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Peter Osterlund <petero2@telia.com>
2011-03-09umem: Drop dummy ->media_changed()Tejun Heo
umem doesn't implement media changed detection and there's no need to implement dummy callback anymore. Remove it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-09s390/tape_block: Convert to bdops->check_events()Tejun Heo
Convert from ->media_changed() to ->check_events(). s390/tape_block buffers media changed state and clears it on revalidation. It will behave correctly with kernel event polling. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
2011-03-09i2o_block: Convert to bdops->check_events()Tejun Heo
Convert from ->media_changed() to ->check_events(). i2o_block buffers media changed state and clears it after reporting. It will behave correctly with kernel event polling. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
2011-03-09xsysace: Convert to bdops->check_events()Tejun Heo
Convert from ->media_changed() to ->check_events(). xsysace buffers media changed state and clears it on revalidation. It will behave correctly with kernel event polling. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-09ub: Convert to bdops->check_events()Tejun Heo
Convert from ->media_changed() to ->check_events(). ub buffers media changed state and clears it on revalidation. It will behave correctly with kernel event polling. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Pete Zaitcev <zaitcev@redhat.com>
2011-03-09swim[3]: Convert to bdops->check_events()Tejun Heo
Convert from ->media_changed() to ->check_events(). Both swim and swim3 buffer media changed state and clear it on revalidation. They will behave correctly with kernel event polling. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Laurent Vivier <laurent@lvivier.info> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-03-09dac960: Convert to bdops->check_events()Tejun Heo
Convert from ->media_changed() to ->check_events(). DAC960 media change notification seems to be one way (once set, never cleared) and will generate spurious events when polled once the condition triggers. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-09paride: Convert to bdops->check_events()Tejun Heo
Convert paride drivers from ->media_changed() to ->check_events(). pcd and pd buffer and clear events after reporting; however, pf unconditionally reports MEDIA_CHANGE and will generate spurious events when polled. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Tim Waugh <tim@cyberelk.net>
2011-03-09gdrom,viocd: Convert to bdops->check_events()Tejun Heo
Convert gdrom and viocd from ->media_changed() to ->check_events(). It's unclear how the conditions are cleared and it's possible that it may generate spurious events when polled. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-09floppy,{ami|ata}flop: Convert to bdops->check_events()Tejun Heo
Convert the floppy drivers from ->media_changed() to ->check_events(). Both floppy and ataflop buffer media changed state bit and clear them on revalidation and will behave correctly with kernel event polling. I can't tell how amiflop clears its event and it's possible that it may generate spurious events when polled. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-09ide: Convert to bdops->check_events()Tejun Heo
Convert ->media_changed() to the new ->check_events() method. The conversion is mostly mechanical. The only notable change is that cdrom now doesn't generate any event if @slot_nr isn't CDSL_CURRENT. It used to return -EINVAL which would be treated as media changed. As media changer isn't supported anyway, this doesn't make any difference. This makes ide emit the standard disk events and allows kernel event polling. Currently, only MEDIA_CHANGE event is implemented. Adding support for EJECT_REQUEST shouldn't be difficult; however, given that ide driver is already deprecated, it probably is best to leave it alone. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-ide@vger.kernel.org
2011-03-09block: Don't check events while open is in progressTejun Heo
Not all block drivers clear events immediately after reporting. Some do so in ->revalidate_disk() or other steps during ->open(). There is a slim chance event poll may happen between the clearing event check from check_disk_change() and the actual clearing of the events which would result in spurious events. Block event checks while block device open is in progress. There is no need to kick explicit event check afterwards as events are always checked during open. -v2: The original patch could have called disk_unblock_events() with an already released or %NULL @disk causing oops. Fixed by making sure references are put after disk_unblock_events() is called. It also makes the error path of __blkdev_get() a bit simpler. This problem was reported by Jens. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-09block: Don't check events on close unless it was blockedTejun Heo
The block event mechanism currently always checks events when the device is being closed regardless of the open mode. The intention was to allow detection of EJECT_REQUEST when a device is closed whether disk event polling is enabled or not. This is unnecessary as, for devices of interest, events are checked from either userland or kernel and in the former case ->check_events() is performed on open of each poll attempt anyway. Furthermore, this unconditional event check on close makes the code susceptible to event loop if the block driver doesn't clear reported events correctly - an event triggers userland to open and close the device which in turn causes another event, rinse and repeat. Check events on close only if it was blocked by excl write open. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-09block: Don't implicitly trigger event check on disk_unblock_events()Tejun Heo
Currently, disk_unblock_events() implicitly kick event check if the block count reaches zero. This behavior is not described in the comment and hinders with future changes. Make the unblocker explicitly check events by calling disk_check_events() as necessary. This patch doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-08blk-cgroup: Lower minimum weight from 100 to 10.Justin TerAvest
We've found that we still get good, useful isolation at weights this low. I'd like to adjust the minimum so that any other changes can take these values into account. Signed-off-by: Justin TerAvest <teravest@google.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-08block: biovec_slab vs. CONFIG_BLK_DEV_INTEGRITYMartin K. Petersen
The block integrity subsystem no longer uses the bio_vec slabs so this code can safely be compiled in. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-07blk-throttle: Some cleanups and race fixes in limit update codeVivek Goyal
When throttle group limits are updated through cgroups, a thread is woken up to process these updates. While reviewing that code, oleg noted couple of race conditions existed in the code and he also suggested that code can be simplified. This patch fixes the races simplifies the code based on Oleg's suggestions: - Use xchg(). - Introduced a common function throtl_update_blkio_group_common() which is shared now by all iops/bps update functions. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Fixed a merge issue, throtl_schedule_delayed_work() takes throtl_data as the argument now, not the queue. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-07blk-throttle: process limit change only through one functionVivek Goyal
With the help of cgroup interface one can go and upate the bps/iops limits of existing group. Once the limits are udpated, a thread is woken up to see if some blocked group needs recalculation based on new limits and needs to be requeued. There was also a piece of code where I was checking for group limit update when a fresh bio comes in. This patch gets rid of that piece of code and keeps processing the limit change at one place throtl_process_limit_change(). It just keeps the code simple and easy to understand. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-07Merge branch 'block-for-2.6.39-core' of ↵Jens Axboe
ssh://master.kernel.org/pub/scm/linux/kernel/git/tj/misc into for-2.6.39/core
2011-03-07cfq-iosched: Fix update_vdisktime logicGui Jianfeng
The update_vdisktime logic is broken since commit b54ce60eb7f61f8e314b8b241b0469eda3bb1d42, st->min_vdisktime never makes a progress. Fix it. Thanks Vivek for pointing it out. Signed-off-by: Gui Jianfeng <guijianfen@cn.fujitsu.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-07cfq-iosched: give busy sync queue no dispatch limitShaohua Li
If there are a sync and an async queue and the sync queue's think time is small, we can ignore the sync queue's dispatch quantum. Because the sync queue will always preempt the async queue, we don't need to care about async's latency. This can fix a performance regression of aiostress test, which is introduced by commit f8ae6e3eb825. The issue should exist even without the commit, but the commit amplifies the impact. The initial post does the same optimization for RT queue too, but since I have no real workload for it, Vivek suggests to drop it. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-07cfq-iosched: fix race in cfq_set_request()Jens Axboe
We need to hold the queue lock over the reference increment, it's not atomic anymore. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-04Merge branch 'for-linus' of ../linux-2.6-block into block-for-2.6.39/coreTejun Heo
This merge creates two set of conflicts. One is simple context conflicts caused by removal of throtl_scheduled_delayed_work() in for-linus and removal of throtl_shutdown_timer_wq() in for-2.6.39/core. The other is caused by commit 255bb490c8 (block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue()) in for-linus crashing with FLUSH reimplementation in for-2.6.39/core. The conflict isn't trivial but the resolution is straight-forward. * __blk_run_queue() calls in flush_end_io() and flush_data_end_io() should be called with @force_kblockd set to %true. * elv_insert() in blk_kick_flush() should use %ELEVATOR_INSERT_REQUEUE. Both changes are to avoid invoking ->request_fn() directly from request completion path and closely match the changes in the commit 255bb490c8. Signed-off-by: Tejun Heo <tj@kernel.org>
2011-03-03block: kill loop_mutexPetr Uzel
Following steps lead to deadlock in kernel: dd if=/dev/zero of=img bs=512 count=1000 losetup -f img mkfs.ext2 /dev/loop0 mount -t ext2 -o loop /dev/loop0 mnt umount mnt/ Stacktrace: [<c102ec04>] irq_exit+0x36/0x59 [<c101502c>] smp_apic_timer_interrupt+0x6b/0x75 [<c127f639>] apic_timer_interrupt+0x31/0x38 [<c101df88>] mutex_spin_on_owner+0x54/0x5b [<fe2250e9>] lo_release+0x12/0x67 [loop] [<c10c4eae>] __blkdev_put+0x7c/0x10c [<c10a4da5>] fput+0xd5/0x1aa [<fe2250cf>] loop_clr_fd+0x1a9/0x1b1 [loop] [<fe225110>] lo_release+0x39/0x67 [loop] [<c10c4eae>] __blkdev_put+0x7c/0x10c [<c10a59d9>] deactivate_locked_super+0x17/0x36 [<c10b6f37>] sys_umount+0x27e/0x2a5 [<c10b6f69>] sys_oldumount+0xb/0xe [<c1002897>] sysenter_do_call+0x12/0x26 [<ffffffff>] 0xffffffff Regression since 2a48fc0ab24241755dc9, which introduced the private loop_mutex as part of the BKL removal process. As per [1], the mutex can be safely removed. [1] http://www.gossamer-threads.com/lists/linux/kernel/1341930 Addresses: https://bugzilla.novell.com/show_bug.cgi?id=669394 Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=29172 Signed-off-by: Petr Uzel <petr.uzel@suse.cz> Cc: stable@kernel.org Reviewed-by: Nikanth Karthikesan <knikanth@suse.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-03blktrace: Remove blk_fill_rwbs_rq.Tao Ma
If we enable trace events to trace block actions, We use blk_fill_rwbs_rq to analyze the corresponding actions in request's cmd_flags, but we only choose the minor 2 bits from it, so most of other flags(e.g, REQ_SYNC) are missing. For example, with a sync write we get: write_test-2409 [001] 160.013869: block_rq_insert: 3,64 W 0 () 258135 + = 8 [write_test] Since now we have integrated the flags of both bio and request, it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and blk_fill_rwbs_rq isn't needed any more. With this patch, after a sync write we get: write_test-2417 [000] 226.603878: block_rq_insert: 3,64 WS 0 () 258135 += 8 [write_test] Signed-off-by: Tao Ma <boyu.mt@taobao.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02block: Move blk_throtl_exit() call to blk_cleanup_queue()Vivek Goyal
Move blk_throtl_exit() in blk_cleanup_queue() as blk_throtl_exit() is written in such a way that it needs queue lock. In blk_release_queue() there is no gurantee that ->queue_lock is still around. Initially blk_throtl_exit() was in blk_cleanup_queue() but Ingo reported one problem. https://lkml.org/lkml/2010/10/23/86 And a quick fix moved blk_throtl_exit() to blk_release_queue(). commit 7ad58c028652753814054f4e3ac58f925e7343f4 Author: Jens Axboe <jaxboe@fusionio.com> Date: Sat Oct 23 20:40:26 2010 +0200 block: fix use-after-free bug in blk throttle code This patch reverts above change and does not try to shutdown the throtl work in blk_sync_queue(). By avoiding call to throtl_shutdown_timer_wq() from blk_sync_queue(), we should also avoid the problem reported by Ingo. blk_sync_queue() seems to be used only by md driver and it seems to be using it to make sure q->unplug_fn is not called as md registers its own unplug functions and it is about to free up the data structures used by unplug_fn(). Block throttle does not call back into unplug_fn() or into md. So there is no need to cancel blk throttle work. In fact I think cancelling block throttle work is bad because it might happen that some bios are throttled and scheduled to be dispatched later with the help of pending work and if work is cancelled, these bios might never be dispatched. Block layer also uses blk_sync_queue() during blk_cleanup_queue() and blk_release_queue() time. That should be safe as we are also calling blk_throtl_exit() which should make sure all the throttling related data structures are cleaned up. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02loop: No need to initialize ->queue_lock explicitly before calling ↵Vivek Goyal
blk_cleanup_queue() Now we initialize ->queue_lock at queue allocation time so driver does not have to worry about initializing it before calling blk_cleanup_queue(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02block: Initialize ->queue_lock to internal lock at queue allocation timeVivek Goyal
There does not seem to be a clear convention whether q->queue_lock is initialized or not when blk_cleanup_queue() is called. In the past it was not necessary but now blk_throtl_exit() takes up queue lock by default and needs queue lock to be available. In fact elevator_exit() code also has similar requirement just that it is less stringent in the sense that elevator_exit() is called only if elevator is initialized. Two problems have been noticed because of ambiguity about spin lock status. - If a driver calls blk_alloc_queue() and then soon calls blk_cleanup_queue() almost immediately, (because some other driver structure allocation failed or some other error happened) then blk_throtl_exit() will run into issues as queue lock is not initialized. Loop driver ran into this issue recently and I noticed error paths in md driver too. Similar error paths should exist in other drivers too. - If some driver provided external spin lock and zapped the lock before blk_cleanup_queue(), then it can lead to issues. So this patch initializes the default queue lock at queue allocation time. block throttling code is one of the users of queue lock and it is initialized at the queue allocation time, so it makes sense to initialize ->queue_lock also to internal lock. A driver can overide that lock later. This will take care of the issue where a driver does not have to worry about initializing the queue lock to default before calling blk_cleanup_queue() Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02block/genhd: Change some numerals into macrosLiu Yuan
Rename the numerals in the diskstats_show() into the macros. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue()Tejun Heo
blk-flush decomposes a flush into sequence of multiple requests. On completion of a request, the next one is queued; however, block layer must not implicitly call into q->request_fn() directly from completion path. This makes the queue behave unexpectedly when seen from the drivers and violates the assumption that q->request_fn() is called with process context + queue_lock. This patch makes blk-flush the following two changes to make sure q->request_fn() is not called directly from request completion path. - blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always use kblockd instead of calling directly into q->request_fn(). - queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the request queue directly. Reported by Jan in the following threads. http://thread.gmane.org/gmane.linux.ide/48778 http://thread.gmane.org/gmane.linux.ide/48786 stable: applicable to v2.6.37. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jan Beulich <JBeulich@novell.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02block: add @force_kblockd to __blk_run_queue()Tejun Heo
__blk_run_queue() automatically either calls q->request_fn() directly or schedules kblockd depending on whether the function is recursed. blk-flush implementation needs to be able to explicitly choose kblockd. Add @force_kblockd. All the current users are converted to specify %false for the parameter and this patch doesn't introduce any behavior change. stable: This is prerequisite for fixing ide oops caused by the new blk-flush implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>