summaryrefslogtreecommitdiff
path: root/block/blk-cgroup.h
AgeCommit message (Collapse)Author
2010-10-01blkio-throttle: limit max iops value to UINT_MAXVivek Goyal
- Limit max iops value to UINT_MAX and return error to user if value is more than that instead of accepting bigger values and truncating implicitly. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-01blkio: Recalculate the throttled bio dispatch time upon throttle limit changeVivek Goyal
o Currently any cgroup throttle limit changes are processed asynchronousy and the change does not take affect till a new bio is dispatched from same group. o It might happen that a user sets a redicuously low limit on throttling. Say 1 bytes per second on reads. In such cases simple operations like mount a disk can wait for a very long time. o Once bio is throttled, there is no easy way to come out of that wait even if user increases the read limit later. o This patch fixes it. Now if a user changes the cgroup limits, we recalculate the bio dispatch time according to new limits. o Can't take queueu lock under blkcg_lock, hence after the change I wake up the dispatch thread again which recalculates the time. So there are some variables being synchronized across two threads without lock and I had to make use of barriers. Hoping I have used barriers correctly. Any review of memory barrier code especially will help. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-16blk-cgroup: cgroup changes for IOPS limit supportVivek Goyal
o cgroup changes for IOPS throttling rules. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-16blk-cgroup: Introduce cgroup changes for throttling policyVivek Goyal
o cgroup chagnes for throttle policy. o Introduces READ and WRITE bytes per second throttling rules. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-16blk-cgroup: Prepare the base for supporting more than one IO control policiesVivek Goyal
o This patch prepares the base for introducing new IO control policies. Currently all the code is written knowing there is only one policy and that is proportional bandwidth. Creating infrastructure for newer policies to come in. o Also there were many functions which were generated using macro. It was very confusing. Got rid of those. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-04-26blk-cgroup: config options re-arrangementVivek Goyal
This patch fixes few usability and configurability issues. o All the cgroup based controller options are configurable from "Genral Setup/Control Group Support/" menu. blkio is the only exception. Hence make this option visible in above menu and make it configurable from there to bring it inline with rest of the cgroup based controllers. o Get rid of CONFIG_DEBUG_CFQ_IOSCHED. This option currently does two things. - Enable printing of cgroup paths in blktrace - Enables CONFIG_DEBUG_BLK_CGROUP, which in turn displays additional stat files in cgroup. If we are using group scheduling, blktrace data is of not really much use if cgroup information is not present. To get this data, currently one has to also enable CONFIG_DEBUG_CFQ_IOSCHED, which in turn brings the overhead of all the additional debug stat files which is not desired. Hence, this patch moves printing of cgroup paths under CONFIG_CFQ_GROUP_IOSCHED. This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely. Now all the debug stat files are controlled only by CONFIG_DEBUG_BLK_CGROUP which can be enabled through config menu. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Divyesh Shah <dpshah@google.com> Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-26blkio: Fix another BUG_ON() crash due to cfqq movement across groupsVivek Goyal
o Once in a while, I was hitting a BUG_ON() in blkio code. empty_time was assuming that upon slice expiry, group can't be marked empty already (except forced dispatch). But this assumption is broken if cfqq can move (group_isolation=0) across groups after receiving a request. I think most likely in this case we got a request in a cfqq and accounted the rq in one group, later while adding the cfqq to tree, we moved the queue to a different group which was already marked empty and after dispatch from slice we found group already marked empty and raised alarm. This patch does not error out if group is already marked empty. This can introduce some empty_time stat error only in case of group_isolation=0. This is better than crashing. In case of group_isolation=1 we should still get same stats as before this patch. [ 222.308546] ------------[ cut here ]------------ [ 222.309311] kernel BUG at block/blk-cgroup.c:236! [ 222.309311] invalid opcode: 0000 [#1] SMP [ 222.309311] last sysfs file: /sys/devices/virtual/block/dm-3/queue/scheduler [ 222.309311] CPU 1 [ 222.309311] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] [ 222.309311] [ 222.309311] Pid: 4780, comm: fio Not tainted 2.6.34-rc4-blkio-config #68 0A98h/HP xw8600 Workstation [ 222.309311] RIP: 0010:[<ffffffff8121ad88>] [<ffffffff8121ad88>] blkiocg_set_start_empty_time+0x50/0x83 [ 222.309311] RSP: 0018:ffff8800ba6e79f8 EFLAGS: 00010002 [ 222.309311] RAX: 0000000000000082 RBX: ffff8800a13b7990 RCX: ffff8800a13b7808 [ 222.309311] RDX: 0000000000002121 RSI: 0000000000000082 RDI: ffff8800a13b7a30 [ 222.309311] RBP: ffff8800ba6e7a18 R08: 0000000000000000 R09: 0000000000000001 [ 222.309311] R10: 000000000002f8c8 R11: ffff8800ba6e7ad8 R12: ffff8800a13b78ff [ 222.309311] R13: ffff8800a13b7990 R14: 0000000000000001 R15: ffff8800a13b7808 [ 222.309311] FS: 00007f3beec476f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000 [ 222.309311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 222.309311] CR2: 000000000040e7f0 CR3: 00000000a12d5000 CR4: 00000000000006e0 [ 222.309311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 222.309311] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 222.309311] Process fio (pid: 4780, threadinfo ffff8800ba6e6000, task ffff8800b3d6bf00) [ 222.309311] Stack: [ 222.309311] 0000000000000001 ffff8800bab17a48 ffff8800bab17a48 ffff8800a13b7800 [ 222.309311] <0> ffff8800ba6e7a68 ffffffff8121da35 ffff880000000001 00ff8800ba5c5698 [ 222.309311] <0> ffff8800ba6e7a68 ffff8800a13b7800 0000000000000000 ffff8800bab17a48 [ 222.309311] Call Trace: [ 222.309311] [<ffffffff8121da35>] __cfq_slice_expired+0x2af/0x3ec [ 222.309311] [<ffffffff8121fd7b>] cfq_dispatch_requests+0x2c8/0x8e8 [ 222.309311] [<ffffffff8120f1cd>] ? spin_unlock_irqrestore+0xe/0x10 [ 222.309311] [<ffffffff8120fb1a>] ? blk_insert_cloned_request+0x70/0x7b [ 222.309311] [<ffffffff81210461>] blk_peek_request+0x191/0x1a7 [ 222.309311] [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod] [ 222.309311] [<ffffffff810ae61f>] ? sync_page_killable+0x0/0x35 [ 222.309311] [<ffffffff81210fd4>] __generic_unplug_device+0x32/0x37 [ 222.309311] [<ffffffff81211274>] generic_unplug_device+0x2e/0x3c [ 222.309311] [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod] [ 222.309311] [<ffffffff8120ca37>] blk_unplug+0x29/0x2d [ 222.309311] [<ffffffff8120ca4d>] blk_backing_dev_unplug+0x12/0x14 [ 222.309311] [<ffffffff81109a7a>] block_sync_page+0x35/0x39 [ 222.309311] [<ffffffff810ae616>] sync_page+0x41/0x4a [ 222.309311] [<ffffffff810ae62d>] sync_page_killable+0xe/0x35 [ 222.309311] [<ffffffff8158aa59>] __wait_on_bit_lock+0x46/0x8f [ 222.309311] [<ffffffff810ae4f5>] __lock_page_killable+0x66/0x6d [ 222.309311] [<ffffffff81056f9c>] ? wake_bit_function+0x0/0x33 [ 222.309311] [<ffffffff810ae528>] lock_page_killable+0x2c/0x2e [ 222.309311] [<ffffffff810afbc5>] generic_file_aio_read+0x361/0x4f0 [ 222.309311] [<ffffffff810ea044>] do_sync_read+0xcb/0x108 [ 222.309311] [<ffffffff811e42f7>] ? security_file_permission+0x16/0x18 [ 222.309311] [<ffffffff810ea6ab>] vfs_read+0xab/0x108 [ 222.309311] [<ffffffff810ea7c8>] sys_read+0x4a/0x6e [ 222.309311] [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b [ 222.309311] Code: 58 01 00 00 00 48 89 c6 75 0a 48 83 bb 60 01 00 00 00 74 09 48 8d bb a0 00 00 00 eb 35 41 fe cc 74 0d f6 83 c0 01 00 00 04 74 04 <0f> 0b eb fe 48 89 75 e8 e8 be e0 de ff 66 83 8b c0 01 00 00 04 [ 222.309311] RIP [<ffffffff8121ad88>] blkiocg_set_start_empty_time+0x50/0x83 [ 222.309311] RSP <ffff8800ba6e79f8> [ 222.309311] ---[ end trace 32b4f71dffc15712 ]--- Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Divyesh Shah <dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-16blkio: Initialize blkg->stats_lock for the root cfqg tooDivyesh Shah
This fixes the lockdep warning reported by Gui Jianfeng. Signed-off-by: Divyesh Shah <dpshah@google.com> Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-13block: Update to io-controller statsDivyesh Shah
Changelog from v1: o Call blkiocg_update_idle_time_stats() at cfq_rq_enqueued() instead of at dispatch time. Changelog from original patchset: (in response to Vivek Goyal's comments) o group blkiocg_update_blkio_group_dequeue_stats() with other DEBUG functions o rename blkiocg_update_set_active_queue_stats() to blkiocg_update_avg_queue_size_stats() o s/request/io/ in blkiocg_update_request_add_stats() and blkiocg_update_request_remove_stats() o Call cfq_del_timer() at request dispatch() instead of blkiocg_update_idle_time_stats() Signed-off-by: Divyesh Shah<dpshah@google.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-13io-controller: Add a new interface "weight_device" for IO-ControllerGui Jianfeng
Currently, IO Controller makes use of blkio.weight to assign weight for all devices. Here a new user interface "blkio.weight_device" is introduced to assign different weights for different devices. blkio.weight becomes the default value for devices which are not configured by "blkio.weight_device" You can use the following format to assigned specific weight for a given device: #echo "major:minor weight" > blkio.weight_device major:minor represents device number. And you can remove weight for a given device as following: #echo "major:minor 0" > blkio.weight_device V1->V2 changes: - use user interface "weight_device" instead of "policy" suggested by Vivek - rename some struct suggested by Vivek - rebase to 2.6-block "for-linus" branch - remove an useless list_empty check pointed out by Li Zefan - some trivial typo fix V2->V3 changes: - Move policy_*_node() functions up to get rid of forward declarations - rename related functions by adding prefix "blkio_" Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-09blkio: Add more debug-only per-cgroup statsDivyesh Shah
1) group_wait_time - This is the amount of time the cgroup had to wait to get a timeslice for one of its queues from when it became busy, i.e., went from 0 to 1 request queued. This is different from the io_wait_time which is the cumulative total of the amount of time spent by each IO in that cgroup waiting in the scheduler queue. This stat is a great way to find out any jobs in the fleet that are being starved or waiting for longer than what is expected (due to an IO controller bug or any other issue). 2) empty_time - This is the amount of time a cgroup spends w/o any pending requests. This stat is useful when a job does not seem to be able to use its assigned disk share by helping check if that is happening due to an IO controller bug or because the job is not submitting enough IOs. 3) idle_time - This is the amount of time spent by the IO scheduler idling for a given cgroup in anticipation of a better request than the exising ones from other queues/cgroups. All these stats are recorded using start and stop events. When reading these stats, we do not add the delta between the current time and the last start time if we're between the start and stop events. We avoid doing this to make sure that these numbers are always monotonically increasing when read. Since we're using sched_clock() which may use the tsc as its source, it may induce some inconsistency (due to tsc resync across cpus) if we included the current delta. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-09blkio: Add io_queued and avg_queue_size statsDivyesh Shah
These stats are useful for getting a feel for the queue depth of the cgroup, i.e., how filled up its queues are at a given instant and over the existence of the cgroup. This ability is useful when debugging problems in the wild as it helps understand the application's IO pattern w/o having to read through the userspace code (coz its tedious or just not available) or w/o the ability to run blktrace (since you may not have root access and/or not want to disturb performance). Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-09blkio: Add io_merged statDivyesh Shah
This includes both the number of bios merged into requests belonging to this cgroup as well as the number of requests merged together. In the past, we've observed different merging behavior across upstream kernels, some by design some actual bugs. This stat helps a lot in debugging such problems when applications report decreased throughput with a new kernel version. This needed adding an extra elevator function to capture bios being merged as I did not want to pollute elevator code with blkiocg knowledge and hence needed the accounting invocation to come from CFQ. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-09blkio: Changes to IO controller additional stats patchesDivyesh Shah
that include some minor fixes and addresses all comments. Changelog: (most based on Vivek Goyal's comments) o renamed blkiocg_reset_write to blkiocg_reset_stats o more clarification in the documentation on io_service_time and io_wait_time o Initialize blkg->stats_lock o rename io_add_stat to blkio_add_stat and declare it static o use bool for direction and sync o derive direction and sync info from existing rq methods o use 12 for major:minor string length o define io_service_time better to cover the NCQ case o add a separate reset_stats interface o make the indexed stats a 2d array to simplify macro and function pointer code o blkio.time now exports in jiffies as before o Added stats description in patch description and Documentation/cgroup/blkio-controller.txt o Prefix all stats functions with blkio and make them static as applicable o replace IO_TYPE_MAX with IO_TYPE_TOTAL o Moved #define constant to top of blk-cgroup.c o Pass dev_t around instead of char * o Add note to documentation file about resetting stats o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef statements o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has rq_direction() and rq_sync() functions which are used by CFQ and when using io-controller at a higher level, bio_* functions can be added. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-02blkio: Increment the blkio cgroup stats for real nowDivyesh Shah
We also add start_time_ns and io_start_time_ns fields to struct request here to record the time when a request is created and when it is dispatched to device. We use ns uints here as ms and jiffies are not very useful for non-rotational media. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-02blkio: Add io controller stats likeDivyesh Shah
- io_service_time - io_wait_time - io_serviced - io_service_bytes These stats are accumulated per operation type helping us to distinguish between read and write, and sync and async IO. This patch does not increment any of these stats. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-02blkio: Remove per-cfqq nr_sectors as we'll be passingDivyesh Shah
that info at request dispatch with other stats now. This patch removes the existing support for accounting sectors for a blkio_group. This will be added back differently in the next two patches. Signed-off-by: Divyesh Shah<dpshah@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-03-12cgroups: blkio subsystem as moduleBen Blum
Modify the Block I/O cgroup subsystem to be able to be built as a module. As the CFQ disk scheduler optionally depends on blk-cgroup, config options in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are enhanced to support the new module dependency. Signed-off-by: Ben Blum <bblum@andrew.cmu.edu> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-26cfq: Remove useless css reference getGui Jianfeng
There's no need to take css reference here, for the caller has already called rcu_read_lock() to prevent cgroup from being removed. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04blkio: Implement dynamic io controlling policy registrationVivek Goyal
o One of the goals of block IO controller is that it should be able to support mulitple io control policies, some of which be operational at higher level in storage hierarchy. o To begin with, we had one io controlling policy implemented by CFQ, and I hard coded the CFQ functions called by blkio. This created issues when CFQ is compiled as module. o This patch implements a basic dynamic io controlling policy registration functionality in blkio. This is similar to elevator functionality where ioschedulers register the functions dynamically. o Now in future, when more IO controlling policies are implemented, these can dynakically register with block IO controller. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04blkio: Export some symbols from blkio as its user CFQ can be a moduleVivek Goyal
o blkio controller is inside the kernel and cfq makes use of interfaces exported by blkio. CFQ can be a module too, hence export symbols used by CFQ. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03cfq-iosched: fix compile problem with !CONFIG_CGROUPJens Axboe
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03blkio: Export disk time and sectors used by a group to user spaceVivek Goyal
o Export disk time and sector used by a group to user space through cgroup interface. o Also export a "dequeue" interface to cgroup which keeps track of how many a times a group was deleted from service tree. Helps in debugging. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03blkio: Some debugging aids for CFQVivek Goyal
o Some debugging aids for CFQ. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03blkio: Take care of cgroup deletion and cfq group reference countingVivek Goyal
o One can choose to change elevator or delete a cgroup. Implement group reference counting so that both elevator exit and cgroup deletion can take place gracefully. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Nauman Rafique <nauman@google.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03blkio: Introduce blkio controller cgroup interfaceVivek Goyal
o This is basic implementation of blkio controller cgroup interface. This is the common interface visible to user space and should be used by different IO control policies as we implement those. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>