From ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Tue, 25 Jan 2011 12:43:54 +0100
Subject: block: reimplement FLUSH/FUA to support merge

The current FLUSH/FUA support has evolved from the implementation which had
to perform queue draining.  As such, sequencing is done queue-wide, one flush
request after another.  However, with the draining requirement gone, there's
no reason to keep the queue-wide sequential approach.

This patch reimplements FLUSH/FUA support such that each FLUSH/FUA request is
sequenced individually.  The actual FLUSH execution is double buffered and
whenever a request wants to execute one for either PRE or POSTFLUSH, it
queues on the pending queue.  Once certain conditions are met, a flush
request is issued and on its completion all pending requests proceed to the
next sequence.

This allows arbitrary merging of different types of flushes.  How they are
merged can be primarily controlled and tuned by adjusting the 'conditions'
mentioned above, which determine when to issue the next flush.

This is inspired by Darrick's patches to merge multiple zero-data flushes,
which helps workloads with highly concurrent fsync requests.

* As flush requests are never put on the IO scheduler, request fields used
  for flush share space with rq->rb_node.  rq->completion_data is moved out
  of the union.  This increases the request size by one pointer.

  As rq->elevator_private* are used only by the iosched too, it is possible
  to reduce the request size further.  However, to do that, we need to modify
  the request allocation path such that iosched data is not allocated for
  flush requests.

* FLUSH/FUA processing happens on insertion now instead of dispatch.

- Comments updated as per Vivek and Mike.

Signed-off-by: Tejun Heo
Cc: "Darrick J. Wong"
Cc: Shaohua Li
Cc: Christoph Hellwig
Cc: Vivek Goyal
Cc: Mike Snitzer
Signed-off-by: Jens Axboe
---
 block/elevator.c | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'block/elevator.c')

diff --git a/block/elevator.c b/block/elevator.c
index 2569512830d3..270e0972eb9f 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -673,6 +673,11 @@ void elv_insert(struct request_queue *q, struct request *rq, int where)
 		q->elevator->ops->elevator_add_req_fn(q, rq);
 		break;
 
+	case ELEVATOR_INSERT_FLUSH:
+		rq->cmd_flags |= REQ_SOFTBARRIER;
+		blk_insert_flush(rq);
+		break;
+
 	default:
 		printk(KERN_ERR "%s: bad insertion point %d\n",
 		       __func__, where);
@@ -785,6 +790,8 @@ void elv_abort_queue(struct request_queue *q)
 {
 	struct request *rq;
 
+	blk_abort_flushes(q);
+
 	while (!list_empty(&q->queue_head)) {
 		rq = list_entry_rq(q->queue_head.next);
 		rq->cmd_flags |= REQ_QUIET;
-- 
cgit v1.2.3
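The requests this merging targets are the zero-data flushes that fsync-heavy
workloads generate.  As a rough illustration only (example_issue_flush() is a
made-up caller; blkdev_issue_flush() is the existing block-layer helper of
this era), each such flush now walks its own PRE/POSTFLUSH sequence and may
be coalesced with other pending flushes:

#include <linux/blkdev.h>

/*
 * Hypothetical caller: issue an empty (zero-data) cache flush, as an
 * fsync implementation would.
 */
static int example_issue_flush(struct block_device *bdev)
{
	/* The last argument is the optional error-sector pointer. */
	return blkdev_issue_flush(bdev, GFP_KERNEL, NULL);
}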
From c186794dbb466b45cf40f942f2d09d6d5b4b0e42 Mon Sep 17 00:00:00 2001
From: Mike Snitzer
Date: Fri, 11 Feb 2011 11:08:00 +0100
Subject: block: share request flush fields with elevator_private

Flush requests are never put on the IO scheduler.  Convert request
structure's elevator_private* into an array and have the flush fields share
a union with it.

Reclaim the space lost in 'struct request' by moving 'completion_data' back
in the union with 'rb_node'.

Signed-off-by: Mike Snitzer
Acked-by: Vivek Goyal
Acked-by: Tejun Heo
Signed-off-by: Jens Axboe
---
 block/elevator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'block/elevator.c')

diff --git a/block/elevator.c b/block/elevator.c
index 270e0972eb9f..f98e92edc937 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -764,7 +764,7 @@ int elv_set_request(struct request_queue *q, struct request *rq, gfp_t gfp_mask)
 	if (e->ops->elevator_set_req_fn)
 		return e->ops->elevator_set_req_fn(q, rq, gfp_mask);
 
-	rq->elevator_private = NULL;
+	rq->elevator_private[0] = NULL;
 	return 0;
 }
 
-- 
cgit v1.2.3

From 73c101011926c5832e6e141682180c4debe2cf45 Mon Sep 17 00:00:00 2001
From: Jens Axboe
Date: Tue, 8 Mar 2011 13:19:51 +0100
Subject: block: initial patch for on-stack per-task plugging

This patch adds support for creating a queuing context outside of the queue
itself.  This enables us to batch up pieces of IO before grabbing the block
device queue lock and submitting them to the IO scheduler.

The context is created on the stack of the process and assigned in the task
structure, so that we can auto-unplug it if we hit a schedule event.

The current queue plugging happens implicitly if IO is submitted to an empty
device, yet callers have to remember to unplug that IO when they are going
to wait for it.  This is an ugly API and has caused bugs in the past.
Additionally, it requires hacks in the vm (->sync_page() callback) to handle
that logic.  By switching to an explicit plugging scheme we make the API a
lot nicer and can get rid of the ->sync_page() hack in the vm.

Signed-off-by: Jens Axboe
---
 block/elevator.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

(limited to 'block/elevator.c')

diff --git a/block/elevator.c b/block/elevator.c
index f98e92edc937..25713927c0d3 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -113,7 +113,7 @@ int elv_rq_merge_ok(struct request *rq, struct bio *bio)
 }
 EXPORT_SYMBOL(elv_rq_merge_ok);
 
-static inline int elv_try_merge(struct request *__rq, struct bio *bio)
+int elv_try_merge(struct request *__rq, struct bio *bio)
 {
 	int ret = ELEVATOR_NO_MERGE;
 
@@ -421,6 +421,8 @@ void elv_dispatch_sort(struct request_queue *q, struct request *rq)
 	struct list_head *entry;
 	int stop_flags;
 
+	BUG_ON(rq->cmd_flags & REQ_ON_PLUG);
+
 	if (q->last_merge == rq)
 		q->last_merge = NULL;
 
@@ -696,6 +698,8 @@ void elv_insert(struct request_queue *q, struct request *rq, int where)
 void __elv_add_request(struct request_queue *q, struct request *rq, int where,
 		       int plug)
 {
+	BUG_ON(rq->cmd_flags & REQ_ON_PLUG);
+
 	if (rq->cmd_flags & REQ_SOFTBARRIER) {
 		/* barriers are scheduling boundary, update end_sector */
 		if (rq->cmd_type == REQ_TYPE_FS ||
-- 
cgit v1.2.3
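The caller-visible half of this change lives in block/blk-core.c rather than
in the elevator.c hunks shown above.  A minimal sketch of how a submitter
would use the new on-stack context (struct blk_plug, blk_start_plug() and
blk_finish_plug() are the interfaces this patch introduces; the function
example_submit_batch() itself is illustrative only):

#include <linux/bio.h>
#include <linux/blkdev.h>

/* Hypothetical submitter batching several read bios under one plug. */
static void example_submit_batch(struct bio **bios, int nr)
{
	struct blk_plug plug;
	int i;

	blk_start_plug(&plug);		/* context lives on this stack frame */
	for (i = 0; i < nr; i++)
		submit_bio(READ, bios[i]);
	blk_finish_plug(&plug);		/* hand the whole batch to the queue */
}

If the task blocks and schedules while the plug is active, the block layer
flushes it automatically, which is the auto-unplug behaviour the changelog
refers to.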
From 7eaceaccab5f40bbfda044629a6298616aeaed50 Mon Sep 17 00:00:00 2001
From: Jens Axboe
Date: Thu, 10 Mar 2011 08:52:07 +0100
Subject: block: remove per-queue plugging

Code has been converted over to the new explicit on-stack plugging, and
delay users have been converted to use the new API for that.  So let's kill
off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe
---
 block/elevator.c | 43 +++----------------------------------------
 1 file changed, 3 insertions(+), 40 deletions(-)

(limited to 'block/elevator.c')

diff --git a/block/elevator.c b/block/elevator.c
index 25713927c0d3..3ea208256e78 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -619,21 +619,12 @@ void elv_quiesce_end(struct request_queue *q)
 
 void elv_insert(struct request_queue *q, struct request *rq, int where)
 {
-	int unplug_it = 1;
-
 	trace_block_rq_insert(q, rq);
 
 	rq->q = q;
 
 	switch (where) {
 	case ELEVATOR_INSERT_REQUEUE:
-		/*
-		 * Most requeues happen because of a busy condition,
-		 * don't force unplug of the queue for that case.
-		 * Clear unplug_it and fall through.
-		 */
-		unplug_it = 0;
-
 	case ELEVATOR_INSERT_FRONT:
 		rq->cmd_flags |= REQ_SOFTBARRIER;
 		list_add(&rq->queuelist, &q->queue_head);
@@ -679,24 +670,14 @@ void elv_insert(struct request_queue *q, struct request *rq, int where)
 		rq->cmd_flags |= REQ_SOFTBARRIER;
 		blk_insert_flush(rq);
 		break;
-
 	default:
 		printk(KERN_ERR "%s: bad insertion point %d\n",
 		       __func__, where);
 		BUG();
 	}
-
-	if (unplug_it && blk_queue_plugged(q)) {
-		int nrq = q->rq.count[BLK_RW_SYNC] + q->rq.count[BLK_RW_ASYNC] -
-				queue_in_flight(q);
-
-		if (nrq >= q->unplug_thresh)
-			__generic_unplug_device(q);
-	}
 }
 
-void __elv_add_request(struct request_queue *q, struct request *rq, int where,
-		       int plug)
+void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 {
 	BUG_ON(rq->cmd_flags & REQ_ON_PLUG);
 
@@ -711,38 +692,20 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where,
 	    where == ELEVATOR_INSERT_SORT)
 		where = ELEVATOR_INSERT_BACK;
 
-	if (plug)
-		blk_plug_device(q);
-
 	elv_insert(q, rq, where);
 }
 EXPORT_SYMBOL(__elv_add_request);
 
-void elv_add_request(struct request_queue *q, struct request *rq, int where,
-		     int plug)
+void elv_add_request(struct request_queue *q, struct request *rq, int where)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(q->queue_lock, flags);
-	__elv_add_request(q, rq, where, plug);
+	__elv_add_request(q, rq, where);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 EXPORT_SYMBOL(elv_add_request);
 
-int elv_queue_empty(struct request_queue *q)
-{
-	struct elevator_queue *e = q->elevator;
-
-	if (!list_empty(&q->queue_head))
-		return 0;
-
-	if (e->ops->elevator_queue_empty_fn)
-		return e->ops->elevator_queue_empty_fn(q);
-
-	return 1;
-}
-EXPORT_SYMBOL(elv_queue_empty);
-
 struct request *elv_latter_request(struct request_queue *q, struct request *rq)
 {
 	struct elevator_queue *e = q->elevator;
-- 
cgit v1.2.3
From 5e84ea3a9c662dc2d7a48703a4468fad954a3b7f Mon Sep 17 00:00:00 2001
From: Jens Axboe
Date: Mon, 21 Mar 2011 10:14:27 +0100
Subject: block: attempt to merge with existing requests on plug flush

One of the disadvantages of on-stack plugging is that we potentially lose
out on merging, since all pending IO isn't always visible to everybody.
When we flush the on-stack plugs, right now we don't do any checks to see
if potential merge candidates could be utilized.

Correct this by adding a new insert variant, ELEVATOR_INSERT_SORT_MERGE.
It works just like ELEVATOR_INSERT_SORT, but first checks whether we can
merge with an existing request before doing the insertion (if we fail
merging).

This fixes a regression with multiple processes issuing IO that can be
merged.

Thanks to Shaohua Li for testing and fixing an accounting bug.

Signed-off-by: Jens Axboe
---
 block/elevator.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 3 deletions(-)

(limited to 'block/elevator.c')

diff --git a/block/elevator.c b/block/elevator.c
index 542ce826b401..c387d3168734 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -521,6 +521,40 @@ int elv_merge(struct request_queue *q, struct request **req, struct bio *bio)
 	return ELEVATOR_NO_MERGE;
 }
 
+/*
+ * Attempt to do an insertion back merge. Only check for the case where
+ * we can append 'rq' to an existing request, so we can throw 'rq' away
+ * afterwards.
+ *
+ * Returns true if we merged, false otherwise
+ */
+static bool elv_attempt_insert_merge(struct request_queue *q,
+				     struct request *rq)
+{
+	struct request *__rq;
+
+	if (blk_queue_nomerges(q))
+		return false;
+
+	/*
+	 * First try one-hit cache.
+	 */
+	if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq))
+		return true;
+
+	if (blk_queue_noxmerges(q))
+		return false;
+
+	/*
+	 * See if our hash lookup can find a potential backmerge.
+	 */
+	__rq = elv_rqhash_find(q, blk_rq_pos(rq));
+	if (__rq && blk_attempt_req_merge(q, __rq, rq))
+		return true;
+
+	return false;
+}
+
 void elv_merged_request(struct request_queue *q, struct request *rq, int type)
 {
 	struct elevator_queue *e = q->elevator;
@@ -538,14 +572,18 @@ void elv_merge_requests(struct request_queue *q, struct request *rq,
 			     struct request *next)
 {
 	struct elevator_queue *e = q->elevator;
+	const int next_sorted = next->cmd_flags & REQ_SORTED;
 
-	if (e->ops->elevator_merge_req_fn)
+	if (next_sorted && e->ops->elevator_merge_req_fn)
 		e->ops->elevator_merge_req_fn(q, rq, next);
 
 	elv_rqhash_reposition(q, rq);
-	elv_rqhash_del(q, next);
 
-	q->nr_sorted--;
+	if (next_sorted) {
+		elv_rqhash_del(q, next);
+		q->nr_sorted--;
+	}
+
 	q->last_merge = rq;
 }
 
@@ -647,6 +685,14 @@ void elv_insert(struct request_queue *q, struct request *rq, int where)
 		__blk_run_queue(q, false);
 		break;
 
+	case ELEVATOR_INSERT_SORT_MERGE:
+		/*
+		 * If we succeed in merging this request with one in the
+		 * queue already, we are done - rq has now been freed,
+		 * so no need to do anything further.
+		 */
+		if (elv_attempt_insert_merge(q, rq))
+			break;
 	case ELEVATOR_INSERT_SORT:
 		BUG_ON(rq->cmd_type != REQ_TYPE_FS &&
 		       !(rq->cmd_flags & REQ_DISCARD));
-- 
cgit v1.2.3
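The new insert type is consumed from the plug-flush path in block/blk-core.c,
which is not part of the elevator.c hunks above.  A simplified sketch of that
usage, assuming the caller holds q->queue_lock and has already grouped the
plugged requests per queue (example_flush_plug() is a stand-in, not the
actual blk-core code):

#include <linux/blkdev.h>
#include <linux/elevator.h>

/*
 * Hypothetical, simplified plug flush: let each request try to back-merge
 * into something already queued before it is sorted into the elevator.
 * Caller is assumed to hold q->queue_lock.
 */
static void example_flush_plug(struct request_queue *q, struct list_head *list)
{
	while (!list_empty(list)) {
		struct request *rq = list_entry_rq(list->next);

		list_del_init(&rq->queuelist);
		__elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE);
	}
}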
From b710a480554f2be682bac3cb59b0e085ba3d644b Mon Sep 17 00:00:00 2001
From: Jens Axboe
Date: Wed, 30 Mar 2011 09:52:30 +0200
Subject: block: get rid of elv_insert() interface

Merge it with __elv_add_request(); it's pretty pointless to have a function
with only two callers.  The main interface is
elv_add_request()/__elv_add_request().

Signed-off-by: Jens Axboe
---
 block/elevator.c | 35 +++++++++++++++--------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

(limited to 'block/elevator.c')

diff --git a/block/elevator.c b/block/elevator.c
index c387d3168734..0cdb4e7ebab4 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -610,7 +610,7 @@ void elv_requeue_request(struct request_queue *q, struct request *rq)
 
 	rq->cmd_flags &= ~REQ_STARTED;
 
-	elv_insert(q, rq, ELEVATOR_INSERT_REQUEUE);
+	__elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
 }
 
 void elv_drain_elevator(struct request_queue *q)
@@ -655,12 +655,25 @@ void elv_quiesce_end(struct request_queue *q)
 	queue_flag_clear(QUEUE_FLAG_ELVSWITCH, q);
 }
 
-void elv_insert(struct request_queue *q, struct request *rq, int where)
+void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 {
 	trace_block_rq_insert(q, rq);
 
 	rq->q = q;
 
+	BUG_ON(rq->cmd_flags & REQ_ON_PLUG);
+
+	if (rq->cmd_flags & REQ_SOFTBARRIER) {
+		/* barriers are scheduling boundary, update end_sector */
+		if (rq->cmd_type == REQ_TYPE_FS ||
+		    (rq->cmd_flags & REQ_DISCARD)) {
+			q->end_sector = rq_end_sector(rq);
+			q->boundary_rq = rq;
+		}
+	} else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
+		    where == ELEVATOR_INSERT_SORT)
+		where = ELEVATOR_INSERT_BACK;
+
 	switch (where) {
 	case ELEVATOR_INSERT_REQUEUE:
 	case ELEVATOR_INSERT_FRONT:
@@ -722,24 +735,6 @@ void elv_insert(struct request_queue *q, struct request *rq, int where)
 		BUG();
 	}
 }
-
-void __elv_add_request(struct request_queue *q, struct request *rq, int where)
-{
-	BUG_ON(rq->cmd_flags & REQ_ON_PLUG);
-
-	if (rq->cmd_flags & REQ_SOFTBARRIER) {
-		/* barriers are scheduling boundary, update end_sector */
-		if (rq->cmd_type == REQ_TYPE_FS ||
-		    (rq->cmd_flags & REQ_DISCARD)) {
-			q->end_sector = rq_end_sector(rq);
-			q->boundary_rq = rq;
-		}
-	} else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
-		    where == ELEVATOR_INSERT_SORT)
-		where = ELEVATOR_INSERT_BACK;
-
-	elv_insert(q, rq, where);
-}
 EXPORT_SYMBOL(__elv_add_request);
 
 void elv_add_request(struct request_queue *q, struct request *rq, int where)
-- 
cgit v1.2.3
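After this change the elevator has just the two insertion entry points named
above: __elv_add_request(), which expects the caller to hold q->queue_lock,
and elv_add_request(), which takes the lock itself.  A small sketch of that
contract (example_requeue() is illustrative; it is simply the open-coded
equivalent of calling elv_add_request()):

#include <linux/blkdev.h>
#include <linux/elevator.h>

/* Hypothetical caller that does not yet hold the queue lock. */
static void example_requeue(struct request_queue *q, struct request *rq)
{
	unsigned long flags;

	spin_lock_irqsave(q->queue_lock, flags);
	/* __elv_add_request() must be called with q->queue_lock held. */
	__elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
	spin_unlock_irqrestore(q->queue_lock, flags);
}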
From 24ecfbe27f65563909b14492afda2f1c21f7c044 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig
Date: Mon, 18 Apr 2011 11:41:33 +0200
Subject: block: add blk_run_queue_async

Instead of overloading __blk_run_queue to force an offload to kblockd, add a
new blk_run_queue_async helper to do it explicitly.  I've kept the
blk_queue_stopped check for now, but I suspect it's not needed as the check
we do when the workqueue item runs should be enough.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 block/elevator.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'block/elevator.c')

diff --git a/block/elevator.c b/block/elevator.c
index 0cdb4e7ebab4..6f6abc08bb56 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -642,7 +642,7 @@ void elv_quiesce_start(struct request_queue *q)
 	 */
 	elv_drain_elevator(q);
 	while (q->rq.elvpriv) {
-		__blk_run_queue(q, false);
+		__blk_run_queue(q);
 		spin_unlock_irq(q->queue_lock);
 		msleep(10);
 		spin_lock_irq(q->queue_lock);
@@ -695,7 +695,7 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 		 * with anything.  There's no point in delaying queue
 		 * processing.
 		 */
-		__blk_run_queue(q, false);
+		__blk_run_queue(q);
 		break;
 
 	case ELEVATOR_INSERT_SORT_MERGE:
-- 
cgit v1.2.3

From 3aa72873ffdcc2f7919743efbbefc351ec73f5cb Mon Sep 17 00:00:00 2001
From: Jens Axboe
Date: Thu, 21 Apr 2011 19:28:35 +0200
Subject: elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too

The sort insert is the one that goes to the IO scheduler.  With the
SORT_MERGE addition, we could bypass IO scheduler setup but still ask the IO
scheduler to insert the request.  This would cause an oops on switching IO
schedulers through the sysfs interface, unless the disk just happened to be
idle while it occurred.

Signed-off-by: Jens Axboe
---
 block/elevator.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'block/elevator.c')

diff --git a/block/elevator.c b/block/elevator.c
index 6f6abc08bb56..45ca1e34f582 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -671,7 +671,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 			q->boundary_rq = rq;
 		}
 	} else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
-		    where == ELEVATOR_INSERT_SORT)
+		    (where == ELEVATOR_INSERT_SORT ||
+		     where == ELEVATOR_INSERT_SORT_MERGE))
 		where = ELEVATOR_INSERT_BACK;
 
 	switch (where) {
-- 
cgit v1.2.3
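The blk_run_queue_async() helper added by the 'block: add blk_run_queue_async'
patch above lives in block/blk-core.c, so only the resulting __blk_run_queue()
signature change is visible in these elevator.c hunks.  A minimal sketch of
what such a helper does, assuming the kblockd workqueue (a blk-core-internal
symbol) and the per-queue delay_work item that blk_delay_queue() already uses;
this is a sketch, not the verbatim blk-core.c implementation:

#include <linux/blkdev.h>
#include <linux/workqueue.h>

/*
 * Sketch: kick queue processing from kblockd instead of running the
 * request_fn inline in the caller's context.
 */
static void example_run_queue_async(struct request_queue *q)
{
	if (likely(!blk_queue_stopped(q)))
		/* a delay of 0 jiffies: run as soon as kblockd is scheduled */
		queue_delayed_work(kblockd_workqueue, &q->delay_work, 0);
}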