<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/elevator.h, branch v4.11-rc3</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>blk-mq: pass bio to blk_mq_sched_get_rq_priv</title>
<updated>2017-02-10T16:09:59+00:00</updated>
<author>
<name>Paolo Valente</name>
<email>paolo.valente@linaro.org</email>
</author>
<published>2017-02-07T17:24:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f1ba82616c3368e1ae9e64ef29cf3edc1be0860d'/>
<id>f1ba82616c3368e1ae9e64ef29cf3edc1be0860d</id>
<content type='text'>
bio is used in bfq-mq's get_rq_priv, to get the request group. We could
pass directly the group here, but I thought that passing the bio was
more general, giving the possibility to get other pieces of information
if needed.

Signed-off-by: Paolo Valente &lt;paolo.valente@linaro.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
bio is used in bfq-mq's get_rq_priv, to get the request group. We could
pass directly the group here, but I thought that passing the bio was
more general, giving the possibility to get other pieces of information
if needed.

Signed-off-by: Paolo Valente &lt;paolo.valente@linaro.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: optionally merge discontiguous discard bios into a single request</title>
<updated>2017-02-08T20:43:08+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-02-08T13:46:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1e739730c5b9ea80a2f25e9cf6e1025d47e3d8ed'/>
<id>1e739730c5b9ea80a2f25e9cf6e1025d47e3d8ed</id>
<content type='text'>
Add a new merge strategy that merges discard bios into a request until the
maximum number of discard ranges (or the maximum discard size) is reached
from the plug merging code.  I/O scheduler merging is not wired up yet
but might also be useful, although not for fast devices like NVMe which
are the only user for now.

Note that for now we don't support limiting the size of each discard range,
but if needed that can be added later.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a new merge strategy that merges discard bios into a request until the
maximum number of discard ranges (or the maximum discard size) is reached
from the plug merging code.  I/O scheduler merging is not wired up yet
but might also be useful, although not for fast devices like NVMe which
are the only user for now.

Note that for now we don't support limiting the size of each discard range,
but if needed that can be added later.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: enumify ELEVATOR_*_MERGE</title>
<updated>2017-02-08T20:43:06+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-02-08T13:46:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=34fe7c05400663e01e23cddd1fea68bb7a2b3d29'/>
<id>34fe7c05400663e01e23cddd1fea68bb7a2b3d29</id>
<content type='text'>
Switch these constants to an enum, and make let the compiler ensure that
all callers of blk_try_merge and elv_merge handle all potential values.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Switch these constants to an enum, and make let the compiler ensure that
all callers of blk_try_merge and elv_merge handle all potential values.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq-sched: change -&gt;dispatch_requests() to -&gt;dispatch_request()</title>
<updated>2017-01-27T15:20:35+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2017-01-26T19:40:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c13660a08c8b3bb49def4374bfd414aaaa564662'/>
<id>c13660a08c8b3bb49def4374bfd414aaaa564662</id>
<content type='text'>
When we invoke dispatch_requests(), the scheduler empties everything
into the passed in list. This isn't always a good thing, since it
means that we remove items that we could have potentially merged
with.

Change the function to dispatch single requests at the time. If
we do that, we can backoff exactly at the point where the device
can't consume more IO, and leave the rest with the scheduler for
better merging and future dispatch decision making.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
Tested-by: Hannes Reinecke &lt;hare@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we invoke dispatch_requests(), the scheduler empties everything
into the passed in list. This isn't always a good thing, since it
means that we remove items that we could have potentially merged
with.

Change the function to dispatch single requests at the time. If
we do that, we can backoff exactly at the point where the device
can't consume more IO, and leave the rest with the scheduler for
better merging and future dispatch decision making.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
Tested-by: Hannes Reinecke &lt;hare@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>blk-mq-sched: add framework for MQ capable IO schedulers</title>
<updated>2017-01-17T17:04:20+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2017-01-17T13:03:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bd166ef183c263c5ced656d49ef19c7da4adc774'/>
<id>bd166ef183c263c5ced656d49ef19c7da4adc774</id>
<content type='text'>
This adds a set of hooks that intercepts the blk-mq path of
allocating/inserting/issuing/completing requests, allowing
us to develop a scheduler within that framework.

We reuse the existing elevator scheduler API on the registration
side, but augment that with the scheduler flagging support for
the blk-mq interfce, and with a separate set of ops hooks for MQ
devices.

We split driver and scheduler tags, so we can run the scheduling
independently of device queue depth.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Reviewed-by: Bart Van Assche &lt;bart.vanassche@sandisk.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This adds a set of hooks that intercepts the blk-mq path of
allocating/inserting/issuing/completing requests, allowing
us to develop a scheduler within that framework.

We reuse the existing elevator scheduler API on the registration
side, but augment that with the scheduler flagging support for
the blk-mq interfce, and with a separate set of ops hooks for MQ
devices.

We split driver and scheduler tags, so we can run the scheduling
independently of device queue depth.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Reviewed-by: Bart Van Assche &lt;bart.vanassche@sandisk.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: move existing elevator ops to union</title>
<updated>2017-01-17T17:03:33+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2016-12-10T22:13:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c51ca6cf545bc51ad38bd50816bde37c647d608d'/>
<id>c51ca6cf545bc51ad38bd50816bde37c647d608d</id>
<content type='text'>
Prep patch for adding MQ ops as well, since doing anon unions with
named initializers doesn't work on older compilers.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Reviewed-by: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Reviewed-by: Bart Van Assche &lt;bart.vanassche@sandisk.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Prep patch for adding MQ ops as well, since doing anon unions with
named initializers doesn't work on older compilers.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Reviewed-by: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Reviewed-by: Bart Van Assche &lt;bart.vanassche@sandisk.com&gt;
Reviewed-by: Omar Sandoval &lt;osandov@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>elevator: make the rqhash helpers exported</title>
<updated>2016-12-09T16:03:02+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2016-12-07T15:43:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=70b3ea056f3074be6d9256c312b64c0d90a4a683'/>
<id>70b3ea056f3074be6d9256c312b64c0d90a4a683</id>
<content type='text'>
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: better op and flags encoding</title>
<updated>2016-10-28T14:48:16+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2016-10-28T14:48:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ef295ecf090d3e86e5b742fc6ab34f1122a43773'/>
<id>ef295ecf090d3e86e5b742fc6ab34f1122a43773</id>
<content type='text'>
Now that we don't need the common flags to overflow outside the range
of a 32-bit type we can encode them the same way for both the bio and
request fields.  This in addition allows us to place the operation
first (and make some room for more ops while we're at it) and to
stop having to shift around the operation values.

In addition this allows passing around only one value in the block layer
instead of two (and eventuall also in the file systems, but we can do
that later) and thus clean up a lot of code.

Last but not least this allows decreasing the size of the cmd_flags
field in struct request to 32-bits.  Various functions passing this
value could also be updated, but I'd like to avoid the churn for now.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that we don't need the common flags to overflow outside the range
of a 32-bit type we can encode them the same way for both the bio and
request fields.  This in addition allows us to place the operation
first (and make some room for more ops while we're at it) and to
stop having to shift around the operation values.

In addition this allows passing around only one value in the block layer
instead of two (and eventuall also in the file systems, but we can do
that later) and thus clean up a lot of code.

Last but not least this allows decreasing the size of the cmd_flags
field in struct request to 32-bits.  Various functions passing this
value could also be updated, but I'd like to avoid the churn for now.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: do not merge requests without consulting with io scheduler</title>
<updated>2016-07-21T03:35:12+00:00</updated>
<author>
<name>Tahsin Erdogan</name>
<email>tahsin@google.com</email>
</author>
<published>2016-07-07T18:48:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=72ef799b3f14f4cb4c56ba3af6e6bdcbae6df368'/>
<id>72ef799b3f14f4cb4c56ba3af6e6bdcbae6df368</id>
<content type='text'>
Before merging a bio into an existing request, io scheduler is called to
get its approval first. However, the requests that come from a plug
flush may get merged by block layer without consulting with io
scheduler.

In case of CFQ, this can cause fairness problems. For instance, if a
request gets merged into a low weight cgroup's request, high weight cgroup
now will depend on low weight cgroup to get scheduled. If high weigt cgroup
needs that io request to complete before submitting more requests, then it
will also lose its timeslice.

Following script demonstrates the problem. Group g1 has a low weight, g2
and g3 have equal high weights but g2's requests are adjacent to g1's
requests so they are subject to merging. Due to these merges, g2 gets
poor disk time allocation.

cat &gt; cfq-merge-repro.sh &lt;&lt; "EOF"
#!/bin/bash
set -e

IO_ROOT=/mnt-cgroup/io

mkdir -p $IO_ROOT

if ! mount | grep -qw $IO_ROOT; then
  mount -t cgroup none -oblkio $IO_ROOT
fi

cd $IO_ROOT

for i in g1 g2 g3; do
  if [ -d $i ]; then
    rmdir $i
  fi
done

mkdir g1 &amp;&amp; echo 10 &gt; g1/blkio.weight
mkdir g2 &amp;&amp; echo 495 &gt; g2/blkio.weight
mkdir g3 &amp;&amp; echo 495 &gt; g3/blkio.weight

RUNTIME=10

(echo $BASHPID &gt; g1/cgroup.procs &amp;&amp;
 fio --readonly --name name1 --filename /dev/sdb \
     --rw read --size 64k --bs 64k --time_based \
     --runtime=$RUNTIME --offset=0k &amp;&gt; /dev/null)&amp;

(echo $BASHPID &gt; g2/cgroup.procs &amp;&amp;
 fio --readonly --name name1 --filename /dev/sdb \
     --rw read --size 64k --bs 64k --time_based \
     --runtime=$RUNTIME --offset=64k &amp;&gt; /dev/null)&amp;

(echo $BASHPID &gt; g3/cgroup.procs &amp;&amp;
 fio --readonly --name name1 --filename /dev/sdb \
     --rw read --size 64k --bs 64k --time_based \
     --runtime=$RUNTIME --offset=256k &amp;&gt; /dev/null)&amp;

sleep $((RUNTIME+1))

for i in g1 g2 g3; do
  echo ---- $i ----
  cat $i/blkio.time
done

EOF
# ./cfq-merge-repro.sh
---- g1 ----
8:16 162
---- g2 ----
8:16 165
---- g3 ----
8:16 686

After applying the patch:

# ./cfq-merge-repro.sh
---- g1 ----
8:16 90
---- g2 ----
8:16 445
---- g3 ----
8:16 471

Signed-off-by: Tahsin Erdogan &lt;tahsin@google.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before merging a bio into an existing request, io scheduler is called to
get its approval first. However, the requests that come from a plug
flush may get merged by block layer without consulting with io
scheduler.

In case of CFQ, this can cause fairness problems. For instance, if a
request gets merged into a low weight cgroup's request, high weight cgroup
now will depend on low weight cgroup to get scheduled. If high weigt cgroup
needs that io request to complete before submitting more requests, then it
will also lose its timeslice.

Following script demonstrates the problem. Group g1 has a low weight, g2
and g3 have equal high weights but g2's requests are adjacent to g1's
requests so they are subject to merging. Due to these merges, g2 gets
poor disk time allocation.

cat &gt; cfq-merge-repro.sh &lt;&lt; "EOF"
#!/bin/bash
set -e

IO_ROOT=/mnt-cgroup/io

mkdir -p $IO_ROOT

if ! mount | grep -qw $IO_ROOT; then
  mount -t cgroup none -oblkio $IO_ROOT
fi

cd $IO_ROOT

for i in g1 g2 g3; do
  if [ -d $i ]; then
    rmdir $i
  fi
done

mkdir g1 &amp;&amp; echo 10 &gt; g1/blkio.weight
mkdir g2 &amp;&amp; echo 495 &gt; g2/blkio.weight
mkdir g3 &amp;&amp; echo 495 &gt; g3/blkio.weight

RUNTIME=10

(echo $BASHPID &gt; g1/cgroup.procs &amp;&amp;
 fio --readonly --name name1 --filename /dev/sdb \
     --rw read --size 64k --bs 64k --time_based \
     --runtime=$RUNTIME --offset=0k &amp;&gt; /dev/null)&amp;

(echo $BASHPID &gt; g2/cgroup.procs &amp;&amp;
 fio --readonly --name name1 --filename /dev/sdb \
     --rw read --size 64k --bs 64k --time_based \
     --runtime=$RUNTIME --offset=64k &amp;&gt; /dev/null)&amp;

(echo $BASHPID &gt; g3/cgroup.procs &amp;&amp;
 fio --readonly --name name1 --filename /dev/sdb \
     --rw read --size 64k --bs 64k --time_based \
     --runtime=$RUNTIME --offset=256k &amp;&gt; /dev/null)&amp;

sleep $((RUNTIME+1))

for i in g1 g2 g3; do
  echo ---- $i ----
  cat $i/blkio.time
done

EOF
# ./cfq-merge-repro.sh
---- g1 ----
8:16 162
---- g2 ----
8:16 165
---- g3 ----
8:16 686

After applying the patch:

# ./cfq-merge-repro.sh
---- g1 ----
8:16 90
---- g2 ----
8:16 445
---- g3 ----
8:16 471

Signed-off-by: Tahsin Erdogan &lt;tahsin@google.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: prepare elevator to use REQ_OPs.</title>
<updated>2016-06-07T19:41:38+00:00</updated>
<author>
<name>Mike Christie</name>
<email>mchristi@redhat.com</email>
</author>
<published>2016-06-05T19:32:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=ba568ea0a2ef9a193ca24874228474ec7ae2ff98'/>
<id>ba568ea0a2ef9a193ca24874228474ec7ae2ff98</id>
<content type='text'>
This patch converts the elevator code to use separate variables
for the operation and flags, and to check req_op for the REQ_OP.

Signed-off-by: Mike Christie &lt;mchristi@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch converts the elevator code to use separate variables
for the operation and flags, and to check req_op for the REQ_OP.

Signed-off-by: Mike Christie &lt;mchristi@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
