<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/crush, branch v4.14-rc3</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>libceph: fallback for when there isn't a pool-specific choose_arg</title>
<updated>2017-08-01T14:46:44+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2017-07-24T14:43:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e17e8969f5c59a10083af5e260bdad6026872203'/>
<id>e17e8969f5c59a10083af5e260bdad6026872203</id>
<content type='text'>
There is now a fallback to a choose_arg index of -1 if there isn't
a pool-specific choose_arg set.  If you create a per-pool weight-set,
that works for that pool.  Otherwise we try the compat/default one.  If
that doesn't exist either, then we use the normal CRUSH weights.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is now a fallback to a choose_arg index of -1 if there isn't
a pool-specific choose_arg set.  If you create a per-pool weight-set,
that works for that pool.  Otherwise we try the compat/default one.  If
that doesn't exist either, then we use the normal CRUSH weights.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libceph, crush: per-pool crush_choose_arg_map for crush_do_rule()</title>
<updated>2017-07-07T15:25:19+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2017-06-22T17:44:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5cf9c4a9959b6273675310d14a834ef14fbca37c'/>
<id>5cf9c4a9959b6273675310d14a834ef14fbca37c</id>
<content type='text'>
If there is no crush_choose_arg_map for a given pool, a NULL pointer is
passed to preserve existing crush_do_rule() behavior.

Reflects ceph.git commits 55fb91d64071552ea1bc65ab4ea84d3c8b73ab4b,
                          dbe36e08be00c6519a8c89718dd47b0219c20516.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If there is no crush_choose_arg_map for a given pool, a NULL pointer is
passed to preserve existing crush_do_rule() behavior.

Reflects ceph.git commits 55fb91d64071552ea1bc65ab4ea84d3c8b73ab4b,
                          dbe36e08be00c6519a8c89718dd47b0219c20516.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: implement weight and id overrides for straw2</title>
<updated>2017-07-07T15:25:19+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2017-06-22T17:44:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=069f3222ca96acfe8c59937e98c401bda5475b48'/>
<id>069f3222ca96acfe8c59937e98c401bda5475b48</id>
<content type='text'>
bucket_straw2_choose needs to use weights that may be different from
weight_items. For instance to compensate for an uneven distribution
caused by a low number of values. Or to fix the probability biais
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).

We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
the weight the second replica (second position). For instance the weight
matrix for a given bucket containing items 3, 7 and 13 could be as
follows:

          position 0   position 1

item 3     0x10000      0x100000
item 7     0x40000       0x10000
item 13    0x40000       0x10000

When crush_do_rule picks the first of two replicas (position 0), item 7,
3 are four times more likely to be choosen by bucket_straw2_choose than
item 13. When choosing the second replica (position 1), item 3 is ten
times more likely to be choosen than item 7, 13.

By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.

bucket_straw2_choose compares items by using their id. The same ids are
also used to index buckets and they must be unique. For each item in a
bucket an array of ids can be provided for placement purposes and they
are used instead of the ids. If no replacement ids are provided, the
legacy behavior is preserved.

Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
bucket_straw2_choose needs to use weights that may be different from
weight_items. For instance to compensate for an uneven distribution
caused by a low number of values. Or to fix the probability biais
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).

We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
the weight the second replica (second position). For instance the weight
matrix for a given bucket containing items 3, 7 and 13 could be as
follows:

          position 0   position 1

item 3     0x10000      0x100000
item 7     0x40000       0x10000
item 13    0x40000       0x10000

When crush_do_rule picks the first of two replicas (position 0), item 7,
3 are four times more likely to be choosen by bucket_straw2_choose than
item 13. When choosing the second replica (position 1), item 3 is ten
times more likely to be choosen than item 7, 13.

By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.

bucket_straw2_choose compares items by using their id. The same ids are
also used to index buckets and they must be unique. For each item in a
bucket an array of ids can be provided for placement purposes and they
are used instead of the ids. If no replacement ids are provided, the
legacy behavior is preserved.

Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: merge working data and scratch</title>
<updated>2017-02-20T11:16:11+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2017-01-31T14:55:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=743efcffffc6620ab44ea9ec67c7e4e28dfa7742'/>
<id>743efcffffc6620ab44ea9ec67c7e4e28dfa7742</id>
<content type='text'>
Much like Arlo Guthrie, I decided that one big pile is better than two
little piles.

Reflects ceph.git commit 95c2df6c7e0b22d2ea9d91db500cf8b9441c73ba.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Much like Arlo Guthrie, I decided that one big pile is better than two
little piles.

Reflects ceph.git commit 95c2df6c7e0b22d2ea9d91db500cf8b9441c73ba.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: remove mutable part of CRUSH map</title>
<updated>2017-02-20T11:16:11+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2017-01-31T14:55:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=66a0e2d579dbec5c676cfe446234ffebb267c564'/>
<id>66a0e2d579dbec5c676cfe446234ffebb267c564</id>
<content type='text'>
Then add it to the working state. It would be very nice if we didn't
have to take a lock to calculate a crush placement. By moving the
permutation array into the working data, we can treat the CRUSH map as
immutable.

Reflects ceph.git commit cbcd039651c0569551cb90d26ce27e1432671f2a.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Then add it to the working state. It would be very nice if we didn't
have to take a lock to calculate a crush placement. By moving the
permutation array into the working data, we can treat the CRUSH map as
immutable.

Reflects ceph.git commit cbcd039651c0569551cb90d26ce27e1432671f2a.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: add chooseleaf_stable tunable</title>
<updated>2016-02-04T17:25:55+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2016-01-31T13:36:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dc6ae6d8e7726bad4f1c87244b49cac851746c65'/>
<id>dc6ae6d8e7726bad4f1c87244b49cac851746c65</id>
<content type='text'>
Add a tunable to fix the bug that chooseleaf may cause unnecessary pg
migrations when some device fails.

Reflects ceph.git commit fdb3f664448e80d984470f32f04e2e6f03ab52ec.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a tunable to fix the bug that chooseleaf may cause unnecessary pg
migrations when some device fails.

Reflects ceph.git commit fdb3f664448e80d984470f32f04e2e6f03ab52ec.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: sync up with userspace</title>
<updated>2015-06-25T08:49:31+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-06-12T10:21:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b459be739f97e2062b2ba77cfe8ea198dbd58904'/>
<id>b459be739f97e2062b2ba77cfe8ea198dbd58904</id>
<content type='text'>
.. up to ceph.git commit 1db1abc8328d ("crush: eliminate ad hoc diff
between kernel and userspace").  This fixes a bunch of recently pulled
coding style issues and makes includes a bit cleaner.

A patch "crush:Make the function crush_ln static" from Nicholas Krause
&lt;xerofoify@gmail.com&gt; is folded in as crush_ln() has been made static
in userspace as well.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
.. up to ceph.git commit 1db1abc8328d ("crush: eliminate ad hoc diff
between kernel and userspace").  This fixes a bunch of recently pulled
coding style issues and makes includes a bit cleaner.

A patch "crush:Make the function crush_ln static" from Nicholas Krause
&lt;xerofoify@gmail.com&gt; is folded in as crush_ln() has been made static
in userspace as well.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: straw2 bucket type with an efficient 64-bit crush_ln()</title>
<updated>2015-04-22T15:33:43+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-04-14T13:54:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=958a27658d94cf212caeb0ffb04ee0b0bc89cc40'/>
<id>958a27658d94cf212caeb0ffb04ee0b0bc89cc40</id>
<content type='text'>
This is an improved straw bucket that correctly avoids any data movement
between items A and B when neither A nor B's weights are changed.  Said
differently, if we adjust the weight of item C (including adding it anew
or removing it completely), we will only see inputs move to or from C,
never between other items in the bucket.

Notably, there is not intermediate scaling factor that needs to be
calculated.  The mapping function is a simple function of the item weights.

The below commits were squashed together into this one (mostly to avoid
adding and then yanking a ~6000 lines worth of crush_ln_table):

- crush: add a straw2 bucket type
- crush: add crush_ln to calculate nature log efficently
- crush: improve straw2 adjustment slightly
- crush: change crush_ln to provide 32 more digits
- crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
- crush/mapper: fix divide-by-0 in straw2
  (with div64_s64() for draw = ln / w and INT64_MIN -&gt; S64_MIN - need
   to create a proper compat.h in ceph.git)

Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
                          32a1ead92efcd351822d22a5fc37d159c65c1338,
                          6289912418c4a3597a11778bcf29ed5415117ad9,
                          35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
                          6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
                          b5921d55d16796e12d66ad2c4add7305f9ce2353.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is an improved straw bucket that correctly avoids any data movement
between items A and B when neither A nor B's weights are changed.  Said
differently, if we adjust the weight of item C (including adding it anew
or removing it completely), we will only see inputs move to or from C,
never between other items in the bucket.

Notably, there is not intermediate scaling factor that needs to be
calculated.  The mapping function is a simple function of the item weights.

The below commits were squashed together into this one (mostly to avoid
adding and then yanking a ~6000 lines worth of crush_ln_table):

- crush: add a straw2 bucket type
- crush: add crush_ln to calculate nature log efficently
- crush: improve straw2 adjustment slightly
- crush: change crush_ln to provide 32 more digits
- crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
- crush/mapper: fix divide-by-0 in straw2
  (with div64_s64() for draw = ln / w and INT64_MIN -&gt; S64_MIN - need
   to create a proper compat.h in ceph.git)

Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
                          32a1ead92efcd351822d22a5fc37d159c65c1338,
                          6289912418c4a3597a11778bcf29ed5415117ad9,
                          35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
                          6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
                          b5921d55d16796e12d66ad2c4add7305f9ce2353.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: add SET_CHOOSELEAF_VARY_R step</title>
<updated>2014-04-05T04:07:28+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-03-19T14:58:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d83ed858f144e3cfbef883d4bc499113cdddabeb'/>
<id>d83ed858f144e3cfbef883d4bc499113cdddabeb</id>
<content type='text'>
This lets you adjust the vary_r tunable on a per-rule basis.

Reflects ceph.git commit f944ccc20aee60a7d8da7e405ec75ad1cd449fac.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This lets you adjust the vary_r tunable on a per-rule basis.

Reflects ceph.git commit f944ccc20aee60a7d8da7e405ec75ad1cd449fac.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: add chooseleaf_vary_r tunable</title>
<updated>2014-04-05T04:07:26+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-03-19T14:58:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e2b149cc4ba00766aceb87950c6de72ea7fc8b2e'/>
<id>e2b149cc4ba00766aceb87950c6de72ea7fc8b2e</id>
<content type='text'>
The current crush_choose_firstn code will re-use the same 'r' value for
the recursive call.  That means that if we are hitting a collision or
rejection for some reason (say, an OSD that is marked out) and need to
retry, we will keep making the same (bad) choice in that recursive
selection.

Introduce a tunable that fixes that behavior by incorporating the parent
'r' value into the recursive starting point, so that a different path
will be taken in subsequent placement attempts.

Note that this was done from the get-go for the new crush_choose_indep
algorithm.

This was exposed by a user who was seeing PGs stuck in active+remapped
after reweight-by-utilization because the up set mapped to a single OSD.

Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current crush_choose_firstn code will re-use the same 'r' value for
the recursive call.  That means that if we are hitting a collision or
rejection for some reason (say, an OSD that is marked out) and need to
retry, we will keep making the same (bad) choice in that recursive
selection.

Introduce a tunable that fixes that behavior by incorporating the parent
'r' value into the recursive starting point, so that a different path
will be taken in subsequent placement attempts.

Note that this was done from the get-go for the new crush_choose_indep
algorithm.

This was exposed by a user who was seeing PGs stuck in active+remapped
after reweight-by-utilization because the up set mapped to a single OSD.

Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
