<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/crush, branch v4.8-rc1</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>crush: add chooseleaf_stable tunable</title>
<updated>2016-02-04T17:25:55+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2016-01-31T13:36:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=dc6ae6d8e7726bad4f1c87244b49cac851746c65'/>
<id>dc6ae6d8e7726bad4f1c87244b49cac851746c65</id>
<content type='text'>
Add a tunable to fix the bug that chooseleaf may cause unnecessary pg
migrations when some device fails.

Reflects ceph.git commit fdb3f664448e80d984470f32f04e2e6f03ab52ec.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a tunable to fix the bug that chooseleaf may cause unnecessary pg
migrations when some device fails.

Reflects ceph.git commit fdb3f664448e80d984470f32f04e2e6f03ab52ec.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: sync up with userspace</title>
<updated>2015-06-25T08:49:31+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-06-12T10:21:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b459be739f97e2062b2ba77cfe8ea198dbd58904'/>
<id>b459be739f97e2062b2ba77cfe8ea198dbd58904</id>
<content type='text'>
.. up to ceph.git commit 1db1abc8328d ("crush: eliminate ad hoc diff
between kernel and userspace").  This fixes a bunch of recently pulled
coding style issues and makes includes a bit cleaner.

A patch "crush:Make the function crush_ln static" from Nicholas Krause
&lt;xerofoify@gmail.com&gt; is folded in as crush_ln() has been made static
in userspace as well.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
.. up to ceph.git commit 1db1abc8328d ("crush: eliminate ad hoc diff
between kernel and userspace").  This fixes a bunch of recently pulled
coding style issues and makes includes a bit cleaner.

A patch "crush:Make the function crush_ln static" from Nicholas Krause
&lt;xerofoify@gmail.com&gt; is folded in as crush_ln() has been made static
in userspace as well.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: straw2 bucket type with an efficient 64-bit crush_ln()</title>
<updated>2015-04-22T15:33:43+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2015-04-14T13:54:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=958a27658d94cf212caeb0ffb04ee0b0bc89cc40'/>
<id>958a27658d94cf212caeb0ffb04ee0b0bc89cc40</id>
<content type='text'>
This is an improved straw bucket that correctly avoids any data movement
between items A and B when neither A nor B's weights are changed.  Said
differently, if we adjust the weight of item C (including adding it anew
or removing it completely), we will only see inputs move to or from C,
never between other items in the bucket.

Notably, there is not intermediate scaling factor that needs to be
calculated.  The mapping function is a simple function of the item weights.

The below commits were squashed together into this one (mostly to avoid
adding and then yanking a ~6000 lines worth of crush_ln_table):

- crush: add a straw2 bucket type
- crush: add crush_ln to calculate nature log efficently
- crush: improve straw2 adjustment slightly
- crush: change crush_ln to provide 32 more digits
- crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
- crush/mapper: fix divide-by-0 in straw2
  (with div64_s64() for draw = ln / w and INT64_MIN -&gt; S64_MIN - need
   to create a proper compat.h in ceph.git)

Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
                          32a1ead92efcd351822d22a5fc37d159c65c1338,
                          6289912418c4a3597a11778bcf29ed5415117ad9,
                          35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
                          6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
                          b5921d55d16796e12d66ad2c4add7305f9ce2353.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is an improved straw bucket that correctly avoids any data movement
between items A and B when neither A nor B's weights are changed.  Said
differently, if we adjust the weight of item C (including adding it anew
or removing it completely), we will only see inputs move to or from C,
never between other items in the bucket.

Notably, there is not intermediate scaling factor that needs to be
calculated.  The mapping function is a simple function of the item weights.

The below commits were squashed together into this one (mostly to avoid
adding and then yanking a ~6000 lines worth of crush_ln_table):

- crush: add a straw2 bucket type
- crush: add crush_ln to calculate nature log efficently
- crush: improve straw2 adjustment slightly
- crush: change crush_ln to provide 32 more digits
- crush: fix crush_get_bucket_item_weight and bucket destroy for straw2
- crush/mapper: fix divide-by-0 in straw2
  (with div64_s64() for draw = ln / w and INT64_MIN -&gt; S64_MIN - need
   to create a proper compat.h in ceph.git)

Reflects ceph.git commits 242293c908e923d474910f2b8203fa3b41eb5a53,
                          32a1ead92efcd351822d22a5fc37d159c65c1338,
                          6289912418c4a3597a11778bcf29ed5415117ad9,
                          35fcb04e2945717cf5cfe150b9fa89cb3d2303a1,
                          6445d9ee7290938de1e4ee9563912a6ab6d8ee5f,
                          b5921d55d16796e12d66ad2c4add7305f9ce2353.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: add SET_CHOOSELEAF_VARY_R step</title>
<updated>2014-04-05T04:07:28+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-03-19T14:58:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d83ed858f144e3cfbef883d4bc499113cdddabeb'/>
<id>d83ed858f144e3cfbef883d4bc499113cdddabeb</id>
<content type='text'>
This lets you adjust the vary_r tunable on a per-rule basis.

Reflects ceph.git commit f944ccc20aee60a7d8da7e405ec75ad1cd449fac.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This lets you adjust the vary_r tunable on a per-rule basis.

Reflects ceph.git commit f944ccc20aee60a7d8da7e405ec75ad1cd449fac.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: add chooseleaf_vary_r tunable</title>
<updated>2014-04-05T04:07:26+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2014-03-19T14:58:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e2b149cc4ba00766aceb87950c6de72ea7fc8b2e'/>
<id>e2b149cc4ba00766aceb87950c6de72ea7fc8b2e</id>
<content type='text'>
The current crush_choose_firstn code will re-use the same 'r' value for
the recursive call.  That means that if we are hitting a collision or
rejection for some reason (say, an OSD that is marked out) and need to
retry, we will keep making the same (bad) choice in that recursive
selection.

Introduce a tunable that fixes that behavior by incorporating the parent
'r' value into the recursive starting point, so that a different path
will be taken in subsequent placement attempts.

Note that this was done from the get-go for the new crush_choose_indep
algorithm.

This was exposed by a user who was seeing PGs stuck in active+remapped
after reweight-by-utilization because the up set mapped to a single OSD.

Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current crush_choose_firstn code will re-use the same 'r' value for
the recursive call.  That means that if we are hitting a collision or
rejection for some reason (say, an OSD that is marked out) and need to
retry, we will keep making the same (bad) choice in that recursive
selection.

Introduce a tunable that fixes that behavior by incorporating the parent
'r' value into the recursive starting point, so that a different path
will be taken in subsequent placement attempts.

Note that this was done from the get-go for the new crush_choose_indep
algorithm.

This was exposed by a user who was seeing PGs stuck in active+remapped
after reweight-by-utilization because the up set mapped to a single OSD.

Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Josh Durgin &lt;josh.durgin@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: add set_choose_local_[fallback_]tries steps</title>
<updated>2013-12-31T18:32:26+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2013-12-24T19:19:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f046bf92080cbdc4a94c6e86698c5a3f10716445'/>
<id>f046bf92080cbdc4a94c6e86698c5a3f10716445</id>
<content type='text'>
This allows all of the tunables to be overridden by a specific rule.

Reflects ceph.git commits d129e09e57fbc61cfd4f492e3ee77d0750c9d292,
                          0497db49e5973b50df26251ed0e3f4ac7578e66e.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This allows all of the tunables to be overridden by a specific rule.

Reflects ceph.git commits d129e09e57fbc61cfd4f492e3ee77d0750c9d292,
                          0497db49e5973b50df26251ed0e3f4ac7578e66e.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: CHOOSE_LEAF -&gt; CHOOSELEAF throughout</title>
<updated>2013-12-31T18:32:24+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2013-12-24T19:19:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=917edad5d1d62070436b74ecbf5ea019b651ff69'/>
<id>917edad5d1d62070436b74ecbf5ea019b651ff69</id>
<content type='text'>
This aligns the internal identifier names with the user-visible names in
the decompiled crush map language.

Reflects ceph.git commit caa0e22e15e4226c3671318ba1f61314bf6da2a6.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This aligns the internal identifier names with the user-visible names in
the decompiled crush map language.

Reflects ceph.git commit caa0e22e15e4226c3671318ba1f61314bf6da2a6.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: add SET_CHOOSE_TRIES rule step</title>
<updated>2013-12-31T18:32:23+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2013-12-24T19:19:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=cc10df4a3a5c34cb1d5b635ac70dd1fc406153ce'/>
<id>cc10df4a3a5c34cb1d5b635ac70dd1fc406153ce</id>
<content type='text'>
Since we can specify the recursive retries in a rule, we may as well also
specify the non-recursive tries too for completeness.

Reflects ceph.git commit d1b97462cffccc871914859eaee562f2786abfd1.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since we can specify the recursive retries in a rule, we may as well also
specify the non-recursive tries too for completeness.

Reflects ceph.git commit d1b97462cffccc871914859eaee562f2786abfd1.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: apply chooseleaf_tries to firstn mode too</title>
<updated>2013-12-31T18:32:22+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2013-12-24T19:19:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f18650ace38ef200dd1578257c75e9407297953c'/>
<id>f18650ace38ef200dd1578257c75e9407297953c</id>
<content type='text'>
Parameterize the attempts for the _firstn choose method, and apply the
rule-specified tries count to firstn mode as well.  Note that we have
slightly different behavior here than with indep:

 If the firstn value is not specified for firstn, we pass through the
 normal attempt count.  This maintains compatibility with legacy behavior.
 Note that this is usually *not* actually N^2 work, though, because of the
 descend_once tunable.  However, descend_once is unfortunately *not* the
 same thing as 1 chooseleaf try because it is only checked on a reject but
 not on a collision.  Sigh.

 In contrast, for indep, if tries is not specified we default to 1
 recursive attempt, because that is simply more sane, and we have the
 option to do so.  The descend_once tunable has no effect for indep.

Reflects ceph.git commit 64aeded50d80942d66a5ec7b604ff2fcbf5d7b63.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Parameterize the attempts for the _firstn choose method, and apply the
rule-specified tries count to firstn mode as well.  Note that we have
slightly different behavior here than with indep:

 If the firstn value is not specified for firstn, we pass through the
 normal attempt count.  This maintains compatibility with legacy behavior.
 Note that this is usually *not* actually N^2 work, though, because of the
 descend_once tunable.  However, descend_once is unfortunately *not* the
 same thing as 1 chooseleaf try because it is only checked on a reject but
 not on a collision.  Sigh.

 In contrast, for indep, if tries is not specified we default to 1
 recursive attempt, because that is simply more sane, and we have the
 option to do so.  The descend_once tunable has no effect for indep.

Reflects ceph.git commit 64aeded50d80942d66a5ec7b604ff2fcbf5d7b63.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>crush: new SET_CHOOSE_LEAF_TRIES command</title>
<updated>2013-12-31T18:32:21+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>ilya.dryomov@inktank.com</email>
</author>
<published>2013-12-24T19:19:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=be3226acc5544bcc91e756eb3ee6ca7b74f6f0a8'/>
<id>be3226acc5544bcc91e756eb3ee6ca7b74f6f0a8</id>
<content type='text'>
Explicitly control the number of sample attempts, and allow the number of
tries in the recursive call to be explicitly controlled via the rule. This
is important because the amount of time we want to spend looking for a
solution may be rule dependent (e.g., higher for the wide indep pool than
the rep pools).

(We should do the same for the other tunables, by the way!)

Reflects ceph.git commit c43c893be872f709c787bc57f46c0e97876ff681.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Explicitly control the number of sample attempts, and allow the number of
tries in the recursive call to be explicitly controlled via the rule. This
is important because the amount of time we want to spend looking for a
solution may be rule dependent (e.g., higher for the wide indep pool than
the rep pools).

(We should do the same for the other tunables, by the way!)

Reflects ceph.git commit c43c893be872f709c787bc57f46c0e97876ff681.

Signed-off-by: Ilya Dryomov &lt;ilya.dryomov@inktank.com&gt;
Reviewed-by: Sage Weil &lt;sage@inktank.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
