<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/md/dm-table.c, branch v3.18.3</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>dm: allow active and inactive tables to share dm_devs</title>
<updated>2014-10-06T00:03:35+00:00</updated>
<author>
<name>Benjamin Marzinski</name>
<email>bmarzins@redhat.com</email>
</author>
<published>2014-08-13T18:53:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=86f1152b117a404229fd6f08ec3faca779f37b92'/>
<id>86f1152b117a404229fd6f08ec3faca779f37b92</id>
<content type='text'>
Until this change, when loading a new DM table, DM core would re-open
all of the devices in the DM table.  Now, DM core will avoid redundant
device opens (and closes when destroying the old table) if the old
table already has a device open using the same mode.  This is achieved
by managing reference counts on the table_devices that DM core now
stores in the mapped_device structure (rather than in the dm_table
structure).  So a mapped_device's active and inactive dm_tables' dm_dev
lists now just point to the dm_devs stored in the mapped_device's
table_devices list.

This improvement in DM core's device reference counting has the
side-effect of fixing a long-standing limitation of the multipath
target: a DM multipath table couldn't include any paths that were unusable
(failed).  For example: if all paths have failed and you add a new,
working, path to the table; you can't use it since the table load would
fail due to it still containing failed paths.  Now a re-load of a
multipath table can include failed devices and when those devices become
active again they can be used instantly.

The device list code in dm.c isn't a straight copy/paste from the code in
dm-table.c, but it's very close (aside from some variable renames).  One
subtle difference is that find_table_device for the tables_devices list
will only match devices with the same name and mode.  This is because we
don't want to upgrade a device's mode in the active table when an
inactive table is loaded.

Access to the mapped_device structure's tables_devices list requires a
mutex (tables_devices_lock), so that tables cannot be created and
destroyed concurrently.

Signed-off-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Until this change, when loading a new DM table, DM core would re-open
all of the devices in the DM table.  Now, DM core will avoid redundant
device opens (and closes when destroying the old table) if the old
table already has a device open using the same mode.  This is achieved
by managing reference counts on the table_devices that DM core now
stores in the mapped_device structure (rather than in the dm_table
structure).  So a mapped_device's active and inactive dm_tables' dm_dev
lists now just point to the dm_devs stored in the mapped_device's
table_devices list.

This improvement in DM core's device reference counting has the
side-effect of fixing a long-standing limitation of the multipath
target: a DM multipath table couldn't include any paths that were unusable
(failed).  For example: if all paths have failed and you add a new,
working, path to the table; you can't use it since the table load would
fail due to it still containing failed paths.  Now a re-load of a
multipath table can include failed devices and when those devices become
active again they can be used instantly.

The device list code in dm.c isn't a straight copy/paste from the code in
dm-table.c, but it's very close (aside from some variable renames).  One
subtle difference is that find_table_device for the tables_devices list
will only match devices with the same name and mode.  This is because we
don't want to upgrade a device's mode in the active table when an
inactive table is loaded.

Access to the mapped_device structure's tables_devices list requires a
mutex (tables_devices_lock), so that tables cannot be created and
destroyed concurrently.

Signed-off-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm table: propagate QUEUE_FLAG_NO_SG_MERGE</title>
<updated>2014-08-11T00:54:49+00:00</updated>
<author>
<name>Jeff Moyer</name>
<email>jmoyer@redhat.com</email>
</author>
<published>2014-08-08T15:03:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=200612ec33e555a356eebc717630b866ae2b694f'/>
<id>200612ec33e555a356eebc717630b866ae2b694f</id>
<content type='text'>
Commit 05f1dd5 ("block: add queue flag for disabling SG merging")
introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.  This gets set by
default in blk_mq_init_queue for mq-enabled devices.  The effect of
the flag is to bypass the SG segment merging.  Instead, the
bio-&gt;bi_vcnt is used as the number of hardware segments.

With a device mapper target on top of a device with
QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
than a driver is prepared to handle.  I ran into this when backporting
the virtio_blk mq support.  It triggerred this BUG_ON, in
virtio_queue_rq:

        BUG_ON(req-&gt;nr_phys_segments + 2 &gt; vblk-&gt;sg_elems);

The queue's max is set here:
        blk_queue_max_segments(q, vblk-&gt;sg_elems-2);

Basically, what happens is that a bio is built up for the dm device
(which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using
bio_add_page.  That path will call into __blk_recalc_rq_segments, so
what you end up with is bi_phys_segments being much smaller than bi_vcnt
(and bi_vcnt grows beyond the maximum sg elements).  Then, when the bio
is submitted, it gets cloned.  When the cloned bio is submitted, it will
end up in blk_recount_segments, here:

        if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &amp;q-&gt;queue_flags))
                bio-&gt;bi_phys_segments = bio-&gt;bi_vcnt;

and now we've set bio-&gt;bi_phys_segments to a number that is beyond what
was registered as queue_max_segments by the driver.

The right way to fix this is to propagate the queue flag up the stack.

The rules for propagating the flag are simple:
- if the flag is set for any underlying device, it must be set for the
  upper device
- consequently, if the flag is not set for any underlying device, it
  should not be set for the upper device.

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org # 3.16+
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 05f1dd5 ("block: add queue flag for disabling SG merging")
introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.  This gets set by
default in blk_mq_init_queue for mq-enabled devices.  The effect of
the flag is to bypass the SG segment merging.  Instead, the
bio-&gt;bi_vcnt is used as the number of hardware segments.

With a device mapper target on top of a device with
QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
than a driver is prepared to handle.  I ran into this when backporting
the virtio_blk mq support.  It triggerred this BUG_ON, in
virtio_queue_rq:

        BUG_ON(req-&gt;nr_phys_segments + 2 &gt; vblk-&gt;sg_elems);

The queue's max is set here:
        blk_queue_max_segments(q, vblk-&gt;sg_elems-2);

Basically, what happens is that a bio is built up for the dm device
(which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using
bio_add_page.  That path will call into __blk_recalc_rq_segments, so
what you end up with is bi_phys_segments being much smaller than bi_vcnt
(and bi_vcnt grows beyond the maximum sg elements).  Then, when the bio
is submitted, it gets cloned.  When the cloned bio is submitted, it will
end up in blk_recount_segments, here:

        if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &amp;q-&gt;queue_flags))
                bio-&gt;bi_phys_segments = bio-&gt;bi_vcnt;

and now we've set bio-&gt;bi_phys_segments to a number that is beyond what
was registered as queue_max_segments by the driver.

The right way to fix this is to propagate the queue flag up the stack.

The rules for propagating the flag are simple:
- if the flag is set for any underlying device, it must be set for the
  upper device
- consequently, if the flag is not set for any underlying device, it
  should not be set for the upper device.

Signed-off-by: Jeff Moyer &lt;jmoyer@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org # 3.16+
</pre>
</div>
</content>
</entry>
<entry>
<title>dm table: make dm_table_supports_discards static</title>
<updated>2014-08-01T16:30:34+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2014-07-10T16:23:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a7ffb6a53391c2690263675f13c79a273301d2b3'/>
<id>a7ffb6a53391c2690263675f13c79a273301d2b3</id>
<content type='text'>
The function dm_table_supports_discards is only called from
dm-table.c:dm_table_set_restrictions().  So move it above
dm_table_set_restrictions and make it static.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The function dm_table_supports_discards is only called from
dm-table.c:dm_table_set_restrictions().  So move it above
dm_table_set_restrictions and make it static.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: remove symbol export for dm_set_device_limits</title>
<updated>2014-06-04T13:46:34+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2014-06-03T14:30:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=11f0431be2f99c574a65c6dfc0ca205511500f29'/>
<id>11f0431be2f99c574a65c6dfc0ca205511500f29</id>
<content type='text'>
There is no need for code other than DM core to use dm_set_device_limits
so remove its EXPORT_SYMBOL_GPL.  Also, cleanup a couple whitespace nits.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is no need for code other than DM core to use dm_set_device_limits
so remove its EXPORT_SYMBOL_GPL.  Also, cleanup a couple whitespace nits.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm table: add dm_table_run_md_queue_async</title>
<updated>2014-03-27T20:56:24+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2014-02-28T14:33:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9974fa2c6a7d470ca3c201fe7dbac64bf4dd8d2a'/>
<id>9974fa2c6a7d470ca3c201fe7dbac64bf4dd8d2a</id>
<content type='text'>
Introduce dm_table_run_md_queue_async() to run the request_queue of the
mapped_device associated with a request-based DM table.

Also add dm_md_get_queue() wrapper to extract the request_queue from a
mapped_device.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Jun'ichi Nomura &lt;j-nomura@ce.jp.nec.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce dm_table_run_md_queue_async() to run the request_queue of the
mapped_device associated with a request-based DM table.

Also add dm_md_get_queue() wrapper to extract the request_queue from a
mapped_device.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Jun'ichi Nomura &lt;j-nomura@ce.jp.nec.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: make dm_table_alloc_md_mempools static</title>
<updated>2014-03-27T20:56:23+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2014-02-13T18:43:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=473c36dfeecf4e49db928f3284b2fbe981f8c284'/>
<id>473c36dfeecf4e49db928f3284b2fbe981f8c284</id>
<content type='text'>
Make the function dm_table_alloc_md_mempools static because it is not
called from another file.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make the function dm_table_alloc_md_mempools static because it is not
called from another file.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm table: remove unused buggy code that extends the targets array</title>
<updated>2014-01-07T15:11:44+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2013-11-23T00:51:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=57a2f238564e0700c8648238d31f366246a5b963'/>
<id>57a2f238564e0700c8648238d31f366246a5b963</id>
<content type='text'>
A device mapper table is allocated in the following way:
* The function dm_table_create is called, it gets the number of targets
  as an argument -- it allocates a targets array accordingly.
* For each target, we call dm_table_add_target.

If we add more targets than were specified in dm_table_create, the
function dm_table_add_target reallocates the targets array.  However,
this reallocation code is wrong - it moves the targets array to a new
location, while some target constructors hold pointers to the array in
the old location.

The following DM target drivers save the pointer to the target
structure, so they corrupt memory if the target array is moved:
multipath, raid, mirror, snapshot, stripe, switch, thin, verity.

Under normal circumstances, the reallocation function is not called
(because dm_table_create is called with the correct number of targets),
so the buggy reallocation code is not used.

Prior to the fix "dm table: fail dm_table_create on dm_round_up
overflow", the reallocation code could only be used in case the user
specifies too large a value in param-&gt;target_count, such as 0xffffffff.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A device mapper table is allocated in the following way:
* The function dm_table_create is called, it gets the number of targets
  as an argument -- it allocates a targets array accordingly.
* For each target, we call dm_table_add_target.

If we add more targets than were specified in dm_table_create, the
function dm_table_add_target reallocates the targets array.  However,
this reallocation code is wrong - it moves the targets array to a new
location, while some target constructors hold pointers to the array in
the old location.

The following DM target drivers save the pointer to the target
structure, so they corrupt memory if the target array is moved:
multipath, raid, mirror, snapshot, stripe, switch, thin, verity.

Under normal circumstances, the reallocation function is not called
(because dm_table_create is called with the correct number of targets),
so the buggy reallocation code is not used.

Prior to the fix "dm table: fail dm_table_create on dm_round_up
overflow", the reallocation code could only be used in case the user
specifies too large a value in param-&gt;target_count, such as 0xffffffff.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm table: fail dm_table_create on dm_round_up overflow</title>
<updated>2013-12-10T21:34:27+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2013-11-23T00:52:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5b2d06576c5410c10d95adfd5c4d8b24de861d87'/>
<id>5b2d06576c5410c10d95adfd5c4d8b24de861d87</id>
<content type='text'>
The dm_round_up function may overflow to zero.  In this case,
dm_table_create() must fail rather than go on to allocate an empty array
with alloc_targets().

This fixes a possible memory corruption that could be caused by passing
too large a number in "param-&gt;target_count".

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The dm_round_up function may overflow to zero.  In this case,
dm_table_create() must fail rather than go on to allocate an empty array
with alloc_targets().

This fixes a possible memory corruption that could be caused by passing
too large a number in "param-&gt;target_count".

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>dm table: print error on preresume failure</title>
<updated>2013-11-09T23:20:21+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2013-10-24T18:10:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7833b08e18241a1c35c09ef38be840cbf6c58acf'/>
<id>7833b08e18241a1c35c09ef38be840cbf6c58acf</id>
<content type='text'>
If preresume fails it is worth logging an error given that a device is
left suspended due to the failure.

This change was motivated by local preresume error logging that was
added to the cache target ("preresume failed").  Elevating this
target-agnostic context for the where the target-specific error occurred
relative to the DM core's callouts makes sense.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If preresume fails it is worth logging an error given that a device is
left suspended due to the failure.

This change was motivated by local preresume error logging that was
added to the cache target ("preresume failed").  Elevating this
target-agnostic context for the where the target-specific error occurred
relative to the DM core's callouts makes sense.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: allocate buffer for messages with small number of arguments using GFP_NOIO</title>
<updated>2013-10-31T17:55:45+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2013-10-31T17:55:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f36afb3957353d2529cb2b00f78fdccd14fc5e9c'/>
<id>f36afb3957353d2529cb2b00f78fdccd14fc5e9c</id>
<content type='text'>
dm-mpath and dm-thin must process messages even if some device is
suspended, so we allocate argv buffer with GFP_NOIO. These messages have
a small fixed number of arguments.

On the other hand, dm-switch needs to process bulk data using messages
so excessive use of GFP_NOIO could cause trouble.

The patch also lowers the default number of arguments from 64 to 8, so
that there is smaller load on GFP_NOIO allocations.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Cc: stable@vger.kernel.org
Acked-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dm-mpath and dm-thin must process messages even if some device is
suspended, so we allocate argv buffer with GFP_NOIO. These messages have
a small fixed number of arguments.

On the other hand, dm-switch needs to process bulk data using messages
so excessive use of GFP_NOIO could cause trouble.

The patch also lowers the default number of arguments from 64 to 8, so
that there is smaller load on GFP_NOIO allocations.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Cc: stable@vger.kernel.org
Acked-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
