<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/md/dm.h, branch v3.6</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>dm: retain table limits when swapping to new table with no devices</title>
<updated>2012-09-26T22:45:45+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2012-09-26T22:45:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3ae706561637331aa578e52bb89ecbba5edcb7a9'/>
<id>3ae706561637331aa578e52bb89ecbba5edcb7a9</id>
<content type='text'>
Add a safety net that will re-use the DM device's existing limits in the
event that DM device has a temporary table that doesn't have any
component devices.  This is to reduce the chance that requests not
respecting the hardware limits will reach the device.

DM recalculates queue limits based only on devices which currently exist
in the table.  This creates a problem in the event all devices are
temporarily removed such as all paths being lost in multipath.  DM will
reset the limits to the maximum permissible, which can then assemble
requests which exceed the limits of the paths when the paths are
restored.  The request will fail the blk_rq_check_limits() test when
sent to a path with lower limits, and will be retried without end by
multipath.  This became a much bigger issue after v3.6 commit fe86cdcef
("block: do not artificially constrain max_sectors for stacking
drivers").

Reported-by: David Jeffery &lt;djeffery@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a safety net that will re-use the DM device's existing limits in the
event that DM device has a temporary table that doesn't have any
component devices.  This is to reduce the chance that requests not
respecting the hardware limits will reach the device.

DM recalculates queue limits based only on devices which currently exist
in the table.  This creates a problem in the event all devices are
temporarily removed such as all paths being lost in multipath.  DM will
reset the limits to the maximum permissible, which can then assemble
requests which exceed the limits of the paths when the paths are
restored.  The request will fail the blk_rq_check_limits() test when
sent to a path with lower limits, and will be retried without end by
multipath.  This became a much bigger issue after v3.6 commit fe86cdcef
("block: do not artificially constrain max_sectors for stacking
drivers").

Reported-by: David Jeffery &lt;djeffery@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm thin: commit before gathering status</title>
<updated>2012-07-27T14:08:16+00:00</updated>
<author>
<name>Alasdair G Kergon</name>
<email>agk@redhat.com</email>
</author>
<published>2012-07-27T14:08:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1f4e0ff07980820977f45d6a5dbc81d3bb9ce4d3'/>
<id>1f4e0ff07980820977f45d6a5dbc81d3bb9ce4d3</id>
<content type='text'>
Commit outstanding metadata before returning the status for a dm thin
pool so that the numbers reported are as up-to-date as possible.

The commit is not performed if the device is suspended or if
the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
through a new 'status_flags' parameter in the target's dm_status_fn.

The userspace dmsetup tool will support the --noflush flag with the
'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
onwards.

Tested-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit outstanding metadata before returning the status for a dm thin
pool so that the numbers reported are as up-to-date as possible.

The commit is not performed if the device is suspended or if
the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
through a new 'status_flags' parameter in the target's dm_status_fn.

The userspace dmsetup tool will support the --noflush flag with the
'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
onwards.

Tested-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm table: add immutable feature</title>
<updated>2011-10-31T20:19:04+00:00</updated>
<author>
<name>Alasdair G Kergon</name>
<email>agk@redhat.com</email>
</author>
<published>2011-10-31T20:19:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=36a0456fbf2d9680bf9af81b39daf4a8e22cb1b8'/>
<id>36a0456fbf2d9680bf9af81b39daf4a8e22cb1b8</id>
<content type='text'>
Introduce DM_TARGET_IMMUTABLE to indicate that the target type cannot be mixed
with any other target type, and once loaded into a device, it cannot be
replaced with a table containing a different type.

The thin provisioning pool device will use this.

Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce DM_TARGET_IMMUTABLE to indicate that the target type cannot be mixed
with any other target type, and once loaded into a device, it cannot be
replaced with a table containing a different type.

The thin provisioning pool device will use this.

Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: ignore merge_bvec for snapshots when safe</title>
<updated>2011-08-02T11:32:04+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2011-08-02T11:32:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d5b9dd04bd74b774b8e8d93ced7a0d15ad403fa9'/>
<id>d5b9dd04bd74b774b8e8d93ced7a0d15ad403fa9</id>
<content type='text'>
Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate
whether the device can accept bios larger than the size its merge
function returns.  When set, use this to send large bios to snapshots
which can split them if necessary.  Snapshot I/O may be significantly
fragmented and this approach seems to improve peformance.

Before the patch, dm_set_device_limits restricted bio size to page size
if the underlying device had a merge function and the target didn't
provide a merge function.  After the patch, dm_set_device_limits
restricts bio size to page size if the underlying device has a merge
function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't
provide a merge function.

The snapshot target can't provide a merge function because when the merge
function is called, it is impossible to determine where the bio will be
remapped.  Previously this led us to impose a 4k limit, which we can
now remove if the snapshot store is located on a device without a merge
function.  Together with another patch for optimizing full chunk writes,
it improves performance from 29MB/s to 40MB/s when writing to the
filesystem on snapshot store.

If the snapshot store is placed on a non-dm device with a merge function
(such as md-raid), device mapper still limits all bios to page size.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate
whether the device can accept bios larger than the size its merge
function returns.  When set, use this to send large bios to snapshots
which can split them if necessary.  Snapshot I/O may be significantly
fragmented and this approach seems to improve peformance.

Before the patch, dm_set_device_limits restricted bio size to page size
if the underlying device had a merge function and the target didn't
provide a merge function.  After the patch, dm_set_device_limits
restricts bio size to page size if the underlying device has a merge
function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't
provide a merge function.

The snapshot target can't provide a merge function because when the merge
function is called, it is impossible to determine where the bio will be
remapped.  Previously this led us to impose a 4k limit, which we can
now remove if the snapshot store is located on a device without a merge
function.  Together with another patch for optimizing full chunk writes,
it improves performance from 29MB/s to 40MB/s when writing to the
filesystem on snapshot store.

If the snapshot store is placed on a non-dm device with a merge function
(such as md-raid), device mapper still limits all bios to page size.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: Require subsystems to explicitly allocate bio_set integrity mempool</title>
<updated>2011-03-17T10:11:05+00:00</updated>
<author>
<name>Martin K. Petersen</name>
<email>martin.petersen@oracle.com</email>
</author>
<published>2011-03-17T10:11:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a91a2785b200864aef2270ed6a3babac7a253a20'/>
<id>a91a2785b200864aef2270ed6a3babac7a253a20</id>
<content type='text'>
MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.

Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.

Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reported-by: Zdenek Kabelac &lt;zkabelac@redhat.com&gt;
Acked-by: Mike Snitzer &lt;snizer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.

Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.

Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reported-by: Zdenek Kabelac &lt;zkabelac@redhat.com&gt;
Acked-by: Mike Snitzer &lt;snizer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;jaxboe@fusionio.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: linear support discard</title>
<updated>2010-08-12T03:14:08+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2010-08-12T03:14:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=5ae89a8720c28caf35c4e53711d77df2856c404e'/>
<id>5ae89a8720c28caf35c4e53711d77df2856c404e</id>
<content type='text'>
Allow discards to be passed through to linear mappings if at least one
underlying device supports it.  Discards will be forwarded only to
devices that support them.

A target that supports discards should set num_discard_requests to
indicate how many times each discard request must be submitted to it.

Verify table's underlying devices support discards prior to setting the
associated DM device as capable of discards (via QUEUE_FLAG_DISCARD).

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Reviewed-by: Joe Thornber &lt;thornber@redhat.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Allow discards to be passed through to linear mappings if at least one
underlying device supports it.  Discards will be forwarded only to
devices that support them.

A target that supports discards should set num_discard_requests to
indicate how many times each discard request must be submitted to it.

Verify table's underlying devices support discards prior to setting the
associated DM device as capable of discards (via QUEUE_FLAG_DISCARD).

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Reviewed-by: Joe Thornber &lt;thornber@redhat.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm ioctl: refactor dm_table_complete</title>
<updated>2010-08-12T03:14:03+00:00</updated>
<author>
<name>Will Drewry</name>
<email>wad@chromium.org</email>
</author>
<published>2010-08-12T03:14:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=26803b9f06d365122fae82e7554a66ef8278e0bb'/>
<id>26803b9f06d365122fae82e7554a66ef8278e0bb</id>
<content type='text'>
This change unifies the various checks and finalization that occurs on a
table prior to use.  By doing so, it allows table construction without
traversing the dm-ioctl interface.

Signed-off-by: Will Drewry &lt;wad@chromium.org&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This change unifies the various checks and finalization that occurs on a
table prior to use.  By doing so, it allows table construction without
traversing the dm-ioctl interface.

Signed-off-by: Will Drewry &lt;wad@chromium.org&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: do not initialise full request queue when bio based</title>
<updated>2010-08-12T03:14:02+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2010-08-12T03:14:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4a0b4ddf261fc89c050fe0a10ec57a61251d7ac0'/>
<id>4a0b4ddf261fc89c050fe0a10ec57a61251d7ac0</id>
<content type='text'>
Change bio-based mapped devices no longer to have a fully initialized
request_queue (request_fn, elevator, etc).  This means bio-based DM
devices no longer register elevator sysfs attributes ('iosched/' tree
or 'scheduler' other than "none").

In contrast, a request-based DM device will continue to have a full
request_queue and will register elevator sysfs attributes.  Therefore
a user can determine a DM device's type by checking if elevator sysfs
attributes exist.

First allocate a minimalist request_queue structure for a DM device
(needed for both bio and request-based DM).

Initialization of a full request_queue is deferred until it is known
that the DM device is request-based, at the end of the table load
sequence.

Factor DM device's request_queue initialization:
- common to both request-based and bio-based into dm_init_md_queue().
- specific to request-based into dm_init_request_based_queue().

The md-&gt;type_lock mutex is used to protect md-&gt;queue, in addition to
md-&gt;type, during table_load().

A DM device's first table_load will establish the immutable md-&gt;type.
But md-&gt;queue initialization, based on md-&gt;type, may fail at that time
(because blk_init_allocated_queue cannot allocate memory).  Therefore
any subsequent table_load must (re)try dm_setup_md_queue independently of
establishing md-&gt;type.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Kiyoshi Ueda &lt;k-ueda@ct.jp.nec.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change bio-based mapped devices no longer to have a fully initialized
request_queue (request_fn, elevator, etc).  This means bio-based DM
devices no longer register elevator sysfs attributes ('iosched/' tree
or 'scheduler' other than "none").

In contrast, a request-based DM device will continue to have a full
request_queue and will register elevator sysfs attributes.  Therefore
a user can determine a DM device's type by checking if elevator sysfs
attributes exist.

First allocate a minimalist request_queue structure for a DM device
(needed for both bio and request-based DM).

Initialization of a full request_queue is deferred until it is known
that the DM device is request-based, at the end of the table load
sequence.

Factor DM device's request_queue initialization:
- common to both request-based and bio-based into dm_init_md_queue().
- specific to request-based into dm_init_request_based_queue().

The md-&gt;type_lock mutex is used to protect md-&gt;queue, in addition to
md-&gt;type, during table_load().

A DM device's first table_load will establish the immutable md-&gt;type.
But md-&gt;queue initialization, based on md-&gt;type, may fail at that time
(because blk_init_allocated_queue cannot allocate memory).  Therefore
any subsequent table_load must (re)try dm_setup_md_queue independently of
establishing md-&gt;type.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Kiyoshi Ueda &lt;k-ueda@ct.jp.nec.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm ioctl: make bio or request based device type immutable</title>
<updated>2010-08-12T03:14:01+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2010-08-12T03:14:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a5664dad7e1a278d2915c2bf79cf42250e12d7db'/>
<id>a5664dad7e1a278d2915c2bf79cf42250e12d7db</id>
<content type='text'>
Determine whether a mapped device is bio-based or request-based when
loading its first (inactive) table and don't allow that to be changed
later.

This patch performs different device initialisation in each of the two
cases.  (We don't think it's necessary to add code to support changing
between the two types.)

Allowed md-&gt;type transitions:
  DM_TYPE_NONE to DM_TYPE_BIO_BASED
  DM_TYPE_NONE to DM_TYPE_REQUEST_BASED

We now prevent table_load from replacing the inactive table with a
conflicting type of table even after an explicit table_clear.

Introduce 'type_lock' into the struct mapped_device to protect md-&gt;type
and to prepare for the next patch that will change the queue
initialization and allocate memory while md-&gt;type_lock is held.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Kiyoshi Ueda &lt;k-ueda@ct.jp.nec.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;

 drivers/md/dm-ioctl.c    |   15 +++++++++++++++
 drivers/md/dm.c          |   37 ++++++++++++++++++++++++++++++-------
 drivers/md/dm.h          |    5 +++++
 include/linux/dm-ioctl.h |    4 ++--
 4 files changed, 52 insertions(+), 9 deletions(-)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Determine whether a mapped device is bio-based or request-based when
loading its first (inactive) table and don't allow that to be changed
later.

This patch performs different device initialisation in each of the two
cases.  (We don't think it's necessary to add code to support changing
between the two types.)

Allowed md-&gt;type transitions:
  DM_TYPE_NONE to DM_TYPE_BIO_BASED
  DM_TYPE_NONE to DM_TYPE_REQUEST_BASED

We now prevent table_load from replacing the inactive table with a
conflicting type of table even after an explicit table_clear.

Introduce 'type_lock' into the struct mapped_device to protect md-&gt;type
and to prepare for the next patch that will change the queue
initialization and allocate memory while md-&gt;type_lock is held.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Kiyoshi Ueda &lt;k-ueda@ct.jp.nec.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;

 drivers/md/dm-ioctl.c    |   15 +++++++++++++++
 drivers/md/dm.c          |   37 ++++++++++++++++++++++++++++++-------
 drivers/md/dm.h          |    5 +++++
 include/linux/dm-ioctl.h |    4 ++--
 4 files changed, 52 insertions(+), 9 deletions(-)
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: separate device deletion from dm_put</title>
<updated>2010-08-12T03:13:56+00:00</updated>
<author>
<name>Kiyoshi Ueda</name>
<email>k-ueda@ct.jp.nec.com</email>
</author>
<published>2010-08-12T03:13:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=3f77316de0ec0fd208467fbee8d9edc70e2c73b2'/>
<id>3f77316de0ec0fd208467fbee8d9edc70e2c73b2</id>
<content type='text'>
This patch separates the device deletion code from dm_put()
to make sure the deletion happens in the process context.

By this patch, device deletion always occurs in an ioctl (process)
context and dm_put() can be called in interrupt context.
As a result, the request-based dm's bad dm_put() usage pointed out
by Mikulas below disappears.
    http://marc.info/?l=dm-devel&amp;m=126699981019735&amp;w=2

Without this patch, I confirmed there is a case to crash the system:
    dm_put() =&gt; dm_table_destroy() =&gt; vfree() =&gt; BUG_ON(in_interrupt())

Some more backgrounds and details:
In request-based dm, a device opener can remove a mapped_device
while the last request is still completing, because bios in the last
request complete first and then the device opener can close and remove
the mapped_device before the last request completes:
  CPU0                                          CPU1
  =================================================================
  &lt;&lt;INTERRUPT&gt;&gt;
  blk_end_request_all(clone_rq)
    blk_update_request(clone_rq)
      bio_endio(clone_bio) == end_clone_bio
        blk_update_request(orig_rq)
          bio_endio(orig_bio)
                                                &lt;&lt;I/O completed&gt;&gt;
                                                dm_blk_close()
                                                dev_remove()
                                                  dm_put(md)
                                                    &lt;&lt;Free md&gt;&gt;
   blk_finish_request(clone_rq)
     ....
     dm_end_request(clone_rq)
       free_rq_clone(clone_rq)
       blk_end_request_all(orig_rq)
       rq_completed(md)

So request-based dm used dm_get()/dm_put() to hold md for each I/O
until its request completion handling is fully done.
However, the final dm_put() can call the device deletion code which
must not be run in interrupt context and may cause kernel panic.

To solve the problem, this patch moves the device deletion code,
dm_destroy(), to predetermined places that is actually deleting
the mapped_device in ioctl (process) context, and changes dm_put()
just to decrement the reference count of the mapped_device.
By this change, dm_put() can be used in any context and the symmetric
model below is introduced:
    dm_create():  create a mapped_device
    dm_destroy(): destroy a mapped_device
    dm_get():     increment the reference count of a mapped_device
    dm_put():     decrement the reference count of a mapped_device

dm_destroy() waits for all references of the mapped_device to disappear,
then deletes the mapped_device.

dm_destroy() uses active waiting with msleep(1), since deleting
the mapped_device isn't performance-critical task.
And since at this point, nobody opens the mapped_device and no new
reference will be taken, the pending counts are just for racing
completing activity and will eventually decrease to zero.

For the unlikely case of the forced module unload, dm_destroy_immediate(),
which doesn't wait and forcibly deletes the mapped_device, is also
introduced and used in dm_hash_remove_all().  Otherwise, "rmmod -f"
may be stuck and never return.
And now, because the mapped_device is deleted at this point, subsequent
accesses to the mapped_device may cause NULL pointer references.

Cc: stable@kernel.org
Signed-off-by: Kiyoshi Ueda &lt;k-ueda@ct.jp.nec.com&gt;
Signed-off-by: Jun'ichi Nomura &lt;j-nomura@ce.jp.nec.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch separates the device deletion code from dm_put()
to make sure the deletion happens in the process context.

By this patch, device deletion always occurs in an ioctl (process)
context and dm_put() can be called in interrupt context.
As a result, the request-based dm's bad dm_put() usage pointed out
by Mikulas below disappears.
    http://marc.info/?l=dm-devel&amp;m=126699981019735&amp;w=2

Without this patch, I confirmed there is a case to crash the system:
    dm_put() =&gt; dm_table_destroy() =&gt; vfree() =&gt; BUG_ON(in_interrupt())

Some more backgrounds and details:
In request-based dm, a device opener can remove a mapped_device
while the last request is still completing, because bios in the last
request complete first and then the device opener can close and remove
the mapped_device before the last request completes:
  CPU0                                          CPU1
  =================================================================
  &lt;&lt;INTERRUPT&gt;&gt;
  blk_end_request_all(clone_rq)
    blk_update_request(clone_rq)
      bio_endio(clone_bio) == end_clone_bio
        blk_update_request(orig_rq)
          bio_endio(orig_bio)
                                                &lt;&lt;I/O completed&gt;&gt;
                                                dm_blk_close()
                                                dev_remove()
                                                  dm_put(md)
                                                    &lt;&lt;Free md&gt;&gt;
   blk_finish_request(clone_rq)
     ....
     dm_end_request(clone_rq)
       free_rq_clone(clone_rq)
       blk_end_request_all(orig_rq)
       rq_completed(md)

So request-based dm used dm_get()/dm_put() to hold md for each I/O
until its request completion handling is fully done.
However, the final dm_put() can call the device deletion code which
must not be run in interrupt context and may cause kernel panic.

To solve the problem, this patch moves the device deletion code,
dm_destroy(), to predetermined places that is actually deleting
the mapped_device in ioctl (process) context, and changes dm_put()
just to decrement the reference count of the mapped_device.
By this change, dm_put() can be used in any context and the symmetric
model below is introduced:
    dm_create():  create a mapped_device
    dm_destroy(): destroy a mapped_device
    dm_get():     increment the reference count of a mapped_device
    dm_put():     decrement the reference count of a mapped_device

dm_destroy() waits for all references of the mapped_device to disappear,
then deletes the mapped_device.

dm_destroy() uses active waiting with msleep(1), since deleting
the mapped_device isn't performance-critical task.
And since at this point, nobody opens the mapped_device and no new
reference will be taken, the pending counts are just for racing
completing activity and will eventually decrease to zero.

For the unlikely case of the forced module unload, dm_destroy_immediate(),
which doesn't wait and forcibly deletes the mapped_device, is also
introduced and used in dm_hash_remove_all().  Otherwise, "rmmod -f"
may be stuck and never return.
And now, because the mapped_device is deleted at this point, subsequent
accesses to the mapped_device may cause NULL pointer references.

Cc: stable@kernel.org
Signed-off-by: Kiyoshi Ueda &lt;k-ueda@ct.jp.nec.com&gt;
Signed-off-by: Jun'ichi Nomura &lt;j-nomura@ce.jp.nec.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
