<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/nvme/host/fc.c, branch v5.5</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>nvme-fc: fix double-free scenarios on hw queues</title>
<updated>2019-11-26T18:00:13+00:00</updated>
<author>
<name>James Smart</name>
<email>jsmart2021@gmail.com</email>
</author>
<published>2019-11-21T17:59:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c869e494ef8b5846d9ba91f1e922c23cd444f0c1'/>
<id>c869e494ef8b5846d9ba91f1e922c23cd444f0c1</id>
<content type='text'>
If an error occurs on one of the ios used for creating an
association, the creating routine has error paths that are
invoked by the command failure and the error paths will free
up the controller resources created to that point.

But... the io was ultimately determined by an asynchronous
completion routine that detected the error and which
unconditionally invokes the error_recovery path which calls
delete_association. Delete association deletes all outstanding
io then tears down the controller resources. So the
create_association thread can be running in parallel with
the error_recovery thread. What was seen was the LLDD received
a call to delete a queue, causing the LLDD to do a free of a
resource, then the transport called the delete queue again
causing the driver to repeat the free call. The second free
routine corrupted the allocator. The transport shouldn't be
making the duplicate call, and the delete queue is just one
of the resources being freed.

To fix, it is realized that the create_association path is
completely serialized with one command at a time. So the
failed io completion will always be seen by the create_association
path and as of the failure, there are no ios to terminate and there
is no reason to be manipulating queue freeze states, etc.
The serialized condition stays true until the controller is
transitioned to the LIVE state. Thus the fix is to change the
error recovery path to check the controller state and only
invoke the teardown path if not already in the CONNECTING state.

Reviewed-by: Himanshu Madhani &lt;hmadhani@marvell.com&gt;
Reviewed-by: Ewan D. Milne &lt;emilne@redhat.com&gt;
Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If an error occurs on one of the ios used for creating an
association, the creating routine has error paths that are
invoked by the command failure and the error paths will free
up the controller resources created to that point.

But... the io was ultimately determined by an asynchronous
completion routine that detected the error and which
unconditionally invokes the error_recovery path which calls
delete_association. Delete association deletes all outstanding
io then tears down the controller resources. So the
create_association thread can be running in parallel with
the error_recovery thread. What was seen was the LLDD received
a call to delete a queue, causing the LLDD to do a free of a
resource, then the transport called the delete queue again
causing the driver to repeat the free call. The second free
routine corrupted the allocator. The transport shouldn't be
making the duplicate call, and the delete queue is just one
of the resources being freed.

To fix, it is realized that the create_association path is
completely serialized with one command at a time. So the
failed io completion will always be seen by the create_association
path and as of the failure, there are no ios to terminate and there
is no reason to be manipulating queue freeze states, etc.
The serialized condition stays true until the controller is
transitioned to the LIVE state. Thus the fix is to change the
error recovery path to check the controller state and only
invoke the teardown path if not already in the CONNECTING state.

Reviewed-by: Himanshu Madhani &lt;hmadhani@marvell.com&gt;
Reviewed-by: Ewan D. Milne &lt;emilne@redhat.com&gt;
Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme_fc: add module to ops template to allow module references</title>
<updated>2019-11-26T17:48:27+00:00</updated>
<author>
<name>James Smart</name>
<email>jsmart2021@gmail.com</email>
</author>
<published>2019-11-14T23:15:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=863fbae929c7a5b64e96b8a3ffb34a29eefb9f8f'/>
<id>863fbae929c7a5b64e96b8a3ffb34a29eefb9f8f</id>
<content type='text'>
In nvme-fc: it's possible to have connected active controllers
and as no references are taken on the LLDD, the LLDD can be
unloaded.  The controller would enter a reconnect state and as
long as the LLDD resumed within the reconnect timeout, the
controller would resume.  But if a namespace on the controller
is the root device, allowing the driver to unload can be problematic.
To reload the driver, it may require new io to the boot device,
and as it's no longer connected we get into a catch-22 that
eventually fails, and the system locks up.

Fix this issue by taking a module reference for every connected
controller (which is what the core layer did to the transport
module). Reference is cleared when the controller is removed.

Acked-by: Himanshu Madhani &lt;hmadhani@marvell.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In nvme-fc: it's possible to have connected active controllers
and as no references are taken on the LLDD, the LLDD can be
unloaded.  The controller would enter a reconnect state and as
long as the LLDD resumed within the reconnect timeout, the
controller would resume.  But if a namespace on the controller
is the root device, allowing the driver to unload can be problematic.
To reload the driver, it may require new io to the boot device,
and as it's no longer connected we get into a catch-22 that
eventually fails, and the system locks up.

Fix this issue by taking a module reference for every connected
controller (which is what the core layer did to the transport
module). Reference is cleared when the controller is removed.

Acked-by: Himanshu Madhani &lt;hmadhani@marvell.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-fc: Avoid preallocating big SGL for data</title>
<updated>2019-11-26T17:14:01+00:00</updated>
<author>
<name>Israel Rukshin</name>
<email>israelr@mellanox.com</email>
</author>
<published>2019-11-24T16:38:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b1ae1a238900474a9f51431c0f7f169ade1faa19'/>
<id>b1ae1a238900474a9f51431c0f7f169ade1faa19</id>
<content type='text'>
nvme_fc_create_io_queues() preallocates a big buffer for the IO SGL based
on SG_CHUNK_SIZE.

Modern DMA engines are often capable of dealing with very big segments so
the SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB
SGL allocation per command.

If a controller has lots of deep queues, preallocation for the sg list can
consume substantial amounts of memory. For nvme-fc, nr_hw_queues can be
128 and each queue's depth 128. This means the resulting preallocation
for the data SGL is 128*128*4K = 64MB per controller.

Switch to runtime allocation for SGL for lists longer than 2 entries. This
is the approach used by NVMe PCI so it should be reasonable for NVMeOF as
well. Runtime SGL allocation has always been the case for the legacy I/O
path so this is nothing new.

Reviewed-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Reviewed-by: James Smart &lt;james.smart@broadcom.com&gt;
Signed-off-by: Israel Rukshin &lt;israelr@mellanox.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
nvme_fc_create_io_queues() preallocates a big buffer for the IO SGL based
on SG_CHUNK_SIZE.

Modern DMA engines are often capable of dealing with very big segments so
the SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB
SGL allocation per command.

If a controller has lots of deep queues, preallocation for the sg list can
consume substantial amounts of memory. For nvme-fc, nr_hw_queues can be
128 and each queue's depth 128. This means the resulting preallocation
for the data SGL is 128*128*4K = 64MB per controller.

Switch to runtime allocation for SGL for lists longer than 2 entries. This
is the approach used by NVMe PCI so it should be reasonable for NVMeOF as
well. Runtime SGL allocation has always been the case for the legacy I/O
path so this is nothing new.

Reviewed-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Reviewed-by: James Smart &lt;james.smart@broadcom.com&gt;
Signed-off-by: Israel Rukshin &lt;israelr@mellanox.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme: move common call to nvme_cleanup_cmd to core layer</title>
<updated>2019-11-04T17:56:41+00:00</updated>
<author>
<name>Max Gurtovoy</name>
<email>maxg@mellanox.com</email>
</author>
<published>2019-10-13T16:57:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=16686f3a6c3cd6316dbc5cba886242c73f713237'/>
<id>16686f3a6c3cd6316dbc5cba886242c73f713237</id>
<content type='text'>
nvme_cleanup_cmd should be called for each call to nvme_setup_cmd
(symmetrical functions). Move the call for nvme_cleanup_cmd to the common
core layer and call it during nvme_complete_rq for the good flow. For
error flow, each transport will call nvme_cleanup_cmd independently. Also
take care of a special case of path failure, where we call
nvme_complete_rq without doing nvme_setup_cmd.

Signed-off-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
nvme_cleanup_cmd should be called for each call to nvme_setup_cmd
(symmetrical functions). Move the call for nvme_cleanup_cmd to the common
core layer and call it during nvme_complete_rq for the good flow. For
error flow, each transport will call nvme_cleanup_cmd independently. Also
take care of a special case of path failure, where we call
nvme_complete_rq without doing nvme_setup_cmd.

Signed-off-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-fc: ensure association_id is cleared regardless of a Disconnect LS</title>
<updated>2019-11-04T17:56:40+00:00</updated>
<author>
<name>James Smart</name>
<email>jsmart2021@gmail.com</email>
</author>
<published>2019-09-27T21:27:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bcde5f0fc7d318c98d4234b52bcc1a87fc2162d9'/>
<id>bcde5f0fc7d318c98d4234b52bcc1a87fc2162d9</id>
<content type='text'>
Code today only clears the association_id if a Disconnect LS is transmit.

Remove ambiguity and unconditionally clear the association_id if the
association has been terminated.

Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Code today only clears the association_id if a Disconnect LS is transmit.

Remove ambiguity and unconditionally clear the association_id if the
association has been terminated.

Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-fc: clarify error messages</title>
<updated>2019-11-04T17:56:40+00:00</updated>
<author>
<name>James Smart</name>
<email>jsmart2021@gmail.com</email>
</author>
<published>2019-09-27T22:27:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7db394848ece0e0706dfe8e4940b24e949f3b88f'/>
<id>7db394848ece0e0706dfe8e4940b24e949f3b88f</id>
<content type='text'>
Change wording on a couple of messages to clarify what happened.

Signed-off-by: Ewan D. Milne &lt;emilne@redhat.com&gt;
Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change wording on a couple of messages to clarify what happened.

Signed-off-by: Ewan D. Milne &lt;emilne@redhat.com&gt;
Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-fc: Set new cmd set indicator in nvme-fc cmnd iu</title>
<updated>2019-11-04T17:56:40+00:00</updated>
<author>
<name>James Smart</name>
<email>jsmart2021@gmail.com</email>
</author>
<published>2019-09-27T21:51:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=44fbf3bb1ac3dbebc6bd07eb68abf2d0badfae65'/>
<id>44fbf3bb1ac3dbebc6bd07eb68abf2d0badfae65</id>
<content type='text'>
Set the new category field in the FC-NVME CMND_IU based on queue number.

Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Set the new category field in the FC-NVME CMND_IU based on queue number.

Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-fc and nvmet-fc: sync with FC-NVME-2 header changes</title>
<updated>2019-11-04T17:56:40+00:00</updated>
<author>
<name>James Smart</name>
<email>jsmart2021@gmail.com</email>
</author>
<published>2019-09-27T21:51:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=53b2b2f59967c0b7eb4df265136b1cc25b9fb287'/>
<id>53b2b2f59967c0b7eb4df265136b1cc25b9fb287</id>
<content type='text'>
Sync sources with revised structure and field names to correspond with
FC-NVME-2 header sync-up.

Tested interoperability with success:
- prior initiator with new target
- prior target with new initiator
- new on new

Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Sync sources with revised structure and field names to correspond with
FC-NVME-2 header sync-up.

Tested interoperability with success:
- prior initiator with new target
- prior target with new initiator
- new on new

Signed-off-by: James Smart &lt;jsmart2021@gmail.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-fc: Fail transport errors with NVME_SC_HOST_PATH</title>
<updated>2019-09-12T15:50:45+00:00</updated>
<author>
<name>James Smart</name>
<email>james.smart@broadcom.com</email>
</author>
<published>2019-08-06T07:14:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=74bd8cbe7dd64c50623cf0dc20eaeaab6b68d3d6'/>
<id>74bd8cbe7dd64c50623cf0dc20eaeaab6b68d3d6</id>
<content type='text'>
NVME_SC_INTERNAL should indicate an internal controller errors
and not host transport errors. These errors will propagate to
upper layers (essentially nvme core) and be interpereted as
transport errors which should not be taken into account for
namespace state or condition.

Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Reviewed-by: James Smart &lt;james.smart@broadcom.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
NVME_SC_INTERNAL should indicate an internal controller errors
and not host transport errors. These errors will propagate to
upper layers (essentially nvme core) and be interpereted as
transport errors which should not be taken into account for
namespace state or condition.

Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Reviewed-by: James Smart &lt;james.smart@broadcom.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-fc: Use rq_dma_dir macro</title>
<updated>2019-08-29T19:55:03+00:00</updated>
<author>
<name>Israel Rukshin</name>
<email>israelr@mellanox.com</email>
</author>
<published>2019-08-28T11:11:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f15872c5dce43b69c3dee7739d7d3f54c54fc527'/>
<id>f15872c5dce43b69c3dee7739d7d3f54c54fc527</id>
<content type='text'>
Remove code duplication.

Signed-off-by: Israel Rukshin &lt;israelr@mellanox.com&gt;
Reviewed-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Reviewed-by: James Smart &lt;james.smart@broadcom.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove code duplication.

Signed-off-by: Israel Rukshin &lt;israelr@mellanox.com&gt;
Reviewed-by: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Reviewed-by: James Smart &lt;james.smart@broadcom.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</pre>
</div>
</content>
</entry>
</feed>
