<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/nvme, branch v4.9.16</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too</title>
<updated>2017-01-19T19:18:05+00:00</updated>
<author>
<name>Guilherme G. Piccoli</name>
<email>gpiccoli@linux.vnet.ibm.com</email>
</author>
<published>2016-12-29T00:13:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0aefd99f37f315a67dcb6487ce371f32dd1422a7'/>
<id>0aefd99f37f315a67dcb6487ce371f32dd1422a7</id>
<content type='text'>
commit b5a10c5f7532b7473776da87e67f8301bbc32693 upstream.

Commit 54adc01055b7 ("nvme/quirk: Add a delay before checking for adapter
readiness") introduced a quirk for adapters that cannot read the
NVME_CSTS_RDY bit right after the NVME_REG_CC register is set; these
adapters need a delay, or else reading the NVME_CSTS_RDY bit can somehow
corrupt the adapter's register state, from which it never recovers.

When this quirk was added, we checked ctrl-&gt;tagset in order to avoid
applying the quirk at probe time, supposing we would never require such
a delay during probe. That was too optimistic; in some cases, such as
after a kexec, we do in fact need this quirk at probe time.

In some experiments, after an abnormal shutdown of the machine (i.e. the
power cord was unplugged), we booted into our bootloader on Power, which
is itself a Linux kernel, and kexec'ed into another distro. If this
kexec happens too quickly, we reach the probe of the NVMe adapter in
that distro while the adapter is in a bad state (not fully initialized
by our bootloader). What happens next is that nvme_wait_ready() is
unable to complete, unless the quirk is enabled.

So, this patch removes the original ctrl-&gt;tagset check in order to
enable the quirk at probe time as well.

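A minimal sketch of the resulting check, assuming the quirk is applied
in nvme_disable_ctrl() in drivers/nvme/host/core.c as it is upstream
(the surrounding context may differ in this tree):

	/* Apply the delay unconditionally: with the ctrl-&gt;tagset test
	 * removed, the quirk also covers probe time (e.g. after kexec). */
	if (ctrl-&gt;quirks &amp; NVME_QUIRK_DELAY_BEFORE_CHK_RDY)
		msleep(NVME_QUIRK_DELAY_AMOUNT);
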
Fixes: 54adc01055b7 ("nvme/quirk: Add a delay before checking for adapter readiness")
Reported-by: Andrew Byrne &lt;byrneadw@ie.ibm.com&gt;
Reported-by: Jaime A. H. Gomez &lt;jahgomez@mx1.ibm.com&gt;
Reported-by: Zachary D. Myers &lt;zdmyers@us.ibm.com&gt;
Signed-off-by: Guilherme G. Piccoli &lt;gpiccoli@linux.vnet.ibm.com&gt;
Acked-by: Jeffrey Lien &lt;Jeff.Lien@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>nvmet: Fix possible infinite loop triggered on hot namespace removal</title>
<updated>2017-01-06T09:40:14+00:00</updated>
<author>
<name>Solganik Alexander</name>
<email>sashas@lightbitslabs.com</email>
</author>
<published>2016-10-30T08:35:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fe3d462821b02df53e5c99b2c9b059824adbac0a'/>
<id>fe3d462821b02df53e5c99b2c9b059824adbac0a</id>
<content type='text'>
commit e4fcf07cca6a3b6c4be00df16f08be894325eaa3 upstream.

When removing a namespace, we delete it from the subsystem namespaces
list with list_del_init, which allows us to know whether or not it is
enabled.

The problem is that list_del_init reinitializes the list's next pointer
and does not respect the RCU list traversal we do on the I/O path to
locate a namespace. Instead we need to use list_del_rcu, which may run
concurrently with the _rcu list-traversal primitives (it keeps the
list's next pointer intact) and guarantees forward progress for a
concurrent nvmet_find_namespace.

With that change we can no longer rely on ns-&gt;dev_link to know whether
the namespace is enabled, so add an enabled indicator to nvmet_ns for
that purpose.

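A rough sketch of the resulting disable path under these assumptions
(the enabled flag follows the description above; the exact locking and
context may differ):

	mutex_lock(&amp;subsys-&gt;lock);
	ns-&gt;enabled = false;
	/* list_del_rcu() keeps ns-&gt;dev_link.next intact, so concurrent
	 * list_for_each_entry_rcu() walkers in nvmet_find_namespace()
	 * can still make forward progress. */
	list_del_rcu(&amp;ns-&gt;dev_link);
	mutex_unlock(&amp;subsys-&gt;lock);
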
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Solganik Alexander &lt;sashas@lightbitslabs.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>nvme/pci: Don't free queues on error</title>
<updated>2016-11-16T19:39:57+00:00</updated>
<author>
<name>Keith Busch</name>
<email>keith.busch@intel.com</email>
</author>
<published>2016-11-15T20:56:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d48756228ee9161ac8836b346589a43fabdc9f3c'/>
<id>d48756228ee9161ac8836b346589a43fabdc9f3c</id>
<content type='text'>
The nvme_remove function tears down all allocated resources in the
correct order, so there is no need to free queues on error during
initialization. This fixes possible use-after-free errors when queues
are still associated with a blk-mq hctx.

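Illustratively, a hedged sketch of the pattern (hypothetical context,
not the exact diff): the initialization error path now only disables
the device and lets nvme_remove() free the queues once blk-mq is done
with them:

	result = nvme_setup_io_queues(dev);
	if (result) {
		/* Do not free queues here: a blk-mq hctx may still
		 * reference them. nvme_remove() frees them in the
		 * correct order later. */
		nvme_dev_disable(dev, true);
		return;
	}
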
Reported-by: Scott Bauer &lt;scott.bauer@intel.com&gt;
Tested-by: Scott Bauer &lt;scott.bauer@intel.com&gt;
Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>nvmet-rdma: drain the queue-pair just before freeing it</title>
<updated>2016-11-14T00:08:53+00:00</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2016-11-06T09:03:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=14c862dbb0a0e0a9baec20480d441e32cb54b2b9'/>
<id>14c862dbb0a0e0a9baec20480d441e32cb54b2b9</id>
<content type='text'>
Draining the qp right after disconnect might not suffice, because the
nvmet sq is not fully drained at that point (that happens in
nvmet_sq_destroy) and we might see completions after the drain. Instead,
drain right before the qp destroy, which comes after the sq destruction,
so we can be sure that no posts come after the drain.

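A sketch of the resulting teardown order (assuming the drain moves into
the queue-free path; names approximate the nvmet-rdma code):

	nvmet_sq_destroy(&amp;queue-&gt;nvme_sq); /* no new posts after this */
	ib_drain_qp(queue-&gt;cm_id-&gt;qp);     /* drain just before destroy */
	rdma_destroy_qp(queue-&gt;cm_id);
	ib_free_cq(queue-&gt;cq);
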
Tested-by: Steve Wise &lt;swise@opengridcomputing.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
</entry>
<entry>
<title>nvme-rdma: stop and free io queues on connect failure</title>
<updated>2016-11-14T00:08:53+00:00</updated>
<author>
<name>Steve Wise</name>
<email>swise@opengridcomputing.com</email>
</author>
<published>2016-11-08T17:16:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c8dbc37cd81d4705fce51123f5d81ea3267a5b88'/>
<id>c8dbc37cd81d4705fce51123f5d81ea3267a5b88</id>
<content type='text'>
While testing nvme-rdma with the spdk nvmf target over iw_cxgb4, I
(mistakenly) configured the target to generate an error when creating
the NVMF IO queues.  This resulted in an "Invalid SQE Parameter" error
being sent back to the host on the first IO queue connect:

[ 9610.928182] nvme nvme1: queue_size 128 &gt; ctrl maxcmd 120, clamping down
[ 9610.938745] nvme nvme1: creating 32 I/O queues.

So nvmf_connect_io_queue() returns an error to
nvme_rdma_connect_io_queues(), and that is returned to
nvme_rdma_create_io_queues().  In the error path,
nvme_rdma_create_io_queues() frees the queue tagset memory _before_
stopping and freeing the IB queues, which causes yet another
use-after-free crash: SQ CQEs are flushed after the ib_cqe structs
pointed to by the flushed WRs have been freed (since they are part of
the nvme_rdma_request struct).

The fix is to stop and free the queues in nvme_rdma_connect_io_queues()
if there is an error connecting any of the queues.

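A hedged sketch of the fix in nvme_rdma_connect_io_queues() (shape
approximated from the description above):

	for (i = 1; i &lt; ctrl-&gt;queue_count; i++) {
		ret = nvmf_connect_io_queue(&amp;ctrl-&gt;ctrl, i);
		if (ret)
			goto out_free_queues;
	}
	return 0;

out_free_queues:
	/* Stop and free the IB queues while the tagset (and thus the
	 * ib_cqe structs inside each nvme_rdma_request) still exists. */
	nvme_rdma_free_io_queues(ctrl);
	return ret;
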
Signed-off-by: Steve Wise &lt;swise@opengridcomputing.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
</entry>
<entry>
<title>nvmet-rdma: don't forget to delete a queue from the list when the connection fails</title>
<updated>2016-11-14T00:08:52+00:00</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2016-11-06T09:09:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=766dbb179d41d6337fed2b3ca00caa5845d298ce'/>
<id>766dbb179d41d6337fed2b3ca00caa5845d298ce</id>
<content type='text'>
In case we accepted a queue connection and it then failed, we might not
remove the queue from the list until we unload and clean everything up.
We should delete it from the queue list in the relevant event handler.

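A minimal sketch of the cleanup in the connect-failed handler (assuming
the queue-list locking used elsewhere in nvmet-rdma):

	mutex_lock(&amp;nvmet_rdma_queue_mutex);
	if (!list_empty(&amp;queue-&gt;queue_list))
		list_del_init(&amp;queue-&gt;queue_list);
	mutex_unlock(&amp;nvmet_rdma_queue_mutex);
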
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
</entry>
<entry>
<title>nvmet: Don't queue fatal error work if csts.cfs is set</title>
<updated>2016-11-14T00:08:51+00:00</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2016-11-06T09:03:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8242ddac1bfcf6eb8873b4d0a4e7a172c2b5b625'/>
<id>8242ddac1bfcf6eb8873b4d0a4e7a172c2b5b625</id>
<content type='text'>
In the transport, in case of an internal queue error such as an error
completion in RDMA, we trigger a fatal error. However, multiple queues
in the same controller can see error completions, and we don't want to
trigger the fatal error work more than once.

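A sketch of the guard, using csts.cfs as a once-only latch (names
follow the nvmet core; a sketch, not the verbatim diff):

	mutex_lock(&amp;ctrl-&gt;lock);
	if (!(ctrl-&gt;csts &amp; NVME_CSTS_CFS)) {
		ctrl-&gt;csts |= NVME_CSTS_CFS;
		/* first fatal error on this controller: queue the work */
		schedule_work(&amp;ctrl-&gt;fatal_err_work);
	}
	mutex_unlock(&amp;ctrl-&gt;lock);
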
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
</entry>
<entry>
<title>nvme-rdma: reject non-connect commands before the queue is live</title>
<updated>2016-11-14T00:08:51+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2016-11-02T14:49:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=553cd9ef82edd811948782a8f73ae73c4bfeedd3'/>
<id>553cd9ef82edd811948782a8f73ae73c4bfeedd3</id>
<content type='text'>
If we reconnect, we might have commands queued up that get resent as
soon as the queue is restarted.  But until the connect command has
succeeded we can't send other commands.  Add a new flag that marks a
queue as live when connect finishes, and based on it delay any
non-connect command until the queue is live.

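A hedged sketch of the check on the submission path (the
NVME_RDMA_Q_LIVE flag name follows the patch description; the command
test is approximated for this kernel version):

	if (unlikely(!test_bit(NVME_RDMA_Q_LIVE, &amp;queue-&gt;flags))) {
		struct nvme_command *cmd = (struct nvme_command *)rq-&gt;cmd;

		/* Only a fabrics connect command may pass before the
		 * queue is marked live; everything else must wait. */
		if (rq-&gt;cmd_type != REQ_TYPE_DRV_PRIV ||
		    cmd-&gt;common.opcode != nvme_fabrics_command ||
		    cmd-&gt;fabrics.fctype != nvme_fabrics_type_connect)
			return false;
	}
	return true;
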
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reported-by: Steve Wise &lt;swise@opengridcomputing.com&gt;
Tested-by: Steve Wise &lt;swise@opengridcomputing.com&gt;
[sagig: fixes admin queue LIVE setting]
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
</entry>
<entry>
<title>nvmet-rdma: Fix possible NULL deref when handling rdma cm events</title>
<updated>2016-11-14T00:08:50+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bart.vanassche@sandisk.com</email>
</author>
<published>2016-11-01T16:36:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fa14a0acea1ffe67913ba384a2897130a36dfe03'/>
<id>fa14a0acea1ffe67913ba384a2897130a36dfe03</id>
<content type='text'>
When we initiate the queue teardown sequence, we call rdma_destroy_qp,
which clears cm_id-&gt;qp; afterwards we call rdma_destroy_id. But we
might see an rdma_cm event in between, with cm_id-&gt;qp already cleared.
Watch out for that and silently ignore such an event, because it means
the queue teardown sequence is in progress.

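A sketch of the defensive check in the cm event handler (approximate
shape of nvmet_rdma_cm_handler):

	struct nvmet_rdma_queue *queue = NULL;

	if (cm_id-&gt;qp)
		queue = cm_id-&gt;qp-&gt;qp_context;

	switch (event-&gt;event) {
	case RDMA_CM_EVENT_DISCONNECTED:
		/* queue == NULL means rdma_destroy_qp() already ran:
		 * teardown is in progress, silently ignore the event. */
		if (queue)
			nvmet_rdma_queue_disconnect(queue);
		break;
	}
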
Signed-off-by: Bart Van Assche &lt;bart.vanassche@sandisk.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
</entry>
<entry>
<title>lightnvm: invalid offset calculation for lba_shift</title>
<updated>2016-11-12T01:27:32+00:00</updated>
<author>
<name>Matias Bjørling</name>
<email>m@bjorling.me</email>
</author>
<published>2016-11-10T11:26:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=409ae5a76e0505c8ffe1424f9c00dbf2ec7b5eea'/>
<id>409ae5a76e0505c8ffe1424f9c00dbf2ec7b5eea</id>
<content type='text'>
The value of ns-&gt;lba_shift is assumed to be the logarithm (base 2) of
the LBA size. A previous patch duplicated the lba_shift calculation in
lightnvm and prematurely also subtracted the 512-byte sector shift,
which is normally applied per command. With the 512-byte shift
subtracted twice, data was lost when restoring the logical-to-physical
mapping table from the device and when issuing I/O commands using rrpc.

Fix the offset by removing the 512-byte shift subtraction when
calculating lba_shift.

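Conceptually (hypothetical variable names, for illustration only):

	/* WRONG: the 512-byte sector shift is applied per command, so
	 * subtracting it here applies it twice. */
	dev-&gt;lba_shift = ns-&gt;lba_shift - 9;

	/* RIGHT: lba_shift is already log2 of the LBA size. */
	dev-&gt;lba_shift = ns-&gt;lba_shift;
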
Fixes: b0b4e09c1ae7 ("lightnvm: control life of nvm_dev in driver")
Reported-by: Javier González &lt;javier@cnexlabs.com&gt;
Signed-off-by: Matias Bjørling &lt;m@bjorling.me&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
</feed>
