<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/nvme, branch v4.9.16</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too</title>
<updated>2017-01-19T19:18:05+00:00</updated>
<author>
<name>Guilherme G. Piccoli</name>
<email>gpiccoli@linux.vnet.ibm.com</email>
</author>
<published>2016-12-29T00:13:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0aefd99f37f315a67dcb6487ce371f32dd1422a7'/>
<id>0aefd99f37f315a67dcb6487ce371f32dd1422a7</id>
<content type='text'>
commit b5a10c5f7532b7473776da87e67f8301bbc32693 upstream.

Commit 54adc01055b7 ("nvme/quirk: Add a delay before checking for adapter
readiness") introduced a quirk for adapters that cannot read the
NVME_CSTS_RDY bit right after the NVME_REG_CC register is set; these
adapters need a delay, or else reading the NVME_CSTS_RDY bit can somehow
corrupt the adapter's register state, from which it never recovers.

When this quirk was added, we checked ctrl-&gt;tagset in order to avoid
applying the quirk at probe time, supposing we would never require such
a delay during probe. That was too optimistic; in some cases, such as
after a kexec, we do in fact need this quirk at probe time.

In some experiments, after an abnormal shutdown of the machine (i.e. the
power cord was unplugged), we booted into our bootloader on Power, which
is itself a Linux kernel, and kexec'ed into another distro. If this
kexec happens too quickly, we reach the probe of the NVMe adapter in
that distro while the adapter is in a bad state (not fully initialized
by our bootloader). What happens next is that nvme_wait_ready() is
unable to complete, unless the quirk is enabled.

So, this patch removes the original ctrl-&gt;tagset check in order to
enable the quirk at probe time as well.

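A minimal sketch of the resulting check, assuming the quirk is applied
in nvme_disable_ctrl() in drivers/nvme/host/core.c as it is upstream
(the surrounding context may differ in this tree):

	/* Apply the delay unconditionally: with the ctrl-&gt;tagset test
	 * removed, the quirk also covers probe time (e.g. after kexec). */
	if (ctrl-&gt;quirks &amp; NVME_QUIRK_DELAY_BEFORE_CHK_RDY)
		msleep(NVME_QUIRK_DELAY_AMOUNT);
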
Fixes: 54adc01055b7 ("nvme/quirk: Add a delay before checking for adapter readiness")
Reported-by: Andrew Byrne &lt;byrneadw@ie.ibm.com&gt;
Reported-by: Jaime A. H. Gomez &lt;jahgomez@mx1.ibm.com&gt;
Reported-by: Zachary D. Myers &lt;zdmyers@us.ibm.com&gt;
Signed-off-by: Guilherme G. Piccoli &lt;gpiccoli@linux.vnet.ibm.com&gt;
Acked-by: Jeffrey Lien &lt;Jeff.Lien@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>nvmet: Fix possible infinite loop triggered on hot namespace removal</title>
<updated>2017-01-06T09:40:14+00:00</updated>
<author>
<name>Solganik Alexander</name>
<email>sashas@lightbitslabs.com</email>
</author>
<published>2016-10-30T08:35:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fe3d462821b02df53e5c99b2c9b059824adbac0a'/>
<id>fe3d462821b02df53e5c99b2c9b059824adbac0a</id>
<content type='text'>
commit e4fcf07cca6a3b6c4be00df16f08be894325eaa3 upstream.

When removing a namespace, we delete it from the subsystem namespaces
list with list_del_init, which allows us to know whether or not it is
enabled.

The problem is that list_del_init reinitializes the list's next pointer
and does not respect the RCU list traversal we do on the I/O path to
locate a namespace. Instead we need to use list_del_rcu, which may run
concurrently with the _rcu list-traversal primitives (it keeps the
list's next pointer intact) and guarantees forward progress for a
concurrent nvmet_find_namespace.

With that change we can no longer rely on ns-&gt;dev_link to know whether
the namespace is enabled, so add an enabled indicator to nvmet_ns for
that purpose.

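A rough sketch of the resulting disable path under these assumptions
(the enabled flag follows the description above; the exact locking and
context may differ):

	mutex_lock(&amp;subsys-&gt;lock);
	ns-&gt;enabled = false;
	/* list_del_rcu() keeps ns-&gt;dev_link.next intact, so concurrent
	 * list_for_each_entry_rcu() walkers in nvmet_find_namespace()
	 * can still make forward progress. */
	list_del_rcu(&amp;ns-&gt;dev_link);
	mutex_unlock(&amp;subsys-&gt;lock);
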
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Solganik Alexander &lt;sashas@lightbitslabs.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>nvme/pci: Don't free queues on error</title>
<updated>2016-11-16T19:39:57+00:00</updated>
<author>
<name>Keith Busch</name>
<email>keith.busch@intel.com</email>
</author>
<published>2016-11-15T20:56:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d48756228ee9161ac8836b346589a43fabdc9f3c'/>
<id>d48756228ee9161ac8836b346589a43fabdc9f3c</id>
<content type='text'>
The nvme_remove function tears down all allocated resources in the
correct order, so there is no need to free queues on error during
initialization. This fixes possible use-after-free errors when queues
are still associated with a blk-mq hctx.

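Illustratively, a hedged sketch of the pattern (hypothetical context,
not the exact diff): the initialization error path now only disables
the device and lets nvme_remove() free the queues once blk-mq is done
with them:

	result = nvme_setup_io_queues(dev);
	if (result) {
		/* Do not free queues here: a blk-mq hctx may still
		 * reference them. nvme_remove() frees them in the
		 * correct order later. */
		nvme_dev_disable(dev, true);
		return;
	}
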
Reported-by: Scott Bauer &lt;scott.bauer@intel.com&gt;
Tested-by: Scott Bauer &lt;scott.bauer@intel.com&gt;
Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
<entry>
<title>nvmet-rdma: drain the queue-pair just before freeing it</title>
<updated>2016-11-14T00:08:53+00:00</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2016-11-06T09:03:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=14c862dbb0a0e0a9baec20480d441e32cb54b2b9'/>
<id>14c862dbb0a0e0a9baec20480d441e32cb54b2b9</id>
<content type='text'>
Draining the qp right after disconnect might not suffice, because the
nvmet sq is not fully drained at that point (that happens in
nvmet_sq_destroy) and we might see completions after the drain. Instead,
drain right before the qp destroy, which comes after the sq destruction,
so we can be sure that no posts come after the drain.

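A sketch of the resulting teardown order (assuming the drain moves into
the queue-free path; names approximate the nvmet-rdma code):

	nvmet_sq_destroy(&amp;queue-&gt;nvme_sq); /* no new posts after this */
	ib_drain_qp(queue-&gt;cm_id-&gt;qp);     /* drain just before destroy */
	rdma_destroy_qp(queue-&gt;cm_id);
	ib_free_cq(queue-&gt;cq);
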
Tested-by: Steve Wise &lt;swise@opengridcomputing.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
</entry>
<entry>
<title>nvme-rdma: stop and free io queues on connect failure</title>
<updated>2016-11-14T00:08:53+00:00</updated>
<author>
<name>Steve Wise</name>
<email>swise@opengridcomputing.com</email>
</author>
<published>2016-11-08T17:16:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c8dbc37cd81d4705fce51123f5d81ea3267a5b88'/>
<id>c8dbc37cd81d4705fce51123f5d81ea3267a5b88</id>
<content type='text'>
While testing nvme-rdma with the spdk nvmf target over iw_cxgb4, I
(mistakenly) configured the target to generate an error when creating
the NVMF IO queues.  This resulted in an "Invalid SQE Parameter" error
being sent back to the host on the first IO queue connect:

[ 9610.928182] nvme nvme1: queue_size 128 &gt; ctrl maxcmd 120, clamping down
[ 9610.938745] nvme nvme1: creating 32 I/O queues.

So nvmf_connect_io_queue() returns an error to
nvme_rdma_connect_io_queues(), and that is returned to
nvme_rdma_create_io_queues().  In the error path,
nvme_rdma_create_io_queues() frees the queue tagset memory _before_
stopping and freeing the IB queues, which causes yet another
use-after-free crash: SQ CQEs are flushed after the ib_cqe structs
pointed to by the flushed WRs have been freed (since they are part of
the nvme_rdma_request struct).

The fix is to stop and free the queues in nvme_rdma_connect_io_queues()
if there is an error connecting any of the queues.

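A hedged sketch of the fix in nvme_rdma_connect_io_queues() (shape
approximated from the description above):

	for (i = 1; i &lt; ctrl-&gt;queue_count; i++) {
		ret = nvmf_connect_io_queue(&amp;ctrl-&gt;ctrl, i);
		if (ret)
			goto out_free_queues;
	}
	return 0;

out_free_queues:
	/* Stop and free the IB queues while the tagset (and thus the
	 * ib_cqe structs inside each nvme_rdma_request) still exists. */
	nvme_rdma_free_io_queues(ctrl);
	return ret;
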
Signed-off-by: Steve Wise &lt;swise@opengridcomputing.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
</entry>
<entry>
<title>nvmet-rdma: don't forget to delete a queue from the list when the connection fails</title>
<updated>2016-11-14T00:08:52+00:00</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2016-11-06T09:09:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=766dbb179d41d6337fed2b3ca00caa5845d298ce'/>
<id>766dbb179d41d6337fed2b3ca00caa5845d298ce</id>
<content type='text'>
In case we accepted a queue connection and it then failed, we might not
remove the queue from the list until we unload and clean everything up.
We should delete it from the queue list in the relevant event handler.

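A minimal sketch of the cleanup in the connect-failed handler (assuming
the queue-list locking used elsewhere in nvmet-rdma):

	mutex_lock(&amp;nvmet_rdma_queue_mutex);
	if (!list_empty(&amp;queue-&gt;queue_list))
		list_del_init(&amp;queue-&gt;queue_list);
	mutex_unlock(&amp;nvmet_rdma_queue_mutex);
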
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
</entry>
<entry>
<title>nvmet: Don't queue fatal error work if csts.cfs is set</title>
<updated>2016-11-14T00:08:51+00:00</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2016-11-06T09:03:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8242ddac1bfcf6eb8873b4d0a4e7a172c2b5b625'/>
<id>8242ddac1bfcf6eb8873b4d0a4e7a172c2b5b625</id>
<content type='text'>
In the transport, in case of an internal queue error such as an error
completion in RDMA, we trigger a fatal error. However, multiple queues
in the same controller can see error completions, and we don't want to
trigger the fatal error work more than once.

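A sketch of the guard, using csts.cfs as a once-only latch (names
follow the nvmet core; a sketch, not the verbatim diff):

	mutex_lock(&amp;ctrl-&gt;lock);
	if (!(ctrl-&gt;csts &amp; NVME_CSTS_CFS)) {
		ctrl-&gt;csts |= NVME_CSTS_CFS;
		/* first fatal error on this controller: queue the work */
		schedule_work(&amp;ctrl-&gt;fatal_err_work);
	}
	mutex_unlock(&amp;ctrl-&gt;lock);
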
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
</entry>
<entry>
<title>nvme-rdma: reject non-connect commands before the queue is live</title>
<updated>2016-11-14T00:08:51+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2016-11-02T14:49:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=553cd9ef82edd811948782a8f73ae73c4bfeedd3'/>
<id>553cd9ef82edd811948782a8f73ae73c4bfeedd3</id>
<content type='text'>
If we reconnect, we might have commands queued up that get resent as
soon as the queue is restarted.  But until the connect command has
succeeded we can't send other commands.  Add a new flag that marks a
queue as live when connect finishes, and based on it delay any
non-connect command until the queue is live.

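A hedged sketch of the check on the submission path (the
NVME_RDMA_Q_LIVE flag name follows the patch description; the command
test is approximated for this kernel version):

	if (unlikely(!test_bit(NVME_RDMA_Q_LIVE, &amp;queue-&gt;flags))) {
		struct nvme_command *cmd = (struct nvme_command *)rq-&gt;cmd;

		/* Only a fabrics connect command may pass before the
		 * queue is marked live; everything else must wait. */
		if (rq-&gt;cmd_type != REQ_TYPE_DRV_PRIV ||
		    cmd-&gt;common.opcode != nvme_fabrics_command ||
		    cmd-&gt;fabrics.fctype != nvme_fabrics_type_connect)
			return false;
	}
	return true;
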
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reported-by: Steve Wise &lt;swise@opengridcomputing.com&gt;
Tested-by: Steve Wise &lt;swise@opengridcomputing.com&gt;
[sagig: fixes admin queue LIVE setting]
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
</entry>
<entry>
<title>nvmet-rdma: Fix possible NULL deref when handling rdma cm events</title>
<updated>2016-11-14T00:08:50+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bart.vanassche@sandisk.com</email>
</author>
<published>2016-11-01T16:36:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fa14a0acea1ffe67913ba384a2897130a36dfe03'/>
<id>fa14a0acea1ffe67913ba384a2897130a36dfe03</id>
<content type='text'>
When we initiate the queue teardown sequence, we call rdma_destroy_qp,
which clears cm_id-&gt;qp; afterwards we call rdma_destroy_id. But we
might see an rdma_cm event in between, with cm_id-&gt;qp already cleared.
Watch out for that and silently ignore such an event, because it means
the queue teardown sequence is in progress.

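A sketch of the defensive check in the cm event handler (approximate
shape of nvmet_rdma_cm_handler):

	struct nvmet_rdma_queue *queue = NULL;

	if (cm_id-&gt;qp)
		queue = cm_id-&gt;qp-&gt;qp_context;

	switch (event-&gt;event) {
	case RDMA_CM_EVENT_DISCONNECTED:
		/* queue == NULL means rdma_destroy_qp() already ran:
		 * teardown is in progress, silently ignore the event. */
		if (queue)
			nvmet_rdma_queue_disconnect(queue);
		break;
	}
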
Signed-off-by: Bart Van Assche &lt;bart.vanassche@sandisk.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
</content>
</entry>
<entry>
<title>lightnvm: invalid offset calculation for lba_shift</title>
<updated>2016-11-12T01:27:32+00:00</updated>
<author>
<name>Matias Bjørling</name>
<email>m@bjorling.me</email>
</author>
<published>2016-11-10T11:26:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=409ae5a76e0505c8ffe1424f9c00dbf2ec7b5eea'/>
<id>409ae5a76e0505c8ffe1424f9c00dbf2ec7b5eea</id>
<content type='text'>
The value of ns-&gt;lba_shift is assumed to be the logarithm (base 2) of
the LBA size. A previous patch duplicated the lba_shift calculation in
lightnvm and prematurely also subtracted the 512-byte sector shift,
which is normally applied per command. With the 512-byte shift
subtracted twice, data was lost when restoring the logical-to-physical
mapping table from the device and when issuing I/O commands using rrpc.

Fix the offset by removing the 512-byte shift subtraction when
calculating lba_shift.

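Conceptually (hypothetical variable names, for illustration only):

	/* WRONG: the 512-byte sector shift is applied per command, so
	 * subtracting it here applies it twice. */
	dev-&gt;lba_shift = ns-&gt;lba_shift - 9;

	/* RIGHT: lba_shift is already log2 of the LBA size. */
	dev-&gt;lba_shift = ns-&gt;lba_shift;
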
Fixes: b0b4e09c1ae7 ("lightnvm: control life of nvm_dev in driver")
Reported-by: Javier González &lt;javier@cnexlabs.com&gt;
Signed-off-by: Matias Bjørling &lt;m@bjorling.me&gt;
Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
</entry>
</feed>
