summaryrefslogtreecommitdiff
path: root/drivers/block/nvme-core.c
AgeCommit message (Collapse)Author
2014-02-05Merge git://git.infradead.org/users/willy/linux-nvmeLinus Torvalds
Pull NVMe driver update from Matthew Wilcox: "Looks like I missed the merge window ... but these are almost all bugfixes anyway (the ones that aren't have been baking for months)" * git://git.infradead.org/users/willy/linux-nvme: NVMe: Namespace use after free on surprise removal NVMe: Correct uses of INIT_WORK NVMe: Include device and queue numbers in interrupt name NVMe: Add a pci_driver shutdown method NVMe: Disable admin queue on init failure NVMe: Dynamically allocate partition numbers NVMe: Async IO queue deletion NVMe: Surprise removal handling NVMe: Abort timed out commands NVMe: Schedule reset for failed controllers NVMe: Device resume error handling NVMe: Cache dev->pci_dev in a local pointer NVMe: Fix lockdep warnings NVMe: compat SG_IO ioctl NVMe: remove deprecated IRQF_DISABLED NVMe: Avoid shift operation when writing cq head doorbell
2014-02-02NVMe: Namespace use after free on surprise removalKeith Busch
An nvme block device may have open references when the device is removed. New commands may still be sent on the removed device, so we need to ref count the opens, return errors for new commands, and not free the namespace and nvme_dev until all references are closed. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-01-29NVMe: Correct uses of INIT_WORKMatthew Wilcox
We need to initialise the work_struct when we initialise the rest of the struct nvme_dev, otherwise we'll hit a lockdep warning when we remove the device. Use PREPARE_WORK to change the function pointer instead of INIT_WORK. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-01-27NVMe: Include device and queue numbers in interrupt nameMatthew Wilcox
On larger systems with many drives, it may help debugging to know which queue is tied to which interrupt, just by looking at /proc/interrupts. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-01-27NVMe: Add a pci_driver shutdown methodKeith Busch
We need to shut down the device cleanly when the system is being shut down. This was in an earlier patch but was inadvertently lost during a rewrite. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-01-27NVMe: Disable admin queue on init failureKeith Busch
Disable the admin queue if device fails during initialization so the queue's irq is freed. Signed-off-by: Keith Busch <keith.busch@intel.com> [rewritten to use nvme_free_queues] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-01-27NVMe: Dynamically allocate partition numbersMatthew Wilcox
Some users need more than 64 partitions per device. Rather than simply increasing the number of partitions, switch to the dynamic partition allocation scheme. This means that minor numbers are not stable across boots, but since major numbers aren't either, I cannot see this being a significant problem. Tested-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-01-27NVMe: Async IO queue deletionKeith Busch
This attempts to delete all IO queues at the same time asynchronously on shutdown. This is necessary for a present device that is not responding; a shutdown operation previously would take 2 minutes per queue-pair to timeout before moving on to the next queue, making a device removal appear to take a very long time or "hung" as reported by users. In the previous worst case, a removal may be stuck forever until a kill signal is given if there are more than 32 queue pairs since it would run out of admin command IDs after over an hour of timed out sync commands (admin queue depth is 64). This patch will wait for the admin command timeout for all commands to complete, so the worst case now for an unresponsive controller is 60 seconds, though that still seems like a long time. Since this adds another way to take queues offline, some duplicate code resulted so I moved these into more convienient functions. Signed-off-by: Keith Busch <keith.busch@intel.com> [make functions static, correct line length and whitespace issues] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-01-27NVMe: Surprise removal handlingKeith Busch
This adds checks to see if the nvme pci device was removed. The check reads the status register for the value of -1, which it should never be unless the device is no longer present. If a user performs a surprise removal on an nvme device, the driver will be notified either by the pci driver remove callback if the platform's slot is capable of this event, or via reading the device BAR status register, which will indicate controller failure and trigger a reset. Either way, the device is not present so all outstanding commands would timeout. This will not send queue deletion commands to a drive that isn't present and fail after ioremap, significantly speeding up surprise removal; previously this took over 2 minutes per IO queue pair created, but this will complete removing the device within a few seconds. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-01-27NVMe: Abort timed out commandsKeith Busch
Send nvme abort command to io requests that have timed out on an initialized device. If the command is not returned after another timeout, schedule the controller for reset. Signed-off-by: Keith Busch <keith.busch@intel.com> [fix endianness issues] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-01-27NVMe: Schedule reset for failed controllersKeith Busch
Schedules a controller reset when it indicates it has a failed status. If the device does not become ready after a reset, the pci device will be scheduled for removal. Signed-off-by: Keith Busch <keith.busch@intel.com> [fixed checkpatch issue] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-12-16NVMe: Device resume error handlingKeith Busch
Adds controller error handling on resume power management. If the device fails to initialize, the device is queued for a reset. If the reset fails, a thread is spawned to remove the pci device. If the device resumes as "busy", the device is responding to admin commands but will not create IO queues. In this case, we need to remove the gendisks and free the IO queues since they can't be used and may be holding bios in their lists. From testing, the dma pools require a pci device so this had to change the pci driver 'remove' to release the dma resources in line with that call instead of after all references to the device are released. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-12-16NVMe: Cache dev->pci_dev in a local pointerMatthew Wilcox
Helps with line-length issues Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-12-16NVMe: Fix lockdep warningsMatthew Wilcox
During the initialisation path, the queue lock is taken without interrupt protection. It's perfectly safe to do so, because the interrupt handler can't run at this point, but it confuses lockdep. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-12-16NVMe: compat SG_IO ioctlKeith Busch
For 32-bit versions of sg3-utils running on a 64-bit system. This is mostly a copy from the relevent portions of fs/compat_ioctl.c, with slight modifications for going through block_device_operations. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com> [fixed up CONFIG_COMPAT=n build problems] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-11-23block: Introduce new bio_split()Kent Overstreet
The new bio_split() can split arbitrary bios - it's not restricted to single page bios, like the old bio_split() (previously renamed to bio_pair_split()). It also has different semantics - it doesn't allocate a struct bio_pair, leaving it up to the caller to handle completions. Then convert the existing bio_pair_split() users to the new bio_split() - and also nvme, which was open coding bio splitting. (We have to take that BUG_ON() out of bio_integrity_trim() because this bio_split() needs to use it, and there's no reason it has to be used on bios marked as cloned; BIO_CLONED doesn't seem to have clearly documented semantics anyways.) Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Neil Brown <neilb@suse.de>
2013-11-23block: Convert bio_for_each_segment() to bvec_iterKent Overstreet
More prep work for immutable biovecs - with immutable bvecs drivers won't be able to use the biovec directly, they'll need to use helpers that take into account bio->bi_iter.bi_bvec_done. This updates callers for the new usage without changing the implementation yet. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Paul Clements <Paul.Clements@steeleye.com> Cc: Jim Paris <jim@jtan.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com> Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com> Cc: support@lsi.com Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jan Kara <jack@suse.cz> Cc: linux-m68k@lists.linux-m68k.org Cc: linuxppc-dev@lists.ozlabs.org Cc: drbd-user@lists.linbit.com Cc: nbd-general@lists.sourceforge.net Cc: cbe-oss-dev@lists.ozlabs.org Cc: xen-devel@lists.xensource.com Cc: virtualization@lists.linux-foundation.org Cc: linux-raid@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: DL-MPTFusionLinux@lsi.com Cc: linux-scsi@vger.kernel.org Cc: devel@driverdev.osuosl.org Cc: linux-fsdevel@vger.kernel.org Cc: cluster-devel@redhat.com Cc: linux-mm@kvack.org Acked-by: Geoff Levand <geoff@infradead.org>
2013-11-23block: Abstract out bvec iteratorKent Overstreet
Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
2013-11-18NVMe: remove deprecated IRQF_DISABLEDMichael Opdenacker
This patch proposes to remove the use of the IRQF_DISABLED flag It's a NOOP since 2.6.35 and it will be removed one day. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-11-18NVMe: Avoid shift operation when writing cq head doorbellHaiyan Hu
Changes the type of dev->db_stride to unsigned and changes the value stored there to be 1 << the current value. Then there is less calculation to be done at completion time. Signed-off-by: Haiyan Hu <huhaiyan@huawei.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-21DMA-API: block: nvme-core: replace dma_set_mask()+dma_set_coherent_mask() ↵Russell King
with new helper Replace the following sequence: dma_set_mask(dev, mask); dma_set_coherent_mask(dev, mask); with a call to the new helper dma_set_mask_and_coherent(). Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-09-06NVMe: Merge issue on character device bring-upKeith Busch
A recent patch made it possible to bring up the character handle when the device is responsive but not accepting a set-features command. Another recent patch moved the initialization that requires we move where the checks for this condition occur. This patch merges these two ideas so it works much as before. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03NVMe: Handle ioremap failureKeith Busch
Decrement the number of queues required for doorbell remapping until the memory is successfully mapped for that size. Additional checks are done so that we don't call free_irq if it has already been freed. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03NVMe: Add pci suspend/resume driver callbacksKeith Busch
Used for going in and out of low power states. Resuming reuses the IO queues from the previous initialization, freeing any allocated queues that are no longer usable. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03NVMe: Use normal shutdownKeith Busch
The NVMe spec recommends using the shutdown normal sequence when safely taking the controller offline instead of hitting CC.EN on the next start-up to reset the controller. The spec recommends a minimum of 1 second for the shutdown complete. This patch waits 2 seconds to be on the safe side. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03NVMe: Separate controller init from disk discoveryKeith Busch
This combines the controller initialization into one function, removing IO queue setup from namespace discovery, and creates symetric functions for device removal. The controller start and shutdown functions can now be called from resume/suspend context as well as probe/remove. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03NVMe: Separate queue alloc/free from create/deleteKeith Busch
This separates nvme queue allocation from creation, and queue deletion from freeing. This is so that we may in the future temporarily disable queues and reuse the same memory when bringing them back online, like coming back from suspend state. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03NVMe: Group pci related actions in functionsKeith Busch
This will make it easier to reuse these outside probe/remove. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03NVMe: Disk stats for read/write commands onlyKeith Busch
Flush and discard requests would previously mess up the accounting. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-09-03NVMe: Bring up cdev on set feature failureKeith Busch
This patch creates the character device as long as a device's admin queues are usable so a user has an opprotunity to perform administration tasks. A device may be in a state that does not allow IO and setting the queue count feature in such a state returns an error. Previously the driver would bail and the controller would be unusable. Signed-off-by: Keith Busch <keith.busch@intel.com>
2013-09-03NVMe: Fix checkpatch issuesKeith Busch
Signed-off-by: Keith Busch <keith.busch@intel.com>
2013-09-03NVMe: Namespace IDs are unsignedMatthew Wilcox
The 'Number of Namespaces' read from the device was being treated as signed, which would cause us to not scan any namespaces for a device with more than 2 billion namespaces. That led to noticing that the namespace ID was also being treated as signed, which could lead to the result from NVME_IOCTL_ID being treated as an error code. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-06-24NVMe: Call nvme_process_cq from submission pathMatthew Wilcox
Since we have the queue locked, it makes sense to check if there are any completion queue entries on the queue before we release the lock. If there are, it may save an interrupt and reduce latency for the I/Os that happened to complete. This happens fairly often for some workloads. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-06-24NVMe: Remove "process_cq did something" messageMatthew Wilcox
I was originally intending to log the fact that the kthread had done some work since it might help us find interrupt handling problems, but that hasn't been done yet, and spamming the logs with this message is just rude. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-06-24NVMe: Return correct value from interrupt handlerMatthew Wilcox
The interrupt handler currently reports whether it found any new completion queue entries. If the completion queue is primarily being processed by a method other than the interrupt handler, it may return IRQ_NONE so often that Linux thinks that the interrupt is being falsely triggered. To solve this problem, report whether any completion queue entries have been seen since the last interrupt was received for this queue. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-06-20NVMe: Disk IO statisticsKeith Busch
Add io stats accounting for bio requests so nvme block devices show useful disk stats. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-06-20NVMe: Restructure MSI / MSI-X setupMatthew Wilcox
The current code copies 'nr_io_queues' into 'q_count', modifies 'nr_io_queues' during MSI-X setup, then resets 'nr_io_queues' for MSI setup. Instead, copy 'nr_io_queues' into 'vecs' and modify 'vecs' during both MSI-X and MSI setup. This lets us simplify the for-loops that set up MSI-X and MSI, and opens the possibility of using more I/O queues than we have interrupt vectors, should future benchmarking prove that to be a useful feature. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-31NVMe: Add MSI supportRamachandra Rao Gajula
Some devices only have support for MSI, not MSI-X. While MSI is more limited, it still provides better performance than line-based interrupts. Signed-off-by: Ramachandra Gajula <rama@fastorsystems.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-28NVMe: Use dma_set_mask() correctlyMatthew Wilcox
In some circumstances setting a 64-bit DMA mask can fail, as explained in Documentation/DMA-API-HOWTO.txt. Use the recommended code sequence to set a 32-bit DMA mask if setting a 64-bit DMA mask fails. Reported-by: Chayan Biswas <Chayan.Biswas@sandisk.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-23Return the result from user admin command IOCTL even in case of failureChayan Biswas
We copy the result to user if the command is completed from the controller even if it completes with failure (non-zero) status. A return status of < 0 indicates the command was not completed by the controller. The user application may expect the error code in the result field in case of failure. Signed-off-by: Chayan Biswas <Chayan.Biswas@sandisk.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-17NVMe: Do not cancel command multiple timesKeith Busch
Cancelling an already cancelled command does not do anything, so check the command context before cancelling it, continuing if had already been cancelled so we do not log the same problem every second if a device stops responding. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-17NVMe: fix error return code in nvme_submit_bio_queue()Wei Yongjun
nvme_submit_flush_data() might overwrite the initialisation of the return value with 0, so move the -ENOMEM setting close to the usage. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-17NVMe: check for integer overflow in nvme_map_user_pages()Dan Carpenter
You need to have CAP_SYS_ADMIN to trigger this overflow but it makes the static checkers complain so we should fix it. The worry is that "length" comes from copy_from_user() so we need to check that "length + offset" can't overflow. I also changed the min_t() cast to be unsigned instead of signed. Now that we cap "length" to INT_MAX it doesn't make a difference, but it's a little easier for reviewers to know that large values aren't cast to negative. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-09NVMe: Use user defined admin ioctl timeoutKeith Busch
Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-08NVMe: Only clear the enable bit when disabling controllerMatthew Wilcox
Many of the bits in the Controller Configuration register may only be modified when the Enable bit is clear. Clearing them at the same time as the Enable bit might be OK, but let's play it safe and only touch the Enable bit. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com>
2013-05-08NVMe: Wait for device to acknowledge shutdownMatthew Wilcox
A recent update to the specification makes it clear that the host is expected to wait for the device to acknowledge the Enable bit transitioning to 0 as well as waiting for the device to acknowledge a transition to 1. Reported-by: Khosrow Panah <Khosrow.Panah@idt.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com>
2013-05-02NVMe: Schedule timeout for sync commandsKeith Busch
Schedule a timeout on sync commands in case the command times out and the device is not being polled for timeouts. This prevents device removal from hanging forever if the device has stopped responding. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-02NVMe: Meta-data support in NVME_IOCTL_SUBMIT_IOKeith Busch
This adds support for namespaces with separate meta-data formats in the submit io ioctl. The meta-data buffer has to be a contiguous, so such a buffer is allocated and the mapped user pages are copied to/from this buffer for write/read commands. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-02NVMe: Device specific stripe size handlingKeith Busch
We have an nvme device that has a concept of a stripe size. IO requests that do not transfer data crossing a stripe boundary has greater performance compared to IO that does cross it. This patch sets the stripe size for the device if the device and vendor ids match one with this feature and splits IO requests that cross the stripe boundary. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-02NVMe: Split non-mergeable bio requestsKeith Busch
It is possible a bio request can not be submitted as a single NVMe IO command if the bio_vec is not mergeable with the NVMe PRP alignement constraints. This condition was handled by submitting an IO for the mergeable portion then submitting a follow on IO for the remaining data after the previous IO completes. The remainder to be sent was tracked by manipulating the bio->bi_idx and bio->bi_sector. This patch splits the request as many times as necessary and submits the bios together. Since submitting the bio may cause it to be requeued on split, nvme_resubmit_bios had to be modified to remove the wait queue when the bio list is empty prior to submitting the bio since a split would have added the wait queue a second time, corrupting the wait queue head task list. There are a few other benefits from doing this: it fixes a potential issue with the previous handling of a non-mergeable bio as the requeuing method could would use an unlocked nvme_queue if the callback isn't invoked on the queue's associated cpu; it will be possible to retry a failed bio if desired at some later time since it does not manipulate the original bio; the bio integrity extensions require the bio to be in its original condition for the checks to work correctly if we implement the end-to-end data protection in the future. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>