linux-toradex.git/include/linux/iova.h, branch v4.16-rc4

iommu/iova: Make rcache flush optional on IOVA allocation failure

2017-10-12T12:18:02+00:00

Since IOVA allocation failure is not an unusual case, we need to flush
the CPUs' rcaches in the hope that the next attempt succeeds.

However, it is useful to be able to decide whether the rcache flush step
is needed, for two reasons:
- Poor scalability. On a large system with ~100 CPUs, iterating over and
  flushing the rcache of each CPU becomes a serious bottleneck, so we may
  want to defer it.
- free_cpu_cached_iovas() does not care about the max PFN we are
  interested in. Thus we may flush our rcaches and still get no new IOVA,
  as in this commonly used scenario:

    if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
        iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);

    if (!iova)
        iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);

   1. The first alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to
      get PCI devices a SAC address
   2. alloc_iova() fails because the 32-bit space is full
   3. The rcaches contain PFNs outside the 32-bit space, so
      free_cpu_cached_iovas() throws entries away for nothing and
      alloc_iova() fails again
   4. The next alloc_iova_fast() call cannot take advantage of the rcache
      since we have just emptied the caches. In this case we pick the
      slowest option to proceed.

This patch turns the flushed_rcache local flag into an additional function
argument that controls the rcache flush step. It also updates all users
to do the flush only as a last resort.

Signed-off-by: Tomasz Nowicki 
Reviewed-by: Robin Murphy 
Tested-by: Nate Watterson 
Signed-off-by: Joerg Roedel

iommu/iova: Add rbtree anchor node

2017-09-27T15:09:57+00:00

Add a permanent dummy IOVA reservation to the rbtree, such that we can
always access the top of the address space instantly. The immediate
benefit is that we remove the overhead of the rb_last() traversal when
not using the cached node, but it also paves the way for further
simplifications.

Signed-off-by: Robin Murphy 
Signed-off-by: Joerg Roedel

iommu/iova: Make dma_32bit_pfn implicit

2017-09-27T15:09:57+00:00

Now that the cached node optimisation can apply to all allocations, the
couple of users which were playing tricks with dma_32bit_pfn in order to
benefit from it can stop doing so. Conversely, there is also no need for
all the other users to explicitly calculate a 'real' 32-bit PFN, when
init_iova_domain() can happily do that itself from the page granularity.
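
As a rough illustration of deriving the 32-bit PFN boundary from the page
granularity, a userspace sketch follows. The toy_* helpers are
hypothetical; toy_iova_shift() only models what iova_shift() is assumed
to return (log2 of the granule), and treating the result as the first
PFN beyond the 32-bit space is likewise an assumption.

```c
#include <assert.h>

/* Hypothetical model of iova_shift(): log2 of a power-of-two granule. */
static unsigned int toy_iova_shift(unsigned long granule)
{
    unsigned int shift = 0;

    /* The granule must be a non-zero power of two. */
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    while ((granule >> shift) != 1)
        shift++;
    return shift;
}

/* First PFN above the 32-bit address space for the given granule. */
static unsigned long long toy_dma_32bit_pfn(unsigned long granule)
{
    unsigned int shift = toy_iova_shift(granule);

    assert(shift <= 32);
    return 1ULL << (32 - shift);
}
```

With a 4 KiB granule the shift is 12, so the boundary lands at PFN
1 << 20; callers no longer need to pass such a value in themselves.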

CC: Thierry Reding 
CC: Jonathan Hunter 
CC: David Airlie 
CC: Sudeep Dutt 
CC: Ashutosh Dixit 
Signed-off-by: Zhen Lei 
Tested-by: Ard Biesheuvel 
Tested-by: Zhen Lei 
Tested-by: Nate Watterson 
[rm: use iova_shift(), rewrote commit message]
Signed-off-by: Robin Murphy 
Signed-off-by: Joerg Roedel

iommu/iova: Extend rbtree node caching

2017-09-27T15:09:57+00:00

The cached node mechanism provides a significant performance benefit for
allocations using a 32-bit DMA mask, but in the case of non-PCI devices
or where the 32-bit space is full, the loss of this benefit can be
significant - on large systems there can be many thousands of entries in
the tree, such that walking all the way down to find free space every
time becomes increasingly awful.

Maintain a similar cached node for the whole IOVA space as a superset of
the 32-bit space so that performance can remain much more consistent.

Inspired by work by Zhen Lei.

Tested-by: Ard Biesheuvel 
Tested-by: Zhen Lei 
Tested-by: Nate Watterson 
Signed-off-by: Robin Murphy 
Signed-off-by: Joerg Roedel

iommu/iova: Add flush timer

2017-08-15T16:23:52+00:00

Add a timer to flush entries from the Flush-Queues every
10ms. This makes sure that no stale TLB entries remain for
too long after an IOVA has been unmapped.

Signed-off-by: Joerg Roedel

iommu/iova: Add locking to Flush-Queues

2017-08-15T16:23:52+00:00

The lock is taken from the same CPU most of the time, but
having it also allows the queue to be flushed from another
CPU if necessary.

This will be used by a timer to regularly flush any pending
IOVAs from the Flush-Queues.

Signed-off-by: Joerg Roedel

iommu/iova: Add flush counters to Flush-Queue implementation

2017-08-15T16:23:51+00:00

There are two counters:

	* fq_flush_start_cnt  - Increased when a TLB flush
	                        is started.

	* fq_flush_finish_cnt - Increased when a TLB flush
	                        is finished.

The fq_flush_start_cnt is assigned to every Flush-Queue
entry on its creation. When freeing entries from the
Flush-Queue, the value in the entry is compared to the
fq_flush_finish_cnt. The entry can only be freed when its
value is less than the value of fq_flush_finish_cnt.

The reason for these counters is to take advantage of IOMMU
TLB flushes that happened on other CPUs. Those flushes
already cover Flush-Queue entries queued on other CPUs, so
these entries can be freed without flushing the TLB again.

This makes it less likely that the Flush-Queue is full and
saves IOMMU TLB flushes.

Signed-off-by: Joerg Roedel

iommu/iova: Implement Flush-Queue ring buffer

2017-08-15T16:23:51+00:00

Add a function to add entries to the Flush-Queue ring
buffer. If the buffer is full, call the flush-callback and
free the entries.
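
A minimal userspace sketch of the add-to-ring behaviour follows. The
buffer size, the toy_* names and the choice to keep one slot empty to
tell "full" from "empty" are assumptions made for illustration, and the
flush callback is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FQ_SIZE 4 /* deliberately tiny; hypothetical value */

struct toy_fq {
    unsigned long entries[TOY_FQ_SIZE];
    unsigned int head;        /* oldest pending entry */
    unsigned int tail;        /* next slot to write */
    unsigned int flush_calls; /* times the flush callback was invoked */
    unsigned int freed;       /* entries released so far */
};

/* One slot stays unused so that head == tail always means "empty". */
static bool toy_fq_full(const struct toy_fq *fq)
{
    return (fq->tail + 1) % TOY_FQ_SIZE == fq->head;
}

/* Stand-in for calling the flush callback and freeing all entries. */
static void toy_fq_flush_and_free(struct toy_fq *fq)
{
    fq->flush_calls++;
    while (fq->head != fq->tail) {
        fq->freed++;
        fq->head = (fq->head + 1) % TOY_FQ_SIZE;
    }
}

/* Queue one PFN, making room first if the ring is full. */
static void toy_fq_add(struct toy_fq *fq, unsigned long pfn)
{
    assert(fq != NULL);
    if (toy_fq_full(fq))
        toy_fq_flush_and_free(fq);
    fq->entries[fq->tail] = pfn;
    fq->tail = (fq->tail + 1) % TOY_FQ_SIZE;
}
```

With four slots the ring holds three pending entries. Adding a fourth
triggers exactly one flush, frees the three older entries and then
queues the new one.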

Signed-off-by: Joerg Roedel

iommu/iova: Add flush-queue data structures

2017-08-15T16:23:50+00:00

This patch adds the basic data-structures to implement
flush-queues in the generic IOVA code. It also adds the
initialization and destroy routines for these data
structures.

The initialization routine is designed so that the use of
this feature is optional for the users of IOVA code.

Signed-off-by: Joerg Roedel

iommu/iova: Fix compile error with CONFIG_IOMMU_IOVA=m

2017-03-22T23:06:17+00:00

The #ifdef in iova.h only catches the CONFIG_IOMMU_IOVA=y
case, so that compilation as a module fails with duplicate
function definition errors. Fix it by catching both cases in
the #if.
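
The difference between the two preprocessor tests can be shown in a
self-contained sketch. It relies on the Kconfig convention that an
option built as a module defines CONFIG_<OPTION>_MODULE instead of
CONFIG_<OPTION>. The TOY_* macros are hypothetical, and the exact form
of the #if used by the patch is not stated here (the kernel's
IS_ENABLED() macro is one way to cover both cases).

```c
#include <assert.h>

/* Pretend the option was selected with =m: only the _MODULE symbol exists. */
#define CONFIG_IOMMU_IOVA_MODULE 1

/* Old-style test: true only for a built-in (=y) configuration. */
#if defined(CONFIG_IOMMU_IOVA)
#define TOY_SEES_BUILTIN_ONLY 1
#else
#define TOY_SEES_BUILTIN_ONLY 0
#endif

/* Fixed-style test: true for built-in (=y) and for module (=m). */
#if defined(CONFIG_IOMMU_IOVA) || defined(CONFIG_IOMMU_IOVA_MODULE)
#define TOY_SEES_BUILTIN_OR_MODULE 1
#else
#define TOY_SEES_BUILTIN_OR_MODULE 0
#endif

/* Compile-time check that only the second test notices the module build. */
static_assert(TOY_SEES_BUILTIN_ONLY == 0, "plain #ifdef misses =m");
static_assert(TOY_SEES_BUILTIN_OR_MODULE == 1, "combined test catches =m");
```

When the first test is the only guard, a module build falls through to
the header's fallback branch even though the real functions are also
compiled, which is consistent with the duplicate definition errors
described above.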

Signed-off-by: Joerg Roedel