<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/include/linux/iova.h, branch v4.16-rc4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>iommu/iova: Make rcache flush optional on IOVA allocation failure</title>
<updated>2017-10-12T12:18:02+00:00</updated>
<author>
<name>Tomasz Nowicki</name>
<email>tomasz.nowicki@caviumnetworks.com</email>
</author>
<published>2017-09-20T08:52:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=538d5b333216c3daa7a5821307164f10af73ec8c'/>
<id>538d5b333216c3daa7a5821307164f10af73ec8c</id>
<content type='text'>
Since IOVA allocation failure is not unusual case we need to flush
CPUs' rcache in hope we will succeed in next round.

However, it is useful to decide whether we need rcache flush step because
of two reasons:
- Not scalability. On large system with ~100 CPUs iterating and flushing
  rcache for each CPU becomes serious bottleneck so we may want to defer it.
- free_cpu_cached_iovas() does not care about max PFN we are interested in.
  Thus we may flush our rcaches and still get no new IOVA like in the
  commonly used scenario:

    if (dma_limit &gt; DMA_BIT_MASK(32) &amp;&amp; dev_is_pci(dev))
        iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) &gt;&gt; shift);

    if (!iova)
        iova = alloc_iova_fast(iovad, iova_len, dma_limit &gt;&gt; shift);

   1. First alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to get
      PCI devices a SAC address
   2. alloc_iova() fails due to full 32-bit space
   3. rcaches contain PFNs out of 32-bit space so free_cpu_cached_iovas()
      throws entries away for nothing and alloc_iova() fails again
   4. Next alloc_iova_fast() call cannot take advantage of rcache since we
      have just defeated caches. In this case we pick the slowest option
      to proceed.

This patch reworks flushed_rcache local flag to be additional function
argument instead and control rcache flush step. Also, it updates all users
to do the flush as the last chance.

Signed-off-by: Tomasz Nowicki &lt;Tomasz.Nowicki@caviumnetworks.com&gt;
Reviewed-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Tested-by: Nate Watterson &lt;nwatters@codeaurora.org&gt;
Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since IOVA allocation failure is not unusual case we need to flush
CPUs' rcache in hope we will succeed in next round.

However, it is useful to decide whether we need rcache flush step because
of two reasons:
- Not scalability. On large system with ~100 CPUs iterating and flushing
  rcache for each CPU becomes serious bottleneck so we may want to defer it.
- free_cpu_cached_iovas() does not care about max PFN we are interested in.
  Thus we may flush our rcaches and still get no new IOVA like in the
  commonly used scenario:

    if (dma_limit &gt; DMA_BIT_MASK(32) &amp;&amp; dev_is_pci(dev))
        iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) &gt;&gt; shift);

    if (!iova)
        iova = alloc_iova_fast(iovad, iova_len, dma_limit &gt;&gt; shift);

   1. First alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to get
      PCI devices a SAC address
   2. alloc_iova() fails due to full 32-bit space
   3. rcaches contain PFNs out of 32-bit space so free_cpu_cached_iovas()
      throws entries away for nothing and alloc_iova() fails again
   4. Next alloc_iova_fast() call cannot take advantage of rcache since we
      have just defeated caches. In this case we pick the slowest option
      to proceed.

This patch reworks flushed_rcache local flag to be additional function
argument instead and control rcache flush step. Also, it updates all users
to do the flush as the last chance.

Signed-off-by: Tomasz Nowicki &lt;Tomasz.Nowicki@caviumnetworks.com&gt;
Reviewed-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Tested-by: Nate Watterson &lt;nwatters@codeaurora.org&gt;
Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommu/iova: Add rbtree anchor node</title>
<updated>2017-09-27T15:09:57+00:00</updated>
<author>
<name>Robin Murphy</name>
<email>robin.murphy@arm.com</email>
</author>
<published>2017-09-21T15:52:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=bb68b2fbfbd643d4407541f9c7a16a2c9b3a57c7'/>
<id>bb68b2fbfbd643d4407541f9c7a16a2c9b3a57c7</id>
<content type='text'>
Add a permanent dummy IOVA reservation to the rbtree, such that we can
always access the top of the address space instantly. The immediate
benefit is that we remove the overhead of the rb_last() traversal when
not using the cached node, but it also paves the way for further
simplifications.

Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a permanent dummy IOVA reservation to the rbtree, such that we can
always access the top of the address space instantly. The immediate
benefit is that we remove the overhead of the rb_last() traversal when
not using the cached node, but it also paves the way for further
simplifications.

Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommu/iova: Make dma_32bit_pfn implicit</title>
<updated>2017-09-27T15:09:57+00:00</updated>
<author>
<name>Zhen Lei</name>
<email>thunder.leizhen@huawei.com</email>
</author>
<published>2017-09-21T15:52:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=aa3ac9469c1850ed00741955b975c3a19029763a'/>
<id>aa3ac9469c1850ed00741955b975c3a19029763a</id>
<content type='text'>
Now that the cached node optimisation can apply to all allocations, the
couple of users which were playing tricks with dma_32bit_pfn in order to
benefit from it can stop doing so. Conversely, there is also no need for
all the other users to explicitly calculate a 'real' 32-bit PFN, when
init_iova_domain() can happily do that itself from the page granularity.

CC: Thierry Reding &lt;thierry.reding@gmail.com&gt;
CC: Jonathan Hunter &lt;jonathanh@nvidia.com&gt;
CC: David Airlie &lt;airlied@linux.ie&gt;
CC: Sudeep Dutt &lt;sudeep.dutt@intel.com&gt;
CC: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Signed-off-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Tested-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Tested-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Tested-by: Nate Watterson &lt;nwatters@codeaurora.org&gt;
[rm: use iova_shift(), rewrote commit message]
Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that the cached node optimisation can apply to all allocations, the
couple of users which were playing tricks with dma_32bit_pfn in order to
benefit from it can stop doing so. Conversely, there is also no need for
all the other users to explicitly calculate a 'real' 32-bit PFN, when
init_iova_domain() can happily do that itself from the page granularity.

CC: Thierry Reding &lt;thierry.reding@gmail.com&gt;
CC: Jonathan Hunter &lt;jonathanh@nvidia.com&gt;
CC: David Airlie &lt;airlied@linux.ie&gt;
CC: Sudeep Dutt &lt;sudeep.dutt@intel.com&gt;
CC: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Signed-off-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Tested-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Tested-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Tested-by: Nate Watterson &lt;nwatters@codeaurora.org&gt;
[rm: use iova_shift(), rewrote commit message]
Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommu/iova: Extend rbtree node caching</title>
<updated>2017-09-27T15:09:57+00:00</updated>
<author>
<name>Robin Murphy</name>
<email>robin.murphy@arm.com</email>
</author>
<published>2017-09-21T15:52:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e60aa7b53845a261dd419652f12ab9f89e668843'/>
<id>e60aa7b53845a261dd419652f12ab9f89e668843</id>
<content type='text'>
The cached node mechanism provides a significant performance benefit for
allocations using a 32-bit DMA mask, but in the case of non-PCI devices
or where the 32-bit space is full, the loss of this benefit can be
significant - on large systems there can be many thousands of entries in
the tree, such that walking all the way down to find free space every
time becomes increasingly awful.

Maintain a similar cached node for the whole IOVA space as a superset of
the 32-bit space so that performance can remain much more consistent.

Inspired by work by Zhen Lei &lt;thunder.leizhen@huawei.com&gt;.

Tested-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Tested-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Tested-by: Nate Watterson &lt;nwatters@codeaurora.org&gt;
Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The cached node mechanism provides a significant performance benefit for
allocations using a 32-bit DMA mask, but in the case of non-PCI devices
or where the 32-bit space is full, the loss of this benefit can be
significant - on large systems there can be many thousands of entries in
the tree, such that walking all the way down to find free space every
time becomes increasingly awful.

Maintain a similar cached node for the whole IOVA space as a superset of
the 32-bit space so that performance can remain much more consistent.

Inspired by work by Zhen Lei &lt;thunder.leizhen@huawei.com&gt;.

Tested-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Tested-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Tested-by: Nate Watterson &lt;nwatters@codeaurora.org&gt;
Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommu/iova: Add flush timer</title>
<updated>2017-08-15T16:23:52+00:00</updated>
<author>
<name>Joerg Roedel</name>
<email>jroedel@suse.de</email>
</author>
<published>2017-08-10T14:58:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=9a005a800ae817c2c90ef117d7cd77614d866777'/>
<id>9a005a800ae817c2c90ef117d7cd77614d866777</id>
<content type='text'>
Add a timer to flush entries from the Flush-Queues every
10ms. This makes sure that no stale TLB entries remain for
too long after an IOVA has been unmapped.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a timer to flush entries from the Flush-Queues every
10ms. This makes sure that no stale TLB entries remain for
too long after an IOVA has been unmapped.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommu/iova: Add locking to Flush-Queues</title>
<updated>2017-08-15T16:23:52+00:00</updated>
<author>
<name>Joerg Roedel</name>
<email>jroedel@suse.de</email>
</author>
<published>2017-08-10T14:31:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=8109c2a2f8463852dddd6a1c3fcf262047c0c124'/>
<id>8109c2a2f8463852dddd6a1c3fcf262047c0c124</id>
<content type='text'>
The lock is taken from the same CPU most of the time. But
having it allows to flush the queue also from another CPU if
necessary.

This will be used by a timer to regularily flush any pending
IOVAs from the Flush-Queues.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The lock is taken from the same CPU most of the time. But
having it allows to flush the queue also from another CPU if
necessary.

This will be used by a timer to regularily flush any pending
IOVAs from the Flush-Queues.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommu/iova: Add flush counters to Flush-Queue implementation</title>
<updated>2017-08-15T16:23:51+00:00</updated>
<author>
<name>Joerg Roedel</name>
<email>jroedel@suse.de</email>
</author>
<published>2017-08-10T14:14:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fb418dab8a4f01dde0c025d15145c589ec02796b'/>
<id>fb418dab8a4f01dde0c025d15145c589ec02796b</id>
<content type='text'>
There are two counters:

	* fq_flush_start_cnt  - Increased when a TLB flush
	                        is started.

	* fq_flush_finish_cnt - Increased when a TLB flush
				is finished.

The fq_flush_start_cnt is assigned to every Flush-Queue
entry on its creation. When freeing entries from the
Flush-Queue, the value in the entry is compared to the
fq_flush_finish_cnt. The entry can only be freed when its
value is less than the value of fq_flush_finish_cnt.

The reason for these counters it to take advantage of IOMMU
TLB flushes that happened on other CPUs. These already
flushed the TLB for Flush-Queue entries on other CPUs so
that they can already be freed without flushing the TLB
again.

This makes it less likely that the Flush-Queue is full and
saves IOMMU TLB flushes.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are two counters:

	* fq_flush_start_cnt  - Increased when a TLB flush
	                        is started.

	* fq_flush_finish_cnt - Increased when a TLB flush
				is finished.

The fq_flush_start_cnt is assigned to every Flush-Queue
entry on its creation. When freeing entries from the
Flush-Queue, the value in the entry is compared to the
fq_flush_finish_cnt. The entry can only be freed when its
value is less than the value of fq_flush_finish_cnt.

The reason for these counters it to take advantage of IOMMU
TLB flushes that happened on other CPUs. These already
flushed the TLB for Flush-Queue entries on other CPUs so
that they can already be freed without flushing the TLB
again.

This makes it less likely that the Flush-Queue is full and
saves IOMMU TLB flushes.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommu/iova: Implement Flush-Queue ring buffer</title>
<updated>2017-08-15T16:23:51+00:00</updated>
<author>
<name>Joerg Roedel</name>
<email>jroedel@suse.de</email>
</author>
<published>2017-08-10T13:49:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=1928210107edd4fa786199fef6b875d3af3bef88'/>
<id>1928210107edd4fa786199fef6b875d3af3bef88</id>
<content type='text'>
Add a function to add entries to the Flush-Queue ring
buffer. If the buffer is full, call the flush-callback and
free the entries.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a function to add entries to the Flush-Queue ring
buffer. If the buffer is full, call the flush-callback and
free the entries.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommu/iova: Add flush-queue data structures</title>
<updated>2017-08-15T16:23:50+00:00</updated>
<author>
<name>Joerg Roedel</name>
<email>jroedel@suse.de</email>
</author>
<published>2017-08-10T12:44:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=42f87e71c3df12d8f29ec1bb7b47772ffaeaf1ee'/>
<id>42f87e71c3df12d8f29ec1bb7b47772ffaeaf1ee</id>
<content type='text'>
This patch adds the basic data-structures to implement
flush-queues in the generic IOVA code. It also adds the
initialization and destroy routines for these data
structures.

The initialization routine is designed so that the use of
this feature is optional for the users of IOVA code.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds the basic data-structures to implement
flush-queues in the generic IOVA code. It also adds the
initialization and destroy routines for these data
structures.

The initialization routine is designed so that the use of
this feature is optional for the users of IOVA code.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommu/iova: Fix compile error with CONFIG_IOMMU_IOVA=m</title>
<updated>2017-03-22T23:06:17+00:00</updated>
<author>
<name>Joerg Roedel</name>
<email>jroedel@suse.de</email>
</author>
<published>2017-03-22T23:06:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=b4d8c7aea15efa8c6272c58d78296f8b017c4c6a'/>
<id>b4d8c7aea15efa8c6272c58d78296f8b017c4c6a</id>
<content type='text'>
The #ifdef in iova.h only catches the CONFIG_IOMMU_IOVA=y
case, so that compilation as a module fails with duplicate
function definition errors. Fix it by catching both cases in
the #if.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The #ifdef in iova.h only catches the CONFIG_IOMMU_IOVA=y
case, so that compilation as a module fails with duplicate
function definition errors. Fix it by catching both cases in
the #if.

Signed-off-by: Joerg Roedel &lt;jroedel@suse.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
