<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/vfio, branch v4.17</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Revert "vfio/type1: Improve memory pinning process for raw PFN mapping"</title>
<updated>2018-06-02T14:41:44+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@redhat.com</email>
</author>
<published>2018-06-02T14:41:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=89c29def6b0101fff66a3d74d0178b844f88d732'/>
<id>89c29def6b0101fff66a3d74d0178b844f88d732</id>
<content type='text'>
Bisection by Amadeusz Sławiński implicates this commit leading to bad
page state issues after VM shutdown, likely due to unbalanced page
references.  The original commit was intended only as a performance
improvement, therefore revert for offline rework.

Link: https://lkml.org/lkml/2018/6/2/97
Fixes: 356e88ebe447 ("vfio/type1: Improve memory pinning process for raw PFN mapping")
Cc: Jason Cai (Xiang Feng) &lt;jason.cai@linux.alibaba.com&gt;
Reported-by: Amadeusz Sławiński &lt;amade@asmblr.net&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Bisection by Amadeusz Sławiński implicates this commit leading to bad
page state issues after VM shutdown, likely due to unbalanced page
references.  The original commit was intended only as a performance
improvement, therefore revert for offline rework.

Link: https://lkml.org/lkml/2018/6/2/97
Fixes: 356e88ebe447 ("vfio/type1: Improve memory pinning process for raw PFN mapping")
Cc: Jason Cai (Xiang Feng) &lt;jason.cai@linux.alibaba.com&gt;
Reported-by: Amadeusz Sławiński &lt;amade@asmblr.net&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio</title>
<updated>2018-04-07T02:44:27+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-04-07T02:44:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=f605ba97fb80522656c7dce9825a908f1e765b57'/>
<id>f605ba97fb80522656c7dce9825a908f1e765b57</id>
<content type='text'>
Pull VFIO updates from Alex Williamson:

 - Adopt iommu_unmap_fast() interface to type1 backend
   (Suravee Suthikulpanit)

 - mdev sample driver fixup (Shunyong Yang)

 - More efficient PFN mapping handling in type1 backend
   (Jason Cai)

 - VFIO device ioeventfd interface (Alex Williamson)

 - Tag new vfio-platform sub-maintainer (Alex Williamson)

* tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio:
  MAINTAINERS: vfio/platform: Update sub-maintainer
  vfio/pci: Add ioeventfd support
  vfio/pci: Use endian neutral helpers
  vfio/pci: Pull BAR mapping setup from read-write path
  vfio/type1: Improve memory pinning process for raw PFN mapping
  vfio-mdev/samples: change RDI interrupt condition
  vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull VFIO updates from Alex Williamson:

 - Adopt iommu_unmap_fast() interface to type1 backend
   (Suravee Suthikulpanit)

 - mdev sample driver fixup (Shunyong Yang)

 - More efficient PFN mapping handling in type1 backend
   (Jason Cai)

 - VFIO device ioeventfd interface (Alex Williamson)

 - Tag new vfio-platform sub-maintainer (Alex Williamson)

* tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio:
  MAINTAINERS: vfio/platform: Update sub-maintainer
  vfio/pci: Add ioeventfd support
  vfio/pci: Use endian neutral helpers
  vfio/pci: Pull BAR mapping setup from read-write path
  vfio/type1: Improve memory pinning process for raw PFN mapping
  vfio-mdev/samples: change RDI interrupt condition
  vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/pci: Add ioeventfd support</title>
<updated>2018-03-26T19:22:58+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@redhat.com</email>
</author>
<published>2018-03-21T18:46:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=30656177c4080460b936709ff6648f201d7d2c1a'/>
<id>30656177c4080460b936709ff6648f201d7d2c1a</id>
<content type='text'>
The ioeventfd here is actually irqfd handling of an ioeventfd such as
supported in KVM.  A user is able to pre-program a device write to
occur when the eventfd triggers.  This is yet another instance of
eventfd-irqfd triggering between KVM and vfio.  The impetus for this
is high frequency writes to pages which are virtualized in QEMU.
Enabling this near-direct write path for selected registers within
the virtualized page can improve performance and reduce overhead.
Specifically this is initially targeted at NVIDIA graphics cards where
the driver issues a write to an MMIO register within a virtualized
region in order to allow the MSI interrupt to re-trigger.

Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Reviewed-by: Alexey Kardashevskiy &lt;aik@ozlabs.ru&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The ioeventfd here is actually irqfd handling of an ioeventfd such as
supported in KVM.  A user is able to pre-program a device write to
occur when the eventfd triggers.  This is yet another instance of
eventfd-irqfd triggering between KVM and vfio.  The impetus for this
is high frequency writes to pages which are virtualized in QEMU.
Enabling this near-direct write path for selected registers within
the virtualized page can improve performance and reduce overhead.
Specifically this is initially targeted at NVIDIA graphics cards where
the driver issues a write to an MMIO register within a virtualized
region in order to allow the MSI interrupt to re-trigger.

Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Reviewed-by: Alexey Kardashevskiy &lt;aik@ozlabs.ru&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/pci: Use endian neutral helpers</title>
<updated>2018-03-26T19:22:58+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@redhat.com</email>
</author>
<published>2018-03-21T18:46:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=07fd7ef3a1c25a11015bb5821c9c5982f722d4a2'/>
<id>07fd7ef3a1c25a11015bb5821c9c5982f722d4a2</id>
<content type='text'>
The iowriteXX/ioreadXX functions assume little endian hardware and
convert to little endian on a write and from little endian on a read.
We currently do our own explicit conversion to negate this.  Instead,
add some endian dependent defines to avoid all byte swaps.  There
should be no functional change other than big endian systems aren't
penalized with wasted swaps.

Reviewed-by: Alexey Kardashevskiy &lt;aik@ozlabs.ru&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The iowriteXX/ioreadXX functions assume little endian hardware and
convert to little endian on a write and from little endian on a read.
We currently do our own explicit conversion to negate this.  Instead,
add some endian dependent defines to avoid all byte swaps.  There
should be no functional change other than big endian systems aren't
penalized with wasted swaps.

Reviewed-by: Alexey Kardashevskiy &lt;aik@ozlabs.ru&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/pci: Pull BAR mapping setup from read-write path</title>
<updated>2018-03-26T19:22:58+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@redhat.com</email>
</author>
<published>2018-03-21T18:46:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=0d77ed3589ac054d197ccde7231e36f9e032426c'/>
<id>0d77ed3589ac054d197ccde7231e36f9e032426c</id>
<content type='text'>
This creates a common helper that we'll use for ioeventfd setup.

Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Reviewed-by: Eric Auger &lt;eric.auger@redhat.com&gt;
Reviewed-by: Alexey Kardashevskiy &lt;aik@ozlabs.ru&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This creates a common helper that we'll use for ioeventfd setup.

Reviewed-by: Peter Xu &lt;peterx@redhat.com&gt;
Reviewed-by: Eric Auger &lt;eric.auger@redhat.com&gt;
Reviewed-by: Alexey Kardashevskiy &lt;aik@ozlabs.ru&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/type1: Improve memory pinning process for raw PFN mapping</title>
<updated>2018-03-22T19:18:27+00:00</updated>
<author>
<name>Jason Cai (Xiang Feng)</name>
<email>jason.cai@linux.alibaba.com</email>
</author>
<published>2018-03-22T04:52:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=356e88ebe4473a3663cf3d14727ce293a4526d34'/>
<id>356e88ebe4473a3663cf3d14727ce293a4526d34</id>
<content type='text'>
When using vfio to pass through a PCIe device (e.g. a GPU card) that
has a huge BAR (e.g. 16GB), a lot of cycles are wasted on memory
pinning because PFNs of PCI BAR are not backed by struct page, and
the corresponding VMA has flag VM_PFNMAP.

With this change, when pinning a region which is a raw PFN mapping,
it can skip unnecessary user memory pinning process, and thus, can
significantly improve VM's boot up time when passing through devices
via VFIO. In my test on a Xeon E5 2.6GHz, the time mapping a 16GB
BAR was reduced from about 0.4s to 1.5us.

Signed-off-by: Jason Cai (Xiang Feng) &lt;jason.cai@linux.alibaba.com&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When using vfio to pass through a PCIe device (e.g. a GPU card) that
has a huge BAR (e.g. 16GB), a lot of cycles are wasted on memory
pinning because PFNs of PCI BAR are not backed by struct page, and
the corresponding VMA has flag VM_PFNMAP.

With this change, when pinning a region which is a raw PFN mapping,
it can skip unnecessary user memory pinning process, and thus, can
significantly improve VM's boot up time when passing through devices
via VFIO. In my test on a Xeon E5 2.6GHz, the time mapping a 16GB
BAR was reduced from about 0.4s to 1.5us.

Signed-off-by: Jason Cai (Xiang Feng) &lt;jason.cai@linux.alibaba.com&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert: "vfio-pci: Mask INTx if a device is not capabable of enabling it"</title>
<updated>2018-03-22T04:50:19+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@redhat.com</email>
</author>
<published>2018-03-22T04:50:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=834814e80268c818f354c8f402e0c6604ed75589'/>
<id>834814e80268c818f354c8f402e0c6604ed75589</id>
<content type='text'>
This reverts commit 2170dd04316e0754cbbfa4892a25aead39d225f7

The intent of commit 2170dd04316e ("vfio-pci: Mask INTx if a device is
not capabable of enabling it") was to disallow the user from seeing
that the device supports INTx if the platform is incapable of enabling
it.  The detection of this case however incorrectly includes devices
which natively do not support INTx, such as SR-IOV VFs, and further
discussions reveal gaps even for the target use case.

Reported-by: Arjun Vynipadath &lt;arjun@chelsio.com&gt;
Fixes: 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable of enabling it")
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 2170dd04316e0754cbbfa4892a25aead39d225f7

The intent of commit 2170dd04316e ("vfio-pci: Mask INTx if a device is
not capabable of enabling it") was to disallow the user from seeing
that the device supports INTx if the platform is incapable of enabling
it.  The detection of this case however incorrectly includes devices
which natively do not support INTx, such as SR-IOV VFs, and further
discussions reveal gaps even for the target use case.

Reported-by: Arjun Vynipadath &lt;arjun@chelsio.com&gt;
Fixes: 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable of enabling it")
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs</title>
<updated>2018-03-21T18:46:19+00:00</updated>
<author>
<name>Suravee Suthikulpanit</name>
<email>suravee.suthikulpanit@amd.com</email>
</author>
<published>2018-03-21T18:46:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6bd06f5a486c06023a618a86e8153b91d26f75f4'/>
<id>6bd06f5a486c06023a618a86e8153b91d26f75f4</id>
<content type='text'>
VFIO IOMMU type1 currently upmaps IOVA pages synchronously, which requires
IOTLB flushing for every unmapping. This results in large IOTLB flushing
overhead when handling pass-through devices has a large number of mapped
IOVAs. This can be avoided by using the new IOTLB flushing interface.

Cc: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Cc: Joerg Roedel &lt;joro@8bytes.org&gt;
Signed-off-by: Suravee Suthikulpanit &lt;suravee.suthikulpanit@amd.com&gt;
[aw - use LIST_HEAD]
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
VFIO IOMMU type1 currently upmaps IOVA pages synchronously, which requires
IOTLB flushing for every unmapping. This results in large IOTLB flushing
overhead when handling pass-through devices has a large number of mapped
IOVAs. This can be avoided by using the new IOTLB flushing interface.

Cc: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Cc: Joerg Roedel &lt;joro@8bytes.org&gt;
Signed-off-by: Suravee Suthikulpanit &lt;suravee.suthikulpanit@amd.com&gt;
[aw - use LIST_HEAD]
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio: disable filesystem-dax page pinning</title>
<updated>2018-03-03T02:00:04+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2018-02-04T18:34:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=94db151dc89262bfa82922c44e8320cea2334667'/>
<id>94db151dc89262bfa82922c44e8320cea2334667</id>
<content type='text'>
Filesystem-DAX is incompatible with 'longterm' page pinning. Without
page cache indirection a DAX mapping maps filesystem blocks directly.
This means that the filesystem must not modify a file's block map while
any page in a mapping is pinned. In order to prevent the situation of
userspace holding of filesystem operations indefinitely, disallow
'longterm' Filesystem-DAX mappings.

RDMA has the same conflict and the plan there is to add a 'with lease'
mechanism to allow the kernel to notify userspace that the mapping is
being torn down for block-map maintenance. Perhaps something similar can
be put in place for vfio.

Note that xfs and ext4 still report:

   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"

...at mount time, and resolving the dax-dma-vs-truncate problem is one
of the last hurdles to remove that designation.

Acked-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: kvm@vger.kernel.org
Cc: &lt;stable@vger.kernel.org&gt;
Reported-by: Haozhong Zhang &lt;haozhong.zhang@intel.com&gt;
Tested-by: Haozhong Zhang &lt;haozhong.zhang@intel.com&gt;
Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Filesystem-DAX is incompatible with 'longterm' page pinning. Without
page cache indirection a DAX mapping maps filesystem blocks directly.
This means that the filesystem must not modify a file's block map while
any page in a mapping is pinned. In order to prevent the situation of
userspace holding of filesystem operations indefinitely, disallow
'longterm' Filesystem-DAX mappings.

RDMA has the same conflict and the plan there is to add a 'with lease'
mechanism to allow the kernel to notify userspace that the mapping is
being torn down for block-map maintenance. Perhaps something similar can
be put in place for vfio.

Note that xfs and ext4 still report:

   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"

...at mount time, and resolving the dax-dma-vs-truncate problem is one
of the last hurdles to remove that designation.

Acked-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: kvm@vger.kernel.org
Cc: &lt;stable@vger.kernel.org&gt;
Reported-by: Haozhong Zhang &lt;haozhong.zhang@intel.com&gt;
Tested-by: Haozhong Zhang &lt;haozhong.zhang@intel.com&gt;
Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: do bulk POLL* -&gt; EPOLL* replacement</title>
<updated>2018-02-11T22:34:03+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-02-11T22:34:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=a9a08845e9acbd224e4ee466f5c1275ed50054e8'/>
<id>a9a08845e9acbd224e4ee466f5c1275ed50054e8</id>
<content type='text'>
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\&lt;POLL$V\&gt;\)/\\1E\\2/" $f; done
    done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do.  But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\&lt;POLL$V\&gt;\)/\\1E\\2/" $f; done
    done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do.  But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
