summaryrefslogtreecommitdiff
path: root/drivers/iommu/iommufd
AgeCommit message (Collapse)Author
5 daysMerge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Several fixes: - Add missing static const - Correct type 1 emulation for VFIO_CHECK_EXTENSION when no-iommu is turned on - Fix selftest memory leak and syzkaller splat - Fix missed -EFAULT in fault reporting write() fops - Fix a race where map/unmap with the internal IOVA allocator can unmap things it should not" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Fix a race with concurrent allocation and unmap iommufd/selftest: Remove MOCK_IOMMUPT_AMDV1 format iommufd: Fix return value of iommufd_fault_fops_write() iommufd: update outdated comment for renamed iommufd_hw_pagetable_alloc() iommufd/selftest: Fix page leaks in mock_viommu_{init,destroy} iommufd: vfio compatibility extension check for noiommu mode iommufd: Constify struct dma_buf_attach_ops
6 daysMerge tag 'iommu-updates-v7.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core: - Support for RISC-V IO-page-table format in generic iommupt code ARM-SMMU Updates: - Introduction of an "invalidation array" for SMMUv3, which enables future scalability work and optimisations for devices with a large number of SMMUv3 instances - Update the conditions under which the SMMUv3 driver works around hardware errata for invalidation on MMU-700 implementations - Fix broken command filtering for the host view of NVIDIA's "cmdqv" SMMUv3 extension - MMU-500 device-tree binding additions for Qualcomm Eliza & Hawi SoCs Intel VT-d: - Support for dirty tracking on domains attached to PASID - Removal of unnecessary read*()/write*() wrappers - Improvements to the invalidation paths AMD Vi: - Race-condition fixed in debugfs code - Make log buffer allocation NUMA aware RISC-V: - IO-TLB flushing improvements - Minor fixes" * tag 'iommu-updates-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (48 commits) iommu/vt-d: Restore IOMMU_CAP_CACHE_COHERENCY dt-bindings: arm-smmu: qcom: Add compatible for Hawi SoC iommu/amd: Invalidate IRT cache for DMA aliases iommu/riscv: Remove overflows on the invalidation path iommu/amd: Fix clone_alias() to use the original device's devid iommu/vt-d: Remove the remaining pages along the invalidation path iommu/vt-d: Pass size_order to qi_desc_piotlb() not npages iommu/vt-d: Split piotlb invalidation into range and all iommu/vt-d: Remove dmar_writel() and dmar_writeq() iommu/vt-d: Remove dmar_readl() and dmar_readq() iommufd/selftest: Test dirty tracking on PASID iommu/vt-d: Support dirty tracking on PASID iommu/vt-d: Rename device_set_dirty_tracking() and pass dmar_domain pointer iommu/vt-d: Block PASID attachment to nested domain with dirty tracking iommu/dma: Always allow DMA-FQ when iommupt provides the iommu_domain iommu/riscv: Fix signedness bug iommu/amd: Fix illegal cap/mmio access in IOMMU debugfs iommu/amd: Fix illegal device-id access in IOMMU debugfs iommu/tegra241-cmdqv: Update uAPI to clarify HYP_OWN requirement iommu/tegra241-cmdqv: Set supports_cmd op in tegra241_vcmdq_hw_init() ...
10 daysiommufd: Fix a race with concurrent allocation and unmapSina Hassani
iopt_unmap_iova_range() releases the lock on iova_rwsem inside the loop body when getting to the more expensive unmap operations. This is fine on its own, except the loop condition is based on the first area that matches the unmap address range. If a concurrent call to map picks an area that was unmapped in previous iterations, the loop mistakenly tries to unmap it. This is reproducible by having one userspace thread map buffers and pass them to another thread that unmaps them. The problem manifests as EBUSY errors with single page mappings. Fix this by advancing the start pointer after unmapping an area. This ensures each iteration only examines the IOVA range that remains mapped, which is guaranteed not to have overlaps. Cc: stable@vger.kernel.org Fixes: 51fe6141f0f6 ("iommufd: Data structure to provide IOVA to PFN mapping") Link: https://patch.msgid.link/r/CAAJpGJSR4r_ds1JOjmkqHtsBPyxu8GntoeW08Sk5RNQPmgi+tg@mail.gmail.com Signed-off-by: Sina Hassani <sina@openai.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-31iommufd/selftest: Remove MOCK_IOMMUPT_AMDV1 formatPranjal Shrivastava
syzbot found that allocating a mock domain with AMDV1 format could cause a WARN_ON because the selftest enabled DYNAMIC_TOP without providing the required driver_ops. The AMDV1 format in the selftest was a placeholder and was not actually used by any of the existing selftests. Instead of adding dummy driver_ops to satisfy the requirements of a format we don't currently test, remove the AMDV1 format option from the selftest. The MOCK_IOMMUPT_DEFAULT and MOCK_IOMMUPT_HUGE formats are unaffected as they use the amdv1_mock variant which does not enable DYNAMIC_TOP. Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op") Link: https://patch.msgid.link/r/20260330092609.2659235-1-praan@google.com Reported-by: syzbot+453eb7add07c3767adab@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69c1d50b.a70a0220.3cae05.0001.GAE@google.com/ Signed-off-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-31iommufd: Fix return value of iommufd_fault_fops_write()Zhenzhong Duan
copy_from_user() may return number of bytes failed to copy, we should not pass over this number to user space to cheat that write() succeed. Instead, -EFAULT should be returned. Link: https://patch.msgid.link/r/20260330030755.12856-1-zhenzhong.duan@intel.com Cc: stable@vger.kernel.org Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-25iommufd: update outdated comment for renamed iommufd_hw_pagetable_alloc()Kexin Sun
The function iommufd_hw_pagetable_alloc() was renamed to iommufd_hwpt_paging_alloc() by commit 89db31635c87 ("iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable"). Update the stale reference in iommufd_device_auto_get_domain(). Link: https://patch.msgid.link/r/20260321105759.6832-1-kexinsun@smail.nju.edu.cn Assisted-by: unnamed:deepseek-v3.2 coccinelle Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-17iommufd: Report ATS not supported status via IOMMU_GET_HW_INFOShameer Kolothum
If the IOMMU driver reports that ATS is not supported for a device, set the IOMMU_HW_CAP_PCI_ATS_NOT_SUPPORTED flag in the returned hardware capabilities. This uses a negative flag for UAPI compatibility. Existing userspace assumes ATS is supported if no flag is present. This also ensures that new userspace works correctly on both old and new kernels, where a zero value implies ATS support. When this flag is set, ATS cannot be used for the device. When it is clear, ATS may be enabled when an appropriate HWPT is attached. Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-13iommufd/selftest: Fix page leaks in mock_viommu_{init,destroy}Thorsten Blum
mock_viommu_init() allocates two pages using __get_free_pages(..., 1), but its error path and mock_viommu_destroy() only release the first page using free_page(), leaking the second page. Use free_pages() with the matching order instead to avoid any page leaks. Fixes: 80478a2b450e ("iommufd/selftest: Add coverage for the new mmap interface") Link: https://patch.msgid.link/r/20260312164040.457293-3-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-02iommufd: vfio compatibility extension check for noiommu modeJacob Pan
VFIO_CHECK_EXTENSION should return false for TYPE1_IOMMU variants when in NO-IOMMU mode and IOMMUFD compat container is set. This change makes the behavior match VFIO_CONTAINER in noiommu mode. It also prevents userspace from incorrectly attempting to use TYPE1 IOMMU operations in a no-iommu context. Fixes: d624d6652a65 ("iommufd: vfio container FD ioctl compatibility") Link: https://patch.msgid.link/r/20260213183636.3340-1-jacob.pan@linux.microsoft.com Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-02iommufd: Constify struct dma_buf_attach_opsChristophe JAILLET
'struct dma_buf_attach_ops' is not modified in this driver. Constifying this structure moves some data to a read-only section, so increases overall security, especially when the structure holds some function pointers. On a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 81096 13899 192 95187 173d3 drivers/iommu/iommufd/pages.o After: ===== text data bss dec hex filename 81160 13835 192 95187 173d3 drivers/iommu/iommufd/pages.o Link: https://patch.msgid.link/r/67e9126bbffa1d5c05124773a8dd4a3493be77ac.1772139886.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-02-23iommufd: Add dma_buf_pin()Leon Romanovsky
IOMMUFD relies on a private protocol with VFIO, and this always operated in pinned mode. Now that VFIO can support pinned importers update IOMMUFD to invoke the normal dma-buf flow to request pin. This isn't enough to allow IOMMUFD to work with other exporters, it still needs a way to get the physical address list which is another series. IOMMUFD supports the defined revoke semantics. It immediately stops and fences access to the memory inside it's invalidate_mappings() callback, and it currently doesn't use scatterlists so doesn't call map/unmap at all. It is expected that a future revision can synchronously call unmap from the move_notify callback as well. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20260131-dmabuf-revoke-v7-8-463d956bd527@nvidia.com
2026-02-23Merge drm/drm-next into drm-misc-nextMaxime Ripard
Let's merge 7.0-rc1 to start the new drm-misc-next window Signed-off-by: Maxime Ripard <mripard@kernel.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-12Merge tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: "A small cycle with the bulk in selftests and reintroducing poison handling in the nvgrace-gpu driver. The rest are fixes, cleanups, and some dmabuf structure consolidation. - Update outdated mdev comment referencing the renamed mdev_type_add() function (Julia Lawall) - Introduce selftest support for IOMMU mapping of PCI MMIO BARs (Alex Mastro) - Relax selftest assertion relative to differences in huge page handling between legacy (v1) TYPE1 IOMMU mapping behavior and the compatibility mode supported by IOMMUFD (David Matlack) - Reintroduce memory poison handling support for non-struct-page- backed memory in the nvgrace-gpu variant driver (Ankit Agrawal) - Replace dma_buf_phys_vec with phys_vec to avoid duplicate structure and semantics (Leon Romanovsky) - Add missing upstream bridge locking across PCI function reset, resolving an assertion failure when secondary bus reset is used to provide that reset (Anthony Pighin) - Fixes to hisi_acc vfio-pci variant driver to resolve corner case issues related to resets, repeated migration, and error injection scenarios (Longfang Liu, Weili Qian) - Restrict vfio selftest builds to arm64 and x86_64, resolving compiler warnings on 32-bit archs (Ted Logan) - Un-deprecate the fsl-mc vfio bus driver as a new maintainer has stepped up (Ioana Ciornei)" * tag 'vfio-v7.0-rc1' of https://github.com/awilliam/linux-vfio: vfio/fsl-mc: add myself as maintainer vfio: selftests: only build tests on arm64 and x86_64 hisi_acc_vfio_pci: fix the queue parameter anomaly issue hisi_acc_vfio_pci: resolve duplicate migration states hisi_acc_vfio_pci: update status after RAS error hisi_acc_vfio_pci: fix VF reset timeout issue vfio/pci: Lock upstream bridge for vfio_pci_core_disable() types: reuse common phys_vec type instead of DMABUF open‑coded variant vfio/nvgrace-gpu: register device memory for poison handling mm: add stubs for PFNMAP memory failure registration functions vfio: selftests: Drop IOMMU mapping size assertions for VFIO_TYPE1_IOMMU vfio: selftests: Add vfio_dma_mapping_mmio_test vfio: selftests: Align BAR mmaps for efficient IOMMU mapping vfio: selftests: Centralize IOMMU mode name definitions vfio/mdev: update outdated comment
2026-02-05Merge drm/drm-next into drm-misc-nextThomas Zimmermann
Backmerging to get bug fixes from v6.19-rc7. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2026-01-28iommufd: Initialize batch->kind in batch_clear()Deepanshu Kartikey
KMSAN reported an uninitialized value when batch_add_pfn_num() reads batch->kind. This occurs because batch_clear() does not initialize the kind field. When batch_add_pfn_num() checks "if (batch->kind != kind)", it reads this uninitialized value, triggering KMSAN warnings. However the algorithm is fine with any value in kind at this point as the batch is always empty and it always corrects kind if wrong. Initialize batch->kind to zero in batch_clear() to silence the KMSAN warning. Link: https://patch.msgid.link/r/20260124132214.624041-1-kartikey406@gmail.com Reported-by: syzbot+df28076a30d726933015@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=df28076a30d726933015 Fixes: f394576eb11db ("iommufd: PFN handling for iopt_pages") Tested-by: syzbot+df28076a30d726933015@syzkaller.appspotmail.com Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: syzbot+a0c841e02f328005bbcc@syzkaller.appspotmail.com Reported-by: syzbot+a0c841e02f328005bbcc@syzkaller.appspotmail.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-01-27dma-buf: Rename dma_buf_move_notify() to dma_buf_invalidate_mappings()Leon Romanovsky
Along with renaming the .move_notify() callback, rename the corresponding dma-buf core function. This makes the expected behavior clear to exporters calling this function. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20260124-dmabuf-revoke-v5-2-f98fca917e96@nvidia.com Signed-off-by: Christian König <christian.koenig@amd.com>
2026-01-27dma-buf: Rename .move_notify() callback to a clearer identifierLeon Romanovsky
Rename the .move_notify() callback to .invalidate_mappings() to make its purpose explicit and highlight that it is responsible for invalidating existing mappings. Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20260124-dmabuf-revoke-v5-1-f98fca917e96@nvidia.com Signed-off-by: Christian König <christian.koenig@amd.com>
2026-01-19Merge tag 'common_phys_vec_via_vfio' into v6.20/vfio/nextAlex Williamson
* Reuse common phys_vec, phase out dma_buf_phys_vec Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-01-19types: reuse common phys_vec type instead of DMABUF open‑coded variantLeon Romanovsky
After commit fcf463b92a08 ("types: move phys_vec definition to common header"), we can use the shared phys_vec type instead of the DMABUF‑specific dma_buf_phys_vec, which duplicated the same structure and semantics. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20260107-convert-to-pvec-v1-1-6e3ab8079708@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-01-10iommufd/selftest: Prevent module/builtin conflicts in kconfigJason Gunthorpe
The selftest now depends on the AMDv1 page table, however the selftest kconfig itself is just an sub-option of the main IOMMUFD module kconfig. This means it cannot be modular and so kconfig allowed a modular IOMMU_PT_AMDV1 with a built in IOMMUFD. This causes link failures: ld: vmlinux.o: in function `mock_domain_alloc_pgtable.isra.0': selftest.c:(.text+0x12e8ad3): undefined reference to `pt_iommu_amdv1_init' ld: vmlinux.o: in function `BSWAP_SHUFB_CTL': sha1-avx2-asm.o:(.rodata+0xaa36a8): undefined reference to `pt_iommu_amdv1_read_and_clear_dirty' ld: sha1-avx2-asm.o:(.rodata+0xaa36f0): undefined reference to `pt_iommu_amdv1_map_pages' ld: sha1-avx2-asm.o:(.rodata+0xaa36f8): undefined reference to `pt_iommu_amdv1_unmap_pages' ld: sha1-avx2-asm.o:(.rodata+0xaa3720): undefined reference to `pt_iommu_amdv1_iova_to_phys' Adjust the kconfig to disable IOMMUFD_TEST if IOMMU_PT_AMDV1 is incompatible. Fixes: e93d5945ed5b ("iommufd: Change the selftest to use iommupt instead of xarray") Suggested-by: Arnd Bergmann <arnd@arndb.de> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512210135.freQWpxa-lkp@intel.com/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-10iommufd/selftest: Add missing kconfig for DMA_SHARED_BUFFERJason Gunthorpe
The test doesn't build without it, dma-buf.h does not provide stub functions if it is not enabled. Compilation can fail with: ERROR:root:ld: vmlinux.o: in function `iommufd_test': (.text+0x3b1cdd): undefined reference to `dma_buf_get' ld: (.text+0x3b1d08): undefined reference to `dma_buf_put' ld: (.text+0x3b2105): undefined reference to `dma_buf_export' ld: (.text+0x3b211f): undefined reference to `dma_buf_fd' ld: (.text+0x3b2e47): undefined reference to `dma_buf_move_notify' Add the missing select. Fixes: d2041f1f11dd ("iommufd/selftest: Add some tests for the dmabuf flow") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-12-16iommufd/selftest: Check for overflow in IOMMU_TEST_OP_ADD_RESERVEDJason Gunthorpe
syzkaller found it could overflow math in the test infrastructure and cause a WARN_ON by corrupting the reserved interval tree. This only effects test kernels with CONFIG_IOMMUFD_TEST. Validate the user input length in the test ioctl. Fixes: f4b20bb34c83 ("iommufd: Add kernel support for testing iommufd") Link: https://patch.msgid.link/r/0-v1-cd99f6049ba5+51-iommufd_syz_add_resv_jgg@nvidia.com Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Reported-by: syzbot+57fdb0cf6a0c5d1f15a2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69368129.a70a0220.38f243.008f.GAE@google.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-12-15iommufd/selftest: Do not leak the hwpt if IOMMU_TEST_OP_MD_CHECK_MAP failsJason Gunthorpe
If the input validation fails it returned without freeing the hwpt refcount causing a leak. This triggers a WARN_ON when closing the fd: WARNING: drivers/iommu/iommufd/main.c:369 at iommufd_fops_release+0x385/0x430, CPU#1: repro/724 Found by szykaller. Fixes: e93d5945ed5b ("iommufd: Change the selftest to use iommupt instead of xarray") Link: https://patch.msgid.link/r/0-v1-c8ed57e24380+44ae-iommufd_selftest_hwpt_leak_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/r/aTJGMaqwQK0ASj0G@ly-workstation Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-12-15iommufd: Fix building without dmabufArnd Bergmann
When DMABUF is disabled, trying to use it causes a link failure: x86_64-linux-ld: drivers/iommu/iommufd/io_pagetable.o: in function `iopt_map_file_pages': io_pagetable.c:(.text+0x1735): undefined reference to `dma_buf_get' x86_64-linux-ld: io_pagetable.c:(.text+0x1775): undefined reference to `dma_buf_put' Fixes: 44ebaa1744fd ("iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE") Link: https://patch.msgid.link/r/20251204100333.1034767-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-12-04Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "This is a pretty consequential cycle for iommufd, though this pull is not too big. It is based on a shared branch with VFIO that introduces VFIO_DEVICE_FEATURE_DMA_BUF a DMABUF exporter for VFIO device's MMIO PCI BARs. This was a large multiple series journey over the last year and a half. Based on that work IOMMUFD gains support for VFIO DMABUF's in its existing IOMMU_IOAS_MAP_FILE, which closes the last major gap to support PCI peer to peer transfers within VMs. In Joerg's iommu tree we have the "generic page table" work which aims to consolidate all the duplicated page table code in every iommu driver into a single algorithm. This will be used by iommufd to implement unique page table operations to start adding new features and improve performance. In here: - Expand IOMMU_IOAS_MAP_FILE to accept a DMABUF exported from VFIO. This is the first step to broader DMABUF support in iommufd, right now it only works with VFIO. This closes the last functional gap with classic VFIO type 1 to safely support PCI peer to peer DMA by mapping the VFIO device's MMIO into the IOMMU. - Relax SMMUv3 restrictions on nesting domains to better support qemu's sequence to have an identity mapping before the vSID is established" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommu/arm-smmu-v3-iommufd: Allow attaching nested domain for GBPA cases iommufd/selftest: Add some tests for the dmabuf flow iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE iommufd: Have iopt_map_file_pages convert the fd to a file iommufd: Have pfn_reader process DMABUF iopt_pages iommufd: Allow MMIO pages in a batch iommufd: Allow a DMABUF to be revoked iommufd: Do not map/unmap revoked DMABUFs iommufd: Add DMABUF to iopt_pages vfio/pci: Add vfio_pci_dma_buf_iommufd_map()
2025-12-04Merge tag 'iommu-updates-v6.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: - Introduction of the generic IO page-table framework with support for Intel and AMD IOMMU formats from Jason. This has good potential for unifying more IO page-table implementations and making future enhancements more easy. But this also needed quite some fixes during development. All known issues have been fixed, but my feeling is that there is a higher potential than usual that more might be needed. - Intel VT-d updates: - Use right invalidation hint in qi_desc_iotlb() - Reduce the scope of INTEL_IOMMU_FLOPPY_WA - ARM-SMMU updates: - Qualcomm device-tree binding updates for Kaanapali and Glymur SoCs and a new clock for the TBU. - Fix error handling if level 1 CD table allocation fails. - Permit more than the architectural maximum number of SMRs for funky Qualcomm mis-implementations of SMMUv2. - Mediatek driver: - MT8189 iommu support - Move ARM IO-pgtable selftests to kunit - Device leak fixes for a couple of drivers - Random smaller fixes and improvements * tag 'iommu-updates-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (81 commits) iommupt/vtd: Support mgaw's less than a 4 level walk for first stage iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires powerpc/pseries/svm: Make mem_encrypt.h self contained genpt: Make GENERIC_PT invisible iommupt: Avoid a compiler bug with sw_bit iommu/arm-smmu-qcom: Enable use of all SMR groups when running bare-metal iommupt: Fix unlikely flows in increase_top() iommu/amd: Propagate the error code returned by __modify_irte_ga() in modify_irte_ga() MAINTAINERS: Update my email address iommu/arm-smmu-v3: Fix error check in arm_smmu_alloc_cd_tables dt-bindings: iommu: qcom_iommu: Allow 'tbu' clock iommu/vt-d: Restore previous domain::aperture_end calculation iommu/vt-d: Fix unused invalidation hint in qi_desc_iotlb iommu/vt-d: Set INTEL_IOMMU_FLOPPY_WA depend on BLK_DEV_FD iommu/tegra: fix device leak on probe_device() iommu/sun50i: fix device leak on of_xlate() iommu/omap: simplify probe_device() error handling iommu/omap: fix device leaks on probe_device() iommu/mediatek-v1: add missing larb count sanity check iommu/mediatek-v1: fix device leaks on probe() ...
2025-11-28Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'mediatek', ↵Joerg Roedel
'nvidia/tegra', 'intel/vt-d', 'amd/amd-vi' and 'core' into next
2025-11-26Merge branch 'iommufd_dmabuf' into k.o-iommufd/for-nextJason Gunthorpe
Jason Gunthorpe says: ==================== This series is the start of adding full DMABUF support to iommufd. Currently it is limited to only work with VFIO's DMABUF exporter. It sits on top of Leon's series to add a DMABUF exporter to VFIO: https://lore.kernel.org/all/20251120-dmabuf-vfio-v9-0-d7f71607f371@nvidia.com/ The existing IOMMU_IOAS_MAP_FILE is enhanced to detect DMABUF fd's, but otherwise works the same as it does today for a memfd. The user can select a slice of the FD to map into the ioas and if the underliyng alignment requirements are met it will be placed in the iommu_domain. Though limited, it is enough to allow a VMM like QEMU to connect MMIO BAR memory from VFIO to an iommu_domain controlled by iommufd. This is used for PCI Peer to Peer support in VMs, and is the last feature that the VFIO type 1 container has that iommufd couldn't do. The VFIO type1 version extracts raw PFNs from VMAs, which has no lifetime control and is a use-after-free security problem. Instead iommufd relies on revokable DMABUFs. Whenever VFIO thinks there should be no access to the MMIO it can shoot down the mapping in iommufd which will unmap it from the iommu_domain. There is no automatic remap, this is a safety protocol so the kernel doesn't get stuck. Userspace is expected to know it is doing something that will revoke the dmabuf and map/unmap it around the activity. Eg when QEMU goes to issue FLR it should do the map/unmap to iommufd. Since DMABUF is missing some key general features for this use case it relies on a "private interconnect" between VFIO and iommufd via the vfio_pci_dma_buf_iommufd_map() call. The call confirms the DMABUF has revoke semantics and delivers a phys_addr for the memory suitable for use with iommu_map(). Medium term there is a desire to expand the supported DMABUFs to include GPU drivers to support DPDK/SPDK type use cases so future series will work to add a general concept of revoke and a general negotiation of interconnect to remove vfio_pci_dma_buf_iommufd_map(). I also plan another series to modify iommufd's vfio_compat to transparently pull a dmabuf out of a VFIO VMA to emulate more of the uAPI of type1. The latest series for interconnect negotation to exchange a phys_addr is: https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com And the discussion for design of revoke is here: https://lore.kernel.org/dri-devel/20250114173103.GE5556@nvidia.com/ ==================== Based on a shared branch with vfio. * iommufd_dmabuf: iommufd/selftest: Add some tests for the dmabuf flow iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILE iommufd: Have iopt_map_file_pages convert the fd to a file iommufd: Have pfn_reader process DMABUF iopt_pages iommufd: Allow MMIO pages in a batch iommufd: Allow a DMABUF to be revoked iommufd: Do not map/unmap revoked DMABUFs iommufd: Add DMABUF to iopt_pages vfio/pci: Add vfio_pci_dma_buf_iommufd_map() vfio/nvgrace: Support get_dmabuf_phys vfio/pci: Add dma-buf export support for MMIO regions vfio/pci: Enable peer-to-peer DMA transactions by default vfio/pci: Share the core device pointer while invoking feature functions vfio: Export vfio device get and put registration helpers dma-buf: provide phys_vec to scatter-gather mapping routine PCI/P2PDMA: Document DMABUF model PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation PCI/P2PDMA: Simplify bus address mapping API PCI/P2PDMA: Separate the mmap() support from the core logic Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25iommufd/selftest: Add some tests for the dmabuf flowJason Gunthorpe
Basic tests of establishing a dmabuf and revoking it. The selftest kernel side provides a basic small dmabuf for this testing. Link: https://patch.msgid.link/r/9-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25iommufd: Accept a DMABUF through IOMMU_IOAS_MAP_FILEJason Gunthorpe
Finally call iopt_alloc_dmabuf_pages() if the user passed in a DMABUF through IOMMU_IOAS_MAP_FILE. This makes the feature visible to userspace. Link: https://patch.msgid.link/r/8-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25iommufd: Have iopt_map_file_pages convert the fd to a fileJason Gunthorpe
Since dmabuf only has APIs that work on an int fd and not a struct file *, pass the fd deeper into the call chain so we can use the dmabuf APIs as is. Link: https://patch.msgid.link/r/7-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25iommufd: Have pfn_reader process DMABUF iopt_pagesJason Gunthorpe
Make another sub implementation of pfn_reader for DMABUF. This version will fill the batch using the struct phys_vec recorded during the attachment. Link: https://patch.msgid.link/r/6-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25iommufd: Allow MMIO pages in a batchJason Gunthorpe
Addresses intended for MMIO should be propagated through to the iommu with the IOMMU_MMIO flag set. Keep track in the batch if all the pfns are cachable or mmio and flush the batch out of it ever needs to be changed. Switch to IOMMU_MMIO if the batch is MMIO when mapping the iommu. Link: https://patch.msgid.link/r/5-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25iommufd: Allow a DMABUF to be revokedJason Gunthorpe
When connected to VFIO, the only DMABUF exporter that is accepted, the move_notify callback will be made when VFIO wants to remove access to the MMIO. This is being called revoke. Wire up revoke to go through all the iommu_domain's that have mapped the DMABUF and unmap them. The locking here is unpleasant, since the existing locking scheme was designed to come from the iopt through the area to the pages we cannot use pages as starting point for the locking. There is no way to obtain the domains_rwsem before obtaining the pages mutex to reliably use the existing domains_itree. Solve this problem by adding a new tracking structure just for DMABUF revoke. Record a linked list of areas and domains inside the pages mutex. Clean the entries on the list during revoke. The map/unmaps are now all done under a pages mutex while updating the tracking linked list so nothing can get out of sync. Only one lock is required for revoke processing. Link: https://patch.msgid.link/r/4-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25iommufd: Do not map/unmap revoked DMABUFsJason Gunthorpe
Once a DMABUF is revoked the domain will be unmapped under the pages mutex. Double unmapping will trigger a WARN, and mapping while revoked will fail. Check for revoked DMABUFs along all the map and unmap paths to resolve this. Ensure that map/unmap is always done under the pages mutex so it is synchronized with the revoke notifier. If a revoke happens between allocating the iopt_pages and the population to a domain then the population will succeed, and leave things unmapped as though revoke had happened immediately after. Currently there is no way to repopulate the domains. Userspace is expected to know if it is going to do something that would trigger revoke (eg if it is about to do a FLR) then it should go and remove the DMABUF mappings before and put the back after. The revoke is only to protect the kernel from mis-behaving userspace. Link: https://patch.msgid.link/r/3-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25iommufd: Add DMABUF to iopt_pagesJason Gunthorpe
Add IOPT_ADDRESS_DMABUF to the iopt_pages and the basic infrastructure to create an iopt_pages from a struct dma_buf *. DMABUF pages are not supported for accesses, and for now can only be used with the VFIO DMABUF exporter. The overall flow will be similar to memfd where the user can pass in a DMABUF file descriptor to IOMMU_IOAS_MAP_FILE and create an area and pages. Like other areas it can be copied and otherwise manipulated, though there is little point in doing so. There is no pinned page accounting done for DMABUF maps. The DMABUF attachment exists so long as the dmabuf is mapped into an IOAS, even if the IOAS is not mapped to any domains. Link: https://patch.msgid.link/r/2-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-21iommufd/iommufd_private.h: Avoid -Wflex-array-member-not-at-end warningGustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Move the conflicting declaration to the end of the corresponding structure. Notice that struct iommufd_vevent is a flexible structure, this is a structure that contains a flexible-array member. Fix the following warning: drivers/iommu/iommufd/iommufd_private.h:621:31: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Link: https://patch.msgid.link/r/aRHOAwpATIE0oajj@kspp Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-21iommufd/driver: Fix counter initialization for counted_by annotationGustavo A. R. Silva
One of the requirements for counted_by annotations is that the counter member must be initialized before the first reference to the flexible-array member. Move the vevent->data_len = data_len; initialization to before the first access to flexible array vevent->event_data. Link: https://patch.msgid.link/r/aRL7ZFFqM5bRTd2D@kspp Cc: stable@vger.kernel.org Fixes: e8e1ef9b77a7 ("iommufd/viommu: Add iommufd_viommu_report_event helper") Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-05iommufd: Make vfio_compat's unmap succeed if the range is already emptyJason Gunthorpe
iommufd returns ENOENT when attempting to unmap a range that is already empty, while vfio type1 returns success. Fix vfio_compat to match. Fixes: d624d6652a65 ("iommufd: vfio container FD ioctl compatibility") Link: https://patch.msgid.link/r/0-v1-76be45eff0be+5d-iommufd_unmap_compat_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Alex Mastro <amastro@fb.com> Reported-by: Alex Mastro <amastro@fb.com> Closes: https://lore.kernel.org/r/aP0S5ZF9l3sWkJ1G@devgpu012.nha5.facebook.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-05iommufd: Change the selftest to use iommupt instead of xarrayJason Gunthorpe
The iommufd self test uses an xarray to store the pfns and their orders to emulate a page table. Make it act more like a real iommu driver by replacing the xarray with an iommupt based page table. The new AMDv1 mock format behaves similarly to the xarray. Add set_dirty() as a iommu_pt operation to allow the test suite to simulate HW dirty. Userspace can select between several formats including the normal AMDv1 format and a special MOCK_IOMMUPT_HUGE variation for testing huge page dirty tracking. To make the dirty tracking test work the page table must only store exactly 2M huge pages otherwise the logic the test uses fails. They cannot be broken up or combined. Aside from aligning the selftest with a real page table implementation, this helps test the iommupt code itself. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-10-27iommu: Pass in old domain to attach_dev callback functionsNicolin Chen
The IOMMU core attaches each device to a default domain on probe(). Then, every new "attach" operation has a fundamental meaning of two-fold: - detach from its currently attached (old) domain - attach to a given new domain Modern IOMMU drivers following this pattern usually want to clean up the things related to the old domain, so they call iommu_get_domain_for_dev() to fetch the old domain. Pass in the old domain pointer from the core to drivers, aligning with the set_dev_pasid op that does so already. Ensure all low-level attach fcuntions in the core can forward the correct old domain pointer. Thus, rework those functions as well. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-10-20iommufd: Don't overflow during division for dirty trackingJason Gunthorpe
If pgshift is 63 then BITS_PER_TYPE(*bitmap->bitmap) * pgsize will overflow to 0 and this triggers divide by 0. In this case the index should just be 0, so reorganize things to divide by shift and avoid hitting any overflows. Link: https://patch.msgid.link/r/0-v1-663679b57226+172-iommufd_dirty_div0_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: 58ccf0190d19 ("vfio: Add an IOVA bitmap support") Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reported-by: syzbot+093a8a8b859472e6c257@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=093a8a8b859472e6c257 Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-30iommufd: Register iommufd mock devices with fwspecGuixin Liu
Since the bus ops were retired the iommu subsystem changed to using fwspec to match the iommu driver to the iommu device. If a device has a NULL fwspec then it is matched to the first iommu driver with a NULL fwspec, effectively disabling support for systems with more than one non-fwspec iommu driver. Thus, if the iommufd selfest are run in an x86 system that registers a non-fwspec iommu driver they fail to bind their mock devices to the mock iommu driver. Fix this by allocating a software fwnode for mock iommu driver's iommu_device, and set it to the device which mock iommu driver created. This is done by adding a new helper iommu_mock_device_add() which abuses the internals of the fwspec system to establish a fwspec before the device is added and is careful not to leak it. A matching dummy fwspec is automatically added to the mock iommu driver. Test by "make -C toosl/testing/selftests TARGETS=iommu run_tests": PASSED: 229 / 229 tests passed. In addition, this issue is also can be found on amd platform, and also tested on a amd machine. Link: https://patch.msgid.link/r/20250925054730.3877-1-kanie@linux.alibaba.com Fixes: 17de3f5fdd35 ("iommu: Retire bus ops") Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Qinyun Tan <qinyuntan@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-19iommufd: WARN if an object is aborted with an elevated refcountJason Gunthorpe
If something holds a refcount then it is at risk of UAFing. For abort paths we expect the caller to never share the object with a parallel thread and to clean up any refcounts it obtained on its own. Add the missing dec inside iommufd_hwpt_paging_alloc() during error unwind by making iommufd_hw_pagetable_attach/detach() proper pairs. Link: https://patch.msgid.link/r/2-v1-02cd136829df+31-iommufd_syz_fput_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-19iommufd: Fix race during abort for file descriptorsJason Gunthorpe
fput() doesn't actually call file_operations release() synchronously, it puts the file on a work queue and it will be released eventually. This is normally fine, except for iommufd the file and the iommufd_object are tied to gether. The file has the object as it's private_data and holds a users refcount, while the object is expected to remain alive as long as the file is. When the allocation of a new object aborts before installing the file it will fput() the file and then go on to immediately kfree() the obj. This causes a UAF once the workqueue completes the fput() and tries to decrement the users refcount. Fix this by putting the core code in charge of the file lifetime, and call __fput_sync() during abort to ensure that release() is called before kfree. __fput_sync() is a bit too tricky to open code in all the object implementations. Instead the objects tell the core code where the file pointer is and the core will take care of the life cycle. If the object is successfully allocated then the file will hold a users refcount and the iommufd_object cannot be destroyed. It is worth noting that close(); ioctl(IOMMU_DESTROY); doesn't have an issue because close() is already using a synchronous version of fput(). The UAF looks like this: BUG: KASAN: slab-use-after-free in iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376 Write of size 4 at addr ffff888059c97804 by task syz.0.46/6164 CPU: 0 UID: 0 PID: 6164 Comm: syz.0.46 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xcd/0x630 mm/kasan/report.c:482 kasan_report+0xe0/0x110 mm/kasan/report.c:595 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x100/0x1b0 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:400 [inline] __refcount_dec include/linux/refcount.h:455 [inline] refcount_dec include/linux/refcount.h:476 [inline] iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376 __fput+0x402/0xb70 fs/file_table.c:468 task_work_run+0x14d/0x240 kernel/task_work.c:227 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop+0xeb/0x110 kernel/entry/common.c:43 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline] syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline] syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline] do_syscall_64+0x41c/0x4c0 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f Link: https://patch.msgid.link/r/1-v1-02cd136829df+31-iommufd_syz_fput_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object") Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Nirmoy Das <nirmoyd@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Reported-by: syzbot+80620e2d0d0a33b09f93@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/68c8583d.050a0220.2ff435.03a2.GAE@google.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-09-19iommufd: Fix refcounting race during mmapJason Gunthorpe
The owner object of the imap can be destroyed while the imap remains in the mtree. So access to the imap pointer without holding locks is racy with destruction. The imap is safe to access outside the lock once a users refcount is obtained, the owner object cannot start destruction until users is 0. Thus the users refcount should not be obtained at the end of iommufd_fops_mmap() but instead inside the mtree lock held around the mtree_load(). Move the refcount there and use refcount_inc_not_zero() as we can have a 0 refcount inside the mtree during destruction races. Link: https://patch.msgid.link/r/0-v1-e6faace50971+3cc-iommufd_mmap_fix_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: 56e9a0d8e53f ("iommufd: Add mmap interface") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-08-18iommufd: viommu: free memory allocated by kvcalloc() using kvfree()Akhilesh Patil
Use kvfree() instead of kfree() to free pages allocated by kvcalloc() in iommufs_hw_queue_alloc_phys() to fix potential memory corruption. Ensure the memory is properly freed, as kvcalloc may internally use vmalloc or kmalloc depending on available memory in the system. Fixes: 2238ddc2b056 ("iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl") Link: https://patch.msgid.link/r/aJifyVV2PL6WGEs6@bhairav-test.ee.iitb.ac.in Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-07-31Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "This broadly brings the assigned HW command queue support to iommufd. This feature is used to improve SVA performance in VMs by avoiding paravirtualization traps during SVA invalidations. Along the way I think some of the core logic is in a much better state to support future driver backed features. Summary: - IOMMU HW now has features to directly assign HW command queues to a guest VM. In this mode the command queue operates on a limited set of invalidation commands that are suitable for improving guest invalidation performance and easy for the HW to virtualize. This brings the generic infrastructure to allow IOMMU drivers to expose such command queues through the iommufd uAPI, mmap the doorbell pages, and get the guest physical range for the command queue ring itself. - An implementation for the NVIDIA SMMUv3 extension "cmdqv" is built on the new iommufd command queue features. It works with the existing SMMU driver support for cmdqv in guest VMs. - Many precursor cleanups and improvements to support the above cleanly, changes to the general ioctl and object helpers, driver support for VDEVICE, and mmap pgoff cookie infrastructure. - Sequence VDEVICE destruction to always happen before VFIO device destruction. When using the above type features, and also in future confidential compute, the internal virtual device representation becomes linked to HW or CC TSM configuration and objects. If a VFIO device is removed from iommufd those HW objects should also be cleaned up to prevent a sort of UAF. This became important now that we have HW backing the VDEVICE. - Fix one syzkaller found error related to math overflows during iova allocation" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (57 commits) iommu/arm-smmu-v3: Replace vsmmu_size/type with get_viommu_size iommu/arm-smmu-v3: Do not bother impl_ops if IOMMU_VIOMMU_TYPE_ARM_SMMUV3 iommufd: Rename some shortterm-related identifiers iommufd/selftest: Add coverage for vdevice tombstone iommufd/selftest: Explicitly skip tests for inapplicable variant iommufd/vdevice: Remove struct device reference from struct vdevice iommufd: Destroy vdevice on idevice destroy iommufd: Add a pre_destroy() op for objects iommufd: Add iommufd_object_tombstone_user() helper iommufd/viommu: Roll back to use iommufd_object_alloc() for vdevice iommufd/selftest: Test reserved regions near ULONG_MAX iommufd: Prevent ALIGN() overflow iommu/tegra241-cmdqv: import IOMMUFD module namespace iommufd: Do not allow _iommufd_object_alloc_ucmd if abort op is set iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support iommu/tegra241-cmdqv: Add user-space use support iommu/tegra241-cmdqv: Do not statically map LVCMDQs iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf() iommu/tegra241-cmdqv: Use request_threaded_irq iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops ...