summaryrefslogtreecommitdiff
path: root/kernel/dma
AgeCommit message (Collapse)Author
2025-10-15dma-debug: don't report false positives with DMA_BOUNCE_UNALIGNED_KMALLOCMarek Szyprowski
Commit 370645f41e6e ("dma-mapping: force bouncing if the kmalloc() size is not cache-line-aligned") introduced DMA_BOUNCE_UNALIGNED_KMALLOC feature and permitted architecture specific code configure kmalloc slabs with sizes smaller than the value of dma_get_cache_alignment(). When that feature is enabled, the physical address of some small kmalloc()-ed buffers might be not aligned to the CPU cachelines, thus not really suitable for typical DMA. To properly handle that case a SWIOTLB buffer bouncing is used, so no CPU cache corruption occurs. When that happens, there is no point reporting a false-positive DMA-API warning that the buffer is not properly aligned, as this is not a client driver fault. [m.szyprowski@samsung.com: replace is_swiotlb_allocated() with is_swiotlb_active(), per Catalin] Link: https://lkml.kernel.org/r/20251010173009.3916215-1-m.szyprowski@samsung.com Link: https://lkml.kernel.org/r/20251009141508.2342138-1-m.szyprowski@samsung.com Fixes: 370645f41e6e ("dma-mapping: force bouncing if the kmalloc() size is not cache-line-aligned") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: Robin Murohy <robin.murphy@arm.com> Cc: "Isaac J. Manjarres" <isaacmanjarres@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-03Merge tag 'dma-mapping-6.18-2025-09-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - Refactoring of DMA mapping API to physical addresses as the primary interface instead of page+offset parameters This gets much closer to Matthew Wilcox's long term wish for struct-pageless IO to cacheable DRAM and is supporting memdesc project which seeks to substantially transform how struct page works. An advantage of this approach is the possibility of introducing DMA_ATTR_MMIO, which covers existing 'dma_map_resource' flow in the common paths, what in turn lets to use recently introduced dma_iova_link() API to map PCI P2P MMIO without creating struct page Developped by Leon Romanovsky and Jason Gunthorpe - Minor clean-up by Petr Tesarik and Qianfeng Rong * tag 'dma-mapping-6.18-2025-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: kmsan: fix missed kmsan_handle_dma() signature conversion mm/hmm: properly take MMIO path mm/hmm: migrate to physical address-based DMA mapping API dma-mapping: export new dma_*map_phys() interface xen: swiotlb: Open code map_resource callback dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs() kmsan: convert kmsan_handle_dma to use physical addresses dma-mapping: convert dma_direct_*map_page to be phys_addr_t based iommu/dma: implement DMA_ATTR_MMIO for iommu_dma_(un)map_phys() iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys dma-debug: refactor to use physical addresses for page mapping iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link(). dma-mapping: introduce new DMA attribute to indicate MMIO memory swiotlb: Remove redundant __GFP_NOWARN dma-direct: clean up the logic in __dma_direct_alloc_pages()
2025-10-02Merge tag 'mm-stable-2025-10-01-19-00' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling * tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...
2025-09-21dma-remap: drop nth_page() in dma_common_contiguous_remap()David Hildenbrand
dma_common_contiguous_remap() is used to remap an "allocated contiguous region". Within a single allocation, there is no need to use nth_page() anymore. Neither the buddy, nor hugetlb, nor CMA will hand out problematic page ranges. Link: https://lkml.kernel.org/r/20250901150359.867252-24-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-12dma-mapping: export new dma_*map_phys() interfaceLeon Romanovsky
Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys() that operate directly on physical addresses instead of page+offset parameters. This provides a more efficient interface for drivers that already have physical addresses available. The new functions are implemented as the primary mapping layer, with the existing dma_map_page_attrs()/dma_map_resource() and dma_unmap_page_attrs()/dma_unmap_resource() functions converted to simple wrappers around the phys-based implementations. In case dma_map_page_attrs(), the struct page is converted to physical address with help of page_to_phys() function and dma_map_resource() provides physical address as is together with addition of DMA_ATTR_MMIO attribute. The old page-based API is preserved in mapping.c to ensure that existing code won't be affected by changing EXPORT_SYMBOL to EXPORT_SYMBOL_GPL variant for dma_*map_phys(). Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/54cc52af91777906bbe4a386113437ba0bcfba9c.1757423202.git.leonro@nvidia.com
2025-09-12dma-mapping: implement DMA_ATTR_MMIO for dma_(un)map_page_attrs()Leon Romanovsky
Make dma_map_page_attrs() and dma_map_page_attrs() respect DMA_ATTR_MMIO. DMA_ATR_MMIO makes the functions behave the same as dma_(un)map_resource(): - No swiotlb is possible - Legacy dma_ops arches use ops->map_resource() - No kmsan - No arch_dma_map_phys_direct() The prior patches have made the internal functions called here support DMA_ATTR_MMIO. This is also preparation for turning dma_map_resource() into an inline calling dma_map_phys(DMA_ATTR_MMIO) to consolidate the flows. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/3660e2c78ea409d6c483a215858fb3af52cd0ed3.1757423202.git.leonro@nvidia.com
2025-09-12kmsan: convert kmsan_handle_dma to use physical addressesLeon Romanovsky
Convert the KMSAN DMA handling function from page-based to physical address-based interface. The refactoring renames kmsan_handle_dma() parameters from accepting (struct page *page, size_t offset, size_t size) to (phys_addr_t phys, size_t size). The existing semantics where callers are expected to provide only kmap memory is continued here. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/3557cbaf66e935bc794f37d2b891ef75cbf2c80c.1757423202.git.leonro@nvidia.com
2025-09-12dma-mapping: convert dma_direct_*map_page to be phys_addr_t basedLeon Romanovsky
Convert the DMA direct mapping functions to accept physical addresses directly instead of page+offset parameters. The functions were already operating on physical addresses internally, so this change eliminates the redundant page-to-physical conversion at the API boundary. The functions dma_direct_map_page() and dma_direct_unmap_page() are renamed to dma_direct_map_phys() and dma_direct_unmap_phys() respectively, with their calling convention changed from (struct page *page, unsigned long offset) to (phys_addr_t phys). Architecture-specific functions arch_dma_map_page_direct() and arch_dma_unmap_page_direct() are similarly renamed to arch_dma_map_phys_direct() and arch_dma_unmap_phys_direct(). The is_pci_p2pdma_page() checks are replaced with DMA_ATTR_MMIO checks to allow integration with dma_direct_map_resource and dma_direct_map_phys() is extended to support MMIO path either. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/bb15a22f76dc2e26683333ff54e789606cfbfcf0.1757423202.git.leonro@nvidia.com
2025-09-12iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_physLeon Romanovsky
Rename the IOMMU DMA mapping functions to better reflect their actual calling convention. The functions iommu_dma_map_page() and iommu_dma_unmap_page() are renamed to iommu_dma_map_phys() and iommu_dma_unmap_phys() respectively, as they already operate on physical addresses rather than page structures. The calling convention changes from accepting (struct page *page, unsigned long offset) to (phys_addr_t phys), which eliminates the need for page-to-physical address conversion within the functions. This renaming prepares for the broader DMA API conversion from page-based to physical address-based mapping throughout the kernel. All callers are updated to pass physical addresses directly, including dma_map_page_attrs(), scatterlist mapping functions, and DMA page allocation helpers. The change simplifies the code by removing the page_to_phys() + offset calculation that was previously done inside the IOMMU functions. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/ed172f95f8f57782beae04f782813366894e98df.1757423202.git.leonro@nvidia.com
2025-09-12dma-mapping: rename trace_dma_*map_page to trace_dma_*map_physLeon Romanovsky
As a preparation for following map_page -> map_phys API conversion, let's rename trace_dma_*map_page() to be trace_dma_*map_phys(). Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/c0c02d7d8bd4a148072d283353ba227516a76682.1757423202.git.leonro@nvidia.com
2025-09-12dma-debug: refactor to use physical addresses for page mappingLeon Romanovsky
Convert the DMA debug infrastructure from page-based to physical address-based mapping as a preparation to rely on physical address for DMA mapping routines. The refactoring renames debug_dma_map_page() to debug_dma_map_phys() and changes its signature to accept a phys_addr_t parameter instead of struct page and offset. Similarly, debug_dma_unmap_page() becomes debug_dma_unmap_phys(). A new dma_debug_phy type is introduced to distinguish physical address mappings from other debug entry types. All callers throughout the codebase are updated to pass physical addresses directly, eliminating the need for page-to-physical conversion in the debug layer. This refactoring eliminates the need to convert between page pointers and physical addresses in the debug layer, making the code more efficient and consistent with the DMA mapping API's physical address focus. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> [mszyprow: added a fixup] Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/56d1a6769b68dfcbf8b26a75a7329aeb8e3c3b6a.1757423202.git.leonro@nvidia.com Link: https://lore.kernel.org/all/20250910052618.GH341237@unreal/
2025-09-12Merge tag 'dma-mapping-6.17-2025-09-09' into HEADMarek Szyprowski
dma-mapping fix for Linux 6.17 - one more fix for DMA API debugging infrastructure (Baochen Qiang)
2025-09-02dma-debug: don't enforce dma mapping check on noncoherent allocationsBaochen Qiang
As discussed in [1], there is no need to enforce dma mapping check on noncoherent allocations, a simple test on the returned CPU address is good enough. Add a new pair of debug helpers and use them for noncoherent alloc/free to fix this issue. Fixes: efa70f2fdc84 ("dma-mapping: add a new dma_alloc_pages API") Link: https://lore.kernel.org/all/ff6c1fe6-820f-4e58-8395-df06aa91706c@oss.qualcomm.com # 1 Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250828-dma-debug-fix-noncoherent-dma-check-v1-1-76e9be0dd7fc@oss.qualcomm.com
2025-08-13dma/pool: Ensure DMA_DIRECT_REMAP allocations are decryptedShanker Donthineni
When CONFIG_DMA_DIRECT_REMAP is enabled, atomic pool pages are remapped via dma_common_contiguous_remap() using the supplied pgprot. Currently, the mapping uses pgprot_dmacoherent(PAGE_KERNEL), which leaves the memory encrypted on systems with memory encryption enabled (e.g., ARM CCA Realms). This can cause the DMA layer to fail or crash when accessing the memory, as the underlying physical pages are not configured as expected. Fix this by requesting a decrypted mapping in the vmap() call: pgprot_decrypted(pgprot_dmacoherent(PAGE_KERNEL)) This ensures that atomic pool memory is consistently mapped unencrypted. Cc: stable@vger.kernel.org Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250811181759.998805-1-sdonthineni@nvidia.com
2025-08-11of: reserved_mem: Restructure call site for dma_contiguous_early_fixup()Oreoluwa Babatunde
Restructure the call site for dma_contiguous_early_fixup() to where the reserved_mem nodes are being parsed from the DT so that dma_mmu_remap[] is populated before dma_contiguous_remap() is called. Fixes: 8a6e02d0c00e ("of: reserved_mem: Restructure how the reserved memory regions are processed") Signed-off-by: Oreoluwa Babatunde <oreoluwa.babatunde@oss.qualcomm.com> Tested-by: William Zhang <william.zhang@broadcom.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250806172421.2748302-1-oreoluwa.babatunde@oss.qualcomm.com
2025-08-11swiotlb: Remove redundant __GFP_NOWARNQianfeng Rong
Commit 16f5dfbc851b ("gfp: include __GFP_NOWARN in GFP_NOWAIT") made GFP_NOWAIT implicitly include __GFP_NOWARN. Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT (e.g., `GFP_NOWAIT | __GFP_NOWARN`) is now redundant. Let's clean up these redundant flags across subsystems. No functional changes. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250805023222.332920-1-rongqianfeng@vivo.com
2025-08-11dma-direct: clean up the logic in __dma_direct_alloc_pages()Petr Tesarik
Convert a goto-based loop to a while() loop. To allow the simplification, return early when allocation from CMA is successful. As a bonus, this early return avoids a repeated dma_coherent_ok() check. No functional change. Signed-off-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250710083829.1853466-1-ptesarik@suse.com
2025-06-12dma-contiguous: hornor the cma address limit setup by userFeng Tang
When porting a cma related usage from x86_64 server to arm64 server, the "cma=4G@4G" setup failed on arm64. The reason is arm64 and some other architectures have specific physical address limit for reserved cma area, like 4GB due to the device's need for 32 bit dma. Actually lots of platforms of those architectures don't have this device dma limit, but still have to obey it, and are not able to reserve a huge cma pool. This situation could be improved by honoring the user input cma physical address than the arch limit. As when users specify it, they already knows what the default is which probably can't suit them. Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250612021417.44929-1-feng.tang@linux.alibaba.com
2025-05-27Merge tag 'dma-mapping-6.16-2025-05-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: "New two step DMA mapping API, which is is a first step to a long path to provide alternatives to scatterlist and to remove hacks, abuses and design mistakes related to scatterlists. This new approach optimizes some calls to DMA-IOMMU layer and cache maintenance by batching them, reduces memory usage as it is no need to store mapped DMA addresses to unmap them, and reduces some function call overhead. It is a combination effort of many people, lead and developed by Christoph Hellwig and Leon Romanovsky" * tag 'dma-mapping-6.16-2025-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: docs: core-api: document the IOVA-based API dma-mapping: add a dma_need_unmap helper dma-mapping: Implement link/unlink ranges API iommu/dma: Factor out a iommu_dma_map_swiotlb helper dma-mapping: Provide an interface to allow allocate IOVA iommu: add kernel-doc for iommu_unmap_fast iommu: generalize the batched sync after map interface dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h PCI/P2PDMA: Refactor the p2pdma mapping helpers
2025-05-06dma-mapping: add a dma_need_unmap helperChristoph Hellwig
Add helper that allows a driver to skip calling dma_unmap_* if the DMA layer can guarantee that they are no-nops. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.hChristoph Hellwig
To support the upcoming non-scatterlist mapping helpers, we need to go back to have them called outside of the DMA API. Thus move them out of dma-map-ops.h, which is only for DMA API implementations to pci-p2pdma.h, which is for driver use. Note that the core helper is still not exported as the mapping is expected to be done only by very highlevel subsystem code at least for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06PCI/P2PDMA: Refactor the p2pdma mapping helpersChristoph Hellwig
The current scheme with a single helper to determine the P2P status and map a scatterlist segment force users to always use the map_sg helper to DMA map, which we're trying to get away from because they are very cache inefficient. Refactor the code so that there is a single helper that checks the P2P state for a page, including the result that it is not a P2P page to simplify the callers, and a second one to perform the address translation for a bus mapped P2P transfer that does not depend on the scatterlist structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-04-22dma-coherent: Warn if OF reserved memory is beyond current coherent DMA maskChen-Yu Tsai
When a reserved memory region described in the device tree is attached to a device, it is expected that the device's limitations are correctly included in that description. However, if the device driver failed to implement DMA address masking or addressing beyond the default 32 bits (on arm64), then bad things could happen because the DMA address was truncated, such as playing back audio with no actual audio coming out, or DMA overwriting random blocks of kernel memory. Check against the coherent DMA mask when the memory regions are attached to the device. Give a warning when the memory region can not be covered by the mask. A warning instead of a hard error was chosen, because it is possible that existing drivers could be working fine even if they forgot to extend the coherent DMA mask. Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250421083930.374173-1-wenst@chromium.org
2025-04-22dma-mapping: Fix warning reported for missing prototypeBalbir Singh
lkp reported a warning about missing prototype for a recent patch. The kernel-doc style comments are out of sync, move them to the right function. Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Christoph Hellwig <hch@lst.de> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504190615.g9fANxHw-lkp@intel.com/ Signed-off-by: Balbir Singh <balbirs@nvidia.com> [mszyprow: reformatted subject] Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250422114034.3535515-1-balbirs@nvidia.com
2025-04-14dma/mapping.c: dev_dbg support for dma_addressing_limitedBalbir Singh
In the debug and resolution of an issue involving forced use of bounce buffers, 7170130e4c72 ("x86/mm/init: Handle the special case of device private pages in add_pages(), to not increase max_pfn and trigger dma_addressing_limited() bounce buffers"). It would have been easier to debug the issue if dma_addressing_limited() had debug information about the device not being able to address all of memory and thus forcing all accesses through a bounce buffer. Please see[2] Implement dev_dbg to debug the potential use of bounce buffers when we hit the condition. When swiotlb is used, dma_addressing_limited() is used to determine the size of maximum dma buffer size in dma_direct_max_mapping_size(). The debug prints could be triggered in that check as well (when enabled). Link: https://lore.kernel.org/lkml/20250401000752.249348-1-balbirs@nvidia.com/ [1] Link: https://lore.kernel.org/lkml/20250310112206.4168-1-spasswolf@web.de/ [2] Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Balbir Singh <balbirs@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250414113752.3298276-1-balbirs@nvidia.com
2025-04-09dma/contiguous: avoid warning about unused size_bytesArnd Bergmann
When building with W=1, this variable is unused for configs with CONFIG_CMA_SIZE_SEL_PERCENTAGE=y: kernel/dma/contiguous.c:67:26: error: 'size_bytes' defined but not used [-Werror=unused-const-variable=] Change this to a macro to avoid the warning. Fixes: c64be2bb1c6e ("drivers: add Contiguous Memory Allocator") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250409151557.3890443-1-arnd@kernel.org
2025-03-12dma-mapping: fix missing clear bdr in check_ram_in_range_map()Baochen Qiang
As discussed in [1], if 'bdr' is set once, it would never get cleared, hence 0 is always returned. Refactor the range check hunk into a new helper dma_find_range(), which allows 'bdr' to be cleared in each iteration. Link: https://lore.kernel.org/all/64931fac-085b-4ff3-9314-84bac2fa9bdb@quicinc.com/ # [1] Fixes: a409d9600959 ("dma-mapping: fix dma_addressing_limited() if dma_range_map can't cover all system RAM") Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Link: https://lore.kernel.org/r/20250307030350.69144-1-quic_bqiang@quicinc.com Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2024-11-28dma-debug: fix physical address calculation for struct dma_debug_entryFedor Pchelkin
Offset into the page should also be considered while calculating a physical address for struct dma_debug_entry. page_to_phys() just shifts the value PAGE_SHIFT bits to the left so offset part is zero-filled. An example (wrong) debug assertion failure with CONFIG_DMA_API_DEBUG enabled which is observed during systemd boot process after recent dma-debug changes: DMA-API: e1000 0000:00:03.0: cacheline tracking EEXIST, overlapping mappings aren't supported WARNING: CPU: 4 PID: 941 at kernel/dma/debug.c:596 add_dma_entry CPU: 4 UID: 0 PID: 941 Comm: ip Not tainted 6.12.0+ #288 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:add_dma_entry kernel/dma/debug.c:596 Call Trace: <TASK> debug_dma_map_page kernel/dma/debug.c:1236 dma_map_page_attrs kernel/dma/mapping.c:179 e1000_alloc_rx_buffers drivers/net/ethernet/intel/e1000/e1000_main.c:4616 ... Found by Linux Verification Center (linuxtesting.org). Fixes: 9d4f645a1fd4 ("dma-debug: store a phys_addr_t in struct dma_debug_entry") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> [hch: added a little helper to clean up the code] Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-11-14dma-mapping: save base/size instead of pointer to shared DMA poolGeert Uytterhoeven
On RZ/Five, which is non-coherent, and uses CONFIG_DMA_GLOBAL_POOL=y: Oops - store (or AMO) access fault [#1] CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-rc1-00015-g8a6e02d0c00e #201 Hardware name: Renesas SMARC EVK based on r9a07g043f01 (DT) epc : __memset+0x60/0x100 ra : __dma_alloc_from_coherent+0x150/0x17a epc : ffffffff8062d2bc ra : ffffffff80053a94 sp : ffffffc60000ba20 gp : ffffffff812e9938 tp : ffffffd601920000 t0 : ffffffc6000d0000 t1 : 0000000000000000 t2 : ffffffffe9600000 s0 : ffffffc60000baa0 s1 : ffffffc6000d0000 a0 : ffffffc6000d0000 a1 : 0000000000000000 a2 : 0000000000001000 a3 : ffffffc6000d1000 a4 : 0000000000000000 a5 : 0000000000000000 a6 : ffffffd601adacc0 a7 : ffffffd601a841a8 s2 : ffffffd6018573c0 s3 : 0000000000001000 s4 : ffffffd6019541e0 s5 : 0000000200000022 s6 : ffffffd6018f8410 s7 : ffffffd6018573e8 s8 : 0000000000000001 s9 : 0000000000000001 s10: 0000000000000010 s11: 0000000000000000 t3 : 0000000000000000 t4 : ffffffffdefe62d1 t5 : 000000001cd6a3a9 t6 : ffffffd601b2aad6 status: 0000000200000120 badaddr: ffffffc6000d0000 cause: 0000000000000007 [<ffffffff8062d2bc>] __memset+0x60/0x100 [<ffffffff80053e1a>] dma_alloc_from_global_coherent+0x1c/0x28 [<ffffffff80053056>] dma_direct_alloc+0x98/0x112 [<ffffffff8005238c>] dma_alloc_attrs+0x78/0x86 [<ffffffff8035fdb4>] rz_dmac_probe+0x3f6/0x50a [<ffffffff803a0694>] platform_probe+0x4c/0x8a If CONFIG_DMA_GLOBAL_POOL=y, the reserved_mem structure passed to rmem_dma_setup() is saved for later use, by saving the passed pointer. However, when dma_init_reserved_memory() is called later, the pointer has become stale, causing a crash. E.g. in the RZ/Five case, the referenced memory now contains the reserved_mem structure for the "mmode_resv0@30000" node (with base 0x30000 and size 0x10000), instead of the correct "pma_resv0@58000000" node (with base 0x58000000 and size 0x8000000). Fix this by saving the needed reserved_mem structure's contents instead. Fixes: 8a6e02d0c00e7b62 ("of: reserved_mem: Restructure how the reserved memory regions are processed") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Oreoluwa Babatunde <quic_obabatun@quicinc.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-11-08dma-mapping: fix swapped dir/flags arguments to trace_dma_alloc_sgt_errSean Anderson
trace_dma_alloc_sgt_err was called with the dir and flags arguments swapped. Fix this. Fixes: 68b6dbf1f441 ("dma-mapping: trace more error paths") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410302243.1wnTlPk3-lkp@intel.com/ Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29dma-mapping: trace more error pathsSean Anderson
It can be surprising to the user if DMA functions are only traced on success. On failure, it can be unclear what the source of the problem is. Fix this by tracing all functions even when they fail. Cases where we BUG/WARN are skipped, since those should be sufficiently noisy already. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29dma-mapping: use trace_dma_alloc for dma_alloc* instead of using trace_dma_mapSean Anderson
In some cases, we use trace_dma_map to trace dma_alloc* functions. This generally follows dma_debug. However, this does not record all of the relevant information for allocations, such as GFP flags. Create new dma_alloc tracepoints for these functions. Note that while dma_alloc_noncontiguous may allocate discontiguous pages (from the CPU's point of view), the device will only see one contiguous mapping. Therefore, we just need to trace dma_addr and size. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29dma-mapping: trace dma_alloc/free directionSean Anderson
In preparation for using these tracepoints in a few more places, trace the DMA direction as well. For coherent allocations this is always bidirectional. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29dma-debug: remove DMA_API_DEBUG_SGChristoph Hellwig
The scatterlist validity checks are pretty simple and cheap, perform them unconditionally. Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29dma-debug: store a phys_addr_t in struct dma_debug_entryChristoph Hellwig
dma-debug goes to great length to split incoming physical addresses into a PFN and offset to store them in struct dma_debug_entry, just to recombine those for all meaningful uses. Just store a phys_addr_t instead. Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-10-29dma-debug: fix a possible deadlock on radix_lockLevi Yun
radix_lock() shouldn't be held while holding dma_hash_entry[idx].lock otherwise, there's a possible deadlock scenario when dma debug API is called holding rq_lock(): CPU0 CPU1 CPU2 dma_free_attrs() check_unmap() add_dma_entry() __schedule() //out (A) rq_lock() get_hash_bucket() (A) dma_entry_hash check_sync() (A) radix_lock() (W) dma_entry_hash dma_entry_free() (W) radix_lock() // CPU2's one (W) rq_lock() CPU1 situation can happen when it extending radix tree and it tries to wake up kswapd via wake_all_kswapd(). CPU2 situation can happen while perf_event_task_sched_out() (i.e. dma sync operation is called while deleting perf_event using etm and etr tmc which are Arm Coresight hwtracing driver backends). To remove this possible situation, call dma_entry_free() after put_hash_bucket() in check_unmap(). Reported-by: Denis Nikitin <denik@chromium.org> Closes: https://lists.linaro.org/archives/list/coresight@lists.linaro.org/thread/2WMS7BBSF5OZYB63VT44U5YWLFP5HL6U/#RWM6MLQX5ANBTEQ2PRM7OXCBGCE6NPWU Signed-off-by: Levi Yun <yeoreum.yun@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-09-23dma-mapping: report unlimited DMA addressing in IOMMU DMA pathLeon Romanovsky
While using the IOMMU DMA path, the dma_addressing_limited() function checks ops struct which doesn't exist in the IOMMU case. This causes to the kernel panic while loading ADMGPU driver. BUG: kernel NULL pointer dereference, address: 00000000000000a0 PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 10 UID: 0 PID: 611 Comm: (udev-worker) Tainted: G T 6.11.0-clang-07154-g726e2d0cf2bb #257 Tainted: [T]=RANDSTRUCT Hardware name: ASUS System Product Name/ROG STRIX Z690-G GAMING WIFI, BIOS 3701 07/03/2024 RIP: 0010:dma_addressing_limited+0x53/0xa0 Code: 8b 93 48 02 00 00 48 39 d1 49 89 d6 4c 0f 42 f1 48 85 d2 4c 0f 44 f1 f6 83 fc 02 00 00 40 75 0a 48 89 df e8 1f 09 00 00 eb 24 <4c> 8b 1c 25 a0 00 00 00 4d 85 db 74 17 48 89 df 41 ba 8b 84 2d 55 RSP: 0018:ffffa8d2c12cf740 EFLAGS: 00010202 RAX: 00000000ffffffff RBX: ffff8948820220c8 RCX: 000000ffffffffff RDX: 0000000000000000 RSI: ffffffffc124dc6d RDI: ffff8948820220c8 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff894883c3f040 R13: ffff89488dac8828 R14: 000000ffffffffff R15: ffff8948820220c8 FS: 00007fe6ba881900(0000) GS:ffff894fdf700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a0 CR3: 0000000111984000 CR4: 0000000000f50ef0 PKRU: 55555554 Call Trace: <TASK> ? __die_body+0x65/0xc0 ? page_fault_oops+0x3b9/0x450 ? _prb_read_valid+0x212/0x390 ? do_user_addr_fault+0x608/0x680 ? exc_page_fault+0x4e/0xa0 ? asm_exc_page_fault+0x26/0x30 ? dma_addressing_limited+0x53/0xa0 amdgpu_ttm_init+0x56/0x4b0 [amdgpu] gmc_v8_0_sw_init+0x561/0x670 [amdgpu] amdgpu_device_ip_init+0xf5/0x570 [amdgpu] amdgpu_device_init+0x1a57/0x1ea0 [amdgpu] ? _raw_spin_unlock_irqrestore+0x1a/0x40 ? pci_conf1_read+0xc0/0xe0 ? pci_bus_read_config_word+0x52/0xa0 amdgpu_driver_load_kms+0x15/0xa0 [amdgpu] amdgpu_pci_probe+0x1b7/0x4c0 [amdgpu] pci_device_probe+0x1c5/0x260 really_probe+0x130/0x470 __driver_probe_device+0x77/0x150 driver_probe_device+0x19/0x120 __driver_attach+0xb1/0x1e0 ? __cfi___driver_attach+0x10/0x10 bus_for_each_dev+0x115/0x170 bus_add_driver+0x192/0x2d0 driver_register+0x5c/0xf0 ? __cfi_init_module+0x10/0x10 [amdgpu] do_one_initcall+0x128/0x380 ? idr_alloc_cyclic+0x139/0x1d0 ? security_kernfs_init_security+0x42/0x140 ? __kernfs_new_node+0x1be/0x250 ? sysvec_apic_timer_interrupt+0xb6/0xc0 ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 ? _raw_spin_unlock+0x11/0x30 ? free_unref_page+0x283/0x650 ? kfree+0x274/0x3a0 ? kfree+0x274/0x3a0 ? kfree+0x274/0x3a0 ? load_module+0xf2e/0x1130 ? __kmalloc_cache_noprof+0x12a/0x2e0 do_init_module+0x7d/0x240 __se_sys_init_module+0x19e/0x220 do_syscall_64+0x8a/0x150 ? __irq_exit_rcu+0x5e/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fe6bb5980ee Code: 48 8b 0d 3d ed 12 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0a ed 12 00 f7 d8 64 89 01 48 RSP: 002b:00007ffd462219d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000af RAX: ffffffffffffffda RBX: 0000556caf0d0670 RCX: 00007fe6bb5980ee RDX: 0000556caf0d3080 RSI: 0000000002893458 RDI: 00007fe6b3400010 RBP: 0000000000020000 R08: 0000000000020010 R09: 0000000000000080 R10: c26073c166186e00 R11: 0000000000000206 R12: 0000556caf0d3430 R13: 0000556caf0d0670 R14: 0000556caf0d3080 R15: 0000556caf0ce700 </TASK> Modules linked in: amdgpu(+) i915(+) drm_suballoc_helper intel_gtt drm_exec drm_buddy iTCO_wdt i2c_algo_bit intel_pmc_bxt drm_display_helper iTCO_vendor_support gpu_sched drm_ttm_helper cec ttm amdxcp video backlight pinctrl_alderlake nct6775 hwmon_vid nct6775_core coretemp CR2: 00000000000000a0 ---[ end trace 0000000000000000 ]--- RIP: 0010:dma_addressing_limited+0x53/0xa0 Code: 8b 93 48 02 00 00 48 39 d1 49 89 d6 4c 0f 42 f1 48 85 d2 4c 0f 44 f1 f6 83 fc 02 00 00 40 75 0a 48 89 df e8 1f 09 00 00 eb 24 <4c> 8b 1c 25 a0 00 00 00 4d 85 db 74 17 48 89 df 41 ba 8b 84 2d 55 RSP: 0018:ffffa8d2c12cf740 EFLAGS: 00010202 RAX: 00000000ffffffff RBX: ffff8948820220c8 RCX: 000000ffffffffff RDX: 0000000000000000 RSI: ffffffffc124dc6d RDI: ffff8948820220c8 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff894883c3f040 R13: ffff89488dac8828 R14: 000000ffffffffff R15: ffff8948820220c8 FS: 00007fe6ba881900(0000) GS:ffff894fdf700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a0 CR3: 0000000111984000 CR4: 0000000000f50ef0 PKRU: 55555554 Fixes: b5c58b2fdc42 ("dma-mapping: direct calls for dma-iommu") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219292 Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
2024-09-22dma-mapping: fix vmap and mmap of noncontiougs allocationsChristoph Hellwig
Commit b5c58b2fdc42 ("dma-mapping: direct calls for dma-iommu") switched to use direct calls to dma-iommu, but missed the dma_vmap_noncontiguous, dma_vunmap_noncontiguous and dma_mmap_noncontiguous behavior keyed off the presence of the alloc_noncontiguous method. Fix this by removing the now unused alloc_noncontiguous and free_noncontiguous methods and moving the vmapping and mmaping of the noncontiguous allocations into the iommu code, as it is the only provider of actually noncontiguous allocations. Fixes: b5c58b2fdc42 ("dma-mapping: direct calls for dma-iommu") Reported-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Leon Romanovsky <leon@kernel.org> Tested-by: Xi Ruoyao <xry111@xry111.site>
2024-09-12dma-mapping: reflow dma_supportedChristoph Hellwig
dma_supported has become too much spaghetti for my taste. Reflow it to remove the duplicate use_dma_iommu condition and make the main path more obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Leon Romanovsky <leon@kernel.org>
2024-09-12dma-mapping: reliably inform about DMA support for IOMMULeon Romanovsky
If the DMA IOMMU path is going to be used, the appropriate check should return that DMA is supported. Fixes: b5c58b2fdc42 ("dma-mapping: direct calls for dma-iommu") Closes: https://lore.kernel.org/all/181e06ff-35a3-434f-b505-672f430bd1cb@notapiano Reported-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> #KernelCI Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-09-10dma-mapping: add tracing for dma-mapping API callsSean Anderson
When debugging drivers, it can often be useful to trace when memory gets (un)mapped for DMA (and can be accessed by the device). Add some tracepoints for this purpose. Use u64 instead of phys_addr_t and dma_addr_t (and similarly %llx instead of %pa) because libtraceevent can't handle typedefs in all cases. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-09-05dma-mapping: use IOMMU DMA calls for common alloc/free page callsLeon Romanovsky
Common alloca and free pages routines are called when IOMMU DMA is used, and internally it calls to DMA ops structure which is not available for default IOMMU. This patch adds necessary if checks to call IOMMU DMA. It fixes the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000040 Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000000d20bb000 [0000000000000040] pgd=08000000d20c1003 , p4d=08000000d20c1003 , pud=08000000d20c2003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Modules linked in: ipv6 hci_uart venus_core btqca v4l2_mem2mem btrtl qcom_spmi_adc5 sbs_battery btbcm qcom_vadc_common cros_ec_typec videobuf2_v4l2 leds_cros_ec cros_kbd_led_backlight cros_ec_chardev videodev elan_i2c videobuf2_common qcom_stats mc bluetooth coresight_stm stm_core ecdh_generic ecc pwrseq_core panel_edp icc_bwmon ath10k_snoc ath10k_core ath mac80211 phy_qcom_qmp_combo aux_bridge libarc4 coresight_replicator coresight_etm4x coresight_tmc coresight_funnel cfg80211 rfkill coresight qcom_wdt cbmem ramoops reed_solomon pwm_bl coreboot_table backlight crct10dif_ce CPU: 7 UID: 0 PID: 70 Comm: kworker/u32:4 Not tainted 6.11.0-rc6-next-20240903-00003-gdfc6015d0711 #660 Hardware name: Google Lazor Limozeen without Touchscreen (rev5 - rev8) (DT) Workqueue: events_unbound deferred_probe_work_func hub 2-1:1.0: 4 ports detected pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : dma_common_alloc_pages+0x54/0x1b4 lr : dma_common_alloc_pages+0x4c/0x1b4 sp : ffff8000807d3730 x29: ffff8000807d3730 x28: ffff02a7d312f880 x27: 0000000000000001 x26: 000000000000c000 x25: 0000000000000000 x24: 0000000000000001 x23: ffff02a7d23b6898 x22: 0000000000006cc0 x21: 000000000000c000 x20: ffff02a7858bf410 x19: fffffe0a60006000 x18: 0000000000000001 x17: 00000000000000d5 x16: 1fffe054f0bcc261 x15: 0000000000000001 x14: ffff02a7844dc680 x13: 0000000000100180 x12: dead000000000100 x11: dead000000000122 x10: 00000000001001ff x9 : ffff02a87f7b7b00 x8 : ffff02a87f7b7b00 x7 : ffff405977d6b000 x6 : ffff8000807d3310 x5 : ffff02a87f6b6398 x4 : 0000000000000001 x3 : ffff405977d6b000 x2 : ffff02a7844dc600 x1 : 0000000100000000 x0 : fffffe0a60006000 Call trace: dma_common_alloc_pages+0x54/0x1b4 __dma_alloc_pages+0x68/0x90 dma_alloc_pages+0x10/0x1c snd_dma_noncoherent_alloc+0x28/0x8c __snd_dma_alloc_pages+0x30/0x50 snd_dma_alloc_dir_pages+0x40/0x80 do_alloc_pages+0xb8/0x13c preallocate_pcm_pages+0x6c/0xf8 preallocate_pages+0x160/0x1a4 snd_pcm_set_managed_buffer_all+0x64/0xb0 lpass_platform_pcm_new+0xc0/0xe8 snd_soc_pcm_component_new+0x3c/0xc8 soc_new_pcm+0x4fc/0x668 snd_soc_bind_card+0xabc/0xbac snd_soc_register_card+0xf0/0x108 devm_snd_soc_register_card+0x4c/0xa4 sc7180_snd_platform_probe+0x180/0x224 platform_probe+0x68/0xc0 really_probe+0xbc/0x298 __driver_probe_device+0x78/0x12c driver_probe_device+0x3c/0x15c __device_attach_driver+0xb8/0x134 bus_for_each_drv+0x84/0xe0 __device_attach+0x9c/0x188 device_initial_probe+0x14/0x20 bus_probe_device+0xac/0xb0 deferred_probe_work_func+0x88/0xc0 process_one_work+0x14c/0x28c worker_thread+0x2cc/0x3d4 kthread+0x114/0x118 ret_from_fork+0x10/0x20 Code: f9411c19 940000c9 aa0003f3 b4000460 (f9402326) ---[ end trace 0000000000000000 ]--- Fixes: b5c58b2fdc42 ("dma-mapping: direct calls for dma-iommu") Closes: https://lore.kernel.org/all/10431dfd-ce04-4e0f-973b-c78477303c18@notapiano Reported-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> #KernelCI Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-09-04dma-direct: optimize page freeing when it is not addressableChen Yu
When the CMA allocation succeeds but isn't addressable, its buffer has already been released and the page is set to NULL. So later when the normal page allocation succeeds but isn't addressable, __free_pages() can be used to free that normal page rather than using dma_free_contiguous that does extra checks that are not needed. Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-09-04dma-mapping: clearly mark DMA ops as an architecture featureChristoph Hellwig
DMA ops are a helper for architectures and not for drivers to override the DMA implementation. Unfortunately driver authors keep ignoring this. Make the fact more clear by renaming the symbol to ARCH_HAS_DMA_OPS and having the two drivers overriding their dma_ops depend on that. These drivers should probably be marked broken, but we can give them a bit of a grace period for that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> # for IPU6 Acked-by: Robin Murphy <robin.murphy@arm.com>
2024-08-22dma-mapping: direct calls for dma-iommuLeon Romanovsky
Directly call into dma-iommu just like we have been doing for dma-direct for a while. This avoids the indirect call overhead for IOMMU ops and removes the need to have DMA ops entirely for many common configurations. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-08-22dma-mapping: call ->unmap_page and ->unmap_sg unconditionallyLeon Romanovsky
Almost all instances of the dma_map_ops ->map_page()/map_sg() methods implement ->unmap_page()/unmap_sg() too. The once instance which doesn't dma_dummy_ops which is used to fail the DMA mapping and thus there won't be any calls to ->unmap_page()/unmap_sg(). Remove the checks for ->unmap_page()/unmap_sg() and call them directly to create an interface that is symmetrical to ->map_page()/map_sg(). Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-08-22dma-mapping: replace zone_dma_bits by zone_dma_limitCatalin Marinas
The hardware DMA limit might not be power of 2. When RAM range starts above 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit can not encode this limit. Use a plain address for the DMA zone limit instead. Since the DMA zone can now potentially span beyond 4GB physical limit of DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Co-developed-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-08-22dma-mapping: use bit masking to check VM_DMA_COHERENTYosry Ahmed
In dma_common_find_pages(), area->flags are compared directly with VM_DMA_COHERENT. This works because VM_DMA_COHERENT is the only set flag. During development of a new feature (ASI [1]), a new VM flag is introduced, and that flag can be injected into VM_DMA_COHERENT mappings (among others). The presence of that flag caused dma_common_find_pages() to return NULL for VM_DMA_COHERENT addresses, leading to a lot of problems ending in crashing during boot. It took a bit of time to figure this problem out. It was a mistake to inject a VM flag to begin with, but it took a significant amount of debugging to figure out the problem. Most users of area->flags use bitmasking rather than equivalency to check for flags. Update dma_common_find_pages() and dma_common_free_remap() to do the same, which would have avoided the boot crashing. Instead, add a warning in dma_common_find_pages() if any extra VM flags are set to catch such problems more easily during development. No functional change intended. [1]https://lore.kernel.org/lkml/20240712-asi-rfc-24-v1-0-144b319a40d8@google.com/ Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-08-06dma-debug: avoid deadlock between dma debug vs printk and netconsoleRik van Riel
Currently the dma debugging code can end up indirectly calling printk under the radix_lock. This happens when a radix tree node allocation fails. This is a problem because the printk code, when used together with netconsole, can end up inside the dma debugging code while trying to transmit a message over netcons. This creates the possibility of either a circular deadlock on the same CPU, with that CPU trying to grab the radix_lock twice, or an ABBA deadlock between different CPUs, where one CPU grabs the console lock first and then waits for the radix_lock, while the other CPU is holding the radix_lock and is waiting for the console lock. The trace captured by lockdep is of the ABBA variant. -> #2 (&dma_entry_hash[i].lock){-.-.}-{2:2}: _raw_spin_lock_irqsave+0x5a/0x90 debug_dma_map_page+0x79/0x180 dma_map_page_attrs+0x1d2/0x2f0 bnxt_start_xmit+0x8c6/0x1540 netpoll_start_xmit+0x13f/0x180 netpoll_send_skb+0x20d/0x320 netpoll_send_udp+0x453/0x4a0 write_ext_msg+0x1b9/0x460 console_flush_all+0x2ff/0x5a0 console_unlock+0x55/0x180 vprintk_emit+0x2e3/0x3c0 devkmsg_emit+0x5a/0x80 devkmsg_write+0xfd/0x180 do_iter_readv_writev+0x164/0x1b0 vfs_writev+0xf9/0x2b0 do_writev+0x6d/0x110 do_syscall_64+0x80/0x150 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #0 (console_owner){-.-.}-{0:0}: __lock_acquire+0x15d1/0x31a0 lock_acquire+0xe8/0x290 console_flush_all+0x2ea/0x5a0 console_unlock+0x55/0x180 vprintk_emit+0x2e3/0x3c0 _printk+0x59/0x80 warn_alloc+0x122/0x1b0 __alloc_pages_slowpath+0x1101/0x1120 __alloc_pages+0x1eb/0x2c0 alloc_slab_page+0x5f/0x150 new_slab+0x2dc/0x4e0 ___slab_alloc+0xdcb/0x1390 kmem_cache_alloc+0x23d/0x360 radix_tree_node_alloc+0x3c/0xf0 radix_tree_insert+0xf5/0x230 add_dma_entry+0xe9/0x360 dma_map_page_attrs+0x1d2/0x2f0 __bnxt_alloc_rx_frag+0x147/0x180 bnxt_alloc_rx_data+0x79/0x160 bnxt_rx_skb+0x29/0xc0 bnxt_rx_pkt+0xe22/0x1570 __bnxt_poll_work+0x101/0x390 bnxt_poll+0x7e/0x320 __napi_poll+0x29/0x160 net_rx_action+0x1e0/0x3e0 handle_softirqs+0x190/0x510 run_ksoftirqd+0x4e/0x90 smpboot_thread_fn+0x1a8/0x270 kthread+0x102/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x11/0x20 This bug is more likely than it seems, because when one CPU has run out of memory, chances are the other has too. The good news is, this bug is hidden behind the CONFIG_DMA_API_DEBUG, so not many users are likely to trigger it. Signed-off-by: Rik van Riel <riel@surriel.com> Reported-by: Konstantin Ovsepian <ovs@meta.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-07-19dma: fix call order in dmam_free_coherentLance Richardson
dmam_free_coherent() frees a DMA allocation, which makes the freed vaddr available for reuse, then calls devres_destroy() to remove and free the data structure used to track the DMA allocation. Between the two calls, it is possible for a concurrent task to make an allocation with the same vaddr and add it to the devres list. If this happens, there will be two entries in the devres list with the same vaddr and devres_destroy() can free the wrong entry, triggering the WARN_ON() in dmam_match. Fix by destroying the devres entry before freeing the DMA allocation. Tested: kokonut //net/encryption http://sponge2/b9145fe6-0f72-4325-ac2f-a84d81075b03 Fixes: 9ac7849e35f7 ("devres: device resource management") Signed-off-by: Lance Richardson <rlance@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de>