From 43d746dc49bb4c82034fce01a92fe67344d664cf Mon Sep 17 00:00:00 2001
From: David Hildenbrand
Date: Wed, 29 May 2024 13:19:01 +0200
Subject: mm/zsmalloc: use a proper page type

Let's clean it up: use a proper page type and store our data (offset into
a page) in the lower 16 bits as documented.

We won't be able to support 256 KiB base pages, which is acceptable. Teach
Kconfig to handle that cleanly using a new CONFIG_HAVE_ZSMALLOC.

Based on this, we should do a proper "struct zsdesc" conversion, as
proposed in [1].

This removes the last _mapcount/page_type offender.

[1] https://lore.kernel.org/all/20231130101242.2590384-1-42.hyeyoo@gmail.com/

Link: https://lkml.kernel.org/r/20240529111904.2069608-4-david@redhat.com
Signed-off-by: David Hildenbrand
Tested-by: Sergey Senozhatsky [zram/zsmalloc workloads]
Reviewed-by: Sergey Senozhatsky
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Matthew Wilcox (Oracle)
Cc: Mike Rapoport (IBM)
Cc: Minchan Kim
Signed-off-by: Andrew Morton
---
 drivers/block/zram/Kconfig | 1 +
 1 file changed, 1 insertion(+)

(limited to 'drivers')

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 7b29cce60ab2..eacf1cba7bf4 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -2,6 +2,7 @@
 config ZRAM
 	tristate "Compressed RAM block device support"
 	depends on BLOCK && SYSFS && MMU
+	depends on HAVE_ZSMALLOC
 	depends on CRYPTO_LZO || CRYPTO_ZSTD || CRYPTO_LZ4 || CRYPTO_LZ4HC || CRYPTO_842
 	select ZSMALLOC
 	help
-- 
cgit v1.2.3

From 645b1399fa67baff565fa82c48976c53822a393f Mon Sep 17 00:00:00 2001
From: Kefeng Wang
Date: Tue, 4 Jun 2024 19:48:21 +0800
Subject: fb_defio: use a folio in fb_deferred_io_work()

Replaces three calls to compound_head() with one, which removes the last
caller of page_mkclean().
Link: https://lkml.kernel.org/r/20240604114822.2089819-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Acked-by: David Hildenbrand Cc: Daniel Vetter Cc: Helge Deller Cc: Jonathan Corbet Cc: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- drivers/video/fbdev/core/fb_defio.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'drivers') diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c index 806ecd32219b..c9c8e294b7e7 100644 --- a/drivers/video/fbdev/core/fb_defio.c +++ b/drivers/video/fbdev/core/fb_defio.c @@ -244,10 +244,11 @@ static void fb_deferred_io_work(struct work_struct *work) /* here we mkclean the pages, then do all deferred IO */ mutex_lock(&fbdefio->lock); list_for_each_entry(pageref, &fbdefio->pagereflist, list) { - struct page *cur = pageref->page; - lock_page(cur); - page_mkclean(cur); - unlock_page(cur); + struct folio *folio = page_folio(pageref->page); + + folio_lock(folio); + folio_mkclean(folio); + folio_unlock(folio); } /* driver's callback with pagereflist */ -- cgit v1.2.3 From a929e0d10f3db1a53668f6b9845db27d7fb63759 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Tue, 4 Jun 2024 19:48:22 +0800 Subject: mm: remove page_mkclean() There are no more users of page_mkclean(), remove it and update the document and comment. 
Link: https://lkml.kernel.org/r/20240604114822.2089819-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Acked-by: David Hildenbrand Cc: Daniel Vetter Cc: Helge Deller Cc: Jonathan Corbet Cc: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- drivers/video/fbdev/core/fb_defio.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'drivers') diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c index c9c8e294b7e7..d38998714215 100644 --- a/drivers/video/fbdev/core/fb_defio.c +++ b/drivers/video/fbdev/core/fb_defio.c @@ -113,7 +113,7 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf) printk(KERN_ERR "no mapping available\n"); BUG_ON(!page->mapping); - page->index = vmf->pgoff; /* for page_mkclean() */ + page->index = vmf->pgoff; /* for folio_mkclean() */ vmf->page = page; return 0; @@ -161,7 +161,7 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long /* * We want the page to remain locked from ->page_mkwrite until - * the PTE is marked dirty to avoid page_mkclean() being called + * the PTE is marked dirty to avoid folio_mkclean() being called * before the PTE is updated, which would leave the page ignored * by defio. * Do this by locking the page here and informing the caller -- cgit v1.2.3 From 503b158fc30f203a1854c87183ca3467c6466001 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Fri, 7 Jun 2024 11:09:37 +0200 Subject: mm/memory_hotplug: initialize memmap of !ZONE_DEVICE with PageOffline() instead of PageReserved() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We currently initialize the memmap such that PG_reserved is set and the refcount of the page is 1. In virtio-mem code, we have to manually clear that PG_reserved flag to make memory offlining with partially hotplugged memory blocks possible: has_unmovable_pages() would otherwise bail out on such pages. 

We want to avoid PG_reserved where possible and move to typed pages
instead. Further, we want to enlighten memory offlining code about
PG_offline: offline pages in an online memory section. One example is
handling managed page count adjustments in a cleaner way during memory
offlining.

So let's initialize the pages with PG_offline instead of PG_reserved.
generic_online_page()->__free_pages_core() will now clear that flag before
handing that memory to the buddy.

Note that the page refcount is still 1 and would forbid offlining of such
memory except when special care is taken during GOING_OFFLINE as
currently only implemented by virtio-mem.

With this change, we can now get non-PageReserved() pages in the XEN
balloon list. From what I can tell, that can already happen via
decrease_reservation(), so that should be fine.

HV-balloon should not really observe a change: partial online memory
blocks still cannot get surprise-offlined, because the refcount of these
PageOffline() pages is 1.

Update virtio-mem, HV-balloon and XEN-balloon code to be aware that
hotplugged pages are now PageOffline() instead of PageReserved() before
they are handed over to the buddy.

We'll leave the ZONE_DEVICE case alone for now.

Note that self-hosted vmemmap pages will no longer be marked as
reserved. This matches ordinary vmemmap pages allocated from the buddy
during memory hotplug. Now, really only vmemmap pages allocated from
memblock during early boot will be marked reserved. Existing
PageReserved() checks seem to be handling all relevant cases correctly
even after this change.

Link: https://lkml.kernel.org/r/20240607090939.89524-3-david@redhat.com
Signed-off-by: David Hildenbrand
Acked-by: Oscar Salvador [generic memory-hotplug bits]
Cc: Alexander Potapenko
Cc: Dexuan Cui
Cc: Dmitry Vyukov
Cc: Eugenio Pérez
Cc: Haiyang Zhang
Cc: Jason Wang
Cc: Juergen Gross
Cc: "K. Y. Srinivasan"
Cc: Marco Elver
Cc: Michael S.
Tsirkin Cc: Mike Rapoport (IBM) Cc: Oleksandr Tyshchenko Cc: Stefano Stabellini Cc: Wei Liu Cc: Xuan Zhuo Signed-off-by: Andrew Morton --- drivers/hv/hv_balloon.c | 5 ++--- drivers/virtio/virtio_mem.c | 18 ++++++++++++------ drivers/xen/balloon.c | 9 +++++++-- 3 files changed, 21 insertions(+), 11 deletions(-) (limited to 'drivers') diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c index 0e7427c2baf5..c38dcdfcb914 100644 --- a/drivers/hv/hv_balloon.c +++ b/drivers/hv/hv_balloon.c @@ -683,9 +683,8 @@ static void hv_page_online_one(struct hv_hotadd_state *has, struct page *pg) if (!PageOffline(pg)) __SetPageOffline(pg); return; - } - if (PageOffline(pg)) - __ClearPageOffline(pg); + } else if (!PageOffline(pg)) + return; /* This frame is currently backed; online the page. */ generic_online_page(pg, 0); diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index a3857bacc844..b90df29621c8 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -1146,12 +1146,16 @@ static void virtio_mem_set_fake_offline(unsigned long pfn, for (; nr_pages--; pfn++) { struct page *page = pfn_to_page(pfn); - __SetPageOffline(page); - if (!onlined) { + if (!onlined) + /* + * Pages that have not been onlined yet were initialized + * to PageOffline(). Remember that we have to route them + * through generic_online_page(). + */ SetPageDirty(page); - /* FIXME: remove after cleanups */ - ClearPageReserved(page); - } + else + __SetPageOffline(page); + VM_WARN_ON_ONCE(!PageOffline(page)); } page_offline_end(); } @@ -1166,9 +1170,11 @@ static void virtio_mem_clear_fake_offline(unsigned long pfn, for (; nr_pages--; pfn++) { struct page *page = pfn_to_page(pfn); - __ClearPageOffline(page); if (!onlined) + /* generic_online_page() will clear PageOffline(). 
*/ ClearPageDirty(page); + else + __ClearPageOffline(page); } } diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index aaf2514fcfa4..528395133b4f 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -146,7 +146,8 @@ static DECLARE_WAIT_QUEUE_HEAD(balloon_wq); /* balloon_append: add the given page to the balloon. */ static void balloon_append(struct page *page) { - __SetPageOffline(page); + if (!PageOffline(page)) + __SetPageOffline(page); /* Lowmem is re-populated first, so highmem pages go at list tail. */ if (PageHighMem(page)) { @@ -412,7 +413,11 @@ static enum bp_state increase_reservation(unsigned long nr_pages) xenmem_reservation_va_mapping_update(1, &page, &frame_list[i]); - /* Relinquish the page back to the allocator. */ + /* + * Relinquish the page back to the allocator. Note that + * some pages, including ones added via xen_online_page(), might + * not be marked reserved; free_reserved_page() will handle that. + */ free_reserved_page(page); } -- cgit v1.2.3 From 50625744220c101705a989d7c57a6c16e945f3b1 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Fri, 7 Jun 2024 11:09:38 +0200 Subject: mm/memory_hotplug: skip adjust_managed_page_count() for PageOffline() pages when offlining MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We currently have a hack for virtio-mem in place to handle memory offlining with PageOffline pages for which we already adjusted the managed page count. Let's enlighten memory offlining code so we can get rid of that hack, and document the situation. Link: https://lkml.kernel.org/r/20240607090939.89524-4-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Oscar Salvador Cc: Alexander Potapenko Cc: Dexuan Cui Cc: Dmitry Vyukov Cc: Eugenio Pérez Cc: Haiyang Zhang Cc: Jason Wang Cc: Juergen Gross Cc: "K. Y. Srinivasan" Cc: Marco Elver Cc: Michael S. 
Tsirkin Cc: Mike Rapoport (IBM) Cc: Oleksandr Tyshchenko Cc: Stefano Stabellini Cc: Wei Liu Cc: Xuan Zhuo Signed-off-by: Andrew Morton --- drivers/virtio/virtio_mem.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) (limited to 'drivers') diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index b90df29621c8..b0b871441578 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -1269,12 +1269,6 @@ static void virtio_mem_fake_offline_going_offline(unsigned long pfn, struct page *page; unsigned long i; - /* - * Drop our reference to the pages so the memory can get offlined - * and add the unplugged pages to the managed page counters (so - * offlining code can correctly subtract them again). - */ - adjust_managed_page_count(pfn_to_page(pfn), nr_pages); /* Drop our reference to the pages so the memory can get offlined. */ for (i = 0; i < nr_pages; i++) { page = pfn_to_page(pfn + i); @@ -1293,10 +1287,9 @@ static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn, unsigned long i; /* - * Get the reference we dropped when going offline and subtract the - * unplugged pages from the managed page counters. + * Get the reference again that we dropped via page_ref_dec_and_test() + * when going offline. */ - adjust_managed_page_count(pfn_to_page(pfn), -nr_pages); for (i = 0; i < nr_pages; i++) page_ref_inc(pfn_to_page(pfn + i)); } -- cgit v1.2.3 From 1b301f5f28bae33087ec0d8a8730a02c87ba6235 Mon Sep 17 00:00:00 2001 From: Ilya Leoshkevich Date: Fri, 21 Jun 2024 13:35:14 +0200 Subject: s390/irqflags: do not instrument arch_local_irq_*() with KMSAN Lockdep generates the following false positives with KMSAN on s390x: [ 6.063666] DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled()) [ ...] 
[ 6.577050] Call Trace:
[ 6.619637] [<000000000690d2de>] check_flags+0x1fe/0x210
[ 6.665411] ([<000000000690d2da>] check_flags+0x1fa/0x210)
[ 6.707478] [<00000000006cec1a>] lock_acquire+0x2ca/0xce0
[ 6.749959] [<00000000069820ea>] _raw_spin_lock_irqsave+0xea/0x190
[ 6.794912] [<00000000041fc988>] __stack_depot_save+0x218/0x5b0
[ 6.838420] [<000000000197affe>] __msan_poison_alloca+0xfe/0x1a0
[ 6.882985] [<0000000007c5827c>] start_kernel+0x70c/0xd50
[ 6.927454] [<0000000000100036>] startup_continue+0x36/0x40

Between trace_hardirqs_on() and `stosm __mask, 3` lockdep thinks that
interrupts are on, but on the CPU they are still off. KMSAN
instrumentation takes spinlocks, giving lockdep a chance to see and
complain about this discrepancy.

KMSAN instrumentation is inserted in order to poison the __mask variable.
Disable instrumentation in the respective functions. They are very small
and it's easy to see that no important metadata updates are lost because
of this.

Link: https://lkml.kernel.org/r/20240621113706.315500-31-iii@linux.ibm.com
Signed-off-by: Ilya Leoshkevich
Reviewed-by: Alexander Potapenko
Cc: Alexander Gordeev
Cc: Christian Borntraeger
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Dmitry Vyukov
Cc: Heiko Carstens
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim
Cc: Marco Elver
Cc: Mark Rutland
Cc: Masami Hiramatsu (Google)
Cc: Pekka Enberg
Cc: Roman Gushchin
Cc: Steven Rostedt (Google)
Cc: Sven Schnelle
Cc: Vasily Gorbik
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
---
 drivers/s390/char/sclp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'drivers')

diff --git a/drivers/s390/char/sclp.c b/drivers/s390/char/sclp.c
index fbe29cabcbb8..f3621adbd5de 100644
--- a/drivers/s390/char/sclp.c
+++ b/drivers/s390/char/sclp.c
@@ -736,7 +736,7 @@ sclp_sync_wait(void)
 	cr0_sync.val = cr0.val & ~CR0_IRQ_SUBCLASS_MASK;
 	cr0_sync.val |= 1UL << (63 - 54);
 	local_ctl_load(0, &cr0_sync);
-	__arch_local_irq_stosm(0x01);
+ 
arch_local_irq_enable_external(); /* Loop until driver state indicates finished request */ while (sclp_running_state != sclp_running_state_idle) { /* Check for expired request timer */ -- cgit v1.2.3 From 725553d202dda60dc17a142c80fd96bdf6ca43db Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Sun, 23 Jun 2024 23:36:12 -0700 Subject: udmabuf: add CONFIG_MMU dependency There is no !CONFIG_MMU version of vmf_insert_pfn(): arm-linux-gnueabi-ld: drivers/dma-buf/udmabuf.o: in function `udmabuf_vm_fault': udmabuf.c:(.text+0xaa): undefined reference to `vmf_insert_pfn' Link: https://lkml.kernel.org/r/20240624063952.1572359-5-vivek.kasireddy@intel.com Signed-off-by: Arnd Bergmann Acked-by: David Hildenbrand Acked-by: Vivek Kasireddy Cc: Christoph Hellwig Cc: Christoph Hellwig Cc: Daniel Vetter Cc: Dave Airlie Cc: Dongwon Kim Cc: Gerd Hoffmann Cc: Hugh Dickins Cc: Jason Gunthorpe Cc: Junxiao Chang Cc: Matthew Wilcox (Oracle) Cc: Mike Kravetz Cc: Oscar Salvador Cc: Peter Xu Cc: Shuah Khan Signed-off-by: Andrew Morton --- drivers/dma-buf/Kconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'drivers') diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig index e4dc53a36428..b46eb8a552d7 100644 --- a/drivers/dma-buf/Kconfig +++ b/drivers/dma-buf/Kconfig @@ -35,6 +35,7 @@ config UDMABUF default n depends on DMA_SHARED_BUFFER depends on MEMFD_CREATE || COMPILE_TEST + depends on MMU help A driver to let userspace turn memfd regions into dma-bufs. Qemu can use this to create host dmabufs for guest framebuffers. -- cgit v1.2.3 From 7d79cd784470395539bda91bf0b3505ff5b2ab6d Mon Sep 17 00:00:00 2001 From: Vivek Kasireddy Date: Sun, 23 Jun 2024 23:36:13 -0700 Subject: udmabuf: use vmf_insert_pfn and VM_PFNMAP for handling mmap Add VM_PFNMAP to vm_flags in the mmap handler to ensure that the mappings would be managed without using struct page. 
And, in the vm_fault handler, use vmf_insert_pfn to share the page's pfn to userspace instead of directly sharing the page (via struct page *). Link: https://lkml.kernel.org/r/20240624063952.1572359-6-vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy Suggested-by: David Hildenbrand Acked-by: David Hildenbrand Acked-by: Dave Airlie Acked-by: Gerd Hoffmann Cc: Daniel Vetter Cc: Hugh Dickins Cc: Peter Xu Cc: Jason Gunthorpe Cc: Dongwon Kim Cc: Junxiao Chang Cc: Arnd Bergmann Cc: Christoph Hellwig Cc: Christoph Hellwig Cc: Matthew Wilcox (Oracle) Cc: Mike Kravetz Cc: Oscar Salvador Cc: Shuah Khan Signed-off-by: Andrew Morton --- drivers/dma-buf/udmabuf.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'drivers') diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index c40645999648..820c993c8659 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -35,12 +35,13 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf) struct vm_area_struct *vma = vmf->vma; struct udmabuf *ubuf = vma->vm_private_data; pgoff_t pgoff = vmf->pgoff; + unsigned long pfn; if (pgoff >= ubuf->pagecount) return VM_FAULT_SIGBUS; - vmf->page = ubuf->pages[pgoff]; - get_page(vmf->page); - return 0; + + pfn = page_to_pfn(ubuf->pages[pgoff]); + return vmf_insert_pfn(vma, vmf->address, pfn); } static const struct vm_operations_struct udmabuf_vm_ops = { @@ -56,6 +57,7 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma) vma->vm_ops = &udmabuf_vm_ops; vma->vm_private_data = ubuf; + vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP); return 0; } -- cgit v1.2.3 From 0c8b91ef5100eaed3d64123ac91ac4739fccf15c Mon Sep 17 00:00:00 2001 From: Vivek Kasireddy Date: Sun, 23 Jun 2024 23:36:14 -0700 Subject: udmabuf: add back support for mapping hugetlb pages A user or admin can configure a VMM (Qemu) Guest's memory to be backed by hugetlb pages for various reasons. 
However, a Guest OS would still allocate (and pin) buffers that are backed by regular 4k sized pages. In order to map these buffers and create dma-bufs for them on the Host, we first need to find the hugetlb pages where the buffer allocations are located and then determine the offsets of individual chunks (within those pages) and use this information to eventually populate a scatterlist. Testcase: default_hugepagesz=2M hugepagesz=2M hugepages=2500 options were passed to the Host kernel and Qemu was launched with these relevant options: qemu-system-x86_64 -m 4096m.... -device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080 -display gtk,gl=on -object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M -machine memory-backend=mem1 Replacing -display gtk,gl=on with -display gtk,gl=off above would exercise the mmap handler. Link: https://lkml.kernel.org/r/20240624063952.1572359-7-vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy Acked-by: Mike Kravetz (v2) Acked-by: Dave Airlie Acked-by: Gerd Hoffmann Cc: David Hildenbrand Cc: Daniel Vetter Cc: Hugh Dickins Cc: Peter Xu Cc: Jason Gunthorpe Cc: Dongwon Kim Cc: Junxiao Chang Cc: Arnd Bergmann Cc: Christoph Hellwig Cc: Christoph Hellwig Cc: Matthew Wilcox (Oracle) Cc: Oscar Salvador Cc: Shuah Khan Signed-off-by: Andrew Morton --- drivers/dma-buf/udmabuf.c | 122 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 101 insertions(+), 21 deletions(-) (limited to 'drivers') diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index 820c993c8659..274defd3fa3e 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -28,6 +29,7 @@ struct udmabuf { struct page **pages; struct sg_table *sg; struct miscdevice *device; + pgoff_t *offsets; }; static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf) @@ -41,6 +43,8 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf) return VM_FAULT_SIGBUS; 
pfn = page_to_pfn(ubuf->pages[pgoff]); + pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT; + return vmf_insert_pfn(vma, vmf->address, pfn); } @@ -90,23 +94,29 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf, { struct udmabuf *ubuf = buf->priv; struct sg_table *sg; + struct scatterlist *sgl; + unsigned int i = 0; int ret; sg = kzalloc(sizeof(*sg), GFP_KERNEL); if (!sg) return ERR_PTR(-ENOMEM); - ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount, - 0, ubuf->pagecount << PAGE_SHIFT, - GFP_KERNEL); + + ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL); if (ret < 0) - goto err; + goto err_alloc; + + for_each_sg(sg->sgl, sgl, ubuf->pagecount, i) + sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, ubuf->offsets[i]); + ret = dma_map_sgtable(dev, sg, direction, 0); if (ret < 0) - goto err; + goto err_map; return sg; -err: +err_map: sg_free_table(sg); +err_alloc: kfree(sg); return ERR_PTR(ret); } @@ -143,6 +153,7 @@ static void release_udmabuf(struct dma_buf *buf) for (pg = 0; pg < ubuf->pagecount; pg++) put_page(ubuf->pages[pg]); + kfree(ubuf->offsets); kfree(ubuf->pages); kfree(ubuf); } @@ -196,17 +207,77 @@ static const struct dma_buf_ops udmabuf_ops = { #define SEALS_WANTED (F_SEAL_SHRINK) #define SEALS_DENIED (F_SEAL_WRITE) +static int handle_hugetlb_pages(struct udmabuf *ubuf, struct file *memfd, + pgoff_t offset, pgoff_t pgcnt, + pgoff_t *pgbuf) +{ + struct hstate *hpstate = hstate_file(memfd); + pgoff_t mapidx = offset >> huge_page_shift(hpstate); + pgoff_t subpgoff = (offset & ~huge_page_mask(hpstate)) >> PAGE_SHIFT; + pgoff_t maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT; + struct page *hpage = NULL; + struct folio *folio; + pgoff_t pgidx; + + mapidx <<= huge_page_order(hpstate); + for (pgidx = 0; pgidx < pgcnt; pgidx++) { + if (!hpage) { + folio = __filemap_get_folio(memfd->f_mapping, + mapidx, + FGP_ACCESSED, 0); + if (IS_ERR(folio)) + return PTR_ERR(folio); + + hpage = &folio->page; + } + + get_page(hpage); + 
ubuf->pages[*pgbuf] = hpage; + ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT; + (*pgbuf)++; + if (++subpgoff == maxsubpgs) { + put_page(hpage); + hpage = NULL; + subpgoff = 0; + mapidx += pages_per_huge_page(hpstate); + } + } + + if (hpage) + put_page(hpage); + + return 0; +} + +static int handle_shmem_pages(struct udmabuf *ubuf, struct file *memfd, + pgoff_t offset, pgoff_t pgcnt, + pgoff_t *pgbuf) +{ + pgoff_t pgidx, pgoff = offset >> PAGE_SHIFT; + struct page *page; + + for (pgidx = 0; pgidx < pgcnt; pgidx++) { + page = shmem_read_mapping_page(memfd->f_mapping, + pgoff + pgidx); + if (IS_ERR(page)) + return PTR_ERR(page); + + ubuf->pages[*pgbuf] = page; + (*pgbuf)++; + } + + return 0; +} + static long udmabuf_create(struct miscdevice *device, struct udmabuf_create_list *head, struct udmabuf_create_item *list) { DEFINE_DMA_BUF_EXPORT_INFO(exp_info); struct file *memfd = NULL; - struct address_space *mapping = NULL; struct udmabuf *ubuf; struct dma_buf *buf; - pgoff_t pgoff, pgcnt, pgidx, pgbuf = 0, pglimit; - struct page *page; + pgoff_t pgcnt, pgbuf = 0, pglimit; int seals, ret = -EINVAL; u32 i, flags; @@ -234,6 +305,12 @@ static long udmabuf_create(struct miscdevice *device, ret = -ENOMEM; goto err; } + ubuf->offsets = kcalloc(ubuf->pagecount, sizeof(*ubuf->offsets), + GFP_KERNEL); + if (!ubuf->offsets) { + ret = -ENOMEM; + goto err; + } pgbuf = 0; for (i = 0; i < head->count; i++) { @@ -241,8 +318,7 @@ static long udmabuf_create(struct miscdevice *device, memfd = fget(list[i].memfd); if (!memfd) goto err; - mapping = memfd->f_mapping; - if (!shmem_mapping(mapping)) + if (!shmem_file(memfd) && !is_file_hugepages(memfd)) goto err; seals = memfd_fcntl(memfd, F_GET_SEALS, 0); if (seals == -EINVAL) @@ -251,16 +327,19 @@ static long udmabuf_create(struct miscdevice *device, if ((seals & SEALS_WANTED) != SEALS_WANTED || (seals & SEALS_DENIED) != 0) goto err; - pgoff = list[i].offset >> PAGE_SHIFT; - pgcnt = list[i].size >> PAGE_SHIFT; - for (pgidx = 0; pgidx < 
pgcnt; pgidx++) { - page = shmem_read_mapping_page(mapping, pgoff + pgidx); - if (IS_ERR(page)) { - ret = PTR_ERR(page); - goto err; - } - ubuf->pages[pgbuf++] = page; - } + + pgcnt = list[i].size >> PAGE_SHIFT; + if (is_file_hugepages(memfd)) + ret = handle_hugetlb_pages(ubuf, memfd, + list[i].offset, + pgcnt, &pgbuf); + else + ret = handle_shmem_pages(ubuf, memfd, + list[i].offset, + pgcnt, &pgbuf); + if (ret < 0) + goto err; + fput(memfd); memfd = NULL; } @@ -287,6 +366,7 @@ err: put_page(ubuf->pages[--pgbuf]); if (memfd) fput(memfd); + kfree(ubuf->offsets); kfree(ubuf->pages); kfree(ubuf); return ret; -- cgit v1.2.3 From 5e72b2b41a21e596dcff489810ea760adeb2ef30 Mon Sep 17 00:00:00 2001 From: Vivek Kasireddy Date: Sun, 23 Jun 2024 23:36:15 -0700 Subject: udmabuf: convert udmabuf driver to use folios This is mainly a preparatory patch to use memfd_pin_folios() API for pinning folios. Using folios instead of pages makes sense as the udmabuf driver needs to handle both shmem and hugetlb cases. And, using the memfd_pin_folios() API makes this easier as we no longer need to separately handle shmem vs hugetlb cases in the udmabuf driver. Note that, the function vmap_udmabuf() still needs a list of pages; so, we collect all the head pages into a local array in this case. Other changes in this patch include the addition of helpers for checking the memfd seals and exporting dmabuf. Moving code from udmabuf_create() into these helpers improves readability given that udmabuf_create() is a bit long. 
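
The seal test that this patch moves into a helper can be modelled outside the kernel. The following is a minimal userspace sketch, not the driver code: `seals_acceptable()` is a hypothetical name chosen for illustration, and the fallback `F_SEAL_*` values are only used when the C library headers do not provide them. It mirrors the SEALS_WANTED / SEALS_DENIED comparison from the diff: F_SEAL_SHRINK must be set and F_SEAL_WRITE must not be.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>

/* Fallbacks for older headers; values match the Linux UAPI definitions. */
#ifndef F_SEAL_SHRINK
#define F_SEAL_SHRINK 0x0002
#endif
#ifndef F_SEAL_WRITE
#define F_SEAL_WRITE 0x0008
#endif

#define SEALS_WANTED (F_SEAL_SHRINK)
#define SEALS_DENIED (F_SEAL_WRITE)

/*
 * Hypothetical userspace model of the seal comparison: returns 0 when
 * every wanted seal is present and no denied seal is set, -EINVAL otherwise.
 */
static int seals_acceptable(int seals)
{
	if ((seals & SEALS_WANTED) != SEALS_WANTED ||
	    (seals & SEALS_DENIED) != 0)
		return -EINVAL;

	return 0;
}
```

In practice this means a memfd handed to the driver would typically have F_SEAL_SHRINK applied via fcntl(fd, F_ADD_SEALS, ...) and must not carry F_SEAL_WRITE.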
Link: https://lkml.kernel.org/r/20240624063952.1572359-8-vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy Acked-by: Dave Airlie Acked-by: Gerd Hoffmann Cc: David Hildenbrand Cc: Matthew Wilcox Cc: Daniel Vetter Cc: Hugh Dickins Cc: Peter Xu Cc: Jason Gunthorpe Cc: Dongwon Kim Cc: Junxiao Chang Cc: Arnd Bergmann Cc: Christoph Hellwig Cc: Christoph Hellwig Cc: Mike Kravetz Cc: Oscar Salvador Cc: Shuah Khan Signed-off-by: Andrew Morton --- drivers/dma-buf/udmabuf.c | 139 +++++++++++++++++++++++++++------------------- 1 file changed, 83 insertions(+), 56 deletions(-) (limited to 'drivers') diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index 274defd3fa3e..e67515808ed3 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -26,7 +26,7 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in megabytes. Default is struct udmabuf { pgoff_t pagecount; - struct page **pages; + struct folio **folios; struct sg_table *sg; struct miscdevice *device; pgoff_t *offsets; @@ -42,7 +42,7 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf) if (pgoff >= ubuf->pagecount) return VM_FAULT_SIGBUS; - pfn = page_to_pfn(ubuf->pages[pgoff]); + pfn = folio_pfn(ubuf->folios[pgoff]); pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT; return vmf_insert_pfn(vma, vmf->address, pfn); @@ -68,11 +68,21 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma) static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map) { struct udmabuf *ubuf = buf->priv; + struct page **pages; void *vaddr; + pgoff_t pg; dma_resv_assert_held(buf->resv); - vaddr = vm_map_ram(ubuf->pages, ubuf->pagecount, -1); + pages = kmalloc_array(ubuf->pagecount, sizeof(*pages), GFP_KERNEL); + if (!pages) + return -ENOMEM; + + for (pg = 0; pg < ubuf->pagecount; pg++) + pages[pg] = &ubuf->folios[pg]->page; + + vaddr = vm_map_ram(pages, ubuf->pagecount, -1); + kfree(pages); if (!vaddr) return -EINVAL; @@ -107,7 +117,8 @@ static struct sg_table 
*get_sg_table(struct device *dev, struct dma_buf *buf, goto err_alloc; for_each_sg(sg->sgl, sgl, ubuf->pagecount, i) - sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, ubuf->offsets[i]); + sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE, + ubuf->offsets[i]); ret = dma_map_sgtable(dev, sg, direction, 0); if (ret < 0) @@ -152,9 +163,9 @@ static void release_udmabuf(struct dma_buf *buf) put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL); for (pg = 0; pg < ubuf->pagecount; pg++) - put_page(ubuf->pages[pg]); + folio_put(ubuf->folios[pg]); kfree(ubuf->offsets); - kfree(ubuf->pages); + kfree(ubuf->folios); kfree(ubuf); } @@ -215,36 +226,33 @@ static int handle_hugetlb_pages(struct udmabuf *ubuf, struct file *memfd, pgoff_t mapidx = offset >> huge_page_shift(hpstate); pgoff_t subpgoff = (offset & ~huge_page_mask(hpstate)) >> PAGE_SHIFT; pgoff_t maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT; - struct page *hpage = NULL; - struct folio *folio; + struct folio *folio = NULL; pgoff_t pgidx; mapidx <<= huge_page_order(hpstate); for (pgidx = 0; pgidx < pgcnt; pgidx++) { - if (!hpage) { + if (!folio) { folio = __filemap_get_folio(memfd->f_mapping, mapidx, FGP_ACCESSED, 0); if (IS_ERR(folio)) return PTR_ERR(folio); - - hpage = &folio->page; } - get_page(hpage); - ubuf->pages[*pgbuf] = hpage; + folio_get(folio); + ubuf->folios[*pgbuf] = folio; ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT; (*pgbuf)++; if (++subpgoff == maxsubpgs) { - put_page(hpage); - hpage = NULL; + folio_put(folio); + folio = NULL; subpgoff = 0; mapidx += pages_per_huge_page(hpstate); } } - if (hpage) - put_page(hpage); + if (folio) + folio_put(folio); return 0; } @@ -254,31 +262,69 @@ static int handle_shmem_pages(struct udmabuf *ubuf, struct file *memfd, pgoff_t *pgbuf) { pgoff_t pgidx, pgoff = offset >> PAGE_SHIFT; - struct page *page; + struct folio *folio = NULL; for (pgidx = 0; pgidx < pgcnt; pgidx++) { - page = shmem_read_mapping_page(memfd->f_mapping, - pgoff + pgidx); - if (IS_ERR(page)) - return PTR_ERR(page); 
+ folio = shmem_read_folio(memfd->f_mapping, pgoff + pgidx); + if (IS_ERR(folio)) + return PTR_ERR(folio); - ubuf->pages[*pgbuf] = page; + ubuf->folios[*pgbuf] = folio; (*pgbuf)++; } return 0; } +static int check_memfd_seals(struct file *memfd) +{ + int seals; + + if (!memfd) + return -EBADFD; + + if (!shmem_file(memfd) && !is_file_hugepages(memfd)) + return -EBADFD; + + seals = memfd_fcntl(memfd, F_GET_SEALS, 0); + if (seals == -EINVAL) + return -EBADFD; + + if ((seals & SEALS_WANTED) != SEALS_WANTED || + (seals & SEALS_DENIED) != 0) + return -EINVAL; + + return 0; +} + +static int export_udmabuf(struct udmabuf *ubuf, + struct miscdevice *device, + u32 flags) +{ + DEFINE_DMA_BUF_EXPORT_INFO(exp_info); + struct dma_buf *buf; + + ubuf->device = device; + exp_info.ops = &udmabuf_ops; + exp_info.size = ubuf->pagecount << PAGE_SHIFT; + exp_info.priv = ubuf; + exp_info.flags = O_RDWR; + + buf = dma_buf_export(&exp_info); + if (IS_ERR(buf)) + return PTR_ERR(buf); + + return dma_buf_fd(buf, flags); +} + static long udmabuf_create(struct miscdevice *device, struct udmabuf_create_list *head, struct udmabuf_create_item *list) { - DEFINE_DMA_BUF_EXPORT_INFO(exp_info); + pgoff_t pgcnt, pgbuf = 0, pglimit; struct file *memfd = NULL; struct udmabuf *ubuf; - struct dma_buf *buf; - pgoff_t pgcnt, pgbuf = 0, pglimit; - int seals, ret = -EINVAL; + int ret = -EINVAL; u32 i, flags; ubuf = kzalloc(sizeof(*ubuf), GFP_KERNEL); @@ -299,9 +345,9 @@ static long udmabuf_create(struct miscdevice *device, if (!ubuf->pagecount) goto err; - ubuf->pages = kmalloc_array(ubuf->pagecount, sizeof(*ubuf->pages), + ubuf->folios = kmalloc_array(ubuf->pagecount, sizeof(*ubuf->folios), GFP_KERNEL); - if (!ubuf->pages) { + if (!ubuf->folios) { ret = -ENOMEM; goto err; } @@ -314,18 +360,9 @@ static long udmabuf_create(struct miscdevice *device, pgbuf = 0; for (i = 0; i < head->count; i++) { - ret = -EBADFD; memfd = fget(list[i].memfd); - if (!memfd) - goto err; - if (!shmem_file(memfd) && 
!is_file_hugepages(memfd)) - goto err; - seals = memfd_fcntl(memfd, F_GET_SEALS, 0); - if (seals == -EINVAL) - goto err; - ret = -EINVAL; - if ((seals & SEALS_WANTED) != SEALS_WANTED || - (seals & SEALS_DENIED) != 0) + ret = check_memfd_seals(memfd); + if (ret < 0) goto err; pgcnt = list[i].size >> PAGE_SHIFT; @@ -344,30 +381,20 @@ static long udmabuf_create(struct miscdevice *device, memfd = NULL; } - exp_info.ops = &udmabuf_ops; - exp_info.size = ubuf->pagecount << PAGE_SHIFT; - exp_info.priv = ubuf; - exp_info.flags = O_RDWR; - - ubuf->device = device; - buf = dma_buf_export(&exp_info); - if (IS_ERR(buf)) { - ret = PTR_ERR(buf); + flags = head->flags & UDMABUF_FLAGS_CLOEXEC ? O_CLOEXEC : 0; + ret = export_udmabuf(ubuf, device, flags); + if (ret < 0) goto err; - } - flags = 0; - if (head->flags & UDMABUF_FLAGS_CLOEXEC) - flags |= O_CLOEXEC; - return dma_buf_fd(buf, flags); + return ret; err: while (pgbuf > 0) - put_page(ubuf->pages[--pgbuf]); + folio_put(ubuf->folios[--pgbuf]); if (memfd) fput(memfd); kfree(ubuf->offsets); - kfree(ubuf->pages); + kfree(ubuf->folios); kfree(ubuf); return ret; } -- cgit v1.2.3 From c6a3194c05e7e6fd0e8fbfb1720084ae2503c4ac Mon Sep 17 00:00:00 2001 From: Vivek Kasireddy Date: Sun, 23 Jun 2024 23:36:16 -0700 Subject: udmabuf: pin the pages using memfd_pin_folios() API Using memfd_pin_folios() will ensure that the pages are pinned correctly using FOLL_PIN. And, this also ensures that we don't accidentally break features such as memory hotunplug as it would not allow pinning pages in the movable zone. Using this new API also simplifies the code as we no longer have to deal with extracting individual pages from their mappings or handle shmem and hugetlb cases separately. 
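
For context, the per-subpage bookkeeping that the driver previously did for hugetlb-backed memfds, and which this change makes unnecessary, can be modelled in userspace. This is a minimal sketch assuming 4 KiB base pages and 2 MiB huge pages; `struct chunk`, `split_hugetlb_range()` and the `SKETCH_*` constants are hypothetical names for illustration, not kernel API. It only reproduces the index and offset arithmetic, not the folio lookups or reference counting.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12u	/* assumed 4 KiB base pages */
#define SKETCH_HPAGE_SHIFT 21u	/* assumed 2 MiB huge pages */

/* One base-page-sized chunk of the buffer. */
struct chunk {
	uint64_t mapidx;	/* page-cache index of the backing huge page */
	uint64_t offset;	/* byte offset of the chunk inside that huge page */
};

/*
 * Hypothetical model: split a byte range that starts at 'offset' and
 * covers 'pgcnt' base pages into per-page (huge page index, offset) pairs.
 */
static void split_hugetlb_range(uint64_t offset, size_t pgcnt,
				struct chunk *out)
{
	const uint64_t hpage_size = 1ull << SKETCH_HPAGE_SHIFT;
	const unsigned int order = SKETCH_HPAGE_SHIFT - SKETCH_PAGE_SHIFT;
	const uint64_t maxsubpgs = hpage_size >> SKETCH_PAGE_SHIFT;
	uint64_t mapidx = (offset >> SKETCH_HPAGE_SHIFT) << order;
	uint64_t subpgoff = (offset & (hpage_size - 1)) >> SKETCH_PAGE_SHIFT;
	size_t i;

	for (i = 0; i < pgcnt; i++) {
		out[i].mapidx = mapidx;
		out[i].offset = subpgoff << SKETCH_PAGE_SHIFT;

		/* Crossing into the next huge page resets the sub-page offset. */
		if (++subpgoff == maxsubpgs) {
			subpgoff = 0;
			mapidx += maxsubpgs;
		}
	}
}
```

For example, a three-page range starting two base pages before the end of the second huge page yields two chunks in that huge page followed by one chunk at offset 0 of the third.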
Link: https://lkml.kernel.org/r/20240624063952.1572359-9-vivek.kasireddy@intel.com Signed-off-by: Vivek Kasireddy Acked-by: Dave Airlie Acked-by: Gerd Hoffmann Cc: David Hildenbrand Cc: Matthew Wilcox Cc: Daniel Vetter Cc: Hugh Dickins Cc: Peter Xu Cc: Jason Gunthorpe Cc: Dongwon Kim Cc: Junxiao Chang Cc: Arnd Bergmann Cc: Christoph Hellwig Cc: Christoph Hellwig Cc: Mike Kravetz Cc: Oscar Salvador Cc: Shuah Khan Signed-off-by: Andrew Morton --- drivers/dma-buf/udmabuf.c | 155 ++++++++++++++++++++++++---------------------- 1 file changed, 80 insertions(+), 75 deletions(-) (limited to 'drivers') diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index e67515808ed3..047c3cd2ceff 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -30,6 +30,12 @@ struct udmabuf { struct sg_table *sg; struct miscdevice *device; pgoff_t *offsets; + struct list_head unpin_list; +}; + +struct udmabuf_folio { + struct folio *folio; + struct list_head list; }; static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf) @@ -153,17 +159,43 @@ static void unmap_udmabuf(struct dma_buf_attachment *at, return put_sg_table(at->dev, sg, direction); } +static void unpin_all_folios(struct list_head *unpin_list) +{ + struct udmabuf_folio *ubuf_folio; + + while (!list_empty(unpin_list)) { + ubuf_folio = list_first_entry(unpin_list, + struct udmabuf_folio, list); + unpin_folio(ubuf_folio->folio); + + list_del(&ubuf_folio->list); + kfree(ubuf_folio); + } +} + +static int add_to_unpin_list(struct list_head *unpin_list, + struct folio *folio) +{ + struct udmabuf_folio *ubuf_folio; + + ubuf_folio = kzalloc(sizeof(*ubuf_folio), GFP_KERNEL); + if (!ubuf_folio) + return -ENOMEM; + + ubuf_folio->folio = folio; + list_add_tail(&ubuf_folio->list, unpin_list); + return 0; +} + static void release_udmabuf(struct dma_buf *buf) { struct udmabuf *ubuf = buf->priv; struct device *dev = ubuf->device->this_device; - pgoff_t pg; if (ubuf->sg) put_sg_table(dev, ubuf->sg, 
DMA_BIDIRECTIONAL); - for (pg = 0; pg < ubuf->pagecount; pg++) - folio_put(ubuf->folios[pg]); + unpin_all_folios(&ubuf->unpin_list); kfree(ubuf->offsets); kfree(ubuf->folios); kfree(ubuf); @@ -218,64 +250,6 @@ static const struct dma_buf_ops udmabuf_ops = { #define SEALS_WANTED (F_SEAL_SHRINK) #define SEALS_DENIED (F_SEAL_WRITE) -static int handle_hugetlb_pages(struct udmabuf *ubuf, struct file *memfd, - pgoff_t offset, pgoff_t pgcnt, - pgoff_t *pgbuf) -{ - struct hstate *hpstate = hstate_file(memfd); - pgoff_t mapidx = offset >> huge_page_shift(hpstate); - pgoff_t subpgoff = (offset & ~huge_page_mask(hpstate)) >> PAGE_SHIFT; - pgoff_t maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT; - struct folio *folio = NULL; - pgoff_t pgidx; - - mapidx <<= huge_page_order(hpstate); - for (pgidx = 0; pgidx < pgcnt; pgidx++) { - if (!folio) { - folio = __filemap_get_folio(memfd->f_mapping, - mapidx, - FGP_ACCESSED, 0); - if (IS_ERR(folio)) - return PTR_ERR(folio); - } - - folio_get(folio); - ubuf->folios[*pgbuf] = folio; - ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT; - (*pgbuf)++; - if (++subpgoff == maxsubpgs) { - folio_put(folio); - folio = NULL; - subpgoff = 0; - mapidx += pages_per_huge_page(hpstate); - } - } - - if (folio) - folio_put(folio); - - return 0; -} - -static int handle_shmem_pages(struct udmabuf *ubuf, struct file *memfd, - pgoff_t offset, pgoff_t pgcnt, - pgoff_t *pgbuf) -{ - pgoff_t pgidx, pgoff = offset >> PAGE_SHIFT; - struct folio *folio = NULL; - - for (pgidx = 0; pgidx < pgcnt; pgidx++) { - folio = shmem_read_folio(memfd->f_mapping, pgoff + pgidx); - if (IS_ERR(folio)) - return PTR_ERR(folio); - - ubuf->folios[*pgbuf] = folio; - (*pgbuf)++; - } - - return 0; -} - static int check_memfd_seals(struct file *memfd) { int seals; @@ -321,16 +295,19 @@ static long udmabuf_create(struct miscdevice *device, struct udmabuf_create_list *head, struct udmabuf_create_item *list) { - pgoff_t pgcnt, pgbuf = 0, pglimit; + pgoff_t pgoff, pgcnt, pglimit, pgbuf = 0; + 
long nr_folios, ret = -EINVAL; struct file *memfd = NULL; + struct folio **folios; struct udmabuf *ubuf; - int ret = -EINVAL; - u32 i, flags; + u32 i, j, k, flags; + loff_t end; ubuf = kzalloc(sizeof(*ubuf), GFP_KERNEL); if (!ubuf) return -ENOMEM; + INIT_LIST_HEAD(&ubuf->unpin_list); pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT; for (i = 0; i < head->count; i++) { if (!IS_ALIGNED(list[i].offset, PAGE_SIZE)) @@ -366,17 +343,46 @@ static long udmabuf_create(struct miscdevice *device, goto err; pgcnt = list[i].size >> PAGE_SHIFT; - if (is_file_hugepages(memfd)) - ret = handle_hugetlb_pages(ubuf, memfd, - list[i].offset, - pgcnt, &pgbuf); - else - ret = handle_shmem_pages(ubuf, memfd, - list[i].offset, - pgcnt, &pgbuf); - if (ret < 0) + folios = kmalloc_array(pgcnt, sizeof(*folios), GFP_KERNEL); + if (!folios) { + ret = -ENOMEM; goto err; + } + + end = list[i].offset + (pgcnt << PAGE_SHIFT) - 1; + ret = memfd_pin_folios(memfd, list[i].offset, end, + folios, pgcnt, &pgoff); + if (ret <= 0) { + kfree(folios); + if (!ret) + ret = -EINVAL; + goto err; + } + + nr_folios = ret; + pgoff >>= PAGE_SHIFT; + for (j = 0, k = 0; j < pgcnt; j++) { + ubuf->folios[pgbuf] = folios[k]; + ubuf->offsets[pgbuf] = pgoff << PAGE_SHIFT; + + if (j == 0 || ubuf->folios[pgbuf-1] != folios[k]) { + ret = add_to_unpin_list(&ubuf->unpin_list, + folios[k]); + if (ret < 0) { + kfree(folios); + goto err; + } + } + + pgbuf++; + if (++pgoff == folio_nr_pages(folios[k])) { + pgoff = 0; + if (++k == nr_folios) + break; + } + } + kfree(folios); fput(memfd); memfd = NULL; } @@ -389,10 +395,9 @@ static long udmabuf_create(struct miscdevice *device, return ret; err: - while (pgbuf > 0) - folio_put(ubuf->folios[--pgbuf]); if (memfd) fput(memfd); + unpin_all_folios(&ubuf->unpin_list); kfree(ubuf->offsets); kfree(ubuf->folios); kfree(ubuf); -- cgit v1.2.3 From 823430c8e9d98c5865af518c782d0493b76aa511 Mon Sep 17 00:00:00 2001 From: "Ho-Ren (Jack) Chuang" Date: Thu, 4 Jul 2024 07:26:44 +0000 Subject: 
memory tier: consolidate the initialization of memory tiers The current memory tier initialization process is split across two functions, memory_tier_init() and memory_tier_late_init(). This design is hard to maintain, so this patch reduces the number of possible code paths by consolidating the separate initialization paths into one. The earlier discussion with Jonathan and Ying is here: https://lore.kernel.org/lkml/20240405150244.00004b49@Huawei.com/ If the two initializations are to be combined, they must both run in the later function, because only then is the HMAT information ready, so that the abstract distance (adist) between nodes can be calculated and memory tiering established from it. We therefore move the initialization done in memory_tier_init() into the memory_tier_late_init() call. Moreover, it's natural to keep memory_tier initialization in drivers at device_initcall() level. Simply moving set_node_memory_tier() from memory_tier_init() to late_initcall() would prevent HMAT from registering the mt_adistance_algorithm callback: set_node_memory_tier() would not have run during the memory tiering initialization phase, so the correct default_dram information would be missing. Therefore, we introduce a nodemask to pass the default DRAM nodes. We chose not to reuse default_dram_type->nodes because it is not clean enough. In the end, we use an __initdata variable, which is released once initialization is complete and includes both CPU and memory nodes, for HMAT to iterate through. Link: https://lkml.kernel.org/r/20240704072646.437579-1-horen.chuang@linux.dev Signed-off-by: Ho-Ren (Jack) Chuang Suggested-by: Jonathan Cameron Reviewed-by: "Huang, Ying" Reviewed-by: Jonathan Cameron Cc: Alistair Popple Cc: Aneesh Kumar K.V Cc: Dan Williams Cc: Dave Jiang Cc: Gregory Price Cc: Len Brown Cc: Michal Hocko Cc: Rafael J.
Wysocki Cc: Ravi Jonnalagadda Cc: SeongJae Park Cc: Tejun Heo Signed-off-by: Andrew Morton --- drivers/acpi/numa/hmat.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'drivers') diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c index 2c8ccc91ebe6..a2f9e7a4b479 100644 --- a/drivers/acpi/numa/hmat.c +++ b/drivers/acpi/numa/hmat.c @@ -940,10 +940,7 @@ static int hmat_set_default_dram_perf(void) struct memory_target *target; struct access_coordinate *attrs; - if (!default_dram_type) - return -EIO; - - for_each_node_mask(nid, default_dram_type->nodes) { + for_each_node_mask(nid, default_dram_nodes) { pxm = node_to_pxm(nid); target = find_mem_target(pxm); if (!target) -- cgit v1.2.3
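The hmat.c change above iterates a standalone node mask instead of a field of default_dram_type. As a rough userspace illustration of that pattern, here is a minimal sketch that assumes a 64-bit bitmap in place of the kernel's nodemask_t. `model_nodemask_t`, `model_node_set()` and `model_collect_nodes()` are hypothetical names, not kernel API; the kernel uses for_each_node_mask() for the iteration.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_NODES 64

/* Hypothetical model of a node mask: bit nid is set when node nid is present. */
typedef uint64_t model_nodemask_t;

/* Mark node nid as a member of the mask; out-of-range ids are ignored. */
static void model_node_set(model_nodemask_t *mask, unsigned int nid)
{
	if (nid < MODEL_MAX_NODES)
		*mask |= (model_nodemask_t)1 << nid;
}

/*
 * Visit every node set in the mask in ascending order, storing at most
 * cap node ids in out. Returns the number of ids stored.
 */
static size_t model_collect_nodes(model_nodemask_t mask, unsigned int *out,
				  size_t cap)
{
	size_t n = 0;
	unsigned int nid;

	for (nid = 0; nid < MODEL_MAX_NODES && n < cap; nid++) {
		if (mask & ((model_nodemask_t)1 << nid))
			out[n++] = nid;
	}

	return n;
}
```

Because the mask is self-contained, a consumer such as the HMAT code can walk it without first checking that another object (here, default_dram_type) has been allocated, which is why the NULL check could be dropped in the diff.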