From d187a86de793f84766ea40b9ade7ac60aabbb4fe Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 30 Mar 2026 10:39:09 -0700 Subject: x86-64: rename misleadingly named '__copy_user_nocache()' function This function was a masterclass in bad naming, for various historical reasons. It claimed to be a non-cached user copy. It is literally _neither_ of those things. It's a specialty memory copy routine that uses non-temporal stores for the destination (but not the source), and that does exception handling for both source and destination accesses. Also note that while it works for unaligned targets, any unaligned parts (whether at beginning or end) will not use non-temporal stores, since only words and quadwords can be non-temporal on x86. The exception handling means that it _can_ be used for user space accesses, but not on its own - it needs all the normal "start user space access" logic around it. But typically the user space access would be the source, not the non-temporal destination. That was the original intention of this, where the destination was some fragile persistent memory target that needed non-temporal stores in order to catch machine check exceptions synchronously and deal with them gracefully. Thus that non-descriptive name: one use case was to copy from user space into a non-cached kernel buffer. However, the existing users are a mix of that intended use-case, and a couple of random drivers that just did this as a performance tweak. Some of those random drivers then actively misused the user copying version (with STAC/CLAC and all) to do kernel copies without ever even caring about the exception handling, _just_ for the non-temporal destination. Rename it as a first small step to actually make it halfway sane, and change the prototype to be more normal: it doesn't take a user pointer unless the caller has done the proper conversion, and the argument size is the full size_t (it still won't actually copy more than 4GB in one go, but there's also no reason to silently truncate the size argument in the caller). Finally, use this now sanely named function in the NTB code, which mis-used a user copy version (with STAC/CLAC and all) of this interface despite it not actually being a user copy at all. Signed-off-by: Linus Torvalds --- drivers/infiniband/sw/rdmavt/qp.c | 8 +++----- drivers/ntb/ntb_transport.c | 7 ++++--- 2 files changed, 7 insertions(+), 8 deletions(-) (limited to 'drivers') diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c index 134a79eecfcb..3467797b5b01 100644 --- a/drivers/infiniband/sw/rdmavt/qp.c +++ b/drivers/infiniband/sw/rdmavt/qp.c @@ -92,12 +92,10 @@ static int rvt_wss_llc_size(void) static void cacheless_memcpy(void *dst, void *src, size_t n) { /* - * Use the only available X64 cacheless copy. Add a __user cast - * to quiet sparse. The src agument is already in the kernel so - * there are no security issues. The extra fault recovery machinery - * is not invoked. + * Use the only available X64 cacheless copy. + * The extra fault recovery machinery is not invoked. */ - __copy_user_nocache(dst, (void __user *)src, n); + copy_to_nontemporal(dst, src, n); } void rvt_wss_exit(struct rvt_dev_info *rdi) diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c index 71d4bb25f7fd..21e1d238bc31 100644 --- a/drivers/ntb/ntb_transport.c +++ b/drivers/ntb/ntb_transport.c @@ -1810,12 +1810,13 @@ static void ntb_tx_copy_callback(void *data, static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset) { -#ifdef ARCH_HAS_NOCACHE_UACCESS +#ifdef copy_to_nontemporal /* * Using non-temporal mov to improve performance on non-cached - * writes, even though we aren't actually copying from user space. + * writes. This only works if __iomem is strictly memory-like, + * but that is the case on x86-64 */ - __copy_from_user_inatomic_nocache(offset, entry->buf, entry->len); + copy_to_nontemporal(offset, entry->buf, entry->len); #else memcpy_toio(offset, entry->buf, entry->len); #endif -- cgit v1.2.3 From 5de7bcaadf160c1716b20a263cf8f5b06f658959 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 30 Mar 2026 13:11:07 -0700 Subject: x86: rename and clean up __copy_from_user_inatomic_nocache() Similarly to the previous commit, this renames the somewhat confusingly named function. But in this case, it was at least less confusing: the __copy_from_user_inatomic_nocache is indeed copying from user memory, and it is indeed ok to be used in an atomic context, so it will not warn about it. But the previous commit also removed the NTB mis-use of the __copy_from_user_inatomic_nocache() function, and as a result every call-site is now _actually_ doing a real user copy. That means that we can now do the proper user pointer verification too. End result: add proper address checking, remove the double underscores, and change the "nocache" to "nontemporal" to more accurately describe what this x86-only function actually does. It might be worth noting that only the target is non-temporal: the actual user accesses are normal memory accesses. Also worth noting is that non-x86 targets (and on older 32-bit x86 CPU's before XMM2 in the Pentium III) we end up just falling back on a regular user copy, so nothing can actually depend on the non-temporal semantics, but that has always been true. Signed-off-by: Linus Torvalds --- drivers/gpu/drm/i915/i915_gem.c | 2 +- drivers/gpu/drm/qxl/qxl_ioctl.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'drivers') diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 4c82c9544b93..72fe91ed1c74 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -520,7 +520,7 @@ ggtt_write(struct io_mapping *mapping, /* We can use the cpu mem copy function because this is X86. */ vaddr = io_mapping_map_atomic_wc(mapping, base); - unwritten = __copy_from_user_inatomic_nocache((void __force *)vaddr + offset, + unwritten = copy_from_user_inatomic_nontemporal((void __force *)vaddr + offset, user_data, length); io_mapping_unmap_atomic(vaddr); if (unwritten) { diff --git a/drivers/gpu/drm/qxl/qxl_ioctl.c b/drivers/gpu/drm/qxl/qxl_ioctl.c index 336cbff26089..26545a08cdf7 100644 --- a/drivers/gpu/drm/qxl/qxl_ioctl.c +++ b/drivers/gpu/drm/qxl/qxl_ioctl.c @@ -184,7 +184,7 @@ static int qxl_process_single_command(struct qxl_device *qdev, /* TODO copy slow path code from i915 */ fb_cmd = qxl_bo_kmap_atomic_page(qdev, cmd_bo, (release->release_offset & PAGE_MASK)); - unwritten = __copy_from_user_inatomic_nocache + unwritten = copy_from_user_inatomic_nontemporal (fb_cmd + sizeof(union qxl_release_info) + (release->release_offset & ~PAGE_MASK), u64_to_user_ptr(cmd->command), cmd->command_size); -- cgit v1.2.3