summaryrefslogtreecommitdiff
path: root/drivers/infiniband/core
AgeCommit message (Collapse)Author
7 daysRDMA/core: Fix rereg_mr use-after-free raceMichael Guralnik
When a driver creates a new MR during rereg_user_mr, a race window exists between rdma_alloc_commit_uobject() for the new MR and the point where the code reads that MR to populate the response keys. A concurrent rereg_mr or destroy_mr could destroy the MR in this window and cause UAF in the first thread. Racing flow between two rereg_mr calls: CPU0 CPU1 ---- ---- rereg_user_mr(mr_handle) uobj_get_write(mr_handle) -> mr0 mr1 = driver→rereg() rdma_alloc_commit_uobject(mr1) // mr1 replaced mr0 and is unlocked uobj_put_destroy(mr0) rereg_user_mr(mr_handle) uobj_get_write(mr_handle) -> mr1 mr2 = driver→rereg() rdma_alloc_commit_uobject(mr2) // mr2 replaced mr1 and is unlocked uobj_put_destroy(mr1) // Destroys mr1! resp.lkey = mr1->lkey; // UAF - mr1 was freed! resp.rkey = mr1->rkey; // UAF - mr1 was freed! Fix by storing lkey/rkey in local variables before the new MR is unlocked and using the local variables to set the user response. Fixes: 6e0954b11c05 ("RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-4-4621fa52de0e@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
7 daysIB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg()Maher Sanalla
When resolving an RDMA-CM IPv6 address, ib_nl_ip_send_msg() sends a netlink request to the userspace daemon to perform IP-to-GID resolution in certain cases. The function allocates the netlink message buffer using nla_total_size(sizeof(size)), which passes 8 bytes (the size of size_t) instead of 16 bytes (the size of an IPv6 address). This results in an 8-byte under-allocation. This is currently masked by nlmsg_new() over-allocation of the skb in its internal logic. However, the code remains incorrect. Fix the issue by supplying the proper IPv6 address length to nla_total_size(). Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-3-4621fa52de0e@nvidia.com Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "The usual collection of driver changes, more core infrastructure updates that typical this cycle: - Minor cleanups and kernel-doc fixes in bnxt_re, hns, rdmavt, efa, ocrdma, erdma, rtrs, hfi1, ionic, and pvrdma - New udata validation framework and driver updates - Modernize CQ creation interface in mlx4 and mlx5, manage CQ umem in core - Promote UMEM to a core component, split out DMA block iterator logic - Introduce FRMR pools with aging, statistics, pinned handles, and netlink control and use it in mlx5 - Add PCIe TLP emulation support in mlx5 - Extend umem to work with revocable pinned dmabuf's and use it in irdma - More net namespace improvements for rxe - GEN4 hardware support in irdma - First steps to MW and UC support in mana_ib - Support for CQ umem and doorbells in bnxt_re - Drop opa_vnic driver from hfi1 Fixes: - IB/core zero dmac neighbor resolution race - GID table memory free - rxe pad/ICRC validation and r_key async errors - mlx4 external umem for CQ - umem DMA attributes on unmap - mana_ib RX steering on RSS QP destroy" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (116 commits) RDMA/core: Fix user CQ creation for drivers without create_cq RDMA/ionic: bound node_desc sysfs read with %.64s IB/core: Fix zero dmac race in neighbor resolution RDMA/mana_ib: Support memory windows RDMA/rxe: Validate pad and ICRC before payload_size() in rxe_rcv RDMA/core: Prefer NLA_NUL_STRING RDMA/core: Fix memory free for GID table RDMA/hns: Remove the duplicate calls to ib_copy_validate_udata_in() RDMA: Remove redundant = {} for udata req structs RDMA/irdma: Add missing comp_mask check in alloc_ucontext RDMA/hns: Add missing comp_mask check in create_qp RDMA/mlx5: Pull comp_mask validation into ib_copy_validate_udata_in_cm() RDMA: Use ib_copy_validate_udata_in_cm() for zero comp_mask RDMA/hns: Use ib_copy_validate_udata_in() RDMA/mlx4: Use ib_copy_validate_udata_in() for QP RDMA/mlx4: Use ib_copy_validate_udata_in() RDMA/mlx5: Use ib_copy_validate_udata_in() for MW RDMA/mlx5: Use ib_copy_validate_udata_in() for SRQ RDMA/pvrdma: Use ib_copy_validate_udata_in() for srq RDMA: Use ib_copy_validate_udata_in() for implicit full structs ...
2026-04-17RDMA/core: Fix user CQ creation for drivers without create_cqMichael Margolin
CQ creation is failing for drivers that only implement create_user_cq (e.g. EFA), when buffer isn't provided by userspace. This because of a leftover check that requires create_cq existence in such case. Remove the create_cq existence check from the no-buffer path. The buffer is optional and drivers that handle their own memory should work through create_user_cq regardless. Fixes: 584ec74748e6 ("RDMA/core: Prepare create CQ path for API unification") Link: https://patch.msgid.link/r/20260416201408.13980-1-mrgolin@amazon.com Signed-off-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-15Merge tag 'mm-stable-2026-04-13-21-45' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
2026-04-15Merge tag 'drm-next-2026-04-15' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "Highlights: - new DRM RAS infrastructure using netlink - amdgpu: enable DC on CIK APUs, and more IP enablement, and more user queue work - xe: purgeable BO support, and new hw enablement - dma-buf : add revocable operations Full summary: mm: - two-pass MMU interval notifiers - add gpu active/reclaim per-node stat counters math: - provide __KERNEL_DIV_ROUND_CLOSEST() in UAPI - implement DIV_ROUND_CLOSEST() with __KERNEL_DIV_ROUND_CLOSEST() rust: - shared tag with driver-core: register macro and io infra - core: rework DMA coherent API - core: add interop::list to interop with C linked lists - core: add more num::Bounded operations - core: enable generic_arg_infer and add EMSGSIZE - workqueue: add ARef<T> support for work and delayed work - add GPU buddy allocator abstraction - add DRM shmem GEM helper abstraction - allow drm:::Device to dispatch work and delayed work items to driver private data - add dma_resv_lock helper and raw accessors core: - introduce DRM RAS infrastructure over netlink - add connector panel_type property - fourcc: add ARM interleaved 64k modifier - colorop: add destroy helper - suballoc: split into alloc and init helpers - mode: provide DRM_ARGB_GET*() macros for reading color components edid: - provide drm_output_color_Format dma-buf: - provide revoke mechanism for shared buffers - rename move_notify to invalidate_mappings - always enable move_notify - protect dma_fence_ops with RCU and improve locking - clean pages with helpers atomic: - allocate drm_private_state via callback - helper: use system_percpu_wq buddy: - make buddy allocator available to gpu level - add kernel-doc for buddy allocator - improve aligned allocation ttm: - fix fence signalling - improve tests and docs - improve handling of gfp_retry_mayfail - use per-node stat counters to track memory allocations - port pool to use list_lru - drop NUMA specific pools - make pool shrinker numa aware - track allocated pages per numa node coreboot: - cleanup coreboot framebuffer support sched: - fix race condition in drm_sched_fini pagemap: - enable THP support - pass pagemap_addr by reference gem-shmem: - Track page accessed/dirty status across mmap/vmap gpusvm: - reenable device to device migration - fix unbalanced unclock bridge: - anx7625: Support USB-C plus DT bindings - connector: Fix EDID detection - dw-hdmi-qp: Support Vendor-Specfic and SDP Infoframes; improve others - fsl-ldb: Fix visual artifacts plus related DT property 'enable-termination-resistor' - imx8qxp-pixel-link: Improve bridge reference handling - lt9611: Support Port-B-only input plus DT bindings - tda998x: Support DRM_BRIDGE_ATTACH_NO_CONNECTOR; Clean up - Support TH1520 HDMI plus DT bindings - waveshare-dsi: Fix register and attach; Support 1..4 DSI lanes plus DT bindings - anx7625: Fix USB Type-C handling - cdns-mhdp8546-core: Handle HDCP state in bridge atomic_check - Support Lontium LT8713SX DP MST bridge plus DT bindings - analogix_dp: Use DP helpers for link training panel: - panel-jdi-lt070me05000: Use mipi-dsi multi functions - panel-edp: Support Add AUO B116XAT04.1 (HW: 1A); Support CMN N116BCL-EAK (C2); Support FriendlyELEC plus DT changes - panel-edp: Fix timings for BOE NV140WUM-N64 - ilitek-ili9882t: Allow GPIO calls to sleep - jadard: Support TAIGUAN XTI05101-01A - lxd: Support LXD M9189A plus DT bindings - mantix: Fix pixel clock; Clean up - motorola: Support Motorola Atrix 4G and Droid X2 plus DT bindings - novatek: Support Novatek/Tianma NT37700F plus DT bindings - simple: Support EDT ET057023UDBA plus DT bindings; Support Powertip PH800480T032-ZHC19 plus DT bindings; Support Waveshare 13.3" - novatek-nt36672a: Use mipi_dsi_*_multi() functions - panel-edp: Support BOE NV153WUM-N42, CMN N153JCA-ELK, CSW MNF307QS3-2 - support Himax HX83121A plus DT bindings - support JuTouch JT070TM041 plus DT bindings - support Samsung S6E8FC0 plus DT bindings - himax-hx83102c: support Samsung S6E8FC0 plus DT bindings; support backlight - ili9806e: support Rocktech RK050HR345-CT106A plus DT bindings - simple: support Tianma TM050RDH03 plus DT bindings amdgpu: - enable DC by default on CIK APUs - userq fence ioctl param size fixes - set panel_type to OLED for eDP - refactor DC i2c code - FAMS2 update - rework ttm handling to allow multiple engines - DC DCE 6.x cleanup - DC support for NUTMEG/TRAVIS DP bridge - DCN 4.2 support - GC12 idle power fix for compute - use struct drm_edid in non-DC code - enable NV12/P010 support on primary planes - support newer IP discovery tables - VCN/JPEG 5.0.2 support - GC/MES 12.1 updates - USERQ fixes - add DC idle state manager - eDP DSC seamless boot amdkfd: - GC 12.1 updates - non 4K page fixes xe: - basic Xe3p_LPG and NVL-P enabling patches - allow VM_BIND decompress support - add purgeable buffer object support - add xe_vm_get_property_ioctl - restrict multi-lrc to VCS/VECS engines - allow disabling VM overcommit in fault mode - dGPU memory optimizations - Workaround cleanups and simplification - Allow VFs VRAM quote changes using sysfs - convert GT stats to per-cpu counters - pagefault refactors - enable multi-queue on xe3p_xpc - disable DCC on PTL - make MMIO communication more robust - disable D3Cold for BMG on specific platforms - vfio: improve FLR sync for Xe VFIO i915/display: - C10/C20/LT PHY PLL divider verification - use trans push mechanism to generate PSR frame change on LNL+ - refactor DP DSC slice config - VGA decode refactoring - refactor DPT, gen2-4 overlay, masked field register macro helpers - refactor stolen memory allocation decisions - prepare for UHBR DP tunnels - refactor LT PHY PLL to use DPLL framework - implement register polling/waiting in display code - add shared stepping header between i915 and display i915: - fix potential overflow of shmem scatterlist length nouveau: - provide Z cull info to userspace - initial GA100 support - shutdown on PCI device shutdown nova-core: - harden GSP command queue - add support for large RPCs - simplify GSP sequencer and message handling - refactor falcon firmware handling - convert to new register macro - conver to new DMA coherent API - use checked arithmetic - add debugfs support for gsp-rm log buffers - fix aux device registration for multi-GPU msm: - CI: - Uprev mesa - Restore CI jobs for Qualcomm APQ8016 and APQ8096 devices - Core: - Switched to of_get_available_child_by_name() - DPU: - Fixes for DSC panels - Fixed brownout because of the frequency / OPP mismatch - Quad pipe preparation (not enabled yet) - Switched to virtual planes by default - Dropped VBIF_NRT support - Added support for Eliza platform - Reworked alpha handling - Switched to correct CWB definitions on Eliza - Dropped dummy INTF_0 on MSM8953 - Corrected INTFs related to DP-MST - DP: - Removed debug prints looking into PHY internals - DSI: - Fixes for DSC panels - RGB101010 support - Support for SC8280XP - Moved PHY bindings from display/ to phy/ - GPU: - Preemption support for x2-85 and a840 - IFPC support for a840 - SKU detection support for x2-85 and a840 - Expose AQE support (VK ray-pipeline) - Avoid locking in VM_BIND fence signaling path - Fix to avoid reclaim in GPU snapshot path - Disallow foreign mapping of _NO_SHARE BOs - HDMI: - Fixed infoframes programming - MDP5: - Dropped support for MSM8974v1 - Dropped now unused code for MSM8974 v1 and SDM660 / MSM8998 panthor: - add tracepoints for power and IRQs - fix fence handling - extend timestamp query with flags - support various sources for timestamp queries tyr: - fix names and model/versions rockchip: - vop2: use drm logging function - rk3576 displayport support - support CRTC background color atmel-hlcdc: - support sana5d65 LCD controller tilcdc: - use DT bindings schema - use managed DRM interfaces - support DRM_BRIDGE_ATTACH_NO_CONNECTOR verisilicon: - support DC8200 + DT bindings virtgpu: - support PRIME import with 3D enabled komeda: - fix integer overflow in AFBC checks mcde: - improve bridge handling gma500: - use drm client buffer for fbdev framebuffer amdxdna: - add sensors ioctls - provide NPU power estimate - support column utilization sensor - allow forcing DMA through IOMMU IOVA - support per-BO mem usage queries - refactor GEM implementation ivpu: - update boot API to v3.29.4 - limit per-user number of doorbells/contexts - perform engine reset on TDR error loongson: - replace custom code with drm_gem_ttm_dumb_map_offset() imx: - support planes behind the primary plane - fix bus-format selection vkms: - support CRTC background color v3d: - improve handling of struct v3d_stats komeda: - support Arm China Linlon D6 plus DT bindings imagination: - improve power-off sequence - support context-reset notification from firmware mediatek: - mtk_dsi: enable hs clock during pre-enable - Remove all conflicting aperture devices during probe - Add support for mt8167 display blocks" * tag 'drm-next-2026-04-15' of https://gitlab.freedesktop.org/drm/kernel: (1735 commits) drm/ttm/tests: Remove checks from ttm_pool_free_no_dma_alloc drm/ttm/tests: fix lru_count ASSERT drm/vram: remove DRM_VRAM_MM_FILE_OPERATIONS from docs drm/fb-helper: Fix a locking bug in an error path dma-fence: correct kernel-doc function parameter @flags ttm/pool: track allocated_pages per numa node. ttm/pool: make pool shrinker NUMA aware (v2) ttm/pool: drop numa specific pools ttm/pool: port to list_lru. (v2) drm/ttm: use gpu mm stats to track gpu memory allocations. (v4) mm: add gpu active/reclaim per-node stat counters (v2) gpu: nova-core: fix missing colon in SEC2 boot debug message gpu: nova-core: vbios: use from_le_bytes() for PCI ROM header parsing gpu: nova-core: bitfield: fix broken Default implementation gpu: nova-core: falcon: pad firmware DMA object size to required block alignment gpu: nova-core: gsp: fix undefined behavior in command queue code drm/shmem_helper: Make sure PMD entries get the writeable upgrade accel/ivpu: Trigger recovery on TDR with OS scheduling drm/msm: Use of_get_available_child_by_name() dt-bindings: display/msm: move DSI PHY bindings to phy/ subdir ...
2026-04-14Merge tag 'net-next-7.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Support HW queue leasing, allowing containers to be granted access to HW queues for zero-copy operations and AF_XDP - Number of code moves to help the compiler with inlining. Avoid output arguments for returning drop reason where possible - Rework drop handling within qdiscs to include more metadata about the reason and dropping qdisc in the tracepoints - Remove the rtnl_lock use from IP Multicast Routing - Pack size information into the Rx Flow Steering table pointer itself. This allows making the table itself a flat array of u32s, thus making the table allocation size a power of two - Report TCP delayed ack timer information via socket diag - Add ip_local_port_step_width sysctl to allow distributing the randomly selected ports more evenly throughout the allowed space - Add support for per-route tunsrc in IPv6 segment routing - Start work of switching sockopt handling to iov_iter - Improve dynamic recvbuf sizing in MPTCP, limit burstiness and avoid buffer size drifting up - Support MSG_EOR in MPTCP - Add stp_mode attribute to the bridge driver for STP mode selection. This addresses concerns about call_usermodehelper() usage - Remove UDP-Lite support (as announced in 2023) - Remove support for building IPv6 as a module. Remove the now unnecessary function calling indirection Cross-tree stuff: - Move Michael MIC code from generic crypto into wireless, it's considered insecure but some WiFi networks still need it Netfilter: - Switch nft_fib_ipv6 module to no longer need temporary dst_entry object allocations by using fib6_lookup() + RCU. Florian W reports this gets us ~13% higher packet rate - Convert IPVS's global __ip_vs_mutex to per-net service_mutex and switch the service tables to be per-net. Convert some code that walks the service lists to use RCU instead of the service_mutex - Add more opinionated input validation to lower security exposure - Make IPVS hash tables to be per-netns and resizable Wireless: - Finished assoc frame encryption/EPPKE/802.1X-over-auth - Radar detection improvements - Add 6 GHz incumbent signal detection APIs - Multi-link support for FILS, probe response templates and client probing - New APIs and mac80211 support for NAN (Neighbor Aware Networking, aka Wi-Fi Aware) so less work must be in firmware Driver API: - Add numerical ID for devlink instances (to avoid having to create fake bus/device pairs just to have an ID). Support shared devlink instances which span multiple PFs - Add standard counters for reporting pause storm events (implement in mlx5 and fbnic) - Add configuration API for completion writeback buffering (implement in mana) - Support driver-initiated change of RSS context sizes - Support DPLL monitoring input frequency (implement in zl3073x) - Support per-port resources in devlink (implement in mlx5) Misc: - Expand the YAML spec for Netfilter Drivers - Software: - macvlan: support multicast rx for bridge ports with shared source MAC address - team: decouple receive and transmit enablement for IEEE 802.3ad LACP "independent control" - Ethernet high-speed NICs: - nVidia/Mellanox: - support high order pages in zero-copy mode (for payload coalescing) - support multiple packets in a page (for systems with 64kB pages) - Broadcom 25-400GE (bnxt): - implement XDP RSS hash metadata extraction - add software fallback for UDP GSO, lowering the IOMMU cost - Broadcom 800GE (bnge): - add link status and configuration handling - add various HW and SW statistics - Marvell/Cavium: - NPC HW block support for cn20k - Huawei (hinic3): - add mailbox / control queue - add rx VLAN offload - add driver info and link management - Ethernet NICs: - Marvell/Aquantia: - support reading SFP module info on some AQC100 cards - Realtek PCI (r8169): - add support for RTL8125cp - Realtek USB (r8152): - support for the RTL8157 5Gbit chip - add 2500baseT EEE status/configuration support - Ethernet NICs embedded and off-the-shelf IP: - Synopsys (stmmac): - cleanup and reorganize SerDes handling and PCS support - cleanup descriptor handling and per-platform data - cleanup and consolidate MDIO defines and handling - shrink driver memory use for internal structures - improve Tx IRQ coalescing - improve TCP segmentation handling - add support for Spacemit K3 - Cadence (macb): - support PHYs that have inband autoneg disabled with GEM - support IEEE 802.3az EEE - rework usrio capabilities and handling - AMD (xgbe): - improve power management for S0i3 - improve TX resilience for link-down handling - Virtual: - Google cloud vNIC: - support larger ring sizes in DQO-QPL mode - improve HW-GRO handling - support UDP GSO for DQO format - PCIe NTB: - support queue count configuration - Ethernet PHYs: - automatically disable PHY autonomous EEE if MAC is in charge - Broadcom: - add BCM84891/BCM84892 support - Micrel: - support for LAN9645X internal PHY - Realtek: - add RTL8224 pair order support - support PHY LEDs on RTL8211F-VD - support spread spectrum clocking (SSC) - Maxlinear: - add PHY-level statistics via ethtool - Ethernet switches: - Maxlinear (mxl862xx): - support for bridge offloading - support for VLANs - support driver statistics - Bluetooth: - large number of fixes and new device IDs - Mediatek: - support MT6639 (MT7927) - support MT7902 SDIO - WiFi: - Intel (iwlwifi): - UNII-9 and continuing UHR work - MediaTek (mt76): - mt7996/mt7925 MLO fixes/improvements - mt7996 NPU support (HW eth/wifi traffic offload) - Qualcomm (ath12k): - monitor mode support on IPQ5332 - basic hwmon temperature reporting - support IPQ5424 - Realtek: - add USB RX aggregation to improve performance - add USB TX flow control by tracking in-flight URBs - Cellular: - IPA v5.2 support" * tag 'net-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1561 commits) net: pse-pd: fix kernel-doc function name for pse_control_find_by_id() wireguard: device: use exit_rtnl callback instead of manual rtnl_lock in pre_exit wireguard: allowedips: remove redundant space tools: ynl: add sample for wireguard wireguard: allowedips: Use kfree_rcu() instead of call_rcu() MAINTAINERS: Add netkit selftest files selftests/net: Add additional test coverage in nk_qlease selftests/net: Split netdevsim tests from HW tests in nk_qlease tools/ynl: Make YnlFamily closeable as a context manager net: airoha: Add missing PPE configurations in airoha_ppe_hw_init() net: airoha: Fix VIP configuration for AN7583 SoC net: caif: clear client service pointer on teardown net: strparser: fix skb_head leak in strp_abort_strp() net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete() selftests/bpf: add test for xdp_master_redirect with bond not up net, bpf: fix null-ptr-deref in xdp_master_redirect() for down master net: airoha: Remove PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration sctp: disable BH before calling udp_tunnel_xmit_skb() sctp: fix missing encap_port propagation for GSO fragments net: airoha: Rely on net_device pointer in ETS callbacks ...
2026-04-09IB/core: Fix zero dmac race in neighbor resolutionChen Zhao
dst_fetch_ha() checks nud_state without holding the neighbor lock, then copies ha under the seqlock. A race in __neigh_update() where nud_state is set to NUD_REACHABLE before ha is written allows dst_fetch_ha() to read a zero MAC address while the seqlock reports no concurrent writer. netevent_callback amplifies this by waking ALL pending addr_req workers when ANY neighbor becomes NUD_VALID. At scale (N peers resolving ARP concurrently), the hit probability scales as N^2, making it near-certain for large RDMA workloads. N(A): neigh_update(A) W(A): addr_resolve(A) | [sleep] | write_lock_bh(&A->lock) | | A->nud_state = NUD_REACHABLE | | // A->ha is still 0 | | [woken by netevent_cb() of | another neighbour] | | dst_fetch_ha(A) | | A->nud_state & NUD_VALID | | read_seqbegin(&A->ha_lock) | | snapshot = A->ha /* 0 */ | | read_seqretry(&A->ha_lock) | | return snapshot | seqlock(&A->ha_lock) | A->ha = mac_A /* too late */ | sequnlock(&A->ha_lock) | write_unlock_bh(&A->lock) The incorrect/zero mac is read and programmed in the device QP while it was not yet updated. This causes silent packet loss and eventual RETRY_EXC_ERR. Fix by holding the neighbor read lock across the nud_state check and ha copy in dst_fetch_ha(), ensuring it synchronizes with __neigh_update() which is updating while holding the write lock. Cc: stable@vger.kernel.org Fixes: 92ebb6a0a13a ("IB/cm: Remove now useless rcu_lock in dst_fetch_ha") Link: https://patch.msgid.link/r/20260405-fix-dmac-race-v1-1-cfa1ec2ce54a@nvidia.com Signed-off-by: Chen Zhao <chezhao@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-09RDMA/core: Prefer NLA_NUL_STRINGFlorian Westphal
These attributes are evaluated as c-string (passed to strcmp), but NLA_STRING doesn't check for the presence of a \0 terminator. Either this needs to switch to nla_strcmp() and needs to adjust printf fmt specifier to not use plain %s, or this needs to use NLA_NUL_STRING. As the code has been this way for long time, it seems to me that userspace does include the terminating nul, even tough its not enforced so far, and thus NLA_NUL_STRING use is the simpler solution. Fixes: 30dc5e63d6a5 ("RDMA/core: Add support for iWARP Port Mapper user space service") Link: https://patch.msgid.link/r/20260330122742.13315-1-fw@strlen.de Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-09kernfs: pass struct ns_common instead of const void * for namespace tagsChristian Brauner
kernfs has historically used const void * to pass around namespace tags used for directory-level namespace filtering. The only current user of this is sysfs network namespace tagging where struct net pointers are cast to void *. Replace all const void * namespace parameters with const struct ns_common * throughout the kernfs, sysfs, and kobject namespace layers. This includes the kobj_ns_type_operations callbacks, kobject_namespace(), and all sysfs/kernfs APIs that accept or return namespace tags. Passing struct ns_common is needed because various codepaths require access to the underlying namespace. A struct ns_common can always be converted back to the concrete namespace type (e.g., struct net) via container_of() or to_ns_common() in the reverse direction. This is a preparatory change for switching to ns_id-based directory iteration to prevent a KASLR pointer leak through the current use of raw namespace pointers as hash seeds and comparison keys. Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-07RDMA/core: Fix memory free for GID tablezhenwei pi
When removing a RXE device, kernel oops: RIP: 0010:free_large_kmalloc+0xf6/0x140 Code: 75 28 0f 0b 44 0f b6 2d a5 d6 d1 01 41 80 fd 01 0f 87 7c d1 ad ff 41 83 e5 01 74 3d 41 bc 00 f0 ff ff 45 31 ed e9 61 ff ff ff <0f> 0b 48 c7 c6 af b1 70 83 48 89 df e8 79 0a fa ff 5b 41 5c 41 5d RSP: 0018:ffffd038c18074d8 EFLAGS: 00010293 RAX: 0017ffffc0000000 RBX: fffff86984219d00 RCX: 0000000000000000 RDX: 00000000000000f0 RSI: ffff899b88674000 RDI: fffff86984219d00 RBP: ffffd038c18074f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff899b88674000 R13: 0000000000000001 R14: ffff899b88674000 R15: ffff899b86180000 FS: 00007b163c71c740(0000) GS:ffff899c378bf000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007b163c730200 CR3: 0000000106a1d000 CR4: 0000000000350ef0 Call Trace: <TASK> kfree+0x163/0x3a0 gid_table_release_one+0xaf/0xf0 [ib_core] ib_cache_release_one+0x66/0x80 [ib_core] ib_device_release+0x48/0xb0 [ib_core] device_release+0x44/0xa0 kobject_put+0x9b/0x250 put_device+0x13/0x30 ib_unregister_device_and_put+0x40/0x60 [ib_core] nldev_dellink+0xd3/0x140 [ib_core] rdma_nl_rcv_msg+0x11d/0x300 [ib_core] ? netlink_bind+0x141/0x3a0 rdma_nl_rcv_skb.constprop.0.isra.0+0xba/0x110 [ib_core] rdma_nl_rcv+0xe/0x20 [ib_core] netlink_unicast+0x28d/0x3e0 netlink_sendmsg+0x214/0x470 __sys_sendto+0x21f/0x230 __x64_sys_sendto+0x24/0x40 x64_sys_call+0x1888/0x26e0 do_syscall_64+0xcb/0x14d0 ? _copy_from_user+0x27/0x70 ? do_sock_setsockopt+0xbd/0x190 ? __sys_setsockopt+0x72/0xd0 ? __x64_sys_setsockopt+0x1f/0x40 ? x64_sys_call+0x221b/0x26e0 ? do_syscall_64+0x109/0x14d0 ? exc_page_fault+0x92/0x1c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e GID table is allocated by kzalloc_flex() instead of raw kzalloc_obj(), kfree() should not be called on the data_vec flex array. Fixes: cef2842c922c ("RDMA/core: Use kzalloc_flex for GID table") Link: https://patch.msgid.link/r/20260406132830.435381-2-zhenwei.pi@linux.dev Reported-by: syzbot+4334f9a250019c1b79b4@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/69cc35ec.a70a0220.97f31.02a2.GAE@google.com Signed-off-by: zhenwei pi <zhenwei.pi@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-05mm: rename zap_vma_ptes() to zap_special_vma_range()David Hildenbrand (Arm)
zap_vma_ptes() is the only zapping function we export to modules. It's essentially a wrapper around zap_vma_range(), however, with some safety checks: * That the passed range fits fully into the VMA * That it's only used for VM_PFNMAP We will add support for VM_MIXEDMAP next, so use the more-generic term "special vma", although "special" is a bit overloaded. Maybe we'll later just support any VM_SPECIAL flag. While at it, improve the kerneldoc. Link: https://lkml.kernel.org/r/20260227200848.114019-16-david@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Leon Romanovsky <leon@kernel.org> [drivers/infiniband] Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arve <arve@android.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Dave Airlie <airlied@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jakub Kacinski <kuba@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Namhyung kim <namhyung@kernel.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Tvrtko Ursulin <tursulin@ursulin.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc7). Conflicts: net/vmw_vsock/af_vsock.c b18c83388874 ("vsock: initialize child_ns_mode_locked in vsock_net_init()") 0de607dc4fd8 ("vsock: add G2H fallback for CIDs not owned by H2G transport") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c ceee35e5674a ("bnxt_en: Refactor some basic ring setup and adjustment logic") 57cdfe0dc70b ("bnxt_en: Resize RSS contexts on channel count change") drivers/net/wireless/intel/iwlwifi/mld/mac80211.c 4d56037a02bd ("wifi: iwlwifi: mld: block EMLSR during TDLS connections") 687a95d204e7 ("wifi: iwlwifi: mld: correctly set wifi generation data") drivers/net/wireless/intel/iwlwifi/mld/scan.h b6045c899e37 ("wifi: iwlwifi: mld: Refactor scan command handling") ec66ec6a5a8f ("wifi: iwlwifi: mld: Fix MLO scan timing") drivers/net/wireless/intel/iwlwifi/mvm/fw.c 078df640ef05 ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v 2") 323156c3541e ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-31BackMerge tag 'v7.0-rc6' into drm-nextDave Airlie
Linux 7.0-rc6 Requested by a few people on irc to resolve conflicts in other tress. Signed-off-by: Dave Airlie <airlied@redhat.com>
2026-03-30RDMA/core: Use kzalloc_flex for GID tableRosen Penev
Simplifies allocations by using a flexible array member in struct ib_gid_table. Add __counted_by to get extra runtime analysis. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20260327030124.8385-1-rosenp@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-30RDMA/umem: Use consistent DMA attributes when unmapping entriesLeon Romanovsky
The DMA API expects that mapping and unmapping use the same DMA attributes. The RDMA umem code did not meet this requirement, so fix the mismatch. Fixes: f03d9fadfe13 ("RDMA/core: Add weak ordering dma attr to dma mapping") Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30Merge branch 'master' into rdma-nextLeon Romanovsky
Let's bring v7.0-rc6 to the -next branch, so we can merge the DMA attributes fix [1] without merge conflicts. [1] https://lore.kernel.org/all/20260323-umem-dma-attrs-v1-1-d6890f2e6a1e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org> * master: (1688 commits) Linux 7.0-rc6 ...
2026-03-30RDMA/uverbs: Update outdated reference to remove_commit_idr_uobject()Kexin Sun
The function remove_commit_idr_uobject() was split into destroy_hw_idr_uobject() and remove_handle_idr_uobject() by commit 0f50d88a6e9a ("IB/uverbs: Allow all DESTROY commands to succeed after disassociate"). The kref put that the comment refers to now lives in remove_handle_idr_uobject(). Update the stale reference. Also update "allocated this IDR with a NULL object" to "allocated this XArray entry with a NULL pointer" to match the actual data structure (xa_store) and the wording already used two lines below ("transfers our kref on uobj to the XArray"). Assisted-by: unnamed:deepseek-v3.2 coccinelle Signed-off-by: Kexin Sun <kexinsun@smail.nju.edu.cn> Link: https://patch.msgid.link/20260321105859.7642-1-kexinsun@smail.nju.edu.cn Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-30RDMA: Properly propagate the number of CQEs as unsigned intLeon Romanovsky
Instead of checking whether the number of CQEs is negative or zero, fix the .resize_user_cq() declaration to use unsigned int. This better reflects the expected value range. The sanity check is then handled correctly in ib_uvbers. Link: https://patch.msgid.link/20260319-resize_cq-cqe-v1-1-b78c6efc1def@nvidia.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30RDMA: Clarify that CQ resize is a user‑space verbLeon Romanovsky
The CQ resize operation is used only by uverbs. Make this explicit. Link: https://patch.msgid.link/20260318-resize_cq-type-v1-2-b2846ed18846@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30RDMA/core: Remove unused ib_resize_cq() implementationLeon Romanovsky
There are no in-kernel users of the CQ resize functionality, so drop it. Link: https://patch.msgid.link/20260318-resize_cq-type-v1-1-b2846ed18846@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30RDMA/nldev: Add dellink function pointerZhu Yanjun
Add a dellink function pointer to rdma_link_ops to allow drivers to clean up resources created during newlink. Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20260313023058.13020-2-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-29drivers: net: drop ipv6_stub usage and use direct function callsFernando Fernandez Mancera
As IPv6 is built-in only, the ipv6_stub infrastructure is no longer necessary. Convert all drivers currently utilizing ipv6_stub to make direct function calls. The fallback functions introduced previously will prevent linkage errors when CONFIG_IPV6 is disabled. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Antonio Quartulli <antonio@openvpn.net> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20260325120928.15848-7-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: - Quite a few irdma bug fixes, several user triggerable - Fix a 0 SMAC header in ionic - Tolerate FW errors for RAAS in bng_re - Don't UAF in efa when printing error events - Better handle pool exhaustion in the new bvec paths * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/irdma: Harden depth calculation functions RDMA/irdma: Return EINVAL for invalid arp index error RDMA/irdma: Fix deadlock during netdev reset with active connections RDMA/irdma: Remove reset check from irdma_modify_qp_to_err() RDMA/irdma: Clean up unnecessary dereference of event->cm_node RDMA/irdma: Remove a NOP wait_event() in irdma_modify_qp_roce() RDMA/irdma: Update ibqp state to error if QP is already in error state RDMA/irdma: Initialize free_qp completion before using it RDMA/efa: Fix possible deadlock RDMA/rw: Fix MR pool exhaustion in bvec RDMA READ path RDMA/rw: Fall back to direct SGE on MR pool exhaustion RDMA/efa: Fix use of completion ctx after free RDMA/bng_re: Fix silent failure in HWRM version query RDMA/ionic: Preserve and set Ethernet source MAC after ib_ud_header_init() RDMA/irdma: Fix double free related to rereg_user_mr
2026-03-26Merge tag 'dma-mapping-7.0-2026-03-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: "A set of fixes for DMA-mapping subsystem, which resolve false- positive warnings from KMSAN and DMA-API debug (Shigeru Yoshida and Leon Romanovsky) as well as a simple build fix (Miguel Ojeda)" * tag 'dma-mapping-7.0-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: add missing `inline` for `dma_free_attrs` mm/hmm: Indicate that HMM requires DMA coherency RDMA/umem: Tell DMA mapping that UMEM requires coherency iommu/dma: add support for DMA_ATTR_REQUIRE_COHERENT attribute dma-direct: prevent SWIOTLB path when DMA_ATTR_REQUIRE_COHERENT is set dma-mapping: Introduce DMA require coherency attribute dma-mapping: Clarify valid conditions for CPU cache line overlap dma-mapping: handle DMA_ATTR_CPU_CACHE_CLEAN in trace output dma-debug: Allow multiple invocations of overlapping entries dma: swiotlb: add KMSAN annotations to swiotlb_bounce()
2026-03-20RDMA/umem: Tell DMA mapping that UMEM requires coherencyLeon Romanovsky
The RDMA subsystem exposes DMA regions through the verbs interface, which assumes a coherent system. Use the DMA_ATTR_REQUIRE_COHERENCE attribute to ensure coherency and avoid taking the SWIOTLB path. The RDMA verbs programming model resembles HMM and assumes concurrent DMA and CPU access to userspace memory. The hardware and programming model support "one-sided" operations initiated remotely without any local CPU involvement or notification. These include ATOMIC compare/swap, READ, and WRITE. A remote CPU can use these operations to traverse data structures, manipulate locks, and perform similar tasks without the host CPU’s awareness. If SWIOTLB substitutes memory or DMA is not cache coherent, these use cases break entirely. In-kernel RDMA is fine with incoherent mappings because kernel users do not rely on one-sided operations in ways that would expose these issues. A given region may also be exported multiple times, which can trigger warnings about cacheline overlaps. These warnings are suppressed when the new attribute is used. infiniband rocep8s0f0: mlx5_ib_reg_user_mr:1592:(pid 5812): start 0x2b28c000, iova 0x2b28c000, length 0x1000, access_flags 0x1 infiniband rocep8s0f0: mlx5_ib_reg_user_mr:1592:(pid 5812): start 0x2b28c001, iova 0x2b28c001, length 0xfff, access_flags 0x1 ------------[ cut here ]------------ DMA-API: mlx5_core 0000:08:00.0: cacheline tracking EEXIST, overlapping mappings aren't supported WARNING: kernel/dma/debug.c:620 at add_dma_entry+0x1bb/0x280, CPU#6: ibv_rc_pingpong/5812 Modules linked in: veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_fwctl zram zsmalloc mlx5_ib fuse rpcrdma rdma_ucm ib_uverbs ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_core ib_core CPU: 6 UID: 2733 PID: 5812 Comm: ibv_rc_pingpong Tainted: G W 6.19.0+ #129 PREEMPT Tainted: [W]=WARN Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:add_dma_entry+0x1be/0x280 Code: 8b 7b 10 48 85 ff 0f 84 c3 00 00 00 48 8b 6f 50 48 85 ed 75 03 48 8b 2f e8 ff 8e 6a 00 48 89 c6 48 8d 3d 55 ef 2d 01 48 89 ea <67> 48 0f b9 3a 48 85 db 74 1a 48 c7 c7 b0 00 2b 82 e8 9c 25 fd ff RSP: 0018:ff11000138717978 EFLAGS: 00010286 RAX: ffffffffa02d7831 RBX: ff1100010246de00 RCX: 0000000000000000 RDX: ff110001036fac30 RSI: ffffffffa02d7831 RDI: ffffffff82678650 RBP: ff110001036fac30 R08: ff11000110dcb4a0 R09: ff11000110dcb478 R10: 0000000000000000 R11: ffffffff824b30a8 R12: 0000000000000000 R13: 00000000ffffffef R14: 0000000000000202 R15: ff1100010246de00 FS: 00007f59b411c740(0000) GS:ff110008dcc99000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe538f7000 CR3: 000000010e066005 CR4: 0000000000373eb0 Call Trace: <TASK> debug_dma_map_sg+0x1b4/0x390 __dma_map_sg_attrs+0x6d/0x1a0 dma_map_sgtable+0x19/0x30 ib_umem_get+0x254/0x380 [ib_uverbs] mlx5_ib_reg_user_mr+0x68/0x2a0 [mlx5_ib] ib_uverbs_reg_mr+0x17f/0x2a0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xc2/0x130 [ib_uverbs] ib_uverbs_cmd_verbs+0xa0b/0xae0 [ib_uverbs] ? ib_uverbs_handler_UVERBS_METHOD_QUERY_PORT_SPEED+0xe0/0xe0 [ib_uverbs] ? mmap_region+0x7a/0xb0 ? do_mmap+0x3b8/0x5c0 ib_uverbs_ioctl+0xa7/0x110 [ib_uverbs] __x64_sys_ioctl+0x14f/0x8b0 ? ksys_mmap_pgoff+0xc5/0x190 do_syscall_64+0x8c/0xbf0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f59b430aeed Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 RSP: 002b:00007ffe538f9430 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffe538f94c0 RCX: 00007f59b430aeed RDX: 00007ffe538f94e0 RSI: 00000000c0181b01 RDI: 0000000000000003 RBP: 00007ffe538f9480 R08: 0000000000000028 R09: 00007ffe538f9684 R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffe538f9684 R13: 000000000000000c R14: 000000002b28d170 R15: 000000000000000c </TASK> ---[ end trace 0000000000000000 ]--- Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260316-dma-debug-overlap-v3-7-1dde90a7f08b@nvidia.com
2026-03-17RDMA/rw: Fix MR pool exhaustion in bvec RDMA READ pathChuck Lever
When IOVA-based DMA mapping is unavailable (e.g., IOMMU passthrough mode), rdma_rw_ctx_init_bvec() falls back to checking rdma_rw_io_needs_mr() with the raw bvec count. Unlike the scatterlist path in rdma_rw_ctx_init(), which passes a post-DMA-mapping entry count that reflects coalescing of physically contiguous pages, the bvec path passes the pre-mapping page count. This overstates the number of DMA entries, causing every multi-bvec RDMA READ to consume an MR from the QP's pool. Under NFS WRITE workloads the server performs RDMA READs to pull data from the client. With the inflated MR demand, the pool is rapidly exhausted, ib_mr_pool_get() returns NULL, and rdma_rw_init_one_mr() returns -EAGAIN. svcrdma treats this as a DMA mapping failure, closes the connection, and the client reconnects -- producing a cycle of 71% RPC retransmissions and ~100 reconnections per test run. RDMA WRITEs (NFS READ direction) are unaffected because DMA_TO_DEVICE never triggers the max_sgl_rd check. Remove the rdma_rw_io_needs_mr() gate from the bvec path entirely, so that bvec RDMA operations always use the map_wrs path (direct WR posting without MR allocation). The bvec caller has no post-DMA-coalescing segment count available -- xdr_buf and svc_rqst hold pages as individual pointers, and physical contiguity is discovered only during DMA mapping -- so the raw page count cannot serve as a reliable input to rdma_rw_io_needs_mr(). iWARP devices, which require MRs unconditionally, are handled by an earlier check in rdma_rw_ctx_init_bvec() and are unaffected. Fixes: bea28ac14cab ("RDMA/core: add MR support for bvec-based RDMA operations") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260313194201.5818-3-cel@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-17RDMA/rw: Fall back to direct SGE on MR pool exhaustionChuck Lever
When IOMMU passthrough mode is active, ib_dma_map_sgtable_attrs() produces no coalescing: each scatterlist page maps 1:1 to a DMA entry, so sgt.nents equals the raw page count. A 1 MB transfer yields 256 DMA entries. If that count exceeds the device's max_sgl_rd threshold (an optimization hint from mlx5 firmware), rdma_rw_io_needs_mr() steers the operation into the MR registration path. Each such operation consumes one or more MRs from a pool sized at max_rdma_ctxs -- roughly one MR per concurrent context. Under write-intensive workloads that issue many concurrent RDMA READs, the pool is rapidly exhausted, ib_mr_pool_get() returns NULL, and rdma_rw_init_one_mr() returns -EAGAIN. Upper layer protocols treat this as a fatal DMA mapping failure and tear down the connection. The max_sgl_rd check is a performance optimization, not a correctness requirement: the device can handle large SGE counts via direct posting, just less efficiently than with MR registration. When the MR pool cannot satisfy a request, falling back to the direct SGE (map_wrs) path avoids the connection reset while preserving the MR optimization for the common case where pool resources are available. Add a fallback in rdma_rw_ctx_init() so that -EAGAIN from rdma_rw_init_mr_wrs() triggers direct SGE posting instead of propagating the error. iWARP devices, which mandate MR registration for RDMA READs, and force_mr debug mode continue to treat -EAGAIN as terminal. Fixes: 00bd1439f464 ("RDMA/rw: Support threshold for registration vs scattering to local pages") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260313194201.5818-2-cel@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08RDMA/umem: Add helpers for umem dmabuf revoke lockJacob Moroni
Added helpers to acquire and release the umem dmabuf revoke lock. The intent is to avoid the need for drivers to peek into the ib_umem_dmabuf internals to get the dma_resv_lock and bring us one step closer to abstracting ib_umem_dmabuf away from drivers in general. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260305170826.3803155-5-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08RDMA/umem: Add pinned revocable dmabuf import interfaceJacob Moroni
Added an interface for importing a pinned but revocable dmabuf. This interface can be used by drivers that are capable of revocation so that they can import dmabufs from exporters that may require it, such as VFIO. This interface implements a two step process, where drivers will first call ib_umem_dmabuf_get_pinned_revocable_and_lock() which will pin and map the dmabuf (and provide a functional move_notify/invalidate_mappings callback), but will return with the lock still held so that the driver can then populate the callback via ib_umem_dmabuf_set_revoke_locked() without races from concurrent revocations. This scheme also allows for easier integration with drivers that may not have actually allocated their internal MR objects at the time of the get_pinned_revocable* call. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260305170826.3803155-4-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08RDMA/umem: Move umem dmabuf revoke logic into helper functionJacob Moroni
This same logic will eventually be reused from within the invalidate_mappings callback which already has the dma_resv_lock held, so break it out into a separate function so it can be reused. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260305170826.3803155-3-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08RDMA/umem: Add ib_umem_dmabuf_get_pinned_and_lock helperJacob Moroni
Move the inner logic of ib_umem_dmabuf_get_pinned_with_dma_device() to a new static function that returns with the lock held upon success. The intent is to allow reuse for the future get_pinned_revocable_and_lock function. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260305170826.3803155-2-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08RDMA: Add IB_UVERBS_CORE_SUPPORT_ROBUST_UDATAJason Gunthorpe
This flag can be set by drivers once they have finished auditing and implementing the full udata support on every udata operation. My intention going forward is that driver authors proposing new udata uAPI for their drivers must first do the work and set this flag. If this flag is not set the userspace should not try to use udata based uAPI newer than this commit, though on a case by case basis it may be OK based on what checks historical kernels performed on the specific call. Since bnxt_re is audited now, it is the first driver to set the flag. Link: https://patch.msgid.link/r/13-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08RDMA: Add ib_respond_udata()Jason Gunthorpe
Wrap the common copy_to_user() pattern used in drivers and enhance it to zero pad as well. Include debug logging on failures. Link: https://patch.msgid.link/r/5-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08RDMA: Add ib_copy_validate_udata_in_cm()Jason Gunthorpe
For structures with comp_mask also absorb the check of comp_mask valid bits into the helper. This is slightly tricky because ~ might not fully extend to 64 bits, the helper inserts an explicit type to ensure that ~ covers all bits. Link: https://patch.msgid.link/r/4-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08RDMA: Add ib_copy_validate_udata_in()Jason Gunthorpe
Add a new function to consolidate the required compatibility pattern for driver data of checking against a minimum size, and checking for unknown trailing bytes to be zero into a function. This new function uses the faster copy_struct_from_user() instead of trying to directly check for zero. Incorporate the common ibdev_dbg() logging directly into the error paths of the helper. Link: https://patch.msgid.link/r/3-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08RDMA/core: Add rdma_udata_to_dev()Jason Gunthorpe
Get an ib_device out of a udata so it can be used for debug prints. Link: https://patch.msgid.link/r/2-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-08RDMA: Use copy_struct_from_user() instead of open codingJason Gunthorpe
This entire function is just open coding copy_struct_from_user(), call it directly, it is faster anyhow. Link: https://patch.msgid.link/r/1-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-05Add support for TLP emulationLeon Romanovsky
This series adds support for Transaction Layer Packet (TLP) emulation response gateway regions, enabling userspace device emulation software to write TLP responses directly to lower layers without kernel driver involvement. Currently, the mlx5 driver exposes VirtIO emulation access regions via the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that ioctl to also support allocating TLP response gateway channels for PCI device emulation use cases. Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05RDMA/core: Delete not-implemented get_vector_affinityLeon Romanovsky
No drivers implement .get_vector_affinity(), and no callers invoke ib_get_vector_affinity(), so remove it. Link: https://patch.msgid.link/20260226-get_vector_affinity-v1-1-910a899c4e5d@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-05RDMA/nldev: Expose kernel-internal FRMR pools in netlinkMichael Guralnik
Allow netlink users, through the usage of driver-details netlink attribute, to get information about internal FRMR pools that use the kernel_vendor_key FRMR key member. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-11-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05RDMA/nldev: Add command to set pinned FRMR handlesMichael Guralnik
Allow users to set through netlink, for a specific FRMR pool, the amount of handles that are not aged, and fill the pool to this amount. This allows users to warm-up the FRMR pools to an expected amount of handles with specific attributes that fits their expected usage. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-10-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02RDMA/core: Add netlink command to modify FRMR agingMichael Guralnik
Allow users to set FRMR pools aging timer through netlink. This functionality will allow user to control how long handles reside in the kernel before being destroyed, thus being able to tune the tradeoff between memory and HW object consumption and memory registration optimization. Since FRMR pools is highly beneficial for application restart scenarios, this command allows users to modify the aging timer to their application restart time, making sure the FRMR handles deregistered on application teardown are kept for long enough in the pools for reuse in the application startup. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-9-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02RDMA/nldev: Add command to get FRMR poolsMichael Guralnik
Add support for a new command in netlink to dump to user the state of the FRMR pools on the devices. Expose each pool with its key and the usage statistics for it. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-8-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02RDMA/core: Add pinned handles to FRMR poolsMichael Guralnik
Add a configuration of pinned handles on a specific FRMR pool. The configured amount of pinned handles will not be aged and will stay available for users to claim. Upon setting the amount of pinned handles to an FRMR pool, we will make sure we have at least the pinned amount of handles associated with the pool and create more, if necessary. The count for pinned handles take into account handles that are used by user MRs and handles in the queue. Introduce a new FRMR operation of build_key that allows drivers to manipulate FRMR keys supplied by the user, allowing failing for unsupported properties and masking of properties that are modifiable. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-5-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02RDMA/core: Add FRMR pools statisticsMichael Guralnik
Count for each pool the number of FRMR handles popped and held by user MRs. Also keep track of the max value of this counter. Next patches will expose the statistics through netlink. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-4-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02RDMA/core: Add aging to FRMR poolsMichael Guralnik
Add aging mechanism to handles of FRMR pools. Keep the handles stored in FRMR pools for at least 1 minute for application to reuse, destroy all handles which were not reused. Add a new queue to each pool to accomplish that. Upon aging trigger, destroy all FRMR handles from the new 'inactive' queue and move all handles from the 'active' pool to the 'inactive' pool. This ensures all destroyed handles were not reused for at least one aging time period and were not held longer than 2 aging time periods. Handles from the inactive queue will be popped only if the active queue is empty. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-3-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02IB/core: Introduce FRMR poolsMichael Guralnik
Add a generic Fast Registration Memory Region pools mechanism to allow drivers to optimize memory registration performance. Drivers that have the ability to reuse MRs or their underlying HW objects can take advantage of the mechanism to keep a 'handle' for those objects and use them upon user request. We assume that to achieve this goal a driver and its HW should implement a modify operation for the MRs that is able to at least clear and set the MRs and in more advanced implementations also support changing a subset of the MRs properties. The mechanism is built using an RB-tree consisting of pools, each pool represents a set of MR properties that are shared by all of the MRs residing in the pool and are unmodifiable by the vendor driver or HW. The exposed API from ib_core to the driver has 4 operations: Init and cleanup - handles data structs and locks for the pools. Push and pop - store and retrieve 'handle' for a memory registration or deregistrations request. The FRMR pools mechanism implements the logic to search the RB-tree for a pool with matching properties and create a new one when needed and requires the driver to implement creation and destruction of a 'handle' when pool is empty or a handle is requested or is being destroyed. Later patch will introduce Netlink API to interact with the FRMR pools mechanism to allow users to both configure and track its usage. A vendor wishing to configure FRMR pool without exposing it or without exposing internal MR properties to users, should use the kernel_vendor_key field in the pools key. This can be useful in a few cases, e.g, when the FRMR handle has a vendor-specific un-modifiable property that the user registering the memory might not be aware of. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-2-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02Merge tag 'drm-misc-next-2026-02-26' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v7.1: UAPI Changes: connector: - Add panel_type property fourcc: - Add ARM interleaved 64k modifier nouveau: - Query Z-Cull info with DRM_IOCTL_NOUVEAU_GET_ZCULL_INFO Cross-subsystem Changes: coreboot: - Clean up coreboot framebuffer support dma-buf: - Provide revoke mechanism for shared buffers - Rename move_notify callback to invalidate_mappings and update users. - Always enable move_notify - Support dma_fence_was_initialized() test - Protect dma_fence_ops by RCU and improve locking - Fix sparse warnings Core Changes: atomic: - Allocate drm_private_state via callback and convert drivers atomic-helper: - Use system_percpu_wq buddy: - Make buddy allocator available to all DRM drivers - Document flags and structures colorop: - Add destroy helper and convert drivers fbdev-emulation: - Clean up gem: - Fix drm_gem_objects_lookup() error cleanup Driver Changes: amdgpu: - Set panel_type to OELD for eDP atmel-hlcdc: - Support sana5d65 LCD controller bridge: - anx7625: Support USB-C plus DT bindings - connector: Fix EDID detection - dw-hdmi-qp: Support Vendor-Specfic and SDP Infoframes; improve others - fsl-ldb: Fix visual artifacts plus related DT property 'enable-termination-resistor' - imx8qxp-pixel-link: Improve bridge reference handling - lt9611: Support Port-B-only input plus DT bindings - tda998x: Support DRM_BRIDGE_ATTACH_NO_CONNECTOR; Clean up - Support TH1520 HDMI plus DT bindings - Clean up imagination: - Clean up komeda: - Fix integer overflow in AFBC checks mcde: - Improve bridge handling nouveau: - Provide Z-cull info to user space - gsp: Support GA100 - Shutdown on PCI device shutdown - Clean up panel: - panel-jdi-lt070me05000: Use mipi-dsi multi functions - panel-edp: Support Add AUO B116XAT04.1 (HW: 1A); Support CMN N116BCL-EAK (C2); Support FriendlyELEC plus DT changes - Fix Kconfig dependencies panthor: - Add tracepoints for power and IRQs rcar-du: - dsi: fix VCLK calculation rockchip: - vop2: Use drm_ logging functions - Support DisplayPort on RK3576 sysfb: - corebootdrm: Support system framebuffer on coreboot firmware; detect orientation - Clean up pixel-format lookup sun4i: - Clean up tilcdc: - Use DT bindings scheme - Use managed DRM interfaces - Support DRM_BRIDGE_ATTACH_NO_CONNECTOR - Clean up a lot of obsolete code v3d: - Clean up vc4: - Use system_percpu_wq - Clean up verisilicon: - Support DC8200 plus DT bindings virtgpu: - Support PRIME imports with enabled 3D Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260226143615.GA47200@linux.fritz.box
2026-02-26RDMA/uverbs: Import DMA-BUF module in uverbs_std_types_dmabuf fileLeon Romanovsky
Fix the following compilation error: ERROR: modpost: module ib_uverbs uses symbol dma_buf_move_notify from namespace DMA_BUF, but does not import it. Fixes: 0ac6f4056c4a ("RDMA/uverbs: Add DMABUF object type and operations") Link: https://patch.msgid.link/20260225-fix-uverbs-compilation-v1-1-acf7b3d0f9fa@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>