linux-toradex.git/drivers/infiniband/core/uverbs_cmd.c, branch v7.1

RDMA/core: Fix rereg_mr use-after-free race

2026-04-29T19:37:12+00:00

When a driver creates a new MR during rereg_user_mr, a race window
exists between rdma_alloc_commit_uobject() for the new MR and the point
where the code reads that MR to populate the response keys.

A concurrent rereg_mr or destroy_mr could destroy the MR in this window
and cause UAF in the first thread.

Racing flow between two rereg_mr calls:

 CPU0                           CPU1
 ----                           ----
 rereg_user_mr(mr_handle)
   uobj_get_write(mr_handle) -> mr0
   mr1 = driver→rereg()
   rdma_alloc_commit_uobject(mr1)
   // mr1 replaced mr0 and is unlocked
   uobj_put_destroy(mr0)
                                rereg_user_mr(mr_handle)
                                  uobj_get_write(mr_handle) -> mr1
                                  mr2 = driver→rereg()
                                  rdma_alloc_commit_uobject(mr2)
                                  // mr2 replaced mr1 and is unlocked
                                  uobj_put_destroy(mr1)
                                  // Destroys mr1!

   resp.lkey = mr1->lkey; // UAF - mr1 was freed!
   resp.rkey = mr1->rkey; // UAF - mr1 was freed!

Fix by storing lkey/rkey in local variables before the new MR is
unlocked and using the local variables to set the user response.

Fixes: 6e0954b11c05 ("RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr")
Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-4-4621fa52de0e@nvidia.com
Signed-off-by: Michael Guralnik 
Reviewed-by: Maher Sanalla 
Signed-off-by: Edward Srouji 
Signed-off-by: Jason Gunthorpe

RDMA: Properly propagate the number of CQEs as unsigned int

2026-03-30T17:47:44+00:00

Instead of checking whether the number of CQEs is negative or zero, fix the
.resize_user_cq() declaration to use unsigned int. This better reflects the
expected value range. The sanity check is then handled correctly in ib_uvbers.

Link: https://patch.msgid.link/20260319-resize_cq-cqe-v1-1-b78c6efc1def@nvidia.com
Reviewed-by: Zhu Yanjun 
Signed-off-by: Leon Romanovsky

RDMA: Clarify that CQ resize is a user‑space verb

2026-03-30T17:47:44+00:00

The CQ resize operation is used only by uverbs. Make this explicit.

Link: https://patch.msgid.link/20260318-resize_cq-type-v1-2-b2846ed18846@nvidia.com
Signed-off-by: Leon Romanovsky

RDMA: Use copy_struct_from_user() instead of open coding

2026-03-08T10:20:24+00:00

This entire function is just open coding copy_struct_from_user(), call it
directly, it is faster anyhow.

Link: https://patch.msgid.link/r/1-v3-bd56dd443069+49-bnxt_re_uapi_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe

RDMA/core: Reject zero CQE count

2026-02-25T13:15:30+00:00

All drivers already ensure that the number of CQEs is at least 1.
Add this validation to the core so drivers no longer need to repeat it.
Future patches converting to the .create_user_cq() interface will remove
the per‑driver checks.

Link: https://patch.msgid.link/20260213-refactor-umem-v1-8-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky

RDMA/core: Prepare create CQ path for API unification

2026-02-25T13:15:30+00:00

Ensure that .create_cq_umem() and .create_cq() follow the same API
contract, allowing drivers to be gradually migrated to the umem-aware
CQ management flow.

Link: https://patch.msgid.link/20260213-refactor-umem-v1-7-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky

RDMA/core: Manage CQ umem in core code

2026-02-25T13:15:30+00:00

In the current implementation, CQ umem is handled both by ib_core and
the driver. ib_core sometimes creates and destroys it, while the driver
also destroys it.

Store the umem in struct ib_cq and ensure that only ib_core manages
its lifetime, relying solely on its internal reference counter.

Link: https://patch.msgid.link/20260213-refactor-umem-v1-5-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky

Convert more 'alloc_obj' cases to default GFP_KERNEL arguments

2026-02-22T04:03:00+00:00

This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This was done entirely with mindless brute force, using

    git grep -l '\

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28+00:00

This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook