linux-toradex.git/drivers/infiniband/sw, branch master

RDMA/siw: publish QP after initialization

2026-07-02T17:37:03+00:00

siw_create_qp() currently calls siw_qp_add() before the queues, CQ
pointers, state, completion, and device list entry are ready. A QPN
lookup can therefore reach a QP that is still being constructed.

Move siw_qp_add() to the end of siw_create_qp(), after QP
initialization and before adding the QP to the siw device list.

Fixes: f29dd55b0236 ("rdma/siw: queue pair methods")
Link: https://patch.msgid.link/r/20260630060040.966461-1-ruoyuw560@gmail.com
Suggested-by: Bernard Metzler 
Signed-off-by: Ruoyu Wang 
Acked-by: Bernard Metzler 
Signed-off-by: Jason Gunthorpe

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

2026-06-18T15:16:21+00:00

Pull rdma updates from Jason Gunthorpe:
 "Many AI driven bug fixes, and several big driver API cleanups

   - Driver bug fixes and minor cleanups in mlx5, hns, rxe, efa, siw,
     rtrs, mana, irdma, mlx4. Commonly error path flows, integer
     arithmetic overflows on unsafe data, out of bounds access, and use
     after free issues under races.

   - Second half of the new udata API for drivers focusing on uAPI
     response

   - bnxt_re supports more options for QP creation that will allow a dv
     path in rdma-core

   - Untangle the module dependencies so drivers don't link to
     ib_uverbs.ko as was originall intended

   - Provide a new way to handle umems with a consistent simplified uAPI
     and update several drivers to use it. This brings dmabuf support to
     more places and more drivers

   - Support for mlx5 rate limit and packet pacing for UD and UC

   - A batch of fixes for the new shared FRMR pools infrastructure"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (148 commits)
  RDMA/irdma: Replace waitqueue and flag with completion
  RDMA/hns: Fix memory leak of bonding resources
  RDMA/rtrs-srv: Bound RDMA-Write length to chunk size in rdma_write_sg
  docs: infiniband: correct name of option to enable the ib_uverbs module
  RDMA/bnxt_re: Reject GET_TOGGLE_MEM when toggle page was not allocated
  RDMA/bnxt_re: Fail DBR related page allocation UAPIs if the feature is disabled
  RDMA/bnxt_re: Avoid repeated requests to allocate WC pages
  RDMA/bnxt_re: Proper rollback if the ioremap fails
  RDMA/bnxt_re: Add a max slot check for SQ
  RDMA/bnxt_re: Avoid displaying the kernel pointer
  RDMA/bnxt_re: Free CQ toggle page after firmware teardown
  RDMA/bnxt_re: Free SRQ toggle page after firmware teardown
  RDMA/bnxt_re: Initialize dpi variable to zero
  ABI: sysfs-class-infiniband: minor cleanup
  RDMA/mlx5: Release the HW‑provided UAR index rather than the SW one
  RDMA/mlx5: Fix undefined shift of user RQ WQE size
  RDMA/mlx5: Remove raw RSS QP restrack tracking
  RDMA/mlx5: Remove DCT restrack tracking
  RDMA/mlx5: Drop FRMR pool handle on UMR revoke failure
  RDMA/core: Add ib_frmr_pool_drop for unrecoverable handles
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

2026-06-11T21:33:35+00:00

Cross-merge networking fixes after downstream PR (net-7.1-rc8).

Conflicts:

drivers/net/ethernet/wangxun/txgbe/txgbe_aml.c
  f67aead16e85 ("net: txgbe: rework service event handling")
  57d39faed4c9 ("net: txgbe: improve functions of AML 40G devices")

net/rds/info.c
  512db8267b73 ("rds: mark snapshot pages dirty in rds_info_getsockopt()")
  6e94eeb2a2a6 ("rds: convert to getsockopt_iter")

Adjacent changes:

include/net/sock.h
  1ee90b77b727 ("net: guard timestamp cmsgs to real error queue skbs")
  f0de88303d5e ("net: make is_skb_wmem() available to modules")

Signed-off-by: Jakub Kicinski

RDMA: During rereg_mr ensure that REREG_ACCESS is compatible

2026-06-08T16:39:20+00:00

If IB_MR_REREG_ACCESS changes from RO to RW then the umem has to be
re-evaluated to ensure it is properly pinned as RW. Since the umem is
hidden inside each driver's mr struct add a ib_umem_check_rereg() function
that each driver has to call before processing IB_MR_REREG_ACCESS.

mlx4 has to retain its duplicate ib_access_writable check because it
implements IB_MR_REREG_ACCESS | IB_MR_REREG_TRANS by changing both items
in place sequentially while the MR is live, so it will continue to not
support this combination.

Cc: stable@vger.kernel.org
Fixes: b40656aa7d55 ("RDMA/umem: remove FOLL_FORCE usage")
Link: https://patch.msgid.link/r/0-v1-06fb1a2d6cf5+107-rereg_access_jgg@nvidia.com
Reported-by: Philip Tsukerman 
Signed-off-by: Jason Gunthorpe

RDMA/siw: Fix endpoint/socket association handling

2026-06-05T17:25:52+00:00

Disassociating a socket from an endpoint via siw_socket_disassoc() may
release the last reference on that endpoint and free it. Therefore, don't
clear the endpoints socket pointer after calling that function, but
within.

This fixes a:

  BUG: KASAN: slab-use-after-free in siw_cm_work_handler (drivers/infiniband/sw/siw/siw_cm.c:1053 drivers/infiniband/sw/siw/siw_cm.c:1075)

which occurred after processing a malformed MPA request during connection
establishment, causing the new endpoint to be closed.

Fixes: 6c52fdc244b5c ("rdma/siw: connection management")
Link: https://patch.msgid.link/r/20260604160808.30948-1-bernard.metzler@linux.dev
Reported-by: Shuangpeng Bai 
Signed-off-by: Bernard Metzler 
Signed-off-by: Jason Gunthorpe

RDMA/siw: bound Read Response placement to the RREAD length

2026-06-05T17:06:07+00:00

In drivers/infiniband/sw/siw/siw_qp_rx.c, siw_proc_rresp() places each
inbound Read Response DDP segment at sge->laddr + wqe->processed and then
accumulates wqe->processed, but it never checks the running total against
the sink buffer length on continuation segments. siw_check_sge() resolves
and validates the sink memory only on the first fragment (the if (!*mem)
branch), and siw_rresp_check_ntoh() compares the cumulative length against
wqe->bytes only on the final segment (the !frx->more_ddp_segs guard).

A connected siw peer that answers an outstanding RREAD with Read Response
segments that keep the DDP Last flag clear, carrying more total payload
than the RREAD requested, drives wqe->processed past the validated sink
buffer; the next siw_rx_data() call writes out of bounds at
sge->laddr + wqe->processed. siw runs iWARP over ordinary routable TCP,
so the peer is the remote end of an established RDMA connection and needs
no local privilege.

Bound every segment before placement, exactly as siw_proc_send() and
siw_proc_write() already do for their tagged and untagged paths, and
terminate the connection with a base-or-bounds DDP error when the
Read Response would overrun the sink buffer.

This is the second receive-path length fix for this file. A separate
change rejects an MPA FPDU length that underflows the per-fragment
remainder in the header decode; that guard does not cover this case,
because here each individual segment length is self-consistent and only
the accumulated placement offset overruns the buffer.

Fixes: 8b6a361b8c48 ("rdma/siw: receive path")
Link: https://patch.msgid.link/r/20260602194700.2273758-1-michael.bommarito@gmail.com
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito 
Signed-off-by: Jason Gunthorpe

RDMA: Update the query_device() op

2026-06-03T18:12:43+00:00

This op hasn't followed the normal pattern of passing NULL for udata when
invoked by the kernel. Instead the kernel caller creates a dummy ib_udata
on the stack and passes that in. It does not seem to currently be a bug,
but this flow should be modernized to use the new API flow and in the
process accept NULL as well.

Only mlx4 uses an input request structure, have every other driver call
ib_is_udata_in_empty() to enforce the lack of request structs.

Use ib_respond_empty_udata() in every driver that does not use a response
struct.

Ensure a check for NULL udata before calling ib_respond_udata() in
bnxt_re, efa, and mlx5.

Make mlx4 safe to be called with NULL.

Link: https://patch.msgid.link/r/2-v1-922fa8e828ba+f7-ib_udata_stack_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe

RDMA/rxe: Copy WQE to local buffer in non-SRQ receive path

2026-05-29T23:33:59+00:00

For non-SRQ QPs, the responder reads WQE fields directly from the
shared queue buffer mapped into userspace. This allows a malicious
user to modify fields like num_sge or sge entries while the kernel
is processing the WQE, leading to out-of-bounds reads in
rxe_resp_check_length() and copy_data().

Introduce get_recv_wqe() that validates num_sge and copies the WQE
to a kernel-local buffer before processing, matching the approach
already used for SRQ WQEs in get_srq_wqe(). The srq_wqe buffer is
reused since SRQ and non-SRQ paths are mutually exclusive per QP.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://patch.msgid.link/r/20260518215040.1598586-3-tristan@talencesecurity.com
Signed-off-by: Tristan Madani 
Reviewed-by: Zhu Yanjun 
Signed-off-by: Jason Gunthorpe

RDMA/rxe: Fix TOCTOU heap overflow in get_srq_wqe

2026-05-29T23:32:48+00:00

get_srq_wqe() reads wqe->dma.num_sge from the shared receive queue
buffer, which is mapped into userspace. It validates num_sge against
max_sge, but then re-reads the same field to calculate the memcpy
size. A concurrent userspace thread can modify num_sge between
validation and use, causing a heap buffer overflow when copying the
WQE into qp->resp.srq_wqe.

Read num_sge into a local variable and use it for both the bounds
check and the size calculation.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://patch.msgid.link/r/20260518215040.1598586-2-tristan@talencesecurity.com
Signed-off-by: Tristan Madani 
Reviewed-by: Zhu Yanjun 
Signed-off-by: Jason Gunthorpe

RDMA/umem: Rename ib_umem_get() to ib_umem_get_va()

2026-05-29T23:19:57+00:00

The new umem getter family being introduced in follow-up patches
need a fitting name for the central all-source helper that resolves
attributes, legacy fillers and a UHW VA fallback.

Rename the existing VA-pinning helper ib_umem_get() to ib_umem_get_va()
so the name is freed up. The new name is consistent with names of rest
of the helpers that are about to be introduced.

Link: https://patch.msgid.link/r/20260529134312.2836341-2-jiri@resnulli.us
Signed-off-by: Jiri Pirko 
Signed-off-by: Jason Gunthorpe