<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-toradex.git/drivers/infiniband/ulp, branch v6.9-rc4</title>
<subtitle>Linux kernel for Apalis and Colibri modules</subtitle>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/'/>
<entry>
<title>Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma</title>
<updated>2024-03-18T22:34:03+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-03-18T22:34:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=6207b37eb5c5e48f45f3ffe0a299d2df6b42ed69'/>
<id>6207b37eb5c5e48f45f3ffe0a299d2df6b42ed69</id>
<content type='text'>
Pull rdma updates from Jason Gunthorpe:
 "Very small update this cycle:

   - Minor code improvements in fi, rxe, ipoib, mana, cxgb4, mlx5,
     irdma, rxe, rtrs, mana

   - Simplify the hns hem mechanism

   - Fix EFA's MSI-X allocation in resource constrained configurations

   - Fix a KASN splat in srpt

   - Narrow hns's congestion control selection to QPs granularity and
     allow userspace to select it

   - Solve a parallel module loading race between the CM module and a
     driver module

   - Flexible array cleanup

   - Dump hns's SCC Conext to 'rdma res' for debugging

   - Make mana build page lists for HW objects that require a 0 offset
     correctly

   - Stuck CM ID debugging"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (29 commits)
  RDMA/cm: add timeout to cm_destroy_id wait
  RDMA/mana_ib: Use virtual address in dma regions for MRs
  RDMA/mana_ib: Fix bug in creation of dma regions
  RDMA/hns: Append SCC context to the raw dump of QPC
  RDMA/uverbs: Avoid -Wflex-array-member-not-at-end warnings
  RDMA/hns: Support userspace configuring congestion control algorithm with QP granularity
  RDMA/rtrs-clt: Check strnlen return len in sysfs mpath_policy_store()
  RDMA/uverbs: Remove flexible arrays from struct *_filter
  RDMA/device: Fix a race between mad_client and cm_client init
  RDMA/hns: Fix mis-modifying default congestion control algorithm
  RDMA/rxe: Remove unused 'iova' parameter from rxe_mr_init_user
  RDMA/srpt: Do not register event handler until srpt device is fully setup
  RDMA/irdma: Remove duplicate assignment
  RDMA/efa: Limit EQs to available MSI-X vectors
  RDMA/mlx5: Delete unused mlx5_ib_copy_pas prototype
  RDMA/cxgb4: Delete unused c4iw_ep_redirect prototype
  RDMA/mana_ib: Introduce mana_ib_install_cq_cb helper function
  RDMA/mana_ib: Introduce mana_ib_get_netdev helper function
  RDMA/mana_ib: Introduce mdev_to_gc helper function
  RDMA/hns: Simplify 'struct hns_roce_hem' allocation
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull rdma updates from Jason Gunthorpe:
 "Very small update this cycle:

   - Minor code improvements in fi, rxe, ipoib, mana, cxgb4, mlx5,
     irdma, rxe, rtrs, mana

   - Simplify the hns hem mechanism

   - Fix EFA's MSI-X allocation in resource constrained configurations

   - Fix a KASN splat in srpt

   - Narrow hns's congestion control selection to QPs granularity and
     allow userspace to select it

   - Solve a parallel module loading race between the CM module and a
     driver module

   - Flexible array cleanup

   - Dump hns's SCC Conext to 'rdma res' for debugging

   - Make mana build page lists for HW objects that require a 0 offset
     correctly

   - Stuck CM ID debugging"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (29 commits)
  RDMA/cm: add timeout to cm_destroy_id wait
  RDMA/mana_ib: Use virtual address in dma regions for MRs
  RDMA/mana_ib: Fix bug in creation of dma regions
  RDMA/hns: Append SCC context to the raw dump of QPC
  RDMA/uverbs: Avoid -Wflex-array-member-not-at-end warnings
  RDMA/hns: Support userspace configuring congestion control algorithm with QP granularity
  RDMA/rtrs-clt: Check strnlen return len in sysfs mpath_policy_store()
  RDMA/uverbs: Remove flexible arrays from struct *_filter
  RDMA/device: Fix a race between mad_client and cm_client init
  RDMA/hns: Fix mis-modifying default congestion control algorithm
  RDMA/rxe: Remove unused 'iova' parameter from rxe_mr_init_user
  RDMA/srpt: Do not register event handler until srpt device is fully setup
  RDMA/irdma: Remove duplicate assignment
  RDMA/efa: Limit EQs to available MSI-X vectors
  RDMA/mlx5: Delete unused mlx5_ib_copy_pas prototype
  RDMA/cxgb4: Delete unused c4iw_ep_redirect prototype
  RDMA/mana_ib: Introduce mana_ib_install_cq_cb helper function
  RDMA/mana_ib: Introduce mana_ib_get_netdev helper function
  RDMA/mana_ib: Introduce mdev_to_gc helper function
  RDMA/hns: Simplify 'struct hns_roce_hem' allocation
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>rtnetlink: prepare nla_put_iflink() to run under RCU</title>
<updated>2024-02-26T11:46:12+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2024-02-22T10:50:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=e353ea9ce471331c13edffd5977eadd602d1bb80'/>
<id>e353ea9ce471331c13edffd5977eadd602d1bb80</id>
<content type='text'>
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.

This patch prepares dev_get_iflink() and nla_put_iflink()
to run either with RTNL or RCU held.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We want to be able to run rtnl_fill_ifinfo() under RCU protection
instead of RTNL in the future.

This patch prepares dev_get_iflink() and nla_put_iflink()
to run either with RTNL or RCU held.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/rtrs-clt: Check strnlen return len in sysfs mpath_policy_store()</title>
<updated>2024-02-25T16:21:42+00:00</updated>
<author>
<name>Alexey Kodanev</name>
<email>aleksei.kodanev@bell-sw.com</email>
</author>
<published>2024-02-21T11:32:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=7a7b7f575a25aa68ee934ee8107294487efcb3fe'/>
<id>7a7b7f575a25aa68ee934ee8107294487efcb3fe</id>
<content type='text'>
strnlen() may return 0 (e.g. for "\0\n" string), it's better to
check the result of strnlen() before using 'len - 1' expression
for the 'buf' array index.

Detected using the static analysis tool - Svace.

Fixes: dc3b66a0ce70 ("RDMA/rtrs-clt: Add a minimum latency multipath policy")
Signed-off-by: Alexey Kodanev &lt;aleksei.kodanev@bell-sw.com&gt;
Link: https://lore.kernel.org/r/20240221113204.147478-1-aleksei.kodanev@bell-sw.com
Acked-by: Jack Wang &lt;jinpu.wang@ionos.com&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
strnlen() may return 0 (e.g. for "\0\n" string), it's better to
check the result of strnlen() before using 'len - 1' expression
for the 'buf' array index.

Detected using the static analysis tool - Svace.

Fixes: dc3b66a0ce70 ("RDMA/rtrs-clt: Add a minimum latency multipath policy")
Signed-off-by: Alexey Kodanev &lt;aleksei.kodanev@bell-sw.com&gt;
Link: https://lore.kernel.org/r/20240221113204.147478-1-aleksei.kodanev@bell-sw.com
Acked-by: Jack Wang &lt;jinpu.wang@ionos.com&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/srpt: fix function pointer cast warnings</title>
<updated>2024-02-14T09:15:54+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2024-02-13T10:07:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=eb5c7465c3240151cd42a55c7ace9da0026308a1'/>
<id>eb5c7465c3240151cd42a55c7ace9da0026308a1</id>
<content type='text'>
clang-16 notices that srpt_qp_event() gets called through an incompatible
pointer here:

drivers/infiniband/ulp/srpt/ib_srpt.c:1815:5: error: cast from 'void (*)(struct ib_event *, struct srpt_rdma_ch *)' to 'void (*)(struct ib_event *, void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
 1815 |                 = (void(*)(struct ib_event *, void*))srpt_qp_event;

Change srpt_qp_event() to use the correct prototype and adjust the
argument inside of it.

Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Link: https://lore.kernel.org/r/20240213100728.458348-1-arnd@kernel.org
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
clang-16 notices that srpt_qp_event() gets called through an incompatible
pointer here:

drivers/infiniband/ulp/srpt/ib_srpt.c:1815:5: error: cast from 'void (*)(struct ib_event *, struct srpt_rdma_ch *)' to 'void (*)(struct ib_event *, void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
 1815 |                 = (void(*)(struct ib_event *, void*))srpt_qp_event;

Change srpt_qp_event() to use the correct prototype and adjust the
argument inside of it.

Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Link: https://lore.kernel.org/r/20240213100728.458348-1-arnd@kernel.org
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/srpt: Support specifying the srpt_service_guid parameter</title>
<updated>2024-02-05T07:57:53+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2024-02-05T00:42:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=fdfa083549de5d50ebf7f6811f33757781e838c0'/>
<id>fdfa083549de5d50ebf7f6811f33757781e838c0</id>
<content type='text'>
Make loading ib_srpt with this parameter set work. The current behavior is
that setting that parameter while loading the ib_srpt kernel module
triggers the following kernel crash:

BUG: kernel NULL pointer dereference, address: 0000000000000000
Call Trace:
 &lt;TASK&gt;
 parse_one+0x18c/0x1d0
 parse_args+0xe1/0x230
 load_module+0x8de/0xa60
 init_module_from_file+0x8b/0xd0
 idempotent_init_module+0x181/0x240
 __x64_sys_finit_module+0x5a/0xb0
 do_syscall_64+0x5f/0xe0
 entry_SYSCALL_64_after_hwframe+0x6e/0x76

Cc: LiHonggang &lt;honggangli@163.com&gt;
Reported-by: LiHonggang &lt;honggangli@163.com&gt;
Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Link: https://lore.kernel.org/r/20240205004207.17031-1-bvanassche@acm.org
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make loading ib_srpt with this parameter set work. The current behavior is
that setting that parameter while loading the ib_srpt kernel module
triggers the following kernel crash:

BUG: kernel NULL pointer dereference, address: 0000000000000000
Call Trace:
 &lt;TASK&gt;
 parse_one+0x18c/0x1d0
 parse_args+0xe1/0x230
 load_module+0x8de/0xa60
 init_module_from_file+0x8b/0xd0
 idempotent_init_module+0x181/0x240
 __x64_sys_finit_module+0x5a/0xb0
 do_syscall_64+0x5f/0xe0
 entry_SYSCALL_64_after_hwframe+0x6e/0x76

Cc: LiHonggang &lt;honggangli@163.com&gt;
Reported-by: LiHonggang &lt;honggangli@163.com&gt;
Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Link: https://lore.kernel.org/r/20240205004207.17031-1-bvanassche@acm.org
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/srpt: Do not register event handler until srpt device is fully setup</title>
<updated>2024-02-04T09:44:50+00:00</updated>
<author>
<name>William Kucharski</name>
<email>william.kucharski@oracle.com</email>
</author>
<published>2024-02-02T09:15:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=c21a8870c98611e8f892511825c9607f1e2cd456'/>
<id>c21a8870c98611e8f892511825c9607f1e2cd456</id>
<content type='text'>
Upon rare occasions, KASAN reports a use-after-free Write
in srpt_refresh_port().

This seems to be because an event handler is registered before the
srpt device is fully setup and a race condition upon error may leave a
partially setup event handler in place.

Instead, only register the event handler after srpt device initialization
is complete.

Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Link: https://lore.kernel.org/r/20240202091549.991784-2-william.kucharski@oracle.com
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Upon rare occasions, KASAN reports a use-after-free Write
in srpt_refresh_port().

This seems to be because an event handler is registered before the
srpt device is fully setup and a race condition upon error may leave a
partially setup event handler in place.

Instead, only register the event handler after srpt device initialization
is complete.

Fixes: a42d985bd5b2 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: William Kucharski &lt;william.kucharski@oracle.com&gt;
Link: https://lore.kernel.org/r/20240202091549.991784-2-william.kucharski@oracle.com
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>RDMA/ipoib: Print symbolic error name instead of error code</title>
<updated>2024-01-25T09:50:58+00:00</updated>
<author>
<name>Christian Heusel</name>
<email>christian@heusel.eu</email>
</author>
<published>2024-01-11T14:13:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=64854534ff96641e69fa1b7b2caab54259255c14'/>
<id>64854534ff96641e69fa1b7b2caab54259255c14</id>
<content type='text'>
Utilize the %pe print specifier to get the symbolic error name as a
string (i.e "-ENOMEM") in the log message instead of the error code to
increase its readability.

This change was suggested in
https://lore.kernel.org/all/92972476-0b1f-4d0a-9951-af3fc8bc6e65@suswa.mountain/

Signed-off-by: Christian Heusel &lt;christian@heusel.eu&gt;
Link: https://lore.kernel.org/r/20240111141311.987098-1-christian@heusel.eu
Reviewed-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Utilize the %pe print specifier to get the symbolic error name as a
string (i.e "-ENOMEM") in the log message instead of the error code to
increase its readability.

This change was suggested in
https://lore.kernel.org/all/92972476-0b1f-4d0a-9951-af3fc8bc6e65@suswa.mountain/

Signed-off-by: Christian Heusel &lt;christian@heusel.eu&gt;
Link: https://lore.kernel.org/r/20240111141311.987098-1-christian@heusel.eu
Reviewed-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/iser: Prevent invalidating wrong MR</title>
<updated>2024-01-04T20:37:03+00:00</updated>
<author>
<name>Sergey Gorenko</name>
<email>sergeygo@nvidia.com</email>
</author>
<published>2023-12-19T07:23:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=2f1888281e67205bd80d3e8f54dbd519a9653f26'/>
<id>2f1888281e67205bd80d3e8f54dbd519a9653f26</id>
<content type='text'>
The iser_reg_resources structure has two pointers to MR but only one
mr_valid field. The implementation assumes that we use only *sig_mr when
pi_enable is true. Otherwise, we use only *mr. However, it is only
sometimes correct. Read commands without protection information occur even
when pi_enble is true. For example, the following SCSI commands have a
Data-In buffer but never have protection information: READ CAPACITY (16),
INQUIRY, MODE SENSE(6), MAINTENANCE IN. So, we use
*sig_mr for some SCSI commands and *mr for the other SCSI commands.

In most cases, it works fine because the remote invalidation is applied.
However, there are two cases when the remote invalidation is not
applicable.
 1. Small write commands when all data is sent as an immediate.
 2. The target does not support the remote invalidation feature.

The lazy invalidation is used if the remote invalidation is impossible.
Since, at the lazy invalidation, we always invalidate the MR we want to
use, the wrong MR may be invalidated.

To fix the issue, we need a field per MR that indicates the MR needs
invalidation. Since the ib_mr structure already has such a field, let's
use ib_mr.need_inval instead of iser_reg_resources.mr_valid.

Fixes: b76a439982f8 ("IB/iser: Use IB_WR_REG_MR_INTEGRITY for PI handover")
Link: https://lore.kernel.org/r/20231219072311.40989-1-sergeygo@nvidia.com
Acked-by: Max Gurtovoy &lt;mgurtovoy@nvidia.com&gt;
Signed-off-by: Sergey Gorenko &lt;sergeygo@nvidia.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The iser_reg_resources structure has two pointers to MR but only one
mr_valid field. The implementation assumes that we use only *sig_mr when
pi_enable is true. Otherwise, we use only *mr. However, it is only
sometimes correct. Read commands without protection information occur even
when pi_enble is true. For example, the following SCSI commands have a
Data-In buffer but never have protection information: READ CAPACITY (16),
INQUIRY, MODE SENSE(6), MAINTENANCE IN. So, we use
*sig_mr for some SCSI commands and *mr for the other SCSI commands.

In most cases, it works fine because the remote invalidation is applied.
However, there are two cases when the remote invalidation is not
applicable.
 1. Small write commands when all data is sent as an immediate.
 2. The target does not support the remote invalidation feature.

The lazy invalidation is used if the remote invalidation is impossible.
Since, at the lazy invalidation, we always invalidate the MR we want to
use, the wrong MR may be invalidated.

To fix the issue, we need a field per MR that indicates the MR needs
invalidation. Since the ib_mr structure already has such a field, let's
use ib_mr.need_inval instead of iser_reg_resources.mr_valid.

Fixes: b76a439982f8 ("IB/iser: Use IB_WR_REG_MR_INTEGRITY for PI handover")
Link: https://lore.kernel.org/r/20231219072311.40989-1-sergeygo@nvidia.com
Acked-by: Max Gurtovoy &lt;mgurtovoy@nvidia.com&gt;
Signed-off-by: Sergey Gorenko &lt;sergeygo@nvidia.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/iser: iscsi_iser.h: fix kernel-doc warning and spellos</title>
<updated>2023-12-26T08:40:46+00:00</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2023-12-22T23:46:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=d42fafb895246e3f5a5e0f83e7485167fa651f5c'/>
<id>d42fafb895246e3f5a5e0f83e7485167fa651f5c</id>
<content type='text'>
Drop one kernel-doc comment to prevent a warning:

iscsi_iser.h:313: warning: Excess struct member 'mr' description in 'iser_device'

and spell 2 words correctly (buffer and deferred).

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Cc: Max Gurtovoy &lt;mgurtovoy@nvidia.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Cc: linux-rdma@vger.kernel.org
Link: https://lore.kernel.org/r/20231222234623.25231-1-rdunlap@infradead.org
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Acked-by: Max Gurtovoy &lt;mgurtovoy@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Drop one kernel-doc comment to prevent a warning:

iscsi_iser.h:313: warning: Excess struct member 'mr' description in 'iser_device'

and spell 2 words correctly (buffer and deferred).

Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Cc: Max Gurtovoy &lt;mgurtovoy@nvidia.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Cc: linux-rdma@vger.kernel.org
Link: https://lore.kernel.org/r/20231222234623.25231-1-rdunlap@infradead.org
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Acked-by: Max Gurtovoy &lt;mgurtovoy@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IB/ipoib: Fix mcast list locking</title>
<updated>2023-12-12T08:28:39+00:00</updated>
<author>
<name>Daniel Vacek</name>
<email>neelx@redhat.com</email>
</author>
<published>2023-12-12T08:07:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.toradex.cn/cgit/linux-toradex.git/commit/?id=4f973e211b3b1c6d36f7c6a19239d258856749f9'/>
<id>4f973e211b3b1c6d36f7c6a19239d258856749f9</id>
<content type='text'>
Releasing the `priv-&gt;lock` while iterating the `priv-&gt;multicast_list` in
`ipoib_mcast_join_task()` opens a window for `ipoib_mcast_dev_flush()` to
remove the items while in the middle of iteration. If the mcast is removed
while the lock was dropped, the for loop spins forever resulting in a hard
lockup (as was reported on RHEL 4.18.0-372.75.1.el8_6 kernel):

    Task A (kworker/u72:2 below)       | Task B (kworker/u72:0 below)
    -----------------------------------+-----------------------------------
    ipoib_mcast_join_task(work)        | ipoib_ib_dev_flush_light(work)
      spin_lock_irq(&amp;priv-&gt;lock)       | __ipoib_ib_dev_flush(priv, ...)
      list_for_each_entry(mcast,       | ipoib_mcast_dev_flush(dev = priv-&gt;dev)
          &amp;priv-&gt;multicast_list, list) |
        ipoib_mcast_join(dev, mcast)   |
          spin_unlock_irq(&amp;priv-&gt;lock) |
                                       |   spin_lock_irqsave(&amp;priv-&gt;lock, flags)
                                       |   list_for_each_entry_safe(mcast, tmcast,
                                       |                  &amp;priv-&gt;multicast_list, list)
                                       |     list_del(&amp;mcast-&gt;list);
                                       |     list_add_tail(&amp;mcast-&gt;list, &amp;remove_list)
                                       |   spin_unlock_irqrestore(&amp;priv-&gt;lock, flags)
          spin_lock_irq(&amp;priv-&gt;lock)   |
                                       |   ipoib_mcast_remove_list(&amp;remove_list)
   (Here, `mcast` is no longer on the  |     list_for_each_entry_safe(mcast, tmcast,
    `priv-&gt;multicast_list` and we keep |                            remove_list, list)
    spinning on the `remove_list` of   |  &gt;&gt;&gt;  wait_for_completion(&amp;mcast-&gt;done)
    the other thread which is blocked  |
    and the list is still valid on     |
    it's stack.)

Fix this by keeping the lock held and changing to GFP_ATOMIC to prevent
eventual sleeps.
Unfortunately we could not reproduce the lockup and confirm this fix but
based on the code review I think this fix should address such lockups.

crash&gt; bc 31
PID: 747      TASK: ff1c6a1a007e8000  CPU: 31   COMMAND: "kworker/u72:2"
--
    [exception RIP: ipoib_mcast_join_task+0x1b1]
    RIP: ffffffffc0944ac1  RSP: ff646f199a8c7e00  RFLAGS: 00000002
    RAX: 0000000000000000  RBX: ff1c6a1a04dc82f8  RCX: 0000000000000000
                                  work (&amp;priv-&gt;mcast_task{,.work})
    RDX: ff1c6a192d60ac68  RSI: 0000000000000286  RDI: ff1c6a1a04dc8000
           &amp;mcast-&gt;list
    RBP: ff646f199a8c7e90   R8: ff1c699980019420   R9: ff1c6a1920c9a000
    R10: ff646f199a8c7e00  R11: ff1c6a191a7d9800  R12: ff1c6a192d60ac00
                                                         mcast
    R13: ff1c6a1d82200000  R14: ff1c6a1a04dc8000  R15: ff1c6a1a04dc82d8
           dev                    priv (&amp;priv-&gt;lock)     &amp;priv-&gt;multicast_list (aka head)
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- &lt;NMI exception stack&gt; ---
 #5 [ff646f199a8c7e00] ipoib_mcast_join_task+0x1b1 at ffffffffc0944ac1 [ib_ipoib]
 #6 [ff646f199a8c7e98] process_one_work+0x1a7 at ffffffff9bf10967

crash&gt; rx ff646f199a8c7e68
ff646f199a8c7e68:  ff1c6a1a04dc82f8 &lt;&lt;&lt; work = &amp;priv-&gt;mcast_task.work

crash&gt; list -hO ipoib_dev_priv.multicast_list ff1c6a1a04dc8000
(empty)

crash&gt; ipoib_dev_priv.mcast_task.work.func,mcast_mutex.owner.counter ff1c6a1a04dc8000
  mcast_task.work.func = 0xffffffffc0944910 &lt;ipoib_mcast_join_task&gt;,
  mcast_mutex.owner.counter = 0xff1c69998efec000

crash&gt; b 8
PID: 8        TASK: ff1c69998efec000  CPU: 33   COMMAND: "kworker/u72:0"
--
 #3 [ff646f1980153d50] wait_for_completion+0x96 at ffffffff9c7d7646
 #4 [ff646f1980153d90] ipoib_mcast_remove_list+0x56 at ffffffffc0944dc6 [ib_ipoib]
 #5 [ff646f1980153de8] ipoib_mcast_dev_flush+0x1a7 at ffffffffc09455a7 [ib_ipoib]
 #6 [ff646f1980153e58] __ipoib_ib_dev_flush+0x1a4 at ffffffffc09431a4 [ib_ipoib]
 #7 [ff646f1980153e98] process_one_work+0x1a7 at ffffffff9bf10967

crash&gt; rx ff646f1980153e68
ff646f1980153e68:  ff1c6a1a04dc83f0 &lt;&lt;&lt; work = &amp;priv-&gt;flush_light

crash&gt; ipoib_dev_priv.flush_light.func,broadcast ff1c6a1a04dc8000
  flush_light.func = 0xffffffffc0943820 &lt;ipoib_ib_dev_flush_light&gt;,
  broadcast = 0x0,

The mcast(s) on the `remove_list` (the remaining part of the ex `priv-&gt;multicast_list`):

crash&gt; list -s ipoib_mcast.done.done ipoib_mcast.list -H ff646f1980153e10 | paste - -
ff1c6a192bd0c200          done.done = 0x0,
ff1c6a192d60ac00          done.done = 0x0,

Reported-by: Yuya Fujita-bishamonten &lt;fj-lsoft-rh-driver@dl.jp.fujitsu.com&gt;
Signed-off-by: Daniel Vacek &lt;neelx@redhat.com&gt;
Link: https://lore.kernel.org/all/20231212080746.1528802-1-neelx@redhat.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Releasing the `priv-&gt;lock` while iterating the `priv-&gt;multicast_list` in
`ipoib_mcast_join_task()` opens a window for `ipoib_mcast_dev_flush()` to
remove the items while in the middle of iteration. If the mcast is removed
while the lock was dropped, the for loop spins forever resulting in a hard
lockup (as was reported on RHEL 4.18.0-372.75.1.el8_6 kernel):

    Task A (kworker/u72:2 below)       | Task B (kworker/u72:0 below)
    -----------------------------------+-----------------------------------
    ipoib_mcast_join_task(work)        | ipoib_ib_dev_flush_light(work)
      spin_lock_irq(&amp;priv-&gt;lock)       | __ipoib_ib_dev_flush(priv, ...)
      list_for_each_entry(mcast,       | ipoib_mcast_dev_flush(dev = priv-&gt;dev)
          &amp;priv-&gt;multicast_list, list) |
        ipoib_mcast_join(dev, mcast)   |
          spin_unlock_irq(&amp;priv-&gt;lock) |
                                       |   spin_lock_irqsave(&amp;priv-&gt;lock, flags)
                                       |   list_for_each_entry_safe(mcast, tmcast,
                                       |                  &amp;priv-&gt;multicast_list, list)
                                       |     list_del(&amp;mcast-&gt;list);
                                       |     list_add_tail(&amp;mcast-&gt;list, &amp;remove_list)
                                       |   spin_unlock_irqrestore(&amp;priv-&gt;lock, flags)
          spin_lock_irq(&amp;priv-&gt;lock)   |
                                       |   ipoib_mcast_remove_list(&amp;remove_list)
   (Here, `mcast` is no longer on the  |     list_for_each_entry_safe(mcast, tmcast,
    `priv-&gt;multicast_list` and we keep |                            remove_list, list)
    spinning on the `remove_list` of   |  &gt;&gt;&gt;  wait_for_completion(&amp;mcast-&gt;done)
    the other thread which is blocked  |
    and the list is still valid on     |
    it's stack.)

Fix this by keeping the lock held and changing to GFP_ATOMIC to prevent
eventual sleeps.
Unfortunately we could not reproduce the lockup and confirm this fix but
based on the code review I think this fix should address such lockups.

crash&gt; bc 31
PID: 747      TASK: ff1c6a1a007e8000  CPU: 31   COMMAND: "kworker/u72:2"
--
    [exception RIP: ipoib_mcast_join_task+0x1b1]
    RIP: ffffffffc0944ac1  RSP: ff646f199a8c7e00  RFLAGS: 00000002
    RAX: 0000000000000000  RBX: ff1c6a1a04dc82f8  RCX: 0000000000000000
                                  work (&amp;priv-&gt;mcast_task{,.work})
    RDX: ff1c6a192d60ac68  RSI: 0000000000000286  RDI: ff1c6a1a04dc8000
           &amp;mcast-&gt;list
    RBP: ff646f199a8c7e90   R8: ff1c699980019420   R9: ff1c6a1920c9a000
    R10: ff646f199a8c7e00  R11: ff1c6a191a7d9800  R12: ff1c6a192d60ac00
                                                         mcast
    R13: ff1c6a1d82200000  R14: ff1c6a1a04dc8000  R15: ff1c6a1a04dc82d8
           dev                    priv (&amp;priv-&gt;lock)     &amp;priv-&gt;multicast_list (aka head)
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- &lt;NMI exception stack&gt; ---
 #5 [ff646f199a8c7e00] ipoib_mcast_join_task+0x1b1 at ffffffffc0944ac1 [ib_ipoib]
 #6 [ff646f199a8c7e98] process_one_work+0x1a7 at ffffffff9bf10967

crash&gt; rx ff646f199a8c7e68
ff646f199a8c7e68:  ff1c6a1a04dc82f8 &lt;&lt;&lt; work = &amp;priv-&gt;mcast_task.work

crash&gt; list -hO ipoib_dev_priv.multicast_list ff1c6a1a04dc8000
(empty)

crash&gt; ipoib_dev_priv.mcast_task.work.func,mcast_mutex.owner.counter ff1c6a1a04dc8000
  mcast_task.work.func = 0xffffffffc0944910 &lt;ipoib_mcast_join_task&gt;,
  mcast_mutex.owner.counter = 0xff1c69998efec000

crash&gt; b 8
PID: 8        TASK: ff1c69998efec000  CPU: 33   COMMAND: "kworker/u72:0"
--
 #3 [ff646f1980153d50] wait_for_completion+0x96 at ffffffff9c7d7646
 #4 [ff646f1980153d90] ipoib_mcast_remove_list+0x56 at ffffffffc0944dc6 [ib_ipoib]
 #5 [ff646f1980153de8] ipoib_mcast_dev_flush+0x1a7 at ffffffffc09455a7 [ib_ipoib]
 #6 [ff646f1980153e58] __ipoib_ib_dev_flush+0x1a4 at ffffffffc09431a4 [ib_ipoib]
 #7 [ff646f1980153e98] process_one_work+0x1a7 at ffffffff9bf10967

crash&gt; rx ff646f1980153e68
ff646f1980153e68:  ff1c6a1a04dc83f0 &lt;&lt;&lt; work = &amp;priv-&gt;flush_light

crash&gt; ipoib_dev_priv.flush_light.func,broadcast ff1c6a1a04dc8000
  flush_light.func = 0xffffffffc0943820 &lt;ipoib_ib_dev_flush_light&gt;,
  broadcast = 0x0,

The mcast(s) on the `remove_list` (the remaining part of the ex `priv-&gt;multicast_list`):

crash&gt; list -s ipoib_mcast.done.done ipoib_mcast.list -H ff646f1980153e10 | paste - -
ff1c6a192bd0c200          done.done = 0x0,
ff1c6a192d60ac00          done.done = 0x0,

Reported-by: Yuya Fujita-bishamonten &lt;fj-lsoft-rh-driver@dl.jp.fujitsu.com&gt;
Signed-off-by: Daniel Vacek &lt;neelx@redhat.com&gt;
Link: https://lore.kernel.org/all/20231212080746.1528802-1-neelx@redhat.com
Signed-off-by: Leon Romanovsky &lt;leon@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
