summaryrefslogtreecommitdiff
path: root/tools/lib/python/abi
diff options
context:
space:
mode:
authorCosmin Ratiu <cratiu@nvidia.com>2026-02-18 09:29:03 +0200
committerJakub Kicinski <kuba@kernel.org>2026-02-19 09:30:16 -0800
commit83ac0304a2d77519dae1e54c9713cbe1aedf19c9 (patch)
treec0eae92a12d861d7a8a479dde2a12149e4c93d28 /tools/lib/python/abi
parent9854b243ce4225328d0b32fdc998d35b6952d3f7 (diff)
net/mlx5e: Fix deadlocks between devlink and netdev instance locks
In the mentioned "Fixes" commit, various work tasks triggering devlink health reporter recovery were switched to use netdev_trylock to protect against concurrent tear down of the channels being recovered. But this had the side effect of introducing potential deadlocks because of incorrect lock ordering. The correct lock order is described by the init flow: probe_one -> mlx5_init_one (acquires devlink lock) -> mlx5_init_one_devl_locked -> mlx5_register_device -> mlx5_rescan_drivers_locked -...-> mlx5e_probe -> _mlx5e_probe -> register_netdev (acquires rtnl lock) -> register_netdevice (acquires netdev lock) => devlink lock -> rtnl lock -> netdev lock. But in the current recovery flow, the order is wrong: mlx5e_tx_err_cqe_work (acquires netdev lock) -> mlx5e_reporter_tx_err_cqe -> mlx5e_health_report -> devlink_health_report (acquires devlink lock => boom!) -> devlink_health_reporter_recover -> mlx5e_tx_reporter_recover -> mlx5e_tx_reporter_recover_from_ctx -> mlx5e_tx_reporter_err_cqe_recover The same pattern exists in: mlx5e_reporter_rx_timeout mlx5e_reporter_tx_ptpsq_unhealthy mlx5e_reporter_tx_timeout Fix these by moving the netdev_trylock calls from the work handlers lower in the call stack, in the respective recovery functions, where they are actually necessary. Fixes: 8f7b00307bf1 ("net/mlx5e: Convert mlx5 netdevs to instance locking") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260218072904.1764634-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'tools/lib/python/abi')
0 files changed, 0 insertions, 0 deletions