Merge branch 'net-mlx5-hw-counters-refactor'

Tariq Toukan says:

====================
net/mlx5: hw counters refactor

This is a patchset re-post, see:
https://lore.kernel.org/20240815054656.2210494-7-tariqt@nvidia.com

In this patchset, Cosmin refactors hw counters and solves perf scaling
issue.

Series generated against:
commit c824deb1a897 ("cxgb4: clip_tbl: Fix spelling mistake "wont" -> "won't"")

HW counters are central to mlx5 driver operations. They are hardware
objects created and used alongside most steering operations, and queried
from a variety of places. Most counters are queried in bulk from a
periodic task in fs_counters.c.

Counter performance is important and as such, a variety of improvements
have been done over the years. Currently, counters are allocated from
pools, which are bulk allocated to amortize the cost of firmware
commands. Counters are managed through an IDR, a doubly linked list and
two atomic single linked lists. Adding/removing counters is a complex
dance between user contexts requesting it and the mlx5_fc_stats_work
task which does most of the work.

Under high load (e.g. from connection tracking flow insertion/deletion),
the counter code becomes a bottleneck, as seen on flame graphs. Whenever
a counter is deleted, it gets added to a list and the wq task is
scheduled to run immediately to actually delete it. This is done via
mod_delayed_work which uses an internal spinlock. In some tests, waiting
for this spinlock took up to 66% of all samples.

This series refactors the counter code to use a more straight-forward
approach, avoiding the mod_delayed_work problem and making the code
easier to understand. For that:

- patch #1 moves counters data structs to a more appropriate place.
- patch #2 simplifies the bulk query allocation scheme by using vmalloc.
- patch #3 replaces the IDR+3 lists with an xarray. This is the main
  patch of the series, solving the spinlock congestion issue.
- patch #4 removes an unnecessary cacheline alignment causing a lot of
  memory to be wasted.
- patches #5 and #6 are small cleanups enabled by the refactoring.
====================

Link: https://patch.msgid.link/20241001103709.58127-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
author: Jakub Kicinski <kuba@kernel.org> 2024-10-04 11:33:48 -0700
committer: Jakub Kicinski <kuba@kernel.org> 2024-10-04 11:33:48 -0700
commit: 34ea1df802f79d4498a12ca79eff6fffbf8fa7f3 (patch)
tree: 917894d7a57ad66dde5fad82da213a047ce94192 /include/linux/mlx5/fs.h
parent: c55ff46aeebed1704a9a6861777b799f15ce594d (diff)
parent: d1c9cffe4b01f4d8bc52169139a5fedd48908abc (diff)
1 files changed, 0 insertions, 3 deletions
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index b744e554f014..438db888bde0 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -298,9 +298,6 @@ int mlx5_modify_rule_destination(struct mlx5_flow_handle *handler,
 
 struct mlx5_fc *mlx5_fc_create(struct mlx5_core_dev *dev, bool aging);
 
-/* As mlx5_fc_create() but doesn't queue stats refresh thread. */
-struct mlx5_fc *mlx5_fc_create_ex(struct mlx5_core_dev *dev, bool aging);
-
 void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter);
 u64 mlx5_fc_query_lastuse(struct mlx5_fc *counter);
 void mlx5_fc_query_cached(struct mlx5_fc *counter,