diff options
| author | Christoph Paasch <cpaasch@openai.com> | 2025-08-16 16:12:48 -0700 |
|---|---|---|
| committer | Jakub Kicinski <kuba@kernel.org> | 2025-08-19 17:50:33 -0700 |
| commit | 5236f57e7c033d869fe8f2080a977ea47882b26f (patch) | |
| tree | 66ee461e3ccb4ecc5ce98cbb0740fadc5c9582e2 | |
| parent | 51992f99f068fba966a680a9ac118b815f2fe08e (diff) | |
net: Make nexthop-dumps scale linearly with the number of nexthops
When we have a (very) large number of nexthops, they do not fit within a
single message. rtm_dump_walk_nexthops() thus will be called repeatedly
and ctx->idx is used to avoid dumping the same nexthops again.
The approach in which we avoid dumping the same nexthops is by basically
walking the entire nexthop rb-tree from the left-most node until we find
a node whose id is >= s_idx. That does not scale well.
Instead of this inefficient approach, rather go directly through the
tree to the nexthop that should be dumped (the one whose nh_id >=
s_idx). This allows us to find the relevant node in O(log(n)).
We have quite a nice improvement with this:
Before:
=======
--> ~1M nexthops:
$ time ~/libnl/src/nl-nh-list | wc -l
1050624
real 0m21.080s
user 0m0.666s
sys 0m20.384s
--> ~2M nexthops:
$ time ~/libnl/src/nl-nh-list | wc -l
2101248
real 1m51.649s
user 0m1.540s
sys 1m49.908s
After:
======
--> ~1M nexthops:
$ time ~/libnl/src/nl-nh-list | wc -l
1050624
real 0m1.157s
user 0m0.926s
sys 0m0.259s
--> ~2M nexthops:
$ time ~/libnl/src/nl-nh-list | wc -l
2101248
real 0m2.763s
user 0m2.042s
sys 0m0.776s
Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250816-nexthop_dump-v2-1-491da3462118@openai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| -rw-r--r-- | net/ipv4/nexthop.c | 36 |
1 files changed, 33 insertions, 3 deletions
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c index 29118c43ebf5..509004bfd08e 100644 --- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -3511,12 +3511,42 @@ static int rtm_dump_walk_nexthops(struct sk_buff *skb, int err; s_idx = ctx->idx; - for (node = rb_first(root); node; node = rb_next(node)) { + + /* If this is not the first invocation, ctx->idx will contain the id of + * the last nexthop we processed. Instead of starting from the very + * first element of the red/black tree again and linearly skipping the + * (potentially large) set of nodes with an id smaller than s_idx, walk + * the tree and find the left-most node whose id is >= s_idx. This + * provides an efficient O(log n) starting point for the dump + * continuation. + */ + if (s_idx != 0) { + struct rb_node *tmp = root->rb_node; + + node = NULL; + while (tmp) { + struct nexthop *nh; + + nh = rb_entry(tmp, struct nexthop, rb_node); + if (nh->id < s_idx) { + tmp = tmp->rb_right; + } else { + /* Track current candidate and keep looking on + * the left side to find the left-most + * (smallest id) that is still >= s_idx. + */ + node = tmp; + tmp = tmp->rb_left; + } + } + } else { + node = rb_first(root); + } + + for (; node; node = rb_next(node)) { struct nexthop *nh; nh = rb_entry(node, struct nexthop, rb_node); - if (nh->id < s_idx) - continue; ctx->idx = nh->id; err = nh_cb(skb, cb, nh, data); |
