northd: lflow-mgr: Allocate DP reference counters on a second use.
Currently, whenever a new logical flow is created, northd allocates a reference counter for every datapath in the datapath group for that logical flow. Those reference counters are necessary in order to not remove the datapath from a logical flow during incremental processing if it was created from two different objects for the same datapath and now one of them is removed.

However, that creates a serious scalability problem. In large scale setups with tens of thousands of logical flows applied to large datapath groups we may have hundreds of millions of these reference counters allocated, which is many GB of RSS just for that purpose.

For example, in ovn-heater's cluster-density 500 node test, ovn-northd started to consume up to 8.5 GB of RAM. In the same test before the reference counting, ovn-northd consumed only 2.5 GB. All those memory allocations also increased CPU usage. Re-compute time went up from 1.5 seconds to 6 seconds in the same test. In the end we have an increase of about 4x in both CPU and memory usage.
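To illustrate why a per-datapath counter is needed at all, here is a minimal sketch of the eager scheme described above. It is a simplified model, not the northd code: the `eager_lflow` structure, the `eager_add` and `eager_remove` helpers and the fixed `N_DPS` array are hypothetical, whereas northd keeps individually allocated counters in a hash map per logical flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_DPS 8  /* Illustrative number of datapaths (assumption). */

/* Hypothetical flow record that counts every use of every datapath. */
struct eager_lflow {
    bool in_group[N_DPS];      /* Datapath group membership. */
    unsigned int uses[N_DPS];  /* Objects that generated this flow per datapath. */
};

/* Called when some object generates the flow for datapath 'dp'. */
void
eager_add(struct eager_lflow *f, size_t dp)
{
    assert(dp < N_DPS);
    f->in_group[dp] = true;
    f->uses[dp]++;
}

/* Called when one such object is removed.  Returns true only if the
 * datapath actually left the group, i.e. nobody else still needs it. */
bool
eager_remove(struct eager_lflow *f, size_t dp)
{
    assert(dp < N_DPS && f->uses[dp] > 0);
    if (--f->uses[dp] == 0) {
        f->in_group[dp] = false;
        return true;
    }
    return false;
}
```

If two objects add the same flow for datapath 2 and one of them is later removed, `eager_remove` returns false and the datapath stays in the group. Without the counter, the first removal would have dropped the datapath while the second object still depended on it.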
Running under valgrind --tool=massif shows:

  -------------------------------------------------------------
    total(B)       useful-heap(B)   extra-heap(B)   stacks(B)
  -------------------------------------------------------------
   9,416,438,200   7,401,971,556    2,014,466,644      0

  78.61% (7,401 MB) (heap allocation functions) malloc
  ->45.78% (4,311 MB) xcalloc__ (util.c:124)
  | ->45.68% (4,301 MB) xzalloc__ (util.c:134)
  | | ->45.68% (4,301 MB) xzalloc (util.c:168)
  | |   ->40.97% (3,857 MB) dp_refcnt_use (lflow-mgr.c:1348)
  | |   | ->40.89% (3,850 MB) lflow_table_add_lflow (lflow-mgr.c:696)
  | |   | | ->12.27% (1,155 MB) build_lb_rules_pre_stateful (northd.c:7180)
  | |   | | ->12.27% (1,155 MB) build_lb_rules (northd.c:7658)
  | |   | | ->12.27% (1,155 MB) build_gw_lrouter_nat_flows_for_lb (northd.c:11054)
  ->28.62% (2,694 MB) xmalloc__ (util.c:140)
  | ->28.62% (2,694 MB) xmalloc (util.c:175)
  |   ->06.71% (631 MB) resize (hmap.c:100)
  |   | ->06.01% (565 MB) hmap_expand_at (hmap.c:175)
  |   | | ->05.24% (492 MB) hmap_insert_at (hmap.h:309)
  |   | | | ->05.24% (492 MB) dp_refcnt_use (lflow-mgr.c:1351)

45% of all the memory is allocated for reference counters themselves and another 7% is taken by hash maps to hold them. Also, more than 20% of the total memory is allocation overhead (extra-heap), since all these allocated objects are very small (32B).

This test allocates 120 M reference counters total.

However, the vast majority of the reference counters always have a value of 1, i.e. these datapaths are not used more than once.

Defer allocation of reference counters until the datapath is used for the same logical flow for the second time. We can do that by checking the current datapath group bitmap.

With this change, the number of allocated reference counters goes down from 120 M to just 12 M. Memory consumption is reduced from 8.5 GB to 2.67 GB and the northd recompute time is reduced from 6 to 2.1 seconds.
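The deferred scheme can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: the `sketch_*` names are hypothetical, a single `uint64_t` stands in for the datapath group bitmap, and a linked list stands in for the hash map of counters. The point it shows is that a set bit with no counter implies exactly one use, so a counter is only allocated on the second use and starts by accounting for the first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_DPS 64  /* Assumption: one uint64_t serves as the bitmap. */

/* Hypothetical per-datapath counter, allocated lazily. */
struct sketch_refcnt {
    struct sketch_refcnt *next;
    size_t dp_index;
    size_t refcnt;
};

/* Hypothetical logical flow with its datapath group. */
struct sketch_lflow {
    uint64_t dpg_bitmap;            /* Bit N set: flow applies to datapath N. */
    struct sketch_refcnt *refcnts;  /* Only datapaths used more than once. */
    size_t n_allocated;             /* Counters currently allocated. */
};

/* Returns the link that points at the counter for 'dp_index', or the
 * terminating NULL link if no counter exists. */
static struct sketch_refcnt **
sketch_refcnt_find(struct sketch_lflow *lflow, size_t dp_index)
{
    struct sketch_refcnt **p = &lflow->refcnts;
    while (*p && (*p)->dp_index != dp_index) {
        p = &(*p)->next;
    }
    return p;
}

/* Records one more use of datapath 'dp_index' by 'lflow'. */
void
sketch_lflow_use_dp(struct sketch_lflow *lflow, size_t dp_index)
{
    assert(dp_index < MAX_DPS);
    uint64_t bit = UINT64_C(1) << dp_index;

    if (!(lflow->dpg_bitmap & bit)) {
        /* First use: the bitmap bit alone encodes a count of 1. */
        lflow->dpg_bitmap |= bit;
        return;
    }

    struct sketch_refcnt **p = sketch_refcnt_find(lflow, dp_index);
    if (!*p) {
        /* Second use: allocate now, starting from the implicit first use. */
        struct sketch_refcnt *rc = malloc(sizeof *rc);
        if (!rc) {
            abort();
        }
        rc->next = NULL;
        rc->dp_index = dp_index;
        rc->refcnt = 1;
        *p = rc;
        lflow->n_allocated++;
    }
    (*p)->refcnt++;
}

/* Drops one use.  Returns true if the datapath was removed from the flow. */
bool
sketch_lflow_release_dp(struct sketch_lflow *lflow, size_t dp_index)
{
    assert(dp_index < MAX_DPS);
    uint64_t bit = UINT64_C(1) << dp_index;

    if (!(lflow->dpg_bitmap & bit)) {
        return false;  /* Datapath is not part of this flow. */
    }

    struct sketch_refcnt **p = sketch_refcnt_find(lflow, dp_index);
    if (*p) {
        if (--(*p)->refcnt > 0) {
            return false;  /* Other users remain. */
        }
        struct sketch_refcnt *rc = *p;
        *p = rc->next;
        free(rc);
        lflow->n_allocated--;
    }
    /* No counter, or counter dropped to zero: this was the last use. */
    lflow->dpg_bitmap &= ~bit;
    return true;
}
```

In this model a datapath that is only ever used once never triggers an allocation, which is the case the commit message describes as the vast majority.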
It is still a little higher than resource usage before introduction of incremental processing for logical flows, but it is fairly manageable. Also, the resource usage and memory consumption may be further improved by reducing the number of cases where northd attempts to create the logical flows for the same datapaths multiple times.

Note: the cluster-density test in ovn-heater creates new port groups on every iteration and ovn-northd doesn't handle this incrementally, so it always re-computes. That's why there is no benefit from northd I-P for CPU usage in this scenario.

Fixes: a623606 ("northd: Refactor lflow management into a separate module.")
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Numan Siddique <numans@ovn.org>