[#24565] docdb: Pre-Aggregate Metrics for Faster Scraping
Summary:
**Background**

The current approach aggregates metrics during the scrape itself, which has caused performance bottlenecks and frequent Prometheus target downtime for customers. We observed metric scrape times exceeding 15 seconds with 4,000 tables and 18,000 tablets, with most of that time spent aggregating tablet metrics to the table level.

This update introduces pre-aggregation for tablet metrics that are summed, allowing metric scrapes to skip aggregation of most tablet metrics and significantly reducing scrape duration.

**Key Aspects of Pre-Aggregation:**

Pre-aggregation is supported for tablet, xCluster, and CDCSDK metrics. Below, we use tablet metrics as an example:

1. Pre-Aggregation Setup: During metric creation, pre-aggregation is enabled based on the metric's entity type and aggregation function. Only tablet and stream metrics with a sum aggregation function are eligible for pre-aggregation; all other metrics are handled at scrape time.
2. Shared Atomic Variable: Pre-aggregation creates an atomic integer variable shared across all instances of the same metric within a table (or stream). When a pre-aggregated metric value is updated, the shared atomic integer is updated by the same amount.
3. Metric Destruction Handling: When a pre-aggregated tablet metric object is destroyed (e.g., due to a tablet move), the shared aggregated value is decremented by the tablet metric's value to maintain accuracy.

Contention concern for (2): With many concurrent read or write operations, contention may occur, because a thread updating a tablet metric value must compete with other threads updating the same table-level value. To verify the performance impact, I ran several Sysbench read-only and write-only workloads, which showed no noticeable impact. [[ https://docs.google.com/spreadsheets/d/1O-RtRWWLkZYNTnjeNWLvrocenIpMtwCb9urjtNkSR9c/edit | Link to results ]].

**New Metric Scraping Steps:**

1. Handling Non-Pre-Aggregated Metrics:
  * Metrics that need to be aggregated at scrape time are aggregated in this step.
  * Metrics that do not require aggregation are flushed directly in this step.
2. Flushing pre-aggregated metrics and scrape-time-aggregated metrics.

After these two phases complete, a separate asynchronous thread handles cleanup. This cleanup removes unreferenced metric values (e.g., when a table is removed and no tablets reference the shared aggregated value) and cleans up attributes associated with pre-aggregated values.

With these enhancements, scrape time has improved from 15 seconds to 2 seconds for 4,000 tables and 18,000 tablets.

**Other Changes:**

* Addressed a potential issue where aggregated metrics using the max aggregation function were assumed to always be greater than or equal to zero. Negative values are possible and are now handled correctly.
* Redesigned D35689: The aggregated metric now holds a shared_ptr to its prototype to ensure the OwningPrototype is not deleted before the flush operation.

Jira: DB-13599

Test Plan: Jenkins MetricsTest.AggregationTest

Reviewers: esheng, mlillibridge, rthallam, amitanand

Reviewed By: amitanand

Subscribers: amitanand, hsunder, yql, kannan, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D39667
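
For readers unfamiliar with the scheme, the shared-counter idea from the summary (setup, update, destruction handling, and cleanup of unreferenced values) can be sketched roughly as below. This is a minimal illustration, not the code in this commit: `SharedSum`, `TableSums`, `TabletCounter`, and `RunExample` are hypothetical names, the string key and the mutex-protected map are assumptions made for brevity, and the example runs single-threaded even though the atomics are what would make concurrent updates safe.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// All names below are hypothetical; this is not the YugabyteDB metrics API.

// Table-level sum shared by every tablet-level instance of one metric.
using SharedSum = std::atomic<int64_t>;

// Hands out one SharedSum per (table, metric) pair.
class TableSums {
 public:
  std::shared_ptr<SharedSum> GetOrCreate(const std::string& table_id,
                                         const std::string& metric_name) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto& slot = sums_[table_id + "/" + metric_name];
    if (!slot) slot = std::make_shared<SharedSum>(0);
    return slot;
  }

  // Scrape path: read the pre-aggregated value without a per-tablet loop.
  // Returns 0 for an unknown key.
  int64_t Read(const std::string& table_id, const std::string& metric_name) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = sums_.find(table_id + "/" + metric_name);
    return it == sums_.end() ? 0 : it->second->load(std::memory_order_relaxed);
  }

  // Cleanup path: drop sums that no tablet metric references any more.
  std::size_t RemoveUnreferenced() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::size_t removed = 0;
    for (auto it = sums_.begin(); it != sums_.end();) {
      if (it->second.use_count() == 1) {  // only the map holds it
        it = sums_.erase(it);
        ++removed;
      } else {
        ++it;
      }
    }
    return removed;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<SharedSum>> sums_;
};

// A summed tablet-level metric that mirrors every update into the table sum.
class TabletCounter {
 public:
  explicit TabletCounter(std::shared_ptr<SharedSum> table_sum)
      : table_sum_(std::move(table_sum)) {}

  // On destruction (for example, a tablet move), subtract this tablet's
  // contribution so the table-level sum stays accurate.
  ~TabletCounter() {
    table_sum_->fetch_sub(value_.load(std::memory_order_relaxed),
                          std::memory_order_relaxed);
  }

  TabletCounter(const TabletCounter&) = delete;
  TabletCounter& operator=(const TabletCounter&) = delete;

  // Every update touches both the tablet value and the shared table sum.
  void IncrementBy(int64_t delta) {
    value_.fetch_add(delta, std::memory_order_relaxed);
    table_sum_->fetch_add(delta, std::memory_order_relaxed);
  }

 private:
  std::atomic<int64_t> value_{0};
  std::shared_ptr<SharedSum> table_sum_;
};

// Usage example: two tablets of one table, then both tablets go away.
void RunExample() {
  TableSums sums;
  auto t1 = std::make_unique<TabletCounter>(sums.GetOrCreate("table-a", "rows"));
  auto t2 = std::make_unique<TabletCounter>(sums.GetOrCreate("table-a", "rows"));
  t1->IncrementBy(1000);
  t2->IncrementBy(2000);
  assert(sums.Read("table-a", "rows") == 3000);
  t2.reset();  // tablet destroyed: its 2000 is subtracted
  assert(sums.Read("table-a", "rows") == 1000);
  t1.reset();
  assert(sums.RemoveUnreferenced() == 1);
}
```

The trade-off this sketch makes visible is the one the summary calls out: each update performs an extra atomic add on a value shared by all tablets of the table, in exchange for a scrape that reads one value per table instead of iterating over tablets.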
1 parent cb0dfba, commit 6486578
Showing 27 changed files with 1,670 additions and 558 deletions.