[DocDB] Pre-Aggregate Metrics for Faster Prometheus Metric Scraping #24565

Open
1 task done
yusong-yan opened this issue Oct 22, 2024 · 0 comments
Assignees
Labels
area/docdb (YugabyteDB core features), kind/enhancement (This is an enhancement of an existing feature), priority/medium (Medium priority issue)

Comments

yusong-yan commented Oct 22, 2024

Jira Link: DB-13599

Description

The current metric scraping approach aggregates metrics during the scrape, which has caused performance bottlenecks, leading to frequent Prometheus target downtimes for customers. We observed metric scrape times exceeding 15 seconds with 4,000 tables and 18,000 tablets, with most of this time consumed by aggregating tablet metrics to the table level.

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
yusong-yan added the area/docdb (YugabyteDB core features) and status/awaiting-triage (Issue awaiting triage) labels Oct 22, 2024
yusong-yan self-assigned this Oct 22, 2024
yugabyte-ci added the kind/enhancement (This is an enhancement of an existing feature) and priority/medium (Medium priority issue) labels Oct 22, 2024
yugabyte-ci removed the status/awaiting-triage (Issue awaiting triage) label Oct 23, 2024
yusong-yan changed the title from "[DocDB] Optimize Metric Scraping to Reduce Unnecessary Allocations and Redundant Operations" to "[DocDB] Pre-Aggregate Pre-Aggregate Metrics for Faster Prometheus Metric Scraping" Nov 21, 2024
yusong-yan added a commit that referenced this issue Dec 31, 2024
Summary:
**Background**
The current metric scraping approach aggregates metrics during the scrape, which has caused performance bottlenecks, leading to frequent Prometheus target downtimes for customers. We observed metric scrape times exceeding 15 seconds with 4,000 tables and 18,000 tablets, with most of this time consumed by aggregating tablet metrics to the table level. This update introduces pre-aggregation for tablet metrics that are summed, allowing metric scrapes to skip aggregation of most tablet metrics, significantly reducing the scrape duration.

**Key Aspects of Pre-Aggregation:**
Pre-aggregation is supported for tablet, xCluster, and CDCSDK metrics. Below, we use tablet metrics as an example:
1. Pre-Aggregation Setup: During metric creation, pre-aggregation is enabled based on the metric's entity type and aggregation function. Only tablet and stream metrics with a sum aggregation function are eligible for pre-aggregation, while other metrics are handled at scrape time.
2. Shared Atomic Variable: Pre-aggregation creates an atomic integer variable shared across all instances of the same metric within a table (or stream). When a pre-aggregated metric value is updated, the shared atomic integer is updated by the same amount.
3. Metric Destruction Handling: When a pre-aggregated tablet metric object is destroyed (e.g., due to tablet move), the shared aggregated value is decremented by the tablet metric value to maintain accuracy.
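
The three points above can be sketched as follows. This is a minimal illustration, not the code from the commit: `SharedAggregate` and `PreAggregatedCounter` are hypothetical names, and the relaxed memory ordering is an assumption made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical names throughout; this is not YugabyteDB's actual metrics API.

// Table-level (or stream-level) running sum shared by every tablet-level
// instance of the same metric.
struct SharedAggregate {
  std::atomic<int64_t> sum{0};
};

// A tablet-level counter that mirrors every update into the shared sum.
class PreAggregatedCounter {
 public:
  explicit PreAggregatedCounter(std::shared_ptr<SharedAggregate> aggregate)
      : aggregate_(std::move(aggregate)) {
    assert(aggregate_ != nullptr);
  }

  PreAggregatedCounter(const PreAggregatedCounter&) = delete;
  PreAggregatedCounter& operator=(const PreAggregatedCounter&) = delete;

  // When the tablet metric is destroyed (e.g. the tablet moves away),
  // subtract its contribution so the table-level sum stays accurate.
  ~PreAggregatedCounter() {
    aggregate_->sum.fetch_sub(value_.load(std::memory_order_relaxed),
                              std::memory_order_relaxed);
  }

  // Update the tablet-level value and the shared table-level sum together.
  void IncrementBy(int64_t delta) {
    value_.fetch_add(delta, std::memory_order_relaxed);
    aggregate_->sum.fetch_add(delta, std::memory_order_relaxed);
  }

  int64_t value() const { return value_.load(std::memory_order_relaxed); }

 private:
  std::atomic<int64_t> value_{0};
  std::shared_ptr<SharedAggregate> aggregate_;
};
```

With this shape, a scrape reads `sum` once per table and metric instead of walking every tablet, which is where the time saving described above would come from.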

Contention concern for (2): With many concurrent read or write operations, contention may occur, because each update to a tablet metric value must compete with other threads updating the same table-level value. To verify the performance impact, I ran several Sysbench read-only and write-only workloads, which showed no noticeable impact. [[ https://docs.google.com/spreadsheets/d/1O-RtRWWLkZYNTnjeNWLvrocenIpMtwCb9urjtNkSR9c/edit | Link to results ]].

**New Metric Scraping Steps:**
1. Handling Non-Pre-Aggregated Metrics:
     * Metrics that need to be aggregated at scrape time are aggregated in this step.
     * Metrics that do not require aggregation are flushed directly in this step.
2. Flushing pre-aggregated metrics and scrape-time-aggregated metrics.
After completing these two phases, a separate asynchronous thread handles cleanup. This cleanup removes unreferenced metric values (e.g., when a table is removed, and no tablets reference the shared aggregated value) and cleans up attributes associated with pre-aggregated values.
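
A rough sketch of the two scrape phases and the cleanup pass is shown below. All types and function names are hypothetical, max is used as a stand-in for any aggregation done at scrape time, and the cleanup runs synchronously here for simplicity, whereas the description above says it runs on a separate asynchronous thread.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// All names here are hypothetical; this is not YugabyteDB's metrics code.
struct ScrapedMetric {
  std::string entity;        // tablet or server id
  std::string table;         // owning table; empty if not a per-table metric
  std::string name;
  int64_t value;
  bool aggregate_at_scrape;  // true for e.g. max-aggregated metrics
};

using SharedValue = std::shared_ptr<std::atomic<int64_t>>;
using Registry = std::map<std::string, SharedValue>;  // "table/metric" -> sum
using Output = std::map<std::string, int64_t>;

Output Scrape(const std::vector<ScrapedMetric>& metrics,
              const Registry& pre_aggregated) {
  Output out;
  Output scrape_time;
  // Phase 1: aggregate what must be aggregated now (max in this sketch) and
  // flush metrics that need no aggregation directly.
  for (const ScrapedMetric& m : metrics) {
    if (m.aggregate_at_scrape) {
      const std::string key = m.table + "/" + m.name;
      auto it = scrape_time.find(key);
      if (it == scrape_time.end()) {
        scrape_time[key] = m.value;
      } else {
        it->second = std::max(it->second, m.value);
      }
    } else {
      out[m.entity + "/" + m.name] = m.value;
    }
  }
  // Phase 2: flush scrape-time aggregates and the pre-aggregated sums.
  for (const auto& entry : scrape_time) out[entry.first] = entry.second;
  for (const auto& entry : pre_aggregated) {
    out[entry.first] = entry.second->load();
  }
  return out;
}

// Cleanup: drop shared values that only the registry still references,
// e.g. after a table is removed and no tablet metric points at them.
std::size_t CleanupUnreferenced(Registry* registry) {
  assert(registry != nullptr);
  std::size_t removed = 0;
  for (auto it = registry->begin(); it != registry->end();) {
    if (it->second.use_count() == 1) {
      it = registry->erase(it);
      ++removed;
    } else {
      ++it;
    }
  }
  return removed;
}
```

Using the reference count to detect unreferenced values is one possible design; the commit may track references differently.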

With these enhancements, scrape time has improved from 15 seconds to 2 seconds for 4,000 tables and 18,000 tablets.

**Other Changes:**
* Addressed a potential issue where aggregated metrics using the max aggregation function were assumed to always be greater than or equal to zero. However, negative values are possible and are now correctly handled.
* Redesigned D35689: The aggregated metric now holds a shared_ptr to its prototype to ensure the OwningPrototype is not deleted before the flush operation.
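
To illustrate the max-aggregation fix in the first bullet: one way such an assumption can creep in is by seeding the running maximum with zero. The sketch below contrasts that with a seed that admits negative values. Both function names are hypothetical, and the commit does not state that the original code looked like this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical illustration only; not the code changed in the commit.

// Seeding with 0 silently assumes every value is >= 0, so an all-negative
// input is reported as 0.
int64_t MaxSeededWithZero(const std::vector<int64_t>& values) {
  int64_t result = 0;
  for (int64_t v : values) result = std::max(result, v);
  return result;
}

// Seeding with the smallest representable value handles negative inputs.
// The input must be non-empty for the result to be meaningful.
int64_t MaxAllowingNegatives(const std::vector<int64_t>& values) {
  assert(!values.empty());
  int64_t result = std::numeric_limits<int64_t>::min();
  for (int64_t v : values) result = std::max(result, v);
  return result;
}
```

For the input `{-5, -2, -9}`, the zero-seeded version returns 0, while the second version returns -2.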
Jira: DB-13599

Test Plan:
Jenkins
MetricsTest.AggregationTest

Reviewers: esheng, mlillibridge, rthallam, amitanand

Reviewed By: amitanand

Subscribers: amitanand, hsunder, yql, kannan, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D39667
vaibhav-yb pushed a commit to vaibhav-yb/yugabyte-db that referenced this issue Jan 2, 2025
rthallamko3 changed the title from "[DocDB] Pre-Aggregate Pre-Aggregate Metrics for Faster Prometheus Metric Scraping" to "[DocDB] Pre-Aggregate Metrics for Faster Prometheus Metric Scraping" Jan 2, 2025