Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add capability to purge old histogram data #460

Merged
merged 6 commits into from
Mar 16, 2024

Conversation

mnpw
Copy link
Contributor

@mnpw mnpw commented Mar 8, 2024

Copy of #451


What

  • add purge_timeout option to PrometheusBuilder
  • run a purger that purges based on the purge_timeout

Implements third way as prescribed here to purge old histogram data:

  • update the builder to generate a future which both drives the Hyper server future as well as a call to get_recent_metrics on an interval

Fixes #245

mnpw added 3 commits March 8, 2024 13:21
- add purge_timeout option to PrometheusBuilder
- run a purger that purges based on the purge_timeout
Copy link
Member

@tobz tobz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, but just a few notes/requests to try and make this a little more generic/understandable to folks.

Comment on lines 498 to 513
if let Ok(handle) = runtime::Handle::try_current() {
handle.spawn(purger);
} else {
let thread_name = "metrics-exporter-prometheus-purger";

let runtime = runtime::Builder::new_current_thread()
.enable_all()
.build()
.map_err(|e| BuildError::FailedToCreateRuntime(e.to_string()))?;

thread::Builder::new()
.name(thread_name.to_owned())
.spawn(move || runtime.block_on(purger))
.map_err(|e| BuildError::FailedToCreateRuntime(e.to_string()))?;
};
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the conditional logic here: just spawn the future directly.

(We already document that this method must be called from within a Tokio runtime or else it will panic.)


let exporter_config = self.exporter_config.clone();
let recorder = self.build_recorder();
let handle = recorder.handle();

// use the handle to recorder
// #[cfg(not(feature = "push-gateway"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this line.


/// Purges registry's histogram data by draining it into the distribution. This should be
/// called periodically to prevent the accumulation of histogram samples.
pub fn purge(&self) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just change the wording overall from "purge" to "upkeep", and make this run_upkeep.

Purging to me sounds like getting rid of, when realistically we're just doing periodic cleanup work.

Comment on lines 366 to 370
/// Sets the purge timeout for metrics.
///
/// If a purge timeout is set, the purger will call `.render()` on the registry, causing
/// the values from histograms to be drained out. This ensures that stale histogram values
/// do not persist indefinitely.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar note here about just changing the wording overall from "purge"/"purging" to "upkeep".

I would also remove the specific bit about calling .render(), since it doesn't actually do that anymore. Just be generic, something like:

/// Sets the upkeep interval.
///
/// The upkeep task handles periodic maintenance operations, such as draining histogram data,
/// to ensure that all recorded data is up-to-date and prevent unbounded memory growth. 

@@ -128,6 +129,7 @@ impl PrometheusBuilder {
buckets: None,
bucket_overrides: None,
idle_timeout: None,
purge_timeout: None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should actually enable this by default.

Thinking more about it, it's a quality of life improvement to avoid unbounded memory growth for people with a high rate of histogram metrics, or who scrape/push their Prometheus exporter infrequently.

Probably 5 seconds as the default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, 5 seconds is a good default. Also I can't think of a use case where somebody would not want to run upkeep so making this non-optional would be sensible

Copy link
Member

@tobz tobz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. Nice and simple. 👍🏻

@tobz tobz merged commit 3c71988 into metrics-rs:main Mar 16, 2024
12 checks passed
@tobz tobz added C-exporter Component: exporters such as Prometheus, TCP, etc. E-simple Effort: simple. T-enhancement Type: enhancement. S-awaiting-release Status: awaiting a release to be considered fixed/implemented. labels Mar 16, 2024
@tobz
Copy link
Member

tobz commented Mar 16, 2024

Released in [email protected].

Thanks again for your contribution!

@tobz tobz removed the S-awaiting-release Status: awaiting a release to be considered fixed/implemented. label Mar 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-exporter Component: exporters such as Prometheus, TCP, etc. E-simple Effort: simple. T-enhancement Type: enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clearing/Expiration of old values in histogram/limit to histogram
2 participants