feat: add capability to purge old histogram data #460
- add purge_timeout option to PrometheusBuilder
- run a purger that purges based on the purge_timeout
This is looking good, but just a few notes/requests to try and make this a little more generic/understandable to folks.
if let Ok(handle) = runtime::Handle::try_current() {
    handle.spawn(purger);
} else {
    let thread_name = "metrics-exporter-prometheus-purger";

    let runtime = runtime::Builder::new_current_thread()
        .enable_all()
        .build()
        .map_err(|e| BuildError::FailedToCreateRuntime(e.to_string()))?;

    thread::Builder::new()
        .name(thread_name.to_owned())
        .spawn(move || runtime.block_on(purger))
        .map_err(|e| BuildError::FailedToCreateRuntime(e.to_string()))?;
};
}
No need for the conditional logic here: just spawn the future directly.
(We already document that this method must be called from within a Tokio runtime or else it will panic.)
let exporter_config = self.exporter_config.clone();
let recorder = self.build_recorder();
let handle = recorder.handle();

// use the handle to recorder
// #[cfg(not(feature = "push-gateway"))]
Delete this line.
/// Purges registry's histogram data by draining it into the distribution. This should be
/// called periodically to prevent the accumulation of histogram samples.
pub fn purge(&self) {
I think we should just change the wording overall from "purge" to "upkeep", and make this run_upkeep.
Purging to me sounds like getting rid of, when realistically we're just doing periodic cleanup work.
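To make the "periodic cleanup" framing concrete, here is a minimal, std-only sketch of what draining histogram samples into a distribution could look like. The `Histogram` and `Summary` types and their fields are hypothetical stand-ins, not the crate's actual internals; only the method name `run_upkeep` comes from the suggestion above.

```rust
use std::sync::Mutex;

/// Running aggregate that raw samples are folded into (hypothetical type).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct Summary {
    pub count: u64,
    pub sum: f64,
}

/// Hypothetical histogram: raw samples queue up until upkeep drains them.
#[derive(Default)]
pub struct Histogram {
    pending: Mutex<Vec<f64>>,
    summary: Mutex<Summary>,
}

impl Histogram {
    /// Records one raw sample. Without upkeep, `pending` grows without bound.
    pub fn record(&self, value: f64) {
        self.pending.lock().unwrap().push(value);
    }

    /// Drains all pending samples into the summary and returns how many
    /// samples were drained. No recorded data is discarded.
    pub fn run_upkeep(&self) -> usize {
        // Swap the buffer out so the lock on `pending` is held only briefly.
        let drained: Vec<f64> = std::mem::take(&mut *self.pending.lock().unwrap());
        let mut summary = self.summary.lock().unwrap();
        for value in &drained {
            summary.count += 1;
            summary.sum += *value;
        }
        drained.len()
    }

    /// Number of raw samples still waiting to be drained.
    pub fn pending_len(&self) -> usize {
        self.pending.lock().unwrap().len()
    }

    /// Copy of the current aggregate.
    pub fn summary(&self) -> Summary {
        *self.summary.lock().unwrap()
    }
}
```

In this sketch the data is moved rather than deleted, which is why "upkeep" describes the operation better than "purge": after `run_upkeep` the raw buffer is empty, but every sample is still reflected in the summary.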
/// Sets the purge timeout for metrics.
///
/// If a purge timeout is set, the purger will call `.render()` on the registry, causing
/// the values from histograms to be drained out. This ensures that stale histogram values
/// do not persist indefinitely.
Similar note here about just changing the wording overall from "purge"/"purging" to "upkeep".
I would also remove the specific bit about calling .render(), since it doesn't actually do that anymore. Just be generic, something like:
/// Sets the upkeep interval.
///
/// The upkeep task handles periodic maintenance operations, such as draining histogram data,
/// to ensure that all recorded data is up-to-date and prevent unbounded memory growth.
@@ -128,6 +129,7 @@ impl PrometheusBuilder {
    buckets: None,
    bucket_overrides: None,
    idle_timeout: None,
    purge_timeout: None,
I think we should actually enable this by default.
Thinking more about it, it's a quality of life improvement to avoid unbounded memory growth for people with a high rate of histogram metrics, or who scrape/push their Prometheus exporter infrequently.
Probably 5 seconds as the default?
Makes sense, 5 seconds is a good default. Also, I can't think of a use case where somebody would not want to run upkeep, so making this non-optional would be sensible.
Perfect. Nice and simple. 👍🏻
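A rough sketch of where this thread landed: the interval is a plain `Duration` (not an `Option`) that defaults to 5 seconds, and a background task runs upkeep once per interval. Everything here is an assumption for illustration: `ExporterBuilder`, `upkeep_timeout`, `interval`, and `spawn_upkeep` are hypothetical names, and a plain OS thread stands in for the Tokio task discussed in the review so that the example needs only the standard library.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical builder; only the upkeep setting is modelled here.
pub struct ExporterBuilder {
    upkeep_timeout: Duration,
}

impl ExporterBuilder {
    pub fn new() -> Self {
        // Upkeep is always on, with the 5-second default agreed in the review.
        Self { upkeep_timeout: Duration::from_secs(5) }
    }

    /// Overrides the upkeep interval.
    pub fn upkeep_timeout(mut self, timeout: Duration) -> Self {
        self.upkeep_timeout = timeout;
        self
    }

    /// Returns the configured upkeep interval.
    pub fn interval(&self) -> Duration {
        self.upkeep_timeout
    }

    /// Spawns a thread that calls `upkeep` once per interval. The loop ends
    /// when the returned sender is used or dropped.
    pub fn spawn_upkeep<F>(self, mut upkeep: F) -> std::io::Result<(Sender<()>, JoinHandle<()>)>
    where
        F: FnMut() + Send + 'static,
    {
        let (stop_tx, stop_rx) = mpsc::channel();
        let interval = self.upkeep_timeout;
        let handle = thread::Builder::new()
            .name("upkeep".to_owned())
            .spawn(move || {
                // A timeout means "no stop signal yet": run one upkeep pass.
                // A received message or a disconnected channel ends the loop.
                while let Err(RecvTimeoutError::Timeout) = stop_rx.recv_timeout(interval) {
                    upkeep();
                }
            })?;
        Ok((stop_tx, handle))
    }
}
```

Making the field non-optional removes the "upkeep disabled" branch entirely; callers who want a different cadence only change the duration.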
Released in Thanks again for your contribution!
Copy of #451
What
Implements the third approach prescribed here for purging old histogram data:
Fixes #245