Histogram unbounded memory even with run_upkeep #467
Comments
Hmm, the latest (…
Correct, hence I patched the version of …
Looking at how …

Not really sure there's a ton we can do here, or, to be frank, that I'm willing to do here. The number of ways to build/install the recorder is already too high for comfort because it's trying to be all things to all people. The simplest thing would be if …
This is what I am using inside the axum framework:

```rust
PrometheusMetricLayerBuilder::new()
    .with_metrics_from_fn(|| {
        PrometheusBuilder::new()
            .idle_timeout(MetricKindMask::HISTOGRAM, Some(Duration::from_secs(300)))
            .upkeep_timeout(Duration::from_secs(60))
            .set_buckets(&[0.1, 0.2, 0.3])
            .unwrap()
            .install_recorder()
            .unwrap()
    })
    .build_pair()
```

This does create the …
Right, you're using …
Ok, I got it to work correctly, but I must say, that was an unexpected nuance there :-) So, something like this works and there is no more unbounded memory growth:

```rust
PrometheusMetricLayerBuilder::new()
    .with_metrics_from_fn(|| {
        let (recorder, _) = PrometheusBuilder::new()
            // if metrics are not updated within this time, they will be removed
            .idle_timeout(MetricKindMask::HISTOGRAM, Some(Duration::from_secs(5 * 60)))
            // prevents unbounded memory growth by draining histogram data periodically
            .upkeep_timeout(Duration::from_secs(5))
            .set_buckets(buckets_in_secs.as_slice())
            .expect_or_log(&format!("Failed to set buckets: '{buckets_in_secs:?}'"))
            // instead of install_recorder, use build because the former doesn't start the upkeep tasks
            .build()
            .expect_or_log("Failed to build metrics recorder");
        let handle = recorder.handle();
        metrics::set_global_recorder(recorder).expect_or_log("Failed to set global recorder");
        handle
    })
    .build_pair()
```

++ @Ptrskay3 for changes in …
Thanks for the ping. I'm open to changing …
@Ptrskay3 Thanks for opening that. 👍🏻 Like I said upthread, I'm also not a fan of how thorny the crate (this crate, not yours) has become in terms of the API... but it does seem like taking the approach shown by @gauravkumar37 would be the simplest one. I'll also make a note for myself to potentially contribute that as a PR to …
Just released …
@tobz Would you consider adding the upkeep task to the install_recorder() (or build_recorder) method? If somebody needs a recorder and creates it using the install_recorder method, the memory-usage bug is still there. It seems that this task should be present there as well.
When users opt to use …

All of that said, the documentation on this isn't good and doesn't explain this for …
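For anyone landing here, a minimal sketch (not from this thread) of what handling upkeep yourself with `install_recorder()` could look like, assuming the `PrometheusHandle::run_upkeep()` method referenced in the issue title, a tokio runtime, and an arbitrary 5-second interval:

```rust
use std::time::Duration;

use metrics_exporter_prometheus::PrometheusBuilder;

#[tokio::main]
async fn main() {
    // install_recorder() only installs the recorder; it does not spawn the
    // background upkeep task, so accumulated histogram samples are never
    // drained unless we do it ourselves (or render /metrics regularly).
    let handle = PrometheusBuilder::new()
        .install_recorder()
        .expect("failed to install Prometheus recorder");

    // Drain histogram data periodically to keep memory bounded.
    let upkeep = handle.clone();
    tokio::spawn(async move {
        let mut interval = tokio::time::interval(Duration::from_secs(5));
        loop {
            interval.tick().await;
            upkeep.run_upkeep();
        }
    });

    // ... record metrics and serve `handle.render()` from a /metrics route ...
}
```

The interval here plays roughly the same role that `upkeep_timeout` plays for the exporter-spawned task: it bounds how long histogram samples can accumulate between drains.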
@tobz Thank you for the explanations. I have one more question: what was the main reason such complicated logic was used for the histogram data, and why can't it be calculated immediately when it is received?
It mostly boils down to concurrency. When I originally wrote the Prometheus exporter, I needed something that could support histograms or summaries, which meant needing to keep track of the raw samples to accurately generate the summary data. Since we need the raw samples, this forces us to be able to concurrently accept the incoming histogram samples (since there can be multiple threads updating the same metric at the same time), which is why I wrote …

If we didn't need to care about aggregated summaries, we could do something better like an array of atomic integers -- one integer per histogram bucket -- and just update them directly, as you allude to. It might be worth doing that overall, and just sort of declaring "we don't support aggregated summaries because they're crap and make things harder, sorry", but I don't have a strong enough opinion/need, so it's never really been on my list of things to do.
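To make that alternative concrete, here is a simplified, hypothetical sketch of a fixed-bucket histogram with one atomic counter per bucket (not the exporter's actual implementation): every observation increments counters directly, so no raw samples are retained and no upkeep/draining is needed, at the cost of not being able to produce aggregated summaries afterwards.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical fixed-bucket histogram: one atomic counter per bucket plus a
/// running sum and count. No raw samples are stored, so memory stays bounded
/// without an upkeep task -- but quantile summaries can no longer be derived.
struct FixedBucketHistogram {
    bounds: Vec<f64>,       // bucket upper bounds, sorted ascending
    counts: Vec<AtomicU64>, // one counter per bound, plus a final +Inf bucket
    sum: AtomicU64,         // sum of all observations, stored as f64 bits
    count: AtomicU64,       // total number of observations
}

impl FixedBucketHistogram {
    fn new(bounds: Vec<f64>) -> Self {
        let counts = (0..=bounds.len()).map(|_| AtomicU64::new(0)).collect();
        Self { bounds, counts, sum: AtomicU64::new(0), count: AtomicU64::new(0) }
    }

    /// Record one observation; safe to call concurrently from many threads.
    fn record(&self, value: f64) {
        // Find the first bucket whose upper bound covers the value,
        // falling through to the implicit +Inf bucket otherwise.
        let idx = self
            .bounds
            .iter()
            .position(|b| value <= *b)
            .unwrap_or(self.bounds.len());
        self.counts[idx].fetch_add(1, Ordering::Relaxed);
        self.count.fetch_add(1, Ordering::Relaxed);

        // Accumulate the floating-point sum via a CAS loop over its bit pattern.
        let mut current = self.sum.load(Ordering::Relaxed);
        loop {
            let updated = (f64::from_bits(current) + value).to_bits();
            match self.sum.compare_exchange_weak(current, updated, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => break,
                Err(actual) => current = actual,
            }
        }
    }
}
```

A Prometheus text exposition would still need to render the bucket counts cumulatively, but that can be computed from these counters at scrape time.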
Thanks for a wonderful library. After the recent merge of #460, I tested the upkeep and idle-timeout configs but still see unbounded memory usage if the /metrics endpoint is not being called. Is this still expected even after upkeep?

Testing methodology:

- wrk to benchmark an API with a simple health endpoint

So, is my observed behavior still expected even after the introduction of upkeep?