Histogram unbounded memory even with run_upkeep #467
Comments
Hmm, the latest (…
Correct, hence I patched the version of …
Looking at how …

Not really sure there's a ton we can do here, or, to be frank, that I'm willing to do here. The number of ways to build/install the recorder is already too high for comfort because it's trying to be all things to all people. The simplest thing would be if …
This is what I am using inside the axum framework:

```rust
PrometheusMetricLayerBuilder::new()
    .with_metrics_from_fn(|| {
        PrometheusBuilder::new()
            .idle_timeout(MetricKindMask::HISTOGRAM, Some(Duration::from_secs(300)))
            .upkeep_timeout(Duration::from_secs(60))
            .set_buckets(&[0.1, 0.2, 0.3])
            .unwrap()
            .install_recorder()
            .unwrap()
    })
    .build_pair()
```

This does create the …
Right, you're using …
Ok, I got it to work correctly, but I must say, that was an unexpected nuance there :-) So, something like this works and there is no more unbounded memory growth:

```rust
PrometheusMetricLayerBuilder::new()
    .with_metrics_from_fn(|| {
        let (recorder, _) = PrometheusBuilder::new()
            // if metrics are not updated within this time, they will be removed
            .idle_timeout(MetricKindMask::HISTOGRAM, Some(Duration::from_secs(5 * 60)))
            // prevents unbounded memory growth by draining histogram data periodically
            .upkeep_timeout(Duration::from_secs(5))
            .set_buckets(buckets_in_secs.as_slice())
            .expect_or_log(&format!("Failed to set buckets: '{buckets_in_secs:?}'"))
            // instead of install_recorder, use build because the former doesn't start the upkeep tasks
            .build()
            .expect_or_log("Failed to build metrics recorder");
        let handle = recorder.handle();
        metrics::set_global_recorder(recorder).expect_or_log("Failed to set global recorder");
        handle
    })
    .build_pair()
```

++ @Ptrskay3 for changes in …
Thanks for the ping. I'm open to changing …
@Ptrskay3 Thanks for opening that. 👍🏻 Like I said upthread, I'm also not a fan of how thorny the crate (this crate, not yours) has become in terms of the API... but it does seem like taking the approach shown by @gauravkumar37 would be the simplest one. I'll also make a note for myself to potentially contribute that as a PR to …
Just released …
@tobz Would you consider adding the upkeep task to the install_recorder() (or build_recorder) method? If somebody needs a recorder and creates it using the install_recorder method, the memory-usage bug is still there. It seems that this task should be present there as well.
When users opt to use …

All of that said, the documentation on this isn't good and doesn't explain this for …
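For anyone landing here, a minimal sketch (not from this thread) of what handling upkeep yourself with `install_recorder()` could look like, assuming the `PrometheusHandle::run_upkeep()` method referenced in the issue title, a tokio runtime, and an arbitrary 5-second interval:

```rust
use std::time::Duration;

use metrics_exporter_prometheus::PrometheusBuilder;

#[tokio::main]
async fn main() {
    // install_recorder() only installs the recorder; it does not spawn the
    // background upkeep task, so accumulated histogram samples are never
    // drained unless we do it ourselves (or render /metrics regularly).
    let handle = PrometheusBuilder::new()
        .install_recorder()
        .expect("failed to install Prometheus recorder");

    // Drain histogram data periodically to keep memory bounded.
    let upkeep = handle.clone();
    tokio::spawn(async move {
        let mut interval = tokio::time::interval(Duration::from_secs(5));
        loop {
            interval.tick().await;
            upkeep.run_upkeep();
        }
    });

    // ... record metrics and serve `handle.render()` from a /metrics route ...
}
```

The interval here plays roughly the same role that `upkeep_timeout` plays for the exporter-spawned task: it bounds how long histogram samples can accumulate between drains.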
@tobz Thank you for the explanations. I have one more question: what was the main reason such complicated logic was used for the histogram data, and why can't it be calculated immediately when it is received?
It mostly boils down to concurrency. When I originally wrote the Prometheus exporter, I needed something that could support histograms or summaries, which meant needing to keep track of the raw samples to accurately generate the summary data. Since we need the raw samples, this forces us to be able to concurrently accept the incoming histogram samples (since there can be multiple threads updating the same metric at the same time), which is why I wrote …

If we didn't need to care about aggregated summaries, we could do something better like an array of atomic integers -- one integer per histogram bucket -- and just update them directly, as you allude to. It might be worth doing that overall, and just sort of declaring "we don't support aggregated summaries because they're crap and make things harder, sorry", but I don't have a strong enough opinion/need, so it's never really been on my list of things to do.
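To make that alternative concrete, here is a simplified, hypothetical sketch of a fixed-bucket histogram with one atomic counter per bucket (not the exporter's actual implementation): every observation increments counters directly, so no raw samples are retained and no upkeep/draining is needed, at the cost of not being able to produce aggregated summaries afterwards.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical fixed-bucket histogram: one atomic counter per bucket plus a
/// running sum and count. No raw samples are stored, so memory stays bounded
/// without an upkeep task -- but quantile summaries can no longer be derived.
struct FixedBucketHistogram {
    bounds: Vec<f64>,       // bucket upper bounds, sorted ascending
    counts: Vec<AtomicU64>, // one counter per bound, plus a final +Inf bucket
    sum: AtomicU64,         // sum of all observations, stored as f64 bits
    count: AtomicU64,       // total number of observations
}

impl FixedBucketHistogram {
    fn new(bounds: Vec<f64>) -> Self {
        let counts = (0..=bounds.len()).map(|_| AtomicU64::new(0)).collect();
        Self { bounds, counts, sum: AtomicU64::new(0), count: AtomicU64::new(0) }
    }

    /// Record one observation; safe to call concurrently from many threads.
    fn record(&self, value: f64) {
        // Find the first bucket whose upper bound covers the value,
        // falling through to the implicit +Inf bucket otherwise.
        let idx = self
            .bounds
            .iter()
            .position(|b| value <= *b)
            .unwrap_or(self.bounds.len());
        self.counts[idx].fetch_add(1, Ordering::Relaxed);
        self.count.fetch_add(1, Ordering::Relaxed);

        // Accumulate the floating-point sum via a CAS loop over its bit pattern.
        let mut current = self.sum.load(Ordering::Relaxed);
        loop {
            let updated = (f64::from_bits(current) + value).to_bits();
            match self.sum.compare_exchange_weak(current, updated, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => break,
                Err(actual) => current = actual,
            }
        }
    }
}
```

A Prometheus text exposition would still need to render the bucket counts cumulatively, but that can be computed from these counters at scrape time.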
Thanks for a wonderful library. After the recent merge of #460, I tested the upkeep and idle-timeout configs but still see unbounded memory usage if the /metrics endpoint is not being called. Is this still expected even after upkeep?

Testing methodology:

- wrk to benchmark an API with a simple health endpoint

So, is my observed behavior still expected even after the introduction of upkeep?