Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add native support for Prometheus #1452

Merged

Conversation

pdambrauskas
Copy link
Contributor

@pdambrauskas pdambrauskas commented Jul 12, 2020

We are running Secor on K8s and monitoring our stack with Prometheus.
As Secor uses Micrometer, it is not that hard to add Prometheus support directly, and avoid using exporters for Secor metrics.

I've changed the way gauge method works, because Prometheus endpoint was returning NaN values.
More on that https://stackoverflow.com/questions/50821924/micrometer-prometheus-gauge-displays-nan

I've added /prometheus endpoint to Ostrich Server to avoid making things complicated with listening to some other port.

Also moved initialization of metric controller up to ConsumerMain class. I guess we don't need to have different instances for each Consumer. Otherwise I'd have to ensure, that there is only one instance of PrometheusMetricRegistry to avoid duplications and other problems on Prometheus http endpoint. This shouldnt be a problem, as Metric registries are delegating calls to static methods anyway.

ref: #430

Copy link
Contributor

@HenryCaiHaiying HenryCaiHaiying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, left a few comments. A little worried about the gauge cache size

@@ -58,7 +57,7 @@ public void start() {
List$.MODULE$.<StatsFactory>empty(),
Option.<String>empty(),
List$.MODULE$.<Regex>empty(),
Map$.MODULE$.<String, CustomHttpHandler>empty(),
new Map.Map1<>("/prometheus", new PrometheusHandler()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this line be conditioned on whether prometheus is enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handler will return 404, if there is no PrometheusMetricRegistry added on Micrometer (see handler class itself).
But I guess we could add a check here too, to make it more obvious while reading the code.


/**
* MicorMeter meters can integrate with many different metrics backend
* (StatsD/Promethus/Graphite/JMX etc, see https://micrometer.io/docs)
*/
public class MicroMeterMetricCollector implements MetricCollector {
private static final Map<String, Double> GAUGE_CACHE = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How big will this cache become?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends, on how many different gauges will be tracked. Currently it will be 2*topic-count. Maybe I should limit cache size to avoid memory leak, if someone adds way too many metrics with different tags?
The cache is used, because prometheus is returning NaNs otherwise. https://stackoverflow.com/questions/50821924/micrometer-prometheus-gauge-displays-nan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to add a config param for the cache size (e.g. 100), if the cache goes over that size, throw a runtime exception with error message to tell user to increase the cache size parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just log warning message? We probably shouldn't crash writing process because of metric cache overflow.

@HenryCaiHaiying
Copy link
Contributor

HenryCaiHaiying commented Jul 13, 2020 via email

Copy link
Contributor

@HenryCaiHaiying HenryCaiHaiying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a small comment

}

@Override
public void gauge(String label, double value, String topic) {
String key = label + "_" + topic;
if (mGaugeCache.size() <= mConfig.getMicroMeterCacheSize()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only do the cache size check and issue error message when the cache needs to grow, not each time gauge is called (there will be too frequent)

if (!mGaugeCache.containsKey(key) && mGaugeCache.size() >= mConfig.getMicroMeterCacheSize()) {
LOG.error("Gauge cache size reached maximum, this may result in inaccurate metrics, "
+ "you can increase cache size by changing "
+ "\"secor.monitoring.metrics.collector.micrometer.cache.size\" property.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put 'return' statement here? Otherwise Metrics.gauge() result might be un-deterministic

@HenryCaiHaiying HenryCaiHaiying merged commit a06da25 into pinterest:master Jul 15, 2020
@HenryCaiHaiying
Copy link
Contributor

Looks good, thanks for the contribution.

@pdambrauskas pdambrauskas deleted the add_native_prom_support branch August 26, 2020 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants