-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Expose Rest API to return metrics in prometheus format #21599
Conversation
|
bc7f6a3
to
87e94ee
Compare
Ahana is not a great name for this. How about PrometheusStatsReporter? |
87e94ee
to
1cabfe5
Compare
@@ -994,6 +1003,11 @@ void PrestoServer::reportServerInfo(proxygen::ResponseHandler* downstream) { | |||
http::sendOkResponse(downstream, json(serverInfo)); | |||
} | |||
|
|||
void PrestoServer::reportHealthMetrics(proxygen::ResponseHandler* downstream) { | |||
auto reporter = std::dynamic_pointer_cast<ahana::PrometheusStatsReporter>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the ahana namespace.
|
||
namespace ahana { | ||
|
||
class AhanaStatsReporterTest : public testing::Test { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the use of the ahana namespace and use PrometheusStatsReporterTest as the name of the test instead.
|
||
#include "PrometheusStatsReporter.h" | ||
|
||
namespace ahana { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class should be under facebook::presto namespace.
const std::vector<int32_t>& /* pcts */) const override {} | ||
|
||
void addMetricValue(const std::string& key, size_t value = 1) | ||
const override{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this an empty method ? We should do the same logic as the other addMetricValue methods.
bdff158
to
13e58ac
Compare
@karteekmurthys : Please add some documentation. The doc should have information about:
In addition, I feel that Prometheus StatsReporter should be added based on some config (and not default). |
@karteekmurthys let me know when you have added the documentation and I'll review it then. If I can help you get the docs drafted into review-ready status, just ask! |
@steveburnett thanks! I am currently drafting it. Will reach out to you soon. cc: @aditi-pandit |
7b8851f
to
b01736d
Compare
daca1b2
to
0effee9
Compare
13e58ac
to
4dc5230
Compare
6e46e4d
to
725c364
Compare
|
||
private: | ||
/// Mapping of registered stats key to StatType. | ||
mutable std::unordered_map<std::string, facebook::velox::StatType> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered using https://github.com/jupp0r/prometheus-cpp or some opensource implementation for stats collection? If not any specific reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data model is not too complex to rely on a library at this point. We want to keep external dependencies minimal as well. If we discover that writing serialization ourselves is getting complex we may consider using the library.
The library has additional support to create exporter as well as a push support to a gateway.
void RegisterCollectable(const std::weak_ptr<Collectable>& collectable,
const std::string& uri = std::string("/metrics"));
void RegisterAuth(
std::function<bool(const std::string&, const std::string&)> authCB,
const std::string& realm = "Prometheus-cpp Exporter",
const std::string& uri = std::string("/metrics"));
I don't think we need these capabilities at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, it supports all metrics, is production tested and widely used (though like an unofficial library)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created a draft PR - #21753, we believe that using the library is the fastest way to production since its production tested. Not sure how to integrate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the library seems better if it is production-ready and widely used.
This can be an optional dependency that we can turn on/off at compile time.
I can help with the dependency management.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karteekmurthys I believe Jay already has a PR in progress here #21753
We should base our work on that.
You can help with the dependency management in that PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaystarshot I took a shot at using prometheus-cpp in this PR itself. I have added a compile time settings to include prometheus-directory. Added some unit tests. I will further modify the circle ci configs to run unit tests for this change. Feel free to use this PR for your testing. I will be opening an RFC soon to get this merged with Presto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried this in local? I don't think this will work since there is no exposer defined. My PR #21753 shows how to do that. Can you share the RFC with me too? I can also input from my end thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need an Exposer. We already expose a REST API, check PrestoServer.cpp changes in this PR.
You can configure your Prometheus server to scrape the enpoint: /v1/info/health/metrics
exposed by the worker.
Or you can configure whatever intermediary process to scrape from worker endpoint.
Missed adding the test results in previous comment, here it is: prometheus-format-metrics.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see thanks! I think having an exposer is better since it separates the http server and makes it configurable based on the number of threads etc. In future we can also move this easily to a sidecar. We can sync offline regarding the RFC too if you have a draft prepared already
725c364
to
892525f
Compare
4dc5230
to
e16ad47
Compare
e16ad47
to
a4dc38c
Compare
892525f
to
571c2c8
Compare
Dockerfile-native-debug
Outdated
@@ -0,0 +1,52 @@ | |||
ARG PRESTO_VERSION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's move the Docker/Jenkins changes to its own PR.
d0e3abb
to
663fe40
Compare
076f229
to
f1fb59f
Compare
@@ -210,6 +211,10 @@ void PrestoServer::run() { | |||
exit(EXIT_FAILURE); | |||
} | |||
|
|||
if (systemConfig->enableRuntimeStatsCollection()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karteekmurthys Karteek, let me add this config in a separte PR. Inside Meta there is a small wrapper on PrestoServer that does following:
if (gflag enabled):
initialize stats reporting
initialize presto server
I think we can use the config and remove the internal flag altogether. Let me introduce this flag in a PR and modify internal code. You can rebase on top of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karteekmurthys Here is the small PR
#21745
Once we merge it, I will refactor internal logic to use the cofnig property.
1f43cb8
to
0046c30
Compare
0046c30
to
01bb923
Compare
01bb923
to
3ce8d6c
Compare
|
||
private: | ||
/// Mapping of registered stats key to StatType. | ||
mutable std::unordered_map<std::string, facebook::velox::StatType> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use mutable folly::ConcurrentHashMap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we normally use folly::Synchronized<>
to do this. Lmk revisit this after finishing the RFC. I wanted to unblock any testing you wanted to do, so focussed on getting a working version. I need to clean up the tests as well.
/// A mapping from stats key of type COUNT to value. | ||
mutable std::unordered_map<std::string, size_t> metricsMap_; | ||
// Mutex to control access to registeredStats_ and metricMap_ members. | ||
mutable std::mutex mutex_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not need mutex
proxygen::HTTPMessage* /*message*/, | ||
const std::vector<std::unique_ptr<folly::IOBuf>>& /*body*/, | ||
proxygen::ResponseHandler* downstream) { | ||
server->reportHealthMetrics(downstream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? If we just define an exposer that will create another http server on /metrics or /v1/metrics? We can also define the number of threads to be used by the new http server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is exactly what we are trying to avoid. We didn't find a strong need to start a new server sharing space with our worker containers. Reduces the overhead of maintaining and launching another process. Why do you want multiple threads to be spawned by the exposer? are you expecting high traffic.
CMIW, prometheus server is configured to periodically call the scrape endpoint. So, we can expect 1 HTTP request at X seconds interval. If you are expecting huge traffic, then it is not recommended to share space with worker instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think isolation is also one point, we do not want to degrade the presto process as much as we can
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amitkdutta : What approach did Meta take for this ? Does your metric collection use any endpoint in Presto worker process itself, or did you start another server ?
Moved it here: #22360 (review) |
Description
This PR is still a work-in-progress.
There are 2 approaches implemented in this PR:
This PR implements the BaseStatsReporter interface as StatsReporterImpl. The current design of StatsReporterImpl holds the metrics in-memory and exposes a method getMetrics(MetricsSerializer) that serializes the metrics with the serializer passed down to the function call. MetricsSerializer is an interface that users can implement to customize metric serialization. This approach doesn't support Histogram
Implements PrometheusReporter using the prometheus-cpp library. The library has interfaces to create all metric types (COUNTER, GAUGE, Histograms and Summaries). The library exposes a Registry which can be used to hold the references to the counters. Also, a textSerializer interface that can convert the metrics into Prometheus Data Model.
We may chose one of the above or both.
To the outside world prestissimo exposes a rest API
/v1/info/health/metrics
, which in turn uses StatsReporterImpl::getMetrics(MetricsSerializer).The Prometheus server must be configured to regularly probe this Prestissimo rest API to collect metrics.
Test Plan
Added a unit test class for StatsReporterImpl and also PrometheusReporter.
Also, Tested REST API locally and results are attached here.
prometheus-format-metrics.txt using prometheus-cpp library
Contributor checklist