Add endpoint method and path to metrics name. #2850

JDarDagran
Contributor

@JDarDagran JDarDagran commented Jul 8, 2024

Problem

Currently, the metrics endpoint exposes metrics identified only by SQL Object name + method name. It would be more informative to include information about the calling endpoint as well.

Solution

Introduce labels (DAO name, DAO method, endpoint method, endpoint path).

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg bot added the api API layer changes label Jul 8, 2024

netlify bot commented Jul 8, 2024

Deploy Preview for peppy-sprite-186812 canceled.

Name Link
🔨 Latest commit ce3835d
🔍 Latest deploy log https://app.netlify.com/sites/peppy-sprite-186812/deploys/66a17a26a65cb100080bd928

@JDarDagran
Contributor Author

This breaks backwards compatibility. However, do we need to preserve compatibility for metrics?
If so, I can suggest two options:

  • emitting both the old and the new names
  • making the new names optional via config

@JDarDagran JDarDagran force-pushed the observability/add_endpoint_name_to_metrics_name branch from 1dfdd7d to 580b666 Compare July 10, 2024 12:45
Collaborator

@pawel-big-lebowski pawel-big-lebowski left a comment

Great that there is a solution to this. See the comments.

@JDarDagran JDarDagran force-pushed the observability/add_endpoint_name_to_metrics_name branch from 580b666 to e6d3c1b Compare July 11, 2024 10:03
@wslulciuc
Member

@JDarDagran I agree that more investment is needed in our metrics, and correlating an API call -> query is nice, but adding the HTTP method feels unrelated to the metric. This metric is for a DB call, not an HTTP call, though the method invocation chain is implicit. We need consistent naming strategies not only for DAO metrics but for all metrics emitted (e.g., should we add the DAO method name to the API metrics? If we go down this route, metrics would, it seems, contain unrelated parts). Personally, I'm not a fan of the metric names auto-generated by Dropwizard.

I would favor first defining a proposal for metrics and their importance, naming strategy (i.e., no longer relying on Dropwizard auto-generated names), labels, etc. If we were to introduce breaking changes (and I think we have to), we need a precise metric naming strategy and a list of the metrics that are critical for debugging DB perf issues before moving forward.

What about adding the HTTP call as a label? This will at least unblock the PR and not introduce a breaking change and new naming strategy.
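To illustrate why carrying the HTTP call as labels need not break existing consumers, here is a minimal Python sketch, not Marquez code: summing over the extra labels recovers one series per DAO method, similar in spirit to PromQL's `sum without (...)`. The label names (`dao`, `method`, `http_method`, `http_path`) and the sample values are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def aggregate_without(samples, drop):
    """Sum sample values after dropping the label names listed in `drop`.

    `samples` is an iterable of (labels dict, value) pairs.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # Build a hashable, order-independent key from the remaining labels.
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        totals[key] += value
    return dict(totals)


# Hypothetical call counts for one DAO method reached from two endpoints.
samples = [
    ({"dao": "NamespaceDao", "method": "findAll",
      "http_method": "GET", "http_path": "/api/v1/namespaces"}, 2.0),
    ({"dao": "NamespaceDao", "method": "findAll",
      "http_method": "GET", "http_path": "/api/v1/search"}, 3.0),
]

# Dropping the HTTP labels yields a single series per (dao, method).
per_dao = aggregate_without(samples, {"http_method", "http_path"})
```

With this shape, dashboards that only care about the DAO call can aggregate the HTTP labels away, while the metric name itself stays unchanged.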

Signed-off-by: Jakub Dardzinski <[email protected]>

Add MetricsIntegrationTest.

Signed-off-by: Jakub Dardzinski <[email protected]>
@JDarDagran JDarDagran force-pushed the observability/add_endpoint_name_to_metrics_name branch 2 times, most recently from 7bb7545 to dfea8a7 Compare July 16, 2024 14:17
@JDarDagran
Contributor Author

Reworked it. The v2 endpoint /metrics/v2 now contains the following metrics:

marquez_sql_duration_seconds_bucket{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",le="0.005",} 0.0
marquez_sql_duration_seconds_bucket{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",le="0.01",} 0.0
marquez_sql_duration_seconds_bucket{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",le="0.025",} 1.0
...
marquez_sql_duration_seconds_bucket{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",le="+Inf",} 1.0
marquez_sql_duration_seconds_count{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",} 1.0
marquez_sql_duration_seconds_sum{object_name="marquez.db.TagDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/tags",} 0.014148125
marquez_sql_duration_seconds_bucket{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",le="0.005",} 0.0
marquez_sql_duration_seconds_bucket{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",le="0.01",} 1.0
...
marquez_sql_duration_seconds_bucket{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",le="+Inf",} 1.0
marquez_sql_duration_seconds_count{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",} 1.0
marquez_sql_duration_seconds_sum{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",} 0.005786375
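As a rough sketch of how output like the above can be consumed, the following Python snippet parses labelled sample lines and derives the mean SQL duration per label set from the `_sum` and `_count` series. It is an illustration only and not part of this PR; the helper names `parse_sample` and `mean_durations` are made up, and the regular expressions cover just the line shape shown above (including the trailing comma before the closing brace).

```python
import re

# One labelled sample line: name{labels} value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# One label pair: key="value" (escaped characters inside the value allowed).
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (metric name, label dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


def mean_durations(lines):
    """Return mean seconds keyed by (DAO class, DAO method, HTTP method, path)."""
    sums, counts = {}, {}
    for line in lines:
        name, labels, value = parse_sample(line)
        key = (labels["object_name"], labels["method_name"],
               labels["endpoint_method"], labels["endpoint_path"])
        if name.endswith("_sum"):
            sums[key] = value
        elif name.endswith("_count"):
            counts[key] = value
    # Skip label sets with a missing or zero count to avoid dividing by zero.
    return {key: sums[key] / counts[key] for key in sums if counts.get(key)}
```

Bucket lines are parsed as well but ignored by `mean_durations`, since only the `_sum` and `_count` series are needed for a mean.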

@JDarDagran JDarDagran force-pushed the observability/add_endpoint_name_to_metrics_name branch 2 times, most recently from e5f7a87 to 700f2d8 Compare July 19, 2024 11:43
@JDarDagran JDarDagran marked this pull request as draft July 19, 2024 11:44
@JDarDagran JDarDagran marked this pull request as ready for review July 19, 2024 11:44
@JDarDagran JDarDagran force-pushed the observability/add_endpoint_name_to_metrics_name branch 5 times, most recently from afbb231 to d02d284 Compare July 22, 2024 11:42
@JDarDagran JDarDagran force-pushed the observability/add_endpoint_name_to_metrics_name branch from d02d284 to e83e967 Compare July 22, 2024 11:43
@wslulciuc
Member

wslulciuc commented Jul 23, 2024

I think the metric (below) for our DB calls is a great start! But what the metric is really trying to do is trace the HTTP call to the DB query (i.e., a span). I do find it confusing (as I mentioned before), as labels such as endpoint_method are unrelated to the metric itself. Though I do think this is better represented as a span, I have some suggestions that would better represent the metric:

Given Example Metric:

marquez_sql_duration_seconds_sum{object_name="marquez.db.NamespaceDao",method_name="findAll",endpoint_method="GET",endpoint_path="/api/v1/namespaces",} 0.005786375

Define Metric:

# The time to make the DB call for a given HTTP endpoint.
marquez_db_duration_seconds_by_http_call

Labels:

  • sql_class: the name of the DAO class (replaces object_name)
  • sql_method: the name of the method called by the DAO class (replaces method_name)
  • http_method: the name of the HTTP method for the given call
  • http_path: the name of the HTTP path for the given call
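The renames listed above can be sketched as a simple mapping. This Python snippet is only an illustration of the proposal, not the change made in Marquez; `rename_labels` and `format_sample` are hypothetical helpers, and the rendered line omits any histogram suffix such as `_sum`.

```python
# Old label name -> proposed label name, taken from the list above.
LABEL_RENAMES = {
    "object_name": "sql_class",
    "method_name": "sql_method",
    "endpoint_method": "http_method",
    "endpoint_path": "http_path",
}


def rename_labels(labels):
    """Return a copy of `labels` with the proposed names; others are kept."""
    return {LABEL_RENAMES.get(key, key): value for key, value in labels.items()}


def format_sample(name, labels, value):
    """Render one sample line as name{key="value",...} value."""
    body = ",".join(f'{key}="{val}"' for key, val in labels.items())
    return f"{name}{{{body}}} {value}"


# Labels from the example metric above, before the rename.
old_labels = {
    "object_name": "marquez.db.NamespaceDao",
    "method_name": "findAll",
    "endpoint_method": "GET",
    "endpoint_path": "/api/v1/namespaces",
}

line = format_sample(
    "marquez_db_duration_seconds_by_http_call",
    rename_labels(old_labels),
    0.005786375,
)
```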

Rename endpoint name for v2 metrics.

Signed-off-by: Jakub Dardzinski <[email protected]>
@JDarDagran
Contributor Author

@wslulciuc I followed your advice and renamed the metric and labels as you suggested. If you or any committer could run the CI tests (probably with git-push-fork-to-upstream-branch; I don't know why CircleCI isn't triggered by default) and review, I'd really appreciate it.

Member

@wslulciuc wslulciuc left a comment

Thanks @JDarDagran for improving our metrics! It's long overdue. We'll be investing in this area more and more as we move towards a stable release of Marquez.

@wslulciuc wslulciuc merged commit 801831c into MarquezProject:main Jul 25, 2024
6 checks passed
Labels
api API layer changes
3 participants