Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gcp_pubsub: Support distribution-valued metrics and metrics from topics #5246

Merged
merged 12 commits into from
Dec 7, 2023

Conversation

kevinmingtarja
Copy link
Contributor

@kevinmingtarja kevinmingtarja commented Dec 1, 2023

This PR does two main things:

  • Support distribution-valued metrics for pubsub by exposing a new aggregation field:
    Previously, only single-valued metrics are supported, because we didn't do any aggregations in our query to GCP and without aggregations, distribution-valued metrics are not meaningful. With this change, it opens up a wide range of possibilities of trigger events, for example, scaling based on median or p99 of subscription/ack_latencies, etc.

  • Support metrics from topics by exposing a new topicName field:
    Previously, only metrics from subscriptions are available to use. With this change, it unlocks more use cases, but the main one is that it enables users to scale based on the rate of incoming messages from the publisher, for example by using topic/send_request_count, or a count/sum of topic/message_sizes.

    This is the main request from issue Add support for scaling by Google Pub/Sub subscription publish rate #5070.

I designed it such that there are no breaking API changes to the gcp_pubsub scaler, only addition of two new fields, topicName and aggregation.

Internal API changes:

  • Added a QueryMetrics() method in StackDriverClient, which under the hood calls GCP's QueryTimeSeries(). This is done because (1) I couldn't figure out how to properly do the aggregation with the GetMetrics() method. And (2) it provides a more expressive way of querying as well, as it uses the newer and better Monitoring Query Language (MQL) from GCP, as compared to the old stackdriver (old name for GCP's monitoring suite) filters we're currently using in GetMetrics(). This allows us to expose a simpler API as well to users.

Checklist

  • When introducing a new scaler, I agree with the scaling governance policy
  • I have verified that my change is according to the deprecations & breaking changes policy
  • Tests have been added
  • Changelog has been updated and is aligned with our changelog requirements
  • [N/A] A PR is opened to update our Helm chart (repo) (if applicable, ie. when deployment manifests are modified)
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Commits are signed with Developer Certificate of Origin (DCO - learn more)

Fixes #5070

Relates to kedacore/keda-docs#1271

Copy link

github-actions bot commented Dec 1, 2023

Thank you for your contribution! 🙏 We will review your PR as soon as possible.

While you are waiting, make sure to:

Learn more about:

Signed-off-by: Kevin Mingtarja <[email protected]>
Signed-off-by: Kevin Mingtarja <[email protected]>
Signed-off-by: Kevin Mingtarja <[email protected]>
@kevinmingtarja kevinmingtarja marked this pull request as ready for review December 3, 2023 17:12
@kevinmingtarja kevinmingtarja requested a review from a team as a code owner December 3, 2023 17:12
@kevinmingtarja
Copy link
Contributor Author

kevinmingtarja commented Dec 3, 2023

For reference, this is what it's going to look like to scale based on the rate of messages being published to a topic:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
...
spec:
  ...
  triggers:
  - type: gcp-pubsub
    metadata:
      # Example from: https://cloud.google.com/pubsub/docs/monitoring#monitoring_message_throughput_2
      mode: "MessageSizes"
      aggregation: "count"
      topicName: "mytopic"
      ...

This calculates a count of messages being sent to a topic by applying a count aggregator to the topic/message_sizes metric (which is a distribution-valued metric).

Internally, this is the MQL query that is generated from the above and fired to GCP's API.

fetch pubsub_topic
| metric 'pubsub.googleapis.com/topic/message_sizes'
| filter (resource.topic_id == 'mytopic')
| within 5m
| align delta(3m)
| every 3m
| group_by [], count(value)

@kevinmingtarja kevinmingtarja changed the title gcp_pubsub: support distribution-valued metrics and metrics from topics gcp_pubsub: Support distribution-valued metrics and metrics from topics Dec 3, 2023
@kevinmingtarja
Copy link
Contributor Author

One of the CI seems to be failing on TestPrometheusScalerExecutePromQueryParameters after I merged from main, which is unrelated to this PR, should we re-run the CI job?

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the contribution, could you please also add e2e test for this scenario?
Existing GCP tests: https://github.com/kedacore/keda/tree/main/tests/scalers/gcp
Test infra, if anything is needed to be modified on the GCP setup: https://github.com/kedacore/testing-infrastructure/tree/main/terraform/modules/gcp

@kevinmingtarja
Copy link
Contributor Author

kevinmingtarja commented Dec 6, 2023

Hi @zroubalik, I've added an e2e test for this scenario. I copied the test from gcp_pubsub_test.go then modified it to use different parameters to scale based on the topic/message_sizes metric.

I think no change is needed on the test infra side, since the existing GCP service account should have enough permissions, as i didnt change the setup step as well from gcp_pubsub_test.go.

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinmingtarja thanks a lot!

I wonder whether we shouldn't include your test into the existing gcp_pubsub test, just add another test case that will test different ScaledObject. Because majority of the testcode is same. @JorTurFer WDYT?

@JorTurFer
Copy link
Member

yeah, if almost all the code is the same, I'd extend current test, but I don't have any strong opinion being honest. With rabbit we use multiple test cases too

@kevinmingtarja
Copy link
Contributor Author

kevinmingtarja commented Dec 7, 2023

Just a heads up, I fixed the merge conflicts due to PR #5258, refactored it a bit and wrote a test, and at the same time included the changes from there in the new StackDriverClient.QueryMetrics() function I wrote for this PR.

Also, it seems like CI is failing again on an unrelated test (TestPredictKubeGetMetrics) after I merged my branch with main.

@zroubalik
Copy link
Member

zroubalik commented Dec 7, 2023

/run-e2e gcp
Update: You can check the progress here

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the great contribution! Awesome feature.

@zroubalik zroubalik merged commit b730a5f into kedacore:main Dec 7, 2023
18 of 19 checks passed
@kevinmingtarja kevinmingtarja deleted the gcp_pubsub_topic_monitor branch December 7, 2023 17:14
@kevinmingtarja
Copy link
Contributor Author

Thanks @zroubalik for the review!

toniiiik pushed a commit to toniiiik/keda that referenced this pull request Jan 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for scaling by Google Pub/Sub subscription publish rate
3 participants