
GCP PubSub scaler produces large amount of errors when subscription has no messages. #5896

Closed
Caislear opened this issue Jun 19, 2024 · 2 comments
Labels: bug (Something isn't working), stale (All issues that are marked as stale due to inactivity)

Comments


Caislear commented Jun 19, 2024

Report

The GCP Pub/Sub scaler produces large numbers of "error getting metric" errors when the Pub/Sub subscription is empty. In our case it produces tens of thousands of error messages per day. Besides the excessive error spam, this also appears to interfere with Flux deployment reconciliations: during scaler changes the scaler is sometimes not marked as healthy, which delays deployments.

Expected Behavior

Some mechanism for handling subscriptions that may have no messages for an extended period of time, such as the valueIfNull default-value feature of the older Stackdriver scaler. That feature lets us set a default fallback value for when the backing GCP metric returns no data.

Actual Behavior

KEDA logs large numbers of "error getting metric" and "error getting scale decision" errors because there is no default fallback value. In our testing this not only causes excessive error spam but sometimes interferes with our Flux deployments: the scaler fails to be marked as healthy because it cannot resolve a scale decision, which delays our deployments.

Steps to Reproduce the Problem

  1. Create a Pub/Sub scaler.
  2. Point it at a Pub/Sub subscription that receives only sporadic messages throughout the day (the subscription is empty most of the time).
  3. Monitor it over the course of a day and note the repeated log messages about failing to get the metric and failing to make a scale decision.

(The more scalers there are, the more noticeable the issue becomes; in our case we have 40+ Pub/Sub scalers.)

Logs from KEDA operator

could not find stackdriver metric with filter fetch pubsub_subscription | metric 'pubsub.googleapis.com/subscription/oldest_unacked_message_age' | filter (resource.project_id == 'xxx' && resource.subscription_id == 'xxx') | within 2m

KEDA Version

2.14.0

Kubernetes Version

1.28

Platform

Google Cloud

Scaler Details

Google Cloud Platform Pub/Sub

Anything else?

The most straightforward solution I see is to add an optional default value that Pub/Sub scalers use, when configured, if the Google Cloud Platform metric returns no value (null). This functionality already exists in the Stackdriver scaler, so it is easy to port.

I have created a PR that adds this functionality to the Pub/Sub scaler, and it resolves our issue with KEDA continuously logging errors. Found here


stale bot commented Aug 18, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Aug 18, 2024

Caislear (contributor, issue author) commented

This has been resolved by the changes I made, which are included in version 2.15.0. Closing this issue.
