-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pub/Sub Subscriber health check #613
Pub/Sub Subscriber health check #613
Conversation
Kudos, SonarCloud Quality Gate passed! |
@gkatzioura Apologies for the delay. You've put a lot of thought and work into the pull request, and we'd like to treat it as carefully. We currently have some work in-flight restructuring the Pub/Sub properties to enable per-subscription setting customization (example: #562) that will both conflict, and possibly affect direction of your PR. It would be better to rebase and review it in a couple of weeks, once that work is completed. One question I had about the addition of monitoring metrics -- how will |
Another directional question -- GKE forces pod restart when it detects unhealthy indicators. But restarting pods when there is already a scaling issue is counterproductive, as the load won't get any better if the pods are thrashing (we've had issues with the existing global indicator under load -- spring-attic/spring-cloud-gcp#2628). So I am wary of indicating |
Hi @elefeint Will answer your questions.
Will explain marking unhealthy based on the backlog. In a scenario of multiple pods/workers it is expected to have lot's of outstanding messages, also if subscribers are closed or stopped pulling messages the
If a subscriber consumes messages, regardless of how big the backlog gets ( If the subscriber is not healthy messages cannot be processed or acknowledged. An error or exception probably occurs while processing. On this scenario because the indicator of recently completed processing is not there, we need to evaluate if there are no messages on the backlog or if messages do exist and wait to be processed (thus our subscriber is unhealthy). Based on this logic the following scenarios happen
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience! We can start working on this PR -- it's very useful to have this per-subscription healthcheck functionality.
First, could you revert the unrelated whitespace changes? This project still uses Spring's checkstyle for now -- https://github.com/spring-cloud/spring-cloud-gcp/blob/main/CONTRIBUTING.adoc#code-formatting-guidelines
...e/src/main/java/com/google/cloud/spring/autoconfigure/pubsub/GcpPubSubAutoConfiguration.java
Outdated
Show resolved
Hide resolved
...d/spring/autoconfigure/pubsub/health/PubSubSubscriptionHealthIndicatorAutoConfiguration.java
Outdated
Show resolved
Hide resolved
Hey @gkatzioura, thank you for waiting and your contribution! PR #638 is merged so you can rebase this PR now. |
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
520743a
to
4616ffa
Compare
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
4616ffa
to
9ab58d0
Compare
Hi @elefeint & @mpeddada1 did rebase so seems ok for a review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another batch of comments. I apologize for the delays -- I will also be out most of next week, which will not help matters.
.../test/java/com/google/cloud/spring/autoconfigure/pubsub/GcpPubSubAutoConfigurationTests.java
Outdated
Show resolved
Hide resolved
@Bean | ||
@ConditionalOnMissingBean(name = "pubsubSubscriberThreadPool") | ||
@ConditionalOnProperty("spring.cloud.gcp.pubsub.subscriber.executorThreads") | ||
public ThreadPoolTaskScheduler pubsubSubscriberThreadPool() { | ||
return threadPool("gcp-pubsub-subscriber", this.gcpPubSubProperties.getSubscriber().getExecutorThreads()); | ||
} | ||
|
||
@Bean | ||
@ConditionalOnMissingBean(name = "subscriberExecutorProvider") | ||
@ConditionalOnProperty("spring.cloud.gcp.pubsub.subscriber.executorThreads") | ||
public ExecutorProvider subscriberExecutorProvider( | ||
@Qualifier("pubsubSubscriberThreadPool") ThreadPoolTaskScheduler scheduler) { | ||
return FixedExecutorProvider.create(scheduler.getScheduledExecutor()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice structure. However, after the rebase, we have two "default" subscribers thread pools/executors -- one defined here that is exclusively used for the health checks, and the globalScheduler
that's dynamically registered. We've been trying to deprecate the global scheduler bean that is not dynamic that's being extracted here.
Since our subscriber thread pool management is so complicated now (there is a deprecated global subscriber threadpool bean, and then there is a default subscriber threadpool constructed from properties, plus separate configurations possible for each subscription), I think it would be best to have a dedicated thread pool for Pub/Sub healthchecks. Then it will never interfere with production workloads.
Could you:
- rename the subscriber executor provider bean to make it clear that it's for health checks. It can be shared between publisher and subscriber health checks in the future; no need to differentiate here.
- Fold the new dedicated threadpool/executor beans into PubSubSubscriptionHealthIndicatorAutoConfiguration.java; this way it will be possible to turn it off with the rest of configuration.
- Put the publisher thread pool configuration back. Let's keep the scope of this PR to subscription logic only.
- This class, pleasant to look at as it is, will then go away.
...d/spring/autoconfigure/pubsub/health/PubSubSubscriptionHealthIndicatorAutoConfiguration.java
Outdated
Show resolved
Hide resolved
...d/spring/autoconfigure/pubsub/health/PubSubSubscriptionHealthIndicatorAutoConfiguration.java
Outdated
Show resolved
Hide resolved
private Integer lagThreshold; | ||
|
||
private Integer backlogThreshold; | ||
|
||
private Integer lookUpInterval = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get those in a subclass, for logical grouping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has moved to PubSubConfiguration.Health
...d/spring/autoconfigure/pubsub/health/PubSubSubscriptionHealthIndicatorAutoConfiguration.java
Outdated
Show resolved
Hide resolved
...a/com/google/cloud/spring/autoconfigure/pubsub/health/PubSubSubscriptionHealthIndicator.java
Show resolved
Hide resolved
public void setProjectId(String projectId) { | ||
this.projectId = projectId; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about keeping the current project ID inside of HealthTrackerRegistry
? Then it does not need to be set separately. registerTracker()
could b eoverloaded to operate on either an object or a string (which can be either short or fully qualified subscription names).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I will have a default-project-id on the HealthTrackerRegistry, the one derived by default by the other modules GcpProjectIdProvider
.
If the user wants to use subscriptions from multiple projects (my everyday case), then the user should provide the fully qualified subscription name.
For example projects/<project_name>/subscriptions/<subscription_name>
instead <subscription_name>
If short subscription name is provided the the project-id on the registry shall be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense.
@elefeint Lines 205 to 245 in a9f3eb4
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are in the final stages here -- I've tested the health check with one of our samples, and it works great. Really nice to look at code, too.
No rush on this round of comments -- I am unavailable until Thursday for review. I apologize for the multiple layers of delays here.
- Add documentation
- add warning that the health indicator will not behave entirely as expected if Dead Letter Queueing is enabled on the subscription being checked (
num_undelivered_messages
will drop down by itself after DLQ threshold is reached).
- add warning that the health indicator will not behave entirely as expected if Dead Letter Queueing is enabled on the subscription being checked (
- document the new required (lagThresholad, backlogThreshold) and optional (lookupInterval) properties. Describe units of measurement -- seconds, number of messages.
- Javadoc all public interface methods. HealthTrackerRegistry especially needs javadoc, make sure to describe seconds being the units of measurement for lagThreshold -- we've had users confused about granularity before.
- Replace
@since 2.0.5
with@since 2.0.6
- Could you take a look at Sonar's code smell list?
No problem on the thread pool creation duplication; the logic is straightforward enough. In principle, `GcpPubSubAutoConfiguration.createThreadPoolTaskScheduler(Integer executorThreads, String threadName) could be turned into a reusable utility but it's probably not worth it.
...d/spring/autoconfigure/pubsub/health/PubSubSubscriptionHealthIndicatorAutoConfiguration.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/com/google/cloud/spring/autoconfigure/pubsub/GcpPubSubAutoConfiguration.java
Outdated
Show resolved
Hide resolved
...ain/java/com/google/cloud/spring/pubsub/integration/inbound/PubSubInboundChannelAdapter.java
Show resolved
Hide resolved
Kudos, SonarCloud Quality Gate passed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the very useful healthcheck, and putting up with the lengthy review process. This is now good to go.
Hi @elefeint ! I am grateful for your time and assistance on this! |
Add per-subscription healthcheck validating that there was a recent successfully processed message on the subscription or that the backlog represented by `num_undelivered_messages` Cloud Monitoring metric is under acceptable threshold.
Bumps [reactor-bom](https://github.com/reactor/reactor) from 2020.0.26 to 2022.0.1. - [Release notes](https://github.com/reactor/reactor/releases) - [Commits](reactor/reactor@2020.0.26...2022.0.1) --- updated-dependencies: - dependency-name: io.projectreactor:reactor-bom dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The purspose of this PR is to add health check to PubSub subscribers.
This is not in conflict with the health check of the pubsub publisher, and can be used in parallel.
The criteria for the health check of a subscriber is the message backlog and the last processed message.
If the message has been recently processed in a reasonable time threshold, then the subscriber is healthy.
If the backlog of messages is big but the subscriber consumes messages then this is a scalling issue.
If there hasn't been any processing of recent messages but the backlog increases, then the subscriber is unhealthy
Since there can be multiple subscriptions, the subscriptions shall be enabled per subscriber creation.
The method that processes the messages shall be wrapped in order for a method tracker to receive the recently processed message.
Eventually there would be multiple health trackers per subscription
Since PubSubInboundChannelAdapter precedes DefaultSubscriberFactory it will register the tracker first and DefaultSubscriberFactory will avoid adding any health checks.
If DefaultSubscriberFactory is used directly is will add a health check if enabled.
Added a separate
PubSubExecutorConfiguration
since the health check for subscribers should be autoconfigured before the actual PubSub configuration.