-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support custom subscription name for Pub/Sub health check #330
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good - great work for a first pass!
Left a comment for @meltsufin / @elefeint re: which return codes we want use for up/down signaling.
|
||
public PubSubHealthTemplate(PubSubTemplate pubSubTemplate, String subscription, | ||
long timeoutMillis) { | ||
super(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary to invoke super()
here.
if (t instanceof ApiException) { | ||
ApiException aex = (ApiException) t; | ||
Code errorCode = aex.getStatusCode().getCode(); | ||
if (errorCode == StatusCode.Code.NOT_FOUND || errorCode == Code.PERMISSION_DENIED) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... Given that we're now encouraging "bring your own subscription to use for health checks", do we really want to say that a not found is ok? Seems to me like this would mask a misconfiguration error, and the error would be swallowed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we want to require users to bring their own subscription. This certainly would need to be documented well to clear up the confusion on why setting a custom subscription for health check if optional.
That being said, if the user provides a subscription, NOT_FOUND
or PERMISSION_DENIED
would indeed be error conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that custom subscription should be optional -- for backwards compatibility, and for users who can't create a dedicated subscription due to permission structure.
So there are two paths through the healthcheck, which I'd probably handle as follows:
- a random subscription expected to return NOT_FOUND or PERMISSION_DENIED. Any other exception would signal downtime.
- a custom subscription, for which the expected "up" signal is not getting any exception. Either receiving messages or pulling an empty batch would be "up". But getting not found OR permission denied should be unexpected. My reasoning is that if a team/org has enough permissions to create a dedicated subscription, they are likely able to set permissions appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The if/else will become even more unwieldy with my suggestions, btw, so you may want a helper method.
Would it be better to keep the custom subscription as a field of the On my first pass I thought keeping the custom subscription in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see much benefit in making PubSubHealthTemplate
part of our public API.
It can either be subsumed under PubSubHealthIndicator
, or just made package-private. I would be fine with either.
if (t instanceof ApiException) { | ||
ApiException aex = (ApiException) t; | ||
Code errorCode = aex.getStatusCode().getCode(); | ||
if (errorCode == StatusCode.Code.NOT_FOUND || errorCode == Code.PERMISSION_DENIED) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we want to require users to bring their own subscription. This certainly would need to be documented well to clear up the confusion on why setting a custom subscription for health check if optional.
That being said, if the user provides a subscription, NOT_FOUND
or PERMISSION_DENIED
would indeed be error conditions.
try { | ||
this.pubSubTemplate.pull("subscription-" + UUID.randomUUID().toString(), 1, true); | ||
future.get(this.pubSubHealthTemplate.getTimeoutMillis(), TimeUnit.MILLISECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't you want builder.up()
after this line?
long timeoutMillis) { | ||
super(); | ||
this.pubSubTemplate = pubSubTemplate; | ||
this.subscription = subscription; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This subscription should be validated as existing as early as possible. (Fail fast)
Continued working on this and got to a point where it started to get a bit confusing. Consider the following scenario:
Will it be possible to have several Validate in |
@patpe One thing to note is that subscription name could be relative or fully-qualified as in |
@meltsufin I understand your point. My concern is the scenario where multiple Which of the |
Updated PR which now supports the following
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the intent to make the health-check async? If so, currently it doesn't seem to be doing it because of the future.get()
. You would instead need to register a success and a failure callback on the future, which will in turn set the heath status. This will probably be easier to do if you merged the PubSubHealthTemplate
with the PubSubHealthIndicator
into a single class.
.../src/main/java/com/google/cloud/spring/autoconfigure/pubsub/health/PubSubHealthTemplate.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/google/cloud/spring/autoconfigure/pubsub/health/PubSubHealthTemplate.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/google/cloud/spring/autoconfigure/pubsub/health/PubSubHealthTemplate.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/google/cloud/spring/autoconfigure/pubsub/health/PubSubHealthTemplate.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/google/cloud/spring/autoconfigure/pubsub/health/PubSubHealthTemplate.java
Outdated
Show resolved
Hide resolved
Also please take a look at the Sonar Cloud report for the 5 code smells - they should be pretty simple to fix. |
Thanks for the feedback, incorporated your suggestions. ⛷️ for a week, hence late reply.
The intent was never to make the health check async and this is just poor naming of the PR from me. The intent was to use the async functionality of the PubSub API to guarantee a response from the health check in a configurable amount of time, thereby avoiding not returning a response under heavy load to the Kubernetes health probe. That being said, if making the health probe async is a better solution I can continue working with that as a target. My gut feeling about making it async was that the use case did not merit introducing that level of complexity. |
|
||
public PubSubHealthIndicator(PubSubTemplate pubSubTemplate) { | ||
public PubSubHealthIndicator(PubSubHealthTemplate pubSubHealthTemplate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since PubSubHealthTemplate
is now a package-private class, we can't make it part of the public interface here. So, we'll have to accept a bean of PubSubHealthTemplate
like before and just create the PubSubHealthTemplate
internally. Alternatively, you can just move the methods from the PubSubHealthTemplate
class into this class and remove PubSubHealthTemplate
altogether.
* if connection is successful by pulling message asynchronously from the pubSubHealthTemplate. | ||
* | ||
* If a custom subscription has been specified, this health indicator will only signal up | ||
* if messages are successfully pulled and acknowledged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth noting here that the custom subscription doesn't have to have messages on it. Also, it's important for users to realize that the topic that the custom subscription is for needs to be dedicated to the health check. I'm very worried that users will accidentally create a health subscription to a topic that is also used for other purposes and potentially lose messages.
return createContributor(pubSubHealthTemplates); | ||
} | ||
|
||
private void validatePubSubHealthTemplate(String name, PubSubHealthTemplate pubSubHealthTemplate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this bit of code duplication with the PubSubHealthIndicator
be avoided?
* @author Patrik Hörlin | ||
* | ||
*/ | ||
class PubSubHealthTemplate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know at first I said it probably didn't matter that we're introducing this additional class, as long as it's package-private, but I'm now seeing more complexity introduced due to it. What's the benefit you're seeing from having this class?
Refactored so that I have some thoughts about the way the health indicator is validated. Should it
@meltsufin I agree with your concerns and I have some doubts about the current implementation in this PR. If forced to choose between the following two scenarios
I would prefer to trigger option 2 since it is the lesser of the two evils. Should we perhaps remove the |
The nice thing about validation in the constructor is that we force any problems to surface during app startup, but on the other hand, if no other health indicator does it, I think it should be fine to just remove it.
What do you think of a compromise where we just add a configuration property for acking and default it to |
@patpe I would like to help you polish this PR. Would you mind giving me access to your fork? |
Thanks @meltsufin, I have given you access and you should be receiving a notification about it soon. Combining this PR with work proved a greater challenge than I anticipated, will dive into it again right now. |
@meltsufin Added support for toggling acknowledge of messages on or off, off by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@patpe I've polished up the pull request and added some reference documentation. Please see if looks good to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only substantial comment is about subscription names needing to start with a letter.
Otherwise lgtm with minor suggestions.
...src/main/java/com/google/cloud/spring/autoconfigure/pubsub/health/PubSubHealthIndicator.java
Outdated
Show resolved
Hide resolved
this.subscription = healthCheckSubscription; | ||
} | ||
else { | ||
this.subscription = UUID.randomUUID().toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subscription names must start with a letter. I'd suggest a prefix of "spring-cloud-gcp-healthcheck".
https://googleapis.dev/java/google-cloud-pubsub/latest/com/google/pubsub/v1/Subscription.Builder.html#setName-java.lang.String-
catch (ExecutionException e) { | ||
if (!isHealthyException(e)) { | ||
validationFailed(e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a comment after the if
block for // ignore expected exceptions
in lieu of the else
clause.
public void healthIndicatorPresent() { | ||
public void healthIndicatorPresent() throws Exception { | ||
PubSubTemplate mockPubSubTemplate = mock(PubSubTemplate.class); | ||
ListenableFuture<List<AcknowledgeablePubsubMessage>> future = mock(ListenableFuture.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to mock it; use ApiFutures.immediateFuture()
. It might help with unchecked warnings, too, since you can return a typed list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I actually want to verify calls on the future, I think I'll keep the mock.
"management.health.pubsub.enabled=true", | ||
"spring.cloud.gcp.pubsub.health.subscription=test", | ||
"spring.cloud.gcp.pubsub.health.timeout-millis=1500", | ||
"spring.cloud.gcp.pubsub.health.acknowledgeMessages=true") | ||
.run(ctx -> { | ||
PubSubHealthIndicator healthIndicator = ctx.getBean(PubSubHealthIndicator.class); | ||
assertThat(healthIndicator).isNotNull(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is testing context loading, might want to verify that the health indicator validation got invoked (that template.pullAsync() got called)
properties.setSubscription("test"); | ||
|
||
PubSubTemplate mockPubSubTemplate = mock(PubSubTemplate.class); | ||
ListenableFuture<List<AcknowledgeablePubsubMessage>> future = mock(ListenableFuture.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ApiFutures.immediateFailedFuture()
will help.
} | ||
|
||
@Test | ||
void customSubsription_TimeoutException() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void customSubsription_TimeoutException() throws Exception { | |
void customSubscription_TimeoutException() throws Exception { |
|
||
private void pullMessage() throws InterruptedException, ExecutionException, TimeoutException { | ||
ListenableFuture<List<AcknowledgeablePubsubMessage>> future = pubSubTemplate.pullAsync(this.subscription, 1, true); | ||
List<AcknowledgeablePubsubMessage> messages = future.get(timeoutMillis, TimeUnit.MILLISECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using the future.get(..)
is blocking right? Should you register a callback here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to keep it synchronous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Is there a benefit to using pullAsync vs. the Sync pull then?
…spring/autoconfigure/pubsub/health/PubSubHealthIndicator.java Co-authored-by: Elena Felder <[email protected]>
Kudos, SonarCloud Quality Gate passed! |
…dPlatform#330) Fixes: GoogleCloudPlatform#236. Co-authored-by: Mike Eltsufin <[email protected]>
Bumps [maven-surefire-plugin](https://github.com/apache/maven-surefire) from 3.0.0-M4 to 3.0.0-M5. - [Release notes](https://github.com/apache/maven-surefire/releases) - [Commits](apache/maven-surefire@surefire-3.0.0-M4...surefire-3.0.0-M5) Signed-off-by: dependabot-preview[bot] <[email protected]> Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
First, the main issue that I want to address can be seen by looking at https://github.com/Predictly/gcp-pubsub-health where Kubernetes put's the pod in an restart loop since it fails to respond to
/actuator/health
.Here are my thoughts that went into designing a solution:
PubSubHealthIndicator
which will make it possible to return with an unknown state during the health check even if the application is under heavy loadMy concerns related to this could be mitigated by documenting how these attributes in the new
PubSubHealthIndicatorProperties
should NOT be usedI have tested my code in the
gcp-pubsub-health
application and it removes the issue with Kubernetes restarting the application due to liveness timeouts, under load it responds based on the timeout configured in the new propertyspring.cloud.gcp.pubsub.health.timeout-millis
with the followingIf this looks good I can go ahead and update the docs as well. First time contributing so any feedback is welcome.