Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow retries on Vault MP Config Source initialization #20343

Merged
merged 1 commit into from
Oct 8, 2021
Merged

Allow retries on Vault MP Config Source initialization #20343

merged 1 commit into from
Oct 8, 2021

Conversation

vsevel
Copy link
Contributor

@vsevel vsevel commented Sep 22, 2021

During the initialization of the Vault Config Source, vault (or one specific replica) might be temporarily unavailable.
In that case the rest call through the vertx web client will throw a mutiny timeout exception, which will make the application fail at startup.
Retrying, sometimes just once, may be enough to successfully fetch the secrets from the Vault (or a good replica) after an initial failure, which will allow for the application to continue its initialization, and avoid a crash and a restart.
Ideally the remote service should always be up. But this can't be guaranteed, and since it is on the critical path of application startup, it is fair to add some resiliency.
This PR provides a way to allow some retries during the very first time secrets get fetched by the vault config source.
Once fetched successfully once, the retries will not be used again. Instead we will rely on the internal cache.
Retries are used only in the context of the vault config source. In particular it is not being used by the different programmatic interfaces for the supported engines.
To summarize it is only used by the vault config source, and only on the very first access.

@vsevel vsevel requested a review from geoand September 22, 2021 21:16
@vsevel vsevel requested a review from sberyozkin September 22, 2021 21:17
Copy link
Contributor

@geoand geoand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vsevel
Copy link
Contributor Author

vsevel commented Sep 23, 2021

not ready to merge. I am reworking the impl.

@vsevel
Copy link
Contributor Author

vsevel commented Sep 23, 2021

hello @geoand . thanks for the initial review. there were a few shortcomings I realized and wanted to address:

  • it was doing the retries on the read secret operation, when the very first call is actually the login. so the initial impl would have failed on the login first
  • I was retrying only on read timeouts, not on idle timeout (connection gets closed), cannot connect, or connection closed (e.g. reset by peer)

so I changed the design to translate the different error conditions into a common exception VaultIOException and handled it directly into the config source.

I also changed the default read timeout to be 5S instead of 1S, which was largely insufficient in case of a temporary slow down.

does this sound reasonable? thanks for your time.

@vsevel vsevel added the triage/waiting-for-ci Ready to merge when CI successfully finishes label Sep 23, 2021
@geoand
Copy link
Contributor

geoand commented Sep 23, 2021

I'll let @sberyozkin review this as well

@geoand geoand removed the triage/waiting-for-ci Ready to merge when CI successfully finishes label Sep 23, 2021
@vsevel
Copy link
Contributor Author

vsevel commented Sep 29, 2021

hi @sberyozkin how are you? any chance you could take a look? thanks.

@vsevel
Copy link
Contributor Author

vsevel commented Oct 6, 2021

hi @sberyozkin do you think you will be able to take a look? or if you do not have time, would you know somebody who may be able to do it? thanks.

@sberyozkin
Copy link
Member

Hi Vincent @vsevel very sorry, I keep missing many pings, please ping me on Zulip if I don't respond, if I don't it means I have missed the ping :-)
I'll do review shortly today

@sberyozkin
Copy link
Member

Let me do it right now...

// happens if we reach the atMost condition - see UniAwait.atMost(Duration)
throw new VaultIOException(e);

} catch (VertxException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vsevel in quarkus-oidc we add a few retries in such cases, so please consider it, but it is not needed for this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initially I had the retry logic in the vault client itself. but I did not want to apply it to all calls. so I started passing args around to control when and when not to retry. to avoid changing to much the behavior (introducing lags instead of errors), I wanted to limit it to the very first time we fetch properties out of the vault, because it is in the critical path for startup. we will see if this needs to be added in other use cases.

@sberyozkin
Copy link
Member

@vsevel It looks fine, simple and logical improvement :-), sorry for keeping you waiting

@sberyozkin sberyozkin self-requested a review October 8, 2021 13:14
@sberyozkin sberyozkin merged commit 2c499b1 into quarkusio:main Oct 8, 2021
@quarkus-bot quarkus-bot bot added this to the 2.4 - main milestone Oct 8, 2021
@vsevel
Copy link
Contributor Author

vsevel commented Oct 8, 2021

Hi Vincent @vsevel very sorry, I keep missing many pings, please ping me on Zulip if I don't respond, if I don't it means I have missed the ping :-) I'll do review shortly today

hi @sberyozkin I figured you had other things to do too ;-)
initially I pinged @geoand because the improvement was more related to config sources and app startup, rather than vault itself.
anyway, thanks for the review + merge.

@vsevel vsevel deleted the vault_retry branch October 8, 2021 14:12
@vsevel vsevel mentioned this pull request Oct 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants