-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a periodic test of the autoseal to detect loss of connectivity. #13078
Conversation
vault/seal_autoseal.go
Outdated
@@ -18,16 +21,20 @@ import ( | |||
// applicable in the OSS side | |||
var barrierTypeUpgradeCheck = func(_ string, _ *SealConfig) {} | |||
|
|||
const sealHeathTestInterval = 1 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't we want to check this hourly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm back and forth on that. Since it's a cheap encrypt/decrypt I coded it to be more frequent. Open to thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this is being done on all auto-unseal implementations, not just pkcs11 implementations. If that is the case could the costs associated with using the various KMS providers within the cloud start adding up? I don't have a good sense of those costs across the various implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sealHeathTestInterval
- should this be sealHealthTestInterval?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sure should.
vault/seal_autoseal.go
Outdated
for { | ||
select { | ||
case <-d.healthCheckStop: | ||
d.healthCheck.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably being paranoid here but if we get multiple healthCheckStop we will get a nil deference error. Is it worth adding a quick test and return if d.healthCheck is nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm the case statement will never fire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it is, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Just review the seal test interval if we really want to run this every minute.
vault/seal_autoseal.go
Outdated
for { | ||
select { | ||
case <-d.healthCheckStop: | ||
d.healthCheck.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm the case statement will never fire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I somehow failed to submit these comments yesterday.
vault/seal_autoseal.go
Outdated
lastTestOk = false | ||
d.core.MetricSink().SetGauge(autoSealUnavailableDuration, 0) | ||
} else { | ||
plaintext, err := d.Access.Decrypt(ctx, ciphertext, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same timeout (shared) across both Encrypt/Decrypt, or should we have a 1m timeout for each?
ciphertext, err := d.Access.Encrypt(ctx, []byte(testVal), nil) | ||
|
||
if err != nil { | ||
fail("failed to encrypt seal health test value, seal backend may be unreachable", "error", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we're in a func we could do an early return instead of an else.
Hello @sgmiller, I was wondering if you could please clarify a quick question I have around this PR (which is a great feature by the way). From my understanding, the only way to be verify if the seal backend is health would be by checking Vault logs. Do you know if there is any API that we could call to get the status of the seal backend without having to check the logs? If not, do you reckon that maybe |
This allows operators to spot problems with connections to their HSMs and
KMSes that would otherwise go unnoticed until the next unseal, where time is
critical.