Add health probes to GRPC servers validating network access #8777
Codecov Report
@@            Coverage Diff             @@
##             main    #8777      +/-   ##
==========================================
+ Coverage   11.17%   14.40%   +3.22%
==========================================
  Files          18       51      +33
  Lines         993     4900    +3907
==========================================
+ Hits          111      706     +595
- Misses        880     4124    +3244
- Partials        2       70      +68
04cef51 to 541d8dc
Logs:
/hold we need to ensure the postStart waits for the readinessCheck
b1fae11 to 13e5ff0
87bc8f4 to 502ed72
33391df to 37f8a17
LGTM
Leaving the hold, as we still need to determine why we're having trouble with image pulls in core-dev
/werft run
LGTM, opens workspaces now, woot! Removing hold.
/unhold
Great change - it's surprising how long we got by without this.
components/blobserve/cmd/run.go
Outdated
health.AddLivenessCheck("dns", kubernetes.DNSCanResolveProbe(staticLayerHost, 1*time.Second))
health.AddLivenessCheck("registry", kubernetes.NetworkIsReachableProbe(fmt.Sprintf("http://%v", repository)))
Do we really want blobserve to go down if we can no longer reach those? Chances are things are cached and blobserve would just keep functioning.
Sure, we can remove those. Now, how do we know there's an error in blobserve without a restart?
We would want to add metrics/alerts for the error cases we know.
Awesome. Thank you for this change.
Description
Running registry-facade without probes could lead to workspace issues during the start phase, in particular with requests that download images.
The new probes ensure that we can resolve DNS queries and that we can at least reach the static layers defined in the configuration, which avoids such issues.
How to test
Registry facade should start without any issues.
Release Notes
Documentation