-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ws-rollout] Add prometheus init check #16000
Conversation
When given a non-usable `--prometheus-url`, We start the rollout without verifying if the prometheus is reachable or not. This is a problem as we will be unable to get the metrics from prometheus and hence the rollout will be reverted later causing unnecessary time waste. This can be prevented by performing a simple check to see if the prometheus is reachable or not. `up` query is used instead of key metrics as we can't be sure of their existence. Signed-off-by: Tarun Pothulapati <[email protected]>
b443912
to
efe4e88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - have some non-blocking suggestions
err = analysis.CheckPrometheusReachable(ctx, conf.prometheusURL) | ||
if err != nil { | ||
log.WithError(err).Fatal("init: prometheus is not reachable") | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could remove the error return here, log.Fatal
will already exit the process
if prometheusURL == "" { | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about moving this function to WorkspaceKeyMetricsAnalyzer
, e.g. WorkspaceKeyMetricsAnalyzer.CheckPrometheusReachable()
, and then use the api
client of the analyzer to check if it's reachable. You'd only need the v1Client.Query(ctx, "up", time.Now())
part then, deduplicates the construction of the API clients
@gitpod-io/engineering-delivery-operations-experience could we ask for your help with a review? It'd be great to test drive ws-rollout next week during the traffic shift. |
@kylos101 since I was the one reviewing tarun's work before, I can say that the PR looks good. @WVerlaek's review makes sense though, but I'm not sure @Pothulapati can push changes to the branch since he is not part of the org anymore (nor if he is willing to do so) @corneliusludmann could you give the ✅ just so the change gets merged and you'll apply the fix in another PR? |
/unhold We'll handle the suggestions in a follow-on PR, if needed, thank you @WVerlaek for the comprehensive review! Also, @nandajavarma appreciate you unblocking this! |
Description
When given a non-usable
--prometheus-url
, We start the rollout without verifying if the prometheus is reachable or not. This is a problem as we will be unable to get the metrics from prometheus and hence the rollout will be reverted later causing unnecessary time waste.This can be prevented by performing a simple check to see if the prometheus is reachable or not.
up
query is used instead of key metrics as we can't be sure of their existence.Signed-off-by: Tarun Pothulapati [email protected]
Related Issue(s)
Fixes #
How to test
Testing by passing a unreachable prometheus URL
Release Notes
Documentation
Build Options:
Experimental feature to run the build with GitHub Actions (and not in Werft).
leeway-target=components:all-ci
Run Leeway with
--dont-test
If enabled this will build
install/preview
Valid options are
all
,workspace
,webapp
,ide
,jetbrains
,vscode
,ssh