Add livez and readyz for etcd #16651
Tested with local server
Cool stuff, and thanks for the PR. I was wondering why you're adding this to the gRPC API? We already have a health handler on the HTTP server: https://github.com/etcd-io/etcd/blob/main/server/embed/etcd.go#L746 Just FYI, we had huge issues in the past with running etcdctl commands as health probes in OpenShift, because of zombie processes. We're much happier just letting the kubelet hit an HTTP endpoint instead.
I think we can start from the HTTP endpoint. I don't think backporting gRPC changes to release-3.5 is feasible or would be accepted. It would literally change the customer-facing API, even if it is just the maintenance service.
Force-pushed from ae3789a to 0705f0e.
Removed the gRPC change.
Force-pushed from f5d4788 to 8abcec0.
Force-pushed from 8abcec0 to f588767.
Force-pushed from f588767 to 09da096.
Force-pushed from 9e9b0ab to c6de0cb.
Force-pushed from c6de0cb to 9e30c01.
LGTM
Nice work!
Please rebase this PR and resolve the remaining comment, thanks.
    if _, found := r.URL.Query()["verbose"]; found {
        fmt.Fprint(w, h.Reason)
    }
    fmt.Fprint(w, "ok\n")
Any reason to implement verbose for the per-check endpoint?
The per-check endpoint still prints the detailed error message in verbose mode, and no details in non-verbose mode. In theory, the user can get all the details from the root path with verbose, but I think it still makes sense to follow the same paradigm.
Force-pushed from 9e30c01 to 9a8478c.
@ahrtr Rebased the PR. Please review. Thanks!
/lgtm
Force-pushed from 9a8478c to 936cb03.
Force-pushed from 2ac4daf to b371a49.
Add two separate probes, one for liveness and one for readiness. The liveness probe would check that the local individual node is up and running, or else restart the node, while the readiness probe would check that the cluster is ready to serve traffic. This would make the etcd health check fully Kubernetes API compliant.
Signed-off-by: Siyuan Zhang <[email protected]>
Force-pushed from b371a49 to 7a57e06.
LGTM
Thanks for your first contribution, and great work!
Please also add a follow-up item to update the docs in etcd-io/website.
Please note that this PR doesn't implement the full design as it was presented. Livez works correctly; however, we are missing checks for readyz, which makes it not very useful. Please see 80ab2ad. I think this is OK to merge; however, until we finish implementing readyz, we shouldn't backport or document the new endpoints.
#16007
This is a prototype for adding livez/readyz support to etcd (design doc).
This PR sets up the general structure for livez/readyz checks, with only 2 simple checks implemented.