Investigate if Liveness Probing is Production Grade #72

Closed
johnrk-zz opened this issue Jul 13, 2020 · 5 comments

Comments

@johnrk-zz
Contributor

johnrk-zz commented Jul 13, 2020

During Monday's (7/13) demo of the Operator, it was identified that our liveness probing for Kubernetes may need to improve. Improving it would also help us get proper end-to-end testing in place.

In our end-to-end testing, we verify that clusters are set up properly, but we don't check that they're running properly. We don't run any SQL queries, and we don't check for liveness.

@johnrk-zz johnrk-zz added the bug Something isn't working label Jul 13, 2020
@johnrk-zz johnrk-zz changed the title from "[bug] Improve Liveness Probing for Kubernetes with CockroachDB" to "Investigate if Liveliness Probing is Production Grade" Jul 13, 2020
@johnrk-zz johnrk-zz removed the bug Something isn't working label Jul 13, 2020
@vladdy
Contributor

vladdy commented Jul 13, 2020

Just a small correction. e2e tests check for liveness of all pods - https://github.com/cockroachdb/cockroach-operator/blob/master/e2e/assert.go#L138

@johnrk-zz johnrk-zz changed the title from "Investigate if Liveliness Probing is Production Grade" to "Investigate if Liveness Probing is Production Grade" Jul 13, 2020
@chrislovecnm
Contributor

@chrisseto can we close this?

@chrisseto
Contributor

I'm tempted to say that we should leave this open? In CC, we actually don't have liveness probes installed. We've found that the probes time out at high loads, so k8s kills the pods, making the issue even worse.
However, we've not encountered any issues that would be solved by adding them back in.
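
To illustrate the failure mode described above, here is a minimal sketch of how probe timeouts under load can turn into container restarts. It is a simplified model, not the kubelet's actual implementation: a probe counts as failed when its response takes longer than `timeoutSeconds`, and a restart is triggered after `failureThreshold` consecutive failures. The defaults (1 second, 3 failures) match the Kubernetes defaults; the names `ProbeConfig` and `restart_points` and the latency values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ProbeConfig:
    # Mirrors the Kubernetes probe fields timeoutSeconds and failureThreshold
    # (defaults: 1 second, 3 consecutive failures).
    timeout_seconds: float = 1.0
    failure_threshold: int = 3


def restart_points(latencies: Iterable[float], cfg: ProbeConfig) -> List[int]:
    """Return the probe indices at which this simplified model restarts the container.

    Assumption: a probe fails only by exceeding the timeout, and the failure
    counter starts over after each restart.
    """
    restarts = []
    consecutive_failures = 0
    for i, latency in enumerate(latencies):
        if latency > cfg.timeout_seconds:
            consecutive_failures += 1
        else:
            consecutive_failures = 0
        if consecutive_failures >= cfg.failure_threshold:
            restarts.append(i)
            consecutive_failures = 0
    return restarts


# Hypothetical probe response times (seconds) for a node that is slow under
# heavy load but still serving traffic.
loaded = [0.2, 0.4, 1.5, 1.8, 2.2, 0.3, 1.6, 1.7, 1.9]

print(restart_points(loaded, ProbeConfig()))                     # [4, 8]
print(restart_points(loaded, ProbeConfig(timeout_seconds=5.0)))  # []
```

With the default settings, the slow but healthy node is restarted twice, which removes capacity exactly when the cluster is already under load. A more tolerant timeout avoids the restarts in this model, at the cost of detecting a genuinely hung process more slowly.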

@udnay

udnay commented Jun 26, 2021

Can we close this?

@mbrancato
Contributor

Noting here that the upstream manifests have disabled liveness probes, and that we experienced liveness probe failures under heavy load, which caused more latency in the cluster.

Support referenced these:
cockroachdb/cockroach#67080
cockroachdb/cockroach#44832


7 participants