Provide the ability to add custom health checks that precede and follow ClickHouse reconfiguration/upgrade #1477
This is a very interesting topic that requires detailed discussion. However, here are my brief thoughts on the matter.
Clickhouse-operator has timeouts for host reconcile processes. Some additional or "general" timeout(s) would need to be introduced if we are talking about waiting for external permission to reconcile a host.
There is no special case called "upgrade" at the moment; the operator performs a general reconcile. I'd rather avoid narrow cases and prefer to go with a more generic approach. Introducing some kind of "external hooks" may be the proper way to go. Details are to be specified separately.
This question is a good starting point to think about how:
Anyway, this is a big topic to discuss and to plan, which we can start doing.
Just a quick idea - an option to launch a k8s Job from a user-provided template and wait for its completion (or some other condition) before starting ClickHouse pod reconciliation. We might augment this Job with well-known env variables (such as the pod to be reconciled and all cluster pods). An important advantage would be that we move all the configuration (secrets, job.podFailurePolicies and others) to k8s and get a built-in way to watch/debug it. The Job could even push metrics/logs to a collector during its runtime.
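A minimal sketch of what such a user-provided Job template might look like, assuming hypothetical env variable names (CHOP_RECONCILED_POD, CHOP_CLUSTER_PODS) that the operator would inject; none of this is an existing clickhouse-operator API:

```yaml
# Illustrative only: a user-provided pre-reconcile Job template.
# The env variable names are assumptions, not an existing operator API.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: pre-reconcile-check-
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: guard
          image: clickhouse/clickhouse-server:latest
          env:
            - name: CHOP_RECONCILED_POD   # assumed name: the pod the operator is about to reconcile
              value: ""
            - name: CHOP_CLUSTER_PODS     # assumed name: comma-separated list of all cluster pods
              value: ""
          # The user-supplied check goes here; exit 0 lets reconciliation proceed, non-zero blocks it
          command: ["sh", "-c", "exit 0"]
```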
👋 Hey folks! I'm an engineer over at Mux, the company asking y'all for this feature 🙌 Overall I think what y'all are saying makes sense. For some clarity, I think we only need the ability to execute a single SQL statement before progressing onto the next pod, to ensure our strict latency requirements are met. For example, a query we'd want to run before a pod rolls is something like this (I used a random value threshold since I'm not 100% sure what replication lag threshold we'd want to block progress on):
I was thinking it could be implemented similarly to how Prometheus alerting rules work, where any data returned means a failure: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
It looks like we already have some logic that runs SQL before it proceeds with the reconcile here. Under the hood, that calls clickhouse-operator/pkg/model/chi/schemer/schemer.go, lines 151 to 154 in d5f265f, and clickhouse-operator/pkg/model/chi/schemer/sql.go, lines 244 to 246 in d5f265f.
We could just add another function call right above/below that. I think all we really need is the ability to specify a single SQL statement in the custom resource, maybe as a dedicated field; a rough sketch of the idea follows below. Curious what y'all think of this!
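Purely as an illustration, such a field in a CHI manifest could look like this; the reconcileGuardQuery name, its placement under spec.reconciling, and the 300-second threshold are all assumptions, not an existing clickhouse-operator API or the exact query from the comment above:

```yaml
# Illustrative only: reconcileGuardQuery is a hypothetical field name and
# 300 seconds is an arbitrary example threshold.
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: demo
spec:
  reconciling:
    # If this query returns any rows, the operator would wait before reconciling
    # the next host (mirroring the "any data returned means a failure" semantics).
    reconcileGuardQuery: |
      SELECT database, table, absolute_delay
      FROM system.replicas
      WHERE absolute_delay > 300
  configuration:
    clusters:
      - name: main
        layout:
          replicasCount: 3
```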
@com6056, thank you for your input. As an alternative, would it make sense to have this custom SQL as a part of the readiness probe? Or even as a special condition for including the host in the load balancer? I am asking because syncing the replica may take quite a lot of time, and the operator has no visibility into what is going on while the check is failing. What would the timeout value be? Alternatively, we may add replica-wait logic to the operator explicitly, without custom SQL. In this case the operator will know what is going on: it will be able not just to monitor the replica delay, but also to check the replication queue and see whether there is progress or the host is stuck.
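For illustration, the readiness-probe variant could look roughly like the following in a pod template, assuming clickhouse-client is available in the container and using an arbitrary 300-second lag threshold:

```yaml
# Illustrative only: fail readiness while replication lag exceeds an arbitrary threshold.
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        lagging=$(clickhouse-client -q "SELECT count() FROM system.replicas WHERE absolute_delay > 300")
        [ "$lagging" = "0" ]
  periodSeconds: 30
  failureThreshold: 10
```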
I think the latter idea sounds best, with the operator being in full control of the rolling based on those metrics. Ideally we don't want pods rolling if the replication delay is above a certain amount, and I don't believe a readiness probe halts the rolling of pods with this setup (not 100% sure how it all works in your operator since you have a special StatefulSet setup); the probe only prevents the pod from being served traffic.
We had a recent discussion with a user who has the following use case. They have an operator that watches for replication lag on ClickHouse and only schedules an upgrade when it's below a certain level. They were wondering if the operator could help them implement logic that amounts to "guard conditions" on operator upgrades. It could work as follows:
In this case the guard condition is replica lag, but it's likely that there would be others. They and other users would presumably want to customize the conditions.
It seems as if this could be implemented in a general way by adding a custom health check section to CHI instances that contains one or more SQL queries to run before and after an upgrade. If the check(s) pass, the upgrade could proceed. If they fail, the operator would wait.
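A compact sketch of what such a section might look like; the healthChecks block, its field names, and the queries are assumptions for illustration only, not an existing CHI API:

```yaml
# Illustrative only: healthChecks and its fields are hypothetical.
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: demo
spec:
  healthChecks:
    beforeUpgrade:
      # Each query must return no rows for the upgrade of a host to proceed
      - "SELECT 1 FROM system.replicas WHERE absolute_delay > 300"
    afterUpgrade:
      # Confirm the host is healthy before moving on to the next one
      - "SELECT 1 FROM system.replicas WHERE is_readonly"
  configuration:
    clusters:
      - name: main
```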
There are some questions with this kind of guard that we need to contemplate.