Support configuring readiness probe timeout #197
Comments
Edit 2: Found the root cause of the slow startup times. Every time the registry restarted, an empty message was written to the Kafka topic. Combined with the memory leak that caused the registry to restart frequently (every ~15 min), these empty messages accumulated to more than 20k on the topic, all of which were read, discarded, and logged as tombstone messages during startup. To fix this, we backed up the schemas, set the retention time so that all old messages were cleaned up, then imported the schemas again. The referenced blog post did not help in our case, as the described steps did not delete the old messages. Startup time on the cleaned-up kafkasql topic dropped from 139s to 5s. Suggestion: prefer cleaning up the topic over sacrificing readiness probe timeouts.

Edit 1: The slow startup time and tombstone messages seem to be related to an older bug that has been fixed but requires manual cleanup. Instead of raising startup probe timeouts, consider fixing the root cause (see https://www.apicur.io/blog/2021/12/09/kafkasql-storage-and-security).

Confirming the same issue with slow initial startup of the registry (as of commit d490f6e / 1.1.0-dev). Since the operator does not reconcile changes to the probes of its Deployments, a workaround is to patch the Deployment after the operator has created it. However, this change will likely be overridden after a config change on your ApicurioRegistry resources:
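A minimal sketch of what such a patch could look like, assuming a strategic merge patch against the operator-created Deployment. The container name `registry` and the probe values are hypothetical placeholders, not taken from this thread; the probe field names (`timeoutSeconds`, `periodSeconds`, `failureThreshold`) are the standard Kubernetes ones.

```python
import json


def build_probe_patch(container_name, timeout_seconds, period_seconds, failure_threshold):
    """Build a strategic merge patch that relaxes a container's readiness probe."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Strategic merge matches list entries by container name,
                            # so only this container's probe fields are changed.
                            "name": container_name,
                            "readinessProbe": {
                                "timeoutSeconds": timeout_seconds,
                                "periodSeconds": period_seconds,
                                "failureThreshold": failure_threshold,
                            },
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # "registry" is a hypothetical container name; check the real one in your Deployment.
    patch = build_probe_patch(
        "registry", timeout_seconds=10, period_seconds=10, failure_threshold=30
    )
    print(json.dumps(patch, indent=2))
```

The printed JSON could then be applied with `kubectl patch deployment <name> -p '<json>'`. As noted above, the operator may overwrite it on the next change to the ApicurioRegistry resource.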
In my case, the registry took up to 30 min to start up (4 restart attempts) and issued thousands of these messages:
I would prefer to fix the root cause of the slow startup if possible, even though configurable probes would be nice too (along with resource/limit config). Does anyone know how this slow startup can happen at all, or how to avoid it?
We have good news to share here. We have started implementing a snapshotting mechanism that will reduce the startup time significantly. This new feature will be available with Apicurio Registry 3.0.
See https://docs.openshift.com/container-platform/4.12/applications/application-health.html
This helps with an edge case in kafkasql storage, where a huge number of artifacts in the topic causes the pod to take too much time to get ready, resulting in a restart loop.
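To illustrate why a slow start can outlast a probe, here is a rough sketch of the time budget a probe allows before it is considered failed, approximated as `initialDelaySeconds + periodSeconds * failureThreshold`. The probe settings used below are hypothetical examples, not the operator's actual defaults; the 139s and 5s startup times are the ones reported in the comments. The approximation ignores `timeoutSeconds` and scheduling jitter.

```python
def probe_budget_seconds(initial_delay, period, failure_threshold):
    """Approximate time a container has before the probe is considered failed."""
    return initial_delay + period * failure_threshold


def startup_fits(startup_seconds, initial_delay, period, failure_threshold):
    """Return True if the observed startup time fits within the probe budget."""
    return startup_seconds <= probe_budget_seconds(initial_delay, period, failure_threshold)


if __name__ == "__main__":
    # Hypothetical probe settings: 15s initial delay, 10s period, 3 allowed failures.
    print(probe_budget_seconds(15, 10, 3))
    # Startup times reported in the thread: 139s before cleanup, 5s after.
    print(startup_fits(139, 15, 10, 3))
    print(startup_fits(5, 15, 10, 3))
    # Raising failureThreshold to 15 (hypothetical) widens the budget.
    print(startup_fits(139, 15, 10, 15))
```

Under these assumed settings the 139s startup would not fit, while the 5s startup after cleaning up the topic would. This is consistent with the suggestion to clean the topic first and treat probe tuning as a fallback.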