component startup ordering #23
Comments
It might help to break this down into two scenarios:
The first scenario reminds me of environment setup tasks that require elevated privileges. I don't think the application model should define this. There is a separate role defined for these types of tasks, called "infrastructure operator", who would be responsible for preparing the operating environment using the native language of the platform that the application is running on. This keeps infrastructure-related operations out of the application model, and allows operators to use the underlying platform's RBAC mechanism as well. For the second scenario, startup order isn't going to help, and could even be harmful if application code depends on it. Startup ordering won't guarantee execution timing or readiness once components are started, and a component can always fail, time out, or restart at any time after the initial deployment. Applications should always expect that a dependent component may be unavailable at any time. That said, there is an opportunity to provide something better than blind retries in application code without the need to carefully order startup operations. A combination of health probes and service mesh functionality should be able to provide the kind of fault tolerance needed.
Applications usually still need retries to be reliable. But once ordering is expressed explicitly, the blind-retry problem goes away: a retry is triggered only when a real failure happens, which dramatically reduces startup time for applications that contain many components (we have some use cases).
I suspect we'd have to go a few steps further than just startup ordering to significantly reduce the chance of a communication failure during deployment and upgrade, and even then, it would still only be reducing the chances of failure. I think a combination of readiness probes and traffic management traits (retries and back-offs) could handle this in a broader and safer manner. Reduction of startup time for applications is a more compelling reason to have ordering, in my mind. What are some of the use cases you've found where the startup time is reduced dramatically?
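To make the "readiness probes plus retries and back-offs" idea concrete, here is a minimal sketch of what that looks like when done by hand in application code. The names `wait_until_ready`, `probe`, `attempts`, `base_delay`, and `max_delay` are all hypothetical and not part of the spec or of any platform API; a service mesh or traffic management trait would do the equivalent outside the application.

```python
import time


def wait_until_ready(probe, attempts=5, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Poll probe() until it returns True, doubling the wait between tries.

    probe is any zero-argument callable standing in for a readiness check
    (hypothetical; e.g. an HTTP GET against a health endpoint).
    Returns the attempt number that succeeded, or raises TimeoutError.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
            # Exponential back-off, capped so waits do not grow unbounded.
            delay = min(delay * 2, max_delay)
    raise TimeoutError(f"dependency not ready after {attempts} attempts")
```

Note that nothing here depends on startup order: the caller tolerates the dependency being unavailable at any time, which is the property argued for above.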
@vturecek It reads like we are mixing "cause" with "effect" here. Start order is a hint that gives a Component a way to know the application topology, if it cares. Once the Component has this information, it can check the readiness gates of its dependencies, do retries or back-offs, or use whatever implementation details it wants. But before any of that, the Component needs a way to know whether it should start after or before some other Component, and that is what's missing in the current spec.
I am not convinced that introducing sophisticated dependency diagrams into Hydra as a first-class concept is a good idea. I don't think the purpose of Hydra is to create an alternative to Terraform or Ansible. It's to define an application model that encourages following specific practices for cloud native and microservice development. I think it is reasonable to do a sequential release of N components. I think anything beyond that is outside of the scope of the tool, and should be accomplished by using multiple operational configurations and an external tool like Ansible or Terraform.
The components in the operationalConfiguration are a flat array right now. In some cases, we might want to express dependencies between components: Component A might need to start before Component B, or else Component B might fail.
We could work around this by blindly retrying during the startup phase of components, but ideally we could express the ordering requirement explicitly.
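As a rough illustration of what "express the ordering requirement explicitly" could mean, the sketch below attaches a dependency list to each entry of a flat component array and derives a start order from it with a topological sort. The `dependsOn` field, the component names, and the `startup_order` helper are assumptions made up for this example; none of them exist in the spec.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical flat component array. "dependsOn" is NOT a spec field;
# it is assumed here purely for illustration.
components = [
    {"name": "frontend", "dependsOn": ["api"]},
    {"name": "api", "dependsOn": ["database"]},
    {"name": "database", "dependsOn": []},
]


def startup_order(components):
    """Return component names so every dependency precedes its dependents."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {c["name"]: set(c.get("dependsOn", [])) for c in components}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # The detected cycle is the second element of the exception args.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


order = startup_order(components)
print(order)
```

This only decides the sequence in which components are started. It says nothing about whether a started component is actually ready, so it complements readiness checks and retries rather than replacing them. It requires Python 3.9 or later for `graphlib`.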