component startup ordering #23

xiang90 · 2019-07-03T17:56:04Z

The components in the operationalConfiguration are a flat array right now. In some cases, we might want to express dependencies between components: Component A might need to start before Component B, or Component B might fail.

We might solve this problem by blindly retrying during the startup phase of components, but ideally we could express the ordering requirement explicitly.
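Here is a minimal sketch of what expressing the ordering explicitly could enable. It assumes a hypothetical `dependsOn` list on each component, which is not a field of the operationalConfiguration discussed here. It derives a startup order with a topological sort from Python's standard `graphlib` module (Python 3.9+) and rejects cyclic dependencies.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical components: "dependsOn" is an assumed field used only for
# illustration; it is not part of the operationalConfiguration schema.
components = [
    {"name": "frontend", "dependsOn": ["backend"]},
    {"name": "backend", "dependsOn": ["database"]},
    {"name": "database", "dependsOn": []},
]


def startup_order(components):
    """Return component names so that every dependency comes before its dependents."""
    # Map each component to the set of components it depends on (its predecessors).
    graph = {c["name"]: set(c.get("dependsOn", [])) for c in components}
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    try:
        print(startup_order(components))
    except CycleError as err:
        # The second element of err.args holds the nodes in the detected cycle.
        print("cyclic dependencies:", err.args[1])
```

With the three hypothetical components above, this prints `['database', 'backend', 'frontend']`. Components with no dependency path between them could in principle be started in parallel; `TopologicalSorter` also offers `prepare()`, `get_ready()` and `done()` for that style of scheduling.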

vturecek · 2019-07-09T00:17:35Z

It might help to break this down into two scenarios:

1. The first is a situation where I need a start-up task to do some initialization before other components can run. This would be a run-to-completion task that needs to finish before creating any other components.
2. The second is a component dependency, where component A must be up and running for component B to work. In other words, component B is useless without component A.

The first scenario reminds me of environment setup tasks that require elevated privileges. I don't think the application model should define this. There is a separate role defined for these types of tasks, the "infrastructure operator", who would be responsible for preparing the operating environment using the native language of the platform that the application is running on. This keeps infrastructure-related operations out of the application model, and allows operators to use the underlying platform's RBAC mechanism as well.

For the second scenario, startup order isn't going to help, and could even be harmful if application code depends on it. Startup ordering won't guarantee execution timing and readiness once components are started, and a component can always fail, timeout, or restart at any time after the initial deployment. Applications should always expect that a dependent component may be unavailable at any time.

That said, there is an opportunity to provide something better than blind retries in application code without the need to carefully order startup operations. A combination of health probes and service mesh functionality should be able to provide the kind of fault tolerance needed.

xiang90 · 2019-07-09T00:21:01Z

> For the second scenario, startup order isn't going to help, and could even be harmful if application code depends on it. Startup ordering won't guarantee execution timing and readiness once components are started, and a component can always fail, timeout, or restart at any time after the initial deployment. Applications should always expect that a dependent component may be unavailable at any time.

Applications usually still need retries to be reliable. But explicitly expressing the order would solve the blind-retry problem: retries would only be triggered when a real failure happens, which would dramatically reduce the startup time for applications that contain many components (we have some use cases).
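One rough way to see the startup-time argument is to simulate a single dependency. The sketch below is illustrative only: the 5-second readiness time and the 1-second, doubling backoff schedule are assumed numbers, the "ordered" case idealizes the dependent as starting the instant its dependency is ready, and the dependent's own startup work is ignored.

```python
def blind_retry_success_time(dep_ready_at, initial_delay=1.0, factor=2.0, max_attempts=10):
    """Return the time at which a dependent first reaches its dependency.

    The dependent starts at t=0 without waiting and retries blindly with
    exponential backoff; returns None if it gives up after max_attempts.
    """
    t, delay = 0.0, initial_delay
    for _ in range(max_attempts):
        if t >= dep_ready_at:  # this attempt finds the dependency ready
            return t
        t += delay  # wait out the current backoff, then try again
        delay *= factor
    return None


# Assumed numbers: the dependency becomes ready 5 seconds after deployment.
DEP_READY_AT = 5.0
ordered = DEP_READY_AT  # idealized explicit ordering: start exactly when ready
blind = blind_retry_success_time(DEP_READY_AT)
print(f"ordered: {ordered}s, blind retry: {blind}s")
```

With these assumptions it prints `ordered: 5.0s, blind retry: 7.0s`: the attempts at 0, 1 and 3 seconds fail, and the next one only comes at 7 seconds, two seconds after the dependency was actually ready.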

vturecek · 2019-07-09T22:34:23Z

I suspect we'd have to go a few steps further than just startup ordering to significantly reduce the chance of a communication failure during deployment and upgrade, and even then, it would only reduce the chance of failure, not eliminate it. I think a combination of readiness probes and traffic management traits (retries and back-offs) could handle this in a broader and safer manner.
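The retry-and-back-off behaviour mentioned here could look roughly like the following. This is an in-process sketch of capped exponential backoff with "full jitter", not the API of any OAM trait or service mesh; the function name and parameters are hypothetical, and it retries on any exception for brevity, whereas a real policy would retry only on errors known to be transient.

```python
import random
import time


def call_with_backoff(fn, attempts=5, base=0.1, cap=2.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying failures with capped exponential backoff and full jitter.

    Returns fn()'s result, or re-raises the last exception after `attempts` tries.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:  # simplification: real policies retry only transient errors
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure
            # Full jitter: wait a random fraction of min(cap, base * 2^attempt).
            sleep(rng() * min(cap, base * 2 ** attempt))
```

The `sleep` and `rng` parameters are injectable so the backoff schedule can be checked deterministically; by default they are `time.sleep` and `random.random`.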

Reducing application startup time is, in my mind, a more compelling reason to have ordering. What are some of the use cases you've found where startup time is reduced dramatically?

resouer · 2019-07-28T05:19:43Z

> For the second scenario, startup order isn't going to help, and could even be harmful if application code depends on it. Startup ordering won't guarantee execution timing and readiness once components are started, and a component can always fail, timeout, or restart at any time after the initial deployment. Applications should always expect that a dependent component may be unavailable at any time.

@vturecek It reads like we are mixing "cause" with "effect" here.

Start order is a hint that gives a Component a way to learn the application topology, if it cares about it.

Once a Component has this information, it can check the readiness gates of its dependencies, retry with back-offs, or apply whatever implementation details it wants.

Before any of that, though, the Component needs a way to know whether it should start before or after some other Component, and that is what is missing in the current spec.
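As a minimal sketch of the component-side readiness check described above, assume the component has been handed the names of the components it depends on (for example, derived from a start-order hint) and some readiness probe callable. Both `wait_for_dependencies` and the `is_ready` probe are hypothetical; nothing here is defined by the spec.

```python
import time


def wait_for_dependencies(deps, is_ready, timeout=30.0, interval=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready(dep) for each dependency until all are ready or timeout expires.

    Returns the list of dependencies that are still not ready (empty on success).
    """
    deadline = clock() + timeout
    pending = list(deps)
    while True:
        # Drop every dependency whose readiness probe now succeeds.
        pending = [d for d in pending if not is_ready(d)]
        if not pending or clock() >= deadline:
            return pending
        sleep(interval)
```

Returning the still-pending dependencies instead of raising leaves the decision to the component: it can keep waiting, start in a degraded mode, or fail. The `clock` and `sleep` parameters exist so the logic can be exercised with a fake clock.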

technosophos · 2019-08-27T23:47:10Z

I am not convinced that introducing sophisticated dependency diagrams into Hydra as a first-class concept is a good idea. I don't think the purpose of Hydra is to create an alternative to Terraform or Ansible. It's to define an application model that encourages following specific practices for cloud native and microservice development.

I think it is reasonable to do a sequential release of N components. I think anything beyond that is outside of the scope of the tool, and should be accomplished by using multiple operational configurations and an external tool like Ansible or Terraform.
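A sequential release of N components, as suggested here, could be as simple as the loop below. `deploy` and `wait_ready` are hypothetical callbacks standing in for whatever the platform provides; the sketch stops at the first component that does not become ready and leaves anything more elaborate to external tooling.

```python
def sequential_release(components, deploy, wait_ready):
    """Deploy components one at a time, in list order.

    Each component must report ready before the next one is deployed.
    Returns (released, failed): the names released successfully, and the
    name of the first component that did not become ready (or None).
    """
    released = []
    for name in components:
        deploy(name)
        if not wait_ready(name):
            return released, name  # stop the rollout at the first failure
        released.append(name)
    return released, None
```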

hongchaodeng mentioned this issue on Sep 19, 2019 in Issue tracker of sprint work #131 (Closed, 11 tasks).

mikkelhegn mentioned this issue on Sep 30, 2019 in [WIP] AppScope metadata #145 (Closed).

vturecek mentioned this issue on Oct 31, 2019 in Create a roadmap for next release #229 (Open).

wonderflow mentioned this issue on Nov 1, 2019 in Add FAQ for spec #239 (Open).

hongchaodeng mentioned this issue on Mar 10, 2020 in Resource Dependency in OAM #326 (Closed).
