-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ApplicationScope: add health
core scope
#133
Conversation
@technosophos @mikkelhegn |
@@ -133,31 +133,43 @@ The health scope on its own does not take any action based on health status. It | |||
- Application upgrade traits can monitor the aggregate health of a health scope and decide when to initiate an automatic rollback. | |||
- Monitoring applications can monitor the aggregate health of a health scope to issue alerts. | |||
|
|||
#### Example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My impression goes that Core Scope should be defined explicitly, not as an example.
We will then add examples on how to use Health
scope once https://github.com/microsoft/hydra-spec/issues/130 is addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur that this should be defined explicitly. But should we add some way for extension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can set properties beyond what was defined. The Hydra model itself is extensible by design.
The idea with the health scope is to evaluate health across multiple components, to use in e.g. an upgrade scenario to monitor whether upgrading an individual component is triggering failures on dependent components. This is needed to be able to trigger roll-backs based on a set of defined thresholds. This concept exists in Service Fabric, and we would like to carry it over: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-application-upgrade#health-checks-during-upgrades. I agree this should be explicit, as it's a core scope, but I think the current example is closer to what we need. We need a set of parameters for the health scope to know when to report error, warning or good, and it would be great to have explicit rules as well as average thresholds. E.g. overall % of components healthy threshold, and an option to deem specific components required to be healthy for the scope to be health etc. |
1926c24
to
532cb7e
Compare
@mikkelhegn Thanks for the suggestion. The docs you shared looks a great learning resource! I have added the an option to let user provide a list of components that should be healthy to consider the scope healthy. This is a very useful parameter since I have added back the healthThresholdPercentage. TBH, we haven't really seen any use cases like this. If a Deployment only needs partial to be healthy to proceed upgrade, we just pass it to the workload controller to determine itself wether to report healthy in the Component instance (e.g. Deployment). I will let you define this since you have more experience on it. Health scope requires some way to report health status. We define the "some way" here as the probe. The existing log query is one type of probe, but seems limited. That's why we try to make it more extensible and define the probe. We need to support probing status subresource, http endpoints, prom queries, etc. The probe-method/endpoint/timeout/interval are ubiquitous to support all of these. Let me know what you think. Thanks very much! |
I will wait for @technosophos to review this before I merge this. We have some folks travelling/on leave this week. This will get looked at next Monday PST :) |
No description provided.