Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ApplicationScope: add health core scope #133

Merged
merged 3 commits into from
Sep 13, 2019

Conversation

hongchaodeng
Copy link
Member

No description provided.

@hongchaodeng
Copy link
Member Author

@technosophos @mikkelhegn
Please help review it :)

@hongchaodeng hongchaodeng mentioned this pull request Sep 9, 2019
11 tasks
@@ -133,31 +133,43 @@ The health scope on its own does not take any action based on health status. It
- Application upgrade traits can monitor the aggregate health of a health scope and decide when to initiate an automatic rollback.
- Monitoring applications can monitor the aggregate health of a health scope to issue alerts.

#### Example
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still an example?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression goes that Core Scope should be defined explicitly, not as an example.
We will then add examples on how to use Health scope once https://github.com/microsoft/hydra-spec/issues/130 is addressed.

Copy link
Member

@wonderflow wonderflow Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur that this should be defined explicitly. But should we add some way for extension?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can set properties beyond what was defined. The Hydra model itself is extensible by design.

@mikkelhegn
Copy link
Contributor

The idea with the health scope is to evaluate health across multiple components, to use in e.g. an upgrade scenario to monitor whether upgrading an individual component is triggering failures on dependent components. This is needed to be able to trigger roll-backs based on a set of defined thresholds. This concept exists in Service Fabric, and we would like to carry it over: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-application-upgrade#health-checks-during-upgrades.

I agree this should be explicit, as it's a core scope, but I think the current example is closer to what we need. We need a set of parameters for the health scope to know when to report error, warning or good, and it would be great to have explicit rules as well as average thresholds. E.g. overall % of components healthy threshold, and an option to deem specific components required to be healthy for the scope to be health etc.

@hongchaodeng
Copy link
Member Author

@mikkelhegn Thanks for the suggestion. The docs you shared looks a great learning resource!

I have added the an option to let user provide a list of components that should be healthy to consider the scope healthy. This is a very useful parameter since

I have added back the healthThresholdPercentage. TBH, we haven't really seen any use cases like this. If a Deployment only needs partial to be healthy to proceed upgrade, we just pass it to the workload controller to determine itself wether to report healthy in the Component instance (e.g. Deployment). I will let you define this since you have more experience on it.

Health scope requires some way to report health status. We define the "some way" here as the probe. The existing log query is one type of probe, but seems limited. That's why we try to make it more extensible and define the probe. We need to support probing status subresource, http endpoints, prom queries, etc. The probe-method/endpoint/timeout/interval are ubiquitous to support all of these.

Let me know what you think. Thanks very much!

@suhuruli
Copy link
Contributor

I will wait for @technosophos to review this before I merge this. We have some folks travelling/on leave this week. This will get looked at next Monday PST :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants