Add Resiliency Metrics #226
500f9a4 to 16f89bc
k8sClient: k8sclient.Get(logger),
logger: logger,
k8sClient: k8sclient.Get(logger,
k8sclient.CaptureOnlyNodeLabelsInfo(true),
So this will always be true and not controlled by a flag or something?
Also, can you add a unit test for this?
This should be empty if the node is not a HyperPod node (its instance type doesn't start with ml).
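The check suggested above could look roughly like the following. This is only a sketch under stated assumptions: the health-status label key is a placeholder (the real key is not shown in this thread), the helper names are hypothetical, and the prefix is taken to be `ml.` including the dot. `node.kubernetes.io/instance-type` is the well-known Kubernetes instance-type label.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Well-known Kubernetes label that carries the node's instance type.
	instanceTypeLabel = "node.kubernetes.io/instance-type"
	// Placeholder key for the health-status label; the real key is an assumption here.
	healthStatusLabel = "example.com/node-health-status"
)

// isHyperPodNode assumes HyperPod instance types start with "ml.".
func isHyperPodNode(labels map[string]string) bool {
	return strings.HasPrefix(labels[instanceTypeLabel], "ml.")
}

// healthStatus returns the label value only for HyperPod nodes,
// and reports false for every other node even if the label is set.
func healthStatus(labels map[string]string) (string, bool) {
	if !isHyperPodNode(labels) {
		return "", false
	}
	v, ok := labels[healthStatusLabel]
	return v, ok
}

func main() {
	hyperpod := map[string]string{
		instanceTypeLabel: "ml.p4d.24xlarge",
		healthStatusLabel: "Schedulable",
	}
	regular := map[string]string{
		instanceTypeLabel: "m5.large",
		healthStatusLabel: "Schedulable",
	}
	fmt.Println(healthStatus(hyperpod))
	fmt.Println(healthStatus(regular))
}
```

With this shape, a non-HyperPod node yields an empty value regardless of which labels happen to be set on it.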
I don't have direct access to captureOnlyNodeLabelInfo from leaderElection. Also, I don't see CaptureNodeLevelInfo being tested anywhere. Any suggestions?
so this will be always true and not controlled by a flag or something?
Yes. For now this runs only on the leader agent, which collects the labels for all nodes. We might end up collecting other labels in the node-local agent as well, so we will need this option for that too.
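For readers unfamiliar with the pattern, the option passed to k8sclient.Get in the diff follows Go's functional-option idiom. The sketch below illustrates it with a hypothetical `client` type and `newClient` constructor; only the option name comes from the diff, and the real k8sclient package may be structured differently.

```go
package main

import "fmt"

// client is a hypothetical stand-in for the real k8sclient type.
type client struct {
	captureOnlyNodeLabelsInfo bool
}

// Option mutates a client while it is being constructed.
type Option func(*client)

// CaptureOnlyNodeLabelsInfo mirrors the option name in the diff;
// the body shown here is an assumption.
func CaptureOnlyNodeLabelsInfo(v bool) Option {
	return func(c *client) { c.captureOnlyNodeLabelsInfo = v }
}

// newClient applies the options in order on top of the defaults.
func newClient(opts ...Option) *client {
	c := &client{}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	leader := newClient(CaptureOnlyNodeLabelsInfo(true)) // leader agent opts in
	local := newClient()                                 // node-local agent keeps the default
	fmt.Println(leader.captureOnlyNodeLabelsInfo, local.captureOnlyNodeLabelsInfo)
}
```

Because the default is false, callers that do not pass the option are unaffected, which is why the leader can hard-code `true` without a separate flag.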
receiver/awscontainerinsightreceiver/internal/k8sapiserver/utils.go
receiver/awscontainerinsightreceiver/internal/k8sapiserver/k8sapiserver.go
receiver/awscontainerinsightreceiver/internal/k8sapiserver/k8sapiserver.go
receiver/awscontainerinsightreceiver/internal/k8sapiserver/k8sapiserver_test.go
16f89bc to 9def984
receiver/awscontainerinsightreceiver/internal/k8sapiserver/k8sapiserver.go
if sageMakerHealthStatus, ok := node.Labels[SageMakerNodeHealthStatus.String()]; ok {
info.Labels = make(map[Label]int8)
if condition, ok := k8sutil.ParseString(sageMakerHealthStatus); ok {
info.Labels[SageMakerNodeHealthStatus] = condition
It looks like this map will hold at most one label, the health status, and no other labels. Does it even have to be a map in that case? Can't this just be a string called sageMakerNodeHealthStatus as part of NodeInfo, which will be empty for non-HyperPod nodes? Callers can check for an empty string, or you can return a pointer and have them check for nil. You can probably remove captureOnlyNodeLabelInfo as well at that point.
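To make the suggestion concrete, here is a minimal sketch of the pointer variant. The `NodeInfo` shown is a hypothetical, trimmed-down struct, `newNodeInfo` is an illustrative helper, and the label key is a placeholder; none of these are taken from the PR.

```go
package main

import "fmt"

// NodeInfo is a hypothetical, trimmed-down version of the struct under review.
type NodeInfo struct {
	Name string
	// SageMakerNodeHealthStatus is nil when the node carries no health-status label.
	SageMakerNodeHealthStatus *string
}

// newNodeInfo copies the health-status label, if present, into the struct.
func newNodeInfo(name string, labels map[string]string, key string) NodeInfo {
	info := NodeInfo{Name: name}
	if v, ok := labels[key]; ok {
		info.SageMakerNodeHealthStatus = &v
	}
	return info
}

func main() {
	const key = "example.com/node-health-status" // placeholder label key
	nodes := []NodeInfo{
		newNodeInfo("node-a", map[string]string{key: "Schedulable"}, key),
		newNodeInfo("node-b", map[string]string{}, key),
	}
	for _, n := range nodes {
		if n.SageMakerNodeHealthStatus == nil {
			fmt.Println(n.Name, "has no health status")
			continue
		}
		fmt.Println(n.Name, *n.SageMakerNodeHealthStatus)
	}
}
```

The pointer distinguishes "label absent" from "label present but empty", which a plain string field cannot do.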
Calling it something generic like NodeToLabelsMap gives readers the expectation that all the labels set on the node are accessible via this map.
The map is part of an allowlist that uses an enum instead of strings to store the label keys and values. Any additional label to be stored will have to be added to the enum, so we will continue to use this pattern for now.
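A rough sketch of the enum-backed allowlist described here, for illustration only. The `Label` type and the `SageMakerNodeHealthStatus` constant follow the names in the diff; the label key, the status strings, their numeric codes, and the `parseStatus` and `collect` helpers are all assumptions standing in for the real code (including k8sutil.ParseString).

```go
package main

import "fmt"

// Label enumerates the node labels that may be captured.
type Label int8

const (
	SageMakerNodeHealthStatus Label = iota
)

// String maps the enum to its label key; the key below is a placeholder.
func (l Label) String() string {
	switch l {
	case SageMakerNodeHealthStatus:
		return "example.com/node-health-status"
	}
	return "unknown"
}

// allowlist is the full set of labels that will ever be stored.
var allowlist = []Label{SageMakerNodeHealthStatus}

// parseStatus is a hypothetical stand-in for k8sutil.ParseString;
// the status names and codes are illustrative only.
func parseStatus(s string) (int8, bool) {
	switch s {
	case "Schedulable":
		return 0, true
	case "Unschedulable":
		return 1, true
	}
	return 0, false
}

// collect keeps only allowlisted labels whose values parse successfully.
func collect(nodeLabels map[string]string) map[Label]int8 {
	out := make(map[Label]int8)
	for _, l := range allowlist {
		raw, ok := nodeLabels[l.String()]
		if !ok {
			continue
		}
		if v, ok := parseStatus(raw); ok {
			out[l] = v
		}
	}
	return out
}

func main() {
	labels := map[string]string{
		"example.com/node-health-status": "Unschedulable",
		"team":                           "ml-platform", // not allowlisted, ignored
	}
	fmt.Println(collect(labels))
}
```

Storing small integers keyed by an enum keeps the per-node footprint fixed, at the cost of a code change whenever a new label needs to be captured.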
receiver/awscontainerinsightreceiver/internal/k8sapiserver/k8sapiserver.go
9def984 to 74044e9
74044e9 to a73d727
a73d727 to 147a7ae
Description:
The HyperPod team will tag each node at the Kubernetes level with a label that describes its health status. The goal of this PR is to add a feature that extracts these label values and emits them as metrics to CloudWatch (CW).
Testing: Deployed a custom agent on a testing cluster
Documentation: N/A