-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-2371: clarify plan for cAdvisor and target at Beta for 1.24 #3003
Conversation
@@ -225,7 +225,7 @@ As such, this proposal will be broken up into what needs to be done for each of | |||
2. Add CRI Pod Level Stats message to CRI protobuf that includes all [Pod Level Stats](#summary-pod-stats-object) metrics from Summary API. | |||
3. Add support for the new CRI additions in supported container runtimes (CRI-O and containerd). | |||
4. Switch Kubelet's CRI stats provider from querying container and pod level stats from cAdvisor to newly added CRI pod and container level stats | |||
5. cAdvisor should stop collecting container and pod level stats. If any other components need container or pod level stats from cAdvisor, the CRI implementation should be queried instead. | |||
5. cAdvisor should be updated to support no longer collecting stats that are duplicated with CRI implementation. Any client that requires the metrics that are reported by the CRI should gather them from the CRI instead of cAdvisor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this somewhat go against the eventual goal of removing /metrics/cadvisor
/externalizing metrics, at least unless we add a step 6 equivalent to the existing step 5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm yeah we're going to have to figure out how to merge the two. I believe there are metrics that cadvisor provides that are out of scope for CRI that should be provided by cadvisor, but there are a set that should no longer be provided by cadvisor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thoughts @bobbypage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, my thinking is that the high level goal is to provide ability for clients to obtain metrics from CRI (i.e. summary api) rather than cAdvisor.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
Signed-off-by: Peter Hunt <[email protected]>
Some pieces that were initially slated for Alpha have been pushed to Beta Signed-off-by: Peter Hunt <[email protected]>
5ed8a28
to
1fee62c
Compare
I've taken over my own PR and updated it to cover its status movign to beta in 1.24 |
1fee62c
to
36247ad
Compare
Signed-off-by: Peter Hunt <[email protected]>
Signed-off-by: Peter Hunt <[email protected]>
36247ad
to
33d0afd
Compare
/cc @bobbypage |
- CRI implementations should report any difficulties collecting the stats, and by Beta the CRI stats implementation should perform better than they did with CRI+cAdvisor. | ||
- CRI implementations should support their equivalent of `/metrics/cadvisor`, passing the performance and conformance suite created in Alpha. | ||
- Conduct research to find the set of metrics from `/metrics/cadvisor` that compliant CRI implementations must expose. | ||
- Conformance tests for the fields in `/metrics/cadvisor` should be created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full k8s conformance tests will be significant effort, perhaps instead we should just ensure that the fields we want to support have overlap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how will we continually verify that without testing? especially as time goes on. If we consider these fields a sort of API of kube, then we should probably do something to verify they're there
- CRI implementations should support their equivalent of `/metrics/cadvisor`, passing the performance and conformance suite created in Alpha. | ||
- Conduct research to find the set of metrics from `/metrics/cadvisor` that compliant CRI implementations must expose. | ||
- Conformance tests for the fields in `/metrics/cadvisor` should be created. | ||
- Performance tests for CPU/memory usage between Kubelet/cAdvisor/CRI implementation should be added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think k8s already has OSS perf tests, perhaps we should rephrase this to say "Validate performance impact of this feature"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated!
Kubelet should fail early if the CRI implementation won't report the expected metrics. | ||
- For Beta/GA releases, components that rely on `/metrics/cadvisor` should take the decided action (use `/stats/summary`, or use the Kubelet provided `/metrics/cadvisor`). | ||
Kubelet should fallback to use cAdvisor if the CRI implementation won't report the expected metrics. | ||
- For Beta/GA releases, components that rely on `/metrics/cadvisor` should take the decided action (use `/stats/summary`, or use the CRI provided `/metrics/cadvisor`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rename, CRI provided -> CRI provided endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reworded
@@ -651,12 +668,17 @@ _This section must be completed when targeting beta graduation to a release._ | |||
checking if there are objects with field X set) may be a last resort. Avoid | |||
logs or events for this purpose. | |||
|
|||
The source of `/metrics/cadvisor` is the CRI implementation, not cAdvisor. Further, if the CRI implementation was using the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe avoid providing name for endpoint directly...
The source of the "new" prometheus endpoint will be CRI ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworeded a bit
@@ -669,6 +691,9 @@ the health of the service?** | |||
job creation time) for cron job <= 10% | |||
- 99,9% of /health requests per day finish with 200 code | |||
|
|||
- 100% coverage of metrics reported by cAdvisor reported by CRI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some of the metrics we said may not be needed; so promising 100% coverage may be a bit too optimistic :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright I dropped it, though I think we may need to provide close to it..
011b3da
to
2ae4a99
Compare
updated with the feedback! |
Thank you for updating! /lgtm |
/remove-lifecycle stale |
/uncc |
Signed-off-by: Peter Hunt <[email protected]>
2ae4a99
to
ac9c25d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
for PRR
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for updates.
the <2% metric should be qualified a little further when we do testing for particular density of pods.
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: derekwaynecarr, ehashman, haircommander The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Peter Hunt [email protected]
addressing comments in Add KEP for cAdvisor-less, CRI-full Container and Pod Stats #2364 (comment)