
Outdated GPU monitoring info in README #1805

Closed
Tracked by #1809
eero-t opened this issue May 11, 2023 · 3 comments · Fixed by #1880

Comments

eero-t commented May 11, 2023

The monitoring README claims to provide GPU usage through cAdvisor "accelerator" metrics: https://github.com/kubeflow/training-operator/blob/master/docs/monitoring/README.md

However, those metrics have been removed from the latest cAdvisor release: https://github.com/google/cadvisor/releases

Vendors provide their own monitoring solutions for GPU metrics.
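
For anyone wanting to confirm this on their own cluster, here is a minimal sketch that scans a Prometheus text-format metrics dump for accelerator metrics. It assumes the removed metrics share the `container_accelerator_` name prefix, as in older cAdvisor releases; the function name and the sample input are hypothetical and only for illustration.

```python
# Assumption: removed cAdvisor GPU metrics share this name prefix
# (e.g. container_accelerator_duty_cycle in older releases).
ACCELERATOR_PREFIX = "container_accelerator_"


def find_accelerator_metrics(metrics_text: str) -> list[str]:
    """Return the sorted, unique accelerator metric names found in a
    Prometheus text-format dump (hypothetical helper, not a cAdvisor API)."""
    names = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or whitespace (value).
        parts = line.split("{", 1)[0].split()
        if not parts:
            continue
        if parts[0].startswith(ACCELERATOR_PREFIX):
            names.add(parts[0])
    return sorted(names)


# Made-up sample input; real data would come from the kubelet/cAdvisor
# metrics endpoint of the cluster being checked.
sample = (
    '# TYPE container_cpu_usage_seconds_total counter\n'
    'container_cpu_usage_seconds_total{id="/"} 1.5\n'
)
if not find_accelerator_metrics(sample):
    print("no accelerator metrics exposed")
```

An empty result means any dashboards or alerts built on those series would have no data, which is the situation the README should no longer promise to cover.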

@tenzen-y
Member

@eero-t Thank you for creating this!

According to the release notes, those metrics can no longer be supported on K8s v1.25+.

https://github.com/google/cadvisor/releases/tag/v0.47.0

So, we should remove those docs for the next training operator release!

ref: #1803

cc: @kubeflow/wg-training-leads

@tenzen-y
Member

/help
/good-first-issue

@tenzen-y
Member

tenzen-y commented Aug 5, 2023

/remove-help
/remove-good-first-issue
