Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NVIDIA GPU node labeller to scheduling-gpus.md #16090

Closed
wants to merge 2 commits into from
Closed

Add NVIDIA GPU node labeller to scheduling-gpus.md #16090

wants to merge 2 commits into from

Conversation

jjacobelli
Copy link

Improving the scheduling-gpus.md documentation by adding the NVIDIA GPU node labeller

@k8s-ci-robot
Copy link
Contributor

Welcome @Ethyling!

It looks like this is your first PR to kubernetes/website 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/website has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 26, 2019
@k8s-ci-robot k8s-ci-robot added language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels Aug 26, 2019
@netlify
Copy link

netlify bot commented Aug 26, 2019

Deploy preview for kubernetes-io-master-staging ready!

Built with commit 646cbfd

https://deploy-preview-16090--kubernetes-io-master-staging.netlify.com

@jjacobelli jjacobelli marked this pull request as ready for review September 3, 2019 22:01
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 3, 2019
@jjacobelli
Copy link
Author

Hey @vishh @Rajakavitha1, can we have some review on this? :)

@RenaudWasTaken
Copy link

ping on this :) !
@vishh @Rajakavitha1

@zacharysarah
Copy link
Contributor

/hold

@Ethyling We're in the middle of a larger conversation about how to handle third party content. I'm holding this discussion pending the outcome of a KEP from #15748.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 9, 2019
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 8, 2020
@sftim
Copy link
Contributor

sftim commented Jan 14, 2020

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jan 14, 2020
Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ethyling - SIG Docs hasn't yet got a clear and agreed policy on 3rd party content.

My own opinion: documentation should cover what's needed to run Kubernetes based on the main kubernetes/kubernetes repository and its dependencies. With exceptions where warranted.

Also my opinion, this is on the borderline of whether or not it's acceptable. An alternative might be to pop this table into a repo or other site that NVIDIA control, and hyperlink there.

If you're willing to revise this, here's some feedback. I hope it's useful.

Comment on lines 163 to 174
| nvidia.com/cuda.runtime.major | Integer | Major of the version of CUDA |
| nvidia.com/cuda.runtime.minor | Integer | Minor of the version of CUDA |
| nvidia.com/cuda.driver.major | Integer | Major of the version of NVIDIA driver |
| nvidia.com/cuda.driver.minor | Integer | Minor of the version of NVIDIA driver |
| nvidia.com/cuda.driver.rev | Integer | Revision of the version of NVIDIA driver |
| nvidia.com/gpu.family | String | Architecture family of the GPU |
| nvidia.com/gpu.machine | String | Machine type |
| nvidia.com/gpu.product | String | Model of the GPU |
| nvidia.com/gpu.memory | Integer | Memory of the GPU in Mb |
| nvidia.com/gpu.compute.major | Integer | Major of the compute capabilities |
| nvidia.com/gpu.compute.minor | Integer | Minor of the compute capabilities |
| nvidia.com/gfd.timestamp | Integer | Timestamp of the generated labels |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put the label names in backticks, and sort them.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed!

@RenaudWasTaken
Copy link

RenaudWasTaken commented Jan 26, 2020

Also my opinion, this is on the borderline of whether or not it's acceptable. An alternative might be to pop this table into a repo or other site that NVIDIA control, and hyperlink there.

For the alternative, this is surfaced here: https://github.com/NVIDIA/gpu-feature-discovery#labels
I'm happy to update this pull request with your suggestion, it is however not clear how I should treat the existing documentation that describes this feature (node labelling) but for AMD GPUs.

https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#node-labeller

An alternative path would be to follow the example set by kubeadm with network plugins: a content area with multiple panels (aka tabs).
With the default tab being empty / describing the feature.

@kbhawkey
Copy link
Contributor

@Ethyling ,
I plan to close this PR as it has been open for some time.
As there are outstanding changes, feel free to reopen the pull request.
/close

@k8s-ci-robot
Copy link
Contributor

@kbhawkey: Closed this PR.

In response to this:

@Ethyling ,
I plan to close this PR as it has been open for some time.
As there are outstanding changes, feel free to reopen the pull request.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RenaudWasTaken
Copy link

/reopen

Hello @kbhawkey !
We are waiting for some feedback on how to structure the PR.
If you have some feedback, I'm happy to update the PR.

cc @sftim did you have time to look at my previous comment?

Thanks!

@k8s-ci-robot k8s-ci-robot reopened this Feb 16, 2020
@k8s-ci-robot
Copy link
Contributor

@RenaudWasTaken: Reopened this PR.

In response to this:

/reopen

Hello @kbhawkey !
We are waiting for some feedback on how to structure the PR.
If you have some feedback, I'm happy to update the PR.

cc @sftim did you have time to look at my previous comment?

Thanks!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 17, 2020
@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 17, 2020
@sftim
Copy link
Contributor

sftim commented Mar 17, 2020

Does this fit with doc-policies-for-third-party-content ?

@RenaudWasTaken
Copy link

@sftim reading through the document, it seems that we fall under story #5

In PR #16766 @pouledodue proposed adding Hertzner Cloud Controller to the list of vendors that have implemented a cloud controller manager. That PR was held pending the outcome of this KEP, then later merged.

If my understanding is correct then this PR fit with the policies-for-third-party-content.

Signed-off-by: Renaud Gaubert <[email protected]>
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign steveperry-53
You can assign the PR to them by writing /assign @steveperry-53 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RenaudWasTaken
Copy link

/assign @steveperry-53

@RenaudWasTaken
Copy link

Actually sorry I should have assigned @sftim
/assign @sftim

@RenaudWasTaken
Copy link

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 20, 2020
@sftim
Copy link
Contributor

sftim commented Mar 20, 2020

To help with review, @Ethyling, can you amend the PR description with an explanation of why this fits in with the current content guide?

@sftim
Copy link
Contributor

sftim commented Apr 13, 2020

/close

Does not seem to fit in with current content guide

@k8s-ci-robot
Copy link
Contributor

@sftim: Closed this PR.

In response to this:

/close

Does not seem to fit in with current content guide

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants