Enable PCI passthrough on nerc-ocp-test cluster #563

Merged Nov 5, 2024 (3 commits)

@naved001 naved001 commented Oct 8, 2024

This PR does 3 things:

  • Enable IOMMU
  • Bind the device (the GPU, a V100) to the vfio driver
  • Permit device passthrough in the KubeVirt resource
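
For reference, the three steps above roughly correspond to the manifests below. This is a minimal sketch modeled on the OpenShift Virtualization PCI passthrough documentation linked in this PR, not the actual files changed here. The object names, the `worker` role label, the Butane version, the `intel_iommu=on` argument (which assumes an Intel host), the `10de:1db6` vendor:device ID, and the `nvidia.com/GV100GL_Tesla_V100` resource name are all assumptions; the real ID should be checked with `lspci -nn` on the node.

```yaml
# 1. Enable IOMMU through a kernel argument (assumes an Intel CPU).
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 100-worker-iommu            # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - intel_iommu=on
---
# 2. Bind the GPU to vfio-pci. This is a Butane config, which has to be
#    transpiled into a MachineConfig before it can be applied.
variant: openshift
version: 4.15.0
metadata:
  name: 100-worker-vfiopci          # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
    - path: /etc/modprobe.d/vfio.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          options vfio-pci ids=10de:1db6
    - path: /etc/modules-load.d/vfio-pci.conf
      mode: 0644
      overwrite: true
      contents:
        inline: vfio-pci
---
# 3. Permit the device for passthrough in the HyperConverged resource.
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  permittedHostDevices:
    pciHostDevices:
      - pciDeviceSelector: "10DE:1DB6"               # assumed V100 vendor:device ID
        resourceName: "nvidia.com/GV100GL_Tesla_V100" # assumed resource name
```

Because `pciDeviceSelector` matches on vendor and device ID rather than on a PCI address, every matching device on a node is selected by the same entry.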

Before merging, I need to apply a label to the node, which feels like it would be a bunch of work to do with ArgoCD:

oc label node wrk-3 nvidia.com/gpu.deploy.operands=false 

It would be interesting to see how this would work with the A100 systems, considering that each host has multiple identical GPUs, all with the same vendor and device PCI IDs.

I am going to mark this PR as a draft because the GPU is in use for other testing.

https://docs.openshift.com/container-platform/4.15/virt/virtual_machines/advanced_vm_management/virt-configuring-pci-passthrough.html

@naved001 naved001 requested review from jtriley and computate October 8, 2024 17:59
@naved001 naved001 force-pushed the ocp-test/pci-passthrough branch from e8939fb to 36e8e98 Compare October 8, 2024 18:16
@naved001 naved001 requested review from larsks and schwesig October 8, 2024 19:03

@schwesig schwesig left a comment

Besides waiting for free GPUs and the manual labels, 👍 LGTM.

naved001 commented Nov 4, 2024

@computate This is ready for review. Is it okay if I apply the label to wrk-3? Doing so will kick some NVIDIA pods off that node. (cc @dystewart)

@computate

Fine with me, @naved001, nice work here!

@larsks larsks left a comment

That all looks reasonable.

@naved001 naved001 merged commit b47eaec into OCP-on-NERC:main Nov 5, 2024
2 checks passed
naved001 added a commit to naved001/nerc-ocp-config that referenced this pull request Nov 13, 2024
Undo all the changes we did since this PR OCP-on-NERC#563