filter pods by node name and improve oldest pod selection logic #37

Open
wants to merge 2 commits into
base: main


cybergeek2077
Contributor

The device plugin allocates a device by selecting the oldest pod, but it does not filter pods by node. This causes a bug when pods on different nodes carry the same assignment time from Volcano (e.g., under gang scheduling): the plugin may pick a pod that belongs to another node and so assign that other node's GPU.

This PR fixes that bug by filtering pods by node name. It also improves the oldest-pod selection logic by considering only pods that are pending and carry the right Volcano annotations.

Reference implementation in HAMi: Project-HAMi/HAMi#340
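
The failure mode described above can be reproduced with a minimal sketch. It is not the plugin's code: the `pod` struct, the `AssignedAt` field, and the helper names `oldest` and `onNode` are hypothetical stand-ins for the relevant parts of `v1.Pod` and the scheduler's time annotation.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for the fields of v1.Pod that matter here.
type pod struct {
	Name       string
	NodeName   string
	AssignedAt int64 // assumed: assignment time written by the scheduler
}

// oldest returns the pod with the smallest AssignedAt.
// On a tie, the pod seen first wins, which is where the bug comes from.
func oldest(pods []pod) (pod, bool) {
	if len(pods) == 0 {
		return pod{}, false
	}
	best := pods[0]
	for _, p := range pods[1:] {
		if p.AssignedAt < best.AssignedAt {
			best = p
		}
	}
	return best, true
}

// onNode keeps only the pods scheduled to nodeName.
func onNode(pods []pod, nodeName string) []pod {
	var out []pod
	for _, p := range pods {
		if p.NodeName == nodeName {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// Two pods of one gang-scheduled job, same assignment time, different nodes.
	pods := []pod{
		{Name: "job-0", NodeName: "node-a", AssignedAt: 100},
		{Name: "job-1", NodeName: "node-b", AssignedAt: 100},
	}

	// The plugin instance running on node-b:
	unfiltered, _ := oldest(pods)                 // picks job-0, which lives on node-a
	filtered, _ := oldest(onNode(pods, "node-b")) // picks job-1, as intended
	fmt.Println(unfiltered.Name, filtered.Name)
}
```

Without the node filter, the plugin on `node-b` resolves the tie in favour of `job-0`, a pod it does not host. With the filter, only `job-1` is a candidate.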

@cybergeek2077
Contributor Author

@archlitchi The build failed with an error saying there's no space left on the device, but I can't find a way to restart it.

@archlitchi
Contributor

hi, could you resolve this conflict? you can leave the rest to me

@cybergeek2077
Contributor Author

I have resolved the conflict and also added a filter on AssignedNodeAnnotations to my PR.
Below is a brief summary of the PR.

This pull request changes pkg/plugin/vgpu/util/util.go to tighten how pods are filtered and selected. The most important changes are that the GetPendingPod function now filters pods for a specific node, and the getOldestPod function now selects the oldest pending pod that carries specific annotations.

Improvements to pod filtering and selection:

  • pkg/plugin/vgpu/util/util.go: Modified the GetPendingPod function to filter pods for the specified node using a FieldSelector in ListOptions.
  • pkg/plugin/vgpu/util/util.go: Updated the getOldestPod function to select the oldest pending pod with specific annotations, ensuring that only pods in the v1.PodPending phase and with the correct DeviceBindPhase and AssignedNodeAnnotations are considered.
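
The two changes listed above can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: the `pod` struct stands in for `v1.Pod`, the annotation keys are placeholders (the real `DeviceBindPhase` and `AssignedNodeAnnotations` constants are defined in the plugin), and the expected bind-phase value `allocating` and the time annotation are assumed for the example. The selector string `spec.nodeName=<node>` is the form accepted by the `FieldSelector` field of `ListOptions` when listing pods.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	phasePending = "Pending" // string value of v1.PodPending

	// Placeholder keys and values; the plugin defines its own constants.
	bindPhaseKey    = "example.com/bind-phase"    // stands in for DeviceBindPhase
	bindAllocating  = "allocating"                // assumed expected phase value
	assignedNodeKey = "example.com/assigned-node" // stands in for AssignedNodeAnnotations
	assignedTimeKey = "example.com/assigned-time" // assumed time annotation
)

// pod is a hypothetical stand-in for the parts of v1.Pod used below.
type pod struct {
	Name        string
	Phase       string
	Annotations map[string]string
}

// nodeFieldSelector builds the string to put in ListOptions.FieldSelector
// so that the API server returns only pods bound to nodeName.
func nodeFieldSelector(nodeName string) string {
	return "spec.nodeName=" + nodeName
}

// oldestPendingPod returns the pending pod with the earliest assigned time
// among those whose annotations mark them as allocating on nodeName.
func oldestPendingPod(pods []pod, nodeName string) (pod, bool) {
	var (
		best     pod
		bestTime uint64
		found    bool
	)
	for _, p := range pods {
		if p.Phase != phasePending {
			continue
		}
		if p.Annotations[bindPhaseKey] != bindAllocating {
			continue
		}
		if p.Annotations[assignedNodeKey] != nodeName {
			continue
		}
		t, err := strconv.ParseUint(p.Annotations[assignedTimeKey], 10, 64)
		if err != nil {
			continue // skip pods with a missing or malformed time annotation
		}
		if !found || t < bestTime {
			best, bestTime, found = p, t, true
		}
	}
	return best, found
}

func main() {
	ann := func(node, ts string) map[string]string {
		return map[string]string{
			bindPhaseKey:    bindAllocating,
			assignedNodeKey: node,
			assignedTimeKey: ts,
		}
	}
	pods := []pod{
		{Name: "running", Phase: "Running", Annotations: ann("node-b", "50")},
		{Name: "other-node", Phase: phasePending, Annotations: ann("node-a", "60")},
		{Name: "newer", Phase: phasePending, Annotations: ann("node-b", "200")},
		{Name: "older", Phase: phasePending, Annotations: ann("node-b", "100")},
	}
	fmt.Println(nodeFieldSelector("node-b"))
	if p, ok := oldestPendingPod(pods, "node-b"); ok {
		fmt.Println(p.Name)
	}
}
```

Filtering on the server side with a field selector keeps the list small, while the annotation check in the selection loop guards against pods that were listed but are not in the expected allocation state for this node.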
