Support GPU similar to CPU #459

Open
larsbijl opened this issue Oct 2, 2019 · 9 comments
Labels
feature request New feature triaged Issue has been screened and prioritized by a project lead

Comments

@larsbijl
Contributor

larsbijl commented Oct 2, 2019

Is your feature request related to a problem? Please describe.

CPU cores are first-class citizens in Host, but GPUs are not. Could support for a GPU core count be added?

Describe the solution you'd like

We have machines with 8 GPUs in them. It would be nice to be able to specify a service with GPU cores and to have affinity to specific GPUs, similar to what taskset provides for CPUs.
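
For illustration, here is a minimal sketch of what taskset-like GPU affinity could look like on a render host. It is not the implementation discussed in this thread: `GpuPool` and `frame_environment` are hypothetical names, and the only real mechanism assumed is the `CUDA_VISIBLE_DEVICES` environment variable, which restricts a CUDA process to the listed device indices.

```python
import os


class GpuPool:
    """Hypothetical pool that reserves GPU indices, much like CPU core reservation."""

    def __init__(self, gpu_count):
        # Every GPU index on the host starts out free.
        self._free = set(range(gpu_count))

    def reserve(self, count):
        """Reserve `count` GPUs and return their indices (lowest first)."""
        if count > len(self._free):
            raise RuntimeError(
                "requested %d GPUs but only %d are free" % (count, len(self._free))
            )
        picked = sorted(self._free)[:count]
        self._free.difference_update(picked)
        return picked

    def release(self, gpus):
        """Return previously reserved GPU indices to the pool."""
        self._free.update(gpus)


def frame_environment(gpus, base_env=None):
    """Build a process environment that pins a frame to the given GPUs.

    Assumption: the frame's renderer honors CUDA_VISIBLE_DEVICES.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
    return env
```

On an 8-GPU machine, a frame that asks for two GPUs would call `GpuPool(8).reserve(2)` and launch its command with the environment returned by `frame_environment`, then release the indices when it finishes.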

@larsbijl larsbijl added the feature request New feature label Oct 2, 2019
@bcipriano bcipriano added the triaged Issue has been screened and prioritized by a project lead label Jan 24, 2020
@donalm
Contributor

donalm commented Jun 10, 2020

Hey @larsbijl how are you managing those 8-GPU nodes in production currently?

@larsbijl
Contributor Author

We implemented GPU support, modeled on the existing CPU support, in our version of the OpenCue scheduler.

@donalm
Contributor

donalm commented Jun 10, 2020

Ah - are you maintaining that as a private fork?

@larsbijl
Contributor Author

Yes. We made many changes to the UI for our facility's workflow that wouldn't be relevant to others, and they are mixed into the same repo. That said, we are open to contributing this change back when we get time to do so.

@donalm
Contributor

donalm commented Jun 10, 2020

Yeah, please consider contributing the GPU patch. If an alternative approach gets merged, you might have significant work to do to stay in sync with upstream.

@donalm
Contributor

donalm commented Aug 10, 2020

Hey @larsbijl we have some time to work on this feature now - are you interested in contributing your code at this time? Would we be able to help you to prepare it to get merged?

If not, would you like to offer us some guidance on how you approached this before we dig into it ourselves?

@larsbijl
Contributor Author

Hey @donalm, I will make time this weekend to put in an MR with our implementation.

It will be missing the migration (we never wrote one) and some of the front-end implementation, as that had diverged from our local branch.

@donalm
Contributor

donalm commented Aug 11, 2020

That'd be amazing - thanks @larsbijl !

@larsbijl
Contributor Author

A little update on this: I have the majority of it ported over.

There are some outstanding issues that I will need help with.

  1. For simplicity, I modified the initial migration to incorporate all the changes needed in the tables, functions, and triggers.
    To keep backward compatibility for users, these changes will need to be moved into a separate migration.

  2. Our cuegui and rqd have diverged too much for an easy merge. I've ported what I can, but it will likely be missing elements.

  3. We don't use Windows, and the GPU side of RQD calls nvidia-smi directly. We will want to find an OS-agnostic method.

  4. Tests. We will definitely need to write some.
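
To make item 3 concrete, here is a rough sketch of GPU detection through nvidia-smi. It is not the code from the fork: `parse_gpu_query` and `detect_gpus` are hypothetical names, and the sketch assumes the standard `--query-gpu` and `--format=csv,noheader,nounits` options of nvidia-smi. It does not solve the OS-agnostic problem; it only reports no GPUs when the tool is missing.

```python
import shutil
import subprocess

# Ask nvidia-smi for one CSV line per GPU: "<index>, <total memory in MiB>".
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_query(output):
    """Parse the CSV text produced by QUERY into a list of dicts."""
    gpus = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        index, memory_mib = (field.strip() for field in line.split(","))
        # Assumption: both fields are plain integers for every GPU.
        gpus.append({"index": int(index), "memory_mib": int(memory_mib)})
    return gpus


def detect_gpus():
    """Return the host's GPUs, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return parse_gpu_query(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be unit tested on machines without a GPU, which also helps with item 4.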


3 participants