Support GPU similar to CPU #459

larsbijl · 2019-10-02T20:56:50Z

Is your feature request related to a problem? Please describe.

CPU cores are first-class citizens in Host, but GPUs are not. Could we add support for a GPU core count?

Describe the solution you'd like

We have machines with 8 GPUs in them. It would be nice to be able to specify a service with GPU cores and to have affinity to the GPUs, similar to taskset.
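To illustrate the kind of taskset-like affinity meant here, below is a minimal sketch, not the OpenCue implementation. It assumes NVIDIA GPUs and relies on the `CUDA_VISIBLE_DEVICES` environment variable to restrict a process to specific devices. The `GpuPool` class and `frame_env` function are hypothetical names made up for this example.

```python
import os


class GpuPool:
    """Tracks which GPU indices on a host are free (hypothetical helper)."""

    def __init__(self, total_gpus):
        self.free = set(range(total_gpus))

    def reserve(self, count):
        """Reserve `count` GPUs and return their indices, lowest first."""
        if count > len(self.free):
            raise RuntimeError("not enough free GPUs on this host")
        chosen = sorted(self.free)[:count]
        self.free.difference_update(chosen)
        return chosen

    def release(self, indices):
        """Return previously reserved GPU indices to the pool."""
        self.free.update(indices)


def frame_env(gpu_indices, base_env=None):
    """Build an environment that pins a child process to the given GPUs.

    Assumption: the workload honors CUDA_VISIBLE_DEVICES (CUDA applications do).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_indices)
    return env
```

A launcher could call `pool.reserve(2)`, pass `frame_env(...)` as the `env` argument of `subprocess.Popen`, and call `pool.release(...)` once the process exits.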


donalm · 2020-06-10T12:24:01Z

Hey @larsbijl how are you managing those 8-GPU nodes in production currently?

larsbijl · 2020-06-10T12:38:09Z

We implemented GPU support similar to CPU in our version of the OpenCue scheduler.

donalm · 2020-06-10T12:52:10Z

Ah - are you maintaining that as a private fork?

larsbijl · 2020-06-10T12:59:19Z

Yes. We made many changes to the UI for our facilities workflow that wouldn't be relevant to others, and they are mixed into the same repo. That said, we are open to contributing this change back when we get time to do so.

donalm · 2020-06-10T13:17:01Z

Yeah please consider contributing the GPU patch. If an alternative approach gets merged you might have significant work to do to stay in sync with upstream.

donalm · 2020-08-10T17:06:47Z

Hey @larsbijl we have some time to work on this feature now - are you interested in contributing your code at this time? Would we be able to help you to prepare it to get merged?

If not, would you like to offer us some guidance on how you approached this before we dig into it ourselves?

larsbijl · 2020-08-11T12:44:39Z

Hey @donalm, I will make time this weekend to put in a MR with our implementation.

It will be missing the migration (we never wrote one) and some of the front-end implementation, as that had diverged from our local branch.

donalm · 2020-08-11T14:02:17Z

That'd be amazing - thanks @larsbijl !

larsbijl · 2020-08-16T10:48:08Z

A little update on this. I have the majority of this ported over.

Some outstanding issues that I will need some help with:

- For simplicity, I modified the initial migration to incorporate all the changes needed in the tables, functions, and triggers. To keep backward compatibility for users, this will need to be made into a separate migration.
- Our cuegui and rqd have diverged too much for an easy merge. I've ported what I can, but it will likely be missing elements.
- We don't use Windows, and the GPU side of RQD uses nvidia-smi directly. We will want to find an OS-agnostic method.
- Tests. We will definitely need to write some tests.
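For context on the nvidia-smi point, here is a rough sketch of how GPU detection via that tool might look, with a fallback to "no GPUs" when the binary is missing. It is not the code from our branch. The function names are hypothetical, and it only covers NVIDIA hardware, so it does not solve the OS-agnostic question.

```python
import shutil
import subprocess

# Real nvidia-smi query flags; memory.total is reported in MiB with nounits.
QUERY = ["nvidia-smi", "--query-gpu=index,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_query(output):
    """Parse 'index, memory' CSV lines into a list of (index, memory_mib)."""
    gpus = []
    for line in output.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        try:
            gpus.append((int(fields[0]), int(fields[1])))
        except ValueError:
            continue  # skip values such as "[N/A]"
    return gpus


def detect_gpus():
    """Return detected GPUs, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True,
                                check=True, timeout=10)
    except (subprocess.SubprocessError, OSError):
        return []
    return parse_gpu_query(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be unit-tested on machines without a GPU, which also helps with the missing tests.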

larsbijl added the feature request New feature label Oct 2, 2019

larsbijl mentioned this issue Oct 2, 2019

Rename gpu, idle_gpu. total_gpu, free_gpu #460

Open

bcipriano added the triaged Issue has been screened and prioritized by a project lead label Jan 24, 2020

larsbijl mentioned this issue Aug 16, 2020

feat: Add multiple GPU support #760

Closed
