API DESIGN REVIEW multi-gpu-ratios #9155

Closed
jinkos opened this issue Jan 22, 2018 · 17 comments

@jinkos

jinkos commented Jan 22, 2018

I am submitting the following design suggestion document...

API Design Review Document

see My Github Tutorial

Summary

A modified version of keras.utils.multi_gpu_model() that takes an extra parameter: a list of ratios denoting how the GPU load should be split. For example:

multi_gpu_model(model, gpus=[0,1], ratios=[4,3]) will spread the samples per batch roughly in the ratio 4:3 between GPU:0 and GPU:1.
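
To make the batch splitting concrete, here is a minimal sketch (illustrative only, not the proposed implementation) of how a ratios argument could be turned into per-GPU batch slices; the helper name ratios_to_bounds and the 56-sample example are hypothetical:

```python
import numpy as np

def ratios_to_bounds(batch_size, ratios):
    """Return (start, size) slices of a batch, split roughly in proportion to ratios."""
    ratios = np.asarray(ratios, dtype=np.float64)
    edges = np.round(np.cumsum(ratios) / ratios.sum() * batch_size).astype(int)
    starts = np.concatenate([[0], edges[:-1]])
    return list(zip(starts.tolist(), (edges - starts).tolist()))

# A 56-sample batch split 4:3 between GPU:0 and GPU:1:
print(ratios_to_bounds(56, [4, 3]))  # [(0, 32), (32, 24)]
```

Each (start, size) pair would then drive the per-GPU input slicing that multi_gpu_model already performs, just with unequal sizes.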

@fchollet
Collaborator

Reposting what I posted on the mailing list thread, so other people can reply to it here:

I think this would be a bit of a niche functionality. It is generally a reasonable assumption that all GPU devices on a machine have the same capabilities.

@jinkos
Author

jinkos commented Jan 24, 2018

The problem for me was that I already had one GPU and wanted to buy a second. But manufacturers keep bringing out new, faster GPUs, so the new one I bought was significantly faster than the old one.

For people who can only afford to buy one new GPU occasionally, this is quite a big deal. It's a shame to be stuck at the speed of your slowest GPU.

I think for people building their own Linux boxes on a shoestring, this must be quite a common problem.

James

@ahundt
Contributor

ahundt commented Jan 24, 2018

I think this would be a bit of a niche functionality. It is generally a reasonable assumption that all GPU devices on a machine have the same capabilities.

I believe this change is valuable for a very important reason: GPUs are very expensive, and the proposed change better supports those who cannot afford to buy several of the same GPU.

I'm a grad student, and I bought one pre-owned GPU to get started with deep learning. Several months later, once I decided it was worth further investment, I bought a different pre-owned GPU with more memory.

Update 2018-01-25: I also know several other people I've collaborated with on open source projects, both inside and outside the US, who have multiple different GPUs in their machines.

@ahundt
Contributor

ahundt commented Jan 24, 2018

@jinkos could you also consider adding a StagingArea to your changes? I believe your proposed change + a StagingArea could make it possible to get a very substantial performance boost if you have two of the same or two different GPUs.

I started such a change at master...ahundt:StagingArea but the dimensions are off and I haven't had the time to fix it.

@jinkos
Author

jinkos commented Jan 25, 2018 via email

@ahundt
Contributor

ahundt commented Jan 25, 2018

@TimZaman knows about this intimately. He gave some useful details on another pull request I made a while ago, which you can see at #6928 (comment). Since the PR is so long, the comment doesn't always show up; you may have to click "View more" twice and then search for the username TimZaman. There are TensorBoard screenshots there.

@TimZaman
Contributor

TimZaman commented Jan 26, 2018

Fixing skewed GPU ratios.

First response
Don't fix this. Make sure your GPUs are aligned.

Nuanced response
Use https://github.com/uber/horovod/tree/master/horovod to distribute Keras over multiple GPUs; this is also faster than what's in Keras itself, and easy to set up.
Then, per process (so per GPU), you give it a different batch size to compensate for the mismatched GPU muscle.
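
A minimal sketch of that setup, assuming TF 1.x with horovod.keras and one process per GPU (launched with e.g. mpirun -np 2); the per-rank batch sizes 96/72 (a 4:3 split) and the toy model are illustrative, not a recommendation:

```python
import tensorflow as tf
import horovod.keras as hvd
import keras.backend as K
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

hvd.init()

# Pin each process to its own GPU.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())
K.set_session(tf.Session(config=config))

# Give the faster GPU a larger per-process batch (hypothetical 4:3 split).
per_rank_batch = {0: 96, 1: 72}[hvd.rank()]

model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])
opt = hvd.DistributedOptimizer(SGD(lr=0.01 * hvd.size()))
model.compile(loss='categorical_crossentropy', optimizer=opt)

callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
# model.fit(x_train, y_train, batch_size=per_rank_batch, callbacks=callbacks)
```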

Optimal graph

Keras puts the user first, with a reasonable tradeoff against speed. If you deeply care about perf, use tf.keras instead (i.e. faster bias adds, batch norm ops).
Also, your datapipe should be in pure tf for optimal perf.
Provided you have an optimal graph for your model and an optimal graph for your data input, create a tf.StagingArea to connect the two. Put that area on the GPU explicitly; that means the model (running on the GPU) doesn't have to wait for CPU-GPU transfers.
What you should do here (see the sketch below):
before step 1: put 1 batch in the buffer
with every step: take 1 batch from the buffer (your model is connected to this) and put one in the buffer too. Putting something in the buffer can be done by adding **kwargs to your fit, which would then be passed on into the tensorflow_backend's Function so that the "put op" is run with each step.
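
A minimal sketch of that put-one-ahead pattern, assuming TF 1.x, with random tensors standing in for a pure-tf input pipeline; the names here are illustrative:

```python
import tensorflow as tf

# Stand-ins for a tf.data pipeline producing batches on the CPU.
next_images = tf.random_uniform([64, 224, 224, 3])
next_labels = tf.random_uniform([64], maxval=1000, dtype=tf.int32)

with tf.device('/gpu:0'):
    # The buffer lives on the GPU, so the model never waits on a CPU->GPU copy.
    area = tf.contrib.staging.StagingArea(
        dtypes=[next_images.dtype, next_labels.dtype],
        shapes=[next_images.shape, next_labels.shape])
    put_op = area.put([next_images, next_labels])
    images, labels = area.get()
    # ... build the model on images / labels here ...

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(put_op)                # before step 1: prime the buffer with one batch
    for _ in range(10):
        # each step: consume one staged batch and stage the next one
        sess.run([images, put_op])  # in practice: sess.run([train_op, put_op])
```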

@ahundt coming to GTC?

@fchollet
Collaborator

fchollet commented Jan 26, 2018

this is also faster than what's in Keras itself

What can we do to improve multi_gpu_model in Keras, especially performance for small models? This is an outstanding item in our "requests for contributions" list.

@TimZaman
Contributor

TimZaman commented Jan 26, 2018

@fchollet iirc multi_gpu_model merges when it gets to the loss function, instead of having a model-parallel loss computation. Furthermore, the distinct processes used by horovod mean you don't have to optimize [or multiprocess] your datapipe as much as in vanilla Keras, even if you have a homebrew np datapipe.

Another problem is that for multi-GPU, a single StagingArea won't work as well: ideally your StagingArea should be on the GPU, so you need one StagingArea per GPU. Since multi_gpu_model does the split for you, you cannot split anything across the GPUs before you enter multi_gpu_model. The best one could do is add a custom layer wrapping a tf.StagingArea, so that this custom layer ends up on each GPU, which might not be a bad idea at all, I realize while writing this.
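
A rough sketch of that custom-layer idea, assuming TF 1.x Keras; StagingLayer is a hypothetical name, and the put/get wiring is deliberately simplified (it stages and immediately unstages within a step, so by itself it does not give the one-batch-ahead overlap described earlier):

```python
import tensorflow as tf
from keras.engine.topology import Layer

class StagingLayer(Layer):
    """Passes its input through a tf.StagingArea placed wherever the layer runs.

    The hope is that when multi_gpu_model replicates the model under per-GPU
    device scopes, each replica ends up with its own on-GPU staging buffer.
    """

    def build(self, input_shape):
        # Assumes float32 inputs; dtype handling is simplified for this sketch.
        self.area = tf.contrib.staging.StagingArea(dtypes=[tf.float32])
        super(StagingLayer, self).build(input_shape)

    def call(self, inputs):
        put_op = self.area.put([inputs])
        # Ensure the put happens before the get in the same step (no deadlock).
        with tf.control_dependencies([put_op]):
            staged = self.area.get()[0]
        staged.set_shape(inputs.get_shape())
        return staged

    def compute_output_shape(self, input_shape):
        return input_shape
```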

@fchollet are you doing a book signing at GTC?

@ahundt
Contributor

ahundt commented Jan 26, 2018

Don't fix this. Make sure your GPUs are aligned.

By aligned do you mean the identical model?

If you deeply care about perf, use tf.keras instead (i.e. faster bias adds, batch norm ops).

I'll give it another try when TF 1.5 is released; last time I tried tf.keras it choked on import tf.keras.backend as K and I was too short on time to debug.

@ahundt coming to GTC?

It sounds great but I don't think I can get funding for it.

@TimZaman
Contributor

Don't fix this. Make sure your GPUs are aligned.
By aligned do you mean the identical model?

I mean: don't mix different gpu types in one system.

@fchollet
Collaborator

@ahundt you probably want

from tensorflow import keras
K = keras.backend

@fchollet
Collaborator

are you doing a book signing at GTC?

This was in the plans but I haven't had any update on it for a while. Maybe?

@fchollet
Collaborator

Closing since we won't implement this API change.

@jinkos
Author

jinkos commented Jan 27, 2018 via email

@ahundt
Contributor

ahundt commented Jan 29, 2018

I mean: don't mix different gpu types in one system.

Too late, but so far together they have certainly been faster than one 👍. Prices are too high for me to do anything differently at the moment, thanks bitcoin. :-)

@ozabluda
Contributor

@TimZaman
If you deeply care about perf, use tf.keras instead (i.e. faster bias adds, batch norm ops).

Why are those faster in tf.keras?
