API DESIGN REVIEW multi-gpu-ratios #9155

Closed
jinkos opened this issue Jan 22, 2018 · 17 comments

@jinkos

jinkos commented Jan 22, 2018

I am submitting the following design suggestion document...

API Design Review Document

see My Github Tutorial

Summary

A modified version of keras.utils.multi_gpu_model() that takes an extra parameter: a list of ratios denoting how the GPU load should be split. For example:

multi_gpu_model(model, gpus=[0,1], ratios=[4,3]) will spread the samples per batch roughly in the ratio 4:3 between GPU:0 and GPU:1.
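
To make the batch splitting concrete, here is a minimal sketch (illustrative only, not the proposed implementation) of how a ratios argument could be turned into per-GPU batch slices; the helper name ratios_to_bounds and the 56-sample example are hypothetical:

```python
import numpy as np

def ratios_to_bounds(batch_size, ratios):
    """Return (start, size) slices of a batch, split roughly in proportion to ratios."""
    ratios = np.asarray(ratios, dtype=np.float64)
    edges = np.round(np.cumsum(ratios) / ratios.sum() * batch_size).astype(int)
    starts = np.concatenate([[0], edges[:-1]])
    return list(zip(starts.tolist(), (edges - starts).tolist()))

# A 56-sample batch split 4:3 between GPU:0 and GPU:1:
print(ratios_to_bounds(56, [4, 3]))  # [(0, 32), (32, 24)]
```

Each (start, size) pair would then drive the per-GPU input slicing that multi_gpu_model already performs, just with unequal sizes.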

@fchollet
Collaborator

Reposting what I posted on the mailing list thread, so other people can reply to it here:

I think this would be a bit of a niche functionality. It is generally a reasonable assumption that all GPU devices on a machine have the same capabilities.

@jinkos
Author

jinkos commented Jan 24, 2018

The problem for me was that I already had one GPU and wanted to buy a second. But manufacturers keep bringing out new, faster GPUs, so the new one I bought was significantly faster than the old one.

For people who can only afford to buy one new GPU occasionally, this is quite a big deal. It's a shame to be stuck at the speed of your slowest GPU.

I think for people building their own Linux boxes on a shoestring, this must be quite a common problem.

James

@ahundt
Contributor

ahundt commented Jan 24, 2018

I think this would be a bit of a niche functionality. It is generally a reasonable assumption that all GPU devices on a machine have the same capabilities.

I believe this change is valuable for a very important reason: GPUs are very expensive, and the proposed change better supports those who cannot afford to buy several of the same GPU.

I'm a grad student, and I bought one pre-owned GPU to get started with deep learning. Several months later, once I decided it was worth further investment, I bought a different pre-owned GPU with more memory.

Update 2018-01-25: I also know several other people I've collaborated with on open source projects, both inside and outside the US, who have multiple different GPUs in their machines.

@ahundt
Contributor

ahundt commented Jan 24, 2018

@jinkos could you also consider adding a StagingArea to your changes? I believe your proposed change + a StagingArea could make it possible to get a very substantial performance boost if you have two of the same or two different GPUs.

I started such a change at master...ahundt:StagingArea but the dimensions are off and I haven't had the time to fix it.

@jinkos
Author

jinkos commented Jan 25, 2018 via email

@ahundt
Contributor

ahundt commented Jan 25, 2018

@TimZaman knows about this intimately. He gave some useful details on another pull request I made a while ago, which you can see at #6928 (comment). Since the PR is so long, the comment doesn't always show up; you may have to click "View more" twice and then search for the username TimZaman. There are TensorBoard screenshots there.

@TimZaman
Contributor

TimZaman commented Jan 26, 2018

Fixing skewed GPU ratios.

First response
Don't fix this. Make sure your GPUs are aligned.

Nuanced response
Use https://github.com/uber/horovod/tree/master/horovod to distribute Keras over multiple GPUs; this is also faster than what's in Keras itself, and easy to set up.
Then, per process (so per GPU), you give it a different batch size to compensate for the mismatched GPU muscle.
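
A minimal sketch of that setup, assuming TF 1.x with horovod.keras and one process per GPU (launched with e.g. mpirun -np 2); the per-rank batch sizes 96/72 (a 4:3 split) and the toy model are illustrative, not a recommendation:

```python
import tensorflow as tf
import horovod.keras as hvd
import keras.backend as K
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

hvd.init()

# Pin each process to its own GPU.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())
K.set_session(tf.Session(config=config))

# Give the faster GPU a larger per-process batch (hypothetical 4:3 split).
per_rank_batch = {0: 96, 1: 72}[hvd.rank()]

model = Sequential([Dense(10, activation='softmax', input_shape=(784,))])
opt = hvd.DistributedOptimizer(SGD(lr=0.01 * hvd.size()))
model.compile(loss='categorical_crossentropy', optimizer=opt)

callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
# model.fit(x_train, y_train, batch_size=per_rank_batch, callbacks=callbacks)
```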

Optimal graph

Keras puts the user first, with a reasonable tradeoff against speed. If you deeply care about perf, use tf.keras instead (i.e. faster bias adds, batch norm ops).
Also, your datapipe should be in pure tf for optimal perf.
Provided you have an optimal graph for your model and an optimal graph for your data input, create a tf.StagingArea to connect the two. Put that area on the GPU explicitly; that means the model (running on the GPU) doesn't have to wait for CPU-GPU transfers.
What you should do here (see the sketch below):
before step 1: put 1 batch in the buffer
with every step: take 1 batch from the buffer (your model is connected to this) and put one in the buffer too. Putting something in the buffer can be done by adding **kwargs to your fit, which would then be passed on into the tensorflow_backend's Function so that the "put op" is run with each step.
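
A minimal sketch of that put-one-ahead pattern, assuming TF 1.x, with random tensors standing in for a pure-tf input pipeline; the names here are illustrative:

```python
import tensorflow as tf

# Stand-ins for a tf.data pipeline producing batches on the CPU.
next_images = tf.random_uniform([64, 224, 224, 3])
next_labels = tf.random_uniform([64], maxval=1000, dtype=tf.int32)

with tf.device('/gpu:0'):
    # The buffer lives on the GPU, so the model never waits on a CPU->GPU copy.
    area = tf.contrib.staging.StagingArea(
        dtypes=[next_images.dtype, next_labels.dtype],
        shapes=[next_images.shape, next_labels.shape])
    put_op = area.put([next_images, next_labels])
    images, labels = area.get()
    # ... build the model on images / labels here ...

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(put_op)                # before step 1: prime the buffer with one batch
    for _ in range(10):
        # each step: consume one staged batch and stage the next one
        sess.run([images, put_op])  # in practice: sess.run([train_op, put_op])
```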

@ahundt coming to GTC?

@fchollet
Collaborator

fchollet commented Jan 26, 2018

this is also faster than what's in Keras itself

What can we do to improve multi_gpu_model in Keras, especially performance for small models? This is an outstanding item in our "requests for contributions" list.

@TimZaman
Contributor

TimZaman commented Jan 26, 2018

@fchollet iirc multi_gpu_model merges when it gets to the loss function, instead of having a model-parallel loss computation. Furthermore, the distinct processes used by horovod mean you don't have to optimize [or multiprocess] your datapipe as much as in vanilla Keras, even if you have a homebrew np datapipe.

Another problem is that for multi-GPU, a single StagingArea won't work as well: ideally your StagingArea should be on the GPU, so you need one StagingArea per GPU. Since multi_gpu_model does the split for you, you cannot split anything across the GPUs before you enter multi_gpu_model. The best one could do is add a custom layer wrapping a tf.StagingArea, so that this custom layer ends up on each GPU, which might not be a bad idea at all, I realize while writing this.
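
A rough sketch of that custom-layer idea, assuming TF 1.x Keras; StagingLayer is a hypothetical name, and the put/get wiring is deliberately simplified (it stages and immediately unstages within a step, so by itself it does not give the one-batch-ahead overlap described earlier):

```python
import tensorflow as tf
from keras.engine.topology import Layer

class StagingLayer(Layer):
    """Passes its input through a tf.StagingArea placed wherever the layer runs.

    The hope is that when multi_gpu_model replicates the model under per-GPU
    device scopes, each replica ends up with its own on-GPU staging buffer.
    """

    def build(self, input_shape):
        # Assumes float32 inputs; dtype handling is simplified for this sketch.
        self.area = tf.contrib.staging.StagingArea(dtypes=[tf.float32])
        super(StagingLayer, self).build(input_shape)

    def call(self, inputs):
        put_op = self.area.put([inputs])
        # Ensure the put happens before the get in the same step (no deadlock).
        with tf.control_dependencies([put_op]):
            staged = self.area.get()[0]
        staged.set_shape(inputs.get_shape())
        return staged

    def compute_output_shape(self, input_shape):
        return input_shape
```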

@fchollet are you doing a book signing at GTC?

@ahundt
Contributor

ahundt commented Jan 26, 2018

Don't fix this. Make sure your GPUs are aligned.

By aligned do you mean the identical model?

If you deeply care about perf, use tf.keras instead (i.e. faster bias adds, batch norm ops).

I'll give it another try when TF 1.5 is released; last time I tried tf.keras it choked on import tf.keras.backend as K and I was too short on time to debug.

@ahundt coming to GTC?

It sounds great but I don't think I can get funding for it.

@TimZaman
Contributor

Don't fix this. Make sure your GPUs are aligned.
By aligned do you mean the identical model?

I mean: don't mix different gpu types in one system.

@fchollet
Collaborator

@ahundt you probably want

from tensorflow import keras
K = keras.backend

@fchollet
Collaborator

are you doing a book signing at GTC?

This was in the plans but I haven't had any update on it for a while. Maybe?

@fchollet
Collaborator

Closing since we won't implement this API change.

@jinkos
Author

jinkos commented Jan 27, 2018 via email

@ahundt
Contributor

ahundt commented Jan 29, 2018

I mean: don't mix different gpu types in one system.

Too late, but so far together they have certainly been faster than one 👍. Prices are too high for me to do anything differently at the moment, thanks bitcoin. :-)

@ozabluda
Contributor

@TimZaman
If you deeply care about perf, use tf.keras instead (i.e. faster bias adds, batch norm ops).

Why are those faster in tf.keras?
