Gpu tensorflow example #638

JoelH96 · 2019-06-20T13:57:41Z

Developed a GPU Deep MNIST TensorFlow example to run on GKE. The main sections are as follows:

Installing all CUDA and cuDNN drivers required for tensorflow-gpu 1.13.1.
Training and local testing of the Deep MNIST model (this was done on a GCE VM with a GPU).
Pushing the Deep-Mnist-GPU image to Google Container Registry.
Deploying the image using Seldon on GKE (with a GPU-configured node).

Issues involving GPU support: #590, #602, #619

ukclivecox

You seem to have replicated all the Python s2i files. Are you not able to add just a custom assemble extension?

For GKE, don't you need to add resource requirements for a GPU in the SeldonDeployment JSON?

It might also be good to look at node taints, so that only Pods requesting the GPU run on the GPU node.
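The two review points above (a GPU resource requirement and a toleration for a tainted GPU node) can be sketched as follows. This is a minimal illustration, not the manifest from this PR: the helper name `gpu_seldon_deployment`, the container name `classifier`, and the image path are hypothetical placeholders, the `v1alpha2` API version is an assumption about the Seldon Core release in use, and the toleration key assumes the GPU node carries a `nvidia.com/gpu` taint with effect `NoSchedule`. Adjust the key and effect to whatever taint is actually applied to the node.

```python
import json


def gpu_seldon_deployment(name, image, gpus=1):
    """Build a SeldonDeployment dict whose model container requests GPUs.

    All names and the apiVersion are illustrative assumptions.
    """
    container = {
        "name": "classifier",  # hypothetical container name
        "image": image,
        # Kubernetes schedules GPU workloads through resource limits.
        "resources": {"limits": {"nvidia.com/gpu": gpus}},
    }
    pod_spec = {
        "containers": [container],
        # Assumed taint on the GPU node; lets this Pod be scheduled there
        # while Pods without the toleration are kept off the node.
        "tolerations": [
            {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
        ],
    }
    return {
        "apiVersion": "machinelearning.seldon.io/v1alpha2",  # assumption
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            "name": name,
            "predictors": [
                {
                    "name": "default",
                    "replicas": 1,
                    "componentSpecs": [{"spec": pod_spec}],
                    # The graph node name must match the container name.
                    "graph": {
                        "name": container["name"],
                        "type": "MODEL",
                        "endpoint": {"type": "REST"},
                        "children": [],
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder image path; substitute the image pushed to your registry.
    manifest = gpu_seldon_deployment(
        "deep-mnist-gpu", "gcr.io/example-project/deep-mnist-gpu:0.1"
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, after checking it against the SeldonDeployment schema of the installed Seldon Core version.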

divyadilip91 · 2020-07-06T06:04:49Z

@cliveseldon @JoelH96 @ryandawsonuk Hi all,
I am currently facing an issue where the GPU is not found by my Docker container when deploying a GPU-specific Seldon deployment on Kubernetes.
I have Seldon Core 1.1.0 installed on Kubernetes. The tensorflow-gpu Python package used in my Docker image is 1.14, and the base Docker image is ufoym/deepo:pytorch-cu101-ver200129.
I receive the following errors

Do Seldon deployments with a GPU work only with tensorflow-gpu 1.13? Will they not work with 1.14?
Could you please guide me on how to set up a GPU Docker image that can be referenced from a Seldon deployment YAML for deploying on Kubernetes?
The example notebook above shows a lot of dependencies to be installed. Is there already a base image with all of these dependencies installed? I look forward to your reply.

axsaucedo · 2020-07-06T06:28:53Z

@divyadilip91 we've resolved this issue via #2048. It seems the image had a bug, so we've deprecated this TF GPU image, and we're providing a plain GPU conda image in 1.2.2, which resolves this issue (cc. @RafalSkolasinski)

RafalSkolasinski · 2020-07-06T08:42:20Z

I think Alejandro meant that in 1.3 we will provide a plain GPU image, see #1789.
Right now TensorFlow is still included there, but the idea is that one will be able to install whichever
version of TensorFlow they see fit.

@divyadilip91 It looks like you are not using our image as your base.
Could you try our image https://hub.docker.com/repository/docker/seldonio/seldon-core-s2i-python3-tf-gpu
especially the one with the latest tag, seldon-core-s2i-python3-tf-gpu:1.2.2-dev.

RafalSkolasinski · 2020-07-06T08:42:55Z

@divyadilip91 If you find an issue with our image, could you open a new issue for it please?

divyadilip91 · 2020-07-06T08:52:27Z

@axsaucedo Thanks for your immediate reply.
@RafalSkolasinski I have Seldon Core 1.1.0 installed on my Kubernetes cluster. I wrap my model using the Seldon Python class rather than s2i, and then create a Dockerfile that runs the seldon-core command, so I haven't used this as my base image. Will the seldon-core Python package 1.2 cause issues if the Seldon Core installed on Kubernetes is 1.1.0? Would this image also cause issues, given that the image mentioned above is version 1.2 and my Kubernetes Seldon is 1.1.0?
Kindly help

RafalSkolasinski · 2020-07-06T10:27:41Z

@divyadilip91 this will only be the version of the Python wrapper inside the image. It should work properly with Seldon Core 1.1 in the cluster (that number describes the version of the Seldon Core operator installed in the cluster).

You can use seldon-core-s2i-python3-tf-gpu:1.2.2-dev as the base image in your Dockerfile - you are not limited to using the s2i tool.

You can then pip install any version of seldon-core and tensorflow in your Dockerfile.
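When a container built this way still reports that no GPU is found, it can help to check whether the GPU is visible inside the container at all before looking at TensorFlow versions. The following is a small diagnostic sketch, not part of this PR: the function name `visible_gpus` is hypothetical, and it assumes the `nvidia-smi` utility is available inside the container, which depends on how the NVIDIA runtime and drivers are set up on the node.

```python
import shutil
import subprocess


def visible_gpus():
    """Return one line per GPU listed by `nvidia-smi -L`.

    Returns an empty list when nvidia-smi is missing, fails, or lists no GPUs.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The utility is not on PATH, so no driver tools are exposed here.
        return []
    try:
        result = subprocess.run(
            [exe, "-L"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # Each device is reported on a line that starts with "GPU <index>:".
    return [
        line.strip()
        for line in result.stdout.splitlines()
        if line.strip().startswith("GPU ")
    ]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus:
        print("GPUs visible in this container:")
        for gpu in gpus:
            print("  " + gpu)
    else:
        print("No GPU visible: check the Pod's GPU resource limits and node drivers.")
```

An empty result points at scheduling or driver setup (for example, a missing GPU resource limit on the Pod) rather than at the TensorFlow version installed in the image.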

divyadilip91 · 2020-07-07T03:33:22Z

OK, thank you @RafalSkolasinski. I'll use this as my base image and will get back to you if I face any issues.

JoelH96 and others added 11 commits June 6, 2019 14:07

Added Tensorflow with GPU Example

ea730cb

README.md added

368142f

Changed GPU Example formatting

8cd108c

New GPU TF README

1483924

Updated Jupyter Notebook

52fc3ef

Updated TF GPU Example Jupyter notebook

984a7c8

Removed binary files

a012c3c

Commit before closing VM

a7c4a19

tensorflow gpu 1.13.1

feb582e

GPU Example TF 1.13.1

5a95ff6

GKE Deployment added to GPU Mnist example

504f7dd

JoelH96 requested a review from axsaucedo June 20, 2019 13:57

seldondev added the size/XXL label Jun 20, 2019

Clear up of TF GPU example folder

f0135c6

JoelH96 requested review from ukclivecox, gsunner and ryandawsonuk June 21, 2019 09:30

ukclivecox suggested changes Jun 21, 2019

Added GPU resources to Seldon GPU Deployment

6833ec9

ukclivecox merged commit d917614 into master Jun 25, 2019

RafalSkolasinski deleted the gpu-tensorflow-example branch February 21, 2023 13:10
