
Gpu tensorflow example #638

Merged: 13 commits, merged on Jun 25, 2019

@JoelH96 (Contributor) commented Jun 20, 2019:

Developed a GPU Deep MNIST TensorFlow example that runs on GKE. The main sections are:

  1. Installing the CUDA and cuDNN libraries required by tensorflow-gpu 1.13.1.
  2. Training and local testing of the Deep MNIST model (done on a GCE VM with a GPU).
  3. Pushing the Deep-Mnist-GPU image to Google Container Registry.
  4. Deploying the image with Seldon on GKE (on a node configured with a GPU).
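For the local-testing step, a container built with the Seldon Python wrapper can be exercised by POSTing a Seldon-protocol JSON payload (`{"data": {"ndarray": [...]}}`). The sketch below builds such a payload for one flattened 28x28 MNIST image and prepares a `urllib` request without sending it. The URL `http://localhost:5000/predict` is an assumption: the port, path, and accepted body format depend on the wrapper version and on how the container was started.

```python
import json
import urllib.request

# Hypothetical local endpoint: port and path depend on the wrapper version
# and on how the container was started (e.g. `docker run -p 5000:5000 ...`).
LOCAL_URL = "http://localhost:5000/predict"


def mnist_payload(pixels):
    """Wrap one flattened 28x28 MNIST image in a Seldon-protocol ndarray payload."""
    if len(pixels) != 28 * 28:
        raise ValueError(f"expected 784 pixel values, got {len(pixels)}")
    # One row per instance: a batch containing a single image.
    return {"data": {"ndarray": [[float(p) for p in pixels]]}}


def build_request(payload, url=LOCAL_URL):
    """Prepare (but do not send) a JSON POST request for the model server."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    blank_image = [0.0] * 784
    req = build_request(mnist_payload(blank_image))
    print(req.get_method(), req.full_url, len(req.data), "bytes")
    # To actually send it against a running container:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read()))
```

Keeping payload construction separate from sending makes it easy to check the request shape before a container is running.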

Related GPU-support issues: #590, #602, #619

@JoelH96 requested a review from @axsaucedo on June 20, 2019 13:57
@ukclivecox (Contributor) left a comment:

You seem to have replicated all of the Python s2i files. Couldn't you just add a custom assemble extension instead?

For GKE, don't you need to add GPU resource requirements to the SeldonDeployment JSON?

Also, it might be good to look at node taints so that only Pods requesting a GPU are scheduled on the GPU node.
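Both review points can be illustrated with one manifest. The sketch below builds a SeldonDeployment as a Python dict whose model container requests GPUs through the `nvidia.com/gpu` resource limit and tolerates a `nvidia.com/gpu` `NoSchedule` taint. It assumes Seldon Core 1.x (`machinelearning.seldon.io/v1`; 0.x releases used `v1alpha2`) and that GPU nodes carry such a taint (GKE's GPU node pools are documented to use one and may add the matching toleration automatically; check your cluster). The names `deep-mnist-gpu`, `classifier`, and the image path `gcr.io/my-project/deep-mnist-gpu:0.1` are hypothetical.

```python
import json

# Assumes Seldon Core 1.x (apiVersion machinelearning.seldon.io/v1);
# 0.x releases used machinelearning.seldon.io/v1alpha2.
API_VERSION = "machinelearning.seldon.io/v1"


def gpu_seldon_deployment(name, image, gpus=1):
    """Build a SeldonDeployment manifest whose model container requests GPUs."""
    container = {
        "name": "classifier",
        "image": image,
        # GPUs are an extended resource: request them under limits.
        "resources": {"limits": {"nvidia.com/gpu": gpus}},
    }
    pod_spec = {
        "containers": [container],
        # Lets the pod schedule onto nodes tainted nvidia.com/gpu:NoSchedule;
        # pods without this toleration are kept off those nodes.
        "tolerations": [
            {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
        ],
    }
    return {
        "apiVersion": API_VERSION,
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            "predictors": [
                {
                    "name": "default",
                    "replicas": 1,
                    "componentSpecs": [{"spec": pod_spec}],
                    # The graph node name must match the container name.
                    "graph": {"name": "classifier", "type": "MODEL", "children": []},
                }
            ]
        },
    }


if __name__ == "__main__":
    manifest = gpu_seldon_deployment(
        "deep-mnist-gpu", "gcr.io/my-project/deep-mnist-gpu:0.1"
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`. Tainting the GPU nodes is what keeps non-GPU pods away; the toleration only permits this pod to land there, and the GPU limit is what makes the device available inside the container.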

@ukclivecox merged commit d917614 into master on Jun 25, 2019
@divyadilip91 commented:
Hi all (@cliveseldon @JoelH96 @ryandawsonuk),
I am currently facing an issue where my Docker container cannot find the GPU when I deploy a GPU-specific Seldon deployment on Kubernetes.
I have Seldon Core 1.1.0 installed on Kubernetes. My Docker image uses the tensorflow-gpu 1.14 Python package, and its base image is ufoym/deepo:pytorch-cu101-ver200129.
I receive the following errors:
[Four screenshots of the error output; the images were not preserved.]

Do Seldon deployments with a GPU work only with tensorflow-gpu 1.13? Will they not work with 1.14?
Could you please guide me on how to set up a GPU Docker image that a SeldonDeployment YAML can use for deployment on Kubernetes?
The example notebook above installs a lot of dependencies. Is there a base image that already has all of them installed? Looking forward to your reply.
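One possible cause worth ruling out, offered as an assumption rather than a diagnosis: prebuilt tensorflow-gpu wheels are built against a specific CUDA release, and TensorFlow's published tested build configurations list CUDA 10.0 / cuDNN 7.4 for both 1.13 and 1.14, whereas the `cu101` in the base image tag presumably means CUDA 10.1. The sketch below only compares those version strings; `TESTED_CUDA` is a partial copy of that table, and parsing `cuXYZ` as version X.Y (or `cu92` as 9.2) is an assumed tag naming convention. It does not inspect the image itself.

```python
import re

# CUDA / cuDNN versions from TensorFlow's published "tested build configurations"
# for Linux GPU wheels (partial copy; verify against the current table).
TESTED_CUDA = {
    "1.13": ("10.0", "7.4"),
    "1.14": ("10.0", "7.4"),
    "1.15": ("10.0", "7.4"),
    "2.1": ("10.1", "7.6"),
}


def cuda_from_tag(tag):
    """Guess the CUDA version from a tag like 'pytorch-cu101' (assumed cuXYZ -> X.Y convention)."""
    m = re.search(r"cu(\d+)(\d)(?!\d)", tag)
    return f"{m.group(1)}.{m.group(2)}" if m else None


def check_cuda(tf_version, image_tag):
    """Return (matches, expected_cuda, found_cuda); matches is None if either side is unknown."""
    major_minor = ".".join(tf_version.split(".")[:2])
    expected = TESTED_CUDA.get(major_minor)
    expected_cuda = expected[0] if expected else None
    found = cuda_from_tag(image_tag)
    if expected_cuda is None or found is None:
        return None, expected_cuda, found
    return expected_cuda == found, expected_cuda, found


if __name__ == "__main__":
    print(check_cuda("1.14.0", "ufoym/deepo:pytorch-cu101-ver200129"))
    # -> (False, '10.0', '10.1')
```

A mismatch here would be consistent with TensorFlow failing to load its CUDA libraries, but the real test is running the import inside the container.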

@axsaucedo (Contributor) commented:

@divyadilip91 we've resolved this issue via #2048. It seems the image had a bug, so we've deprecated this TF GPU image, and in 1.2.2 we're providing a plain GPU conda image which resolves this issue (cc. @RafalSkolasinski).

@RafalSkolasinski (Contributor) commented:

I think Alejandro meant that in 1.3 we will provide a plain GPU image; see #1789.
Right now TensorFlow is still included in it, but the idea is that you will be able to install whichever
version of TensorFlow you need.

@divyadilip91 It looks like you are not using our image as your base.
Could you try our image https://hub.docker.com/repository/docker/seldonio/seldon-core-s2i-python3-tf-gpu
and in particular the one with the latest tag, seldon-core-s2i-python3-tf-gpu:1.2.2-dev?

@RafalSkolasinski (Contributor) commented:

@divyadilip91 If you find an issue with our image, could you please open a new issue for it?

@divyadilip91 commented Jul 6, 2020:

@axsaucedo Thanks for your immediate reply.
@RafalSkolasinski I have Seldon Core 1.1.0 installed on my Kubernetes cluster. I wrap my model with the Seldon Python class rather than s2i, and then build it with a Dockerfile that runs the Seldon Core command, so I haven't used this as my base image. Will the seldon-core 1.2 Python package cause issues if the Seldon Core installed on Kubernetes is 1.1.0? In particular, would this image cause problems, since it is version 1.2 and the Seldon Core on my cluster is 1.1.0?
Kindly help.

@RafalSkolasinski (Contributor) commented:

@divyadilip91 the 1.2 version applies only to the Python wrapper inside the image. It should work properly with Seldon Core 1.1 in the cluster (that version refers to the Seldon Core operator installed in the cluster).

You can use seldon-core-s2i-python3-tf-gpu:1.2.2-dev as the base image in your Dockerfile; you are not limited to using the s2i tool.

You can then also pip install any version of seldon-core and tensorflow in your Dockerfile.
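Once an image is built from that base with whichever seldon-core and tensorflow versions you pip install, a quick check run inside the container (for example via `kubectl exec` into the model pod) can distinguish "TensorFlow cannot load its CUDA libraries" from "TensorFlow loads but sees no GPU". This is a minimal sketch; it relies on `tensorflow.python.client.device_lib`, an internal module that is widely used and present in both TF 1.x and 2.x, and the report fields are hypothetical names chosen for illustration.

```python
def gpu_report():
    """Report whether TensorFlow imports and which GPU devices it can see."""
    try:
        # device_lib is an internal TensorFlow module, but it exists in both 1.x and 2.x.
        from tensorflow.python.client import device_lib
    except ImportError as exc:
        # Raised when TensorFlow is absent, and typically also when a GPU build
        # cannot load the CUDA/cuDNN libraries it was compiled against.
        return {"tensorflow_importable": False, "error": str(exc), "gpus": []}
    gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]
    return {"tensorflow_importable": True, "error": None, "gpus": gpus}


if __name__ == "__main__":
    report = gpu_report()
    if not report["tensorflow_importable"]:
        print("TensorFlow failed to import:", report["error"])
    elif not report["gpus"]:
        print("TensorFlow imported but sees no GPU")
    else:
        print("GPUs visible to TensorFlow:", report["gpus"])
```

If TensorFlow imports but lists no GPU, it is worth confirming that the pod actually requests an `nvidia.com/gpu` resource and was scheduled on a GPU node.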

@divyadilip91 commented:

OK, thank you @RafalSkolasinski. I'll use this as my base image and will get back to you if I face any issues.

@RafalSkolasinski deleted the gpu-tensorflow-example branch on February 21, 2023 13:10
6 participants