Single node GPU training example (#333)
* Single node GPU training example

Signed-off-by: Ketan Umare <[email protected]>

* Minor fix related to tensorboard in PyTorch (#334)

Signed-off-by: Jinserk Baik <[email protected]>

* updated pytorch training example

Signed-off-by: Ketan Umare <[email protected]>

* updated

Signed-off-by: Ketan Umare <[email protected]>

* wandb integration, code lint, content

Signed-off-by: Samhita Alla <[email protected]>

* remove misplaced text

Signed-off-by: Samhita Alla <[email protected]>

* add pytorch in tests' manifest

Signed-off-by: Samhita Alla <[email protected]>

* changed pytorch to mnist

Signed-off-by: Samhita Alla <[email protected]>

* dockerfile

Signed-off-by: Samhita Alla <[email protected]>

* update link

Signed-off-by: cosmicBboy <[email protected]>

* update deps

Signed-off-by: cosmicBboy <[email protected]>

Co-authored-by: Jinserk Baik <[email protected]>
Co-authored-by: Samhita Alla <[email protected]>
Co-authored-by: cosmicBboy <[email protected]>

add pytorch multi-gpu tutorial

Signed-off-by: cosmicBboy <[email protected]>

update pytorch tutorials

Signed-off-by: cosmicBboy <[email protected]>

update multi gpu example

Signed-off-by: cosmicBboy <[email protected]>

update multi-gpu

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

update flytekit version

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

kumare3 authored and cosmicBboy committed Aug 12, 2021
1 parent 54656b8 commit 3524ed1
Showing 9 changed files with 784 additions and 21 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -25,3 +25,4 @@ _build/
 .python-version
 cookbook/release-snacks
 .kube/
+.docker/
2 changes: 1 addition & 1 deletion Makefile
@@ -12,7 +12,7 @@ FLYTE_DIR := ~/.flyte


 # Module of cookbook examples to register
-EXAMPLES_MODULE := core
+EXAMPLES_MODULE ?= core

 define LOG
 echo "$(shell tput bold)$(shell tput setaf 2)$(1)$(shell tput sgr0)"
14 changes: 9 additions & 5 deletions cookbook/case_studies/ml_training/mnist_classifier/Dockerfile
@@ -1,20 +1,20 @@
-FROM nvcr.io/nvidia/pytorch:21.06-py3
+FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime
 LABEL org.opencontainers.image.source https://github.com/flyteorg/flytesnacks

 WORKDIR /root
 ENV LANG C.UTF-8
 ENV LC_ALL C.UTF-8
 ENV PYTHONPATH /root

-# Give your wandb API key. Get it from https://wandb.ai/authorize.
-# ENV WANDB_API_KEY your-api-key
+# Set your wandb API key and user name. Get the API key from https://wandb.ai/authorize.
+# ENV WANDB_API_KEY <api_key>
+# ENV WANDB_USERNAME <user_name>

 # Install the AWS cli for AWS support
 RUN pip install awscli

-ENV VENV /opt/venv
-
 # Virtual environment
+ENV VENV /opt/venv
 RUN python3 -m venv ${VENV}
 ENV PATH="${VENV}/bin:$PATH"

@@ -25,6 +25,10 @@ RUN pip install -r /root/requirements.txt
 # Copy the actual code
 COPY mnist_classifier/ /root/mnist_classifier/

+# Copy the makefile targets to expose on the container. This makes it easier to register.
+COPY in_container.mk /root/Makefile
+COPY mnist_classifier/sandbox.config /root
+
 # This tag is supplied by the build script and will be used to determine the version
 # when registering tasks, workflows, and launch plans
 ARG tag
@@ -53,7 +53,7 @@ Weights & Biases Integration

 `Weights & Biases <https://wandb.ai/site>`__, or simply, ``wandb`` helps build better models faster with experiment tracking, dataset versioning, and model management.

-We'll use ``wandb`` alongside PyTorch to track our ML experiment and its concerned model parameters.
+We'll use ``wandb`` alongside PyTorch to track our ML experiment and its associated model parameters.

 .. note::
 Before running the example, create a ``wandb`` account and log in to access the API.
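
As a rough illustration of how the two environment variables from the Dockerfile (``WANDB_API_KEY`` and ``WANDB_USERNAME``) might be consumed at run time, here is a minimal sketch. It is not the code from this commit: the helper names ``wandb_settings`` and ``start_run`` and the project name ``mnist-example`` are hypothetical, and treating the user name as the wandb ``entity`` is an assumption.

```python
import os


def wandb_settings(project: str = "mnist-example") -> dict:
    """Collect wandb settings from the environment.

    The variable names match the commented-out ENV lines in the Dockerfile;
    the default project name is a placeholder chosen for this sketch.
    """
    api_key = os.environ.get("WANDB_API_KEY")
    username = os.environ.get("WANDB_USERNAME")
    if not api_key or not username:
        raise RuntimeError(
            "Set WANDB_API_KEY and WANDB_USERNAME before running the example."
        )
    return {"key": api_key, "entity": username, "project": project}


def start_run(settings: dict):
    """Log in and start a tracked run.

    wandb is imported lazily so that wandb_settings() can be used
    (and tested) on machines where wandb is not installed.
    """
    import wandb

    wandb.login(key=settings["key"])
    return wandb.init(project=settings["project"], entity=settings["entity"])
```

Inside a training task one would call ``start_run(wandb_settings())`` once before the training loop. Failing early when either variable is missing gives a clearer error than an authentication failure partway through training.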