Single node GPU training example #333

Merged
merged 11 commits into from
Jul 27, 2021

kumare3 commented Jul 19, 2021

Signed-off-by: Ketan Umare [email protected]

@kumare3 kumare3 requested a review from wild-endeavor as a code owner July 19, 2021 23:19
cosmicBboy left a comment

I'd like to propose renaming this section to "MNIST Classification", with a PyTorch section underneath it: "MNIST Classification with PyTorch and W&B".

As a reminder, Tutorials are use-case-centric, not technology-centric; e.g., we could also write tutorials for training an MNIST classifier using any number of other ML packages.

cookbook/case_studies/ml_training/pytorch/README.rst (outdated, resolved)
cookbook/docs/ml_training.rst (outdated, resolved)

kumare3 commented Jul 23, 2021

Agree with @cosmicBboy

cosmicBboy commented

#342 should fix the docs build issue

kumare3 and others added 11 commits July 27, 2021 14:08
Signed-off-by: Ketan Umare <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: Samhita Alla <[email protected]>
Signed-off-by: cosmicBboy <[email protected]>
Signed-off-by: cosmicBboy <[email protected]>
cosmicBboy merged commit ff4e179 into master Jul 27, 2021
samhita-alla commented

@cosmicBboy, we haven't yet tested this on demo.nuclyde with one GPU provisioned. A few days ago, Ketan wanted to spin up an instance and test, but that hasn't happened yet.

We may need to test this when you're working on the distributed training scenario.

cosmicBboy commented

@samhita-alla thanks for the heads up! @wild-endeavor is working on getting GPUs on demo.nuclyde, and I can help test out the example as well.

samhita-alla commented

Sure, thank you!

cosmicBboy pushed a commit that referenced this pull request Aug 12, 2021
* Single node GPU training example

Signed-off-by: Ketan Umare <[email protected]>

* Minor fix related to tensorboard in PyTorch (#334)

Signed-off-by: Jinserk Baik <[email protected]>

* updated pytorch training example

Signed-off-by: Ketan Umare <[email protected]>

* updated

Signed-off-by: Ketan Umare <[email protected]>

* wandb integration, code lint, content

Signed-off-by: Samhita Alla <[email protected]>

* remove misplaced text

Signed-off-by: Samhita Alla <[email protected]>

* add pytorch in tests' manifest

Signed-off-by: Samhita Alla <[email protected]>

* changed pytorch to mnist

Signed-off-by: Samhita Alla <[email protected]>

* dockerfile

Signed-off-by: Samhita Alla <[email protected]>

* update link

Signed-off-by: cosmicBboy <[email protected]>

* update deps

Signed-off-by: cosmicBboy <[email protected]>

Co-authored-by: Jinserk Baik <[email protected]>
Co-authored-by: Samhita Alla <[email protected]>
Co-authored-by: cosmicBboy <[email protected]>

add pytorch multi-gpu tutorial

Signed-off-by: cosmicBboy <[email protected]>

update pytorch tutorials

Signed-off-by: cosmicBboy <[email protected]>

update multi gpu example

Signed-off-by: cosmicBboy <[email protected]>

update multi-gpu

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

update flytekit version

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: cosmicBboy <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>

multi-gpu WIP

Signed-off-by: Niels Bantilan <[email protected]>
samhita-alla added a commit to samhita-alla/flytesnacks that referenced this pull request Aug 13, 2021
Signed-off-by: Samhita Alla <[email protected]>

protobf -> becomes -> protobuf (flyteorg#329)

Signed-off-by: Bruce Arctor <[email protected]>

fix dolt docs (flyteorg#327)

Signed-off-by: Samhita Alla <[email protected]>

reorganize sqlite3 user guide example (flyteorg#300)

* reorganize sqlite3 user guide example

move from extending_flyte to integrations/flytekit_plugins

Signed-off-by: cosmicBboy <[email protected]>

* update title

Signed-off-by: cosmicBboy <[email protected]>

* update dolt card text

Signed-off-by: cosmicBboy <[email protected]>

* add sql-alchemy

Signed-off-by: Samhita Alla <[email protected]>

* readme

Signed-off-by: Samhita Alla <[email protected]>

* lint code

Signed-off-by: Samhita Alla <[email protected]>

* modify sqlalchemy

Signed-off-by: Samhita Alla <[email protected]>

* update content

Signed-off-by: Samhita Alla <[email protected]>

* update example

Signed-off-by: Samhita Alla <[email protected]>

* sqlalchemy remote example

Signed-off-by: Samhita Alla <[email protected]>

* code updates

Signed-off-by: Samhita Alla <[email protected]>

Co-authored-by: Samhita Alla <[email protected]>

Bump urllib3 in /cookbook/integrations/flytekit_plugins/dolt (flyteorg#325)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.11 to 1.26.5.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@1.25.11...1.26.5)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Example for writing queries for Athena (flyteorg#319)

- Flyte allows writing queries directly, which are then executed by the
backend. This shows such an example.

Signed-off-by: Ketan Umare <[email protected]>

Link athena docs from aws integrations page overview (flyteorg#338)

rename control_plane section to remote_access (flyteorg#302)

* rename control_plane section to remote_access

- add stub pages for remote access user guide examples
- clean up of named tuple outputs example
- clean-up of sagemaker distributed pytorch training

Signed-off-by: cosmicBboy <[email protected]>

* [PR Into 302] Added documentation for running tasks and launch plans, inspecting and debugging them (flyteorg#316)

* Added documentation for running tasks and launch plans, inspecting and debugging them

Signed-off-by: Prafulla Mahindrakar <[email protected]>

* Incorporated the feedback

Signed-off-by: pmahindrakar-oss <[email protected]>
Signed-off-by: cosmicBboy <[email protected]>

* add links, formatting

Signed-off-by: cosmicBboy <[email protected]>

Co-authored-by: pmahindrakar-oss <[email protected]>

fix sandbox start command in the cookbook (flyteorg#339)

Signed-off-by: Pianist038801 <[email protected]>

Co-authored-by: steven <[email protected]>

Bump urllib3 from 1.25.11 to 1.26.5 in /cookbook/integrations/aws/athena (flyteorg#337)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.11 to 1.26.5.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@1.25.11...1.26.5)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Bump urllib3 in /cookbook/integrations/external_services/hive (flyteorg#336)

Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.11 to 1.26.5.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@1.25.11...1.26.5)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

update doc requirements with sphinx v4 (flyteorg#341)

Signed-off-by: cosmicBboy <[email protected]>

update dev requirements (flyteorg#342)

* update dev requirements

flyteorg#341 only updated the
docs requirement, which resulted in a docs build issue

https://readthedocs.org/projects/flytecookbook/builds/14327792/

Signed-off-by: cosmicBboy <[email protected]>

* docs build installing deps in ci matches rtd

Signed-off-by: cosmicBboy <[email protected]>

Single node GPU training example (flyteorg#333)

* Single node GPU training example

Signed-off-by: Ketan Umare <[email protected]>

* Minor fix related to tensorboard in PyTorch (flyteorg#334)

Signed-off-by: Jinserk Baik <[email protected]>

* updated pytorch training example

Signed-off-by: Ketan Umare <[email protected]>

* updated

Signed-off-by: Ketan Umare <[email protected]>

* wandb integration, code lint, content

Signed-off-by: Samhita Alla <[email protected]>

* remove misplaced text

Signed-off-by: Samhita Alla <[email protected]>

* add pytorch in tests' manifest

Signed-off-by: Samhita Alla <[email protected]>

* changed pytorch to mnist

Signed-off-by: Samhita Alla <[email protected]>

* dockerfile

Signed-off-by: Samhita Alla <[email protected]>

* update link

Signed-off-by: cosmicBboy <[email protected]>

* update deps

Signed-off-by: cosmicBboy <[email protected]>

Co-authored-by: Jinserk Baik <[email protected]>
Co-authored-by: Samhita Alla <[email protected]>
Co-authored-by: cosmicBboy <[email protected]>

Indent list items under "Contribute to examples" section (flyteorg#346)

Signed-off-by: eduardo apolinario <[email protected]>

Papermill Tutorial (flyteorg#345)

* Papermill tutorial

Signed-off-by: Samhita Alla <[email protected]>

* github action, tests

Signed-off-by: Samhita Alla <[email protected]>

* docs-related change

Signed-off-by: Samhita Alla <[email protected]>

* dataclass

Signed-off-by: Samhita Alla <[email protected]>
cosmicBboy added a commit that referenced this pull request Aug 17, 2021
* update pytorch multi-gpu example, incorporate comments @samhita-alla @kumare3

Signed-off-by: Niels Bantilan <[email protected]>

* Apply suggestions from code review

Co-authored-by: Samhita Alla <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>

Co-authored-by: Samhita Alla <[email protected]>
5 participants