Investigate how distributed training can be used for custom training jobs #452
Comments
Using mpirun gives us more information than using horovodrun.
We can leverage this: https://github.com/aws/sagemaker-training-toolkit/blob/c999c12941408a89969984079f7ddfeddf272882/src/sagemaker_training/mpi.py#L13
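As a rough illustration of what launching through mpirun directly might involve, the sketch below assembles an Open MPI command line for a set of hosts. The helper name `build_mpirun_command`, the host names, and the choice of flags are assumptions made for this example; it is not the logic in the linked `mpi.py`, and the flags shown are only the commonly documented Open MPI ones for Horovod-style jobs.

```python
from typing import List, Sequence


def build_mpirun_command(
    hosts: Sequence[str],
    slots_per_host: int,
    program: Sequence[str],
) -> List[str]:
    """Assemble an Open MPI `mpirun` argument list (hypothetical helper).

    hosts:          host names taking part in the job, e.g. ["algo-1", "algo-2"]
    slots_per_host: number of processes to start on each host
    program:        the training command, e.g. ["python", "train.py"]
    """
    if not hosts:
        raise ValueError("at least one host is required")
    if slots_per_host < 1:
        raise ValueError("slots_per_host must be positive")

    # Open MPI accepts a comma-separated "host:slots" list via -H.
    host_spec = ",".join(f"{host}:{slots_per_host}" for host in hosts)
    total_processes = len(hosts) * slots_per_host

    return [
        "mpirun",
        "-np", str(total_processes),
        "-H", host_spec,
        "-bind-to", "none",        # do not pin processes to cores
        "-map-by", "slot",
        "-x", "NCCL_DEBUG=INFO",   # forward an env var to every rank
        *program,
    ]


if __name__ == "__main__":
    print(" ".join(build_mpirun_command(["algo-1", "algo-2"], 2, ["python", "train.py"])))
```

Building the argument list explicitly is one way to keep control over which environment variables and binding options reach each rank, which may be part of why mpirun surfaces more detail than the horovodrun wrapper.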
This was referenced Oct 6, 2020
Merged
Adding logic to insert the MPI-enabling hyperparameter in SageMaker plugin
flyteorg/flyteplugins#124
Merged
- sagemaker_enable_mpi=true in hyperparameters (Adding logic to insert the MPI-enabling hyperparameter in SageMaker plugin flyteplugins#124)
- distributed_training_context with wf_params (Enabling bare-bones distributed training for SageMaker and prepare for enabling distributed training for PyTorch / Tensorflow plugins flytekit#173)
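The two items above can be pictured with a minimal Python sketch, written under stated assumptions rather than taken from the referenced pull requests. `with_mpi_enabled` and the `DistributedTrainingContext` class below are hypothetical stand-ins, not the flyteplugins or flytekit implementations. The environment variable names `SM_CURRENT_HOST`, `SM_HOSTS`, and `SM_NETWORK_INTERFACE_NAME` are assumed to be the ones the SageMaker training toolkit sets inside the container, with `SM_HOSTS` holding a JSON-encoded list.

```python
import json
import os
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional

# Hyperparameter key named in the issue; the value is passed as a string.
MPI_FLAG = "sagemaker_enable_mpi"


def with_mpi_enabled(hyperparameters: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the hyperparameters with the MPI flag set (hypothetical helper)."""
    merged = dict(hyperparameters)
    merged[MPI_FLAG] = "true"
    return merged


@dataclass(frozen=True)
class DistributedTrainingContext:
    """Illustrative container for what a task could read from its workflow parameters."""

    current_host: str
    hosts: List[str]
    network_interface_name: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "DistributedTrainingContext":
        # Fall back to the process environment when no mapping is supplied.
        env = os.environ if env is None else env
        return cls(
            current_host=env["SM_CURRENT_HOST"],
            hosts=json.loads(env["SM_HOSTS"]),  # assumed to be a JSON list of host names
            network_interface_name=env["SM_NETWORK_INTERFACE_NAME"],
        )


if __name__ == "__main__":
    print(with_mpi_enabled({"epochs": "10"}))
    fake_env = {
        "SM_CURRENT_HOST": "algo-1",
        "SM_HOSTS": '["algo-1", "algo-2"]',
        "SM_NETWORK_INTERFACE_NAME": "eth0",
    }
    print(DistributedTrainingContext.from_env(fake_env))
```

Passing an explicit mapping to `from_env` keeps the sketch testable outside a training container; inside one, calling it with no argument would read the real environment.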