Horovod demo #899

porteratzo · 2023-11-29T20:44:29Z

This PR adds a new workspace template, torch_llm_horovod. It uses pt_model as the collaborator task runner and launches a Horovod cluster during the train_batches and validate steps via launch_horovod. launch_horovod runs InHorovodrun.py, which creates the LLMTrainer, loads the datasets, and trains or validates the model. The task runner and data loader in the collaborator save the data and model state for the LLMTrainer to load, and the LLMTrainer passes its results back to the task runner.
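The handoff described above (save state, launch a Horovod job, read the results back) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names, the pickle file format, the `model_state.pkl` file name, and the `--state-dir` flag are all assumptions made for the example. Only the `horovodrun -np ... -H ...` invocation shape comes from Horovod's CLI.

```python
import pickle
import tempfile
from pathlib import Path

STATE_FILE = "model_state.pkl"  # hypothetical file name, not taken from the PR


def save_state(state, state_dir):
    """Write the state to disk so the Horovod worker script can load it."""
    path = Path(state_dir) / STATE_FILE
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path


def load_state(state_dir):
    """Read back the state written by save_state (or by the worker script)."""
    with open(Path(state_dir) / STATE_FILE, "rb") as f:
        return pickle.load(f)


def build_horovodrun_command(script, num_procs, hosts, state_dir):
    """Build the argv list for a horovodrun launch.

    --state-dir is a hypothetical flag that the worker script would have
    to parse itself; it is not a horovodrun option.
    """
    return [
        "horovodrun",
        "-np", str(num_procs),   # total number of worker processes
        "-H", hosts,             # host list, e.g. "localhost:2"
        "python", script,
        "--state-dir", str(state_dir),
    ]


def demo_round_trip():
    """Save state, build the launch command, and load the state back."""
    with tempfile.TemporaryDirectory() as state_dir:
        save_state({"round": 1, "weights": [0.1, 0.2]}, state_dir)
        cmd = build_horovodrun_command("InHorovodrun.py", 2, "localhost:2", state_dir)
        # A real task runner would now run the command, for example with
        # subprocess.run(cmd, check=True), and the worker would overwrite
        # the state file with the trained weights before exiting.
        return cmd, load_state(state_dir)
```

Passing state through files keeps the task runner process independent of the Horovod worker processes, at the cost of one serialization round trip per train or validate step.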

psfoley · 2024-02-07T00:42:31Z

@porteratzo could you add a description to this PR?

psfoley

This is a great contribution, @porteratzo. A couple small requested changes, and then I think this is ready to merge.

openfl-tutorials/experimental/Federeated_Pytorch_LLM_Horovod.py

openfl-tutorials/experimental/LLM_Horovod.MD

* initial commit
* changes
* name changes, added readme
* remove files, new dataloader inherit
* move files, flake changes
* flake fix

---------

Co-authored-by: Patrick Foley <[email protected]>
Signed-off-by: nammbash <[email protected]>

* initial commit
* changes
* name changes, added readme
* remove files, new dataloader inherit
* move files, flake changes
* flake fix

---------

Co-authored-by: Patrick Foley <[email protected]>
Signed-off-by: manuelhsantana <[email protected]>

porteratzo and others added 5 commits November 29, 2023 12:40

initial commit

61894a9

Merge branch 'securefederatedai:develop' into horovod_demo

fb63061

changes

ac17821

name changes, added readme

208b69a

Merge branch 'develop' into horovod_demo

cb68a0e

remove files, new dataloader inherit

f78b235

psfoley requested changes Feb 22, 2024

openfl-tutorials/experimental/Federeated_Pytorch_LLM_Horovod.py (outdated, resolved)

openfl-tutorials/experimental/LLM_Horovod.MD (outdated, resolved)

porteratzo added 2 commits February 23, 2024 08:40

move files, flake changes

7f14c17

flake fix

edc83ea

psfoley self-requested a review February 23, 2024 19:29

psfoley approved these changes Feb 23, 2024

psfoley merged commit 9e9047f into securefederatedai:develop Feb 23, 2024
24 of 26 checks passed
