
KEP-2170: Create PyTorch multi-node distributed training runtime #2211

Closed
Tracked by #2170
andreyvelich opened this issue Aug 14, 2024 · 2 comments · Fixed by #2328
Comments

@andreyvelich (Member)

Related: #2170

We should create a ClusterTrainingRuntime for PyTorch multi-node distributed training.

/area runtime
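
For reference, a ClusterTrainingRuntime is a cluster-scoped blueprint that a TrainJob points at via its runtime reference. Below is a minimal sketch of what this PyTorch runtime could look like, assuming the JobSet-backed template and the torch MLPolicy shape proposed in KEP-2170; the API version, field names, image, and command are illustrative assumptions, not the manifest that eventually merged in #2328.

```yaml
# Hedged sketch: the apiVersion, field names, and values below follow the
# KEP-2170 proposal and are assumptions, not the final API.
apiVersion: kubeflow.org/v2alpha1
kind: ClusterTrainingRuntime
metadata:
  name: torch-distributed
spec:
  mlPolicy:
    numNodes: 2                 # default node count; a TrainJob can override it
    torch:
      numProcPerNode: auto      # let torchrun pick one process per GPU/CPU
  template:                     # JobSet template stamped out for each TrainJob
    spec:
      replicatedJobs:
        - name: node
          template:
            spec:
              template:
                spec:
                  containers:
                    - name: node
                      image: pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime
                      command:
                        - torchrun
                        - --nnodes=2
                        - --nproc-per-node=auto
                        - /workspace/train.py   # hypothetical entrypoint
```

The intent is that users never write this JobSet plumbing themselves: they submit a TrainJob whose runtime reference names torch-distributed, and the controller fills in the node count, rendezvous environment variables, and other torchrun settings.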

@yang20150702

I'm learning training-operator v1 and I'd like to work on this issue. Please give me some suggestions.

@deepanker13 (Contributor)

/assign
