DDP WORLD_SIZE-safe dataloader workers #5631

Merged
2 commits merged on Nov 12, 2021

glenn-jocher (Member) commented Nov 12, 2021

Improved DDP reporting of the total dataloader worker count and safely limited that total. Partially addresses #5628.
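A minimal sketch of the safe-limiting idea in plain Python, assuming each rank's workers are capped at `os.cpu_count() // WORLD_SIZE` so that the total across ranks cannot exceed the machine's cores. `per_rank_workers` and `worker_report` are hypothetical names, not YOLOv5 functions, and the real code may apply further caps.

```python
import os
from typing import Optional


def per_rank_workers(requested: int, world_size: int, cpu_count: Optional[int] = None) -> int:
    """Hypothetical helper: cap one rank's dataloader workers at its share of the CPU cores."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Split the cores evenly across the WORLD_SIZE processes (assumes they share one machine).
    cpu_share = cpus // max(world_size, 1)
    # Never exceed the user's request; 0 means data is loaded in the main process.
    return max(0, min(cpu_share, requested))


def worker_report(requested: int, world_size: int, cpu_count: Optional[int] = None) -> str:
    """Hypothetical log line reporting both the per-rank and the total worker counts."""
    nw = per_rank_workers(requested, world_size, cpu_count)
    return f"{nw} dataloader workers per rank, {nw * world_size} total across {world_size} ranks"


if __name__ == "__main__":
    # 16 cores, 4 DDP ranks, --workers 8 requested per rank
    print(worker_report(requested=8, world_size=4, cpu_count=16))
```

Dividing by WORLD_SIZE assumes every rank runs on the same machine; on a multi-node job, a per-node divisor such as the `LOCAL_WORLD_SIZE` variable exported by `torchrun` would be the tighter choice.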

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Optimized data loading in distributed training environments for YOLOv5.

📊 Key Changes

  • Adjusted the number of dataloader workers in train.py to account for WORLD_SIZE (the total number of DDP processes).
  • Modified the --workers command-line argument description to reflect its relationship with distributed training (DDP mode).
  • Updated utils/datasets.py to factor in WORLD_SIZE when calculating the number of dataloader worker processes.
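The `--workers` flag and the WORLD_SIZE lookup described above might look roughly like the following. This is an illustrative sketch: the default of 8, the exact help wording, and the helper names `parse_opt` and `world_size_from_env` are assumptions, though `torchrun` (torch.distributed.run) does export `WORLD_SIZE` to every process it launches.

```python
import argparse
import os


def parse_opt(argv=None):
    """Hypothetical CLI: --workers is a per-rank maximum, not a global total."""
    parser = argparse.ArgumentParser()
    # Assumed default and help text, stating the per-rank meaning in DDP mode.
    parser.add_argument('--workers', type=int, default=8, help='max dataloader workers (per RANK in DDP mode)')
    return parser.parse_args(argv)


def world_size_from_env(env=None) -> int:
    """Read WORLD_SIZE as set by torch.distributed launchers; default to 1 (single process)."""
    env = os.environ if env is None else env
    return int(env.get('WORLD_SIZE', 1))


if __name__ == '__main__':
    opt = parse_opt()
    ws = world_size_from_env()
    print(f'--workers={opt.workers} per rank, WORLD_SIZE={ws}, at most {opt.workers * ws} workers in total')
```

Defaulting WORLD_SIZE to 1 lets the same script run unchanged as a single, non-distributed process.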

🎯 Purpose & Impact

  • The changes ensure better utilization of system resources during distributed training by adjusting the number of dataloader workers based on the number of DDP processes (WORLD_SIZE), so that ranks sharing a machine do not oversubscribe its CPU cores.
  • Users will likely see more efficient data loading and potentially faster training in multi-GPU DDP environments.
  • The update clarifies for users how worker counts are affected during distributed training.
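To make the resource-utilization point concrete, the following arithmetic compares total worker processes with and without a per-rank cap of `cpu_count // WORLD_SIZE`. The machine size (16 cores), rank count (8) and request (`--workers 8`) are illustrative numbers, not measurements from this PR, and `total_workers` is a hypothetical helper.

```python
def total_workers(requested: int, world_size: int, cpu_count: int, capped: bool) -> int:
    """Hypothetical helper: total dataloader workers across all ranks on one machine."""
    # Uncapped, every rank spawns the full request; capped, each rank gets at most its core share.
    per_rank = min(requested, cpu_count // max(world_size, 1)) if capped else requested
    return per_rank * world_size


if __name__ == "__main__":
    for capped in (False, True):
        # prints 64 when uncapped, 16 when capped
        print(f"capped={capped}: {total_workers(8, 8, 16, capped)} workers on 16 cores")
```

Without the cap, 64 worker processes would compete for 16 cores; with it, the total matches the core count.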

glenn-jocher self-assigned this Nov 12, 2021
glenn-jocher linked an issue Nov 12, 2021 that may be closed by this pull request
glenn-jocher changed the title from "WORLD_SIZE-safe dataloader workers" to "DDP WORLD_SIZE-safe dataloader workers" Nov 12, 2021
glenn-jocher merged commit 7473f0f into master Nov 12, 2021
glenn-jocher deleted the update/workers branch November 12, 2021 13:48
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
* WORLD_SIZE-safe workers

* Update with DDP comment
Issues this pull request may close: Unexpected behavior in DDP mode with dataloader workers