The reason for NaN #12591
Comments
👋 Hello @KwangryeolPark, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Requirements
Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
Introducing YOLOv8 🚀
We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀! Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects. Check out our YOLOv8 Docs for details and get started with:
pip install ultralytics
I hope you fix the mixed precision problem.
@KwangryeolPark hello! Thanks for bringing this to our attention. NaNs during training can indeed sometimes be related to precision issues when using mixed precision training (AMP). However, there could be other factors at play, such as learning rate, weight initialization, or data preprocessing. Regarding the use of NVIDIA apex, YOLOv5 uses PyTorch's native AMP implementation, which is generally recommended for its ease of use and integration. If you're experiencing NaNs with AMP, you might want to try the following:
If you're willing to submit a PR, we'd be happy to review any improvements or fixes you propose. Just make sure to thoroughly test your changes to ensure they're beneficial across various scenarios.
Remember to check out our documentation for more details on troubleshooting and best practices: https://docs.ultralytics.com/yolov5/
Thanks for your contribution to the YOLOv5 community! 🚀
@glenn-jocher Thank you for the answer. To set the learning rate, I looked at Training Arguments and found the lr0 argument. However, when I add --lr0 0.001, the script shows
Apologies for the confusion, @KwangryeolPark. The correct argument for setting the initial learning rate in the YOLOv5 training script is --lr:
python train.py --data coco.yaml --epochs 300 --weights '' --cfg yolov5m.yaml --batch-size 40 --optimizer CAME --device 0 --lr 0.001
Make sure to adjust the learning rate according to your specific needs and keep an eye on the training process to ensure stability. If you have any further questions or issues, don't hesitate to reach out. Happy training! 🚀
@glenn-jocher Thank you for the guidance. However,
I apologize for the oversight, @KwangryeolPark. In YOLOv5, the learning rate is set in the hyperparameter configuration file rather than as a command-line argument. You can adjust the learning rate by editing the hyperparameter file. For example, to set the initial learning rate to 0.001, you would modify the lr0 entry:
lr0: 0.001 # initial learning rate
Then, you can reference this hyperparameter file during training using the --hyp argument:
python train.py --data coco.yaml --epochs 300 --weights '' --cfg yolov5m.yaml --batch-size 40 --optimizer CAME --device 0 --hyp your_hyperparameter_file.yaml
Replace your_hyperparameter_file.yaml with the path to your edited hyperparameter file.
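For anyone who prefers to script the edit instead of changing the file by hand, here is a minimal sketch using only the Python standard library. The helper name set_lr0 and the sample text are hypothetical and not part of YOLOv5; the sketch assumes the file has a single top-level `lr0:` line as in the example above.

```python
import re


def set_lr0(hyp_text: str, lr0: float) -> str:
    """Return hyp_text with the value on the `lr0:` line replaced.

    Any trailing comment on that line is kept. Hypothetical helper, not a YOLOv5 API.
    """
    pattern = re.compile(r"^(lr0:\s*)([^#\s]+)", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{lr0}", hyp_text)
    if count != 1:
        raise ValueError(f"expected exactly one lr0 entry, found {count}")
    return new_text


# Hypothetical excerpt of a hyperparameter file; the values are illustrative only.
sample = "lr0: 0.01  # initial learning rate\nlrf: 0.01  # final learning rate factor\n"
updated = set_lr0(sample, 0.001)
print(updated)
```

In practice you would read your copy of the hyperparameter file, pass its text through the helper, write the result to a new file, and hand that file to --hyp.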
Thank you
You're welcome, @KwangryeolPark! If you have any more questions or need further assistance in the future, feel free to reach out. Best of luck with your YOLOv5 training! Happy detecting! 🚀👀
Search before asking
YOLOv5 Component
Training
Bug
As reported in other issues, I also see NaN while training yolov5m on the COCO dataset, following the scripts in coco.yaml and README.md.
I tried to find the cause of the NaN and found a hint in an issue that is indirectly about AMP (Automatic Mixed Precision).
It makes sense that low precision has a higher chance of producing NaN during casting because of underflow.
Therefore, I think many of the NaN problems come from AMP, so it looks better to use NVIDIA apex, which uses distribution shift to prevent distribution mismatch.
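To make the precision argument concrete, here is a small sketch of how half precision (fp16) handles very small and very large values. It uses only the Python standard library to emulate the cast; the helper to_fp16 and the numbers are assumptions chosen for illustration, not values taken from this training run. It also shows the general idea of loss scaling, which, as far as I know, both PyTorch native AMP (GradScaler) and NVIDIA apex apply: multiply before the cast so small values survive, then divide in full precision.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (fp16) value."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct raises for values outside the fp16 range; a hardware cast gives +/-inf.
        return math.copysign(math.inf, x)


grad = 1e-8        # hypothetical small gradient value
scale = 65536.0    # hypothetical loss-scale factor (2**16)

underflowed = to_fp16(grad)              # too small for fp16, becomes zero
rescued = to_fp16(grad * scale) / scale  # scaled before the cast, unscaled in full precision

overflowed = to_fp16(70000.0)            # above the fp16 maximum of 65504, becomes inf
poisoned = overflowed - overflowed       # inf - inf is NaN

print(underflowed, rescued, overflowed, poisoned)
```

Note that underflow by itself gives zeros rather than NaN. NaN typically appears one step later, for example when an overflowed inf is combined with another inf or when a zero ends up in a division, so both ends of the fp16 range are worth checking when debugging.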
Environment
YOLOv5m
torch:1.12.1+cu116
python: 3.8.12
dataset: coco
optimizer: CAME
epochs: 300
batch size: 40
Minimal Reproducible Example
I use the CAME optimizer with betas=(momentum, 0.999, 0.999)
Additional
No response
Are you willing to submit a PR?