Add tensor hooks and 10.0 gradient clipping #8598
Conversation
@glenn-jocher I have to change the part with the hook; I noticed it is not quite right. Sorry. |
Is this PR ready to test or do you still need to make changes? |
@glenn-jocher Yes, it is ready now. I removed the unnecessary line of code; it would not have changed the result, but it is not needed. Thanks a lot! |
@glenn-jocher A few insights: I've been running ~50 trainings on my private dataset with this new branch over the last few days. So far this version looks much more stable, and the random seed plays a less important role. No unstable trainings like those described above have occurred so far; previously, sometimes 3 out of 9 trainings were unstable. I know that is just my dataset and not a general test :) |
@UnglvKitDe I'd like to close this out but we need a few changes. If this improves default trainings with little cost/resources penalty then we should enable it by default. We want to avoid adding argparser arguments if at all possible, we already have too many. Can you profile training with this branch and with master to compare mAP and training time and CUDA memory utilization? |
@UnglvKitDe also, why do we have to unscale the optimizer first? This seems like a costly operation; wouldn't it make more sense to scale the gradient clipping value by the scaler settings? |
@glenn-jocher Unfortunately, at the moment I don't have the resources to run a full COCO training. My own dataset has only a few hundred images, while COCO has ~120k. Sorry! Can you test this? If I can help you in any other way, please let me know :) |
@glenn-jocher As described here, it is necessary to unscale them first, because otherwise the clipping operates on the wrong (scaled) gradients. I did not notice any significantly longer training on my dataset, but (as said above) I only have a small dataset. Maybe it also improves training and leads to better results through more stable training. |
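For context, the usual PyTorch AMP pattern being discussed looks roughly like this (a minimal sketch with assumed names model, optimizer, dataloader, loss_fn; not the exact PR code). The GradScaler multiplies gradients by its scale factor, so they must be unscaled before clip_grad_norm_ is applied, otherwise the 10.0 threshold would act on scaled values:
import torch
scaler = torch.cuda.amp.GradScaler()
for imgs, targets in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(imgs), targets)
    scaler.scale(loss).backward()   # gradients are now scaled
    scaler.unscale_(optimizer)      # bring gradients back to their true magnitude
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    scaler.step(optimizer)          # step is skipped if grads contain inf/NaN
    scaler.update()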
@UnglvKitDe thanks, I'll review the link and run some tests this weekend. |
@UnglvKitDe it looks like the PR does two separate things: first it replaces NaN gradients with 0.0 via tensor hooks, and then it separately clips gradients. Do you know which change resulted in the improvements you saw? Are you able to determine which was more important? |
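For reference, the hook half of the change looks roughly like this (a sketch of the technique, with model assumed to be the YOLOv5 model; not necessarily the exact PR diff):
import torch
# Register a backward hook on every trainable parameter so any NaN in its
# gradient is replaced with 0.0 (and +/-inf with large finite values, which is
# torch.nan_to_num's default behavior) before the optimizer step.
for p in model.parameters():
    if p.requires_grad:
        p.register_hook(lambda grad: torch.nan_to_num(grad))
# The second, independent change is the clip_grad_norm_ call shown in the AMP sketch above.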
@UnglvKitDe I'm testing in Colab now. I see the same training time but slightly worse mAP for the PR. I wonder if the clipping is too tight. We have 10.0 now; maybe use 30.0 or 100.0?
!git clone https://github.com/UnglvKitDe/yolov5-1 -b fix/grad_inf_nan  # clone PR branch
%cd yolov5-1
%pip install -qr requirements.txt  # install
!python train.py --img 640 --batch 16 --epochs 30 --data coco128.yaml --weights yolov5s.pt --cache
%cd ..
!git clone https://github.com/ultralytics/yolov5  # clone master for comparison
%cd yolov5
!python train.py --img 640 --batch 16 --epochs 30 --data coco128.yaml --weights yolov5s.pt --cache |
@UnglvKitDe confirmed: raising the clip max to 100.0 solves the issue of lower mAP. Do you think this might be too high? |
@glenn-jocher About your changes: I don't know to what extent the parameters can be None; actually, that makes no sense. It was advised once in a forum, but it may also have been an old, since-fixed bug. I have seen values for max_value from 2.0 to 5.0; 10 was already chosen very high by me. I think 100 is very, very high... |
@glenn-jocher I think the value that is needed depends on the context, so I wanted to keep it configurable. I agree with you, though, that we have too many argparser arguments. Maybe we could put it in a config file (hyp*.yaml or a model *.yaml?), as sketched below. |
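One possible shape for that, as a hedged sketch: grad_clip_norm below is an invented key for illustration and does not exist in the shipped hyp YAMLs, and model is assumed to be the loaded YOLOv5 model.
import yaml
import torch
# Hypothetical: read the clip threshold from the hyperparameter YAML instead of
# adding another argparse flag. 'grad_clip_norm' is an invented key here.
with open("data/hyps/hyp.scratch-low.yaml") as f:
    hyp = yaml.safe_load(f)
max_norm = hyp.get("grad_clip_norm", 10.0)  # fall back to the PR default of 10.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)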
@UnglvKitDe I reduced gradient clipping to 10.0 after some experiments found not much difference from master. PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐ |
@glenn-jocher Thx :) |
@UnglvKitDe I observed erratic training behavior (green line) with the nan_to_num hook in the classifier branch (I added it there also), so I'm going to remove it from master. |
@glenn-jocher Hmm, interesting, I have never seen anything like this. I see you used my code except for the None check; I do not know if that is the reason. But actually none of the weights should be None, and if they are, an exception should be thrown... |
* Add tensor hooks and gradient clipping ultralytics#8578
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Remove retain_grad(), because it's not necessary
* Update train.py
* Simplify
* Update train.py
* Update train.py
* Update train.py
* Update train.py
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <[email protected]>
@UnglvKitDe I agree, the None checks should not be necessary, and the issue should be investigated further to ensure all weights are properly initialized and exceptions are handled appropriately. Thank you for bringing this to my attention. |
Addresses the stability issues described in #8578
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Improves training stability with gradient clipping and NaN-gradient mitigation.
📊 Key Changes
🎯 Purpose & Impact