SE-ResNeXt Optimization #8990
Many small Op kernels, such as sgd_op, may also need to be optimized. Consider a model with 1000 parameters: it will invoke sgd_op 1000 times, which is very time-consuming. There are two strategies: one is to analyze the dependencies between operations and interleave the sgd_ops with the backward pass; the other is to replace the many sgd_ops with a single sgd_group_op. Issue #8941 shows the results of the second strategy (using sgd_group_op). A sketch of the grouped-update idea follows.
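Below is a hypothetical NumPy sketch of the grouped-update idea, not Paddle's actual sgd_group_op kernel: the per-parameter loop launches one tiny update per tensor, while the grouped version applies a single fused update over flattened buffers, amortizing the per-op launch overhead.

```python
import numpy as np

def sgd_per_param(params, grads, lr):
    # One tiny op per parameter: 1000 parameters -> 1000 kernel launches.
    for p, g in zip(params, grads):
        p -= lr * g

def sgd_grouped(params, grads, lr):
    # Flatten everything once, apply one fused update, scatter back.
    flat_p = np.concatenate([p.ravel() for p in params])
    flat_g = np.concatenate([g.ravel() for g in grads])
    flat_p -= lr * flat_g
    offset = 0
    for p in params:
        n = p.size
        p[...] = flat_p[offset:offset + n].reshape(p.shape)
        offset += n

params = [np.random.randn(32, 32) for _ in range(1000)]
grads = [np.random.randn(32, 32) for _ in range(1000)]
sgd_grouped(params, grads, lr=0.01)
```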
Config and Env:
The comparison results before optimization:
After optimizing the speed:
After optimizing the memory usage:
Now, if we choose the release-memory policy, the memory occupation is almost the same as PyTorch's. However, the delete_var operator synchronizes the CUDA stream before releasing unused memory, which reduces computation performance. We have to implement …
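One way to avoid that synchronization (a minimal hypothetical sketch, not Paddle's delete_var implementation and not necessarily what the author had in mind) is to record a completion marker after the last op that reads a buffer and free the buffer only once the marker has passed, so the compute stream never blocks. The names `DeferredDeleter`, `schedule_free`, and `poll` are illustrative.

```python
from collections import deque

class DeferredDeleter:
    """Releases buffers lazily once their last use has completed,
    instead of synchronizing the whole CUDA stream up front."""

    def __init__(self):
        # FIFO of (is_complete, buffer); markers complete in stream order.
        self._pending = deque()

    def schedule_free(self, is_complete, buffer):
        # is_complete: zero-arg callable returning True once the marker
        # recorded after the buffer's last use (e.g. a CUDA event queried
        # with cudaEventQuery) has passed on the stream.
        self._pending.append((is_complete, buffer))

    def poll(self):
        # Non-blocking: pop every leading entry whose marker is done;
        # dropping the reference returns the memory to the allocator.
        while self._pending and self._pending[0][0]():
            self._pending.popleft()

# Toy usage with an already-completed marker:
deleter = DeferredDeleter()
buf = bytearray(1 << 20)                 # stand-in for a device allocation
deleter.schedule_free(lambda: True, buf)
del buf
deleter.poll()                           # reclaimed without blocking compute
```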
Background
project: https://github.com/PaddlePaddle/Paddle/projects/55
Profiling script:
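The profiling script itself is not reproduced here. As a generic stand-in (the `profile` and `run_iteration` names are hypothetical, not from the original script), a minimal harness of the kind used to measure steady-state per-iteration time looks like this; the real script would additionally wrap the loop with Paddle's fluid profiler to obtain per-operator timings.

```python
import time

def profile(run_iteration, warmup=10, iters=100):
    # Discard warm-up iterations so one-off costs (allocation, cuDNN
    # autotuning) do not skew the steady-state measurement.
    for _ in range(warmup):
        run_iteration()
    start = time.time()
    for _ in range(iters):
        run_iteration()
    return (time.time() - start) / iters

if __name__ == '__main__':
    # Trivial stand-in workload; in practice run_iteration would execute
    # one training step of the SE-ResNeXt program.
    avg = profile(lambda: sum(i * i for i in range(100000)))
    print('avg time per iteration: %.6f s' % avg)
```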
Optimization methods and results
Status
Plan
Give a total profile after all the optimizations are merged (@chengduoZH)