[Allocator] add new allocator strategy #62638

Merged
merged 15 commits into from
Mar 25, 2024

@wanghuancoder wanghuancoder commented Mar 12, 2024

PR types

Performance optimization

PR changes

Others

Description

Pcard-74613

Adds a new GPU memory allocation strategy that attempts to reduce memory fragmentation.
To enable it: export FLAGS_use_auto_growth_v2=1

The _set_warmup interface is exposed so that applications can mark the end of warmup:

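  1. The optimizer base class currently calls this interface. In normal training, warmup ends after 1 step.
  2. Some models define a custom optimizer instead of using the framework's optimizer. In that case the model must call _set_warmup manually; otherwise GPU memory is allocated and freed at high frequency, which hurts scheduling performance and may degrade model performance for models with a small batch size.
  3. If this strategy is needed in an inference scenario, _set_warmup must be called manually.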
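
For case 2 above, a minimal sketch of how a hand-written optimizer might signal the end of warmup after its first step. The PR does not state the module path or signature of _set_warmup, so the sketch takes it as an injected callable; `CustomOptimizer` and `end_warmup` are hypothetical names used only for illustration, and the real interface may take arguments.

```python
import os

# Flag name taken from the PR description. Set it before the framework is imported.
os.environ["FLAGS_use_auto_growth_v2"] = "1"


class CustomOptimizer:
    """Hypothetical optimizer that does not inherit from the framework's optimizer base class."""

    def __init__(self, end_warmup):
        # end_warmup: a callable wrapping the framework's _set_warmup interface
        # (assumption: the exact call is not given in the PR).
        self._end_warmup = end_warmup
        self._warmup_done = False
        self.steps = 0

    def step(self):
        # ... the parameter update would go here ...
        self.steps += 1
        if not self._warmup_done:
            # Mark warmup as finished exactly once, after the first training step.
            self._end_warmup()
            self._warmup_done = True


# Stand-in for the real interface so the sketch runs without the framework.
calls = []
opt = CustomOptimizer(end_warmup=lambda: calls.append("warmup ended"))
for _ in range(3):
    opt.step()
```

The same pattern would apply to case 3: an inference script would invoke the warmup hook once after its first forward pass, since no optimizer is present to do so.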

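CI and CE currently track GPU memory usage mainly through max_mem_reserved. Because this strategy allocates redundant memory blocks early on, the reported peak will appear higher.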


paddle-bot bot commented Mar 12, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.


paddle-ci-bot bot commented Mar 20, 2024

Sorry to inform you that ec1c828's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.


@Aurelius84 Aurelius84 left a comment

LGTM overall, but I'd suggest giving some thought to how the newly added logic is managed.

namespace memory {
namespace allocation {

class AutoGrowthBestFitAllocatorV2 : public AutoGrowthBestFitAllocator {

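The file and class are named with a V2 suffix, but this strategy won't necessarily become the framework default, will it? Would a separate namespace be a better way to manage it than a V2 suffix? From past experience, anything named V2 eventually raises the question of how to retire it.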

@wanghuancoder wanghuancoder merged commit 6261015 into PaddlePaddle:develop Mar 25, 2024
29 checks passed