high-performance SingleThreadedWorkQueue #35086
Thanks for your contribution!
Great work. LGTM
EventCount(const EventCount&) = delete;
void operator=(const EventCount&) = delete;
Not important, but we actually already have a DISABLE_COPY_AND_ASSIGN macro.
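For readers unfamiliar with this kind of macro, here is a minimal sketch of how a DISABLE_COPY_AND_ASSIGN-style macro could replace the two hand-written `= delete` lines. The macro body below is an assumption for illustration (the definition in Paddle may differ), and `EventCountSketch` is a hypothetical stand-in, not the class from this PR.

```cpp
#include <cstddef>
#include <type_traits>

// Assumed definition, for illustration only; the project's real macro may
// differ (for example, it may also delete the move operations).
#define DISABLE_COPY_AND_ASSIGN(classname)         \
 private:                                          \
  classname(const classname&) = delete;            \
  classname& operator=(const classname&) = delete

// Hypothetical stand-in for a non-copyable class such as EventCount.
class EventCountSketch {
 public:
  explicit EventCountSketch(std::size_t num_waiters)
      : num_waiters_(num_waiters) {}

  std::size_t num_waiters() const { return num_waiters_; }

 private:
  std::size_t num_waiters_;

  // Expands to the deleted copy constructor and copy assignment operator.
  DISABLE_COPY_AND_ASSIGN(EventCountSketch);
};

// Compile-time check that copying is really disabled.
static_assert(!std::is_copy_constructible<EventCountSketch>::value,
              "copy construction should be disabled");
static_assert(!std::is_copy_assignable<EventCountSketch>::value,
              "copy assignment should be disabled");
```

Because a user-declared copy constructor suppresses the implicit move operations, a class using this sketch is neither copyable nor movable.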
LGTM for const_cast
LGTM
PR types
New features
PR changes
Others
Describe
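Building on Eigen's implementation and refining its details, this PR implements a high-performance SingleThreadedWorkQueue.
Changes relative to the original Eigen implementation:
1) Changed memory alignment and memory allocation to further improve performance.
2) Changed the EventCount interface to make it easier to use.
3) Added a WaitQueueEmpty interface so that users can wait for tasks to complete without tracking the tasks themselves.
4) Replaced Eigen's custom macros with C++ standard library macros and functions.
5) Planned follow-up: replace the std::mutex in RunQueue with a spinlock to improve performance.
6) Planned follow-up: substantially rework the multi-threaded branch of ThreadPool, including the spin-wait logic and spin conditions, to improve performance.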
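To illustrate the intent of item 3, here is a minimal sketch of a single-worker queue exposing AddTask and WaitQueueEmpty. It is not the implementation in this PR: it uses a plain std::mutex and std::condition_variable rather than the Eigen-derived RunQueue and EventCount, and the class name `SimpleSingleThreadedQueue` and all member names are hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical, simplified queue: one worker thread runs tasks in FIFO order.
class SimpleSingleThreadedQueue {
 public:
  SimpleSingleThreadedQueue() : worker_([this] { Run(); }) {}

  ~SimpleSingleThreadedQueue() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_task_.notify_one();
    worker_.join();  // the worker drains remaining tasks before exiting
  }

  SimpleSingleThreadedQueue(const SimpleSingleThreadedQueue&) = delete;
  SimpleSingleThreadedQueue& operator=(const SimpleSingleThreadedQueue&) = delete;

  void AddTask(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push_back(std::move(fn));
      ++pending_;  // counts queued plus currently running tasks
    }
    cv_task_.notify_one();
  }

  // Blocks until every task submitted so far has finished running.
  void WaitQueueEmpty() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_empty_.wait(lock, [this] { return pending_ == 0; });
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> fn;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_task_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stop requested and nothing left to run
        fn = std::move(tasks_.front());
        tasks_.pop_front();
      }
      fn();  // run the task outside the lock
      {
        std::lock_guard<std::mutex> lock(mu_);
        if (--pending_ == 0) cv_empty_.notify_all();
      }
    }
  }

  std::mutex mu_;
  std::condition_variable cv_task_;
  std::condition_variable cv_empty_;
  std::deque<std::function<void()>> tasks_;
  std::size_t pending_ = 0;
  bool stop_ = false;
  std::thread worker_;  // declared last so the members above exist before it starts
};
```

The caller submits any number of tasks with AddTask and then calls WaitQueueEmpty once, instead of keeping a future or counter per task. Tracking a pending count (rather than only checking that the deque is empty) ensures the wait also covers the task that is still executing.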
Performance test conclusion:
Performance is better than both the ThreadPool previously used by Paddle and TFRT's SingleThreadedWorkQueue.
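Test method:
1) Dump the OP computation graph of the PTB model to a file, then reconstruct the graph in the test program.
2) Use a piece of pure CPU computation (a for-loop counter) to simulate OP execution.
3) Execute the computation graph in topological order, submitting each operator to the SingleThreadedWorkQueue through the AddTask method.
4) Run 2000 batches, executing the computation graph once per batch, to simulate the training process.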
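The test method above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual benchmark program: `OpGraph`, `TopologicalOrder`, `SimulateOp`, `InlineQueue`, and `RunBatches` are all hypothetical names, the graph is built in memory rather than loaded from a dumped PTB file, and `InlineQueue` is a trivial stand-in that any type providing AddTask and WaitQueueEmpty could replace.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical graph form: successors[i] lists the ops that depend on op i.
struct OpGraph {
  std::vector<std::vector<int>> successors;
};

// Kahn's algorithm; the result is shorter than the op count if there is a cycle.
std::vector<int> TopologicalOrder(const OpGraph& g) {
  const int n = static_cast<int>(g.successors.size());
  std::vector<int> indegree(n, 0);
  for (const auto& outs : g.successors) {
    for (int v : outs) ++indegree[v];
  }
  std::queue<int> ready;
  for (int i = 0; i < n; ++i) {
    if (indegree[i] == 0) ready.push(i);
  }
  std::vector<int> order;
  while (!ready.empty()) {
    const int u = ready.front();
    ready.pop();
    order.push_back(u);
    for (int v : g.successors[u]) {
      if (--indegree[v] == 0) ready.push(v);
    }
  }
  return order;
}

// Stand-in for an OP: pure CPU work. volatile keeps the loop from being optimized away.
inline std::uint64_t SimulateOp(std::uint64_t iterations) {
  volatile std::uint64_t counter = 0;
  for (std::uint64_t i = 0; i < iterations; ++i) counter = counter + 1;
  return counter;
}

// Trivial queue stand-in that runs each task immediately on the calling thread.
struct InlineQueue {
  void AddTask(std::function<void()> fn) { fn(); }
  void WaitQueueEmpty() {}
};

// Submits every op in topological order once per batch and returns the number
// of ops executed. Assumes the queue runs tasks in FIFO order on one worker and
// that WaitQueueEmpty synchronizes with that worker.
template <typename Queue>
std::uint64_t RunBatches(const OpGraph& g, int batches,
                         std::uint64_t iters_per_op, Queue* queue) {
  const std::vector<int> order = TopologicalOrder(g);
  std::uint64_t ops_run = 0;
  for (int b = 0; b < batches; ++b) {
    for (int op : order) {
      (void)op;  // a real benchmark would dispatch on the op here
      queue->AddTask([&ops_run, iters_per_op] {
        SimulateOp(iters_per_op);
        ++ops_run;
      });
    }
    queue->WaitQueueEmpty();  // one wait per batch, no per-task tracking
  }
  return ops_run;
}
```

With a single worker that runs tasks in submission order, submitting in topological order is enough to satisfy every dependency, so no per-op synchronization is needed. To compare queue implementations, the call to RunBatches would be wrapped in a timer such as std::chrono::steady_clock.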