Split send_op into fetch_vars_op and send_vars_op #9161
Comments
Thank you! This is indeed very important.
Hi @helin, thanks for your comments.
I had a short discussion with teacher Yu @reyoung at the office; maybe we don't need an
forward: fc1(w1)->send1(w1')->fc2(w2)->send2(w2')->fc3(w3)->send3(w3')->send_barrier()...
backpropagation: recv1(w1)->fc1'(w1)->recv2(w2)->fc2'(w2)->recv3(w3)->fc3'(w3)->recv_barrier()...
There is no dependency in the
thread0: Wait(send_barrier)-->recv1(w1)-->Wait(recv1)-->fc1(w1)
thread1: Wait(send_barrier)-->recv2(w2)-->Wait(fc1, recv2)-->fc2(w2)
thread2: Wait(send_barrier)-->recv3(w3)-->Wait(recv3, fc2)-->fc3(w3)
thread3: Wait(recv1, recv2, recv3)-->recv_barrier()
So it looks like
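The thread schedule above can be sketched with plain Python threading primitives. This is only an illustration of the wait dependencies, not PaddlePaddle code: `run_schedule`, the event names, and the pool size are all hypothetical, and the recv/fc work is replaced by log entries.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_schedule(num_layers=3, pool_size=4):
    """Model the Wait(...) dependencies from the schedule with events."""
    log = []
    lock = threading.Lock()
    send_barrier = threading.Event()
    recv_done = [threading.Event() for _ in range(num_layers)]
    fc_done = [threading.Event() for _ in range(num_layers)]

    def record(name):
        with lock:
            log.append(name)

    def layer_task(i):
        send_barrier.wait()            # Wait(send_barrier)
        record(f"recv{i + 1}")         # stand-in for recv(w_i)
        recv_done[i].set()
        if i > 0:
            fc_done[i - 1].wait()      # Wait(fc_{i-1}); recv_i is already done
        record(f"fc{i + 1}")           # stand-in for fc(w_i)
        fc_done[i].set()

    def barrier_task():
        for event in recv_done:        # Wait(recv1, recv2, recv3)
            event.wait()
        record("recv_barrier")

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(layer_task, i) for i in range(num_layers)]
        futures.append(pool.submit(barrier_task))
        send_barrier.set()             # all sends of the previous step finished
        for future in futures:
            future.result()            # re-raise any worker exception
    return log


if __name__ == "__main__":
    print(run_schedule())
```

Each recv can start as soon as the send barrier is released, while each fc still waits for the previous layer, which is the overlap the schedule is after.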
I'm not sure; maybe we need to run some experiments, or maybe we need a separate IO ThreadPool with more threads than the computing ThreadPool. But I think that, in the current situation, we need a parameter to configure the size of the ThreadPool.
As the comment above says, maybe we only need more threads so that we don't block the computing threads?
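A minimal sketch of the two-pool idea, assuming a small computing pool and a larger IO pool whose sizes are both configurable. The names `run_backward`, `fake_send`, `compute_threads`, and `io_threads` are hypothetical and do not correspond to any existing PaddlePaddle option; the send is simulated with a sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_send(grad_name, delay=0.01):
    """Stand-in for a network send to the pserver (hypothetical)."""
    time.sleep(delay)
    return f"sent:{grad_name}"


def run_backward(param_names, compute_threads=2, io_threads=8):
    """Run backward ops on one pool and hand sends off to a separate IO pool."""
    with ThreadPoolExecutor(max_workers=compute_threads) as compute_pool, \
            ThreadPoolExecutor(max_workers=io_threads) as io_pool:

        def backward_op(param_name):
            grad_name = param_name + "'"   # stand-in for computing the gradient
            # Submit the send and return immediately, so the computing
            # thread is not blocked while the gradient is on the wire.
            return io_pool.submit(fake_send, grad_name)

        send_futures = list(compute_pool.map(backward_op, param_names))
        # Collect the results once every send has completed.
        return [future.result() for future in send_futures]


if __name__ == "__main__":
    print(run_backward(["w3", "w2", "w1"]))
```

Whether the IO pool really needs more threads than the computing pool is the open question in this thread; the sketch only shows that the two sizes can be tuned independently.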
@Yancey1989 thanks! Maybe in the future we will need two types of threads, one for computing and one for IO. @gongweibao thanks for the picture!
Agree with you, I have added it to the TODO list and will implement it ASAP.
Currently, the trainer sends all gradients only after all the backward ops have executed, like:
For the above process, the send op will not send any gradient until all the forward and backward ops are done.
But actually, we could send `w2'` right after opB(backward) and send `w1'` right after opA(backward); running computing ops and IO ops in parallel would improve performance. On the other hand, the current `SendOp` not only does SEND, but also waits for all send requests to finish and receives parameters from the pserver, so we also need to split these into multiple ops.
For sync update:
fetch(w1)->opA->fetch(w2)->opB->opB(backward)->w2'->send(w2')->opA(backward)->w1'->send(w1')->send_barrier()
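The pipeline above could be sketched roughly as follows. This is not the actual op implementation: `train_step` and `send` are hypothetical stand-ins, the ops are reduced to trace entries, and the `sync` flag only controls whether a barrier is placed at the end of the step.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def send(grad_name):
    """Stand-in for an asynchronous send of one gradient (hypothetical)."""
    return f"send({grad_name})"


def train_step(sync=True):
    """Issue each send right after its backward op instead of at the end."""
    trace = []
    pending = []
    with ThreadPoolExecutor(max_workers=2) as io_pool:
        # Forward pass: fetch each parameter just before the op that uses it.
        for param, op in (("w1", "opA"), ("w2", "opB")):
            trace.append(f"fetch({param})")
            trace.append(op)
        # Backward pass: hand each gradient to the IO pool as soon as it exists.
        for op, grad in (("opB(backward)", "w2'"), ("opA(backward)", "w1'")):
            trace.append(op)
            pending.append(io_pool.submit(send, grad))
        if sync:
            wait(pending)                  # block until every send has finished
            trace.append("send_barrier()")
        # In this sketch, leaving the "with" block drains any remaining sends;
        # a real async trainer would let them overlap with the next step.
    return trace, [future.result() for future in pending]


if __name__ == "__main__":
    print(train_step(sync=True))
    print(train_step(sync=False))
```

With `sync=False` the trace simply ends at the last backward op, since no barrier op is issued.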
For async update, there is no `send_barrier()` op at the end of the process.
TODO
- `AsyncSendOp`, `SendBarrierOp`.
- `distribute transpiler` with the async send op.