Investigate moving value/policy head to GPU #51
I'm looking into this. It's not as trivial as I had hoped.
This doesn't move the entire head (i.e. not the FC/innerproduct layers), just the final 1x1 convolution. It's useful because it saves a lot of data transfer (64-128 fold!) over the PCIe bus, even though the computational gain is small.
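A quick back-of-the-envelope sketch of where the 64-128 fold figure comes from: keeping the head's 1x1 convolution on the GPU means only the head's output planes cross PCIe, instead of the full residual-tower feature map. The channel counts below are illustrative assumptions (a 128-channel tower, a 1-plane head output), not values from the repo.

```python
# Illustrative PCIe-transfer arithmetic for moving the head's 1x1
# convolution onto the GPU. Channel counts are assumed, not from lczero.
BOARD_SQUARES = 8 * 8   # 8x8 chess board
FLOAT_BYTES = 4         # float32

def transfer_bytes(channels: int) -> int:
    """Bytes copied over PCIe per position for a given channel count."""
    return channels * BOARD_SQUARES * FLOAT_BYTES

tower_channels = 128    # residual-tower width (assumed)
head_channels = 1       # head output planes after the 1x1 conv (assumed)

saving = transfer_bytes(tower_channels) / transfer_bytes(head_channels)
print(saving)  # 128.0 -- the "128 fold" end of the range above
```

With a 2-plane head output the same arithmetic gives 64x, matching the low end of the quoted range.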
Edit: never mind, I found the issue. I do wonder, though: what is a good way to debug OpenCL code? I'm a CUDA guy.
#64 only moved the convolutions to GPU. Inner products are still on CPU and, according to the profiler, take more time than the convolutions do on GPU. I earlier made a GPU inner product for leelaz, but it was slower than doing it on CPU. It should be faster for lczero since the inner products are much bigger. I'll see if I can port it over.
Hey @glinscott, sorry it's not fully resolved yet. We should still move the inner products to the GPU, at least the 32x8x8 -> 1924 policy head and the 32x8x8 -> 128 layer from the value head.
There was a prototype here: https://github.com/ihavnoid/leela-zero/tree/dualhead_conv
It was not a win for Leela Zero, but with us increasing from 1/2 to 32 or 64 channels, it could make a big difference now.
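For reference, the two fully connected (inner product) layers in question amount to a pair of GEMVs over the flattened 32x8x8 head output. This is a minimal NumPy sketch using the sizes quoted above; the weight matrices are random placeholders, not real network weights.

```python
import numpy as np

# Sketch of the two inner-product layers discussed above as GEMVs.
# Sizes follow the comment (32x8x8 inputs -> 1924 policy outputs,
# 32x8x8 inputs -> 128 value-head outputs); weights are placeholders.
rng = np.random.default_rng(0)

flat = 32 * 8 * 8                       # flattened head-conv output = 2048
W_policy = rng.standard_normal((1924, flat), dtype=np.float32)
W_value = rng.standard_normal((128, flat), dtype=np.float32)
x = rng.standard_normal(flat, dtype=np.float32)

policy_logits = W_policy @ x            # 1924 x 2048 matrix-vector product
value_hidden = W_value @ x              # 128 x 2048 matrix-vector product

print(policy_logits.shape, value_hidden.shape)  # (1924,) (128,)
```

At these sizes each call moves a few megabytes of weights through a single GEMV, which is the regime where a GPU inner product can plausibly beat the CPU, unlike the much smaller Leela Zero heads.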