Investigate moving value/policy head to GPU #51

Closed
glinscott opened this issue Jan 28, 2018 · 7 comments

Comments

@glinscott
Owner

There was a prototype here: https://github.com/ihavnoid/leela-zero/tree/dualhead_conv

It was not a win for Leela Zero, but now that we are increasing the head convolutions from 1 or 2 channels to 32 or 64, it could make a big difference.

@Error323
Collaborator

I'm looking into this. It's not as trivial as I had hoped.

@gcp
Contributor

gcp commented Feb 15, 2018

leela-zero/leela-zero@f4a36c1

This doesn't move the entire head (i.e. not the FC/innerproduct layers), just the final 1x1 convolution. It's useful because it cuts the data transferred over the PCIe bus by a lot (64-128 fold!), even if the computational gain is small.
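As a rough illustration of where a 64-128 fold figure can come from, here is a back-of-the-envelope sketch. It assumes a 19x19 board, float32 activations, a 128-channel tower, and head convolutions that reduce to 2 channels (policy) or 1 channel (value); none of these numbers are taken from the commit itself, and the function names are made up for the example.

```python
# Back-of-the-envelope sketch; every size here is an assumption for illustration.
BYTES_PER_FLOAT = 4  # assuming float32 activations


def transfer_bytes(channels: int, board_squares: int) -> int:
    """Bytes copied from GPU to host for one position's activations."""
    return channels * board_squares * BYTES_PER_FLOAT


def reduction_factor(tower_channels: int, head_channels: int,
                     board_squares: int) -> float:
    """How much less data crosses the bus if the 1x1 head conv runs on the GPU."""
    before = transfer_bytes(tower_channels, board_squares)  # copy tower output
    after = transfer_bytes(head_channels, board_squares)    # copy head conv output
    return before / after


if __name__ == "__main__":
    squares = 19 * 19  # assumed Go board
    for name, head_channels in (("policy", 2), ("value", 1)):
        factor = reduction_factor(128, head_channels, squares)
        print(f"{name} head: {factor:.0f}x less data over PCIe")
```

The ratio depends only on the channel counts, so the board size cancels out; the saving grows with the width of the tower and shrinks as the head convolution gets wider.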

@Error323
Collaborator

Error323 commented Feb 17, 2018

Edit: never mind, I found the issue. I do wonder, though: what is a good way to debug OpenCL code? I'm a CUDA guy.

@glinscott
Owner Author

Resolved by #64. Thanks @Error323!

@Ttl
Contributor

Ttl commented Feb 25, 2018

#64 only moved the convolutions to the GPU. The innerproducts are still on the CPU and, according to the profiler, take more time than the convolutions do on the GPU.

I previously wrote a GPU innerproduct for leelaz, but it was slower than doing it on the CPU. It should be faster for lczero, since the innerproducts are much bigger. I'll see if I can port it over.

@Error323
Collaborator

Error323 commented Feb 25, 2018

Hey @glinscott, sorry, it's not fully resolved yet. We should still move the innerproducts to the GPU, at least the policy head's 32x8x8 -> 1924 and the value head's 32x8x8 -> 128.
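For a sense of scale, the two layers named above can be sized with a few lines of arithmetic. This is only a sketch: the shapes come from the comment, while the helper names and the plain-Python reference inner product (y = Wx + b) are illustrative assumptions, not the project's code.

```python
def fc_layer_size(in_channels: int, height: int, width: int, outputs: int) -> dict:
    """Size of a fully connected (innerproduct) layer fed by a CxHxW tensor."""
    inputs = in_channels * height * width   # flattened input length
    weights = inputs * outputs              # one weight per (input, output) pair
    return {"inputs": inputs, "outputs": outputs, "weights": weights}


def inner_product(weights, bias, x):
    """Reference innerproduct: y[o] = bias[o] + sum_i weights[o][i] * x[i]."""
    return [b + sum(w * v for w, v in zip(row, x))
            for row, b in zip(weights, bias)]


if __name__ == "__main__":
    policy = fc_layer_size(32, 8, 8, 1924)  # 32x8x8 -> 1924
    value = fc_layer_size(32, 8, 8, 128)    # 32x8x8 -> 128
    print("policy:", policy)
    print("value: ", value)
```

Each output needs one multiply-add per input, so the weight count is also the number of multiply-adds per position, which is why the larger policy layer dominates the cost.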

@glinscott
Owner Author

Ah, yes, good call @Error323. After some quick work by @Ttl the innerproducts are now moved to GPU as well! Thanks @Ttl!
