metal backend ? #179
Comments
Metal support has recently been merged in #175. It should be available in the next release.
Great! I have just tested it (dev) and it seems to run OK!
You could try manually writing the kernel with
I will try this!
While this seems to be hard-coded, in reality it is not: literal types are converted to the right type before the kernel expression is given back to the user (at parse time). To keep literals simple to handle, as a user you only need to distinguish between floats and integers; write for example
Yes. I confirm that the performance does not change with FP32 literals. Thanks again for the Metal backend!
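The parse-time literal conversion discussed above can be illustrated with a small, self-contained Julia sketch. It is not ParallelStencil's actual implementation: `convert_literals` is a hypothetical helper introduced here only to show the general idea of walking a kernel expression and rewriting float literals to the target precision while leaving integers alone.

```julia
# Hypothetical helper (not part of ParallelStencil): recursively rewrite
# floating-point literals in an expression to the number type T.
convert_literals(x, T) = x                      # symbols, integers, etc. are left unchanged
convert_literals(x::AbstractFloat, T) = T(x)    # float literals are converted to T
convert_literals(ex::Expr, T) =
    Expr(ex.head, (convert_literals(a, T) for a in ex.args)...)

# A kernel-like expression written with plain float and integer literals.
ex   = :(A[i] = 0.5 * B[i] + 2 * C[i])
ex32 = convert_literals(ex, Float32)

println(ex32)   # the float literal is now a Float32; the integer 2 is untouched
```

With this kind of rewriting, a user only has to decide whether a literal is a float or an integer, which would be consistent with the observation that writing FP32 literals by hand does not change performance.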
Which is about 53% of peak for your application. What's the percentage on e.g. an Nvidia GPU? Maybe tuning the kernel launch parameters could help a bit in Metal? Although 53% isn't that bad ;)
Yes, it is not bad for this size (512^3). I do not remember how to tune the kernel launch parameters with PS.
Fair point!
From
The remaining thread and block sizes are then inferred from ParallelStencil.jl/src/ParallelKernel/parallel.jl, lines 598 to 610 in d7b00ca.
@omlins it may be interesting to have the per-backend heuristic defined in the extension now that we have those?
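For readers who want to experiment with launch parameters by hand, here is a minimal sketch of how a block count can be derived from a chosen thread-block shape for the 512^3 case mentioned above. The thread shape `(32, 8, 1)` is a hypothetical choice for illustration, not the heuristic in `parallel.jl` and not a tuned value for Metal.

```julia
# Illustrative only: the thread-block shape below is a hypothetical choice,
# not the heuristic used in parallel.jl.
n        = (512, 512, 512)   # problem size mentioned in the thread
nthreads = (32, 8, 1)        # hypothetical threads per block (256 in total)

# Ceiling division, so that nblocks .* nthreads covers the whole grid
# even when n is not a multiple of nthreads.
nblocks = cld.(n, nthreads)

@assert all(nblocks .* nthreads .>= n)
println("nthreads = ", nthreads, ", nblocks = ", nblocks)
```

As far as I recall, ParallelStencil accepts explicit launch parameters in the form `@parallel nblocks nthreads kernel(...)`; please check the package documentation before relying on that, and treat any particular thread shape as something to benchmark rather than a recommendation.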
Thanks!
@LaurentPlagne: unfortunately, I don't have access to a Metal-capable GPU in order to test and tune myself.
and @albert-de-montserrat?
@omlins: I can run the test you provide on my machine if that helps.
@omlins I have not dug deep enough into that as of now, but I could try to gather some info and then come up with a heuristic. I can also run the tests, of course.
@LaurentPlagne Thanks a lot! Surely we can chat; contact me via the email address you can find on my GH profile page.
Hi,
Considering that Apple GPUs are the only devices offering very large, fast RAM (up to 128 GB) at a (relatively) affordable price,
I wonder how much work would be necessary to add a Metal backend to PS.
I also wonder whether the best way to achieve that would be to use a KernelAbstractions.jl backend or to extend the current PS collection of backends.
Any thoughts?
Laurent