Cannot run sample-spmv on CPU #149
Comments
Hi @treasurebox, CPU devices have not been a target for either testing or performance work, so we can make no claims that it will work or that it will perform well. It should work in theory, since OpenCL abstracts the device implementation, assuming there are no problems in the runtime. -1015 is the code for clsparseInvalidKernelExecution. It is returned from many places in the code. Can you step through and see where this return code comes from? Are you building debug versions of the library?
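Since the same code is returned from many places, one way to narrow it down without a debugger is to wrap each status-returning call in a macro that records the file, line, and expression of the first failure. The following is only a minimal sketch: `Status`, `CHECK_STATUS`, `last_failure`, `launch_kernel`, and `run_spmv` are hypothetical names made up for illustration and are not part of the library. Only the value -1015 is taken from this thread.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the library's status type. Only the value
// -1015 comes from the thread; everything else here is illustrative.
enum Status : int {
    kSuccess = 0,
    kInvalidKernelExecution = -1015
};

// Remembers where the most recent failing call was made.
struct FailureSite {
    int code = 0;
    std::string file;
    int line = 0;
    std::string expr;
};

inline FailureSite& last_failure() {
    static FailureSite site;
    return site;
}

// Evaluates `call` once. On a non-success status it records and prints
// the call site, then returns the status from the enclosing function.
#define CHECK_STATUS(call)                                            \
    do {                                                              \
        int status_ = static_cast<int>(call);                         \
        if (status_ != kSuccess) {                                    \
            FailureSite& site_ = last_failure();                      \
            site_.code = status_;                                     \
            site_.file = __FILE__;                                    \
            site_.line = __LINE__;                                    \
            site_.expr = #call;                                       \
            std::fprintf(stderr, "%s:%d: %s returned %d\n",           \
                         __FILE__, __LINE__, #call, status_);         \
            return status_;                                           \
        }                                                             \
    } while (0)

// Hypothetical step that either succeeds or reports -1015.
int launch_kernel(bool ok) {
    return ok ? kSuccess : kInvalidKernelExecution;
}

// Hypothetical driver with two checked steps; the log line shows
// which of the two produced the error.
int run_spmv(bool first_ok, bool second_ok) {
    CHECK_STATUS(launch_kernel(first_ok));
    CHECK_STATUS(launch_kernel(second_ok));
    return kSuccess;
}
```

Calling `run_spmv(true, false)` returns -1015 and writes the file, line, and expression of the second call to stderr, which is the kind of information that would help pin down the failing path here.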
For what it's worth, I tried to test this by changing line 126 of sample-spmv.cpp to target the CPU. When running this on an AMD A10-7850K CPU using the AMD APP SDK on Linux, the program completed successfully (without Error -1015). As such, we will likely need your help in debugging this. Thank you for offering; your help with the previous double precision issue is greatly appreciated.

As for the performance of the algorithm on a CPU, as kknox said, we have not yet done any performance analysis or optimizations for CPUs. The SpMV algorithms we have currently implemented are focused on optimizing GPU performance. (For example, the csrmv_adaptive algorithm is described in the paper "Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format" from SC14 and in the paper "Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices" at the upcoming HiPC 2015.) If you get a chance to test the performance on a CPU, I would be interested to hear the results.
Hi! Sorry for disappearing. I'd love to help debug this, but I am currently a bit swamped with my other duties.
Hi!
I've modified the sample to use CL_DEVICE_TYPE_CPU, for comparison/benchmarking purposes. It didn't work:
Let me know how I can help diagnose this!
As a secondary question, can I expect the algorithm to work with reasonable performance on a CPU, or is it only good for an actual GPU?
(I am on jlgreathouse's repo develop branch currently)