Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probable bug in conversion routines #177

Open
kvaragan opened this issue Dec 17, 2015 · 14 comments
Open

Probable bug in conversion routines #177

kvaragan opened this issue Dec 17, 2015 · 14 comments
Labels

Comments

@kvaragan
Copy link
Contributor

test case: Reuters911.mtx. Environment: Linux 64-bit, W9100.
test-conversion routines reports failed tests:
dense_to_csr, where TypeParam = float
coo_to_csr, where TypeParam = float
csr_to_dense, where TypeParam = double
dense_to_csr, where TypeParam = double
csr_to_coo, where TypeParam = double
coo_to_csr, where TypeParam = double

We need to look at conversion routines.

@kvaragan kvaragan added the bug label Dec 17, 2015
@kvaragan
Copy link
Contributor Author

Reuters911 has integer symmetric data, as a result in dense2csr conversion fails in the function
atomic_reduce<T, F_OP>. The template parameter T is clsparseIdx_t which is cl_uint. But in the function atomic_reduce, the condition else if (typeid(cl_int) == typeid(T)) fails. We can't change cl_int to cl_uint since mtx data can be integer and not just unsigned int.

@kknox
Copy link
Contributor

kknox commented Dec 17, 2015

Hi Kvaraga~
I'm looking at the contents of the Reuters911 file, and I don't see negative indices. Do you feel that we should be using signed types as indices? I don't know of a use case.

@kvaragan
Copy link
Contributor Author

The indices are not negative, it is the values which can be negative just like in the real matrices. So we need to support integer values as well. I added the condition typeid(cl_sparseIdx_t) == typeid(T) to remove clsparseInvalidType error.

@jpola
Copy link
Contributor

jpola commented Jan 18, 2016

Hi,

Have you tried to run any of clSPARSE programs on CUDA device? I have GTX 660 with CUDA 7.5. It is capable to execute OpenCL 1.2 programs however when I executed any of samples I obtain following error:

#ifndef WG_SIZE
#error WG_SIZE undefined!
#endif

__kernel
__attribute__((reqd_work_group_size(WG_SIZE,1,1)))
void control(void)
{
    return;
}


---------------------------------------
parameters:  -cl-kernel-arg-info -cl-std=CL1.2 -DWG_SIZE=1024
---------------------------------------
ptxas application ptx input, line 31; fatal   : Parsing error near '}': syntax error
ptxas fatal   : Ptx assembly aborted due to errors


@kknox
Copy link
Contributor

kknox commented Jan 18, 2016

Hi @jpola
I've not tried to run tests on an Nvidia device in some time. Is it any kernel giving this parse error? I will try to repro my tomorrow.

@kvaragan
Copy link
Contributor Author

Just tried running clsparse binaries on NV systems, they are not working. Some are producing seg-faults and some are producing the error mentioned by @jpola .

@jpola
Copy link
Contributor

jpola commented Jan 19, 2016

Since CUDA devices are now supporting OpenCL 1.2 standard, the whole flow:
compilation, kernelWrapper and sourceProvider need to be revisited. Since
currently I do not have AMD GPU available, I can't proceed with the isssue
related to conversion routines. Sorry.

2016-01-19 7:35 GMT+01:00 Kiran [email protected]:

Just tried running clsparse binaries on NV systems, they are not working.
Some are producing seg-faults and some are producing the error mentioned by
@jpola https://github.com/jpola .


Reply to this email directly or view it on GitHub
#177 (comment)
.

@jpola
Copy link
Contributor

jpola commented Jan 19, 2016

I think but not sure that -cl-kernel-arg-info is causing the problem.

@kknox
Copy link
Contributor

kknox commented Jan 20, 2016

Hi @jpola
It took me a few hours to build a machine again, but I think I can help you. This is not a valid fix for checkin, but minimal changes to execute on an nvidia device again.

kernel-cache.cpp:45: #if( BUILD_CLVERSION >= 200 )

kernel-wrap.hpp:123: #if( BUILD_CLVERSION < 200 )

@jpola
Copy link
Contributor

jpola commented Jan 25, 2016

Hi,

I did some research regarding the bug.
<goog_146701560>

  1. reduce_by_key.hpp
    https://github.com/clMathLibraries/clSPARSE/blob/develop/src/library/transform/reduce-by-key.hpp#L132
    there should be size_input instead of size.
  2. The main problem lays in the indices_to_offsets function. The code is
    working for the matrices having all rows filled with at least one non-zero
    value. ie. diagonal matrix. It seems to me that if the matrix have at least
    one empty row the procedure will generate wrong results. If you have Matlab
    or python scipy you can use spy function to take closer look on the
    matrices which causing the problem i.e. Reuters. There are many empty rows.
    In opposite the 3elt or Cities which are good. For the fast fix I would put
    that procedure on host and change it's implementation on the one that will
    use several for loops. We are not able to say quickly what is the sparsity
    pattern of a given matrix without scanning it and then make a decision
    where to execute the code. In example

if(matrix_is_good(coo))
//use_gpu_implementation;
else
//use_host_implementation;

I didn't foresee this situation because in CFD I haven't got such matrices
like Reuters.

Well done generalization of indices_to_offsets function can be achieved by
using vectorized binary search like the cusp is doing. We could probably do
that by using boost::compute. This library seems to have all component
for_each, zip_iterator, counting_iterator, binary_search and closures
mechanism to generate functors for search algorithms.

Kuba.

2016-01-20 1:00 GMT+01:00 Kent Knox [email protected]:

Hi @jpola https://github.com/jpola
It took me a few hours to build a machine again, but I think I can help
you. This is not a valid fix for checkin, but minimal changes to execute on
an nvidia device again.

kernel-cache.cpp:45
https://github.com/clMathLibraries/clSPARSE/blob/master/src/library/internal/kernel-cache.cpp#L45:
#if( BUILD_CLVERSION >= 200 )

kernel-wrap.hpp:123
https://github.com/clMathLibraries/clSPARSE/blob/master/src/library/internal/kernel-wrap.hpp#L123:
#if( BUILD_CLVERSION < 200 )


Reply to this email directly or view it on GitHub
#177 (comment)
.

@kknox
Copy link
Contributor

kknox commented Jan 25, 2016

Hey Kuba~
Thanks for taking the time to triage, i know that you are busy with other projects. Does your quick fix only apply to a few of the conversion routines? It looks like you have 2 fixes above, one is simple and one is more involved. If everything is fixed, which routines would slow down (by offloading to CPU)?

Kent

@jpola
Copy link
Contributor

jpola commented Jan 25, 2016

The routines which will be aftected are dense_to_csr and coo_to_csr. They
both converting coo format to csr.

2016-01-25 20:29 GMT+01:00 Kent Knox [email protected]:

Hey Kuba~
Thanks for taking the time to triage, i know that you are busy with other
projects. Does your quick fix only apply to a few of the conversion
routines? It looks like you have 2 fixes above, one is simple and one is
more involved. If everything is fixed, which routines would slow down (by
offloading to CPU)?

Kent


Reply to this email directly or view it on GitHub
#177 (comment)
.

@kknox
Copy link
Contributor

kknox commented Jan 25, 2016

Hi @jpola
If you could make the fast fixes and check them into the /develop branch, we will look into the greater overall fix if/when we transition to boost.compute. I would appreciate if you could comment the code appropriately for why the computation is routed to CPU.

K

@jpola
Copy link
Contributor

jpola commented Jan 26, 2016

Sure, I will do that.

2016-01-25 23:32 GMT+01:00 Kent Knox [email protected]:

Hi @jpola https://github.com/jpola
If you could make the fast fixes and check them into the /develop branch,
we will look into the greater overall fix if/when we transition to
boost.compute. I would appreciate if you could comment the code
appropriately for why the computation is routed to CPU.

K


Reply to this email directly or view it on GitHub
#177 (comment)
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants