Error in tuning results #116

Closed · MigMuc opened this issue Oct 15, 2016 · 8 comments

MigMuc commented Oct 15, 2016

I started tuning for my GPU device (Tonga) as described in the README. Then I ran the Python script, which fails with a division-by-zero error. I found that there is one *.json file which has a timing result of 0.000:
{
  "kernel": "XgemvFast",
  "time": 0.000,
  "parameters": {"WGS2": 64, "WPT2": 4, "VW2": 4, "PRECISION": 6464}
},

Looking at other timings, I see a bunch of results with timings close to 0.000 (on the order of microseconds):
{
  "kernel": "XgemvFast",
  "time": 0.003,
  "parameters": {"WGS2": 128, "WPT2": 1, "VW2": 1, "PRECISION": 6464}
},

I think this is not realistic.
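
For illustration, here is a minimal C++ sketch of the arithmetic that breaks down (the actual script is Python and is not shown in this thread; the helper name and FLOP count below are made up for the example). Deriving throughput from a measured time divides by that time, so a reported 0.000 is fatal:

    #include <cstdio>
    #include <stdexcept>

    // Hypothetical helper mirroring what a tuning script does when it
    // converts a measured kernel time (in ms) into throughput (GFLOPS).
    double ComputeGFLOPS(const double flops, const double time_ms) {
      if (time_ms <= 0.0) {
        // A "time": 0.000 entry makes the division below undefined,
        // matching the division-by-zero error from the script.
        throw std::runtime_error("non-positive kernel time in tuning result");
      }
      return (flops * 1.0e-9) / (time_ms * 1.0e-3);
    }

    int main() {
      try {
        std::printf("%.1f GFLOPS\n", ComputeGFLOPS(2.0e9, 0.000));
      } catch (const std::exception &e) {
        std::printf("error: %s\n", e.what());
      }
      return 0;
    }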

@CNugteren (Owner)

Could you share the whole output of the XgemvFast tuner as shown on screen? And perhaps also upload the JSON as a zip file? Thanks!

MigMuc commented Oct 15, 2016

Tuning Xgemv now, it does not seem to have produced any timings of exactly 0.0. I have attached two files: the output of the tuner and one of the corresponding JSON files. Do you want all of them?
The tuner was run with ./clblast_tuner_xgemv -precision 6464.

xgemv_tuner.txt
clblast_xgemv_fast_6464.json.txt

@CNugteren (Owner)

Thanks for the input. Some observations:

  • The 0.0 ms (or close-to-zero) timings are clearly an error, since the attained throughput would be way beyond what you would expect on any device.
  • It could be that the kernel crashed or did not run at all. However, in that case CLTune should give a warning stating that the results are incorrect. Apparently it does not; perhaps there is a bug in the results-checking? I will verify this.
  • The problem occurs across all 3 flavours of the GEMV kernels: Xgemv, XgemvFast, and XgemvFastRot, so it doesn't seem to be kernel-specific. Of course, these kernels do share some common code.

Is this only the case for double-complex precision (6464)? Perhaps the compiler can't handle some parts of the kernel for certain configurations if it becomes too complicated?

By the way, this is on the development branch, right?

MigMuc commented Oct 16, 2016

It turns out that times close to zero do not occur only with the Xgemv kernel but with the Xdot kernel as well, and for other precisions, too. This behaviour is seen on the development branch. I am tuning the master branch right now.

By the way, what exactly do you mean by "the kernel ... becomes too complicated"?
I observed some errors which I posted in another issue, #107. Maybe they are somehow related.

MigMuc commented Oct 18, 2016

I was playing around with the OpenCL example you provide in the myGEMM project. I wondered why I could not run the example. As it turned out, I cannot set the thread-block dimension > 16: the GPU I am working on does not support work-group sizes larger than 256 (and 16 × 16 = 256).
So I thought that tuning Xgemm with this value limited to <= 256 might work; exceeding the limit could be the reason why tuning of the kernel fails. How can I limit this parameter when tuning? Is there a command-line argument to set the limit?
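
For reference, the device limit can be queried with standard OpenCL API calls; this is a standalone sketch (not CLBlast or CLTune code; error checking omitted for brevity):

    #include <CL/cl.h>
    #include <cstdio>

    int main() {
      // Grab the first platform and the first GPU device on it.
      cl_platform_id platform;
      clGetPlatformIDs(1, &platform, nullptr);
      cl_device_id device;
      clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

      // Query the maximum total work-group size of the device.
      size_t max_wgs = 0;
      clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                      sizeof(max_wgs), &max_wgs, nullptr);
      std::printf("max work-group size: %zu\n", max_wgs);  // 256 on this Tonga GPU

      // A tuner should skip any configuration whose total local size exceeds
      // this value; launching such a kernel fails with
      // CL_INVALID_WORK_GROUP_SIZE (OpenCL error -54).
      return 0;
    }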

@CNugteren (Owner)

This should be handled by the tuner itself or internally in CLTune. I'll investigate why this is not happening.

@CNugteren (Owner)

I double-checked, and CLTune still seems to handle the maximum work-group size limitation. Which version of CLTune are you on? A release version, and if so, which one? Perhaps you can update to the latest development version and try again? Could you also update to the latest development branch of CLBlast?

I also verified that in case of an OpenCL error, CLTune catches it and does not report that specific run as a valid result, e.g.:

[ RUN      ] Running vector_add
[   FAILED ] Kernel vector_add failed
[   FAILED ]   catched exception: Internal OpenCL error: -54
[   FAILED ] vector_add;      0.0 ms;GROUP_SIZE 256;

Also, if results are incorrect, it reports this and does not mark the run as valid, e.g.:

[ RUN      ] Running Xgemv
[       OK ] Completed Xgemv (5.8 ms) - 3 out of 12
[  WARNING ] Results differ: L2 norm is 5.34e+04
[  WARNING ] Xgemv;      5.8 ms;  WGS1 32;   WPT1 4;PRECISION 3232;

What could of course be the case is that the original (reference) kernel produced incorrect results as well; then, if another kernel produces the same incorrect results, it is marked as correct. One example would be a kernel that doesn't do anything at all. But in that case you would expect all runs to take 0.0 ms.
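
As an illustration of such a reference check, comparing a kernel's output against a reference via the L2 norm of the difference might look as follows (the tolerance and function name are assumptions for the example; this is not CLTune's actual implementation):

    #include <cmath>
    #include <vector>

    // Compare a tuned kernel's output against a reference result using the
    // L2 norm of the element-wise difference, as in the warning above.
    bool ResultsMatch(const std::vector<double> &result,
                      const std::vector<double> &reference,
                      const double tolerance = 1.0e-4) {  // tolerance assumed
      double sum = 0.0;
      for (std::size_t i = 0; i < result.size(); ++i) {
        const double diff = result[i] - reference[i];
        sum += diff * diff;
      }
      // Caveat from above: if the reference itself is wrong, a second kernel
      // producing the same wrong output still passes this check.
      return std::sqrt(sum) < tolerance;
    }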

Could you re-upload the xgemv_tuner.txt after you've updated to the latest versions of CLTune and CLBlast? Thanks!

MigMuc commented Oct 22, 2016

I switched from the CLTune master branch to the development branch, and now everything works as expected. Thank you for your suggestions. I have posted the tuning results in the corresponding issue (#1).

@MigMuc MigMuc closed this as completed Oct 22, 2016