Error in tuning results #116

Closed · MigMuc opened this issue Oct 15, 2016 · 8 comments

MigMuc commented Oct 15, 2016

I started tuning for my GPU device (Tonga) as described in the README. Then I ran the Python script, which fails with a division-by-zero error. I found that there is one *.json file which has a timing result of 0.000:
{
  "kernel": "XgemvFast",
  "time": 0.000,
  "parameters": {"WGS2": 64, "WPT2": 4, "VW2": 4, "PRECISION": 6464}
},

Looking at other timings, I see a bunch of results with timings close to 0.000 (on the order of microseconds):
{
  "kernel": "XgemvFast",
  "time": 0.003,
  "parameters": {"WGS2": 128, "WPT2": 1, "VW2": 1, "PRECISION": 6464}
},

I think this is not realistic.
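
For illustration, here is a minimal C++ sketch of the arithmetic that breaks down (the actual script is Python and is not shown in this thread; the helper name and FLOP count below are made up for the example). Deriving throughput from a measured time divides by that time, so a reported 0.000 is fatal:

    #include <cstdio>
    #include <stdexcept>

    // Hypothetical helper mirroring what a tuning script does when it
    // converts a measured kernel time (in ms) into throughput (GFLOPS).
    double ComputeGFLOPS(const double flops, const double time_ms) {
      if (time_ms <= 0.0) {
        // A "time": 0.000 entry makes the division below undefined,
        // matching the division-by-zero error from the script.
        throw std::runtime_error("non-positive kernel time in tuning result");
      }
      return (flops * 1.0e-9) / (time_ms * 1.0e-3);
    }

    int main() {
      try {
        std::printf("%.1f GFLOPS\n", ComputeGFLOPS(2.0e9, 0.000));
      } catch (const std::exception &e) {
        std::printf("error: %s\n", e.what());
      }
      return 0;
    }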

@CNugteren (Owner)

Could you share the whole output of the XgemvFast tuner as shown on screen? And perhaps also upload the JSON as a zip file? Thanks!

MigMuc commented Oct 15, 2016

Tuning Xgemv now, it does not seem to have produced any timings of exactly 0.0. I have attached two files: the output of the tuner and one of the corresponding JSON files. Do you want all of them?
The tuner was run with ./clblast_tuner_xgemv -precision 6464.

xgemv_tuner.txt
clblast_xgemv_fast_6464.json.txt

@CNugteren (Owner)

Thanks for the input. Some observations:

  • The 0.0 ms (or close-to-zero) timings are clearly an error, since the attained throughput would be way beyond what you would expect on any device.
  • It could be that the kernel crashed or did not run at all. However, in that case CLTune should give a warning stating that the results are incorrect. Apparently it does not; perhaps there is a bug in the results-checking? I will verify this.
  • The problem occurs across all 3 flavours of the GEMV kernels: Xgemv, XgemvFast, and XgemvFastRot, so it doesn't seem to be kernel-specific. Of course, these kernels do share some common code.

Is this only the case for double-complex precision (6464)? Perhaps the compiler can't handle some parts of the kernel for certain configurations if it becomes too complicated?

By the way, this is on the development branch, right?

MigMuc commented Oct 16, 2016

It turns out that times close to zero do not occur only with the Xgemv kernel but with the Xdot kernel as well, and for other precisions, too. This behaviour is seen on the development branch. I am tuning the master branch right now.

By the way, what exactly do you mean by "the kernel ... becomes too complicated"?
I observed some errors which I posted in another issue, #107. Maybe they are somehow related.

MigMuc commented Oct 18, 2016

I was playing around with the OpenCL example you provide in the myGEMM project. I wondered why I could not run the example. As it turned out, I cannot set the thread-block dimension > 16: the GPU I am working on does not support work-group sizes larger than 256 (and 16 × 16 = 256).
So I thought that tuning Xgemm with this value limited to <= 256 might work; exceeding the limit could be the reason why tuning of the kernel fails. How can I limit this parameter when tuning? Is there a command-line argument to set the limit?
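
For reference, the device limit can be queried with standard OpenCL API calls; this is a standalone sketch (not CLBlast or CLTune code; error checking omitted for brevity):

    #include <CL/cl.h>
    #include <cstdio>

    int main() {
      // Grab the first platform and the first GPU device on it.
      cl_platform_id platform;
      clGetPlatformIDs(1, &platform, nullptr);
      cl_device_id device;
      clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

      // Query the maximum total work-group size of the device.
      size_t max_wgs = 0;
      clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                      sizeof(max_wgs), &max_wgs, nullptr);
      std::printf("max work-group size: %zu\n", max_wgs);  // 256 on this Tonga GPU

      // A tuner should skip any configuration whose total local size exceeds
      // this value; launching such a kernel fails with
      // CL_INVALID_WORK_GROUP_SIZE (OpenCL error -54).
      return 0;
    }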

@CNugteren (Owner)

This should be handled by the tuner itself or internally in CLTune. I'll investigate why this is not happening.

@CNugteren (Owner)

I double-checked, and CLTune still seems to handle the maximum work-group size limitation. Which version of CLTune are you on? A release version, and if so, which one? Perhaps you can update to the latest development version and try again? Could you also update to the latest development branch of CLBlast?

I also verified that in case of an OpenCL error, CLTune catches it and does not report that specific run as a valid result, e.g.:

[ RUN      ] Running vector_add
[   FAILED ] Kernel vector_add failed
[   FAILED ]   catched exception: Internal OpenCL error: -54
[   FAILED ] vector_add;      0.0 ms;GROUP_SIZE 256;

Also, if results are incorrect, it reports this and does not mark the run as valid, e.g.:

[ RUN      ] Running Xgemv
[       OK ] Completed Xgemv (5.8 ms) - 3 out of 12
[  WARNING ] Results differ: L2 norm is 5.34e+04
[  WARNING ] Xgemv;      5.8 ms;  WGS1 32;   WPT1 4;PRECISION 3232;

What could of course be the case is that the original (reference) kernel produced incorrect results as well; then, if another kernel produces the same incorrect results, it is marked as correct. One example would be a kernel that doesn't do anything at all. But in that case you would expect all runs to take 0.0 ms.
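
As an illustration of such a reference check, comparing a kernel's output against a reference via the L2 norm of the difference might look as follows (the tolerance and function name are assumptions for the example; this is not CLTune's actual implementation):

    #include <cmath>
    #include <vector>

    // Compare a tuned kernel's output against a reference result using the
    // L2 norm of the element-wise difference, as in the warning above.
    bool ResultsMatch(const std::vector<double> &result,
                      const std::vector<double> &reference,
                      const double tolerance = 1.0e-4) {  // tolerance assumed
      double sum = 0.0;
      for (std::size_t i = 0; i < result.size(); ++i) {
        const double diff = result[i] - reference[i];
        sum += diff * diff;
      }
      // Caveat from above: if the reference itself is wrong, a second kernel
      // producing the same wrong output still passes this check.
      return std::sqrt(sum) < tolerance;
    }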

Could you re-upload the xgemv_tuner.txt after you've updated to the latest versions of CLTune and CLBlast? Thanks!

MigMuc commented Oct 22, 2016

I switched from the CLTune master branch to the development branch, and now everything works as expected. Thank you for your suggestions. I have posted the tuning results in the corresponding issue (#1).

@MigMuc MigMuc closed this as completed Oct 22, 2016