Error in tuning results #116
Could you share the whole output of the gemvfast tuner as shown on screen? And perhaps also upload the JSON as a zip file? Thanks!
Tuning the Xgemv now; it seems not to have produced any timings of exactly 0.0 this time. I have attached 2 files: the output of the tuner and one of the corresponding JSON files. Do you want to have all of them?
Thanks for the input. Some observations:
Is this only the case for double-complex precision (6464)? Perhaps the compiler can't handle some parts of the kernel for certain configurations if it becomes too complicated? By the way, this is on the
It turns out that times close to zero do not happen only with the Xgemv kernel but with the Xdot kernel as well. It happens for other precisions, too. This behaviour is seen with the
By the way, what exactly do you mean by "the kernel ... becomes too complicated"?
I was playing around with the OpenCL example you provide in the myGEMM project. I wondered why I could not run the example; as it turned out, I cannot set the number of thread blocks > 16.
This should be handled by the tuner itself or internally in CLTune. I'll investigate why this is not happening. |
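As a rough illustration of that handling, the sketch below shows how a tuner could reject configurations whose total work-group size exceeds the device limit. This is not CLTune's actual code; the limit constant is a hypothetical stand-in for the value OpenCL reports via `clGetDeviceInfo(CL_DEVICE_MAX_WORK_GROUP_SIZE)`.

```python
# Sketch: filter out tuning configurations whose work-group (local) size
# exceeds the device's maximum. The limit below is a hypothetical example;
# a real tuner would query it from the OpenCL device.
from math import prod

DEVICE_MAX_WORK_GROUP_SIZE = 256  # hypothetical device limit

def is_valid_config(local_size):
    """Return True if the total work-group size fits the device limit."""
    return prod(local_size) <= DEVICE_MAX_WORK_GROUP_SIZE

configs = [(64, 1), (128, 2), (32, 16)]  # candidate local sizes
valid = [c for c in configs if is_valid_config(c)]
print(valid)  # (32, 16) = 512 threads is rejected
```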
I double-checked and CLTune still seems to handle the maximum work-group size limitation. Which version of CLTune are you on? A release version, and if so, which one? Perhaps you can update to the latest development version and try again? Could you also update to the latest development branch of CLBlast? I also verified that in case of an OpenCL error, CLTune catches it and does not report that specific run as a valid result, e.g.:
Also, if results are incorrect, it reports this and does not mark the run as valid, e.g.:
Of course, it could be that the original kernel produced incorrect results as well; in that case, if another kernel produces the same incorrect results, it is marked as correct. One example would be a kernel that doesn't do anything at all, but in that case you would expect all runs to take 0.0 ms. Could you re-upload the
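To illustrate the point above, here is a minimal sketch (not CLTune's actual implementation) of reference-based result verification: a run is marked valid only if its output matches the reference output within a tolerance. As noted, if the reference itself is wrong (e.g., produced by a kernel that does nothing), a candidate producing the same wrong output would still pass this check.

```python
# Sketch of reference-based verification, as a tuner might do it.
# A candidate run is valid only if its output matches the reference
# within a small tolerance; the tolerance value here is an assumption.
def results_match(candidate, reference, tol=1e-4):
    """Return True if candidate matches reference element-wise within tol."""
    if len(candidate) != len(reference):
        return False
    return all(abs(c - r) <= tol for c, r in zip(candidate, reference))

reference = [1.0, 2.0, 3.0]
print(results_match([1.0, 2.00005, 3.0], reference))  # small deviation: valid
print(results_match([0.0, 0.0, 0.0], reference))      # all-zero output: invalid
```

Note that if `reference` were itself all zeros, the all-zero candidate would wrongly pass, which is exactly the failure mode described above.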
I changed from the CLTune master branch to the development branch and now everything works as expected. Thank you for your suggestions. I have posted the tuning results on the corresponding issue (#1). |
I started to tune my GPU device (Tonga) as described in the README. Then I ran the Python script, which gave an error (division by zero). I found that there is one *.json file which has a timing result of 0.000.
{
  "kernel": "XgemvFast",
  "time": 0.000,
  "parameters": {"WGS2": 64, "WPT2": 4, "VW2": 4, "PRECISION": 6464}
},
Looking at other timings I see a bunch of results with timings close to 0.000 (order of nanoseconds).
{
  "kernel": "XgemvFast",
  "time": 0.003,
  "parameters": {"WGS2": 128, "WPT2": 1, "VW2": 1, "PRECISION": 6464}
},
I think this is not realistic.
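One way the post-processing script could guard against this crash is to drop results with zero (or implausibly small) timings before computing ratios, so a `"time": 0.000` entry cannot trigger a division by zero. The sketch below is an assumption, not the actual script: the JSON mirrors the snippets above plus one made-up entry, and the 0.001 ms threshold is arbitrary.

```python
# Sketch: filter out zero/near-zero timings from tuner JSON results
# before computing speedup ratios, avoiding division by zero.
# The data and threshold below are illustrative assumptions.
import json

raw = '''
{"results": [
  {"kernel": "XgemvFast", "time": 0.000,
   "parameters": {"WGS2": 64, "WPT2": 4, "VW2": 4, "PRECISION": 6464}},
  {"kernel": "XgemvFast", "time": 0.003,
   "parameters": {"WGS2": 128, "WPT2": 1, "VW2": 1, "PRECISION": 6464}},
  {"kernel": "XgemvFast", "time": 0.182,
   "parameters": {"WGS2": 256, "WPT2": 1, "VW2": 1, "PRECISION": 6464}}
]}
'''

MIN_TIME_MS = 0.001  # hypothetical plausibility threshold

data = json.loads(raw)
usable = [r for r in data["results"] if r["time"] >= MIN_TIME_MS]
baseline = usable[-1]["time"]
speedups = [baseline / r["time"] for r in usable]  # safe: no zero times
print(len(usable), "usable results")
```

This only prevents the crash; as the thread shows, the real fix was updating to the CLTune development branch so that invalid runs are not reported in the first place.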