Hey, first PR for me here:
adding ShaderEval task 1 (return completion); this essentially implements the task as it already exists in the EvaluationSuite.

The task is very much just meant as a "proof of concept", as there are several issues with it. I do plan on introducing more tasks to this benchmark soon and also making them generally better. I have several questions too.
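For context, here is a minimal sketch of what such a "return completion" task roughly looks like in the harness. This is not the code from this PR: the base-class import path, the dataset path, and the field names (`body`, `return_statement`) are placeholders, and the methods simply follow the harness's usual Task interface.

```python
# Minimal sketch only - dataset path, field names and import path are assumptions.
from evaluate import load
from lm_eval.base import Task  # base class location may differ between harness versions


class ReturnCompletion(Task):
    DATASET_PATH = "Vipitis/Shadertoys-fine"  # placeholder dataset name

    def __init__(self):
        # stop generating once a ";" has been produced; no code execution needed
        super().__init__(stop_words=[";"], requires_execution=False)

    def get_dataset(self):
        return self.dataset["test"]

    def get_prompt(self, doc):
        # the function body up to the point where the return statement starts
        return doc["body"]

    def get_reference(self, doc):
        # the ground-truth return statement
        return doc["return_statement"]

    def postprocess_generation(self, generation, idx):
        # strip the prompt and keep everything up to and including the first ";"
        prompt = self.get_prompt(self.get_dataset()[idx])
        completion = generation[len(prompt):]
        return completion.split(";")[0] + ";"

    def process_results(self, generations, references):
        # exact match between generated and reference return statements
        exact_match = load("exact_match")
        return exact_match.compute(
            predictions=[gen[0] for gen in generations],
            references=references,
        )
```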
Some differences that should not impact the results:

- `postprocess_generation` returns the generation truncated at the first `";"`, not the list of all tokens containing the semicolon (does the `EndOfFunctionCriteria` handle this?) - see the sketch after this list
- `--do_sample False` to use greedy search (temperature can't be set to 0?)
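To make the first point concrete, here is a tiny illustration (hypothetical code, not from this PR) of the difference between trimming the decoded string at the first `";"` and a stopping criterion, which only decides when generation halts and does not trim the output:

```python
def truncate_at_semicolon(completion: str) -> str:
    # what the post-processing effectively does: keep everything up to and
    # including the first ";" of the decoded completion
    return completion.split(";")[0] + ";"


# A stopping criterion such as EndOfFunctionCriteria typically only halts
# generation once the stop word has appeared in every sequence of the batch,
# so stray tokens after the ";" can still be present in the raw text and
# need to be cut afterwards.
raw = "return color * 0.5; } // stray tokens generated before stopping"
print(truncate_at_semicolon(raw))  # -> "return color * 0.5;"
```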
Concerns I hope to address:

- `import fcntl` fails on my home machine (Windows) as that module does not exist there, so running the tests wasn't possible

Results match for `gpt2`, `bigscience/bloom-560m` and `Vipitis/santacoder-finetuned-Shadertoys-fine` when running just 10 samples. Additionally, I did a single run with 300 samples (this snippet is used throughout the paper) and got matching numbers of 0.566.

Run parameters were the following:
```
accelerate launch main.py \
    --model Vipitis/santacoder-finetuned-Shadertoys-fine \
    --tasks ShaderEval \
    --limit 10 \
    --do_sample False \
    --save_generations \
    --save_generations_path generations_py.json \
    --use_auth_token \
    --trust_remote_code
```
(Omitting the last two flags doesn't throw any error; the run still completes, even slower, but returns erroneous outputs.)