How was 'java_large_adversarial' created?
Was it created by 'preprocess.py'?
Also, I checked the c2v files, but they don't have java code strings.
I wonder why they were made in that form like 'find|class override,1796187300,? override,-733851942,METHOD_NAME ?,929015244,METHOD_NAME ?,-1951158817,name ?,380512398,string METHOD_NAME,1261040172 ~~~'.
Hi @skkuai,
Thank you for your interest in our paper!
@noamyft created the dataset. I think he used preprocess.**sh** (not preprocess.py), which calls JavaExtractor, a modified version of the Java extractor from the original code2vec code.
The numbers in the c2v files, like "1796187300" and "-733851942", are hashes of the original path strings.
The original path strings look like "ForLoop^Expression^Block_IfCondition", and the Java extractor just hashes these strings.
This is how it was done in the original code2vec, to save disk space (storing a string of about 10 characters rather than a much longer one).
The JavaExtractor has a flag, --no_hash, that keeps these path strings in their original form instead of hashing them. It has no effect on model performance; the dataset just takes more space on disk.
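To illustrate the hashing step, here is a minimal Python sketch. It assumes the extractor uses Java's standard `String.hashCode()` (a 32-bit signed integer), which is consistent with values such as "1796187300" and "-733851942" but is not confirmed in this thread. The function name `java_string_hashcode` is hypothetical.

```python
def java_string_hashcode(s: str) -> int:
    """Emulate Java's String.hashCode() for strings of ASCII/BMP characters.

    Assumption: the extractor hashes each path string this way.
    """
    h = 0
    for ch in s:
        # Java computes h = 31 * h + c with 32-bit overflow.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed int, as Java does.
    if h >= 0x80000000:
        h -= 0x100000000
    return h


path = "ForLoop^Expression^Block_IfCondition"
hashed = java_string_hashcode(path)
# The hashed form is at most 11 characters, however long the path is.
print(len(path), len(str(hashed)))
```

Because the hash is one-way, the original path string cannot be recovered from a number in a c2v file; keeping the readable paths requires re-running the extractor with --no_hash.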
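As a rough sketch of how a c2v line can be read, the snippet below assumes the usual code2vec layout: the first space-separated field is the method name (subtokens joined by `|`), and each following field is a `left_token,path,right_token` triple. The names `PathContext` and `parse_c2v_line` are hypothetical, and the sample is a shortened form of the line quoted in the question.

```python
from typing import List, NamedTuple, Tuple


class PathContext(NamedTuple):
    left: str   # token at one end of the AST path
    path: str   # hashed path (or the raw path string, kept as text)
    right: str  # token at the other end of the AST path


def parse_c2v_line(line: str) -> Tuple[str, List[PathContext]]:
    """Split one c2v line into its target name and path contexts."""
    parts = line.split()
    if not parts:
        raise ValueError("empty c2v line")
    target, raw_contexts = parts[0], parts[1:]
    contexts = []
    for raw in raw_contexts:
        fields = raw.split(",")
        if len(fields) != 3:
            continue  # skip padding or truncated entries
        contexts.append(PathContext(*fields))
    return target, contexts


sample = ("find|class override,1796187300,? "
          "override,-733851942,METHOD_NAME ?,929015244,METHOD_NAME")
target, contexts = parse_c2v_line(sample)
print(target)            # the method name, as subtokens joined by '|'
for ctx in contexts:
    print(ctx.left, ctx.path, ctx.right)
```

This is why the files contain no Java source: each method is reduced to its name plus a bag of such triples.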
Let us know if you have any other questions.
Best,
Uri