How was 'java_large_adversarial' created? #1

Open
skkuai opened this issue Sep 20, 2021 · 1 comment

skkuai commented Sep 20, 2021

Dear sir,

How was 'java_large_adversarial' created?
Was it created by 'preprocess.py'?

Also, I checked the c2v files, but they don't contain Java code strings.
Why are they stored in a form like 'find|class override,1796187300,? override,-733851942,METHOD_NAME ?,929015244,METHOD_NAME ?,-1951158817,name ?,380512398,string METHOD_NAME,1261040172 ~~~'?

Contributor

urialon commented Sep 22, 2021

Hi @skkuai,
Thank you for your interest in our paper!

@noamyft created the dataset. I think he used preprocess.**sh**, which calls JavaExtractor, a modified version of the Java extractor from the original code2vec code.

The numbers in the c2v files, like "1796187300" and "-733851942", are hashes of the original path strings.
The original path strings look like "ForLoop^Expression^Block_IfCondition", and the Java extractor just hashes these strings.
This is how it was done in the original code2vec, to save disk space (storing a string of about 10 characters rather than a much longer one).
The JavaExtractor has a --no_hash flag that keeps these path strings in their original form instead of hashing them. It has no effect on model performance; the dataset just takes more space on disk.
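
To make the format concrete, here is a minimal Python sketch of how a path string could turn into one of those short integers, and how a c2v line splits into its parts. It assumes the extractor uses Java's standard `String.hashCode()`, which is consistent with the signed 32-bit values above but is not confirmed in this thread. The helper names `java_string_hash` and `parse_c2v_line` are hypothetical and are not part of the repository.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() (assumed, not confirmed, to be the
    hash the extractor applies): h = 31*h + code_unit over UTF-16 code
    units, wrapped to a signed 32-bit integer."""
    h = 0
    data = s.encode("utf-16-be")  # two bytes per UTF-16 code unit
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as signed, as Java's int would be.
    return h - 0x100000000 if h >= 0x80000000 else h


def parse_c2v_line(line: str):
    """Split one c2v line into (label, [(source, path, target), ...]).

    The first space-separated field is the method-name label; every other
    field is a comma-separated source,path,target context."""
    label, *contexts = line.split()
    triples = [tuple(ctx.split(",")) for ctx in contexts]
    return label, triples


if __name__ == "__main__":
    path = "ForLoop^Expression^Block_IfCondition"
    hashed = java_string_hash(path)
    # The hashed form is at most 11 characters, however long the path is.
    print(len(path), len(str(hashed)))

    sample = "find|class override,1796187300,? override,-733851942,METHOD_NAME"
    print(parse_c2v_line(sample))
```

With hashing enabled, the middle field of each context is one of these integers; with --no_hash it would be the full path string instead, so the same parser applies as long as the path contains no spaces or commas.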

Let us know if you have any other questions.
Best,
Uri
