rnn2source

This code implements JavaScript source code generation using deep recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) cells. In a nutshell, the model takes text files of source code, minifies them, and "reads" them; once trained, it generates sequences of source code. Two approaches are currently available:

  • Character Level Learning: This approach is based on Andrej Karpathy's char-rnn, modified to work with JavaScript repositories.

  • Labeled Character Learning: The first proposed approach categorizes each character as one of eight simple classes (regex, keyword, string, number, operator, punctuator, identifier, other). In this case the model takes both the character and its label as input and predicts the next character and label.
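To make the labeled-character idea concrete, here is a minimal sketch of how each character of a snippet could be paired with a class label. It is not the tagger used by this project: the token patterns, the `KEYWORDS` set, and the `label_characters` function are hypothetical and deliberately simplified (regex literals, for instance, are not detected and would fall through to other classes).

```python
import re

# Hypothetical, heavily simplified token patterns; a real JavaScript
# tokenizer handles many more cases (regex literals, templates, comments).
KEYWORDS = {"var", "function", "return", "if", "else", "for", "while"}
TOKEN_PATTERNS = [
    ("string", r'"(?:\\.|[^"\\])*"' + "|" + r"'(?:\\.|[^'\\])*'"),
    ("number", r"\d+(?:\.\d+)?"),
    ("identifier", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("operator", r"[-+*/%=<>!&|]+"),
    ("punctuator", r"[{}()\[\];,.]"),
]
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % p for p in TOKEN_PATTERNS))


def label_characters(source):
    """Return a list of (character, label) pairs for a source string."""
    # Anything no pattern matches (e.g. whitespace) keeps the "other" label.
    labels = ["other"] * len(source)
    for match in TOKEN_RE.finditer(source):
        label = match.lastgroup
        if label == "identifier" and match.group() in KEYWORDS:
            label = "keyword"
        # Every character of the token receives the token's label.
        for i in range(match.start(), match.end()):
            labels[i] = label
    return list(zip(source, labels))
```

Calling `label_characters("var x=1;")` labels the three characters of `var` as keyword, the space as other, `x` as identifier, `=` as operator, `1` as number, and `;` as punctuator.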

Requirements

This code is written in Python 2.7 using Keras with Theano as a backend.

Installing Theano

Installing Keras

Deep recurrent neural networks are computationally expensive to train, so using a GPU is strongly advised.

Using Theano with the GPU

Various libraries are used to manipulate and evaluate the source code. If you have pip installed, you can install them with a single command:

pip install -r requirements.txt

Usage

Preprocessing

Dataset creation is handled by preprocess.py. It takes one argument, the path to the root directory of the JS projects, and minifies, tags, and shuffles the source code. The resulting datasets are placed in data/input.

python preprocess.py [path to root directory of projects]
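As a rough illustration of the collection and shuffling part of this step, the sketch below walks a project root, gathers the `.js` files, and shuffles their order. It is an assumption about how such a step could look, not the contents of preprocess.py; the function names `find_js_files` and `shuffled` and the `seed` parameter are hypothetical, and minifying and tagging are left out.

```python
import os
import random


def find_js_files(root):
    """Return the sorted paths of all .js files below root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".js"):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def shuffled(paths, seed=0):
    """Return a shuffled copy of paths; a fixed seed keeps runs repeatable."""
    result = list(paths)
    random.Random(seed).shuffle(result)
    return result
```

Sorting before shuffling with a fixed seed means the same set of projects yields the same file order on every run, which makes datasets easier to reproduce.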

Training

To train a model, run the script for the corresponding implementation. Each training script takes one optional argument, which lets you resume training from a previously trained model.

  • Character:
python char-rnn.py [-r] [filepath to previous model]
  • Labeled Character:
python labeled-char-rnn.py [-r] [filepath to previous model]
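The training scripts themselves are not reproduced here. As a general illustration of the data a character-level model is commonly trained on, the sketch below cuts a text into fixed-length input windows, each paired with the character that follows it. The function names and the `window` and `step` parameters are hypothetical, and the actual scripts may prepare their data differently.

```python
def build_vocabulary(text):
    """Map each distinct character to an integer index."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def make_training_pairs(text, window=4, step=1):
    """Slice text into (input indices, next-character index) pairs."""
    vocab = build_vocabulary(text)
    pairs = []
    # Stop early enough that every window still has a following character.
    for start in range(0, len(text) - window, step):
        inputs = [vocab[ch] for ch in text[start:start + window]]
        target = vocab[text[start + window]]
        pairs.append((inputs, target))
    return vocab, pairs
```

For the text `abcabc` with `window=2`, this produces four pairs, the first being the indices of `ab` as input and the index of `c` as the target.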

Sampling

To sample a trained model and generate source code, run the sampling script for the corresponding implementation. Each sampling script takes one mandatory argument and three optional ones.

  • Character:
python sample.py [filepath to model] [-s] [primer text] [-t] [softmax temperature [0, 1]] [-l] [length to generate]
  • Labeled Character:
python sample-labeled.py [filepath to model] [-s] [primer text] [-t] [softmax temperature [0, 1]] [-l] [length to generate]
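To show what the softmax temperature option controls, here is a minimal, self-contained sketch of temperature-scaled sampling over a predicted probability distribution. It is a generic illustration rather than the code in the sampling scripts; `apply_temperature` and `sample_index` are hypothetical names, and the sketch assumes a temperature strictly greater than 0 and at least one nonzero probability.

```python
import math
import random


def apply_temperature(probabilities, temperature):
    """Rescale a probability distribution by a softmax temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0")
    # Work in log space; zero probabilities map to -inf and stay at zero.
    scaled = [
        math.log(p) / temperature if p > 0 else float("-inf")
        for p in probabilities
    ]
    peak = max(scaled)
    # Subtracting the peak before exponentiating avoids overflow.
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probabilities, temperature=1.0, rng=random):
    """Draw one index from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probabilities, temperature)
    threshold = rng.random()
    cumulative = 0.0
    for index, p in enumerate(adjusted):
        cumulative += p
        if threshold < cumulative:
            return index
    # Guard against rounding leaving the cumulative sum just below 1.
    return len(adjusted) - 1
```

A temperature of 1 leaves the distribution unchanged, while lower values concentrate probability on the most likely characters, which tends to give more conservative and repetitive output.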

Licence

MIT