First, create a conda environment from environment.yml (note that environment files require `conda env create`, not `conda create`):
conda env create --name environment_name --file environment.yml
cd build_isarstep
Follow the instructions in the README of Build_Train_IsarStep to build the training set, and the instructions in the README of Build_Validation_IsarStep to build the validation set.
The above folders are pulled from here. These scripts were written by Wenda Li and modified by Jeffrey Ma to produce the desired splits.
To make an alternative split of the current AFP, alter train_names.txt: every name that appears in train_names.txt will be processed and built into the training dataset, and every excluded name will be built into the validation set. There are helper functions in ../scripts/data/ to collect theory names and count lines of code.
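For example, a split can be generated programmatically along these lines. This is a minimal sketch: all_names.txt is a hypothetical file with one theory name per line (the helpers in ../scripts/data/ can produce such a list), and the 95/5 random split is an arbitrary illustration, not the project's actual split.

```python
# Sketch: write a random train/validation split of AFP theory names.
# all_names.txt is a hypothetical input with one theory name per line.
import random

with open("all_names.txt") as f:
    names = [line.strip() for line in f if line.strip()]

random.seed(0)                       # fixed seed so the split is reproducible
random.shuffle(names)
cutoff = int(0.95 * len(names))      # assumed 95/5 ratio, adjust as needed

with open("train_names.txt", "w") as f:
    f.write("\n".join(names[:cutoff]) + "\n")
# Every theory name left out of train_names.txt ends up in the validation set.
```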
cd scripts
Create a new config YAML in configs/ based on config.yml, and make sure finetune.py points at the config file you want.
python finetune.py
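As a concrete illustration, a config like this is typically loaded at the top of the script. This is a minimal sketch assuming PyYAML; the path configs/my_run.yml and the key names are hypothetical, so use the schema actually defined in config.yml.

```python
import yaml

# Load the run configuration; configs/my_run.yml is a hypothetical path.
with open("configs/my_run.yml") as f:
    cfg = yaml.safe_load(f)

# Key names here are illustrative, not the actual schema of config.yml.
print(cfg.get("model_name"), cfg.get("learning_rate"), cfg.get("batch_size"))
```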
Make sure gen.py and eval.py use the correct config files. Note that n specifies how many answers to generate for each question: if n > 1, only top-n accuracy is evaluated; if n = 1, top-1 accuracy and BLEU score are evaluated (a sketch of both modes follows the commands below).
To generate model output run:
python gen.py
To evaluate the output run:
python eval.py
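As a rough illustration of the two evaluation modes described above, here is a sketch, not the actual eval.py logic: sacrebleu is an assumed dependency, and the data shapes are hypothetical placeholders.

```python
from sacrebleu import corpus_bleu

def top_n_accuracy(candidates, references):
    # A question counts as correct if any of its n candidate answers
    # exactly matches the reference answer.
    hits = sum(any(c == ref for c in cands)
               for cands, ref in zip(candidates, references))
    return hits / len(references)

# Hypothetical outputs: n = 2 candidates per question.
candidates = [["step_a", "step_b"], ["step_c", "step_d"]]
references = ["step_b", "step_x"]
print("top-2 accuracy:", top_n_accuracy(candidates, references))

# With n = 1 there is one candidate per question, so BLEU is also meaningful.
single = [cands[0] for cands in candidates]
print("BLEU:", corpus_bleu(single, [references]).score)
```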
To visualize attention weights, use viz.ipynb and plug in your model.
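If you want to prototype outside the notebook, a heatmap along these lines is a common starting point. This is a minimal sketch with placeholder tokens and random weights; substitute the attention tensor extracted from your own model's forward pass.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder tokens and weights; replace with real model attention.
src_tokens = ["assume", "x", ">", "0"]
tgt_tokens = ["then", "have", "x", ">=", "0"]
attn = np.random.rand(len(tgt_tokens), len(src_tokens))

fig, ax = plt.subplots()
im = ax.imshow(attn, cmap="viridis")
ax.set_xticks(range(len(src_tokens)))
ax.set_xticklabels(src_tokens, rotation=45)
ax.set_yticks(range(len(tgt_tokens)))
ax.set_yticklabels(tgt_tokens)
ax.set_xlabel("source tokens")
ax.set_ylabel("target tokens")
fig.colorbar(im, ax=ax)
plt.tight_layout()
plt.show()
```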
The 2019 IsarStep, 2022 IsarStep, and small IsarStep datasets are available in datasets/. The trained models are available under models/. The outputs of the models are available under outputs/.
Scripts used for processing and analyzing the dataset are under scripts/data/. Some of the scripts were run on Yale's Grace HPC cluster due to the size of the datasets; the scripts to run these Slurm jobs on the cluster are found under grace/.