Controlled Language Generation for Language Learning Items

This repository accompanies the paper Controlled Language Generation for Language Learning Items, presented at the EMNLP 2022 Industry Track. The code is based heavily on HuggingFace's sequence-to-sequence Trainer examples.

Requirements

The scripts were tested with Python 3.9 and transformers 4.6.1; nothing else should be required.
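
For example, the tested transformers version can be installed with pip (assuming Python 3.9 and pip are already available in your environment):

# Install the tested transformers version
pip install transformers==4.6.1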

Data

The data is provided as jsonlines files, where each record contains the fields needed for concept-to-sequence generation with control. The data files are stored with Git LFS.
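
If the data files appear as small pointer files after cloning, fetch the actual content with Git LFS, then sanity-check a record (json.tool is part of the Python standard library; the path below matches the training example):

# Fetch the LFS-tracked data files
git lfs install
git lfs pull

# Pretty-print the first record to see the available fields
head -n 1 data/concept2seq_train.jsonl | python -m json.tool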

Training

To train, call the concept2seq.py script with --mode train, along with the required parameters. The --extras parameter adds the control signal: this can be "srl", "wsd", or "cefr".

# Set a root directory
r=/home/nlp-text/dynamic/kstowe/github/concept-control-gen/
data_json=${r}/data/concept2seq_train.jsonl

# Substitute in your python
/home/conda/kstowe/envs/pretrain/bin/python $r/concept2seq.py \
    --mode train \
    --data_dir $data_json \
    --output_dir $r/models/c2s_test \
    --epochs 3 \
    --batch_size 32 \
    --model_path facebook/bart-base
    # Optional: add a control signal, e.g. --extras srl

Prediction

Prediction works similarly: call concept2seq.py with --mode test, pointing it at a test file, a trained model, and an output path.

# Set a root directory
r=/home/nlp-text/dynamic/kstowe/github/concept-control-gen/

/home/conda/kstowe/envs/pretrain/bin/python $r/concept2seq.py \
    --mode test \
    --output_path $r/outputs/test.txt \
    --test_path ${r}/data/concept2seq_test.jsonl \
    --model_path kevincstowe/concept2seq
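
Assuming the script writes one generated sequence per line (an assumption; check concept2seq.py for the exact output format), the results can be sanity-checked directly:

# Peek at the first few generations
head -n 5 $r/outputs/test.txt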
