Generate questions based on text in Russian. Uses ruGPT-3 implementation from https://github.com/sberbank-ai/ru-gpts
Created for AIJ-2020 Contest.
Full models: See colab notebook
Start the Docker container:
docker run -p 5000:5000 orzhan/rugpt3-questions:latest
Then open http://localhost:5000 for the Swagger UI.
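The container may need some time to load the model before the Swagger UI responds. The sketch below polls the address given above until it answers. `wait_for_service` is a hypothetical helper, not part of this repository, and the timeout values are arbitrary assumptions.

```python
import time
import urllib.request


def wait_for_service(url, timeout=60.0, interval=2.0, opener=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True once the service responds, False on timeout.
    `opener` is injectable so the function can be exercised without a server.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers refused connections and urllib errors while the
            # container is still starting up.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_service("http://localhost:5000")` after `docker run` returns `True` as soon as the page is reachable.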
Small model (question generation only): https://drive.google.com/file/d/1-9sX3iWezHRwnlvHbtGjvZGkwhYaflRb/view?usp=sharing
Large models (question and answer generation): https://drive.google.com/uc?id=13siMs0HoU3WHkeGvNJxVFOF68BAQedmT
git clone https://github.com/orzhan/rugpt3-question-generation.git
cd rugpt3-question-generation
pip install -r requirements.txt
./download.sh
Two types of questions are supported. To generate true/false questions, run
python true_false.py --topic [Topic_Name_From_Russian_wiki]
or python true_false.py --filename [Text file name]
To generate multiple choice questions, run
python multiple_choice.py --topic [Topic_Name_From_Russian_wiki]
or python multiple_choice.py --filename [Text file name]
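The commands above can also be launched from a script. The following is a minimal sketch that assembles the argument list and runs it with `subprocess`; `build_command` and `run_generator` are hypothetical helper names, not part of this repository, and the rule that exactly one of `--topic` or `--filename` is passed is an assumption based on the usage shown above.

```python
import subprocess
import sys


def build_command(script, topic=None, filename=None, **options):
    """Return the argv list for true_false.py or multiple_choice.py.

    Exactly one of `topic` or `filename` must be given (an assumption).
    Extra keyword arguments become long options,
    e.g. max_questions=5 -> --max_questions 5.
    """
    if (topic is None) == (filename is None):
        raise ValueError("pass exactly one of topic or filename")
    command = [sys.executable, script]
    if topic is not None:
        command += ["--topic", topic]
    else:
        command += ["--filename", filename]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


def run_generator(script, **kwargs):
    """Run the script from the repository directory and return its stdout."""
    result = subprocess.run(
        build_command(script, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `run_generator("true_false.py", topic="Амур", max_questions=5)` corresponds to `python true_false.py --topic Амур --max_questions 5`.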
There are additional command line options:
For true_false.py:
Option | Description | Default |
---|---|---|
-t TEMPERATURE, --temperature TEMPERATURE | Temperature setting for model | 0.9 |
-c CONTEXT_SIZE, --context_size CONTEXT_SIZE | Number of sentences used for the context | 5 |
-q MAX_QUESTIONS, --max_questions MAX_QUESTIONS | Number of questions to generate | 10 |
-f FILENAME, --filename FILENAME | Name of the text file to use as context | None |
-w TOPIC, --topic TOPIC | Topic name from Russian Wikipedia | None |
-sr SUMMARIZE_RATIO, --summarize_ratio SUMMARIZE_RATIO | Summarization ratio (for example 0.2). Alternative to --summarize_word_count. Use 1.0 to disable summarization | None |
-sw SUMMARIZE_WORD_COUNT, --summarize_word_count SUMMARIZE_WORD_COUNT | Summarization word count (for example 3000). Alternative to --summarize_ratio | 3000 |
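
The option names and defaults in this table can be mirrored with `argparse`, which is useful for checking how the flags combine. This is only a sketch: the value types (`float`, `int`) are assumptions inferred from the defaults, and `build_true_false_parser` is a hypothetical name, not the parser actually defined in true_false.py.

```python
import argparse


def build_true_false_parser():
    """Build a parser that mirrors the true_false.py options table."""
    parser = argparse.ArgumentParser(description="Generate true/false questions")
    parser.add_argument("-t", "--temperature", type=float, default=0.9,
                        help="Temperature setting for model")
    parser.add_argument("-c", "--context_size", type=int, default=5,
                        help="Number of sentences used for the context")
    parser.add_argument("-q", "--max_questions", type=int, default=10,
                        help="Number of questions to generate")
    parser.add_argument("-f", "--filename", default=None,
                        help="File name of context")
    parser.add_argument("-w", "--topic", default=None,
                        help="Topic from Wikipedia")
    parser.add_argument("-sr", "--summarize_ratio", type=float, default=None,
                        help="Summarization ratio; 1.0 disables summarization")
    parser.add_argument("-sw", "--summarize_word_count", type=int, default=3000,
                        help="Summarization word count")
    return parser


# Options that are not passed keep the defaults listed in the table.
args = build_true_false_parser().parse_args(["--topic", "Амур", "-q", "3"])
print(args)
```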
For multiple_choice.py:
Option | Description | Default |
---|---|---|
-f FILENAME, --filename FILENAME | Name of the text file to use as context | None |
-w TOPIC, --topic TOPIC | Topic name from Russian Wikipedia | None |
-ta TEMPERATURE_ANSWER, --temperature_answer TEMPERATURE_ANSWER | Temperature setting for answer generation | 0.5 |
-tq TEMPERATURE_QUESTION, --temperature_question TEMPERATURE_QUESTION | Temperature setting for question generation | 0.5 |
-tw TEMPERATURE_WRONG_ANSWER, --temperature_wrong_answer TEMPERATURE_WRONG_ANSWER | Temperature setting for wrong answers | 2.0 |
-c CONTEXT_SIZE, --context_size CONTEXT_SIZE | Number of sentences used for the context | 8 |
-q MAX_QUESTIONS, --max_questions MAX_QUESTIONS | Number of questions to generate | 10 |
-a ANSWERS, --answers ANSWERS | Number of answers, including the correct one. Set to 0 to output only questions | 5 |
-sr SUMMARIZE_RATIO, --summarize_ratio SUMMARIZE_RATIO | Summarization ratio (for example 0.2). Alternative to --summarize_word_count. Use 1.0 to disable summarization | None |
-sw SUMMARIZE_WORD_COUNT, --summarize_word_count SUMMARIZE_WORD_COUNT | Summarization word count (for example 3000). Alternative to --summarize_ratio | 3000 |
-g GENERATE_COUNT, --generate_count GENERATE_COUNT | Number of sequences generated each time. Higher values can produce better results but are slower and require more RAM | |
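
The three temperature options differ mainly in how much randomness they allow: 0.5 for questions and correct answers, 2.0 for wrong answers. The sketch below illustrates the general effect of temperature on a softmax distribution. It is not code from this repository, and the logits are made-up values chosen only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for temperature in (0.5, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

A low temperature concentrates probability on the top candidate, while a high temperature flattens the distribution, which would be consistent with using a higher value where varied wrong answers are wanted.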
You can also use the library from Python code:
from multiple_choice import generate_multiple_choice
from tools import MultipleChoiceArgs
args = MultipleChoiceArgs()
args.topic = "Амур"
args.max_questions = 2
args.generate_count = 10
questions = generate_multiple_choice(args)
print(questions)
To train the models, run ./download.sh and python prepare_training_data.py, then train-large-models.sh. To train on your own data, change prepare_training_data.py.