Distractor generation for MCQs on English essays

This project contains Python code that generates distractors for multiple-choice questions (MCQs) on English essays.
Distractors are the options given in MCQs other than the correct answer text.
In this project we train a model on data with the columns question, answer_text and distractor. The goal is to predict distractors for any newly given question and answer text.
- Removed all special symbols from every column
- Replaced 'nt with nt
- Replaced missing values with empty strings
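
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `clean_text` and `clean_row` are hypothetical, "special symbols" is assumed to mean anything other than ASCII letters, digits and whitespace, and the contraction step is assumed to turn words like "don't" into "dont".

```python
import math
import re

# Column names taken from the description above.
COLUMNS = ("question", "answer_text", "distractor")


def clean_text(value):
    """Return a cleaned string; missing values (None or NaN) become ''."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    text = str(value)
    # Assumption: collapse contractions before stripping symbols,
    # so "don't" becomes "dont" rather than "don t".
    text = text.replace("n't", "nt").replace("'nt", "nt")
    # Assumption: a "special symbol" is anything that is not a letter,
    # a digit or whitespace.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()


def clean_row(row):
    """Clean every expected column of one record (a dict)."""
    return {col: clean_text(row.get(col)) for col in COLUMNS}
```

For example, `clean_row({"question": "Why didn't he go?", "answer_text": None})` yields cleaned text for the question and empty strings for the two missing columns.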
This problem can be viewed as a classification task in which the target variable is 1 for a question's own distractors and 0 for options taken from other questions.
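
One way to build such a labelled training set is sketched below. It is an assumption about how the negatives could be produced, not a description of the project's code: the function name `build_training_pairs` and the `negatives_per_row` parameter are hypothetical, and negatives are sampled at random from the distractors of other questions.

```python
import random


def build_training_pairs(rows, negatives_per_row=1, seed=0):
    """Turn (question, answer_text, distractor) rows into labelled pairs.

    Each output item is (question, answer_text, candidate, label), where
    label is 1 for the row's own distractor and 0 for a distractor
    borrowed from a different question.
    """
    rng = random.Random(seed)
    pairs = []
    for row in rows:
        # Positive example: the distractor that belongs to this question.
        pairs.append((row["question"], row["answer_text"], row["distractor"], 1))
        # Negative examples: distractors written for other questions.
        others = [r["distractor"] for r in rows if r["question"] != row["question"]]
        for candidate in rng.sample(others, min(negatives_per_row, len(others))):
            pairs.append((row["question"], row["answer_text"], candidate, 0))
    return pairs
```

Sampling a fixed number of negatives per row keeps the classes roughly balanced; using every other distractor as a negative would also fit the description but produces a heavily skewed target.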
- The text is featurized using a word2vec model
- Many other features are used, such as the number of words, the length of the string, and the words in common between question and distractor and between answer and distractor
- Fuzzy-matching features from the fuzzywuzzy library are also used
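
A rough sketch of these feature groups follows. All function and feature names are hypothetical. To stay dependency-free, the fuzzy score uses the standard library's `difflib.SequenceMatcher` as a stand-in for fuzzywuzzy, and the embedding helper takes any word-to-vector mapping (a plain dict here) instead of a trained word2vec model.

```python
from difflib import SequenceMatcher


def mean_vector(text, vectors, dim):
    """Average the vectors of the known words; all zeros if none are known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def common_words(a, b):
    """Number of distinct lowercase words shared by two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def similarity(a, b):
    """String similarity on a 0-100 scale (difflib stand-in, not fuzzywuzzy)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def pair_features(question, answer, candidate):
    """Hand-crafted features for one (question, answer, candidate) triple."""
    return {
        "cand_n_words": len(candidate.split()),
        "cand_len": len(candidate),
        "common_q_cand": common_words(question, candidate),
        "common_a_cand": common_words(answer, candidate),
        "sim_q_cand": similarity(question, candidate),
        "sim_a_cand": similarity(answer, candidate),
    }
```

In a full pipeline the averaged vectors of the question, answer and candidate would be concatenated with the dictionary values to form one feature row per triple.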
An XGBoost classifier is used for this problem.
Classifying the candidates yields many options. If these need to be ranked further, the pairwise distances between each option and the answer can be used to get more fine-tuned results.
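
A possible version of this ranking step is sketched below. It assumes each option and the answer have already been turned into vectors (for example by averaging word vectors), uses cosine distance as the distance measure, and lists the closest options first. Both the metric and the sort direction are assumptions, and the function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_distance(answer_vec, candidates):
    """Sort (text, vector) candidates by distance to the answer, closest first."""
    scored = [(text, cosine_distance(answer_vec, vec)) for text, vec in candidates]
    return sorted(scored, key=lambda item: item[1])
```

Whether the closest or the more distant options make better distractors depends on the data, so the sort order may need to be reversed or replaced by a distance band.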