This repository contains the code for the paper "A Modular RAG-Based System for Automatic Short Answer Scoring with Feedback", which explores the use of Retrieval-Augmented Generation (RAG) for scoring short answers and providing detailed feedback to students. The system is designed to optimize scoring through modular pipelines, including zero-shot and few-shot setups, and offers an efficient method for integrating feedback.
To get started, clone this repository and set up the necessary environments for running the notebooks:
- Clone the repository:

  ```bash
  git clone https://github.com/mennafateen/modular-asas-f-rag
  cd modular-asas-f-rag
  ```
- Install dependencies for the zero-shot and RAG pipelines by setting up the respective environments.
- Create and activate a virtual environment (recommended: `conda`):

  ```bash
  conda create --name asas-env python=3.10
  conda activate asas-env
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Run the zero-shot pipeline notebook for automatic short answer scoring (an illustrative sketch of zero-shot scoring follows this list):

  ```bash
  jupyter notebook asas-f-z.ipynb
  ```
- Run the optimized pipeline notebook for automatic short answer scoring:

  ```bash
  jupyter notebook asas-f-opt.ipynb
  ```
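In the zero-shot setup, a large language model is prompted to score an answer and produce feedback without retrieved examples. The snippet below is a minimal illustrative sketch only, not the prompt or model used in the paper: the `openai` client, the model name, and the rubric wording are all assumptions, and the actual implementation lives in `asas-f-z.ipynb`.

```python
# Illustrative zero-shot scoring sketch; see asas-f-z.ipynb for the actual pipeline.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "Explain why the sky appears blue."                                          # placeholder
reference_answer = "Shorter (blue) wavelengths are scattered more by air molecules."    # placeholder
student_answer = "Because blue light scatters more in the atmosphere."                  # placeholder

prompt = (
    "You are grading a short answer.\n"
    f"Question: {question}\n"
    f"Reference answer: {reference_answer}\n"
    f"Student answer: {student_answer}\n"
    "Return a score from 0 to 5 and one sentence of feedback."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name, not the model used in the paper
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```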
For the RAG pipeline, an additional environment for ColBERT is required:
- Create and activate a new virtual environment:

  ```bash
  conda create --name colbert-env python=3.10
  conda activate colbert-env
  ```
- Install the ColBERT server requirements:

  ```bash
  pip install -r requirements_server.txt
  ```
- Run the ColBERT indexing notebook to create the required index (an illustrative indexing sketch follows this list):

  ```bash
  jupyter notebook colbert_indexing.ipynb
  ```
- Run the ColBERT server notebook to start the server for retrieval operations (an illustrative client sketch follows this list):

  ```bash
  jupyter notebook colbert_server.ipynb
  ```
- Switch back to the original environment (built from `requirements.txt`) and run the RAG pipeline:

  ```bash
  jupyter notebook asas-f-rag.ipynb
  ```
- Use the ColBERT environment to run the majority class scorer:

  ```bash
  jupyter notebook colbert_majority_scoring.ipynb
  ```
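For orientation, the core of ColBERT indexing typically looks like the sketch below, using the documented `Indexer` API from the ColBERT package. The checkpoint, experiment name, index name, and collection path here are assumptions; the actual configuration is in `colbert_indexing.ipynb`.

```python
# Illustrative ColBERT indexing sketch; see colbert_indexing.ipynb for the actual configuration.
from colbert import Indexer
from colbert.infra import Run, RunConfig, ColBERTConfig

if __name__ == "__main__":
    with Run().context(RunConfig(nranks=1, experiment="asas-f")):        # experiment name is a placeholder
        config = ColBERTConfig(nbits=2, root="experiments")               # 2-bit residual compression
        indexer = Indexer(checkpoint="colbert-ir/colbertv2.0", config=config)  # checkpoint is an assumption
        indexer.index(name="answers.index", collection="data/collection.tsv")  # paths are placeholders
```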
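Once the server notebook is running, the RAG pipeline (running in the other environment) can retrieve similar scored answers over HTTP. The sketch below assumes a Flask-style search endpoint similar to ColBERT's example server; the host, port, route, and parameter names are assumptions and should be matched to whatever `colbert_server.ipynb` actually exposes.

```python
# Hypothetical client for a locally running ColBERT retrieval server.
# URL, route, and parameter names are assumptions; check colbert_server.ipynb for the real ones.
import requests

def retrieve_similar_answers(query: str, k: int = 3):
    resp = requests.get(
        "http://localhost:8893/api/search",   # assumed host/port/route
        params={"query": query, "k": k},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    hits = retrieve_similar_answers("Because blue light scatters more in the atmosphere.")
    print(hits)
```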
The system is evaluated on the ASAS dataset, which is available in the `data` directory. The dataset contains short answers for multiple-choice questions, along with the correct answers and the corresponding scores. The evaluation metrics include accuracy, F1-score, RMSE, BLEU, ROUGE, and BERTScore.
You can evaluate the generated outputs, which are located in the `generated_results` directory, by running the evaluation notebook:

```bash
jupyter notebook evaluation.ipynb
```
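For reference, the metrics listed above can be computed roughly as in the sketch below with `scikit-learn` and Hugging Face `evaluate`. This is a minimal sketch, not the evaluation notebook itself; the file path and column names are assumptions.

```python
# Rough sketch of the listed metrics; evaluation.ipynb is the authoritative implementation.
import math

import pandas as pd
import evaluate
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error

df = pd.read_csv("generated_results/example_output.csv")   # file and column names are assumptions
gold_scores, pred_scores = df["gold_score"], df["pred_score"]
gold_feedback, pred_feedback = df["gold_feedback"].tolist(), df["pred_feedback"].tolist()

# Score-level metrics
print("accuracy:", accuracy_score(gold_scores, pred_scores))
print("macro F1:", f1_score(gold_scores, pred_scores, average="macro"))
print("RMSE:", math.sqrt(mean_squared_error(gold_scores, pred_scores)))

# Text-level metrics for the generated feedback
bleu = evaluate.load("bleu").compute(predictions=pred_feedback, references=gold_feedback)
rouge = evaluate.load("rouge").compute(predictions=pred_feedback, references=gold_feedback)
bertscore = evaluate.load("bertscore").compute(
    predictions=pred_feedback, references=gold_feedback, lang="en"
)
print("BLEU:", bleu["bleu"])
print("ROUGE-L:", rouge["rougeL"])
print("BERTScore F1 (mean):", sum(bertscore["f1"]) / len(bertscore["f1"]))
```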
If you find this work useful in your research, please cite our paper: