SSC-ICT-Innovatie/LearningLion-WOO

 
 


LearningLion-WOO

This project studies the use of generative AI to improve the services of SSC-ICT by supporting employees and optimizing internal processes. The original focus was on generative large language models (LLMs) in the form of Retrieval Augmented Generation (RAG), because they can have the most significant impact on the daily work of SSC-ICT employees. This version dives deeper into the retrieval part of RAG. The original version can be found here.

This version is part of the master's thesis of Nicky Ju.

The paper corresponding to this repository can be found in the TU Delft Repository.

Flow Chart


Files

Filenames start with one of the following prefixes, which indicate what the file does:

  • create --> create evaluation files with specific preprocessing
  • evaluate --> running queries on vector database/corpus
  • ingest --> creating vector database/corpus
  • preprocess --> preprocess the data in different ways before creating the database
  • relevance --> (re-)evaluating the results

Complete Example Pipeline

This guide assumes that you are familiar with the basics of Python, such as setting up an environment and installing packages.

  1. First steps
  2. Preprocess Data
    • Run preprocess_real_words.py or preprocess_stem_stopwords.py to preprocess the data in different ways.
  3. Database creation
    • Create Vector Store with ingest_embeddings.py.
    • Create BM25 Corpus with ingest_bm25.py.
  4. Evaluation
    • Run evaluate_bm25.py or evaluate_embeddings.py to run the evaluation queries against the BM25 corpus or the vector store, respectively.
  5. Evaluation metrics
    • Use relevance_evaluation.ipynb to calculate basic metrics such as precision and recall.
    • Use relevance_dossier_average.ipynb for the frequency-based metric and relevance_dossier_MAP.ipynb for the weighted frequency-based metric.
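
The steps above can be illustrated with a small, self-contained sketch. It is not the code from this repository: the stop-word list, the toy documents, and the function names (preprocess, bm25_scores, precision_recall_at_k, average_precision) are all assumptions made for illustration, and the metrics follow the standard textbook definitions rather than the dossier-level variants computed in the notebooks. The sketch preprocesses a few documents, ranks them with the Okapi BM25 formula, and then scores the ranking.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the repository's preprocessing scripts
# use their own rules, which are not reproduced here.
STOPWORDS = {"the", "a", "of", "on", "for", "and"}


def preprocess(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall over the top-k retrieved document ids."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k, hits / len(relevant_ids)


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant id appears."""
    if not relevant_ids:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


# Toy data, invented for this example.
documents = {
    "doc1": "Decision on the request for budget documents",
    "doc2": "Minutes of the weekly team meeting",
    "doc3": "Budget overview and budget forecast for the ministry",
}
doc_ids = list(documents)
corpus = [preprocess(documents[doc_id]) for doc_id in doc_ids]
query = preprocess("budget documents")

scores = bm25_scores(query, corpus)
ranked = [doc_id for _, doc_id in sorted(zip(scores, doc_ids), reverse=True)]

relevant = {"doc1", "doc3"}
precision, recall = precision_recall_at_k(ranked, relevant, k=2)
ap = average_precision(ranked, relevant)
print(ranked, precision, recall, ap)
```

Averaging average_precision over all evaluation queries gives mean average precision (MAP). For the embedding route, the ranking step would instead come from a similarity search on the vector store, while the metric functions stay the same.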

About

Open-source generative AI project for WOO
