dharmaQA

This is my "Hello World!" in the realm of RAGs.

This is a project to build a basic question-answering system with RAG (Retrieval-Augmented Generation). The dataset is one that's important to me: it's built from Rob Burbea's Dharma talks.

Unfortunately, that dataset isn't well suited to a RAG system: it's not factual, and its answers are long-winded and sometimes not directly related to the question.

For this kind of dataset, fine-tuning a language model would be more appropriate.

I'll explore RAG using a different dataset, and then come back to this dataset later.

Notes

The app is deployed with Streamlit Cloud.

It retrieves context from transcripts of Rob Burbea's Dharma talks, and generates a response based on the context.
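To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is not the code the app runs: the bag-of-words cosine scoring stands in for whatever embedding search is actually used, and the names `retrieve` and `build_prompt`, the prompt wording, and the sample chunks are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep simple word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank transcript chunks by similarity to the question, keep the top k.
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    # Assemble the text that would be sent to a language model (hypothetical wording).
    joined = "\n\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder chunks, not real transcript text.
sample_chunks = [
    "Emptiness means things lack inherent existence.",
    "Metta practice cultivates loving kindness.",
    "The retreat schedule starts at six.",
]
top = retrieve("What is metta practice?", sample_chunks, k=1)
prompt = build_prompt("What is metta practice?", top)
```

The generation step would pass `prompt` to a language model. Long, discursive source chunks are the reason this pattern struggles on this dataset: the retrieved context rarely contains a short, direct answer.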

Transcripts were downloaded from https://airtable.com/appe9WAZCVxfdGDnX/shr9OS6jqmWvWTG5g/tblHlCKWIIhZzEFMk/viw3k0IfSo0Dve9ZJ as PDF files.

I used marker to convert the PDFs to Markdown files before ingesting them.
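Ingestion typically means splitting each converted Markdown file into chunks before indexing. The sketch below shows one possible way to do that, splitting on headings and then packing paragraphs up to a size limit. The function name `split_markdown` and the `max_chars` default are assumptions for illustration, not what this project necessarily does.

```python
import re


def split_markdown(text: str, max_chars: int = 1000) -> list[str]:
    # Split just before every Markdown heading line ("# ", "## ", ...).
    sections = re.split(r"^(?=#{1,6} )", text, flags=re.MULTILINE)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        current = ""
        # Pack paragraphs into a chunk until the size limit would be exceeded.
        for para in re.split(r"\n\s*\n", section):
            para = para.strip()
            if not para:
                continue
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append(current)
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(current)
    return chunks


# Placeholder document, not real transcript text.
doc = "# A\n\npara one\n\n# B\n\npara two"
doc_chunks = split_markdown(doc)
```

A single paragraph longer than `max_chars` is kept whole rather than cut mid-sentence, which matters for transcripts where one answer can run for many lines.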