PrivateFalcon is a Python script that lets you query your documents locally using the Falcon-7b large language model (LLM) from HuggingFace. It is designed to work with documents that have been ingested into a VectorStore by the ingest.py script. With PrivateFalcon, you can perform efficient and accurate document retrieval and similarity search.
Before you begin, make sure you have the following prerequisites in place:
- Python 3.x
- A pre-trained Falcon-7b model (downloaded automatically by the script)
- Documents placed in the `data/` directory
- A `.env` file containing:

  ```
  DB_DIRECTORY=vectors
  EMBEDDINGS_MODEL=all-MiniLM-L6-v2
  SOURCE_CHUNKS=<number of chunks used to build an answer; use 4 if unsure>
  MAX_NEW_TOKENS=200
  CHUNK_SIZE=<size of each chunk; use 1000 if unsure>
  CHUNK_OVERLAP=<overlap between consecutive chunks; use 100 if unsure>
  ```

  You can set `EMBEDDINGS_MODEL` to any other embeddings model.
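For reference, here is a minimal sketch of how these variables might be read at startup. It assumes the scripts use the python-dotenv package; the actual handling in ingest.py and main.py may differ.

```python
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

# Load the .env file from the project root into the process environment.
load_dotenv()

DB_DIRECTORY = os.getenv("DB_DIRECTORY", "vectors")
EMBEDDINGS_MODEL = os.getenv("EMBEDDINGS_MODEL", "all-MiniLM-L6-v2")
SOURCE_CHUNKS = int(os.getenv("SOURCE_CHUNKS", "4"))
MAX_NEW_TOKENS = int(os.getenv("MAX_NEW_TOKENS", "200"))
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "1000"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "100"))
```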
To install PrivateFalcon:

- Clone the repository:

  ```sh
  git clone https://github.com/AdiKsOnDev/PrivateFalcon.git
  ```

- Install the dependencies:

  ```sh
  pip install -r requirements.txt
  ```
PrivateFalcon is easy to use:
- Place your documents into the `data/` directory.
- Ingest them into the VectorStore (a sketch of this step follows the list):

  ```sh
  python ingest.py
  ```

- Once the VectorStore has been created, ask questions (sketched after the project structure below):

  ```sh
  python main.py
  ```
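Conceptually, ingest.py splits each document into overlapping chunks, embeds each chunk, and persists the result. Here is a minimal sketch of that pipeline, assuming sentence-transformers for the embeddings and Chroma as the VectorStore; the libraries actually used by ingest.py may differ.

```python
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer

CHUNK_SIZE, CHUNK_OVERLAP = 1000, 100  # values from the .env file

def chunk(text: str) -> list[str]:
    """Split text into fixed-size chunks that overlap by CHUNK_OVERLAP."""
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # EMBEDDINGS_MODEL
client = chromadb.PersistentClient(path="vectors")  # DB_DIRECTORY
collection = client.get_or_create_collection("documents")

# Only plain-text files here for brevity; the real script handles many formats.
for doc in Path("data").glob("**/*.txt"):
    chunks = chunk(doc.read_text())
    collection.add(
        ids=[f"{doc.name}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
    )
```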
Project structure:

```
PrivateFalcon/
├── main.py           # Ask questions about your documents
├── ingest.py         # Script that ingests your documents
├── vectors/          # Directory with the ingested documents
├── data/             # Directory with the source documents
└── requirements.txt  # All of the Python dependencies
```
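As the layout above suggests, main.py is where questions get answered. In outline: embed the question, fetch the SOURCE_CHUNKS most similar chunks from the VectorStore, and hand them to Falcon-7b as context. Below is a rough sketch under the same assumptions as the ingestion sketch, using the transformers pipeline with the tiiuae/falcon-7b checkpoint; the real script's retrieval and prompt format may differ.

```python
import chromadb
from sentence_transformers import SentenceTransformer
from transformers import pipeline

embedder = SentenceTransformer("all-MiniLM-L6-v2")
collection = chromadb.PersistentClient(path="vectors").get_collection("documents")
generator = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b",
    trust_remote_code=True,  # Falcon ships custom model code on the Hub
    device_map="auto",
)

question = "What are these documents about?"

# Fetch the SOURCE_CHUNKS chunks most similar to the question.
hits = collection.query(
    query_embeddings=[embedder.encode(question).tolist()],
    n_results=4,  # SOURCE_CHUNKS
)
context = "\n\n".join(hits["documents"][0])

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)
print(generator(prompt, max_new_tokens=200)[0]["generated_text"])  # MAX_NEW_TOKENS
```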
PrivateFalcon supports the following document formats: `.csv`, `.doc`, `.docx`, `.enex`, `.epub`, `.html`, `.md`, `.odt`, `.pdf`, `.ppt`, `.pptx`, `.txt`
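If you want to verify which files in `data/` will actually be picked up before running ingest.py, here is a quick standalone check against that list (this helper is not part of PrivateFalcon):

```python
from pathlib import Path

# The extensions listed above.
SUPPORTED = {".csv", ".doc", ".docx", ".enex", ".epub", ".html",
             ".md", ".odt", ".pdf", ".ppt", ".pptx", ".txt"}

for path in sorted(Path("data").rglob("*")):
    if path.is_file():
        status = "ok" if path.suffix.lower() in SUPPORTED else "skipped"
        print(f"{status:>7}  {path}")
```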
If you want to contribute to PrivateFalcon, feel free to submit a pull request or open an issue.
For any questions or issues, please contact me at [email protected]
Happy querying with PrivateFalcon! 🦅🔍