A semantic search engine that takes some input text and returns some (questionably) relevant (questionably) famous quotes.
Built with:
- bert-as-service
- faiss (actually, faiss_prebuilt)
- streamlit
Quotes from https://thewebminer.com/.
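To make the idea concrete, here is a minimal sketch of what semantic search over quote embeddings boils down to. It is not the app's code: the two-dimensional toy vectors stand in for BERT embeddings, the brute-force scan stands in for a FAISS index, and the choice of cosine similarity, the helper names `cosine` and `top_k`, and the sample quotes are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, quote_vecs, k=3):
    """Return the indices of the k quote vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(quote_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy data: in the real pipeline these vectors would come from BERT.
quotes = ["Know thyself.", "Less is more.", "Simplicity is the ultimate sophistication."]
quote_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_vec = [0.9, 0.1]

best = top_k(query_vec, quote_vecs, k=2)
for idx in best:
    print(quotes[idx])
```

A brute-force scan like this is fine for a handful of quotes; a library such as FAISS exists to make the same nearest-neighbour lookup fast over a large collection.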
First, install the necessary dependencies into a Python 3 environment of your choice. For instance, to install the deps into a venv, run:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
There are additional native dependencies for FAISS: libomp and libopenblas must be available (see the FAISS repo for install instructions).
All subsequent commands should be run from within the virtual environment.
A Makefile is provided to make things nice and easy.
make dirs
make data # downloads the raw quote data
make model # downloads ~350MB of BERT weights
Before we can run the app, we need embeddings of the quotes. To generate the embeddings and save them in a pickled pandas DataFrame, run the commands below. This will take some time (a couple of hours) on CPU.
make serve # this runs the bert-as-service server
make embed # this computes the embeddings
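For orientation, the embedding step might look roughly like the sketch below. This is an assumption about the shape of the work, not the repository's actual script: `embed_quotes`, `fake_encode`, the column names `quote` and `embedding`, the batch size, and the output file name are all hypothetical. With the server from `make serve` running, `BertClient().encode` from the `bert-serving-client` package could be passed in place of the stand-in encoder.

```python
import os
import tempfile

import pandas as pd


def embed_quotes(quotes, encode, batch_size=64):
    """Embed quotes in batches; return a DataFrame with one row per quote.

    `encode` is any callable mapping a list of strings to a sequence of vectors.
    """
    vectors = []
    for start in range(0, len(quotes), batch_size):
        batch = quotes[start:start + batch_size]
        vectors.extend(list(vec) for vec in encode(batch))
    return pd.DataFrame({"quote": quotes, "embedding": vectors})


def fake_encode(texts):
    """Stand-in encoder (NOT BERT): maps each text to [length, number of spaces]."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


df = embed_quotes(["Know thyself.", "Less is more."], fake_encode, batch_size=1)

# Save and reload the pickled DataFrame (a temp dir keeps the example tidy).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.pkl")  # hypothetical file name
    df.to_pickle(path)
    restored = pd.read_pickle(path)
```

Batching keeps each request to the encoder small; the long CPU runtime comes from the encoder itself, not from the bookkeeping shown here.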
Once the embeddings exist, we can run the streamlit app with:
make serve # not needed if still running from above
make app
Have fun!