This is a simple Python server offering an NLP API for text completion based on pre-trained models.
Run locally
Prerequisites:
- python3
- pip
Create a local virtual environment and activate it:
python3 -m venv env
source env/bin/activate
Install the required dependencies:
pip install -r requirements.txt
Create an auth token file containing a token for authentication under secrets/token.
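For example, assuming any opaque string works as a token (the value below is just a placeholder):
mkdir -p secrets
echo "my-secret-token" > secrets/token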
Run the application:
python3 main.py
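Once started, the server listens on port 8001 (see the example request below); a quick way to confirm it is up is to fetch the interactive docs:
curl http://127.0.0.1:8001/docs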
Run with Docker
Prerequisites:
- docker
Create an auth token file containing a token for authentication under secrets/token.
Build the Docker image:
docker build -t nlp-server .
Run a container from the image to serve the application:
docker run -d -p 8001:8001 nlp-server
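Since the container runs detached (-d), you can check that it is up and inspect its logs with:
docker ps
docker logs <container-id>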
Run with Kubernetes
Prerequisites:
- kubernetes
The Kubernetes resources under k8s/ use a pre-built image (built as above) pushed to Docker Hub.
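If you want to publish your own build instead, a sketch (with <dockerhub-user> as a placeholder for your Docker Hub account) would be:
docker tag nlp-server <dockerhub-user>/nlp-server:latest
docker push <dockerhub-user>/nlp-server:latest
The image reference in the k8s/ manifests would then need to point at the pushed tag.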
Create the Kubernetes namespace:
kubectl create namespace mlops
Apply the Kubernetes resources:
kubectl apply -f k8s/
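To verify that the resources came up, something like:
kubectl get pods -n mlops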
Example requests can be sent to the NLP server running locally or in Docker using the following command, where <token> is the value stored in secrets/token:
curl -X POST "http://127.0.0.1:8001/suggestions/" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"sentence": "The weather today is <blank>."}'
If testing the Kubernetes setup with minikube, you can get the minikube IP address with:
minikube ip
and send the same curl command as follows:
curl -X POST "http://<minikube-ip>:31000/suggestions/" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"sentence": "The weather today is <blank>."}'
Docs for the API are available directly from the server at the /docs path.
Locally or with Docker, these can be found at 127.0.0.1:8001/docs.
For minikube, the docs are accessible at <minikube-ip>:31000/docs.
Basic caching is implemented in-memory at the level of a single application instance. This can later be extended to distributed caching, e.g. with Redis, so that a shared cache serves multiple instances of the application when the nlp-server runs in a distributed fashion on Kubernetes.
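As a minimal sketch of the per-instance idea (illustrative names only, not the server's actual code), caching can be as simple as memoizing the inference call:

from functools import lru_cache

def model_predict(sentence: str) -> list:
    # Illustrative stand-in for the pre-trained model inference call.
    return ["sunny", "rainy", "cold"]

@lru_cache(maxsize=1024)
def cached_suggestions(sentence: str) -> tuple:
    # Each application instance keeps its own cache; moving to Redis would
    # replace this decorator with a lookup against a shared store.
    return tuple(model_predict(sentence))

cached_suggestions("The weather today is <blank>.")  # computed
cached_suggestions("The weather today is <blank>.")  # served from the cache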