Developing and productization of a Machine Learning model for Classification of Mango Varieties.
Gamification with client app that challenges the user to compete against the IA.
https://www.kaggle.com/datasets/saurabhshahane/mango-varieties-classification
This dataset contains images of eight varieties of Pakistani mangoes. Automated classification and grading of harvested mangoes will facilitate farmers in delivering high-quality mangoes on time for export, and a high accuracy may be achieved using Convolutional Neural Network.
- Python
- Tensorflow / Tensorflow Lite
- Keras
- Models: Xception, EfficientNetB2
- Numpy, Pandas, MatplotLib
- Flask and FastAPI
- Streamlit
- Docker / docker-compose
- Kubernetes / Kind
- AWS EKS
- Locust
- Pytest
demo.mp4
- Setup environment
- Download Dataset
- Development -> notebook.ipynb, train.py, convert_model.py
- Visualize images
- Prepare dataset: train-val-test split
- Create model Xception (imagenet coefs)
- Add checkpoints
- Select learning rate
- Add inner layers
- Add dropout regularization
- Perform data augmentation
- Train with larget files (299x299)
- Test the model with test set
- Train efficient-net-b2 for comparison
- Final model trainning script using the full_train and test datasets
- Model conversion to tf_lite
- Productization -> ./production/*
- Create tf-serving image
- Create gateway (FastAPI) image
- Create tests for testing the images (with docker-compose)
- Create Kubernetes yaml files (kind and EKS)
- gateway service and deployment
- tf-serving service and deployment
- Create Streamlit app
- Performance test with Locust
- Documentation
- Documentation (code)
- Development:
- Add L1 and L2 regularization in inner layers
- Test with 80-10-10 folds.
- Production:
- Automatically get the names of the input/output from the default signature
- Find solution to use grpc with asyc/await (FastAPI)
- Add simple makefile
- Check why pytest can non import modules in unit tests of gateway. Meantime, use
python unit_tests.py
Follow the instructions in SETUP.md
Open a shell (e.g. Powershell) and execute:
git clone https://github.com/MarcosMJD/ml-mango-classification.git
Go to https://www.kaggle.com/datasets/saurabhshahane/mango-varieties-classification
Create a Kaggle account and click Download button.
The zip file actually contains two datasets, each one in a subdirectories. We only need the Classification_dataset.
So unzip the folder Classification_dataset
into the data
folder of the repository. Like this:
/data/Classificacion_dataset/Anwar Ratool
/data/Classificacion_dataset/Chanusa (Black)
...
Open a shell (GitBash, bash or Powershell) and execute the following instructions:
Note:
If there is an error when starting Powershell and the conda base environment is not activated by default (you should see (base)), try with:
powershell -executionpolicy remotesigned
or in an already launched powershell: set-executionpolicy remotesigned
To check the policy:
get-executionpolicy
(default is restricted)
You may set the default execution policy at any time.
Create a conda environment (recommended):
In the root directory of the repository, execute:
conda create --name py39 python=3.9
conda activate py39
Install pip and pipenv
pip install -U pip
alternatively python.exe -m pip install -U pip
pip install pipenv
To create the development environment and install dependencies, go to the sources/development directory and execute:
pipenv install --dev
Activate the environment
pipenv shell
Please, note that the production/gateway and production/fron-end have their specific environments in order to keep these environments separated from the development environment and from each other.
To check the tensorflow version and GPU devices used (if any):
python
import tensorflow as tf
tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
jupyter notebook
In a bash go to /sources/development
and ensure that the development environment is active, then start jupyter notebook with:
jupyter notebook
Go the browser and open notebook.ipynb
Run all cells check the tranning of models and model evaluation.
The models are saved under the ./models
directory by using checkpoints, which called when the model accuracy improves for each epoch.
Finally, the final model is trained with images of size of 299x299.
The best model is selected and tested with the test dataset.
Finally, the model is converted into saved_model format under production\tf-serving
directory by calling the convert_model.py
script.
Check the model:
Go to /sources/production/tf-serving/
and run
saved_model_cli show --dir mango-model --all
IMPORTANT NOTE: Before deploying the model, it is needed to set the correct input and outputs in the gateway. To do this:
- Go to the directory
/sources/production/tf-serving
- Run
saved_model_cli show --all --dir mango-model
- Find the line
signature_def['serving_default']:
- Take the name of the input. E.G.
input_2
in theline inputs['input_2']
- the same for the output:
dense_1
inoutputs['dense_1']
Your numbers may be different.
Edit /sources/production/gateway/gateway.py
and modify input_xx
and dense_xx
with your numbers
Train script
The model with the best parameters will be trained by using the full_train dataset and checking its performance with the test_dataset.
Is a shell with the development environment activated, run:
python train.py
The best checkpoint is automatically selected and the model is converted into saved_model format under production\tf-serving
directory by calling the convert_model.py
script.
Go to /sources/production/tf-serving/ and run:
docker build -t tf-serving-mango:v1 .
Go to /sources/production/gateway/ and run:
docker build -t gateway-mango:v1 .
Go to sources/production/gateway
and activate the environment with:
pipenv install --dev
pipenv shell
If you were in the development environment, run exit
prevously, since nested environments are not permitted.
Go to /sources/production/integration-tests
and run:
docker-compose up
python test_tf_serving.py
python test_gateway.py
CTRL+C
docker-compose down
In both cases the results with the prediction for the sample image will be shown. Should be something similar to:
{'Anwar Ratool': 1.0, 'Chaunsa (Black)': 0.0, 'Chaunsa (Summer Bahisht)': 0.0, 'Chaunsa (White)': 5.0123284672731474e-37, 'Dosehri': 0 .0, 'Fajri': 2.4020681443702053e-25, 'Langra': 8.949817257817495e-34, 'Sindhri': 5.200761237684644e-35}
The application 'Mango game' is a Streamlit app that
- Shows a set of sample images as a reference
- Chooses a random image from another set of images
- Asks the user to predict the variety
- Predicts the variety by sending a request to the gateway
- Shows the winner and overall result
In order to run this app:
- In a new bash, Go to
/sources/production/front-end/
- Run
pipenv install
andpipenv shell
to activate the environment - Run
streamlit run client.py
- Go to the browser at
http://localhost:8501
- Follow the instructions.
In a bash, go to /production/k8s/kind/
and run:
./kind create cluster --name tf-gateway
Check with:
kubectl cluster-info --context kind-tf-gateway
Load images to the cluster:
./kind load docker-image tf-serving-mango:v1 gateway-mango:v1 --name tf-gateway
Create the deployments and services:
kubectl apply -f model-deployment.yaml
kubectl apply -f model-service.yaml
kubectl apply -f gateway-deployment.yaml
kubectl apply -f gateway-service.yaml
Test gateway and its connection with tf-serving
In order to access the service (load balancer) in the cluster, we will use port forwarding, since the servicehas not been assined with an IP yet.
kubectl port-forward service/gateway 8080:80
Host port 8080 is forwarding to port 80 of the gateway service where it is listening to. Actually Port 80 of service forwards to port 9000 of container, where gunicorn is listening to.
Test the gateway service:
Ensure that the production environment is activated (the one under the production/gateway folder)
Go to production/integration-tests/
folder and run
python test_gateway.py
You may with to delete the cluster with (now, or after the performance tests):
./kind delete cluster tf-gateway
For some reason, I have not been able to run Locust within the virtual environment made by pipenv.
I've had to install Locust on the Anaconda environment and launch Locust from Anaconda environment (that is, not from the pipenv environment)
So if needed, use pip install locust
in your conda environment (you may have already created one for this project)
Go to sources/production/integration_tests/
and execute locust -H http://localhost:8080
Be sure to have a local ML server running, whether docker-compose of with kind as explained before.
Then, go to http://localhost/8089
and enter 4 users.
Please, note that this will have associated costs to the AWS services deployed.
Update to the latest awscli version and eksctl version.
aws ecr create-repository --repository-name mango-repo
Get the "repository uri", for instance:
546106488772.dkr.ecr.eu-west-1.amazonaws.com/mango-repo
Tag your local images to point to the ECR repository:
docker tag tf-serving-mango:v1 546106488772.dkr.ecr.eu-west-1.amazonaws.com/mango-repo:tf-serving-mango-v1
docker tag gateway-mango:v1 546106488772.dkr.ecr.eu-west-1.amazonaws.com/mango-repo:gateway-mango-v1
Log into ECR with aws cli (change your account id and region accordingly):
aws ecr get-login-password --region ${var.region} | docker login --username AWS --password-stdin ${var.account_id}.dkr.ecr.${var.region}.amazonaws.com
For instance:
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 546106488772.dkr.ecr.eu-west-1.amazonaws.com
Push the images:
docker push 546106488772.dkr.ecr.eu-west-1.amazonaws.com/mango-repo:tf-serving-mango-v1
docker push 546106488772.dkr.ecr.eu-west-1.amazonaws.com/mango-repo:gateway-mango-v1
Edit gateway-deployment.yaml and tf-serving-deployment.yaml to use these your new tagged images
eksctl create cluster -f eks-config.yaml
kubectl will be contigured to work with eks automatically.
Apply the configurations:
kubectl apply -f model-deployment.yaml
kubectl apply -f model-service.yaml
kubectl apply -f gateway-deployment.yaml
kubectl apply -f gateway-service.yaml
Get the load balancer (service) external ip:
kubectl get service
For instance: a5392f8c5030f410280c2db849511f09-2136527655.eu-west-1.elb.amazonaws.com
Note: Wait some time for the DNS to propagate.
You may check the FastAPI by directly opening the API endpoint:
For instance: http://a5392f8c5030f410280c2db849511f09-2136527655.eu-west-1.elb.amazonaws.com/docs
Or go to /sources/production/front-end and run
pipenv streamlit client-py
Introduce the url of the EKS load balancer followed by /predict
For instance, http://a5392f8c5030f410280c2db849511f09-2136527655.eu-west-1.elb.amazonaws.com/predict
Finally, delete the cluster:
eksctl delete cluster --name ml-mango-eks
Change kubectl context (cluster)
kubectl config use-context kind-
Get clusters
./kind get clusters
Check images loaded in a node
kubectl get nodes
Note: There should be one which is the control plane
winpty docker exec -ti <<cluster-name>-control-plane> bash
crictl images
Delete cluster
kind delete cluster --name
Log in (bash) into a running container
winpty docker exec -it <container-id> bash
Load and send image via post with requests
import requests
url = 'http://..."
files = {'media': open('test.jpg', 'rb')}
requests.post(url, files=files)
Get image in Flask via POST
imagefile = flask.request.files.get('imagefile', '')
Convert image file from Flask to PIL
pil_image = Image.open(image)
To get the class of the highest prediction
np.array(classes)[np.argmax(preds,1)]
To evaluate a model with a dataset
X can be a keras iterator obtained from keras image data generator -> from dataframe or directory. It returns loss and metric defined in the model compilation.
model.evaluate(X)
Show debug messages with docker-compose
print('debug', flush=True)
Delete a commit
git reset --hard
Delete a file in the last commit
git reset --soft HEAD~1
(the files will be in the staged area where they can be removed or unstaged)
git rm --cached <file>
Install Gunicorn and Uvicorn for multiprocess with FastAPI
pip install "uvicorn[standard]" gunicorn
Run Gunicorn within a docker container with several Uvicorn workers (Linux only)
ENTRYPOINT [ "gunicorn", "gateway:app", "--workers=4", "--worker-class=uvicorn.workers.UvicornWorker", "--bind=0.0.0.0:9000"]
Run uvicorn for testing FastAPI
uvicorn gateway_template:app --reload
uvicorn gateway:app --reload --host 127.0.0.1 --port 8080
On Windows 0.0.0.0 will launch the server, but clients using 0.0.0.0 or 127.0.0.1 or localhost will not work.
On Windows, port 9000 is denied (permissions).
Send image with curl in multipart form data format
curl -X 'POST' 'http://localhost:8000/predict' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F 'file=@IMG_20210630_102920.jpg;type=image/jpeg'
Get the key with the max value in a dict
print(max(x, key=x.get))
Example of simple makefile
deps:
poetry install
dev: deps
cd app && poetry run python main.py
lint: lint-black lint-flake lint-pycodestyle
lint-black:
poetry run black --line-length 100 app
lint-flake:
poetry run flake8 --max-line-length 100 app
lint-pycodestyle:
poetry run pycodestyle --max-line-length 100 ./app