[This project is part of the AI Engineer curriculum on OpenClassrooms]
We are provided with a dataset called Sentiment140, containing 1,600,000 tweets extracted using the Twitter API.
We will classify the tweets' sentiment using several families of models:
- simple & classic models such as LogisticRegression,
- neural network models such as RNNs,
- transformer models such as BERT.
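Before any modeling, the dataset has to be loaded. Here is a minimal sketch of reading the Sentiment140 CSV with pandas, assuming the standard column layout of that file (no header row, latin-1 encoding, 0/4 sentiment codes); the exact filename and preprocessing used in the notebooks may differ. The sample rows below are illustrative, not taken from the real file.

```python
import io
import pandas as pd

# Column layout of the Sentiment140 CSV (the file itself has no header row).
COLUMNS = ["target", "id", "date", "flag", "user", "text"]

# Two made-up rows mimicking the real file's format, for illustration only.
sample_csv = io.StringIO(
    '"0","1467810369","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","someuser","this is awful"\n'
    '"4","1467810672","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","otheruser","I love this"\n'
)

# The real file would be read the same way, e.g.:
# df = pd.read_csv("training.1600000.processed.noemoticon.csv",
#                  encoding="latin-1", names=COLUMNS)
df = pd.read_csv(sample_csv, names=COLUMNS)

# Map the 0/4 sentiment codes to readable labels.
df["label"] = df["target"].map({0: "NEGATIVE", 4: "POSITIVE"})
print(df[["label", "text"]])
```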
- At first, we will conduct an EDA (01_EDA.ipynb) in order to better understand the dataset and prepare some pre-processed datasets.
- Then we will search for a baseline with a DummyClassifier and a LogisticRegression (02_Classification_classique.ipynb)
- After that, we will try to find the best Neural Network configuration
- Search for the best pre-processing (03_Classification_NN_Select_PreProcessing.ipynb)
- Search for the best embedding (04_Classification_NN_Select_Embedding.ipynb)
- Search for the best architecture (05_Classification_NN_Select_Architecture.ipynb)
- Next, we will try some Transformer models (06_Classification_Transformers.ipynb)
- Finally, we will develop a Python script to expose the selected model through an API (API_server.py)
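The baseline step above can be sketched in a few lines of scikit-learn. This is only an illustration on a toy corpus; the actual features and hyperparameters are chosen in 02_Classification_classique.ipynb, and the TF-IDF vectorizer here is an assumption, not necessarily the notebook's pipeline.

```python
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny toy corpus standing in for the pre-processed tweets.
texts = ["I love this", "this is great", "so happy today",
         "I hate this", "this is awful", "so sad today"]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

# Baseline 1: always predict the most frequent class.
dummy = DummyClassifier(strategy="most_frequent").fit(texts, labels)

# Baseline 2: TF-IDF features fed into a LogisticRegression.
logreg = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

print("dummy accuracy: ", dummy.score(texts, labels))
print("logreg accuracy:", logreg.score(texts, labels))
```

Any serious model should beat the DummyClassifier score, which is why it makes a useful floor for the comparison.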
As the notebooks use hyperlinks for navigation, and because this doesn't work on GitHub, they are also available on nbviewer.org for convenience.
In order to use this project locally, you will need to have Python and Jupyter Notebook installed. Once done, you can set up the environment using the following commands:
let's clone the project's GitHub repository
>>> git clone https://github.com/Valkea/OC_AI_07
>>> cd OC_AI_07
let's pull the large files with DVC (you need to install DVC prior to using the following commands):
>>> dvc remote add origin https://dagshub.com/Valkea/OC_AI_07.dvc
>>> dvc pull -r origin
let's create a virtual environment and install the required Python libraries
(Linux or Mac)
>>> python3 -m venv venvP7
>>> source venvP7/bin/activate
>>> pip install -r requirements.txt
(Windows):
>>> py -m venv venvP7
>>> .\venvP7\Scripts\activate
>>> py -m pip install -r requirements.txt
let's configure and run the virtual environment for Jupyter notebook
>>> pip install ipykernel
>>> python -m ipykernel install --user --name=venvP7
REQUIRED: let's install the spaCy model used in this project
>>> python -m spacy download en_core_web_sm
In order to run the various notebooks, you will need to use the virtual environment created above. So once the notebooks are opened (see below), prior to running them, select the venvP7 kernel in Jupyter.
- in order to see the notebooks, run:
>>> jupyter lab
or
>>> jupyter notebook Notebook_Name.ipynb
Start Flask development server:
(venv) >> python API_server.py
Stop it with CTRL+C once the tests are done (run them from another terminal).
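For reference, here is a hypothetical sketch of what the /predict endpoint in such a Flask server might look like. The `predict_sentiment` function below is a stand-in assumption: the real API_server.py loads the selected model instead of this keyword rule, and its internals may differ.

```python
from flask import Flask, request

app = Flask(__name__)

def predict_sentiment(text):
    """Hypothetical stand-in for the real model loaded by API_server.py."""
    # A real implementation would run the selected model here.
    proba = 0.9248 if "love" in text.lower() else 0.5
    label = "POSITIVE" if proba >= 0.5 else "NEGATIVE"
    return label, proba

@app.route("/predict", methods=["POST"])
def predict():
    # The client posts the raw tweet text in the request body.
    text = request.get_data(as_text=True)
    label, proba = predict_sentiment(text)
    return f"The predicted label is {label} with the following probability: {proba:.2%}"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```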
One can check that the server is running by opening the following URL: http://0.0.0.0:5000/
Then, by submitting various texts, you should get various predictions. You can post data with software such as Postman, or even using curl as below:
curl -X POST -H "Content-Type: text/plain" --data "I love this" http://0.0.0.0:5000/predict
Note that the first request might take some time, but once you've got the first prediction, subsequent ones should run pretty fast.
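The curl call above can also be issued from Python. Here is a small sketch using the standard library's urllib; the helper name is hypothetical, and actually sending the request of course requires the server to be running.

```python
import urllib.request

def build_predict_request(text, url="http://0.0.0.0:5000/predict"):
    """Build the same POST request the curl example sends."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )

req = build_predict_request("I love this")

# Sending it (only works while the server is up):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```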
Alternatively, the project can be run in Docker. Build the image locally with:
>> docker build -t tweet-sentiment-classification .
>> docker run -it -p 5000:5000 tweet-sentiment-classification:latest
Then one can run the same test steps as before with curl.
Stop with CTRL+C
I pushed a copy of my Docker image to Docker Hub, so one can pull it:
>> docker pull valkea/tweet-sentiment-classification:latest
But this command is optional, as running the image (see below) will pull it if required.
Then the command to start the Docker container is almost identical to the previous one:
>> docker run -it -p 5000:5000 valkea/tweet-sentiment-classification:latest
And once again, one can run the same curl tests.
Stop with CTRL+C
In order to deploy this project, I decided to use Heroku.
Here is a great resource to help deploy projects on Heroku: https://github.com/nindate/ml-zoomcamp-exercises/blob/main/how-to-use-heroku.md
So if you don't already have an account, you need to create one and follow the process explained here: https://devcenter.heroku.com/articles/heroku-cli
Once the Heroku CLI is configured, one can login and create a project using the following commands (or their website):
>> heroku login
>> heroku create twitter-sentiment-clf
Then, the project can be built, published and run on Heroku with:
>> heroku container:login
>> heroku container:push web -a twitter-sentiment-clf
>> heroku container:release web -a twitter-sentiment-clf
Finally, you can query the deployed model with curl:
curl -X POST -H "Content-Type: text/plain" --data "I love this" https://twitter-sentiment-clf.herokuapp.com/predict
This should return something like "The predicted label is POSITIVE with the following probability: 92.48%".
Note that the Heroku container might take some time to start if it is asleep.
Once done with the project, the kernel can be listed and removed using the following commands:
>>> jupyter kernelspec list
>>> jupyter kernelspec uninstall venvp7