[This project is part of the AI Engineer class on OpenClassrooms]
We are provided with a dataset from globo.com containing the metadata of 364,047 articles and 2,988,181 user interactions with these articles.
- Collaborative filtering
- Content-based filtering
- Hybrid filtering
- Serverless deployment (Azure function)
- At first, we will conduct an EDA (01_EDA.ipynb) in order to better understand the dataset and prepare some pre-processed datasets.
- Then we will search for a baseline model. (02_Recommender_systems.ipynb)
- After that, we will try various approaches for both Collaborative and Content-based filtering. (02_Recommender_systems.ipynb)
- Next, we will build a Hybrid model based on the best Collaborative and Content-based models (a minimal blending sketch follows this list). (02_Recommender_systems.ipynb)
- And, we will develop and deploy an Azure Function to expose the hybrid model. (02_Recommender_systems.ipynb)
- Finally, we will create a Streamlit app to test the model. (03_Streamlit.py)
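The Hybrid model itself is built in 02_Recommender_systems.ipynb; as a rough illustration of the idea only, here is a minimal sketch of how Collaborative and Content-based scores could be blended. The `collab_scores`, `content_scores` and `alpha` names are purely illustrative and are not taken from the notebook's actual implementation.

```python
import numpy as np

def hybrid_scores(collab_scores: np.ndarray, content_scores: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend Collaborative and Content-based scores for one user.

    Each array holds one score per candidate article; alpha weights
    the Collaborative part (illustrative values only).
    """
    def normalize(x):
        # Rescale to [0, 1] so both score sources are comparable
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    return alpha * normalize(collab_scores) + (1 - alpha) * normalize(content_scores)

# Example: pick the top-5 articles for one user
collab = np.random.rand(100)   # e.g. scores from a matrix-factorization model
content = np.random.rand(100)  # e.g. cosine similarities with the user's read articles
top5 = np.argsort(hybrid_scores(collab, content))[::-1][:5]
print(top5)
```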
As the notebooks are sometimes too big to be displayed on GitHub (and because the hyperlinks used for navigation don't work on GitHub), note that they are also available on nbviewer.org and dagshub.com for convenience.
In order to use this project locally, you will need to have Python and Jupyter Notebook installed. Once done, we can set up the environment using the following commands:
Let's clone the project's GitHub repository:
>>> git clone https://github.com/Valkea/OC_AI_09
>>> cd OC_AI_09
Let's download the dataset and unzip it into the 'data' folder:
data/news-portal-user-interactions-by-globocom/articles_metadata.csv
data/news-portal-user-interactions-by-globocom/clicks/clicks_hour_XXX.csv
data/news-portal-user-interactions-by-globocom/articles_embeddings.pickle
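Once the files are in place, they can be loaded for a quick sanity check as follows (a minimal sketch, assuming standard CSV files and that the embeddings pickle holds a NumPy matrix with one vector per article):

```python
import glob
import pickle

import pandas as pd

# Articles metadata (364,047 rows expected)
articles = pd.read_csv("data/news-portal-user-interactions-by-globocom/articles_metadata.csv")

# User interactions, split into one CSV file per hour
clicks_files = glob.glob("data/news-portal-user-interactions-by-globocom/clicks/clicks_hour_*.csv")
clicks = pd.concat((pd.read_csv(f) for f in clicks_files), ignore_index=True)

# Pre-computed article embeddings (assumed to be a pickled NumPy array)
with open("data/news-portal-user-interactions-by-globocom/articles_embeddings.pickle", "rb") as f:
    embeddings = pickle.load(f)

print(articles.shape, clicks.shape, embeddings.shape)
```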
And let's fetch the large file with DVC (you need to install DVC before using the following commands):
>>> dvc remote add origin https://dagshub.com/Valkea/OC_AI_09.dvc
>>> dvc pull -r origin
Let's create a virtual environment and install the required Python libraries:
(Linux or Mac)
>>> python3 -m venv venvP9
>>> source venvP9/bin/activate
>>> pip install -r requirements.txt
(Windows):
>>> py -m venv venvP9
>>> .\venvP9\Scripts\activate
>>> py -m pip install -r requirements.txt
Let's register the virtual environment as a Jupyter kernel:
>>> pip install ipykernel
>>> python -m ipykernel install --user --name=venvP9

In order to run the various notebooks, you will need to use the virtual environment created above. So once a notebook is opened (see below), select the venvP9 kernel (Kernel > Change kernel > venvP9) before running it.
To see the notebooks, run:
>>> jupyter lab
01_EDA.ipynb
shows the Exploratory Data Analysis of the available files

02_Recommender_systems.ipynb
shows the baseline search and the various Collaborative, Content-based and Hybrid recommender approaches
The hybrid recommender system is deployed using an Azure Function, and if I shared the secrets.txt file containing the FUNCTION_KEY with you, you can simply jump to the Streamlit test.
However, in case I didn't share the secrets.txt with you, you can still start a local instance of the very same Azure Function with the following steps:
1. Install the Azure CLI and the Azure Functions Core Tools
>>> cd azure_function
(Linux or Mac)
>>> python3 -m venv venvP9azure
>>> source venvP9azure/bin/activate
>>> pip install -r requirements.txt
(Windows):
>>> py -m venv venvP9azure
>>> .\venvP9azure\Scripts\activate
>>> py -m pip install -r requirements.txt
(venvP9azure) >>> func host start --port 5000
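With the local server running, the function can be queried over HTTP, for instance with Python `requests`. The route name and query parameters below are placeholders, not the function's documented API; check the code in `azure_function/` for the actual endpoint, and use the cloud URL together with the FUNCTION_KEY when testing the deployed version.

```python
import requests

# Placeholder endpoint and parameters: adapt them to the actual function
# defined in the azure_function/ folder. For the deployed version, replace
# localhost with the cloud URL and pass the FUNCTION_KEY (Azure Functions
# accepts keys via the 'code' query parameter or the 'x-functions-key' header).
url = "http://localhost:5000/api/recommend"
params = {"user_id": 50, "n_reco": 5}

response = requests.get(url, params=params)
response.raise_for_status()
print(response.json())
```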
Once the tests are done (they are run from another terminal, see below), stop the local Azure Function server with CTRL+C.
Once you have access to the Azure Function (either locally or in the cloud with the secret key), you can test some recommendations using the Streamlit user interface (from another terminal if you are already running the local Azure Function server, and with the venvP9 virtual environment):
(venvP9) >>> streamlit run 03_Streamlit.py
Set the number of recommendations you want to receive, then click the button next to a user_id to get recommendations (only a tiny fraction of all users are displayed).
Stop the Streamlit server with CTRL+C once the tests are done.
I used an Azure Function to deploy this project in the cloud, so let's recall the deployment steps...
>>> func init FOLDER_NAME
or
>>> func init FOLDER_NAME --python
>>> cd FOLDER_NAME
>>> func new
then select HTTP trigger
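For reference, the HTTP trigger template generates a function whose entry point looks roughly like the simplified sketch below (Python v1 programming model). The `user_id` parameter is only an illustrative assumption; the real code in `azure_function/` wraps the hybrid model instead of echoing the parameter back.

```python
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    # Read an illustrative query-string parameter
    user_id = req.params.get("user_id")
    if user_id is None:
        return func.HttpResponse("Missing 'user_id' parameter", status_code=400)

    # The real function computes and returns the hybrid recommendations here
    return func.HttpResponse(f"Recommendations requested for user {user_id}", status_code=200)
```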
>>> python -m venv VENV_NAME
>>> source VENV_NAME/bin/activate
>>> pip install -r requirements.txt
>>> func host start
or
>>> func host start --port 5000
>>> az login
>>> func azure functionapp publish APP_NAME --build remote
In this project I used the following parameters:
- FOLDER_NAME: azure_function
- VENV_NAME: venvP9azure
- APP_NAME: globo-reco
Once done with the project, the kernel can be listed and removed using the following commands:
>>> jupyter kernelspec list
>>> jupyter kernelspec uninstall venvp9