This project aims to explore and build an accurate, explainable, and ideally simple approach to predicting weekly bookings for each hotel of a chain that operates multiple hotels across Europe.
This section outlines how to set up the runtime environment required to run the notebooks and the scripts that perform model training, evaluation, and forecasting.
This project is built on Python 3.10.12. It manages its dependencies using pip-tools.
Follow the steps below to set up the environment (assuming you have python3 and python3-venv installed):
- Create the virtual environment. Run this command, and all the commands in the following steps, from the root of the project:
python3 -m venv ./venv
- Activate the virtual environment:
source venv/bin/activate
- Install pip-tools:
pip3 install pip-tools==7.3.0
- Compile the requirements txt files:
pip-compile requirements/requirements.in && pip-compile requirements/requirements-dev.in
- Install the dependencies:
pip-sync requirements/requirements-dev.txt
Note: this step intentionally installs the dev dependencies, because the packages needed to run the notebooks are treated as dev dependencies. (This is just an application design choice.)
- Connect the jupyter kernel to the virtual environment:
python3 -m ipykernel install --user --name=companyx_task_venv
If you already have the venv folder, you can go directly to activating the virtual environment (step 2) and installing the dependencies (step 5).
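After completing the steps above, a quick sanity check can confirm that the interpreter and virtual environment look as expected. The following is a minimal sketch, not part of the project: the names `EXPECTED_PYTHON`, `in_virtualenv` and `check_environment` are hypothetical, and it assumes the virtual environment lives in a `venv` folder at the project root, as created in step 1.

```python
import sys
from pathlib import Path

# Assumption: only major.minor is compared; the README states Python 3.10.12.
EXPECTED_PYTHON = (3, 10)


def in_virtualenv() -> bool:
    """Return True when the current interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def check_environment(project_root: Path = Path(".")) -> list[str]:
    """Collect human-readable problems found in the local setup (hypothetical helper)."""
    problems = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"expected Python 3.10.x, found {found}")
    if not (project_root / "venv").is_dir():
        problems.append("venv folder not found; create it with step 1")
    if not in_virtualenv():
        problems.append("virtual environment is not active; run step 2")
    return problems
```

Calling `check_environment()` from the project root returns an empty list when nothing looks wrong, and a list of messages otherwise.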
The notebooks are Jupyter notebooks (i.e. in .ipynb format) and they all live under the ./notebooks folder. It contains the following notebooks:
- data_prep.ipynb: The main objectives of this notebook are:
  - To understand the data and its structure.
  - To clean the data and make it ready for further analysis.
  - To save the cleaned data in a format that is easy to load and use for further analysis.
- eda.ipynb: This notebook corresponds to Task 1. The main objectives of this notebook are:
  - Conduct a thorough exploratory data analysis on the dataset provided.
  - Answer all the questions raised as part of this task.
  - Additionally, highlight a few more insights that could be interesting for the design of an optimal price recommendation engine.
- model_engineering.ipynb: This notebook corresponds to Task 2. The main objectives of this notebook are:
  - Choose two model training algorithms and briefly explain the rationale behind these choices.
  - Train two forecasting models that give predictions per hotel per week for a fixed number of time steps (weeks) into the future, using the insights gained from Task 1 where possible.
  - Conduct a holistic evaluation of the two models and make some general observations about their performance, interpretability and scalability.
To run the notebooks, execute the following command from the root (or from the notebooks directory) of the project:
jupyter lab
Within each notebook, run all the cells in the order they appear. Each notebook should then run without any issues.
Run the notebooks themselves in the order in which they are listed above, because data produced by earlier notebooks is used as input to later ones.
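For a non-interactive run, the required order can also be encoded in a small script. This is only a sketch under assumptions: it relies on `jupyter nbconvert` being available in the active environment (the project itself only documents `jupyter lab`), and the names `NOTEBOOK_ORDER`, `build_commands` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path

# Order taken from the notebook list in this README.
NOTEBOOK_ORDER = ["data_prep.ipynb", "eda.ipynb", "model_engineering.ipynb"]


def build_commands(notebook_dir: Path = Path("notebooks")) -> list[list[str]]:
    """Build one `jupyter nbconvert` command per notebook, in execution order."""
    return [
        [
            "jupyter", "nbconvert",
            "--to", "notebook",
            "--execute",   # run every cell from top to bottom
            "--inplace",   # write the outputs back into the same .ipynb file
            str(notebook_dir / name),
        ]
        for name in NOTEBOOK_ORDER
    ]


def run_all(notebook_dir: Path = Path("notebooks")) -> None:
    """Execute the notebooks one after another, stopping at the first failure."""
    for command in build_commands(notebook_dir):
        # check=True raises CalledProcessError if a notebook fails,
        # so later notebooks never run on missing inputs.
        subprocess.run(command, check=True)
```

Calling `run_all()` from the project root, with the virtual environment active, would execute the three notebooks in sequence.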
The first notebook, data_prep.ipynb, expects the raw data file hotel_bookings.csv in the data folder.
All data produced by running the notebooks is saved in the data folder in parquet format.
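The data layout described above can be checked before and after running the notebooks. The sketch below is illustrative only: `RAW_FILE`, `raw_data_present` and `list_parquet_outputs` are hypothetical names, and it assumes the parquet files sit directly in the data folder with a `.parquet` extension, which this README does not state explicitly.

```python
from pathlib import Path

RAW_FILE = "hotel_bookings.csv"  # raw input expected by data_prep.ipynb


def raw_data_present(data_dir: Path = Path("data")) -> bool:
    """Return True if the raw bookings file is where data_prep.ipynb expects it."""
    return (data_dir / RAW_FILE).is_file()


def list_parquet_outputs(data_dir: Path = Path("data")) -> list[str]:
    """List the parquet files found directly in the data folder, sorted by name."""
    # Assumption: outputs use the .parquet extension and are not nested in subfolders.
    return sorted(path.name for path in data_dir.glob("*.parquet"))
```

Before the first run, `raw_data_present()` should return `True`; after the notebooks have run, `list_parquet_outputs()` shows which outputs were written.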