Hi! This is a project my team and I built to practice database querying. We were required to answer 5 questions, each accompanied by a visualization. These are the questions we set out to answer:
- What is the price distribution of menu items?
- What is the distribution of restaurants per location?
- Which are the top 10 pizza restaurants by rating?
- Map locations offering kapsalons and their average price.
- Compare restaurant distributions across UberEats, Deliveroo, and Takeaway. What are some of the market trends?
We split the code into several Python files: dbhandler.py handles the databases and querying, plotmaker.py handles plotting the data, and answers.py produces the answers. A main file (main.py) puts it all together.
For the questions involving ratings, we used a weighted scoring system to find the top category/restaurant, which takes both the rating and the number of ratings into account. The formula: score = rating x 0.3 + number_of_ratings x 0.7.
We used SQLAlchemy in Python to do the querying. Afterwards we manipulated the data with pandas and plotted it with matplotlib/plotly/geopandas. We combined ORM with OOP for modularity, so you can swap out the queries or plots easily.
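To make the scoring formula concrete, here is a minimal pandas sketch. The sample rows, the column names, and the `add_weighted_score` helper are made up for illustration and are not taken from the project code. It applies the formula exactly as written, on the raw rating count.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions, not the real schema.
restaurants = pd.DataFrame(
    {
        "name": ["Pizza A", "Pizza B", "Pizza C"],
        "rating": [4.8, 4.2, 4.5],
        "number_of_ratings": [12, 300, 150],
    }
)


def add_weighted_score(df, rating_weight=0.3, count_weight=0.7):
    """Return a copy of df with a 'score' column, sorted from best to worst."""
    scored = df.copy()
    # score = rating x 0.3 + number_of_ratings x 0.7
    scored["score"] = (
        scored["rating"] * rating_weight
        + scored["number_of_ratings"] * count_weight
    )
    return scored.sort_values("score", ascending=False).reset_index(drop=True)


# Keep the ten best-scoring rows (only three exist in this toy example).
top = add_weighted_score(restaurants).head(10)
print(top[["name", "score"]])
```

With raw counts, the number of ratings has much more influence on the ranking than the rating itself, so a restaurant with many reviews ranks above a slightly better-rated one with few.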
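As a rough idea of what the query-then-pandas flow can look like, here is a small sketch. The `DBHandler` class below, the table, and the data are hypothetical stand-ins, not the actual contents of dbhandler.py, and it uses plain SQL text instead of ORM models to stay short.

```python
import pandas as pd
from sqlalchemy import create_engine, text


class DBHandler:
    """Hypothetical wrapper: owns the engine and turns a SQL string into a DataFrame."""

    def __init__(self, url):
        self.engine = create_engine(url)

    def query(self, sql):
        with self.engine.connect() as conn:
            result = conn.execute(text(sql))
            columns = list(result.keys())
            rows = [tuple(row) for row in result.fetchall()]
        return pd.DataFrame(rows, columns=columns)


# In-memory SQLite database with a made-up table, standing in for databases/*.db.
handler = DBHandler("sqlite:///:memory:")
with handler.engine.begin() as conn:
    conn.execute(text("CREATE TABLE restaurants (name TEXT, city TEXT)"))
    conn.execute(
        text("INSERT INTO restaurants (name, city) VALUES (:name, :city)"),
        [
            {"name": "A", "city": "Gent"},
            {"name": "B", "city": "Gent"},
            {"name": "C", "city": "Antwerpen"},
        ],
    )

# Swapping the SQL string is all it takes to answer a different question.
per_city = handler.query(
    "SELECT city, COUNT(*) AS restaurants "
    "FROM restaurants GROUP BY city ORDER BY restaurants DESC"
)
print(per_city)
```

Because the handler only returns DataFrames, the plotting side never needs to know which database or query produced the data.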
- SQLAlchemy
- Pandas
- Contextily
- Matplotlib
- Plotly
delivery-market-analysis/
│
├── visualizations_data/
│ └── prices_destribution_data/
│     └── prices.csv
│ └── kapsalons_data/
│     └── kapsalons_deliveroo
│     └── kapsalons_ubereats
│     └── kapsalons_takeaway
│ └── deliveroo_data
│ └── takeaway_data
│
├── databases/
│ └── ubereats.db
│ └── takeaway.db
│ └── deliveroo.db
│
├── visualizations/
│
├── utils/
│ └── answers.py
│ └── dbhandler.py
│ └── plotmaker.py
│
├── notebooks/
├── requirements.txt
├── README.md
└── main.py
We did encounter some problems and inconsistencies when working with the databases. Here are some that we found:
- Categories are not uniform and contain special characters and duplicates. Example from the ubereats db: '€€', 'Street food', 'Street food'.
- The data seems to be only from Flanders.
- The Ubereats database stores prices without decimal points, so be sure to divide by 100 in any query involving price.
- The Takeaway database has a confusing structure, so be sure to double-check that keys match the correct column!
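Two of the issues above can be handled with a few lines of pandas. This is only a sketch on made-up rows: the column names are assumptions, and treating symbol-only labels such as '€€' as non-categories is our guess at a reasonable cleanup, not necessarily what the project code does.

```python
import pandas as pd

# Made-up rows; column names and values are illustrative assumptions.
ubereats_items = pd.DataFrame(
    {
        "item": ["Kapsalon", "Margherita"],
        "price": [1250, 900],  # stored without a decimal point
    }
)
# Convert the stored integer to euros by dividing by 100.
ubereats_items["price_eur"] = ubereats_items["price"] / 100

categories = pd.Series(["€€", "Street food", "Street food", " Pizza "])
cleaned = (
    categories.str.strip()  # remove stray whitespace around labels
    .loc[lambda s: s.str.contains(r"[A-Za-z]")]  # keep labels with at least one letter
    .drop_duplicates()
    .tolist()
)
print(ubereats_items)
print(cleaned)
```

The letter check is deliberately simple and would also drop category names written entirely in a non-Latin script, so adjust it to the data you actually see.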
This project was completed in 5 days.
No updates are planned so far, but that might change in the future!
This project was done as part of the AI Bootcamp at BeCode.org.
- https://github.com/IzaMacBor
- https://github.com/Ihor1654
- https://github.com/Rasmita-D
Be sure to check out their repos!