Analyzing Online Retail Business Performance

Insights required:

The aim of the project is to analyze online retail store data to answer the following business questions:
1) Top 10 highest revenue-generating products
2) Top 5 highest-selling products in each region
3) Month-over-month growth comparison for 2022 and 2023 sales
4) For each category, which month had the highest sales
5) Which sub-category had the highest growth in profit from 2022 to 2023

Methodology:

* Data was extracted from Kaggle into a Jupyter notebook using the Kaggle API, then cleaned and pre-processed with pandas.
* The cleaned data frame was then loaded into a Microsoft SQL Server database using SQLAlchemy, a Python SQL toolkit for accessing and managing SQL databases from Python code.
* SQL queries answer the business questions, making use of aggregations, CTEs, CASE statements and window functions.

Retail Orders data downloaded from Kaggle through the API and extracted from the zip file into a data frame

image
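A minimal sketch of the extraction step, assuming the Kaggle download has already produced a zip archive containing a CSV file. The download itself needs Kaggle credentials, so it is left out here. The helper name `read_csv_from_zip`, the file name `orders.csv` and the sample rows are hypothetical and only serve to make the example runnable offline.

```python
import io
import zipfile

import pandas as pd


def read_csv_from_zip(zip_source, member=None):
    """Load one CSV file from a zip archive into a DataFrame."""
    with zipfile.ZipFile(zip_source) as archive:
        # Default to the first CSV in the archive if no member is named.
        if member is None:
            member = next(n for n in archive.namelist() if n.endswith(".csv"))
        with archive.open(member) as handle:
            return pd.read_csv(handle)


# Build a tiny stand-in archive in memory (hypothetical data).
# With the real download, pass the path of the zip file instead.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "orders.csv",
        "order_id,ship_mode\n1,Standard Class\n2,Not Available\n",
    )
buffer.seek(0)

df = read_csv_from_zip(buffer)
print(df.shape)
```

Reading straight from the archive avoids leaving an unpacked copy of the CSV on disk, though extracting it first would work just as well.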

Finding columns with missing values and setting unknown/not available values in “ship_mode” to null

image
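One possible way to do this with pandas is to declare the placeholder strings as missing values while reading the file, then count the nulls per column. The placeholder spellings (`"Not Available"`, `"unknown"`) and the sample rows are assumptions for illustration; the actual values should be checked in the real data first.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the downloaded CSV.
raw_csv = io.StringIO(
    "order_id,ship_mode\n"
    "1,Standard Class\n"
    "2,Not Available\n"
    "3,unknown\n"
    "4,\n"
)

# Placeholder strings to treat as missing; the exact list is an assumption.
placeholders = ["Not Available", "unknown"]

# na_values given as a dict applies the placeholders to ship_mode only.
df = pd.read_csv(raw_csv, na_values={"ship_mode": placeholders})

# Count missing values per column to see which columns are affected.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

The same result can be reached after loading with `df["ship_mode"].replace(...)`, but handling it in `read_csv` keeps the cleaning in one place.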

Creation of new columns: ‘discount’, ‘sale_price’, ‘profit’ and conversion of order-date to date data type

image
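The derived columns could be computed roughly as follows. The input column names `list_price`, `cost_price` and `discount_percent`, the formulas and the sample values are assumptions, since the README shows this step only as a screenshot.

```python
import pandas as pd

# Hypothetical input rows; column names are assumed, not taken from the dataset.
df = pd.DataFrame(
    {
        "order_date": ["2023-03-01", "2022-08-15"],
        "cost_price": [240, 600],
        "list_price": [260, 730],
        "discount_percent": [2, 3],
    }
)

# Discount amount in currency units, derived from the percentage.
df["discount"] = df["list_price"] * df["discount_percent"] / 100

# Price actually paid after the discount.
df["sale_price"] = df["list_price"] - df["discount"]

# Profit per row: what was paid minus what the item cost.
df["profit"] = df["sale_price"] - df["cost_price"]

# Convert the order date from text to a datetime column.
df["order_date"] = pd.to_datetime(df["order_date"], format="%Y-%m-%d")

print(df.dtypes)
```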

Creating the table schema in SQL Server with the required columns, data types and lengths

image

Using SQLAlchemy to export data from the Python data frame to the ORDERS table in the master database of SQL Server

image
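A sketch of the load step, assuming the table is created first with explicit types and the data frame is then appended to it. To stay runnable without a SQL Server instance, it uses an in-memory SQLite engine as a stand-in; the SQL Server URL in the comment contains placeholder values, and the three columns shown are illustrative only.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite keeps the sketch runnable. For SQL Server the URL would look like
# "mssql+pyodbc://<server>/<database>?driver=ODBC+Driver+17+for+SQL+Server"
# (placeholder values, requires the pyodbc driver).
engine = create_engine("sqlite://")

# Hypothetical cleaned rows.
df = pd.DataFrame(
    {
        "order_id": [1, 2],
        "ship_mode": ["Standard Class", None],
        "sale_price": [254.8, 708.1],
    }
)

# Create the table first with explicit types and lengths (illustrative columns).
with engine.begin() as conn:
    conn.execute(
        text(
            "CREATE TABLE orders ("
            " order_id INTEGER PRIMARY KEY,"
            " ship_mode VARCHAR(20),"
            " sale_price DECIMAL(7, 2))"
        )
    )

# if_exists="append" inserts into the existing table instead of replacing it.
df.to_sql("orders", con=engine, index=False, if_exists="append")

with engine.connect() as conn:
    row_count = conn.execute(text("SELECT COUNT(*) FROM orders")).scalar()
print(row_count)
```

Creating the schema up front and appending keeps control over column types; letting `to_sql` create the table would pick default types that are often wider than needed.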

Top 10 highest revenue generating products

image
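The query behind this result is likely a simple aggregation. The sketch below runs an equivalent query against an in-memory SQLite table from Python so it can be tried without SQL Server; the column names `product_id` and `sale_price` and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product_id TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("P1", 100.0), ("P2", 250.0), ("P1", 300.0), ("P3", 50.0)],
)

# SQLite uses LIMIT; SQL Server would use SELECT TOP 10 instead.
top_products = conn.execute(
    """
    SELECT product_id, SUM(sale_price) AS revenue
    FROM orders
    GROUP BY product_id
    ORDER BY revenue DESC
    LIMIT 10
    """
).fetchall()
print(top_products)
```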

Top 5 highest selling products in each region

image
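A per-group top-N like this is commonly written with a CTE and `ROW_NUMBER()`. The sketch below assumes "highest selling" means highest total `sale_price` and uses hypothetical column names and sample rows in an in-memory SQLite table; the demo call asks for the top 2 so the filter is visible on a small sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product_id TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("East", "P1", 100.0), ("East", "P2", 300.0), ("East", "P3", 200.0),
        ("East", "P1", 50.0),
        ("West", "P1", 500.0), ("West", "P3", 400.0), ("West", "P2", 10.0),
    ],
)

QUERY = """
WITH product_sales AS (
    -- Total sales per product within each region.
    SELECT region, product_id, SUM(sale_price) AS sales
    FROM orders
    GROUP BY region, product_id
),
ranked AS (
    -- Rank products inside each region, best seller first.
    SELECT region, product_id, sales,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY sales DESC) AS rn
    FROM product_sales
)
SELECT region, product_id, sales
FROM ranked
WHERE rn <= ?
ORDER BY region, rn
"""


def top_products_per_region(connection, n=5):
    """Return the n best-selling products for every region."""
    return connection.execute(QUERY, (n,)).fetchall()


top_two = top_products_per_region(conn, n=2)
print(top_two)
```

`ROW_NUMBER()` breaks ties arbitrarily; `RANK()` or `DENSE_RANK()` would keep tied products together if that matters.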

Month-over-month growth comparison for 2022 & 2023

image
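One way to lay the two years side by side is to aggregate sales per year and month in a CTE and then pivot the years into columns with CASE expressions. This is a sketch under assumed column names (`order_date`, `sale_price`) with made-up rows, run against in-memory SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2022-01-10", 100.0), ("2022-01-20", 50.0), ("2022-02-05", 200.0),
        ("2023-01-15", 300.0), ("2023-02-01", 150.0), ("2023-02-28", 25.0),
    ],
)

# strftime is SQLite-specific; SQL Server would use YEAR() and MONTH().
monthly = conn.execute(
    """
    WITH monthly_sales AS (
        SELECT CAST(strftime('%Y', order_date) AS INTEGER) AS order_year,
               CAST(strftime('%m', order_date) AS INTEGER) AS order_month,
               SUM(sale_price) AS sales
        FROM orders
        GROUP BY order_year, order_month
    )
    SELECT order_month,
           SUM(CASE WHEN order_year = 2022 THEN sales ELSE 0 END) AS sales_2022,
           SUM(CASE WHEN order_year = 2023 THEN sales ELSE 0 END) AS sales_2023
    FROM monthly_sales
    GROUP BY order_month
    ORDER BY order_month
    """
).fetchall()
print(monthly)
```

Each output row holds one calendar month with its 2022 and 2023 totals, so the year-on-year change for that month can be read off directly.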

For each category, which month had the highest sales

image
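This question follows the same rank-within-group pattern: total the sales per category and month, rank the months inside each category, and keep the first. The sketch assumes a month is identified as year plus month (for example `2022-03`) and uses hypothetical column names and rows in in-memory SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, order_date TEXT, sale_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Furniture", "2022-01-10", 100.0), ("Furniture", "2022-03-02", 400.0),
        ("Furniture", "2023-01-05", 250.0),
        ("Technology", "2022-07-19", 80.0), ("Technology", "2023-07-01", 90.0),
        ("Technology", "2023-07-20", 30.0),
    ],
)

# strftime('%Y-%m', ...) is SQLite-specific; SQL Server could use FORMAT(order_date, 'yyyy-MM').
best_months = conn.execute(
    """
    WITH category_month AS (
        SELECT category,
               strftime('%Y-%m', order_date) AS order_year_month,
               SUM(sale_price) AS sales
        FROM orders
        GROUP BY category, order_year_month
    ),
    ranked AS (
        SELECT category, order_year_month, sales,
               ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS rn
        FROM category_month
    )
    SELECT category, order_year_month, sales
    FROM ranked
    WHERE rn = 1
    ORDER BY category
    """
).fetchall()
print(best_months)
```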

Profit growth for product sub-categories from highest to lowest (2022 to 2023)

image
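A sketch of how this ranking might be produced: a CTE totals profit per sub-category for each year using CASE expressions, and the outer query sorts by the difference. Growth is taken here as the absolute change in profit, which is an assumption (a percentage change would be an equally plausible reading); column names and rows are hypothetical, and SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (sub_category TEXT, order_date TEXT, profit REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Chairs", "2022-02-01", 100.0), ("Chairs", "2023-02-01", 180.0),
        ("Phones", "2022-05-10", 200.0), ("Phones", "2023-05-10", 150.0),
        ("Phones", "2023-09-30", 20.0),
        ("Binders", "2022-11-11", 50.0), ("Binders", "2023-11-11", 90.0),
    ],
)

# strftime is SQLite-specific; SQL Server would use YEAR(order_date).
growth = conn.execute(
    """
    WITH yearly_profit AS (
        SELECT sub_category,
               SUM(CASE WHEN strftime('%Y', order_date) = '2022'
                        THEN profit ELSE 0 END) AS profit_2022,
               SUM(CASE WHEN strftime('%Y', order_date) = '2023'
                        THEN profit ELSE 0 END) AS profit_2023
        FROM orders
        GROUP BY sub_category
    )
    SELECT sub_category, profit_2022, profit_2023,
           profit_2023 - profit_2022 AS profit_growth
    FROM yearly_profit
    ORDER BY profit_growth DESC
    """
).fetchall()
print(growth)
```

Sorting in descending order puts the fastest-growing sub-category first and leaves any that lost profit at the bottom with a negative value.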