Skip to content

Utilized SQL in Google BigQuery to analyze sales performance, extracting key business insights to enhance decision-making and optimize operations.

Notifications You must be signed in to change notification settings

bichngocbui/SQL-in-BigQuery-Sales-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 

Repository files navigation

SQL- BigQuery - Sales Analysis

Introduction

This project aims to conduct an in-depth analysis of sales performance for AdventureWorld, leveraging comprehensive SQL analytics on BigQuery to extract critical business insights across multiple dimensions.

Key Analysis Areas

The analysis is divided into the following focus areas:

Sales and Growth Analysis

  • Calculate the total quantity of items, sales value, and order quantity by subcategory for the last 12 months (L12M).
  • Determine the YoY growth rate (%) by subcategory and identify the top three categories with the highest growth rates based on quantity sold.

Territory Performance

  • Rank the top three TerritoryIDs by yearly order quantity, ensuring no ranking is skipped for ties.

Discount Costs

  • Calculate the total discount cost for seasonal discounts by subcategory.

Customer Retention

  • Perform a cohort analysis to calculate the retention rate of customers in 2014 who achieved the "Successfully Shipped" status.

Stock Trends

  • Analyze stock levels in 2011, identifying month-over-month (MoM) differences (%) for all products. If the growth rate is null, default to 0.
  • Calculate the stock-to-sales ratio by product and month in 2011, ordering results by descending month and ratio.

Order Metrics

  • Summarize the number of orders and total value for orders in "Pending" status during 2014.

Business Value

This analysis empowers AdventureWorld to:

  • Monitor sales performance and identify high-growth subcategories to drive revenue.
  • Recognize top-performing territories for strategic focus.
  • Optimize discount strategies by evaluating seasonal discount costs.
  • Retain customers through actionable insights into shipping success and retention metrics.
  • Manage inventory effectively using stock-level trends and stock-to-sales ratios.
  • Improve operational efficiency by understanding pending orders.

Data Access & Structure

Dataset Information

  • Platform: Google BigQuery
  • Dataset: AdventureWorks (Specific database for analysis)
  • Time Period: Full historical data, with focus on 2011-2014

Data Schema

For detailed information about the Google Analytics dataset schema, please refer to the official Google Analytics documentation, available at https://drive.google.com/file/d/1bwwsS3cRJYOg1cvNppc1K_8dQLELN16T/view

Technical Implementation

The project leverages BigQuery's powerful features including:

  • Advanced SQL aggregations and window functions
  • Cohort analysis using date-based grouping
  • Comparative growth calculations (YoY, MoM)

Exploring the Dataset

Query 1: Calc Quantity of items, Sales value & Order quantity by each Subcategory in L12M

Syntax

with latest_date as (
      select 
            max (ModifiedDate) as latest_date 
      from `adventureworks2019.Sales.SalesOrderDetail` 
)

select 
      format_date ('%b %Y', a.ModifiedDate) as period,
      c.name as name,
      sum (a.OrderQty) as qty_item,
      sum (a.LineTotal) as total_sales,
      count (distinct a.SalesOrderID) as order_cnt 
from `adventureworks2019.Sales.SalesOrderDetail` a 
left join `adventureworks2019.Production.Product` b 
      on a.ProductID = b.ProductID
left join `adventureworks2019.Production.ProductSubcategory` c
      on cast (b.ProductSubcategoryID as int) = c.ProductSubcategoryID
where date(a.ModifiedDate) between (date_sub('2014-06-30', interval 12 month)) and '2014-06-30'
group by period, name 
order by name;

Result

image

Based on the analysis of Quantity, Sales Value, and Order Quantity by Subcategory in L12M, it suggests that subcategories with high quantity but low sales value may indicate low-priced, high-volume products. Conversely, those with high sales value but low orders could point to premium products with lower frequency. Recommendations include focusing on high-margin subcategories for targeted promotions, and exploring bundling for lower-priced items to improve profitability.

Query 2: Calc % YoY growth rate by SubCategory & release top 3 cat with highest grow rate

Syntax

with 
sale_info as (
  SELECT 
      FORMAT_TIMESTAMP("%Y", a.ModifiedDate) as yr
      , c.Name
      , sum(a.OrderQty) as qty_item

  FROM `adventureworks2019.Sales.SalesOrderDetail` a 
  LEFT JOIN `adventureworks2019.Production.Product` b on a.ProductID = b.ProductID
  LEFT JOIN `adventureworks2019.Production.ProductSubcategory` c on cast(b.ProductSubcategoryID as int) = c.ProductSubcategoryID

  GROUP BY 1,2
  ORDER BY 2 asc , 1 desc
),

sale_diff as (
  select *
  , lead (qty_item) over (partition by Name order by yr desc) as prv_qty
  , round(qty_item / (lead (qty_item) over (partition by Name order by yr desc)) -1,2) as qty_diff
  from sale_info
  order by 5 desc 
),

rk_qty_diff as (
  select *
      ,dense_rank() over( order by qty_diff desc) dk
  from sale_diff
)

select distinct Name
      , qty_item
      , prv_qty
      , qty_diff
from rk_qty_diff 
where dk <=3
order by dk ;

Result

image

The table calculates the Year-over-Year (YoY) growth rate by subcategory, revealing the top 3 subcategories with the highest growth rates. Mountain Frames leads with a growth rate of 5.21%, followed by Socks at 4.21%, and Road Frames at 3.89%. These subcategories show strong performance, indicating successful strategies in boosting sales. Monitoring these categories can provide insights into effective approaches that could be applied to other subcategories to drive growth.

Query3: Ranking Top 3 TeritoryID with biggest Order quantity of every year

Syntax

with order_cnt_data as ( 
      select 
            extract (year from a.ModifiedDate) as yr,
            TerritoryID, 
            sum(a.OrderQty) as order_cnt,
      from `adventureworks2019.Sales.SalesOrderDetail` a 
      left join `adventureworks2019.Sales.SalesOrderHeader` b
            on a.SalesOrderID = b.SalesOrderID
      group by yr, TerritoryID
)

select * 
from ( 
      select 
            yr,
            TerritoryID,
            order_cnt,
            dense_rank () over (partition by yr order by order_cnt desc) as rk
      from order_cnt_data
      ) 
where rk < 4 
order by yr desc; 

Result

image

The table ranks the top 3 Territory IDs with the biggest order quantities for each year. TerritoryID 4 consistently ranks first, showing its dominant position across all years.

Query4: Calc Total Discount Cost belongs to Seasonal Discount for each SubCategory

Syntax

select 
    FORMAT_TIMESTAMP("%Y", ModifiedDate) as year
    , Name
    , sum(disc_cost) as total_cost
from (
      select distinct a.*
      , c.Name
      , d.DiscountPct, d.Type
      , a.OrderQty * d.DiscountPct * UnitPrice as disc_cost 
      from `adventureworks2019.Sales.SalesOrderDetail` a
      LEFT JOIN `adventureworks2019.Production.Product` b on a.ProductID = b.ProductID
      LEFT JOIN `adventureworks2019.Production.ProductSubcategory` c on cast(b.ProductSubcategoryID as int) = c.ProductSubcategoryID
      LEFT JOIN `adventureworks2019.Sales.SpecialOffer` d on a.SpecialOfferID = d.SpecialOfferID
      WHERE lower(d.Type) like '%seasonal discount%' 
)
group by 1,2;

Result

image

The query calculates the total discount cost related to the Seasonal Discount for each SubCategory, and in this case, "Helmets" is the only product that receives the discount. For 2012, the total discount cost for Helmets is 827.65, while for 2013, the cost increases to 1606.04. This shows a significant increase in the total seasonal discount cost for Helmets from 2012 to 2013_, highlighting the growing investment in discounts for this particular product.

Query 5: Retention rate of Customer in 2014 with status of Successfully Shipped (Cohort Analysis)

Syntax

with 
info as (
  select  
      extract(month from ModifiedDate) as month_no
      , extract(year from ModifiedDate) as year_no
      , CustomerID
      , count(Distinct SalesOrderID) as order_cnt
  from `adventureworks2019.Sales.SalesOrderHeader`
  where FORMAT_TIMESTAMP("%Y", ModifiedDate) = '2014'
  and Status = 5
  group by 1,2,3
  order by 3,1 
),

row_num as (--đánh số thứ tự các tháng họ mua hàng
  select *
      , row_number() over (partition by CustomerID order by month_no) as row_numb
  from info 
), 

first_order as (   --lấy ra tháng đầu tiên của từng khách
  select *
  from row_num
  where row_numb = 1
), 

month_gap as (
  select 
      a.CustomerID
      , b.month_no as month_join
      , a.month_no as month_order
      , a.order_cnt
      , concat('M - ',a.month_no - b.month_no) as month_diff
  from info a 
  left join first_order b 
  on a.CustomerID = b.CustomerID
  order by 1,3
)

select month_join
      , month_diff 
      , count(distinct CustomerID) as customer_cnt
from month_gap
group by 1,2
order by 1,2;

Result

image

The table shows a significant decline in customer counts as the months since joining increase. Month M-0 has the highest number of customers, while Month M-1 and beyond see sharp drops, with Month M-6 having the lowest count. This suggests a need for retention strategies to keep customers engaged beyond their first few months.

Query 6: Trend of Stock level & MoM diff % by all product in 2011

Syntax

with stock_current_data as (
      select
            a.Name as name, 
            extract (month from b.ModifiedDate) as month,
            extract (year from b.ModifiedDate) as year,
            sum(StockedQty) as stock_current
      from `adventureworks2019.Production.Product` a 
      left join `adventureworks2019.Production.WorkOrder` b
           on a.ProductID = b.ProductID
      where extract (year from b.ModifiedDate) = 2011 
      group by name, month, year 
), 

stock_prev_data as ( 
      select
            name,
            month,
            year,
            stock_current,
            lag (stock_current, 1) over (partition by name order by month) as stock_prev 
      from stock_current_data
)

select 
      name,
      month,
      year,
      stock_current,
      stock_prev,
      round ((stock_current - stock_prev) / stock_prev * 100, 1) as diff
from stock_prev_data
order by name, month desc; 

Result

image

The data reveals fluctuating stock levels and significant month-over-month changes in 2011, with most products bearings showing sharp declines in stock towards the end of the year. Seasonal demand appears to influence these variations, as many items experienced large swings in stock, indicating unpredictable demand. To improve, it's recommended to enhance demand forecasting models to align stock levels with actual trends, focus on better inventory management for seasonal fluctuations, and prioritize replenishing high-demand items before peak periods to avoid shortages.

Query 7: Calc Ratio of Stock / Sales in 2011 by product name, by month

Syntax

with 
sale_info as (
  select 
      extract(month from a.ModifiedDate) as mth 
     , extract(year from a.ModifiedDate) as yr 
     , a.ProductId
     , b.Name
     , sum(a.OrderQty) as sales
  from `adventureworks2019.Sales.SalesOrderDetail` a 
  left join `adventureworks2019.Production.Product` b 
    on a.ProductID = b.ProductID
  where FORMAT_TIMESTAMP("%Y", a.ModifiedDate) = '2011'
  group by 1,2,3,4
), 

stock_info as (
  select
      extract(month from ModifiedDate) as mth 
      , extract(year from ModifiedDate) as yr 
      , ProductId
      , sum(StockedQty) as stock_cnt
  from 'adventureworks2019.Production.WorkOrder'
  where FORMAT_TIMESTAMP("%Y", ModifiedDate) = '2011'
  group by 1,2,3
)

select
      a.*
    , b.stock_cnt as stock  --(*)
    , round(coalesce(b.stock_cnt,0) / sales,2) as ratio
from sale_info a 
full join stock_info b 
  on a.ProductId = b.ProductId
and a.mth = b.mth 
and a.yr = b.yr
order by 1 desc, 7 desc;

Result

image

The data shows varying stock-to-sales ratios in 2011. In December, products like HL Mountain Frame - Black, 48 had high ratios (27), indicating overstocking, while items such as Road-150 Red, 52 had balanced ratios (~1). October saw a mix, with some products overstocked (e.g., HL Mountain Frame - Black, 48, ratio 2.91) and others understocked (e.g., Road-150 Red, 52, ratio < 1). Improved demand forecasting is needed to optimize inventory levels.

Query 8: No of order and value at Pending status in 2014

Syntax

select 
      extract (year from ModifiedDate) as yr, 
      Status,
      count(distinct PurchaseOrderID) as order_Cnt, 
      sum(TotalDue) as value
from `adventureworks2019.Purchasing.PurchaseOrderHeader` 
where status = 1 
      and extract (year from ModifiedDate) = 2014 
group by yr, Status; 

Result

image

In 2014, there were 224 orders with a total value of 3,873,579.01 at the Pending status. These orders could indicate issues such as delays in processing, _inventory shortages, or payment problems. To improve operational efficiency, it's recommended to investigate the underlying causes of these pending orders, implement automation or process improvements to reduce delays, and prioritize customer communication to manage expectations and improve satisfaction.

Conclusion

This project analyzes sales data for AdventureWorld using BigQuery to provide key insights that optimize business strategies. The analysis covers sales and growth by product category, performance by region, discount cost efficiency, customer retention, inventory trends, and order metrics. The project helps identify high-growth areas, optimize discount strategies, retain long-term customers, and improve inventory and operations management.

`

About

Utilized SQL in Google BigQuery to analyze sales performance, extracting key business insights to enhance decision-making and optimize operations.

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published