Automated data-driven project leveraging Python, Airflow, and GKE. Scrapes diverse data sources, providing insights into Xbox hardware and software data.

Elsayed91/xbox_de_project


Xbox Data Scraping Pipeline

A data-driven project that uses Python, Airflow, GCP and K8s to gather & provide insights into Xbox data.



Table of Contents
  1. Architecture
  2. Getting Started
  3. Things to Consider
  4. How to Improve

Architecture

The project aims to periodically collect and analyze Xbox-specific data from various sources. It involves Python-based scraping scripts and is orchestrated by Airflow, running on Kubernetes (GKE).

[Figure: architecture diagram]

Getting Started

Prerequisites

Installation

  1. Clone the repository:
     git clone https://github.com/Elsayed91/xbox_de_project.git
  2. Install Pipenv:
     pip install pipenv
  3. Rename template.env to .env and fill out the values.
  4. Run the project setup script:
     make setup

Things to Consider

  • Weighted Performance Metric: The dashboard uses a weighted performance metric to account for uncertainty in game ratings. This prevents a game with only a few highly rated reviews from outscoring a game with many slightly lower ratings. The metric is derived from the Wilson score interval, a statistical method that accounts for how uncertain a game's true rating is when the number of reviews is small.

  • Game Pass Status Matching: The function that adds Game Pass status to the data works, but it can occasionally produce inaccurate results because titles do not always match between Metacritic.com and the Xbox Game Pass Master List. The dashboard compensates for this, but individual game lookups may still be affected by these mismatches.

  • Twitter Code: As of the 1st of July, Twitter requires authentication to view tweets, which has rendered Snscrape unusable. According to this tweet, the change is temporary. If it is not reverted soon, a fork of Snscrape that supports authentication could be used instead.
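
To illustrate the weighted performance metric mentioned above, here is a minimal sketch of the lower bound of the Wilson score interval. It assumes each review can be reduced to "positive" or "not positive"; the function name `wilson_lower_bound`, the 95% confidence level (`z = 1.96`), and the sample counts are illustrative assumptions, not the dashboard's actual formula.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a proportion.

    `positive` and `total` are hypothetical inputs: the number of
    positive reviews and the total number of reviews for a game.
    """
    if total == 0:
        # No reviews: nothing is known, so return the most conservative score.
        return 0.0
    p_hat = positive / total
    z2 = z * z
    denominator = 1 + z2 / total
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * total)) / total)
    return (centre - margin) / denominator


# A game with 5 out of 5 positive reviews versus one with 900 out of 1000.
few_reviews = wilson_lower_bound(5, 5)
many_reviews = wilson_lower_bound(900, 1000)
print(few_reviews < many_reviews)
```

Ranking by the lower bound rather than the raw average is what keeps a game with a handful of perfect reviews from outranking a game with many slightly lower ones.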

How to Improve

  • Game Pass Status: Monitor the project and create a list to fix troublesome titles, or consider using a different source, such as Xbox.com, to improve accuracy.
  • Project Scope: Expand the project to include Xbox PC.
  • Data Modeling: Data Modeling techniques could be implemented.
  • Leverage Unutilized Metacritic Data: Explore the player-count data from Metacritic to analyze trends.
  • NLP and Sentiment Analysis: Use NLP to extract key words associated with good and bad reviews and perform sentiment analysis on game reviews.
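
As one possible way to approach the Game Pass status improvement above, the sketch below normalizes titles before comparing them, consults a hand-maintained list of fixes for troublesome titles, and falls back to fuzzy matching from the standard library. The names `MANUAL_OVERRIDES`, `normalize_title`, and `is_on_game_pass`, the override entry, and the `0.9` cutoff are all hypothetical; none of them come from the project's code.

```python
import difflib
import re

# Hypothetical fix list: normalized Metacritic title -> normalized Game Pass title.
MANUAL_OVERRIDES = {
    "example metacritic title": "example game pass title",
}


def normalize_title(title: str) -> str:
    """Lowercase, replace anything outside a-z, 0-9 and space, collapse spaces."""
    title = title.lower()
    title = re.sub(r"[^a-z0-9 ]+", " ", title)
    return re.sub(r"\s+", " ", title).strip()


def is_on_game_pass(title: str, game_pass_titles: list[str], cutoff: float = 0.9) -> bool:
    """Return True if `title` matches an entry in `game_pass_titles`."""
    key = normalize_title(title)
    # Apply a manual fix first, if one exists for this title.
    key = MANUAL_OVERRIDES.get(key, key)
    candidates = [normalize_title(t) for t in game_pass_titles]
    if key in candidates:
        return True
    # Fall back to a fuzzy comparison for small spelling differences.
    return bool(difflib.get_close_matches(key, candidates, n=1, cutoff=cutoff))


master_list = ["Halo Infinite", "Forza Horizon 5"]
print(is_on_game_pass("Halo: Infinite", master_list))
print(is_on_game_pass("Starfield", master_list))
```

A high cutoff keeps false positives rare, at the cost of leaving more titles for the manual fix list; the right balance would need to be checked against the real data.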
