Collected social data needs to be monitored to evaluate its quantity and quality, which enables us to:
- Establish an optimized storage mechanism to improve server performance.
- Log and promptly alert on any irregular or noteworthy events (see the sketch after this list).
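As a rough illustration of the log-and-alert idea, a minimal check might look like the sketch below; the threshold and logger name are illustrative assumptions, not the project's actual `logger/logger.py`.

```python
# Minimal sketch of the log-and-alert idea; the threshold and logger
# name are illustrative assumptions, not the project's logger/logger.py.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("twitter_pipeline")


def check_batch(tweet_count: int, expected_min: int = 100) -> None:
    """Warn when a crawl batch looks abnormally small."""
    if tweet_count < expected_min:
        log.warning("Irregular batch: only %d tweets collected", tweet_count)
    else:
        log.info("Batch OK: %d tweets collected", tweet_count)
```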
This project aims to:
- Gather data from Twitter, using either the Twitter API or web scraping techniques (see the sketch after this list).
- Evaluate project quality and Key Opinion Leaders (KOLs) based on key social network metrics, including posting frequency, impression counts, and engagement levels.
- Create visual representations of the analyzed data.
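For the first objective, the sketch below pulls recent tweets through the Twitter v2 recent-search endpoint. The query string and the `TWITTER_BEARER_TOKEN` environment variable are assumptions for the example; the project's actual crawlers live under `twitter_crawler/`.

```python
# Minimal sketch: fetch recent tweets via the Twitter v2 API.
# The query and the TWITTER_BEARER_TOKEN env var are illustrative assumptions.
import os

import requests

resp = requests.get(
    "https://api.twitter.com/2/tweets/search/recent",
    headers={"Authorization": f"Bearer {os.environ['TWITTER_BEARER_TOKEN']}"},
    params={
        "query": "from:some_kol -is:retweet",  # hypothetical KOL account
        "tweet.fields": "public_metrics,created_at",  # engagement counts per tweet
        "max_results": 100,
    },
    timeout=30,
)
resp.raise_for_status()
for tweet in resp.json().get("data", []):
    print(tweet["id"], tweet["public_metrics"])  # likes, retweets, replies, quotes
```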
Project structure:

```
.
├── airflow_dags
│ ├── __init__.py
│ └── twitter_daily_dag.py
├── config.yaml
├── data
├── DataDoc.pdf
├── images
│ └── pipelinepipeline.jpeg
├── jars
│ ├── elasticsearch-spark-30_2.12-8.9.1.jar
│ └── gcs-connector-hadoop3-latest.jar
├── kafka_jobs
│ ├── consumer
│ │ ├── elasticsearch_consumer.py
│ │ └── gcs_consumer.py
│ ├── create_topic.sh
│ ├── __init__.py
│ └── twitter_producer.py
├── keys
│ ├── infra.json
│ ├── lucky-wall-393304-2a6a3df38253.json
│ └── lucky-wall-393304-3fbad5f3943c.json
├── LICENSE
├── logger
│ ├── __init__.py
│ └── logger.py
├── logs
├── modelling
│ └── ML_models.py
├── README.md
├── requirement.txt
├── test
│ ├── __init__.py
│ ├── test_data_processing.py
│ ├── test_gcb.py
│ ├── test_kafka.py
│ └── test_logger.py
├── test.ipynb
├── test.txt
├── twitter_crawler
│ ├── followers_crawler.py
│ ├── __init__.py
│ ├── tweet_kol_crawler_v2.py
│ ├── twitter_daily.py
│ ├── user_update_crawler.py
│ └── utils.py
└── utils.py
```

Requirements:
- Python >= 3.10
- Airflow >= 2.7.3
- Linux OS
Setup:

- Environment Variables File
  - Create a `.env` file
  - Adjust the values and credentials in `.env` (see the loading sketch below)
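For reference, the variables can be read in Python with `python-dotenv`. The variable names below (`TWITTER_BEARER_TOKEN`, `KAFKA_BOOTSTRAP_SERVERS`, `GOOGLE_APPLICATION_CREDENTIALS`) are illustrative assumptions, not the project's actual keys; only `keys/infra.json` is taken from the project tree.

```python
# Minimal sketch of reading the .env file with python-dotenv.
# All variable names here are illustrative assumptions.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

bearer_token = os.environ["TWITTER_BEARER_TOKEN"]
kafka_servers = os.getenv("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
gcp_key_path = os.getenv("GOOGLE_APPLICATION_CREDENTIALS", "keys/infra.json")
```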
- Python Environment

  Install the required packages:

  ```bash
  pip install -r requirement.txt
  ```

  You can also create a conda environment to install the required packages.
- Run Airflow
  - Please follow this article to install Airflow.
  - Change the necessary paths in the DAG scripts (the `dags`, `logs`, and `plugins` folders).
  - To run Airflow:

    ```bash
    airflow webserver & airflow scheduler &
    ```
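For orientation, a minimal daily DAG in the spirit of `airflow_dags/twitter_daily_dag.py` could look like the sketch below; the schedule and the `crawl_twitter_daily` callable are assumptions, not the project's actual code.

```python
# Minimal sketch of a daily crawl DAG; the schedule and the
# crawl_twitter_daily callable are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def crawl_twitter_daily() -> None:
    # Placeholder for the real crawler entry point
    # (e.g., twitter_crawler/twitter_daily.py).
    pass


with DAG(
    dag_id="twitter_daily_dag",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="crawl_twitter_daily",
        python_callable=crawl_twitter_daily,
    )
```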