Made by: Ines Achour / Safa Laabidi / Amal Sammari
In this project, we built a pipeline to process the Global Terrorism Database (GTD) from Kaggle.
Link: [https://www.kaggle.com/datasets/START-UMD/gtd].
The pipeline combines batch and stream processing, which is why it is based on the Lambda Architecture.
- Kafka
- Stream processing: Spark Streaming
- Batch processing: Hadoop MapReduce
- Streaming storage: MongoDB
- Batch storage: HDFS (data before processing) and MongoDB (data after processing)
- Dashboarding: MongoDB Charts
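
As a rough illustration of how a Lambda Architecture combines these layers, the following Python sketch merges a batch view with a real-time view at query time. It uses only the standard library as a stand-in for Hadoop MapReduce and Spark Streaming. The function names, the `country` field, and the sample records are hypothetical and are not taken from the project code.

```python
from collections import Counter


def batch_view(records):
    """Batch layer: recompute attack counts per country from the full dataset."""
    return Counter(r["country"] for r in records)


def update_speed_view(view, event):
    """Speed layer: incrementally count one newly arrived event."""
    view[event["country"]] += 1
    return view


def serve(batch, speed):
    """Serving layer: merge the batch and real-time views to answer a query."""
    return batch + speed


# Hypothetical sample data (not real GTD rows)
historical = [{"country": "Iraq"}, {"country": "Iraq"}, {"country": "India"}]
incoming = [{"country": "India"}, {"country": "Peru"}]

batch = batch_view(historical)      # periodically recomputed from scratch
speed = Counter()                   # updated as each event arrives
for event in incoming:
    update_speed_view(speed, event)

merged = serve(batch, speed)
print(dict(merged))
```

In this sketch the batch view is rebuilt from the whole dataset, while the speed view only covers events that arrived since the last batch run; a query reads the sum of both.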
- GlobalTerrorism_Stream
- GlobalTerrorism_Batch
- GlobalTerrorism_Kafka_Stream
- GlobalTerrorism_Kafka_Batch: appends the data sent through Kafka to the database CSV file
- GlobalTerrorism_Batch_MongoDB: launches the batch process on the CSV database and saves the result in a MongoDB database
- GlobalTerrorism_Kafka_MongoDB: receives streaming data, processes it, and saves the result in a MongoDB database
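
The append-then-batch flow described for the last modules could look roughly like the Python sketch below. It is a minimal illustration under stated assumptions, not the project's implementation: a plain list of strings stands in for messages consumed from a Kafka topic, the helper names `append_messages` and `count_by_column` are hypothetical, the two-column sample file is invented, and the step that writes results to MongoDB is left out.

```python
import csv
import os
import tempfile


def append_messages(csv_path, messages):
    """Append each CSV-formatted message as one row at the end of the file."""
    appended = 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for message in messages:
            # Parse the message as a single CSV line so quoted commas survive.
            row = next(csv.reader([message]))
            writer.writerow(row)
            appended += 1
    return appended


def count_by_column(csv_path, column):
    """Batch step: count rows per distinct value of one column."""
    counts = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row[column]] = counts.get(row[column], 0) + 1
    return counts


# Hypothetical sample file standing in for the CSV database
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "gtd_sample.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([["year", "country"], ["2016", "Iraq"]])

# Stand-in for messages received from a Kafka topic
messages = ['2017,India', '2017,"Korea, South"']
appended = append_messages(csv_path, messages)
counts = count_by_column(csv_path, "country")
print(appended, counts)
```

Parsing each message with the `csv` module rather than splitting on commas keeps quoted fields intact, which matters for free-text columns that may themselves contain commas.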
Visualization: https://charts.mongodb.com/charts-globalterrorism-inmsa/public/dashboards/bcd6aeb2-f6f9-4aee-bbab-38f7e0b60851