A Data Mining project at Université Libre de Bruxelles (ULB). The aim is to detect anomalies in SNCB train sensor data by combining data preprocessing, domain knowledge, and clustering and outlier-detection algorithms.
Team Members:
- Simon Coessens
- Md Kamrul Islam (Konok)
- Narmina Mahmudova
- José Carlos Lozano (Pepe)
Approach: anomalies are identified through:
- Data Preprocessing: Handling data quality issues.
- Domain Knowledge Analysis: Addressing specific research questions.
- Advanced Algorithms: Implementing clustering and outlier detection.
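As a rough illustration of the outlier-detection side, the sketch below flags readings using a robust, median/MAD-based modified z-score. It is a simpler stand-in, not the clustering pipeline the team used; the function name `flag_outliers`, the 3.5 cutoff and the sample temperatures are assumptions chosen for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Hypothetical helper (not from the project code). The modified z-score is
    0.6745 * (x - median) / MAD, where MAD is the median absolute deviation;
    it is less distorted by the outliers themselves than a mean/std z-score.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all: every reading equals the median
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# Made-up engine temperatures (°C); the spike at index 4 stands out.
temps = [82.0, 81.5, 83.1, 82.4, 140.2, 81.9, 82.7]
print(flag_outliers(temps))  # → [4]
```

Using the median and MAD instead of the mean and standard deviation keeps a single extreme reading from inflating the spread and hiding itself.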
Key Achievements:
- Data Handling: Migrated from CSV to PostgreSQL for better performance.
- Exploratory Data Analysis: Identified anomalies in temperature, RPM, etc.
- Data Enrichment: Added weather data (temperature, humidity, rain).
- Research Questions: Investigated temperature anomalies, sensor errors, and speed irregularities.
- Anomaly Detection: Used clustering and classification techniques.
- Dashboard Development: Created visualizations of the detected anomalies.
- Real-Time Detection: Set up streaming algorithms to flag live anomalies.
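The real-time detection item can be pictured with a small streaming detector. This sketch assumes readings (for example RPM values) arrive one at a time and tracks an exponentially weighted moving average and variance; the class `EwmaDetector`, its parameters and the sample stream are hypothetical and are not taken from the project's code.

```python
import math


class EwmaDetector:
    """Hypothetical streaming detector: flags a reading when it deviates from
    an exponentially weighted moving average by more than k standard deviations."""

    def __init__(self, alpha=0.1, k=4.0, warmup=5):
        self.alpha = alpha    # smoothing factor: higher reacts faster to change
        self.k = k            # deviation threshold, in standard deviations
        self.warmup = warmup  # readings to observe before any flag is raised
        self.mean = None
        self.var = 0.0
        self.count = 0

    def update(self, x):
        """Consume one reading and return True if it looks anomalous."""
        self.count += 1
        if self.mean is None:
            self.mean = x  # first reading initialises the baseline
            return False
        diff = x - self.mean
        std = math.sqrt(self.var)
        is_anomaly = (self.count > self.warmup and std > 0
                      and abs(diff) > self.k * std)
        if not is_anomaly:
            # Incremental EWMA mean/variance update, applied to normal
            # readings only so that a spike does not inflate the baseline.
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        return is_anomaly


# Made-up RPM stream with one spike at index 7.
detector = EwmaDetector()
rpm = [1500, 1510, 1495, 1505, 1502, 1498, 1507, 3000, 1501]
flags = [detector.update(r) for r in rpm]
anomalies = [i for i, f in enumerate(flags) if f]
print(anomalies)  # → [7]
```

Only readings judged normal update the statistics, so a single spike does not distort the baseline. The trade-off is that a genuine, lasting level shift keeps being flagged until the detector is reset.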
Timeline:
- 2 Nov: Initial anomaly detection techniques, data cleaning, visualizations.
- 9 Nov: MobilityDB setup, local Jupyter setup, refinement of anomaly detection.
- 19 Nov: Feature engineering and database updates.
- 23 Nov: Data Mining lab preparation.
Next Steps:
- Refine streaming algorithms.
- Improve clustering and classification for data cleaning.