Grab AI For S.E.A.

🏆 AI For S.E.A. is an online challenge held by Grab to search for talent and innovative technologies across Southeast Asia. In this challenge, participants are tasked with selecting and tackling one problem statement (as shown in the image below) by leveraging data science and AI technologies.

Problem Statement: How Do We Detect Dangerous Driving?

Project Objective

💡 My objective is to build an end-to-end machine learning pipeline based on telematics data to detect dangerous driving on the road.

Solution

Exploratory Data Analysis

I began with exploratory data analysis to understand the class distribution and what distinguishes safe driving from dangerous driving.

Sample Dataset

Full Dataset is available here

Below is an example of the dataset provided

Binary Class Distribution

A visualization of the class distribution of safe vs. unsafe driving (15007 safe vs. 4993 unsafe)

Analysis of Trips

A visualization of what constitutes safe vs. unsafe driving based on the acceleration, gyro, speed, and change in speed.

Feature Engineering

After analysing the data, I engineered new features to give the machine learning algorithm more signal to work with.

Features Engineered include:

Change in Bearing
Change in Speed (Acceleration/Deceleration)
Bucket Acceleration/Braking Values
Bucket Speed values
Magnitude of Acceleration/Gyro
Change in Magnitude
Total Distance Travelled in km
No. of Danger Events Per Distance Travelled in km (Include Acceleration/Braking/Speeding Events)
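A few of the features listed above can be sketched with pandas. This is only an illustration, not the notebook's actual code: the column names (`BookingID`, `second`, `Speed`, `Bearing`, `acceleration_*`, `gyro_*`) and the helper `add_features` are assumptions, and the bucketing, distance, and danger-event features are left out.

```python
import numpy as np
import pandas as pd

# Assumed column names; adjust them to match the real dataset.
ACC_COLS = ["acceleration_x", "acceleration_y", "acceleration_z"]
GYRO_COLS = ["gyro_x", "gyro_y", "gyro_z"]


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add per-reading engineered features."""
    out = df.sort_values(["BookingID", "second"]).copy()

    # Magnitude of the 3-axis accelerometer and gyroscope vectors.
    out["acc_mag"] = np.sqrt((out[ACC_COLS] ** 2).sum(axis=1))
    out["gyro_mag"] = np.sqrt((out[GYRO_COLS] ** 2).sum(axis=1))

    # Differences between consecutive readings within the same booking.
    by_trip = out.groupby("BookingID")
    out["speed_change"] = by_trip["Speed"].diff()
    out["acc_mag_change"] = by_trip["acc_mag"].diff()

    # Wrap the bearing difference into [-180, 180) so 350 -> 10 counts as +20.
    out["bearing_change"] = (by_trip["Bearing"].diff() + 180) % 360 - 180
    return out


sample = pd.DataFrame({
    "BookingID": [1, 1],
    "second": [0, 1],
    "Speed": [0.0, 5.0],
    "Bearing": [350.0, 10.0],
    "acceleration_x": [3.0, 0.0],
    "acceleration_y": [4.0, 0.0],
    "acceleration_z": [0.0, 1.0],
    "gyro_x": [0.0, 0.0],
    "gyro_y": [0.0, 2.0],
    "gyro_z": [0.0, 0.0],
})
features = add_features(sample)
```

The first reading of each booking has no predecessor, so its change features are NaN.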

Changes made to original features include:

Convert Speed from m/s to km/h
Convert Gyro from rad/s to degree/s
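The two unit conversions might look like the sketch below. The column names and the helper `convert_units` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def convert_units(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: convert speed to km/h and gyro to degree/s."""
    out = df.copy()
    # 1 m/s = 3.6 km/h
    out["Speed"] = out["Speed"] * 3.6
    # Radians to degrees for each gyroscope axis (assumed column names).
    gyro_cols = ["gyro_x", "gyro_y", "gyro_z"]
    out[gyro_cols] = np.degrees(out[gyro_cols])
    return out


raw = pd.DataFrame({
    "Speed": [10.0],
    "gyro_x": [np.pi],
    "gyro_y": [0.0],
    "gyro_z": [np.pi / 2],
})
converted = convert_units(raw)
```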

Data Preparation

Next, I prepared the data for the machine learning algorithm by one-hot encoding the categorical features and aggregating the features by BookingID.

One-Hot Encoding Categorical Features

One-hot encode the categorical features to transform them into a format suitable for the machine learning algorithm.
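As a minimal sketch, one-hot encoding can be done with `pandas.get_dummies`. The `speed_bucket` column and its labels are hypothetical stand-ins for the bucketed features, not the notebook's actual names.

```python
import pandas as pd

# Hypothetical bucketed feature; the real bucket labels may differ.
trips = pd.DataFrame({
    "Speed": [12.0, 65.0, 110.0],
    "speed_bucket": ["low", "medium", "high"],
})

# Each category becomes its own 0/1 indicator column.
encoded = pd.get_dummies(trips, columns=["speed_bucket"], dtype=int)
```

Each row ends up with exactly one indicator set to 1 among the `speed_bucket_*` columns.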

Data Aggregation

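Aggregate the features by BookingID so that each booking is summarised in a single row. Applying more than one aggregate function to a feature also expands the number of features.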
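One way to do this is a `groupby` with several aggregate functions per column. The column names and the choice of mean, max, and standard deviation are assumptions for illustration.

```python
import pandas as pd

readings = pd.DataFrame({
    "BookingID": [1, 1, 2],
    "Speed": [10.0, 20.0, 30.0],
    "acc_mag": [9.0, 11.0, 10.0],
})

# Three aggregates per feature turn 2 columns into 6.
aggregated = readings.groupby("BookingID")[["Speed", "acc_mag"]].agg(["mean", "max", "std"])

# Flatten the two-level column index, e.g. ("Speed", "mean") -> "Speed_mean".
aggregated.columns = ["_".join(col) for col in aggregated.columns]
```

A booking with a single reading has an undefined sample standard deviation, so `aggregated` contains NaN for it. Gaps like this are one source of missing values that need imputing before training.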

Imputation & Train-Test-Split

Before moving on to machine learning, I imputed the missing values and split the dataset into training and test sets.
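A sketch of this step with scikit-learn, on synthetic data. The median strategy, the 80/20 split, and the stratified split are assumptions. The sketch also splits first and fits the imputer on the training portion only, a common way to keep test-set statistics out of training; the notebook may order these steps differently.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the aggregated feature matrix, with ~10% missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.1] = np.nan
y = (rng.random(200) < 0.25).astype(int)

# Stratify so both splits keep roughly the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Learn the fill values from the training set, then reuse them on the test set.
imputer = SimpleImputer(strategy="median")
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)
```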

Machine Learning

In this challenge, I opted to use an XGBoost classifier (a gradient-boosted decision tree algorithm) to handle the non-linearity in the data.

Using Stratified K-Fold Cross Validation

To check that the model performs consistently across the entire dataset, I used stratified k-fold cross-validation and computed the model's mean score across the folds.
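The idea can be sketched as follows, on synthetic imbalanced data. To keep the example dependent on scikit-learn only, it swaps in `GradientBoostingClassifier` for XGBoost; `xgboost.XGBClassifier` follows the same estimator interface and could be passed in its place. The five folds and the ROC AUC metric are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic data with roughly a 75/25 class split.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.75, 0.25], random_state=0
)

# Stand-in for the XGBoost classifier (see the note above).
model = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Each fold preserves the overall class ratio.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
mean_score = scores.mean()
```

A large spread among the values in `scores` would suggest the model is sensitive to which bookings it is trained on.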

Evaluation: Classification Report

The model performs poorly at identifying dangerous driving behaviors. Several factors could have contributed to this:
- Imbalanced Class Distribution
- Insufficient Features to distinguish dangerous driving from safe driving
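To illustrate why the class imbalance matters, the sketch below scores a trivial baseline that labels every booking as safe, using the class counts reported earlier (15007 safe vs. 4993 unsafe). The baseline is hypothetical and is not the model from the notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 0 = safe, 1 = dangerous, using the counts from the class distribution.
y_true = np.array([0] * 15007 + [1] * 4993)

# Hypothetical baseline: predict "safe" for every booking.
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)
dangerous_recall = recall_score(y_true, y_pred, pos_label=1)
```

This baseline reaches about 75% accuracy while detecting none of the dangerous bookings. Per-class recall in the classification report is therefore more informative than overall accuracy.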

Feature Importance

Based on the feature importance chart, speed and gyro magnitude seem to be the driving factors in determining safe vs. unsafe driving.
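A ranking like the one behind the chart can be produced from a fitted tree ensemble's `feature_importances_` attribute. The sketch uses synthetic data with neutral, hypothetical feature names and scikit-learn's `GradientBoostingClassifier` as a stand-in; `xgboost.XGBClassifier` exposes the same attribute.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data in which only feature_a determines the label.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(300, 3)),
    columns=["feature_a", "feature_b", "feature_c"],
)
y = (X["feature_a"] > 0).astype(int)

model = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

# Pair each importance with its column name and sort from most to least important.
importance = pd.Series(model.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
```

Importances of this kind describe what the model relied on, not necessarily what causes dangerous driving.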
