ML pipeline to detect dangerous driving on the road using telemetry data

jjlim7/Grab-AI-Challenge-Safety

Grab AI For S.E.A.

🏆 AI For S.E.A. is an online challenge held by Grab to seek out innovative technologies and talent across Southeast Asia. Participants select and tackle one problem statement (shown in the image below) using data science and AI technologies.

image

Problem Statement: How Do We Detect Dangerous Driving?

image

Project Objective

💡 My objective is to build an end-to-end machine learning pipeline based on telematics data to detect dangerous driving on the road.

Solution

Exploratory Data Analysis

I began with exploratory data analysis to understand the class distribution and what distinguishes safe driving from dangerous driving.

Sample Dataset

Full Dataset is available here

  • Below is an example of the dataset provided

image

Binary Class Distribution

  • A visualization of the class distribution of safe vs. unsafe driving (15007 safe vs. 4993 unsafe)

image
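To put the imbalance in numbers, the short calculation below uses only the two counts quoted in the bullet above. The variable names are illustrative and not taken from the project notebook.

```python
# Class counts as reported in the class-distribution bullet.
n_safe, n_unsafe = 15007, 4993

total = n_safe + n_unsafe
unsafe_share = n_unsafe / total          # fraction of samples labelled unsafe
safe_to_unsafe = n_safe / n_unsafe       # how many safe samples per unsafe one

print(f"total samples: {total}")
print(f"unsafe share: {unsafe_share:.1%}")
print(f"safe-to-unsafe ratio: {safe_to_unsafe:.2f}")
```

Roughly one sample in four is labelled unsafe, so a model that always predicts "safe" would already score about 75% accuracy.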

Analysis of Trips

  • A visualization of what constitutes safe vs. unsafe driving based on the acceleration, gyro, speed, and change in speed.

image

Feature Engineering

After analysing the data, I engineered new features to give the machine learning algorithm more signal to work with.

Features Engineered include:

  • Change in Bearing
  • Change in Speed (Acceleration/Deceleration)
  • Bucket Acceleration/Braking Values
  • Bucket Speed values
  • Magnitude of Acceleration/Gyro
  • Change in Magnitude
  • Total Distance Travelled in km
  • No. of Danger Events per km Travelled (Includes Acceleration/Braking/Speeding Events)

Changes made to original features include:

  • Convert Speed from m/s to km/h
  • Convert Gyro from rad/s to degrees/s

image
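A minimal sketch of a few of the features listed above (unit conversions, magnitudes, change in speed, change in bearing) is given below. It uses plain Python on a list of per-second records for a single trip. The record keys (`speed`, `bearing`, `acc_x`, `gyro_x`, and so on) and the function name `engineer_features` are assumptions made for illustration; the actual column names and implementation in the notebook may differ, and bucketing, distance, and danger-event counts are not covered here.

```python
import math

def engineer_features(records):
    """Derive per-record features for ONE trip.

    `records` is assumed to be a list of dicts sorted by time, with
    speed in m/s, bearing in degrees, and gyro readings in rad/s.
    """
    engineered = []
    prev_speed_kmh = None
    prev_bearing = None
    for rec in records:
        speed_kmh = rec["speed"] * 3.6  # m/s -> km/h
        gyro_deg = [math.degrees(rec[axis]) for axis in ("gyro_x", "gyro_y", "gyro_z")]
        acc_magnitude = math.sqrt(rec["acc_x"] ** 2 + rec["acc_y"] ** 2 + rec["acc_z"] ** 2)
        gyro_magnitude = math.sqrt(sum(g * g for g in gyro_deg))
        if prev_speed_kmh is None:
            # First record of the trip has no predecessor.
            delta_speed = 0.0
            delta_bearing = 0.0
        else:
            delta_speed = speed_kmh - prev_speed_kmh
            # Wrap the difference into [-180, 180) so 350 -> 10 counts as +20.
            delta_bearing = (rec["bearing"] - prev_bearing + 180.0) % 360.0 - 180.0
        engineered.append({
            "speed_kmh": speed_kmh,
            "delta_speed_kmh": delta_speed,
            "delta_bearing_deg": delta_bearing,
            "acc_magnitude": acc_magnitude,
            "gyro_magnitude_deg_s": gyro_magnitude,
        })
        prev_speed_kmh, prev_bearing = speed_kmh, rec["bearing"]
    return engineered

# Toy two-second trip (made-up values, not from the dataset).
trip = [
    {"speed": 10.0, "bearing": 350.0, "acc_x": 3.0, "acc_y": 4.0, "acc_z": 0.0,
     "gyro_x": math.pi, "gyro_y": 0.0, "gyro_z": 0.0},
    {"speed": 12.0, "bearing": 10.0, "acc_x": 0.0, "acc_y": 0.0, "acc_z": 1.0,
     "gyro_x": 0.0, "gyro_y": 0.0, "gyro_z": 0.0},
]
features = engineer_features(trip)
```

The bearing difference is wrapped because bearing is circular: without the wrap, a small turn across north would look like a change of several hundred degrees.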

Data Preparation

Next, I prepared the data for the machine learning algorithm by one-hot encoding the categorical features and aggregating the features by BookingID.

One-Hot Encoding Categorical Features

  • One-hot encode the categorical features to transform them into a format the machine learning algorithm can consume

image
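The idea behind one-hot encoding can be sketched in a few lines of plain Python. The `one_hot` helper and the `speed_bucket` column are hypothetical names chosen for illustration; on a pandas DataFrame, `pd.get_dummies` performs the same kind of transformation.

```python
def one_hot(rows, column):
    """Replace `column` in each row with one 0/1 indicator per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

# Toy rows with a made-up bucketed feature.
rows = [
    {"BookingID": 1, "speed_bucket": "low"},
    {"BookingID": 1, "speed_bucket": "high"},
    {"BookingID": 2, "speed_bucket": "low"},
]
encoded_rows = one_hot(rows, "speed_bucket")
```

Each row ends up with exactly one indicator set to 1, which is what makes the later per-trip aggregation of these columns meaningful (for example, summing them counts how often a trip fell into each bucket).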

Data Aggregation

  • Aggregate the per-record features for each BookingID, which also "expands the number of features" available to the model

image
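As a rough illustration of how aggregation turns many records per trip into one row per BookingID while multiplying the feature count, the sketch below computes three summary statistics for one feature. The choice of mean, max, and min is an assumption; the statistics actually used in the notebook are not stated in this README. With pandas, the equivalent would typically be a `groupby("BookingID").agg(...)` call.

```python
from collections import defaultdict
from statistics import mean

def aggregate_by_booking(rows, feature):
    """Collapse per-record values of `feature` into one summary per BookingID."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["BookingID"]].append(row[feature])
    return {
        booking_id: {
            f"{feature}_mean": mean(values),
            f"{feature}_max": max(values),
            f"{feature}_min": min(values),
        }
        for booking_id, values in grouped.items()
    }

# Toy records (made-up values): one input feature becomes three per trip.
records = [
    {"BookingID": 1, "speed_kmh": 10.0},
    {"BookingID": 1, "speed_kmh": 20.0},
    {"BookingID": 1, "speed_kmh": 30.0},
    {"BookingID": 2, "speed_kmh": 5.0},
    {"BookingID": 2, "speed_kmh": 15.0},
]
trip_level = aggregate_by_booking(records, "speed_kmh")
```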

Imputation & Train-Test-Split

  • Before moving on to machine learning, I imputed the missing values and split the dataset into training and test sets.

image
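The sketch below shows one possible way to combine the two steps in plain Python: split first, then fill missing values using column means computed on the training rows only, so that no information from the test rows leaks into training. This ordering, the mean strategy, the 80/20 split, and all function names are assumptions for illustration; the notebook may use a different order or strategy (for example scikit-learn's `SimpleImputer` and `train_test_split`).

```python
import random

def split_rows(X, y, test_fraction=0.2, seed=42):
    """Shuffle indices reproducibly and split rows into train and test parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, y_train, X_test, y_test

def column_means(X):
    """Mean of each column, ignoring missing values (None)."""
    means = []
    for j in range(len(X[0])):
        observed = [row[j] for row in X if row[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means

def impute(X, means):
    """Replace each None with the supplied mean for its column."""
    return [[means[j] if value is None else value for j, value in enumerate(row)]
            for row in X]

# Toy feature matrix with gaps (made-up values).
X = [[1.0, None], [3.0, 4.0], [None, 6.0], [5.0, 8.0], [7.0, 10.0]]
y = [0, 1, 0, 1, 0]

X_train, y_train, X_test, y_test = split_rows(X, y)
train_means = column_means(X_train)      # statistics come from training rows only
X_train = impute(X_train, train_means)
X_test = impute(X_test, train_means)     # reuse the training means on the test rows
```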

Machine Learning

For this challenge, I opted to use the XGBoost classifier (a gradient-boosted decision tree algorithm) to handle the non-linearity in the data.

Using Stratified K-Fold Cross Validation

  • To check that the model performs consistently across the entire dataset, I used stratified k-fold cross-validation and computed the model's mean score across the folds.

image
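To illustrate what "stratified" means here, the sketch below generates k train/test index splits in which every fold keeps roughly the same safe-to-unsafe proportion as the full dataset. It is a simplified, plain-Python stand-in written for this README, not the notebook's code; in practice scikit-learn's `StratifiedKFold` is the usual tool, and the model would be fitted and scored once per fold before averaging.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=42):
    """Yield (train_indices, test_indices) pairs with per-class balance preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    rng = random.Random(seed)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(index for j, fold in enumerate(folds) if j != i
                           for index in fold)
        yield train_idx, test_idx

# Toy labels with a 3:1 imbalance, similar in spirit to the dataset.
labels = [0] * 15 + [1] * 5
splits = list(stratified_k_fold(labels, k=5))
```

With these toy labels, every test fold contains three safe samples and one unsafe sample, so each fold's score reflects both classes.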

Evaluation: Classification Report

  • The model performs poorly at identifying dangerous driving behaviors. Several factors could have contributed to this:
    • Imbalanced Class Distribution
    • Insufficient Features to distinguish dangerous driving from safe driving

image
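The per-class numbers in a classification report can be reproduced by hand, which helps show why accuracy alone hides weak detection of the minority class. The labels and predictions below are made-up toy values (1 = dangerous, 0 = safe), not results from this project, and the function name is illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: 6 safe and 4 dangerous trips, mostly predicted as safe.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive=1)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

In this toy case accuracy is 0.60 while recall for the dangerous class is only 0.25, the same pattern an imbalanced dataset tends to produce. One commonly used mitigation with XGBoost is the `scale_pos_weight` parameter, often set near the negative-to-positive ratio; whether it would help here is untested.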

Feature Importance

  • Based on the feature importance chart, speed and gyro magnitude appear to be the most influential features in separating safe from unsafe driving.

image
