This project covers basic data exploration, visualization, data manipulation, and model training. The task at hand is to identify the types of customers who may be involved in fraudulent transactions at the trust bank. For this, the bank provided a dataset containing details for 500,000+ customers. A dataset of this size is helpful for training and testing the model with a variety of feature sets.
Data Source: https://drive.google.com/file/d/1Aiqk9WUIidF_paBPInPwQJy5icOAYPqM/view?usp=share_link Use the link above to access the dataset. The first sheet of this csv file describes the dataset features in detail. In summary, the dataset has 590,000+ records and a total of 137 features, including features with missing or misleading data.
In this project, missing data is filtered out first. The data is then explored to identify class imbalance, and a second level of filtering is applied. Once the data has been trimmed into a clean dataset, it is visualized to show which features are responsible for identifying potentially fraudulent transactions.
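The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration using pandas, not the project's actual code: the column names (`amount`, `mostly_empty`, `is_fraud`), the 50% missing-value threshold, and the use of random undersampling to handle imbalance are all assumptions made for the example, and the toy frame stands in for the real dataset.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Keep only columns whose fraction of missing values is <= max_missing."""
    missing_fraction = df.isna().mean()
    return df.loc[:, missing_fraction <= max_missing]


def class_balance(df: pd.DataFrame, label: str) -> pd.Series:
    """Return the share of rows belonging to each class of the label column."""
    return df[label].value_counts(normalize=True)


def undersample_majority(df: pd.DataFrame, label: str, seed: int = 0) -> pd.DataFrame:
    """Randomly downsample every class to the size of the smallest class."""
    smallest = df[label].value_counts().min()
    parts = [
        group.sample(n=smallest, random_state=seed)
        for _, group in df.groupby(label)
    ]
    return pd.concat(parts).reset_index(drop=True)


# Tiny made-up frame standing in for the real dataset (hypothetical columns).
toy = pd.DataFrame({
    "amount": [10.0, 25.0, 40.0, 15.0, 300.0, 12.0],
    "mostly_empty": [None, None, None, None, 1.0, None],
    "is_fraud": [0, 0, 0, 0, 1, 0],
})

cleaned = drop_sparse_columns(toy, max_missing=0.5)   # step 1: filter missing data
balance_before = class_balance(cleaned, "is_fraud")   # step 2: inspect imbalance
balanced = undersample_majority(cleaned, "is_fraud")  # step 3: second filtering pass
```

Undersampling is only one way to handle an imbalanced label. It discards majority-class rows, so on the real data a different strategy (for example class weights during training) may be preferable.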
This project gives any AI or Data Science enthusiast an opportunity to sharpen their knowledge of data augmentation and model training techniques, which in turn will help in solving related real-world problems.