Credit-Card-Fraud-Classification

This notebook covers credit card fraud classification.

NOTE: The dataset is too large to be uploaded here, but you can get it from https://www.kaggle.com/mlg-ulb/creditcardfraud

It's split into 6 sections:

Data preparation and interpretation
Data preprocessing
Exploratory data analysis
Machine learning classification
Neural network classification
Conclusion

This README summarises the best-performing methods used in the notebook; the notebook itself covers more.

Data preparation and interpretation

The data is first found to be extremely imbalanced, with fraudulent transactions far outnumbered by legitimate ones:

Data preprocessing

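The data is balanced using SMOTE (Synthetic Minority Over-sampling Technique) so that both classes have an equal number of samples.

Balancing has a large effect on the correlation between the features and the class, as can be seen by comparing the first correlation plot with the last.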
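The core idea of SMOTE can be sketched in a few lines: each synthetic sample is placed at a random position on the line segment between a minority-class point and one of its nearest minority-class neighbours. The snippet below is a minimal plain-Python illustration of that idea, not the notebook's code; the function name `smote_sketch`, the toy points and the choice of `k` are all assumptions made for the example. In practice a library implementation would normally be used.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy data (made up for illustration): 8 legitimate points, 3 fraudulent points.
legit = [(float(i), float(i % 3)) for i in range(8)]
fraud = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate just enough synthetic fraud points to match the majority class.
new_fraud = smote_sketch(fraud, n_new=len(legit) - len(fraud), k=2)
balanced_fraud = fraud + new_fraud
print(len(legit), len(balanced_fraud))
```

Because every synthetic point is an interpolation, it always lies between existing minority samples; no values outside the observed fraud region are produced.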

Next, outliers are removed from the fraudulent data points to improve model accuracy.

Before:

After:
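The README does not say which rule is used to identify outliers. One common choice, assumed here purely for illustration, is the interquartile-range (IQR) rule: a value is dropped if it lies more than 1.5 × IQR below the first quartile or above the third. The column name `V_example` and the values are hypothetical.

```python
import statistics


def iqr_bounds(values, factor=1.5):
    """Return the (low, high) cut-offs of the IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def remove_outliers(rows, column, factor=1.5):
    """Keep only the rows whose `column` value lies inside the IQR bounds."""
    low, high = iqr_bounds([r[column] for r in rows], factor)
    return [r for r in rows if low <= r[column] <= high]


# Hypothetical fraud rows: six similar values and one extreme value.
fraud_rows = [{"V_example": v} for v in [-2.1, -1.8, -2.4, -1.9, -2.0, -2.2, -15.0]]
cleaned = remove_outliers(fraud_rows, "V_example")
print(len(fraud_rows), len(cleaned))
```

The `factor` argument controls how aggressive the filter is; a larger value removes fewer points.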

Exploratory data analysis

Features of the data are explored, starting with higher-order correlations. These reveal some quadratic relationships, which are tested with models later on.

Next, the data is dimensionally reduced and plotted to see how well fraud and non-fraud cases are differentiated. The plots show that, while the two classes are mostly separated, there is some overlap, which the SMOTE data magnifies.
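To illustrate why higher-order correlations are worth checking in this section, the sketch below compares the Pearson correlation of a target with a feature and with the square of that feature. The data is made up so that the relationship is purely quadratic; the helper `pearson` is written out by hand for the example and is not taken from the notebook.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up feature values symmetric about zero, with a purely quadratic response.
feature = [x / 2 for x in range(-10, 11)]
target = [x * x for x in feature]

linear_r = pearson(feature, target)                      # close to 0
quadratic_r = pearson([x * x for x in feature], target)  # close to 1
print(linear_r, quadratic_r)
```

A plain linear correlation would miss this relationship entirely, while the squared feature captures it.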

Machine learning classification

NOTE: Since the data is now equally distributed between classes, the standard accuracy metric is perfectly acceptable (this is not the case for imbalanced datasets).
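A small worked example shows why accuracy is misleading on imbalanced data but reasonable on balanced data. The class counts below are illustrative only and are not the dataset's real figures; the "classifier" simply labels every transaction as legitimate.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos


# Imbalanced case: 9,980 legitimate (0) and 20 fraudulent (1) transactions.
imbalanced = [0] * 9980 + [1] * 20
always_legit = [0] * len(imbalanced)
print(accuracy(imbalanced, always_legit))  # 0.998
print(recall(imbalanced, always_legit))    # 0.0

# Balanced case: the same useless classifier now scores only 50%.
balanced = [0] * 5000 + [1] * 5000
print(accuracy(balanced, [0] * len(balanced)))  # 0.5
```

On the imbalanced set the classifier looks excellent despite catching no fraud at all; once the classes are equal, accuracy exposes it.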

The data is modelled with several classifiers. The best of these, KNN, achieves 99.87% accuracy, and the graphed result shows that all fraudulent cases are correctly classified.

As the best predictor, the KNN model is then optimised, increasing the accuracy to 99.96%.
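The README does not state how the KNN model is optimised. A common approach, assumed here for illustration, is to try several values of `k` and keep the one with the best accuracy on held-out data. The sketch below uses a hand-written KNN on synthetic two-cluster data; the candidate values of `k`, the data and the helper names are all hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, point, k):
    """Predict the majority label among the k training points nearest to `point`."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], point))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def holdout_accuracy(train_X, train_y, val_X, val_y, k):
    """Accuracy of KNN with the given k on a held-out validation set."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y))
    return hits / len(val_y)


rng = random.Random(0)


def cluster(cx, cy, label, n):
    """Generate n labelled 2-D points scattered around (cx, cy)."""
    return [((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label) for _ in range(n)]


# Synthetic, balanced, well-separated data standing in for the real features.
data = cluster(0, 0, 0, 60) + cluster(6, 6, 1, 60)
rng.shuffle(data)
train, val = data[:90], data[90:]
train_X, train_y = zip(*train)
val_X, val_y = zip(*val)

# Odd values of k avoid tied votes in a two-class problem.
scores = {k: holdout_accuracy(train_X, train_y, val_X, val_y, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

A single hold-out split keeps the sketch short; cross-validation gives a more reliable estimate when choosing `k` on real data.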

Neural network classification

A neural network model is then created. Care was taken to make the model complex enough to distinguish the large and varied dataset produced, as I found underfitting easy to achieve. Unfortunately, even with a large model, which takes 45 minutes to train on my limited computing power, I was only able to reach a 99.7% accuracy score, producing the following confusion matrix:

This is somewhat disappointing given the extra work that went into the neural net. The confusion matrix also shows that the classifier misses some of the fraudulent cases.
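For readers unfamiliar with how missed fraud cases show up in a confusion matrix, the sketch below builds one from true and predicted labels and derives the recall on the fraud class. The labels are made up for the example and do not reproduce the notebook's results; rows are actual classes and columns are predicted classes.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (0 = legitimate, 1 = fraud)."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1  # a fraudulent case the classifier missed
        elif predicted == 1:
            fp += 1  # a legitimate case flagged as fraud
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


# Made-up labels: six legitimate and four fraudulent transactions.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

(tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
fraud_recall = tp / (tp + fn)
print([[tn, fp], [fn, tp]])  # [[5, 1], [1, 3]]
print(fraud_recall)          # 0.75
```

The bottom-left cell (false negatives) counts the fraudulent cases that slip through, which is the figure that matters most for a fraud detector.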

Conclusion

In conclusion, a well-optimised KNN approach proved to be the best predictor of credit card fraud on a large SMOTE-balanced dataset, reaching 99.96% accuracy against 99.7% for the neural network.

See the notebook for further details or to make use of the implementations.

MattH96/Credit-Card-Fraud-Classification

Folders and files

Latest commit

History

Repository files navigation

Credit-Card-Fraud-Classification

Data preparation and interpretation

Data preprocessing

Exploratory data analysis

Machine learning classification

Neural network classification

Conclusion

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages