The goal of this project is to investigate how well Machine Learning algorithms, combined with bagging and boosting ensemble methods, can classify URLs as either 'Legitimate' or 'Phishing', and to select the combination of features that gives the highest detection accuracy.
The dataset is taken from Kaggle: https://www.kaggle.com/akashkr/phishing-website-dataset. The legitimate websites come from the Yahoo and Starting Point directories (whitelists), and the phishing websites come from the PhishTank data archive (blacklist), where suspicious websites are submitted and verified.
The dataset consists of 11055 URLs and 32 features. There are 6157 legitimate and 4898 phishing websites.
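
The class counts above imply a majority-class baseline that any model has to beat. A minimal sketch of that arithmetic, using only the numbers stated here (the variable names are illustrative):

```python
# Class counts as stated in the dataset description.
total_urls = 11055
legitimate = 6157
phishing = 4898

# Sanity check: the two classes should account for every URL.
assert legitimate + phishing == total_urls

legit_share = legitimate / total_urls
phish_share = phishing / total_urls

# A classifier that always predicts the larger class scores this accuracy.
baseline_accuracy = max(legit_share, phish_share)

print(f"Legitimate: {legit_share:.1%}")
print(f"Phishing:   {phish_share:.1%}")
print(f"Majority-class baseline accuracy: {baseline_accuracy:.1%}")
```

The classes are only mildly imbalanced, so plain accuracy is a reasonable first metric, though precision and recall on the phishing class are still worth checking.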
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- scikit-learn
- Logistic Regression
- Random Forest Bagging algorithm
- XGBoost Classifier
- AdaBoost Classifier
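
As a rough illustration of how the algorithms listed above could be compared, here is a minimal sketch using cross-validated accuracy. It is not the project's actual pipeline: the data is a synthetic stand-in generated with `make_classification`, the hyperparameters are arbitrary, and XGBoost is included only if the `xgboost` package happens to be installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): 30 numeric features, binary 0/1 label.
# Replace X and y with the real feature matrix and target column.
X, y = make_classification(
    n_samples=500, n_features=30, n_informative=10, random_state=0
)

# Hyperparameters below are illustrative, not tuned values from the project.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

# XGBoost is a separate package; add it only when it is available.
try:
    from xgboost import XGBClassifier

    models["XGBoost"] = XGBClassifier(n_estimators=50, random_state=0)
except ImportError:
    pass

# Mean accuracy over 3 stratified folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, accuracy in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:<20} {accuracy:.3f}")
```

Cross-validation is used here instead of a single train/test split so that the ranking is less sensitive to one particular partition of the data.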