shiv0112/phishing_domain_detector

phishing_domain_detector

Project Description 📄

❄️ Predict whether a domain is legitimate or malicious (phishing).

Data:

Phishing Websites Dataset

https://data.mendeley.com/datasets/72ptz43s9v/1

The data consist of a collection of legitimate and phishing website instances. Each website is represented by a set of features, together with a label that indicates whether the website is legitimate or not. The data can serve as input for a machine learning process.

The dataset comes in two variants.

Full variant - dataset_full.csv
Short description of the full variant dataset:
Total number of instances: 88,647
Number of legitimate website instances (labeled as 0): 58,000
Number of phishing website instances (labeled as 1): 30,647
Total number of features: 111

Small variant - dataset_small.csv
Short description of the small variant dataset:
Total number of instances: 58,645
Number of legitimate website instances (labeled as 0): 27,998
Number of phishing website instances (labeled as 1): 30,647
Total number of features: 111
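
The class balance of the two variants follows directly from the counts above. The sketch below computes the phishing share for each variant and shows one way to count labels in a CSV file. It is a minimal illustration, not the code used in this project: the label column name `phishing`, the feature column `qty_dot_url`, and the helper names are assumptions, and an in-memory sample stands in for the real file.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_column="phishing"):
    """Count how many rows carry each label value in an open CSV file.

    The column name "phishing" is an assumption about the dataset header.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


def phishing_share(n_legitimate, n_phishing):
    """Fraction of instances labeled as phishing (label 1)."""
    return n_phishing / (n_legitimate + n_phishing)


# Counts quoted in the dataset description above.
full_share = phishing_share(58_000, 30_647)
small_share = phishing_share(27_998, 30_647)
print(f"full: {full_share:.3f}, small: {small_share:.3f}")

# Tiny in-memory stand-in for a real file such as dataset_small.csv.
sample = io.StringIO("qty_dot_url,phishing\n2,0\n5,1\n3,1\n")
label_counts = count_labels(sample)
print(label_counts)
```

The full variant is imbalanced (roughly 35% phishing), while the small variant is close to balanced (roughly 52% phishing), which matters when choosing evaluation metrics.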

I trained the model using a Random Forest classifier:

Selected features

[Image: selected features]

Metrics of the best model:

[Image: metrics of the best model]
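
For readers who want to reproduce this kind of evaluation, here is a minimal sketch of training a Random Forest and computing common classification metrics with scikit-learn. It assumes synthetic data from `make_classification` in place of the phishing dataset, and the hyperparameters and split ratio are illustrative choices, not the values used for the model in this repository.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the phishing features (0 = legitimate, 1 = phishing).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative hyperparameters, not the project's tuned values.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On an imbalanced dataset such as the full variant, precision and recall for the phishing class are usually more informative than accuracy alone.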

Grid Search Cross-validation on Random Forest:

[Image: grid search cross-validation results]
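
A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`. The parameter grid, the number of folds, and the synthetic data below are assumptions chosen to keep the example small and fast. The grid actually searched for this project is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic dataset so the search finishes quickly.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

# Hypothetical grid: every combination is trained and scored.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=3,                # 3-fold cross-validation per combination
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
```

With the default `refit=True`, `search.best_estimator_` is retrained on all of `X` and `y` using the best combination, so it can be used directly for prediction.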

The ROC Curve for Random Forest:

[Image: ROC curve for Random Forest]
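
To make clear what an ROC curve measures, the sketch below builds the curve by hand: it sweeps a decision threshold over predicted scores, records the false positive rate and true positive rate at each step, and integrates the area under the curve with the trapezoidal rule. The function names and the toy labels and scores are illustrative assumptions. In practice, `sklearn.metrics.roc_curve` and `roc_auc_score` do the same job.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over scores.

    Assumes y_true contains at least one 0 and at least one 1.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Sort by descending score so each step lowers the threshold.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        is_last_of_tie = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if is_last_of_tie:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area


# Toy example: 1 = phishing, 0 = legitimate; scores are predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
roc_auc = auc(fpr, tpr)
print(list(zip(fpr, tpr)))
print(f"AUC = {roc_auc:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a classifier that scores every phishing instance above every legitimate one.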

Demo Video:

[Video: demo]

Page of Website:

[Images: four screenshots of the website]

Data Input from user:

[Image: data input from user]
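
Values entered by a user typically arrive as strings and must be converted into a numeric row whose column order matches the one the model was trained on. The sketch below shows one way to do that. The feature names in `FEATURE_ORDER` and the helper `build_feature_vector` are hypothetical and do not describe how this project's website handles input.

```python
# Hypothetical feature names; the real list depends on the features
# selected for the trained model.
FEATURE_ORDER = ["length_url", "qty_dot_url", "qty_hyphen_url"]


def build_feature_vector(form_data, feature_order=FEATURE_ORDER):
    """Convert raw form values into a numeric row in model column order."""
    missing = [name for name in feature_order if name not in form_data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    row = []
    for name in feature_order:
        try:
            row.append(float(form_data[name]))
        except (TypeError, ValueError):
            raise ValueError(f"field {name!r} must be numeric") from None
    return row


# Form fields can arrive in any order; the output follows FEATURE_ORDER.
submitted = {"qty_dot_url": "3", "length_url": "42", "qty_hyphen_url": "0"}
vector = build_feature_vector(submitted)
print(vector)
```

The resulting list can be wrapped in an outer list and passed to a fitted classifier's `predict` method, which expects a two-dimensional input with one row per sample.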

Authors
Shivansh Srivastava: [email protected]
Ashish Diwakar: [email protected]