This project develops a Cyber Intrusion Detection System (IDS) using machine learning algorithms, specifically Random Forest and Gradient Boosting. The IDS aims to detect network intrusions with high accuracy and a low false positive rate. The project uses the KDD Cup 1999 dataset for training and testing the models.
The repository is organized into the following files and directories:
- data_preprocessing.py: Script for preprocessing the KDD Cup 1999 dataset.
- model_training.py: Script for training machine learning models.
- model_evaluation.py: Script for evaluating the performance of trained models.
- data_preprocessing.ipynb: Jupyter notebook for data preprocessing.
- model_training.ipynb: Jupyter notebook for model training.
- model_evaluation.ipynb: Jupyter notebook for model evaluation.
- requirements.txt: List of Python dependencies.
- README.md: Project documentation.
To set up the project on your local machine, follow these steps:
- Clone the repository:
  git clone https://github.com/arunachaleswaranms/Cyber-Intrusion-Detection-System.git
  cd Cyber-Intrusion-Detection-System
- Create a virtual environment and activate it:
  python3 -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install the required dependencies:
  pip install -r requirements.txt
The KDD Cup 1999 dataset is used for training and testing the models. You can download the dataset from the following links and place the CSV files in the project directory:
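The raw KDD Cup 1999 files have no header row, 41 feature columns, and the class label in the last column with a trailing period (for example `normal.` or `smurf.`). The sketch below shows one way to read such a file and derive both a binary attack flag and the commonly used attack categories (DoS, Probe, R2L, U2R). It assumes the CSV files placed in the project directory keep that raw layout; the functions `load_kdd` and `label_category` are hypothetical names for illustration, not code taken from this repository.

```python
import csv
import io

# Commonly used grouping of the attack types found in the KDD Cup 1999
# training data. Assumption: the CSVs keep the raw label strings.
CATEGORY = {
    "dos": {"back", "land", "neptune", "pod", "smurf", "teardrop"},
    "probe": {"ipsweep", "nmap", "portsweep", "satan"},
    "r2l": {"ftp_write", "guess_passwd", "imap", "multihop", "phf", "spy",
            "warezclient", "warezmaster"},
    "u2r": {"buffer_overflow", "loadmodule", "perl", "rootkit"},
}


def label_category(raw_label):
    """Map a raw label such as 'smurf.' to normal/dos/probe/r2l/u2r."""
    name = raw_label.strip().rstrip(".")
    if name == "normal":
        return "normal"
    for category, attacks in CATEGORY.items():
        if name in attacks:
            return category
    return "unknown"  # attack types outside the grouping above


def load_kdd(handle):
    """Yield (features, category, is_attack) from a headerless KDD CSV."""
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        features, raw_label = row[:-1], row[-1]
        category = label_category(raw_label)
        # Anything that is not "normal" counts as an attack (1).
        yield features, category, int(category != "normal")


if __name__ == "__main__":
    # In-memory stand-in for a real file; rows are truncated for brevity.
    sample = io.StringIO("0,tcp,http,SF,181,normal.\n0,icmp,ecr_i,SF,1032,smurf.\n")
    for features, category, is_attack in load_kdd(sample):
        print(features[:4], category, is_attack)
```

For a real file, pass an open handle instead, e.g. `with open(path, newline="") as f: rows = list(load_kdd(f))`.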
To preprocess the dataset, run the data_preprocessing.py script:
python data_preprocessing.py
Alternatively, you can explore the preprocessing steps using the data_preprocessing.ipynb notebook.
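The normalization and transformation steps can be illustrated with a dependency-free sketch: one-hot encode the three categorical KDD features and min-max scale every numeric column to [0, 1]. It assumes the standard KDD column order, where columns 1 to 3 are `protocol_type`, `service` and `flag`; `fit_preprocessor` and `transform` are hypothetical helpers, and the actual data_preprocessing.py may use different encodings.

```python
# Assumption: standard KDD layout, columns 1-3 are protocol_type, service, flag.
CATEGORICAL = (1, 2, 3)


def fit_preprocessor(rows):
    """Learn one-hot vocabularies and min/max ranges from training rows only."""
    vocab = {i: sorted({row[i] for row in rows}) for i in CATEGORICAL}
    ranges = {}
    for i in range(len(rows[0])):
        if i in vocab:
            continue
        values = [float(row[i]) for row in rows]
        ranges[i] = (min(values), max(values))
    return vocab, ranges


def transform(row, vocab, ranges):
    """Turn one raw feature row into a purely numeric vector."""
    vector = []
    for i, value in enumerate(row):
        if i in vocab:
            # One-hot block; categories unseen during fitting become all zeros.
            vector.extend(1.0 if value == v else 0.0 for v in vocab[i])
        else:
            lo, hi = ranges[i]
            # Min-max scale; constant columns map to 0, out-of-range values are clipped.
            scaled = (float(value) - lo) / (hi - lo) if hi > lo else 0.0
            vector.append(min(max(scaled, 0.0), 1.0))
    return vector
```

Fitting on the training rows and reusing the same `vocab` and `ranges` for the test rows keeps information from the test set out of the preprocessing.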
To train the machine learning models, run the model_training.py script:
python model_training.py
Alternatively, you can explore the training steps using the model_training.ipynb notebook.
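As a rough sketch of the training step, the two model families named in this README can be fitted with scikit-learn. The hyperparameters, the `train_models` helper, and the synthetic data from `make_classification` (a stand-in for the preprocessed KDD features) are all assumptions for illustration; model_training.py may configure the models differently.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_models(X_train, y_train, random_state=42):
    """Fit Random Forest and Gradient Boosting models and return them by name."""
    models = {
        # Hyperparameters are illustrative defaults, not tuned values.
        "random_forest": RandomForestClassifier(
            n_estimators=100, n_jobs=-1, random_state=random_state),
        "gradient_boosting": GradientBoostingClassifier(random_state=random_state),
    }
    for model in models.values():
        model.fit(X_train, y_train)
    return models


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in for preprocessed KDD features.
    X, y = make_classification(n_samples=500, n_features=20, weights=[0.8], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)
    for name, model in train_models(X_train, y_train).items():
        print(name, model.score(X_test, y_test))
```

Stratifying the split keeps the attack/normal ratio the same in both partitions, which matters because the classes are imbalanced.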
To evaluate the trained models, run the model_evaluation.py script:
python model_evaluation.py
Alternatively, you can explore the evaluation steps using the model_evaluation.ipynb notebook.
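Because the goal is high accuracy with a low false positive rate, it helps to see how those figures come from the confusion-matrix counts. The sketch below assumes binary labels where 1 means attack and 0 means normal traffic; `detection_metrics` is a hypothetical helper, not a function from model_evaluation.py.

```python
def detection_metrics(y_true, y_pred):
    """Binary IDS metrics, assuming 1 = attack and 0 = normal traffic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Detection rate is recall on the attack class.
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        # Share of normal connections wrongly flagged as attacks.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }
```

With scikit-learn, `confusion_matrix(y_true, y_pred).ravel()` returns the same counts in the order `tn, fp, fn, tp` for 0/1 labels.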
- Data Preprocessing: Clean, normalize, and transform the raw data from the KDD Cup 1999 dataset.
- Model Training: Train machine learning models (Random Forest and Gradient Boosting) using the preprocessed data.
- Model Evaluation: Evaluate the trained models on the test dataset and generate evaluation reports.
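One way to keep these three stages consistent is to bundle preprocessing and the classifier into a single scikit-learn `Pipeline`, so the encoder and scaler are fitted on training data only and reapplied unchanged at evaluation time. This is a sketch under assumptions: the column names follow the standard KDD feature names, `build_pipeline` is a hypothetical helper, and the repository's scripts may structure these stages differently.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Assumed names of the three categorical KDD features.
CATEGORICAL = ["protocol_type", "service", "flag"]


def build_pipeline(numeric_columns):
    """Chain one-hot encoding, min-max scaling and a Random Forest."""
    preprocess = ColumnTransformer([
        # Categories unseen during fit are encoded as all zeros instead of raising.
        ("categorical", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("numeric", MinMaxScaler(), numeric_columns),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
    ])


if __name__ == "__main__":
    # Tiny illustrative frame; real data would have all 41 KDD features.
    train = pd.DataFrame({
        "duration": [0, 1, 2, 3],
        "protocol_type": ["tcp", "udp", "tcp", "icmp"],
        "service": ["http", "dns", "http", "ecr_i"],
        "flag": ["SF", "SF", "REJ", "SF"],
        "src_bytes": [100, 50, 0, 1000],
    })
    pipe = build_pipeline(["duration", "src_bytes"]).fit(train, [0, 0, 1, 1])
    print(pipe.predict(train))
```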
The project successfully developed a Cyber IDS that detects network intrusions with high accuracy and low false positive rates. The machine learning algorithms, particularly Random Forest and Gradient Boosting, proved effective in identifying various types of cyber attacks.
Future work will focus on:
- Addressing identified areas for improvement.
- Expanding the scope of the IDS.
- Further optimizing the performance of the models.
Contributions are welcome! If you have any suggestions or improvements, please open an issue or submit a pull request.