
Repository for a master's thesis in Information Engineering Technology at Ghent University


rtalwar2/ML-for-NIDS


Exploring Explainable AI techniques for Detecting Contamination Features for improved Machine Learning-based Intrusion Detection

This repository contains the code and resources for my master's thesis on machine learning for network intrusion detection at Ghent University. The goal of this thesis is to develop a methodology that detects contaminating features in intrusion detection system (IDS) datasets using explainable artificial intelligence (XAI) techniques such as SHAP. Additionally, the inter-dataset generalization technique is employed to assess the impact of these features on the generalization ability of machine learning (ML) models.
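The idea behind inter-dataset generalization can be sketched with a toy example. Everything below is an assumption made for illustration: the flows are invented, the feature names `ttl` and `pkt_rate` are hypothetical, and a one-feature threshold classifier stands in for the ML models and SHAP analysis used in the thesis. The sketch only shows why a feature that looks perfect inside one dataset can fail on another.

```python
from statistics import mean

# Invented toy flows (NOT taken from the thesis datasets).
# "ttl" plays the role of a contaminating feature: in dataset A it reflects
# the testbed setup rather than the attack behaviour itself.
DATASET_A = [
    {"ttl": 254, "pkt_rate": 900, "label": 1},
    {"ttl": 254, "pkt_rate": 750, "label": 1},
    {"ttl": 254, "pkt_rate": 800, "label": 1},
    {"ttl": 64, "pkt_rate": 100, "label": 0},
    {"ttl": 64, "pkt_rate": 200, "label": 0},
    {"ttl": 64, "pkt_rate": 150, "label": 0},
]
DATASET_B = [
    {"ttl": 64, "pkt_rate": 850, "label": 1},
    {"ttl": 128, "pkt_rate": 800, "label": 1},
    {"ttl": 64, "pkt_rate": 950, "label": 1},
    {"ttl": 128, "pkt_rate": 150, "label": 0},
    {"ttl": 64, "pkt_rate": 250, "label": 0},
    {"ttl": 254, "pkt_rate": 120, "label": 0},
]


def fit_stump(rows, feature):
    """Fit a one-feature classifier: threshold at the midpoint of the class means."""
    attack_mean = mean(r[feature] for r in rows if r["label"] == 1)
    benign_mean = mean(r[feature] for r in rows if r["label"] == 0)
    threshold = (attack_mean + benign_mean) / 2
    return threshold, attack_mean > benign_mean


def accuracy(stump, rows, feature):
    """Fraction of rows whose label the stump predicts correctly."""
    threshold, attack_is_high = stump
    correct = 0
    for r in rows:
        predicted = int((r[feature] > threshold) == attack_is_high)
        correct += predicted == r["label"]
    return correct / len(rows)


def cross_dataset_scores(train, test, feature):
    """Train on one dataset; return (within-dataset, cross-dataset) accuracy."""
    stump = fit_stump(train, feature)
    return accuracy(stump, train, feature), accuracy(stump, test, feature)


ttl_scores = cross_dataset_scores(DATASET_A, DATASET_B, "ttl")
rate_scores = cross_dataset_scores(DATASET_A, DATASET_B, "pkt_rate")
print("ttl      (within, cross):", ttl_scores)
print("pkt_rate (within, cross):", rate_scores)
```

On this toy data both features separate dataset A perfectly, but only `pkt_rate` keeps its accuracy on dataset B. The `ttl` classifier gets just 2 of the 6 flows in dataset B right, which is the kind of gap that cross-dataset evaluation is meant to expose.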

Repository Structure

The repository is organized as follows:

  • contaminant_discovery: Contains the code for the training and analysis phase of our methodology to detect contaminants.
    • *-heatmaps: Contains the heatmaps generated in every training/analysis cycle.
  • contaminant_validation_phase: Includes the code for the validation phase of the methodology.
    • boxplots: Contains boxplots for comparing feature distribution.
  • generalization: Contains bash scripts and Python scripts to run the inter-dataset generalization testing.
    • results: Contains the results of the generalization experiment as tables and as heatmaps.
      • generalization_heatmap: Stores the heatmaps generated by running the inter-dataset generalization testing code.
      • bar_chart: Contains the grouped bar plot of the results.
  • ks-test: Contains the code for the KS-test and the analysis of the KS-test results.
    • ks_test_scatter: Includes scatterplots of the results of the KS-test code.
    • ks_test_violin: Contains violin plots of the results of the KS-test code.
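As a rough illustration of what a two-sample KS-test measures, the sketch below computes the KS statistic (the largest gap between two empirical cumulative distribution functions) in plain Python. It assumes the test is used to compare the distribution of a single feature between two sets of samples; the function name `ks_statistic` and the sample values are hypothetical and are not taken from the code in the ks-test folder. In practice `scipy.stats.ks_2samp` computes the same statistic and also returns a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The ECDFs are step functions, so the maximum gap occurs at a sample point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample_a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample_b that is <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical feature values, invented for illustration only.
identical = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
disjoint = ks_statistic([1, 2, 3], [10, 11, 12])
print(identical, shifted, disjoint)
```

A statistic near 0 means the two samples follow nearly the same distribution for that feature, while a value near 1 means they barely overlap.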

Getting Started

The following packages are needed to run the code in this repository:

  • numpy
  • pandas
  • fastai
  • seaborn
  • scikit-learn (imported as sklearn)
  • scipy
  • shap
  • matplotlib
  • plotly (for plotly.express)
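A quick way to check which of these packages are still missing from an environment is sketched below. The helper `missing_packages` and the `REQUIRED` mapping are hypothetical names that are not part of this repository. The mapping pairs each name as installed with pip with the module name used for importing, since the `sklearn` module is installed as `scikit-learn` and `plotly.express` ships with `plotly`.

```python
from importlib.util import find_spec

# pip package name -> top-level module name used in import statements
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "fastai": "fastai",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "shap": "shap",
    "matplotlib": "matplotlib",
    "plotly": "plotly",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return sorted(
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```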

Source Datasets

The datasets used are part of the NFV2 collection by the University of Queensland, which aims to standardize network-security datasets to enable interoperability and larger analyses. The cleaned versions of these datasets were used and are available on Kaggle.

The UNSW-NB15 dataset with metadata, used to measure the effectiveness of our methodology, is also available on Kaggle.

Acknowledgements

I would like to express my deepest gratitude to all people who helped me with guidance and advice for finishing this thesis.

First and foremost, I thank my supervisors Laurens D’hooge and Miel Verkerken for their invaluable guidance and support throughout the duration of this research. Their research and discoveries have played a pivotal role in shaping the direction and quality of this thesis. Their persistence in encouraging me to take full ownership of the research and pursue my own ideas also made this research into something I can really call my own.

I am also thankful to the members of my thesis committee, Prof. dr. Bruno Volckaert, dr. ir. Tim Wauters, and Prof. dr. ir. Filip De Turck, for providing me with the opportunity to conduct research in this field within the research group.

I would like to extend my appreciation to imec for providing the necessary computing resources that enabled the successful completion of this research.

I also thank NYCU for their willingness to collaborate. I am particularly grateful to Didik Sudyana and Fietyata Yudha. Although we were unable to pursue the collaboration I had initially envisioned because of data-quality challenges and time limitations, both of them were ready to assist me and provided valuable explanations of the CREMEv2 data.

A special thanks to all lecturers from the Information Engineering Technology Department at Ghent University for the high-quality education and these four interesting years. Without them, I would not be where I am right now. Their expertise, passion, and dedication have been instrumental in shaping my understanding of the subject matter and laying a strong foundation for my research.

My heartfelt thanks go to my family for giving me this opportunity to study and get this degree.

Additionally, I should not forget my friends, who have made the past four years simply fly by. It is through their companionship that I have created unforgettable memories throughout my journey that will forever be cherished.

Lastly, I express my gratitude to all the individuals who, directly or indirectly, have contributed to the completion of this thesis. Their contributions may not be explicitly mentioned, but their impact has been significant and deeply appreciated.

Thank you for your interest in my research project! 😊
