Analyzing a huge amount of malware is a major burden for security analysts. Malware developers have been highly successful in evading signature-based detection techniques. Most prevailing static analysis techniques rely on a tool that parses the executable and extracts features or signatures. Most dynamic analysis techniques require the binary file to be run in a sandboxed environment so that its behaviour can be examined. Malware can easily thwart this by detecting that it is running inside a virtual environment and hiding its malicious activities. Hence, there is a need to explore new approaches that overcome the limitations of static and dynamic analysis, such as high time cost, heavy resource consumption and poor scalability.
We propose a method for visualizing and classifying malware using image processing techniques. Malware binaries are visualized as gray-scale images. This is based on the observation that, for many malware families, images belonging to the same family look very similar in layout and texture. Classification works directly on this image representation, so our analysis never has to parse the executable or run it in a sandbox. This avoids the main problems faced by standard static and dynamic analyses.
For training and evaluating our proposed model, we used the Malimg Dataset, which contains 9349 malware images belonging to 25 families (classes). Our goal is therefore to perform multi-class classification of malware.
Dataset link: https://drive.google.com/drive/folders/1CnFx26NfWfQchIU85wRNfHjqfk7Up6hl?usp=sharing
Each malware sample belongs to one of the following classes:
- Adialer.C
- Agent.FYI
- Allaple.A
- Allaple.L
- Alueron.gen!J
- Autorun.K
- C2LOP.P
- C2LOP.gen!g
- Dialplatform.B
- Dontovo.A
- Fakerean
- Instantaccess
- Lolyda.AA1
- Lolyda.AA2
- Lolyda.AA3
- Lolyda.AT
- Malex.gen!J
- Obfuscator.AD
- Rbot!gen
- Skintrim.N
- Swizzor.gen!E
- Swizzor.gen!I
- VB.AT
- Wintrim.BX
- Yuner.A
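Since the task is 25-way classification, each family name has to be mapped to an integer label. For a softmax output layer, that label is usually turned into a one-hot vector. A minimal sketch follows. The alphabetical ordering and the names `FAMILIES`, `CLASS_TO_INDEX` and `one_hot` are illustrative assumptions, not taken from the project's code.

```python
from typing import List

# Family names as listed above. Sorting them (by code point) is an assumption that
# mirrors how many folder-per-class loaders assign label indices.
FAMILIES = sorted([
    "Adialer.C", "Agent.FYI", "Allaple.A", "Allaple.L", "Alueron.gen!J",
    "Autorun.K", "C2LOP.P", "C2LOP.gen!g", "Dialplatform.B", "Dontovo.A",
    "Fakerean", "Instantaccess", "Lolyda.AA1", "Lolyda.AA2", "Lolyda.AA3",
    "Lolyda.AT", "Malex.gen!J", "Obfuscator.AD", "Rbot!gen", "Skintrim.N",
    "Swizzor.gen!E", "Swizzor.gen!I", "VB.AT", "Wintrim.BX", "Yuner.A",
])

# Hypothetical lookup table: family name -> integer label
CLASS_TO_INDEX = {name: i for i, name in enumerate(FAMILIES)}
NUM_CLASSES = len(FAMILIES)


def one_hot(family: str) -> List[int]:
    """Return a one-hot target vector for a family name (hypothetical helper)."""
    vec = [0] * NUM_CLASSES
    vec[CLASS_TO_INDEX[family]] = 1
    return vec
```

With one-hot targets like these, the final softmax layer has `NUM_CLASSES` outputs, one per family.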
To convert the binary files into gray-scale images, we use the hexadecimal representation of each file's binary content. Every byte (two hexadecimal digits) is read as an 8-bit value from 0 to 255 and becomes the intensity of one pixel. The resulting image is then saved as a PNG. For example, the 0ACDbR5M3ZhBJajygTuf.bytes binary file is converted into a PNG image in this way.
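The byte-to-pixel conversion can be sketched in plain Python. This is a minimal illustration under two assumptions:

- `.bytes` files such as 0ACDbR5M3ZhBJajygTuf.bytes have an address column followed by up to 16 hex bytes per line, with `??` marking unreadable bytes.
- The image has a fixed, hypothetical width.

The helper names `parse_bytes_dump`, `to_rows` and `write_gray_png` are hypothetical. The PNG is written with the standard library only, rather than with an imaging package.

```python
import struct
import zlib


def parse_bytes_dump(text: str) -> bytes:
    """Turn a .bytes hex dump into raw byte values.

    Assumes each line is an address followed by two-digit hex bytes;
    '??' tokens (unreadable bytes) are skipped.
    """
    out = bytearray()
    for line in text.splitlines():
        for tok in line.split()[1:]:  # token 0 is the address column
            if tok != "??":
                out.append(int(tok, 16))
    return bytes(out)


def to_rows(data: bytes, width: int) -> list:
    """Split the byte stream into rows of `width` pixels; an incomplete last row is dropped."""
    if len(data) < width:
        raise ValueError("not enough bytes for a single row")
    return [data[r * width:(r + 1) * width] for r in range(len(data) // width)]


def write_gray_png(rows: list, path: str) -> None:
    """Write rows of 8-bit gray-scale pixels to a PNG file."""
    height, width = len(rows), len(rows[0])

    def chunk(tag: bytes, payload: bytes) -> bytes:
        # PNG chunk: length, type, data, CRC-32 over type + data
        crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
        return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)

    # IHDR: width, height, bit depth 8, colour type 0 (gray), no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (None)
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    png = (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
           + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))
    with open(path, "wb") as f:
        f.write(png)
```

A typical call would be `write_gray_png(to_rows(parse_bytes_dump(text), 256), "out.png")`, where `text` is the content of a `.bytes` file. The width of 256 is only an example. Choosing the width based on the file size keeps the images of small and large files closer to a square shape.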
The CNN model consists of the following layers, which extract features and patterns from the images and classify each sample into a malware family:
- Convolutional Layer : 30 filters, (3 * 3) kernel size
- Max Pooling Layer : (2 * 2) pool size
- Convolutional Layer : 15 filters, (3 * 3) kernel size
- Max Pooling Layer : (2 * 2) pool size
- Dropout Layer : drops 25% of the neurons
- Flatten Layer
- Dense/Fully Connected Layer : 128 neurons, ReLU activation function
- Dropout Layer : drops 50% of the neurons
- Dense/Fully Connected Layer : 50 neurons, Softmax activation function
- Dense/Fully Connected Layer : num_class neurons, Softmax activation function
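To sanity-check the layer list, the output shape and number of trainable parameters of each layer can be traced by hand. The sketch below rests on several assumptions:

- The convolutions use 'valid' padding and stride 1.
- The pooling stride equals the pool size.
- The input is 64 * 64 * 3, the shape the model uses.
- num_class = 25.

Only the filter counts, kernel sizes, pool sizes and unit counts come from the list above. The function name `summarize` is hypothetical.

```python
def summarize(input_shape=(64, 64, 3), num_classes=25):
    """Return (layer name, output shape, trainable parameters) for the layers listed above."""
    h, w, c = input_shape
    rows = []

    def conv(name, filters, k=3):
        nonlocal h, w, c
        params = (k * k * c + 1) * filters  # k*k*c weights per filter plus one bias
        h, w, c = h - k + 1, w - k + 1, filters  # 'valid' padding, stride 1 (assumed)
        rows.append((name, (h, w, c), params))

    def pool(name, p=2):
        nonlocal h, w
        h, w = h // p, w // p  # stride equal to pool size (assumed)
        rows.append((name, (h, w, c), 0))

    conv("conv_30", 30)
    pool("maxpool_1")
    conv("conv_15", 15)
    pool("maxpool_2")
    rows.append(("dropout_0.25", (h, w, c), 0))  # dropout has no parameters
    n = h * w * c
    rows.append(("flatten", (n,), 0))
    for name, units in [("dense_128", 128), ("dropout_0.5", None),
                        ("dense_50", 50), ("output", num_classes)]:
        if units is None:
            rows.append((name, (n,), 0))
        else:
            rows.append((name, (units,), (n + 1) * units))  # weights + biases
            n = units
    return rows


for name, shape, params in summarize():
    print(f"{name:<14}{str(shape):<16}{params}")
```

Under these assumptions the model has 389,078 trainable parameters. Almost all of them (376,448) sit in the first dense layer, which connects the 2,940 flattened features to 128 neurons. With 'same' padding the feature maps, and therefore this count, would be larger.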
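The input has a shape of [64 * 64 * 3] : [width * height * depth], so every image is resized to 64 * 64 pixels. In our case, each malware image is loaded as a 3-channel RGB image, so its gray-scale values are repeated across the three channels.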
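Malware files differ in size, so their gray-scale images must be brought to the fixed input shape before training. Below is a minimal NumPy sketch under three assumptions: nearest-neighbour resizing, copying the single channel into three, and scaling pixel values to [0, 1]. These preprocessing choices and the function name `to_model_input` are illustrative and not necessarily what the project does. In practice an image library's resize function would usually be used instead.

```python
import numpy as np


def to_model_input(gray: np.ndarray, size: int = 64) -> np.ndarray:
    """Resize a 2-D uint8 gray-scale image to (size, size, 3) float32 values in [0, 1].

    Nearest-neighbour sampling and /255 scaling are assumptions for illustration.
    """
    h, w = gray.shape
    rows = np.arange(size) * h // size  # source row index for each output row
    cols = np.arange(size) * w // size  # source column index for each output column
    resized = gray[np.ix_(rows, cols)]
    rgb = np.stack([resized] * 3, axis=-1)  # same intensity in all three channels
    return rgb.astype(np.float32) / 255.0
```

Because all three channels are identical, the first convolution sees no extra information compared with a single-channel input. It simply matches the [64 * 64 * 3] shape the network expects.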
- Future work will focus on evaluating more advanced models, including CNN architectures such as Inception V3, VGG16, ResNet50 and hybrid models such as CNN-SVM, MLP-SVM and GRU-SVM.
- We also plan to convert the malware images into color (RGB) images before classification, with the aim of improving accuracy and precision.
- We also want to implement a web-based or GUI-based tool that converts malware binary files into images and then classifies them.