This repository provides API and scripts for Android malware detection.

TamerPlatform/android-malware-detection

Android Malware Detection

My research is about Android malware detection. This repository provides APIs and scripts for static analysis. Most of the code is written in Python for rapid development.

Overview

The workflow of static analysis is the same as in most machine learning applications: acquire a dataset, extract features, and classify the samples.

Dataset Acquisition

There are several ways to get APK files. You can download them either from Google Play or from third-party app stores like Wandoujia. Downloading APK files from Google Play may require extra tools such as APKs Downloader. Note that the daily quota of a Google account is about 300 to 500 APK files, so collecting ten thousand apps with a single account can take over a month.
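
As a rough check on that estimate, the sketch below computes how many days a single account would need at each end of the quoted quota range. The function name is only illustrative, and the quota figures are the approximate ones given above.

```python
def days_to_collect(target_apps, daily_quota):
    """Whole days needed to download target_apps at daily_quota files per day."""
    return -(-target_apps // daily_quota)  # ceiling division on integers


# Both ends of the approximate quota range mentioned above.
for quota in (300, 500):
    print(quota, days_to_collect(10000, quota))
```

At 300 files per day this gives 34 days, and at 500 files per day it gives 20 days, so the "over a month" figure corresponds to the lower end of the quota.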

After downloading, we need to label the APK files as malicious or benign. Some papers treat every sample from Google Play as benign and use the malicious dataset provided by the Android Malware Genome Project. However, this scheme is too coarse, since malicious applications may exist in Google Play. Google Bouncer is not infallible.

VirusTotal is another common way to label samples. It provides a public API for scanning arbitrary files online with several anti-virus scanners. The Drebin paper labels an application as malicious if at least two of ten common scanners detect it, and combines those samples with the Android Malware Genome Project dataset. Another paper, MAMA, sets the required detection ratio across all scanners empirically.
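
A threshold rule of the kind attributed to Drebin above can be sketched as follows. This is a minimal illustration, not the paper's code and not a call to the VirusTotal API: the scanner names, the `label_sample` helper, and the shape of the `report` dictionary are all assumptions made for the example.

```python
def label_sample(detections, trusted_scanners, threshold=2):
    """Return 'malicious' if at least `threshold` trusted scanners flag the sample.

    detections maps a scanner name to True (detected) or False (clean).
    Scanners missing from the report count as not detecting anything.
    """
    hits = sum(1 for name in trusted_scanners if detections.get(name, False))
    return "malicious" if hits >= threshold else "benign"


# Hypothetical scanner names and results, for illustration only.
TRUSTED = ["ScannerA", "ScannerB", "ScannerC"]
report = {"ScannerA": True, "ScannerB": False, "ScannerC": True, "ScannerD": True}

print(label_sample(report, TRUSTED))
```

Only the scanners in the trusted list count toward the threshold, so the detection by `ScannerD` is ignored here. A ratio-based rule would divide `hits` by the number of scanners instead of comparing it with a fixed count.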

Feature Extraction

Everything in an APK file can be a feature. The most basic and common features are permissions and API calls. API calls have dominated for almost two years because of their finer granularity.

You can use either Python-based Androguard or Java-based Baksmali to disassemble the APK files and extract the features you need. I chose the former because writing Python scripts is more intuitive for me.
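
Once the features are extracted, they are typically turned into fixed-length vectors before classification. The sketch below shows one simple encoding, a binary vector per app over the set of all observed permissions. The sample names and permission sets are hard-coded assumptions; in practice they would come from the disassembler output, and the helper names are hypothetical.

```python
def build_vocabulary(samples):
    """Collect every feature seen in any sample, in sorted order."""
    return sorted(set().union(*samples.values()))


def to_vector(features, vocabulary):
    """Encode a feature set as 0/1 flags, one per vocabulary entry."""
    return [1 if name in features else 0 for name in vocabulary]


# Hypothetical apps and their requested permissions.
samples = {
    "app_a.apk": {"android.permission.INTERNET", "android.permission.SEND_SMS"},
    "app_b.apk": {"android.permission.INTERNET"},
}

vocabulary = build_vocabulary(samples)
vectors = {name: to_vector(feats, vocabulary) for name, feats in samples.items()}

for name in sorted(vectors):
    print(name, vectors[name])
```

The same encoding works for API calls: only the contents of the feature sets change, not the vectorization step.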

Classification

Many algorithms can be applied, such as Bayesian networks, support vector machines (SVMs), and J48. You can implement those algorithms yourself. However, as a programmer, I suggest using libraries like scikit-learn (Python-based) or Weka (Java-based). Never reinvent the wheel, dude.

If you have not decided which algorithm to use yet, I recommend Weka as a starting point. It provides a great GUI that lets you do a lot with just a mouse, which is convenient for early experiments. You can also write a more complex classification system with the Java-based Weka API if you are a Java fan.
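
To try features in the Weka GUI, they first need to be in a format Weka can open, such as ARFF. The sketch below renders binary feature vectors as ARFF text. It is a minimal example under assumptions: the relation name, the `to_arff` helper, and the two-row dataset are made up for illustration, and every feature is declared as a nominal `{0,1}` attribute.

```python
def to_arff(relation, attributes, rows):
    """Render binary feature rows as ARFF text.

    rows is a list of (vector, label) pairs; the label becomes the
    last attribute, named 'class'.
    """
    lines = ["@relation " + relation, ""]
    for name in attributes:
        lines.append("@attribute '%s' {0,1}" % name)
    lines.append("@attribute class {benign,malicious}")
    lines.append("")
    lines.append("@data")
    for vector, label in rows:
        lines.append(",".join(str(v) for v in vector) + "," + label)
    return "\n".join(lines) + "\n"


# Hypothetical two-sample dataset.
attributes = ["android.permission.INTERNET", "android.permission.SEND_SMS"]
rows = [([1, 1], "malicious"), ([1, 0], "benign")]

arff_text = to_arff("apk_permissions", attributes, rows)
print(arff_text)
```

Writing `arff_text` to a file with an `.arff` extension should be enough to load it in the Weka GUI and try different classifiers on it.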

Development Environment

  • Ubuntu 14.04 & PostgreSQL 9.3.5

    You can use Ubuntu 14.10, too. CentOS 7 and Fedora 20 are also good options. Note that CentOS 6.5 may need virtualenv because it uses stable but older libraries, which annoyed me a bit every time I started a screen session.

  • Python 2.7.6

    Two Python modules, psycopg2 and poster, are needed for database access and for uploading files via HTTP, respectively. Use apt-get to install them:

    $ sudo apt-get install python-psycopg2 python-poster
  • Androguard 1.9

    To install Androguard, make a local clone on your computer:

    $ git clone https://github.com/androguard/androguard

    Do not forget to add the path to PYTHONPATH in your bashrc:

    export PYTHONPATH=/path/to/androguard:$PYTHONPATH
  • Weka 3.6.11

    After downloading Weka, change into its directory and type the following command to run it:

    $ java -Xmx1000M -jar weka.jar

    Note that using -jar makes Java ignore the CLASSPATH variable and use only weka.jar. I prefer another way. First, add the path to CLASSPATH in your bashrc:

    export CLASSPATH=/path/to/weka:$CLASSPATH

    After that, you can run the program from any directory:

    $ java weka.gui.GUIChooser
    # or java weka.gui.Main

    If your operating system is Ubuntu, its tab completion makes this convenient. Finally, do not forget to add the SVM Java library to the CLASSPATH variable if you need it.

Getting Started
