GDA Static Taint Analysis
gjden November 5, 2018
For a long time, researchers have tried to apply data flow analysis (static and dynamic taint propagation) to app security analysis and have achieved a great deal. However, practical use still faces many limitations, such as complex environment configuration, instability, slow speed, path explosion, large memory consumption, and excessive disk usage. These make the tools inconvenient in practice, so most of them are still used mainly in academic research. In 2018, I therefore decided to write a lightweight static taint analysis engine, FlashFlow, to address these problems; it took me several weeks. At the end of the year, I finished testing the engine and added it to the release version of the GDA decompiler (starting from GDA 3.62, it supports variable and register tracking).
FlashFlow is a high-speed data flow analysis engine that simulates manual reverse engineering through data flow analysis. It is very fast. In my (incomplete) comparative test, I imported the SourcesAndSinks.txt file from FlowDroid into FlashFlow and ran a full scan of each APK's code. When analyzing most popular apps for privacy leaks, FlashFlow needs only a few seconds to report the results, while FlowDroid takes anywhere from a few minutes to as long as 10 hours per analysis. For more than 80% of the APK samples, FlashFlow is 100 times faster than FlowDroid, and 500 times faster in extreme cases (for some APK files, FlashFlow needs only 2 seconds, while FlowDroid takes more than 7 hours).
FlashFlow adopts a context-free data flow analysis algorithm that can track any variable or register at any position, with no environment initialization and no configuration. As a result, FlashFlow can be embedded in the GDA decompiler as a simple, easy-to-use auxiliary function for analyzing individual variables and registers (the privacy-leak scanning function is currently used only for testing and is not embedded in GDA). At present, it supports data flow analysis of any register in Smali code, and of the parameters and return value of a method in Java code.
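To make the idea of following a register forward from an arbitrary position more concrete, here is a minimal Python sketch. It is not FlashFlow's actual algorithm, which this page does not describe in detail. The tuple-based instruction format, the `forward_taint` function, and the sample instructions are all hypothetical, and the sketch only handles straight-line code inside a single method.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical toy instruction:
# (index, opcode, destination register or None, source registers).
Instr = Tuple[int, str, Optional[str], Tuple[str, ...]]


def forward_taint(instrs: List[Instr], start: int, reg: str) -> Tuple[List[int], Set[str]]:
    """Follow `reg` forward from instruction `start` through straight-line code."""
    tainted = {reg}
    reached: List[int] = []
    for idx, _op, dst, srcs in instrs:
        if idx <= start:
            continue  # only look at instructions after the starting point
        if tainted.intersection(srcs):
            reached.append(idx)  # the tracked data flows through this instruction
            if dst is not None:
                tainted.add(dst)  # the result now carries the tracked data
        elif dst in tainted:
            tainted.discard(dst)  # register overwritten with an unrelated value
    return reached, tainted


# Made-up instruction sequence, loosely modelled on the device-ID example.
instrs: List[Instr] = [
    (0, "invoke getDeviceId", "v0", ()),
    (1, "move", "v1", ("v0",)),
    (2, "const-string", "v2", ()),
    (3, "invoke Log.d", None, ("v2", "v1")),
    (4, "const", "v1", ()),
    (5, "invoke sendTextMessage", None, ("v1",)),
]

reached, tainted = forward_taint(instrs, 0, "v0")
print(reached)          # indices of instructions the value flows through
print(sorted(tainted))  # registers still carrying the value at the end
```

In this toy run the value reaches the `move` and the `Log.d` call, but not the final call, because `v1` is overwritten at instruction 4 before it is used again. A real engine additionally has to follow branches, method calls, fields, and return values.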
First, we use a simple example to illustrate how to use FlashFlow in GDA. The method below is extracted from a malicious app.
This method gets the ID of the current device and returns a string containing that ID. Normally, we would find the callers of this method through cross-references and continue analyzing where the return value flows. Sometimes, however, there are too many callers, or the callers process the return value in ways that are too complex, which makes the analysis difficult. In many cases we do not care about the intermediate processing logic, only about where the device ID eventually ends up: whether it is written to a file, uploaded to the Internet, sent to a phone number by SMS, and so on. Here, data flow analysis based on taint tracking can greatly simplify the work. In GDA I have defined anchor points that include the APIs for file writing, network transmission, SMS sending, and other operations. After a FlashFlow analysis, these anchor points are highlighted, so the destination of the variable can be determined quickly. No context configuration is needed: just press the shortcut key F, and the following dialog box will pop up:
Then double-click one of the entries to analyze it. Here p0 and p1 are the parameters of the current method, and ret is the return value. In this case we want to trace the return value, so we double-click ret, which produces the following analysis result:
In the figure above, the left side of the dialog box shows the result of forward propagation and the right side shows the result of backward propagation. Each node represents an instruction (nodes are based on single instructions) that the return value flows through during propagation. The first node is the starting method.
Here, the return value v0 is back-propagated to the statement "v0 = "6&" + p0.getSystemService("phone").getDeviceId();", and there are no further backward propagation nodes. We mainly look at the left part of the dialog box, which shows the forward propagation result, that is, every node that the return value flows through. I have filtered out the nodes we do not care about here (such as data-transfer, logical-operation, and arithmetic-operation instruction nodes), so only the methods that the variable flows through are shown.
The forward propagation results show that the return value flows to two sensitive locations (the highlighted anchor methods), namely android.util.Log.d and android.telephony.SmsManager.sendTextMessage. This indicates that the device ID obtained by the method com.itcast.cn112.a.a is eventually written to the log and sent to a mobile phone by SMS.
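The anchor-point idea can be illustrated with a short Python sketch: a table of sensitive APIs is checked against the methods that the tracked value flows through, and the matches are flagged. This is only an illustration under assumptions. The `ANCHORS` table, the `mark_anchors` helper, and the sample flow are hypothetical and do not reflect GDA's real anchor list or internal format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical anchor table: API name -> kind of sensitive operation.
ANCHORS: Dict[str, str] = {
    "android.util.Log.d": "log",
    "android.telephony.SmsManager.sendTextMessage": "sms",
    "java.io.FileOutputStream.write": "file",
}


def mark_anchors(flow_methods: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each method in the flow with its anchor kind, or None if it is not an anchor."""
    return [(name, ANCHORS.get(name)) for name in flow_methods]


# Made-up forward-propagation result: methods the tracked value passes through.
flow = [
    "java.lang.StringBuilder.append",
    "android.util.Log.d",
    "android.telephony.SmsManager.sendTextMessage",
]

marked = mark_anchors(flow)
for name, kind in marked:
    prefix = "[ANCHOR:%s] " % kind if kind else ""
    print(prefix + name)
```

Only the flagged entries need a closer look, which is what makes the highlighted anchors useful when the flow passes through many intermediate methods.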
Expanding a node also shows the register that the data flows into, the call level, and the method that the data flow node belongs to, as shown in the figure below. When the data flow reaches the call to Log.d, the data is stored in register v1, the call level is 3, and the data flow is currently inside the com.itcast.cn112.a.d method.
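The information attached to an expanded node can be pictured as a small record. The Python sketch below uses a hypothetical `FlowNode` class whose field names are my own, not GDA's; it is filled with the values from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowNode:
    """Hypothetical record for one data flow node (field names are illustrative)."""
    callee: str     # method invoked at this node
    register: str   # register holding the tracked data at the call
    level: int      # call level relative to the starting method
    container: str  # method that contains the instruction

    def describe(self) -> str:
        return "%s <- %s (level %d, in %s)" % (
            self.callee, self.register, self.level, self.container)


node = FlowNode("android.util.Log.d", "v1", 3, "com.itcast.cn112.a.d")
print(node.describe())
```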
To analyze the context of the data flow in more detail, click a node: GDA decompiles the method that the node belongs to and highlights the location of the instruction node, as shown in the picture:
In Smali code, we can select any register for tracking analysis. There are three choices in Smali Code: forward analysis, backward analysis, and bidirectional analysis. Take the method shown in the figure below as an example.
If you want to perform forward analysis on v1 at 0007, right-click it and choose Taint-Follow. The analysis result dialog will pop up on the right side of the window.
The right-click menu offers Taint-Follow, Taint-Source, and Taint-Bidirection for forward, backward, and bidirectional analysis respectively. In Smali code, the data flow analysis results include all instruction nodes, without filtering, and some meaningful results are highlighted. For example, if the source of a register's value is a string or a defined source API, it is highlighted.
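Backward analysis can be sketched in the same spirit: walk the instructions in reverse from the chosen position, and each time an instruction defines a tracked register, record it and continue with its source registers. The Python sketch below is an assumption-laden illustration, not the engine's implementation. The instruction format, the `backward_taint` function, and the `HIGHLIGHT_OPS` set (standing in for "string or defined source API") are hypothetical, and only straight-line code is handled.

```python
from typing import List, Optional, Tuple

# Hypothetical toy instruction:
# (index, opcode, destination register or None, source registers).
Instr = Tuple[int, str, Optional[str], Tuple[str, ...]]

# Hypothetical opcodes treated as notable origins: a string constant or a source API call.
HIGHLIGHT_OPS = {"const-string", "invoke getDeviceId"}


def backward_taint(instrs: List[Instr], start: int, reg: str) -> List[Tuple[int, str, bool]]:
    """Trace where the value of `reg`, as used at instruction `start`, comes from."""
    tracked = {reg}
    origins: List[Tuple[int, str, bool]] = []
    for idx, op, dst, srcs in reversed(instrs):
        if idx >= start or dst not in tracked:
            continue  # later instruction, or it does not define a tracked register
        tracked.discard(dst)  # this definition explains the register
        tracked.update(srcs)  # keep tracing whatever it was computed from
        origins.append((idx, op, op in HIGHLIGHT_OPS))
    return origins


# Made-up instruction sequence.
instrs: List[Instr] = [
    (0, "invoke getDeviceId", "v0", ()),
    (1, "move", "v1", ("v0",)),
    (2, "const-string", "v2", ()),
    (3, "invoke Log.d", None, ("v2", "v1")),
]

for idx, op, highlighted in backward_taint(instrs, 3, "v1"):
    print(idx, op, "(highlighted)" if highlighted else "")
```

A bidirectional analysis, in this simplified picture, would combine a backward pass like this one with a forward pass from the same register and position.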
Similarly, by clicking on these nodes, GDA will decompile the methods in which the nodes are located and highlight the node instructions.
Please leave a message if you have any questions or suggestions.
FlashFlow, embedded in GDA, makes taint propagation analysis (forward and backward) very flexible and extremely fast. It is not yet a fully functional system, and I will continue to optimize and improve the engine in the future. Thank you for your support.