Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about taint analysis in Slither #156

Closed
cty12 opened this issue Jan 31, 2019 · 17 comments
Closed

Question about taint analysis in Slither #156

cty12 opened this issue Jan 31, 2019 · 17 comments
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@cty12
Copy link

cty12 commented Jan 31, 2019

Consider the following example:

contract SampleContract {
    uint svar = 0;

    function func_A (uint user_input) public {
        uint a;
        a = user_input;
        func_B(a);
    }

    function func_B (uint x) internal {
        func_C(1);
        svar = x;    // write to state var
    }

    function func_C (uint y) internal {
        svar = y;    // write to state var
    }
}

Now we would like to check if the state variable svar can be controlled by user input. It is not directly written by func_A, however svar can be modified in func_B, where its value to depends on an argument passed by func_A, which depends on user input.

In the current Slither taint analysis implementation, if we set the taint source to be the local variable user_input in func_A, the local variable a in func_A is tainted, which is expected. However it cannot figure out that the value of svar is controlled by the taint source, which means that running get_state_variable_tainted gets an empty list.

Is this an intended behavior?

@montyly
Copy link
Member

montyly commented Feb 1, 2019

Hi @cty12 , thank you for your interest to Slither!

What version are you using? We refactored the data dependency and the taint with slither 0.5.0 (it uses now the SSA representation of slithIR). We did not document the taint API, but we are going to do it prior to 0.6.0.

There are two functions for what you are asking for:

If the context is the contract, the dependency/taint is from a fixpoint across all the functions. It means that if my_var_1 is set by a variable my_var_2 in f1, and my_var_2 is tainted in f2, the contract will consider my_var_1 tainted.

In your example, you can use:

from slither import Slither
from slither.analyses.data_dependency.data_dependency import is_tainted, is_dependent

slither = Slither('taint.sol')

sampleContract = slither.get_contract_from_name('SampleContract')
svar = sampleContract.get_state_variable_from_name('svar')

taint = is_tainted(svar, sampleContract, slither)

print('{} is tainted: {}'.format(svar, taint))

func_a = sampleContract.get_function_from_signature('func_A(uint256)')
user_input = func_a.parameters[0]
dependent = is_dependent(svar, user_input, func_a, slither)

print('{} is depedent from {}: {}'.format(svar, user_input, taint))

Some notes:

  • The data dependency and the taint are computed as a pre-processing, so it is free to query them
  • is_tainted and is_dependent have an optional parameter, only_unprotected. If it is set to True, the dependency will not be propagated on protected functions. It allows to filter taints done by function like the ones using isOwner
  • There are some limitations on which we are working on. For example, the taint does not take in consideration the visbility of the functions, so if a private function is never called, but set a variable with its parameters, the variable will be considered as tainted.

Let me know if that answers your questions

@montyly
Copy link
Member

montyly commented Feb 1, 2019

We have more examples of the dependency here: data_dependency.sol, data_dependency.py.

@cty12
Copy link
Author

cty12 commented Feb 1, 2019

@montyly

Thank you for the kind reply!

In the code snippet that you posted, running is_dependent(svar, user_input, sampleContract, slither) will get False, isn't it? However the value of svar can indeed depend on user_input.

Please point it out if I missed anything.

@cty12
Copy link
Author

cty12 commented Feb 1, 2019

In fact we can use pprint_dependency to check what data dependency Slither has figured out. In the above example, running pprint_dependency(sampleContract), which means that we are inspecting the dependency in the whole contract's context, gets the following results (we just care about the non-SSA part):

svar (0x7f42e1cf52b0):
	- x
	- y
a (0x7f42e1cf5a20):
	- user_input

In this case, I do not think data dependency has been propagated correctly through function calls.

By the way I have checked data_dependency.sol and it indeed does not contain an example with function calls and argument passing.

@montyly
Copy link
Member

montyly commented Feb 2, 2019

Ah yes, you're right, I was printing taint rather than dependent in the last line of my example.

The dependency is not yet propagated through the arguments of functions, while it should. That's a good catch, thanks!

I will see if I can integrate this feature for the 0.6.0 release.

@montyly montyly added this to the 0.6.0 milestone Feb 2, 2019
@montyly montyly added the enhancement New feature or request label Feb 2, 2019
@montyly montyly self-assigned this Feb 2, 2019
@cty12
Copy link
Author

cty12 commented Feb 3, 2019

@montyly

OK thanks!

By the way, the is_tainted is also flawed, since it considers a variable to be tainted as long as that variable depends on some argument of a function. However that argument may not be dependent on any user input.

@montyly
Copy link
Member

montyly commented Feb 4, 2019

Yes indeed!

It's what I meant when mentioning that the taint does not consider the visibility of the functions, so the parameters of private/internal will be considered as input-controlled. It creates an over-approximation where many parameters are considered tainted why they are not.

Propagating the dependency across parameters should allow refining this limitation!

@cty12
Copy link
Author

cty12 commented Feb 4, 2019

@montyly

Thanks for recognizing this issue. By the way, is there a plan for a simple alias analysis? It can be pretty handy when writing a slightly more complex detector.

@montyly
Copy link
Member

montyly commented Feb 4, 2019

Sure!

By alias analysis, do you mean:

  • pointer analysis: what variables point to the same location in storage; for example tracking storage reference
  • or tracking the value of the variables, for example to find that x == y in:
function f(uint x){
  uint y = x;
}

Feel free to open issues in github if you have specific needs from Slither; we will do our best to add these features

@cty12
Copy link
Author

cty12 commented Feb 4, 2019

@montyly

Yes you are right. I think pointer analysis is the broader term. By the way, in addition to simple assignments, as is shown in your example, it is also possible to store a pointer into a vector and retrieve it later (creating an alias, for example).

@montyly
Copy link
Member

montyly commented Feb 4, 2019

For info, Slither does a light alias analysis on storage references.
For example:

contract MyContract{

    struct myStruct{
        uint val;
    }

    myStruct val_a;
    myStruct val_b;
    myStruct val_c;

    function myFunc(bool condition) internal returns(myStruct storage){
        myStruct storage st1 = val_a;
        myStruct storage st2 = val_b;

        if(condition){
            st2 = val_c;
        }   

        return st2; 
    }
}
from slither import Slither
from slither.core.variables.local_variable import LocalVariable

slither = Slither('ref.sol')

myContract = slither.get_contract_from_name('MyContract')

myFunc = myContract.get_function_from_signature('myFunc(bool)')

for node in myFunc.nodes:
    if node.expression:
        print(f'Solidity expression: {node}')
        for ssa_var in set(node.ssa_local_variables_read + node.ssa_local_variables_written):
            if isinstance(ssa_var, LocalVariable):
                if ssa_var.is_storage:
                    print(f'\tLocal variable {ssa_var.non_ssa_version.name} points to {[v.non_ssa_version.name for v in ssa_var.refers_to]}')

Will result in

Solidity expression: NEW VARIABLE st1 = val_a
	Local variable st1 points to ['val_a']
Solidity expression: NEW VARIABLE st2 = val_b
	Local variable st2 points to ['val_b']
Solidity expression: IF condition
Solidity expression: EXPRESSION st2 = val_c
	Local variable st2 points to ['val_c']
Solidity expression: RETURN st2
	Local variable st2 points to ['val_c', 'val_b']

Some notes:

  • It's needed to work on the SSA version of the variables, as one local variable can change where it points in storage
  • refers_to returns a list because a variable can point to different storage variables, according to the path conditions
  • This code example only runs on the dev branch for now as it relies on some of the properties that are not stable yet.
  • The alias analysis is not robust and will can be quickly fooled. For example, it only works on the function scope and is not inter-procedural (see False positive for unused-state-variables #112)
  • It does not track copy through structures, this would need some improvements in the way SlithIR models reference variables. These improvements are unlikely to be ready for the next release (0.6.0), but are on our roadmap.

We are working to improve the IR, the data dependency and the alias analysis. Feel free to contact us on slack (#ethereum) if you need support to integrate more advanced analyses

@montyly montyly closed this as completed in 18ad15a Feb 8, 2019
@montyly
Copy link
Member

montyly commented Feb 8, 2019

Hi,

I merged the track of the taint and data dependency through arguments to master.
Additionally, only the arguments of public and external functions are considered as tainted now.

Please let me know if you have any other questions related to Slither's internals, or if something is not working as expected. We know that there is room for improvement for the data dependency computation.

@cty12
Copy link
Author

cty12 commented Feb 10, 2019

@montyly

I've checked out the latest commit on the master branch. Is the non-SSA dependency still broken? Consider the following example:

contract SampleContract {
    uint svar = 0;

    function func_A (uint [] user_input) public {
        svar = user_input.length + 1;
    }
}

Printing the dependencies information related to function func_A gives the following output:

#### NON SSA ####
user_input (0x7feb9b2584a8):
	- user_input (0x7feb9b2584a8)
REF_114 (0x7feb9b28b5f8):
	- user_input (0x7feb9b2584a8)
TMP_173 (0x7feb9b28bc50):
	- REF_114 (0x7feb9b28b5f8)
svar (0x7feb9b2582e8):
	- TMP_173 (0x7feb9b28bc50)

In other words, Slither does not consider svar depends on user_input in the context of func_A, where it should. However, svar indeed depends on user_input in the contract-wise context. Nevertheless it still keeps the SSA temporary variable names like REF_114 or TMP_173 in either the dependency of a single function or the whole contract.

@montyly
Copy link
Member

montyly commented Feb 10, 2019

I don't have the same result on your example (tested on 3dbe807):

contract SampleContract {
    uint svar = 0;

    function func_A (uint [] user_input) public {
        svar = user_input.length + 1;
    }
}
from slither import Slither
from slither.analyses.data_dependency.data_dependency import is_tainted, pprint_dependency

slither = Slither('test.sol')

sampleContract = slither.get_contract_from_name('SampleContract')
svar = sampleContract.get_state_variable_from_name('svar')

print('{} is tainted? {}'.format(svar, is_tainted(svar, sampleContract)))

pprint_dependency(sampleContract)

Gives

svar is tainted? True
#### SSA ####
[ ... ]
#### NON SSA ####
[ ... ]
svar (..):
	- user_input (..)
[ ... ]

Could you share your python script?
Are you using Slither through virtualenv (see the Developer instllation)? If you installed slither-analyzer through pip, and then re-installed through git, you might have conflicts.

Concerning the TMP and REF variables, those are slithIR variables, they exist only in SSA form (they have a lifetime of the node) so they are in SSA even if the other variables are not. pprint_dependency is mostly used for debug, but if you have a need for a cleaner function, we can add a better pretty print for data dependency which does not show SlithIR variables.

We can also add a printer that shows the dependency directly (in a textual form, or a graph-based form).

@cty12
Copy link
Author

cty12 commented Feb 10, 2019

@montyly

I built Slither from source and is on the latest commit on master. By the way, I understand pprint_dependency is for debugging :-).

Would you please replace

pprint_dependency(sampleContract)

with something like

pprint_dependency(sampleContract.functions[0])

and try again? This is because the latter one outputs the dependency information of that specific function (that is, in a function-wise context). I understand the contract-wise one does not have that issue.

@montyly
Copy link
Member

montyly commented Feb 10, 2019

Oh you're right, the dependency is not computed through a fix-point on the function's context (but only on the contract's context).

That's another good catch! I am opening a separate issue for it.

Thank you @cty12 for finding these issues, that's really valuable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants