Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizations for similar_variables.py #1945

Merged
merged 4 commits into from
Jun 9, 2023
Merged

Conversation

0xGusMcCrae
Copy link
Contributor

@0xGusMcCrae 0xGusMcCrae commented Jun 3, 2023

Fixes #1630

Optimizations to the algorithm for detecting similar variable names in similar_variables.py.

Runtime for slither/dev on current C4 competiton MAIA codebase:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)

40      0.005    0.000    39.760   0.994   .../slither/detectors/variables/similar_variables.py:76(_detect)

Runtime for optimized version on same codebase:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)

40      0.004    0.000    17.769   0.444   .../slither/detectors/variables/similar_variables.py:83(_detect)

Runtime is reduced by >50% here, and the issues detected are identical.

Changes (in detect_sim):

  • Inner loop reworked so that it doesn't repeat comparisons. No need to compare the current outer loop's variable to anything with a lower index, as it's already been compared.
  • caching v1.name.lower() in the outer loop so it is only repeated once per outer loop rather than once per inner loop

@0xGusMcCrae
Copy link
Contributor Author

0xGusMcCrae commented Jun 3, 2023

There were some further optimizations (down to ~14 seconds for the MAIA codebase) that I didn't include in this PR since I wasn't 100% sure that they wouldn't break anything if anything external depended on the similar function having its exact current behavior.

It's possible to feed already lowercase variable names into similar and bring the length equality check from similar out into detect_sim

And AFAIK this line can be deleted:

since v2 won't be re-compared to v1 with the new inner loop setup.

Something like this:

@staticmethod
def similar(seq1: str, seq2: str) -> bool:
    ...
    Returns:
        bool: true if names are similar
    """
    val = difflib.SequenceMatcher(a=seq1, b=seq2).ratio()
    ret = val > 0.90
    return ret

@staticmethod
def detect_sim(contract: Contract) -> Set[Tuple[LocalVariable, LocalVariable]]:
    ...
    all_var = list(set(all_var + contract_var))

    ret = []
    for i in range(len(all_var)):
        v1 = all_var[i]
        _v1_name_lower = v1.name.lower()
        for j in range(i,len(all_var)): 
            v2 = all_var[j]
            if len(v1.name) != len(v2.name):
                continue
            _v2_name_lower = v2.name.lower()
            if _v1_name_lower != _v2_name_lower:
                if SimilarVarsDetection.similar(_v1_name_lower, _v2_name_lower):
                    ret.append((v1, v2))

    return set(ret)

So if further optimization is desired, these are possibilities.

@0xalpharush 0xalpharush added this to the 0.9.4 milestone Jun 9, 2023
@montyly
Copy link
Member

montyly commented Jun 9, 2023

that's awesome, thanks @0xGusMcCrae

@montyly
Copy link
Member

montyly commented Jun 9, 2023

I think we can reformat similar to follow your recommendation, it's only used in this detector as far as I can tell

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants