Skip to content
This repository has been archived by the owner on Apr 14, 2022. It is now read-only.

Support explicit ordering of visitor functions rather than purely relying on declaration order #84

Open
noahnu opened this issue Dec 2, 2018 · 6 comments

Comments

@noahnu
Copy link
Contributor

noahnu commented Dec 2, 2018

Is your feature request related to a problem? Please describe.

Ensuring a strict ordering on the execution of "visitors" is necessary if we want to guarantee deterministic assertions. If visitor_a increments counter_a for some node of type X and visitor_b increments counter_b for node of type X only if counter_a is greater than 0, then changing the ordering that visitor_a and visitor_b are called will change the final value of the computed statistics, meaning assertions may differ from the run that used different ordering.

While typing out this issue I was trying to come up with a good use case for having one visitor function depend on another. I could not. Perhaps someone else can think of an example?

Even if we were to consider this an anti-pattern, we should attempt to minimize possibility of error / flakiness.

Changing to a reducer-style approach for visitors may encourage inter-visitor dependencies as per #10.

Currently order of execution of visitors is defined by declaration order, purely b/c we use a central registry with a global array that is appended to whenever the @visit decoration is executed. The use of a global registry seems like an anti-pattern, however in order to remove this registry, we'd need some way to guarantee visitor order (dir(module) sorts alphabetically, while module.__dict__.keys() is only guaranteed to be ordered in py3.6+).

Describe the solution you'd like

A pattern similar to https://pytest-dependency.readthedocs.io/en/latest/about.html#what-is-the-purpose, where you essentially mark the names of tests that must be run first. We can thus use a topological sort of the dependencies.

e.g.

@visit(nodes.FunctionDef)
def some_func(node, .., ..):
  pass

@visit(nodes.FunctionDef, predicate=None, depends_on=['some_func', 'some_other_func'])
def count_funcs(node, .., ..):
  pass

We'd examine "depends_on" to generate a graph and then topological sort. We could sort alphabetically for visitors with no dependencies (or that are tied in sorting order).

Describe alternatives you've considered

  • Use inspect module to identify line numbers of items in dir(). Use this to order. Pro: consistent in all versions of python. Con: probably slow (?) and will grow in complexity when your config is spread across multiple files.
  • Rely on module.dict which is sorted by declaration order in py3.6+ and should remain in whatever arbitrary order it ends up in pre-3.6 (i.e. re-running program should keep same order even if not declaration order). Con: behaviour isn't easy to understand pre-3.6.

Additional context

https://tophat-opensource.slack.com/archives/CE14KJGET/p1543705988030800

@francoiscampbell
Copy link
Contributor

If visitor_a increments counter_a for some node of type X and visitor_b increments counter_b for node of type X only if counter_a is greater than 0, then changing the ordering that visitor_a and visitor_b are called will change the final value of the computed statistics

I can see this being very brittle. A change in visitor_a's behaviour or its removal entirely can completely break visitor_b

@noahnu
Copy link
Contributor Author

noahnu commented Dec 2, 2018

@francoiscampbell yeah. It feels like an anti-pattern. Maybe @lime-green can provide an example of why we want ordering?

@lime-green
Copy link
Contributor

lime-green commented Dec 2, 2018

Ordering by declaration order seems the most natural to me, which is what we have right now. I don't think we're looking for inter-dependency between visitors just yet, there hasn't really been a need for it from my usage of the library at least.

IMO should just keep the logic as-is until it's obvious that there's a problem

@francoiscampbell
Copy link
Contributor

francoiscampbell commented Dec 2, 2018

Are we talking about the order of all visitors or per-node? Cause I was assuming that the order of all visitors would be the AST depth-first traversal order.

For a single node type, im not opposed to running its visitors in declaration order, although it has the potential to prevent any future changes or optimizations, since the visitor order effectively becomes part of the public API

@lime-green
Copy link
Contributor

@francoiscampbell per-node; it's declaration order right now

@francoiscampbell
Copy link
Contributor

Yeah, that's cool. I'd say let's not make that a guarantee, just an implementation detail for now

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants