Add Walker to Simplify Dataflow DAG Traversal #1206
base: main
Conversation
logger = logging.getLogger(__name__)

class DataflowDagWalker(DataflowPlanNodeVisitor, Generic[VisitorOutputT], ABC):
Oof. In my opinion, the need to write another layer of abstraction to make it easier to write logic against a design-pattern abstraction is a fireworks-display-level signal that maybe the pattern - or at least the existing implementation of it - was not the right choice.
As with the generic visitor itself, this makes it a little easier to implement arbitrary things, and ESPECIALLY arbitrary things that turn out to be glorified property accessors. In exchange, it makes it harder both to read the code and to understand what's happening at runtime, because you have to use your human brain to remember that anything hitting a DagWalker means huge chunks of the stack trace can be ignored - except, of course, when they can't, and good luck distinguishing those scenarios.
An alternative to doubling and tripling down on these generic interfaces, which only make it marginally easier to do things like count the nodes in the graph by type, is to parcel the operations out into concrete needs and then develop strongly typed interfaces around them. As an example of what we currently need:
Node level -
- Optimization classes. Then nodes can be typed according to whether or not they can reasonably be optimized. At the moment the DataflowPlan only has 3 optimizable nodes, so this can refine things in a type-safe way that doesn't require us to add these nested layers of boilerplate.
- Conversion classes (well, class, really, because as a practical matter we only actually care about dataflow -> sql). Every node needs an implementation, so every node gets one.
- Recursive node properties - these can just be property methods, no visitors required, and we should move to that model regardless of what happens with the DataflowDagWalker
Plan level -
- Node classification by type, which can be done via a single walk in the initializer for the DataflowPlan
- Plan level properties, which can be bootstrapped by initializer-level node classification and recursive node properties (both ideas are sketched below this comment).
What are the future operations we're expecting to implement here? How many of these are not better resolved via updates to our type specifications? How many of them truly require heterogeneous operations coded to node subtypes vs. implementing a "default handler" that applies to 90% of the object types in the hierarchy?
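To make the alternative above concrete, here is a minimal sketch of recursive node properties written as plain property methods, plus classification by node type done with a single walk in the plan initializer. Every name below (DataflowPlanNode, DataflowPlan, max_upstream_depth, nodes_of_type, the example node subclasses) is a simplified stand-in for illustration, not the actual MetricFlow definitions.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Dict, List, Sequence, Type


class DataflowPlanNode:
    """Simplified stand-in for a dataflow plan node."""

    def __init__(self, parent_nodes: Sequence["DataflowPlanNode"] = ()) -> None:
        self.parent_nodes = tuple(parent_nodes)

    @property
    def max_upstream_depth(self) -> int:
        """A recursive node property as a plain property method - no visitor required."""
        if not self.parent_nodes:
            return 0
        return 1 + max(parent.max_upstream_depth for parent in self.parent_nodes)


class ReadSqlSourceNode(DataflowPlanNode):
    pass


class JoinNode(DataflowPlanNode):
    pass


class DataflowPlan:
    """Simplified stand-in that classifies nodes by type with one walk in the initializer."""

    def __init__(self, sink_node: DataflowPlanNode) -> None:
        self.sink_node = sink_node
        self._nodes_by_type: Dict[Type[DataflowPlanNode], List[DataflowPlanNode]] = defaultdict(list)
        seen: set = set()
        stack = [sink_node]
        while stack:
            node = stack.pop()
            if id(node) in seen:
                continue
            seen.add(id(node))
            self._nodes_by_type[type(node)].append(node)
            stack.extend(node.parent_nodes)

    def nodes_of_type(self, node_type: Type[DataflowPlanNode]) -> Sequence[DataflowPlanNode]:
        """A plan-level property bootstrapped from the initializer-time classification."""
        return tuple(self._nodes_by_type.get(node_type, ()))


# Counting nodes by type no longer requires a visitor or a walker.
plan = DataflowPlan(JoinNode(parent_nodes=[ReadSqlSourceNode(), ReadSqlSourceNode()]))
assert plan.sink_node.max_upstream_depth == 1
assert len(plan.nodes_of_type(ReadSqlSourceNode)) == 2
```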
Your points are well noted, and they are similar to the ones raised in the visitor discussion. This is intended more as an incremental improvement to the current structure. One change I'm down to make in the short term is replacing the visitor with the walker, if the additional class is the concern.
What are the future operations we're expecting to implement here? ...
I was going for the incremental improvement, so these will require an opportunity to think / brainstorm more about upcoming work.
I was going for the incremental improvement, so these will require an opportunity to think / brainstorm more about upcoming work.
Fair enough! I have a specific set of ideas and I'll set to work on making some changes once I'm out of the weeds. I also have a concept for some type structure improvements; it would be great to chat about those as well.
This won't matter much to me one way or the other, and if it eases the boilerplate of writing traversals, that seems fine.
I will say, I think a DfsWalker that does not do a walk is a baffling construct. I just followed the link you sent me along with this one, to the execution plan update, and went through this set of stages of confusion (a simplified sketch of the pattern in question follows this comment):
Step 1 - it doesn't walk the DAG
Step 2 - wait, maybe it does, let me look at the original class
Step 3 - oh wow, it DOES walk the DAG, that's a bug, better tell Paul
Step 4 - no, wait, what's this default recursion thing?
Step 5 - oh, let me check the implementing class
Step 6 - oh ok I had it right the first time
If you like this simplification we can proceed and I'll review this tomorrow. Please do consider taking out the "disable recursion" flag.
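For readers following along, here is a simplified reconstruction of the pattern being debated: a walker whose default handling recurses into parent nodes unless a flag turns that off. The names here (DataflowDagWalker, should_recurse, handle_node, NodeNameWalker) are approximations for illustration and may not match the PR's actual implementation.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Optional, TypeVar

VisitorOutputT = TypeVar("VisitorOutputT")


class Node:
    def __init__(self, parent_nodes: Optional[List["Node"]] = None) -> None:
        self.parent_nodes = parent_nodes or []


class DataflowDagWalker(Generic[VisitorOutputT], ABC):
    def __init__(self, should_recurse: bool = True) -> None:
        # When True, walking a node first walks all of its parents; when False,
        # "walking" touches only the given node - the source of the confusion above.
        self._should_recurse = should_recurse

    def walk(self, node: Node) -> List[VisitorOutputT]:
        results: List[VisitorOutputT] = []
        if self._should_recurse:
            for parent in node.parent_nodes:
                results.extend(self.walk(parent))
        results.append(self.handle_node(node))
        return results

    @abstractmethod
    def handle_node(self, node: Node) -> VisitorOutputT:
        """Subclasses implement the per-node logic; the walker supplies the recursion."""
        raise NotImplementedError


class NodeNameWalker(DataflowDagWalker[str]):
    def handle_node(self, node: Node) -> str:
        return type(node).__name__


# With should_recurse=False, walk() produces a result for only the node it was given.
leaf = Node()
root = Node(parent_nodes=[leaf])
assert len(NodeNameWalker().walk(root)) == 2
assert len(NodeNameWalker(should_recurse=False).walk(root)) == 1
```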
If you like this simplification we can proceed and I'll review this tomorrow. Please do consider taking out the "disable recursion" flag.
For that class, I was using "walk" in a very general sense. Maybe "traverse" is a better term. For a traversal, you could specify the traversal behavior. Let me see what that looks like.
Maybe a maximum traversal depth parameter, where None means a full traversal? Seems more complicated, though.
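If the depth-limit idea were pursued, it might look roughly like the sketch below. The names (walk_from, max_depth, the Node class) are hypothetical placeholders rather than anything that exists in the codebase, and the sketch does not de-duplicate shared parents.

```python
from typing import List, Optional


class Node:
    def __init__(self, parent_nodes: Optional[List["Node"]] = None) -> None:
        self.parent_nodes = parent_nodes or []


def walk_from(node: Node, max_depth: Optional[int] = None, _depth: int = 0) -> List[Node]:
    """Depth-first traversal; max_depth=None means walk the full DAG."""
    visited = [node]
    if max_depth is not None and _depth >= max_depth:
        return visited
    for parent in node.parent_nodes:
        visited.extend(walk_from(parent, max_depth=max_depth, _depth=_depth + 1))
    return visited


# max_depth=0 visits only the starting node; max_depth=None walks everything.
leaf = Node()
root = Node(parent_nodes=[Node(parent_nodes=[leaf])])
assert len(walk_from(root, max_depth=0)) == 1
assert len(walk_from(root)) == 3
```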
Description
As per title.