Pretty much! I am not sure whether anything internal would prevent plans from being 'hashable' (that is, from producing a consistent summary value), but if nothing does, we could even implement Rust's native Hash trait and just use a HashSet.
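As a rough illustration of the Hash-plus-HashSet idea, here is a minimal sketch. `ToyPlan` and `seen_before` are hypothetical names invented for this example; they are not DataFusion types or APIs, and a real plan may hold values (for example `f64` literals, which implement neither `Hash` nor `Eq`) that would need special handling before the traits could be derived.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a logical plan (NOT DataFusion's `LogicalPlan`).
/// Deriving `Hash` and `Eq` works because every field is itself hashable.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum ToyPlan {
    Scan { table: String },
    Filter { predicate: String, input: Box<ToyPlan> },
}

/// Records `plan` in `known` and reports whether it had been seen before.
/// `HashSet::insert` returns `false` when the value was already present.
fn seen_before(known: &mut HashSet<ToyPlan>, plan: &ToyPlan) -> bool {
    !known.insert(plan.clone())
}
```

Storing whole plans in the set means equality is checked with `Eq`, so a hash collision cannot cause a false "already seen" result; the trade-off is that every distinct plan is cloned and kept in memory.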
As this is very similar to the optimizer in a compiler, I guess we can also take some inspiration from how compilers decide when to stop optimizing.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Improve efficiency of multiple optimizer passes
Describe the solution you'd like
Thanks to @isidentical for this suggestion.
Instead of having the optimizer decide that it is done by checking whether the last pass changed the plan (a check based on the Display representation of the plan), it might make sense to compute a unique plan id (bottom up), which we could also use to detect optimization cycles.
A very basic example (treating each letter as a unique plan id) is A -> B -> C -> A -> B -> ..., repeated until the maximum number of passes is reached. Each plan differs from the one before it, so comparing against the previous plan never stops the loop, even though we should exit. With a unique id we could keep a set of the ids seen so far, check known_plans.contains(new_plan.id) after each pass, and break out of the loop when it returns true.
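The loop described above could be sketched roughly as follows. Everything here is an assumption made for illustration: `Plan`, `plan_id`, and `optimize` are hypothetical names, not DataFusion's API, and the id is a plain 64-bit hash built bottom up from each node's own fields plus its child's id, so distinct plans could in principle collide.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical toy plan (NOT DataFusion's `LogicalPlan`).
#[derive(Clone, Debug, PartialEq)]
enum Plan {
    Scan(String),
    Filter(String, Box<Plan>),
}

/// Bottom-up id: hash a variant tag, the node's own fields, and the child's id.
fn plan_id(plan: &Plan) -> u64 {
    let mut h = DefaultHasher::new();
    match plan {
        Plan::Scan(table) => {
            0u8.hash(&mut h);
            table.hash(&mut h);
        }
        Plan::Filter(predicate, input) => {
            1u8.hash(&mut h);
            predicate.hash(&mut h);
            plan_id(input).hash(&mut h);
        }
    }
    h.finish()
}

/// Applies `pass` repeatedly. Stops at `max_passes`, or as soon as a pass
/// yields a plan whose id was already seen (a fixed point or a cycle).
/// Returns the last accepted plan and the number of passes that were run.
fn optimize(mut plan: Plan, pass: impl Fn(&Plan) -> Plan, max_passes: usize) -> (Plan, usize) {
    let mut known_plans = HashSet::new();
    known_plans.insert(plan_id(&plan));
    for i in 0..max_passes {
        let new_plan = pass(&plan);
        // `insert` returns false if the id is already in the set.
        if !known_plans.insert(plan_id(&new_plan)) {
            return (plan, i + 1);
        }
        plan = new_plan;
    }
    (plan, max_passes)
}
```

With a pass that rotates A -> B -> C -> A, this loop stops on the third pass instead of running all `max_passes` iterations; an unchanged plan is caught by the same check on the first pass.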
Describe alternatives you've considered
Additional context
Discussion at https://github.com/apache/arrow-datafusion/pull/3880/files#r998491734