-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache output equivalence_properties
in ProjectionExec
#9097
Conversation
@@ -56,6 +56,8 @@ pub struct ProjectionExec { | |||
input: Arc<dyn ExecutionPlan>, | |||
/// The output ordering | |||
output_ordering: Option<Vec<PhysicalSortExpr>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory this doesn't need to be stored now, since it can be obtained via self.equivalence_properties.oeq_class().output_ordering()
, however the last method has some non-trivial computation which are probably best avoided anytime there's a call to ProjectionExec::output_ordering()
96ba157
to
9d1ae0a
Compare
Out of curiousity I ran the sql_planner benchmark on this PR compared to c843226 (main at the time of writing). Even though the PR is not targetted at those queries, the improvements seem to be in the range of no-change to -3% 👍 Results
|
equivalence_properties
in ProjectionExec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a very nice change to me @gruuya -- nice find
I wonder if we should add a similar optimization for other nodes where compuing this is non trivial (like HashJoinExec
or HashAggregateExec
or FilterExec
)
I could imagine even changing the signature of ExecutionPlan::equivalence_properties
to return a reference and basically force pre-computing the results 🤔
I agree with @metegenez on #9084 (comment) that it would be good if @mustafasrepo weighs in on this PR prior to merge
Co-authored-by: Andrew Lamb <[email protected]>
4209e68
to
87958e0
Compare
Thanks!
Yup, this makes sense to me. In fact
This also sounds good. Arguably for other nodes this might be forcing an eager computation of something that might not be used in theory, but in practice I think these two methods always get invoked during the planning process, at least to propagate the info up the plan tree to something that does cache it now (might also need some profiling/investigating here too). I can try and make these additional changes separately if other people think it could be a good idea. I also think breaking up |
@alamb let's wait on merging this. We want to think a little about this and have ideas for a different general solution. We should have something next week. |
This is fine with me if it is ok with @gruuya. Also given how small this PR is, if he needs the change sooner maybe we can consider merging it so it is available while the more general solution is underway. I don't feel strongly about this. @ozankabak / @berkaysynnada is there a ticket somewhere that describes what you are doing ? It might help to make the plans more visible to others as well as I would like to point out we have a version of proejction pushdown in InfluxDB https://github.com/influxdata/influxdb/blob/main/iox_query/src/physical_optimizer/projection_pushdown.rs that might be worth a look as well |
We will open the PR to the upstream next week. I will also request API-level suggestions there. It's not expected to significantly alter our current plans, but it will be a solid step towards optimizing potential outcomes following other existing optimizations and future ones. Thanks for the IOx solution, I will examine it. |
@gruuya, let us know if it is very urgent for you. Then we can go ahead and merge for now and change again in a few days. Otherwise, we can collaborate next week on the more general approach we are working on.
We will open a ticket and a draft PR next week as an RFC for this |
Just to be clear, I was suggesting a ticket Before the PR. Some reasons a ticket prior to PR might be be beneficial are:
|
BTW, for the sake clarity, there are two steps of the work here:
These two are loosely related but independent tasks |
@alamb, @ozankabak this is not that urgent for us (though it would certainly be nice to have a fix soon). My impression from #9084 (comment) was that the general solution considered would actually be complementary to the one in this PR (i.e. locally caching stuff in the optimizer as opposed to caching it at the node level), hence possibly not necessitating changing it later on. Otherwise, we can wait for the more general solution, thanks. |
I thought it would be easier to discuss through code, but you are right in the importance of filling a ticket. I just mean that this PR has nothing to do with the projection optimizer I'm currently working on.
"a different general solution" is not the projection optimizer 😁 I have opened the issue as you suggest: #9111. |
Converting to draft as we wait for additional potential improvements so it is clear this PR is not waiting on feedback |
Update: In a couple days, we will open a PR with a solution that takes the core idea of this and generalizes it in a neat way (we think). Stay tuned! |
Superseded by #9346 |
Which issue does this PR close?
Closes #9084.
Rationale for this change
Avoid potential exponential explosions in branching calls to
output_partitioning
andequivalence_properties
in certain plan combinations.What changes are included in this PR?
Simply store the
equivalence_properties
, which are always calculated anyway inProjectionExec
.Are these changes tested?
They were tested on the example from #9084
Are there any user-facing changes?
Not really, apart from not stalling TPC-DS q64.