-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider table scan filter during analysis of optimize projections #9131
Consider table scan filter during analysis of optimize projections #9131
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mustafasrepo do you think it is #5436 related?
I think that PR is related to the projection functionality inside FilterExec. But I am not sure though.
However, in the issue I am not sure this is the desired feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vaguely remember we had this discussion in the past (maybe with @Dandandan ) about if columns referenced only in pushed down projections should also appear in the projection of the scan or if the table scan implementations should have to figure that out
I think the consensus was that the projection should only reflect which columns are actually needed at the output of the TableProvider. This allows for more optimized pushdown in parquet (e.g. can skip materializing filter-only columns if we can rule out the entire page using page pruning)
In fact the documentation of TableProvider says explicitly
In certain cases, a query may only use a certain column in a Filter that has been completely pushed down to the scan. In this case, the projection will not contain all the columns found in the filter expressions.
Which I think is contrary to what this PR does.
Thus, I am not sure if this is a good change.
If we do make this change, we should also update the documentation to reflect what the code does.
I agree with @alamb. I talked to @mustafasrepo and I am not sure if #9109 is a bug. |
After these discussions, I agree that this change is not for the better. Hence I am closing this PR. |
Which issue does this PR close?
Closes #9109.
Rationale for this change
What changes are included in this PR?
This PR fixes the bug described in the issue
Are these changes tested?
Yes
Are there any user-facing changes?