Fix missing columns in orderby #4969
with v, v1, v.player.name as name, count(v) as a0
where a0 > 0
return name
order by a0 desc, name desc
It seems the a0 variable is also invisible. Why not report a semantic error?
It's not an error. According to the Cypher standard, this case is valid. Please refer to the issue this PR links to; I pasted the standard there.
@@ -44,6 +44,9 @@ folly::Future<Status> SortExecutor::execute() {
auto seqIter = static_cast<SequentialIter *>(iter);
std::sort(seqIter->begin(), seqIter->end(), comparator);
for (auto &idx : sort->inputVars()[0]->invisibleColIndicies) {
  result.valuePtr()->getMutableDataSet().removeColumn(idx);
Why do this in the sort executor?
Currently, I put this in both Sort and TopN. These two operators require the hidden columns; after them, the columns are no longer needed and should be removed from the result data set. I wanted a lightweight solution that only removes them logically but haven't found one. Please advise if you have a better idea.
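A minimal sketch of the physical column removal described above, assuming a simplified row-major data set. `DataSet` and `removeHiddenColumns` here are hypothetical stand-ins, not NebulaGraph's classes. One detail matters for any loop of this kind: erasing a column shifts every column to its right, so the sketch removes indices from highest to lowest.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a result data set: column names plus
// row-major values. This is NOT NebulaGraph's DataSet class.
struct DataSet {
  std::vector<std::string> colNames;
  std::vector<std::vector<std::string>> rows;
};

// Physically drop the given (hidden) columns from the header and every row.
// Indices are processed from highest to lowest, so erasing one column never
// shifts the position of a column that still has to be erased.
void removeHiddenColumns(DataSet &ds, std::vector<std::size_t> hidden) {
  std::sort(hidden.begin(), hidden.end(), std::greater<std::size_t>());
  hidden.erase(std::unique(hidden.begin(), hidden.end()), hidden.end());
  for (std::size_t idx : hidden) {
    ds.colNames.erase(ds.colNames.begin() + static_cast<std::ptrdiff_t>(idx));
    for (auto &row : ds.rows) {
      row.erase(row.begin() + static_cast<std::ptrdiff_t>(idx));
    }
  }
}
```

In this sketch the caller sorts the rows first (using the hidden columns as keys) and only then calls `removeHiddenColumns`, mirroring where the diff above places the removal.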
According to relational algebra, the sort operator should not be responsible for this. Column projection should be handled by the project operator.
If the sort operator takes on this extra work, it becomes unfriendly to the optimizer.
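The split the reviewer describes can be sketched as two operators with separate jobs: a sort that only reorders rows and a projection that only chooses columns. `Row`, `sortRowsDesc` and `project` are hypothetical names for illustration, and values are compared as strings purely to keep the sketch short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type: every value stored as a string for simplicity.
using Row = std::vector<std::string>;

// Sort: reorders rows by one key column, descending (lexicographic string
// comparison, a simplification). It never changes the set of columns.
void sortRowsDesc(std::vector<Row> &rows, std::size_t keyIdx) {
  std::stable_sort(rows.begin(), rows.end(),
                   [keyIdx](const Row &a, const Row &b) { return a[keyIdx] > b[keyIdx]; });
}

// Project: builds new rows holding only the requested columns, in order.
// This is the single place that decides which columns leave the plan.
std::vector<Row> project(const std::vector<Row> &rows,
                         const std::vector<std::size_t> &keep) {
  std::vector<Row> out;
  out.reserve(rows.size());
  for (const Row &r : rows) {
    Row projected;
    projected.reserve(keep.size());
    for (std::size_t idx : keep) projected.push_back(r[idx]);
    out.push_back(std::move(projected));
  }
  return out;
}
```

With this split, the sort remains a pure reordering, and any rule that reasons about which columns survive only has to look at the projection.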
src/parser/Clauses.h
private:
    Expression *expr_{nullptr};
    std::string alias_;
    // If some columns are required by order-by and are not explicitly
    // declared in the return clause, it is an invisible column and not to be
IMHO, this is not a good fix. It would be better to do this in the optimizer, for example with a column-pruning rule.
I don't think so. To me, this is not an optimization issue but a functional one: this type of query should be executable even when the optimizer does not rewrite the plan.
Previously, the validator reported a column-not-found error. Now, when needed and possible, it expands the scope visible to order-by by selecting the required columns and adding them to the query plan. These columns are not expected in the result set, so I mark them as invisible in their respective YieldColumns.
It's not a validation issue anyway; it's a plan-generation issue. But in the current framework, it seems I have to do it in the validator.
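A minimal sketch of the validator-side step described above, assuming order-by keys and return items can be matched by alias alone. `YieldColumnSketch` and `addOrderByColumns` are hypothetical; the real `YieldColumn` lives in `src/parser/Clauses.h`, and its fields are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified yield column: an alias plus a visibility flag.
// Not the real YieldColumn class from src/parser/Clauses.h.
struct YieldColumnSketch {
  std::string alias;
  bool invisible = false;
};

// Append every order-by key that the RETURN clause does not already produce,
// marking it invisible so it can be dropped before results reach the client.
// Returns the indices of all invisible columns.
std::vector<std::size_t> addOrderByColumns(std::vector<YieldColumnSketch> &cols,
                                           const std::vector<std::string> &orderKeys) {
  for (const std::string &key : orderKeys) {
    bool present = std::any_of(cols.begin(), cols.end(),
                               [&key](const YieldColumnSketch &c) { return c.alias == key; });
    if (!present) cols.push_back({key, true});
  }
  std::vector<std::size_t> hidden;
  for (std::size_t i = 0; i < cols.size(); ++i) {
    if (cols[i].invisible) hidden.push_back(i);
  }
  return hidden;
}
```

The returned indices play the role of the `invisibleColIndicies` list that the executor diff iterates over.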
Discussed offline. To summarize, it is better to prepare all the data needed for any aggregation and order-by before the final projection, instead of passing hidden columns beneath the projection.
We have decided not to address this compatibility issue with openCypher. Order-by will not support sorting by columns that are not listed in the return clause; a query that wants to sort its results must sort on a column listed in the return clause. BTW, there are no correctness issues with order-by.
What type of PR is this?
What problem(s) does this PR solve?
Issue(s) number:
Close #4930
Description:
To comply with openCypher, the set of columns that an order-by can see depends on whether the projection before it changes the cardinality of the data set. When the projection neither aggregates nor uses DISTINCT, the order-by should also see the columns that existed before the projection. In other words, users can sort the returned rows by a column that is not explicitly used in the return clause.
This behavior was not implemented before this PR. Currently, order-by can only see columns that are either explicitly declared in the return clause or produced by return *.
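The scoping rule above can be sketched as a small function, assuming the preceding projection is summarized by its input columns, its output columns, and two flags. `ProjectionInfo` and `orderByScope` are hypothetical names. Per the maintainers' comment in the conversation, this behavior was ultimately not adopted, so the sketch only illustrates the proposed rule.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical summary of the projection (WITH/RETURN) that precedes ORDER BY.
struct ProjectionInfo {
  std::set<std::string> inputColumns;   // columns available before the projection
  std::set<std::string> outputColumns;  // columns the projection returns
  bool aggregating = false;
  bool distinct = false;
};

// Columns that ORDER BY may reference under the proposed rule: if the
// projection keeps the row count (no aggregation, no DISTINCT), the
// pre-projection columns are visible too; otherwise only the projected ones.
std::set<std::string> orderByScope(const ProjectionInfo &p) {
  std::set<std::string> scope = p.outputColumns;
  if (!p.aggregating && !p.distinct) {
    scope.insert(p.inputColumns.begin(), p.inputColumns.end());
  }
  return scope;
}
```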
How do you solve it?
Special notes for your reviewer, ex. impact of this fix, design document, etc:
Checklist:
Tests:
Affects:
Release notes:
Please confirm whether to be reflected in release notes and how to describe: