Bug-fix: MemoryExec sort expressions do NOT refer to the projected schema #12876
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,6 +35,7 @@ use datafusion_execution::memory_pool::MemoryReservation; | |
use datafusion_execution::TaskContext; | ||
use datafusion_physical_expr::{EquivalenceProperties, LexOrdering}; | ||
|
||
use datafusion_physical_expr::utils::collect_columns; | ||
use futures::Stream; | ||
|
||
/// Execution plan for reading in-memory batches of data | ||
|
@@ -207,6 +208,20 @@ impl MemoryExec { | |
/// [`EquivalenceProperties`], we can keep track of these equivalences | ||
/// and treat `a ASC` and `b DESC` as the same ordering requirement. | ||
pub fn with_sort_information(mut self, sort_information: Vec<LexOrdering>) -> Self { | ||
// All sort expressions must refer to the projected schema | ||
debug_assert!({ | ||
Rather than a debug assert, I think it is worth considering changing

Done. I have also inserted the logic into

makes sense -- thank you
||
let fields = self.projected_schema.fields(); | ||
sort_information | ||
.iter() | ||
.flatten() | ||
.flat_map(|expr| collect_columns(&expr.expr)) | ||
.all(|col| { | ||
fields | ||
.get(col.index()) | ||
.map(|field| field.name() == col.name()) | ||
.unwrap_or(false) | ||
}) | ||
}); | ||
self.sort_information = sort_information; | ||
|
||
// We need to update equivalence properties when updating sort information. | ||
|
Instead of adjusting the ordering here, I wonder if it would be less code / "just work" if you applied the projection to `self.sort_order` first? The normal output properties calculation should work, I think 🤔
I don't think I can simply apply a straightforward projection to `self.sort_order`, as they may still require special handling. For example, if we have an ordering `[a, b, c]` and the projection excludes column `a`, the remaining ordering `[b, c]` would not be valid. To avoid missing edge cases like this, I prefer consulting the equivalence API to ensure correctness.
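The edge case in this reply can be checked with a small sketch that uses only the Rust standard library. It does not call DataFusion's equivalence API, and both `is_sorted_by_cols` and `project_ordering` are hypothetical helpers introduced here for illustration, with rows modeled as plain integer triples. Under those assumptions, it shows that data sorted by `[a, b, c]` is not necessarily sorted by `[b, c]`, and that a conservative projection keeps only the leading prefix of the ordering whose columns survive.

```rust
/// Hypothetical helper: returns true if `rows` is sorted lexicographically
/// (ascending) by the given column indices.
fn is_sorted_by_cols(rows: &[[i32; 3]], cols: &[usize]) -> bool {
    rows.windows(2).all(|w| {
        let left: Vec<i32> = cols.iter().map(|&c| w[0][c]).collect();
        let right: Vec<i32> = cols.iter().map(|&c| w[1][c]).collect();
        // Vec comparison is lexicographic.
        left <= right
    })
}

/// Hypothetical helper: keep the longest prefix of `ordering` whose columns
/// are all present in `kept`. Once a column is missing, every later column
/// is dropped too, because it is only ordered within ties of the missing one.
fn project_ordering<'a>(ordering: &[&'a str], kept: &[&str]) -> Vec<&'a str> {
    ordering
        .iter()
        .copied()
        .take_while(|col| kept.contains(col))
        .collect()
}

fn main() {
    // Rows of (a, b, c), sorted by [a, b, c].
    let rows = [[1, 2, 9], [1, 3, 0], [2, 1, 5], [2, 1, 7]];
    assert!(is_sorted_by_cols(&rows, &[0, 1, 2]));

    // After dropping `a`, the same rows are not sorted by [b, c]:
    // `b` decreases from 3 to 1 between the second and third rows.
    assert!(!is_sorted_by_cols(&rows, &[1, 2]));

    // Projecting away `a` leaves no valid ordering in this simplified model.
    assert!(project_ordering(&["a", "b", "c"], &["b", "c"]).is_empty());

    // Projecting away `c` keeps the prefix [a, b].
    assert_eq!(project_ordering(&["a", "b", "c"], &["a", "b"]), vec!["a", "b"]);
}
```

This prefix rule is deliberately conservative: it ignores column equivalences and constant columns, which can keep more of an ordering valid after projection, and that is the kind of case the reply defers to the equivalence API for.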