-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specialize Median Accumulator #7376
Conversation
.with_expected_errors(vec![ | ||
"Resources exhausted: Failed to allocate additional", | ||
"AggregateStream", | ||
]) | ||
.with_memory_limit(20_000) | ||
.with_memory_limit(2_000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to lower this memory limit, as it is significantly more space-efficient now
@@ -68,12 +68,12 @@ async fn oom_sort() { | |||
#[tokio::test] | |||
async fn group_by_none() { | |||
TestCase::new() | |||
.with_query("select median(image) from t") | |||
.with_query("select median(request_bytes) from t") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to request_bytes as median isn't well defined for string types, this would only "work" if there happened to be an odd number of values, or the query errored (as in this case)
❯ create table test(c varchar) as values ('foo'), ('world'), ('hello');
0 rows in set. Query took 0.005 seconds.
❯ select median(c) from test;
+----------------+
| MEDIAN(test.c) |
+----------------+
| hello |
+----------------+
1 row in set. Query took 0.005 seconds.
❯ insert into test values ('bar');
+-------+
| count |
+-------+
| 1 |
+-------+
1 row in set. Query took 0.002 seconds.
❯ select median(c) from test;
Internal error: Operator + is not implemented for types Utf8("foo") and Utf8("hello"). This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
data_type: DataType, | ||
arrays: Vec<ArrayRef>, | ||
all_values: Vec<ScalarValue>, | ||
all_values: Vec<T::Native>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
None | ||
} else if len % 2 == 0 { | ||
let (low, high, _) = d.select_nth_unstable_by(len / 2, cmp); | ||
let (_, low, _) = low.select_nth_unstable_by(low.len() - 1, cmp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
Which issue does this PR close?
Part of #6842
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?