CLICKHOUSE-3547 streaming histogram aggregation #2521
242371c to a9333b3
UInt32 bins_count;

#define READ(VAL, PARAM) \
    VAL = applyVisitor(FieldVisitorConvertToNumber<decltype(VAL)>(), PARAM);
The macro is used in only one place. Why not write bins_count = applyVisitor(FieldVisitorConvertToNumber<UInt32>(), params[0]); directly?
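The reviewer's point can be illustrated outside ClickHouse with a small sketch. Here `Field`, `ConvertToNumber` and `convertTo` are hypothetical stand-ins built on `std::variant`, not the real `Field`, `FieldVisitorConvertToNumber` or `applyVisitor`; the sketch only shows that naming the target type explicitly makes a `decltype`-based macro unnecessary.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Hypothetical stand-in for a dynamically typed parameter value.
using Field = std::variant<std::uint64_t, std::int64_t, double>;

// Visitor that converts whichever numeric alternative is stored to T.
template <typename T>
struct ConvertToNumber
{
    template <typename U>
    T operator()(const U & x) const { return static_cast<T>(x); }
};

// Direct call with an explicit target type; no macro is needed.
template <typename T>
T convertTo(const Field & field)
{
    return std::visit(ConvertToNumber<T>(), field);
}
```

With this shape, a call such as `convertTo<std::uint32_t>(params[0])` states the destination type at the call site.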
{
    if (params.size() != 1)
    {
        throw Exception("Function " + name + " requires only bins count");
It's better to add an ErrorCode.
It's not obvious that the bins count is a parameter (not an argument) and that it must be specified. For example:
:) select histogram(number) from (select * from system.numbers limit 20);
SELECT histogram(number)
FROM
(
SELECT *
FROM system.numbers
LIMIT 20
)
Received exception from server (version 1.1.54386):
Code: 0. DB::Exception: Received from localhost:9004, ::1. DB::Exception: Function histogram requires only bins count.
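A rough sketch of the kind of validation being asked for, written as standalone C++ rather than against ClickHouse's own `Exception` and `ErrorCodes`. The names `Param`, `ErrorCode`, `ParameterError` and `parseBinsCount` are assumptions made for illustration, and the upper limit of 256 is taken from a suggestion that appears later in this review.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a parameter value; not ClickHouse's Field.
using Param = std::variant<std::uint64_t, std::int64_t, double, std::string>;

// Hypothetical error codes, named for illustration only.
enum class ErrorCode { BadParameterCount = 1, BadParameterType = 2, ParameterOutOfBound = 3 };

struct ParameterError : std::runtime_error
{
    ErrorCode code;
    ParameterError(ErrorCode code_, const std::string & message)
        : std::runtime_error(message), code(code_) {}
};

std::uint32_t parseBinsCount(const std::string & name, const std::vector<Param> & params)
{
    // The message says that a *parameter* is expected and shows the call shape.
    if (params.size() != 1)
        throw ParameterError(ErrorCode::BadParameterCount,
            "Aggregate function " + name + " requires exactly one parameter (bins count), e.g. "
                + name + "(10)(x)");

    // Reject non-integer parameters such as 0.2 or 'abc' instead of converting silently.
    const auto * value = std::get_if<std::uint64_t>(&params[0]);
    if (!value)
        throw ParameterError(ErrorCode::BadParameterType,
            "Parameter of aggregate function " + name + " must be an unsigned integer");

    constexpr std::uint32_t max_allowed_bins = 256; // assumed limit
    if (*value == 0 || *value > max_allowed_bins)
        throw ParameterError(ErrorCode::ParameterOutOfBound,
            "Bins count for aggregate function " + name + " must be between 1 and "
                + std::to_string(max_allowed_bins));

    assert(*value <= max_allowed_bins); // the narrowing cast below is safe
    return static_cast<std::uint32_t>(*value);
}
```

Each failure carries a distinct code, so a client would no longer see `Code: 0` for a missing parameter.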
next[size] = 0;
previous[0] = size;

using QueueItem = std::pair<Mean, int>;
Why do we use int here? max_bins and size are UInt32.
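A small sketch of a queue item whose index type matches an unsigned size, assuming `Mean` is an alias for `double`. The function `closestAdjacentPair` is hypothetical and is not the code from this pull request; it only shows a min-heap of gaps keyed by an unsigned index, so no signed/unsigned mixing occurs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Mean = double; // assumption: the PR's Mean is a floating-point alias

// (gap to the next point, index of the left point); the index is unsigned like the sizes.
using QueueItem = std::pair<Mean, std::uint32_t>;

// Returns the index i that minimises sorted_means[i + 1] - sorted_means[i].
// Requires at least two points, sorted in ascending order.
std::uint32_t closestAdjacentPair(const std::vector<Mean> & sorted_means)
{
    assert(sorted_means.size() >= 2);

    // std::greater turns the default max-heap into a min-heap on the gap.
    std::priority_queue<QueueItem, std::vector<QueueItem>, std::greater<QueueItem>> queue;
    for (std::uint32_t i = 0; i + 1 < sorted_means.size(); ++i)
        queue.push(QueueItem{sorted_means[i + 1] - sorted_means[i], i});

    return queue.top().second;
}
```

Because pairs compare lexicographically, equal gaps are resolved in favour of the smaller index.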
{
    if (!points)
        init(arena);
    points[size++] = {value, weight};
We will use foreign memory if max_bins == 0.
max_bins = 0 is a strange case. I think the function should accept only a positive bins count.
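To make the concern concrete, here is a simplified sketch of a bounded streaming histogram that rejects a zero bins count up front, so the buffer always has room for the newly added point. `StreamingHistogram`, `WeightedPoint` and `mergeClosestPair` are hypothetical names, and the linear scan for the closest pair is a simplification chosen for brevity; it is not the linked-list and priority-queue approach used in this pull request.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct WeightedPoint
{
    double mean;
    double weight;
};

class StreamingHistogram
{
public:
    explicit StreamingHistogram(std::uint32_t max_bins_) : max_bins(max_bins_)
    {
        // With max_bins == 0 there would be no valid slot to write into.
        if (max_bins == 0)
            throw std::invalid_argument("bins count must be positive");
        points.reserve(static_cast<std::size_t>(max_bins) + 1);
    }

    void add(double value, double weight = 1.0)
    {
        // Keep points sorted by mean so that neighbours are adjacent.
        auto pos = std::lower_bound(points.begin(), points.end(), value,
            [](const WeightedPoint & p, double v) { return p.mean < v; });
        points.insert(pos, WeightedPoint{value, weight});

        if (points.size() > max_bins)
            mergeClosestPair();
    }

    const std::vector<WeightedPoint> & bins() const { return points; }

private:
    // Replaces the two closest neighbours with their weighted mean.
    void mergeClosestPair()
    {
        assert(points.size() >= 2);
        std::size_t best = 0;
        for (std::size_t i = 1; i + 1 < points.size(); ++i)
            if (points[i + 1].mean - points[i].mean < points[best + 1].mean - points[best].mean)
                best = i;

        WeightedPoint & left = points[best];
        const WeightedPoint & right = points[best + 1];
        const double total = left.weight + right.weight;
        left.mean = (left.mean * left.weight + right.mean * right.weight) / total;
        left.weight = total;
        points.erase(points.begin() + static_cast<std::ptrdiff_t>(best + 1));
    }

    std::uint32_t max_bins;
    std::vector<WeightedPoint> points;
};
```

The buffer holds at most `max_bins + 1` points for a moment, which is why the constructor reserves one extra slot.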
{
    throw Exception("Function " + name + " requires only bins count");
}
assertUnary(name, arguments);
We also need to check that the argument is numeric.
…into ssmike-CLICKHOUSE-3547
Simple test for performance and precision:
It's about 8.5 million rows/sec on a single core.
Looks like the results are wrong:
The values don't look real on this query:
TODO: allocate all state inplace, get rid of |
The type of parameter is not checked:
Since we preallocate memory for the maximum size of the histogram, it's better to limit the bins count to 256 (or maybe 1000).
Incorrect error message:
This should throw an exception:
SELECT topK(0.2)(number) FROM (SELECT * FROM system.numbers LIMIT 50)
Received exception from server (version 1.1.54387): ¯\_(ツ)_/¯
OK, I've fixed the limits. But I don't understand what you expect to see in a histogram for a single value.
efbe03f to 877acef
I am confused about how to use it for the following scenario. I have a table with two columns: the route name and the response time as a duration. I want to see a histogram of the response time with a maximum duration of 5 seconds and 10 buckets. Is this possible?
Here we applied |
@alexey-milovidov Awesome. That makes sense. Let's say If don't use the
Here I understand that there will be 10 ranges, but what is the upper limit?
The histogram will span from the actual minimum to the actual maximum of the data values.
@alexey-milovidov Great, thanks.
Hi, is it possible for histogram to return fixed-size ranges for the number of buckets specified, for example 0-5, 5-10, 10-15, ...? That would be very helpful. Thanks.
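For readers unsure what fixed-size ranges mean in contrast to the adaptive bins discussed above, the following standalone sketch counts values into equal-width buckets over a caller-supplied range. It is not the histogram function from this pull request; `fixedWidthHistogram` and its clamping of out-of-range values into the edge buckets are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts values into bucket_count equal-width buckets covering [lower, upper).
// Values outside the range are clamped into the first or the last bucket.
std::vector<std::size_t> fixedWidthHistogram(
    const std::vector<double> & values, double lower, double upper, std::size_t bucket_count)
{
    assert(bucket_count > 0 && lower < upper);

    const double width = (upper - lower) / static_cast<double>(bucket_count);
    std::vector<std::size_t> counts(bucket_count, 0);

    for (double v : values)
    {
        std::size_t index = 0;
        if (v >= upper)
            index = bucket_count - 1;
        else if (v > lower)
            index = static_cast<std::size_t>((v - lower) / width);

        // Guard against floating-point rounding at the upper edge.
        if (index >= bucket_count)
            index = bucket_count - 1;

        ++counts[index];
    }
    return counts;
}
```

For response times capped at 5 seconds with 10 buckets, each bucket is 0.5 seconds wide, and a 7.5-second response lands in the last bucket.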
@alexey-milovidov We have the following table
We would like something like
We would like to get both the sum of the counts and the histogram together.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en