add aggregate functions IntersectionsMax and IntersectionsMaxPos #2012
A few points:
Currently we name our functions in
NULL values are skipped before being passed to aggregate functions.
It looks very specific to reserve zero value as something special... Motivation:
BTW, this example is incorrect, because you don't use the database test in the second query.
I don't understand how the aggregate function can return two rows.
Missing documentation in code (
This is wrong, given that you have two different functions.
The function
The user could expect that the function will work for floats and signed numbers.
That's absolutely incorrect: you cannot insert two values in a column as a result.
Missing performance test.
I've slightly changed the algorithm. Please take a look.
Thank you for so many fixes. I see that the algorithm was completely changed. I'm not sure which approach is better, so I would wait until the build in master is fixed and then compare them.
My algorithm will use more memory if there are many intervals that start or end at the same second. We can improve this later by adding a "compress" function that sorts the array and accumulates duplicate values (or does batch insertion into the already sorted part of the array); we can call this function when the array becomes large (and also before serialization of the state, to save network traffic). My algorithm will also be worse if you have a small number of heavily duplicated intervals, for example millions of rows with just a few unique intervals. Using plain array should be better for the following reasons:
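A rough sketch of the "compress" step described in the comment above, written in Python for brevity rather than in the project's C++. It assumes the aggregation state is a plain array of (point, weight) pairs, with weight +1 for an interval start and -1 for an interval end; that representation and the name compress_events are hypothetical and not taken from the PR's code.

```python
from typing import List, Tuple

Event = Tuple[int, int]  # (point, weight); hypothetical state layout


def compress_events(events: List[Event]) -> List[Event]:
    """Sort events by point and sum the weights of events sharing a point.

    Assumption: only the net weight at each point matters, which holds
    when intervals are treated as half-open [start, end).
    """
    merged: List[Event] = []
    for point, weight in sorted(events):
        if merged and merged[-1][0] == point:
            # Same point as the previous entry: accumulate the weight.
            merged[-1] = (point, merged[-1][1] + weight)
        else:
            merged.append((point, weight))
    return merged


# Five raw events collapse into three entries.
compressed = compress_events([(5, -1), (1, 1), (1, 1), (5, 1), (3, 1)])
```

Called when the array grows large, or just before the state is serialized, a step like this would bound the state size by the number of distinct points instead of the number of rows.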
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
New aggregate functions:
IntersectionsMax(start_column, end_column)
IntersectionsMaxPos(start_column, end_column)
IntersectionsMax returns the maximum count of intersecting intervals defined by the start_column and end_column values.
If the start_column value is NULL or 0, the interval is skipped.
If the end_column value is NULL or 0, the interval is considered to have no end (interval (3,0) in the example below).
IntersectionsMaxPos additionally returns the position where the maximum number of intersecting intervals is found.
A typical application for these functions (and the reason they were implemented) is to count the maximum number of simultaneously active calls in a given time frame from stored call detail records that carry timestamps for each call's start and end.
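The semantics described above can be illustrated with a small sweep over sorted start/end events. This is a minimal Python sketch, not the PR's C++ implementation. The function name intersections_max_pos is hypothetical, and treating intervals as half-open [start, end) is an assumption, since the description does not say how an end that coincides with another interval's start is counted.

```python
from typing import Iterable, List, Optional, Tuple


def intersections_max_pos(
    intervals: Iterable[Tuple[Optional[int], Optional[int]]]
) -> Tuple[int, Optional[int]]:
    """Return (max overlap count, first position where it is reached)."""
    events: List[Tuple[int, int]] = []
    for start, end in intervals:
        if not start:   # start is NULL (None) or 0: skip the interval
            continue
        events.append((start, 1))
        if end:         # end is NULL (None) or 0: the interval never ends
            events.append((end, -1))

    # Tuples sort by point, then weight, so an end (-1) is processed before
    # a start (+1) at the same point. Assumption: half-open [start, end).
    events.sort()

    best, best_pos, current = 0, None, 0
    for pos, delta in events:
        current += delta
        if current > best:
            best, best_pos = current, pos
    return best, best_pos


# (3, 0) has no end; (0, 4) and (None, 9) are skipped.
result = intersections_max_pos(
    [(1, 5), (2, 6), (3, 0), (0, 4), (None, 9), (6, 8)]
)
```

In this sketch the first element of the result corresponds to what IntersectionsMax would report and the second to the extra position reported by IntersectionsMaxPos; for the sample input the overlap peaks at three intervals starting at position 3.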
example:
intervals in the table after the insert:
functions output: