-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AggregateFunctionRetention #2887
Conversation
Currently there are two main sql scripts that can calculate the retention, but they run very slowly.
Which costs 70 seconds on my machine to process 3 billion dataset. 2.use
Which costs 30 seconds on my machine to process same dataset. But we could use
It only cost 6 seconds to get the result. Related issues #2120 |
What about the following query?
|
Here is some detail test results on my new datasets.
|
auto & offsets_to = static_cast<ColumnArray &>(to).getOffsets(); | ||
|
||
const bool first_flag = this->data(place).events.test(0); | ||
data_to.insert(first_flag ? Field(static_cast<UInt64>(1)) : Field(static_cast<UInt64>(0))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can gain more performance if you cast array elements to ColumnUInt8 and access its data directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, nice advice.
Hmm, isn't this doable by CRDT tricks? a simple test shows that the plain SQL routine outperforms this UDAF.
|
👍 Great idea, But it gets more and more complicated as you add states. |
@amosbird That is really smart... 👍 But |
@sundy-li nice catch. the fix is trivial. Just replace |
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
This implements an AggregateFunction
retention
, which could be used for user retention rate analysis.