Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up cohort property filtering #5854

Closed
macobo opened this issue Sep 8, 2021 · 6 comments
Closed

Speed up cohort property filtering #5854

macobo opened this issue Sep 8, 2021 · 6 comments
Labels
enhancement New feature or request feature/cohorts Feature Tag: Cohorts performance Has to do with performance. For PRs, runs the clickhouse query performance suite stale

Comments

@macobo
Copy link
Contributor

macobo commented Sep 8, 2021

Is your feature request related to a problem?

Cohort queries:

  1. Often result in unneeded subqueries which tank performance
  2. Don't make use of materialized columns

Describe the solution you'd like

Use top-level joins and materialized columns if possible.

To achieve this, I'd propose 'expanding' cohort filters to the appropriate Property objects before calling e.g. ClickhouseEventQuery.

There are 4 cases to consider:

  1. Cohort is a static cohort - can join with cohortpeople
  2. Cohort is precalculated - can join with cohortpeople
  3. Cohort has property filters - can expand it to the appropriate cohort filters.
    • Note that one of the properties might be person belonging to another cohort - need to handle recursion
  4. Cohort has action/event filter - proposal

1-2 requires adding a new property type. Proposal type=staticcohort
4 requires a new property type. User has done event/action in last X days. Proposal: Property type="action", add extra (optional) property fields for date range + target count.

Not sure how this will affect breakdowns, but similar work might make sense there.

Describe alternatives you've considered

Additional context

Related: #5461

This also paves the way for #2594

cc @Twixes who recently worked on this
cc @EDsCODE for property filtering thoughts

Thank you for your feature request – we love each and every one!

@macobo macobo added enhancement New feature or request performance Has to do with performance. For PRs, runs the clickhouse query performance suite feature/cohorts Feature Tag: Cohorts labels Sep 8, 2021
@Twixes
Copy link
Member

Twixes commented Sep 8, 2021

Not sure if there's a significant performance win for the 1, 2, and 4 here, but overall makes sense seeing the state of 3 (no support for mat columns the current way).
For 4 this should be called something else, as action already is a this-event-is-a-specific-action filter, while "has done" is historical. I propose type engagement, since that implies it being based on past events too + that's what we seem to call this kind of thing these days ("engagement cohorts").

@macobo
Copy link
Contributor Author

macobo commented Dec 17, 2021

Update on this: 1-2 are done.

3 is mostly done if a single condition is involved. For solving 3 completely, we depend on implementing "or property filtering" into our insights filtering. I couldn't find the ticket for this, but I know @clarkus has been designing for it.

For 4, we need to bring HAS DONE X in $timerange into our core analytics capabilities #2594

I don't think we should aim to solve this directly, instead should solve these prerequisites.

@macobo
Copy link
Contributor Author

macobo commented Jan 5, 2022

@EDsCODE @hazzadous Proposal: Let's expose the analytics capability in the API in the coming sprint and use it to close out this issue. This gets us to a more consistent state and unblocks adding these capabilities on the FE side as well.

@EDsCODE
Copy link
Member

EDsCODE commented Jan 5, 2022

The analytics capability meaning # 4?

@posthog-bot
Copy link
Contributor

This issue hasn't seen activity in two years! If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

@posthog-bot
Copy link
Contributor

This issue was closed due to lack of activity. Feel free to reopen if it's still relevant.

@posthog-bot posthog-bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request feature/cohorts Feature Tag: Cohorts performance Has to do with performance. For PRs, runs the clickhouse query performance suite stale
Projects
None yet
Development

No branches or pull requests

4 participants