Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify cohort filters #6277

Merged
merged 29 commits into from
Oct 8, 2021
Merged

Simplify cohort filters #6277

merged 29 commits into from
Oct 8, 2021

Conversation

macobo
Copy link
Contributor

@macobo macobo commented Oct 6, 2021

Changes

Follow-up to #6221

This PR starts "simplifying" cohort filters. The idea is to have the business logic bits happen before we do query building, which allows to do better optimizations, e.g. avoid unneeded subqueries.

The cases that got simplified are:

  • Cohort has person property filters - we add these filters to the filter object
  • Cohort would query precalculated/static cohort tables - we add that query.
    • There was one sessions query which did not join distinct_ids -> needed to take special care here.

Not everything is simplified yet. There's 3 cases which I'd love to tackle in a follow-up:

  • When the cohort does not exist anymore
  • Cohort has action filter (e.g. has done X in timerange)
  • Cohort has multiple filters combined with an OR

Once all of these are done we can "delete" a lot of special-case query building logic :)

One benchmark got slower. This is because a large column got communicated between threads which was previously used only in a subquery. This will be improved again by #5850 which this is a prerequisite for.

How did you test this code?

See test coverage. I verified all of insights + sessions work with the new cohort cases.

@timgl timgl temporarily deployed to posthog-pr-6277 October 6, 2021 11:55 Inactive
@macobo macobo changed the base branch from master to simplify-test-accounts-2 October 6, 2021 11:56
@macobo macobo temporarily deployed to posthog-pr-6277 October 6, 2021 12:28 Inactive
@macobo macobo temporarily deployed to posthog-pr-6277 October 6, 2021 12:56 Inactive
@macobo macobo temporarily deployed to posthog-pr-6277 October 6, 2021 13:54 Inactive
@macobo macobo force-pushed the simplify-test-accounts-2 branch from ca1232b to d941d6c Compare October 7, 2021 06:17
@macobo macobo force-pushed the simplify-cohort-filters branch from f641a62 to 5b4feaf Compare October 7, 2021 06:17
@macobo macobo force-pushed the simplify-cohort-filters branch from 5b4feaf to 3e21d6a Compare October 7, 2021 06:26
@macobo macobo temporarily deployed to posthog-pr-6277 October 7, 2021 06:27 Inactive
@macobo macobo temporarily deployed to posthog-pr-6277 October 7, 2021 08:27 Inactive
@macobo macobo requested a deployment to posthog-pr-6277 October 7, 2021 08:37 Abandoned
@macobo macobo force-pushed the simplify-test-accounts-2 branch from 92bf70b to 0a495c3 Compare October 7, 2021 08:51
@macobo macobo force-pushed the simplify-cohort-filters branch from 444911d to cfcb484 Compare October 7, 2021 08:51
@macobo macobo temporarily deployed to posthog-pr-6277 October 7, 2021 08:52 Inactive
@macobo macobo temporarily deployed to posthog-pr-6277 October 7, 2021 09:02 Inactive
@macobo macobo added the performance Has to do with performance. For PRs, runs the clickhouse query performance suite label Oct 7, 2021
@macobo macobo had a problem deploying to clickhouse-benchmarks October 7, 2021 09:04 Failure
@macobo macobo temporarily deployed to posthog-pr-6277 October 7, 2021 09:04 Inactive
@macobo macobo temporarily deployed to clickhouse-benchmarks October 7, 2021 10:30 Inactive
@macobo macobo temporarily deployed to posthog-pr-6277 October 7, 2021 10:31 Inactive
@macobo macobo temporarily deployed to posthog-pr-6277 October 7, 2021 10:36 Inactive
@github-actions
Copy link
Contributor

github-actions bot commented Oct 7, 2021

ClickHouse query benchmark results from GitHub Actions

Lower numbers are good, higher numbers are bad. A ratio less than 1
means a speed up and greater than 1 means a slowdown. Green lines
beginning with + are slowdowns (the PR is slower then master or
master is slower than the previous release). Red lines beginning
with - are speedups. Blank means no changes.

Significantly changed benchmark results (PR vs master)

       before           after         ratio
     [04045ff4]       [7203ca4e]
+  7617.0±2.2e+02  11564.0±2.7e+02     1.52  benchmarks.QuerySuite.track_trends_filter_by_cohort
-       7158.5±59   5905.5±1.4e+02     0.82  benchmarks.QuerySuite.track_trends_filter_by_cohort_materialized
Click to view full benchmark results
All benchmarks:

     before           after         ratio
   [04045ff4]       [7203ca4e]
19180.5±1.4e+03  20986.0±1.2e+03     1.09  benchmarks.QuerySuite.track_trends_event_property_filter
 4276.0±4.3e+02        4184.5±88     0.98  benchmarks.QuerySuite.track_trends_event_property_filter_materialized
+  7617.0±2.2e+02  11564.0±2.7e+02     1.52  benchmarks.QuerySuite.track_trends_filter_by_cohort
-       7158.5±59   5905.5±1.4e+02     0.82  benchmarks.QuerySuite.track_trends_filter_by_cohort_materialized
 5592.0±2.6e+02   5416.0±1.3e+02     0.97  benchmarks.QuerySuite.track_trends_filter_by_cohort_precalculated
      2074.0±40        2107.5±22     1.02  benchmarks.QuerySuite.track_trends_no_filter
     11185.5±72       11656.0±88     1.04  benchmarks.QuerySuite.track_trends_person_property_filter
      5879.5±65        6072.0±81     1.03  benchmarks.QuerySuite.track_trends_person_property_filter_materialized

@macobo macobo temporarily deployed to clickhouse-benchmarks October 7, 2021 10:39 Inactive
@macobo macobo temporarily deployed to clickhouse-benchmarks October 7, 2021 11:29 Inactive
@macobo macobo temporarily deployed to posthog-pr-6277 October 7, 2021 11:29 Inactive
@macobo macobo marked this pull request as ready for review October 7, 2021 11:49
@macobo macobo requested review from EDsCODE and neilkakkar October 7, 2021 11:54
@macobo macobo force-pushed the simplify-test-accounts-2 branch from 0a495c3 to 182837d Compare October 7, 2021 19:49
@macobo macobo force-pushed the simplify-cohort-filters branch from 4566e3b to 1692f4d Compare October 7, 2021 21:44
@macobo macobo had a problem deploying to clickhouse-benchmarks October 7, 2021 21:44 Failure
@macobo macobo temporarily deployed to posthog-pr-6277 October 7, 2021 21:45 Inactive
Copy link
Member

@EDsCODE EDsCODE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all points I had were covered in the description.

Thought I had while reading through this PR: would be nice to have some sort of full print of queries generated by certain functions. (We've talked about this previously but just thought of it again)

@macobo
Copy link
Contributor Author

macobo commented Oct 8, 2021

Thought I had while reading through this PR: would be nice to have some sort of full print of queries generated by certain functions. (We've talked about this previously but just thought of it again)

Absolutely. I'm waiting for the dice to fall on the way we'll write queries - if we'll use pytest going forward we can bring in a functioning snapshot library.

@macobo macobo merged commit 7461f90 into master Oct 8, 2021
@macobo macobo deleted the simplify-cohort-filters branch October 8, 2021 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Has to do with performance. For PRs, runs the clickhouse query performance suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants