Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(funnels): speed up funnel queries by preventing properties being passed across shards #13935

Merged
merged 46 commits into from
Jan 30, 2023

Conversation

yakkomajuri
Copy link
Contributor

@yakkomajuri yakkomajuri commented Jan 26, 2023

Problem

#13922

Changes

Derive the funnel steps in the innermost subquery in order to prevent sending full properties across shards. This can speed up queries by up to 3-5x (estimate based on the fact that the slower the query today the greater the gains).

The current query will send data in the order of megabytes to the other shard, whereas the previous query would send gigabytes and gigabytes (see e.g. first pic).

Benchmarks:

Screenshot 2023-01-26 at 11 23 01

See more
Screenshot 2023-01-25 at 14 44 26 Screenshot 2023-01-25 at 14 44 36

Note that I chose the path of least resistance to implement this. I think there's a refactor in order to simplify a lot of the logic (which is very messy) but I decided not to do that here. I kept things mostly as they are, with few changes.

A bunch of stale snapshots got deleted as a result of my testing.

How did you test this code?

Existing tests for funnels are all passing. Also tested manually.

Marked it as a feat because of the perf improvement but from the perspective of the query results it's just a refactor - no behavior should change.

@@ -30,8 +30,11 @@
min(latest_1) over (PARTITION by aggregation_target
ORDER BY timestamp DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 0 PRECEDING) latest_1
FROM
(SELECT aggregation_target,
timestamp,
(SELECT e.event as event,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is the one showing the flattening of the query clearly.

@yakkomajuri
Copy link
Contributor Author

Well.... Seems all's good for the "traditional funnel" and f'd for everything else :D

@yakkomajuri yakkomajuri marked this pull request as draft January 26, 2023 15:30
@yakkomajuri
Copy link
Contributor Author

Seems only unordered is the problem (besides this one tricky thing with exclusions that I know how to fix), looking into it

+ "{step_filter}"
)

query = f"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why split the overall thing up? IMO not adding too much clarity here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We spoke about this on a call - essentially there are ordering requirements with the functions building this query (as they have side effects) and this was the cleanest way to build things now without a large refactor

query = f"""
SELECT {', '.join(_fields)} FROM events {self.EVENT_TABLE_ALIAS}
select_fragment = (
f"SELECT {', '.join(_fields)}" + "{extra_select_fields}" + f" FROM events {self.EVENT_TABLE_ALIAS}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do the following in f-strings: {{extra_select_fields}} if really needed, though the logic behind it is sus.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to this

@yakkomajuri yakkomajuri marked this pull request as ready for review January 27, 2023 15:41
@yakkomajuri
Copy link
Contributor Author

Ok this is ready to go @macobo

Comment on lines +154 to +155
pdi.person_id as aggregation_target,
pdi.person_id as person_id ,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we certainly don't need this :D

Kinda don't wanna touch this PR anymore though so will fix in a follow up if OK

parsed_extra_fields = f", {', '.join(extra_fields)}" if extra_fields else ""

event_query, params = FunnelEventQuery(
inner_query, params = FunnelEventQuery(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Should we name this events_query?

extra_join = ""

if self._filter.breakdown:
if self._filter.breakdown_type == "cohort":
extra_join = self._get_cohort_breakdown_join()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Let's remove this empty line

Suggested change

event_query=event_query,
extra_select_fields = f", {', '.join(all_step_cols)}" if all_step_cols else ""

inner_query = inner_query.format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Should we name this events_query or funnel_events_query? inner_query had me going in cycles since it was ambigious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remnant of the old implementation where the events_query had an inner_query. Will rename

@macobo
Copy link
Contributor

macobo commented Jan 30, 2023

Amazing work!

@yakkomajuri
Copy link
Contributor Author

I'll get this in because every change I've been making seems to be a bit of a struggle with checks failing due to API limits and what not. Want to get this out of sight out of mind :D Will put a PR right away addressing the review points though @macobo

@yakkomajuri yakkomajuri merged commit 531c367 into master Jan 30, 2023
@yakkomajuri yakkomajuri deleted the funnel-props branch January 30, 2023 11:22
tomasfarias pushed a commit that referenced this pull request Jan 31, 2023
… passed across shards (#13935)

* feat(funnels): speed up funnel queries by preventing properties being passed across shards

* Update snapshots

* Update snapshots

* testing

* Update snapshots

* Update snapshots

* new approach

* Update snapshots

* Update snapshots

* Update snapshots

* remove more unnecessary fields

* update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* remove more cols and see :)

* remove event_columns_to_query

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* update query building

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

* Update snapshots

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants