Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(hogql): properties between shards #14549

Merged
merged 18 commits into from
Mar 8, 2023
Merged

feat(hogql): properties between shards #14549

merged 18 commits into from
Mar 8, 2023

Conversation

mariusandra
Copy link
Collaborator

@mariusandra mariusandra commented Mar 3, 2023

Problem

Accessing an event's person's properties in HogQL would fetch the entire properties object and send it back to us the main select query.

Changes

Extracts properties within the person subquery and only returns those. Cutting down a lot on traffic between shards.

I do not have benchmarks, but looking at the queries, the effects should be similar to this: #13935

How did you test this code?

Wrote tests. Checked that all worked in the browser.

@PostHog PostHog deleted a comment from posthog-bot Mar 6, 2023
Copy link
Collaborator Author

@mariusandra mariusandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for a look

@@ -237,9 +237,38 @@
/* user_id:0 request:_snapshot_ */
SELECT event,
events.distinct_id,
replaceRegexpAll(JSONExtractRaw(events.properties, 'key'), '^"|"$', ''),
replaceRegexpAll(JSONExtractRaw(properties, 'key'), '^"|"$', ''),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously events. was added before properties to disambiguate it from the events__pdi__person.properties column that we selected in the join. That's no longer the case and thus it seems like it was automatically removed.


context = HogQLContext(within_non_hogql_query=False, using_person_on_events=True)
self.assertEqual(
# TODO: for now, explicitly writing "poe." to opt in. Automatic switching will come soon.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HogQL is not yet able to turn PoE (in its current form) on or off depending on the team's settings. It is doing this inside existing insights (non-hogql queries), but not in full hogql queries. This is planned for another PR.

@mariusandra mariusandra marked this pull request as ready for review March 6, 2023 19:36
@mariusandra mariusandra requested a review from a team March 6, 2023 19:36
Copy link
Member

@Twixes Twixes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good!

Comment on lines +36 to +37
"person.properties.name",
"person.properties.email",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change from person.properties?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly for speed. I don't think we need to return the entire properties object with every event. There are rumours of possible customers incoming with 10k+ properties per person, etc. This makes it more in line with what selecting "person" returns in this table.

@mariusandra mariusandra merged commit 63a9453 into master Mar 8, 2023
@mariusandra mariusandra deleted the property-shard branch March 8, 2023 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants