feat(experiments): Initial data warehouse Trend support #26356
Conversation
Size Change: 0 B. Total Size: 1.16 MB.
📸 UI snapshots have been updated: 2 snapshot changes in total (0 added, 2 modified, 0 deleted). Triggered by this commit.
@@ -226,7 +255,63 @@ def calculate(self) -> ExperimentTrendsQueryResponse:
def run(query_runner: TrendsQueryRunner, result_key: str, is_parallel: bool):
    try:
        result = query_runner.calculate()

database = create_hogql_database(team_id=self.team.pk)
I just want to confirm my understanding: the purpose of this is to create a `Database` instance, which is an object that lets you run HogQL queries across various data sources, like person-events or data warehouse tables. We do this because we need to build a custom context with our own `Database`, including our own table where we define the join in a custom way. Then, we pass this custom context to the query runner.

Please check if the above is correct and leave a comment in the code :)
> We do this because we need to build a custom context with our own `Database`, including our own table where we define the join in a custom way. Then, we pass this custom context to the query runner.

This is correct :) Left a comment in a07bb2a
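For readers following the thread, a minimal sketch of that flow (not the PR's exact code; `team` and `series.table_name` are stand-ins, and the exact way the context reaches `TrendsQueryRunner` may differ):

```python
# Sketch only: build a team-scoped HogQL Database, customize how the warehouse table
# joins onto events, and hand the customized database to the query layer via a HogQLContext.
from posthog.hogql.context import HogQLContext
from posthog.hogql.database.database import create_hogql_database

database = create_hogql_database(team_id=team.pk)        # all HogQL-queryable tables for this team
warehouse_table = database.get_table(series.table_name)  # stand-in lookup of the configured warehouse table

# ... attach a custom join (join_table / join_function) to warehouse_table here,
#     as shown in the diff below ...

context = HogQLContext(team_id=team.pk, enable_select_queries=True, database=database)
# Queries resolved with this context see our customized database instead of a freshly built default one.
```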
join_table=database.get_table("events"),
join_function=lambda join_to_add, context, node: (
    ast.JoinExpr(
        table=ast.SelectQuery(
This is currently joining over all events, but we should join only over the `$feature_flag_called` events. We probably need a `where` clause here.

In the future we should also support custom exposure events here, but for now feel free to hardcode `$feature_flag_called` and leave a comment.
Good catch, handled with f131df9
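For reference, the restriction discussed above could look roughly like this; a sketch against the `ast.SelectQuery` from the diff, with an illustrative select list rather than the PR's exact columns:

```python
# Sketch only: limit the joined events to the hardcoded exposure event.
from posthog.hogql import ast

select_query = ast.SelectQuery(
    select=[
        ast.Field(chain=["timestamp"]),
        ast.Field(chain=["distinct_id"]),
        ast.Field(chain=["properties"]),
    ],
    select_from=ast.JoinExpr(table=ast.Field(chain=["events"])),
    # Only $feature_flag_called events define exposure for now; custom exposure
    # events can be supported later.
    where=ast.CompareOperation(
        op=ast.CompareOperationOp.Eq,
        left=ast.Field(chain=["event"]),
        right=ast.Constant(value="$feature_flag_called"),
    ),
)
```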
            ],
            select_from=ast.JoinExpr(table=ast.Field(chain=["events"])),
        ),
        join_type="ASOF LEFT JOIN",
I'd love to have a comment here explaining what we're doing, especially because some of how the ASOF JOIN works is only explicit in the ClickHouse docs:
# ASOF JOIN finds the most recent matching event that occurred at or before each data warehouse timestamp.
#
# Why this matters:
# When a user performs an action (recorded in data warehouse), we want to know which
# experiment variant they were assigned at that moment. The most recent $feature_flag_called
# event before their action represents their active variant assignment.
#
# Example:
# Data Warehouse: timestamp=2024-01-03 12:00, distinct_id=user1
# Events:
#     2024-01-02 10:00: (user1, variant='control') <- Joined (most recent at or before 12:00)
#     2024-01-03 18:00: (user1, variant='test')    <- Ignored (occurs after the warehouse timestamp)
#
# This ensures we capture the correct causal relationship: which experiment variant
# was the user assigned to when they performed the action?
(feel free to adjust)
Good suggestion, added the `ASOF LEFT JOIN` comment in e435442
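For anyone unfamiliar with ASOF joins, here is a tiny, self-contained Python illustration of the same semantics (not code from this PR, illustrative data only):

```python
# For each warehouse row, pick the most recent exposure at or before its timestamp for
# the same distinct_id; LEFT means rows with no prior exposure keep a None match.
from datetime import datetime

exposure_events = [
    # (timestamp, distinct_id, variant) from $feature_flag_called events
    (datetime(2024, 1, 2, 10, 0), "user1", "control"),
    (datetime(2024, 1, 3, 18, 0), "user1", "test"),
]
warehouse_rows = [
    # (timestamp, distinct_id) from the data warehouse table
    (datetime(2024, 1, 3, 12, 0), "user1"),
]

for row_ts, row_id in warehouse_rows:
    candidates = [e for e in exposure_events if e[1] == row_id and e[0] <= row_ts]
    match = max(candidates, key=lambda e: e[0], default=None)
    print(row_ts, row_id, "->", match[2] if match else None)  # -> control (the 2024-01-02 exposure)
```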
@@ -376,6 +478,127 @@ def test_query_runner_with_holdout(self):
        self.assertEqual(test_result.absolute_exposure, 9)
        self.assertEqual(holdout_result.absolute_exposure, 4)

    def test_query_runner_with_data_warehouse_series(self):
Good tests! I'd also add some more checks for the join behavior (a rough sketch follows after this list):
- Test that we correctly join only the `$feature_flag_called` events: add some other event that's closer to the data warehouse record and check that it is ignored
- Cases where there are no preceding `$feature_flag_called` events shouldn't be joined
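A rough sketch of what those two checks might look like; `create_data_warehouse_row`, `run_experiment_query`, and the result accessors are placeholders for whatever fixtures the test file already uses, not functions from this PR:

```python
# Sketch only: methods intended to sit inside the existing experiment trends test class.
from posthog.test.base import _create_event

def test_only_feature_flag_called_events_are_joined(self):
    # An unrelated event closer in time to the warehouse row must NOT win the ASOF join.
    _create_event(team=self.team, event="$feature_flag_called", distinct_id="user1",
                  timestamp="2024-01-02T10:00:00Z",
                  properties={"$feature_flag": "my-flag", "$feature_flag_response": "control"})
    _create_event(team=self.team, event="$pageview", distinct_id="user1",
                  timestamp="2024-01-03T11:00:00Z")
    self.create_data_warehouse_row(distinct_id="user1", timestamp="2024-01-03T12:00:00Z")  # placeholder helper

    result = self.run_experiment_query()  # placeholder helper
    self.assertEqual(result.variant_for("user1"), "control")  # placeholder accessor

def test_rows_without_preceding_exposure_are_not_joined(self):
    # No $feature_flag_called at or before the warehouse timestamp -> the row should
    # not be attributed to any variant.
    self.create_data_warehouse_row(distinct_id="user2", timestamp="2024-01-01T12:00:00Z")  # placeholder helper
    _create_event(team=self.team, event="$feature_flag_called", distinct_id="user2",
                  timestamp="2024-01-05T10:00:00Z",
                  properties={"$feature_flag": "my-flag", "$feature_flag_response": "test"})

    result = self.run_experiment_query()  # placeholder helper
    self.assertNotIn("user2", result.exposed_users())  # placeholder accessor
```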
Fantastic work, this was very easy to follow! A few comments to address :)
🚢 it!
Suspect Issues: This pull request was deployed and Sentry observed related issues.
See #26332
Changes
Allows a data warehouse table to be assigned to a Trend experiment, and then uses the data warehouse table to calculate experiment results.
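As a rough illustration, the count query's series can now point at a warehouse table instead of an event; the field names below are an approximation of PostHog's `DataWarehouseNode` schema, not verified against this PR:

```python
# Sketch only: a Trends series backed by a data warehouse table instead of an event.
from posthog.schema import DataWarehouseNode, TrendsQuery

count_query = TrendsQuery(
    series=[
        DataWarehouseNode(
            id="purchases",                   # illustrative table identifier
            table_name="purchases",           # warehouse table to count rows from
            id_field="id",
            timestamp_field="created_at",     # used for the ASOF join against exposure events
            distinct_id_field="customer_id",  # maps warehouse rows to PostHog distinct_ids
        )
    ]
)
```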
Includes tests for a Trend experiment where:
How did you test this code?
Tests passed, and a fair amount of manual evaluation.