feat(experiments): Initial data warehouse Trend support #26356

danielbachhuber · 2024-11-22T15:34:30Z

Changes

Allows a data warehouse table to be assigned to a Trend experiment, and then uses the data warehouse table to calculate experiment results.

Includes tests for a Trend experiment where:

One user has two exposures, the second exposure is later than the data warehouse entry, and the first exposure is used as expected.
There's an extra data warehouse entry without any exposures that is ignored as expected.

How did you test this code?

The automated tests pass, and I did a fair amount of manual evaluation.

github-actions · 2024-11-22T15:45:49Z

Size Change: 0 B

Total Size: 1.16 MB

ℹ️ View Unchanged

Filename	Size
`frontend/dist/toolbar.js`	1.16 MB

_{compressed-size-action}

posthog-bot · 2024-11-25T17:01:44Z

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

chromium: 0 added, 2 modified, 0 deleted (diff for shard 2)
webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot · 2024-11-26T11:20:09Z

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

chromium: 0 added, 2 modified, 0 deleted (diff for shard 2)
webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

jurajmajerik · 2024-11-26T13:58:49Z

posthog/hogql_queries/experiments/experiment_trends_query_runner.py

@@ -226,7 +255,63 @@ def calculate(self) -> ExperimentTrendsQueryResponse:

        def run(query_runner: TrendsQueryRunner, result_key: str, is_parallel: bool):
            try:
-                result = query_runner.calculate()
+                database = create_hogql_database(team_id=self.team.pk)


I just want to confirm my understanding: the purpose of this is to create a Database instance, which is an object that lets you run HogQL queries across various data sources, like person-events or data warehouse tables.

We do this because we need to build a custom context with our own Database, including our own table where we define the join in a custom way. Then, we pass this custom context to the query runner.

Please check if the above is correct and leave a comment in the code :)

> We do this because we need to build a custom context with our own Database, including our own table where we define the join in a custom way. Then, we pass this custom context to the query runner.

This is correct :) Left a comment in a07bb2a
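
For readers following this thread, the pattern can be sketched with toy classes. This is not PostHog code: `ToyTable`, `ToyDatabase`, `ToyContext`, and `build_context` are hypothetical stand-ins, and the only idea taken from the discussion is that a per-query database object carries a custom join, which is then handed to the query runner through a context.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyTable:
    name: str
    # Maps the name of a joined table to a function describing the join (hypothetical).
    joins: Dict[str, Callable[[], str]] = field(default_factory=dict)


@dataclass
class ToyDatabase:
    tables: Dict[str, ToyTable] = field(default_factory=dict)

    def get_table(self, name: str) -> ToyTable:
        return self.tables[name]


@dataclass
class ToyContext:
    team_id: int
    database: ToyDatabase


def build_context(team_id: int, warehouse_table: str) -> ToyContext:
    # Build a database object for this query only, then attach the custom join to it.
    database = ToyDatabase(
        tables={
            "events": ToyTable("events"),
            warehouse_table: ToyTable(warehouse_table),
        }
    )
    database.get_table(warehouse_table).joins["events"] = (
        lambda: "latest exposure event at or before the warehouse row"
    )
    # The context is what a query runner would receive in this toy model.
    return ToyContext(team_id=team_id, database=database)
```

In this toy model, `build_context(1, "usage")` returns a context whose `usage` table knows how to join `events`, while the `events` table itself is left unchanged.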

jurajmajerik · 2024-11-26T13:59:25Z

posthog/hogql_queries/experiments/experiment_trends_query_runner.py

+                        join_table=database.get_table("events"),
+                        join_function=lambda join_to_add, context, node: (
+                            ast.JoinExpr(
+                                table=ast.SelectQuery(


This is currently joining over all events, but we should join only over the $feature_flag_called events. We probably need a where clause here.

In the future we should also support custom exposure events here, but for now feel free to hardcode $feature_flag_called and leave a comment.

Good catch, handled with f131df9

jurajmajerik · 2024-11-26T13:59:44Z

posthog/hogql_queries/experiments/experiment_trends_query_runner.py

+                                    ],
+                                    select_from=ast.JoinExpr(table=ast.Field(chain=["events"])),
+                                ),
+                                join_type="ASOF LEFT JOIN",


I'd love to have a comment here explaining what we're doing, especially because some of how ASOF JOIN works is only spelled out in the ClickHouse docs:

    # ASOF JOIN finds the most recent matching event that occurred at or before each data warehouse timestamp.
    #
    # Why this matters:
    # When a user performs an action (recorded in data warehouse), we want to know which
    # experiment variant they were assigned at that moment. The most recent $feature_flag_called
    # event before their action represents their active variant assignment.
    #
    # Example:
    #   Data Warehouse: timestamp=2024-01-03 12:00, distinct_id=user1
    #   Events:
    #     2024-01-02: (user1, variant='control') <- This event will be joined
    #     2024-01-03: (user1, variant='test') <- Ignored
    #
    # This ensures we capture the correct causal relationship: which experiment variant
    # was the user assigned to when they performed the action?

(feel free to adjust)

Good suggestion, added the ASOF LEFT JOIN comment in e435442
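
The "at or before" matching described in the suggested comment can be illustrated in plain Python. This is a minimal sketch of the semantics only, not the query the PR generates: `asof_left_join` and its tuple layout are hypothetical, the timestamps are made up, and `None` stands in for a warehouse row with no matching exposure.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def asof_left_join(warehouse_rows, exposures):
    """Pair each warehouse row with the latest exposure at or before it.

    warehouse_rows: iterable of (distinct_id, timestamp)
    exposures: iterable of (distinct_id, timestamp, variant)
    Returns a list of (distinct_id, timestamp, variant_or_None).
    """
    by_user = defaultdict(list)
    for distinct_id, ts, variant in exposures:
        by_user[distinct_id].append((ts, variant))
    for rows in by_user.values():
        rows.sort(key=lambda row: row[0])

    joined = []
    for distinct_id, ts in warehouse_rows:
        user_exposures = by_user.get(distinct_id, [])
        timestamps = [row[0] for row in user_exposures]
        # Index of the first exposure strictly after ts; the one before it is the match.
        idx = bisect_right(timestamps, ts)
        variant = user_exposures[idx - 1][1] if idx else None
        joined.append((distinct_id, ts, variant))
    return joined


exposures = [
    ("user1", datetime(2024, 1, 2, 0, 0), "control"),
    ("user1", datetime(2024, 1, 3, 18, 0), "test"),  # after the warehouse row
]
warehouse_rows = [
    ("user1", datetime(2024, 1, 3, 12, 0)),
    ("user2", datetime(2024, 1, 3, 12, 0)),  # no exposures at all
]
result = asof_left_join(warehouse_rows, exposures)
```

Here `user1` is attributed to `control`, because the `test` exposure happens after the warehouse timestamp, and `user2` is kept in the output with no variant, which is the "left" part of the join.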

jurajmajerik · 2024-11-26T14:17:01Z

posthog/hogql_queries/experiments/test/test_experiment_trends_query_runner.py

@@ -376,6 +478,127 @@ def test_query_runner_with_holdout(self):
        self.assertEqual(test_result.absolute_exposure, 9)
        self.assertEqual(holdout_result.absolute_exposure, 4)

+    def test_query_runner_with_data_warehouse_series(self):


Good tests! I'd also add some more checks for the join behavior:

Test that we correctly join only the $feature_flag_called events - add some other event that's closer to the data warehouse record that should be ignored

Cases where there are no preceding $feature_flag_called events shouldn't be joined

Added tests and corresponding logic with f131df9 and c6d70fc
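
The two requested cases can be sketched as standalone checks. This is an assumption-laden illustration rather than the tests added in the PR: `resolve_variant`, the dict-shaped events, and the `$pageview` event are hypothetical, and only the `$feature_flag_called` event name comes from the discussion.

```python
from datetime import datetime

EXPOSURE_EVENT = "$feature_flag_called"


def resolve_variant(events, distinct_id, warehouse_ts):
    """Return the variant of the latest exposure at or before warehouse_ts, else None."""
    candidates = [
        e
        for e in events
        if e["event"] == EXPOSURE_EVENT
        and e["distinct_id"] == distinct_id
        and e["timestamp"] <= warehouse_ts
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e["timestamp"])["variant"]


def test_other_events_are_ignored():
    # The $pageview is closer to the warehouse row but is not an exposure event.
    events = [
        {"event": EXPOSURE_EVENT, "distinct_id": "user1",
         "timestamp": datetime(2024, 1, 2), "variant": "control"},
        {"event": "$pageview", "distinct_id": "user1",
         "timestamp": datetime(2024, 1, 3, 11), "variant": None},
    ]
    assert resolve_variant(events, "user1", datetime(2024, 1, 3, 12)) == "control"


def test_no_preceding_exposure_is_not_joined():
    # The only exposure happens after the warehouse row, so nothing is joined.
    events = [
        {"event": EXPOSURE_EVENT, "distinct_id": "user2",
         "timestamp": datetime(2024, 1, 4), "variant": "test"},
    ]
    assert resolve_variant(events, "user2", datetime(2024, 1, 3, 12)) is None
```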

jurajmajerik · 2024-11-26T14:18:14Z

Fantastic work, this was very easy to follow! A few comments to address :)

jurajmajerik

🚢 it!

sentry-io · 2024-11-26T16:31:49Z

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

‼️ KeyError: 'hubspot_companies' posthog.tasks.tasks.process_query_task View Issue
‼️ CHQueryErrorCannotParseBool: DB::Exception: Cannot parse boolean value here: 'control', should be 'true' or 'false' controlled... posthog.tasks.tasks.process_query_task View Issue
‼️ CHQueryErrorNotFoundColumnInBlock: DB::Exception: Not found column ifNull(nullIf(toString(transform(toString(if(has(__table1.propert... posthog.tasks.tasks.process_query_task View Issue
‼️ QueryError: Could not find cohort with ID 6 posthog.tasks.tasks.process_query_task View Issue
‼️ QueryError: Unable to resolve field: joined_at_cohorts posthog.tasks.tasks.process_query_task View Issue

Support for data warehouse experiments, v3

57ca26b

danielbachhuber mentioned this pull request Nov 22, 2024

feat(experiments): Support for data warehouse experiments #26247

Closed

danielbachhuber and others added 5 commits November 22, 2024 12:22

First pass at incorporating data warehouse tests

3fb6f4e

Fix type issues

e4ff1e9

Add a failing test case for the left join

4cd89e4

First pass at lazy joining the events table

3f9f2de

Update UI snapshots for chromium (2)

e4566be

danielbachhuber and others added 13 commits November 25, 2024 12:28

Property filters always use the column name

33d0149

Clean up the subselect

dcd47d4

Try an ASOF LEFT JOIN

31606f1

Update UI snapshots for chromium (1)

927a3c2

Update UI snapshots for chromium (1)

23cf381

Temporarily disable

7d7b3e8

Drop context

fddffa8

Restore to_printed_hogql()

a35d509

A note

f7c042e

Add a select to the table argument

22d7652

Add a scenario for out of bounds entry

b3fbff1

Merge branch 'master' into experiments/data-warehouse-support-v3

2e737af

Update UI snapshots for chromium (2)

15edbc2

danielbachhuber added 7 commits November 26, 2024 03:26

No longer need the column name

0df2168

Remove more extraneous code

bf607d1

Use potentially dynamic distinct_id and timestamp field names

dff03ea

Add a test for an invalid table name

7f273c6

Add a helpful note

79fc1a2

Clearer naming

310b03f

Don't apply to exposure queries quite yet

8330c53

Fix type issues

3163cb6

danielbachhuber changed the title ~~feat(experiments): Support for data warehouse experiments~~ feat(experiments): Initial data warehouse Trend experiment support Nov 26, 2024

danielbachhuber marked this pull request as ready for review November 26, 2024 12:14

danielbachhuber added 2 commits November 26, 2024 04:15

Remove comments

27416ac

Avoid unnecessary variable definition

6b3b721

danielbachhuber requested a review from jurajmajerik November 26, 2024 12:20

jurajmajerik reviewed Nov 26, 2024

Explain why we need a custom database instance

a07bb2a

jurajmajerik reviewed Nov 26, 2024

danielbachhuber added 3 commits November 26, 2024 06:33

Make sure we're only evaluating $feature_flag_called

f131df9

Add another exposure that gets ignored

c6d70fc

Explain ASOF LEFT JOIN

e435442

danielbachhuber requested a review from jurajmajerik November 26, 2024 14:42

jurajmajerik approved these changes Nov 26, 2024

danielbachhuber changed the title ~~feat(experiments): Initial data warehouse Trend experiment support~~ feat(experiments): Initial data warehouse Trend support Nov 26, 2024

danielbachhuber merged commit 90a2854 into master Nov 26, 2024
96 checks passed

danielbachhuber deleted the experiments/data-warehouse-support-v3 branch November 26, 2024 15:42
