Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(experiments): Initial data warehouse Trend support #26356

Merged
merged 33 commits into from
Nov 26, 2024

Conversation

danielbachhuber
Copy link
Contributor

@danielbachhuber danielbachhuber commented Nov 22, 2024

See #26332

Changes

Allows a data warehouse table to be assigned to a Trend experiment, and then uses the data warehouse table to calculate experiment results.

CleanShot 2024-11-26 at 04 11 11@2x

Includes tests for a Trend experiment where:

  • One user has two exposures, the second exposure is later than the data warehouse entry, and the first exposure is used as expected.
  • There's an extra data warehouse entry without any exposures that is ignored as expected.

How did you test this code?

Tests passed, and a fair amount of manual evaluation.

Copy link
Contributor

github-actions bot commented Nov 22, 2024

Size Change: 0 B

Total Size: 1.16 MB

ℹ️ View Unchanged
Filename Size
frontend/dist/toolbar.js 1.16 MB

compressed-size-action

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@danielbachhuber danielbachhuber changed the title feat(experiments): Support for data warehouse experiments feat(experiments): Initial data warehouse Trend experiment support Nov 26, 2024
@danielbachhuber danielbachhuber marked this pull request as ready for review November 26, 2024 12:14
@@ -226,7 +255,63 @@ def calculate(self) -> ExperimentTrendsQueryResponse:

def run(query_runner: TrendsQueryRunner, result_key: str, is_parallel: bool):
try:
result = query_runner.calculate()
database = create_hogql_database(team_id=self.team.pk)
Copy link
Contributor

@jurajmajerik jurajmajerik Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to confirm my understanding: the purpose of this is to create a Database instance, which is an object that lets you run HogQL queries across various data sources, like person-events or data warehouse tables.

We do this because we need to build a custom context with our own Database, including our own table where we define the join in a custom way. Then, we pass this custom context to the query runner.

Please check if the above is correct and leave a comment in the code :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this because we need to build a custom context with our own Database, including our own table where we define the join in a custom way. Then, we pass this custom context to the query runner.

This is correct :) Left a comment in a07bb2a

join_table=database.get_table("events"),
join_function=lambda join_to_add, context, node: (
ast.JoinExpr(
table=ast.SelectQuery(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently joining over all events, but we should join only over the $feature_flag_events. We probably need a where clause here.

In the future we should also support custom exposure events here, but for now feel free to hardcode $feature_flag_called and leave a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, handled with f131df9

],
select_from=ast.JoinExpr(table=ast.Field(chain=["events"])),
),
join_type="ASOF LEFT JOIN",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to have a comment here explaining what we're doing, especially because some of how the ASOF JOIN works is only explicit in the Clickhouse docs:

        # ASOF JOIN finds the most recent matching event that occurred at or before each data warehouse timestamp.
        # 
        # Why this matters:
        # When a user performs an action (recorded in data warehouse), we want to know which
        # experiment variant they were assigned at that moment. The most recent $feature_flag_called
        # event before their action represents their active variant assignment.
        #
        # Example:
        #   Data Warehouse: timestamp=2024-01-03 12:00, distinct_id=user1
        #   Events: 
        #     2024-01-02: (user1, variant='control')   <- This event will be joined
        #     2024-01-03: (user1, variant='test')     <- Ignored
        #
        # This ensures we capture the correct causal relationship: which experiment variant
        # was the user assigned to when they performed the action?

(feel free to adjust)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, added the ASOF LEFT JOIN comment in e435442

@@ -376,6 +478,127 @@ def test_query_runner_with_holdout(self):
self.assertEqual(test_result.absolute_exposure, 9)
self.assertEqual(holdout_result.absolute_exposure, 4)

def test_query_runner_with_data_warehouse_series(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good tests! I'd also add some more checks for the join behavior:

  • Test that we correctly join only the $feature_flag_called events - add some other event that's closer to the data warehouse record that should be ignored
  • Cases where there are no preceding $feature_flag_called events shouldn't be joined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests and corresponding logic with f131df9 and c6d70fc

@jurajmajerik
Copy link
Contributor

Fantastic work, this was very easy to follow! A few comments to address :)

Copy link
Contributor

@jurajmajerik jurajmajerik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢 it!

@danielbachhuber danielbachhuber changed the title feat(experiments): Initial data warehouse Trend experiment support feat(experiments): Initial data warehouse Trend support Nov 26, 2024
@danielbachhuber danielbachhuber merged commit 90a2854 into master Nov 26, 2024
96 checks passed
@danielbachhuber danielbachhuber deleted the experiments/data-warehouse-support-v3 branch November 26, 2024 15:42
Copy link

sentry-io bot commented Nov 26, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ KeyError: 'hubspot_companies' posthog.tasks.tasks.process_query_task View Issue
  • ‼️ CHQueryErrorCannotParseBool: DB::Exception: Cannot parse boolean value here: 'control', should be 'true' or 'false' controlled... posthog.tasks.tasks.process_query_task View Issue
  • ‼️ CHQueryErrorNotFoundColumnInBlock: DB::Exception: Not found column ifNull(nullIf(toString(transform(toString(if(has(__table1.propert... posthog.tasks.tasks.process_query_task View Issue
  • ‼️ QueryError: Could not find cohort with ID 6 posthog.tasks.tasks.process_query_task View Issue
  • ‼️ QueryError: Unable to resolve field: joined_at_cohorts posthog.tasks.tasks.process_query_task View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants