Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-mixpanel-native: fix cohort_members OOMs #2170

Merged
merged 2 commits into from
Nov 25, 2024

Conversation

Alex-Bair
Copy link
Contributor

@Alex-Bair Alex-Bair commented Nov 25, 2024

Description:

Some cohorts have enough members that the imported Airbyte connector OOMs as it paginates through them. This commit makes it so the connector flushes its buffer of cohort_members records after every 100 records are read.

cohort_members behaves like a full refresh stream, but Airbyte's incremental tools are used to force records to be flushed periodically, helping prevent OOMs on these large cohorts. cohort_members was moved over to use the newer state property for state management to pre-position for future improvements.

Potential future improvements include:

  • Making cohort_members incremental by updating the state to contain cursors for each unique cohort.
  • Aligning engage with the changes to cohort_members (assuming these changes work well).

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

Tested on a local stack. Confirmed:

  • cohort_members acts like a full refresh stream
  • cohort_members checkpoints (i.e. flushes) records when paginating through a large cohort's members.
  • Memory % was typically ~30% or less while the connector paginated through results. Due to frequent schema inference updates causing connector restart & Mixpanel's 1 request/minute rate limit, I could only confirm this when paginating up through 6 pages.

Although not entirely necessary, I think all existing tasks should have their cohort_members bindings backfilled to avoid any funky behavior related to state management.


This change is Reviewable

Snapshot file names now align with what's generated automatically by `pytest`, making development a little easier.
Some cohorts have enough members that the imported Airbyte connector
OOMs as it paginates through them. This commit makes it so the connector
flushes it's buffer of `cohort_members` records after every 100 records
are read.
@Alex-Bair Alex-Bair linked an issue Nov 25, 2024 that may be closed by this pull request
@Alex-Bair Alex-Bair marked this pull request as ready for review November 25, 2024 21:35
@Alex-Bair Alex-Bair added the change:planned This is a planned change label Nov 25, 2024
Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Alex-Bair Alex-Bair merged commit 19a8cf4 into main Nov 25, 2024
71 of 78 checks passed
@Alex-Bair Alex-Bair deleted the bair/mixpanel-native-cohort-members-OOM branch November 25, 2024 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:planned This is a planned change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

source-mixpanel-native: runTransactions: readMessage: Killed
2 participants