source-mixpanel-native: fix cohort_members OOMs
#2170
Merged
Description:
Some cohorts have enough members that the imported Airbyte connector OOMs as it paginates through them. This commit makes the connector flush its buffer of `cohort_members` records after every 100 records are read.
- `cohort_members` behaves like a full refresh stream, but Airbyte's incremental tools are used to force records to be flushed periodically, helping prevent OOMs on these large cohorts.
- `cohort_members` was moved over to the newer `state` property for state management, to pre-position it for future improvements.
Potential future improvements include:
- Making `cohort_members` incremental by updating the state to contain a cursor for each unique cohort.
- Updating `engage` with the changes made to `cohort_members` (assuming these changes work well).
Workflow steps:
(How does one use this feature, and how has it changed)
Documentation links affected:
(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)
Notes for reviewers:
Tested on a local stack. Confirmed:
- `cohort_members` acts like a full refresh stream.
- `cohort_members` checkpoints (i.e. flushes) records when paginating through a large cohort's members.
Although not strictly necessary, I think all existing tasks should have their `cohort_members` bindings backfilled to avoid unexpected behavior related to state management.
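The two behaviors confirmed above (full refresh semantics, plus periodic checkpoints that let the destination flush its buffer) can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not the connector's code: `CohortMembersStream`, `read`, `write`, `page_size` and the `distinct_id` field are hypothetical, and the Airbyte protocol is reduced to `("RECORD", ...)` / `("STATE", ...)` tuples. In the Airbyte Python CDK the corresponding hooks are presumably `Stream.state_checkpoint_interval` and the `IncrementalMixin` `state` property, but how this PR actually wires them up is an assumption.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Hypothetical constant mirroring "flush after every 100 records".
CHECKPOINT_INTERVAL = 100

Message = Tuple[str, Any]  # ("RECORD", record) or ("STATE", state)


class CohortMembersStream:
    """Hypothetical stand-in for the cohort_members stream (not the real connector)."""

    def __init__(self, cohorts: Dict[str, List[dict]], page_size: int = 250) -> None:
        self.cohorts = cohorts
        self.page_size = page_size
        self._state: Dict[str, Any] = {}

    @property
    def state(self) -> Dict[str, Any]:
        # Returned with each checkpoint; this sketch never adds a cursor to it.
        return self._state

    @state.setter
    def state(self, value: Dict[str, Any]) -> None:
        # Incoming state is stored but never used to skip records,
        # so every sync re-reads all members (full refresh behavior).
        self._state = dict(value)

    def read_records(self) -> Iterator[dict]:
        for cohort_id, members in self.cohorts.items():
            # Walk each cohort's members one page at a time.
            for start in range(0, len(members), self.page_size):
                for member in members[start:start + self.page_size]:
                    yield {"cohort_id": cohort_id, **member}


def read(stream: CohortMembersStream) -> Iterator[Message]:
    """Emit a STATE message after every CHECKPOINT_INTERVAL records and once at the end."""
    count = 0
    for record in stream.read_records():
        yield ("RECORD", record)
        count += 1
        if count % CHECKPOINT_INTERVAL == 0:
            yield ("STATE", stream.state)
    yield ("STATE", stream.state)


def write(messages: Iterable[Message]) -> Tuple[int, int]:
    """Buffer records and flush on each STATE; return (records_flushed, peak_buffer_size)."""
    buffer: List[dict] = []
    flushed = 0
    peak = 0
    for kind, payload in messages:
        if kind == "RECORD":
            buffer.append(payload)
            peak = max(peak, len(buffer))
        else:  # a STATE message means everything buffered so far can be committed
            flushed += len(buffer)
            buffer.clear()
    return flushed, peak
```

For example, with a single cohort whose `members` list holds 1,050 member dicts, `write(read(CohortMembersStream({"big": members})))` returns `(1050, 100)`: every record is delivered, but no more than 100 are ever buffered at once, and setting `stream.state` beforehand does not change the result.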