clarify microbatch per feedback #6544

Merged
merged 8 commits into from
Nov 27, 2024

mirnawong1
Contributor

@mirnawong1 mirnawong1 commented Nov 26, 2024

this pr makes updates to the microbatch doc based on internal slack feedback:

  • clarify microbatch is diff from other strategies and why
  • make note microbatch is new
  • add links
  • add info that event_time can be used for upstream parents and modify 'example' structure
  • add a separate 'how microbatch works' header to break wall of text
  • add bullets for easier reading

🚀 Deployment available! Here are the direct links to the updated files:

@mirnawong1 mirnawong1 requested a review from a team as a code owner November 26, 2024 13:13
vercel bot commented Nov 26, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| docs-getdbt-com | ✅ Ready (Inspect) | Visit Preview | Nov 27, 2024 10:16am |

@github-actions github-actions bot added content Improvements or additions to content Docs team Authored by the Docs team @dbt Labs size: small This change will take 1 to 2 days to address and removed Docs team Authored by the Docs team @dbt Labs labels Nov 26, 2024
@github-actions github-actions bot added the Docs team Authored by the Docs team @dbt Labs label Nov 26, 2024

When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure.

Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />. This is a powerful abstraction that makes it possible for dbt to run batches [separately](#backfills) — in the future, concurrently — and to [retry](#retry) them independently.
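To illustrate the batching idea in the paragraph above, here is a minimal Python sketch of how a time range could be cut into daily windows. It is a conceptual model only, not dbt's implementation: the function name `daily_batches`, the half-open window convention, and the UTC midnight truncation are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def daily_batches(start, end):
    """Yield (batch_start, batch_end) half-open daily windows covering [start, end).

    Hypothetical helper for illustration; dbt's real batching logic may differ.
    """
    # Truncate to midnight so every window is one whole calendar day.
    cursor = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while cursor < end:
        yield cursor, cursor + timedelta(days=1)
        cursor += timedelta(days=1)


batches = list(
    daily_batches(
        datetime(2024, 11, 25, 13, 0, tzinfo=timezone.utc),
        datetime(2024, 11, 27, 10, 0, tzinfo=timezone.utc),
    )
)

for batch_start, batch_end in batches:
    print(batch_start.isoformat(), "->", batch_end.isoformat())
```

Each window can then be handled as its own query, which is what makes separate runs and independent retries possible.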
Collaborator

We just added support for concurrency - see #6550

Contributor Author

@mirnawong1 mirnawong1 Nov 27, 2024

great stuff, thanks Grace. i've tweaked this and will link out to the 'concurrently' line once we add a section about it and resolve 6550 🙏 that way, i don't block this pr.

#6550 (comment)

Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the [`event_time`](/reference/resource-configs/event-time) and `batch_size` you configure.
Microbatch is an incremental strategy designed for large time-series datasets:
- It relies solely on a time column ([`event_time`](/reference/resource-configs/event-time)) to define time-based ranges for filtering. Set the `event_time` column for your microbatch model and its direct parents (upstream models). Note, this is different to `partition_by`, which groups rows into partitions.
- It complements, rather than replaces, existing incremental strategies by focusing on efficiency and simplicity in batch processing.
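As a rough sketch of what filtering on a time column means for one batch, the Python below keeps only the upstream rows whose `event_time` value falls inside the batch window, then replaces that window in the target. This is an illustration under stated assumptions, not how dbt executes a model: the helpers `rows_in_batch` and `replace_batch`, the `session_start` column, and the half-open window bounds are all hypothetical.

```python
from datetime import datetime, timezone

UTC = timezone.utc


def rows_in_batch(rows, event_time, batch_start, batch_end):
    """Keep rows whose event_time column falls in [batch_start, batch_end)."""
    return [r for r in rows if batch_start <= r[event_time] < batch_end]


def replace_batch(target, new_rows, event_time, batch_start, batch_end):
    """Drop the target's rows for this window, then add the rebuilt rows."""
    kept = [r for r in target if not (batch_start <= r[event_time] < batch_end)]
    return kept + new_rows


# Hypothetical upstream parent with `session_start` as its event_time column.
upstream = [
    {"id": 1, "session_start": datetime(2024, 11, 26, 9, 0, tzinfo=UTC)},
    {"id": 2, "session_start": datetime(2024, 11, 26, 23, 59, tzinfo=UTC)},
    {"id": 3, "session_start": datetime(2024, 11, 27, 0, 0, tzinfo=UTC)},
]

start = datetime(2024, 11, 26, tzinfo=UTC)
end = datetime(2024, 11, 27, tzinfo=UTC)

batch_rows = rows_in_batch(upstream, "session_start", start, end)
target = replace_batch([], batch_rows, "session_start", start, end)

# Rebuilding the same batch leaves the target unchanged.
target_again = replace_batch(target, batch_rows, "session_start", start, end)
```

Because the window is replaced as a whole, rerunning a batch does not duplicate rows, which is the property the doc describes as idempotent.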
Contributor

Hiya @mirnawong1

Just to ask, should "complements" be "complements" or "compliments"? I wasn't sure if "complements" is a US spelling.

Kind Regards
Natalie

Contributor Author

hey @nataliefiann , good question! this is right IMHO, as "complement" means something that enhances another thing. "compliment" generally means praise, so in this context, microbatch enhances existing strategies.

Contributor

@nataliefiann nataliefiann left a comment

Hiya @mirnawong1

Thanks for creating this PR. I've approved this for you with a non-blocking QQ.

Kind Regards
Natalie

@mirnawong1 mirnawong1 merged commit d1d3dee into current Nov 27, 2024
9 checks passed
@mirnawong1 mirnawong1 deleted the mwong-update-microbatch branch November 27, 2024 11:22
Labels
content Improvements or additions to content Docs team Authored by the Docs team @dbt Labs size: small This change will take 1 to 2 days to address
3 participants