clarify microbatch per feedback #6544
Conversation
When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure.
Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />. This is a powerful abstraction that makes it possible for dbt to run batches [separately](#backfills) — in the future, concurrently — and to [retry](#retry) them independently.
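To make the configuration concrete, here is a minimal sketch of a microbatch model, assuming a hypothetical `sessions` model built from an upstream `page_views` model with a `page_view_start` timestamp column; treat the specific values as illustrative rather than prescriptive:

```sql
-- models/sessions.sql (hypothetical model name)
{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='page_view_start',  -- timestamp column used to bound each batch
        batch_size='day',              -- each batch covers one day of data
        begin='2024-01-01',            -- earliest date an initial build or full backfill starts from
        lookback=3                     -- how many recent batches to reprocess on each incremental run
    )
}}

select * from {{ ref('page_views') }}
```

With a config like this, an incremental run produces one bounded query per day in the processing window, and a failed day can be retried without rebuilding the others.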
We just added support for concurrency - see #6550
great stuff, thanks Grace. i've tweaked this and will link out to the 'concurrently' line once we add a section about it and resolve 6550 🙏 that way, i don't block this pr.
Co-authored-by: Grace Goheen <[email protected]>
Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the [`event_time`](/reference/resource-configs/event-time) and `batch_size` you configure.
Microbatch is an incremental strategy designed for large time-series datasets:
- It relies solely on a time column ([`event_time`](/reference/resource-configs/event-time)) to define time-based ranges for filtering. Set the `event_time` column for your microbatch model and its direct parents (upstream models), as shown in the sketch after this list. Note that this is different from `partition_by`, which groups rows into partitions.
- It complements, rather than replaces, existing incremental strategies by focusing on efficiency and simplicity in batch processing.
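As a hedged illustration of the first point, here is a sketch of declaring `event_time` on a direct parent of a microbatch model; the `stg_page_views` model, `web` source, and `page_view_start` column are made-up names:

```sql
-- models/staging/stg_page_views.sql (hypothetical upstream model)
-- Declaring an `event_time` on this parent lets dbt filter its rows down to
-- each batch's time window when the downstream microbatch model runs.
{{ config(materialized='view', event_time='page_view_start') }}

select * from {{ source('web', 'page_views') }}
```

The downstream microbatch model then simply selects from `{{ ref('stg_page_views') }}`, and each batch only reads the parent rows that fall inside that batch's time range.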
Hiya @mirnawong1
Just to ask, should "complements" be "complements" or "compliments"? I wasn't sure if "complements" is a US spelling.
Kind Regards
Natalie
hey @nataliefiann , good question! this is right IMHO, as "complements" means something that enhances it. "compliment" generally means praise, so in this context, microbatch enhances existing strategies.
Hiya @mirnawong1
Thanks for creating this PR. I've approved this for you with a non-blocking QQ.
Kind Regards
Natalie
this pr makes updates to the microbatch doc based on internal slack feedback.