From dc911a3f489222a3abe2e57b8d4c04820616a1c0 Mon Sep 17 00:00:00 2001
From: faithebear <149540831+faithebear@users.noreply.github.com>
Date: Fri, 8 Nov 2024 17:48:54 -0700
Subject: [PATCH] remove snapshot query best practices--not relevant anymore for 1.9 and beyond (#6448)

## What are you changing in this pull request and why?

The Snapshot Query best practices section feels out of date to me: it mentions things like "avoid joins in your query", "include as many columns as possible", etc. Given that there's already a configs best practices section above it, it feels like this section should be deleted altogether.

## Checklist
- [ ] I have reviewed the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
- [ ] The topic I'm writing about is for specific dbt version(s) and I have versioned it according to the [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and/or [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content) guidelines.
- [ ] I have added checklist item(s) to this list for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."
- [ ] The content in this PR requires a dbt release note, so I added one to the [release notes page](https://docs.getdbt.com/docs/dbt-versions/dbt-cloud-release-notes).
---
 website/docs/docs/build/snapshots.md | 23 -----------------------
 1 file changed, 23 deletions(-)

diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md
index f5321aa626a..8045dac117b 100644
--- a/website/docs/docs/build/snapshots.md
+++ b/website/docs/docs/build/snapshots.md
@@ -390,29 +390,6 @@ snapshots:
-## Snapshot query best practices
-
-This section outlines some best practices for writing snapshot queries:
-
-- #### Snapshot source data
-  Your models should then select from these snapshots, treating them like regular data sources. As much as possible, snapshot your source data in its raw form and use downstream models to clean up the data
-
-- #### Use the `source` function in your query
-  This helps when understanding data lineage in your project.
-
-- #### Include as many columns as possible
-  In fact, go for `select *` if performance permits! Even if a column doesn't feel useful at the moment, it might be better to snapshot it in case it becomes useful – after all, you won't be able to recreate the column later.
-
-- #### Avoid joins in your snapshot query
-  Joins can make it difficult to build a reliable `updated_at` timestamp. Instead, snapshot the two tables separately, and join them in downstream models.
-
-- #### Limit the amount of transformation in your query
-  If you apply business logic in a snapshot query, and this logic changes in the future, it can be impossible (or, at least, very difficult) to apply the change in logic to your snapshots.
-
-Basically – keep your query as simple as possible! Some reasonable exceptions to these recommendations include:
-* Selecting specific columns if the table is wide.
-* Doing light transformation to get data into a reasonable shape, for example, unpacking a blob to flatten your source data into columns.
-
 ## Snapshot meta-fields
 
 Snapshot tables will be created as a clone of your source dataset, plus some additional meta-fields*.