
Commit

docs: materialization sync schedule
Adds and updates documentation for estuary/connectors#1696
williamhbaker committed Jul 17, 2024
1 parent d229b0a commit f991039
Showing 6 changed files with 208 additions and 70 deletions.
Original file line number Diff line number Diff line change
@@ -64,8 +64,6 @@ For a complete introduction to resource organization in Bigquery, see the [BigQu
| **`/bucket`** | Bucket | Name of the GCS bucket. | String | Required |
| `/bucket_path` | Bucket path | Base path within the GCS bucket. Also called "Folder" in the GCS console. | String | |
| `/billing_project_id` | Billing project ID | The project ID to which these operations are billed in BigQuery. Typically, you want this to be the same as `project_id` (the default). | String | Same as `project_id` |
| `/advanced` | Advanced Options | Options for advanced users. You should not typically need to modify these. | object | |
| `/advanced/updateDelay` | Update Delay | Potentially reduce compute time by increasing the delay between updates. Defaults to 30 minutes if unset. | string | |

To learn more about project billing, [see the BigQuery docs](https://cloud.google.com/billing/docs/how-to/verify-billing-enabled).

@@ -98,15 +96,10 @@ materializations:
source: ${PREFIX}/${source_collection}
```
## Update Delay
## Sync Schedule
The `Update Delay` parameter in Estuary materializations offers a flexible approach to data ingestion scheduling. This advanced option allows users to control when the materialization or capture tasks pull in new data by specifying a delay period. By incorporating an update delay into your workflow, you can effectively manage and optimize your active warehouse time, leading to potentially lower costs and more efficient data processing.

An update delay is configured in the advanced settings of a materialization's configuration. It represents the amount of time the system will wait before it begins materializing the latest data. This delay is specified as a duration and can be adjusted according to the needs of your data pipeline.

For example, if an update delay is set to 2 hours, the materialization task will pause for 2 hours before processing the latest available data. This delay ensures that data is not pulled in immediately after it becomes available, allowing for batching and other optimizations that can reduce warehouse load and processing time.

To configure an update delay, navigate to the `Advanced Options` section of the materialization's configuration and select a value from the drop-down. The default value for the update delay in Estuary materializations is 30 minutes.
This connector supports configuring a schedule for sync frequency. You can read
about how to configure this [here](../../materialization-sync-schedule.md).
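As a rough sketch, a sync schedule might appear in the endpoint configuration along these lines (all field names and values below are illustrative assumptions — confirm them against the linked sync schedule documentation):

```yaml
materializations:
  ${PREFIX}/${mat_name}:
    endpoint:
      connector:
        image: ghcr.io/estuary/materialize-bigquery:dev
        config:
          # Hypothetical schedule settings; check the linked docs
          # for the actual field names and accepted values.
          syncSchedule:
            syncFrequency: 30m
            timezone: America/New_York
            fastSyncStartTime: "09:00"
            fastSyncStopTime: "17:00"
```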
## Delta updates
@@ -136,8 +136,6 @@ Use the below properties to configure a Snowflake materialization, which will di
| **`/credentials/user`** | User | Snowflake username | string | Required |
| `/credentials/password` | Password | Required if using user_password authentication | string | Required |
| `/credentials/privateKey` | Private Key | Required if using jwt authentication | string | Required |
| `/advanced` | Advanced Options | Options for advanced users. You should not typically need to modify these. | object | |
| `/advanced/updateDelay` | Update Delay | Potentially reduce active warehouse time by increasing the delay between updates. | string | |

#### Bindings

@@ -209,6 +207,30 @@ materializations:
source: ${PREFIX}/${source_collection}
```
## Sync Schedule
This connector supports configuring a schedule for sync frequency. You can read
about how to configure this [here](../../materialization-sync-schedule.md).
Snowflake compute is [priced](https://www.snowflake.com/pricing/) per second of
activity, with a minimum of 60 seconds. Inactive warehouses don't incur charges.
To keep costs down, you'll want to minimize your warehouse's active time.
To accomplish this, we recommend a two-pronged approach:
* [Configure your Snowflake warehouse to auto-suspend](https://docs.snowflake.com/en/sql-reference/sql/create-warehouse.html#:~:text=Specifies%20the%20number%20of%20seconds%20of%20inactivity%20after%20which%20a%20warehouse%20is%20automatically%20suspended.) after 60 seconds.
This ensures that after each transaction completes, you'll only be charged for one minute of compute, Snowflake's smallest granularity.
Use a query like the one shown below, being sure to substitute your warehouse name:
```sql
ALTER WAREHOUSE ESTUARY_WH SET auto_suspend = 60;
```

* Configure the materialization's **Sync Schedule** based on your requirements for data freshness.
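To verify the auto-suspend setting from the first step, you can list the warehouse's properties (again substituting your warehouse name; the `auto_suspend` column should read `60`):

```sql
-- Shows warehouse properties, including the auto_suspend value.
SHOW WAREHOUSES LIKE 'ESTUARY_WH';
```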


## Delta updates

This connector supports both standard (merge) and [delta updates](../../../concepts/materialization.md#delta-updates).
@@ -245,47 +267,6 @@ This is because most materializations tend to be roughly chronological over time
This means that updates of keys `/date, /user_id` will need to physically read far fewer rows as compared to a key like `/user_id`,
because those rows will tend to live in the same micro-partitions, and Snowflake is able to cheaply prune micro-partitions that aren't relevant to the transaction.
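As an illustration with a hypothetical table and column names, a lookup constrained by the leading `/date` key component touches only a narrow slice of micro-partitions, whereas a bare `user_id` predicate would have to scan far more broadly:

```sql
-- Hypothetical table materialized with key (/date, /user_id).
-- The date predicate lets Snowflake prune micro-partitions that
-- fall outside the one-day range before touching user_id.
SELECT *
FROM my_materialized_table
WHERE date = '2024-07-17'
  AND user_id = 12345;
```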

### Reducing active warehouse time

Snowflake compute is [priced](https://www.snowflake.com/pricing/) per second of activity, with a minimum of 60 seconds.
Inactive warehouses don't incur charges.
To keep costs down, you'll want to minimize your warehouse's active time.

Like other Estuary connectors, this is a real-time connector that materializes documents using continuous [**transactions**](../../../concepts/advanced/shards.md#transactions).
Every time a Flow materialization commits a transaction, your warehouse becomes active.

If your source data collection or collections don't change much, this shouldn't cause an issue;
Flow only commits transactions when data has changed.
However, if your source data is frequently updated, your materialization may have frequent transactions that result in
excessive active time in the warehouse, and thus a higher bill from Snowflake.

To mitigate this, we recommend a two-pronged approach:

* [Configure your Snowflake warehouse to auto-suspend](https://docs.snowflake.com/en/sql-reference/sql/create-warehouse.html#:~:text=Specifies%20the%20number%20of%20seconds%20of%20inactivity%20after%20which%20a%20warehouse%20is%20automatically%20suspended.) after 60 seconds.

This ensures that after each transaction completes, you'll only be charged for one minute of compute, Snowflake's smallest granularity.

Use a query like the one shown below, being sure to substitute your warehouse name:

```sql
ALTER WAREHOUSE ESTUARY_WH SET auto_suspend = 60;
```

* Configure the materialization's **update delay** by setting a value in the advanced configuration.

For example, if you set the warehouse to auto-suspend after 60 seconds and set the materialization's
update delay to 30 minutes, you can incur as little as 48 minutes per day of active time in the warehouse.
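The arithmetic behind that figure can be sketched directly (a worst case that assumes every 30-minute window commits a transaction and each commit bills only the 60-second minimum):

```sql
-- 24h / 30min = 48 possible syncs per day; at 60 billed seconds
-- each, worst-case active time is 48 minutes per day.
SELECT (24 * 60) / 30       AS max_syncs_per_day,    -- 48
       (24 * 60) / 30 * 60  AS max_active_seconds;   -- 2880 = 48 min
```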

### Update Delay

The `Update Delay` parameter in Estuary materializations offers a flexible approach to data ingestion scheduling. This advanced option allows users to control when the materialization or capture tasks pull in new data by specifying a delay period. By incorporating an update delay into your workflow, you can effectively manage and optimize your active warehouse time, leading to potentially lower costs and more efficient data processing.

An update delay is configured in the advanced settings of a materialization's configuration. It represents the amount of time the system will wait before it begins materializing the latest data. This delay is specified as a duration and can be adjusted according to the needs of your data pipeline.

For example, if an update delay is set to 2 hours, the materialization task will pause for 2 hours before processing the latest available data. This delay ensures that data is not pulled in immediately after it becomes available, allowing for batching and other optimizations that can reduce warehouse load and processing time.

To configure an update delay, navigate to the `Advanced Options` section of the materialization's configuration and select a value from the drop-down. The default value for the update delay in Estuary materializations is 30 minutes.

### Snowpipe

[Snowpipe](https://docs.snowflake.com/en/user-guide/data-load-snowpipe-intro) allows for loading data into target tables without waking up the warehouse, which can be cheaper and more performant. Snowpipe can be used for delta updates bindings, and it requires configuring your authentication using a private key. Instructions for configuring key-pair authentication can be found in this page: [Key-pair Authentication & Snowpipe](#key-pair-authentication--snowpipe)
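As a sketch of the Snowflake side of that setup (the user name is a placeholder and the key value is truncated for illustration; follow the linked section for the full procedure):

```sql
-- Attach an RSA public key to the connector's user; paste the
-- base64 body of your public key without the PEM header/footer.
ALTER USER estuary_user SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqhki...';
```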
@@ -49,8 +49,6 @@ more of your Flow collections to your desired tables in the database.
| **`/bucket`** | S3 Staging Bucket | Name of the S3 bucket to use for staging data loads. | string | Required |
| **`/region`** | Region | Region of the S3 staging bucket. For optimal performance this should be in the same region as the Redshift database cluster. | string | Required |
| `/bucketPath` | Bucket Path | A prefix that will be used to store objects in S3. | string | |
| `/advanced` | Advanced Options | Options for advanced users. You should not typically need to modify these. | object | |
| `/advanced/updateDelay` | Update Delay | Potentially reduce active cluster time by increasing the delay between updates. Defaults to 30 minutes if unset. | string | |

#### Bindings

@@ -83,6 +81,11 @@ materializations:
source: ${PREFIX}/${COLLECTION_NAME}
```
## Sync Schedule
This connector supports configuring a schedule for sync frequency. You can read
about how to configure this [here](../../materialization-sync-schedule.md).
## Setup
You must configure your cluster to allow connections from Estuary. This can be accomplished by
@@ -27,7 +27,7 @@ If you haven't yet captured your data from its external source, start at the beg

You need to first create a SQL Warehouse if you don't already have one in your account. See [Databricks documentation](https://docs.databricks.com/en/sql/admin/create-sql-warehouse.html) on configuring a Databricks SQL Warehouse. After creating a SQL Warehouse, you can find the details necessary for connecting to it under the **Connection Details** tab.

In order to save on costs, we recommend that you set the Auto Stop parameter for your SQL warehouse to the minimum available. Estuary's Databricks connector automatically delays updates to the destination up to a configured Update Delay (see the endpoint configuration below), with a default value of 30 minutes. If your SQL warehouse is configured to have an Auto Stop of more than 15 minutes, we disable the automatic delay since the delay is not as effective in saving costs with a long Auto Stop idle period.
In order to save on costs, we recommend that you set the Auto Stop parameter for your SQL warehouse to the minimum available. Estuary's Databricks connector automatically delays updates to the destination according to the configured **Sync Schedule** (see configuration details below), with a default delay value of 30 minutes.

You also need an access token for your user to be used by our connector; see the respective [documentation](https://docs.databricks.com/en/administration-guide/access-control/tokens.html) from Databricks on how to create an access token.

@@ -49,8 +49,6 @@ Use the below properties to configure a Databricks materialization, which will d
| **`/credentials`** | Credentials | Authentication credentials | object | |
| **`/credentials/auth_type`** | Role | Authentication type, set to `PAT` for personal access token | string | Required |
| **`/credentials/personal_access_token`** | Role | Personal Access Token | string | Required |
| /advanced | Advanced | Options for advanced users. You should not typically need to modify these. | object | |
| /advanced/updateDelay | Update Delay | Potentially reduce active warehouse time by increasing the delay between updates. Defaults to 30 minutes if unset. | string | 30m |

#### Bindings

@@ -86,6 +84,11 @@ materializations:
source: ${PREFIX}/${source_collection}
```
## Sync Schedule
This connector supports configuring a schedule for sync frequency. You can read
about how to configure this [here](../../materialization-sync-schedule.md).
## Delta updates
This connector supports both standard (merge) and [delta updates](../../../concepts/materialization.md#delta-updates).
@@ -107,16 +110,6 @@ You can enable delta updates on a per-binding basis:
source: ${PREFIX}/${source_collection}
```
## Update Delay
The `Update Delay` parameter in Estuary materializations offers a flexible approach to data ingestion scheduling. This advanced option allows users to control when the materialization or capture tasks pull in new data by specifying a delay period. By incorporating an update delay into your workflow, you can effectively manage and optimize your active warehouse time, leading to potentially lower costs and more efficient data processing.

An update delay is configured in the advanced settings of a materialization's configuration. It represents the amount of time the system will wait before it begins materializing the latest data. This delay is specified as a duration and can be adjusted according to the needs of your data pipeline.

For example, if an update delay is set to 2 hours, the materialization task will pause for 2 hours before processing the latest available data. This delay ensures that data is not pulled in immediately after it becomes available, allowing for batching and other optimizations that can reduce warehouse load and processing time.

To configure an update delay, navigate to the `Advanced Options` section of the materialization's configuration and select a value from the drop-down. The default value for the update delay in Estuary materializations is 30 minutes.

## Reserved words
Databricks has a list of reserved words that must be quoted in order to be used as an identifier. Flow automatically quotes fields that are in the reserved words list. You can find this list in Databricks's documentation [here](https://docs.databricks.com/en/sql/language-manual/sql-ref-reserved-words.html) and in the table below.
@@ -45,8 +45,6 @@ Use the below properties to configure a Starburst materialization, which will di
| **`/region`** | AWS Region | Region of AWS storage | string | Required |
| **`/bucket`** | Bucket name | | string | Required |
| **`/bucketPath`** | Bucket path | A prefix that will be used to store objects in S3. | string | Required |
| /advanced | Advanced | Options for advanced users. You should not typically need to modify these. | string | |
| /advanced/updateDelay | Update Delay | Potentially reduce active warehouse time by increasing the delay between updates. Defaults to 30 minutes if unset. | string | 30m |

#### Bindings

@@ -84,6 +82,11 @@ materializations:
source: ${PREFIX}/${source_collection}
```
## Sync Schedule
This connector supports configuring a schedule for sync frequency. You can read
about how to configure this [here](../../materialization-sync-schedule.md).
## Reserved words
Starburst Galaxy has a list of reserved words that must be quoted in order to be used as an identifier. Flow automatically quotes fields that are in the reserved words list. You can find this list in Trino's documentation [here](https://trino.io/docs/current/language/reserved.html) and in the table below.