Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'main' into kgpayne/version-when-package-name-differs-from-exec-name
Showing 2 changed files with 63 additions and 0 deletions.
@@ -0,0 +1,62 @@
# Incremental Replication

With incremental replication, a Singer tap emits only the records that were created or updated since the previous import, rather than the full table.

To support incremental replication, the tap must first define how its replication state will be tracked, for example the ID of the newest record or the maximum update timestamp seen in the previous import.

To persist state between runs, you'll either have to manage your own [state file](https://hub.meltano.com/singer/spec#state-files-1), or use Meltano. The Singer SDK makes the tap state available through the [context object](./context_object.md) on subsequent runs. Using the state, the tap should then skip rows whose replication key comes _strictly before_ the previous maximum replication key value stored in the state.

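The skip rule described above can be illustrated with a minimal sketch in plain Python. It does not use the Singer SDK; `filter_new_records` and the sample records are hypothetical names chosen only for illustration. Rows whose replication key is strictly before the bookmark are dropped, while rows equal to the bookmark are kept.

```python
from datetime import datetime, timezone


def filter_new_records(records, replication_key, bookmark):
    """Yield records whose replication key is not strictly before the bookmark.

    Hypothetical helper for illustration; not part of the Singer SDK.
    """
    for record in records:
        if bookmark is not None and record[replication_key] < bookmark:
            continue  # strictly older than the bookmark: already synced
        yield record


# Bookmark stored by the previous run (assumed value).
bookmark = datetime(2023, 1, 15, 12, 0, tzinfo=timezone.utc)
records = [
    {"id": 1, "date_gmt": datetime(2023, 1, 14, 9, 0, tzinfo=timezone.utc)},
    {"id": 2, "date_gmt": datetime(2023, 1, 15, 12, 0, tzinfo=timezone.utc)},
    {"id": 3, "date_gmt": datetime(2023, 1, 16, 8, 30, tzinfo=timezone.utc)},
]

kept = list(filter_new_records(records, "date_gmt", bookmark))
print([r["id"] for r in kept])  # [2, 3]
```

With no bookmark (a first run), the helper passes every record through unchanged.
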
## Example Code: Timestamp-Based Incremental Replication

```py
from singer_sdk import typing as th
from singer_sdk.streams import RESTStream


class CommentsStream(RESTStream):
    # Setting a replication key switches the stream to incremental mode.
    replication_key = "date_gmt"
    # Records are returned in ascending order of the replication key.
    is_sorted = True

    schema = th.PropertiesList(
        th.Property("date_gmt", th.DateTimeType, description="date"),
    ).to_dict()

    def get_url_params(self, context, next_page_token):
        params = {}

        # Replication-key value to resume from, if one is available.
        starting_date = self.get_starting_timestamp(context)
        if starting_date:
            params["after"] = starting_date.isoformat()

        if next_page_token is not None:
            params["page"] = next_page_token

        self.logger.info("QUERY PARAMS: %s", params)
        return params
```

1. First, we inform the SDK of the `replication_key`, which automatically triggers incremental import mode.

2. Second, optionally, set `is_sorted` to `True` if the records are returned in monotonically increasing order of the replication key (i.e. newer records always come later). With this setting, the sync is resumable if it is interrupted at any point, and the state file will reflect this. Otherwise, the tap has to run to completion so the state can safely reflect the largest replication value seen.

3. Last, we have to adapt the query to the remote system, in this example by adding a query parameter with the ISO timestamp.

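To see why only a sorted stream can be resumed after an interruption, consider the following simplified model. It is a sketch of the reasoning in step 2, not the SDK's actual state handling, and `run_sync` is a hypothetical function. For an unsorted stream, the largest key seen so far cannot be used as a bookmark mid-run, because smaller keys that have not been emitted yet would be skipped on the next run.

```python
def run_sync(keys, is_sorted, fail_after=None):
    """Simulate a sync and return the bookmark that is safe to write to state.

    Hypothetical model for illustration; not the SDK's implementation.
    `fail_after` simulates an interruption after that many records.
    """
    max_seen = None
    for count, key in enumerate(keys, start=1):
        if is_sorted and max_seen is not None and key < max_seen:
            raise ValueError("records out of order in a sorted stream")
        max_seen = key if max_seen is None else max(max_seen, key)
        if fail_after is not None and count == fail_after:
            # Interrupted: only a sorted stream can trust the partial maximum.
            return max_seen if is_sorted else None
    return max_seen  # ran to completion: the maximum is always safe


print(run_sync([1, 2, 3, 4], is_sorted=True, fail_after=2))   # 2
print(run_sync([3, 1, 4, 2], is_sorted=False, fail_after=2))  # None
print(run_sync([3, 1, 4, 2], is_sorted=False))                # 4
```

In the second call, writing `3` as the bookmark would cause the record with key `2` to be skipped on the next run, which is why the model reports no safe bookmark.
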
```{note}
- The SDK will raise an error if records come out of order when `is_sorted` is true.
- Unlike a `primary_key`, a `replication_key` does not have to be unique.
- In incremental replication, it is OK and usually recommended to resend rows where the replication key is equal to the previous highest key. Targets are expected to update rows that are re-synced.
```

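The last two points in the note can be sketched together. The example below assumes a target that upserts by primary key; `upsert`, the `id` field, and the sample rows are hypothetical. Because several rows may share one replication key value, resending rows equal to the bookmark picks up any that arrived after the bookmark was written, and the upsert keeps re-sent rows from being duplicated.

```python
def upsert(table, rows, primary_key):
    """Insert or overwrite rows in `table`, keyed by primary key.

    Hypothetical stand-in for a target's update behavior.
    """
    for row in rows:
        table[row[primary_key]] = row
    return table


# First run: two rows share the same (non-unique) replication key value.
first_run = [
    {"id": 1, "date_gmt": "2023-01-15T11:00:00"},
    {"id": 2, "date_gmt": "2023-01-15T12:00:00"},
]
# Second run: row 2 is re-sent because its key equals the bookmark,
# and row 3 is a new row with that same timestamp.
second_run = [
    {"id": 2, "date_gmt": "2023-01-15T12:00:00"},
    {"id": 3, "date_gmt": "2023-01-15T12:00:00"},
]

table = {}
upsert(table, first_run, "id")
upsert(table, second_run, "id")
print(sorted(table))  # [1, 2, 3]
```
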
## Manually testing incremental import during development

To test the tap in standalone mode, manually create a state file and run the tap. The key under `bookmarks` (`documents` in this example) must match the name of the stream being tested:

```shell
$ echo '{"bookmarks": {"documents": {"replication_key": "date_gmt", "replication_key_value": "2023-01-15T12:00:00.120000"}}}' > state_test.json

$ tap-my-example --config tap_config_test.json --state state_test.json
```

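If you need to generate test state files for several bookmark values, the same file can be written from Python. This is a minimal sketch using only the standard library; `write_state` is a hypothetical helper, and the stream name, replication key, and file name are taken from the shell example above.

```python
import json
from pathlib import Path


def write_state(path, stream_name, replication_key, value):
    """Write a state file containing a single stream bookmark.

    Hypothetical helper for local testing; not part of the Singer SDK.
    """
    state = {
        "bookmarks": {
            stream_name: {
                "replication_key": replication_key,
                "replication_key_value": value,
            }
        }
    }
    Path(path).write_text(json.dumps(state, indent=2))
    return state


state = write_state(
    "state_test.json",
    stream_name="documents",
    replication_key="date_gmt",
    value="2023-01-15T12:00:00.120000",
)
```

The resulting `state_test.json` can then be passed to the tap with `--state`, as in the shell example.
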
## Additional References

- [Tap SDK State](./implementation/state.md)
- [Context Object](./context_object.md)
- [Example tap with get_starting_replication_key_value](https://github.com/flexponsive/tap-eu-ted/blob/main/tap_eu_ted/client.py)