Commit

docs: Fix invalid JSON in Stream Maps page and add meltano.yml tabs (#1756)

* Enable sphinx tabs

* Add meltano.yml versions for all stream configs

- Also removed trailing commas in list/dict objects to make the JSON syntactically valid
- Removed all comments from JSON and added to meltano.yml tab

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Set meltano.yml as first tab

* Set sphinx-inline-tabs to optional

Co-authored-by: Edgar R. M. <[email protected]>

* Set null to __NULL__ for all undeclared/excluded map properties

Co-authored-by: Edgar R. M. <[email protected]>

* `poetry lock`

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ken Payne <[email protected]>
Co-authored-by: Edgar R. M <[email protected]>
4 people authored Jun 21, 2023
1 parent d27f8ad commit c6a8933
Showing 4 changed files with 271 additions and 127 deletions.
1 change: 1 addition & 0 deletions docs/conf.py
@@ -42,6 +42,7 @@
"sphinx_copybutton",
"myst_parser",
"sphinx_reredirects",
"sphinx_inline_tabs",
]

# Add any paths that contain templates here, relative to this directory.
195 changes: 159 additions & 36 deletions docs/stream_maps.md
@@ -107,23 +107,41 @@ The `stream_maps` config expects a mapping of stream names to a structured trans
Here is a sample `stream_maps` transformation which removes all references to `email` and
adds `email_domain` and `email_hash` as new properties:

`config.json`:
`meltano.yml` or `config.json`:

````{tab} meltano.yml
```yaml
stream_maps:
  # Apply these transforms to the stream called 'customers'
  customers:
    # drop the PII field from RECORD and SCHEMA messages
    email: __NULL__
    # capture just the email domain
    email_domain: owner_email.split('@')[-1]
    # for uniqueness checks
    email_hash: md5(config['hash_seed'] + owner_email)
stream_map_config:
  # hash outputs are not able to be replicated without the original seed:
  hash_seed: 01AWZh7A6DzGm6iJZZ2T
```
````

```js
````{tab} JSON
```json
{
"stream_maps": {
"customers": { // Apply these transforms to the stream called 'customers'
"email": null, // drop the PII field from RECORD and SCHEMA messages
"email_domain": "owner_email.split('@')[-1]", // capture just the email domain
"email_hash": "md5(config['hash_seed'] + owner_email)", // for uniqueness checks
"customers": {
"email": null,
"email_domain": "owner_email.split('@')[-1]",
"email_hash": "md5(config['hash_seed'] + owner_email)"
}
},
"stream_map_config": {
// hash outputs are not able to be replicated without the original seed:
"hash_seed": "01AWZh7A6DzGm6iJZZ2T"
}
}
```
````
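
To make the map expressions above concrete, here is a minimal sketch that evaluates them in plain Python against a made-up record. It assumes the `md5` helper available in map expressions returns the hex digest of the UTF-8 encoded input; the `md5` function and the sample `record` below are illustrative stand-ins, not the SDK's implementation.

```python
import hashlib


def md5(value: str) -> str:
    """Stand-in for the `md5` helper used in map expressions (assumed behavior)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical inputs: the `stream_map_config` values and one source record.
config = {"hash_seed": "01AWZh7A6DzGm6iJZZ2T"}
record = {
    "customer_id": 1,
    "email": "ada@example.com",
    "owner_email": "ada@example.com",
}
owner_email = record["owner_email"]

# `email` is mapped to null, so it is dropped; undeclared properties pass through.
transformed = {name: value for name, value in record.items() if name != "email"}
transformed["email_domain"] = owner_email.split("@")[-1]
transformed["email_hash"] = md5(config["hash_seed"] + owner_email)

print(transformed)
```

Because the seed is concatenated before hashing, the same email always yields the same `email_hash` for a given seed, which is what makes the value usable for uniqueness checks.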

If map expressions should have access to special config, such as in the
one-way hash algorithm above, define those config arguments within the optional
@@ -197,26 +215,47 @@ The following logic is applied in determining the SCHEMA of the transformed stre
To remove a stream, declare the stream within `stream_maps` config and assign it the value
`null` (written as `__NULL__` in `meltano.yml`). For example:

```js
````{tab} meltano.yml
```yaml
stream_maps:
  # don't sync the stream called 'addresses'
  addresses: __NULL__
```
````

````{tab} JSON
```json
{
"stream_maps": {
"addresses": null // don't sync the stream called 'addresses'
},
"addresses": null
}
}
```
````

To remove a property, declare the property within the designated stream's map entry and
assign it the value `null` (written as `__NULL__` in `meltano.yml`). For example:

```js
````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    # don't sync the 'email' stream property
    email: __NULL__
```
````

````{tab} JSON
```json
{
"stream_maps": {
"customers": {
"email": null, // don't sync the 'email' stream property
"email": null
}
},
}
}
```
````

### Remove all undeclared streams or properties

@@ -230,43 +269,80 @@ below.

To remove all streams except the `customers` stream:

```js
````{tab} meltano.yml
```yaml
stream_maps:
  customers: {}
  __else__: __NULL__
```
````

````{tab} JSON
```json
{
"stream_maps": {
"customers": {},
"__else__": null
},
}
}
```
````

To remove all fields from the `customers` stream except `customer_id`:

```js
````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    customer_id: customer_id
    __else__: __NULL__
```
````

````{tab} JSON
```json
{
"stream_maps": {
"customers": {
"customer_id": "customer_id",
"__else__": null
},
},
}
}
}
```
````
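
The effect of `__else__` on a record's properties can be sketched in plain Python. This is an illustration of the behavior described above, not the SDK's code: the `apply_property_map` function is hypothetical, and it only handles expressions that are bare property names rather than evaluating full expressions.

```python
from typing import Any, Dict


def apply_property_map(record: Dict[str, Any], prop_map: Dict[str, Any]) -> Dict[str, Any]:
    """Illustrative only: apply a property map whose values are property names or None."""
    # A null `__else__` means every undeclared property is dropped.
    drop_undeclared = "__else__" in prop_map and prop_map["__else__"] is None
    result = {} if drop_undeclared else dict(record)
    for name, expression in prop_map.items():
        if name.startswith("__"):
            continue  # operations such as `__else__` are not properties
        if expression is None:
            result.pop(name, None)  # null removes the property
        else:
            # Simplification: treat the expression as a reference to a source property.
            result[name] = record[expression]
    return result


record = {"customer_id": 7, "email": "ada@example.com", "full_name": "Ada L."}
only_id = apply_property_map(record, {"customer_id": "customer_id", "__else__": None})
no_email = apply_property_map(record, {"email": None})
print(only_id)
print(no_email)
```

With `__else__` set to null only the declared `customer_id` survives; without it, undeclared properties pass through and only `email` is removed.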

### Unset or modify the stream's primary key behavior

To override the stream's default primary key properties, add the `__key_properties__` operation within the stream map definition.

```js
````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    # Remove the original Customer ID column
    customer_id: __NULL__
    # Add a new (and still unique) ID column
    customer_id_hashed: md5(customer_id)
    # Updated key to reflect the new name
    __key_properties__:
      - customer_id_hashed
```
````

````{tab} JSON
```json
{
"stream_maps": {
"customers": {
"customer_id": null, // Remove the original Customer ID column
"customer_id_hashed": "md5(customer_id)", // Add a new (and still unique) ID column
"__key_properties__": ["customer_id_hashed"] // Updated key to reflect the new name
},
},
"customer_id": null,
"customer_id_hashed": "md5(customer_id)",
"__key_properties__": ["customer_id_hashed"]
}
}
}
```
````

Notes:

@@ -278,6 +354,15 @@ Notes:
Some applications, such as multi-tenant, may benefit from adding a property with a hardcoded string literal value.
These values need to be wrapped in double quotes to differentiate them from property names:

````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    a_new_field: '\"client-123\"'
```
````

````{tab} JSON
```json
{
"stream_maps": {
@@ -287,6 +372,7 @@ These values need to be wrapped in double quotes to differentiate them from prop
}
}
```
````

#### Q: What is the difference between `primary_keys` and `key_properties`?

@@ -308,42 +394,69 @@ To alias a stream, simply add the operation `"__alias__": "new_name"` to the str
definition. For example, to alias the `customers` stream as `customers_v2`, use the
following:

```js
````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    __alias__: customers_v2
```
````

````{tab} JSON
```json
{
"stream_maps": {
"customers": {
"__alias__": "customers_v2"
},
},
}
}
}
```
````

## Duplicating or splitting a stream using `__source__`

To create a new stream as a copy of the original, specify the operation
`"__source__": "stream_name"`. For example, you can create a copy of the `customers` stream
which only contains PII properties using the following:

```js
````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    # Exclude these since we're capturing them in the pii stream
    email: __NULL__
    full_name: __NULL__
  customers_pii:
    __source__: customers
    # include just the PII and the customer_id
    customer_id: customer_id
    email: email
    full_name: full_name
    # exclude anything not declared
    __else__: __NULL__
```
````

````{tab} JSON
```json
{
"stream_maps": {
"customers": {
// exclude these since we're capturing them in the pii stream
"email": null,
"full_name": null
},
"customers_pii": {
"__source__": "customers",
// include just the PII and the customer_id
"customer_id": "customer_id",
"email": "email",
"full_name": "full_name",
// exclude anything not declared
"__else__": null,
},
},
"__else__": null
}
}
}
```
````

## Filtering out records from a stream using `__filter__` operation

@@ -352,15 +465,25 @@ The `__filter__` operation accept a string expression which must evaluate to `tr

For example, to only include customers with emails from the `example.com` company domain:

```js
````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    __filter__: email.endswith('@example.com')
```
````

````{tab} JSON
```json
{
"stream_maps": {
"customers": {
"__filter__": "email.endswith('@example.com')"
}
},
}
}
```
````
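
As a rough illustration of what this filter does to a stream, the sketch below applies the same boolean expression to a few made-up records in plain Python. The `records` list and the `keep` function are hypothetical and only mirror the expression; they are not how the SDK evaluates filters.

```python
# Hypothetical records from the `customers` stream.
records = [
    {"customer_id": 1, "email": "ada@example.com"},
    {"customer_id": 2, "email": "bob@other.org"},
    {"customer_id": 3, "email": "cy@example.com"},
]


def keep(record: dict) -> bool:
    """Mirror of the `__filter__` expression: True keeps the record, False drops it."""
    email = record["email"]
    return email.endswith("@example.com")


kept = [record for record in records if keep(record)]
print([record["customer_id"] for record in kept])
```

Records for which the expression is false are excluded from the output stream entirely; the remaining records are passed on unchanged.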

### Understanding Filters' Effects on Parent-Child Streams
