docs: Fix invalid JSON in Stream Maps page and add meltano.yml tabs #1756

Merged 13 commits on Jun 21, 2023
1 change: 1 addition & 0 deletions docs/conf.py
@@ -42,6 +42,7 @@
     "sphinx_copybutton",
     "myst_parser",
     "sphinx_reredirects",
+    "sphinx_inline_tabs",
 ]

 # Add any paths that contain templates here, relative to this directory.
195 changes: 159 additions & 36 deletions docs/stream_maps.md
@@ -107,23 +107,41 @@ The `stream_maps` config expects a mapping of stream names to a structured trans
 Here is a sample `stream_maps` transformation which removes all references to `email` and
 adds `email_domain` and `email_hash` as new properties:

-`config.json`:
+`meltano.yml` or `config.json`:

+````{tab} meltano.yml
+```yaml
+stream_maps:
+  # Apply these transforms to the stream called 'customers'
+  customers:
+    # drop the PII field from RECORD and SCHEMA messages
+    email: __NULL__
+    # capture just the email domain
+    email_domain: owner_email.split('@')[-1]
+    # for uniqueness checks
+    email_hash: md5(config['hash_seed'] + owner_email)
+stream_map_config:
+  # hash outputs are not able to be replicated without the original seed:
+  hash_seed: 01AWZh7A6DzGm6iJZZ2T
+```
+````
+
-```js
+````{tab} JSON
+```json
 {
     "stream_maps": {
-        "customers": { // Apply these transforms to the stream called 'customers'
-            "email": null, // drop the PII field from RECORD and SCHEMA messages
-            "email_domain": "owner_email.split('@')[-1]", // capture just the email domain
-            "email_hash": "md5(config['hash_seed'] + owner_email)", // for uniqueness checks
+        "customers": {
+            "email": null,
+            "email_domain": "owner_email.split('@')[-1]",
+            "email_hash": "md5(config['hash_seed'] + owner_email)"
         }
     },
     "stream_map_config": {
-        // hash outputs are not able to be replicated without the original seed:
         "hash_seed": "01AWZh7A6DzGm6iJZZ2T"
     }
 }
 ```
+````
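As a rough illustration of what the two expressions in the sample compute, the sketch below evaluates them in plain Python. The `md5` helper is a stand-in written for this example (assumed to return the hex digest of the UTF-8 encoded string), and the `owner_email` value is made up; none of this is the SDK's own code.

```python
import hashlib


def md5(value: str) -> str:
    """Stand-in for the `md5` helper used in the map expression (assumption)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical inputs: the seed comes from `stream_map_config`, the email from a record.
config = {"hash_seed": "01AWZh7A6DzGm6iJZZ2T"}
owner_email = "jane@example.com"

# Same expressions as in the `stream_maps` sample above.
email_domain = owner_email.split("@")[-1]
email_hash = md5(config["hash_seed"] + owner_email)

print(email_domain)     # example.com
print(len(email_hash))  # 32
```

Prepending the seed means the hash cannot be reproduced from the email alone, which is the point of the `hash_seed` comment in the sample.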

 If map expressions should have access to special config, such as in the
 one-way hash algorithm above, define those config arguments within the optional
@@ -197,26 +215,47 @@ The following logic is applied in determining the SCHEMA of the transformed stre
 To remove a stream, declare the stream within `stream_maps` config and assign it the value
 `null`. For example:

-```js
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  # don't sync the stream called 'addresses'
+  addresses: __NULL__
+```
+````
+
+````{tab} JSON
+```json
 {
     "stream_maps": {
-        "addresses": null // don't sync the stream called 'addresses'
-    },
+        "addresses": null
+    }
 }
 ```
+````

 To remove a property, declare the property within the designated stream's map entry and
 assign it the value `null`. For example:

-```js
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  customers:
+    # don't sync the 'email' stream property
+    email: __NULL__
+```
+````
+
+````{tab} JSON
+```json
 {
     "stream_maps": {
         "customers": {
-            "email": null, // don't sync the 'email' stream property
+            "email": null
         }
-    },
+    }
 }
 ```
+````
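The effect of a `null` property mapping can be pictured as removing the key from both the stream's schema and each record. The following is a minimal sketch under that assumption, using a JSON-Schema-style `properties` dict; `drop_property` is a hypothetical helper for illustration, not an SDK function.

```python
def drop_property(schema: dict, record: dict, name: str) -> tuple[dict, dict]:
    """Return copies of `schema` and `record` without the property `name`."""
    properties = {k: v for k, v in schema.get("properties", {}).items() if k != name}
    new_schema = {**schema, "properties": properties}
    new_record = {k: v for k, v in record.items() if k != name}
    return new_schema, new_record


# Hypothetical schema and record for a 'customers' stream.
schema = {"properties": {"customer_id": {"type": "integer"}, "email": {"type": "string"}}}
record = {"customer_id": 1, "email": "jane@example.com"}

new_schema, new_record = drop_property(schema, record, "email")
print(new_record)                      # {'customer_id': 1}
print(list(new_schema["properties"]))  # ['customer_id']
```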

 ### Remove all undeclared streams or properties

@@ -230,43 +269,80 @@ below.

 To remove all streams except the `customers` stream:

-```js
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  customers: {}
+  __else__: __NULL__
+```
+````
+
+````{tab} JSON
+```json
 {
     "stream_maps": {
         "customers": {},
         "__else__": null
-    },
+    }
 }
 ```
+````

 To remove all fields from the `customers` stream except `customer_id`:

-```js
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  customers:
+    customer_id: customer_id
+    __else__: __NULL__
+```
+````
+
+````{tab} JSON
+```json
 {
     "stream_maps": {
         "customers": {
             "customer_id": "customer_id",
             "__else__": null
-        },
-    },
+        }
+    }
 }
 ```
+````
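One way to read the `__else__` rule is "keep only what was declared". The sketch below applies that reading to a single record. It is deliberately simplified: `keep_declared` is a hypothetical helper, it only handles identity mappings such as `customer_id: customer_id`, and it does not evaluate expressions.

```python
def keep_declared(record: dict, stream_map: dict) -> dict:
    """Apply a simplified `__else__: null` rule to one record."""
    # Keys wrapped in double underscores are operations, not properties.
    declared = [key for key in stream_map if not key.startswith("__")]
    if "__else__" in stream_map and stream_map["__else__"] is None:
        return {key: record[key] for key in declared if key in record}
    # No `__else__` rule: undeclared properties pass through unchanged.
    return dict(record)


# JSON `null` becomes `None` once the config is parsed.
stream_map = {"customer_id": "customer_id", "__else__": None}
record = {"customer_id": 1, "email": "jane@example.com", "full_name": "Jane Doe"}

print(keep_declared(record, stream_map))  # {'customer_id': 1}
```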

 ### Unset or modify the stream's primary key behavior

 To override the stream's default primary key properties, add the `__key_properties__` operation within the stream map definition.

-```js
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  customers:
+    # Remove the original Customer ID column
+    customer_id: __NULL__
+    # Add a new (and still unique) ID column
+    customer_id_hashed: md5(customer_id)
+    # Updated key to reflect the new name
+    __key_properties__:
+      - customer_id_hashed
+```
+````
+
+````{tab} JSON
+```json
 {
     "stream_maps": {
         "customers": {
-            "customer_id": null, // Remove the original Customer ID column
-            "customer_id_hashed": "md5(customer_id)", // Add a new (and still unique) ID column
-            "__key_properties__": ["customer_id_hashed"] // Updated key to reflect the new name
-        },
-    },
+            "customer_id": null,
+            "customer_id_hashed": "md5(customer_id)",
+            "__key_properties__": ["customer_id_hashed"]
+        }
+    }
 }
 ```
+````

 Notes:

@@ -278,6 +354,15 @@ Notes:
 Some applications, such as multi-tenant, may benefit from adding a property with a hardcoded string literal value.
 These values need to be wrapped in double quotes to differentiate them from property names:

+````{tab} meltano.yml
+```yaml
+stream_maps:
+  customers:
+    a_new_field: '"client-123"'
+```
+````
+
+````{tab} JSON
 ```json
 {
     "stream_maps": {
@@ -287,6 +372,7 @@ These values need to be wrapped in double quotes to differentiate them from prop
     }
 }
 ```
+````
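The need for the inner double quotes follows from how a Python-style expression is parsed. The sketch below uses plain `eval` with an empty builtins table purely as a stand-in for the SDK's expression evaluator, and the record is hypothetical.

```python
# Hypothetical record; its keys play the role of property names.
record = {"customer_id": 7, "name": "Ada"}
no_builtins = {"__builtins__": {}}

as_property = eval("name", no_builtins, record)         # bare word: property lookup
as_literal = eval('"client-123"', no_builtins, record)  # inner quotes: string literal

print(as_property)  # Ada
print(as_literal)   # client-123

try:
    # Without the inner quotes this parses as the subtraction `client - 123`.
    eval("client-123", no_builtins, record)
except NameError as exc:
    print(f"NameError: {exc}")
```

In YAML, wrapping the double-quoted value in single quotes (`'"client-123"'`) keeps the inner quotes as part of the string, matching the JSON form `"\"client-123\""`.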

 #### Q: What is the difference between `primary_keys` and `key_properties`?

@@ -308,42 +394,69 @@ To alias a stream, simply add the operation `"__alias__": "new_name"` to the str
 definition. For example, to alias the `customers` stream as `customers_v2`, use the
 following:

-```js
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  customers:
+    __alias__: customers_v2
+```
+````
+
+````{tab} JSON
+```json
 {
     "stream_maps": {
         "customers": {
             "__alias__": "customers_v2"
-        },
-    },
+        }
+    }
 }
 ```
+````

 ## Duplicating or splitting a stream using `__source__`

 To create a new stream as a copy of the original, specify the operation
 `"__source__": "stream_name"`. For example, you can create a copy of the `customers` stream
 which only contains PII properties using the following:

-```js
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  customers:
+    # Exclude these since we're capturing them in the pii stream
+    email: __NULL__
+    full_name: __NULL__
+  customers_pii:
+    __source__: customers
+    # include just the PII and the customer_id
+    customer_id: customer_id
+    email: email
+    full_name: full_name
+    # exclude anything not declared
+    __else__: __NULL__
+```
+````
+
+````{tab} JSON
+```json
 {
     "stream_maps": {
         "customers": {
-            // exclude these since we're capturing them in the pii stream
             "email": null,
             "full_name": null
         },
         "customers_pii": {
             "__source__": "customers",
-            // include just the PII and the customer_id
             "customer_id": "customer_id",
             "email": "email",
             "full_name": "full_name",
-            // exclude anything not declared
-            "__else__": null,
-        },
-    },
+            "__else__": null
+        }
+    }
 }
 ```
+````
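The net effect of this configuration on a single source record can be sketched as follows. `split_customer` and the extra `plan` field are hypothetical and only mirror the example above; the sketch says nothing about how the SDK implements `__source__` internally.

```python
PII_FIELDS = ("email", "full_name")


def split_customer(record: dict) -> tuple[dict, dict]:
    """Return the (customers, customers_pii) versions of one source record."""
    # 'customers': everything except the PII fields.
    customers = {k: v for k, v in record.items() if k not in PII_FIELDS}
    # 'customers_pii': only the key plus the PII fields (the `__else__: null` rule).
    customers_pii = {k: record[k] for k in ("customer_id", *PII_FIELDS) if k in record}
    return customers, customers_pii


record = {"customer_id": 1, "email": "jane@example.com", "full_name": "Jane Doe", "plan": "pro"}
customers, customers_pii = split_customer(record)

print(customers)      # {'customer_id': 1, 'plan': 'pro'}
print(customers_pii)  # {'customer_id': 1, 'email': 'jane@example.com', 'full_name': 'Jane Doe'}
```

Keeping `customer_id` in both outputs is what allows the two streams to be joined again downstream.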

 ## Filtering out records from a stream using `__filter__` operation

@@ -352,15 +465,25 @@ The `__filter__` operation accept a string expression which must evaluate to `tr

 For example, to only include customers with emails from the `example.com` company domain:

-```js
+````{tab} meltano.yml
+```yaml
+stream_maps:
+  customers:
+    __filter__: email.endswith('@example.com')
+```
+````
+
+````{tab} JSON
+```json
 {
     "stream_maps": {
         "customers": {
             "__filter__": "email.endswith('@example.com')"
         }
-    },
+    }
 }
 ```
+````
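The filter expression is ordinary Python string logic applied to each record. A minimal sketch with made-up records, where `passes_filter` is a hypothetical helper that hardcodes the expression from the example:

```python
def passes_filter(record: dict) -> bool:
    """Python equivalent of the expression email.endswith('@example.com')."""
    return record["email"].endswith("@example.com")


# Hypothetical records from a 'customers' stream.
records = [
    {"customer_id": 1, "email": "jane@example.com"},
    {"customer_id": 2, "email": "joe@other.org"},
]

kept = [r for r in records if passes_filter(r)]
print([r["customer_id"] for r in kept])  # [1]
```

Including the `@` in the suffix avoids matching look-alike domains such as `notexample.com`.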

 ### Understanding Filters' Effects on Parent-Child Streams
