docs: Fix invalid JSON in Stream Maps page and add meltano.yml tabs #1756

Merged
merged 13 commits into from
Jun 21, 2023
Add meltano.yml versions for all stream configs
- Also removed trailing commas in list/dict objects to make the JSON syntactically valid
- Removed all comments from JSON and added to meltano.yml tab
mjsqu authored Jun 8, 2023
commit 4efe398e392c462d5b0fa047675f91504f749991
`docs/stream_maps.md`: 177 changes (142 additions, 35 deletions)
@@ -107,23 +107,41 @@ The `stream_maps` config expects a mapping of stream names to a structured trans
Here is a sample `stream_maps` transformation which removes all references to `email` and
adds `email_domain` and `email_hash` as new properties:

`config.json` or `meltano.yml`:

````{tab} JSON
```json
{
  "stream_maps": {
    "customers": {
      "email": null,
      "email_domain": "owner_email.split('@')[-1]",
      "email_hash": "md5(config['hash_seed'] + owner_email)"
    }
  },
  "stream_map_config": {
    "hash_seed": "01AWZh7A6DzGm6iJZZ2T"
  }
}
```
````

````{tab} meltano.yml
```yaml
stream_maps:
  # Apply these transforms to the stream called 'customers'
  customers:
    # drop the PII field from RECORD and SCHEMA messages
    email: null
    # capture just the email domain
    email_domain: owner_email.split('@')[-1]
    # for uniqueness checks
    email_hash: md5(config['hash_seed'] + owner_email)
stream_map_config:
  # hash outputs are not able to be replicated without the original seed:
  hash_seed: 01AWZh7A6DzGm6iJZZ2T
```
````
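A quick way to check this example is a plain-Python sketch. It parses the JSON tab with the standard `json` module, which rejects the comments and trailing commas removed in this change, and then hand-codes the three map expressions for one record. It is not the SDK's implementation: `transform_customer` and the sample record are hypothetical, and the sketch assumes that `md5()` in a map expression returns the MD5 hex digest of its UTF-8 encoded string argument and that `config` in an expression refers to `stream_map_config`.

```python
import hashlib
import json

# The JSON tab above, verbatim: json.loads() would fail on comments or trailing commas.
CONFIG_JSON = """
{
  "stream_maps": {
    "customers": {
      "email": null,
      "email_domain": "owner_email.split('@')[-1]",
      "email_hash": "md5(config['hash_seed'] + owner_email)"
    }
  },
  "stream_map_config": {
    "hash_seed": "01AWZh7A6DzGm6iJZZ2T"
  }
}
"""

config = json.loads(CONFIG_JSON)


def md5(value: str) -> str:
    # Assumption: md5() in a map expression is the hex digest of the UTF-8 string.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def transform_customer(record: dict, map_config: dict) -> dict:
    """Hypothetical hand-written equivalent of the 'customers' map."""
    out = dict(record)
    out.pop("email", None)  # "email": null drops the property
    owner_email = record["owner_email"]
    out["email_domain"] = owner_email.split("@")[-1]
    out["email_hash"] = md5(map_config["hash_seed"] + owner_email)
    return out


record = {"id": 1, "email": "jane@example.com", "owner_email": "jane@example.com"}
result = transform_customer(record, config["stream_map_config"])
print(result["email_domain"])  # example.com
```

Because the hash is seeded from `stream_map_config`, the same `owner_email` hashes to a different value under a different seed, which is what makes the output hard to reverse without the seed.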

If map expressions should have access to special config, such as in the
one-way hash algorithm above, define those config arguments within the optional
@@ -197,26 +215,47 @@ The following logic is applied in determining the SCHEMA of the transformed stre
To remove a stream, declare the stream within `stream_maps` config and assign it the value
`null`. For example:

````{tab} JSON
```json
{
  "stream_maps": {
    "addresses": null
  }
}
```
````

````{tab} meltano.yml
```yaml
stream_maps:
  # don't sync the stream called 'addresses'
  addresses: null
```
````
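The effect of a `null` stream entry can be sketched as a filter over Singer messages. This is an illustration under stated assumptions, not the SDK's code: messages are modelled as plain dicts with `type` and `stream` keys (as in the Singer spec), and `keep_message` is a hypothetical helper.

```python
import json

CONFIG = json.loads('{"stream_maps": {"addresses": null}}')


def keep_message(message: dict, stream_maps: dict) -> bool:
    """Hypothetical helper: drop messages for streams explicitly mapped to null."""
    stream = message.get("stream")
    return not (stream in stream_maps and stream_maps[stream] is None)


messages = [
    {"type": "SCHEMA", "stream": "addresses"},
    {"type": "RECORD", "stream": "addresses", "record": {"id": 1}},
    {"type": "RECORD", "stream": "customers", "record": {"id": 7}},
]
kept = [m for m in messages if keep_message(m, CONFIG["stream_maps"])]
print([m["stream"] for m in kept])  # ['customers']
```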

To remove a property, declare the property within the designated stream's map entry and
assign it the value `null`. For example:

````{tab} JSON
```json
{
  "stream_maps": {
    "customers": {
      "email": null
    }
  }
}
```
````

````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    # don't sync the 'email' stream property
    email: null
```
````
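Because the removal applies to both RECORD and SCHEMA messages, a sketch has to touch both. The hypothetical `drop_properties` helper below assumes the SCHEMA payload is a JSON Schema object with a `properties` mapping; it is illustrative only and ignores details such as `required` lists.

```python
import copy


def drop_properties(schema: dict, record: dict, dropped: set) -> tuple:
    """Remove the listed properties from a JSON Schema and from a record."""
    new_schema = copy.deepcopy(schema)  # leave the caller's schema untouched
    for name in dropped:
        new_schema["properties"].pop(name, None)
    new_record = {k: v for k, v in record.items() if k not in dropped}
    return new_schema, new_record


# Properties mapped to None ("email": null in the JSON tab) are the ones to drop.
customers_map = {"email": None}
dropped = {name for name, expr in customers_map.items() if expr is None}

schema = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "email": {"type": "string"}},
}
record = {"id": 7, "email": "jane@example.com"}
new_schema, new_record = drop_properties(schema, record, dropped)
print(list(new_schema["properties"]), new_record)  # ['id'] {'id': 7}
```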

### Remove all undeclared streams or properties

@@ -230,43 +269,80 @@ below.

To remove all streams except the `customers` stream:

````{tab} JSON
```json
{
  "stream_maps": {
    "customers": {},
    "__else__": null
  }
}
```
````

````{tab} meltano.yml
```yaml
stream_maps:
  customers: {}
  __else__: null
```
````
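The stream-level `__else__` rule amounts to a selection over the tap's stream names. The hypothetical `selected_streams` function below is a simplified model of that rule, not the SDK's logic; the stream names `orders` and `addresses` are made up for the example.

```python
def selected_streams(all_streams: list, stream_maps: dict) -> list:
    """Keep declared, non-null streams; drop undeclared ones if "__else__" is null."""
    drop_undeclared = "__else__" in stream_maps and stream_maps["__else__"] is None
    result = []
    for name in all_streams:
        if name in stream_maps:
            if stream_maps[name] is not None:
                result.append(name)
        elif not drop_undeclared:
            result.append(name)  # undeclared streams pass through by default
    return result


stream_maps = {"customers": {}, "__else__": None}
print(selected_streams(["customers", "orders", "addresses"], stream_maps))  # ['customers']
```

Without the `"__else__": null` entry, the same call would keep all three streams.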

To remove all fields from the `customers` stream except `customer_id`:

````{tab} JSON
```json
{
  "stream_maps": {
    "customers": {
      "customer_id": "customer_id",
      "__else__": null
    }
  }
}
```
````

````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    customer_id: customer_id
    __else__: null
```
````
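At the property level, `"__else__": null` switches the default for undeclared properties from pass-through to drop. The sketch below models that with a hypothetical `apply_property_map` function; as a simplification it only supports expressions that name an existing property, such as `customer_id`, rather than evaluating arbitrary expressions.

```python
def apply_property_map(record: dict, stream_map: dict) -> dict:
    """Apply a stream map whose expressions are plain property names."""
    drop_undeclared = "__else__" in stream_map and stream_map["__else__"] is None
    # Undeclared properties pass through unless "__else__": null is present.
    out = {} if drop_undeclared else dict(record)
    for name, expr in stream_map.items():
        if name.startswith("__"):
            continue  # operations such as __else__ are not output properties
        if expr is None:
            out.pop(name, None)
        else:
            out[name] = record[expr]  # simplification: expr names a property
    return out


record = {"customer_id": 42, "email": "jane@example.com", "full_name": "Jane Doe"}
print(apply_property_map(record, {"customer_id": "customer_id", "__else__": None}))
# {'customer_id': 42}
```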

### Unset or modify the stream's primary key behavior

To override the stream's default primary key properties, add the `__key_properties__` operation within the stream map definition.

````{tab} JSON
```json
{
  "stream_maps": {
    "customers": {
      "customer_id": null,
      "customer_id_hashed": "md5(customer_id)",
      "__key_properties__": ["customer_id_hashed"]
    }
  }
}
```
````

````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    # Remove the original Customer ID column
    customer_id: null
    # Add a new (and still unique) ID column
    customer_id_hashed: md5(customer_id)
    # Updated key to reflect the new name
    __key_properties__:
      - customer_id_hashed
```
````
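To see how the three entries interact, here is a hand-written sketch that applies them to a trimmed-down SCHEMA message and one record. `remap_customer` and the sample data are hypothetical, and `md5()` is assumed to return the hex digest of a UTF-8 string, so the sample uses a string `customer_id`.

```python
import hashlib


def md5(value: str) -> str:
    # Assumption: md5() returns the hex digest of the UTF-8 encoded string.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def remap_customer(schema_msg: dict, record: dict) -> tuple:
    """Hypothetical equivalent of the map above: replace the ID, update the key."""
    # __key_properties__ overrides the key declared by the tap.
    new_schema_msg = dict(schema_msg, key_properties=["customer_id_hashed"])
    # "customer_id": null drops the original column...
    new_record = {k: v for k, v in record.items() if k != "customer_id"}
    # ...and "customer_id_hashed" is computed from it before it disappears.
    new_record["customer_id_hashed"] = md5(record["customer_id"])
    return new_schema_msg, new_record


schema_msg = {"type": "SCHEMA", "stream": "customers", "key_properties": ["customer_id"]}
record = {"customer_id": "C-001", "name": "Jane Doe"}
new_schema_msg, new_record = remap_customer(schema_msg, record)
print(new_schema_msg["key_properties"])  # ['customer_id_hashed']
```

The new key stays unique only because MD5 maps distinct IDs to distinct digests in practice; the sketch does not check for collisions.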

Notes:

@@ -278,6 +354,7 @@ Notes:
Some applications, such as multi-tenant ones, may benefit from adding a property with a hardcoded string literal value.
These values need to be wrapped in double quotes to differentiate them from property names:

````{tab} JSON
```json
{
"stream_maps": {
@@ -287,6 +364,15 @@ These values need to be wrapped in double quotes to differentiate them from prop
}
}
```
````

````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    a_new_field: '"client-123"'
```
````
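The double quotes matter because each value is treated as an expression rather than as plain text. Assuming map expressions are parsed with Python expression syntax (the examples on this page use `.split()`, `[-1]`, and `.endswith()`), the standard `ast` module shows the difference: the quoted form is a string literal, while the bare form parses as the name `client` minus `123`. The `json.loads` step shows how a JSON string with escaped inner quotes decodes to the quoted expression text.

```python
import ast
import json

# A JSON value with escaped inner quotes decodes to the expression text "client-123".
expr_text = json.loads('{"a_new_field": "\\"client-123\\""}')["a_new_field"]
print(expr_text)  # "client-123"

# Evaluated as an expression, the quoted text is a string literal...
print(ast.literal_eval(expr_text))  # client-123

# ...while the unquoted text parses as a subtraction: the name `client` minus 123.
tree = ast.parse("client-123", mode="eval")
print(type(tree.body).__name__)  # BinOp
```

In YAML, single quotes keep backslashes literal, so wrapping the double-quoted literal in single quotes (`'"client-123"'`) yields the same expression text without any escaping.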

#### Q: What is the difference between `primary_keys` and `key_properties`?

@@ -308,15 +394,26 @@ To alias a stream, simply add the operation `"__alias__": "new_name"` to the str
definition. For example, to alias the `customers` stream as `customers_v2`, use the
following:

````{tab} JSON
```json
{
  "stream_maps": {
    "customers": {
      "__alias__": "customers_v2"
    }
  }
}
```
````

````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    __alias__: customers_v2
```
````
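Aliasing changes the stream name carried by each message. The hypothetical `alias_stream` helper below sketches that on dict-shaped Singer messages; it does not model anything else the SDK may do when renaming a stream.

```python
def alias_stream(messages: list, stream_maps: dict) -> list:
    """Rename the 'stream' field of messages whose map declares __alias__."""
    out = []
    for message in messages:
        stream_map = stream_maps.get(message.get("stream")) or {}
        alias = stream_map.get("__alias__")
        # Build a new dict so the input messages are not mutated.
        out.append(dict(message, stream=alias) if alias else message)
    return out


stream_maps = {"customers": {"__alias__": "customers_v2"}}
messages = [
    {"type": "SCHEMA", "stream": "customers"},
    {"type": "RECORD", "stream": "customers", "record": {"id": 1}},
]
print([m["stream"] for m in alias_stream(messages, stream_maps)])
# ['customers_v2', 'customers_v2']
```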


## Duplicating or splitting a stream using `__source__`

@@ -337,9 +434,9 @@ which only contains PII properties using the following:
      "customer_id": "customer_id",
      "email": "email",
      "full_name": "full_name",
      "__else__": null
    }
  }
}
```
````
@@ -349,16 +446,16 @@ which only contains PII properties using the following:
stream_maps:
  customers:
    # Exclude these since we're capturing them in the pii stream
    email: null
    full_name: null
  customers_pii:
    __source__: customers
    # include just the PII and the customer_id
    customer_id: customer_id
    email: email
    full_name: full_name
    # exclude anything not declared
    __else__: null
```
````
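The `__source__` example produces two output streams from one input record. The sketch below models that fan-out with a hypothetical `split_record` function. It handles only streams declared in `stream_maps` and only expressions that name an existing property, so treat it as an illustration of the rules above rather than the SDK's behavior; the `plan` property is made up for the example.

```python
def split_record(stream: str, record: dict, stream_maps: dict) -> dict:
    """Return {output_stream: record} for every declared map that reads `stream`."""
    outputs = {}
    for out_name, stream_map in stream_maps.items():
        if stream_map is None or out_name.startswith("__"):
            continue  # removed streams and top-level operations produce nothing
        # A map reads from its own name unless __source__ points elsewhere.
        if stream_map.get("__source__", out_name) != stream:
            continue
        drop_undeclared = "__else__" in stream_map and stream_map["__else__"] is None
        out = {} if drop_undeclared else dict(record)
        for prop, expr in stream_map.items():
            if prop.startswith("__"):
                continue
            if expr is None:
                out.pop(prop, None)
            else:
                out[prop] = record[expr]  # simplification: expr names a property
        outputs[out_name] = out
    return outputs


stream_maps = {
    "customers": {"email": None, "full_name": None},
    "customers_pii": {
        "__source__": "customers",
        "customer_id": "customer_id",
        "email": "email",
        "full_name": "full_name",
        "__else__": None,
    },
}
record = {"customer_id": 42, "email": "jane@example.com", "full_name": "Jane Doe", "plan": "pro"}
for name, out in split_record("customers", record, stream_maps).items():
    print(name, out)
# customers {'customer_id': 42, 'plan': 'pro'}
# customers_pii {'customer_id': 42, 'email': 'jane@example.com', 'full_name': 'Jane Doe'}
```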

@@ -369,15 +466,25 @@ The `__filter__` operation accepts a string expression which must evaluate to `tr

For example, to only include customers with emails from the `example.com` company domain:

````{tab} JSON
```json
{
  "stream_maps": {
    "customers": {
      "__filter__": "email.endswith('@example.com')"
    }
  }
}
```
````

````{tab} meltano.yml
```yaml
stream_maps:
  customers:
    __filter__: email.endswith('@example.com')
```
````
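A filter keeps a record only when its expression is true. In the sketch below, the `__filter__` expression is hand-translated into a Python predicate instead of being evaluated from the string; `apply_filter`, `example_domain_only`, and the sample records are hypothetical.

```python
from typing import Callable


def apply_filter(records: list, predicate: Callable[[dict], bool]) -> list:
    """Keep only the records for which the predicate is true."""
    return [r for r in records if predicate(r)]


def example_domain_only(record: dict) -> bool:
    # Hand-translated from "email.endswith('@example.com')"; `email` is a property.
    return record["email"].endswith("@example.com")


records = [
    {"id": 1, "email": "jane@example.com"},
    {"id": 2, "email": "bob@other.org"},
]
print([r["id"] for r in apply_filter(records, example_domain_only)])  # [1]
```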

### Understanding Filters' Effects on Parent-Child Streams