Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Record flattening converts objects to strings at the max depth #1400

Closed
edgarrmondragon opened this issue Feb 8, 2023 · 4 comments · Fixed by #1939
Closed

bug: Record flattening converts objects to strings at the max depth #1400

edgarrmondragon opened this issue Feb 8, 2023 · 4 comments · Fixed by #1939

Comments

@edgarrmondragon
Copy link
Collaborator

Singer SDK Version

0.19.0

Python Version

NA

Bug scope

Taps (catalog, state, stream maps, etc.)

Operating System

NA

Description

With config

{
  "flattening_enabled": true,
  "flattening_max_depth": 1
}

This record

{
    "type": "RECORD",
    "stream": "mystream",
    "record": {
        "email": "[email protected]",
        "count": 21,
        "user": {
            "id": 1,
            "sub": {
                "num": 1
            }
        }
    },
    "time_extracted": "2022-01-01T00:00:00+00:00"
}

is flattened into

{
    "type": "RECORD",
    "stream": "mystream",
    "record": {
        "email": "[email protected]",
        "count": 21,
        "user__id": 1,
        "user__sub": "{\"num\": 1}"
    },
    "time_extracted": "2022-01-01T00:00:00+00:00"
}

Note how user__sub is a string. The type of object is preserved in the schema message, though.

See the snapshots:

{"type": "SCHEMA", "stream": "mystream", "schema": {"properties": {"email": {"type": ["string", "null"]}, "count": {"type": ["integer", "null"]}, "user__id": {"type": ["integer", "null"]}, "user__sub": {"properties": {"num": {"type": ["integer", "null"]}}, "type": ["object", "null"]}}, "type": "object"}, "key_properties": []}

{"type": "RECORD", "stream": "mystream", "record": {"email": "[email protected]", "count": 21, "user__id": 1, "user__sub": "{\"num\": 1}"}, "time_extracted": "2022-01-01T00:00:00+00:00"}

Code

No response

@aaronsteers
Copy link
Contributor

aaronsteers commented Feb 8, 2023

@edgarrmondragon - Definitely a bug, since transformed node data doesn't match the outputted SCHEMA message.

Do you know if pipelinewise or other taps/targets will treat this as stringified JSON (as we've done with the record data) or as a JSON object (as we've done with the SCHEMA message)? Perhaps this feature primarily exists in the target layer, where that distinction is less of an issue?

Since targets can and should be able to serialize and/or stringify object-typed top level properties, I think I lean towards not stringifying the object, but instead passing along as an object - in line with what you've written above in the bug report. (Targets then would decide whether to stringify or store as an object/json column.

@edgarrmondragon
Copy link
Collaborator Author

Do you know if pipelinewise or other taps/targets will treat this as stringified JSON (as we've done with the record data) or as a JSON object (as we've done with the SCHEMA message)? Perhaps this feature primarily exists in the target layer, where that distinction is less of an issue?

@aaronsteers I only tested this with target-sqlite and it fails during the JSON schema validation, and I guess other targets behave similarly.

I think I lean towards not stringifying the object, but instead passing along as an object - in line with what you've written above in the bug report. (Targets then would decide whether to stringify or store as an object/json column.

I agree 👍

@stale
Copy link

stale bot commented Jul 18, 2023

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

@stale stale bot added the stale label Jul 18, 2023
@edgarrmondragon
Copy link
Collaborator Author

Still relevant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants