Fixed KeyError 'type' for a plain string array #72

Open
wants to merge 1 commit into
base: master

spacecowboy
Contributor

@spacecowboy spacecowboy commented Feb 8, 2022

Problem

https://github.com/transferwise/pipelinewise-tap-postgres
generates the following schema for a character varying[] database column: {"type": ["null", "array"], "items": {}}

This crashes the loader with a KeyError: 'type' in

def is_unstructured_object(props):
    """Check if property is object and it has no properties."""
    return 'object' in props['type'] and not props.get('properties')

which was called from here

                    # dump array elements to strings
                    elif (
                        'array' in props['type'] and
                        is_unstructured_object(props.get('items', {}))
                    ):
                        result[name] = [json.dumps(value) for value in flatten[name]]
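A minimal sketch that reproduces the crash outside the loader. It copies is_unstructured_object from above and feeds it the empty items dict from the schema quoted earlier; the variable names props and error are only for illustration.

```python
def is_unstructured_object(props):
    """Check if property is object and it has no properties."""
    return 'object' in props['type'] and not props.get('properties')


# Schema as generated for the character varying[] column.
props = {"type": ["null", "array"], "items": {}}

error = None
try:
    # Same call the array branch makes: "items" is {}, so there is no 'type' key.
    is_unstructured_object(props.get('items', {}))
except KeyError as exc:
    error = exc

print(repr(error))
```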

Proposed changes

The fix ensures that a plain array ends up in the correct code branch for type conversion. Currently it would end up in the wrong branch even if the crash did not happen.
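To illustrate the idea, here is a hypothetical guard, not necessarily the exact change in this PR's commit. It assumes that an items schema without a 'type' key should be treated as "not an unstructured object", so a plain array skips the json.dumps branch.

```python
def is_unstructured_object(props):
    """Check if property is object and it has no properties.

    Assumption for this sketch: a schema with no 'type' key is not an object.
    """
    return 'object' in props.get('type', []) and not props.get('properties')


# Empty items schema from the plain string array: no crash, not an object.
plain_items = {}
# An untyped object inside an array should still be dumped to a JSON string.
object_items = {"type": ["null", "object"]}

print(is_unstructured_object(plain_items))
print(is_unstructured_object(object_items))
```

With this guard the plain string array falls through to whichever branch handles ordinary arrays, instead of raising or being serialized element by element.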

Types of changes

What types of changes does your code introduce to target-bigquery?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

@jmriego
Owner

jmriego commented Feb 11, 2022

Thanks for all the details! Do you mind adding some testing for this data type?
Also, just out of curiosity, what would be an example table DDL in Postgres that causes this so I don't misinterpret it?

@spacecowboy
Contributor Author

character varying[]

As mentioned, a Postgres column of type character varying[] will cause this to be generated.

This is probably related to my running your tap-postgres through Meltano. tap-postgres adds a $ref entry to the schema, but Meltano appears to filter it out. At least it doesn't show up here.

I've since solved it by overriding the schema manually.

@jmriego
Owner

jmriego commented Feb 11, 2022

I guess it's worth checking whether Meltano or PipelineWise is using a wrong schema for that, or whether either of those two schema forms should actually be written into BigQuery as an array of strings.
