Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for anyOf in property type declarations #86

Closed
visch opened this issue Jan 23, 2023 · 6 comments · Fixed by #158
Closed

Support for anyOf in property type declarations #86

visch opened this issue Jan 23, 2023 · 6 comments · Fixed by #158
Assignees
Labels
bug Something isn't working

Comments

@visch
Copy link
Member

visch commented Jan 23, 2023

Need a test for this (Check the Standard Target tests in the SDK to see if we have one)

https://meltano.slack.com/archives/C01TCRBBJD7/p1673947888707009?thread_ts=1673874691.323399&cid=C01TCRBBJD7

{
   "type":"SCHEMA",
   "stream":"commits",
   "schema":{
      "properties":{
         "id":{
            "type":[
               "null",
               "string"
            ]
         },
         "project_id":{
            "type":[
               "null",
               "integer"
            ]
         },
         "short_id":{
            "type":[
               "null",
               "string"
            ]
         },
         "title":{
            "type":[
               "null",
               "string"
            ]
         },
         "author_name":{
            "type":[
               "null",
               "string"
            ]
         },
         "author_email":{
            "type":[
               "null",
               "string"
            ]
         },
         "authored_date":{
            "anyOf":[  // <<<< nullable datetime
               {
                  "type":"string",
                  "format":"date-time"
               },
               {
                  "type":"null"
               }
            ]
         },
         "committer_name":{
            "type":[
               "null",
               "string"
            ]
         },
         "committer_email":{
            "type":[
               "null",
               "string"
            ]
         },
         "committed_date":{
            "anyOf":[  // <<<< nullable datetime
               {
                  "type":"string",
                  "format":"date-time"
               },
               {
                  "type":"null"
               }
            ]
         },
         "created_at":{
            "anyOf":[  // <<<< nullable datetime
               {
                  "type":"string",
                  "format":"date-time"
               },
               {
                  "type":"null"
               }
            ]
         },
         "message":{
            "type":[
               "null",
               "string"
            ]
         },
         "allow_failure":{
            "type":[
               "null",
               "boolean"
            ]
         },
         "parent_ids":{
            "anyOf":[  // <<<< nullable array of strings
               {
                  "type":"array",
                  "items":{
                     "type":[
                        "null",
                        "string"
                     ]
                  }
               },
               {
                  "type":"null"
               }
            ]
         },
         "stats":{
            "properties":{
               "additions":{
                  "type":[
                     "null",
                     "integer"
                  ]
               },
               "deletions":{
                  "type":[
                     "null",
                     "integer"
                  ]
               },
               "total":{
                  "type":[
                     "null",
                     "integer"
                  ]
               }
            },
            "type":"object"
         }
      },
      "type":"object"
   },
   "key_properties":[
      "id"
   ]
}

Doesn't seem to be working.

Same as above but in a testable format

{"type": "SCHEMA", "stream": "commits", "schema": {"properties": {"id": {"type": ["null", "string"]}, "project_id": {"type": ["null", "integer"]}, "short_id": {"type": ["null", "string"]}, "title": {"type": ["null", "string"]}, "author_name": {"type": ["null", "string"]}, "author_email": {"type": ["null", "string"]}, "authored_date": {"anyOf": [{"type": "string", "format": "date-time"}, {"type": "null"}]}, "committer_name": {"type": ["null", "string"]}, "committer_email": {"type": ["null", "string"]}, "committed_date": {"anyOf": [{"type": "string", "format": "date-time"}, {"type": "null"}]}, "created_at": {"anyOf": [{"type": "string", "format": "date-time"}, {"type": "null"}]}, "message": {"type": ["null", "string"]}, "allow_failure": {"type": ["null", "boolean"]}, "parent_ids": {"anyOf": [{"type": "array", "items": {"type": ["null", "string"]}}, {"type": "null"}]}, "stats": {"properties": {"additions": {"type": ["null", "integer"]}, "deletions": {"type": ["null", "integer"]}, "total": {"type": ["null", "integer"]}}, "type": "object"}}, "type": "object"}, "key_properties": ["id"]}
@visch visch added the bug Something isn't working label Jan 23, 2023
@visch visch self-assigned this Jan 23, 2023
@aaronsteers aaronsteers changed the title Support for anyOf Support for anyOf in property type declarations Jan 23, 2023
@aaronsteers
Copy link
Contributor

@visch - I took the liberty of expanding the code snippet and adding notations for the types of anyOf use cases represented.

It looks like all of these implementations are for the purpose of making the type nullable. The coded handler will tackle this syntax:

         "committer_email":{
            "type":[
               "null",
               "string"
            ]
         },

but not

         "authored_date":{
            "anyOf":[  // <<<< nullable datetime
               {
                  "type":"string",
                  "format":"date-time"
               },
               {
                  "type":"null"
               }
            ]
         },

Adding this context because nullability is much simpler to implement than something like allowing a list or a string, or a datetime or an int. Those types of anyOf will force the target to try to find the least common denominator, which could end in coalescing everything to a string, or forcing items to array type when a single non-array item is received. Those more complex behaviors are probably best to keep out of scope unless we need to tackle sooner than later.

@aaronsteers
Copy link
Contributor

aaronsteers commented Jan 23, 2023

@visch - One other question here: do you know if we currently fail gracefully (such as falling back to str) when anyOf is received and not parseable?

@visch
Copy link
Member Author

visch commented Jan 23, 2023

@visch - One other question here: do you know if we currently fail gracefully (such as falling back to str) when anyOf is received and not parseable?

I believe it fails today with a KeyError due to https://github.com/MeltanoLabs/target-postgres/blob/main/target_postgres/connector.py#L89

yes it does, in that slack thread there's a stack trace:

2023-01-15T07:04:47.084980Z [info     ]   File "/workspaces/engineering-metrics/meltano/engineering-metrics/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/singer_sdk/connectors/sql.py", line 668, in prepare_table cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.087739Z [info     ]     self.create_empty_table(   cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.088341Z [info     ]   File "/workspaces/engineering-metrics/meltano/engineering-metrics/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/target_postgres/connector.py", line 140, in create_empty_table cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.089901Z [info     ]     self.to_sql_type(property_jsonschema), cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.090177Z [info     ]   File "/workspaces/engineering-metrics/meltano/engineering-metrics/.meltano/loaders/target-postgres/venv/lib/python3.10/site-packages/target_postgres/connector.py", line 89, in to_sql_type cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.098893Z [info     ]     if "integer" in jsonschema_type["type"]: cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.099580Z [info     ] KeyError: 'type'               cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-15T07:04:47.215360Z [error    ] Loader failed  

In the process I made some bugs it seems! Need to add this to the test suite :O.

We could try hard to test that we can push in types we haven't accounted for but I've had some issues with sqlalchemy -> pyodbc has there's sometimes typing issues specifically with JSON Arrays which makes that want tricky. Maybe there's a way to make it work in a general case.

@pnadolny13
Copy link

@visch I ran into this today. I got a KeyError for tap-gitlab's project schema using anyOf. It looks like its caused by

if "integer" in jsonschema_type["type"]:
.

If I dig deeper and I compare to pipelinewise (which worked for my use case) I see that it uses a flatten schema method to cleanup the schema before parsing it in the same way that we're parsing here where type is assumed to always be present. So I think we're receiving it as {"anyOf": [{"type": "string", "format": "date-time"}, {"type": "null"}]} vs in pipelinewise's column_type method theyre receiving it transformed into something like {"type": ["null", "string"], "format": "date-time"}. I also see that transferwise defaults to varchar.

@pnadolny13
Copy link

pnadolny13 commented Jun 26, 2023

@visch I'm not positive so you'd need to test to validate it but this might be fixed by meltano/sdk#1774.

@visch
Copy link
Member Author

visch commented Jul 6, 2023

@visch I'm not positive so you'd need to test to validate it but this might be fixed by meltano/sdk#1774.

That didn't work unfortunately :/

visch added a commit that referenced this issue Jul 14, 2023
A less-involved fix for anyOf processing that treats anyOf as if it was
a `type` array.

Closes #86

---------

Co-authored-by: Derek Visch <visch@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants