Normalization handles quote in column names #5027
Conversation
@@ -8,6 +8,7 @@ select
    _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid,
    json_extract_array(`partition`, "$['double_array_data']") as double_array_data,
    json_extract_array(`partition`, "$['DATA']") as DATA,
    json_extract_array(`partition`, "$['column___with__quotes']") as column___with__quotes,
Thanks to PR #5026, the raw JSON blob in BigQuery also contains sanitized copies of these columns, in which quote characters have already been replaced by _, so we can extract them directly.
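As a rough sketch of the sanitization rule described above (this is an illustration, not the actual normalization code), each quote character in a column name is assumed to become an underscore, which is how "column`_'with"_quotes" ends up as the column___with__quotes key seen in the diff:

```python
# Hypothetical sketch: map a column name containing quote characters to
# the sanitized key found in the raw JSON blob, assuming every backtick,
# single quote, and double quote was replaced by "_" upstream (PR #5026).
QUOTE_CHARS = "`'\""

def sanitize_column_name(name: str) -> str:
    """Replace each quote character with an underscore."""
    for ch in QUOTE_CHARS:
        name = name.replace(ch, "_")
    return name

print(sanitize_column_name("column`_'with\"_quotes"))  # column___with__quotes
```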
    '$."DATA"') as `DATA`,
json_extract(`partition`,
    '$."column___with__quotes"') as `column__'with"_quotes`,
Thanks to PR #5026, the raw JSON blob in MySQL also contains sanitized copies of these columns, in which quote characters have already been replaced by _, so we can extract them directly. Notice that json_extract can't handle quotes in the JSON path, yet it is possible to create column names using ' or " in MySQL...
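The consequence of the comment above can be sketched as follows (a hypothetical helper, not Airbyte's actual code): the JSON path must use the sanitized key, because json_extract rejects quotes in the path, even though the original column name may contain them:

```python
def mysql_json_extract(column_name: str) -> str:
    """Build a MySQL json_extract call over the sanitized key.
    Assumption: quote characters (`, ', ") were replaced with "_"
    upstream, as described for PR #5026."""
    sanitized = column_name
    for ch in "`'\"":
        sanitized = sanitized.replace(ch, "_")
    # the path itself must be quote-free apart from the key delimiters
    return f'json_extract(`partition`, \'$."{sanitized}"\')'

print(mysql_json_extract("column`_'with\"_quotes"))
# json_extract(`partition`, '$."column___with__quotes"')
```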
@@ -6,6 +6,7 @@ select
    _airbyte_nested_stre__nto_long_names_hashid,
    jsonb_extract_path("partition", 'double_array_data') as double_array_data,
    jsonb_extract_path("partition", 'DATA') as "DATA",
    jsonb_extract_path("partition", 'column`_''with"_quotes') as "column`_'with""_quotes",
Postgres is able to parse the JSON blob just fine. It just has an edge case: the ' character must be doubled to "escape" it...
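The standard Postgres escaping rules behind that edge case can be sketched like this (illustrative helpers, not the PR's code): ' is doubled inside single-quoted string literals and " is doubled inside double-quoted identifiers, which reproduces both sides of the diff line above:

```python
def pg_quote_literal(value: str) -> str:
    """Single-quoted SQL string literal: escape ' by doubling it."""
    return "'" + value.replace("'", "''") + "'"

def pg_quote_ident(name: str) -> str:
    """Double-quoted SQL identifier: escape \" by doubling it."""
    return '"' + name.replace('"', '""') + '"'

col = "column`_'with\"_quotes"
print(pg_quote_literal(col))  # 'column`_''with"_quotes'
print(pg_quote_ident(col))    # "column`_'with""_quotes"
```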
@@ -6,6 +6,7 @@ select
    _AIRBYTE_NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_HASHID,
    get_path(parse_json(PARTITION), '"double_array_data"') as DOUBLE_ARRAY_DATA,
    get_path(parse_json(PARTITION), '"DATA"') as DATA,
    get_path(parse_json(PARTITION), '"column`_''with""_quotes"') as "column`_'with""_quotes",
Snowflake is able to parse the JSON blob just fine. It just has an edge case: the ' or " characters must be doubled to "escape" them...
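The double layer of escaping in the Snowflake path above can be sketched as follows (a hypothetical helper): the key is wrapped in double quotes with any embedded " doubled, and the result is then emitted as a single-quoted SQL literal with any embedded ' doubled:

```python
def snowflake_path_literal(key: str) -> str:
    """Build the get_path key argument as a SQL string literal:
    double-quote the key (doubling any \" inside), then single-quote
    the whole thing (doubling any ' inside)."""
    quoted_key = '"' + key.replace('"', '""') + '"'
    return "'" + quoted_key.replace("'", "''") + "'"

print(snowflake_path_literal("column`_'with\"_quotes"))
# '"column`_''with""_quotes"'
```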
What
This PR depends on some changes made in bigquery/mysql destinations: #5026
Closes #4729
To avoid noise, output test files are regenerated and part of another PR #5028
How
Parse JSON blob while managing quote characters depending on the destination
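The per-destination handling could be pictured roughly like this (a sketch under assumptions: the function name, destination strings, and escaping rules are illustrative summaries of the review comments above, not the PR's actual API):

```python
# Hypothetical dispatch: pick the quote-handling strategy for a JSON
# key based on the destination warehouse.
def json_path_key(destination: str, column: str) -> str:
    if destination in ("bigquery", "mysql"):
        # these JSON paths can't contain quotes: fall back to the
        # sanitized key where quote characters became "_"
        for ch in "`'\"":
            column = column.replace(ch, "_")
        return column
    if destination == "postgres":
        # double ' inside single-quoted string literals
        return column.replace("'", "''")
    if destination == "snowflake":
        # double both " and ' before embedding in the path literal
        return column.replace('"', '""').replace("'", "''")
    raise ValueError(f"unsupported destination: {destination}")
```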
Recommended reading order
x.java
y.python
Pre-merge Checklist
Expand the checklist which is relevant for this PR.
Connector checklist
- Secrets in the connector's spec are annotated with airbyte_secret.
- ./gradlew :airbyte-integrations:connectors:<name>:integrationTest is passing.
- The ./test connector=connectors/<name> command as documented here is passing.
- README.md updated.
- docs/SUMMARY.md updated if it's a new connector.
- docs/integrations/<source or destination>/<name> updated.
- Changelog updated in docs/integrations/... . See changelog example.
- docs/integrations/README.md contains a reference to the new connector.
- New connector version released by running the /publish command described here.

Connector Generator checklist
- All connectors with -scaffold in their name have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates, then checking in your changes.