Normalization handles quote in column names #5027

ChristopheDuong · 2021-07-27T16:05:48Z

What

This PR depends on some changes made in bigquery/mysql destinations: #5026

Closes #4729

To avoid noise, the regenerated output test files are in a separate PR, #5028

How

Parse the JSON blob, handling quote characters according to each destination's rules

Pre-merge Checklist

Expand the checklist which is relevant for this PR.

Connector checklist

Connector Generator checklist

Issue acceptance criteria met
PR name follows PR naming conventions
If adding a new generator, add it to the list of scaffold modules being tested
The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
Documentation which references the generator is updated as needed.

ChristopheDuong · 2021-07-27T16:19:20Z

...normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_ab1.sql

@@ -8,6 +8,7 @@ select
    _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid,
    json_extract_array(`partition`, "$['double_array_data']") as double_array_data,
    json_extract_array(`partition`, "$['DATA']") as DATA,
+    json_extract_array(`partition`, "$['column___with__quotes']") as column___with__quotes,


Thanks to PR #5026, the raw JSON blob in BigQuery also contains sanitized column names, in which quote characters have already been replaced by _, and we can extract from those.
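A minimal sketch of the sanitizing idea described above, written in Python. The names `sanitize_quotes` and `bigquery_extract_array` are hypothetical and are not taken from the normalization code; the replacement rule (each backtick, single quote and double quote becomes `_`) is inferred from the generated SQL in this diff.

```python
import re

# Assumption: these three characters are the "quote characters" being sanitized.
QUOTE_CHARS = re.compile(r"[`'\"]")


def sanitize_quotes(column_name: str) -> str:
    """Replace each backtick, single quote and double quote with an underscore."""
    return QUOTE_CHARS.sub("_", column_name)


def bigquery_extract_array(json_column: str, column_name: str) -> str:
    """Build a json_extract_array expression that reads the sanitized key."""
    key = sanitize_quotes(column_name)
    return f"json_extract_array(`{json_column}`, \"$['{key}']\") as {key}"


print(bigquery_extract_array("partition", "column`_'with\"_quotes"))
```

With the test column name used in this PR, the sketch prints the same expression as the added line in the diff above.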

ChristopheDuong · 2021-07-27T16:20:48Z

...reams/final/airbyte_ctes/test_normalization/nested_stream_with_co__g_names_partition_ab1.sql

-  '$."DATA"') as `DATA`,
+    '$."DATA"') as `DATA`,
+    json_extract(`partition`, 
+    '$."column___with__quotes"') as `column__'with"_quotes`,


Thanks to PR #5026, the raw JSON blob in MySQL also contains sanitized column names, in which quote characters have already been replaced by _, and we can extract from those.

Note that json_extract can't handle quotes in the JSON path, but MySQL does allow ' or " in column names...
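The MySQL case can be sketched as two separate rules: one for the JSON path and one for the column alias. The function names below are hypothetical, and the alias rule (only the backtick is replaced, since it delimits the identifier) is an assumption read off the generated SQL in this diff, not a statement about the actual implementation.

```python
def mysql_json_key(column_name: str) -> str:
    """Key used in the JSON path: every quote character becomes an underscore."""
    return column_name.translate(str.maketrans("`'\"", "___"))


def mysql_alias(column_name: str) -> str:
    """Alias: keep ' and ", replace only the backtick (assumed from the diff)."""
    return column_name.replace("`", "_")


def mysql_extract(json_column: str, column_name: str) -> str:
    """Build a json_extract expression on the sanitized key, aliased to the original name."""
    key = mysql_json_key(column_name)
    alias = mysql_alias(column_name)
    return f"json_extract(`{json_column}`, '$.\"{key}\"') as `{alias}`"


print(mysql_extract("partition", "column`_'with\"_quotes"))
```

The generated file splits the expression over two lines; the sketch keeps it on one line for brevity.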

ChristopheDuong · 2021-07-27T16:30:47Z

...ms/final/airbyte_ctes/test_normalization/nested_stream_with_c___long_names_partition_ab1.sql

@@ -6,6 +6,7 @@ select
    _airbyte_nested_stre__nto_long_names_hashid,
    jsonb_extract_path("partition", 'double_array_data') as double_array_data,
    jsonb_extract_path("partition", 'DATA') as "DATA",
+    jsonb_extract_path("partition", 'column`_''with"_quotes') as "column`_'with""_quotes",


Postgres parses the JSON blob just fine. The only edge case is that ' characters must be doubled to "escape" them...
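A rough sketch of the doubling rule, assuming standard SQL quoting: a single quote is doubled inside a string literal, and a double quote is doubled inside a quoted identifier. The helper names are hypothetical. Unlike the generated SQL, this sketch always quotes the alias, even when the name would not need it.

```python
def pg_string_literal(value: str) -> str:
    """Wrap in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def pg_identifier(name: str) -> str:
    """Wrap in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def pg_extract(json_column: str, column_name: str) -> str:
    """Build a jsonb_extract_path expression for the unsanitized column name."""
    return (
        f"jsonb_extract_path({pg_identifier(json_column)}, "
        f"{pg_string_literal(column_name)}) as {pg_identifier(column_name)}"
    )


print(pg_extract("partition", "column`_'with\"_quotes"))
```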

ChristopheDuong · 2021-07-27T16:31:35Z

...NORMALIZATION/NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_PARTITION_AB1.sql

@@ -6,6 +6,7 @@ select
    _AIRBYTE_NESTED_STREAM_WITH_COMPLEX_COLUMNS_RESULTING_INTO_LONG_NAMES_HASHID,
    get_path(parse_json(PARTITION), '"double_array_data"') as DOUBLE_ARRAY_DATA,
    get_path(parse_json(PARTITION), '"DATA"') as DATA,
+    get_path(parse_json(PARTITION), '"column`_''with""_quotes"') as "column`_'with""_quotes",


Snowflake parses the JSON blob just fine. The only edge case is that ' and " characters must be doubled to "escape" them...
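For Snowflake the two doublings are nested: the path element is a double-quoted name placed inside a single-quoted string literal. A minimal sketch, with hypothetical helper names and with the nesting order inferred from the generated SQL in this diff:

```python
def sf_identifier(name: str) -> str:
    """Wrap in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def sf_string_literal(value: str) -> str:
    """Wrap in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def snowflake_get_path(json_column: str, column_name: str) -> str:
    """Build a get_path expression: quote the name first, then the whole path."""
    path = sf_string_literal(sf_identifier(column_name))
    return f"get_path(parse_json({json_column}), {path}) as {sf_identifier(column_name)}"


print(snowflake_get_path("PARTITION", "column`_'with\"_quotes"))
```

As in the Postgres sketch, the alias is always quoted here, whereas the generated SQL leaves simple names such as DATA unquoted.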

ChristopheDuong added 4 commits July 27, 2021 17:49

Add sanitized column name in some destinations' raw table outputs

4ee711b

update docs

766d2e1

Handle quotes in columns names

a597b36

Preview of changes to sql files

82494b5

github-actions bot added the normalization label Jul 27, 2021

ChristopheDuong mentioned this pull request Jul 27, 2021

Regenerate normalization SQL files #5028

Merged

ChristopheDuong commented Jul 27, 2021

update docs

bec34ba

github-actions bot added the area/documentation and area/worker labels Jul 27, 2021

ChristopheDuong commented Jul 27, 2021

tuliren approved these changes Jul 27, 2021

ChristopheDuong requested a review from marcosmarxm July 27, 2021 19:04

Regenerate SQL files (#5028)

2f6a797

Base automatically changed from chris/handle-quote-destinations to master July 28, 2021 12:38

ChristopheDuong merged commit d6429a4 into master Jul 28, 2021

ChristopheDuong deleted the chris/handle-quote-normalization branch July 28, 2021 14:00
