🐛 Destination BigQuery: Handle embedded project ID in dataset ID during normalization #11077

edgao · 2022-03-11T21:31:14Z

What

Followup to #8383 / #1192

That PR added handling in the destination; we also need to handle this in normalization.

How

When transforming the config, check whether the dataset ID embeds a project ID and, if so, rewrite the dataset ID value. If the embedded project ID does not match the configured project ID, raise an exception (this is the same behavior as the destination; see https://github.com/airbytehq/airbyte/pull/8383/files#diff-4fd5ae109bf5f80f90ed87bce2642adaca412cce321a84632c1208ccf8ea7ce7R173 )
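
To make the intended behavior concrete, here is a minimal sketch and not the code from this PR. It assumes the config is a dict with a `dataset_id` key (as in the diff) and a `project_id` key (an assumption), and that an embedded project ID is written as `project:dataset`. The function name `normalize_dataset_id` is hypothetical.

```python
from typing import Any, Dict


def normalize_dataset_id(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config whose dataset_id carries no embedded project ID.

    Hypothetical helper: the "project_id" key and the "project:dataset"
    format are assumptions made for illustration.
    """
    project_id = config["project_id"]
    # partition() splits on the first colon; sep is "" when there is none.
    prefix, sep, dataset = config["dataset_id"].partition(":")
    if not sep:
        # No colon, so the value is already a bare dataset ID.
        return dict(config)
    if prefix != project_id:
        # Same rule as described above: a mismatch is an error.
        raise ValueError(
            f"Project ID '{prefix}' embedded in dataset ID does not match "
            f"configured project ID '{project_id}'"
        )
    return {**config, "dataset_id": dataset}
```

Returning a copy rather than mutating the input keeps the caller's config unchanged, which is a design choice of this sketch only.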

I freely admit that I don't know Python conventions, so sorry in advance if this code looks terrible :P

🚨 User Impact 🚨

No

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Community member or Airbyter

Grant edit access to maintainers (instructions)
Secrets in the connector's spec are annotated with airbyte_secret
Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g. a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For Java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
Code reviews completed
Documentation updated
- Connector's README.md
- Connector's bootstrap.md. See description and examples
- Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

Create a non-forked branch based on this PR and test the below items on it
Build is successful
If new credentials are required for use in CI, add them to GSM. Instructions.
/test connector=connectors/<name> command is passing
New Connector version released on Dockerhub by running the /publish command described here
After the new connector version is published, connector version bumped in the seed directory as described here
Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Tests

Unit

edgao · 2022-03-11T21:53:15Z

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1971178004
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1971178004
Python tests coverage:

Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
source_acceptance_test/utils/__init__.py                 6      0   100%
source_acceptance_test/tests/__init__.py                 4      0   100%
source_acceptance_test/__init__.py                       2      0   100%
source_acceptance_test/tests/test_full_refresh.py       52      2    96%
source_acceptance_test/utils/asserts.py                 37      2    95%
source_acceptance_test/config.py                        74      6    92%
source_acceptance_test/utils/json_schema_helper.py     105     13    88%
source_acceptance_test/utils/common.py                  70     17    76%
source_acceptance_test/utils/compare.py                 62     23    63%
source_acceptance_test/tests/test_core.py              275    106    61%
source_acceptance_test/base.py                          10      4    60%
source_acceptance_test/utils/connector_runner.py       110     48    56%
source_acceptance_test/tests/test_incremental.py        69     38    45%
------------------------------------------------------------------------
TOTAL                                                  876    259    70%
Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                                                                                          2      0   100%
normalization/transform_catalog/reserved_keywords.py                                                                               13      0   100%
normalization/transform_catalog/__init__.py                                                                                         2      0   100%
normalization/destination_type.py                                                                                                  13      0   100%
normalization/__init__.py                                                                                                           4      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
normalization/transform_catalog/destination_name_transformer.py                                                                   155      8    95%
normalization/transform_config/transform.py                                                                                       161     30    81%
normalization/transform_catalog/table_name_registry.py                                                                            174     34    80%
normalization/transform_catalog/utils.py                                                                                           33      7    79%
normalization/transform_catalog/catalog_processor.py                                                                              143     77    46%
normalization/transform_catalog/transform.py                                                                                       45     26    42%
normalization/transform_catalog/stream_processor.py                                                                               524    337    36%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                            1396    519    63%
Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
base_python/cdk/utils/casing.py                                                                                                     4      0   100%
base_python/__init__.py                                                                                                            13      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
base_python/cdk/utils/event_timing.py                                                                                              47      3    94%
base_python/cdk/streams/auth/core.py                                                                                                8      1    88%
base_python/cdk/streams/exceptions.py                                                                                              10      2    80%
base_python/cdk/streams/auth/token.py                                                                                               9      4    56%
base_python/logger.py                                                                                                              33     15    55%
base_python/cdk/streams/rate_limiting.py                                                                                           30     14    53%
base_python/integration.py                                                                                                         52     25    52%
base_python/cdk/streams/http.py                                                                                                    67     33    51%
base_python/cdk/streams/core.py                                                                                                    63     32    49%
base_python/client.py                                                                                                              56     33    41%
base_python/catalog_helpers.py                                                                                                     10      6    40%
base_python/source.py                                                                                                              51     34    33%
base_python/cdk/streams/auth/oauth.py                                                                                              37     26    30%
base_python/cdk/abstract_source.py                                                                                                 89     64    28%
base_python/schema_helpers.py                                                                                                      56     41    27%
base_python/entrypoint.py                                                                                                          70     56    20%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                             832    389    53%
Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                                                                                          2      0   100%
normalization/transform_catalog/reserved_keywords.py                                                                               13      0   100%
normalization/transform_catalog/__init__.py                                                                                         2      0   100%
normalization/destination_type.py                                                                                                  13      0   100%
normalization/__init__.py                                                                                                           4      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
normalization/transform_catalog/destination_name_transformer.py                                                                   155      8    95%
normalization/transform_config/transform.py                                                                                       161     30    81%
normalization/transform_catalog/table_name_registry.py                                                                            174     34    80%
normalization/transform_catalog/utils.py                                                                                           33      7    79%
normalization/transform_catalog/catalog_processor.py                                                                              143     77    46%
normalization/transform_catalog/transform.py                                                                                       45     26    42%
normalization/transform_catalog/stream_processor.py                                                                               524    337    36%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                            1396    519    63%
Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                                                                                          2      0   100%
normalization/transform_catalog/utils.py                                                                                           33      0   100%
normalization/transform_catalog/reserved_keywords.py                                                                               13      0   100%
normalization/transform_catalog/__init__.py                                                                                         2      0   100%
normalization/destination_type.py                                                                                                  13      0   100%
normalization/__init__.py                                                                                                           4      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
normalization/transform_catalog/destination_name_transformer.py                                                                   155      5    97%
normalization/transform_catalog/stream_processor.py                                                                               524     39    93%
normalization/transform_catalog/catalog_processor.py                                                                              143     12    92%
normalization/transform_catalog/table_name_registry.py                                                                            174     51    71%
normalization/transform_config/transform.py                                                                                       161     50    69%
normalization/transform_catalog/transform.py                                                                                       45     30    33%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                            1396    187    87%

ChristopheDuong · 2022-03-14T08:33:38Z

airbyte-integrations/bases/base-normalization/normalization/transform_config/transform.py

+        try:
+            colon_index = config["dataset_id"].index(":")
+        except ValueError:
+            colon_index = None
+
+        if colon_index is not None:


I freely admit that I don't know Python conventions, so sorry in advance if this code looks terrible :P

No worries! We all have to start somewhere

Maybe it'd be simpler to do in one line:

Suggested change

-        try:
-            colon_index = config["dataset_id"].index(":")
-        except ValueError:
-            colon_index = None
-
-        if colon_index is not None:
+        if ":" in config["dataset_id"]:

I guess the convention my instincts are trying to follow here is related to this "rule":
https://docs.python-guide.org/writing/style/#access-a-dictionary-element

Then you can follow up by using Python's string split function instead of slicing with indices.

Thanks! Added a check for too many colons in case the value gets mistyped (previously, I guess we would have thrown an exception when trying to interact with BigQuery).
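
As an illustration of the split-based approach with a guard against extra colons, here is a rough sketch. It is not the merged code; the function name `split_dataset_id` and the error message are hypothetical.

```python
from typing import Optional, Tuple


def split_dataset_id(raw: str) -> Tuple[Optional[str], str]:
    """Split a dataset ID into (embedded project ID or None, dataset ID).

    Hypothetical helper; assumes the embedded form is "project:dataset".
    """
    parts = raw.split(":")
    if len(parts) == 1:
        # No colon: a plain dataset ID with no project prefix.
        return None, parts[0]
    if len(parts) == 2:
        # Exactly one colon: project prefix followed by the dataset ID.
        return parts[0], parts[1]
    # Two or more colons: most likely a typo, so fail early.
    raise ValueError(f"Invalid dataset ID (too many colons): {raw!r}")
```

Compared with `index(":")` plus slicing, `split` gives the colon count for free through the length of the returned list.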

edgao · 2022-03-14T16:28:46Z

/publish connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1981834854
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1981834854

parse dataset ID when needed

929ecd2

github-actions bot added the normalization label Mar 11, 2022

edgao marked this pull request as ready for review March 11, 2022 23:20

edgao requested a review from ChristopheDuong March 11, 2022 23:20

ChristopheDuong reviewed Mar 14, 2022

better python

6a54e91

edgao requested a review from ChristopheDuong March 14, 2022 15:07

edgao temporarily deployed to more-secrets March 14, 2022 15:07 Inactive

ChristopheDuong approved these changes Mar 14, 2022

edgao added 2 commits March 14, 2022 09:25

Merge branch 'master' into edgao/bigquery_normalization_dataset_id

663f1a9

bump version

edaabc9

github-actions bot added area/documentation Improvements or additions to documentation area/platform issues related to the platform area/worker Related to worker labels Mar 14, 2022

edgao temporarily deployed to more-secrets March 14, 2022 16:30 Inactive

edgao temporarily deployed to more-secrets March 14, 2022 20:45 Inactive

Merge branch 'master' into edgao/bigquery_normalization_dataset_id

1f2ba17

edgao temporarily deployed to more-secrets March 14, 2022 20:54 Inactive

edgao merged commit 52d5905 into master Mar 14, 2022

edgao deleted the edgao/bigquery_normalization_dataset_id branch March 14, 2022 21:13

octavia-squidington-iii mentioned this pull request Mar 14, 2022

Bump Airbyte version from 0.35.53-alpha to 0.35.54-alpha #11125

Closed
