🐛 Destination BigQuery: Handle embedded project ID in dataset ID during normalization #11077

Merged: edgao merged 5 commits into master from edgao/bigquery_normalization_dataset_id on Mar 14, 2022

@edgao edgao commented Mar 11, 2022

What

Followup to #8383 / #1192

That PR added handling in the destination; we also need to handle this in normalization.

How

When transforming the config, check whether the dataset ID embeds a project ID (separated by a colon) and, if so, strip that prefix from the dataset ID value. If the embedded project ID does not match the configured project ID, raise an exception (this is the same behavior as the destination; see https://github.com/airbytehq/airbyte/pull/8383/files#diff-4fd5ae109bf5f80f90ed87bce2642adaca412cce321a84632c1208ccf8ea7ce7R173 )
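
As a rough illustration of the check described here (a sketch only, not the code in transform.py): the helper name `extract_dataset_id` is made up for this example, and the `project_id` config key is an assumption; only `config["dataset_id"]` appears in the reviewed snippet.

```python
def extract_dataset_id(config: dict) -> str:
    """Return the bare dataset ID, dropping an embedded "project:" prefix.

    Hypothetical helper for illustration; assumes the config carries
    "project_id" and "dataset_id" keys.
    """
    dataset_id = config["dataset_id"]
    if ":" not in dataset_id:
        # No embedded project ID, so the value can be used as-is.
        return dataset_id

    # partition() splits on the first colon only.
    embedded_project, _, bare_dataset = dataset_id.partition(":")
    if embedded_project != config["project_id"]:
        # Mirror the destination's behavior: fail loudly on a mismatch
        # instead of guessing which project was intended.
        raise ValueError(
            f"Project ID embedded in dataset ID ({embedded_project!r}) does not "
            f"match the configured project ID ({config['project_id']!r})"
        )
    return bare_dataset
```

With this sketch, a config whose dataset ID is `my-project:my_dataset` and whose project ID is `my-project` would yield `my_dataset`, while a different embedded project would raise.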

I freely admit that I don't know Python conventions, so sorry in advance if this code looks terrible :P

Recommended reading order

  1. transform.py
  2. test_transform_config.py

🚨 User Impact 🚨

No

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Tests

Unit

Screen Shot 2022-03-11 at 1 30 34 PM

edgao commented Mar 11, 2022

/test connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1971178004
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1971178004
Python tests coverage:

Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
source_acceptance_test/utils/__init__.py                 6      0   100%
source_acceptance_test/tests/__init__.py                 4      0   100%
source_acceptance_test/__init__.py                       2      0   100%
source_acceptance_test/tests/test_full_refresh.py       52      2    96%
source_acceptance_test/utils/asserts.py                 37      2    95%
source_acceptance_test/config.py                        74      6    92%
source_acceptance_test/utils/json_schema_helper.py     105     13    88%
source_acceptance_test/utils/common.py                  70     17    76%
source_acceptance_test/utils/compare.py                 62     23    63%
source_acceptance_test/tests/test_core.py              275    106    61%
source_acceptance_test/base.py                          10      4    60%
source_acceptance_test/utils/connector_runner.py       110     48    56%
source_acceptance_test/tests/test_incremental.py        69     38    45%
------------------------------------------------------------------------
TOTAL                                                  876    259    70%
Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                                                                                          2      0   100%
normalization/transform_catalog/reserved_keywords.py                                                                               13      0   100%
normalization/transform_catalog/__init__.py                                                                                         2      0   100%
normalization/destination_type.py                                                                                                  13      0   100%
normalization/__init__.py                                                                                                           4      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
normalization/transform_catalog/destination_name_transformer.py                                                                   155      8    95%
normalization/transform_config/transform.py                                                                                       161     30    81%
normalization/transform_catalog/table_name_registry.py                                                                            174     34    80%
normalization/transform_catalog/utils.py                                                                                           33      7    79%
normalization/transform_catalog/catalog_processor.py                                                                              143     77    46%
normalization/transform_catalog/transform.py                                                                                       45     26    42%
normalization/transform_catalog/stream_processor.py                                                                               524    337    36%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                            1396    519    63%
Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
base_python/cdk/utils/casing.py                                                                                                     4      0   100%
base_python/__init__.py                                                                                                            13      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
base_python/cdk/utils/event_timing.py                                                                                              47      3    94%
base_python/cdk/streams/auth/core.py                                                                                                8      1    88%
base_python/cdk/streams/exceptions.py                                                                                              10      2    80%
base_python/cdk/streams/auth/token.py                                                                                               9      4    56%
base_python/logger.py                                                                                                              33     15    55%
base_python/cdk/streams/rate_limiting.py                                                                                           30     14    53%
base_python/integration.py                                                                                                         52     25    52%
base_python/cdk/streams/http.py                                                                                                    67     33    51%
base_python/cdk/streams/core.py                                                                                                    63     32    49%
base_python/client.py                                                                                                              56     33    41%
base_python/catalog_helpers.py                                                                                                     10      6    40%
base_python/source.py                                                                                                              51     34    33%
base_python/cdk/streams/auth/oauth.py                                                                                              37     26    30%
base_python/cdk/abstract_source.py                                                                                                 89     64    28%
base_python/schema_helpers.py                                                                                                      56     41    27%
base_python/entrypoint.py                                                                                                          70     56    20%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                             832    389    53%
Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                                                                                          2      0   100%
normalization/transform_catalog/utils.py                                                                                           33      0   100%
normalization/transform_catalog/reserved_keywords.py                                                                               13      0   100%
normalization/transform_catalog/__init__.py                                                                                         2      0   100%
normalization/destination_type.py                                                                                                  13      0   100%
normalization/__init__.py                                                                                                           4      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
normalization/transform_catalog/destination_name_transformer.py                                                                   155      5    97%
normalization/transform_catalog/stream_processor.py                                                                               524     39    93%
normalization/transform_catalog/catalog_processor.py                                                                              143     12    92%
normalization/transform_catalog/table_name_registry.py                                                                            174     51    71%
normalization/transform_config/transform.py                                                                                       161     50    69%
normalization/transform_catalog/transform.py                                                                                       45     30    33%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                            1396    187    87%

@edgao edgao marked this pull request as ready for review March 11, 2022 23:20
@edgao edgao requested a review from ChristopheDuong March 11, 2022 23:20
Comment on lines 134 to 139
    try:
        colon_index = config["dataset_id"].index(":")
    except ValueError:
        colon_index = None

    if colon_index is not None:
@ChristopheDuong ChristopheDuong Mar 14, 2022

> I freely admit that I don't know Python conventions, so sorry in advance if this code looks terrible :P

No worries! We all have to start somewhere

Maybe it'd be simpler to do in one line:

Suggested change
- try:
-     colon_index = config["dataset_id"].index(":")
- except ValueError:
-     colon_index = None
- if colon_index is not None:
+ if ":" in config["dataset_id"]:

I guess the convention my instincts are trying to follow here is related to this "rule":
https://docs.python-guide.org/writing/style/#access-a-dictionary-element

@ChristopheDuong ChristopheDuong Mar 14, 2022

Then, you can follow up by using Python's string split function instead of slicing with indices

@edgao (Contributor Author)

thanks! added a check for too many colons just in case it gets mistyped (previously I guess we would have thrown an exception when trying to interact with bigquery)
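
For readers following the thread, the split-based approach with the extra colon check might look roughly like the sketch below. It is not the merged code; `split_dataset_id` is a hypothetical name, and returning `None` for a missing project ID is a choice made for this example.

```python
from typing import Optional, Tuple


def split_dataset_id(raw: str) -> Tuple[Optional[str], str]:
    """Split "project:dataset" into (project, dataset).

    Hypothetical helper for illustration. Returns (None, raw) when no
    project ID is embedded, and rejects values with more than one colon.
    """
    parts = raw.split(":")
    if len(parts) == 1:
        # Plain dataset ID, nothing embedded.
        return None, parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    # More than one colon is most likely a typo; fail early here rather
    # than later when talking to BigQuery.
    raise ValueError(f"Dataset ID {raw!r} contains more than one colon")
```

Failing at config-transform time keeps the error close to the mistyped value, which is the motivation given in the comment above.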

@edgao edgao requested a review from ChristopheDuong March 14, 2022 15:07
@edgao edgao temporarily deployed to more-secrets March 14, 2022 15:07 Inactive
edgao commented Mar 14, 2022

/publish connector=bases/base-normalization

🕑 bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1981834854
✅ bases/base-normalization https://github.com/airbytehq/airbyte/actions/runs/1981834854

@github-actions github-actions bot added the area/documentation, area/platform, and area/worker labels Mar 14, 2022
@edgao edgao temporarily deployed to more-secrets March 14, 2022 16:30 Inactive
@edgao edgao temporarily deployed to more-secrets March 14, 2022 20:45 Inactive
@edgao edgao temporarily deployed to more-secrets March 14, 2022 20:54 Inactive
@edgao edgao merged commit 52d5905 into master Mar 14, 2022
@edgao edgao deleted the edgao/bigquery_normalization_dataset_id branch March 14, 2022 21:13
Labels
area/documentation, area/platform, area/worker, normalization