Destination BigQuery: Accept Dataset ID field prefixed by Project ID #8383

Merged
merged 17 commits into from
Jan 18, 2022


koji-m
Contributor

koji-m commented Dec 1, 2021

What

Closes: #1192

How

With this change, the Dataset ID field of the BigQuery destination connector accepts both syntaxes:

  • project-id:dataset_id
  • dataset_id

An error is raised if the project-id in the Dataset ID field doesn't match the value in the Project ID field.
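The parsing rule above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the connector's actual code: the class and method names (`DatasetIdResolver`, `resolve`) and the mismatch error message are hypothetical. The regular expression and the format error message are the ones that appear in the review diff and test cases of this PR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: accepts "project-id:dataset_id" or "dataset_id" and
// rejects a project prefix that differs from the configured Project ID.
final class DatasetIdResolver {

  // Pattern as shown in the BigQueryUtils diff of this PR.
  // Group 2 captures the optional project ID, group 4 the dataset ID.
  private static final Pattern DATASET_ID_PATTERN =
      Pattern.compile("^(([a-z]([a-z0-9\\-]*[a-z0-9])?):)?([a-zA-Z0-9_]+)$");

  private DatasetIdResolver() {}

  // Returns the bare dataset ID, or throws if the value is malformed
  // or its project prefix differs from the Project ID field.
  static String resolve(final String projectId, final String datasetIdField) {
    final Matcher matcher = DATASET_ID_PATTERN.matcher(datasetIdField);
    if (!matcher.matches()) {
      // Same message format as the one asserted in the PR's tests.
      throw new IllegalArgumentException(
          "BigQuery Dataset ID format must match '[project-id:]dataset_id': " + datasetIdField);
    }
    final String projectPrefix = matcher.group(2); // null when there is no "project-id:" prefix
    if (projectPrefix != null && !projectPrefix.equals(projectId)) {
      // Hypothetical wording: the PR only states that a mismatch is an error.
      throw new IllegalArgumentException("Project ID in Dataset ID field (" + projectPrefix
          + ") does not match Project ID field (" + projectId + ")");
    }
    return matcher.group(4);
  }

  public static void main(final String[] args) {
    System.out.println(resolve("my-project", "my_dataset"));            // my_dataset
    System.out.println(resolve("my-project", "my-project:my_dataset")); // my_dataset
    try {
      resolve("my-project", "other-project:my_dataset");
    } catch (final IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Returning only the bare dataset ID keeps callers unaware of which syntax the user typed, since the project is already known from the Project ID field.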

Recommended reading order

  1. BigQueryDestinationTest.java
  2. BigQueryDestination.java

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g., a screenshot or copy-pasted unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to GitHub CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Screenshot 2021-12-01 23:12:40



@github-actions github-actions bot added the area/connectors Connector related issues label Dec 1, 2021
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Dec 1, 2021
@mkhokh-33 mkhokh-33 assigned mkhokh-33 and unassigned mkhokh-33 Dec 20, 2021
@mkhokh-33
Contributor

Hi @koji-m, could you please update the branch and resolve the conflicts on this PR? Thanks

@koji-m
Contributor Author

koji-m commented Dec 22, 2021

Hi @mkhokh-33,
I've updated the branch and resolved the conflicts, and the integration tests pass locally.
Please review.

@@ -46,6 +49,7 @@

private static final Logger LOGGER = LoggerFactory.getLogger(BigQueryUtils.class);
private static final String BIG_QUERY_DATETIME_FORMAT = "yyyy-MM-dd HH:mm:ss.SSSSSS";
private static final Pattern datasetIdPattern = Pattern.compile("^(([a-z]([a-z0-9\\-]*[a-z0-9])?):)?([a-zA-Z0-9_]+)$");
Contributor

This seems to be a constant;
please rename it to DATASET_ID_PATTERN.

.put(BigQueryConsts.CONFIG_DATASET_LOCATION, "US");
}

public static Stream<Arguments> validBigQueryIdProvider() {
Contributor

Change the method to private and move it below the public methods.

assertEquals(expected, actual);
}

public static Stream<Arguments> invalidBigQueryIdProvider() {
Contributor

Change the method to private and move it below the public methods.


public static Stream<Arguments> invalidBigQueryIdProvider() {
return Stream.of(
Arguments.arguments("my-project", ":my_dataset", "BigQuery Dataset ID format must match '[project-id:]dataset_id': :my_dataset"),
Contributor

To exercise the dataset regexp pattern, we could also add some other incorrect dataset IDs here:
  1. "my-project:my-project:my_dataset" (checks that ':' is not allowed inside the project name)
  2. "my-project-:my_dataset" (a project name cannot end with a hyphen)
  3. "my-project:"
  4. "my-project: "
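The suggested cases can be checked directly against the pattern with plain `java.util.regex`, outside JUnit. In this sketch the class `DatasetIdPatternCheck` and the method `isValid` are hypothetical names; the pattern is the one from the BigQueryUtils diff. Every input in `INVALID_IDS` fails to match, while `my_dataset` and `my-project:my_dataset` match.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical standalone check of the reviewer's extra invalid cases.
final class DatasetIdPatternCheck {

  // Same pattern as in the BigQueryUtils diff.
  static final Pattern DATASET_ID_PATTERN =
      Pattern.compile("^(([a-z]([a-z0-9\\-]*[a-z0-9])?):)?([a-zA-Z0-9_]+)$");

  // The ":my_dataset" case already in the test, plus the reviewer's suggestions.
  static final List<String> INVALID_IDS = List.of(
      ":my_dataset",                       // empty project before ':'
      "my-project:my-project:my_dataset",  // ':' cannot appear inside the project ID
      "my-project-:my_dataset",            // project ID cannot end with a hyphen
      "my-project:",                       // dataset ID is empty
      "my-project: ");                     // whitespace is not a valid dataset character

  // True when the whole string matches "[project-id:]dataset_id".
  static boolean isValid(final String datasetId) {
    return DATASET_ID_PATTERN.matcher(datasetId).matches();
  }

  public static void main(final String[] args) {
    for (final String id : INVALID_IDS) {
      System.out.printf("%-36s valid=%b%n", "\"" + id + "\"", isValid(id));
    }
  }
}
```

In the real test these strings would simply become extra `Arguments.arguments(...)` rows in `invalidBigQueryIdProvider`.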

@@ -56,7 +56,7 @@ public BigQueryDestination() {
@Override
public AirbyteConnectionStatus check(final JsonNode config) {
try {
-final String datasetId = config.get(BigQueryConsts.CONFIG_DATASET_ID).asText();
+final String datasetId = BigQueryUtils.getDatasetId(config);
Contributor

BigQueryUtils.getDatasetId(config) is used in two places:

  1. BigQueryDestination.check()
  2. BigQueryDestination.getConsumer(), which calls BigQueryUploaderFactory.getUploader, which calls BigQueryUtils.getSchema, which in turn calls BigQueryUtils.getDatasetId(config)

Could you please consider adding integration tests to BigQueryDestinationTest to check that:

  1. we can create a connection with the provided dataset
  2. we can get a consumer and write some data

@mkhokh-33
Contributor

Hi @koji-m, thanks for your PR. I left some comments.

@koji-m
Copy link
Contributor Author

koji-m commented Dec 30, 2021

@mkhokh-33 Thank you for your review. I've made fixes based on your comments.

@mkhokh-33 mkhokh-33 temporarily deployed to more-secrets January 4, 2022 14:21 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets January 4, 2022 14:43 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets January 4, 2022 14:43 Inactive
@mkhokh-33 mkhokh-33 temporarily deployed to more-secrets January 4, 2022 14:52 Inactive
@mkhokh-33 mkhokh-33 temporarily deployed to more-secrets January 4, 2022 15:25 Inactive
@mkhokh-33 mkhokh-33 requested a review from sherifnada January 4, 2022 15:43
@mkhokh-33
Contributor

LGTM.
The connectors base failure is a known issue that will be fixed by #9274.
@sherifnada could you have a look, please? The integration tests pass: #9282

@sherifnada sherifnada requested a review from edgao January 5, 2022 08:14
@koji-m
Copy link
Contributor Author

koji-m commented Jan 6, 2022

@edgao Thank you for your review. I've made fixes based on your comments.

Contributor

edgao left a comment

nice! @mkhokh-33 can you take care of releasing this?

@edgao
Contributor

edgao commented Jan 6, 2022

actually, lemme publish a separate PR first, will comment here once that finishes

@edgao
Contributor

edgao commented Jan 6, 2022

alright @mkhokh-33 we're good to go here!

@mkhokh-33
Contributor

mkhokh-33 commented Jan 10, 2022

Hi @koji-m,
could you please update the Docker image versions in the following files:

  1. https://github.com/koji-m/airbyte/blob/parsing-dataset-names-in-bq/airbyte-integrations/connectors/destination-bigquery/Dockerfile
     LABEL io.airbyte.version=0.6.2
  2. https://github.com/koji-m/airbyte/blob/parsing-dataset-names-in-bq/airbyte-integrations/connectors/destination-bigquery-denormalized/Dockerfile
     LABEL io.airbyte.version=0.2.3
     (destination-bigquery-denormalized also needs a version bump because it uses the destination-bigquery implementation)
  3. https://github.com/koji-m/airbyte/blob/parsing-dataset-names-in-bq/airbyte-config/init/src/main/resources/seed/destination_definitions.yaml
     • dockerImage: "airbyte/destination-bigquery:0.6.2"
     • dockerImage: "airbyte/destination-bigquery-denormalized:0.2.3"
  4. https://github.com/koji-m/airbyte/blob/parsing-dataset-names-in-bq/airbyte-config/init/src/main/resources/config/STANDARD_DESTINATION_DEFINITION/22f6c74f-5699-40ff-833c-4a879ea40133.json
     "dockerImageTag": "0.6.2",
  5. https://github.com/koji-m/airbyte/blob/parsing-dataset-names-in-bq/airbyte-config/init/src/main/resources/config/STANDARD_DESTINATION_DEFINITION/079d5540-f236-4294-ba7c-ade8fd918496.json
     "dockerImageTag": "0.2.3",
  6. https://github.com/koji-m/airbyte/blob/parsing-dataset-names-in-bq/airbyte-config/init/src/main/resources/seed/destination_specs.yaml
     • dockerImage: "airbyte/destination-bigquery:0.6.2"
     • dockerImage: "airbyte/destination-bigquery-denormalized:0.2.3"

Then I can proceed with the merge. Thanks

@koji-m
Contributor Author

koji-m commented Jan 16, 2022

@mkhokh-33 Sorry for the late response. I've updated the Docker image version numbers (and the docs):

  • destination-bigquery: 0.6.3 (master) -> 0.6.4 (this branch)
  • destination-bigquery-denormalized: 0.2.3 (master) -> 0.2.4 (this branch)

@mkhokh-33 mkhokh-33 temporarily deployed to more-secrets January 17, 2022 11:32 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 17, 2022 11:34 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 17, 2022 11:35 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 17, 2022 12:15 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 17, 2022 12:15 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 17, 2022 12:21 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 17, 2022 13:43 Inactive
@mkhokh-33 mkhokh-33 temporarily deployed to more-secrets January 18, 2022 08:38 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 18, 2022 08:59 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 18, 2022 09:10 Inactive
@mkhokh-33 mkhokh-33 temporarily deployed to more-secrets January 18, 2022 09:41 Inactive
@mkhokh-33 mkhokh-33 merged commit 3f9cbec into airbytehq:master Jan 18, 2022
@mkhokh-33
Contributor

@koji-m thank you, merged 👍

Development

Successfully merging this pull request may close these issues.

Parsing Dataset names in BigQuery
7 participants