Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openlineage, bigquery: add openlineage method support for BigQueryInsertJobOperator #31293

Merged

Conversation

mobuchowski
Copy link
Contributor

@mobuchowski mobuchowski commented May 15, 2023

This PR adds OpenLineage support for BigQueryInsertJobOperator.

Despite being SQL-based, this does not use SQL parsing due to the fact that BigQuery has an API that allows us to get the lineage data directly from it. Code that does that is implemented in OpenLineage Common library: https://github.com/OpenLineage/OpenLineage/blob/main/integration/common/openlineage/common/provider/bigquery.py
the method merely uses it.

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues provider:openlineage AIP-53 labels May 15, 2023
@mobuchowski mobuchowski force-pushed the openlineage-bigquery-operation branch 2 times, most recently from 8a31e4f to c0f5aff Compare May 16, 2023 13:11
@mobuchowski mobuchowski force-pushed the openlineage-bigquery-operation branch 4 times, most recently from a82ee78 to e982bfc Compare May 18, 2023 09:55
Copy link
Collaborator

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added few comments

airflow/providers/google/cloud/operators/bigquery.py Outdated Show resolved Hide resolved
airflow/providers/google/cloud/operators/bigquery.py Outdated Show resolved Hide resolved
airflow/providers/google/cloud/operators/bigquery.py Outdated Show resolved Hide resolved
airflow/providers/google/cloud/operators/bigquery.py Outdated Show resolved Hide resolved
tests/providers/google/cloud/operators/test_bigquery.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please add the documentation changes around BigQueryExecuteQueryOperator lineage`

@mobuchowski mobuchowski force-pushed the openlineage-bigquery-operation branch from e982bfc to aded88f Compare May 22, 2023 13:04
@potiuk potiuk force-pushed the openlineage-bigquery-operation branch from aded88f to a647e01 Compare May 22, 2023 17:53
@mobuchowski mobuchowski force-pushed the openlineage-bigquery-operation branch 3 times, most recently from f4c71a0 to 6570061 Compare June 5, 2023 10:12
@mobuchowski mobuchowski force-pushed the openlineage-bigquery-operation branch 2 times, most recently from 101309a to 0d17581 Compare June 7, 2023 15:58
@pankajkoti pankajkoti requested review from pankajkoti, phanikumv, pankajastro and sunank200 and removed request for sunank200 June 13, 2023 06:25
Copy link
Collaborator

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation changes for open lineage support for this operator is needed

@mobuchowski
Copy link
Contributor Author

@sunank200 what kind of documentation would you like to see?

For user facing one, I believe we don't need to do that - beyond having a doc somewhere which operators are supported. Users don't need to do anything or configure anything specific to particular operator if they wish to use OpenLineage. So, for them, the actual integration is transparent.

For developers, we might need to write something. For general use I've written some tips in separate PR: #31817 - do you think we need to have something provider-specific beyond docstrings?

@mobuchowski mobuchowski force-pushed the openlineage-bigquery-operation branch from 84dd40b to 20d713b Compare June 15, 2023 14:10
@pankajkoti pankajkoti removed their request for review July 12, 2023 12:22
@mobuchowski mobuchowski force-pushed the openlineage-bigquery-operation branch 3 times, most recently from 861a49e to 5110229 Compare July 20, 2023 14:26
@mobuchowski mobuchowski reopened this Jul 20, 2023
@mobuchowski mobuchowski reopened this Jul 20, 2023
@mobuchowski mobuchowski force-pushed the openlineage-bigquery-operation branch 3 times, most recently from 71b6953 to 0bb3ddf Compare July 23, 2023 17:03
@raphaelauv
Copy link
Contributor

Since BigQueryExecuteQueryOperator is deprecated for BigQueryInsertJobOperator, why add features to BigQueryExecuteQueryOperator ?

@mobuchowski mobuchowski force-pushed the openlineage-bigquery-operation branch from 0bb3ddf to cedf85d Compare July 26, 2023 14:25
@mobuchowski
Copy link
Contributor Author

@raphaelauv it's still widely used. But it's valid point, so I extracted the implementation into mixin and added it to BigQueryInsertJobOperator here as well - it works pretty much the same, with small difference on SQL vs configuration part, as it's reading result data from API job definition.

@mobuchowski mobuchowski changed the title openlineage, bigquery: add openlineage method support for BigQueryExecuteQueryOperator openlineage, bigquery: add openlineage method support for BigQueryExecuteQueryOperator and BigQueryInsertJobOperator Aug 3, 2023
Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the feature from deprecated classes.

@@ -1001,7 +1063,7 @@ def execute_complete(self, context: Context, event: dict[str, Any]) -> Any:
return event["records"]


class BigQueryExecuteQueryOperator(GoogleCloudBaseOperator):
class BigQueryExecuteQueryOperator(GoogleCloudBaseOperator, _BigQueryOpenLineageMixin):
Copy link
Contributor

@eladkal eladkal Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -1 for adding new features to deprecated classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eladkal removed this from BigQueryExecuteQueryOperator and associated tests.

@mobuchowski mobuchowski force-pushed the openlineage-bigquery-operation branch from cedf85d to d047c65 Compare August 4, 2023 14:46
@mobuchowski mobuchowski changed the title openlineage, bigquery: add openlineage method support for BigQueryExecuteQueryOperator and BigQueryInsertJobOperator openlineage, bigquery: add openlineage method support for BigQueryInsertJobOperator Aug 4, 2023
@mobuchowski mobuchowski merged commit e10aa6a into apache:main Aug 4, 2023
42 checks passed
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Aug 8, 2023
ephraimbuddy pushed a commit that referenced this pull request Aug 8, 2023
…cuteQueryOperator (#31293)

Signed-off-by: Maciej Obuchowski <[email protected]>
(cherry picked from commit e10aa6a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) provider:google Google (including GCP) related issues provider:openlineage AIP-53
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants