
Discontinue using deprecated methods in GCSToBigQueryOperator #10354

Conversation

@coopergillan (Contributor) commented Aug 17, 2020

Several methods used in GCSToBigQueryOperator are deprecated. Switch to the preferred methods, updating parameters and tests as needed.

related: #10288

Note: I only had time to tackle the operator mentioned by the user who opened the issue. It looks like the other BigQuery transfer operators have the same issue; I could tackle those at a later date.


boring-cyborg bot added the provider:google (Google (including GCP) related issues) label on Aug 17, 2020
@coopergillan (Contributor, Author):

@turbaszek and @mik-laj both chimed in over in the original issue #10288.

@@ -258,7 +258,7 @@ def execute(self, context):
         cursor = conn.cursor()

         if self.external_table:
-            cursor.create_external_table(
+            bq_hook.create_external_table(
Member:

This method is also deprecated
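
For reference, a minimal sketch of one possible non-deprecated path, assuming BigQueryHook.create_empty_table with a table_resource carrying an externalDataConfiguration is the intended replacement (the thread does not name the replacement, so treat this purely as an illustration; all values are made up):

from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

# Sketch only: assumes create_empty_table + table_resource stands in for the
# deprecated create_external_table; project/dataset/table names are hypothetical.
bq_hook = BigQueryHook(gcp_conn_id="google_cloud_default")
bq_hook.create_empty_table(
    project_id="my-project",
    dataset_id="my_dataset",
    table_id="my_external_table",
    table_resource={
        "externalDataConfiguration": {
            "sourceFormat": "CSV",
            "sourceUris": ["gs://my-bucket/data/*.csv"],
        },
    },
)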

@@ -276,7 +276,7 @@ def execute(self, context):
                 encryption_configuration=self.encryption_configuration
             )
         else:
-            cursor.run_load(
+            bq_hook.insert_job(configuration=dict(
Member:

That won't do:

airflow.exceptions.AirflowException: Unknown job type. Supported types: dict_keys(['load', 'copy', 'extract', 'query'])

as per the docstring:

    The configuration parameter maps directly to BigQuery's configuration
    field in the job object. See
    https://cloud.google.com/bigquery/docs/reference/v2/jobs for details.

So what I would suggest is:

  • add a configuration parameter to the constructor
  • if this parameter is passed, then use the insert_job method
  • if not, then show a deprecation warning that users should pass a load job definition, and use the existing logic

Probably we should validate that only a load job is passed (see the sketch below)... WDYT @edejong ?
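
A minimal sketch of that suggestion, assuming the operator grows a configuration argument (the helper name and keyword handling below are hypothetical, not the merged implementation):

import warnings
from typing import Optional

from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook


def run_gcs_to_bq_load(bq_hook: BigQueryHook, configuration: Optional[dict] = None, **legacy_load_kwargs):
    """Hypothetical helper illustrating the proposed branching; not Airflow code."""
    if configuration:
        # Validate that only a load job definition was supplied.
        if set(configuration) != {"load"}:
            raise ValueError("GCSToBigQueryOperator only supports 'load' job configurations.")
        return bq_hook.insert_job(configuration=configuration)

    warnings.warn(
        "Passing individual load parameters is deprecated; "
        "pass a full BigQuery 'load' job configuration instead.",
        DeprecationWarning,
    )
    # Fall back to the existing parameter-based path (run_load, as in the
    # original code; assumed here to be callable on the hook).
    return bq_hook.run_load(**legacy_load_kwargs)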

stale bot commented Oct 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale (Stale PRs per the .github/workflows/stale.yml policy file) label on Oct 4, 2020
stale bot closed this on Oct 12, 2020
@turbaszek (Member):

Hey @coopergillan, do you have bandwidth to continue this PR? It's quite an important one, as we have a few issues around this operator 🔥 Maybe @juneoh would like to help?

@coopergillan (Contributor, Author):

@turbaszek - sorry for the nearly year-long silence. I am hoping to take another look at this, as it appears that issue #10288 is still open.

Labels
provider:google Google (including GCP) related issues
stale Stale PRs per the .github/workflows/stale.yml policy file