
Fix error when create external table using table resource #17998

Merged
merged 7 commits into apache:main
Sep 13, 2021

Conversation

tnyz
Contributor

@tnyz tnyz commented Sep 2, 2021

affected operator: class BigQueryCreateExternalTableOperator(BaseOperator):
root cause: destination_project_dataset_table is used at line 1201 but cannot pass the init check at line 1154, where ValueError("You provided both table_resource and exclusive keywords arguments.") is raised



destination_project_dataset_table is used at line 1201 but cannot pass the init check at line 1154
@tnyz tnyz requested a review from turbaszek as a code owner September 2, 2021 21:04
@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Sep 2, 2021
@boring-cyborg

boring-cyborg bot commented Sep 2, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally. It's a heavy Docker image, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

Member

@potiuk potiuk left a comment


This is wrong. You should pass the destination table via the tableReference dict param of "table_resource": https://cloud.google.com/bigquery/docs/reference/rest/v2/tables
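For illustration, a table_resource payload carrying the destination via tableReference might look like the sketch below. All project, dataset, table, and bucket names here are placeholders, not values from this PR:

```python
# Hypothetical table_resource for an external table; the destination is
# expressed through the nested tableReference object rather than the
# legacy destination_project_dataset_table string.
table_resource = {
    "tableReference": {
        "projectId": "my-project",
        "datasetId": "my_dataset",
        "tableId": "my_table",
    },
    "externalDataConfiguration": {
        "sourceFormat": "CSV",
        "sourceUris": ["gs://my-bucket/data/*.csv"],
    },
}
```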

@potiuk potiuk closed this Sep 4, 2021
@tnyz
Contributor Author

tnyz commented Sep 4, 2021

@potiuk thanks for the feedback; however, I am still seeing this issue, as line 1157 only supports initializing self.destination_project_dataset_table from the parameter, not from the table resource. Do you mind elaborating and checking whether the configuration is unreachable from the table resource? Thanks

@potiuk
Member

potiuk commented Sep 5, 2021

Please read the docs. The new "table_resource", when provided, replaces the usage of many parameters, and those parameters are ignored if specified. Whenever you use "table_resource", all provided parameters should be passed through "table_resource" rather than through individual parameters. In this particular case the right way of passing the destination table is via "table_resource"'s "tableReference" object (https://cloud.google.com/bigquery/docs/reference/rest/v2/TableReference), where you have to specify projectId, datasetId, and tableId (previously this was all passed as a single "destination_project_dataset_table" string of the form "(project.)dataset.table").

If you pass both "table_resource" and "destination_project_dataset_table", the value passed via "destination_project_dataset_table" will be ignored by the new version of the bigquery library; that's why you get the error when you use the two together.

I agree that this is not super clear from the error message. I found it out myself a couple of days ago when I corrected some warning messages. I am thinking about improving this a bit, because it also struck me as rather cryptic communication.
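The mapping from the legacy string to the tableReference fields can be sketched as follows. This is an illustrative helper, not provider code; parse_destination and the default_project argument are made-up names:

```python
def parse_destination(destination: str, default_project: str = "default-project") -> dict:
    """Split a legacy '(project.)dataset.table' string into the fields
    expected by table_resource's tableReference object."""
    parts = destination.split(".")
    if len(parts) == 3:
        project_id, dataset_id, table_id = parts
    elif len(parts) == 2:
        # Project segment is optional in the legacy format.
        project_id = default_project
        dataset_id, table_id = parts
    else:
        raise ValueError(f"Expected '(project.)dataset.table', got: {destination!r}")
    return {"projectId": project_id, "datasetId": dataset_id, "tableId": table_id}
```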

@bmbejcek

bmbejcek commented Sep 6, 2021

I am also currently facing this issue. If we don't have a destination_project_dataset_table variable, it says that it is missing even when passing in this information through the table_resource schema you mentioned @potiuk. If we include both versions, then it says we can't have duplicate information. Seems like a catch-22.

@tnyz tnyz changed the title Fix error when create external table Fix error when create external table using table resource Sep 7, 2021
@potiuk
Member

potiuk commented Sep 7, 2021

There were some default values in past versions of the Google provider that were fixed in 5.1.0. The default values were set for a few parameters (where they should not have been if "table_resource" was specified).

You can set those parameters to None (for example destination_project_dataset_table) when you use table_resource, if you are on an earlier version of the provider.

The new version of the provider is free of that problem, I think.
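The workaround above can be pictured with a plain-Python sketch of the exclusivity check discussed in this thread. This is an illustrative reconstruction, not the provider's actual code; the name check_exclusive_args is hypothetical:

```python
def check_exclusive_args(table_resource, **legacy_kwargs):
    """Reject mixing table_resource with any non-None legacy parameter,
    mirroring the init check that raises the ValueError quoted above."""
    if table_resource and any(v is not None for v in legacy_kwargs.values()):
        raise ValueError(
            "You provided both `table_resource` and exclusive keywords arguments."
        )
```

This also shows why older provider versions tripped the check even when the user passed only table_resource: a non-None default for a legacy parameter counts as "provided", so explicitly setting it to None avoids the error.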

@tnyz
Contributor Author

tnyz commented Sep 7, 2021

@potiuk to be more specific, this is the block executed when creating from a table resource; using None will result in TableReference.from_string(None), right?

        if self.table_resource:
            tab_ref = TableReference.from_string(self.destination_project_dataset_table)
            bq_hook.create_empty_table(
                table_resource=self.table_resource,
                project_id=tab_ref.project,
                table_id=tab_ref.table_id,
                dataset_id=tab_ref.dataset_id,
            )
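A minimal stand-in for that parsing step (not the real google-cloud-bigquery code; from_string_analogue is a hypothetical name) shows why a None destination fails here, assuming the parser starts by splitting the string on dots:

```python
def from_string_analogue(full_table_id):
    # Splitting happens first, so a None argument raises AttributeError
    # before any hook call is made.
    parts = full_table_id.split(".")
    if len(parts) != 3:
        raise ValueError(f"Expected 'project.dataset.table', got: {full_table_id!r}")
    project, dataset, table = parts
    return {"project": project, "dataset_id": dataset, "table_id": table}
```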

@potiuk potiuk reopened this Sep 7, 2021
@potiuk
Member

potiuk commented Sep 7, 2021

Ah, I see now. I had not looked that far. I am not sure though if that is the right fix. Looking at the code more deeply, I think this is really a mixture of the old and new way of passing table resource parameters, and there is ambiguity here: it looks like the idea was to put everything in the table resource, yet here we have a weird expectation that part of it will be passed as destination_project_dataset_table.

So IMHO the right approach is to change the lines you marked to:

        if self.table_resource:
            bq_hook.create_empty_table(
                table_resource=self.table_resource,
                project_id=None,
                table_id=None,
                dataset_id=None,
            )

@tnyz
Contributor Author

tnyz commented Sep 7, 2021

@potiuk I agree, please review the updated PR

@tnyz
Contributor Author

tnyz commented Sep 7, 2021

I have not closely examined the other operators, but they might suffer from the same issue too

@tnyz tnyz requested a review from potiuk September 8, 2021 00:16
@@ -1119,7 +1119,6 @@ def __init__(
# BQ config
kwargs_passed = any(
[
destination_project_dataset_table,
Member


I believe this should be restored now

Contributor Author


good call

@tnyz tnyz requested a review from potiuk September 8, 2021 16:59
@@ -1195,36 +1200,24 @@ def execute(self, context) -> None:
else:
schema_fields = self.schema_fields

if schema_fields and self.table_resource:
self.table_resource["externalDataConfiguration"]["schema"] = schema_fields
Member


Does removing this logic break backward compatibility?

Contributor Author


Considering it was incorrectly introduced and table creation via table resource is not properly supported, I don't think this will break backwards compatibility

Member


Agree.

@github-actions

The PR is likely OK to be merged with just a subset of tests for the default Python and Database versions, without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Sep 12, 2021
@potiuk
Member

potiuk commented Sep 12, 2021

Static checks are failing

@tnyz
Contributor Author

tnyz commented Sep 13, 2021

Static checks are failing

The test has been fixed.

@uranusjr uranusjr merged commit 8ae2bb9 into apache:main Sep 13, 2021
@boring-cyborg

boring-cyborg bot commented Sep 13, 2021

Awesome work, congrats on your first merged pull request!
