Fix error when creating an external table using table resource #17998
Conversation
destination_project_dataset_table is used at line 1201 but cannot pass the init check at line 1154
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
This is wrong. You should pass the destination table via the tableReference dict param of "table_resource": https://cloud.google.com/bigquery/docs/reference/rest/v2/tables
@potiuk thanks for the feedback, however I am still seeing this issue, as line 1157 only supports initializing self.destination_project_dataset_table from the parameter, not from the table resource. Do you mind elaborating more and checking whether the configuration is unreachable from the table resource? Thanks
Please read the docs. The new "table_resource", when provided, replaces many parameters, and they are ignored if specified. Whenever you use "table_resource", all provided parameters should be passed through "table_resource" rather than through individual parameters. In this particular case the right way of passing the destination table is via "table_resource"'s "tableReference" object (https://cloud.google.com/bigquery/docs/reference/rest/v2/TableReference), where you have to specify projectId, datasetId, and tableId (previously this was all passed as a single "destination_project_dataset_table" string). If you pass both "table_resource" and "destination_project_dataset_table", the one passed by "destination_project_dataset_table" will be ignored by the new version of the bigquery library; that's why when you use "table_resource" and "destination_project_dataset_table" together you get the error. I agree that this isn't super clear from the error message; I found it out myself a couple of days ago when I corrected some warning messages. I'm thinking about improving this a bit, because it also struck me as rather cryptic communication.
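To make the mapping concrete, here is a hedged sketch of how the legacy dotted string corresponds to the tableReference object inside table_resource. The helper name split_destination is illustrative only, not Airflow or BigQuery API; it assumes a simple "project.dataset.table" layout.

```python
# Illustrative helper (not part of Airflow): map the legacy dotted string
# onto the tableReference object the BigQuery REST API expects.
def split_destination(destination_project_dataset_table: str) -> dict:
    project_id, dataset_id, table_id = destination_project_dataset_table.split(".")
    return {"projectId": project_id, "datasetId": dataset_id, "tableId": table_id}

# The "new" way: the destination lives inside table_resource itself,
# so no separate destination_project_dataset_table parameter is needed.
table_resource = {
    "tableReference": split_destination("my-project.my_dataset.my_table"),
    "externalDataConfiguration": {
        "sourceFormat": "CSV",
        "sourceUris": ["gs://my-bucket/data/*.csv"],
    },
}
```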
I am also currently facing this issue. If we don't have a …
There were some … You can set those parameters to None (for example …). The new version of the provider is free of that problem, I think.
@potiuk to be more specific, this is the block when executing the command from the table resource; using None will result in …
Ah I see now. I have not looked that far. I am not sure though if that is the right fix. I think, looking at the code deeper, that this is really a mixture of the old and new way of passing table resource parameters - there is ambiguity here. It looks like the idea was to put everything in the table resource, and here we have a weird expectation that part of it will be passed as … So IMHO the right approach is to change the lines you marked to:
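The direction suggested above can be sketched roughly like this. It is a simplified standalone function under my own naming (resolve_table_reference is not the actual Airflow diff): when table_resource is present, the destination is read from its tableReference; otherwise fall back to the legacy dotted string.

```python
from typing import Optional

# Simplified sketch, not the actual Airflow code: prefer tableReference from
# table_resource, falling back to the legacy dotted string only when no
# table_resource was supplied.
def resolve_table_reference(
    table_resource: Optional[dict],
    destination_project_dataset_table: Optional[str],
) -> dict:
    if table_resource is not None:
        # table_resource is authoritative; individual params are ignored.
        return table_resource["tableReference"]
    project_id, dataset_id, table_id = destination_project_dataset_table.split(".")
    return {"projectId": project_id, "datasetId": dataset_id, "tableId": table_id}
```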
@potiuk I agree, please review the updated PR
I have not closely examined other operators, but they might suffer from the same issue too
@@ -1119,7 +1119,6 @@ def __init__(
     # BQ config
     kwargs_passed = any(
         [
-            destination_project_dataset_table,
I believe this should be restored now
good call
@@ -1195,36 +1200,24 @@ def execute(self, context) -> None:
         else:
             schema_fields = self.schema_fields

-        if schema_fields and self.table_resource:
-            self.table_resource["externalDataConfiguration"]["schema"] = schema_fields
Does removing this logic break backward compatibility?
Considering it was incorrectly introduced and table creation via table resource is not properly supported, I don't think this will break backwards compatibility
Agree.
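For context, with the special-case merge in the diff above removed, a schema for the external table would be supplied directly inside table_resource. A hedged sketch follows: the field names mirror the BigQuery tables REST resource, while the project, dataset, bucket, and column values are made up for illustration.

```python
# Sketch: schema embedded directly in table_resource, rather than merged in
# from the schema_fields parameter by the removed block.
table_resource = {
    "tableReference": {
        "projectId": "my-project",
        "datasetId": "my_dataset",
        "tableId": "my_table",
    },
    "externalDataConfiguration": {
        "sourceFormat": "NEWLINE_DELIMITED_JSON",
        "sourceUris": ["gs://my-bucket/events/*.json"],
        "schema": {
            "fields": [
                {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
                {"name": "payload", "type": "STRING", "mode": "NULLABLE"},
            ]
        },
    },
}
```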
The PR is likely OK to be merged with just a subset of tests for default Python and Database versions, without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.
Static checks are failing
Test has been fixed
Awesome work, congrats on your first merged pull request!
affected operator: class BigQueryCreateExternalTableOperator(BaseOperator)
root cause: destination_project_dataset_table is used at line 1201 but cannot pass the init check at line 1154, where ValueError("You provided both table_resource and exclusive keywords arguments.") is raised
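The guard described in the root cause can be reproduced in miniature. This is a simplified sketch under my own function name (check_exclusive_args); the real __init__ checks many more keyword arguments in its kwargs_passed list.

```python
# Miniature version of the init-time guard: table_resource must not be
# combined with the individual ("exclusive") keyword arguments.
def check_exclusive_args(table_resource, destination_project_dataset_table=None,
                         schema_fields=None):
    kwargs_passed = any([destination_project_dataset_table, schema_fields])
    if table_resource and kwargs_passed:
        raise ValueError(
            "You provided both `table_resource` and exclusive keywords arguments."
        )
```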
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.