perf(BigQuery): pass table_id as str type #23141
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Thanks for that - and sorry for the delay, it's been a busy period for us all (and it's going to last for a while). I am not sure if this one is good or not - I am not a BQ expert, but maybe @turbaszek, @mik-laj, @TobKed, @lwyszomi or @bhirsz can chime in here. In any case, this does not seem like something that needs to be merged quickly.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
I apologize, it somehow slipped my notice - I'm taking a look.
So if I got it correctly, the hook's `delete_table` accepts 4 types of parameters, but we're feeding it with only one of them. I think the change is OK.
Thanks for checking. Yes, what you said is exactly why I submitted this PR.
Awesome work, congrats on your first merged pull request!
Recently, during a migration from 1.10.14 to 2.2.3, I noticed an issue in the `BigQueryDeleteTableOperator`. For context, there are two ways to specify a table in GCP BigQuery: one with the project_id, like `my-project.mydataset.mytable`, and one without it, like `mydataset.mytable`.

In 1.10.14, I was using the version without project_id, because the table can still be recognized by `BigQueryHook`, which uses `bigquery_conn_id` to fetch the `project_id` from the connection configuration. The path that passes this info is: gcp_api_base_hook#L131 -> gcp_api_base_hook#L200 -> bigquery_hook#L71 -> bigquery_hook#L1498.
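The fallback behavior described above can be sketched in plain Python. This is a simplified mimic, not Airflow's actual `BigQueryHook` code; `resolve_table` and `connection_project` are hypothetical names:

```python
from typing import Optional, Tuple

def resolve_table(table_id: str, connection_project: Optional[str]) -> Tuple[str, str, str]:
    """Resolve a BigQuery table spec, falling back to the connection's
    project_id when table_id omits it (mimics the 1.10.14 behavior)."""
    parts = table_id.split(".")
    if len(parts) == 3:                  # "my-project.mydataset.mytable"
        project, dataset, table = parts
    elif len(parts) == 2:                # "mydataset.mytable"
        if connection_project is None:
            raise ValueError("project_id is neither in table_id nor in the connection")
        project, dataset, table = (connection_project, *parts)
    else:
        raise ValueError("table_id must be 'project.dataset.table' or 'dataset.table'")
    return project, dataset, table

# With only "dataset.table", the connection's project fills the gap:
print(resolve_table("mydataset.mytable", "my-project"))
# ('my-project', 'mydataset', 'mytable')
```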
But after upgrading to 2.2.3, a full `table_id` is required. This is unexpected: since `bigquery_conn_id`/`gcp_conn_id` is still a valid parameter, `BigQueryDeleteTableOperator` should still be able to get the `project_id` automatically from the connection configuration. The root cause is this line of code, bigquery#L1195, which forces users to pass a full `table_id` to create a `Table` instance.

The method `delete_table` accepts 4 types of tables - `Table`, `TableReference`, `TableListItem`, and `str` - as shown in client#L1754. Then, in client#L1784, it converts all 4 types to a single one, `TableReference`, as shown in table#L2689.

So, back to the possible improvement: I wonder if it would make migration smoother if, instead of using `Table.from_string` to build a `Table`, a `str` parameter were passed directly. That `str` can be just `mydataset.mytable`, with the `project_id` set by the `Client`, as shown in bigquery#L1194. Due to GCP's plans, companies are slowly migrating to Airflow 2.0 for better support. This improvement would save them from adding the `project_id` to the `table_id` in hundreds of DAGs, since it is already included in the connection configuration.

Below are two scenarios based on the two formats of specifying a BigQuery table:
- A `table_id` like `mydataset.mytable` is passed in bigquery#L1797 and the corresponding `project_id` is configured on the connection. This works as expected; if no `project_id` can be found, the error is caught in _helpers#L825.
- A `table_id` like `my-project.mydataset.mytable` is passed. In this case, whether or not a `project_id` is configured on the connection, the `project_id` embedded in the `table_id` takes precedence, as shown in _helpers#L836.

This is my first attempt at submitting a PR to an open-source repo. Please let me know how I can improve. It is also fine if this change is not worth merging - I enjoyed the time spent looking into it.
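The two scenarios described in the PR can be illustrated with a toy stand-in for the client. All names here (`FakeClient`, the minimal `TableReference`) are hypothetical mimics, not the real google-cloud-bigquery code:

```python
from typing import NamedTuple, Optional, Union

class TableReference(NamedTuple):
    """Minimal stand-in for google.cloud.bigquery.TableReference."""
    project: str
    dataset_id: str
    table_id: str

    @classmethod
    def from_string(cls, full_id: str, default_project: Optional[str] = None):
        parts = full_id.split(".")
        if len(parts) == 3:
            # Scenario 2: a project embedded in the string always wins,
            # regardless of the configured default_project.
            return cls(*parts)
        if len(parts) == 2 and default_project:
            # Scenario 1: fall back to the client's default project.
            return cls(default_project, *parts)
        raise ValueError("cannot resolve a project for %r" % full_id)

class FakeClient:
    """Toy client (NOT the real google-cloud-bigquery Client) mimicking how
    delete_table coerces a str argument to a TableReference, using the
    client's own project as the fallback."""
    def __init__(self, project: str):
        self.project = project
        self.deleted = []

    def delete_table(self, table: Union[str, TableReference]) -> TableReference:
        if isinstance(table, str):
            table = TableReference.from_string(table, default_project=self.project)
        self.deleted.append(table)
        return table

client = FakeClient(project="conn-project")
# Scenario 1: no project in table_id -> the client's (connection's) project is used.
assert client.delete_table("mydataset.mytable").project == "conn-project"
# Scenario 2: an embedded project wins over the configured one.
assert client.delete_table("my-project.mydataset.mytable").project == "my-project"
```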
@kaxil @eladkal @potiuk