GCSToBQ operator does not respect project_id in deferrable mode with impersonation chain. #32093
Comments
Thank you for submitting this -- I have been procrastinating filing an issue of my own about this inconsistent handling of project ids. I believe the problems with changes to Somewhat inconveniently, I am only a few hours away from a 2-week vacation, so my responses may be delayed.
^ I decided to file one, just to get my thoughts down.
@bhagany Thanks for your comments. I'd be more than happy to talk more about this, but there's no rush while you're away. This PR was really just me patching a specific issue that we currently have to work around internally, so it would be good to deal with these issues more comprehensively.
@nathadfield @avinashpandeshwar
@eladkal Fix to GCSToBigQueryOperator for deferrable mode was addressed in #32232, or did you mean for
Yes sorry.. I meant to
@nathadfield @bhagany For cc @eladkal
I think it's best to raise a PR with what you believe is the best course of action and explain the pros/cons in the PR description. It's easier to have this discussion where we can see the code and the scope of changes.
Apache Airflow version
2.6.2
What happened
When using the GCSToBigQueryOperator in deferrable mode with an impersonation_chain service account whose default project_id differs from the project_id specified in the operator arguments, a failure occurs.
I believe this happens because, although the BigQuery job that inserts the data is raised against self.project_id in _submit_job, in deferrable mode the operator then tries to find the job within the project in self.hook.project_id. It is possible that the default project_id assigned to the impersonation chain service account is different from the project_id specified to the operator.
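To make the suspected mismatch concrete, here is a rough sketch in plain Python. It is not Airflow's or BigQuery's real code: FakeBigQuery and the job id load_job_1 are hypothetical stand-ins, and the only thing it models is that a job is scoped to the project it was created in.

```python
# Simplified stand-in for the behaviour described above; none of these
# names are Airflow's or BigQuery's real implementation.


class FakeBigQuery:
    """Stores jobs keyed by (project_id, job_id), since a job belongs to one project."""

    def __init__(self):
        self._jobs = {}

    def insert_job(self, project_id, job_id):
        self._jobs[(project_id, job_id)] = "DONE"

    def get_job_status(self, project_id, job_id):
        try:
            return self._jobs[(project_id, job_id)]
        except KeyError:
            raise LookupError(f"Not found: Job {project_id}:{job_id}") from None


bq = FakeBigQuery()
operator_project = "king-coredatasets-sandbox"  # project_id passed to the operator
hook_default_project = "king-cdmt-etl-sandbox"  # default project of the impersonated account

# The job is submitted against the operator's project...
bq.insert_job(operator_project, "load_job_1")

# ...but the deferred lookup uses the hook's default project, so it fails.
try:
    bq.get_job_status(hook_default_project, "load_job_1")
    lookup_failed = False
    error_message = ""
except LookupError as exc:
    lookup_failed = True
    error_message = str(exc)

# Looking in the project where the job was actually created succeeds.
status_in_operator_project = bq.get_job_status(operator_project, "load_job_1")
```

The same job id is found or not found depending only on which project is searched, which matches the symptom described in this report.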
In the above error, you can see that it cannot find the job_id airflow_apptweak_king_itunes_connect_channels_load_active_devices_to_bq_2023_06_22T07_00_00_00_00_4842808969d21632ecbb76ffca48aabd in the project king-cdmt-etl-sandbox. In fact, this job_id was created successfully in the project king-coredatasets-sandbox.
What you think should happen instead
I think that we should modify the call to self.defer to receive self.project_id rather than self.hook.project_id.
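A minimal sketch of the idea, again in plain Python rather than the provider's actual code. FakeHook, FakeTrigger and build_trigger are hypothetical names, and falling back to the hook's default project when the operator has no project_id is my own assumption, not something confirmed for the real operator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeHook:
    """Hypothetical hook; project_id is the default project of its credentials."""

    project_id: str


@dataclass
class FakeTrigger:
    """Hypothetical trigger; project_id is where it will poll for the job."""

    job_id: str
    project_id: str


def build_trigger(job_id: str, operator_project_id: Optional[str], hook: FakeHook) -> FakeTrigger:
    # Prefer the project the job was submitted to. Falling back to the hook's
    # default project is an assumption made for this sketch.
    return FakeTrigger(job_id=job_id, project_id=operator_project_id or hook.project_id)


hook = FakeHook(project_id="king-cdmt-etl-sandbox")

# With an explicit operator project, the trigger polls that project.
explicit = build_trigger("load_job_1", "king-coredatasets-sandbox", hook)

# Without one, the sketch falls back to the hook's default project.
fallback = build_trigger("load_job_1", None, hook)
```

With this choice, the trigger looks for the job in the same project that _submit_job used, so the lookup no longer depends on the impersonated account's default project.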
How to reproduce
I haven't quite got the exact steps to reproduce but I will submit a PR for review soon.
Operating System
Debian GNU/Linux 11 (bullseye)
Versions of Apache Airflow Providers
apache-airflow-providers-google==10.0.0
Deployment
Astronomer
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct