Fix BigQueryGetDataOperator where project_id is not being respected in deferrable mode #32488

@avinashpandeshwar avinashpandeshwar commented Jul 10, 2023

The BigQueryGetDataOperator's project_id parameter currently specifies the table's storage project, but in deferrable mode it is also used as the project for job submission. This PR separates these responsibilities by introducing two new parameters:

  • table_project_id to specify the table's storage project.
  • job_project_id to specify the project in which the job that fetches data from the table is submitted.

Both parameters are optional and default to the hook's project_id.

The pre-existing project_id parameter is deprecated, with log warnings on usage.
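
The defaulting and deprecation behavior described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the operator's actual code: the helper name `resolve_project_ids` is hypothetical, and the fallback of the deprecated project_id to table_project_id is an assumption based on its earlier meaning (the table's storage project).

```python
from __future__ import annotations

import logging

log = logging.getLogger(__name__)


def resolve_project_ids(
    hook_project_id: str,
    table_project_id: str | None = None,
    job_project_id: str | None = None,
    project_id: str | None = None,  # deprecated parameter
) -> tuple[str, str]:
    """Return (table_project, job_project) for a hypothetical operator call."""
    if project_id is not None:
        # The PR description says usage of project_id produces a log warning.
        log.warning(
            "The project_id parameter is deprecated; "
            "use table_project_id or job_project_id instead."
        )
        # Assumption: the legacy value keeps its old meaning (table project)
        # and only applies when table_project_id was not given explicitly.
        if table_project_id is None:
            table_project_id = project_id

    # Both new parameters are optional and default to the hook's project.
    return (
        table_project_id or hook_project_id,
        job_project_id or hook_project_id,
    )
```

Under these assumptions, `resolve_project_ids("hook-proj", table_project_id="data-proj", job_project_id="billing-proj")` returns `("data-proj", "billing-proj")`, while calling it with only `"hook-proj"` resolves both values to the hook's project.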

related: #32093



…ct. A new parameter table_project_id will be used for specifying table storage project.
@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Jul 10, 2023
@avinashpandeshwar avinashpandeshwar changed the title from "Fix/fix bigquerygetdataoperator projectid issue" to "Fix BigQueryGetDataOperator where project_id is not being respected in deferrable mode" Jul 10, 2023
will be returned from. If None, it will be derived from the hook's project ID. (templated)
:param table_project_id: (Optional) The project ID of the requested table.
If None, it will be derived from the hook's project ID. (templated)
:param project_id: (Optional) Google Cloud Project where the job is running.

You are changing what project_id means.
If we want to do that, it can only be done in a major release, and even then it is not something I am comfortable with, as it may confuse users.

My best advice here is to deprecate project_id and use two new explicit parameter names.


Thanks for the advice. I have reverted project_id's meaning and usage to what they were before. A new parameter, job_project_id, will handle job submission.
