Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(providers.google): add setter to BigQueryInsertJobOperator sql property #33211

Conversation

Lee-W
Copy link
Member

@Lee-W Lee-W commented Aug 8, 2023

due to the introduction of this property, users who set the sql attribute to some value will encounter an AttributeError This commit indents to mitigate this issue


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

…operty

due to the introduction of this property, users who set the sql attribute to some value will encounter an AttributeError
This commit indents to mitigate this issue
@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Aug 8, 2023
@hussein-awala
Copy link
Member

@mobuchowski could you take a look?

related: #31293

@hussein-awala
Copy link
Member

Since your PR fixes a bug not detected in the current tests, could you add a new test which reproduces the issue and passes with your change?

@Lee-W
Copy link
Member Author

Lee-W commented Aug 8, 2023

I think the sql.setter might not be desired in the long term. Thus, I add a deprecation warning to the setter

@Lee-W
Copy link
Member Author

Lee-W commented Aug 8, 2023

Since your PR fixes a bug not detected in the current tests, could you add a new test which reproduces the issue and passes with your change?

Sure. It's late in my time zone now. Will add some tests early tomorrow

@mobuchowski
Copy link
Contributor

mobuchowski commented Aug 8, 2023

What's the use case here? Migration from BigQueryExecuteQueryOperator? In this case fixing this makes sense.

BigQueryInsertJobOperator does not use sql for anything other than for OpenLineage method. So I think the fix might be to change name of the property to something unused, like executed_sql, and make _BigQueryOpenLineageMixin read it instead.

Current solution makes OpenLineage integration not work, reading from user provided data rather than actually executed SQL in self.configuration["query"]["query"].

@eladkal
Copy link
Contributor

eladkal commented Aug 8, 2023

@Lee-W is the issue appear only in BigQueryExecuteQueryOperator?

Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Putting a block on merge as this related to vote in progress and for the moment I am not sure if this is the right fix for the problem or maybe the best fix is to get rid of the deprecated class and make major release)

The BigQueryExecuteQueryOperator is deprecated for really long time and this is not the first time it causes pain.

@mobuchowski
Copy link
Contributor

@eladkal it's not changing BigQueryExecuteQueryOperator, but BigQueryJobInsertOperator. I've introduced sql property to it, to have common path between those two operators for actually getting sql text. Now, the BigQueryExecuteQueryOperator part was removed from PR - so the property can be freely renamed as long as it's changed accordingly in _BigQueryOpenLineageMixin. Those changes do not touch deprecated BigQueryExecuteQueryOperator at all.

@mobuchowski
Copy link
Contributor

Added alternative PR addressing this: #33214

@potiuk
Copy link
Member

potiuk commented Aug 8, 2023

Another interesting possibility would be to turn the property into @cached_property - that might be simpler and work better if we do not care about "caching" attribute of it (i.e. what happens if someone changes configuration["query"]["query"] after first access.

As discussed in #33084 (comment) - @cached_property has this nice feature that it actually defines attribute on first use and that attribute can be overridden or deleted (cache invalidation).

Maybe that is the best fix @Lee-W @mobuchowski ?

@pankajkoti
Copy link
Member

@mobuchowski just in case you were wondering where we use this. Long time back we tried to add an OL extractor for our async operator here: https://github.com/astronomer/astronomer-providers/blob/main/astronomer/providers/google/cloud/extractors/bigquery.py#L131

@mobuchowski
Copy link
Contributor

mobuchowski commented Aug 8, 2023

@pankajkoti ooh... this actually makes sense now and does not break OL 😄 BTW, care to migrate those to google provider now that we have OL provider released now?

@potiuk yep - this sounds like the best solution for now. According to @pankajkoti's code, this is pretty much irrelevant - the change is for backwards compatibility anyway

@eladkal
Copy link
Contributor

eladkal commented Aug 8, 2023

I think the sql.setter might not be desired in the long term.

Why?


The report claims that BigQueryInsertJobOperator is broken.
Can someone please share example usage that demonstrate it?

@mobuchowski
Copy link
Contributor

Added proposed @cached_property solution here: #33218

@mobuchowski
Copy link
Contributor

@pankajkoti @Lee-W what about adding try/except for that setter on your side, and going with rc2 release?

@pankajkoti
Copy link
Member

@eladkal I will try giving an attempt to elaborate.

While exploring openlineage earlier some time back, we wrote a custom OpenLineage extractor for one of our Asynchronous operators at Astronomer https://github.com/astronomer/astronomer-providers/blob/main/astronomer/providers/google/cloud/extractors/bigquery.py

In this module, we were trying to set the sql attribute on the BigQueryInsertJobOperator here: https://github.com/astronomer/astronomer-providers/blob/main/astronomer/providers/google/cloud/extractors/bigquery.py

Now since that attribute is redefined to be a property now, while we're trying to assign a value there, it gives an error saying we cannot set it. Here is a test that failed for us while we were testing the RC. https://app.circleci.com/pipelines/github/astronomer/astronomer-providers/6315/workflows/3e9c46a3-ee33-4c58-98cb-1994322af0d7/jobs/42343

Like @mobuchowski suggested, we would need to migrate our extractor to accommodate the changes. But meantime it would help us if we have a backward compatible fix and I think #33218 would help us here with the backward compatibility, otherwise users of this OSS async provider who have enabled OpenLineage extractor for this operator would be affected.

@pankajkoti
Copy link
Member

pankajkoti commented Aug 8, 2023

@pankajkoti @Lee-W what about adding try/except for that setter on your side, and going with rc2 release?

Since we don't have an upper bound for the google provider https://github.com/astronomer/astronomer-providers/blob/main/setup.cfg#L64, our previous releases won't have the try/except block and unfortunately it would break the provider :(

@potiuk
Copy link
Member

potiuk commented Aug 8, 2023

@pankajkoti @Lee-W what about adding try/except for that setter on your side, and going with rc2 release?

Since we don't have an upper found for the google provider https://github.com/astronomer/astronomer-providers/blob/main/setup.cfg#L64, our previous releases won't have the try/except block and unfortunately it would break the provider :(

I'd be for making RC3 with the fix I proposed and @mobuchowski implemented in #33218. While not technically breaking, this is breatking some ways with which Open-Lineage integration works for existing open-lineage integration currently existing in Astronomer.

It's a bit funny how Astronomer DOS'ed itself with it a bit - 🤣 - but hey, I think it's important not to break it - especially that this is likely version of the provider that will come with 2.7.0.

@Lee-W
Copy link
Member Author

Lee-W commented Aug 9, 2023

Thanks all for your suggestions and helping out! I think #33218 works better 🎉 Let me close this one.

@Lee-W Lee-W closed this Aug 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants