-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make schema in DBApiHook private #17423
Merged
potiuk
merged 1 commit into
apache:main
from
potiuk:fix-backwards-compatiblity-problem-with-dbapi-schema
Aug 7, 2021
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, maybe this:
That should work for core both pre and post 2.2.0 and wouldn't break behavior of
get_uri
either like doing__schema
does.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that having
schema
as attribute in DBApiHook will encourage people to use it in their other Hooks deriving from DBApiHook. It will be rather difficult to make sure that everyone is using it in "hasattr" way. Even if we two remember it now, we will forget about it and other committers who are not involved will not even know about this. We could potentially automate a pre-commit check if the "DBApiHook's schema" is accessed with hasattr, but I think we will never be able to do it with 100% accuracy and I think it's simply not worth it for that single field, that can be simply replaced with single line for each operator that wants to use it (in init):I think we should think about DBApiHook as "Public" API of Airflow and any change to it should be very, very carefully considered.
Another option would be to add
>=Airflow 2.2
limitation to Postgres operator (and any other operator that uses it), but again I think sacrificing backwards compatibility in this case is simply not worth it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah actually I see an error .in my solution.. I should not do "pop" in the args :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed "pop" from DBApi to leave it for other hooks.
Actually I just realized the way it was implemented before - with pop() had undesired side effect for Hooks that already used kwargs.pop() .The side effect was that kwargs.pop() in DBApiHook would remove the
schema
from kwargs in derived classes and theirself.schema = kwargs.pop("schema", None)
would override the schema to None (!)Example: mysql:
https://github.com/apache/airflow/blob/main/airflow/providers/mysql/hooks/mysql.py#L61
So in fact, the original change (not yet released luckily) in DBApiHook was even more disruptive than the failed
PostgresHook
. I am actually glad it was uncovered now, as it would be far more disruptive it was released in Airflow.That's why we should be extremely careful with changing DBApiHook (and BaseHook and similar).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😱 🙀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is unlikely edge case (much less likely than use of schema when it will be defined in DBApiHook as public field). I hardly see the use for that. Why somone would like to define a DBApi -derived hook and pass a 'schema' parameter to it while also changing it in it's own
__init__
? Seems extremely unlikely, also It feels natural that in such case the change should be done beforesuper.__init__()
rather than after.We can assume that "schema" is our convention for kwarg para that we should use for all DB Hooks. We can standardise it (it's already commonly used) and release in Airflow 3 DBApiHook I think.
For now we might want to add some more docs/comments explaining it and some Airflow 3 notice about it. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have included an example. This is what I meant:
That's why I'm thinking letting both
DbApiHook
and any derived hooks both setschema
might be the best of both worlds here? Then the derived hooks could stop settingschema
in Airflow 3 (or the next time they get a min core version bump really).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - I think i fiigured out better way of doing it. I've added the 'schema' argument explicitly as optional kwarg to DBApiHook and made a comment about the "change schema value" in the description of the parameter.
I think it's much better - it makes schema an explicit part of the DBHookAPI hook's API, it is fully backwards compatible and it makes it very easy for Airflow 3 change - we will simply make the self.schema public and convert all the operators that use it in the "old" way. See the latest fixup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this is - that it has already happened that we overlooked that the
DBApiHook.schema
object has been accessed directly by a provider, which made it backwards incompatible with Airflow 2.1. We do not have any mechanism to prevents this is in the future again, if we have it as a public field in DBApiHook. This is a bit of a problem that DBAPiHook is "public API" part and by making a public field in this class, we change the API.The way I proposed - in the last fixup,
schema
becomes part of the DBApiHook 'initialization' API. If any other hook stores the schema field as self.schema - so be it, it is "its own responsibiliity" - we make it clear now in the constructor of the DBApiHook that passing "schema" as kwargs is THE way how to override the schema, and by not having a public field, we clearly say that the deriving hook cannot expect that it can change it and expect "DBApiHook" getUri() method will use it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any other comments? I think We will not release an out-of-band Postgres operator, so this is something we will have to solve mid-August, but would be good to get some opinions :)