-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add hook_params in BaseSqlOperator #18718
Add hook_params in BaseSqlOperator #18718
Conversation
airflow/models/connection.py
Outdated
@@ -289,8 +289,11 @@ def rotate_fernet_key(self): | |||
if self._extra and self.is_extra_encrypted: | |||
self._extra = fernet.rotate(self._extra.encode('utf-8')).decode() | |||
|
|||
def get_hook(self): | |||
"""Return hook based on conn_type.""" | |||
def get_hook(self, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a backwards-compatibility (or rather future-compatibility) problem with that approach. While this change on it's own is backwards, compatible and old providers will work, if you implement a change to a provider that adds kwargs to get_hook
, they will not be usable with already released versions of Airflow.
We have A LOT of operators derived from SQLOperator or using get_hook()
directly. If we just change a signature of get_hook() it will be very easy for anyone to start using it and breaking compatibility with released Airflow version without anyone noticing.
I am not saying we should not do it, it's a good change but I think this change should be much more thought out and we need to figure out when and how to do it to avoid disruption.
I dop not yet know how to do it best - my current thinking is that we should rather introduce a new method (get_hook with params()
rather than modify existing ones, and add a pre-commit check that will check if a provider is using that method, and signals that the provider should have the "additional_dependency" added for Airflow >= X.Y.Z. We would also have to check the usages of SQL Operators the same way and flag problems with dependencies if we see that any of the BaseSQL-derived operators passes hook_params
. The second part will be much more "brittle" I am afraid, and difficult to detect , so maybe we should figure a bit different way of passing parameters here - one that will be easy to detect and flag by pre-commit.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just have hook_kwargs
as a keyword arg as the following:
def get_hook(self, **kwargs): | |
def get_hook(self, *, hook_kwargs=None): |
and then L311 can become:
- return hook_class(**{conn_id_param: self.conn_id}, **kwargs)
+ return hook_class(**{conn_id_param: self.conn_id}, **hook_kwargs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@potiuk the above note by Kaxil has been implemented and tested. Very curious to hear your opinion on that solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to figure out how to make sure dependencies are properly updated for providers that use the new feature
Can you take a look at the failing tests please @denimalpaca - https://github.com/apache/airflow/runs/4038752898?check_suite_focus=true#step:6:6341 |
@@ -131,281 +126,3 @@ def execute(self, context: Any) -> None: | |||
|
|||
if self.do_xcom_push: | |||
return execution_info | |||
|
|||
|
|||
class SnowflakeCheckOperator(SQLCheckOperator): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we remove this operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR was opened as an alternative to adding more SnowflakeCheckOperators
, so it seemed to me that their removal here made sense given the nature of this change making all these *Check*
operators redundant. Although I see how it could warrant a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes a separate PR and we don't want the current users of this Operator to have to change their DAGs so let's not remove them please
ping @potiuk @jedcunningham for reviews |
pinging @potiuk and @jedcunningham again for review |
This commit modifies the SQLBaseOperator to allow for a new Dict, hook_params, passed in as a parameter. The hook_params makes the SQLBaseOperator, and therefore all SQL classes that inherit from it, more extensible to other operators, e.g.: allowing SQLCheck Operators to be used in a Snowflake environment that requires more parameters than the SQL Operator may expect.
Using hook_params as a dictionary to pass kwargs resulted in testing errors, specifically that the hook_param values were not recognized as valid arguments to conn.get_hook(). Using **kwargs resolved this.
Use a more safe way to construct hook_params as an empty dict in the case that no params are passed in.
As a response to Jarek's comment on future-proofing the use of **kwargs, Kaxil noted a change to using a hook-specific hook_kwargs param would do exactly that future-proofing. This commit makes that change.
A previous change added parentheses to the _hook call in get_db_hook, erroneously calling _hook not as a cached property, but as a method. This caused other operators using the BaseSQLOperator to try calling their own hook, resulting in an error as the hook object was not callable.
This PR was opened as a better way to add SQL Check Operator functionality the Snowflake Operator; with this functionality implemented in the SQL Operator, the additional and redundant Snowflake Check Operators are no longer needed and thus removed.
Due to the change in the SQLOperator, corresponding SnowflakeCheckOperators are no longer needed and have been removed. This commit reflects that update in the test file.
This reverts commit ff34d3ff1a6d27296b1320c8ac8a91413144686d.
This reverts commit f323b6818732703ccb63b656b7720e5987d4b747.
I just rebased on main branch for tests to pass ! |
@potiuk @ephraimbuddy Gentle Ping -- to verify this as you'll have requested changes earlier |
The requested changes have been addressed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
Modifies
sql.py
andconnection.py
to takehook_params
in the form of**kwargs
. This allows other provider backends, like BigQuery and Snowflake, to use the SQL Operators without subclassing SQL Operators, reducing code duplication and boilerplate for what amounts to just creating a customget_db_hook()
function.This PR can be used in place of (and closes) 18413.