Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hook_params in BaseSqlOperator #18718

Merged
merged 12 commits into from
Nov 15, 2021

Conversation

denimalpaca
Copy link
Contributor

Modifies sql.py and connection.py to take hook_params in the form of **kwargs. This allows other provider backends, like BigQuery and Snowflake, to use the SQL Operators without subclassing SQL Operators, reducing code duplication and boilerplate for what amounts to just creating a custom get_db_hook() function.

This PR can be used in place of (and closes) 18413.

@boring-cyborg boring-cyborg bot added the area:core-operators Operators, Sensors and hooks within Core Airflow label Oct 4, 2021
airflow/operators/sql.py Outdated Show resolved Hide resolved
@@ -289,8 +289,11 @@ def rotate_fernet_key(self):
if self._extra and self.is_extra_encrypted:
self._extra = fernet.rotate(self._extra.encode('utf-8')).decode()

def get_hook(self):
"""Return hook based on conn_type."""
def get_hook(self, **kwargs):
Copy link
Member

@potiuk potiuk Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a backwards-compatibility (or rather future-compatibility) problem with that approach. While this change on it's own is backwards, compatible and old providers will work, if you implement a change to a provider that adds kwargs to get_hook, they will not be usable with already released versions of Airflow.

We have A LOT of operators derived from SQLOperator or using get_hook() directly. If we just change a signature of get_hook() it will be very easy for anyone to start using it and breaking compatibility with released Airflow version without anyone noticing.

I am not saying we should not do it, it's a good change but I think this change should be much more thought out and we need to figure out when and how to do it to avoid disruption.

I dop not yet know how to do it best - my current thinking is that we should rather introduce a new method (get_hook with params() rather than modify existing ones, and add a pre-commit check that will check if a provider is using that method, and signals that the provider should have the "additional_dependency" added for Airflow >= X.Y.Z. We would also have to check the usages of SQL Operators the same way and flag problems with dependencies if we see that any of the BaseSQL-derived operators passes hook_params. The second part will be much more "brittle" I am afraid, and difficult to detect , so maybe we should figure a bit different way of passing parameters here - one that will be easy to detect and flag by pre-commit.

Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just have hook_kwargs as a keyword arg as the following:

Suggested change
def get_hook(self, **kwargs):
def get_hook(self, *, hook_kwargs=None):

and then L311 can become:

- return hook_class(**{conn_id_param: self.conn_id}, **kwargs)
+ return hook_class(**{conn_id_param: self.conn_id}, **hook_kwargs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@potiuk the above note by Kaxil has been implemented and tested. Very curious to hear your opinion on that solution.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to figure out how to make sure dependencies are properly updated for providers that use the new feature

@kaxil
Copy link
Member

kaxil commented Nov 1, 2021

Can you take a look at the failing tests please @denimalpaca - https://github.com/apache/airflow/runs/4038752898?check_suite_focus=true#step:6:6341

@@ -131,281 +126,3 @@ def execute(self, context: Any) -> None:

if self.do_xcom_push:
return execution_info


class SnowflakeCheckOperator(SQLCheckOperator):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we remove this operator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was opened as an alternative to adding more SnowflakeCheckOperators, so it seemed to me that their removal here made sense given the nature of this change making all these *Check* operators redundant. Although I see how it could warrant a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes a separate PR and we don't want the current users of this Operator to have to change their DAGs so let's not remove them please

@kaxil kaxil requested a review from jedcunningham November 5, 2021 22:16
@kaxil
Copy link
Member

kaxil commented Nov 5, 2021

ping @potiuk @jedcunningham for reviews

@denimalpaca
Copy link
Contributor Author

pinging @potiuk and @jedcunningham again for review

This commit modifies the SQLBaseOperator to allow for a new Dict,
hook_params, passed in as a parameter. The hook_params makes the
SQLBaseOperator, and therefore all SQL classes that inherit from it,
more extensible to other operators, e.g.: allowing SQLCheck Operators
to be used in a Snowflake environment that requires more parameters than
the SQL Operator may expect.
Using hook_params as a dictionary to pass kwargs resulted in testing
errors, specifically that the hook_param values were not recognized
as valid arguments to conn.get_hook(). Using **kwargs resolved this.
Use a more safe way to construct hook_params as an empty dict in
the case that no params are passed in.
As a response to Jarek's comment on future-proofing the use of **kwargs,
Kaxil noted a change to using a hook-specific hook_kwargs param would
do exactly that future-proofing. This commit makes that change.
A previous change added parentheses to the _hook call in get_db_hook,
erroneously calling _hook not as a cached property, but as a method.
This caused other operators using the BaseSQLOperator to try calling
their own hook, resulting in an error as the hook object was not
callable.
This PR was opened as a better way to add SQL Check Operator functionality
the Snowflake Operator; with this functionality implemented in the SQL
Operator, the additional and redundant Snowflake Check Operators are
no longer needed and thus removed.
Due to the change in the SQLOperator, corresponding SnowflakeCheckOperators
are no longer needed and have been removed. This commit reflects
that update in the test file.
This reverts commit ff34d3ff1a6d27296b1320c8ac8a91413144686d.
This reverts commit f323b6818732703ccb63b656b7720e5987d4b747.
@kaxil
Copy link
Member

kaxil commented Nov 11, 2021

I just rebased on main branch for tests to pass !

@kaxil
Copy link
Member

kaxil commented Nov 12, 2021

@potiuk @ephraimbuddy Gentle Ping -- to verify this as you'll have requested changes earlier

@ephraimbuddy ephraimbuddy dismissed their stale review November 14, 2021 18:51

The requested changes have been addressed

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@potiuk potiuk closed this Nov 14, 2021
@potiuk potiuk reopened this Nov 14, 2021
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 14, 2021
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core-operators Operators, Sensors and hooks within Core Airflow full tests needed We need to run full set of tests for this PR to merge type:improvement Changelog: Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants