Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding generic SqlToSlackOperator #24663

Merged
merged 29 commits into from
Jun 29, 2022
Merged

Conversation

alexkruc
Copy link
Contributor

This PR is adding a generic SqlToSlackOperator, meaning everything that uses DbApiHook and implements get_pandas_df will now be able to send the results to Slack.
As part of this PR, I've also deprecated the current PrestoToSlackOperator and SnowflakeToSlackOperator - they now inherit from SqlToSlackOperator.

closes: #24243
reference: existing PrestoToSlackOperator and SnowflakeToSlackOperator, and the generic SqlToS3Operator.

@potiuk potiuk force-pushed the add_sql_to_slack_operator branch from ad4159c to 4f67dc1 Compare June 26, 2022 10:17
@potiuk
Copy link
Member

potiuk commented Jun 26, 2022

Rebased to fix selective check problem from #24665 after it's merged.

@eladkal
Copy link
Contributor

eladkal commented Jun 26, 2022

@alexkruc please check test failures

@Taragolis
Copy link
Contributor

Just an idea: #24660 (comment) make sending to slack more generic

Might be some side effect and multiple inheritance required in some cases or not.

@Taragolis
Copy link
Contributor

@eladkal let's keep track here

The generic operator will handle this - if implemented right there will be no need for individual classes in each provider, again if you have comments on this lets please take them to #24663

I would suggest more generic way, SqlToSlackOperator this generic only for DBApi.

If we have something that I mentioned before: base class in slack-provider BasePandasDataFrameToSlackOperator without implementation get_pandas_df

It could help to build not only SqlToSlackOperator but also some user-classes like AwsDataWranglerDynamoDBToSlack, LocalDataFrameToSlack, etc.

@Taragolis
Copy link
Contributor

Additional benefits you need to test implementation how actually works part with sending DataFrame in slack-provider (consumer), and how to produce this DataFrame in actual implementation, like in SqlToSlackOperator.

If any bugs appear in slack part, we need need to fix it slack-provider, if on producer part, only need to fix on producer.
Same work with new features.

Again, just an idea and I only talk about benefits without negative part of this idea.

@eladkal
Copy link
Contributor

eladkal commented Jun 26, 2022

yes we can, but this is a further enhancement. If alex wants to handle this as well he is more than welcome :)
personally I'm OK with leaving it to a followup PR. It's OK to make a change in a few smaller PR.. it makes the review easier

@Taragolis
Copy link
Contributor

yes we can, but this is a further enhancement. If alex wants to handle this as well he is more than welcome :)
personally I'm OK with leaving it to a followup PR. It's OK to make a change in a few smaller PR.. it makes the review easier

Totally agree that it should be further enhancement, otherwise it easily took ages to implements SqlToSlackOperator, and deprecate in Snowflake and Presto.

I will look on this PR more close on Monday. Probably I would suggest something that I've already found during SnowflakeToSlack enhancement.

Comment on lines 50 to 51
'webhook_token' attribute needs to be specified in the 'Extra' JSON field against the slack_conn_id.py
Mutually exclusive with 'slack_conn_id'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

webhook_token in Extra marked as Deprecated

extra = conn.extra_dejson
web_token = extra.get('webhook_token', '')
if web_token:
warnings.warn(
"'webhook_token' in 'extra' is deprecated. Please use 'password' field",
DeprecationWarning,
stacklevel=2,
)

def _get_hook(self) -> DbApiHook:
self.log.debug("Get connection for %s", self.sql_conn_id)
conn = BaseHook.get_connection(self.sql_conn_id)
hook = conn.get_hook(hook_params=self.sql_hook_params)
Copy link
Contributor

@eladkal eladkal Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@potiuk need your advise on this one.

The hook_params was added in #18718 / #19849 which were released on 2.3.0 The Slack provider min version is 2.2.0 so I think this code line should have cause failures?

2.2.0:
https://github.com/apache/airflow/blob/2.2.0/airflow/models/connection.py#L292

2.3.0:
https://github.com/apache/airflow/blob/2.3.0/airflow/models/connection.py#L321

I wonder how it make sense that the CI is green?

@alexkruc in any case we don't need the hook_params here.. the goal is just to check if the hook has the needed function, the user params are not required for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexkruc I think we have two options here.

  1. backport the get_hook function into the operator code (temporarily, we will remove it once the provider will be Airflow>=2.3 compatible)
  2. Leave the PrestoToSlackOperator SnowflakeToSlackOperator code as is with just a deprecation warning, in that solution we should also check Airflow version when using the operator as it works only with 2.3 (you can check from airflow.version import version )

I think option 1 is prefered as otherwise it means only users of Airflow>=2.3 will be able to use this new operator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. 1) is preferred.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A fix to pre-commit to make sure we don't have this case is in #24706
Thanks @potiuk

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@potiuk is there a way to make the new pre commit ignore a specific row?
The pre-commit fails the PR but @alexkruc checks the user Airflow version and have different code to handle base on version.
We can use only the backported version if there is no other option

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reader as in myself? ;)
I'll take a look, will love to get familiar a bit with the pre-commit flow :)
I'll update here if I'll get stuck or something.

I just thought that checking for versions and matching will be easier to read for future operator maintainers, and also it's easier to deprecate later (just remove the if block), I think that using the generic methods (as in conn.get_hook(hook_params=....)) as much as possible is the better way here :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@potiuk @eladkal added the ignore lines for compatibility check fix, you can find it in this commit:
f3a29fb
It should run as part of this PR and we can check that it works e2e :) (although it worked locally)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! That's what I meant. One small NIT: since you made a "generic" exclusion - let's make it # ignore airflow compat check, We have some 2.2, some 2.3 and some 2.4 checks already foreseen to be there for quite some time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW. Pre-comits are SUPER easy :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! thanks for the comment :) I changed it here - efd07a5

BTW. Pre-comits are SUPER easy :D

I agree! and besides being easy, they are also kinda fun 😄 The thing is, that as a user, and as a young contributor, I'm not often exposed to this, so it's nice learning something new about the Airflow eco-system ;)

airflow/providers/slack/transfers/sql_to_slack.py Outdated Show resolved Hide resolved
airflow/providers/slack/transfers/sql_to_slack.py Outdated Show resolved Hide resolved
# For supporting Airflow versions < 2.3, we backport "get_hook()" method. This should be removed
# when "apache-airflow-providers-slack" will depend on Airflow >= 2.3. Git reference:
# https://github.com/apache/airflow/blob/main/airflow/providers/slack/provider.yaml#L38
hook = _backported_get_hook(conn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we pass the sql_hook_params as well? this is the whole reason for backporing the function isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your'e right :) fixed it :)

@potiuk
Copy link
Member

potiuk commented Jun 28, 2022

Needs rebase :(

@alexkruc alexkruc requested a review from ashb as a code owner June 29, 2022 08:39
Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create SqlToSlackOperator
4 participants