Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Snowflake DQ Operators #17741

Merged
merged 12 commits into from
Sep 9, 2021
Merged

Conversation

denimalpaca
Copy link
Contributor

Add three new Snowflake operators based on SQL Checks

The SnowflakeCheckOperator, SnowflakeValueCheckOperator, and
SnowflakeIntervalCheckOperators are added as subclasses of their respective
SQL Operators. These additions follow the conventions set in the BigQueryOperators
subclassing from the same SQL_CheckOperators.

closes: #17694

The SnowflakeCheckOperator, SnowflakeValueCheckOperator, and
SnowflakeIntervalCheckOperators are added as subclasses of their respective
SQL Operators. These additions follow the conventions set in the BigQueryOperators
subclassing from the same SQL_CheckOperators. This decision was made despite the
existing SnowflakeOperator, as subclassing both SnowflakeOperator and a SQL_CheckOperator
caused a couple issues:
  1. Both operators have an execute method already, so some precaution would be
	 necessary to choose the correct one.
  2. An issue with the `sql` parameter occured, where if it was given as a kwarg (sql=sql),
	 then the BaseOperator init() would through an exception for missing args, as if
	 it weren't being added to **kwargs correctly.
Added three new tests, one for each new Snowflake Operator, in the same
style as the BigQuery check operators. These tests ensure the hook is properly
called. Not sure how helpful that really is, but it's coverage, and I have used
these new operators (well, two of them) in an actual DAG with a live connection,
and they're working.

The snowflake.py file was touched to fix the method name I changed when
trying other solutions, it was restored
to make the SnowflakeOperator pass its test again.
@boring-cyborg boring-cyborg bot added area:providers provider:snowflake Issues related to Snowflake provider labels Aug 19, 2021
@denimalpaca denimalpaca changed the title Snowflake DQ Operators Add Snowflake DQ Operators Aug 19, 2021
@uranusjr
Copy link
Member

I don’t understand Snowflake a bit to comment on whether this actually works, but the implementation seems straightforward (using things already present in Airflow) and sound.

@denimalpaca
Copy link
Contributor Author

I don’t understand Snowflake a bit to comment on whether this actually works, but the implementation seems straightforward (using things already present in Airflow) and sound.

If it makes you feel better, I did add this DAG that I'm working on concurrently to the breeze environment and saw it working in a real Snowflake environment. So two of the three operators did run successfully.

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Aug 20, 2021
@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk potiuk closed this Aug 20, 2021
@potiuk potiuk reopened this Aug 20, 2021
@potiuk
Copy link
Member

potiuk commented Aug 20, 2021

Reopened to rebuid!

@potiuk
Copy link
Member

potiuk commented Aug 20, 2021

I think you need to rebase to latest main @denimalpaca

@potiuk
Copy link
Member

potiuk commented Aug 22, 2021

Can you please rebase to the latest main ? I think there were some fixes to address genera CI job failures.

This parameter was mistakenly added.
@@ -125,3 +126,285 @@ def execute(self, context: Any) -> None:

if self.do_xcom_push:
return execution_info


class _SnowflakeDbHookMixin:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a class and not just a method inside the SnowflakeCheckOperator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to be more DRY. Originally I tried making the Check classes inherit from just the SnowflakeOperator, but it resulted in an error in the constructors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DRY philosophy is great. However, my point was that you don't need a class at all and you can get rid of the class extra layer of abstraction. My comment is inspired by https://www.youtube.com/watch?v=o9pEzgHorH0

Using the same class too in the SnowflakeOperator has been quite an improvement, but I think you can still have just a function in the file, and you can just call it from the operators. Anyway, it's not a big deal and the PR is already approved, so up to u.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wached the presentation :). Entertaining and I mostly agree with it (not everything :)

But yeah - in this case using mixin is definitely over-the-top.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind changing that @denimalpaca ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will have this fixed Tuesday, looks like with the way BaseSQLOperator works, even the function without a class might not be necessary.

airflow/providers/snowflake/operators/snowflake.py Outdated Show resolved Hide resolved
airflow/providers/snowflake/operators/snowflake.py Outdated Show resolved Hide resolved
@uranusjr
Copy link
Member

This needs to be rebased to latest main.

In a previous commit, the get_hook() method was removed from
SnowflakeOperator, and the mixin class containing this functionality
was used instead. This commit finishes updating the necessary name changes
in the original class and test.
The SnowflakeOperatorMixin class was deemed an unncessary abstraction,
so the get_db_hook() method is used as a standalone function called by
the Operators directly. Tests are updated to reflect this change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers okay to merge It's ok to merge this PR as it does not require more tests provider:snowflake Issues related to Snowflake provider
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add SQL Check Operators to Snowflake Provider
6 participants