-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Snowflake DQ Operators #17741
Add Snowflake DQ Operators #17741
Conversation
The SnowflakeCheckOperator, SnowflakeValueCheckOperator, and SnowflakeIntervalCheckOperators are added as subclasses of their respective SQL Operators. These additions follow the conventions set in the BigQueryOperators subclassing from the same SQL_CheckOperators. This decision was made despite the existing SnowflakeOperator, as subclassing both SnowflakeOperator and a SQL_CheckOperator caused a couple issues: 1. Both operators have an execute method already, so some precaution would be necessary to choose the correct one. 2. An issue with the `sql` parameter occured, where if it was given as a kwarg (sql=sql), then the BaseOperator init() would through an exception for missing args, as if it weren't being added to **kwargs correctly.
Added three new tests, one for each new Snowflake Operator, in the same style as the BigQuery check operators. These tests ensure the hook is properly called. Not sure how helpful that really is, but it's coverage, and I have used these new operators (well, two of them) in an actual DAG with a live connection, and they're working. The snowflake.py file was touched to fix the method name I changed when trying other solutions, it was restored to make the SnowflakeOperator pass its test again.
I don’t understand Snowflake a bit to comment on whether this actually works, but the implementation seems straightforward (using things already present in Airflow) and sound. |
If it makes you feel better, I did add this DAG that I'm working on concurrently to the breeze environment and saw it working in a real Snowflake environment. So two of the three operators did run successfully. |
The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease. |
Reopened to rebuid! |
I think you need to rebase to latest main @denimalpaca |
Can you please rebase to the latest |
This parameter was mistakenly added.
@@ -125,3 +126,285 @@ def execute(self, context: Any) -> None: | |||
|
|||
if self.do_xcom_push: | |||
return execution_info | |||
|
|||
|
|||
class _SnowflakeDbHookMixin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a class and not just a method inside the SnowflakeCheckOperator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to be more DRY. Originally I tried making the Check
classes inherit from just the SnowflakeOperator
, but it resulted in an error in the constructors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DRY philosophy is great. However, my point was that you don't need a class at all and you can get rid of the class extra layer of abstraction. My comment is inspired by https://www.youtube.com/watch?v=o9pEzgHorH0
Using the same class too in the SnowflakeOperator has been quite an improvement, but I think you can still have just a function in the file, and you can just call it from the operators. Anyway, it's not a big deal and the PR is already approved, so up to u.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wached the presentation :). Entertaining and I mostly agree with it (not everything :)
But yeah - in this case using mixin is definitely over-the-top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind changing that @denimalpaca ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will have this fixed Tuesday, looks like with the way BaseSQLOperator works, even the function without a class might not be necessary.
This needs to be rebased to latest main. |
In a previous commit, the get_hook() method was removed from SnowflakeOperator, and the mixin class containing this functionality was used instead. This commit finishes updating the necessary name changes in the original class and test.
The SnowflakeOperatorMixin class was deemed an unncessary abstraction, so the get_db_hook() method is used as a standalone function called by the Operators directly. Tests are updated to reflect this change.
Add three new Snowflake operators based on SQL Checks
The SnowflakeCheckOperator, SnowflakeValueCheckOperator, and
SnowflakeIntervalCheckOperators are added as subclasses of their respective
SQL Operators. These additions follow the conventions set in the BigQueryOperators
subclassing from the same SQL_CheckOperators.
closes: #17694