Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Common-SQL Operators Various Bugs #25163

Closed
2 tasks done
denimalpaca opened this issue Jul 19, 2022 · 0 comments · Fixed by #25164
Closed
2 tasks done

Common-SQL Operators Various Bugs #25163

denimalpaca opened this issue Jul 19, 2022 · 0 comments · Fixed by #25164
Labels
area:providers kind:bug This is a clearly a bug

Comments

@denimalpaca
Copy link
Contributor

Apache Airflow Provider(s)

common-sql

Versions of Apache Airflow Providers

apache-airflow-providers-common-sql==1.0.0

Apache Airflow version

2.3.3 (latest released)

Operating System

macOS Monterey 12.3.1

Deployment

Astronomer

Deployment details

No response

What happened

  • SQLTableCheckOperator builds multiple checks in such a way that if two or more checks are given, and one is not a fully aggregated statement, then the SQL fails as it is missing a GROUP BY clause.
  • SQLColumnCheckOperator provides only the last SQL query built from the columns, so when a check fails, it will only give the correct SQL in the exception statement by coincidence.

What you think should happen instead

  • Multiple checks should not need a GROUP BY clause
  • Either the correct SQL statement, or no SQL statement, should be returned in the exception message.

How to reproduce

For the SQLTableCheckOperator, using the operator like so:

table_cheforestfire_costs_table_checkscks = SQLTableCheckOperator(
            task_id="forestfire_costs_table_checks",
            table=SNOWFLAKE_FORESTFIRE_COST_TABLE,
            checks={
                "row_count_check": {"check_statement": "COUNT(*) = 9"},
                "total_cost_check": {"check_statement": "land_damage_cost + property_damage_cost + lost_profits_cost = total_cost"}
            }
        )

For the SQLColumnCheckOperator, using the operator like so:

cost_column_checks = SQLColumnCheckOperator(
    task_id="cost_column_checks",
    table=SNOWFLAKE_COST_TABLE,
    column_mapping={
        "ID": {"null_check": {"equal_to": 0}},
        "LAND_DAMAGE_COST": {"min": {"geq_to": 0}},
        "PROPERTY_DAMAGE_COST": {"min": {"geq_to": 0}},
        "LOST_PROFITS_COST": {"min": {"geq_to": 0}},
    }
)

and ensuring that any of the ID, LAND_DAMAGE_COST, or PROPERTY_DAMAGE_COST checks fail.

An example DAG with the correct environment and data can be found here.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant