docs(): Announcing DataHub Open Assertions Specification #10609
…ler' into jj--add-docs-for-assertions-compiler
```bash
datahub assertions compile -f examples/library/assertions_configuration.yml -p snowflake
```
```bash
datahub assertions compile -f examples/library/assertions_configuration.yml -p snowflake
```
```bash
datahub assertions compile -f examples/library/assertions_configuration.yml -p snowflake -x DMF_SCHEMA=<db>.<schema>
```
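The suggested command takes the DMF schema as `-x DMF_SCHEMA=<db>.<schema>`. As a rough sketch only, a wrapper could check that the value has that two-part shape before printing the command for review. The function names and the example value `analytics.dmf_checks` are hypothetical, and the check assumes plain unquoted identifiers; only the flags already shown above are used.

```bash
#!/usr/bin/env bash
# Sketch: check that a DMF_SCHEMA value looks like <db>.<schema> before
# handing it to the compile command quoted above.

# Assumption: both parts are plain unquoted identifiers (letters, digits, _).
is_valid_dmf_schema() {
  local re='^[A-Za-z_][A-Za-z0-9_]*\.[A-Za-z_][A-Za-z0-9_]*$'
  [[ "$1" =~ $re ]]
}

# Print (rather than run) the compile command so it can be reviewed first.
build_compile_cmd() {
  local config_file="$1" dmf_schema="$2"
  if ! is_valid_dmf_schema "$dmf_schema"; then
    echo "DMF_SCHEMA must look like <db>.<schema>, got: $dmf_schema" >&2
    return 1
  fi
  printf 'datahub assertions compile -f %s -p snowflake -x DMF_SCHEMA=%s\n' \
    "$config_file" "$dmf_schema"
}

# "analytics.dmf_checks" is a made-up example value.
build_compile_cmd examples/library/assertions_configuration.yml analytics.dmf_checks
```

Printing instead of executing keeps the sketch side-effect free; piping the printed line to a shell is left to the reader.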
- You must have a Snowflake Enterprise account where the DMFs feature is enabled.
- You must have the necessary permissions to provision DMFs in your Snowflake environment (see below).
- You must have the necessary permissions to query the DMF results in your Snowflake environment (see below).
Can we keep three separate sections for the permissions required in the Snowflake environment, as outlined below?
Group 1. Permissions required for creating and registering DMFs (running dmf_definitions.sql and dmf_associations.sql)

| Privilege | Object | Notes |
|---|---|---|
| USAGE | Database, schema | Database and schema where the Snowflake DMFs will be created. This is configured in the compile command described below. |
| CREATE FUNCTION | Schema | This privilege enables creating a new DMF in the schema configured in the compile command. |
| EXECUTE DATA METRIC FUNCTION | Account | This privilege enables you to control which roles have access to serverless compute resources to call the system DMF. |
| USAGE | Database, schema | These objects are the database and schema that contain the referenced table in the query. |
| OWNERSHIP | Table | This privilege enables you to associate a DMF with a referenced table. |
| USAGE | DMF | This privilege enables calling the DMF in the schema configured in the compile command. |

| Database Role | Notes |
|---|---|
| SNOWFLAKE.DATA_METRIC_USER | To use system DMFs |
Group 2. Permissions required to view DMF results (Snowflake ingestion)

| Application Role | Notes |
|---|---|
| SNOWFLAKE.DATA_QUALITY_MONITORING_VIEWER | Query the DMF results table |
Group 3. Permissions required by the owner of the table (scheduled DMFs run with the table owner's role)

| Privilege | Object | Notes |
|---|---|---|
| USAGE | Database, schema | Database and schema where the Snowflake DMFs will be created. This is configured in the compile command described below. |
| USAGE | DMF | This privilege enables calling the DMF in the schema configured in the compile command. |
| EXECUTE DATA METRIC FUNCTION | Account | This privilege enables you to control which roles have access to serverless compute resources to call the system DMF. |

| Database Role | Notes |
|---|---|
| SNOWFLAKE.DATA_METRIC_USER | To use system DMFs |
A Snowflake system admin can follow this guide to create a new DataHub-specific role for assertions.
-- set up permissions for <table-owner-role> to run DMFs on a schedule
grant usage on database "<dmf-database>" to role "<table-owner-role>";
grant usage on schema "<dmf-database>"."<dmf-schema>" to role "<table-owner-role>";
grant usage on all functions in schema "<dmf-database>"."<dmf-schema>" to role "<table-owner-role>";
grant usage on future functions in schema "<dmf-database>"."<dmf-schema>" to role "<table-owner-role>";
grant database role SNOWFLAKE.DATA_METRIC_USER to role "<table-owner-role>";
grant execute data metric function on account to role "<table-owner-role>";
-- set up permissions for <assertion-service-role> to create DMFs and associate them with tables
grant usage on database "<dmf-database>" to role "<assertion-service-role>";
grant usage on schema "<dmf-database>"."<dmf-schema>" to role "<assertion-service-role>";
grant create function on schema "<dmf-database>"."<dmf-schema>" to role "<assertion-service-role>";
-- grant ownership + rest of permissions to <assertion-service-role>
grant role "<table-owner-role>" to role "<assertion-service-role>";
-- allow <datahub_role> to query the DMF results
grant application role SNOWFLAKE.DATA_QUALITY_MONITORING_VIEWER to role "<datahub_role>";
where "<datahub_role>" is the role used for ingestion and "" is the role used to provision assertions on Snowflake using the SQL artifacts generated in the compile step below.
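To make the placeholders concrete, here is a minimal sketch that prints the table-owner grants from the guide above for given names. The function name and the sample values (`ANALYTICS`, `DMF_CHECKS`, `TABLE_OWNER`) are hypothetical. The statements are written with `in schema` and separately quoted database and schema names, which is my understanding of Snowflake's GRANT syntax; adjust if your account expects otherwise.

```bash
#!/usr/bin/env bash
# Sketch: print the table-owner grants for a concrete database, schema, and role.
# Nothing is executed against Snowflake; review the output before running it.
render_table_owner_grants() {
  local db="$1" schema="$2" role="$3"
  cat <<EOF
grant usage on database "${db}" to role "${role}";
grant usage on schema "${db}"."${schema}" to role "${role}";
grant usage on all functions in schema "${db}"."${schema}" to role "${role}";
grant usage on future functions in schema "${db}"."${schema}" to role "${role}";
grant database role SNOWFLAKE.DATA_METRIC_USER to role "${role}";
grant execute data metric function on account to role "${role}";
EOF
}

# Made-up example values.
render_table_owner_grants ANALYTICS DMF_CHECKS TABLE_OWNER
```

The output can be saved to a file and handed to whoever administers the Snowflake account.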
thanks! added this~
either via CLI or the UI visible as normal assertions.

`datahub ingest -c snowflake.yml`
Can we add a caveats section covering the following?
- Snowflake currently supports at most 1000 DMF-table associations, so you cannot define more than 1000 assertions for Snowflake.
- Snowflake does not allow JOIN queries or non-deterministic functions in a DMF definition, so you cannot use these in the SQL of a SQL assertion or in the filters section.
- All DMFs scheduled on a table must follow exactly the same schedule, so you cannot set assertions on the same table to run on different schedules.
- DMFs are supported only for regular tables, not dynamic or external tables. The same limitation applies to assertions.
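Two of these limits (the 1000-association cap and the single schedule per table) could be checked before compiling. The sketch below assumes the limits exactly as stated in this comment, which may change on Snowflake's side, and a hypothetical flattened input of one `<table><TAB><schedule>` line per assertion; the datahub CLI does not produce this format.

```bash
#!/usr/bin/env bash
# Sketch: pre-flight check for two of the limits listed above.
# Input (stdin): one assertion per line as "<table><TAB><schedule>".
# This flattened format is hypothetical, not something the datahub CLI emits.
check_dmf_caveats() {
  awk -F'\t' -v max=1000 '
    BEGIN { bad = 0 }
    {
      count++
      if (($1 in sched) && sched[$1] != $2) {
        # Same table seen earlier with a different schedule.
        printf "mixed schedules on %s\n", $1
        bad = 1
      } else {
        sched[$1] = $2
      }
    }
    END {
      if (count > max) {
        printf "too many assertions: %d > %d\n", count, max
        bad = 1
      }
      exit bad
    }
  '
}

# Two assertions on the same table with the same schedule pass the check.
printf 'db.sch.orders\t0 * * * *\ndb.sch.orders\t0 * * * *\n' | check_dmf_caveats && echo "ok"
```

The function exits non-zero when either limit is violated, so it can gate a compile step in a script.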
I'm not sure that makes sense. If we redirect to the Snowflake DMF documentation, that's sufficient; otherwise we'll have to constantly update this file as things change on the Snowflake side.
I'll look for opportunities to incorporate.
Super excited for this!
…ject#10609) Co-authored-by: John Joyce <[email protected]>
Summary
View rendered announcement.
This PR adds a doc announcing a new initiative to build the DataHub Open Source Assertions Specification: a universal specification for declaring data quality checks and compiling them into artifacts that can be registered with, or directly executed by, third-party data quality tools such as Great Expectations, dbt tests, and Snowflake (via Data Quality DMFs).
The sister PR for this one is located here and declares the foundational data models for each type of assertion we aim to support, along with a reference implementation of various assertion types built on top of Snowflake DMFs.
Please reach out if this project interests you and you'd like to contribute other DQ sinks such as GE, dbt tests, Soda, etc.!
Status
Work in Progress. Working to provide updated examples of the Assertion definition specification.