Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1358] [Feature] Add column type constraints as dbt native configs #6079

Closed
3 tasks done
sungchun12 opened this issue Oct 17, 2022 · 6 comments · Fixed by #6271
Closed
3 tasks done

[CT-1358] [Feature] Add column type constraints as dbt native configs #6079

sungchun12 opened this issue Oct 17, 2022 · 6 comments · Fixed by #6271
Assignees
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors Team:Adapters Issues designated for the adapter area of the code

Comments

@sungchun12
Copy link
Contributor

sungchun12 commented Oct 17, 2022

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Problem:

  • Data types are created after the fact(assumed by the database) and/or a dbt user has to explicitly cast data types in their SQL. Because of this, there isn't much incentive to assure dbt models have expected data types for consumers to depend on(it's tedious work :/). This may not be a problem for 50 dbt models, but becomes a big problem when you have 2000+ dbt models. There should be an easier way to configure data type constraints for any dbt models and provide options to enable/disable data type enforcement(think: what mypy does for the python static typing experience).

Solution:

Describe alternatives you've considered

Custom Materialization Macro in BigQuery: https://www.loom.com/share/1f1f190e66254d12984962c613e8082d

This is a general design pattern, but uses meta configs to enable this functionality. This is brittle because it injects custom configs into the manifest that aren't guaranteed to behave in expected ways.

Who will this benefit?

dbt users with large projects that need extra robustness in their developer experience and downstream users get the data they expect: data types and all.

Are you interested in contributing this feature?

yes, and I'll be working with someone in the community @jonathanneo

Anything else?

Research:

Considerations:

  • have the configs embedded in dbt-core and then each adapter can take stable configs to adjust table materialization macros
  • Enforce column positions based on schema config
  • dbt-core contains the hub of native configs while adapters own specific implementations as each database has nuances to which enforceable constraints are valid.
  • think about not null and default values for a table
  • Work with Jon Neo from Canva
  • Include a check constraint as pgsql enables it?
  • What if constraints could be defined once within a subfolder path in dbt_project.yml and then in automatically creates not null constraints in the DDL?
  • Leverage how seed configs enable data types: https://docs.getdbt.com/reference/resource-configs/column_types
@sungchun12 sungchun12 added enhancement New feature or request triage labels Oct 17, 2022
@sungchun12 sungchun12 self-assigned this Oct 17, 2022
@github-actions github-actions bot changed the title [Feature] Add column type constraints as dbt native configs [CT-1358] [Feature] Add column type constraints as dbt native configs Oct 17, 2022
@sungchun12
Copy link
Contributor Author

What this could look and feel like in snowflake: https://www.loom.com/share/14020499f5f646b6bc80c909716850fd

@sungchun12
Copy link
Contributor Author

sungchun12 commented Oct 25, 2022

What if we add enum data types to prevent dbt users from having to run accepted values tests?

version: 2
models:
  - name: constraints_example
    config:
      constraints_enabled: true
    columns:
      - name: color
        data_type: varchar
        constraint: value in ('red', 'green', 'yellow', 'blue')
      - name: shape
        data_type: enum('square', 'circle', 'triangle')

@jonathanneo
Copy link

Good spike! Just in time too, more and more data warehouse providers are adding constraints into their table definitions. For example, databricks have just added table constraints in October 2022: https://docs.databricks.com/tables/constraints.html

@sungchun12
Copy link
Contributor Author

sungchun12 commented Oct 26, 2022

A couple notes on design and scoping

dbt-core design:

  • Ensure the below is allowed and contains defensive code for when constraints_enabled: true that it requires other schema config values are filled in and are not blank
  • constraints_enabled should have overrides similar to node_color configs: <example-sql-file.sql> overrides schema.yml overrides dbt_project.yml
--example sql file config

{{
  config(
    materialized = "table",
    constraints_enabled = true
  )
}}

select 
  1 as id, 
  'blue' as color, 
  cast('2019-01-01' as date) as date_day
# schema.yml
version: 2

models:
  - name: constraints_example
    config:
      constraints_enabled: true
    columns:
      - name: id
        data_type: integer
        description: I want to describe this one, but I don't want to list all the columns
        constraint: not null
      - name: color
        data_type: string
      - name: date_day
        data_type: date
        default_value: CURRENT_DATE()
# dbt_project.yml
models:
  +constraints_enabled: true # enforce all models have constraints typed
  dbt_artifacts:
    +docs:
      show: false
    +schema: dbt_artifacts_sung
    staging:
      +schema: dbt_artifacts_sung
  tpch:
    staging:
      database: '{{ env_var("DBT_DYNAMIC_DATABASE","analytics") }}'
      +materialized: view
      +docs:
        node_color: "#FB6962"

    marts:
      intermediate:
        +constraints_enabled: false # overrides the root level enforcement
        +docs:
          node_color: "#3AAAE0"
      core:
        +tags: "owner:sung"
        materialized: table
        +docs:
          node_color: "#F6AE3E"

adapter design:

@joshuataylor
Copy link
Contributor

I volunteer to help contribute the Snowflake portions and help test on a real world project. This will really help with incremental updates especially!

@jtcohen6 jtcohen6 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors Team:Adapters Issues designated for the adapter area of the code and removed triage labels Oct 28, 2022
@sungchun12
Copy link
Contributor Author

sungchun12 commented Nov 2, 2022

Team Expectations

Align on vision and scope of work: see above
Define success together: see above
Due Date: 12/7/2022(pull request submitted with passing tests)
Align on time/energy commitments: 5 weeks

  • @sungchun12 : 5-10 hours/week, product manager-lite, dbt-core, adapter engineer
  • @b-per : 4-8 hours/week, dbt-core, adapter engineer(TODO: Confirm these expectations)
  • @Victoriapm : 5 hours/week?, adapter engineer
  • All of us write passing tests

Align on project workflow and how we keep each other accountable:

  • slack DMs as a group
  • no recurring meetings
  • update with comments in this github issue for progress to keep it centralized
  • We'll use tasks within github projects to easily assign discrete units of work: here
  • Mondays slack DM updates weekly 10am PST.

Feel ready to start working this week:

Each of these will be a task in the github project above

  • What files need to be touched in dbt-core?
  • What files need to be touched in adapters: BigQuery, Postgres, Snowflake, Databricks to start?
  • What should config UX feel like(think: setting and updating configs, terminal logs, defensive behaviors to prevent users causing unintended effects)?
  • Make a loom demo video for each adapter
  • Update docs website

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors Team:Adapters Issues designated for the adapter area of the code
Development

Successfully merging a pull request may close this issue.

6 participants