[CT-1918] [Discussion] Unify constraints and constraints_check configs #6750
Comments
I definitely think this will be a step in the right direction, by unifying into a single I'm not 100% convinced that
|
I'm all for standardizing on a separate |
In addition to standardizing across data platforms, which either call this A related thought here, prompted by dbt-labs/dbt-snowflake#341 (comment). We should create methods/mechanisms for each adapter to say:
columns:
  - name: id
    data_type: numeric
    constraints:
      # on all data platforms (even ones that call this 'nullable')
      - not_null: true
      # only allowed for data platforms that actually enforce 'check' constraints
      - check: "id > 0"
      # everything else, for metadata purposes (and sometimes query optimization ... oh boy)
      - unenforced: primary key
      - unenforced: foreign key

Note that BigQuery will actually raise an explicit error if you try to set constraints that aren't enforced! Personally, I'm supportive of this, but it's another platform-specific behavior to be aware of:

    create table dbt_jcohen.my_cool_table (id int primary key) as (select 1 as id);
|
This will require follow up on all the adapters to determine what needs to be changed there. |
summarizing from what I remember of discussion yesterday Things we agree on:
Questions:
@MichelleArk Let's aim to lock down answers to these questions by next week, so that we can make these changes sooner rather than later: ideally by b2 (March 1), otherwise by b3 (March 15). @sungchun12 @dave-connors-3 @Victoriapm @b-per Happy to hear any opinions / instincts / leanings you might have! |
Is this already true for |
Yes, If we can make #6751 happen — the way we'll be verifying the "contract" for data types, as a pre-flight check, will differ from the way we verify |
Postgres enforces primary and foreign keys. Would we then implement formal config constraints like: Also, I'm a big fan of housing all the constraints config under a single config vs. decoupling constraints and checks |
@sungchun12 Really good point that Postgres actually does enforce So we do need some way for each adapter to tell us which constraints it's actually able to enforce at build time. For Postgres, that's all of them. For Redshift, Snowflake, and BigQuery, it's just Then, we can use that adapter-level check to tell us:
Decision:
|
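As a rough illustration of the adapter-level check discussed above, here is a minimal Python sketch. Everything in it is hypothetical: the constraint names, the `ENFORCED_BY_ADAPTER` table, and both helper functions are invented for this example and are not part of dbt's API. The thread states that Postgres enforces every constraint and that not null is enforced on all platforms; treating not null as the only enforced constraint on Redshift, Snowflake, and BigQuery is an assumption.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical constraint names, chosen only for illustration.
ALL_CONSTRAINTS: FrozenSet[str] = frozenset(
    {"not_null", "check", "unique", "primary_key", "foreign_key"}
)

# Hypothetical capability table: which constraints each adapter enforces
# at build time. The non-Postgres entries are an assumption.
ENFORCED_BY_ADAPTER: Dict[str, FrozenSet[str]] = {
    "postgres": ALL_CONSTRAINTS,
    "redshift": frozenset({"not_null"}),
    "snowflake": frozenset({"not_null"}),
    "bigquery": frozenset({"not_null"}),
}


def split_by_enforcement(
    adapter: str, requested: List[str]
) -> Tuple[List[str], List[str]]:
    """Return (enforced, unenforced) constraint names, preserving input order."""
    supported = ENFORCED_BY_ADAPTER[adapter]
    enforced = [name for name in requested if name in supported]
    unenforced = [name for name in requested if name not in supported]
    return enforced, unenforced


def enforcement_warnings(adapter: str, column: str, requested: List[str]) -> List[str]:
    """Build one specific warning per constraint the adapter will not enforce."""
    _, unenforced = split_by_enforcement(adapter, requested)
    return [
        f"Column '{column}': constraint '{name}' is not enforced on {adapter}"
        for name in unenforced
    ]
```

A per-constraint message like this is one way to point the user at the exact config that is not enforced, rather than emitting a generic warning.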
@jtcohen6 Replying here for traceability, and let's get down and dirty in slack!
I agree with telling the user which constraints are enforceable at build time. Probably a terminal logging message that lets the user know clearly what's NOT being enforced and points to the specific config shouting this warning vs. a generic, "Hey you're using
I understand the implementation reasoning, but the developer reasoning conflicts with this logic in practice. If I'm a developer building constraints for the first time OR adjusting constraints in an existing contract, I'll be wondering why
Agreed
Agreed AND I'd like to see
Agreed on functionality, I recommend
Agreed on boolean column-level attribute and NOT model-level. Do NOT agree on constraints config hierarchy.
Agreed, we don't want to provide so many guardrails that we create a prison and not the fun sandbox it should be. Thankfully, the databases provide helpful tips when constraints are wrong based on ad hoc testing.
Agreed on these primitives!
I'd say BREAKING change because the substance of the contract has changed because the expression changed. I'd be okay with it NOT being a breaking change if ONLY the name of the check is different but the expression remains the same. |
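The rule suggested in the comment above (a changed check expression is breaking, a renamed check is not) could be sketched as follows. This is only an illustration under assumptions: the function name and the name-to-expression mapping are hypothetical, and the comparison is plain string equality rather than any SQL-aware normalization.

```python
from typing import Dict


def check_change_is_breaking(old: Dict[str, str], new: Dict[str, str]) -> bool:
    """Compare two hypothetical {check_name: expression} mappings.

    Only the expressions are compared, so renaming a check while keeping
    its expression is treated as non-breaking.
    """
    return sorted(old.values()) != sorted(new.values())
```

For example, renaming `positive_id` to `id_gt_zero` while keeping `id > 0` would not be flagged, whereas changing the expression to `id >= 0` would be.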
@sungchun12 Love this feedback! The only place where you + I disagree is about whether
@MichelleArk @dbeatty10 Very open to hearing your thoughts / votes one way or the other! Given that this is the only outstanding question, I think we can move this issue out of "refinement" and into "estimation." We may want to update the original issue description with the final proposed spec, or open as a new "clean" issue. |
I would be more in favor of "not special". If more warehouses start to enforce additional constraints, e.g. |
@b-per Thanks for weighing in!! Let me flip that around: If that happens — or shall we say "when" :) — what would you think about promoting In the current proposal, |
With this approach, are we saying that when a warehouse starts supporting On the first-class attribute list, would we also consider |
@b-per I'm not aware of any columnar / analytical data platforms that can enforce constraints for uniqueness, primary keys, and foreign keys. (I think they can be sorta supported in columnar extensions/forks of transactional databases, with some gotchas — e.g. Citus docs — but just as often not, e.g. MariaDB column store.) So I think I'd be open to adding these as first-class attributes once there's been a precedent set by at least one (ideally two) of the major data platform players. If/when that happens, I hear you on it not being a delightful experience to have to wait for a new version of |
These are super valid usability concerns, and to me outweigh the benefits to promoting not-null to a column-level attribute. That said, I do still think not null is 'special' for the reasons stated above: it's column-level only; supported & enforced on all data platforms; part of DB-API2 standard for integral column metadata. My motivation for seeing not null as special is so that its presence can be used as input to detecting breaking changes to model contracts as part of the state:modified check. But it doesn't need to be a column-level attribute to implement that! To me this comes down to what is part of the model contract (enforceable pre-flight, part of the state:modified check) vs model constraints (warehouse specific behaviour + configuration). The new |
As a summary, there are six (6) constraint types that have been discussed:
To help me process pros/cons, I made an attempt at re-phrasing points that have been raised so far. Let's suppose that it should be special (and promoted to a column-level attribute): Pros
Cons
My opinion: all constraints grouped together visually
TL;DR: "it shouldn't be special"
From what I can tell so far, it seems like our code will be able to handle things either way. Please speak up if that is not the case! If so, for me it would come down to what type of visual grouping we want for human cognition:
Most often, I come down on the side of whatever the standards suggest (which would be Option 3: visually separating single-column and multi-column). In this case, I'd lean towards visually grouping all constraints together since I think it will be more intuitive for users and it doesn't seem like we'd be giving up much if anything at all. (Maybe I should have done pros/cons from the "not special" perspective instead!) Ultimately, this is pretty loosely held for me, because I think our users will be successful either way as long as we have good docs and good error messages. |
Upon further review, the ruling on the field is overturned! Thank you @sungchun12 @b-per @MichelleArk @dbeatty10 for weighing in with such thoughtful & well-articulated comments. Let's keep We may want to add |
@jtcohen6 thanks for listening and caring so much. This thread is such a lovely role model for people to debate healthily and thoughtfully! |
In the foundational dbt constraints work, constraints: List[str] and constraints_check: str are optional column configs that configure column-level constraints and row-level checks respectively.
Let's unify these two configs into a single optional constraints ([List[Union[str, Dict[str, str]]]]) that encompasses both sets of functionality. Relevant discussion: #6271 (comment)
Before:
After:
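To make the proposed unification concrete, here is a minimal Python sketch of folding the two existing column configs into a single list of type `List[Union[str, Dict[str, str]]]`. The function name is hypothetical, and representing the check as a one-key dict under `"check"` is an assumption borrowed from the YAML sketch earlier in the thread, not a confirmed spec.

```python
from typing import Dict, List, Optional, Union

# Unified element type from the issue description: a plain string
# or a single-purpose dict.
Constraint = Union[str, Dict[str, str]]


def unify_constraints(
    constraints: Optional[List[str]] = None,
    constraints_check: Optional[str] = None,
) -> List[Constraint]:
    """Merge the old `constraints` and `constraints_check` configs into one list."""
    unified: List[Constraint] = list(constraints or [])
    if constraints_check:
        # Assumed shape: the check expression stored under a "check" key.
        unified.append({"check": constraints_check})
    return unified


print(unify_constraints(["not null", "primary key"], "id > 0"))
```

Appending the check last keeps the original ordering of the string constraints, which makes a before/after diff of a project's YAML easier to read.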