Support new `constraints` spec, and specifying constraints in `create table as` #288

jtcohen6 · 2023-03-07T23:42:29Z

(Happy to split this into multiple issues — keeping it together here for now, to keep the conversation centralized)

Describe the feature

First, we are changing the spec of constraints to be a first-class property/configuration, and should update the implementation in dbt-databricks to match (rather than the current meta-based approach).

Changes to dbt-core:

Associated changes to dbt-spark:

Second: This is a feature request for Databricks, more so than a feature request for dbt-databricks!

create table <table> (<column_name> <data_type> not null) as (
    select <query>
);

https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html

This optional clause populates the table using the data from query. When you specify a query you must not also specify a column_specification. The table schema will be derived from the query.

Context & alternatives

Databricks supports many types of constraints, and it can enforce two: not null constraints on columns in a table, and check constraints of boolean expressions on one or more columns in a table.
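For reference, here is a minimal sketch of the DDL for the two enforceable constraint types, rendered from Python. The helper names (`not_null_ddl`, `check_ddl`) and the example table and column names are hypothetical and are not part of dbt-databricks. The SQL shapes follow the Databricks `alter table` syntax as I understand it, so treat this as an illustration rather than the adapter's implementation.

```python
# Sketch only: helper names are hypothetical, not part of dbt-databricks.


def not_null_ddl(table: str, column: str) -> str:
    """Render the statement that adds a not null constraint to an existing column."""
    return f"alter table {table} alter column {column} set not null"


def check_ddl(table: str, name: str, expression: str) -> str:
    """Render the statement that adds a named check constraint on a boolean expression."""
    return f"alter table {table} add constraint {name} check ({expression})"


if __name__ == "__main__":
    # Hypothetical table, column, and constraint names, for illustration only.
    print(not_null_ddl("analytics.orders", "order_id"))
    print(check_ddl("analytics.orders", "positive_amount", "amount > 0"))
```

Both statements apply to a table that already exists.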

However, Databricks does not support the inclusion of a column spec / constraint within CTA (create or replace table <tablename> as <sql>). This puts us in a tricky situation when trying to atomically replace a table, and enforce constraints on the data being added in. These are the options, as we understand them:

1. Create or replace the table with the constraints, then insert the data returned by the query, with the constraints enforced on any data being added. Because Databricks does not support multi-statement transactions, the table will appear empty to downstream queriers while the second statement (insert) is running.
2. Create a new table with the constraints in a separate location, then insert the data returned by the query, with the constraints enforced. If all constraints pass, move the new table to the preexisting table’s location, which requires a “deep” clone that can be very slow (effectively copying the full dataset from one location to another).
3. Atomically replace the table via create or replace table ... as, ensuring zero downtime and no need to move data, then apply the constraints after the fact via alter table statements. Unfortunately, this means that the constraints aren’t actually enforced until after the fact (no better than the existing dbt test), and the model’s table can therefore include data that violates its contracted expectations.

Is that understanding correct? For now, we have opted for option 3 in dbt-spark (dbt-labs/dbt-spark#574), as it is closest to the existing pattern for atomically replacing models and testing them after the fact. (It's also the existing implementation in dbt-databricks, which allows defining constraints in the meta dictionary.) Ideally, we would gain the ability to include constraints within a CTA (create or replace table … as).
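To make the trade-offs concrete, the three options described above correspond roughly to the statement sequences sketched below. This is an illustration under assumptions, not the dbt-spark or dbt-databricks implementation: `replacement_plan`, its parameters, and the `__staging` suffix are hypothetical, only not null constraints are handled, and the `deep clone` statement in option 2 is my reading of how the "move" step would be expressed.

```python
from typing import Dict, List


def replacement_plan(
    option: int,
    table: str,
    columns: Dict[str, str],  # column name -> data type
    not_null: List[str],      # columns that carry a not null constraint
    query: str,
) -> List[str]:
    """Return the ordered SQL statements for one of the three options (hypothetical helper)."""
    # Column spec with inline not null constraints, used by options 1 and 2.
    spec = ", ".join(
        f"{name} {dtype} not null" if name in not_null else f"{name} {dtype}"
        for name, dtype in columns.items()
    )
    if option == 1:
        # Constraints are enforced on insert, but the table is empty between statements.
        return [
            f"create or replace table {table} ({spec})",
            f"insert into {table} {query}",
        ]
    if option == 2:
        # Build and validate in a staging table, then copy everything over.
        staging = f"{table}__staging"  # hypothetical naming convention
        return [
            f"create or replace table {staging} ({spec})",
            f"insert into {staging} {query}",
            f"create or replace table {table} deep clone {staging}",
        ]
    if option == 3:
        # Atomic replacement first; constraints are applied only afterwards.
        return [f"create or replace table {table} as {query}"] + [
            f"alter table {table} alter column {name} set not null"
            for name in columns
            if name in not_null
        ]
    raise ValueError(f"unknown option: {option}")


if __name__ == "__main__":
    for statement in replacement_plan(3, "orders", {"id": "int"}, ["id"], "select 1 as id"):
        print(statement)
```

In option 3 the alter statements run after the new data is already visible, which is the gap that constraints inside create or replace table … as would close.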

Who will this benefit?

Users of "model contracts" in dbt v1.5+ (beta documentation: https://docs.getdbt.com/docs/collaborate/publish/model-contracts)

Are you interested in contributing this feature?

For the first bit, the code changes should look a lot like the changes that we'll be making in dbt-spark.

For the second, I believe this would require a change to Apache Spark or Databricks, so I don't think I'd know how :)

github-actions · 2023-09-04T01:46:18Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.

jtcohen6 added the enhancement New feature or request label Mar 7, 2023

github-actions bot added the Stale label Sep 4, 2023
