Support new constraints spec, and specifying constraints in create table as #288
Labels
constraints
(Happy to split this into multiple issues; keeping it together here for now, to keep the conversation centralized.)
Describe the feature
First, we are changing the spec of `constraints` to be a first-class property/configuration, and we should update the implementation in `dbt-databricks` to match (rather than the current `meta`-based approach).

Changes to `dbt-core`:

Associated changes to `dbt-spark`:
Second: This is a feature request for Databricks, more so than a feature request for `dbt-databricks`! See: https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html
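Concretely, the ask would be for something like the following to be valid. This is a hypothetical sketch (all table and constraint names are made up): today, Databricks rejects a column spec alongside an `as <query>` clause, and `check` constraints can only be added after creation.

```sql
-- Hypothetical: NOT valid on Databricks today
create or replace table analytics.orders (
  order_id bigint not null,
  amount double,
  constraint amount_is_positive check (amount > 0)
) using delta
as select * from staging.orders;
```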
Context & alternatives
Databricks supports many types of constraints, and it can enforce two of them:

- `not null` constraints on columns in a table
- `check` constraints of boolean expressions on one or more columns in a table
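For reference, a minimal sketch of the two enforced constraint types (names made up):

```sql
-- `not null` is declared inline in the column spec
create or replace table orders (
  order_id bigint not null,
  amount double
) using delta;

-- `check` constraints are attached after creation via `alter table`
alter table orders
  add constraint amount_is_positive check (amount > 0);
```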
However, Databricks does not support the inclusion of a column spec / constraint within a CTA (`create or replace table <tablename> as <sql>`). This puts us in a tricky situation when trying to atomically replace a table and enforce constraints on the data being added in. These are the options, as we understand them:

1. `create or replace` the table with the constraints, then `insert` the data returned by the query, with the constraints enforced on any data being added in. Because Databricks does not support transactions, the table will appear empty to downstream queriers while the second statement (`insert`) is running. (Sketched just after this list.)
2. Create a new table with the constraints, then `insert` the data returned by the query, with the constraints enforced. If all constraints pass, move the new table to the preexisting table's location, requiring a "deep" clone that can be very slow (effectively copying the full dataset over from one location to another).
3. `create or replace table ... as`, ensuring zero downtime and no need to move data, then apply the constraints after the fact via `alter table` statements. Unfortunately, this means that the constraints aren't actually enforced until after the fact (no better than existing `dbt test`), and that the model's table can therefore include data that violates its contracted expectations.
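To sketch option 1's downtime window (names made up):

```sql
-- Option 1: constraints are in place before any data lands...
create or replace table analytics.orders (
  order_id bigint not null,
  amount double
) using delta;

alter table analytics.orders
  add constraint amount_is_positive check (amount > 0);

-- ...but with no transactions, downstream queries that run while
-- this insert is in flight see an empty table
insert into analytics.orders
select * from staging.orders;
```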
Is that understanding correct? For now, we have opted for option 3 in `dbt-spark` (dbt-labs/dbt-spark#574), as it is closest to the existing pattern for atomically replacing models and testing them after the fact. (It's also the existing implementation in `dbt-databricks`, which allows defining constraints in the `meta` dictionary.) The ideal is gaining the ability to include constraints within a CTA (`create or replace table ... as`).
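For illustration, the option-3 sequence looks roughly like this (a sketch with made-up names, not the adapter's exact DDL):

```sql
-- Zero-downtime atomic replace first
create or replace table analytics.orders
using delta
as select * from staging.orders;

-- Constraints applied after the fact: violations only surface
-- once the data is already queryable
alter table analytics.orders
  alter column order_id set not null;
alter table analytics.orders
  add constraint amount_is_positive check (amount > 0);
```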
Who will this benefit?
Are you interested in contributing this feature?
For the first bit, the code changes should look a lot like the changes that we'll be making in `dbt-spark`.

For the second, I believe this would require a change to Apache Spark or Databricks, so I don't think I'd know how :)