Should dbt know / have an opinion on data types (& related functions)? #5778

jpmmcneill · 2022-09-06T22:34:52Z

jpmmcneill
Sep 6, 2022

Overview

dbt has some notion of standardising sql across database adapters.

There is a pretty clear pattern of "multi adapter" macros being moved into dbt-core: #4813, #5265). There are also multi adapter materializations, which afaik have been present in dbt for as close to "forever" as is a reasonable definition.

I believe there is a great opportunity for dbt to handle and be opinionated on data types, offering a set of multi adapter data types and functions that would make dbt code more likely to run in multiple warehouses.

Example Use Case 1

Note
This isn't a super precise example (as snowflake aliases string as varchar), but this is a feature of snowflake and I'm sure I could find an example where the below code wouldn't work between two databases.

Let's imagine I have a model, example_model.sql, that converts a column from being an integer to being a string:

-- snowflake version

select
  id,
  cast(data as varchar) as data
from {{ ref('input_model') }}

The above would work in snowflake, but fail in bigquery:

-- bigquery version

select
  id,
  cast(data as string) as data
from {{ ref('input_model') }}

Adding a catalogue of data types as well as a dbt native caster would solve this:

-- generic version

select
  id,
  {{ dbt_cast('data', string) }} as data
from {{ ref('input_model') }}

I think that the ideal case is what I've outlined above, where string is not some text passed to jinja, but anobject rather than a text, meaning it could error out if it wasn't in the supported lists.

Example Use Case 2

Let's imagine I have a model, example_model.sql, that uses a column that is a key / value store:

-- snowflake version

select
  id,
  parse_json(data) as data
from {{ ref('input_model') }}

Again, the above would work in snowflake, but fail in bigquery:

-- bigquery version
select
  id,
  json_extract(data) as data
from {{ ref('input_model') }}

An alternative could be:

-- multi db version
select
  id,
  {{ json_from_string('data') }} as data
from {{ ref('input_model') }}

Where json_from_string is be a macro that is aware of both the dbt json and string types and will convert between these two with an adapter specific function. In principle, this function could error out if it was called by non string data types / enforce this - for example under the hood it could be (psuedocode):

{{ adapter_specific_function( dbt_cast('data', string) ) }}

ie. enforce that it's being called on a string type

Why should dbt do this

This would allow a dbt native way of having project code being much closer to be adapter independent. For example, this could allow a snowflake / bigquery user to run dbt in CI on duckDB without huge effort in many cases

The way I would envision this is having a list of "dbt supported" data types, that could each have a bunch of relevant functions (macros) associated with them for operations.

It also introduces the (in my opinion, compelling) of strengthening "what is a data type" in schema yaml (ie. make a more formal & on rails version of the current free text). This could possibly even be extended to exceptions being raised pre-compilation stage if data types don't match yaml.

However, not using this feature would still be completely possible.

How would I see / use this as a user?

TODO

jaypeedevlin · 2022-09-07T00:35:28Z

jaypeedevlin
Sep 7, 2022

This is of interest to me because I did some related work for dbt_artifacts to help with cross-db compatibility. I'm wondering if you're aware of the macros in https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/utils/data_types.sql which seem like they do some of what you're asking. I also opened a related issue (which I plan to work on) #5739

1 reply

jpmmcneill Sep 7, 2022
Author

This is of interest to me because I did some related work for dbt_artifacts to help with cross-db compatibility. I'm wondering if you're aware of the macros in https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/utils/data_types.sql which seem like they do some of what you're asking. I also opened a related issue (which I plan to work on) #5739

Hey @jaypeedevlin thanks for this! Definitely agreed that lots of the stuff for this exists independently. Part of this discussion is definitely "should we try to centralise & formalise what currently exists a little bit". For example, none of the documentation encourages users to actively use this notion. of the current

My personal take on the above is that while macros might exist at the moment, macros can be quite arbitrary & hard to put rails on.

For example - the translate_type method that these macros use under the hood won't error out if I make a typo etc. Removing the .get() in that method might be quite jarring for lots of users, and it's definitely reasonable to have this as being a 50% of the way to what I was describing above IMO.

https://github.com/dbt-labs/dbt-core/blob/a642b20abc3b729cb04fd0616db065c5d0c19868/core/dbt/adapters/base/column.py#L22:L24

I see a more on rails version of this as being especially useful for things like arrays, where there are many different flavours, that differ in subtle ways. To be honest, the use case here is probably more like "with official data types, you get more scope to support cross db macros.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should dbt know / have an opinion on data types (& related functions)? #5778

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 1 reply

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Should dbt know / have an opinion on data types (& related functions)? #5778

jpmmcneill Sep 6, 2022

Overview

Example Use Case 1

Example Use Case 2

Why should dbt do this

How would I see / use this as a user?

Replies: 1 comment · 1 reply

jaypeedevlin Sep 7, 2022

jpmmcneill Sep 7, 2022 Author

jpmmcneill
Sep 6, 2022

Replies: 1 comment 1 reply

jaypeedevlin
Sep 7, 2022

jpmmcneill Sep 7, 2022
Author