Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set the table comment when creating the table #317

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Jun 30, 2023

This avoids a schema change in Iceberg when dbt does an:

comment on table sandbox.dbt_trino.rides is 'Combined table of NYC Taxi rides with the locations'

Because the comment is already there.

Overview

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • README.md updated and added information about my change
  • I have run changie new to create a changelog entry

@Fokko Fokko force-pushed the fd-set-the-comment-when-creating-the-table branch from db7eb96 to da19fab Compare June 30, 2023 10:19
@hovaesco
Copy link
Contributor

Can you add a test or modifying existing one for that ?

@Fokko
Copy link
Contributor Author

Fokko commented Jun 30, 2023

@hovaesco This is already tested by TestPersistDocsTable

@hovaesco
Copy link
Contributor

hovaesco commented Jul 3, 2023

@Fokko please squash the commits

@Fokko Fokko force-pushed the fd-set-the-comment-when-creating-the-table branch from 25e3d7a to df6e78f Compare July 3, 2023 08:50
This avoids a schema change in Iceberg when dbt does an:
```sql
comment on table sandbox.dbt_trino.rides is 'Combined table of NYC Taxi rides with the locations'
```
@Fokko Fokko force-pushed the fd-set-the-comment-when-creating-the-table branch from df6e78f to eb66406 Compare July 4, 2023 07:21
@damian3031
Copy link
Member

@Fokko could you explain a little bit more why this change is needed? What is wrong with current behaviour?

@Fokko
Copy link
Contributor Author

Fokko commented Jul 4, 2023

@damian3031 Certainly. In Iceberg we keep track of the changes to a schema. If you set the table comment:

{{ config(
  persist_docs={"relation": true},
  file_format="iceberg"
) }}

This will first create the table:

create table sandbox.dbt_trino.rides__dbt_tmp
      comment 'Combined table of NYC Taxi rides with the locations'

And then later it will do an:

comment on table sandbox.dbt_trino.rides is 'Combined table of NYC Taxi rides with the locations'

With this change, the comment will be set on table creation. When the COMMENT ON is being executed, it is actually a no-op for Iceberg, because nothing changes since the comment is already there. This avoids updating the schema on the Iceberg side (and also increases performance, because it doesn't have to write the updated metadata json).

I'm hesitant to remove the comment on because if you change the comment, and you use an incremental strategy, it would need to detect that the comment has changed. Unfortunately, the comment is not part of the dbt relation object, so I think it is best to keep the redundant COMMENT ON in there for now.

@damian3031 damian3031 self-requested a review July 4, 2023 12:30
@damian3031 damian3031 merged commit d8ac8a5 into starburstdata:master Jul 4, 2023
@damian3031 damian3031 mentioned this pull request Jul 5, 2023
1 task
@Fokko Fokko deleted the fd-set-the-comment-when-creating-the-table branch August 10, 2023 08:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants