[CT-636] [Bug] Postgres unlimited varchar default to varchar(256) #5238

Closed
1 task done
shrodingers opened this issue May 12, 2022 · 5 comments · Fixed by #5292
Labels
bug (Something isn't working), good_first_issue (Straightforward + self-contained changes, good for new contributors!), postgres, Team:Adapters (Issues designated for the adapter area of the code)

Comments

shrodingers (Contributor) commented May 12, 2022:

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When using the Postgres adapter with varchar-typed columns (for example via dbt_utils.type_string()), anything that makes dbt generate DDL statements, such as incremental materialization with on_schema_change: sync_all_columns or append_new_columns, defaults the varchar max size to 256 (the base/Column.py handler's behaviour when char_size is None). This makes schema evolution impossible when textual columns are defined as {{ dbt_utils.type_string() }}.
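For illustration, here is a minimal sketch of the symptom, assuming the behaviour of the base Column class in dbt-core 1.x (the import path and constructor fields below are assumptions for the example, not something shown in this issue):

from dbt.adapters.base.column import Column

# An unbounded varchar comes back from information_schema with a NULL
# character_maximum_length, so the resulting Column object has char_size=None.
col = Column(column="col1", dtype="character varying", char_size=None)

# The base handler falls back to 256 when char_size is None...
print(col.string_size())  # -> 256
# ...so the DDL dbt generates for this column caps it at varchar(256).
print(col.data_type)      # -> "character varying(256)"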

Expected Behavior

The original varchar type with no limit should remain as-is and not be assigned a default limit of 256, because we don't know the size limit of a varchar declared without an explicit limit. Schema evolution should then be possible, with new columns not created as varchar(256).

Steps To Reproduce

  1. Create an incremental model with on_schema_change: sync_all_columns
  2. Run the model for the first time
  3. Add a column whose type is {{ dbt_utils.type_string() }} and whose values exceed 256 characters
  4. Re-run the model. It will fail because dbt adds the column as varchar(256)

Relevant log output

Debug logs for the schema update:
 "database"."schema"."table":
         Schema changed: True
         Source columns not in target: [<Column col1 (character varying(256))>, <Column col2 (character varying(256))>, <Column col3 (character varying(256))>]
         Target columns not in source: [<Column col4 (character varying(256))>]
         New column types: []
(Note that all of these columns are initially typed as plain varchar, not varchar(256).)
Postgres column metadata check:
select
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

      from "production-data-warehouse".INFORMATION_SCHEMA.columns
      where table_name = '{{table}}'
        
        and table_schema = '{{schema}}'
        
      order by ordinal_position
Error: 
`value too long for type character varying(256)`

Environment

- OS: Any
- Python: 3.9-slim
- dbt: 1.0.0

What database are you using dbt with?

postgres

Additional Context

I have already found out why it works like this, and if the approach is validated I will be happy to submit a PR. (It is linked to the fact that the way Postgres columns are retrieved returns NULL for the max length of unlimited varchar columns, so the code falls back to the default Column handler.)
I was just wondering whether this should be fixed in the PostgresColumn class (which already handles text columns this way), or whether we should just update the dbt_utils.type_string macro to use text instead of varchar for Postgres.

shrodingers added the bug and triage labels on May 12, 2022
github-actions bot changed the title from "[Bug] Postgres unlimited varchar default to varchar(256)" to "[CT-636] [Bug] Postgres unlimited varchar default to varchar(256)" on May 12, 2022
jtcohen6 added the postgres and Team:Adapters labels on May 12, 2022
jtcohen6 (Contributor) commented:

> I was just wondering whether this should be fixed in the PostgresColumn class (which already handles text columns this way), or whether we should just update the dbt_utils.type_string macro to use text instead of varchar for Postgres.

Yes! Very relevant: dbt-labs/dbt-utils#586

shrodingers (Contributor, Author) commented:

Seems really nice!
If I understand correctly, that PR means it will shift from varchar to text for Postgres + Snowflake, right? But isn't it safer to still handle the unlimited varchar in PostgresColumn, to avoid issues when this type is explicitly used outside the type macros?

jtcohen6 (Contributor) commented May 17, 2022:

Ah, I see what you mean! In this logic:

def string_size(self) -> int:
    if not self.is_string():
        raise RuntimeException("Called string_size() on non-string field!")

    if self.dtype == "text" or self.char_size is None:
        # char_size should never be None. Handle it reasonably just in case
        return 256
    else:
        return int(self.char_size)

You're saying, on Postgres, it actually should be allowed to remain None—or really 65535 (?), since we may want a not-none value here to support the comparison in can_expand_to. Does that sound right?
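(For reference, the can_expand_to comparison mentioned here looks roughly like this in the base Column class; a paraphrased sketch from dbt-core, not an exact copy of the source:)

def can_expand_to(self, other_column: "Column") -> bool:
    # a string column can only "expand" into another string column with a
    # larger size, which is why string_size() needs to return a concrete int
    if not self.is_string() or not other_column.is_string():
        return False
    return other_column.string_size() > self.string_size()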

I think this may also be a place where the behavior differs between Postgres + Redshift. On Redshift, TEXT is just an alias for VARCHAR(256), and:

If you use the VARCHAR data type without a length specifier in a CREATE TABLE statement, the default length is 256. If used in an expression, the size of the output is determined using the input expression (up to 65535).

It's totally appropriate to implement different logic for dbt-postgres and dbt-redshift accordingly. Just noting it here because both currently use the base Column implementation.

jtcohen6 removed the triage label on May 17, 2022
shrodingers (Contributor, Author) commented:

As far as I can see, Postgres already has a different way of handling TEXT columns and supersedes its parent in this case:

class PostgresColumn(Column):
    @property
    def data_type(self):
        # on postgres, do not convert 'text' to 'varchar()'
        if self.dtype.lower() == "text":
            return self.dtype
        return super().data_type

Also, for Postgres, the maximum varchar size is 10485760, as stated here, but only when a limit is explicitly declared (if not, the effective limit is the general 1 GB cap on any column value).

My guess is that it would be okay / accurate to handle unlimited varchar the same way TEXT is handled on Postgres, since I don't think the 256 default is the intended behaviour for Postgres.

I'll be glad to submit a PR for this if it seems legitimate.
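To make the idea concrete, here is a rough sketch of what such a change could look like (only an illustration of the approach described above; the exact condition is an assumption, not necessarily what the linked PR ends up doing):

class PostgresColumn(Column):
    @property
    def data_type(self):
        # on postgres, leave 'text' alone, and likewise leave a varchar with no
        # explicit limit (char_size is None) untouched instead of capping it at 256
        if self.dtype.lower() == "text" or (
            self.dtype.lower() == "character varying" and self.char_size is None
        ):
            return self.dtype
        return super().data_type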

jtcohen6 (Contributor) commented:

You're so right!! Sorry I missed that. I think adding a new method to PostgresColumn is the way to go, and I'd welcome a PR for it.
