[CT-636] [Bug] Postgres unlimited varchar default to varchar(256) #5238

Closed
1 task done
shrodingers opened this issue May 12, 2022 · 5 comments · Fixed by #5292
Labels
bug (Something isn't working), good_first_issue (Straightforward + self-contained changes, good for new contributors!), postgres, Team:Adapters (Issues designated for the adapter area of the code)

Comments

shrodingers (Contributor) commented May 12, 2022:

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When using the Postgres adapter with varchar-typed columns (for example via dbt_utils.type_string()), anything that makes dbt generate DDL statements, such as incremental materialization with on_schema_change: sync_all_columns or append_new_columns, defaults the varchar max size to 256 (the base/Column.py handler's behaviour when char_size is None). This makes schema evolution impossible when textual columns are defined as {{ dbt_utils.type_string() }}.
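For illustration, here is a minimal sketch of the symptom, assuming the behaviour of the base Column class in dbt-core 1.x (the import path and constructor fields below are assumptions for the example, not something shown in this issue):

from dbt.adapters.base.column import Column

# An unbounded varchar comes back from information_schema with a NULL
# character_maximum_length, so the resulting Column object has char_size=None.
col = Column(column="col1", dtype="character varying", char_size=None)

# The base handler falls back to 256 when char_size is None...
print(col.string_size())  # -> 256
# ...so the DDL dbt generates for this column caps it at varchar(256).
print(col.data_type)      # -> "character varying(256)"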

Expected Behavior

The original varchar type with no limit should remain as-is and not be assigned a default limit of 256, because we don't know the size limit of a varchar declared without an explicit limit. Schema evolution should then be possible, with new columns not created as varchar(256).

Steps To Reproduce

  1. Create an incremental model with on_schema_change: sync_all_columns
  2. Run the model for the first time
  3. Add a column whose type is {{ dbt_utils.type_string() }} and whose values exceed 256 characters
  4. Re-run the model. It will fail because dbt adds the column as varchar(256)

Relevant log output

Debug logs for the schema update:
 "database"."schema"."table":
         Schema changed: True
         Source columns not in target: [<Column col1 (character varying(256))>, <Column col2 (character varying(256))>, <Column col3 (character varying(256))>]
         Target columns not in source: [<Column col4 (character varying(256))>]
         New column types: []
(Note that all of these columns are initially typed as plain varchar, not varchar(256).)
Postgres column metadata check:
select
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

      from "production-data-warehouse".INFORMATION_SCHEMA.columns
      where table_name = '{{table}}'
        
        and table_schema = '{{schema}}'
        
      order by ordinal_position
Error: 
`value too long for type character varying(256)`

Environment

- OS: Any
- Python: 3.9-slim
- dbt: 1.0.0

What database are you using dbt with?

postgres

Additional Context

I have already found out why it works like this, and if the approach is validated I will be happy to submit a PR. (It is linked to the fact that the way Postgres columns are retrieved returns NULL for the max length of unlimited varchar columns, so the code falls back to the default Column handler.)
I was just wondering whether this should be fixed in the PostgresColumn class (which already handles text columns this way), or whether we should just update the dbt_utils.type_string macro to use text instead of varchar for Postgres.

shrodingers added the bug and triage labels on May 12, 2022
github-actions bot changed the title from "[Bug] Postgres unlimited varchar default to varchar(256)" to "[CT-636] [Bug] Postgres unlimited varchar default to varchar(256)" on May 12, 2022
jtcohen6 added the postgres and Team:Adapters labels on May 12, 2022
jtcohen6 (Contributor) commented:

> I was just wondering whether this should be fixed in the PostgresColumn class (which already handles text columns this way), or whether we should just update the dbt_utils.type_string macro to use text instead of varchar for Postgres.

Yes! Very relevant: dbt-labs/dbt-utils#586

shrodingers (Contributor, Author) commented:

Seems really nice!
If I understand correctly, that PR means it will shift from varchar to text for Postgres + Snowflake, right? But isn't it safer to still handle the unlimited varchar in PostgresColumn, to avoid issues when this type is explicitly used outside the type macros?

jtcohen6 (Contributor) commented May 17, 2022:

Ah, I see what you mean! In this logic:

def string_size(self) -> int:
    if not self.is_string():
        raise RuntimeException("Called string_size() on non-string field!")

    if self.dtype == "text" or self.char_size is None:
        # char_size should never be None. Handle it reasonably just in case
        return 256
    else:
        return int(self.char_size)

You're saying, on Postgres, it actually should be allowed to remain None—or really 65535 (?), since we may want a not-none value here to support the comparison in can_expand_to. Does that sound right?
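(For reference, the can_expand_to comparison mentioned here looks roughly like this in the base Column class; a paraphrased sketch from dbt-core, not an exact copy of the source:)

def can_expand_to(self, other_column: "Column") -> bool:
    # a string column can only "expand" into another string column with a
    # larger size, which is why string_size() needs to return a concrete int
    if not self.is_string() or not other_column.is_string():
        return False
    return other_column.string_size() > self.string_size()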

I think this may also be a place where the behavior differs between Postgres + Redshift. On Redshift, TEXT is just an alias for VARCHAR(256), and:

If you use the VARCHAR data type without a length specifier in a CREATE TABLE statement, the default length is 256. If used in an expression, the size of the output is determined using the input expression (up to 65535).

It's totally appropriate to implement different logic for dbt-postgres and dbt-redshift accordingly. Just noting it here because both currently use the base Column implementation.

jtcohen6 removed the triage label on May 17, 2022
shrodingers (Contributor, Author) commented:

As far as I can see, Postgres already has a different way of handling TEXT columns and supersedes its parent in this case:

class PostgresColumn(Column):
    @property
    def data_type(self):
        # on postgres, do not convert 'text' to 'varchar()'
        if self.dtype.lower() == "text":
            return self.dtype
        return super().data_type

Also, for Postgres, the maximum varchar size is 10485760, as stated here, but only when a limit is explicitly declared (if not, the effective limit is the general 1 GB cap on any column value).

My guess is that it would be okay / accurate to handle unlimited varchar the same way TEXT is handled on Postgres, since I don't think the 256 default is the intended behaviour for Postgres.

I'll be glad to submit a PR for this if it seems legitimate.
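To make the idea concrete, here is a rough sketch of what such a change could look like (only an illustration of the approach described above; the exact condition is an assumption, not necessarily what the linked PR ends up doing):

class PostgresColumn(Column):
    @property
    def data_type(self):
        # on postgres, leave 'text' alone, and likewise leave a varchar with no
        # explicit limit (char_size is None) untouched instead of capping it at 256
        if self.dtype.lower() == "text" or (
            self.dtype.lower() == "character varying" and self.char_size is None
        ):
            return self.dtype
        return super().data_type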

jtcohen6 (Contributor) commented:

You're so right!! Sorry I missed that. I think adding a new method to PostgresColumn is the way to go, and I'd welcome a PR for it.
