Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rationalize quoting configs + properties #2986

Open
jtcohen6 opened this issue Dec 31, 2020 · 17 comments
Open

Rationalize quoting configs + properties #2986

jtcohen6 opened this issue Dec 31, 2020 · 17 comments
Labels
enhancement New feature or request paper_cut A small change that impacts lots of users in their day-to-day

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 31, 2020

Describe the feature

Picks up from issues like #2468 and #2975, which are narrower in scope and offer more straightforward near-term fixes

  • Why do we call it quoting when configuring database/schema/identifier names, but then quote when describing properties of column names?
  • Why does each adapter have to implement its quoting character sort-of twice? (Adapter quoting should use self.Relation.quote_character #2243)
  • There's also quote_columns, which is a seed-only config item that lives on its own level but surely belongs inside the quoting config item (or will it be quote?!) as columns
  • Not to mention quoted, which while not itself a config, returns the quoted version of a column name from a Relation based on the configs above

Instead, we should have a single config/property, and I think it should be quote. This would take over from the current project-level quoting config:

quote:
  database: true|false    # or `project` on dbt-bigquery
  schema: true|false      # or `dataset` on dbt-bigquery
  identifier: true|false
  columns: true|false

The quote: {columns: true} would also replace quote_columns as a bespoke config for seeds. If that config is specified in dbt_project.yml, it can be superseded by:

  • setting quote: {} inside the config() block for a specific moel
  • quote: true|false set for a specific column in models/*.yml (it's implied that this really means quote: {column: true}
  • in a post-Set configs in schema.yml files #2401 world, a model can set its quote: {} config within models/*.yml, too

If quote is not set, it falls back to the default behavior of the adapter plugin, which also sets the character used for quoting (almost always " or `).

Questions

Here's what we have in the docs FAQs today for sources:

By default, dbt will not quote the database, schema, or identifier for the source tables that you've specified.

Should sources start respecting project-level quote settings? Or they continue to act independently, but we should enable turning this config-property on or off for all sources in dbt_project.yml:

sources:
  quote:
    schema: true

Describe alternatives you've considered

Retaining all of these configs/properties/adapter methods and documenting them exceptionally well so as to avoid confusion

Additional context

This isn't specific to any one database, though it is likely most helpful on databases that support special characters if quoted (Postgres, Redshift) or are particularly sensitive to quoting (Snowflake).

There's a round-up of all the known documentation related to quoting in dbt-labs/docs.getdbt.com#3518.

@jtcohen6 jtcohen6 added enhancement New feature or request 1.0.0 Issues related to the 1.0.0 release of dbt labels Dec 31, 2020
@leahwicz
Copy link
Contributor

  • Keep accepting the old way but don't advertise (don't break how users are currently using it)
  • Tricky- getting the inheritance to work
  • Related to the configs vs properties battle

@nathaniel-may
Copy link
Contributor

  • What is the exit criteria for this issue?
    • A new project-level config called "quote" that matches the behavior of the current quote and quoting configs.
    • It can be used in config blocks to override project-level quote settings in old or new style.
  • What are the high-level items of the work that need to be done (i.e. create x, split out y, etc.)
    • Add a new config to yaml parsing.
    • Add a new config to the internal config logic
    • Make sure overrides work (we probably don't get this for free.)
    • Testing
  • What are the open questions on this issue that still need answers?
    • Are is there anything this would do beyond what's expressible today or is it just ergonomic? A: Just ergonomic.
    • would we deprecate the old way of doing these things or just let them both ride? A: Not initially.
    • How do we want to handle sources? (question in ticket since they seem to work independently)
  • Are there blockers/prerequisites to starting this work?
    • I think it could be started.
    • It's related to configs vs property work so make sure those don't overlap. neat inheritance of these configs would need to take into account that some quoting stuff is dbt configs. some quoting stuff is dbt properties.

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label May 11, 2022
@jtcohen6 jtcohen6 removed the stale Issues that have gone stale label May 11, 2022
@jtcohen6
Copy link
Contributor Author

I still care about this one :)

@adamcunnington-mlg
Copy link

I really care about this one!

All I want to do is ensure that dbt quotes all column names that I reference or create (via sql selects) and my only option right now is to explicitly define every single column in every single model. I've not actually tried doing that but I suspect that will only quote OUTPUT columns in a model, not columns that I select during my sql.

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jan 14, 2024
@github-actions github-actions bot removed the stale Issues that have gone stale label Feb 27, 2024
@dbeatty10
Copy link
Contributor

Here's a couple other issues related to quoting (specifically about applying proper escaping prior to quoting):

@dbeatty10
Copy link
Contributor

@gwenwindflower
Copy link

this is in adapters now, but also adding that seeds are not consistently quoted and it would be cool if they were also a config option under the proposed quote config.

dbt-labs/dbt-adapters/issues/178

@dbeatty10
Copy link
Contributor

@alison985
Copy link

Adding ~quoting related issue for snapshots. #10356 cc: @jeremyyeo

@graciegoheen graciegoheen added the paper_cut A small change that impacts lots of users in their day-to-day label Oct 29, 2024
@t0momi219
Copy link

Hello, everyone. (@dbeatty10 Apologies if I mentioned the wrong person.)

I have encountered the following problem, and I believe it can be resolved with some changes outlined in this issue.

About the problem

When using a case-sensitive database like Snowflake, it becomes challenging to use a mix of ASCII-only model names and non-ASCII model names.

Example:

models
└  only_ascii_char.sql
└  non_ascii_chars.sql

To run the model "non_ascii_chars," the identifier needs to be enclosed in double quotes. The only current option is the project-level quoting parameter, so I configure it as follows:

# dbt_project.yml
quoting:
  identifier: true

However, this causes even the only_ascii_char model to have its identifier enclosed in quotes after compilation, which means that in Snowflake, the table can only be used in lowercase:

create or replace table database.schema."only_ascii_char" ...

When attempting to avoid this by using an alias, a different issue arises:

# non_ascii_chars.yml
models:
  - name: non_ascii_chars
    alias: '"non_ascii_chars"'

At runtime, this results in an error due to ambiguous model name detection:

Compilation Error in model non_ascii_chars (models/non_ascii_chars.sql)
  When searching for a relation, dbt found an approximate match. Instead of gussing
  which relation to use, dbt will move on. Please delete database.schema."non_ascii_chars", or rename it to be less ambiguous.
  Searched for :  database.schema."non_ascii_chars"
  Found: "database"."schema"."non_ascii_chars"

For people like myself who use a non-ASCII native language, it’s important to provide models in their business language (i.e., the ubiquitous language). I would like to contribute to adding this feature! Where would be the best place to start tackling this issue?

As outlined in this linked issue, is it necessary to consolidate or organize the quote options?
dbt-labs/docs.getdbt.com#3518

(I have also heard that non-ASCII characters cannot be used in unit tests. That may also be related to this one, however, I believe this should be addressed in a separate issue, so I will create a new one for that.)

@dbeatty10
Copy link
Contributor

Thanks for reaching out and providing such a nice example @t0momi219 !

It sounds like your goal has two parts:

  1. Be able to write queries in Snowflake that include double quotes for the ubiquitous language (that contains non-ASCII characters):

    select * from database.target.schema."注文"
  2. Be able to write queries in Snowflake that do not include double quotes for other models (that only contain ASCII characters):

    select * from database.schema.lowercase_ascii

So you're seeking a way to configure quoting identifier: true | false on a per-model basis rather than a project-wide basis.

Does that sound correct?

@t0momi219
Copy link

Hi @dbeatty10,

you're seeking a way to configure quoting identifier: true | false on a per-model basis rather than a project-wide basis.

Exactly. Being able to override the quoting setting specified at the project level on a per-model basis would increase flexibility and be ideal.

@dbeatty10
Copy link
Contributor

From @jtcohen6:

IIRC - the reason why this hasn't been supported historically is that it makes operating on the relational cache very tricky, because dbt would need cache entries & lookups to be case-sensitive depending on each object’s quoting config (rather than on/off overall for all cached relation names).

@t0momi219 Thanks for you interest in working on this! This feature is tricky enough that we'd need to invest a significant amount of time and effort regardless if a community member works on it or if we do it ourselves (see below for some details). In those cases, we'd want to do it ourselves rather than accept community contributions. Unfortunately, this isn't a priority for us in the near-term, so we don't plan on tackling it anytime soon.

@dbeatty10 dbeatty10 removed the triage label Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request paper_cut A small change that impacts lots of users in their day-to-day
Projects
None yet
Development

No branches or pull requests

9 participants