Preserve the case of schemas and databases when listing relations (#2403) #2411
Conversation
force-pushed from efc3ddc to 12e5567
force-pushed from 12e5567 to 7c99b76
force-pushed from 7c99b76 to bd83126
- The argument is a `Relation` object with no `identifier` field, configured with the appropriate quoting information.
- Unique quoted/unquoted representations are treated as distinct.
- The logic for generating which schemas to search for relations is now distinct from the catalog search logic.
- Schema creation/dropping takes a similar relation argument.
- Add tests.
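As a rough illustration of the quoting point above (hypothetical class and field names, not dbt's actual `Relation`), the idea is that quoting becomes part of the value itself, so quoted and unquoted spellings of the same name compare as distinct:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SchemaRelation:
    """Illustrative stand-in for dbt's Relation; not dbt's actual class."""
    database: Optional[str]
    schema: str
    quote_schema: bool = False

    def render(self) -> str:
        # Quoting is part of the value, so it is applied here rather than
        # guessed at each call site.
        schema = f'"{self.schema}"' if self.quote_schema else self.schema
        return f"{self.database}.{schema}" if self.database else schema

# Quoted and unquoted spellings of the same name are distinct values:
assert SchemaRelation("db", "MySchema", quote_schema=True) != \
       SchemaRelation("db", "MySchema", quote_schema=False)
```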
force-pushed from bd83126 to e392212
One comment here around community plugins (i.e. Spark), but otherwise this LGTM. Let me know what you think about that comment and whether there's anything we should do differently here. If not, this is approved on my end.
```python
# Update the cache with the schemas that were searched.
self.cache.update_schemas(schema_map.schemas_searched())

# Collect (database, schema) pairs to add to the cache; database may be
# None for adapters whose namespace has no database level.
cache_update: Set[Tuple[str, Optional[str]]] = set()
for relation in cache_schemas:
    if relation.database is None:
```
Just wanted to mention here that some plugins (namely Spark) use database as an alias for schema, and only have the one level of hierarchy. I think it's ok to add this logic here -- we can override it in a plugin if we need to -- but the assumption that all relations have both a database and a schema is not quite as invariant, from dbt Core's perspective, as this code suggests.
Spark will already need to be updated, as it overrides `_get_cache_schemas` and `list_relations_without_caching`.

I think this change will actually make it easier for Spark to handle: it sets `database = schema`, but the include policy for relations is set to `database=False`. There are probably a couple of rough edges to clear up, but my goal is that Spark's default include policy + behavior will play pretty nicely with this, and we can delete the create/drop schema implementations, at least.
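To make the include-policy idea concrete, here is a minimal sketch (hypothetical names; dbt's real `Policy` and rendering logic live in its relation classes): with `database=False`, a Spark-style relation that sets `database = schema` still renders with a single namespace level:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IncludePolicy:
    """Illustrative include policy; not dbt's actual Policy class."""
    database: bool = True
    schema: bool = True
    identifier: bool = True

def render(database: Optional[str], schema: str, identifier: str,
           policy: IncludePolicy) -> str:
    # Keep only the parts the policy includes, then join with dots.
    parts = [
        part for part, included in (
            (database, policy.database),
            (schema, policy.schema),
            (identifier, policy.identifier),
        ) if included and part is not None
    ]
    return ".".join(parts)

# Spark-style: database aliases schema, but database=False in the policy
# keeps it out of the rendered name.
print(render("my_schema", "my_schema", "my_table",
             IncludePolicy(database=False)))  # -> my_schema.my_table
```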
Co-authored-by: Drew Banin <[email protected]>
resolves #2403
Description
There's hopefully nothing too wild or controversial in here - this is a substantial change, but it should fix some correctness issues in how dbt lists schemas. I think in the long run this will be more pleasant for adapter authors than having to figure out how to render the database/schema in each of those macros; instead, rendering is deferred to the relations.
Change `list_relations_without_caching` (both the macro and the method) to take a single argument: a `Relation` with no `identifier`, and with `identifier` set to not-include. Macros can use this to get a database-specific representation of the relation. The logic around creating missing schemas was updated similarly - dbt will faithfully issue `create schema` statements for whatever you give it, rather than coercing everything to lowercase and wrapping it in the given quoting.

In the process, and for the same reasons, I updated `drop_schema` and `create_schema` to take the same kind of argument.
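As a rough sketch of the new shape (hypothetical adapter and helper names, not dbt's actual base classes), each of these methods now receives a single relation-like argument that already carries the database, schema, and quoting configuration:

```python
# Hypothetical adapter, for illustration only; the real signatures live
# in dbt's BaseAdapter and its plugins.
class ExampleAdapter:
    def list_relations_without_caching(self, schema_relation):
        # The relation renders its own database-specific representation,
        # so the adapter no longer lowercases or re-quotes schema names.
        return self.execute(f"show objects in {schema_relation}")

    def create_schema(self, relation):
        # dbt issues "create schema" for exactly the name it was given,
        # preserving the original case and quoting.
        self.execute(f"create schema if not exists {relation}")

    def drop_schema(self, relation):
        self.execute(f"drop schema if exists {relation} cascade")

    def execute(self, sql):
        raise NotImplementedError  # stand-in for the real execute()
```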
Checklist

- I have updated `CHANGELOG.md` and added information about my change to the "dbt next" section.