Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change database name for the context. #229

Closed
wants to merge 386 commits into from

Conversation

EminUZUN
Copy link

@EminUZUN EminUZUN commented Feb 14, 2023

Fix for the issue:
#228 Database name is not getting changed in each run. So this fix is getting desired database for catalog_name = '{{ database }}'

Overview

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • README.md updated and added information about my change
  • I have run changie new to create a changelog entry

hovaesco and others added 30 commits October 7, 2021 10:42
…ental-append

Add support for incremental models
dbt 0.21.0 introduced changes which dropped basic_load_csv_rows dbt-labs/dbt-core#3623. Now it's able to use the default macro default__load_csv_rows
…emental-overwrite

Document insert overwrite strategy for hive incremental models
…current_timestamp

Add support for dbt `current_timestamp()` macro
…rge-csv

Change default batch size to 1000
damian3031 and others added 14 commits January 30, 2023 16:36
Bumps [mypy](https://github.com/python/mypy) from 0.991 to 1.0.0.
- [Release notes](https://github.com/python/mypy/releases)
- [Commits](python/mypy@v0.991...v1.0.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
set the string_size of an unbound varchar to max length of a varchar
add some unit tests to cover the bound and unbound varchar parsing
Fix for the issue:
starburstdata#228
Database name is not getting changed in each run. So this fix is getting desired database for catalog_name = '{{ database }}'
@mdesmet
Copy link
Member

mdesmet commented Feb 14, 2023

The catalog name in information_schema is implicitly selected through the dbt profile.

Sources can be in the same catalog or any other catalog. I don't really see how this code would solve that.

@EminUZUN
Copy link
Author

information_schema is implictly selected yes. but database is not setted in the session. so it sees "target" database only.
Whenever you change database in session. sources file in catalog.json.
You can see below results when you generate docs with debug option.

This is result when database is not explictly set:

with tables as (
            select
                table_catalog as "table_database",
                table_schema as "table_schema",
                table_name as "table_name",
                table_type as "table_type",
                null as "table_owner"

            from source_db.INFORMATION_SCHEMA.tables
            where
                table_schema != 'information_schema'
                and
                table_schema in ('public')

        ),

        columns as (

            select
                table_catalog as "table_database",
                table_schema as "table_schema",
                table_name as "table_name",

                column_name as "column_name",
                ordinal_position as "column_index",
                data_type as "column_type",
                comment as "column_comment"

            from source_db.INFORMATION_SCHEMA.columns
            where
                table_schema != 'information_schema'
                and
                table_schema in ('public')

        ),

        table_comment as (

            select
                catalog_name as "table_database",
                schema_name as "table_schema",
                table_name as "table_name",
                comment as "table_comment"

            from system.metadata.table_comments
            where
                catalog_name = 'target_db' --database name set wrong.
                and
                schema_name != 'information_schema'
                and
                schema_name in ('public')
        )

        select *
        from tables
        join columns using ("table_database", "table_schema", "table_name")
        join table_comment using ("table_database", "table_schema", "table_name")
        order by "column_index"

This is result when database implictly set:

  with tables as (

            select
                table_catalog as "table_database",
                table_schema as "table_schema",
                table_name as "table_name",
                table_type as "table_type",
                null as "table_owner"

            from source_db.INFORMATION_SCHEMA.tables
            where
                table_schema != 'information_schema'
                and
                table_schema in ('public')

        ),

        columns as (

            select
                table_catalog as "table_database",
                table_schema as "table_schema",
                table_name as "table_name",

                column_name as "column_name",
                ordinal_position as "column_index",
                data_type as "column_type",
                comment as "column_comment"

            from source_db.INFORMATION_SCHEMA.columns
            where
                table_schema != 'information_schema'
                and
                table_schema in ('public')

        ),

        table_comment as (

            select
                catalog_name as "table_database",
                schema_name as "table_schema",
                table_name as "table_name",
                comment as "table_comment"

            from system.metadata.table_comments
            where
                catalog_name = 'source_db' --database name set correctly.
                and
                schema_name != 'information_schema'
                and
                schema_name in ('public')
        )

        select *
        from tables
        join columns using ("table_database", "table_schema", "table_name")
        join table_comment using ("table_database", "table_schema", "table_name")
        order by "column_index"

@mdesmet
Copy link
Member

mdesmet commented Feb 14, 2023

What do you mean by

Whenever you change database in session

The database or catalog is set in the dbt profile and is static.

@mdesmet
Copy link
Member

mdesmet commented Feb 14, 2023

@@ -1,5 +1,6 @@
{% macro trino__get_catalog(information_schema, schemas) -%}
{%- call statement('catalog', fetch_result=True) -%}
{% set database = information_schema.database %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please inline this expression in the query itself

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one query in file. Where should I move it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just do it in line 51 (or line 50 in original file) catalog_name = '{{ information_schema.database }}'

@mdesmet
Copy link
Member

mdesmet commented Feb 14, 2023

@EminUZUN : Please also add a changelog entry.

@@ -0,0 +1,7 @@
kind: Fixes
body: 'Set database name in catalog.sql for correctly filtering table_comments '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
body: 'Set database name in catalog.sql for correctly filtering table_comments '
body: 'Include comments of sources or models from other catalogs in catalog.json artifact'

@@ -1,5 +1,6 @@
{% macro trino__get_catalog(information_schema, schemas) -%}
{%- call statement('catalog', fetch_result=True) -%}
{% set database = information_schema.database %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just do it in line 51 (or line 50 in original file) catalog_name = '{{ information_schema.database }}'

@mdesmet
Copy link
Member

mdesmet commented Feb 14, 2023

@EminUZUN : Can you also have a look at the tests?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.