Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exposures command references wrong models due to same alias #282

Closed
jefersonmsantos opened this issue Oct 9, 2024 · 14 comments · Fixed by #283
Closed

Exposures command references wrong models due to same alias #282

jefersonmsantos opened this issue Oct 9, 2024 · 14 comments · Fixed by #283
Labels
bug Something isn't working

Comments

@jefersonmsantos
Copy link

We use dbt alias on our dbt project. For example, below model has name finance__subscriptions but its alias is subscriptions. The table created on the warehouse will be subscriptions and Metabase will read this table with name subscriptions.

{{ config(
   alias='subscriptions',
   materialized = 'table',
  )
}}

When we use command dbt-metabase exposures, charts / dashboards / queries using this model will be identified with depends on by the name of the table on Metabase, that is subscriptions.

However, models are loaded from manifest_file on _exposures.py by the name of the model, that is finance__subscriptions:

models = self.manifest.read_models()

ctx = self.__Context(
    model_refs={m.name.lower(): m.ref for m in models if m.ref},
    table_names={t["id"]: t["name"] for t in self.metabase.get_tables()},
)

This way, they will not match and this finance__subscription model will not be referenced as a dependency. Also, we have a source with name subscriptions, that matches the alias of the other model. So, on this situation, exposures are referencing a source instead of the correct model.

Is there anyway to reference not only table_name, but also schemas to avoid these mistakes? Is there anyway to consider alias instead of names?

@gouline
Copy link
Owner

gouline commented Oct 9, 2024

Make a minimal example (e.g. only 2 models), specifying your expected and actual behaviour, then I'll have a look.

@jefersonmsantos
Copy link
Author

jefersonmsantos commented Oct 11, 2024

Environment

dbt-core==1.8.4
dbt-bigquery==1.8.2
dbt-metabase==1.3.4

Consider a dbt project with this structure, using Bigquery adapter:

image

Contents of stg_stripe__subscriptions is:

{{ config(
   alias='stg__subscriptions',
   materialized = 'table',
)
}}

select *
from {{ source('stripe','subscriptions') }}

Contents of finance__subscriptions is:

{{ config(
   alias='suscriptions',
   materialized = 'table',
)
}}

select *
from {{ ref('stg_stripe__subscriptions') }}

Contents of _stripe_sources is:

sources:
  - name: stripe
        dataset: raw_stripe
        description: Incoming Stripe data
        loader: airbyte -> gcs
        tables:
            - name: subscriptions

Schema configuration for these datasets on project.yml is:

staging:
      stripe:
        +schema: stg_stripe

marts:
      finance:
        +analytics_finance

This will produce this dataset.table on bigquery:
image

Create a subscriptions report on metabase named Finance Subscriptions, reading from analytics_finance.subscriptions
image

Actual Behavior

When run dbt-metabase exposures, exposures.yml generated has content, with depends_on = source('stripe', 'subscriptions'):

version: 2
exposures:
  - name: finance_subscriptions
    label: Finance Subscriptions
    description: '### Visualization: Table


      No description provided in Metabase


      #### Metadata


      Metabase ID: __1__


      Created On: __2024-10-11T17:21:00.719718Z__'
    type: analysis
    url: http://localhost:9090/card/1
    maturity: medium
    depends_on:
      - source('stripe', 'subscriptions')

Expected Behavior

We expect the dependency for Finance Subscriptions dashboard to be ref('stg_stripe__subscriptions'):

version: 2
exposures:
  - name: finance_subscriptions
    label: Finance Subscriptions
    description: '### Visualization: Table


      No description provided in Metabase


      #### Metadata


      Metabase ID: __1__


      Created On: __2024-10-11T17:21:00.719718Z__'
    type: analysis
    url: http://localhost:9090/card/1
    maturity: medium
    depends_on:
      - ref('stg_stripe__subscriptions')

@gouline gouline added the bug Something isn't working label Oct 12, 2024
@gouline gouline changed the title [BUG] - Exposures command references wrong models due to same alias Exposures command references wrong models due to same alias Oct 12, 2024
@gouline
Copy link
Owner

gouline commented Oct 14, 2024

As you noticed in #286 (comment), clashing table names are not supported. Dependencies are manually parsed from Metabase cards, including native SQL queries, and there's no way to distinguish which table select * from subscriptions is referring to.

Here's the code, if you're interested

def __extract_card_exposures(
self,
ctx: __Context,
card: Optional[Mapping],
) -> Mapping:
"""Extracts exposures from Metabase questions."""
depends = set()
native_query = ""
if card:
query = card.get("dataset_query", {})
if query.get("type") == "query":
# Metabase GUI derived query
query_source = query.get("query", {}).get(
"source-table", card.get("table_id")
)
if str(query_source).startswith("card__"):
# Handle questions based on other questions
depends.update(
self.__extract_card_exposures(
ctx,
card=self.metabase.find_card(
uid=query_source.split("__")[-1]
),
)["depends"]
)
elif query_source in ctx.table_names:
# Normal question
source_table = ctx.table_names.get(query_source)
if source_table:
source_table = source_table.lower()
_logger.info("Extracted model '%s' from card", source_table)
depends.add(source_table)
# Find models exposed through joins
for join in query.get("query", {}).get("joins", []):
join_source = join.get("source-table")
if str(join_source).startswith("card__"):
# Handle questions based on other questions
depends.update(
self.__extract_card_exposures(
ctx,
card=self.metabase.find_card(
uid=join_source.split("__")[-1]
),
)["depends"]
)
continue
# Joined model parsed
joined_table = ctx.table_names.get(join_source)
if joined_table:
joined_table = joined_table.lower()
_logger.info("Extracted model '%s' from join", joined_table)
depends.add(joined_table)
elif query.get("type") == "native":
# Metabase native query
native_query = query["native"].get("query")
ctes: MutableSequence[str] = []
# Parse common table expressions for exclusion
for matched_cte in re.findall(_CTE_PARSER, native_query):
ctes.extend(group.lower() for group in matched_cte if group)
# Parse SQL for exposures through FROM or JOIN clauses
for sql_ref in re.findall(_EXPOSURE_PARSER, native_query):
# Grab just the table / model name
parsed_model = sql_ref.split(".")[-1].strip('"').lower()
# Scrub CTEs (qualified sql_refs can not reference CTEs)
if parsed_model in ctes and "." not in sql_ref:
continue
# Verify this is one of our parsed refable models so exposures dont break the DAG
if not ctx.model_refs.get(parsed_model):
continue
if parsed_model:
_logger.info(
"Extracted model '%s' from native query", parsed_model
)
depends.add(parsed_model)
return {
"depends": depends,
"native_query": native_query,
}

@gouline
Copy link
Owner

gouline commented Oct 16, 2024

I'm experimenting with an enhancement to allow clashing table names in #287.

@jefersonmsantos Can you try installing from this branch and test it out with your project?

pip install git+ssh://[email protected]/gouline/dbt-metabase.git@fully-qualified-exposure-deps

@jefersonmsantos
Copy link
Author

jefersonmsantos commented Oct 16, 2024

@gouline With this branch, all depends_on field on the exposures.yml generated are empty:

version: 2
exposures:
  - name: finance_subscriptions
    label: Finance Subscriptions
    description: '### Visualization: Table


      No description provided in Metabase


      #### Metadata


      Metabase ID: __1__


      Created On: __2024-10-11T17:21:00.719718Z__'
    type: analysis
    url: http://localhost:9090/card/1
    maturity: medium
    depends_on: []

@gouline
Copy link
Owner

gouline commented Oct 16, 2024

Try reinstalling and running again, you should have a depends.yaml file appear in the same directory as your exposures. Post any relevant elements from models and depends here to figure out what doesn't match (e.g. database or schema).

For example:

models:
  dbtmetabase.public.stg_payments: ref('stg_payments')
depends:
  - dbtmetabase.public.stg_payments

I haven't used dbt with BigQuery, not sure what it would use for database and schema in the manifest (possibly project and dataset). Depending on that, I will need to see how to access it in Metabase API.

@1210yuichi0
Copy link

1210yuichi0 commented Dec 6, 2024

@gouline
I have confirmed the functionality of the fully-qualified-exposure-deps repository, and it produced the expected results.

Environment:

dbt-bigquery==1.8.2
dbt-core==1.8.4

Example: payments.sql

{{ config(alias='transactions') }}

select * from {{ ref('stg_payments') }}

Generated depends.yaml

models:
  my_project.work.transactions: ref('payments')
  my_project.work.customers: ref('customers')
  my_project.work.orders: ref('orders')
  my_project.work.stg_customers: ref('stg_customers')
  my_project.work.stg_payments: ref('stg_payments')
  my_project.work.stg_orders: ref('stg_orders')
depends:
  - my_project.work.transactions

Generated exposures.yml

version: 2
exposures:
  - name: test
    label: test
    description: "### Visualization: Table\n\nNo description provided in Metabase\n\
      \n#### Query\n\n```\n  SELECT *\n  FROM work.transactions\n```\n\n#### Metadata\n\
      \nMetabase ID: __xxx__\n\nCreated On: __2024-11-30T14:34:53.501679Z__"
    type: analysis
    url: xxxxxxx
    maturity: medium
    owner:
      name: xxxxxxx
      email: xxxxxxx
    depends_on:
      - ref('payments')
    meta:
      average_query_time: '0:03.428'
      last_used_at: '2024-12-06T03:45:29.091341Z'

@gouline
Copy link
Owner

gouline commented Dec 6, 2024

Thanks for confirming! I'm nervous about deploying this globally, so I will either provide it as a feature flag or fall back to it whenever conflicting model names are detected.

@1210yuichi0
Copy link

Thank you ! Since I always use alias_name, it would be incredibly helpful if this functionality could be supported.

@gouline
Copy link
Owner

gouline commented Dec 14, 2024

@yuichi-github-dev Give this another go before I release (I don't have access to BigQuery for testing):

pip install git+ssh://[email protected]/gouline/dbt-metabase.git@d45c0df

@1210yuichi0
Copy link

@gouline
I tested this with a BigQuery + Metabase environment, but the depends_on values could not be retrieved with this version.

Previous version

dbt-bigquery              1.8.0
dbt-core                  1.8.3
dbt-metabase              1.3.9.dev3+ga2b0617
version: 2
exposures:
  - name: test
    label: test
    description: "### Visualization: Table\n\nNo description provided in Metabase\n\
      \n#### Query\n\n```\n  SELECT *\n  FROM work.transactions\n```\n\n#### Metadata\n\
      \nMetabase ID: __218__\n\nCreated On: __2024-11-30T14:34:53.501679Z__"
    type: analysis
    url: xxxxxxx
    maturity: medium
    owner:
      name: xxxxxxx
      email: xxxxxxx
    depends_on:
      - ref('payments')
    meta:
      average_query_time: '0:02.805'
      last_used_at: '2024-12-16T00:41:28.263506Z'

Current version

dbt-bigquery              1.8.0
dbt-core                  1.8.3
dbt-metabase              1.3.12.dev6+gd45c0df
version: 2
exposures:
  - name: test
    label: test
    description: "### Visualization: Table\n\nNo description provided in Metabase\n\
      \n#### Query\n\n```\n  SELECT *\n  FROM work.transactions\n```\n\n#### Metadata\n\
      \nMetabase ID: __218__\n\nCreated On: __2024-11-30T14:34:53.501679Z__"
    type: analysis
    url: xxxxxxx
    maturity: medium
    owner:
      name: xxxxxxx
      email: xxxxxxx
    depends_on: []
    meta:
      average_query_time: '0:02.805'
      last_used_at: '2024-12-16T00:41:28.263506Z'

@gouline
Copy link
Owner

gouline commented Dec 16, 2024

Thanks for checking, this should hopefully resolve it:

pip install git+ssh://[email protected]/gouline/dbt-metabase.git@f6a1fa3

If that doesn't work, can you please go to http://<your-metabase-url>/api/database and post the details clause for your BigQuery database (masking any confidential values).

@1210yuichi0
Copy link

1210yuichi0 commented Dec 16, 2024

@gouline
Thank you! The updated version worked as expected.

Here are the details of my current setup:

dbt-core                  1.8.3
dbt-bigquery              1.8.0
dbt-metabase              1.3.12.dev7+gf6a1fa3

Generated exposure.yml :

version: 2
exposures:
  - name: test
    label: test
    description: "### Visualization: Table\n\nNo description provided in Metabase\n\
      \n#### Query\n\n```\n  SELECT *\n  FROM work.transactions\n```\n\n#### Metadata\n\
      \nMetabase ID: __218__\n\nCreated On: __2024-11-30T14:34:53.501679Z__"
    type: analysis
    url: xxxxxxx
    maturity: medium
    owner:
      name: xxxxxxx
      email: xxxxxxx
    depends_on:
      - ref('payments')
    meta:
      average_query_time: '0:02.805'
      last_used_at: '2024-12-16T00:41:28.263506Z'

@gouline
Copy link
Owner

gouline commented Dec 16, 2024

Great, released in 1.4.0!

Tests only cover PostgreSQL and we confirmed BigQuery here, but it should theoretically work with MySQL, Snowflake, Redshift, Databricks, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants