[CT-2273] [Feature] Allowing ref (or similar) for analysis files #7127

Closed
3 tasks done
friendofasquid opened this issue Mar 6, 2023 · 6 comments
Labels
awaiting_response, enhancement (New feature or request), stale (Issues that have gone stale)

Comments

@friendofasquid

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Sometimes, we want to layer one analysis on top of another. In that case, it makes sense to allow a reference to an analysis file, essentially treating it like an ephemeral model.

Describe alternatives you've considered

  • Moving analysis to an ephemeral model. This works fine, but the intention is to run an analysis, not create a model.

Who will this benefit?

Anyone who uses the analyses folder within dbt.

Are you interested in contributing this feature?

No

Anything else?

No response

@friendofasquid friendofasquid added enhancement New feature or request triage labels Mar 6, 2023
@github-actions github-actions bot changed the title [Feature] Allowing ref (or similar) for analysis files [CT-2273] [Feature] Allowing ref (or similar) for analysis files Mar 6, 2023
@dbeatty10 dbeatty10 self-assigned this Mar 6, 2023
@dbeatty10
Contributor

Thanks for raising this idea @friendofasquid !

Follow-up question

Supposing it is possible to reference an analysis file like {{ analysis('my_analysis') }}, what might a simple concrete usage look like?

Another alternative

I added another idea to the bottom of your list of alternatives:

  • move the analysis code to an ephemeral model
  • move the analysis code to a macro

@dbeatty10 dbeatty10 removed their assignment Mar 6, 2023
@NumberPiOso

With reverse ETL tools, we want to upload analyses (specific subsets of data) into different systems.

Making analyses "refable" would allow us to document exposures easily.

model -> analysis -> exposure

@dbeatty10
Contributor

As this issue points out, one big difference between ephemeral models and analyses is that the first is refable and the second is not.

Additional differences (that are not relevant for this discussion per se):

  • models can have tests whereas analyses cannot
  • an ephemeral model is translated to a CTE (a WITH clause) when it is ref'd within other models

Summary

After playing around a bit and researching the history of analyses and ephemeral models, it feels like they are relatively close to being isomorphic. So I'm attracted to the idea of effectively (or actually) promoting analyses to be an alias of the ephemeral materialization. Doing so would allow them to be referenced (as well as have tests defined).

Whether or not we ultimately go that direction, it appears to me like the desired functionality is already effectively possible today, and it isn't too hard to convert an existing project to get the behavior akin to the proposed feature.

If you haven't already done something similar, wanna give the instructions below a shot and provide feedback on what you do/don't like about it?

An approach that works today

@friendofasquid mentioned the alternative of moving analysis to an ephemeral model so it can be used in a ref. At least in the near-term, this feels like the way to go!

Playing around a bit, it appears relatively simple to upgrade all your analyses so they can be ref'd:

  1. Move your analyses folder to be inside your models folder
  2. Configure the default materialization of ephemeral for that folder within dbt_project.yml
  3. Update the YAML configuration of analyses to be models instead

Differences

The most crucial differences are that the logic in these files can now be used in a ref and it can also have tests.

Two other differences that I'm seeing when converting analyses to refable analyses (aka ephemeral models):

  • dbt compile writes the compiled files to a different output subfolder within the target directory
  • dbt docs serve lists the files under the models subfolder rather than the analyses subfolder

Other differences are surely present within the manifest at target/manifest.json, but I didn't examine them.

Example

Thanks for your use-case @NumberPiOso !

I've got an example repo here with step-by-step instructions of converting everything within an analyses folder so that it can be used as a ref within an exposure.

The final result shows the exposure depending on my_even_ids, which was formerly an analysis:

[Image: the exposure depending on my_even_ids]

Details

Suppose you have a dbt_project.yml file like this:

name: "my_dbt_project"
version: "1.0.0"
config-version: 2
profile: "sandcastle-duckdb"

Then you add this to the end of dbt_project.yml:

models:
  my_dbt_project:
    refable_analyses:
      materialized: ephemeral

And finally you move & rename your analyses folder to models/refable_analyses.
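
For reference, the complete dbt_project.yml after this change might look like the sketch below. It reuses the project, profile, and folder names from this example. The only difference is the `+` prefix on the config key, which dbt accepts as a way to mark a key as a config rather than a folder name.

```yaml
# dbt_project.yml (complete file after the change)
name: "my_dbt_project"
version: "1.0.0"
config-version: 2
profile: "sandcastle-duckdb"

models:
  my_dbt_project:
    # Applies to every model under models/refable_analyses/
    refable_analyses:
      # "+materialized" behaves the same as "materialized" here;
      # the "+" only marks the key as a config
      +materialized: ephemeral
```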

Note: if you previously had an analyses/_analysis.yml file, you'll want to update analyses: to models: within the YAML once you've moved it into the models subdirectory.
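
As a rough sketch of that YAML change, the moved properties file could look like this. The file name and my_even_ids come from this example; the column name `id`, the description, and the choice of tests are assumptions for illustration only.

```yaml
# models/refable_analyses/_analysis.yml
version: 2

# Before the move, this top-level key was "analyses:"
models:
  - name: my_even_ids
    description: Formerly an analysis, now an ephemeral model
    columns:
      - name: id  # hypothetical column name
        tests:    # newer dbt versions also accept "data_tests:"
          - unique
          - not_null
```

Because the file now describes models, the built-in generic tests above become available, which was not possible while the entries lived under analyses.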

Now each analysis can be referenced within an exposure like this:

models/_exposures.yml

version: 2

exposures:
  - name: my_even_dashboard
    description: My dashboard
    type: dashboard
    owner:
      name: Somebody Somewhere
      email: [email protected]
    
    depends_on:
      - ref('my_even_ids')

Interested to hear your feedback!

@dbeatty10 dbeatty10 removed their assignment Mar 20, 2023
@NumberPiOso

Thanks for outlining the pros/cons of the alternative. After reviewing it for some time, I agree with this idea.

Thank you for taking the time to provide an overview of the pros and cons of the alternative. After considering your proposal, I agree that it would be beneficial for queries related to dbt to be explicitly included in a model, rather than an analysis. This will ensure that all transformations are clear and visible to everyone through documentation. Additionally, I concur that any model or query being used in an external tool, such as the reverse ETL tool, should be thoroughly tested.

The ephemeral materialization will prevent people from using it (which is something that I definitely like), but I think that including these models in the documentation will mislead users into thinking that they can use them.

For my specific use case, I will try to document the models in a special way to prevent their usage instead (such as changing the colors in the UI to look like an exposure instead of a model) or marking them with a special tag.
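
A minimal sketch of what that could look like in dbt_project.yml, assuming dbt 1.3 or later for the node_color setting. The tag name `analysis_only` and the color value are hypothetical placeholders, not something taken from the example repo.

```yaml
# dbt_project.yml (models section only)
models:
  my_dbt_project:
    refable_analyses:
      +materialized: ephemeral
      # Hypothetical tag marking these models as not meant for reuse
      +tags: ["analysis_only"]
      +docs:
        # Hypothetical color so these nodes stand out in the docs DAG
        node_color: "#e6b530"
```

With a tag like this in place, the converted analyses can be listed on their own with a selector such as `dbt ls --select tag:analysis_only`.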

Thank you once again for your time and effort in presenting your proposal.

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Jun 26, 2023