Store the unresolved form of source database representations #2744

Closed
jtcohen6 opened this issue Sep 9, 2020 · 8 comments
Labels
enhancement New feature or request stale Issues that have gone stale state Stateful selection (state:modified, defer)

Comments

@jtcohen6
Contributor

jtcohen6 commented Sep 9, 2020

extension of #2713

Describe the feature

Let's say a project has a source defined like:

sources:
  - name: my_postgres_db
    database: "{{ 'raw' if target.name == 'prod' else 'raw_sampled' }}"

Today, if running in dev and comparing to a manifest generated by a prod target, state:modified+ will include all models that depend on source:my_postgres_db because the database differs. Ideally, we'd be able to tell that the unresolved database (Jinja statement) is identical.
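To make the distinction concrete, here is a minimal toy sketch of the two comparison strategies. It does not use dbt's real manifest classes: `SourceRecord`, the `unresolved_database` field, and `render_database` (a stand-in for Jinja rendering) are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass

# The unresolved (pre-rendering) expression from the source definition above.
UNRESOLVED_DATABASE = "{{ 'raw' if target.name == 'prod' else 'raw_sampled' }}"


def render_database(target_name: str) -> str:
    """Hypothetical stand-in for rendering UNRESOLVED_DATABASE with Jinja."""
    return "raw" if target_name == "prod" else "raw_sampled"


@dataclass(frozen=True)
class SourceRecord:
    """Toy manifest entry; not dbt's actual schema."""
    name: str
    database: str             # resolved value, which is what gets compared today
    unresolved_database: str  # hypothetical extra field along the lines proposed here


def build_source(target_name: str) -> SourceRecord:
    return SourceRecord(
        name="my_postgres_db",
        database=render_database(target_name),
        unresolved_database=UNRESOLVED_DATABASE,
    )


def modified_by_resolved(current: SourceRecord, previous: SourceRecord) -> bool:
    # Flags the source whenever the rendered database differs.
    return current.database != previous.database


def modified_by_unresolved(current: SourceRecord, previous: SourceRecord) -> bool:
    # Flags the source only when the definition itself was edited.
    return current.unresolved_database != previous.unresolved_database


prod = build_source("prod")
dev = build_source("dev")
print(modified_by_resolved(dev, prod))    # True: 'raw_sampled' vs 'raw'
print(modified_by_unresolved(dev, prod))  # False: same Jinja expression
```

Under these assumptions, only an edit to the YAML definition itself would mark the source (and its downstream models) as modified.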

I'm most interested in sources and database representations (database, schema, identifier). If it made sense, though, we could broaden this issue to the matter of storing unresolved:

Describe alternatives you've considered

  • The proposal for state:modified subselectors gives decent user-side answers by letting people "switch off" the intersection of state:modified.database_representations and specific sources.

Who will this benefit?

  • Users who develop, test, and deploy against genuinely different source datasets. While this isn't something we recommend in the general case, it's the norm at organizations of a certain size + maturity.
@jtcohen6 jtcohen6 added enhancement New feature or request state Stateful selection (state:modified, defer) labels Sep 9, 2020
@fabrice-etanchaud

fabrice-etanchaud commented Sep 15, 2020

You are totally right, @jtcohen6. That's exactly our use case here at @maif-vie (life insurance). We have a notion of environment (we call it a 'track') that groups data sources for a given project/feature. I ended up adding an extra credential parameter called 'environment' to use as a prefix for the sources' databases. That way I can configure the source version and the target database separately. This enhancement would be a great help here and, as you already said, in 'mature' organizations.
Best regards from the French west coast ;-)

@jtcohen6
Contributor Author

From #3201: It's possible to define sources that are implicitly different across environments, e.g. on Snowflake by excluding the database:

sources:
  - name: etl
    tables:
      - name: etl_sample_table
        identifier: sample_table

Is really the same as:

sources:
  - name: etl
    database: "{{ target.database }}"
    tables:
      - name: etl_sample_table
        identifier: sample_table

So running with a different target.database will result in the source being marked modified. Any resolution to this issue should also seek to resolve that case.
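One way to picture a fix that covers this implicit case is to normalize an omitted `database` to its implied expression before comparing. The sketch below is a toy model under that assumption; `DEFAULT_DATABASE_EXPR`, `unresolved_database`, and `resolved_database` are hypothetical names, not dbt functions, and the target database names are made up.

```python
from typing import Optional

# Assumption: an omitted database behaves like this expression,
# as described in the comment above.
DEFAULT_DATABASE_EXPR = "{{ target.database }}"


def unresolved_database(configured: Optional[str]) -> str:
    """Return the unresolved form, making the implicit default explicit."""
    return DEFAULT_DATABASE_EXPR if configured is None else configured


def resolved_database(configured: Optional[str], target_database: str) -> str:
    """Toy resolver that only understands the default expression or a literal."""
    expr = unresolved_database(configured)
    return target_database if expr == DEFAULT_DATABASE_EXPR else expr


implicit = None                      # first definition: no database key
explicit = "{{ target.database }}"   # second definition: explicit expression

# Resolved values differ across targets, so a resolved comparison flags the source.
resolved_differs = (
    resolved_database(implicit, "analytics_prod")
    != resolved_database(implicit, "analytics_dev")
)

# Normalized unresolved forms match, both across targets and across the two spellings.
unresolved_differs = unresolved_database(implicit) != unresolved_database(explicit)

print(resolved_differs)    # True
print(unresolved_differs)  # False
```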

@jtcohen6 jtcohen6 modified the milestones: Margaret Mead, Oh-Twenty-One Apr 13, 2021
@jtcohen6
Contributor Author

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@katieclaiborne

Would it be possible to revive this issue?

Our (BigQuery) project defines sources that are implicitly different across environments.

We're using state:modified sub-selectors as a workaround, but because this is the general pattern for our sources, creating a YAML selector with the intersection of each one is unsustainable.

@wjhrdy

wjhrdy commented Dec 21, 2023

Would it be possible to revive this issue?

Our (BigQuery) project defines sources that are implicitly different across environments.

We're using state:modified sub-selectors as a workaround, but because this is the general pattern for our sources, creating a YAML selector with the intersection of each one is unsustainable.

Can I ask how exactly you do this?

I have a source that is defined like this.

version: 2
sources:
  - name: data1
    database: data
    schema: "{{ target.name }}_schema"
    tables:

This source gives me false positives when I compare across target environments.

How would I write a YAML selector, or just a CLI selector, that selects all changed models but ignores the false positives caused by this target-aware source definition?

@katieclaiborne

Sure thing! Here's what our selectors.yml file looks like.

selectors:
  - name: modified_plus
    description: >
      All state:modified subselectors except relation (database/schema/alias), and their children
    definition:
      union:
        - method: state
          value: modified.body
          children: true
        - method: state
          value: modified.configs
          children: true
        - method: state
          value: modified.persisted_descriptions
          children: true
        - method: state
          value: modified.macros
          children: true
        - method: state
          value: modified.contract
          children: true
        - exclude:
            - method: package
              value: dbt_project_evaluator
            - method: fqn
              value: dbt_project_evaluator_exceptions

@wjhrdy

wjhrdy commented Jan 12, 2024

Thanks for this! I've created our own version of it, in which you can specify which sources are target-aware.

selectors:
  - name: target_aware_sources
    description: >
      All sources that are target-aware
    definition:
      union:
        - "@source:source1"
        - "@source:source2"
        - "@source:source3"
        - "@source:source4"

  - name: modified_minus_relation
    description: >
      All state:modified subselectors except relation (database/schema/alias), and their children
    definition:
      union:
        - method: state
          value: modified.body
          children: true
        - method: state
          value: modified.configs
          children: true
        - method: state
          value: modified.persisted_descriptions
          children: true
        - method: state
          value: modified.macros
          children: true
        - method: state
          value: modified.contract
          children: true

  - name: modified_target_aware
    description: >
      All state:modified subselectors except relation (database/schema/alias), and their children intersected with sources that need this
    definition:
      intersection:
        - method: selector
          value: target_aware_sources
        - method: selector
          value: modified_minus_relation

  - name: modified_minus_target_aware
    description: >
      All state:modified minus target_aware_sources
    definition:
      method: state
      value: modified
      children: true
      exclude:
        - method: selector
          value: target_aware_sources

  - name: modified_plus
    description: >
      All state:modified accounting for target_aware_sources
    definition:
      union:
        - method: selector
          value: modified_minus_target_aware
        - method: selector
          value: modified_target_aware
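The way these selectors compose can be read as plain set algebra. The sketch below is only an illustration of that logic, not how dbt evaluates selectors internally; the node names and the contents of each starting set are hypothetical.

```python
# Hypothetical selection results for a small project.
# Nodes picked up by the "@source:..." entries in target_aware_sources:
target_aware = {"stg_orders", "stg_events", "fct_orders"}

# Nodes matched by state:modified+ (includes relation-only false positives):
modified_any = {"stg_orders", "stg_events", "fct_orders", "dim_customers"}

# Nodes matched by modified_minus_relation
# (body, configs, persisted_descriptions, macros, contract, plus children):
modified_minus_relation = {"stg_events", "dim_customers"}

# intersection: target-aware nodes with a real (non-relation) change
modified_target_aware = target_aware & modified_minus_relation

# exclude: everything modified, minus anything touched by target-aware sources
modified_minus_target_aware = modified_any - target_aware

# union: the final selection
modified_plus = modified_minus_target_aware | modified_target_aware

print(sorted(modified_plus))  # ['dim_customers', 'stg_events']
```

In this made-up example, `stg_orders` and `fct_orders` were flagged only because the rendered source relation differs between targets, so they drop out, while `stg_events` stays because it also has a non-relation change.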

4 participants