New model selection syntax #550

Closed
drewbanin opened this issue Oct 4, 2017 · 2 comments
Labels
enhancement New feature or request

Comments

@drewbanin
Contributor

drewbanin commented Oct 4, 2017

Feature

Feature description

dbt's current model selection syntax with --models and --exclude is both ambiguous and constrained.

Ambiguity:
Given:

models/
  - orders.sql
  - orders/
    - order_items.sql
    - order_discounts.sql

This command:

dbt run --model orders

will run orders, order_items, and order_discounts, because the selector orders matches both the orders model and the orders/ directory.
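The ambiguity can be reproduced with a small sketch. The matching rule below is an assumption made for illustration (a selector matches either a model name or any directory in the model's path); it is not dbt's actual code, and `MODEL_FILES` and `select_ambiguous` are hypothetical names.

```python
from pathlib import PurePosixPath

# Project layout from the example above, relative to models/.
MODEL_FILES = [
    "orders.sql",
    "orders/order_items.sql",
    "orders/order_discounts.sql",
]


def select_ambiguous(selector, files):
    """Match a selector against model names and directory names alike.

    Assumed matching rule for illustration only, not dbt's implementation.
    """
    selected = []
    for f in files:
        path = PurePosixPath(f)
        model_name = path.stem          # "orders/order_items.sql" -> "order_items"
        directories = path.parts[:-1]   # ("orders",) for files nested under orders/
        if selector == model_name or selector in directories:
            selected.append(model_name)
    return selected


print(select_ambiguous("orders", MODEL_FILES))
# prints ['orders', 'order_items', 'order_discounts']
```

Under this rule there is no way to ask for the orders model alone, which is the problem described here.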

Filenames
dbt should be able to run files by their name, enabling dbt to work in concert with tools like ls and find, as described in #454

Multi-resource runs
dbt should be able to run different types of resources in the same invocation. dbt should support the specification of resource types, including:

  • models
  • seeds
  • tests (both data tests and schema tests)
  • archives (eventually)

This might also mean that dbt run should in fact run all resource types by default. That would be a pretty big change, so it might be worth adding a new subcommand and deprecating dbt run, or keeping the run subcommand and preserving the existing behavior.

Model Selection

In investigating different options for a new graph selection syntax, it became clear that the CLI is not well suited for rich selection syntaxes. I think we should punt on the broader question of orchestration (instead, handling that with user-space Python), and instead make small tweaks to --models to make it more flexible without overcomplicating things.
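As a rough idea of what "user-space Python" orchestration could look like, here is a minimal sketch that chains dbt invocations from a script. The helper names `build_commands` and `run_all` are hypothetical; the only dbt pieces assumed are the run and test subcommands and the --models and --target flags that appear elsewhere in this thread.

```python
import subprocess


def build_commands(models=None, target=None):
    """Return the argv lists for a run-then-test sequence."""
    options = []
    if models:
        options += ["--models", *models]
    if target:
        options += ["--target", target]
    return [["dbt", "run", *options], ["dbt", "test", *options]]


def run_all(commands, runner=subprocess.call):
    """Run commands in order and stop at the first non-zero exit code."""
    for argv in commands:
        code = runner(argv)
        if code != 0:
            return code
    return 0


if __name__ == "__main__":
    # Example: build and test only the models tagged "nightly".
    raise SystemExit(run_all(build_commands(models=["tag:nightly"])))
```

Anything more elaborate (conditional steps, retries, set arithmetic on selections) can then live in ordinary Python rather than in CLI syntax.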

Proposed implementation

This section outlines the changes that we should make to --models today, and does not tackle broader questions like multi-resource runs and orchestration. The implementation below proposes a --models syntax that supports selection by model attributes.

These attributes include:

  • tags (not yet built)
  • sources (not yet built)

Examples:

# 1. Run all models with the tag "nightly"
dbt run --models tag:nightly

# 2. Run all models sourced from Snowplow data
dbt run --models source:snowplow

# 3. Run all models sourced from Snowplow or Shopify data, and their children
dbt run --models source:snowplow+ source:shopify+

# 4. Run all nightly models (and children), excluding anything built from Snowplow data
dbt run --models tag:nightly+ --exclude source:snowplow

I think this strikes a happy medium between ease of implementation and capability. There are certainly reasonable queries that this syntax does not support, but those are better handled through code as discussed above.
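To make the proposed semantics concrete, here is a minimal sketch of how selectors such as tag:nightly+ could be parsed and resolved against a toy graph. Everything in it is hypothetical: the `MODELS` and `CHILDREN` data, the function names, the choice to treat a bare selector as a model name, and the choice to treat `source` as an attribute attached directly to a model. It is not dbt's implementation.

```python
from collections import namedtuple

Selector = namedtuple("Selector", ["attribute", "value", "include_children"])


def parse_selector(raw):
    """Split e.g. 'tag:nightly+' into attribute, value, and a children flag."""
    include_children = raw.endswith("+")
    if include_children:
        raw = raw[:-1]
    if ":" in raw:
        attribute, value = raw.split(":", 1)
    else:
        attribute, value = "name", raw  # assumption: bare selectors are model names
    return Selector(attribute, value, include_children)


# Hypothetical toy project: attributes per model and parent -> children edges.
MODELS = {
    "snowplow_events": {"tag": {"nightly"}, "source": {"snowplow"}},
    "shopify_orders": {"tag": {"nightly"}, "source": {"shopify"}},
    "sessions": {"tag": set(), "source": set()},
    "revenue": {"tag": set(), "source": set()},
}
CHILDREN = {
    "snowplow_events": ["sessions"],
    "shopify_orders": ["revenue"],
    "sessions": [],
    "revenue": [],
}


def descendants(name):
    """Collect every model downstream of the given model."""
    found, stack = set(), list(CHILDREN[name])
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(CHILDREN[node])
    return found


def select(raw_selectors):
    """Union of all models matched by any selector, plus children if requested."""
    chosen = set()
    for sel in map(parse_selector, raw_selectors):
        for name, attrs in MODELS.items():
            if sel.attribute == "name":
                matched = name == sel.value
            else:
                matched = sel.value in attrs.get(sel.attribute, set())
            if matched:
                chosen.add(name)
                if sel.include_children:
                    chosen |= descendants(name)
    return chosen


def resolve(models, exclude=()):
    """Apply --models, then subtract whatever --exclude matches."""
    return sorted(select(models) - select(exclude))


# Example 4 above: --models tag:nightly+ --exclude source:snowplow
print(resolve(["tag:nightly+"], ["source:snowplow"]))
# prints ['revenue', 'sessions', 'shopify_orders']
```

Note that in this toy version the exclusion removes only the model directly tagged with the snowplow source; whether --exclude source:snowplow should also drop downstream models such as sessions is a design decision the sketch does not settle.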

@drewbanin changed the title from "remove ambiguity from --models argument" to "Rethink --models selection syntax" on Jun 28, 2018
@drewbanin changed the title from "Rethink --models selection syntax" to "New model selection syntax" on Sep 18, 2018
@drewbanin added the enhancement and estimate: 8 labels and removed the estimate: 8 label on Sep 19, 2018
This was referenced Sep 19, 2018
@mikekaminsky
Contributor

I like this idea! Some speculative ideas below:

I'd love to be able to do something like

dbt run test --model customers

Which would run the customers model and then test it (crucially, only paying the startup parsing cost once).

I assume that a common pattern for dbt deployed in production is to do something like:

dbt seed && dbt run && dbt test

Which is annoying because it parses all of the templating and builds the DAG three times!

In fact, I use a script that looks like this as a poor-man's implementation of this for my most-common use-case:

$ cat run_and_test.sh
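# Run and test one model selection if given, otherwise the whole project.
if [ -n "$1" ]; then
  # $1 is left unquoted, so a space-separated value splits into several model names
  dbt run --models $1 --target dev && dbt test --models $1 --target dev
else
  dbt run --target dev && dbt test --target dev
fi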

I'm not sure exactly what the right user-interface for this is, since not all of DBT's sub-commands make sense to be combined. Additionally, should DBT infer the correct order of the subcommands to be run, or do exactly what the user says (e.g., what should happen if a user runs dbt test run seed?).
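The "infer the correct order" option could be sketched as follows. The canonical order (seed, then run, then test) is an assumption based on the dbt seed && dbt run && dbt test pattern above, and `order_subcommands` is a hypothetical helper, not an existing dbt function.

```python
# Assumed dependency order: seeds load first, models build next, tests run last.
CANONICAL_ORDER = ["seed", "run", "test"]


def order_subcommands(requested):
    """Reorder combinable subcommands canonically, rejecting anything else."""
    unknown = [cmd for cmd in requested if cmd not in CANONICAL_ORDER]
    if unknown:
        raise ValueError(f"cannot combine subcommands: {unknown}")
    # Iterating over the canonical list also drops duplicates.
    return [cmd for cmd in CANONICAL_ORDER if cmd in requested]


print(order_subcommands(["test", "run", "seed"]))
# prints ['seed', 'run', 'test']
```

The alternative, doing exactly what the user typed, would amount to returning `requested` unchanged after the validation step.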

Another option (which maybe clutters the subcommand space too much) is to create additional sub-commands that represent the most common patterns like:

dbt full_rebuild  -- equivalent to dbt seed run test
dbt run_with_test  -- equivalent to dbt run test

@drewbanin
Contributor Author

The "proposed implementation" part of this issue has been live since the introduction of tags in #1102. I'm going to close this issue in favor of more actionable issues:

  1. Supply filenames to --models: pass filenames to --models #454
  2. Support multi-resource runs: Support multi-resource runs #1227
