New model selection syntax #550

drewbanin · 2017-10-04T13:44:39Z

Feature

Feature description

dbt's current model selection syntax with --models and --exclude is both ambiguous and constrained.

Ambiguity:
Given:

models/
  - orders.sql
  - orders/
    - order_items.sql
    - order_discounts.sql

This command:

dbt run --model orders

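will run orders, order_items, and order_discounts, even if only the orders model was intended. The argument matches both the orders.sql model and the orders/ directory.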
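Here is a small Python sketch of a matching rule that reproduces the ambiguity. Under this rule, a selector matches a model either by its name or by any directory on the model's path. The rule is an assumption inferred from the example above, not dbt's actual implementation. The function matches is hypothetical.

```python
from pathlib import PurePosixPath


def matches(selector: str, model_path: str) -> bool:
    """Hypothetical rule: match on the model name OR on any parent directory name."""
    path = PurePosixPath(model_path)
    model_name = path.stem            # "models/orders.sql" -> "orders"
    directories = path.parent.parts   # ("models", "orders") for nested models
    return selector == model_name or selector in directories


models = [
    "models/orders.sql",
    "models/orders/order_items.sql",
    "models/orders/order_discounts.sql",
]

selected = [m for m in models if matches("orders", m)]
print(selected)  # all three paths: one name match, two directory matches
```

Under this rule, one string is checked against two namespaces: model names and directory names. As a result, there is no way to ask for only orders.sql without separating those namespaces.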

Filenames
dbt should be able to run files by their name, enabling dbt to work in concert with tools like ls and find, as described in #454.
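One reading of this requirement is that dbt would accept paths as printed by ls or find and map each one back to a model. The minimal sketch below makes two assumptions: a model is any .sql file, and its name is the file stem. The helper model_names_from_paths is hypothetical, not a dbt API.

```python
from pathlib import PurePosixPath


def model_names_from_paths(paths: list[str]) -> list[str]:
    """Map file paths (e.g. lines printed by find) to model names."""
    names = []
    for raw in paths:
        path = PurePosixPath(raw.strip())  # tolerate trailing newlines from piped input
        if path.suffix != ".sql":
            continue  # skip non-model files such as schema.yml
        names.append(path.stem)
    return names


found = ["models/orders.sql", "models/orders/order_items.sql\n", "models/schema.yml"]
print(model_names_from_paths(found))  # ['orders', 'order_items']
```

The exact CLI form for passing these names or paths is the subject of #454.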

Multi-resource runs
dbt should be able to run different types of resources in the same invocation. dbt should support the specification of resource types, including:

models
seeds
tests (both data tests and schema tests)
archives (eventually)

This might also mean that dbt run should in fact run all resource types by default. That would be a pretty big change, so it might be worth adding a new subcommand and deprecating dbt run, or keeping the run subcommand and preserving the existing behavior.

Model Selection

In investigating different options for a new graph selection syntax, it became clear that the CLI is not well suited to rich selection syntaxes. I think we should punt on the broader question of orchestration (handling that with user-space Python instead) and make small tweaks to --models to make it more flexible without overcomplicating things.

Proposed implementation

This section outlines the changes that we should make to --models today; it does not tackle broader questions like multi-resource runs and orchestration. The implementation below proposes a --models syntax that supports selection by model attributes.

These attributes include:

tags (not yet built)
sources (not yet built)

Examples:

# 1. Run all models with the tag "nightly"
dbt run --models tag:nightly

# 2. Run all models sourced from Snowplow data
dbt run --models source:snowplow

# 3. Run all models sourced from Snowplow or Shopify data, and their children
dbt run --models source:snowplow+ source:shopify+

# 4. Run all nightly models (and children), excluding anything built from Snowplow data
dbt run --models tag:nightly+ --exclude source:snowplow
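The examples above imply a small grammar for each selector:

- an optional method prefix (tag: or source:)
- a value
- an optional trailing + meaning "and all children"

The --exclude matches are then subtracted from the --models matches. Below is a minimal Python sketch of those semantics on a toy DAG. The Model class, the select and select_one functions, and the model names are all hypothetical; they are not dbt internals.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class Model:
    tags: set[str] = field(default_factory=set)
    sources: set[str] = field(default_factory=set)
    children: list[str] = field(default_factory=list)


def descendants(graph: dict[str, Model], start: str) -> set[str]:
    """All nodes reachable downstream of start (not including start)."""
    seen: set[str] = set()
    stack = list(graph[start].children)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node].children)
    return seen


def select_one(graph: dict[str, Model], selector: str) -> set[str]:
    """Resolve one selector such as 'tag:nightly', 'source:snowplow+' or a model name."""
    with_children = selector.endswith("+")
    spec = selector[:-1] if with_children else selector
    method, sep, value = spec.partition(":")
    if not sep:  # no prefix: treat the selector as a plain model name
        method, value = "", spec
    if method == "tag":
        matched = {name for name, m in graph.items() if value in m.tags}
    elif method == "source":
        matched = {name for name, m in graph.items() if value in m.sources}
    elif method == "":
        matched = {value} if value in graph else set()
    else:
        raise ValueError(f"unknown selector method: {method!r}")
    if with_children:
        for node in list(matched):
            matched |= descendants(graph, node)
    return matched


def select(graph: dict[str, Model], models: Iterable[str], exclude: Iterable[str] = ()) -> set[str]:
    """Union of the --models selectors minus the union of the --exclude selectors."""
    included = set().union(*(select_one(graph, s) for s in models))
    excluded = set().union(*(select_one(graph, s) for s in exclude))
    return included - excluded


# Toy project (hypothetical model names).
graph = {
    "snowplow_events": Model(tags={"nightly"}, sources={"snowplow"}, children=["sessions"]),
    "sessions": Model(children=["user_summary"]),
    "shopify_orders": Model(tags={"nightly"}, sources={"shopify"}, children=["user_summary"]),
    "user_summary": Model(),
}

print(sorted(select(graph, ["tag:nightly+"], exclude=["source:snowplow"])))
# ['sessions', 'shopify_orders', 'user_summary']
```

Note the choice made in example 4. Here source:snowplow matches only models that read Snowplow data directly, so sessions survives the exclusion even though it is built on Snowplow data. Excluding source:snowplow+ would drop the downstream models too. Which reading is intended is an open design question.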

I think this strikes a happy medium between ease of implementation and capability. There are certainly reasonable queries that this syntax does not support, but those are better handled through code as discussed above.


mikekaminsky · 2018-11-15T18:31:39Z

I like this idea! Some speculative ideas below:

I'd love to be able to do something like

dbt run test --model customers

Which would run the customers model and then test it (crucially, only paying the startup parsing cost once).

I assume that a common pattern for dbt deployed in production is to do something like:

dbt seed && dbt run && dbt test

Which is annoying because it parses all of the templating and builds the DAG three times!

In fact, I use a script like this as a poor man's implementation for my most common use case:

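$ cat run_and_test.sh
#!/usr/bin/env bash
# Run, then test, the given model selectors (or everything) against the dev target.
# Usage: ./run_and_test.sh [selector ...]
if [ "$#" -gt 0 ]; then
  dbt run --models "$@" --target dev && dbt test --models "$@" --target dev
else
  dbt run --target dev && dbt test --target dev
fi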

I'm not sure exactly what the right user interface for this is, since not all of dbt's subcommands make sense in combination. Additionally, should dbt infer the correct order of the subcommands to be run, or do exactly what the user says (e.g., what should happen if a user runs dbt test run seed)?

Another option (which may clutter the subcommand space too much) is to create additional subcommands that represent the most common patterns, like:

dbt full_rebuild  -- equivalent to dbt seed run test
dbt run_with_test  -- equivalent to dbt run test
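The two questions above can be answered with a small planning step that runs before anything executes. The two questions are whether to reorder subcommands and how aliases like full_rebuild would expand. This is a hypothetical Python sketch, not anything dbt provides. The names CANONICAL_ORDER, ALIASES and plan are invented for illustration, and the seed, run, test order is assumed from the production pipeline described above.

```python
# Hypothetical policy sketch; none of these names exist in dbt.
CANONICAL_ORDER = ["seed", "run", "test"]  # assumed dependency order
ALIASES = {
    "full_rebuild": ["seed", "run", "test"],
    "run_with_test": ["run", "test"],
}


def plan(subcommands: list[str], infer_order: bool = True) -> list[str]:
    """Expand aliases, reject non-combinable subcommands, optionally reorder."""
    expanded: list[str] = []
    for cmd in subcommands:
        expanded.extend(ALIASES.get(cmd, [cmd]))
    unknown = [c for c in expanded if c not in CANONICAL_ORDER]
    if unknown:
        raise ValueError(f"cannot combine subcommands: {unknown}")
    if not infer_order:
        return expanded  # do exactly what the user typed
    # Drop duplicates and sort into the assumed dependency order.
    return sorted(set(expanded), key=CANONICAL_ORDER.index)


print(plan(["test", "run", "seed"]))                     # ['seed', 'run', 'test']
print(plan(["test", "run", "seed"], infer_order=False))  # ['test', 'run', 'seed']
print(plan(["full_rebuild"]))                            # ['seed', 'run', 'test']
```

The two policies trade off differently. Inferring the order makes dbt test run seed safe but silently reinterprets the command. Literal mode preserves the user's intent at the risk of testing stale models. In either case, the single-parse benefit would come from executing the resulting plan inside one process.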

drewbanin · 2019-01-09T16:15:32Z

The "proposed implementation" part of this issue has been live since the introduction of tags in #1102. I'm going to close this issue in favor of more actionable issues:

Supply filenames to --models: #454
Support multi-resource runs: #1227

drewbanin added command line interface and removed command line interface labels Oct 9, 2017

drewbanin changed the title from "remove ambiguity from --models argument" to "Rethink --models selection syntax" Jun 28, 2018

drewbanin added this to the 0.11.0 - Isaac Asimov (unreleased) milestone Jun 28, 2018

drewbanin removed this from the 0.11.0 - Isaac Asimov (unreleased) milestone Jul 26, 2018

drewbanin changed the title from "Rethink --models selection syntax" to "New model selection syntax" Sep 18, 2018

drewbanin added the enhancement and estimate: 8 labels and removed the estimate: 8 label Sep 19, 2018

This was referenced Sep 19, 2018:
Custom tags #311 (closed)
Define source tables #814 (closed)

drewbanin closed this as completed Jan 9, 2019

jtcohen6 mentioned this issue in Advanced node selection syntax #2172 (closed) Feb 28, 2020
