[Feature] Resource selection: allow to specify intersection with union #10596

Open
3 tasks done
jaklan opened this issue Aug 23, 2024 · 6 comments
Labels
enhancement New feature or request node selection Functionality and syntax for selecting DAG nodes triage

Comments

@jaklan

jaklan commented Aug 23, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Currently I don't see a way to intersect one criterion with a union of several others when specifying resources using --select.

Example:

  • model_a1 without tag:test
  • model_a2 with tag:test
  • model_a3 without tag:test
  • model_b1 - doesn't exist

> dbt ls --select tag:test,"model_a1 model_a2 model_a3 model_b1"
02:28:39  Running with dbt=1.8.5
02:28:39  Registered adapter: duckdb=1.8.3
02:28:39  Found 3 models, 527 macros, 2 groups
02:28:39  The selection criterion 'tag:test,model_a1' does not match any enabled nodes
02:28:39  The selection criterion 'model_b1' does not match any enabled nodes
project_a.model_a2
project_a.model_a3

Here the intersection is applied only to model_a1: the quotation marks don't make the union take priority over the intersection.
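
To make the precedence concrete, here is a toy Python model of the behaviour observed above. It is only a sketch, not dbt's implementation: PROJECT, match and select are hypothetical names, and only tag: criteria and bare model names are handled. It assumes the shell strips the quotes and hands dbt the single argument tag:test,model_a1 model_a2 model_a3 model_b1, in which spaces separate union terms and commas intersect within one term.

```python
# Toy project from the example: model name -> set of tags.
# model_b1 is deliberately missing because it does not exist.
PROJECT = {
    "model_a1": set(),
    "model_a2": {"test"},
    "model_a3": set(),
}


def match(criterion: str) -> set[str]:
    """Resolve one atomic criterion: 'tag:<name>' or a bare model name."""
    if criterion.startswith("tag:"):
        tag = criterion[len("tag:"):]
        return {name for name, tags in PROJECT.items() if tag in tags}
    return {criterion} if criterion in PROJECT else set()


def select(expression: str) -> set[str]:
    """Union the space-separated terms; intersect the comma-separated parts of each term."""
    selected: set[str] = set()
    for term in expression.split():
        parts = [match(criterion) for criterion in term.split(",")]
        selected |= set.intersection(*parts)
    return selected


# The comma binds only tag:test and model_a1, so model_a3 slips through untagged.
print(sorted(select("tag:test,model_a1 model_a2 model_a3 model_b1")))
# -> ['model_a2', 'model_a3']
```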

However, YAML selectors seem to work:

selectors:
  - name: test
    definition:
      intersection:
        - union: 
           - model_a1
           - model_a2
           - model_a3
           - model_b1
        - tag: test

In that case we get:

> dbt ls --selector test
02:20:20  Running with dbt=1.8.5
02:20:20  Registered adapter: duckdb=1.8.3
02:20:21  Found 3 models, 527 macros, 2 groups
project_a.model_a2
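
The reason the YAML form gives a different answer is that nesting acts as grouping: the union is resolved first and only then intersected with the tag. A minimal Python sketch of that evaluation order follows. PROJECT, DEFINITION and resolve are hypothetical names, the dict simply mirrors the YAML definition above, and this is not how dbt itself is implemented.

```python
# Toy project from the example: model name -> set of tags (model_b1 does not exist).
PROJECT = {
    "model_a1": set(),
    "model_a2": {"test"},
    "model_a3": set(),
}

# Same shape as the YAML selector definition, written as plain Python data.
DEFINITION = {
    "intersection": [
        {"union": ["model_a1", "model_a2", "model_a3", "model_b1"]},
        {"tag": "test"},
    ]
}


def resolve(node) -> set[str]:
    """Recursively resolve a nested definition to a set of model names."""
    if isinstance(node, str):
        return {node} if node in PROJECT else set()
    if "union" in node:
        return set().union(*[resolve(child) for child in node["union"]])
    if "intersection" in node:
        return set.intersection(*[resolve(child) for child in node["intersection"]])
    if "tag" in node:
        return {name for name, tags in PROJECT.items() if node["tag"] in tags}
    raise ValueError(f"unsupported node: {node!r}")


print(sorted(resolve(DEFINITION)))
# -> ['model_a2']
```
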
@jaklan jaklan added enhancement New feature or request triage labels Aug 23, 2024
@graciegoheen
Contributor

Hi @jaklan thanks for opening!

I haven't tested this, but I wonder if something like dbt ls --select "tag:test,model_a1 tag:test,model_a2 tag:test,model_a3 tag:test,model_b1" would accomplish what you're after (though obviously that is rather verbose).

@jaklan
Author

jaklan commented Aug 29, 2024

@graciegoheen it will, but in our case the problem is this: our Airflow DbtOperator lets end users specify their own selection, e.g. select="tag:tag_a tag:tag_b1,tag:tag_b2", and within the operator we attach some mandatory filters as an intersection, e.g. package:this, because of this issue: #8954 (unfortunately closed...). So we would like to be able to simply concatenate these two parts into:

--select package:this,"tag:tag_a tag:tag_b1,tag:tag_b2"

but currently we need custom logic to parse the select value and create something like:

--select package:this,tag:tag_a package:this,tag:tag_b1,tag:tag_b2

which creates additional overhead. So it's not "undoable", but it does hurt the user experience. And as mentioned above, it works for YAML selectors, so there's an inconsistency here (if there's really no CLI syntax to enforce such an intersection).
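
For illustration, the kind of custom logic described here can be sketched in a few lines of Python. The function name with_mandatory_filter is hypothetical, and the sketch assumes the mandatory filter is a single intersection term without spaces and that the user value only uses spaces (union) and commas (intersection).

```python
def with_mandatory_filter(mandatory: str, user_select: str) -> str:
    """Intersect `mandatory` with every space-separated union term of `user_select`."""
    terms = user_select.split()
    if not terms:
        # Assumption: with no user selection, the mandatory filter alone applies.
        return mandatory
    return " ".join(f"{mandatory},{term}" for term in terms)


combined = with_mandatory_filter("package:this", "tag:tag_a tag:tag_b1,tag:tag_b2")
print(combined)
# -> package:this,tag:tag_a package:this,tag:tag_b1,tag:tag_b2

# Passed to dbt as one argument, so no shell quoting is involved.
argv = ["dbt", "ls", "--select", combined]
```

The distribution works because intersecting a filter with a union is the same as taking the union of the per-term intersections; the cost is that the filter is repeated once per term.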

@mroy-seedbox

It seems to me like we would need a grouping/priority operator. Parentheses are probably what would make the most sense.

So: --select 'package:this,(tag:tag_a tag:tag_b1,tag:tag_b2)'
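
Until something like that exists, the proposed parentheses could be expanded into the flat syntax dbt accepts today. The Python sketch below is purely illustrative: the parenthesised grammar is the proposal from this thread, not an existing dbt feature, and expand and the parse_* helpers are hypothetical names. It assumes criteria never contain parentheses, commas or whitespace, and it treats whitespace around a comma as insignificant.

```python
import re


def parse_atom(tokens, pos):
    """atom := '(' union ')' | criterion. Returns (list of intersection terms, next pos)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of expression")
    token = tokens[pos]
    if token == "(":
        terms, pos = parse_union(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("missing closing parenthesis")
        return terms, pos + 1
    if token in (")", ","):
        raise ValueError(f"unexpected {token!r}")
    return [[token]], pos + 1


def parse_intersection(tokens, pos):
    """intersection := atom (',' atom)*. Intersections distribute over unions."""
    terms, pos = parse_atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] == ",":
        right, pos = parse_atom(tokens, pos + 1)
        terms = [left + r for left in terms for r in right]
    return terms, pos


def parse_union(tokens, pos):
    """union := intersection (whitespace intersection)*."""
    terms, pos = parse_intersection(tokens, pos)
    while pos < len(tokens) and tokens[pos] != ")":
        more, pos = parse_intersection(tokens, pos)
        terms = terms + more
    return terms, pos


def expand(expression: str) -> str:
    """Rewrite a parenthesised expression into flat 'a,b c,d' form."""
    tokens = re.findall(r"[(),]|[^\s(),]+", expression)
    terms, pos = parse_union(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unbalanced parenthesis")
    return " ".join(",".join(term) for term in terms)


print(expand("package:this,(tag:tag_a tag:tag_b1,tag:tag_b2)"))
# -> package:this,tag:tag_a package:this,tag:tag_b1,tag:tag_b2
```

Applied to the original example, expand("tag:test,(model_a1 model_a2 model_a3 model_b1)") yields the verbose form suggested earlier in the thread.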

@mroy-seedbox

mroy-seedbox commented Nov 13, 2024

With the selector: method (see #10992), this superior syntax would be supported:

--select package:this,selector:tag_a_b1_b2

No need for parentheses, as the YAML selector itself would act as the grouping operator.

And you might be able to create the selector dynamically before running dbt. It could be as simple as injecting something like this in selectors.yml:

  - name: selector_to_run_now
    definition: <selector specified by user>

Just need to be careful to get the indentation right. 😅 Or do a string replace of a placeholder value, which would probably be much easier (easy to do even with sed -i 's/placeholder/<selector specified by user>/' selectors.yml).

And then: --select package:this,selector:selector_to_run_now

And another alternative would be this one, but it would need to be maintained over time as packages are added (unless dbt adds support for a NOT operator): --select <selector specified by user> --exclude selector:everything_that_is_not_in_this_package

💡 Edit: Actually, a NOT operator is currently supported in YAML selectors!

Like this:

  - name: everything_that_is_not_in_this_package
    definition:
      intersection:
        - '*'
        - exclude:
          - package:this # Or my_project_name also works here

Or actually, you should already be able to create a final selector to run which would be:

  - name: final_selector_to_run_now
    definition:
      intersection:
        - method: selector
          value: selector_to_run_now
        - package:this
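
A variant of the same idea that needs neither sed nor the proposed selector: method is to render the whole definition from the user's CLI string. The Python sketch below is an assumption-laden illustration: render_selector and quote are hypothetical helpers, the user value is assumed to use only spaces and commas, and the output relies on YAML selectors accepting CLI-style strings as items of union and intersection lists. The text is built by hand so that no YAML library is required.

```python
def quote(criterion: str) -> str:
    """Single-quote a criterion for YAML, doubling any embedded single quotes."""
    return "'" + criterion.replace("'", "''") + "'"


def render_selector(name: str, mandatory: str, user_select: str) -> str:
    """Render selectors.yml text that intersects `mandatory` with the user's selection."""
    terms = user_select.split()
    if not terms:
        raise ValueError("user_select must contain at least one criterion")
    lines = [
        "selectors:",
        f"  - name: {name}",
        "    definition:",
        "      intersection:",
        f"        - {quote(mandatory)}",
        "        - union:",
    ]
    for term in terms:
        parts = term.split(",")
        if len(parts) == 1:
            lines.append(f"            - {quote(parts[0])}")
        else:
            # A comma-separated term becomes a nested intersection.
            lines.append("            - intersection:")
            lines.extend(f"                - {quote(part)}" for part in parts)
    return "\n".join(lines) + "\n"


text = render_selector(
    "final_selector_to_run_now", "package:this", "tag:tag_a tag:tag_b1,tag:tag_b2"
)
print(text)
```

The rendered file would then be used with --selector final_selector_to_run_now. Because the user's union sits inside its own list item, the nesting plays the role that parentheses would play on the command line.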

@jaklan
Author

jaklan commented Nov 13, 2024

@mroy-seedbox these are just two different use cases. We definitely need support for parentheses, which you could use with any method (including the proposed selector:), but forcing people to use YAML selectors and do weird hacks with sed to solve the above issue shouldn't be the way to go.

@mroy-seedbox

It sounds like --exclude selector:everything_that_is_not_in_this_package would be a good start at least.

But yeah, I too have often wished for some kind of grouping/priority operator (i.e. parentheses). Otherwise, we have to repeat ourselves (or worse, do parsing, as in your case), and it sucks. 😞


4 participants