Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Add unix-style fqn wildcard selector method #6599

Merged
merged 4 commits into from
Mar 5, 2023

Conversation

z3z1ma
Copy link
Contributor

@z3z1ma z3z1ma commented Jan 12, 2023

resolves #6598

Description

Adds a new selector method plus tests. This method is called wildcard. It uses fnmatch from the python stdlib.
Reference here

We look at the fqn as a string:

project.resource.some.dir.my_model
project.resource.some.dir.schema_file.dbt_utils_expression_is_true_my_model_column_a

And simply use fnmatch to determine if it is selected:

Here's a convoluted example just to show how flexible and robust you can imagine this being:

dbt test -s 'wildcard:project.*.*.dir.schema_file.*_column_a

Also you can see the tests as a reference for playing with it if you want to continue imagining what it opens up.

Checklist

@z3z1ma z3z1ma requested a review from a team January 12, 2023 23:10
@z3z1ma z3z1ma requested a review from a team as a code owner January 12, 2023 23:10
@cla-bot cla-bot bot added the cla:yes label Jan 12, 2023
@z3z1ma z3z1ma changed the title [Feature] Add unix-style wildcard selector method [Feature] Add unix-style fqn wildcard selector method Jan 12, 2023
@jtcohen6 jtcohen6 added Team:Execution ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering labels Jan 19, 2023
Copy link
Member

@aranke aranke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

Copy link
Contributor

@ChenyuLInx ChenyuLInx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @z3z1ma Thanks for this amazing PR that provide a powerful feature! Great work think through which selector methods should support wildcards!

I want to make sure I understand the exact function you are adding here.
For existing selector, the following now support wildcard( the rest are more status based so they don't need to support wildcard match)

  • FQN
  • Tag
  • Source
  • Exposure
  • Metric
  • File
  • Package
  • TestName

And added a new one Wildcard.

After playing with it a bit I have some question/ask

  1. FQN selector support everything Wildcard selector support, plus special logic to the following possible. Is that correct? Is they anything that the wildcard support that the new FQN doesn't supports? If not should we consider just leave the new FQN, or remove wildcard match from FQN and add the new wildcard selector method, whichever feels more consistent with other selector methods.

    assert search_manifest_using_method(
    manifest, method, 'pkg.unions') == {'union_model', 'mynamespace.union_model'}

  2. Do you mind add some new test to the tests for those selectors that supports wildcard match? For example, for test_select_tag maybe we can add something like

    assert search_manifest_using_method(manifest, method, 'uses_eph*') == {
            'view_model', 'table_model'}
  3. Only because this change is super powerful, I would like us to think a bit more about is there any concern of the this fnmatch would cause the

    1. additional things got selected for user's existing selection command?(@z3z1ma @aranke you two probably know the use case better than I do)
    2. performance issue of selector on large projects (@jtcohen6 should we test this on some large size testing project before merging?) (EDIT: Thinking again this shouldn't have performance issue on existing selection. Might be slow for new selection that uses wildcard.)

@z3z1ma
Copy link
Contributor Author

z3z1ma commented Feb 28, 2023

FQN selector support everything Wildcard selector support, plus special logic to the following possible. Is that correct? Is they anything that the wildcard support that the new FQN doesn't supports? If not should we consider just leave the new FQN, or remove wildcard match from FQN and add the new wildcard selector method, whichever feels more consistent with other selector methods.

The Wildcard selector is an artifact from the initial spike in which I had not planned on updating the other selectors.
After some iteration I found I could implement the functionality without any breaking change. So consider that the Wildcard selector will be dropped since the functionality is native to FQN.

We should definitely keep the wildcard matching capabilities on FQN because, as the default selector, it is significantly more accessible to users. Furthermore, existing use of the * character is functionally unchanged by this update. This means its a net new feature that does not change users who opted to use the old FQN "selector glob". Telling users they can run models using wildcard goodies out of the box without explicitly specifying wildcard: is big.

Command Old Outcome New Outcome
dbt run -s something.* Select all nodes from package something Select all nodes from package something
dbt run -s something.idk_* Select all nodes whose fqn starts with [something, idk*, ...] (weird right) Select all nodes from package something whose next fqn segment starts with idk_

Unless a user has * or ? or [] in their filenames which should be impossible or just an ill-advised move, we should be shored against any possibility (that I can conjure) of this not being backwards compatible. If anything, this might be more intuitive because I believe we have all used globs at one point thinking they would work in the middle of an fqn segment.

Can elucidate more tomorrow (new comment or will edit this one), but do expect a commit that drops Wildcard and updates to the tests @ChenyuLInx

@ChenyuLInx
Copy link
Contributor

@z3z1ma Thanks for the fast response!! This makes sense to me!! I was asking the third question only out of cautious not that I think the answer would be no.
Looking forward to the your new commit!

@z3z1ma z3z1ma force-pushed the feature/fnmatch-selector-method branch from 60fde4d to 906821b Compare March 2, 2023 04:14
@z3z1ma
Copy link
Contributor Author

z3z1ma commented Mar 2, 2023

@ChenyuLInx @aranke

Mentioned updates are complete.

  • More tests for other methods with extended wildcard enabled
  • Drop of independent Wildcard selector from initial spike since it is baked into FQN

Copy link
Contributor

@ChenyuLInx ChenyuLInx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks @z3z1ma !

@aranke aranke merged commit 24ca76e into dbt-labs:main Mar 5, 2023
acurtis-evi pushed a commit to acurtis-evi/dbt-core that referenced this pull request Mar 7, 2023
@acurtis-evi acurtis-evi mentioned this pull request Mar 8, 2023
6 tasks
jtcohen6 added a commit to dbt-labs/docs.getdbt.com that referenced this pull request Apr 14, 2023
Updates wildcard selection documentation (added in
#3130), based on the
implementation we actually ended up with:
-
#2702 (comment)
- dbt-labs/dbt-core#6599

Crucially, there is no standalone `wildcard:` method, but rather support
for unix-style wildcards in a number of existing methods. See Alex's
comment (linked above) above for details.

---------

Co-authored-by: Doug Beatty <[email protected]>
dbeatty10 added a commit to dbt-labs/docs.getdbt.com that referenced this pull request Jun 23, 2023
resolves #3549

## What are you changing in this pull request and why?

This PR adds the (undocumented!) `resource_type` method that has been
possible for a long, long time.

Reason: we discovered a selection method wasn't documented! So we did 😎 

## Not done

#3549 mentions that there are no examples for the wildcard method that
include `wildcard:`. This seems okay since the intent in
dbt-labs/dbt-core#6599 was to use Unix-style
wildcards in conjunction with all the other selection methods.

So we're choosing not to add `wildcard:` examples in this PR. If needed,
they can always be added later.

## 🎩 


[Preview](https://deploy-preview-3548--docs-getdbt-com.netlify.app/reference/node-selection/methods#the-resource_type-method)

<img width="550" alt="image"
src="https://github.com/dbt-labs/docs.getdbt.com/assets/44704949/c46f18c5-d4ea-4e9a-815d-7f682efa4ee2">


## Checklist
- [x] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
and [About
versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version)
so my content adheres to these guidelines.

---------

Co-authored-by: mirnawong1 <[email protected]>
Co-authored-by: Doug Beatty <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-1811] [Feature] Unix-style wildcard fqn selector method via fnmatch
4 participants