Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add: support simple wildcards #4864

Merged
merged 7 commits into from
Nov 11, 2020
Merged

add: support simple wildcards #4864

merged 7 commits into from
Nov 11, 2020

Conversation

ju0gri
Copy link
Contributor

@ju0gri ju0gri commented Nov 9, 2020

Related to #4816.

Signed-off-by: Ioana Grigoropol [email protected]

iterative/dvc.org#1928

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

dvc/repo/add.py Outdated Show resolved Hide resolved
dvc/repo/add.py Outdated Show resolved Hide resolved
dvc/repo/add.py Outdated Show resolved Hide resolved
@ju0gri ju0gri force-pushed the ioana-fix-4816 branch 2 times, most recently from 81a82d3 to d0299e4 Compare November 9, 2020 18:28
@efiop
Copy link
Contributor

efiop commented Nov 9, 2020

Fixes #4816.

It doesn't really fix it, just use Related please, so that github doesn't close it automatically πŸ™‚

}
}
)
subdir_path = os.path.join("dir", "subdir", "subdata*")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we test more features or just single wildcards?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some more use cases for shell-style wildcards, and possible some negative ones for recursive matching which will not work as it is now, unless we enable the recursive flag. do we want that ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ju0gri Note that the more the tests - the more expensive it get's, as we run them in real filesystem. I suppose os.glob has some spec that it sticks to, so it probably doesn't make sense to test it all.

Recursive flag is for double wildcards only, right? Definitely seems useful. Any reason why it is disabled by default? Maybe some perf issues (except for the obvious misuse).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the recursive flag enables **. According to their docs:

Using the β€œ**” pattern in large directory trees may consume an inordinate amount of time.

Copy link
Contributor Author

@ju0gri ju0gri Nov 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would adding one test for each use case defined in fnmatch manual be an acceptable compromise ?

* | matches everything
? | matches any single character
[seq] | matches any character inΒ seq
[!seq] | matches any character not inΒ seq
** | searches recursively

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the recursive flag enables **. According to their docs:

Sure, I am wondering whether it has some hidden cons. E.g. maybe it is considerably slower by itself when that pattern is supported. I guess need to give it a shot/look into internals for that.

Copy link
Contributor

@efiop efiop Nov 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would adding one test for each use case defined in fnmatch manual be an acceptable compromise ?

Sounds good.

Btw, pathspec might not be desired in some cases. E.g. dvc add '?*[seq]**', where all the special stuff is already shieled by the single quotes. Current patch enabled globbing by default, but it looks like we should probably make it optional.

Copy link
Contributor Author

@ju0gri ju0gri Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop - Yes, makes sense. I am bad with naming, but added an argument called pathpattern for the add command to enable globing of the passed in target. Also, as I am not very familiar with the structure of the repo, are there other places where I need to add coverage for the new argument ?

@@ -57,6 +58,12 @@ def add_parser(subparsers, parent_parser):
default=False,
help="Allow targets that are outside of the DVC repository.",
)
parser.add_argument(
"--pathpattern",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about --glob?

dvc/repo/add.py Outdated Show resolved Hide resolved
dvc/repo/add.py Outdated Show resolved Hide resolved
dvc/repo/add.py Outdated Show resolved Hide resolved
dvc/repo/add.py Outdated Show resolved Hide resolved
dvc/repo/add.py Outdated Show resolved Hide resolved
ju0gri and others added 7 commits November 11, 2020 17:47
Related to iterative#4816.

Signed-off-by: Ioana Grigoropol <[email protected]>
Adds a new argument for the add command `glob` that is disabled by default and when enabled it passes
the input targets through glob filtering.

Related: iterative#4816

Signed-off-by: Ioana Grigoropol <[email protected]>
Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! The only thing left now is to create a ticket for docs or create a doc PR, as noted in the PR checkbox.

@efiop efiop merged commit 556143f into iterative:master Nov 11, 2020
@efiop
Copy link
Contributor

efiop commented Nov 11, 2020

Thank you!

@skshetry skshetry added the enhancement Enhances DVC label Nov 12, 2020
@skshetry skshetry changed the title api: add support for simple wildcards add: support simple wildcards Nov 12, 2020
Comment on lines +61 to +66
parser.add_argument(
"--glob",
action="store_true",
default=False,
help="Allows targets containing shell-style wildcards.",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! A few questions (related to documenting this per iterative/dvc.org/issues/1928):

  • What does "glob" abbreviate or stand for?
  • Why is a flag needed to allow/enable wildcards?
  • Is there a short description of what "shell-style" means exactly? Or a link to some Linux doc, Python package, or something specifying what can be done in terms of wildcards?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more natural to use wildcards in the default mode without any flag, just like what we do in Unix. And itβ€˜s better to extract the glob to some common function that could be used in all commands.

Copy link
Contributor

@efiop efiop Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karajan1001 I personally wasn't ready to enable this by default right away as I'm not sure it won't affect negatively some existing usecases πŸ™ Once we're ready for it, it will be as simple as turning that flag on by default. For PS users --glob will be a temporary inconvenience, but at least the functionality is being slowly filled up.

@jorgeorpinel Maybe just forget about the doc ticket for it for now, we'll be adding more stuff for it in the future.

What does "glob" abbreviate or stand for?

https://docs.python.org/3/library/glob.html

Why is a flag needed to allow/enable wildcards?

Usually, your shell handles the expansion and it will still do that. Our --glob is useful for programatic use or in shells that don't support explansion natively (e.g. powershell). I'm personally not ready to enable it by default right now.

Is there a short description of what "shell-style" means exactly? Or a link to some Linux doc, Python package, or something specifying what can be done in terms of wildcards?

https://docs.python.org/3/library/glob.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, just wanted to understand this. The initial docs PR is ready for merge.

I159 added a commit to I159/dvc that referenced this pull request Nov 12, 2020
…limit

* 'master' of github.com:iterative/dvc:
  dag: add --outs option (iterative#4739)
  Add test server and tests for webdav (iterative#4827)
  Simpler param updates with python-benedict (iterative#4780)
  checkpoints: set DVC_ROOT environment variable (iterative#4877)
  api: add support for simple wildcards (iterative#4864)
  tests: mark azure test as flaky (iterative#4881)
  setup.py: limit responses version for moto (iterative#4879)
  remote: avoid chunking on webdav. Fixes iterative#4796 (iterative#4828)
  checkpoints: `exp run` and `exp res[ume]` refactor (iterative#4855)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants