-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add: support simple wildcards #4864
Conversation
43bce27
to
e288095
Compare
e288095
to
c12eedf
Compare
81a82d3
to
d0299e4
Compare
It doesn't really fix it, just use |
tests/func/test_add.py
Outdated
} | ||
} | ||
) | ||
subdir_path = os.path.join("dir", "subdir", "subdata*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we test more features or just single wildcards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add some more use cases for shell-style wildcards, and possible some negative ones for recursive matching which will not work as it is now, unless we enable the recursive flag. do we want that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ju0gri Note that the more the tests - the more expensive it get's, as we run them in real filesystem. I suppose os.glob has some spec that it sticks to, so it probably doesn't make sense to test it all.
Recursive flag is for double wildcards only, right? Definitely seems useful. Any reason why it is disabled by default? Maybe some perf issues (except for the obvious misuse).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the recursive flag enables **
. According to their docs:
Using the β**β pattern in large directory trees may consume an inordinate amount of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would adding one test for each use case defined in fnmatch
manual be an acceptable compromise ?
* | matches everything
? | matches any single character
[seq] | matches any character inΒ seq
[!seq] | matches any character not inΒ seq
** | searches recursively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the recursive flag enables **. According to their docs:
Sure, I am wondering whether it has some hidden cons. E.g. maybe it is considerably slower by itself when that pattern is supported. I guess need to give it a shot/look into internals for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would adding one test for each use case defined in fnmatch manual be an acceptable compromise ?
Sounds good.
Btw, pathspec might not be desired in some cases. E.g. dvc add '?*[seq]**'
, where all the special stuff is already shieled by the single quotes. Current patch enabled globbing by default, but it looks like we should probably make it optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efiop - Yes, makes sense. I am bad with naming, but added an argument called pathpattern
for the add
command to enable globing of the passed in target. Also, as I am not very familiar with the structure of the repo, are there other places where I need to add coverage for the new argument ?
d0299e4
to
4a7cf15
Compare
dvc/command/add.py
Outdated
@@ -57,6 +58,12 @@ def add_parser(subparsers, parent_parser): | |||
default=False, | |||
help="Allow targets that are outside of the DVC repository.", | |||
) | |||
parser.add_argument( | |||
"--pathpattern", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about --glob
?
2093e45
to
06ce046
Compare
Related to iterative#4816. Signed-off-by: Ioana Grigoropol <[email protected]>
Adds a new argument for the add command `glob` that is disabled by default and when enabled it passes the input targets through glob filtering. Related: iterative#4816 Signed-off-by: Ioana Grigoropol <[email protected]>
d501c2e
to
0a49415
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! The only thing left now is to create a ticket for docs or create a doc PR, as noted in the PR checkbox.
Thank you! |
parser.add_argument( | ||
"--glob", | ||
action="store_true", | ||
default=False, | ||
help="Allows targets containing shell-style wildcards.", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! A few questions (related to documenting this per iterative/dvc.org/issues/1928):
- What does "glob" abbreviate or stand for?
- Why is a flag needed to allow/enable wildcards?
- Is there a short description of what "shell-style" means exactly? Or a link to some Linux doc, Python package, or something specifying what can be done in terms of wildcards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more natural to use wildcards in the default mode without any flag, just like what we do in Unix. And itβs better to extract the glob
to some common function that could be used in all commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@karajan1001 I personally wasn't ready to enable this by default right away as I'm not sure it won't affect negatively some existing usecases π Once we're ready for it, it will be as simple as turning that flag on by default. For PS users --glob
will be a temporary inconvenience, but at least the functionality is being slowly filled up.
@jorgeorpinel Maybe just forget about the doc ticket for it for now, we'll be adding more stuff for it in the future.
What does "glob" abbreviate or stand for?
https://docs.python.org/3/library/glob.html
Why is a flag needed to allow/enable wildcards?
Usually, your shell handles the expansion and it will still do that. Our --glob is useful for programatic use or in shells that don't support explansion natively (e.g. powershell). I'm personally not ready to enable it by default right now.
Is there a short description of what "shell-style" means exactly? Or a link to some Linux doc, Python package, or something specifying what can be done in terms of wildcards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, just wanted to understand this. The initial docs PR is ready for merge.
β¦limit * 'master' of github.com:iterative/dvc: dag: add --outs option (iterative#4739) Add test server and tests for webdav (iterative#4827) Simpler param updates with python-benedict (iterative#4780) checkpoints: set DVC_ROOT environment variable (iterative#4877) api: add support for simple wildcards (iterative#4864) tests: mark azure test as flaky (iterative#4881) setup.py: limit responses version for moto (iterative#4879) remote: avoid chunking on webdav. Fixes iterative#4796 (iterative#4828) checkpoints: `exp run` and `exp res[ume]` refactor (iterative#4855)
Related to #4816.
Signed-off-by: Ioana Grigoropol [email protected]
β I have followed the Contributing to DVC checklist.
π If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
iterative/dvc.org#1928
Thank you for the contribution - we'll try to review it as soon as possible. π