-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Git pathspecs and submodule recursion for iter_gitworkstree()
with pathspec support
#719
Conversation
87fef46
to
0a1cb27
Compare
19153fc
to
057d7b2
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #719 +/- ##
==========================================
+ Coverage 92.43% 92.52% +0.09%
==========================================
Files 193 197 +4
Lines 14135 14440 +305
Branches 2134 2209 +75
==========================================
+ Hits 13065 13360 +295
- Misses 806 810 +4
- Partials 264 270 +6 ☔ View full report in Codecov by Sentry. |
The main (if not only) purpose of this functionality is pathspec mangling/translation for handing them over to analog Git command calls on submodules -- for any Git command that supports pathspecs, but not recursion. A simple example for such a command is `git ls-files --other`. It accepts pathspecs, but does not implement `--recurse-submodules` for listing untracked files. The goal of this functionality is to be able to take pathspecs that is valid in the context of a top-level repository, and translate it such that the set of paths specs given to the same command running on/in a submodule/subdirectory gives the same results, as if the initial top-level invocation reported them (if it even could). The key algorihtm lives in a standlone function `yield_subdir_match_remainder_pathspecs()` that performs a purely lexical analysis. It also comes with a dedicated test collection that is leaner and easier to extend than the previous one (which remains also). The additionally included sketch of a testbattery uses ``git ls-files --other` for testing, rather than a formal description -- because the behavior of the implementation is more elaborate than the documentation at https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec suggests. All testing is (for now) performed within a single repository, and with translation for execution in subdirectories. Refs: datalad#587, datalad/datalad#6933
The purpose of this class is to be able to serve as an (internal) handler for any and all pathspec-like arguments.
Basic tests are included that cover globbing types, but also matching by gitattributes. Especially the latter is something that is hard to achieve without pathspec support.
This alters the `pathspec` evaluation to yield submodules that *may* have content matching any pathspec, rather than the pathspec having to match a submodule item directly. This feature can be useful for implementing submodule recursion around Git commands that do not support submodule recursion directly.
The plan is to monitor yielded items, looking for submodules. If one is found, call the hlper again, with translates pathspecs on and for the submodule.
This plan is to use this helper as a standard for such conversions, rather than the establish complex conditional in the code.
Previously, we passed file system query related flags down to a helper. Now we are handling this at the top-level, with item ppost-processing. This decomplexifies the logic, simplifies the function(signature)s.
This change prepares for an extension of the mode collection to only report untracked files.
This change is a preparation for replacing the `gitstatus()` helper `_yield_repo_untracked()`, and remove that now duplicate implementation with this standard iterator.
This will avoid having to implement pathspec support in two closely related implementation. This change is made possible by `iter_gitworktree()` newly acquired ability to report on untracked content only.
There is a new `any_match_subdir()` that provides the fast-exit test for a subdir-scope match that is implemented in `iter_submodules()`. Moreover, there is standard `__eq__()` method to aid identity comparisions. The tests of `GitPathSpecs` have been brought to 100% coverage. This raised a bunch of questions regarding semantics. What is the diffrence between: - `GitPathSpecs(None)` - `GitPathSpecs([])` - `GitPathSpecs([':'])` This needs to be documented.
Replaces a custom implementation that was the inspiration for the new class helper.
…tests Disabling untracked reporting is done with the parameter value `None`. This is uniformly communicated in the docstrings. Only the tests used and undocumented `'no'` value. This change removes support for `'no'` and aligns the implementation with the documentation.
We can now benefit from the recently introduced submodule recursion, and need not implement that separately.
This change is motivated by the need to be able to distinguish no-constraint scenarios from the inability to translate a constraint into a particular directory scope -- which typically means that no match would be possible. A `None` argument to `GitPathSpecs` is intepreted as no-contraint (i.e., a `:` pathspec), which translates to an empty `arglist()` return. It was confusing that an empty list returned from `for_subdir()` simultaneously meant 'no-match' rather than 'no-constraint'. This ambiguity is now resolved with an explicit `ValueError` for the path that produces a `no-match` result. This enables the implementation of early-stopping testing in client code -- which is demo'd for `iter_gitworktree()` in this changeset.
record for the containing immediate subdirectory ``path`` is yielded. | ||
For example, with 'path/subdir/file1' and 'path/subdir/file2' there | ||
will only be a single item with ``name='subdir'`` and | ||
``type='directory'``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added arg pathspecs
seems was not added/described.
|
||
@dataclass(frozen=True) | ||
class GitPathSpec: | ||
"""Support class for patterns used to limit paths in Git commands |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my own information:
it reimplements (no git commands involved) what git implements for pathspec handling? as of which git version ( I bet it changes once in a while)?
It also operates without awareness of what is in git or not, and purely on filesystem level? (I wonder then how in general it would work for e.g. datalad status
on deleted files with a pathspec limit matching their names)
here is how it looks with master -- may be pathrefs are not interfaced to status yet at all
❯ git status
On branch master
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
deleted: 123.jpg
no changes added to commit (use "git add" and/or "git commit -a")
❯ git status ':(glob)*.jpg'
On branch master
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
deleted: 123.jpg
no changes added to commit (use "git add" and/or "git commit -a")
❯ datalad status ':(glob)*.jpg'
CommandError: 'git -c diff.ignoreSubmodules=none ls-tree HEAD -z -r --full-tree -l -- ':(glob)*.jpg'' failed with exitcode 128
fatal: :(glob)*.jpg: pathspec magic not supported by this command: 'glob'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so apparently even not all git
commands implement pathrefs uniformly across them :-/
This is yet another attempt towards pathspecs. It replaces #714 with something even more low-level and lightweight.
GitPathSpec
support class and constraint #587TODO
None
vs[]
forGitPathSpecs