Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Git pathspecs and submodule recursion for iter_gitworkstree() with pathspec support #719

Merged
merged 17 commits into from
Jun 11, 2024

Conversation

mih
Copy link
Member

@mih mih commented Jun 10, 2024

This is yet another attempt towards pathspecs. It replaces #714 with something even more low-level and lightweight.

TODO

  • Clarify None vs [] for GitPathSpecs

@mih mih force-pushed the gitps branch 3 times, most recently from 87fef46 to 0a1cb27 Compare June 10, 2024 15:12
@adswa adswa force-pushed the gitps branch 2 times, most recently from 19153fc to 057d7b2 Compare June 10, 2024 19:09
Copy link

codecov bot commented Jun 10, 2024

Codecov Report

Attention: Patch coverage is 97.75281% with 8 lines in your changes missing coverage. Please review.

Project coverage is 92.52%. Comparing base (c77087b) to head (1722e1f).
Report is 99 commits behind head on main.

Files with missing lines Patch % Lines
datalad_next/gitpathspec/pathspec.py 98.31% 0 Missing and 2 partials ⚠️
datalad_next/gitpathspec/tests/test_gitpathspec.py 97.29% 1 Missing and 1 partial ⚠️
datalad_next/iter_collections/gitworktree.py 97.01% 2 Missing ⚠️
...ext/iter_collections/tests/test_itergitworktree.py 95.12% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #719      +/-   ##
==========================================
+ Coverage   92.43%   92.52%   +0.09%     
==========================================
  Files         193      197       +4     
  Lines       14135    14440     +305     
  Branches     2134     2209      +75     
==========================================
+ Hits        13065    13360     +295     
- Misses        806      810       +4     
- Partials      264      270       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mih added 17 commits June 11, 2024 10:57
The main (if not only) purpose of this functionality is pathspec
mangling/translation for handing them over to analog Git command
calls on submodules -- for any Git command that supports pathspecs,
but not recursion.

A simple example for such a command is `git ls-files --other`. It
accepts pathspecs, but does not implement `--recurse-submodules` for
listing untracked files.

The goal of this functionality is to be able to take pathspecs that is
valid in the context of a top-level repository, and translate it such
that the set of paths specs given to the same command running on/in a
submodule/subdirectory gives the same results, as if the initial
top-level invocation reported them (if it even could).

The key algorihtm lives in a standlone function
`yield_subdir_match_remainder_pathspecs()` that performs a purely
lexical analysis. It also comes with a dedicated test collection that is
leaner and easier to extend than the previous one (which remains also).

The additionally included sketch of a testbattery uses ``git ls-files --other`
for testing, rather than a formal description -- because the behavior
of the implementation is more elaborate than the documentation at
https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec
suggests.

All testing is (for now) performed within a single repository, and with
translation for execution in subdirectories.

Refs: datalad#587, datalad/datalad#6933
The purpose of this class is to be able to serve as an (internal)
handler for any and all pathspec-like arguments.
Basic tests are included that cover globbing types, but also
matching by gitattributes. Especially the latter is something
that is hard to achieve without pathspec support.
This alters the `pathspec` evaluation to yield submodules that
*may* have content matching any pathspec, rather than the
pathspec having to match a submodule item directly.

This feature can be useful for implementing submodule recursion
around Git commands that do not support submodule recursion
directly.
The plan is to monitor yielded items, looking for submodules. If one is
found, call the hlper again, with translates pathspecs on and for
the submodule.
This plan is to use this helper as a standard for such conversions,
rather than the establish complex conditional in the code.
Previously, we passed file system query related flags down to a helper.
Now we are handling this at the top-level, with item ppost-processing.
This decomplexifies the logic, simplifies the function(signature)s.
This change prepares for an extension of the mode collection to only
report untracked files.
This change is a preparation for replacing the `gitstatus()` helper
`_yield_repo_untracked()`, and remove that now duplicate implementation
with this standard iterator.
This will avoid having to implement pathspec support in two closely
related implementation. This change is made possible by
`iter_gitworktree()` newly acquired ability to report on untracked
content only.
There is a new `any_match_subdir()` that provides the fast-exit test for
a subdir-scope match that is implemented in `iter_submodules()`.
Moreover, there is standard `__eq__()` method to aid identity
comparisions.

The tests of `GitPathSpecs` have been brought to 100% coverage. This
raised a bunch of questions regarding semantics. What is the diffrence
between:

- `GitPathSpecs(None)`
- `GitPathSpecs([])`
- `GitPathSpecs([':'])`

This needs to be documented.
Replaces a custom implementation that was the inspiration for the new
class helper.
…tests

Disabling untracked reporting is done with the parameter value `None`.
This is uniformly communicated in the docstrings. Only the tests used
and undocumented `'no'` value.

This change removes support for `'no'` and aligns the implementation
with the documentation.
We can now benefit from the recently introduced submodule recursion, and
need not implement that separately.
This change is motivated by the need to be able to distinguish
no-constraint scenarios from the inability to translate a constraint
into a particular directory scope -- which typically means that no match
would be possible.

A `None` argument to `GitPathSpecs` is intepreted as no-contraint (i.e.,
a `:` pathspec), which translates to an empty `arglist()` return.

It was confusing that an empty list returned from `for_subdir()`
simultaneously meant 'no-match' rather than 'no-constraint'.

This ambiguity is now resolved with an explicit `ValueError` for the
path that produces a `no-match` result. This enables the implementation
of early-stopping testing in client code -- which is demo'd for
`iter_gitworktree()` in this changeset.
@mih mih mentioned this pull request Jun 11, 2024
6 tasks
@mih mih merged commit a447b48 into datalad:main Jun 11, 2024
9 checks passed
@mih mih deleted the gitps branch June 11, 2024 09:49
record for the containing immediate subdirectory ``path`` is yielded.
For example, with 'path/subdir/file1' and 'path/subdir/file2' there
will only be a single item with ``name='subdir'`` and
``type='directory'``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added arg pathspecs seems was not added/described.


@dataclass(frozen=True)
class GitPathSpec:
"""Support class for patterns used to limit paths in Git commands
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my own information:
it reimplements (no git commands involved) what git implements for pathspec handling? as of which git version ( I bet it changes once in a while)?

It also operates without awareness of what is in git or not, and purely on filesystem level? (I wonder then how in general it would work for e.g. datalad status on deleted files with a pathspec limit matching their names)

here is how it looks with master -- may be pathrefs are not interfaced to status yet at all
❯ git status
On branch master
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    123.jpg

no changes added to commit (use "git add" and/or "git commit -a")
❯ git status ':(glob)*.jpg'
On branch master
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	deleted:    123.jpg

no changes added to commit (use "git add" and/or "git commit -a")
❯ datalad status ':(glob)*.jpg'
CommandError: 'git -c diff.ignoreSubmodules=none ls-tree HEAD -z -r --full-tree -l -- ':(glob)*.jpg'' failed with exitcode 128
fatal: :(glob)*.jpg: pathspec magic not supported by this command: 'glob'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so apparently even not all git commands implement pathrefs uniformly across them :-/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add GitPathSpec support class and constraint
2 participants