
Add GitPathSpec support class and constraint #587

Closed
mih opened this issue Jan 8, 2024 · 2 comments · Fixed by #719

Comments

@mih
Member

mih commented Jan 8, 2024

Looking into #586, it seems attractive to start supporting Git's pathspecs properly: https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec

The main development challenge would be supporting the mangling of pathspecs for subdataset recursion that is implemented outside Git: for example, stripping away a path prefix when a pathspec is passed on to another call on a subdataset, or detecting that a particular pathspec would no longer apply to a subdataset.

The specification suggests great flexibility. The key statement/property seems to be:

> The pathspec up to the last slash represents a directory prefix. The scope of that pathspec is limited to that subtree.

This implies that we need to implement enough pathspec parsing to be able to extract and manipulate that directory prefix.

Such a manipulation might be as simple as removing segments of the `/`-delimited directory prefix or glob.
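A minimal sketch of such a prefix manipulation in Python, assuming a pattern without magic signatures and with `/` separators. `directory_prefix()` and `strip_subdir()` are hypothetical helpers, not datalad-next API. The sketch deliberately refuses to decide (raises `ValueError`) when a wildcard sits in a segment that would have to be stripped, since that is the hard case.

```python
from typing import Optional

# characters that turn a pathspec segment into a glob (simplified)
WILDCARDS = frozenset("*?[")


def directory_prefix(pattern: str) -> str:
    """Return the part of a pattern up to (not including) its last slash."""
    return pattern.rpartition("/")[0]


def strip_subdir(pattern: str, subdir: str) -> Optional[str]:
    """Re-root a magic-free pattern for use inside ``subdir`` (hypothetical).

    Returns None if the pattern is scoped to a different subtree, and raises
    ValueError for cases this purely segment-based rule cannot decide.
    """
    sub_parts = [p for p in subdir.split("/") if p]
    prefix_parts = [p for p in directory_prefix(pattern).split("/") if p]
    if len(sub_parts) > len(prefix_parts):
        raise ValueError("subdirectory reaches beyond the directory prefix")
    for sub, pat in zip(sub_parts, prefix_parts):
        if WILDCARDS & set(pat):
            raise ValueError(f"wildcard in prefix segment {pat!r}")
        if sub != pat:
            return None  # this pattern cannot match anything in subdir
    # drop the matched leading segments; the remainder applies inside subdir
    return "/".join(pattern.split("/")[len(sub_parts):])
```

For example, `strip_subdir("sub/deep/*.txt", "sub")` returns `"deep/*.txt"`, and `strip_subdir("other/x.txt", "sub")` returns `None` because that pathspec no longer applies to `sub`.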

For operation on Windows, we might also want to convert platform paths to POSIX paths for simplicity, but I have not yet tested how Git behaves there.
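If that conversion turns out to be needed, Python's `pathlib` can do it for plain platform paths. A minimal sketch (`to_posix()` is a hypothetical name); it should only be applied to paths before they are turned into pathspecs, because a backslash inside a glob pattern may be intended as an escape character rather than a separator.

```python
from pathlib import PureWindowsPath


def to_posix(path: str) -> str:
    """Convert a relative Windows-style path to '/'-delimited form.

    PureWindowsPath accepts both backslash and '/' as separators, so
    input that is already POSIX-style passes through unchanged.
    """
    return PureWindowsPath(path).as_posix()
```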

@mih
Member Author

mih commented Jan 9, 2024

It turns out that mangling pathspecs for submodules (subdirectories) is not trivial. Supporting all possible combinations of magic is laborious, but seems doable. The main challenge is dealing with wildcards. When translating a pathspec for a subdirectory, we do not have a full path to match against (only the subdirectory). I cannot come up with a rule for when and how to decide whether to strip the longest or the shortest match from the pathspec. The best I can come up with is to multiply the pathspecs whenever they contain `*`, and use both the shortest and the longest match for a subdirectory.
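A minimal sketch of that multiplication in Python, under simplifying assumptions: only literal characters and `*` are handled (no `?`, `[...]`, `**`, or magic), `*` may match `/` as in Git's default (non-`glob`) pathspec matching, and the directory-prefix matching of a pathspec that ends before the subdirectory does is not modeled. `subdir_remainders()` is a hypothetical name. For each `*` it keeps the longest match (the `*` swallows the rest of the subdirectory and stays in the remainder) as well as every shorter split that lets the rest of the pattern consume the subdirectory path.

```python
from typing import List


def subdir_remainders(pattern: str, subdir: str) -> List[str]:
    """Translate a pattern for use inside ``subdir`` (hypothetical sketch).

    Returns all distinct remainder patterns; an empty list means the
    pattern cannot match anything inside ``subdir``.
    """
    results: List[str] = []

    def walk(pat: str, path: str) -> None:
        if not path:
            # the subdirectory (incl. its trailing slash) is fully consumed
            if pat not in results:
                results.append(pat)
            return
        if not pat:
            return  # pattern ended before the subdirectory did (not modeled)
        if pat[0] == "*":
            # longest match: '*' swallows the rest of the subdirectory
            # and may continue inside it, so it stays in the remainder
            walk(pat, "")
            # shorter matches: '*' ends somewhere within the subdirectory path
            for i in range(len(path)):
                walk(pat[1:], path[i:])
        elif pat[0] == path[0]:
            walk(pat[1:], path[1:])

    walk(pattern, subdir.strip("/") + "/")
    return results
```

For example, `subdir_remainders("*/x", "a/b")` returns `['*/x', 'x']`: inside `a/b`, a file `x` is matched only by the second pattern and `c/x` only by the first. `subdir_remainders("other/*.txt", "sub")` returns an empty list.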

mih added a commit to mih/datalad-next that referenced this issue Jan 9, 2024
The main (if not only) purpose of this functionality is pathspec
mangling/translation for handing pathspecs over to analogous Git command
calls on submodules, for any Git command that supports pathspecs
but not recursion.

A simple example of such a command is `git ls-files --others`. It
accepts pathspecs, but does not support `--recurse-submodules` when
listing untracked files.
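Because Git will not recurse in this mode, a caller has to run the command once per submodule, with pathspecs translated for that location. A minimal sketch of such a call; the helper names are hypothetical, while `git -C`, `ls-files -z --others`, and the `--` separator are standard Git options. Running `list_untracked()` requires a Git repository at `repo`.

```python
import subprocess
from typing import List, Sequence


def ls_files_others_argv(repo: str, pathspecs: Sequence[str]) -> List[str]:
    """Build the command line for listing untracked files in ``repo``."""
    # -C runs git in `repo`; -z NUL-terminates entries; '--' ends the options
    return ["git", "-C", repo, "ls-files", "-z", "--others", "--", *pathspecs]


def list_untracked(repo: str, pathspecs: Sequence[str]) -> List[str]:
    """Run the command and split its NUL-terminated output."""
    out = subprocess.run(
        ls_files_others_argv(repo, pathspecs), capture_output=True, check=True
    ).stdout
    return [p for p in out.decode("utf-8").split("\0") if p]
```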

The goal of this functionality is to take pathspecs that are
valid in the context of a top-level repository, and translate them such
that the set of pathspecs given to the same command running on/in a
submodule/subdirectory yields the same results as if the initial
top-level invocation had reported them (if it even could).
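That goal can be phrased as a property to check. A minimal sketch, assuming `fnmatch.fnmatchcase()` as a stand-in for Git's matcher (it lets `*` match `/`, but models neither directory-prefix matching nor magic); the function names and sample paths are hypothetical. It compares what a top-level call would report for a subdirectory against what a call inside that subdirectory reports with translated pathspecs.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Sequence, Set


def matches(specs: Sequence[str], paths: Iterable[str]) -> Set[str]:
    """Return the paths selected by any of the glob-style patterns."""
    return {p for p in paths if any(fnmatchcase(p, s) for s in specs)}


def same_results(top_specs: Sequence[str], sub_specs: Sequence[str],
                 subdir: str, paths: Sequence[str]) -> bool:
    """Do translated specs, run inside subdir, select the same files?"""
    prefix = subdir.strip("/") + "/"
    # what a top-level invocation reports for files inside the subdirectory
    expected = {p for p in matches(top_specs, paths) if p.startswith(prefix)}
    # what an invocation inside the subdirectory reports, mapped back
    inside = [p[len(prefix):] for p in paths if p.startswith(prefix)]
    reported = {prefix + p for p in matches(sub_specs, inside)}
    return expected == reported


paths = ["x", "a/x", "a/b/x", "a/b/c/x", "a/b/y"]
print(same_results(["*/x"], ["*/x", "x"], "a/b", paths))  # True
print(same_results(["*/x"], ["*/x"], "a/b", paths))  # False: misses a/b/x
```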

The included sketch of a test battery uses `git ls-files --others`
for testing, rather than a formal description, because Git's actual
behavior is more elaborate than the documentation at
https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec
suggests.

All testing is (for now) performed within a single repository, and with
translation for execution in subdirectories.

The implementation is a rough sketch for exploring the problem, rather
than anything polished.

Ping datalad#587

Also see datalad/datalad#6933
@mih
Member Author

mih commented Jan 17, 2024

#592 provides code and a test environment to continue the development of #588.

mih added a commit to mih/datalad-next that referenced this issue Jan 26, 2024
mih added a commit to mih/datalad-next that referenced this issue Jan 29, 2024
mih added a commit to mih/datalad-next that referenced this issue May 23, 2024
mih added a commit to mih/datalad-next that referenced this issue May 24, 2024
mih added a commit to mih/datalad-next that referenced this issue May 27, 2024
mih added a commit to mih/datalad-next that referenced this issue May 27, 2024
It is a thin proxy for `GitPathSpec.from_pathspec_str()`.

The error message template includes `__itemized_causes__` to ensure
that underlying violation causes are reported to a user.

Refs: datalad#587
mih mentioned this issue May 27, 2024
mih added a commit to mih/datalad-next that referenced this issue Jun 5, 2024
mih added a commit to mih/datalad-next that referenced this issue Jun 10, 2024
The main (if not only) purpose of this functionality is pathspec
mangling/translation for handing pathspecs over to analogous Git command
calls on submodules, for any Git command that supports pathspecs
but not recursion.

A simple example of such a command is `git ls-files --others`. It
accepts pathspecs, but does not support `--recurse-submodules` when
listing untracked files.

The goal of this functionality is to take pathspecs that are
valid in the context of a top-level repository, and translate them such
that the set of pathspecs given to the same command running on/in a
submodule/subdirectory yields the same results as if the initial
top-level invocation had reported them (if it even could).

The key algorithm lives in a standalone function
`yield_subdir_match_remainder_pathspecs()` that performs a purely
lexical analysis. It also comes with a dedicated test collection that is
leaner and easier to extend than the previous one (which also remains).

The additionally included sketch of a test battery uses `git ls-files --others`
for testing, rather than a formal description, because Git's actual
behavior is more elaborate than the documentation at
https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec
suggests.

All testing is (for now) performed within a single repository, and with
translation for execution in subdirectories.

Refs: datalad#587, datalad/datalad#6933
adswa pushed a commit to mih/datalad-next that referenced this issue Jun 10, 2024
mih added a commit to mih/datalad-next that referenced this issue Jun 11, 2024
mih closed this as completed in #719 Jun 11, 2024
mih added a commit that referenced this issue Jul 21, 2024
mih added a commit that referenced this issue Jul 21, 2024