You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While iter_gitstatus() builds on iter_gitdiff(), the implementations are quite different. This is not good, because from afar iter_gitdiff() should be nothing more than iter_gitstatus() without reporting on untracked content and vice versa.
When looking into porting the iter_gitstatus() tests to the undertested iter_gitdiff() I find their behavior to be (slightly) different in the non-standard modes -- likely an indication of bugs.
The implementations should be aligned. I suspect that the implementation complexity of iter_gitstatus() will go down, and that of iter_gitdiff() will go up a bit.
Few aspects that I believe are true and should be reflected in the implementations:
iter_gitstatus() is iter_gitdiff() + untracked reporting
there are only two basic algorithms that need to be implemented
(1) reporting on the content of a container (directory/submodule)
(2) evaluate the modifications of a container (without recursion; ie. no reporting on individual content inside the container)
(2) is just (1) wrapped into an aggregation of modification type (plus an early stopping criterion)
The semantics of the -r and the future pathspecs arguments also need to be define clearly and be aligned across implementations. This is tricky, because Git's own handling of pathspecs sets no example for translation across submodules. Here is an example:
"Give me any modifications of *.py files across the entire dataset hierarchy"
datalad next-status -r '*.py'
Git uses the pathspec to limit the reporting. I believe we need to extend that concept to the evaluation of containers. I propose the following semantics:
--recursion should be something like --report-recursion...
It determines the items that are considered for evaluation:
no: only direct children of the base path are evaluated (no reporting on individual content inside them)
repository: only items underneath the base path AND in the same repository are evaluated
submodules: all items in the full dataset hierarchy underneath the base path are evaluated
With no pathspecs given, any modifications of any content are considered. All items that are not recursed into are reported on (see yield_tree_items for reporting the containers too).
With pathspecs given, a constrained is put on what is being considered for evaluation:
if a file does not match the pathspecs, it will be ignored
any content in a container (directory/submodule) that does not match the pathspecs is ignored too
Going back to
datalad next-status -r '*.py'
The proposed rules have the key consequence that, unlike Git, the above command would report on modifications of submodules where their respective name does NOT match the pathspec *.py. Moreover, for any submodule (or directory) only modifications of items with a name matching *.py are considered.
Only then we get "Give me any modifications of *.py files" in a way that truly treats a hierarchy of repositories as if they would be a monolithic dataset.
The text was updated successfully, but these errors were encountered:
While
iter_gitstatus()
builds oniter_gitdiff()
, the implementations are quite different. This is not good, because from afariter_gitdiff()
should be nothing more thaniter_gitstatus()
without reporting on untracked content and vice versa.When looking into porting the
iter_gitstatus()
tests to the undertestediter_gitdiff()
I find their behavior to be (slightly) different in the non-standard modes -- likely an indication of bugs.The implementations should be aligned. I suspect that the implementation complexity of
iter_gitstatus()
will go down, and that ofiter_gitdiff()
will go up a bit.Few aspects that I believe are true and should be reflected in the implementations:
iter_gitstatus()
isiter_gitdiff()
+ untracked reportingThe semantics of the
-r
and the futurepathspecs
arguments also need to be define clearly and be aligned across implementations. This is tricky, because Git's own handling of pathspecs sets no example for translation across submodules. Here is an example:"Give me any modifications of
*.py
files across the entire dataset hierarchy"Git uses the pathspec to limit the reporting. I believe we need to extend that concept to the evaluation of containers. I propose the following semantics:
--recursion
should be something like--report-recursion
...It determines the items that are considered for evaluation:
no
: only direct children of the basepath
are evaluated (no reporting on individual content inside them)repository
: only items underneath the basepath
AND in the same repository are evaluatedsubmodules
: all items in the full dataset hierarchy underneath the basepath
are evaluatedWith no
pathspecs
given, any modifications of any content are considered. All items that are not recursed into are reported on (seeyield_tree_items
for reporting the containers too).With
pathspecs
given, a constrained is put on what is being considered for evaluation:Going back to
The proposed rules have the key consequence that, unlike Git, the above command would report on modifications of submodules where their respective name does NOT match the pathspec
*.py
. Moreover, for any submodule (or directory) only modifications of items with a name matching*.py
are considered.Only then we get "Give me any modifications of
*.py
files" in a way that truly treats a hierarchy of repositories as if they would be a monolithic dataset.The text was updated successfully, but these errors were encountered: