Reduce pipeline processes #298
base: maint_0.4
Codecov Report
Patch and project coverage have no change.
Additional details and impacted files
@@            Coverage Diff            @@
##           maint_0.4     #298   +/-   ##
==========================================
  Coverage      86.27%   86.27%
==========================================
  Files             88       88
  Lines           4830     4830
==========================================
  Hits            4167     4167
  Misses           663      663
That's a lot of commits/work -- is it going to be merged?
This commit adds `build` and `twine` to `requirements-devel.txt`. It also moves the Sphinx dependencies into the development requirements and updates the datalad version to >=0.17. In addition, it sorts the entries in `requirements-devel.txt` and `requirements.txt`.
This commit introduces AnnexedFileInfo, which holds annex-status information for a single file. To simplify handling, the dataclasses_json package is used and added to the requirements. The Python version requirement has been set to >=3.7.
This commit extends the FileInfo dataclass and derives the AnnexedFileInfo class from it. The classes hold file information that is returned by AnnexRepo.get_content_annexinfo() or by GitRepo.status(). It adds a parameter to pass JSON-serialized FileInfo or AnnexedFileInfo objects to the extract process via arguments, thus removing the need to invoke git-annex to determine the file status.
This commit uses the --file-info parameter to provide extractors with status information about the element from which metadata should be extracted.
This commit adds code to handle repositories that do not possess an ID; usually these are plain git repositories.
This commit adds a pipeline provider and a pipeline processor with definable input/output behavior. Both the content that is yielded and the rate at which it is yielded can be defined externally. This allows repeatable performance measurements.
This commit adds information about the object-id and the processor pid of the processor and provider probes that are executed by meta-conduct.
This commit adds an invocation count to the processor probe that counts the invocations on this instance of the probe.
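To illustrate the probe idea from the commits above, here is a minimal sketch of a provider whose content and yield rate are both defined by the caller. It is not the PR's implementation: the name `probe_provider`, its parameters, and the injectable `sleep` argument are all hypothetical and exist only to make the behavior repeatable in a test.

```python
import time
from typing import Any, Callable, Iterable, Iterator


def probe_provider(
    content: Iterable[Any],
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Any]:
    """Yield the given items, pausing a fixed interval before each one.

    Hypothetical helper: both the content and the rate are supplied by
    the caller, so a measurement run can be repeated with identical input.
    """
    for item in content:
        sleep(interval_seconds)
        yield item


# Record the requested pauses instead of actually sleeping.
pauses = []
result = list(probe_provider(["a", "b"], 0.5, sleep=pauses.append))
```

Passing a recording function as `sleep` keeps the example fast and deterministic; a real measurement would leave the default `time.sleep` in place.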
This commit rebases the branch on maint_0.4 and adds code to check for the existence of dataset IDs.
This commit fixes the reporting of datasets in traversal. There is still something to do for datasets, i.e. reporting "state", "gitshasum", and "prev_gitshasum".
This commit adds size information to the traverser output for non-annexed files. Because git-ls-files does not provide the size information, an additional git-ls-tree call is used to determine the sizes of non-annexed files.
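As a rough sketch of how sizes can be read from git-ls-tree, the snippet below parses the long listing that `git ls-tree -l` prints (mode, type, object name, size, then a tab and the path). This is one possible approach and not necessarily the call the PR uses; the function name and the sample listing are made up for illustration.

```python
from typing import Dict, Optional


def parse_ls_tree_long(output: str) -> Dict[str, Optional[int]]:
    """Map each path in `git ls-tree -l` output to its size in bytes.

    Entries without a size (git prints "-" for trees and submodules)
    map to None.
    """
    sizes: Dict[str, Optional[int]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Each line is "<mode> <type> <object> <size>\t<path>";
        # the size column is space-padded, so split on any whitespace.
        meta, path = line.split("\t", 1)
        _mode, _type, _object_id, size = meta.split()
        sizes[path] = None if size == "-" else int(size)
    return sizes


# Hypothetical sample, shaped like the output of `git ls-tree -r -t -l HEAD`.
sample = (
    "100644 blob 8f94139338f9404f26296befa88755fc2598c289     120\tREADME.md\n"
    "040000 tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904       -\tdocs\n"
    "100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391       0\tdocs/empty.txt\n"
)
sizes = parse_ls_tree_long(sample)
```

Note that git quotes paths containing unusual characters unless `-z` is given, which this sketch does not handle.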
Git version 2.30.2, which is currently available in debian-stable, does not support [...]. I very much like the spirit of this PR, and I'm not sure whether debian-stable compatibility is a goal, so I'm just leaving this as an observation.
This commit improves the code to handle the result of `git ls-files` properly, especially the size-attribute of directories.
This PR fixes #261 partially and fixes #268
This PR modifies the dataset traverser component to send almost all information it has about the traversed dataset elements to the processors. That reduces the number of processes that the processors have to execute.
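The hand-off described here, where the traverser serializes what it already knows so that a processor does not have to start another process, could look roughly like the sketch below. It assumes illustrative field names (`path`, `type`, `byte_size`, `annex_key`, `has_content`) that are not taken from the PR, and it uses the standard library's `dataclasses` and `json` instead of the dataclasses_json package so that it runs without extra dependencies.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FileInfo:
    # Field names are illustrative, not the PR's actual schema.
    path: str
    type: str
    byte_size: int


@dataclass
class AnnexedFileInfo(FileInfo):
    # Extra, annex-specific fields (also illustrative).
    annex_key: Optional[str] = None
    has_content: bool = False


def to_argument(info: FileInfo) -> str:
    """Serialize file information into one JSON string for a command line."""
    return json.dumps(asdict(info), sort_keys=True)


def from_argument(argument: str) -> FileInfo:
    """Rebuild the object; the presence of annex fields selects the class."""
    data = json.loads(argument)
    cls = AnnexedFileInfo if "annex_key" in data else FileInfo
    return cls(**data)


info = AnnexedFileInfo(
    path="data/a.bin",
    type="file",
    byte_size=1024,
    annex_key="hypothetical-key",
    has_content=True,
)
argument = to_argument(info)   # value that would be passed via --file-info
restored = from_argument(argument)
```

The receiving side only needs the JSON string, so the status lookup happens once in the traverser rather than once per extracted element.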