NF: datalad tree command #92

Merged on Aug 24, 2022
135 commits
0c83a30
Port test suite to pytest
mih Jun 8, 2022
fcd846d
Temporarily depend on datalad-not-yet-0.17
mih Jun 8, 2022
501b124
register datalad tree command, dummy implementation
catetrai Jun 29, 2022
3cdeb2f
bulk of implementation with 2 passing tests
catetrai Jul 3, 2022
089226b
clean up docstrings / comments
catetrai Jul 3, 2022
6f191ec
set default depth to 1 (prevents annoying wall-of-text if forgot to s…
catetrai Jul 3, 2022
c17868b
remove parameter --full-paths, does not add much value
catetrai Jul 4, 2022
cafd3d3
set up parametrized tests to cover combinations of datalad tree options
catetrai Jul 4, 2022
7e7ab1d
major refactoring, add tests for tree stats
catetrai Jul 6, 2022
da31023
fix class name registered as command implementation class
catetrai Jul 6, 2022
24ace59
customize print format for different node types
catetrai Jul 6, 2022
599f854
fix command parameter names and examples
catetrai Jul 6, 2022
689c303
clean up docstrings
catetrai Jul 6, 2022
de259a4
add methods for yielding string output lines
catetrai Jul 7, 2022
5c9ca96
add test for normalization of root path
catetrai Jul 7, 2022
4a84ee6
fix incorrect string creation in to_string()
catetrai Jul 7, 2022
2518d63
add tests for trees with datasets
catetrai Jul 7, 2022
17d6091
reinstate --full-paths param (useful in combination with --datasets-o…
catetrai Jul 7, 2022
deb06e3
support color terminal output for directories and dataset paths
catetrai Jul 7, 2022
7326603
allow --depth=0 (useful in combination with --dataset-depth)
catetrai Jul 7, 2022
888dfe5
clean up comment, removed unused imports
catetrai Jul 7, 2022
f2eb743
add tests for stats with datasets
catetrai Jul 8, 2022
4691966
remove failing tests for --datasets-only (impl needs rework)
catetrai Jul 8, 2022
5fd855d
fix false-positive detection of datasets on pure git repos
catetrai Jul 8, 2022
6401f55
improve efficiency of dataset detection
catetrai Jul 11, 2022
364a925
remove argument --datasets-only, to be replaced with dataset subtree …
catetrai Jul 11, 2022
00cbab9
remove redundant logic in directory walk
catetrai Jul 12, 2022
9cb0baf
store last children as set for faster is_last_child check
catetrai Jul 12, 2022
5e65110
extract is_dataset() into function outside class for easier reuse, ad…
catetrai Jul 12, 2022
c50f7b9
WIP: refactor using pathlib for simpler node generation
catetrai Jul 15, 2022
9a1b6fc
separate low-level vs public node generator to allow using decorator …
catetrai Jul 15, 2022
537a753
add tests for report line with max_datasets
catetrai Jul 24, 2022
0ced1e9
precalculate dataset location (WIP, tests are failing)
catetrai Jul 24, 2022
12b6160
replace parameters include_hidden and include_files with generic call…
catetrai Jul 25, 2022
a4bf0b1
reimplement dataset tree using exclusion function (all tests are pass…
catetrai Jul 25, 2022
993999a
optimize search of child datasets by starting from the current path a…
catetrai Jul 26, 2022
c892152
update command parameter help texts
catetrai Jul 26, 2022
59592bb
generate stats string automatically based on subclasses of _TreeNode
catetrai Jul 26, 2022
602f161
extract symbols for indentation display
catetrai Jul 26, 2022
1809294
replace remaining usages of os module with pathlib
catetrai Jul 26, 2022
bc0316b
refactor test suite using test classes
catetrai Jul 26, 2022
04717c3
cleanup for readability
catetrai Jul 26, 2022
c7fba5c
move placement of class declarations
catetrai Jul 26, 2022
31856d2
clean up remaining vestiges of non-Path paths
catetrai Jul 26, 2022
daf1957
cache results of ds.get_superdatasets() which brings a modest speed i…
catetrai Jul 26, 2022
9a938f0
use Path obj in test
catetrai Jul 27, 2022
f00dca6
resolve merge conflicts
catetrai Jul 27, 2022
0ccac9b
improve docstrings, convert to numpy format
catetrai Jul 27, 2022
0329d2d
specify explicit command name in entrypoints (otherwise, does not gen…
catetrai Jul 27, 2022
b19319d
rewording in docstring
catetrai Jul 27, 2022
7286763
remove short-form options for now (TBD)
catetrai Jul 27, 2022
7a5fb88
replace pathlib function with implementation compatible with python<3.9
catetrai Jul 27, 2022
f191337
change appearance of dataset marker, place before path for tidier dis…
catetrai Jul 28, 2022
3d12474
use common call to Tree/DatasetTree constructor
catetrai Jul 28, 2022
7a74655
use custom result renderers
catetrai Jul 29, 2022
5730693
improve search algorithm: cache results of git operations, use 1 fixe…
catetrai Jul 30, 2022
054cb98
update docstrings
catetrai Jul 30, 2022
2745065
reword docstrings
catetrai Jul 30, 2022
c71a1b3
do not print generic render output (command with status 'ok') in cust…
catetrai Aug 1, 2022
ac99ee3
get subdatasets by calling command to avoid import of full dataset API
catetrai Aug 1, 2022
80d0147
major refactor: move all tree2string logic to custom renderer in comm…
catetrai Aug 4, 2022
0c232f0
explicitly specify state 'any' for subdatasets
catetrai Aug 4, 2022
21b9125
minor rewordings of comments
catetrai Aug 4, 2022
045a813
only show file count in stats line if --include-files option is given
catetrai Aug 4, 2022
793ecaa
use ui.message() instead of print() in result renderers
catetrai Aug 4, 2022
bab4830
remove unneeded Tree attribute 'skip_root'
catetrai Aug 4, 2022
5f2f2d5
start ds_generator from node below the root node
catetrai Aug 4, 2022
f66e30f
rename 'visited_parents' to 'visited' since we store all yielded nodes
catetrai Aug 4, 2022
6004d51
cast exhausted_subtrees set to list in results dict for easier conver…
catetrai Aug 4, 2022
9d074fd
remove redundant constructors for FileNode and DirectoryNode, which j…
catetrai Aug 4, 2022
f7ee925
replace print with ui.message in tests as well (fixes encoding error …
catetrai Aug 4, 2022
d3c8243
get dataset's pathobj property instead of re-instantiating Path object
catetrai Aug 5, 2022
d50653a
use context manager 'make_tempfile' for deleting temp dir
catetrai Aug 5, 2022
54f4f87
clean up imports, improve docstring wording
catetrai Aug 6, 2022
b9caf11
first error handling impl with checks for OSErrors and circular symlinks
catetrai Aug 6, 2022
e67be06
clean up imports formatting
catetrai Aug 6, 2022
a9e7d32
assert that directory exists before deleting it
catetrai Aug 6, 2022
45b73f9
add commented option to compare test result with tree command output
catetrai Aug 7, 2022
dfb3856
raise exception if trying to calculate depth of path outside the tree…
catetrai Aug 7, 2022
b9ca3ef
return true for is_recursive_symlinks() if link points to itself
catetrai Aug 7, 2022
6a047fa
detect dataset without ds.id if it has metadata aggregator
catetrai Aug 7, 2022
e101c85
add debug logging
catetrai Aug 7, 2022
7e16f26
use None as dummy depth instead of -1 (allowed value)
catetrai Aug 7, 2022
1c6b9a3
improve documentation of TestTree base class
catetrai Aug 7, 2022
8f46400
move tests for filesystem issues to separate test class, add test for…
catetrai Aug 7, 2022
ae1c6ad
add test for no difference if tree root is absolute or relative path
catetrai Aug 7, 2022
ee162cd
expand test for 0-depth tree
catetrai Aug 7, 2022
99b627d
fix logic for recursive symlink detection, add logging
catetrai Aug 7, 2022
de16e02
support symlinks in result dict and custom renderer
catetrai Aug 7, 2022
6b2aec3
skip test for missing permissions if on windows
catetrai Aug 7, 2022
5bfaf00
on windows, symlink loop raises OSError
catetrai Aug 7, 2022
e5a0e50
use platform-specific path separator for display of relative symlinks
catetrai Aug 7, 2022
26e2f17
catch OSError on windows for self-referencing symlink
catetrai Aug 7, 2022
f3d7a17
add (failing) test for broken symlinks pointing to inaccessible files…
catetrai Aug 12, 2022
322e763
handle permission error when symlink points to file under inaccessibl…
catetrai Aug 12, 2022
69fa591
validate input of is_broken_symlink(), catch PermissionError as separ…
catetrai Aug 12, 2022
6db0e12
discard exhausted_levels deeper than the current node's depth (not ne…
catetrai Aug 13, 2022
9c7990b
replace 'with_tempfile' decorator with fixture to allow passing multi…
catetrai Aug 13, 2022
858ab68
split test for broken symlinks with vs without permission errors (to …
catetrai Aug 13, 2022
ad482ce
make uniform usage of pathlib's resolve()
catetrai Aug 13, 2022
5ba7bdb
handle permission error for input root path in Tree constructor
catetrai Aug 13, 2022
55bf11d
create nodes using single factory class to centralize error handling
catetrai Aug 13, 2022
46c0881
exclude_node_func() now accepts node object as arg instead of path
catetrai Aug 13, 2022
ae574f6
move is_recursive_symlink() to _TreeNode method, extract path_depth()…
catetrai Aug 13, 2022
e4d041d
Cannot use too-modern type annotation (yet)
mih Aug 15, 2022
e3acc01
Merge branch 'main' into nf-tree
mih Aug 15, 2022
d4664fd
Configure ReadTheDocs to use PY3.9 to handle the type annotations
mih Aug 15, 2022
31cdba9
Adjust test skip condition to match problem
mih Aug 15, 2022
7a66ba6
disable results rendering for ds.create() calls in test setup
catetrai Aug 15, 2022
990af0c
remove redundant calls to path.relative_to() in path_depth() (suggest…
catetrai Aug 15, 2022
a551e25
remove unnecessary calls to is_path_relative_to() to improve performance
catetrai Aug 15, 2022
37281a3
get parent ds from stored visited nodes in _ds_child_node_exceeds_max…
catetrai Aug 15, 2022
445fb22
remove is_dir() check to limit system calls (expensive on huge direct…
catetrai Aug 15, 2022
7b1f5b1
add parameter 'installed_only' to is_dataset() for skipping non-insta…
catetrai Aug 15, 2022
af01420
cache results of get_dataset_root_datalad_only() (suggested by @mih)
catetrai Aug 15, 2022
2311a1e
skip unnecessary input validation in Node() constructor
catetrai Aug 15, 2022
070953d
do not count datasets that are only metadata aggregators (because of …
catetrai Aug 15, 2022
744caaa
update tree command docs
catetrai Aug 16, 2022
e32cf58
reword args descriptions in tree command docs
catetrai Aug 16, 2022
8510dd6
fix lru_cache decorator syntax for python 3.7 compatibility
catetrai Aug 17, 2022
1be6a77
Merge branch 'main' into nf-tree
catetrai Aug 17, 2022
c64ff43
add test for dataset tree with resulting directory depth that exceeds…
catetrai Aug 20, 2022
cebaf60
add test for dataset tree when there are no datasets
catetrai Aug 20, 2022
26a35cf
fix formatting of multiple imports
catetrai Aug 20, 2022
b3d1ac3
add note on performance of --dataset-depth option
catetrai Aug 20, 2022
bea50e3
reword docstrings / log messages
catetrai Aug 20, 2022
0b7808d
cast Tree root to Path object in constructor
catetrai Aug 20, 2022
ca689be
add '__repr__' methods to classes
catetrai Aug 20, 2022
d290d41
rewording in docstrings
catetrai Aug 20, 2022
79732a7
compute whole dataset tree upfront instead of yielding in tandem with…
catetrai Aug 20, 2022
1a02df5
use %-string formatting for log messages (evaluated only if log is em…
catetrai Aug 20, 2022
90ff0e9
rename option --dataset-depth to --recursion-limit and add short form
catetrai Aug 22, 2022
a864693
add option --recursive for unlimited-depth dataset tree
catetrai Aug 22, 2022
1d8146e
Merge branch 'main' into nf-tree
catetrai Aug 22, 2022
8634618
Add changelog snippet
mih Aug 24, 2022
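
Several commits above mention path helpers: a replacement for a pathlib function that is missing on Python < 3.9, `is_path_relative_to()`, and a `path_depth()` that raises when the path lies outside the tree. A minimal sketch of what such helpers could look like follows. The names come from the commit messages, but the signatures and bodies are assumptions and not the code merged in this PR.

```python
from pathlib import PurePath


def is_path_relative_to(path: PurePath, root: PurePath) -> bool:
    """Stand-in for Path.is_relative_to(), which only exists on Python >= 3.9."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        # relative_to() raises ValueError if `path` is not below `root`
        return False


def path_depth(path: PurePath, root: PurePath) -> int:
    """Number of levels between `root` and `path`; the root itself has depth 0."""
    try:
        return len(path.relative_to(root).parts)
    except ValueError:
        # assumed behavior: refuse to compute a depth for paths outside the tree
        raise ValueError(f"{path} is not below tree root {root}") from None
```

Both helpers rely only on `PurePath.relative_to()`, which is available on every supported Python 3 version.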
11 changes: 11 additions & 0 deletions changelog.d/20220824_085736_michael.hanke_nf_tree.md
@@ -0,0 +1,11 @@
### 💫 Enhancements and new features

- New `tree` command for traversing a directory hierarchy.
  Like the UNIX equivalent, it can visualize a directory tree.
  Additionally, it annotates the output with DataLad-related
  information, like the location of datasets and their nesting
  depth. Besides visualization, `tree` also reports structured
  data in the form of result records that enable other applications
  to use `tree` for gathering data from the file system.
  Fixes https://github.com/datalad/datalad-next/issues/78 via
  https://github.com/datalad/datalad-next/pull/92 (by @catetrai)
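
To illustrate what reporting structured data while traversing a directory hierarchy can mean, here is a minimal sketch of a depth-limited walk that yields one record per node. The function name `walk_tree` and the record keys (`path`, `type`, `depth`) are hypothetical and chosen for illustration. They are not the result record schema of the `tree` command, and no dataset detection is attempted.

```python
from pathlib import Path
from typing import Dict, Iterator


def walk_tree(root: Path, max_depth: int = 1, depth: int = 0) -> Iterator[Dict]:
    """Yield one record per node, depth-first, down to `max_depth` levels."""
    yield {
        "path": str(root),
        "type": "directory" if root.is_dir() else "file",
        "depth": depth,
    }
    # descend only into directories, and only while below the depth limit
    if root.is_dir() and depth < max_depth:
        for child in sorted(root.iterdir()):
            yield from walk_tree(child, max_depth, depth + 1)


def render(records) -> str:
    """Turn records into an indented listing, one line per node."""
    return "\n".join(
        "    " * rec["depth"] + Path(rec["path"]).name for rec in records
    )
```

A caller could print `render(walk_tree(Path("."), max_depth=2))` for a visual listing, or consume the records directly for further processing.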
8 changes: 8 additions & 0 deletions datalad_next/__init__.py
@@ -28,6 +28,14 @@
            # not pick it up, due to the dashes in the name
            'create-sibling-webdav',
        ),
        (
            # importable module that contains the command implementation
            'datalad_next.tree',
            # name of the command class implementation in above module
            'TreeCommand',
            # command name (differs from lowercase command class name)
            'tree'
        )
    ]
)
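
The registration entry above is a (module, class name, command name) triple. As a rough sketch of how such triples could be resolved into command classes, the following uses `importlib`. The `load_commands` helper is hypothetical and is not DataLad's actual loader. A standard-library class stands in for `TreeCommand` so the example runs without DataLad installed.

```python
import importlib
from typing import Dict, Iterable, Tuple


def load_commands(specs: Iterable[Tuple[str, str, str]]) -> Dict[str, type]:
    """Map each command name to the class named in its (module, class, name) spec."""
    commands = {}
    for module_name, class_name, cmd_name in specs:
        module = importlib.import_module(module_name)
        commands[cmd_name] = getattr(module, class_name)
    return commands


# Stand-in spec: a stdlib class takes the place of ('datalad_next.tree', 'TreeCommand', 'tree')
demo = load_commands([("collections", "OrderedDict", "odict")])
```

The explicit third element matters when the command name cannot be derived from the class name, as the inline comment in the diff notes for `tree`.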

2 changes: 1 addition & 1 deletion datalad_next/conftest.py
@@ -1 +1 @@
from datalad.conftest import setup_package
from datalad.conftest import setup_package
1 change: 1 addition & 0 deletions datalad_next/tests/test_register.py
@@ -2,3 +2,4 @@
def test_register():
    import datalad.api as da
    assert hasattr(da, 'credentials')
    assert hasattr(da, 'tree')