From 859b2cff1e1497fcf74c372a359fb160eb6fad0a Mon Sep 17 00:00:00 2001
From: Guro Bokum
Date: Mon, 3 Feb 2020 23:53:46 +0000
Subject: [PATCH 1/4] api: add list command

---
 public/static/docs/command-reference/list.md | 138 +++++++++++++++++++
 public/static/docs/sidebar.json | 4 +
 2 files changed, 142 insertions(+)
 create mode 100644 public/static/docs/command-reference/list.md

diff --git a/public/static/docs/command-reference/list.md b/public/static/docs/command-reference/list.md
new file mode 100644
index 0000000000..5463b7b2c0
--- /dev/null
+++ b/public/static/docs/command-reference/list.md
@@ -0,0 +1,138 @@
+# list
+
+List repository contents, including files and directories tracked
+by DVC (data artifacts) and by Git.
+
+## Synopsis
+
+```usage
+usage: dvc list [-h] [-q | -v] [-R] [--outs-only] [--rev [REV]] url [path]
+
+positional arguments:
+  url         Location of DVC repository to list.
+  path        Path to a file or directory within the repository.
+```
+
+## Description
+
+List files, dirs and data artifacts for the pointed URL. The output
+is sorted lexicographically.
+
+With the command you may list all data artifacts the repo contains.
+Also it works with remote repos and you can list files before trying to get it
+(with `dvc get` or `dvc import`).
+
+The `url` argument specifies the address of the DVC or Git repository containing
+the data source. Both HTTP and SSH protocols are supported for online repos
+(e.g. `[user@]server:project.git`). `url` can also be a local file system path.
+When the url is remote Git URL the content is checkout into temporary directory.
+
+`--outs-only` option allows to filter data artifacts into the repo,
+so only data artifacts will be printed.
+
+`path` argument is used for pointing relative path into the repo. So you may use
+it when need to list files for some specific path. With recursive option `-R` it
+allows to filter output by prefix.
+
+Also with `path` argument you may check existense of some file - if the file
+doesn't exist the error would be thrown.
+
+## Options
+
+- `--outs-only` - show only data artifacts.
+
+- `-R`, `--recursive` - recursively prints the directory. When `path` is not
+  specified the directory is the root of the repo.
+
+- `--rev` - commit hash, branch or tag name, etc. (any
+  [Git revision](https://git-scm.com/docs/revisions)) of the repository to list
+  content for. The latest commit in `master` (tip of the default branch) is used
+  by default when this option is not specified.
+
+- `-h`, `--help` - prints the usage/help message, and exit.
+
+- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
+  problems arise, otherwise 1.
+
+- `-v`, `--verbose` - displays detailed tracing information. when this option is
+  not specified.
+
+## Example: List remote git repo
+
+We can use the command for getting information about remote repository with all
+files, dirs and data artifacts.
+
+```dvc
+$ dvc list -R https://github.com/iterative/dataset-registry
+
+.gitignore
+README.md
+get-started/.gitignore
+get-started/data.xml
+get-started/data.xml.dvc
+images/.gitignore
+images/dvc-logo-outlines.png
+images/dvc-logo-outlines.png.dvc
+images/owl_sticker.png
+images/owl_sticker.png.dvc
+images/owl_sticker.svg
+...
+```
+
+Or
+
+```dvc
+$ dvc list https://github.com/iterative/dataset-registry
+
+.gitignore
+README.md
+get-started
+images
+tutorial
+use-cases
+```
+
+for getting flat information about the repo
+
+## Example: List the repo with the rev
+
+Another useful case is checking the files for the **specific revision**
+
+```dvc
+$ dvc list -R --rev 7476a858f6200864b5755863c729bff41d0fb045 \
+    https://github.com/iterative/dataset-registry
+
+.gitignore
+README.md
+get-started/.gitignore
+get-started/data.xml
+get-started/data.xml.dvc
+tutorial/nlp/.gitignore
+tutorial/nlp/Posts.xml.zip
+tutorial/nlp/Posts.xml.zip.dvc
+tutorial/nlp/pipeline.zip
+tutorial/nlp/pipeline.zip.dvc
+tutorial/ver/.gitignore
+tutorial/ver/data.zip
+tutorial/ver/data.zip.dvc
+tutorial/ver/new-labels.zip
+```
+
+## Example: Check the path
+
+Before trying to get or import some data artifacts with `dvc get`
+or `dvc import` we can check their existence with
+
+```dvc
+$ dvc list --outs-only \
+    https://github.com/iterative/dataset-registry \
+    tutorial/nlp/pipeline.zip
+```
+
+Or everything under the prefix
+
+```dvc
+$ dvc list -R --outs-only \
+    https://github.com/iterative/dataset-registry \
+    tutorial
+```
diff --git a/public/static/docs/sidebar.json b/public/static/docs/sidebar.json
index ab11ce0e2b..af83eb16e0 100644
--- a/public/static/docs/sidebar.json
+++ b/public/static/docs/sidebar.json
@@ -241,6 +241,10 @@
       "label": "install",
       "slug": "install"
     },
+    {
+      "label": "list",
+      "slug": "list"
+    },
     {
       "label": "lock",
       "slug": "lock"

From df8784d9dc4cd42533d884f7fd5d3dce4dea9fb9 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Tue, 3 Mar 2020 19:11:49 -0600
Subject: [PATCH 2/4] cmd ref: rewrite description (examples pending review) and link from get/import

---
 public/static/docs/command-reference/get.md | 7 ++-
 .../static/docs/command-reference/import.md | 7 ++-
 public/static/docs/command-reference/list.md | 57 +++++++++++--------
 3 files changed, 42 insertions(+), 29 deletions(-)

diff --git
a/public/static/docs/command-reference/get.md b/public/static/docs/command-reference/get.md
index 36ebc1fcba..daed948cb0 100644
--- a/public/static/docs/command-reference/get.md
+++ b/public/static/docs/command-reference/get.md
@@ -12,8 +12,8 @@ directory.
 usage: dvc get [-h] [-q | -v] [-o [OUT]] [--rev [REV]] url path

 positional arguments:
-  url         Location of DVC or Git repository to download from.
-  path        Path to a file or directory within the repository.
+  url         Location of DVC or Git repository to download from
+  path        Path to a file or directory within the repository
 ```

 ## Description
@@ -27,6 +27,9 @@ target file or directory (`url`/`path`) to the current working directory.
 Note that this command doesn't require an existing DVC project to run in. It's a
 single-purpose command that can be used out of the box after installing DVC.

+> See `dvc list` for a way to browse repository contents to find files or
+> directories to download.
+
 The `url` argument specifies the address of the DVC or Git repository containing
 the data source. Both HTTP and SSH protocols are supported for online repos
 (e.g. `[user@]server:project.git`). `url` can also be a local file system path
diff --git a/public/static/docs/command-reference/import.md b/public/static/docs/command-reference/import.md
index b97bf79a57..ec46be3075 100644
--- a/public/static/docs/command-reference/import.md
+++ b/public/static/docs/command-reference/import.md
@@ -15,8 +15,8 @@ import.
 usage: dvc import [-h] [-q | -v] [-o [OUT]] [--rev [REV]] url path

 positional arguments:
-  url         Location of DVC or Git repository to download from.
-  path        Path to a file or directory within the repository.
+  url         Location of DVC or Git repository to download from
+  path        Path to a file or directory within the repository
 ```

 ## Description
@@ -28,6 +28,9 @@ the target file or directory (`url`/`path`) in a way so that it's tracked with
 DVC, becoming a local data artifact.
This also permits updating the import
 later, if it has changed in its data source. (See `dvc update`.)

+> See `dvc list` for a way to browse repository contents to find files or
+> directories to import.
+
 The `url` argument specifies the address of the DVC or Git repository containing
 the data source. Both HTTP and SSH protocols are supported for online repos
 (e.g. `[user@]server:project.git`). `url` can also be a local file system path
diff --git a/public/static/docs/command-reference/list.md b/public/static/docs/command-reference/list.md
index 5463b7b2c0..37808d59a9 100644
--- a/public/static/docs/command-reference/list.md
+++ b/public/static/docs/command-reference/list.md
@@ -1,48 +1,55 @@
 # list

-List repository contents, including files and directories tracked
-by DVC (data artifacts) and by Git.
+List repository contents, including files and directories tracked by DVC
+(data artifacts) and by Git.

 ## Synopsis

 ```usage
-usage: dvc list [-h] [-q | -v] [-R] [--outs-only] [--rev [REV]] url [path]
+usage: dvc list [-h] [-q | -v] [-R] [--outs-only] [--rev [REV]]
+                url [target]

 positional arguments:
-  url         Location of DVC repository to list.
-  path        Path to a file or directory within the repository.
+  url         Location of DVC or Git repository to list from
+  target      Path to a file or directory within the repository
 ```

 ## Description

-List files, dirs and data artifacts for the pointed URL. The output
-is sorted lexicographically.
+Lists files and directories in the root of a repository, including
+data artifacts tracked by DVC (e.g. data, models), and Git-tracked
+files (e.g. source code). To list recursively, use the `-R` option.

-With the command you may list all data artifacts the repo contains.
-Also it works with remote repos and you can list files before trying to get it
-(with `dvc get` or `dvc import`).
+This command especially useful to browse a public repo in order to find the
+exact file or directory names to `dvc import` or `dvc get`.
The list is sorted
+alphabetically.

-The `url` argument specifies the address of the DVC or Git repository containing
-the data source. Both HTTP and SSH protocols are supported for online repos
-(e.g. `[user@]server:project.git`). `url` can also be a local file system path.
-When the url is remote Git URL the content is checkout into temporary directory.
+Note that this command doesn't require an existing DVC project to run in. Also,
+it does not support listing DVC projects that aren't tracked by Git
+(see the `--no-scm` option of `dvc init`).

-`--outs-only` option allows to filter data artifacts into the repo,
-so only data artifacts will be printed.
+The `url` argument specifies the address of the DVC or Git repository to list.
+Both HTTP and SSH protocols are supported for online repos (e.g.
+`[user@]server:project.git`). `url` can also be a local file system path to an
+"offline" repo.
+
+The `target` argument of this command is used to specify a path within the
+source repository at `url`. If the target is a file found in the repo, it's file
+name will be printed as a way to confirm its existence. If it's a Git-tracked
+directory, files and directories directly under it will be listed (use option
+`-R` to list recursively).

-`path` argument is used for pointing relative path into the repo. So you may use
-it when need to list files for some specific path. With recursive option `-R` it
-allows to filter output by prefix.
+Listing the contents of DVC-tracked directories is not supported at the time.

-Also with `path` argument you may check existense of some file - if the file
-doesn't exist the error would be thrown.
+`--outs-only` option allows to filter data artifacts into the repo,
+so only data artifacts will be printed.

 ## Options

-- `--outs-only` - show only data artifacts.
+- `-R`, `--recursive` - recursively prints the repository contents. (It can be
+  limited to a specific Git-tracked directory by supplying a `target` argument.)

-- `-R`, `--recursive` - recursively prints the directory. When `path` is not
-  specified the directory is the root of the repo.
+- `--outs-only` - show only DVC-tracked data (outputs).

 - `--rev` - commit hash, branch or tag name, etc. (any
   [Git revision](https://git-scm.com/docs/revisions)) of the repository to list
@@ -60,7 +67,7 @@ doesn't exist the error would be thrown.
 ## Example: List remote git repo

 We can use the command for getting information about remote repository with all
-files, dirs and data artifacts.
+files, directories and data artifacts.

 ```dvc
 $ dvc list -R https://github.com/iterative/dataset-registry

From 8a94f6bd0785d65cf270011ef7d122b9a51f30a8 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Tue, 3 Mar 2020 19:18:01 -0600
Subject: [PATCH 3/4] api ref: list copy edits

---
 public/static/docs/command-reference/list.md | 26 +++++++++-----------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/public/static/docs/command-reference/list.md b/public/static/docs/command-reference/list.md
index 37808d59a9..19e44252bf 100644
--- a/public/static/docs/command-reference/list.md
+++ b/public/static/docs/command-reference/list.md
@@ -18,14 +18,14 @@ positional arguments:

 Lists files and directories in the root of a repository, including
 data artifacts tracked by DVC (e.g. data, models), and Git-tracked
-files (e.g. source code). To list recursively, use the `-R` option.
+files (e.g. source code). To list recursively, use the `-R` option. The list is
+sorted alphabetically.

-This command especially useful to browse a public repo in order to find the
-exact file or directory names to `dvc import` or `dvc get`. The list is sorted
-alphabetically.
+This command is especially useful to browse a public repo in order to find the
+exact file or directory names to `dvc import` or `dvc get`.

 Note that this command doesn't require an existing DVC project to run in.
Also,
-it does not support listing DVC projects that aren't tracked by Git
+it doesn't support listing DVC projects that are not tracked by Git
 (see the `--no-scm` option of `dvc init`).

 The `url` argument specifies the address of the DVC or Git repository to list.
@@ -34,22 +34,20 @@ Both HTTP and SSH protocols are supported for online repos (e.g.
 "offline" repo.

 The `target` argument of this command is used to specify a path within the
-source repository at `url`. If the target is a file found in the repo, it's file
-name will be printed as a way to confirm its existence. If it's a Git-tracked
-directory, files and directories directly under it will be listed (use option
-`-R` to list recursively).
+source repository at `url`. If the target is a file and it's found in the repo,
+it's file name will be printed as a way to confirm its existence. If it's a
+Git-tracked directory, files and directories directly under it will be listed
+(use option `-R` to list recursively).

-Listing the contents of DVC-tracked directories is not supported at the time.
-
-`--outs-only` option allows to filter data artifacts into the repo,
-so only data artifacts will be printed.
+> Listing the contents of DVC-tracked directories is not supported at the time.

 ## Options

 - `-R`, `--recursive` - recursively prints the repository contents. (It can be
   limited to a specific Git-tracked directory by supplying a `target` argument.)

-- `--outs-only` - show only DVC-tracked data (outputs).
+- `--outs-only` - show only DVC-tracked files and directories
+  (outputs).

 - `--rev` - commit hash, branch or tag name, etc.
(any
   [Git revision](https://git-scm.com/docs/revisions)) of the repository to list

From 8526158dc10142518f78bcb0dc0fbc52cfd37099 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Wed, 4 Mar 2020 17:10:21 -0600
Subject: [PATCH 4/4] cmd ref: target->path in list per https://github.com/iterative/dvc.org/pull/1021#discussion_r387847999

---
 public/static/docs/command-reference/list.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/public/static/docs/command-reference/list.md b/public/static/docs/command-reference/list.md
index 19e44252bf..026c372185 100644
--- a/public/static/docs/command-reference/list.md
+++ b/public/static/docs/command-reference/list.md
@@ -7,11 +7,11 @@ List repository contents, including files and directories tracked by DVC

 ```usage
 usage: dvc list [-h] [-q | -v] [-R] [--outs-only] [--rev [REV]]
-                url [target]
+                url [path]

 positional arguments:
   url         Location of DVC or Git repository to list from
-  target      Path to a file or directory within the repository
+  path        Path to a file or directory within the repository
 ```

 ## Description
@@ -33,18 +33,18 @@ Both HTTP and SSH protocols are supported for online repos (e.g.
 `[user@]server:project.git`). `url` can also be a local file system path to an
 "offline" repo.

-The `target` argument of this command is used to specify a path within the
-source repository at `url`. If the target is a file and it's found in the repo,
-it's file name will be printed as a way to confirm its existence. If it's a
-Git-tracked directory, files and directories directly under it will be listed
-(use option `-R` to list recursively).
+The `path` argument of this command is used to specify a path within the source
+repository at `url`. If the path is a file and it's found in the repo, it will
+be printed back as a confirmation of its existence. If it's a Git-tracked
+directory, files and directories directly under it will be listed (use option
+`-R` to list recursively).

 > Listing the contents of DVC-tracked directories is not supported at the time.

 ## Options

 - `-R`, `--recursive` - recursively prints the repository contents. (It can be
-  limited to a specific Git-tracked directory by supplying a `target` argument.)
+  limited to a specific Git-tracked directory by supplying a `path` argument.)

 - `--outs-only` - show only DVC-tracked files and directories
   (outputs).