From 33f7b2bbde373517fc316663e167d178411934bf Mon Sep 17 00:00:00 2001
From: Emre Sahin
Date: Wed, 10 Mar 2021 11:13:01 +0300
Subject: [PATCH] Command and typo fixes in Accessing Data scenario

---
 get-started/accessing/01-download.md               | 2 +-
 get-started/accessing/02-discovering-files.md      | 8 ++++----
 get-started/accessing/03-python-api.md             | 6 ++++--
 get-started/accessing/04-reusing-data-or-models.md | 3 ++-
 get-started/accessing/05-congrats.md               | 2 +-
 5 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/get-started/accessing/01-download.md b/get-started/accessing/01-download.md
index b545061..3991122 100644
--- a/get-started/accessing/01-download.md
+++ b/get-started/accessing/01-download.md
@@ -1,6 +1,6 @@
 # Download
 
-We can download any file in a DVC repository:
+We can download any file from a DVC repository:
 
 ```
 dvc get \
diff --git a/get-started/accessing/02-discovering-files.md b/get-started/accessing/02-discovering-files.md
index a5c1c5f..e160507 100644
--- a/get-started/accessing/02-discovering-files.md
+++ b/get-started/accessing/02-discovering-files.md
@@ -1,9 +1,9 @@
 # Discovering files
 
-As we mentioned, if you look at the [repository][dr], you won't see
-`data/data.xml` or `model.pkl`, or any DVC-tracked files. They are not stored
-in Git. We can `dvc get` them, but how do we even know what data is tracked in a
-remote DVC repo before accessing it?
+If you look at the [repository][dr], you won't see `data/data.xml` or
+`model.pkl`, or any DVC-tracked files. They are not stored in Git. We can
+`dvc get` them, but how do we even know what data is tracked in a remote DVC
+repo before accessing it?
 
 [dr]: https://github.com/iterative/dataset-registry
 
diff --git a/get-started/accessing/03-python-api.md b/get-started/accessing/03-python-api.md
index 50ed21a..c12a72d 100644
--- a/get-started/accessing/03-python-api.md
+++ b/get-started/accessing/03-python-api.md
@@ -2,13 +2,15 @@
 
 Besides using DVC commands in the command line, we can also access any
 DVC-tracked artifact "natively" from Python with
-[the API](https://dvc.org/doc/api-reference):
+[the API](https://dvc.org/doc/api-reference). Click the link below to open the Python script:
 
 `process.py`{{open}}
 
 The script downloads the data like `dvc get` and counts the number of lines in it:
 
-`python3 process.py`{{execute}}
+`python3 ~/process.py`{{execute}}
+
+Note that the script doesn't download the data to a file before counting the lines.
 
 The interface of [`dvc.api.open`][apiopen] is similar to the one we've seen
 already. It receives Git repo URL and path as arguments, and works
diff --git a/get-started/accessing/04-reusing-data-or-models.md b/get-started/accessing/04-reusing-data-or-models.md
index e6343da..1adae4d 100644
--- a/get-started/accessing/04-reusing-data-or-models.md
+++ b/get-started/accessing/04-reusing-data-or-models.md
@@ -11,6 +11,7 @@ A DVC repository and the `dvc import` command are enough to export data and mode
 reuse them, track upstream changes, etc. Let's give it a try:
 
 ```
+mkdir data
 dvc import \
     https://github.com/iterative/dataset-registry \
     get-started/data.xml -o data/data.xml
@@ -19,7 +20,7 @@ dvc import \
 `dvc import` command creates `data/data.xml.dvc` to track the dependency.
 You can view this file in the editor:
 
-`data/data.xml.dvc`{{open}}
+`project/data/data.xml.dvc`{{open}}
 
 The `url` and `rev_lock` subfields under `repo` are used to save the origin and
 the version of the dependency, respectively:
diff --git a/get-started/accessing/05-congrats.md b/get-started/accessing/05-congrats.md
index f536b3c..b31ceec 100644
--- a/get-started/accessing/05-congrats.md
+++ b/get-started/accessing/05-congrats.md
@@ -5,7 +5,7 @@ download model and data files with `dvc get` or import them to DVC repositories
 with `dvc import`. DVC also has an API that streams large files directly into
 the memory with `dvc.api.open`.
 
-Our vision is to have a central registry for all the data and model files and
+DVC allows having a central registry for all the data and model files and
 using them in different projects. It's based on Git, and provides flexibility
 without requiring additional infrastructure.
 
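
The note added to `03-python-api.md` says the script counts lines without first downloading the data to a file. `process.py` itself is not part of this patch, so the following is only a minimal sketch of that idea, not the scenario's actual script. It assumes DVC is installed (`pip install dvc`) and that `dvc.api.open(path, repo=...)` is used as a context manager returning a text file object; the helper names `count_lines` and `count_remote_lines` are hypothetical.

```python
from typing import Iterable


def count_lines(lines: Iterable[str]) -> int:
    """Count the lines yielded by any iterable of text lines, e.g. an open file."""
    return sum(1 for _ in lines)


def count_remote_lines(path: str, repo: str) -> int:
    """Hypothetical helper: open a DVC-tracked file with dvc.api.open and count its lines.

    The content is read through the returned file object instead of being
    saved into the current directory first, as `dvc get` would do.
    """
    # Imported lazily so that count_lines() stays usable without DVC installed.
    import dvc.api

    with dvc.api.open(path, repo=repo) as fd:
        return count_lines(fd)
```

With DVC installed and network access, `count_remote_lines("get-started/data.xml", repo="https://github.com/iterative/dataset-registry")` would count the lines of the same file the `dvc import` step in this patch pulls in. Keeping the counting logic in a separate function makes it testable on plain lists of strings.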