more 1.x updates... (import) #2094
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,7 @@ | ||
# import | ||
|
||
Download a file or directory tracked by DVC or by Git into the | ||
<abbr>workspace</abbr>. It also creates a `.dvc` file with information about the | ||
data source, which can later be used to [update](/doc/command-reference/update) | ||
the import. | ||
Download a file or directory tracked by another DVC or Git repository into the | ||
<abbr>workspace</abbr>, and track it (an import `.dvc` file is created). | ||
Comment on lines 1 to +4: For |
||
|
||
> See also our `dvc.api.open()` Python API function. | ||
|
||
|
@@ -25,16 +23,25 @@ positional arguments: | |
Provides an easy way to reuse files or directories tracked in any <abbr>DVC | ||
repository</abbr> (e.g. datasets, intermediate results, ML models) or Git | ||
repository (e.g. code, small image/other files). `dvc import` downloads the | ||
target file or directory (found at `path` in `url`) into the workspace and | ||
tracks it in the project. This makes it possible to update the import later, if | ||
it has changed in its data source (see `dvc update`). | ||
target file or directory (found at `path` in `url`), and tracks it in the local | ||
project. This makes it possible to update the import later, if the data source | ||
has changed (see `dvc update`). | ||
|
||
> Note that `dvc get` corresponds to the first step this command performs (just | ||
> download the data). | ||
|
||
> See `dvc list` for a way to browse repository contents to find files or | ||
> directories to import. | ||
|
||
The imported data is <abbr>cached</abbr>, and linked (or copied) to the current | ||
working directory with its original file name e.g. `data.txt` (or to a location | ||
provided with `--out`). An _import `.dvc` file_ is created in the same location | ||
e.g. `data.txt.dvc` – similar to using `dvc add` after downloading the data. | ||
|
||
⚠️ DVC won't push or pull data imported from other DVC repos to/from | ||
[remote storage](/doc/command-reference/remote). It will rely on it's original | ||
source. | ||
|
||
The `url` argument specifies the address of the DVC or Git repository containing | ||
the data source. Both HTTP and SSH protocols are supported (e.g. | ||
`[user@]server:project.git`). `url` can also be a local file system path | ||
|
@@ -46,33 +53,22 @@ tracked by either Git or DVC (including paths inside tracked directories). Note | |
that DVC-tracked targets must be found in a `dvc.yaml` or `.dvc` file of the | ||
repo. | ||
|
||
⚠️ DVC repos should have a default [DVC remote](/doc/command-reference/remote) | ||
containing the target actual for this command to work. The only exception is for | ||
local repos, where DVC will try to copy the data from its <abbr>cache</abbr> | ||
first. | ||
⚠️ Source DVC repos should have a default | ||
[DVC remote](/doc/command-reference/remote) containing the target data for this | ||
command to work. The only exception is for local repos, where DVC will try to | ||
copy the data from its <abbr>cache</abbr> first. | ||
|
||
> See `dvc import-url` to download and track data from other supported locations | ||
> such as S3, SSH, HTTP, etc. | ||
|
||
After running this command successfully, the imported data is placed in the | ||
current working directory (unless `-o` is used) with its original file name e.g. | ||
`data.txt`. An _import stage_ (`.dvc` file) is also created in the same | ||
location, extending the name of the imported data e.g. `data.txt.dvc` – similar | ||
to having used `dvc run` to generate the data as a stage <abbr>output</abbr>. | ||
|
||
`.dvc` files support references to data in an external DVC repository (hosted on | ||
a Git server). In such a `.dvc` file, the `deps` field specifies the remote | ||
`url` and data `path`, and the `outs` field contains the corresponding local | ||
path in the <abbr>workspace</abbr>. It records enough metadata about the | ||
imported data to enable DVC efficiently determining whether the local copy is | ||
out of date. | ||
|
||
⚠️ DVC won't push or pull imported data to/from | ||
[remote storage](/doc/command-reference/remote), it will rely on it's original | ||
source. | ||
a Git server). In such a `.dvc` file, the `deps` field specifies the `url` and | ||
data `path`, and the `outs` field contains the corresponding local path in the | ||
<abbr>workspace</abbr>. It records enough metadata about the imported data to | ||
enable DVC efficiently determining whether the local copy is out of date. | ||
|
||
To actually [version the data](/doc/tutorials/get-started/data-versioning), | ||
`git add` (and `git commit`) the import stage. | ||
`git add` (and `git commit`) the import `.dvc` file. | ||
|
||
Note that `dvc repro` doesn't check or update import `.dvc` files (see | ||
`dvc freeze`), use `dvc update` to bring the import up to date from the data | ||
|
@@ -98,8 +94,8 @@ repo at `url`) are not supported. | |
download the file or directory from. The latest commit in `master` (tip of the | ||
default branch) is used by default when this option is not specified. | ||
|
||
> Note that this adds a `rev` field in the import stage that fixes it to the | ||
> revision. This can impact the behavior of `dvc update` (see the | ||
> Note that this adds a `rev` field in the import `.dvc` file that fixes it to | ||
> the revision. This can impact the behavior of `dvc update` (see the | ||
> [Importing and updating fixed revisions](#example-importing-and-updating-fixed-revisions) | ||
> example below). | ||
|
||
|
@@ -140,8 +136,8 @@ Importing 'data/data.xml (git@github.com:iterative/example-get-started)' | |
``` | ||
|
||
In contrast with `dvc get`, this command doesn't just download the data file, | ||
but it also creates an import stage (`.dvc` file) with a link to the data source | ||
(as explained in the description above). (This `.dvc` file can later be used to | ||
but it also creates an import `.dvc` file with a link to the data source (as | ||
explained in the description above). (This `.dvc` file can later be used to | ||
[update](/doc/command-reference/update) the import.) Check `data.xml.dvc`: | ||
|
||
```yaml | ||
|
@@ -176,8 +172,8 @@ Importing | |
-> 'cats-dogs' | ||
``` | ||
|
||
When using this option, the import stage (`.dvc` file) will also have a `rev` | ||
subfield under `repo`: | ||
When using this option, the import `.dvc` file will also have a `rev` subfield | ||
under `repo`: | ||
|
||
```yaml | ||
deps: | ||
|
@@ -192,14 +188,14 @@ If `rev` is a Git branch or tag (where the underlying commit changes), the data | |
source may have updates at a later time. To bring it up to date if so (and | ||
update `rev_lock` in the `.dvc` file), simply use `dvc update <stage>.dvc`. If | ||
`rev` is a specific commit hash (does not change), `dvc update` without options | ||
will not have an effect on the import stage. You may force-update it to a | ||
will not have an effect on the import `.dvc` file. You may force-update it to a | ||
different commit with `dvc update --rev`: | ||
|
||
```dvc | ||
$ dvc update --rev cats-dogs-v2 | ||
``` | ||
|
||
> In the above example, the value for `rev` in the new import stage will be | ||
> In the above example, the value for `rev` in the new `.dvc` file will be | ||
> `master` (a branch) so it will be able update normally going forward. | ||
|
||
## Example: Data registry | ||
|
@@ -230,7 +226,7 @@ $ dvc import git@github.com:iterative/dataset-registry.git \ | |
`dvc import` provides a better way to incorporate data files tracked in external | ||
<abbr>DVC repositories</abbr> because it saves the connection between the | ||
current project and the source repo. This means that enough information is | ||
recorded in an import stage (`.dvc` file) in order to | ||
recorded in an import `.dvc` file in order to | ||
[reproduce](/doc/command-reference/repro) downloading of this same data version | ||
in the future, where and when needed. This is achieved with the `repo` field, | ||
for example (matching the import command above): | ||
|
@@ -265,8 +261,8 @@ Importing ... | |
|
||
> Note that Git-tracked files can be imported from DVC repos as well. | ||
|
||
The file is imported, and along with it, an import stage (`.dvc` file) is | ||
created. Check `it-standards.csv.dvc`: | ||
The file is imported, and along with it, an import `.dvc` file is created. Check | ||
`it-standards.csv.dvc`: | ||
|
||
```yaml | ||
deps: | ||
|
For
import-url -h