diff --git a/content/docs/command-reference/get.md b/content/docs/command-reference/get.md
index 4bec20ed25..a105b4efee 100644
--- a/content/docs/command-reference/get.md
+++ b/content/docs/command-reference/get.md
@@ -96,12 +96,11 @@ model.pkl
 
 Note that the `model.pkl` file doesn't actually exist in the
 [root directory](https://github.com/iterative/example-get-started/tree/master/)
-of the external Git repo. Instead, it's exported in the
+of the source Git repo. Instead, it's exported in the
 [`dvc.yaml`](https://github.com/iterative/example-get-started/blob/master/dvc.yaml)
 file as an output of the `train` stage (in the `outs` field). DVC then
 [pulls](/doc/command-reference/pull) the file from the default
-[remote](/doc/command-reference/remote) of the external DVC project (found in
-its
+[remote](/doc/command-reference/remote) of the source DVC project (found in its
 [config file](https://github.com/iterative/example-get-started/blob/master/.dvc/config)).
 
 > A recommended use for downloading binary files from DVC repositories, as done
diff --git a/content/docs/command-reference/run.md b/content/docs/command-reference/run.md
index 7126074ce7..7d3542efe7 100644
--- a/content/docs/command-reference/run.md
+++ b/content/docs/command-reference/run.md
@@ -240,6 +240,9 @@ $ dvc run -n my_stage './my_script.sh $MYENVVAR'
   > Note that DVC-files without dependencies are automatically considered
   > "always changed", so this option has no effect in those cases.
 
+- `--external` - allow outputs that are outside of the DVC repository. See
+  [Managing External Data](/doc/user-guide/managing-external-data).
+
 - `-h`, `--help` - prints the usage/help message, and exit.
 
 - `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md
index 71da15531c..61f232daed 100644
--- a/content/docs/user-guide/managing-external-data.md
+++ b/content/docs/user-guide/managing-external-data.md
@@ -52,8 +52,9 @@ The default local cache location is `.dvc/cache`, so there is no need to specify
 it explicitly.
 
 ```dvc
-$ dvc add /home/shared/mydata
+$ dvc add /home/shared/mydata --external
 $ dvc run -d data.txt \
+          --external \
           -o /home/shared/data.txt \
           cp data.txt /home/shared/data.txt
 ```
@@ -68,10 +69,11 @@ $ dvc remote add sshcache ssh://user@example.com:/cache
 $ dvc config cache.ssh sshcache
 
 # Add data on SSH directly
-$ dvc add ssh://user@example.com:/mydata
+$ dvc add ssh://user@example.com:/mydata --external
 
 # Create the stage with external SSH output
 $ dvc run -d data.txt \
+          --external \
           -o ssh://user@example.com:/home/shared/data.txt \
           scp data.txt user@example.com:/home/shared/data.txt
 ```
@@ -86,10 +88,11 @@ $ dvc remote add s3cache s3://mybucket/cache
 $ dvc config cache.s3 s3cache
 
 # Add data on S3 directly
-$ dvc add s3://mybucket/mydata
+$ dvc add s3://mybucket/mydata --external
 
 # Create the stage with external S3 output
 $ dvc run -d data.txt \
+          --external \
           -o s3://mybucket/data.txt \
           aws s3 cp data.txt s3://mybucket/data.txt
 ```
@@ -104,10 +107,11 @@ $ dvc remote add gscache gs://mybucket/cache
 $ dvc config cache.gs gscache
 
 # Add data on GS directly
-$ dvc add gs://mybucket/mydata
+$ dvc add gs://mybucket/mydata --external
 
 # Create the stage with external GS output
 $ dvc run -d data.txt \
+          --external \
           -o gs://mybucket/data.txt \
           gsutil cp data.txt gs://mybucket/data.txt
 ```
@@ -122,10 +126,11 @@ $ dvc remote add hdfscache hdfs://user@example.com/cache
 $ dvc config cache.hdfs hdfscache
 
 # Add data on HDFS directly
-$ dvc add hdfs://user@example.com/mydata
+$ dvc add hdfs://user@example.com/mydata --external
 
 # Create the stage with external HDFS output
 $ dvc run -d data.txt \
+          --external \
           -o hdfs://user@example.com/home/shared/data.txt \
           hdfs fs -copyFromLocal \
                   data.txt \
diff --git a/content/docs/user-guide/what-is-dvc/index.md b/content/docs/user-guide/what-is-dvc/index.md
index 24a32662e9..0e958b2074 100644
--- a/content/docs/user-guide/what-is-dvc/index.md
+++ b/content/docs/user-guide/what-is-dvc/index.md
@@ -7,7 +7,7 @@ between existing tools and data science needs, allowing users to take advantage
 of experiment management while reusing existing skills and intuition.
 
 Leveraging an underlying source code management system eliminates the need to
-use external services. Data science experiment sharing and collaboration can be
+use 3rd-party services. Data science experiment sharing and collaboration can be
 done through regular Git features (commit messages, merges, pull requests, etc)
 the same way it works for software engineers.
 