Update dvc install and dvc import documentation #260
`gs` | Google Storage | `gs://mybucket/data.csv` |
`ssh` | SSH server | `ssh://[email protected]:/path/to/data.csv` |
`hdfs` | HDFS | `hdfs://[email protected]/path/to/data.csv` |
`http` | HTTP to file with _strong ETag_ | `https://example.com/path/to/data.csv` |
There is actually one more: `remote://`. See ticket #108. It would be great to add it here and propagate the explanation to the external dependencies section, and to `dvc run` if necessary.
The remote URL needs to be documented in the `dvc remote` command documentation. It should then be enough to reference that documentation from here.
I would still explain it briefly here - just give an example of the transformation.
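For illustration only, the transformation could be sketched roughly as below. This is not DVC's actual implementation: the remote name `myremote`, its base URL `s3://mybucket/base`, and the helper `resolve_remote_url` are all hypothetical, chosen just to show how a `remote://` URL might expand into a full URL.

```python
from urllib.parse import urlparse

# Hypothetical remote configuration (name -> base URL), as if a remote
# called "myremote" had been set up beforehand. Both values are assumptions.
REMOTES = {"myremote": "s3://mybucket/base"}


def resolve_remote_url(url, remotes):
    """Expand remote://<name>/<path> into <base URL>/<path>.

    URLs with any other scheme are returned unchanged.
    """
    parsed = urlparse(url)
    if parsed.scheme != "remote":
        return url
    # Raises KeyError if the named remote is not configured.
    base = remotes[parsed.netloc]
    return base.rstrip("/") + "/" + parsed.path.lstrip("/")


print(resolve_remote_url("remote://myremote/data.csv", REMOTES))
```

With the assumed configuration, `remote://myremote/data.csv` expands to the remote's base URL followed by `/data.csv`, while a plain `gs://` or `ssh://` URL passes through untouched.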
Looks great! Really good stuff.
I left lots of small comments. Probably the biggest one concerns the second example in DVC import. Let's simplify it and use a local 'remote' instead. That way we can make it reproducible.
It would be great to add a phrase like: one of the use cases for DVC import is to track inputs to an ETL pipeline. Imagine you use cron to run repro, which checks some external file and, if it has changed, rebuilds some model.
@@ -51,6 +51,10 @@ The output of `dvc checkout` does not list which data files were restored. It
does report removed files and files that DVC was unable to restore due to it
missing from the cache.

This command will fail to checkout files that are missing from the cache. In
such a case, `dvc checkout` prints a warning message. Any files that can be
checked out without error will be restored.
Say "any files that are found in cache" instead of referring to an error. It's usually not an error; it's a warning, as you mentioned above.
* `ssh` - URL to a file on another machine with SSH access
* `hdfs` - URL to a file on HDFS
* `http` - URL to a file with a _strong ETag_ served with HTTP or HTTPS
Import file from any supported URL or local directory to local workspace and track changes in remote file.
This line is probably longer than 80 characters.
This commit is only the dvc install documentation. The dvc import documentation is still to come.