From 7d2a6a04b3d6644df533e6b73e8497f57e143aac Mon Sep 17 00:00:00 2001 From: Dave Berenbaum Date: Fri, 10 Mar 2023 18:56:30 -0500 Subject: [PATCH] guide: move self-hosted remote info to guide (#4378) * guide: move self-hosted remote info to guide * fix link --- content/docs/command-reference/remote/add.md | 110 +---- .../docs/command-reference/remote/modify.md | 455 +----------------- content/docs/sidebar.json | 21 +- .../data-management/remote-storage/hdfs.md | 153 ++++++ .../data-management/remote-storage/http.md | 118 +++++ .../data-management/remote-storage/index.md | 12 +- .../data-management/remote-storage/ssh.md | 117 +++++ .../data-management/remote-storage/webdav.md | 116 +++++ 8 files changed, 550 insertions(+), 552 deletions(-) create mode 100644 content/docs/user-guide/data-management/remote-storage/hdfs.md create mode 100644 content/docs/user-guide/data-management/remote-storage/http.md create mode 100644 content/docs/user-guide/data-management/remote-storage/ssh.md create mode 100644 content/docs/user-guide/data-management/remote-storage/webdav.md diff --git a/content/docs/command-reference/remote/add.md b/content/docs/command-reference/remote/add.md index 3809609ba3..7c14204b38 100644 --- a/content/docs/command-reference/remote/add.md +++ b/content/docs/command-reference/remote/add.md @@ -141,103 +141,13 @@ The following are the supported types of storage protocols and platforms. ### Self-hosted / On-premises -
- -### SSH - -```cli -$ dvc remote add -d myremote ssh://user@example.com/path -``` - -> See `dvc remote modify` for a full list of SSH parameters. - -⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. -Check that you can connect both ways with tools like `ssh` and `sftp` -(GNU/Linux). - -> Note that the server's SFTP root might differ from its physical root (`/`). - -
- -
- -### HDFS - -⚠️ Using HDFS with a Hadoop cluster might require additional setup. Our -assumption is that the client is set up to use it. Specifically, [`libhdfs`] -should be installed. - -[`libhdfs`]: - https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/LibHdfs.html - -💡 Using an HDFS cluster as remote storage is also supported via the WebHDFS -API. Read more about it by expanding the WebHDFS section below. - -```cli -$ dvc remote add -d myremote hdfs://user@example.com/path -``` - -> See `dvc remote modify` for a full list of HDFS parameters. - -
- -
- -### WebHDFS - -⚠️ Using WebHDFS requires to enable REST API access in the cluster: set the -config property `dfs.webhdfs.enabled` to `true` in `hdfs-site.xml`. - -If your cluster is secured, then WebHDFS is commonly used with Kerberos and -HTTPS. To enable these for the DVC remote, set `use_https` and `kerberos` to -`true`. - -```cli -$ dvc remote add -d myremote webhdfs://example.com/path -$ dvc remote modify myremote use_https true -$ dvc remote modify myremote kerberos true -$ dvc remote modify --local myremote token SOME_BASE64_ENCODED_TOKEN -``` - -💡 You may want to run `kinit` before using the remote to make sure you have an -active kerberos session. - -> `token` contains sensitive user info. Therefore, it's safer to add it with the -> `--local` option, so it's written to a Git-ignored config file. - -> See `dvc remote modify` for a full list of WebHDFS parameters. - -
- -
- -### HTTP - -```cli -$ dvc remote add -d myremote https://example.com/path -``` - -> See `dvc remote modify` for a full list of HTTP parameters. - -
- -
- -### WebDAV - -```cli -$ dvc remote add -d myremote \ - webdavs://example.com/owncloud/remote.php/dav -``` - -If your remote is located in a subfolder of your WebDAV server e.g. -`files/myuser`, this path may be appended to the base URL: - -```cli -$ dvc remote add -d myremote \ - webdavs://example.com/owncloud/remote.php/dav/files/myuser -``` - -> See `dvc remote modify` for a full list of WebDAV parameters. - -
+- [SSH]; Like `scp` +- [HDFS] & [WebHDFS] +- [HTTP] +- [WebDAV] + +[ssh]: /doc/user-guide/data-management/remote-storage/ssh +[hdfs]: /doc/user-guide/data-management/remote-storage/hdfs +[webhdfs]: /doc/user-guide/data-management/remote-storage/hdfs#webhdfs +[http]: /doc/user-guide/data-management/remote-storage/http +[webdav]: /doc/user-guide/data-management/remote-storage/webdav diff --git a/content/docs/command-reference/remote/modify.md b/content/docs/command-reference/remote/modify.md index 46b1185259..e3b3fe97d3 100644 --- a/content/docs/command-reference/remote/modify.md +++ b/content/docs/command-reference/remote/modify.md @@ -151,451 +151,16 @@ details in the pages linked below. ### Self-hosted / On-premises -
- -### SSH - -> If any values given to the parameters below contain sensitive user info, add -> them with the `--local` option, so they're written to a Git-ignored config -> file. - -- `url` - remote location, in a regular - [SSH format](https://tools.ietf.org/id/draft-salowey-secsh-uri-00.html#sshsyntax). - Note that this can already include the `user` parameter, embedded into the - URL: - - ```cli - $ dvc remote modify myremote url \ - ssh://user@example.com:1234/path - ``` - - ⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. - Please check that you are able to connect both ways with tools like `ssh` and - `sftp` (GNU/Linux). - - > Note that your server's SFTP root might differ from its physical root (`/`). - -- `user` - user name to access the remote: - - ```cli - $ dvc remote modify --local myremote user myuser - ``` - - The order in which DVC picks the user name: - - 1. `user` parameter set with this command (found in `.dvc/config`); - 2. User defined in the URL (e.g. `ssh://user@example.com/path`); - 3. User defined in the SSH config file (e.g. `~/.ssh/config`) for this host - (URL); - 4. Current system user - -- `port` - port to access the remote. - - ```cli - $ dvc remote modify myremote port 2222 - ``` - - The order in which DVC decide the port number: - - 1. `port` parameter set with this command (found in `.dvc/config`); - 2. Port defined in the URL (e.g. `ssh://example.com:1234/path`); - 3. Port defined in the SSH config file (e.g. `~/.ssh/config`) for this host - (URL); - 4. Default SSH port 22 - -- `keyfile` - path to private key to access the remote. - - ```cli - $ dvc remote modify --local myremote keyfile /path/to/keyfile - ``` - -- `password` - a password to access the remote - - ```cli - $ dvc remote modify --local myremote password mypassword - ``` - -- `ask_password` - ask for a password to access the remote. 
- - ```cli - $ dvc remote modify myremote ask_password true - ``` - -- `passphrase` - a private key passphrase to access the remote - - ```cli - $ dvc remote modify --local myremote passphrase mypassphrase - ``` - -- `ask_passphrase` - ask for a private key passphrase to access the remote. - - ```cli - $ dvc remote modify myremote ask_passphrase true - ``` - -- `gss_auth` - use Generic Security Services authentication if available on host - (for example, - [with kerberos](https://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface#Relationship_to_Kerberos)). - Using this param requires `paramiko[gssapi]`, which is currently only - supported by our pip package, and could be installed with - `pip install 'dvc[ssh_gssapi]'`. Other packages (Conda, Windows, and macOS - PKG) do not support it. - - ```cli - $ dvc remote modify myremote gss_auth true - ``` - -- `allow_agent` - whether to use [SSH agents](https://www.ssh.com/ssh/agent) - (`true` by default). Setting this to `false` is useful when `ssh-agent` is - causing problems, such as a "No existing session" error: - - ```cli - $ dvc remote modify myremote allow_agent false - ``` - -
- -
- -### HDFS - - - -Using an HDFS cluster as remote storage is also supported via the WebHDFS API. -Read more about it [here]. - -[here]: /doc/command-reference/remote/add#webhdfs - - - - - -If any values given to the parameters below contain sensitive user info, add -them with the `--local` option, so they're written to a Git-ignored config file. - - - -- `url` - remote location: - - ```cli - $ dvc remote modify myremote url hdfs://user@example.com/path - ``` - -- `user` - user name to access the remote. - - ```cli - $ dvc remote modify --local myremote user myuser - ``` - -- `kerb_ticket` - path to the Kerberos ticket cache for Kerberos-secured HDFS - clusters - - ```cli - $ dvc remote modify --local myremote \ - kerb_ticket /path/to/ticket/cache - ``` - -
- -
- -### WebHDFS - - - -WebHDFS serves as an alternative for using the same remote storage supported by -HDFS. Read more about it [here]. - - - - - -If any values given to the parameters below contain sensitive user info, add -them with the `--local` option, so they're written to a Git-ignored config file. - - - -- `url` - remote location: - - ```cli - $ dvc remote modify myremote url webhdfs://user@example.com/path - ``` - - > Do not provide a `user` in the URL with `kerberos` or `token` - > authentication. - -- `user` - user name to access the remote. Do not set this with `kerberos` or - `token` authentication. - - ```cli - $ dvc remote modify --local myremote user myuser - ``` - -- `kerberos` - enable Kerberos authentication (`false` by default): - - ```cli - $ dvc remote modify myremote kerberos true - ``` - -- `kerberos_principal` - [Kerberos principal] to use, in case you have multiple - ones (for example service accounts). Only used if `kerberos` is `true`. - - ```cli - $ dvc remote modify myremote kerberos_principal myprincipal - ``` - - [kerberos principal]: - https://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html - -- `proxy_to` - Hadoop [superuser] to proxy as. _Proxy user_ feature must be - enabled on the cluster, and the user must have the correct access rights. If - the cluster is secured, Kerberos must be enabled (set `kerberos` to `true`) - for this to work. This parameter is incompatible with `token`. - - ```cli - $ dvc remote modify myremote proxy_to myuser - ``` - - [superuser]: - https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html - -- `use_https` - enables SWebHdfs. Note that DVC still expects the protocol in - `url` to be `webhdfs://`, and will fail if `swebhdfs://` is used. 
- - ```cli - $ dvc remote modify myremote use_https true - ``` - - [swebhdfs]: - https://hadoop.apache.org/docs/r3.1.0/api/org/apache/hadoop/fs/SWebHdfs.html - -- `ssl_verify` - whether to verify SSL requests. Defaults to `true` when - `use_https` is enabled, `false` otherwise. - - ```cli - $ dvc remote modify myremote ssl_verify false - ``` - -- `token` - Hadoop [delegation token] (as returned by the [WebHDFS API]). If the - cluster is secured, Kerberos must be enabled (set `kerberos` to `true`) for - this to work. This parameter is incompatible with providing a `user` and with - `proxy_to`. - - ```cli - $ dvc remote modify myremote token "mysecret" - ``` - - [delegation token]: - https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/delegation_tokens.html - [webhdfs api]: - https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Delegation_Token_Operations - -
- -
- -### HTTP - -> If any values given to the parameters below contain sensitive user info, add -> them with the `--local` option, so they're written to a Git-ignored config -> file. - -- `url` - remote location: - - ```cli - $ dvc remote modify myremote url https://example.com/path - ``` - - > The URL can include a query string, which will be preserved (e.g. - > `example.com?loc=path%2Fto%2Fdir`) - -- `auth` - authentication method to use when accessing the remote. The accepted - values are: - - - `basic` - - [basic authentication scheme](https://tools.ietf.org/html/rfc7617). `user` - and `password` (or `ask_password`) parameters should also be configured. - - `digest` (**removed** in 2.7.1) - - [digest Access Authentication Scheme](https://tools.ietf.org/html/rfc7616). - `user` and `password` (or `ask_password`) parameters should also be - configured. - - `custom` - an additional HTTP header field will be set for all HTTP requests - to the remote in the form: `custom_auth_header: password`. - `custom_auth_header` and `password` (or `ask_password`) parameters should - also be configured. - - ```cli - $ dvc remote modify myremote auth basic - ``` - -- `method` - override the - [HTTP method](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods) to - use for file uploads (e.g. `PUT` should be used for - [Artifactory](https://www.jfrog.com/confluence/display/JFROG/Artifactory+REST+API)). - By default, `POST` is used. - - ```cli - $ dvc remote modify myremote method PUT - ``` - -- `custom_auth_header` - HTTP header field name to use when the `auth` parameter - is set to `custom`. - - ```cli - $ dvc remote modify --local myremote \ - custom_auth_header 'My-Header' - ``` - -- `user` - user name to use when the `auth` parameter is set to `basic`. - - ```cli - $ dvc remote modify --local myremote user myuser - ``` - - The order in which DVC picks the user name: - - 1. `user` parameter set with this command (found in `.dvc/config`); - 2. User defined in the URL (e.g. 
`http://user@example.com/path`); - -- `password` - password to use for any `auth` method. - - ```cli - $ dvc remote modify --local myremote password mypassword - ``` - -- `ask_password` - ask each time for the password to use for any `auth` method. - - ```cli - $ dvc remote modify myremote ask_password true - ``` - - > Note that the `password` parameter takes precedence over `ask_password`. If - > `password` is specified, DVC will not prompt the user to enter a password - > for this remote. - -- `ssl_verify` - whether or not to verify SSL certificates, or a path to a - custom CA bundle to do so (`true` by default). - - ```cli - $ dvc remote modify myremote ssl_verify false - # or - $ dvc remote modify myremote ssl_verify path/to/ca_bundle.pem - ``` - -- `read_timeout` - set the time in seconds till a timeout exception is thrown - when attempting to read a portion of data from a connection (60 by default). - Let's set it to 5 minutes for example: - - ```cli - $ dvc remote modify myremote read_timeout 300 - ``` - -- `connect_timeout` - set the time in seconds till a timeout exception is thrown - when attempting to make a connection (60 by default). Let's set it to 5 - minutes for example: - - ```cli - $ dvc remote modify myremote connect_timeout 300 - ``` - -
- -
- -### WebDAV - -> If any values given to the parameters below contain sensitive user info, add -> them with the `--local` option, so they're written to a Git-ignored config -> file. - -- `url` - remote location: - - ```cli - $ dvc remote modify myremote url \ - webdavs://example.com/nextcloud/remote.php/dav/files/myuser/ - ``` - -- `token` - token for WebDAV server, can be empty in case of using - `user/password` authentication. - - ```cli - $ dvc remote modify --local myremote token 'mytoken' - ``` - -- `user` - user name for WebDAV server, can be empty in case of using `token` - authentication. - - ```cli - $ dvc remote modify --local myremote user myuser - ``` - - The order in which DVC searches for user name is: - - 1. `user` parameter set with this command (found in `.dvc/config`); - 2. User defined in the URL (e.g. `webdavs://user@example.com/endpoint/path`) - -- `custom_auth_header` - HTTP header field name to use for authentication. Value - is set via `password`. - - ```cli - $ dvc remote modify --local myremote \ - custom_auth_header 'My-Header' - ``` - -- `password` - password for WebDAV server, combined either with `user` or - `custom_auth_header`. Leave empty for `token` authentication. - - ```cli - $ dvc remote modify --local myremote password mypassword - ``` - - - - Auth based on `user` or `custom_auth_header` (with `password`) is incompatible - with `token` auth. - - - -- `ask_password` - ask each time for the password to use for `user/password` - authentication. This has no effect if `password` or `token` are set. - - ```cli - $ dvc remote modify myremote ask_password true - ``` - -- `ssl_verify` - whether or not to verify SSL certificates, or a path to a - custom CA bundle to do so (`true` by default). 
- - ```cli - $ dvc remote modify myremote ssl_verify false - # or - $ dvc remote modify myremote ssl_verify path/to/ca_bundle.pem - ``` - -- `cert_path` - path to certificate used for WebDAV server authentication, if - you need to use local client side certificates. - - ```cli - $ dvc remote modify --local myremote cert_path /path/to/cert - ``` - -- `key_path` - path to private key to use to access a remote. Only has an effect - in combination with `cert_path`. - - ```cli - $ dvc remote modify --local myremote key_path /path/to/key - ``` - - > Note that the certificate in `cert_path` might already contain the private - > key. - -- `timeout` - connection timeout (in seconds) for WebDAV server (default: 30). - - ```cli - $ dvc remote modify myremote timeout 120 - ``` - -
+- [SSH]; Like `scp` +- [HDFS] & [WebHDFS] +- [HTTP] +- [WebDAV] + +[ssh]: /doc/user-guide/data-management/remote-storage/ssh +[hdfs]: /doc/user-guide/data-management/remote-storage/hdfs +[webhdfs]: /doc/user-guide/data-management/remote-storage/hdfs#webhdfs +[http]: /doc/user-guide/data-management/remote-storage/http +[webdav]: /doc/user-guide/data-management/remote-storage/webdav ## Example: Some Azure authentication methods diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 2da2f720a4..9f5d06aa6f 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -132,7 +132,26 @@ "azure-blob-storage", "google-cloud-storage", "google-drive", - "aliyun-oss" + { + "label": "Aliyun OSS", + "slug": "aliyun-oss" + }, + { + "label": "SSH", + "slug": "ssh" + }, + { + "label": "HDFS & WebHDFS", + "slug": "hdfs" + }, + { + "label": "HTTP", + "slug": "http" + }, + { + "label": "WebDAV", + "slug": "webdav" + } ] }, "cloud-versioning", diff --git a/content/docs/user-guide/data-management/remote-storage/hdfs.md b/content/docs/user-guide/data-management/remote-storage/hdfs.md new file mode 100644 index 0000000000..940a772c77 --- /dev/null +++ b/content/docs/user-guide/data-management/remote-storage/hdfs.md @@ -0,0 +1,153 @@ +# HDFS & WebHDFS + + + +Start with `dvc remote add` to define the remote: + +```cli +$ dvc remote add -d myremote hdfs://user@example.com/path +``` + +⚠️ Using HDFS with a Hadoop cluster might require additional setup. Our +assumption is that the client is set up to use it. Specifically, [`libhdfs`] +should be installed. + +[`libhdfs`]: + https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/LibHdfs.html +[webhdfs api]: #webhdfs + +## HDFS configuration parameters + + + +If any values given to the parameters below contain sensitive user info, add +them with the `--local` option, so they're written to a Git-ignored config file.
+ + + +- `url` - remote location: + + ```cli + $ dvc remote modify myremote url hdfs://user@example.com/path + ``` + +- `user` - user name to access the remote. + + ```cli + $ dvc remote modify --local myremote user myuser + ``` + +- `kerb_ticket` - path to the Kerberos ticket cache for Kerberos-secured HDFS + clusters + + ```cli + $ dvc remote modify --local myremote \ + kerb_ticket /path/to/ticket/cache + ``` + +## WebHDFS + +Using an HDFS cluster as remote storage is also supported via the WebHDFS API. + +If your cluster is secured, then WebHDFS is commonly used with Kerberos and +HTTPS. To enable these for the DVC remote, set `use_https` and `kerberos` to +`true`. + +```cli +$ dvc remote add -d myremote webhdfs://example.com/path +$ dvc remote modify myremote use_https true +$ dvc remote modify myremote kerberos true +$ dvc remote modify --local myremote token SOME_BASE64_ENCODED_TOKEN +``` + +⚠️ Using WebHDFS requires enabling REST API access in the cluster: set the +config property `dfs.webhdfs.enabled` to `true` in `hdfs-site.xml`. + +💡 You may want to run `kinit` before using the remote to make sure you have an +active Kerberos session. + +## WebHDFS configuration parameters + + + +If any values given to the parameters below contain sensitive user info, add +them with the `--local` option, so they're written to a Git-ignored config file. + + + +- `url` - remote location: + + ```cli + $ dvc remote modify myremote url webhdfs://user@example.com/path + ``` + + > Do not provide a `user` in the URL with `kerberos` or `token` + > authentication. + +- `user` - user name to access the remote. Do not set this with `kerberos` or + `token` authentication.
+ + ```cli + $ dvc remote modify --local myremote user myuser + ``` + +- `kerberos` - enable Kerberos authentication (`false` by default): + + ```cli + $ dvc remote modify myremote kerberos true + ``` + +- `kerberos_principal` - [Kerberos principal] to use, in case you have multiple + ones (for example, service accounts). Only used if `kerberos` is `true`. + + ```cli + $ dvc remote modify myremote kerberos_principal myprincipal + ``` + + [kerberos principal]: + https://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html + +- `proxy_to` - Hadoop [superuser] to proxy as. The _proxy user_ feature must be + enabled on the cluster, and the user must have the correct access rights. If + the cluster is secured, Kerberos must be enabled (set `kerberos` to `true`) + for this to work. This parameter is incompatible with `token`. + + ```cli + $ dvc remote modify myremote proxy_to myuser + ``` + + [superuser]: + https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html + +- `use_https` - enables [SWebHdfs]. Note that DVC still expects the protocol in + `url` to be `webhdfs://`, and will fail if `swebhdfs://` is used.
+ + ```cli + $ dvc remote modify myremote token "mysecret" + ``` + + [delegation token]: + https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/delegation_tokens.html + [webhdfs api]: + https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Delegation_Token_Operations diff --git a/content/docs/user-guide/data-management/remote-storage/http.md b/content/docs/user-guide/data-management/remote-storage/http.md new file mode 100644 index 0000000000..503f6ec376 --- /dev/null +++ b/content/docs/user-guide/data-management/remote-storage/http.md @@ -0,0 +1,118 @@ +# HTTP + + + +Start with `dvc remote add` to define the remote: + +```cli +$ dvc remote add -d myremote https://example.com/path +``` + +## Configuration parameters + + + +If any values given to the parameters below contain sensitive user info, add +them with the `--local` option, so they're written to a Git-ignored config file. + + + +- `url` - remote location: + + ```cli + $ dvc remote modify myremote url https://example.com/path + ``` + + > The URL can include a query string, which will be preserved (e.g. + > `example.com?loc=path%2Fto%2Fdir`) + +- `auth` - authentication method to use when accessing the remote. The accepted + values are: + + - `basic` - + [basic authentication scheme](https://tools.ietf.org/html/rfc7617). `user` + and `password` (or `ask_password`) parameters should also be configured. + - `digest` (**removed** in 2.7.1) - + [digest Access Authentication Scheme](https://tools.ietf.org/html/rfc7616). + `user` and `password` (or `ask_password`) parameters should also be + configured. + - `custom` - an additional HTTP header field will be set for all HTTP requests + to the remote in the form: `custom_auth_header: password`. + `custom_auth_header` and `password` (or `ask_password`) parameters should + also be configured. 
+ + ```cli + $ dvc remote modify myremote auth basic + ``` + +- `method` - override the + [HTTP method](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods) to + use for file uploads (e.g. `PUT` should be used for + [Artifactory](https://www.jfrog.com/confluence/display/JFROG/Artifactory+REST+API)). + By default, `POST` is used. + + ```cli + $ dvc remote modify myremote method PUT + ``` + +- `custom_auth_header` - HTTP header field name to use when the `auth` parameter + is set to `custom`. + + ```cli + $ dvc remote modify --local myremote \ + custom_auth_header 'My-Header' + ``` + +- `user` - user name to use when the `auth` parameter is set to `basic`. + + ```cli + $ dvc remote modify --local myremote user myuser + ``` + + The order in which DVC picks the user name: + + 1. `user` parameter set with this command (found in `.dvc/config`); + 2. User defined in the URL (e.g. `http://user@example.com/path`); + +- `password` - password to use for any `auth` method. + + ```cli + $ dvc remote modify --local myremote password mypassword + ``` + +- `ask_password` - ask each time for the password to use for any `auth` method. + + ```cli + $ dvc remote modify myremote ask_password true + ``` + + > Note that the `password` parameter takes precedence over `ask_password`. If + > `password` is specified, DVC will not prompt the user to enter a password + > for this remote. + +- `ssl_verify` - whether or not to verify SSL certificates, or a path to a + custom CA bundle to do so (`true` by default). + + ```cli + $ dvc remote modify myremote ssl_verify false + # or + $ dvc remote modify myremote ssl_verify path/to/ca_bundle.pem + ``` + +- `read_timeout` - set the time in seconds till a timeout exception is thrown + when attempting to read a portion of data from a connection (60 by default). 
+ Let's set it to 5 minutes for example: + + ```cli + $ dvc remote modify myremote read_timeout 300 + ``` + +- `connect_timeout` - set the time in seconds till a timeout exception is thrown + when attempting to make a connection (60 by default). Let's set it to 5 + minutes for example: + + ```cli + $ dvc remote modify myremote connect_timeout 300 + ``` diff --git a/content/docs/user-guide/data-management/remote-storage/index.md b/content/docs/user-guide/data-management/remote-storage/index.md index 5dd29a99a5..67f5f01129 100644 --- a/content/docs/user-guide/data-management/remote-storage/index.md +++ b/content/docs/user-guide/data-management/remote-storage/index.md @@ -112,16 +112,16 @@ team. ### Self-hosted / On-premises -- [SSH servers]; Like `scp` +- [SSH]; Like `scp` - [HDFS] & [WebHDFS] - [HTTP] - [WebDAV] -[ssh servers]: /doc/command-reference/remote/modify#ssh -[hdfs]: /doc/command-reference/remote/modify#hdfs -[webhdfs]: /doc/command-reference/remote/modify#webhdfs -[http]: /doc/command-reference/remote/modify#http -[webdav]: /doc/command-reference/remote/modify#webdav +[ssh]: /doc/user-guide/data-management/remote-storage/ssh +[hdfs]: /doc/user-guide/data-management/remote-storage/hdfs +[webhdfs]: /doc/user-guide/data-management/remote-storage/hdfs#webhdfs +[http]: /doc/user-guide/data-management/remote-storage/http +[webdav]: /doc/user-guide/data-management/remote-storage/webdav ## File systems (local remotes) diff --git a/content/docs/user-guide/data-management/remote-storage/ssh.md b/content/docs/user-guide/data-management/remote-storage/ssh.md new file mode 100644 index 0000000000..32855b2d9a --- /dev/null +++ b/content/docs/user-guide/data-management/remote-storage/ssh.md @@ -0,0 +1,117 @@ +# SSH + + + +Start with `dvc remote add` to define the remote: + +```cli +$ dvc remote add -d myremote ssh://user@example.com/path +``` + +⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. 
+Check that you can connect both ways with tools like `ssh` and `sftp` +(GNU/Linux). + +> Note that the server's SFTP root might differ from its physical root (`/`). + +## Configuration parameters + +> If any values given to the parameters below contain sensitive user info, add +> them with the `--local` option, so they're written to a Git-ignored config +> file. + +- `url` - remote location, in a regular + [SSH format](https://tools.ietf.org/id/draft-salowey-secsh-uri-00.html#sshsyntax). + Note that this can already include the `user` parameter, embedded into the + URL: + + ```cli + $ dvc remote modify myremote url \ + ssh://user@example.com:1234/path + ``` + + ⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. + Please check that you are able to connect both ways with tools like `ssh` and + `sftp` (GNU/Linux). + + > Note that your server's SFTP root might differ from its physical root (`/`). + +- `user` - user name to access the remote: + + ```cli + $ dvc remote modify --local myremote user myuser + ``` + + The order in which DVC picks the user name: + + 1. `user` parameter set with this command (found in `.dvc/config`); + 2. User defined in the URL (e.g. `ssh://user@example.com/path`); + 3. User defined in the SSH config file (e.g. `~/.ssh/config`) for this host + (URL); + 4. Current system user + +- `port` - port to access the remote. + + ```cli + $ dvc remote modify myremote port 2222 + ``` + + The order in which DVC decides the port number: + + 1. `port` parameter set with this command (found in `.dvc/config`); + 2. Port defined in the URL (e.g. `ssh://example.com:1234/path`); + 3. Port defined in the SSH config file (e.g. `~/.ssh/config`) for this host + (URL); + 4. Default SSH port 22 + +- `keyfile` - path to the private key to access the remote.
+ + ```cli + $ dvc remote modify --local myremote keyfile /path/to/keyfile + ``` + +- `password` - a password to access the remote + + ```cli + $ dvc remote modify --local myremote password mypassword + ``` + +- `ask_password` - ask for a password to access the remote. + + ```cli + $ dvc remote modify myremote ask_password true + ``` + +- `passphrase` - a private key passphrase to access the remote + + ```cli + $ dvc remote modify --local myremote passphrase mypassphrase + ``` + +- `ask_passphrase` - ask for a private key passphrase to access the remote. + + ```cli + $ dvc remote modify myremote ask_passphrase true + ``` + +- `gss_auth` - use Generic Security Services authentication if available on host + (for example, + [with kerberos](https://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface#Relationship_to_Kerberos)). + Using this param requires `paramiko[gssapi]`, which is currently only + supported by our pip package, and could be installed with + `pip install 'dvc[ssh_gssapi]'`. Other packages (Conda, Windows, and macOS + PKG) do not support it. + + ```cli + $ dvc remote modify myremote gss_auth true + ``` + +- `allow_agent` - whether to use [SSH agents](https://www.ssh.com/ssh/agent) + (`true` by default). Setting this to `false` is useful when `ssh-agent` is + causing problems, such as a "No existing session" error: + + ```cli + $ dvc remote modify myremote allow_agent false + ``` diff --git a/content/docs/user-guide/data-management/remote-storage/webdav.md b/content/docs/user-guide/data-management/remote-storage/webdav.md new file mode 100644 index 0000000000..72d4472fb5 --- /dev/null +++ b/content/docs/user-guide/data-management/remote-storage/webdav.md @@ -0,0 +1,116 @@ +# WebDAV + + + +Start with `dvc remote add` to define the remote: + +```cli +$ dvc remote add -d myremote \ + webdavs://example.com/owncloud/remote.php/dav +``` + +If your remote is located in a subfolder of your WebDAV server e.g. 
+`files/myuser`, this path may be appended to the base URL: + +```cli +$ dvc remote add -d myremote \ + webdavs://example.com/owncloud/remote.php/dav/files/myuser +``` + +## Configuration parameters + + + +If any values given to the parameters below contain sensitive user info, add +them with the `--local` option, so they're written to a Git-ignored config file. + + + +- `url` - remote location: + + ```cli + $ dvc remote modify myremote url \ + webdavs://example.com/nextcloud/remote.php/dav/files/myuser/ + ``` + +- `token` - token for the WebDAV server; can be empty when using + `user`/`password` authentication. + + ```cli + $ dvc remote modify --local myremote token 'mytoken' + ``` + +- `user` - user name for the WebDAV server; can be empty when using `token` + authentication. + + ```cli + $ dvc remote modify --local myremote user myuser + ``` + + The order in which DVC searches for the user name is: + + 1. `user` parameter set with this command (found in `.dvc/config`); + 2. User defined in the URL (e.g. `webdavs://user@example.com/endpoint/path`) + +- `custom_auth_header` - HTTP header field name to use for authentication. Value + is set via `password`. + + ```cli + $ dvc remote modify --local myremote \ + custom_auth_header 'My-Header' + ``` + +- `password` - password for the WebDAV server, combined either with `user` or + `custom_auth_header`. Leave empty for `token` authentication. + + ```cli + $ dvc remote modify --local myremote password mypassword + ``` + + + + Auth based on `user` or `custom_auth_header` (with `password`) is incompatible + with `token` auth. + + + +- `ask_password` - ask each time for the password to use for `user`/`password` + authentication. This has no effect if `password` or `token` is set. + + ```cli + $ dvc remote modify myremote ask_password true + ``` + +- `ssl_verify` - whether or not to verify SSL certificates, or a path to a + custom CA bundle to do so (`true` by default).
+ + ```cli + $ dvc remote modify myremote ssl_verify false + # or + $ dvc remote modify myremote ssl_verify path/to/ca_bundle.pem + ``` + +- `cert_path` - path to certificate used for WebDAV server authentication, if + you need to use local client side certificates. + + ```cli + $ dvc remote modify --local myremote cert_path /path/to/cert + ``` + +- `key_path` - path to private key to use to access a remote. Only has an effect + in combination with `cert_path`. + + ```cli + $ dvc remote modify --local myremote key_path /path/to/key + ``` + + > Note that the certificate in `cert_path` might already contain the private + > key. + +- `timeout` - connection timeout (in seconds) for WebDAV server (default: 30). + + ```cli + $ dvc remote modify myremote timeout 120 + ```
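The new SSH page documents a fixed precedence for the user name and the port: the `user`/`port` parameter, then the URL, then the SSH config file for the host, then a fallback. The POSIX shell sketch below only illustrates that documented order; it is not DVC's implementation. The function names `url_authority`, `resolve_ssh_port`, and `resolve_ssh_user` are hypothetical, and the sketch assumes URLs shaped like `ssh://[user@]host[:port]/path` (no IPv6 literals) and that the caller has already looked up any `User`/`Port` entries from `~/.ssh/config`.

```shell
#!/bin/sh
# Hypothetical sketch of the user/port precedence documented for SSH remotes.
# Not DVC's code. Assumes URLs like ssh://[user@]host[:port]/path (no IPv6).

# Print the authority part of an ssh:// URL, e.g. "user@example.com:1234".
url_authority() {
  rest=${1#*://}               # drop the "ssh://" scheme prefix
  printf '%s\n' "${rest%%/*}"  # keep everything before the first "/"
}

# Usage: resolve_ssh_port <port param> <remote url> <Port from ssh config>
resolve_ssh_port() {
  hostport=$(url_authority "$2")
  hostport=${hostport##*@}     # drop an embedded "user@", if any
  case $hostport in
    *:*) url_port=${hostport##*:} ;;
    *) url_port= ;;
  esac
  if [ -n "$1" ]; then printf '%s\n' "$1"                 # 1. `port` parameter
  elif [ -n "$url_port" ]; then printf '%s\n' "$url_port" # 2. port in the URL
  elif [ -n "$3" ]; then printf '%s\n' "$3"               # 3. SSH config file
  else printf '%s\n' 22                                   # 4. default SSH port
  fi
}

# Usage: resolve_ssh_user <user param> <remote url> <User from ssh config>
resolve_ssh_user() {
  auth=$(url_authority "$2")
  case $auth in
    *@*) url_user=${auth%%@*} ;;
    *) url_user= ;;
  esac
  if [ -n "$1" ]; then printf '%s\n' "$1"                 # 1. `user` parameter
  elif [ -n "$url_user" ]; then printf '%s\n' "$url_user" # 2. user in the URL
  elif [ -n "$3" ]; then printf '%s\n' "$3"               # 3. SSH config file
  else id -un                                             # 4. current system user
  fi
}
```

For example, `resolve_ssh_port 2222 ssh://example.com:1234/path ""` prints `2222`, because the `port` parameter takes precedence over the port embedded in the URL.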