ref: create Remote Reference (config) #4264

Merged
merged 20 commits into from
Feb 23, 2023
Changes from 19 commits
Commits
659828f
ref: start Remote Reference (config)
jorgeorpinel Jan 26, 2023
d49afa6
Restyled by prettier (#4265)
restyled-io[bot] Jan 26, 2023
12430a6
Merge branch 'main' into guide/data-mgmt/remote-config
jorgeorpinel Jan 27, 2023
a91e7c5
Merge branch 'main' into guide/data-mgmt/remote-config
jorgeorpinel Jan 27, 2023
bef1201
guide: move Remote Storage ref into Data Mgmt
jorgeorpinel Feb 1, 2023
27bdcd2
start: links to new Remotes guide and
jorgeorpinel Feb 1, 2023
b2cc99c
guide: finalize S3 storage page and
jorgeorpinel Feb 1, 2023
94a16c1
guide: move "local remotes" to Remotes (index page) and
jorgeorpinel Feb 2, 2023
d68c5a3
Merge branch 'main' into guide/data-mgmt/remote-config
jorgeorpinel Feb 2, 2023
032d4be
ref: remove S3 examples
jorgeorpinel Feb 2, 2023
392f6d7
Merge branch 'main' into guide/data-mgmt/remote-config
jorgeorpinel Feb 2, 2023
5ba293c
guide: emphasize that remotes use regular cloud storage config
jorgeorpinel Feb 2, 2023
033ecc8
Merge branch 'main' into guide/data-mgmt/remote-config
jorgeorpinel Feb 23, 2023
2badd87
Update content/docs/user-guide/data-management/remote-storage/amazon-…
jorgeorpinel Feb 23, 2023
b9f3b1d
guide: drop `worktree` cloud versioning from Remotes Config
jorgeorpinel Feb 23, 2023
acdebaa
guide: move cloud versioning near the top of Remote Config
jorgeorpinel Feb 23, 2023
f9fb502
Update content/docs/user-guide/data-management/remote-storage/amazon-…
Feb 23, 2023
1523b84
Update content/docs/user-guide/data-management/remote-storage/index.md
Feb 23, 2023
8a01e64
Restyled by prettier (#4331)
restyled-io[bot] Feb 23, 2023
2a25526
Update content/docs/user-guide/data-management/remote-storage/index.md
Feb 23, 2023
20 changes: 13 additions & 7 deletions content/docs/command-reference/config.md
@@ -250,9 +250,8 @@ location. A [DVC remote](/doc/command-reference/remote) name is used (instead of
 the URL) because often it's necessary to configure authentication or other
 connection settings, and configuring a remote is the way that can be done.

-- `cache.local` - name of a _local remote_ to use as external cache (refer to
-`dvc remote` for more info. on "local remotes".) This will overwrite the value
-in `cache.dir` (see `dvc cache dir`).
+- `cache.local` - name of a [local remote] to use as external cache. This will
+overwrite the value in `cache.dir` (see `dvc cache dir`).

 - `cache.s3` - name of an Amazon S3 remote to use as external cache.

@@ -265,10 +264,17 @@ connection settings, and configuring a remote is the way that can be done.
 - `cache.webhdfs` - name of an HDFS remote with WebHDFS enabled to use as
 external cache.

-> ⚠️ Avoid using the same [remote storage](/doc/command-reference/remote) used
-> for `dvc push` and `dvc pull` as external cache, because it may cause file
-> hash overlaps: the hash of an external <abbr>output</abbr> could collide with
-> that of a local file with different content.
+<admon type="warn">
+
+Avoid using the same [remote storage](/doc/command-reference/remote) used for
+`dvc push` and `dvc pull` as external cache, because it may cause file hash
+overlaps: the hash of an external <abbr>output</abbr> could collide with that
+of a local file with different content.
+
+</admon>

+[local remote]:
+/doc/user-guide/data-management/remote-storage#file-systems-local-remotes
+
 ### exp

146 changes: 7 additions & 139 deletions content/docs/command-reference/remote/add.md
@@ -44,7 +44,7 @@ A [default remote] is expected by `dvc push`, `dvc pull`, `dvc status`,
 </admon>

 The remote `name` (required) is used to identify the remote and must be unique.
-DVC will determine the [type of remote](#supported-storage-types) based on the
+DVC will determine the [storage type](#supported-storage-types) based on the
 provided `url` (also required), a URL or path for the location.

 <admon type="info">
@@ -121,60 +121,15 @@ $ pip install "dvc[s3]"

 ## Supported storage types

-The following are the types of remote storage (protocols) supported:
+The following are the supported types of storage protocols and platforms.

-<details>
-
-### Amazon S3
-
-> 💡 Before adding an S3 remote, be sure to
-> [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html).
-
-```cli
-$ dvc remote add -d myremote s3://mybucket/path
[Review comment on lines -128 to -134, by the PR author: S3 info is now totally removed (just linked instead) from remote add...]
-```
-
-By default, DVC authenticates using your AWS CLI
-[configuration](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)
-(if set). This uses the default AWS credentials file. To use a custom
-authentication method, use the parameters described in `dvc remote modify`.
-
-Make sure you have the following permissions enabled: `s3:ListBucket`,
-`s3:GetObject`, `s3:PutObject`, `s3:DeleteObject`. This enables the S3 API
-methods that are performed by DVC (`list_objects_v2` or `list_objects`,
-`head_object`, `upload_file`, `download_file`, `delete_object`, `copy`).
-
-> See `dvc remote modify` for a full list of S3 parameters.
-
-</details>
-
-<details>
-
-### S3-compatible storage
-
-For object storage that supports an S3-compatible API (e.g.
-[Minio](https://min.io/),
-[DigitalOcean Spaces](https://www.digitalocean.com/products/spaces/),
[Review comment on lines -153 to -157, by the PR author: (including S3-compatible)]
-[IBM Cloud Object Storage](https://www.ibm.com/cloud/object-storage) etc.),
-configure the `endpointurl` parameter. For example, let's set up a DigitalOcean
-"space" (equivalent to a bucket in S3) called `mystore` that uses the `nyc3`
-region:
-
-```cli
-$ dvc remote add -d myremote s3://mystore/path
-$ dvc remote modify myremote endpointurl \
-https://nyc3.digitaloceanspaces.com
-```
+### Cloud providers

-By default, DVC authenticates using your AWS CLI
-[configuration](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)
-(if set). This uses the default AWS credentials file. To use a custom
-authentication method, use the parameters described in `dvc remote modify`.
+- [Amazon S3] (AWS) and [S3-compatible] e.g. MinIO

-Any other S3 parameter can also be set for S3-compatible storage. Whether
-they're effective depends on each storage platform.
-
-</details>
+[amazon s3]: /doc/user-guide/data-management/remote-storage/amazon-s3
+[s3-compatible]:
+/doc/user-guide/data-management/remote-storage/amazon-s3#s3-compatible-servers-non-amazon

 <details>

@@ -396,90 +351,3 @@ $ dvc remote add -d myremote \
 > See `dvc remote modify` for a full list of WebDAV parameters.

 </details>
-
-<details>
-
-### local remote
-
-A "local remote" is a directory in the machine's file system. Not to be confused
-with the `--local` option of `dvc remote` (and other config) commands!
-
-> While the term may seem contradictory, it doesn't have to be. The "local" part
-> refers to the type of location where the storage is: another directory in the
-> same file system. "Remote" is how we call storage for <abbr>DVC
-> projects</abbr>. It's essentially a local backup for data tracked by DVC.
-
-Using an absolute path (recommended):
-
-```cli
-$ dvc remote add -d myremote /tmp/dvcstore
-$ cat .dvc/config
-...
-['remote "myremote"']
-url = /tmp/dvcstore
-...
-```
-
-> Note that the absolute path `/tmp/dvcstore` is saved as is.
-
-Using a relative path. It will be resolved against the current working
-directory, but saved **relative to the config file location**:
-
-```cli
-$ dvc remote add -d myremote ../dvcstore
-$ cat .dvc/config
-...
-['remote "myremote"']
-url = ../../dvcstore
-...
-```
-
-> Note that `../dvcstore` has been resolved relative to the `.dvc/` dir,
-> resulting in `../../dvcstore`.
-
-</details>
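
Editorial note on the removed "local remote" text: the path handling it describes (absolute paths saved as is, relative paths resolved against the working directory and then stored relative to the config file location) can be sketched in Python. This is an illustration under stated assumptions, not DVC's actual implementation: `stored_remote_url` is a hypothetical helper, and `/home/user/project` is an assumed project location.

```python
import posixpath


def stored_remote_url(url, cwd, config_dir):
    """Hypothetical helper: the path a local remote might be saved under.

    Absolute paths are returned unchanged. Relative paths are first resolved
    against the current working directory, then rewritten relative to the
    directory that holds the config file.
    """
    if posixpath.isabs(url):
        return url
    resolved = posixpath.normpath(posixpath.join(cwd, url))
    return posixpath.relpath(resolved, start=config_dir)


project_root = "/home/user/project"  # assumed project location
config_dir = posixpath.join(project_root, ".dvc")  # where .dvc/config lives

print(stored_remote_url("/tmp/dvcstore", project_root, config_dir))  # /tmp/dvcstore
print(stored_remote_url("../dvcstore", project_root, config_dir))  # ../../dvcstore
```

The second call matches the `../dvcstore` to `../../dvcstore` rewrite shown in the removed example, assuming the command is run from the project root.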

-## Example: Customize an S3 remote
-
-Add an Amazon S3 remote as the _default_ (via the `-d` option), and modify its
-region.
-
-> 💡 Before adding an S3 remote, be sure to
-> [Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html).
-
-```cli
-$ dvc remote add -d myremote s3://mybucket/path
-Setting 'myremote' as a default remote.
-
-$ dvc remote modify myremote region us-east-2
-```
-
-The <abbr>project</abbr>'s config file (`.dvc/config`) now looks like this:
-
-```ini
-['remote "myremote"']
-url = s3://mybucket/path
-region = us-east-2
-[core]
-remote = myremote
-```
-
-The list of remotes should now be:
-
-```cli
-$ dvc remote list
-myremote s3://mybucket/path
-```
-
-You can overwrite existing remotes using `-f` with `dvc remote add`:
-
-```cli
-$ dvc remote add -f myremote s3://mybucket/another-path
-```
-
-List remotes again to view the updated remote:
-
-```cli
-$ dvc remote list
-myremote s3://mybucket/another-path
-```
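
Editorial note on the removed S3 example: the `.dvc/config` contents it shows tie the default remote to its settings through the `remote` key in `[core]`. A minimal sketch of that lookup follows, using Python's standard `configparser` purely for illustration; DVC's own config parsing may differ, and `default_remote` is a hypothetical helper.

```python
import configparser

# Config text copied from the removed example (indentation omitted).
CONFIG_TEXT = """\
['remote "myremote"']
url = s3://mybucket/path
region = us-east-2
[core]
remote = myremote
"""


def default_remote(config_text):
    """Hypothetical helper: return (name, url) of the default remote."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    name = parser["core"]["remote"]
    # Section names keep the quoting used in the file: 'remote "<name>"'
    section = f"'remote \"{name}\"'"
    return name, parser[section]["url"]


name, url = default_remote(CONFIG_TEXT)
print(name, url)  # myremote s3://mybucket/path
```

The printed pair corresponds to the `dvc remote list` output in the removed example.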
12 changes: 4 additions & 8 deletions content/docs/command-reference/remote/index.md
@@ -58,16 +58,12 @@ default). Alternatively, the config files can be edited manually.

 ## Example: Add a default local remote

-<details>
+<admon type="tip">

-### What is a "local remote" ?
+Learn more about
+[local remotes](/doc/user-guide/data-management/remote-storage#file-systems-local-remotes).

-While the term may seem contradictory, it doesn't have to be. The "local" part
-refers to the type of location where the storage is: another directory in the
-same file system. "Remote" is what we call storage for <abbr>DVC
-projects</abbr>. It's essentially a local backup for data tracked by DVC.
-
-</details>
+</admon>

 We use the `-d` (`--default`) option of `dvc remote add` for this:

16 changes: 4 additions & 12 deletions content/docs/command-reference/remote/list.md
@@ -40,18 +40,7 @@ and local config files (in that order).

 ## Examples

-For simplicity, let's add a default local remote:
-
-<details>
-
-### What is a "local remote" ?
-
-While the term may seem contradictory, it doesn't have to be. The "local" part
-refers to the type of location where the storage is: another directory in the
-same file system. "Remote" is how we call storage for <abbr>DVC projects</abbr>.
-It's essentially a local backup for data tracked by DVC.
-
-</details>
+For simplicity, let's add a default [local remote]:

 ```cli
 $ dvc remote add -d myremote /path/to/remote
@@ -66,3 +55,6 @@ myremote /path/to/remote
 ```

 The list will also include any previously added remotes.
+
+[local remote]:
+/doc/user-guide/data-management/remote-storage#file-systems-local-remotes