Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documented Bulk IO issues #2340

Merged
merged 6 commits into from
Jan 22, 2018
Merged

Documented Bulk IO issues #2340

merged 6 commits into from
Jan 22, 2018

Conversation

Amruta-Ranade
Copy link
Contributor

@Amruta-Ranade Amruta-Ranade commented Jan 9, 2018

Closes #2318 #2238 #2254 #2253 #2255

Because the PR has multiple reviewers, I have pointed out the sections to be reviewed for each reviewer. Hope that helps!

@cockroach-teamcity
Copy link
Member

This change is Reviewable

v2.0/backup.md Outdated
{{site.data.alerts.callout_info}}Because CockroachDB is a distributed system, you cannot meaningfully store backups "locally" on nodes. The entire backup file must be stored in a single location, so attempts to store backups locally must point to an NFS drive to be useful.{{site.data.alerts.end}}
- The backup location parameters often contain special characters that need to be URI-encoded. Use Javascript's [encodeURIComponent](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent) function or Go language's [url.QueryEscape](https://golang.org/pkg/net/url/#QueryEscape) function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
- [<sup>1</sup>](#import-file-temp-directory-urls) Because CockroachDB is a distributed system, you cannot meaningfully store backups "locally" on nodes. The entire backup file must be stored in a single location, so attempts to store backups locally must point to an NFS drive to be useful.
- [<sup>2</sup>](#import-file-temp-directory-urls) <span class="version-tag">New in v2.0</span> The file system backup location on a local node or NFS drive is relative to the path specified by the `--external-io-dir` flag set while [starting the node](start-a-node.html). If the flag is set to `disabled`, then backups to local directories and NFS drives are disabled.
Copy link
Contributor Author

@Amruta-Ranade Amruta-Ranade Jan 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dt Please review the bullet point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct, though we could include in the last one (which defaults a subdirectory of the first configured store) or similar if we wanted to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also mention that, if you do not want to restart your nodes to change this flag, you can create symlinks to the locations you want to use from within the extern dir to read/write to paths outside the extern dir

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dt Does this sound right: "The file system backup location on a local node or NFS drive is relative to the path specified by the --external-io-dir flag set while starting the node. If the --external-io-dir flag is set to disabled, then backups to local directories and NFS drives are disabled. By default, the flag is set to the extern subdirectory of the first configured store. To avoid restarting your nodes to set the --external-io-dir flag to the locations you want to use, create symlinks to the desired locations from within the extern directory."


{{site.data.alerts.callout_info}}Because CockroachDB is a distributed system, you cannot meaningfully store backups "locally" on nodes. The entire backup file must be stored in a single location, so attempts to store backups locally must point to an NFS drive to be useful.{{site.data.alerts.end}}
- The backup location parameters often contain special characters that need to be URI-encoded. Use Javascript's [encodeURIComponent](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent) function or Go language's [url.QueryEscape](https://golang.org/pkg/net/url/#QueryEscape) function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
Copy link
Contributor Author

@Amruta-Ranade Amruta-Ranade Jan 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benesch Please review this paragraph.

v2.0/import.md Outdated
@@ -172,14 +172,17 @@ URLs for the file you want to import and your temp directory must use the follow
| Google Cloud [<sup>1</sup>](#notes) | `gs` | Bucket name | None |
| HTTP [<sup>2</sup>](#notes) | `http` | Remote host | N/A |
| NFS/Local [<sup>3</sup>](#notes) | `nodelocal` | File system location | N/A |
| S3-compatible services | `s3` | Bucket name | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, `AWS_ENDPOINT` |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjibson Please review this row.

@@ -109,8 +109,13 @@ The URL for your backup's locations must use the following format:
| Google Cloud Storage | `gs` | Bucket name | None––currently only supports instance auth, but we can build non-instance auth at a customer's request |
| HTTP | `http` | Remote host | N/A |
| NFS | `nodelocal` | File system location | N/A |
| S3-compatible services | `s3` | Bucket name | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, `AWS_ENDPOINT` |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjibson Please review this row.

{{site.data.alerts.callout_info}}Backups stored on NFSes work only if each node in the cluster accesses the drive in the same way.{{site.data.alerts.end}}
#### Considerations

- The backup location parameters often contain special characters that need to be URI-encoded. Use Javascript's [encodeURIComponent](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent) function or Go language's [url.QueryEscape](https://golang.org/pkg/net/url/#QueryEscape) function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benesch Please review this section

#### Considerations

- The backup location parameters often contain special characters that need to be URI-encoded. Use Javascript's [encodeURIComponent](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent) function or Go language's [url.QueryEscape](https://golang.org/pkg/net/url/#QueryEscape) function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
- Backups stored on NFSes work only if each node in the cluster accesses the drive in the same way.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dt Please review this section

Copy link
Member

@dt dt Jan 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --external-io-dir mitigates this limitation to some extent -- previously, since the path was relative to /, every node had to mount the NFS share in the same place, like /mnt/nfs/backups or whatever. Now that the backup path is relative to the node's configured external-io-dir, nodes can have the NFS mount at whatever location they want as long as they all point to the same share using their external-io-dir flag.. e.g. one node might be started with --external-io-dir=/mnt/nfs/backups while another might use --external-io-dir=/Volumes/backups or whatever.

The important thing is that all nodes use --external-io-dir to point to the same shared storage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dt So I can delete the second bullet (Backups stored on NFSes...), right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Amruta-Ranade, @dt, has this conversation been resolved?

@@ -42,6 +42,7 @@ Flag | Description
`--background` | Set this to start the node in the background. This is better than appending `&` to the command because control is returned to the shell only once the node is ready to accept requests.
`--cache` | The total size for caches, shared evenly if there are multiple storage devices. This can be a percentage or any bytes-based unit, for example: <br><br>`--cache=25%`<br>`--cache=1000000000 ----> 1000000000 bytes`<br>`--cache=1GB ----> 1000000000 bytes`<br>`--cache=1GiB ----> 1073741824 bytes` <br><br><strong>Note:</strong> If you enter the cache size as a percentage, you might need to escape the `%` sign, for instance, while configuring CockroachDB through systemd service files.<br><br><span class="version-tag">Changed in v1.1: </span>**Default:** `128MiB`<br><br>The default cache size is reasonable for local development clusters. For production deployments, this should be increased to 25% or higher. See [Recommended Production Settings](recommended-production-settings.html#cache-and-sql-memory-size-changed-in-v1-1) for more details.
`--certs-dir` | The path to the [certificate directory](create-security-certificates.html). The directory must contain valid certificates if running in secure mode.<br><br>**Default:** `${HOME}/.cockroach-certs/`
`--external-io-dir` | <span class="version-tag">New in v2.0</span> The path of the external IO directory with which the local file access paths are prefixed while performing backup and restore operations using local node directories or NFS drives. If set to `disabled`, backups and restores using local node directories and NFS drives are disabled.<br><br>**Default:** `extern` directory on the first storage device specified by the `--store` flag.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dt Please review this section

Copy link
Member

@dt dt Jan 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just call them stores? though we generally only advise configuring one per physical storage device. To be a bit pedantic, since we default to cockroach-data, it isn't always explicitly specified by --store.

I might go for something like:

extern subdirectory of the first configured store (see the --store flag).

| Backup Location | scheme | host | parameters |
|-----------------|--------|------|------------|
| Amazon S3 | `s3` | Bucket name | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` |
| Azure | `azure` | Container name | `AZURE_ACCOUNT_KEY`, `AZURE_ACCOUNT_NAME` |
| Google Cloud Storage | `gs` | Bucket name | None––currently only supports instance auth, but we can build non-instance auth at a customer's request |
| HTTP | `http` | Remote host | N/A |
| NFS | `nodelocal` | File system location | N/A |
| NFS [<sup>1</sup>](#considerations) | `nodelocal` | File system location [<sup>2</sup>](#considerations) | N/A |
| S3-compatible services | `s3` | Bucket name | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, `AWS_ENDPOINT` |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjibson Please review this row.

v2.0/import.md Outdated

#### Notes

[<sup>1</sup>](#import-file-temp-directory-urls) Only supports instance auth.

[<sup>2</sup>](#import-file-temp-directory-urls) You can easily create your own HTTP server with [Caddy or nginx](create-a-file-server.html).

[<sup>3</sup>](#import-file-temp-directory-urls) If using NFS for your temp directory, each node in the cluster must have access to the NFS using the same URL.
[<sup>3</sup>](#import-file-temp-directory-urls) If using NFS for your temp directory, each node in the cluster must have access to the NFS using the same URL. The file system backup location on the NFS drive is relative to the path specified by the `--external-io-dir` flag set while [starting the node](start-a-node.html). If the flag is set to `disabled`, then imports from local directories and NFS drives are disabled.
Copy link
Contributor Author

@Amruta-Ranade Amruta-Ranade Jan 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dt Please review this bullet point

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First sentence of this is no longer quite true (due to the second).
Now it is important only that each node in your cluster configure its --external-io-dir flag to point to its local mount point of the that NFS share (or, as an alternative, symlink from the default extern subdirectory to the NFS-backed directory).

v2.0/import.md Outdated
[<sup>3</sup>](#import-file-temp-directory-urls) If using NFS for your temp directory, each node in the cluster must have access to the NFS using the same URL.
[<sup>3</sup>](#import-file-temp-directory-urls) If using NFS for your temp directory, each node in the cluster must have access to the NFS using the same URL. The file system backup location on the NFS drive is relative to the path specified by the `--external-io-dir` flag set while [starting the node](start-a-node.html). If the flag is set to `disabled`, then imports from local directories and NFS drives are disabled.

The location parameters often contain special characters that need to be URI-encoded. Use Javascript's [encodeURIComponent](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent) function or Go language's [url.QueryEscape](https://golang.org/pkg/net/url/#QueryEscape) function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
Copy link
Contributor Author

@Amruta-Ranade Amruta-Ranade Jan 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benesch Please review this paragraph

Copy link
Contributor

@maddyblue maddyblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My parts LGTM

@benesch
Copy link
Contributor

benesch commented Jan 10, 2018

:lgtm:


Reviewed 4 of 4 files at r1.
Review status: all files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.


Comments from Reviewable

@dt
Copy link
Member

dt commented Jan 16, 2018

Review status: 1 of 4 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.


v2.0/backup.md, line 135 at r1 (raw file):

Previously, Amruta-Ranade (Amruta Ranade) wrote…

@dt Does this sound right: "The file system backup location on a local node or NFS drive is relative to the path specified by the --external-io-dir flag set while starting the node. If the --external-io-dir flag is set to disabled, then backups to local directories and NFS drives are disabled. By default, the flag is set to the extern subdirectory of the first configured store. To avoid restarting your nodes to set the --external-io-dir flag to the locations you want to use, create symlinks to the desired locations from within the extern directory."

👍 that last sentence could be prefixed with Note: or something or have a you can or some indication that it is an optional suggestion, not a strong recommendation.


v2.0/restore.md, line 117 at r1 (raw file):

Previously, Amruta-Ranade (Amruta Ranade) wrote…

@dt So I can delete the second bullet (Backups stored on NFSes...), right?

yes.


Comments from Reviewable

@dt
Copy link
Member

dt commented Jan 16, 2018

:lgtm:


Review status: 1 of 4 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.


Comments from Reviewable

Copy link
Contributor

@jseldess jseldess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, @Amruta-Ranade. Looks good overall. Just a few comments/questions.

v2.0/backup.md Outdated
{{site.data.alerts.callout_info}}Because CockroachDB is a distributed system, you cannot meaningfully store backups "locally" on nodes. The entire backup file must be stored in a single location, so attempts to store backups locally must point to an NFS drive to be useful.{{site.data.alerts.end}}
- The backup location parameters often contain special characters that need to be URI-encoded. Use Javascript's [encodeURIComponent](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent) function or Go language's [url.QueryEscape](https://golang.org/pkg/net/url/#QueryEscape) function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
- [<sup>1</sup>](#import-file-temp-directory-urls) Because CockroachDB is a distributed system, you cannot meaningfully store backups "locally" on nodes. The entire backup file must be stored in a single location, so attempts to store backups locally must point to an NFS drive to be useful.
- [<sup>2</sup>](#import-file-temp-directory-urls) <span class="version-tag">New in v2.0</span> The file system backup location on a local node or NFS drive is relative to the path specified by the `--external-io-dir` flag set while [starting the node](start-a-node.html). If the `--external-io-dir` flag is set to `disabled`, then backups to local directories and NFS drives are disabled. By default, the flag is set to the `extern` subdirectory of the first configured [`store`](#store). To avoid restarting your nodes to set the `--external-io-dir` flag to the locations you want to use, create symlinks to the desired locations from within the `extern` directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The <span class="version-tag">New in v2.0</span> doesn't behave well will bulleted lists. Renders weird:

    screen shot 2018-01-19 at 10 11 27 am

    So let's just remove it here. It's called out as new in the cockroach start docs, which is sufficient.

  • (#store) should be (start-a-node.html#store).

  • In the last sentence, "To avoid restarting your nodes to set the --external-io-dir flag to the locations you want to use` is confusing me. Can you rephrase?

v2.0/import.md Outdated
@@ -172,14 +172,17 @@ URLs for the file you want to import and your temp directory must use the follow
| Google Cloud [<sup>1</sup>](#notes) | `gs` | Bucket name | None |
| HTTP [<sup>2</sup>](#notes) | `http` | Remote host | N/A |
| NFS/Local [<sup>3</sup>](#notes) | `nodelocal` | File system location | N/A |
| S3-compatible services | `s3` | Bucket name | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, `AWS_ENDPOINT` |

#### Notes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is called Considerations in backup.md, perhaps use that heading here as well? Also use bullets like in backup.md?

#### Considerations

- The backup location parameters often contain special characters that need to be URI-encoded. Use Javascript's [encodeURIComponent](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent) function or Go language's [url.QueryEscape](https://golang.org/pkg/net/url/#QueryEscape) function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
- Backups stored on NFSes work only if each node in the cluster accesses the drive in the same way.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Amruta-Ranade, @dt, has this conversation been resolved?

v2.0/restore.md Outdated

- The backup location parameters often contain special characters that need to be URI-encoded. Use Javascript's [encodeURIComponent](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent) function or Go language's [url.QueryEscape](https://golang.org/pkg/net/url/#QueryEscape) function to URI-encode the parameters. Other languages provide similar functions to URI-encode special characters.
- Backups stored on NFSes work only if each node in the cluster accesses the drive in the same way.
- <span class="version-tag">New in v2.0</span> The file system backup location on a local node or NFS drive is relative to the path specified by the `--external-io-dir` flag set while [starting the node](start-a-node.html). If the flag is set to `disabled`, then backups and restores from local directories and NFS drives are disabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same feedback as for backup.md: Let's remove <span class="version-tag">New in v2.0</span> here.

@@ -42,6 +42,7 @@ Flag | Description
`--background` | Set this to start the node in the background. This is better than appending `&` to the command because control is returned to the shell only once the node is ready to accept requests.
`--cache` | The total size for caches, shared evenly if there are multiple storage devices. This can be a percentage or any bytes-based unit, for example: <br><br>`--cache=25%`<br>`--cache=1000000000 ----> 1000000000 bytes`<br>`--cache=1GB ----> 1000000000 bytes`<br>`--cache=1GiB ----> 1073741824 bytes` <br><br><strong>Note:</strong> If you enter the cache size as a percentage, you might need to escape the `%` sign, for instance, while configuring CockroachDB through systemd service files.<br><br><span class="version-tag">Changed in v1.1: </span>**Default:** `128MiB`<br><br>The default cache size is reasonable for local development clusters. For production deployments, this should be increased to 25% or higher. See [Recommended Production Settings](recommended-production-settings.html#cache-and-sql-memory-size-changed-in-v1-1) for more details.
`--certs-dir` | The path to the [certificate directory](create-security-certificates.html). The directory must contain valid certificates if running in secure mode.<br><br>**Default:** `${HOME}/.cockroach-certs/`
`--external-io-dir` | <span class="version-tag">New in v2.0</span> The path of the external IO directory with which the local file access paths are prefixed while performing backup and restore operations using local node directories or NFS drives. If set to `disabled`, backups and restores using local node directories and NFS drives are disabled.<br><br>**Default:** `extern` subdirectory of the first configured `store` (see [`store`](#store) below).<br><br>To avoid restarting your nodes to set the `--external-io-dir` flag to the locations you want to use, create symlinks to the desired locations from within the `extern` directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Add colon to "new" callout, i.e., <span class="version-tag">New in v2.0:</span>.

  • Let's remove "see store below" and just make the previous instance of store a link, i.e.,

    the first configured store.

  • As I mentioned in my review of backup.md, I don't understand what this is saying:

    To avoid restarting your nodes to set the --external-io-dir flag to the locations you want to use, create symlinks to the desired locations from within the extern directory.

    Can you rephrase, or explain it to me in other terms so I can figure out what I'm not getting?

@jseldess
Copy link
Contributor

LGTM

1 similar comment
@jseldess
Copy link
Contributor

TC is failing on a link to www.bottlecaps.de/rr/ui, which is an attribution at the bottom of our full sql grammar page. Merging in the expectation that the site comes back eventually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants