Add remote store main page updates. Add shallow snapshots #5078

Merged
merged 9 commits into from
Sep 22, 2023

Conversation

Naarcha-AWS
Collaborator

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developer Certificate of Origin.
    For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

@Naarcha-AWS Naarcha-AWS added the "3 - Tech review", "release-notes", and "v2.10.0" labels Sep 22, 2023
@Naarcha-AWS Naarcha-AWS self-assigned this Sep 22, 2023
Signed-off-by: Naarcha-AWS <[email protected]>
@sachinpkale
Member

Overall changes for remote store main page LGTM.

@Naarcha-AWS Naarcha-AWS added the "4 - Doc review" label and removed the "3 - Tech review" label Sep 22, 2023
Contributor

@cwillum cwillum left a comment

Looks good.

Co-authored-by: Chris Moore <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Collaborator

@natebower natebower left a comment

@Naarcha-AWS Please see my comments and changes and let me know if you have any questions. Thanks!

```json
POST my_index/_refresh
```
After segments are created on the primary shard as part of the refresh, flush, and merge flow, the segments are uploaded to remote segment store and the replica shards source a copy from the same remote segment store. This frees up the primary shard from having to perform a data copying operation.
Collaborator

"prevents" instead of "frees up"?


## Enable the feature flag
Remote-backed storage is a cluster level setting. It can only be enabled when bootstrapping to the cluster. After bootstrapping completes, the remote-backed storage cannot be enabled or disabled. This provides durability at the cluster level.
Collaborator

Suggested change
Remote-backed storage is a cluster level setting. It can only be enabled when bootstrapping to the cluster. After bootstrapping completes, the remote-backed storage cannot be enabled or disabled. This provides durability at the cluster level.
Remote-backed storage is a cluster-level setting. It can only be enabled when bootstrapping to the cluster. After bootstrapping completes, the remote-backed storage cannot be enabled or disabled. This provides durability at the cluster level.


# Shallow snapshots

Shallow copy snapshots allow you to reference data from an entire remote-backed segment instead of storing all of the data from the segment in a snapshot. This makes accessing segment data faster than normal snapshots, because segment data is not stored in the snapshot repository.
Collaborator

In the last sentence, I would either remove "than normal snapshots" or change to "than when using normal snapshots".


- Shallow copy snapshots only work for remote-backed indexes.
- All nodes in the cluster must use OpenSearch 2.10 or later to take advantage of shallow copy snapshots.
- There is no difference in file size between standard (regular, normal, primary or replica???) shards and shallow copy snapshot shards because no segment data is stored in the snapshot itself.
Collaborator

I'm assuming either the entire parenthetical or the question marks within it should be removed?

Collaborator Author

This was a suggestion from Chris I misunderstood. Adjusting.

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
@Naarcha-AWS Naarcha-AWS merged commit 9479429 into main Sep 22, 2023

Use the [Cluster Settings API]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-settings/) to enable the `remote_store_index_shallow_copy` repository setting, as shown in the following example:

@Naarcha-AWS We don't need to update the Cluster Settings API. We need to call the `PUT _snapshot/` API to enable this.

Collaborator Author

Could you post the exact call with request body here?

Something like this:

```bash
curl -X PUT "localhost:9200/_snapshot/snap_repo?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "test-bucket",
    "base_path": "daily-snaps",
    "remote_store_index_shallow_copy": true
  }
}
'
```
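
For anyone scripting this rather than using curl, here is a minimal sketch of the same request in Python, using only the standard library. It reuses the example values from the call above (`snap_repo`, `test-bucket`, `daily-snaps`). The helper name `build_repo_request` is hypothetical, and the request is only built, not sent, so it can be inspected without a running cluster.

```python
import json
import urllib.request


def build_repo_request(host, repo, bucket, base_path, shallow_copy=True):
    """Build (but do not send) a PUT _snapshot/<repo> request.

    Hypothetical helper for illustration; the values mirror the curl example.
    """
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            "base_path": base_path,
            # Repository setting discussed in this thread.
            "remote_store_index_shallow_copy": shallow_copy,
        },
    }
    return urllib.request.Request(
        url=f"http://{host}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_repo_request("localhost:9200", "snap_repo", "test-bucket", "daily-snaps")
# Sending is left to the caller, for example urllib.request.urlopen(req)
# against a cluster that actually has the S3 repository plugin configured.
```

Keeping the setting inside the repository `settings` object, rather than in a cluster settings call, matches the point made in the comment above.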


- Shallow copy snapshots only work for remote-backed indexes.
- All nodes in the cluster must use OpenSearch 2.10 or later to take advantage of shallow copy snapshots.
- There is no difference in file size between standard shards and shallow copy snapshot shards because no segment data is stored in the snapshot itself.

Also, this line needs to be updated. The snapshot status API (https://opensearch.org/docs/latest/api-reference/snapshots/get-snapshot-status/#snapshot-file-stats) shows the incremental file count and size in bytes between the last snapshot and the current snapshot. For a shallow copy snapshot, the incremental file count and size in bytes will be zero.
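
To make the point above concrete, here is a small Python sketch that checks a snapshot status response for zero incremental stats. It assumes the response nests the counters under `stats.incremental` as `file_count` and `size_in_bytes`. The helper name and both sample responses are hypothetical and trimmed for illustration, not real cluster output.

```python
def incremental_stats_are_zero(status_response):
    """Return True if every snapshot reports zero incremental files and bytes.

    Assumes the counters live under stats.incremental (an assumption about
    the response shape; adjust if the actual response differs).
    """
    snapshots = status_response.get("snapshots", [])
    if not snapshots:
        return False
    for snap in snapshots:
        incremental = snap.get("stats", {}).get("incremental", {})
        if incremental.get("file_count") != 0:
            return False
        if incremental.get("size_in_bytes") != 0:
            return False
    return True


# Hypothetical, trimmed responses for illustration only.
shallow_status = {
    "snapshots": [
        {"snapshot": "snap-1",
         "stats": {"incremental": {"file_count": 0, "size_in_bytes": 0}}}
    ]
}
full_status = {
    "snapshots": [
        {"snapshot": "snap-2",
         "stats": {"incremental": {"file_count": 12, "size_in_bytes": 4096}}}
    ]
}

shallow_result = incremental_stats_are_zero(shallow_status)
full_result = incremental_stats_are_zero(full_status)
```

A check like this only illustrates the behavior described in the comment. Zero incremental stats could also appear for other reasons, such as an unchanged index, so it is not a reliable way to detect shallow copy snapshots on its own.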


# Shallow snapshots

Shallow copy snapshots allow you to reference data from an entire remote-backed segment instead of storing all of the data from the segment in a snapshot. This makes accessing segment data faster than using normal snapshots because segment data is not stored in the snapshot repository.

@harishbhakuni harishbhakuni Sep 25, 2023

Shallow copy snapshots allow you to reference data directly from the remote store repository instead of storing all of the segment data again in the snapshot repository. These snapshots are created faster than normal snapshots because segment data is not stored in the snapshot repository.

harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this pull request Oct 31, 2023
…-project#5078)

* Add remote store main page updates. Add shallow snapshots

Signed-off-by: Naarcha-AWS <[email protected]>

* Add next steps section

Signed-off-by: Naarcha-AWS <[email protected]>

* Remove old content. Fix link. Fix typo.

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix link

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _tuning-your-cluster/availability-and-recovery/remote-store/index.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Chris Moore <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Update _tuning-your-cluster/availability-and-recovery/remote-store/index.md

Co-authored-by: Chris Moore <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Chris Moore <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
vagimeli pushed a commit that referenced this pull request Dec 21, 2023
@Naarcha-AWS Naarcha-AWS deleted the remote-store-main-page branch March 28, 2024 23:23
Labels
4 - Doc review PR: Doc review in progress release-notes PR: Include this PR in the automated release notes v2.10.0
5 participants