[DOCS] Clarify migration from warm to cold tier for searchable snapshots #77583

arteam (Contributor) commented Sep 10, 2021

Specify that we can't just reuse the local data as a cold cache and that
the migration incurs data transfer costs.

Closes #74385

@arteam added the :Distributed Coordination/Snapshot/Restore, v8.0.0, and Team:Docs labels Sep 10, 2021
@elasticmachine added the Team:Distributed (Obsolete) label Sep 10, 2021
elasticmachine (Collaborator)

Pinging @elastic/es-docs (Team:Docs)

elasticmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

@arteam added the >docs label Sep 10, 2021
@arteam changed the title from "Clarify migration from warm to cold tier for searchable snapshots" to "[DOCS] Clarify migration from warm to cold tier for searchable snapshots" Sep 10, 2021
Comment on lines +34 to +36
[NOTE]
Migration from the warm to the cold tier requires snapshotting the data in the repo
and reading it back to the node which incurs data transfer costs.

jrodewig (Contributor) commented Sep 10, 2021

Thanks for this update @arteam!

Is this isolated to the warm->cold transition?

It seems like you'd incur this cost whenever you create a searchable snapshot, regardless of the tier or phase transition. For example, if someone goes directly from hot->frozen, I think you'd incur the same data transfer costs.

Contributor

I think we should elaborate a bit here, mentioning that this is also true when hot/warm and cold are co-located. I think this is the part that can confuse some users. Also, we should note that if you already take snapshots of your data, the actual data uploaded to the repo will be minimal. Finally, I am a bit worried about the terminology "data transfer costs". I think we should clarify that it requires downloading data, which may incur costs depending on the operating environment. Perhaps link to what David wrote in #77607 rather than elaborate too much here on the costs.

I think frozen is less confusing since:

  1. We recommend using dedicated frozen nodes.
  2. Partially mounted indices will only download a minimum set of data initially.
  3. The bulk of the download happens on search.

Still, adding a note may be worthwhile.

arteam (Contributor Author)

I think #77607 kind of supersedes this change and makes it superfluous. #77607 explains in detail all the points about the migration between the tiers including the structure of "data transfer costs".

DaveCTurner (Contributor)

I just came across this; sorry, I didn't know you were working on these docs too, @arteam (hence why I opened #77607). Echoing what Henning says, yes, I think we need to write more words here, as this is the sort of thing that comes up frequently when speaking to customers. If we just said this, we would be giving the impression that searchable snapshots normally cost more in terms of data transfer than regular indices, which isn't the case.

arteam (Contributor Author) commented Sep 14, 2021

Totally agree, that's why I left my comment #77583 (comment). The doc change #77607 is definitely much more clear, detailed, and articulate than this change which is too blunt. I totally agree that we shouldn't give a "data transfer costs" warning by default because for the majority of customers these costs would be negligible.

I think we can close this issue in favour of #77607 when it's merged.

henningandersen (Contributor)

I think the specific point in #74385 is not really covered by #77607 and could still deserve an explicit call-out, either in #77607 or as a follow-up. When hot/warm and cold are co-located, it is not obvious from the docs that the data will still need to be downloaded (even when no "relocation" happens).

DaveCTurner (Contributor) commented Sep 14, 2021

I think it is pretty much covered in #77607. We say these things:

Takes a snapshot of the managed index in the configured repository and mounts it
[...]
Mounting and relocating the shards of {search-snap} indices involves copying data from the snapshot repository.

I've adjusted this last bit to say "... involves copying the shard contents ..." in 93e0141. Do you think that's enough?

arteam (Contributor Author) commented Sep 15, 2021

Superseded by #77607

@arteam closed this Sep 15, 2021
@arteam deleted the not-data-transfer-costs-for-ss-data-tiers branch September 15, 2021 07:46

Labels: :Distributed Coordination/Snapshot/Restore, >docs, Team:Distributed (Obsolete), Team:Docs, v8.0.0-alpha2

Linked issue: Clarify migration from warm to cold tier (#74385)

6 participants