SourceOnlySnapshotRepository is Leaking Files #50231
Labels
>bug
:Distributed Coordination/Snapshot/Restore
Anything directly related to the `_snapshot/*` APIs
Team:Distributed
Meta label for distributed team (obsolete)
Comments
original-brownbear added the `>bug` and `:Distributed Coordination/Snapshot/Restore` labels on Dec 16, 2019
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue on Feb 2, 2020
We shouldn't be creating a new commit (by bootstrapping a new history) if nothing has changed about the shard. Relates elastic#50231 in that it prevents a bunch of redundant `segments_N` files from being uploaded and makes a fix shorter and clearer.
ywelsch added a commit that referenced this issue on Mar 23, 2020
Source-only snapshots currently create a second full source-only copy of the shard on disk to support incrementality during upload. Because stored fields occupy a substantial part of a shard's storage, clusters with source-only snapshots can require up to 50% more local storage. Ideally we would only generate source-only parts of the shard for the things that need to be uploaded (i.e. do incrementality checks on the original files instead of the trimmed-down source-only versions), but that requires much bigger changes to the snapshot infrastructure. This is an attempt to dramatically cut down on the storage used by the source-only copy of the shard by soft-linking the stored-fields files (fd*) instead of copying them. Relates #50231
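The soft-linking idea in this commit message can be illustrated with a small standalone sketch. This is not the Elasticsearch implementation (which is Java and works through Lucene directories); the function name `build_linked_copy` and the flat directory layout are assumptions made for illustration. Files whose extension starts with `.fd` are symlinked into the copy, and a plain file copy stands in for whatever the source-only copy writes for the remaining files.

```python
import os
import shutil
from pathlib import Path


def build_linked_copy(source_dir: Path, target_dir: Path) -> dict:
    """Mirror source_dir into target_dir, symlinking stored-fields files.

    Hypothetical helper: files whose extension starts with ".fd" are
    symlinked so they take (almost) no extra disk space; all other
    files are copied as a stand-in for the rewritten source-only files.
    Assumes target_dir does not already contain entries of the same name.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    result = {"linked": [], "copied": []}
    for entry in sorted(source_dir.iterdir()):
        if not entry.is_file():
            continue  # this sketch only handles a flat directory of files
        destination = target_dir / entry.name
        if entry.suffix.startswith(".fd"):
            # Link to the original instead of duplicating its bytes.
            os.symlink(entry.resolve(), destination)
            result["linked"].append(entry.name)
        else:
            shutil.copy2(entry, destination)
            result["copied"].append(entry.name)
    return result
```

With this layout the copy directory only pays for the non-stored-fields files, which is the storage saving the commit message describes.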
Pinging @elastic/es-distributed (Team:Distributed)
When using `SourceOnlySnapshotRepository`, a `_snapshot` directory is created in each snapshotted shard's data path. The contents of this directory are never cleaned up, it seems.

I tried a trivial fix, but this seems to severely alter the incrementality properties of source-only snapshotting (tests fail because the expected file counts change).

This is kind of obvious from the way it works, but shouldn't we at least delete these `_snapshot` folders when removing a source-only repository? Also, shouldn't we document the fact that using a source-only repository may require significant disk space for the on-disk source-only index used to make things incremental?

Originally reported in https://discuss.elastic.co/t/large-snapshot-directories-on-disk/211988
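To get a rough idea of how much disk space these directories take up, something like the following sketch could be run against a node's data path. It is only an illustration: the function names are hypothetical, the only layout assumption is that the directories are literally named `_snapshot`, and it reports sizes without deleting anything.

```python
import os
from pathlib import Path


def directory_size(path: Path) -> int:
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if not file_path.is_symlink():
                total += file_path.stat().st_size
    return total


def snapshot_dir_sizes(data_path: Path) -> dict:
    """Map every `_snapshot` directory under data_path to its size in bytes."""
    sizes = {}
    for root, dirs, _files in os.walk(data_path):
        if "_snapshot" in dirs:
            snapshot_dir = Path(root) / "_snapshot"
            sizes[snapshot_dir] = directory_size(snapshot_dir)
            # Already measured; do not descend into it again.
            dirs.remove("_snapshot")
    return sizes
```

Calling `snapshot_dir_sizes(Path("/path/to/data"))` (the path is a placeholder) returns one entry per shard that has a leftover `_snapshot` directory, which makes the overhead described above visible.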