Fix inefficient (worst case exponential) loading of snapshot repository #24510

Merged
merged 4 commits into from
May 8, 2017

Conversation

joachimdraeger
Contributor

Ensure that getRepositoryData() is only called once during a list snapshots operation. Fixes #24509.
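The difference between reloading the repository data for every snapshot and loading it once can be sketched with a toy model. Everything here is an assumption made for illustration: `CountingRepositorySketch`, its fields, and its methods are hypothetical and are not the Elasticsearch classes; only the name `getRepositoryData` is borrowed from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, not Elasticsearch code: it only counts how often
// the repository data is loaded while listing snapshots.
final class CountingRepositorySketch {
    private final List<String> snapshotIds;
    int repositoryDataLoads = 0; // number of simulated loads so far

    CountingRepositorySketch(List<String> snapshotIds) {
        this.snapshotIds = snapshotIds;
    }

    // Stand-in for an expensive read of the repository index.
    List<String> getRepositoryData() {
        repositoryDataLoads++;
        return new ArrayList<>(snapshotIds);
    }

    // Pattern being fixed: the data is reloaded for every snapshot in the list.
    int listReloadingPerSnapshot() {
        int listed = 0;
        for (String id : getRepositoryData()) {
            List<String> data = getRepositoryData(); // redundant reload
            if (data.contains(id)) {
                listed++;
            }
        }
        return listed;
    }

    // Pattern after the fix: load once, reuse the result for every snapshot.
    int listLoadingOnce() {
        List<String> data = getRepositoryData();
        int listed = 0;
        for (String id : data) {
            if (data.contains(id)) {
                listed++;
            }
        }
        return listed;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("s1", "s2", "s3");
        CountingRepositorySketch before = new CountingRepositorySketch(ids);
        before.listReloadingPerSnapshot();
        CountingRepositorySketch after = new CountingRepositorySketch(ids);
        after.listLoadingOnce();
        System.out.println(before.repositoryDataLoads + " loads vs " + after.repositoryDataLoads);
    }
}
```

In this toy model, listing n snapshots costs n + 1 loads with the per-snapshot pattern and a single load otherwise.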

…ry data

when checking for incompatible snapshots.
@elasticmachine
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

* @param snapshotId snapshot id
* @return information about snapshot
*/
SnapshotInfo getSnapshotInfo(RepositoryData repositoryData, SnapshotId snapshotId);
Contributor Author

I appreciate that requiring callers to pass RepositoryData to getSnapshotInfo might not be the cleanest solution for the Repository API.

@jasontedor jasontedor requested a review from abeyad May 5, 2017 14:24
@abeyad

abeyad commented May 5, 2017

@joachimdraeger thank you for this contribution, and great catch! Indeed this would help solve the performance issues introduced by incompatible-snapshots. However, as you stated, I don't think we want to alter the Repository interface to take the RepositoryData. Rather, I would propose a different solution: in TransportGetSnapshotsAction, we have already retrieved the RepositoryData, so we can pass the incompatible snapshots list from that RepositoryData instance into SnapshotsService#snapshots. The SnapshotsService#snapshots method can then return a SnapshotInfo.incompatible instance itself, instead of relying on BlobStoreRepository#getSnapshotInfo to do so.
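A minimal sketch of the shape of this proposal, under stated assumptions: `SketchRepository`, `SnapshotListingSketch`, and the string-based snapshot info are hypothetical stand-ins for illustration, not the real Elasticsearch types or signatures. The caller passes in the incompatible snapshot IDs it already holds, so the repository lookup does not need the repository data at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a repository; not the real Repository interface.
final class SketchRepository {
    // Simulates reading a single snapshot's metadata file.
    // Note that it takes no repository data and does no incompatibility check.
    String getSnapshotInfo(String snapshotId) {
        return snapshotId + ":SUCCESS";
    }
}

// Hypothetical stand-in for the service that builds the snapshot listing.
final class SnapshotListingSketch {
    // The incompatible IDs come from repository data the caller loaded once.
    static List<String> snapshots(SketchRepository repository,
                                  List<String> snapshotIds,
                                  Set<String> incompatibleSnapshotIds) {
        List<String> snapshotList = new ArrayList<>();
        for (String snapshotId : snapshotIds) {
            if (incompatibleSnapshotIds.contains(snapshotId)) {
                // The metadata file cannot be read, so return a marker entry.
                snapshotList.add(snapshotId + ":INCOMPATIBLE");
            } else {
                snapshotList.add(repository.getSnapshotInfo(snapshotId));
            }
        }
        return Collections.unmodifiableList(snapshotList);
    }

    public static void main(String[] args) {
        Set<String> incompatible = new HashSet<>(Arrays.asList("snap-2"));
        System.out.println(snapshots(new SketchRepository(),
                Arrays.asList("snap-1", "snap-2"), incompatible));
    }
}
```

The design choice illustrated here is that the incompatibility decision moves to the one place that already has the data, leaving the repository lookup unchanged.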

Let me know if you have the time to do this, or if you prefer I take it on. We would want to get this in for 5.4.1 and 5.5.0.

Thanks again!

Regarding your question of running gradle check, what exactly was failing?

@abeyad abeyad left a comment

I left feedback on a different approach

@joachimdraeger
Contributor Author

@abeyad thanks for your feedback. I see that the incompatible SnapshotInfo is only used for the list. In get and restore, incompatibility is currently determined twice: directly and in getSnapshotInfo.

I think it would be nicer in general if getSnapshotInfo, with the INCOMPATIBLE state, were the single source of truth in all cases. However, that would require a much bigger refactoring.

For now I agree that removing the check from 'getSnapshotInfo' and letting 'SnapshotsService#snapshots' check for incompatible snapshots itself seems to be the most viable solution.

I'm happy to give it a go on Monday.

@abeyad

abeyad commented May 5, 2017

Thank you @joachimdraeger, don't hesitate to ping me on Monday if I can help in any way.

@joachimdraeger
Contributor Author

@abeyad I re-implemented the fix as discussed. I manually tested both the listing of incompatible snapshots and the performance improvement.

@abeyad abeyad left a comment

I left a couple small comments. Once fixed, we can get this merged.

// an incompatible snapshot - cannot read its snapshot metadata file, just return
// a SnapshotInfo indicating its incompatible
return SnapshotInfo.incompatible(snapshotId);
}

Since getSnapshotInfoInternal (just below) is only used by getSnapshotInfo, we can move the code in getSnapshotInfoInternal directly into getSnapshotInfo and get rid of getSnapshotInfoInternal.

@@ -196,6 +206,7 @@ public SnapshotInfo snapshot(final String repositoryName, final SnapshotId snaps
return Collections.unmodifiableList(snapshotList);
}



remove extra new line

@abeyad abeyad left a comment

LGTM, I will run the tests locally as well and merge. Thank you for working on this @joachimdraeger!

@abeyad

abeyad commented May 8, 2017

@elasticmachine test this please

@abeyad abeyad added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >bug v5.4.1 v5.5.0 v6.0.0 labels May 8, 2017
@abeyad abeyad merged commit fec1802 into elastic:master May 8, 2017
abeyad pushed a commit that referenced this pull request May 8, 2017
This commit fixes inefficient (worst case exponential) loading of 
snapshot repository data when checking for incompatible snapshots,
that was introduced in #22267.  When getting snapshot information,
getRepositoryData() was called on every snapshot, so if there are
a large number of snapshots in the repository and _all snapshots
were requested, the performance degraded exponentially.  This
commit fixes the issue by only calling getRepositoryData once and
using the data from it in all subsequent calls to get snapshot 
information.

Closes #24509
abeyad pushed a commit that referenced this pull request May 8, 2017
@abeyad

abeyad commented May 8, 2017

5.x commit: 690250e
5.4 commit: 21da532

@abeyad

abeyad commented May 8, 2017

@joachimdraeger the PR has been merged - thanks again!

@talevy
Contributor

talevy commented May 8, 2017

@abeyad are you sure this change was applied properly to 5.x?

I am seeing that there is a rogue reference to the getSnapshotInfoInternal method:

https://github.com/elastic/elasticsearch/blob/5.x/core/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java#L369

seems to be the only call to it... but the method definition is deleted.

abeyad pushed a commit that referenced this pull request May 8, 2017
abeyad pushed a commit that referenced this pull request May 8, 2017
@abeyad

abeyad commented May 8, 2017

@talevy I pushed ccee1b0 and 6a0e070 to fix the issue on 5.x

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request May 9, 2017
* master:
  Increase compilation limit in ingest tests
  Mark 6.0.0-alpha1 as prerelease
  Updated release notes for 6.0.0-alpha1
  Fix single shard scroll within a cluster with nodes in version >= 5.3 and <= 5.3 (elastic#24512)
  add option for _ingest.timestamp to use new ZonedDateTime (elastic#24030)
  Fixes inefficient loading of snapshot repository data (elastic#24510)
  Scripting: Deprecate file scripts (elastic#24552)
  Remove commented code from ESILRTC
  Ensure test replicas have valid recovery state
  Add global checkpoint assertion in index shard
  Improve bootstrap checks error messages
  Refactor UpdateHelper into unit-testable pieces
  Fix cache expire after access
  Document work-around for jar hell in idea_rt.jar file (elastic#24523)
  Move MockLogAppender to elasticsearch test (elastic#24542)
  Remove gap skipping when opening engine
  documentation of preserve existing settings
  remove duplicated import in AppendProcessor
@clintongormley clintongormley changed the title Fix inefficient (worst case exponential) loading of snapshot reposito… Fix inefficient (worst case exponential) loading of snapshot repository May 15, 2017