Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter Unneeded SnapshotInfo Instances Early in TransportGetSnapshotsAction #78032

Merged

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented Sep 20, 2021

Better to filter as early as possible to release the memory asap and not even fetch things we don't need to fetch to begin with.
There's still a bunch of spots remaining where similar optimizations can be added quickly before we implement the system index for the remaining searching/fetching that we can't easily exclude up-front.
This already gives vastly improved performance for many requests though for obvious reasons.

relates #74350

…Action

Better to filter as early as possible to release the memory asap.
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Sep 20, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, I've left minor comments only.

predicate = fromSortValuePredicate.and(predicate);
}
}
return predicate;
}

// Builds a predicate that can be applied to a combination of snapshot id and repository data to filter out snapshots that are not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regular Javadoc blocks look better in most tools/IDE, why not use them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ moved to regular docs all around :)

? snapshotInfo -> after.compareTo(snapshotInfo.snapshotId().getName()) <= 0
: snapshotInfo -> after.compareTo(snapshotInfo.snapshotId().getName()) >= 0;
// already covered by preflight filtering
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the Nullable annotation to the returned value of the method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored this a little with Henning's suggestions now and added nullable wherever it was appropriate

} else {
isAfter = order == SortOrder.ASC
// TODO: cover via pre-flight predicate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be too complex/noisy to address in this PR too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annoyingly enough yes because of #78032 (comment)

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments, otherwise this looks good.

if (details == null) {
return true;
}
return after <= details.getStartTimeMillis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to check for -1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ added the check here and for the duration logic

if (details == null) {
return true;
}
return after >= details.getStartTimeMillis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also check for -1 explicitly here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ added the check here and for the duration logic

} else {
isAfter = order == SortOrder.ASC
// TODO: cover via pre-flight predicate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend to solve this case in this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I was going for a follow-up to keep this one shorter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should point out that the main reason adding this would blow up the size of the PR is a the peculiar behavior we have for after at the moment. The after parameter currently behaves differently from the from_sort_value in the count that it reports. It always reports the same overall count (the number of snapshots matching the query without after) while the from_sort_value simply reports the count of everything matching the request period.

I'm not sure the behavior of after is needed (as that param is just there for iteration anyway and I don't see how the total count even matters in use cases) but I think we should discuss + clean that up in a separate PR :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it makes sense to report back the count on every iteration/pagination, since this is a moving target. So you could see 51 first time for a page of 50 and then after pagination you may see 55. I think it would be nice to have the count in a UI be somewhat consistent with the last page of data shown?

// required to answer the request before loading their SnapshotInfo from the repository
// TODO: extend this method to cover the pagination after value as well where possible
@Nullable
private static BiPredicate<SnapshotId, RepositoryData> buildSnapshotPreflightPredicate(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead have a single method that builds both the early and the late filter? I think that would make it easier to instinctly verify that the return nulls in buildFromSortValuePredicate are correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it build both predicates in the same code path now to make this easier to follow :)

final long after = Long.parseLong(fromSortValue);
return order == SortOrder.ASC ? (snapshotId, repositoryData) -> {
final RepositoryData.SnapshotDetails details = repositoryData.getSnapshotDetails(snapshotId);
if (details == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests generating data with no details as well as with details but -1 timestamp?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't unfortunately. I'll try adding some. We should have the infrastructure to do it with reasonable effort.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have the infrastructure to do it with reasonable effort

Turns out that we do not. We have infrastructure for testing this, but it's broken in a subtle way causing it to not actually test missing information here (it's build using the test-repo BlobStoreRepositoryTest but doesn't work because the repo never writes RepositoryData without the details at this point).
Adding tests for this case requires fixing that functionality which is a bigger piece of work I think. Given how the fact that the repository regenerates the snapshot details so efficiently these days it seems like quite the edge case and may be ok to test in a follow-up? :)

@original-brownbear
Copy link
Member Author

Thanks @henningandersen and @tlrx I addressed what I could from your comments and tried to explain the open points. This should be good for another round whenever you have some time :)

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -438,7 +470,11 @@ private void snapshots(
snapshotIdsToIterate,
ignoreUnavailable == false,
task::isCancelled,
(context, snapshotInfo) -> snapshotInfos.add(snapshotInfo),
predicate == null ? (context, snapshotInfo) -> snapshotInfos.add(snapshotInfo) : (context, snapshotInfo) -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could use a special predicate object instead of nul to avoid checking for null. We could still have a specific predicate we use to allow the assertions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly similar to what Henning suggested, I wonder if we could fold the logic of checking the nullity of predicates and then test directly into SnapshotPredicates, which would provide a preFilter and filter methods

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do in a follow-up. I have a queued up round of improvements here that really want the special case null (where it would be the same code complexity if I used a placeholder predicate and checked for that) :)

fromSortValuePredicate = null;
break;
case REPOSITORY:
preflightPredicate = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add a comment here explaining where this is handled?

: (info -> compareName(snapshotName, repoName, info) > 0);
}
break;
// TODO: cover via pre-flight predicate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure you can unless we change how the count works? Though I guess we could possibly avoid getting the snapshot info for the "before" snapshots if they are not needed.

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I've left a suggestion to move more logic - is possible - into SnapshotPredicates.

@@ -438,7 +470,11 @@ private void snapshots(
snapshotIdsToIterate,
ignoreUnavailable == false,
task::isCancelled,
(context, snapshotInfo) -> snapshotInfos.add(snapshotInfo),
predicate == null ? (context, snapshotInfo) -> snapshotInfos.add(snapshotInfo) : (context, snapshotInfo) -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly similar to what Henning suggested, I wonder if we could fold the logic of checking the nullity of predicates and then test directly into SnapshotPredicates, which would provide a preFilter and filter methods

@original-brownbear
Copy link
Member Author

Thanks Henning & Tanguy!

@original-brownbear original-brownbear merged commit 377f546 into elastic:master Sep 27, 2021
@original-brownbear original-brownbear deleted the optimize-get-snapshots-api branch September 27, 2021 09:23
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 17, 2021
…Action (elastic#78032)

Better to filter as early as possible to release the memory asap and not even fetch things we don't need to fetch to begin with.
There's still a bunch of spots remaining where similar optimizations can be added quickly before we implement the system index for the remaining searching/fetching that we can't easily exclude up-front.
This already gives vastly improved performance for many requests though for obvious reasons.
original-brownbear added a commit that referenced this pull request Oct 17, 2021
…Action (#78032) (#79321)

Better to filter as early as possible to release the memory asap and not even fetch things we don't need to fetch to begin with.
There's still a bunch of spots remaining where similar optimizations can be added quickly before we implement the system index for the remaining searching/fetching that we can't easily exclude up-front.
This already gives vastly improved performance for many requests though for obvious reasons.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.16.0 v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants