Introduce Next Field in Paginated GetSnapshots Response #74236
Follow up to elastic#73952 adding documentation for the `after` query parameter and the related `next` response field.
Pinging @elastic/es-distributed (Team:Distributed)
    }
  }
],
"next": "snapshot_2,snapshot_2"
I wonder if we should just not allow pagination when doing a query across multiple repos. The current approach of paginating per repo becomes really confusing when multiple `next` values are returned but only one `after` can be set. Maybe this doesn't matter too much in the short run, as it's an 8.x-only issue anyway, but for 8 we should have something nicer?
Hmm, that does seem suboptimal; I think it would be useful to have pagination even if multiple repos are involved. Why do we need to return multiple `next` values? Since we're using the same sort order for all of them, can we not use the minimum of the `next` values? Relatedly, does this mean that the `limit` applies to each repository separately rather than across all of them?
> Relatedly, does this mean that the limit applies to each repository separately rather than across all of them?
Yes, currently that's how it's implemented (mainly because of the focus on getting this out the door for 7.x, where this is a non-concern because we only work with a single repo at a time, and I didn't want to create different logic for `master` and `7.x` for now, to keep the backport reasonably simple).
> Since we're using the same sort order for all of them can we not use the minimum of the next values?
> Why do we need to return multiple next values?
We can implement pagination globally across all repos, but then the logic for `master` will diverge a lot from `7.x`, which I'd rather not do right now. If we don't do that, though, I don't see an alternative to either forbidding pagination with multiple repos in the query or returning multiple `next` values.
Ok, forbidding it seems the most future-proof answer today: it's easy to become more permissive in future but much harder to change something that works into something else that works differently.
I think it'd be better for the `next` field to be at the top level of the response, though, rather than inside the (single) snapshot object.
The problems/confusion here were resolved last week in #74451. We now have just a single `next` field at the top level, and snapshots are paginated across repositories :)
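To make "paginated across repositories" concrete, here is a minimal sketch of one way global pagination can work: merge the per-repository listings, sort them with a single total order, skip everything up to and including the cursor, and hand back one top-level `next`. The `SnapshotEntry`, `CrossRepoPagination` and `Page` names, the name-then-repository ordering, and the rule that `next` is only set while more results remain are assumptions for illustration, not the code from #74451.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified snapshot descriptor (the real API works with SnapshotInfo).
record SnapshotEntry(String repository, String name) {}

final class CrossRepoPagination {

    // One total order across all repositories: sort by name, break ties by repository.
    static final Comparator<SnapshotEntry> ORDER =
        Comparator.comparing(SnapshotEntry::name).thenComparing(SnapshotEntry::repository);

    // A page of results plus a single cursor; next is null when nothing remains (assumption).
    record Page(List<SnapshotEntry> snapshots, SnapshotEntry next) {}

    static Page paginate(List<List<SnapshotEntry>> perRepository, SnapshotEntry after, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        // Merge every repository's listing and sort it once with the shared order.
        List<SnapshotEntry> all = new ArrayList<>();
        perRepository.forEach(all::addAll);
        all.sort(ORDER);

        // Keep only entries strictly after the cursor, if one was given.
        List<SnapshotEntry> remaining = new ArrayList<>();
        for (SnapshotEntry entry : all) {
            if (after == null || ORDER.compare(entry, after) > 0) {
                remaining.add(entry);
            }
        }

        List<SnapshotEntry> page = List.copyOf(remaining.subList(0, Math.min(size, remaining.size())));
        // Only hand out a cursor if at least one more entry exists beyond this page.
        SnapshotEntry next = remaining.size() > size ? page.get(page.size() - 1) : null;
        return new Page(page, next);
    }
}
```

Because the order is shared by all repositories, a single cursor is enough; there is no need to return one `next` per repository.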
@DaveCTurner @henningandersen sorry for the noise in this one; now that #74451 has been merged, this one should be good to review. I added the `next` parameter at the top level and it works across repositories (admittedly not yet in a super efficient manner, but the API should behave well :)).
Left a few comments, looking good otherwise.
final String[] parts = param.split(",");
if (parts.length != 2) {
    throw new IllegalArgumentException(
        "after param must be of the form ${sort_value},${snapshot_name} but was [" + param + "]"
    );
}
Should we not include the repo as a tiebreaker too? I.e., the format would be `sort_value,repo_name,snapshot_name`?
++ will do
qa/smoke-test-http/src/test/java/org/elasticsearch/http/snapshots/RestGetSnapshotsIT.java
    this.snapshots = List.copyOf(snapshots);
    this.failures = failures == null ? Map.of() : Map.copyOf(failures);
    this.next = next;
}

public GetSnapshotsResponse(StreamInput in) throws IOException {
    this.snapshots = in.readList(SnapshotInfo::readFrom);
    if (in.getVersion().onOrAfter(GetSnapshotsRequest.MULTIPLE_REPOSITORIES_SUPPORT_ADDED)) {
I take it you will backport both this and the multiple repo PR in one go then?
Depends on whether this goes in today, I guess, with feature freeze incoming :) If not, I'll adjust the PR and backport separately.
NOTE: The pagination parameters `size`, `order`, and `sort` are not supported when using `verbose=false` and the sort order for
`after`::
(Optional, string)
Offset identifier to start pagination from as returned by the `next` field in the response body.
I think it is worth spending a few lines explaining that the underlying list can change, and how pagination behaves when snapshots are deleted or added during pagination (i.e., you are sure not to miss any snapshot that was in the list originally and has not been deleted during pagination, and you may see snapshots that were added during pagination).
++ added a few lines on this
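A minimal sketch of why a value-based cursor gives these guarantees, assuming snapshots are sorted by name and `after` holds the last name returned: each page selects entries strictly after the cursor value, so deleting earlier entries cannot shift later ones out of view, while snapshots added after the cursor do show up. `CursorPagingDemo` and `page` are hypothetical names, not part of the Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

final class CursorPagingDemo {

    // Returns up to `size` names that sort strictly after `after`,
    // or from the beginning when no cursor is given.
    static List<String> page(NavigableSet<String> current, String after, int size) {
        NavigableSet<String> tail = after == null ? current : current.tailSet(after, false);
        List<String> result = new ArrayList<>();
        for (String name : tail) {
            if (result.size() == size) {
                break;
            }
            result.add(name);
        }
        return result;
    }
}
```

For example, if the first page over `snap_a, snap_b, snap_c, snap_d` returns `[snap_a, snap_b]`, and then `snap_a` and `snap_c` are deleted while `snap_bb` and `snap_0` are created, the next page after `snap_b` is `[snap_bb, snap_d]`: `snap_d` is not missed, the new `snap_bb` appears, and `snap_0` (which sorts before the cursor) is never seen. An offset-based scheme would instead have shifted entries when `snap_a` was deleted.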
@@ -119,7 +119,7 @@ private GetSnapshotsResponse createTestInstance() {
             failures.put(repository, new ElasticsearchException(randomAlphaOfLength(10)));
         }

-        return new GetSnapshotsResponse(responses, failures);
+        return new GetSnapshotsResponse(responses, failures, null);
I think we sometimes need to add a random `next` value here? Also, `assertEqualInstances` should be updated.
++
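A minimal sketch of what "sometimes add a random `next` value" could look like, assuming a plain `java.util.Random` in place of the test framework's own randomization helpers; `RandomNextValues`, `randomLetters` and `randomNext` are hypothetical names. Returning `null` part of the time keeps the "last page" shape covered alongside the "more pages" one.

```java
import java.util.Random;

final class RandomNextValues {

    // Hypothetical stand-in for a test-framework helper: a random lowercase string.
    static String randomLetters(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    // Roughly half of the generated instances get no cursor (last page);
    // the rest get a random non-empty one of 1 to 10 letters.
    static String randomNext(Random random) {
        return random.nextBoolean() ? null : randomLetters(random, 1 + random.nextInt(10));
    }
}
```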
    }

    private final List<SnapshotInfo> snapshots;

    private final Map<String, ElasticsearchException> failures;

-   public GetSnapshotsResponse(List<SnapshotInfo> snapshots, Map<String, ElasticsearchException> failures) {
+   @Nullable
+   private final String next;
I think we should update `equals` and `hashCode` to account for the new field too.
++
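A minimal sketch of folding the nullable field into `equals` and `hashCode`; `Objects.equals` and `Objects.hash` handle `null` safely. `SimplifiedGetSnapshotsResponse` is a hypothetical stand-in for `GetSnapshotsResponse` in which `SnapshotInfo` and `ElasticsearchException` are replaced by `String` so the example compiles on its own.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class SimplifiedGetSnapshotsResponse {
    private final List<String> snapshots;
    private final Map<String, String> failures;
    private final String next; // nullable: absent when there are no further pages

    SimplifiedGetSnapshotsResponse(List<String> snapshots, Map<String, String> failures, String next) {
        this.snapshots = List.copyOf(snapshots);
        this.failures = failures == null ? Map.of() : Map.copyOf(failures);
        this.next = next;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        SimplifiedGetSnapshotsResponse that = (SimplifiedGetSnapshotsResponse) o;
        // Objects.equals treats two null cursors as equal.
        return snapshots.equals(that.snapshots) && failures.equals(that.failures) && Objects.equals(next, that.next);
    }

    @Override
    public int hashCode() {
        // Objects.hash accepts null elements, so a missing cursor is fine here.
        return Objects.hash(snapshots, failures, next);
    }
}
```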
I'm still reviewing this properly but had one comment straight away regarding encoding.
[source,console]
----
GET /_snapshot/my_repository/snapshot*?size=2&sort=name&after=snapshot_2,snapshot_2
----
I worry we will have users running into encoding problems with this API, since we permit characters like `&` and `%` in snapshot names, and the effects of failing to encode this parameter properly might be pretty subtle. Could we base64-encode this parameter to avoid all that?
++ will do :)
Jenkins run elasticsearch-ci/rest-compatibility (git timeout)
All looks OK to me, but I'll let Henning have the final say. I left more suggestions regarding the encoding.
.../src/main/java/org/elasticsearch/action/admin/cluster/snapshots/get/GetSnapshotsRequest.java
final String[] parts = new String(Base64.getDecoder().decode(param), StandardCharsets.UTF_8).split(",");
if (parts.length != 3) {
    throw new IllegalArgumentException(
        "after param must be base64 encoded and of the form ${sort_value},${repository_name},${snapshot_name} but was ["
No need to spell out the format here IMO
-        "after param must be base64 encoded and of the form ${sort_value},${repository_name},${snapshot_name} but was ["
+        "invalid ?after parameter ["
++
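A minimal round-trip sketch of the encoded cursor, assuming the three-part `${sort_value},${repository_name},${snapshot_name}` layout, the standard `java.util.Base64` alphabet used in the parsing code above, and the shorter error message suggested here. `AfterCursor` is a hypothetical name, and the comma split assumes none of the three parts contains a comma.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical cursor type: ${sort_value},${repository_name},${snapshot_name}, base64-encoded.
record AfterCursor(String sortValue, String repository, String snapshot) {

    // Encoding the whole cursor means characters such as '&' or '%' in snapshot
    // names never appear verbatim in the query string.
    String encode() {
        String raw = sortValue + "," + repository + "," + snapshot;
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    static AfterCursor decode(String param) {
        final String[] parts;
        try {
            parts = new String(Base64.getDecoder().decode(param), StandardCharsets.UTF_8).split(",");
        } catch (IllegalArgumentException e) {
            // The value is not valid base64 at all.
            throw new IllegalArgumentException("invalid ?after parameter [" + param + "]", e);
        }
        if (parts.length != 3) {
            throw new IllegalArgumentException("invalid ?after parameter [" + param + "]");
        }
        return new AfterCursor(parts[0], parts[1], parts[2]);
    }
}
```

The standard Base64 alphabet can still emit `+`, `/` and `=`, so clients would still be expected to URL-encode the value; `Base64.getUrlEncoder()` would be one alternative if that ever proved to be a problem.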
...est/java/org/elasticsearch/action/admin/cluster/snapshots/get/GetSnapshotsResponseTests.java
Thanks @henning and @DaveCTurner! All points addressed now, I think :)
Sorry @henning, wrong Henning. Meant to ping @henningandersen.
LGTM.
I think `next` as a string in the response and `after` as an object in the request is slightly inconsistent, but resolving that can be done in a follow-up rather than in this PR. I think `after` should turn into a String too, making it opaque at the transport layer as well.
.../src/main/java/org/elasticsearch/action/admin/cluster/snapshots/get/GetSnapshotsRequest.java
Jenkins run elasticsearch-ci/part-1 (unrelated)
Thanks Henning + David!
Backport of the recently introduced snapshot pagination and scalability improvements listed below. Merged as a single backport because the snapshot status API logic had diverged massively between `master` and `7.x`. With the work in the PRs below, the logic in `master` and `7.x` is once again very closely aligned. #72842 #73172 #73199 #73570 #73952 #74236 #74451 (this one is only partly applicable, as it was mainly a change to `master` to align the `master` and `7.x` branches)