
datastore: pagination changes #1295

Merged: 9 commits, May 13, 2016

Conversation

@stephenplusplus commented May 6, 2016

Fixes #1293
RE: #1260
RE: #1260 (comment)

To Dos

Changes:

  • autoPaginate is gone
  • apiResponse from run and runQuery is gone
  • We only continue running a query automatically for a user if Datastore returns NOT_FINISHED
  • In place of a nextQuery object, we return an info object, which contains
    • moreResults
    • endCursor (if provided by Datastore)
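Under the change described above, the `info` object might be assembled roughly like this. This is an illustrative sketch only: `buildInfo` is a hypothetical helper, and only the `moreResults` / `endCursor` field names come from the PR description.

```javascript
// Sketch of deriving the new `info` object from a raw response batch.
// `buildInfo` is hypothetical; the field names mirror the PR description.
function buildInfo(batch) {
  var info = { moreResults: batch.moreResults };
  if (batch.endCursor) {
    // endCursor is only included when Datastore provides one
    info.endCursor = batch.endCursor;
  }
  return info;
}

buildInfo({ moreResults: 'NO_MORE_RESULTS', endCursor: 'abc=' });
// => { moreResults: 'NO_MORE_RESULTS', endCursor: 'abc=' }
```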

@stephenplusplus stephenplusplus added the api: datastore Issues related to the Datastore API. label May 6, 2016
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label May 6, 2016
@stephenplusplus stephenplusplus added don't merge and removed cla: yes This human has signed the Contributor License Agreement. labels May 6, 2016
@pcostell commented May 6, 2016

This seems like a reasonable approach to me.

Wouldn't this fit in with all the existing pagination + streaming infrastructure if we returned a next_query when more_results == NOT_FINISHED (and then let the system handle everything for us)?

In the case of other more_results values, just return the info object and no next_query.

@stephenplusplus (Author)

Wouldn't this fit in with all the existing pagination + streaming infrastructure if we returned a next_query when more_results == NOT_FINISHED (and then let the system handle everything for us)?

Yes, we can do that, but I would still remove autoPaginate as an option. We should always just run NOT_FINISHED queries automatically (right?). The only downside is that the pagination/streaming infrastructure ignores further arguments, such as apiResponse, because when there are multiple API calls, a single apiResponse is somewhat confusing. I would just do a manual override and implement the streaming behavior inside the runQuery method itself; not a huge deal.
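The auto-continue rule being discussed can be sketched as follows. This is illustrative only, not the library's implementation: `fetchPage(cursor, callback)` is a hypothetical stand-in for the underlying runQuery API call.

```javascript
// Minimal sketch: keep issuing requests only while the backend reports
// NOT_FINISHED, then hand the user everything plus a final info object.
// `fetchPage` is a hypothetical stand-in for the real request machinery.
function runToCompletion(fetchPage, done) {
  var entities = [];
  function next(cursor) {
    fetchPage(cursor, function(err, batch) {
      if (err) return done(err);
      entities = entities.concat(batch.entityResults);
      if (batch.moreResults === 'NOT_FINISHED') {
        next(batch.endCursor); // continue transparently for the user
      } else {
        done(null, entities, {
          moreResults: batch.moreResults,
          endCursor: batch.endCursor
        });
      }
    });
  }
  next(null);
}
```

Any other `moreResults` value stops the loop and is surfaced to the user through `info`, matching the behavior described in this thread.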

In the case of other more_results values, just return the info object and no next_query.

I think I'm still seeing the value in returning info in place of nextQuery. An example of where we're introducing complexity is one of our tests, which went from:

var q = datastore.createQuery('Character')
  .hasAncestor(ancestor)
  .limit(5);

datastore.runQuery(q, function(err, firstEntities, nextQuery) {
  // ... get the next results ...
  datastore.runQuery(nextQuery, // ...

to:

var q = datastore.createQuery('Character')
  .hasAncestor(ancestor)
  .limit(5);

datastore.runQuery(q, function(err, firstEntities, info) {
  // ... get the next results ...
  q.start(info.endCursor).limit(q.limitVal - firstEntities.length);
  datastore.runQuery(q, ...

If the user only needs to know the start cursor of the next query to return it to the front end:

var q = datastore.createQuery('Character')
  .hasAncestor(ancestor)
  .limit(5);

datastore.runQuery(q, function(err, firstEntities, nextQuery) {
  hypotheticalHttpResponse.send({ startCursor: nextQuery.startVal });

There's always the option of getting it from the raw response as well:

datastore.runQuery(q, function(err, firstEntities, nextQuery, apiResponse) {
  hypotheticalHttpResponse.send({ startCursor: apiResponse.batch.endCursor });

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label May 11, 2016
@pcostell

I'm confused about that test change:


var q = datastore.createQuery('Character')
  .hasAncestor(ancestor)
  .limit(5);

datastore.runQuery(q, function(err, firstEntities, info) {
  // ... get the next results ...
  q.start(info.endCursor).limit(q.limitVal - firstEntities.length);
  datastore.runQuery(q, ...

If we always continue the query when the backend returns NOT_FINISHED, the callback should never get a query where q.limitVal > firstEntities.length.

@stephenplusplus (Author)

Doesn't it get "MORE_RESULTS_AFTER_LIMIT"?

@stephenplusplus (Author)

Sorry if I misunderstood. The test checks that we only get 5 results first; then, after the second query, we get the remaining results. I was just duplicating the logic that we do (currently) when preparing the nextQuery:

nextQuery.limit(limit - resp.batch.entityResults.length);

@pcostell

Yes, in that test info.more_results == MORE_RESULTS_AFTER_LIMIT. But that means the limit should equal the number of entities returned, so q.limitVal - firstEntities.length would always be <= 0.

@stephenplusplus (Author)

So it's basically just a weird way of saying "remove the limit", then, right? The previous test didn't do that because runQuery did it automatically. But without nextQuery being built for the user, the user can either make an entirely new query object or remove the limit themselves from their previous query, which is what this test is doing.

@pcostell

I don't think so in this case. If you want to use this for paging, you would always set the same limit ("I want pages of size 50").

The logic for updating the limit and offset should only happen inside the clients and only when more_results == NOT_FINISHED.

Clearing the limit after the first query is more like saying "I want the first 50 results, then after that I want all the results." I don't think the client should make this easy; if a user wants all the results, they should just not set a limit and ask for all of them.

@stephenplusplus (Author)

My mistake. It's just a fluke of this test case that it works. Ignoring the part where we always run nextQuery automatically, this is the logic I think we should go back to with regard to how nextQuery is built: https://github.com/GoogleCloudPlatform/gcloud-node/blob/339cda2b14bb116c619a92788ebabdcd1c837e77/lib/datastore/request.js#L510 Is there anything wrong with how we're setting the offset and limit in this case?
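For context, the linked nextQuery-building logic works roughly like this. This is a paraphrased sketch, not the actual request.js source; the `offsetVal`, `limitVal`, and `skippedResults` names are assumptions modeled on what that code reads.

```javascript
// Rough sketch of deriving a follow-up query's offset and limit from the
// previous batch, loosely modeled on the linked request.js logic.
// All field names here are illustrative assumptions.
function nextQueryParams(query, batch) {
  var params = { startCursor: batch.endCursor };
  if (typeof query.offsetVal === 'number' && query.offsetVal > 0) {
    // The original offset is partially (or fully) consumed by the
    // entries the backend already skipped.
    params.offsetVal = Math.max(0, query.offsetVal - batch.skippedResults);
  }
  if (typeof query.limitVal === 'number' && query.limitVal > -1) {
    // Ask only for the remainder of the original limit.
    params.limitVal = query.limitVal - batch.entityResults.length;
  }
  return params;
}
```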

@pcostell

That logic makes sense. I think, though, that it gets confusing that your first query handles an offset, but then we generate more queries that don't. I think we shouldn't try to handle so much in the client and should just expose the cursors; the user can make their own decision about how to treat the first page and pages 2-n differently.

@stephenplusplus (Author)

I think though that it gets confusing that basically your first query handles an offset, but then we generate more queries that don't.

Can you elaborate on that?

@pcostell

It makes sense in the case of NOT_FINISHED: basically the user says "I want 50 entities, starting with the 100th." This is exactly what we should give them. Then we should just give them a cursor and say "here's where to start if you want to continue." Providing a nextQuery assumes that they want another 50 entities starting at the 150th. Giving them a cursor makes the user be explicit about what they are doing.

@stephenplusplus (Author)

Thanks. So at this point, you prefer this:

var q = datastore.createQuery('Character')
  .hasAncestor(ancestor)
  .limit(5);

datastore.runQuery(q, function(err, firstEntities, info) {
  // ... get the next results ...
  var nextQuery = datastore.createQuery('Character')
    .hasAncestor(ancestor)
    .start(info.endCursor);

  datastore.runQuery(nextQuery, ...

I'm also a fan of forcing users to be explicit, but I do like providing sane defaults and making things as easy as possible. Isn't it a relatively safe assumption that, if a user asked for 50, they expect the "nextQuery" to be prepared to get the next 50? The user will always have the choice of creating a new query or simply overriding the limit if it's wrong.

@pcostell

Possibly, but we are also assuming that they don't want an offset either. Since we expect them not to actually call this in the same context (given the HTTP response to the user and back), they'll likely need to recreate the query anyway.

I would consider making it easy to issue the same query multiple times in the same request a non-goal of the client. It would encourage users to try managing batching themselves; instead, we should let the server handle its own resource management. It will also make it easier to switch to a streaming setup in the future without changing how the customer interacts with the client. In particular, if we have a streaming RPC, it would be less efficient to fetch entities in multiple batches, since we would have to create and tear down those connections rather than streaming as many results as we can.

@stephenplusplus (Author)

Alright, I'll go forward with the info object and manual stream override of runQuery. Will ping again for another review soon!

@stephenplusplus (Author)

@pcostell ptal!


@coveralls commented May 12, 2016

Coverage decreased (-0.02%) to 99.978% when pulling 7528e09 on stephenplusplus:spp--1260 into f5ca22f on GoogleCloudPlatform:master.

@coveralls commented May 12, 2016

Coverage remained the same at 100.0% when pulling d920615 on stephenplusplus:spp--1260 into f5ca22f on GoogleCloudPlatform:master.

@stephenplusplus (Author)

@pcostell feel free to take another look. I believe I've covered everything we're going for.

@pcostell

LGTM

@stephenplusplus (Author)

Great! I'm glad we are finally doing things the Datastore-approved way. Thanks for being patient with my questions!

@pcostell

Haha, thank you for being patient with my answers :-). Hopefully it makes users' lives easier. Let me know what the feedback is and we can evaluate whether the server API should change at all to accommodate it.


@stephenplusplus (Author)

@callmehiphop updated with the constant.

@coveralls commented May 13, 2016

Coverage remained the same at 100.0% when pulling d190a72 on stephenplusplus:spp--1260 into 41f5233 on GoogleCloudPlatform:master.

@@ -374,6 +398,8 @@ Datastore.int = function(value) {
return new entity.Int(value);
};

Datastore.NO_MORE_RESULTS = 'NO_MORE_RESULTS';


@stephenplusplus (Author)

Added the other constants!
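With constants in place, callers can compare against them instead of raw strings. The following is a usage sketch: only NO_MORE_RESULTS appears in the diff above, and MORE_RESULTS_AFTER_LIMIT and NOT_FINISHED are taken from earlier in the thread; the local `Datastore` object here is a stand-in for the real class.

```javascript
// Hypothetical usage of the moreResults constants discussed in this PR.
// The `Datastore` object is a local stand-in so the sketch is self-contained.
var Datastore = {
  NOT_FINISHED: 'NOT_FINISHED',
  MORE_RESULTS_AFTER_LIMIT: 'MORE_RESULTS_AFTER_LIMIT',
  NO_MORE_RESULTS: 'NO_MORE_RESULTS'
};

// A caller checks whether a follow-up query could return more entities.
function hasMoreResults(info) {
  return info.moreResults !== Datastore.NO_MORE_RESULTS;
}

hasMoreResults({ moreResults: 'MORE_RESULTS_AFTER_LIMIT' }); // => true
```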

@coveralls commented May 13, 2016

Coverage remained the same at 100.0% when pulling 2239bbf on stephenplusplus:spp--1260 into 41f5233 on GoogleCloudPlatform:master.

* @param {object} callback.info - An object useful for pagination.
* @param {?string} callback.info.endCursor - Use this in a follow-up query to
* begin from where these results ended.
* @param {string} callback.info.moreResults - Datastore responds with one of:


@coveralls commented May 13, 2016

Coverage remained the same at 100.0% when pulling a7e175d on stephenplusplus:spp--1260 into 41f5233 on GoogleCloudPlatform:master.

@coveralls commented May 13, 2016

Coverage remained the same at 100.0% when pulling bdcf552 on stephenplusplus:spp--1260 into 41f5233 on GoogleCloudPlatform:master.

@@ -111,10 +111,17 @@ describe('documentation', function() {
global: global
};

function FakeExpress() {
return {
get: function() {}


@coveralls commented May 13, 2016

Coverage remained the same at 100.0% when pulling 2b24a88 on stephenplusplus:spp--1260 into 41f5233 on GoogleCloudPlatform:master.
