Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nextQuery always returns a value #1260

Closed
susanlinsfu opened this issue Apr 24, 2016 · 32 comments
Closed

nextQuery always returns a value #1260

susanlinsfu opened this issue Apr 24, 2016 · 32 comments
Assignees
Labels
api: datastore Issues related to the Datastore API.

Comments

@susanlinsfu
Copy link

susanlinsfu commented Apr 24, 2016

I have 42 items which should be returned:

/*
query originally looks like this:
datastore.createQuery('Messages').filter('sender',username).limit(10).autoPaginate(false);
*/

  var query = nextQuery_var_passed_in_from_client;
  getmessages(query,function(err,data){
        if(err){
          console.log('error: ' + err);
        }
    });

});

//callback function
function getrecmessages(query,callback){
    datastore.runQuery(query, function(err, entities, nextQuery, apiResponse){
        if (err) {
            return(callback(err));
        }

        if (nextQuery) {
          console.log('nextQuery = ' + JSON.stringify(nextQuery) );
            if(entities){
                console.log(entities);
            }else{
               console.log('no entities');
         }
        } else {
            // No more results exist.
            console.log('NextQuery IS NULL' );
        }
    });
}

The query first gets the first 10 items, then the next 10, then the next 10, then the remaining 2(42 items). The problem then is when I make the last query to get the last 2 items it returns a nextQuery. As a result it looks as if there is more items even though there are none. I have gone over my code and I can't see anything that is causing this so I am thinking it has something to do with gcloud. I am expecting that when the last query is made no nextQuery should be returned by the datastore. Is this correct?

I am using gcd-rpc emulator and gcloud-node 0.31.0.

@stephenplusplus
Copy link
Contributor

I believe there are issues with the emulator. @pcostell Do you think is related to the issue we found before?

@stephenplusplus stephenplusplus added the api: datastore Issues related to the Datastore API. label Apr 25, 2016
@pcostell
Copy link
Contributor

Yes I believe this is the same issue as before. However, even if there are no more results Cloud Datastore will still return an end cursor which may be useful. So I'm re-echoing my concern that nextQuery isn't the right API for when more_results is MORE_RESULTS_AFTER_LIMIT or MORE_RESULTS_AFTER_CURSOR. Importantly, unlike other APIs the Cloud Datastore end_cursor is not a traditional next_page_token.

@stephenplusplus
Copy link
Contributor

What would be the ideal way of handling pagination? Sorry if I missed your explanation before, feel free to link me.

@stephenplusplus
Copy link
Contributor

// just dropping a link to #1237 as that was the issue we discovered the API/emulator mismatch

@pcostell
Copy link
Contributor

Coming back to this:

I think we need the following:

On more_results = NOT_FINISHED, set next_query. Properly track offset/limit so that this next_query has these set.
On any other more_results, do not set next_query.
This means we can defer to the normal paging tools to handle paging for us. It would also mean that when more_results == MORE_RESULTS_AFTER_LIMIT, next_query would not be set. This means we'll need to expose the end_cursor and more_results so that the user can make an informed decision when choosing to display information to the user.

All of this will apply regardless of the auto_paginate setting. Users that set auto_paginate=False will just have to deal with the case where more_results=NOT_FINISHED, however we'll properly setup the next_query for them.

@stephenplusplus
Copy link
Contributor

I think I'm stuck on why we wouldn't want to provide nextQuery. I think we're doing it just to save the user the time and hassle of building their own query. They of course don't have to run it. And if it's poor form for us to run it for them (i.e, how we're treating it like a nextPageToken), then we can just always return nextQuery, but never run it for the user.

// @jgeewax

@pcostell
Copy link
Contributor

In the case of more_result=NOT_FINISHED, then we want to run nextQuery for them (exactly like a nextPageToken) -- really in this case we don't ever want to not run it for them, but in theory they can thanks to autoPaginate. However, in most other cases we don't expect the user to run it again with the context of that code (for example, we'd expect the user to serialize the cursor then use it later). In this case, having the query isn't actually helpful, they just need to know how the current position in the query. Cursor functionality expends beyond paging. They can be used for splitting work over queries and defining ranges.

@stephenplusplus
Copy link
Contributor

In the case of more_result=NOT_FINISHED, then we want to run nextQuery for them (exactly like a nextPageToken) -- really in this case we don't ever want to not run it for them, but in theory they can thanks to autoPaginate

Woops, forgot about that case. We should definitely run it for them 👍

for example, we'd expect the user to serialize the cursor then use it later

Makes a lot of sense. We could still return a nextQuery, but make it more straightforward on getting the endCursor from it. It's currently the un-documented nextQuery.endVal, but we could expose this with nextQuery.getEndCursor() or just nextQuery.endCursor.

I may be wrong here, but I don't think there is harm in leaving the behavior as it is now. I would bet that the "handle everything for me" approach is desirable to a fair amount of users. However, we definitely need to document nextQuery better, explain what a non-autoPagination cursoring flow looks like, and add the above mentioned accessors.

@susanlinsfu
Copy link
Author

I'm sorry I am not quite following. So with the code I provided would there be a way to modify it to know when there are no more results?

Cheers

@pcostell
Copy link
Contributor

The way I see it is that our desired behavior is a streaming API. If a user asks for a limit of N, we will give them a stream of up to N entities. We can then add some helpers on top of that to convert the stream to a list.

Cloud Datastore then offers a paging feature, which provides the user a cursor they can use to get extra pages. On top of that, Cloud Datastore has the feature of best effort end condition tracking (more_results). This is very useful for users who are showing pages to their users as they can get information about whether or not they should show a "Next Page" button (this is MORE_RESULTS_AFTER_LIMIT). This feature is conservative (MORE_RESULTS_AFTER_LIMIT may be returned even if the correct result is NO_MORE_RESULTS).

Importantly, Cloud Datastore's paging feature is completely separate from the batching API, which is really just an implementation detail because prior to gRPC we couldn't offer a streaming API.

@pcostell
Copy link
Contributor

@susanlinsfu Sorry I took over this thread a bit to discuss proper Cloud Datastore usage. Right now, the next_query is still returned even though there are no more results. I'll try to make sure to explicitly send an update when the emulator has better more_results support.

@stephenplusplus
Copy link
Contributor

I think it would be good to hear from JJ. I'll bring it up in the next meeting (Monday) if we don't hear back sooner.

@jgeewax
Copy link
Contributor

jgeewax commented Apr 27, 2016

I'd like to pop up a level here and look at the situations that users are actually interested in:

  1. I want to loop over all results
  2. I want to loop over all results and break out early if I match a condition
  3. I want to loop over results, but make <= N API calls
  4. I want to get <= N results (with any number of API calls)
  5. I want to build a UI that shows pages and picks up where I left off

Looking at these cases, I see the following:

  1. For all results, we have the streaming API which is probably the best way to handle this use case.
  2. For breaking early, just do the above and call this.end() as documented.
  3. I don't have an easy way to limit the number of API calls...
  4. I can tell Datastore to limit things, and then use the streaming API to iterate through the limit
  5. I should manually track tokens for pages/cursors, but I have quite a bit of code to write

IMO, nextQuery is nice, but it's an implementation detail. I really think that having to reference it at all means we've kind of built the wrong thing.


Here's what I think we should make things look like:

Give me all the results:

datastore.runQuery(query).on('data', function(entity) {
  // do something with entity
});

Give me results until I change my mind:

datastore.runQuery(query).on('data', function(entity) {
  // do something with entity
  if (iFeelLikeIt) this.end();
});

Limit the number of API calls (because that is the same as $$$):

datastore.runQuery(query, {apiCallLimit: 10}).on('data', function(entity) {
  // do something with entity
});

Limit the number of entities (because I only want a max of 10 to show in my list):

query = query.limit(10);
datastore.runQuery(query).on('data', function(entity) {
  // do something with entity
});

Page through things as though I'm showing a UI

query = query.limit(10).startCursor(request.get('cursor'));
var pageEntities = [];
datastore.runQuery(query).on('data', function(entity) {
  pageEntities.push(entity);
});
render_template('template.html', {entities: entities, cursor: query.nextCursor});

Obviously if we do promises, this gets much happier:

query = query.limit(10).startCursor(request.get('cursor'));
datastore.runQuery(query).then(function(entities) {
  render_template('template.html', {entities: entities, cursor: query.nextCursor});
});

@pcostell
Copy link
Contributor

pcostell commented Apr 27, 2016

I'd agree here, except for

I want to loop over results, but make <= N API calls

How many API calls get made is really an implementation detail, and may depend on many other things (like timeouts and constraints on the server). I think this is reflected in Cloud Datastore's new pricing model, which charges per entity returned and doesn't factor in the number of API calls made (starting July 1).

@stephenplusplus
Copy link
Contributor

I think we already do everything you guys agree on :)

@callmehiphop
Copy link
Contributor

So the takeaway here is that we need to tweak the logic for providing a nextQuery object?

@stephenplusplus
Copy link
Contributor

stephenplusplus commented May 2, 2016

I don't have an easy way to limit the number of API calls...

We can add in a maxApiCalls option on each method. We would want to find a way to implement this across our API.

I should manually track tokens for pages/cursors, but I have quite a bit of code to write

Right now, our nextQuery exposes endVal. We can change this (and document it) to endCursor to make it more obvious.

Are these the to-do's from this issue?

@jgeewax
Copy link
Contributor

jgeewax commented May 3, 2016

Re maxApiCalls: Although @pcostell is right that it shouldn't matter for Datastore, it does matter for other APIs. I think it'd be a good idea to allow someone to limit the number of API calls in any method call.

The take-away for me is:

  1. Limiting API calls
    Let's figure out how to do maxApiCalls across the entire library so that people know how to limit things. (In many APIs, the number of calls is directly related to money.)

  2. Exposing a cursor/page rather than nextQuery
    I don't think we need to expose or document nextQuery itself, rather we just need to expose a specific cursor and automagically end when we're done. This might lead to situations where you get an empty page, but unfortunately I don't think there's anything we can do about that as it's a limitation of Datastore itself and not our client.

    Additionally, I'd be curious if anyone was with me in thinking that maybe we should expose a paginate() method on the query that acts like runQuery but does what users expect, and calls things "nextPage" and whatnot rather than a query cursor. This would make code look like...

    var query = datastore.query('Person').filter('name', '=', 'Patrick');
    var itemsPerPage = 10;
    var nextPageToken = request.get('nextPage');
    
    query.paginate(itemsPerPage, nextPageToken, function(pageOfEntities, nextPage) {
      // nextPage just gets sent back in the following request.
      render_template('template.html', {entities: pageOfEntities, nextPage: nextPage});
    });
  3. Update examples
    Update the order of our examples to show the .on('data') way first -- it's far more common that people getting started want to limit to N entities and get them all rather than page through. We should explain the advanced case, but not start with it.

@stephenplusplus
Copy link
Contributor

stephenplusplus commented May 3, 2016

Issue made for point 1 (limiting API calls): #1281


On point 2, I don't think we have to special-case the pagination design throughout our API just for Datastore. Here's what your ideal example looks like:

var query = datastore.query('Person').filter('name', '=', 'Patrick');
var itemsPerPage = 10;
var nextPageToken = request.get('nextPage');

query.paginate(itemsPerPage, nextPageToken, function(pageOfEntities, nextPage) {
  // nextPage just gets sent back in the following request.
  render_template('template.html', {entities: pageOfEntities, nextPage: nextPage});
});

... and what it looks like today:

var query = datastore.query('Person')
  .filter('name', 'Patrick')
  .limit(10)
  .start(request.get('nextPage'));

datastore.runQuery(query, function(err, pageOfEnities, nextQuery) {
  // nextPage just gets sent back in the following request.
  render_template('template.html', {entities: pageOfEntities, nextPage: nextQuery.startVal});
});

I really think the only thing to mess with is nextQuery.startVal. It seems far easier for the user to give them a full Query object that is ready to run (especially after #1280 which will add "nextQuery.run()"), than just a cursor. We should lose the "nextQuery.startVal" and just make it "nextQuery.startCursor" (maybe?).


Point 3, regarding examples:

We should explain the advanced case, but not start with it.

The first example from https://googlecloudplatform.github.io/gcloud-node/#/docs/v0.31.0/datastore?method=runQuery is:

var query = datastore.createQuery('Lion');

datastore.runQuery(query, function(err, entities) {});

That uses autoPagination:true by default, so it will collect all of the results. Examples after that go on to show the manual pagination and stream API. We should update these examples to show how to manually paginate from just a start cursor (basically the example from point 2). During that edit, we can also elaborate on how the first example will pull down all of the results available.

@pcostell
Copy link
Contributor

pcostell commented May 3, 2016

When you say:

we have to special-case the pagination design throughout our API just for Cloud Datastore

Do you mean pagination, or batching? Based on how the whole nextQuery structure works, I would consider that batching from the Cloud Datastore API perspective.

I think trying to expose this using nextQuery causes some of the confusion that we arrived at in the first place. If we use nextQuery for batching, we probably shouldn't also use it for pagination.

@stephenplusplus
Copy link
Contributor

Sorry if I'm about to state the obvious, but I'm not sure how to answer that, regarding the terminology.

nextQuery is just a prepared "get the next set of results from the API" object. All methods that return a nextQuery accept that object. Originally, we never ran any subsequent queries for the user; they always had to manually run it if they wanted more. @jgeewax thought that was not ideal, so to support the common case of "getting all the results", we kept the nextQuery perpetual fetching system* for those that need manual control of requests, but put it behind an autoPaginate: false flag. That way, the common case doesn't have to even know perpetual fetching is happening... when they say "datastore.runQuery(query)", they get all of the results that match their query.

* being careful not to call it "pagination" or "batching"

@stephenplusplus
Copy link
Contributor

Opened up #1293 for the documentation improvements and changing query.startVal to query.startCursor.

We also have #1281 for adding an option to limit the amount of API requests a method can make.

I'm going to close this, as it's unclear to me what we should be doing to nextQuery. The only discussion I'm following would either remove features (returning only a cursoring token instead of a Query object where you can get that from) or go against the common use case that @jgeewax wants us to support firstly, which is pull down as many results as we can. If I'm misunderstanding, just re-open and ELI5.

@susanlinsfu
Copy link
Author

Sorry, but I am a little confused now. I am really interested in just knowing how would I know there are no more results to fetch in the query? Can I currently do this in any way?

@stephenplusplus
Copy link
Contributor

@susanlinsfu Sorry I took over this thread a bit to discuss proper Cloud Datastore usage. Right now, the next_query is still returned even though there are no more results. I'll try to make sure to explicitly send an update when the emulator has better more_results support.

Using the emulator in its current state, you'll always receive a nextQuery object. I'm not sure if there's a reliable way to tell if all of the results are in until the emulator is updated. You'll probably have to use the presence of a query that returned 0 entities to make the assumption there are no more results coming.

Just for clarity, this only affects gcloud-node >= 0.30.0, which uses gRPC. You can always downgrade to an older version and run the stable Datastore emulator that is included with the gcloud SDK.

@pcostell
Copy link
Contributor

pcostell commented May 6, 2016

I wouldn't recommend downgrading to an older version because you'll also switch to using Cloud Datastore API v1beta2, which has much worse latency characteristics than v1beta3.

@stephenplusplus -- This should work now if @susanlinsfu just provides autoPaginate(false), right? The query should just return all 42 results. Until the emulator returns the proper more_results flag you'll still get a next query, but otherwise it will work, right?

Just a note that downgrading shouldn't work. As I understand our setup, the emulator still returned MORE_RESULTS_AFTER_LIMIT even if the limit hadn't been reached in v1beta2.

@stephenplusplus
Copy link
Contributor

@pcostell the bare bones question is:

how would I know there are no more results to fetch in the query?

If the emulator only responds with "MORE_RESULTS_AFTER_LIMIT" or "NOT_FINISHED", but never "NO_MORE_RESULTS", our API always return a nextQuery. So while autoPaginate(false) has to be used*, a user will still have to keep manually running nextQuery to see if more results come back. Compared to when running against the cloud API, we won't return a nextQuery when we get NO_MORE_RESULTS, which is how the user knows there are no more results to fetch.

*if autoPaginate(false) is not used, you'll end up in an infinite loop because we auto-run nextQuery.

Thanks for the clarification around the old emulator.

@pcostell
Copy link
Contributor

pcostell commented May 6, 2016

Ok so here's the issue from my side:

You should only auto fetch the next batch if more_results is NOT_FINISHED.

A query with MORE_RESULTS_AFTER_LIMIT should have the next batch fetched ONLY by developers.

Then, you can get all 42 entities by just issuing the query:

datastore.createQuery('Messages').filter('sender',username);

@stephenplusplus
Copy link
Contributor

Could you please explain when each of the responses is returned? (MORE_RESULTS_AFTER_LIMIT, NOT_FINISHED, NO_MORE_RESULTS)

@pcostell
Copy link
Contributor

pcostell commented May 6, 2016

NOT_FINISHED: returned when the query is not complete.

If the user set a limit, we haven't satisfied the limit. If the user did not, there are likely more results. However, this is best effort: it is still possible that NOT_FINISHED is returned and the follow up query returns no results but a new more_results with updated information. This will happen if Cloud Datastore decides to end the request based on resource constraints and cannot determine if there are more results after the last result.

MORE_RESULTS_AFTER_LIMIT: there may be more results after the user specified limit.

It's possible for this to be returned even though there are not actually results after the limit because we weren't able to determine if there are more results or not.

In v1beta2 and the emulator version of v1beta3, this is always the result returned for completion.

MORE_RESULTS_AFTER_CURSOR: there may be more results after the user specified end cursor.

NO_MORE_RESULTS: there are no more results.

@stephenplusplus
Copy link
Contributor

stephenplusplus commented May 6, 2016

Thank you! From the gcloud-node side:

If a user runs a query with a limit, we return that many results, as well as a nextQuery for them to get more:

runQuery(queryWithLimit, function (err, results, nextQuery) {
  results.length <= Limit
  if (nextQuery) {
    nextQuery.startVal === originalQueryResponse.endCursor
  }
})

If a user runs a query without a limit, we fetch until we get NO_MORE_RESULTS.

runQuery(queryWithoutLimit, function(err, results) {
  // all results that match the query (no `nextQuery`)
})

To handle pagination manually:

runQuery(queryWithoutLimitAndAutoPaginationDisabled, function(err, results, nextQuery) {
  results.length === However many Datastore returned with until `MORE_RESULTS_AFTER_LIMIT`
})

I think this is the case you want us to change-- you're saying we should let the user do all of the pagination manually, as in the last example?

Thanks for your patience!

@pcostell
Copy link
Contributor

pcostell commented May 6, 2016

I'm not sure I'm asking the user to handle pagination manually, but let me restate to hopefully answer the question:

  1. gcloud-node should iterate as long as more_results == NOT_FINISHED. <== note that the way you've built auto_paginate, I think if auto_paginate=False you should stop iterating? Which is why I think next_query should only return IF more_results == NOT_FINISHED and there should be a separate mechanism for the other cases.

  2. When gcloud-node returns to the user, more_results information should be returned to them. They can then decide if they want to display the "next page" icon to their user. In the case of the emulator or v1beta2, they'll just always have a "Next Page" icon (they can avoid this by looking at limits if it's really important).

@stephenplusplus
Copy link
Contributor

I'm going to work on a PR that does that and see if I start to understand what we're going for in the process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: datastore Issues related to the Datastore API.
Projects
None yet
Development

No branches or pull requests

5 participants