
Change Datastore QueryResult#next? to use NOT_FINISHED #803

Merged
1 commit merged on Aug 1, 2016

Conversation

blowmage
Contributor

Change what check is made to paginate. Requested by the Datastore team.

[closes #793]

@blowmage blowmage added the api: datastore Issues related to the Datastore API. label Jul 25, 2016
@blowmage blowmage added this to the v0.12 milestone Jul 25, 2016
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jul 25, 2016
@blowmage
Contributor Author

@pcostell Can you verify this is correct? We'd like to get this resolved and released ASAP. Thanks!

@blowmage blowmage changed the title from "Change Datastore#next? to use NOT_FINISHED" to "Change Datastore QueryResult#next? to use NOT_FINISHED" Jul 25, 2016
@@ -133,7 +133,7 @@ def initialize arr = []
      # end
      #
      def next?
-       !no_more?
+       not_finished?
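The semantics of the change can be sketched with a stubbed result class (FakeQueryResult is invented here for illustration only): next? now keys off NOT_FINISHED rather than "anything but NO_MORE_RESULTS", so a result whose more_results is MORE_RESULTS_AFTER_LIMIT no longer reports a next page.

```ruby
# Stand-in for the library's query results object, for illustration only.
class FakeQueryResult
  def initialize more_results
    @more_results = more_results
  end

  def not_finished?
    @more_results == :NOT_FINISHED
  end

  def no_more?
    @more_results == :NO_MORE_RESULTS
  end

  def next?
    not_finished? # was: !no_more?
  end
end

FakeQueryResult.new(:NOT_FINISHED).next?             #=> true
FakeQueryResult.new(:MORE_RESULTS_AFTER_LIMIT).next? #=> false (was true before this PR)
FakeQueryResult.new(:NO_MORE_RESULTS).next?          #=> false
```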


@blowmage
Contributor Author

This PR is an attempt to un-stick users of the emulator and to be more in line with how the Datastore team wants the client to be used. There is a larger issue of not exposing NOT_FINISHED and always streaming results, but that is outside of this PR and will require a larger set of changes.

@pcostell
Contributor

I guess I'm confused about how next? is used. It seems like maybe it is used for both paging and batching. Right now it is implemented like it should be for paging, but that breaks batching. But the PR changes it to be implemented like it should for batching, but that will break paging.

@blowmage
Contributor Author

The other cloud services don't differentiate between paging and batching. We want consistency with the other services, which can use #next?, #next, and all. But perhaps we should discuss the larger question of how much of the Datastore API needs to be exposed. I'm surprised that NOT_FINISHED should not be exposed to users, for example.

@pcostell
Contributor

Consider the case where a user specifies limit(100). If we were implementing a streaming API, all the multi-RPC handling would happen inside of gRPC. The end result would be the user getting a stream of exactly 100 results (or fewer, if there aren't that many). In this world, there is no such thing as NOT_FINISHED; if the stream wasn't finished, it would have continued to stream data. When the stream completes, more_results will have only 3 possible values: MORE_RESULTS_AFTER_LIMIT, MORE_RESULTS_AFTER_END_CURSOR, or NO_MORE_RESULTS.

In a streaming API world, the API doesn't need any batching primitives (like more_results = NOT_FINISHED). However, it would still expose paging primitives which are user facing and must work across streams (because they'll generally be associated with exposing something to the end user, like a "Next page" button, or fancy ajax infinite scrolling).

Since the desired state is a streaming API, we should hide all batching details inside the client library, since it's really just an implementation detail until streaming is supported.
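A minimal plain-Ruby sketch of that idea, with a stubbed RPC (Batch and fake_run_query are invented for illustration): the client loops internally while more_results is NOT_FINISHED, so callers only ever see a terminal value.

```ruby
Batch = Struct.new(:entities, :more_results, :end_cursor)

# Pretend RPC: seven results, returned in batches of three.
def fake_run_query cursor
  data = (1..7).to_a
  start = cursor || 0
  chunk = data[start, 3] || []
  finished = start + chunk.size >= data.size
  Batch.new chunk, (finished ? :NO_MORE_RESULTS : :NOT_FINISHED), start + chunk.size
end

# The client keeps fetching while NOT_FINISHED; users never see that state.
def run_all
  entities = []
  cursor = nil
  loop do
    batch = fake_run_query cursor
    entities.concat batch.entities
    cursor = batch.end_cursor
    return [entities, batch.more_results] unless batch.more_results == :NOT_FINISHED
  end
end

entities, more_results = run_all
entities     #=> [1, 2, 3, 4, 5, 6, 7]
more_results #=> :NO_MORE_RESULTS (a terminal value; NOT_FINISHED never escapes)
```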

Here's the same discussion on gcloud-node.

@blowmage
Contributor Author

Thanks for the heads up. We will absorb all this and come up with a new recommendation.

@blowmage
Contributor Author

After reading the discussion on gcloud-node I think I have a better appreciation for the terminology being used. To date gcloud-ruby has not differentiated between "batching" and "paging", but what we are intending to offer in Datastore is batching. Unfortunately we have been referring to this as "pagination" because that is what we are calling it in non-Datastore services.

We have no plans currently to provide pagination by incrementing offset. We use the cursor of the previously last returned result to batch fill the results. There may be instances where this approach does not play nicely with limit or offset.

Our original approach was to model the batched list of results in an object, but we have recently implemented Enumerators on the result list's #all method. With more services moving to grpc and streaming, it may make sense to drop the batched list object completely and only expose Enumerators.

# Currently
datastore.run query #=> Datastore::QueryResult

# Future
datastore.run query #=> Enumerator

This would allow us to stream results as they are needed. And users could use Enumerable to collect the results as they want.

# get the first 100 results, regardless of how many batch requests are made
datastore.run(query).take(100)

# use a lazy enumerator to get the first 25 titles
datastore.run(query).lazy.map { |entry| entry["title"] }.take(25)
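One way such an Enumerator might be built, sketched in plain Ruby with a stubbed batch fetch (fetch_batch is hypothetical; the real implementation would issue RPCs):

```ruby
# Pretend RPC wrapper: returns [entities, end_cursor, more_results].
def fetch_batch cursor
  data = ("a".."g").to_a
  start = cursor || 0
  chunk = data[start, 3] || []
  more = start + chunk.size >= data.size ? :NO_MORE_RESULTS : :NOT_FINISHED
  [chunk, start + chunk.size, more]
end

# An Enumerator that pulls batches lazily, only as results are consumed.
def run_as_enum
  Enumerator.new do |yielder|
    cursor = nil
    loop do
      batch, cursor, more = fetch_batch cursor
      batch.each { |entity| yielder << entity }
      break unless more == :NOT_FINISHED
    end
  end
end

run_as_enum.take(4)                      #=> ["a", "b", "c", "d"] — only two batch "requests" made
run_as_enum.lazy.map(&:upcase).first(2)  #=> ["A", "B"]
```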

@bmclean
Contributor

bmclean commented Jul 27, 2016

@blowmage If changed to return an Enumerator instead of Datastore::QueryResult we would still receive the query’s cursor and more_results value though, right?

@quartzmo
Member

Based on @pcostell's comments in the gcloud-node thread on this issue, I think any solution must continue to expose cursor and more_results:

This means we'll need to expose the end_cursor and more_results so that the user can...

@blowmage
Contributor Author

Also, starting in v1beta3 there is a per-result cursor that we will also need to continue to support.

@quartzmo
Member

We currently have a looping method, all_with_cursor, that exposes the cursor for each result:

query = datastore.query "Task"
tasks = datastore.run query
tasks.all_with_cursor do |task, cursor|
  puts "Task #{task.key.id} (#{cursor})"
end

Can we add more_results to the block? (It's not pretty to go beyond two block parameters, but...)

query = datastore.query "Task"
tasks = datastore.run query
tasks.all_with_cursor do |task, cursor, more_results|
  puts "Task #{task.key.id} (#{cursor})"
  if more_results == "NO_MORE_RESULTS"
    # disable some UI element...
  end
end

@blowmage
Contributor Author

blowmage commented Jul 27, 2016

The #all_with_cursor Enumerator batches requests as needed, so it probably doesn't make much sense to pass more_results as an argument to the yielded block, since the value will change as more and more batches are made. I don't see the benefit. The intention is to be analogous to Enumerator#each_with_index.

One option would be to place the value on the Enumerator object.

require "gcloud"

gcloud = Gcloud.new
datastore = gcloud.datastore

query = datastore.query("Task")
results = datastore.run query

results #=> #<Enumerator: [...]:run more_results="NOT_FINISHED">
results.take(100).each do |entry|
  puts entry["name"]
end
results #=> #<Enumerator: [...]:run more_results="NO_MORE_RESULTS">

@quartzmo
Member

@blowmage Can you tweak your example to show both cursor and more_results being accessible? (I agree that more_results does not need to be available per-result; its value should be NOT_FINISHED until iteration completes.)

@blowmage
Contributor Author

I can't say for sure how it would work, I'm just throwing out some ideas. It is possible that the Enumerator resets each time it is used, and it isn't possible for the more_results value shown below to be mutable on the Enumerator object. But if possible, we could squash the QueryResults behavior onto the Enumerator object:

require "gcloud"

gcloud = Gcloud.new
datastore = gcloud.datastore

query = datastore.query("Task")
results = datastore.run query

results #=> #<Enumerator: [...]:run more_results="NOT_FINISHED">

# 10 results
results.take(10).each do |entry|
  puts entry["name"]
end

results.more_results #=> "NOT_FINISHED"
results.not_finished? #=> true

# take 100 results with cursor, with two batch API requests, but there are only 70 results available...
results.take(100).count #=> 70
results.take(100).each_with_cursor do |entry, cursor|
  puts "#{entry["name"]} - #{cursor}"
end

results.more_results #=> "NO_MORE_RESULTS"
results.no_more? #=> true
results #=> #<Enumerator: [...]:run more_results="NO_MORE_RESULTS">

It is worth pointing out that the 10 results pulled will also be pulled in the take(100) call.

@quartzmo
Member

@blowmage Thanks!
@pcostell How does the example above look to you?

@pcostell
Contributor

The example seems reasonable, but I'm a little concerned with how take and limit interact. If take is the ruby-esque / gcloud-ruby way of doing things, maybe we can drop setting a limit on query and pass through take to the query before actually running it.

@quartzmo
Member

quartzmo commented Jul 28, 2016

@pcostell My understanding is that gcloud-ruby's limit works with Datastore's limit/offset-based pagination, but that Ruby's Enumerable#take, which can be called on the return value from gcloud-ruby's all, works with Datastore's cursor-based batching. To the extent that these mechanisms are intended to be used separately in Datastore, they should be used separately in gcloud-ruby. (Is there any practical reason to use limit with cursors? Edit: Yes, see below.)

If you dig into the implementation of QueryResults#all, you will see that it will make repeated API requests using the last cursor, until there are NO_MORE_RESULTS. Using take will transform this relatively open-ended iteration into a concrete array of a specific length.

Use case: Give me the top 5 records of 100,000 records.
Solution: Use limit, do not use all or take.

Use case: Give me an in-memory array holding the top 1,000 records of 100,000 records.
Solution: Use all and take, do not use limit.
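The two use cases can be illustrated in plain Ruby (simulated batches, no Datastore calls; make_results is invented for illustration): take truncates an open-ended, batch-fetching enumeration on the client, while limit would bound the query on the server.

```ruby
# Three simulated batches standing in for repeated API requests.
batches = [(1..40).to_a, (41..80).to_a, (81..100).to_a]

# Build a fresh enumerator plus a probe for how many batches were fetched.
make_results = lambda do
  count = 0
  enum = Enumerator.new do |y|
    batches.each do |batch|
      count += 1
      batch.each { |e| y << e }
    end
  end
  [enum, -> { count }]
end

results, fetched = make_results.call
results.take(5).size #=> 5
fetched.call         #=> 1 — only the first batch was needed

results, fetched = make_results.call
results.take(1_000).size #=> 100
fetched.call             #=> 3 — every batch drained, then truncated client-side
```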

@bmclean
Contributor

bmclean commented Jul 28, 2016

@quartzmo What we currently have implemented is the following:
For a particular entity kind give me the top 25 records of 100,000 records from datastore.
The user scrolls down the page, and we use the provided cursor to query for the next 25 records. And the next. And so on.
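That pattern, simulated in plain Ruby (run_page and the integer cursors are invented for illustration; real Datastore cursors are opaque):

```ruby
ALL_TASKS = ("a".."z").to_a # stands in for 100,000 records

# Hypothetical helper: run a query with a limit, resuming at a cursor.
def run_page limit:, cursor: nil
  start = cursor || 0
  page = ALL_TASKS[start, limit] || []
  [page, start + page.size] # the page plus its end cursor
end

page1, cursor = run_page(limit: 25)                  # first page for the UI
page2, _end   = run_page(limit: 25, cursor: cursor)  # fetched as the user scrolls
page1.size #=> 25
page2      #=> ["z"]
```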

@quartzmo
Member

@bmclean So for the above, you use limit and cursor together? So then it is offset and cursor that are nonsensical to use together, actually? Thanks for the example.

But I'm pretty sure you would never use all (and take) for this use case, you use the cursor yourself directly, correct?

@bmclean
Contributor

bmclean commented Jul 28, 2016

@quartzmo Correct, we use limit and cursor together for most of our user facing queries. Yes, we use the cursor directly. Not sure about offset, I don't think we have ever used it.

I love the idea of using all and take but bringing back more entities from datastore than we need to display at any given time will increase our page load.

@quartzmo
Member

@bmclean Now that you mention it, this use case seems totally obvious (and probably super common). Appreciate your input!

@pcostell
Contributor

cursor + limit is very common for Cloud Datastore (e.g. showing a page of results on your site). @bmclean's use case is also what the more_results field (and the paging feature in general) is used for: knowing when to stop trying to get more results in the UI.

I believe cursor + offset is less useful, but still totally possible (start at the cursor, but then skip 100 results).

You might use it if you have a paging UI (eg Google web search):

Prev Page 1 2 3 4 5 6 7 Next Page

Prev Page and Next Page would just use cursors to continue, but if you wanted to skip here from page 4 to page 6 you would use the page 4 end cursor plus an offset of page-size to skip over page 5.
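Simulated in plain Ruby (run and the integer cursors are invented for illustration), the page-4-to-page-6 jump looks like:

```ruby
PAGE_SIZE = 10
ITEMS = (1..100).to_a

# Hypothetical helper: run a query with a limit, optional cursor, and offset.
def run limit:, cursor: nil, offset: 0
  start = (cursor || 0) + offset
  page = ITEMS[start, limit] || []
  [page, start + page.size]
end

# Page 4 covers items 31..40, so its end cursor is 40.
_page4, cursor4 = run(limit: PAGE_SIZE, cursor: 30)

# Jump to page 6: resume at page 4's end cursor and skip one page-size (page 5).
page6, _ = run(limit: PAGE_SIZE, cursor: cursor4, offset: PAGE_SIZE)
page6.first #=> 51
page6.last  #=> 60
```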

@quartzmo
Member

Currently, this PR simply corrects the logic in QueryResult#next? to use NOT_FINISHED. Am I correct to think that this is adequate, and that the PR can be accepted?

Each new QueryResults instance returned by #next already exposes more_results and cursor. Based on the use cases above, I do not see a need for the Enumerator returned by #all to expose these values as well, but does anyone else?

@timanovsky

What is the resolution here? It would be great to get a 0.12 release; meanwhile I have to monkey-patch to bring this fix in.

@quartzmo
Member

quartzmo commented Aug 1, 2016

I'm going to accept this PR presently for release today unless anyone objects.

@quartzmo quartzmo merged commit 0f70805 into googleapis:master Aug 1, 2016
@timanovsky

@quartzmo It looks like 0.12 has already been released. Will there be separate release containing this PR?

@quartzmo
Member

quartzmo commented Aug 1, 2016

@timanovsky Yes. I am working now on release 0.12.1.

@quartzmo
Member

quartzmo commented Aug 1, 2016

@timanovsky Please try 0.12.1 and confirm the fix. Thanks again.

@timanovsky

@quartzmo confirming

Successfully merging this pull request may close these issues.

QueryResults.next? keeps returning true forever under emulator
6 participants