Change Datastore QueryResult#next? to use NOT_FINISHED #803
Change what check is made to paginate. Requested by the Datastore team. [closes googleapis#793]
@pcostell Can you verify this is correct? We'd like to get this resolved and released ASAP. Thanks! |
@@ -133,7 +133,7 @@ def initialize arr = []
 # end
 #
 def next?
-  !no_more?
+  not_finished?
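To make the effect of this one-line change concrete, here is a minimal sketch. `ResultBatch`, `next_old?` and `next_new?` are hypothetical names for illustration, not the gcloud-ruby classes, and the four strings are the `more_results` values as I understand the Datastore API to report them.

```ruby
# Hypothetical stand-in for a batch of query results; not the gcloud-ruby class.
class ResultBatch
  attr_reader :more_results

  def initialize more_results
    @more_results = more_results
  end

  def not_finished?
    more_results == "NOT_FINISHED"
  end

  def no_more?
    more_results == "NO_MORE_RESULTS"
  end

  # Old check: any value other than NO_MORE_RESULTS asks for another batch.
  def next_old?
    !no_more?
  end

  # New check: only NOT_FINISHED asks for another batch.
  def next_new?
    not_finished?
  end
end

values = %w[NOT_FINISHED MORE_RESULTS_AFTER_LIMIT MORE_RESULTS_AFTER_CURSOR NO_MORE_RESULTS]
values.each do |value|
  batch = ResultBatch.new value
  puts "#{value}: old=#{batch.next_old?} new=#{batch.next_new?}"
end
```

The two checks agree on `NOT_FINISHED` and `NO_MORE_RESULTS` and differ only on the two `MORE_RESULTS_AFTER_*` values, where the old check would keep requesting batches.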
This PR is an attempt to un-stick users of the emulator and to be more in-line with how the Datastore team wants the client to be used. There is a larger issue of not exposing |
I guess I'm confused about how |
The other cloud services don't differentiate between paging and batching. We want consistency with the other services, which can use |
Consider the case where a user specifies In a streaming API world, the API doesn't need any batching primitives (like Since the desired state is a streaming API, we should hide all batching details inside the client library, since it's really just an implementation detail until streaming is supported. |
Thanks for the heads up. We will absorb all this and come up with a new recommendation. |
After reading the discussion on gcloud-node I think I have a better appreciation for the terminology being used. To date gcloud-ruby has not differentiated between "batching" and "paging", but what we are intending to offer in Datastore is batching. Unfortunately we have been referring to this as "pagination" because that is what we are calling it in non-Datastore services. We have no plans currently to provide pagination by incrementing
Our original approach was to model the batched list of results in an object, but we have recently implemented Enumerators on the result list's
# Currently
datastore.run query #=> Datastore::QueryResult
# Future
datastore.run query #=> Enumerator
This would allow us to stream results as they are needed. And users could use Enumerable to collect the results as they want.
# get the first 100 results, regardless of how many batch requests are made
datastore.run(query).take(100)
# use a lazy enumerator to get the first 25 titles
datastore.run(query).lazy.map { |entry| entry["title"] }.take(25)
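One way the streaming idea could work is sketched below, assuming a batch API that returns a slice of results plus a `more_results` flag. `FakeBackend` and `stream` are hypothetical helpers backed by an in-memory array; nothing here is the gcloud-ruby implementation.

```ruby
# Hypothetical in-memory backend standing in for the Datastore batch API.
class FakeBackend
  attr_reader :requests

  def initialize records, batch_size
    @records = records
    @batch_size = batch_size
    @requests = 0
  end

  # Returns [batch, next_cursor, more_results] for the batch starting at cursor.
  def fetch cursor
    @requests += 1
    batch = @records[cursor, @batch_size] || []
    next_cursor = cursor + batch.size
    more = next_cursor < @records.size ? "NOT_FINISHED" : "NO_MORE_RESULTS"
    [batch, next_cursor, more]
  end
end

# Builds an Enumerator that requests another batch only when the consumer needs one.
def stream backend
  Enumerator.new do |yielder|
    cursor = 0
    loop do
      batch, cursor, more = backend.fetch(cursor)
      batch.each { |entry| yielder << entry }
      break unless more == "NOT_FINISHED"
    end
  end
end

records = (1..1000).map { |i| { "title" => "Task #{i}" } }

# First 100 results, however many batch requests that takes.
backend = FakeBackend.new records, 40
first_100 = stream(backend).take(100)
puts "#{first_100.size} results in #{backend.requests} requests"

# First 25 titles through a lazy enumerator.
lazy_backend = FakeBackend.new records, 40
titles = stream(lazy_backend).lazy.map { |entry| entry["title"] }.take(25).to_a
puts "#{titles.size} titles in #{lazy_backend.requests} requests"
```

With a batch size of 40, `take(100)` stops partway through the third batch, so the remaining 900 records are never requested.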
@blowmage If changed to return an Enumerator instead of Datastore::QueryResult we would still receive the query’s cursor and more_results value though, right? |
Based on @pcostell's comments in the gcloud-node thread on this issue, I think any solution must continue to expose
|
Also, starting in v1beta3 there is a per-result cursor that we will also need to continue to support. |
We currently have a looping method, all_with_cursor, that exposes the cursor for each result:
query = datastore.query "Task"
tasks = datastore.run query
tasks.all_with_cursor do |task, cursor|
  puts "Task #{task.key.id} (#{cursor})"
end
Can we add
query = datastore.query "Task"
tasks = datastore.run query
tasks.all_with_cursor do |task, cursor, more_results|
  puts "Task #{task.key.id} (#{cursor})"
  if more_results == "NO_MORE_RESULTS"
    # disable some UI element...
  end
end
The One option would be to place the value on the Enumerator object.
require "gcloud"
gcloud = Gcloud.new
datastore = gcloud.datastore
query = datastore.query("Task")
results = datastore.run query
results #=> #<Enumerator: [...]:run more_results="NOT_FINISHED">
results.take(100).each do |entry|
  puts entry["name"]
end
results #=> #<Enumerator: [...]:run more_results="NO_MORE_RESULTS">
@blowmage Can you tweak your example to show both |
I can't say for sure how it would work, I'm just throwing out some ideas. It is possible that the Enumerator resets each time it is used, and it isn't possible for the
require "gcloud"
gcloud = Gcloud.new
datastore = gcloud.datastore
query = datastore.query("Task")
results = datastore.run query
results #=> #<Enumerator: [...]:run more_results="NOT_FINISHED">
# 10 results
results.take(10).each do |entry|
  puts entry["name"]
end
results.more_results #=> "NOT_FINISHED"
results.not_finished? #=> true
# take 100 results with cursor, with two batch API requests, but there are only 70 results available...
results.take(100).count #=> 70
results.take(100).each_with_cursor do |entry, cursor|
  puts "#{entry["name"]} - #{cursor}"
end
results.more_results #=> "NO_MORE_RESULTS"
results.no_more? #=> true
results #=> #<Enumerator: [...]:run more_results="NO_MORE_RESULTS">
It is worth pointing out that the 10 results pulled will also be pulled in the
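As a rough illustration of placing the `more_results` value on the enumerable object, here is a sketch. `StreamedResults` is a hypothetical class, and its `pages` argument (an array of `[entries, more_results]` pairs) stands in for the batch API responses. In this sketch every enumeration restarts from the first batch, which is an assumption rather than a statement about how gcloud-ruby behaves.

```ruby
# Hypothetical wrapper; gcloud-ruby does not define StreamedResults.
class StreamedResults
  include Enumerable

  attr_reader :more_results

  # pages is an array of [entries, more_results] pairs standing in for API batches.
  def initialize pages
    @pages = pages
    @more_results = "NOT_FINISHED"
  end

  # Yields every entry and records the flag of each batch once it is fully consumed.
  def each
    return enum_for(:each) unless block_given?
    @pages.each do |entries, more|
      entries.each { |entry| yield entry }
      @more_results = more
    end
  end

  def not_finished?
    more_results == "NOT_FINISHED"
  end

  def no_more?
    more_results == "NO_MORE_RESULTS"
  end
end

# 70 results spread over two batches.
pages = [[(1..50).to_a, "NOT_FINISHED"], [(51..70).to_a, "NO_MORE_RESULTS"]]
results = StreamedResults.new pages

results.take(10)
puts results.more_results

puts results.take(100).count
puts results.more_results
```

Because `take(10)` stops inside the first batch, the flag stays at `NOT_FINISHED`. Asking for 100 drains both batches, after which `no_more?` is true.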
The example seems reasonable, but I'm a little concerned with how take and limit interact. If take is the ruby-esque / gcloud-ruby way of doing things, maybe we can drop setting a limit on query and pass through take to the query before actually running it. |
@pcostell My understanding is that gcloud-ruby's If you dig into the implementation of Use case: Give me the top 5 records of 100,000 records. Use case: Give me an in-memory array holding the top 1,000 records of 100,000 records. |
@quartzmo What we currently have implemented is the following: |
@bmclean So for the above, you use But I'm pretty sure you would never use |
@quartzmo Correct, we use limit and cursor together for most of our user facing queries. Yes, we use the cursor directly. Not sure about offset, I don't think we have ever used it. I love the idea of using |
@bmclean Now that you mention it, this use case seems totally obvious (and probably super common). Appreciate your input! |
cursor + limit is very common for Cloud Datastore (e.g. show a page of results on your site). @bmclean's use case is also what the more_results field (and the paging feature in general) are used for: knowing when to stop trying to get more results in the UI. I believe cursor + offset is less useful, but still totally possible (start You might use it if you have a paging UI (e.g. Google web search): Prev Page 1 2 3 4 5 6 7 Next Page. Prev Page and Next Page would just use cursors to continue, but if you wanted to skip from page 4 to page 6 you would use the page 4 end cursor plus an offset of page-size to skip over page 5.
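The paging pattern described here can be sketched with an in-memory list. `run_query` is a hypothetical helper, and its cursor is just an array index, which is an assumption for illustration only (real Datastore cursors are opaque strings).

```ruby
# Hypothetical stand-in for a Datastore query over an in-memory list.
# Returns the page of results, its end cursor, and a more_results-style flag.
def run_query tasks, cursor: 0, limit: 10, offset: 0
  start = cursor + offset
  page = tasks[start, limit] || []
  end_cursor = start + page.size
  more = end_cursor < tasks.size ? "MORE_RESULTS_AFTER_LIMIT" : "NO_MORE_RESULTS"
  { results: page, end_cursor: end_cursor, more_results: more }
end

tasks = (1..65).map { |i| "Task #{i}" }
page_size = 10

# "Next Page": continue from the previous page's end cursor.
page1 = run_query(tasks, limit: page_size)
page2 = run_query(tasks, cursor: page1[:end_cursor], limit: page_size)
page3 = run_query(tasks, cursor: page2[:end_cursor], limit: page_size)
page4 = run_query(tasks, cursor: page3[:end_cursor], limit: page_size)

# Jump from page 4 to page 6: page 4's end cursor plus an offset of one page.
page6 = run_query(tasks, cursor: page4[:end_cursor], limit: page_size, offset: page_size)
puts page6[:results].first

# The last page reports NO_MORE_RESULTS, so the UI can disable "Next Page".
page7 = run_query(tasks, cursor: page6[:end_cursor], limit: page_size)
puts page7[:more_results]
```

The UI only needs to store the end cursor of the page it last rendered, and the flag on the final page tells it when to stop offering a next page.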
Currently, this PR simply corrects the logic in Each new |
Guys, what is the resolution here? It would be great to get a 0.12 release. Meanwhile I have to monkey-patch to bring this fix in.
I'm going to accept this PR presently for release today unless anyone objects. |
@quartzmo It looks like 0.12 has already been released. Will there be a separate release containing this PR?
@timanovsky Yes. I am working now on release |
@timanovsky Please try |
@quartzmo confirming |