Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QueryResults.next? keeps returning true forever under emulator #793

Closed
timanovsky opened this issue Jul 22, 2016 · 16 comments · Fixed by #803
Closed

QueryResults.next? keeps returning true forever under emulator #793

timanovsky opened this issue Jul 22, 2016 · 16 comments · Fixed by #803
Assignees

Comments

@timanovsky
Copy link

Hello, I've finally got some time to upgrade from 0.8.2. I'm running into the problem running a query under emulator. Basically subject says it all. Query returns 1 entry, but next?==true, .next return 0 entries, but again next?==true and so on. With real cloud API it works fine returning one record. This btw results in .all hanging.
The emulator I downloaded from documentation page link: https://storage.googleapis.com/gcd/tools/cloud-datastore-emulator-1.1.1.zip

@timanovsky
Copy link
Author

BTW, in the emulator log I don't see any new requests when I'm calling .next

@timanovsky
Copy link
Author

I found out that cloud-datastore-emulator (--no-leagacy) is now available in gcloud tool, does it mean separate download is nit necessary? Anyways it behaves the same. next? keeps returning true; next keeps returning [] without doing actual calls it seems.

@timanovsky
Copy link
Author

.more_results == :MORE_RESULTS_AFTER_LIMIT for initial and following query results

@timanovsky
Copy link
Author

And, as a separate related issue, if understand semantics of MORE_RESULTS_AFTER_LIMIT correctly, .all should stop iterations once limit is reached. That would be logical from user standpoint.

@blowmage blowmage self-assigned this Jul 22, 2016
@blowmage
Copy link
Contributor

I'll look at this over the weekend. That's frustrating that the live service and emulator behave differently. Thanks for letting us know @timanovsky!

@timanovsky
Copy link
Author

timanovsky commented Jul 23, 2016

@blowmage if you want to reproduce, it is very basic ancestry query, no any other query params (no limit etc). Result set is expected to be 1 entry.

    query = Gcloud::Datastore::Query.new
    level1_key = Gcloud::Datastore::Key.new("TypeA", "keyA")
    level2_key = Gcloud::Datastore::Key.new("TypeB", "keyB")
    level2_key.parent = level1_key
    query.kind("TypeC").where("__key__", "HAS ANCESTOR", level2_key)
    res = dataset.run(query)

@timanovsky
Copy link
Author

You don't need ancestry query. Minimal code to reproduce the problem:

require 'gcloud/datastore'
dataset = Gcloud.datastore()
key = Gcloud::Datastore::Key.new("EntryA", "keyA")
entity = Gcloud::Datastore::Entity.new
entity.key = key
dataset.save(entity)

query = Gcloud::Datastore::Query.new
query.kind("EntryA")
r = dataset.run(query)
r.next?
r.more_results
r.next.next?

@blowmage
Copy link
Contributor

I've looked into this over the weekend and this appears to be a bug in the beta emulator. I didn't get the cloud-datastore-emulator to work for me, so I only used the emulator referenced in our documentation.

Normally, when a query has no more results it will return the value NO_MORE_RESULTS. But the emulator is not returning this, it instead returns MORE_RESULTS_AFTER_LIMIT. This makes it incredibly difficult for the software to know if there are indeed no more results.

Working around this is tricky. We don't want to introduce any bugs to gcloud-ruby in an effort to fix this for the emulator. There is a legitimate use for MORE_RESULTS_AFTER_LIMIT to be used, and treating those query results as NO_MORE_RESULTS would be a mistake.

One option is to use the request_limit optional argument when calling #all to limit the number of requests made to the server. This will limit the number of times #next is called, and will ensure that the Enumerator does not hang indefinitely.

query = datastore.query "EntryA"
datastore.run(query).all(request_limit: 10) do |entry|
  puts entry
end

BTW, in the emulator log I don't see any new requests when I'm calling .next

I don't see them in the log output either, but those requests are happening.

And, as a separate related issue, if understand semantics of MORE_RESULTS_AFTER_LIMIT correctly, .all should stop iterations once limit is reached. That would be logical from user standpoint.

Those are two different things. Specifying the limit in the query will limit the number of results returned in the initial request. But if you paginate those results it will request more and more results. My understanding is that MORE_RESULTS_AFTER_LIMIT informs users that there are more results past the limit, which indicates that pagination can continue.

To stop at a limit, you can specify a limit in the query. But, there are no guarantees that the server will return the limit. It may return less than the limit. It all depends on what is happening on the server. Alternatively, you can use ruby's Enumerable to take the number of results you are looking for. This will load all the records until the server indicates there are no more results, or the specified number of results is taken, whichever comes first.

query = datastore.query "EntryA"
first_25 = datastore.run(query).all.take(25)

Also, you can combine this with the request_limit optional argument to workaround this emulator bug:

query = datastore.query "EntryA"
first_25 = datastore.run(query).all(request_limit: 10).take(25)

@blowmage
Copy link
Contributor

@pcostell Can you confirm that this is a bug in the emulator? If so, how do we file an issue for it?

@pcostell
Copy link
Contributor

pcostell commented Jul 25, 2016

This is a known issue with the emulator, I'll find the public issue you can
track when I'm at a computer. This is a result of the emulator being built
on the Cloud Datastore API for App Engine, which only distinguishes between
NOT_FINISHED or not. Since the more_results flag is a hint (see below),
the only safe response when the query is finished is
MORE_RESULTS_AFTER_LIMIT.

Something to note:

more_results is a best effort hint to the developer. In the case of the
emulator it is buggy that MORE_RESULTS_AFTER_LIMIT is returned even if
the query didn't reach the limit. However even in production, the query may
return MORE_RESULTS_AFTER_LIMIT even if there are no more results if the
server is unwilling to do the work to check for the next entity.

This is not the issue here but for posterity: in order to determine if you
should keep iterating through results the clients should only ever use
more_results == NOT_FINISHED aa loop condition.

@blowmage
Copy link
Contributor

@pcostell Interesting. So currently gcloud-ruby is paginating based on whether more_results != NO_MORE_RESULTS. But since you say this should be changed to more_results == NOT_FINISHED we will make that change. Thanks for the guidance.

@pcostell
Copy link
Contributor

Importantly there are two features exposed through the more_results field: batching and paging.

Our desired behavior for RunQuery is a streaming API. If a user asks for a limit of N, we will give them a stream of up to N entities. We can then add some helpers on top of that to convert the stream to a list. Since we are still moving everything over to gRPC (which will allow this), we have a batching API which can be used to simulate a streaming API:

while (more_results == NOT_FINISHED) {
  results.append(get_next_query().execute())
}

Separately, Cloud Datastore offers a paging feature, which provides the user a cursor they can use to get extra pages. This has an extra feature which is best effort end condition tracking (more_results). This is very useful for users who are showing pages to their users as they can get information about whether or not they should show a "Next Page" button (this is MORE_RESULTS_AFTER_LIMIT). This feature is conservative (MORE_RESULTS_AFTER_LIMIT may be returned even if the correct result is NO_MORE_RESULTS).

@pcostell
Copy link
Contributor

I couldn't find an existing issue (I believe it lives in one of the other gcloud libraries). Here is one for the emulator: googleapis/google-cloud-datastore#130

@blowmage
Copy link
Contributor

Okay, so interestingly, with the change it is no longer possible to paginating with #next or #all after setting a limit in the query:

query = dataset.query("EntryA").limit(3)
dataset.run(query).all.count #=> always 3, even if there are thousands of results

This is the desired behavior, right?

@pcostell
Copy link
Contributor

The use of all is a bit confusing to me. If the user wants all the results, they shouldn't set a limit.

@blowmage
Copy link
Contributor

Agreed, it doesn't mesh with #limit very well. Its intended to ease multiple paginations to get the desired results. Also for consistency with the other gcloud-ruby services. :)

query = dataset.query("Task")
dataset.run(query).all.take(250) # paginate as many times as it takes to get 250 results

blowmage added a commit to blowmage/google-cloud-ruby that referenced this issue Jul 25, 2016
Change what check is made to paginate. Requested by the Datastore team.

[closes googleapis#793]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants