-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Batches in >=2.0.0 are significantly slower #145
Comments
Are you seeing this in both emulator and live endpoints? |
I haven't tested on a live endpoint. I'm in vacation for the week, I'll definitely test this when I come back. |
Thanks @BinarSkugga . I know we overhauled, rather substantially, the transport layer in the v1 -> v2. It would be very helpful to understand if you are seeing a slowdown on a live endpoint. |
This comment has been minimized.
This comment has been minimized.
Additional potentially impactful information:
|
I may have talked to fast. It also happens on live endpoints. [SOLLUM][DATA][INFO][10:42:35]: Fetching 4130 'LayoutEntity' entities...
[SOLLUM][DATA][VERBOSE][10:42:35]: Using 9 'LayoutEntity' batches...
[SOLLUM][DATA][VERBOSE][10:42:36]: Fetched 'LayoutEntity' batch #1/9
[SOLLUM][DATA][VERBOSE][10:42:36]: Fetched 'LayoutEntity' batch #2/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #4/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #3/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #5/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #6/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #7/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #8/9
[SOLLUM][DATA][VERBOSE][10:42:37]: Fetched 'LayoutEntity' batch #9/9 Logs from 2.1.x: [SOLLUM][DATA][INFO][14:35:42]: Fetching 4130 'LayoutEntity' entities...
[SOLLUM][DATA][VERBOSE][14:35:42]: Using 9 'LayoutEntity' batches...
[SOLLUM][DATA][VERBOSE][14:36:48]: Fetched 'LayoutEntity' batch #1/9
[SOLLUM][DATA][VERBOSE][14:39:31]: Fetched 'LayoutEntity' batch #2/9
[SOLLUM][DATA][VERBOSE][14:39:32]: Fetched 'LayoutEntity' batch #3/9 Here's the code that handles this: def _do_fetch_chunk(chunk, client, kind, count, total):
with client.batch():
ents = client.get_multi(chunk)
Loggers.DATA.verbose('Fetched \'{}\' batch #{}/{}'.format(kind, count.next(), total))
return ents
count = AtomicNumber()
chunked = list(chunks(keys, config.MAX_BATCH_ENTITY))
Loggers.DATA.verbose('Using {} \'{}\' batches...'.format(len(chunked), kind))
entities = ConcurrentService.i().thread_each(chunked, _do_fetch_chunk, self.client, kind, count, len(chunked))
return [item for sublist in entities for item in sublist] The concurrent service simply submit a new task per batch in the thread pool. |
Any insight on this ? |
I have received another customer report. While not quite the difference stated here, 2-3x slower is about what the other instance was observing. It would be good to benchmark a few operations and see if there are any obvious gains that could be made. Reference PR from PubSub that was able to get back some time: googleapis/python-pubsub@c29d7f8 |
I've been attempting to recreate this, but so far am seeing more or less similar outcomes for batch operations between versions Querying may very well be slower, however. First test run: $ python datastore-test.py
[[email protected]] finished creation loop in 5.0 ms
[[email protected]] finished batch write in 1926.0 ms
[[email protected]] finished batch get in 391.0 ms
[[email protected]] finished batch update in 1706.0 ms
[[email protected]] finished batch query in 2216.0 ms
> loaded 3500 objects Switch to version python datastore-test.py
[[email protected]] finished creation loop in 4.0 ms
[[email protected]] finished batch write in 2109.0 ms
[[email protected]] finished batch get in 588.0 ms
[[email protected]] finished batch update in 1701.0 ms
[[email protected]] finished batch query in 3702.0 ms
> loaded 4000 objects I will continue to investigate, but for now, here's the testing script I'm using. |
More efficiently uses proto-plus wrappers, as well as inner protobuf attribute access, to greatly reduce the performance costs seen in version 2.0.0 (which stemmed from the introduction of proto-plus). The size of the performance improvement scales with the number of attributes on each Entity, but in general, speeds once again closely approximate those from 1.15. Fixes #145 Fixes #150
I believe there is a change of API or a performance issue when using batches with 2.1.0.
Environment details
google-cloud-datastore
version: 2.1.0google-api-core
version: 1.24.1Steps to reproduce
Code example
Performance
When using 1.15.0, my code can handle 9 batches of 500 entities in about 1-2 seconds while 2.1.0 takes literal minutes.
Disclaimer: I use the datastore emulator
EDIT: It can be reproduced in 2.0.0
The text was updated successfully, but these errors were encountered: