feat: sharded read rows #766
Merged
daniel-sanche merged 405 commits into googleapis:v3 from daniel-sanche:sharded_read_rows on Jun 23, 2023
38e5662
did some restructuring
daniel-sanche 5155800
got some tests working
daniel-sanche 522f7fa
improved tests
daniel-sanche 9429244
renamed RowResponse and CellResponse to Row and Cell
daniel-sanche 1aa7424
fixed tests
daniel-sanche a603649
simplified row construction
daniel-sanche 68a5a0f
added RowRange object
daniel-sanche cc2e7c8
added comments
daniel-sanche ba629c8
added api-core submodule
daniel-sanche 75d2c10
copied in rough retryable logic
daniel-sanche d5eca2a
Merge branch 'v3_row_response' into read_rows_state_machine
daniel-sanche 2a26797
updated Row and Cell class names
daniel-sanche bcd394f
fixed tests
daniel-sanche 037af0d
added last scanned row class
daniel-sanche e17d9bc
ran blacken
daniel-sanche db80d22
Merge branch 'read_rows_state_machine' into read_rows_retries
daniel-sanche b3d977d
handle last scanned rows
daniel-sanche 1f85462
Merge branch 'add_new_transport' into read_rows_retries
daniel-sanche 1fba6ea
updated add_keys
daniel-sanche c4f82b0
removed chaining
daniel-sanche caca14c
improved to_dicts
daniel-sanche 5f9ce85
improving row_ranges
daniel-sanche 8e5f60a
fixed properties
daniel-sanche 57184c1
added type checking to range
daniel-sanche 3eda7f4
got tests passing
daniel-sanche 65f5a2a
blacken, mypy
daniel-sanche 3e724db
ran blacken
daniel-sanche 45eadce
improved API usage
daniel-sanche c06213f
use invalid chunk
daniel-sanche 6e75a2f
added per request timeouts
daniel-sanche a205e93
account for RequestStats
daniel-sanche ce3eb75
added output generator wrapper
daniel-sanche 74029c9
updated template
daniel-sanche 7f2be30
got tests passing
daniel-sanche 2b044ce
removed metadata
daniel-sanche 1743098
added sleep between swapping and closing channels
daniel-sanche e5fa4b6
ran blacken
daniel-sanche 8955ec5
got tests working
daniel-sanche 002bc5f
fixed lint issue
daniel-sanche 65f0d2f
fixed tests
daniel-sanche 664a6d2
Merge branch 'add_new_transport' into read_rows_retries
daniel-sanche d3db731
Merge branch 'add_new_transport' into read_rows_state_machine
daniel-sanche 5f41c06
changed return type
daniel-sanche aa26911
Merge branch 'v3_read_rows_query' into read_rows_state_machine
daniel-sanche 7b68207
fixed typing issues
daniel-sanche a776cb5
Merge branch 'read_rows_state_machine' into read_rows_retries
daniel-sanche c164a47
adjusted types
daniel-sanche 96d58d1
added per-row-timeout to merge_row_stream_with_cache
daniel-sanche 216610e
cancel stream on exception
daniel-sanche c505c39
moved retry logic into RetryableRowMerger
daniel-sanche 179c8b8
fixed issues in merger
daniel-sanche 3cc5380
moved streaming into cache into RetryableRowMerger
daniel-sanche 4af0218
restructuring
daniel-sanche d6a323f
added idle timeout
daniel-sanche 7b6d1db
keep track of last_raised
daniel-sanche 733a393
fixed mypy issues
daniel-sanche 12807e0
made idle timeout internal value
daniel-sanche 0e3d32c
combined row merger functions
daniel-sanche 5b055b4
made adjustments to RowMerger
daniel-sanche dbf19c9
holds a gapic client instead of inherits from it
daniel-sanche ab7931c
Merge branch 'add_new_transport' into read_rows_retries
daniel-sanche 88f14f6
don't emit _LastScannedRows
daniel-sanche 9f15a6a
fixed type issues
daniel-sanche b3c32b0
got tests passing
daniel-sanche 770d9f5
added comments
daniel-sanche 9f3e0c5
added comment
daniel-sanche a0620ea
added random noise to refresh intervals
daniel-sanche 4f5ed46
improving comments; clean up
daniel-sanche c169ba8
fixed param order
daniel-sanche 9ec3697
working on getting end-to-end read_rows working
daniel-sanche b6873e8
fixed issue in pulling from cache
daniel-sanche 2facc79
added timeout to results generator
daniel-sanche ee826bb
added acceptance tests for read_rows
daniel-sanche 25af0c0
adding tests
daniel-sanche 2f7778d
got operation deadline error working properly
daniel-sanche d6b8e6b
made RowMerger back into an iterable
daniel-sanche 3f085a9
added test for per-row timeout
daniel-sanche 6abb9d4
don't attach retry errors if there are none
daniel-sanche 128320c
added tests for per_request_timeout
daniel-sanche a048536
added idle timeout test
daniel-sanche 371dd64
remove row merger after error
daniel-sanche ebbaa1e
reorganized retryable_merge_rows
daniel-sanche 2a3e379
improved resource clean up on retries and expiration
daniel-sanche 2e50c51
added tests for request stats
daniel-sanche 0b63b2b
added tests for exceptions
daniel-sanche de102bb
clean up on_error
daniel-sanche bbdb8e6
await sleep
daniel-sanche 83472dc
got tests working
daniel-sanche bef40bd
updated api-core
daniel-sanche 29a98ed
Merge branch 'v3' into read_rows_retries
daniel-sanche 534005a
ran blacken
daniel-sanche 6f1c781
made invalid chunk a server error
daniel-sanche 38f66e5
moved invalid chunk with other exceptions
daniel-sanche bf24c25
made row merger and classes private
daniel-sanche 4dbacb5
added read_rows
daniel-sanche 6e6978e
ran blacken
daniel-sanche 21f7846
added comments
daniel-sanche 52e9dbf
added test for revise rowset
daniel-sanche 715be51
fixed lint issues
daniel-sanche 2f50cb7
moved ReadRowsIterator into new file
daniel-sanche 1486d5a
Merge branch 'v3' into add_new_transport
daniel-sanche 28d5a7a
fixed lint issues
daniel-sanche 3b11580
Merge branch 'add_new_transport' into read_rows_retries
daniel-sanche d47c941
changed comment
daniel-sanche d1bd128
added comments to iterator
daniel-sanche 039d623
added var for idle timeout
daniel-sanche 3d34dcd
sped up acceptance tests
daniel-sanche 70fbff9
reduced size of template by making subclass
daniel-sanche 383d8eb
reverted unintentional gapic generation changes
daniel-sanche 018fe03
updated submodule
daniel-sanche 3764a98
added default timeouts to table surface
daniel-sanche 745ae38
end after row_limit rows
daniel-sanche 3d11d55
changed retryable exceptions
daniel-sanche f0403e7
changed warning stack level
daniel-sanche 84a775a
changed retryable errors
daniel-sanche 15a9d23
improved comments
daniel-sanche 8636654
improved idle timeouts
daniel-sanche 1aca392
changed retry parameters
daniel-sanche 45fef1e
added limit revision to each retry
daniel-sanche 951a77b
removed unneeded check
daniel-sanche e3a0b66
fixed idle timeout test
daniel-sanche 6089934
removed tracking of emitted rows
daniel-sanche fb4b0ca
removed revise_on_retry flag
daniel-sanche 83b908c
changed initial sleep
daniel-sanche 5688561
added extra timeout check
daniel-sanche 7f57e7c
implemented sample_keys
daniel-sanche a6a140b
initial implementation of query.shard
daniel-sanche 0f03aea
added read_rows_sharded implementation
daniel-sanche cfa181d
fixed bugs in implementation
daniel-sanche e8007c8
added str and equal to query and range
daniel-sanche e190dc6
added a test for sharding
daniel-sanche 5aa89da
got first set of tests passing
daniel-sanche a565f47
added table scan test
daniel-sanche 7041dfd
added more tests
daniel-sanche 2f7973a
made row ranges into set
daniel-sanche 872480f
added unsorted test
daniel-sanche 47be958
ran blacken
daniel-sanche c945687
fixed mypy issues
daniel-sanche b0dbaed
fixed lint issues
daniel-sanche 53878a9
fixed bug in from_dict
daniel-sanche ff2dfca
fixed tests
daniel-sanche ff3724d
removed outdated test
daniel-sanche 78a309c
fixed type annotations
daniel-sanche c50ae18
added slots
daniel-sanche d73121b
renamed cache to buffer
daniel-sanche 14d8527
renamed errors
daniel-sanche 4b89c86
replaced type check with None check
daniel-sanche 9f89577
added comment for last_scanned_row heartbeat
daniel-sanche 4b229b9
added early return
daniel-sanche 152bccf
moved validation
daniel-sanche 67c2911
added close call to ReadRowsIterator
daniel-sanche ff11ad3
removed del
daniel-sanche 78bd5d3
pull out buffer control logic
daniel-sanche ca4a16d
got buffering working
daniel-sanche 0dba121
check for full table scan revision
daniel-sanche 3537566
renamed and added underscores
daniel-sanche 981f169
added extra check
daniel-sanche d3d4c76
removed unneeded validation
daniel-sanche 1901094
renamed RowMerger to ReadRowsOperation
daniel-sanche 947fe9b
changed _read_rows test file name
daniel-sanche 773d4e5
added row builder tests
daniel-sanche cbb0513
added revise_row tests
daniel-sanche 2bec693
ran blacken
daniel-sanche 5cd8e00
added constructor tests
daniel-sanche d6f3ae1
upgraded submodule
daniel-sanche f2d7e71
added tests
daniel-sanche cb23d32
update docstring
daniel-sanche bc31ab8
update docstring
daniel-sanche f54dfde
fix typo
daniel-sanche 46cfc49
docstring improvements
daniel-sanche 573bbd1
made creating table outside loop into error
daniel-sanche 4f2657d
make tables own active instances, and remove instances when tables close
daniel-sanche 59955be
added pool_size and channels as public properties
daniel-sanche 377a8c9
fixed typo
daniel-sanche 8a29898
simplified pooled multicallable
daniel-sanche 50aa5ba
ran blacken
daniel-sanche 42a52a3
associate ids with instances, instead of Table objects
daniel-sanche abc7a2d
fixed tests
daniel-sanche 836af0f
made sure that empty strings are valid family and qualifier inputs
daniel-sanche e73551d
added tests for state machine
daniel-sanche 792aba1
added state machine tests
daniel-sanche e57c510
fixed broken mock
daniel-sanche 88748a9
added additional tests
daniel-sanche 0c38981
ran blacken
daniel-sanche 50dc608
reverted pooled multicallable changes
daniel-sanche b116755
pass scopes to created channels
daniel-sanche ec5eb07
added basic ping system test
daniel-sanche 55cdcc2
keep both the names and ids in table object
daniel-sanche 0253692
Merge branch 'add_new_transport' into read_rows_retries
daniel-sanche 3855333
added api-core to noxfile tests
daniel-sanche 213519e
added basic read rows stream to system tests
daniel-sanche 9e3b411
pull project details out of env vars
daniel-sanche d8cf158
added automatic row creation for system tests
daniel-sanche c9b8217
added read_rows non stream
daniel-sanche 500eff0
added range query system test
daniel-sanche 27130f0
added logic for temporary test tables and instances
daniel-sanche f4f4fac
made iterator active into a property
daniel-sanche 06dee54
added more read_rows system tests
daniel-sanche 9e11f88
fixed lint issues
daniel-sanche 794c55a
added iterator tests
daniel-sanche ccd9545
added tests for timeouts
daniel-sanche ca84b96
ran black
daniel-sanche eb936cf
fixed lint issues
daniel-sanche ab43138
restructured test_client
daniel-sanche cb1884d
changed how random is mocked
daniel-sanche 9a89d74
ran black
daniel-sanche 7f783fc
restructured test_client
daniel-sanche 6a6d219
Merge branch 'add_new_transport' into read_rows_retries
daniel-sanche 72eca75
restructured test_client_read_rows
daniel-sanche ad42436
moved read rows tests in test_client
daniel-sanche 7606e3a
update submodules in nox
daniel-sanche 829e68f
ran black
daniel-sanche e8eff39
Merge branch 'v3' into read_rows_retries
daniel-sanche 6a58e86
removed submodule update
daniel-sanche 9be5b07
removed unneeded import
daniel-sanche 4f819b2
Merge branch 'read_rows_retries' into sharded_read_rows
daniel-sanche f476ad7
Merge branch 'v3' into sharded_read_rows
daniel-sanche 62dcbb5
cleaned up read_rows_sharded function
daniel-sanche b4a95b3
refactored read_rows_query tests to match other files
daniel-sanche 5972722
finished read_rows_query tests
daniel-sanche 7c1643c
fixed issue with ping and warm
daniel-sanche a39d931
added tests for sharded queries
daniel-sanche 482eed9
added new exception type for sharded rpcs
daniel-sanche faec93e
added test for concurrency
daniel-sanche 6f6e010
removed subclass for sharded tests
daniel-sanche a005ec8
added sample_key samples
daniel-sanche dd10624
added system tests
daniel-sanche 82789ec
refactoring shard function
daniel-sanche d39fd0f
added extra checks to query class
daniel-sanche 7e26d40
fixed comment
daniel-sanche 34aea1a
added extra docstring
daniel-sanche 42cac01
renamed sample_keys to sample_row_keys
daniel-sanche 632a106
Merge branch 'v3' into sharded_read_rows
daniel-sanche 05a311e
added metadata to sample_row_keys
daniel-sanche f53af32
changed shard points to be range ends instead of starts
daniel-sanche ac4378d
added concurrency limit
daniel-sanche 9eaa279
added retries for sample_keys
daniel-sanche 6cca7cf
cleaned up code block
daniel-sanche 88e88d4
documented and simplified sharding function
daniel-sanche 26ffe0c
Merge branch 'v3' into sharded_read_rows
daniel-sanche 9302286
split row_range sharding into own helper
daniel-sanche 3f4dd0e
added type alias
daniel-sanche bb72b5e
modify timeouts with batch
daniel-sanche 71b034c
added successful rows to ShardedReadRowsExceptionGroup
daniel-sanche ceb8129
improved end segment search
daniel-sanche 0e277f4
removed changes to mutation exception
daniel-sanche 37b4967
added exception tests for new exception types
daniel-sanche 9508a0f
fixed error in sharded_read_rows
daniel-sanche d3f6b0f
added timeouts to batching test
daniel-sanche a4f606e
ran black
daniel-sanche
Do we want to raise an error if any of the shard queries overlap? Or is it ok to get duplicate rows?
I don't think we need an error. Also, the rows will be de-duplicated on the server side.
How does the de-duplication work if we're requesting the duplicates in separate RPCs?
I don't think the same key can exist in multiple RPCs in the current implementation. A given key value is placed in exactly one shard, and we aren't segmenting the shards, so it should end up in a single RPC.
Yeah, assuming they use the query.shard() function, that should be the case. But this method accepts a generic list of queries, so users may pass in overlapping queries, right?
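To make the overlap concern concrete, here is a minimal sketch of how overlapping queries in a user-supplied list could be detected. It is not taken from the PR: `KeyRange`, `ranges_overlap`, and `find_overlaps` are hypothetical names, and the sketch assumes each query reduces to a half-open `[start, end)` range over byte keys, with `None` meaning unbounded.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a row range: [start, end) over byte keys.
# None means the range is unbounded on that side.
KeyRange = Tuple[Optional[bytes], Optional[bytes]]


def ranges_overlap(a: KeyRange, b: KeyRange) -> bool:
    """Return True if two half-open key ranges share at least one key."""
    a_start, a_end = a
    b_start, b_end = b
    # The ranges are disjoint only if one ends at or before the other starts.
    a_before_b = a_end is not None and b_start is not None and a_end <= b_start
    b_before_a = b_end is not None and a_start is not None and b_end <= a_start
    return not (a_before_b or b_before_a)


def find_overlaps(ranges: List[KeyRange]) -> List[Tuple[int, int]]:
    """Return index pairs of overlapping ranges (quadratic, fine for a sketch)."""
    return [
        (i, j)
        for i in range(len(ranges))
        for j in range(i + 1, len(ranges))
        if ranges_overlap(ranges[i], ranges[j])
    ]
```

Ranges that merely touch, such as `(b"a", b"m")` and `(b"m", b"z")`, are not reported, because the end key is treated as exclusive here. Whether the result should trigger an error, a warning, or nothing at all is the open question in this thread.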
You are right that it's possible. I think we should avoid this situation, but not by throwing an error. I think we should make it impossible to happen. Perhaps we can do the following:
- Create a batch-fetching context that end users create. The context will automatically call SampleRowKeys and cache the result, and maybe refresh it every X minutes.
- The end user then interacts with this object by passing it lists of keys and ranges, which the context shards.
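The proposal above could look roughly like the following sketch. Everything here is an assumption for illustration: `BatchFetchContext`, `shard_keys`, and the injected `sample_row_keys` callable are hypothetical names, not part of the library. The sketch refreshes the cached split points lazily on use rather than on a timer, and treats each split point as an inclusive shard end.

```python
import bisect
import time
from typing import Callable, Dict, Iterable, List, Optional


class BatchFetchContext:
    """Hypothetical context that caches split points and shards keys by them."""

    def __init__(
        self,
        sample_row_keys: Callable[[], List[bytes]],
        refresh_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._sample = sample_row_keys  # stand-in for a SampleRowKeys call
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._split_points: List[bytes] = []
        self._fetched_at: Optional[float] = None

    def _get_split_points(self) -> List[bytes]:
        """Return cached split points, re-sampling if the cache is stale."""
        now = self._clock()
        stale = (
            self._fetched_at is None
            or now - self._fetched_at >= self._refresh_seconds
        )
        if stale:
            self._split_points = sorted(self._sample())
            self._fetched_at = now
        return self._split_points

    def shard_keys(self, keys: Iterable[bytes]) -> Dict[int, List[bytes]]:
        """Group de-duplicated keys by the index of the shard that holds them."""
        split_points = self._get_split_points()
        shards: Dict[int, List[bytes]] = {}
        for key in set(keys):
            # bisect_left puts a key equal to a split point into the shard
            # that ends at that point (inclusive end).
            idx = bisect.bisect_left(split_points, key)
            shards.setdefault(idx, []).append(key)
        return {idx: sorted(ks) for idx, ks in sorted(shards.items())}
```

Because every key is mapped to exactly one shard index, two shards produced this way cannot contain the same key, which is the property the proposal is after.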
And then move the `read_rows_sharded(unsharded_query)` function onto the context object? Or something else? I'd be a bit hesitant to add more background tasks if we can avoid it, but we can probably work something out.

Another option that would be very simple to add would be to make `query.shard` return a custom `ShardedQuery` object that just wraps the query list, and then only accept that as input for `read_rows_sharded`. Or even simpler, just make it a type alias.

Is this something we can create an issue for and address after the first alpha, or do you want it resolved before merging this?
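As a rough illustration of the wrapper option, the sketch below uses simplified stand-ins rather than the library's real classes: `Query` here is a hypothetical single-range query, `shard` takes split points directly, and `read_rows_sharded` only echoes the ranges it would read.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical stand-in for a read-rows query over one [start, end) range."""

    start: bytes
    end: bytes

    def shard(self, split_points: List[bytes]) -> "ShardedQuery":
        """Split this query at every split point strictly inside its range."""
        inner = sorted(p for p in split_points if self.start < p < self.end)
        bounds = [self.start, *inner, self.end]
        pieces = tuple(Query(lo, hi) for lo, hi in zip(bounds, bounds[1:]))
        return ShardedQuery(pieces)


@dataclass(frozen=True)
class ShardedQuery:
    """Wrapper marking a query list as produced by Query.shard()."""

    queries: Tuple[Query, ...]


def read_rows_sharded(sharded: ShardedQuery) -> List[Tuple[bytes, bytes]]:
    """Accept only a ShardedQuery; return the ranges that would be read."""
    if not isinstance(sharded, ShardedQuery):
        raise TypeError("expected the result of Query.shard()")
    return [(q.start, q.end) for q in sharded.queries]
```

A wrapper class lets the function reject plain lists at runtime. A type alias such as `ShardedQuery = List[Query]` would be lighter, but it is only a hint to readers and static type checkers, so a hand-built list would still be accepted.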
I think this would need to come before alpha, as it's part of the public surface.
Ok, I made a custom type for `ShardedQueries`, which should discourage people from passing their own custom queries. We can discuss more advanced changes later. Let me know what you think.