-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/contention/txnidcache: reuse blocks in list, account for space #77220
Conversation
@Azhng what do you think of this? I didn't update the tests. It seemed sort of painful. It might be a similar amount of effort to port that failing test to be datadriven. It'd be an improvement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! I will run this through kv95 to see what's the performance impact. If it looks good I will go ahead and port this to datadriven test.
Reviewable status: complete! 0 of 0 LGTMs obtained
8ac79c0
to
edf6f89
Compare
kv95 (3 node with n1-highcpu-32 nodes): Workload configuration:
TxnID Cache disabled:
TxnID Cache enabled:
|
Added datadriven test. |
inspectEvictionListShape | ||
---- | ||
blockIdx: 0, blockSize: 42 | ||
blockIdx: 1, blockSize: 84 | ||
blockIdx: 2, blockSize: 168 | ||
|
||
ensureEvictionBlockOrdering names=block1,block2,block3 | ||
---- | ||
blockIdx: 0, blockName: block1 | ||
blockIdx: 1, blockName: block2 | ||
blockIdx: 2, blockName: block3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like you're missing out on the power of datadriven tests. The point is that you visually verify your expectations when you read this file and you can rewrite it.
It seems like all of verifyBlockExistence
, inspectEvictionListShape
and ensureEvictionBlockOrdering
all should just be something like show
show
----
block1 42
block2 84
block3 168
That tells you the order and the size and the indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also removes the need for verifyBlockNonExistence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the test so that show
will output the eviction list and a dump of the in-memory map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, do you think the output of the in-memory map is a bit too verbose ?
edf6f89
to
e2b5e1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)
pkg/sql/contention/txnidcache/testdata/fifo_cache, line 67 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Hmm, do you think the output of the in-memory map is a bit too verbose ?
way too verbose, it was not my suggestion. My suggestion was just show the block names and their number of elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)
pkg/sql/contention/txnidcache/testdata/fifo_cache, line 67 at r2 (raw file):
Previously, ajwerner wrote…
way too verbose, it was not my suggestion. My suggestion was just show the block names and their number of elements.
Even the indexes are superfluous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @Azhng)
pkg/sql/contention/txnidcache/testdata/fifo_cache, line 67 at r2 (raw file):
Previously, ajwerner wrote…
Even the indexes are superfluous.
Hmm I see. Just to clarify, how should this test go about testing the content within the in-memory map? Should the check be performed while show
is populating the output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)
pkg/sql/contention/txnidcache/testdata/fifo_cache, line 67 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Hmm I see. Just to clarify, how should this test go about testing the content within the in-memory map? Should the check be performed while
show
is populating the output?
For the contents you can have a different command which probes? Were you testing the contents before?
e2b5e1e
to
fae5171
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @Azhng)
pkg/sql/contention/txnidcache/testdata/fifo_cache, line 67 at r2 (raw file):
Previously, ajwerner wrote…
For the contents you can have a different command which probes? Were you testing the contents before?
Ah, I see the where the confusion is coming from 🤦 . verifyBlockExistence
was the one that's doing the check on the in-memroy map content. Though I should definitely rename that. 👍
6ec05fe
to
2a3830b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @Azhng)
pkg/sql/contention/txnidcache/testdata/fifo_cache, line 67 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Ah, I see the where the confusion is coming from 🤦 .
verifyBlockExistence
was the one that's doing the check on the in-memroy map content. Though I should definitely rename that. 👍
Updated.
This change does two things to the txnidcache: 1) It accounts for the space used by the fifo eviction list. Previously we'd use more than double the intended space. We should probably also subtrace out the size of the buffers we're currently filling and the channel we use to communicate them, but I'll leave that for later. 2) It stops trying to compact the blocks. Compacting the blocks ends up being a good deal of overhead because we have to copy across every single message. Instead we can just append the block directly to the list. This does have the hazard of wasting a lot of space when the blocks are sparse. However, if the blocks are sparse, we know that the throughput is low, so it's fine. Resolves cockroachdb#76738 Release justification: bug fixes and low-risk updates to new functionality Release note: None
2a3830b
to
7dd272a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @Azhng)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 3 files at r1, 1 of 2 files at r5, 1 of 1 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @ajwerner)
TFTR! bors r=maryliag,ajwerner |
Build succeeded: |
Previously, TxnID cache was disabled by default due to performance issues. This has since been resolved in cockroachdb#77220. This commit re-enables TxnID Cache by default. Release justification: low risk high reward change. Release note: None
Previously, TxnID cache was disabled by default due to performance issues. This has since been resolved in cockroachdb#77220. This commit re-enables TxnID Cache by default. Release justification: low risk high reward change. Release note: None
77442: sqlproxyccl: Add codeUnavailable to the list of error codes r=alyshanjahani-crl a=alyshanjahani-crl Release justification: fixes for high-priority bug in existing functionality Previously, if a tenant cluster had maxPods set to 0 the error returned by directory.EnsureTenantAddr was not treated as a non-retryable error. The tenant directory implementation used in CockroachCloud now identifies this situation and returns a status error with codes.FailedPrecondition and a descriptive message. This patch adds a check for the FailedPrecondition error in connector.lookupAddr. Release note: None 77626: sql: re-enable txnidcache by default r=maryliag,ajwerner a=Azhng Previously, TxnID cache was disabled by default due to performance issues. This has since been resolved in #77220. This commit re-enables TxnID Cache by default. Release justification: low risk high reward change. Release note: None 77675: cmd/roachtest: tweaks to sstable corruption test r=itsbilal a=nicktrav A few minor changes to the existing roachtest: Reduce the warehouse count on ingest back to 100 - this should be ample, and it matches the warehouse count on the read path in a later part of the test. Change the stop options to wait until the process has been terminated before continuing. Re-work the command to search for SSTables to corrupt - use tables from the most recent manifest (if multiple are present); consider SSTables with either a start or end key representing a user table; shuffle tables to distribute corruption over the LSM; perform filtering in the bash command rather than in the test runner itself. Release justification: Roachtest-only change. Release note: None. Touches #77321. 77700: ccl/sqlproxyccl: add connection migration-related metrics r=JeffSwenson a=jaylim-crl #### ccl/sqlproxyccl: add metric that measures connection migration latency Previously, we added support for connection migration in #76805. This commit adds a new `proxy.conn_migration.attempted.latency` metric that tracks latency for attempted connection migrations. Having this metric would be beneficial as we can now know how long users were blocked during connection migrations. Release justification: Low-risk metric-only change within sqlproxy. Release note: None #### ccl/sqlproxyccl: add metric that records the transfer response message size Informs #76000, and follow up to #76805. This commit adds a new proxy.conn_migration.transfer_response.message_size metric that will track the distribution of the transfer response message size. This will be used to tune a value for the SQL cluster setting: sql.session_transfer.max_session_size. Release justification: Low-risk metric-only change within sqlproxy. Release note: None Co-authored-by: Alyshan Jahani <[email protected]> Co-authored-by: Azhng <[email protected]> Co-authored-by: Nick Travers <[email protected]> Co-authored-by: Jay <[email protected]>
This change does two things to the txnidcache:
we'd use more than double the intended space. We should probably also
subtrace out the size of the buffers we're currently filling and the
channel we use to communicate them, but I'll leave that for later.
being a good deal of overhead because we have to copy across every
single message. Instead we can just append the block directly to the
list. This does have the hazard of wasting a lot of space when the
blocks are sparse. However, if the blocks are sparse, we know that the
throughput is low, so it's fine.
Resolves #76738
Release justification: bug fixes and low-risk updates to new functionality
Release note: None