roachtest: Add new roachtest to test fast rebalance #103030
Labels
A-storage: Relating to our storage engine (Pebble) on-disk storage.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-storage: Storage Team
Comments
itsbilal added the C-enhancement, A-storage, and T-storage labels on May 10, 2023
One challenge with this is going to be orchestrating access to an S3 or GCS bucket to use as shared storage, and also handling things like cleanup, quotas/billing, etc. One way to work around this could be to spin up one extra node in the cluster and use that node as the blob storage, through minio or something similar running on it. cc @RaduBerinde (thanks for bringing this issue up)
itsbilal added commits to itsbilal/cockroach that referenced this issue on Jul 21, Jul 24, and Jul 25, 2023 (six pushes, all with the same commit message):
This test adds a roachtest that spins up a cluster with 3 nodes using S3 as the --experimental-shared-storage, and then adds a fourth node after loading a tpcc fixture and with a foreground workload running on it. It confirms the fourth node gets hydrated without transferring all live bytes over the wire.
Epic: none
Fixes: cockroachdb#103030
Release note: None
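The commit message says the test confirms the fourth node is hydrated without transferring all live bytes over the wire. A minimal sketch of what such a check might look like is below. It is not the roachtest's actual code: the function name, its parameters, and the 50% threshold are all hypothetical, and it assumes the two byte counts have already been read from whatever metrics the test collects.

```python
def hydrated_without_full_transfer(
    live_bytes_on_new_node: int,
    bytes_received_over_wire: int,
    max_wire_fraction: float = 0.5,  # hypothetical threshold, not from the source
) -> bool:
    """Return True if the new node holds live data but received only a
    fraction of it over the network (the rest is assumed to be served
    from shared storage)."""
    if live_bytes_on_new_node <= 0:
        # The node has no live data yet, so it has not been hydrated.
        return False
    return bytes_received_over_wire < max_wire_fraction * live_bytes_on_new_node
```

A caller would poll until the new node reports live bytes and then assert on the result, failing the test if most of the data arrived over the wire.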
craig bot pushed a commit that referenced this issue on Aug 17, 2023

107394: cmd/roachtest: add disagg-rebalance roachtest r=renatolabs a=itsbilal

This test adds a roachtest that spins up a cluster with 3 nodes using S3 as the --experimental-shared-storage, and then adds a fourth node after loading a tpcc fixture and with a foreground workload running on it. It confirms the fourth node gets hydrated without transferring all live bytes over the wire.

Epic: none
Fixes: #103030
Release note: None

108154: kvcoord: refactor ambiguous commit tests r=AlexTalks a=AlexTalks

In #107323, testing for the ambiguous write case that leads to the "transaction unexpectedly committed" bug was introduced; however, to increase test coverage of the fix, multiple schedules of operations need to be tested. This change simply refactors the framework of the existing test in order to enable the addition of multiple subtests. The subtests are included in a separate patch.

Part of: #103817
Release note: None

108819: roachtest: add a c2c cutover `TO LATEST` test r=lidorcarmel a=lidorcarmel

We only have c2c roachtests that cut over to the past, so this adds one that does a cutover to LATEST. It uses the `TO LATEST` SQL because we expect that to be used more in production.

Epic: none
Release note: None

108910: streamingccl: minor log updates and code reorg r=lidorcarmel a=stevendanna

See individual commits.

Epic: none

108914: sqlproxyccl: do not report BackendDown metrics on throttle and routing errors r=JeffSwenson,andy-kimball a=jaylim-crl

#### sqlproxyccl: do not report BackendDown metrics on throttle and routing errors

Previously, we were reporting the backend_down metric on the following errors:
- codeProxyRefusedConnection
- codeParamsRoutingFailed
- codeUnavailable

These errors do not imply that the backend is down. We originally introduced this in #57431, but looking at the PR, it appears unintentional. This commit fixes that by not reporting the backend_down metric when the proxy returns such errors.

Release note: None
Epic: none

#### sqlproxyccl: rename codeBackendDown to codeBackendDialFailed

This commit renames codeBackendDown to codeBackendDialFailed to prevent confusion among developers. Note that we don't rename the metric here to avoid breaking downstream consumers. At the same time, we will remove the old codeBackendRefusedTLS code as it does not serve any purpose, and there wasn't a metric for it either.

Release note: None
Epic: none
Release justification: This fixes accuracy issues with SQL Proxy metrics.

108920: util/log: add custom crash tags to sentry r=dhartunian a=pjtatlow

In #106786 we added the ability to provide an environment variable that was meant to add custom tags to sentry crash reports. That change added the function that would create the map of crash report tags / values, but it was never actually used. This change ensures that tags from that environment variable will actually show up in the sentry reports.

Release note: None
Epic: None

Co-authored-by: Bilal Akhtar
Co-authored-by: Alex Sarkesian
Co-authored-by: Lidor Carmel
Co-authored-by: Steven Danna
Co-authored-by: Jay
Co-authored-by: PJ Tatlow
Once #103028 is complete, a roachtest that creates a cluster, loads a fixture, and adds a node or two (and ensures they catch up in replica count without crashing) would be good to have as an end-to-end test for disaggregated ingestions / fast rebalances.
Jira issue: CRDB-27802
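The "catch up in replica count" condition in the description could be expressed roughly as follows. This is only an illustrative sketch under stated assumptions: the function name, the node-ID-to-replica-count mapping, and the 20% tolerance are hypothetical and do not come from the issue or from the roachtest framework.

```python
from typing import Mapping, Set


def new_nodes_caught_up(
    replica_counts: Mapping[int, int],  # node ID -> replica count (assumed input)
    new_nodes: Set[int],
    tolerance: float = 0.2,  # hypothetical slack below the existing-node average
) -> bool:
    """Return True if every newly added node holds at least
    (1 - tolerance) times the average replica count of the original nodes."""
    existing = [c for n, c in replica_counts.items() if n not in new_nodes]
    if not existing or not new_nodes:
        # Nothing to compare against, or no new nodes to check.
        return False
    target = sum(existing) / len(existing)
    return all(
        replica_counts.get(n, 0) >= (1 - tolerance) * target for n in new_nodes
    )
```

In a test, this would typically be evaluated in a retry loop with a deadline, since rebalancing onto the added nodes takes time.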