roachtest: replicate/wide failed #98385
Comments
xref #97794
I've tried reproducing this 5 times and haven't been able to. There are a couple of issues; however, the cause of this failure is the default span config being used (3x replication). This only occurs for a short period right after the circuit breakers reset and connections are established to the rest of the nodes. The default span config causes the replicate queue to remove voters, despite the actual replication factor being 9x. Voter removal stops once the span config for the range is the correct one and not the default (3x). Whilst I wasn't able to reproduce the failure, I was able to reproduce the default span config being seen for a short period (much shorter than in the test failure). Consider this timeline of events:
The default replication factor can be seen in the distribution logging, specifically:

Distribution Logs
I230310 22:07:04.281788 13756 13@kv/kvserver/replicate_queue.go:626 ⋮ [T1,n3,replicate,s3,r24/10:‹/Table/2{2-3}›] 2880 repair needed (‹range unavailable›), enqueuing
... gossip established, node liveness established, default span config returned for all ranges
I230310 22:07:04.288304 13768 13@kv/kvserver/allocator/allocatorimpl/allocator.go:1034 ⋮ [T1,n3,replicate,s3,r55/2:‹/Table/5{4-5}›] 2881 ‹remove voter› - need=3, have=9, priority=799.00
I230310 22:07:04.288355 13768 13@kv/kvserver/replicate_queue.go:626 ⋮ [T1,n3,replicate,s3,r55/2:‹/Table/5{4-5}›] 2882 repair needed (‹remove voter›), enqueuing
... span config watcher starts up, correct span config returned for all ranges
I230310 22:07:04.374981 13871 13@kv/kvserver/replicate_queue.go:664 ⋮ [T1,n3,replicate,s3,r25/12:‹/Table/2{3-4}›] 2889 no rebalance target found, not enqueuing

Cockroach Logs - these show when gossip, liveness and eventually span configs are established:
I230310 22:07:04.061346 13344 gossip/client.go:124 ⋮ [T1,n3] 127 started gossip client to n0 (‹35.229.107.180:26257›)
I230310 22:07:04.062136 13344 gossip/client.go:129 ⋮ [T1,n3] 128 closing client to n1 (‹35.229.107.180:26257›): stopping outgoing client to n1 (‹35.229.107.180:26257›); already have incoming
I230310 22:07:04.062195 124 1@gossip/gossip.go:1420 ⋮ [T1,n3] 129 node has connected to cluster via gossip
I230310 22:07:04.062779 124 kv/kvserver/stores.go:282 ⋮ [T1,n3] 130 wrote 4 node addresses to persistent storage
I230310 22:07:04.072183 13358 kv/kvserver/liveness/liveness.go:1224 ⋮ [T1,n3,s3,r3/8:‹/System/{NodeLive…-tsd}›] 131 incremented n4 liveness epoch to 3
I230310 22:07:04.324725 13799 kv/kvclient/rangefeed/rangefeedcache/watcher.go:335 ⋮ [T1,n3] 132 spanconfig-subscriber: established range feed cache
I230310 22:07:04.371446 36 1@util/log/event_log.go:32 ⋮ [T1,n3] 133 ={"Timestamp":1678486024371429052,"EventType":"node_restart","NodeID":3,"StartedAt":1678486016052306715,"LastUp":1678486006479237816}
I230310 22:07:04.372991 13824 kv/kvclient/rangefeed/rangefeedcache/watcher.go:335 ⋮ [T1,n3] 134 settings-watcher: established range feed cache

A fix for this is to never use the default span config and instead error out very loudly. If a store is connected to gossip and liveness but is unable to retrieve a span config for a range, we can't make any decisions. Using incorrect span configs seems worse than just doing nothing in many cases. Another issue is that when restarting every node, the RPC circuit breakers kick in, causing essentially a 15-second stall.
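As a rough sketch of that proposal, here is a minimal, self-contained Go example; every name in it (spanConfig, configReader, configFor) is hypothetical and not CockroachDB's actual API. It just shows a config source that errors out loudly when it has not yet subscribed, so callers like the replicate queue skip work instead of acting on the static 3x default:

```go
// Hypothetical sketch (not CockroachDB's real internals): refuse to hand out
// a span config until the subscription is established, rather than silently
// falling back to the statically defined default (3x replication).
package main

import (
	"errors"
	"fmt"
)

// spanConfig stands in for the per-range configuration the replicate queue
// consults (replication factor, constraints, etc.).
type spanConfig struct {
	numReplicas int
}

// configReader models a span config source that may not be subscribed yet,
// e.g. right after a node restart before the rangefeed-backed watcher is up.
type configReader struct {
	subscribed bool
	configs    map[string]spanConfig // keyed by range start key, for illustration
}

var errNotSubscribed = errors.New("span configs not yet available; refusing to fall back to the static default")

// configFor returns the range's span config, or an error if the reader has
// not yet subscribed. Erroring loudly here keeps the replicate queue from
// planning "remove voter" actions against the default 3x config.
func (r *configReader) configFor(rangeKey string) (spanConfig, error) {
	if !r.subscribed {
		return spanConfig{}, errNotSubscribed
	}
	cfg, ok := r.configs[rangeKey]
	if !ok {
		return spanConfig{}, fmt.Errorf("no span config found for range %q", rangeKey)
	}
	return cfg, nil
}

func main() {
	r := &configReader{subscribed: false}
	if _, err := r.configFor("/Table/54"); err != nil {
		// The queue would skip (and log) instead of downreplicating 9x -> 3x.
		fmt.Println("skipping replicate queue processing:", err)
	}
}
```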
Going to get rid of the release blocker since this bug exists in older releases already.
Filed #98421
roachtest.replicate/wide failed with artifacts on master @ e4924e2b9be4a36d466beab53a80df9241df4783:
Parameters: |
Adding the |
kvserver: disable {split,replicate,mvccGC} queues until subscribed to span configs

Do the same for the store rebalancer. We applied this treatment for the merge queue back in cockroachdb#78122 since the fallback behavior, if not subscribed, is to use the statically defined span config for every operation.

- For the replicate queue this meant obtusely applying a replication factor of 3, regardless of configuration. This was typically possible post node restart, before the subscription was initially established. We saw this in cockroachdb#98385. It was then possible for us to ignore configured voter/non-voter/lease constraints.
- For the split queue, we wouldn't actually compute any split keys if unsubscribed, so the missing check was somewhat benign. But we would be obtusely applying the default range sizes [128 MiB, 512 MiB], so for clusters configured with larger range sizes, this could lead to a burst of splitting post node-restart.
- For the MVCC GC queue, it would mean applying the statically configured default GC TTL and ignoring any set protected timestamps. The latter is best-effort protection but could result in internal operations relying on protection (like backups, changefeeds) failing informatively. For clusters configured with a GC TTL greater than the default, post node-restart it could lead to a burst of MVCC GC activity and AOST queries failing to find expected data.
- For the store rebalancer, ignoring span configs could result in violating lease preferences and voter/non-voter constraints.

Fixes cockroachdb#98421. Fixes cockroachdb#98385.

While here, we also introduce the following non-public cluster settings to selectively enable/disable KV queues:
- kv.mvcc_gc_queue.enabled
- kv.split_queue.enabled
- kv.replicate_queue.enabled

Release note (bug fix): It was previously possible for CockroachDB to not respect non-default zone configs. This could only happen for a short window after nodes with existing replicas were restarted (measured in seconds), and self-rectified (also within seconds). This manifested in a few ways:

- If num_replicas was set to something other than 3, we would still add or remove replicas to get to 3x replication.
- If num_voters was set explicitly to get a mix of voting and non-voting replicas, it would be ignored. CockroachDB could possibly remove non-voting replicas.
- If range_min_bytes or range_max_bytes were changed from 128 MiB and 512 MiB respectively, we would instead try to size ranges to be within [128 MiB, 512 MiB]. This could appear as an excess amount of range splits or merges, as visible in the Replication Dashboard under "Range Operations".
- If gc.ttlseconds was set to something other than 90000 seconds, we would still GC data only older than 90000s/25h. If the GC TTL was set to something larger than 25h, AOST queries going further back may now start failing. For GC TTLs less than the 25h default, clusters would observe increased disk usage due to more retained garbage.
- If constraints, lease_preferences or voter_constraints were set, they would be ignored. Range data and leases would possibly be moved outside where prescribed.

This issue only lasted a few seconds post node-restart, and any zone config violations were rectified shortly after.
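To illustrate the shape of this gating, here is a minimal, hypothetical Go sketch. The type and method names (store, queue, shouldProcess) are invented for this example and are not CockroachDB's actual kvserver code; the enabled flag merely stands in for the non-public kv.<queue>.enabled cluster settings mentioned above.

```go
// Hypothetical sketch of the gating described in the commit message above;
// type and method names are illustrative, not CockroachDB's real internals.
package main

import "fmt"

// store exposes whether the span config subscription has been established
// (e.g. the rangefeed-backed watcher has caught up after a restart).
type store struct {
	spanConfigsSubscribed bool
}

// queue is a stand-in for the replicate/split/MVCC GC queues. enabled mirrors
// the idea behind the non-public kv.<queue>.enabled cluster settings.
type queue struct {
	name    string
	enabled bool
	store   *store
}

// shouldProcess reports whether the queue may act on a replica right now.
// Until span configs are subscribed, acting would mean using the static
// default config (3x replication, default range sizes, default GC TTL).
func (q *queue) shouldProcess() (bool, string) {
	if !q.enabled {
		return false, "queue disabled via cluster setting"
	}
	if !q.store.spanConfigsSubscribed {
		return false, "span configs not yet subscribed; skipping to avoid acting on static defaults"
	}
	return true, ""
}

func main() {
	s := &store{spanConfigsSubscribed: false}
	q := &queue{name: "replicate", enabled: true, store: s}

	if ok, reason := q.shouldProcess(); !ok {
		fmt.Printf("%s queue: skipping processing (%s)\n", q.name, reason)
	}

	// Once the subscription is established, the queue resumes normally.
	s.spanConfigsSubscribed = true
	if ok, _ := q.shouldProcess(); ok {
		fmt.Printf("%s queue: processing with per-range span configs\n", q.name)
	}
}
```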
roachtest.replicate/wide failed with artifacts on master @ d4a584e49f0b1ca89738376090939d7669c3b3db:
Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=1, ROACHTEST_encrypted=false, ROACHTEST_fs=ext4, ROACHTEST_localSSD=true, ROACHTEST_ssd=0
Jira issue: CRDB-25240