
roachtest: replicate/wide failed #98385

Closed
cockroach-teamcity opened this issue Mar 10, 2023 · 6 comments · Fixed by #98422
Assignees
Labels
A-kv-distribution Relating to rebalancing and leasing. branch-master Failures and bugs on the master branch. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). GA-blocker O-roachtest O-robot Originated from a bot. T-kv KV Team
Milestone

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Mar 10, 2023

roachtest.replicate/wide failed with artifacts on master @ d4a584e49f0b1ca89738376090939d7669c3b3db:

test artifacts and logs in: /artifacts/replicate/wide/run_1
(allocator.go:440).runWideReplication: expected 0 mis-replicated ranges, but found 19

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=1 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/kv-triage

This test on roachdash

Jira issue: CRDB-25240

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 10, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Mar 10, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Mar 10, 2023
@kvoli
Collaborator

kvoli commented Mar 10, 2023

xref #97794

@kvoli
Collaborator

kvoli commented Mar 10, 2023

I've tried reproducing this 5 times and haven't been able to.

There are a couple of issues here, but the cause of this failure is the default span config (3x replication) being used. This only occurs for a short period right after the circuit breakers reset and a connection is established to the rest of the nodes. The default span config causes the replicate queue to remove voters, despite the configured replication factor being 9x. Voter removal stops once the range sees its correct span config rather than the default (3x).

Whilst I wasn't able to reproduce the failure, I was able to reproduce the default span config being seen for a short period (much shorter than in the test failure).

Consider this timeline of events:

restart node
circuit breakers trip
circuit breakers reset 
gossip connection established and liveness established
default span config applied
... remove voter actions returned.
... remove voter actions acted upon causing 8x replication instead of 9x
spanconfig-subscriber established
... consider rebalance actions or add voter returned (for replicas removed)

The default replication factor can be seen in the distribution logging, specifically ‹remove voter› - need=3, have=9.

Distribution Logs
I230310 22:07:04.281788 13756 13@kv/kvserver/replicate_queue.go:626 ⋮ [T1,n3,replicate,s3,r24/10:‹/Table/2{2-3}›] 2880  repair needed (‹range unavailable›), enqueuing

... gossip established, node liveness established, default span config returned for all ranges

I230310 22:07:04.288304 13768 13@kv/kvserver/allocator/allocatorimpl/allocator.go:1034 ⋮ [T1,n3,replicate,s3,r55/2:‹/Table/5{4-5}›] 2881  ‹remove voter› - need=3, have=9, priority=799.00
I230310 22:07:04.288355 13768 13@kv/kvserver/replicate_queue.go:626 ⋮ [T1,n3,replicate,s3,r55/2:‹/Table/5{4-5}›] 2882  repair needed (‹remove voter›), enqueuing

... span config watcher starts up, correct span config returned for all ranges

I230310 22:07:04.374981 13871 13@kv/kvserver/replicate_queue.go:664 ⋮ [T1,n3,replicate,s3,r25/12:‹/Table/2{3-4}›] 2889  no rebalance target found, not enqueuing

Cockroach Logs - these show when gossip, liveness and eventually span configs are established:

I230310 22:07:04.061346 13344 gossip/client.go:124 ⋮ [T1,n3] 127  started gossip client to n0 (‹35.229.107.180:26257›)
I230310 22:07:04.062136 13344 gossip/client.go:129 ⋮ [T1,n3] 128  closing client to n1 (‹35.229.107.180:26257›): stopping outgoing client to n1 (‹35.229.107.180:26257›); already have incoming
I230310 22:07:04.062195 124 1@gossip/gossip.go:1420 ⋮ [T1,n3] 129  node has connected to cluster via gossip
I230310 22:07:04.062779 124 kv/kvserver/stores.go:282 ⋮ [T1,n3] 130  wrote 4 node addresses to persistent storage
I230310 22:07:04.072183 13358 kv/kvserver/liveness/liveness.go:1224 ⋮ [T1,n3,s3,r3/8:‹/System/{NodeLive…-tsd}›] 131  incremented n4 liveness epoch to 3
I230310 22:07:04.324725 13799 kv/kvclient/rangefeed/rangefeedcache/watcher.go:335 ⋮ [T1,n3] 132  spanconfig-subscriber: established range feed cache
I230310 22:07:04.371446 36 1@util/log/event_log.go:32 ⋮ [T1,n3] 133 ={"Timestamp":1678486024371429052,"EventType":"node_restart","NodeID":3,"StartedAt":1678486016052306715,"LastUp":1678486006479237816}
I230310 22:07:04.372991 13824 kv/kvclient/rangefeed/rangefeedcache/watcher.go:335 ⋮ [T1,n3] 134  settings-watcher: established range feed cache

A fix for this is to never use the default span config and instead just error out very loudly.

If a store is connected to gossip and liveness but is unable to retrieve a span config for a range, we can't make any decisions for that range.

Using incorrect span configs seems worse than just doing nothing in many cases.
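To make the shape of that fix concrete, here is a minimal Go sketch of the idea, assuming a hypothetical subscriber/SpanConfig pair rather than the real spanconfig and queue APIs: refuse to hand out a config (and skip the range) until the subscription is established, instead of falling back to the 3x default.

```
// Illustrative sketch only: these types are hypothetical stand-ins, not
// CockroachDB's actual spanconfig or queue APIs.
package main

import (
	"errors"
	"fmt"
)

// SpanConfig carries the replication factor the allocator should target.
type SpanConfig struct {
	NumReplicas int
}

// subscriber reports whether the store has an up-to-date view of span configs.
type subscriber struct {
	subscribed bool
}

var errNotSubscribed = errors.New("span configs not yet subscribed; skipping repair decisions")

// configFor returns the span config to use for allocator decisions. Instead of
// silently falling back to the static 3x default, it refuses to answer (and the
// caller skips the range) until the subscription is established.
func configFor(s subscriber, conf SpanConfig) (SpanConfig, error) {
	if !s.subscribed {
		return SpanConfig{}, errNotSubscribed
	}
	return conf, nil
}

func main() {
	nineX := SpanConfig{NumReplicas: 9}

	// Before the subscription is established: no decision, so no voters removed.
	if _, err := configFor(subscriber{subscribed: false}, nineX); err != nil {
		fmt.Println("skipping:", err)
	}

	// Once subscribed: the configured 9x replication factor is honored.
	if conf, err := configFor(subscriber{subscribed: true}, nineX); err == nil {
		fmt.Println("target replicas:", conf.NumReplicas)
	}
}
```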

Another issue is that when every node is restarted, the RPC circuit breakers kick in, causing essentially a 15-second stall.

@kvoli kvoli added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 10, 2023
@kvoli kvoli added release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. and removed GA-blocker labels Mar 10, 2023
@irfansharif irfansharif assigned irfansharif and unassigned kvoli Mar 10, 2023
@irfansharif irfansharif removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Mar 10, 2023
@irfansharif
Contributor

Going to get rid of the release blocker since this bug exists in older releases already.

@kvoli
Collaborator

kvoli commented Mar 10, 2023

Filed #98421

irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 11, 2023
...subscribed to span configs. Do the same for the store
rebalancer. We applied this treatment for the merge queue back in cockroachdb#78122
since the fallback behavior, if not subscribed, is to use the statically
defined span config for every operation.

- For the replicate queue this meant obtusely applying a replication
  factor of 3, regardless of configuration. This was possible typically
  post node restart before subscription was initially established. We
  saw this in cockroachdb#98385. It was possible then for us to ignore configured
  voter/non-voter/lease constraints.
- For the split queue, we wouldn't actually compute any split keys if
  unsubscribed, so the missing check was somewhat benign. But we would
  be obtusely applying the default range sizes [128MiB,512MiB], so for
  clusters configured with larger range sizes, this could lead to a
  burst of splitting post node-restart.
- For the MVCC GC queue, it would mean applying the statically
  configured default GC TTL and ignoring any set protected timestamps.
  The latter is best-effort protection but could result in internal
  operations relying on protection (like backups, changefeeds) failing
  informatively. For clusters configured with GC TTL greater than the
  default, post node-restart it could lead to a burst of MVCC GC
  activity and AOST queries failing to find expected data.
- For the store rebalancer, ignoring span configs could result in
  violating lease preferences and voter constraints.

Fixes cockroachdb#98421.
Fixes cockroachdb#98385.

Release note (bug fix): It was previously possible for CockroachDB to
not respect non-default zone configs. This only happened for a short
window after nodes with existing replicas were restarted, and
self-rectified within seconds. This manifested in a few ways:
- If num_replicas was set to something other than 3, we would still
  add or remove replicas to get to 3x replication.
  - If num_voters was set explicitly to get a mix of voting and
    non-voting replicas, it would be ignored. CockroachDB could possibly
    remove non-voting replicas.
- If range_min_bytes or range_max_bytes were changed from 128 MiB and
  512 MiB respectively, we would instead try to size ranges to be within
  [128 MiB, 512MiB]. This could appear as an excess amount of range
  splits or merges, as visible in the Replication Dashboard under "Range
  Operations".
- If gc.ttlseconds was set to something other than 90000 seconds, we
  would still GC data only older than 90000s/25h. If the GC TTL was set
  to something larger than 25h, AOST queries going further back may now
  start failing. For GC TTLs less than the 25h default, clusters would
  observe increased disk usage due to more retained garbage.
- If constraints, lease_preferences or voter_constraints were set, they
  would be ignored. Range data and leases would possibly be moved
  outside where prescribed.
This issue only lasted a few seconds post node-restarts, and any zone
config violations were rectified shortly after.
@erikgrinaker erikgrinaker added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-distribution Relating to rebalancing and leasing. labels Mar 11, 2023
@cockroach-teamcity
Member Author

roachtest.replicate/wide failed with artifacts on master @ e4924e2b9be4a36d466beab53a80df9241df4783:

test artifacts and logs in: /artifacts/replicate/wide/run_1
(test_runner.go:990).runTest: test timed out (10m0s)
(cluster.go:1896).Start: ~ COCKROACH_CONNECT_TIMEOUT=0 ./cockroach sql --url 'postgres://root@localhost:26257?sslmode=disable' -e "CREATE SCHEDULE IF NOT EXISTS test_only_backup FOR BACKUP INTO 'gs://cockroachdb-backup-testing/roachprod-scheduled-backups/teamcity-9029456-1678684872-112-n9cpu1/1678723870068004190?AUTH=implicit' RECURRING '*/15 * * * *' FULL BACKUP '@hourly' WITH SCHEDULE OPTIONS first_run = 'now'"
ERROR: server closed the connection.
Is this a CockroachDB node?
unexpected EOF
Failed running "sql": COMMAND_PROBLEM: ssh verbose log retained in ssh_161110.068043100_n1_init-backup-schedule.log: exit status 1

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=1 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash

@nvanbenschoten
Member

Adding the GA-blocker back in for tracking purposes, though it looks like this will be closed shortly anyway.

craig bot pushed a commit that referenced this issue Mar 13, 2023
98261: sql: add crdb_internal views for system statistics tables r=ericharmeling a=ericharmeling

This commit adds two new crdb_internal views:

- crdb_internal.statement_statistics_persisted, which surfaces the system.statement_statistics table
- crdb_internal.transaction_statistics_persisted, which surfaces the system.transaction_statistics table

Example output from after flush:

```
[email protected]:26257/insights> select * from crdb_internal.statement_statistics_persisted limit 3;
-[ RECORD 1 ]
aggregated_ts              | 2023-03-08 23:00:00+00
fingerprint_id             | \x3ab7869b0bc4aa5a
transaction_fingerprint_id | \x95d43bd78dc51d85
plan_hash                  | \x9aec25074eb1e3a0
app_name                   | $ cockroach sql
node_id                    | 1
agg_interval               | 01:00:00
metadata                   | {"db": "insights", "distsql": true, "failed": false, "fullScan": true, "implicitTxn": true, "query": "SELECT * FROM crdb_internal.statement_statistics_persisted", "querySummary": "SELECT * FROM crdb_internal.statement_statis...", "stmtTyp": "TypeDML", "vec": true}
statistics                 | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "cpuSQLNanos": {"mean": 24667, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 2.048E+4, "sqDiff": 0}, "mvccIteratorStats": {"blockBytes": {"mean": 0, "sqDiff": 0}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 0, "sqDiff": 0}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 1, "sqDiff": 0}, "seekCountInternal": {"mean": 1, "sqDiff": 0}, "stepCount": {"mean": 0, "sqDiff": 0}, "stepCountInternal": {"mean": 0, "sqDiff": 0}, "valueBytes": {"mean": 0, "sqDiff": 0}}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "index_recommendations": [], "statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 1, "firstAttemptCnt": 1, "idleLat": {"mean": 0, "sqDiff": 0}, "indexes": ["42@1"], "lastErrorCode": "", "lastExecAt": "2023-03-08T23:14:04.614242Z", "latencyInfo": {"max": 0.001212208, "min": 0.001212208, "p50": 0, "p90": 0, "p99": 0}, "maxRetries": 0, "nodes": [1], "numRows": {"mean": 0, "sqDiff": 0}, "ovhLat": {"mean": 0.000007791999999999955, "sqDiff": 0}, "parseLat": {"mean": 0.000016666, "sqDiff": 0}, "planGists": ["AgFUAgD/FwAAAAcYBhg="], "planLat": {"mean": 0.000691666, "sqDiff": 0}, "regions": ["us-east1"], "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 0, "sqDiff": 0}, "runLat": {"mean": 0.000496084, "sqDiff": 0}, "svcLat": {"mean": 0.001212208, "sqDiff": 0}}}
plan                       | {"Children": [], "Name": ""}
index_recommendations      | {}
indexes_usage              | ["42@1"]
-[ RECORD 2 ]
aggregated_ts              | 2023-03-08 23:00:00+00
fingerprint_id             | \x44c9fdb49be676cf
transaction_fingerprint_id | \xc1efcc0bba0909f8
plan_hash                  | \x780a1ba35976b15d
app_name                   | insights
node_id                    | 1
agg_interval               | 01:00:00
metadata                   | {"db": "insights", "distsql": false, "failed": false, "fullScan": false, "implicitTxn": false, "query": "UPDATE insights_workload_table_0 SET balance = balance + $1 WHERE id = $1", "querySummary": "UPDATE insights_workload_table_0 SET balance = balan... WHERE id = $1", "stmtTyp": "TypeDML", "vec": true}
statistics                 | {"execution_statistics": {"cnt": 28, "contentionTime": {"mean": 0, "sqDiff": 0}, "cpuSQLNanos": {"mean": 402538.75, "sqDiff": 1160598792985.25}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 4.096E+4, "sqDiff": 0}, "mvccIteratorStats": {"blockBytes": {"mean": 31570.321428571428, "sqDiff": 20932497128.107143}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 6.857142857142857, "sqDiff": 435.42857142857133}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 2, "sqDiff": 0}, "seekCountInternal": {"mean": 2, "sqDiff": 0}, "stepCount": {"mean": 0, "sqDiff": 0}, "stepCountInternal": {"mean": 4.857142857142857, "sqDiff": 435.42857142857133}, "valueBytes": {"mean": 360.32142857142856, "sqDiff": 756476.107142857}}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "index_recommendations": [], "statistics": {"bytesRead": {"mean": 159.04887361588396, "sqDiff": 3909.7441771668127}, "cnt": 2619, "firstAttemptCnt": 2619, "idleLat": {"mean": 0.021495726165330273, "sqDiff": 36.39774900003032}, "indexes": ["106@1"], "lastErrorCode": "", "lastExecAt": "2023-03-08T23:31:03.079093Z", "latencyInfo": {"max": 1.724660916, "min": 0.0001765, "p50": 0.000757916, "p90": 0.00191375, "p99": 0.004730417}, "maxRetries": 0, "nodes": [1], "numRows": {"mean": 1, "sqDiff": 0}, "ovhLat": {"mean": 0.0000018584035891561339, "sqDiff": 3.132932109484058E-7}, "parseLat": {"mean": 0, "sqDiff": 0}, "planGists": ["AgHUAQIADwIAAAcKBQoh1AEAAA=="], "planLat": {"mean": 0.0002562748900343638, "sqDiff": 0.0002118085353898781}, "regions": ["us-east1"], "rowsRead": {"mean": 1, "sqDiff": 0}, "rowsWritten": {"mean": 1, "sqDiff": 0}, "runLat": {"mean": 0.0024048477613592997, "sqDiff": 4.850230671161608}, "svcLat": {"mean": 0.0026629810549828195, "sqDiff": 4.852464499918359}}}
plan                       | {"Children": [], "Name": ""}
index_recommendations      | {}
indexes_usage              | ["106@1"]
-[ RECORD 3 ]
aggregated_ts              | 2023-03-08 23:00:00+00
fingerprint_id             | \x54202c7b75a5ba87
transaction_fingerprint_id | \x0000000000000000
plan_hash                  | \xbee0e52ec8c08bdd
app_name                   | $$ $ cockroach demo
node_id                    | 1
agg_interval               | 01:00:00
metadata                   | {"db": "insights", "distsql": false, "failed": false, "fullScan": false, "implicitTxn": false, "query": "INSERT INTO system.jobs(id, created, status, payload, progress, claim_session_id, claim_instance_id, job_type) VALUES ($1, $1, __more1_10__)", "querySummary": "INSERT INTO system.jobs(id, created, st...)", "stmtTyp": "TypeDML", "vec": true}
statistics                 | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "cpuSQLNanos": {"mean": 300625, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 1.024E+4, "sqDiff": 0}, "mvccIteratorStats": {"blockBytes": {"mean": 0, "sqDiff": 0}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 0, "sqDiff": 0}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 0, "sqDiff": 0}, "seekCountInternal": {"mean": 0, "sqDiff": 0}, "stepCount": {"mean": 0, "sqDiff": 0}, "stepCountInternal": {"mean": 0, "sqDiff": 0}, "valueBytes": {"mean": 0, "sqDiff": 0}}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "index_recommendations": [], "statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 1, "firstAttemptCnt": 1, "idleLat": {"mean": 9223372036.854776, "sqDiff": 0}, "indexes": [], "lastErrorCode": "", "lastExecAt": "2023-03-08T23:13:25.132671Z", "latencyInfo": {"max": 0.000589375, "min": 0.000589375, "p50": 0, "p90": 0, "p99": 0}, "maxRetries": 0, "nodes": [1], "numRows": {"mean": 1, "sqDiff": 0}, "ovhLat": {"mean": 0.0000016249999999999988, "sqDiff": 0}, "parseLat": {"mean": 0, "sqDiff": 0}, "planGists": ["AiACHgA="], "planLat": {"mean": 0.000203792, "sqDiff": 0}, "regions": ["us-east1"], "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 1, "sqDiff": 0}, "runLat": {"mean": 0.000383958, "sqDiff": 0}, "svcLat": {"mean": 0.000589375, "sqDiff": 0}}}
plan                       | {"Children": [], "Name": ""}
index_recommendations      | {}
indexes_usage              | []

Time: 4ms total (execution 3ms / network 1ms)

[email protected]:26257/insights> select * from crdb_internal.transaction_statistics_persisted limit 3;
-[ RECORD 1 ]
aggregated_ts  | 2023-03-08 23:00:00+00
fingerprint_id | \x17d80cf128571d63
app_name       | $ internal-migration-job-mark-job-succeeded
node_id        | 1
agg_interval   | 01:00:00
metadata       | {"stmtFingerprintIDs": ["b8bbb1bdae56aabc"]}
statistics     | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "cpuSQLNanos": {"mean": 64709, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 1.024E+4, "sqDiff": 0}, "mvccIteratorStats": {"blockBytes": {"mean": 0, "sqDiff": 0}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 0, "sqDiff": 0}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 0, "sqDiff": 0}, "seekCountInternal": {"mean": 0, "sqDiff": 0}, "stepCount": {"mean": 0, "sqDiff": 0}, "stepCountInternal": {"mean": 0, "sqDiff": 0}, "valueBytes": {"mean": 0, "sqDiff": 0}}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 6, "commitLat": {"mean": 0, "sqDiff": 0}, "idleLat": {"mean": 0, "sqDiff": 0}, "maxRetries": 0, "numRows": {"mean": 1, "sqDiff": 0}, "retryLat": {"mean": 0, "sqDiff": 0}, "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 1, "sqDiff": 0}, "svcLat": {"mean": 0.00026919450000000006, "sqDiff": 1.7615729685500012E-8}}}
-[ RECORD 2 ]
aggregated_ts  | 2023-03-08 23:00:00+00
fingerprint_id | \x2b024f7e2567a238
app_name       | $ internal-get-job-row
node_id        | 1
agg_interval   | 01:00:00
metadata       | {"stmtFingerprintIDs": ["8461f232a36615e7"]}
statistics     | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "cpuSQLNanos": {"mean": 50835, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 3.072E+4, "sqDiff": 0}, "mvccIteratorStats": {"blockBytes": {"mean": 0, "sqDiff": 0}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 3, "sqDiff": 0}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 1, "sqDiff": 0}, "seekCountInternal": {"mean": 1, "sqDiff": 0}, "stepCount": {"mean": 3, "sqDiff": 0}, "stepCountInternal": {"mean": 3, "sqDiff": 0}, "valueBytes": {"mean": 186, "sqDiff": 0}}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "statistics": {"bytesRead": {"mean": 284.81818181818176, "sqDiff": 3465.636363636355}, "cnt": 11, "commitLat": {"mean": 0.000003469727272727273, "sqDiff": 4.946789218181818E-11}, "idleLat": {"mean": 0, "sqDiff": 0}, "maxRetries": 0, "numRows": {"mean": 1, "sqDiff": 0}, "retryLat": {"mean": 0, "sqDiff": 0}, "rowsRead": {"mean": 1, "sqDiff": 0}, "rowsWritten": {"mean": 0, "sqDiff": 0}, "svcLat": {"mean": 0.0006771060909090909, "sqDiff": 8.91510436082909E-7}}}
-[ RECORD 3 ]
aggregated_ts  | 2023-03-08 23:00:00+00
fingerprint_id | \x37e130a1df20d299
app_name       | $ internal-create-stats
node_id        | 1
agg_interval   | 01:00:00
metadata       | {"stmtFingerprintIDs": ["98828ded59216546"]}
statistics     | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "cpuSQLNanos": {"mean": 11875, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 1.024E+4, "sqDiff": 0}, "mvccIteratorStats": {"blockBytes": {"mean": 0, "sqDiff": 0}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 0, "sqDiff": 0}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 0, "sqDiff": 0}, "seekCountInternal": {"mean": 0, "sqDiff": 0}, "stepCount": {"mean": 0, "sqDiff": 0}, "stepCountInternal": {"mean": 0, "sqDiff": 0}, "valueBytes": {"mean": 0, "sqDiff": 0}}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 1, "commitLat": {"mean": 0.000002291, "sqDiff": 0}, "idleLat": {"mean": 0, "sqDiff": 0}, "maxRetries": 0, "numRows": {"mean": 0, "sqDiff": 0}, "retryLat": {"mean": 0, "sqDiff": 0}, "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 0, "sqDiff": 0}, "svcLat": {"mean": 0.008680208, "sqDiff": 0}}}

Time: 3ms total (execution 2ms / network 1ms)
```

Epic: none

Release note: Added two views to the crdb_internal catalog: crdb_internal.statement_statistics_persisted, which surfaces data in the persisted system.statement_statistics table, and crdb_internal.transaction_statistics_persisted, which surfaces the system.transaction_statistics table.

98422: kvserver: disable {split,replicate,mvccGC} queues until... r=irfansharif a=irfansharif

...subscribed to span configs. Do the same for the store rebalancer. We applied this treatment for the merge queue back in #78122 since the fallback behavior, if not subscribed, is to use the statically defined span config for every operation.

- For the replicate queue this meant obtusely applying a replication factor of 3, regardless of configuration. This was possible typically post node restart before subscription was initially established. We saw this in #98385. It was possible then for us to ignore configured voter/non-voter/lease constraints.
- For the split queue, we wouldn't actually compute any split keys if unsubscribed, so the missing check was somewhat benign. But we would be obtusely applying the default range sizes [128MiB,512MiB], so for clusters configured with larger range sizes, this could lead to a burst of splitting post node-restart.
- For the MVCC GC queue, it would mean applying the statically configured default GC TTL and ignoring any set protected timestamps. The latter is best-effort protection but could result in internal operations relying on protection (like backups, changefeeds) failing informatively. For clusters configured with GC TTL greater than the default, post node-restart it could lead to a burst of MVCC GC activity and AOST queries failing to find expected data.
- For the store rebalancer, ignoring span configs could result in violating lease preferences and voter constraints.

Fixes #98421.
Fixes #98385.

Release note (bug fix): It was previously possible for CockroachDB to not respect non-default zone configs. This only happened for a short window after nodes with existing replicas were restarted, and self-rectified within seconds. This manifested in a few ways:
- If num_replicas was set to something other than 3, we would still add or remove replicas to get to 3x replication.
  - If num_voters was set explicitly to get a mix of voting and non-voting replicas, it would be ignored. CockroachDB could possibly remove non-voting replicas.
- If range_min_bytes or range_max_bytes were changed from 128 MiB and 512 MiB respectively, we would instead try to size ranges to be within [128 MiB, 512MiB]. This could appear as an excess amount of range splits or merges, as visible in the Replication Dashboard under "Range Operations".
- If gc.ttlseconds was set to something other than 90000 seconds, we would still GC data only older than 90000s/25h. If the GC TTL was set to something larger than 25h, AOST queries going further back may now start failing. For GC TTLs less than the 25h default, clusters would observe increased disk usage due to more retained garbage.
- If constraints, lease_preferences or voter_constraints were set, they would be ignored. Range data and leases would possibly be moved outside where prescribed. This issue only lasted a few seconds post node-restarts, and any zone config violations were rectified shortly after.

98468: sql: add closest-instance physical planning r=dt a=dt

This changes physical planning, specifically how the SQL instance for a
given KV node ID is resolved, to be more generalized w.r.t. different
locality tier taxonomies.

Previously this function had a special case that checked for, and only
for, a specific locality tier with the key "region" and if it was
found, picked a random instance from the subset of instances where their
value for that matched the value for the KV node.

Matching on and only on the "region" tier is both too specific and not
specific enough: it is "too specific" in that it requires a tier with
the key "region" to be used and to match, and is "not specific enough"
in that it simultaneously ignores more specific locality tiers that
would indicate closer matches (e.g. subregion, AZ, data-center or rack).

Instead, this change generalizes this function to identify the subset of
instances that have the "closest match" in localities to the KV node and
pick one of them, where closest match is defined as the longest matching
prefix of locality tiers. In a simple, single-tier locality taxonomy
using the key "region" this should yield the same behavior as the
previous implementation, as all instances with a matching "region" will
have the same longest matching prefix (at length 1), however this more
general approach should better handle other locality taxonomies that use
more tiers and/or tiers with names other than "region".

Currently this change only applies to physical planning for secondary
tenants until physical planning is unified for system and secondary
tenants.
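As a rough illustration of the longest-matching-prefix selection described above, here is a self-contained Go sketch; the Tier, Locality, and Instance types are hypothetical stand-ins, not the actual physical-planning code.

```
// Illustrative only: pick SQL instances whose locality shares the longest
// prefix of tiers with the KV node's locality.
package main

import "fmt"

// Tier is a single locality tier, e.g. {Key: "region", Value: "us-east1"}.
type Tier struct {
	Key, Value string
}

type Locality []Tier

// matchLen returns the number of leading tiers on which a and b agree.
func matchLen(a, b Locality) int {
	n := 0
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i] != b[i] {
			break
		}
		n++
	}
	return n
}

type Instance struct {
	ID       int
	Locality Locality
}

// closestInstances returns the subset of instances with the longest matching
// locality prefix relative to the KV node; the caller can then pick one of
// them at random. With no match anywhere, every instance is a candidate.
func closestInstances(instances []Instance, node Locality) []Instance {
	best, bestLen := []Instance(nil), 0
	for _, inst := range instances {
		switch l := matchLen(inst.Locality, node); {
		case l > bestLen:
			best, bestLen = []Instance{inst}, l
		case l == bestLen:
			best = append(best, inst)
		}
	}
	return best
}

func main() {
	node := Locality{{"region", "us-east1"}, {"az", "us-east1-b"}}
	instances := []Instance{
		{1, Locality{{"region", "us-east1"}, {"az", "us-east1-a"}}},
		{2, Locality{{"region", "us-east1"}, {"az", "us-east1-b"}}},
		{3, Locality{{"region", "us-west1"}, {"az", "us-west1-a"}}},
	}
	// Instance 2 wins: it matches both the region and the az tier.
	fmt.Println(closestInstances(instances, node))
}
```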

Release note: none.
Epic: CRDB-16910


98471: changefeedccl: fix kafka messagetoolarge test failure r=samiskin a=samiskin

Fixes: #93847

This fixes the following bug in the TestChangefeedKafkaMessageTooLarge test setup:
1. The feed starts sending messages, randomly triggering a MessageTooLarge error causing a retry with a smaller batch size
2. Eventually, while the retrying process is still ongoing, all 2000 rows are successfully received by the mock kafka sink, causing assertPayloads to complete, which causes the test to closeFeed and run CANCEL on the changefeed.
3. The retrying process gets stuck in sendMessage: it can't send the message to feedCh, which has been closed since the changefeed is trying to close, but it also can't exit on the mock sink's tg.done, since that only closes after the feed fully closes, which in turn requires the retrying process to end.
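For readers unfamiliar with this failure shape, below is a minimal Go sketch of the circular wait (and the way out of it), using hypothetical feedCh/done names rather than the real test harness code.

```
// Illustrative only: a retrier blocked on a send nobody drains, while the
// shutdown signal it listens on cannot fire until the retrier itself exits.
package main

import (
	"fmt"
	"time"
)

// sendMessage tries to deliver a message, but also listens on done so it can
// bail out when the sink is shutting down. If closing done depends on this
// function returning, neither case can ever fire and the goroutine leaks.
func sendMessage(feedCh chan<- string, done <-chan struct{}, msg string) error {
	select {
	case feedCh <- msg: // blocks forever if nobody is draining feedCh anymore
		return nil
	case <-done:
		return fmt.Errorf("sink closed while retrying %q", msg)
	}
}

func main() {
	feedCh := make(chan string) // unbuffered: a send needs an active reader
	done := make(chan struct{})

	errCh := make(chan error, 1)
	go func() { errCh <- sendMessage(feedCh, done, "retry-batch") }()

	// Breaking the cycle means shutdown must be able to close done (or
	// otherwise abandon the send) without waiting on the retrier.
	close(done)

	select {
	case err := <-errCh:
		fmt.Println("retrier exited:", err)
	case <-time.After(time.Second):
		fmt.Println("deadlock: retrier never exited")
	}
}
```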

Release note: None

Co-authored-by: Eric Harmeling <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Shiranka Miskin <[email protected]>
@craig craig bot closed this as completed in 234bcd0 Mar 13, 2023
irfansharif added a commit to irfansharif/cockroach that referenced this issue Mar 16, 2023
...subscribed to span configs. Do the same for the store
rebalancer. We applied this treatment for the merge queue back in cockroachdb#78122
since the fallback behavior, if not subscribed, is to use the statically
defined span config for every operation.

- For the replicate queue this meant obtusely applying a replication
  factor of 3, regardless of configuration. This was possible typically
  post node restart before subscription was initially established. We
  saw this in cockroachdb#98385. It was possible then for us to ignore configured
  voter/non-voter/lease constraints.
- For the split queue, we wouldn't actually compute any split keys if
  unsubscribed, so the missing check was somewhat benign. But we would
  be obtusely applying the default range sizes [128MiB,512MiB], so for
  clusters configured with larger range sizes, this could lead to a
  burst of splitting post node-restart.
- For the MVCC GC queue, it would mean applying the statically
  configured default GC TTL and ignoring any set protected timestamps.
  The latter is best-effort protection but could result in internal
  operations relying on protection (like backups, changefeeds) failing
  informatively. For clusters configured with GC TTL greater than the
  default, post node-restart it could lead to a burst of MVCC GC
  activity and AOST queries failing to find expected data.
- For the store rebalancer, ignoring span configs could result in
  violating lease preferences and voter/non-voter constraints.

Fixes cockroachdb#98421.
Fixes cockroachdb#98385.

While here, we also introduce the following non-public cluster settings
to selectively enable/disable KV queues:
- kv.mvcc_gc_queue.enabled
- kv.split_queue.enabled
- kv.replicate_queue.enabled

Release note (bug fix): It was previously possible for CockroachDB to
not respect non-default zone configs. This could only happen for a short
window after nodes with existing replicas were restarted (measured in
seconds), and self-rectified (also within seconds). This manifested in a
few ways:
- If num_replicas was set to something other than 3, we would still
  add or remove replicas to get to 3x replication.
  - If num_voters was set explicitly to get a mix of voting and
    non-voting replicas, it would be ignored. CockroachDB could possibly
    remove non-voting replicas.
- If range_min_bytes or range_max_bytes were changed from 128 MiB and
  512 MiB respectively, we would instead try to size ranges to be within
  [128 MiB, 512MiB]. This could appear as an excess amount of range
  splits or merges, as visible in the Replication Dashboard under "Range
  Operations".
- If gc.ttlseconds was set to something other than 90000 seconds, we
  would still GC data only older than 90000s/25h. If the GC TTL was set
  to something larger than 25h, AOST queries going further back may now
  start failing. For GC TTLs less than the 25h default, clusters would
  observe increased disk usage due to more retained garbage.
- If constraints, lease_preferences or voter_constraints were set, they
  would be ignored. Range data and leases would possibly be moved
  outside where prescribed.
This issue lasted a few seconds post node-restarts, and any zone config
violations were rectified shortly after.