ccl/changefeedccl: TestChangefeedKafkaMessageTooLarge failed #93847

Closed
cockroach-teamcity opened this issue Dec 17, 2022 · 6 comments · Fixed by #98471
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-cdc


cockroach-teamcity commented Dec 17, 2022

ccl/changefeedccl.TestChangefeedKafkaMessageTooLarge failed with artifacts on master @ 7c8e7012b73a598f3aa1da9bbf422e833706f5d3:

=== RUN   TestChangefeedKafkaMessageTooLarge
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/905465f697dc04c39f7d8a82cb057bfe/logTestChangefeedKafkaMessageTooLarge2875385573
    test_log_scope.go:79: use -show-logs to present logs inline
=== CONT  TestChangefeedKafkaMessageTooLarge
    changefeed_test.go:7550: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/905465f697dc04c39f7d8a82cb057bfe/logTestChangefeedKafkaMessageTooLarge2875385573
--- FAIL: TestChangefeedKafkaMessageTooLarge (46.91s)
=== RUN   TestChangefeedKafkaMessageTooLarge/kafka
    helpers_test.go:751: making server as secondary tenant
    helpers_test.go:837: making kafka feed factory
    --- FAIL: TestChangefeedKafkaMessageTooLarge/kafka (46.90s)
=== RUN   TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill
    changefeed_test.go:7478: still waiting for job status; current reverting
        --- FAIL: TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill (45.48s)

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

/cc @cockroachdb/cdc

This test on roachdash | Improve this report!

Jira issue: CRDB-22552

Epic CRDB-11732

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Dec 17, 2022
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Dec 17, 2022
@blathers-crl blathers-crl bot added the T-cdc label Dec 17, 2022
@samiskin samiskin self-assigned this Dec 21, 2022

tbg commented Dec 23, 2022

cockroach-teamcity commented

ccl/changefeedccl.TestChangefeedKafkaMessageTooLarge failed with artifacts on master @ c4bde8b72cdd4016845ae70ef5162b3f11fab1fb:

=== RUN   TestChangefeedKafkaMessageTooLarge
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/905465f697dc04c39f7d8a82cb057bfe/logTestChangefeedKafkaMessageTooLarge1541882091
    test_log_scope.go:79: use -show-logs to present logs inline
=== CONT  TestChangefeedKafkaMessageTooLarge
    changefeed_test.go:7551: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/905465f697dc04c39f7d8a82cb057bfe/logTestChangefeedKafkaMessageTooLarge1541882091
--- FAIL: TestChangefeedKafkaMessageTooLarge (46.53s)
=== RUN   TestChangefeedKafkaMessageTooLarge/kafka
    helpers_test.go:754: making server as system tenant
    helpers_test.go:837: making kafka feed factory
    --- FAIL: TestChangefeedKafkaMessageTooLarge/kafka (46.52s)
=== RUN   TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill
    changefeed_test.go:7479: still waiting for job status; current reverting
        --- FAIL: TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill (45.39s)

Parameters: TAGS=bazel,gss


cockroach-teamcity commented

ccl/changefeedccl.TestChangefeedKafkaMessageTooLarge failed with artifacts on master @ 0d089058d1be30cc7fce61a898c63e2cf87795cd:

=== RUN   TestChangefeedKafkaMessageTooLarge
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/50a8284f7c7f32a812f6eaeb52caed36/logTestChangefeedKafkaMessageTooLarge1626955451
    test_log_scope.go:79: use -show-logs to present logs inline
=== CONT  TestChangefeedKafkaMessageTooLarge
    changefeed_test.go:7940: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/50a8284f7c7f32a812f6eaeb52caed36/logTestChangefeedKafkaMessageTooLarge1626955451
--- FAIL: TestChangefeedKafkaMessageTooLarge (47.73s)
=== RUN   TestChangefeedKafkaMessageTooLarge/kafka
    helpers_test.go:810: making server as system tenant
    helpers_test.go:893: making kafka feed factory
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
    --- FAIL: TestChangefeedKafkaMessageTooLarge/kafka (47.69s)
=== RUN   TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill
    changefeed_test.go:7868: still waiting for job status; current reverting
        --- FAIL: TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill (45.86s)

cockroach-teamcity commented

ccl/changefeedccl.TestChangefeedKafkaMessageTooLarge failed with artifacts on master @ 31365e21dc606cdc1e4302c86192ffc5a6cf1255:

=== RUN   TestChangefeedKafkaMessageTooLarge
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/905465f697dc04c39f7d8a82cb057bfe/logTestChangefeedKafkaMessageTooLarge1934714961
    test_log_scope.go:79: use -show-logs to present logs inline
=== CONT  TestChangefeedKafkaMessageTooLarge
    changefeed_test.go:8036: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/905465f697dc04c39f7d8a82cb057bfe/logTestChangefeedKafkaMessageTooLarge1934714961
--- FAIL: TestChangefeedKafkaMessageTooLarge (46.62s)
=== RUN   TestChangefeedKafkaMessageTooLarge/kafka
    helpers_test.go:810: making server as system tenant
    helpers_test.go:893: making kafka feed factory
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
    --- FAIL: TestChangefeedKafkaMessageTooLarge/kafka (46.61s)
=== RUN   TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill
    changefeed_test.go:7964: still waiting for job status; current reverting
        --- FAIL: TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill (45.36s)

Parameters: TAGS=bazel,gss


cockroach-teamcity commented

ccl/changefeedccl.TestChangefeedKafkaMessageTooLarge failed with artifacts on master @ c95bef097bd4c213c6b5c0c125a9a846c4479d73:

=== RUN   TestChangefeedKafkaMessageTooLarge
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/905465f697dc04c39f7d8a82cb057bfe/logTestChangefeedKafkaMessageTooLarge1213807325
    test_log_scope.go:79: use -show-logs to present logs inline
=== CONT  TestChangefeedKafkaMessageTooLarge
    changefeed_test.go:8036: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/905465f697dc04c39f7d8a82cb057bfe/logTestChangefeedKafkaMessageTooLarge1213807325
--- FAIL: TestChangefeedKafkaMessageTooLarge (49.50s)
=== RUN   TestChangefeedKafkaMessageTooLarge/kafka
    helpers_test.go:807: making server as secondary tenant
    helpers_test.go:893: making kafka feed factory
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
    --- FAIL: TestChangefeedKafkaMessageTooLarge/kafka (49.49s)
=== RUN   TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill
    changefeed_test.go:7964: still waiting for job status; current reverting
        --- FAIL: TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill (45.89s)

Parameters: TAGS=bazel,gss


cockroach-teamcity commented

ccl/changefeedccl.TestChangefeedKafkaMessageTooLarge failed with artifacts on master @ 8124f66625a8451a5f45f1831246038fb0eeb2a1:

=== RUN   TestChangefeedKafkaMessageTooLarge
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/b4b4ff567208b3b36d9c1ab53c36b00f/logTestChangefeedKafkaMessageTooLarge3619894655
    test_log_scope.go:79: use -show-logs to present logs inline
=== CONT  TestChangefeedKafkaMessageTooLarge
    changefeed_test.go:8037: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/b4b4ff567208b3b36d9c1ab53c36b00f/logTestChangefeedKafkaMessageTooLarge3619894655
--- FAIL: TestChangefeedKafkaMessageTooLarge (47.97s)
=== RUN   TestChangefeedKafkaMessageTooLarge/kafka
    helpers_test.go:810: making server as system tenant
    helpers_test.go:893: making kafka feed factory
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka
    testfeed_test.go:278: creating external connection
    testfeed_test.go:281: ran create external connection
    --- FAIL: TestChangefeedKafkaMessageTooLarge/kafka (47.91s)
=== RUN   TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill
=== CONT  TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill
    changefeed_test.go:7965: still waiting for job status; current reverting
        --- FAIL: TestChangefeedKafkaMessageTooLarge/kafka/succeed_against_a_large_backfill (45.63s)

craig bot pushed a commit that referenced this issue Mar 13, 2023
98261: sql: add crdb_internal views for system statistics tables r=ericharmeling a=ericharmeling

This commit adds two new crdb_internal views:

- crdb_internal.statement_statistics_persisted, which surfaces the system.statement_statistics table
- crdb_internal.transaction_statistics_persisted, which surfaces the system.transaction_statistics table

Example output from after flush:

```
[email protected]:26257/insights> select * from crdb_internal.statement_statistics_persisted limit 3;
-[ RECORD 1 ]
aggregated_ts              | 2023-03-08 23:00:00+00
fingerprint_id             | \x3ab7869b0bc4aa5a
transaction_fingerprint_id | \x95d43bd78dc51d85
plan_hash                  | \x9aec25074eb1e3a0
app_name                   | $ cockroach sql
node_id                    | 1
agg_interval               | 01:00:00
metadata                   | {"db": "insights", "distsql": true, "failed": false, "fullScan": true, "implicitTxn": true, "query": "SELECT * FROM crdb_internal.statement_statistics_persisted", "querySummary": "SELECT * FROM crdb_internal.statement_statis...", "stmtTyp": "TypeDML", "vec": true}
statistics                 | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "cpuSQLNanos": {"mean": 24667, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 2.048E+4, "sqDiff": 0}, "mvccIteratorStats": {"blockBytes": {"mean": 0, "sqDiff": 0}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 0, "sqDiff": 0}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 1, "sqDiff": 0}, "seekCountInternal": {"mean": 1, "sqDiff": 0}, "stepCount": {"mean": 0, "sqDiff": 0}, "stepCountInternal": {"mean": 0, "sqDiff": 0}, "valueBytes": {"mean": 0, "sqDiff": 0}}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "index_recommendations": [], "statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 1, "firstAttemptCnt": 1, "idleLat": {"mean": 0, "sqDiff": 0}, "indexes": ["42@1"], "lastErrorCode": "", "lastExecAt": "2023-03-08T23:14:04.614242Z", "latencyInfo": {"max": 0.001212208, "min": 0.001212208, "p50": 0, "p90": 0, "p99": 0}, "maxRetries": 0, "nodes": [1], "numRows": {"mean": 0, "sqDiff": 0}, "ovhLat": {"mean": 0.000007791999999999955, "sqDiff": 0}, "parseLat": {"mean": 0.000016666, "sqDiff": 0}, "planGists": ["AgFUAgD/FwAAAAcYBhg="], "planLat": {"mean": 0.000691666, "sqDiff": 0}, "regions": ["us-east1"], "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 0, "sqDiff": 0}, "runLat": {"mean": 0.000496084, "sqDiff": 0}, "svcLat": {"mean": 0.001212208, "sqDiff": 0}}}
plan                       | {"Children": [], "Name": ""}
index_recommendations      | {}
indexes_usage              | ["42@1"]
-[ RECORD 2 ]
aggregated_ts              | 2023-03-08 23:00:00+00
fingerprint_id             | \x44c9fdb49be676cf
transaction_fingerprint_id | \xc1efcc0bba0909f8
plan_hash                  | \x780a1ba35976b15d
app_name                   | insights
node_id                    | 1
agg_interval               | 01:00:00
metadata                   | {"db": "insights", "distsql": false, "failed": false, "fullScan": false, "implicitTxn": false, "query": "UPDATE insights_workload_table_0 SET balance = balance + $1 WHERE id = $1", "querySummary": "UPDATE insights_workload_table_0 SET balance = balan... WHERE id = $1", "stmtTyp": "TypeDML", "vec": true}
statistics                 | {"execution_statistics": {"cnt": 28, "contentionTime": {"mean": 0, "sqDiff": 0}, "cpuSQLNanos": {"mean": 402538.75, "sqDiff": 1160598792985.25}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 4.096E+4, "sqDiff": 0}, "mvccIteratorStats": {"blockBytes": {"mean": 31570.321428571428, "sqDiff": 20932497128.107143}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 6.857142857142857, "sqDiff": 435.42857142857133}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 2, "sqDiff": 0}, "seekCountInternal": {"mean": 2, "sqDiff": 0}, "stepCount": {"mean": 0, "sqDiff": 0}, "stepCountInternal": {"mean": 4.857142857142857, "sqDiff": 435.42857142857133}, "valueBytes": {"mean": 360.32142857142856, "sqDiff": 756476.107142857}}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "index_recommendations": [], "statistics": {"bytesRead": {"mean": 159.04887361588396, "sqDiff": 3909.7441771668127}, "cnt": 2619, "firstAttemptCnt": 2619, "idleLat": {"mean": 0.021495726165330273, "sqDiff": 36.39774900003032}, "indexes": ["106@1"], "lastErrorCode": "", "lastExecAt": "2023-03-08T23:31:03.079093Z", "latencyInfo": {"max": 1.724660916, "min": 0.0001765, "p50": 0.000757916, "p90": 0.00191375, "p99": 0.004730417}, "maxRetries": 0, "nodes": [1], "numRows": {"mean": 1, "sqDiff": 0}, "ovhLat": {"mean": 0.0000018584035891561339, "sqDiff": 3.132932109484058E-7}, "parseLat": {"mean": 0, "sqDiff": 0}, "planGists": ["AgHUAQIADwIAAAcKBQoh1AEAAA=="], "planLat": {"mean": 0.0002562748900343638, "sqDiff": 0.0002118085353898781}, "regions": ["us-east1"], "rowsRead": {"mean": 1, "sqDiff": 0}, "rowsWritten": {"mean": 1, "sqDiff": 0}, "runLat": {"mean": 0.0024048477613592997, "sqDiff": 4.850230671161608}, 
"svcLat": {"mean": 0.0026629810549828195, "sqDiff": 4.852464499918359}}}
plan                       | {"Children": [], "Name": ""}
index_recommendations      | {}
indexes_usage              | ["106@1"]
-[ RECORD 3 ]
aggregated_ts              | 2023-03-08 23:00:00+00
fingerprint_id             | \x54202c7b75a5ba87
transaction_fingerprint_id | \x0000000000000000
plan_hash                  | \xbee0e52ec8c08bdd
app_name                   | $$ $ cockroach demo
node_id                    | 1
agg_interval               | 01:00:00
metadata                   | {"db": "insights", "distsql": false, "failed": false, "fullScan": false, "implicitTxn": false, "query": "INSERT INTO system.jobs(id, created, status, payload, progress, claim_session_id, claim_instance_id, job_type) VALUES ($1, $1, __more1_10__)", "querySummary": "INSERT INTO system.jobs(id, created, st...)", "stmtTyp": "TypeDML", "vec": true}
statistics                 | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "cpuSQLNanos": {"mean": 300625, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 1.024E+4, "sqDiff": 0}, "mvccIteratorStats": {"blockBytes": {"mean": 0, "sqDiff": 0}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 0, "sqDiff": 0}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 0, "sqDiff": 0}, "seekCountInternal": {"mean": 0, "sqDiff": 0}, "stepCount": {"mean": 0, "sqDiff": 0}, "stepCountInternal": {"mean": 0, "sqDiff": 0}, "valueBytes": {"mean": 0, "sqDiff": 0}}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "index_recommendations": [], "statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 1, "firstAttemptCnt": 1, "idleLat": {"mean": 9223372036.854776, "sqDiff": 0}, "indexes": [], "lastErrorCode": "", "lastExecAt": "2023-03-08T23:13:25.132671Z", "latencyInfo": {"max": 0.000589375, "min": 0.000589375, "p50": 0, "p90": 0, "p99": 0}, "maxRetries": 0, "nodes": [1], "numRows": {"mean": 1, "sqDiff": 0}, "ovhLat": {"mean": 0.0000016249999999999988, "sqDiff": 0}, "parseLat": {"mean": 0, "sqDiff": 0}, "planGists": ["AiACHgA="], "planLat": {"mean": 0.000203792, "sqDiff": 0}, "regions": ["us-east1"], "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 1, "sqDiff": 0}, "runLat": {"mean": 0.000383958, "sqDiff": 0}, "svcLat": {"mean": 0.000589375, "sqDiff": 0}}}
plan                       | {"Children": [], "Name": ""}
index_recommendations      | {}
indexes_usage              | []

Time: 4ms total (execution 3ms / network 1ms)

[email protected]:26257/insights> select * from crdb_internal.transaction_statistics_persisted limit 3;
-[ RECORD 1 ]
aggregated_ts  | 2023-03-08 23:00:00+00
fingerprint_id | \x17d80cf128571d63
app_name       | $ internal-migration-job-mark-job-succeeded
node_id        | 1
agg_interval   | 01:00:00
metadata       | {"stmtFingerprintIDs": ["b8bbb1bdae56aabc"]}
statistics     | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "cpuSQLNanos": {"mean": 64709, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 1.024E+4, "sqDiff": 0}, "mvccIteratorStats": {"blockBytes": {"mean": 0, "sqDiff": 0}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 0, "sqDiff": 0}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 0, "sqDiff": 0}, "seekCountInternal": {"mean": 0, "sqDiff": 0}, "stepCount": {"mean": 0, "sqDiff": 0}, "stepCountInternal": {"mean": 0, "sqDiff": 0}, "valueBytes": {"mean": 0, "sqDiff": 0}}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 6, "commitLat": {"mean": 0, "sqDiff": 0}, "idleLat": {"mean": 0, "sqDiff": 0}, "maxRetries": 0, "numRows": {"mean": 1, "sqDiff": 0}, "retryLat": {"mean": 0, "sqDiff": 0}, "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 1, "sqDiff": 0}, "svcLat": {"mean": 0.00026919450000000006, "sqDiff": 1.7615729685500012E-8}}}
-[ RECORD 2 ]
aggregated_ts  | 2023-03-08 23:00:00+00
fingerprint_id | \x2b024f7e2567a238
app_name       | $ internal-get-job-row
node_id        | 1
agg_interval   | 01:00:00
metadata       | {"stmtFingerprintIDs": ["8461f232a36615e7"]}
statistics     | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "cpuSQLNanos": {"mean": 50835, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 3.072E+4, "sqDiff": 0}, "mvccIteratorStats": {"blockBytes": {"mean": 0, "sqDiff": 0}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 3, "sqDiff": 0}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 1, "sqDiff": 0}, "seekCountInternal": {"mean": 1, "sqDiff": 0}, "stepCount": {"mean": 3, "sqDiff": 0}, "stepCountInternal": {"mean": 3, "sqDiff": 0}, "valueBytes": {"mean": 186, "sqDiff": 0}}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "statistics": {"bytesRead": {"mean": 284.81818181818176, "sqDiff": 3465.636363636355}, "cnt": 11, "commitLat": {"mean": 0.000003469727272727273, "sqDiff": 4.946789218181818E-11}, "idleLat": {"mean": 0, "sqDiff": 0}, "maxRetries": 0, "numRows": {"mean": 1, "sqDiff": 0}, "retryLat": {"mean": 0, "sqDiff": 0}, "rowsRead": {"mean": 1, "sqDiff": 0}, "rowsWritten": {"mean": 0, "sqDiff": 0}, "svcLat": {"mean": 0.0006771060909090909, "sqDiff": 8.91510436082909E-7}}}
-[ RECORD 3 ]
aggregated_ts  | 2023-03-08 23:00:00+00
fingerprint_id | \x37e130a1df20d299
app_name       | $ internal-create-stats
node_id        | 1
agg_interval   | 01:00:00
metadata       | {"stmtFingerprintIDs": ["98828ded59216546"]}
statistics     | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "cpuSQLNanos": {"mean": 11875, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 1.024E+4, "sqDiff": 0}, "mvccIteratorStats": {"blockBytes": {"mean": 0, "sqDiff": 0}, "blockBytesInCache": {"mean": 0, "sqDiff": 0}, "keyBytes": {"mean": 0, "sqDiff": 0}, "pointCount": {"mean": 0, "sqDiff": 0}, "pointsCoveredByRangeTombstones": {"mean": 0, "sqDiff": 0}, "rangeKeyContainedPoints": {"mean": 0, "sqDiff": 0}, "rangeKeyCount": {"mean": 0, "sqDiff": 0}, "rangeKeySkippedPoints": {"mean": 0, "sqDiff": 0}, "seekCount": {"mean": 0, "sqDiff": 0}, "seekCountInternal": {"mean": 0, "sqDiff": 0}, "stepCount": {"mean": 0, "sqDiff": 0}, "stepCountInternal": {"mean": 0, "sqDiff": 0}, "valueBytes": {"mean": 0, "sqDiff": 0}}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 1, "commitLat": {"mean": 0.000002291, "sqDiff": 0}, "idleLat": {"mean": 0, "sqDiff": 0}, "maxRetries": 0, "numRows": {"mean": 0, "sqDiff": 0}, "retryLat": {"mean": 0, "sqDiff": 0}, "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 0, "sqDiff": 0}, "svcLat": {"mean": 0.008680208, "sqDiff": 0}}}

Time: 3ms total (execution 2ms / network 1ms)
```

Epic: none

Release note: Added two views to the crdb_internal catalog: crdb_internal.statement_statistics_persisted, which surfaces data in the persisted system.statement_statistics table, and crdb_internal.transaction_statistics_persisted, which surfaces the system.transaction_statistics table.

98422: kvserver: disable {split,replicate,mvccGC} queues until... r=irfansharif a=irfansharif

...subscribed to span configs. Do the same for the store rebalancer. We applied this treatment for the merge queue back in #78122 since the fallback behavior, if not subscribed, is to use the statically defined span config for every operation.

- For the replicate queue, this meant blindly applying a replication factor of 3 regardless of configuration. This typically happened after a node restart, before the subscription was first established. We saw this in #98385. It was then possible for us to ignore configured voter/non-voter/lease constraints.
- For the split queue, we wouldn't actually compute any split keys if unsubscribed, so the missing check was somewhat benign. But we would blindly apply the default range sizes [128 MiB, 512 MiB], so for clusters configured with larger range sizes, this could lead to a burst of splitting after a node restart.
- For the MVCC GC queue, it would mean applying the statically configured default GC TTL and ignoring any set protected timestamps. The latter is best-effort protection but could result in internal operations relying on protection (like backups, changefeeds) failing informatively. For clusters configured with a GC TTL greater than the default, it could lead to a burst of MVCC GC activity after a node restart and to AOST queries failing to find expected data.
- For the store rebalancer, ignoring span configs could result in violating lease preferences and voter constraints.

Fixes #98421.
Fixes #98385.

Release note (bug fix): It was previously possible for CockroachDB to not respect non-default zone configs. This only happened for a short window after nodes with existing replicas were restarted, and self-rectified within seconds. This manifested in a few ways:
- If num_replicas was set to something other than 3, we would still add or remove replicas to get to 3x replication.
  - If num_voters was set explicitly to get a mix of voting and non-voting replicas, it would be ignored. CockroachDB could possibly remove non-voting replicas.
- If range_min_bytes or range_max_bytes were changed from 128 MiB and 512 MiB respectively, we would instead try to size ranges to be within [128 MiB, 512 MiB]. This could appear as an excess amount of range splits or merges, as visible in the Replication Dashboard under "Range Operations".
- If gc.ttlseconds was set to something other than 90000 seconds, we would still GC data only older than 90000s/25h. If the GC TTL was set to something larger than 25h, AOST queries going further back may now start failing. For GC TTLs less than the 25h default, clusters would observe increased disk usage due to more retained garbage.
- If constraints, lease_preferences or voter_constraints were set, they would be ignored. Range data and leases would possibly be moved outside where prescribed. These issues only lasted a few seconds after node restarts, and any zone config violations were rectified shortly after.

98468: sql: add closest-instance physical planning r=dt a=dt

This changes physical planning, specifically how the SQL instance for a
given KV node ID is resolved, to be more generalized w.r.t. different
locality tier taxonomies.

Previously this function had a special case that checked for, and only
for, a specific locality tier with the key "region" and if it was
found, picked a random instance from the subset of instances where their
value for that matched the value for the KV node.

Matching on and only on the "region" tier is both too specific and not
specific enough: it is "too specific" in that it requires a tier with
the key "region" to be used and to match, and is "not specific enough"
in that it simultaneously ignores more specific locality tiers that
would indicate closer matches (e.g. subregion, AZ, data-center or rack).

Instead, this change generalizes this function to identify the subset of
instances that have the "closest match" in localities to the KV node and
pick one of them, where closest match is defined as the longest matching
prefix of locality tiers. In a simple, single-tier locality taxonomy
using the key "region" this should yield the same behavior as the
previous implementation, as all instances with a matching "region" will
have the same longest matching prefix (at length 1), however this more
general approach should better handle other locality taxonomies that use
more tiers and/or tiers with names other than "region".
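
As a rough illustration of the "longest matching prefix" rule described above, here is a minimal Go sketch. The `Tier` and `Instance` types and the `closestInstances` helper are hypothetical and are not taken from the CockroachDB source; the final random pick among the closest instances is left to the caller.

```go
package main

import "fmt"

// Tier is one key/value pair in a locality, e.g. region=us-east1.
type Tier struct{ Key, Value string }

// Instance is a hypothetical SQL instance with an ordered list of locality tiers.
type Instance struct {
	ID       int
	Locality []Tier
}

// matchLen returns how many leading tiers of a and b are identical.
func matchLen(a, b []Tier) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// closestInstances returns the instances whose locality shares the longest
// prefix with the KV node's locality. It returns nil when no instance matches
// even the first tier, so the caller can fall back to another policy.
func closestInstances(node []Tier, instances []Instance) []Instance {
	best := 0
	var closest []Instance
	for _, inst := range instances {
		switch l := matchLen(node, inst.Locality); {
		case l > best:
			best, closest = l, []Instance{inst}
		case l == best && l > 0:
			closest = append(closest, inst)
		}
	}
	return closest
}

func main() {
	node := []Tier{{"region", "us-east1"}, {"zone", "b"}}
	instances := []Instance{
		{ID: 1, Locality: []Tier{{"region", "us-east1"}, {"zone", "a"}}},
		{ID: 2, Locality: []Tier{{"region", "us-east1"}, {"zone", "b"}}},
		{ID: 3, Locality: []Tier{{"region", "us-west1"}, {"zone", "b"}}},
		{ID: 4, Locality: []Tier{{"region", "us-east1"}, {"zone", "b"}}},
	}
	var ids []int
	for _, inst := range closestInstances(node, instances) {
		ids = append(ids, inst.ID)
	}
	fmt.Println(ids) // [2 4]
}
```

With only a "region" tier, every instance in the matching region ties at prefix length 1, which corresponds to the previous region-only behavior.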

Currently this change only applies to physical planning for secondary
tenants until physical planning is unified for system and secondary
tenants.

Release note: none.
Epic: CRDB-16910


98471: changefeedccl: fix kafka messagetoolarge test failure r=samiskin a=samiskin

Fixes: #93847

This fixes the following bug in the TestChangefeedKafkaMessageTooLarge test setup:
1. The feed starts sending messages and randomly triggers a MessageTooLarge error, which causes a retry with a smaller batch size.
2. While the retry is still in progress, the mock kafka sink successfully receives all 2000 rows. assertPayloads therefore completes, and the test calls closeFeed and runs CANCEL on the changefeed.
3. The retry gets stuck in sendMessage. It can't send the message to feedCh, which has been closed because the changefeed is shutting down, and it can't exit on the mock sink's tg.done either, since that only closes after the feed has fully closed, which in turn requires the retry to end.

Release note: None

Co-authored-by: Eric Harmeling <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Shiranka Miskin <[email protected]>
@craig craig bot closed this as completed in f5902c0 Mar 13, 2023
blathers-crl bot pushed a commit that referenced this issue Mar 13, 2023