ccl/changefeedccl/cdceval: TestEvaluator failed #108348

Closed
cockroach-teamcity opened this issue Aug 8, 2023 · 0 comments · Fixed by #108375
cockroach-teamcity commented Aug 8, 2023

ccl/changefeedccl/cdceval.TestEvaluator failed with artifacts on master @ 1382b26a97bf6f70a07d363dc319283c173359eb:

=== RUN   TestEvaluator
    test_log_scope.go:167: test logs captured to: /artifacts/tmp/_tmp/4f079cd13f8a776a99f45e473cd32df8/logTestEvaluator4214666620
    test_log_scope.go:81: use -show-logs to present logs inline
=== CONT  TestEvaluator
    expr_eval_test.go:628: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/4f079cd13f8a776a99f45e473cd32df8/logTestEvaluator4214666620
--- FAIL: TestEvaluator (16.07s)
=== RUN   TestEvaluator/user_defined_function
    expr_eval_test.go:573: timeout reading row
    --- FAIL: TestEvaluator/user_defined_function (5.04s)

Parameters: TAGS=bazel,gss,deadlock , stress=true

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/cdc

This test on roachdash | Improve this report!

Jira issue: CRDB-30434

Epic CRDB-11732

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-cdc labels Aug 8, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.2 milestone Aug 8, 2023
@blathers-crl blathers-crl bot added the A-cdc Change Data Capture label Aug 8, 2023
@miretskiy miretskiy self-assigned this Aug 8, 2023
craig bot pushed a commit that referenced this issue Aug 8, 2023
106130: sql: add extra information to protocol errors in bind r=rafiss a=cucaroach

A user is running into mysterious protocol errors when using prepared
statements. Add some extra information to the error message to help
guide the investigation.

Informs: https://github.com/cockroachlabs/support/issues/2184
Release note: none
Epic: none


107912: sql: fix exec+audit logs for BEGIN, COMMIT, SET stmts r=rafiss a=rafiss

Epic: None
Release note (bug fix): Fixed a bug where BEGIN, COMMIT, SET, ROLLBACK, and SAVEPOINT statements would not be written to the execution or audit logs.

108093: jobs: avoid crdb_internal.system_jobs in gc-jobs r=miretskiy a=stevendanna

crdb_internal.system_jobs is a virtual table that joins information from the jobs table and the job_info table.

For the previous query,

    SELECT id, payload, status FROM "".crdb_internal.system_jobs
    WHERE (created < $1) AND (id > $2)
    ORDER BY id
    LIMIT $3

this is a little suboptimal because:

- We don't make use of the progress column so any read of that is useless.

- While the crdb_internal.system_jobs virtual table has a virtual index on job id, and EXPLAIN will even claim that it will be used:

      • limit
      │ count: 100
      │
      └── • filter
          │ filter: created < '2023-07-20 07:29:01.17001'
          │
          └── • virtual table
                table: system_jobs@system_jobs_id_idx
                spans: [/101 - ]

  This is actually a lie. A virtual index can only handle single-key spans (a short illustration follows this list). As a result, the unconstrained query is used:
```
    WITH
        latestpayload AS (SELECT job_id, value FROM system.job_info AS payload WHERE info_key = 'legacy_payload' ORDER BY written DESC),
        latestprogress AS (SELECT job_id, value FROM system.job_info AS progress WHERE info_key = 'legacy_progress' ORDER BY written DESC)
    SELECT
       DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress,
                created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
    FROM system.jobs AS j
    INNER JOIN latestpayload AS payload ON j.id = payload.job_id
    LEFT JOIN latestprogress AS progress ON j.id = progress.job_id
```
  which has a full scan of the jobs table and two full scans of the job_info table:

      • distinct
      │ distinct on: id, value, value
      │
      └── • merge join
          │ equality: (job_id) = (id)
          │
          ├── • render
          │   │
          │   └── • filter
          │       │ estimated row count: 7,318
          │       │ filter: info_key = 'legacy_payload'
          │       │
          │       └── • scan
          │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
          │             table: job_info@primary
          │             spans: FULL SCAN
          │
          └── • merge join (right outer)
              │ equality: (job_id) = (id)
              │ right cols are key
              │
              ├── • render
              │   │
              │   └── • filter
              │       │ estimated row count: 7,317
              │       │ filter: info_key = 'legacy_progress'
              │       │
              │       └── • scan
              │             estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future)
              │             table: job_info@primary
              │             spans: FULL SCAN
              │
              └── • scan
                    missing stats
                    table: jobs@primary
                    spans: FULL SCAN

  Because of the limit, I don't think this ends up being as bad as it looks. But it still isn't great.
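To make the single-key-span limitation concrete, here is a minimal sketch (the job ids are illustrative): a point lookup on the virtual index produces a single-key span that can be served directly, while the range predicate used by gc-jobs cannot, so it falls back to the unconstrained query above.

```sql
-- Served via the virtual index: the predicate pins the scan to a
-- single-key span (/101).
SELECT id, status FROM "".crdb_internal.system_jobs WHERE id = 101;

-- Not a single-key span: the virtual index cannot serve this, so the
-- unconstrained fallback query shown above is used instead.
SELECT id, status FROM "".crdb_internal.system_jobs WHERE id > 100;
```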

In this PR, we replace the crdb_internal.system_jobs query with one that removes the join on the unused progress column and also constrains the scan of the job_info table.

      • distinct
      │ distinct on: id, value
      │
      └── • merge join
          │ equality: (job_id) = (id)
          │ right cols are key
          │
          ├── • render
          │   │
          │   └── • filter
          │       │ estimated row count: 7,318
          │       │ filter: info_key = 'legacy_payload'
          │       │
          │       └── • scan
          │             estimated row count: 14,646 (100% of the table; stats collected 45 minutes ago; using stats forecast for 2 hours in the future)
          │             table: job_info@primary
          │             spans: [/101/'legacy_payload' - ]
          │
          └── • render
              │
              └── • limit
                  │ count: 100
                  │
                  └── • filter
                      │ filter: created < '2023-07-20 07:29:01.17001'
                      │
                      └── • scan
                            missing stats
                            table: jobs@primary
                            spans: [/101 - ]

In a local example, this does seem faster:

    > SELECT id, payload, status, created
    > FROM "".crdb_internal.system_jobs
    > WHERE (created < '2023-07-20 07:29:01.17001') AND (id > 100) ORDER BY id LIMIT 100;

    id | payload | status | created
    -----+---------+--------+----------
    (0 rows)

    Time: 183ms total (execution 183ms / network 0ms)

    > WITH
    > latestpayload AS (
    >     SELECT job_id, value
    >     FROM system.job_info AS payload
    >     WHERE job_id > 100 AND info_key = 'legacy_payload'
    >     ORDER BY written desc
    > ),
    > jobpage AS (
    >     SELECT id, status, created
    >     FROM system.jobs
    >     WHERE (created < '2023-07-20 07:29:01.17001') and (id > 100)
    >     ORDER BY id
    >     LIMIT 100
    > )
    > SELECT distinct (id), latestpayload.value AS payload, status
    > FROM jobpage AS j
    > INNER JOIN latestpayload ON j.id = latestpayload.job_id;
      id | payload | status
    -----+---------+---------
    (0 rows)

    Time: 43ms total (execution 42ms / network 0ms)

Release note: None

Epic: none

108313: cli: debug zip uses read-only range probe r=joshimhoff a=dhartunian

To reduce the chance of corruption issues when run against older clusters, the recently added use of `crdb_internal.probe_ranges` is modified to use a `read` probe instead of a `write` probe.
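For reference, here is a minimal sketch of issuing such a probe directly from SQL, assuming the `crdb_internal.probe_ranges` signature available in recent versions (a per-probe timeout interval plus a probe type):

```sql
-- Read-only probe of every range with a 1s per-probe timeout,
-- reporting only the ranges whose probe returned an error.
SELECT range_id, error
FROM crdb_internal.probe_ranges(INTERVAL '1s', 'read')
WHERE error != ''
LIMIT 10;
```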

See discussion in #107720

Release note: None
Epic: None

108335: kvcoord: Implement CloseStream for MuxRangeFeed r=miretskiy a=miretskiy

Extend the MuxRangeFeed protocol to support an explicit, caller-initiated CloseStream operation.

The caller may decide to stop receiving events for a particular stream that is part of a MuxRangeFeed, and may issue a request to the MuxRangeFeed server to close the stream. The server will cancel the underlying rangefeed and return a `RangeFeedRetryError_REASON_RANGEFEED_CLOSED` error as the response.

Note that the current mux rangefeed client does not use this request. The code to support cancellation is added preemptively, in case this functionality is required in the future to support restarts due to stuck rangefeeds.

Epic: CRDB-26372
Release note: None

108355: roachtest/multitenant-upgrade: hard-code predecessor version temporarily r=knz a=healthy-pod

Hard-code the predecessor release to 23.1.4 until a new patch release (23.1.9) is out, because the test is incompatible with 23.1.{5,6,7,8} due to an erroneous PR merged on the 23.1 branch.

Release note: None
Epic: none

108375: changefeedccl: Increase test utility timeout r=miretskiy a=miretskiy

As observed in #108348, a test failed because it timed out reading a row. Yet, this flake could not be reproduced in over 50k runs. Bump the timeout period to make this flake even less likely.

Fixes #108348

Release note: None

Co-authored-by: Tommy Reilly <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: healthy-pod <[email protected]>
@craig craig bot closed this as completed in a1cf907 Aug 8, 2023
blathers-crl bot pushed a commit that referenced this issue Aug 18, 2023
As observed in #108348, a test failed because it timed out reading a row. Yet, this flake could not be reproduced in over 50k runs. Bump the timeout period to make this flake even less likely.

Fixes #108348

Release note: None
blathers-crl bot pushed a commit that referenced this issue Aug 18, 2023
blathers-crl bot pushed a commit that referenced this issue Aug 18, 2023