-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ccl/changefeedccl/cdceval: TestEvaluator failed #108348
Labels
A-cdc
Change Data Capture
branch-master
Failures and bugs on the master branch.
C-test-failure
Broken test (automatically or manually discovered).
O-robot
Originated from a bot.
T-cdc
Milestone
Comments
cockroach-teamcity
added
branch-master
Failures and bugs on the master branch.
C-test-failure
Broken test (automatically or manually discovered).
O-robot
Originated from a bot.
T-cdc
labels
Aug 8, 2023
craig bot
pushed a commit
that referenced
this issue
Aug 8, 2023
106130: sql: add extra information to protocol errors in bind r=rafiss a=cucaroach A user is running into mysterious protocol errors when using prepared statements. Add some extra information to the error message to help guide the investigation. Informs: https://github.com/cockroachlabs/support/issues/2184 Release note: none Epic: none 107912: sql: fix exec+audit logs for BEGIN, COMMIT, SET stmts r=rafiss a=rafiss Epic: None Release note (bug fix): Fixed a bug where BEGIN, COMMIT, SET, ROLLBACK, and SAVEPOINT statements would not be written to the execution or audit logs. 108093: jobs: avoid crdb_internal.system_jobs in gc-jobs r=miretskiy a=stevendanna The crdb_internal.system_jobs is a virtual table that joins information from the jobs table and the jobs_info table. For the previous query, SELECT id, payload, status FROM "".crdb_internal.system_jobs WHERE (created < $1) AND (id > $2) ORDER BY id LIMIT $3 this is a little suboptimal because: - We don't make use of the progress column so any read of that is useless. - While the crdb_internal.virtual table has a virtual index on job id, and EXPLAIN will even claim that it will be used: • limit │ count: 100 │ └── • filter │ filter: created < '2023-07-20 07:29:01.17001' │ └── • virtual table table: system_jobs@system_jobs_id_idx spans: [/101 - ] This is actually a lie. A virtual index can only handle single-key spans. As a result the unconstrained query is used: ``` WITH latestpayload AS (SELECT job_id, value FROM system.job_info AS payload WHERE info_key = 'legacy_payload' ORDER BY written DESC), latestprogress AS (SELECT job_id, value FROM system.job_info AS progress WHERE info_key = 'legacy_progress' ORDER BY written DESC) SELECT DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress, created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type FROM system.jobs AS j INNER JOIN latestpayload AS payload ON j.id = payload.job_id LEFT JOIN latestprogress AS progress ON j.id = progress.job_id ``` which has a full scan of the jobs table and 2 full scans of the info table: • distinct │ distinct on: id, value, value │ └── • merge join │ equality: (job_id) = (id) │ ├── • render │ │ │ └── • filter │ │ estimated row count: 7,318 │ │ filter: info_key = 'legacy_payload' │ │ │ └── • scan │ estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future) │ table: job_info@primary │ spans: FULL SCAN │ └── • merge join (right outer) │ equality: (job_id) = (id) │ right cols are key │ ├── • render │ │ │ └── • filter │ │ estimated row count: 7,317 │ │ filter: info_key = 'legacy_progress' │ │ │ └── • scan │ estimated row count: 14,648 (100% of the table; stats collected 39 minutes ago; using stats forecast for 2 hours in the future) │ table: job_info@primary │ spans: FULL SCAN │ └── • scan missing stats table: jobs@primary spans: FULL SCAN Because of the limit, I don't think this ends up being as bad as it looks. But it still isn't great. In this PR, we replace crdb_internal.jobs with a query that removes the join on the unused progress field and also constrains the query of the job_info table. • distinct │ distinct on: id, value │ └── • merge join │ equality: (job_id) = (id) │ right cols are key │ ├── • render │ │ │ └── • filter │ │ estimated row count: 7,318 │ │ filter: info_key = 'legacy_payload' │ │ │ └── • scan │ estimated row count: 14,646 (100% of the table; stats collected 45 minutes ago; using stats forecast for 2 hours in the future) │ table: job_info@primary │ spans: [/101/'legacy_payload' - ] │ └── • render │ └── • limit │ count: 100 │ └── • filter │ filter: created < '2023-07-20 07:29:01.17001' │ └── • scan missing stats table: jobs@primary spans: [/101 - ] In a local example, this does seem faster: > SELECT id, payload, status, created > FROM "".crdb_internal.system_jobs > WHERE (created < '2023-07-20 07:29:01.17001') AND (id > 100) ORDER BY id LIMIT 100; id | payload | status | created -----+---------+--------+---------- (0 rows) Time: 183ms total (execution 183ms / network 0ms) > WITH > latestpayload AS ( > SELECT job_id, value > FROM system.job_info AS payload > WHERE job_id > 100 AND info_key = 'legacy_payload' > ORDER BY written desc > ), > jobpage AS ( > SELECT id, status, created > FROM system.jobs > WHERE (created < '2023-07-20 07:29:01.17001') and (id > 100) > ORDER BY id > LIMIT 100 > ) > SELECT distinct (id), latestpayload.value AS payload, status > FROM jobpage AS j > INNER JOIN latestpayload ON j.id = latestpayload.job_id; id | payload | status -----+---------+--------- (0 rows) Time: 43ms total (execution 42ms / network 0ms) Release note: None Epic: none 108313: cli: debug zip uses read-only range probe r=joshimhoff a=dhartunian To reduce the chance of corruption issues when run against older clusters, the recently added use of `crdb_internal.probe_ranges` is modified to use a `read` probe instead of a `write` probe. See discussion in #107720 Release note: None Epic: None 108335: kvcoord: Implement CloseStream for MuxRangeFeed r=miretskiy a=miretskiy Extend MuxRangeFeed protocol to support explicit, caller initiated CloseStream operation. The caller may decide to stop receiving events for a particular stream, which is part of MuxRangeFeed. The caller may issue a request to MuxRangeFeed server to close the stream. The server will cancel underlying range feed, and return a `RangeFeedRetryError_REASON_RANGEFEED_CLOSED` error as a response. Note, current mux rangefeed clinet does not use this request. The code to support cancellation is added pre-emptively in case this functionality will be required in the future to support restarts due to stuck rangefeeds. Epic: CRDB-26372 Release note: None 108355: roachtest/multitenant-upgrade: hard-code predecessor version temporarily r=knz a=healthy-pod Hard-code the pre-decessor release to 23.1.4 until a new patch release is out (23.1.9) because the test is in-compatible with 23.1.{5,6,7,8} due to an erroneous PR merged on the 23.1 branch. Release note: None Epic: none 108375: changefeedccl: Increase test utility timeout r=miretskiy a=miretskiy As observed in #108348, a test failed because it timed out reading a row. Yet, this flake could not be reproduced in over 50k runs. Bump timeout period to make this flake even less likely. Fixes #108348 Release note: None Co-authored-by: Tommy Reilly <[email protected]> Co-authored-by: Rafi Shamim <[email protected]> Co-authored-by: Steven Danna <[email protected]> Co-authored-by: David Hartunian <[email protected]> Co-authored-by: Yevgeniy Miretskiy <[email protected]> Co-authored-by: healthy-pod <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-cdc
Change Data Capture
branch-master
Failures and bugs on the master branch.
C-test-failure
Broken test (automatically or manually discovered).
O-robot
Originated from a bot.
T-cdc
ccl/changefeedccl/cdceval.TestEvaluator failed with artifacts on master @ 1382b26a97bf6f70a07d363dc319283c173359eb:
Parameters:
TAGS=bazel,gss,deadlock
,stress=true
Help
See also: How To Investigate a Go Test Failure (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-30434
Epic CRDB-11732
The text was updated successfully, but these errors were encountered: