pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test: TestCCLLogic_regional_by_row_read_committed failed #111481
Comments
A bisect of this points to d0c46fe |
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ fad649d89721ddb3e9f3dcab1ad5d14f74c91bc9:
|
This test also fails under stress at the commit prior to d0c46fe: 4b2badd, but with a different error:
|
d0c46fe added the logic test that is failing, but I don't think the failure is related to read committed isolation. I'm able to reproduce it using this smaller logic test which runs entirely at serializable isolation:
Notice the slightly changed error message; it always seems to happen when key is |
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ f6f355b50e0dbf28633e25ddd05f2775141af31e:
Parameters: |
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ 4494e945a9070862cfae1b668b3f8b667c41be17:
Parameters: |
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ 1b83c04ed0106dd0d3380821707861008ec32b73:
Parameters: |
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ 3b438b4a59ad4759e5bf22ff0b6dd6e678c2be0d:
|
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ 7e1f909e9f2ddcef2e418d6c2df11f3139a2b85a:
Parameters: |
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ 5af56c0ecd701d452c19fb89908012294b5336aa:
Parameters: |
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ 619aa6edad0269f75d12cb7d4b88b52f09825c7b:
Parameters: Same failure on other branches
|
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ 45f6344a2fc7d2d52f410e1ecad98f8c928cfeea:
Same failure on other branches
|
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ 45f6344a2fc7d2d52f410e1ecad98f8c928cfeea:
Parameters: Same failure on other branches
|
I'm making some incremental progress on understanding this issue. I found that the reproduction cycle is faster for me with the original
(Note that if at least one request in a The interesting bit is that I added a "hypothetical error" by additionally attempting to get an error using Thus, it seems like we're setting |
I think it has something to do with range splits. I disabled |
I think I understand what's happening. The problem is in the DistSender when it has to retry a BatchRequest because the range cache had stale routing information.
Here is what I think is happening:
AFAICT this bug has been present for a long time, and it's not multitenant-specific - we just happened to hit it in this config. |
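For readers unfamiliar with the retry path mentioned above, here is a minimal Go sketch of the general pattern: a sender caches key-to-range routing, the server rejects a request routed with a stale entry, and the sender evicts the entry and retries. This is a toy model, not CockroachDB's DistSender code, and it does not reproduce the bug discussed in this issue. All names in it (`cluster`, `sender`, `errStaleRoute`, `splitKey`) are hypothetical and chosen only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errStaleRoute is a hypothetical error returned when a request reaches
// a range that no longer owns the key (for example, after a split).
var errStaleRoute = errors.New("stale routing information")

// cluster is a toy stand-in for the authoritative routing state.
// Keys >= splitKey live on range 2; all other keys live on range 1.
type cluster struct {
	splitKey string
}

// lookup returns the range that currently owns key.
func (c *cluster) lookup(key string) int {
	if key >= c.splitKey {
		return 2
	}
	return 1
}

// serve rejects requests that were routed to the wrong range.
func (c *cluster) serve(rangeID int, key string) (string, error) {
	if c.lookup(key) != rangeID {
		return "", errStaleRoute
	}
	return fmt.Sprintf("r%d:%s", rangeID, key), nil
}

// sender caches key-to-range routing and retries on stale entries.
type sender struct {
	cluster *cluster
	cache   map[string]int
}

// send routes key using the cache, evicting and retrying when the
// cached entry turns out to be stale. It reports the attempts used.
func (s *sender) send(key string) (resp string, attempts int, err error) {
	const maxAttempts = 3
	for attempts = 1; attempts <= maxAttempts; attempts++ {
		rangeID, ok := s.cache[key]
		if !ok {
			// Cache miss: consult the authoritative state.
			rangeID = s.cluster.lookup(key)
			s.cache[key] = rangeID
		}
		resp, err = s.cluster.serve(rangeID, key)
		if !errors.Is(err, errStaleRoute) {
			return resp, attempts, err
		}
		// Stale entry: evict it so the next attempt looks it up again.
		delete(s.cache, key)
	}
	return "", maxAttempts, err
}

func main() {
	c := &cluster{splitKey: "m"}
	// The cache still maps "x" to range 1, as if a split had moved it.
	s := &sender{cluster: c, cache: map[string]int{"x": 1}}
	resp, attempts, err := s.send("x")
	fmt.Println(resp, attempts, err)
}
```

The point of the sketch is only that a retry after a routing refresh is a second pass over the same request, so any per-request bookkeeping carried across attempts has to stay consistent with the re-routed request.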
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed on master @ 59a69ae0c5474a8b323afac269055bf6ce8afa34:
Parameters: Same failure on other branches
|
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed on master @ 59a69ae0c5474a8b323afac269055bf6ce8afa34:
Parameters: Same failure on other branches
|
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed on master @ 59a69ae0c5474a8b323afac269055bf6ce8afa34:
Parameters: Same failure on other branches
|
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed on master @ 59a69ae0c5474a8b323afac269055bf6ce8afa34:
Parameters: Same failure on other branches
|
FYI: The previous 4 comments are failures on EngFlow. The failures seem to be the same as the earlier ones. |
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed on master @ bf9a22dc85683966ee65e84d3eeadf2b44185127:
Parameters: Same failure on other branches
|
I should get to it this week, so I'll take this over. |
I think this diff
should fix the problem. I'm currently thinking through (and asking for pointers on) how to write a targeted unit test. |
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ 8315a4bc997fb8b8d679079e14d6d7ca94d53bc6:
Parameters: Same failure on other branches
|
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ 2205334339583cea57db33e3c0d992f672e660e8:
|
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ bfc5c35b112fe612d837ef247b2c61e532e8184d:
Parameters: |
pkg/ccl/logictestccl/tests/multiregion-9node-3region-3azs-tenant/multiregion-9node-3region-3azs-tenant_test.TestCCLLogic_regional_by_row_read_committed failed with artifacts on master @ 27e7c166b5591b0e05bb2fd23780309b490d7af9:
Parameters: TAGS=bazel,gss, stress=true
Jira issue: CRDB-31919