pkg/ccl/serverccl/statusccl/statusccl_test: TestTenantStatusAPI failed #92979
Automated failure reports: pkg/ccl/serverccl/statusccl/statusccl_test.TestTenantStatusAPI failed with artifacts on master (same failure on other branches) at each of the following commits:
93ed65565357538c9048ff45c878a493f2ed9b45
94e03c016955dfd64d4e15358ea92226a8f362aa
642afd6e8c8d1f967da4aa0c3c08e2bdc0495100
b6434c942b4811c088349c43a81a46a53fb9b6c0
6eabc2f348710f4146ff984bc346cd8cbe96f83b
5fbcd8a8deac0205c7df38e340c1eb9692854383
64a867acd25c0a214209957eefb6483d1158b4f0
Attempted to repro the pagination test locally and could not.
Further automated failure reports: the test failed with artifacts on master (same failure on other branches) at each of the following commits:
a0ab818e89508ca0b65926a4faac4c563d114acf
af3bf237ce3774258a13a0c2aa0e2da22716270f
7e2df35a2f6bf7a859bb0539c8ca43c4e72ed260
c95bef097bd4c213c6b5c0c125a9a846c4479d73
5b2a5670cbbe895d76602c230390816e783e0caa
95684388bfbb372ac8c7e1a8ebd4a0d6f447e147
e4924e2b9be4a36d466beab53a80df9241df4783
43f7cde6f036f720388912bde7310095c2f8ca9e
ca5ae38022c45aec434fa1dd98f58c76e37643f5
9b43dc50d468171bc9919dc684b7274b63e13da2
I've been looking more into this flake. The specific test runs the following procedure: it requests a single range (expecting the response's pagination offset to come back as `1`), then uses that offset to request the next range (expecting the offset to come back as `2`).
What does it mean for this assertion to fail? That is, what does it mean when the offset comes back as `0`? It means that there are no more ranges for the tenant. The test definitely appears to be failing due to a race condition. When creating the test, we initialize the tenant, which somehow triggers the creation of ranges for the tenant. It seems that these ranges are, in rare instances, not initialized in the way we'd expect, which is what's causing this assertion to fail: we expect a certain number of ranges to exist, and they don't. I am going to look into some options to protect against this case. Can we maybe use `testutils.SucceedsSoon` here?
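For concreteness, here is a minimal sketch of the shape of those assertions, assuming `Limit`/`Offset` request fields and a `Next` offset in the response (the exact `serverpb` field names and the `tenantRangesClient` interface are my assumptions for this sketch, not quotes from the real test):

```go
import (
	"context"
	"testing"

	"github.com/cockroachdb/cockroach/pkg/server/serverpb"
	"github.com/stretchr/testify/require"
)

// tenantRangesClient is a hypothetical stand-in for the tenant status
// server that the real test talks to.
type tenantRangesClient interface {
	TenantRanges(ctx context.Context, req *serverpb.TenantRangesRequest) (*serverpb.TenantRangesResponse, error)
}

func testPaginationShape(ctx context.Context, t *testing.T, c tenantRangesClient) {
	// First request: fetch one range; the returned offset should be 1.
	resp, err := c.TenantRanges(ctx, &serverpb.TenantRangesRequest{Limit: 1})
	require.NoError(t, err)
	require.Equal(t, 1, int(resp.Next))

	// Second request: resume from the returned offset; expect offset 2.
	resp, err = c.TenantRanges(ctx, &serverpb.TenantRangesRequest{Limit: 1, Offset: resp.Next})
	require.NoError(t, err)
	// The flake: Next sometimes came back as 0 here, i.e. "no more
	// ranges", because fewer than 3 tenant ranges existed at that point.
	require.Equal(t, 2, int(resp.Next))
}
```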
Thanks for the investigation. Was there a way to wait until the ranges are ready?
You're right, SpanConfig reconciliation is asynchronous for tenants. However, when we create a tenant, we synchronously write a SpanConfig record for it -- one that covers its entire range. The splitting of this range is asynchronous, though. If there's a race condition, maybe that's the reason for it? That is, the split queue may not have come around to do its thing.
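To make that asymmetry concrete, here's a minimal sketch, under the assumption that the tenant's range count is observable via `crdb_internal.ranges` from inside the tenant (the helper name is hypothetical):

```go
import "database/sql"

// splitQueueCaughtUp is a hypothetical helper: the tenant's span config
// record exists as soon as the tenant is created, but crdb_internal.ranges
// only reflects ranges once the split queue has actually split them off.
func splitQueueCaughtUp(tenantDB *sql.DB, wantRanges int) (bool, error) {
	var n int
	if err := tenantDB.QueryRow(
		`SELECT count(*) FROM crdb_internal.ranges`,
	).Scan(&n); err != nil {
		return false, err
	}
	// Until the asynchronous split queue runs, n can be smaller than expected.
	return n >= wantRanges, nil
}
```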
@arulajmani thanks for the context, this is helpful. I don't believe we do anything special here when we create the tenants. You mention that the splitting of the range is async. Is it a normal procedure for a newly started tenant to split this range? I am not familiar with this level of detail when it comes to creating & starting a new tenant, but if the range split is part of the process, that sounds like it could be the culprit. (I was looking to see if we have any testing knobs around this, but it doesn't seem like it.)
Two more automated reports: the test failed with artifacts on master @ 53dbb86acb1d48309530181b94838faf937084d3 and @ 9acc75317aebfdfe69fb097a8a28e0040c7a67fd (same failure on other branches).
Yeah, we expect a newly created tenant to split its range off.
And when is this expected to happen? I see in [comment truncated]
#99048 requests a documentation update to outline the point-in-time nature of this endpoint. The endpoint here performed as expected. The problem lies in the test itself, which incorrectly assumes that tenant creation is deterministic with respect to how many ranges exist by the time the tenant is ready. #99054 will deal with the flakes by waiting for the split queue to process enough splits that sufficient ranges exist for the test to pass. Given this, I'm going to remove the GA-blocker label: the endpoint performs as expected, we're elaborating on this behavior in docs, and the client has a workaround (make the request again).
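A minimal sketch of what the #99054-style fix looks like, placed inside the test body before the second paginated request, assuming `testutils.SucceedsSoon` and a SQL connection into the tenant (the `tenantDB` local is my name for it; the real patch lives in the statusccl test):

```go
import (
	"fmt"

	"github.com/cockroachdb/cockroach/pkg/testutils"
)

// Wait until the split queue has produced enough ranges for the
// pagination assertions to be meaningful (the test needs at least 3).
testutils.SucceedsSoon(t, func() error {
	var n int
	if err := tenantDB.QueryRow(
		`SELECT count(*) FROM crdb_internal.ranges`,
	).Scan(&n); err != nil {
		return err
	}
	if n < 3 {
		return fmt.Errorf("only %d tenant ranges, want at least 3", n)
	}
	return nil
})
// Only now make the second paginated request and assert that the
// returned offset is 2.
```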
Another automated report: the test failed with artifacts on master @ b89fa2cc4bc1fb9447ab1009b34b6f354f9618f0 (same failure on other branches).
98527: sql: deprecate gossip-based physical planning r=dt a=dt. Previously, system tenants and secondary tenants used different physical-planning implementations, with the system tenant (and only the system tenant) using node IDs while other tenants used the instance table. This unifies those implementations so that all tenants use node IDs if running in mixed mode and use the instance table if not. Release note: none. Epic: CRDB-16910

98879: sql: support array-flatten subqueries within UDFs r=mgartner a=mgartner. Array-flatten subqueries (e.g., `ARRAY(SELECT a FROM t)`) are now supported within UDFs. They are converted to a normal subquery with a ScalarGroupBy if they exist within a UDF, even if they are uncorrelated. This allows them to be executed without any changes to the execbuilder or the evaluation logic of `tree.Routine`. Fixes #98738. Release note: None

98988: ui: add badges for filter elements r=maryliag a=maryliag. Adds badges for each of the selected filters on the SQL Activity and Insights pages. Part of #98891. Demo: https://www.loom.com/share/e7417209fc704d71b2733f58609fb4de; screenshot: https://user-images.githubusercontent.com/1017486/226195412-03d3ef58-5d6d-4c24-84f1-6a55b952f5c0.png. Release note (ui change): Adds badges for each selected filter on SQL Activity and Insights pages.

99042: sql: added indexes to system.statement_statistics, system.transaction_statistics r=ericharmeling a=ericharmeling. This commit adds indexes on new computed columns to the system.statement_statistics and system.transaction_statistics tables. Fixes #98624. Epic: none. Release note: None

99054: pkg/ccl: Unskip TestTenantStatusAPI/tenant_ranges/pagination r=dhartunian a=abarganier. Fixes: #92979. Previously, in #97386, we skipped test_tenant_ranges_pagination because it was marked as flaky. The test makes a request for a single range and expects an offset of `1` back. It then uses this offset to request a second range, and expects an offset of `2`. This means the test requires at least 3 ranges to exist on the tenant. The test was flaking on the assertion that the offset returned by the second request came back as `2`: instead, it sometimes came back as `0`, which signifies that there are no more ranges to process. We learned that the tenant create process splits ranges asynchronously, which is what would lead to this sporadic scenario where not enough ranges existed (yet) for the test to succeed. This patch updates the test with a `testutils.SucceedsSoon` clause that first checks that `crdb_internal.ranges` contains at least 3 ranges before making the second request. This should provide sufficient time for the range split queue to be processed and eliminate the vast majority of these test flakes. Release note: none

99119: kvserver: mark in-flight storage checkpoints r=tbg a=pavelkalinnikov. This commit makes it so that consistency-checker checkpoints are first created in a "_pending" folder, and only after completion are atomically renamed to the intended directory name. This helps distinguish valid checkpoints from ones that weren't finalized (for example, when the node crashed in the meantime). Part of #81819. Epic: none. Release note (ops change): checkpoint directories (which can be created in the rare event of range inconsistency) are now clearly indicated as pending until they are fully populated. This is to help operators distinguish valid checkpoints from corrupted ones.
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: maryliag <[email protected]>
Co-authored-by: Eric Harmeling <[email protected]>
Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Fixes: cockroachdb#92979. Previously, in cockroachdb#97386, we skipped test_tenant_ranges_pagination because it was marked as flaky. The test makes a request for a single range and expects an offset of `1` back. It then uses this offset to request a second range, and expects an offset of `2`. This means the test requires at least 3 ranges to exist on the tenant. The test was flaking on the assertion that the offset returned by the second request came back as `2`: instead, it sometimes came back as `0`, which signifies that there are no more ranges to process. We learned that the tenant create process splits ranges asynchronously, which is what would lead to this sporadic scenario where not enough ranges existed (yet) for the test to succeed. This patch updates the test with a `testutils.SucceedsSoon` clause that first checks that `crdb_internal.ranges` contains at least 3 ranges before making the second request. This should provide sufficient time for the range split queue to be processed and eliminate the vast majority of these test flakes. Release note: none
pkg/ccl/serverccl/statusccl/statusccl_test.TestTenantStatusAPI failed with artifacts on master @ ca079ac1fc8a8f7a5360f5b1f4fa210c14dc8410 (same failure on other branches).
Parameters: TAGS=bazel,gss
Jira issue: CRDB-22078