ccl/logictestccl: TestTenantLogic failed #63466
Comments
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 617e1ed8b755fcb488fd90c15c69eb68ba1dd64e:
Reproduce
To reproduce, try: make stressrace TESTS=TestTenantLogic PKG=./pkg/ccl/logictestccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1
Parameters in this failure:
Same failure on other branches
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 8939515baf9fee3b605ace650629c24096fc38fd:
Reproduce
To reproduce, try: make stressrace TESTS=TestTenantLogic PKG=./pkg/ccl/logictestccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1
Parameters in this failure:
Same failure on other branches
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ b35fe177c0fb6bf58ba00357e152ecaf6c1e0a2a:
Reproduce
To reproduce, try: make stressrace TESTS=TestTenantLogic PKG=./pkg/ccl/logictestccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1
Parameters in this failure:
Same failure on other branches
|
@nvanbenschoten not sure if you're already working on this, but this looks like it might be a Queries issue |
AFAICT the test assumes that this is enough to trigger auto stats:
This is 1000 rows. Our random coin is tuned to get stats for 500 rows, so it wouldn't be surprising that we get unlucky sometimes. That would not explain why we only saw this starting in April though. |
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ c995342ead51e08f8ed1155de4218d30a00d86d2:
Reproduce
To reproduce, try: make stressrace TESTS=TestTenantLogic PKG=./pkg/ccl/logictestccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1
Parameters in this failure:
Same failure on other branches
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 1c46e1cd4e5be986bf9d13799bb7e13ddc896ed2:
Reproduce
To reproduce, try: make stressrace TESTS=TestTenantLogic PKG=./pkg/ccl/logictestccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1
Parameters in this failure:
Same failure on other branches
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 967ed00f80981ce8848a5e8144ee6fbd29bc95bb:
Reproduce
To reproduce, try: make stressrace TESTS=TestTenantLogic PKG=./pkg/ccl/logictestccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1
Parameters in this failure:
Same failure on other branches
|
I was able to reproduce this by running |
Help me understand how this test works in the first place :) The table has 1000 rows and we update 300 of them. But the "target rows" will be 0.2 * 1000 + 500 = 700. So if an update gets triggered, it is because of luck or one of the other heuristics (avg refresh time). In addition, note that the mutation batch size can randomly be reduced in tests. Then we may be processing one mutated row at a time, and that has some non-trivial probability of not firing even if the updated count was higher than the target rows (e.g. it's about |
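To make the arithmetic in the comment above concrete, here is a minimal sketch of the "random coin" reasoning. It is not the refresher's actual code. It assumes a simplified model in which each mutation batch independently triggers a refresh with probability `rows_affected / target`, and it ignores the other heuristics (such as average refresh time). The function names and the defaults of 0.2 and 500 are illustrative, taken only from the numbers quoted in this thread.

```python
def target_rows(row_count, fraction_stale=0.2, min_stale_rows=500):
    """Stale-row target from the thread: fraction * row_count + minimum."""
    return int(row_count * fraction_stale) + min_stale_rows


def refresh_probability(rows_affected, target):
    """ASSUMED model: one batch fires with probability rows_affected / target."""
    return min(1.0, rows_affected / target)


def miss_probability(total_rows, batch_size, target):
    """Probability that no batch fires, assuming independent coin flips."""
    full_batches, remainder = divmod(total_rows, batch_size)
    p_miss = (1.0 - refresh_probability(batch_size, target)) ** full_batches
    if remainder:
        p_miss *= 1.0 - refresh_probability(remainder, target)
    return p_miss


target = target_rows(1000)  # 0.2 * 1000 + 500 = 700

# 300 updated rows delivered as a single batch.
single_batch_miss = miss_probability(300, 300, target)

# The same 300 rows delivered one row at a time (reduced batch size).
row_at_a_time_miss = miss_probability(300, 1, target)

# Even `target` rows, one at a time, can fail to fire under this model.
over_target_miss = miss_probability(target, 1, target)

print(f"target rows: {target}")
print(f"miss, 300 rows in one batch:     {single_batch_miss:.3f}")
print(f"miss, 300 rows one at a time:    {row_at_a_time_miss:.3f}")
print(f"miss, {target} rows one at a time: {over_target_miss:.3f}")
```

Under these assumptions, updating 300 of 1000 rows is more likely to miss than to fire, and splitting the same update into single-row batches makes a miss somewhat more likely still. That matches the observation that the test only passes through luck or one of the other heuristics.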
Sorry for the slow reply... In cockroach/pkg/sql/logictest/logic.go Line 1542 in dd2884a
So
This test has |
Yes (to be precise it is now 100k after Andrei's recent change). |
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 9ba8499e80a3234da094e061827f1c23d9d33341:
Reproduce
To reproduce, try: make stressrace TESTS=TestTenantLogic PKG=./pkg/ccl/logictestccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1
Parameters in this failure:
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 7d752a1ecac82d6f44ddb0a3b199f94a6ff85d76:
Reproduce
To reproduce, try: make stressrace TESTS=TestTenantLogic PKG=./pkg/ccl/logictestccl TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1
Parameters in this failure:
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ d91fead28392841a943251842fbd43a0affb2eca:
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
Parameters in this failure:
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ f75d6d7ac298c3852f9c5c156b136a8170f483c2:
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
Parameters in this failure:
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 28bb1ea049da5bfb6e15a7003cd7b678cbc4b67f:
Fatal error:
Stack:
Log preceding fatal error
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
Parameters in this failure:
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ a5e32e1d18aa07a65eb44063177a6c196623f360:
Fatal error:
Stack:
Log preceding fatal error
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
Parameters in this failure:
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 3b30a0e12f9a14b08ee8ad55b50299aca50c67a2:
Fatal error:
Stack:
Log preceding fatal error
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
Parameters in this failure:
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 2c014c47c1a242f504f6d595bfd79c0edc20b90a:
Fatal error:
Stack:
Log preceding fatal error
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
Parameters in this failure:
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 506d129f5f187134c35e2f71860490e044fde989:
Fatal error:
Stack:
Log preceding fatal error
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
Parameters in this failure:
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 39923c0b11d229b394fa5498ee58e455cae8ec99:
Fatal error:
Stack:
Log preceding fatal error
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
Parameters in this failure:
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ e89328d92398a3e2d6487179845a51e7f1caa435:
Fatal error:
Stack:
Log preceding fatal error
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
Parameters in this failure:
Same failure on other branches
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 4c1b9fb7ac9e058111cebacdb4d98c9ebacf6bf7:
Help
See also: How To Investigate a Go Test Failure (internal)
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ d8278a096c4164b6be0eb5d6a77b2838c527dc84:
Help
See also: How To Investigate a Go Test Failure (internal)
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 912964e02ddd951c77d4f71981ae18b3894e9084:
Fatal error:
Stack:
Log preceding fatal error
Help
See also: How To Investigate a Go Test Failure (internal)
|
I've finally had a chance to make some progress on this. I enabled event logging for auto stats and added a print statement in the automatic stats refresher code to see what was going on, and here is a snippet of the log output when the test fails:
In particular, notice these four lines, which correspond to the print statement I added:
We only force a refresh if Normally, these four lines are combined into one:
Since Clearly, there is something about the |
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 6aa6c727b1d990bc0e3e8fbc36e25fc358ba39c1:
Help
See also: How To Investigate a Go Test Failure (internal)
|
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ ae101ea32d99a9142a319b0a1f6850ee76d55cd9:
Help
See also: How To Investigate a Go Test Failure (internal)
|
I'm going to go ahead and disable |
75451: backupccl,spanconfig,kvserver: ExportRequest noops on ephemeral ranges r=adityamaru a=adityamaru
This change is the first of two changes that get us to the goal of backup ignoring certain table row data, and not holding up GC on these ranges. This change does a few things:
- It sets up the transport of the exclude_data_from_backup bit set on a table descriptor, to the span configuration applied in KV.
- It teaches ExportRequest on a range marked as excluded to return an empty ExportResponse. In this way, a backup processor will receive no row data to back up for an ephemeral table.
- A follow up change will also teach the SQLTranslator to not populate the protected timestamp field on the SpanConfig for such tables. This way, a long running backup will not hold up GC on such high-churn tables.
With no protection on such ranges, it is possible that an ExportRequest targeting the range has a StartTime below the range's GCThreshold. To avoid the returned BatchTimestampBeforeGCError from failing the backup we decorate the error with information about the range being excluded from backup and handle the error in the backup processor.
Informs: #73536
Release note (sql change): BACKUP of a table marked with `exclude_data_from_backup` via `ALTER TABLE ... SET (exclude_data_from_backup = true)` will no longer back up that table's row data. The backup will continue to back up the table's descriptor and related metadata, and so on restore we will end up with an empty version of the backed up table.
76459: kvnemesis: update table ID r=RaduBerinde a=RaduBerinde
These tests hardcode a table ID of 50. This now overlaps with the tenant_settings table. Updating to 100, which is now the first user-created ID in a new cluster.
Release note: None
76480: kvstreamer: remove a memory leak r=yuzefovich a=yuzefovich
At the moment, we have a memory leak of `Streamer` objects (although nil-ed out) because of `SetOnChange` handler of the streamer concurrency limit cluster setting and passing in a closure into `Stopper.AddCloser`. This was copied over from the `DistSender` code, but a crucial difference wasn't appreciated - we have a single global `DistSender` that lives throughout the uptime of the server whereas each `Streamer` object lives only during the query execution. We don't need to dynamically react to changes in the streamer concurrency limits, so this commit removes the handler. The closure has been refactored too.
Fixes: #76471.
Release note: None
76481: colexec: remove log scope from benchmarks r=yuzefovich a=yuzefovich
Using the log scope for benchmarks is not necessary and produces somewhat annoying output where the benchmark results are alternating with the log scope messages.
Release note: None
76486: opt: fix bug in histogram estimation code for multi-column spans r=rytaft a=rytaft
This commit fixes a bug in the histogram estimation code, which could cause the optimizer to think that an index scan produced 0 rows, when in fact it produced a large number. This was due to an inaccurate assumption in the histogram filtering code that if a span had an exclusive boundary, the upper bound of the span was excluded from the histogram. However, this failed to account for the fact that we support constraining a histogram with multi-column spans, and we can select different column offsets to use to constrain the histogram. The assumption above is only valid if the column offset corresponds to the last column in the span key. This logic has now been fixed.
Fixes #76485
Release note (performance improvement): Fixed a bug in the histogram estimation code that could cause the optimizer to think a scan of a multi-column index would produce 0 rows, when in fact it would produce many rows. This could cause the optimizer to choose a suboptimal plan. This bug has now been fixed, making it less likely for the optimizer to choose a suboptimal plan when multiple multi-column indexes are available.
76518: sql/builtins: remove the `root` special case r=rafiss,dt a=knz
Discovered by `@dt`. This was leftover complexity from an earlier age.
Release note (sql change): The built-in functions `crdb_internal.force_panic`, `crdb_internal.force_log_fatal`, `crdb_internal.set_vmodule`, `crdb_internal.get_vmodule` are now available to all `admin` users, not just `root`.
76519: sql: deflake TestTenantLogic/3node-tenant/distsql_automatic_stats r=rytaft a=rytaft
This commit disables the `3node-tenant` config for the `distsql_automatic_stats` automatic stats test since it's flaky. It also adds comments to explain why.
Fixes #63466
Release note: None
Co-authored-by: Aditya Maru
Co-authored-by: Radu Berinde
Co-authored-by: Yahor Yuzefovich
Co-authored-by: Rebecca Taft
Co-authored-by: Raphael 'kena' Poss
This commit disables the 3node-tenant config for the distsql_automatic_stats automatic stats test since it's flaky. It also adds comments to explain why. Fixes cockroachdb#63466 Release note: None
ccl/logictestccl.TestTenantLogic failed with artifacts on master @ 3e543621a81be7953bee980a6493beec952c693d:
Reproduce
To reproduce, try:
Parameters in this failure:
Same failure on other branches