
ccl: TestCCLLogic occasionally gets stuck #83630

Closed
yuzefovich opened this issue Jun 30, 2022 · 4 comments
Labels
A-multitenancy Related to multi-tenancy C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@yuzefovich
Member

yuzefovich commented Jun 30, 2022

I've been trying to figure out the cause of the increase in "Execution timeouts" (see the failures on the staging branch). I think I ran into three such timeouts manually on the gceworker.

The first time was on CREATE UNIQUE INDEX idx_uniq_hash_email ON t_unique_hash_sec_key (email) USING HASH; in regional_by_row_hash_sharded_index_query_plan.

The second time was on CREATE INDEX ON t_to_be_hashed (b) USING HASH; in regional_by_row_hash_sharded_index.

The third time was right after TestCCLLogic/multiregion-9node-3region-3azs-tenant/multi_region started:

[00:33:24] SELECT * FROM crdb_internal.validate_multi_region_zone_configs();
[00:33:24] --- done: /home/yuzefovich/go/src/github.com/cockroachdb/cockroach/pkg/ccl/logictestccl/testdata/logic_test/regional_by_table_placement_restricted with config multiregion-9node-3region-3azs: 23 tests, 0 failures
[00:33:24] --- total progress: 2523 statements
=== RUN   TestCCLLogic/multiregion-9node-3region-3azs-tenant
=== RUN   TestCCLLogic/multiregion-9node-3region-3azs-tenant/multi_region

Unfortunately, I only have the logs for the third failure (the first two were run through Bazel, which doesn't dump the goroutines). Here is the log from the third timeout:
ccl.zip

I haven't analyzed this log, but this seems like a schema-related issue, so I'm putting it on the corresponding project board.

Jira issue: CRDB-17180

@yuzefovich yuzefovich added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jun 30, 2022
@blathers-crl blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 30, 2022
@ajwerner
Contributor

There are some interesting things in the logs you uploaded. The test is stuck in cluster creation; this is a tenant test, and it is stuck setting a cluster setting:

github.com/cockroachdb/cockroach/pkg/sql/logictest.(*logicTest).newCluster(0xc0136166c0, {0x0, 0x0, 0x0}, {0x0, 0x0, 0x0}, {0xc00c0c4cf0, 0x1, 0x1})
        /home/yuzefovich/go/src/github.com/cockroachdb/cockroach/pkg/sql/logictest/logic.go:1893 +0x114e
github.com/cockroachdb/cockroach/pkg/sql/logictest.(*logicTest).setup(0xc0136166c0, {{0x507eefa, 0x25}, 0x9, 0x0, {0x0, 0x0}, 0x0, {0x0, 0x0}, ...}, ...)
        /home/yuzefovich/go/src/github.com/cockroachdb/cockroach/pkg/sql/logictest/logic.go:2180 +0x2a6
github.com/cockroachdb/cockroach/pkg/sql/logictest.RunLogicTestWithDefaultConfig.func1.1(0xc007524820)
        /home/yuzefovich/go/src/github.com/cockroachdb/cockroach/pkg/sql/logictest/logic.go:4502 +0x3d2

if _, err := conn.Exec("SET CLUSTER SETTING kv.tenant_rate_limiter.rate_limit = 100000"); err != nil {
	t.Fatal(err)
}

Looking a bit deeper, it appears to be some sort of deadlock. One thing that stuck out to me is that one of the transactions is trying to write an event log entry. The transaction handling inside the cluster settings code is also a little surprising and opaque: there is a restriction against setting cluster settings in an explicit or multi-statement transaction, presumably to support the manual transaction management here:

err = execCfg.DB.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {

The transaction that appears blocked is actually running underneath the logFn that was passed into that function. I think this all points away from a schema-related issue and toward cluster-setting and event-log related ones.

@ajwerner
Contributor

Oh, I should note it's writing a cluster setting to the system tenant IIUC.

@ajwerner
Contributor

ajwerner commented Jul 1, 2022

Can confirm this is also waiting on request lease. I think this and #83664 are going to end up being #83687.

@knz knz added A-multitenancy Related to multi-tenancy and removed T-sql-schema-deprecated Use T-sql-foundations instead labels Jul 11, 2022
@nvanbenschoten
Member

Solved by 139dc42.
