Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testserver: racy access to DisableDefaultTestTenant #104500

Closed
knz opened this issue Jun 7, 2023 · 0 comments · Fixed by #108762
Closed

testserver: racy access to DisableDefaultTestTenant #104500

knz opened this issue Jun 7, 2023 · 0 comments · Fixed by #108762
Assignees
Labels
A-multitenancy Related to multi-tenancy A-testing Testing tools and infrastructure C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-multitenant Issues owned by the multi-tenant virtual team

Comments

@knz
Copy link
Contributor

knz commented Jun 7, 2023

Describe the problem

This code in (ts *TestServer) maybeStartDefaultTestTenant:

  if err := base.CheckEnterpriseEnabled(ts.st, clusterID(), "SQL servers"); err != nil {
    ts.cfg.DisableDefaultTestTenant = true   // <- HERE
    return nil
  }

possibly races with the following code in (s *Server) makeSharedProcessTenantConfig:

  parentCfg := s.cfg   // <- HERE

The race is triggered when the following conditions hold simultaneously:

  • ccl.TestingEnableEnterprise() has not been called
  • a test server is started via serverutils.StartServer
  • concurrently, the test makes the server controller internally start a tenant server (eg as side effect of alter tenant start service shared)

These conditions are rather unlikely and currently only happen in configprofiles.TestDataDriven (this is the root cause of test failures #102800 and #102408.)

Expected behavior

I don't think it's safe/reasonable for the code in (ts *TestServer) maybeStartDefaultTestTenant to write to its ts.cfg.

I also don't think it's even necessary for this logic.

Jira issue: CRDB-28566
Epic: CRDB-18499

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure A-multitenancy Related to multi-tenancy T-multitenant Issues owned by the multi-tenant virtual team labels Jun 7, 2023
knz added a commit to knz/cockroach that referenced this issue Jun 7, 2023
This test was experiencing a failure due to a race condition.
The root cause is described in issue cockroachdb#104500 and will be addressed
separately. This change ensures that the test does not encounter the
code path at all.

Release note: None
craig bot pushed a commit that referenced this issue Jun 7, 2023
98889: kvserver: increase rebalance action impact req r=andrewbaptist a=kvoli

Previously, the store rebalancer would attempt to transfer leases or relocate replicas of a range so long as the load of the range (estimated using the local leaseholder replica) was greater than 0.1% of the store's load.

This commit updates the value to be higher for both lease rebalancing and range rebalancing, specifically:

lease rebalancing: 0.1% -> 0.5%
range rebalancing: 0.1% -> 2.0%

The documented rationale, as for why these values exist is also the reason for the increase: decrease unnecessary churn. Only the hottest ranges are considered, when the impact of a rebalance action for these ranges is estimated to be less than a small fraction of the store's total load, it is indicative that the distribution of replica load is not skewed. Letting the replicate queue balance replica and lease counts is a better option in such distributions (until we consolidate rebalancing responsibility).

Making the minimum for range rebalancing higher is also added, as multiple replicas for a range is inherently more costly than moving a lease.

Resolves: #98890

Release note: None

102813: kvserver: write to system.rangelog async r=nvanbenschoten a=miraradeva

Previously, writing to the system.rangelog table was done as part of the
transaction that triggered the range change (e.g. split, merge,
rebalance). If the logging failed for some reason (e.g. JSON being
logged was too large), the entire transaction needed to be retried.
This has caused at least one major incident.

This change introduces the option to write to system.rangelog async, and
not as part of the original transaction; this option is also now used
by default for all writes to system.rangelog. When logging async, the
actual write to system.rangelog is done in an async task executed as a
commit trigger after the original transaction ends.

Epic: [CRDB-16517](https://cockroachlabs.atlassian.net/browse/CRDB-16517)

Fixes: #82538

Informs: #104075

Release note: None

104370: storage: better error message for non-dev store with dev binary r=RaduBerinde a=RaduBerinde

Release note: None
Fixes: #103647

104454: bazel: mark targets as being `no-remote-exec` instead of `no-remote` r=healthy-pod a=rickystewart

`no-remote` implies `no-remote-cache` as well. Properly these targets can be cached remotely, they just can't be executed remotely.

Epic: CRDB-8308
Release note: None

104501: configprofiles: deflake TestDataDriven r=itsbilal a=knz

Informs #104500.
Fixes #102408.

This test was experiencing a failure due to a race condition. The root cause is described in issue #104500 and will be addressed separately. This change ensures that the test does not encounter the code path at all.

Release note: None

Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Mira Radeva <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Jun 7, 2023
This test was experiencing a failure due to a race condition.
The root cause is described in issue #104500 and will be addressed
separately. This change ensures that the test does not encounter the
code path at all.

Release note: None
@knz knz self-assigned this Aug 15, 2023
@craig craig bot closed this as completed in #108762 Aug 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multitenancy Related to multi-tenancy A-testing Testing tools and infrastructure C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-multitenant Issues owned by the multi-tenant virtual team
Projects
None yet
1 participant