Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testutils: the multi-tenant test configs are too complex #106772

Open
knz opened this issue Jul 13, 2023 · 1 comment
Open

testutils: the multi-tenant test configs are too complex #106772

knz opened this issue Jul 13, 2023 · 1 comment
Labels
A-multitenancy Related to multi-tenancy A-testing Testing tools and infrastructure C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. db-cy-23 T-multitenant Issues owned by the multi-tenant virtual team

Comments

@knz
Copy link
Contributor

knz commented Jul 13, 2023

xref #76378

Problem description

There are too many test configs.

  • Whether or not to start a tenant server to run the test's queries:

    • random (default - desirable)
    • always-off
    • always-on
  • Which tenant server config to use:

    • Serverless-like config, using a networked KV tenant config + rate limiting - StartTenant()
    • Dedicated-like config - StartSharedProcessTenant()

All these choices results in (currently) 6 different possible test configurations.

In addition to that complexity, some tests ignore the config above and configure their test tenants manually themselves.

In those tests, it's possible to see at run-time both the test tenant created by the test framework, and the test tenant created by the test code, side-by-side. This is confusing.

Expected solution

We want to collapse the selection of whether or not to use a test tenant and which server config to use into just 1 knob with just 2 possible values:

  • auto-test-tenant-probabilistically (default):

    • in CCL builds: 30% chance to have no secondary tenant; 30% to use serverless-like config, 30% to use dedicated-like config.
    • in OSS builds: 30% chance to have no secondary tenant; 60% chance to use the dedicated-like config.
  • no-auto-test-tenant: no secondary tenant created by the test framework.

The latter setting can have informational variants (like those defined in #106768) to distinguish cases where there's a bug (yet to be fixed) and from cases when the test truly wishes no automatic behavior.

Jira issue: CRDB-29709

Epic CRDB-31933

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure A-multitenancy Related to multi-tenancy labels Jul 13, 2023
@knz
Copy link
Contributor Author

knz commented Jul 13, 2023

Another thought that came up during tech pod today:

use the same mechanism "server controller" to start the secondary tenants in TestServer, both for the serverless-like and the dedicated-like configs. Currently TestServer uses bespoke orchestration for the serverless-like case (StarTenant) but we could instead make server controller responsible for running the service.

craig bot pushed a commit that referenced this issue Jul 14, 2023
106768: base,server,testutils: clarify the multi-tenant test situations r=yuzefovich a=knz

Informs #76378 .
Informs #106772.
Epic: CRDB-18499

Prior to this patch, a test could choose only between the following
options:

- `TestTenantProbabilisticOnly` and `TestTenantProbabilistic`:
  whether or not the load goes to a secondary tenant is randomly
  selected for each test.

- `TestTenantEnabled`: always use a secondary tenant.

- `TestTenantDisabled`: always disable the secondary tenant.

In particular the last option was problematic, as we ended up using it
both in the case "the test doesn't work with secondary tenants and
*we don't know why*" and the case "it doesn't work, we know why
*and it's OK to keep it that way*".

We actually want to distinguish these cases, because the former
case requires further investigation, and we want the test code
to refer to this follow-up work more clearly.

To address this, this commit defines the following options:

```go
// (UNCHANGED)
TestTenantProbabilisticOnly
// (UNCHANGED)
TestTenantProbabilistic

// RENAMED: was previously "TestTenantEnabled"
//
// TestTenantAlwaysEnabled will always start the default test tenant. This is useful
// for quickly verifying that a test works with tenants enabled.
//
// Note: this value should not be used for checked in test code
// unless there is a good reason to do so. We want the common case
// to use TestTenantProbabilistic or TestTenantProbabilisticOnly.
TestTenantAlwaysEnabled

// RENAMED: was previously "TestTenantDisabled"
//
// TODOTestTenantDisabled should not be used anymore. Use the
// other values instead.
// TODO(#76378): Review existing tests and use the proper value instead.
TODOTestTenantDisabled = DefaultTestTenantOptions{testBehavior: ttDisabled}

// (NEW) TestIsSpecificToStorageLayerAndNeedsASystemTenant is used when
// the test needs to be given access to a SQL conn to a tenant with
// sufficient capabilities to access all the storage layer.
// (Initially that'd be "the" system tenant.)
TestIsSpecificToStorageLayerAndNeedsASystemTenant

// (NEW) TestControlsTenantsExplicitly is used when the test wants to
// manage its own secondary tenants and tenant servers.
TestControlsTenantsExplicitly

// (NEW) TestNeedsTightIntegrationBetweenAPIsAndTestingKnobs is used when
// a test wants to use a single set of testing knobs for both the
// storage layer and the SQL layer and for simplicity of the test
// code we want to give that test a simplified environment.
//
// Note: it is debatable whether the gain in test code simplicity is
// worth the cost of never running that test with the virtualization
// layer active.
TestNeedsTightIntegrationBetweenAPIsAndTestingKnobs

// (NEW) TestDoesNotWorkWithSecondaryTenantsButWeDontKnowWhyYet can be used
// to disable virtualization because the test doesn't appear to be compatible
// with it, and we don't understand it yet.
// It should link to a github issue with label C-test-failure.
TestDoesNotWorkWithSecondaryTenantsButWeDontKnowWhyYet(issueNumber int)

// (NEW) TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet can be
// used to disable virtualization because the test exercises a feature
// known not to work with virtualization enabled yet but we wish it
// to work eventually.
//
// It should link to a github issue with label C-bug
// and the issue should be linked to an epic under INI-213 or INI-214.
TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet(issueNumber int)
```

Release note: None



Co-authored-by: Raphael 'kena' Poss <[email protected]>
@knz knz added the db-cy-23 label Jul 26, 2023
@knz knz added the T-multitenant Issues owned by the multi-tenant virtual team label Jul 27, 2023
craig bot pushed a commit that referenced this issue Jul 28, 2023
107612: serverutils: clean up some interfaces r=knz a=knz

Epic: CRDB-18499
Part of solving #107058.
Informs #106772.

In this commit:

- `TestTenantInterface` renamed to `ApplicationLayerInterface`.

- `TestServerInterface` split into:

  - `StorageLayerInterface` containing most of the methods previously
    in `TestServerInterface`.

  - `TenantControlInterface` with just the methods relating to
    starting the service for secondary tenants.

- Relevant methods moved from `TestServerInterface` to
  `ApplicationLayerInterface`:

  - `ServingSQLAddr()` - renamed to `AdvSQLAddr()`
  - `RPCAddr()`
  - `LeaseManager()`
  - `DistSenderI()`
  - `MigrationServer()`
  - `SetDistSQLSpanResolver()`
  - `CollectionFactory()`
  - `SystemTableIDResolver()`

- Removed `HostSQLAddr()` - use `.SystemLayer().AdvSQLAddr()` instead.

- `AdvSQLAddr()` (previously `ServingSQLAddr()`) was
  simplified to always return the advertised SQL address for the
  current layer.

- Updated comments/docstrings for the interfaces.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
craig bot pushed a commit that referenced this issue Jul 28, 2023
107391: roachtest: include link to testeng grafana in issue posts r=smg260,tbg a=annrpom

This adds a link, populated with relevant cluster name and test timeframe, to the testeng grafana instance for failed roachtests.

Fixes: #105894
Release note: None

107659: serverutils: provide SQLConn/SQLConnE in ApplicationLayerInterface r=stevendanna a=knz

Fixes  #107672.
Part of solving #107058.
Informs #106772.

Epic: CRDB-18499



107697: rpc: avoid crash in newPeer r=erikgrinaker a=tbg

It was previously possible to make a new peer while the old one was in the
middle of being deleted, which caused a crash due to to acquiring child metrics
when they still existed.

Luckily, this is easy enough to fix: just remove some premature optimization
where I had tried to be too clever.

Fixes #105335.

Epic: CRDB-21710
Release note: None (bug never released)

107721: asim: skip TestAllocatorSimulatorDeterministic and example_fulldisk r=wenyihu6 a=wenyihu6

We found some non-deterministic behavior in the allocator simulator (see #105904
for more details). For now, we are skipping these potentially flaky tests.

Release Note: None 
Epic: None

107728: persistedsqlstats: specify background qos for compaction job r=xinhaoz a=xinhaoz

The compaction job can be an expensive operation so we should de-prioritize it with the `UserLow` qos setting.

Fixes: #99949

Release note: None

107750: ui: fix app = empty string filter on stmts page r=xinhaoz a=xinhaoz

The filter on app name = empty string was not working on
the stmts page. This was due to the fact that we use (unset)
as the option in the filter to represent selecting the empty
string app name. However when filtering statements, the empty
string app name on the stmt was not changed accordingly.
this commit fixes this and also adds testing for the unset case.

Epic: none
Fixes: #107748

Release note (bug fix): Filter on stmts page works for
app name = empty string (represented as 'unset').

https://www.loom.com/share/2fee4f0fb7b04208803e0dac1d9694ab?sid=5cabecf9-1c2a-406b-89a8-b378ed07d329



107753: backupccl: deflake TestBackupAndRestoreJobDescription r=stevendanna a=adityamaru

This change sorts the jobs based on when they
were created to ensure we get a stable sort of
job descriptions.

Fixes: #107684
Release note: None

Co-authored-by: Annie Pompa <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: wenyihu6 <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: adityamaru <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-multitenancy Related to multi-tenancy A-testing Testing tools and infrastructure C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. db-cy-23 T-multitenant Issues owned by the multi-tenant virtual team
Projects
None yet
Development

No branches or pull requests

1 participant