testutils: the multi-tenant test configs are too complex #106772

knz · 2023-07-13T18:26:50Z

Problem description

There are too many test configs.

Whether or not to start a tenant server to run the test's queries:
- random (default - desirable)
- always-off
- always-on
Which tenant server config to use:
- Serverless-like config, using a networked KV tenant config + rate limiting - StartTenant()
- Dedicated-like config - StartSharedProcessTenant()

All these choices results in (currently) 6 different possible test configurations.

In addition to that complexity, some tests ignore the config above and configure their test tenants manually themselves.

In those tests, it's possible to see at run-time both the test tenant created by the test framework, and the test tenant created by the test code, side-by-side. This is confusing.

Expected solution

We want to collapse the selection of whether or not to use a test tenant and which server config to use into just 1 knob with just 2 possible values:

auto-test-tenant-probabilistically (default):
- in CCL builds: 30% chance to have no secondary tenant; 30% to use serverless-like config, 30% to use dedicated-like config.
- in OSS builds: 30% chance to have no secondary tenant; 60% chance to use the dedicated-like config.
no-auto-test-tenant: no secondary tenant created by the test framework.

The latter setting can have informational variants (like those defined in #106768) to distinguish cases where there's a bug (yet to be fixed) and from cases when the test truly wishes no automatic behavior.

Jira issue: CRDB-29709

Epic CRDB-31933

The text was updated successfully, but these errors were encountered:

knz · 2023-07-13T18:31:41Z

Another thought that came up during tech pod today:

use the same mechanism "server controller" to start the secondary tenants in TestServer, both for the serverless-like and the dedicated-like configs. Currently TestServer uses bespoke orchestration for the serverless-like case (StarTenant) but we could instead make server controller responsible for running the service.

106768: base,server,testutils: clarify the multi-tenant test situations r=yuzefovich a=knz Informs #76378 . Informs #106772. Epic: CRDB-18499 Prior to this patch, a test could choose only between the following options: - `TestTenantProbabilisticOnly` and `TestTenantProbabilistic`: whether or not the load goes to a secondary tenant is randomly selected for each test. - `TestTenantEnabled`: always use a secondary tenant. - `TestTenantDisabled`: always disable the secondary tenant. In particular the last option was problematic, as we ended up using it both in the case "the test doesn't work with secondary tenants and *we don't know why*" and the case "it doesn't work, we know why *and it's OK to keep it that way*". We actually want to distinguish these cases, because the former case requires further investigation, and we want the test code to refer to this follow-up work more clearly. To address this, this commit defines the following options: ```go // (UNCHANGED) TestTenantProbabilisticOnly // (UNCHANGED) TestTenantProbabilistic // RENAMED: was previously "TestTenantEnabled" // // TestTenantAlwaysEnabled will always start the default test tenant. This is useful // for quickly verifying that a test works with tenants enabled. // // Note: this value should not be used for checked in test code // unless there is a good reason to do so. We want the common case // to use TestTenantProbabilistic or TestTenantProbabilisticOnly. TestTenantAlwaysEnabled // RENAMED: was previously "TestTenantDisabled" // // TODOTestTenantDisabled should not be used anymore. Use the // other values instead. // TODO(#76378): Review existing tests and use the proper value instead. TODOTestTenantDisabled = DefaultTestTenantOptions{testBehavior: ttDisabled} // (NEW) TestIsSpecificToStorageLayerAndNeedsASystemTenant is used when // the test needs to be given access to a SQL conn to a tenant with // sufficient capabilities to access all the storage layer. // (Initially that'd be "the" system tenant.) TestIsSpecificToStorageLayerAndNeedsASystemTenant // (NEW) TestControlsTenantsExplicitly is used when the test wants to // manage its own secondary tenants and tenant servers. TestControlsTenantsExplicitly // (NEW) TestNeedsTightIntegrationBetweenAPIsAndTestingKnobs is used when // a test wants to use a single set of testing knobs for both the // storage layer and the SQL layer and for simplicity of the test // code we want to give that test a simplified environment. // // Note: it is debatable whether the gain in test code simplicity is // worth the cost of never running that test with the virtualization // layer active. TestNeedsTightIntegrationBetweenAPIsAndTestingKnobs // (NEW) TestDoesNotWorkWithSecondaryTenantsButWeDontKnowWhyYet can be used // to disable virtualization because the test doesn't appear to be compatible // with it, and we don't understand it yet. // It should link to a github issue with label C-test-failure. TestDoesNotWorkWithSecondaryTenantsButWeDontKnowWhyYet(issueNumber int) // (NEW) TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet can be // used to disable virtualization because the test exercises a feature // known not to work with virtualization enabled yet but we wish it // to work eventually. // // It should link to a github issue with label C-bug // and the issue should be linked to an epic under INI-213 or INI-214. TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet(issueNumber int) ``` Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>

107612: serverutils: clean up some interfaces r=knz a=knz Epic: CRDB-18499 Part of solving #107058. Informs #106772. In this commit: - `TestTenantInterface` renamed to `ApplicationLayerInterface`. - `TestServerInterface` split into: - `StorageLayerInterface` containing most of the methods previously in `TestServerInterface`. - `TenantControlInterface` with just the methods relating to starting the service for secondary tenants. - Relevant methods moved from `TestServerInterface` to `ApplicationLayerInterface`: - `ServingSQLAddr()` - renamed to `AdvSQLAddr()` - `RPCAddr()` - `LeaseManager()` - `DistSenderI()` - `MigrationServer()` - `SetDistSQLSpanResolver()` - `CollectionFactory()` - `SystemTableIDResolver()` - Removed `HostSQLAddr()` - use `.SystemLayer().AdvSQLAddr()` instead. - `AdvSQLAddr()` (previously `ServingSQLAddr()`) was simplified to always return the advertised SQL address for the current layer. - Updated comments/docstrings for the interfaces. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>

107391: roachtest: include link to testeng grafana in issue posts r=smg260,tbg a=annrpom This adds a link, populated with relevant cluster name and test timeframe, to the testeng grafana instance for failed roachtests. Fixes: #105894 Release note: None 107659: serverutils: provide SQLConn/SQLConnE in ApplicationLayerInterface r=stevendanna a=knz Fixes #107672. Part of solving #107058. Informs #106772. Epic: CRDB-18499 107697: rpc: avoid crash in newPeer r=erikgrinaker a=tbg It was previously possible to make a new peer while the old one was in the middle of being deleted, which caused a crash due to to acquiring child metrics when they still existed. Luckily, this is easy enough to fix: just remove some premature optimization where I had tried to be too clever. Fixes #105335. Epic: CRDB-21710 Release note: None (bug never released) 107721: asim: skip TestAllocatorSimulatorDeterministic and example_fulldisk r=wenyihu6 a=wenyihu6 We found some non-deterministic behavior in the allocator simulator (see #105904 for more details). For now, we are skipping these potentially flaky tests. Release Note: None Epic: None 107728: persistedsqlstats: specify background qos for compaction job r=xinhaoz a=xinhaoz The compaction job can be an expensive operation so we should de-prioritize it with the `UserLow` qos setting. Fixes: #99949 Release note: None 107750: ui: fix app = empty string filter on stmts page r=xinhaoz a=xinhaoz The filter on app name = empty string was not working on the stmts page. This was due to the fact that we use (unset) as the option in the filter to represent selecting the empty string app name. However when filtering statements, the empty string app name on the stmt was not changed accordingly. this commit fixes this and also adds testing for the unset case. Epic: none Fixes: #107748 Release note (bug fix): Filter on stmts page works for app name = empty string (represented as 'unset'). https://www.loom.com/share/2fee4f0fb7b04208803e0dac1d9694ab?sid=5cabecf9-1c2a-406b-89a8-b378ed07d329 107753: backupccl: deflake TestBackupAndRestoreJobDescription r=stevendanna a=adityamaru This change sorts the jobs based on when they were created to ensure we get a stable sort of job descriptions. Fixes: #107684 Release note: None Co-authored-by: Annie Pompa <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: wenyihu6 <[email protected]> Co-authored-by: Xin Hao Zhang <[email protected]> Co-authored-by: adityamaru <[email protected]>

knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure A-multitenancy Related to multi-tenancy labels Jul 13, 2023

knz mentioned this issue Mar 29, 2023

[dnm] server: add multi-tenant modes for default test tenant #99497

Closed

knz mentioned this issue Jul 13, 2023

base,server,testutils: clarify the multi-tenant test situations #106768

Merged

knz mentioned this issue Jul 14, 2023

multitenant: Investigate tests which fail when run using the default test tenant #76378

Open

This was referenced Jul 26, 2023

serverutils: clean up some interfaces #107612

Merged

serverutils: provide SQLConn/SQLConnE in ApplicationLayerInterface #107659

Merged

knz added the db-cy-23 label Jul 26, 2023

knz mentioned this issue Jul 27, 2023

serverutils: merge StartTenant and StartSharedProcessTenant #107695

Closed

knz added the T-multitenant Issues owned by the multi-tenant virtual team label Jul 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testutils: the multi-tenant test configs are too complex #106772

testutils: the multi-tenant test configs are too complex #106772

knz commented Jul 13, 2023 •

edited by exalate-issue-sync bot

Loading

knz commented Jul 13, 2023

testutils: the multi-tenant test configs are too complex #106772

testutils: the multi-tenant test configs are too complex #106772

Comments

knz commented Jul 13, 2023 • edited by exalate-issue-sync bot Loading

Problem description

Expected solution

knz commented Jul 13, 2023

knz commented Jul 13, 2023 •

edited by exalate-issue-sync bot

Loading