Skip to content

Commit

Permalink
tabledesc_test: deflake TestIndexDoesNotStorePrimaryKeyColumnMixedVer…
Browse files Browse the repository at this point in the history
…sion

Previously, this test would flake under `--stress` and the root cause
is because the upgrade precondition check that it relies on actually
checks for corrupt descriptor at AOST '-10s', and it's possible that
the corrupt descriptor was injected within the last 10s to the system,
and thus the check produced a "false negative" (meaning it saw
there isn't no corruption at AOST '-10s' and mistakenly assumed there
isn't corruption now) and thus mistakenly allowed the cluster upgrade
to proceed.

The AOST '-10s' trick in the upgrade was implemented to help with
performance, and, for now, I think that outweighs the possibility of
producing such false-negatives, so this fix instead modified the test such
that it waits 10s after injecting the corrupt descriptor to ensure it
will be seen by the time the upgrade logic runs. The downside that comes
with this decision is this test will now take ~10-11s to run and we need
to skip it under `--stress` (bc 1000 iterations / # of parallelism * 10s
can still be larger than 1h, the default timeout for nightly stresses).

Release note: None
  • Loading branch information
Xiang-Gu committed Dec 19, 2023
1 parent 6096da0 commit de78ad1
Show file tree
Hide file tree
Showing 3 changed files with 15 additions and 4 deletions.
2 changes: 1 addition & 1 deletion pkg/sql/catalog/tabledesc/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -118,8 +118,8 @@ go_test(
"//pkg/sql/types",
"//pkg/testutils",
"//pkg/testutils/serverutils",
"//pkg/testutils/skip",
"//pkg/testutils/sqlutils",
"//pkg/upgrade/upgrades",
"//pkg/util/hlc",
"//pkg/util/leaktest",
"//pkg/util/log",
Expand Down
5 changes: 3 additions & 2 deletions pkg/sql/catalog/tabledesc/validate_version_gating_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,8 @@ import (
"github.com/cockroachdb/cockroach/pkg/sql/catalog/descpb"
"github.com/cockroachdb/cockroach/pkg/sql/catalog/tabledesc"
"github.com/cockroachdb/cockroach/pkg/testutils/serverutils"
"github.com/cockroachdb/cockroach/pkg/testutils/skip"
"github.com/cockroachdb/cockroach/pkg/testutils/sqlutils"
"github.com/cockroachdb/cockroach/pkg/upgrade/upgrades"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
"github.com/cockroachdb/cockroach/pkg/util/log"
Expand Down Expand Up @@ -54,7 +54,6 @@ import (
func TestIndexDoesNotStorePrimaryKeyColumnMixedVersion(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)
skip.UnderStressRace(t)

// Start a test cluster whose cluster version is MinSupported and auto-upgrade is
// disabled.
Expand Down Expand Up @@ -115,6 +114,8 @@ func TestIndexDoesNotStorePrimaryKeyColumnMixedVersion(t *testing.T) {

// Assert cluster version upgrade is blocked.
require.Equal(t, [][]string{{"1000023.1"}}, tdb.QueryStr(t, "SHOW CLUSTER SETTING version;"))
// disable AOST for upgrade precondition check to ensure it sees the injected, corrupt descriptor.
defer upgrades.TestingSetFirstUpgradePreconditionAOST(false)()
_, err := sqlDB.Exec(`SET CLUSTER SETTING version = $1`, clusterversion.Latest.String())
require.ErrorContains(t, err, `verifying precondition for version 1000023.1-upgrading-to-1000023.2-step-002: "".crdb_internal.invalid_objects is not empty`)
}
Expand Down
12 changes: 11 additions & 1 deletion pkg/upgrade/upgrades/first_upgrade.go
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,16 @@ func upgradeDescriptors(
})
}

var firstUpgradePreconditionUsesAOST = true

// TestingSetFirstUpgradePreconditionAOST allows tests to disable withAOST trick in upgrade
// precondition check, which otherwise can produce false negative.
func TestingSetFirstUpgradePreconditionAOST(enabled bool) func() {
old := firstUpgradePreconditionUsesAOST
firstUpgradePreconditionUsesAOST = enabled
return func() { firstUpgradePreconditionUsesAOST = old }
}

// FirstUpgradeFromReleasePrecondition is the precondition check for upgrading
// from any supported major release.
//
Expand All @@ -118,7 +128,7 @@ func FirstUpgradeFromReleasePrecondition(
// a diagnostic query. If no corruptions were found back then, we assume that
// there are no corruptions now. Otherwise, we retry and do everything
// without an AOST clause henceforth.
withAOST := true
withAOST := firstUpgradePreconditionUsesAOST
diagnose := func(tbl string) (hasRows bool, err error) {
q := fmt.Sprintf("SELECT count(*) FROM \"\".crdb_internal.%s", tbl)
if withAOST {
Expand Down

0 comments on commit de78ad1

Please sign in to comment.